From: Jens Axboe <axboe@kernel.dk>
To: Gabriel Krisman Bertazi <krisman@suse.de>
Cc: io-uring@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@kernel.org>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
Vlastimil Babka <vbabka@suse.cz>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>,
linux-mm@kvack.org
Subject: Re: [PATCH 2/2] io_uring: introduce IORING_OP_MMAP
Date: Fri, 30 Jan 2026 08:55:50 -0700
Message-ID: <efa7714d-565d-41c4-af85-d7a89e7fa399@kernel.dk>
In-Reply-To: <20260129221138.897715-3-krisman@suse.de>
On 1/29/26 3:11 PM, Gabriel Krisman Bertazi wrote:
> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
> index b5b23c0d5283..e24fe3b00059 100644
> --- a/include/uapi/linux/io_uring.h
> +++ b/include/uapi/linux/io_uring.h
> @@ -74,6 +74,7 @@ struct io_uring_sqe {
> __u32 install_fd_flags;
> __u32 nop_flags;
> __u32 pipe_flags;
> + __u32 mmap_flags;
> };
> __u64 user_data; /* data to be passed back at completion time */
> /* pack this to avoid bogus arm OABI complaints */
> @@ -303,6 +304,7 @@ enum io_uring_op {
> IORING_OP_PIPE,
> IORING_OP_NOP128,
> IORING_OP_URING_CMD128,
> + IORING_OP_MMAP,
>
> /* this goes last, obviously */
> IORING_OP_LAST,
> @@ -1113,6 +1115,14 @@ struct zcrx_ctrl {
> };
> };
>
> +struct io_uring_mmap_desc {
> + void __user *addr;
> + unsigned long len;
> + unsigned long pgoff;
> + unsigned int prot;
> + unsigned int flags;
> +};
You can't use pointers or unsigned long in a uapi struct, as they're
different sizes on 32-bit and 64-bit, and then you need compat handling.
unsigned int is the same size on both, but fixed-width types are the
norm for uapi. It's much better to make this:
struct io_uring_mmap_desc {
	__u64 addr;
	__u64 len;
	__u64 pgoff;
	__u32 prot;
	__u32 flags;
};
and then generally also a good idea to have a bit of expansion space
there, so you don't need a new desc down the line.
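To make the expansion-space idea concrete, here is a minimal userspace sketch of such a descriptor. It is not the proposed uapi: the struct name, the resv field and its size, and the helper are all assumptions made for illustration, and uint64_t/uint32_t stand in for the kernel's __u64/__u32 so the snippet builds outside the kernel tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout. Only addr/len/pgoff/prot/flags come from the
 * review; resv[] and its size are an assumption for illustration.
 */
struct mmap_desc_sketch {
	uint64_t addr;
	uint64_t len;
	uint64_t pgoff;
	uint32_t prot;
	uint32_t flags;
	uint64_t resv[2];	/* expansion space, must be zero for now */
};

/* Fixed-width fields give the same layout on 32-bit and 64-bit ABIs. */
static_assert(sizeof(struct mmap_desc_sketch) == 48, "unexpected size");
static_assert(offsetof(struct mmap_desc_sketch, prot) == 24, "unexpected offset");
static_assert(offsetof(struct mmap_desc_sketch, resv) == 32, "unexpected offset");

/*
 * Rejecting non-zero reserved fields today is what lets them be given
 * a meaning later without breaking existing callers.
 */
static int mmap_desc_sketch_resv_ok(const struct mmap_desc_sketch *d)
{
	return d->resv[0] == 0 && d->resv[1] == 0;
}
```

Because every field is fixed-width and naturally aligned, no compat translation would be needed for a layout like this.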
> +int io_mmap_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
> +{
> + struct io_mmap_data *mmap = io_kiocb_to_cmd(req, struct io_mmap_data);
> + struct io_mmap_async *maps;
> + int nr_maps;
> +
> + mmap->uaddr = u64_to_user_ptr(READ_ONCE(sqe->addr));
> + mmap->flags = READ_ONCE(sqe->mmap_flags);
> + nr_maps = READ_ONCE(sqe->len);
> +
> + if (mmap->flags & MAP_ANONYMOUS && req->cqe.fd != -1)
> + return -EINVAL;
> + if (nr_maps < 0 || nr_maps > MMAP_MAX_BATCH)
> + return -EINVAL;
> + if (!access_ok(mmap->uaddr, nr_maps*sizeof(struct io_uring_mmap_desc)))
> + return -EFAULT;
Does this access_ok actually provide anything? We're copying it in later
anyway, no?
> +static int io_prep_mmap_hugetlb(struct file **filp, unsigned long *len,
> + int flags)
> +{
> + if (*filp) {
> + *len = ALIGN(*len, huge_page_size(hstate_file(*filp)));
> + } else {
> + struct hstate *hs;
> + unsigned long nlen = *len;
> +
> + hs = hstate_sizelog((flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK);
> + if (!hs)
> + return -EINVAL;
> + nlen = ALIGN(nlen, huge_page_size(hs));
> + *filp = hugetlb_file_setup(HUGETLB_ANON_FILE, nlen,
> + VM_NORESERVE,
> + HUGETLB_ANONHUGE_INODE,
> + (flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK);
This looks like it dips into vm_mmap_pgoff(). More on that below.
> + desc->addr = (void *) vm_mmap_pgoff(file,
> + (unsigned long) desc->addr,
> + len, desc->prot, flags, desc->pgoff);
One concern here is that vm_mmap_pgoff() ends up doing:
	mmap_write_lock_killable(mm)
		grabs mm lock, can block, for a long time?
which could potentially stall the io_uring pipeline for a long time.
Ideally you'd be able to do something where you try to grab the mm lock
from io_mmap(), and if it fails, then either fail the request (if it's a
killable thing) or punt it with -EAGAIN to let an io-wq thread handle
it.
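The try-then-punt pattern described here can be modelled in userspace roughly as follows. This is only a sketch of the control flow, not kernel code: struct mm_model, the helpers, and io_mmap_model() are hypothetical names, and an atomic flag stands in for the mm lock. A kernel version would presumably use something like mmap_write_trylock() on the nonblocking issue path, which is an assumption and not something the patch does.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for the mm lock (hypothetical model type). */
struct mm_model {
	atomic_flag locked;
};

static bool mm_model_trylock(struct mm_model *mm)
{
	/* test_and_set returns the previous value: false means we got it. */
	return !atomic_flag_test_and_set_explicit(&mm->locked,
						  memory_order_acquire);
}

static void mm_model_unlock(struct mm_model *mm)
{
	atomic_flag_clear_explicit(&mm->locked, memory_order_release);
}

/*
 * nonblock == true models the inline submission path: never wait,
 * return -EAGAIN on contention so the caller can punt to a worker.
 * nonblock == false models the worker path, which is allowed to wait
 * (the model spins; a real lock would sleep).
 */
static int io_mmap_model(struct mm_model *mm, bool nonblock, int *nr_mapped)
{
	if (!mm_model_trylock(mm)) {
		if (nonblock)
			return -EAGAIN;
		while (!mm_model_trylock(mm))
			;
	}
	(*nr_mapped)++;		/* the actual mapping work would go here */
	mm_model_unlock(mm);
	return 0;
}
```

The point of the split is that the submission path never waits on the lock; only the retry from a worker context does.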
I'm not so sure simply wrapping vm_mmap_pgoff() either directly or
indirectly via the hugetlb stuff is going to be super useful, if we can
end up blocking for a long time on these operations.
--
Jens Axboe