Re: [PATCH 2/2] io_uring: introduce IORING_OP_MMAP

From: Jens Axboe <axboe@kernel.dk>
To: Gabriel Krisman Bertazi <krisman@suse.de>
Cc: io-uring@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@kernel.org>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>,
	linux-mm@kvack.org
Subject: Re: [PATCH 2/2] io_uring: introduce IORING_OP_MMAP
Date: Fri, 30 Jan 2026 08:55:50 -0700
Message-ID: <efa7714d-565d-41c4-af85-d7a89e7fa399@kernel.dk>
In-Reply-To: <20260129221138.897715-3-krisman@suse.de>

On 1/29/26 3:11 PM, Gabriel Krisman Bertazi wrote:
> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
> index b5b23c0d5283..e24fe3b00059 100644
> --- a/include/uapi/linux/io_uring.h
> +++ b/include/uapi/linux/io_uring.h
> @@ -74,6 +74,7 @@ struct io_uring_sqe {
>  		__u32		install_fd_flags;
>  		__u32		nop_flags;
>  		__u32		pipe_flags;
> +		__u32		mmap_flags;
>  	};
>  	__u64	user_data;	/* data to be passed back at completion time */
>  	/* pack this to avoid bogus arm OABI complaints */
> @@ -303,6 +304,7 @@ enum io_uring_op {
>  	IORING_OP_PIPE,
>  	IORING_OP_NOP128,
>  	IORING_OP_URING_CMD128,
> +	IORING_OP_MMAP,
>  
>  	/* this goes last, obviously */
>  	IORING_OP_LAST,
> @@ -1113,6 +1115,14 @@ struct zcrx_ctrl {
>  	};
>  };
>  
> +struct io_uring_mmap_desc {
> +	void __user *addr;
> +	unsigned long len;
> +	unsigned long pgoff;
> +	unsigned int prot;
> +	unsigned int flags;
> +};

You can't use pointers or unsigned long in a uapi struct, as they're
different sizes on 32-bit and 64-bit, and then you need compat handling.
Plain unsigned int should be spelled __u32 too, so the width is
explicit. It's much better to make this:

struct io_uring_mmap_desc {
	__u64 addr;
	__u64 len;
	__u64 pgoff;
	__u32 prot;
	__u32 flags;
};

It's also generally a good idea to leave a bit of expansion space in
there, so you don't need a new desc down the line.
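
As a rough illustration of the layout point above, here is a userspace
sketch that pins the descriptor size with a compile-time check. The
struct name, the typedefs standing in for __u64/__u32, and the resv[]
field with its zero check are all assumptions made for the example, not
part of the patch or of any agreed ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's __u64/__u32 (assumption). */
typedef uint64_t model_u64;
typedef uint32_t model_u32;

/* Hypothetical descriptor: the reviewed fields plus made-up reserved space. */
struct mmap_desc_model {
	model_u64 addr;		/* user address carried as a 64-bit integer */
	model_u64 len;
	model_u64 pgoff;
	model_u32 prot;
	model_u32 flags;
	model_u64 resv[2];	/* hypothetical expansion space, expected zero */
};

/* Same size and field offsets for 32-bit and 64-bit builds. */
static_assert(sizeof(struct mmap_desc_model) == 48, "unexpected desc size");
static_assert(offsetof(struct mmap_desc_model, prot) == 24, "unexpected offset");
static_assert(offsetof(struct mmap_desc_model, resv) == 32, "unexpected offset");

/* Returns 1 when the reserved space is all zero, 0 otherwise. */
static int mmap_desc_model_resv_ok(const struct mmap_desc_model *d)
{
	return d->resv[0] == 0 && d->resv[1] == 0;
}
```

Rejecting non-zero reserved fields up front is what would let those
bytes be given a meaning later without needing a second descriptor
format.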

> +int io_mmap_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
> +{
> +	struct io_mmap_data *mmap = io_kiocb_to_cmd(req, struct io_mmap_data);
> +	struct io_mmap_async *maps;
> +	int nr_maps;
> +
> +	mmap->uaddr = u64_to_user_ptr(READ_ONCE(sqe->addr));
> +	mmap->flags = READ_ONCE(sqe->mmap_flags);
> +	nr_maps = READ_ONCE(sqe->len);
> +
> +	if (mmap->flags & MAP_ANONYMOUS && req->cqe.fd != -1)
> +		return -EINVAL;
> +	if (nr_maps < 0 || nr_maps > MMAP_MAX_BATCH)
> +		return -EINVAL;
> +	if (!access_ok(mmap->uaddr, nr_maps*sizeof(struct io_uring_mmap_desc)))
> +		return -EFAULT;

Does this access_ok() actually provide anything? We're copying the
descriptors in later anyway, no?

> +static int io_prep_mmap_hugetlb(struct file **filp, unsigned long *len,
> +				int flags)
> +{
> +	if (*filp) {
> +		*len = ALIGN(*len, huge_page_size(hstate_file(*filp)));
> +	} else {
> +		struct hstate *hs;
> +		unsigned long nlen = *len;
> +
> +		hs = hstate_sizelog((flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK);
> +		if (!hs)
> +			return -EINVAL;
> +		nlen = ALIGN(nlen, huge_page_size(hs));
> +		*filp = hugetlb_file_setup(HUGETLB_ANON_FILE, nlen,
> +					   VM_NORESERVE,
> +					   HUGETLB_ANONHUGE_INODE,
> +				   (flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK);

This looks like it dips into vm_mmap_pgoff(). More on that below.

> +		desc->addr = (void *) vm_mmap_pgoff(file,
> +					   (unsigned long) desc->addr,
> +					   len, desc->prot, flags, desc->pgoff);

One concern here is that vm_mmap_pgoff() ends up doing:

mmap_write_lock_killable(mm)
	grabs the mm lock, can block, possibly for a long time?

which could stall the io_uring pipeline for just as long.
Ideally you'd try to grab the mm lock from io_mmap(), and if that
fails, either fail the request (if it's a killable thing) or punt it
with -EAGAIN to let an io-wq thread handle it.
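
The try-then-punt flow described above could look roughly like the
userspace model below. Everything in it is a stand-in: struct model_mm
and the model_* helpers are hypothetical, a C11 atomic_flag plays the
role of the mm lock, and the nonblock argument models an inline
(non-blocking) issue attempt. It is only meant to show the control
flow, not how the kernel side would actually be written.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for the mm lock; NOT the kernel's mmap_lock. */
struct model_mm {
	atomic_flag locked;
};

/* Returns true if the lock was free and is now held by the caller. */
static bool model_mm_trylock(struct model_mm *mm)
{
	/* test_and_set returns the previous value: false means we took it. */
	return !atomic_flag_test_and_set_explicit(&mm->locked,
						  memory_order_acquire);
}

static void model_mm_unlock(struct model_mm *mm)
{
	atomic_flag_clear_explicit(&mm->locked, memory_order_release);
}

/*
 * Hypothetical issue path. With nonblock set, a contended lock returns
 * -EAGAIN so the caller can retry from a context that is allowed to
 * wait. Without nonblock, this toy just spins until the lock is free.
 */
static int model_issue_mmap(struct model_mm *mm, bool nonblock, int *nr_mapped)
{
	assert(mm && nr_mapped);

	if (!model_mm_trylock(mm)) {
		if (nonblock)
			return -EAGAIN;
		while (!model_mm_trylock(mm))
			;	/* a real worker would sleep on the lock instead */
	}

	(*nr_mapped)++;		/* placeholder for the actual mapping work */
	model_mm_unlock(mm);
	return 0;
}
```

In this model the inline attempt never waits: it either completes or
hands back -EAGAIN, and only the retry from the worker-like context is
allowed to block.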

I'm not so sure simply wrapping vm_mmap_pgoff() either directly or
indirectly via the hugetlb stuff is going to be super useful, if we can
end up blocking for a long time on these operations.

-- 
Jens Axboe
