public inbox for [email protected]
 help / color / mirror / Atom feed
From: Pavel Begunkov <[email protected]>
To: Josh Triplett <[email protected]>, Jens Axboe <[email protected]>
Cc: [email protected]
Subject: Re: [WIP PATCH] io_uring: Support opening a file into the fixed-file table
Date: Wed, 15 Jul 2020 22:54:21 +0300	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <5e04f8fc6b0a2e218ace517bc9acf0d44530c430.1594759879.git.josh@joshtriplett.org>

On 15/07/2020 00:08, Josh Triplett wrote:
> Add a new operation IORING_OP_OPENAT2_FIXED_FILE, which opens a file
> into the fixed-file table rather than installing a file descriptor.
> Using a new operation avoids having an IOSQE flag that almost all
> operations will need to ignore; io_openat2_fixed_file also has
> substantially different control-flow than io_openat2, and it can avoid
> requiring the file table if not needed for the dirfd.
> 
> (This intentionally does not use the IOSQE_FIXED_FILE flag, because
> semantically, IOSQE_FIXED_FILE for openat2 should mean to interpret the
> dirfd as a fixed-file-table index, and that would be useful future
> behavior for both IORING_OP_OPENAT2 and IORING_OP_OPENAT2_FIXED_FILE.)
> 
> Create a new io_sqe_files_add_new function to add a single new file to
> the fixed-file table. This function returns -EBUSY if attempting to
> overwrite an existing file.
> 
> Provide a new field to pass along the fixed-file-table index for an
> open-like operation; future operations such as
> IORING_OP_ACCEPT_FIXED_FILE can use the same index.
> 
> Signed-off-by: Josh Triplett <[email protected]>
> ---
> 
> (Should this check for and reject open flags like O_CLOEXEC that only
> affect the file descriptor?)
> 
> I've tested this (and I'll send my liburing patch momentarily), and it
> works fine if you do the open in one batch and operate on the fixed-file
> in another batch. As discussed via Twitter, opening and operating on a
> file in the same batch will require changing other operations to obtain
> their fixed-file entries later, post-prep.
> 
> It might make sense to do and test that for one operation at a time, and
> add a .late_fixed_file flag to the operation definition for operations
> that support that.
> 
> It might also make sense to have the prep for
> IORING_OP_OPENAT2_FIXED_FILE stick an indication in the fixed-file table
> that there *will* be a file there later, perhaps an
> ERR_PTR(-EINPROGRESS), and make sure there isn't one already, to detect
> potential errors earlier and to let the prep for other operations
> confirm that there *will* be a file; on the other hand, that would mean
> there's an invalid non-NULL file pointer in the fixed file table, which
> seems potentially error-prone if any operation ever forgets that.
> 
> The other next step would be to add an IORING_OP_CLOSE_FIXED_FILE
> (separate from the existing CLOSE op) that removes an entry currently in
> the fixed file table and calls fput on it. (With some care, that
> *should* be possible even for an entry that was originally registered
> from a file descriptor.)
> 
> And finally, we should have an IORING_OP_FIXED_FILE_TO_FD operation,
> which calls get_unused_fd_flags (with specified flags to allow for
> O_CLOEXEC) and then fd_install. That allows opening a file via io_uring,
> operating on it via the ring, but then also operating on it via other
> syscalls (or inheriting it or anything else you can do with a file
> descriptor).
> 
>  fs/io_uring.c                 | 90 ++++++++++++++++++++++++++++++++++-
>  include/uapi/linux/io_uring.h |  6 ++-
>  2 files changed, 94 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 9fd7e69696c3..df6f017ef8e8 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -425,6 +425,7 @@ struct io_sr_msg {
>  struct io_open {
>  	struct file			*file;
>  	int				dfd;
> +	u32				open_fixed_idx;
>  	struct filename			*filename;
>  	struct open_how			how;
>  	unsigned long			nofile;
> @@ -878,6 +879,10 @@ static const struct io_op_def io_op_defs[] = {
>  		.hash_reg_file		= 1,
>  		.unbound_nonreg_file	= 1,
>  	},
> +	[IORING_OP_OPENAT2_FIXED_FILE] = {
> +		.file_table		= 1,
> +		.needs_fs		= 1,
> +	},
>  };
>  
>  static void io_wq_submit_work(struct io_wq_work **workptr);
> @@ -886,6 +891,9 @@ static void io_put_req(struct io_kiocb *req);
>  static void __io_double_put_req(struct io_kiocb *req);
>  static struct io_kiocb *io_prep_linked_timeout(struct io_kiocb *req);
>  static void io_queue_linked_timeout(struct io_kiocb *req);
> +static int io_sqe_files_add_new(struct io_ring_ctx *ctx,
> +				u32 index,
> +				struct file *file);
>  static int __io_sqe_files_update(struct io_ring_ctx *ctx,
>  				 struct io_uring_files_update *ip,
>  				 unsigned nr_args);
> @@ -3060,10 +3068,48 @@ static int io_openat2_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>  					len);
>  	if (ret)
>  		return ret;
> +	req->open.open_fixed_idx = READ_ONCE(sqe->open_fixed_idx);
>  
>  	return __io_openat_prep(req, sqe);
>  }
>  
> +static int io_openat2_fixed_file(struct io_kiocb *req, bool force_nonblock)
> +{

How about having it in io_openat2()? There are almost identical, that would be
just a couple of if's.

> +	struct io_open *open = &req->open;
> +	struct open_flags op;
> +	struct file *file;
> +	int ret;
> +
> +	if (force_nonblock) {
> +		/* only need file table for an actual valid fd */
> +		if (open->dfd == -1 || open->dfd == AT_FDCWD)
> +			req->flags |= REQ_F_NO_FILE_TABLE;
> +		return -EAGAIN;
> +	}
> +
> +	ret = build_open_flags(&open->how, &op);
> +	if (ret)
> +		goto err;
> +
> +	file = do_filp_open(open->dfd, open->filename, &op);
> +	if (IS_ERR(file)) {
> +		ret = PTR_ERR(file);
> +	} else {
> +		fsnotify_open(file);
> +		ret = io_sqe_files_add_new(req->ctx, open->open_fixed_idx, file);
> +		if (ret)
> +			fput(file);
> +	}
> +err:
> +	putname(open->filename);
> +	req->flags &= ~REQ_F_NEED_CLEANUP;
> +	if (ret < 0)
> +		req_set_fail_links(req);
> +	io_cqring_add_event(req, ret);
> +	io_put_req(req);

These 2 lines are better to be replace with (since 5.9):

io_req_complete(req, ret);

> +	return 0;
> +}
> +
>  static int io_openat2(struct io_kiocb *req, bool force_nonblock)
>  {
>  	struct open_flags op;
> @@ -5048,6 +5094,7 @@ static int io_req_defer_prep(struct io_kiocb *req,
>  		ret = io_madvise_prep(req, sqe);
>  		break;
>  	case IORING_OP_OPENAT2:
> +	case IORING_OP_OPENAT2_FIXED_FILE:
>  		ret = io_openat2_prep(req, sqe);
>  		break;
>  	case IORING_OP_EPOLL_CTL:
> @@ -5135,6 +5182,7 @@ static void io_cleanup_req(struct io_kiocb *req)
>  		break;
>  	case IORING_OP_OPENAT:
>  	case IORING_OP_OPENAT2:
> +	case IORING_OP_OPENAT2_FIXED_FILE:

These OPENAT cases weren't doing anything, so were killed,
as should be this line.

>  		break;
>  	case IORING_OP_SPLICE:
>  	case IORING_OP_TEE:
> @@ -5329,12 +5377,17 @@ static int io_issue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe,
>  		ret = io_madvise(req, force_nonblock);
>  		break;
>  	case IORING_OP_OPENAT2:
> +	case IORING_OP_OPENAT2_FIXED_FILE:
>  		if (sqe) {
>  			ret = io_openat2_prep(req, sqe);
>  			if (ret)
>  				break;
>  		}
> -		ret = io_openat2(req, force_nonblock);
> +		if (req->opcode == IORING_OP_OPENAT2) {
> +			ret = io_openat2(req, force_nonblock);
> +		} else {
> +			ret = io_openat2_fixed_file(req, force_nonblock);
> +		}

We don't need all these brackets for one liners

>  		break;
>  	case IORING_OP_EPOLL_CTL:
>  		if (sqe) {
> @@ -6791,6 +6844,41 @@ static int io_queue_file_removal(struct fixed_file_data *data,
>  	return 0;
>  }
>  
> +/*
> + * Add a single new file in an empty entry of the fixed file table. Does not
> + * allow overwriting an existing entry; returns -EBUSY in that case.
> + */
> +static int io_sqe_files_add_new(struct io_ring_ctx *ctx,
> +				u32 index,
> +				struct file *file)
> +{
> +	struct fixed_file_table *table;
> +	u32 i;
> +	int err;
> +
> +	if (unlikely(index > ctx->nr_user_files))
> +		return -EINVAL;
> +	i = array_index_nospec(index, ctx->nr_user_files);
> +	table = &ctx->file_data->table[i >> IORING_FILE_TABLE_SHIFT];
> +	index = i & IORING_FILE_TABLE_MASK;
> +	if (unlikely(table->files[index]))
> +		return -EBUSY;
> +	/*
> +	 * Don't allow io_uring instances to be registered. If UNIX isn't
> +	 * enabled, then this causes a reference cycle and this instance can
> +	 * never get freed. If UNIX is enabled we'll handle it just fine, but
> +	 * there's still no point in allowing a ring fd as it doesn't support
> +	 * regular read/write anyway.
> +	 */
> +	if (unlikely(file->f_op == &io_uring_fops))
> +		return -EBADF;
> +	err = io_sqe_file_register(ctx, file, i);
> +	if (err)
> +		return err;
> +	table->files[index] = file;
> +	return 0;
> +}
> +
>  static int __io_sqe_files_update(struct io_ring_ctx *ctx,
>  				 struct io_uring_files_update *up,
>  				 unsigned nr_args)
> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
> index 7843742b8b74..95f107e6f65e 100644
> --- a/include/uapi/linux/io_uring.h
> +++ b/include/uapi/linux/io_uring.h
> @@ -54,7 +54,10 @@ struct io_uring_sqe {
>  			} __attribute__((packed));
>  			/* personality to use, if used */
>  			__u16	personality;
> -			__s32	splice_fd_in;
> +			union {
> +				__s32	splice_fd_in;
> +				__s32	open_fixed_idx;
> +			};
>  		};
>  		__u64	__pad2[3];
>  	};
> @@ -130,6 +133,7 @@ enum {
>  	IORING_OP_PROVIDE_BUFFERS,
>  	IORING_OP_REMOVE_BUFFERS,
>  	IORING_OP_TEE,
> +	IORING_OP_OPENAT2_FIXED_FILE,

I think, it's better to reuse IORING_OP_OPENAT2.
E.g. fixed version if "open_fixed_idx != 0" or something similar.

>  
>  	/* this goes last, obviously */
>  	IORING_OP_LAST,
> 

-- 
Pavel Begunkov

  parent reply	other threads:[~2020-07-15 19:56 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-14 21:08 [WIP PATCH] io_uring: Support opening a file into the fixed-file table Josh Triplett
2020-07-14 21:16 ` [WIP PATCH] liburing: Support IORING_OP_OPENAT2_FIXED_FILE Josh Triplett
2020-07-14 22:59 ` [WIP PATCH] io_uring: Support opening a file into the fixed-file table Clay Harris
2020-07-15  0:42   ` Josh Triplett
2020-07-15  2:32   ` Clay Harris
2020-07-15 21:04     ` josh
2020-07-15 16:07 ` Jens Axboe
2020-07-15 19:54 ` Pavel Begunkov [this message]
2020-07-15 20:46   ` Josh Triplett
2020-07-15 20:54     ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox