public inbox for [email protected]
* [PATCH 0/2 for-next] Rework ->files tracking
@ 2020-09-11 21:26 Jens Axboe
  2020-09-11 21:26 ` [PATCH 1/2] io_uring: stash ctx task reference instead of task files Jens Axboe
  2020-09-11 21:26 ` [PATCH 2/2] io_uring: implement ->flush() sequence to handle ->files validity Jens Axboe
  0 siblings, 2 replies; 5+ messages in thread
From: Jens Axboe @ 2020-09-11 21:26 UTC (permalink / raw)
  To: io-uring; +Cc: jannh, asml.silence

As discussed on the list, it'd be really nice to rework this so we have
more flexibility (in particular in the presence of SQPOLL). Please
poke some holes in this one. Patch 2 is the meat of it, patch 1 is just
a prep patch. The basic idea is explained in that patch, so I won't
repeat that here.

-- 
Jens Axboe




* [PATCH 1/2] io_uring: stash ctx task reference instead of task files
  2020-09-11 21:26 [PATCH 0/2 for-next] Rework ->files tracking Jens Axboe
@ 2020-09-11 21:26 ` Jens Axboe
  2020-09-11 21:26 ` [PATCH 2/2] io_uring: implement ->flush() sequence to handle ->files validity Jens Axboe
  1 sibling, 0 replies; 5+ messages in thread
From: Jens Axboe @ 2020-09-11 21:26 UTC (permalink / raw)
  To: io-uring; +Cc: jannh, asml.silence, Jens Axboe

We can grab a reference to the task instead of stashing away the task
files_struct. This is doable without creating a circular reference
between the ring fd and the task itself.

This is in preparation for handling the ->files assignment a bit
differently, so we don't need to force SQPOLL to enter the kernel for
an update.
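
To make it concrete, a condensed sketch of the lifetime handling this
introduces (illustrative only, pulled from the diff below):

	/* at SQPOLL setup: pin the submitting task so ->files stays reachable */
	ctx->sqo_task = get_task_struct(current);

	/* in the SQPOLL thread: adopt the parent's ->files under task_lock */
	if (current->files != ctx->sqo_task->files) {
		task_lock(ctx->sqo_task);
		task_lock(current);
		current->files = ctx->sqo_task->files;
		task_unlock(current);
		task_unlock(ctx->sqo_task);
	}

	/* at teardown (or on setup error): drop the pinned reference */
	if (ctx->sqo_task) {
		put_task_struct(ctx->sqo_task);
		ctx->sqo_task = NULL;
	}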

Signed-off-by: Jens Axboe <[email protected]>
---
 fs/io_uring.c | 33 ++++++++++++++++++++-------------
 1 file changed, 20 insertions(+), 13 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 7ee5e18218c2..4958a9dca51a 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -290,11 +290,10 @@ struct io_ring_ctx {
 	struct io_wq		*io_wq;
 	struct mm_struct	*sqo_mm;
 	/*
-	 * For SQPOLL usage - no reference is held to this file table, we
-	 * rely on fops->flush() and our callback there waiting for the users
-	 * to finish.
+	 * For SQPOLL usage - we hold a reference to the parent task, so we
+	 * have access to the ->files
 	 */
-	struct files_struct	*sqo_files;
+	struct task_struct	*sqo_task;
 
 	struct wait_queue_entry	sqo_wait_entry;
 	struct list_head	sqd_list;
@@ -6824,10 +6823,12 @@ static int io_sq_thread(void *data)
 				old_cred = override_creds(ctx->creds);
 			}
 
-			if (current->files != ctx->sqo_files) {
+			if (current->files != ctx->sqo_task->files) {
+				task_lock(ctx->sqo_task);
 				task_lock(current);
-				current->files = ctx->sqo_files;
+				current->files = ctx->sqo_task->files;
 				task_unlock(current);
+				task_unlock(ctx->sqo_task);
 			}
 
 			ret |= __io_sq_thread(ctx, start_jiffies, cap_entries);
@@ -7155,6 +7156,11 @@ static void io_finish_async(struct io_ring_ctx *ctx)
 		io_wq_destroy(ctx->io_wq);
 		ctx->io_wq = NULL;
 	}
+
+	if (ctx->sqo_task) {
+		put_task_struct(ctx->sqo_task);
+		ctx->sqo_task = NULL;
+	}
 }
 
 #if defined(CONFIG_UNIX)
@@ -7804,11 +7810,11 @@ static int io_sq_offload_create(struct io_ring_ctx *ctx,
 		io_sq_thread_unpark(sqd);
 
 		/*
-		 * We will exit the sqthread before current exits, so we can
-		 * avoid taking a reference here and introducing weird
-		 * circular dependencies on the files table.
+		 * Grab task reference for SQPOLL usage. This doesn't
+		 * introduce a circular reference, as the task reference is
+		 * just to ensure that the struct itself stays valid.
 		 */
-		ctx->sqo_files = current->files;
+		ctx->sqo_task = get_task_struct(current);
 
 		ctx->sq_thread_idle = msecs_to_jiffies(p->sq_thread_idle);
 		if (!ctx->sq_thread_idle)
@@ -7850,7 +7856,10 @@ static int io_sq_offload_create(struct io_ring_ctx *ctx,
 
 	return 0;
 err:
-	ctx->sqo_files = NULL;
+	if (ctx->sqo_task) {
+		put_task_struct(ctx->sqo_task);
+		ctx->sqo_task = NULL;
+	}
 	io_finish_async(ctx);
 	return ret;
 }
@@ -8564,7 +8573,6 @@ static int io_uring_flush(struct file *file, void *data)
 		mutex_lock(&ctx->uring_lock);
 		ctx->ring_fd = -1;
 		ctx->ring_file = NULL;
-		ctx->sqo_files = NULL;
 		mutex_unlock(&ctx->uring_lock);
 		io_ring_set_wakeup_flag(ctx);
 		io_sq_thread_unpark(sqd);
@@ -8711,7 +8719,6 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 			mutex_lock(&ctx->uring_lock);
 			ctx->ring_fd = fd;
 			ctx->ring_file = f.file;
-			ctx->sqo_files = current->files;
 			mutex_unlock(&ctx->uring_lock);
 
 			io_sq_thread_unpark(sqd);
-- 
2.28.0



* [PATCH 2/2] io_uring: implement ->flush() sequence to handle ->files validity
  2020-09-11 21:26 [PATCH 0/2 for-next] Rework ->files tracking Jens Axboe
  2020-09-11 21:26 ` [PATCH 1/2] io_uring: stash ctx task reference instead of task files Jens Axboe
@ 2020-09-11 21:26 ` Jens Axboe
  2020-09-11 21:57   ` Jann Horn
  1 sibling, 1 reply; 5+ messages in thread
From: Jens Axboe @ 2020-09-11 21:26 UTC (permalink / raw)
  To: io-uring; +Cc: jannh, asml.silence, Jens Axboe

The current scheme stashes away ->ring_fd and ->ring_file, and uses
those to check whether ->files could have changed. This works, but not
so well for SQPOLL: if the application closes the ring_fd, we require
that it enter the kernel to refresh our state.

Add an atomic sequence for the ->flush() count on the ring fd, and if
we get a mismatch between checking this sequence before and after
grabbing the ->files, then we fail the request.

This should offer the same protection that we currently have, with the
added benefit of being able to update the ->files automatically.
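
In rough pseudocode, the mechanism is (a condensed sketch, names as in
the diff below):

	/* io_grab_files(): record the sequence as ->files are grabbed */
	seq = ctx->cur_files_seq = atomic_read(&ctx->files_seq);
	req->work.files = current->files;

	/* io_uring_flush(): every close of the ring fd bumps it */
	atomic_inc(&ctx->files_seq);

	/* io_check_files_seq(): before the async side uses ->files */
	if (req->work.files && seq != atomic_read(&ctx->files_seq))
		return -EBADF;	/* flushed in between, fail the request */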

Signed-off-by: Jens Axboe <[email protected]>
---
 fs/io_uring.c | 137 ++++++++++++++++++++++++++++++--------------------
 1 file changed, 83 insertions(+), 54 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 4958a9dca51a..49be5e21f166 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -308,8 +308,11 @@ struct io_ring_ctx {
 	 */
 	struct fixed_file_data	*file_data;
 	unsigned		nr_user_files;
-	int 			ring_fd;
-	struct file 		*ring_file;
+
+	/* incremented when ->flush() is called */
+	atomic_t		files_seq;
+	/* assigned when ->files are grabbed */
+	int			cur_files_seq;
 
 	/* if used, fixed mapped user buffers */
 	unsigned		nr_user_bufs;
@@ -394,6 +397,7 @@ struct io_close {
 	struct file			*file;
 	struct file			*put_file;
 	int				fd;
+	int				files_seq;
 };
 
 struct io_timeout_data {
@@ -409,6 +413,7 @@ struct io_accept {
 	int __user			*addr_len;
 	int				flags;
 	unsigned long			nofile;
+	int				files_seq;
 };
 
 struct io_sync {
@@ -461,6 +466,7 @@ struct io_sr_msg {
 struct io_open {
 	struct file			*file;
 	int				dfd;
+	int				files_seq;
 	struct filename			*filename;
 	struct open_how			how;
 	unsigned long			nofile;
@@ -471,6 +477,7 @@ struct io_files_update {
 	u64				arg;
 	u32				nr_args;
 	u32				offset;
+	int				files_seq;
 };
 
 struct io_fadvise {
@@ -492,6 +499,7 @@ struct io_epoll {
 	int				epfd;
 	int				op;
 	int				fd;
+	int				files_seq;
 	struct epoll_event		event;
 };
 
@@ -518,6 +526,7 @@ struct io_statx {
 	int				dfd;
 	unsigned int			mask;
 	unsigned int			flags;
+	int				files_seq;
 	const char __user		*filename;
 	struct statx __user		*buffer;
 };
@@ -3602,6 +3611,28 @@ static int io_fallocate(struct io_kiocb *req, bool force_nonblock)
 	return 0;
 }
 
+/*
+ * Check that our ->files sequence matches. If files isn't assigned yet,
+ * just store the current sequence. If they are assigned, check against
+ * the sequence from when they got assigned. If we get a mismatch, we fail
+ * the request. This is only applicable to requests that set ->file_table
+ * in io_op_defs[], indicating that they need access to the files_struct
+ * when executed async.
+ */
+static int io_check_files_seq(struct io_kiocb *req, int *seq)
+{
+	struct io_ring_ctx *ctx = req->ctx;
+
+	if (!req->work.files) {
+		*seq = ctx->cur_files_seq;
+		return 0;
+	} else if (*seq == atomic_read(&ctx->files_seq)) {
+		return 0;
+	}
+
+	return -EBADF;
+}
+
 static int __io_openat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
 	const char __user *fname;
@@ -3627,6 +3658,7 @@ static int __io_openat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe
 		return ret;
 	}
 	req->open.nofile = rlimit(RLIMIT_NOFILE);
+	req->open.files_seq = req->ctx->cur_files_seq;
 	req->flags |= REQ_F_NEED_CLEANUP;
 	return 0;
 }
@@ -3670,6 +3702,10 @@ static int io_openat2(struct io_kiocb *req, bool force_nonblock)
 	struct file *file;
 	int ret;
 
+	ret = io_check_files_seq(req, &req->open.files_seq);
+	if (ret)
+		goto done;
+
 	if (force_nonblock)
 		return -EAGAIN;
 
@@ -3692,6 +3728,7 @@ static int io_openat2(struct io_kiocb *req, bool force_nonblock)
 err:
 	putname(req->open.filename);
 	req->flags &= ~REQ_F_NEED_CLEANUP;
+done:
 	if (ret < 0)
 		req_set_fail_links(req);
 	io_req_complete(req, ret);
@@ -3881,6 +3918,7 @@ static int io_epoll_ctl_prep(struct io_kiocb *req,
 			return -EFAULT;
 	}
 
+	req->epoll.files_seq = req->ctx->cur_files_seq;
 	return 0;
 #else
 	return -EOPNOTSUPP;
@@ -3894,10 +3932,15 @@ static int io_epoll_ctl(struct io_kiocb *req, bool force_nonblock,
 	struct io_epoll *ie = &req->epoll;
 	int ret;
 
+	ret = io_check_files_seq(req, &ie->files_seq);
+	if (ret)
+		goto done;
+
 	ret = do_epoll_ctl(ie->epfd, ie->op, ie->fd, &ie->event, force_nonblock);
 	if (force_nonblock && ret == -EAGAIN)
 		return -EAGAIN;
 
+done:
 	if (ret < 0)
 		req_set_fail_links(req);
 	__io_req_complete(req, ret, 0, cs);
@@ -3993,6 +4036,7 @@ static int io_statx_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	req->statx.filename = u64_to_user_ptr(READ_ONCE(sqe->addr));
 	req->statx.buffer = u64_to_user_ptr(READ_ONCE(sqe->addr2));
 	req->statx.flags = READ_ONCE(sqe->statx_flags);
+	req->statx.files_seq = req->ctx->cur_files_seq;
 
 	return 0;
 }
@@ -4002,6 +4046,10 @@ static int io_statx(struct io_kiocb *req, bool force_nonblock)
 	struct io_statx *ctx = &req->statx;
 	int ret;
 
+	ret = io_check_files_seq(req, &ctx->files_seq);
+	if (ret)
+		goto done;
+
 	if (force_nonblock) {
 		/* only need file table for an actual valid fd */
 		if (ctx->dfd == -1 || ctx->dfd == AT_FDCWD)
@@ -4012,6 +4060,7 @@ static int io_statx(struct io_kiocb *req, bool force_nonblock)
 	ret = do_statx(ctx->dfd, ctx->filename, ctx->flags, ctx->mask,
 		       ctx->buffer);
 
+done:
 	if (ret < 0)
 		req_set_fail_links(req);
 	io_req_complete(req, ret);
@@ -4037,11 +4086,11 @@ static int io_close_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 		return -EBADF;
 
 	req->close.fd = READ_ONCE(sqe->fd);
-	if ((req->file && req->file->f_op == &io_uring_fops) ||
-	    req->close.fd == req->ctx->ring_fd)
+	if (req->file && req->file->f_op == &io_uring_fops)
 		return -EBADF;
 
 	req->close.put_file = NULL;
+	req->close.files_seq = req->ctx->cur_files_seq;
 	return 0;
 }
 
@@ -4051,6 +4100,10 @@ static int io_close(struct io_kiocb *req, bool force_nonblock,
 	struct io_close *close = &req->close;
 	int ret;
 
+	ret = io_check_files_seq(req, &close->files_seq);
+	if (ret)
+		goto done;
+
 	/* might be already done during nonblock submission */
 	if (!close->put_file) {
 		ret = __close_fd_get_file(close->fd, &close->put_file);
@@ -4069,10 +4122,11 @@ static int io_close(struct io_kiocb *req, bool force_nonblock,
 
 	/* No ->flush() or already async, safely close from here */
 	ret = filp_close(close->put_file, req->work.files);
-	if (ret < 0)
-		req_set_fail_links(req);
 	fput(close->put_file);
 	close->put_file = NULL;
+done:
+	if (ret < 0)
+		req_set_fail_links(req);
 	__io_req_complete(req, ret, 0, cs);
 	return 0;
 }
@@ -4526,6 +4580,7 @@ static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	accept->addr_len = u64_to_user_ptr(READ_ONCE(sqe->addr2));
 	accept->flags = READ_ONCE(sqe->accept_flags);
 	accept->nofile = rlimit(RLIMIT_NOFILE);
+	accept->files_seq = req->ctx->cur_files_seq;
 	return 0;
 }
 
@@ -4536,6 +4591,10 @@ static int io_accept(struct io_kiocb *req, bool force_nonblock,
 	unsigned int file_flags = force_nonblock ? O_NONBLOCK : 0;
 	int ret;
 
+	ret = io_check_files_seq(req, &accept->files_seq);
+	if (ret)
+		goto done;
+
 	if (req->file->f_flags & O_NONBLOCK)
 		req->flags |= REQ_F_NOWAIT;
 
@@ -4544,6 +4603,7 @@ static int io_accept(struct io_kiocb *req, bool force_nonblock,
 					accept->nofile);
 	if (ret == -EAGAIN && force_nonblock)
 		return -EAGAIN;
+done:
 	if (ret < 0) {
 		if (ret == -ERESTARTSYS)
 			ret = -EINTR;
@@ -5513,6 +5573,7 @@ static int io_files_update_prep(struct io_kiocb *req,
 	if (!req->files_update.nr_args)
 		return -EINVAL;
 	req->files_update.arg = READ_ONCE(sqe->addr);
+	req->files_update.files_seq = req->ctx->cur_files_seq;
 	return 0;
 }
 
@@ -5523,6 +5584,10 @@ static int io_files_update(struct io_kiocb *req, bool force_nonblock,
 	struct io_uring_files_update up;
 	int ret;
 
+	ret = io_check_files_seq(req, &req->files_update.files_seq);
+	if (ret)
+		goto done;
+
 	if (force_nonblock)
 		return -EAGAIN;
 
@@ -5532,7 +5597,7 @@ static int io_files_update(struct io_kiocb *req, bool force_nonblock,
 	mutex_lock(&ctx->uring_lock);
 	ret = __io_sqe_files_update(ctx, &up, req->files_update.nr_args);
 	mutex_unlock(&ctx->uring_lock);
-
+done:
 	if (ret < 0)
 		req_set_fail_links(req);
 	__io_req_complete(req, ret, 0, cs);
@@ -6118,34 +6183,21 @@ static int io_req_set_file(struct io_submit_state *state, struct io_kiocb *req,
 
 static int io_grab_files(struct io_kiocb *req)
 {
-	int ret = -EBADF;
 	struct io_ring_ctx *ctx = req->ctx;
 
 	io_req_init_async(req);
 
 	if (req->work.files || (req->flags & REQ_F_NO_FILE_TABLE))
 		return 0;
-	if (!ctx->ring_file)
-		return -EBADF;
 
-	rcu_read_lock();
 	spin_lock_irq(&ctx->inflight_lock);
-	/*
-	 * We use the f_ops->flush() handler to ensure that we can flush
-	 * out work accessing these files if the fd is closed. Check if
-	 * the fd has changed since we started down this path, and disallow
-	 * this operation if it has.
-	 */
-	if (fcheck(ctx->ring_fd) == ctx->ring_file) {
-		list_add(&req->inflight_entry, &ctx->inflight_list);
-		req->flags |= REQ_F_INFLIGHT;
-		req->work.files = current->files;
-		ret = 0;
-	}
+	list_add(&req->inflight_entry, &ctx->inflight_list);
+	req->flags |= REQ_F_INFLIGHT;
+	ctx->cur_files_seq = atomic_read(&ctx->files_seq);
+	req->work.files = current->files;
 	spin_unlock_irq(&ctx->inflight_lock);
-	rcu_read_unlock();
 
-	return ret;
+	return 0;
 }
 
 static inline int io_prep_work_files(struct io_kiocb *req)
@@ -6705,14 +6757,7 @@ static enum sq_ret __io_sq_thread(struct io_ring_ctx *ctx,
 		mutex_unlock(&ctx->uring_lock);
 	}
 
-	/*
-	 * If ->ring_file is NULL, we're waiting on new fd/file assigment.
-	 * Don't submit anything new until that happens.
-	 */
-	if (ctx->ring_file)
-		to_submit = io_sqring_entries(ctx);
-	else
-		to_submit = 0;
+	to_submit = io_sqring_entries(ctx);
 
 	/*
 	 * If submit got -EBUSY, flag us as needing the application
@@ -6756,7 +6801,7 @@ static enum sq_ret __io_sq_thread(struct io_ring_ctx *ctx,
 		}
 
 		to_submit = io_sqring_entries(ctx);
-		if (!to_submit || ret == -EBUSY || !ctx->ring_file)
+		if (!to_submit || ret == -EBUSY)
 			return SQT_IDLE;
 
 		finish_wait(&sqd->wait, &ctx->sqo_wait_entry);
@@ -8557,6 +8602,9 @@ static int io_uring_flush(struct file *file, void *data)
 {
 	struct io_ring_ctx *ctx = file->private_data;
 
+	/* assume current files sequence is no longer valid */
+	atomic_inc(&ctx->files_seq);
+
 	io_uring_cancel_files(ctx, data);
 
 	/*
@@ -8568,13 +8616,8 @@ static int io_uring_flush(struct file *file, void *data)
 	} else if (ctx->flags & IORING_SETUP_SQPOLL) {
 		struct io_sq_data *sqd = ctx->sq_data;
 
-		/* Ring is being closed, mark us as neding new assignment */
+		/* quiesce sqpoll thread */
 		io_sq_thread_park(sqd);
-		mutex_lock(&ctx->uring_lock);
-		ctx->ring_fd = -1;
-		ctx->ring_file = NULL;
-		mutex_unlock(&ctx->uring_lock);
-		io_ring_set_wakeup_flag(ctx);
 		io_sq_thread_unpark(sqd);
 	}
 
@@ -8711,18 +8754,6 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 	if (ctx->flags & IORING_SETUP_SQPOLL) {
 		if (!list_empty_careful(&ctx->cq_overflow_list))
 			io_cqring_overflow_flush(ctx, false);
-		if (fd != ctx->ring_fd) {
-			struct io_sq_data *sqd = ctx->sq_data;
-
-			io_sq_thread_park(sqd);
-
-			mutex_lock(&ctx->uring_lock);
-			ctx->ring_fd = fd;
-			ctx->ring_file = f.file;
-			mutex_unlock(&ctx->uring_lock);
-
-			io_sq_thread_unpark(sqd);
-		}
 		if (flags & IORING_ENTER_SQ_WAKEUP)
 			wake_up(&ctx->sq_data->wait);
 		if (flags & IORING_ENTER_SQ_WAIT)
@@ -8730,8 +8761,6 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 		submitted = to_submit;
 	} else if (to_submit) {
 		mutex_lock(&ctx->uring_lock);
-		ctx->ring_fd = fd;
-		ctx->ring_file = f.file;
 		submitted = io_submit_sqes(ctx, to_submit);
 		mutex_unlock(&ctx->uring_lock);
 
-- 
2.28.0



* Re: [PATCH 2/2] io_uring: implement ->flush() sequence to handle ->files validity
  2020-09-11 21:26 ` [PATCH 2/2] io_uring: implement ->flush() sequence to handle ->files validity Jens Axboe
@ 2020-09-11 21:57   ` Jann Horn
  2020-09-11 22:33     ` Jens Axboe
  0 siblings, 1 reply; 5+ messages in thread
From: Jann Horn @ 2020-09-11 21:57 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, Pavel Begunkov

On Fri, Sep 11, 2020 at 11:28 PM Jens Axboe <[email protected]> wrote:
> The current scheme stashes away ->ring_fd and ->ring_file, and uses
> those to check whether ->files could have changed. This works, but not
> so well for SQPOLL: if the application closes the ring_fd, we require
> that it enter the kernel to refresh our state.

I don't understand the intent; please describe the scenario this is
trying to fix. Is this something about applications that call dup()
and close() on the uring fd, or something like that?

> Add an atomic sequence for the ->flush() count on the ring fd, and if
> we get a mismatch between checking this sequence before and after
> grabbing the ->files, then we fail the request.

Is this expected to actually be possible during benign usage?

> This should offer the same protection that we currently have, with the
> added benefit of being able to update the ->files automatically.

Please clarify what "update the ->files" is about.

> Signed-off-by: Jens Axboe <[email protected]>
> ---
>  fs/io_uring.c | 137 ++++++++++++++++++++++++++++++--------------------
>  1 file changed, 83 insertions(+), 54 deletions(-)
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 4958a9dca51a..49be5e21f166 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -308,8 +308,11 @@ struct io_ring_ctx {
>          */
>         struct fixed_file_data  *file_data;
>         unsigned                nr_user_files;
> -       int                     ring_fd;
> -       struct file             *ring_file;
> +
> +       /* incremented when ->flush() is called */
> +       atomic_t                files_seq;

If this ends up landing, all of this should probably use 64-bit types
(atomic64_t and s64). 32-bit counters in fast syscalls can typically
be wrapped around in a reasonable amount of time. (For example, the
VMA cache sequence number wraparound issue
<https://googleprojectzero.blogspot.com/2018/09/a-cache-invalidation-bug-in-linux.html>
could be triggered in about an hour according to my blogpost from back
then. For this sequence number, it should be significantly faster, I
think.)
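
(Back-of-the-envelope, with an assumed rate: at 10^7 increments per
second, a 32-bit counter wraps after 2^32 increments, i.e. in about
2^32 / 10^7 ~= 430 seconds.)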

(I haven't properly looked at the rest of this patch so far - I stared
at it for a bit, but wasn't able to immediately figure out what's
actually going on. So I figured I'd ask the more fundamental questions
first.)


* Re: [PATCH 2/2] io_uring: implement ->flush() sequence to handle ->files validity
  2020-09-11 21:57   ` Jann Horn
@ 2020-09-11 22:33     ` Jens Axboe
  0 siblings, 0 replies; 5+ messages in thread
From: Jens Axboe @ 2020-09-11 22:33 UTC (permalink / raw)
  To: Jann Horn; +Cc: io-uring, Pavel Begunkov

On 9/11/20 3:57 PM, Jann Horn wrote:
> On Fri, Sep 11, 2020 at 11:28 PM Jens Axboe <[email protected]> wrote:
>> The current scheme stashes away ->ring_fd and ->ring_file, and uses
>> those to check whether ->files could have changed. This works, but not
>> so well for SQPOLL: if the application closes the ring_fd, we require
>> that it enter the kernel to refresh our state.
> 
> I don't understand the intent; please describe the scenario this is
> trying to fix. Is this something about applications that call dup()
> and close() on the uring fd, or something like that?

Sorry, I guess it should have been clearer. It's basically just a
replacement for the old fcheck(), to guard against dup and close between
when we grab the ->files and actually use them. So functionally it
should not be any different, unless I messed something up, but it allows
us to be a bit more flexible in how we handle it. The scope should be
more exact now, as it's between when we grab the ->files and when we
actually use them.
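
Roughly, the window it guards looks like this (a sketch, with names
from the patch):

	io_grab_files():   seq = atomic_read(&ctx->files_seq);
	                   req->work.files = current->files;
	   ... dup()/close() of the ring fd -> ->flush() ->
	                   atomic_inc(&ctx->files_seq);
	async execution:   io_check_files_seq() sees the mismatch and
	                   fails the request with -EBADF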

>> Add an atomic sequence for the ->flush() count on the ring fd, and if
>> we get a mismatch between checking this sequence before and after
>> grabbing the ->files, then we fail the request.
> 
> Is this expected to actually be possible during benign usage?

Doesn't introduce any new failure cases here. If you submit an IO that
needs to use the file table and close the ring fd in between, then the
IO _will_ get canceled.
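
For illustration, an untested userspace sketch of that case (assuming
liburing; the dup'ed fd just stands in for any second fd referring to
the ring file):

	#include <fcntl.h>
	#include <unistd.h>
	#include <liburing.h>

	int main(void)
	{
		struct io_uring ring;
		struct io_uring_sqe *sqe;
		struct io_uring_cqe *cqe;

		io_uring_queue_init(8, &ring, 0);

		/* openat needs current->files once punted to async */
		sqe = io_uring_get_sqe(&ring);
		io_uring_prep_openat(sqe, AT_FDCWD, "/tmp/somefile", O_RDONLY, 0);
		io_uring_submit(&ring);

		/* closing any fd for the ring file invokes ->flush(),
		 * which bumps files_seq */
		int dupfd = dup(ring.ring_fd);
		close(dupfd);

		/* if the openat grabbed ->files before the close and runs
		 * after it, it completes with -EBADF */
		io_uring_wait_cqe(&ring, &cqe);
		io_uring_cqe_seen(&ring, cqe);
		io_uring_queue_exit(&ring);
		return 0;
	}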

>> This should offer the same protection that we currently have, with the
>> added benefit of being able to update the ->files automatically.
> 
> Please clarify what "update the ->files" is about.

async commands that need to use current->files - that means SQPOLL, and
it means regular use cases that end up being punted to async execution.
Hope this helps?

>> Signed-off-by: Jens Axboe <[email protected]>
>> ---
>>  fs/io_uring.c | 137 ++++++++++++++++++++++++++++++--------------------
>>  1 file changed, 83 insertions(+), 54 deletions(-)
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index 4958a9dca51a..49be5e21f166 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -308,8 +308,11 @@ struct io_ring_ctx {
>>          */
>>         struct fixed_file_data  *file_data;
>>         unsigned                nr_user_files;
>> -       int                     ring_fd;
>> -       struct file             *ring_file;
>> +
>> +       /* incremented when ->flush() is called */
>> +       atomic_t                files_seq;
> 
> If this ends up landing, all of this should probably use 64-bit types
> (atomic64_t and s64). 32-bit counters in fast syscalls can typically
> be wrapped around in a reasonable amount of time. (For example, the
> VMA cache sequence number wraparound issue
> <https://googleprojectzero.blogspot.com/2018/09/a-cache-invalidation-bug-in-linux.html>
> could be triggered in about an hour according to my blogpost from back
> then. For this sequence number, it should be significantly faster, I
> think.)

Yeah, good point, we should use atomic64 and s64 for the other parts.
I'll make that change right now, so I don't forget...

> (I haven't properly looked at the rest of this patch so far - I stared
> at it for a bit, but wasn't able to immediately figure out what's
> actually going on. So I figured I'd ask the more fundamental questions
> first.)

Hope the above helps!

-- 
Jens Axboe


