public inbox for [email protected]
* [RFC] .flush and io_uring_cancel_files
@ 2020-05-26 18:09 Pavel Begunkov
  2020-05-26 21:12 ` Jens Axboe
  2020-05-26 22:04 ` Jann Horn
  0 siblings, 2 replies; 6+ messages in thread
From: Pavel Begunkov @ 2020-05-26 18:09 UTC (permalink / raw)
  To: io-uring, Jens Axboe

It looks like taking ->uring_lock should work as a kind of grace
period for struct files_struct and io_uring_flush(), and that would
solve the race with "fcheck(ctx->ring_fd) == ctx->ring_file".

Can you take a look? If you like it, I'll send a proper patch
and a bunch of cleanups on top.


diff --git a/fs/io_uring.c b/fs/io_uring.c
index a3dbd5f40391..012af200dc72 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5557,12 +5557,11 @@ static int io_grab_files(struct io_kiocb *req)
 	 * the fd has changed since we started down this path, and disallow
 	 * this operation if it has.
 	 */
-	if (fcheck(ctx->ring_fd) == ctx->ring_file) {
-		list_add(&req->inflight_entry, &ctx->inflight_list);
-		req->flags |= REQ_F_INFLIGHT;
-		req->work.files = current->files;
-		ret = 0;
-	}
+	list_add(&req->inflight_entry, &ctx->inflight_list);
+	req->flags |= REQ_F_INFLIGHT;
+	req->work.files = current->files;
+	ret = 0;
+
 	spin_unlock_irq(&ctx->inflight_lock);
 	rcu_read_unlock();

@@ -7479,6 +7478,10 @@ static int io_uring_release(struct inode *inode, struct file *file)
 static void io_uring_cancel_files(struct io_ring_ctx *ctx,
 				  struct files_struct *files)
 {
+	/* wait all submitters that can race for @files */
+	mutex_lock(&ctx->uring_lock);
+	mutex_unlock(&ctx->uring_lock);
+
 	while (!list_empty_careful(&ctx->inflight_list)) {
 		struct io_kiocb *cancel_req = NULL, *req;
 		DEFINE_WAIT(wait);

-- 
Pavel Begunkov

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [RFC] .flush and io_uring_cancel_files
  2020-05-26 18:09 [RFC] .flush and io_uring_cancel_files Pavel Begunkov
@ 2020-05-26 21:12 ` Jens Axboe
  2020-05-26 22:04 ` Jann Horn
  1 sibling, 0 replies; 6+ messages in thread
From: Jens Axboe @ 2020-05-26 21:12 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring, Jann Horn

On 5/26/20 12:09 PM, Pavel Begunkov wrote:
> It looks like taking ->uring_lock should work as a kind of grace
> period for struct files_struct and io_uring_flush(), and that would
> solve the race with "fcheck(ctx->ring_fd) == ctx->ring_file".
> 
> Can you take a look? If you like it, I'll send a proper patch
> and a bunch of cleanups on top.

Adding Jann.

> 
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index a3dbd5f40391..012af200dc72 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -5557,12 +5557,11 @@ static int io_grab_files(struct io_kiocb *req)
>  	 * the fd has changed since we started down this path, and disallow
>  	 * this operation if it has.
>  	 */
> -	if (fcheck(ctx->ring_fd) == ctx->ring_file) {
> -		list_add(&req->inflight_entry, &ctx->inflight_list);
> -		req->flags |= REQ_F_INFLIGHT;
> -		req->work.files = current->files;
> -		ret = 0;
> -	}
> +	list_add(&req->inflight_entry, &ctx->inflight_list);
> +	req->flags |= REQ_F_INFLIGHT;
> +	req->work.files = current->files;
> +	ret = 0;
> +
>  	spin_unlock_irq(&ctx->inflight_lock);
>  	rcu_read_unlock();
> 
> @@ -7479,6 +7478,10 @@ static int io_uring_release(struct inode *inode, struct file *file)
>  static void io_uring_cancel_files(struct io_ring_ctx *ctx,
>  				  struct files_struct *files)
>  {
> +	/* wait all submitters that can race for @files */
> +	mutex_lock(&ctx->uring_lock);
> +	mutex_unlock(&ctx->uring_lock);
> +
>  	while (!list_empty_careful(&ctx->inflight_list)) {
>  		struct io_kiocb *cancel_req = NULL, *req;
>  		DEFINE_WAIT(wait);
> 


-- 
Jens Axboe



* Re: [RFC] .flush and io_uring_cancel_files
  2020-05-26 18:09 [RFC] .flush and io_uring_cancel_files Pavel Begunkov
  2020-05-26 21:12 ` Jens Axboe
@ 2020-05-26 22:04 ` Jann Horn
  2020-05-27 10:13   ` Pavel Begunkov
  1 sibling, 1 reply; 6+ messages in thread
From: Jann Horn @ 2020-05-26 22:04 UTC (permalink / raw)
  To: Pavel Begunkov; +Cc: io-uring, Jens Axboe

On Tue, May 26, 2020 at 8:11 PM Pavel Begunkov <[email protected]> wrote:
> It looks like taking ->uring_lock should work as a kind of grace
> period for struct files_struct and io_uring_flush(), and that would
> solve the race with "fcheck(ctx->ring_fd) == ctx->ring_file".
>
> Can you take a look? If you like it, I'll send a proper patch
> and a bunch of cleanups on top.
>
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index a3dbd5f40391..012af200dc72 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -5557,12 +5557,11 @@ static int io_grab_files(struct io_kiocb *req)
>          * the fd has changed since we started down this path, and disallow
>          * this operation if it has.
>          */
> -       if (fcheck(ctx->ring_fd) == ctx->ring_file) {
> -               list_add(&req->inflight_entry, &ctx->inflight_list);
> -               req->flags |= REQ_F_INFLIGHT;
> -               req->work.files = current->files;
> -               ret = 0;
> -       }
> +       list_add(&req->inflight_entry, &ctx->inflight_list);
> +       req->flags |= REQ_F_INFLIGHT;
> +       req->work.files = current->files;
> +       ret = 0;
> +
>         spin_unlock_irq(&ctx->inflight_lock);
>         rcu_read_unlock();
>
> @@ -7479,6 +7478,10 @@ static int io_uring_release(struct inode *inode, struct file *file)
>  static void io_uring_cancel_files(struct io_ring_ctx *ctx,
>                                   struct files_struct *files)
>  {
> +       /* wait all submitters that can race for @files */
> +       mutex_lock(&ctx->uring_lock);
> +       mutex_unlock(&ctx->uring_lock);
> +
>         while (!list_empty_careful(&ctx->inflight_list)) {
>                 struct io_kiocb *cancel_req = NULL, *req;
>                 DEFINE_WAIT(wait);

First off: You're removing a check in io_grab_files() without changing
the comment that describes the check; and the new comment you're
adding in io_uring_cancel_files() is IMO too short to be useful.

I'm trying to figure out how your change is supposed to work, and I
don't get it. If a submitter is just past fdget() (at which point no
locks are held), the ->flush() caller can instantly take and drop the
->uring_lock, and then later the rest of the submission path will grab
an unprotected pointer to the files_struct. Am I missing something?


* Re: [RFC] .flush and io_uring_cancel_files
  2020-05-26 22:04 ` Jann Horn
@ 2020-05-27 10:13   ` Pavel Begunkov
  2020-05-27 18:04     ` Jann Horn
  0 siblings, 1 reply; 6+ messages in thread
From: Pavel Begunkov @ 2020-05-27 10:13 UTC (permalink / raw)
  To: Jann Horn; +Cc: io-uring, Jens Axboe

On 27/05/2020 01:04, Jann Horn wrote:
> On Tue, May 26, 2020 at 8:11 PM Pavel Begunkov <[email protected]> wrote:
>> It looks like taking ->uring_lock should work as a kind of grace
>> period for struct files_struct and io_uring_flush(), and that would
>> solve the race with "fcheck(ctx->ring_fd) == ctx->ring_file".
>>
>> Can you take a look? If you like it, I'll send a proper patch
>> and a bunch of cleanups on top.
>>
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index a3dbd5f40391..012af200dc72 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -5557,12 +5557,11 @@ static int io_grab_files(struct io_kiocb *req)
>>          * the fd has changed since we started down this path, and disallow
>>          * this operation if it has.
>>          */
>> -       if (fcheck(ctx->ring_fd) == ctx->ring_file) {
>> -               list_add(&req->inflight_entry, &ctx->inflight_list);
>> -               req->flags |= REQ_F_INFLIGHT;
>> -               req->work.files = current->files;
>> -               ret = 0;
>> -       }
>> +       list_add(&req->inflight_entry, &ctx->inflight_list);
>> +       req->flags |= REQ_F_INFLIGHT;
>> +       req->work.files = current->files;
>> +       ret = 0;
>> +
>>         spin_unlock_irq(&ctx->inflight_lock);
>>         rcu_read_unlock();
>>
>> @@ -7479,6 +7478,10 @@ static int io_uring_release(struct inode *inode, struct file *file)
>>  static void io_uring_cancel_files(struct io_ring_ctx *ctx,
>>                                   struct files_struct *files)
>>  {
>> +       /* wait all submitters that can race for @files */
>> +       mutex_lock(&ctx->uring_lock);
>> +       mutex_unlock(&ctx->uring_lock);
>> +
>>         while (!list_empty_careful(&ctx->inflight_list)) {
>>                 struct io_kiocb *cancel_req = NULL, *req;
>>                 DEFINE_WAIT(wait);
> 
> First off: You're removing a check in io_grab_files() without changing
> the comment that describes the check; and the new comment you're
> adding in io_uring_cancel_files() is IMO too short to be useful.

Obviously, it was stripped down to show the idea; nobody is talking about
committing it as is. I hoped Jens would remember it well enough to understand.
Let me describe it in more detail then:

> 
> I'm trying to figure out how your change is supposed to work, and I
> don't get it. If a submitter is just past fdget() (at which point no
> locks are held), the ->flush() caller can instantly take and drop the
> ->uring_lock, and then later the rest of the submission path will grab
> an unprotected pointer to the files_struct. Am I missing something?

old = tsk->files;
task_lock(tsk);
tsk->files = files;
task_unlock(tsk);
put_files_struct(old); (i.e. ->flush(old))

It's from reset_files_struct(), and I presume the whole idea of
io_uring's ->flush() is to protect against racing with @old files similarly
going away, i.e. ensuring that no io_uring requests are left holding @old files.

The only place where current->files is accessed and copied by io_uring is
io_grab_files(), which is called in the submission path, and the whole
submission path runs under @uring_mtx.
In your case, the submitter will take @uring_mtx only after this lock/unlock
has happened, so it won't see the old files (happens-before established by
locking the mutex).

The idea behind the lock/unlock is that:
- submitters that already locked @uring_mtx (i.e. started submission) before
the lock/unlock are waited for in the flush. These can potentially access @old.

- submitters that came after the lock/unlock won't see @old files.

So, no new request associated with @old can appear after that. All that's left
is to deal with already submitted requests, and that's done by the rest of
io_uring_cancel_files().


The thing I don't know is why current->files is originally accessed without
protection in io_grab_files(), but presumably rcu_read_lock() there is for that
reason.

Do you see anything suspicious from the description?

-- 
Pavel Begunkov




* Re: [RFC] .flush and io_uring_cancel_files
  2020-05-27 10:13   ` Pavel Begunkov
@ 2020-05-27 18:04     ` Jann Horn
  2020-05-29  8:09       ` Pavel Begunkov
  0 siblings, 1 reply; 6+ messages in thread
From: Jann Horn @ 2020-05-27 18:04 UTC (permalink / raw)
  To: Pavel Begunkov; +Cc: io-uring, Jens Axboe

On Wed, May 27, 2020 at 12:14 PM Pavel Begunkov <[email protected]> wrote:
> On 27/05/2020 01:04, Jann Horn wrote:
> > On Tue, May 26, 2020 at 8:11 PM Pavel Begunkov <[email protected]> wrote:
> >> It looks like taking ->uring_lock should work as a kind of grace
> >> period for struct files_struct and io_uring_flush(), and that would
> >> solve the race with "fcheck(ctx->ring_fd) == ctx->ring_file".
> >>
> >> Can you take a look? If you like it, I'll send a proper patch
> >> and a bunch of cleanups on top.
> >>
> >>
> >> diff --git a/fs/io_uring.c b/fs/io_uring.c
> >> index a3dbd5f40391..012af200dc72 100644
> >> --- a/fs/io_uring.c
> >> +++ b/fs/io_uring.c
> >> @@ -5557,12 +5557,11 @@ static int io_grab_files(struct io_kiocb *req)
> >>          * the fd has changed since we started down this path, and disallow
> >>          * this operation if it has.
> >>          */
> >> -       if (fcheck(ctx->ring_fd) == ctx->ring_file) {
> >> -               list_add(&req->inflight_entry, &ctx->inflight_list);
> >> -               req->flags |= REQ_F_INFLIGHT;
> >> -               req->work.files = current->files;
> >> -               ret = 0;
> >> -       }
> >> +       list_add(&req->inflight_entry, &ctx->inflight_list);
> >> +       req->flags |= REQ_F_INFLIGHT;
> >> +       req->work.files = current->files;
> >> +       ret = 0;
> >> +
> >>         spin_unlock_irq(&ctx->inflight_lock);
> >>         rcu_read_unlock();
> >>
> >> @@ -7479,6 +7478,10 @@ static int io_uring_release(struct inode *inode, struct file *file)
> >>  static void io_uring_cancel_files(struct io_ring_ctx *ctx,
> >>                                   struct files_struct *files)
> >>  {
> >> +       /* wait all submitters that can race for @files */
> >> +       mutex_lock(&ctx->uring_lock);
> >> +       mutex_unlock(&ctx->uring_lock);
> >> +
> >>         while (!list_empty_careful(&ctx->inflight_list)) {
> >>                 struct io_kiocb *cancel_req = NULL, *req;
> >>                 DEFINE_WAIT(wait);
> >
> > First off: You're removing a check in io_grab_files() without changing
> > the comment that describes the check; and the new comment you're
> > adding in io_uring_cancel_files() is IMO too short to be useful.
>
> Obviously, it was stripped down to show the idea; nobody is talking about
> committing it as is. I hoped Jens would remember it well enough to understand.
> Let me describe it in more detail then:
>
> >
> > I'm trying to figure out how your change is supposed to work, and I
> > don't get it. If a submitter is just past fdget() (at which point no
> > locks are held), the ->flush() caller can instantly take and drop the
> > ->uring_lock, and then later the rest of the submission path will grab
> > an unprotected pointer to the files_struct. Am I missing something?
>
> old = tsk->files;
> task_lock(tsk);
> tsk->files = files;
> task_unlock(tsk);
> put_files_struct(old); (i.e. ->flush(old))
>
> It's from reset_files_struct(), and I presume the whole idea of
> io_uring's ->flush() is to protect against racing with @old files similarly
> going away, i.e. ensuring that no io_uring requests are left holding @old files.

Kind of. We use the ->flush() handler to be notified when the
files_struct goes away, so that instead of holding a reference to the
files_struct (which would cause a reference loop), we can clean up our
references when it goes away.

> The only place where current->files is accessed and copied by io_uring is
> io_grab_files(), which is called in the submission path, and the whole
> submission path runs under @uring_mtx.

No it isn't. We do fdget(fd) at the start of the io_uring_enter
syscall, and at that point we obviously can't hold the uring_mtx yet.

> In your case, the submitter will take @uring_mtx only after this lock/unlock
> has happened, so it won't see the old files (happens-before established by
> locking the mutex).

No, it will see the old files. The concurrent operation we're worried
about is not that the files_struct goes away somehow (that can't
happen); what we want to guard against is a concurrent close() or
dup2() or so removing the uring fd from the files_struct, because if
someone calls close() before we stash a pointer to current->files,
that pointer isn't protected anymore.


> The thing I don't know is why current->files is originally accessed without
> protection in io_grab_files(), but presumably rcu_read_lock() there is for that
> reason.

No, it's because current->files can never change under you; pretty
much the only places where current->files can change are unshare() and
execve().


* Re: [RFC] .flush and io_uring_cancel_files
  2020-05-27 18:04     ` Jann Horn
@ 2020-05-29  8:09       ` Pavel Begunkov
  0 siblings, 0 replies; 6+ messages in thread
From: Pavel Begunkov @ 2020-05-29  8:09 UTC (permalink / raw)
  To: Jann Horn; +Cc: io-uring, Jens Axboe

On 27/05/2020 21:04, Jann Horn wrote:
> On Wed, May 27, 2020 at 12:14 PM Pavel Begunkov <[email protected]> wrote:
>> On 27/05/2020 01:04, Jann Horn wrote:
>>> On Tue, May 26, 2020 at 8:11 PM Pavel Begunkov <[email protected]> wrote:
>>>> It looks like taking ->uring_lock should work as a kind of grace
>>>> period for struct files_struct and io_uring_flush(), and that would
>>>> solve the race with "fcheck(ctx->ring_fd) == ctx->ring_file".
>>>>
>>>> Can you take a look? If you like it, I'll send a proper patch
>>>> and a bunch of cleanups on top.
>>>>
>>>>
>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>> index a3dbd5f40391..012af200dc72 100644
>>>> --- a/fs/io_uring.c
>>>> +++ b/fs/io_uring.c
>>>> @@ -5557,12 +5557,11 @@ static int io_grab_files(struct io_kiocb *req)
>>>>          * the fd has changed since we started down this path, and disallow
>>>>          * this operation if it has.
>>>>          */
>>>> -       if (fcheck(ctx->ring_fd) == ctx->ring_file) {
>>>> -               list_add(&req->inflight_entry, &ctx->inflight_list);
>>>> -               req->flags |= REQ_F_INFLIGHT;
>>>> -               req->work.files = current->files;
>>>> -               ret = 0;
>>>> -       }
>>>> +       list_add(&req->inflight_entry, &ctx->inflight_list);
>>>> +       req->flags |= REQ_F_INFLIGHT;
>>>> +       req->work.files = current->files;
>>>> +       ret = 0;
>>>> +
>>>>         spin_unlock_irq(&ctx->inflight_lock);
>>>>         rcu_read_unlock();
>>>>
>>>> @@ -7479,6 +7478,10 @@ static int io_uring_release(struct inode *inode, struct file *file)
>>>>  static void io_uring_cancel_files(struct io_ring_ctx *ctx,
>>>>                                   struct files_struct *files)
>>>>  {
>>>> +       /* wait all submitters that can race for @files */
>>>> +       mutex_lock(&ctx->uring_lock);
>>>> +       mutex_unlock(&ctx->uring_lock);
>>>> +
>>>>         while (!list_empty_careful(&ctx->inflight_list)) {
>>>>                 struct io_kiocb *cancel_req = NULL, *req;
>>>>                 DEFINE_WAIT(wait);
>>>
>>> First off: You're removing a check in io_grab_files() without changing
>>> the comment that describes the check; and the new comment you're
>>> adding in io_uring_cancel_files() is IMO too short to be useful.
>>
>> Obviously, it was stripped down to show the idea; nobody is talking about
>> committing it as is. I hoped Jens would remember it well enough to understand.
>> Let me describe it in more detail then:
>>
>>>
>>> I'm trying to figure out how your change is supposed to work, and I
>>> don't get it. If a submitter is just past fdget() (at which point no
>>> locks are held), the ->flush() caller can instantly take and drop the
>>> ->uring_lock, and then later the rest of the submission path will grab
>>> an unprotected pointer to the files_struct. Am I missing something?
>>
>> old = tsk->files;
>> task_lock(tsk);
>> tsk->files = files;
>> task_unlock(tsk);
>> put_files_struct(old); (i.e. ->flush(old))
>>
>> It's from reset_files_struct(), and I presume the whole idea of
>> io_uring's ->flush() is to protect against racing with @old files similarly
>> going away, i.e. ensuring that no io_uring requests are left holding @old files.
> 
> Kind of. We use the ->flush() handler to be notified when the
> files_struct goes away, so that instead of holding a reference to the
> files_struct (which would cause a reference loop), we can clean up our
> references when it goes away.
> 
>> The only place where current->files is accessed and copied by io_uring is
>> io_grab_files(), which is called in the submission path, and the whole
>> submission path runs under @uring_mtx.
> 
> No it isn't. We do fdget(fd) at the start of the io_uring_enter
> syscall, and at that point we obviously can't hold the uring_mtx yet.

I meant __directly__ accessing ->files, or a hand-coded equivalent. fdget() by
itself shouldn't be a problem.

> 
>> In your case, the submitter will take @uring_mtx only after this lock/unlock
>> has happened, so it won't see the old files (happens-before established by
>> locking the mutex).
> 
> No, it will see the old files. The concurrent operation we're worried

From what you wrote below, it will see the old files just because nobody would
try to replace them. Is that it?

> about is not that the files_struct goes away somehow (that can't
> happen); what we want to guard against is a concurrent close() or
> dup2() or so removing the uring fd from the files_struct, because if
> someone calls close() before we stash a pointer to current->files,
> that pointer isn't protected anymore.

Let me guess, you mean maintenance of ->files itself, such as expand_files()?

> 
> 
>> The thing I don't know is why current->files is originally accessed without
>> protection in io_grab_files(), but presumably rcu_read_lock() there is for that
>> reason.
> 
> No, it's because current->files can never change under you; pretty
> much the only places where current->files can change are unshare() and
> execve().

I see, good to know.


-- 
Pavel Begunkov

