* [RFC] .flush and io_uring_cancel_files
  From: Pavel Begunkov @ 2020-05-26 18:09 UTC
  To: io-uring, Jens Axboe

It looks like taking ->uring_lock should work as a kind of grace
period for struct files_struct and io_uring_flush(), and that would
solve the race with "fcheck(ctx->ring_fd) == ctx->ring_file".

Can you take a look? If you like it, I'll send a proper patch
and a bunch of cleanups on top.


diff --git a/fs/io_uring.c b/fs/io_uring.c
index a3dbd5f40391..012af200dc72 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5557,12 +5557,11 @@ static int io_grab_files(struct io_kiocb *req)
 	 * the fd has changed since we started down this path, and disallow
 	 * this operation if it has.
 	 */
-	if (fcheck(ctx->ring_fd) == ctx->ring_file) {
-		list_add(&req->inflight_entry, &ctx->inflight_list);
-		req->flags |= REQ_F_INFLIGHT;
-		req->work.files = current->files;
-		ret = 0;
-	}
+	list_add(&req->inflight_entry, &ctx->inflight_list);
+	req->flags |= REQ_F_INFLIGHT;
+	req->work.files = current->files;
+	ret = 0;
+
 	spin_unlock_irq(&ctx->inflight_lock);
 	rcu_read_unlock();

@@ -7479,6 +7478,10 @@ static int io_uring_release(struct inode *inode, struct file *file)
 static void io_uring_cancel_files(struct io_ring_ctx *ctx,
 				  struct files_struct *files)
 {
+	/* wait for all submitters that can race for @files */
+	mutex_lock(&ctx->uring_lock);
+	mutex_unlock(&ctx->uring_lock);
+
 	while (!list_empty_careful(&ctx->inflight_list)) {
 		struct io_kiocb *cancel_req = NULL, *req;
 		DEFINE_WAIT(wait);

-- 
Pavel Begunkov
* Re: [RFC] .flush and io_uring_cancel_files
  From: Jens Axboe @ 2020-05-26 21:12 UTC
  To: Pavel Begunkov, io-uring, Jann Horn

On 5/26/20 12:09 PM, Pavel Begunkov wrote:
> It looks like taking ->uring_lock should work as a kind of grace
> period for struct files_struct and io_uring_flush(), and that would
> solve the race with "fcheck(ctx->ring_fd) == ctx->ring_file".
>
> Can you take a look? If you like it, I'll send a proper patch
> and a bunch of cleanups on top.

Adding Jann.

> [diff snipped; quoted in full in the message above]

-- 
Jens Axboe
* Re: [RFC] .flush and io_uring_cancel_files
  From: Jann Horn @ 2020-05-26 22:04 UTC
  To: Pavel Begunkov
  Cc: io-uring, Jens Axboe

On Tue, May 26, 2020 at 8:11 PM Pavel Begunkov <[email protected]> wrote:
> It looks like taking ->uring_lock should work as a kind of grace
> period for struct files_struct and io_uring_flush(), and that would
> solve the race with "fcheck(ctx->ring_fd) == ctx->ring_file".
>
> Can you take a look? If you like it, I'll send a proper patch
> and a bunch of cleanups on top.
>
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index a3dbd5f40391..012af200dc72 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -5557,12 +5557,11 @@ static int io_grab_files(struct io_kiocb *req)
>  	 * the fd has changed since we started down this path, and disallow
>  	 * this operation if it has.
>  	 */
> -	if (fcheck(ctx->ring_fd) == ctx->ring_file) {
> -		list_add(&req->inflight_entry, &ctx->inflight_list);
> -		req->flags |= REQ_F_INFLIGHT;
> -		req->work.files = current->files;
> -		ret = 0;
> -	}
> +	list_add(&req->inflight_entry, &ctx->inflight_list);
> +	req->flags |= REQ_F_INFLIGHT;
> +	req->work.files = current->files;
> +	ret = 0;
> +
>  	spin_unlock_irq(&ctx->inflight_lock);
>  	rcu_read_unlock();
>
> @@ -7479,6 +7478,10 @@ static int io_uring_release(struct inode *inode, struct file *file)
>  static void io_uring_cancel_files(struct io_ring_ctx *ctx,
>  				  struct files_struct *files)
>  {
> +	/* wait for all submitters that can race for @files */
> +	mutex_lock(&ctx->uring_lock);
> +	mutex_unlock(&ctx->uring_lock);
> +
>  	while (!list_empty_careful(&ctx->inflight_list)) {
>  		struct io_kiocb *cancel_req = NULL, *req;
>  		DEFINE_WAIT(wait);

First off: You're removing a check in io_grab_files() without changing
the comment that describes the check; and the new comment you're
adding in io_uring_cancel_files() is IMO too short to be useful.

I'm trying to figure out how your change is supposed to work, and I
don't get it. If a submitter is just past fdget() (at which point no
locks are held), the ->flush() caller can instantly take and drop the
->uring_lock, and then later the rest of the submission path will grab
an unprotected pointer to the files_struct. Am I missing something?
* Re: [RFC] .flush and io_uring_cancel_files
  From: Pavel Begunkov @ 2020-05-27 10:13 UTC
  To: Jann Horn
  Cc: io-uring, Jens Axboe

On 27/05/2020 01:04, Jann Horn wrote:
> On Tue, May 26, 2020 at 8:11 PM Pavel Begunkov <[email protected]> wrote:
>> It looks like taking ->uring_lock should work as a kind of grace
>> period for struct files_struct and io_uring_flush(), and that would
>> solve the race with "fcheck(ctx->ring_fd) == ctx->ring_file".
>>
>> [diff snipped]
>
> First off: You're removing a check in io_grab_files() without changing
> the comment that describes the check; and the new comment you're
> adding in io_uring_cancel_files() is IMO too short to be useful.

Obviously, it was stripped down to show the idea; nobody is talking
about committing it as is. I hoped Jens remembered it well enough to
understand. Let me describe it in more detail then:

> I'm trying to figure out how your change is supposed to work, and I
> don't get it. If a submitter is just past fdget() (at which point no
> locks are held), the ->flush() caller can instantly take and drop the
> ->uring_lock, and then later the rest of the submission path will grab
> an unprotected pointer to the files_struct. Am I missing something?

	old = tsk->files;
	task_lock(tsk);
	tsk->files = files;
	task_unlock(tsk);
	put_files_struct(old);	/* i.e. ->flush(old) */

It's from reset_files_struct(), and I presume the whole idea of
io_uring->flush() is to protect against @old files going away in a
similar manner, i.e. to ensure there are no io_uring requests still
holding @old files.

The only place where current->files is accessed and copied by io_uring
is io_grab_files(), which is called in the submission path. And the
whole submission path is done under @uring_mtx.

In your case, the submitter will take @uring_mtx only after this
lock/unlock has happened, so it won't see the old files
(happens-before by locking the mutex).

The idea behind the lock/unlock is that:

- submitters that already locked @uring_mtx (i.e. started submission)
  before the lock/unlock are waited for in the flush. These can
  potentially access @old.
- submitters that came after the lock/unlock won't see the @old files.

So no new request associated with @old can appear after that. All
that's left is to deal with already-submitted requests, and that's
done by the rest of io_uring_cancel_files().

The thing I don't know is why current->files is originally accessed
without protection in io_grab_files(), but presumably the
rcu_read_lock() there is for that reason.

Do you see anything suspicious in this description?

-- 
Pavel Begunkov
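A minimal userspace sketch of the grace-period idiom being described,
assuming simplified names rather than io_uring's actual code (Jann's
reply below explains why this reasoning doesn't hold for the real
submission path):

	#include <pthread.h>

	static pthread_mutex_t uring_lock = PTHREAD_MUTEX_INITIALIZER;

	/* submission path -- everything that copies the files pointer
	 * is assumed to run inside the critical section */
	static void submit(void)
	{
		pthread_mutex_lock(&uring_lock);
		/* ... io_grab_files(): stash current->files ... */
		pthread_mutex_unlock(&uring_lock);
	}

	/* flush path -- an empty lock/unlock pair waits out every
	 * submitter that entered the critical section earlier, while
	 * anyone locking afterwards observes the updated files pointer
	 * (happens-before via the mutex) */
	static void cancel_files(void)
	{
		pthread_mutex_lock(&uring_lock);
		pthread_mutex_unlock(&uring_lock);
		/* ... cancel requests still on ->inflight_list ... */
	}

	int main(void)
	{
		submit();
		cancel_files();
		return 0;
	}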
* Re: [RFC] .flush and io_uring_cancel_files
  From: Jann Horn @ 2020-05-27 18:04 UTC
  To: Pavel Begunkov
  Cc: io-uring, Jens Axboe

On Wed, May 27, 2020 at 12:14 PM Pavel Begunkov <[email protected]> wrote:
> On 27/05/2020 01:04, Jann Horn wrote:
>> I'm trying to figure out how your change is supposed to work, and I
>> don't get it. If a submitter is just past fdget() (at which point no
>> locks are held), the ->flush() caller can instantly take and drop the
>> ->uring_lock, and then later the rest of the submission path will grab
>> an unprotected pointer to the files_struct. Am I missing something?
>
> 	old = tsk->files;
> 	task_lock(tsk);
> 	tsk->files = files;
> 	task_unlock(tsk);
> 	put_files_struct(old);	/* i.e. ->flush(old) */
>
> It's from reset_files_struct(), and I presume the whole idea of
> io_uring->flush() is to protect against @old files going away in a
> similar manner, i.e. to ensure there are no io_uring requests still
> holding @old files.

Kind of. We use the ->flush() handler to be notified when the
files_struct goes away, so that instead of holding a reference to the
files_struct (which would cause a reference loop), we can clean up our
references when it goes away.

> The only place where current->files is accessed and copied by io_uring
> is io_grab_files(), which is called in the submission path. And the
> whole submission path is done under @uring_mtx.

No, it isn't. We do fdget(fd) at the start of the io_uring_enter
syscall, and at that point we obviously can't hold the uring_mtx yet.

> In your case, the submitter will take @uring_mtx only after this
> lock/unlock has happened, so it won't see the old files
> (happens-before by locking the mutex).

No, it will see the old files. The concurrent operation we're worried
about is not that the files_struct goes away somehow (that can't
happen); what we want to guard against is a concurrent close() or
dup2() or so removing the uring fd from the files_struct, because if
someone calls close() before we stash a pointer to current->files,
that pointer isn't protected anymore.

> The thing I don't know is why current->files is originally accessed
> without protection in io_grab_files(), but presumably the
> rcu_read_lock() there is for that reason.

No, it's because current->files can never change under you; pretty
much the only places where current->files can change are unshare()
and execve().
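The window Jann describes, written out as an interleaving (the call
sites are simplified, but the function names follow fs/io_uring.c as
of this thread):

	submitter (io_uring_enter)        close()/dup2() + ->flush()
	--------------------------        --------------------------
	f = fdget(fd);   /* no lock */
	                                  /* ring fd dropped from the
	                                     files table */
	                                  mutex_lock(&ctx->uring_lock);
	                                  mutex_unlock(&ctx->uring_lock);
	                                  /* "grace period" has passed */
	mutex_lock(&ctx->uring_lock);
	io_grab_files(); /* stashes current->files even though the
	                    ring fd is already gone from it */
	mutex_unlock(&ctx->uring_lock);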
* Re: [RFC] .flush and io_uring_cancel_files
  From: Pavel Begunkov @ 2020-05-29 8:09 UTC
  To: Jann Horn
  Cc: io-uring, Jens Axboe

On 27/05/2020 21:04, Jann Horn wrote:
> On Wed, May 27, 2020 at 12:14 PM Pavel Begunkov <[email protected]> wrote:
>> The only place where current->files is accessed and copied by io_uring
>> is io_grab_files(), which is called in the submission path. And the
>> whole submission path is done under @uring_mtx.
>
> No, it isn't. We do fdget(fd) at the start of the io_uring_enter
> syscall, and at that point we obviously can't hold the uring_mtx yet.

I meant __directly__ accessing ->files, or hand-coded equivalents.
fdget() by itself shouldn't be a problem.

>> In your case, the submitter will take @uring_mtx only after this
>> lock/unlock has happened, so it won't see the old files
>> (happens-before by locking the mutex).
>
> No, it will see the old files. The concurrent operation we're worried

From what you wrote below, it will see the old files just because
nobody would try to replace them. Is that it?

> about is not that the files_struct goes away somehow (that can't
> happen); what we want to guard against is a concurrent close() or
> dup2() or so removing the uring fd from the files_struct, because if
> someone calls close() before we stash a pointer to current->files,
> that pointer isn't protected anymore.

Let me guess: you mean maintenance of ->files itself, such as
expand_files()?

>> The thing I don't know is why current->files is originally accessed
>> without protection in io_grab_files(), but presumably the
>> rcu_read_lock() there is for that reason.
>
> No, it's because current->files can never change under you; pretty
> much the only places where current->files can change are unshare()
> and execve().

I see, good to know.

-- 
Pavel Begunkov