* [GIT PULL] io_uring fixes for 5.12-rc2
@ 2021-03-05 18:09 Jens Axboe
2021-03-05 20:54 ` Linus Torvalds
2021-03-05 21:58 ` pr-tracker-bot
0 siblings, 2 replies; 8+ messages in thread
From: Jens Axboe @ 2021-03-05 18:09 UTC (permalink / raw)
To: Linus Torvalds; +Cc: io-uring
Hi Linus,
A bit of a mix between fallout from the worker change,
cleanups/reductions now possible from that change, and fixes in general.
In detail:
- Fully serialize manager and worker creation, fixing races due to that.
- Clean up some naming that had gone stale.
- SQPOLL fixes.
- Fix race condition around task_work rework that went into this merge
window.
- Implement unshare. Used for when the original task does unshare(2) or
setuid/seteuid and friends, drops the original workers and forks new
ones.
- Drop the only remaining piece of state shuffling we had left, which
was cred. Move it into issue instead, and we can drop all of that code
too.
- Kill f_op->flush() usage. That was such a nasty hack that we had out
of necessity, we no longer need it.
- Following from ->flush() removal, we can also drop various bits of ctx
state related to SQPOLL and cancelations.
- Fix an issue with IOPOLL retry, which originally was fallout from a
filemap change (removing iov_iter_revert()), but uncovered an issue
with iovec re-import too late.
- Fix an issue with system suspend.
- Use xchg() for fallback work, instead of cmpxchg().
- Properly destroy io-wq on exec.
- Add create_io_thread() core helper, and use that in io-wq and
io_uring. This allows us to remove various silly completion events
related to thread setup.
- A few error handling fixes.
This should be the grunt of fixes necessary for the new workers, next
week should be quieter. We've got a pending series from Pavel on
cancelations, and how tasks and rings are indexed. Outside of that,
should just be minor fixes. Even with these fixes, we're still killing a
net ~80 lines.
Please pull!
The following changes since commit fe07bfda2fb9cdef8a4d4008a409bb02f35f1bd8:
Linux 5.12-rc1 (2021-02-28 16:05:19 -0800)
are available in the Git repository at:
git://git.kernel.dk/linux-block.git tags/io_uring-5.12-2021-03-05
for you to fetch changes up to e45cff58858883290c98f65d409839a7295c95f3:
io_uring: don't restrict issue_flags for io_openat (2021-03-05 09:52:29 -0700)
----------------------------------------------------------------
io_uring-5.12-2021-03-05
----------------------------------------------------------------
Jens Axboe (27):
io-wq: wait for worker startup when forking a new one
io-wq: have manager wait for all workers to exit
io-wq: don't ask for a new worker if we're exiting
io-wq: rename wq->done completion to wq->started
io-wq: wait for manager exit on wq destroy
io-wq: fix double put of 'wq' in error path
io_uring: SQPOLL stop error handling fixes
io_uring: don't use complete_all() on SQPOLL thread exit
io-wq: provide an io_wq_put_and_exit() helper
io_uring: fix race condition in task_work add and clear
io_uring: remove unused argument 'tsk' from io_req_caches_free()
io_uring: kill unnecessary REQ_F_WORK_INITIALIZED checks
io_uring: move cred assignment into io_issue_sqe()
io_uring: kill unnecessary io_run_ctx_fallback() in io_ring_exit_work()
io_uring: kill io_uring_flush()
io_uring: ensure that SQPOLL thread is started for exit
io_uring: ignore double poll add on the same waitqueue head
io-wq: fix error path leak of buffered write hash map
io_uring: fix -EAGAIN retry with IOPOLL
io_uring: ensure that threads freeze on suspend
io-wq: ensure all pending work is canceled on exit
kernel: provide create_io_thread() helper
io_uring: move to using create_io_thread()
io_uring: don't keep looping for more events if we can't flush overflow
io_uring: clear IOCB_WAITQ for non -EIOCBQUEUED return
io-wq: kill hashed waitqueue before manager exits
io_uring: make SQPOLL thread parking saner
Pavel Begunkov (14):
io_uring: run fallback on cancellation
io_uring: warn on not destroyed io-wq
io_uring: destroy io-wq on exec
io_uring: fix __tctx_task_work() ctx race
io_uring: replace cmpxchg in fallback with xchg
io_uring: kill sqo_dead and sqo submission halting
io_uring: remove sqo_task
io_uring: choose right tctx->io_wq for try cancel
io_uring: inline io_req_clean_work()
io_uring: inline __io_queue_async_work()
io_uring: remove extra in_idle wake up
io_uring: cancel-match based on flags
io_uring: reliably cancel linked timeouts
io_uring: don't restrict issue_flags for io_openat
fs/io-wq.c | 261 +++++++++++------------
fs/io-wq.h | 5 +-
fs/io_uring.c | 500 +++++++++++++++++++--------------------------
include/linux/io_uring.h | 2 +-
include/linux/sched/task.h | 2 +
kernel/fork.c | 30 +++
6 files changed, 361 insertions(+), 439 deletions(-)
--
Jens Axboe
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [GIT PULL] io_uring fixes for 5.12-rc2
2021-03-05 18:09 [GIT PULL] io_uring fixes for 5.12-rc2 Jens Axboe
@ 2021-03-05 20:54 ` Linus Torvalds
2021-03-05 21:58 ` Jens Axboe
2021-03-05 21:58 ` pr-tracker-bot
1 sibling, 1 reply; 8+ messages in thread
From: Linus Torvalds @ 2021-03-05 20:54 UTC (permalink / raw)
To: Jens Axboe; +Cc: io-uring
On Fri, Mar 5, 2021 at 10:09 AM Jens Axboe <[email protected]> wrote:
>
> - Implement unshare. Used for when the original task does unshare(2) or
> setuid/seteuid and friends, drops the original workers and forks new
> ones.
This explanation makes no sense to me, and I don't see any commit that
would remotely do what I parse that as doing.
I left it in the merge message, but would like you to explain what you
actually meant...
Linus
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [GIT PULL] io_uring fixes for 5.12-rc2
2021-03-05 20:54 ` Linus Torvalds
@ 2021-03-05 21:58 ` Jens Axboe
2021-03-05 23:03 ` Linus Torvalds
0 siblings, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2021-03-05 21:58 UTC (permalink / raw)
To: Linus Torvalds; +Cc: io-uring
On 3/5/21 1:54 PM, Linus Torvalds wrote:
> On Fri, Mar 5, 2021 at 10:09 AM Jens Axboe <[email protected]> wrote:
>>
>> - Implement unshare. Used for when the original task does unshare(2) or
>> setuid/seteuid and friends, drops the original workers and forks new
>> ones.
>
> This explanation makes no sense to me, and I don't see any commit that
> would remotely do what I parse that as doing.
Hah, good catch... I actually wrote this up as I sent out the series for
reviews the other day, and I dropped the unshare bit as it was
contentious and I think there are better ways to do it. But I obviously
forgot to drop it from the commit message.
But since I have you here, would love to hear your thoughts. For the
setuid/seteuid, io_uring actually has a mechanism for this already, so I
don't believe we _need_ to do anything. You can register creds and use a
specific one for requests, this is what samba does for example.
But it pertains to the problem in general, which is how do we handle
when the original task sets up a ring and then goes and does
unshare(FILES) or whatever. It then submits a new request that just
happens to go async, which is oblivious to this fact. Same thing that
would happen in userspace if you created a thread pool and then did
unshare, and then had your existing threads handle work for you. Except
here it just kind of happens implicitly, and I can see how that would be
confusing or even problematic.
The patch mentioned above basically just did a cancel across that, like
we do for exec. If you go async after that, you get a new manager then
new threads, which takes care of it. But it's also very brutal and not
necessarily all that useful for the application.
--
Jens Axboe
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [GIT PULL] io_uring fixes for 5.12-rc2
2021-03-05 18:09 [GIT PULL] io_uring fixes for 5.12-rc2 Jens Axboe
2021-03-05 20:54 ` Linus Torvalds
@ 2021-03-05 21:58 ` pr-tracker-bot
1 sibling, 0 replies; 8+ messages in thread
From: pr-tracker-bot @ 2021-03-05 21:58 UTC (permalink / raw)
To: Jens Axboe; +Cc: Linus Torvalds, io-uring
The pull request you sent on Fri, 5 Mar 2021 11:09:09 -0700:
> git://git.kernel.dk/linux-block.git tags/io_uring-5.12-2021-03-05
has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/f292e8730a349577aaf13635399b39a50b8f5910
Thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [GIT PULL] io_uring fixes for 5.12-rc2
2021-03-05 21:58 ` Jens Axboe
@ 2021-03-05 23:03 ` Linus Torvalds
2021-03-06 16:25 ` Jens Axboe
0 siblings, 1 reply; 8+ messages in thread
From: Linus Torvalds @ 2021-03-05 23:03 UTC (permalink / raw)
To: Jens Axboe; +Cc: io-uring
On Fri, Mar 5, 2021 at 1:58 PM Jens Axboe <[email protected]> wrote:
>
> But it pertains to the problem in general, which is how do we handle
> when the original task sets up a ring and then goes and does
> unshare(FILES) or whatever. It then submits a new request that just
> happens to go async, which is oblivious to this fact. Same thing that
> would happen in userspace if you created a thread pool and then did
> unshare, and then had your existing threads handle work for you. Except
> here it just kind of happens implicitly, and I can see how that would be
> confusing or even problematic.
Honestly, I'd aim for a "keep a cred per request". And if that means
that then the async worker thread has to check that it matches the
cred it already has, then so be it.
Otherwise, you'll always have odd situations where "synchronous
completion gets different results than something that was kicked off
to an async thread".
But I don't think this has anything to do with unshare() per se. I
think this is a generic "hey, the process can change creds between
ring creation - and thread creation - and submission of an io_ring
command".
No? Or am I misunderstanding what you're thinking of?
Linus
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [GIT PULL] io_uring fixes for 5.12-rc2
2021-03-05 23:03 ` Linus Torvalds
@ 2021-03-06 16:25 ` Jens Axboe
2021-03-06 21:14 ` Linus Torvalds
0 siblings, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2021-03-06 16:25 UTC (permalink / raw)
To: Linus Torvalds; +Cc: io-uring
On 3/5/21 4:03 PM, Linus Torvalds wrote:
> On Fri, Mar 5, 2021 at 1:58 PM Jens Axboe <[email protected]> wrote:
>>
>> But it pertains to the problem in general, which is how do we handle
>> when the original task sets up a ring and then goes and does
>> unshare(FILES) or whatever. It then submits a new request that just
>> happens to go async, which is oblivious to this fact. Same thing that
>> would happen in userspace if you created a thread pool and then did
>> unshare, and then had your existing threads handle work for you. Except
>> here it just kind of happens implicitly, and I can see how that would be
>> confusing or even problematic.
>
> Honestly, I'd aim for a "keep a cred per request". And if that means
> that then the async worker thread has to check that it matches the
> cred it already has, then so be it.
Agree, which is actually not that bad and can be done without having
io-wq deal with it. See below.
> Otherwise, you'll always have odd situations where "synchronous
> completion gets different results than something that was kicked off
> to an async thread".
Exactly, that was my main concern. It violates the principle of least
surprise, and I didn't like it.
> But I don't think this has anything to do with unshare() per se. I
> think this is a generic "hey, the process can change creds between
> ring creation - and thread creation - and submission of an io_ring
> command".
>
> No? Or am I misunderstanding what you're thinking of?
You're not, but creds is just one part of it. But I think we're OK
saying "If you do unshare(CLONE_FILES) or CLONE_FS, then it's not
going to impact async offload for your io_uring". IOW, you really
should do that before setting up your ring(s).
commit 6169c4a517317ff729553b66d55957bf03912dc4
Author: Jens Axboe <[email protected]>
Date: Sat Mar 6 09:22:27 2021 -0700
io-wq: always track creds for async issue
If we go async with a request, grab the creds that the task currently has
assigned and make sure that the async side switches to them. This is
handled in the same way that we do for registered personalities.
Signed-off-by: Jens Axboe <[email protected]>
diff --git a/fs/io-wq.h b/fs/io-wq.h
index 5fbf7997149e..1ac2f3248088 100644
--- a/fs/io-wq.h
+++ b/fs/io-wq.h
@@ -79,8 +79,8 @@ static inline void wq_list_del(struct io_wq_work_list *list,
struct io_wq_work {
struct io_wq_work_node list;
+ const struct cred *creds;
unsigned flags;
- unsigned short personality;
};
static inline struct io_wq_work *wq_next_work(struct io_wq_work *work)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 92c25b5f1349..d51c6ba9268b 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1183,6 +1183,9 @@ static void io_prep_async_work(struct io_kiocb *req)
const struct io_op_def *def = &io_op_defs[req->opcode];
struct io_ring_ctx *ctx = req->ctx;
+ if (!req->work.creds)
+ req->work.creds = get_current_cred();
+
if (req->flags & REQ_F_FORCE_ASYNC)
req->work.flags |= IO_WQ_WORK_CONCURRENT;
@@ -1648,6 +1651,10 @@ static void io_dismantle_req(struct io_kiocb *req)
io_put_file(req, req->file, (req->flags & REQ_F_FIXED_FILE));
if (req->fixed_rsrc_refs)
percpu_ref_put(req->fixed_rsrc_refs);
+ if (req->work.creds) {
+ put_cred(req->work.creds);
+ req->work.creds = NULL;
+ }
if (req->flags & REQ_F_INFLIGHT) {
struct io_ring_ctx *ctx = req->ctx;
@@ -5916,18 +5923,8 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
const struct cred *creds = NULL;
int ret;
- if (req->work.personality) {
- const struct cred *new_creds;
-
- if (!(issue_flags & IO_URING_F_NONBLOCK))
- mutex_lock(&ctx->uring_lock);
- new_creds = idr_find(&ctx->personality_idr, req->work.personality);
- if (!(issue_flags & IO_URING_F_NONBLOCK))
- mutex_unlock(&ctx->uring_lock);
- if (!new_creds)
- return -EINVAL;
- creds = override_creds(new_creds);
- }
+ if (req->work.creds && req->work.creds != current_cred())
+ creds = override_creds(req->work.creds);
switch (req->opcode) {
case IORING_OP_NOP:
@@ -6291,7 +6288,7 @@ static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
{
struct io_submit_state *state;
unsigned int sqe_flags;
- int ret = 0;
+ int personality, ret = 0;
req->opcode = READ_ONCE(sqe->opcode);
/* same numerical values with corresponding REQ_F_*, safe to copy */
@@ -6324,8 +6321,16 @@ static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
return -EOPNOTSUPP;
req->work.list.next = NULL;
+ personality = READ_ONCE(sqe->personality);
+ if (personality) {
+ req->work.creds = idr_find(&ctx->personality_idr, personality);
+ if (!req->work.creds)
+ return -EINVAL;
+ get_cred(req->work.creds);
+ } else {
+ req->work.creds = NULL;
+ }
req->work.flags = 0;
- req->work.personality = READ_ONCE(sqe->personality);
state = &ctx->submit_state;
/*
--
Jens Axboe
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [GIT PULL] io_uring fixes for 5.12-rc2
2021-03-06 16:25 ` Jens Axboe
@ 2021-03-06 21:14 ` Linus Torvalds
2021-03-06 22:28 ` Jens Axboe
0 siblings, 1 reply; 8+ messages in thread
From: Linus Torvalds @ 2021-03-06 21:14 UTC (permalink / raw)
To: Jens Axboe; +Cc: io-uring
On Sat, Mar 6, 2021 at 8:25 AM Jens Axboe <[email protected]> wrote:
>
> You're not, but creds is just one part of it. But I think we're OK
> saying "If you do unshare(CLONE_FILES) or CLONE_FS, then it's not
> going to impact async offload for your io_uring". IOW, you really
> should do that before setting up your ring(s).
Yeah, I think that one is solidly in a "don't do that then" thing.
Changing credentials is "normal use" - if you are a server process,
you may want to just make sure that you use different credentials for
different client requests etc.
But doing something like "I set up an io_uring for async work, and
then I dis-associated from my existing FS state entirely", that's not
normal. That's just a "you did something crazy, and you'll get crazy
results back, because the async part is fundamentally different from
the synchronous part".
It might be an option to just tear down the IO uring state entirely if
somebody does a "unshare(CLONE_FILES)", the same way unsharing the VM
(ie execve) does. But then it shouldn't even try to keep the state in
sync, it would just be "ok, your old io_uring is now simply gone".
I'm not sure it's even worth worrying about. Because at some point,
it's just "garbage in, garbage out".
Linus
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [GIT PULL] io_uring fixes for 5.12-rc2
2021-03-06 21:14 ` Linus Torvalds
@ 2021-03-06 22:28 ` Jens Axboe
0 siblings, 0 replies; 8+ messages in thread
From: Jens Axboe @ 2021-03-06 22:28 UTC (permalink / raw)
To: Linus Torvalds; +Cc: io-uring
On 3/6/21 2:14 PM, Linus Torvalds wrote:
> On Sat, Mar 6, 2021 at 8:25 AM Jens Axboe <[email protected]> wrote:
>>
>> You're not, but creds is just one part of it. But I think we're OK
>> saying "If you do unshare(CLONE_FILES) or CLONE_FS, then it's not
>> going to impact async offload for your io_uring". IOW, you really
>> should do that before setting up your ring(s).
>
> Yeah, I think that one is solidly in a "don't do that then" thing.
>
> Changing credentials is "normal use" - if you are a server process,
> you may want to just make sure that you use different credentials for
> different client requests etc.
>
> But doing something like "I set up an io_uring for async work, and
> then I dis-associated from my existing FS state entirely", that's not
> normal. That's just a "you did something crazy, and you'll get crazy
> results back, because the async part is fundamentally different from
> the synchronous part".
I agree, just wanted to make sure I wasn't the odd one out here...
> It might be an option to just tear down the IO uring state entirely if
> somebody does a "unshare(CLONE_FILES)", the same way unsharing the VM
> (ie execve) does. But then it shouldn't even try to keep the state in
> sync, it would just be "ok, your old io_uring is now simply gone".
>
> I'm not sure it's even worth worrying about. Because at some point,
> it's just "garbage in, garbage out".
I'd rather not worry about it. It's perfectly legit, and lots of uses
cases do this, to use a ring and never have any work that uses the
io-wq thread helpers. For those cases, there's never any concern
with having moved to a different fs or files. So I'd rather just
put that one in the category of "don't do weird stuff, or you'll
get unexpected results".
--
Jens Axboe
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2021-03-06 22:30 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-03-05 18:09 [GIT PULL] io_uring fixes for 5.12-rc2 Jens Axboe
2021-03-05 20:54 ` Linus Torvalds
2021-03-05 21:58 ` Jens Axboe
2021-03-05 23:03 ` Linus Torvalds
2021-03-06 16:25 ` Jens Axboe
2021-03-06 21:14 ` Linus Torvalds
2021-03-06 22:28 ` Jens Axboe
2021-03-05 21:58 ` pr-tracker-bot
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox