* [GIT PULL] io_uring fixes for 5.15-rc3 @ 2021-09-25 20:32 Jens Axboe 2021-09-25 23:05 ` Linus Torvalds 2021-09-25 23:05 ` pr-tracker-bot 0 siblings, 2 replies; 12+ messages in thread From: Jens Axboe @ 2021-09-25 20:32 UTC (permalink / raw) To: Linus Torvalds; +Cc: io-uring Hi Linus, This one looks a bit bigger than it is, but that's mainly because 2/3 of it is enabling IORING_OP_CLOSE to close direct file descriptors. We've had a few folks using them and finding it confusing that the way to close them is through using -1 for file update, this just brings API symmetry for direct descriptors. Hence I think we should just do this now and have a better API for 5.15 release. There's some room for de-duplicating the close code, but we're leaving that for the next merge window. Outside of that, just small fixes: - Poll race fixes (Hao) - io-wq core dump exit fix (me) - Reschedule around potentially intensive tctx and buffer iterators on teardown (me) - Fix for always ending up punting files update to io-wq (me) - Put the provided buffer meta data under memcg accounting (me) - Tweak for io_write(), removing dead code that was added with the iterator changes in this release (Pavel) Please pull! The following changes since commit e4e737bb5c170df6135a127739a9e6148ee3da82: Linux 5.15-rc2 (2021-09-19 17:28:22 -0700) are available in the Git repository at: git://git.kernel.dk/linux-block.git tags/io_uring-5.15-2021-09-25 for you to fetch changes up to 7df778be2f61e1a23002d1f2f5d6aaf702771eb8: io_uring: make OP_CLOSE consistent with direct open (2021-09-24 14:07:54 -0600) ---------------------------------------------------------------- io_uring-5.15-2021-09-25 ---------------------------------------------------------------- Hao Xu (3): io_uring: fix race between poll completion and cancel_hash insertion io_uring: fix missing set of EPOLLONESHOT for CQ ring overflow io_uring: fix potential req refcount underflow Jens Axboe (4): io-wq: ensure we exit if thread group is exiting io_uring: allow conditional reschedule for intensive iterators io_uring: put provided buffer meta data under memcg accounting io_uring: don't punt files update to io-wq unconditionally Pavel Begunkov (2): io_uring: kill extra checks in io_write() io_uring: make OP_CLOSE consistent with direct open fs/io-wq.c | 3 ++- fs/io_uring.c | 85 ++++++++++++++++++++++++++++++++++++++++++++++++----------- 2 files changed, 72 insertions(+), 16 deletions(-) -- Jens Axboe ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [GIT PULL] io_uring fixes for 5.15-rc3 2021-09-25 20:32 [GIT PULL] io_uring fixes for 5.15-rc3 Jens Axboe @ 2021-09-25 23:05 ` Linus Torvalds 2021-09-26 1:20 ` Jens Axboe 2021-09-26 4:31 ` Eric W. Biederman 2021-09-25 23:05 ` pr-tracker-bot 1 sibling, 2 replies; 12+ messages in thread From: Linus Torvalds @ 2021-09-25 23:05 UTC (permalink / raw) To: Jens Axboe, Eric W. Biederman, Oleg Nesterov, Al Viro; +Cc: io-uring On Sat, Sep 25, 2021 at 1:32 PM Jens Axboe <[email protected]> wrote: > > - io-wq core dump exit fix (me) Hmm. That one strikes me as odd. I get the feeling that if the io_uring thread needs to have that signal_group_exit() test, something is wrong in signal-land. It's basically a "fatal signal has been sent to another thread", and I really get the feeling that "fatal_signal_pending()" should just be modified to handle that case too. Because what about a number of other situations where we have that "killable" logic (ie "stop waiting for locks or IO if you're just going to get killed anyway" - things like lock_page_killable() and friends) Adding Eric, Oleg and Al to the participants, so that somebody else can pipe up. That piping up may quite possibly be to just tell me I'm being stupid, and that this is just a result of some io_uring thread thing, and nobody else has this problem. It's commit 87c169665578 ("io-wq: ensure we exit if thread group is exiting") in my tree. Comments? Linus ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [GIT PULL] io_uring fixes for 5.15-rc3 2021-09-25 23:05 ` Linus Torvalds @ 2021-09-26 1:20 ` Jens Axboe 2021-09-27 13:51 ` Eric W. Biederman 2021-09-26 4:31 ` Eric W. Biederman 1 sibling, 1 reply; 12+ messages in thread From: Jens Axboe @ 2021-09-26 1:20 UTC (permalink / raw) To: Linus Torvalds, Eric W. Biederman, Oleg Nesterov, Al Viro; +Cc: io-uring On 9/25/21 5:05 PM, Linus Torvalds wrote: > On Sat, Sep 25, 2021 at 1:32 PM Jens Axboe <[email protected]> wrote: >> >> - io-wq core dump exit fix (me) > > Hmm. > > That one strikes me as odd. > > I get the feeling that if the io_uring thread needs to have that > signal_group_exit() test, something is wrong in signal-land. > > It's basically a "fatal signal has been sent to another thread", and I > really get the feeling that "fatal_signal_pending()" should just be > modified to handle that case too. It did surprise me as well, which is why that previous change ended up being broken for the coredump case... You could argue that the io-wq thread should just exit on signal_pending(), which is what we did before, but that really ends up sucking for workloads that do use signals for communication purposes. postgres was the reporter here. -- Jens Axboe ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [GIT PULL] io_uring fixes for 5.15-rc3 2021-09-26 1:20 ` Jens Axboe @ 2021-09-27 13:51 ` Eric W. Biederman 2021-09-27 14:29 ` Jens Axboe 0 siblings, 1 reply; 12+ messages in thread From: Eric W. Biederman @ 2021-09-27 13:51 UTC (permalink / raw) To: Jens Axboe; +Cc: Linus Torvalds, Oleg Nesterov, Al Viro, io-uring Jens Axboe <[email protected]> writes: > On 9/25/21 5:05 PM, Linus Torvalds wrote: >> On Sat, Sep 25, 2021 at 1:32 PM Jens Axboe <[email protected]> wrote: >>> >>> - io-wq core dump exit fix (me) >> >> Hmm. >> >> That one strikes me as odd. >> >> I get the feeling that if the io_uring thread needs to have that >> signal_group_exit() test, something is wrong in signal-land. >> >> It's basically a "fatal signal has been sent to another thread", and I >> really get the feeling that "fatal_signal_pending()" should just be >> modified to handle that case too. > > It did surprise me as well, which is why that previous change ended up > being broken for the coredump case... You could argue that the io-wq > thread should just exit on signal_pending(), which is what we did > before, but that really ends up sucking for workloads that do use > signals for communication purposes. postgres was the reporter here. The primary function get_signal is to make signals not pending. So I don't understand any use of testing signal_pending after a call to get_signal. My confusion doubles when I consider the fact io_uring threads should only be dequeuing SIGSTOP and SIGKILL. I am concerned that an io_uring thread that dequeues SIGKILL won't call signal_group_exit and thus kill the other threads in the thread group. What motivated removing the break and adding the fatal_signal_pending test? Eric ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [GIT PULL] io_uring fixes for 5.15-rc3 2021-09-27 13:51 ` Eric W. Biederman @ 2021-09-27 14:29 ` Jens Axboe 2021-09-27 14:59 ` Jens Axboe 0 siblings, 1 reply; 12+ messages in thread From: Jens Axboe @ 2021-09-27 14:29 UTC (permalink / raw) To: Eric W. Biederman; +Cc: Linus Torvalds, Oleg Nesterov, Al Viro, io-uring On 9/27/21 7:51 AM, Eric W. Biederman wrote: > Jens Axboe <[email protected]> writes: > >> On 9/25/21 5:05 PM, Linus Torvalds wrote: >>> On Sat, Sep 25, 2021 at 1:32 PM Jens Axboe <[email protected]> wrote: >>>> >>>> - io-wq core dump exit fix (me) >>> >>> Hmm. >>> >>> That one strikes me as odd. >>> >>> I get the feeling that if the io_uring thread needs to have that >>> signal_group_exit() test, something is wrong in signal-land. >>> >>> It's basically a "fatal signal has been sent to another thread", and I >>> really get the feeling that "fatal_signal_pending()" should just be >>> modified to handle that case too. >> >> It did surprise me as well, which is why that previous change ended up >> being broken for the coredump case... You could argue that the io-wq >> thread should just exit on signal_pending(), which is what we did >> before, but that really ends up sucking for workloads that do use >> signals for communication purposes. postgres was the reporter here. > > The primary function get_signal is to make signals not pending. So I > don't understand any use of testing signal_pending after a call to > get_signal. > > My confusion doubles when I consider the fact io_uring threads should > only be dequeuing SIGSTOP and SIGKILL. > > I am concerned that an io_uring thread that dequeues SIGKILL won't call > signal_group_exit and thus kill the other threads in the thread group. > > What motivated removing the break and adding the fatal_signal_pending > test? I played with this a bit this morning, and I agree it doesn't seem to be needed at all. The original issue was with postgres, I'll give that a whirl as well and see if we run into any unwarranted exits. My simpler test case did not. -- Jens Axboe ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [GIT PULL] io_uring fixes for 5.15-rc3 2021-09-27 14:29 ` Jens Axboe @ 2021-09-27 14:59 ` Jens Axboe 2021-09-27 15:13 ` Eric W. Biederman 0 siblings, 1 reply; 12+ messages in thread From: Jens Axboe @ 2021-09-27 14:59 UTC (permalink / raw) To: Eric W. Biederman; +Cc: Linus Torvalds, Oleg Nesterov, Al Viro, io-uring On 9/27/21 8:29 AM, Jens Axboe wrote: > On 9/27/21 7:51 AM, Eric W. Biederman wrote: >> Jens Axboe <[email protected]> writes: >> >>> On 9/25/21 5:05 PM, Linus Torvalds wrote: >>>> On Sat, Sep 25, 2021 at 1:32 PM Jens Axboe <[email protected]> wrote: >>>>> >>>>> - io-wq core dump exit fix (me) >>>> >>>> Hmm. >>>> >>>> That one strikes me as odd. >>>> >>>> I get the feeling that if the io_uring thread needs to have that >>>> signal_group_exit() test, something is wrong in signal-land. >>>> >>>> It's basically a "fatal signal has been sent to another thread", and I >>>> really get the feeling that "fatal_signal_pending()" should just be >>>> modified to handle that case too. >>> >>> It did surprise me as well, which is why that previous change ended up >>> being broken for the coredump case... You could argue that the io-wq >>> thread should just exit on signal_pending(), which is what we did >>> before, but that really ends up sucking for workloads that do use >>> signals for communication purposes. postgres was the reporter here. >> >> The primary function get_signal is to make signals not pending. So I >> don't understand any use of testing signal_pending after a call to >> get_signal. >> >> My confusion doubles when I consider the fact io_uring threads should >> only be dequeuing SIGSTOP and SIGKILL. >> >> I am concerned that an io_uring thread that dequeues SIGKILL won't call >> signal_group_exit and thus kill the other threads in the thread group. >> >> What motivated removing the break and adding the fatal_signal_pending >> test? > > I played with this a bit this morning, and I agree it doesn't seem to be > needed at all. The original issue was with postgres, I'll give that a > whirl as well and see if we run into any unwarranted exits. My simpler > test case did not. Ran the postgres test, and we get tons of io-wq exiting on get_signal() returning true. Took a closer look, and it actually looks very much expected, as it's a SIGKILL to the original task. So it looks like I was indeed wrong, and this probably masked the original issue that was fixed in that series. I've been running with this: diff --git a/fs/io-wq.c b/fs/io-wq.c index c2360cdc403d..afd1db8e000d 100644 --- a/fs/io-wq.c +++ b/fs/io-wq.c @@ -584,10 +584,9 @@ static int io_wqe_worker(void *data) if (!get_signal(&ksig)) continue; - if (fatal_signal_pending(current) || - signal_group_exit(current->signal)) - break; - continue; + if (ksig.sig != SIGKILL) + printk("exit on sig! fatal? %d, sig=%d\n", fatal_signal_pending(current), ksig.sig); + break; } last_timeout = !ret; } and it's running fine and, as expected, we don't generate any printk activity as these are all fatal deliveries to the parent. -- Jens Axboe ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [GIT PULL] io_uring fixes for 5.15-rc3 2021-09-27 14:59 ` Jens Axboe @ 2021-09-27 15:13 ` Eric W. Biederman 2021-09-27 15:41 ` Jens Axboe 0 siblings, 1 reply; 12+ messages in thread From: Eric W. Biederman @ 2021-09-27 15:13 UTC (permalink / raw) To: Jens Axboe; +Cc: Linus Torvalds, Oleg Nesterov, Al Viro, io-uring Jens Axboe <[email protected]> writes: > On 9/27/21 8:29 AM, Jens Axboe wrote: >> On 9/27/21 7:51 AM, Eric W. Biederman wrote: >>> Jens Axboe <[email protected]> writes: >>> >>>> On 9/25/21 5:05 PM, Linus Torvalds wrote: >>>>> On Sat, Sep 25, 2021 at 1:32 PM Jens Axboe <[email protected]> wrote: >>>>>> >>>>>> - io-wq core dump exit fix (me) >>>>> >>>>> Hmm. >>>>> >>>>> That one strikes me as odd. >>>>> >>>>> I get the feeling that if the io_uring thread needs to have that >>>>> signal_group_exit() test, something is wrong in signal-land. >>>>> >>>>> It's basically a "fatal signal has been sent to another thread", and I >>>>> really get the feeling that "fatal_signal_pending()" should just be >>>>> modified to handle that case too. >>>> >>>> It did surprise me as well, which is why that previous change ended up >>>> being broken for the coredump case... You could argue that the io-wq >>>> thread should just exit on signal_pending(), which is what we did >>>> before, but that really ends up sucking for workloads that do use >>>> signals for communication purposes. postgres was the reporter here. >>> >>> The primary function get_signal is to make signals not pending. So I >>> don't understand any use of testing signal_pending after a call to >>> get_signal. >>> >>> My confusion doubles when I consider the fact io_uring threads should >>> only be dequeuing SIGSTOP and SIGKILL. >>> >>> I am concerned that an io_uring thread that dequeues SIGKILL won't call >>> signal_group_exit and thus kill the other threads in the thread group. >>> >>> What motivated removing the break and adding the fatal_signal_pending >>> test? >> >> I played with this a bit this morning, and I agree it doesn't seem to be >> needed at all. The original issue was with postgres, I'll give that a >> whirl as well and see if we run into any unwarranted exits. My simpler >> test case did not. > > Ran the postgres test, and we get tons of io-wq exiting on get_signal() > returning true. Took a closer look, and it actually looks very much > expected, as it's a SIGKILL to the original task. > > So it looks like I was indeed wrong, and this probably masked the > original issue that was fixed in that series. I've been running with > this: > > diff --git a/fs/io-wq.c b/fs/io-wq.c > index c2360cdc403d..afd1db8e000d 100644 > --- a/fs/io-wq.c > +++ b/fs/io-wq.c > @@ -584,10 +584,9 @@ static int io_wqe_worker(void *data) > > if (!get_signal(&ksig)) > continue; > - if (fatal_signal_pending(current) || > - signal_group_exit(current->signal)) > - break; > - continue; > + if (ksig.sig != SIGKILL) > + printk("exit on sig! fatal? %d, sig=%d\n", fatal_signal_pending(current), ksig.sig); > + break; > } > last_timeout = !ret; > } > > and it's running fine and, as expected, we don't generate any printk > activity as these are all fatal deliveries to the parent. Good. So just a break should be fine. A little bit of me is concerned about not calling do_group_exit in this case. Fortunately it is not a problem as complete_signal kills all of the threads in a signal_group when SIGKILL is delivered. So at least until something else is refactored and io_uring threads unblock another fatal signal all is well. Eric ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [GIT PULL] io_uring fixes for 5.15-rc3 2021-09-27 15:13 ` Eric W. Biederman @ 2021-09-27 15:41 ` Jens Axboe 2021-09-27 15:52 ` Eric W. Biederman 0 siblings, 1 reply; 12+ messages in thread From: Jens Axboe @ 2021-09-27 15:41 UTC (permalink / raw) To: Eric W. Biederman; +Cc: Linus Torvalds, Oleg Nesterov, Al Viro, io-uring On 9/27/21 9:13 AM, Eric W. Biederman wrote: > Jens Axboe <[email protected]> writes: > >> On 9/27/21 8:29 AM, Jens Axboe wrote: >>> On 9/27/21 7:51 AM, Eric W. Biederman wrote: >>>> Jens Axboe <[email protected]> writes: >>>> >>>>> On 9/25/21 5:05 PM, Linus Torvalds wrote: >>>>>> On Sat, Sep 25, 2021 at 1:32 PM Jens Axboe <[email protected]> wrote: >>>>>>> >>>>>>> - io-wq core dump exit fix (me) >>>>>> >>>>>> Hmm. >>>>>> >>>>>> That one strikes me as odd. >>>>>> >>>>>> I get the feeling that if the io_uring thread needs to have that >>>>>> signal_group_exit() test, something is wrong in signal-land. >>>>>> >>>>>> It's basically a "fatal signal has been sent to another thread", and I >>>>>> really get the feeling that "fatal_signal_pending()" should just be >>>>>> modified to handle that case too. >>>>> >>>>> It did surprise me as well, which is why that previous change ended up >>>>> being broken for the coredump case... You could argue that the io-wq >>>>> thread should just exit on signal_pending(), which is what we did >>>>> before, but that really ends up sucking for workloads that do use >>>>> signals for communication purposes. postgres was the reporter here. >>>> >>>> The primary function get_signal is to make signals not pending. So I >>>> don't understand any use of testing signal_pending after a call to >>>> get_signal. >>>> >>>> My confusion doubles when I consider the fact io_uring threads should >>>> only be dequeuing SIGSTOP and SIGKILL. >>>> >>>> I am concerned that an io_uring thread that dequeues SIGKILL won't call >>>> signal_group_exit and thus kill the other threads in the thread group. >>>> >>>> What motivated removing the break and adding the fatal_signal_pending >>>> test? >>> >>> I played with this a bit this morning, and I agree it doesn't seem to be >>> needed at all. The original issue was with postgres, I'll give that a >>> whirl as well and see if we run into any unwarranted exits. My simpler >>> test case did not. >> >> Ran the postgres test, and we get tons of io-wq exiting on get_signal() >> returning true. Took a closer look, and it actually looks very much >> expected, as it's a SIGKILL to the original task. >> >> So it looks like I was indeed wrong, and this probably masked the >> original issue that was fixed in that series. I've been running with >> this: >> >> diff --git a/fs/io-wq.c b/fs/io-wq.c >> index c2360cdc403d..afd1db8e000d 100644 >> --- a/fs/io-wq.c >> +++ b/fs/io-wq.c >> @@ -584,10 +584,9 @@ static int io_wqe_worker(void *data) >> >> if (!get_signal(&ksig)) >> continue; >> - if (fatal_signal_pending(current) || >> - signal_group_exit(current->signal)) >> - break; >> - continue; >> + if (ksig.sig != SIGKILL) >> + printk("exit on sig! fatal? %d, sig=%d\n", fatal_signal_pending(current), ksig.sig); >> + break; >> } >> last_timeout = !ret; >> } >> >> and it's running fine and, as expected, we don't generate any printk >> activity as these are all fatal deliveries to the parent. > > Good. So just a break should be fine. Indeed, I'll send out a patch for that. > A little bit of me is concerned about not calling do_group_exit in this > case. Fortunately it is not a problem as complete_signal kills all of > the threads in a signal_group when SIGKILL is delivered. > > So at least until something else is refactored and io_uring threads > unblock another fatal signal all is well. Should we put a comment in io-wq to that effect? I don't see why we'd ever unblock other signals there, but... -- Jens Axboe ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [GIT PULL] io_uring fixes for 5.15-rc3 2021-09-27 15:41 ` Jens Axboe @ 2021-09-27 15:52 ` Eric W. Biederman 2021-09-27 16:03 ` Jens Axboe 0 siblings, 1 reply; 12+ messages in thread From: Eric W. Biederman @ 2021-09-27 15:52 UTC (permalink / raw) To: Jens Axboe; +Cc: Linus Torvalds, Oleg Nesterov, Al Viro, io-uring Jens Axboe <[email protected]> writes: > On 9/27/21 9:13 AM, Eric W. Biederman wrote: >> Jens Axboe <[email protected]> writes: >> >>> On 9/27/21 8:29 AM, Jens Axboe wrote: >>>> On 9/27/21 7:51 AM, Eric W. Biederman wrote: >>>>> Jens Axboe <[email protected]> writes: >>>>> >>>>>> On 9/25/21 5:05 PM, Linus Torvalds wrote: >>>>>>> On Sat, Sep 25, 2021 at 1:32 PM Jens Axboe <[email protected]> wrote: >>>>>>>> >>>>>>>> - io-wq core dump exit fix (me) >>>>>>> >>>>>>> Hmm. >>>>>>> >>>>>>> That one strikes me as odd. >>>>>>> >>>>>>> I get the feeling that if the io_uring thread needs to have that >>>>>>> signal_group_exit() test, something is wrong in signal-land. >>>>>>> >>>>>>> It's basically a "fatal signal has been sent to another thread", and I >>>>>>> really get the feeling that "fatal_signal_pending()" should just be >>>>>>> modified to handle that case too. >>>>>> >>>>>> It did surprise me as well, which is why that previous change ended up >>>>>> being broken for the coredump case... You could argue that the io-wq >>>>>> thread should just exit on signal_pending(), which is what we did >>>>>> before, but that really ends up sucking for workloads that do use >>>>>> signals for communication purposes. postgres was the reporter here. >>>>> >>>>> The primary function get_signal is to make signals not pending. So I >>>>> don't understand any use of testing signal_pending after a call to >>>>> get_signal. >>>>> >>>>> My confusion doubles when I consider the fact io_uring threads should >>>>> only be dequeuing SIGSTOP and SIGKILL. >>>>> >>>>> I am concerned that an io_uring thread that dequeues SIGKILL won't call >>>>> signal_group_exit and thus kill the other threads in the thread group. >>>>> >>>>> What motivated removing the break and adding the fatal_signal_pending >>>>> test? >>>> >>>> I played with this a bit this morning, and I agree it doesn't seem to be >>>> needed at all. The original issue was with postgres, I'll give that a >>>> whirl as well and see if we run into any unwarranted exits. My simpler >>>> test case did not. >>> >>> Ran the postgres test, and we get tons of io-wq exiting on get_signal() >>> returning true. Took a closer look, and it actually looks very much >>> expected, as it's a SIGKILL to the original task. >>> >>> So it looks like I was indeed wrong, and this probably masked the >>> original issue that was fixed in that series. I've been running with >>> this: >>> >>> diff --git a/fs/io-wq.c b/fs/io-wq.c >>> index c2360cdc403d..afd1db8e000d 100644 >>> --- a/fs/io-wq.c >>> +++ b/fs/io-wq.c >>> @@ -584,10 +584,9 @@ static int io_wqe_worker(void *data) >>> >>> if (!get_signal(&ksig)) >>> continue; >>> - if (fatal_signal_pending(current) || >>> - signal_group_exit(current->signal)) >>> - break; >>> - continue; >>> + if (ksig.sig != SIGKILL) >>> + printk("exit on sig! fatal? %d, sig=%d\n", fatal_signal_pending(current), ksig.sig); >>> + break; >>> } >>> last_timeout = !ret; >>> } >>> >>> and it's running fine and, as expected, we don't generate any printk >>> activity as these are all fatal deliveries to the parent. >> >> Good. So just a break should be fine. > > Indeed, I'll send out a patch for that. > >> A little bit of me is concerned about not calling do_group_exit in this >> case. Fortunately it is not a problem as complete_signal kills all of >> the threads in a signal_group when SIGKILL is delivered. >> >> So at least until something else is refactored and io_uring threads >> unblock another fatal signal all is well. > > Should we put a comment in io-wq to that effect? I don't see why we'd > ever unblock other signals there, but... I suspect rather we should update this comment in get_signal instead. /* * PF_IO_WORKER threads will catch and exit on fatal signals * themselves. They have cleanup that must be performed, so * we cannot call do_exit() on their behalf. */ if (current->flags & PF_IO_WORKER) goto out; Although I would not mind updating io-wq.c and io_uring.c where they call get_signal as well. Eric ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [GIT PULL] io_uring fixes for 5.15-rc3 2021-09-27 15:52 ` Eric W. Biederman @ 2021-09-27 16:03 ` Jens Axboe 0 siblings, 0 replies; 12+ messages in thread From: Jens Axboe @ 2021-09-27 16:03 UTC (permalink / raw) To: Eric W. Biederman; +Cc: Linus Torvalds, Oleg Nesterov, Al Viro, io-uring On 9/27/21 9:52 AM, Eric W. Biederman wrote: > Jens Axboe <[email protected]> writes: > >> On 9/27/21 9:13 AM, Eric W. Biederman wrote: >>> Jens Axboe <[email protected]> writes: >>> >>>> On 9/27/21 8:29 AM, Jens Axboe wrote: >>>>> On 9/27/21 7:51 AM, Eric W. Biederman wrote: >>>>>> Jens Axboe <[email protected]> writes: >>>>>> >>>>>>> On 9/25/21 5:05 PM, Linus Torvalds wrote: >>>>>>>> On Sat, Sep 25, 2021 at 1:32 PM Jens Axboe <[email protected]> wrote: >>>>>>>>> >>>>>>>>> - io-wq core dump exit fix (me) >>>>>>>> >>>>>>>> Hmm. >>>>>>>> >>>>>>>> That one strikes me as odd. >>>>>>>> >>>>>>>> I get the feeling that if the io_uring thread needs to have that >>>>>>>> signal_group_exit() test, something is wrong in signal-land. >>>>>>>> >>>>>>>> It's basically a "fatal signal has been sent to another thread", and I >>>>>>>> really get the feeling that "fatal_signal_pending()" should just be >>>>>>>> modified to handle that case too. >>>>>>> >>>>>>> It did surprise me as well, which is why that previous change ended up >>>>>>> being broken for the coredump case... You could argue that the io-wq >>>>>>> thread should just exit on signal_pending(), which is what we did >>>>>>> before, but that really ends up sucking for workloads that do use >>>>>>> signals for communication purposes. postgres was the reporter here. >>>>>> >>>>>> The primary function get_signal is to make signals not pending. So I >>>>>> don't understand any use of testing signal_pending after a call to >>>>>> get_signal. >>>>>> >>>>>> My confusion doubles when I consider the fact io_uring threads should >>>>>> only be dequeuing SIGSTOP and SIGKILL. >>>>>> >>>>>> I am concerned that an io_uring thread that dequeues SIGKILL won't call >>>>>> signal_group_exit and thus kill the other threads in the thread group. >>>>>> >>>>>> What motivated removing the break and adding the fatal_signal_pending >>>>>> test? >>>>> >>>>> I played with this a bit this morning, and I agree it doesn't seem to be >>>>> needed at all. The original issue was with postgres, I'll give that a >>>>> whirl as well and see if we run into any unwarranted exits. My simpler >>>>> test case did not. >>>> >>>> Ran the postgres test, and we get tons of io-wq exiting on get_signal() >>>> returning true. Took a closer look, and it actually looks very much >>>> expected, as it's a SIGKILL to the original task. >>>> >>>> So it looks like I was indeed wrong, and this probably masked the >>>> original issue that was fixed in that series. I've been running with >>>> this: >>>> >>>> diff --git a/fs/io-wq.c b/fs/io-wq.c >>>> index c2360cdc403d..afd1db8e000d 100644 >>>> --- a/fs/io-wq.c >>>> +++ b/fs/io-wq.c >>>> @@ -584,10 +584,9 @@ static int io_wqe_worker(void *data) >>>> >>>> if (!get_signal(&ksig)) >>>> continue; >>>> - if (fatal_signal_pending(current) || >>>> - signal_group_exit(current->signal)) >>>> - break; >>>> - continue; >>>> + if (ksig.sig != SIGKILL) >>>> + printk("exit on sig! fatal? %d, sig=%d\n", fatal_signal_pending(current), ksig.sig); >>>> + break; >>>> } >>>> last_timeout = !ret; >>>> } >>>> >>>> and it's running fine and, as expected, we don't generate any printk >>>> activity as these are all fatal deliveries to the parent. >>> >>> Good. So just a break should be fine. >> >> Indeed, I'll send out a patch for that. >> >>> A little bit of me is concerned about not calling do_group_exit in this >>> case. Fortunately it is not a problem as complete_signal kills all of >>> the threads in a signal_group when SIGKILL is delivered. >>> >>> So at least until something else is refactored and io_uring threads >>> unblock another fatal signal all is well. >> >> Should we put a comment in io-wq to that effect? I don't see why we'd >> ever unblock other signals there, but... > > I suspect rather we should update this comment in get_signal > instead. > > /* > * PF_IO_WORKER threads will catch and exit on fatal signals > * themselves. They have cleanup that must be performed, so > * we cannot call do_exit() on their behalf. > */ > if (current->flags & PF_IO_WORKER) > goto out; > > > Although I would not mind updating io-wq.c and io_uring.c where > they call get_signal as well. Probably best to leave the explanation to the source, in get_signal(). If you don't mind, I'll leave updating that one to you. -- Jens Axboe ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [GIT PULL] io_uring fixes for 5.15-rc3 2021-09-25 23:05 ` Linus Torvalds 2021-09-26 1:20 ` Jens Axboe @ 2021-09-26 4:31 ` Eric W. Biederman 1 sibling, 0 replies; 12+ messages in thread From: Eric W. Biederman @ 2021-09-26 4:31 UTC (permalink / raw) To: Linus Torvalds; +Cc: Jens Axboe, Oleg Nesterov, Al Viro, io-uring Linus Torvalds <[email protected]> writes: > On Sat, Sep 25, 2021 at 1:32 PM Jens Axboe <[email protected]> wrote: >> >> - io-wq core dump exit fix (me) > > Hmm. > > That one strikes me as odd. > > I get the feeling that if the io_uring thread needs to have that > signal_group_exit() test, something is wrong in signal-land. > > It's basically a "fatal signal has been sent to another thread", and I > really get the feeling that "fatal_signal_pending()" should just be > modified to handle that case too. > > Because what about a number of other situations where we have that > "killable" logic (ie "stop waiting for locks or IO if you're just > going to get killed anyway" - things like lock_page_killable() and > friends) > > Adding Eric, Oleg and Al to the participants, so that somebody else can pipe up. > > That piping up may quite possibly be to just tell me I'm being stupid, > and that this is just a result of some io_uring thread thing, and > nobody else has this problem. > > It's commit 87c169665578 ("io-wq: ensure we exit if thread group is > exiting") in my tree. > > Comments? I agree it smells. It smells that there needs to be a test after get_signal returns with a signal from an io_uring thread. As I recall io-wq threads block all signals except for SIGSTOP and SIGKILL. The signal SIGSTOP is never returned from get_signal. So at a first glance every return from get_signal should be SIGKILL and thus fatal. So I don't understand why there needs to be a test at all after get_signal. Further the SIGKILL should have been delivered by get_signal so it should not be pending so I am not certain when fatal_signal_pending. Can someone please explain commit 15e20db2e0ce ("io-wq: only exit on fatal signals") to me? What signals is get_signal returning to be delivered to userspace that are not fatal and that are ok to ignore? I think I am missing something. Eric ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [GIT PULL] io_uring fixes for 5.15-rc3 2021-09-25 20:32 [GIT PULL] io_uring fixes for 5.15-rc3 Jens Axboe 2021-09-25 23:05 ` Linus Torvalds @ 2021-09-25 23:05 ` pr-tracker-bot 1 sibling, 0 replies; 12+ messages in thread From: pr-tracker-bot @ 2021-09-25 23:05 UTC (permalink / raw) To: Jens Axboe; +Cc: Linus Torvalds, io-uring The pull request you sent on Sat, 25 Sep 2021 14:32:38 -0600: > git://git.kernel.dk/linux-block.git tags/io_uring-5.15-2021-09-25 has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/f6f360aef0e70a45cbf43db1dd9df5a5e96d9836 Thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/prtracker.html ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2021-09-27 16:03 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2021-09-25 20:32 [GIT PULL] io_uring fixes for 5.15-rc3 Jens Axboe 2021-09-25 23:05 ` Linus Torvalds 2021-09-26 1:20 ` Jens Axboe 2021-09-27 13:51 ` Eric W. Biederman 2021-09-27 14:29 ` Jens Axboe 2021-09-27 14:59 ` Jens Axboe 2021-09-27 15:13 ` Eric W. Biederman 2021-09-27 15:41 ` Jens Axboe 2021-09-27 15:52 ` Eric W. Biederman 2021-09-27 16:03 ` Jens Axboe 2021-09-26 4:31 ` Eric W. Biederman 2021-09-25 23:05 ` pr-tracker-bot
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox