public inbox for [email protected]
 help / color / mirror / Atom feed
* [GIT PULL] io_uring fixes for 5.15-rc3
@ 2021-09-25 20:32 Jens Axboe
  2021-09-25 23:05 ` Linus Torvalds
  2021-09-25 23:05 ` pr-tracker-bot
  0 siblings, 2 replies; 12+ messages in thread
From: Jens Axboe @ 2021-09-25 20:32 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: io-uring

Hi Linus,

This one looks a bit bigger than it is, but that's mainly because 2/3 of
it is enabling IORING_OP_CLOSE to close direct file descriptors. We've
had a few folks using them and finding it confusing that the way to
close them is through using -1 for file update, this just brings API
symmetry for direct descriptors. Hence I think we should just do this
now and have a better API for 5.15 release. There's some room for
de-duplicating the close code, but we're leaving that for the next merge
window.

Outside of that, just small fixes:

- Poll race fixes (Hao)

- io-wq core dump exit fix (me)

- Reschedule around potentially intensive tctx and buffer iterators on
  teardown (me)

- Fix for always ending up punting files update to io-wq (me)

- Put the provided buffer meta data under memcg accounting (me)

- Tweak for io_write(), removing dead code that was added with the
  iterator changes in this release (Pavel)

Please pull!


The following changes since commit e4e737bb5c170df6135a127739a9e6148ee3da82:

  Linux 5.15-rc2 (2021-09-19 17:28:22 -0700)

are available in the Git repository at:

  git://git.kernel.dk/linux-block.git tags/io_uring-5.15-2021-09-25

for you to fetch changes up to 7df778be2f61e1a23002d1f2f5d6aaf702771eb8:

  io_uring: make OP_CLOSE consistent with direct open (2021-09-24 14:07:54 -0600)

----------------------------------------------------------------
io_uring-5.15-2021-09-25

----------------------------------------------------------------
Hao Xu (3):
      io_uring: fix race between poll completion and cancel_hash insertion
      io_uring: fix missing set of EPOLLONESHOT for CQ ring overflow
      io_uring: fix potential req refcount underflow

Jens Axboe (4):
      io-wq: ensure we exit if thread group is exiting
      io_uring: allow conditional reschedule for intensive iterators
      io_uring: put provided buffer meta data under memcg accounting
      io_uring: don't punt files update to io-wq unconditionally

Pavel Begunkov (2):
      io_uring: kill extra checks in io_write()
      io_uring: make OP_CLOSE consistent with direct open

 fs/io-wq.c    |  3 ++-
 fs/io_uring.c | 85 ++++++++++++++++++++++++++++++++++++++++++++++++-----------
 2 files changed, 72 insertions(+), 16 deletions(-)

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [GIT PULL] io_uring fixes for 5.15-rc3
  2021-09-25 20:32 [GIT PULL] io_uring fixes for 5.15-rc3 Jens Axboe
@ 2021-09-25 23:05 ` Linus Torvalds
  2021-09-26  1:20   ` Jens Axboe
  2021-09-26  4:31   ` Eric W. Biederman
  2021-09-25 23:05 ` pr-tracker-bot
  1 sibling, 2 replies; 12+ messages in thread
From: Linus Torvalds @ 2021-09-25 23:05 UTC (permalink / raw)
  To: Jens Axboe, Eric W. Biederman, Oleg Nesterov, Al Viro; +Cc: io-uring

On Sat, Sep 25, 2021 at 1:32 PM Jens Axboe <[email protected]> wrote:
>
> - io-wq core dump exit fix (me)

Hmm.

That one strikes me as odd.

I get the feeling that if the io_uring thread needs to have that
signal_group_exit() test, something is wrong in signal-land.

It's basically a "fatal signal has been sent to another thread", and I
really get the feeling that "fatal_signal_pending()" should just be
modified to handle that case too.

Because what about a number of other situations where we have that
"killable" logic (ie "stop waiting for locks or IO if you're just
going to get killed anyway" - things like lock_page_killable() and
friends)

Adding Eric, Oleg and Al to the participants, so that somebody else can pipe up.

That piping up may quite possibly be to just tell me I'm being stupid,
and that this is just a result of some io_uring thread thing, and
nobody else has this problem.

It's commit 87c169665578 ("io-wq: ensure we exit if thread group is
exiting") in my tree.

Comments?

            Linus

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [GIT PULL] io_uring fixes for 5.15-rc3
  2021-09-25 20:32 [GIT PULL] io_uring fixes for 5.15-rc3 Jens Axboe
  2021-09-25 23:05 ` Linus Torvalds
@ 2021-09-25 23:05 ` pr-tracker-bot
  1 sibling, 0 replies; 12+ messages in thread
From: pr-tracker-bot @ 2021-09-25 23:05 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Linus Torvalds, io-uring

The pull request you sent on Sat, 25 Sep 2021 14:32:38 -0600:

> git://git.kernel.dk/linux-block.git tags/io_uring-5.15-2021-09-25

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/f6f360aef0e70a45cbf43db1dd9df5a5e96d9836

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [GIT PULL] io_uring fixes for 5.15-rc3
  2021-09-25 23:05 ` Linus Torvalds
@ 2021-09-26  1:20   ` Jens Axboe
  2021-09-27 13:51     ` Eric W. Biederman
  2021-09-26  4:31   ` Eric W. Biederman
  1 sibling, 1 reply; 12+ messages in thread
From: Jens Axboe @ 2021-09-26  1:20 UTC (permalink / raw)
  To: Linus Torvalds, Eric W. Biederman, Oleg Nesterov, Al Viro; +Cc: io-uring

On 9/25/21 5:05 PM, Linus Torvalds wrote:
> On Sat, Sep 25, 2021 at 1:32 PM Jens Axboe <[email protected]> wrote:
>>
>> - io-wq core dump exit fix (me)
> 
> Hmm.
> 
> That one strikes me as odd.
> 
> I get the feeling that if the io_uring thread needs to have that
> signal_group_exit() test, something is wrong in signal-land.
> 
> It's basically a "fatal signal has been sent to another thread", and I
> really get the feeling that "fatal_signal_pending()" should just be
> modified to handle that case too.

It did surprise me as well, which is why that previous change ended up
being broken for the coredump case... You could argue that the io-wq
thread should just exit on signal_pending(), which is what we did
before, but that really ends up sucking for workloads that do use
signals for communication purposes. postgres was the reporter here.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [GIT PULL] io_uring fixes for 5.15-rc3
  2021-09-25 23:05 ` Linus Torvalds
  2021-09-26  1:20   ` Jens Axboe
@ 2021-09-26  4:31   ` Eric W. Biederman
  1 sibling, 0 replies; 12+ messages in thread
From: Eric W. Biederman @ 2021-09-26  4:31 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jens Axboe, Oleg Nesterov, Al Viro, io-uring

Linus Torvalds <[email protected]> writes:

> On Sat, Sep 25, 2021 at 1:32 PM Jens Axboe <[email protected]> wrote:
>>
>> - io-wq core dump exit fix (me)
>
> Hmm.
>
> That one strikes me as odd.
>
> I get the feeling that if the io_uring thread needs to have that
> signal_group_exit() test, something is wrong in signal-land.
>
> It's basically a "fatal signal has been sent to another thread", and I
> really get the feeling that "fatal_signal_pending()" should just be
> modified to handle that case too.
>
> Because what about a number of other situations where we have that
> "killable" logic (ie "stop waiting for locks or IO if you're just
> going to get killed anyway" - things like lock_page_killable() and
> friends)
>
> Adding Eric, Oleg and Al to the participants, so that somebody else can pipe up.
>
> That piping up may quite possibly be to just tell me I'm being stupid,
> and that this is just a result of some io_uring thread thing, and
> nobody else has this problem.
>
> It's commit 87c169665578 ("io-wq: ensure we exit if thread group is
> exiting") in my tree.
>
> Comments?

I agree it smells.

It smells that there needs to be a test after get_signal returns with a
signal from an io_uring thread.

As I recall io-wq threads block all signals except for SIGSTOP and
SIGKILL.  The signal SIGSTOP is never returned from get_signal.  So at
a first glance every return from get_signal should be SIGKILL and thus
fatal.

So I don't understand why there needs to be a test at all after
get_signal.

Further the SIGKILL should have been delivered by get_signal so it
should not be pending so I am not certain when fatal_signal_pending.

Can someone please explain commit 15e20db2e0ce ("io-wq: only exit on
fatal signals") to me?

What signals is get_signal returning to be delivered to userspace that
are not fatal and that are ok to ignore?

I think I am missing something.

Eric

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [GIT PULL] io_uring fixes for 5.15-rc3
  2021-09-26  1:20   ` Jens Axboe
@ 2021-09-27 13:51     ` Eric W. Biederman
  2021-09-27 14:29       ` Jens Axboe
  0 siblings, 1 reply; 12+ messages in thread
From: Eric W. Biederman @ 2021-09-27 13:51 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Linus Torvalds, Oleg Nesterov, Al Viro, io-uring

Jens Axboe <[email protected]> writes:

> On 9/25/21 5:05 PM, Linus Torvalds wrote:
>> On Sat, Sep 25, 2021 at 1:32 PM Jens Axboe <[email protected]> wrote:
>>>
>>> - io-wq core dump exit fix (me)
>> 
>> Hmm.
>> 
>> That one strikes me as odd.
>> 
>> I get the feeling that if the io_uring thread needs to have that
>> signal_group_exit() test, something is wrong in signal-land.
>> 
>> It's basically a "fatal signal has been sent to another thread", and I
>> really get the feeling that "fatal_signal_pending()" should just be
>> modified to handle that case too.
>
> It did surprise me as well, which is why that previous change ended up
> being broken for the coredump case... You could argue that the io-wq
> thread should just exit on signal_pending(), which is what we did
> before, but that really ends up sucking for workloads that do use
> signals for communication purposes. postgres was the reporter here.

The primary function get_signal is to make signals not pending.  So I
don't understand any use of testing signal_pending after a call to
get_signal.

My confusion doubles when I consider the fact io_uring threads should
only be dequeuing SIGSTOP and SIGKILL.

I am concerned that an io_uring thread that dequeues SIGKILL won't call
signal_group_exit and thus kill the other threads in the thread group.

What motivated removing the break and adding the fatal_signal_pending
test?

Eric


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [GIT PULL] io_uring fixes for 5.15-rc3
  2021-09-27 13:51     ` Eric W. Biederman
@ 2021-09-27 14:29       ` Jens Axboe
  2021-09-27 14:59         ` Jens Axboe
  0 siblings, 1 reply; 12+ messages in thread
From: Jens Axboe @ 2021-09-27 14:29 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Linus Torvalds, Oleg Nesterov, Al Viro, io-uring

On 9/27/21 7:51 AM, Eric W. Biederman wrote:
> Jens Axboe <[email protected]> writes:
> 
>> On 9/25/21 5:05 PM, Linus Torvalds wrote:
>>> On Sat, Sep 25, 2021 at 1:32 PM Jens Axboe <[email protected]> wrote:
>>>>
>>>> - io-wq core dump exit fix (me)
>>>
>>> Hmm.
>>>
>>> That one strikes me as odd.
>>>
>>> I get the feeling that if the io_uring thread needs to have that
>>> signal_group_exit() test, something is wrong in signal-land.
>>>
>>> It's basically a "fatal signal has been sent to another thread", and I
>>> really get the feeling that "fatal_signal_pending()" should just be
>>> modified to handle that case too.
>>
>> It did surprise me as well, which is why that previous change ended up
>> being broken for the coredump case... You could argue that the io-wq
>> thread should just exit on signal_pending(), which is what we did
>> before, but that really ends up sucking for workloads that do use
>> signals for communication purposes. postgres was the reporter here.
> 
> The primary function get_signal is to make signals not pending.  So I
> don't understand any use of testing signal_pending after a call to
> get_signal.
> 
> My confusion doubles when I consider the fact io_uring threads should
> only be dequeuing SIGSTOP and SIGKILL.
> 
> I am concerned that an io_uring thread that dequeues SIGKILL won't call
> signal_group_exit and thus kill the other threads in the thread group.
> 
> What motivated removing the break and adding the fatal_signal_pending
> test?

I played with this a bit this morning, and I agree it doesn't seem to be
needed at all. The original issue was with postgres, I'll give that a
whirl as well and see if we run into any unwarranted exits. My simpler
test case did not.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [GIT PULL] io_uring fixes for 5.15-rc3
  2021-09-27 14:29       ` Jens Axboe
@ 2021-09-27 14:59         ` Jens Axboe
  2021-09-27 15:13           ` Eric W. Biederman
  0 siblings, 1 reply; 12+ messages in thread
From: Jens Axboe @ 2021-09-27 14:59 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Linus Torvalds, Oleg Nesterov, Al Viro, io-uring

On 9/27/21 8:29 AM, Jens Axboe wrote:
> On 9/27/21 7:51 AM, Eric W. Biederman wrote:
>> Jens Axboe <[email protected]> writes:
>>
>>> On 9/25/21 5:05 PM, Linus Torvalds wrote:
>>>> On Sat, Sep 25, 2021 at 1:32 PM Jens Axboe <[email protected]> wrote:
>>>>>
>>>>> - io-wq core dump exit fix (me)
>>>>
>>>> Hmm.
>>>>
>>>> That one strikes me as odd.
>>>>
>>>> I get the feeling that if the io_uring thread needs to have that
>>>> signal_group_exit() test, something is wrong in signal-land.
>>>>
>>>> It's basically a "fatal signal has been sent to another thread", and I
>>>> really get the feeling that "fatal_signal_pending()" should just be
>>>> modified to handle that case too.
>>>
>>> It did surprise me as well, which is why that previous change ended up
>>> being broken for the coredump case... You could argue that the io-wq
>>> thread should just exit on signal_pending(), which is what we did
>>> before, but that really ends up sucking for workloads that do use
>>> signals for communication purposes. postgres was the reporter here.
>>
>> The primary function get_signal is to make signals not pending.  So I
>> don't understand any use of testing signal_pending after a call to
>> get_signal.
>>
>> My confusion doubles when I consider the fact io_uring threads should
>> only be dequeuing SIGSTOP and SIGKILL.
>>
>> I am concerned that an io_uring thread that dequeues SIGKILL won't call
>> signal_group_exit and thus kill the other threads in the thread group.
>>
>> What motivated removing the break and adding the fatal_signal_pending
>> test?
> 
> I played with this a bit this morning, and I agree it doesn't seem to be
> needed at all. The original issue was with postgres, I'll give that a
> whirl as well and see if we run into any unwarranted exits. My simpler
> test case did not.

Ran the postgres test, and we get tons of io-wq exiting on get_signal()
returning true. Took a closer look, and it actually looks very much
expected, as it's a SIGKILL to the original task.

So it looks like I was indeed wrong, and this probably masked the
original issue that was fixed in that series. I've been running with
this:

diff --git a/fs/io-wq.c b/fs/io-wq.c
index c2360cdc403d..afd1db8e000d 100644
--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -584,10 +584,9 @@ static int io_wqe_worker(void *data)
 
 			if (!get_signal(&ksig))
 				continue;
-			if (fatal_signal_pending(current) ||
-			    signal_group_exit(current->signal))
-				break;
-			continue;
+			if (ksig.sig != SIGKILL)
+				printk("exit on sig! fatal? %d, sig=%d\n", fatal_signal_pending(current), ksig.sig);
+			break;
 		}
 		last_timeout = !ret;
 	}

and it's running fine and, as expected, we don't generate any printk
activity as these are all fatal deliveries to the parent.

-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [GIT PULL] io_uring fixes for 5.15-rc3
  2021-09-27 14:59         ` Jens Axboe
@ 2021-09-27 15:13           ` Eric W. Biederman
  2021-09-27 15:41             ` Jens Axboe
  0 siblings, 1 reply; 12+ messages in thread
From: Eric W. Biederman @ 2021-09-27 15:13 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Linus Torvalds, Oleg Nesterov, Al Viro, io-uring

Jens Axboe <[email protected]> writes:

> On 9/27/21 8:29 AM, Jens Axboe wrote:
>> On 9/27/21 7:51 AM, Eric W. Biederman wrote:
>>> Jens Axboe <[email protected]> writes:
>>>
>>>> On 9/25/21 5:05 PM, Linus Torvalds wrote:
>>>>> On Sat, Sep 25, 2021 at 1:32 PM Jens Axboe <[email protected]> wrote:
>>>>>>
>>>>>> - io-wq core dump exit fix (me)
>>>>>
>>>>> Hmm.
>>>>>
>>>>> That one strikes me as odd.
>>>>>
>>>>> I get the feeling that if the io_uring thread needs to have that
>>>>> signal_group_exit() test, something is wrong in signal-land.
>>>>>
>>>>> It's basically a "fatal signal has been sent to another thread", and I
>>>>> really get the feeling that "fatal_signal_pending()" should just be
>>>>> modified to handle that case too.
>>>>
>>>> It did surprise me as well, which is why that previous change ended up
>>>> being broken for the coredump case... You could argue that the io-wq
>>>> thread should just exit on signal_pending(), which is what we did
>>>> before, but that really ends up sucking for workloads that do use
>>>> signals for communication purposes. postgres was the reporter here.
>>>
>>> The primary function get_signal is to make signals not pending.  So I
>>> don't understand any use of testing signal_pending after a call to
>>> get_signal.
>>>
>>> My confusion doubles when I consider the fact io_uring threads should
>>> only be dequeuing SIGSTOP and SIGKILL.
>>>
>>> I am concerned that an io_uring thread that dequeues SIGKILL won't call
>>> signal_group_exit and thus kill the other threads in the thread group.
>>>
>>> What motivated removing the break and adding the fatal_signal_pending
>>> test?
>> 
>> I played with this a bit this morning, and I agree it doesn't seem to be
>> needed at all. The original issue was with postgres, I'll give that a
>> whirl as well and see if we run into any unwarranted exits. My simpler
>> test case did not.
>
> Ran the postgres test, and we get tons of io-wq exiting on get_signal()
> returning true. Took a closer look, and it actually looks very much
> expected, as it's a SIGKILL to the original task.
>
> So it looks like I was indeed wrong, and this probably masked the
> original issue that was fixed in that series. I've been running with
> this:
>
> diff --git a/fs/io-wq.c b/fs/io-wq.c
> index c2360cdc403d..afd1db8e000d 100644
> --- a/fs/io-wq.c
> +++ b/fs/io-wq.c
> @@ -584,10 +584,9 @@ static int io_wqe_worker(void *data)
>  
>  			if (!get_signal(&ksig))
>  				continue;
> -			if (fatal_signal_pending(current) ||
> -			    signal_group_exit(current->signal))
> -				break;
> -			continue;
> +			if (ksig.sig != SIGKILL)
> +				printk("exit on sig! fatal? %d, sig=%d\n", fatal_signal_pending(current), ksig.sig);
> +			break;
>  		}
>  		last_timeout = !ret;
>  	}
>
> and it's running fine and, as expected, we don't generate any printk
> activity as these are all fatal deliveries to the parent.

Good.  So just a break should be fine.

A little bit of me is concerned about not calling do_group_exit in this
case.  Fortunately it is not a problem as complete_signal kills all of
the threads in a signal_group when SIGKILL is delivered.

So at least until something else is refactored and io_uring threads
unblock another fatal signal all is well.

Eric

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [GIT PULL] io_uring fixes for 5.15-rc3
  2021-09-27 15:13           ` Eric W. Biederman
@ 2021-09-27 15:41             ` Jens Axboe
  2021-09-27 15:52               ` Eric W. Biederman
  0 siblings, 1 reply; 12+ messages in thread
From: Jens Axboe @ 2021-09-27 15:41 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Linus Torvalds, Oleg Nesterov, Al Viro, io-uring

On 9/27/21 9:13 AM, Eric W. Biederman wrote:
> Jens Axboe <[email protected]> writes:
> 
>> On 9/27/21 8:29 AM, Jens Axboe wrote:
>>> On 9/27/21 7:51 AM, Eric W. Biederman wrote:
>>>> Jens Axboe <[email protected]> writes:
>>>>
>>>>> On 9/25/21 5:05 PM, Linus Torvalds wrote:
>>>>>> On Sat, Sep 25, 2021 at 1:32 PM Jens Axboe <[email protected]> wrote:
>>>>>>>
>>>>>>> - io-wq core dump exit fix (me)
>>>>>>
>>>>>> Hmm.
>>>>>>
>>>>>> That one strikes me as odd.
>>>>>>
>>>>>> I get the feeling that if the io_uring thread needs to have that
>>>>>> signal_group_exit() test, something is wrong in signal-land.
>>>>>>
>>>>>> It's basically a "fatal signal has been sent to another thread", and I
>>>>>> really get the feeling that "fatal_signal_pending()" should just be
>>>>>> modified to handle that case too.
>>>>>
>>>>> It did surprise me as well, which is why that previous change ended up
>>>>> being broken for the coredump case... You could argue that the io-wq
>>>>> thread should just exit on signal_pending(), which is what we did
>>>>> before, but that really ends up sucking for workloads that do use
>>>>> signals for communication purposes. postgres was the reporter here.
>>>>
>>>> The primary function get_signal is to make signals not pending.  So I
>>>> don't understand any use of testing signal_pending after a call to
>>>> get_signal.
>>>>
>>>> My confusion doubles when I consider the fact io_uring threads should
>>>> only be dequeuing SIGSTOP and SIGKILL.
>>>>
>>>> I am concerned that an io_uring thread that dequeues SIGKILL won't call
>>>> signal_group_exit and thus kill the other threads in the thread group.
>>>>
>>>> What motivated removing the break and adding the fatal_signal_pending
>>>> test?
>>>
>>> I played with this a bit this morning, and I agree it doesn't seem to be
>>> needed at all. The original issue was with postgres, I'll give that a
>>> whirl as well and see if we run into any unwarranted exits. My simpler
>>> test case did not.
>>
>> Ran the postgres test, and we get tons of io-wq exiting on get_signal()
>> returning true. Took a closer look, and it actually looks very much
>> expected, as it's a SIGKILL to the original task.
>>
>> So it looks like I was indeed wrong, and this probably masked the
>> original issue that was fixed in that series. I've been running with
>> this:
>>
>> diff --git a/fs/io-wq.c b/fs/io-wq.c
>> index c2360cdc403d..afd1db8e000d 100644
>> --- a/fs/io-wq.c
>> +++ b/fs/io-wq.c
>> @@ -584,10 +584,9 @@ static int io_wqe_worker(void *data)
>>  
>>  			if (!get_signal(&ksig))
>>  				continue;
>> -			if (fatal_signal_pending(current) ||
>> -			    signal_group_exit(current->signal))
>> -				break;
>> -			continue;
>> +			if (ksig.sig != SIGKILL)
>> +				printk("exit on sig! fatal? %d, sig=%d\n", fatal_signal_pending(current), ksig.sig);
>> +			break;
>>  		}
>>  		last_timeout = !ret;
>>  	}
>>
>> and it's running fine and, as expected, we don't generate any printk
>> activity as these are all fatal deliveries to the parent.
> 
> Good.  So just a break should be fine.

Indeed, I'll send out a patch for that.

> A little bit of me is concerned about not calling do_group_exit in this
> case.  Fortunately it is not a problem as complete_signal kills all of
> the threads in a signal_group when SIGKILL is delivered.
> 
> So at least until something else is refactored and io_uring threads
> unblock another fatal signal all is well.

Should we put a comment in io-wq to that effect? I don't see why we'd
ever unblock other signals there, but...

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [GIT PULL] io_uring fixes for 5.15-rc3
  2021-09-27 15:41             ` Jens Axboe
@ 2021-09-27 15:52               ` Eric W. Biederman
  2021-09-27 16:03                 ` Jens Axboe
  0 siblings, 1 reply; 12+ messages in thread
From: Eric W. Biederman @ 2021-09-27 15:52 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Linus Torvalds, Oleg Nesterov, Al Viro, io-uring

Jens Axboe <[email protected]> writes:

> On 9/27/21 9:13 AM, Eric W. Biederman wrote:
>> Jens Axboe <[email protected]> writes:
>> 
>>> On 9/27/21 8:29 AM, Jens Axboe wrote:
>>>> On 9/27/21 7:51 AM, Eric W. Biederman wrote:
>>>>> Jens Axboe <[email protected]> writes:
>>>>>
>>>>>> On 9/25/21 5:05 PM, Linus Torvalds wrote:
>>>>>>> On Sat, Sep 25, 2021 at 1:32 PM Jens Axboe <[email protected]> wrote:
>>>>>>>>
>>>>>>>> - io-wq core dump exit fix (me)
>>>>>>>
>>>>>>> Hmm.
>>>>>>>
>>>>>>> That one strikes me as odd.
>>>>>>>
>>>>>>> I get the feeling that if the io_uring thread needs to have that
>>>>>>> signal_group_exit() test, something is wrong in signal-land.
>>>>>>>
>>>>>>> It's basically a "fatal signal has been sent to another thread", and I
>>>>>>> really get the feeling that "fatal_signal_pending()" should just be
>>>>>>> modified to handle that case too.
>>>>>>
>>>>>> It did surprise me as well, which is why that previous change ended up
>>>>>> being broken for the coredump case... You could argue that the io-wq
>>>>>> thread should just exit on signal_pending(), which is what we did
>>>>>> before, but that really ends up sucking for workloads that do use
>>>>>> signals for communication purposes. postgres was the reporter here.
>>>>>
>>>>> The primary function get_signal is to make signals not pending.  So I
>>>>> don't understand any use of testing signal_pending after a call to
>>>>> get_signal.
>>>>>
>>>>> My confusion doubles when I consider the fact io_uring threads should
>>>>> only be dequeuing SIGSTOP and SIGKILL.
>>>>>
>>>>> I am concerned that an io_uring thread that dequeues SIGKILL won't call
>>>>> signal_group_exit and thus kill the other threads in the thread group.
>>>>>
>>>>> What motivated removing the break and adding the fatal_signal_pending
>>>>> test?
>>>>
>>>> I played with this a bit this morning, and I agree it doesn't seem to be
>>>> needed at all. The original issue was with postgres, I'll give that a
>>>> whirl as well and see if we run into any unwarranted exits. My simpler
>>>> test case did not.
>>>
>>> Ran the postgres test, and we get tons of io-wq exiting on get_signal()
>>> returning true. Took a closer look, and it actually looks very much
>>> expected, as it's a SIGKILL to the original task.
>>>
>>> So it looks like I was indeed wrong, and this probably masked the
>>> original issue that was fixed in that series. I've been running with
>>> this:
>>>
>>> diff --git a/fs/io-wq.c b/fs/io-wq.c
>>> index c2360cdc403d..afd1db8e000d 100644
>>> --- a/fs/io-wq.c
>>> +++ b/fs/io-wq.c
>>> @@ -584,10 +584,9 @@ static int io_wqe_worker(void *data)
>>>  
>>>  			if (!get_signal(&ksig))
>>>  				continue;
>>> -			if (fatal_signal_pending(current) ||
>>> -			    signal_group_exit(current->signal))
>>> -				break;
>>> -			continue;
>>> +			if (ksig.sig != SIGKILL)
>>> +				printk("exit on sig! fatal? %d, sig=%d\n", fatal_signal_pending(current), ksig.sig);
>>> +			break;
>>>  		}
>>>  		last_timeout = !ret;
>>>  	}
>>>
>>> and it's running fine and, as expected, we don't generate any printk
>>> activity as these are all fatal deliveries to the parent.
>> 
>> Good.  So just a break should be fine.
>
> Indeed, I'll send out a patch for that.
>
>> A little bit of me is concerned about not calling do_group_exit in this
>> case.  Fortunately it is not a problem as complete_signal kills all of
>> the threads in a signal_group when SIGKILL is delivered.
>> 
>> So at least until something else is refactored and io_uring threads
>> unblock another fatal signal all is well.
>
> Should we put a comment in io-wq to that effect? I don't see why we'd
> ever unblock other signals there, but...

I suspect rather we should update this comment in get_signal
instead.

		/*
		 * PF_IO_WORKER threads will catch and exit on fatal signals
		 * themselves. They have cleanup that must be performed, so
		 * we cannot call do_exit() on their behalf.
		 */
		if (current->flags & PF_IO_WORKER)
			goto out;


Although I would not mind updating io-wq.c and io_uring.c where
they call get_signal as well. 

Eric



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [GIT PULL] io_uring fixes for 5.15-rc3
  2021-09-27 15:52               ` Eric W. Biederman
@ 2021-09-27 16:03                 ` Jens Axboe
  0 siblings, 0 replies; 12+ messages in thread
From: Jens Axboe @ 2021-09-27 16:03 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Linus Torvalds, Oleg Nesterov, Al Viro, io-uring

On 9/27/21 9:52 AM, Eric W. Biederman wrote:
> Jens Axboe <[email protected]> writes:
> 
>> On 9/27/21 9:13 AM, Eric W. Biederman wrote:
>>> Jens Axboe <[email protected]> writes:
>>>
>>>> On 9/27/21 8:29 AM, Jens Axboe wrote:
>>>>> On 9/27/21 7:51 AM, Eric W. Biederman wrote:
>>>>>> Jens Axboe <[email protected]> writes:
>>>>>>
>>>>>>> On 9/25/21 5:05 PM, Linus Torvalds wrote:
>>>>>>>> On Sat, Sep 25, 2021 at 1:32 PM Jens Axboe <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> - io-wq core dump exit fix (me)
>>>>>>>>
>>>>>>>> Hmm.
>>>>>>>>
>>>>>>>> That one strikes me as odd.
>>>>>>>>
>>>>>>>> I get the feeling that if the io_uring thread needs to have that
>>>>>>>> signal_group_exit() test, something is wrong in signal-land.
>>>>>>>>
>>>>>>>> It's basically a "fatal signal has been sent to another thread", and I
>>>>>>>> really get the feeling that "fatal_signal_pending()" should just be
>>>>>>>> modified to handle that case too.
>>>>>>>
>>>>>>> It did surprise me as well, which is why that previous change ended up
>>>>>>> being broken for the coredump case... You could argue that the io-wq
>>>>>>> thread should just exit on signal_pending(), which is what we did
>>>>>>> before, but that really ends up sucking for workloads that do use
>>>>>>> signals for communication purposes. postgres was the reporter here.
>>>>>>
>>>>>> The primary function get_signal is to make signals not pending.  So I
>>>>>> don't understand any use of testing signal_pending after a call to
>>>>>> get_signal.
>>>>>>
>>>>>> My confusion doubles when I consider the fact io_uring threads should
>>>>>> only be dequeuing SIGSTOP and SIGKILL.
>>>>>>
>>>>>> I am concerned that an io_uring thread that dequeues SIGKILL won't call
>>>>>> signal_group_exit and thus kill the other threads in the thread group.
>>>>>>
>>>>>> What motivated removing the break and adding the fatal_signal_pending
>>>>>> test?
>>>>>
>>>>> I played with this a bit this morning, and I agree it doesn't seem to be
>>>>> needed at all. The original issue was with postgres, I'll give that a
>>>>> whirl as well and see if we run into any unwarranted exits. My simpler
>>>>> test case did not.
>>>>
>>>> Ran the postgres test, and we get tons of io-wq exiting on get_signal()
>>>> returning true. Took a closer look, and it actually looks very much
>>>> expected, as it's a SIGKILL to the original task.
>>>>
>>>> So it looks like I was indeed wrong, and this probably masked the
>>>> original issue that was fixed in that series. I've been running with
>>>> this:
>>>>
>>>> diff --git a/fs/io-wq.c b/fs/io-wq.c
>>>> index c2360cdc403d..afd1db8e000d 100644
>>>> --- a/fs/io-wq.c
>>>> +++ b/fs/io-wq.c
>>>> @@ -584,10 +584,9 @@ static int io_wqe_worker(void *data)
>>>>  
>>>>  			if (!get_signal(&ksig))
>>>>  				continue;
>>>> -			if (fatal_signal_pending(current) ||
>>>> -			    signal_group_exit(current->signal))
>>>> -				break;
>>>> -			continue;
>>>> +			if (ksig.sig != SIGKILL)
>>>> +				printk("exit on sig! fatal? %d, sig=%d\n", fatal_signal_pending(current), ksig.sig);
>>>> +			break;
>>>>  		}
>>>>  		last_timeout = !ret;
>>>>  	}
>>>>
>>>> and it's running fine and, as expected, we don't generate any printk
>>>> activity as these are all fatal deliveries to the parent.
>>>
>>> Good.  So just a break should be fine.
>>
>> Indeed, I'll send out a patch for that.
>>
>>> A little bit of me is concerned about not calling do_group_exit in this
>>> case.  Fortunately it is not a problem as complete_signal kills all of
>>> the threads in a signal_group when SIGKILL is delivered.
>>>
>>> So at least until something else is refactored and io_uring threads
>>> unblock another fatal signal all is well.
>>
>> Should we put a comment in io-wq to that effect? I don't see why we'd
>> ever unblock other signals there, but...
> 
> I suspect rather we should update this comment in get_signal
> instead.
> 
> 		/*
> 		 * PF_IO_WORKER threads will catch and exit on fatal signals
> 		 * themselves. They have cleanup that must be performed, so
> 		 * we cannot call do_exit() on their behalf.
> 		 */
> 		if (current->flags & PF_IO_WORKER)
> 			goto out;
> 
> 
> Although I would not mind updating io-wq.c and io_uring.c where
> they call get_signal as well. 

Probably best to leave the explanation to the source, in get_signal(). If
you don't mind, I'll leave updating that one to you.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2021-09-27 16:03 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-09-25 20:32 [GIT PULL] io_uring fixes for 5.15-rc3 Jens Axboe
2021-09-25 23:05 ` Linus Torvalds
2021-09-26  1:20   ` Jens Axboe
2021-09-27 13:51     ` Eric W. Biederman
2021-09-27 14:29       ` Jens Axboe
2021-09-27 14:59         ` Jens Axboe
2021-09-27 15:13           ` Eric W. Biederman
2021-09-27 15:41             ` Jens Axboe
2021-09-27 15:52               ` Eric W. Biederman
2021-09-27 16:03                 ` Jens Axboe
2021-09-26  4:31   ` Eric W. Biederman
2021-09-25 23:05 ` pr-tracker-bot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox