* Re: [RFC] coredump: Do not interrupt dump for TIF_NOTIFY_SIGNAL
  [not found] ` <87h7i694ij.fsf_-_@disp2133>
@ 2021-06-09 20:33 ` Linus Torvalds
  2021-06-09 20:48   ` Eric W. Biederman
  2021-06-09 21:02   ` Olivier Langlois
  2021-10-22 14:13   ` [RFC] coredump: Do not interrupt dump for TIF_NOTIFY_SIGNAL Pavel Begunkov
  1 sibling, 2 replies; 66+ messages in thread
From: Linus Torvalds @ 2021-06-09 20:33 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Kernel Mailing List, linux-fsdevel, io-uring, Alexander Viro,
      Olivier Langlois, Jens Axboe, Pavel Begunkov, Oleg Nesterov

On Wed, Jun 9, 2021 at 1:17 PM Eric W. Biederman <[email protected]> wrote:
>
> In short the coredump code deliberately supports being interrupted by
> SIGKILL, and depends upon prepare_signal to filter out all other
> signals.

Hmm.

I have to say, that looks like the core reason for the bug: if you
want to be interrupted by a fatal signal, you shouldn't use
signal_pending(), you should use fatal_signal_pending().

Now, the fact that we haven't cleared TIF_NOTIFY_SIGNAL for the first
signal is clearly the immediate cause of this, but at the same time I
really get the feeling that the coredump aborting code should always
have used fatal_signal_pending().

We do want to be able to abort core-dumps (stuck network filesystems
is the traditional reason), but the fact that it used signal_pending()
looks buggy.

In fact, the very comment in that dump_interrupted() function seems to
acknowledge that signal_pending() is all kinds of silly.

So regardless of the fact that io_uring does seem to have messed up
this part of signals, I think the fix is not to change
signal_pending() to task_sigpending(), but to just do what the comment
suggests we should do.

But also:

> With the io_uring code comes an extra test in signal_pending
> for TIF_NOTIFY_SIGNAL (which is something about asking a task to run
> task_work_run).

Jens, is this still relevant? Maybe we can revert that whole series
now, and make the confusing difference between signal_pending() and
task_sigpending() go away again?

             Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread
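For readers following the distinction Linus draws here: the relevant
helpers live in include/linux/sched/signal.h. The following is a
paraphrased, simplified sketch of their post-12db8b690010 shape, not
verbatim kernel code:

static inline int task_sigpending(struct task_struct *p)
{
	/* Only real (classic) signal delivery sets TIF_SIGPENDING. */
	return unlikely(test_tsk_thread_flag(p, TIF_SIGPENDING));
}

static inline int signal_pending(struct task_struct *p)
{
	/*
	 * Since commit 12db8b690010 ("entry: Add support for
	 * TIF_NOTIFY_SIGNAL"), a pending task_work notification also
	 * makes signal_pending() true - which is what lets io_uring's
	 * TWA_SIGNAL task_work abort the coredump write loop.
	 */
	if (unlikely(test_tsk_thread_flag(p, TIF_NOTIFY_SIGNAL)))
		return 1;
	return task_sigpending(p);
}

static inline int __fatal_signal_pending(struct task_struct *p)
{
	return unlikely(sigismember(&p->pending.signal, SIGKILL));
}

static inline int fatal_signal_pending(struct task_struct *p)
{
	/* Only true when SIGKILL is actually queued. */
	return task_sigpending(p) && __fatal_signal_pending(p);
}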
* Re: [RFC] coredump: Do not interrupt dump for TIF_NOTIFY_SIGNAL
  2021-06-09 20:33 ` [RFC] coredump: Do not interrupt dump for TIF_NOTIFY_SIGNAL Linus Torvalds
@ 2021-06-09 20:48   ` Eric W. Biederman
  2021-06-09 20:52     ` Linus Torvalds
  2021-06-09 21:02   ` Olivier Langlois
  1 sibling, 1 reply; 66+ messages in thread
From: Eric W. Biederman @ 2021-06-09 20:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, linux-fsdevel, io-uring, Alexander Viro,
      Olivier Langlois, Jens Axboe, Pavel Begunkov, Oleg Nesterov

Linus Torvalds <[email protected]> writes:

> On Wed, Jun 9, 2021 at 1:17 PM Eric W. Biederman <[email protected]> wrote:
>>
>> In short the coredump code deliberately supports being interrupted by
>> SIGKILL, and depends upon prepare_signal to filter out all other
>> signals.
>
> Hmm.
>
> I have to say, that looks like the core reason for the bug: if you
> want to be interrupted by a fatal signal, you shouldn't use
> signal_pending(), you should use fatal_signal_pending().
>
> Now, the fact that we haven't cleared TIF_NOTIFY_SIGNAL for the first
> signal is clearly the immediate cause of this, but at the same time I
> really get the feeling that the coredump aborting code should always
> have used fatal_signal_pending().
>
> We do want to be able to abort core-dumps (stuck network filesystems
> is the traditional reason), but the fact that it used signal_pending()
> looks buggy.
>
> In fact, the very comment in that dump_interrupted() function seems to
> acknowledge that signal_pending() is all kinds of silly.
>
> So regardless of the fact that io_uring does seem to have messed up
> this part of signals, I think the fix is not to change
> signal_pending() to task_sigpending(), but to just do what the comment
> suggests we should do.

It looks like it would need to be:

static bool dump_interrupted(void)
{
	return fatal_signal_pending(current) || freezing(current);
}

As the original implementation of dump_interrupted, 528f827ee0bb
("coredump: introduce dump_interrupted()"), is deliberately allowing
the freezer to terminate the core dumps to allow for reliable system
suspend.

>
> But also:
>
>> With the io_uring code comes an extra test in signal_pending
>> for TIF_NOTIFY_SIGNAL (which is something about asking a task to run
>> task_work_run).
>
> Jens, is this still relevant? Maybe we can revert that whole series
> now, and make the confusing difference between signal_pending() and
> task_sigpending() go away again?
>
>              Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread
* Re: [RFC] coredump: Do not interrupt dump for TIF_NOTIFY_SIGNAL
  2021-06-09 20:48   ` Eric W. Biederman
@ 2021-06-09 20:52     ` Linus Torvalds
  0 siblings, 0 replies; 66+ messages in thread
From: Linus Torvalds @ 2021-06-09 20:52 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Kernel Mailing List, linux-fsdevel, io-uring, Alexander Viro,
      Olivier Langlois, Jens Axboe, Pavel Begunkov, Oleg Nesterov

On Wed, Jun 9, 2021 at 1:48 PM Eric W. Biederman <[email protected]> wrote:
>
> It looks like it would need to be:
>
> static bool dump_interrupted(void)
> {
>         return fatal_signal_pending(current) || freezing(current);
> }
>
> As the original implementation of dump_interrupted, 528f827ee0bb
> ("coredump: introduce dump_interrupted()"), is deliberately allowing
> the freezer to terminate the core dumps to allow for reliable system
> suspend.

Ack. That would seem to be the right conversion to do.

Now, I'm not sure if system suspend really should abort a core dump,
but it's clearly what we have done in the past. Maybe we'd like to
remove that "|| freezing()" at some point, but that's a separate
discussion, I think.

At least having it in that form makes it all very explicit, instead of
the current very subtle exact interaction with the TIF_NOTIFY_SIGNAL
bit (that has other meanings).

Hmm?

             Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread
* Re: [RFC] coredump: Do not interrupt dump for TIF_NOTIFY_SIGNAL
  2021-06-09 20:33 ` [RFC] coredump: Do not interrupt dump for TIF_NOTIFY_SIGNAL Linus Torvalds
  2021-06-09 20:48   ` Eric W. Biederman
@ 2021-06-09 21:02   ` Olivier Langlois
  2021-06-09 21:05     ` Eric W. Biederman
  1 sibling, 1 reply; 66+ messages in thread
From: Olivier Langlois @ 2021-06-09 21:02 UTC (permalink / raw)
  To: Linus Torvalds, Eric W. Biederman
  Cc: Linux Kernel Mailing List, linux-fsdevel, io-uring, Alexander Viro,
      Jens Axboe, Pavel Begunkov, Oleg Nesterov

On Wed, 2021-06-09 at 13:33 -0700, Linus Torvalds wrote:
> Now, the fact that we haven't cleared TIF_NOTIFY_SIGNAL for the first
> signal is clearly the immediate cause of this, but at the same time I
> really get the feeling that the coredump aborting code should always
> have used fatal_signal_pending().

I need to clarify what does happen in the io_uring situation. If
somehow TIF_NOTIFY_SIGNAL wasn't cleared, I would get a 0 byte core
dump every time, because do_coredump() does check if the dump is
interrupted before writing a single byte.

io_uring is quite a strange animal. AFAIK, the common pattern for
using a wait_queue is to insert a task into it and then put that task
to sleep until the waited-for event occurs.

io_uring places tasks into wait queues and then lets the task return
to user space to do some other stuff (like core dumping). I would
guess that this is the main reason it uses the task_work feature.

So the TIF_NOTIFY_SIGNAL does get set WHILE the core dump is written.

Greetings,
Olivier

^ permalink raw reply	[flat|nested] 66+ messages in thread
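The mechanism Olivier describes - a wait-queue callback queuing
task_work with TWA_SIGNAL - boils down to roughly the following,
paraphrased and simplified from kernel/task_work.c and
include/linux/sched/signal.h of that era (error paths and the other
notify modes are omitted):

/* Simplified sketch of the TWA_SIGNAL path, not the verbatim kernel code. */
int task_work_add(struct task_struct *task, struct callback_head *work,
		  enum task_work_notify_mode notify)
{
	/* ... atomically link @work onto task->task_works ... */

	switch (notify) {
	case TWA_SIGNAL:
		set_notify_signal(task);	/* sets TIF_NOTIFY_SIGNAL */
		break;
	/* ... TWA_NONE, TWA_RESUME ... */
	}
	return 0;
}

static inline void set_notify_signal(struct task_struct *task)
{
	/*
	 * From here on, signal_pending(task) is true - including inside
	 * the coredump write loop - until something clears the flag.
	 */
	if (!test_and_set_tsk_thread_flag(task, TIF_NOTIFY_SIGNAL) &&
	    !wake_up_state(task, TASK_INTERRUPTIBLE))
		kick_process(task);
}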
* Re: [RFC] coredump: Do not interrupt dump for TIF_NOTIFY_SIGNAL
  2021-06-09 21:02   ` Olivier Langlois
@ 2021-06-09 21:05     ` Eric W. Biederman
  2021-06-09 21:26       ` Olivier Langlois
  0 siblings, 1 reply; 66+ messages in thread
From: Eric W. Biederman @ 2021-06-09 21:05 UTC (permalink / raw)
  To: Olivier Langlois
  Cc: Linus Torvalds, Linux Kernel Mailing List, linux-fsdevel, io-uring,
      Alexander Viro, Jens Axboe, Pavel Begunkov, Oleg Nesterov

Olivier Langlois <[email protected]> writes:

> On Wed, 2021-06-09 at 13:33 -0700, Linus Torvalds wrote:
>> Now, the fact that we haven't cleared TIF_NOTIFY_SIGNAL for the first
>> signal is clearly the immediate cause of this, but at the same time I
>> really get the feeling that the coredump aborting code should always
>> have used fatal_signal_pending().
>
> I need to clarify what does happen in the io_uring situation. If
> somehow TIF_NOTIFY_SIGNAL wasn't cleared, I would get a 0 byte core
> dump every time, because do_coredump() does check if the dump is
> interrupted before writing a single byte.
>
> io_uring is quite a strange animal. AFAIK, the common pattern for
> using a wait_queue is to insert a task into it and then put that task
> to sleep until the waited-for event occurs.
>
> io_uring places tasks into wait queues and then lets the task return
> to user space to do some other stuff (like core dumping). I would
> guess that this is the main reason it uses the task_work feature.
>
> So the TIF_NOTIFY_SIGNAL does get set WHILE the core dump is written.

Did you mean?

So the TIF_NOTIFY_SIGNAL does _not_ get set WHILE the core dump is
written.

Eric

^ permalink raw reply	[flat|nested] 66+ messages in thread
* Re: [RFC] coredump: Do not interrupt dump for TIF_NOTIFY_SIGNAL
  2021-06-09 21:05     ` Eric W. Biederman
@ 2021-06-09 21:26       ` Olivier Langlois
  2021-06-09 21:56         ` Olivier Langlois
  2021-06-10 14:26         ` Eric W. Biederman
  0 siblings, 2 replies; 66+ messages in thread
From: Olivier Langlois @ 2021-06-09 21:26 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linus Torvalds, Linux Kernel Mailing List, linux-fsdevel, io-uring,
      Alexander Viro, Jens Axboe, Pavel Begunkov, Oleg Nesterov

On Wed, 2021-06-09 at 16:05 -0500, Eric W. Biederman wrote:
>>
>> So the TIF_NOTIFY_SIGNAL does get set WHILE the core dump is
>> written.
>
> Did you mean?
>
> So the TIF_NOTIFY_SIGNAL does _not_ get set WHILE the core dump is
> written.
>

Absolutely not. I really did mean what I said. Bear with me, as I am
not claiming to be an expert kernel dev yet, so feel free to correct
me if I say some heresy...

io_uring is placing my task in my TCP socket wait queue because it
wants to read data from it.

The task returns to user space and core dumps with a SEGV.

Now, my understanding is that the code waking up tasks is the NIC
driver interrupt handler, which can run while the core dump is being
written.

Does that make sense?

My testing is telling me that this is exactly what happens...

^ permalink raw reply	[flat|nested] 66+ messages in thread
* Re: [RFC] coredump: Do not interrupt dump for TIF_NOTIFY_SIGNAL
  2021-06-09 21:26       ` Olivier Langlois
@ 2021-06-09 21:56         ` Olivier Langlois
  0 siblings, 0 replies; 66+ messages in thread
From: Olivier Langlois @ 2021-06-09 21:56 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linus Torvalds, Linux Kernel Mailing List, linux-fsdevel, io-uring,
      Alexander Viro, Jens Axboe, Pavel Begunkov, Oleg Nesterov

On Wed, 2021-06-09 at 17:26 -0400, Olivier Langlois wrote:
> On Wed, 2021-06-09 at 16:05 -0500, Eric W. Biederman wrote:
>>>
>>> So the TIF_NOTIFY_SIGNAL does get set WHILE the core dump is
>>> written.
>>
>> Did you mean?
>>
>> So the TIF_NOTIFY_SIGNAL does _not_ get set WHILE the core dump is
>> written.
>>
>
> Absolutely not. I really did mean what I said. Bear with me, as I am
> not claiming to be an expert kernel dev yet, so feel free to correct
> me if I say some heresy...
>
> io_uring is placing my task in my TCP socket wait queue because it
> wants to read data from it.
>
> The task returns to user space and core dumps with a SEGV.
>
> Now, my understanding is that the code waking up tasks is the NIC
> driver interrupt handler, which can run while the core dump is being
> written.
>
> Does that make sense?
>
> My testing is telling me that this is exactly what happens...
>

Another thing to know is that dump_interrupted() isn't only called
from do_coredump().

At first, I made the mistake of thinking that if dump_interrupted()
returned false when called from do_coredump(), all was good. That is
not the case.

dump_interrupted() is also called from dump_emit(), which is called
from several places by functions inside binfmt_elf.c.

So dump_interrupted() is called several times during coredump
generation.

^ permalink raw reply	[flat|nested] 66+ messages in thread
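To see why that matters, here is the relevant write path from
fs/coredump.c as it looked around v5.12/v5.13 - a lightly abridged
sketch, not the verbatim source. dump_interrupted() is consulted
before every chunk, so a TIF_NOTIFY_SIGNAL arriving at any point
mid-dump silently truncates the file:

/* Abridged from fs/coredump.c (~v5.12); a sketch, not verbatim. */
static int __dump_emit(struct coredump_params *cprm, const void *addr, int nr)
{
	struct file *file = cprm->file;
	loff_t pos = file->f_pos;
	ssize_t n;

	if (cprm->written + nr > cprm->limit)
		return 0;
	if (dump_interrupted())	/* checked before every chunk */
		return 0;	/* caller gives up: truncated dump */
	n = __kernel_write(file, addr, nr, &pos);
	if (n != nr)		/* -ERESTARTSYS / short write also aborts */
		return 0;
	file->f_pos = pos;
	cprm->written += n;
	cprm->pos += n;
	return 1;
}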
* Re: [RFC] coredump: Do not interrupt dump for TIF_NOTIFY_SIGNAL
  2021-06-09 21:26       ` Olivier Langlois
  2021-06-09 21:56         ` Olivier Langlois
@ 2021-06-10 14:26         ` Eric W. Biederman
  2021-06-10 15:17           ` Olivier Langlois
  2021-06-10 18:58           ` [CFT][PATCH] coredump: Limit what can interrupt coredumps Eric W. Biederman
  1 sibling, 2 replies; 66+ messages in thread
From: Eric W. Biederman @ 2021-06-10 14:26 UTC (permalink / raw)
  To: Olivier Langlois
  Cc: Linus Torvalds, Linux Kernel Mailing List, linux-fsdevel, io-uring,
      Alexander Viro, Jens Axboe, Pavel Begunkov, Oleg Nesterov

Olivier Langlois <[email protected]> writes:

> On Wed, 2021-06-09 at 16:05 -0500, Eric W. Biederman wrote:
>>>
>>> So the TIF_NOTIFY_SIGNAL does get set WHILE the core dump is
>>> written.
>>
>> Did you mean?
>>
>> So the TIF_NOTIFY_SIGNAL does _not_ get set WHILE the core dump is
>> written.
>>
>
> Absolutely not. I really did mean what I said. Bear with me, as I am
> not claiming to be an expert kernel dev yet, so feel free to correct
> me if I say some heresy...

No.  I was just asking to make certain I understood what you said.

I thought you said you were getting a consistent 0 byte coredump, and
that implied that TIF_NOTIFY_SIGNAL was coming in before the coredump
even started.

> io_uring is placing my task in my TCP socket wait queue because it
> wants to read data from it.
>
> The task returns to user space and core dumps with a SEGV.
>
> Now, my understanding is that the code waking up tasks is the NIC
> driver interrupt handler, which can run while the core dump is being
> written.
>
> Does that make sense?
>
> My testing is telling me that this is exactly what happens...

If you are getting partial coredumps, that completely makes sense.

I was hoping that by this time Jens or Oleg would have been able to
chime in and at least confirm I am not missing something subtle.

I was afraid for a little bit that the filesystem code called in
dump_emit would be checking signal_pending.  After looking into that I
see that the filesystem code very reasonably limits itself to testing
fatal_signal_pending (because by definition disk I/O on unix is not
interruptible).

So I will spin up a good version of my patch (based on your patch) so
we can unbreak coredumps.

Eric

^ permalink raw reply	[flat|nested] 66+ messages in thread
* Re: [RFC] coredump: Do not interrupt dump for TIF_NOTIFY_SIGNAL
  2021-06-10 14:26         ` Eric W. Biederman
@ 2021-06-10 15:17           ` Olivier Langlois
  0 siblings, 0 replies; 66+ messages in thread
From: Olivier Langlois @ 2021-06-10 15:17 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linus Torvalds, Linux Kernel Mailing List, linux-fsdevel, io-uring,
      Alexander Viro, Jens Axboe, Pavel Begunkov, Oleg Nesterov

On Thu, 2021-06-10 at 09:26 -0500, Eric W. Biederman wrote:
> Olivier Langlois <[email protected]> writes:
>
>> On Wed, 2021-06-09 at 16:05 -0500, Eric W. Biederman wrote:
>>>>
>>>> So the TIF_NOTIFY_SIGNAL does get set WHILE the core dump is
>>>> written.
>>>
>>> Did you mean?
>>>
>>> So the TIF_NOTIFY_SIGNAL does _not_ get set WHILE the core dump
>>> is written.
>>>
>>
>> Absolutely not. I really did mean what I said. Bear with me, as I
>> am not claiming to be an expert kernel dev yet, so feel free to
>> correct me if I say some heresy...
>
> No.  I was just asking to make certain I understood what you said.
>
> I thought you said you were getting a consistent 0 byte coredump, and
> that implied that TIF_NOTIFY_SIGNAL was coming in before the coredump
> even started.

Due to the asynchronous nature of the problem, it is all random.
Sometimes I do get a 0 byte coredump. Most of the time I get a
truncated one, and very rarely (this is why the issue was so annoying)
I get a full coredump.

> So I will spin up a good version of my patch (based on your patch) so
> we can unbreak coredumps.
>
That is super nice. I am looking forward to it!

Greetings,
Olivier

^ permalink raw reply	[flat|nested] 66+ messages in thread
* [CFT][PATCH] coredump: Limit what can interrupt coredumps
  2021-06-10 14:26         ` Eric W. Biederman
  2021-06-10 15:17           ` Olivier Langlois
@ 2021-06-10 18:58           ` Eric W. Biederman
  2021-06-10 19:10             ` Linus Torvalds
  1 sibling, 1 reply; 66+ messages in thread
From: Eric W. Biederman @ 2021-06-10 18:58 UTC (permalink / raw)
  To: Olivier Langlois
  Cc: Linus Torvalds, Linux Kernel Mailing List, linux-fsdevel, io-uring,
      Alexander Viro, Jens Axboe, Pavel Begunkov, Oleg Nesterov

Olivier Langlois has been struggling with coredumps written
incompletely in processes using io_uring.

Olivier Langlois <[email protected]> writes:
> io_uring is a big user of task_work and any event that io_uring made a
> task waiting for that occurs during the core dump generation will
> generate a TIF_NOTIFY_SIGNAL.
>
> Here are the detailed steps of the problem:
> 1. io_uring calls vfs_poll() to install a task to a file wait queue
> with io_async_wake() as the wakeup function cb from io_arm_poll_handler()
> 2. wakeup function ends up calling task_work_add() with TWA_SIGNAL
> 3. task_work_add() sets the TIF_NOTIFY_SIGNAL bit by calling
> set_notify_signal()

The coredump code deliberately supports being interrupted by SIGKILL,
and depends upon prepare_signal to filter out all other signals.  Now
that signal_pending includes wake ups for TIF_NOTIFY_SIGNAL, this hack
in dump_emit by the coredump code no longer works.

Make the coredump code more robust by explicitly testing for all of
the wakeup conditions the coredump code supports.  This prevents new
wakeup conditions from breaking the coredump code, as well as fixing
the current issue.

The filesystem code that the coredump code uses already limits itself
to only aborting on fatal_signal_pending.  So it should not develop
surprising wake-up reasons either.

With dump_interrupted properly testing for the reasons it supports
being interrupted, remove the special case from prepare_signal.

Fixes: 12db8b690010 ("entry: Add support for TIF_NOTIFY_SIGNAL")
Reported-by: Olivier Langlois <[email protected]>
Signed-off-by: "Eric W. Biederman" <[email protected]>
---

Olivier, can you test this and confirm it works for you?

 fs/coredump.c   | 2 +-
 kernel/signal.c | 2 --
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/coredump.c b/fs/coredump.c
index 2868e3e171ae..c3d8fc14b993 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -519,7 +519,7 @@ static bool dump_interrupted(void)
 	 * but then we need to teach dump_write() to restart and clear
 	 * TIF_SIGPENDING.
 	 */
-	return signal_pending(current);
+	return fatal_signal_pending(current) || freezing(current);
 }
 
 static void wait_for_dump_helpers(struct file *file)
diff --git a/kernel/signal.c b/kernel/signal.c
index f7c6ffcbd044..83d534deeb76 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -943,8 +943,6 @@ static bool prepare_signal(int sig, struct task_struct *p, bool force)
 	sigset_t flush;
 
 	if (signal->flags & (SIGNAL_GROUP_EXIT | SIGNAL_GROUP_COREDUMP)) {
-		if (!(signal->flags & SIGNAL_GROUP_EXIT))
-			return sig == SIGKILL;
 		/*
 		 * The process is in the middle of dying, nothing to do.
 		 */
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 66+ messages in thread
* Re: [CFT][PATCH] coredump: Limit what can interrupt coredumps
  2021-06-10 18:58           ` [CFT][PATCH] coredump: Limit what can interrupt coredumps Eric W. Biederman
@ 2021-06-10 19:10             ` Linus Torvalds
  2021-06-10 19:18               ` Eric W. Biederman
  0 siblings, 1 reply; 66+ messages in thread
From: Linus Torvalds @ 2021-06-10 19:10 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Olivier Langlois, Linux Kernel Mailing List, linux-fsdevel, io-uring,
      Alexander Viro, Jens Axboe, Pavel Begunkov, Oleg Nesterov

On Thu, Jun 10, 2021 at 12:01 PM Eric W. Biederman
<[email protected]> wrote:
>
> diff --git a/kernel/signal.c b/kernel/signal.c
> index f7c6ffcbd044..83d534deeb76 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -943,8 +943,6 @@ static bool prepare_signal(int sig, struct task_struct *p, bool force)
>         sigset_t flush;
>
>         if (signal->flags & (SIGNAL_GROUP_EXIT | SIGNAL_GROUP_COREDUMP)) {
> -               if (!(signal->flags & SIGNAL_GROUP_EXIT))
> -                       return sig == SIGKILL;
>                 /*
>                  * The process is in the middle of dying, nothing to do.
>                  */

I do think this part of the patch is correct, but I'd like to know
what triggered this change?

It seems fairly harmless - SIGKILL used to be the only signal that was
passed through in the coredump case, now you pass through all
non-ignored signals.

But since SIGKILL is the only signal that is relevant for the
fatal_signal_pending() case, this change seems irrelevant for the
coredump issue.  Any other signals passed through won't matter.

End result: I think removing those two lines is likely a good idea,
but I also suspect it could/should just be a separate patch with a
separate explanation for it.

Hmm?

             Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread
* Re: [CFT][PATCH] coredump: Limit what can interrupt coredumps
  2021-06-10 19:10             ` Linus Torvalds
@ 2021-06-10 19:18               ` Eric W. Biederman
  2021-06-10 19:50                 ` Linus Torvalds
  0 siblings, 1 reply; 66+ messages in thread
From: Eric W. Biederman @ 2021-06-10 19:18 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Olivier Langlois, Linux Kernel Mailing List, linux-fsdevel, io-uring,
      Alexander Viro, Jens Axboe, Pavel Begunkov, Oleg Nesterov

Linus Torvalds <[email protected]> writes:

> On Thu, Jun 10, 2021 at 12:01 PM Eric W. Biederman
> <[email protected]> wrote:
>>
>> diff --git a/kernel/signal.c b/kernel/signal.c
>> index f7c6ffcbd044..83d534deeb76 100644
>> --- a/kernel/signal.c
>> +++ b/kernel/signal.c
>> @@ -943,8 +943,6 @@ static bool prepare_signal(int sig, struct task_struct *p, bool force)
>>         sigset_t flush;
>>
>>         if (signal->flags & (SIGNAL_GROUP_EXIT | SIGNAL_GROUP_COREDUMP)) {
>> -               if (!(signal->flags & SIGNAL_GROUP_EXIT))
>> -                       return sig == SIGKILL;
>>                 /*
>>                  * The process is in the middle of dying, nothing to do.
>>                  */
>
> I do think this part of the patch is correct, but I'd like to know
> what triggered this change?
>
> It seems fairly harmless - SIGKILL used to be the only signal that was
> passed through in the coredump case, now you pass through all
> non-ignored signals.
>
> But since SIGKILL is the only signal that is relevant for the
> fatal_signal_pending() case, this change seems irrelevant for the
> coredump issue.  Any other signals passed through won't matter.
>
> End result: I think removing those two lines is likely a good idea,
> but I also suspect it could/should just be a separate patch with a
> separate explanation for it.
>
> Hmm?

I just didn't want those two lines hiding any other issues we might
have in the coredumps.

That is probably better development thinking than minimal fix thinking.

I am annoyed at the moment that those two lines even exist, and I
figure they are the confusing root cause of the problem: if we had
realized all it would take was to call fatal_signal_pending ||
freezing, then we could have avoided the problem entirely.

Eric

^ permalink raw reply	[flat|nested] 66+ messages in thread
* Re: [CFT][PATCH] coredump: Limit what can interrupt coredumps
  2021-06-10 19:18               ` Eric W. Biederman
@ 2021-06-10 19:50                 ` Linus Torvalds
  2021-06-10 20:11                   ` [PATCH] " Eric W. Biederman
  0 siblings, 1 reply; 66+ messages in thread
From: Linus Torvalds @ 2021-06-10 19:50 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Olivier Langlois, Linux Kernel Mailing List, linux-fsdevel, io-uring,
      Alexander Viro, Jens Axboe, Pavel Begunkov, Oleg Nesterov

On Thu, Jun 10, 2021 at 12:18 PM Eric W. Biederman
<[email protected]> wrote:
>
> I just didn't want those two lines hiding any other issues we might
> have in the coredumps.
>
> That is probably better development thinking than minimal fix thinking.

Well, I think we should first do the minimal targeted fix (the part in
fs/coredump.c).

Then we should look at whether we could do cleanups as a result of
that fix. And I suspect the cleanups might be bigger than the
two-liner removal.

The whole SIGNAL_GROUP_COREDUMP flag was introduced for this issue.
See commit 403bad72b67d ("coredump: only SIGKILL should interrupt the
coredumping task") which introduced this all.

Now, we have since grown other users of SIGNAL_GROUP_COREDUMP - OOM
handling and the clear_child_tid thing in mm_release(). So maybe we
should keep SIGNAL_GROUP_COREDUMP around.

So maybe only those two lines end up being the ones to remove, but I'd
really like to think of it as a separate thing from the fix itself.

             Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread
* [PATCH] coredump: Limit what can interrupt coredumps
  2021-06-10 19:50                 ` Linus Torvalds
@ 2021-06-10 20:11                   ` Eric W. Biederman
  2021-06-10 21:04                     ` Linus Torvalds
                                        ` (2 more replies)
  0 siblings, 3 replies; 66+ messages in thread
From: Eric W. Biederman @ 2021-06-10 20:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Olivier Langlois, Linux Kernel Mailing List, linux-fsdevel, io-uring,
      Alexander Viro, Jens Axboe, Pavel Begunkov, Oleg Nesterov

Olivier Langlois has been struggling with coredumps being incompletely
written in processes using io_uring.

Olivier Langlois <[email protected]> writes:
> io_uring is a big user of task_work and any event that io_uring made a
> task waiting for that occurs during the core dump generation will
> generate a TIF_NOTIFY_SIGNAL.
>
> Here are the detailed steps of the problem:
> 1. io_uring calls vfs_poll() to install a task to a file wait queue
> with io_async_wake() as the wakeup function cb from io_arm_poll_handler()
> 2. wakeup function ends up calling task_work_add() with TWA_SIGNAL
> 3. task_work_add() sets the TIF_NOTIFY_SIGNAL bit by calling
> set_notify_signal()

The coredump code deliberately supports being interrupted by SIGKILL,
and depends upon prepare_signal to filter out all other signals.  Now
that signal_pending includes wake ups for TIF_NOTIFY_SIGNAL, this hack
in dump_emit by the coredump code no longer works.

Make the coredump code more robust by explicitly testing for all of
the wakeup conditions the coredump code supports.  This prevents new
wakeup conditions from breaking the coredump code, as well as fixing
the current issue.

The filesystem code that the coredump code uses already limits itself
to only aborting on fatal_signal_pending.  So it should not develop
surprising wake-up reasons either.

v2: Don't remove the now unnecessary code in prepare_signal.

Cc: [email protected]
Fixes: 12db8b690010 ("entry: Add support for TIF_NOTIFY_SIGNAL")
Reported-by: Olivier Langlois <[email protected]>
Signed-off-by: "Eric W. Biederman" <[email protected]>
---
 fs/coredump.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/coredump.c b/fs/coredump.c
index 2868e3e171ae..c3d8fc14b993 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -519,7 +519,7 @@ static bool dump_interrupted(void)
 	 * but then we need to teach dump_write() to restart and clear
 	 * TIF_SIGPENDING.
 	 */
-	return signal_pending(current);
+	return fatal_signal_pending(current) || freezing(current);
 }
 
 static void wait_for_dump_helpers(struct file *file)
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 66+ messages in thread
* Re: [PATCH] coredump: Limit what can interrupt coredumps
  2021-06-10 20:11                   ` [PATCH] " Eric W. Biederman
@ 2021-06-10 21:04                     ` Linus Torvalds
  0 siblings, 0 replies; 66+ messages in thread
From: Linus Torvalds @ 2021-06-10 21:04 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Olivier Langlois, Linux Kernel Mailing List, linux-fsdevel, io-uring,
      Alexander Viro, Jens Axboe, Pavel Begunkov, Oleg Nesterov

On Thu, Jun 10, 2021 at 1:11 PM Eric W. Biederman
<[email protected]> wrote:
>
> Make the coredump code more robust by explicitly testing for all of
> the wakeup conditions the coredump code supports.  This prevents new
> wakeup conditions from breaking the coredump code, as well as fixing
> the current issue.

Thanks, applied.

And lots of thanks to Olivier who kept debugging the odd coredump
behavior he saw.

             Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread
* Re: [PATCH] coredump: Limit what can interrupt coredumps 2021-06-10 20:11 ` [PATCH] " Eric W. Biederman 2021-06-10 21:04 ` Linus Torvalds @ 2021-06-12 14:36 ` Olivier Langlois 2021-06-12 16:26 ` Jens Axboe 2021-06-14 14:10 ` Oleg Nesterov 2 siblings, 1 reply; 66+ messages in thread From: Olivier Langlois @ 2021-06-12 14:36 UTC (permalink / raw) To: Eric W. Biederman, Linus Torvalds Cc: Linux Kernel Mailing List, linux-fsdevel, io-uring, Alexander Viro, Jens Axboe, Pavel Begunkov>, Oleg Nesterov On Thu, 2021-06-10 at 15:11 -0500, Eric W. Biederman wrote: > > Olivier Langlois has been struggling with coredumps being incompletely > written in > processes using io_uring. > > Olivier Langlois <[email protected]> writes: > > io_uring is a big user of task_work and any event that io_uring made > > a > > task waiting for that occurs during the core dump generation will > > generate a TIF_NOTIFY_SIGNAL. > > > > Here are the detailed steps of the problem: > > 1. io_uring calls vfs_poll() to install a task to a file wait queue > > with io_async_wake() as the wakeup function cb from > > io_arm_poll_handler() > > 2. wakeup function ends up calling task_work_add() with TWA_SIGNAL > > 3. task_work_add() sets the TIF_NOTIFY_SIGNAL bit by calling > > set_notify_signal() > > The coredump code deliberately supports being interrupted by SIGKILL, > and depends upon prepare_signal to filter out all other signals. Now > that signal_pending includes wake ups for TIF_NOTIFY_SIGNAL this hack > in dump_emitted by the coredump code no longer works. > > Make the coredump code more robust by explicitly testing for all of > the wakeup conditions the coredump code supports. This prevents > new wakeup conditions from breaking the coredump code, as well > as fixing the current issue. > > The filesystem code that the coredump code uses already limits > itself to only aborting on fatal_signal_pending. So it should > not develop surprising wake-up reasons either. > > v2: Don't remove the now unnecessary code in prepare_signal. > > Cc: [email protected] > Fixes: 12db8b690010 ("entry: Add support for TIF_NOTIFY_SIGNAL") > Reported-by: Olivier Langlois <[email protected]> > Signed-off-by: "Eric W. Biederman" <[email protected]> > --- > fs/coredump.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/coredump.c b/fs/coredump.c > index 2868e3e171ae..c3d8fc14b993 100644 > --- a/fs/coredump.c > +++ b/fs/coredump.c > @@ -519,7 +519,7 @@ static bool dump_interrupted(void) > * but then we need to teach dump_write() to restart and clear > * TIF_SIGPENDING. > */ > - return signal_pending(current); > + return fatal_signal_pending(current) || freezing(current); > } > > static void wait_for_dump_helpers(struct file *file) Tested-by: Olivier Langlois <[email protected]> ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH] coredump: Limit what can interrupt coredumps 2021-06-12 14:36 ` Olivier Langlois @ 2021-06-12 16:26 ` Jens Axboe 0 siblings, 0 replies; 66+ messages in thread From: Jens Axboe @ 2021-06-12 16:26 UTC (permalink / raw) To: Olivier Langlois, Eric W. Biederman, Linus Torvalds Cc: Linux Kernel Mailing List, linux-fsdevel, io-uring, Alexander Viro, Pavel Begunkov>, Oleg Nesterov On 6/12/21 8:36 AM, Olivier Langlois wrote: > On Thu, 2021-06-10 at 15:11 -0500, Eric W. Biederman wrote: >> >> Olivier Langlois has been struggling with coredumps being incompletely >> written in >> processes using io_uring. >> >> Olivier Langlois <[email protected]> writes: >>> io_uring is a big user of task_work and any event that io_uring made >>> a >>> task waiting for that occurs during the core dump generation will >>> generate a TIF_NOTIFY_SIGNAL. >>> >>> Here are the detailed steps of the problem: >>> 1. io_uring calls vfs_poll() to install a task to a file wait queue >>> with io_async_wake() as the wakeup function cb from >>> io_arm_poll_handler() >>> 2. wakeup function ends up calling task_work_add() with TWA_SIGNAL >>> 3. task_work_add() sets the TIF_NOTIFY_SIGNAL bit by calling >>> set_notify_signal() >> >> The coredump code deliberately supports being interrupted by SIGKILL, >> and depends upon prepare_signal to filter out all other signals. Now >> that signal_pending includes wake ups for TIF_NOTIFY_SIGNAL this hack >> in dump_emitted by the coredump code no longer works. >> >> Make the coredump code more robust by explicitly testing for all of >> the wakeup conditions the coredump code supports. This prevents >> new wakeup conditions from breaking the coredump code, as well >> as fixing the current issue. >> >> The filesystem code that the coredump code uses already limits >> itself to only aborting on fatal_signal_pending. So it should >> not develop surprising wake-up reasons either. >> >> v2: Don't remove the now unnecessary code in prepare_signal. >> >> Cc: [email protected] >> Fixes: 12db8b690010 ("entry: Add support for TIF_NOTIFY_SIGNAL") >> Reported-by: Olivier Langlois <[email protected]> >> Signed-off-by: "Eric W. Biederman" <[email protected]> >> --- >> fs/coredump.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/fs/coredump.c b/fs/coredump.c >> index 2868e3e171ae..c3d8fc14b993 100644 >> --- a/fs/coredump.c >> +++ b/fs/coredump.c >> @@ -519,7 +519,7 @@ static bool dump_interrupted(void) >> * but then we need to teach dump_write() to restart and clear >> * TIF_SIGPENDING. >> */ >> - return signal_pending(current); >> + return fatal_signal_pending(current) || freezing(current); >> } >> >> static void wait_for_dump_helpers(struct file *file) > > Tested-by: Olivier Langlois <[email protected]> Thanks Olivier and Eric for taking care of this. I've been mostly offline for more than a week, back at it next week. -- Jens Axboe ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH] coredump: Limit what can interrupt coredumps
  2021-06-10 20:11                   ` [PATCH] " Eric W. Biederman
  2021-06-10 21:04                     ` Linus Torvalds
  2021-06-12 14:36                     ` Olivier Langlois
@ 2021-06-14 14:10                     ` Oleg Nesterov
  2021-06-14 16:37                       ` Eric W. Biederman
  2021-06-15 22:08                       ` Eric W. Biederman
  2 siblings, 2 replies; 66+ messages in thread
From: Oleg Nesterov @ 2021-06-14 14:10 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linus Torvalds, Olivier Langlois, Linux Kernel Mailing List,
      linux-fsdevel, io-uring, Alexander Viro, Jens Axboe, Pavel Begunkov

Eric, et al, sorry for the delay, I didn't read emails for several
days.

On 06/10, Eric W. Biederman wrote:
>
> v2: Don't remove the now unnecessary code in prepare_signal.

No, that code is still needed. Otherwise any fatal signal will be
turned into SIGKILL.

> --- a/fs/coredump.c
> +++ b/fs/coredump.c
> @@ -519,7 +519,7 @@ static bool dump_interrupted(void)
>  	 * but then we need to teach dump_write() to restart and clear
>  	 * TIF_SIGPENDING.
>  	 */
> -	return signal_pending(current);
> +	return fatal_signal_pending(current) || freezing(current);
>  }

Well yes, this is what the comment says.

But note that there is another reason why dump_interrupted() returns
true if signal_pending(): it assumes that __dump_emit()->__kernel_write()
may fail anyway if signal_pending() is true. Say, pipe_write(), or iirc
nfs, perhaps something else...

That is why zap_threads() clears TIF_SIGPENDING. Perhaps it should clear
TIF_NOTIFY_SIGNAL as well and we should change io-uring to not abuse the
dumping threads?

Or perhaps we should change __dump_emit() to clear signal_pending() and
restart __kernel_write() if it fails or returns a short write.

Otherwise the change above doesn't look like a full fix to me.

Oleg.

^ permalink raw reply	[flat|nested] 66+ messages in thread
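For reference, the check Oleg is pointing at sits in pipe_write()'s
main loop in fs/pipe.c. The excerpt below is abridged from roughly
that era's kernel, not verbatim. Because it tests signal_pending()
rather than fatal_signal_pending(), a pending TIF_NOTIFY_SIGNAL is
enough to make a coredump-to-pipe write bail out with -ERESTARTSYS:

/* Abridged sketch of the pipe_write() loop in fs/pipe.c (~v5.12). */
for (;;) {
	/* ... copy as much as possible into the pipe buffers ... */

	if (!pipe_full(pipe->head, pipe->tail, pipe->max_usage))
		continue;		/* room left, keep copying */

	/* Pipe is full and the write isn't finished: wait or bail. */
	if (filp->f_flags & O_NONBLOCK) {
		if (!ret)
			ret = -EAGAIN;
		break;
	}
	if (signal_pending(current)) {	/* true for TIF_NOTIFY_SIGNAL too */
		if (!ret)
			ret = -ERESTARTSYS;
		break;
	}

	/* ... wait for a reader to drain the pipe, then retry ... */
}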
* Re: [PATCH] coredump: Limit what can interrupt coredumps
  2021-06-14 14:10                     ` Oleg Nesterov
@ 2021-06-14 16:37                       ` Eric W. Biederman
  2021-06-14 16:59                         ` Oleg Nesterov
  1 sibling, 1 reply; 66+ messages in thread
From: Eric W. Biederman @ 2021-06-14 16:37 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Linus Torvalds, Olivier Langlois, Linux Kernel Mailing List,
      linux-fsdevel, io-uring, Alexander Viro, Jens Axboe, Pavel Begunkov

Oleg Nesterov <[email protected]> writes:

> Eric, et al, sorry for the delay, I didn't read emails for several
> days.
>
> On 06/10, Eric W. Biederman wrote:
>>
>> v2: Don't remove the now unnecessary code in prepare_signal.
>
> No, that code is still needed. Otherwise any fatal signal will be
> turned into SIGKILL.
>
>> --- a/fs/coredump.c
>> +++ b/fs/coredump.c
>> @@ -519,7 +519,7 @@ static bool dump_interrupted(void)
>>  	 * but then we need to teach dump_write() to restart and clear
>>  	 * TIF_SIGPENDING.
>>  	 */
>> -	return signal_pending(current);
>> +	return fatal_signal_pending(current) || freezing(current);
>>  }
>
> Well yes, this is what the comment says.
>
> But note that there is another reason why dump_interrupted() returns
> true if signal_pending(): it assumes that __dump_emit()->__kernel_write()
> may fail anyway if signal_pending() is true. Say, pipe_write(), or iirc
> nfs, perhaps something else...

The pipe_write case is a good case to consider.  In general filesystems
are only allowed to stop if fatal_signal_pending.  It is an ancient
unix/posix thing.

> That is why zap_threads() clears TIF_SIGPENDING. Perhaps it should clear
> TIF_NOTIFY_SIGNAL as well and we should change io-uring to not abuse the
> dumping threads?

I would very much like some clarity on TIF_NOTIFY_SIGNAL.  At the very
least it would be nice if it could get renamed TIF_NOTIFY_TASK_WORK.
I don't know what it has to do with signals.

> Or perhaps we should change __dump_emit() to clear signal_pending() and
> restart __kernel_write() if it fails or returns a short write.

Given that the code needs to handle pipes, that seems a reasonable
thing to do.

Note.  All of the blocking of new signals in prepare_signal is still in
place.  So only a SIGKILL can come in and set TIF_SIGPENDING.

So I would say that the current fix is correct (and safe to backport).
But it still requires some magic in prepare_signal until we do more.

I don't understand the logic of adding work to non-io_uring threads
with task_work_add well enough to understand why that happens in the
first place.

There are a lot of silly corners here.  Yes, please, let's keep on
digging.

Eric

^ permalink raw reply	[flat|nested] 66+ messages in thread
* Re: [PATCH] coredump: Limit what can interrupt coredumps
  2021-06-14 16:37                       ` Eric W. Biederman
@ 2021-06-14 16:59                         ` Oleg Nesterov
  0 siblings, 0 replies; 66+ messages in thread
From: Oleg Nesterov @ 2021-06-14 16:59 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linus Torvalds, Olivier Langlois, Linux Kernel Mailing List,
      linux-fsdevel, io-uring, Alexander Viro, Jens Axboe, Pavel Begunkov

On 06/14, Eric W. Biederman wrote:
>
> I would very much like some clarity on TIF_NOTIFY_SIGNAL.  At the very
> least it would be nice if it could get renamed TIF_NOTIFY_TASK_WORK.

No, no, no ;)

I think that, for example, freezer should be changed to use
set_notify_signal() rather than fake_signal_wake_up(). Livepatch.
And probably it will have more users.

> I don't understand the logic of adding work to non-io_uring threads
> with task_work_add well enough to understand why that happens in the
> first place.

Same here.

Oleg.

^ permalink raw reply	[flat|nested] 66+ messages in thread
* Re: [PATCH] coredump: Limit what can interrupt coredumps
  2021-06-14 14:10                     ` Oleg Nesterov
  2021-06-14 16:37                       ` Eric W. Biederman
@ 2021-06-15 22:08                       ` Eric W. Biederman
  2021-06-16 19:23                         ` Olivier Langlois
  2021-08-05 13:06                         ` Olivier Langlois
  1 sibling, 2 replies; 66+ messages in thread
From: Eric W. Biederman @ 2021-06-15 22:08 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Linus Torvalds, Olivier Langlois, Linux Kernel Mailing List,
      linux-fsdevel, io-uring, Alexander Viro, Jens Axboe, Pavel Begunkov

Oleg Nesterov <[email protected]> writes:

>> --- a/fs/coredump.c
>> +++ b/fs/coredump.c
>> @@ -519,7 +519,7 @@ static bool dump_interrupted(void)
>>  	 * but then we need to teach dump_write() to restart and clear
>>  	 * TIF_SIGPENDING.
>>  	 */
>> -	return signal_pending(current);
>> +	return fatal_signal_pending(current) || freezing(current);
>>  }
>
> Well yes, this is what the comment says.
>
> But note that there is another reason why dump_interrupted() returns
> true if signal_pending(): it assumes that __dump_emit()->__kernel_write()
> may fail anyway if signal_pending() is true. Say, pipe_write(), or iirc
> nfs, perhaps something else...
>
> That is why zap_threads() clears TIF_SIGPENDING. Perhaps it should clear
> TIF_NOTIFY_SIGNAL as well and we should change io-uring to not abuse the
> dumping threads?
>
> Or perhaps we should change __dump_emit() to clear signal_pending() and
> restart __kernel_write() if it fails or returns a short write.
>
> Otherwise the change above doesn't look like a full fix to me.

Agreed.  The coredump to a pipe will still be short.  That needs
something additional.

The problem Olivier Langlois <[email protected]> reported was
core dumps coming up short because TIF_NOTIFY_SIGNAL was being
set during a core dump.

We can see this with pipe_write returning -ERESTARTSYS
on a full pipe if signal_pending, which includes TIF_NOTIFY_SIGNAL,
is true.

Looking further, if the thread that is core dumping initiated any
io_uring work, then io_ring_exit_work will use task_work_add to
request that that thread clean up its io_uring state.

Perhaps we can put a big comment in dump_emit and if we get back
-ERESTARTSYS run tracehook_notify_signal.  I am not seeing any locks
held at that point in the coredump, so it should be safe.  The
coredump is run inside of file_start_write which is the only potential
complication.

The code flow is complicated but it looks like the entire point of the
exercise is to call io_uring_del_task_file on the originating thread.
I suppose that keeps the locking of the xarray in io_uring_task simple.

Hmm.  All of this comes from io_uring_release.
How do we get to io_uring_release?  The coredump should
be catching everything in exit_mm before exit_files?

Confused and hopeful someone can explain to me what is going on,
and perhaps simplify it.

Eric

^ permalink raw reply	[flat|nested] 66+ messages in thread
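A minimal sketch of what Eric is suggesting might look like the
following, in the __dump_emit() write path. This is hypothetical,
untested code illustrating the idea, not a merged patch; it assumes
the tracehook_notify_signal() helper from include/linux/tracehook.h
(v5.11+), which clears TIF_NOTIFY_SIGNAL and runs any queued
task_work:

/*
 * Hypothetical sketch only - not a merged patch.  Retry the core file
 * write when it was interrupted by TIF_NOTIFY_SIGNAL (task_work)
 * rather than by a fatal signal.
 */
static ssize_t dump_write_retry(struct file *file, const void *addr,
				size_t nr, loff_t *pos)
{
	for (;;) {
		ssize_t n = __kernel_write(file, addr, nr, pos);

		if (n != -ERESTARTSYS)
			return n;
		if (fatal_signal_pending(current))
			return n;	/* SIGKILL: really give up */
		/*
		 * Clear TIF_NOTIFY_SIGNAL and run the queued task_work,
		 * then retry the interrupted write.
		 */
		tracehook_notify_signal();
	}
}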
* Re: [PATCH] coredump: Limit what can interrupt coredumps
  2021-06-15 22:08                       ` Eric W. Biederman
@ 2021-06-16 19:23                         ` Olivier Langlois
  2021-06-16 20:00                           ` Eric W. Biederman
  1 sibling, 1 reply; 66+ messages in thread
From: Olivier Langlois @ 2021-06-16 19:23 UTC (permalink / raw)
  To: Eric W. Biederman, Oleg Nesterov
  Cc: Linus Torvalds, Linux Kernel Mailing List, linux-fsdevel, io-uring,
      Alexander Viro, Jens Axboe, Pavel Begunkov

On Tue, 2021-06-15 at 17:08 -0500, Eric W. Biederman wrote:
> Oleg Nesterov <[email protected]> writes:
>
>>> --- a/fs/coredump.c
>>> +++ b/fs/coredump.c
>>> @@ -519,7 +519,7 @@ static bool dump_interrupted(void)
>>>  	 * but then we need to teach dump_write() to restart and
>>>  	 * clear TIF_SIGPENDING.
>>>  	 */
>>> -	return signal_pending(current);
>>> +	return fatal_signal_pending(current) || freezing(current);
>>>  }
>>
>> Well yes, this is what the comment says.
>>
>> But note that there is another reason why dump_interrupted() returns
>> true if signal_pending(): it assumes that
>> __dump_emit()->__kernel_write() may fail anyway if signal_pending()
>> is true. Say, pipe_write(), or iirc nfs, perhaps something else...
>>
>> That is why zap_threads() clears TIF_SIGPENDING. Perhaps it should
>> clear TIF_NOTIFY_SIGNAL as well and we should change io-uring to not
>> abuse the dumping threads?
>>
>> Or perhaps we should change __dump_emit() to clear signal_pending()
>> and restart __kernel_write() if it fails or returns a short write.
>>
>> Otherwise the change above doesn't look like a full fix to me.
>
> Agreed.  The coredump to a pipe will still be short.  That needs
> something additional.
>
> The problem Olivier Langlois <[email protected]> reported was
> core dumps coming up short because TIF_NOTIFY_SIGNAL was being
> set during a core dump.
>
> We can see this with pipe_write returning -ERESTARTSYS
> on a full pipe if signal_pending, which includes TIF_NOTIFY_SIGNAL,
> is true.
>
Eric,

I redid my test, but this time instead of dumping directly into a
file, I let the coredump be piped to the systemd coredump module, and
the coredump generation isn't working as expected when piping.

So your code review conclusions are correct.

^ permalink raw reply	[flat|nested] 66+ messages in thread
* Re: [PATCH] coredump: Limit what can interrupt coredumps
  2021-06-16 19:23                         ` Olivier Langlois
@ 2021-06-16 20:00                           ` Eric W. Biederman
  2021-06-18 20:05                             ` Olivier Langlois
  0 siblings, 1 reply; 66+ messages in thread
From: Eric W. Biederman @ 2021-06-16 20:00 UTC (permalink / raw)
  To: Olivier Langlois
  Cc: Oleg Nesterov, Linus Torvalds, Linux Kernel Mailing List,
      linux-fsdevel, io-uring, Alexander Viro, Jens Axboe, Pavel Begunkov

Olivier Langlois <[email protected]> writes:

> I redid my test, but this time instead of dumping directly into a
> file, I let the coredump be piped to the systemd coredump module, and
> the coredump generation isn't working as expected when piping.
>
> So your code review conclusions are correct.

Thank you for confirming that.

Do you know how your test program is using io_uring?

I have been trying to put the pieces together on what io_uring is
doing that stops the coredump.  The fact that it takes a little while
before it kills the coredump is a little puzzling.  The code looks
like all of the io_uring operations should have been canceled before
the coredump starts.

Thanks,
Eric

^ permalink raw reply	[flat|nested] 66+ messages in thread
* Re: [PATCH] coredump: Limit what can interrupt coredumps
  2021-06-16 20:00                           ` Eric W. Biederman
@ 2021-06-18 20:05                             ` Olivier Langlois
  0 siblings, 0 replies; 66+ messages in thread
From: Olivier Langlois @ 2021-06-18 20:05 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Oleg Nesterov, Linus Torvalds, Linux Kernel Mailing List,
      linux-fsdevel, io-uring, Alexander Viro, Jens Axboe, Pavel Begunkov

On Wed, 2021-06-16 at 15:00 -0500, Eric W. Biederman wrote:
> Olivier Langlois <[email protected]> writes:
>
>> I redid my test, but this time instead of dumping directly into a
>> file, I let the coredump be piped to the systemd coredump module,
>> and the coredump generation isn't working as expected when piping.
>>
>> So your code review conclusions are correct.
>
> Thank you for confirming that.
>
> Do you know how your test program is using io_uring?
>
> I have been trying to put the pieces together on what io_uring is
> doing that stops the coredump.  The fact that it takes a little while
> before it kills the coredump is a little puzzling.  The code looks
> like all of the io_uring operations should have been canceled before
> the coredump starts.
>
With a very simple setup, I guess that this could easily be
reproduced. Make a TCP connection with a server that is streaming
non-stop data and enter a loop where you keep initiating async
IORING_OP_READ operations on your TCP fd.

Once you have that, manually sending a SIGSEGV is a sure-fire way to
stumble into the problem. This is how I am testing the patches.

IRL, it is possible to call io_uring_enter() to submit operations and
return from the syscall without waiting on all events to have
completed. Once the process is back in userspace, if it stumbles into
a bug that triggers a coredump, any remaining pending I/O operations
can set TIF_NOTIFY_SIGNAL while the coredump is generated.

I have read the part of your previous email where you share the result
of your ongoing investigation. I didn't comment, as the definitive
references on io_uring matters are Jens and Pavel, but I am going to
share my opinion on the matter.

I think that you did put your finger on the code cleaning up the
io_uring instance in regards to pending operations. It seems to be
io_uring_release(), which is probably called from exit_files(), which
happens after the call to exit_mm().

At first, I did entertain the idea of considering whether it could be
possible to duplicate some of the operations performed by
io_uring_release() related to the infamous TIF_NOTIFY_SIGNAL setting
into io_uring_files_cancel(), which is called before exit_mm().

But the idea is useless, as it is not the other threads of the group
that are causing the TIF_NOTIFY_SIGNAL problem. It is the thread
calling do_coredump(), which is done by the signal handling code even
before that thread enters do_exit() and starts to be cleaned up.

That thread, when it enters do_coredump(), is still fully loaded and
operational in terms of io_uring functionality.

I guess that this "cancel all pending io_uring operations" hook would
have to be called from do_coredump() or from get_signal(), but if that
is the way to go, I feel this is a major enough change that I wouldn't
dare go there without the blessing of the maintainers involved...

^ permalink raw reply	[flat|nested] 66+ messages in thread
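A user-space reproducer along the lines Olivier describes could look
roughly like the sketch below. It is an untested illustration, not
Olivier's actual test program: the address and port are placeholders
(any TCP server that streams non-stop data will do), and it assumes
liburing's io_uring_prep_read() helper. Run it with "ulimit -c
unlimited" and a pipe-based core_pattern to observe the truncation.

#include <liburing.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <signal.h>
#include <stdio.h>
#include <sys/socket.h>

int main(void)
{
	struct io_uring ring;
	struct sockaddr_in addr = {
		.sin_family = AF_INET,
		.sin_port   = htons(9000),	/* placeholder port */
	};
	static char buf[8][4096];
	int fd, i;

	/* Placeholder address: any server streaming non-stop data. */
	inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);
	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		perror("connect");
		return 1;
	}
	if (io_uring_queue_init(8, &ring, 0) < 0)
		return 1;

	/* Queue async reads without waiting for completions, so the task
	 * returns to user space with poll handlers armed on the socket. */
	for (i = 0; i < 8; i++) {
		struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
		io_uring_prep_read(sqe, fd, buf[i], sizeof(buf[i]), 0);
	}
	io_uring_submit(&ring);

	raise(SIGSEGV);	/* dump core while the reads are still in flight */
	return 0;
}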
* Re: [PATCH] coredump: Limit what can interrupt coredumps
  2021-06-15 22:08                       ` Eric W. Biederman
  2021-06-16 19:23                         ` Olivier Langlois
@ 2021-08-05 13:06                         ` Olivier Langlois
  2021-08-10 21:48                           ` Tony Battersby
  1 sibling, 1 reply; 66+ messages in thread
From: Olivier Langlois @ 2021-08-05 13:06 UTC (permalink / raw)
  To: Eric W. Biederman, Oleg Nesterov
  Cc: Linus Torvalds, Linux Kernel Mailing List, linux-fsdevel, io-uring,
      Alexander Viro, Jens Axboe, Pavel Begunkov

On Tue, 2021-06-15 at 17:08 -0500, Eric W. Biederman wrote:
> Oleg Nesterov <[email protected]> writes:
>
>>> --- a/fs/coredump.c
>>> +++ b/fs/coredump.c
>>> @@ -519,7 +519,7 @@ static bool dump_interrupted(void)
>>>  	 * but then we need to teach dump_write() to restart and
>>>  	 * clear TIF_SIGPENDING.
>>>  	 */
>>> -	return signal_pending(current);
>>> +	return fatal_signal_pending(current) || freezing(current);
>>>  }
>>
>> Well yes, this is what the comment says.
>>
>> But note that there is another reason why dump_interrupted() returns
>> true if signal_pending(): it assumes that
>> __dump_emit()->__kernel_write() may fail anyway if signal_pending()
>> is true. Say, pipe_write(), or iirc nfs, perhaps something else...
>>
>> That is why zap_threads() clears TIF_SIGPENDING. Perhaps it should
>> clear TIF_NOTIFY_SIGNAL as well and we should change io-uring to not
>> abuse the dumping threads?
>>
>> Or perhaps we should change __dump_emit() to clear signal_pending()
>> and restart __kernel_write() if it fails or returns a short write.
>>
>> Otherwise the change above doesn't look like a full fix to me.
>
> Agreed.  The coredump to a pipe will still be short.  That needs
> something additional.
>
> The problem Olivier Langlois <[email protected]> reported was
> core dumps coming up short because TIF_NOTIFY_SIGNAL was being
> set during a core dump.
>
> We can see this with pipe_write returning -ERESTARTSYS
> on a full pipe if signal_pending, which includes TIF_NOTIFY_SIGNAL,
> is true.
>
> Looking further, if the thread that is core dumping initiated any
> io_uring work, then io_ring_exit_work will use task_work_add to
> request that that thread clean up its io_uring state.
>
> Perhaps we can put a big comment in dump_emit and if we get back
> -ERESTARTSYS run tracehook_notify_signal.  I am not seeing any locks
> held at that point in the coredump, so it should be safe.  The
> coredump is run inside of file_start_write which is the only
> potential complication.
>
> The code flow is complicated but it looks like the entire point of
> the exercise is to call io_uring_del_task_file on the originating
> thread.  I suppose that keeps the locking of the xarray in
> io_uring_task simple.
>
> Hmm.  All of this comes from io_uring_release.
> How do we get to io_uring_release?  The coredump should
> be catching everything in exit_mm before exit_files?
>
> Confused and hopeful someone can explain to me what is going on,
> and perhaps simplify it.
>
> Eric

Hi all,

I didn't forget about this remaining issue and I have kept thinking
about it on and off.
I did try the following on 5.12.19:

diff --git a/fs/coredump.c b/fs/coredump.c
index 07afb5ddb1c4..614fe7a54c1a 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -41,6 +41,7 @@
 #include <linux/fs.h>
 #include <linux/path.h>
 #include <linux/timekeeping.h>
+#include <linux/io_uring.h>
 
 #include <linux/uaccess.h>
 #include <asm/mmu_context.h>
@@ -625,6 +626,8 @@ void do_coredump(const kernel_siginfo_t *siginfo)
 		need_suid_safe = true;
 	}
 
+	io_uring_files_cancel(current->files);
+
 	retval = coredump_wait(siginfo->si_signo, &core_state);
 	if (retval < 0)
 		goto fail_creds;
-- 
2.32.0

With my current understanding, io_uring_files_cancel is supposed to
cancel everything that might set TIF_NOTIFY_SIGNAL.

I must report that in my testing, when generating a core dump through
a pipe with the modification above, I still get truncated core dumps.

systemd is having a weird error:
[ 2577.870742] systemd-coredump[4056]: Failed to get COMM: No such
process

and nothing is captured

so I have replaced it with a very simple shell script:
$ cat /proc/sys/kernel/core_pattern
|/home/lano1106/bin/pipe_core.sh %e %p

~/bin $ cat pipe_core.sh
#!/bin/sh

cat > /home/lano1106/core/core.$1.$2

BFD: warning: /home/lano1106/core/core.test.10886 is truncated:
expected core file size >= 24129536, found: 61440

I conclude from my attempt that maybe io_uring_files_cancel is not
100% cleaning everything that it should clean.

^ permalink raw reply related	[flat|nested] 66+ messages in thread
* Re: [PATCH] coredump: Limit what can interrupt coredumps
  2021-08-05 13:06 ` Olivier Langlois
@ 2021-08-10 21:48 ` Tony Battersby
  2021-08-11 20:47   ` Olivier Langlois
  2021-08-12  1:55   ` Jens Axboe
  0 siblings, 2 replies; 66+ messages in thread
From: Tony Battersby @ 2021-08-10 21:48 UTC (permalink / raw)
To: Olivier Langlois, Eric W. Biederman, Oleg Nesterov
Cc: Linus Torvalds, Linux Kernel Mailing List, linux-fsdevel, io-uring,
    Alexander Viro, Jens Axboe, Pavel Begunkov

On 8/5/21 9:06 AM, Olivier Langlois wrote:
> On Tue, 2021-06-15 at 17:08 -0500, Eric W. Biederman wrote:
>> Oleg Nesterov <[email protected]> writes:
>>
>>>> --- a/fs/coredump.c
>>>> +++ b/fs/coredump.c
>>>> @@ -519,7 +519,7 @@ static bool dump_interrupted(void)
>>>>  	 * but then we need to teach dump_write() to restart and clear
>>>>  	 * TIF_SIGPENDING.
>>>>  	 */
>>>> -	return signal_pending(current);
>>>> +	return fatal_signal_pending(current) || freezing(current);
>>>>  }
>>> Well yes, this is what the comment says.
>>>
>>> But note that there is another reason why dump_interrupted() returns
>>> true if signal_pending(): it assumes that
>>> __dump_emit()->__kernel_write() may fail anyway if signal_pending()
>>> is true. Say, pipe_write(), or iirc nfs, perhaps something else...
>>>
>>> That is why zap_threads() clears TIF_SIGPENDING. Perhaps it should
>>> clear TIF_NOTIFY_SIGNAL as well, and we should change io_uring to not
>>> abuse the dumping threads?
>>>
>>> Or perhaps we should change __dump_emit() to clear signal_pending()
>>> and restart __kernel_write() if it fails or returns a short write.
>>>
>>> Otherwise the change above doesn't look like a full fix to me.
>> Agreed. The coredump to a pipe will still be short. That needs
>> something additional.
>>
>> The problem Olivier Langlois <[email protected]> reported was
>> core dumps coming up short because TIF_NOTIFY_SIGNAL was being
>> set during a core dump.
>>
>> We can see this with pipe_write returning -ERESTARTSYS
>> on a full pipe if signal_pending(), which includes TIF_NOTIFY_SIGNAL,
>> is true.
>>
>> Looking further, if the thread that is core dumping initiated
>> any io_uring work then io_ring_exit_work will use task_work_add
>> to request that thread clean up its io_uring state.
>>
>> Perhaps we can put a big comment in dump_emit and if we
>> get back -ERESTARTSYS run tracehook_notify_signal. I am not
>> seeing any locks held at that point in the coredump, so it
>> should be safe. The coredump is run inside of file_start_write
>> which is the only potential complication.
>>
>> The code flow is complicated but it looks like the entire
>> point of the exercise is to call io_uring_del_task_file
>> on the originating thread. I suppose that keeps the
>> locking of the xarray in io_uring_task simple.
>>
>> Hmm. All of this comes from io_uring_release.
>> How do we get to io_uring_release? The coredump should
>> be catching everything in exit_mm before exit_files?
>>
>> Confused and hopeful someone can explain to me what is going on,
>> and perhaps simplify it.
>>
>> Eric
> Hi all,
>
> I didn't forget about this remaining issue and I have kept thinking
> about it on and off.
>
> I did try the following on 5.12.19:
>
> [...]
>
> I conclude from my attempt that io_uring_files_cancel may not be
> cleaning up everything that it should.
>
I just ran into this problem also - coredumps from an io_uring program
to a pipe are truncated. But I am using kernel 5.10.57, which does NOT
have commit 12db8b690010 ("entry: Add support for TIF_NOTIFY_SIGNAL") or
commit 06af8679449d ("coredump: Limit what can interrupt coredumps").
Kernel 5.4 works though, so I bisected the problem to commit
f38c7e3abfba ("io_uring: ensure async buffered read-retry is setup
properly") in kernel 5.9. Note that my io_uring program uses only async
buffered reads, which may be why this particular commit makes a
difference to my program.

My io_uring program is a multi-purpose long-running program with many
threads. Most threads don't use io_uring but a few of them do.
Normally, my core dumps are piped to a program so that they can be
compressed before being written to disk, but I can also test writing the
core dumps directly to disk. This is what I have found:

*) Unpatched 5.10.57: if a thread that doesn't use io_uring triggers a
coredump, the core file is written correctly, whether it is written to
disk or piped to a program, even if another thread is using io_uring at
the same time.

*) Unpatched 5.10.57: if a thread that uses io_uring triggers a
coredump, the core file is truncated, whether written directly to disk
or piped to a program.

*) 5.10.57+backport 06af8679449d: if a thread that uses io_uring
triggers a coredump, and the core is written directly to disk, then it
is written correctly.

*) 5.10.57+backport 06af8679449d: if a thread that uses io_uring
triggers a coredump, and the core is piped to a program, then it is
truncated.

*) 5.10.57+revert f38c7e3abfba: core dumps are written correctly,
whether written directly to disk or piped to a program.

Tony Battersby
Cybernetics

^ permalink raw reply	[flat|nested] 66+ messages in thread
* Re: [PATCH] coredump: Limit what can interrupt coredumps
  2021-08-10 21:48 ` Tony Battersby
@ 2021-08-11 20:47   ` Olivier Langlois
  0 siblings, 0 replies; 66+ messages in thread
From: Olivier Langlois @ 2021-08-11 20:47 UTC (permalink / raw)
To: Tony Battersby, Eric W. Biederman, Oleg Nesterov
Cc: Linus Torvalds, Linux Kernel Mailing List, linux-fsdevel, io-uring,
    Alexander Viro, Jens Axboe, Pavel Begunkov

On Tue, 2021-08-10 at 17:48 -0400, Tony Battersby wrote:
>
> I just ran into this problem also - coredumps from an io_uring program
> to a pipe are truncated. But I am using kernel 5.10.57, which does NOT
> have commit 12db8b690010 ("entry: Add support for TIF_NOTIFY_SIGNAL")
> or commit 06af8679449d ("coredump: Limit what can interrupt
> coredumps"). Kernel 5.4 works though, so I bisected the problem to
> commit f38c7e3abfba ("io_uring: ensure async buffered read-retry is
> setup properly") in kernel 5.9. Note that my io_uring program uses
> only async buffered reads, which may be why this particular commit
> makes a difference to my program.
>
> [...]
>
> *) 5.10.57+revert f38c7e3abfba: core dumps are written correctly,
> whether written directly to disk or piped to a program.
>
> Tony Battersby
> Cybernetics
>
Tony, these are super interesting details. I'm leaving for a few days
so I will not be able to look into it until I am back, but here is my
interpretation of your findings:

f38c7e3abfba makes it more likely that your task ends up in an fd read
wait queue. Previously, the io_uring request queuing was failing and
returning EAGAIN. Now it ends up using io_uring fast poll.

When the core dump gets written through a pipe, pipe_write must block
waiting for some event. If the task gets woken up by the io_uring wait
queue entry instead, that must somehow make pipe_write fail.

So the problem must be a mix of TIF_NOTIFY_SIGNAL and the fact that
io_uring wait queue entries aren't cleaned up while doing the core
dump.

I have a new modification to try out. I'll hopefully be able to submit
a patch to fix that once I come back (I cannot do it now or else I'll
never leave ;-))

^ permalink raw reply	[flat|nested] 66+ messages in thread
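Olivier's reading matches how pipe_write() bails out: once the pipe is
full the writer sleeps interruptibly, and any signal_pending()
condition (which, after the TIF_NOTIFY_SIGNAL series, includes pending
task_work notifications) aborts the write with -ERESTARTSYS or a short
count. A paraphrased sketch of the wait loop in fs/pipe.c:pipe_write(),
not verbatim kernel source; "ret" is the byte count copied so far:

	for (;;) {
		if (!pipe_full(pipe->head, pipe->tail, pipe->max_usage))
			break;				/* room again: keep copying */

		if (signal_pending(current)) {
			if (!ret)
				ret = -ERESTARTSYS;	/* nothing written yet: fail */
			break;				/* else: short write */
		}

		__pipe_unlock(pipe);
		wait_event_interruptible_exclusive(pipe->wr_wait,
						   pipe_writable(pipe));
		__pipe_lock(pipe);
	}

Either outcome (error or short write) is what dump_emit() then sees as
a truncated core.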
* Re: [PATCH] coredump: Limit what can interrupt coredumps
  2021-08-10 21:48 ` Tony Battersby
  2021-08-11 20:47   ` Olivier Langlois
@ 2021-08-12  1:55   ` Jens Axboe
  2021-08-12 13:53     ` Tony Battersby
  2021-08-15 20:42     ` Olivier Langlois
  1 sibling, 2 replies; 66+ messages in thread
From: Jens Axboe @ 2021-08-12  1:55 UTC (permalink / raw)
To: Tony Battersby, Olivier Langlois, Eric W. Biederman, Oleg Nesterov
Cc: Linus Torvalds, Linux Kernel Mailing List, linux-fsdevel, io-uring,
    Alexander Viro, Pavel Begunkov

On 8/10/21 3:48 PM, Tony Battersby wrote:
> On 8/5/21 9:06 AM, Olivier Langlois wrote:
>> [...]
>>
>> I conclude from my attempt that io_uring_files_cancel may not be
>> cleaning up everything that it should.
>>
> I just ran into this problem also - coredumps from an io_uring program
> to a pipe are truncated. But I am using kernel 5.10.57, which does NOT
> have commit 12db8b690010 ("entry: Add support for TIF_NOTIFY_SIGNAL")
> or commit 06af8679449d ("coredump: Limit what can interrupt
> coredumps"). Kernel 5.4 works though, so I bisected the problem to
> commit f38c7e3abfba ("io_uring: ensure async buffered read-retry is
> setup properly") in kernel 5.9. Note that my io_uring program uses
> only async buffered reads, which may be why this particular commit
> makes a difference to my program.
>
> My io_uring program is a multi-purpose long-running program with many
> threads. Most threads don't use io_uring but a few of them do.
> Normally, my core dumps are piped to a program so that they can be
> compressed before being written to disk, but I can also test writing
> the core dumps directly to disk. This is what I have found:
>
> *) Unpatched 5.10.57: if a thread that doesn't use io_uring triggers a
> coredump, the core file is written correctly, whether it is written to
> disk or piped to a program, even if another thread is using io_uring
> at the same time.
>
> *) Unpatched 5.10.57: if a thread that uses io_uring triggers a
> coredump, the core file is truncated, whether written directly to disk
> or piped to a program.
>
> *) 5.10.57+backport 06af8679449d: if a thread that uses io_uring
> triggers a coredump, and the core is written directly to disk, then it
> is written correctly.
>
> *) 5.10.57+backport 06af8679449d: if a thread that uses io_uring
> triggers a coredump, and the core is piped to a program, then it is
> truncated.
>
> *) 5.10.57+revert f38c7e3abfba: core dumps are written correctly,
> whether written directly to disk or piped to a program.

That is very interesting. Like Olivier mentioned, it's not that actual
commit, but rather the change of behavior implemented by it. Before
that commit, we'd hit the async workers more often, whereas after it we
do the correct retry method, where the retry is driven by the wakeup
when the page is unlocked. This is purely speculation, but perhaps the
fact that the process changes state potentially mid dump is why the
dump ends up being truncated?

I'd love to dive into this and try and figure it out. Absent a test
case, at least the above gives me an idea of what to try out. I'll see
if it makes it easier for me to create a case that does result in a
truncated core dump.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 66+ messages in thread
* Re: [PATCH] coredump: Limit what can interrupt coredumps
  2021-08-12  1:55   ` Jens Axboe
@ 2021-08-12 13:53     ` Tony Battersby
  0 siblings, 0 replies; 66+ messages in thread
From: Tony Battersby @ 2021-08-12 13:53 UTC (permalink / raw)
To: Jens Axboe, Olivier Langlois, Eric W. Biederman, Oleg Nesterov
Cc: Linus Torvalds, Linux Kernel Mailing List, linux-fsdevel, io-uring,
    Alexander Viro, Pavel Begunkov

On 8/11/21 9:55 PM, Jens Axboe wrote:
>
> That is very interesting. [...]
>
> I'd love to dive into this and try and figure it out. Absent a test
> case, at least the above gives me an idea of what to try out. I'll see
> if it makes it easier for me to create a case that does result in a
> truncated core dump.
>
If it helps, a "good" coredump from my program is about 350 MB,
compressed down to about 7 MB by bzip2. A truncated coredump varies in
size from about 60 KB to about 2 MB before compression. The program
that receives the coredump uses bzip2 to compress the data before
writing it to disk.

Tony

^ permalink raw reply	[flat|nested] 66+ messages in thread
* Re: [PATCH] coredump: Limit what can interrupt coredumps
  2021-08-12  1:55   ` Jens Axboe
  2021-08-12 13:53     ` Tony Battersby
@ 2021-08-15 20:42     ` Olivier Langlois
  2021-08-16 13:02       ` Pavel Begunkov
  2021-08-17 18:15       ` Jens Axboe
  1 sibling, 2 replies; 66+ messages in thread
From: Olivier Langlois @ 2021-08-15 20:42 UTC (permalink / raw)
To: Jens Axboe, Tony Battersby, Eric W. Biederman, Oleg Nesterov
Cc: Linus Torvalds, Linux Kernel Mailing List, linux-fsdevel, io-uring,
    Alexander Viro, Pavel Begunkov

On Wed, 2021-08-11 at 19:55 -0600, Jens Axboe wrote:
> On 8/10/21 3:48 PM, Tony Battersby wrote:
> > [...]
>
> That is very interesting. Like Olivier mentioned, it's not that actual
> commit, but rather the change of behavior implemented by it. Before
> that commit, we'd hit the async workers more often, whereas after it
> we do the correct retry method, where the retry is driven by the
> wakeup when the page is unlocked. This is purely speculation, but
> perhaps the fact that the process changes state potentially mid dump
> is why the dump ends up being truncated?
>
> I'd love to dive into this and try and figure it out. Absent a test
> case, at least the above gives me an idea of what to try out. I'll see
> if it makes it easier for me to create a case that does result in a
> truncated core dump.
>
Jens,

When I first encountered the issue, the very first thing that I tried
was to create a simple test program that would synthesize the problem.

After a few time-consuming failed attempts, I just gave up the idea and
simply settled for my prod program, which showcases the problem
systematically every time that I kill the process with a SEGV signal.

In a nutshell, all the program does is issue read operations with
io_uring on a TCP socket on which there is a constant data stream.

Now that I have a better understanding of what is going on, I think
that one way to reproduce the problem consistently could be along those
lines (see the sketch after this message):

1. Create a pipe
2. fork a child
3. Initiate a read operation on the pipe with io_uring from the child
4. Let the parent kill its child with a core dump generating signal.
5. Write something in the pipe from the parent so that the io_uring
   read operation completes while the core dump is generated.

I guess that I'll end up doing that if I cannot fix the issue with my
current setup, but here is what I have attempted so far:

1. Call io_uring_files_cancel from do_coredump
2. Same as #1 but also make sure that TIF_NOTIFY_SIGNAL is cleared on
   returning from io_uring_files_cancel

Those attempts didn't work, but lurking on the io_uring dev mailing
list is starting to pay off. I thought that I had reached the bottom of
the rabbit hole in my journey of understanding io_uring, but the recent
patch set sent by Hao Xu

https://lore.kernel.org/io-uring/[email protected]/T/#t

made me realize that I still haven't assimilated all the small io_uring
nuances...

Here is my feedback. From my casual io_uring code reader point of view,
it is not 100% obvious what the difference is between
io_uring_files_cancel and io_uring_task_cancel.

It seems like io_uring_files_cancel is cancelling polls only if they
have the REQ_F_INFLIGHT flag set.

I have no idea what an inflight request means and why someone would
want to call io_uring_files_cancel over io_uring_task_cancel.

I guess that if I were to meditate on the question for a few hours, I
would at some point have some illumination strike me, but I believe
that it could be a good idea to document those concepts in the code to
help casual readers...

Bottom line, I now understand that io_uring_files_cancel does not
cancel all the requests. Therefore, without fully understanding what I
am doing, I am going to replace my call to io_uring_files_cancel from
do_coredump with io_uring_task_cancel and see if this finally fixes the
issue for good.

What I am trying to do is to cancel pending io_uring requests to make
sure that TIF_NOTIFY_SIGNAL isn't set while the core dump is generated.

Maybe another solution would simply be to modify __dump_emit to make it
resilient to TIF_NOTIFY_SIGNAL, as Eric W. Biederman originally
suggested.

Or maybe do both...

Not sure which approach is best. If someone has an opinion, I would be
curious to hear it.

Greetings,

^ permalink raw reply	[flat|nested] 66+ messages in thread
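A sketch of the reproducer idea Olivier outlines above, using liburing
(untested; the queue depth, buffer size and sleep/usleep timings are
invented, and error handling is omitted). Whether the race window is
wide enough to hit reliably is exactly the hard part he mentions:

/* build with: gcc repro.c -o repro -luring (hypothetical reproducer) */
#include <liburing.h>
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
	int pfd[2];
	pipe(pfd);

	pid_t pid = fork();
	if (pid == 0) {		/* child: arm an io_uring read on the pipe */
		struct io_uring ring;
		struct io_uring_sqe *sqe;
		struct io_uring_cqe *cqe;
		static char buf[4096];

		io_uring_queue_init(8, &ring, 0);
		sqe = io_uring_get_sqe(&ring);
		io_uring_prep_read(sqe, pfd[0], buf, sizeof(buf), 0);
		io_uring_submit(&ring);
		io_uring_wait_cqe(&ring, &cqe);	/* parked here when killed */
		return 0;
	}

	sleep(1);			/* let the child arm the read */
	kill(pid, SIGSEGV);		/* trigger a core dump */
	usleep(100 * 1000);		/* give do_coredump() time to start */
	write(pfd[1], "x", 1);		/* complete the read mid-dump */
	waitpid(pid, NULL, 0);
	return 0;
}

The write from the parent is what should queue task_work to the dumping
child and raise TIF_NOTIFY_SIGNAL while the core is being written.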
* Re: [PATCH] coredump: Limit what can interrupt coredumps
  2021-08-15 20:42     ` Olivier Langlois
@ 2021-08-16 13:02       ` Pavel Begunkov
  2021-08-16 13:06         ` Pavel Begunkov
  0 siblings, 1 reply; 66+ messages in thread
From: Pavel Begunkov @ 2021-08-16 13:02 UTC (permalink / raw)
To: Olivier Langlois, Jens Axboe, Tony Battersby, Eric W. Biederman,
    Oleg Nesterov
Cc: Linus Torvalds, Linux Kernel Mailing List, linux-fsdevel, io-uring,
    Alexander Viro

On 8/15/21 9:42 PM, Olivier Langlois wrote:
[...]
> Here is my feedback. From my casual io_uring code reader point of
> view, it is not 100% obvious what the difference is between
> io_uring_files_cancel and io_uring_task_cancel.

As you mentioned, io_uring_task_cancel() cancels and waits for all
requests submitted by the current task; it is used in exec() and SQPOLL
because of potential races. io_uring_files_cancel() cancels and waits
for only the specific requests that we absolutely have to, e.g. in 5.15
it'll be only requests referencing the ring itself. It's used on normal
task exit.

> It seems like io_uring_files_cancel is cancelling polls only if they
> have the REQ_F_INFLIGHT flag set.
>
> I have no idea what an inflight request means and why someone would
> want to call io_uring_files_cancel over io_uring_task_cancel.
>
> [...]
>
> Not sure which approach is best. If someone has an opinion, I would be
> curious to hear it.

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 66+ messages in thread
* Re: [PATCH] coredump: Limit what can interrupt coredumps
  2021-08-16 13:02       ` Pavel Begunkov
@ 2021-08-16 13:06         ` Pavel Begunkov
  0 siblings, 0 replies; 66+ messages in thread
From: Pavel Begunkov @ 2021-08-16 13:06 UTC (permalink / raw)
To: Olivier Langlois, Jens Axboe, Tony Battersby, Eric W. Biederman,
    Oleg Nesterov
Cc: Linus Torvalds, Linux Kernel Mailing List, linux-fsdevel, io-uring,
    Alexander Viro

On 8/16/21 2:02 PM, Pavel Begunkov wrote:
> On 8/15/21 9:42 PM, Olivier Langlois wrote:
> [...]
>> Here is my feedback. From my casual io_uring code reader point of
>> view, it is not 100% obvious what the difference is between
>> io_uring_files_cancel and io_uring_task_cancel.
>
> As you mentioned, io_uring_task_cancel() cancels and waits for all
> requests submitted by the current task, used in exec() and SQPOLL
> because of potential races.

Apologies for this draft rumbling...

As you mentioned, io_uring_task_cancel() cancels and waits for all
requests submitted by the current task; it is used in exec() and SQPOLL
because of potential races. io_uring_files_cancel() cancels only
selected ones, e.g. in 5.15 it will be only requests operating on the
io_uring itself, and it is used during normal exit.

Agree that the names may be not too descriptive.

>> It seems like io_uring_files_cancel is cancelling polls only if they
>> have the REQ_F_INFLIGHT flag set.
>>
>> [...]

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 66+ messages in thread
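At the API level, the split Pavel describes looks roughly like this (a
sketch of the 5.12-era wrappers, paraphrased from
include/linux/io_uring.h; the exact signatures and checks have shifted
between releases, so treat this as illustrative only):

/* Paraphrased sketch, not verbatim kernel source. */
static inline void io_uring_files_cancel(struct files_struct *files)
{
	/* Normal task exit: cancel and wait for only the "inflight"
	 * (REQ_F_INFLIGHT) requests that must not outlive the task. */
	if (current->io_uring)
		__io_uring_files_cancel(files);
}

static inline void io_uring_task_cancel(void)
{
	/* exec() and SQPOLL teardown: cancel and wait for *all*
	 * requests submitted by the current task. */
	if (current->io_uring)
		__io_uring_task_cancel();
}

The __io_uring_*() helpers in fs/io_uring.c do the real work; the
wrappers only gate on whether the task ever touched io_uring.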
* Re: [PATCH] coredump: Limit what can interrupt coredumps
  2021-08-15 20:42     ` Olivier Langlois
  2021-08-16 13:02       ` Pavel Begunkov
@ 2021-08-17 18:15       ` Jens Axboe
  2021-08-17 18:24         ` Jens Axboe
  2021-08-21  9:48         ` Olivier Langlois
  1 sibling, 2 replies; 66+ messages in thread
From: Jens Axboe @ 2021-08-17 18:15 UTC (permalink / raw)
To: Olivier Langlois, Tony Battersby, Eric W. Biederman, Oleg Nesterov
Cc: Linus Torvalds, Linux Kernel Mailing List, linux-fsdevel, io-uring,
    Alexander Viro, Pavel Begunkov

On 8/15/21 2:42 PM, Olivier Langlois wrote:
> [...]
>
> What I am trying to do is to cancel pending io_uring requests to make
> sure that TIF_NOTIFY_SIGNAL isn't set while the core dump is
> generated.
>
> Maybe another solution would simply be to modify __dump_emit to make
> it resilient to TIF_NOTIFY_SIGNAL, as Eric W. Biederman originally
> suggested.
>
> Or maybe do both...
>
> Not sure which approach is best. If someone has an opinion, I would be
> curious to hear it.

It does indeed sound like it's TIF_NOTIFY_SIGNAL that will trigger some
signal_pending() and cause an interruption of the core dump. Just out
of curiosity, what is your /proc/sys/kernel/core_pattern set to? If
it's set to some piped process, can you try and set it to 'core' and
see if that eliminates the truncation of the core dumps for your case?

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 66+ messages in thread
* Re: [PATCH] coredump: Limit what can interrupt coredumps
  2021-08-17 18:15       ` Jens Axboe
@ 2021-08-17 18:24         ` Jens Axboe
  2021-08-17 19:29           ` Tony Battersby
  2021-08-21  9:52           ` Olivier Langlois
  1 sibling, 2 replies; 66+ messages in thread
From: Jens Axboe @ 2021-08-17 18:24 UTC (permalink / raw)
To: Olivier Langlois, Tony Battersby, Eric W. Biederman, Oleg Nesterov
Cc: Linus Torvalds, Linux Kernel Mailing List, linux-fsdevel, io-uring,
    Alexander Viro, Pavel Begunkov

On 8/17/21 12:15 PM, Jens Axboe wrote:
> On 8/15/21 2:42 PM, Olivier Langlois wrote:
>> [...]
>
> It does indeed sound like it's TIF_NOTIFY_SIGNAL that will trigger
> some signal_pending() and cause an interruption of the core dump. Just
> out of curiosity, what is your /proc/sys/kernel/core_pattern set to?
> If it's set to some piped process, can you try and set it to 'core'
> and see if that eliminates the truncation of the core dumps for your
> case?

And assuming that works, then I suspect this one would fix your issue
even with a piped core dump:

diff --git a/fs/coredump.c b/fs/coredump.c
index 07afb5ddb1c4..852737a9ccbf 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -41,6 +41,7 @@
 #include <linux/fs.h>
 #include <linux/path.h>
 #include <linux/timekeeping.h>
+#include <linux/io_uring.h>

 #include <linux/uaccess.h>
 #include <asm/mmu_context.h>
@@ -603,6 +604,7 @@ void do_coredump(const kernel_siginfo_t *siginfo)
 	};

 	audit_core_dumps(siginfo->si_signo);
+	io_uring_task_cancel();

 	binfmt = mm->binfmt;
 	if (!binfmt || !binfmt->core_dump)

-- 
Jens Axboe

^ permalink raw reply related	[flat|nested] 66+ messages in thread
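The competing approach floated earlier by Oleg and Eric, making the
dump writer itself resilient instead of cancelling io_uring up front,
would look something like the following hypothetical sketch of
__dump_emit(). This is illustrative only, not a posted patch; the loop
structure and field usage are simplified, and it leans on
tracehook_notify_signal(), which clears TIF_NOTIFY_SIGNAL and runs the
queued task_work:

/* Hypothetical sketch of a retrying __dump_emit(); not a real patch. */
static int __dump_emit(struct coredump_params *cprm, const void *addr, int nr)
{
	struct file *file = cprm->file;
	ssize_t n;

	while (nr > 0) {
		if (dump_interrupted())
			return 0;		/* fatal signal or freezer */
		n = __kernel_write(file, addr, nr, &cprm->pos);
		if (n == -ERESTARTSYS) {
			/* io_uring poked us via TIF_NOTIFY_SIGNAL:
			 * run the queued task_work, then retry. */
			tracehook_notify_signal();
			continue;
		}
		if (n <= 0)
			return 0;		/* real write error */
		cprm->written += n;
		cprm->limit -= n;
		addr += n;
		nr -= n;
	}
	return 1;
}

The trade-off versus Jens's patch above: this keeps io_uring requests
alive during the dump (so their state lands in the core), but it has to
run task_work in the middle of writing the core file, which is exactly
the locking question Eric raised earlier in the thread.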
* Re: [PATCH] coredump: Limit what can interrupt coredumps 2021-08-17 18:24 ` Jens Axboe @ 2021-08-17 19:29 ` Tony Battersby 2021-08-17 19:59 ` Jens Axboe 2021-08-21 9:52 ` Olivier Langlois 1 sibling, 1 reply; 66+ messages in thread From: Tony Battersby @ 2021-08-17 19:29 UTC (permalink / raw) To: Jens Axboe, Olivier Langlois, Eric W. Biederman, Oleg Nesterov Cc: Linus Torvalds, Linux Kernel Mailing List, linux-fsdevel, io-uring, Alexander Viro, Pavel Begunkov> On 8/17/21 2:24 PM, Jens Axboe wrote: > On 8/17/21 12:15 PM, Jens Axboe wrote: >> On 8/15/21 2:42 PM, Olivier Langlois wrote: >>> On Wed, 2021-08-11 at 19:55 -0600, Jens Axboe wrote: >>>> On 8/10/21 3:48 PM, Tony Battersby wrote: >>>>> On 8/5/21 9:06 AM, Olivier Langlois wrote: >>>>>> Hi all, >>>>>> >>>>>> I didn't forgot about this remaining issue and I have kept thinking >>>>>> about it on and off. >>>>>> >>>>>> I did try the following on 5.12.19: >>>>>> >>>>>> diff --git a/fs/coredump.c b/fs/coredump.c >>>>>> index 07afb5ddb1c4..614fe7a54c1a 100644 >>>>>> --- a/fs/coredump.c >>>>>> +++ b/fs/coredump.c >>>>>> @@ -41,6 +41,7 @@ >>>>>> #include <linux/fs.h> >>>>>> #include <linux/path.h> >>>>>> #include <linux/timekeeping.h> >>>>>> +#include <linux/io_uring.h> >>>>>> >>>>>> #include <linux/uaccess.h> >>>>>> #include <asm/mmu_context.h> >>>>>> @@ -625,6 +626,8 @@ void do_coredump(const kernel_siginfo_t >>>>>> *siginfo) >>>>>> need_suid_safe = true; >>>>>> } >>>>>> >>>>>> + io_uring_files_cancel(current->files); >>>>>> + >>>>>> retval = coredump_wait(siginfo->si_signo, &core_state); >>>>>> if (retval < 0) >>>>>> goto fail_creds; >>>>>> -- >>>>>> 2.32.0 >>>>>> >>>>>> with my current understanding, io_uring_files_cancel is supposed to >>>>>> cancel everything that might set the TIF_NOTIFY_SIGNAL. >>>>>> >>>>>> I must report that in my testing with generating a core dump >>>>>> through a >>>>>> pipe with the modif above, I still get truncated core dumps. >>>>>> >>>>>> systemd is having a weird error: >>>>>> [ 2577.870742] systemd-coredump[4056]: Failed to get COMM: No such >>>>>> process >>>>>> >>>>>> and nothing is captured >>>>>> >>>>>> so I have replaced it with a very simple shell: >>>>>> $ cat /proc/sys/kernel/core_pattern >>>>>>> /home/lano1106/bin/pipe_core.sh %e %p >>>>>> ~/bin $ cat pipe_core.sh >>>>>> #!/bin/sh >>>>>> >>>>>> cat > /home/lano1106/core/core.$1.$2 >>>>>> >>>>>> BFD: warning: /home/lano1106/core/core.test.10886 is truncated: >>>>>> expected core file size >= 24129536, found: 61440 >>>>>> >>>>>> I conclude from my attempt that maybe io_uring_files_cancel is not >>>>>> 100% >>>>>> cleaning everything that it should clean. >>>>>> >>>>>> >>>>>> >>>>> I just ran into this problem also - coredumps from an io_uring >>>>> program >>>>> to a pipe are truncated. But I am using kernel 5.10.57, which does >>>>> NOT >>>>> have commit 12db8b690010 ("entry: Add support for TIF_NOTIFY_SIGNAL") >>>>> or >>>>> commit 06af8679449d ("coredump: Limit what can interrupt coredumps"). >>>>> Kernel 5.4 works though, so I bisected the problem to commit >>>>> f38c7e3abfba ("io_uring: ensure async buffered read-retry is setup >>>>> properly") in kernel 5.9. Note that my io_uring program uses only >>>>> async >>>>> buffered reads, which may be why this particular commit makes a >>>>> difference to my program. >>>>> >>>>> My io_uring program is a multi-purpose long-running program with many >>>>> threads. Most threads don't use io_uring but a few of them do. 
>>>>> Normally, my core dumps are piped to a program so that they can be >>>>> compressed before being written to disk, but I can also test writing >>>>> the >>>>> core dumps directly to disk. This is what I have found: >>>>> >>>>> *) Unpatched 5.10.57: if a thread that doesn't use io_uring triggers >>>>> a >>>>> coredump, the core file is written correctly, whether it is written >>>>> to >>>>> disk or piped to a program, even if another thread is using io_uring >>>>> at >>>>> the same time. >>>>> >>>>> *) Unpatched 5.10.57: if a thread that uses io_uring triggers a >>>>> coredump, the core file is truncated, whether written directly to >>>>> disk >>>>> or piped to a program. >>>>> >>>>> *) 5.10.57+backport 06af8679449d: if a thread that uses io_uring >>>>> triggers a coredump, and the core is written directly to disk, then >>>>> it >>>>> is written correctly. >>>>> >>>>> *) 5.10.57+backport 06af8679449d: if a thread that uses io_uring >>>>> triggers a coredump, and the core is piped to a program, then it is >>>>> truncated. >>>>> >>>>> *) 5.10.57+revert f38c7e3abfba: core dumps are written correctly, >>>>> whether written directly to disk or piped to a program. >>>> That is very interesting. Like Olivier mentioned, it's not that actual >>>> commit, but rather the change of behavior implemented by it. Before >>>> that >>>> commit, we'd hit the async workers more often, whereas after we do the >>>> correct retry method where it's driven by the wakeup when the page is >>>> unlocked. This is purely speculation, but perhaps the fact that the >>>> process changes state potentially mid dump is why the dump ends up >>>> being >>>> truncated? >>>> >>>> I'd love to dive into this and try and figure it out. Absent a test >>>> case, at least the above gives me an idea of what to try out. I'll see >>>> if it makes it easier for me to create a case that does result in a >>>> truncated core dump. >>>> >>> Jens, >>> >>> When I have first encountered the issue, the very first thing that I >>> did try was to create a simple test program that would synthetize the >>> problem. >>> >>> After few time consumming failed attempts, I just gave up the idea and >>> simply settle to my prod program that showcase systematically the >>> problem every time that I kill the process with a SEGV signal. >>> >>> In a nutshell, all the program does is to issue read operations with >>> io_uring on a TCP socket on which there is a constant data stream. >>> >>> Now that I have a better understanding of what is going on, I think >>> that one way that could reproduce the problem consistently could be >>> along those lines: >>> >>> 1. Create a pipe >>> 2. fork a child >>> 3. Initiate a read operation on the pipe with io_uring from the child >>> 4. Let the parent kill its child with a core dump generating signal. >>> 5. Write something in the pipe from the parent so that the io_uring >>> read operation completes while the core dump is generated. >>> >>> I guess that I'll end up doing that if I cannot fix the issue with my >>> current setup but here is what I have attempted so far: >>> >>> 1. Call io_uring_files_cancel from do_coredump >>> 2. Same as #1 but also make sure that TIF_NOTIFY_SIGNAL is cleared on >>> returning from io_uring_files_cancel >>> >>> Those attempts didn't work but lurking in the io_uring dev mailing list >>> is starting to pay off. 
I thought that I did reach the bottom of the >>> rabbit hole in my journey of understanding io_uring but the recent >>> patch set sent by Hao Xu >>> >>> https://lore.kernel.org/io-uring/[email protected]/T/#t >>> >>> made me realize that I still haven't assimilated all the small io_uring >>> nuances... >>> >>> Here is my feedback. From my casual io_uring code reader point of view, >>> it is not 100% obvious what the difference is between >>> io_uring_files_cancel and io_uring_task_cancel >>> >>> It seems like io_uring_files_cancel is cancelling polls only if they >>> have the REQ_F_INFLIGHT flag set. >>> >>> I have no idea what an inflight request means and why someone would >>> want to call io_uring_files_cancel over io_uring_task_cancel. >>> >>> I guess that if I was to meditate on the question for few hours, I >>> would at some point get some illumination strike me but I believe that >>> it could be a good idea to document in the code those concepts for >>> helping casual readers... >>> >>> Bottomline, I now understand that io_uring_files_cancel does not cancel >>> all the requests. Therefore, without fully understanding what I am >>> doing, I am going to replace my call to io_uring_files_cancel from >>> do_coredump with io_uring_task_cancel and see if this finally fix the >>> issue for good. >>> >>> What I am trying to do is to cancel pending io_uring requests to make >>> sure that TIF_NOTIFY_SIGNAL isn't set while core dump is generated. >>> >>> Maybe another solution would simply be to modify __dump_emit to make it >>> resilient to TIF_NOTIFY_SIGNAL as Eric W. Biederman originally >>> suggested. >>> >>> or maybe do both... >>> >>> Not sure which approach is best. If someone has an opinion, I would be >>> curious to hear it. >> It does indeed sound like it's TIF_NOTIFY_SIGNAL that will trigger some >> signal_pending() and cause an interruption of the core dump. Just out of >> curiosity, what is your /proc/sys/kernel/core_pattern set to? If it's >> set to some piped process, can you try and set it to 'core' and see if >> that eliminates the truncation of the core dumps for your case? > And assuming that works, then I suspect this one would fix your issue > even with a piped core dump: > > diff --git a/fs/coredump.c b/fs/coredump.c > index 07afb5ddb1c4..852737a9ccbf 100644 > --- a/fs/coredump.c > +++ b/fs/coredump.c > @@ -41,6 +41,7 @@ > #include <linux/fs.h> > #include <linux/path.h> > #include <linux/timekeeping.h> > +#include <linux/io_uring.h> > > #include <linux/uaccess.h> > #include <asm/mmu_context.h> > @@ -603,6 +604,7 @@ void do_coredump(const kernel_siginfo_t *siginfo) > }; > > audit_core_dumps(siginfo->si_signo); > + io_uring_task_cancel(); > > binfmt = mm->binfmt; > if (!binfmt || !binfmt->core_dump) > FYI, I tested kernel 5.10.59 + backport 06af8679449d + the patch above with my io_uring program. The coredump locked up even when writing the core file directly to disk; the zombie process could not be killed with "kill -9". Unfortunately I can't test with newer kernels without spending some time on it, and I am too busy with other stuff right now. My io_uring program does async buffered reads (io_uring_prep_read()/io_uring_prep_readv()) from a raw disk partition (no filesystem). One thread submits I/Os while another thread calls io_uring_wait_cqe() and processes the completions. To trigger the coredump, I added an intentional abort() in the thread that submits I/Os after running for a second. Tony Battersby ^ permalink raw reply [flat|nested] 66+ messages in thread
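For illustration, a minimal user-space reproducer along the lines Tony describes might look like the sketch below. It is not his actual program; the device path, buffer size and timing are placeholder assumptions, and it needs liburing (link with -luring -lpthread).

/*
 * Sketch of the reported scenario: one thread reaps completions with
 * io_uring_wait_cqe() while the main thread submits async buffered
 * reads from a raw partition, then abort()s after about a second so
 * the core dump races with in-flight task_work completions.
 * "/dev/sdXN" is a placeholder for the raw partition.
 */
#include <liburing.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

static struct io_uring ring;
static char buf[4096];

static void *reaper(void *arg)
{
	struct io_uring_cqe *cqe;

	for (;;) {
		io_uring_wait_cqe(&ring, &cqe);	/* completions arrive via task_work */
		io_uring_cqe_seen(&ring, cqe);
	}
	return NULL;
}

int main(void)
{
	pthread_t t;
	time_t start = time(NULL);
	int fd = open("/dev/sdXN", O_RDONLY);	/* placeholder raw partition */

	io_uring_queue_init(64, &ring, 0);
	pthread_create(&t, NULL, reaper, NULL);

	for (;;) {
		struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);

		if (sqe) {
			io_uring_prep_read(sqe, fd, buf, sizeof(buf), 0);
			io_uring_submit(&ring);
		}
		if (time(NULL) - start >= 1)
			abort();		/* SIGABRT starts do_coredump() */
	}
}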
* Re: [PATCH] coredump: Limit what can interrupt coredumps 2021-08-17 19:29 ` Tony Battersby @ 2021-08-17 19:59 ` Jens Axboe 2021-08-17 21:28 ` Jens Axboe 0 siblings, 1 reply; 66+ messages in thread From: Jens Axboe @ 2021-08-17 19:59 UTC (permalink / raw) To: Tony Battersby, Olivier Langlois, Eric W. Biederman, Oleg Nesterov Cc: Linus Torvalds, Linux Kernel Mailing List, linux-fsdevel, io-uring, Alexander Viro, Pavel Begunkov> On 8/17/21 1:29 PM, Tony Battersby wrote: > On 8/17/21 2:24 PM, Jens Axboe wrote: >> On 8/17/21 12:15 PM, Jens Axboe wrote: >>> On 8/15/21 2:42 PM, Olivier Langlois wrote: >>>> On Wed, 2021-08-11 at 19:55 -0600, Jens Axboe wrote: >>>>> On 8/10/21 3:48 PM, Tony Battersby wrote: >>>>>> On 8/5/21 9:06 AM, Olivier Langlois wrote: >>>>>>> Hi all, >>>>>>> >>>>>>> I didn't forgot about this remaining issue and I have kept thinking >>>>>>> about it on and off. >>>>>>> >>>>>>> I did try the following on 5.12.19: >>>>>>> >>>>>>> diff --git a/fs/coredump.c b/fs/coredump.c >>>>>>> index 07afb5ddb1c4..614fe7a54c1a 100644 >>>>>>> --- a/fs/coredump.c >>>>>>> +++ b/fs/coredump.c >>>>>>> @@ -41,6 +41,7 @@ >>>>>>> #include <linux/fs.h> >>>>>>> #include <linux/path.h> >>>>>>> #include <linux/timekeeping.h> >>>>>>> +#include <linux/io_uring.h> >>>>>>> >>>>>>> #include <linux/uaccess.h> >>>>>>> #include <asm/mmu_context.h> >>>>>>> @@ -625,6 +626,8 @@ void do_coredump(const kernel_siginfo_t >>>>>>> *siginfo) >>>>>>> need_suid_safe = true; >>>>>>> } >>>>>>> >>>>>>> + io_uring_files_cancel(current->files); >>>>>>> + >>>>>>> retval = coredump_wait(siginfo->si_signo, &core_state); >>>>>>> if (retval < 0) >>>>>>> goto fail_creds; >>>>>>> -- >>>>>>> 2.32.0 >>>>>>> >>>>>>> with my current understanding, io_uring_files_cancel is supposed to >>>>>>> cancel everything that might set the TIF_NOTIFY_SIGNAL. >>>>>>> >>>>>>> I must report that in my testing with generating a core dump >>>>>>> through a >>>>>>> pipe with the modif above, I still get truncated core dumps. >>>>>>> >>>>>>> systemd is having a weird error: >>>>>>> [ 2577.870742] systemd-coredump[4056]: Failed to get COMM: No such >>>>>>> process >>>>>>> >>>>>>> and nothing is captured >>>>>>> >>>>>>> so I have replaced it with a very simple shell: >>>>>>> $ cat /proc/sys/kernel/core_pattern >>>>>>>> /home/lano1106/bin/pipe_core.sh %e %p >>>>>>> ~/bin $ cat pipe_core.sh >>>>>>> #!/bin/sh >>>>>>> >>>>>>> cat > /home/lano1106/core/core.$1.$2 >>>>>>> >>>>>>> BFD: warning: /home/lano1106/core/core.test.10886 is truncated: >>>>>>> expected core file size >= 24129536, found: 61440 >>>>>>> >>>>>>> I conclude from my attempt that maybe io_uring_files_cancel is not >>>>>>> 100% >>>>>>> cleaning everything that it should clean. >>>>>>> >>>>>>> >>>>>>> >>>>>> I just ran into this problem also - coredumps from an io_uring >>>>>> program >>>>>> to a pipe are truncated. But I am using kernel 5.10.57, which does >>>>>> NOT >>>>>> have commit 12db8b690010 ("entry: Add support for TIF_NOTIFY_SIGNAL") >>>>>> or >>>>>> commit 06af8679449d ("coredump: Limit what can interrupt coredumps"). >>>>>> Kernel 5.4 works though, so I bisected the problem to commit >>>>>> f38c7e3abfba ("io_uring: ensure async buffered read-retry is setup >>>>>> properly") in kernel 5.9. Note that my io_uring program uses only >>>>>> async >>>>>> buffered reads, which may be why this particular commit makes a >>>>>> difference to my program. >>>>>> >>>>>> My io_uring program is a multi-purpose long-running program with many >>>>>> threads. 
Most threads don't use io_uring but a few of them do. >>>>>> Normally, my core dumps are piped to a program so that they can be >>>>>> compressed before being written to disk, but I can also test writing >>>>>> the >>>>>> core dumps directly to disk. This is what I have found: >>>>>> >>>>>> *) Unpatched 5.10.57: if a thread that doesn't use io_uring triggers >>>>>> a >>>>>> coredump, the core file is written correctly, whether it is written >>>>>> to >>>>>> disk or piped to a program, even if another thread is using io_uring >>>>>> at >>>>>> the same time. >>>>>> >>>>>> *) Unpatched 5.10.57: if a thread that uses io_uring triggers a >>>>>> coredump, the core file is truncated, whether written directly to >>>>>> disk >>>>>> or piped to a program. >>>>>> >>>>>> *) 5.10.57+backport 06af8679449d: if a thread that uses io_uring >>>>>> triggers a coredump, and the core is written directly to disk, then >>>>>> it >>>>>> is written correctly. >>>>>> >>>>>> *) 5.10.57+backport 06af8679449d: if a thread that uses io_uring >>>>>> triggers a coredump, and the core is piped to a program, then it is >>>>>> truncated. >>>>>> >>>>>> *) 5.10.57+revert f38c7e3abfba: core dumps are written correctly, >>>>>> whether written directly to disk or piped to a program. >>>>> That is very interesting. Like Olivier mentioned, it's not that actual >>>>> commit, but rather the change of behavior implemented by it. Before >>>>> that >>>>> commit, we'd hit the async workers more often, whereas after we do the >>>>> correct retry method where it's driven by the wakeup when the page is >>>>> unlocked. This is purely speculation, but perhaps the fact that the >>>>> process changes state potentially mid dump is why the dump ends up >>>>> being >>>>> truncated? >>>>> >>>>> I'd love to dive into this and try and figure it out. Absent a test >>>>> case, at least the above gives me an idea of what to try out. I'll see >>>>> if it makes it easier for me to create a case that does result in a >>>>> truncated core dump. >>>>> >>>> Jens, >>>> >>>> When I have first encountered the issue, the very first thing that I >>>> did try was to create a simple test program that would synthetize the >>>> problem. >>>> >>>> After few time consumming failed attempts, I just gave up the idea and >>>> simply settle to my prod program that showcase systematically the >>>> problem every time that I kill the process with a SEGV signal. >>>> >>>> In a nutshell, all the program does is to issue read operations with >>>> io_uring on a TCP socket on which there is a constant data stream. >>>> >>>> Now that I have a better understanding of what is going on, I think >>>> that one way that could reproduce the problem consistently could be >>>> along those lines: >>>> >>>> 1. Create a pipe >>>> 2. fork a child >>>> 3. Initiate a read operation on the pipe with io_uring from the child >>>> 4. Let the parent kill its child with a core dump generating signal. >>>> 5. Write something in the pipe from the parent so that the io_uring >>>> read operation completes while the core dump is generated. >>>> >>>> I guess that I'll end up doing that if I cannot fix the issue with my >>>> current setup but here is what I have attempted so far: >>>> >>>> 1. Call io_uring_files_cancel from do_coredump >>>> 2. Same as #1 but also make sure that TIF_NOTIFY_SIGNAL is cleared on >>>> returning from io_uring_files_cancel >>>> >>>> Those attempts didn't work but lurking in the io_uring dev mailing list >>>> is starting to pay off. 
I thought that I did reach the bottom of the >>>> rabbit hole in my journey of understanding io_uring but the recent >>>> patch set sent by Hao Xu >>>> >>>> https://lore.kernel.org/io-uring/[email protected]/T/#t >>>> >>>> made me realize that I still haven't assimilated all the small io_uring >>>> nuances... >>>> >>>> Here is my feedback. From my casual io_uring code reader point of view, >>>> it is not 100% obvious what the difference is between >>>> io_uring_files_cancel and io_uring_task_cancel >>>> >>>> It seems like io_uring_files_cancel is cancelling polls only if they >>>> have the REQ_F_INFLIGHT flag set. >>>> >>>> I have no idea what an inflight request means and why someone would >>>> want to call io_uring_files_cancel over io_uring_task_cancel. >>>> >>>> I guess that if I was to meditate on the question for few hours, I >>>> would at some point get some illumination strike me but I believe that >>>> it could be a good idea to document in the code those concepts for >>>> helping casual readers... >>>> >>>> Bottomline, I now understand that io_uring_files_cancel does not cancel >>>> all the requests. Therefore, without fully understanding what I am >>>> doing, I am going to replace my call to io_uring_files_cancel from >>>> do_coredump with io_uring_task_cancel and see if this finally fix the >>>> issue for good. >>>> >>>> What I am trying to do is to cancel pending io_uring requests to make >>>> sure that TIF_NOTIFY_SIGNAL isn't set while core dump is generated. >>>> >>>> Maybe another solution would simply be to modify __dump_emit to make it >>>> resilient to TIF_NOTIFY_SIGNAL as Eric W. Biederman originally >>>> suggested. >>>> >>>> or maybe do both... >>>> >>>> Not sure which approach is best. If someone has an opinion, I would be >>>> curious to hear it. >>> It does indeed sound like it's TIF_NOTIFY_SIGNAL that will trigger some >>> signal_pending() and cause an interruption of the core dump. Just out of >>> curiosity, what is your /proc/sys/kernel/core_pattern set to? If it's >>> set to some piped process, can you try and set it to 'core' and see if >>> that eliminates the truncation of the core dumps for your case? >> And assuming that works, then I suspect this one would fix your issue >> even with a piped core dump: >> >> diff --git a/fs/coredump.c b/fs/coredump.c >> index 07afb5ddb1c4..852737a9ccbf 100644 >> --- a/fs/coredump.c >> +++ b/fs/coredump.c >> @@ -41,6 +41,7 @@ >> #include <linux/fs.h> >> #include <linux/path.h> >> #include <linux/timekeeping.h> >> +#include <linux/io_uring.h> >> >> #include <linux/uaccess.h> >> #include <asm/mmu_context.h> >> @@ -603,6 +604,7 @@ void do_coredump(const kernel_siginfo_t *siginfo) >> }; >> >> audit_core_dumps(siginfo->si_signo); >> + io_uring_task_cancel(); >> >> binfmt = mm->binfmt; >> if (!binfmt || !binfmt->core_dump) >> > FYI, I tested kernel 5.10.59 + backport 06af8679449d + the patch above > with my io_uring program. The coredump locked up even when writing the > core file directly to disk; the zombie process could not be killed with > "kill -9". Unfortunately I can't test with newer kernels without > spending some time on it, and I am too busy with other stuff right now. That sounds like 5.10-stable is missing some of the cancelation backports, and your setup makes the cancelation stall because of that. Need to go over the 11/12/13 fixes and ensure that we've got everything we need for those stable versions, particularly 5.10. 
> My io_uring program does async buffered reads > (io_uring_prep_read()/io_uring_prep_readv()) from a raw disk partition > (no filesystem). One thread submits I/Os while another thread calls > io_uring_wait_cqe() and processes the completions. To trigger the > coredump, I added an intentional abort() in the thread that submits I/Os > after running for a second. OK, so that one is also using task_work for the retry based async buffered reads, so it makes sense. Maybe a temporary work-around is to use 06af8679449d and eliminate the pipe based coredump? -- Jens Axboe ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH] coredump: Limit what can interrupt coredumps 2021-08-17 19:59 ` Jens Axboe @ 2021-08-17 21:28 ` Jens Axboe 2021-08-17 21:39 ` Tony Battersby 2021-08-18 2:57 ` Jens Axboe 0 siblings, 2 replies; 66+ messages in thread From: Jens Axboe @ 2021-08-17 21:28 UTC (permalink / raw) To: Tony Battersby, Olivier Langlois, Eric W. Biederman, Oleg Nesterov Cc: Linus Torvalds, Linux Kernel Mailing List, linux-fsdevel, io-uring, Alexander Viro, Pavel Begunkov> On 8/17/21 1:59 PM, Jens Axboe wrote: > On 8/17/21 1:29 PM, Tony Battersby wrote: >> On 8/17/21 2:24 PM, Jens Axboe wrote: >>> On 8/17/21 12:15 PM, Jens Axboe wrote: >>>> On 8/15/21 2:42 PM, Olivier Langlois wrote: >>>>> On Wed, 2021-08-11 at 19:55 -0600, Jens Axboe wrote: >>>>>> On 8/10/21 3:48 PM, Tony Battersby wrote: >>>>>>> On 8/5/21 9:06 AM, Olivier Langlois wrote: >>>>>>>> Hi all, >>>>>>>> >>>>>>>> I didn't forgot about this remaining issue and I have kept thinking >>>>>>>> about it on and off. >>>>>>>> >>>>>>>> I did try the following on 5.12.19: >>>>>>>> >>>>>>>> diff --git a/fs/coredump.c b/fs/coredump.c >>>>>>>> index 07afb5ddb1c4..614fe7a54c1a 100644 >>>>>>>> --- a/fs/coredump.c >>>>>>>> +++ b/fs/coredump.c >>>>>>>> @@ -41,6 +41,7 @@ >>>>>>>> #include <linux/fs.h> >>>>>>>> #include <linux/path.h> >>>>>>>> #include <linux/timekeeping.h> >>>>>>>> +#include <linux/io_uring.h> >>>>>>>> >>>>>>>> #include <linux/uaccess.h> >>>>>>>> #include <asm/mmu_context.h> >>>>>>>> @@ -625,6 +626,8 @@ void do_coredump(const kernel_siginfo_t >>>>>>>> *siginfo) >>>>>>>> need_suid_safe = true; >>>>>>>> } >>>>>>>> >>>>>>>> + io_uring_files_cancel(current->files); >>>>>>>> + >>>>>>>> retval = coredump_wait(siginfo->si_signo, &core_state); >>>>>>>> if (retval < 0) >>>>>>>> goto fail_creds; >>>>>>>> -- >>>>>>>> 2.32.0 >>>>>>>> >>>>>>>> with my current understanding, io_uring_files_cancel is supposed to >>>>>>>> cancel everything that might set the TIF_NOTIFY_SIGNAL. >>>>>>>> >>>>>>>> I must report that in my testing with generating a core dump >>>>>>>> through a >>>>>>>> pipe with the modif above, I still get truncated core dumps. >>>>>>>> >>>>>>>> systemd is having a weird error: >>>>>>>> [ 2577.870742] systemd-coredump[4056]: Failed to get COMM: No such >>>>>>>> process >>>>>>>> >>>>>>>> and nothing is captured >>>>>>>> >>>>>>>> so I have replaced it with a very simple shell: >>>>>>>> $ cat /proc/sys/kernel/core_pattern >>>>>>>>> /home/lano1106/bin/pipe_core.sh %e %p >>>>>>>> ~/bin $ cat pipe_core.sh >>>>>>>> #!/bin/sh >>>>>>>> >>>>>>>> cat > /home/lano1106/core/core.$1.$2 >>>>>>>> >>>>>>>> BFD: warning: /home/lano1106/core/core.test.10886 is truncated: >>>>>>>> expected core file size >= 24129536, found: 61440 >>>>>>>> >>>>>>>> I conclude from my attempt that maybe io_uring_files_cancel is not >>>>>>>> 100% >>>>>>>> cleaning everything that it should clean. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> I just ran into this problem also - coredumps from an io_uring >>>>>>> program >>>>>>> to a pipe are truncated. But I am using kernel 5.10.57, which does >>>>>>> NOT >>>>>>> have commit 12db8b690010 ("entry: Add support for TIF_NOTIFY_SIGNAL") >>>>>>> or >>>>>>> commit 06af8679449d ("coredump: Limit what can interrupt coredumps"). >>>>>>> Kernel 5.4 works though, so I bisected the problem to commit >>>>>>> f38c7e3abfba ("io_uring: ensure async buffered read-retry is setup >>>>>>> properly") in kernel 5.9. 
Note that my io_uring program uses only >>>>>>> async >>>>>>> buffered reads, which may be why this particular commit makes a >>>>>>> difference to my program. >>>>>>> >>>>>>> My io_uring program is a multi-purpose long-running program with many >>>>>>> threads. Most threads don't use io_uring but a few of them do. >>>>>>> Normally, my core dumps are piped to a program so that they can be >>>>>>> compressed before being written to disk, but I can also test writing >>>>>>> the >>>>>>> core dumps directly to disk. This is what I have found: >>>>>>> >>>>>>> *) Unpatched 5.10.57: if a thread that doesn't use io_uring triggers >>>>>>> a >>>>>>> coredump, the core file is written correctly, whether it is written >>>>>>> to >>>>>>> disk or piped to a program, even if another thread is using io_uring >>>>>>> at >>>>>>> the same time. >>>>>>> >>>>>>> *) Unpatched 5.10.57: if a thread that uses io_uring triggers a >>>>>>> coredump, the core file is truncated, whether written directly to >>>>>>> disk >>>>>>> or piped to a program. >>>>>>> >>>>>>> *) 5.10.57+backport 06af8679449d: if a thread that uses io_uring >>>>>>> triggers a coredump, and the core is written directly to disk, then >>>>>>> it >>>>>>> is written correctly. >>>>>>> >>>>>>> *) 5.10.57+backport 06af8679449d: if a thread that uses io_uring >>>>>>> triggers a coredump, and the core is piped to a program, then it is >>>>>>> truncated. >>>>>>> >>>>>>> *) 5.10.57+revert f38c7e3abfba: core dumps are written correctly, >>>>>>> whether written directly to disk or piped to a program. >>>>>> That is very interesting. Like Olivier mentioned, it's not that actual >>>>>> commit, but rather the change of behavior implemented by it. Before >>>>>> that >>>>>> commit, we'd hit the async workers more often, whereas after we do the >>>>>> correct retry method where it's driven by the wakeup when the page is >>>>>> unlocked. This is purely speculation, but perhaps the fact that the >>>>>> process changes state potentially mid dump is why the dump ends up >>>>>> being >>>>>> truncated? >>>>>> >>>>>> I'd love to dive into this and try and figure it out. Absent a test >>>>>> case, at least the above gives me an idea of what to try out. I'll see >>>>>> if it makes it easier for me to create a case that does result in a >>>>>> truncated core dump. >>>>>> >>>>> Jens, >>>>> >>>>> When I have first encountered the issue, the very first thing that I >>>>> did try was to create a simple test program that would synthetize the >>>>> problem. >>>>> >>>>> After few time consumming failed attempts, I just gave up the idea and >>>>> simply settle to my prod program that showcase systematically the >>>>> problem every time that I kill the process with a SEGV signal. >>>>> >>>>> In a nutshell, all the program does is to issue read operations with >>>>> io_uring on a TCP socket on which there is a constant data stream. >>>>> >>>>> Now that I have a better understanding of what is going on, I think >>>>> that one way that could reproduce the problem consistently could be >>>>> along those lines: >>>>> >>>>> 1. Create a pipe >>>>> 2. fork a child >>>>> 3. Initiate a read operation on the pipe with io_uring from the child >>>>> 4. Let the parent kill its child with a core dump generating signal. >>>>> 5. Write something in the pipe from the parent so that the io_uring >>>>> read operation completes while the core dump is generated. 
>>>>> >>>>> I guess that I'll end up doing that if I cannot fix the issue with my >>>>> current setup but here is what I have attempted so far: >>>>> >>>>> 1. Call io_uring_files_cancel from do_coredump >>>>> 2. Same as #1 but also make sure that TIF_NOTIFY_SIGNAL is cleared on >>>>> returning from io_uring_files_cancel >>>>> >>>>> Those attempts didn't work but lurking in the io_uring dev mailing list >>>>> is starting to pay off. I thought that I did reach the bottom of the >>>>> rabbit hole in my journey of understanding io_uring but the recent >>>>> patch set sent by Hao Xu >>>>> >>>>> https://lore.kernel.org/io-uring/[email protected]/T/#t >>>>> >>>>> made me realize that I still haven't assimilated all the small io_uring >>>>> nuances... >>>>> >>>>> Here is my feedback. From my casual io_uring code reader point of view, >>>>> it is not 100% obvious what the difference is between >>>>> io_uring_files_cancel and io_uring_task_cancel >>>>> >>>>> It seems like io_uring_files_cancel is cancelling polls only if they >>>>> have the REQ_F_INFLIGHT flag set. >>>>> >>>>> I have no idea what an inflight request means and why someone would >>>>> want to call io_uring_files_cancel over io_uring_task_cancel. >>>>> >>>>> I guess that if I was to meditate on the question for few hours, I >>>>> would at some point get some illumination strike me but I believe that >>>>> it could be a good idea to document in the code those concepts for >>>>> helping casual readers... >>>>> >>>>> Bottomline, I now understand that io_uring_files_cancel does not cancel >>>>> all the requests. Therefore, without fully understanding what I am >>>>> doing, I am going to replace my call to io_uring_files_cancel from >>>>> do_coredump with io_uring_task_cancel and see if this finally fix the >>>>> issue for good. >>>>> >>>>> What I am trying to do is to cancel pending io_uring requests to make >>>>> sure that TIF_NOTIFY_SIGNAL isn't set while core dump is generated. >>>>> >>>>> Maybe another solution would simply be to modify __dump_emit to make it >>>>> resilient to TIF_NOTIFY_SIGNAL as Eric W. Biederman originally >>>>> suggested. >>>>> >>>>> or maybe do both... >>>>> >>>>> Not sure which approach is best. If someone has an opinion, I would be >>>>> curious to hear it. >>>> It does indeed sound like it's TIF_NOTIFY_SIGNAL that will trigger some >>>> signal_pending() and cause an interruption of the core dump. Just out of >>>> curiosity, what is your /proc/sys/kernel/core_pattern set to? If it's >>>> set to some piped process, can you try and set it to 'core' and see if >>>> that eliminates the truncation of the core dumps for your case? >>> And assuming that works, then I suspect this one would fix your issue >>> even with a piped core dump: >>> >>> diff --git a/fs/coredump.c b/fs/coredump.c >>> index 07afb5ddb1c4..852737a9ccbf 100644 >>> --- a/fs/coredump.c >>> +++ b/fs/coredump.c >>> @@ -41,6 +41,7 @@ >>> #include <linux/fs.h> >>> #include <linux/path.h> >>> #include <linux/timekeeping.h> >>> +#include <linux/io_uring.h> >>> >>> #include <linux/uaccess.h> >>> #include <asm/mmu_context.h> >>> @@ -603,6 +604,7 @@ void do_coredump(const kernel_siginfo_t *siginfo) >>> }; >>> >>> audit_core_dumps(siginfo->si_signo); >>> + io_uring_task_cancel(); >>> >>> binfmt = mm->binfmt; >>> if (!binfmt || !binfmt->core_dump) >>> >> FYI, I tested kernel 5.10.59 + backport 06af8679449d + the patch above >> with my io_uring program. 
The coredump locked up even when writing the >> core file directly to disk; the zombie process could not be killed with >> "kill -9". Unfortunately I can't test with newer kernels without >> spending some time on it, and I am too busy with other stuff right now. > > That sounds like 5.10-stable is missing some of the cancelation > backports, and your setup makes the cancelation stall because of that. > Need to go over the 11/12/13 fixes and ensure that we've got everything > we need for those stable versions, particularly 5.10. > >> My io_uring program does async buffered reads >> (io_uring_prep_read()/io_uring_prep_readv()) from a raw disk partition >> (no filesystem). One thread submits I/Os while another thread calls >> io_uring_wait_cqe() and processes the completions. To trigger the >> coredump, I added an intentional abort() in the thread that submits I/Os >> after running for a second. > > OK, so that one is also using task_work for the retry based async > buffered reads, so it makes sense. > > Maybe a temporary work-around is to use 06af8679449d and eliminate the > pipe based coredump? Another approach - don't allow TWA_SIGNAL task_work to get queued if PF_SIGNALED has been set on the task. This is similar to how we reject task_work_add() on process exit, and the callers must be able to handle that already. Can you test this one on top of your 5.10-stable? diff --git a/fs/coredump.c b/fs/coredump.c index 07afb5ddb1c4..ca7c1ee44ada 100644 --- a/fs/coredump.c +++ b/fs/coredump.c @@ -602,6 +602,14 @@ void do_coredump(const kernel_siginfo_t *siginfo) .mm_flags = mm->flags, }; + /* + * task_work_add() will refuse to add work after PF_SIGNALED has + * been set, ensure that we flush any pending TIF_NOTIFY_SIGNAL work + * if any was queued before that. + */ + if (test_thread_flag(TIF_NOTIFY_SIGNAL)) + tracehook_notify_signal(); + audit_core_dumps(siginfo->si_signo); binfmt = mm->binfmt; diff --git a/kernel/task_work.c b/kernel/task_work.c index 1698fbe6f0e1..1ab28904adc4 100644 --- a/kernel/task_work.c +++ b/kernel/task_work.c @@ -41,6 +41,12 @@ int task_work_add(struct task_struct *task, struct callback_head *work, head = READ_ONCE(task->task_works); if (unlikely(head == &work_exited)) return -ESRCH; + /* + * TIF_NOTIFY_SIGNAL notifications will interfere with + * a core dump in progress, reject them. + */ + if ((task->flags & PF_SIGNALED) && notify == TWA_SIGNAL) + return -ESRCH; work->next = head; } while (cmpxchg(&task->task_works, head, work) != head); -- Jens Axboe ^ permalink raw reply related [flat|nested] 66+ messages in thread
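The premise that callers already cope with task_work_add() failing is the load-bearing part of this approach. A hypothetical caller following that contract would look something like the sketch below; notify_or_fallback() is an illustrative name, not an existing kernel function.

/*
 * Illustrative only: when task_work_add() fails (task exiting, or with
 * the patch above, dumping core), the caller must not assume delivery
 * and has to complete or cancel the work through another path. This
 * sketch runs the callback directly; real callers may instead punt it
 * to a fallback workqueue, as io_uring does.
 */
static void notify_or_fallback(struct task_struct *tsk,
			       struct callback_head *cb)
{
	if (task_work_add(tsk, cb, TWA_SIGNAL))
		cb->func(cb);
}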
* Re: [PATCH] coredump: Limit what can interrupt coredumps 2021-08-17 21:28 ` Jens Axboe @ 2021-08-17 21:39 ` Tony Battersby 2021-08-17 22:05 ` Jens Axboe 2021-08-18 2:57 ` Jens Axboe 1 sibling, 1 reply; 66+ messages in thread From: Tony Battersby @ 2021-08-17 21:39 UTC (permalink / raw) To: Jens Axboe, Olivier Langlois, Eric W. Biederman, Oleg Nesterov Cc: Linus Torvalds, Linux Kernel Mailing List, linux-fsdevel, io-uring, Alexander Viro, Pavel Begunkov> On 8/17/21 5:28 PM, Jens Axboe wrote: > > Another approach - don't allow TWA_SIGNAL task_work to get queued if > PF_SIGNALED has been set on the task. This is similar to how we reject > task_work_add() on process exit, and the callers must be able to handle > that already. > > Can you test this one on top of your 5.10-stable? > > > diff --git a/fs/coredump.c b/fs/coredump.c > index 07afb5ddb1c4..ca7c1ee44ada 100644 > --- a/fs/coredump.c > +++ b/fs/coredump.c > @@ -602,6 +602,14 @@ void do_coredump(const kernel_siginfo_t *siginfo) > .mm_flags = mm->flags, > }; > > + /* > + * task_work_add() will refuse to add work after PF_SIGNALED has > + * been set, ensure that we flush any pending TIF_NOTIFY_SIGNAL work > + * if any was queued before that. > + */ > + if (test_thread_flag(TIF_NOTIFY_SIGNAL)) > + tracehook_notify_signal(); > + > audit_core_dumps(siginfo->si_signo); > > binfmt = mm->binfmt; > diff --git a/kernel/task_work.c b/kernel/task_work.c > index 1698fbe6f0e1..1ab28904adc4 100644 > --- a/kernel/task_work.c > +++ b/kernel/task_work.c > @@ -41,6 +41,12 @@ int task_work_add(struct task_struct *task, struct callback_head *work, > head = READ_ONCE(task->task_works); > if (unlikely(head == &work_exited)) > return -ESRCH; > + /* > + * TIF_NOTIFY_SIGNAL notifications will interfere with > + * a core dump in progress, reject them. > + */ > + if ((task->flags & PF_SIGNALED) && notify == TWA_SIGNAL) > + return -ESRCH; > work->next = head; > } while (cmpxchg(&task->task_works, head, work) != head); > > Doesn't compile. 5.10 doesn't have TIF_NOTIFY_SIGNAL. Tony Battersby ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH] coredump: Limit what can interrupt coredumps 2021-08-17 21:39 ` Tony Battersby @ 2021-08-17 22:05 ` Jens Axboe 2021-08-18 14:37 ` Tony Battersby 0 siblings, 1 reply; 66+ messages in thread From: Jens Axboe @ 2021-08-17 22:05 UTC (permalink / raw) To: Tony Battersby, Olivier Langlois, Eric W. Biederman, Oleg Nesterov Cc: Linus Torvalds, Linux Kernel Mailing List, linux-fsdevel, io-uring, Alexander Viro, Pavel Begunkov> On 8/17/21 3:39 PM, Tony Battersby wrote: > On 8/17/21 5:28 PM, Jens Axboe wrote: >> >> Another approach - don't allow TWA_SIGNAL task_work to get queued if >> PF_SIGNALED has been set on the task. This is similar to how we reject >> task_work_add() on process exit, and the callers must be able to handle >> that already. >> >> Can you test this one on top of your 5.10-stable? >> >> >> diff --git a/fs/coredump.c b/fs/coredump.c >> index 07afb5ddb1c4..ca7c1ee44ada 100644 >> --- a/fs/coredump.c >> +++ b/fs/coredump.c >> @@ -602,6 +602,14 @@ void do_coredump(const kernel_siginfo_t *siginfo) >> .mm_flags = mm->flags, >> }; >> >> + /* >> + * task_work_add() will refuse to add work after PF_SIGNALED has >> + * been set, ensure that we flush any pending TIF_NOTIFY_SIGNAL work >> + * if any was queued before that. >> + */ >> + if (test_thread_flag(TIF_NOTIFY_SIGNAL)) >> + tracehook_notify_signal(); >> + >> audit_core_dumps(siginfo->si_signo); >> >> binfmt = mm->binfmt; >> diff --git a/kernel/task_work.c b/kernel/task_work.c >> index 1698fbe6f0e1..1ab28904adc4 100644 >> --- a/kernel/task_work.c >> +++ b/kernel/task_work.c >> @@ -41,6 +41,12 @@ int task_work_add(struct task_struct *task, struct callback_head *work, >> head = READ_ONCE(task->task_works); >> if (unlikely(head == &work_exited)) >> return -ESRCH; >> + /* >> + * TIF_NOTIFY_SIGNAL notifications will interfere with >> + * a core dump in progress, reject them. >> + */ >> + if ((task->flags & PF_SIGNALED) && notify == TWA_SIGNAL) >> + return -ESRCH; >> work->next = head; >> } while (cmpxchg(&task->task_works, head, work) != head); >> >> > Doesn't compile. 5.10 doesn't have TIF_NOTIFY_SIGNAL. Oh right... Here's one hacked up for the 5.10 TWA_SIGNAL setup. Totally untested... diff --git a/fs/coredump.c b/fs/coredump.c index c6acfc694f65..9e899ce67589 100644 --- a/fs/coredump.c +++ b/fs/coredump.c @@ -603,6 +603,19 @@ void do_coredump(const kernel_siginfo_t *siginfo) .mm_flags = mm->flags, }; + /* + * task_work_add() will refuse to add work after PF_SIGNALED has + * been set, ensure that we flush any pending TWA_SIGNAL work + * if any was queued before that. + */ + if (signal_pending(current) && (current->jobctl & JOBCTL_TASK_WORK)) { + task_work_run(); + spin_lock_irq(¤t->sighand->siglock); + current->jobctl &= ~JOBCTL_TASK_WORK; + recalc_sigpending(); + spin_unlock_irq(¤t->sighand->siglock); + } + audit_core_dumps(siginfo->si_signo); binfmt = mm->binfmt; diff --git a/kernel/task_work.c b/kernel/task_work.c index 8d6e1217c451..93b3f262eb4a 100644 --- a/kernel/task_work.c +++ b/kernel/task_work.c @@ -39,6 +39,12 @@ int task_work_add(struct task_struct *task, struct callback_head *work, head = READ_ONCE(task->task_works); if (unlikely(head == &work_exited)) return -ESRCH; + /* + * TWA_SIGNAL notifications will interfere with + * a core dump in progress, reject them. + */ + if ((task->flags & PF_SIGNALED) && notify == TWA_SIGNAL) + return -ESRCH; work->next = head; } while (cmpxchg(&task->task_works, head, work) != head); -- Jens Axboe ^ permalink raw reply related [flat|nested] 66+ messages in thread
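For readers wondering where JOBCTL_TASK_WORK comes from: on 5.10, TWA_SIGNAL delivery did not use a thread flag. Roughly, and paraphrased from memory rather than quoted verbatim from the 5.10 tree, task_work_add() marked the task under the sighand lock and woke it as if a signal were pending:

/*
 * Approximate shape of 5.10's TWA_SIGNAL delivery in task_work_add().
 * This is why the do_coredump() hunk above has to clear
 * JOBCTL_TASK_WORK and recalculate sigpending after flushing.
 */
if (!(READ_ONCE(task->jobctl) & JOBCTL_TASK_WORK) &&
    lock_task_sighand(task, &flags)) {
	task->jobctl |= JOBCTL_TASK_WORK;
	signal_wake_up(task, 0);
	unlock_task_sighand(task, &flags);
}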
* Re: [PATCH] coredump: Limit what can interrupt coredumps 2021-08-17 22:05 ` Jens Axboe @ 2021-08-18 14:37 ` Tony Battersby 2021-08-18 14:46 ` Jens Axboe 0 siblings, 1 reply; 66+ messages in thread From: Tony Battersby @ 2021-08-18 14:37 UTC (permalink / raw) To: Jens Axboe, Olivier Langlois, Eric W. Biederman, Oleg Nesterov Cc: Linus Torvalds, Linux Kernel Mailing List, linux-fsdevel, io-uring, Alexander Viro, Pavel Begunkov> On 8/17/21 6:05 PM, Jens Axboe wrote: > On 8/17/21 3:39 PM, Tony Battersby wrote: >> On 8/17/21 5:28 PM, Jens Axboe wrote: >>> Another approach - don't allow TWA_SIGNAL task_work to get queued if >>> PF_SIGNALED has been set on the task. This is similar to how we reject >>> task_work_add() on process exit, and the callers must be able to handle >>> that already. >>> >>> Can you test this one on top of your 5.10-stable? >>> >>> >>> diff --git a/fs/coredump.c b/fs/coredump.c >>> index 07afb5ddb1c4..ca7c1ee44ada 100644 >>> --- a/fs/coredump.c >>> +++ b/fs/coredump.c >>> @@ -602,6 +602,14 @@ void do_coredump(const kernel_siginfo_t *siginfo) >>> .mm_flags = mm->flags, >>> }; >>> >>> + /* >>> + * task_work_add() will refuse to add work after PF_SIGNALED has >>> + * been set, ensure that we flush any pending TIF_NOTIFY_SIGNAL work >>> + * if any was queued before that. >>> + */ >>> + if (test_thread_flag(TIF_NOTIFY_SIGNAL)) >>> + tracehook_notify_signal(); >>> + >>> audit_core_dumps(siginfo->si_signo); >>> >>> binfmt = mm->binfmt; >>> diff --git a/kernel/task_work.c b/kernel/task_work.c >>> index 1698fbe6f0e1..1ab28904adc4 100644 >>> --- a/kernel/task_work.c >>> +++ b/kernel/task_work.c >>> @@ -41,6 +41,12 @@ int task_work_add(struct task_struct *task, struct callback_head *work, >>> head = READ_ONCE(task->task_works); >>> if (unlikely(head == &work_exited)) >>> return -ESRCH; >>> + /* >>> + * TIF_NOTIFY_SIGNAL notifications will interfere with >>> + * a core dump in progress, reject them. >>> + */ >>> + if ((task->flags & PF_SIGNALED) && notify == TWA_SIGNAL) >>> + return -ESRCH; >>> work->next = head; >>> } while (cmpxchg(&task->task_works, head, work) != head); >>> >>> >> Doesn't compile. 5.10 doesn't have TIF_NOTIFY_SIGNAL. > Oh right... Here's one hacked up for the 5.10 TWA_SIGNAL setup. Totally > untested... > > diff --git a/fs/coredump.c b/fs/coredump.c > index c6acfc694f65..9e899ce67589 100644 > --- a/fs/coredump.c > +++ b/fs/coredump.c > @@ -603,6 +603,19 @@ void do_coredump(const kernel_siginfo_t *siginfo) > .mm_flags = mm->flags, > }; > > + /* > + * task_work_add() will refuse to add work after PF_SIGNALED has > + * been set, ensure that we flush any pending TWA_SIGNAL work > + * if any was queued before that. > + */ > + if (signal_pending(current) && (current->jobctl & JOBCTL_TASK_WORK)) { > + task_work_run(); > + spin_lock_irq(¤t->sighand->siglock); > + current->jobctl &= ~JOBCTL_TASK_WORK; > + recalc_sigpending(); > + spin_unlock_irq(¤t->sighand->siglock); > + } > + > audit_core_dumps(siginfo->si_signo); > > binfmt = mm->binfmt; > diff --git a/kernel/task_work.c b/kernel/task_work.c > index 8d6e1217c451..93b3f262eb4a 100644 > --- a/kernel/task_work.c > +++ b/kernel/task_work.c > @@ -39,6 +39,12 @@ int task_work_add(struct task_struct *task, struct callback_head *work, > head = READ_ONCE(task->task_works); > if (unlikely(head == &work_exited)) > return -ESRCH; > + /* > + * TWA_SIGNAL notifications will interfere with > + * a core dump in progress, reject them. 
> + */ > + if ((task->flags & PF_SIGNALED) && notify == TWA_SIGNAL) > + return -ESRCH; > work->next = head; > } while (cmpxchg(&task->task_works, head, work) != head); > > Tested with 5.10.59 + backport 06af8679449d + the patch above. That fixes it for me. I tested a couple of variations to make sure. Thanks! Tested-by: Tony Battersby <[email protected]> ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH] coredump: Limit what can interrupt coredumps 2021-08-18 14:37 ` Tony Battersby @ 2021-08-18 14:46 ` Jens Axboe 0 siblings, 0 replies; 66+ messages in thread From: Jens Axboe @ 2021-08-18 14:46 UTC (permalink / raw) To: Tony Battersby, Olivier Langlois, Eric W. Biederman, Oleg Nesterov Cc: Linus Torvalds, Linux Kernel Mailing List, linux-fsdevel, io-uring, Alexander Viro, Pavel Begunkov> On 8/18/21 8:37 AM, Tony Battersby wrote: > On 8/17/21 6:05 PM, Jens Axboe wrote: >> On 8/17/21 3:39 PM, Tony Battersby wrote: >>> On 8/17/21 5:28 PM, Jens Axboe wrote: >>>> Another approach - don't allow TWA_SIGNAL task_work to get queued if >>>> PF_SIGNALED has been set on the task. This is similar to how we reject >>>> task_work_add() on process exit, and the callers must be able to handle >>>> that already. >>>> >>>> Can you test this one on top of your 5.10-stable? >>>> >>>> >>>> diff --git a/fs/coredump.c b/fs/coredump.c >>>> index 07afb5ddb1c4..ca7c1ee44ada 100644 >>>> --- a/fs/coredump.c >>>> +++ b/fs/coredump.c >>>> @@ -602,6 +602,14 @@ void do_coredump(const kernel_siginfo_t *siginfo) >>>> .mm_flags = mm->flags, >>>> }; >>>> >>>> + /* >>>> + * task_work_add() will refuse to add work after PF_SIGNALED has >>>> + * been set, ensure that we flush any pending TIF_NOTIFY_SIGNAL work >>>> + * if any was queued before that. >>>> + */ >>>> + if (test_thread_flag(TIF_NOTIFY_SIGNAL)) >>>> + tracehook_notify_signal(); >>>> + >>>> audit_core_dumps(siginfo->si_signo); >>>> >>>> binfmt = mm->binfmt; >>>> diff --git a/kernel/task_work.c b/kernel/task_work.c >>>> index 1698fbe6f0e1..1ab28904adc4 100644 >>>> --- a/kernel/task_work.c >>>> +++ b/kernel/task_work.c >>>> @@ -41,6 +41,12 @@ int task_work_add(struct task_struct *task, struct callback_head *work, >>>> head = READ_ONCE(task->task_works); >>>> if (unlikely(head == &work_exited)) >>>> return -ESRCH; >>>> + /* >>>> + * TIF_NOTIFY_SIGNAL notifications will interfere with >>>> + * a core dump in progress, reject them. >>>> + */ >>>> + if ((task->flags & PF_SIGNALED) && notify == TWA_SIGNAL) >>>> + return -ESRCH; >>>> work->next = head; >>>> } while (cmpxchg(&task->task_works, head, work) != head); >>>> >>>> >>> Doesn't compile. 5.10 doesn't have TIF_NOTIFY_SIGNAL. >> Oh right... Here's one hacked up for the 5.10 TWA_SIGNAL setup. Totally >> untested... >> >> diff --git a/fs/coredump.c b/fs/coredump.c >> index c6acfc694f65..9e899ce67589 100644 >> --- a/fs/coredump.c >> +++ b/fs/coredump.c >> @@ -603,6 +603,19 @@ void do_coredump(const kernel_siginfo_t *siginfo) >> .mm_flags = mm->flags, >> }; >> >> + /* >> + * task_work_add() will refuse to add work after PF_SIGNALED has >> + * been set, ensure that we flush any pending TWA_SIGNAL work >> + * if any was queued before that. >> + */ >> + if (signal_pending(current) && (current->jobctl & JOBCTL_TASK_WORK)) { >> + task_work_run(); >> + spin_lock_irq(¤t->sighand->siglock); >> + current->jobctl &= ~JOBCTL_TASK_WORK; >> + recalc_sigpending(); >> + spin_unlock_irq(¤t->sighand->siglock); >> + } >> + >> audit_core_dumps(siginfo->si_signo); >> >> binfmt = mm->binfmt; >> diff --git a/kernel/task_work.c b/kernel/task_work.c >> index 8d6e1217c451..93b3f262eb4a 100644 >> --- a/kernel/task_work.c >> +++ b/kernel/task_work.c >> @@ -39,6 +39,12 @@ int task_work_add(struct task_struct *task, struct callback_head *work, >> head = READ_ONCE(task->task_works); >> if (unlikely(head == &work_exited)) >> return -ESRCH; >> + /* >> + * TWA_SIGNAL notifications will interfere with >> + * a core dump in progress, reject them. 
>> + */ >> + if ((task->flags & PF_SIGNALED) && notify == TWA_SIGNAL) >> + return -ESRCH; >> work->next = head; >> } while (cmpxchg(&task->task_works, head, work) != head); >> >> > Tested with 5.10.59 + backport 06af8679449d + the patch above. That > fixes it for me. I tested a couple of variations to make sure. > > Thanks! > > Tested-by: Tony Battersby <[email protected]> Great, thanks for testing! The 5.10 version is a bit uglier due to how TWA_SIGNAL used to work, but it's the most straight forward backport of the other version I sent. -- Jens Axboe ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH] coredump: Limit what can interrupt coredumps 2021-08-17 21:28 ` Jens Axboe 2021-08-17 21:39 ` Tony Battersby @ 2021-08-18 2:57 ` Jens Axboe 2021-08-18 2:58 ` Jens Axboe 2021-08-21 10:08 ` Olivier Langlois 1 sibling, 2 replies; 66+ messages in thread From: Jens Axboe @ 2021-08-18 2:57 UTC (permalink / raw) To: Tony Battersby, Olivier Langlois, Eric W. Biederman, Oleg Nesterov Cc: Linus Torvalds, Linux Kernel Mailing List, linux-fsdevel, io-uring, Alexander Viro, Pavel Begunkov> On 8/17/21 3:28 PM, Jens Axboe wrote: > On 8/17/21 1:59 PM, Jens Axboe wrote: >> On 8/17/21 1:29 PM, Tony Battersby wrote: >>> On 8/17/21 2:24 PM, Jens Axboe wrote: >>>> On 8/17/21 12:15 PM, Jens Axboe wrote: >>>>> On 8/15/21 2:42 PM, Olivier Langlois wrote: >>>>>> On Wed, 2021-08-11 at 19:55 -0600, Jens Axboe wrote: >>>>>>> On 8/10/21 3:48 PM, Tony Battersby wrote: >>>>>>>> On 8/5/21 9:06 AM, Olivier Langlois wrote: >>>>>>>>> Hi all, >>>>>>>>> >>>>>>>>> I didn't forgot about this remaining issue and I have kept thinking >>>>>>>>> about it on and off. >>>>>>>>> >>>>>>>>> I did try the following on 5.12.19: >>>>>>>>> >>>>>>>>> diff --git a/fs/coredump.c b/fs/coredump.c >>>>>>>>> index 07afb5ddb1c4..614fe7a54c1a 100644 >>>>>>>>> --- a/fs/coredump.c >>>>>>>>> +++ b/fs/coredump.c >>>>>>>>> @@ -41,6 +41,7 @@ >>>>>>>>> #include <linux/fs.h> >>>>>>>>> #include <linux/path.h> >>>>>>>>> #include <linux/timekeeping.h> >>>>>>>>> +#include <linux/io_uring.h> >>>>>>>>> >>>>>>>>> #include <linux/uaccess.h> >>>>>>>>> #include <asm/mmu_context.h> >>>>>>>>> @@ -625,6 +626,8 @@ void do_coredump(const kernel_siginfo_t >>>>>>>>> *siginfo) >>>>>>>>> need_suid_safe = true; >>>>>>>>> } >>>>>>>>> >>>>>>>>> + io_uring_files_cancel(current->files); >>>>>>>>> + >>>>>>>>> retval = coredump_wait(siginfo->si_signo, &core_state); >>>>>>>>> if (retval < 0) >>>>>>>>> goto fail_creds; >>>>>>>>> -- >>>>>>>>> 2.32.0 >>>>>>>>> >>>>>>>>> with my current understanding, io_uring_files_cancel is supposed to >>>>>>>>> cancel everything that might set the TIF_NOTIFY_SIGNAL. >>>>>>>>> >>>>>>>>> I must report that in my testing with generating a core dump >>>>>>>>> through a >>>>>>>>> pipe with the modif above, I still get truncated core dumps. >>>>>>>>> >>>>>>>>> systemd is having a weird error: >>>>>>>>> [ 2577.870742] systemd-coredump[4056]: Failed to get COMM: No such >>>>>>>>> process >>>>>>>>> >>>>>>>>> and nothing is captured >>>>>>>>> >>>>>>>>> so I have replaced it with a very simple shell: >>>>>>>>> $ cat /proc/sys/kernel/core_pattern >>>>>>>>>> /home/lano1106/bin/pipe_core.sh %e %p >>>>>>>>> ~/bin $ cat pipe_core.sh >>>>>>>>> #!/bin/sh >>>>>>>>> >>>>>>>>> cat > /home/lano1106/core/core.$1.$2 >>>>>>>>> >>>>>>>>> BFD: warning: /home/lano1106/core/core.test.10886 is truncated: >>>>>>>>> expected core file size >= 24129536, found: 61440 >>>>>>>>> >>>>>>>>> I conclude from my attempt that maybe io_uring_files_cancel is not >>>>>>>>> 100% >>>>>>>>> cleaning everything that it should clean. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> I just ran into this problem also - coredumps from an io_uring >>>>>>>> program >>>>>>>> to a pipe are truncated. But I am using kernel 5.10.57, which does >>>>>>>> NOT >>>>>>>> have commit 12db8b690010 ("entry: Add support for TIF_NOTIFY_SIGNAL") >>>>>>>> or >>>>>>>> commit 06af8679449d ("coredump: Limit what can interrupt coredumps"). >>>>>>>> Kernel 5.4 works though, so I bisected the problem to commit >>>>>>>> f38c7e3abfba ("io_uring: ensure async buffered read-retry is setup >>>>>>>> properly") in kernel 5.9. 
Note that my io_uring program uses only >>>>>>>> async >>>>>>>> buffered reads, which may be why this particular commit makes a >>>>>>>> difference to my program. >>>>>>>> >>>>>>>> My io_uring program is a multi-purpose long-running program with many >>>>>>>> threads. Most threads don't use io_uring but a few of them do. >>>>>>>> Normally, my core dumps are piped to a program so that they can be >>>>>>>> compressed before being written to disk, but I can also test writing >>>>>>>> the >>>>>>>> core dumps directly to disk. This is what I have found: >>>>>>>> >>>>>>>> *) Unpatched 5.10.57: if a thread that doesn't use io_uring triggers >>>>>>>> a >>>>>>>> coredump, the core file is written correctly, whether it is written >>>>>>>> to >>>>>>>> disk or piped to a program, even if another thread is using io_uring >>>>>>>> at >>>>>>>> the same time. >>>>>>>> >>>>>>>> *) Unpatched 5.10.57: if a thread that uses io_uring triggers a >>>>>>>> coredump, the core file is truncated, whether written directly to >>>>>>>> disk >>>>>>>> or piped to a program. >>>>>>>> >>>>>>>> *) 5.10.57+backport 06af8679449d: if a thread that uses io_uring >>>>>>>> triggers a coredump, and the core is written directly to disk, then >>>>>>>> it >>>>>>>> is written correctly. >>>>>>>> >>>>>>>> *) 5.10.57+backport 06af8679449d: if a thread that uses io_uring >>>>>>>> triggers a coredump, and the core is piped to a program, then it is >>>>>>>> truncated. >>>>>>>> >>>>>>>> *) 5.10.57+revert f38c7e3abfba: core dumps are written correctly, >>>>>>>> whether written directly to disk or piped to a program. >>>>>>> That is very interesting. Like Olivier mentioned, it's not that actual >>>>>>> commit, but rather the change of behavior implemented by it. Before >>>>>>> that >>>>>>> commit, we'd hit the async workers more often, whereas after we do the >>>>>>> correct retry method where it's driven by the wakeup when the page is >>>>>>> unlocked. This is purely speculation, but perhaps the fact that the >>>>>>> process changes state potentially mid dump is why the dump ends up >>>>>>> being >>>>>>> truncated? >>>>>>> >>>>>>> I'd love to dive into this and try and figure it out. Absent a test >>>>>>> case, at least the above gives me an idea of what to try out. I'll see >>>>>>> if it makes it easier for me to create a case that does result in a >>>>>>> truncated core dump. >>>>>>> >>>>>> Jens, >>>>>> >>>>>> When I have first encountered the issue, the very first thing that I >>>>>> did try was to create a simple test program that would synthetize the >>>>>> problem. >>>>>> >>>>>> After few time consumming failed attempts, I just gave up the idea and >>>>>> simply settle to my prod program that showcase systematically the >>>>>> problem every time that I kill the process with a SEGV signal. >>>>>> >>>>>> In a nutshell, all the program does is to issue read operations with >>>>>> io_uring on a TCP socket on which there is a constant data stream. >>>>>> >>>>>> Now that I have a better understanding of what is going on, I think >>>>>> that one way that could reproduce the problem consistently could be >>>>>> along those lines: >>>>>> >>>>>> 1. Create a pipe >>>>>> 2. fork a child >>>>>> 3. Initiate a read operation on the pipe with io_uring from the child >>>>>> 4. Let the parent kill its child with a core dump generating signal. >>>>>> 5. Write something in the pipe from the parent so that the io_uring >>>>>> read operation completes while the core dump is generated. 
>>>>>> >>>>>> I guess that I'll end up doing that if I cannot fix the issue with my >>>>>> current setup but here is what I have attempted so far: >>>>>> >>>>>> 1. Call io_uring_files_cancel from do_coredump >>>>>> 2. Same as #1 but also make sure that TIF_NOTIFY_SIGNAL is cleared on >>>>>> returning from io_uring_files_cancel >>>>>> >>>>>> Those attempts didn't work but lurking in the io_uring dev mailing list >>>>>> is starting to pay off. I thought that I did reach the bottom of the >>>>>> rabbit hole in my journey of understanding io_uring but the recent >>>>>> patch set sent by Hao Xu >>>>>> >>>>>> https://lore.kernel.org/io-uring/[email protected]/T/#t >>>>>> >>>>>> made me realize that I still haven't assimilated all the small io_uring >>>>>> nuances... >>>>>> >>>>>> Here is my feedback. From my casual io_uring code reader point of view, >>>>>> it is not 100% obvious what the difference is between >>>>>> io_uring_files_cancel and io_uring_task_cancel >>>>>> >>>>>> It seems like io_uring_files_cancel is cancelling polls only if they >>>>>> have the REQ_F_INFLIGHT flag set. >>>>>> >>>>>> I have no idea what an inflight request means and why someone would >>>>>> want to call io_uring_files_cancel over io_uring_task_cancel. >>>>>> >>>>>> I guess that if I was to meditate on the question for few hours, I >>>>>> would at some point get some illumination strike me but I believe that >>>>>> it could be a good idea to document in the code those concepts for >>>>>> helping casual readers... >>>>>> >>>>>> Bottomline, I now understand that io_uring_files_cancel does not cancel >>>>>> all the requests. Therefore, without fully understanding what I am >>>>>> doing, I am going to replace my call to io_uring_files_cancel from >>>>>> do_coredump with io_uring_task_cancel and see if this finally fix the >>>>>> issue for good. >>>>>> >>>>>> What I am trying to do is to cancel pending io_uring requests to make >>>>>> sure that TIF_NOTIFY_SIGNAL isn't set while core dump is generated. >>>>>> >>>>>> Maybe another solution would simply be to modify __dump_emit to make it >>>>>> resilient to TIF_NOTIFY_SIGNAL as Eric W. Biederman originally >>>>>> suggested. >>>>>> >>>>>> or maybe do both... >>>>>> >>>>>> Not sure which approach is best. If someone has an opinion, I would be >>>>>> curious to hear it. >>>>> It does indeed sound like it's TIF_NOTIFY_SIGNAL that will trigger some >>>>> signal_pending() and cause an interruption of the core dump. Just out of >>>>> curiosity, what is your /proc/sys/kernel/core_pattern set to? If it's >>>>> set to some piped process, can you try and set it to 'core' and see if >>>>> that eliminates the truncation of the core dumps for your case? >>>> And assuming that works, then I suspect this one would fix your issue >>>> even with a piped core dump: >>>> >>>> diff --git a/fs/coredump.c b/fs/coredump.c >>>> index 07afb5ddb1c4..852737a9ccbf 100644 >>>> --- a/fs/coredump.c >>>> +++ b/fs/coredump.c >>>> @@ -41,6 +41,7 @@ >>>> #include <linux/fs.h> >>>> #include <linux/path.h> >>>> #include <linux/timekeeping.h> >>>> +#include <linux/io_uring.h> >>>> >>>> #include <linux/uaccess.h> >>>> #include <asm/mmu_context.h> >>>> @@ -603,6 +604,7 @@ void do_coredump(const kernel_siginfo_t *siginfo) >>>> }; >>>> >>>> audit_core_dumps(siginfo->si_signo); >>>> + io_uring_task_cancel(); >>>> >>>> binfmt = mm->binfmt; >>>> if (!binfmt || !binfmt->core_dump) >>>> >>> FYI, I tested kernel 5.10.59 + backport 06af8679449d + the patch above >>> with my io_uring program. 
The coredump locked up even when writing the >>> core file directly to disk; the zombie process could not be killed with >>> "kill -9". Unfortunately I can't test with newer kernels without >>> spending some time on it, and I am too busy with other stuff right now. >> >> That sounds like 5.10-stable is missing some of the cancelation >> backports, and your setup makes the cancelation stall because of that. >> Need to go over the 11/12/13 fixes and ensure that we've got everything >> we need for those stable versions, particularly 5.10. >> >>> My io_uring program does async buffered reads >>> (io_uring_prep_read()/io_uring_prep_readv()) from a raw disk partition >>> (no filesystem). One thread submits I/Os while another thread calls >>> io_uring_wait_cqe() and processes the completions. To trigger the >>> coredump, I added an intentional abort() in the thread that submits I/Os >>> after running for a second. >> >> OK, so that one is also using task_work for the retry based async >> buffered reads, so it makes sense. >> >> Maybe a temporary work-around is to use 06af8679449d and eliminate the >> pipe based coredump? > > Another approach - don't allow TWA_SIGNAL task_work to get queued if > PF_SIGNALED has been set on the task. This is similar to how we reject > task_work_add() on process exit, and the callers must be able to handle > that already. > > Can you test this one on top of your 5.10-stable? > > > diff --git a/fs/coredump.c b/fs/coredump.c > index 07afb5ddb1c4..ca7c1ee44ada 100644 > --- a/fs/coredump.c > +++ b/fs/coredump.c > @@ -602,6 +602,14 @@ void do_coredump(const kernel_siginfo_t *siginfo) > .mm_flags = mm->flags, > }; > > + /* > + * task_work_add() will refuse to add work after PF_SIGNALED has > + * been set, ensure that we flush any pending TIF_NOTIFY_SIGNAL work > + * if any was queued before that. > + */ > + if (test_thread_flag(TIF_NOTIFY_SIGNAL)) > + tracehook_notify_signal(); > + > audit_core_dumps(siginfo->si_signo); > > binfmt = mm->binfmt; > diff --git a/kernel/task_work.c b/kernel/task_work.c > index 1698fbe6f0e1..1ab28904adc4 100644 > --- a/kernel/task_work.c > +++ b/kernel/task_work.c > @@ -41,6 +41,12 @@ int task_work_add(struct task_struct *task, struct callback_head *work, > head = READ_ONCE(task->task_works); > if (unlikely(head == &work_exited)) > return -ESRCH; > + /* > + * TIF_NOTIFY_SIGNAL notifications will interfere with > + * a core dump in progress, reject them. > + */ > + if ((task->flags & PF_SIGNALED) && notify == TWA_SIGNAL) > + return -ESRCH; > work->next = head; > } while (cmpxchg(&task->task_works, head, work) != head); > > Olivier, I sent a 5.10 version for Nathan, any chance you can test this one for the current kernels? Basically this one should work for 5.11+, and the later 5.10 version is just for 5.10. I'm going to send it out separately for review. I do think this is the right solution, barring a tweak maybe on testing notify == TWA_SIGNAL first before digging into the task struct. But the principle is sound, and it'll work for other users of TWA_SIGNAL as well. None right now as far as I can tell, but the live patching is switching to TIF_NOTIFY_SIGNAL as well which will also cause issues with coredumps potentially. -- Jens Axboe ^ permalink raw reply [flat|nested] 66+ messages in thread
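The tweak mentioned above, testing notify before digging into the task struct, amounts to reordering the new check so the cheap comparison short-circuits first; a sketch:

/* Reordered form of the check from the patch upthread, sketched:
 * the common TWA_RESUME/TWA_NONE paths never read task->flags at all.
 */
if (notify == TWA_SIGNAL && (task->flags & PF_SIGNALED))
	return -ESRCH;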
* Re: [PATCH] coredump: Limit what can interrupt coredumps 2021-08-18 2:57 ` Jens Axboe @ 2021-08-18 2:58 ` Jens Axboe 2021-08-21 10:08 ` Olivier Langlois 1 sibling, 0 replies; 66+ messages in thread From: Jens Axboe @ 2021-08-18 2:58 UTC (permalink / raw) To: Tony Battersby, Olivier Langlois, Eric W. Biederman, Oleg Nesterov Cc: Linus Torvalds, Linux Kernel Mailing List, linux-fsdevel, io-uring, Alexander Viro, Pavel Begunkov> > Olivier, I sent a 5.10 version for Nathan, any chance you can test this ^^^^^^ Tony of course, my apologies. -- Jens Axboe ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH] coredump: Limit what can interrupt coredumps
  2021-08-18  2:57           ` Jens Axboe
  2021-08-18  2:58             ` Jens Axboe
@ 2021-08-21 10:08           ` Olivier Langlois
  2021-08-21 16:47             ` Olivier Langlois
  1 sibling, 1 reply; 66+ messages in thread
From: Olivier Langlois @ 2021-08-21 10:08 UTC (permalink / raw)
  To: Jens Axboe, Tony Battersby, Eric W. Biederman, Oleg Nesterov
  Cc: Linus Torvalds, Linux Kernel Mailing List, linux-fsdevel,
	io-uring, Alexander Viro, Pavel Begunkov

On Tue, 2021-08-17 at 20:57 -0600, Jens Axboe wrote:
> 
> Olivier, I sent a 5.10 version for Nathan, any chance you can test
> this
> one for the current kernels? Basically this one should work for
> 5.11+,
> and the later 5.10 version is just for 5.10. I'm going to send it out
> separately for review.
> 
> I do think this is the right solution, barring a tweak maybe on
> testing
> notify == TWA_SIGNAL first before digging into the task struct. But
> the
> principle is sound, and it'll work for other users of TWA_SIGNAL as
> well. None right now as far as I can tell, but the live patching is
> switching to TIF_NOTIFY_SIGNAL as well which will also cause issues
> with
> coredumps potentially.
> 
Ok, I am going to give it a shot. This solution is probably superior
to the previous attempt as it does not inject an io_uring dependency
into the coredump module.

The small extra change that I alluded to in my previous reply will
still be relevant even if we go with your patch...

I'll come back soon with the results of testing your patch and the
small extra change that I keep teasing about.

Greetings,

^ permalink raw reply	[flat|nested] 66+ messages in thread
* Re: [PATCH] coredump: Limit what can interrupt coredumps
  2021-08-21 10:08 ` Olivier Langlois
@ 2021-08-21 16:47 ` Olivier Langlois
  2021-08-21 16:51 ` Jens Axboe
  0 siblings, 1 reply; 66+ messages in thread
From: Olivier Langlois @ 2021-08-21 16:47 UTC (permalink / raw)
  To: Jens Axboe, Tony Battersby, Eric W. Biederman, Oleg Nesterov
  Cc: Linus Torvalds, Linux Kernel Mailing List, linux-fsdevel,
      io-uring, Alexander Viro, Pavel Begunkov>

On Sat, 2021-08-21 at 06:08 -0400, Olivier Langlois wrote:
> On Tue, 2021-08-17 at 20:57 -0600, Jens Axboe wrote:
> > 
> > Olivier, I sent a 5.10 version for Nathan, any chance you can test
> > this
> > one for the current kernels? Basically this one should work for
> > 5.11+,
> > and the later 5.10 version is just for 5.10. I'm going to send it
> > out
> > separately for review.
> > 
> > I do think this is the right solution, barring a tweak maybe on
> > testing
> > notify == TWA_SIGNAL first before digging into the task struct. But
> > the
> > principle is sound, and it'll work for other users of TWA_SIGNAL as
> > well. None right now as far as I can tell, but the live patching is
> > switching to TIF_NOTIFY_SIGNAL as well which will also cause issues
> > with
> > coredumps potentially.
> > 
> Ok, I am going to give it a shot. This solution is probably superior
> to
> the previous attempt as it does not inject io_uring dependency into
> the
> coredump module.
> 
> The small extra change that I alluded to in my previous reply will
> still be relevant even if we go with your patch...
> 
> I'll come back soon with your patch testing result and my small extra
> change that I keep teasing about.
> 
> Greetings,
> 
Jens,

your patch doesn't compile with 5.12+. AFAIK, the reason is that
JOBCTL_TASK_WORK is gone.

Wouldn't just a call to tracehook_notify_signal from do_coredump be
enough and backward compatible with every possible stable branch?

Greetings,

^ permalink raw reply	[flat|nested] 66+ messages in thread
* Re: [PATCH] coredump: Limit what can interrupt coredumps 2021-08-21 16:47 ` Olivier Langlois @ 2021-08-21 16:51 ` Jens Axboe 2021-08-21 17:21 ` Olivier Langlois 0 siblings, 1 reply; 66+ messages in thread From: Jens Axboe @ 2021-08-21 16:51 UTC (permalink / raw) To: Olivier Langlois, Tony Battersby, Eric W. Biederman, Oleg Nesterov Cc: Linus Torvalds, Linux Kernel Mailing List, linux-fsdevel, io-uring, Alexander Viro, Pavel Begunkov> On 8/21/21 10:47 AM, Olivier Langlois wrote: > On Sat, 2021-08-21 at 06:08 -0400, Olivier Langlois wrote: >> On Tue, 2021-08-17 at 20:57 -0600, Jens Axboe wrote: >>> >>> Olivier, I sent a 5.10 version for Nathan, any chance you can test >>> this >>> one for the current kernels? Basically this one should work for >>> 5.11+, >>> and the later 5.10 version is just for 5.10. I'm going to send it >>> out >>> separately for review. >>> >>> I do think this is the right solution, barring a tweak maybe on >>> testing >>> notify == TWA_SIGNAL first before digging into the task struct. But >>> the >>> principle is sound, and it'll work for other users of TWA_SIGNAL as >>> well. None right now as far as I can tell, but the live patching is >>> switching to TIF_NOTIFY_SIGNAL as well which will also cause issues >>> with >>> coredumps potentially. >>> >> Ok, I am going to give it a shot. This solution is probably superior >> to >> the previous attempt as it does not inject io_uring dependency into >> the >> coredump module. >> >> The small extra change that I alluded to in my previous reply will >> still be relevant even if we go with your patch... >> >> I'll come back soon with your patch testing result and my small extra >> change that I keep teasing about. >> >> Greetings, >> > Jens, > > your patch doesn't compile with 5.12+. AFAIK, the reason is that > JOBCTL_TASK_WORK is gone. > > Wouldn't just a call to tracehook_notify_signal from do_coredump be > enough and backward compatible with every possible stable branches? That version is just for 5.10, the first I posted is applicable to 5.11+ -- Jens Axboe ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH] coredump: Limit what can interrupt coredumps
  2021-08-21 16:51 ` Jens Axboe
@ 2021-08-21 17:21 ` Olivier Langlois
  0 siblings, 0 replies; 66+ messages in thread
From: Olivier Langlois @ 2021-08-21 17:21 UTC (permalink / raw)
  To: Jens Axboe, Tony Battersby, Eric W. Biederman, Oleg Nesterov
  Cc: Linus Torvalds, Linux Kernel Mailing List, linux-fsdevel,
      io-uring, Alexander Viro, Pavel Begunkov>

On Sat, 2021-08-21 at 10:51 -0600, Jens Axboe wrote:
> On 8/21/21 10:47 AM, Olivier Langlois wrote:
> > Jens,
> > 
> > your patch doesn't compile with 5.12+. AFAIK, the reason is that
> > JOBCTL_TASK_WORK is gone.
> > 
> > Wouldn't just a call to tracehook_notify_signal from do_coredump be
> > enough and backward compatible with every possible stable branches?
> 
> That version is just for 5.10, the first I posted is applicable to
> 5.11+
> 
Ok, in that case, I can tell you that it partially works. There is a
small thing missing.

I have tested mine, which I shared in
https://lore.kernel.org/io-uring/87pmwt6biw.fsf@disp2133/T/#m3b51d44c12e39a06b82aac1d372df05312cff833
and with another small patch added to it, it does work.

I will offer my patch later today.

^ permalink raw reply	[flat|nested] 66+ messages in thread
* Re: [PATCH] coredump: Limit what can interrupt coredumps
  2021-08-17 18:24 ` Jens Axboe
  2021-08-17 19:29 ` Tony Battersby
@ 2021-08-21  9:52 ` Olivier Langlois
  1 sibling, 0 replies; 66+ messages in thread
From: Olivier Langlois @ 2021-08-21 9:52 UTC (permalink / raw)
  To: Jens Axboe, Tony Battersby, Eric W. Biederman, Oleg Nesterov
  Cc: Linus Torvalds, Linux Kernel Mailing List, linux-fsdevel,
      io-uring, Alexander Viro, Pavel Begunkov>

On Tue, 2021-08-17 at 12:24 -0600, Jens Axboe wrote:
> 
> And assuming that works, then I suspect this one would fix your issue
> even with a piped core dump:
> 
> diff --git a/fs/coredump.c b/fs/coredump.c
> index 07afb5ddb1c4..852737a9ccbf 100644
> --- a/fs/coredump.c
> +++ b/fs/coredump.c
> @@ -41,6 +41,7 @@
>  #include <linux/fs.h>
>  #include <linux/path.h>
>  #include <linux/timekeeping.h>
> +#include <linux/io_uring.h>
>  
>  #include <linux/uaccess.h>
>  #include <asm/mmu_context.h>
> @@ -603,6 +604,7 @@ void do_coredump(const kernel_siginfo_t *siginfo)
>  	};
>  
>  	audit_core_dumps(siginfo->si_signo);
> +	io_uring_task_cancel();
>  
>  	binfmt = mm->binfmt;
>  	if (!binfmt || !binfmt->core_dump)
> 
That is what my patch is doing. The function call is inserted at a
different place... I am not sure if one location is better than the
other, or if it matters at all, but there is an extra change required
to make it work...

diff --git a/fs/coredump.c b/fs/coredump.c
index 07afb5ddb1c4..614fe7a54c1a 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -41,6 +41,7 @@
 #include <linux/fs.h>
 #include <linux/path.h>
 #include <linux/timekeeping.h>
+#include <linux/io_uring.h>
 
 #include <linux/uaccess.h>
 #include <asm/mmu_context.h>
@@ -625,6 +626,8 @@ void do_coredump(const kernel_siginfo_t *siginfo)
 		need_suid_safe = true;
 	}
 
+	io_uring_task_cancel();
+
 	retval = coredump_wait(siginfo->si_signo, &core_state);
 	if (retval < 0)
 		goto fail_creds;

^ permalink raw reply related	[flat|nested] 66+ messages in thread
* Re: [PATCH] coredump: Limit what can interrupt coredumps
  2021-08-17 18:15 ` Jens Axboe
  2021-08-17 18:24 ` Jens Axboe
@ 2021-08-21  9:48 ` Olivier Langlois
  1 sibling, 0 replies; 66+ messages in thread
From: Olivier Langlois @ 2021-08-21 9:48 UTC (permalink / raw)
  To: Jens Axboe, Tony Battersby, Eric W. Biederman, Oleg Nesterov
  Cc: Linus Torvalds, Linux Kernel Mailing List, linux-fsdevel,
      io-uring, Alexander Viro, Pavel Begunkov>

On Tue, 2021-08-17 at 12:15 -0600, Jens Axboe wrote:
> 
> It does indeed sound like it's TIF_NOTIFY_SIGNAL that will trigger
> some
> signal_pending() and cause an interruption of the core dump. Just out
> of
> curiosity, what is your /proc/sys/kernel/core_pattern set to? If it's
> set to some piped process, can you try and set it to 'core' and see
> if
> that eliminates the truncation of the core dumps for your case?
> 
/proc/sys/kernel/core_pattern is set to:
|/home/lano1106/bin/pipe_core.sh %e %p

It normally points to the systemd coredump module. I have pointed it to
a simpler program for debugging purposes.

When core_pattern points to a local file, core dump files are just
fine. That was the whole point of
commit 06af8679449d ("coredump: Limit what can interrupt coredumps").

I have been distracted by other things this week, but my last attempt
to fix this problem appears to be successful. I will send out a small
patch set shortly...

^ permalink raw reply	[flat|nested] 66+ messages in thread
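[For context on the pipe handler mentioned above: the kernel execs the program named by a '|' core_pattern and writes the core image to its stdin, so a handler is essentially a loop draining stdin. A hypothetical minimal C equivalent of a pipe_core.sh-style helper - not Olivier's actual script - looks like this; a truncated dump shows up here as an early EOF:]

/* pipe_core.c - registered as: |/path/to/pipe_core %e %p */
#include <stdio.h>

int main(int argc, char **argv)
{
	char path[512], buf[65536];
	size_t n, total = 0;
	FILE *out;

	if (argc < 3)
		return 1;
	/* argv[1] is %e (comm), argv[2] is %p (pid), per the pattern above */
	snprintf(path, sizeof(path), "/tmp/core.%s.%s", argv[1], argv[2]);
	out = fopen(path, "w");
	if (!out)
		return 1;
	/* if dump_interrupted() fires in the kernel, this loop sees EOF early */
	while ((n = fread(buf, 1, sizeof(buf), stdin)) > 0)
		total += fwrite(buf, 1, n, out);
	fprintf(stderr, "wrote %zu bytes to %s\n", total, path);
	fclose(out);
	return 0;
}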
* Re: [RFC] coredump: Do not interrupt dump for TIF_NOTIFY_SIGNAL
  [not found] ` <87h7i694ij.fsf_-_@disp2133>
  2021-06-09 20:33 ` [RFC] coredump: Do not interrupt dump for TIF_NOTIFY_SIGNAL Linus Torvalds
@ 2021-10-22 14:13 ` Pavel Begunkov
  2021-12-24  1:34 ` Olivier Langlois
  1 sibling, 1 reply; 66+ messages in thread
From: Pavel Begunkov @ 2021-10-22 14:13 UTC (permalink / raw)
  To: Eric W. Biederman, linux-kernel
  Cc: linux-fsdevel, io-uring, Alexander Viro, Olivier Langlois,
      Jens Axboe, Oleg Nesterov, Linus Torvalds

On 6/9/21 21:17, Eric W. Biederman wrote:
> 
> Folks,
> 
> Olivier Langlois has been struggling with coredumps getting truncated in
> tasks using io_uring. He has also apparently been struggling with
> some of his email messages not making it to the lists.

Looks like syzbot hit something relevant, see
https://lore.kernel.org/io-uring/[email protected]/

In short, a task creates an io_uring worker thread, then the worker
submits a task_work item to the creator task and won't die until
the item is executed/cancelled. And I found that the creator task is
sleeping in do_coredump() -> wait_for_completion()

0xffffffff81343ccb is in do_coredump (fs/coredump.c:469).
464
465 	if (core_waiters > 0) {
466 		struct core_thread *ptr;
467
468 		freezer_do_not_count();
469 		wait_for_completion(&core_state->startup);
470 		freezer_count();

A hack executing task_works there helps (see diff below).
Any chance anyone knows what this is and how to fix it?

diff --git a/fs/coredump.c b/fs/coredump.c
index 3224dee44d30..f6f9dfb02296 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -466,7 +466,8 @@ static int coredump_wait(int exit_code, struct core_state *core_state)
 		struct core_thread *ptr;
 
 		freezer_do_not_count();
-		wait_for_completion(&core_state->startup);
+		while (wait_for_completion_interruptible(&core_state->startup))
+			tracehook_notify_signal();
 		freezer_count();
 		/*
 		 * Wait for all the threads to become inactive, so that

> We were talking about some of his struggles and questions in this area
> and he pointed me to this patch he thought he had posted but I could not
> find in the list archives.
> 
> In short the coredump code deliberately supports being interrupted by
> SIGKILL, and depends upon prepare_signal to filter out all other
> signals. With the io_uring code comes an extra test in signal_pending
> for TIF_NOTIFY_SIGNAL (which is something about asking a task to run
> task_work_run).
> 
> I am baffled why the dumper thread would be getting interrupted by
> TIF_NOTIFY_SIGNAL but apparently it is. Perhaps it is an io_uring
> thread that is causing the dump.
> 
> Now that we know the problem the question becomes how to fix this issue.
> 
> Is there any chance all of this TWA_SIGNAL logic could simply be removed
> now that io_uring threads are normal process threads?
> 
> There are only the two call sites so perhaps the code could test
> signal->flags & SIGNAL_FLAG_COREDUMP before scheduling a work on
> a process that is dumping core?
> 
> Perhaps the coredump code needs to call task_work_run before it does
> anything?
> 
> -----
> 
> From: Olivier Langlois <[email protected]>
> Subject: [PATCH] coredump: Do not interrupt dump for TIF_NOTIFY_SIGNAL
> Date: Mon, 07 Jun 2021 16:25:06 -0400
> 
> io_uring is a big user of task_work, and any event that io_uring has
> made a task wait for that occurs during core dump generation will
> generate a TIF_NOTIFY_SIGNAL.
> 
> Here are the detailed steps of the problem:
> 1. io_uring calls vfs_poll() to install a task to a file wait queue
> with io_async_wake() as the wakeup function cb from io_arm_poll_handler()
> 2. wakeup function ends up calling task_work_add() with TWA_SIGNAL
> 3. task_work_add() sets the TIF_NOTIFY_SIGNAL bit by calling
> set_notify_signal()
> 
> Signed-off-by: Olivier Langlois <[email protected]>
> ---
>  fs/coredump.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/coredump.c b/fs/coredump.c
> index 2868e3e171ae..79c6e3f114db 100644
> --- a/fs/coredump.c
> +++ b/fs/coredump.c
> @@ -519,7 +519,7 @@ static bool dump_interrupted(void)
>  	 * but then we need to teach dump_write() to restart and clear
>  	 * TIF_SIGPENDING.
>  	 */
> -	return signal_pending(current);
> +	return task_sigpending(current);
>  }
>  
>  static void wait_for_dump_helpers(struct file *file)
> 

-- 
Pavel Begunkov

^ permalink raw reply related	[flat|nested] 66+ messages in thread
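[To spell out the cycle Pavel describes, schematically (a reconstruction from the report above, not actual kernel code):

 creator task                            iou-wrk worker thread
 ------------                            ---------------------
 do_coredump()
   coredump_wait()
     wait_for_completion(&core_state->startup)
       (sleeps until every thread,       io_worker_exit()
        including the worker, has          queues task_work on the creator
        checked in)                        wait_for_completion(&done)
                                             (sleeps until the creator runs
                                              that task_work, which it never
                                              does while it is dumping)

Each side is waiting on the other. Pavel's hack above breaks the cycle by letting the dumper run queued task_work while it waits; a later commit discussed below (1d5f5ea7cb7d, "io-wq: remove worker to owner tw dependency") broke it from the io-wq side instead.]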
* Re: [RFC] coredump: Do not interrupt dump for TIF_NOTIFY_SIGNAL
  2021-10-22 14:13 ` [RFC] coredump: Do not interrupt dump for TIF_NOTIFY_SIGNAL Pavel Begunkov
@ 2021-12-24  1:34 ` Olivier Langlois
  2021-12-24 10:37 ` Pavel Begunkov
  0 siblings, 1 reply; 66+ messages in thread
From: Olivier Langlois @ 2021-12-24 1:34 UTC (permalink / raw)
  To: Pavel Begunkov, Eric W. Biederman, linux-kernel
  Cc: linux-fsdevel, io-uring, Alexander Viro, Jens Axboe,
      Oleg Nesterov, Linus Torvalds

On Fri, 2021-10-22 at 15:13 +0100, Pavel Begunkov wrote:
> On 6/9/21 21:17, Eric W. Biederman wrote:
> > 
> > Folks,
> > 
> > Olivier Langlois has been struggling with coredumps getting
> > truncated in
> > tasks using io_uring. He has also apparently been struggling with
> > the some of his email messages not making it to the lists.
> 
> Looks syzbot hit something relevant, see
> https://lore.kernel.org/io-
> uring/[email protected]/
> 
> In short, a task creates an io_uring worker thread, then the worker
> submits a task_work item to the creator task and won't die until
> the item is executed/cancelled. And I found that the creator task is
> sleeping in do_coredump() -> wait_for_completion()
> 
> 0xffffffff81343ccb is in do_coredump (fs/coredump.c:469).
> 464
> 465 	if (core_waiters > 0) {
> 466 		struct core_thread *ptr;
> 467
> 468 		freezer_do_not_count();
> 469 		wait_for_completion(&core_state->startup);
> 470 		freezer_count();
> 
> 
> A hack executing tws there helps (see diff below).
> Any chance anyone knows what this is and how to fix it?
> 
> 
> diff --git a/fs/coredump.c b/fs/coredump.c
> index 3224dee44d30..f6f9dfb02296 100644
> --- a/fs/coredump.c
> +++ b/fs/coredump.c
> @@ -466,7 +466,8 @@ static int coredump_wait(int exit_code, struct
> core_state *core_state)
>  		struct core_thread *ptr;
>  
>  		freezer_do_not_count();
> -		wait_for_completion(&core_state->startup);
> +		while (wait_for_completion_interruptible(&core_state-
> >startup))
> +			tracehook_notify_signal();
>  		freezer_count();
>  		/*
>  		 * Wait for all the threads to become inactive, so that
> 
> 
> 
Pavel,

I cannot comment on the merit of the proposed hack, but my proposed
patch to fix the coredump truncation issue when a process using
io_uring core dumps, which I submitted back in August, is still
unreviewed!

https://lore.kernel.org/lkml/1625bc89782bf83d9d8c7c63e8ffcb651ccb15fa.1629655338.git.olivier@trillion01.com/

I have been using it since then; I must have generated many dozens of
perfect core dump files with it, and I have not seen a single truncated
core dump file like I used to have prior to the patch.

I am bringing back my patch to your attention because one nice side
effect of it is that it would have totally avoided the problem that you
encountered in coredump_wait(), since it cancels io_uring resources
before calling coredump_wait()!

Greetings,
Olivier

^ permalink raw reply	[flat|nested] 66+ messages in thread
* Re: [RFC] coredump: Do not interrupt dump for TIF_NOTIFY_SIGNAL
  2021-12-24  1:34 ` Olivier Langlois
@ 2021-12-24 10:37 ` Pavel Begunkov
  2021-12-24 19:52 ` Eric W. Biederman
  2022-01-05 19:39 ` [RFC] coredump: Do not interrupt dump for TIF_NOTIFY_SIGNAL Olivier Langlois
  0 siblings, 2 replies; 66+ messages in thread
From: Pavel Begunkov @ 2021-12-24 10:37 UTC (permalink / raw)
  To: Olivier Langlois, Eric W. Biederman, linux-kernel
  Cc: linux-fsdevel, io-uring, Alexander Viro, Jens Axboe,
      Oleg Nesterov, Linus Torvalds

On 12/24/21 01:34, Olivier Langlois wrote:
> On Fri, 2021-10-22 at 15:13 +0100, Pavel Begunkov wrote:
>> On 6/9/21 21:17, Eric W. Biederman wrote:
>> In short, a task creates an io_uring worker thread, then the worker
>> submits a task_work item to the creator task and won't die until
>> the item is executed/cancelled. And I found that the creator task is
>> sleeping in do_coredump() -> wait_for_completion()
>>
[...]
>> A hack executing tws there helps (see diff below).
>> Any chance anyone knows what this is and how to fix it?
>>
[...]
> Pavel,
> 
> I cannot comment on the merit of the proposed hack but my proposed
> patch to fix the coredump truncation issue when a process using
> io_uring core dumps that I submitted back in August is still
> unreviewed!

That's unfortunate. Not like I can help in any case, but I assumed
it was dealt with by

commit 06af8679449d4ed282df13191fc52d5ba28ec536
Author: Eric W. Biederman <[email protected]>
Date:   Thu Jun 10 15:11:11 2021 -0500

    coredump: Limit what can interrupt coredumps

    Olivier Langlois has been struggling with coredumps being incompletely
    written in processes using io_uring.
    ...

> https://lore.kernel.org/lkml/1625bc89782bf83d9d8c7c63e8ffcb651ccb15fa.1629655338.git.olivier@trillion01.com/
> 
> I have been using it since then I must have generated many dozens of
> perfect core dump files with it and I have not seen a single truncated
> core dump files like I used to have prior to the patch.
> 
> I am bringing back my patch to your attention because one nice side
> effect of it is that it would have avoided totally the problem that you
> have encountered in coredump_wait() since it does cancel io_uring
> resources before calling coredump_wait()!

FWIW, I worked around it in io_uring back then by breaking the
dependency.

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 66+ messages in thread
* Re: [RFC] coredump: Do not interrupt dump for TIF_NOTIFY_SIGNAL 2021-12-24 10:37 ` Pavel Begunkov @ 2021-12-24 19:52 ` Eric W. Biederman 2021-12-28 11:24 ` Pavel Begunkov 2022-01-05 19:39 ` [RFC] coredump: Do not interrupt dump for TIF_NOTIFY_SIGNAL Olivier Langlois 1 sibling, 1 reply; 66+ messages in thread From: Eric W. Biederman @ 2021-12-24 19:52 UTC (permalink / raw) To: Pavel Begunkov Cc: Olivier Langlois, linux-kernel, linux-fsdevel, io-uring, Alexander Viro, Jens Axboe, Oleg Nesterov, Linus Torvalds Pavel Begunkov <[email protected]> writes: > On 12/24/21 01:34, Olivier Langlois wrote: >> On Fri, 2021-10-22 at 15:13 +0100, Pavel Begunkov wrote: >>> On 6/9/21 21:17, Eric W. Biederman wrote: >>> In short, a task creates an io_uring worker thread, then the worker >>> submits a task_work item to the creator task and won't die until >>> the item is executed/cancelled. And I found that the creator task is >>> sleeping in do_coredump() -> wait_for_completion() >>> > [...] >>> A hack executing tws there helps (see diff below). >>> Any chance anyone knows what this is and how to fix it? >>> > [...] >> Pavel, >> >> I cannot comment on the merit of the proposed hack but my proposed >> patch to fix the coredump truncation issue when a process using >> io_uring core dumps that I submitted back in August is still >> unreviewed! > > That's unfortunate. Not like I can help in any case, but I assumed > it was dealt with by > > commit 06af8679449d4ed282df13191fc52d5ba28ec536 > Author: Eric W. Biederman <[email protected]> > Date: Thu Jun 10 15:11:11 2021 -0500 > > coredump: Limit what can interrupt coredumps > Olivier Langlois has been struggling with coredumps being incompletely > written in > processes using io_uring. > ... I thought it had been too. >> https://lore.kernel.org/lkml/1625bc89782bf83d9d8c7c63e8ffcb651ccb15fa.1629655338.git.olivier@trillion01.com/ >> >> I have been using it since then I must have generated many dozens of >> perfect core dump files with it and I have not seen a single truncated >> core dump files like I used to have prior to the patch. >> >> I am bringing back my patch to your attention because one nice side >> effect of it is that it would have avoided totally the problem that you >> have encountered in coredump_wait() since it does cancel io_uring >> resources before calling coredump_wait()! > > FWIW, I worked it around in io_uring back then by breaking the > dependency. I am in the middle of untangling the dependencies between ptrace, coredump, signal handling and maybe a few related things. Do folks have a reproducer I can look at? Pavel especially if you have something that reproduces on the current kernels. As part of that I am in the process of guaranteeing all of the coredump work happens in get_signal so nothing of io_uring or any cleanup anywhere else runs until the coredump completes. I haven't quite posted the code for review because it's the holidays. But I am aiming at v5.17 or possibly v5.18, as the code is just about ready. Eric ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [RFC] coredump: Do not interrupt dump for TIF_NOTIFY_SIGNAL
  2021-12-24 19:52 ` Eric W. Biederman
@ 2021-12-28 11:24 ` Pavel Begunkov
  2022-03-14 23:58 ` Eric W. Biederman
  0 siblings, 1 reply; 66+ messages in thread
From: Pavel Begunkov @ 2021-12-28 11:24 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Olivier Langlois, linux-kernel, linux-fsdevel, io-uring,
      Alexander Viro, Jens Axboe, Oleg Nesterov, Linus Torvalds

On 12/24/21 19:52, Eric W. Biederman wrote:
> Pavel Begunkov <[email protected]> writes:
[...]
>> FWIW, I worked it around in io_uring back then by breaking the
>> dependency.
> 
> I am in the middle of untangling the dependencies between ptrace,
> coredump, signal handling and maybe a few related things.

Sounds great

> Do folks have a reproducer I can look at? Pavel especially if you have
> something that reproduces on the current kernels.

A syz reproducer was triggering it reliably; I'd try to revert the
commit below and test:
https://syzkaller.appspot.com/text?tag=ReproC&x=15d3600cb00000

It should hang a task. Syzbot report for reference:
https://syzkaller.appspot.com/bug?extid=27d62ee6f256b186883e


commit 1d5f5ea7cb7d15b9fb1cc82673ebb054f02cd7d2
Author: Pavel Begunkov <[email protected]>
Date:   Fri Oct 29 13:11:33 2021 +0100

    io-wq: remove worker to owner tw dependency

    INFO: task iou-wrk-6609:6612 blocked for more than 143 seconds.
    Not tainted 5.15.0-rc5-syzkaller #0
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    task:iou-wrk-6609 state:D stack:27944 pid: 6612 ppid: 6526 flags:0x00004006
    Call Trace:
     context_switch kernel/sched/core.c:4940 [inline]
     __schedule+0xb44/0x5960 kernel/sched/core.c:6287
     schedule+0xd3/0x270 kernel/sched/core.c:6366
     schedule_timeout+0x1db/0x2a0 kernel/time/timer.c:1857
     do_wait_for_common kernel/sched/completion.c:85 [inline]
     __wait_for_common kernel/sched/completion.c:106 [inline]
     wait_for_common kernel/sched/completion.c:117 [inline]
     wait_for_completion+0x176/0x280 kernel/sched/completion.c:138
     io_worker_exit fs/io-wq.c:183 [inline]
     io_wqe_worker+0x66d/0xc40 fs/io-wq.c:597
     ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
    ...

> As part of that I am in the middle of guaranteeing all of the coredump
> work happens in get_signal so nothing of io_uring or any cleanup
> anywhere else runs until the coredump completes.
> 
> I haven't quite posted the code for review because it's the holidays.
> But I am aiming at v5.17 or possibly v5.18, as the code is just about
> ready.

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 66+ messages in thread
* Re: [RFC] coredump: Do not interrupt dump for TIF_NOTIFY_SIGNAL
  2021-12-28 11:24 ` Pavel Begunkov
@ 2022-03-14 23:58 ` Eric W. Biederman
  [not found] ` <[email protected]>
  0 siblings, 1 reply; 66+ messages in thread
From: Eric W. Biederman @ 2022-03-14 23:58 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: Olivier Langlois, linux-kernel, linux-fsdevel, io-uring,
      Alexander Viro, Jens Axboe, Oleg Nesterov, Linus Torvalds

Pavel Begunkov <[email protected]> writes:

> On 12/24/21 19:52, Eric W. Biederman wrote:
>> Pavel Begunkov <[email protected]> writes:
> [...]
>>> FWIW, I worked it around in io_uring back then by breaking the
>>> dependency.
>> I am in the middle of untangling the dependencies between ptrace,
>> coredump, signal handling and maybe a few related things.
>
> Sounds great
>
>> Do folks have a reproducer I can look at? Pavel especially if you have
>> something that reproduces on the current kernels.
>
> A syz reproducer was triggering it reliably, I'd try to revert the
> commit below and test:
> https://syzkaller.appspot.com/text?tag=ReproC&x=15d3600cb00000
>
> It should hung a task. Syzbot report for reference:
> https://syzkaller.appspot.com/bug?extid=27d62ee6f256b186883e
>
>
> commit 1d5f5ea7cb7d15b9fb1cc82673ebb054f02cd7d2
> Author: Pavel Begunkov <[email protected]>
> Date:   Fri Oct 29 13:11:33 2021 +0100
>
>     io-wq: remove worker to owner tw dependency
>
>     INFO: task iou-wrk-6609:6612 blocked for more than 143 seconds.
>     Not tainted 5.15.0-rc5-syzkaller #0
>     "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>     task:iou-wrk-6609 state:D stack:27944 pid: 6612 ppid: 6526 flags:0x00004006
>     Call Trace:
>      context_switch kernel/sched/core.c:4940 [inline]
>      __schedule+0xb44/0x5960 kernel/sched/core.c:6287
>      schedule+0xd3/0x270 kernel/sched/core.c:6366
>      schedule_timeout+0x1db/0x2a0 kernel/time/timer.c:1857
>      do_wait_for_common kernel/sched/completion.c:85 [inline]
>      __wait_for_common kernel/sched/completion.c:106 [inline]
>      wait_for_common kernel/sched/completion.c:117 [inline]
>      wait_for_completion+0x176/0x280 kernel/sched/completion.c:138
>      io_worker_exit fs/io-wq.c:183 [inline]
>      io_wqe_worker+0x66d/0xc40 fs/io-wq.c:597
>      ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
>     ...

Thank you very much for this. There were some bugs elsewhere I had to
deal with, so I am slower looking at this part of the code than I was
expecting.

I have now reproduced this with the commit reverted on current kernels
and the repro.c from the syzkaller report. I am starting to look into
how this interacts with my planned code changes in this area.

In combination with my other planned changes I think all that needs to
happen in do_coredump is to clear TIF_NOTIFY_SIGNAL along with
TIF_SIGPENDING to prevent io_uring interaction problems. But we will
see.

The deadlock you demonstrate here shows that it is definitely not enough
to clear TIF_NOTIFY_SIGNAL (without other changes) so that
signal_pending returns false, which I was hoping would be the case.

Eric

^ permalink raw reply	[flat|nested] 66+ messages in thread
[parent not found: <[email protected]>]
* Re: [RFC] coredump: Do not interrupt dump for TIF_NOTIFY_SIGNAL [not found] ` <[email protected]> @ 2022-06-01 3:15 ` Jens Axboe 2022-07-20 16:49 ` [PATCH 0/2] coredump: Allow io_uring using apps to dump to pipes Eric W. Biederman 0 siblings, 1 reply; 66+ messages in thread From: Jens Axboe @ 2022-06-01 3:15 UTC (permalink / raw) To: Olivier Langlois, Eric W. Biederman, Pavel Begunkov Cc: linux-kernel, linux-fsdevel, io-uring, Alexander Viro, Oleg Nesterov, Linus Torvalds On 5/31/22 2:06 PM, Olivier Langlois wrote: > On Mon, 2022-03-14 at 18:58 -0500, Eric W. Biederman wrote: >> >> Thank you very much for this. There were some bugs elsewhere I had >> to >> deal with so I am slower looking at this part of the code than I was >> expecting. >> >> I have now reproduced this with the commit reverted on current >> kernels >> and the repro.c from the syzcaller report. I am starting to look >> into >> how this interacts with my planned code changes in this area. >> >> In combination with my other planned changes I think all that needs >> to >> happen in do_coredump is to clear TIF_NOTIFY_SIGNAL along with >> TIF_SIGPENDING to prevent io_uring interaction problems. But we will >> see. >> >> The deadlock you demonstrate here shows that it is definitely not >> enough >> to clear TIF_NOTIFY_SIGNAL (without other changes) so that >> signal_pending returns false, which I was hoping was be the case. >> >> Eric > > I have been away for some time but if this is not resoved yet, I just > want to remind that clearing TIF_NOTIFY_SIGNAL along with > TIF_SIGPENDING won't do it because io_uring may set them asynchronously > to report some io completion while do_coredump() is executing. > > IMHO, just calling io_uring_task_cancel() from do_coredump() before > actually writing the dump, while maybe not the perfect solution, is the > simplest one. > > Otherwise, maybe masking interrupts locally could work but I wouldn't > dare to explore this option personally... Eric, are you fine with doing the cancelation based patch for now? IMHO it's not the ideal approach, but it will resolve the issue. And it'd honestly be great to get some closure on this so we don't have truncated core dumps if they are interrupted by task_work. The best solution would be to make the core dumps resilient to task_work, but a workable solution would be nice at this point... -- Jens Axboe ^ permalink raw reply [flat|nested] 66+ messages in thread
* [PATCH 0/2] coredump: Allow io_uring using apps to dump to pipes.
  2022-06-01  3:15 ` Jens Axboe
@ 2022-07-20 16:49 ` Eric W. Biederman
  2022-07-20 16:50 ` [PATCH 1/2] signal: Move stopping for the coredump from do_exit into get_signal Eric W. Biederman
  2022-07-20 16:51 ` [PATCH 2/2] coredump: Allow coredumps to pipes to work with io_uring Eric W. Biederman
  0 siblings, 2 replies; 66+ messages in thread
From: Eric W. Biederman @ 2022-07-20 16:49 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Olivier Langlois, Pavel Begunkov, linux-kernel, linux-fsdevel,
      io-uring, Alexander Viro, Oleg Nesterov, Linus Torvalds

Folks who have been suffering from io_uring coredumping issues, please
give this a spin.

I have lightly tested this version and more heavily tested an earlier
version which had more dependencies.

Unless I have missed something in cleaning up the code, this should be
a comprehensive fix to the coredumping issues when using io_uring. But
folks, please test and verify this. It has taken me long enough to get
back to this point that I don't properly remember how the reproducer I
have was supposed to fail. All I can say with certainty is that this
set of changes has what looks like a positive effect.

Eric W. Biederman (2):
      signal: Move stopping for the coredump from do_exit into get_signal
      coredump: Allow coredumps to pipes to work with io_uring

 fs/coredump.c            | 30 ++++++++++++++++++++++++++++--
 include/linux/coredump.h |  2 ++
 kernel/exit.c            | 29 +++++------------------------
 kernel/signal.c          |  5 +++++
 mm/oom_kill.c            |  2 +-
 5 files changed, 41 insertions(+), 27 deletions(-)

Eric

^ permalink raw reply	[flat|nested] 66+ messages in thread
* [PATCH 1/2] signal: Move stopping for the coredump from do_exit into get_signal
  2022-07-20 16:49 ` [PATCH 0/2] coredump: Allow io_uring using apps to dump to pipes Eric W. Biederman
@ 2022-07-20 16:50 ` Eric W. Biederman
  2022-07-20 16:51 ` [PATCH 2/2] coredump: Allow coredumps to pipes to work with io_uring Eric W. Biederman
  1 sibling, 0 replies; 66+ messages in thread
From: Eric W. Biederman @ 2022-07-20 16:50 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Olivier Langlois, Pavel Begunkov, linux-kernel, linux-fsdevel,
      io-uring, Alexander Viro, Oleg Nesterov, Linus Torvalds

Stopping to participate in a coredump from a kernel oops makes no sense
and is actively dangerous because the kernel is known to be broken.

Considering stopping for a coredump at kernel thread exit is silly
because userspace coredumps are not generated from kernel threads.

Not stopping for a coredump in exit(2) and exit_group(2) and related
userspace exits that call do_exit or do_group_exit directly is the
current behavior of the code, as the PF_SIGNALED test in
coredump_task_exit attests.

Since only tasks that pass through get_signal and set PF_SIGNALED can
join coredumps, move stopping for coredumps into get_signal, where the
PF_SIGNALED test is unnecessary. This avoids even the potential of
stopping for coredumps in the silly or dangerous places.

This can be seen to be safe by examining the few places that call do_exit:

- get_signal calling do_group_exit

  Called by get_signal to terminate the userspace process. As stopping
  for the coredump now happens in get_signal, the code will continue to
  participate in the coredump.

- exit_group(2) calling do_group_exit

  If a thread calls exit_group(2) while another thread in the same
  process is performing a coredump, there is a race. The thread that
  wins the race will take the lock and set SIGNAL_GROUP_EXIT. If it is
  the thread that called do_group_exit, then zap_threads will return
  -EAGAIN and no coredump will be generated. If it is the thread that
  is coredumping that wins the race, the task that called do_group_exit
  will exit gracefully with an error code before the coredump begins.

  Having a single thread exit just before the coredump starts is not
  ideal, as the semantics make no sense. (Did the group exit happen
  before the coredump or did the coredump happen before the group
  exit?) Eventually I intend for group exits to flow through get_signal
  and this silliness will no longer be possible. Until then the current
  behavior when this race occurs is maintained.

- io_uring

  Called after get_signal returns to terminate the I/O worker thread
  (essentially a userspace thread that only runs kernel code) so that
  additional cleanup code can be run before do_exit. As get_signal is
  called prior to do_exit, the code will continue to participate in the
  coredump.

- make_task_dead

  Called on an unhandled kernel or hardware failure. As the failure is
  unhandled, any extra work has the potential to make the failure worse,
  so being part of a coredump is not appropriate.

- kthread_exit

  Called to terminate a kernel thread; as such, coredumps do not exist.

- call_usermodehelper_exec_async

  Called to terminate a kernel thread if kernel_execve fails; as it is
  a kernel thread, coredumps do not exist.

- reboot, seccomp

  For these calls of do_exit() they are semantically direct calls of
  exit(2) today. As do_exit() does not synchronize with siglock, there
  is no logical race between a coredump killing the thread and these
  threads exiting. These threads logically exit before the coredump
  happens.
This is also the current behavior, so there is nothing to be concerned
about with respect to userspace semantics or regressions.

Moving the coredump stop for userspace threads that did not dequeue the
coredumping signal from do_exit into get_signal in general is safe,
because the coredump in the single-threaded case completely happens in
get_signal. The code movement ensures that a multi-threaded coredump
will not have any issues because the additional threads stop after some
amount of cleanup has been done.

The coredump code is robust to all kinds of userspace changes happening
in parallel, as multiple processes can share a mm. This makes it safe
to perform the coredump before the io_uring cleanup happens, as
io_uring can't do anything another process sharing the mm would not be
doing.

Signed-off-by: "Eric W. Biederman" <[email protected]>
---
 fs/coredump.c            | 25 ++++++++++++++++++++++++-
 include/linux/coredump.h |  2 ++
 kernel/exit.c            | 29 +++++------------------------
 kernel/signal.c          |  5 +++++
 mm/oom_kill.c            |  2 +-
 5 files changed, 37 insertions(+), 26 deletions(-)

diff --git a/fs/coredump.c b/fs/coredump.c
index b836948c9543..67dda77c500f 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -389,6 +389,29 @@ static int zap_threads(struct task_struct *tsk,
 	return nr;
 }
 
+void coredump_join(struct core_state *core_state)
+{
+	/* Stop and join the in-progress coredump */
+	struct core_thread self;
+
+	self.task = current;
+	self.next = xchg(&core_state->dumper.next, &self);
+	/*
+	 * Implies mb(), the result of xchg() must be visible
+	 * to core_state->dumper.
+	 */
+	if (atomic_dec_and_test(&core_state->nr_threads))
+		complete(&core_state->startup);
+
+	for (;;) {
+		set_current_state(TASK_UNINTERRUPTIBLE);
+		if (!self.task) /* see coredump_finish() */
+			break;
+		freezable_schedule();
+	}
+	__set_current_state(TASK_RUNNING);
+}
+
 static int coredump_wait(int exit_code, struct core_state *core_state)
 {
 	struct task_struct *tsk = current;
@@ -436,7 +459,7 @@ static void coredump_finish(bool core_dumped)
 		next = curr->next;
 		task = curr->task;
 		/*
-		 * see coredump_task_exit(), curr->task must not see
+		 * see coredump_join(), curr->task must not see
 		 * ->task == NULL before we read ->next.
 		 */
 		smp_mb();
diff --git a/include/linux/coredump.h b/include/linux/coredump.h
index 08a1d3e7e46d..815d6099b757 100644
--- a/include/linux/coredump.h
+++ b/include/linux/coredump.h
@@ -40,8 +40,10 @@ extern int dump_emit(struct coredump_params *cprm, const void *addr, int nr);
 extern int dump_align(struct coredump_params *cprm, int align);
 int dump_user_range(struct coredump_params *cprm, unsigned long start,
 		    unsigned long len);
+extern void coredump_join(struct core_state *core_state);
 extern void do_coredump(const kernel_siginfo_t *siginfo);
 #else
+extern inline void coredump_join(struct core_state *core_state) {}
 static inline void do_coredump(const kernel_siginfo_t *siginfo) {}
 #endif
 
diff --git a/kernel/exit.c b/kernel/exit.c
index d8ecbaa514f7..2218ca02ac71 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -352,35 +352,16 @@ static void coredump_task_exit(struct task_struct *tsk)
 	 * We must hold siglock around checking core_state
 	 * and setting PF_POSTCOREDUMP.  The core-inducing thread
 	 * will increment ->nr_threads for each thread in the
-	 * group without PF_POSTCOREDUMP set.
+	 * group without PF_POSTCOREDUMP set.  Decrement ->nr_threads
+	 * and possibly complete core_state->startup to politely skip
+	 * participating in any pending coredumps.
*/ spin_lock_irq(&tsk->sighand->siglock); tsk->flags |= PF_POSTCOREDUMP; core_state = tsk->signal->core_state; + if (core_state && atomic_dec_and_test(&core_state->nr_threads)) + complete(&core_state->startup); spin_unlock_irq(&tsk->sighand->siglock); - if (core_state) { - struct core_thread self; - - self.task = current; - if (self.task->flags & PF_SIGNALED) - self.next = xchg(&core_state->dumper.next, &self); - else - self.task = NULL; - /* - * Implies mb(), the result of xchg() must be visible - * to core_state->dumper. - */ - if (atomic_dec_and_test(&core_state->nr_threads)) - complete(&core_state->startup); - - for (;;) { - set_current_state(TASK_UNINTERRUPTIBLE); - if (!self.task) /* see coredump_finish() */ - break; - freezable_schedule(); - } - __set_current_state(TASK_RUNNING); - } } #ifdef CONFIG_MEMCG diff --git a/kernel/signal.c b/kernel/signal.c index 8a0f114d00e0..8595c935027e 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -2687,6 +2687,7 @@ bool get_signal(struct ksignal *ksig) } for (;;) { + struct core_state *core_state; struct k_sigaction *ka; enum pid_type type; @@ -2820,6 +2821,7 @@ bool get_signal(struct ksignal *ksig) } fatal: + core_state = signal->core_state; spin_unlock_irq(&sighand->siglock); if (unlikely(cgroup_task_frozen(current))) cgroup_leave_frozen(true); @@ -2842,6 +2844,9 @@ bool get_signal(struct ksignal *ksig) * that value and ignore the one we pass it. */ do_coredump(&ksig->info); + } else if (core_state) { + /* Wait for the coredump to happen */ + coredump_join(core_state); } /* diff --git a/mm/oom_kill.c b/mm/oom_kill.c index 3c6cf9e3cd66..1bb689fd9f81 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -845,7 +845,7 @@ static inline bool __task_will_free_mem(struct task_struct *task) /* * A coredumping process may sleep for an extended period in - * coredump_task_exit(), so the oom killer cannot assume that + * get_signal(), so the oom killer cannot assume that * the process will promptly exit and release memory. */ if (sig->core_state) -- 2.35.3 ^ permalink raw reply related [flat|nested] 66+ messages in thread
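[The thread accounting in the hunks above may be easier to follow in isolation. Below is a distilled sketch of the rendezvous, not the literal patch text; declarations, locking, and error handling are omitted:]

/* dumper side, in coredump_wait(): zap_threads() has already set
 * core_state->nr_threads to the number of threads that must check in */
wait_for_completion(&core_state->startup);	/* sleep until all arrive */

/* every other thread, now via get_signal() -> coredump_join(),
 * with a struct core_thread self on its stack: */
if (atomic_dec_and_test(&core_state->nr_threads))
	complete(&core_state->startup);		/* last one wakes the dumper */
for (;;) {
	set_current_state(TASK_UNINTERRUPTIBLE);
	if (!self.task)				/* cleared by coredump_finish() */
		break;
	freezable_schedule();			/* park until the dump is done */
}
__set_current_state(TASK_RUNNING);

[The point of the move is *where* this runs: in get_signal(), before do_exit() and before io_uring teardown, so a dumping process's other threads park while their io_uring context is still intact.]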
* [PATCH 2/2] coredump: Allow coredumps to pipes to work with io_uring
  2022-07-20 16:49 ` [PATCH 0/2] coredump: Allow io_uring using apps to dump to pipes Eric W. Biederman
  2022-07-20 16:50 ` [PATCH 1/2] signal: Move stopping for the coredump from do_exit into get_signal Eric W. Biederman
@ 2022-07-20 16:51 ` Eric W. Biederman
  2022-08-22 21:16 ` Olivier Langlois
  1 sibling, 1 reply; 66+ messages in thread
From: Eric W. Biederman @ 2022-07-20 16:51 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Olivier Langlois, Pavel Begunkov, linux-kernel, linux-fsdevel,
      io-uring, Alexander Viro, Oleg Nesterov, Linus Torvalds

Now that io_uring, like everything else, stops for coredumps in
get_signal, the code can once again allow any interruptible
condition after coredump_wait to interrupt the coredump.

Clear TIF_NOTIFY_SIGNAL after coredump_wait, to guarantee that
anything that sets TIF_NOTIFY_SIGNAL before coredump_wait has
completed won't cause the coredump to be interrupted.

With all of the other threads in the process stopped, io_uring doesn't
call task_work_add on the thread running do_coredump. Combined with
the clearing of TIF_NOTIFY_SIGNAL, this allows processes that use
io_uring to coredump through pipes.

Restore dump_interrupted to be a simple call to signal_pending,
effectively reverting commit 06af8679449d ("coredump: Limit what can
interrupt coredumps"). At this point only SIGKILL delivered to the
coredumping thread should be able to cause signal_pending to return
true.

A nice followup would be to find a reliable, race-free way to modify
task_work_add and probably set_notify_signal to skip setting
TIF_NOTIFY_SIGNAL once it is clear a task will no longer process
signals and other interruptible conditions. That would allow
TIF_NOTIFY_SIGNAL to be cleared where TIF_SIGPENDING is cleared in
coredump_zap_process.

To be as certain as possible that this works, I tested this with
commit 1d5f5ea7cb7d ("io-wq: remove worker to owner tw dependency")
reverted. This means that not only is TIF_NOTIFY_SIGNAL prevented
from stopping coredumps to pipes, but the sequence of stopping threads
to participate in the coredump also avoids deadlocks that were
possible previously.

Signed-off-by: "Eric W. Biederman" <[email protected]>
---
 fs/coredump.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/coredump.c b/fs/coredump.c
index 67dda77c500f..c06594f56cbb 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -476,7 +476,7 @@ static bool dump_interrupted(void)
 	 * but then we need to teach dump_write() to restart and clear
 	 * TIF_SIGPENDING.
 	 */
-	return fatal_signal_pending(current) || freezing(current);
+	return signal_pending(current);
 }
 
 static void wait_for_dump_helpers(struct file *file)
@@ -589,6 +589,9 @@ void do_coredump(const kernel_siginfo_t *siginfo)
 
 	old_cred = override_creds(cred);
 
+	/* Don't break out of interruptible sleeps */
+	clear_notify_signal();
+
 	ispipe = format_corename(&cn, &cprm, &argv, &argc);
 
 	if (ispipe) {
-- 
2.35.3

^ permalink raw reply related	[flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] coredump: Allow coredumps to pipes to work with io_uring
  2022-07-20 16:51 ` [PATCH 2/2] coredump: Allow coredumps to pipes to work with io_uring Eric W. Biederman
@ 2022-08-22 21:16 ` Olivier Langlois
  2022-08-23  3:35 ` Olivier Langlois
  0 siblings, 1 reply; 66+ messages in thread
From: Olivier Langlois @ 2022-08-22 21:16 UTC (permalink / raw)
  To: Eric W. Biederman, Jens Axboe
  Cc: Pavel Begunkov, linux-kernel, linux-fsdevel, io-uring,
      Alexander Viro, Oleg Nesterov, Linus Torvalds

On Wed, 2022-07-20 at 11:51 -0500, Eric W. Biederman wrote:
> 
> Now that io_uring like everything else stops for coredumps in
> get_signal the code can once again allow any interruptible
> condition after coredump_wait to interrupt the coredump.
> 
> Clear TIF_NOTIFY_SIGNAL after coredump_wait, to guarantee that
> anything that sets TIF_NOTIFY_SIGNAL before coredump_wait completed
> won't cause the coredumps to interrupted.
> 
> With all of the other threads in the process stopped io_uring doesn't
> call task_work_add on the thread running do_coredump. Combined with
> the clearing of TIF_NOTIFY_SIGNAL this allows processes that use
> io_uring to coredump through pipes.
> 
> Restore dump_interrupted to be a simple call to signal_pending
> effectively reverting commit 06af8679449d ("coredump: Limit what can
> interrupt coredumps"). At this point only SIGKILL delivered to the
> coredumping thread should be able to cause signal_pending to return
> true.
> 
> A nice followup would be to find a reliable race free way to modify
> task_work_add and probably set_notify_signal to skip setting
> TIF_NOTIFY_SIGNAL once it is clear a task will no longer process
> signals and other interruptible conditions. That would allow
> TIF_NOTIFY_SIGNAL to be cleared where TIF_SIGPENDING is cleared in
> coredump_zap_process.
> 
> To be as certain as possible that this works, I tested this with
> commit 1d5f5ea7cb7d ("io-wq: remove worker to owner tw dependency")
> reverted. Which means that not only is TIF_NOTIFY_SIGNAL prevented
> from stopping coredumps to pipes, the sequence of stopping threads to
> participate in the coredump avoids deadlocks that were possible
> previously.
> 
> Signed-off-by: "Eric W. Biederman" <[email protected]>
> 
Hi Eric,

What is stopping the task calling do_coredump() from being interrupted
so that task_work_add() is called from interrupt context?

This is precisely what I was experiencing last summer when I worked on
this issue.

My understanding of how async I/O works with io_uring is that the task
is added to a wait queue without being put to sleep, and when the
io_uring callback is called from interrupt context, task_work_add() is
called so that the next time the io_uring syscall is invoked, pending
work is processed to complete the I/O.

So if:

1. an io_uring request is initiated AND the task is in a wait queue
2. do_coredump() is called before the I/O is completed

IMHO, this is how you end up having task_work_add() called while the
coredump is being generated.
So far, the only way that I have found to make sure that this was not
happening was to cancel every pending io_uring request before writing
the coredump by calling io_uring_task_cancel():

--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -43,7 +43,7 @@
 #include <linux/timekeeping.h>
 #include <linux/sysctl.h>
 #include <linux/elf.h>
-
+#include <linux/io_uring.h>
 #include <linux/uaccess.h>
 #include <asm/mmu_context.h>
 #include <asm/tlb.h>
@@ -561,6 +561,8 @@
 		need_suid_safe = true;
 	}
 
+	io_uring_task_cancel();
+
 	retval = coredump_wait(siginfo->si_signo, &core_state);
 	if (retval < 0)
 		goto fail_creds;

Greetings,

^ permalink raw reply	[flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] coredump: Allow coredumps to pipes to work with io_uring 2022-08-22 21:16 ` Olivier Langlois @ 2022-08-23 3:35 ` Olivier Langlois 2022-08-23 18:22 ` Eric W. Biederman 0 siblings, 1 reply; 66+ messages in thread From: Olivier Langlois @ 2022-08-23 3:35 UTC (permalink / raw) To: Eric W. Biederman, Jens Axboe Cc: Pavel Begunkov, linux-kernel, linux-fsdevel, io-uring, Alexander Viro, Oleg Nesterov, Linus Torvalds On Mon, 2022-08-22 at 17:16 -0400, Olivier Langlois wrote: > > What is stopping the task calling do_coredump() to be interrupted and > call task_work_add() from the interrupt context? > > This is precisely what I was experiencing last summer when I did work > on this issue. > > My understanding of how async I/O works with io_uring is that the > task > is added to a wait queue without being put to sleep and when the > io_uring callback is called from the interrupt context, > task_work_add() > is called so that the next time io_uring syscall is invoked, pending > work is processed to complete the I/O. > > So if: > > 1. io_uring request is initiated AND the task is in a wait queue > 2. do_coredump() is called before the I/O is completed > > IMHO, this is how you end up having task_work_add() called while the > coredump is generated. > I forgot to add that I have experienced the issue with TCP/IP I/O. I suspect that with a TCP socket, the race condition window is much larger than if it was disk I/O and this might make it easier to reproduce the issue this way... ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] coredump: Allow coredumps to pipes to work with io_uring
  2022-08-23  3:35 ` Olivier Langlois
@ 2022-08-23 18:22 ` Eric W. Biederman
  2022-08-23 18:27 ` Jens Axboe
  0 siblings, 1 reply; 66+ messages in thread
From: Eric W. Biederman @ 2022-08-23 18:22 UTC (permalink / raw)
  To: Olivier Langlois
  Cc: Jens Axboe, Pavel Begunkov, linux-kernel, linux-fsdevel,
      io-uring, Alexander Viro, Oleg Nesterov, Linus Torvalds

Olivier Langlois <[email protected]> writes:

> On Mon, 2022-08-22 at 17:16 -0400, Olivier Langlois wrote:
>> 
>> What is stopping the task calling do_coredump() to be interrupted and
>> call task_work_add() from the interrupt context?
>> 
>> This is precisely what I was experiencing last summer when I did work
>> on this issue.
>> 
>> My understanding of how async I/O works with io_uring is that the
>> task
>> is added to a wait queue without being put to sleep and when the
>> io_uring callback is called from the interrupt context,
>> task_work_add()
>> is called so that the next time io_uring syscall is invoked, pending
>> work is processed to complete the I/O.
>> 
>> So if:
>> 
>> 1. io_uring request is initiated AND the task is in a wait queue
>> 2. do_coredump() is called before the I/O is completed
>> 
>> IMHO, this is how you end up having task_work_add() called while the
>> coredump is generated.
>> 
> I forgot to add that I have experienced the issue with TCP/IP I/O.
> 
> I suspect that with a TCP socket, the race condition window is much
> larger than if it was disk I/O and this might make it easier to
> reproduce the issue this way...

I was under the apparently mistaken impression that the io_uring
task_work_add only comes from the io_uring userspace helper threads.
Those are definitely suppressed by my change.

Do you have any idea where in the code io_uring is being called in
an interrupt context? I would really like to trace that code path so I
have a better grasp on what is happening.

If task_work_add is being called from interrupt context then something
additional to what I have proposed certainly needs to be done.

Eric

^ permalink raw reply	[flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] coredump: Allow coredumps to pipes to work with io_uring 2022-08-23 18:22 ` Eric W. Biederman @ 2022-08-23 18:27 ` Jens Axboe 2022-08-24 15:11 ` Eric W. Biederman 0 siblings, 1 reply; 66+ messages in thread From: Jens Axboe @ 2022-08-23 18:27 UTC (permalink / raw) To: Eric W. Biederman, Olivier Langlois Cc: Pavel Begunkov, linux-kernel, linux-fsdevel, io-uring, Alexander Viro, Oleg Nesterov, Linus Torvalds On 8/23/22 12:22 PM, Eric W. Biederman wrote: > Olivier Langlois <[email protected]> writes: > >> On Mon, 2022-08-22 at 17:16 -0400, Olivier Langlois wrote: >>> >>> What is stopping the task calling do_coredump() to be interrupted and >>> call task_work_add() from the interrupt context? >>> >>> This is precisely what I was experiencing last summer when I did work >>> on this issue. >>> >>> My understanding of how async I/O works with io_uring is that the >>> task >>> is added to a wait queue without being put to sleep and when the >>> io_uring callback is called from the interrupt context, >>> task_work_add() >>> is called so that the next time io_uring syscall is invoked, pending >>> work is processed to complete the I/O. >>> >>> So if: >>> >>> 1. io_uring request is initiated AND the task is in a wait queue >>> 2. do_coredump() is called before the I/O is completed >>> >>> IMHO, this is how you end up having task_work_add() called while the >>> coredump is generated. >>> >> I forgot to add that I have experienced the issue with TCP/IP I/O. >> >> I suspect that with a TCP socket, the race condition window is much >> larger than if it was disk I/O and this might make it easier to >> reproduce the issue this way... > > I was under the apparently mistaken impression that the io_uring > task_work_add only comes from the io_uring userspace helper threads. > Those are definitely suppressed by my change. > > Do you have any idea in the code where io_uring code is being called in > an interrupt context? I would really like to trace that code path so I > have a better grasp on what is happening. > > If task_work_add is being called from interrupt context then something > additional from what I have proposed certainly needs to be done. task_work may come from the helper threads, but generally it does not. One example would be doing a read from a socket. There's no data there, poll is armed to trigger a retry. When we get the poll notification that there's now data to be read, then we kick that off with task_work. Since it's from the poll handler, it can trigger from interrupt context. See the path from io_uring/poll.c:io_poll_wake() -> __io_poll_execute() -> io_req_task_work_add() -> task_work_add(). It can also happen for regular IRQ based reads from regular files, where the completion is actually done via task_work added from the potentially IRQ based completion path. -- Jens Axboe ^ permalink raw reply [flat|nested] 66+ messages in thread
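[The shape of the poll-retry path Jens describes, reduced to its essentials, is a wait-queue wake function that punts a retry back to the submitting task via task_work. The sketch below is illustrative only - the struct and function names are invented, not io_uring's actual code:]

#include <linux/kernel.h>
#include <linux/sched/signal.h>
#include <linux/task_work.h>
#include <linux/wait.h>

struct my_req {
	struct callback_head	tw;
	struct task_struct	*task;	/* the task that submitted the read */
	wait_queue_entry_t	wait;
};

static void my_retry(struct callback_head *cb)
{
	/* runs in the submitter's context on its way back toward user
	 * mode; this is where the read would be retried */
}

static int my_wake(wait_queue_entry_t *wq, unsigned mode, int sync, void *key)
{
	struct my_req *req = container_of(wq, struct my_req, wait);

	/* may run in hard-IRQ context; TWA_SIGNAL sets TIF_NOTIFY_SIGNAL
	 * on req->task, which is exactly what trips signal_pending() in
	 * dump_interrupted() on unpatched kernels */
	init_task_work(&req->tw, my_retry);
	if (task_work_add(req->task, &req->tw, TWA_SIGNAL)) {
		/* task is exiting; the work will never run, so the
		 * request has to be cancelled/completed here instead */
	}
	return 1;
}

[Note that no io-wq helper thread appears anywhere in this path - the notification goes straight to the original task, which is why suppressing task_work only for helper threads is not enough.]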
* Re: [PATCH 2/2] coredump: Allow coredumps to pipes to work with io_uring 2022-08-23 18:27 ` Jens Axboe @ 2022-08-24 15:11 ` Eric W. Biederman 2022-08-24 15:51 ` Jens Axboe 0 siblings, 1 reply; 66+ messages in thread From: Eric W. Biederman @ 2022-08-24 15:11 UTC (permalink / raw) To: Jens Axboe Cc: Olivier Langlois, Pavel Begunkov, linux-kernel, linux-fsdevel, io-uring, Alexander Viro, Oleg Nesterov, Linus Torvalds Jens Axboe <[email protected]> writes: > On 8/23/22 12:22 PM, Eric W. Biederman wrote: >> Olivier Langlois <[email protected]> writes: >> >>> On Mon, 2022-08-22 at 17:16 -0400, Olivier Langlois wrote: >>>> >>>> What is stopping the task calling do_coredump() to be interrupted and >>>> call task_work_add() from the interrupt context? >>>> >>>> This is precisely what I was experiencing last summer when I did work >>>> on this issue. >>>> >>>> My understanding of how async I/O works with io_uring is that the >>>> task >>>> is added to a wait queue without being put to sleep and when the >>>> io_uring callback is called from the interrupt context, >>>> task_work_add() >>>> is called so that the next time io_uring syscall is invoked, pending >>>> work is processed to complete the I/O. >>>> >>>> So if: >>>> >>>> 1. io_uring request is initiated AND the task is in a wait queue >>>> 2. do_coredump() is called before the I/O is completed >>>> >>>> IMHO, this is how you end up having task_work_add() called while the >>>> coredump is generated. >>>> >>> I forgot to add that I have experienced the issue with TCP/IP I/O. >>> >>> I suspect that with a TCP socket, the race condition window is much >>> larger than if it was disk I/O and this might make it easier to >>> reproduce the issue this way... >> >> I was under the apparently mistaken impression that the io_uring >> task_work_add only comes from the io_uring userspace helper threads. >> Those are definitely suppressed by my change. >> >> Do you have any idea in the code where io_uring code is being called in >> an interrupt context? I would really like to trace that code path so I >> have a better grasp on what is happening. >> >> If task_work_add is being called from interrupt context then something >> additional from what I have proposed certainly needs to be done. > > task_work may come from the helper threads, but generally it does not. > One example would be doing a read from a socket. There's no data there, > poll is armed to trigger a retry. When we get the poll notification that > there's now data to be read, then we kick that off with task_work. Since > it's from the poll handler, it can trigger from interrupt context. See > the path from io_uring/poll.c:io_poll_wake() -> __io_poll_execute() -> > io_req_task_work_add() -> task_work_add(). But that is a task_work to the helper thread correct? > It can also happen for regular IRQ based reads from regular files, where > the completion is actually done via task_work added from the potentially > IRQ based completion path. I can see that. Which leaves me with the question do these task_work's directly wake up the thread that submitted the I/O request? Or is there likely to be something for an I/O thread to do before an ordinary program thread is notified. I am asking because it is only the case of notifying ordinary program threads that is interesting in the case of a coredump. As I understand it a data to read notification would typically be handled by the I/O uring worker thread to trigger reading the data before letting userspace know everything it asked to be done is complete. 
Eric

^ permalink raw reply	[flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] coredump: Allow coredumps to pipes to work with io_uring
  2022-08-24 15:11                 ` Eric W. Biederman
@ 2022-08-24 15:51                   ` Jens Axboe
  0 siblings, 0 replies; 66+ messages in thread
From: Jens Axboe @ 2022-08-24 15:51 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Olivier Langlois, Pavel Begunkov, linux-kernel, linux-fsdevel,
	io-uring, Alexander Viro, Oleg Nesterov, Linus Torvalds

On 8/24/22 9:11 AM, Eric W. Biederman wrote:
> Jens Axboe <[email protected]> writes:
>
>> On 8/23/22 12:22 PM, Eric W. Biederman wrote:
>>> Olivier Langlois <[email protected]> writes:
>>>
>>>> On Mon, 2022-08-22 at 17:16 -0400, Olivier Langlois wrote:
>>>>>
>>>>> What is stopping the task calling do_coredump() from being
>>>>> interrupted and calling task_work_add() from interrupt context?
>>>>>
>>>>> This is precisely what I was experiencing last summer when I worked
>>>>> on this issue.
>>>>>
>>>>> My understanding of how async I/O works with io_uring is that the
>>>>> task is added to a wait queue without being put to sleep, and when
>>>>> the io_uring callback is called from interrupt context,
>>>>> task_work_add() is called so that the next time an io_uring syscall
>>>>> is invoked, pending work is processed to complete the I/O.
>>>>>
>>>>> So if:
>>>>>
>>>>> 1. an io_uring request is initiated AND the task is in a wait queue
>>>>> 2. do_coredump() is called before the I/O is completed
>>>>>
>>>>> IMHO, this is how you end up having task_work_add() called while the
>>>>> coredump is generated.
>>>>>
>>>> I forgot to add that I have experienced the issue with TCP/IP I/O.
>>>>
>>>> I suspect that with a TCP socket, the race condition window is much
>>>> larger than it would be with disk I/O, and this might make it easier
>>>> to reproduce the issue this way...
>>>
>>> I was under the apparently mistaken impression that the io_uring
>>> task_work_add only comes from the io_uring userspace helper threads.
>>> Those are definitely suppressed by my change.
>>>
>>> Do you have any idea where in the code io_uring is being called from
>>> interrupt context? I would really like to trace that code path so I
>>> have a better grasp on what is happening.
>>>
>>> If task_work_add is being called from interrupt context then something
>>> additional from what I have proposed certainly needs to be done.
>>
>> task_work may come from the helper threads, but generally it does not.
>> One example would be doing a read from a socket. There's no data there,
>> so poll is armed to trigger a retry. When we get the poll notification
>> that there's now data to be read, we kick that off with task_work. Since
>> it's from the poll handler, it can trigger from interrupt context. See
>> the path from io_uring/poll.c:io_poll_wake() -> __io_poll_execute() ->
>> io_req_task_work_add() -> task_work_add().
>
> But that is a task_work sent to the helper thread, correct?

No, it goes to the task that originally allocated/issued the request,
which is the original task unless the request was marked as going
straight to a helper (IOSQE_ASYNC). In most cases it'll be the original
task, not a helper thread.

>> It can also happen for regular IRQ-based reads from regular files, where
>> the completion is actually done via task_work added from the potentially
>> IRQ-based completion path.
>
> I can see that.
>
> Which leaves me with the question: do these task_works directly wake up
> the thread that submitted the I/O request?  Or is there likely to be
> something for an I/O thread to do before an ordinary program thread is
> notified?
>
> I am asking because only the case of notifying ordinary program threads
> is interesting for a coredump.  As I understand it, a data-to-read
> notification would typically be handled by the io_uring worker thread to
> trigger reading the data before letting userspace know everything it
> asked for is complete.

By default, it'll go back to the original task. If something is
pollable, then there's no helper thread doing the request: an issue
attempt is performed by the original task, there's no data/space there,
and poll is armed to trigger a retry. The retry notification will then
queue task_work with the original task to retry.

Generally for io_uring, helper threads are a slow path and aren't used
unless we have no other options. For example, if someone has a network
I/O backend and there's helper thread activity, that's generally a sign
that something is wrong. This isn't super relevant to this particular
topic; I'm just highlighting that normally you would not expect to see
much (if any) io-wq activity at all.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 66+ messages in thread
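From userspace, the default-versus-helper distinction Jens draws maps onto
a single flag. The sketch below uses liburing; submit_recv and its
parameters are illustrative, not part of any patch in this thread. By
default a pollable recv is issued inline by the submitting task and
retried via task_work on that same task; setting IOSQE_ASYNC forces the
request straight to an io-wq helper thread:

	/* Sketch: normal submission vs. forcing a request to an io-wq helper. */
	#include <errno.h>
	#include <stdbool.h>
	#include <stddef.h>
	#include <liburing.h>

	static int submit_recv(struct io_uring *ring, int sockfd,
			       void *buf, size_t len, bool force_helper)
	{
		struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

		if (!sqe)
			return -EBUSY;	/* submission queue is full */
		io_uring_prep_recv(sqe, sockfd, buf, len, 0);
		if (force_helper)
			sqe->flags |= IOSQE_ASYNC;	/* punt straight to io-wq */
		/*
		 * Without IOSQE_ASYNC: issued inline by this task; if the
		 * socket would block, poll is armed and the retry later
		 * runs as task_work on this same task.
		 */
		return io_uring_submit(ring);
	}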
* Re: [RFC] coredump: Do not interrupt dump for TIF_NOTIFY_SIGNAL
  2021-12-24 10:37       ` Pavel Begunkov
  2021-12-24 19:52         ` Eric W. Biederman
@ 2022-01-05 19:39         ` Olivier Langlois
  1 sibling, 0 replies; 66+ messages in thread
From: Olivier Langlois @ 2022-01-05 19:39 UTC (permalink / raw)
  To: Pavel Begunkov, Eric W. Biederman, linux-kernel
  Cc: linux-fsdevel, io-uring, Alexander Viro, Jens Axboe,
	Oleg Nesterov, Linus Torvalds

On Fri, 2021-12-24 at 10:37 +0000, Pavel Begunkov wrote:
> On 12/24/21 01:34, Olivier Langlois wrote:
> > On Fri, 2021-10-22 at 15:13 +0100, Pavel Begunkov wrote:
> > > On 6/9/21 21:17, Eric W. Biederman wrote:
> > > In short, a task creates an io_uring worker thread, then the
> > > worker submits a task_work item to the creator task and won't die
> > > until the item is executed/cancelled. And I found that the creator
> > > task is sleeping in do_coredump() -> wait_for_completion()
> >
> > [...]
> > > A hack executing tws there helps (see diff below).
> > > Any chance anyone knows what this is and how to fix it?
> >
> > [...]
> > Pavel,
> >
> > I cannot comment on the merit of the proposed hack, but the patch I
> > submitted back in August to fix the coredump truncation issue, seen
> > when a process using io_uring dumps core, is still unreviewed!
>
> That's unfortunate. Not like I can help in any case, but I assumed
> it was dealt with by
>
> commit 06af8679449d4ed282df13191fc52d5ba28ec536
> Author: Eric W. Biederman <[email protected]>
> Date:   Thu Jun 10 15:11:11 2021 -0500
>
>     coredump: Limit what can interrupt coredumps
>
>     Olivier Langlois has been struggling with coredumps being
>     incompletely written in processes using io_uring.
> ...
>
It was only a partial fix. Oleg Nesterov pointed out that the fix was
not good if the core dump was written into a pipe.

https://lore.kernel.org/io-uring/[email protected]/

> > https://lore.kernel.org/lkml/1625bc89782bf83d9d8c7c63e8ffcb651ccb15
> > [email protected]/
> >
> > I have been using it since then. I must have generated many dozens
> > of perfect core dump files with it, and I have not seen a single
> > truncated core dump file like I used to have prior to the patch.
> >
> > I am bringing my patch back to your attention because one nice side
> > effect of it is that it would have totally avoided the problem that
> > you encountered in coredump_wait(), since it cancels io_uring
> > resources before calling coredump_wait()!
>
> FWIW, I worked around it in io_uring back then by breaking the
> dependency.
>
Yes, I have seen your work to fix the problem. It just seems to me that
there is no good reason to keep processing io_uring requests once you
are generating a core dump. Simply cancelling io_uring before generating
the core dump would have avoided the problem that you encountered, plus
any other similar issues yet to come...

Greetings,

^ permalink raw reply	[flat|nested] 66+ messages in thread
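The approach Olivier is advocating would look roughly like the sketch
below. This is an assumption about the shape of the change, not his
actual patch; io_uring_task_cancel() is the existing helper the exit path
uses to cancel and wait for a task's outstanding io_uring requests, and
whether calling it this early in do_coredump() is safe is exactly the
kind of question review would have to settle:

	/*
	 * Sketch only: quiesce io_uring before any core dump I/O starts,
	 * so no io_uring task_work can arrive mid-dump and raise
	 * TIF_NOTIFY_SIGNAL on the dumping task.
	 */
	#include <linux/io_uring.h>
	#include <linux/sched/signal.h>

	void do_coredump(const kernel_siginfo_t *siginfo)
	{
		/*
		 * Hypothetical placement, first thing in the function:
		 * cancel this task's requests and wait for them to drain.
		 */
		io_uring_task_cancel();

		/* ... existing coredump_wait() and binfmt->core_dump() ... */
	}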
end of thread, other threads:[~2022-08-24 15:51 UTC | newest]

Thread overview: 66+ messages
[not found] <[email protected]>
[not found] ` <[email protected]>
[not found] ` <87h7i694ij.fsf_-_@disp2133>
2021-06-09 20:33 ` [RFC] coredump: Do not interrupt dump for TIF_NOTIFY_SIGNAL Linus Torvalds
2021-06-09 20:48 ` Eric W. Biederman
2021-06-09 20:52 ` Linus Torvalds
2021-06-09 21:02 ` Olivier Langlois
2021-06-09 21:05 ` Eric W. Biederman
2021-06-09 21:26 ` Olivier Langlois
2021-06-09 21:56 ` Olivier Langlois
2021-06-10 14:26 ` Eric W. Biederman
2021-06-10 15:17 ` Olivier Langlois
2021-06-10 18:58 ` [CFT}[PATCH] coredump: Limit what can interrupt coredumps Eric W. Biederman
2021-06-10 19:10 ` Linus Torvalds
2021-06-10 19:18 ` Eric W. Biederman
2021-06-10 19:50 ` Linus Torvalds
2021-06-10 20:11 ` [PATCH] " Eric W. Biederman
2021-06-10 21:04 ` Linus Torvalds
2021-06-12 14:36 ` Olivier Langlois
2021-06-12 16:26 ` Jens Axboe
2021-06-14 14:10 ` Oleg Nesterov
2021-06-14 16:37 ` Eric W. Biederman
2021-06-14 16:59 ` Oleg Nesterov
2021-06-15 22:08 ` Eric W. Biederman
2021-06-16 19:23 ` Olivier Langlois
2021-06-16 20:00 ` Eric W. Biederman
2021-06-18 20:05 ` Olivier Langlois
2021-08-05 13:06 ` Olivier Langlois
2021-08-10 21:48 ` Tony Battersby
2021-08-11 20:47 ` Olivier Langlois
2021-08-12  1:55 ` Jens Axboe
2021-08-12 13:53 ` Tony Battersby
2021-08-15 20:42 ` Olivier Langlois
2021-08-16 13:02 ` Pavel Begunkov
2021-08-16 13:06 ` Pavel Begunkov
2021-08-17 18:15 ` Jens Axboe
2021-08-17 18:24 ` Jens Axboe
2021-08-17 19:29 ` Tony Battersby
2021-08-17 19:59 ` Jens Axboe
2021-08-17 21:28 ` Jens Axboe
2021-08-17 21:39 ` Tony Battersby
2021-08-17 22:05 ` Jens Axboe
2021-08-18 14:37 ` Tony Battersby
2021-08-18 14:46 ` Jens Axboe
2021-08-18  2:57 ` Jens Axboe
2021-08-18  2:58 ` Jens Axboe
2021-08-21 10:08 ` Olivier Langlois
2021-08-21 16:47 ` Olivier Langlois
2021-08-21 16:51 ` Jens Axboe
2021-08-21 17:21 ` Olivier Langlois
2021-08-21  9:52 ` Olivier Langlois
2021-08-21  9:48 ` Olivier Langlois
2021-10-22 14:13 ` [RFC] coredump: Do not interrupt dump for TIF_NOTIFY_SIGNAL Pavel Begunkov
2021-12-24  1:34 ` Olivier Langlois
2021-12-24 10:37 ` Pavel Begunkov
2021-12-24 19:52 ` Eric W. Biederman
2021-12-28 11:24 ` Pavel Begunkov
2022-03-14 23:58 ` Eric W. Biederman
[not found] ` <[email protected]>
2022-06-01  3:15 ` Jens Axboe
2022-07-20 16:49 ` [PATCH 0/2] coredump: Allow io_uring using apps to dump to pipes Eric W. Biederman
2022-07-20 16:50 ` [PATCH 1/2] signal: Move stopping for the coredump from do_exit into get_signal Eric W. Biederman
2022-07-20 16:51 ` [PATCH 2/2] coredump: Allow coredumps to pipes to work with io_uring Eric W. Biederman
2022-08-22 21:16 ` Olivier Langlois
2022-08-23  3:35 ` Olivier Langlois
2022-08-23 18:22 ` Eric W. Biederman
2022-08-23 18:27 ` Jens Axboe
2022-08-24 15:11 ` Eric W. Biederman
2022-08-24 15:51 ` Jens Axboe
2022-01-05 19:39 ` [RFC] coredump: Do not interrupt dump for TIF_NOTIFY_SIGNAL Olivier Langlois