public inbox for [email protected]
* [PATCHSET 0/4] io_uring and task_work interactions
@ 2020-04-06 19:48 Jens Axboe
  2020-04-06 19:48 ` [PATCH 1/4] task_work: add task_work_pending() helper Jens Axboe
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Jens Axboe @ 2020-04-06 19:48 UTC (permalink / raw)
  To: io-uring; +Cc: peterz

There's a case in io_uring where we want to run the task_work on ring
exit, but we can't easily do that as we don't know if the task_works
list has work, or if the exit work has already been queued.

Here are two prep patches that add a task_work_pending() helper and
make task_work_run() check it. Then we can remove these checks from
the callers, and io_uring can check for and run pending work on ring
exit.
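As a userspace sketch of the split (the names mirror the kernel API,
but this is a simplified model for illustration, not kernel code): an
inline pending check sits in front of an out-of-line runner, so callers
can invoke task_work_run() unconditionally and the common no-work case
stays a cheap inline test.

```c
#include <assert.h>
#include <stddef.h>

struct callback_head {
	struct callback_head *next;
	void (*func)(struct callback_head *);
};

static struct callback_head *task_works;	/* stands in for current->task_works */
static int ran;					/* counts callbacks run, for the demo */

/* out-of-line slow path: take the list and run every callback on it */
static void __task_work_run(void)
{
	while (task_works) {
		struct callback_head *work = task_works;

		task_works = work->next;
		work->func(work);
	}
}

static int task_work_pending(void)
{
	return task_works != NULL;
}

/* inline fast path: skip the function call entirely when there is no work */
static void task_work_run(void)
{
	if (task_work_pending())
		__task_work_run();
}

static void count_cb(struct callback_head *cb)
{
	(void)cb;
	ran++;
}
```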

-- 
Jens Axboe




^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/4] task_work: add task_work_pending() helper
  2020-04-06 19:48 [PATCHSET 0/4] io_uring and task_work interactions Jens Axboe
@ 2020-04-06 19:48 ` Jens Axboe
  2020-04-07 11:28   ` Oleg Nesterov
  2020-04-06 19:48 ` [PATCH 2/4] task_work: don't run task_work if task_work_exited is queued Jens Axboe
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Jens Axboe @ 2020-04-06 19:48 UTC (permalink / raw)
  To: io-uring; +Cc: peterz, Jens Axboe

Various callsites currently check current->task_works != NULL to know
when to run work. Add a helper that we use internally for that. This is
in preparation for also not running if the exit work has been queued.

Signed-off-by: Jens Axboe <[email protected]>
---
 include/linux/task_work.h | 13 ++++++++++++-
 kernel/task_work.c        |  2 +-
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/include/linux/task_work.h b/include/linux/task_work.h
index bd9a6a91c097..54c911bbf754 100644
--- a/include/linux/task_work.h
+++ b/include/linux/task_work.h
@@ -15,7 +15,18 @@ init_task_work(struct callback_head *twork, task_work_func_t func)
 
 int task_work_add(struct task_struct *task, struct callback_head *twork, bool);
 struct callback_head *task_work_cancel(struct task_struct *, task_work_func_t);
-void task_work_run(void);
+void __task_work_run(void);
+
+static inline bool task_work_pending(void)
+{
+	return current->task_works;
+}
+
+static inline void task_work_run(void)
+{
+	if (task_work_pending())
+		__task_work_run();
+}
 
 static inline void exit_task_work(struct task_struct *task)
 {
diff --git a/kernel/task_work.c b/kernel/task_work.c
index 825f28259a19..9620333423a3 100644
--- a/kernel/task_work.c
+++ b/kernel/task_work.c
@@ -87,7 +87,7 @@ task_work_cancel(struct task_struct *task, task_work_func_t func)
  * it exits. In the latter case task_work_add() can no longer add the
  * new work after task_work_run() returns.
  */
-void task_work_run(void)
+void __task_work_run(void)
 {
 	struct task_struct *task = current;
 	struct callback_head *work, *head, *next;
-- 
2.26.0



* [PATCH 2/4] task_work: don't run task_work if task_work_exited is queued
  2020-04-06 19:48 [PATCHSET 0/4] io_uring and task_work interactions Jens Axboe
  2020-04-06 19:48 ` [PATCH 1/4] task_work: add task_work_pending() helper Jens Axboe
@ 2020-04-06 19:48 ` Jens Axboe
  2020-04-07 11:39   ` Oleg Nesterov
  2020-04-07 12:47   ` Peter Zijlstra
  2020-04-06 19:48 ` [PATCH 3/4] task_work: kill current->task_works checking in callers Jens Axboe
  2020-04-06 19:48 ` [PATCH 4/4] io_uring: flush task work before waiting for ring exit Jens Axboe
  3 siblings, 2 replies; 15+ messages in thread
From: Jens Axboe @ 2020-04-06 19:48 UTC (permalink / raw)
  To: io-uring; +Cc: peterz, Jens Axboe

If task_work has already been run on task exit, we don't always know
if it's safe to run again. Check for task_work_exited in the
task_work_pending() helper. This makes it less fragile to call from
the exit files path, for example.

Signed-off-by: Jens Axboe <[email protected]>
---
 include/linux/task_work.h | 4 +++-
 kernel/task_work.c        | 8 ++++----
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/include/linux/task_work.h b/include/linux/task_work.h
index 54c911bbf754..24f977a8fc35 100644
--- a/include/linux/task_work.h
+++ b/include/linux/task_work.h
@@ -7,6 +7,8 @@
 
 typedef void (*task_work_func_t)(struct callback_head *);
 
+extern struct callback_head task_work_exited;
+
 static inline void
 init_task_work(struct callback_head *twork, task_work_func_t func)
 {
@@ -19,7 +21,7 @@ void __task_work_run(void);
 
 static inline bool task_work_pending(void)
 {
-	return current->task_works;
+	return current->task_works && current->task_works != &task_work_exited;
 }
 
 static inline void task_work_run(void)
diff --git a/kernel/task_work.c b/kernel/task_work.c
index 9620333423a3..d6a8b4ab4858 100644
--- a/kernel/task_work.c
+++ b/kernel/task_work.c
@@ -3,7 +3,7 @@
 #include <linux/task_work.h>
 #include <linux/tracehook.h>
 
-static struct callback_head work_exited; /* all we need is ->next == NULL */
+struct callback_head task_work_exited = { };
 
 /**
  * task_work_add - ask the @task to execute @work->func()
@@ -31,7 +31,7 @@ task_work_add(struct task_struct *task, struct callback_head *work, bool notify)
 
 	do {
 		head = READ_ONCE(task->task_works);
-		if (unlikely(head == &work_exited))
+		if (unlikely(head == &task_work_exited))
 			return -ESRCH;
 		work->next = head;
 	} while (cmpxchg(&task->task_works, head, work) != head);
@@ -95,14 +95,14 @@ void __task_work_run(void)
 	for (;;) {
 		/*
 		 * work->func() can do task_work_add(), do not set
-		 * work_exited unless the list is empty.
+		 * task_work_exited unless the list is empty.
 		 */
 		do {
 			head = NULL;
 			work = READ_ONCE(task->task_works);
 			if (!work) {
 				if (task->flags & PF_EXITING)
-					head = &work_exited;
+					head = &task_work_exited;
 				else
 					break;
 			}
-- 
2.26.0



* [PATCH 3/4] task_work: kill current->task_works checking in callers
  2020-04-06 19:48 [PATCHSET 0/4] io_uring and task_work interactions Jens Axboe
  2020-04-06 19:48 ` [PATCH 1/4] task_work: add task_work_pending() helper Jens Axboe
  2020-04-06 19:48 ` [PATCH 2/4] task_work: don't run task_work if task_work_exited is queued Jens Axboe
@ 2020-04-06 19:48 ` Jens Axboe
  2020-04-06 19:48 ` [PATCH 4/4] io_uring: flush task work before waiting for ring exit Jens Axboe
  3 siblings, 0 replies; 15+ messages in thread
From: Jens Axboe @ 2020-04-06 19:48 UTC (permalink / raw)
  To: io-uring; +Cc: peterz, Jens Axboe

If the callsite cares, use task_work_pending(). If not, just call
task_work_run() unconditionally; the check is inlined, so this doesn't
add any extra overhead.

Signed-off-by: Jens Axboe <[email protected]>
---
 fs/io-wq.c                |  7 ++-----
 fs/io_uring.c             | 20 ++++++++------------
 include/linux/tracehook.h |  3 +--
 kernel/signal.c           |  7 +++----
 4 files changed, 14 insertions(+), 23 deletions(-)

diff --git a/fs/io-wq.c b/fs/io-wq.c
index 4023c9846860..5bee3f5f67e1 100644
--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -717,8 +717,7 @@ static int io_wq_manager(void *data)
 	complete(&wq->done);
 
 	while (!kthread_should_stop()) {
-		if (current->task_works)
-			task_work_run();
+		task_work_run();
 
 		for_each_node(node) {
 			struct io_wqe *wqe = wq->wqes[node];
@@ -742,9 +741,7 @@ static int io_wq_manager(void *data)
 		schedule_timeout(HZ);
 	}
 
-	if (current->task_works)
-		task_work_run();
-
+	task_work_run();
 	return 0;
 err:
 	set_bit(IO_WQ_BIT_ERROR, &wq->state);
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 79bd22289d73..1579390c7c53 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5967,8 +5967,7 @@ static int io_sq_thread(void *data)
 			if (!list_empty(&ctx->poll_list) ||
 			    (!time_after(jiffies, timeout) && ret != -EBUSY &&
 			    !percpu_ref_is_dying(&ctx->refs))) {
-				if (current->task_works)
-					task_work_run();
+				task_work_run();
 				cond_resched();
 				continue;
 			}
@@ -6000,8 +5999,8 @@ static int io_sq_thread(void *data)
 					finish_wait(&ctx->sqo_wait, &wait);
 					break;
 				}
-				if (current->task_works) {
-					task_work_run();
+				if (task_work_pending()) {
+					__task_work_run();
 					finish_wait(&ctx->sqo_wait, &wait);
 					continue;
 				}
@@ -6024,8 +6023,7 @@ static int io_sq_thread(void *data)
 		timeout = jiffies + ctx->sq_thread_idle;
 	}
 
-	if (current->task_works)
-		task_work_run();
+	task_work_run();
 
 	set_fs(old_fs);
 	if (cur_mm) {
@@ -6094,9 +6092,9 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
 	do {
 		if (io_cqring_events(ctx, false) >= min_events)
 			return 0;
-		if (!current->task_works)
+		if (!task_work_pending())
 			break;
-		task_work_run();
+		__task_work_run();
 	} while (1);
 
 	if (sig) {
@@ -6117,8 +6115,7 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
 	do {
 		prepare_to_wait_exclusive(&ctx->wait, &iowq.wq,
 						TASK_INTERRUPTIBLE);
-		if (current->task_works)
-			task_work_run();
+		task_work_run();
 		if (io_should_wake(&iowq, false))
 			break;
 		schedule();
@@ -7467,8 +7464,7 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 	int submitted = 0;
 	struct fd f;
 
-	if (current->task_works)
-		task_work_run();
+	task_work_run();
 
 	if (flags & ~(IORING_ENTER_GETEVENTS | IORING_ENTER_SQ_WAKEUP))
 		return -EINVAL;
diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index 36fb3bbed6b2..608a2d12bc14 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -184,8 +184,7 @@ static inline void tracehook_notify_resume(struct pt_regs *regs)
 	 * hlist_add_head(task->task_works);
 	 */
 	smp_mb__after_atomic();
-	if (unlikely(current->task_works))
-		task_work_run();
+	task_work_run();
 
 #ifdef CONFIG_KEYS_REQUEST_CACHE
 	if (unlikely(current->cached_requested_key)) {
diff --git a/kernel/signal.c b/kernel/signal.c
index e58a6c619824..d62b7a3f2045 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2271,8 +2271,8 @@ static void ptrace_do_notify(int signr, int exit_code, int why)
 void ptrace_notify(int exit_code)
 {
 	BUG_ON((exit_code & (0x7f | ~0xffff)) != SIGTRAP);
-	if (unlikely(current->task_works))
-		task_work_run();
+
+	task_work_run();
 
 	spin_lock_irq(&current->sighand->siglock);
 	ptrace_do_notify(SIGTRAP, exit_code, CLD_TRAPPED);
@@ -2529,8 +2529,7 @@ bool get_signal(struct ksignal *ksig)
 	struct signal_struct *signal = current->signal;
 	int signr;
 
-	if (unlikely(current->task_works))
-		task_work_run();
+	task_work_run();
 
 	if (unlikely(uprobe_deny_signal()))
 		return false;
-- 
2.26.0



* [PATCH 4/4] io_uring: flush task work before waiting for ring exit
  2020-04-06 19:48 [PATCHSET 0/4] io_uring and task_work interactions Jens Axboe
                   ` (2 preceding siblings ...)
  2020-04-06 19:48 ` [PATCH 3/4] task_work: kill current->task_works checking in callers Jens Axboe
@ 2020-04-06 19:48 ` Jens Axboe
  3 siblings, 0 replies; 15+ messages in thread
From: Jens Axboe @ 2020-04-06 19:48 UTC (permalink / raw)
  To: io-uring; +Cc: peterz, Jens Axboe

We could have triggered task work when we removed poll completions.
It's now safe to call task_work_run() unconditionally, since it checks
for the exited task work; do so in case we have pending items there,
and ensure we do it before flushing the CQ ring overflow.

Signed-off-by: Jens Axboe <[email protected]>
---
 fs/io_uring.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 1579390c7c53..183bd28761e3 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -7295,10 +7295,13 @@ static void io_ring_ctx_wait_and_kill(struct io_ring_ctx *ctx)
 		io_wq_cancel_all(ctx->io_wq);
 
 	io_iopoll_reap_events(ctx);
+	idr_for_each(&ctx->personality_idr, io_remove_personalities, ctx);
+	task_work_run();
+
 	/* if we failed setting up the ctx, we might not have any rings */
 	if (ctx->rings)
 		io_cqring_overflow_flush(ctx, true);
-	idr_for_each(&ctx->personality_idr, io_remove_personalities, ctx);
+
 	wait_for_completion(&ctx->completions[0]);
 	io_ring_ctx_free(ctx);
 }
-- 
2.26.0



* Re: [PATCH 1/4] task_work: add task_work_pending() helper
  2020-04-06 19:48 ` [PATCH 1/4] task_work: add task_work_pending() helper Jens Axboe
@ 2020-04-07 11:28   ` Oleg Nesterov
  2020-04-07 15:43     ` Jens Axboe
  0 siblings, 1 reply; 15+ messages in thread
From: Oleg Nesterov @ 2020-04-07 11:28 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, peterz

On 04/06, Jens Axboe wrote:
>
> +static inline bool task_work_pending(void)
> +{
> +	return current->task_works;
> +}
> +
> +static inline void task_work_run(void)
> +{
> +	if (task_work_pending())
> +		__task_work_run();
> +}

No, this is wrong. exit_task_work() must always call __task_work_run()
to install work_exited.

This helper (and 3/4) probably makes sense but please change exit_task_work()
to use __task_work_run() then.
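In a simplified userspace model (hypothetical names, cmpxchg and the
real locking elided), the reason the exit path must always reach the
slow path is that the empty-list branch there is the only place the
work_exited sentinel gets installed, and it is the sentinel that makes
later task_work_add() calls fail:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct callback_head {
	struct callback_head *next;
};

static struct callback_head work_exited;	/* sentinel: all we need is its address */
static struct callback_head *task_works;
static int exiting;				/* stands in for PF_EXITING */

static int task_work_add(struct callback_head *work)
{
	if (task_works == &work_exited)
		return -ESRCH;			/* task is past the point of no return */
	work->next = task_works;
	task_works = work;
	return 0;
}

static void __task_work_run(void)
{
	for (;;) {
		struct callback_head *work = task_works;

		if (!work) {
			/*
			 * Empty list: the only place the sentinel is
			 * installed. Skipping this call on exit would
			 * let task_work_add() queue work forever.
			 */
			if (exiting)
				task_works = &work_exited;
			break;
		}
		task_works = NULL;		/* take the whole list */
		while (work)			/* run it; callbacks elided here */
			work = work->next;
	}
}
```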

Oleg.



* Re: [PATCH 2/4] task_work: don't run task_work if task_work_exited is queued
  2020-04-06 19:48 ` [PATCH 2/4] task_work: don't run task_work if task_work_exited is queued Jens Axboe
@ 2020-04-07 11:39   ` Oleg Nesterov
  2020-04-07 11:54     ` Oleg Nesterov
  2020-04-07 15:40     ` Jens Axboe
  2020-04-07 12:47   ` Peter Zijlstra
  1 sibling, 2 replies; 15+ messages in thread
From: Oleg Nesterov @ 2020-04-07 11:39 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, peterz

On 04/06, Jens Axboe wrote:
>
> +extern struct callback_head task_work_exited;
> +
>  static inline void
>  init_task_work(struct callback_head *twork, task_work_func_t func)
>  {
> @@ -19,7 +21,7 @@ void __task_work_run(void);
>
>  static inline bool task_work_pending(void)
>  {
> -	return current->task_works;
> +	return current->task_works && current->task_works != &task_work_exited;
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Well, this penalizes all the current users, they can't hit work_exited.

IIUC, this is needed for the next change which adds task_work_run() into
io_ring_ctx_wait_and_kill(), right?

could you explain how the exiting task can call io_ring_ctx_wait_and_kill()
after it passed exit_task_work() ?

Oleg.



* Re: [PATCH 2/4] task_work: don't run task_work if task_work_exited is queued
  2020-04-07 11:39   ` Oleg Nesterov
@ 2020-04-07 11:54     ` Oleg Nesterov
  2020-04-07 15:40     ` Jens Axboe
  1 sibling, 0 replies; 15+ messages in thread
From: Oleg Nesterov @ 2020-04-07 11:54 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, peterz

On 04/07, Oleg Nesterov wrote:
>
> On 04/06, Jens Axboe wrote:
> >
> > +extern struct callback_head task_work_exited;
> > +
> >  static inline void
> >  init_task_work(struct callback_head *twork, task_work_func_t func)
> >  {
> > @@ -19,7 +21,7 @@ void __task_work_run(void);
> >
> >  static inline bool task_work_pending(void)
> >  {
> > -	return current->task_works;
> > +	return current->task_works && current->task_works != &task_work_exited;
>                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> Well, this penalizes all the current users, they can't hit work_exited.
>
> IIUC, this is needed for the next change which adds task_work_run() into
> io_ring_ctx_wait_and_kill(), right?
>
> could you explain how the exiting task can call io_ring_ctx_wait_and_kill()
> after it passed exit_task_work() ?

Note also that currently we assume that task_work_run() must not be called
by the exiting process (except exit_task_work). If io_uring adds the precedent
we need to change the PF_EXITING logic in task_work_run().

Oleg.



* Re: [PATCH 2/4] task_work: don't run task_work if task_work_exited is queued
  2020-04-06 19:48 ` [PATCH 2/4] task_work: don't run task_work if task_work_exited is queued Jens Axboe
  2020-04-07 11:39   ` Oleg Nesterov
@ 2020-04-07 12:47   ` Peter Zijlstra
  2020-04-07 15:43     ` Jens Axboe
  1 sibling, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2020-04-07 12:47 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, Oleg Nesterov, viro


You seem to have lost Oleg and Al from the Cc list..

On Mon, Apr 06, 2020 at 01:48:51PM -0600, Jens Axboe wrote:
> If task_work has already been run on task exit, we don't always know
> if it's safe to run again. Check for task_work_exited in the
> task_work_pending() helper. This makes it less fragile to call from
> the exit files path, for example.
> 
> Signed-off-by: Jens Axboe <[email protected]>
> ---
>  include/linux/task_work.h | 4 +++-
>  kernel/task_work.c        | 8 ++++----
>  2 files changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/task_work.h b/include/linux/task_work.h
> index 54c911bbf754..24f977a8fc35 100644
> --- a/include/linux/task_work.h
> +++ b/include/linux/task_work.h
> @@ -7,6 +7,8 @@
>  
>  typedef void (*task_work_func_t)(struct callback_head *);
>  
> +extern struct callback_head task_work_exited;
> +
>  static inline void
>  init_task_work(struct callback_head *twork, task_work_func_t func)
>  {
> @@ -19,7 +21,7 @@ void __task_work_run(void);
>  
>  static inline bool task_work_pending(void)
>  {
> -	return current->task_works;
> +	return current->task_works && current->task_works != &task_work_exited;
>  }

Hurmph..  not sure I like this. It inlines that second condition to
every caller of task_work_run() even though for pretty much all of them
this is impossible.


* Re: [PATCH 2/4] task_work: don't run task_work if task_work_exited is queued
  2020-04-07 11:39   ` Oleg Nesterov
  2020-04-07 11:54     ` Oleg Nesterov
@ 2020-04-07 15:40     ` Jens Axboe
  2020-04-07 16:19       ` Oleg Nesterov
  1 sibling, 1 reply; 15+ messages in thread
From: Jens Axboe @ 2020-04-07 15:40 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: io-uring, peterz

On 4/7/20 4:39 AM, Oleg Nesterov wrote:
> On 04/06, Jens Axboe wrote:
>>
>> +extern struct callback_head task_work_exited;
>> +
>>  static inline void
>>  init_task_work(struct callback_head *twork, task_work_func_t func)
>>  {
>> @@ -19,7 +21,7 @@ void __task_work_run(void);
>>
>>  static inline bool task_work_pending(void)
>>  {
>> -	return current->task_works;
>> +	return current->task_works && current->task_works != &task_work_exited;
>                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> Well, this penalizes all the current users, they can't hit work_exited.
> 
> IIUC, this is needed for the next change which adds task_work_run() into
> io_ring_ctx_wait_and_kill(), right?

Right - so you'd rather I localize that check there instead? Can certainly
do that.

> could you explain how the exiting task can call io_ring_ctx_wait_and_kill()
> after it passed exit_task_work() ?

Sure, here's a trace where it happens:


BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor instruction fetch in kernel mode
#PF: error_code(0x0010) - not-present page
PGD 0 P4D 0 
Oops: 0010 [#1] SMP
CPU: 51 PID: 7290 Comm: mc_worker Kdump: loaded Not tainted 5.2.9-03696-gf2db01aa1e97 #190
Hardware name: Quanta Leopard ORv2-DDR4/Leopard ORv2-DDR4, BIOS F06_3B17 03/16/2018
RIP: 0010:0x0
Code: Bad RIP value.
RSP: 0018:ffffc9002721bc78 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffffff82d10ff0 RCX: 0000000000000000
RDX: 0000000000000001 RSI: ffffc9002721bc60 RDI: ffffffff82d10ff0
RBP: ffff889fd220e8f0 R08: 0000000000000000 R09: ffffffff812f1000
R10: ffff88bfa5fcb100 R11: 0000000000000000 R12: ffff889fd220e200
R13: ffff889fd220e92c R14: ffffffff82d10ff0 R15: 0000000000000000
FS:  00007f03161ff700(0000) GS:ffff88bfff9c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffffffffd6 CR3: 0000000002409004 CR4: 00000000003606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 __task_work_run+0x66/0xa0
 io_ring_ctx_wait_and_kill+0x14e/0x3c0
 io_uring_release+0x1c/0x20
 __fput+0xaa/0x200
 __task_work_run+0x66/0xa0
 do_exit+0x9cf/0xb40
 do_group_exit+0x3a/0xa0
 get_signal+0x152/0x800
 do_signal+0x36/0x640
 ? __audit_syscall_exit+0x23c/0x290
 exit_to_usermode_loop+0x65/0x100
 do_syscall_64+0xd4/0x100
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f77057fe8ce
Code: Bad RIP value.
RSP: 002b:00007f03161f8960 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
RAX: 000000000000002a RBX: 00007f03161f8a30 RCX: 00007f77057fe8ce
RDX: 0000000000004040 RSI: 00007f03161f8a30 RDI: 00000000000057a4
RBP: 00007f03161f8980 R08: 0000000000000000 R09: 00007f03161fb7b8
R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
R13: 0000000000004040 R14: 00007f02dc12bc00 R15: 00007f02dc21b900

-- 
Jens Axboe



* Re: [PATCH 2/4] task_work: don't run task_work if task_work_exited is queued
  2020-04-07 12:47   ` Peter Zijlstra
@ 2020-04-07 15:43     ` Jens Axboe
  0 siblings, 0 replies; 15+ messages in thread
From: Jens Axboe @ 2020-04-07 15:43 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: io-uring, Oleg Nesterov, viro

On 4/7/20 5:47 AM, Peter Zijlstra wrote:
> 
> You seem to have lost Oleg and Al from the Cc list..

I'll add them for v2, I did point Oleg at it!

> On Mon, Apr 06, 2020 at 01:48:51PM -0600, Jens Axboe wrote:
>> If task_work has already been run on task exit, we don't always know
>> if it's safe to run again. Check for task_work_exited in the
>> task_work_pending() helper. This makes it less fragile to call from
>> the exit files path, for example.
>>
>> Signed-off-by: Jens Axboe <[email protected]>
>> ---
>>  include/linux/task_work.h | 4 +++-
>>  kernel/task_work.c        | 8 ++++----
>>  2 files changed, 7 insertions(+), 5 deletions(-)
>>
>> diff --git a/include/linux/task_work.h b/include/linux/task_work.h
>> index 54c911bbf754..24f977a8fc35 100644
>> --- a/include/linux/task_work.h
>> +++ b/include/linux/task_work.h
>> @@ -7,6 +7,8 @@
>>  
>>  typedef void (*task_work_func_t)(struct callback_head *);
>>  
>> +extern struct callback_head task_work_exited;
>> +
>>  static inline void
>>  init_task_work(struct callback_head *twork, task_work_func_t func)
>>  {
>> @@ -19,7 +21,7 @@ void __task_work_run(void);
>>  
>>  static inline bool task_work_pending(void)
>>  {
>> -	return current->task_works;
>> +	return current->task_works && current->task_works != &task_work_exited;
>>  }
> 
> Hurmph..  not sure I like this. It inlines that second condition to
> every caller of task_work_run() even though for pretty much all of them
> this is impossible.

Oleg had the same concern, and I agree with both of you. Would you
prefer it we just leave task_work_run() as:

static inline void task_work_run(void)
{
	if (current->task_works)
		__task_work_run();
}

and then have the io_uring caller do:

	if (current->task_works != &task_work_exited)
		task_work_run();

instead?

-- 
Jens Axboe



* Re: [PATCH 1/4] task_work: add task_work_pending() helper
  2020-04-07 11:28   ` Oleg Nesterov
@ 2020-04-07 15:43     ` Jens Axboe
  0 siblings, 0 replies; 15+ messages in thread
From: Jens Axboe @ 2020-04-07 15:43 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: io-uring, peterz

On 4/7/20 4:28 AM, Oleg Nesterov wrote:
> On 04/06, Jens Axboe wrote:
>>
>> +static inline bool task_work_pending(void)
>> +{
>> +	return current->task_works;
>> +}
>> +
>> +static inline void task_work_run(void)
>> +{
>> +	if (task_work_pending())
>> +		__task_work_run();
>> +}
> 
> No, this is wrong. exit_task_work() must always call __task_work_run()
> to install work_exited.
> 
> This helper (and 3/4) probably makes sense but please change exit_task_work()
> to use __task_work_run() then.

Good catch, thanks. I'll make the change.

-- 
Jens Axboe



* Re: [PATCH 2/4] task_work: don't run task_work if task_work_exited is queued
  2020-04-07 15:40     ` Jens Axboe
@ 2020-04-07 16:19       ` Oleg Nesterov
  2020-04-07 16:59         ` Jens Axboe
  0 siblings, 1 reply; 15+ messages in thread
From: Oleg Nesterov @ 2020-04-07 16:19 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, peterz

On 04/07, Jens Axboe wrote:
>
> On 4/7/20 4:39 AM, Oleg Nesterov wrote:
> >
> > IIUC, this is needed for the next change which adds task_work_run() into
> > io_ring_ctx_wait_and_kill(), right?
>
> Right - so you'd rather I localize that check there instead? Can certainly
> do that.

I am still not sure we need this check at all... probably this is because
I don't understand the problem.

> > could you explain how the exiting task can call io_ring_ctx_wait_and_kill()
> > after it passed exit_task_work() ?
>
> Sure, here's a trace where it happens:

but this task has not passed exit_task_work(),

>  __task_work_run+0x66/0xa0
>  io_ring_ctx_wait_and_kill+0x14e/0x3c0
>  io_uring_release+0x1c/0x20
>  __fput+0xaa/0x200
>  __task_work_run+0x66/0xa0
>  do_exit+0x9cf/0xb40

So task_work_run() is called recursively from exit_task_work()->task_work_run().
See my other email; this is wrong with or without this series. And that is
why I think task_work_run() hits work_exited.

Could you explain why io_ring_ctx_wait_and_kill() needs task_work_run() ?

Oleg.



* Re: [PATCH 2/4] task_work: don't run task_work if task_work_exited is queued
  2020-04-07 16:19       ` Oleg Nesterov
@ 2020-04-07 16:59         ` Jens Axboe
  2020-04-07 17:43           ` Oleg Nesterov
  0 siblings, 1 reply; 15+ messages in thread
From: Jens Axboe @ 2020-04-07 16:59 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: io-uring, peterz

On 4/7/20 9:19 AM, Oleg Nesterov wrote:
> On 04/07, Jens Axboe wrote:
>>
>> On 4/7/20 4:39 AM, Oleg Nesterov wrote:
>>>
>>> IIUC, this is needed for the next change which adds task_work_run() into
>>> io_ring_ctx_wait_and_kill(), right?
>>
>> Right - so you'd rather I localize that check there instead? Can certainly
>> do that.
> 
> I am still not sure we need this check at all... probably this is because
> I don't understand the problem.

Probably because I'm not explaining it very well... Let me try. io_uring
uses the task_work to handle completion of poll requests. Either an
explicit poll, or one done implicitly because someone did a recv/send
(or whatever) on a socket that wasn't ready. When we get the poll
waitqueue callback, we queue up task_work to handle the completion of
it.

These can come in at any time, obviously, as space or data becomes
available. If the task is exiting, our task_work_add() fails, and we
queue with someone else. But there seems to be a case where it does get
queued locally, and then io_uring doesn't know if it's safe to run
task_work or not. Sometimes that manifests itself in hitting the RIP ==
0 case that I included here. With the work_pending && work != exit_work
in place, it works fine.
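As a userspace model of that failure mode (hypothetical names, list
locking elided): the exited sentinel has a NULL ->func, so an unguarded
runner that pops it off the list ends up calling a NULL function
pointer, which is the RIP == 0 crash above. The pending check treats
the sentinel as "nothing to run":

```c
#include <assert.h>
#include <stddef.h>

struct callback_head {
	struct callback_head *next;
	void (*func)(struct callback_head *);
};

static struct callback_head task_work_exited;	/* ->func deliberately NULL */
static struct callback_head *task_works;

/* the proposed guard: the sentinel counts as "nothing pending" */
static int task_work_pending(void)
{
	return task_works && task_works != &task_work_exited;
}

/* returns 1 if it ran work, 0 if the guard said there was nothing to do */
static int run_if_pending(void)
{
	if (!task_work_pending())
		return 0;
	while (task_works) {
		struct callback_head *work = task_works;

		task_works = work->next;
		work->func(work);	/* for the sentinel this would be RIP == 0 */
	}
	return 1;
}
```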

>>> could you explain how the exiting can call io_ring_ctx_wait_and_kill()
>>> after it passed exit_task_work() ?
>>
>> Sure, here's a trace where it happens:
> 
> but this task has not passed exit_task_work(),

But it's definitely hitting callback->func == NULL, which is the
exit_work. So if it's not from past exit_task_work(), where is it from?

> 
>>  __task_work_run+0x66/0xa0
>>  io_ring_ctx_wait_and_kill+0x14e/0x3c0
>>  io_uring_release+0x1c/0x20
>>  __fput+0xaa/0x200
>>  __task_work_run+0x66/0xa0
>>  do_exit+0x9cf/0xb40
> 
> So task_work_run() is called recursively from
> exit_task_work()->task_work_run().  See my other email; this is
> wrong with or without this series. And that is why I think
> task_work_run() hits work_exited.

I see your newer email on this, I'll go read it.

> Could you explain why io_ring_ctx_wait_and_kill() needs
> task_work_run() ?

Hopefully the above helped! If I'm way off somehow, cluebat away.

-- 
Jens Axboe



* Re: [PATCH 2/4] task_work: don't run task_work if task_work_exited is queued
  2020-04-07 16:59         ` Jens Axboe
@ 2020-04-07 17:43           ` Oleg Nesterov
  0 siblings, 0 replies; 15+ messages in thread
From: Oleg Nesterov @ 2020-04-07 17:43 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, peterz

Jens,

I'll read your email tomorrow, just one note for now ...

On 04/07, Jens Axboe wrote:
>
> On 4/7/20 9:19 AM, Oleg Nesterov wrote:
> >
> > but this task has not passed exit_task_work(),
>
> But it's definitely hitting callback->func == NULL, which is the
> exit_work. So if it's not from past exit_task_work(), where is it from?

I guess it comes from task_work_run() added by the next patch ;)

> I see your newer email on this, I'll go read it.

please look at the "bad_work_func" example.

Oleg.

