public inbox for [email protected]
* [PATCHSET RFC 0/3] kernel: decouple TASK_WORK TWA_SIGNAL handling from signals
@ 2020-10-01 19:42 Jens Axboe
  2020-10-01 19:42 ` [PATCH 1/3] kernel: add task_sigpending() helper Jens Axboe
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Jens Axboe @ 2020-10-01 19:42 UTC (permalink / raw)
  To: linux-kernel, io-uring; +Cc: peterz, oleg, tglx

Hi,

I split this up into 3 pieces instead of the messy single patch, hope
this helps with review.

Patch 1 adds task_sigpending(), which tests TIF_SIGPENDING. Core use
cases that need to know whether an actual signal is pending are switched
from signal_pending() to task_sigpending(). This should address Oleg's
concern that signal_pending() could return true with no signal actually
pending for delivery.

Patch 2 adds x86 and generic entry code support for TIF_TASKWORK.

Patch 3 adds task_work support for TIF_TASKWORK, if the arch supports it.
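
The split between the two helpers described above can be sketched in
userspace as follows. This is only an illustrative model, assuming made-up
flag values; the model_* names are hypothetical and are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for thread flags; not the real kernel bit values. */
enum {
	MODEL_TIF_SIGPENDING = 1u << 0,	/* a real signal is queued */
	MODEL_TIF_TASKWORK   = 1u << 1,	/* task_work queued with TWA_SIGNAL */
};

struct model_task {
	unsigned int flags;
};

/* True only if an actual signal is pending (what core signal code asks). */
static bool model_task_sigpending(const struct model_task *t)
{
	assert(t != NULL);
	return (t->flags & MODEL_TIF_SIGPENDING) != 0;
}

/* True if a wait loop should be broken: real signal or pending task_work. */
static bool model_signal_pending(const struct model_task *t)
{
	assert(t != NULL);
	if (t->flags & MODEL_TIF_TASKWORK)
		return true;
	return model_task_sigpending(t);
}
```

With only the task_work flag set, model_signal_pending() is true while
model_task_sigpending() stays false, which is the case the core signal
code needs to tell apart.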

There's no need for any io_uring specific changes, so I've dropped those.
If TIF_TASKWORK is used, then JOBCTL_TASK_WORK will never be true and
hence we won't enter that case. If TIF_TASKWORK isn't available, then
we still need that code.

I've run this through my usual liburing test, and it passes. I also ran
it through all of the LTP signal tests, and the results match mainline,
with all tests passing.

 arch/x86/include/asm/thread_info.h |  2 ++
 arch/x86/kernel/signal.c           | 32 +++++++++++---------
 include/linux/entry-common.h       | 20 +++++++++++--
 include/linux/sched/signal.h       | 32 ++++++++++++++++----
 kernel/entry/common.c              | 14 +++++++--
 kernel/events/uprobes.c            |  2 +-
 kernel/ptrace.c                    |  2 +-
 kernel/signal.c                    | 12 ++++----
 kernel/task_work.c                 | 48 ++++++++++++++++++++++--------
 9 files changed, 118 insertions(+), 46 deletions(-)

Changes can also be viewed/pulled from this branch:

git://git.kernel.dk/linux-block tif-task_work

https://git.kernel.dk/cgit/linux-block/log/?h=tif-task_work

-- 
Jens Axboe




* [PATCH 1/3] kernel: add task_sigpending() helper
  2020-10-01 19:42 [PATCHSET RFC 0/3] kernel: decouple TASK_WORK TWA_SIGNAL handling from signals Jens Axboe
@ 2020-10-01 19:42 ` Jens Axboe
  2020-10-01 19:42 ` [PATCH 2/3] kernel: decouple TASK_WORK TWA_SIGNAL handling from signals Jens Axboe
  2020-10-01 19:42 ` [PATCH 3/3] task_work: use TIF_TASKWORK if available Jens Axboe
  2 siblings, 0 replies; 15+ messages in thread
From: Jens Axboe @ 2020-10-01 19:42 UTC (permalink / raw)
  To: linux-kernel, io-uring; +Cc: peterz, oleg, tglx, Jens Axboe

This is in preparation for maintaining signal_pending() as the decider
of whether or not a schedule() loop should be broken, or continue
sleeping. This is different than the core signal use cases, where we
really want to know if an actual signal is pending or not.
task_sigpending() returns non-zero if TIF_SIGPENDING is set.

Only core kernel use cases should care about the distinction between
the two; make sure those use the task_sigpending() helper.

Signed-off-by: Jens Axboe <[email protected]>
---
 include/linux/sched/signal.h | 13 +++++++++----
 kernel/events/uprobes.c      |  2 +-
 kernel/ptrace.c              |  2 +-
 kernel/signal.c              | 12 ++++++------
 4 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 1bad18a1d8ba..e6f34d8fbf4d 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -353,11 +353,16 @@ static inline int restart_syscall(void)
 	return -ERESTARTNOINTR;
 }
 
-static inline int signal_pending(struct task_struct *p)
+static inline int task_sigpending(struct task_struct *p)
 {
 	return unlikely(test_tsk_thread_flag(p,TIF_SIGPENDING));
 }
 
+static inline int signal_pending(struct task_struct *p)
+{
+	return task_sigpending(p);
+}
+
 static inline int __fatal_signal_pending(struct task_struct *p)
 {
 	return unlikely(sigismember(&p->pending.signal, SIGKILL));
@@ -365,14 +370,14 @@ static inline int __fatal_signal_pending(struct task_struct *p)
 
 static inline int fatal_signal_pending(struct task_struct *p)
 {
-	return signal_pending(p) && __fatal_signal_pending(p);
+	return task_sigpending(p) && __fatal_signal_pending(p);
 }
 
 static inline int signal_pending_state(long state, struct task_struct *p)
 {
 	if (!(state & (TASK_INTERRUPTIBLE | TASK_WAKEKILL)))
 		return 0;
-	if (!signal_pending(p))
+	if (!task_sigpending(p))
 		return 0;
 
 	return (state & TASK_INTERRUPTIBLE) || __fatal_signal_pending(p);
@@ -389,7 +394,7 @@ static inline bool fault_signal_pending(vm_fault_t fault_flags,
 {
 	return unlikely((fault_flags & VM_FAULT_RETRY) &&
 			(fatal_signal_pending(current) ||
-			 (user_mode(regs) && signal_pending(current))));
+			 (user_mode(regs) && task_sigpending(current))));
 }
 
 /*
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 0e18aaf23a7b..8bb26a338e06 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1973,7 +1973,7 @@ bool uprobe_deny_signal(void)
 
 	WARN_ON_ONCE(utask->state != UTASK_SSTEP);
 
-	if (signal_pending(t)) {
+	if (task_sigpending(t)) {
 		spin_lock_irq(&t->sighand->siglock);
 		clear_tsk_thread_flag(t, TIF_SIGPENDING);
 		spin_unlock_irq(&t->sighand->siglock);
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 43d6179508d6..583b8da4c207 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -773,7 +773,7 @@ static int ptrace_peek_siginfo(struct task_struct *child,
 		data += sizeof(siginfo_t);
 		i++;
 
-		if (signal_pending(current))
+		if (task_sigpending(current))
 			break;
 
 		cond_resched();
diff --git a/kernel/signal.c b/kernel/signal.c
index a38b3edc6851..ad52141ab0d2 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -983,7 +983,7 @@ static inline bool wants_signal(int sig, struct task_struct *p)
 	if (task_is_stopped_or_traced(p))
 		return false;
 
-	return task_curr(p) || !signal_pending(p);
+	return task_curr(p) || !task_sigpending(p);
 }
 
 static void complete_signal(int sig, struct task_struct *p, enum pid_type type)
@@ -2822,7 +2822,7 @@ static void retarget_shared_pending(struct task_struct *tsk, sigset_t *which)
 		/* Remove the signals this thread can handle. */
 		sigandsets(&retarget, &retarget, &t->blocked);
 
-		if (!signal_pending(t))
+		if (!task_sigpending(t))
 			signal_wake_up(t, 0);
 
 		if (sigisemptyset(&retarget))
@@ -2856,7 +2856,7 @@ void exit_signals(struct task_struct *tsk)
 
 	cgroup_threadgroup_change_end(tsk);
 
-	if (!signal_pending(tsk))
+	if (!task_sigpending(tsk))
 		goto out;
 
 	unblocked = tsk->blocked;
@@ -2900,7 +2900,7 @@ long do_no_restart_syscall(struct restart_block *param)
 
 static void __set_task_blocked(struct task_struct *tsk, const sigset_t *newset)
 {
-	if (signal_pending(tsk) && !thread_group_empty(tsk)) {
+	if (task_sigpending(tsk) && !thread_group_empty(tsk)) {
 		sigset_t newblocked;
 		/* A set of now blocked but previously unblocked signals. */
 		sigandnsets(&newblocked, newset, &current->blocked);
@@ -4443,7 +4443,7 @@ SYSCALL_DEFINE2(signal, int, sig, __sighandler_t, handler)
 
 SYSCALL_DEFINE0(pause)
 {
-	while (!signal_pending(current)) {
+	while (!task_sigpending(current)) {
 		__set_current_state(TASK_INTERRUPTIBLE);
 		schedule();
 	}
@@ -4457,7 +4457,7 @@ static int sigsuspend(sigset_t *set)
 	current->saved_sigmask = current->blocked;
 	set_current_blocked(set);
 
-	while (!signal_pending(current)) {
+	while (!task_sigpending(current)) {
 		__set_current_state(TASK_INTERRUPTIBLE);
 		schedule();
 	}
-- 
2.28.0



* [PATCH 2/3] kernel: decouple TASK_WORK TWA_SIGNAL handling from signals
  2020-10-01 19:42 [PATCHSET RFC 0/3] kernel: decouple TASK_WORK TWA_SIGNAL handling from signals Jens Axboe
  2020-10-01 19:42 ` [PATCH 1/3] kernel: add task_sigpending() helper Jens Axboe
@ 2020-10-01 19:42 ` Jens Axboe
  2020-10-01 19:42 ` [PATCH 3/3] task_work: use TIF_TASKWORK if available Jens Axboe
  2 siblings, 0 replies; 15+ messages in thread
From: Jens Axboe @ 2020-10-01 19:42 UTC (permalink / raw)
  To: linux-kernel, io-uring; +Cc: peterz, oleg, tglx, Jens Axboe, Roman Gershman

Users of TWA_SIGNAL need to break out of kernel wait loops, or force
re-entry into the kernel, to ensure that the queued task_work is run.
TWA_SIGNAL currently works like signal delivery in that sense, and uses
the same delivery mechanism. This works well from a functional
standpoint, but it is very heavy-handed for a multithreaded application
where sighand is shared between all threads and the main process. Adding
TWA_SIGNAL task_work on such setups needs to grab the sighand->lock, which
creates a hot spot for otherwise unrelated task_work. The lock has to be
grabbed on both the queue-work and run-work sides, exacerbating the
contention.

This adds TIF_TASKWORK for x86. When set, it makes signal_pending()
return true. That in turn causes tasks to restart the system call, which
will run the added task_work. If TIF_TASKWORK is available, we'll use
that for notification when TWA_SIGNAL is specified. If it isn't
available, the existing TIF_SIGPENDING path is used.
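
The exit-to-user flow this describes can be modelled in userspace roughly
as below. It is a simplified sketch, assuming a syscall is always in
progress and ignoring every other work bit; the model_* names and fields
are hypothetical and not taken from the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum {
	MODEL_TIF_SIGPENDING = 1u << 0,
	MODEL_TIF_TASKWORK   = 1u << 1,
};

struct model_exit_state {
	unsigned int ti_work;		/* pending work bits */
	bool handler_installed;		/* would a signal handler be run? */
	int task_work_runs;		/* times task_work was processed */
	bool syscall_restarted;		/* did we take the restart path? */
};

/* Models arch_do_signal(): returns true if a signal was handled. */
static bool model_do_signal(struct model_exit_state *s)
{
	s->ti_work &= ~MODEL_TIF_SIGPENDING;
	return s->handler_installed;
}

static void model_exit_to_user_mode_loop(struct model_exit_state *s)
{
	const unsigned int mask = MODEL_TIF_SIGPENDING | MODEL_TIF_TASKWORK;
	bool restart_sys;

	assert(s != NULL);
	/* Either flag means the syscall may need to be restarted. */
	restart_sys = (s->ti_work & mask) != 0;

	while (s->ti_work & mask) {
		if (s->ti_work & MODEL_TIF_TASKWORK) {
			s->ti_work &= ~MODEL_TIF_TASKWORK;
			s->task_work_runs++;
		}
		/* A handled signal takes over; no plain restart then. */
		if ((s->ti_work & MODEL_TIF_SIGPENDING) && model_do_signal(s))
			restart_sys = false;
	}

	if (restart_sys)
		s->syscall_restarted = true;
}
```

In this model a task with only the task_work bit set runs its work once
and then restarts the syscall, while a task whose signal is actually
handled does not take the restart path.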

Once all archs have added support for TIF_TASKWORK, we can kill the
old code completely. That will also allow removal of JOBCTL_TASK_WORK
and related code.

On my test box, even just using 16 threads shows a nice improvement
running an io_uring based echo server.

stock kernel:
0.01% <= 0.1 milliseconds
95.86% <= 0.2 milliseconds
98.27% <= 0.3 milliseconds
99.71% <= 0.4 milliseconds
100.00% <= 0.5 milliseconds
100.00% <= 0.6 milliseconds
100.00% <= 0.7 milliseconds
100.00% <= 0.8 milliseconds
100.00% <= 0.9 milliseconds
100.00% <= 1.0 milliseconds
100.00% <= 1.1 milliseconds
100.00% <= 2 milliseconds
100.00% <= 3 milliseconds
100.00% <= 3 milliseconds
1378930.00 requests per second
~1600% CPU

1.38M requests/second, and all 16 CPUs are maxed out.

patched kernel:
0.01% <= 0.1 milliseconds
98.24% <= 0.2 milliseconds
99.47% <= 0.3 milliseconds
99.99% <= 0.4 milliseconds
100.00% <= 0.5 milliseconds
100.00% <= 0.6 milliseconds
100.00% <= 0.7 milliseconds
100.00% <= 0.8 milliseconds
100.00% <= 0.9 milliseconds
100.00% <= 1.2 milliseconds
1666111.38 requests per second
~1450% CPU

1.67M requests/second, and we're no longer just hammering on the sighand
lock. The original reporter states:

"For 5.7.15 my benchmark achieves 1.6M qps and system cpu is at ~80%.
 for 5.7.16 or later it achieves only 1M qps and the system cpu is
 at ~100%"

with the only difference there being that TWA_SIGNAL is used
unconditionally in 5.7.16. That is needed because task_work could
otherwise never run when the application is already waiting in the
kernel on an event that requires task_work to run before it can be
satisfied. Also see commit 0ba9c9edcd15.
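
For reference, the two benchmark runs above work out to roughly 21% more
requests per second on the patched kernel while using about 9% less CPU.
A small sketch of that arithmetic, treating the approximate CPU figures
(~1600% and ~1450%) as exact for the sake of the calculation:

```c
#include <assert.h>

/* Figures copied from the stock and patched benchmark runs. */
static const double stock_rps = 1378930.00, stock_cpu_pct = 1600.0;
static const double patched_rps = 1666111.38, patched_cpu_pct = 1450.0;

/* Relative throughput gain of the patched kernel, as a fraction. */
static double throughput_gain(void)
{
	return patched_rps / stock_rps - 1.0;
}

/* Requests per second delivered per fully busy CPU (100% of one CPU). */
static double rps_per_cpu(double rps, double cpu_pct)
{
	assert(cpu_pct > 0.0);
	return rps / (cpu_pct / 100.0);
}
```

Per fully busy CPU that is about 86K requests/second on the stock kernel
versus about 115K on the patched one.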

Reported-by: Roman Gershman <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
---
 arch/x86/include/asm/thread_info.h |  2 ++
 arch/x86/kernel/signal.c           | 32 +++++++++++++++++-------------
 include/linux/entry-common.h       | 20 ++++++++++++++++---
 include/linux/sched/signal.h       | 19 ++++++++++++++++--
 kernel/entry/common.c              | 14 ++++++++++---
 5 files changed, 65 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index 267701ae3d86..79fe7db3208c 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -93,6 +93,7 @@ struct thread_info {
 #define TIF_NOTSC		16	/* TSC is not accessible in userland */
 #define TIF_IA32		17	/* IA32 compatibility process */
 #define TIF_SLD			18	/* Restore split lock detection on context switch */
+#define TIF_TASKWORK		19	/* task_work pending */
 #define TIF_MEMDIE		20	/* is terminating due to OOM killer */
 #define TIF_POLLING_NRFLAG	21	/* idle is polling for TIF_NEED_RESCHED */
 #define TIF_IO_BITMAP		22	/* uses I/O bitmap */
@@ -123,6 +124,7 @@ struct thread_info {
 #define _TIF_NOTSC		(1 << TIF_NOTSC)
 #define _TIF_IA32		(1 << TIF_IA32)
 #define _TIF_SLD		(1 << TIF_SLD)
+#define _TIF_TASKWORK		(1 << TIF_TASKWORK)
 #define _TIF_POLLING_NRFLAG	(1 << TIF_POLLING_NRFLAG)
 #define _TIF_IO_BITMAP		(1 << TIF_IO_BITMAP)
 #define _TIF_FORCED_TF		(1 << TIF_FORCED_TF)
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index be0d7d4152ec..5dc1eeaf0866 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -799,21 +799,8 @@ static inline unsigned long get_nr_restart_syscall(const struct pt_regs *regs)
 #endif
 }
 
-/*
- * Note that 'init' is a special process: it doesn't get signals it doesn't
- * want to handle. Thus you cannot kill init even with a SIGKILL even by
- * mistake.
- */
-void arch_do_signal(struct pt_regs *regs)
+void arch_restart_syscall(struct pt_regs *regs)
 {
-	struct ksignal ksig;
-
-	if (get_signal(&ksig)) {
-		/* Whee! Actually deliver the signal.  */
-		handle_signal(&ksig, regs);
-		return;
-	}
-
 	/* Did we come from a system call? */
 	if (syscall_get_nr(current, regs) >= 0) {
 		/* Restart the system call - no handlers present */
@@ -831,12 +818,29 @@ void arch_do_signal(struct pt_regs *regs)
 			break;
 		}
 	}
+}
+
+/*
+ * Note that 'init' is a special process: it doesn't get signals it doesn't
+ * want to handle. Thus you cannot kill init even with a SIGKILL even by
+ * mistake.
+ */
+bool arch_do_signal(struct pt_regs *regs)
+{
+	struct ksignal ksig;
+
+	if (get_signal(&ksig)) {
+		/* Whee! Actually deliver the signal.  */
+		handle_signal(&ksig, regs);
+		return true;
+	}
 
 	/*
 	 * If there's no signal to deliver, we just put the saved sigmask
 	 * back.
 	 */
 	restore_saved_sigmask();
+	return false;
 }
 
 void signal_fault(struct pt_regs *regs, void __user *frame, char *where)
diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
index 159c7476b11b..03cab8b9ddab 100644
--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -37,6 +37,10 @@
 # define _TIF_UPROBE			(0)
 #endif
 
+#ifndef _TIF_TASKWORK
+# define _TIF_TASKWORK			(0)
+#endif
+
 /*
  * TIF flags handled in syscall_enter_from_usermode()
  */
@@ -69,7 +73,7 @@
 
 #define EXIT_TO_USER_MODE_WORK						\
 	(_TIF_SIGPENDING | _TIF_NOTIFY_RESUME | _TIF_UPROBE |		\
-	 _TIF_NEED_RESCHED | _TIF_PATCH_PENDING |			\
+	 _TIF_NEED_RESCHED | _TIF_PATCH_PENDING | _TIF_TASKWORK|	\
 	 ARCH_EXIT_TO_USER_MODE_WORK)
 
 /**
@@ -262,9 +266,19 @@ static __always_inline void arch_exit_to_user_mode(void) { }
  * arch_do_signal -  Architecture specific signal delivery function
  * @regs:	Pointer to currents pt_regs
  *
- * Invoked from exit_to_user_mode_loop().
+ * Invoked from exit_to_user_mode_loop(). Returns true if a signal was
+ * handled.
+ */
+bool arch_do_signal(struct pt_regs *regs);
+
+/**
+ * arch_restart_syscall -  Architecture specific syscall restarting
+ * @regs:	Pointer to currents pt_regs
+ *
+ * Invoked from exit_to_user_mode_loop(), if we need to restart the current
+ * system call.
  */
-void arch_do_signal(struct pt_regs *regs);
+void arch_restart_syscall(struct pt_regs *regs);
 
 /**
  * arch_syscall_exit_tracehook - Wrapper around tracehook_report_syscall_exit()
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index e6f34d8fbf4d..3093a7d30a24 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -360,6 +360,15 @@ static inline int task_sigpending(struct task_struct *p)
 
 static inline int signal_pending(struct task_struct *p)
 {
+#ifdef TIF_TASKWORK
+	/*
+	 * TIF_TASKWORK isn't really a signal, but it requires the same
+	 * behavior in terms of ensuring that we break out of wait loops
+	 * so that task_work can be processed.
+	 */
+	if (unlikely(test_tsk_thread_flag(p, TIF_TASKWORK)))
+		return 1;
+#endif
 	return task_sigpending(p);
 }
 
@@ -506,10 +515,16 @@ extern int set_user_sigmask(const sigset_t __user *umask, size_t sigsetsize);
 
 static inline void restore_saved_sigmask_unless(bool interrupted)
 {
-	if (interrupted)
+	if (interrupted) {
+#ifdef TIF_TASKWORK
+		WARN_ON(!test_thread_flag(TIF_SIGPENDING) &&
+			!test_thread_flag(TIF_TASKWORK));
+#else
 		WARN_ON(!test_thread_flag(TIF_SIGPENDING));
-	else
+#endif
+	} else {
 		restore_saved_sigmask();
+	}
 }
 
 static inline sigset_t *sigmask_to_save(void)
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index 6fdb6105e6d6..d25ee8f7f071 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -135,11 +135,13 @@ static __always_inline void exit_to_user_mode(void)
 }
 
 /* Workaround to allow gradual conversion of architecture code */
-void __weak arch_do_signal(struct pt_regs *regs) { }
+bool __weak arch_do_signal(struct pt_regs *regs) { return true; }
 
 static unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
 					    unsigned long ti_work)
 {
+	bool restart_sys = (ti_work & (_TIF_SIGPENDING|_TIF_TASKWORK)) != 0;
+
 	/*
 	 * Before returning to user space ensure that all pending work
 	 * items have been completed.
@@ -157,8 +159,11 @@ static unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
 		if (ti_work & _TIF_PATCH_PENDING)
 			klp_update_patch_state(current);
 
-		if (ti_work & _TIF_SIGPENDING)
-			arch_do_signal(regs);
+		if (ti_work & _TIF_TASKWORK)
+			task_work_run();
+
+		if ((ti_work & _TIF_SIGPENDING) && arch_do_signal(regs))
+			restart_sys = false;
 
 		if (ti_work & _TIF_NOTIFY_RESUME) {
 			clear_thread_flag(TIF_NOTIFY_RESUME);
@@ -178,6 +183,9 @@ static unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
 		ti_work = READ_ONCE(current_thread_info()->flags);
 	}
 
+	if (restart_sys)
+		arch_restart_syscall(regs);
+
 	/* Return the latest work state for arch_exit_to_user_mode() */
 	return ti_work;
 }
-- 
2.28.0



* [PATCH 3/3] task_work: use TIF_TASKWORK if available
  2020-10-01 19:42 [PATCHSET RFC 0/3] kernel: decouple TASK_WORK TWA_SIGNAL handling from signals Jens Axboe
  2020-10-01 19:42 ` [PATCH 1/3] kernel: add task_sigpending() helper Jens Axboe
  2020-10-01 19:42 ` [PATCH 2/3] kernel: decouple TASK_WORK TWA_SIGNAL handling from signals Jens Axboe
@ 2020-10-01 19:42 ` Jens Axboe
  2020-10-02 15:14   ` Oleg Nesterov
  2 siblings, 1 reply; 15+ messages in thread
From: Jens Axboe @ 2020-10-01 19:42 UTC (permalink / raw)
  To: linux-kernel, io-uring; +Cc: peterz, oleg, tglx, Jens Axboe

If the arch supports TIF_TASKWORK, then use that for TWA_SIGNAL as
it's more efficient than using the signal delivery method. This is
especially true on threaded applications, where ->sighand is shared
across threads.
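
Why the flag-based path avoids the shared lock can be illustrated with a
userspace model: queueing is a lock-free list push followed by an atomic
flag set, with no per-process lock involved. This is a rough sketch using
C11 atomics, not the kernel implementation; it leaves out the exited-task
check and the wakeup, and all model_* names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MODEL_TIF_TASKWORK 1u

struct model_work {
	struct model_work *next;
	void (*func)(struct model_work *);
};

struct model_task {
	_Atomic(struct model_work *) task_works;	/* lock-free list head */
	atomic_uint flags;
};

static int model_calls;	/* bumped by the sample callback below */

static void model_count_cb(struct model_work *w)
{
	(void)w;
	model_calls++;
}

/* Push @w onto the list, then flag the task; no lock is taken. */
static void model_task_work_add(struct model_task *t, struct model_work *w)
{
	struct model_work *head;

	assert(t != NULL && w != NULL);
	head = atomic_load(&t->task_works);
	do {
		w->next = head;
	} while (!atomic_compare_exchange_weak(&t->task_works, &head, w));

	atomic_fetch_or(&t->flags, MODEL_TIF_TASKWORK);
}

/* Clear the flag, detach the whole list, run it; returns items run. */
static int model_task_work_run(struct model_task *t)
{
	struct model_work *w;
	int ran = 0;

	atomic_fetch_and(&t->flags, ~MODEL_TIF_TASKWORK);
	w = atomic_exchange(&t->task_works, NULL);
	while (w) {
		struct model_work *next = w->next;

		w->func(w);
		w = next;
		ran++;
	}
	return ran;
}
```

The flag is cleared before the list is detached, so work queued while the
callbacks run sets the flag again and is picked up on the next pass.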

Signed-off-by: Jens Axboe <[email protected]>
---
 kernel/task_work.c | 48 ++++++++++++++++++++++++++++++++++------------
 1 file changed, 36 insertions(+), 12 deletions(-)

diff --git a/kernel/task_work.c b/kernel/task_work.c
index 613b2d634af8..ae317cfe86b8 100644
--- a/kernel/task_work.c
+++ b/kernel/task_work.c
@@ -5,6 +5,39 @@
 
 static struct callback_head work_exited; /* all we need is ->next == NULL */
 
+/*
+ * TWA_SIGNAL signaling - use TIF_TASKWORK, if available.
+ */
+static void task_work_signal(struct task_struct *task)
+{
+#ifndef TIF_TASKWORK
+	unsigned long flags;
+
+	/*
+	 * Only grab the sighand lock if we don't already have some
+	 * task_work pending. This pairs with the smp_store_mb()
+	 * in get_signal(), see comment there.
+	 */
+	if (!(READ_ONCE(task->jobctl) & JOBCTL_TASK_WORK) &&
+	    lock_task_sighand(task, &flags)) {
+		task->jobctl |= JOBCTL_TASK_WORK;
+		signal_wake_up(task, 0);
+		unlock_task_sighand(task, &flags);
+	}
+#else
+	set_tsk_thread_flag(task, TIF_TASKWORK);
+	set_notify_resume(task);
+#endif
+}
+
+static inline void clear_tsk_taskwork(struct task_struct *task)
+{
+#ifdef TIF_TASKWORK
+	if (test_tsk_thread_flag(task, TIF_TASKWORK))
+		clear_tsk_thread_flag(task, TIF_TASKWORK);
+#endif
+}
+
 /**
  * task_work_add - ask the @task to execute @work->func()
  * @task: the task which should run the callback
@@ -28,7 +61,6 @@ int
 task_work_add(struct task_struct *task, struct callback_head *work, int notify)
 {
 	struct callback_head *head;
-	unsigned long flags;
 
 	do {
 		head = READ_ONCE(task->task_works);
@@ -42,17 +74,7 @@ task_work_add(struct task_struct *task, struct callback_head *work, int notify)
 		set_notify_resume(task);
 		break;
 	case TWA_SIGNAL:
-		/*
-		 * Only grab the sighand lock if we don't already have some
-		 * task_work pending. This pairs with the smp_store_mb()
-		 * in get_signal(), see comment there.
-		 */
-		if (!(READ_ONCE(task->jobctl) & JOBCTL_TASK_WORK) &&
-		    lock_task_sighand(task, &flags)) {
-			task->jobctl |= JOBCTL_TASK_WORK;
-			signal_wake_up(task, 0);
-			unlock_task_sighand(task, &flags);
-		}
+		task_work_signal(task);
 		break;
 	}
 
@@ -110,6 +132,8 @@ void task_work_run(void)
 	struct task_struct *task = current;
 	struct callback_head *work, *head, *next;
 
+	clear_tsk_taskwork(task);
+
 	for (;;) {
 		/*
 		 * work->func() can do task_work_add(), do not set
-- 
2.28.0



* Re: [PATCH 3/3] task_work: use TIF_TASKWORK if available
  2020-10-01 19:42 ` [PATCH 3/3] task_work: use TIF_TASKWORK if available Jens Axboe
@ 2020-10-02 15:14   ` Oleg Nesterov
  2020-10-02 15:31     ` Thomas Gleixner
  2020-10-02 15:53     ` Jens Axboe
  0 siblings, 2 replies; 15+ messages in thread
From: Oleg Nesterov @ 2020-10-02 15:14 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel, io-uring, peterz, tglx

Heh. To be honest I don't really like 1-2 ;)

Unfortunately, I do not see a better approach right now. Let me think
until Monday, it is not that I think I will find a better solution, but
I'd like to try anyway.

Let me comment on 3/3 for now.

On 10/01, Jens Axboe wrote:
>
> +static void task_work_signal(struct task_struct *task)
> +{
> +#ifndef TIF_TASKWORK
> +	unsigned long flags;
> +
> +	/*
> +	 * Only grab the sighand lock if we don't already have some
> +	 * task_work pending. This pairs with the smp_store_mb()
> +	 * in get_signal(), see comment there.
> +	 */
> +	if (!(READ_ONCE(task->jobctl) & JOBCTL_TASK_WORK) &&
> +	    lock_task_sighand(task, &flags)) {
> +		task->jobctl |= JOBCTL_TASK_WORK;
> +		signal_wake_up(task, 0);
> +		unlock_task_sighand(task, &flags);
> +	}
> +#else
> +	set_tsk_thread_flag(task, TIF_TASKWORK);
> +	set_notify_resume(task);
> +#endif

Again, I can't understand. task_work_signal(task) should set TIF_TASKWORK
to make signal_pending() = T _and_ wake/kick the target up, just like
signal_wake_up() does. Why do we set TIF_NOTIFY_RESUME ?

So I think that if we are going to add TIF_TASKWORK we should generalize
this logic and turn it into TIF_NOTIFY_SIGNAL. Similar to TIF_NOTIFY_RESUME
but implies signal_pending().

IOW, something like

	void set_notify_signal(task)
	{
		if (!test_and_set_tsk_thread_flag(task, TIF_NOTIFY_SIGNAL)) {
			if (!wake_up_state(task, TASK_INTERRUPTIBLE))
				kick_process(task);
		}
	}

	// called by exit_to_user_mode_loop() if ti_work & _TIF_NOTIFY_SIGNAL
	void tracehook_notify_signal(regs)
	{
		clear_thread_flag(TIF_NOTIFY_SIGNAL);
		smp_mb__after_atomic();
		if (unlikely(current->task_works))
			task_work_run();
	}

This way task_work_run() doesn't need to clear TIF_NOTIFY_SIGNAL and it can
have more users.
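
The test-and-set idea in the proposal above can be tried out in userspace
with C11 atomics: only the caller that flips the flag from clear to set
pays for the wakeup, and the consumer clears the flag before looking for
work. This is a hedged sketch of that pattern only; the model_* names and
the kick counter are hypothetical stand-ins for the kernel calls.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_TIF_NOTIFY_SIGNAL 1u

struct model_task {
	atomic_uint flags;
	int kicks;		/* times the task was woken or kicked */
	int pending_work;	/* stand-in for task->task_works */
	int work_runs;		/* work items processed so far */
};

/* Returns true if this call set the flag and therefore did the wakeup. */
static bool model_set_notify_signal(struct model_task *t)
{
	unsigned int old;

	assert(t != NULL);
	old = atomic_fetch_or(&t->flags, MODEL_TIF_NOTIFY_SIGNAL);
	if (old & MODEL_TIF_NOTIFY_SIGNAL)
		return false;	/* already set; an earlier caller woke it */

	t->kicks++;
	return true;
}

/* Consumer side: clear the flag first, then run whatever is queued. */
static void model_notify_signal(struct model_task *t)
{
	assert(t != NULL);
	atomic_fetch_and(&t->flags, ~MODEL_TIF_NOTIFY_SIGNAL);
	if (t->pending_work) {
		t->work_runs += t->pending_work;
		t->pending_work = 0;
	}
}
```

Repeated notifications before the task runs collapse into a single
wakeup, and the flag handling stays independent of how the work itself is
queued.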

What do you think?

Oleg.



* Re: [PATCH 3/3] task_work: use TIF_TASKWORK if available
  2020-10-02 15:14   ` Oleg Nesterov
@ 2020-10-02 15:31     ` Thomas Gleixner
  2020-10-02 15:38       ` Oleg Nesterov
  2020-10-02 15:52       ` Jens Axboe
  2020-10-02 15:53     ` Jens Axboe
  1 sibling, 2 replies; 15+ messages in thread
From: Thomas Gleixner @ 2020-10-02 15:31 UTC (permalink / raw)
  To: Oleg Nesterov, Jens Axboe; +Cc: linux-kernel, io-uring, peterz

On Fri, Oct 02 2020 at 17:14, Oleg Nesterov wrote:
> Heh. To be honest I don't really like 1-2 ;)

I do not like any of this :)

> So I think that if we are going to add TIF_TASKWORK we should generalize
> this logic and turn it into TIF_NOTIFY_SIGNAL. Similar to TIF_NOTIFY_RESUME
> but implies signal_pending().
>
> IOW, something like
>
> 	void set_notify_signal(task)
> 	{
> 		if (!test_and_set_tsk_thread_flag(task, TIF_NOTIFY_SIGNAL)) {
> 			if (!wake_up_state(task, TASK_INTERRUPTIBLE))
> 				kick_process(task);
> 		}
> 	}
>
> 	// called by exit_to_user_mode_loop() if ti_work & _TIF_NOTIFY_SIGNAL
> 	void tracehook_notify_signal(regs)
> 	{
> 		clear_thread_flag(TIF_NOTIFY_SIGNAL);
> 		smp_mb__after_atomic();
> 		if (unlikely(current->task_works))
> 			task_work_run();
> 	}
>
> This way task_work_run() doesn't need to clear TIF_NOTIFY_SIGNAL and it can
> have more users.

I think it's fundamentally wrong that we have several places and several
flags which handle task_work_run() instead of having exactly one place
and one flag.

Thanks,

        tglx



* Re: [PATCH 3/3] task_work: use TIF_TASKWORK if available
  2020-10-02 15:31     ` Thomas Gleixner
@ 2020-10-02 15:38       ` Oleg Nesterov
  2020-10-02 16:18         ` Jens Axboe
  2020-10-03  1:49         ` Thomas Gleixner
  2020-10-02 15:52       ` Jens Axboe
  1 sibling, 2 replies; 15+ messages in thread
From: Oleg Nesterov @ 2020-10-02 15:38 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Jens Axboe, linux-kernel, io-uring, peterz

On 10/02, Thomas Gleixner wrote:
>
> I think it's fundamentally wrong that we have several places and several
> flags which handle task_work_run() instead of having exactly one place
> and one flag.

Damn yes, agreed.

Oleg.



* Re: [PATCH 3/3] task_work: use TIF_TASKWORK if available
  2020-10-02 15:31     ` Thomas Gleixner
  2020-10-02 15:38       ` Oleg Nesterov
@ 2020-10-02 15:52       ` Jens Axboe
  2020-10-02 16:42         ` Jens Axboe
  2020-10-02 19:10         ` Thomas Gleixner
  1 sibling, 2 replies; 15+ messages in thread
From: Jens Axboe @ 2020-10-02 15:52 UTC (permalink / raw)
  To: Thomas Gleixner, Oleg Nesterov; +Cc: linux-kernel, io-uring, peterz

On 10/2/20 9:31 AM, Thomas Gleixner wrote:
> On Fri, Oct 02 2020 at 17:14, Oleg Nesterov wrote:
>> Heh. To be honest I don't really like 1-2 ;)
> 
> I do not like any of this :)
> 
>> So I think that if we are going to add TIF_TASKWORK we should generalize
>> this logic and turn it into TIF_NOTIFY_SIGNAL. Similar to TIF_NOTIFY_RESUME
>> but implies signal_pending().
>>
>> IOW, something like
>>
>> 	void set_notify_signal(task)
>> 	{
>> 		if (!test_and_set_tsk_thread_flag(task, TIF_NOTIFY_SIGNAL)) {
>> 			if (!wake_up_state(task, TASK_INTERRUPTIBLE))
>> 				kick_process(task);
>> 		}
>> 	}
>>
>> 	// called by exit_to_user_mode_loop() if ti_work & _TIF_NOTIFY_SIGNAL
>> 	void tracehook_notify_signal(regs)
>> 	{
>> 		clear_thread_flag(TIF_NOTIFY_SIGNAL);
>> 		smp_mb__after_atomic();
>> 		if (unlikely(current->task_works))
>> 			task_work_run();
>> 	}
>>
>> This way task_work_run() doesn't need to clear TIF_NOTIFY_SIGNAL and it can
>> have more users.
> 
> I think it's fundamentally wrong that we have several places and several
> flags which handle task_work_run() instead of having exactly one place
> and one flag.

I don't disagree with that. I know it's not happening in this series, but
if we go the TIF_NOTIFY_SIGNAL route and get all archs supporting that,
then we can kill the signal and notify resume part of running task_work.
And that leaves us with exactly one place that runs it.

So we can potentially improve the current situation in that regard.

-- 
Jens Axboe



* Re: [PATCH 3/3] task_work: use TIF_TASKWORK if available
  2020-10-02 15:14   ` Oleg Nesterov
  2020-10-02 15:31     ` Thomas Gleixner
@ 2020-10-02 15:53     ` Jens Axboe
  1 sibling, 0 replies; 15+ messages in thread
From: Jens Axboe @ 2020-10-02 15:53 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: linux-kernel, io-uring, peterz, tglx

On 10/2/20 9:14 AM, Oleg Nesterov wrote:
> Heh. To be honest I don't really like 1-2 ;)
> 
> Unfortunately, I do not see a better approach right now. Let me think
> until Monday, it is not that I think I will find a better solution, but
> I'd like to try anyway.
> 
> Let me comment on 3/3 for now.

Thanks, appreciate your time on this!

>> +static void task_work_signal(struct task_struct *task)
>> +{
>> +#ifndef TIF_TASKWORK
>> +	unsigned long flags;
>> +
>> +	/*
>> +	 * Only grab the sighand lock if we don't already have some
>> +	 * task_work pending. This pairs with the smp_store_mb()
>> +	 * in get_signal(), see comment there.
>> +	 */
>> +	if (!(READ_ONCE(task->jobctl) & JOBCTL_TASK_WORK) &&
>> +	    lock_task_sighand(task, &flags)) {
>> +		task->jobctl |= JOBCTL_TASK_WORK;
>> +		signal_wake_up(task, 0);
>> +		unlock_task_sighand(task, &flags);
>> +	}
>> +#else
>> +	set_tsk_thread_flag(task, TIF_TASKWORK);
>> +	set_notify_resume(task);
>> +#endif
> 
> Again, I can't understand. task_work_signal(task) should set TIF_TASKWORK
> to make signal_pending() = T _and_ wake/kick the target up, just like
> signal_wake_up() does. Why do we set TIF_NOTIFY_RESUME ?
> 
> So I think that if we are going to add TIF_TASKWORK we should generalize
> this logic and turn it into TIF_NOTIFY_SIGNAL. Similar to TIF_NOTIFY_RESUME
> but implies signal_pending().
> 
> IOW, something like
> 
> 	void set_notify_signal(task)
> 	{
> 		if (!test_and_set_tsk_thread_flag(task, TIF_NOTIFY_SIGNAL)) {
> 			if (!wake_up_state(task, TASK_INTERRUPTIBLE))
> 				kick_process(task);
> 		}
> 	}
> 
> 	// called by exit_to_user_mode_loop() if ti_work & _TIF_NOTIFY_SIGNAL
> 	void tracehook_notify_signal(regs)
> 	{
> 		clear_thread_flag(TIF_NOTIFY_SIGNAL);
> 		smp_mb__after_atomic();
> 		if (unlikely(current->task_works))
> 			task_work_run();
> 	}
> 
> This way task_work_run() doesn't need to clear TIF_NOTIFY_SIGNAL and it can
> have more users.
> 
> What do you think?

I like that. It'll achieve the same thing as far as I'm concerned, but not
tie the functionality to task_work. Not that we have anything that'd use
it right now, but it still seems like a better base.

I'll adapt patch 2+3 for this, thanks Oleg.

-- 
Jens Axboe



* Re: [PATCH 3/3] task_work: use TIF_TASKWORK if available
  2020-10-02 15:38       ` Oleg Nesterov
@ 2020-10-02 16:18         ` Jens Axboe
  2020-10-03  1:49         ` Thomas Gleixner
  1 sibling, 0 replies; 15+ messages in thread
From: Jens Axboe @ 2020-10-02 16:18 UTC (permalink / raw)
  To: Oleg Nesterov, Thomas Gleixner; +Cc: linux-kernel, io-uring, peterz

On 10/2/20 9:38 AM, Oleg Nesterov wrote:
> On 10/02, Thomas Gleixner wrote:
>>
>> I think it's fundamentally wrong that we have several places and several
>> flags which handle task_work_run() instead of having exactly one place
>> and one flag.
> 
> Damn yes, agreed.

As mentioned in the other reply, this is actually a nice step towards
NOT having that be the case. Right now we have TWA_RESUME, which uses
TIF_NOTIFY_RESUME. Once all archs support TIF_NOTIFY_SIGNAL, then we can
totally drop TWA_RESUME, and use TWA_SIGNAL as the default
for notify == true task_work users. And we can drop task_work noticing
and running in the signal handling as well, leaving us with only having
tracehook_notify_signal() running the task_work.

-- 
Jens Axboe



* Re: [PATCH 3/3] task_work: use TIF_TASKWORK if available
  2020-10-02 15:52       ` Jens Axboe
@ 2020-10-02 16:42         ` Jens Axboe
  2020-10-02 19:10         ` Thomas Gleixner
  1 sibling, 0 replies; 15+ messages in thread
From: Jens Axboe @ 2020-10-02 16:42 UTC (permalink / raw)
  To: Thomas Gleixner, Oleg Nesterov; +Cc: linux-kernel, io-uring, peterz

On 10/2/20 9:52 AM, Jens Axboe wrote:
> On 10/2/20 9:31 AM, Thomas Gleixner wrote:
>> On Fri, Oct 02 2020 at 17:14, Oleg Nesterov wrote:
>>> Heh. To be honest I don't really like 1-2 ;)
>>
>> I do not like any of this :)
>>
>>> So I think that if we are going to add TIF_TASKWORK we should generalize
>>> this logic and turn it into TIF_NOTIFY_SIGNAL. Similar to TIF_NOTIFY_RESUME
>>> but implies signal_pending().
>>>
>>> IOW, something like
>>>
>>> 	void set_notify_signal(task)
>>> 	{
>>> 		if (!test_and_set_tsk_thread_flag(task, TIF_NOTIFY_SIGNAL)) {
>>> 			if (!wake_up_state(task, TASK_INTERRUPTIBLE))
>>> 				kick_process(task);
>>> 		}
>>> 	}
>>>
>>> 	// called by exit_to_user_mode_loop() if ti_work & _TIF_NOTIFY_SIGNAL
>>> 	void tracehook_notify_signal(regs)
>>> 	{
>>> 		clear_thread_flag(TIF_NOTIFY_SIGNAL);
>>> 		smp_mb__after_atomic();
>>> 		if (unlikely(current->task_works))
>>> 			task_work_run();
>>> 	}
>>>
>>> This way task_work_run() doesn't need to clear TIF_NOTIFY_SIGNAL and it can
>>> have more users.
>>
>> I think it's fundamentally wrong that we have several places and several
>> flags which handle task_work_run() instead of having exactly one place
>> and one flag.
> 
> I don't disagree with that. I know it's not happening in this series, but
> if we go the TIF_NOTIFY_SIGNAL route and get all archs supporting that,
> then we can kill the signal and notify resume part of running task_work.
> And that leaves us with exactly one place that runs it.
> 
> So we can potentially improve the current situation in that regard.

I re-spun (and re-tested) the series, now based on TIF_NOTIFY_SIGNAL
instead. I won't be sending this one out before we've discussed it
some more, but wanted to let you know what it currently looks like:

https://git.kernel.dk/cgit/linux-block/log/?h=tif-task_work

-- 
Jens Axboe


* Re: [PATCH 3/3] task_work: use TIF_TASKWORK if available
  2020-10-02 15:52       ` Jens Axboe
  2020-10-02 16:42         ` Jens Axboe
@ 2020-10-02 19:10         ` Thomas Gleixner
  2020-10-02 20:14           ` Jens Axboe
  1 sibling, 1 reply; 15+ messages in thread
From: Thomas Gleixner @ 2020-10-02 19:10 UTC (permalink / raw)
  To: Jens Axboe, Oleg Nesterov; +Cc: linux-kernel, io-uring, peterz

On Fri, Oct 02 2020 at 09:52, Jens Axboe wrote:
> On 10/2/20 9:31 AM, Thomas Gleixner wrote:
>>> This way task_work_run() doesn't need to clear TIF_NOTIFY_SIGNAL and it can
>>> have more users.
>> 
>> I think it's fundamentally wrong that we have several places and several
>> flags which handle task_work_run() instead of having exactly one place
>> and one flag.
>
> I don't disagree with that. I know it's not happening in this series, but
> if we go the TIF_NOTIFY_SIGNAL route and get all archs supporting that,
> then we can kill the signal and notify resume part of running task_work.
> And that leaves us with exactly one place that runs it.
>
> So we can potentially improve the current situation in that regard.

I'll think about it over the weekend.

* Re: [PATCH 3/3] task_work: use TIF_TASKWORK if available
  2020-10-02 19:10         ` Thomas Gleixner
@ 2020-10-02 20:14           ` Jens Axboe
  0 siblings, 0 replies; 15+ messages in thread
From: Jens Axboe @ 2020-10-02 20:14 UTC (permalink / raw)
  To: Thomas Gleixner, Oleg Nesterov; +Cc: linux-kernel, io-uring, peterz

On 10/2/20 1:10 PM, Thomas Gleixner wrote:
> On Fri, Oct 02 2020 at 09:52, Jens Axboe wrote:
>> On 10/2/20 9:31 AM, Thomas Gleixner wrote:
>>>> This way task_work_run() doesn't need to clear TIF_NOTIFY_SIGNAL and it can
>>>> have more users.
>>>
>>> I think it's fundamentally wrong that we have several places and several
>>> flags which handle task_work_run() instead of having exactly one place
>>> and one flag.
>>
>> I don't disagree with that. I know it's not happening in this series, but
>> if we go the TIF_NOTIFY_SIGNAL route and get all archs supporting that,
>> then we can kill the signal and notify resume part of running task_work.
>> And that leaves us with exactly one place that runs it.
>>
>> So we can potentially improve the current situation in that regard.
> 
> I'll think about it over the weekend.

Thanks, I appreciate it!

Just to drive the point home, we'd end up with something like the below,
which also enables me to remove a nasty sighand->lock deadlock
workaround in io_uring.

Not in this patch, but the io_uring cqring_wait() call can also be
removed. Outside of the core calling it in tracehook_notify_signal(),
the only remaining callers are then the cases where kthreads are used
with task_work.
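
For readers who want the shape of the simplified add path without
reading the whole diff, here is a rough userspace sketch, not the
kernel implementation. struct callback_head_model, struct task_model
and task_work_add_model() are made-up names, and a C11
compare-exchange stands in for cmpxchg(). It keeps the two properties
the patch relies on: a lock-free push that fails with -ESRCH once the
list has been marked as exited, and a plain bool deciding whether the
notify flag gets set.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; not the kernel's types. */
struct callback_head_model {
	struct callback_head_model *next;
	void (*func)(struct callback_head_model *);
};

struct task_model {
	_Atomic(struct callback_head_model *) task_works; /* LIFO list */
	atomic_bool notify_signal;	/* models TIF_NOTIFY_SIGNAL */
};

/* Sentinel marking a task that no longer accepts work. */
static struct callback_head_model work_exited;

/*
 * Push @work onto @task's list without taking a lock. Returns 0 on
 * success or -ESRCH if the task has already been marked as exited.
 * If @notify is true, the notify flag is set after the push.
 */
static int task_work_add_model(struct task_model *task,
			       struct callback_head_model *work, bool notify)
{
	struct callback_head_model *head;

	assert(task != NULL && work != NULL);

	head = atomic_load(&task->task_works);
	do {
		if (head == &work_exited)
			return -ESRCH;
		work->next = head;
		/* On failure, head is reloaded with the current list head. */
	} while (!atomic_compare_exchange_weak(&task->task_works, &head, work));

	if (notify)
		atomic_store(&task->notify_signal, true);

	return 0;
}
```

The real set_notify_signal() would also wake or kick the target task;
the model only records the flag, so it says nothing about wakeup
ordering.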


 fs/io_uring.c                  | 41 ++++++++++++----------------------
 include/linux/sched/jobctl.h   |  4 +---
 include/linux/task_work.h      |  4 +---
 include/linux/tracehook.h      |  9 --------
 kernel/signal.c                | 22 ------------------
 kernel/task_work.c             | 40 +++------------------------------
 kernel/time/posix-cpu-timers.c |  2 +-
 7 files changed, 20 insertions(+), 102 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 2a67552a9c2f..3a5f4a7bd369 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1597,12 +1597,12 @@ static void __io_free_req(struct io_kiocb *req)
 		int ret;
 
 		init_task_work(&req->task_work, io_req_task_file_table_put);
-		ret = task_work_add(req->task, &req->task_work, TWA_RESUME);
+		ret = task_work_add(req->task, &req->task_work, true);
 		if (unlikely(ret)) {
 			struct task_struct *tsk;
 
 			tsk = io_wq_get_task(req->ctx->io_wq);
-			task_work_add(tsk, &req->task_work, 0);
+			task_work_add(tsk, &req->task_work, false);
 		}
 	}
 }
@@ -1746,25 +1746,21 @@ static struct io_kiocb *io_req_find_next(struct io_kiocb *req)
 	return __io_req_find_next(req);
 }
 
-static int io_req_task_work_add(struct io_kiocb *req, struct callback_head *cb,
-				bool twa_signal_ok)
+static int io_req_task_work_add(struct io_kiocb *req, struct callback_head *cb)
 {
 	struct task_struct *tsk = req->task;
 	struct io_ring_ctx *ctx = req->ctx;
-	int ret, notify;
+	bool notify = false;
+	int ret;
 
 	if (tsk->flags & PF_EXITING)
 		return -ESRCH;
 
 	/*
-	 * SQPOLL kernel thread doesn't need notification, just a wakeup. For
-	 * all other cases, use TWA_SIGNAL unconditionally to ensure we're
-	 * processing task_work. There's no reliable way to tell if TWA_RESUME
-	 * will do the job.
+	 * SQPOLL kernel thread doesn't need notification, just a wakeup.
 	 */
-	notify = 0;
-	if (!(ctx->flags & IORING_SETUP_SQPOLL) && twa_signal_ok)
-		notify = TWA_SIGNAL;
+	if (!(ctx->flags & IORING_SETUP_SQPOLL))
+		notify = true;
 
 	ret = task_work_add(tsk, cb, notify);
 	if (!ret)
@@ -1825,13 +1821,13 @@ static void io_req_task_queue(struct io_kiocb *req)
 	init_task_work(&req->task_work, io_req_task_submit);
 	percpu_ref_get(&req->ctx->refs);
 
-	ret = io_req_task_work_add(req, &req->task_work, true);
+	ret = io_req_task_work_add(req, &req->task_work);
 	if (unlikely(ret)) {
 		struct task_struct *tsk;
 
 		init_task_work(&req->task_work, io_req_task_cancel);
 		tsk = io_wq_get_task(req->ctx->io_wq);
-		task_work_add(tsk, &req->task_work, 0);
+		task_work_add(tsk, &req->task_work, false);
 		wake_up_process(tsk);
 	}
 }
@@ -3056,14 +3052,14 @@ static int io_async_buf_func(struct wait_queue_entry *wait, unsigned mode,
 
 	/* submit ref gets dropped, acquire a new one */
 	refcount_inc(&req->refs);
-	ret = io_req_task_work_add(req, &req->task_work, true);
+	ret = io_req_task_work_add(req, &req->task_work);
 	if (unlikely(ret)) {
 		struct task_struct *tsk;
 
 		/* queue just for cancelation */
 		init_task_work(&req->task_work, io_req_task_cancel);
 		tsk = io_wq_get_task(req->ctx->io_wq);
-		task_work_add(tsk, &req->task_work, 0);
+		task_work_add(tsk, &req->task_work, false);
 		wake_up_process(tsk);
 	}
 	return 1;
@@ -4598,7 +4594,6 @@ struct io_poll_table {
 static int __io_async_wake(struct io_kiocb *req, struct io_poll_iocb *poll,
 			   __poll_t mask, task_work_func_t func)
 {
-	bool twa_signal_ok;
 	int ret;
 
 	/* for instances that support it check for an event match first: */
@@ -4613,27 +4608,19 @@ static int __io_async_wake(struct io_kiocb *req, struct io_poll_iocb *poll,
 	init_task_work(&req->task_work, func);
 	percpu_ref_get(&req->ctx->refs);
 
-	/*
-	 * If we using the signalfd wait_queue_head for this wakeup, then
-	 * it's not safe to use TWA_SIGNAL as we could be recursing on the
-	 * tsk->sighand->siglock on doing the wakeup. Should not be needed
-	 * either, as the normal wakeup will suffice.
-	 */
-	twa_signal_ok = (poll->head != &req->task->sighand->signalfd_wqh);
-
 	/*
 	 * If this fails, then the task is exiting. When a task exits, the
 	 * work gets canceled, so just cancel this request as well instead
 	 * of executing it. We can't safely execute it anyway, as we may not
 	 * have the needed state needed for it anyway.
 	 */
-	ret = io_req_task_work_add(req, &req->task_work, twa_signal_ok);
+	ret = io_req_task_work_add(req, &req->task_work);
 	if (unlikely(ret)) {
 		struct task_struct *tsk;
 
 		WRITE_ONCE(poll->canceled, true);
 		tsk = io_wq_get_task(req->ctx->io_wq);
-		task_work_add(tsk, &req->task_work, 0);
+		task_work_add(tsk, &req->task_work, false);
 		wake_up_process(tsk);
 	}
 	return 1;
diff --git a/include/linux/sched/jobctl.h b/include/linux/sched/jobctl.h
index d2b4204ba4d3..fa067de9f1a9 100644
--- a/include/linux/sched/jobctl.h
+++ b/include/linux/sched/jobctl.h
@@ -19,7 +19,6 @@ struct task_struct;
 #define JOBCTL_TRAPPING_BIT	21	/* switching to TRACED */
 #define JOBCTL_LISTENING_BIT	22	/* ptracer is listening for events */
 #define JOBCTL_TRAP_FREEZE_BIT	23	/* trap for cgroup freezer */
-#define JOBCTL_TASK_WORK_BIT	24	/* set by TWA_SIGNAL */
 
 #define JOBCTL_STOP_DEQUEUED	(1UL << JOBCTL_STOP_DEQUEUED_BIT)
 #define JOBCTL_STOP_PENDING	(1UL << JOBCTL_STOP_PENDING_BIT)
@@ -29,10 +28,9 @@ struct task_struct;
 #define JOBCTL_TRAPPING		(1UL << JOBCTL_TRAPPING_BIT)
 #define JOBCTL_LISTENING	(1UL << JOBCTL_LISTENING_BIT)
 #define JOBCTL_TRAP_FREEZE	(1UL << JOBCTL_TRAP_FREEZE_BIT)
-#define JOBCTL_TASK_WORK	(1UL << JOBCTL_TASK_WORK_BIT)
 
 #define JOBCTL_TRAP_MASK	(JOBCTL_TRAP_STOP | JOBCTL_TRAP_NOTIFY)
-#define JOBCTL_PENDING_MASK	(JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK | JOBCTL_TASK_WORK)
+#define JOBCTL_PENDING_MASK	(JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK)
 
 extern bool task_set_jobctl_pending(struct task_struct *task, unsigned long mask);
 extern void task_clear_jobctl_trapping(struct task_struct *task);
diff --git a/include/linux/task_work.h b/include/linux/task_work.h
index 0fb93aafa478..a221bd5f746c 100644
--- a/include/linux/task_work.h
+++ b/include/linux/task_work.h
@@ -13,9 +13,7 @@ init_task_work(struct callback_head *twork, task_work_func_t func)
 	twork->func = func;
 }
 
-#define TWA_RESUME	1
-#define TWA_SIGNAL	2
-int task_work_add(struct task_struct *task, struct callback_head *twork, int);
+int task_work_add(struct task_struct *task, struct callback_head *twork, bool);
 
 struct callback_head *task_work_cancel(struct task_struct *, task_work_func_t);
 void task_work_run(void);
diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index 7ec0e94c5250..3a4a35ae87d1 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -178,15 +178,6 @@ static inline void set_notify_resume(struct task_struct *task)
  */
 static inline void tracehook_notify_resume(struct pt_regs *regs)
 {
-	/*
-	 * The caller just cleared TIF_NOTIFY_RESUME. This barrier
-	 * pairs with task_work_add()->set_notify_resume() after
-	 * hlist_add_head(task->task_works);
-	 */
-	smp_mb__after_atomic();
-	if (unlikely(current->task_works))
-		task_work_run();
-
 #ifdef CONFIG_KEYS_REQUEST_CACHE
 	if (unlikely(current->cached_requested_key)) {
 		key_put(current->cached_requested_key);
diff --git a/kernel/signal.c b/kernel/signal.c
index ad52141ab0d2..d44fa9141cef 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2271,8 +2271,6 @@ static void ptrace_do_notify(int signr, int exit_code, int why)
 void ptrace_notify(int exit_code)
 {
 	BUG_ON((exit_code & (0x7f | ~0xffff)) != SIGTRAP);
-	if (unlikely(current->task_works))
-		task_work_run();
 
 	spin_lock_irq(&current->sighand->siglock);
 	ptrace_do_notify(SIGTRAP, exit_code, CLD_TRAPPED);
@@ -2541,26 +2539,6 @@ bool get_signal(struct ksignal *ksig)
 
 relock:
 	spin_lock_irq(&sighand->siglock);
-	/*
-	 * Make sure we can safely read ->jobctl() in task_work add. As Oleg
-	 * states:
-	 *
-	 * It pairs with mb (implied by cmpxchg) before READ_ONCE. So we
-	 * roughly have
-	 *
-	 *	task_work_add:				get_signal:
-	 *	STORE(task->task_works, new_work);	STORE(task->jobctl);
-	 *	mb();					mb();
-	 *	LOAD(task->jobctl);			LOAD(task->task_works);
-	 *
-	 * and we can rely on STORE-MB-LOAD [ in task_work_add].
-	 */
-	smp_store_mb(current->jobctl, current->jobctl & ~JOBCTL_TASK_WORK);
-	if (unlikely(current->task_works)) {
-		spin_unlock_irq(&sighand->siglock);
-		task_work_run();
-		goto relock;
-	}
 
 	/*
 	 * Every stopped thread goes here after wakeup. Check to see if
diff --git a/kernel/task_work.c b/kernel/task_work.c
index 95604e57af46..e68f5831a078 100644
--- a/kernel/task_work.c
+++ b/kernel/task_work.c
@@ -5,34 +5,6 @@
 
 static struct callback_head work_exited; /* all we need is ->next == NULL */
 
-/*
- * TWA_SIGNAL signaling - use TIF_NOTIFY_SIGNAL, if available, as it's faster
- * than TIF_SIGPENDING as there's no dependency on ->sighand. The latter is
- * shared for threads, and can cause contention on sighand->lock. Even for
- * the non-threaded case TIF_NOTIFY_SIGNAL is more efficient, as no locking
- * or IRQ disabling is involved for notification (or running) purposes.
- */
-static void task_work_notify_signal(struct task_struct *task)
-{
-#ifdef TIF_NOTIFY_SIGNAL
-	set_notify_signal(task);
-#else
-	unsigned long flags;
-
-	/*
-	 * Only grab the sighand lock if we don't already have some
-	 * task_work pending. This pairs with the smp_store_mb()
-	 * in get_signal(), see comment there.
-	 */
-	if (!(READ_ONCE(task->jobctl) & JOBCTL_TASK_WORK) &&
-	    lock_task_sighand(task, &flags)) {
-		task->jobctl |= JOBCTL_TASK_WORK;
-		signal_wake_up(task, 0);
-		unlock_task_sighand(task, &flags);
-	}
-#endif
-}
-
 /**
  * task_work_add - ask the @task to execute @work->func()
  * @task: the task which should run the callback
@@ -53,7 +25,7 @@ static void task_work_notify_signal(struct task_struct *task)
  * 0 if succeeds or -ESRCH.
  */
 int
-task_work_add(struct task_struct *task, struct callback_head *work, int notify)
+task_work_add(struct task_struct *task, struct callback_head *work, bool notify)
 {
 	struct callback_head *head;
 
@@ -64,14 +36,8 @@ task_work_add(struct task_struct *task, struct callback_head *work, int notify)
 		work->next = head;
 	} while (cmpxchg(&task->task_works, head, work) != head);
 
-	switch (notify) {
-	case TWA_RESUME:
-		set_notify_resume(task);
-		break;
-	case TWA_SIGNAL:
-		task_work_notify_signal(task);
-		break;
-	}
+	if (notify)
+		set_notify_signal(task);
 
 	return 0;
 }
diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index a71758e34e45..51080a1ed11f 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -1128,7 +1128,7 @@ static inline void __run_posix_cpu_timers(struct task_struct *tsk)
 
 	/* Schedule task work to actually expire the timers */
 	tsk->posix_cputimers_work.scheduled = true;
-	task_work_add(tsk, &tsk->posix_cputimers_work.work, TWA_RESUME);
+	task_work_add(tsk, &tsk->posix_cputimers_work.work, true);
 }
 
 static inline bool posix_cpu_timers_enable_work(struct task_struct *tsk,

-- 
Jens Axboe


* Re: [PATCH 3/3] task_work: use TIF_TASKWORK if available
  2020-10-02 15:38       ` Oleg Nesterov
  2020-10-02 16:18         ` Jens Axboe
@ 2020-10-03  1:49         ` Thomas Gleixner
  2020-10-03 15:35           ` Jens Axboe
  1 sibling, 1 reply; 15+ messages in thread
From: Thomas Gleixner @ 2020-10-03  1:49 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Jens Axboe, linux-kernel, io-uring, peterz

On Fri, Oct 02 2020 at 17:38, Oleg Nesterov wrote:
> On 10/02, Thomas Gleixner wrote:
>>
>> I think it's fundamentally wrong that we have several places and several
>> flags which handle task_work_run() instead of having exactly one place
>> and one flag.
>
> Damn yes, agreed.

Actually there are TWO places, but they don't interfere:

   1) exit to user

   2) enter guest

From the kernel POV they are pretty much the same as both are leaving
the kernel domain. But they have a few subtly different requirements
as to what has to be done or not.

So any change to that logic needs to fix up both places.

Thanks,

        tglx

* Re: [PATCH 3/3] task_work: use TIF_TASKWORK if available
  2020-10-03  1:49         ` Thomas Gleixner
@ 2020-10-03 15:35           ` Jens Axboe
  0 siblings, 0 replies; 15+ messages in thread
From: Jens Axboe @ 2020-10-03 15:35 UTC (permalink / raw)
  To: Thomas Gleixner, Oleg Nesterov; +Cc: linux-kernel, io-uring, peterz

On 10/2/20 7:49 PM, Thomas Gleixner wrote:
> On Fri, Oct 02 2020 at 17:38, Oleg Nesterov wrote:
>> On 10/02, Thomas Gleixner wrote:
>>>
>>> I think it's fundamentally wrong that we have several places and several
>>> flags which handle task_work_run() instead of having exactly one place
>>> and one flag.
>>
>> Damn yes, agreed.
> 
> Actually there are TWO places, but they don't interfere:
> 
>    1) exit to user
> 
>    2) enter guest
> 
> From the kernel POV they are pretty much the same as both are leaving
> the kernel domain. But they have a few subtly different requirements
> as to what has to be done or not.
> 
> So any change to that logic needs to fix up both places.

Right, I actually did spot that, but didn't include it in the initial
series. I've split up the series a bit more, into functional bits.
That should make it easier to reason about and discuss:

https://git.kernel.dk/cgit/linux-block/log/?h=tif-task_work

-- 
Jens Axboe


end of thread, other threads:[~2020-10-03 15:35 UTC | newest]

Thread overview: 15+ messages
2020-10-01 19:42 [PATCHSET RFC 0/3] kernel: decouple TASK_WORK TWA_SIGNAL handling from signals Jens Axboe
2020-10-01 19:42 ` [PATCH 1/3] kernel: add task_sigpending() helper Jens Axboe
2020-10-01 19:42 ` [PATCH 2/3] kernel: decouple TASK_WORK TWA_SIGNAL handling from signals Jens Axboe
2020-10-01 19:42 ` [PATCH 3/3] task_work: use TIF_TASKWORK if available Jens Axboe
2020-10-02 15:14   ` Oleg Nesterov
2020-10-02 15:31     ` Thomas Gleixner
2020-10-02 15:38       ` Oleg Nesterov
2020-10-02 16:18         ` Jens Axboe
2020-10-03  1:49         ` Thomas Gleixner
2020-10-03 15:35           ` Jens Axboe
2020-10-02 15:52       ` Jens Axboe
2020-10-02 16:42         ` Jens Axboe
2020-10-02 19:10         ` Thomas Gleixner
2020-10-02 20:14           ` Jens Axboe
2020-10-02 15:53     ` Jens Axboe
