* [PATCHSET RFC 0/3] kernel: decouple TASK_WORK TWA_SIGNAL handling from signals

From: Jens Axboe @ 2020-10-01 19:42 UTC
To: linux-kernel, io-uring
Cc: peterz, oleg, tglx

Hi,

I split this up into 3 pieces instead of the messy single patch, which
I hope helps with review.

Patch 1 adds task_sigpending(), which tests TIF_SIGPENDING. Core use
cases that need to check for an actual signal pending are switched to
using task_sigpending() instead of signal_pending(). This should
address Oleg's concern about signal_pending() being true while no
signal is actually pending for delivery.

Patch 2 adds x86 and generic entry code support for TIF_TASKWORK.

Patch 3 makes task_work use TIF_TASKWORK, if the arch supports it.

There's no need for any io_uring specific changes, so I've dropped
those. If TIF_TASKWORK is used, then JOBCTL_TASK_WORK will never be
true and hence we won't enter that case. If TIF_TASKWORK isn't
available, then we still need that code.

I've run this through my usual liburing tests, and they pass. I also
ran it through all the ltp signal testing, and there are no changes
from mainline in terms of all tests passing.

 arch/x86/include/asm/thread_info.h |  2 ++
 arch/x86/kernel/signal.c           | 32 +++++++++++---------
 include/linux/entry-common.h       | 20 +++++++++++--
 include/linux/sched/signal.h       | 32 ++++++++++++++++----
 kernel/entry/common.c              | 14 +++++++--
 kernel/events/uprobes.c            |  2 +-
 kernel/ptrace.c                    |  2 +-
 kernel/signal.c                    | 12 ++++----
 kernel/task_work.c                 | 48 ++++++++++++++++++++++--------
 9 files changed, 118 insertions(+), 46 deletions(-)

Changes can also be viewed/pulled from this branch:

git://git.kernel.dk/linux-block tif-task_work

https://git.kernel.dk/cgit/linux-block/log/?h=tif-task_work

-- 
Jens Axboe
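The pattern the cover letter leans on is the standard interruptible
wait loop. A minimal sketch (the ordinary kernel idiom, not code from
this series; done_waiting() is a hypothetical condition) shows why
TWA_SIGNAL work has to make signal_pending() return true to get a
sleeping task moving again:

	int ret = 0;

	for (;;) {
		set_current_state(TASK_INTERRUPTIBLE);
		if (done_waiting())			/* hypothetical */
			break;
		if (signal_pending(current)) {
			/* true for real signals and, after this series,
			 * for queued TWA_SIGNAL task_work */
			ret = -ERESTARTSYS;	/* restart reruns the work */
			break;
		}
		schedule();
	}
	__set_current_state(TASK_RUNNING);

Patch 1 exists so that loops like this keep breaking out for task_work,
while the core signal code can ask the narrower "is a signal actually
queued?" question via task_sigpending().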
* [PATCH 1/3] kernel: add task_sigpending() helper

From: Jens Axboe @ 2020-10-01 19:42 UTC
To: linux-kernel, io-uring
Cc: peterz, oleg, tglx, Jens Axboe

This is in preparation for maintaining signal_pending() as the decider
of whether or not a schedule() loop should be broken, or continue
sleeping. This is different from the core signal use cases, where we
really want to know if an actual signal is pending or not.
task_sigpending() returns non-zero if TIF_SIGPENDING is set.

Only core kernel use cases should care about the distinction between
the two; make sure those use the task_sigpending() helper.

Signed-off-by: Jens Axboe <[email protected]>
---
 include/linux/sched/signal.h | 13 +++++++++----
 kernel/events/uprobes.c      |  2 +-
 kernel/ptrace.c              |  2 +-
 kernel/signal.c              | 12 ++++++------
 4 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 1bad18a1d8ba..e6f34d8fbf4d 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -353,11 +353,16 @@ static inline int restart_syscall(void)
 	return -ERESTARTNOINTR;
 }
 
-static inline int signal_pending(struct task_struct *p)
+static inline int task_sigpending(struct task_struct *p)
 {
 	return unlikely(test_tsk_thread_flag(p,TIF_SIGPENDING));
 }
 
+static inline int signal_pending(struct task_struct *p)
+{
+	return task_sigpending(p);
+}
+
 static inline int __fatal_signal_pending(struct task_struct *p)
 {
 	return unlikely(sigismember(&p->pending.signal, SIGKILL));
@@ -365,14 +370,14 @@ static inline int __fatal_signal_pending(struct task_struct *p)
 
 static inline int fatal_signal_pending(struct task_struct *p)
 {
-	return signal_pending(p) && __fatal_signal_pending(p);
+	return task_sigpending(p) && __fatal_signal_pending(p);
 }
 
 static inline int signal_pending_state(long state, struct task_struct *p)
 {
 	if (!(state & (TASK_INTERRUPTIBLE | TASK_WAKEKILL)))
 		return 0;
-	if (!signal_pending(p))
+	if (!task_sigpending(p))
 		return 0;
 
 	return (state & TASK_INTERRUPTIBLE) || __fatal_signal_pending(p);
@@ -389,7 +394,7 @@ static inline bool fault_signal_pending(vm_fault_t fault_flags,
 {
 	return unlikely((fault_flags & VM_FAULT_RETRY) &&
 			(fatal_signal_pending(current) ||
-			 (user_mode(regs) && signal_pending(current))));
+			 (user_mode(regs) && task_sigpending(current))));
 }
 
 /*
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 0e18aaf23a7b..8bb26a338e06 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1973,7 +1973,7 @@ bool uprobe_deny_signal(void)
 
 		WARN_ON_ONCE(utask->state != UTASK_SSTEP);
 
-		if (signal_pending(t)) {
+		if (task_sigpending(t)) {
 			spin_lock_irq(&t->sighand->siglock);
 			clear_tsk_thread_flag(t, TIF_SIGPENDING);
 			spin_unlock_irq(&t->sighand->siglock);
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 43d6179508d6..583b8da4c207 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -773,7 +773,7 @@ static int ptrace_peek_siginfo(struct task_struct *child,
 		data += sizeof(siginfo_t);
 		i++;
 
-		if (signal_pending(current))
+		if (task_sigpending(current))
 			break;
 
 		cond_resched();
diff --git a/kernel/signal.c b/kernel/signal.c
index a38b3edc6851..ad52141ab0d2 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -983,7 +983,7 @@ static inline bool wants_signal(int sig, struct task_struct *p)
 	if (task_is_stopped_or_traced(p))
 		return false;
 
-	return task_curr(p) || !signal_pending(p);
+	return task_curr(p) || !task_sigpending(p);
 }
 
 static void complete_signal(int sig, struct task_struct *p, enum pid_type type)
@@ -2822,7 +2822,7 @@ static void retarget_shared_pending(struct task_struct *tsk, sigset_t *which)
 		/* Remove the signals this thread can handle. */
 		sigandsets(&retarget, &retarget, &t->blocked);
 
-		if (!signal_pending(t))
+		if (!task_sigpending(t))
 			signal_wake_up(t, 0);
 
 		if (sigisemptyset(&retarget))
@@ -2856,7 +2856,7 @@ void exit_signals(struct task_struct *tsk)
 
 	cgroup_threadgroup_change_end(tsk);
 
-	if (!signal_pending(tsk))
+	if (!task_sigpending(tsk))
 		goto out;
 
 	unblocked = tsk->blocked;
@@ -2900,7 +2900,7 @@ long do_no_restart_syscall(struct restart_block *param)
 
 static void __set_task_blocked(struct task_struct *tsk, const sigset_t *newset)
 {
-	if (signal_pending(tsk) && !thread_group_empty(tsk)) {
+	if (task_sigpending(tsk) && !thread_group_empty(tsk)) {
 		sigset_t newblocked;
 		/* A set of now blocked but previously unblocked signals. */
 		sigandnsets(&newblocked, newset, &current->blocked);
@@ -4443,7 +4443,7 @@ SYSCALL_DEFINE2(signal, int, sig, __sighandler_t, handler)
 
 SYSCALL_DEFINE0(pause)
 {
-	while (!signal_pending(current)) {
+	while (!task_sigpending(current)) {
 		__set_current_state(TASK_INTERRUPTIBLE);
 		schedule();
 	}
@@ -4457,7 +4457,7 @@ static int sigsuspend(sigset_t *set)
 	current->saved_sigmask = current->blocked;
 	set_current_blocked(set);
 
-	while (!signal_pending(current)) {
+	while (!task_sigpending(current)) {
 		__set_current_state(TASK_INTERRUPTIBLE);
 		schedule();
 	}
-- 
2.28.0
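To make the intended division of labor concrete, here is a sketch
(illustrative only; example_wait() is hypothetical) of which helper a
caller reaches for after this patch:

	/*
	 * signal_pending(p):  should p stop sleeping / re-enter the
	 *                     kernel? (later also true for pending
	 *                     TWA_SIGNAL task_work)
	 * task_sigpending(p): is an actual signal queued for delivery?
	 */
	static int example_wait(struct completion *c)
	{
		while (!try_wait_for_completion(c)) {
			if (signal_pending(current))	/* any wake reason */
				return -ERESTARTSYS;
			schedule_timeout_interruptible(HZ);
		}
		return 0;
	}

Core signal paths such as pause(), sigsuspend() and
retarget_shared_pending() are the ones converted to task_sigpending()
above, since they genuinely care whether a signal is queued.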
* [PATCH 2/3] kernel: decouple TASK_WORK TWA_SIGNAL handling from signals

From: Jens Axboe @ 2020-10-01 19:42 UTC
To: linux-kernel, io-uring
Cc: peterz, oleg, tglx, Jens Axboe, Roman Gershman

Users of TWA_SIGNAL need to break out of kernel wait loops, or force
re-entry into the kernel, to ensure that the queued task_work is run.
TWA_SIGNAL currently works like signal delivery in that sense, and uses
the same delivery mechanism.

This works well from a functional standpoint, but it is very heavy
handed for a multithreaded application where sighand is shared between
all threads and the main process. Adding TWA_SIGNAL task_work on such
setups needs to grab the sighand->lock, which creates a hot spot for
otherwise unrelated task_work. This lock grabbing is necessary on both
the queue-work and run-work side of things, exacerbating the
problem/contention.

This adds TIF_TASKWORK for x86 which, if set, causes the check for
pending signals to return true. That in turn causes tasks to restart
the system call, which runs the added task_work. If TIF_TASKWORK is
available, we'll use that for notification when TWA_SIGNAL is
specified. If it isn't available, the existing TIF_SIGPENDING path is
used.

Once all archs have added support for TIF_TASKWORK, we can kill the old
code completely. That will also allow removal of JOBCTL_TASK_WORK and
related code.

On my test box, even just using 16 threads shows a nice improvement
running an io_uring based echo server.

stock kernel:

  0.01% <= 0.1 milliseconds
 95.86% <= 0.2 milliseconds
 98.27% <= 0.3 milliseconds
 99.71% <= 0.4 milliseconds
100.00% <= 0.5 milliseconds
100.00% <= 0.6 milliseconds
100.00% <= 0.7 milliseconds
100.00% <= 0.8 milliseconds
100.00% <= 0.9 milliseconds
100.00% <= 1.0 milliseconds
100.00% <= 1.1 milliseconds
100.00% <= 2 milliseconds
100.00% <= 3 milliseconds
100.00% <= 3 milliseconds
1378930.00 requests per second

~1600% CPU

1.38M requests/second, and all 16 CPUs are maxed out.

patched kernel:

  0.01% <= 0.1 milliseconds
 98.24% <= 0.2 milliseconds
 99.47% <= 0.3 milliseconds
 99.99% <= 0.4 milliseconds
100.00% <= 0.5 milliseconds
100.00% <= 0.6 milliseconds
100.00% <= 0.7 milliseconds
100.00% <= 0.8 milliseconds
100.00% <= 0.9 milliseconds
100.00% <= 1.2 milliseconds
1666111.38 requests per second

~1450% CPU

1.67M requests/second, and we're no longer just hammering on the
sighand lock.

The original reporter states:

"For 5.7.15 my benchmark achieves 1.6M qps and system cpu is at ~80%.
for 5.7.16 or later it achieves only 1M qps and the system cpu is at
~100%"

with the only difference there being that TWA_SIGNAL is used
unconditionally in 5.7.16. We need it there to solve the inability to
run task_work when the application is already waiting in the kernel on
an event whose completion depends on that task_work. Also see commit
0ba9c9edcd15.

Reported-by: Roman Gershman <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
---
 arch/x86/include/asm/thread_info.h |  2 ++
 arch/x86/kernel/signal.c           | 32 +++++++++++++++++-------------
 include/linux/entry-common.h       | 20 ++++++++++++++++---
 include/linux/sched/signal.h       | 19 ++++++++++++++++--
 kernel/entry/common.c              | 14 ++++++++++---
 5 files changed, 65 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index 267701ae3d86..79fe7db3208c 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -93,6 +93,7 @@ struct thread_info {
 #define TIF_NOTSC		16	/* TSC is not accessible in userland */
 #define TIF_IA32		17	/* IA32 compatibility process */
 #define TIF_SLD			18	/* Restore split lock detection on context switch */
+#define TIF_TASKWORK		19	/* task_work pending */
 #define TIF_MEMDIE		20	/* is terminating due to OOM killer */
 #define TIF_POLLING_NRFLAG	21	/* idle is polling for TIF_NEED_RESCHED */
 #define TIF_IO_BITMAP		22	/* uses I/O bitmap */
@@ -123,6 +124,7 @@ struct thread_info {
 #define _TIF_NOTSC		(1 << TIF_NOTSC)
 #define _TIF_IA32		(1 << TIF_IA32)
 #define _TIF_SLD		(1 << TIF_SLD)
+#define _TIF_TASKWORK		(1 << TIF_TASKWORK)
 #define _TIF_POLLING_NRFLAG	(1 << TIF_POLLING_NRFLAG)
 #define _TIF_IO_BITMAP		(1 << TIF_IO_BITMAP)
 #define _TIF_FORCED_TF		(1 << TIF_FORCED_TF)
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index be0d7d4152ec..5dc1eeaf0866 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -799,21 +799,8 @@ static inline unsigned long get_nr_restart_syscall(const struct pt_regs *regs)
 #endif
 }
 
-/*
- * Note that 'init' is a special process: it doesn't get signals it doesn't
- * want to handle. Thus you cannot kill init even with a SIGKILL even by
- * mistake.
- */
-void arch_do_signal(struct pt_regs *regs)
+void arch_restart_syscall(struct pt_regs *regs)
 {
-	struct ksignal ksig;
-
-	if (get_signal(&ksig)) {
-		/* Whee! Actually deliver the signal. */
-		handle_signal(&ksig, regs);
-		return;
-	}
-
 	/* Did we come from a system call? */
 	if (syscall_get_nr(current, regs) >= 0) {
 		/* Restart the system call - no handlers present */
@@ -831,12 +818,29 @@ void arch_do_signal(struct pt_regs *regs)
 			break;
 		}
 	}
+}
+
+/*
+ * Note that 'init' is a special process: it doesn't get signals it doesn't
+ * want to handle. Thus you cannot kill init even with a SIGKILL even by
+ * mistake.
+ */
+bool arch_do_signal(struct pt_regs *regs)
+{
+	struct ksignal ksig;
+
+	if (get_signal(&ksig)) {
+		/* Whee! Actually deliver the signal. */
+		handle_signal(&ksig, regs);
+		return true;
+	}
 
 	/*
 	 * If there's no signal to deliver, we just put the saved sigmask
 	 * back.
 	 */
 	restore_saved_sigmask();
+	return false;
 }
 
 void signal_fault(struct pt_regs *regs, void __user *frame, char *where)
diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
index 159c7476b11b..03cab8b9ddab 100644
--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -37,6 +37,10 @@
 # define _TIF_UPROBE			(0)
 #endif
 
+#ifndef _TIF_TASKWORK
+# define _TIF_TASKWORK			(0)
+#endif
+
 /*
  * TIF flags handled in syscall_enter_from_usermode()
  */
@@ -69,7 +73,7 @@
 
 #define EXIT_TO_USER_MODE_WORK						\
 	(_TIF_SIGPENDING | _TIF_NOTIFY_RESUME | _TIF_UPROBE |		\
-	 _TIF_NEED_RESCHED | _TIF_PATCH_PENDING |			\
+	 _TIF_NEED_RESCHED | _TIF_PATCH_PENDING | _TIF_TASKWORK|	\
	 ARCH_EXIT_TO_USER_MODE_WORK)
 
 /**
@@ -262,9 +266,19 @@ static __always_inline void arch_exit_to_user_mode(void) { }
  * arch_do_signal - Architecture specific signal delivery function
  * @regs:	Pointer to currents pt_regs
  *
- * Invoked from exit_to_user_mode_loop().
+ * Invoked from exit_to_user_mode_loop(). Returns true if a signal was
+ * handled.
+ */
+bool arch_do_signal(struct pt_regs *regs);
+
+/**
+ * arch_restart_syscall - Architecture specific syscall restarting
+ * @regs:	Pointer to currents pt_regs
+ *
+ * Invoked from exit_to_user_mode_loop(), if we need to restart the current
+ * system call.
  */
-void arch_do_signal(struct pt_regs *regs);
+void arch_restart_syscall(struct pt_regs *regs);
 
 /**
  * arch_syscall_exit_tracehook - Wrapper around tracehook_report_syscall_exit()
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index e6f34d8fbf4d..3093a7d30a24 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -360,6 +360,15 @@ static inline int task_sigpending(struct task_struct *p)
 
 static inline int signal_pending(struct task_struct *p)
 {
+#ifdef TIF_TASKWORK
+	/*
+	 * TIF_TASKWORK isn't really a signal, but it requires the same
+	 * behavior in terms of ensuring that we break out of wait loops
+	 * so that task_work can be processed.
+	 */
+	if (unlikely(test_tsk_thread_flag(p, TIF_TASKWORK)))
+		return 1;
+#endif
 	return task_sigpending(p);
 }
 
@@ -506,10 +515,16 @@ extern int set_user_sigmask(const sigset_t __user *umask, size_t sigsetsize);
 
 static inline void restore_saved_sigmask_unless(bool interrupted)
 {
-	if (interrupted)
+	if (interrupted) {
+#ifdef TIF_TASKWORK
+		WARN_ON(!test_thread_flag(TIF_SIGPENDING) &&
+			!test_thread_flag(TIF_TASKWORK));
+#else
 		WARN_ON(!test_thread_flag(TIF_SIGPENDING));
-	else
+#endif
+	} else {
 		restore_saved_sigmask();
+	}
 }
 
 static inline sigset_t *sigmask_to_save(void)
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index 6fdb6105e6d6..d25ee8f7f071 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -135,11 +135,13 @@ static __always_inline void exit_to_user_mode(void)
 }
 
 /* Workaround to allow gradual conversion of architecture code */
-void __weak arch_do_signal(struct pt_regs *regs) { }
+bool __weak arch_do_signal(struct pt_regs *regs) { return true; }
 
 static unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
 					    unsigned long ti_work)
 {
+	bool restart_sys = (ti_work & (_TIF_SIGPENDING|_TIF_TASKWORK)) != 0;
+
 	/*
 	 * Before returning to user space ensure that all pending work
 	 * items have been completed.
@@ -157,8 +159,11 @@ static unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
 		if (ti_work & _TIF_PATCH_PENDING)
 			klp_update_patch_state(current);
 
-		if (ti_work & _TIF_SIGPENDING)
-			arch_do_signal(regs);
+		if (ti_work & _TIF_TASKWORK)
+			task_work_run();
+
+		if ((ti_work & _TIF_SIGPENDING) && arch_do_signal(regs))
+			restart_sys = false;
 
 		if (ti_work & _TIF_NOTIFY_RESUME) {
 			clear_thread_flag(TIF_NOTIFY_RESUME);
@@ -178,6 +183,9 @@ static unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
 		ti_work = READ_ONCE(current_thread_info()->flags);
 	}
 
+	if (restart_sys)
+		arch_restart_syscall(regs);
+
 	/* Return the latest work state for arch_exit_to_user_mode() */
 	return ti_work;
 }
-- 
2.28.0
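Condensed, the reworked exit path behaves as below. This is an
editorial sketch of the control flow the patch adds, with the unrelated
TIF bits elided:

	/* exit_to_user_mode_loop(), abridged */
	bool restart_sys = (ti_work & (_TIF_SIGPENDING|_TIF_TASKWORK)) != 0;

	while (ti_work & EXIT_TO_USER_MODE_WORK) {
		if (ti_work & _TIF_TASKWORK)
			task_work_run();	/* run queued work, no siglock */

		if ((ti_work & _TIF_SIGPENDING) && arch_do_signal(regs))
			restart_sys = false;	/* a real handler ran */

		ti_work = READ_ONCE(current_thread_info()->flags);
	}
	if (restart_sys)
		arch_restart_syscall(regs);	/* woken only for task_work */

The point of returning bool from arch_do_signal() is visible here: if
no signal handler actually ran, the interrupted system call is
restarted so the task re-enters the kernel and resumes what it was
doing, now with the task_work completed.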
* [PATCH 3/3] task_work: use TIF_TASKWORK if available

From: Jens Axboe @ 2020-10-01 19:42 UTC
To: linux-kernel, io-uring
Cc: peterz, oleg, tglx, Jens Axboe

If the arch supports TIF_TASKWORK, then use that for TWA_SIGNAL as it's
more efficient than using the signal delivery method. This is
especially true on threaded applications, where ->sighand is shared
across threads.

Signed-off-by: Jens Axboe <[email protected]>
---
 kernel/task_work.c | 48 ++++++++++++++++++++++++++++++++++------------
 1 file changed, 36 insertions(+), 12 deletions(-)

diff --git a/kernel/task_work.c b/kernel/task_work.c
index 613b2d634af8..ae317cfe86b8 100644
--- a/kernel/task_work.c
+++ b/kernel/task_work.c
@@ -5,6 +5,39 @@
 
 static struct callback_head work_exited; /* all we need is ->next == NULL */
 
+/*
+ * TWA_SIGNAL signaling - use TIF_TASKWORK, if available.
+ */
+static void task_work_signal(struct task_struct *task)
+{
+#ifndef TIF_TASKWORK
+	unsigned long flags;
+
+	/*
+	 * Only grab the sighand lock if we don't already have some
+	 * task_work pending. This pairs with the smp_store_mb()
+	 * in get_signal(), see comment there.
+	 */
+	if (!(READ_ONCE(task->jobctl) & JOBCTL_TASK_WORK) &&
+	    lock_task_sighand(task, &flags)) {
+		task->jobctl |= JOBCTL_TASK_WORK;
+		signal_wake_up(task, 0);
+		unlock_task_sighand(task, &flags);
+	}
+#else
+	set_tsk_thread_flag(task, TIF_TASKWORK);
+	set_notify_resume(task);
+#endif
+}
+
+static inline void clear_tsk_taskwork(struct task_struct *task)
+{
+#ifdef TIF_TASKWORK
+	if (test_tsk_thread_flag(task, TIF_TASKWORK))
+		clear_tsk_thread_flag(task, TIF_TASKWORK);
+#endif
+}
+
 /**
  * task_work_add - ask the @task to execute @work->func()
  * @task: the task which should run the callback
@@ -28,7 +61,6 @@ int
 task_work_add(struct task_struct *task, struct callback_head *work, int notify)
 {
 	struct callback_head *head;
-	unsigned long flags;
 
 	do {
 		head = READ_ONCE(task->task_works);
@@ -42,17 +74,7 @@ task_work_add(struct task_struct *task, struct callback_head *work, int notify)
 		set_notify_resume(task);
 		break;
 	case TWA_SIGNAL:
-		/*
-		 * Only grab the sighand lock if we don't already have some
-		 * task_work pending. This pairs with the smp_store_mb()
-		 * in get_signal(), see comment there.
-		 */
-		if (!(READ_ONCE(task->jobctl) & JOBCTL_TASK_WORK) &&
-		    lock_task_sighand(task, &flags)) {
-			task->jobctl |= JOBCTL_TASK_WORK;
-			signal_wake_up(task, 0);
-			unlock_task_sighand(task, &flags);
-		}
+		task_work_signal(task);
 		break;
 	}
 
@@ -110,6 +132,8 @@ void task_work_run(void)
 	struct task_struct *task = current;
 	struct callback_head *work, *head, *next;
 
+	clear_tsk_taskwork(task);
+
 	for (;;) {
 		/*
 		 * work->func() can do task_work_add(), do not set
-- 
2.28.0
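For reference, a minimal sketch of the queueing side this patch speeds
up. The callback and its caller are hypothetical, but the task_work API
calls are the real ones:

	#include <linux/task_work.h>

	static void my_done_cb(struct callback_head *cb)
	{
		/* runs in the target task's context, before it returns
		 * to user mode */
	}

	static int queue_done(struct task_struct *task,
			      struct callback_head *cb)
	{
		init_task_work(cb, my_done_cb);
		/*
		 * TWA_SIGNAL guarantees the task breaks out of
		 * interruptible sleeps; with TIF_TASKWORK this no
		 * longer takes sighand->siglock on queue or on run.
		 */
		return task_work_add(task, cb, TWA_SIGNAL);
	}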
* Re: [PATCH 3/3] task_work: use TIF_TASKWORK if available

From: Oleg Nesterov @ 2020-10-02 15:14 UTC
To: Jens Axboe
Cc: linux-kernel, io-uring, peterz, tglx

Heh. To be honest I don't really like 1-2 ;)

Unfortunately, I do not see a better approach right now. Let me think
until Monday, it is not that I think I will find a better solution, but
I'd like to try anyway.

Let me comment on 3/3 for now.

On 10/01, Jens Axboe wrote:
>
> +static void task_work_signal(struct task_struct *task)
> +{
> +#ifndef TIF_TASKWORK
> +	unsigned long flags;
> +
> +	/*
> +	 * Only grab the sighand lock if we don't already have some
> +	 * task_work pending. This pairs with the smp_store_mb()
> +	 * in get_signal(), see comment there.
> +	 */
> +	if (!(READ_ONCE(task->jobctl) & JOBCTL_TASK_WORK) &&
> +	    lock_task_sighand(task, &flags)) {
> +		task->jobctl |= JOBCTL_TASK_WORK;
> +		signal_wake_up(task, 0);
> +		unlock_task_sighand(task, &flags);
> +	}
> +#else
> +	set_tsk_thread_flag(task, TIF_TASKWORK);
> +	set_notify_resume(task);
> +#endif

Again, I can't understand. task_work_signal(task) should set
TIF_TASKWORK to make signal_pending() = T _and_ wake/kick the target
up, just like signal_wake_up() does. Why do we set TIF_NOTIFY_RESUME?

So I think that if we are going to add TIF_TASKWORK we should
generalize this logic and turn it into TIF_NOTIFY_SIGNAL. Similar to
TIF_NOTIFY_RESUME but implies signal_pending().

IOW, something like

	void set_notify_signal(task)
	{
		if (!test_and_set_tsk_thread_flag(task, TIF_NOTIFY_SIGNAL)) {
			if (!wake_up_state(task, TASK_INTERRUPTIBLE))
				kick_process(task);
		}
	}

	// called by exit_to_user_mode_loop() if ti_work & _TIF_NOTIFY_SIGNAL
	void tracehook_notify_signal(regs)
	{
		clear_thread_flag(TIF_NOTIFY_SIGNAL);
		smp_mb__after_atomic();
		if (unlikely(current->task_works))
			task_work_run();
	}

This way task_work_run() doesn't need to clear TIF_NOTIFY_SIGNAL and it
can have more users.

What do you think?

Oleg.
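On the consuming side, the proposal would presumably slot into the
generic exit loop roughly like this; a sketch extrapolated from the
comment in the code above and from the loop in patch 2, not posted
code:

	/* exit_to_user_mode_loop(), hypothetical branch */
	if (ti_work & _TIF_NOTIFY_SIGNAL)
		tracehook_notify_signal(regs);	/* clears flag, runs task_work */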
* Re: [PATCH 3/3] task_work: use TIF_TASKWORK if available

From: Thomas Gleixner @ 2020-10-02 15:31 UTC
To: Oleg Nesterov, Jens Axboe
Cc: linux-kernel, io-uring, peterz

On Fri, Oct 02 2020 at 17:14, Oleg Nesterov wrote:
> Heh. To be honest I don't really like 1-2 ;)

I do not like any of this :)

> So I think that if we are going to add TIF_TASKWORK we should generalize
> this logic and turn it into TIF_NOTIFY_SIGNAL. Similar to TIF_NOTIFY_RESUME
> but implies signal_pending().
>
> IOW, something like
>
> 	void set_notify_signal(task)
> 	{
> 		if (!test_and_set_tsk_thread_flag(task, TIF_NOTIFY_SIGNAL)) {
> 			if (!wake_up_state(task, TASK_INTERRUPTIBLE))
> 				kick_process(t);
> 		}
> 	}
>
> 	// called by exit_to_user_mode_loop() if ti_work & _TIF_NOTIFY_SIGNAL
> 	void tracehook_notify_signal(regs)
> 	{
> 		clear_thread_flag(TIF_NOTIFY_SIGNAL);
> 		smp_mb__after_atomic();
> 		if (unlikely(current->task_works))
> 			task_work_run();
> 	}
>
> This way task_work_run() doesn't need to clear TIF_NOTIFY_SIGNAL and it can
> have more users.

I think it's fundamentally wrong that we have several places and
several flags which handle task_work_run() instead of having exactly
one place and one flag.

Thanks,

        tglx
* Re: [PATCH 3/3] task_work: use TIF_TASKWORK if available

From: Oleg Nesterov @ 2020-10-02 15:38 UTC
To: Thomas Gleixner
Cc: Jens Axboe, linux-kernel, io-uring, peterz

On 10/02, Thomas Gleixner wrote:
>
> I think it's fundamentally wrong that we have several places and
> several flags which handle task_work_run() instead of having exactly
> one place and one flag.

Damn yes, agreed.

Oleg.
* Re: [PATCH 3/3] task_work: use TIF_TASKWORK if available

From: Jens Axboe @ 2020-10-02 16:18 UTC
To: Oleg Nesterov, Thomas Gleixner
Cc: linux-kernel, io-uring, peterz

On 10/2/20 9:38 AM, Oleg Nesterov wrote:
> On 10/02, Thomas Gleixner wrote:
>>
>> I think it's fundamentally wrong that we have several places and
>> several flags which handle task_work_run() instead of having exactly
>> one place and one flag.
>
> Damn yes, agreed.

As mentioned in the other reply, this is actually a nice step towards
NOT having that be the case. Right now we have TWA_RESUME, which uses
TIF_NOTIFY_RESUME. Once all archs support TIF_NOTIFY_SIGNAL, then we
can totally drop TWA_RESUME, and use TWA_SIGNAL as the default for
notify == true task_work users. And we can drop the task_work noticing
and running in signal handling as well, leaving
tracehook_notify_signal() as the only place that runs task_work.

-- 
Jens Axboe
* Re: [PATCH 3/3] task_work: use TIF_TASKWORK if available

From: Thomas Gleixner @ 2020-10-03  1:49 UTC
To: Oleg Nesterov
Cc: Jens Axboe, linux-kernel, io-uring, peterz

On Fri, Oct 02 2020 at 17:38, Oleg Nesterov wrote:
> On 10/02, Thomas Gleixner wrote:
>>
>> I think it's fundamentally wrong that we have several places and
>> several flags which handle task_work_run() instead of having exactly
>> one place and one flag.
>
> Damn yes, agreed.

Actually there are TWO places, but they don't interfere:

  1) exit to user

  2) enter guest

From the kernel POV they are pretty much the same as both are leaving
the kernel domain. But they have a few subtly different requirements
for what has to be done or not. So any change to that logic needs to
fix up both places.

Thanks,

        tglx
* Re: [PATCH 3/3] task_work: use TIF_TASKWORK if available

From: Jens Axboe @ 2020-10-03 15:35 UTC
To: Thomas Gleixner, Oleg Nesterov
Cc: linux-kernel, io-uring, peterz

On 10/2/20 7:49 PM, Thomas Gleixner wrote:
> Actually there are TWO places, but they don't interfere:
>
>   1) exit to user
>
>   2) enter guest
>
> From the kernel POV they are pretty much the same as both are leaving
> the kernel domain. But they have a few subtly different requirements
> for what has to be done or not. So any change to that logic needs to
> fix up both places.

Right, I actually did spot that, but didn't include it in the initial
series. I've split up the series a bit more, into functional bits.
Should be easier to reason about/discuss:

https://git.kernel.dk/cgit/linux-block/log/?h=tif-task_work

-- 
Jens Axboe
* Re: [PATCH 3/3] task_work: use TIF_TASKWORK if available

From: Jens Axboe @ 2020-10-02 15:52 UTC
To: Thomas Gleixner, Oleg Nesterov
Cc: linux-kernel, io-uring, peterz

On 10/2/20 9:31 AM, Thomas Gleixner wrote:
> On Fri, Oct 02 2020 at 17:14, Oleg Nesterov wrote:
>> Heh. To be honest I don't really like 1-2 ;)
>
> I do not like any of this :)
>
>> So I think that if we are going to add TIF_TASKWORK we should generalize
>> this logic and turn it into TIF_NOTIFY_SIGNAL. Similar to TIF_NOTIFY_RESUME
>> but implies signal_pending().
>>
>> IOW, something like
>>
>> 	void set_notify_signal(task)
>> 	{
>> 		if (!test_and_set_tsk_thread_flag(task, TIF_NOTIFY_SIGNAL)) {
>> 			if (!wake_up_state(task, TASK_INTERRUPTIBLE))
>> 				kick_process(t);
>> 		}
>> 	}
>>
>> 	// called by exit_to_user_mode_loop() if ti_work & _TIF_NOTIFY_SIGNAL
>> 	void tracehook_notify_signal(regs)
>> 	{
>> 		clear_thread_flag(TIF_NOTIFY_SIGNAL);
>> 		smp_mb__after_atomic();
>> 		if (unlikely(current->task_works))
>> 			task_work_run();
>> 	}
>>
>> This way task_work_run() doesn't need to clear TIF_NOTIFY_SIGNAL and it can
>> have more users.
>
> I think it's fundamentally wrong that we have several places and
> several flags which handle task_work_run() instead of having exactly
> one place and one flag.

I don't disagree with that. I know it's not happening in this series,
but if we go the TIF_NOTIFY_SIGNAL route and get all archs supporting
that, then we can kill the signal and notify resume part of running
task_work. And that leaves us with exactly one place that runs it.

So we can potentially improve the current situation in that regard.

-- 
Jens Axboe
* Re: [PATCH 3/3] task_work: use TIF_TASKWORK if available

From: Jens Axboe @ 2020-10-02 16:42 UTC
To: Thomas Gleixner, Oleg Nesterov
Cc: linux-kernel, io-uring, peterz

On 10/2/20 9:52 AM, Jens Axboe wrote:
> On 10/2/20 9:31 AM, Thomas Gleixner wrote:
>> On Fri, Oct 02 2020 at 17:14, Oleg Nesterov wrote:
>>> Heh. To be honest I don't really like 1-2 ;)
>>
>> I do not like any of this :)
>>
>>> So I think that if we are going to add TIF_TASKWORK we should generalize
>>> this logic and turn it into TIF_NOTIFY_SIGNAL. Similar to TIF_NOTIFY_RESUME
>>> but implies signal_pending().
>>>
>>> IOW, something like
>>>
>>> 	void set_notify_signal(task)
>>> 	{
>>> 		if (!test_and_set_tsk_thread_flag(task, TIF_NOTIFY_SIGNAL)) {
>>> 			if (!wake_up_state(task, TASK_INTERRUPTIBLE))
>>> 				kick_process(t);
>>> 		}
>>> 	}
>>>
>>> 	// called by exit_to_user_mode_loop() if ti_work & _TIF_NOTIFY_SIGNAL
>>> 	void tracehook_notify_signal(regs)
>>> 	{
>>> 		clear_thread_flag(TIF_NOTIFY_SIGNAL);
>>> 		smp_mb__after_atomic();
>>> 		if (unlikely(current->task_works))
>>> 			task_work_run();
>>> 	}
>>>
>>> This way task_work_run() doesn't need to clear TIF_NOTIFY_SIGNAL and it can
>>> have more users.
>>
>> I think it's fundamentally wrong that we have several places and
>> several flags which handle task_work_run() instead of having exactly
>> one place and one flag.
>
> I don't disagree with that. I know it's not happening in this series,
> but if we go the TIF_NOTIFY_SIGNAL route and get all archs supporting
> that, then we can kill the signal and notify resume part of running
> task_work. And that leaves us with exactly one place that runs it.
>
> So we can potentially improve the current situation in that regard.

I re-spun (and re-tested) the series, now based on TIF_NOTIFY_SIGNAL
instead. I won't be sending this one out before we've discussed it some
more, but wanted to let you know what it currently looks like:

https://git.kernel.dk/cgit/linux-block/log/?h=tif-task_work

-- 
Jens Axboe
* Re: [PATCH 3/3] task_work: use TIF_TASKWORK if available

From: Thomas Gleixner @ 2020-10-02 19:10 UTC
To: Jens Axboe, Oleg Nesterov
Cc: linux-kernel, io-uring, peterz

On Fri, Oct 02 2020 at 09:52, Jens Axboe wrote:
> On 10/2/20 9:31 AM, Thomas Gleixner wrote:
>>> This way task_work_run() doesn't need to clear TIF_NOTIFY_SIGNAL and it can
>>> have more users.
>>
>> I think it's fundamentally wrong that we have several places and
>> several flags which handle task_work_run() instead of having exactly
>> one place and one flag.
>
> I don't disagree with that. I know it's not happening in this series,
> but if we go the TIF_NOTIFY_SIGNAL route and get all archs supporting
> that, then we can kill the signal and notify resume part of running
> task_work. And that leaves us with exactly one place that runs it.
>
> So we can potentially improve the current situation in that regard.

I'll think about it over the weekend.
* Re: [PATCH 3/3] task_work: use TIF_TASKWORK if available

From: Jens Axboe @ 2020-10-02 20:14 UTC
To: Thomas Gleixner, Oleg Nesterov
Cc: linux-kernel, io-uring, peterz

On 10/2/20 1:10 PM, Thomas Gleixner wrote:
> On Fri, Oct 02 2020 at 09:52, Jens Axboe wrote:
>> On 10/2/20 9:31 AM, Thomas Gleixner wrote:
>>>> This way task_work_run() doesn't need to clear TIF_NOTIFY_SIGNAL and it can
>>>> have more users.
>>>
>>> I think it's fundamentally wrong that we have several places and
>>> several flags which handle task_work_run() instead of having exactly
>>> one place and one flag.
>>
>> I don't disagree with that. I know it's not happening in this series,
>> but if we go the TIF_NOTIFY_SIGNAL route and get all archs supporting
>> that, then we can kill the signal and notify resume part of running
>> task_work. And that leaves us with exactly one place that runs it.
>>
>> So we can potentially improve the current situation in that regard.
>
> I'll think about it over the weekend.

Thanks, I appreciate it! Just to drive the point home, we'd end up with
something like the below, which also enables me to remove a nasty
sighand->lock deadlock workaround in io_uring. Not in this patch, but
the io_uring cqring_wait() call can also be removed. Outside of the
core calling it in tracehook_notify_signal(), the only callers are then
the case where kthreads are used with task_work.

 fs/io_uring.c                  | 41 ++++++++++++----------------------
 include/linux/sched/jobctl.h   |  4 +---
 include/linux/task_work.h      |  4 +---
 include/linux/tracehook.h      |  9 --------
 kernel/signal.c                | 22 ------------------
 kernel/task_work.c             | 40 +++------------------------------
 kernel/time/posix-cpu-timers.c |  2 +-
 7 files changed, 20 insertions(+), 102 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 2a67552a9c2f..3a5f4a7bd369 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1597,12 +1597,12 @@ static void __io_free_req(struct io_kiocb *req)
 		int ret;
 
 		init_task_work(&req->task_work, io_req_task_file_table_put);
-		ret = task_work_add(req->task, &req->task_work, TWA_RESUME);
+		ret = task_work_add(req->task, &req->task_work, true);
 		if (unlikely(ret)) {
 			struct task_struct *tsk;
 
 			tsk = io_wq_get_task(req->ctx->io_wq);
-			task_work_add(tsk, &req->task_work, 0);
+			task_work_add(tsk, &req->task_work, false);
 		}
 	}
 }
@@ -1746,25 +1746,21 @@ static struct io_kiocb *io_req_find_next(struct io_kiocb *req)
 	return __io_req_find_next(req);
 }
 
-static int io_req_task_work_add(struct io_kiocb *req, struct callback_head *cb,
-				bool twa_signal_ok)
+static int io_req_task_work_add(struct io_kiocb *req, struct callback_head *cb)
 {
 	struct task_struct *tsk = req->task;
 	struct io_ring_ctx *ctx = req->ctx;
-	int ret, notify;
+	bool notify = false;
+	int ret;
 
 	if (tsk->flags & PF_EXITING)
 		return -ESRCH;
 
 	/*
-	 * SQPOLL kernel thread doesn't need notification, just a wakeup. For
-	 * all other cases, use TWA_SIGNAL unconditionally to ensure we're
-	 * processing task_work. There's no reliable way to tell if TWA_RESUME
-	 * will do the job.
+	 * SQPOLL kernel thread doesn't need notification, just a wakeup.
 	 */
-	notify = 0;
-	if (!(ctx->flags & IORING_SETUP_SQPOLL) && twa_signal_ok)
-		notify = TWA_SIGNAL;
+	if (!(ctx->flags & IORING_SETUP_SQPOLL))
+		notify = true;
 
 	ret = task_work_add(tsk, cb, notify);
 	if (!ret)
@@ -1825,13 +1821,13 @@ static void io_req_task_queue(struct io_kiocb *req)
 	init_task_work(&req->task_work, io_req_task_submit);
 	percpu_ref_get(&req->ctx->refs);
 
-	ret = io_req_task_work_add(req, &req->task_work, true);
+	ret = io_req_task_work_add(req, &req->task_work);
 	if (unlikely(ret)) {
 		struct task_struct *tsk;
 
 		init_task_work(&req->task_work, io_req_task_cancel);
 		tsk = io_wq_get_task(req->ctx->io_wq);
-		task_work_add(tsk, &req->task_work, 0);
+		task_work_add(tsk, &req->task_work, false);
 		wake_up_process(tsk);
 	}
 }
@@ -3056,14 +3052,14 @@ static int io_async_buf_func(struct wait_queue_entry *wait, unsigned mode,
 
 	/* submit ref gets dropped, acquire a new one */
 	refcount_inc(&req->refs);
-	ret = io_req_task_work_add(req, &req->task_work, true);
+	ret = io_req_task_work_add(req, &req->task_work);
 	if (unlikely(ret)) {
 		struct task_struct *tsk;
 
 		/* queue just for cancelation */
 		init_task_work(&req->task_work, io_req_task_cancel);
 		tsk = io_wq_get_task(req->ctx->io_wq);
-		task_work_add(tsk, &req->task_work, 0);
+		task_work_add(tsk, &req->task_work, false);
 		wake_up_process(tsk);
 	}
 	return 1;
@@ -4598,7 +4594,6 @@ struct io_poll_table {
 static int __io_async_wake(struct io_kiocb *req, struct io_poll_iocb *poll,
 			   __poll_t mask, task_work_func_t func)
 {
-	bool twa_signal_ok;
 	int ret;
 
 	/* for instances that support it check for an event match first: */
@@ -4613,27 +4608,19 @@ static int __io_async_wake(struct io_kiocb *req, struct io_poll_iocb *poll,
 	init_task_work(&req->task_work, func);
 	percpu_ref_get(&req->ctx->refs);
 
-	/*
-	 * If we using the signalfd wait_queue_head for this wakeup, then
-	 * it's not safe to use TWA_SIGNAL as we could be recursing on the
-	 * tsk->sighand->siglock on doing the wakeup. Should not be needed
-	 * either, as the normal wakeup will suffice.
-	 */
-	twa_signal_ok = (poll->head != &req->task->sighand->signalfd_wqh);
-
 	/*
 	 * If this fails, then the task is exiting. When a task exits, the
 	 * work gets canceled, so just cancel this request as well instead
 	 * of executing it. We can't safely execute it anyway, as we may not
 	 * have the needed state needed for it anyway.
 	 */
-	ret = io_req_task_work_add(req, &req->task_work, twa_signal_ok);
+	ret = io_req_task_work_add(req, &req->task_work);
 	if (unlikely(ret)) {
 		struct task_struct *tsk;
 
 		WRITE_ONCE(poll->canceled, true);
 		tsk = io_wq_get_task(req->ctx->io_wq);
-		task_work_add(tsk, &req->task_work, 0);
+		task_work_add(tsk, &req->task_work, false);
 		wake_up_process(tsk);
 	}
 	return 1;
diff --git a/include/linux/sched/jobctl.h b/include/linux/sched/jobctl.h
index d2b4204ba4d3..fa067de9f1a9 100644
--- a/include/linux/sched/jobctl.h
+++ b/include/linux/sched/jobctl.h
@@ -19,7 +19,6 @@ struct task_struct;
 #define JOBCTL_TRAPPING_BIT	21	/* switching to TRACED */
 #define JOBCTL_LISTENING_BIT	22	/* ptracer is listening for events */
 #define JOBCTL_TRAP_FREEZE_BIT	23	/* trap for cgroup freezer */
-#define JOBCTL_TASK_WORK_BIT	24	/* set by TWA_SIGNAL */
 
 #define JOBCTL_STOP_DEQUEUED	(1UL << JOBCTL_STOP_DEQUEUED_BIT)
 #define JOBCTL_STOP_PENDING	(1UL << JOBCTL_STOP_PENDING_BIT)
@@ -29,10 +28,9 @@ struct task_struct;
 #define JOBCTL_TRAPPING		(1UL << JOBCTL_TRAPPING_BIT)
 #define JOBCTL_LISTENING	(1UL << JOBCTL_LISTENING_BIT)
 #define JOBCTL_TRAP_FREEZE	(1UL << JOBCTL_TRAP_FREEZE_BIT)
-#define JOBCTL_TASK_WORK	(1UL << JOBCTL_TASK_WORK_BIT)
 
 #define JOBCTL_TRAP_MASK	(JOBCTL_TRAP_STOP | JOBCTL_TRAP_NOTIFY)
-#define JOBCTL_PENDING_MASK	(JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK | JOBCTL_TASK_WORK)
+#define JOBCTL_PENDING_MASK	(JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK)
 
 extern bool task_set_jobctl_pending(struct task_struct *task, unsigned long mask);
 extern void task_clear_jobctl_trapping(struct task_struct *task);
diff --git a/include/linux/task_work.h b/include/linux/task_work.h
index 0fb93aafa478..a221bd5f746c 100644
--- a/include/linux/task_work.h
+++ b/include/linux/task_work.h
@@ -13,9 +13,7 @@ init_task_work(struct callback_head *twork, task_work_func_t func)
 	twork->func = func;
 }
 
-#define TWA_RESUME	1
-#define TWA_SIGNAL	2
-int task_work_add(struct task_struct *task, struct callback_head *twork, int);
+int task_work_add(struct task_struct *task, struct callback_head *twork, bool);
 
 struct callback_head *task_work_cancel(struct task_struct *, task_work_func_t);
 void task_work_run(void);
diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index 7ec0e94c5250..3a4a35ae87d1 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -178,15 +178,6 @@ static inline void set_notify_resume(struct task_struct *task)
  */
 static inline void tracehook_notify_resume(struct pt_regs *regs)
 {
-	/*
-	 * The caller just cleared TIF_NOTIFY_RESUME. This barrier
-	 * pairs with task_work_add()->set_notify_resume() after
-	 * hlist_add_head(task->task_works);
-	 */
-	smp_mb__after_atomic();
-	if (unlikely(current->task_works))
-		task_work_run();
-
 #ifdef CONFIG_KEYS_REQUEST_CACHE
 	if (unlikely(current->cached_requested_key)) {
 		key_put(current->cached_requested_key);
diff --git a/kernel/signal.c b/kernel/signal.c
index ad52141ab0d2..d44fa9141cef 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2271,8 +2271,6 @@ static void ptrace_do_notify(int signr, int exit_code, int why)
 void ptrace_notify(int exit_code)
 {
 	BUG_ON((exit_code & (0x7f | ~0xffff)) != SIGTRAP);
-	if (unlikely(current->task_works))
-		task_work_run();
 
 	spin_lock_irq(&current->sighand->siglock);
 	ptrace_do_notify(SIGTRAP, exit_code, CLD_TRAPPED);
@@ -2541,26 +2539,6 @@ bool get_signal(struct ksignal *ksig)
 
 relock:
 	spin_lock_irq(&sighand->siglock);
-	/*
-	 * Make sure we can safely read ->jobctl() in task_work add. As Oleg
-	 * states:
-	 *
-	 * It pairs with mb (implied by cmpxchg) before READ_ONCE. So we
-	 * roughly have
-	 *
-	 *	task_work_add:				get_signal:
-	 *	STORE(task->task_works, new_work);	STORE(task->jobctl);
-	 *	mb();					mb();
-	 *	LOAD(task->jobctl);			LOAD(task->task_works);
-	 *
-	 * and we can rely on STORE-MB-LOAD [ in task_work_add].
-	 */
-	smp_store_mb(current->jobctl, current->jobctl & ~JOBCTL_TASK_WORK);
-	if (unlikely(current->task_works)) {
-		spin_unlock_irq(&sighand->siglock);
-		task_work_run();
-		goto relock;
-	}
 
 	/*
 	 * Every stopped thread goes here after wakeup. Check to see if
diff --git a/kernel/task_work.c b/kernel/task_work.c
index 95604e57af46..e68f5831a078 100644
--- a/kernel/task_work.c
+++ b/kernel/task_work.c
@@ -5,34 +5,6 @@
 
 static struct callback_head work_exited; /* all we need is ->next == NULL */
 
-/*
- * TWA_SIGNAL signaling - use TIF_NOTIFY_SIGNAL, if available, as it's faster
- * than TIF_SIGPENDING as there's no dependency on ->sighand. The latter is
- * shared for threads, and can cause contention on sighand->lock. Even for
- * the non-threaded case TIF_NOTIFY_SIGNAL is more efficient, as no locking
- * or IRQ disabling is involved for notification (or running) purposes.
- */
-static void task_work_notify_signal(struct task_struct *task)
-{
-#ifdef TIF_NOTIFY_SIGNAL
-	set_notify_signal(task);
-#else
-	unsigned long flags;
-
-	/*
-	 * Only grab the sighand lock if we don't already have some
-	 * task_work pending. This pairs with the smp_store_mb()
-	 * in get_signal(), see comment there.
-	 */
-	if (!(READ_ONCE(task->jobctl) & JOBCTL_TASK_WORK) &&
-	    lock_task_sighand(task, &flags)) {
-		task->jobctl |= JOBCTL_TASK_WORK;
-		signal_wake_up(task, 0);
-		unlock_task_sighand(task, &flags);
-	}
-#endif
-}
-
 /**
  * task_work_add - ask the @task to execute @work->func()
  * @task: the task which should run the callback
@@ -53,7 +25,7 @@ static void task_work_notify_signal(struct task_struct *task)
  * 0 if succeeds or -ESRCH.
  */
 int
-task_work_add(struct task_struct *task, struct callback_head *work, int notify)
+task_work_add(struct task_struct *task, struct callback_head *work, bool notify)
 {
 	struct callback_head *head;
 
@@ -64,14 +36,8 @@ task_work_add(struct task_struct *task, struct callback_head *work, int notify)
 		work->next = head;
 	} while (cmpxchg(&task->task_works, head, work) != head);
 
-	switch (notify) {
-	case TWA_RESUME:
-		set_notify_resume(task);
-		break;
-	case TWA_SIGNAL:
-		task_work_notify_signal(task);
-		break;
-	}
+	if (notify)
+		set_notify_signal(task);
 
 	return 0;
 }
diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index a71758e34e45..51080a1ed11f 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -1128,7 +1128,7 @@ static inline void __run_posix_cpu_timers(struct task_struct *tsk)
 
 	/* Schedule task work to actually expire the timers */
 	tsk->posix_cputimers_work.scheduled = true;
-	task_work_add(tsk, &tsk->posix_cputimers_work.work, TWA_RESUME);
+	task_work_add(tsk, &tsk->posix_cputimers_work.work, true);
 }
 
 static inline bool posix_cpu_timers_enable_work(struct task_struct *tsk,

-- 
Jens Axboe
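The call-site effect of that conversion, restated as a small usage
sketch condensed from the io_uring hunks above (fragment only; req and
tsk come from the surrounding io_uring context):

	/* notify == true now means TIF_NOTIFY_SIGNAL-backed notification */
	init_task_work(&req->task_work, io_req_task_submit);
	ret = task_work_add(req->task, &req->task_work, true);
	if (unlikely(ret)) {
		/* target is exiting: punt to the io-wq kthread, no notify */
		tsk = io_wq_get_task(req->ctx->io_wq);
		task_work_add(tsk, &req->task_work, false);
		wake_up_process(tsk);
	}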
* Re: [PATCH 3/3] task_work: use TIF_TASKWORK if available

From: Jens Axboe @ 2020-10-02 15:53 UTC
To: Oleg Nesterov
Cc: linux-kernel, io-uring, peterz, tglx

On 10/2/20 9:14 AM, Oleg Nesterov wrote:
> Heh. To be honest I don't really like 1-2 ;)
>
> Unfortunately, I do not see a better approach right now. Let me think
> until Monday, it is not that I think I will find a better solution, but
> I'd like to try anyway.
>
> Let me comment on 3/3 for now.

Thanks, appreciate your time on this!

>> +static void task_work_signal(struct task_struct *task)
>> +{
>> +#ifndef TIF_TASKWORK
>> +	unsigned long flags;
>> +
>> +	/*
>> +	 * Only grab the sighand lock if we don't already have some
>> +	 * task_work pending. This pairs with the smp_store_mb()
>> +	 * in get_signal(), see comment there.
>> +	 */
>> +	if (!(READ_ONCE(task->jobctl) & JOBCTL_TASK_WORK) &&
>> +	    lock_task_sighand(task, &flags)) {
>> +		task->jobctl |= JOBCTL_TASK_WORK;
>> +		signal_wake_up(task, 0);
>> +		unlock_task_sighand(task, &flags);
>> +	}
>> +#else
>> +	set_tsk_thread_flag(task, TIF_TASKWORK);
>> +	set_notify_resume(task);
>> +#endif
>
> Again, I can't understand. task_work_signal(task) should set
> TIF_TASKWORK to make signal_pending() = T _and_ wake/kick the target
> up, just like signal_wake_up() does. Why do we set TIF_NOTIFY_RESUME?
>
> So I think that if we are going to add TIF_TASKWORK we should
> generalize this logic and turn it into TIF_NOTIFY_SIGNAL. Similar to
> TIF_NOTIFY_RESUME but implies signal_pending().
>
> IOW, something like
>
> 	void set_notify_signal(task)
> 	{
> 		if (!test_and_set_tsk_thread_flag(task, TIF_NOTIFY_SIGNAL)) {
> 			if (!wake_up_state(task, TASK_INTERRUPTIBLE))
> 				kick_process(task);
> 		}
> 	}
>
> 	// called by exit_to_user_mode_loop() if ti_work & _TIF_NOTIFY_SIGNAL
> 	void tracehook_notify_signal(regs)
> 	{
> 		clear_thread_flag(TIF_NOTIFY_SIGNAL);
> 		smp_mb__after_atomic();
> 		if (unlikely(current->task_works))
> 			task_work_run();
> 	}
>
> This way task_work_run() doesn't need to clear TIF_NOTIFY_SIGNAL and
> it can have more users.
>
> What do you think?

I like that. It'll achieve the same thing as far as I'm concerned, but
not tie the functionality to task_work. Not that we have anything that'd
use it right now, but it still seems like a better base.

I'll adapt patch 2+3 for this, thanks Oleg.

-- 
Jens Axboe