From: Jens Axboe <[email protected]>
To: Hillf Danton <[email protected]>
Cc: Oleg Nesterov <[email protected]>,
io-uring <[email protected]>,
LKML <[email protected]>,
Peter Zijlstra <[email protected]>,
Thomas Gleixner <[email protected]>
Subject: Re: [PATCH RFC v2] kernel: decouple TASK_WORK TWA_SIGNAL handling from signals
Date: Fri, 2 Oct 2020 07:44:53 -0600
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>

On 10/2/20 7:38 AM, Hillf Danton wrote:
>
> On Thu, 1 Oct 2020 11:27:04 -0600 Jens Axboe wrote:
>> On 10/1/20 10:27 AM, Oleg Nesterov wrote:
>>> Jens,
>>>
>>> I'll read this version tomorrow, but:
>>>
>>> On 10/01, Jens Axboe wrote:
>>>>
>>>>  static inline int signal_pending(struct task_struct *p)
>>>>  {
>>>> -       return unlikely(test_tsk_thread_flag(p,TIF_SIGPENDING));
>>>> +#ifdef TIF_TASKWORK
>>>> +       /*
>>>> +        * TIF_TASKWORK isn't really a signal, but it requires the same
>>>> +        * behavior of restarting the system call to force a kernel/user
>>>> +        * transition.
>>>> +        */
>>>> +       return unlikely(test_tsk_thread_flag(p, TIF_SIGPENDING) ||
>>>> +                       test_tsk_thread_flag(p, TIF_TASKWORK));
>>>> +#else
>>>> +       return unlikely(test_tsk_thread_flag(p, TIF_SIGPENDING));
>>>> +#endif
>>>
>>> This change alone is already very wrong.
>>>
>>> signal_pending(task) == T means that this task will do get_signal() as
>>> soon as it can, and this basically means you can't "divorce" SIGPENDING
>>> and TASKWORK.
>>>
>>> Simple example. Suppose we have a single-threaded task T.
>>>
>>> Someone does task_work_add(T, TWA_SIGNAL). This makes signal_pending()==T
>>> and this is what we need.
>>>
>>> Now suppose that another task sends a signal to T before T calls
>>> task_work_run() and clears TIF_TASKWORK. In this case SIGPENDING won't
>>> be set because signal_pending() is already set (see wants_signal), and
>>> this means that T won't notice this signal.
>>
>> That's a good point, and I have been thinking along those lines. The
>> "problem" is the two different use cases:
>>
>> 1) The "should I return from schedule() or break out of schedule()
>> loops" kind of use cases.
>>
>> 2) Internal signal delivery use cases.
>>
>> The former wants one that factors in TIF_TASKWORK, while the latter
>> should of course only look at TIF_SIGPENDING.
>>
>> Now, my gut reaction would be to have __signal_pending() that purely
>> checks for TIF_SIGPENDING, and make sure we use that on the signal
>> delivery side of things. Or something with a better name than that, but
>> functionally the same. Ala:
>>
>> static inline int __signal_pending(struct task_struct *p)
>> {
>>         return unlikely(test_tsk_thread_flag(p, TIF_SIGPENDING));
>> }
>>
>> static inline int signal_pending(struct task_struct *p)
>> {
>> #ifdef TIF_TASKWORK
>>         return unlikely(test_tsk_thread_flag(p, TIF_TASKWORK) ||
>>                         __signal_pending(p));
>> #else
>>         return __signal_pending(p);
>> #endif
>> }
>>
>> and then use __signal_pending() on the signal delivery side.
>>
>> It's still not great in the sense that renaming signal_pending() would
>> be a better choice, but that's a whole lot of churn...
>
> To avoid that churn, IIUC replace TWA_SIGNAL with TWA_RESUME on
> adding task work, which is compensated by adding a counter of
> event source in IO ctx and waiting for event to arrive instead
> of signal.

That doesn't work. If the task is waiting in cqring_wait(), then
there's already no issue. The problem is if it's waiting somewhere
else.

Imagine three threads, call them T1-3. T1 creates a pipe and creates
a ring. T1 queues a poll request for the read end of the pipe, and then
waits for T2. T2 is a completer thread, so it ends up waiting
for events on the ring. T2 is now in cqring_wait(). T3 is created,
and it writes to the pipe. This write triggers the original poll
request from T1, and task_work is now queued for T1. This task work
needs to be processed for T2 to wake up and complete, but it can't be,
since T1 is in pthread_join() for T2.

This is why TWA_SIGNAL is needed: we need it to break T1 out of its
wait loop so it can process this work. No amount of changes in io_uring
can fix this dependency, and if you look at the last series posted,
it does not in fact have any io_uring changes at all.
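
To make the dependency concrete, here is a minimal user-space sketch of
the T1/T2/T3 scenario above. It is a toy model, not kernel or io_uring
code: every name in it (toy_task, toy_ring, toy_task_work_add(),
toy_wait_step(), toy_scenario(), NOTIFY_RESUME, NOTIFY_SIGNAL) is made
up for illustration, and the single break_wait flag is an assumption
standing in for "signal_pending() is true for T1".

```c
#include <stdbool.h>

/* Hypothetical notification modes, loosely mirroring TWA_RESUME/TWA_SIGNAL. */
enum notify_mode { NOTIFY_RESUME, NOTIFY_SIGNAL };

/* Toy stand-in for T1: the task that owns the poll request. */
struct toy_task {
	bool work_queued;	/* task work waiting to run in T1's context */
	bool break_wait;	/* models "signal_pending(T1) is true" */
};

/* Toy stand-in for the ring that T2 is waiting on. */
struct toy_ring {
	int completions;
};

/* Queue work for the task; only the signal-like mode disturbs its sleep. */
static void toy_task_work_add(struct toy_task *t, enum notify_mode mode)
{
	t->work_queued = true;
	if (mode == NOTIFY_SIGNAL)
		t->break_wait = true;
}

/*
 * One pass of T1's interruptible wait (think pthread_join() on T2).
 * Returns true if the wait was broken and the queued work was run.
 */
static bool toy_wait_step(struct toy_task *t1, struct toy_ring *ring)
{
	if (!t1->break_wait)
		return false;	/* nothing interrupts the sleep; T1 keeps waiting */

	/* The wait is aborted; on the way out, T1 runs its task work. */
	t1->break_wait = false;
	if (t1->work_queued) {
		t1->work_queued = false;
		ring->completions++;	/* the poll completion T2 is waiting for */
		return true;
	}
	return false;
}

/* Returns true if T2 gets its completion, i.e. the cycle is broken. */
static bool toy_scenario(enum notify_mode mode)
{
	struct toy_task t1 = { false, false };
	struct toy_ring ring = { 0 };

	toy_task_work_add(&t1, mode);	/* T3's pipe write fires T1's poll */
	toy_wait_step(&t1, &ring);	/* T1 is blocked waiting for T2 */
	return ring.completions > 0;
}
```

In this model toy_scenario(NOTIFY_SIGNAL) lets T2 make progress, while
toy_scenario(NOTIFY_RESUME) leaves the work queued behind a wait that
nothing interrupts, which is the cycle described above.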

Hence the goal is to have TWA_SIGNAL keep the same kind of semantics
it has now, but decoupled from ->sighand, since that is problematic,
particularly on threaded setups.
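
As a rough illustration of why the two use cases from the quoted
discussion want different predicates, here is a small user-space model.
It is only a sketch under stated assumptions: the TOY_* flags and all
toy_* functions are hypothetical, and toy_send_signal() is a heavily
simplified stand-in for the target selection Oleg refers to (see
wants_signal), not the real kernel logic.

```c
#include <stdbool.h>

/* Hypothetical flag bits standing in for TIF_SIGPENDING / TIF_TASKWORK. */
#define TOY_SIGPENDING	(1u << 0)
#define TOY_TASKWORK	(1u << 1)

struct toy_task {
	unsigned int flags;
};

/* Signal delivery view: only a real pending signal counts. */
static bool toy_sigpending_only(const struct toy_task *t)
{
	return (t->flags & TOY_SIGPENDING) != 0;
}

/* Wait-loop view: a pending signal or queued task work breaks the wait. */
static bool toy_should_break_wait(const struct toy_task *t)
{
	return (t->flags & (TOY_SIGPENDING | TOY_TASKWORK)) != 0;
}

/*
 * Simplified sender: a task that already looks "pending" is assumed to
 * be on its way to handle signals, so it is not flagged again. The
 * predicate is passed in so both variants can be compared.
 * Returns true if the task was flagged for the new signal.
 */
static bool toy_send_signal(struct toy_task *t,
			    bool (*pending)(const struct toy_task *))
{
	if (pending(t))
		return false;
	t->flags |= TOY_SIGPENDING;
	return true;
}
```

With only TOY_TASKWORK set, a sender using toy_should_break_wait()
skips the task and the signal goes unnoticed, which is the problem Oleg
describes; a sender using toy_sigpending_only() still flags it, while
wait loops can keep using the combined check.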
--
Jens Axboe