From: Jens Axboe <[email protected]>
To: Jann Horn <[email protected]>
Cc: io-uring <[email protected]>
Subject: Re: [PATCH v2] io_uring: use TWA_SIGNAL for task_work if the task isn't running
Date: Fri, 7 Aug 2020 16:11:22 -0600 [thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
[-- Attachment #1: Type: text/plain, Size: 2686 bytes --]
On 8/7/20 3:50 PM, Jens Axboe wrote:
> On 8/7/20 12:00 PM, Jann Horn wrote:
>> On Fri, Aug 7, 2020 at 6:56 PM Jens Axboe <[email protected]> wrote:
>>>
>>> An earlier commit:
>>>
>>> b7db41c9e03b ("io_uring: fix regression with always ignoring signals in io_cqring_wait()")
>>>
>>> ensured that we didn't get stuck waiting for eventfd reads when it's
>>> registered with the io_uring ring for event notification, but we still
>>> have a gap where the task can be waiting on other events in the kernel
>>> and need a bigger nudge to make forward progress.
>>>
>>> Ensure that we use signaled notifications for a task that isn't currently
>>> running, to be certain the work is seen and processed immediately.
>>>
>>> Cc: [email protected] # v5.7+
>>> Reported-by: Josef <[email protected]>
>>> Signed-off-by: Jens Axboe <[email protected]>
>>>
>>> ---
>>>
>>> This isn't perfect, as it'll use TWA_SIGNAL even for cases where we
>>> don't absolutely need it (like task waiting for completions in
>>> io_cqring_wait()), but we don't have a good way to tell right now. We
>>> can probably improve on this in the future, for now I think this is the
>>> best solution.
>>>
>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>> index e9b27cdaa735..b4300a61f231 100644
>>> --- a/fs/io_uring.c
>>> +++ b/fs/io_uring.c
>>> @@ -1720,7 +1720,7 @@ static int io_req_task_work_add(struct io_kiocb *req, struct callback_head *cb)
>>> */
>>> if (ctx->flags & IORING_SETUP_SQPOLL)
>>> notify = 0;
>>> - else if (ctx->cq_ev_fd)
>>> + else if (ctx->cq_ev_fd || (tsk->state != TASK_RUNNING))
>>> notify = TWA_SIGNAL;
>>>
>>> ret = task_work_add(tsk, cb, notify);
>>
>> I don't get it. Apart from still not understanding the big picture:
>>
>> What guarantees that the lockless read of tsk->state is in any way
>> related to the state of the remote process by the time we reach
>> task_work_add()? And why do we not need to signal in TASK_RUNNING
>> state (e.g. directly before the remote process switches to
>> TASK_INTERRUPTIBLE or something like that)?
>
> Yeah it doesn't, the patch doesn't cover the racy case. As far as I can
> tell, we've got two ways to do it:
>
> 1) We split the task_work_add() into two parts, one adding the work and
> one doing the signaling. Then we could do:
>
> int notify = TWA_RESUME;
>
> __task_work_add(tsk, cb);
>
> if (ctx->flags & IORING_SETUP_SQPOLL)
> notify = 0;
> else if (ctx->cq_ev_fd || (tsk->state != TASK_RUNNING))
> notify = TWA_SIGNAL;
>
> __task_work_signal(tsk, notify);
Something like the attached - totally untested so far, but it implements
that idea.
--
Jens Axboe
[-- Attachment #2: 0002-io_uring-use-TWA_SIGNAL-for-task_work-if-the-task-is.patch --]
[-- Type: text/x-patch, Size: 2351 bytes --]
From ec67af3b1e0c9325a04d5d1c12311086349d3a79 Mon Sep 17 00:00:00 2001
From: Jens Axboe <[email protected]>
Date: Thu, 6 Aug 2020 19:41:50 -0600
Subject: [PATCH 2/2] io_uring: use TWA_SIGNAL for task_work if the task isn't
running
An earlier commit:
b7db41c9e03b ("io_uring: fix regression with always ignoring signals in io_cqring_wait()")
ensured that we didn't get stuck waiting for eventfd reads when it's
registered with the io_uring ring for event notification, but we still
have a gap where the task can be waiting on other events in the kernel
and need a bigger nudge to make forward progress.
Ensure that we use signaled notifications for a task that isn't currently
running, to be certain the work is seen and processed immediately.
Cc: [email protected] # v5.7+
Reported-by: Josef <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
---
fs/io_uring.c | 22 ++++++++++++++--------
1 file changed, 14 insertions(+), 8 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index e9b27cdaa735..443eecdfeda9 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1712,21 +1712,27 @@ static int io_req_task_work_add(struct io_kiocb *req, struct callback_head *cb)
struct io_ring_ctx *ctx = req->ctx;
int ret, notify = TWA_RESUME;
+ ret = __task_work_add(tsk, cb);
+ if (unlikely(ret))
+ return ret;
+
/*
* SQPOLL kernel thread doesn't need notification, just a wakeup.
- * If we're not using an eventfd, then TWA_RESUME is always fine,
- * as we won't have dependencies between request completions for
- * other kernel wait conditions.
+ * For any other work, use signaled wakeups if the task isn't
+ * running to avoid dependencies between tasks or threads. If
+ * the issuing task is currently waiting in the kernel on a thread,
+ * and same thread is waiting for a completion event, then we need
+ * to ensure that the issuing task processes task_work. TWA_SIGNAL
+ * is needed for that.
*/
if (ctx->flags & IORING_SETUP_SQPOLL)
notify = 0;
- else if (ctx->cq_ev_fd)
+ else if (READ_ONCE(tsk->state) != TASK_RUNNING)
notify = TWA_SIGNAL;
- ret = task_work_add(tsk, cb, notify);
- if (!ret)
- wake_up_process(tsk);
- return ret;
+ __task_work_notify(tsk, notify);
+ wake_up_process(tsk);
+ return 0;
}
static void __io_req_task_cancel(struct io_kiocb *req, int error)
--
2.28.0
[-- Attachment #3: 0001-kernel-split-task_work_add-into-two-separate-helpers.patch --]
[-- Type: text/x-patch, Size: 3714 bytes --]
From 802b09d10bdd2f90e510049b1c52b81edc2ae0a3 Mon Sep 17 00:00:00 2001
From: Jens Axboe <[email protected]>
Date: Fri, 7 Aug 2020 16:00:53 -0600
Subject: [PATCH 1/2] kernel: split task_work_add() into two separate helpers
Some callers may need to make signaling decisions based on the state
of the targeted task, and that can only safely be done post adding
the task_work to the task. Split task_work_add() into:
__task_work_add() - adds the work item
__task_work_notify() - sends the notification
No functional changes in this patch.
Signed-off-by: Jens Axboe <[email protected]>
---
include/linux/task_work.h | 19 ++++++++++++++++
kernel/task_work.c | 48 +++++++++++++++++++++------------------
2 files changed, 45 insertions(+), 22 deletions(-)
diff --git a/include/linux/task_work.h b/include/linux/task_work.h
index 0fb93aafa478..7abbd8df5e13 100644
--- a/include/linux/task_work.h
+++ b/include/linux/task_work.h
@@ -5,6 +5,8 @@
#include <linux/list.h>
#include <linux/sched.h>
+extern struct callback_head work_exited;
+
typedef void (*task_work_func_t)(struct callback_head *);
static inline void
@@ -13,6 +15,21 @@ init_task_work(struct callback_head *twork, task_work_func_t func)
twork->func = func;
}
+static inline int __task_work_add(struct task_struct *task,
+ struct callback_head *work)
+{
+ struct callback_head *head;
+
+ do {
+ head = READ_ONCE(task->task_works);
+ if (unlikely(head == &work_exited))
+ return -ESRCH;
+ work->next = head;
+ } while (cmpxchg(&task->task_works, head, work) != head);
+
+ return 0;
+}
+
#define TWA_RESUME 1
#define TWA_SIGNAL 2
int task_work_add(struct task_struct *task, struct callback_head *twork, int);
@@ -20,6 +37,8 @@ int task_work_add(struct task_struct *task, struct callback_head *twork, int);
struct callback_head *task_work_cancel(struct task_struct *, task_work_func_t);
void task_work_run(void);
+void __task_work_notify(struct task_struct *task, int notify);
+
static inline void exit_task_work(struct task_struct *task)
{
task_work_run();
diff --git a/kernel/task_work.c b/kernel/task_work.c
index 5c0848ca1287..9bde81481984 100644
--- a/kernel/task_work.c
+++ b/kernel/task_work.c
@@ -3,7 +3,27 @@
#include <linux/task_work.h>
#include <linux/tracehook.h>
-static struct callback_head work_exited; /* all we need is ->next == NULL */
+struct callback_head work_exited = {
+ .next = NULL /* all we need is ->next == NULL */
+};
+
+void __task_work_notify(struct task_struct *task, int notify)
+{
+ unsigned long flags;
+
+ switch (notify) {
+ case TWA_RESUME:
+ set_notify_resume(task);
+ break;
+ case TWA_SIGNAL:
+ if (lock_task_sighand(task, &flags)) {
+ task->jobctl |= JOBCTL_TASK_WORK;
+ signal_wake_up(task, 0);
+ unlock_task_sighand(task, &flags);
+ }
+ break;
+ }
+}
/**
* task_work_add - ask the @task to execute @work->func()
@@ -27,29 +47,13 @@ static struct callback_head work_exited; /* all we need is ->next == NULL */
int
task_work_add(struct task_struct *task, struct callback_head *work, int notify)
{
- struct callback_head *head;
- unsigned long flags;
+ int ret;
- do {
- head = READ_ONCE(task->task_works);
- if (unlikely(head == &work_exited))
- return -ESRCH;
- work->next = head;
- } while (cmpxchg(&task->task_works, head, work) != head);
-
- switch (notify) {
- case TWA_RESUME:
- set_notify_resume(task);
- break;
- case TWA_SIGNAL:
- if (lock_task_sighand(task, &flags)) {
- task->jobctl |= JOBCTL_TASK_WORK;
- signal_wake_up(task, 0);
- unlock_task_sighand(task, &flags);
- }
- break;
- }
+ ret = __task_work_add(task, work);
+ if (unlikely(ret))
+ return ret;
+ __task_work_notify(task, notify);
return 0;
}
--
2.28.0
prev parent reply other threads:[~2020-08-07 22:11 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-08-07 16:55 [PATCH v2] io_uring: use TWA_SIGNAL for task_work if the task isn't running Jens Axboe
2020-08-07 18:00 ` Jann Horn
2020-08-07 21:50 ` Jens Axboe
2020-08-07 22:11 ` Jens Axboe [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox