From: Jens Axboe <[email protected]>
To: Linus Torvalds <[email protected]>
Cc: Stefan Metzmacher <[email protected]>,
io-uring <[email protected]>,
"Eric W. Biederman" <[email protected]>,
Al Viro <[email protected]>
Subject: Re: [PATCH 09/18] io-wq: fork worker threads from original task
Date: Thu, 4 Mar 2021 12:54:40 -0700 [thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <CAHk-=wj8BnbsSKWx=kUFPqpoohDdPchsW00c4L-x6ES8bOWLSg@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 1803 bytes --]
On 3/4/21 12:46 PM, Linus Torvalds wrote:
> On Thu, Mar 4, 2021 at 11:19 AM Jens Axboe <[email protected]> wrote:
>>
>> Took a quick look at this, and I agree that's _much_ better. In fact, it
>> boils down to just calling copy_process() and then having the caller do
>> wake_up_new_task(). So not sure if it's worth adding an
>> create_io_thread() helper, or just make copy_process() available
>> instead. This is ignoring the trace point for now...
>
> I really don't want to expose copy_process() outside of kernel/fork.c.
>
> The whole three-phase "copy - setup - activate" model is a really
> really good thing, and it's how we've done things internally almost
> forever, but I really don't want to expose those middle stages to any
> outsiders.
>
> So I'd really prefer a simple new "create_io_worker()", even if it's
> literally just some four-line function that does
>
> p = copy_process(..);
> if (!IS_ERR(p)) {
> block all signals in p
> set PF_IO_WORKER flag
> wake_up_new_task(p);
> }
> return p;
>
> I very much want that to be inside kernel/fork.c and have all these
> rules about creating new threads localized there.
I agree, here are the two current patches. Just need to add the signal
blocking, which I'd love to do in create_io_thread(), but seems to
require either an allocation or provide a helper to do it in the thread
itself (with an on-stack mask).
Removes code, even with comment added.
fs/io-wq.c | 68 ++++++++++++++++---------------------------------------------
fs/io-wq.h | 2 --
fs/io_uring.c | 29 ++++++++++++++------------
include/linux/sched/task.h | 2 ++
kernel/fork.c | 24 ++++++++++++++++++++++
5 files changed, 59 insertions(+), 66 deletions(-)
--
Jens Axboe
[-- Attachment #2: 0001-kernel-provide-create_io_thread-helper.patch --]
[-- Type: text/x-patch, Size: 2660 bytes --]
From 396142d9878cc1a02149616c7032b3e647205341 Mon Sep 17 00:00:00 2001
From: Jens Axboe <[email protected]>
Date: Thu, 4 Mar 2021 12:21:05 -0700
Subject: [PATCH 1/2] kernel: provide create_io_thread() helper
Provide a generic helper for setting up an io_uring worker. Returns a
task_struct so that the caller can do whatever setup is needed, then call
wake_up_new_task() to kick it into gear.
Signed-off-by: Jens Axboe <[email protected]>
---
include/linux/sched/task.h | 2 ++
kernel/fork.c | 24 ++++++++++++++++++++++++
2 files changed, 26 insertions(+)
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index c0f71f2e7160..ef02be869cf2 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -31,6 +31,7 @@ struct kernel_clone_args {
/* Number of elements in *set_tid */
size_t set_tid_size;
int cgroup;
+ int io_thread;
struct cgroup *cgrp;
struct css_set *cset;
};
@@ -82,6 +83,7 @@ extern void exit_files(struct task_struct *);
extern void exit_itimers(struct signal_struct *);
extern pid_t kernel_clone(struct kernel_clone_args *kargs);
+struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int node);
struct task_struct *fork_idle(int);
struct mm_struct *copy_init_mm(void);
extern pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long flags);
diff --git a/kernel/fork.c b/kernel/fork.c
index d66cd1014211..549acc6324f0 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1940,6 +1940,8 @@ static __latent_entropy struct task_struct *copy_process(
p = dup_task_struct(current, node);
if (!p)
goto fork_out;
+ if (args->io_thread)
+ p->flags |= PF_IO_WORKER;
/*
* This _must_ happen before we call free_task(), i.e. before we jump
@@ -2410,6 +2412,28 @@ struct mm_struct *copy_init_mm(void)
return dup_mm(NULL, &init_mm);
}
+/*
+ * This is like kernel_clone(), but shaved down and tailored to just
+ * creating io_uring workers. It returns a created task, or an error pointer.
+ * The returned task is inactive, and the caller must fire it up through
+ * wake_up_new_task(p).
+ */
+struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int node)
+{
+ unsigned long flags = CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|
+ CLONE_IO|SIGCHLD;
+ struct kernel_clone_args args = {
+ .flags = ((lower_32_bits(flags) | CLONE_VM |
+ CLONE_UNTRACED) & ~CSIGNAL),
+ .exit_signal = (lower_32_bits(flags) & CSIGNAL),
+ .stack = (unsigned long)fn,
+ .stack_size = (unsigned long)arg,
+ .io_thread = 1,
+ };
+
+ return copy_process(NULL, 0, node, &args);
+}
+
/*
* Ok, this is the main fork-routine.
*
--
2.30.1
[-- Attachment #3: 0002-io_uring-move-to-using-create_io_thread.patch --]
[-- Type: text/x-patch, Size: 7839 bytes --]
From 9dee8128025806e74c7fd3915294649dc0b11f5f Mon Sep 17 00:00:00 2001
From: Jens Axboe <[email protected]>
Date: Thu, 4 Mar 2021 12:39:36 -0700
Subject: [PATCH 2/2] io_uring: move to using create_io_thread()
This allows us to do task creation and setup without needing to use
completions to try and synchronize with the starting thread.
Signed-off-by: Jens Axboe <[email protected]>
---
fs/io-wq.c | 68 +++++++++++++--------------------------------------
fs/io-wq.h | 2 --
fs/io_uring.c | 29 ++++++++++++----------
3 files changed, 33 insertions(+), 66 deletions(-)
diff --git a/fs/io-wq.c b/fs/io-wq.c
index 19f18389ead2..693239ed4de5 100644
--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -54,7 +54,6 @@ struct io_worker {
spinlock_t lock;
struct completion ref_done;
- struct completion started;
struct rcu_head rcu;
};
@@ -116,7 +115,6 @@ struct io_wq {
struct io_wq_hash *hash;
refcount_t refs;
- struct completion started;
struct completion exited;
atomic_t worker_refs;
@@ -273,14 +271,6 @@ static void io_wqe_dec_running(struct io_worker *worker)
io_wqe_wake_worker(wqe, acct);
}
-static void io_worker_start(struct io_worker *worker)
-{
- current->flags |= PF_NOFREEZE;
- worker->flags |= (IO_WORKER_F_UP | IO_WORKER_F_RUNNING);
- io_wqe_inc_running(worker);
- complete(&worker->started);
-}
-
/*
* Worker will start processing some work. Move it to the busy list, if
* it's currently on the freelist
@@ -490,8 +480,6 @@ static int io_wqe_worker(void *data)
struct io_wqe *wqe = worker->wqe;
struct io_wq *wq = wqe->wq;
- io_worker_start(worker);
-
while (!test_bit(IO_WQ_BIT_EXIT, &wq->state)) {
set_current_state(TASK_INTERRUPTIBLE);
loop:
@@ -576,12 +564,6 @@ static int task_thread(void *data, int index)
sprintf(buf, "iou-wrk-%d", wq->task_pid);
set_task_comm(current, buf);
- current->pf_io_worker = worker;
- worker->task = current;
-
- set_cpus_allowed_ptr(current, cpumask_of_node(wqe->node));
- current->flags |= PF_NO_SETAFFINITY;
-
raw_spin_lock_irq(&wqe->lock);
hlist_nulls_add_head_rcu(&worker->nulls_node, &wqe->free_list);
list_add_tail_rcu(&worker->all_list, &wqe->all_list);
@@ -607,25 +589,10 @@ static int task_thread_unbound(void *data)
return task_thread(data, IO_WQ_ACCT_UNBOUND);
}
-pid_t io_wq_fork_thread(int (*fn)(void *), void *arg)
-{
- unsigned long flags = CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|
- CLONE_IO|SIGCHLD;
- struct kernel_clone_args args = {
- .flags = ((lower_32_bits(flags) | CLONE_VM |
- CLONE_UNTRACED) & ~CSIGNAL),
- .exit_signal = (lower_32_bits(flags) & CSIGNAL),
- .stack = (unsigned long)fn,
- .stack_size = (unsigned long)arg,
- };
-
- return kernel_clone(&args);
-}
-
static bool create_io_worker(struct io_wq *wq, struct io_wqe *wqe, int index)
{
struct io_worker *worker;
- pid_t pid;
+ struct task_struct *tsk;
__set_current_state(TASK_RUNNING);
@@ -638,21 +605,26 @@ static bool create_io_worker(struct io_wq *wq, struct io_wqe *wqe, int index)
worker->wqe = wqe;
spin_lock_init(&worker->lock);
init_completion(&worker->ref_done);
- init_completion(&worker->started);
atomic_inc(&wq->worker_refs);
if (index == IO_WQ_ACCT_BOUND)
- pid = io_wq_fork_thread(task_thread_bound, worker);
+ tsk = create_io_thread(task_thread_bound, worker, wqe->node);
else
- pid = io_wq_fork_thread(task_thread_unbound, worker);
- if (pid < 0) {
+ tsk = create_io_thread(task_thread_unbound, worker, wqe->node);
+ if (IS_ERR(tsk)) {
if (atomic_dec_and_test(&wq->worker_refs))
complete(&wq->worker_done);
kfree(worker);
return false;
}
- wait_for_completion(&worker->started);
+ worker->flags |= (IO_WORKER_F_UP | IO_WORKER_F_RUNNING);
+ io_wqe_inc_running(worker);
+ tsk->pf_io_worker = worker;
+ worker->task = tsk;
+ set_cpus_allowed_ptr(tsk, cpumask_of_node(wqe->node));
+ tsk->flags |= PF_NOFREEZE | PF_NO_SETAFFINITY;
+ wake_up_new_task(tsk);
return true;
}
@@ -752,10 +724,6 @@ static int io_wq_manager(void *data)
sprintf(buf, "iou-mgr-%d", wq->task_pid);
set_task_comm(current, buf);
- current->flags |= PF_IO_WORKER;
- wq->manager = get_task_struct(current);
-
- complete(&wq->started);
do {
set_current_state(TASK_INTERRUPTIBLE);
@@ -815,21 +783,20 @@ static void io_wqe_insert_work(struct io_wqe *wqe, struct io_wq_work *work)
static int io_wq_fork_manager(struct io_wq *wq)
{
- int ret;
+ struct task_struct *tsk;
if (wq->manager)
return 0;
reinit_completion(&wq->worker_done);
- current->flags |= PF_IO_WORKER;
- ret = io_wq_fork_thread(io_wq_manager, wq);
- current->flags &= ~PF_IO_WORKER;
- if (ret >= 0) {
- wait_for_completion(&wq->started);
+ tsk = create_io_thread(io_wq_manager, wq, NUMA_NO_NODE);
+ if (!IS_ERR(tsk)) {
+ wq->manager = get_task_struct(tsk);
+ wake_up_new_task(tsk);
return 0;
}
- return ret;
+ return PTR_ERR(tsk);
}
static void io_wqe_enqueue(struct io_wqe *wqe, struct io_wq_work *work)
@@ -1062,7 +1029,6 @@ struct io_wq *io_wq_create(unsigned bounded, struct io_wq_data *data)
}
wq->task_pid = current->pid;
- init_completion(&wq->started);
init_completion(&wq->exited);
refcount_set(&wq->refs, 1);
diff --git a/fs/io-wq.h b/fs/io-wq.h
index 42f0be64a84d..5fbf7997149e 100644
--- a/fs/io-wq.h
+++ b/fs/io-wq.h
@@ -119,8 +119,6 @@ void io_wq_put_and_exit(struct io_wq *wq);
void io_wq_enqueue(struct io_wq *wq, struct io_wq_work *work);
void io_wq_hash_work(struct io_wq_work *work, void *val);
-pid_t io_wq_fork_thread(int (*fn)(void *), void *arg);
-
static inline bool io_wq_is_hashed(struct io_wq_work *work)
{
return work->flags & IO_WQ_WORK_HASHED;
diff --git a/fs/io_uring.c b/fs/io_uring.c
index e55369555e5c..04f04ac3c4cf 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6677,8 +6677,6 @@ static int io_sq_thread(void *data)
set_cpus_allowed_ptr(current, cpu_online_mask);
current->flags |= PF_NO_SETAFFINITY;
- complete(&sqd->completion);
-
wait_for_completion(&sqd->startup);
while (!io_sq_thread_should_stop(sqd)) {
@@ -7818,21 +7816,24 @@ void __io_uring_free(struct task_struct *tsk)
static int io_sq_thread_fork(struct io_sq_data *sqd, struct io_ring_ctx *ctx)
{
+ struct task_struct *tsk;
int ret;
clear_bit(IO_SQ_THREAD_SHOULD_STOP, &sqd->state);
reinit_completion(&sqd->completion);
ctx->sqo_exec = 0;
sqd->task_pid = current->pid;
- current->flags |= PF_IO_WORKER;
- ret = io_wq_fork_thread(io_sq_thread, sqd);
- current->flags &= ~PF_IO_WORKER;
- if (ret < 0) {
+ tsk = create_io_thread(io_sq_thread, sqd, NUMA_NO_NODE);
+ if (IS_ERR(tsk)) {
sqd->thread = NULL;
return ret;
}
- wait_for_completion(&sqd->completion);
- return io_uring_alloc_task_context(sqd->thread, ctx);
+ sqd->thread = tsk;
+ ret = io_uring_alloc_task_context(tsk, ctx);
+ if (ret)
+ set_bit(IO_SQ_THREAD_SHOULD_STOP, &sqd->state);
+ wake_up_new_task(tsk);
+ return ret;
}
static int io_sq_offload_create(struct io_ring_ctx *ctx,
@@ -7855,6 +7856,7 @@ static int io_sq_offload_create(struct io_ring_ctx *ctx,
fdput(f);
}
if (ctx->flags & IORING_SETUP_SQPOLL) {
+ struct task_struct *tsk;
struct io_sq_data *sqd;
ret = -EPERM;
@@ -7896,15 +7898,16 @@ static int io_sq_offload_create(struct io_ring_ctx *ctx,
}
sqd->task_pid = current->pid;
- current->flags |= PF_IO_WORKER;
- ret = io_wq_fork_thread(io_sq_thread, sqd);
- current->flags &= ~PF_IO_WORKER;
- if (ret < 0) {
+ tsk = create_io_thread(io_sq_thread, sqd, NUMA_NO_NODE);
+ if (IS_ERR(tsk)) {
sqd->thread = NULL;
goto err;
}
- wait_for_completion(&sqd->completion);
ret = io_uring_alloc_task_context(sqd->thread, ctx);
+ if (ret)
+ set_bit(IO_SQ_THREAD_SHOULD_STOP, &sqd->state);
+ sqd->thread = tsk;
+ wake_up_new_task(tsk);
if (ret)
goto err;
} else if (p->flags & IORING_SETUP_SQ_AFF) {
--
2.30.1
next prev parent reply other threads:[~2021-03-04 19:56 UTC|newest]
Thread overview: 49+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-02-19 17:09 [PATCHSET RFC 0/18] Remove kthread usage from io_uring Jens Axboe
2021-02-19 17:09 ` [PATCH 01/18] io_uring: remove the need for relying on an io-wq fallback worker Jens Axboe
2021-02-19 20:25 ` Eric W. Biederman
2021-02-19 20:37 ` Jens Axboe
2021-02-22 13:46 ` Pavel Begunkov
2021-02-19 17:09 ` [PATCH 02/18] io-wq: don't create any IO workers upfront Jens Axboe
2021-02-19 17:09 ` [PATCH 03/18] io_uring: disable io-wq attaching Jens Axboe
2021-02-19 17:09 ` [PATCH 04/18] io-wq: get rid of wq->use_refs Jens Axboe
2021-02-19 17:09 ` [PATCH 05/18] io_uring: tie async worker side to the task context Jens Axboe
2021-02-20 8:11 ` Hao Xu
2021-02-20 14:38 ` Jens Axboe
2021-02-21 9:16 ` Hao Xu
2021-02-19 17:09 ` [PATCH 06/18] io-wq: don't pass 'wqe' needlessly around Jens Axboe
2021-02-19 17:09 ` [PATCH 07/18] arch: setup PF_IO_WORKER threads like PF_KTHREAD Jens Axboe
2021-02-19 22:21 ` Eric W. Biederman
2021-02-19 23:26 ` Jens Axboe
2021-02-19 17:10 ` [PATCH 08/18] kernel: treat PF_IO_WORKER like PF_KTHREAD for ptrace/signals Jens Axboe
2021-02-19 17:10 ` [PATCH 09/18] io-wq: fork worker threads from original task Jens Axboe
2021-03-04 12:23 ` Stefan Metzmacher
2021-03-04 13:05 ` Jens Axboe
2021-03-04 13:19 ` Stefan Metzmacher
2021-03-04 16:13 ` Stefan Metzmacher
2021-03-04 16:42 ` Jens Axboe
2021-03-04 17:09 ` Stefan Metzmacher
2021-03-04 17:32 ` Jens Axboe
2021-03-04 18:19 ` Jens Axboe
2021-03-04 18:56 ` Linus Torvalds
2021-03-04 19:19 ` Jens Axboe
2021-03-04 19:46 ` Linus Torvalds
2021-03-04 19:54 ` Jens Axboe [this message]
2021-03-04 20:00 ` Jens Axboe
2021-03-04 20:23 ` Jens Axboe
2021-03-04 20:50 ` Linus Torvalds
2021-03-04 20:54 ` Jens Axboe
2021-03-05 19:16 ` Eric W. Biederman
2021-03-05 19:00 ` Eric W. Biederman
2021-02-19 17:10 ` [PATCH 10/18] io-wq: worker idling always returns false Jens Axboe
2021-02-19 17:10 ` [PATCH 11/18] io_uring: remove any grabbing of context Jens Axboe
2021-02-19 17:10 ` [PATCH 12/18] io_uring: remove io_identity Jens Axboe
2021-02-19 17:10 ` [PATCH 13/18] io-wq: only remove worker from free_list, if it was there Jens Axboe
2021-02-19 17:10 ` [PATCH 14/18] io-wq: make io_wq_fork_thread() available to other users Jens Axboe
2021-02-19 17:10 ` [PATCH 15/18] io_uring: move SQPOLL thread io-wq forked worker Jens Axboe
2021-02-19 17:10 ` [PATCH 16/18] Revert "proc: don't allow async path resolution of /proc/thread-self components" Jens Axboe
2021-02-19 17:10 ` [PATCH 17/18] Revert "proc: don't allow async path resolution of /proc/self components" Jens Axboe
2021-02-19 17:10 ` [PATCH 18/18] net: remove cmsg restriction from io_uring based send/recvmsg calls Jens Axboe
2021-02-19 23:44 ` [PATCHSET RFC 0/18] Remove kthread usage from io_uring Stefan Metzmacher
2021-02-19 23:51 ` Jens Axboe
2021-02-21 5:04 ` Linus Torvalds
2021-02-21 21:22 ` Jens Axboe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox