public inbox for [email protected]
* [PATCH 6.1 0/2] io_uring/io-wq: respect cgroup cpusets
@ 2024-09-11 16:23 Felix Moessbauer
  2024-09-11 16:23 ` [PATCH 6.1 1/2] io_uring/io-wq: do not allow pinning outside of cpuset Felix Moessbauer
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Felix Moessbauer @ 2024-09-11 16:23 UTC (permalink / raw)
  To: axboe
  Cc: stable, asml.silence, linux-kernel, io-uring, cgroups, dqminh,
	longman, adriaan.schmidt, florian.bezdeka, Felix Moessbauer

Hi,

As discussed in [1], this is a manual backport of the remaining two
patches to let the io worker threads respect the affinities defined by
the cgroup of the process.

In 6.1, one worker is created per NUMA node, while da64d6db3bd3
("io_uring: One wqe per wq") changed this to only have a single one.
As that patch is pretty invasive, Jens and I agreed not to backport it.

Instead, we now limit the workers' cpuset to the CPUs in the
intersection of what the cgroup allows and what the NUMA node provides.
This leaves the question of what to do when the intersection is empty:
to remain backwards compatible, we allow this case, but restrict the
cpumask of the poller to the cpuset defined by the cgroup. We further
believe this is a reasonable decision, as da64d6db3bd3 drops the NUMA
awareness anyway.

[1] https://lore.kernel.org/lkml/[email protected]

Best regards,
Felix Moessbauer
Siemens AG

Felix Moessbauer (2):
  io_uring/io-wq: do not allow pinning outside of cpuset
  io_uring/io-wq: inherit cpuset of cgroup in io worker

 io_uring/io-wq.c | 33 ++++++++++++++++++++++++++-------
 1 file changed, 26 insertions(+), 7 deletions(-)

-- 
2.39.2



* [PATCH 6.1 1/2] io_uring/io-wq: do not allow pinning outside of cpuset
  2024-09-11 16:23 [PATCH 6.1 0/2] io_uring/io-wq: respect cgroup cpusets Felix Moessbauer
@ 2024-09-11 16:23 ` Felix Moessbauer
  2024-09-11 16:23 ` [PATCH 6.1 2/2] io_uring/io-wq: inherit cpuset of cgroup in io worker Felix Moessbauer
  2024-09-11 16:28 ` [PATCH 6.1 0/2] io_uring/io-wq: respect cgroup cpusets Jens Axboe
  2 siblings, 0 replies; 4+ messages in thread
From: Felix Moessbauer @ 2024-09-11 16:23 UTC (permalink / raw)
  To: axboe
  Cc: stable, asml.silence, linux-kernel, io-uring, cgroups, dqminh,
	longman, adriaan.schmidt, florian.bezdeka, Felix Moessbauer

commit 0997aa5497c714edbb349ca366d28bd550ba3408 upstream.

The io worker threads are userland threads that just never exit to
userland. As such, they are also assigned to a cgroup (the group of the
creating task).

When changing the affinity of the io_wq thread via syscall, we must only
allow cpumasks within the limits defined by the cpuset controller of the
cgroup (if enabled).

Fixes: da64d6db3bd3 ("io_uring: One wqe per wq")
Signed-off-by: Felix Moessbauer <[email protected]>
---
 io_uring/io-wq.c | 25 +++++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/io_uring/io-wq.c b/io_uring/io-wq.c
index 139cd49b2c27..c74bcc8d2f06 100644
--- a/io_uring/io-wq.c
+++ b/io_uring/io-wq.c
@@ -13,6 +13,7 @@
 #include <linux/slab.h>
 #include <linux/rculist_nulls.h>
 #include <linux/cpu.h>
+#include <linux/cpuset.h>
 #include <linux/task_work.h>
 #include <linux/audit.h>
 #include <uapi/linux/io_uring.h>
@@ -1362,22 +1363,34 @@ static int io_wq_cpu_offline(unsigned int cpu, struct hlist_node *node)
 
 int io_wq_cpu_affinity(struct io_uring_task *tctx, cpumask_var_t mask)
 {
+	cpumask_var_t allowed_mask;
+	int ret = 0;
 	int i;
 
 	if (!tctx || !tctx->io_wq)
 		return -EINVAL;
 
+	if (!alloc_cpumask_var(&allowed_mask, GFP_KERNEL))
+		return -ENOMEM;
+	cpuset_cpus_allowed(tctx->io_wq->task, allowed_mask);
+
 	rcu_read_lock();
 	for_each_node(i) {
 		struct io_wqe *wqe = tctx->io_wq->wqes[i];
-
-		if (mask)
-			cpumask_copy(wqe->cpu_mask, mask);
-		else
-			cpumask_copy(wqe->cpu_mask, cpumask_of_node(i));
+		if (mask) {
+			if (cpumask_subset(mask, allowed_mask))
+				cpumask_copy(wqe->cpu_mask, mask);
+			else
+				ret = -EINVAL;
+		} else {
+			if (!cpumask_and(wqe->cpu_mask, cpumask_of_node(i), allowed_mask))
+				cpumask_copy(wqe->cpu_mask, allowed_mask);
+		}
 	}
 	rcu_read_unlock();
-	return 0;
+
+	free_cpumask_var(allowed_mask);
+	return ret;
 }
 
 /*
-- 
2.39.2



* [PATCH 6.1 2/2] io_uring/io-wq: inherit cpuset of cgroup in io worker
  2024-09-11 16:23 [PATCH 6.1 0/2] io_uring/io-wq: respect cgroup cpusets Felix Moessbauer
  2024-09-11 16:23 ` [PATCH 6.1 1/2] io_uring/io-wq: do not allow pinning outside of cpuset Felix Moessbauer
@ 2024-09-11 16:23 ` Felix Moessbauer
  2024-09-11 16:28 ` [PATCH 6.1 0/2] io_uring/io-wq: respect cgroup cpusets Jens Axboe
  2 siblings, 0 replies; 4+ messages in thread
From: Felix Moessbauer @ 2024-09-11 16:23 UTC (permalink / raw)
  To: axboe
  Cc: stable, asml.silence, linux-kernel, io-uring, cgroups, dqminh,
	longman, adriaan.schmidt, florian.bezdeka, Felix Moessbauer

commit 84eacf177faa605853c58e5b1c0d9544b88c16fd upstream.

The io worker threads are userland threads that just never exit to
userland. As such, they are also assigned to a cgroup (the group of the
creating task).

When creating a new io worker, this worker should inherit the cpuset
of the cgroup.

Fixes: da64d6db3bd3 ("io_uring: One wqe per wq")
Signed-off-by: Felix Moessbauer <[email protected]>
---
 io_uring/io-wq.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/io_uring/io-wq.c b/io_uring/io-wq.c
index c74bcc8d2f06..04265bf8d319 100644
--- a/io_uring/io-wq.c
+++ b/io_uring/io-wq.c
@@ -1157,6 +1157,7 @@ struct io_wq *io_wq_create(unsigned bounded, struct io_wq_data *data)
 {
 	int ret, node, i;
 	struct io_wq *wq;
+	cpumask_var_t allowed_mask;
 
 	if (WARN_ON_ONCE(!data->free_work || !data->do_work))
 		return ERR_PTR(-EINVAL);
@@ -1176,6 +1177,9 @@ struct io_wq *io_wq_create(unsigned bounded, struct io_wq_data *data)
 	wq->do_work = data->do_work;
 
 	ret = -ENOMEM;
+	if (!alloc_cpumask_var(&allowed_mask, GFP_KERNEL))
+		goto err;
+	cpuset_cpus_allowed(current, allowed_mask);
 	for_each_node(node) {
 		struct io_wqe *wqe;
 		int alloc_node = node;
@@ -1188,7 +1192,8 @@ struct io_wq *io_wq_create(unsigned bounded, struct io_wq_data *data)
 		wq->wqes[node] = wqe;
 		if (!alloc_cpumask_var(&wqe->cpu_mask, GFP_KERNEL))
 			goto err;
-		cpumask_copy(wqe->cpu_mask, cpumask_of_node(node));
+		if (!cpumask_and(wqe->cpu_mask, cpumask_of_node(node), allowed_mask))
+			cpumask_copy(wqe->cpu_mask, allowed_mask);
 		wqe->node = alloc_node;
 		wqe->acct[IO_WQ_ACCT_BOUND].max_workers = bounded;
 		wqe->acct[IO_WQ_ACCT_UNBOUND].max_workers =
@@ -1222,6 +1227,7 @@ struct io_wq *io_wq_create(unsigned bounded, struct io_wq_data *data)
 		free_cpumask_var(wq->wqes[node]->cpu_mask);
 		kfree(wq->wqes[node]);
 	}
+	free_cpumask_var(allowed_mask);
 err_wq:
 	kfree(wq);
 	return ERR_PTR(ret);
-- 
2.39.2



* Re: [PATCH 6.1 0/2] io_uring/io-wq: respect cgroup cpusets
  2024-09-11 16:23 [PATCH 6.1 0/2] io_uring/io-wq: respect cgroup cpusets Felix Moessbauer
  2024-09-11 16:23 ` [PATCH 6.1 1/2] io_uring/io-wq: do not allow pinning outside of cpuset Felix Moessbauer
  2024-09-11 16:23 ` [PATCH 6.1 2/2] io_uring/io-wq: inherit cpuset of cgroup in io worker Felix Moessbauer
@ 2024-09-11 16:28 ` Jens Axboe
  2 siblings, 0 replies; 4+ messages in thread
From: Jens Axboe @ 2024-09-11 16:28 UTC (permalink / raw)
  To: Felix Moessbauer
  Cc: stable, asml.silence, linux-kernel, io-uring, cgroups, dqminh,
	longman, adriaan.schmidt, florian.bezdeka

On 9/11/24 10:23 AM, Felix Moessbauer wrote:
> Hi,
> 
> As discussed in [1], this is a manual backport of the remaining two
> patches to let the io worker threads respect the affinities defined by
> the cgroup of the process.
> 
> In 6.1, one worker is created per NUMA node, while da64d6db3bd3
> ("io_uring: One wqe per wq") changed this to only have a single one.
> As that patch is pretty invasive, Jens and I agreed not to backport it.
> 
> Instead, we now limit the workers' cpuset to the CPUs in the
> intersection of what the cgroup allows and what the NUMA node provides.
> This leaves the question of what to do when the intersection is empty:
> to remain backwards compatible, we allow this case, but restrict the
> cpumask of the poller to the cpuset defined by the cgroup. We further
> believe this is a reasonable decision, as da64d6db3bd3 drops the NUMA
> awareness anyway.
> 
> [1] https://lore.kernel.org/lkml/[email protected]

The upstream patches are staged for 6.12 and marked for a backport, so
they should go upstream next week. Once they are upstream, I'll make
sure to check in on these on the stable front.

-- 
Jens Axboe


