* [PATCH 6.1 1/2] io_uring/io-wq: do not allow pinning outside of cpuset
From: Felix Moessbauer @ 2024-09-11 16:23 UTC
To: axboe
Cc: stable, asml.silence, linux-kernel, io-uring, cgroups, dqminh,
longman, adriaan.schmidt, florian.bezdeka, Felix Moessbauer
commit 0997aa5497c714edbb349ca366d28bd550ba3408 upstream.
The io worker threads are userland threads that just never exit to
userland. As such, they are also assigned to a cgroup (the cgroup of the
creating task).

When changing the affinity of an io_wq thread via syscall, we must only
allow cpumasks that lie within the limits defined by the cpuset controller
of the cgroup (if enabled).
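
For illustration, a minimal userspace sketch of the affinity path this
constrains, assuming liburing's io_uring_register_iowq_aff() wrapper
(error handling trimmed); with this change the request fails with
-EINVAL when the mask reaches outside the caller's cpuset:

#define _GNU_SOURCE
#include <liburing.h>
#include <sched.h>
#include <stdio.h>

int main(void)
{
	struct io_uring ring;
	cpu_set_t mask;
	int ret;

	io_uring_queue_init(8, &ring, 0);

	CPU_ZERO(&mask);
	CPU_SET(0, &mask);	/* ask to pin io-wq workers to CPU 0 */

	/* fails with -EINVAL if CPU 0 is not in our cgroup's cpuset */
	ret = io_uring_register_iowq_aff(&ring, sizeof(mask), &mask);
	if (ret < 0)
		fprintf(stderr, "iowq_aff: %d\n", ret);

	io_uring_queue_exit(&ring);
	return 0;
}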
Fixes: da64d6db3bd3 ("io_uring: One wqe per wq")
Signed-off-by: Felix Moessbauer <[email protected]>
---
io_uring/io-wq.c | 25 +++++++++++++++++++------
1 file changed, 19 insertions(+), 6 deletions(-)
diff --git a/io_uring/io-wq.c b/io_uring/io-wq.c
index 139cd49b2c27..c74bcc8d2f06 100644
--- a/io_uring/io-wq.c
+++ b/io_uring/io-wq.c
@@ -13,6 +13,7 @@
 #include <linux/slab.h>
 #include <linux/rculist_nulls.h>
 #include <linux/cpu.h>
+#include <linux/cpuset.h>
 #include <linux/task_work.h>
 #include <linux/audit.h>
 #include <uapi/linux/io_uring.h>
@@ -1362,22 +1363,34 @@ static int io_wq_cpu_offline(unsigned int cpu, struct hlist_node *node)
 
 int io_wq_cpu_affinity(struct io_uring_task *tctx, cpumask_var_t mask)
 {
+	cpumask_var_t allowed_mask;
+	int ret = 0;
 	int i;
 
 	if (!tctx || !tctx->io_wq)
 		return -EINVAL;
 
+	if (!alloc_cpumask_var(&allowed_mask, GFP_KERNEL))
+		return -ENOMEM;
+	cpuset_cpus_allowed(tctx->io_wq->task, allowed_mask);
+
 	rcu_read_lock();
 	for_each_node(i) {
 		struct io_wqe *wqe = tctx->io_wq->wqes[i];
-
-		if (mask)
-			cpumask_copy(wqe->cpu_mask, mask);
-		else
-			cpumask_copy(wqe->cpu_mask, cpumask_of_node(i));
+		if (mask) {
+			if (cpumask_subset(mask, allowed_mask))
+				cpumask_copy(wqe->cpu_mask, mask);
+			else
+				ret = -EINVAL;
+		} else {
+			if (!cpumask_and(wqe->cpu_mask, cpumask_of_node(i), allowed_mask))
+				cpumask_copy(wqe->cpu_mask, allowed_mask);
+		}
 	}
 	rcu_read_unlock();
-	return 0;
+
+	free_cpumask_var(allowed_mask);
+	return ret;
 }
 
 /*
--
2.39.2
* [PATCH 6.1 2/2] io_uring/io-wq: inherit cpuset of cgroup in io worker
From: Felix Moessbauer @ 2024-09-11 16:23 UTC
To: axboe
Cc: stable, asml.silence, linux-kernel, io-uring, cgroups, dqminh,
longman, adriaan.schmidt, florian.bezdeka, Felix Moessbauer
commit 84eacf177faa605853c58e5b1c0d9544b88c16fd upstream.
The io worker threads are userland threads that just never exit to
userland. As such, they are also assigned to a cgroup (the cgroup of the
creating task).

When creating a new io worker, this worker should inherit the cpuset
of the cgroup.
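
Sketched (using the names from the hunk below), the semantics at worker
creation become: intersect the per-node mask with the creating task's
cpuset, and fall back to the cpuset alone if that intersection is empty:

	cpuset_cpus_allowed(current, allowed_mask);
	if (!cpumask_and(wqe->cpu_mask, cpumask_of_node(node), allowed_mask))
		/* empty intersection: use the cpuset alone */
		cpumask_copy(wqe->cpu_mask, allowed_mask);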
Fixes: da64d6db3bd3 ("io_uring: One wqe per wq")
Signed-off-by: Felix Moessbauer <[email protected]>
---
io_uring/io-wq.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/io_uring/io-wq.c b/io_uring/io-wq.c
index c74bcc8d2f06..04265bf8d319 100644
--- a/io_uring/io-wq.c
+++ b/io_uring/io-wq.c
@@ -1157,6 +1157,7 @@ struct io_wq *io_wq_create(unsigned bounded, struct io_wq_data *data)
 {
 	int ret, node, i;
 	struct io_wq *wq;
+	cpumask_var_t allowed_mask;
 
 	if (WARN_ON_ONCE(!data->free_work || !data->do_work))
 		return ERR_PTR(-EINVAL);
@@ -1176,6 +1177,9 @@ struct io_wq *io_wq_create(unsigned bounded, struct io_wq_data *data)
 	wq->do_work = data->do_work;
 
 	ret = -ENOMEM;
+	if (!alloc_cpumask_var(&allowed_mask, GFP_KERNEL))
+		goto err;
+	cpuset_cpus_allowed(current, allowed_mask);
 	for_each_node(node) {
 		struct io_wqe *wqe;
 		int alloc_node = node;
@@ -1188,7 +1192,8 @@ struct io_wq *io_wq_create(unsigned bounded, struct io_wq_data *data)
 		wq->wqes[node] = wqe;
 		if (!alloc_cpumask_var(&wqe->cpu_mask, GFP_KERNEL))
 			goto err;
-		cpumask_copy(wqe->cpu_mask, cpumask_of_node(node));
+		if (!cpumask_and(wqe->cpu_mask, cpumask_of_node(node), allowed_mask))
+			cpumask_copy(wqe->cpu_mask, allowed_mask);
 		wqe->node = alloc_node;
 		wqe->acct[IO_WQ_ACCT_BOUND].max_workers = bounded;
 		wqe->acct[IO_WQ_ACCT_UNBOUND].max_workers =
@@ -1222,6 +1227,7 @@ struct io_wq *io_wq_create(unsigned bounded, struct io_wq_data *data)
 		free_cpumask_var(wq->wqes[node]->cpu_mask);
 		kfree(wq->wqes[node]);
 	}
+	free_cpumask_var(allowed_mask);
 err_wq:
 	kfree(wq);
 	return ERR_PTR(ret);
--
2.39.2
* Re: [PATCH 6.1 0/2] io_uring/io-wq: respect cgroup cpusets
From: Jens Axboe @ 2024-09-11 16:28 UTC
To: Felix Moessbauer
Cc: stable, asml.silence, linux-kernel, io-uring, cgroups, dqminh,
longman, adriaan.schmidt, florian.bezdeka
On 9/11/24 10:23 AM, Felix Moessbauer wrote:
> Hi,
>
> as discussed in [1], this is a manual backport of the remaining two
> patches to let the io worker threads respect the affinities defined by
> the cgroup of the process.
>
> In 6.1 one worker is created per NUMA node, while in da64d6db3bd3
> ("io_uring: One wqe per wq") this is changed to only have a single worker.
> As this patch is pretty invasive, Jens and I agreed not to backport it.
>
> Instead we now limit the workers' cpuset to the CPUs that are in the
> intersection between what the cgroup allows and what the NUMA node has.
> This leaves the question of what to do in case the intersection is empty:
> To be backwards compatible, we allow this case, but restrict the cpumask
> of the poller to the cpuset defined by the cgroup. We further believe
> this is a reasonable decision, as da64d6db3bd3 drops the NUMA awareness
> anyway.
>
> [1] https://lore.kernel.org/lkml/[email protected]
The upstream patches are staged for 6.12 and marked for a backport, so
they should go upstream next week. Once they are upstream, I'll make
sure to check in on these on the stable front.
--
Jens Axboe
* Re: [PATCH 6.1 0/2] io_uring/io-wq: respect cgroup cpusets
From: Greg KH @ 2024-09-30 19:15 UTC
To: Felix Moessbauer
Cc: axboe, stable, asml.silence, linux-kernel, io-uring, cgroups,
dqminh, longman, adriaan.schmidt, florian.bezdeka
On Wed, Sep 11, 2024 at 06:23:14PM +0200, Felix Moessbauer wrote:
> Hi,
>
> as discussed in [1], this is a manual backport of the remaining two
> patches to let the io worker threads respect the affinities defined by
> the cgroup of the process.
>
> In 6.1 one worker is created per NUMA node, while in da64d6db3bd3
> ("io_uring: One wqe per wq") this is changed to only have a single worker.
> As this patch is pretty invasive, Jens and I agreed not to backport it.
>
> Instead we now limit the workers' cpuset to the CPUs that are in the
> intersection between what the cgroup allows and what the NUMA node has.
> This leaves the question of what to do in case the intersection is empty:
> To be backwards compatible, we allow this case, but restrict the cpumask
> of the poller to the cpuset defined by the cgroup. We further believe
> this is a reasonable decision, as da64d6db3bd3 drops the NUMA awareness
> anyway.
>
> [1] https://lore.kernel.org/lkml/[email protected]
Why was neither of these actually tagged for inclusion in a stable tree?
Why just 6.1.y? Please submit them for all relevant kernel versions.
thanks,
greg k-h
* Re: [PATCH 6.1 0/2] io_uring/io-wq: respect cgroup cpusets
From: MOESSBAUER, Felix @ 2024-10-01 7:32 UTC
To: [email protected]
Cc: Schmidt, Adriaan, [email protected],
[email protected], Bezdeka, Florian,
[email protected], [email protected], [email protected],
[email protected], [email protected],
[email protected]
On Mon, 2024-09-30 at 21:15 +0200, Greg KH wrote:
> On Wed, Sep 11, 2024 at 06:23:14PM +0200, Felix Moessbauer wrote:
> > Hi,
> >
> > as discussed in [1], this is a manual backport of the remaining two
> > patches to let the io worker threads respect the affinities defined by
> > the cgroup of the process.
> >
> > In 6.1 one worker is created per NUMA node, while in da64d6db3bd3
> > ("io_uring: One wqe per wq") this is changed to only have a single worker.
> > As this patch is pretty invasive, Jens and I agreed not to backport it.
> >
> > Instead we now limit the workers' cpuset to the CPUs that are in the
> > intersection between what the cgroup allows and what the NUMA node has.
> > This leaves the question of what to do in case the intersection is empty:
> > To be backwards compatible, we allow this case, but restrict the cpumask
> > of the poller to the cpuset defined by the cgroup. We further believe
> > this is a reasonable decision, as da64d6db3bd3 drops the NUMA awareness
> > anyway.
> >
> > [1] https://lore.kernel.org/lkml/[email protected]
>
> Why was neither of these actually tagged for inclusion in a stable
> tree?
This is a manual backport of these patches for 6.1, as the subsystem
changed significantly between 6.1 and 6.2, making an automated backport
impossible. This has been agreed on with Jens in
https://lore.kernel.org/lkml/[email protected]/
> Why just 6.1.y? Please submit them for all relevant kernel versions.
The original patch was tagged stable and got accepted in 6.6, 6.10 and
6.11.
Felix
>
> thanks,
>
> greg k-h
--
Siemens AG, Technology
Linux Expert Center
* Re: [PATCH 6.1 0/2] io_uring/io-wq: respect cgroup cpusets
From: gregkh @ 2024-10-01 7:50 UTC
To: MOESSBAUER, Felix
Cc: Schmidt, Adriaan, [email protected],
[email protected], Bezdeka, Florian,
[email protected], [email protected], [email protected],
[email protected], [email protected],
[email protected]
On Tue, Oct 01, 2024 at 07:32:42AM +0000, MOESSBAUER, Felix wrote:
> On Mon, 2024-09-30 at 21:15 +0200, Greg KH wrote:
> > On Wed, Sep 11, 2024 at 06:23:14PM +0200, Felix Moessbauer wrote:
> > > Hi,
> > >
> > > as discussed in [1], this is a manual backport of the remaining two
> > > patches to let the io worker threads respect the affinities defined by
> > > the cgroup of the process.
> > >
> > > In 6.1 one worker is created per NUMA node, while in da64d6db3bd3
> > > ("io_uring: One wqe per wq") this is changed to only have a single worker.
> > > As this patch is pretty invasive, Jens and I agreed not to backport it.
> > >
> > > Instead we now limit the workers' cpuset to the CPUs that are in the
> > > intersection between what the cgroup allows and what the NUMA node has.
> > > This leaves the question of what to do in case the intersection is empty:
> > > To be backwards compatible, we allow this case, but restrict the cpumask
> > > of the poller to the cpuset defined by the cgroup. We further believe
> > > this is a reasonable decision, as da64d6db3bd3 drops the NUMA awareness
> > > anyway.
> > >
> > > [1] https://lore.kernel.org/lkml/[email protected]
> >
> > Why was neither of these actually tagged for inclusion in a stable
> > tree?
>
> This is a manual backport of these patches for 6.1, as the subsystem
> changed significantly between 6.1 and 6.2, making an automated backport
> impossible. This has been agreed on with Jens in
> https://lore.kernel.org/lkml/[email protected]/
>
> > Why just 6.1.y? Please submit them for all relevant kernel versions.
>
> The original patch was tagged stable and got accepted in 6.6, 6.10 and
> 6.11.
No, they were not at all. Please properly tag them in the future as per
the documentation if you wish to have things applied to the stable
trees:
https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
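
For reference, the tag that document describes is a footer line such as
the following (the version hint after the # is optional):

	Cc: <stable@vger.kernel.org> # 6.1

alongside the Fixes: tag these patches already carry.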
thanks,
greg k-h
* Re: [PATCH 6.1 0/2] io_uring/io-wq: respect cgroup cpusets
From: Jens Axboe @ 2024-10-01 13:35 UTC
To: [email protected], MOESSBAUER, Felix
Cc: Schmidt, Adriaan, [email protected],
[email protected], Bezdeka, Florian,
[email protected], [email protected],
[email protected], [email protected],
[email protected]
On 10/1/24 1:50 AM, [email protected] wrote:
> On Tue, Oct 01, 2024 at 07:32:42AM +0000, MOESSBAUER, Felix wrote:
>> On Mon, 2024-09-30 at 21:15 +0200, Greg KH wrote:
>>> On Wed, Sep 11, 2024 at 06:23:14PM +0200, Felix Moessbauer wrote:
>>>> Hi,
>>>>
>>>> as discussed in [1], this is a manual backport of the remaining two
>>>> patches to let the io worker threads respect the affinities defined by
>>>> the cgroup of the process.
>>>>
>>>> In 6.1 one worker is created per NUMA node, while in da64d6db3bd3
>>>> ("io_uring: One wqe per wq") this is changed to only have a single worker.
>>>> As this patch is pretty invasive, Jens and I agreed not to backport it.
>>>>
>>>> Instead we now limit the workers' cpuset to the CPUs that are in the
>>>> intersection between what the cgroup allows and what the NUMA node has.
>>>> This leaves the question of what to do in case the intersection is empty:
>>>> To be backwards compatible, we allow this case, but restrict the cpumask
>>>> of the poller to the cpuset defined by the cgroup. We further believe
>>>> this is a reasonable decision, as da64d6db3bd3 drops the NUMA awareness
>>>> anyway.
>>>>
>>>> [1] https://lore.kernel.org/lkml/[email protected]
>>>
>>> Why was neither of these actually tagged for inclusion in a stable
>>> tree?
>>
>> This is a manual backport of these patches for 6.1, as the subsystem
>> changed significantly between 6.1 and 6.2, making an automated backport
>> impossible. This has been agreed on with Jens in
>> https://lore.kernel.org/lkml/[email protected]/
>>
>>> Why just 6.1.y? Please submit them for all relevant kernel versions.
>>
>> The original patch was tagged stable and got accepted in 6.6, 6.10 and
>> 6.11.
>
> No, they were not at all. Please properly tag them in the future as per
> the documentation if you wish to have things applied to the stable
> trees:
> https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
That's my bad; I missed that one of them did not get marked for stable.
The sqpoll one did.
--
Jens Axboe