public inbox for [email protected]
 help / color / mirror / Atom feed
* [PATCH v3 0/2] io_uring/io-wq: respect cgroup cpusets
@ 2024-09-10 17:11 Felix Moessbauer
  2024-09-10 17:11 ` [PATCH v3 1/2] io_uring/io-wq: do not allow pinning outside of cpuset Felix Moessbauer
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Felix Moessbauer @ 2024-09-10 17:11 UTC (permalink / raw)
  To: axboe
  Cc: asml.silence, linux-kernel, io-uring, cgroups, dqminh, longman,
	adriaan.schmidt, florian.bezdeka, Felix Moessbauer

Hi,

this series continues the affinity cleanup work started in
io_uring/sqpoll. It has been successfully tested against the liburing
testsuite (make runtests), liburing @ caae94903d2e201.

The test wq-aff.t succeeds if at least cpu 0,1 are in
the set and fails otherwise. This is expected, as the test wants
to pin on these cpus. I'll send a patch for liburing to skip that test
in case this pre-condition is not met.

Changes since v2:

- rework commit message (remove ambient and worker from title as well).
  Sorry for the noise.
- no functional changes

Changes since v1:

- rework commit messages (don't use ambient cpus, wq threads are no
  pollers)
- no functional changes

Best regards,
Felix Moessbauer
Siemens AG

Felix Moessbauer (2):
  io_uring/io-wq: do not allow pinning outside of cpuset
  io_uring/io-wq: inherit cpuset of cgroup in io worker

 io_uring/io-wq.c | 25 +++++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)

-- 
2.39.2


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v3 1/2] io_uring/io-wq: do not allow pinning outside of cpuset
  2024-09-10 17:11 [PATCH v3 0/2] io_uring/io-wq: respect cgroup cpusets Felix Moessbauer
@ 2024-09-10 17:11 ` Felix Moessbauer
  2024-09-10 17:11 ` [PATCH v3 2/2] io_uring/io-wq: inherit cpuset of cgroup in io worker Felix Moessbauer
  2024-09-11 13:28 ` [PATCH v3 0/2] io_uring/io-wq: respect cgroup cpusets Jens Axboe
  2 siblings, 0 replies; 6+ messages in thread
From: Felix Moessbauer @ 2024-09-10 17:11 UTC (permalink / raw)
  To: axboe
  Cc: asml.silence, linux-kernel, io-uring, cgroups, dqminh, longman,
	adriaan.schmidt, florian.bezdeka, Felix Moessbauer

The io worker threads are userland threads that just never exit to the
userland. By that, they are also assigned to a cgroup (the group of the
creating task).

When changing the affinity of the io_wq thread via syscall, we must only
allow cpumasks within the limits defined by the cpuset controller of the
cgroup (if enabled).

Fixes: da64d6db3bd3 ("io_uring: One wqe per wq")
Signed-off-by: Felix Moessbauer <[email protected]>
---
 io_uring/io-wq.c | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/io_uring/io-wq.c b/io_uring/io-wq.c
index f1e7c670add8..c7055a8895d7 100644
--- a/io_uring/io-wq.c
+++ b/io_uring/io-wq.c
@@ -13,6 +13,7 @@
 #include <linux/slab.h>
 #include <linux/rculist_nulls.h>
 #include <linux/cpu.h>
+#include <linux/cpuset.h>
 #include <linux/task_work.h>
 #include <linux/audit.h>
 #include <linux/mmu_context.h>
@@ -1322,17 +1323,29 @@ static int io_wq_cpu_offline(unsigned int cpu, struct hlist_node *node)
 
 int io_wq_cpu_affinity(struct io_uring_task *tctx, cpumask_var_t mask)
 {
+	cpumask_var_t allowed_mask;
+	int ret = 0;
+
 	if (!tctx || !tctx->io_wq)
 		return -EINVAL;
 
+	if (!alloc_cpumask_var(&allowed_mask, GFP_KERNEL))
+		return -ENOMEM;
+
 	rcu_read_lock();
-	if (mask)
-		cpumask_copy(tctx->io_wq->cpu_mask, mask);
-	else
-		cpumask_copy(tctx->io_wq->cpu_mask, cpu_possible_mask);
+	cpuset_cpus_allowed(tctx->io_wq->task, allowed_mask);
+	if (mask) {
+		if (cpumask_subset(mask, allowed_mask))
+			cpumask_copy(tctx->io_wq->cpu_mask, mask);
+		else
+			ret = -EINVAL;
+	} else {
+		cpumask_copy(tctx->io_wq->cpu_mask, allowed_mask);
+	}
 	rcu_read_unlock();
 
-	return 0;
+	free_cpumask_var(allowed_mask);
+	return ret;
 }
 
 /*
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v3 2/2] io_uring/io-wq: inherit cpuset of cgroup in io worker
  2024-09-10 17:11 [PATCH v3 0/2] io_uring/io-wq: respect cgroup cpusets Felix Moessbauer
  2024-09-10 17:11 ` [PATCH v3 1/2] io_uring/io-wq: do not allow pinning outside of cpuset Felix Moessbauer
@ 2024-09-10 17:11 ` Felix Moessbauer
  2024-09-10 17:42   ` Waiman Long
  2024-09-11 13:28 ` [PATCH v3 0/2] io_uring/io-wq: respect cgroup cpusets Jens Axboe
  2 siblings, 1 reply; 6+ messages in thread
From: Felix Moessbauer @ 2024-09-10 17:11 UTC (permalink / raw)
  To: axboe
  Cc: asml.silence, linux-kernel, io-uring, cgroups, dqminh, longman,
	adriaan.schmidt, florian.bezdeka, Felix Moessbauer

The io worker threads are userland threads that just never exit to the
userland. By that, they are also assigned to a cgroup (the group of the
creating task).

When creating a new io worker, this worker should inherit the cpuset
of the cgroup.

Fixes: da64d6db3bd3 ("io_uring: One wqe per wq")
Signed-off-by: Felix Moessbauer <[email protected]>
---
 io_uring/io-wq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/io_uring/io-wq.c b/io_uring/io-wq.c
index c7055a8895d7..a38f36b68060 100644
--- a/io_uring/io-wq.c
+++ b/io_uring/io-wq.c
@@ -1168,7 +1168,7 @@ struct io_wq *io_wq_create(unsigned bounded, struct io_wq_data *data)
 
 	if (!alloc_cpumask_var(&wq->cpu_mask, GFP_KERNEL))
 		goto err;
-	cpumask_copy(wq->cpu_mask, cpu_possible_mask);
+	cpuset_cpus_allowed(data->task, wq->cpu_mask);
 	wq->acct[IO_WQ_ACCT_BOUND].max_workers = bounded;
 	wq->acct[IO_WQ_ACCT_UNBOUND].max_workers =
 				task_rlimit(current, RLIMIT_NPROC);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v3 2/2] io_uring/io-wq: inherit cpuset of cgroup in io worker
  2024-09-10 17:11 ` [PATCH v3 2/2] io_uring/io-wq: inherit cpuset of cgroup in io worker Felix Moessbauer
@ 2024-09-10 17:42   ` Waiman Long
  2024-09-11  7:02     ` MOESSBAUER, Felix
  0 siblings, 1 reply; 6+ messages in thread
From: Waiman Long @ 2024-09-10 17:42 UTC (permalink / raw)
  To: Felix Moessbauer, axboe
  Cc: asml.silence, linux-kernel, io-uring, cgroups, dqminh,
	adriaan.schmidt, florian.bezdeka


On 9/10/24 13:11, Felix Moessbauer wrote:
> The io worker threads are userland threads that just never exit to the
> userland. By that, they are also assigned to a cgroup (the group of the
> creating task).

The io-wq task is not actually assigned to a cgroup. To belong to a 
cgroup, its pid has to be present to the cgroup.procs of the 
corresponding cgroup, which is not the case here. My understanding is 
that you are just restricting the CPU affinity to follow the cpuset of 
the corresponding user task that creates it. The CPU affinity (cpumask) 
is just one of the many resources controlled by a cgroup. That probably 
needs to be clarified.

Besides cpumask, the cpuset controller also controls the node mask of 
the memory nodes allowed.

Cheers,
Longman

>
> When creating a new io worker, this worker should inherit the cpuset
> of the cgroup.
>
> Fixes: da64d6db3bd3 ("io_uring: One wqe per wq")
> Signed-off-by: Felix Moessbauer <[email protected]>
> ---
>   io_uring/io-wq.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/io_uring/io-wq.c b/io_uring/io-wq.c
> index c7055a8895d7..a38f36b68060 100644
> --- a/io_uring/io-wq.c
> +++ b/io_uring/io-wq.c
> @@ -1168,7 +1168,7 @@ struct io_wq *io_wq_create(unsigned bounded, struct io_wq_data *data)
>   
>   	if (!alloc_cpumask_var(&wq->cpu_mask, GFP_KERNEL))
>   		goto err;
> -	cpumask_copy(wq->cpu_mask, cpu_possible_mask);
> +	cpuset_cpus_allowed(data->task, wq->cpu_mask);
>   	wq->acct[IO_WQ_ACCT_BOUND].max_workers = bounded;
>   	wq->acct[IO_WQ_ACCT_UNBOUND].max_workers =
>   				task_rlimit(current, RLIMIT_NPROC);


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v3 2/2] io_uring/io-wq: inherit cpuset of cgroup in io worker
  2024-09-10 17:42   ` Waiman Long
@ 2024-09-11  7:02     ` MOESSBAUER, Felix
  0 siblings, 0 replies; 6+ messages in thread
From: MOESSBAUER, Felix @ 2024-09-11  7:02 UTC (permalink / raw)
  To: [email protected], [email protected]
  Cc: [email protected], Schmidt, Adriaan, Bezdeka, Florian,
	[email protected], [email protected],
	[email protected], [email protected]

On Tue, 2024-09-10 at 13:42 -0400, Waiman Long wrote:
> 
> On 9/10/24 13:11, Felix Moessbauer wrote:
> > The io worker threads are userland threads that just never exit to
> > the
> > userland. By that, they are also assigned to a cgroup (the group of
> > the
> > creating task).
> 
> The io-wq task is not actually assigned to a cgroup. To belong to a 
> cgroup, its pid has to be present to the cgroup.procs of the 
> corresponding cgroup, which is not the case here.

Hi, thanks for jumping in. As said, I'm not too familiar with the
internals of the io worker threads. Nonetheless, the kernel presents
the cgroup assignment quite consistently. This however contradicts your
statement from above. Example:

pid     tid
648460  648460  SCHED_OTHER   20  S    0  0-1  ./test/wq-aff.t
648460  648461  SCHED_OTHER   20  S    1  1    iou-sqp-648460
648460  648462  SCHED_OTHER   20  S    0  0    iou-wrk-648461

When I now check the cgroup.procs, I just see the 648460, which is
expected as this the process (with its main thread). Checking
cgroup.threads shows all three tids.

When checking the other way round, I get the same information:
$cat /proc/648460/task/648461/cgroup                                  
0::/user.slice/user-1000.slice/session-1.scope
$cat /proc/648460/task/648462/cgroup                                  
0::/user.slice/user-1000.slice/session-1.scope

Now I'm wondering if it is just presented incorrectly, or if these
tasks indeed belong to the mentioned cgroup?

> My understanding is
> that you are just restricting the CPU affinity to follow the cpuset
> of 
> the corresponding user task that creates it. The CPU affinity
> (cpumask) 
> is just one of the many resources controlled by a cgroup. That
> probably 
> needs to be clarified.

That's clear. Looking at the bigger picture, I want to ensure that the
io workers do not break out of the cgroup limits (I called it "ambient"
before, similar to the capabilites), because this breaks the isolation
assumption. In our case, we are mostly interested in not leaving the
cpuset, as we use that to perform system partitioning into realtime and
non realtime parts.

> 
> Besides cpumask, the cpuset controller also controls the node mask of
> the memory nodes allowed.

Yes, and that is especially important as some memory can be "closer" to
the IOs than others.

Best regards,
Felix

-- 
Siemens AG, Technology
Linux Expert Center



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v3 0/2] io_uring/io-wq: respect cgroup cpusets
  2024-09-10 17:11 [PATCH v3 0/2] io_uring/io-wq: respect cgroup cpusets Felix Moessbauer
  2024-09-10 17:11 ` [PATCH v3 1/2] io_uring/io-wq: do not allow pinning outside of cpuset Felix Moessbauer
  2024-09-10 17:11 ` [PATCH v3 2/2] io_uring/io-wq: inherit cpuset of cgroup in io worker Felix Moessbauer
@ 2024-09-11 13:28 ` Jens Axboe
  2 siblings, 0 replies; 6+ messages in thread
From: Jens Axboe @ 2024-09-11 13:28 UTC (permalink / raw)
  To: Felix Moessbauer
  Cc: asml.silence, linux-kernel, io-uring, cgroups, dqminh, longman,
	adriaan.schmidt, florian.bezdeka


On Tue, 10 Sep 2024 19:11:55 +0200, Felix Moessbauer wrote:
> this series continues the affinity cleanup work started in
> io_uring/sqpoll. It has been successfully tested against the liburing
> testsuite (make runtests), liburing @ caae94903d2e201.
> 
> The test wq-aff.t succeeds if at least cpu 0,1 are in
> the set and fails otherwise. This is expected, as the test wants
> to pin on these cpus. I'll send a patch for liburing to skip that test
> in case this pre-condition is not met.
> 
> [...]

Applied, thanks!

[1/2] io_uring/io-wq: do not allow pinning outside of cpuset
      commit: 0997aa5497c714edbb349ca366d28bd550ba3408
[2/2] io_uring/io-wq: inherit cpuset of cgroup in io worker
      commit: 84eacf177faa605853c58e5b1c0d9544b88c16fd

Best regards,
-- 
Jens Axboe




^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-09-11 13:28 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-10 17:11 [PATCH v3 0/2] io_uring/io-wq: respect cgroup cpusets Felix Moessbauer
2024-09-10 17:11 ` [PATCH v3 1/2] io_uring/io-wq: do not allow pinning outside of cpuset Felix Moessbauer
2024-09-10 17:11 ` [PATCH v3 2/2] io_uring/io-wq: inherit cpuset of cgroup in io worker Felix Moessbauer
2024-09-10 17:42   ` Waiman Long
2024-09-11  7:02     ` MOESSBAUER, Felix
2024-09-11 13:28 ` [PATCH v3 0/2] io_uring/io-wq: respect cgroup cpusets Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox