* [PATCH v3 0/2] io_uring/io-wq: respect cgroup cpusets @ 2024-09-10 17:11 Felix Moessbauer 2024-09-10 17:11 ` [PATCH v3 1/2] io_uring/io-wq: do not allow pinning outside of cpuset Felix Moessbauer ` (2 more replies) 0 siblings, 3 replies; 6+ messages in thread From: Felix Moessbauer @ 2024-09-10 17:11 UTC (permalink / raw) To: axboe Cc: asml.silence, linux-kernel, io-uring, cgroups, dqminh, longman, adriaan.schmidt, florian.bezdeka, Felix Moessbauer Hi, this series continues the affinity cleanup work started in io_uring/sqpoll. It has been successfully tested against the liburing testsuite (make runtests), liburing @ caae94903d2e201. The test wq-aff.t succeeds if at least cpu 0,1 are in the set and fails otherwise. This is expected, as the test wants to pin on these cpus. I'll send a patch for liburing to skip that test in case this pre-condition is not met. Changes since v2: - rework commit message (remove ambient and worker from title as well). Sorry for the noise. - no functional changes Changes since v1: - rework commit messages (don't use ambient cpus, wq threads are no pollers) - no functional changes Best regards, Felix Moessbauer Siemens AG Felix Moessbauer (2): io_uring/io-wq: do not allow pinning outside of cpuset io_uring/io-wq: inherit cpuset of cgroup in io worker io_uring/io-wq.c | 25 +++++++++++++++++++------ 1 file changed, 19 insertions(+), 6 deletions(-) -- 2.39.2 ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH v3 1/2] io_uring/io-wq: do not allow pinning outside of cpuset 2024-09-10 17:11 [PATCH v3 0/2] io_uring/io-wq: respect cgroup cpusets Felix Moessbauer @ 2024-09-10 17:11 ` Felix Moessbauer 2024-09-10 17:11 ` [PATCH v3 2/2] io_uring/io-wq: inherit cpuset of cgroup in io worker Felix Moessbauer 2024-09-11 13:28 ` [PATCH v3 0/2] io_uring/io-wq: respect cgroup cpusets Jens Axboe 2 siblings, 0 replies; 6+ messages in thread From: Felix Moessbauer @ 2024-09-10 17:11 UTC (permalink / raw) To: axboe Cc: asml.silence, linux-kernel, io-uring, cgroups, dqminh, longman, adriaan.schmidt, florian.bezdeka, Felix Moessbauer The io worker threads are userland threads that just never exit to the userland. By that, they are also assigned to a cgroup (the group of the creating task). When changing the affinity of the io_wq thread via syscall, we must only allow cpumasks within the limits defined by the cpuset controller of the cgroup (if enabled). Fixes: da64d6db3bd3 ("io_uring: One wqe per wq") Signed-off-by: Felix Moessbauer <[email protected]> --- io_uring/io-wq.c | 23 ++++++++++++++++++----- 1 file changed, 18 insertions(+), 5 deletions(-) diff --git a/io_uring/io-wq.c b/io_uring/io-wq.c index f1e7c670add8..c7055a8895d7 100644 --- a/io_uring/io-wq.c +++ b/io_uring/io-wq.c @@ -13,6 +13,7 @@ #include <linux/slab.h> #include <linux/rculist_nulls.h> #include <linux/cpu.h> +#include <linux/cpuset.h> #include <linux/task_work.h> #include <linux/audit.h> #include <linux/mmu_context.h> @@ -1322,17 +1323,29 @@ static int io_wq_cpu_offline(unsigned int cpu, struct hlist_node *node) int io_wq_cpu_affinity(struct io_uring_task *tctx, cpumask_var_t mask) { + cpumask_var_t allowed_mask; + int ret = 0; + if (!tctx || !tctx->io_wq) return -EINVAL; + if (!alloc_cpumask_var(&allowed_mask, GFP_KERNEL)) + return -ENOMEM; + rcu_read_lock(); - if (mask) - cpumask_copy(tctx->io_wq->cpu_mask, mask); - else - cpumask_copy(tctx->io_wq->cpu_mask, cpu_possible_mask); + cpuset_cpus_allowed(tctx->io_wq->task, allowed_mask); + if (mask) { + if (cpumask_subset(mask, allowed_mask)) + cpumask_copy(tctx->io_wq->cpu_mask, mask); + else + ret = -EINVAL; + } else { + cpumask_copy(tctx->io_wq->cpu_mask, allowed_mask); + } rcu_read_unlock(); - return 0; + free_cpumask_var(allowed_mask); + return ret; } /* -- 2.39.2 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH v3 2/2] io_uring/io-wq: inherit cpuset of cgroup in io worker 2024-09-10 17:11 [PATCH v3 0/2] io_uring/io-wq: respect cgroup cpusets Felix Moessbauer 2024-09-10 17:11 ` [PATCH v3 1/2] io_uring/io-wq: do not allow pinning outside of cpuset Felix Moessbauer @ 2024-09-10 17:11 ` Felix Moessbauer 2024-09-10 17:42 ` Waiman Long 2024-09-11 13:28 ` [PATCH v3 0/2] io_uring/io-wq: respect cgroup cpusets Jens Axboe 2 siblings, 1 reply; 6+ messages in thread From: Felix Moessbauer @ 2024-09-10 17:11 UTC (permalink / raw) To: axboe Cc: asml.silence, linux-kernel, io-uring, cgroups, dqminh, longman, adriaan.schmidt, florian.bezdeka, Felix Moessbauer The io worker threads are userland threads that just never exit to the userland. By that, they are also assigned to a cgroup (the group of the creating task). When creating a new io worker, this worker should inherit the cpuset of the cgroup. Fixes: da64d6db3bd3 ("io_uring: One wqe per wq") Signed-off-by: Felix Moessbauer <[email protected]> --- io_uring/io-wq.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/io_uring/io-wq.c b/io_uring/io-wq.c index c7055a8895d7..a38f36b68060 100644 --- a/io_uring/io-wq.c +++ b/io_uring/io-wq.c @@ -1168,7 +1168,7 @@ struct io_wq *io_wq_create(unsigned bounded, struct io_wq_data *data) if (!alloc_cpumask_var(&wq->cpu_mask, GFP_KERNEL)) goto err; - cpumask_copy(wq->cpu_mask, cpu_possible_mask); + cpuset_cpus_allowed(data->task, wq->cpu_mask); wq->acct[IO_WQ_ACCT_BOUND].max_workers = bounded; wq->acct[IO_WQ_ACCT_UNBOUND].max_workers = task_rlimit(current, RLIMIT_NPROC); -- 2.39.2 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH v3 2/2] io_uring/io-wq: inherit cpuset of cgroup in io worker 2024-09-10 17:11 ` [PATCH v3 2/2] io_uring/io-wq: inherit cpuset of cgroup in io worker Felix Moessbauer @ 2024-09-10 17:42 ` Waiman Long 2024-09-11 7:02 ` MOESSBAUER, Felix 0 siblings, 1 reply; 6+ messages in thread From: Waiman Long @ 2024-09-10 17:42 UTC (permalink / raw) To: Felix Moessbauer, axboe Cc: asml.silence, linux-kernel, io-uring, cgroups, dqminh, adriaan.schmidt, florian.bezdeka On 9/10/24 13:11, Felix Moessbauer wrote: > The io worker threads are userland threads that just never exit to the > userland. By that, they are also assigned to a cgroup (the group of the > creating task). The io-wq task is not actually assigned to a cgroup. To belong to a cgroup, its pid has to be present to the cgroup.procs of the corresponding cgroup, which is not the case here. My understanding is that you are just restricting the CPU affinity to follow the cpuset of the corresponding user task that creates it. The CPU affinity (cpumask) is just one of the many resources controlled by a cgroup. That probably needs to be clarified. Besides cpumask, the cpuset controller also controls the node mask of the memory nodes allowed. Cheers, Longman > > When creating a new io worker, this worker should inherit the cpuset > of the cgroup. > > Fixes: da64d6db3bd3 ("io_uring: One wqe per wq") > Signed-off-by: Felix Moessbauer <[email protected]> > --- > io_uring/io-wq.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/io_uring/io-wq.c b/io_uring/io-wq.c > index c7055a8895d7..a38f36b68060 100644 > --- a/io_uring/io-wq.c > +++ b/io_uring/io-wq.c > @@ -1168,7 +1168,7 @@ struct io_wq *io_wq_create(unsigned bounded, struct io_wq_data *data) > > if (!alloc_cpumask_var(&wq->cpu_mask, GFP_KERNEL)) > goto err; > - cpumask_copy(wq->cpu_mask, cpu_possible_mask); > + cpuset_cpus_allowed(data->task, wq->cpu_mask); > wq->acct[IO_WQ_ACCT_BOUND].max_workers = bounded; > wq->acct[IO_WQ_ACCT_UNBOUND].max_workers = > task_rlimit(current, RLIMIT_NPROC); ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v3 2/2] io_uring/io-wq: inherit cpuset of cgroup in io worker 2024-09-10 17:42 ` Waiman Long @ 2024-09-11 7:02 ` MOESSBAUER, Felix 0 siblings, 0 replies; 6+ messages in thread From: MOESSBAUER, Felix @ 2024-09-11 7:02 UTC (permalink / raw) To: [email protected], [email protected] Cc: [email protected], Schmidt, Adriaan, Bezdeka, Florian, [email protected], [email protected], [email protected], [email protected] On Tue, 2024-09-10 at 13:42 -0400, Waiman Long wrote: > > On 9/10/24 13:11, Felix Moessbauer wrote: > > The io worker threads are userland threads that just never exit to > > the > > userland. By that, they are also assigned to a cgroup (the group of > > the > > creating task). > > The io-wq task is not actually assigned to a cgroup. To belong to a > cgroup, its pid has to be present to the cgroup.procs of the > corresponding cgroup, which is not the case here. Hi, thanks for jumping in. As said, I'm not too familiar with the internals of the io worker threads. Nonetheless, the kernel presents the cgroup assignment quite consistently. This however contradicts your statement from above. Example: pid tid 648460 648460 SCHED_OTHER 20 S 0 0-1 ./test/wq-aff.t 648460 648461 SCHED_OTHER 20 S 1 1 iou-sqp-648460 648460 648462 SCHED_OTHER 20 S 0 0 iou-wrk-648461 When I now check the cgroup.procs, I just see the 648460, which is expected as this the process (with its main thread). Checking cgroup.threads shows all three tids. When checking the other way round, I get the same information: $cat /proc/648460/task/648461/cgroup 0::/user.slice/user-1000.slice/session-1.scope $cat /proc/648460/task/648462/cgroup 0::/user.slice/user-1000.slice/session-1.scope Now I'm wondering if it is just presented incorrectly, or if these tasks indeed belong to the mentioned cgroup? > My understanding is > that you are just restricting the CPU affinity to follow the cpuset > of > the corresponding user task that creates it. The CPU affinity > (cpumask) > is just one of the many resources controlled by a cgroup. That > probably > needs to be clarified. That's clear. Looking at the bigger picture, I want to ensure that the io workers do not break out of the cgroup limits (I called it "ambient" before, similar to the capabilites), because this breaks the isolation assumption. In our case, we are mostly interested in not leaving the cpuset, as we use that to perform system partitioning into realtime and non realtime parts. > > Besides cpumask, the cpuset controller also controls the node mask of > the memory nodes allowed. Yes, and that is especially important as some memory can be "closer" to the IOs than others. Best regards, Felix -- Siemens AG, Technology Linux Expert Center ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v3 0/2] io_uring/io-wq: respect cgroup cpusets 2024-09-10 17:11 [PATCH v3 0/2] io_uring/io-wq: respect cgroup cpusets Felix Moessbauer 2024-09-10 17:11 ` [PATCH v3 1/2] io_uring/io-wq: do not allow pinning outside of cpuset Felix Moessbauer 2024-09-10 17:11 ` [PATCH v3 2/2] io_uring/io-wq: inherit cpuset of cgroup in io worker Felix Moessbauer @ 2024-09-11 13:28 ` Jens Axboe 2 siblings, 0 replies; 6+ messages in thread From: Jens Axboe @ 2024-09-11 13:28 UTC (permalink / raw) To: Felix Moessbauer Cc: asml.silence, linux-kernel, io-uring, cgroups, dqminh, longman, adriaan.schmidt, florian.bezdeka On Tue, 10 Sep 2024 19:11:55 +0200, Felix Moessbauer wrote: > this series continues the affinity cleanup work started in > io_uring/sqpoll. It has been successfully tested against the liburing > testsuite (make runtests), liburing @ caae94903d2e201. > > The test wq-aff.t succeeds if at least cpu 0,1 are in > the set and fails otherwise. This is expected, as the test wants > to pin on these cpus. I'll send a patch for liburing to skip that test > in case this pre-condition is not met. > > [...] Applied, thanks! [1/2] io_uring/io-wq: do not allow pinning outside of cpuset commit: 0997aa5497c714edbb349ca366d28bd550ba3408 [2/2] io_uring/io-wq: inherit cpuset of cgroup in io worker commit: 84eacf177faa605853c58e5b1c0d9544b88c16fd Best regards, -- Jens Axboe ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2024-09-11 13:28 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-09-10 17:11 [PATCH v3 0/2] io_uring/io-wq: respect cgroup cpusets Felix Moessbauer 2024-09-10 17:11 ` [PATCH v3 1/2] io_uring/io-wq: do not allow pinning outside of cpuset Felix Moessbauer 2024-09-10 17:11 ` [PATCH v3 2/2] io_uring/io-wq: inherit cpuset of cgroup in io worker Felix Moessbauer 2024-09-10 17:42 ` Waiman Long 2024-09-11 7:02 ` MOESSBAUER, Felix 2024-09-11 13:28 ` [PATCH v3 0/2] io_uring/io-wq: respect cgroup cpusets Jens Axboe
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox