* [PATCH 1/2] io_uring/io-wq: do not allow pinning outside of cpuset
2024-09-10 14:33 [PATCH 0/2] io_uring/io-wq: respect cgroup cpusets Felix Moessbauer
@ 2024-09-10 14:33 ` Felix Moessbauer
2024-09-10 14:55 ` Jens Axboe
2024-09-10 14:33 ` [PATCH 2/2] io_uring/io-wq: limit io poller cpuset to ambient one Felix Moessbauer
2024-09-10 14:53 ` [PATCH 0/2] io_uring/io-wq: respect cgroup cpusets Jens Axboe
2 siblings, 1 reply; 10+ messages in thread
From: Felix Moessbauer @ 2024-09-10 14:33 UTC (permalink / raw)
To: axboe
Cc: asml.silence, linux-kernel, io-uring, cgroups, dqminh, longman,
adriaan.schmidt, florian.bezdeka, Felix Moessbauer
The io work queue polling threads are userland threads that just never
exit to the userland. By that, they are also assigned to a cgroup (the
group of the creating task).
When changing the affinity of the io_wq thread via syscall, we must only
allow cpumasks within the ambient limits. These are defined by the cpuset
controller of the cgroup (if enabled).
Fixes: da64d6db3bd3 ("io_uring: One wqe per wq")
Signed-off-by: Felix Moessbauer <[email protected]>
---
io_uring/io-wq.c | 23 ++++++++++++++++++-----
1 file changed, 18 insertions(+), 5 deletions(-)
diff --git a/io_uring/io-wq.c b/io_uring/io-wq.c
index f1e7c670add8..c7055a8895d7 100644
--- a/io_uring/io-wq.c
+++ b/io_uring/io-wq.c
@@ -13,6 +13,7 @@
 #include <linux/slab.h>
 #include <linux/rculist_nulls.h>
 #include <linux/cpu.h>
+#include <linux/cpuset.h>
 #include <linux/task_work.h>
 #include <linux/audit.h>
 #include <linux/mmu_context.h>
@@ -1322,17 +1323,29 @@ static int io_wq_cpu_offline(unsigned int cpu, struct hlist_node *node)
 int io_wq_cpu_affinity(struct io_uring_task *tctx, cpumask_var_t mask)
 {
+	cpumask_var_t allowed_mask;
+	int ret = 0;
+
 	if (!tctx || !tctx->io_wq)
 		return -EINVAL;
 
+	if (!alloc_cpumask_var(&allowed_mask, GFP_KERNEL))
+		return -ENOMEM;
+
 	rcu_read_lock();
-	if (mask)
-		cpumask_copy(tctx->io_wq->cpu_mask, mask);
-	else
-		cpumask_copy(tctx->io_wq->cpu_mask, cpu_possible_mask);
+	cpuset_cpus_allowed(tctx->io_wq->task, allowed_mask);
+	if (mask) {
+		if (cpumask_subset(mask, allowed_mask))
+			cpumask_copy(tctx->io_wq->cpu_mask, mask);
+		else
+			ret = -EINVAL;
+	} else {
+		cpumask_copy(tctx->io_wq->cpu_mask, allowed_mask);
+	}
 	rcu_read_unlock();
 
-	return 0;
+	free_cpumask_var(allowed_mask);
+	return ret;
 }
 
 /*
--
2.39.2
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] io_uring/io-wq: do not allow pinning outside of cpuset
2024-09-10 14:33 ` [PATCH 1/2] io_uring/io-wq: do not allow pinning outside of cpuset Felix Moessbauer
@ 2024-09-10 14:55 ` Jens Axboe
0 siblings, 0 replies; 10+ messages in thread
From: Jens Axboe @ 2024-09-10 14:55 UTC (permalink / raw)
To: Felix Moessbauer
Cc: asml.silence, linux-kernel, io-uring, cgroups, dqminh, longman,
adriaan.schmidt, florian.bezdeka
On 9/10/24 8:33 AM, Felix Moessbauer wrote:
> The io work queue polling threads are userland threads that just never
> exit to the userland. By that, they are also assigned to a cgroup (the
> group of the creating task).
They are not polling threads, they are just worker threads.
> When changing the affinity of the io_wq thread via syscall, we must only
> allow cpumasks within the ambient limits. These are defined by the cpuset
> controller of the cgroup (if enabled).
ambient limits? Not quite sure that's the correct term to use here.
Outside of commit message oddities, change looks good.
--
Jens Axboe
* [PATCH 2/2] io_uring/io-wq: limit io poller cpuset to ambient one
2024-09-10 14:33 [PATCH 0/2] io_uring/io-wq: respect cgroup cpusets Felix Moessbauer
2024-09-10 14:33 ` [PATCH 1/2] io_uring/io-wq: do not allow pinning outside of cpuset Felix Moessbauer
@ 2024-09-10 14:33 ` Felix Moessbauer
2024-09-10 14:55 ` Jens Axboe
2024-09-10 14:53 ` [PATCH 0/2] io_uring/io-wq: respect cgroup cpusets Jens Axboe
2 siblings, 1 reply; 10+ messages in thread
From: Felix Moessbauer @ 2024-09-10 14:33 UTC (permalink / raw)
To: axboe
Cc: asml.silence, linux-kernel, io-uring, cgroups, dqminh, longman,
adriaan.schmidt, florian.bezdeka, Felix Moessbauer
The io work queue polling threads are userland threads that just never
exit to the userland. By that, they are also assigned to a cgroup (the
group of the creating task).
When creating a new io poller, this poller should inherit the cpu limits
of the cgroup, as it belongs to the cgroup of the creating task.
Fixes: da64d6db3bd3 ("io_uring: One wqe per wq")
Signed-off-by: Felix Moessbauer <[email protected]>
---
io_uring/io-wq.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/io_uring/io-wq.c b/io_uring/io-wq.c
index c7055a8895d7..a38f36b68060 100644
--- a/io_uring/io-wq.c
+++ b/io_uring/io-wq.c
@@ -1168,7 +1168,7 @@ struct io_wq *io_wq_create(unsigned bounded, struct io_wq_data *data)
 	if (!alloc_cpumask_var(&wq->cpu_mask, GFP_KERNEL))
 		goto err;
-	cpumask_copy(wq->cpu_mask, cpu_possible_mask);
+	cpuset_cpus_allowed(data->task, wq->cpu_mask);
 	wq->acct[IO_WQ_ACCT_BOUND].max_workers = bounded;
 	wq->acct[IO_WQ_ACCT_UNBOUND].max_workers =
 				task_rlimit(current, RLIMIT_NPROC);
--
2.39.2
* Re: [PATCH 2/2] io_uring/io-wq: limit io poller cpuset to ambient one
2024-09-10 14:33 ` [PATCH 2/2] io_uring/io-wq: limit io poller cpuset to ambient one Felix Moessbauer
@ 2024-09-10 14:55 ` Jens Axboe
0 siblings, 0 replies; 10+ messages in thread
From: Jens Axboe @ 2024-09-10 14:55 UTC (permalink / raw)
To: Felix Moessbauer
Cc: asml.silence, linux-kernel, io-uring, cgroups, dqminh, longman,
adriaan.schmidt, florian.bezdeka
On 9/10/24 8:33 AM, Felix Moessbauer wrote:
> The io work queue polling threads are userland threads that just never
> exit to the userland. By that, they are also assigned to a cgroup (the
> group of the creating task).
>
> When creating a new io poller, this poller should inherit the cpu limits
> of the cgroup, as it belongs to the cgroup of the creating task.
Same comment on polling threads and the use of ambient.
--
Jens Axboe
* Re: [PATCH 0/2] io_uring/io-wq: respect cgroup cpusets
2024-09-10 14:33 [PATCH 0/2] io_uring/io-wq: respect cgroup cpusets Felix Moessbauer
2024-09-10 14:33 ` [PATCH 1/2] io_uring/io-wq: do not allow pinning outside of cpuset Felix Moessbauer
2024-09-10 14:33 ` [PATCH 2/2] io_uring/io-wq: limit io poller cpuset to ambient one Felix Moessbauer
@ 2024-09-10 14:53 ` Jens Axboe
2024-09-10 15:08 ` MOESSBAUER, Felix
2 siblings, 1 reply; 10+ messages in thread
From: Jens Axboe @ 2024-09-10 14:53 UTC (permalink / raw)
To: Felix Moessbauer
Cc: asml.silence, linux-kernel, io-uring, cgroups, dqminh, longman,
adriaan.schmidt, florian.bezdeka
On 9/10/24 8:33 AM, Felix Moessbauer wrote:
> Hi,
>
> this series continues the affinity cleanup work started in
> io_uring/sqpoll. It has been tested against the liburing testsuite
> (make runtests), whereby the read-mshot test always fails:
>
> Running test read-mshot.t
> Buffer ring register failed -22
> test_inc 0 0 failed
> Test read-mshot.t failed with ret 1
>
> However, this test also fails on a non-patched linux-next @
> bc83b4d1f086.
That sounds very odd... What liburing are you using? On old kernels
where provided buffer rings aren't available the test should just skip,
new ones it should pass. Only thing I can think of is that your liburing
repo isn't current?
> The test wq-aff.t succeeds if at least cpu 0,1 are in the set and
> fails otherwise. This is expected, as the test wants to pin on these
> cpus. I'll send a patch for liburing to skip that test in case this
> pre-condition is not met.
>
> Regarding backporting: I would like to backport these patches to 6.1 as
> well, as they affect our realtime applications. However, in-between 6.1
> and next there is a major change da64d6db3bd3 ("io_uring: One wqe per
> wq"), which makes the backport tricky. While I don't think we want to
> backport this change, would a dedicated backport of the two pinning
> patches for the old multi-queue implementation have a chance to be accepted?
Let's not backport that patch, just because it's pretty invasive. It's
fine to have a separate backport patch of them for -stable, in this case
we'll have one version for stable kernels new enough to have that
change, and one for older versions. Thankfully not that many to care
about.
--
Jens Axboe
* Re: [PATCH 0/2] io_uring/io-wq: respect cgroup cpusets
2024-09-10 14:53 ` [PATCH 0/2] io_uring/io-wq: respect cgroup cpusets Jens Axboe
@ 2024-09-10 15:08 ` MOESSBAUER, Felix
2024-09-10 15:17 ` Jens Axboe
0 siblings, 1 reply; 10+ messages in thread
From: MOESSBAUER, Felix @ 2024-09-10 15:08 UTC (permalink / raw)
To: [email protected]
Cc: [email protected], [email protected],
Bezdeka, Florian, [email protected],
[email protected], [email protected], Schmidt, Adriaan,
[email protected]
On Tue, 2024-09-10 at 08:53 -0600, Jens Axboe wrote:
> On 9/10/24 8:33 AM, Felix Moessbauer wrote:
> > Hi,
> >
> > this series continues the affinity cleanup work started in
> > io_uring/sqpoll. It has been tested against the liburing testsuite
> > (make runtests), whereby the read-mshot test always fails:
> >
> > Running test read-mshot.t
> > Buffer ring register failed -22
> > test_inc 0 0
> > failed
> >
> > Test read-mshot.t failed with ret 1
> >
> > However, this test also fails on a non-patched linux-next @
> > bc83b4d1f086.
>
> That sounds very odd... What liburing are you using? On old kernels
> where provided buffer rings aren't available the test should just
> skip,
> new ones it should pass. Only thing I can think of is that your
> liburing
> repo isn't current?
Hmm... I tested against
https://github.com/axboe/liburing/commit/74fefa1b51ee35a2014ca6e7667d7c10e9c5b06f
I'll redo the test against the unpatched kernel to be 100% sure that it
is not related to my patches. The -22 is likely an -EINVAL.
>
> > The test wq-aff.t succeeds if at least cpu 0,1 are in the set and
> > fails otherwise. This is expected, as the test wants to pin on
> > these
> > cpus. I'll send a patch for liburing to skip that test in case this
> > pre-condition is not met.
> >
> > Regarding backporting: I would like to backport these patches to
> > 6.1 as
> > well, as they affect our realtime applications. However, in-between
> > 6.1
> > and next there is a major change da64d6db3bd3 ("io_uring: One wqe
> > per
> > wq"), which makes the backport tricky. While I don't think we want
> > to
> > backport this change, would a dedicated backport of the two pinning
> > patches for the old multi-queue implementation have a chance to be
> > accepted?
>
> Let's not backport that patch, just because it's pretty invasive.
> It's
> fine to have a separate backport patch of them for -stable, in this
> case
> we'll have one version for stable kernels new enough to have that
> change, and one for older versions. Thankfully not that many to care
> about.
Ok, that is fine for me. Then let's first get things right in this
series and then I'll send the backport.
Best regards,
Felix
>
--
Siemens AG, Technology
Linux Expert Center
* Re: [PATCH 0/2] io_uring/io-wq: respect cgroup cpusets
2024-09-10 15:08 ` MOESSBAUER, Felix
@ 2024-09-10 15:17 ` Jens Axboe
2024-09-10 15:37 ` MOESSBAUER, Felix
0 siblings, 1 reply; 10+ messages in thread
From: Jens Axboe @ 2024-09-10 15:17 UTC (permalink / raw)
To: MOESSBAUER, Felix
Cc: [email protected], [email protected],
Bezdeka, Florian, [email protected],
[email protected], [email protected], Schmidt, Adriaan,
[email protected]
On 9/10/24 9:08 AM, MOESSBAUER, Felix wrote:
> On Tue, 2024-09-10 at 08:53 -0600, Jens Axboe wrote:
>> On 9/10/24 8:33 AM, Felix Moessbauer wrote:
>>> Hi,
>>>
>>> this series continues the affinity cleanup work started in
>>> io_uring/sqpoll. It has been tested against the liburing testsuite
>>> (make runtests), whereby the read-mshot test always fails:
>>>
>>> Running test read-mshot.t
>>> Buffer ring register failed -22
>>> test_inc 0 0
>>> failed
>>>
>>> Test read-mshot.t failed with ret 1
>>>
>>> However, this test also fails on a non-patched linux-next @
>>> bc83b4d1f086.
>>
>> That sounds very odd... What liburing are you using? On old kernels
>> where provided buffer rings aren't available the test should just
>> skip,
>> new ones it should pass. Only thing I can think of is that your
>> liburing
>> repo isn't current?
>
> Hmm... I tested against
> https://github.com/axboe/liburing/commit/74fefa1b51ee35a2014ca6e7667d7c10e9c5b06f
That should certainly be fine.
> I'll redo the test against the unpatched kernel to be 100% sure that it
> is not related to my patches. The -22 is likely an -EINVAL.
I'd be highly surprised if it's related to your patches! Here's what I
get on the current kernel:
axboe@m2max-kvm ~/g/liburing (master)> test/read-mshot.t
axboe@m2max-kvm ~/g/liburing (master)> echo $status
0
and on an older 6.6-stable that doesn't support it:
axboe@m2max-kvm ~/g/liburing (master)> test/read-mshot.t
skip
axboe@m2max-kvm ~/g/liburing (master) [77]> echo $status
77
and then I tried 6.1 since that seems to be your base and get the same
result as 6.6-stable. So not quite sure why it fails on your end, but in
any case, I pushed a commit that I think will sort it for you.
>>> The test wq-aff.t succeeds if at least cpu 0,1 are in the set and
>>> fails otherwise. This is expected, as the test wants to pin on
>>> these
>>> cpus. I'll send a patch for liburing to skip that test in case this
>>> pre-condition is not met.
>>>
>>> Regarding backporting: I would like to backport these patches to
>>> 6.1 as
>>> well, as they affect our realtime applications. However, in-between
>>> 6.1
>>> and next there is a major change da64d6db3bd3 ("io_uring: One wqe
>>> per
>>> wq"), which makes the backport tricky. While I don't think we want
>>> to
>>> backport this change, would a dedicated backport of the two pinning
>>> patches for the old multi-queue implementation have a chance to be
>>> accepted?
>>
>> Let's not backport that patch, just because it's pretty invasive.
>> It's
>> fine to have a separate backport patch of them for -stable, in this
>> case
>> we'll have one version for stable kernels new enough to have that
>> change, and one for older versions. Thankfully not that many to care
>> about.
>
> Ok, that is fine for me. Then let's first get things right in this
> series and then I'll send the backport.
Exactly, that's the plan. Thanks!
--
Jens Axboe
* Re: [PATCH 0/2] io_uring/io-wq: respect cgroup cpusets
2024-09-10 15:17 ` Jens Axboe
@ 2024-09-10 15:37 ` MOESSBAUER, Felix
2024-09-10 15:39 ` Jens Axboe
0 siblings, 1 reply; 10+ messages in thread
From: MOESSBAUER, Felix @ 2024-09-10 15:37 UTC (permalink / raw)
To: [email protected]
Cc: [email protected], [email protected],
Bezdeka, Florian, [email protected],
[email protected], [email protected], [email protected],
Schmidt, Adriaan
On Tue, 2024-09-10 at 09:17 -0600, Jens Axboe wrote:
> On 9/10/24 9:08 AM, MOESSBAUER, Felix wrote:
> > On Tue, 2024-09-10 at 08:53 -0600, Jens Axboe wrote:
> > > On 9/10/24 8:33 AM, Felix Moessbauer wrote:
> > > > Hi,
> > > >
> > > > this series continues the affinity cleanup work started in
> > > > io_uring/sqpoll. It has been tested against the liburing
> > > > testsuite
> > > > (make runtests), whereby the read-mshot test always fails:
> > > >
> > > > Running test read-mshot.t
> > > > Buffer ring register failed -22
> > > > test_inc 0 0
> > > > failed
> > > >
> > > >
> > > > Test read-mshot.t failed with ret 1
> > > >
> > > > However, this test also fails on a non-patched linux-next @
> > > > bc83b4d1f086.
> > >
> > > That sounds very odd... What liburing are you using? On old
> > > kernels
> > > where provided buffer rings aren't available the test should just
> > > skip,
> > > new ones it should pass. Only thing I can think of is that your
> > > liburing
> > > repo isn't current?
> >
> > Hmm... I tested against
> > https://github.com/axboe/liburing/commit/74fefa1b51ee35a2014ca6e7667d7c10e9c5b06f
>
> That should certainly be fine.
>
> > I'll redo the test against the unpatched kernel to be 100% sure
> > that it
> > is not related to my patches. The -22 is likely an -EINVAL.
>
> I'd be highly surprised if it's related to your patches! Here's what
> I
> get on the current kernel:
>
> axboe@m2max-kvm ~/g/liburing (master)> test/read-mshot.t
> axboe@m2max-kvm ~/g/liburing (master)> echo $status
Without your patches for liburing, this test definitely fails on linux-
next @ bc83b4d1f086 (in qemu). Same error as above. Some more
information:
$ uname -a
Linux test-iou 6.11.0-rc7 #1 SMP PREEMPT_DYNAMIC Thu, 01 Jan 1970
01:00:00 +0000 x86_64 GNU/Linux
Strange...
> 0
>
> and on an older 6.6-stable that doesn't support it:
>
> axboe@m2max-kvm ~/g/liburing (master)> test/read-mshot.t
> skip
> axboe@m2max-kvm ~/g/liburing (master) [77]> echo $status
> 77
>
> and then I tried 6.1 since that seems to be your base and get the
> same
> result as 6.6-stable. So not quite sure why it fails on your end, but
> in
> any case, I pushed a commit that I think will sort it for you.
With liburing@3505047a35df and my kernel patches, all tests pass.
Given that, I assume my patches themselves are fine. I'll just update
the commit messages to fix the oddities and send a functionally
identical v2.
Felix
--
Siemens AG, Technology
Linux Expert Center
* Re: [PATCH 0/2] io_uring/io-wq: respect cgroup cpusets
2024-09-10 15:37 ` MOESSBAUER, Felix
@ 2024-09-10 15:39 ` Jens Axboe
0 siblings, 0 replies; 10+ messages in thread
From: Jens Axboe @ 2024-09-10 15:39 UTC (permalink / raw)
To: MOESSBAUER, Felix
Cc: [email protected], [email protected],
Bezdeka, Florian, [email protected],
[email protected], [email protected], [email protected],
Schmidt, Adriaan
On 9/10/24 9:37 AM, MOESSBAUER, Felix wrote:
> On Tue, 2024-09-10 at 09:17 -0600, Jens Axboe wrote:
>> On 9/10/24 9:08 AM, MOESSBAUER, Felix wrote:
>>> On Tue, 2024-09-10 at 08:53 -0600, Jens Axboe wrote:
>>>> On 9/10/24 8:33 AM, Felix Moessbauer wrote:
>>>>> Hi,
>>>>>
>>>>> this series continues the affinity cleanup work started in
>>>>> io_uring/sqpoll. It has been tested against the liburing
>>>>> testsuite
>>>>> (make runtests), whereby the read-mshot test always fails:
>>>>>
>>>>> Running test read-mshot.t
>>>>> Buffer ring register failed -22
>>>>> test_inc 0 0
>>>>> failed
>>>>>
>>>>>
>>>>> Test read-mshot.t failed with ret 1
>>>>>
>>>>> However, this test also fails on a non-patched linux-next @
>>>>> bc83b4d1f086.
>>>>
>>>> That sounds very odd... What liburing are you using? On old
>>>> kernels
>>>> where provided buffer rings aren't available the test should just
>>>> skip,
>>>> new ones it should pass. Only thing I can think of is that your
>>>> liburing
>>>> repo isn't current?
>>>
>>> Hmm... I tested against
>>> https://github.com/axboe/liburing/commit/74fefa1b51ee35a2014ca6e7667d7c10e9c5b06f
>>
>> That should certainly be fine.
>>
>>> I'll redo the test against the unpatched kernel to be 100% sure
>>> that it
>>> is not related to my patches. The -22 is likely an -EINVAL.
>>
>> I'd be highly surprised if it's related to your patches! Here's what
>> I
>> get on the current kernel:
>>
>> axboe@m2max-kvm ~/g/liburing (master)> test/read-mshot.t
>> axboe@m2max-kvm ~/g/liburing (master)> echo $status
>
> Without your patches for liburing, this test definitely fails on linux-
> next @ bc83b4d1f086 (in qemu). Same error as above. Some more
> information:
> $ uname -a
> Linux test-iou 6.11.0-rc7 #1 SMP PREEMPT_DYNAMIC Thu, 01 Jan 1970
> 01:00:00 +0000 x86_64 GNU/Linux
>
> Strange...
It could just be that I never tested that version on a kernel that has
support for ring provided buffers, but not for incrementally consumed
ones. Though that should be in -next for a while now, so even that
doesn't make sense... Oh well, should work now.
> By that, I assume my patches themselves are fine. I'll just update the
> commit messages to fix the oddities and send a functionally identical
> v2.
Sounds good, thanks.
--
Jens Axboe