* [PATCH for-next] io_uring/query: check for loops in in_query()
From: Jens Axboe @ 2025-09-11 0:13 UTC (permalink / raw)
To: io-uring; +Cc: Pavel Begunkov
io_query() loops over query items that the application or liburing
passes in. But it has no checking for a max number of items, or if a
loop could be present. If someone were to do:
struct io_uring_query_hdr hdr1, hdr2, hdr3;
hdr3.next_entry = &hdr1;
hdr2.next_entry = &hdr3;
hdr1.next_entry = &hdr2;
io_uring_register(fd, IORING_REGISTER_QUERY, &hdr1, 0);
then it'll happily loop forever and process hdr1 -> hdr2 -> hdr3 and
then loop back to hdr1.
Add a cap for these kinds of cases, arbitrarily set to 1024. Since
there's now a cap, it seems that it would be saner to have this
interface return the number of items processed. E.g. 0..N for success,
and < 0 for an error. Then if someone does need to query more than the
supported number of items, they can do so iteratively.
Fixes: c265ae75f900 ("io_uring: introduce io_uring querying")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
This would obviously need liburing changes too, but not a big deal
since a) the changes aren't in liburing yet, and b) this is still
unreleased code.
diff --git a/io_uring/query.c b/io_uring/query.c
index 9eed0f371956..99402e8e0a4a 100644
--- a/io_uring/query.c
+++ b/io_uring/query.c
@@ -70,11 +70,14 @@ static int io_handle_query_entry(struct io_ring_ctx *ctx,
return 0;
}
+/*
+ * Returns number of items processed, or < 0 in case of error.
+ */
int io_query(struct io_ring_ctx *ctx, void __user *arg, unsigned nr_args)
{
char entry_buffer[IO_MAX_QUERY_SIZE];
void __user *uhdr = arg;
- int ret;
+ int ret, nr = 0;
memset(entry_buffer, 0, sizeof(entry_buffer));
@@ -86,8 +89,11 @@ int io_query(struct io_ring_ctx *ctx, void __user *arg, unsigned nr_args)
ret = io_handle_query_entry(ctx, entry_buffer, uhdr, &next_hdr);
if (ret)
- return ret;
+ break;
uhdr = u64_to_user_ptr(next_hdr);
+ /* Have some limit to avoid a potential loop */
+ if (++nr >= 1024)
+ break;
}
- return 0;
+ return nr ?: ret;
}
--
Jens Axboe
* Re: [PATCH for-next] io_uring/query: check for loops in in_query()
From: Pavel Begunkov @ 2025-09-11 9:02 UTC (permalink / raw)
To: Jens Axboe, io-uring
On 9/11/25 01:13, Jens Axboe wrote:
> io_query() loops over query items that the application or liburing
> passes in. But it has no checking for a max number of items, or if a
> loop could be present. If someone were to do:
>
> struct io_uring_query_hdr hdr1, hdr2, hdr3;
>
> hdr3.next_entry = &hdr1;
> hdr2.next_entry = &hdr3;
> hdr1.next_entry = &hdr2;
>
> io_uring_register(fd, IORING_REGISTER_QUERY, &hdr1, 0);
>
> then it'll happily loop forever and process hdr1 -> hdr2 -> hdr3 and
> then loop back to hdr1.
>
> Add a max cap for these kinds of cases, which is arbitrarily set to
> 1024 as well. Since there's now a cap, it seems that it would be saner
> to have this interface return the number of items processed. Eg 0..N
> for success, and < 0 for an error. Then if someone does need to query
> more than the supported number of items, they can do so iteratively.
That worsens usability. The user would have to know / count how
many entries there were in the first place, retry, and do all the
handling. It'd be better to:
if (nr > (1U << 20))
return -ERANGE;
if (fatal_signal_pending(current))
return -EINTR;
...
return 0;
1M should be high enough for future proofing and to protect from
mildly insane users (and would still be fast enough). I also had
cond_resched() in some version, but apparently it got lost as
well.
--
Pavel Begunkov
* Re: [PATCH for-next] io_uring/query: check for loops in in_query()
From: Pavel Begunkov @ 2025-09-11 11:40 UTC (permalink / raw)
To: Jens Axboe, io-uring
On 9/11/25 10:02, Pavel Begunkov wrote:
> On 9/11/25 01:13, Jens Axboe wrote:
>> io_query() loops over query items that the application or liburing
>> passes in. But it has no checking for a max number of items, or if a
>> loop could be present. If someone were to do:
>>
>> struct io_uring_query_hdr hdr1, hdr2, hdr3;
>>
>> hdr3.next_entry = &hdr1;
>> hdr2.next_entry = &hdr3;
>> hdr1.next_entry = &hdr2;
>>
>> io_uring_register(fd, IORING_REGISTER_QUERY, &hdr1, 0);
>>
>> then it'll happily loop forever and process hdr1 -> hdr2 -> hdr3 and
>> then loop back to hdr1.
>>
>> Add a max cap for these kinds of cases, which is arbitrarily set to
>> 1024 as well. Since there's now a cap, it seems that it would be saner
>> to have this interface return the number of items processed. Eg 0..N
>> for success, and < 0 for an error. Then if someone does need to query
>> more than the supported number of items, they can do so iteratively.
>
> That worsens usability. The user would have to know / count how
> many entries there was in the first place, retry, and do all
> handling. It'll be better to:
>
> if (nr > (1U << 20))
> return -ERANGE;
> if (fatal_signal_pending())
> return -EINTR;
> ...
> return 0;
>
>
> 1M should be high enough for future proofing and to protect from
> mildly insane users (and would still be fast enough). I also had
> cond_resched() in some version, but apparently it got lost as
> well.
Tested the diff below, works well enough. 1M breaks out after a
second even in a very underpowered VM.
commit 49f9c37c612967f9a6111e113c072aace68099bf
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Thu Sep 11 12:28:18 2025 +0100
io_uring/query: prevent infinite loops
If the query chain forms a cycle, the interface will loop indefinitely.
Cap the number of iterations at 1M, that should hopefully be enough for
even the most inventive future use of the interface. Also make sure it
checks for fatal signals and reschedules when necessary.
Fixes: c265ae75f900 ("io_uring: introduce io_uring querying")
Reported-by: Jens Axboe <axboe@kernel.dk>
Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
diff --git a/io_uring/query.c b/io_uring/query.c
index 9eed0f371956..1af40e5e9a2a 100644
--- a/io_uring/query.c
+++ b/io_uring/query.c
@@ -6,6 +6,7 @@
#include "io_uring.h"
#define IO_MAX_QUERY_SIZE (sizeof(struct io_uring_query_opcode))
+#define IO_MAX_QUERY_ENTRIES (1U << 20)
static ssize_t io_query_ops(void *data)
{
@@ -74,7 +75,7 @@ int io_query(struct io_ring_ctx *ctx, void __user *arg, unsigned nr_args)
{
char entry_buffer[IO_MAX_QUERY_SIZE];
void __user *uhdr = arg;
- int ret;
+ int ret, nr = 0;
memset(entry_buffer, 0, sizeof(entry_buffer));
@@ -88,6 +89,13 @@ int io_query(struct io_ring_ctx *ctx, void __user *arg, unsigned nr_args)
if (ret)
return ret;
uhdr = u64_to_user_ptr(next_hdr);
+
+ /* Have some limit to avoid a potential cycle */
+ if (++nr >= IO_MAX_QUERY_ENTRIES)
+ return -ERANGE;
+ if (fatal_signal_pending(current))
+ return -EINTR;
+ cond_resched();
}
return 0;
}
* Re: [PATCH for-next] io_uring/query: check for loops in in_query()
From: Jens Axboe @ 2025-09-15 18:41 UTC (permalink / raw)
To: Pavel Begunkov, io-uring
On 9/11/25 5:40 AM, Pavel Begunkov wrote:
> On 9/11/25 10:02, Pavel Begunkov wrote:
>> On 9/11/25 01:13, Jens Axboe wrote:
>>> io_query() loops over query items that the application or liburing
>>> passes in. But it has no checking for a max number of items, or if a
>>> loop could be present. If someone were to do:
>>>
>>> struct io_uring_query_hdr hdr1, hdr2, hdr3;
>>>
>>> hdr3.next_entry = &hdr1;
>>> hdr2.next_entry = &hdr3;
>>> hdr1.next_entry = &hdr2;
>>>
>>> io_uring_register(fd, IORING_REGISTER_QUERY, &hdr1, 0);
>>>
>>> then it'll happily loop forever and process hdr1 -> hdr2 -> hdr3 and
>>> then loop back to hdr1.
>>>
>>> Add a max cap for these kinds of cases, which is arbitrarily set to
>>> 1024 as well. Since there's now a cap, it seems that it would be saner
>>> to have this interface return the number of items processed. Eg 0..N
>>> for success, and < 0 for an error. Then if someone does need to query
>>> more than the supported number of items, they can do so iteratively.
>>
>> That worsens usability. The user would have to know / count how
>> many entries there was in the first place, retry, and do all
>> handling. It'll be better to:
>>
>> if (nr > (1U << 20))
>> return -ERANGE;
>> if (fatal_signal_pending())
>> return -EINTR;
>> ...
>> return 0;
>>
>>
>> 1M should be high enough for future proofing and to protect from
>> mildly insane users (and would still be fast enough). I also had
>> cond_resched() in some version, but apparently it got lost as
>> well.
>
> Tested the diff below, works well enough. 1M breaks out after a
> second even in a very underpowered VM.
Honestly I'm not sure which of the two I dislike more; I think both are
not great in terms of API. In practice, nobody is going to ask for 1000
entries, people will do one at a time. At the same time, I do like
having the ability to process multiple in one syscall, even if it
doesn't _really_ matter. Normally for interfaces like that, returning
the number processed is the right approach. E.g. when you get a signal
or run into an error, you know where that happened. At the same time,
it's also a pain in the butt for an application to use if it did do
hundreds of them. But let's be real, it will not. It'll do a handful at
most, and then it's pretty clear where to continue. The only real error
here would be -EINTR, as anything else would be the application's fault
because it's dumb or malicious, hence the only thing it'd do is submit
the whole thing again. It's not like it's going to say "oh I got 2,
which is less than the 5, let me restart at 3". But it now might have
to, because it doesn't know what the error is.
Anyway, that's a long-winded way of saying I kind of hate needing any
kind of limit, but at least with returning the number of entries
processed, we can make the limit low and meaningful rather than some
random high number "which is surely enough for everyone" with the sole
idea behind that being that we need to be able to detect loops. And even
if the interface is idempotent, it's still kind of silly to need to redo
everything in case of a potentially valid error like -EINVAL.
Are we perhaps better off just killing this linking and doing single
items at a time? That nicely avoids any issues related to this, makes
the 0/-ERROR interface sane, and makes the app interface simpler too.
The only downside is needing a few more syscalls at probe time, which
is not something worth optimizing for imho.
--
Jens Axboe
* Re: [PATCH for-next] io_uring/query: check for loops in in_query()
From: Pavel Begunkov @ 2025-09-16 15:05 UTC (permalink / raw)
To: Jens Axboe, io-uring
On 9/15/25 19:41, Jens Axboe wrote:
> On 9/11/25 5:40 AM, Pavel Begunkov wrote:
>> On 9/11/25 10:02, Pavel Begunkov wrote:
>>> On 9/11/25 01:13, Jens Axboe wrote:
>>>> io_query() loops over query items that the application or liburing
>>>> passes in. But it has no checking for a max number of items, or if a
>>>> loop could be present. If someone were to do:
>>>>
>>>> struct io_uring_query_hdr hdr1, hdr2, hdr3;
>>>>
>>>> hdr3.next_entry = &hdr1;
>>>> hdr2.next_entry = &hdr3;
>>>> hdr1.next_entry = &hdr2;
>>>>
>>>> io_uring_register(fd, IORING_REGISTER_QUERY, &hdr1, 0);
>>>>
>>>> then it'll happily loop forever and process hdr1 -> hdr2 -> hdr3 and
>>>> then loop back to hdr1.
>>>>
>>>> Add a max cap for these kinds of cases, which is arbitrarily set to
>>>> 1024 as well. Since there's now a cap, it seems that it would be saner
>>>> to have this interface return the number of items processed. Eg 0..N
>>>> for success, and < 0 for an error. Then if someone does need to query
>>>> more than the supported number of items, they can do so iteratively.
>>>
>>> That worsens usability. The user would have to know / count how
>>> many entries there was in the first place, retry, and do all
>>> handling. It'll be better to:
>>>
>>> if (nr > (1U << 20))
>>> return -ERANGE;
>>> if (fatal_signal_pending())
>>> return -EINTR;
>>> ...
>>> return 0;
>>>
>>>
>>> 1M should be high enough for future proofing and to protect from
>>> mildly insane users (and would still be fast enough). I also had
>>> cond_resched() in some version, but apparently it got lost as
>>> well.
>>
>> Tested the diff below, works well enough. 1M breaks out after a
>> second even in a very underpowered VM.
>
> Honestly I'm not sure which of the two I dislike more, I think both are
> not great in terms of API. In practice, nobody is going to ask for 1000
> entries. In practice, people will do one at the time. At the same time,
> I do like having the ability to process multiple in one syscall, even if
> it doesn't _really_ matter. Normally for interfaces like that, returning
> number of processed is the right approach. Eg when you get a signal or
> run into an error, you know where that happened. At the same time, it's
In that case you lose the error code, which can be more important.
Either way, the idea is that it should only fail if the app is
buggy, in which case the user can do it one by one while debugging.
> also a pain in the butt to use for an application if it did to hundreds
> of then. But let's be real, it will not. It'll do a a handful at most,
> and then it's pretty clear where to continue. The only real error here
> would be -EINTR, as anything would be the applications fault because
I'd rather delay non-fatal signals, and even more so task work
processing; it can't ever be reliable in the general case otherwise
and would always need to be executed in a loop. And the execution
should be brief, likely shorter than the non-interruptible sections
of many syscalls. In this sense, capping at 1000 can be a better
choice.
> it's dumb or malicious, hence the only thing it'd do is submit the whole
> thing again. It's not like it's going to say "oh I got 2, which is less
> than the 5, let me restart at 3". But it now might have to, because it
> doesn't know what the error is.
>
> Anyway, that's a long winded way of saying I kind of hate needing any
> kind of limit, but at least with returning the number of entries
> processed, we can make the limit low and meaningful rather than some
> random high number "which is surely enough for everyone" with the sole
> idea behind that being that we need to be able to detect loops. And even
> if the interface is idempotent, it's still kind of silly to need to redo
> everything in case of a potentially valid error like an -EINVAL.
It's not a valid syscall return for a correct application (when
querying is available). Query errors are returned individually.
> Are we perhaps better off just killing this linking and just doing
> single items at the time? That nicely avoids any issues related to this,
> makes the 0/-ERROR interface sane, and makes the app interface simpler
> too. The only downside is needing a few more syscalls at probe time,
> which is not something worth optimizing for imho.
You were pretty convincing insisting that extra waiting on teardown is
not tolerable. In the same spirit, there were discussions on how fast
you can create rings. I wouldn't care if it were one or two extra
syscalls, but I reasonably expect that it might grow to low double-digit
queries at some point, and 10-15 syscalls doesn't sound that
comfortable when it can be easily avoided.
--
Pavel Begunkov
* Re: [PATCH for-next] io_uring/query: check for loops in in_query()
From: Jens Axboe @ 2025-09-16 18:33 UTC (permalink / raw)
To: Pavel Begunkov, io-uring
On 9/16/25 9:05 AM, Pavel Begunkov wrote:
> I'd rather delay non fatal signals and even more so task work
> processing, it can't ever be reliable in general case otherwise
> and would always need to be executed in a loop. And the execution
> should be brief, likely shorter than non-interruptible sections
> of many syscalls. In this sense capping at 1000 can be a better
> choice.
Let's just cap it at 1000, at least that's a more reasonable number. I
don't think there's a perfect way to solve this problem, outside of
being able to detect loops, but at least 1000 can be documented as the
max limit. Not that anyone would ever get anywhere near that...
> You was pretty convincing insisting that extra waiting on teardown is
> not tolerable. In the same spirit there were discussions on how fast
> you can create rings. I wouldn't care if it's one or two extra
> syscalls, but I reasonably expect that it might grow to low double
> digit queries at some point, and 10-15 syscalls doesn't sound that
> comfortable while that can be easily avoided.
Those are completely different matters. The teardown slowness caused
100-150x slowdowns; doing 10-15 extra syscalls for setup is utter noise
and nobody would ever notice that, let alone complain about it. But I'm
willing to compromise on some sane lower limit, even if I think the
better API is just doing single queries at a time, because that is the
sanest API when we don't need to care about squeezing out the last bit
of efficiency.
--
Jens Axboe
* Re: [PATCH for-next] io_uring/query: check for loops in in_query()
2025-09-16 18:33 ` Jens Axboe
@ 2025-09-16 18:35 ` Jens Axboe
2025-09-18 9:50 ` Pavel Begunkov
2025-09-18 10:27 ` Pavel Begunkov
1 sibling, 1 reply; 9+ messages in thread
From: Jens Axboe @ 2025-09-16 18:35 UTC (permalink / raw)
To: Pavel Begunkov, io-uring
On 9/16/25 12:33 PM, Jens Axboe wrote:
> On 9/16/25 9:05 AM, Pavel Begunkov wrote:
>> I'd rather delay non fatal signals and even more so task work
>> processing, it can't ever be reliable in general case otherwise
>> and would always need to be executed in a loop. And the execution
>> should be brief, likely shorter than non-interruptible sections
>> of many syscalls. In this sense capping at 1000 can be a better
>> choice.
>
> Let's just cap it at 1000, at least that's a more reasonable number. I
> don't think there's a perfect way to solve this problem, outside of
> being able to detect loops, but at least 1000 can be documented as the
> max limit. Not that anyone would ever get anywhere near that...
Forgot to mention, but I'm assuming you'll send a v2 of the patch?
--
Jens Axboe
* Re: [PATCH for-next] io_uring/query: check for loops in in_query()
2025-09-16 18:35 ` Jens Axboe
@ 2025-09-18 9:50 ` Pavel Begunkov
0 siblings, 0 replies; 9+ messages in thread
From: Pavel Begunkov @ 2025-09-18 9:50 UTC (permalink / raw)
To: Jens Axboe, io-uring
On 9/16/25 19:35, Jens Axboe wrote:
> On 9/16/25 12:33 PM, Jens Axboe wrote:
>> On 9/16/25 9:05 AM, Pavel Begunkov wrote:
>>> I'd rather delay non fatal signals and even more so task work
>>> processing, it can't ever be reliable in general case otherwise
>>> and would always need to be executed in a loop. And the execution
>>> should be brief, likely shorter than non-interruptible sections
>>> of many syscalls. In this sense capping at 1000 can be a better
>>> choice.
>>
>> Let's just cap it at 1000, at least that's a more reasonable number. I
>> don't think there's a perfect way to solve this problem, outside of
>> being able to detect loops, but at least 1000 can be documented as the
>> max limit. Not that anyone would ever get anywhere near that...
>
> Forgot to mention, but I'm assuming you'll send a v2 of the patch?
I assumed you'd want to squash something into your patch, but
I can send it out properly.
--
Pavel Begunkov
* Re: [PATCH for-next] io_uring/query: check for loops in in_query()
From: Pavel Begunkov @ 2025-09-18 10:27 UTC (permalink / raw)
To: Jens Axboe, io-uring
On 9/16/25 19:33, Jens Axboe wrote:
> On 9/16/25 9:05 AM, Pavel Begunkov wrote:
>> I'd rather delay non fatal signals and even more so task work
>> processing, it can't ever be reliable in general case otherwise
>> and would always need to be executed in a loop. And the execution
>> should be brief, likely shorter than non-interruptible sections
>> of many syscalls. In this sense capping at 1000 can be a better
>> choice.
>
> Let's just cap it at 1000, at least that's a more reasonable number. I
> don't think there's a perfect way to solve this problem, outside of
> being able to detect loops, but at least 1000 can be documented as the
> max limit. Not that anyone would ever get anywhere near that...
Fwiw, I don't think the limit is strictly required as long as the
task remains killable, even if it's a nice thing to have. It's the
kind of problem that seems more likely to come from a malicious
user than an honest mistake, and in case of the latter it'd be hard
to make it nondeterministic. IOW, if the user tests the code, they
should be able to easily find and fix it.
>> You was pretty convincing insisting that extra waiting on teardown is
>> not tolerable. In the same spirit there were discussions on how fast
>> you can create rings. I wouldn't care if it's one or two extra
>> syscalls, but I reasonably expect that it might grow to low double
>> digit queries at some point, and 10-15 syscalls doesn't sound that
>> comfortable while that can be easily avoided.
>
> Those are completely different matters. The teardown slowness caused
> 100-150x slowdowns, doing 10-15 extra syscalls for setup is utter noise
Another worry is the possibility that the number of queries goes
considerably up, e.g. a query type that returns info about a
specified req opcode, with some library doing it for each opcode.
> and nobody would ever notice that, let alone complain about it. But I'm
> willing to compromise on some sane lower limit, even if I think the
> better API is just doing single queries at the time because that is the
> sanest API when we don't need to care about squeezing out the last bit
> of efficiency.
If you have a bunch of them, it might be neater from the user-code
perspective as well, instead of making multiple calls. I guess it
can be abstracted out by a library, but the struct layouts would
need to be smart.
--
Pavel Begunkov