* [PATCH for-next] io_uring/query: check for loops in in_query()
@ 2025-09-11 0:13 Jens Axboe
2025-09-11 9:02 ` Pavel Begunkov
0 siblings, 1 reply; 9+ messages in thread
From: Jens Axboe @ 2025-09-11 0:13 UTC (permalink / raw)
To: io-uring; +Cc: Pavel Begunkov
io_query() loops over query items that the application or liburing
passes in. But it has no checking for a max number of items, or if a
loop could be present. If someone were to do:
struct io_uring_query_hdr hdr1, hdr2, hdr3;
hdr3.next_entry = &hdr1;
hdr2.next_entry = &hdr3;
hdr1.next_entry = &hdr2;
io_uring_register(fd, IORING_REGISTER_QUERY, &hdr1, 0);
then it'll happily loop forever and process hdr1 -> hdr2 -> hdr3 and
then loop back to hdr1.
Add a max cap for these kinds of cases, which is arbitrarily set to
1024 as well. Since there's now a cap, it seems that it would be saner
to have this interface return the number of items processed. Eg 0..N
for success, and < 0 for an error. Then if someone does need to query
more than the supported number of items, they can do so iteratively.
Fixes: c265ae75f900 ("io_uring: introduce io_uring querying")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
This would obviously need liburing changes too, but not a big deal
since a) the changes aren't in liburing yet, and b) this is still
unreleased code.
diff --git a/io_uring/query.c b/io_uring/query.c
index 9eed0f371956..99402e8e0a4a 100644
--- a/io_uring/query.c
+++ b/io_uring/query.c
@@ -70,11 +70,14 @@ static int io_handle_query_entry(struct io_ring_ctx *ctx,
 	return 0;
 }
 
+/*
+ * Returns number of items processed, or < 0 in case of error.
+ */
 int io_query(struct io_ring_ctx *ctx, void __user *arg, unsigned nr_args)
 {
 	char entry_buffer[IO_MAX_QUERY_SIZE];
 	void __user *uhdr = arg;
-	int ret;
+	int ret, nr = 0;
 
 	memset(entry_buffer, 0, sizeof(entry_buffer));
 
@@ -86,8 +89,11 @@ int io_query(struct io_ring_ctx *ctx, void __user *arg, unsigned nr_args)
 		ret = io_handle_query_entry(ctx, entry_buffer, uhdr, &next_hdr);
 		if (ret)
-			return ret;
+			break;
 		uhdr = u64_to_user_ptr(next_hdr);
+		/* Have some limit to avoid a potential loop */
+		if (++nr >= 1024)
+			break;
 	}
-	return 0;
+	return nr ?: ret;
 }
--
Jens Axboe
* Re: [PATCH for-next] io_uring/query: check for loops in in_query()
From: Pavel Begunkov @ 2025-09-11 9:02 UTC (permalink / raw)
To: Jens Axboe, io-uring

On 9/11/25 01:13, Jens Axboe wrote:
> io_query() loops over query items that the application or liburing
> passes in. But it has no checking for a max number of items, or if a
> loop could be present. If someone were to do:
>
> struct io_uring_query_hdr hdr1, hdr2, hdr3;
>
> hdr3.next_entry = &hdr1;
> hdr2.next_entry = &hdr3;
> hdr1.next_entry = &hdr2;
>
> io_uring_register(fd, IORING_REGISTER_QUERY, &hdr1, 0);
>
> then it'll happily loop forever and process hdr1 -> hdr2 -> hdr3 and
> then loop back to hdr1.
>
> Add a max cap for these kinds of cases, which is arbitrarily set to
> 1024 as well. Since there's now a cap, it seems that it would be saner
> to have this interface return the number of items processed. Eg 0..N
> for success, and < 0 for an error. Then if someone does need to query
> more than the supported number of items, they can do so iteratively.

That worsens usability. The user would have to know / count how
many entries there were in the first place, retry, and do all the
handling. It'll be better to:

	if (nr > (1U << 20))
		return -ERANGE;
	if (fatal_signal_pending())
		return -EINTR;
	...
	return 0;

1M should be high enough for future proofing and to protect from
mildly insane users (and would still be fast enough). I also had
cond_resched() in some version, but apparently it got lost as
well.

--
Pavel Begunkov
* Re: [PATCH for-next] io_uring/query: check for loops in in_query()
From: Pavel Begunkov @ 2025-09-11 11:40 UTC (permalink / raw)
To: Jens Axboe, io-uring

On 9/11/25 10:02, Pavel Begunkov wrote:
> On 9/11/25 01:13, Jens Axboe wrote:
>> [...]
>
> That worsens usability. The user would have to know / count how
> many entries there were in the first place, retry, and do all the
> handling. It'll be better to:
>
> 	if (nr > (1U << 20))
> 		return -ERANGE;
> 	if (fatal_signal_pending())
> 		return -EINTR;
> 	...
> 	return 0;
>
> 1M should be high enough for future proofing and to protect from
> mildly insane users (and would still be fast enough). I also had
> cond_resched() in some version, but apparently it got lost as
> well.

Tested the diff below, works well enough. 1M breaks out after a
second even in a very underpowered VM.
commit 49f9c37c612967f9a6111e113c072aace68099bf
Author: Pavel Begunkov <asml.silence@gmail.com>
Date:   Thu Sep 11 12:28:18 2025 +0100

    io_uring/query: prevent infinite loops

    If the query chain forms a cycle, the interface will loop
    indefinitely. Cap the number of iterations at 1M, that should
    hopefully be enough for even the most inventive future use of
    the interface. Also make sure it checks for fatal signals and
    reschedules when necessary.

    Fixes: c265ae75f900 ("io_uring: introduce io_uring querying")
    Reported-by: Jens Axboe <axboe@kernel.dk>
    Suggested-by: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>

diff --git a/io_uring/query.c b/io_uring/query.c
index 9eed0f371956..1af40e5e9a2a 100644
--- a/io_uring/query.c
+++ b/io_uring/query.c
@@ -6,6 +6,7 @@
 #include "io_uring.h"
 
 #define IO_MAX_QUERY_SIZE (sizeof(struct io_uring_query_opcode))
+#define IO_MAX_QUERY_ENTRIES (1U << 20)
 
 static ssize_t io_query_ops(void *data)
 {
@@ -74,7 +75,7 @@ int io_query(struct io_ring_ctx *ctx, void __user *arg, unsigned nr_args)
 {
 	char entry_buffer[IO_MAX_QUERY_SIZE];
 	void __user *uhdr = arg;
-	int ret;
+	int ret, nr = 0;
 
 	memset(entry_buffer, 0, sizeof(entry_buffer));
 
@@ -88,6 +89,13 @@ int io_query(struct io_ring_ctx *ctx, void __user *arg, unsigned nr_args)
 		if (ret)
 			return ret;
 		uhdr = u64_to_user_ptr(next_hdr);
+
+		/* Have some limit to avoid a potential cycle */
+		if (++nr >= IO_MAX_QUERY_ENTRIES)
+			return -ERANGE;
+		if (fatal_signal_pending(current))
+			return -EINTR;
+		cond_resched();
 	}
 	return 0;
 }
* Re: [PATCH for-next] io_uring/query: check for loops in in_query()
From: Jens Axboe @ 2025-09-15 18:41 UTC (permalink / raw)
To: Pavel Begunkov, io-uring

On 9/11/25 5:40 AM, Pavel Begunkov wrote:
> On 9/11/25 10:02, Pavel Begunkov wrote:
>> [...]
>>
>> 1M should be high enough for future proofing and to protect from
>> mildly insane users (and would still be fast enough). I also had
>> cond_resched() in some version, but apparently it got lost as
>> well.
>
> Tested the diff below, works well enough. 1M breaks out after a
> second even in a very underpowered VM.

Honestly I'm not sure which of the two I dislike more; I think both are
not great in terms of API. In practice, nobody is going to ask for 1000
entries. In practice, people will do one at a time. At the same time,
I do like having the ability to process multiple in one syscall, even if
it doesn't _really_ matter. Normally for interfaces like this, returning
the number processed is the right approach. E.g. when you get a signal or
run into an error, you know where it happened. At the same time, it's
also a pain in the butt for an application to use if it did do hundreds
of them. But let's be real, it will not. It'll do a handful at most,
and then it's pretty clear where to continue. The only real error here
would be -EINTR, as anything else would be the application's fault for
being dumb or malicious, hence the only thing it'd do is submit the whole
thing again. It's not like it's going to say "oh I got 2, which is less
than the 5, let me restart at 3". But it now might have to, because it
doesn't know what the error is.

Anyway, that's a long-winded way of saying I kind of hate needing any
kind of limit, but at least with returning the number of entries
processed, we can make the limit low and meaningful rather than some
random high number "which is surely enough for everyone", with the sole
idea behind that being that we need to be able to detect loops. And even
if the interface is idempotent, it's still kind of silly to need to redo
everything in case of a potentially valid error like an -EINVAL.

Are we perhaps better off just killing this linking and doing single
items at a time? That nicely avoids any issues related to this, makes
the 0/-ERROR interface sane, and makes the app interface simpler too.
The only downside is needing a few more syscalls at probe time, which
is not something worth optimizing for imho.

--
Jens Axboe
* Re: [PATCH for-next] io_uring/query: check for loops in in_query()
From: Pavel Begunkov @ 2025-09-16 15:05 UTC (permalink / raw)
To: Jens Axboe, io-uring

On 9/15/25 19:41, Jens Axboe wrote:
> On 9/11/25 5:40 AM, Pavel Begunkov wrote:
>> [...]
>>
>> Tested the diff below, works well enough. 1M breaks out after a
>> second even in a very underpowered VM.
>
> Honestly I'm not sure which of the two I dislike more; I think both are
> not great in terms of API. In practice, nobody is going to ask for 1000
> entries. In practice, people will do one at a time. At the same time,
> I do like having the ability to process multiple in one syscall, even if
> it doesn't _really_ matter. Normally for interfaces like this, returning
> the number processed is the right approach. E.g. when you get a signal or
> run into an error, you know where it happened. At the same time, it's

In which case you lose the error code, which can be more important.
In either case, the idea is that it should only fail if the app is
buggy, in which case the user can do it one by one while debugging.

> also a pain in the butt for an application to use if it did do hundreds
> of them. But let's be real, it will not. It'll do a handful at most,
> and then it's pretty clear where to continue. The only real error here
> would be -EINTR, as anything else would be the application's fault

I'd rather delay non-fatal signals and even more so task work
processing; it can't ever be reliable in the general case otherwise
and would always need to be executed in a loop. And the execution
should be brief, likely shorter than the non-interruptible sections
of many syscalls. In this sense, capping at 1000 can be a better
choice.

> for being dumb or malicious, hence the only thing it'd do is submit the
> whole thing again. It's not like it's going to say "oh I got 2, which
> is less than the 5, let me restart at 3". But it now might have to,
> because it doesn't know what the error is.
>
> Anyway, that's a long-winded way of saying I kind of hate needing any
> kind of limit, but at least with returning the number of entries
> processed, we can make the limit low and meaningful rather than some
> random high number "which is surely enough for everyone", with the sole
> idea behind that being that we need to be able to detect loops. And even
> if the interface is idempotent, it's still kind of silly to need to redo
> everything in case of a potentially valid error like an -EINVAL.

It's not a valid syscall return for a correct application (when
querying is available). Query errors are returned individually.

> Are we perhaps better off just killing this linking and doing single
> items at a time? That nicely avoids any issues related to this, makes
> the 0/-ERROR interface sane, and makes the app interface simpler too.
> The only downside is needing a few more syscalls at probe time, which
> is not something worth optimizing for imho.

You were pretty convincing insisting that extra waiting on teardown is
not tolerable. In the same spirit there were discussions on how fast
you can create rings. I wouldn't care if it's one or two extra
syscalls, but I reasonably expect that it might grow to low double
digit queries at some point, and 10-15 syscalls doesn't sound that
comfortable while that can be easily avoided.

--
Pavel Begunkov
* Re: [PATCH for-next] io_uring/query: check for loops in in_query()
From: Jens Axboe @ 2025-09-16 18:33 UTC (permalink / raw)
To: Pavel Begunkov, io-uring

On 9/16/25 9:05 AM, Pavel Begunkov wrote:
> I'd rather delay non-fatal signals and even more so task work
> processing; it can't ever be reliable in the general case otherwise
> and would always need to be executed in a loop. And the execution
> should be brief, likely shorter than the non-interruptible sections
> of many syscalls. In this sense, capping at 1000 can be a better
> choice.

Let's just cap it at 1000; at least that's a more reasonable number. I
don't think there's a perfect way to solve this problem, outside of
being able to detect loops, but at least 1000 can be documented as the
max limit. Not that anyone would ever get anywhere near that...

> You were pretty convincing insisting that extra waiting on teardown is
> not tolerable. In the same spirit there were discussions on how fast
> you can create rings. I wouldn't care if it's one or two extra
> syscalls, but I reasonably expect that it might grow to low double
> digit queries at some point, and 10-15 syscalls doesn't sound that
> comfortable while that can be easily avoided.

Those are completely different matters. The teardown slowness caused
100-150x slowdowns; doing 10-15 extra syscalls for setup is utter noise
and nobody would ever notice that, let alone complain about it. But I'm
willing to compromise on some sane lower limit, even if I think the
better API is just doing single queries at a time, because that is the
sanest API when we don't need to care about squeezing out the last bit
of efficiency.

--
Jens Axboe
* Re: [PATCH for-next] io_uring/query: check for loops in in_query()
From: Jens Axboe @ 2025-09-16 18:35 UTC (permalink / raw)
To: Pavel Begunkov, io-uring

On 9/16/25 12:33 PM, Jens Axboe wrote:
> On 9/16/25 9:05 AM, Pavel Begunkov wrote:
>> [...]
>
> Let's just cap it at 1000; at least that's a more reasonable number. I
> don't think there's a perfect way to solve this problem, outside of
> being able to detect loops, but at least 1000 can be documented as the
> max limit. Not that anyone would ever get anywhere near that...

Forgot to mention, but I'm assuming you'll send a v2 of the patch?

--
Jens Axboe
* Re: [PATCH for-next] io_uring/query: check for loops in in_query()
From: Pavel Begunkov @ 2025-09-18 9:50 UTC (permalink / raw)
To: Jens Axboe, io-uring

On 9/16/25 19:35, Jens Axboe wrote:
> On 9/16/25 12:33 PM, Jens Axboe wrote:
>> [...]
>
> Forgot to mention, but I'm assuming you'll send a v2 of the patch?

I assumed you'd want to squash something into your patch, but I can
send it out properly.

--
Pavel Begunkov
* Re: [PATCH for-next] io_uring/query: check for loops in in_query()
From: Pavel Begunkov @ 2025-09-18 10:27 UTC (permalink / raw)
To: Jens Axboe, io-uring

On 9/16/25 19:33, Jens Axboe wrote:
> On 9/16/25 9:05 AM, Pavel Begunkov wrote:
>> [...]
>
> Let's just cap it at 1000; at least that's a more reasonable number. I
> don't think there's a perfect way to solve this problem, outside of
> being able to detect loops, but at least 1000 can be documented as the
> max limit. Not that anyone would ever get anywhere near that...

Fwiw, I don't think the limit is strictly required as long as the task
remains killable, even if it's a nice thing to do. It's the kind of
problem that'd rather come from a malicious user than an honest
mistake, and in case of the latter it'd be hard to make it
nondeterministic. IOW, if the user tests the code, they should be able
to easily find and fix it.

> Those are completely different matters. The teardown slowness caused
> 100-150x slowdowns; doing 10-15 extra syscalls for setup is utter noise

Another worry is the possibility that the number of queries goes
considerably up. E.g. a query type that returns info about a specified
req opcode, and some library doing it for each opcode.

> and nobody would ever notice that, let alone complain about it. But I'm
> willing to compromise on some sane lower limit, even if I think the
> better API is just doing single queries at a time, because that is the
> sanest API when we don't need to care about squeezing out the last bit
> of efficiency.

If you have a bunch of them, it might be neater from the user-code
perspective as well, instead of making multiple calls. I guess it can
be abstracted out by a library, but the struct layouts would need to
be smart.

--
Pavel Begunkov