* [PATCH AUTOSEL 6.18] io_uring/fdinfo: be a bit nicer when looping a lot of SQEs/CQEs
From: Sasha Levin @ 2026-02-09 12:26 UTC
To: patches, stable
Cc: Jens Axboe, 是参差, Keith Busch, Sasha Levin,
io-uring
From: Jens Axboe <axboe@kernel.dk>
[ Upstream commit 38cfdd9dd279473a73814df9fd7e6e716951d361 ]
Add cond_resched() in those dump loops, just in case a lot of entries
are being dumped. And detect invalid CQ ring head/tail entries, to avoid
iterating more than what is necessary. Generally not an issue, but can be
if things like KASAN or other debugging metrics are enabled.
Reported-by: 是参差 <shicenci@gmail.com>
Link: https://lore.kernel.org/all/PS1PPF7E1D7501FE5631002D242DD89403FAB9BA@PS1PPF7E1D7501F.apcprd02.prod.outlook.com/
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
## Analysis of io_uring/fdinfo: cond_resched() and CQ bounds fix
### Commit Message Analysis
The commit addresses two issues in `io_uring/fdinfo.c`:
1. **Adding `cond_resched()` calls** in SQE/CQE dump loops to avoid
holding the CPU for too long when dumping many entries, particularly
with debugging tools like KASAN enabled.
2. **Detecting invalid CQ ring head/tail entries** to avoid iterating
more than necessary — this bounds the CQE loop to `ctx->cq_entries`
maximum iterations.
There's a `Reported-by:` tag indicating a real user hit this issue, and
a `Link:` to the mailing list discussion. The patch is `Reviewed-by:
Keith Busch` and authored by `Jens Axboe` (io_uring maintainer).
### Code Change Analysis
The changes in detail:
**Change 1: `cond_resched()` in SQE loop** (line after `seq_printf(m,
"\n")` in the SQE loop)
- This is a straightforward addition to yield the CPU during potentially
long loops. The SQE loop already had bounded iteration (`sq_entries`
was calculated earlier), but with KASAN or heavy debugging, each
iteration can be slow. This prevents soft lockups.
**Change 2: Bounded CQE loop**
- Old code: `while (cq_head < cq_tail)`. Both values are read with
  `READ_ONCE` from ring memory that is shared with userspace, so the
  kernel cannot trust them. If they are corrupted or malicious, the loop
  could run for close to `UINT_MAX` iterations.
- New code: `cq_entries = min(cq_tail - cq_head, ctx->cq_entries)` and
  `for (i = 0; i < cq_entries; i++)`. This bounds the iteration to at
  most `ctx->cq_entries`, the actual ring size, as a defensive check.
- Also adds `cond_resched()` in the CQE loop.
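The clamp described in Change 2 can be sketched in plain userspace C. This is only an illustration, not the kernel code: `bounded_cq_count()` is a hypothetical helper name, and the patch itself open-codes the expression with the kernel's `min()` macro.

```c
#include <assert.h>
#include <limits.h>

/*
 * Userspace sketch, not kernel code. bounded_cq_count() is a hypothetical
 * helper; the patch open-codes the same expression with the kernel's min().
 */
static unsigned int bounded_cq_count(unsigned int head, unsigned int tail,
				     unsigned int ring_entries)
{
	/*
	 * Unsigned subtraction wraps modulo 2^32, so this is the distance
	 * from head to tail even after the counters have wrapped.
	 */
	unsigned int pending = tail - head;

	assert(ring_entries != 0);

	/* Never walk more slots than the ring actually has. */
	return pending < ring_entries ? pending : ring_entries;
}
```

With an 8-entry ring, a sane pair such as head 10 and tail 14 yields 4, while a bogus pair such as head 0 and tail `UINT_MAX` is clamped to 8 instead of driving billions of iterations.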
**Change 3: CQE32 accounting**
- When a 32-byte CQE is detected (`cqe32 = true`), both `cq_head` and `i`
  are incremented, so the double-sized entry counts as two ring slots
  against the new loop bound.
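The slot accounting for double-sized entries can be illustrated with a small userspace sketch. Everything here is an assumption made for illustration: `count_dumped()` and the `is_big[]` array are hypothetical stand-ins for the CQE walk and its per-entry 32-byte flag, and the index uses `%` where the kernel masks with the ring mask.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace sketch, not kernel code. count_dumped() and is_big[] are
 * hypothetical stand-ins for the CQE walk and its 32-byte-entry flag.
 */
static unsigned int count_dumped(const bool *is_big, unsigned int ring_entries,
				 unsigned int head, unsigned int tail)
{
	unsigned int pending = tail - head;
	unsigned int limit;
	unsigned int dumped = 0;
	unsigned int i;

	assert(ring_entries != 0);
	limit = pending < ring_entries ? pending : ring_entries;

	for (i = 0; i < limit; i++) {
		bool big = is_big[head % ring_entries];

		dumped++;	/* one logical CQE reported */
		head++;
		if (big) {
			head++;	/* a 32-byte CQE occupies two slots */
			i++;	/* charge the second slot to the bound */
		}
	}
	return dumped;
}
```

For a 4-slot ring holding a normal entry, a double-sized entry, and another normal entry, the loop reports 3 entries while consuming all 4 slots. Without the extra `i++`, the loop bound would count logical entries rather than ring slots and could let `head` run past the clamped range.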
### Bug Classification
This fixes two real problems:
1. **Soft lockup / scheduling latency issue**: Without `cond_resched()`,
dumping many SQEs/CQEs (especially with KASAN) can cause the kernel
to not schedule for a long time, triggering soft lockup warnings or
causing system unresponsiveness. This is a **real bug** — reported by
a user.
2. **Unbounded loop from userspace-controlled values**: The original CQE
   loop ran while `cq_head < cq_tail`, and both values live in ring
   memory that userspace can write. In the normal case they are
   reasonable, but corrupted or malicious values could cause an
   extremely long loop (potentially billions of iterations) in kernel
   context. This is both a **robustness fix** and a **potential DoS
   vector**, since any process can read `/proc/<pid>/fdinfo/<fd>` for
   its own io_uring fds and trigger this loop.
### Scope and Risk Assessment
- **Size**: Very small: 8 insertions and 3 deletions in a single file
- **Subsystem**: io_uring fdinfo (diagnostic/debug path, not hot path)
- **Risk**: Extremely low
  - `cond_resched()` is standard kernel practice in long loops that run
    in a context where sleeping is allowed, so regression risk is
    minimal
  - Bounding the CQE loop to `ctx->cq_entries` is correct by
    construction: the ring can't hold more entries than its size
  - The CQE32 `i++` accounting is straightforward
- **Dependencies**: None apparent; this is a self-contained change to a
  single function
### User Impact
- **Who is affected**: Anyone using io_uring who reads fdinfo
(monitoring tools, debuggers, diagnostic scripts)
- **Severity**: Soft lockups and system unresponsiveness — moderate to
high severity
- **Reproducibility**: Reported by a real user; the commit message names
  KASAN and other debugging options as the aggravating factor
### Stability Indicators
- Written by Jens Axboe (io_uring maintainer)
- Reviewed by Keith Busch (known kernel developer)
- Small, obviously correct change
- Fixes a user-reported issue
### Conclusion
This commit fixes:
1. A real soft lockup / scheduling latency bug (cond_resched in loops)
2. A potential unbounded loop from userspace-controlled values (CQE
bounds check)
Both are genuine bugs that affect real users. The fix is small,
obviously correct, self-contained, and carries virtually zero regression
risk. It meets all stable kernel criteria: it fixes a real bug, is small
and contained, is obviously correct, and doesn't introduce new features.
**YES**
io_uring/fdinfo.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/io_uring/fdinfo.c b/io_uring/fdinfo.c
index 294c75a8a3bdb..3585ad8308504 100644
--- a/io_uring/fdinfo.c
+++ b/io_uring/fdinfo.c
@@ -65,7 +65,7 @@ static void __io_uring_show_fdinfo(struct io_ring_ctx *ctx, struct seq_file *m)
 	unsigned int cq_head = READ_ONCE(r->cq.head);
 	unsigned int cq_tail = READ_ONCE(r->cq.tail);
 	unsigned int sq_shift = 0;
-	unsigned int sq_entries;
+	unsigned int cq_entries, sq_entries;
 	int sq_pid = -1, sq_cpu = -1;
 	u64 sq_total_time = 0, sq_work_time = 0;
 	unsigned int i;
@@ -119,9 +119,11 @@ static void __io_uring_show_fdinfo(struct io_ring_ctx *ctx, struct seq_file *m)
 			}
 		}
 		seq_printf(m, "\n");
+		cond_resched();
 	}
 	seq_printf(m, "CQEs:\t%u\n", cq_tail - cq_head);
-	while (cq_head < cq_tail) {
+	cq_entries = min(cq_tail - cq_head, ctx->cq_entries);
+	for (i = 0; i < cq_entries; i++) {
 		struct io_uring_cqe *cqe;
 		bool cqe32 = false;
 
@@ -136,8 +138,11 @@ static void __io_uring_show_fdinfo(struct io_ring_ctx *ctx, struct seq_file *m)
 					cqe->big_cqe[0], cqe->big_cqe[1]);
 		seq_printf(m, "\n");
 		cq_head++;
-		if (cqe32)
+		if (cqe32) {
 			cq_head++;
+			i++;
+		}
+		cond_resched();
 	}
 
 	if (ctx->flags & IORING_SETUP_SQPOLL) {
--
2.51.0