public inbox for io-uring@vger.kernel.org
* [PATCH AUTOSEL 6.18] io_uring/fdinfo: be a bit nicer when looping a lot of SQEs/CQEs
       [not found] <20260209122714.1037915-1-sashal@kernel.org>
@ 2026-02-09 12:26 ` Sasha Levin
From: Sasha Levin @ 2026-02-09 12:26 UTC
  To: patches, stable
  Cc: Jens Axboe, 是参差, Keith Busch, Sasha Levin,
	io-uring

From: Jens Axboe <axboe@kernel.dk>

[ Upstream commit 38cfdd9dd279473a73814df9fd7e6e716951d361 ]

Add cond_resched() in those dump loops, just in case a lot of entries
are being dumped. And detect invalid CQ ring head/tail entries, to avoid
iterating more than what is necessary. Generally not an issue, but can be
if things like KASAN or other debugging metrics are enabled.

Reported-by: 是参差 <shicenci@gmail.com>
Link: https://lore.kernel.org/all/PS1PPF7E1D7501FE5631002D242DD89403FAB9BA@PS1PPF7E1D7501F.apcprd02.prod.outlook.com/
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Analysis of io_uring/fdinfo: cond_resched() and CQ bounds fix

### Commit Message Analysis

The commit addresses two issues in `io_uring/fdinfo.c`:

1. **Adding `cond_resched()` calls** in SQE/CQE dump loops to avoid
   holding the CPU for too long when dumping many entries, particularly
   with debugging tools like KASAN enabled.
2. **Detecting invalid CQ ring head/tail entries** to avoid iterating
   more than necessary; this bounds the CQE loop to at most
   `ctx->cq_entries` iterations.

There's a `Reported-by:` tag indicating a real user hit this issue, and
a `Link:` to the mailing list discussion. The patch is `Reviewed-by:
Keith Busch` and authored by `Jens Axboe` (io_uring maintainer).

### Code Change Analysis

Let me examine the changes in detail:

**Change 1: `cond_resched()` in the SQE dump loop** (added right after the
`seq_printf(m, "\n")` that ends each SQE entry)
- This is a straightforward addition to yield the CPU during potentially
  long loops. The SQE loop already had bounded iteration (`sq_entries`
  was calculated earlier), but with KASAN or heavy debugging, each
  iteration can be slow. This prevents soft lockups; a minimal sketch of
  the pattern follows.
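
A minimal sketch of the pattern, with a placeholder loop body (the real
hunk is in the diff at the end of this mail):

```c
/* Illustrative fragment only, not the exact fdinfo code: yield between
 * iterations of a long seq_file dump so a slow (e.g. KASAN-instrumented)
 * walk cannot monopolize the CPU. fdinfo is shown from process context,
 * so calling cond_resched() here is safe. */
for (i = 0; i < sq_entries; i++) {
	/* ... decode and seq_printf() one SQE ... */
	seq_printf(m, "\n");
	cond_resched();
}
```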

**Change 2: Bounded CQE loop**
- Old code: `while (cq_head < cq_tail)` — this depends on userspace-
  controlled `cq_head` and `cq_tail` values (read with `READ_ONCE`). If
  these values are corrupted or malicious, `cq_tail - cq_head` could be
  enormous (up to `UINT_MAX`), causing an extremely long loop.
- New code: `cq_entries = min(cq_tail - cq_head, ctx->cq_entries)` and
  `for (i = 0; i < cq_entries; i++)` — this bounds the iteration to at
  most `ctx->cq_entries`, which is the actual ring size. This is a
  defensive bounds check (see the arithmetic sketch after this list).
- Also adds `cond_resched()` in the CQE loop.
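
A small userspace demonstration of why the clamp matters (hypothetical
values; `ring_entries` stands in for `ctx->cq_entries`):

```c
#include <stdio.h>

int main(void)
{
	unsigned int cq_head = 100, cq_tail = 5;  /* corrupted: tail < head */
	unsigned int ring_entries = 4096;         /* stands in for ctx->cq_entries */
	unsigned int raw = cq_tail - cq_head;     /* unsigned wrap: 4294967201 */
	unsigned int bounded = raw < ring_entries ? raw : ring_entries;

	printf("unbounded loop count: %u\n", raw);     /* ~4.29 billion iterations */
	printf("bounded loop count:   %u\n", bounded); /* 4096 */
	return 0;
}
```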

**Change 3: CQE32 accounting fix**
- When CQE32 is detected (`cqe32 = true`), both `cq_head` and `i` are
  incremented, properly accounting for the double-sized CQE entry in the
  bounded loop; a small worked example follows.
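
A userspace sketch of that accounting (hypothetical numbers, not kernel
code): with 8 pending slots that are all 32-byte CQEs, bumping only
`cq_head` would report 8 entries while consuming 16 slots; bumping `i` as
well stops after 4 entries and 8 slots, matching what is actually pending.

```c
#include <stdio.h>

int main(void)
{
	unsigned int cq_entries = 8;   /* pending slots, already clamped */
	unsigned int cq_head = 0, dumped = 0, i;

	for (i = 0; i < cq_entries; i++) {
		int cqe32 = 1;         /* pretend every entry is a big CQE */

		dumped++;              /* one CQE formatted */
		cq_head++;
		if (cqe32) {
			cq_head++;     /* big CQE occupies a second slot... */
			i++;           /* ...so charge it against the bound too */
		}
	}
	printf("entries dumped: %u, slots consumed: %u\n", dumped, cq_head);
	return 0;                      /* entries dumped: 4, slots consumed: 8 */
}
```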

### Bug Classification

This fixes two real problems:

1. **Soft lockup / scheduling latency issue**: Without `cond_resched()`,
   dumping many SQEs/CQEs (especially with KASAN) can cause the kernel
   to not schedule for a long time, triggering soft lockup warnings or
   causing system unresponsiveness. This is a **real bug** — reported by
   a user.

2. **Unbounded loop from userspace-controlled values**: The original CQE
   loop was bounded only by `cq_tail - cq_head`, both of which are
   userspace-written values. In the normal case these are sane, but
   corrupted or malicious values could drive an extremely long loop
   (potentially billions of iterations) in kernel context. This is both
   a **robustness fix** and a mitigation for a **potential DoS vector**:
   any process can read `/proc/<pid>/fdinfo/<fd>` for its own io_uring
   fds and trigger this loop (a minimal userspace sketch follows).
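
A minimal userspace sketch of how a process reaches this path for its own
ring (assumes a libc and kernel that expose `__NR_io_uring_setup`; error
handling trimmed):

```c
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/io_uring.h>

int main(void)
{
	struct io_uring_params p;
	char path[64], buf[4096];
	ssize_t n;
	int ring_fd, info_fd;

	memset(&p, 0, sizeof(p));
	ring_fd = syscall(__NR_io_uring_setup, 8, &p);  /* a tiny ring is enough */
	if (ring_fd < 0)
		return 1;

	snprintf(path, sizeof(path), "/proc/self/fdinfo/%d", ring_fd);
	info_fd = open(path, O_RDONLY);
	if (info_fd < 0)
		return 1;

	/* This read ends up in the kernel's io_uring fdinfo handler, which
	 * walks the SQ/CQ rings and prints the SQEs:/CQEs: sections. */
	while ((n = read(info_fd, buf, sizeof(buf))) > 0)
		fwrite(buf, 1, (size_t)n, stdout);

	close(info_fd);
	close(ring_fd);
	return 0;
}
```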

### Scope and Risk Assessment

- **Size**: Very small — ~10 lines changed in a single file
- **Subsystem**: io_uring fdinfo (diagnostic/debug path, not hot path)
- **Risk**: Extremely low
  - `cond_resched()` is a standard kernel practice in long loops — zero
    regression risk
  - Bounding the CQE loop to `ctx->cq_entries` is obviously correct —
    the ring can't have more entries than its size
  - The CQE32 `i++` accounting is straightforward
- **Dependencies**: None apparent — this is a self-contained change to a
  single function

### User Impact

- **Who is affected**: Anyone using io_uring who reads fdinfo
  (monitoring tools, debuggers, diagnostic scripts)
- **Severity**: moderate to high; soft lockups and general system
  unresponsiveness
- **Reproducibility**: Reported by a real user with a concrete scenario
  (KASAN-enabled kernel with many ring entries)

### Stability Indicators

- Written by Jens Axboe (io_uring maintainer)
- Reviewed by Keith Busch (known kernel developer)
- Small, obviously correct change
- Fixes a user-reported issue

### Conclusion

This commit fixes:
1. A real soft lockup / scheduling latency bug (cond_resched in loops)
2. A potential unbounded loop from userspace-controlled values (CQE
   bounds check)

Both are genuine bugs that affect real users. The fix is small,
obviously correct, self-contained, and carries virtually zero regression
risk. It meets all stable kernel criteria: it fixes a real bug, is small
and contained, is obviously correct, and doesn't introduce new features.

**YES**

 io_uring/fdinfo.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/io_uring/fdinfo.c b/io_uring/fdinfo.c
index 294c75a8a3bdb..3585ad8308504 100644
--- a/io_uring/fdinfo.c
+++ b/io_uring/fdinfo.c
@@ -65,7 +65,7 @@ static void __io_uring_show_fdinfo(struct io_ring_ctx *ctx, struct seq_file *m)
 	unsigned int cq_head = READ_ONCE(r->cq.head);
 	unsigned int cq_tail = READ_ONCE(r->cq.tail);
 	unsigned int sq_shift = 0;
-	unsigned int sq_entries;
+	unsigned int cq_entries, sq_entries;
 	int sq_pid = -1, sq_cpu = -1;
 	u64 sq_total_time = 0, sq_work_time = 0;
 	unsigned int i;
@@ -119,9 +119,11 @@ static void __io_uring_show_fdinfo(struct io_ring_ctx *ctx, struct seq_file *m)
 			}
 		}
 		seq_printf(m, "\n");
+		cond_resched();
 	}
 	seq_printf(m, "CQEs:\t%u\n", cq_tail - cq_head);
-	while (cq_head < cq_tail) {
+	cq_entries = min(cq_tail - cq_head, ctx->cq_entries);
+	for (i = 0; i < cq_entries; i++) {
 		struct io_uring_cqe *cqe;
 		bool cqe32 = false;
 
@@ -136,8 +138,11 @@ static void __io_uring_show_fdinfo(struct io_ring_ctx *ctx, struct seq_file *m)
 					cqe->big_cqe[0], cqe->big_cqe[1]);
 		seq_printf(m, "\n");
 		cq_head++;
-		if (cqe32)
+		if (cqe32) {
 			cq_head++;
+			i++;
+		}
+		cond_resched();
 	}
 
 	if (ctx->flags & IORING_SETUP_SQPOLL) {
-- 
2.51.0

