public inbox for io-uring@vger.kernel.org
From: Sasha Levin <sashal@kernel.org>
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: "Jens Axboe" <axboe@kernel.dk>, 是参差 <shicenci@gmail.com>,
	"Keith Busch" <kbusch@kernel.org>,
	"Sasha Levin" <sashal@kernel.org>,
	io-uring@vger.kernel.org
Subject: [PATCH AUTOSEL 6.18] io_uring/fdinfo: be a bit nicer when looping a lot of SQEs/CQEs
Date: Mon,  9 Feb 2026 07:26:41 -0500	[thread overview]
Message-ID: <20260209122714.1037915-2-sashal@kernel.org> (raw)
In-Reply-To: <20260209122714.1037915-1-sashal@kernel.org>

From: Jens Axboe <axboe@kernel.dk>

[ Upstream commit 38cfdd9dd279473a73814df9fd7e6e716951d361 ]

Add cond_resched() in those dump loops, just in case a lot of entries
are being dumped. And detect invalid CQ ring head/tail entries, to avoid
iterating more than what is necessary. Generally not an issue, but can be
if things like KASAN or other debugging metrics are enabled.

Reported-by: 是参差 <shicenci@gmail.com>
Link: https://lore.kernel.org/all/PS1PPF7E1D7501FE5631002D242DD89403FAB9BA@PS1PPF7E1D7501F.apcprd02.prod.outlook.com/
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Analysis of io_uring/fdinfo: cond_resched() and CQ bounds fix

### Commit Message Analysis

The commit addresses two issues in `io_uring/fdinfo.c`:

1. **Adding `cond_resched()` calls** in SQE/CQE dump loops to avoid
   holding the CPU for too long when dumping many entries, particularly
   with debugging tools like KASAN enabled.
2. **Detecting invalid CQ ring head/tail entries** to avoid iterating
   more than necessary — this bounds the CQE loop to `ctx->cq_entries`
   maximum iterations.

The `Reported-by:` tag indicates a real user hit this issue, and the
`Link:` points to the mailing list discussion. The patch carries a
`Reviewed-by:` from Keith Busch and is authored by Jens Axboe, the
io_uring maintainer.

### Code Change Analysis

Let me examine the changes in detail:

**Change 1: `cond_resched()` in SQE loop** (line after `seq_printf(m,
"\n")` in the SQE loop)
- This is a straightforward addition to yield the CPU during potentially
  long loops. The SQE loop already had bounded iteration (`sq_entries`
  was calculated earlier), but with KASAN or heavy debugging, each
  iteration can be slow. This prevents soft lockups.

**Change 2: Bounded CQE loop**
- Old code: `while (cq_head < cq_tail)` — this depends on userspace-
  controlled `cq_head` and `cq_tail` values (read with `READ_ONCE`). If
  these values are corrupted or malicious, `cq_tail - cq_head` could be
  enormous (up to `UINT_MAX`), causing an extremely long loop.
- New code: `cq_entries = min(cq_tail - cq_head, ctx->cq_entries)` and
  `for (i = 0; i < cq_entries; i++)` — this bounds the iteration to at
  most `ctx->cq_entries`, which is the actual ring size. This is a
  defensive bounds check.
- Also adds `cond_resched()` in the CQE loop.

**Change 3: CQE32 accounting fix**
- When a 32-byte CQE is detected (`cqe32 = true`), both `cq_head` and the
  loop counter `i` get an extra increment: a double-sized CQE consumes two
  ring slots, so the `cq_entries` bound keeps counting slots consistently.

### Bug Classification

This fixes two real problems:

1. **Soft lockup / scheduling latency issue**: Without `cond_resched()`,
   dumping many SQEs/CQEs (especially with KASAN) can cause the kernel
   to not schedule for a long time, triggering soft lockup warnings or
   causing system unresponsiveness. This is a **real bug** — reported by
   a user.

2. **Unbounded loop from userspace-controlled values**: The original CQE
   loop was bounded only by `cq_tail - cq_head`, values that live in ring
   memory mapped writable into userspace. Normally these are sane, but
   corrupted or malicious values could cause an extremely long
   (potentially billions of iterations) loop in kernel context. This is
   both a **robustness fix** and a **potential DoS vector** (any process
   can read `/proc/<pid>/fdinfo/<fd>` for its own io_uring fds,
   triggering this loop).

### Scope and Risk Assessment

- **Size**: Very small — ~10 lines changed in a single file
- **Subsystem**: io_uring fdinfo (diagnostic/debug path, not hot path)
- **Risk**: Extremely low
  - `cond_resched()` is a standard kernel practice in long loops — zero
    regression risk
  - Bounding the CQE loop to `ctx->cq_entries` is obviously correct —
    the ring can't have more entries than its size
  - The CQE32 `i++` accounting is straightforward
- **Dependencies**: None apparent — this is a self-contained change to a
  single function

### User Impact

- **Who is affected**: Anyone using io_uring who reads fdinfo
  (monitoring tools, debuggers, diagnostic scripts)
- **Severity**: Soft lockups and system unresponsiveness — moderate to
  high severity
- **Reproducibility**: Reported by a real user with a concrete scenario
  (KASAN-enabled kernel with many ring entries)

### Stability Indicators

- Written by Jens Axboe (io_uring maintainer)
- Reviewed by Keith Busch (known kernel developer)
- Small, obviously correct change
- Fixes a user-reported issue

### Conclusion

This commit fixes:
1. A real soft lockup / scheduling latency bug (cond_resched in loops)
2. A potential unbounded loop from userspace-controlled values (CQE
   bounds check)

Both are genuine bugs that affect real users. The fix is small,
obviously correct, self-contained, and carries virtually zero regression
risk. It meets all stable kernel criteria: it fixes a real bug, is small
and contained, is obviously correct, and doesn't introduce new features.

**YES**

 io_uring/fdinfo.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/io_uring/fdinfo.c b/io_uring/fdinfo.c
index 294c75a8a3bdb..3585ad8308504 100644
--- a/io_uring/fdinfo.c
+++ b/io_uring/fdinfo.c
@@ -65,7 +65,7 @@ static void __io_uring_show_fdinfo(struct io_ring_ctx *ctx, struct seq_file *m)
 	unsigned int cq_head = READ_ONCE(r->cq.head);
 	unsigned int cq_tail = READ_ONCE(r->cq.tail);
 	unsigned int sq_shift = 0;
-	unsigned int sq_entries;
+	unsigned int cq_entries, sq_entries;
 	int sq_pid = -1, sq_cpu = -1;
 	u64 sq_total_time = 0, sq_work_time = 0;
 	unsigned int i;
@@ -119,9 +119,11 @@ static void __io_uring_show_fdinfo(struct io_ring_ctx *ctx, struct seq_file *m)
 			}
 		}
 		seq_printf(m, "\n");
+		cond_resched();
 	}
 	seq_printf(m, "CQEs:\t%u\n", cq_tail - cq_head);
-	while (cq_head < cq_tail) {
+	cq_entries = min(cq_tail - cq_head, ctx->cq_entries);
+	for (i = 0; i < cq_entries; i++) {
 		struct io_uring_cqe *cqe;
 		bool cqe32 = false;
 
@@ -136,8 +138,11 @@ static void __io_uring_show_fdinfo(struct io_ring_ctx *ctx, struct seq_file *m)
 					cqe->big_cqe[0], cqe->big_cqe[1]);
 		seq_printf(m, "\n");
 		cq_head++;
-		if (cqe32)
+		if (cqe32) {
 			cq_head++;
+			i++;
+		}
+		cond_resched();
 	}
 
 	if (ctx->flags & IORING_SETUP_SQPOLL) {
-- 
2.51.0

