public inbox for io-uring@vger.kernel.org
* [PATCH AUTOSEL 6.19-6.18] io_uring/timeout: annotate data race in io_flush_timeouts()
       [not found] <20260211123112.1330287-1-sashal@kernel.org>
@ 2026-02-11 12:30 ` Sasha Levin
  0 siblings, 0 replies; only message in thread
From: Sasha Levin @ 2026-02-11 12:30 UTC (permalink / raw)
  To: patches, stable
  Cc: Jens Axboe, syzbot+6c48db7d94402407301e, Sasha Levin, io-uring

From: Jens Axboe <axboe@kernel.dk>

[ Upstream commit 42b12cb5fd4554679bac06bbdd05dc8b643bcc42 ]

syzbot correctly reports this as a KCSAN race, as ctx->cached_cq_tail
should be read under ->uring_lock. This isn't immediately feasible in
io_flush_timeouts(), but as long as we read a stable value, that should
be good enough. If two io-wq threads compete on this value, then they
will both end up calling io_flush_timeouts() and at least one of them
will see the correct value.

Reported-by: syzbot+6c48db7d94402407301e@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

---

## Comprehensive Analysis

### 1. COMMIT MESSAGE ANALYSIS

The commit subject explicitly says "**annotate** data race" — this is
about addressing a KCSAN-reported data race on `ctx->cached_cq_tail` in
`io_flush_timeouts()`. The author (Jens Axboe, io_uring maintainer)
acknowledges:

- syzbot correctly identifies this as a KCSAN race
- `ctx->cached_cq_tail` should be read under `->uring_lock`, which isn't
  feasible here
- The fix uses `READ_ONCE()` to ensure a stable single load
- The race is **benign**: if two io-wq threads compete, both will call
  `io_flush_timeouts()` and at least one will see the correct value

Key tags: `Reported-by: syzbot` (automated fuzzer), authored by `Jens
Axboe` (io_uring maintainer).

### 2. CODE CHANGE ANALYSIS

The change is a single-line modification in `io_flush_timeouts()`:

**Before:**
```c
seq = ctx->cached_cq_tail - atomic_read(&ctx->cq_timeouts);
```

**After:**
```c
seq = READ_ONCE(ctx->cached_cq_tail) - atomic_read(&ctx->cq_timeouts);
```

**The race mechanism:**
- `cached_cq_tail` is an `unsigned int` in `struct io_ring_ctx`, in a
  `____cacheline_aligned_in_smp` section
- It is incremented in `io_get_cqe_overflow()` (io_uring.h:256,262) and
  `io_skip_cqe()` (io_uring.c:756), normally under `->completion_lock`
  or `->uring_lock`
- `io_flush_timeouts()` is called from `__io_commit_cqring_flush()` →
  `io_commit_cqring_flush()`, which runs from `__io_cq_unlock_post()`
  and `io_cq_unlock_post()` — both call it **after** releasing
  `completion_lock`
- Without `READ_ONCE()`, the compiler could theoretically generate
  multiple loads of `cached_cq_tail`, or cache a stale value, or
  experience torn reads (though 32-bit aligned reads are atomic on most
  architectures)

**What `READ_ONCE()` provides:**
1. **Compiler barrier**: Prevents the compiler from eliding the load,
   splitting it into multiple loads, or reordering it relative to other
   volatile accesses
2. **KCSAN annotation**: Tells KCSAN that this is an intentional racy
   read, suppressing the warning
3. **Single stable load guarantee**: Ensures exactly one load
   instruction is generated

**Interesting precedent**: The same field is accessed in `io_timeout()`
at line 615 using `data_race()` instead:
```c
tail = data_race(ctx->cached_cq_tail) - atomic_read(&ctx->cq_timeouts);
```
This was added in commit `5498bf28d8f2b` (May 2023) for the same reason.
The choice of `READ_ONCE()` over `data_race()` is slightly stronger —
`READ_ONCE()` guarantees a stable volatile load, while `data_race()`
only suppresses the KCSAN warning without changing code generation.

### 3. CLASSIFICATION

This is a **data race fix** (KCSAN-detected), category "race condition".
While the commit message calls it an "annotation," it does address a
real C-language-level data race:
- Without `READ_ONCE()`, the access is a plain (unmarked) read racing
  with locked writers, which is a data race and undefined behavior under
  the C11 memory model
- `READ_ONCE()` turns it into a marked access, which the kernel's memory
  model treats as race-free, closing off compiler-induced miscompilation

However, the author is clear that the **observable impact is benign** —
the worst case is one thread seeing a slightly stale `cached_cq_tail`,
which just means some timeouts aren't flushed in this pass but will be
flushed in the next.

### 4. SCOPE AND RISK ASSESSMENT

- **Lines changed**: 1 (minimal)
- **Files touched**: 1 (`io_uring/timeout.c`)
- **Complexity**: Trivially low — just wrapping a read with
  `READ_ONCE()`
- **Risk of regression**: Essentially zero. `READ_ONCE()` only adds a
  volatile qualifier to the load; it cannot change functional behavior
- **Subsystem**: io_uring (widely used on modern systems)

### 5. USER IMPACT

- **Who is affected**: Any io_uring user with timeout operations running
  on multi-CPU systems
- **Severity of the bug**: Very low. The race is acknowledged as benign.
  No crash, no corruption, no security issue
- **Observable symptoms**: KCSAN noise in kernel logs when running with
  CONFIG_KCSAN. No user-visible functional issue
- **Without the fix**: Users running KCSAN-enabled kernels see a data
  race report. Theoretically, the compiler could generate suboptimal
  code, though this is unlikely in practice for a single `unsigned int`
  read

### 6. STABILITY INDICATORS

- Written by Jens Axboe (io_uring maintainer and subsystem creator)
- The pattern is well-established — the same fix was done for
  `io_timeout()` in 2023
- The code path is cold (`__cold` attribute on `io_flush_timeouts`)

### 7. DEPENDENCY CHECK

The patch context shows `raw_spin_lock_irq(&ctx->timeout_lock)`, but
stable trees (v6.12, v6.6, v6.1) use `spin_lock_irq(&ctx->timeout_lock)`
because the `raw_spinlock` conversion (`020b40f356249`) hasn't been
backported. The patch will need a trivial context adjustment (the
surrounding `spin_lock_irq` vs `raw_spin_lock_irq` line), but the actual
change (`READ_ONCE()` addition) has no dependencies.

The affected code exists in v6.12, v6.6, and v6.1 stable trees with the
same bug (bare `ctx->cached_cq_tail` read without annotation).
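
An untested sketch of what the adjusted hunk would look like on those
trees (only the lock-acquisition context line differs from the upstream
diff below):

```diff
 	spin_lock_irq(&ctx->timeout_lock);
-	seq = ctx->cached_cq_tail - atomic_read(&ctx->cq_timeouts);
+	seq = READ_ONCE(ctx->cached_cq_tail) - atomic_read(&ctx->cq_timeouts);
```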

### 8. VERDICT REASONING

**Arguments FOR backporting:**
- Syzbot-reported KCSAN data race — these are real bugs per the C memory
  model
- Fix is trivially small (one line) with zero regression risk
- Fixes undefined behavior (concurrent unsynchronized access)
- `READ_ONCE()` ensures compiler cannot generate problematic code
- Precedent: The same annotation was done for `io_timeout()` in 2023
- io_uring is widely used; this is a commonly exercised path for timeout
  users
- Written by subsystem maintainer

**Arguments AGAINST backporting:**
- The commit message explicitly says the race is benign ("as long as we
  read a stable value, that should be good enough")
- No crash, corruption, security issue, or user-visible problem
- This is fundamentally a KCSAN annotation — it silences a sanitizer
  warning
- The `unsigned int` field is naturally atomic on all supported
  architectures (no tearing)
- The value is read once into a local variable, so compiler optimization
  concerns are minimal

**Assessment:**
While this is a legitimate data race fix and KCSAN reports should be
taken seriously, the commit author explicitly acknowledges this is a
benign race with no user-visible consequences. The fix is purely about
C-language correctness and KCSAN suppression. Stable kernels prioritize
fixes for bugs that affect real users. This data race does not cause
crashes, corruption, or any functional issue. The risk is zero, but the
benefit is also minimal — mainly cleaner KCSAN output for kernel
developers testing stable trees.

On balance, this sits in the "safe to take" category for stable: it is
an annotation for correctness rather than a fix for a user-facing bug,
but the zero regression risk and one-line scope make it cheap to carry.

**YES**

 io_uring/timeout.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/io_uring/timeout.c b/io_uring/timeout.c
index d8fbbaf31cf35..84dda24f3eb24 100644
--- a/io_uring/timeout.c
+++ b/io_uring/timeout.c
@@ -130,7 +130,7 @@ __cold void io_flush_timeouts(struct io_ring_ctx *ctx)
 	u32 seq;
 
 	raw_spin_lock_irq(&ctx->timeout_lock);
-	seq = ctx->cached_cq_tail - atomic_read(&ctx->cq_timeouts);
+	seq = READ_ONCE(ctx->cached_cq_tail) - atomic_read(&ctx->cq_timeouts);
 
 	list_for_each_entry_safe(timeout, tmp, &ctx->timeout_list, list) {
 		struct io_kiocb *req = cmd_to_io_kiocb(timeout);
-- 
2.51.0

