public inbox for [email protected]
 help / color / mirror / Atom feed
From: Caleb Sander Mateos <[email protected]>
To: Jens Axboe <[email protected]>, Pavel Begunkov <[email protected]>
Cc: Riley Thomasson <[email protected]>,
	[email protected], [email protected],
	Caleb Sander Mateos <[email protected]>
Subject: [PATCH 0/2] uring_cmd SQE corruptions
Date: Wed, 12 Feb 2025 13:45:44 -0700	[thread overview]
Message-ID: <[email protected]> (raw)

In our application issuing NVMe passthru commands, we have observed
nvme_uring_cmd fields being corrupted between when userspace initializes
the io_uring SQE and when nvme_uring_cmd_io() processes it.

We hypothesized that the uring_cmd's were executing asynchronously after
the io_uring_enter() syscall returned, yet were still reading the SQE in
the userspace-mapped SQ. Since io_uring_enter() had already incremented
the SQ head index, userspace reused the SQ slot for a new SQE once the
SQ wrapped around to it.

We confirmed this hypothesis by "poisoning" all SQEs up to the SQ head
index in userspace upon return from io_uring_enter(). By overwriting the
nvme_uring_cmd nsid field with a known garbage value, we were able to
trigger the err message in nvme_validate_passthru_nsid(), which logged
the garbage nsid value.

The issue is caused by commit 5eff57fa9f3a ("io_uring/uring_cmd: defer
SQE copying until it's needed"). With this commit reverted, the poisoned
values in the SQEs are no longer seen by nvme_uring_cmd_io().

Prior to the commit, each uring_cmd SQE was unconditionally memcpy()ed
to async_data at prep time. The commit moved this memcpy() to 2 cases
when the request goes async:
- If REQ_F_FORCE_ASYNC is set to force the initial issue to go async
- If ->uring_cmd() returns -EAGAIN in the initial non-blocking issue

This patch set fixes a bug in the EAGAIN case where the uring_cmd's sqe
pointer is not updated to point to async_data after the memcpy(),
as it correctly is in the REQ_F_FORCE_ASYNC case.

However, uring_cmd's can be issued async in other cases not enumerated
by 5eff57fa9f3a, also leading to SQE corruption. These include requests
besides the first in a linked chain, which are only issued once prior
requests complete. Requests waiting for a drain to complete would also
be initially issued async.

While it's probably possible for io_uring_cmd_prep_setup() to check for
each of these cases and avoid deferring the SQE memcpy(), we feel it
might be safer to revert 5eff57fa9f3a to avoid the corruption risk.
As discussed recently in regard to the ublk zero-copy patches[1], new
async paths added in the future could break these delicate assumptions.

Thoughts?

[1]: https://lore.kernel.org/io-uring/[email protected]/

Caleb Sander Mateos (2):
  io_uring/uring_cmd: don't assume io_uring_cmd_data layout
  io_uring/uring_cmd: switch sqe to async_data on EAGAIN

 io_uring/uring_cmd.c | 25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

-- 
2.45.2


             reply	other threads:[~2025-02-12 20:46 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-12 20:45 Caleb Sander Mateos [this message]
2025-02-12 20:45 ` [PATCH 1/2] io_uring/uring_cmd: don't assume io_uring_cmd_data layout Caleb Sander Mateos
2025-02-12 20:45 ` [PATCH 2/2] io_uring/uring_cmd: switch sqe to async_data on EAGAIN Caleb Sander Mateos
2025-02-12 20:55 ` [PATCH 0/2] uring_cmd SQE corruptions Jens Axboe
2025-02-12 21:02   ` Jens Axboe
2025-02-12 21:58     ` Caleb Sander
2025-02-12 22:34       ` Jens Axboe
2025-02-12 22:52         ` Caleb Sander
2025-02-12 22:56           ` Jens Axboe
2025-02-12 21:54   ` Caleb Sander
2025-02-12 22:39     ` Jens Axboe
2025-02-12 23:07       ` Caleb Sander Mateos
2025-02-12 23:21         ` Keith Busch
2025-02-12 23:46           ` Caleb Sander Mateos
2025-02-12 23:55             ` Jens Axboe
2025-02-13 16:28               ` Pavel Begunkov
2025-02-13 16:11             ` Pavel Begunkov
2025-02-13 14:48   ` Pavel Begunkov
2025-02-13 18:13 ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox