From: Joanne Koong <joannelkoong@gmail.com>
To: Caleb Sander Mateos <csander@purestorage.com>
Cc: Jens Axboe <axboe@kernel.dk>,
io-uring@vger.kernel.org, linux-kernel@vger.kernel.org,
syzbot@syzkaller.appspotmail.com
Subject: Re: [PATCH v5 6/6] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
Date: Tue, 16 Dec 2025 15:47:31 +0800
Message-ID: <CAJnrk1bdkWVLDBrPKFVa7oPNqAw5BCbNo1N393ESp-zQOT0w5A@mail.gmail.com>
In-Reply-To: <CADUfDZr_aTixiUQUN0yRj=AbuBLTrgk5SXXsjwao54ZmMajUOA@mail.gmail.com>
On Tue, Dec 16, 2025 at 2:24 PM Caleb Sander Mateos
<csander@purestorage.com> wrote:
>
> On Mon, Dec 15, 2025 at 8:46 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> >
> > On Tue, Dec 16, 2025 at 4:10 AM Caleb Sander Mateos
> > <csander@purestorage.com> wrote:
> > >
> > > io_ring_ctx's mutex uring_lock can be quite expensive in high-IOPS
> > > workloads. Even when only one thread pinned to a single CPU is accessing
> > > the io_ring_ctx, the atomic CASes required to lock and unlock the mutex
> > > are very hot instructions. The mutex's primary purpose is to prevent
> > > concurrent io_uring system calls on the same io_ring_ctx. However, there
> > > is already a flag IORING_SETUP_SINGLE_ISSUER that promises only one
> > > task will make io_uring_enter() and io_uring_register() system calls on
> > > the io_ring_ctx once it's enabled.
> > > So if the io_ring_ctx is setup with IORING_SETUP_SINGLE_ISSUER, skip the
> > > uring_lock mutex_lock() and mutex_unlock() on the submitter_task. On
> > > other tasks acquiring the ctx uring lock, use a task work item to
> > > suspend the submitter_task for the critical section.
> >
> > Does this open the pathway to various data corruption issues since the
> > submitter task can be suspended while it's in the middle of executing
> > a section of logic that was previously protected by the mutex? With
>
> I don't think so. The submitter task is suspended by having it run a
> task work item that blocks it until the uring lock is released by the
> other task. Any section where the uring lock is held should either be
> on kernel threads, contained within an io_uring syscall, or contained
> within a task work item, none of which run other task work items. So
> whenever the submitter task runs the suspend task work, it shouldn't
> be in a uring-lock-protected section.
>
> > this patch (if I'm understanding it correctly), there's now no
> > guarantee that the logic inside the mutexed section for
> > IORING_SETUP_SINGLE_ISSUER submitter tasks is "atomically bundled", so
> > if it gets suspended between two state changes that need to be atomic
> > / bundled together, then I think the task that does the suspend would
> > now see corrupt state.
>
> Yes, I suppose there's nothing that prevents code from holding the
> uring lock across syscalls or task work items, but that would already
> be problematic. If a task holds the uring lock on return from a
> syscall or task work and then runs another task work item that tries
> to acquire the uring lock, it would deadlock.
>
> >
> > I did a quick grep and I think one example of this race shows up in
> > io_uring/rsrc.c for buffer cloning where if the src_ctx has
> > IORING_SETUP_SINGLE_ISSUER set and the cloning happens at the same
> > time the submitter task is unregistering the buffers, then this chain
> > of events happens:
> > * submitter task is executing the logic in io_sqe_buffers_unregister()
> > -> io_rsrc_data_free(), and frees data->nodes but data->nr is not yet
> > updated
> > * submitter task gets suspended through io_register_clone_buffers() ->
> > lock_two_rings() -> mutex_lock_nested(&ctx2->uring_lock, ...)
>
> I think what this is missing is that the submitter task can't get
> suspended at arbitrary points. It gets suspended in task work, and
> task work only runs when returning from the kernel to userspace. At
Ahh I see, thanks for the explanation. The documentation for
TWA_SIGNAL in task_work_add() says "@TWA_SIGNAL works like signals, in
that it will interrupt the targeted task and run the task_work,
regardless of whether the task is currently running in the kernel or
userspace", so I had assumed this preempts the kernel.
Thanks,
Joanne
> which point "nothing" should be running on the task in userspace or
> the kernel and it should be safe to run arbitrary task work items on
> the task. Though Ming recently found an interesting deadlock caused by
> acquiring a mutex in task work that runs on an unlucky ublk server
> thread[1].
>
> [1] https://lore.kernel.org/linux-block/20251212143415.485359-1-ming.lei@redhat.com/
>
> Best,
> Caleb
>
> > * after suspending the src ctx, -> io_clone_buffers() runs, which will
> > get the incorrect "nbufs = src_ctx->buf_table.nr;" value
> > * io_clone_buffers() calls io_rsrc_node_lookup() which will
> > dereference a NULL pointer
> >
> > Thanks,
> > Joanne
> >
> > > If the io_ring_ctx is IORING_SETUP_R_DISABLED (possible during
> > > io_uring_setup(), io_uring_register(), or io_uring exit), submitter_task
> > > may be set concurrently, so acquire the uring_lock before checking it.
> > > If submitter_task isn't set yet, the uring_lock suffices to provide
> > > mutual exclusion.
> > >
> > > Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
> > > Tested-by: syzbot@syzkaller.appspotmail.com
> > > ---
> > > io_uring/io_uring.c | 12 +++++
> > > io_uring/io_uring.h | 114 ++++++++++++++++++++++++++++++++++++++++++--
> > > 2 files changed, 123 insertions(+), 3 deletions(-)
> > >
Thread overview: 19+ messages
2025-12-15 20:09 [PATCH v5 0/6] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER Caleb Sander Mateos
2025-12-15 20:09 ` [PATCH v5 1/6] io_uring: use release-acquire ordering for IORING_SETUP_R_DISABLED Caleb Sander Mateos
2025-12-15 20:09 ` [PATCH v5 2/6] io_uring: clear IORING_SETUP_SINGLE_ISSUER for IORING_SETUP_SQPOLL Caleb Sander Mateos
2025-12-16 0:50 ` Joanne Koong
2025-12-16 2:29 ` Caleb Sander Mateos
2025-12-15 20:09 ` [PATCH v5 3/6] io_uring: ensure io_uring_create() initializes submitter_task Caleb Sander Mateos
2025-12-15 20:09 ` [PATCH v5 4/6] io_uring: use io_ring_submit_lock() in io_iopoll_req_issued() Caleb Sander Mateos
2025-12-16 0:34 ` Joanne Koong
2025-12-15 20:09 ` [PATCH v5 5/6] io_uring: factor out uring_lock helpers Caleb Sander Mateos
2025-12-15 20:09 ` [PATCH v5 6/6] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER Caleb Sander Mateos
2025-12-16 4:46 ` Joanne Koong
2025-12-16 6:24 ` Caleb Sander Mateos
2025-12-16 7:47 ` Joanne Koong [this message]
2025-12-16 8:14 ` Joanne Koong
2025-12-16 16:03 ` Caleb Sander Mateos
2025-12-17 2:41 ` Joanne Koong
2025-12-16 15:49 ` Caleb Sander Mateos
2025-12-16 5:21 ` [syzbot ci] " syzbot ci
2025-12-18 1:24 ` Caleb Sander Mateos