From: Pavel Begunkov <[email protected]>
To: Ming Lei <[email protected]>,
Caleb Sander Mateos <[email protected]>
Cc: Jens Axboe <[email protected]>, Keith Busch <[email protected]>,
Christoph Hellwig <[email protected]>, Sagi Grimberg <[email protected]>,
Xinyu Zhang <[email protected]>,
[email protected], [email protected],
[email protected]
Subject: Re: [PATCH 0/3] Consistently look up fixed buffers before going async
Date: Mon, 24 Mar 2025 16:41:23 +0000
Message-ID: <[email protected]>
In-Reply-To: <Z95nyw8LUw0aHKCu@fedora>
On 3/22/25 07:33, Ming Lei wrote:
> On Fri, Mar 21, 2025 at 12:48:16PM -0600, Caleb Sander Mateos wrote:
>> To use ublk zero copy, an application submits a sequence of io_uring
>> operations:
>> (1) Register a ublk request's buffer into the fixed buffer table
>> (2) Use the fixed buffer in some I/O operation
>> (3) Unregister the buffer from the fixed buffer table
>>
>> The ordering of these operations is critical; if the fixed buffer lookup
>> occurs before the register or after the unregister operation, the I/O
>> will fail with EFAULT or even corrupt a different ublk request's buffer.
>> It is possible to guarantee the correct order by linking the operations,
>> but that adds overhead and doesn't allow multiple I/O operations to
>> execute in parallel using the same ublk request's buffer. Ideally, the
>> application could just submit the register, I/O, and unregister SQEs in
>> the desired order without links and io_uring would ensure the ordering.
>
> So far there are only two ways to provide the order guarantee in io_uring
> syscall viewpoint:
>
> 1) IOSQE_IO_LINK
>
> 2) submit register_buffer operation and wait its completion, then submit IO
> operations
>
> Otherwise, you may just depend on the implementation, and there isn't such
> order guarantee, and it is hard to write generic io_uring application.
>
> I posted sqe group patchset for addressing this particular requirement in
> API level.
>
> https://lore.kernel.org/linux-block/[email protected]/
>
> Now I'd suggest to re-consider this approach for respecting the order
> in API level, so both application and io_uring needn't play trick for
> addressing this real problem.

The group API was one of the major sources of uneasiness in previous
iterations of ublk zc. The kernel side was messy, even though I
understand that the messiness was necessitated by the choice of
API and its mismatch with existing io_uring bits.

The question is whether it can be made simpler and more streamlined
now, both internally and from the uapi point of view. E.g. can it
extend the traditional link paths without leaking into other core
io_uring parts where it shouldn't be? And to be honest, I can't say I
like the idea, just as I'm not excited by the links we already have.
They're a pain to keep around, the abstraction leaks into all sorts
of unexpected places, and it's not flexible enough: it needs kernel
changes for every new simple case, not to mention something more
complicated like reading memory and deciding on the next request
based on that.

I'd rather argue for letting the user do that in bpf and making
it responsible for all error parsing and argument inference, as
in the patches I sent around December, though they would need to be
extended to go beyond the cqe-sqe manipulation interface.

> With sqe group, just two OPs are needed:
>
> - provide_buffer OP(group leader)
>
> - other generic OPs(group members)
>
> group leader won't be completed until all group member OPs are done.
>
> The whole group share same IO_LINK/IO_HARDLINK flag.
>
> That is all the concept, and this approach takes less SQEs, and application
> will become simpler too.
--
Pavel Begunkov