public inbox for [email protected]
 help / color / mirror / Atom feed
From: Pavel Begunkov <[email protected]>
To: Ming Lei <[email protected]>
Cc: Jens Axboe <[email protected]>,
	[email protected], [email protected],
	Kevin Wolf <[email protected]>
Subject: Re: [PATCH 2/9] io_uring: support user sqe ext flags
Date: Tue, 30 Apr 2024 15:10:01 +0100	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <ZjDqb80OTfb6WzBp@fedora>

On 4/30/24 13:56, Ming Lei wrote:
> On Tue, Apr 30, 2024 at 01:00:30PM +0100, Pavel Begunkov wrote:
>> On 4/30/24 04:43, Ming Lei wrote:
>>> On Mon, Apr 29, 2024 at 04:24:54PM +0100, Pavel Begunkov wrote:
>>>> On 4/23/24 14:57, Ming Lei wrote:
>>>>> On Mon, Apr 22, 2024 at 12:16:12PM -0600, Jens Axboe wrote:
>>>>>> On 4/7/24 7:03 PM, Ming Lei wrote:
>>>>>>> sqe->flags is u8, and now we have used 7 bits, so take the last one for
>>>>>>> extending purpose.
>>>>>>>
>>>>>>> If bit7(IOSQE_HAS_EXT_FLAGS_BIT) is 1, it means this sqe carries ext flags
>>>>>>> from the last byte(.ext_flags), or bit23~bit16 of sqe->uring_cmd_flags for
>>>>>>> IORING_OP_URING_CMD.
>>>>>>>
>>>>>>> io_slot_flags() return value is converted to `ULL` because the affected bits
>>>>>>> are beyond 32bit now.
>>>>>>
>>>>>> If we're extending flags, which is something we arguably need to do at
>>>>>> some point, I think we should have them be generic and not spread out.
>>>>>
>>>>> Sorry, maybe I don't get your idea, and the ext_flag itself is always
>>>>> initialized in io_init_req(), like normal sqe->flags, same with its
>>>>> usage.
>>>>>
>>>>>> If uring_cmd needs specific flags and don't have them, then we should
>>>>>> add it just for that.
>>>>>
>>>>> The only difference is that bit23~bit16 of sqe->uring_cmd_flags is
>>>>> borrowed for uring_cmd's ext flags, because sqe byte0~47 have been taken,
>>>>> and can't be reused for generic flag. If we want to use byte48~63, it has
>>>>> to be overlapped with uring_cmd's payload, and it is one generic sqe
>>>>> flag, which is applied on uring_cmd too.
>>>>
>>>> Which is exactly the mess nobody would want to see. And I'd also
>>>
>>> The trouble is introduced by supporting uring_cmd, and solving it by setting
>>> ext flags for uring_cmd specially by liburing helper is still reasonable or
>>> understandable, IMO.
>>>
>>>> argue 8 extra bits is not enough anyway, otherwise the history will
>>>> repeat itself pretty soon
>>>
>>> It is started with 8 bits, now doubled when io_uring is basically
>>> mature, even though history might repeat, it will take much longer time
>>
>> You're mistaken, only 7 bits are taken not because there haven't been
>> ideas and need to use them, but because we're out of space and we've
>> been saving it for something that might be absolutely necessary.
>>
>> POLL_FIRST IMHO should've been a generic feature, but it worked around
>> being a send/recv specific flag, same goes for the use of registered
>> buffers, not to mention ideas for which we haven't had enough flag space.
> 
> OK, but I am wondering why not extend flags a bit so that io_uring can
> become extendable, just like this patch.

That would be great if can be done cleanly. Even having it
non contig with the first 8bits is fine, but not conditional
depending on opcode is too much.


>>>>> That is the only way I thought of, or any other suggestion for extending sqe
>>>>> flags generically?
>>>>
>>>> idea 1: just use the last bit. When we need another one it'd be time
>>>> to think about a long overdue SQE layout v2, this way we can try
>>>> to make flags u32 and clean up other problems.
>>>
>>> It looks over-kill to invent SQE v2 just for solving the trouble in
>>> uring_cmd, and supporting two layouts can be new trouble for io_uring.
>>
>> Sounds too uring_cmd centric, it's not specifically for uring_cmd, it's
>> just one of reasons. As for overkill, that's why I'm not telling you
>> to change the layour, but suggesting to take the last bit for the
>> group flag and leave future problems for the future.
> 
> You mentioned 8bit flag is designed from beginning just for saving
> space, so SQE V2 may not help us at all.

Not sure what you mean. Retrospectively speaking, u8 for flags was
an oversight


> If the last bit can be reserved for extend flag, it is still possible
> to extend sqe flags a bit, such as this patch. Otherwise, we just lose
> chance to extend sqe flags in future.

That's why I mentioned SQE layout v2, i.e. a ctx flag which reshuffles
sqe fields in a better way. Surely there will be a lot of headache with
such a migration, but you can make flags a u32 then if you find space
and wouldn't even need and extending flag.


> Jens, can you share your idea/option wrt. extending sqe flags?
> 
>>
>>
>>> Also I doubt the problem can be solved in layout v2:
>>>
>>> - 64 byte is small enough to support everything, same for v2
>>>
>>> - uring_cmd has only 16 bytes payload, taking any byte from
>>> the payload may cause trouble for drivers
>>>
>>> - the only possible change could still be to suppress bytes for OP
>>> specific flags, but it might cause trouble for some OPs, such as
>>> network.
>>
>> Look up sqe's __pad1, for example
> 
> Suppose it is just for uring_cmd, '__pad1' is shared with cmd_op, which is aligned
> with ioctl cmd and is supposed to be 32bit.

It's not shared with cmd_op, it's in a struct with it, unless you
use a u32 part of ->addr2/off, it's just that, a completely
unnecessary created padding. There was also another field left,
at least in case for nvme.

> Same with 'off' which is used in rw at least, if sqe group is to be
> generic flag.
> 
>>
>>
>>>> idea 2: the group assembling flag can move into cmds. Very roughly:
>>>>
>>>> io_cmd_init() {
>>>> 	ublk_cmd_init();
>>>> }
>>>>
>>>> ublk_cmd_init() {
>>>> 	io_uring_start_grouping(ctx, cmd);
>>>> }
>>>>
>>>> io_uring_start_grouping(ctx, cmd) {
>>>> 	ctx->grouping = true;
>>>> 	ctx->group_head = cmd->req;
>>>> }
>>>
>>> How can you know one group is starting without any flag? Or you still
>>> suggest the approach taken in fused command?
>>
>> That would be ublk's business, e.g. ublk or cmds specific flag
> 
> Then it becomes dedicated fused command actually, and last year's main
> concern is that the approach isn't generic.

My concern is anything leaking into hot paths, even if it's a
generic feature (and I wouldn't call it that). The question is
rather at what degree. I wouldn't call groups in isolation
without zc exciting, and making it to look like a generic feature
just for the sake of it might even be worse than having it opcode
specific.

Regardless, this approach doesn't forbid some other opcode from
doing ctx->grouping = true based on some other opcode specific
flag, doesn't necessarily binds it to cmds/ublk.

>>>> submit_sqe() {
>>>> 	if (ctx->grouping) {
>>>> 		link_to_group(req, ctx->group_head);
>>>> 		if (!(req->flags & REQ_F_LINK))
>>>> 			ctx->grouping = false;
>>>> 	}
>>>> }
>>>
>>> The group needs to be linked to existed link chain, so reusing REQ_F_LINK may
>>> not doable.
>>
>> Would it break zero copy feature if you cant?
> 
> The whole sqe group needs to be linked to existed link chain, so we
> can't reuse REQ_F_LINK here.

Why though? You're passing a buffer from the head to all group-linked
requests, how do normal links come into the picture?

-- 
Pavel Begunkov

  reply	other threads:[~2024-04-30 14:09 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-08  1:03 [RFC PATCH 0/9] io_uring: support sqe group and provide group kbuf Ming Lei
2024-04-08  1:03 ` [PATCH 1/9] io_uring: net: don't check sqe->__pad2[0] for send zc Ming Lei
2024-04-08  1:03 ` [PATCH 2/9] io_uring: support user sqe ext flags Ming Lei
2024-04-22 18:16   ` Jens Axboe
2024-04-23 13:57     ` Ming Lei
2024-04-29 15:24       ` Pavel Begunkov
2024-04-30  3:43         ` Ming Lei
2024-04-30 12:00           ` Pavel Begunkov
2024-04-30 12:56             ` Ming Lei
2024-04-30 14:10               ` Pavel Begunkov [this message]
2024-04-30 15:46                 ` Ming Lei
2024-05-02 14:22                   ` Pavel Begunkov
2024-05-04  1:19                     ` Ming Lei
2024-04-08  1:03 ` [PATCH 3/9] io_uring: add helper for filling cqes in __io_submit_flush_completions() Ming Lei
2024-04-08  1:03 ` [PATCH 4/9] io_uring: add one output argument to io_submit_sqe Ming Lei
2024-04-08  1:03 ` [PATCH 5/9] io_uring: support SQE group Ming Lei
2024-04-22 18:27   ` Jens Axboe
2024-04-23 13:08     ` Kevin Wolf
2024-04-24  1:39       ` Ming Lei
2024-04-25  9:27         ` Kevin Wolf
2024-04-26  7:53           ` Ming Lei
2024-04-26 17:05             ` Kevin Wolf
2024-04-29  3:34               ` Ming Lei
2024-04-29 15:48         ` Pavel Begunkov
2024-04-30  3:07           ` Ming Lei
2024-04-29 15:32       ` Pavel Begunkov
2024-04-30  3:03         ` Ming Lei
2024-04-30 12:27           ` Pavel Begunkov
2024-04-30 15:00             ` Ming Lei
2024-05-02 14:09               ` Pavel Begunkov
2024-05-04  1:56                 ` Ming Lei
2024-05-02 14:28               ` Pavel Begunkov
2024-04-24  0:46     ` Ming Lei
2024-04-08  1:03 ` [PATCH 6/9] io_uring: support providing sqe group buffer Ming Lei
2024-04-08  1:03 ` [PATCH 7/9] io_uring/uring_cmd: support provide group kernel buffer Ming Lei
2024-04-08  1:03 ` [PATCH 8/9] ublk: support provide io buffer Ming Lei
2024-04-08  1:03 ` [RFC PATCH 9/9] liburing: support sqe ext_flags & sqe group Ming Lei
2024-04-19  0:55 ` [RFC PATCH 0/9] io_uring: support sqe group and provide group kbuf Ming Lei

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox