From: Stefan Metzmacher <[email protected]>
To: Jens Axboe <[email protected]>, "Darrick J. Wong" <[email protected]>
Cc: [email protected]
Subject: Re: [PATCH 2/5] io_uring: add support for IORING_OP_URING_CMD
Date: Tue, 23 Feb 2021 09:14:56 +0100
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>

On 22.02.21 at 21:14, Jens Axboe wrote:
> On 2/22/21 1:04 PM, Stefan Metzmacher wrote:
>> Hi Jens,
>>
>>>> I've been thinking along the same lines, because having a sparse sqe layout
>>>> for the uring cmd is a pain. I do think 'personality' is a bit too specific
>>>> to be part of the shared space, that should probably belong in the pdu
>>>> instead if the user needs it. One thing they all have in common is that they'd
>>>> need a sub-command, so why not make that u16 that?
>>>>
>>>> There's also the option of simply saying that the uring_cmd sqe is just
>>>> a different type, ala:
>>>>
>>>> struct io_uring_cmd_sqe {
>>>> 	__u8	opcode;	/* IORING_OP_URING_CMD */
>>>> 	__u8	flags;
>>>> 	__u16	target_op;
>>>> 	__s32	fd;
>>>> 	__u64	user_data;
>>>> 	struct io_uring_cmd_pdu cmd_pdu;
>>>> };
>>>>
>>>> which is essentially the same as your suggestion in terms of layout
>>>> (because that is the one that makes the most sense), we just don't try
>>>> and shoe-horn it into the existing sqe. As long as we overlap
>>>> opcode/flags, then init is fine. And past init, sqe is already consumed.
>>>>
>>>> Haven't tried to wire that up yet, and it may just be that the simple
>>>> layout change you did is just easier to deal with. The important part
>>>> here is the layout, and I certainly think we should do that. There's
>>>> effectively 54 bytes of data there, if you include the target op and fd
>>>> as part of that space. 48 fully usable for whatever.
>>>
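
The byte counts quoted above can be checked with a small userspace sketch. It assumes a 64-byte sqe and models `struct io_uring_cmd_pdu` as an opaque 48-byte area, since the thread never shows its definition; the fixed-width `<stdint.h>` types stand in for the kernel's `__u8`/`__u16`/`__s32`/`__u64`, and `uring_cmd_payload_bytes()` is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the PDU is an opaque 48-byte payload area. */
struct io_uring_cmd_pdu {
	uint64_t data[6];
};

/* Userspace mirror of the layout proposed in the thread. */
struct io_uring_cmd_sqe {
	uint8_t  opcode;
	uint8_t  flags;
	uint16_t target_op;
	int32_t  fd;
	uint64_t user_data;
	struct io_uring_cmd_pdu cmd_pdu;
};

/* The command sqe must stay the same size as a regular 64-byte sqe. */
static_assert(sizeof(struct io_uring_cmd_sqe) == 64, "sqe must be 64 bytes");
/* opcode, flags, target_op, fd and user_data occupy the first 16 bytes. */
static_assert(offsetof(struct io_uring_cmd_sqe, cmd_pdu) == 16, "pdu at 16");

/* Hypothetical helper: bytes available if target_op and fd count as data. */
static inline size_t uring_cmd_payload_bytes(void)
{
	return sizeof(uint16_t) + sizeof(int32_t) +
	       sizeof(struct io_uring_cmd_pdu);
}
```

With these assumptions the helper yields 2 + 4 + 48 = 54 bytes, of which the 48-byte pdu is freely usable, matching the numbers in the quoted mail.
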
>>> OK, folded in some of your stuff, and pushed out a new branch. Find it
>>> here:
>>>
>>> https://git.kernel.dk/cgit/linux-block/log/?h=io_uring-fops.v3
>>>
>>> I did notice while doing so that you put the issue flags in the cmd,
>>> I've made them external again. Just seems cleaner to me, otherwise
>>> you'd have to modify the command for reissue rather than just
>>> pass in the flags directly.
>>
>> I think the first two commits need more verbose comments, which clearly
>> document the uring_cmd() API.
>
> Oh for sure, I just haven't gotten around to it yet :-)
>
>> Even before uring_cmd(), it's really not clear to me why we have
>> 'enum io_uring_cmd_flags' as an 'enum',
>> as it seems to be used as 'flags' (IO_URING_F_NONBLOCK|IO_URING_F_COMPLETE_DEFER).
>
> They could be unsigned int too, but not really a big deal imho.
>
>> With uring_cmd() it's not clear what the backend is supposed to do
>> with these flags.
>
> IO_URING_F_NONBLOCK tells the lower layers that the operation should be
> non-blocking, and if that isn't possible, then it must return -EAGAIN.
> If that happens, then the operation will be retried from a context where
> IO_URING_F_NONBLOCK isn't set.
>
> IO_URING_F_COMPLETE_DEFER is just part of the flags that should be
> passed to the completion side, the handler need not do anything else.
> It's only used internally, but allows fast processing if the completion
> occurs off the IO_URING_F_NONBLOCK path.
>
> It'll get documented... But the above is also why it should get passed
> in, rather than stuffed in the command itself.
Thanks!
>> I'd assume that uring_cmd() would per definition never block and go
>> async itself, by returning -EIOCBQUEUED. And a single &req->uring_cmd
>> is only ever passed once to uring_cmd() without any retry.
>
> No, -EIOCBQUEUED would mean "operation is queued, I'll call the
> completion callback for it later".

That's what I meant by async here.

> For example, you start the IO operation and you'll get a notification (eg IRQ) later on which allows
> you to complete it.

Yes, it's up to the implementation of uring_cmd() to do the processing and waiting
in the background, based on timers, hardware events or whatever, and finally call
io_uring_cmd_done().

But with this:

	ret = file->f_op->uring_cmd(&req->uring_cmd, issue_flags);
	/* queued async, consumer will call io_uring_cmd_done() when complete */
	if (ret == -EIOCBQUEUED)
		return 0;
	io_uring_cmd_done(&req->uring_cmd, ret);
	return 0;

I don't see where -EAGAIN would trigger a retry in an io-wq worker context.
Isn't -EAGAIN exposed to the cqe, similar to ret == -EAGAIN && req->flags & REQ_F_NOWAIT?

>> It's also not clear if IOSQE_ASYNC should have any impact.
>
> Handler doesn't need to care about that, it'll just mean that the
> initial queue attempt will not have IO_URING_F_NONBLOCK set.

OK, because it's then issued from the io-wq worker, correct?

>> I think we also need a way to pass IORING_OP_ASYNC_CANCEL down.
>
> Cancelation indeed needs some thought. There are a few options:
>
> 1) Request completes sync, obviously no cancelation needed here as the
> request is never stuck in a state that requires cancelation.
>
> 2) Request needs blocking context, and hence an async handler is doing
> it. The regular cancelation already works for that, nothing is needed
> here. Would probably be better handled with a cancel handler.
>
> 3) uring_cmd handler returns -EIOCBQUEUED. This is the only case that
> needs active cancelation support. Only case where that would
> currently happen are things like block IO, where we don't support
> cancelation to begin with (insert long rant on inadequate hw
> support).
>
> So tldr here is that 1+2 is already there, and 3 not being fixed leaves
> us no different than the existing support for cancelation. IOW, I don't
> think this is an initial requirement, support can always be expanded
> later.

Given that you list 2) here again, I get the impression that the logic should be:

	ret = file->f_op->uring_cmd(&req->uring_cmd, issue_flags);
	/* reschedule in io-wq worker again */
	if (ret == -EAGAIN)
		return ret;
	/* queued async, consumer will call io_uring_cmd_done() when complete */
	if (ret == -EIOCBQUEUED)
		return 0;
	io_uring_cmd_done(&req->uring_cmd, ret);
	return 0;

With that, the above would make sense, and it seems to make the whole design more flexible
for the uring_cmd implementers.
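
To make the three outcomes concrete, here is a minimal userspace model of that dispatch logic. It is only a sketch: every `mock_*` / `MOCK_*` name is hypothetical, `mock_submit()` stands in for the core retrying from a blocking context, and `EIOCBQUEUED` is defined locally because it is a kernel-internal errno (529 in the kernel's include/linux/errno.h) that userspace headers do not export.

```c
#include <errno.h>
#include <stdbool.h>

#ifndef EIOCBQUEUED
#define EIOCBQUEUED 529	/* kernel-internal errno, not in userspace headers */
#endif

#define MOCK_F_NONBLOCK (1U << 0)	/* stand-in for IO_URING_F_NONBLOCK */

struct mock_cmd {
	int (*handler)(struct mock_cmd *cmd, unsigned int issue_flags);
	bool done;	/* set once a completion has been posted */
	int result;	/* value that would end up in the cqe */
	int calls;	/* how often the handler was invoked */
};

/* Stand-in for io_uring_cmd_done(): record the completion. */
static void mock_cmd_done(struct mock_cmd *cmd, int ret)
{
	cmd->done = true;
	cmd->result = ret;
}

/* Mirrors the proposed logic: -EAGAIN propagates, -EIOCBQUEUED defers. */
static int mock_issue(struct mock_cmd *cmd, unsigned int issue_flags)
{
	int ret = cmd->handler(cmd, issue_flags);

	if (ret == -EAGAIN)
		return ret;		/* caller retries without NONBLOCK */
	if (ret == -EIOCBQUEUED)
		return 0;		/* handler completes it later */
	mock_cmd_done(cmd, ret);	/* completed synchronously */
	return 0;
}

/* Assumed core behaviour: non-blocking attempt first, then a blocking one. */
static int mock_submit(struct mock_cmd *cmd)
{
	int ret = mock_issue(cmd, MOCK_F_NONBLOCK);

	if (ret == -EAGAIN)
		ret = mock_issue(cmd, 0);
	return ret;
}

/* Example handler that can only make progress when allowed to block. */
static int mock_blocking_handler(struct mock_cmd *cmd, unsigned int issue_flags)
{
	cmd->calls++;
	if (issue_flags & MOCK_F_NONBLOCK)
		return -EAGAIN;
	return 42;	/* arbitrary result for illustration */
}

/* Example handler that always queues and completes asynchronously. */
static int mock_queued_handler(struct mock_cmd *cmd, unsigned int issue_flags)
{
	(void)issue_flags;
	cmd->calls++;
	return -EIOCBQUEUED;
}
```

In this model the blocking handler is called twice and completes on the second attempt, while the queued handler is called once and leaves the command pending until something calls the completion function.
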

However, my primary use case would be the -EIOCBQUEUED logic.
And I think it would be good to have IORING_OP_ASYNC_CANCEL logic in place for that,
as it would simplify the userspace logic to a single io_uring_opcode_supported(IORING_OP_URING_CMD).

I also noticed that some sendmsg/recvmsg implementations support -EIOCBQUEUED, e.g. _aead_recvmsg().
I guess it would be nice to support that for IORING_OP_SENDMSG and IORING_OP_RECVMSG as well.
It uses struct kiocb and iocb->ki_complete().

Would it make sense to also use struct kiocb and iocb->ki_complete() instead of
a custom io_uring_cmd_done()?

Maybe it would also be possible to have a common way to cancel a struct kiocb request...
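
As a rough illustration of the callback-based alternative, the sketch below models a completion hook embedded in a request, in the spirit of iocb->ki_complete(). It is a userspace mock under stated assumptions: `struct mock_kiocb`, `struct mock_req` and all `mock_*` functions are hypothetical, and the real ki_complete() signature differs from the simplified one used here.

```c
#include <stddef.h>

/* Recover the enclosing structure from a pointer to one of its members. */
#define mock_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-in for struct kiocb: only the completion hook. */
struct mock_kiocb {
	void (*ki_complete)(struct mock_kiocb *iocb, long ret);
};

/* Stand-in for the io_uring request that embeds the iocb. */
struct mock_req {
	long cqe_res;	/* result that would be posted to the cqe */
	int completed;
	struct mock_kiocb iocb;
};

/* Core-side callback: map the iocb back to its request and complete it. */
static void mock_req_complete(struct mock_kiocb *iocb, long ret)
{
	struct mock_req *req = mock_container_of(iocb, struct mock_req, iocb);

	req->cqe_res = ret;
	req->completed = 1;
}

static void mock_req_init(struct mock_req *req)
{
	req->cqe_res = 0;
	req->completed = 0;
	req->iocb.ki_complete = mock_req_complete;
}

/* Backend side: it only ever sees the iocb, never the request. */
static void mock_backend_finish(struct mock_kiocb *iocb, long ret)
{
	iocb->ki_complete(iocb, ret);
}
```

The point of the pattern is that the backend needs no io_uring-specific completion symbol; it just invokes the hook it was handed.
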
>>> Since we just need that one branch in req init, I do think that your
>>> suggestion of just modifying io_uring_sqe is the way to go. So that's
>>> what the above branch does.
>>
>> Thanks! I think it's much easier to handle the personality logic in
>> the core only.
>>
>> For fixed files or fixed buffers, I think helper functions like this would help:
>>
>> struct file *io_uring_cmd_get_file(struct io_uring_cmd *cmd, int fd, bool fixed);
>>
>> And similar functions for io_buffer_select or io_import_fixed.
>
> I did end up retaining that, at least in its current state it's like you
> proposed. Only change is some packing on that very union, which should
> not be necessary, but due to fun arm reasons it is.

I noticed that, thanks!

Do you also think an io_uring_cmd_get_file() would be useful?
My uring_cmd() implementation would need a 2nd struct file in order to
do something similar to a splice operation. And it might be good to
also allow fixed files to be used.

Referencing fixed buffers may also be useful. I'm not 100% sure I'll need them,
but it would be good to be flexible and prototype various solutions.
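
For discussion, this is roughly the lookup such a helper would perform, modelled in userspace. Everything here is an assumption: `struct mock_ctx`, `struct mock_file` and `mock_cmd_get_file()` are hypothetical, and the real helper would take a struct io_uring_cmd and deal with reference counting, which this sketch ignores.

```c
#include <stddef.h>

struct mock_file {
	int id;
};

/* Hypothetical context: a registered (fixed) table plus a normal fd table. */
struct mock_ctx {
	struct mock_file **fixed_files;
	size_t nr_fixed;
	struct mock_file **fd_table;
	size_t nr_fds;
};

/*
 * Resolve 'fd' either as an index into the registered file table
 * (fixed != 0) or as a regular descriptor. Returns NULL if out of range.
 */
static struct mock_file *mock_cmd_get_file(struct mock_ctx *ctx, int fd,
					   int fixed)
{
	size_t idx;

	if (fd < 0)
		return NULL;
	idx = (size_t)fd;
	if (fixed)
		return idx < ctx->nr_fixed ? ctx->fixed_files[idx] : NULL;
	return idx < ctx->nr_fds ? ctx->fd_table[idx] : NULL;
}
```

A handler needing a second file, as in the splice-like case, would call this once more with the second descriptor taken from its pdu.
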
>>> I tested the block side, and it works for getting the bs of the
>>> device. That's all the testing that has been done so far :-)
>>
>> I've added EXPORT_SYMBOL(io_uring_cmd_done); and split your net patch,
>> similar to the two block patches. So we can better isolate the core
>> from the first consumers.
>>
>> See https://git.samba.org/?p=metze/linux/wip.git;a=shortlog;h=refs/heads/io_uring-fops.v3
>
> Great thanks, I'll take a look and fold back. I'll also expand those
> commit messages :-)
Thanks!

Thread overview: 16+ messages
2021-01-27 21:25 [PATCHSET RFC 0/5] file_operations based io_uring commands Jens Axboe
2021-01-27 21:25 ` [PATCH 1/5] fs: add file_operations->uring_cmd() Jens Axboe
2021-01-27 21:25 ` [PATCH 2/5] io_uring: add support for IORING_OP_URING_CMD Jens Axboe
2021-01-28 0:38 ` Darrick J. Wong
2021-01-28 1:45 ` Jens Axboe
2021-01-28 2:19 ` Jens Axboe
2021-02-20 3:57 ` Stefan Metzmacher
2021-02-20 14:50 ` Jens Axboe
2021-02-20 16:45 ` Jens Axboe
2021-02-22 20:04 ` Stefan Metzmacher
2021-02-22 20:14 ` Jens Axboe
2021-02-23 8:14 ` Stefan Metzmacher [this message]
2021-02-23 13:21 ` Pavel Begunkov
2021-01-27 21:25 ` [PATCH 3/5] block: wire up support for file_operations->uring_cmd() Jens Axboe
2021-01-27 21:25 ` [PATCH 4/5] block: add example ioctl Jens Axboe
2021-01-27 21:25 ` [PATCH 5/5] net: wire up support for file_operations->uring_cmd() Jens Axboe