From: Jens Axboe <[email protected]>
To: Xiaoguang Wang <[email protected]>,
[email protected]
Cc: [email protected]
Subject: Re: [PATCH] io_uring: add io_uring_enter(2) fixed file support
Date: Thu, 3 Mar 2022 17:07:21 -0700 [thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
On 3/3/22 2:19 PM, Jens Axboe wrote:
> On 3/3/22 1:41 PM, Jens Axboe wrote:
>> On 3/3/22 10:18 AM, Jens Axboe wrote:
>>> On 3/3/22 9:31 AM, Jens Axboe wrote:
>>>> On 3/3/22 7:40 AM, Jens Axboe wrote:
>>>>> On 3/3/22 7:36 AM, Jens Axboe wrote:
>>>>>> The only potential oddity here is that the fd passed back is not a
>>>>>> legitimate fd. io_uring does support poll(2) on its file descriptor, so
>>>>>> that could cause some confusion even if I don't think anyone actually
>>>>>> does poll(2) on io_uring.
>>>>>
>>>>> Side note - the only implication here is that we then likely can't make
>>>>> the optimized behavior the default, it has to be an IORING_SETUP_REG
>>>>> flag which tells us that the application is aware of this limitation.
>>>>> Though I guess close(2) might mess with that too... Hmm.
>>>>
>>>> Not sure I can find a good approach for that. Tried out your patch and
>>>> made some fixes:
>>>>
>>>> - Missing free on final tctx free
>>>> - Rename registered_files to registered_rings
>>>> - Fix off-by-ones in checking max registration count
>>>> - Use kcalloc
>>>> - Rename ENTER_FIXED_FILE -> ENTER_REGISTERED_RING
>>>> - Don't pass in tctx to io_uring_unreg_ringfd()
>>>> - Get rid of forward declaration for adding tctx node
>>>> - Get rid of extra file pointer in io_uring_enter()
>>>> - Fix deadlock in io_ringfd_register()
>>>> - Use io_uring_rsrc_update rather than add a new struct type
>>>
>>> - Allow multiple register/unregister instead of enforcing just 1 at the
>>> time
>>> - Check for it really being a ring fd when registering
>>>
>>> For different batch counts, nice improvements are seen. Roughly:
>>>
>>> Batch==1 15% faster
>>> Batch==2 13% faster
>>> Batch==4 11% faster
>>>
>>> This is just in microbenchmarks where the fdget/fdput play a bigger
>>> factor, but it will certainly help with real world situations where
>>> batching is more limited than in benchmarks.
>>
>> In trying this out in applications, I think the better registration API
>> is to allow io_uring to pick the offset. The application doesn't care,
>> it's just a magic integer there. And if we allow io_uring to pick it,
>> then that makes things a lot easier to deal with.
>>
>> For registration, pass in an array of io_uring_rsrc_update structs, just
>> filling in the ring_fd in the data field. Return value is number of ring
>> fds registered, and up->offset now contains the chosen offset for each
>> of them.
>>
>> Unregister is the same struct, but just with offset filled in.
>>
>> For applications using io_uring, which is all of them as far as I'm
>> aware, we can also easily hide this. This means we can get the optimized
>> behavior by default.
>
> Only complication here is if the ring is shared across threads or
> processes. The thread one can be common, one thread doing submit and one
> doing completions. That one is a bit harder to handle. Without
> inheriting registered ring fds, not sure how it can be handled by
> default. Should probably just introduce a new queue init helper, but
> then that requires application changes to adopt...
>
> Ideas welcome!
Don't see a way to do it by default, so I think it'll have to be opt-in.
We could make it the default if you never shared a ring, which most
people don't do, but we can't easily do so if it's ever shared between
tasks or processes.
With liburing, if you share, you share the io_uring struct as well. So
it's hard to maintain a new ->enter_ring_fd in there. It'd be doable if
we could reserve real fd == registered fd. Which is of course possible
using xarray or similar, but that'll add extra overhead.
Anyway, current version below. Only real change here is allowing either
specific offset or generated offset, depending on what the
io_uring_rsrc_update->offset is set to. If set to -1U, then io_uring
will find a free offset. If set to anything else, io_uring will use that
index (as long as it's >=0 && < MAX).
--
Jens Axboe
next prev parent reply other threads:[~2022-03-04 0:07 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-03-03 5:28 [PATCH] io_uring: add io_uring_enter(2) fixed file support Xiaoguang Wang
2022-03-03 8:56 ` Hao Xu
2022-03-03 13:38 ` Jens Axboe
2022-03-03 14:36 ` Jens Axboe
2022-03-03 14:40 ` Jens Axboe
2022-03-03 16:31 ` Jens Axboe
2022-03-03 17:18 ` Jens Axboe
2022-03-03 20:41 ` Jens Axboe
2022-03-03 21:19 ` Jens Axboe
2022-03-04 0:07 ` Jens Axboe [this message]
2022-03-04 13:39 ` Xiaoguang Wang
2022-03-04 13:44 ` Jens Axboe
2022-03-04 15:16 ` Xiaoguang Wang
2022-03-04 15:22 ` Jens Axboe
2022-03-08 8:38 ` Xiaoguang Wang
2022-03-08 13:10 ` Jens Axboe
2022-03-03 22:24 ` Vito Caputo
2022-03-03 22:26 ` Jens Axboe
2022-03-04 1:49 ` Pavel Begunkov
2022-03-04 2:18 ` Jens Axboe
2022-03-04 2:28 ` Pavel Begunkov
2022-03-04 2:35 ` Pavel Begunkov
2022-03-04 2:43 ` Jens Axboe
2022-03-04 1:52 ` Pavel Begunkov
2022-03-04 2:19 ` Jens Axboe
2022-03-04 2:39 ` Pavel Begunkov
2022-03-04 3:03 ` Jens Axboe
2022-04-21 14:16 ` Hao Xu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox