From: Xiaoguang Wang <[email protected]>
To: Jens Axboe <[email protected]>, [email protected]
Cc: [email protected]
Subject: Re: [PATCH] io_uring: add io_uring_enter(2) fixed file support
Date: Fri, 4 Mar 2022 21:39:34 +0800 [thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
hi,
> On 3/3/22 2:19 PM, Jens Axboe wrote:
>> On 3/3/22 1:41 PM, Jens Axboe wrote:
>>> On 3/3/22 10:18 AM, Jens Axboe wrote:
>>>> On 3/3/22 9:31 AM, Jens Axboe wrote:
>>>>> On 3/3/22 7:40 AM, Jens Axboe wrote:
>>>>>> On 3/3/22 7:36 AM, Jens Axboe wrote:
>>>>>>> The only potential oddity here is that the fd passed back is not a
>>>>>>> legitimate fd. io_uring does support poll(2) on its file descriptor, so
>>>>>>> that could cause some confusion even if I don't think anyone actually
>>>>>>> does poll(2) on io_uring.
>>>>>> Side note - the only implication here is that we then likely can't make
>>>>>> the optimized behavior the default, it has to be an IORING_SETUP_REG
>>>>>> flag which tells us that the application is aware of this limitation.
>>>>>> Though I guess close(2) might mess with that too... Hmm.
>>>>> Not sure I can find a good approach for that. Tried out your patch and
>>>>> made some fixes:
>>>>>
>>>>> - Missing free on final tctx free
>>>>> - Rename registered_files to registered_rings
>>>>> - Fix off-by-ones in checking max registration count
>>>>> - Use kcalloc
>>>>> - Rename ENTER_FIXED_FILE -> ENTER_REGISTERED_RING
>>>>> - Don't pass in tctx to io_uring_unreg_ringfd()
>>>>> - Get rid of forward declaration for adding tctx node
>>>>> - Get rid of extra file pointer in io_uring_enter()
>>>>> - Fix deadlock in io_ringfd_register()
>>>>> - Use io_uring_rsrc_update rather than add a new struct type
>>>> - Allow multiple register/unregister instead of enforcing just 1 at the
>>>> time
>>>> - Check for it really being a ring fd when registering
>>>>
>>>> For different batch counts, nice improvements are seen. Roughly:
>>>>
>>>> Batch==1 15% faster
>>>> Batch==2 13% faster
>>>> Batch==4 11% faster
>>>>
>>>> This is just in microbenchmarks where the fdget/fdput play a bigger
>>>> factor, but it will certainly help with real world situations where
>>>> batching is more limited than in benchmarks.
>>> In trying this out in applications, I think the better registration API
>>> is to allow io_uring to pick the offset. The application doesn't care,
>>> it's just a magic integer there. And if we allow io_uring to pick it,
>>> then that makes things a lot easier to deal with.
>>>
>>> For registration, pass in an array of io_uring_rsrc_update structs, just
>>> filling in the ring_fd in the data field. Return value is number of ring
>>> fds registered, and up->offset now contains the chosen offset for each
>>> of them.
>>>
>>> Unregister is the same struct, but just with offset filled in.
>>>
>>> For applications using io_uring, which is all of them as far as I'm
>>> aware, we can also easily hide this. This means we can get the optimized
>>> behavior by default.
>> Only complication here is if the ring is shared across threads or
>> processes. The thread one can be common, one thread doing submit and one
>> doing completions. That one is a bit harder to handle. Without
>> inheriting registered ring fds, not sure how it can be handled by
>> default. Should probably just introduce a new queue init helper, but
>> then that requires application changes to adopt...
>>
>> Ideas welcome!
> Don't see a way to do it by default, so I think it'll have to be opt-in.
> We could make it the default if you never shared a ring, which most
> people don't do, but we can't easily do so if it's ever shared between
> tasks or processes.
Before sending this patch, I also have thought about whether we can make
this
enter_ring_fd be shared between threads and be default feature, and
found that
it's hard :)
1) first we need to ensure this ring fd registration always can be
unregistered,
so this cache is tied to io_uring_task, then when thread exits, we can
ensure
fput called correspondingly.
2) we may also implement a file cache shared between threads in a
process,
but this may introduce extra overhead, at least need locks to protect
modifications
to this cache. If this cache is per thread, it doesn't need any
synchronizations.
So I suggest to make it be just simple and opt-in, and the registration
interface
seems not complicated, a thread trying to submit sqes can adopt it easily.
>
> With liburing, if you share, you share the io_uring struct as well. So
> it's hard to maintain a new ->enter_ring_fd in there. It'd be doable if
> we could reserve real fd == . Which is of course possible
> using xarray or similar, but that'll add extra overhead.
For liburing, we may need to add fixed file version for helpers which
submit sqes
or reap cqes, for example, io_uring_submit_fixed(), which passes a
enter_ring_fd.
>
> Anyway, current version below. Only real change here is allowing either
> specific offset or generated offset, depending on what the
> io_uring_rsrc_update->offset is set to. If set to -1U, then io_uring
> will find a free offset. If set to anything else, io_uring will use that
> index (as long as it's >=0 && < MAX).
Seems you forgot to attach the newest version, and also don't see a
patch attachment.
Finally, thanks for your quick response and many code improvements,
really appreciate it.
Regards,
Xiaoguang Wang
>
next prev parent reply other threads:[~2022-03-04 13:39 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-03-03 5:28 [PATCH] io_uring: add io_uring_enter(2) fixed file support Xiaoguang Wang
2022-03-03 8:56 ` Hao Xu
2022-03-03 13:38 ` Jens Axboe
2022-03-03 14:36 ` Jens Axboe
2022-03-03 14:40 ` Jens Axboe
2022-03-03 16:31 ` Jens Axboe
2022-03-03 17:18 ` Jens Axboe
2022-03-03 20:41 ` Jens Axboe
2022-03-03 21:19 ` Jens Axboe
2022-03-04 0:07 ` Jens Axboe
2022-03-04 13:39 ` Xiaoguang Wang [this message]
2022-03-04 13:44 ` Jens Axboe
2022-03-04 15:16 ` Xiaoguang Wang
2022-03-04 15:22 ` Jens Axboe
2022-03-08 8:38 ` Xiaoguang Wang
2022-03-08 13:10 ` Jens Axboe
2022-03-03 22:24 ` Vito Caputo
2022-03-03 22:26 ` Jens Axboe
2022-03-04 1:49 ` Pavel Begunkov
2022-03-04 2:18 ` Jens Axboe
2022-03-04 2:28 ` Pavel Begunkov
2022-03-04 2:35 ` Pavel Begunkov
2022-03-04 2:43 ` Jens Axboe
2022-03-04 1:52 ` Pavel Begunkov
2022-03-04 2:19 ` Jens Axboe
2022-03-04 2:39 ` Pavel Begunkov
2022-03-04 3:03 ` Jens Axboe
2022-04-21 14:16 ` Hao Xu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=3a59a3e1-4aa8-6970-23b6-fd331fb2c75c@linux.alibaba.com \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox