public inbox for [email protected]
 help / color / mirror / Atom feed
From: Xiaoguang Wang <[email protected]>
To: Jens Axboe <[email protected]>, [email protected]
Cc: [email protected]
Subject: Re: [PATCH] io_uring: add io_uring_enter(2) fixed file support
Date: Fri, 4 Mar 2022 21:39:34 +0800	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>

hi,

> On 3/3/22 2:19 PM, Jens Axboe wrote:
>> On 3/3/22 1:41 PM, Jens Axboe wrote:
>>> On 3/3/22 10:18 AM, Jens Axboe wrote:
>>>> On 3/3/22 9:31 AM, Jens Axboe wrote:
>>>>> On 3/3/22 7:40 AM, Jens Axboe wrote:
>>>>>> On 3/3/22 7:36 AM, Jens Axboe wrote:
>>>>>>> The only potential oddity here is that the fd passed back is not a
>>>>>>> legitimate fd. io_uring does support poll(2) on its file descriptor, so
>>>>>>> that could cause some confusion even if I don't think anyone actually
>>>>>>> does poll(2) on io_uring.
>>>>>> Side note - the only implication here is that we then likely can't make
>>>>>> the optimized behavior the default, it has to be an IORING_SETUP_REG
>>>>>> flag which tells us that the application is aware of this limitation.
>>>>>> Though I guess close(2) might mess with that too... Hmm.
>>>>> Not sure I can find a good approach for that. Tried out your patch and
>>>>> made some fixes:
>>>>>
>>>>> - Missing free on final tctx free
>>>>> - Rename registered_files to registered_rings
>>>>> - Fix off-by-ones in checking max registration count
>>>>> - Use kcalloc
>>>>> - Rename ENTER_FIXED_FILE -> ENTER_REGISTERED_RING
>>>>> - Don't pass in tctx to io_uring_unreg_ringfd()
>>>>> - Get rid of forward declaration for adding tctx node
>>>>> - Get rid of extra file pointer in io_uring_enter()
>>>>> - Fix deadlock in io_ringfd_register()
>>>>> - Use io_uring_rsrc_update rather than add a new struct type
>>>> - Allow multiple register/unregister instead of enforcing just 1 at the
>>>>    time
>>>> - Check for it really being a ring fd when registering
>>>>
>>>> For different batch counts, nice improvements are seen. Roughly:
>>>>
>>>> Batch==1	15% faster
>>>> Batch==2	13% faster
>>>> Batch==4	11% faster
>>>>
>>>> This is just in microbenchmarks where the fdget/fdput play a bigger
>>>> factor, but it will certainly help with real world situations where
>>>> batching is more limited than in benchmarks.
>>> In trying this out in applications, I think the better registration API
>>> is to allow io_uring to pick the offset. The application doesn't care,
>>> it's just a magic integer there. And if we allow io_uring to pick it,
>>> then that makes things a lot easier to deal with.
>>>
>>> For registration, pass in an array of io_uring_rsrc_update structs, just
>>> filling in the ring_fd in the data field. Return value is number of ring
>>> fds registered, and up->offset now contains the chosen offset for each
>>> of them.
>>>
>>> Unregister is the same struct, but just with offset filled in.
>>>
>>> For applications using io_uring, which is all of them as far as I'm
>>> aware, we can also easily hide this. This means we can get the optimized
>>> behavior by default.
>> Only complication here is if the ring is shared across threads or
>> processes. The thread one can be common, one thread doing submit and one
>> doing completions. That one is a bit harder to handle. Without
>> inheriting registered ring fds, not sure how it can be handled by
>> default. Should probably just introduce a new queue init helper, but
>> then that requires application changes to adopt...
>>
>> Ideas welcome!
> Don't see a way to do it by default, so I think it'll have to be opt-in.
> We could make it the default if you never shared a ring, which most
> people don't do, but we can't easily do so if it's ever shared between
> tasks or processes.
Before sending this patch, I also have thought about whether we can make 
this
enter_ring_fd be shared between threads and be default feature, and 
found that
it's hard :)
   1) first we need to ensure this ring fd registration always can be 
unregistered,
so this cache is tied to io_uring_task, then when thread exits, we can 
ensure
fput called correspondingly.
   2) we may also implement a file cache shared between threads in a 
process,
but this may introduce extra overhead, at least need locks to protect 
modifications
to this cache. If this cache is per thread, it doesn't need any 
synchronizations.

So I suggest to make it be just simple and opt-in, and the registration 
interface
seems not complicated, a thread trying to submit sqes can adopt it easily.
>
> With liburing, if you share, you share the io_uring struct as well. So
> it's hard to maintain a new ->enter_ring_fd in there. It'd be doable if
> we could reserve real fd == . Which is of course possible
> using xarray or similar, but that'll add extra overhead.
For liburing, we may need to add fixed file version for helpers which 
submit sqes
or reap cqes, for example,  io_uring_submit_fixed(), which passes a 
enter_ring_fd.

>
> Anyway, current version below. Only real change here is allowing either
> specific offset or generated offset, depending on what the
> io_uring_rsrc_update->offset is set to. If set to -1U, then io_uring
> will find a free offset. If set to anything else, io_uring will use that
> index (as long as it's >=0 && < MAX).
Seems you forgot to attach the newest version, and also don't see a 
patch attachment.
Finally, thanks for your quick response and many code improvements, 
really appreciate it.

Regards,
Xiaoguang Wang
>


  reply	other threads:[~2022-03-04 13:39 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-03  5:28 [PATCH] io_uring: add io_uring_enter(2) fixed file support Xiaoguang Wang
2022-03-03  8:56 ` Hao Xu
2022-03-03 13:38 ` Jens Axboe
2022-03-03 14:36   ` Jens Axboe
2022-03-03 14:40     ` Jens Axboe
2022-03-03 16:31       ` Jens Axboe
2022-03-03 17:18         ` Jens Axboe
2022-03-03 20:41           ` Jens Axboe
2022-03-03 21:19             ` Jens Axboe
2022-03-04  0:07               ` Jens Axboe
2022-03-04 13:39                 ` Xiaoguang Wang [this message]
2022-03-04 13:44                   ` Jens Axboe
2022-03-04 15:16                     ` Xiaoguang Wang
2022-03-04 15:22                       ` Jens Axboe
2022-03-08  8:38                         ` Xiaoguang Wang
2022-03-08 13:10                           ` Jens Axboe
2022-03-03 22:24             ` Vito Caputo
2022-03-03 22:26               ` Jens Axboe
2022-03-04  1:49         ` Pavel Begunkov
2022-03-04  2:18           ` Jens Axboe
2022-03-04  2:28             ` Pavel Begunkov
2022-03-04  2:35               ` Pavel Begunkov
2022-03-04  2:43               ` Jens Axboe
2022-03-04  1:52         ` Pavel Begunkov
2022-03-04  2:19           ` Jens Axboe
2022-03-04  2:39             ` Pavel Begunkov
2022-03-04  3:03               ` Jens Axboe
2022-04-21 14:16     ` Hao Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3a59a3e1-4aa8-6970-23b6-fd331fb2c75c@linux.alibaba.com \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox