From: Jens Axboe <[email protected]>
To: Dylan Yudaken <[email protected]>,
	"[email protected]" <[email protected]>
Subject: Re: [PATCH 12/12] io_uring: add support for ring mapped supplied buffers
Date: Sun, 1 May 2022 06:25:47 -0600	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>

On 5/1/22 4:28 AM, Dylan Yudaken wrote:
> On Sat, 2022-04-30 at 14:50 -0600, Jens Axboe wrote:
>> Provided buffers allow an application to supply io_uring with buffers
>> that can then be grabbed for a read/receive request, when the data
>> source is ready to deliver data. The existing scheme relies on using
>> IORING_OP_PROVIDE_BUFFERS to do that, but it can be difficult to use
>> in real-world applications. It's pretty efficient if the application
>> is able to supply back batches of provided buffers when they have been
>> consumed and the application is ready to recycle them, but if
>> fragmentation occurs in the buffer space, it can become difficult to
>> supply enough buffers at that point. This hurts efficiency.
>>
>> Add a register op, IORING_REGISTER_PBUF_RING, which allows an
>> application to set up a shared queue for each buffer group of provided
>> buffers. The application can then supply buffers simply by adding them
>> to this ring, and the kernel can consume them just as easily. The ring
>> shares the head with the application, the tail remains private in the
>> kernel.
>>
>> Provided buffers set up with IORING_REGISTER_PBUF_RING cannot use
>> IORING_OP_{PROVIDE,REMOVE}_BUFFERS for adding or removing entries to
>> the ring, they must use the mapped ring. Mapped provided buffer rings
>> can co-exist with normal provided buffers, just not within the same
>> group ID.
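
To make the flow above concrete, a rough userspace sketch of publishing one
buffer into such a ring is shown below. The exact uapi layout isn't part of
this hunk, so the placement of the shared head, the helper name, and the
publishing store are assumptions for illustration only:

#include <linux/io_uring.h>	/* struct io_uring_buf, as added by this series */

/*
 * Add one buffer to a ring mapped provided buffer group. 'bufs' is the
 * shared array of io_uring_buf entries, 'shared_head' is the head index
 * shared with the kernel (assumed placement), 'mask' is ring_entries - 1.
 */
static void buf_ring_add(struct io_uring_buf *bufs, unsigned int *shared_head,
			 unsigned int mask, void *addr, unsigned int len,
			 unsigned int bid)
{
	unsigned int head = *shared_head;
	struct io_uring_buf *buf = &bufs[head & mask];

	buf->addr = (unsigned long) addr;
	buf->len = len;
	buf->bid = bid;

	/* make the entry visible before publishing the new head */
	__atomic_store_n(shared_head, head + 1, __ATOMIC_RELEASE);
}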
>>
>> To gauge overhead of the existing scheme and evaluate the mapped ring
>> approach, a simple NOP benchmark was written. It uses a ring of 128
>> entries, and submits/completes 32 at a time. 'Replenish' is how many
>> buffers are provided back at a time after they have been consumed:
>>
>> Test                    Replenish                       NOPs/sec
>> ================================================================
>> No provided buffers     NA                              ~30M
>> Provided buffers        32                              ~16M
>> Provided buffers         1                              ~10M
>> Ring buffers            32                              ~27M
>> Ring buffers             1                              ~27M
>>
>> The ring mapped buffers perform almost as well as not using provided
>> buffers at all, and they don't care if you provide 1 or more back at
>> the same time. This means applications can just replenish as they go,
>> rather than needing to batch and compact, further reducing overhead in
>> the application. The NOP benchmark above doesn't need to do any
>> compaction, so that overhead isn't even reflected in the above test.
>>
>> Signed-off-by: Jens Axboe <[email protected]>
>> ---
>>  fs/io_uring.c                 | 227 +++++++++++++++++++++++++++++++---
>>  include/uapi/linux/io_uring.h |  26 ++++
>>  2 files changed, 238 insertions(+), 15 deletions(-)
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index 3d5d02b40347..a9691727652c 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -285,7 +285,16 @@ struct io_rsrc_data {
>>  struct io_buffer_list {
>>         struct list_head list;
>>         struct list_head buf_list;
>> +       struct page **buf_pages;
>>         __u16 bgid;
>> +
>> +       /* below is for ring provided buffers */
>> +       __u16 buf_nr_pages;
>> +       __u16 nr_entries;
>> +       __u16 buf_per_page;
>> +       struct io_uring_buf_ring *buf_ring;
>> +       __u32 tail;
>> +       __u32 mask;
>>  };
>>  
>>  struct io_buffer {
>> @@ -815,6 +824,7 @@ enum {
>>         REQ_F_NEED_CLEANUP_BIT,
>>         REQ_F_POLLED_BIT,
>>         REQ_F_BUFFER_SELECTED_BIT,
>> +       REQ_F_BUFFER_RING_BIT,
>>         REQ_F_COMPLETE_INLINE_BIT,
>>         REQ_F_REISSUE_BIT,
>>         REQ_F_CREDS_BIT,
>> @@ -865,6 +875,8 @@ enum {
>>         REQ_F_POLLED            = BIT(REQ_F_POLLED_BIT),
>>         /* buffer already selected */
>>         REQ_F_BUFFER_SELECTED   = BIT(REQ_F_BUFFER_SELECTED_BIT),
>> +       /* buffer selected from ring, needs commit */
>> +       REQ_F_BUFFER_RING       = BIT(REQ_F_BUFFER_RING_BIT),
>>         /* completion is deferred through io_comp_state */
>>         REQ_F_COMPLETE_INLINE   = BIT(REQ_F_COMPLETE_INLINE_BIT),
>>         /* caller should reissue async */
>> @@ -984,6 +996,15 @@ struct io_kiocb {
>>  
>>                 /* stores selected buf, valid IFF REQ_F_BUFFER_SELECTED is set */
>>                 struct io_buffer        *kbuf;
>> +
>> +               /*
>> +                * stores buffer ID for ring provided buffers, valid IFF
>> +                * REQ_F_BUFFER_RING is set.
>> +                */
>> +                struct {
>> +                        struct io_buffer_list  *buf_list;
>> +                        __u32                  bid;
>> +                };
>>         };
>>  
>>         union {
>> @@ -1564,21 +1585,30 @@ static inline void io_req_set_rsrc_node(struct io_kiocb *req,
>>  
>>  static unsigned int __io_put_kbuf(struct io_kiocb *req, struct list_head *list)
>>  {
>> -       struct io_buffer *kbuf = req->kbuf;
>>         unsigned int cflags;
>>  
>> -       cflags = IORING_CQE_F_BUFFER | (kbuf->bid << IORING_CQE_BUFFER_SHIFT);
>> -       req->flags &= ~REQ_F_BUFFER_SELECTED;
>> -       list_add(&kbuf->list, list);
>> -       req->kbuf = NULL;
>> -       return cflags;
>> +       if (req->flags & REQ_F_BUFFER_RING) {
>> +               if (req->buf_list)
>> +                       req->buf_list->tail++;
> 
> does this need locking? Both for buf_list being available and for the
> tail increment being atomic.

This needs some comments and checks around that expectation. But the idea
is that the fast path will invoke, e.g., the recv with the uring_lock
already held, and we'll hold it until we complete the request.

Basically we have two cases:

1) Op invoked with uring_lock held. Either the request completes
   successfully in this invocation, and we put the kbuf with it still
   held. The completion just increments the tail, buf now consumed. Or
   we need to retry somehow, and we can just clear REQ_F_BUFFER_RING to
   recycle the buf, that's it.

2) Op invoked without uring_lock held. We get a buf and increment the
   tail, as we'd otherwise need to grab it again for the completion.
   We're now stuck with the buf, hold it until the request completes.

#1 is the above code; it just needs some checks and safeguards to ensure
that if ->buf_list is still set, we are still holding the lock.
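
In code, the #1 path would look roughly like the below once those safeguards
are in place; the lockdep assertion is an assumption about what the added
checks might be, and the recycle fragment is a simplification of what "just
clear the flag" means:

	/* completion, uring_lock held: commit the buffer by bumping the tail */
	if (req->flags & REQ_F_BUFFER_RING) {
		if (req->buf_list) {
			/* ->buf_list is only left set while the lock is held */
			lockdep_assert_held(&req->ctx->uring_lock);
			req->buf_list->tail++;
		}
		cflags = req->bid << IORING_CQE_BUFFER_SHIFT;
		req->flags &= ~REQ_F_BUFFER_RING;
	}

	/*
	 * retry instead of completing: nothing was committed yet, so the
	 * buffer is recycled simply by dropping the flag
	 */
	req->flags &= ~REQ_F_BUFFER_RING;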

>> +
>> +               cflags = req->bid << IORING_CQE_BUFFER_SHIFT;
>> +               req->flags &= ~REQ_F_BUFFER_RING;
>> +       } else {
>> +               struct io_buffer *kbuf = req->kbuf;
>> +
>> +               cflags = kbuf->bid << IORING_CQE_BUFFER_SHIFT;
>> +               list_add(&kbuf->list, list);
>> +               req->kbuf = NULL;
> 
> I wonder if this is unnecessary? We don't do it above for buf_list and
> it seems to work. For consistency, though, it may be better to pick one
> approach.

You mean clearing ->kbuf? Yes, that's not needed; we're clearing
REQ_F_BUFFER_SELECTED anyway, so it's more of a belt-and-suspenders thing.
I just left the original code alone here; we can remove the NULL
assignment separately.

>> +       /*
>> +        * If we came in unlocked, we have no choice but to
>> +        * consume the buffer here. This does mean it'll be
>> +        * pinned until the IO completes. But coming in
>> +        * unlocked means we're in io-wq context, hence there
> 
> I do not know if this matters here, but task work can also run unlocked
> operations.

As long as the issue_flags are correct, we should be fine.
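
For illustration, the select-time decision described by the quoted comment
could be sketched like this (IO_URING_F_UNLOCKED is the existing issue flag;
the surrounding field handling is simplified, not the exact patch code):

	/* after picking the next entry from the buffer ring at select time */
	req->flags |= REQ_F_BUFFER_RING;
	if (issue_flags & IO_URING_F_UNLOCKED) {
		/*
		 * No uring_lock held (io-wq, or task_work running unlocked):
		 * consume the buffer now and drop the ->buf_list reference,
		 * so completion never touches the ring without the lock.
		 */
		bl->tail++;
		req->buf_list = NULL;
	} else {
		/* lock held across issue: defer the commit to __io_put_kbuf() */
		req->buf_list = bl;
	}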

>> @@ -423,6 +427,28 @@ struct io_uring_restriction {
>>         __u32 resv2[3];
>>  };
>>  
>> +struct io_uring_buf {
>> +       __u64   addr;
>> +       __u32   len;
>> +       __u32   bid;
>> +};
> 
> I believe bid is 16 bits due to the way it comes back in CQE flags

That is correct, it's limited to 64K on each. We can make this __u16
and add a reserved field. Ditto for io_uring_buf_reg as well.
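
Something along these lines, presumably (illustrative only, the final
layout may differ):

	struct io_uring_buf {
		__u64	addr;
		__u32	len;
		__u16	bid;
		__u16	resv;
	};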

-- 
Jens Axboe



Thread overview: 16+ messages
2022-04-30 20:50 [PATCHSET v3 0/12] Add support for ring mapped provided buffers Jens Axboe
2022-04-30 20:50 ` [PATCH 01/12] io_uring: kill io_recv_buffer_select() wrapper Jens Axboe
2022-04-30 20:50 ` [PATCH 02/12] io_uring: make io_buffer_select() return the user address directly Jens Axboe
2022-04-30 20:50 ` [PATCH 03/12] io_uring: kill io_rw_buffer_select() wrapper Jens Axboe
2022-04-30 20:50 ` [PATCH 04/12] io_uring: ignore ->buf_index if REQ_F_BUFFER_SELECT isn't set Jens Axboe
2022-04-30 20:50 ` [PATCH 05/12] io_uring: always use req->buf_index for the provided buffer group Jens Axboe
2022-04-30 20:50 ` [PATCH 06/12] io_uring: cache last io_buffer_list lookup Jens Axboe
2022-04-30 20:50 ` [PATCH 07/12] io_uring: never call io_buffer_select() for a buffer re-select Jens Axboe
2022-04-30 20:50 ` [PATCH 08/12] io_uring: add buffer selection support to IORING_OP_NOP Jens Axboe
2022-04-30 20:50 ` [PATCH 09/12] io_uring: add io_pin_pages() helper Jens Axboe
2022-04-30 20:50 ` [PATCH 10/12] io_uring: abstract out provided buffer list selection Jens Axboe
2022-04-30 20:50 ` [PATCH 11/12] io_uring: move provided and fixed buffers into the same io_kiocb area Jens Axboe
2022-04-30 20:50 ` [PATCH 12/12] io_uring: add support for ring mapped supplied buffers Jens Axboe
2022-05-01 10:28   ` Dylan Yudaken
2022-05-01 12:25     ` Jens Axboe [this message]
2022-05-01 13:25       ` Jens Axboe
