public inbox for [email protected]
 help / color / mirror / Atom feed
From: Jens Axboe <[email protected]>
To: Pavel Begunkov <[email protected]>, [email protected]
Cc: [email protected]
Subject: Re: [PATCH 2/6] io_uring: add IORING_OP_PROVIDE_BUFFERS
Date: Fri, 28 Feb 2020 21:50:26 -0700	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>

On 2/28/20 5:43 PM, Pavel Begunkov wrote:
> 
> 
> On 28/02/2020 23:30, Jens Axboe wrote:
>> IORING_OP_PROVIDE_BUFFERS uses the buffer registration infrastructure to
>> support passing in an addr/len that is associated with a buffer ID and
>> buffer group ID. The group ID is used to index and lookup the buffers,
>> while the buffer ID can be used to notify the application which buffer
>> in the group was used. The addr passed in is the starting buffer address,
>> and length is each buffer length. A number of buffers to add with can be
>> specified, in which case addr is incremented by length for each addition,
>> and each buffer increments the buffer ID specified.
>>
>> No validation is done of the buffer ID. If the application provides
>> buffers within the same group with identical buffer IDs, then it'll have
>> a hard time telling which buffer ID was used. The only restriction is
>> that the buffer ID can be a max of 16-bits in size, so USHRT_MAX is the
>> maximum ID that can be used.
>>
>> Signed-off-by: Jens Axboe <[email protected]>
> 
> 
>> +
>> +static int io_add_buffers(struct io_provide_buf *pbuf, struct list_head *list)
>> +{
>> +	struct io_buffer *buf;
>> +	u64 addr = pbuf->addr;
>> +	int i, bid = pbuf->bid;
>> +
>> +	for (i = 0; i < pbuf->nbufs; i++) {
>> +		buf = kmalloc(sizeof(*buf), GFP_KERNEL);
>> +		if (!buf)
>> +			break;
>> +
>> +		buf->addr = addr;
>> +		buf->len = pbuf->len;
>> +		buf->bid = bid;
>> +		list_add(&buf->list, list);
>> +		addr += pbuf->len;
> 
> So, it chops a linear buffer into pbuf->nbufs chunks of size pbuf->len.

Right

> Did you consider iovec? I'll try to think about it after getting some sleep

I did, my issue there is that if you do vecs, then you want non-contig
buffer IDs. And then you'd need something else than an iovec, and it
gets messy pretty quick.

But in general, I'd really like to reuse buffer IDs, because I think
that's what applications would generally want. Maybe that means that
single buffers is just easier to deal with, but at least you can do that
with this approach without needing any sort of "packaging" for the data
passed in - it's just a pointer.

I'm open to suggestions, though!

>> +static int io_provide_buffers(struct io_kiocb *req, struct io_kiocb **nxt,
>> +			      bool force_nonblock)
>> +{
>> +	struct io_provide_buf *p = &req->pbuf;
>> +	struct io_ring_ctx *ctx = req->ctx;
>> +	struct list_head *list;
>> +	int ret = 0;
>> +
>> +	/*
>> +	 * "Normal" inline submissions always hold the uring_lock, since we
>> +	 * grab it from the system call. Same is true for the SQPOLL offload.
>> +	 * The only exception is when we've detached the request and issue it
>> +	 * from an async worker thread, grab the lock for that case.
>> +	 */
>> +	if (!force_nonblock)
>> +		mutex_lock(&ctx->uring_lock);
> 
> io_poll_task_handler() calls it with force_nonblock==true, but it
> doesn't hold the mutex AFAIK.

True, that's the only exception. And that command doesn't transfer data
so would never need a buffer, but I agree that's perhaps not fully
clear. The async task handler grabs the mutex.

>> +	lockdep_assert_held(&ctx->uring_lock);
>> +
>> +	list = idr_find(&ctx->io_buffer_idr, p->gid);
>> +	if (!list) {
>> +		list = kmalloc(sizeof(*list), GFP_KERNEL);
>> +		if (!list) {
>> +			ret = -ENOMEM;
>> +			goto out;
>> +		}
>> +		INIT_LIST_HEAD(list);
>> +		ret = idr_alloc(&ctx->io_buffer_idr, list, p->gid, p->gid + 1,
>> +					GFP_KERNEL);
>> +		if (ret < 0) {
>> +			kfree(list);
>> +			goto out;
>> +		}
>> +	}
>> +
>> +	ret = io_add_buffers(p, list);
> 
> Isn't it better to not do partial registration?
> i.e. it may return ret < pbuf->nbufs

Most things work like that, though. If you ask for an 8k read, you can't
unwind if you just get 4k. We return 4k for that. I think in general, if
it fails, you're probably somewhat screwed in either case. At least with
the partial return, you know which buffers got registered and how many
you can use. If you return 0 and undo it all, then the application
really has no way to continue except abort.

-- 
Jens Axboe


  reply	other threads:[~2020-02-29  4:50 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-28 20:30 [PATCHSET v3] io_uring support for automatic buffers Jens Axboe
2020-02-28 20:30 ` [PATCH 1/6] io_uring: buffer registration infrastructure Jens Axboe
2020-02-28 20:30 ` [PATCH 2/6] io_uring: add IORING_OP_PROVIDE_BUFFERS Jens Axboe
2020-02-29  0:43   ` Pavel Begunkov
2020-02-29  4:50     ` Jens Axboe [this message]
2020-02-29 11:36       ` Pavel Begunkov
2020-02-29 17:32         ` Jens Axboe
2020-02-29 12:08   ` Pavel Begunkov
2020-02-29 17:34     ` Jens Axboe
2020-02-29 18:11       ` Jens Axboe
2020-03-09 17:03   ` Andres Freund
2020-03-09 17:17     ` Jens Axboe
2020-03-09 17:28       ` Andres Freund
2020-03-10 13:33         ` Jens Axboe
2020-02-28 20:30 ` [PATCH 3/6] io_uring: support buffer selection Jens Axboe
2020-02-29 12:21   ` Pavel Begunkov
2020-02-29 17:35     ` Jens Axboe
2020-03-09 17:21   ` Andres Freund
2020-03-10 13:37     ` Jens Axboe
2020-02-28 20:30 ` [PATCH 4/6] io_uring: add IOSQE_BUFFER_SELECT support for IORING_OP_READV Jens Axboe
2020-02-28 20:30 ` [PATCH 5/6] net: abstract out normal and compat msghdr import Jens Axboe
2020-02-28 20:30 ` [PATCH 6/6] io_uring: add IOSQE_BUFFER_SELECT support for IORING_OP_RECVMSG Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox