Re: [PATCH] io_uring/rw: transform single vector readv/writev into ubuf


From: Ming Lei <[email protected]>
To: Jens Axboe <[email protected]>
Cc: io-uring <[email protected]>, [email protected]
Subject: Re: [PATCH] io_uring/rw: transform single vector readv/writev into ubuf
Date: Sat, 25 Mar 2023 06:41:41 +0800
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>

On Fri, Mar 24, 2023 at 08:35:38AM -0600, Jens Axboe wrote:
> It's very common to have applications that use vectored reads or writes,
> even if they only pass in a single segment. Obviously they should be
> using read/write at that point, but...

Yeah, it is like fixing an application issue on the kernel side :-)
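
For reference, the equivalence being relied on here can be checked from
userspace. This is only a minimal sketch, assuming plain POSIX readv(2)
and read(2) on a pipe rather than io_uring itself; the helper names are
hypothetical and do not come from the patch:

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Read up to len bytes using a vectored read with exactly one segment. */
static ssize_t read_via_single_iovec(int fd, void *buf, size_t len)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };

	return readv(fd, &iov, 1);
}

/* The same request expressed as a plain, non-vectored read. */
static ssize_t read_via_plain(int fd, void *buf, size_t len)
{
	return read(fd, buf, len);
}

/*
 * Hypothetical self-check: queue the same payload in a pipe twice, consume
 * one copy with each variant, and compare the results.
 * Returns 1 if both variants agree, 0 if they differ, -1 on setup failure.
 */
static int single_segment_equivalent(void)
{
	static const char msg[] = "io_uring";
	char a[sizeof(msg)] = { 0 };
	char b[sizeof(msg)] = { 0 };
	int fds[2];
	ssize_t ra, rb;

	if (pipe(fds) < 0)
		return -1;
	if (write(fds[1], msg, sizeof(msg)) != (ssize_t)sizeof(msg) ||
	    write(fds[1], msg, sizeof(msg)) != (ssize_t)sizeof(msg)) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	ra = read_via_single_iovec(fds[0], a, sizeof(a));
	rb = read_via_plain(fds[0], b, sizeof(b));
	close(fds[0]);
	close(fds[1]);

	return ra == rb && ra == (ssize_t)sizeof(msg) &&
	       memcmp(a, b, sizeof(msg)) == 0;
}
```

Calling single_segment_equivalent() from a test program is enough to
exercise both paths; it says nothing about the relative cost of
ITER_IOVEC and ITER_UBUF inside the kernel.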

> 
> Vectored IO comes with the downside of needing to retain iovec state,
> and hence they require an allocation and state copy if they end up
> getting deferred. Additionally, they also require extra cleanup when
> completed, as the allocated state memory has to be freed.
> 
> Automatically transform single segment IORING_OP_{READV,WRITEV} into
> IORING_OP_{READ,WRITE}, and hence into an ITER_UBUF. Outside of being
> more efficient if needing deferral, ITER_UBUF is also more efficient
> for normal processing compared to ITER_IOVEC, as they don't require
> iteration. The latter is apparent when running peak testing, where
> using IORING_OP_READV to randomly read 24 drives previously scored:
> 
> IOPS=72.54M, BW=35.42GiB/s, IOS/call=32/31
> IOPS=73.35M, BW=35.81GiB/s, IOS/call=32/31
> IOPS=72.71M, BW=35.50GiB/s, IOS/call=32/31
> IOPS=73.29M, BW=35.78GiB/s, IOS/call=32/32
> IOPS=73.45M, BW=35.86GiB/s, IOS/call=32/32
> IOPS=73.19M, BW=35.74GiB/s, IOS/call=31/32
> IOPS=72.89M, BW=35.59GiB/s, IOS/call=32/31
> IOPS=73.07M, BW=35.68GiB/s, IOS/call=32/32
> 
> and after the change we get:
> 
> IOPS=77.31M, BW=37.75GiB/s, IOS/call=32/31
> IOPS=77.32M, BW=37.75GiB/s, IOS/call=32/32
> IOPS=77.45M, BW=37.81GiB/s, IOS/call=31/31
> IOPS=77.47M, BW=37.83GiB/s, IOS/call=32/32
> IOPS=77.14M, BW=37.67GiB/s, IOS/call=32/32
> IOPS=77.14M, BW=37.66GiB/s, IOS/call=31/31
> IOPS=77.37M, BW=37.78GiB/s, IOS/call=32/32
> IOPS=77.25M, BW=37.72GiB/s, IOS/call=32/32
> 
> which is a nice win as well.
> 
> Signed-off-by: Jens Axboe <[email protected]>
> 
> ---
> 
> diff --git a/io_uring/rw.c b/io_uring/rw.c
> index 4c233910e200..5c998754cb17 100644
> --- a/io_uring/rw.c
> +++ b/io_uring/rw.c
> @@ -402,7 +402,22 @@ static struct iovec *__io_import_iovec(int ddir, struct io_kiocb *req,
>  			      req->ctx->compat);
>  	if (unlikely(ret < 0))
>  		return ERR_PTR(ret);
> -	return iovec;
> +	if (iter->nr_segs != 1)
> +		return iovec;
> +	/*
> +	 * Convert to non-vectored request if we have a single segment. If we
> +	 * need to defer the request, then we no longer have to allocate and
> +	 * maintain a struct io_async_rw. Additionally, we won't have cleanup
> +	 * to do at completion time
> +	 */
> +	rw->addr = (unsigned long) iter->iov[0].iov_base;
> +	rw->len = iter->iov[0].iov_len;
> +	iov_iter_ubuf(iter, ddir, iter->iov[0].iov_base, rw->len);
> +	/* readv -> read distance is the same as writev -> write */
> +	BUILD_BUG_ON((IORING_OP_READ - IORING_OP_READV) !=
> +			(IORING_OP_WRITE - IORING_OP_WRITEV));
> +	req->opcode += (IORING_OP_READ - IORING_OP_READV);

Changing ->opcode is a bit fragile: the two opcodes may need matching
callbacks, and the traces end up showing inconsistent opcodes.

Why not do the transformation in io_prep_rw() from the beginning?
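
To make the concern concrete, here is a minimal userspace sketch of the
opcode-offset trick in the quoted hunk. Everything in it is a toy model:
the toy_* names and struct layout are hypothetical, and the enum values
are an assumption meant to mirror the uapi opcode numbering, not a copy
of kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Toy opcodes; the values are an assumption, chosen to mirror the uapi enum. */
enum toy_op {
	TOY_OP_READV  = 1,
	TOY_OP_WRITEV = 2,
	TOY_OP_READ   = 22,
	TOY_OP_WRITE  = 23,
};

/* Userspace stand-in for BUILD_BUG_ON: both distances must be identical. */
static_assert((TOY_OP_READ - TOY_OP_READV) ==
	      (TOY_OP_WRITE - TOY_OP_WRITEV),
	      "readv->read and writev->write offsets differ");

/* Hypothetical, heavily simplified request. */
struct toy_req {
	uint8_t  opcode;
	uint64_t addr;
	size_t   len;
};

/*
 * Fold a single-segment vectored request into its non-vectored twin.
 * Returns 1 if the request was converted, 0 if it was left untouched.
 */
static int toy_fold_single_segment(struct toy_req *req,
				   const struct iovec *iov, size_t nr_segs)
{
	if (nr_segs != 1)
		return 0;
	if (req->opcode != TOY_OP_READV && req->opcode != TOY_OP_WRITEV)
		return 0;

	req->addr = (uint64_t)(uintptr_t)iov[0].iov_base;
	req->len = iov[0].iov_len;
	/* After this line the request no longer carries the submitted opcode. */
	req->opcode += TOY_OP_READ - TOY_OP_READV;
	return 1;
}
```

After a successful fold, anything that inspects req->opcode later
(dispatch tables, tracepoints) sees the READ/WRITE value rather than what
the application submitted, which is the inconsistency mentioned above.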


Thanks,
Ming

Thread overview: 7+ messages
2023-03-24 14:35 [PATCH] io_uring/rw: transform single vector readv/writev into ubuf Jens Axboe
2023-03-24 22:41 ` Ming Lei [this message]
2023-03-24 23:06   ` Jens Axboe
2023-03-25  0:24     ` Ming Lei
2023-03-27 11:45       ` Pavel Begunkov
2023-03-24 23:54 ` Keith Busch
2023-03-25  1:06   ` Keith Busch
