From: Ming Lei <[email protected]>
To: Jens Axboe <[email protected]>
Cc: io-uring <[email protected]>, [email protected]
Subject: Re: [PATCH] io_uring/rw: transform single vector readv/writev into ubuf
Date: Sat, 25 Mar 2023 06:41:41 +0800
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>

On Fri, Mar 24, 2023 at 08:35:38AM -0600, Jens Axboe wrote:
> It's very common to have applications that use vectored reads or writes,
> even if they only pass in a single segment. Obviously they should be
> using read/write at that point, but...
Yeah, this is like fixing an application issue on the kernel side. :-)
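For reference, the application-side pattern in question looks roughly like
the user-space sketch below. It is only an illustration using the plain
readv(2)/read(2) syscalls rather than io_uring, and the helper names
(read_vectored, read_plain, single_segment_matches_plain) are made up for
this example:

```c
#include <assert.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/* What such applications do: a vectored read with exactly one segment. */
ssize_t read_vectored(int fd, void *buf, size_t len)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };

	assert(buf != NULL);
	return readv(fd, &iov, 1);
}

/* The equivalent plain call, with no iovec to set up or retain. */
ssize_t read_plain(int fd, void *buf, size_t len)
{
	assert(buf != NULL);
	return read(fd, buf, len);
}

/*
 * Hypothetical check: queue the same payload twice in a pipe, read one
 * copy each way, and return 1 if both calls returned identical data.
 */
int single_segment_matches_plain(void)
{
	static const char msg[] = "io_uring";
	char a[sizeof(msg)] = { 0 };
	char b[sizeof(msg)] = { 0 };
	int fds[2];
	ssize_t ra, rb;
	int ok;

	if (pipe(fds) < 0)
		return 0;

	ok = write(fds[1], msg, sizeof(msg)) == (ssize_t)sizeof(msg) &&
	     write(fds[1], msg, sizeof(msg)) == (ssize_t)sizeof(msg);
	ra = ok ? read_vectored(fds[0], a, sizeof(a)) : -1;
	rb = ok ? read_plain(fds[0], b, sizeof(b)) : -1;

	close(fds[0]);
	close(fds[1]);

	return ok && ra == (ssize_t)sizeof(msg) && rb == ra &&
	       memcmp(a, b, sizeof(msg)) == 0;
}
```

With a single segment both paths deliver the same bytes, so the vectored
form only adds the iovec bookkeeping.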
>
> Vectored IO comes with the downside of needing to retain iovec state,
> and hence they require an allocation and state copy if they end up
> getting deferred. Additionally, they also require extra cleanup when
> completed as the allocated state memory has to be freed.
>
> Automatically transform single segment IORING_OP_{READV,WRITEV} into
> IORING_OP_{READ,WRITE}, and hence into an ITER_UBUF. Outside of being
> more efficient if needing deferral, ITER_UBUF is also more efficient
> for normal processing compared to ITER_IOVEC, as they don't require
> iteration. The latter is apparent when running peak testing, where
> using IORING_OP_READV to randomly read 24 drives previously scored:
>
> IOPS=72.54M, BW=35.42GiB/s, IOS/call=32/31
> IOPS=73.35M, BW=35.81GiB/s, IOS/call=32/31
> IOPS=72.71M, BW=35.50GiB/s, IOS/call=32/31
> IOPS=73.29M, BW=35.78GiB/s, IOS/call=32/32
> IOPS=73.45M, BW=35.86GiB/s, IOS/call=32/32
> IOPS=73.19M, BW=35.74GiB/s, IOS/call=31/32
> IOPS=72.89M, BW=35.59GiB/s, IOS/call=32/31
> IOPS=73.07M, BW=35.68GiB/s, IOS/call=32/32
>
> and after the change we get:
>
> IOPS=77.31M, BW=37.75GiB/s, IOS/call=32/31
> IOPS=77.32M, BW=37.75GiB/s, IOS/call=32/32
> IOPS=77.45M, BW=37.81GiB/s, IOS/call=31/31
> IOPS=77.47M, BW=37.83GiB/s, IOS/call=32/32
> IOPS=77.14M, BW=37.67GiB/s, IOS/call=32/32
> IOPS=77.14M, BW=37.66GiB/s, IOS/call=31/31
> IOPS=77.37M, BW=37.78GiB/s, IOS/call=32/32
> IOPS=77.25M, BW=37.72GiB/s, IOS/call=32/32
>
> which is a nice win as well.
>
> Signed-off-by: Jens Axboe <[email protected]>
>
> ---
>
> diff --git a/io_uring/rw.c b/io_uring/rw.c
> index 4c233910e200..5c998754cb17 100644
> --- a/io_uring/rw.c
> +++ b/io_uring/rw.c
> @@ -402,7 +402,22 @@ static struct iovec *__io_import_iovec(int ddir, struct io_kiocb *req,
>  			      req->ctx->compat);
>  	if (unlikely(ret < 0))
>  		return ERR_PTR(ret);
> -	return iovec;
> +	if (iter->nr_segs != 1)
> +		return iovec;
> +	/*
> +	 * Convert to non-vectored request if we have a single segment. If we
> +	 * need to defer the request, then we no longer have to allocate and
> +	 * maintain a struct io_async_rw. Additionally, we won't have cleanup
> +	 * to do at completion time
> +	 */
> +	rw->addr = (unsigned long) iter->iov[0].iov_base;
> +	rw->len = iter->iov[0].iov_len;
> +	iov_iter_ubuf(iter, ddir, iter->iov[0].iov_base, rw->len);
> +	/* readv -> read distance is the same as writev -> write */
> +	BUILD_BUG_ON((IORING_OP_READ - IORING_OP_READV) !=
> +			(IORING_OP_WRITE - IORING_OP_WRITEV));
> +	req->opcode += (IORING_OP_READ - IORING_OP_READV);

Changing ->opcode is a bit fragile: the two opcodes may need matching
callbacks, and the opcode seen in traces becomes inconsistent.

Why not do this transformation in io_prep_rw() from the beginning?

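To make the concern concrete, here is a small user-space toy model, not
kernel code. Everything named mock_* is hypothetical, the enum values are
my assumption of the uapi opcode numbering, and the model assumes the
segment count is already known at prep time. It contrasts rewriting the
opcode after a trace point has recorded it with deciding the opcode before
anything is recorded:

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for the opcodes; the values are an assumption. */
enum mock_op {
	MOCK_OP_READV	= 1,
	MOCK_OP_WRITEV	= 2,
	MOCK_OP_READ	= 22,
	MOCK_OP_WRITE	= 23,
};

/* Same idea as the BUILD_BUG_ON() in the patch: one offset fits both. */
_Static_assert((MOCK_OP_READ - MOCK_OP_READV) ==
	       (MOCK_OP_WRITE - MOCK_OP_WRITEV),
	       "readv->read and writev->write distances differ");

struct mock_req {
	int opcode;		/* what later stages dispatch on */
	int traced_opcode;	/* what a prep-time trace point recorded */
};

/* Prep as-is: the trace point sees the opcode the application passed. */
void mock_prep(struct mock_req *req, int opcode)
{
	req->opcode = opcode;
	req->traced_opcode = opcode;
}

/* The patch's approach in miniature: shift the opcode for one segment. */
void mock_to_non_vectored(struct mock_req *req, unsigned int nr_segs)
{
	if (nr_segs != 1)
		return;
	if (req->opcode != MOCK_OP_READV && req->opcode != MOCK_OP_WRITEV)
		return;
	req->opcode += MOCK_OP_READ - MOCK_OP_READV;
}

/* Alternative: convert during prep, before anything is recorded. */
void mock_prep_converted(struct mock_req *req, int opcode,
			 unsigned int nr_segs)
{
	req->opcode = opcode;
	mock_to_non_vectored(req, nr_segs);
	req->traced_opcode = req->opcode;
}

/* True if the traced opcode still matches the one being executed. */
bool mock_trace_consistent(const struct mock_req *req)
{
	return req->opcode == req->traced_opcode;
}
```

In this model, mock_prep() followed by mock_to_non_vectored() leaves the
traced and executed opcodes different, while mock_prep_converted() keeps
them equal.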
Thanks,
Ming