From: Hao Xu <[email protected]>
To: [email protected]
Cc: Jens Axboe <[email protected]>,
Pavel Begunkov <[email protected]>,
Ingo Molnar <[email protected]>,
Wanpeng Li <[email protected]>
Subject: Re: [RFC 00/19] uringlet
Date: Thu, 25 Aug 2022 21:03:59 +0800
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>

On 8/19/22 23:27, Hao Xu wrote:
> From: Hao Xu <[email protected]>
>
> Hi Jens and all,
>
> This is an early RFC for a new way to do async IO. Currently io_uring
> works like this:
> - issue an IO request in a nowait way
>   here nowait means returning an error (EAGAIN) to the io_uring layer
>   when it would block deeper in the kernel stack.
>
> - issue an IO request in a normal (blocking) way
>   io_uring catches the EAGAIN error and creates/wakes up an io-worker to
>   redo the IO request in a blocking way. The original context moves on
>   to issue other requests. (Some types of requests, like buffered reads,
>   leverage task work to avoid io-workers.)
>
> This has two main disadvantages:
> - we have to find every block point along the kernel code path and
>   modify it to support nowait.
>   e.g. alloc_memory() ----> if (alloc_memory() fails) return -EAGAIN
>   This adds a lot of programming complexity, especially when the code
>   path is long and complicated. For example, for buffered writes we
>   have to handle locks, possibly the journal, and metadata misses such
>   as extent nodes.
>
> - By creating/waking up a new worker, we redo an IO request from the
>   very beginning, which means we re-walk the path from the beginning to
>   the previous block point.
>   The original context backtracks to the io_uring layer from the block
>   point to submit other requests, while it would be better to start the
>   new submission directly.
>
> This RFC provides a new way to do it.
> - We maintain a worker pool for each io_uring instance, and each worker
>   in it can submit requests. The original task only needs to create the
>   first worker and return to userspace. Later it doesn't need to call
>   io_uring_enter.[1]
>
> - the created worker begins to submit requests. When it blocks, just
>   let it block, and create/wake up another worker to do the submission.
>
> [1] I currently keep these workers until the io_uring context exits. In
>   other words, a worker does submission, sleeps, wakes up, but won't
>   exit. Thus the original task doesn't need to create/wake up workers.
>
> I've done some testing:
> name: buffered write
> fs: xfs
> env: qemu box, 4 cpu, 8G mem.
> tool: fio
>
> - single file test:
>
> fio ioengine=io_uring, size=10M, bs=1024, direct=0,
> thread=1, rw=randwrite, time_based=1, runtime=180
>
> async buffered writes:
> iodepth
> 1 write: IOPS=428k, BW=418MiB/s (438MB/s)(73.5GiB/180000msec);
> 2 write: IOPS=406k, BW=396MiB/s (416MB/s)(69.7GiB/180002msec);
> 4 write: IOPS=382k, BW=373MiB/s (391MB/s)(65.6GiB/180000msec);
> 8 write: IOPS=255k, BW=249MiB/s (261MB/s)(43.7GiB/180001msec);
> 16 write: IOPS=399k, BW=390MiB/s (409MB/s)(68.5GiB/180000msec);
> 32 write: IOPS=433k, BW=423MiB/s (443MB/s)(74.3GiB/180000msec);
>
> 1 lat (nsec): min=547, max=2929.3k, avg=1074.98, stdev=6498.72
> 2 lat (nsec): min=607, max=84320k, avg=3619.15, stdev=109104.36
> 4 lat (nsec): min=891, max=195941k, avg=9062.16, stdev=213600.71
> 8 lat (nsec): min=684, max=204164k, avg=29308.56, stdev=542490.72
> 16 lat (nsec): min=1002, max=77279k, avg=38716.65, stdev=461785.55
> 32 lat (nsec): min=674, max=75279k, avg=72673.91, stdev=588002.49
>
>
> uringlet:
> iodepth
> 1 write: IOPS=120k, BW=117MiB/s (123MB/s)(20.6GiB/180006msec);
> 2 write: IOPS=273k, BW=266MiB/s (279MB/s)(46.8GiB/180010msec);
> 4 write: IOPS=336k, BW=328MiB/s (344MB/s)(57.7GiB/180002msec);
> 8 write: IOPS=373k, BW=365MiB/s (382MB/s)(64.1GiB/180000msec);
> 16 write: IOPS=442k, BW=432MiB/s (453MB/s)(75.9GiB/180001msec);
> 32 write: IOPS=444k, BW=434MiB/s (455MB/s)(76.2GiB/180010msec);
>
> 1 lat (nsec): min=684, max=10790k, avg=6781.23, stdev=10000.69
> 2 lat (nsec): min=650, max=91712k, avg=5690.52, stdev=136818.11
> 4 lat (nsec): min=785, max=79038k, avg=10297.04, stdev=227375.52
> 8 lat (nsec): min=862, max=97493k, avg=19804.67, stdev=350809.60
> 16 lat (nsec): min=823, max=81279k, avg=34681.33, stdev=478427.17
> 32 lat (usec): min=6, max=105935, avg=70.55, stdev=696.08
>
> uringlet behaves worse on IOPS and latency at small iodepth. I think
> the reason is that there are more sleeps and wakeups (not sure about
> it, I'll look into it later).
>
> The downsides of uringlet:
> - it costs more CPU; the reason is similar to the sqpoll case: a
>   uringlet worker keeps checking the sqring to reduce latency.[2]
> - task->plug is disabled for now since uringlet is buggy with it.
>
> [2] For now, I allow a uringlet worker to spin on the empty sqring for
>   a while.
>
> Any comments are welcome. This early RFC only supports buffered writes
> for now; if the idea behind it proves to be the right way, I'll turn it
> into a formal patchset, resolve the detailed technical issues, and try
> to support more io_uring features.
>
> Regards,
> Hao
>
Friendly ping...
Jens, any thoughts on this one?

Thread overview: 21+ messages
2022-08-19 15:27 [RFC 00/19] uringlet Hao Xu
2022-08-19 15:27 ` [PATCH 01/19] io_uring: change return value of create_io_worker() and io_wqe_create_worker() Hao Xu
2022-08-19 15:27 ` [PATCH 02/19] io_uring: add IORING_SETUP_URINGLET Hao Xu
2022-08-19 15:27 ` [PATCH 03/19] io_uring: make worker pool per ctx for uringlet mode Hao Xu
2022-08-19 15:27 ` [PATCH 04/19] io-wq: split io_wqe_worker() to io_wqe_worker_normal() and io_wqe_worker_let() Hao Xu
2022-08-19 15:27 ` [PATCH 05/19] io_uring: add io_uringler_offload() for uringlet mode Hao Xu
2022-08-19 15:27 ` [PATCH 06/19] io-wq: change the io-worker scheduling logic Hao Xu
2022-08-19 15:27 ` [PATCH 07/19] io-wq: move worker state flags to io-wq.h Hao Xu
2022-08-19 15:27 ` [PATCH 08/19] io-wq: add IO_WORKER_F_SUBMIT and its friends Hao Xu
2022-08-19 15:27 ` [PATCH 09/19] io-wq: add IO_WORKER_F_SCHEDULED " Hao Xu
2022-08-19 15:27 ` [PATCH 10/19] io_uring: add io_submit_sqes_let() Hao Xu
2022-08-19 15:27 ` [PATCH 11/19] io_uring: don't allocate io-wq for a worker in uringlet mode Hao Xu
2022-08-19 15:27 ` [PATCH 12/19] io_uring: add uringlet worker cancellation function Hao Xu
2022-08-19 15:27 ` [PATCH 13/19] io-wq: add wq->owner for uringlet mode Hao Xu
2022-08-19 15:27 ` [PATCH 14/19] io_uring: modify issue_flags " Hao Xu
2022-08-19 15:27 ` [PATCH 15/19] io_uring: don't use inline completion cache if scheduled Hao Xu
2022-08-19 15:27 ` [PATCH 16/19] io_uring: release ctx->let when a ring exits Hao Xu
2022-08-19 15:27 ` [PATCH 17/19] io_uring: disable task plug for now Hao Xu
2022-08-19 15:27 ` [PATCH 18/19] io-wq: only do io_uringlet_end() at the first schedule time Hao Xu
2022-08-19 15:27 ` [PATCH 19/19] io_uring: wire up uringlet Hao Xu
2022-08-25 13:03 ` Hao Xu [this message]