Re: [PATCH v3 0/2] io_uring: add support for IORING_OP_GETDENTS

public inbox for [email protected]
 help / color / mirror / Atom feed

From: Jens Axboe <[email protected]>
To: David Laight <[email protected]>,
	'Lennert Buytenhek' <[email protected]>,
	Al Viro <[email protected]>,
	"[email protected]" <[email protected]>,
	"[email protected]" <[email protected]>,
	"[email protected]" <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Subject: Re: [PATCH v3 0/2] io_uring: add support for IORING_OP_GETDENTS
Date: Sun, 21 Feb 2021 14:12:56 -0700	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>

On 2/21/21 12:38 PM, David Laight wrote:
> From: Jens Axboe
>> Sent: 20 February 2021 18:29
>>
>> On 2/20/21 10:44 AM, David Laight wrote:
>>> From: Lennert Buytenhek
>>>> Sent: 18 February 2021 12:27
>>>>
>>>> These patches add support for IORING_OP_GETDENTS, which is a new io_uring
>>>> opcode that more or less does an lseek(sqe->fd, sqe->off, SEEK_SET)
>>>> followed by a getdents64(sqe->fd, (void *)sqe->addr, sqe->len).
>>>>
>>>> A dumb test program for IORING_OP_GETDENTS is available here:
>>>>
>>>> 	https://krautbox.wantstofly.org/~buytenh/uringfind-v2.c
>>>>
>>>> This test program does something along the lines of what find(1) does:
>>>> it scans recursively through a directory tree and prints the names of
>>>> all directories and files it encounters along the way -- but then using
>>>> io_uring.  (The io_uring version prints the names of encountered files and
>>>> directories in an order that's determined by SQE completion order, which
>>>> is somewhat nondeterministic and likely to differ between runs.)
>>>>
>>>> On a directory tree with 14-odd million files in it that's on a
>>>> six-drive (spinning disk) btrfs raid, find(1) takes:
>>>>
>>>> 	# echo 3 > /proc/sys/vm/drop_caches
>>>> 	# time find /mnt/repo > /dev/null
>>>>
>>>> 	real    24m7.815s
>>>> 	user    0m15.015s
>>>> 	sys     0m48.340s
>>>> 	#
>>>>
>>>> And the io_uring version takes:
>>>>
>>>> 	# echo 3 > /proc/sys/vm/drop_caches
>>>> 	# time ./uringfind /mnt/repo > /dev/null
>>>>
>>>> 	real    10m29.064s
>>>> 	user    0m4.347s
>>>> 	sys     0m1.677s
>>>> 	#
>>>
>>> While there may be uses for IORING_OP_GETDENTS are you sure your
>>> test is comparing like with like?
>>> The underlying work has to be done in either case, so you are
>>> swapping system calls for code complexity.
>>
>> What complexity?
> 
> Evan adding commands to a list to execute later is 'complexity'.
> As in adding more cpu cycles.

That's a pretty blanket statement. If the list is heavily shared, and
hence contended, yes that's generally true. But it isn't.

>>> I suspect that find is actually doing a stat() call on every
>>> directory entry and that your io_uring example is just believing
>>> the 'directory' flag returned in the directory entry for most
>>> modern filesystems.
>>
>> While that may be true (find doing stat as well), the runtime is
>> clearly dominated by IO. Adding a stat on top would be an extra
>> copy, but no extra IO.
> 
> I'd expect stat() to require the disk inode be read into memory.
> getdents() only requires the data of the directory be read.
> So calling stat() requires a lot more IO.

I actually went and checked instead of guessing, and find isn't doing a
stat by default on the files.

> The other thing I just realises is that the 'system time'
> output from time is completely meaningless for the io_uring case.
> All that processing is done by a kernel thread and I doubt
> is re-attributed to the user process.

For sure, you can't directly compare the sys times. But the actual
runtime is of course directly comparable. The actual example is btrfs,
which heavily offloads to threads as well. So the find case doesn't show
you the full picture either. Note that once the reworked worker scheme
is adopted, sys time will be directly attributed to the original task
and it will be included (for io_uring, not talking about btrfs).

I'm going to ignore the rest of this email as it isn't productive going
down that path. Suffice it to say that ideally _no_ operations will be
using the async offload, that's really only for regular file IO where
you cannot poll for readiness. And even those cases are gradually being
improved to not rely on it, like for async buffered reads. We still
need to handle writes better, and ideally things like this as well. But
that's a bit further out.

-- 
Jens Axboe

     prev parent reply	other threads:[~2021-02-21 21:13 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-18 12:26 [PATCH v3 0/2] io_uring: add support for IORING_OP_GETDENTS Lennert Buytenhek
2021-02-18 12:27 ` [PATCH v3 1/2] readdir: split the core of getdents64(2) out into vfs_getdents() Lennert Buytenhek
2021-02-18 12:27 ` [PATCH v3 2/2] io_uring: add support for IORING_OP_GETDENTS Lennert Buytenhek
2021-02-19 12:05   ` Pavel Begunkov
2021-02-19 12:10     ` Pavel Begunkov
2021-02-19 18:06     ` Lennert Buytenhek
2021-02-19 12:34   ` Matthew Wilcox
2021-02-19 18:07     ` Lennert Buytenhek
2021-02-19 18:59       ` Lennert Buytenhek
2021-02-20 17:44 ` [PATCH v3 0/2] " David Laight
2021-02-20 18:29   ` Jens Axboe
2021-02-21 19:38     ` David Laight
2021-02-21 21:12       ` Jens Axboe [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox