public inbox for [email protected]
 help / color / mirror / Atom feed
From: Clay Harris <[email protected]>
To: Lennert Buytenhek <[email protected]>
Cc: [email protected],
	Tavian Barnes <[email protected]>,
	Dmitry Kadashev <[email protected]>
Subject: Re: [PATCH v5 0/2] io_uring: add support for IORING_OP_GETDENTS
Date: Sat, 3 Apr 2021 23:42:16 -0500	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>

On Tue, Mar 30 2021 at 14:17:21 +0300, Lennert Buytenhek quoth thus:

> (These patches depend on IORING_OP_MKDIRAT going in first.)
> 
> These patches add support for IORING_OP_GETDENTS, which is a new io_uring
> opcode that more or less does an lseek(sqe->fd, sqe->off, SEEK_SET)
> followed by a getdents64(sqe->fd, (void *)sqe->addr, sqe->len).
> 
> A dumb test program which recursively scans through a directory tree
> and prints the names of all directories and files it encounters along
> the way is available here:
> 
>         https://krautbox.wantstofly.org/~buytenh/uringfind-v3.c
> 
> Changes since v4:
> 
> - Make IORING_OP_GETDENTS read from the directory's current position
>   if the specified offset value is -1 (IORING_FEAT_RW_CUR_POS).
>   (Requested / pointed out by Tavian Barnes.)

This seems like a good feature.  As I understand it, this would
allow submitting pairs of IORING_OP_GETDENTS with IOSQE_IO_HARDLINK
wherein the first specifies the current offset and the second specifies
offset -1, thereby halfing the number of kernel round trips for N getdents64.

If the entire directory fits into the first buffer, the second would
indicate EOF.  This would certainly seem like a win, but note there
are diminishing returns as the directory size increases, versus just
doubling the buffer size.


An alternate / additional idea you may wish to consider is changing
getdents64 itself.

Ordinary read functions must return 0 length to indicate EOF, because
they can return arbitrary data.  This is not the case for getdents64.

1) Define a struct linux_dirent of minimum size containing an abnormal
value as a sentinel.  d_off = 0 or -1 should work.

2) Implement a flag for getdents64.

IF
	the flag is set AND
	we are returning a non-zero length buffer AND
	there is room in the buffer for the sentinel structure AND
	a getdents64 call using the d_off of the last struct in the
		buffer would return EOF
THEN
	append the sentinel struct to the buffer.


Using the arrangement, we would still handle a 0 length return as an
EOF, but if we see the sentinel struct, we can skip the additional call
altogether.  The saves all of the pairing of buffers and extra logic,
and unless we're unlucky and the sentinel structure did not fit in
the buffer at EOF, would always reduce the number of getdents64
calls by one.

Moreover, if the flag was available outside of io_uring, for smaller
directories, this feature would cut the number of directory reads
of readdir(3) by up to half.

> - Rebase onto for-5.13/io_uring as of 2021/03/30 plus v3 of Dmitry
>   Kadashev's "io_uring: add mkdirat support".
> 
> Changes since v3:
> 
> - Made locking in io_getdents() unconditional, as the prior
>   optimization was racy.  (Pointed out by Pavel Begunkov.)
> 
> - Rebase onto for-5.13/io_uring as of 2021/03/12 plus a manually
>   applied version of the mkdirat patch.
> 
> Changes since v2 RFC:
> 
> - Rebase onto io_uring-2021-02-17 plus a manually applied version of
>   the mkdirat patch.  The latter is needed because userland (liburing)
>   has already merged the opcode for IORING_OP_MKDIRAT (in commit
>   "io_uring.h: 5.12 pending kernel sync") while this opcode isn't in
>   the kernel yet (as of io_uring-2021-02-17), and this means that this
>   can't be merged until IORING_OP_MKDIRAT is merged.
> 
> - Adapt to changes made in "io_uring: replace force_nonblock with flags"
>   that are in io_uring-2021-02-17.
> 
> Changes since v1 RFC:
> 
> - Drop the trailing '64' from IORING_OP_GETDENTS64 (suggested by
>   Matthew Wilcox).
> 
> - Instead of requiring that sqe->off be zero, use this field to pass
>   in a directory offset to start reading from.  For the first
>   IORING_OP_GETDENTS call on a directory, this can be set to zero,
>   and for subsequent calls, it can be set to the ->d_off field of
>   the last struct linux_dirent64 returned by the previous call.
> 
> Lennert Buytenhek (2):
>   readdir: split the core of getdents64(2) out into vfs_getdents()
>   io_uring: add support for IORING_OP_GETDENTS
> 
>  fs/io_uring.c                 |   66 ++++++++++++++++++++++++++++++++++++++++++
>  fs/readdir.c                  |   25 ++++++++++-----
>  include/linux/fs.h            |    4 ++
>  include/uapi/linux/io_uring.h |    1
>  4 files changed, 88 insertions(+), 8 deletions(-)

  parent reply	other threads:[~2021-04-04  4:49 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-30 11:17 [PATCH v5 0/2] io_uring: add support for IORING_OP_GETDENTS Lennert Buytenhek
2021-03-30 11:18 ` [PATCH v5 1/2] readdir: split the core of getdents64(2) out into vfs_getdents() Lennert Buytenhek
2021-03-30 11:19 ` [PATCH v5 2/2] io_uring: add support for IORING_OP_GETDENTS Lennert Buytenhek
2021-03-30 12:11   ` Pavel Begunkov
2021-04-04  4:42 ` Clay Harris [this message]
2021-04-04 17:25   ` [PATCH v5 0/2] " Tavian Barnes
2021-04-05  9:52     ` Clay Harris

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210404044216.w7dqrioahqvbg4dz@ps29521.dreamhostps.com \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox