public inbox for [email protected]
 help / color / mirror / Atom feed
From: Clay Harris <[email protected]>
To: Tavian Barnes <[email protected]>
Cc: Lennert Buytenhek <[email protected]>,
	[email protected], Dmitry Kadashev <[email protected]>
Subject: Re: [PATCH v5 0/2] io_uring: add support for IORING_OP_GETDENTS
Date: Mon, 5 Apr 2021 04:52:23 -0500	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <CABg4E-keAb4b4BMQDbdyj16p8GTBQgc2ribSzJCGpY-SMnn9TA@mail.gmail.com>

On Sun, Apr 04 2021 at 13:25:00 -0400, Tavian Barnes quoth thus:

> On Sun, 4 Apr 2021 at 00:42, Clay Harris <[email protected]> wrote:
> > On Tue, Mar 30 2021 at 14:17:21 +0300, Lennert Buytenhek quoth thus:
> >
> > > ...
> > >
> > > - Make IORING_OP_GETDENTS read from the directory's current position
> > >   if the specified offset value is -1 (IORING_FEAT_RW_CUR_POS).
> > >   (Requested / pointed out by Tavian Barnes.)
> >
> > This seems like a good feature.  As I understand it, this would
> > allow submitting pairs of IORING_OP_GETDENTS with IOSQE_IO_HARDLINK
> > wherein the first specifies the current offset and the second specifies
> > offset -1, thereby halfing the number of kernel round trips for N getdents64.
> 
> Yep, that was my main motivation for this suggestion.
> 
> > If the entire directory fits into the first buffer, the second would
> > indicate EOF.  This would certainly seem like a win, but note there
> > are diminishing returns as the directory size increases, versus just
> > doubling the buffer size.
> 
> True, but most directories are small, so I expect it would be a
> benefit most of the time.  Even for big directories you still get two
> buffers filled with one syscall, same as if you did a conventional
> getdents64() with twice as big a buffer.
> 
> > An alternate / additional idea you may wish to consider is changing
> > getdents64 itself.
> >
> > Ordinary read functions must return 0 length to indicate EOF, because
> > they can return arbitrary data.  This is not the case for getdents64.
> >
> > 1) Define a struct linux_dirent of minimum size containing an abnormal
> > value as a sentinel.  d_off = 0 or -1 should work.
> >
> > 2) Implement a flag for getdents64.
> 
> Sadly getdents64() doesn't take a flags argument.  We'd probably need
> a new syscall.
> 
> > IF
> >         the flag is set AND
> >         we are returning a non-zero length buffer AND
> >         there is room in the buffer for the sentinel structure AND
> >         a getdents64 call using the d_off of the last struct in the
> >                 buffer would return EOF
> > THEN
> >         append the sentinel struct to the buffer.
> >
> >
> > Using the arrangement, we would still handle a 0 length return as an
> > EOF, but if we see the sentinel struct, we can skip the additional call
> > altogether.  The saves all of the pairing of buffers and extra logic,
> > and unless we're unlucky and the sentinel structure did not fit in
> > the buffer at EOF, would always reduce the number of getdents64
> > calls by one.
> >
> > Moreover, if the flag was available outside of io_uring, for smaller
> > directories, this feature would cut the number of directory reads
> > of readdir(3) by up to half.
> 
> If we need a new syscall anyway, the calling convention could be
> adjusted to indicate EOF more easily than that, e.g.
> 
> int getdents2(int fd, void *buf, size_t *size, unsigned long flags);
> 
> With 0 being EOF, 1 being not-EOF, and -1 for error, or something.

Good point!  I was hedging the idea above with "if the flag was
available outside of io_uring" to allow an internal-only flag.
A new syscall would certainly be more useful as it would improve
every readdir(3) call, and of course could be called from inside
io_uring more efficiently than the current getdents64.

Alas, adding a new syscall like this is a little beyond my level of
expertise.  Would you (or anyone else reading this) have any interest in
implementing a getdents2?

> -- 
> Tavian Barnes

      reply	other threads:[~2021-04-05  9:52 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-30 11:17 [PATCH v5 0/2] io_uring: add support for IORING_OP_GETDENTS Lennert Buytenhek
2021-03-30 11:18 ` [PATCH v5 1/2] readdir: split the core of getdents64(2) out into vfs_getdents() Lennert Buytenhek
2021-03-30 11:19 ` [PATCH v5 2/2] io_uring: add support for IORING_OP_GETDENTS Lennert Buytenhek
2021-03-30 12:11   ` Pavel Begunkov
2021-04-04  4:42 ` [PATCH v5 0/2] " Clay Harris
2021-04-04 17:25   ` Tavian Barnes
2021-04-05  9:52     ` Clay Harris [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210405095223.dulkil23btvxagg7@ps29521.dreamhostps.com \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox