public inbox for [email protected]
 help / color / mirror / Atom feed
From: Clay Harris <[email protected]>
To: Dominique Martinet <[email protected]>
Cc: Dave Chinner <[email protected]>,
	Alexander Viro <[email protected]>,
	Christian Brauner <[email protected]>,
	Jens Axboe <[email protected]>,
	Pavel Begunkov <[email protected]>,
	Stefan Roesch <[email protected]>,
	[email protected], [email protected],
	[email protected]
Subject: Re: [PATCH RFC 2/2] io_uring: add support for getdents
Date: Mon, 24 Apr 2023 04:20:55 -0500	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>

On Mon, Apr 24 2023 at 17:41:18 +0900, Dominique Martinet quoth thus:

> Thanks!
> 
> Clay Harris wrote on Mon, Apr 24, 2023 at 02:29:46AM -0500:
> > This also seems like a good place to bring up a point I made with
> > the last attempt at this code.  You're missing an optimization here.
> > getdents knows whether it is returning a buffer because the next entry
> > won't fit versus because there are no more entries.  As it doesn't
> > return that information, callers must always keep calling it back
> > until EOF.  This means a completely unnecessary call is made for
> > every open directory.  In other words, for a directory scan where
> > the buffers are large enough to not overflow, that literally twice
> > as many calls are made to getdents as necessary.  As io_uring is
> > in-kernel, it could use an internal interface to getdents which would
> > return an EOF indicator along with the (probably non-empty) buffer.
> > io_uring would then return that flag with the CQE.
> 
> Sorry I didn't spot that comment in the last iteration of the patch,
> that sounds interesting.
> 
> This isn't straightforward even in-kernel though: the ctx.actor callback
> (filldir64) isn't called when we're done, so we only know we couldn't
> fill in the buffer.
> We could have the callback record 'buffer full' and consider we're done
> if the buffer is full, or just single-handedly declare we are if we have
> more than `MAXNAMLEN + sizeof(struct linux_dirent64)` left over, but I
> assume a filesystem is allowed to return what it has readily available
> and expect the user to come back later?
> In which case we cannot use this as an heuristic...
> 
> So if we do this, it'll require a way for filesystems to say they're
> filling in as much as they can, or go the sledgehammer way of adding an
> extra dir_context dir_context callback, either way I'm not sure I want
> to deal with all that immediately unless I'm told all filesystems will
> fill as much as possible without ever failing for any temporary reason
> in the middle of iterate/iterate_shared().

I don't have a complete understanding of this area, but my thought was
not that we would look for any buffer full condition, but rather that
an iterator could be tested for next_entry == EOF.

> Call me greedy but I believe such a flag in the CQE could also be added
> later on without any bad side effects (as it's optional to check on it
> to stop calling early and there's no harm in not setting it)?

Certainly it could be added later, but I wanted to make sure some thought
was put into it now.  It would be nice to have it sooner rather than later
though...

> 
> > (* As an aside, the only place I've ever seen a non-zero lseek on a
> > directory, is in a very resource limited environment, e.g. too small
> > open files limit.  In the case of a depth-first directory scan, it
> > must close directories before completely reading them, and reopen /
> > lseek to their previous position in order to continue.  This scenario
> > is certainly not worth bothering with for io_uring.)
> 
> (I also thought of userspace NFS/9P servers are these two at least get
> requests from clients with an arbitrary offset, but I'll be glad to
> forget about them for now...)
> 
> -- 
> Dominique Martinet | Asmadeus

  reply	other threads:[~2023-04-24  9:21 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-22  8:40 [PATCH RFC 0/2] io_uring: add getdents support, take 2 Dominique Martinet
2023-04-22  8:40 ` [PATCH RFC 1/2] fs: split off vfs_getdents function of getdents64 syscall Dominique Martinet
2023-04-22 10:34   ` Dominique Martinet
2023-04-22  8:40 ` [PATCH RFC 2/2] io_uring: add support for getdents Dominique Martinet
2023-04-23 22:40   ` Dave Chinner
2023-04-23 23:43     ` Dominique Martinet
2023-04-24  7:29       ` Clay Harris
2023-04-24  8:41         ` Dominique Martinet
2023-04-24  9:20           ` Clay Harris [this message]
2023-04-24 10:55             ` Dominique Martinet
2023-04-28  5:06       ` Dave Chinner
2023-04-28  6:14         ` Dominique Martinet
2023-04-28 11:27           ` Dominique Martinet
2023-04-30 23:15             ` Dave Chinner
2023-04-29  8:07           ` Dominique Martinet
2023-04-30 23:32             ` Dave Chinner
2023-05-01  0:49               ` Dominique Martinet
2023-05-01  7:16                 ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230424092054.q6iiqqnrohenr5d2@ps29521.dreamhostps.com \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox