public inbox for [email protected]
 help / color / mirror / Atom feed
From: Dominique Martinet <[email protected]>
To: Clay Harris <[email protected]>
Cc: Dave Chinner <[email protected]>,
	Alexander Viro <[email protected]>,
	Christian Brauner <[email protected]>,
	Jens Axboe <[email protected]>,
	Pavel Begunkov <[email protected]>,
	Stefan Roesch <[email protected]>,
	[email protected], [email protected],
	[email protected]
Subject: Re: [PATCH RFC 2/2] io_uring: add support for getdents
Date: Mon, 24 Apr 2023 17:41:18 +0900	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>

Thanks!

Clay Harris wrote on Mon, Apr 24, 2023 at 02:29:46AM -0500:
> This also seems like a good place to bring up a point I made with
> the last attempt at this code.  You're missing an optimization here.
> getdents knows whether it is returning a buffer because the next entry
> won't fit versus because there are no more entries.  As it doesn't
> return that information, callers must always keep calling it back
> until EOF.  This means a completely unnecessary call is made for
> every open directory.  In other words, for a directory scan where
> the buffers are large enough to not overflow, that literally twice
> as many calls are made to getdents as necessary.  As io_uring is
> in-kernel, it could use an internal interface to getdents which would
> return an EOF indicator along with the (probably non-empty) buffer.
> io_uring would then return that flag with the CQE.

Sorry I didn't spot that comment in the last iteration of the patch,
that sounds interesting.

This isn't straightforward even in-kernel though: the ctx.actor callback
(filldir64) isn't called when we're done, so we only know we couldn't
fill in the buffer.
We could have the callback record 'buffer full' and consider we're done
if the buffer is full, or just single-handedly declare we are if we have
more than `MAXNAMLEN + sizeof(struct linux_dirent64)` left over, but I
assume a filesystem is allowed to return what it has readily available
and expect the user to come back later?
In which case we cannot use this as an heuristic...

So if we do this, it'll require a way for filesystems to say they're
filling in as much as they can, or go the sledgehammer way of adding an
extra dir_context dir_context callback, either way I'm not sure I want
to deal with all that immediately unless I'm told all filesystems will
fill as much as possible without ever failing for any temporary reason
in the middle of iterate/iterate_shared().
Call me greedy but I believe such a flag in the CQE could also be added
later on without any bad side effects (as it's optional to check on it
to stop calling early and there's no harm in not setting it)?


> (* As an aside, the only place I've ever seen a non-zero lseek on a
> directory, is in a very resource limited environment, e.g. too small
> open files limit.  In the case of a depth-first directory scan, it
> must close directories before completely reading them, and reopen /
> lseek to their previous position in order to continue.  This scenario
> is certainly not worth bothering with for io_uring.)

(I also thought of userspace NFS/9P servers are these two at least get
requests from clients with an arbitrary offset, but I'll be glad to
forget about them for now...)

-- 
Dominique Martinet | Asmadeus

  reply	other threads:[~2023-04-24  8:41 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-22  8:40 [PATCH RFC 0/2] io_uring: add getdents support, take 2 Dominique Martinet
2023-04-22  8:40 ` [PATCH RFC 1/2] fs: split off vfs_getdents function of getdents64 syscall Dominique Martinet
2023-04-22 10:34   ` Dominique Martinet
2023-04-22  8:40 ` [PATCH RFC 2/2] io_uring: add support for getdents Dominique Martinet
2023-04-23 22:40   ` Dave Chinner
2023-04-23 23:43     ` Dominique Martinet
2023-04-24  7:29       ` Clay Harris
2023-04-24  8:41         ` Dominique Martinet [this message]
2023-04-24  9:20           ` Clay Harris
2023-04-24 10:55             ` Dominique Martinet
2023-04-28  5:06       ` Dave Chinner
2023-04-28  6:14         ` Dominique Martinet
2023-04-28 11:27           ` Dominique Martinet
2023-04-30 23:15             ` Dave Chinner
2023-04-29  8:07           ` Dominique Martinet
2023-04-30 23:32             ` Dave Chinner
2023-05-01  0:49               ` Dominique Martinet
2023-05-01  7:16                 ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox