public inbox for [email protected]
 help / color / mirror / Atom feed
From: Christian Brauner <[email protected]>
To: Dominique Martinet <[email protected]>
Cc: Alexander Viro <[email protected]>,
	Jens Axboe <[email protected]>,
	Pavel Begunkov <[email protected]>,
	Stefan Roesch <[email protected]>, Clay Harris <[email protected]>,
	Dave Chinner <[email protected]>,
	[email protected], [email protected],
	[email protected]
Subject: Re: [PATCH v2 1/6] fs: split off vfs_getdents function of getdents64 syscall
Date: Thu, 25 May 2023 14:33:44 +0200	[thread overview]
Message-ID: <20230525-ziellinie-dachten-3eaa30a89e6f@brauner> (raw)
In-Reply-To: <[email protected]>

On Thu, May 25, 2023 at 08:00:02PM +0900, Dominique Martinet wrote:
> Christian Brauner wrote on Thu, May 25, 2023 at 11:22:08AM +0200:
> > > What was confusing is that default_llseek updates f_pos under the
> > > inode_lock (write), and getdents also takes that lock (for read only in
> > > shared implem), so I assumed getdents also was just protected by this
> > > read lock, but I guess that was a bad assumption (as I kept pointing
> > > out, a shared read lock isn't good enough, we definitely agree there)
> > > 
> > > 
> > > In practice, in the non-registered file case io_uring is also calling
> > > fdget, so the lock is held exactly the same as the syscall and I wasn't
> > 
> > No, it really isn't. fdget() doesn't take f_pos_lock at all:
> > 
> > fdget()
> > -> __fdget()
> >    -> __fget_light()
> >       -> __fget()
> >          -> __fget_files()
> >             -> __fget_files_rcu()
> 
> Ugh, I managed to not notice that I was looking at fdget_pos and that
> it's not the same as fdget by the time I wrote two paragraphs... These
> functions all have too many wrappers and too similar names for a quick
> look before work.
> 
> > If that were true then any system call that passes an fd and uses
> > fdget() would try to acquire a mutex on f_pos_lock. We'd be serializing
> > every *at based system call on f_pos_lock whenever we have multiple fds
> > referring to the same file trying to operate on it concurrently.
> > 
> > We do have fdget_pos() and fdput_pos() as a special purpose fdget() for
> > a select group of system calls that require this synchronization.
> 
> Right, that makes sense, and invalidates everything I said after that
> anyway but it's not like looking stupid ever killed anyone.

I strongly disagree with the looking stupid part. These callchains are
quite unwieldy and it's easy to get confused. Usually if you receive a
long mail about the semantics involved - as in the earlier thread - it
means there's landmines all over.

> 
> Ok so it would require adding a new wrapper from struct file to struct
> fd that'd eventually take the lock and set FDPUT_POS_UNLOCK for... not
> fdput_pos but another function for that stopping short of fdput...
> Then just call that around both vfs_llseek and vfs_getdents calls; which
> is the easy part.
> 
> (Or possibly call mutex_lock directly like Dylan did in [1]...)
> [1] https://lore.kernel.org/all/[email protected]/T/#m3609dc8057d0bc8e41ceab643e4d630f7b91bde6

We'd need a consistent story whatever it ends up being.

> I'll be honest though I'm thankful for your explanations but I think
> I'll just do like Stefan and stop trying for now: the only reason I've
> started this was because I wanted to play with io_uring for a new toy
> project and it felt awkward without a getdents for crawling a tree; and
> I'm long past the point where I should have thrown the towel and just
> make that a sequential walk.
> There's too many "conditional patches" (NOWAIT, end of dir indicator)
> that I don't care about and require additional work to rebase
> continuously so I'll just leave it up to someone else who does care.
> 
> So to that someone: feel free to continue from these branches (I've
> included the fix for kernfs_fop_readdir that Dan Carpenter reported):
> https://github.com/martinetd/linux/commits/io_uring_getdents
> https://github.com/martinetd/liburing/commits/getdents
> 
> Or just start over, there's not that much code now hopefully the
> baseline requirements have gotten a little bit clearer.
> 
> 
> Sorry for stirring the mess and leaving halfway, if nobody does continue
> I might send a v3 when I have more time/energy in a few months, but it
> won't be quick.

It's fine.

  reply	other threads:[~2023-05-25 12:33 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-10 10:52 [PATCH v2 0/6] io_uring: add getdents support, take 2 Dominique Martinet
2023-05-10 10:52 ` [PATCH v2 1/6] fs: split off vfs_getdents function of getdents64 syscall Dominique Martinet
2023-05-23 15:39   ` Christian Brauner
2023-05-23 21:13     ` Dominique Martinet
2023-05-10 10:52 ` [PATCH v2 2/6] vfs_getdents/struct dir_context: add flags field Dominique Martinet
2023-05-10 10:52 ` [PATCH v2 3/6] io_uring: add support for getdents Dominique Martinet
2023-05-10 10:52 ` [PATCH v2 4/6] kernfs: implement readdir FMODE_NOWAIT Dominique Martinet
2023-05-11 10:55   ` Dan Carpenter
2023-05-11 11:03     ` Dominique Martinet
2023-05-10 10:52 ` [PATCH v2 5/6] libfs: set FMODE_NOWAIT on dir open Dominique Martinet
2023-05-10 10:52 ` [PATCH v2 6/6] RFC: io_uring getdents: test returning an EOF flag in CQE Dominique Martinet
2023-05-23 14:30   ` Christian Brauner
2023-05-23 21:05     ` Dominique Martinet
2023-05-24 13:52       ` [PATCH v2 1/6] fs: split off vfs_getdents function of getdents64 syscall Christian Brauner
2023-05-24 21:36         ` Dominique Martinet
2023-05-25  9:22           ` Christian Brauner
2023-05-25 11:00             ` Dominique Martinet
2023-05-25 12:33               ` Christian Brauner [this message]
2023-07-11  8:17               ` Hao Xu
2023-07-11  8:24                 ` Dominique Martinet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230525-ziellinie-dachten-3eaa30a89e6f@brauner \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox