* [PATCH 01/14] namei: prepare do_rmdir for refactoring
2021-07-15 10:35 [PATCH 00/14] namei: clean up retry logic in various do_* functions Dmitry Kadashev
@ 2021-07-15 10:35 ` Dmitry Kadashev
2021-07-15 19:49 ` Al Viro
2021-07-15 10:35 ` [PATCH 02/14] namei: clean up do_rmdir retry logic Dmitry Kadashev
` (13 subsequent siblings)
14 siblings, 1 reply; 23+ messages in thread
From: Dmitry Kadashev @ 2021-07-15 10:35 UTC (permalink / raw)
To: Jens Axboe, Alexander Viro, Christian Brauner, Linus Torvalds
Cc: linux-fsdevel, io-uring, Dmitry Kadashev
This is just a preparation for the move of the main rmdir logic to a
separate function to make the logic easier to follow. This change
contains the flow changes so that the actual change to move the main
logic to a separate function does no change the flow at all.
Two changes here:
1. Previously on filename_parentat() error the function used to exit
immediately, and now it will check the return code to see if ESTALE
retry is appropriate. The filename_parentat() does its own retries on
ESTALE, but this extra check should be completely fine.
2. The retry_estale() check is wrapped in unlikely(). Some other places
already have that and overall it seems to make sense.
Cc: Linus Torvalds <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Suggested-by: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/io-uring/CAHk-=wh=cpt_tQCirzFZRPawRpbuFTZ2MxNpXiyUF+eBXF=+sw@mail.gmail.com/
Link: https://lore.kernel.org/io-uring/CAHk-=wjFd0qn6asio=zg7zUTRmSty_TpAEhnwym1Qb=wFgCKzA@mail.gmail.com/
Signed-off-by: Dmitry Kadashev <[email protected]>
---
fs/namei.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index b5adfd4f7de6..99d5c3a4c12e 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3998,11 +3998,11 @@ int do_rmdir(int dfd, struct filename *name)
mnt_drop_write(path.mnt);
exit2:
path_put(&path);
- if (retry_estale(error, lookup_flags)) {
+exit1:
+ if (unlikely(retry_estale(error, lookup_flags))) {
lookup_flags |= LOOKUP_REVAL;
goto retry;
}
-exit1:
putname(name);
return error;
}
--
2.30.2
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 01/14] namei: prepare do_rmdir for refactoring
2021-07-15 10:35 ` [PATCH 01/14] namei: prepare do_rmdir for refactoring Dmitry Kadashev
@ 2021-07-15 19:49 ` Al Viro
2021-07-20 6:52 ` Dmitry Kadashev
0 siblings, 1 reply; 23+ messages in thread
From: Al Viro @ 2021-07-15 19:49 UTC (permalink / raw)
To: Dmitry Kadashev
Cc: Jens Axboe, Christian Brauner, Linus Torvalds, linux-fsdevel,
io-uring
On Thu, Jul 15, 2021 at 05:35:47PM +0700, Dmitry Kadashev wrote:
> This is just a preparation for the move of the main rmdir logic to a
> separate function to make the logic easier to follow. This change
> contains the flow changes so that the actual change to move the main
> logic to a separate function does no change the flow at all.
>
> Two changes here:
>
> 1. Previously on filename_parentat() error the function used to exit
> immediately, and now it will check the return code to see if ESTALE
> retry is appropriate. The filename_parentat() does its own retries on
> ESTALE, but this extra check should be completely fine.
>
> 2. The retry_estale() check is wrapped in unlikely(). Some other places
> already have that and overall it seems to make sense.
That's not the way to do it.
static inline bool
retry_estale(const long error, const unsigned int flags)
{
return unlikely(error == -ESTALE && !(flags & LOOKUP_REVAL));
}
And strip the redundant unlikely in the callers. Having that markup
in callers makes sense only when different callers have different
odds of positive result, which is very much not the case here.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 01/14] namei: prepare do_rmdir for refactoring
2021-07-15 19:49 ` Al Viro
@ 2021-07-20 6:52 ` Dmitry Kadashev
0 siblings, 0 replies; 23+ messages in thread
From: Dmitry Kadashev @ 2021-07-20 6:52 UTC (permalink / raw)
To: Al Viro
Cc: Jens Axboe, Christian Brauner, Linus Torvalds, linux-fsdevel,
io-uring
On Fri, Jul 16, 2021 at 2:49 AM Al Viro <[email protected]> wrote:
>
> On Thu, Jul 15, 2021 at 05:35:47PM +0700, Dmitry Kadashev wrote:
> > This is just a preparation for the move of the main rmdir logic to a
> > separate function to make the logic easier to follow. This change
> > contains the flow changes so that the actual change to move the main
> > logic to a separate function does no change the flow at all.
> >
> > Two changes here:
> >
> > 1. Previously on filename_parentat() error the function used to exit
> > immediately, and now it will check the return code to see if ESTALE
> > retry is appropriate. The filename_parentat() does its own retries on
> > ESTALE, but this extra check should be completely fine.
> >
> > 2. The retry_estale() check is wrapped in unlikely(). Some other places
> > already have that and overall it seems to make sense.
>
> That's not the way to do it.
>
> static inline bool
> retry_estale(const long error, const unsigned int flags)
> {
> return unlikely(error == -ESTALE && !(flags & LOOKUP_REVAL));
> }
>
> And strip the redundant unlikely in the callers. Having that markup
> in callers makes sense only when different callers have different
> odds of positive result, which is very much not the case here.
Yeah, I thought about this, but wasn't sure about interplay of
inline+[un]likely(). But I see that it's used quite a bit throughout the
kernel code so I suppose it's fine. I'll use that next time, thanks.
--
Dmitry Kadashev
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 02/14] namei: clean up do_rmdir retry logic
2021-07-15 10:35 [PATCH 00/14] namei: clean up retry logic in various do_* functions Dmitry Kadashev
2021-07-15 10:35 ` [PATCH 01/14] namei: prepare do_rmdir for refactoring Dmitry Kadashev
@ 2021-07-15 10:35 ` Dmitry Kadashev
2021-07-15 20:02 ` Al Viro
2021-07-15 10:35 ` [PATCH 03/14] namei: prepare do_unlinkat for refactoring Dmitry Kadashev
` (12 subsequent siblings)
14 siblings, 1 reply; 23+ messages in thread
From: Dmitry Kadashev @ 2021-07-15 10:35 UTC (permalink / raw)
To: Jens Axboe, Alexander Viro, Christian Brauner, Linus Torvalds
Cc: linux-fsdevel, io-uring, Dmitry Kadashev
No functional changes, just move the main logic to a helper function to
make the whole thing easier to follow.
Cc: Linus Torvalds <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Suggested-by: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/io-uring/CAHk-=wh=cpt_tQCirzFZRPawRpbuFTZ2MxNpXiyUF+eBXF=+sw@mail.gmail.com/
Signed-off-by: Dmitry Kadashev <[email protected]>
---
fs/namei.c | 44 +++++++++++++++++++++++++-------------------
1 file changed, 25 insertions(+), 19 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index 99d5c3a4c12e..fbae4e9fcf53 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3947,7 +3947,7 @@ int vfs_rmdir(struct user_namespace *mnt_userns, struct inode *dir,
}
EXPORT_SYMBOL(vfs_rmdir);
-int do_rmdir(int dfd, struct filename *name)
+static int try_rmdir(int dfd, struct filename *name, unsigned int lookup_flags)
{
struct user_namespace *mnt_userns;
int error;
@@ -3955,54 +3955,60 @@ int do_rmdir(int dfd, struct filename *name)
struct path path;
struct qstr last;
int type;
- unsigned int lookup_flags = 0;
-retry:
+
error = __filename_parentat(dfd, name, lookup_flags, &path, &last, &type);
if (error)
- goto exit1;
+ return error;
switch (type) {
case LAST_DOTDOT:
error = -ENOTEMPTY;
- goto exit2;
+ goto exit1;
case LAST_DOT:
error = -EINVAL;
- goto exit2;
+ goto exit1;
case LAST_ROOT:
error = -EBUSY;
- goto exit2;
+ goto exit1;
}
error = mnt_want_write(path.mnt);
if (error)
- goto exit2;
+ goto exit1;
inode_lock_nested(path.dentry->d_inode, I_MUTEX_PARENT);
dentry = __lookup_hash(&last, path.dentry, lookup_flags);
error = PTR_ERR(dentry);
if (IS_ERR(dentry))
- goto exit3;
+ goto exit2;
if (!dentry->d_inode) {
error = -ENOENT;
- goto exit4;
+ goto exit3;
}
error = security_path_rmdir(&path, dentry);
if (error)
- goto exit4;
+ goto exit3;
mnt_userns = mnt_user_ns(path.mnt);
error = vfs_rmdir(mnt_userns, path.dentry->d_inode, dentry);
-exit4:
- dput(dentry);
exit3:
+ dput(dentry);
+exit2:
inode_unlock(path.dentry->d_inode);
mnt_drop_write(path.mnt);
-exit2:
- path_put(&path);
exit1:
- if (unlikely(retry_estale(error, lookup_flags))) {
- lookup_flags |= LOOKUP_REVAL;
- goto retry;
- }
+ path_put(&path);
+
+ return error;
+}
+
+int do_rmdir(int dfd, struct filename *name)
+{
+ int error;
+
+ error = try_rmdir(dfd, name, 0);
+ if (unlikely(retry_estale(error, 0)))
+ error = try_rmdir(dfd, name, LOOKUP_REVAL);
+
putname(name);
return error;
}
--
2.30.2
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 02/14] namei: clean up do_rmdir retry logic
2021-07-15 10:35 ` [PATCH 02/14] namei: clean up do_rmdir retry logic Dmitry Kadashev
@ 2021-07-15 20:02 ` Al Viro
0 siblings, 0 replies; 23+ messages in thread
From: Al Viro @ 2021-07-15 20:02 UTC (permalink / raw)
To: Dmitry Kadashev
Cc: Jens Axboe, Christian Brauner, Linus Torvalds, linux-fsdevel,
io-uring
On Thu, Jul 15, 2021 at 05:35:48PM +0700, Dmitry Kadashev wrote:
> No functional changes, just move the main logic to a helper function to
> make the whole thing easier to follow.
If you are renaming that pile of labels, at least give them names that would
mean something... And TBH I would probably go for something like
dentry = __lookup_hash(&last, path.dentry, lookup_flags);
if (IS_ERR(dentry)) {
error = PTR_ERR(dentry);
goto out_unlock;
}
if (!dentry->d_inode)
error = -ENOENT;
if (!error)
error = security_path_rmdir(&path, dentry);
if (!error)
error = vfs_rmdir(mnt_user_ns(path.mnt),
path.dentry->d_inode, dentry);
dput(dentry);
out_unlock:
there, to simplify that pile a bit...
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 03/14] namei: prepare do_unlinkat for refactoring
2021-07-15 10:35 [PATCH 00/14] namei: clean up retry logic in various do_* functions Dmitry Kadashev
2021-07-15 10:35 ` [PATCH 01/14] namei: prepare do_rmdir for refactoring Dmitry Kadashev
2021-07-15 10:35 ` [PATCH 02/14] namei: clean up do_rmdir retry logic Dmitry Kadashev
@ 2021-07-15 10:35 ` Dmitry Kadashev
2021-07-15 10:35 ` [PATCH 04/14] namei: clean up do_unlinkat retry logic Dmitry Kadashev
` (11 subsequent siblings)
14 siblings, 0 replies; 23+ messages in thread
From: Dmitry Kadashev @ 2021-07-15 10:35 UTC (permalink / raw)
To: Jens Axboe, Alexander Viro, Christian Brauner, Linus Torvalds
Cc: linux-fsdevel, io-uring, Dmitry Kadashev
This is just a preparation for the move of the main unlinkat logic to a
separate function to make the logic easier to follow. This change
contains the flow changes so that the actual change to move the main
logic to a separate function does no change the flow at all.
Just like the similar patch for rmdir a few commits before, there are
two changes here:
1. Previously on filename_parentat() error the function used to exit
immediately, and now it will check the return code to see if ESTALE
retry is appropriate. The filename_parentat() does its own retries on
ESTALE, but this extra check should be completely fine.
2. The retry_estale() check is wrapped in unlikely(). Some other places
already have that and overall it seems to make sense.
Cc: Linus Torvalds <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Suggested-by: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/io-uring/CAHk-=wh=cpt_tQCirzFZRPawRpbuFTZ2MxNpXiyUF+eBXF=+sw@mail.gmail.com/
Link: https://lore.kernel.org/io-uring/CAHk-=wjFd0qn6asio=zg7zUTRmSty_TpAEhnwym1Qb=wFgCKzA@mail.gmail.com/
Signed-off-by: Dmitry Kadashev <[email protected]>
---
fs/namei.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index fbae4e9fcf53..6253486718d5 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4148,12 +4148,12 @@ int do_unlinkat(int dfd, struct filename *name)
mnt_drop_write(path.mnt);
exit2:
path_put(&path);
- if (retry_estale(error, lookup_flags)) {
+exit1:
+ if (unlikely(retry_estale(error, lookup_flags))) {
lookup_flags |= LOOKUP_REVAL;
inode = NULL;
goto retry;
}
-exit1:
putname(name);
return error;
--
2.30.2
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 04/14] namei: clean up do_unlinkat retry logic
2021-07-15 10:35 [PATCH 00/14] namei: clean up retry logic in various do_* functions Dmitry Kadashev
` (2 preceding siblings ...)
2021-07-15 10:35 ` [PATCH 03/14] namei: prepare do_unlinkat for refactoring Dmitry Kadashev
@ 2021-07-15 10:35 ` Dmitry Kadashev
2021-07-15 10:35 ` [PATCH 05/14] namei: prepare do_mkdirat for refactoring Dmitry Kadashev
` (10 subsequent siblings)
14 siblings, 0 replies; 23+ messages in thread
From: Dmitry Kadashev @ 2021-07-15 10:35 UTC (permalink / raw)
To: Jens Axboe, Alexander Viro, Christian Brauner, Linus Torvalds
Cc: linux-fsdevel, io-uring, Dmitry Kadashev
No functional changes, just move the main logic to a helper function to
make the whole thing easier to follow.
Cc: Linus Torvalds <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Suggested-by: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/io-uring/CAHk-=wh=cpt_tQCirzFZRPawRpbuFTZ2MxNpXiyUF+eBXF=+sw@mail.gmail.com/
Signed-off-by: Dmitry Kadashev <[email protected]>
---
fs/namei.c | 42 ++++++++++++++++++++++++------------------
1 file changed, 24 insertions(+), 18 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index 6253486718d5..703cce40d597 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4091,7 +4091,9 @@ EXPORT_SYMBOL(vfs_unlink);
* writeout happening, and we don't want to prevent access to the directory
* while waiting on the I/O.
*/
-int do_unlinkat(int dfd, struct filename *name)
+
+static int try_unlinkat(int dfd, struct filename *name,
+ unsigned int lookup_flags)
{
int error;
struct dentry *dentry;
@@ -4100,19 +4102,18 @@ int do_unlinkat(int dfd, struct filename *name)
int type;
struct inode *inode = NULL;
struct inode *delegated_inode = NULL;
- unsigned int lookup_flags = 0;
-retry:
+
error = __filename_parentat(dfd, name, lookup_flags, &path, &last, &type);
if (error)
- goto exit1;
+ return error;
error = -EISDIR;
if (type != LAST_NORM)
- goto exit2;
+ goto exit1;
error = mnt_want_write(path.mnt);
if (error)
- goto exit2;
+ goto exit1;
retry_deleg:
inode_lock_nested(path.dentry->d_inode, I_MUTEX_PARENT);
dentry = __lookup_hash(&last, path.dentry, lookup_flags);
@@ -4129,11 +4130,11 @@ int do_unlinkat(int dfd, struct filename *name)
ihold(inode);
error = security_path_unlink(&path, dentry);
if (error)
- goto exit3;
+ goto exit2;
mnt_userns = mnt_user_ns(path.mnt);
error = vfs_unlink(mnt_userns, path.dentry->d_inode, dentry,
&delegated_inode);
-exit3:
+exit2:
dput(dentry);
}
inode_unlock(path.dentry->d_inode);
@@ -4146,17 +4147,9 @@ int do_unlinkat(int dfd, struct filename *name)
goto retry_deleg;
}
mnt_drop_write(path.mnt);
-exit2:
- path_put(&path);
exit1:
- if (unlikely(retry_estale(error, lookup_flags))) {
- lookup_flags |= LOOKUP_REVAL;
- inode = NULL;
- goto retry;
- }
- putname(name);
+ path_put(&path);
return error;
-
slashes:
if (d_is_negative(dentry))
error = -ENOENT;
@@ -4164,7 +4157,20 @@ int do_unlinkat(int dfd, struct filename *name)
error = -EISDIR;
else
error = -ENOTDIR;
- goto exit3;
+ goto exit2;
+}
+
+int do_unlinkat(int dfd, struct filename *name)
+{
+ int error;
+
+ error = try_unlinkat(dfd, name, 0);
+ if (unlikely(retry_estale(error, 0)))
+ error = try_unlinkat(dfd, name, LOOKUP_REVAL);
+
+ putname(name);
+ return error;
+
}
SYSCALL_DEFINE3(unlinkat, int, dfd, const char __user *, pathname, int, flag)
--
2.30.2
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 05/14] namei: prepare do_mkdirat for refactoring
2021-07-15 10:35 [PATCH 00/14] namei: clean up retry logic in various do_* functions Dmitry Kadashev
` (3 preceding siblings ...)
2021-07-15 10:35 ` [PATCH 04/14] namei: clean up do_unlinkat retry logic Dmitry Kadashev
@ 2021-07-15 10:35 ` Dmitry Kadashev
2021-07-15 20:17 ` Al Viro
2021-07-15 10:35 ` [PATCH 06/14] namei: clean up do_mkdirat retry logic Dmitry Kadashev
` (9 subsequent siblings)
14 siblings, 1 reply; 23+ messages in thread
From: Dmitry Kadashev @ 2021-07-15 10:35 UTC (permalink / raw)
To: Jens Axboe, Alexander Viro, Christian Brauner, Linus Torvalds
Cc: linux-fsdevel, io-uring, Dmitry Kadashev
This is just a preparation for the move of the main mkdirat logic to a
separate function to make the logic easier to follow. This change
contains the flow changes so that the actual change to move the main
logic to a separate function does no change the flow at all.
Just like the similar patches for rmdir and unlink a few commits before,
there two changes here:
1. Previously on filename_create() error the function used to exit
immediately, and now it will check the return code to see if ESTALE
retry is appropriate. The filename_create() does its own retries on
ESTALE (at least via filename_parentat() used inside), but this extra
check should be completely fine.
2. The retry_estale() check is wrapped in unlikely(). Some other places
already have that and overall it seems to make sense
Cc: Linus Torvalds <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Suggested-by: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/io-uring/CAHk-=wijsw1QSsQHFu_6dEoZEr_zvT7++WJWohcuEkLqqXBGrQ@mail.gmail.com/
Link: https://lore.kernel.org/io-uring/CAHk-=wjFd0qn6asio=zg7zUTRmSty_TpAEhnwym1Qb=wFgCKzA@mail.gmail.com/
Signed-off-by: Dmitry Kadashev <[email protected]>
---
fs/namei.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index 703cce40d597..54dbd1e38298 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3861,7 +3861,7 @@ int do_mkdirat(int dfd, struct filename *name, umode_t mode)
dentry = __filename_create(dfd, name, &path, lookup_flags);
error = PTR_ERR(dentry);
if (IS_ERR(dentry))
- goto out_putname;
+ goto out;
if (!IS_POSIXACL(path.dentry->d_inode))
mode &= ~current_umask();
@@ -3873,11 +3873,11 @@ int do_mkdirat(int dfd, struct filename *name, umode_t mode)
mode);
}
done_path_create(&path, dentry);
- if (retry_estale(error, lookup_flags)) {
+out:
+ if (unlikely(retry_estale(error, lookup_flags))) {
lookup_flags |= LOOKUP_REVAL;
goto retry;
}
-out_putname:
putname(name);
return error;
}
--
2.30.2
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 05/14] namei: prepare do_mkdirat for refactoring
2021-07-15 10:35 ` [PATCH 05/14] namei: prepare do_mkdirat for refactoring Dmitry Kadashev
@ 2021-07-15 20:17 ` Al Viro
2021-07-20 6:59 ` Dmitry Kadashev
0 siblings, 1 reply; 23+ messages in thread
From: Al Viro @ 2021-07-15 20:17 UTC (permalink / raw)
To: Dmitry Kadashev
Cc: Jens Axboe, Christian Brauner, Linus Torvalds, linux-fsdevel,
io-uring
On Thu, Jul 15, 2021 at 05:35:51PM +0700, Dmitry Kadashev wrote:
> This is just a preparation for the move of the main mkdirat logic to a
> separate function to make the logic easier to follow. This change
> contains the flow changes so that the actual change to move the main
> logic to a separate function does no change the flow at all.
>
> Just like the similar patches for rmdir and unlink a few commits before,
> there two changes here:
>
> 1. Previously on filename_create() error the function used to exit
> immediately, and now it will check the return code to see if ESTALE
> retry is appropriate. The filename_create() does its own retries on
> ESTALE (at least via filename_parentat() used inside), but this extra
> check should be completely fine.
This is the wrong way to go. Really. Look at it that way - LOOKUP_REVAL
is the final stage of escalation; if we had to go there, there's no
point being optimistic about the last dcache lookup, nevermind trying
to retry the parent pathwalk if we fail with -ESTALE doing it.
I'm not saying that it's something worth optimizing for; the problem
is different - the logics makes no sense whatsoever that way. It's
a matter of reader's cycles wasted on "what the fuck are we trying
to do here?", not the CPU cycles wasted on execution.
While we are at it, it makes no sense for filename_parentat() and its
ilk to go for RCU and normal if it's been given LOOKUP_REVAL - I mean,
look at the sequence of calls in there. And try to make sense of
it. Especially of the "OK, RCU attempt told us to sod off and try normal;
here, let's call path_parentat() with LOOKUP_REVAL for flags and if it
says -ESTALE, call it again with exact same arguments" part.
Seriously, look at that from the point of view of somebody who tries
to make sense of the entire thing.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 05/14] namei: prepare do_mkdirat for refactoring
2021-07-15 20:17 ` Al Viro
@ 2021-07-20 6:59 ` Dmitry Kadashev
2021-07-20 13:55 ` Al Viro
0 siblings, 1 reply; 23+ messages in thread
From: Dmitry Kadashev @ 2021-07-20 6:59 UTC (permalink / raw)
To: Al Viro
Cc: Jens Axboe, Christian Brauner, Linus Torvalds, linux-fsdevel,
io-uring
On Fri, Jul 16, 2021 at 3:17 AM Al Viro <[email protected]> wrote:
>
> On Thu, Jul 15, 2021 at 05:35:51PM +0700, Dmitry Kadashev wrote:
> > This is just a preparation for the move of the main mkdirat logic to a
> > separate function to make the logic easier to follow. This change
> > contains the flow changes so that the actual change to move the main
> > logic to a separate function does no change the flow at all.
> >
> > Just like the similar patches for rmdir and unlink a few commits before,
> > there two changes here:
> >
> > 1. Previously on filename_create() error the function used to exit
> > immediately, and now it will check the return code to see if ESTALE
> > retry is appropriate. The filename_create() does its own retries on
> > ESTALE (at least via filename_parentat() used inside), but this extra
> > check should be completely fine.
>
> This is the wrong way to go. Really. Look at it that way - LOOKUP_REVAL
> is the final stage of escalation; if we had to go there, there's no
> point being optimistic about the last dcache lookup, nevermind trying
> to retry the parent pathwalk if we fail with -ESTALE doing it.
>
> I'm not saying that it's something worth optimizing for; the problem
> is different - the logics makes no sense whatsoever that way. It's
> a matter of reader's cycles wasted on "what the fuck are we trying
> to do here?", not the CPU cycles wasted on execution.
>
> While we are at it, it makes no sense for filename_parentat() and its
> ilk to go for RCU and normal if it's been given LOOKUP_REVAL - I mean,
> look at the sequence of calls in there. And try to make sense of
> it. Especially of the "OK, RCU attempt told us to sod off and try normal;
> here, let's call path_parentat() with LOOKUP_REVAL for flags and if it
> says -ESTALE, call it again with exact same arguments" part.
>
> Seriously, look at that from the point of view of somebody who tries
> to make sense of the entire thing
OK, let me try to venture down that "change the way ESTALE retries are
done completely" path. The problem here is I'm not familiar with the
code enough to be sure the conversion is 1-to-1 (i.e. that we can't get
ESTALE from somewhere unexpected), and that retries are open-coded in
quite a few places it seems. Anyway, I'll try and dig in and come back
with either an RFC patch or some questions. Thanks for the feedback, Al.
--
Dmitry Kadashev
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 05/14] namei: prepare do_mkdirat for refactoring
2021-07-20 6:59 ` Dmitry Kadashev
@ 2021-07-20 13:55 ` Al Viro
2021-07-21 10:02 ` Dmitry Kadashev
0 siblings, 1 reply; 23+ messages in thread
From: Al Viro @ 2021-07-20 13:55 UTC (permalink / raw)
To: Dmitry Kadashev
Cc: Jens Axboe, Christian Brauner, Linus Torvalds, linux-fsdevel,
io-uring
On Tue, Jul 20, 2021 at 01:59:29PM +0700, Dmitry Kadashev wrote:
> > This is the wrong way to go. Really. Look at it that way - LOOKUP_REVAL
> > is the final stage of escalation; if we had to go there, there's no
> > point being optimistic about the last dcache lookup, nevermind trying
> > to retry the parent pathwalk if we fail with -ESTALE doing it.
> >
> > I'm not saying that it's something worth optimizing for; the problem
> > is different - the logics makes no sense whatsoever that way. It's
> > a matter of reader's cycles wasted on "what the fuck are we trying
> > to do here?", not the CPU cycles wasted on execution.
> >
> > While we are at it, it makes no sense for filename_parentat() and its
> > ilk to go for RCU and normal if it's been given LOOKUP_REVAL - I mean,
> > look at the sequence of calls in there. And try to make sense of
> > it. Especially of the "OK, RCU attempt told us to sod off and try normal;
> > here, let's call path_parentat() with LOOKUP_REVAL for flags and if it
> > says -ESTALE, call it again with exact same arguments" part.
> >
> > Seriously, look at that from the point of view of somebody who tries
> > to make sense of the entire thing
>
> OK, let me try to venture down that "change the way ESTALE retries are
> done completely" path. The problem here is I'm not familiar with the
> code enough to be sure the conversion is 1-to-1 (i.e. that we can't get
> ESTALE from somewhere unexpected), and that retries are open-coded in
> quite a few places it seems. Anyway, I'll try and dig in and come back
> with either an RFC patch or some questions. Thanks for the feedback, Al.
I'd try to look at the primitives that go through RCU/normal/REVAL series.
They are all in fs/namei.c; filename_lookup(), filename_parentat(),
do_filp_open() and do_filo_open_root(). The latter pair almost certainly
is fine as-is.
retry_estale() crap is limited to user_path_at/user_path_at_empty users,
along with some filename_parentat() ones.
There we follow that series with something that might give us ESTALE,
and if it does, we want to repeat the whole thing in REVAL mode.
OTOH, there are callers (and fairly similar ones, at that - look at e.g.
AF_UNIX bind doing mknod) where we don't have that kind of logics.
Question 1: which of those are lacking retry_estale(), even though they
might arguably need it? Note that e.g. AF_UNIX bind uses kern_path_create(),
so we need to look at all callchains leading to those, not just the ones
in fs/namei.c guts.
If most of those really want retry_estale, we'd be better off if we took
the REVAL fallback out of filename_lookup() and filename_parentat()
and turned massaged the users from
do rcu/normal/reval lookups
if failed, fuck off
do other work
if it fails with ESTALE
do rcu/reval/reval (yes, really)
if failed, fuck off
do other work
into
do rcu/normal lookups
if not failed
do other work
if something (including initial lookup) failed with ESTALE
repeat the entire thing with LOOKUP_REVAL in the mix
possibly with a helper function involved.
For the ones that need retry_estale that's a win; for the rest it'd
be boilerplate (that's basically the ones where "do other work" never
fails with ESTALE).
Question 2: how are "need retry_estale"/"fine just with ESTALE fallback
in filename_{lookup,parentat}()" cases are distributed?
If the majority is in "need retry_estale" class, then something similar
to what's been outlined above would probably be a decent solution.
Otherwise we'll need wrappers equivalent to current behaviour, and that's
where it can get unpleasant - at which level in call chain do we put
that wrapper? Sure, we can add filename_lookup_as_it_fucking_used_to_be().
Except that it's not called directly by those "don't need retry_estale"
users, so we'd need to provide such counterparts for them as well ;-/
IOW, we need the call tree for filename_lookup()/filename_parentat(),
with leaves (originators of call chain) marked with "does that user
do retry_estale?" (and tracked far back for the answer to depend only
upon the call site - if an intermediate can come from both kinds
of places, we need to track back to its callers).
Then we'll be able to see at which levels do we want those "as it used
to behave" wrappers...
If you want to dig around, that would probably be a reasonable place to
start.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 05/14] namei: prepare do_mkdirat for refactoring
2021-07-20 13:55 ` Al Viro
@ 2021-07-21 10:02 ` Dmitry Kadashev
0 siblings, 0 replies; 23+ messages in thread
From: Dmitry Kadashev @ 2021-07-21 10:02 UTC (permalink / raw)
To: Al Viro
Cc: Jens Axboe, Christian Brauner, Linus Torvalds, linux-fsdevel,
io-uring
.On Tue, Jul 20, 2021 at 8:58 PM Al Viro <[email protected]> wrote:
>
> On Tue, Jul 20, 2021 at 01:59:29PM +0700, Dmitry Kadashev wrote:
>
> > > This is the wrong way to go. Really. Look at it that way - LOOKUP_REVAL
> > > is the final stage of escalation; if we had to go there, there's no
> > > point being optimistic about the last dcache lookup, nevermind trying
> > > to retry the parent pathwalk if we fail with -ESTALE doing it.
> > >
> > > I'm not saying that it's something worth optimizing for; the problem
> > > is different - the logics makes no sense whatsoever that way. It's
> > > a matter of reader's cycles wasted on "what the fuck are we trying
> > > to do here?", not the CPU cycles wasted on execution.
> > >
> > > While we are at it, it makes no sense for filename_parentat() and its
> > > ilk to go for RCU and normal if it's been given LOOKUP_REVAL - I mean,
> > > look at the sequence of calls in there. And try to make sense of
> > > it. Especially of the "OK, RCU attempt told us to sod off and try normal;
> > > here, let's call path_parentat() with LOOKUP_REVAL for flags and if it
> > > says -ESTALE, call it again with exact same arguments" part.
> > >
> > > Seriously, look at that from the point of view of somebody who tries
> > > to make sense of the entire thing
> >
> > OK, let me try to venture down that "change the way ESTALE retries are
> > done completely" path. The problem here is I'm not familiar with the
> > code enough to be sure the conversion is 1-to-1 (i.e. that we can't get
> > ESTALE from somewhere unexpected), and that retries are open-coded in
> > quite a few places it seems. Anyway, I'll try and dig in and come back
> > with either an RFC patch or some questions. Thanks for the feedback, Al.
>
> I'd try to look at the primitives that go through RCU/normal/REVAL series.
> They are all in fs/namei.c; filename_lookup(), filename_parentat(),
> do_filp_open() and do_filo_open_root(). The latter pair almost certainly
> is fine as-is.
>
> retry_estale() crap is limited to user_path_at/user_path_at_empty users,
> along with some filename_parentat() ones.
>
> There we follow that series with something that might give us ESTALE,
> and if it does, we want to repeat the whole thing in REVAL mode.
>
> OTOH, there are callers (and fairly similar ones, at that - look at e.g.
> AF_UNIX bind doing mknod) where we don't have that kind of logics.
>
> Question 1: which of those are lacking retry_estale(), even though they
> might arguably need it? Note that e.g. AF_UNIX bind uses kern_path_create(),
> so we need to look at all callchains leading to those, not just the ones
> in fs/namei.c guts.
>
> If most of those really want retry_estale, we'd be better off if we took
> the REVAL fallback out of filename_lookup() and filename_parentat()
> and turned massaged the users from
> do rcu/normal/reval lookups
> if failed, fuck off
> do other work
> if it fails with ESTALE
> do rcu/reval/reval (yes, really)
> if failed, fuck off
> do other work
> into
> do rcu/normal lookups
> if not failed
> do other work
> if something (including initial lookup) failed with ESTALE
> repeat the entire thing with LOOKUP_REVAL in the mix
> possibly with a helper function involved.
> For the ones that need retry_estale that's a win; for the rest it'd
> be boilerplate (that's basically the ones where "do other work" never
> fails with ESTALE).
>
> Question 2: how are "need retry_estale"/"fine just with ESTALE fallback
> in filename_{lookup,parentat}()" cases are distributed?
>
> If the majority is in "need retry_estale" class, then something similar
> to what's been outlined above would probably be a decent solution.
>
> Otherwise we'll need wrappers equivalent to current behaviour, and that's
> where it can get unpleasant - at which level in call chain do we put
> that wrapper? Sure, we can add filename_lookup_as_it_fucking_used_to_be().
> Except that it's not called directly by those "don't need retry_estale"
> users, so we'd need to provide such counterparts for them as well ;-/
>
> IOW, we need the call tree for filename_lookup()/filename_parentat(),
> with leaves (originators of call chain) marked with "does that user
> do retry_estale?" (and tracked far back for the answer to depend only
> upon the call site - if an intermediate can come from both kinds
> of places, we need to track back to its callers).
>
> Then we'll be able to see at which levels do we want those "as it used
> to behave" wrappers...
>
> If you want to dig around, that would probably be a reasonable place to
> start.
Thanks for the pointers! I am happy to dig into this. I can't spend much
time per week on this though, so it will take some time.
--
Dmitry Kadashev
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 06/14] namei: clean up do_mkdirat retry logic
2021-07-15 10:35 [PATCH 00/14] namei: clean up retry logic in various do_* functions Dmitry Kadashev
` (4 preceding siblings ...)
2021-07-15 10:35 ` [PATCH 05/14] namei: prepare do_mkdirat for refactoring Dmitry Kadashev
@ 2021-07-15 10:35 ` Dmitry Kadashev
2021-07-15 10:35 ` [PATCH 07/14] namei: prepare do_mknodat for refactoring Dmitry Kadashev
` (8 subsequent siblings)
14 siblings, 0 replies; 23+ messages in thread
From: Dmitry Kadashev @ 2021-07-15 10:35 UTC (permalink / raw)
To: Jens Axboe, Alexander Viro, Christian Brauner, Linus Torvalds
Cc: linux-fsdevel, io-uring, Dmitry Kadashev
No functional changes, just move the main logic to a helper function to
make the whole thing easier to follow.
Cc: Linus Torvalds <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Suggested-by: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/io-uring/CAHk-=wijsw1QSsQHFu_6dEoZEr_zvT7++WJWohcuEkLqqXBGrQ@mail.gmail.com/
Signed-off-by: Dmitry Kadashev <[email protected]>
---
fs/namei.c | 25 +++++++++++++++----------
1 file changed, 15 insertions(+), 10 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index 54dbd1e38298..50ab1cd00983 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3850,18 +3850,16 @@ int vfs_mkdir(struct user_namespace *mnt_userns, struct inode *dir,
}
EXPORT_SYMBOL(vfs_mkdir);
-int do_mkdirat(int dfd, struct filename *name, umode_t mode)
+static int try_mkdirat(int dfd, struct filename *name, umode_t mode,
+ unsigned int lookup_flags)
{
struct dentry *dentry;
struct path path;
int error;
- unsigned int lookup_flags = LOOKUP_DIRECTORY;
-retry:
dentry = __filename_create(dfd, name, &path, lookup_flags);
- error = PTR_ERR(dentry);
if (IS_ERR(dentry))
- goto out;
+ return PTR_ERR(dentry);
if (!IS_POSIXACL(path.dentry->d_inode))
mode &= ~current_umask();
@@ -3873,11 +3871,18 @@ int do_mkdirat(int dfd, struct filename *name, umode_t mode)
mode);
}
done_path_create(&path, dentry);
-out:
- if (unlikely(retry_estale(error, lookup_flags))) {
- lookup_flags |= LOOKUP_REVAL;
- goto retry;
- }
+
+ return error;
+}
+
+int do_mkdirat(int dfd, struct filename *name, umode_t mode)
+{
+ int error;
+
+ error = try_mkdirat(dfd, name, mode, 0);
+ if (unlikely(retry_estale(error, 0)))
+ error = try_mkdirat(dfd, name, mode, LOOKUP_REVAL);
+
putname(name);
return error;
}
--
2.30.2
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 07/14] namei: prepare do_mknodat for refactoring
2021-07-15 10:35 [PATCH 00/14] namei: clean up retry logic in various do_* functions Dmitry Kadashev
` (5 preceding siblings ...)
2021-07-15 10:35 ` [PATCH 06/14] namei: clean up do_mkdirat retry logic Dmitry Kadashev
@ 2021-07-15 10:35 ` Dmitry Kadashev
2021-07-15 10:35 ` [PATCH 08/14] namei: clean up do_mknodat retry logic Dmitry Kadashev
` (7 subsequent siblings)
14 siblings, 0 replies; 23+ messages in thread
From: Dmitry Kadashev @ 2021-07-15 10:35 UTC (permalink / raw)
To: Jens Axboe, Alexander Viro, Christian Brauner, Linus Torvalds
Cc: linux-fsdevel, io-uring, Dmitry Kadashev
This is just a preparation for the move of the main mknodat logic to a
separate function to make the logic easier to follow. This change
contains the flow changes so that the actual change to move the main
logic to a separate function does no change the flow at all.
There are 3 changes to the flow here:
1. may_mknod() call is repeated on ESTALE retry now. It's OK to do since
the call is very cheap (and ESTALE retry is a slow path) and the result
doesn't change;
2. Just like the similar patches for rmdir and others a few commits
before, previously on filename_create() error the function used to exit
immediately, and now it will check the return code to see if ESTALE
retry is appropriate. The filename_create() does its own retries on
ESTALE (at least via filename_parentat() used inside), but this extra
check should be completely fine.
3. The retry_estale() check is wrapped in unlikely(). Some other places
already have that and overall it seems to make sense
Cc: Linus Torvalds <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Suggested-by: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/io-uring/CAHk-=wijsw1QSsQHFu_6dEoZEr_zvT7++WJWohcuEkLqqXBGrQ@mail.gmail.com/
Link: https://lore.kernel.org/io-uring/CAHk-=wjFd0qn6asio=zg7zUTRmSty_TpAEhnwym1Qb=wFgCKzA@mail.gmail.com/
Signed-off-by: Dmitry Kadashev <[email protected]>
---
fs/namei.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index 50ab1cd00983..4008867e516d 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3754,10 +3754,10 @@ static int do_mknodat(int dfd, struct filename *name, umode_t mode,
int error;
unsigned int lookup_flags = 0;
+retry:
error = may_mknod(mode);
if (error)
goto out1;
-retry:
dentry = __filename_create(dfd, name, &path, lookup_flags);
error = PTR_ERR(dentry);
if (IS_ERR(dentry))
@@ -3788,11 +3788,11 @@ static int do_mknodat(int dfd, struct filename *name, umode_t mode,
}
out2:
done_path_create(&path, dentry);
- if (retry_estale(error, lookup_flags)) {
+out1:
+ if (unlikely(retry_estale(error, lookup_flags))) {
lookup_flags |= LOOKUP_REVAL;
goto retry;
}
-out1:
putname(name);
return error;
}
--
2.30.2
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 08/14] namei: clean up do_mknodat retry logic
2021-07-15 10:35 [PATCH 00/14] namei: clean up retry logic in various do_* functions Dmitry Kadashev
` (6 preceding siblings ...)
2021-07-15 10:35 ` [PATCH 07/14] namei: prepare do_mknodat for refactoring Dmitry Kadashev
@ 2021-07-15 10:35 ` Dmitry Kadashev
2021-07-15 10:35 ` [PATCH 09/14] namei: prepare do_symlinkat for refactoring Dmitry Kadashev
` (6 subsequent siblings)
14 siblings, 0 replies; 23+ messages in thread
From: Dmitry Kadashev @ 2021-07-15 10:35 UTC (permalink / raw)
To: Jens Axboe, Alexander Viro, Christian Brauner, Linus Torvalds
Cc: linux-fsdevel, io-uring, Dmitry Kadashev
No functional changes, just move the main logic to a helper function to
make the whole thing easier to follow.
Cc: Linus Torvalds <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Suggested-by: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/io-uring/CAHk-=wiG+sN+2zSoAOggKCGue2kOJvw3rQySvQXsZstRQFTN+g@mail.gmail.com/
Signed-off-by: Dmitry Kadashev <[email protected]>
---
fs/namei.c | 32 ++++++++++++++++++--------------
1 file changed, 18 insertions(+), 14 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index 4008867e516d..f7cde1543b47 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3745,29 +3745,26 @@ static int may_mknod(umode_t mode)
}
}
-static int do_mknodat(int dfd, struct filename *name, umode_t mode,
- unsigned int dev)
+static int try_mknodat(int dfd, struct filename *name, umode_t mode,
+ unsigned int dev, unsigned int lookup_flags)
{
struct user_namespace *mnt_userns;
struct dentry *dentry;
struct path path;
int error;
- unsigned int lookup_flags = 0;
-retry:
error = may_mknod(mode);
if (error)
- goto out1;
+ return error;
dentry = __filename_create(dfd, name, &path, lookup_flags);
- error = PTR_ERR(dentry);
if (IS_ERR(dentry))
- goto out1;
+ return PTR_ERR(dentry);
if (!IS_POSIXACL(path.dentry->d_inode))
mode &= ~current_umask();
error = security_path_mknod(&path, dentry, mode, dev);
if (error)
- goto out2;
+ goto out;
mnt_userns = mnt_user_ns(path.mnt);
switch (mode & S_IFMT) {
@@ -3786,13 +3783,20 @@ static int do_mknodat(int dfd, struct filename *name, umode_t mode,
dentry, mode, 0);
break;
}
-out2:
+out:
done_path_create(&path, dentry);
-out1:
- if (unlikely(retry_estale(error, lookup_flags))) {
- lookup_flags |= LOOKUP_REVAL;
- goto retry;
- }
+ return error;
+}
+
+static int do_mknodat(int dfd, struct filename *name, umode_t mode,
+ unsigned int dev)
+{
+ int error;
+
+ error = try_mknodat(dfd, name, mode, dev, 0);
+ if (unlikely(retry_estale(error, 0)))
+ error = try_mknodat(dfd, name, mode, dev, LOOKUP_REVAL);
+
putname(name);
return error;
}
--
2.30.2
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 09/14] namei: prepare do_symlinkat for refactoring
2021-07-15 10:35 [PATCH 00/14] namei: clean up retry logic in various do_* functions Dmitry Kadashev
` (7 preceding siblings ...)
2021-07-15 10:35 ` [PATCH 08/14] namei: clean up do_mknodat retry logic Dmitry Kadashev
@ 2021-07-15 10:35 ` Dmitry Kadashev
2021-07-15 10:35 ` [PATCH 10/14] namei: clean up do_symlinkat retry logic Dmitry Kadashev
` (5 subsequent siblings)
14 siblings, 0 replies; 23+ messages in thread
From: Dmitry Kadashev @ 2021-07-15 10:35 UTC (permalink / raw)
To: Jens Axboe, Alexander Viro, Christian Brauner, Linus Torvalds
Cc: linux-fsdevel, io-uring, Dmitry Kadashev
This is just a preparation for the move of the main symlinkat logic to a
separate function to make the logic easier to follow. This change
contains the flow changes so that the actual change to move the main
logic to a separate function does no change the flow at all.
There are 3 changes to the flow here:
1. IS_ERR(from) check is repeated on ESTALE retry now. It's OK to do
since the call is trivial (and ESTALE retry is a slow path);
2. Just like the similar patches for rmdir and others a few commits
before, previously on filename_create() error the function used to exit
immediately, and now it will check the return code to see if ESTALE
retry is appropriate. The filename_create() does its own retries on
ESTALE (at least via filename_parentat() used inside), but this extra
check should be completely fine.
3. The retry_estale() check is wrapped in unlikely(). Some other places
already have that and overall it seems to make sense
Cc: Linus Torvalds <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Suggested-by: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/io-uring/CAHk-=wiG+sN+2zSoAOggKCGue2kOJvw3rQySvQXsZstRQFTN+g@mail.gmail.com/
Link: https://lore.kernel.org/io-uring/CAHk-=whH4msnFkj=iYZ9NDmZEAiZKM+vii803M8gnEwEsF1-Yg@mail.gmail.com/
Signed-off-by: Dmitry Kadashev <[email protected]>
---
fs/namei.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index f7cde1543b47..c4d75c94adce 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4241,15 +4241,15 @@ int do_symlinkat(struct filename *from, int newdfd, struct filename *to)
struct path path;
unsigned int lookup_flags = 0;
+retry:
if (IS_ERR(from)) {
error = PTR_ERR(from);
- goto out_putnames;
+ goto out;
}
-retry:
dentry = __filename_create(newdfd, to, &path, lookup_flags);
error = PTR_ERR(dentry);
if (IS_ERR(dentry))
- goto out_putnames;
+ goto out;
error = security_path_symlink(&path, dentry, from->name);
if (!error) {
@@ -4260,11 +4260,11 @@ int do_symlinkat(struct filename *from, int newdfd, struct filename *to)
from->name);
}
done_path_create(&path, dentry);
- if (retry_estale(error, lookup_flags)) {
+out:
+ if (unlikely(retry_estale(error, lookup_flags))) {
lookup_flags |= LOOKUP_REVAL;
goto retry;
}
-out_putnames:
putname(to);
putname(from);
return error;
--
2.30.2
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 10/14] namei: clean up do_symlinkat retry logic
2021-07-15 10:35 [PATCH 00/14] namei: clean up retry logic in various do_* functions Dmitry Kadashev
` (8 preceding siblings ...)
2021-07-15 10:35 ` [PATCH 09/14] namei: prepare do_symlinkat for refactoring Dmitry Kadashev
@ 2021-07-15 10:35 ` Dmitry Kadashev
2021-07-15 10:35 ` [PATCH 11/14] namei: prepare do_linkat for refactoring Dmitry Kadashev
` (4 subsequent siblings)
14 siblings, 0 replies; 23+ messages in thread
From: Dmitry Kadashev @ 2021-07-15 10:35 UTC (permalink / raw)
To: Jens Axboe, Alexander Viro, Christian Brauner, Linus Torvalds
Cc: linux-fsdevel, io-uring, Dmitry Kadashev
No functional changes, just move the main logic to a helper function to
make the whole thing easier to follow.
Cc: Linus Torvalds <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Suggested-by: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/io-uring/CAHk-=wiG+sN+2zSoAOggKCGue2kOJvw3rQySvQXsZstRQFTN+g@mail.gmail.com/
Signed-off-by: Dmitry Kadashev <[email protected]>
---
fs/namei.c | 30 ++++++++++++++++--------------
1 file changed, 16 insertions(+), 14 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index c4d75c94adce..61cf6bbe1e5c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4234,22 +4234,18 @@ int vfs_symlink(struct user_namespace *mnt_userns, struct inode *dir,
}
EXPORT_SYMBOL(vfs_symlink);
-int do_symlinkat(struct filename *from, int newdfd, struct filename *to)
+static int try_symlinkat(struct filename *from, int newdfd, struct filename *to,
+ unsigned int lookup_flags)
{
int error;
struct dentry *dentry;
struct path path;
- unsigned int lookup_flags = 0;
-retry:
- if (IS_ERR(from)) {
- error = PTR_ERR(from);
- goto out;
- }
+ if (IS_ERR(from))
+ return PTR_ERR(from);
dentry = __filename_create(newdfd, to, &path, lookup_flags);
- error = PTR_ERR(dentry);
if (IS_ERR(dentry))
- goto out;
+ return PTR_ERR(dentry);
error = security_path_symlink(&path, dentry, from->name);
if (!error) {
@@ -4260,11 +4256,17 @@ int do_symlinkat(struct filename *from, int newdfd, struct filename *to)
from->name);
}
done_path_create(&path, dentry);
-out:
- if (unlikely(retry_estale(error, lookup_flags))) {
- lookup_flags |= LOOKUP_REVAL;
- goto retry;
- }
+ return error;
+}
+
+int do_symlinkat(struct filename *from, int newdfd, struct filename *to)
+{
+ int error;
+
+ error = try_symlinkat(from, newdfd, to, 0);
+ if (unlikely(retry_estale(error, 0)))
+ error = try_symlinkat(from, newdfd, to, LOOKUP_REVAL);
+
putname(to);
putname(from);
return error;
--
2.30.2
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 11/14] namei: prepare do_linkat for refactoring
2021-07-15 10:35 [PATCH 00/14] namei: clean up retry logic in various do_* functions Dmitry Kadashev
` (9 preceding siblings ...)
2021-07-15 10:35 ` [PATCH 10/14] namei: clean up do_symlinkat retry logic Dmitry Kadashev
@ 2021-07-15 10:35 ` Dmitry Kadashev
2021-07-15 10:35 ` [PATCH 12/14] namei: clean up do_linkat retry logic Dmitry Kadashev
` (3 subsequent siblings)
14 siblings, 0 replies; 23+ messages in thread
From: Dmitry Kadashev @ 2021-07-15 10:35 UTC (permalink / raw)
To: Jens Axboe, Alexander Viro, Christian Brauner, Linus Torvalds
Cc: linux-fsdevel, io-uring, Dmitry Kadashev
This is just a preparation for the move of the main linkat logic to a
separate function to make the logic easier to follow. This change
contains the flow changes so that the actual change to move the main
logic to a separate function does no change the flow at all.
Changes to the flow here:
1. Flags handling is moved into the retry loop. So it can be moved
into the function with the main logic. The cost here is mainly the
capabilities check on retry, but hopefully that is OK, ESTALE retries
are a slow path anyway.
2. Just like the similar patches for rmdir and others a few commits
before, previously on filename_create() and filename_lookup() error the
function used to exit immediately, and now it will check the return code
to see if ESTALE retry is appropriate. Both filename_create() and
filename_lookup() do their own retries on ESTALE (at least via
filename_parentat() used inside), but this extra check should be
completely fine. Invalid flags will now hit `if retry_estale()` as well.
3. unlikely() is added around the ESTALE check;
Cc: Linus Torvalds <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Suggested-by: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/io-uring/CAHk-=wiG+sN+2zSoAOggKCGue2kOJvw3rQySvQXsZstRQFTN+g@mail.gmail.com/
Link https://lore.kernel.org/io-uring/CAHk-=wiE_JVny73KRZ6wuhL_5U0RRSmAw678_Cnkh3OHM8C7Jg@mail.gmail.com/
Signed-off-by: Dmitry Kadashev <[email protected]>
---
fs/namei.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index 61cf6bbe1e5c..82cb6421b6df 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4391,6 +4391,7 @@ int do_linkat(int olddfd, struct filename *old, int newdfd,
int how = 0;
int error;
+retry:
if ((flags & ~(AT_SYMLINK_FOLLOW | AT_EMPTY_PATH)) != 0) {
error = -EINVAL;
goto out_putnames;
@@ -4407,7 +4408,7 @@ int do_linkat(int olddfd, struct filename *old, int newdfd,
if (flags & AT_SYMLINK_FOLLOW)
how |= LOOKUP_FOLLOW;
-retry:
+
error = __filename_lookup(olddfd, old, how, &old_path, NULL);
if (error)
goto out_putnames;
@@ -4439,14 +4440,13 @@ int do_linkat(int olddfd, struct filename *old, int newdfd,
goto retry;
}
}
- if (retry_estale(error, how)) {
- path_put(&old_path);
- how |= LOOKUP_REVAL;
- goto retry;
- }
out_putpath:
path_put(&old_path);
out_putnames:
+ if (unlikely(retry_estale(error, how))) {
+ how |= LOOKUP_REVAL;
+ goto retry;
+ }
putname(old);
putname(new);
--
2.30.2
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 12/14] namei: clean up do_linkat retry logic
2021-07-15 10:35 [PATCH 00/14] namei: clean up retry logic in various do_* functions Dmitry Kadashev
` (10 preceding siblings ...)
2021-07-15 10:35 ` [PATCH 11/14] namei: prepare do_linkat for refactoring Dmitry Kadashev
@ 2021-07-15 10:35 ` Dmitry Kadashev
2021-07-15 10:35 ` [PATCH 13/14] namei: prepare do_renameat2 for refactoring Dmitry Kadashev
` (2 subsequent siblings)
14 siblings, 0 replies; 23+ messages in thread
From: Dmitry Kadashev @ 2021-07-15 10:35 UTC (permalink / raw)
To: Jens Axboe, Alexander Viro, Christian Brauner, Linus Torvalds
Cc: linux-fsdevel, io-uring, Dmitry Kadashev
No functional changes, just move the main logic to a helper function to
make the whole thing easier to follow.
Cc: Linus Torvalds <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Suggested-by: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/io-uring/CAHk-=wiG+sN+2zSoAOggKCGue2kOJvw3rQySvQXsZstRQFTN+g@mail.gmail.com/
Link: https://lore.kernel.org/io-uring/CAHk-=wiE_JVny73KRZ6wuhL_5U0RRSmAw678_Cnkh3OHM8C7Jg@mail.gmail.com/
Signed-off-by: Dmitry Kadashev <[email protected]>
---
fs/namei.c | 37 +++++++++++++++++++------------------
1 file changed, 19 insertions(+), 18 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index 82cb6421b6df..b93e9623eb5d 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4381,37 +4381,32 @@ EXPORT_SYMBOL(vfs_link);
* with linux 2.0, and to avoid hard-linking to directories
* and other special files. --ADM
*/
-int do_linkat(int olddfd, struct filename *old, int newdfd,
- struct filename *new, int flags)
+static int try_linkat(int olddfd, struct filename *old, int newdfd,
+ struct filename *new, int flags, unsigned int how)
{
struct user_namespace *mnt_userns;
struct dentry *new_dentry;
struct path old_path, new_path;
struct inode *delegated_inode = NULL;
- int how = 0;
int error;
retry:
- if ((flags & ~(AT_SYMLINK_FOLLOW | AT_EMPTY_PATH)) != 0) {
- error = -EINVAL;
- goto out_putnames;
- }
+ if ((flags & ~(AT_SYMLINK_FOLLOW | AT_EMPTY_PATH)) != 0)
+ return -EINVAL;
/*
* To use null names we require CAP_DAC_READ_SEARCH
* This ensures that not everyone will be able to create
* handlink using the passed filedescriptor.
*/
- if (flags & AT_EMPTY_PATH && !capable(CAP_DAC_READ_SEARCH)) {
- error = -ENOENT;
- goto out_putnames;
- }
+ if (flags & AT_EMPTY_PATH && !capable(CAP_DAC_READ_SEARCH))
+ return -ENOENT;
if (flags & AT_SYMLINK_FOLLOW)
how |= LOOKUP_FOLLOW;
error = __filename_lookup(olddfd, old, how, &old_path, NULL);
if (error)
- goto out_putnames;
+ return error;
new_dentry = __filename_create(newdfd, new, &new_path,
(how & LOOKUP_REVAL));
@@ -4442,14 +4437,20 @@ int do_linkat(int olddfd, struct filename *old, int newdfd,
}
out_putpath:
path_put(&old_path);
-out_putnames:
- if (unlikely(retry_estale(error, how))) {
- how |= LOOKUP_REVAL;
- goto retry;
- }
+ return error;
+}
+
+int do_linkat(int olddfd, struct filename *old, int newdfd,
+ struct filename *new, int flags)
+{
+ int error;
+
+ error = try_linkat(olddfd, old, newdfd, new, flags, 0);
+ if (unlikely(retry_estale(error, 0)))
+ error = try_linkat(olddfd, old, newdfd, new, flags, LOOKUP_REVAL);
+
putname(old);
putname(new);
-
return error;
}
--
2.30.2
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 13/14] namei: prepare do_renameat2 for refactoring
2021-07-15 10:35 [PATCH 00/14] namei: clean up retry logic in various do_* functions Dmitry Kadashev
` (11 preceding siblings ...)
2021-07-15 10:35 ` [PATCH 12/14] namei: clean up do_linkat retry logic Dmitry Kadashev
@ 2021-07-15 10:35 ` Dmitry Kadashev
2021-07-15 10:36 ` [PATCH 14/14] namei: clean up do_renameat2 retry logic Dmitry Kadashev
2021-07-15 10:39 ` [PATCH 00/14] namei: clean up retry logic in various do_* functions Dmitry Kadashev
14 siblings, 0 replies; 23+ messages in thread
From: Dmitry Kadashev @ 2021-07-15 10:35 UTC (permalink / raw)
To: Jens Axboe, Alexander Viro, Christian Brauner, Linus Torvalds
Cc: linux-fsdevel, io-uring, Dmitry Kadashev
This is just a preparation for the move of the main renameat logic to a
separate function to make the logic easier to follow. This change
contains the flow changes so that the actual change to move the main
logic to a separate function does no change the flow at all.
Changes to the flow here:
1. Flags handling is moved into the retry loop. So it can be moved
into the function with the main logic. A few extra arithmetic checks
on a slow path should be OK.
2. Just like the similar patches for rmdir and others a few commits
before, previously on filename_create() and filename_lookup() error the
function used to exit immediately, and now it will check the return code
to see if ESTALE retry is appropriate. Both filename_create() and
filename_lookup() do their own retries on ESTALE (at least via
filename_parentat() used inside), but this extra check should be
completely fine. Invalid flags will now hit `if retry_estale()` as well.
3. unlikely() is added around the ESTALE check;
Cc: Linus Torvalds <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Suggested-by: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/io-uring/CAHk-=wiG+sN+2zSoAOggKCGue2kOJvw3rQySvQXsZstRQFTN+g@mail.gmail.com/
Link: https://lore.kernel.org/io-uring/CAHk-=wjFd0qn6asio=zg7zUTRmSty_TpAEhnwym1Qb=wFgCKzA@mail.gmail.com/
Signed-off-by: Dmitry Kadashev <[email protected]>
---
fs/namei.c | 10 +++-------
1 file changed, 3 insertions(+), 7 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index b93e9623eb5d..a75abff5a9a6 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4654,9 +4654,9 @@ int do_renameat2(int olddfd, struct filename *from, int newdfd,
int old_type, new_type;
struct inode *delegated_inode = NULL;
unsigned int lookup_flags = 0, target_flags = LOOKUP_RENAME_TARGET;
- bool should_retry = false;
int error = -EINVAL;
+retry:
if (flags & ~(RENAME_NOREPLACE | RENAME_EXCHANGE | RENAME_WHITEOUT))
goto put_names;
@@ -4667,7 +4667,6 @@ int do_renameat2(int olddfd, struct filename *from, int newdfd,
if (flags & RENAME_EXCHANGE)
target_flags = 0;
-retry:
error = __filename_parentat(olddfd, from, lookup_flags, &old_path,
&old_last, &old_type);
if (error)
@@ -4769,17 +4768,14 @@ int do_renameat2(int olddfd, struct filename *from, int newdfd,
}
mnt_drop_write(old_path.mnt);
exit2:
- if (retry_estale(error, lookup_flags))
- should_retry = true;
path_put(&new_path);
exit1:
path_put(&old_path);
- if (should_retry) {
- should_retry = false;
+put_names:
+ if (retry_estale(error, lookup_flags)) {
lookup_flags |= LOOKUP_REVAL;
goto retry;
}
-put_names:
putname(from);
putname(to);
return error;
--
2.30.2
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 14/14] namei: clean up do_renameat2 retry logic
2021-07-15 10:35 [PATCH 00/14] namei: clean up retry logic in various do_* functions Dmitry Kadashev
` (12 preceding siblings ...)
2021-07-15 10:35 ` [PATCH 13/14] namei: prepare do_renameat2 for refactoring Dmitry Kadashev
@ 2021-07-15 10:36 ` Dmitry Kadashev
2021-07-15 10:39 ` [PATCH 00/14] namei: clean up retry logic in various do_* functions Dmitry Kadashev
14 siblings, 0 replies; 23+ messages in thread
From: Dmitry Kadashev @ 2021-07-15 10:36 UTC (permalink / raw)
To: Jens Axboe, Alexander Viro, Christian Brauner, Linus Torvalds
Cc: linux-fsdevel, io-uring, Dmitry Kadashev
No functional changes, just move the main logic to a helper function to
make the whole thing easier to follow.
Cc: Linus Torvalds <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Suggested-by: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/io-uring/CAHk-=wiG+sN+2zSoAOggKCGue2kOJvw3rQySvQXsZstRQFTN+g@mail.gmail.com/
Link: https://lore.kernel.org/io-uring/CAHk-=wjFd0qn6asio=zg7zUTRmSty_TpAEhnwym1Qb=wFgCKzA@mail.gmail.com/
Signed-off-by: Dmitry Kadashev <[email protected]>
---
fs/namei.c | 34 +++++++++++++++++++++-------------
1 file changed, 21 insertions(+), 13 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index a75abff5a9a6..dce0e427b95a 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4643,8 +4643,9 @@ int vfs_rename(struct renamedata *rd)
}
EXPORT_SYMBOL(vfs_rename);
-int do_renameat2(int olddfd, struct filename *from, int newdfd,
- struct filename *to, unsigned int flags)
+static int try_renameat(int olddfd, struct filename *from, int newdfd,
+ struct filename *to, unsigned int flags,
+ unsigned int lookup_flags)
{
struct renamedata rd;
struct dentry *old_dentry, *new_dentry;
@@ -4653,16 +4654,15 @@ int do_renameat2(int olddfd, struct filename *from, int newdfd,
struct qstr old_last, new_last;
int old_type, new_type;
struct inode *delegated_inode = NULL;
- unsigned int lookup_flags = 0, target_flags = LOOKUP_RENAME_TARGET;
- int error = -EINVAL;
+ unsigned int target_flags = LOOKUP_RENAME_TARGET;
+ int error;
-retry:
if (flags & ~(RENAME_NOREPLACE | RENAME_EXCHANGE | RENAME_WHITEOUT))
- goto put_names;
+ return -EINVAL;
if ((flags & (RENAME_NOREPLACE | RENAME_WHITEOUT)) &&
(flags & RENAME_EXCHANGE))
- goto put_names;
+ return -EINVAL;
if (flags & RENAME_EXCHANGE)
target_flags = 0;
@@ -4670,7 +4670,7 @@ int do_renameat2(int olddfd, struct filename *from, int newdfd,
error = __filename_parentat(olddfd, from, lookup_flags, &old_path,
&old_last, &old_type);
if (error)
- goto put_names;
+ return error;
error = __filename_parentat(newdfd, to, lookup_flags, &new_path, &new_last,
&new_type);
@@ -4771,11 +4771,19 @@ int do_renameat2(int olddfd, struct filename *from, int newdfd,
path_put(&new_path);
exit1:
path_put(&old_path);
-put_names:
- if (retry_estale(error, lookup_flags)) {
- lookup_flags |= LOOKUP_REVAL;
- goto retry;
- }
+ return error;
+}
+
+int do_renameat2(int olddfd, struct filename *from, int newdfd,
+ struct filename *to, unsigned int flags)
+{
+ int error;
+
+ error = try_renameat(olddfd, from, newdfd, to, flags, 0);
+ if (unlikely(retry_estale(error, 0)))
+ error = try_renameat(olddfd, from, newdfd, to, flags,
+ LOOKUP_REVAL);
+
putname(from);
putname(to);
return error;
--
2.30.2
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 00/14] namei: clean up retry logic in various do_* functions
2021-07-15 10:35 [PATCH 00/14] namei: clean up retry logic in various do_* functions Dmitry Kadashev
` (13 preceding siblings ...)
2021-07-15 10:36 ` [PATCH 14/14] namei: clean up do_renameat2 retry logic Dmitry Kadashev
@ 2021-07-15 10:39 ` Dmitry Kadashev
14 siblings, 0 replies; 23+ messages in thread
From: Dmitry Kadashev @ 2021-07-15 10:39 UTC (permalink / raw)
To: Jens Axboe, Alexander Viro, Christian Brauner, Linus Torvalds
Cc: linux-fsdevel, io-uring
On Thu, Jul 15, 2021 at 5:36 PM Dmitry Kadashev <[email protected]> wrote:
>
> Suggested by Linus in https://lore.kernel.org/io-uring/CAHk-=wh=cpt_tQCirzFZRPawRpbuFTZ2MxNpXiyUF+eBXF=+sw@mail.gmail.com/
>
> This patchset does all the do_* functions one by one. The idea is to
> move the main logic to a helper function and handle stale retries /
> struct filename cleanups outside, which makes the logic easier to
> follow.
>
> There are a few minor changes in behavior:
>
> 1. filename_lookup() / filename_parentat() / filename_create() do their
> own retries on ESTALE (regardless of flags), and previously they were
> exempt from retries in the do_* functions (but they *were* called on
> retry - it's just the return code wasn't checked for ESTALE). And
> now the retry is done on the upper level, and so technically it could be
> called a behavior change. Hopefully it's an edge case where an
> additional check does not matter.
>
> 2. Some safety checks like may_mknod() / flags validation are now
> repeated on retry. Those are mostly trivial and retry is a slow path, so
> that should be OK.
>
> 3. retry_estale() is wrapped into unlikely() now
>
> On top of https://lore.kernel.org/io-uring/[email protected]/
>
> v2:
>
> - Split flow changes and code reorganization to different commits;
>
> - Move more checks into the new helpers, to avoid gotos in the touched
> do_* functions completely;
>
> - Add unlikely() around retry_estale();
>
> - Name the new helper functions try_* instead of *_helper;
>
> Dmitry Kadashev (14):
> namei: prepare do_rmdir for refactoring
> namei: clean up do_rmdir retry logic
> namei: prepare do_unlinkat for refactoring
> namei: clean up do_unlinkat retry logic
> namei: prepare do_mkdirat for refactoring
> namei: clean up do_mkdirat retry logic
> namei: prepare do_mknodat for refactoring
> namei: clean up do_mknodat retry logic
> namei: prepare do_symlinkat for refactoring
> namei: clean up do_symlinkat retry logic
> namei: prepare do_linkat for refactoring
> namei: clean up do_linkat retry logic
> namei: prepare do_renameat2 for refactoring
> namei: clean up do_renameat2 retry logic
>
> fs/namei.c | 252 +++++++++++++++++++++++++++++------------------------
> 1 file changed, 140 insertions(+), 112 deletions(-)
Ooops, the subject misses "v2", I'll resend.
--
Dmitry Kadashev
^ permalink raw reply [flat|nested] 23+ messages in thread