[PATCH v10 0/5] io_uring: add xattr support

public inbox for [email protected]
 help / color / mirror / Atom feed

* [PATCH v10 0/5] io_uring: add xattr support
@ 2021-12-29 20:29 Stefan Roesch
  2021-12-29 20:29 ` [PATCH v10 1/5] fs: split off do_user_path_at_empty from user_path_at_empty() Stefan Roesch
                   ` (4 more replies)
  0 siblings, 5 replies; 26+ messages in thread
From: Stefan Roesch @ 2021-12-29 20:29 UTC (permalink / raw)
  To: io-uring, linux-fsdevel, kernel-team; +Cc: torvalds, christian.brauner, shr

This adds the xattr support to io_uring. The intent is to have a more
complete support for file operations in io_uring.

This change adds support for the following functions to io_uring:
- fgetxattr
- fsetxattr
- getxattr
- setxattr

Patch 1: fs: split off do_user_path_at_empty from user_path_at_empty()
  This splits off a new function do_user_path_at_empty from
  user_path_at_empty that is based on filename and not on a
  user-specified string.

Patch 2: fs: split off setxattr_copy and do_setxattr function from setxattr
  Split off the setup part of the setxattr function in the setxattr_copy
  function. Split off the processing part in do_setxattr.

Patch 3: fs: split off do_getxattr from getxattr
  Split of the do_getxattr part from getxattr. This will
  allow it to be invoked it from io_uring.

Patch 4: io_uring: add fsetxattr and setxattr support
  This adds new functions to support the fsetxattr and setxattr
  functions.

Patch 5: io_uring: add fgetxattr and getxattr support
  This adds new functions to support the fgetxattr and getxattr
  functions.


There are two additional patches:
  liburing: Add support for xattr api's.
            This also includes the tests for the new code.
  xfstests: Add support for io_uring xattr support.


V10: - move do_user_path_at_empty definition to fs/internal.h
     - introduce __io_setxattr_finish function
     - introduce __io_getxattr_finish function
V9 : - keep kvalue in struct xattr_ctx
V8 : - introduce xattr_name struct as advised by Linus
     - remove kname_sz field in xattr_ctx
V7 : - split off setxattr in two functions as recommeneded by
       Christian.
V6 : - reverted addition of kname array to xattr_ctx structure
       Adding the kname array increases the io_kiocb beyond 64 bytes
       (increases it to 224 bytes). We try hard to limit it to 64 bytes.
       Keeping the original interface also is a bit more efficient.
     - addressed Pavel's reordering comment
     - addressed Pavel's putname comment
     - addressed Pavel's kvfree comment
     - rebased on for-5.17/io_uring-getdents64
V5 : - add kname array to xattr_ctx structure
V4 : - rebased patch series
V3 : - remove req->file checks in prep functions
     - change size parameter in do_xattr
V2 : - split off function do_user_path_empty instead of changing
       the function signature of user_path_at
     - Fix datatype size problem in do_getxattr


Stefan Roesch (5):
  fs: split off do_user_path_at_empty from user_path_at_empty()
  fs: split off setxattr_copy and do_setxattr function from setxattr
  fs: split off do_getxattr from getxattr
  io_uring: add fsetxattr and setxattr support
  io_uring: add fgetxattr and getxattr support

 fs/internal.h                 |  34 ++++
 fs/io_uring.c                 | 316 ++++++++++++++++++++++++++++++++++
 fs/namei.c                    |  10 +-
 fs/xattr.c                    | 114 ++++++++----
 include/uapi/linux/io_uring.h |   8 +-
 5 files changed, 444 insertions(+), 38 deletions(-)


base-commit: b4518682080d3a1cdd6ea45a54ff6772b8b2797a
-- 
2.30.2


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v10 1/5] fs: split off do_user_path_at_empty from user_path_at_empty()
  2021-12-29 20:29 [PATCH v10 0/5] io_uring: add xattr support Stefan Roesch
@ 2021-12-29 20:29 ` Stefan Roesch
  2021-12-30  0:49   ` Al Viro
  2021-12-29 20:29 ` [PATCH v10 2/5] fs: split off setxattr_copy and do_setxattr function from setxattr Stefan Roesch
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 26+ messages in thread
From: Stefan Roesch @ 2021-12-29 20:29 UTC (permalink / raw)
  To: io-uring, linux-fsdevel, kernel-team; +Cc: torvalds, christian.brauner, shr

This splits off a do_user_path_at_empty function from the
user_path_at_empty_function. This is required so it can be
called from io_uring.

Signed-off-by: Stefan Roesch <[email protected]>
Acked-by: Christian Brauner <[email protected]>
---
 fs/internal.h |  6 ++++++
 fs/namei.c    | 10 ++++++++--
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/fs/internal.h b/fs/internal.h
index 432ea3ce76ec..afa60757d5f6 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -202,3 +202,9 @@ struct linux_dirent64;
 
 int vfs_getdents(struct file *file, struct linux_dirent64 __user *dirent,
 		 unsigned int count, loff_t *pos);
+
+ /*
+  * fs/namei.c:
+  */
+extern int do_user_path_at_empty(int dfd, struct filename *filename,
+				unsigned int flags, struct path *path);
diff --git a/fs/namei.c b/fs/namei.c
index 1f9d2187c765..d988e241b32c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2794,12 +2794,18 @@ int path_pts(struct path *path)
 }
 #endif
 
+int do_user_path_at_empty(int dfd, struct filename *filename, unsigned int flags,
+		       struct path *path)
+{
+	return filename_lookup(dfd, filename, flags, path, NULL);
+}
+
 int user_path_at_empty(int dfd, const char __user *name, unsigned flags,
-		 struct path *path, int *empty)
+		struct path *path, int *empty)
 {
 	struct filename *filename = getname_flags(name, flags, empty);
-	int ret = filename_lookup(dfd, filename, flags, path, NULL);
 
+	int ret = do_user_path_at_empty(dfd, filename, flags, path);
 	putname(filename);
 	return ret;
 }
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v10 1/5] fs: split off do_user_path_at_empty from user_path_at_empty()
  2021-12-29 20:29 ` [PATCH v10 1/5] fs: split off do_user_path_at_empty from user_path_at_empty() Stefan Roesch
@ 2021-12-30  0:49   ` Al Viro
  2021-12-30 19:57     ` Stefan Roesch
  0 siblings, 1 reply; 26+ messages in thread
From: Al Viro @ 2021-12-30  0:49 UTC (permalink / raw)
  To: Stefan Roesch
  Cc: io-uring, linux-fsdevel, kernel-team, torvalds, christian.brauner

On Wed, Dec 29, 2021 at 12:29:58PM -0800, Stefan Roesch wrote:
> This splits off a do_user_path_at_empty function from the
> user_path_at_empty_function. This is required so it can be
> called from io_uring.

Umm...  Why do you bother with that wrapper?  filename_lookup() is already
there and already non-static.  Granted, its user outside of fs/namei.c
is ugly as hell, but looking at this series, I'd rather have the damn
thing call filename_lookup() directly.  _Or_, if you really feel like
doing that wrapper, make it inline in internal.h and have fs_parser.c
use call the same thing - it also passes NULL as the last argument.

Said that, I really don't see the point of having that wrapper in the
first place.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v10 1/5] fs: split off do_user_path_at_empty from user_path_at_empty()
  2021-12-30  0:49   ` Al Viro
@ 2021-12-30 19:57     ` Stefan Roesch
  0 siblings, 0 replies; 26+ messages in thread
From: Stefan Roesch @ 2021-12-30 19:57 UTC (permalink / raw)
  To: Al Viro; +Cc: io-uring, linux-fsdevel, kernel-team, torvalds, christian.brauner



On 12/29/21 4:49 PM, Al Viro wrote:
> On Wed, Dec 29, 2021 at 12:29:58PM -0800, Stefan Roesch wrote:
>> This splits off a do_user_path_at_empty function from the
>> user_path_at_empty_function. This is required so it can be
>> called from io_uring.
> 
> Umm...  Why do you bother with that wrapper?  filename_lookup() is already
> there and already non-static.  Granted, its user outside of fs/namei.c
> is ugly as hell, but looking at this series, I'd rather have the damn
> thing call filename_lookup() directly.  _Or_, if you really feel like
> doing that wrapper, make it inline in internal.h and have fs_parser.c
> use call the same thing - it also passes NULL as the last argument.
> 
> Said that, I really don't see the point of having that wrapper in the
> first place.

I'll remove the wrapper in the next version.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v10 2/5] fs: split off setxattr_copy and do_setxattr function from setxattr
  2021-12-29 20:29 [PATCH v10 0/5] io_uring: add xattr support Stefan Roesch
  2021-12-29 20:29 ` [PATCH v10 1/5] fs: split off do_user_path_at_empty from user_path_at_empty() Stefan Roesch
@ 2021-12-29 20:29 ` Stefan Roesch
  2021-12-30  1:15   ` Al Viro
  2021-12-29 20:30 ` [PATCH v10 3/5] fs: split off do_getxattr from getxattr Stefan Roesch
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 26+ messages in thread
From: Stefan Roesch @ 2021-12-29 20:29 UTC (permalink / raw)
  To: io-uring, linux-fsdevel, kernel-team; +Cc: torvalds, christian.brauner, shr

This splits of the setup part of the function
setxattr in its own dedicated function called
setxattr_copy. In addition it also exposes a
new function called do_setxattr for making the
setxattr call.

This makes it possible to call these two functions
from io_uring in the processing of an xattr request.

Signed-off-by: Stefan Roesch <[email protected]>
Acked-by: Christian Brauner <[email protected]>
---
 fs/internal.h | 21 +++++++++++++
 fs/xattr.c    | 82 ++++++++++++++++++++++++++++++++++++---------------
 2 files changed, 80 insertions(+), 23 deletions(-)

diff --git a/fs/internal.h b/fs/internal.h
index afa60757d5f6..07487f29feb0 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -208,3 +208,24 @@ int vfs_getdents(struct file *file, struct linux_dirent64 __user *dirent,
   */
 extern int do_user_path_at_empty(int dfd, struct filename *filename,
 				unsigned int flags, struct path *path);
+
+ /*
+  * fs/xattr.c:
+  */
+struct xattr_name {
+	char name[XATTR_NAME_MAX + 1];
+};
+
+struct xattr_ctx {
+	/* Value of attribute */
+	const void __user *value;
+	void *kvalue;
+	size_t size;
+	/* Attribute name */
+	struct xattr_name *kname;
+	unsigned int flags;
+};
+
+int setxattr_copy(const char __user *name, struct xattr_ctx *ctx);
+int do_setxattr(struct user_namespace *mnt_userns, struct dentry *dentry,
+		struct xattr_ctx *ctx);
diff --git a/fs/xattr.c b/fs/xattr.c
index 5c8c5175b385..923ba944d20e 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -25,6 +25,8 @@
 
 #include <linux/uaccess.h>
 
+#include "internal.h"
+
 static const char *
 strcmp_prefix(const char *a, const char *a_prefix)
 {
@@ -539,43 +541,77 @@ EXPORT_SYMBOL_GPL(vfs_removexattr);
 /*
  * Extended attribute SET operations
  */
-static long
-setxattr(struct user_namespace *mnt_userns, struct dentry *d,
-	 const char __user *name, const void __user *value, size_t size,
-	 int flags)
+
+int setxattr_copy(const char __user *name, struct xattr_ctx *ctx)
 {
 	int error;
-	void *kvalue = NULL;
-	char kname[XATTR_NAME_MAX + 1];
 
-	if (flags & ~(XATTR_CREATE|XATTR_REPLACE))
+	if (ctx->flags & ~(XATTR_CREATE|XATTR_REPLACE))
 		return -EINVAL;
 
-	error = strncpy_from_user(kname, name, sizeof(kname));
-	if (error == 0 || error == sizeof(kname))
-		error = -ERANGE;
+	error = strncpy_from_user(ctx->kname->name, name,
+				sizeof(ctx->kname->name));
+	if (error == 0 || error == sizeof(ctx->kname->name))
+		return  -ERANGE;
 	if (error < 0)
 		return error;
 
-	if (size) {
-		if (size > XATTR_SIZE_MAX)
+	if (ctx->size) {
+		if (ctx->size > XATTR_SIZE_MAX)
 			return -E2BIG;
-		kvalue = kvmalloc(size, GFP_KERNEL);
-		if (!kvalue)
+
+		ctx->kvalue = kvmalloc(ctx->size, GFP_KERNEL);
+		if (!ctx->kvalue)
 			return -ENOMEM;
-		if (copy_from_user(kvalue, value, size)) {
-			error = -EFAULT;
-			goto out;
+
+		if (copy_from_user(ctx->kvalue, ctx->value, ctx->size)) {
+			kvfree(ctx->kvalue);
+			return -EFAULT;
 		}
-		if ((strcmp(kname, XATTR_NAME_POSIX_ACL_ACCESS) == 0) ||
-		    (strcmp(kname, XATTR_NAME_POSIX_ACL_DEFAULT) == 0))
-			posix_acl_fix_xattr_from_user(mnt_userns, kvalue, size);
 	}
 
-	error = vfs_setxattr(mnt_userns, d, kname, kvalue, size, flags);
-out:
-	kvfree(kvalue);
+	return 0;
+}
+
+static void setxattr_convert(struct user_namespace *mnt_userns,
+			struct xattr_ctx *ctx)
+{
+	if (ctx->size &&
+		((strcmp(ctx->kname->name, XATTR_NAME_POSIX_ACL_ACCESS) == 0) ||
+		(strcmp(ctx->kname->name, XATTR_NAME_POSIX_ACL_DEFAULT) == 0)))
+		posix_acl_fix_xattr_from_user(mnt_userns, ctx->kvalue, ctx->size);
+}
+
+int do_setxattr(struct user_namespace *mnt_userns, struct dentry *dentry,
+		struct xattr_ctx *ctx)
+{
+	setxattr_convert(mnt_userns, ctx);
+	return vfs_setxattr(mnt_userns, dentry, ctx->kname->name,
+			ctx->kvalue, ctx->size, ctx->flags);
+}
+
+static long
+setxattr(struct user_namespace *mnt_userns, struct dentry *d,
+	const char __user *name, const void __user *value, size_t size,
+	int flags)
+{
+	struct xattr_name kname;
+	struct xattr_ctx ctx = {
+		.value    = value,
+		.kvalue   = NULL,
+		.size     = size,
+		.kname    = &kname,
+		.flags    = flags,
+	};
+	int error;
+
+	error = setxattr_copy(name, &ctx);
+	if (error)
+		return error;
+
+	error = do_setxattr(mnt_userns, d, &ctx);
 
+	kvfree(ctx.kvalue);
 	return error;
 }
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v10 2/5] fs: split off setxattr_copy and do_setxattr function from setxattr
  2021-12-29 20:29 ` [PATCH v10 2/5] fs: split off setxattr_copy and do_setxattr function from setxattr Stefan Roesch
@ 2021-12-30  1:15   ` Al Viro
  2021-12-30  9:41     ` Christian Brauner
  2021-12-30 19:57     ` Stefan Roesch
  0 siblings, 2 replies; 26+ messages in thread
From: Al Viro @ 2021-12-30  1:15 UTC (permalink / raw)
  To: Stefan Roesch
  Cc: io-uring, linux-fsdevel, kernel-team, torvalds, christian.brauner

On Wed, Dec 29, 2021 at 12:29:59PM -0800, Stefan Roesch wrote:
> +	if (ctx->size) {
> +		if (ctx->size > XATTR_SIZE_MAX)
>  			return -E2BIG;
> -		kvalue = kvmalloc(size, GFP_KERNEL);
> -		if (!kvalue)
> +
> +		ctx->kvalue = kvmalloc(ctx->size, GFP_KERNEL);
> +		if (!ctx->kvalue)
>  			return -ENOMEM;
> -		if (copy_from_user(kvalue, value, size)) {
> -			error = -EFAULT;
> -			goto out;
> +
> +		if (copy_from_user(ctx->kvalue, ctx->value, ctx->size)) {
> +			kvfree(ctx->kvalue);
> +			return -EFAULT;

BTW, what's wrong with using vmemdup_user() here?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v10 2/5] fs: split off setxattr_copy and do_setxattr function from setxattr
  2021-12-30  1:15   ` Al Viro
@ 2021-12-30  9:41     ` Christian Brauner
  2021-12-30 19:57     ` Stefan Roesch
  1 sibling, 0 replies; 26+ messages in thread
From: Christian Brauner @ 2021-12-30  9:41 UTC (permalink / raw)
  To: Al Viro; +Cc: Stefan Roesch, io-uring, linux-fsdevel, kernel-team, torvalds

On Thu, Dec 30, 2021 at 01:15:10AM +0000, Al Viro wrote:
> On Wed, Dec 29, 2021 at 12:29:59PM -0800, Stefan Roesch wrote:
> > +	if (ctx->size) {
> > +		if (ctx->size > XATTR_SIZE_MAX)
> >  			return -E2BIG;
> > -		kvalue = kvmalloc(size, GFP_KERNEL);
> > -		if (!kvalue)
> > +
> > +		ctx->kvalue = kvmalloc(ctx->size, GFP_KERNEL);
> > +		if (!ctx->kvalue)
> >  			return -ENOMEM;
> > -		if (copy_from_user(kvalue, value, size)) {
> > -			error = -EFAULT;
> > -			goto out;
> > +
> > +		if (copy_from_user(ctx->kvalue, ctx->value, ctx->size)) {
> > +			kvfree(ctx->kvalue);
> > +			return -EFAULT;
> 
> BTW, what's wrong with using vmemdup_user() here?

Nothing? It's simply timing paired with that specific code not needing
to be touched:

- in 2005 that code was kmalloc(GFP_KERNEL) + copy_from_user()
- in 2009 it was changed to memdup_user(GFP_USER)
- in 2012 it was changed to kvmalloc(GFP_KERNEL) + copy_from_user()

In 2018 you added vmemdup_user() and noone has updated that codepath. :)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v10 2/5] fs: split off setxattr_copy and do_setxattr function from setxattr
  2021-12-30  1:15   ` Al Viro
  2021-12-30  9:41     ` Christian Brauner
@ 2021-12-30 19:57     ` Stefan Roesch
  1 sibling, 0 replies; 26+ messages in thread
From: Stefan Roesch @ 2021-12-30 19:57 UTC (permalink / raw)
  To: Al Viro; +Cc: io-uring, linux-fsdevel, kernel-team, torvalds, christian.brauner



On 12/29/21 5:15 PM, Al Viro wrote:
> On Wed, Dec 29, 2021 at 12:29:59PM -0800, Stefan Roesch wrote:
>> +	if (ctx->size) {
>> +		if (ctx->size > XATTR_SIZE_MAX)
>>  			return -E2BIG;
>> -		kvalue = kvmalloc(size, GFP_KERNEL);
>> -		if (!kvalue)
>> +
>> +		ctx->kvalue = kvmalloc(ctx->size, GFP_KERNEL);
>> +		if (!ctx->kvalue)
>>  			return -ENOMEM;
>> -		if (copy_from_user(kvalue, value, size)) {
>> -			error = -EFAULT;
>> -			goto out;
>> +
>> +		if (copy_from_user(ctx->kvalue, ctx->value, ctx->size)) {
>> +			kvfree(ctx->kvalue);
>> +			return -EFAULT;
> 
> BTW, what's wrong with using vmemdup_user() here?

I was simply following the existing code. The next version will use the vmemdup_user function.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v10 3/5] fs: split off do_getxattr from getxattr
  2021-12-29 20:29 [PATCH v10 0/5] io_uring: add xattr support Stefan Roesch
  2021-12-29 20:29 ` [PATCH v10 1/5] fs: split off do_user_path_at_empty from user_path_at_empty() Stefan Roesch
  2021-12-29 20:29 ` [PATCH v10 2/5] fs: split off setxattr_copy and do_setxattr function from setxattr Stefan Roesch
@ 2021-12-29 20:30 ` Stefan Roesch
  2021-12-29 20:30 ` [PATCH v10 4/5] io_uring: add fsetxattr and setxattr support Stefan Roesch
  2021-12-29 20:30 ` [PATCH v10 5/5] io_uring: add fgetxattr and getxattr support Stefan Roesch
  4 siblings, 0 replies; 26+ messages in thread
From: Stefan Roesch @ 2021-12-29 20:30 UTC (permalink / raw)
  To: io-uring, linux-fsdevel, kernel-team; +Cc: torvalds, christian.brauner, shr

This splits off do_getxattr function from the getxattr
function. This will allow io_uring to call it from its
io worker.

Signed-off-by: Stefan Roesch <[email protected]>
Acked-by: Christian Brauner <[email protected]>
---
 fs/internal.h |  7 +++++++
 fs/xattr.c    | 32 ++++++++++++++++++++------------
 2 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/fs/internal.h b/fs/internal.h
index 07487f29feb0..c276ec290489 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -226,6 +226,13 @@ struct xattr_ctx {
 	unsigned int flags;
 };
 
+
+ssize_t do_getxattr(struct user_namespace *mnt_userns,
+		    struct dentry *d,
+		    const char *kname,
+		    void __user *value,
+		    size_t size);
+
 int setxattr_copy(const char __user *name, struct xattr_ctx *ctx);
 int do_setxattr(struct user_namespace *mnt_userns, struct dentry *dentry,
 		struct xattr_ctx *ctx);
diff --git a/fs/xattr.c b/fs/xattr.c
index 923ba944d20e..3b6d683d07b9 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -677,19 +677,12 @@ SYSCALL_DEFINE5(fsetxattr, int, fd, const char __user *, name,
 /*
  * Extended attribute GET operations
  */
-static ssize_t
-getxattr(struct user_namespace *mnt_userns, struct dentry *d,
-	 const char __user *name, void __user *value, size_t size)
+ssize_t
+do_getxattr(struct user_namespace *mnt_userns, struct dentry *d,
+	const char *kname, void __user *value, size_t size)
 {
-	ssize_t error;
 	void *kvalue = NULL;
-	char kname[XATTR_NAME_MAX + 1];
-
-	error = strncpy_from_user(kname, name, sizeof(kname));
-	if (error == 0 || error == sizeof(kname))
-		error = -ERANGE;
-	if (error < 0)
-		return error;
+	ssize_t error;
 
 	if (size) {
 		if (size > XATTR_SIZE_MAX)
@@ -713,10 +706,25 @@ getxattr(struct user_namespace *mnt_userns, struct dentry *d,
 	}
 
 	kvfree(kvalue);
-
 	return error;
 }
 
+static ssize_t
+getxattr(struct user_namespace *mnt_userns, struct dentry *d,
+	 const char __user *name, void __user *value, size_t size)
+{
+	ssize_t error;
+	struct xattr_name kname;
+
+	error = strncpy_from_user(kname.name, name, sizeof(kname.name));
+	if (error == 0 || error == sizeof(kname.name))
+		error = -ERANGE;
+	if (error < 0)
+		return error;
+
+	return do_getxattr(mnt_userns, d, kname.name, value, size);
+}
+
 static ssize_t path_getxattr(const char __user *pathname,
 			     const char __user *name, void __user *value,
 			     size_t size, unsigned int lookup_flags)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v10 4/5] io_uring: add fsetxattr and setxattr support
  2021-12-29 20:29 [PATCH v10 0/5] io_uring: add xattr support Stefan Roesch
                   ` (2 preceding siblings ...)
  2021-12-29 20:30 ` [PATCH v10 3/5] fs: split off do_getxattr from getxattr Stefan Roesch
@ 2021-12-29 20:30 ` Stefan Roesch
  2021-12-30  1:58   ` Al Viro
  2021-12-30  2:17   ` Al Viro
  2021-12-29 20:30 ` [PATCH v10 5/5] io_uring: add fgetxattr and getxattr support Stefan Roesch
  4 siblings, 2 replies; 26+ messages in thread
From: Stefan Roesch @ 2021-12-29 20:30 UTC (permalink / raw)
  To: io-uring, linux-fsdevel, kernel-team; +Cc: torvalds, christian.brauner, shr

This adds support to io_uring for the fsetxattr and setxattr API.

Signed-off-by: Stefan Roesch <[email protected]>
Acked-by: Christian Brauner <[email protected]>
---
 fs/io_uring.c                 | 164 ++++++++++++++++++++++++++++++++++
 include/uapi/linux/io_uring.h |   6 +-
 2 files changed, 169 insertions(+), 1 deletion(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index c8258c784116..f1b325dd81d5 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -82,6 +82,7 @@
 #include <linux/audit.h>
 #include <linux/security.h>
 #include <linux/atomic-ref.h>
+#include <linux/xattr.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/io_uring.h>
@@ -726,6 +727,12 @@ struct io_async_rw {
 	struct wait_page_queue		wpq;
 };
 
+struct io_xattr {
+	struct file			*file;
+	struct xattr_ctx		ctx;
+	struct filename			*filename;
+};
+
 enum {
 	REQ_F_FIXED_FILE_BIT	= IOSQE_FIXED_FILE_BIT,
 	REQ_F_IO_DRAIN_BIT	= IOSQE_IO_DRAIN_BIT,
@@ -866,6 +873,7 @@ struct io_kiocb {
 		struct io_symlink	symlink;
 		struct io_hardlink	hardlink;
 		struct io_getdents	getdents;
+		struct io_xattr		xattr;
 	};
 
 	u8				opcode;
@@ -1118,6 +1126,10 @@ static const struct io_op_def io_op_defs[] = {
 	[IORING_OP_GETDENTS] = {
 		.needs_file		= 1,
 	},
+	[IORING_OP_FSETXATTR] = {
+		.needs_file = 1
+	},
+	[IORING_OP_SETXATTR] = {},
 };
 
 /* requests with any of those set should undergo io_disarm_next() */
@@ -3887,6 +3899,139 @@ static int io_renameat(struct io_kiocb *req, unsigned int issue_flags)
 	return 0;
 }
 
+static int __io_setxattr_prep(struct io_kiocb *req,
+			const struct io_uring_sqe *sqe)
+{
+	struct io_xattr *ix = &req->xattr;
+	const char __user *name;
+	int ret;
+
+	if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
+		return -EINVAL;
+	if (unlikely(sqe->ioprio))
+		return -EINVAL;
+	if (unlikely(req->flags & REQ_F_FIXED_FILE))
+		return -EBADF;
+
+	ix->filename = NULL;
+	name = u64_to_user_ptr(READ_ONCE(sqe->addr));
+	ix->ctx.value = u64_to_user_ptr(READ_ONCE(sqe->addr2));
+	ix->ctx.kvalue = NULL;
+	ix->ctx.size = READ_ONCE(sqe->len);
+	ix->ctx.flags = READ_ONCE(sqe->xattr_flags);
+
+	ix->ctx.kname = kmalloc(sizeof(*ix->ctx.kname), GFP_KERNEL);
+	if (!ix->ctx.kname)
+		return -ENOMEM;
+
+	ret = setxattr_copy(name, &ix->ctx);
+	if (ret) {
+		kfree(ix->ctx.kname);
+		return ret;
+	}
+
+	req->flags |= REQ_F_NEED_CLEANUP;
+	return 0;
+}
+
+static int io_setxattr_prep(struct io_kiocb *req,
+			const struct io_uring_sqe *sqe)
+{
+	struct io_xattr *ix = &req->xattr;
+	const char __user *path;
+	int ret;
+
+	ret = __io_setxattr_prep(req, sqe);
+	if (ret)
+		return ret;
+
+	path = u64_to_user_ptr(READ_ONCE(sqe->addr3));
+
+	ix->filename = getname_flags(path, LOOKUP_FOLLOW, NULL);
+	if (IS_ERR(ix->filename)) {
+		ret = PTR_ERR(ix->filename);
+		ix->filename = NULL;
+	}
+
+	return ret;
+}
+
+static int io_fsetxattr_prep(struct io_kiocb *req,
+			const struct io_uring_sqe *sqe)
+{
+	return __io_setxattr_prep(req, sqe);
+}
+
+static int __io_setxattr(struct io_kiocb *req, unsigned int issue_flags,
+			struct path *path)
+{
+	struct io_xattr *ix = &req->xattr;
+	int ret;
+
+	ret = mnt_want_write(path->mnt);
+	if (!ret) {
+		ret = do_setxattr(mnt_user_ns(path->mnt), path->dentry, &ix->ctx);
+		mnt_drop_write(path->mnt);
+	}
+
+	return ret;
+}
+
+static void __io_setxattr_finish(struct io_kiocb *req, int ret)
+{
+	struct xattr_ctx *ctx = &req->xattr.ctx;
+
+	req->flags &= ~REQ_F_NEED_CLEANUP;
+
+	kfree(ctx->kname);
+	if (ctx->kvalue)
+		kvfree(ctx->kvalue);
+
+	if (ret < 0)
+		req_set_fail(req);
+
+	io_req_complete(req, ret);
+}
+
+static int io_fsetxattr(struct io_kiocb *req, unsigned int issue_flags)
+{
+	int ret;
+
+	if (issue_flags & IO_URING_F_NONBLOCK)
+		return -EAGAIN;
+
+	ret = __io_setxattr(req, issue_flags, &req->file->f_path);
+	__io_setxattr_finish(req, ret);
+
+	return 0;
+}
+
+static int io_setxattr(struct io_kiocb *req, unsigned int issue_flags)
+{
+	struct io_xattr *ix = &req->xattr;
+	unsigned int lookup_flags = LOOKUP_FOLLOW;
+	struct path path;
+	int ret;
+
+	if (issue_flags & IO_URING_F_NONBLOCK)
+		return -EAGAIN;
+
+retry:
+	ret = do_user_path_at_empty(AT_FDCWD, ix->filename, lookup_flags, &path);
+	if (!ret) {
+		ret = __io_setxattr(req, issue_flags, &path);
+		path_put(&path);
+		if (retry_estale(ret, lookup_flags)) {
+			lookup_flags |= LOOKUP_REVAL;
+			goto retry;
+		}
+	}
+	putname(ix->filename);
+
+	__io_setxattr_finish(req, ret);
+	return 0;
+}
+
 static int io_unlinkat_prep(struct io_kiocb *req,
 			    const struct io_uring_sqe *sqe)
 {
@@ -6623,6 +6768,10 @@ static int io_req_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 		return io_linkat_prep(req, sqe);
 	case IORING_OP_GETDENTS:
 		return io_getdents_prep(req, sqe);
+	case IORING_OP_FSETXATTR:
+		return io_fsetxattr_prep(req, sqe);
+	case IORING_OP_SETXATTR:
+		return io_setxattr_prep(req, sqe);
 	}
 
 	printk_once(KERN_WARNING "io_uring: unhandled opcode %d\n",
@@ -6764,6 +6913,14 @@ static void io_clean_op(struct io_kiocb *req)
 			putname(req->hardlink.oldpath);
 			putname(req->hardlink.newpath);
 			break;
+		case IORING_OP_SETXATTR:
+			if (req->xattr.filename)
+				putname(req->xattr.filename);
+			fallthrough;
+		case IORING_OP_FSETXATTR:
+			kfree(req->xattr.ctx.kname);
+			kvfree(req->xattr.ctx.kvalue);
+			break;
 		}
 	}
 	if ((req->flags & REQ_F_POLLED) && req->apoll) {
@@ -6909,6 +7066,12 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
 	case IORING_OP_GETDENTS:
 		ret = io_getdents(req, issue_flags);
 		break;
+	case IORING_OP_FSETXATTR:
+		ret = io_fsetxattr(req, issue_flags);
+		break;
+	case IORING_OP_SETXATTR:
+		ret = io_setxattr(req, issue_flags);
+		break;
 	default:
 		ret = -EINVAL;
 		break;
@@ -11277,6 +11440,7 @@ static int __init io_uring_init(void)
 	BUILD_BUG_SQE_ELEM(42, __u16,  personality);
 	BUILD_BUG_SQE_ELEM(44, __s32,  splice_fd_in);
 	BUILD_BUG_SQE_ELEM(44, __u32,  file_index);
+	BUILD_BUG_SQE_ELEM(48, __u64,  addr3);
 
 	BUILD_BUG_ON(sizeof(struct io_uring_files_update) !=
 		     sizeof(struct io_uring_rsrc_update));
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 57dc88db5793..c62a8bec8cd4 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -45,6 +45,7 @@ struct io_uring_sqe {
 		__u32		rename_flags;
 		__u32		unlink_flags;
 		__u32		hardlink_flags;
+		__u32		xattr_flags;
 	};
 	__u64	user_data;	/* data to be passed back at completion time */
 	/* pack this to avoid bogus arm OABI complaints */
@@ -60,7 +61,8 @@ struct io_uring_sqe {
 		__s32	splice_fd_in;
 		__u32	file_index;
 	};
-	__u64	__pad2[2];
+	__u64	addr3;
+	__u64	__pad2[1];
 };
 
 enum {
@@ -144,6 +146,8 @@ enum {
 	IORING_OP_SYMLINKAT,
 	IORING_OP_LINKAT,
 	IORING_OP_GETDENTS,
+	IORING_OP_FSETXATTR,
+	IORING_OP_SETXATTR,
 
 	/* this goes last, obviously */
 	IORING_OP_LAST,
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v10 4/5] io_uring: add fsetxattr and setxattr support
  2021-12-29 20:30 ` [PATCH v10 4/5] io_uring: add fsetxattr and setxattr support Stefan Roesch
@ 2021-12-30  1:58   ` Al Viro
  2021-12-30  2:17   ` Al Viro
  1 sibling, 0 replies; 26+ messages in thread
From: Al Viro @ 2021-12-30  1:58 UTC (permalink / raw)
  To: Stefan Roesch
  Cc: io-uring, linux-fsdevel, kernel-team, torvalds, christian.brauner

On Wed, Dec 29, 2021 at 12:30:01PM -0800, Stefan Roesch wrote:

> +static int io_setxattr_prep(struct io_kiocb *req,
> +			const struct io_uring_sqe *sqe)
> +{
> +	struct io_xattr *ix = &req->xattr;
> +	const char __user *path;
> +	int ret;
> +
> +	ret = __io_setxattr_prep(req, sqe);
> +	if (ret)
> +		return ret;
> +
> +	path = u64_to_user_ptr(READ_ONCE(sqe->addr3));
> +
> +	ix->filename = getname_flags(path, LOOKUP_FOLLOW, NULL);
> +	if (IS_ERR(ix->filename)) {
> +		ret = PTR_ERR(ix->filename);
> +		ix->filename = NULL;
> +	}
> +
> +	return ret;
> +}

Same question as for getxattr side.  Why bother doing getname in prep
and open-coding the ESTALE retry loop in io_setxattr() proper?

Again, if you have hit the retry_estale() returning true, you are already
on a very slow path; trying to save on getname is completely pointless.
Moreover, had there been a situation where it might have been warranted
(and I really can't imagine one), why would that only be applicable in
io_uring case?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v10 4/5] io_uring: add fsetxattr and setxattr support
  2021-12-29 20:30 ` [PATCH v10 4/5] io_uring: add fsetxattr and setxattr support Stefan Roesch
  2021-12-30  1:58   ` Al Viro
@ 2021-12-30  2:17   ` Al Viro
  2021-12-30  2:19     ` Al Viro
                       ` (2 more replies)
  1 sibling, 3 replies; 26+ messages in thread
From: Al Viro @ 2021-12-30  2:17 UTC (permalink / raw)
  To: Stefan Roesch
  Cc: io-uring, linux-fsdevel, kernel-team, torvalds, christian.brauner

On Wed, Dec 29, 2021 at 12:30:01PM -0800, Stefan Roesch wrote:

> +static int __io_setxattr_prep(struct io_kiocb *req,
> +			const struct io_uring_sqe *sqe)
> +{
> +	struct io_xattr *ix = &req->xattr;
> +	const char __user *name;
> +	int ret;
> +
> +	if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
> +		return -EINVAL;
> +	if (unlikely(sqe->ioprio))
> +		return -EINVAL;
> +	if (unlikely(req->flags & REQ_F_FIXED_FILE))
> +		return -EBADF;
> +
> +	ix->filename = NULL;
> +	name = u64_to_user_ptr(READ_ONCE(sqe->addr));
> +	ix->ctx.value = u64_to_user_ptr(READ_ONCE(sqe->addr2));
> +	ix->ctx.kvalue = NULL;
> +	ix->ctx.size = READ_ONCE(sqe->len);
> +	ix->ctx.flags = READ_ONCE(sqe->xattr_flags);
> +
> +	ix->ctx.kname = kmalloc(sizeof(*ix->ctx.kname), GFP_KERNEL);
> +	if (!ix->ctx.kname)
> +		return -ENOMEM;
> +
> +	ret = setxattr_copy(name, &ix->ctx);
> +	if (ret) {
> +		kfree(ix->ctx.kname);
> +		return ret;
> +	}
> +
> +	req->flags |= REQ_F_NEED_CLEANUP;
> +	return 0;
> +}

OK, so you
	* allocate a buffer for xattr name
	* have setxattr_copy() copy the name in *and* memdup the contents
	* on failure, you have the buffer for xattr name freed and return
an error.  memdup'ed stuff is left for cleanup, presumably.

> +static int io_setxattr_prep(struct io_kiocb *req,
> +			const struct io_uring_sqe *sqe)
> +{
> +	struct io_xattr *ix = &req->xattr;
> +	const char __user *path;
> +	int ret;
> +
> +	ret = __io_setxattr_prep(req, sqe);
> +	if (ret)
> +		return ret;
> +
> +	path = u64_to_user_ptr(READ_ONCE(sqe->addr3));
> +
> +	ix->filename = getname_flags(path, LOOKUP_FOLLOW, NULL);
> +	if (IS_ERR(ix->filename)) {
> +		ret = PTR_ERR(ix->filename);
> +		ix->filename = NULL;
> +	}
> +
> +	return ret;
> +}

... and here you use it and bring the pathname in.  Should the latter
step fail, you restore ->filename to NULL and return an error.

Could you explain what kind of magic could allow the caller to tell
whether ix->ctx.kname needs to be freed on error?  I don't see any way
that could possibly work...

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v10 4/5] io_uring: add fsetxattr and setxattr support
  2021-12-30  2:17   ` Al Viro
@ 2021-12-30  2:19     ` Al Viro
  2021-12-30  3:04     ` Al Viro
  2021-12-30 20:18     ` Stefan Roesch
  2 siblings, 0 replies; 26+ messages in thread
From: Al Viro @ 2021-12-30  2:19 UTC (permalink / raw)
  To: Stefan Roesch
  Cc: io-uring, linux-fsdevel, kernel-team, torvalds, christian.brauner

On Thu, Dec 30, 2021 at 02:17:23AM +0000, Al Viro wrote:

> > +static int io_setxattr_prep(struct io_kiocb *req,
> > +			const struct io_uring_sqe *sqe)
> > +{
> > +	struct io_xattr *ix = &req->xattr;
> > +	const char __user *path;
> > +	int ret;
> > +
> > +	ret = __io_setxattr_prep(req, sqe);
> > +	if (ret)
> > +		return ret;
> > +
> > +	path = u64_to_user_ptr(READ_ONCE(sqe->addr3));
> > +
> > +	ix->filename = getname_flags(path, LOOKUP_FOLLOW, NULL);
> > +	if (IS_ERR(ix->filename)) {
> > +		ret = PTR_ERR(ix->filename);
> > +		ix->filename = NULL;
> > +	}
> > +
> > +	return ret;
> > +}
> 
> ... and here you use it and bring the pathname in.  Should the latter
> step fail, you restore ->filename to NULL and return an error.
> 
> Could you explain what kind of magic could allow the caller to tell
> whether ix->ctx.kname needs to be freed on error?  I don't see any way
> that could possibly work...

getxattr side has the same problem, AFAICS...

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v10 4/5] io_uring: add fsetxattr and setxattr support
  2021-12-30  2:17   ` Al Viro
  2021-12-30  2:19     ` Al Viro
@ 2021-12-30  3:04     ` Al Viro
  2021-12-30 10:12       ` Christian Brauner
  2021-12-30 20:18     ` Stefan Roesch
  2 siblings, 1 reply; 26+ messages in thread
From: Al Viro @ 2021-12-30  3:04 UTC (permalink / raw)
  To: Stefan Roesch
  Cc: io-uring, linux-fsdevel, kernel-team, torvalds, christian.brauner

On Thu, Dec 30, 2021 at 02:17:23AM +0000, Al Viro wrote:
> On Wed, Dec 29, 2021 at 12:30:01PM -0800, Stefan Roesch wrote:
> 
> > +static int __io_setxattr_prep(struct io_kiocb *req,
> > +			const struct io_uring_sqe *sqe)
> > +{
> > +	struct io_xattr *ix = &req->xattr;
> > +	const char __user *name;
> > +	int ret;
> > +
> > +	if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
> > +		return -EINVAL;
> > +	if (unlikely(sqe->ioprio))
> > +		return -EINVAL;
> > +	if (unlikely(req->flags & REQ_F_FIXED_FILE))
> > +		return -EBADF;
> > +
> > +	ix->filename = NULL;
> > +	name = u64_to_user_ptr(READ_ONCE(sqe->addr));
> > +	ix->ctx.value = u64_to_user_ptr(READ_ONCE(sqe->addr2));
> > +	ix->ctx.kvalue = NULL;
> > +	ix->ctx.size = READ_ONCE(sqe->len);
> > +	ix->ctx.flags = READ_ONCE(sqe->xattr_flags);
> > +
> > +	ix->ctx.kname = kmalloc(sizeof(*ix->ctx.kname), GFP_KERNEL);
> > +	if (!ix->ctx.kname)
> > +		return -ENOMEM;
> > +
> > +	ret = setxattr_copy(name, &ix->ctx);
> > +	if (ret) {
> > +		kfree(ix->ctx.kname);
> > +		return ret;
> > +	}
> > +
> > +	req->flags |= REQ_F_NEED_CLEANUP;
> > +	return 0;
> > +}
> 
> OK, so you
> 	* allocate a buffer for xattr name
> 	* have setxattr_copy() copy the name in *and* memdup the contents
> 	* on failure, you have the buffer for xattr name freed and return
> an error.  memdup'ed stuff is left for cleanup, presumably.
> 
> > +static int io_setxattr_prep(struct io_kiocb *req,
> > +			const struct io_uring_sqe *sqe)
> > +{
> > +	struct io_xattr *ix = &req->xattr;
> > +	const char __user *path;
> > +	int ret;
> > +
> > +	ret = __io_setxattr_prep(req, sqe);
> > +	if (ret)
> > +		return ret;
> > +
> > +	path = u64_to_user_ptr(READ_ONCE(sqe->addr3));
> > +
> > +	ix->filename = getname_flags(path, LOOKUP_FOLLOW, NULL);
> > +	if (IS_ERR(ix->filename)) {
> > +		ret = PTR_ERR(ix->filename);
> > +		ix->filename = NULL;
> > +	}
> > +
> > +	return ret;
> > +}
> 
> ... and here you use it and bring the pathname in.  Should the latter
> step fail, you restore ->filename to NULL and return an error.
> 
> Could you explain what kind of magic could allow the caller to tell
> whether ix->ctx.kname needs to be freed on error?  I don't see any way
> that could possibly work...

FWIW, your calling conventions make no sense whatsoever.  OK, you have
a helper that does copyin of the arguments.  And it needs to be shared
between the syscall path (where you put the xattr name on stack) and
io_uring one (where you allocate it dynamically).  Why not simply move
the allocation into that helper, conditional upon the passed value being
NULL?  And leave it alone on any failure paths in that helper.

Syscall user would set it pointing to local structure/string/whatnot.
No freeing is needed there in any case.

io_uring one would set it to NULL and free the value left there on
cleanup.  Again, same in all cases, error or no error.  Just make sure
you have it zeroed *before* any failure exits (including those on req->flags,
etc.)

While we are at it, syscall path needs to free the copied xattr contents
anyway.  So screw freeing anything in that helper (both allocation failures
and copyin ones); have all freeing done by caller, and make it unconditional
there.  An error is usually a slow path; an error of that sort - definitely
so.  IOW,
	1) call the helper, copying userland data into the buffers allocated
by the helper
	2) if helper hasn't returned an error, do work
	3) free whatever the helper has allocated
With (3) being unconditional.  It doesn't make any sense to have a separate
early exit, especially since with your approach you end up paying the price
on failure exits in the helper anyway.

	error = setxattr_copy(...);
	if (likely(!error))
		error = do_setxattr(...);
	kvfree(...);
	return error;

would've been better for the syscall side as well.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v10 4/5] io_uring: add fsetxattr and setxattr support
  2021-12-30  3:04     ` Al Viro
@ 2021-12-30 10:12       ` Christian Brauner
  2021-12-30 16:16         ` Al Viro
  0 siblings, 1 reply; 26+ messages in thread
From: Christian Brauner @ 2021-12-30 10:12 UTC (permalink / raw)
  To: Al Viro; +Cc: Stefan Roesch, io-uring, linux-fsdevel, kernel-team, torvalds

On Thu, Dec 30, 2021 at 03:04:34AM +0000, Al Viro wrote:
> On Thu, Dec 30, 2021 at 02:17:23AM +0000, Al Viro wrote:
> > On Wed, Dec 29, 2021 at 12:30:01PM -0800, Stefan Roesch wrote:
> > 
> > > +static int __io_setxattr_prep(struct io_kiocb *req,
> > > +			const struct io_uring_sqe *sqe)
> > > +{
> > > +	struct io_xattr *ix = &req->xattr;
> > > +	const char __user *name;
> > > +	int ret;
> > > +
> > > +	if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
> > > +		return -EINVAL;
> > > +	if (unlikely(sqe->ioprio))
> > > +		return -EINVAL;
> > > +	if (unlikely(req->flags & REQ_F_FIXED_FILE))
> > > +		return -EBADF;
> > > +
> > > +	ix->filename = NULL;
> > > +	name = u64_to_user_ptr(READ_ONCE(sqe->addr));
> > > +	ix->ctx.value = u64_to_user_ptr(READ_ONCE(sqe->addr2));
> > > +	ix->ctx.kvalue = NULL;
> > > +	ix->ctx.size = READ_ONCE(sqe->len);
> > > +	ix->ctx.flags = READ_ONCE(sqe->xattr_flags);
> > > +
> > > +	ix->ctx.kname = kmalloc(sizeof(*ix->ctx.kname), GFP_KERNEL);
> > > +	if (!ix->ctx.kname)
> > > +		return -ENOMEM;
> > > +
> > > +	ret = setxattr_copy(name, &ix->ctx);
> > > +	if (ret) {
> > > +		kfree(ix->ctx.kname);
> > > +		return ret;
> > > +	}
> > > +
> > > +	req->flags |= REQ_F_NEED_CLEANUP;
> > > +	return 0;
> > > +}
> > 
> > OK, so you
> > 	* allocate a buffer for xattr name
> > 	* have setxattr_copy() copy the name in *and* memdup the contents
> > 	* on failure, you have the buffer for xattr name freed and return
> > an error.  memdup'ed stuff is left for cleanup, presumably.
> > 
> > > +static int io_setxattr_prep(struct io_kiocb *req,
> > > +			const struct io_uring_sqe *sqe)
> > > +{
> > > +	struct io_xattr *ix = &req->xattr;
> > > +	const char __user *path;
> > > +	int ret;
> > > +
> > > +	ret = __io_setxattr_prep(req, sqe);
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	path = u64_to_user_ptr(READ_ONCE(sqe->addr3));
> > > +
> > > +	ix->filename = getname_flags(path, LOOKUP_FOLLOW, NULL);
> > > +	if (IS_ERR(ix->filename)) {
> > > +		ret = PTR_ERR(ix->filename);
> > > +		ix->filename = NULL;
> > > +	}
> > > +
> > > +	return ret;
> > > +}
> > 
> > ... and here you use it and bring the pathname in.  Should the latter
> > step fail, you restore ->filename to NULL and return an error.
> > 
> > Could you explain what kind of magic could allow the caller to tell
> > whether ix->ctx.kname needs to be freed on error?  I don't see any way
> > that could possibly work...
> 
> FWIW, your calling conventions make no sense whatsoever.  OK, you have
> a helper that does copyin of the arguments.  And it needs to be shared
> between the syscall path (where you put the xattr name on stack) and
> io_uring one (where you allocate it dynamically).  Why not simply move
> the allocation into that helper, conditional upon the passed value being
> NULL?  And leave it alone on any failure paths in that helper.

I had thought about that too when looking at Stefan's code at first but
then concluded that doesn't make sense since io_uring doesn't allocate
xattr_ctx dynamically either. It embedds it directly in io_xattrs which
itself isn't allocated dynamically either.

But I think the unconditional cleanup you proposed makes sense. If we
add a simple static inline helper to internal.h to cleanup xattr_ctx
once the caller is done we can use that in __io_setxattr() and
setxattr(): 

From 248cae031c21d3103c8ab46afd729aa46114019a Mon Sep 17 00:00:00 2001
From: Christian Brauner <[email protected]>
Date: Thu, 30 Dec 2021 11:02:57 +0100
Subject: [PATCH] UNTESTED

---
 fs/internal.h |  7 +++++++
 fs/io_uring.c | 11 +----------
 fs/xattr.c    | 10 ++++------
 3 files changed, 12 insertions(+), 16 deletions(-)

diff --git a/fs/internal.h b/fs/internal.h
index 942b2005a2be..446dca46d845 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -228,5 +228,12 @@ ssize_t do_getxattr(struct user_namespace *mnt_userns,
 		    size_t size);
 
 int setxattr_copy(const char __user *name, struct xattr_ctx *ctx);
+static inline void setxattr_finish(struct xattr_ctx *ctx)
+{
+	kfree(ctx->kname);
+	kvfree(ctx->kvalue);
+	memset(ctx, 0, sizeof(struct xattr_ctx));
+}
+
 int do_setxattr(struct user_namespace *mnt_userns, struct dentry *dentry,
 		struct xattr_ctx *ctx);
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 7204b8d593e4..2e30c7a87eb9 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -4114,7 +4114,7 @@ static int __io_setxattr(struct io_kiocb *req, unsigned int issue_flags,
 		ret = do_setxattr(mnt_user_ns(path->mnt), path->dentry, &ix->ctx);
 		mnt_drop_write(path->mnt);
 	}
-
+	setxattr_finish(&ix->ctx);
 	return ret;
 }
 
@@ -4127,12 +4127,7 @@ static int io_fsetxattr(struct io_kiocb *req, unsigned int issue_flags)
 		return -EAGAIN;
 
 	ret = __io_setxattr(req, issue_flags, &req->file->f_path);
-
 	req->flags &= ~REQ_F_NEED_CLEANUP;
-	kfree(ix->ctx.kname);
-
-	if (ix->ctx.kvalue)
-		kvfree(ix->ctx.kvalue);
 	if (ret < 0)
 		req_set_fail(req);
 
@@ -4163,10 +4158,6 @@ static int io_setxattr(struct io_kiocb *req, unsigned int issue_flags)
 	putname(ix->filename);
 
 	req->flags &= ~REQ_F_NEED_CLEANUP;
-	kfree(ix->ctx.kname);
-
-	if (ix->ctx.kvalue)
-		kvfree(ix->ctx.kvalue);
 	if (ret < 0)
 		req_set_fail(req);
 
diff --git a/fs/xattr.c b/fs/xattr.c
index 3b6d683d07b9..0f4e067816bc 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -545,6 +545,7 @@ EXPORT_SYMBOL_GPL(vfs_removexattr);
 int setxattr_copy(const char __user *name, struct xattr_ctx *ctx)
 {
 	int error;
+	struct xattr_ctx *new_ctx;
 
 	if (ctx->flags & ~(XATTR_CREATE|XATTR_REPLACE))
 		return -EINVAL;
@@ -606,12 +607,9 @@ setxattr(struct user_namespace *mnt_userns, struct dentry *d,
 	int error;
 
 	error = setxattr_copy(name, &ctx);
-	if (error)
-		return error;
-
-	error = do_setxattr(mnt_userns, d, &ctx);
-
-	kvfree(ctx.kvalue);
+	if (!error)
+		error = do_setxattr(mnt_userns, d, &ctx);
+	setxattr_finish(&ctx);
 	return error;
 }
 
-- 
2.30.2


> 
> Syscall user would set it pointing to local structure/string/whatnot.
> No freeing is needed there in any case.
> 
> io_uring one would set it to NULL and free the value left there on
> cleanup.  Again, same in all cases, error or no error.  Just make sure
> you have it zeroed *before* any failure exits (including those on req->flags,
> etc.)
> 
> While we are at it, syscall path needs to free the copied xattr contents
> anyway.  So screw freeing anything in that helper (both allocation failures
> and copyin ones); have all freeing done by caller, and make it unconditional
> there.  An error is usually a slow path; an error of that sort - definitely
> so.  IOW,
> 	1) call the helper, copying userland data into the buffers allocated
> by the helper
> 	2) if helper hasn't returned an error, do work
> 	3) free whatever the helper has allocated
> With (3) being unconditional.  It doesn't make any sense to have a separate
> early exit, especially since with your approach you end up paying the price
> on failure exits in the helper anyway.
> 
> 	error = setxattr_copy(...);
> 	if (likely(!error))
> 		error = do_setxattr(...);
> 	kvfree(...);
> 	return error;
> 
> would've been better for the syscall side as well.

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v10 4/5] io_uring: add fsetxattr and setxattr support
  2021-12-30 10:12       ` Christian Brauner
@ 2021-12-30 16:16         ` Al Viro
  2021-12-30 18:01           ` Christian Brauner
  0 siblings, 1 reply; 26+ messages in thread
From: Al Viro @ 2021-12-30 16:16 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Stefan Roesch, io-uring, linux-fsdevel, kernel-team, torvalds

On Thu, Dec 30, 2021 at 11:12:42AM +0100, Christian Brauner wrote:

> @@ -545,6 +545,7 @@ EXPORT_SYMBOL_GPL(vfs_removexattr);
>  int setxattr_copy(const char __user *name, struct xattr_ctx *ctx)
>  {
>  	int error;
> +	struct xattr_ctx *new_ctx;
>  
>  	if (ctx->flags & ~(XATTR_CREATE|XATTR_REPLACE))
>  		return -EINVAL;
> @@ -606,12 +607,9 @@ setxattr(struct user_namespace *mnt_userns, struct dentry *d,
>  	int error;
>  
>  	error = setxattr_copy(name, &ctx);
> -	if (error)
> -		return error;
> -
> -	error = do_setxattr(mnt_userns, d, &ctx);
> -
> -	kvfree(ctx.kvalue);
> +	if (!error)
> +		error = do_setxattr(mnt_userns, d, &ctx);
> +	setxattr_finish(&ctx);
>  	return error;
>  }

Huh?  Have you lost a chunk or two in there?  The only modification of
setxattr_copy() in your delta is the introduction of an unused local
variable.  Confused...

What I had in mind is something like this:

// same for getxattr and setxattr
static int xattr_name_from_user(const char __user *name, struct xattr_ctx *ctx)
{
	int copied;

	if (!ctx->xattr_name) {
		ctx->xattr_name = kmalloc(XATTR_NAME_MAX + 1, GFP_KERNEL);
		if (!ctx->xattr_name)
			return -ENOMEM;
	}

	copied = strncpy_from_user(ctx->xattr_name, name, XATTR_NAME_MAX + 1);
 	if (copied < 0)
 		return copied;	// copyin failure; almost always -EFAULT
	if (copied == 0 || copied == XATTR_NAME_MAX + 1)
		return  -ERANGE;
	return 0;
}

// freeing is up to the caller, whether we succeed or not
int setxattr_copy(const char __user *name, struct xattr_ctx *ctx)
{
 	int error;

	if (ctx->flags & ~(XATTR_CREATE|XATTR_REPLACE))
 		return -EINVAL;

	error = xattr_name_from_user(name, ctx);
 	if (error)
 		return error;

	if (ctx->size) {
		void *p;

		if (ctx->size > XATTR_SIZE_MAX)
 			return -E2BIG;

		p = vmemdup_user(ctx->value, ctx->size);
		if (IS_ERR(p))
			return PTR_ERR(p);
		ctx->kvalue = p;
 	}
	return 0;
}

with syscall side concluded with freeing ->kvalue (unconditionally), while
io_uring one - ->kvalue and ->xattr_name (also unconditionally).  And to
hell with struct xattr_name - a string is a string.

However, what I really want to see is the answer to my question re control
flow and the place where we do copy the arguments from userland.  Including
the pathname.

*IF* there's a subtle reason that has to be done from prep phase (and there
might very well be - figuring out the control flow in io_uring is bloody
painful), I would really like to see it spelled out, along with the explanation
of the reasons why statx() doesn't need anything of that sort.

If there's no such reasons, I would bloody well leave marshalling to the
payload, allowing to share a lot more with the syscall path.  In that
case xattr_ctx only needs to carry the userland pointers/size/flags.
And all that "do we allocate the kernel copy of the name dynamically,
or does it live on stack" simply goes away.

Details, please.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v10 4/5] io_uring: add fsetxattr and setxattr support
  2021-12-30 16:16         ` Al Viro
@ 2021-12-30 18:01           ` Christian Brauner
  2021-12-30 19:09             ` Jens Axboe
  0 siblings, 1 reply; 26+ messages in thread
From: Christian Brauner @ 2021-12-30 18:01 UTC (permalink / raw)
  To: Al Viro; +Cc: Stefan Roesch, io-uring, linux-fsdevel, kernel-team, torvalds

On Thu, Dec 30, 2021 at 04:16:34PM +0000, Al Viro wrote:
> On Thu, Dec 30, 2021 at 11:12:42AM +0100, Christian Brauner wrote:
> 
> > @@ -545,6 +545,7 @@ EXPORT_SYMBOL_GPL(vfs_removexattr);
> >  int setxattr_copy(const char __user *name, struct xattr_ctx *ctx)
> >  {
> >  	int error;
> > +	struct xattr_ctx *new_ctx;
> >  
> >  	if (ctx->flags & ~(XATTR_CREATE|XATTR_REPLACE))
> >  		return -EINVAL;
> > @@ -606,12 +607,9 @@ setxattr(struct user_namespace *mnt_userns, struct dentry *d,
> >  	int error;
> >  
> >  	error = setxattr_copy(name, &ctx);
> > -	if (error)
> > -		return error;
> > -
> > -	error = do_setxattr(mnt_userns, d, &ctx);
> > -
> > -	kvfree(ctx.kvalue);
> > +	if (!error)
> > +		error = do_setxattr(mnt_userns, d, &ctx);
> > +	setxattr_finish(&ctx);
> >  	return error;
> >  }
> 
> Huh?  Have you lost a chunk or two in there?  The only modification of
> setxattr_copy() in your delta is the introduction of an unused local
> variable.  Confused...
> 
> What I had in mind is something like this:
> 
> // same for getxattr and setxattr
> static int xattr_name_from_user(const char __user *name, struct xattr_ctx *ctx)
> {
> 	int copied;
> 
> 	if (!ctx->xattr_name) {
> 		ctx->xattr_name = kmalloc(XATTR_NAME_MAX + 1, GFP_KERNEL);
> 		if (!ctx->xattr_name)
> 			return -ENOMEM;
> 	}
> 
> 	copied = strncpy_from_user(ctx->xattr_name, name, XATTR_NAME_MAX + 1);
>  	if (copied < 0)
>  		return copied;	// copyin failure; almost always -EFAULT
> 	if (copied == 0 || copied == XATTR_NAME_MAX + 1)
> 		return  -ERANGE;
> 	return 0;
> }
> 
> // freeing is up to the caller, whether we succeed or not
> int setxattr_copy(const char __user *name, struct xattr_ctx *ctx)
> {
>  	int error;
> 
> 	if (ctx->flags & ~(XATTR_CREATE|XATTR_REPLACE))
>  		return -EINVAL;
> 
> 	error = xattr_name_from_user(name, ctx);
>  	if (error)
>  		return error;
> 
> 	if (ctx->size) {
> 		void *p;
> 
> 		if (ctx->size > XATTR_SIZE_MAX)
>  			return -E2BIG;
> 
> 		p = vmemdup_user(ctx->value, ctx->size);
> 		if (IS_ERR(p))
> 			return PTR_ERR(p);
> 		ctx->kvalue = p;
>  	}
> 	return 0;
> }
> 
> with syscall side concluded with freeing ->kvalue (unconditionally), while
> io_uring one - ->kvalue and ->xattr_name (also unconditionally).  And to
> hell with struct xattr_name - a string is a string.

Uhm, it wasn't obvious at all that you were just talking about
attr_ctx->kname. At least to me. I thought you were saying to allocate
struct xattr_ctx dynamically for io_uring and have it static for the
syscall path. Anyway, that change seems sensible to me. I don't care
much if there's a separate struct xattr_name or not.

> 
> However, what I really want to see is the answer to my question re control
> flow and the place where we do copy the arguments from userland.  Including
> the pathname.
> 
> *IF* there's a subtle reason that has to be done from prep phase (and there
> might very well be - figuring out the control flow in io_uring is bloody
> painful), I would really like to see it spelled out, along with the explanation
> of the reasons why statx() doesn't need anything of that sort.
> 
> If there's no such reasons, I would bloody well leave marshalling to the

That's really something the io_uring folks should explain to us. I can't
help much there either apart from what I can gather from looking through
the io_req_prep() switch.

Where it's clear that nearly all syscall-operations immediately do a
getname() and/or copy their arguments in the *_prep() phase as, not in
the actual "do-the-work" phase. For example, io_epoll_ctl_prep() which
copies struct epoll_event via copy_from_user(). It doesn't do it in
io_epoll_ctl(). So as such io_statx_prep() is the outlier...

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v10 4/5] io_uring: add fsetxattr and setxattr support
  2021-12-30 18:01           ` Christian Brauner
@ 2021-12-30 19:09             ` Jens Axboe
  2021-12-30 22:24               ` Al Viro
  0 siblings, 1 reply; 26+ messages in thread
From: Jens Axboe @ 2021-12-30 19:09 UTC (permalink / raw)
  To: Christian Brauner, Al Viro
  Cc: Stefan Roesch, io-uring, linux-fsdevel, kernel-team, torvalds

On 12/30/21 10:01 AM, Christian Brauner wrote:
>> However, what I really want to see is the answer to my question re control
>> flow and the place where we do copy the arguments from userland.  Including
>> the pathname.
>>
>> *IF* there's a subtle reason that has to be done from prep phase (and there
>> might very well be - figuring out the control flow in io_uring is bloody
>> painful), I would really like to see it spelled out, along with the explanation
>> of the reasons why statx() doesn't need anything of that sort.
>>
>> If there's no such reasons, I would bloody well leave marshalling to the
> 
> That's really something the io_uring folks should explain to us. I can't
> help much there either apart from what I can gather from looking through
> the io_req_prep() switch.
> 
> Where it's clear that nearly all syscall-operations immediately do a
> getname() and/or copy their arguments in the *_prep() phase as, not in
> the actual "do-the-work" phase. For example, io_epoll_ctl_prep() which
> copies struct epoll_event via copy_from_user(). It doesn't do it in
> io_epoll_ctl(). So as such io_statx_prep() is the outlier...

For each command, there are two steps:

- The prep of it, this happens inline from the system call where the
  request, or requests, are submitted. The prep phase should ensure that
  argument structs are stable. Hence a caller can prep a request and
  have memory on stack, as long as it submits before it becomes invalid.
  An example of that are iovecs for readv/writev. The caller does not
  need to have them stable for the duration of the request, just across
  submit. That's the io_${cmd}_prep() helpers.

- The execution of it. May be separate from prep and from an async
  worker. Where the lower layers don't support a nonblocking attempt,
  they are always done async. The statx stuff is an example of that.

Hence prep needs to copy from userland on the prep side always for the
statx family, as execution will happen out-of-line from the submission.

Does that explain it?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v10 4/5] io_uring: add fsetxattr and setxattr support
  2021-12-30 19:09             ` Jens Axboe
@ 2021-12-30 22:24               ` Al Viro
  2021-12-30 22:46                 ` Jens Axboe
  0 siblings, 1 reply; 26+ messages in thread
From: Al Viro @ 2021-12-30 22:24 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christian Brauner, Stefan Roesch, io-uring, linux-fsdevel,
	kernel-team, torvalds

On Thu, Dec 30, 2021 at 11:09:12AM -0800, Jens Axboe wrote:

> For each command, there are two steps:
> 
> - The prep of it, this happens inline from the system call where the
>   request, or requests, are submitted. The prep phase should ensure that
>   argument structs are stable. Hence a caller can prep a request and
>   have memory on stack, as long as it submits before it becomes invalid.
>   An example of that are iovecs for readv/writev. The caller does not
>   need to have them stable for the duration of the request, just across
>   submit. That's the io_${cmd}_prep() helpers.
> 
> - The execution of it. May be separate from prep and from an async
>   worker. Where the lower layers don't support a nonblocking attempt,
>   they are always done async. The statx stuff is an example of that.
> 
> Hence prep needs to copy from userland on the prep side always for the
> statx family, as execution will happen out-of-line from the submission.
> 
> Does that explain it?

The actual call chain leading to filename_lookup() is, AFAICS, this:
	io_statx()
		do_statx()
			vfs_statx()
				user_path_at()
					user_path_at_empty()
						filename_lookup()

If you are providing such warranties for the contents of pathname
arguments, you have a bug in statx in the mainline.  If you are not,
there's no point in doing getname() in getxattr prep.

Which one it is?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v10 4/5] io_uring: add fsetxattr and setxattr support
  2021-12-30 22:24               ` Al Viro
@ 2021-12-30 22:46                 ` Jens Axboe
  2021-12-30 23:02                   ` Al Viro
  0 siblings, 1 reply; 26+ messages in thread
From: Jens Axboe @ 2021-12-30 22:46 UTC (permalink / raw)
  To: Al Viro
  Cc: Christian Brauner, Stefan Roesch, io-uring, linux-fsdevel,
	kernel-team, torvalds

On 12/30/21 2:24 PM, Al Viro wrote:
> On Thu, Dec 30, 2021 at 11:09:12AM -0800, Jens Axboe wrote:
> 
>> For each command, there are two steps:
>>
>> - The prep of it, this happens inline from the system call where the
>>   request, or requests, are submitted. The prep phase should ensure that
>>   argument structs are stable. Hence a caller can prep a request and
>>   have memory on stack, as long as it submits before it becomes invalid.
>>   An example of that are iovecs for readv/writev. The caller does not
>>   need to have them stable for the duration of the request, just across
>>   submit. That's the io_${cmd}_prep() helpers.
>>
>> - The execution of it. May be separate from prep and from an async
>>   worker. Where the lower layers don't support a nonblocking attempt,
>>   they are always done async. The statx stuff is an example of that.
>>
>> Hence prep needs to copy from userland on the prep side always for the
>> statx family, as execution will happen out-of-line from the submission.
>>
>> Does that explain it?
> 
> The actual call chain leading to filename_lookup() is, AFAICS, this:
> 	io_statx()
> 		do_statx()
> 			vfs_statx()
> 				user_path_at()
> 					user_path_at_empty()
> 						filename_lookup()
> 
> If you are providing such warranties for the contents of pathname
> arguments, you have a bug in statx in the mainline.  If you are not,
> there's no point in doing getname() in getxattr prep.

Not for the filename lookup, as I said it's for data passed in. There
are no guarantees on filename lookup, that happens when it gets
executed. See mentioned example on iovec and readv/writev.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v10 4/5] io_uring: add fsetxattr and setxattr support
  2021-12-30 22:46                 ` Jens Axboe
@ 2021-12-30 23:02                   ` Al Viro
  0 siblings, 0 replies; 26+ messages in thread
From: Al Viro @ 2021-12-30 23:02 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christian Brauner, Stefan Roesch, io-uring, linux-fsdevel,
	kernel-team, torvalds

On Thu, Dec 30, 2021 at 02:46:49PM -0800, Jens Axboe wrote:
> On 12/30/21 2:24 PM, Al Viro wrote:
> > On Thu, Dec 30, 2021 at 11:09:12AM -0800, Jens Axboe wrote:
> > 
> >> For each command, there are two steps:
> >>
> >> - The prep of it, this happens inline from the system call where the
> >>   request, or requests, are submitted. The prep phase should ensure that
> >>   argument structs are stable. Hence a caller can prep a request and
> >>   have memory on stack, as long as it submits before it becomes invalid.
> >>   An example of that are iovecs for readv/writev. The caller does not
> >>   need to have them stable for the duration of the request, just across
> >>   submit. That's the io_${cmd}_prep() helpers.
> >>
> >> - The execution of it. May be separate from prep and from an async
> >>   worker. Where the lower layers don't support a nonblocking attempt,
> >>   they are always done async. The statx stuff is an example of that.
> >>
> >> Hence prep needs to copy from userland on the prep side always for the
> >> statx family, as execution will happen out-of-line from the submission.
> >>
> >> Does that explain it?
> > 
> > The actual call chain leading to filename_lookup() is, AFAICS, this:
> > 	io_statx()
> > 		do_statx()
> > 			vfs_statx()
> > 				user_path_at()
> > 					user_path_at_empty()
> > 						filename_lookup()
> > 
> > If you are providing such warranties for the contents of pathname
> > arguments, you have a bug in statx in the mainline.  If you are not,
> > there's no point in doing getname() in getxattr prep.
> 
> Not for the filename lookup, as I said it's for data passed in. There
> are no guarantees on filename lookup, that happens when it gets
> executed. See mentioned example on iovec and readv/writev.

s/filename_lookup/getname_flags/, sorry.

Again, statx support does both the copyin and pathname resolution *after*
prep, from io_statx().  They are not separated - io_statx() pass the userland
pointer to user_path_at_empty(), which does all the work.  So if a pathname
you'd passed had been in a local array and you return right after submitting
a request, you will end up with io_statx() fetching random garbage.

This patchset is different - for getxattr you have getname done in prep,
with resulting struct filename kept around until the actual work is to
be done.  That's precisely the reason why the first patch in the series
introduces a user_path_at_empty() variant that takes a struct filename,
with the pathname contents already copied in.

IOW, why is user_path_at_empty() good for statx, but not for getxattr?
What's the difference?

Do you treat the pathname contents (string in userland memory, that is)
same way your writev support treats iovec array (caller may discard it
as soon as syscall returns) or the same way it treats the actual data
to be written (caller is responsible for keeping it around until the
operation reports completion)?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v10 4/5] io_uring: add fsetxattr and setxattr support
  2021-12-30  2:17   ` Al Viro
  2021-12-30  2:19     ` Al Viro
  2021-12-30  3:04     ` Al Viro
@ 2021-12-30 20:18     ` Stefan Roesch
  2 siblings, 0 replies; 26+ messages in thread
From: Stefan Roesch @ 2021-12-30 20:18 UTC (permalink / raw)
  To: Al Viro; +Cc: io-uring, linux-fsdevel, kernel-team, torvalds, christian.brauner



On 12/29/21 6:17 PM, Al Viro wrote:
> On Wed, Dec 29, 2021 at 12:30:01PM -0800, Stefan Roesch wrote:
> 
>> +static int __io_setxattr_prep(struct io_kiocb *req,
>> +			const struct io_uring_sqe *sqe)
>> +{
>> +	struct io_xattr *ix = &req->xattr;
>> +	const char __user *name;
>> +	int ret;
>> +
>> +	if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
>> +		return -EINVAL;
>> +	if (unlikely(sqe->ioprio))
>> +		return -EINVAL;
>> +	if (unlikely(req->flags & REQ_F_FIXED_FILE))
>> +		return -EBADF;
>> +
>> +	ix->filename = NULL;
>> +	name = u64_to_user_ptr(READ_ONCE(sqe->addr));
>> +	ix->ctx.value = u64_to_user_ptr(READ_ONCE(sqe->addr2));
>> +	ix->ctx.kvalue = NULL;
>> +	ix->ctx.size = READ_ONCE(sqe->len);
>> +	ix->ctx.flags = READ_ONCE(sqe->xattr_flags);
>> +
>> +	ix->ctx.kname = kmalloc(sizeof(*ix->ctx.kname), GFP_KERNEL);
>> +	if (!ix->ctx.kname)
>> +		return -ENOMEM;
>> +
>> +	ret = setxattr_copy(name, &ix->ctx);
>> +	if (ret) {
>> +		kfree(ix->ctx.kname);
>> +		return ret;
>> +	}
>> +
>> +	req->flags |= REQ_F_NEED_CLEANUP;
>> +	return 0;
>> +}
> 
> OK, so you
> 	* allocate a buffer for xattr name
> 	* have setxattr_copy() copy the name in *and* memdup the contents
> 	* on failure, you have the buffer for xattr name freed and return
> an error.  memdup'ed stuff is left for cleanup, presumably.
> 
>> +static int io_setxattr_prep(struct io_kiocb *req,
>> +			const struct io_uring_sqe *sqe)
>> +{
>> +	struct io_xattr *ix = &req->xattr;
>> +	const char __user *path;
>> +	int ret;
>> +
>> +	ret = __io_setxattr_prep(req, sqe);
>> +	if (ret)
>> +		return ret;
>> +
>> +	path = u64_to_user_ptr(READ_ONCE(sqe->addr3));
>> +
>> +	ix->filename = getname_flags(path, LOOKUP_FOLLOW, NULL);
>> +	if (IS_ERR(ix->filename)) {
>> +		ret = PTR_ERR(ix->filename);
>> +		ix->filename = NULL;
>> +	}
>> +
>> +	return ret;
>> +}
> 
> ... and here you use it and bring the pathname in.  Should the latter
> step fail, you restore ->filename to NULL and return an error.
> 
> Could you explain what kind of magic could allow the caller to tell
> whether ix->ctx.kname needs to be freed on error?  I don't see any way
> that could possibly work...

At the end of the function __io_setxattr_prep() we set the flag REQ_F_NEED_CLEANUP.

If the processing fails for some reason, the cleanup code in io_clean_op() gets called
and the data structures get de-allocated.

In case the request is processed successfully, the memory gets de-allocated in io_setxattr()
and io_fsetxattr() with the helper function __io_setxattr_finish(). The helper function clears
the flag REQ_F_NEED_CLEANUP, so clean up is not necessary.

This is the general pattern of cleanup in io-uring.

I can certainly add a cleanup function, that is called in all 3 cases:
- io_setxattr,
- io_fsetxattr
- io_clean_op








^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v10 5/5] io_uring: add fgetxattr and getxattr support
  2021-12-29 20:29 [PATCH v10 0/5] io_uring: add xattr support Stefan Roesch
                   ` (3 preceding siblings ...)
  2021-12-29 20:30 ` [PATCH v10 4/5] io_uring: add fsetxattr and setxattr support Stefan Roesch
@ 2021-12-29 20:30 ` Stefan Roesch
  2021-12-30  1:41   ` Al Viro
  4 siblings, 1 reply; 26+ messages in thread
From: Stefan Roesch @ 2021-12-29 20:30 UTC (permalink / raw)
  To: io-uring, linux-fsdevel, kernel-team; +Cc: torvalds, christian.brauner, shr

This adds support to io_uring for the fgetxattr and getxattr API.

Signed-off-by: Stefan Roesch <[email protected]>
---
 fs/io_uring.c                 | 152 ++++++++++++++++++++++++++++++++++
 include/uapi/linux/io_uring.h |   2 +
 2 files changed, 154 insertions(+)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index f1b325dd81d5..4f2ffe43cb1d 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1130,6 +1130,10 @@ static const struct io_op_def io_op_defs[] = {
 		.needs_file = 1
 	},
 	[IORING_OP_SETXATTR] = {},
+	[IORING_OP_FGETXATTR] = {
+		.needs_file = 1
+	},
+	[IORING_OP_GETXATTR] = {},
 };
 
 /* requests with any of those set should undergo io_disarm_next() */
@@ -3899,6 +3903,137 @@ static int io_renameat(struct io_kiocb *req, unsigned int issue_flags)
 	return 0;
 }
 
+static int __io_getxattr_prep(struct io_kiocb *req,
+			      const struct io_uring_sqe *sqe)
+{
+	struct io_xattr *ix = &req->xattr;
+	const char __user *name;
+	int ret;
+
+	if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
+		return -EINVAL;
+	if (unlikely(sqe->ioprio))
+		return -EINVAL;
+	if (unlikely(req->flags & REQ_F_FIXED_FILE))
+		return -EBADF;
+
+	ix->filename = NULL;
+	ix->ctx.kvalue = NULL;
+	name = u64_to_user_ptr(READ_ONCE(sqe->addr));
+	ix->ctx.value = u64_to_user_ptr(READ_ONCE(sqe->addr2));
+	ix->ctx.size = READ_ONCE(sqe->len);
+	ix->ctx.flags = READ_ONCE(sqe->xattr_flags);
+
+	if (ix->ctx.flags)
+		return -EINVAL;
+
+	ix->ctx.kname = kmalloc(sizeof(*ix->ctx.kname), GFP_KERNEL);
+	if (!ix->ctx.kname)
+		return -ENOMEM;
+
+	ret = strncpy_from_user(ix->ctx.kname->name, name,
+				sizeof(ix->ctx.kname->name));
+	if (!ret || ret == sizeof(ix->ctx.kname->name))
+		ret = -ERANGE;
+	if (ret < 0) {
+		kfree(ix->ctx.kname);
+		return ret;
+	}
+
+	req->flags |= REQ_F_NEED_CLEANUP;
+	return 0;
+}
+
+static int io_fgetxattr_prep(struct io_kiocb *req,
+			     const struct io_uring_sqe *sqe)
+{
+	return __io_getxattr_prep(req, sqe);
+}
+
+static int io_getxattr_prep(struct io_kiocb *req,
+			    const struct io_uring_sqe *sqe)
+{
+	struct io_xattr *ix = &req->xattr;
+	const char __user *path;
+	int ret;
+
+	ret = __io_getxattr_prep(req, sqe);
+	if (ret)
+		return ret;
+
+	path = u64_to_user_ptr(READ_ONCE(sqe->addr3));
+
+	ix->filename = getname_flags(path, LOOKUP_FOLLOW, NULL);
+	if (IS_ERR(ix->filename)) {
+		ret = PTR_ERR(ix->filename);
+		ix->filename = NULL;
+	}
+
+	return ret;
+}
+
+static void __io_getxattr_finish(struct io_kiocb *req, int ret)
+{
+	struct xattr_ctx *ctx = &req->xattr.ctx;
+
+	req->flags &= ~REQ_F_NEED_CLEANUP;
+
+	kfree(ctx->kname);
+	if (ret < 0)
+		req_set_fail(req);
+
+	io_req_complete(req, ret);
+}
+
+static int io_fgetxattr(struct io_kiocb *req, unsigned int issue_flags)
+{
+	struct io_xattr *ix = &req->xattr;
+	int ret;
+
+	if (issue_flags & IO_URING_F_NONBLOCK)
+		return -EAGAIN;
+
+	ret = do_getxattr(mnt_user_ns(req->file->f_path.mnt),
+			req->file->f_path.dentry,
+			ix->ctx.kname->name,
+			(void __user *)ix->ctx.value,
+			ix->ctx.size);
+
+	__io_getxattr_finish(req, ret);
+	return 0;
+}
+
+static int io_getxattr(struct io_kiocb *req, unsigned int issue_flags)
+{
+	struct io_xattr *ix = &req->xattr;
+	unsigned int lookup_flags = LOOKUP_FOLLOW;
+	struct path path;
+	int ret;
+
+	if (issue_flags & IO_URING_F_NONBLOCK)
+		return -EAGAIN;
+
+retry:
+	ret = do_user_path_at_empty(AT_FDCWD, ix->filename, lookup_flags, &path);
+	if (!ret) {
+		ret = do_getxattr(mnt_user_ns(path.mnt),
+				path.dentry,
+				ix->ctx.kname->name,
+				(void __user *)ix->ctx.value,
+				ix->ctx.size);
+
+		path_put(&path);
+		if (retry_estale(ret, lookup_flags)) {
+			lookup_flags |= LOOKUP_REVAL;
+			goto retry;
+		}
+	}
+	putname(ix->filename);
+
+	__io_getxattr_finish(req, ret);
+	return 0;
+}
+
 static int __io_setxattr_prep(struct io_kiocb *req,
 			const struct io_uring_sqe *sqe)
 {
@@ -6772,6 +6907,10 @@ static int io_req_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 		return io_fsetxattr_prep(req, sqe);
 	case IORING_OP_SETXATTR:
 		return io_setxattr_prep(req, sqe);
+	case IORING_OP_FGETXATTR:
+		return io_fgetxattr_prep(req, sqe);
+	case IORING_OP_GETXATTR:
+		return io_getxattr_prep(req, sqe);
 	}
 
 	printk_once(KERN_WARNING "io_uring: unhandled opcode %d\n",
@@ -6921,6 +7060,13 @@ static void io_clean_op(struct io_kiocb *req)
 			kfree(req->xattr.ctx.kname);
 			kvfree(req->xattr.ctx.kvalue);
 			break;
+		case IORING_OP_GETXATTR:
+			if (req->xattr.filename)
+				putname(req->xattr.filename);
+			fallthrough;
+		case IORING_OP_FGETXATTR:
+			kfree(req->xattr.ctx.kname);
+			break;
 		}
 	}
 	if ((req->flags & REQ_F_POLLED) && req->apoll) {
@@ -7072,6 +7218,12 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
 	case IORING_OP_SETXATTR:
 		ret = io_setxattr(req, issue_flags);
 		break;
+	case IORING_OP_FGETXATTR:
+		ret = io_fgetxattr(req, issue_flags);
+		break;
+	case IORING_OP_GETXATTR:
+		ret = io_getxattr(req, issue_flags);
+		break;
 	default:
 		ret = -EINVAL;
 		break;
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index c62a8bec8cd4..efc7ac9b3a6b 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -148,6 +148,8 @@ enum {
 	IORING_OP_GETDENTS,
 	IORING_OP_FSETXATTR,
 	IORING_OP_SETXATTR,
+	IORING_OP_FGETXATTR,
+	IORING_OP_GETXATTR,
 
 	/* this goes last, obviously */
 	IORING_OP_LAST,
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v10 5/5] io_uring: add fgetxattr and getxattr support
  2021-12-29 20:30 ` [PATCH v10 5/5] io_uring: add fgetxattr and getxattr support Stefan Roesch
@ 2021-12-30  1:41   ` Al Viro
  2021-12-30  1:46     ` Al Viro
  2021-12-30 20:01     ` Stefan Roesch
  0 siblings, 2 replies; 26+ messages in thread
From: Al Viro @ 2021-12-30  1:41 UTC (permalink / raw)
  To: Stefan Roesch
  Cc: io-uring, linux-fsdevel, kernel-team, torvalds, christian.brauner

On Wed, Dec 29, 2021 at 12:30:02PM -0800, Stefan Roesch wrote:

> +static int io_getxattr(struct io_kiocb *req, unsigned int issue_flags)
> +{
> +	struct io_xattr *ix = &req->xattr;
> +	unsigned int lookup_flags = LOOKUP_FOLLOW;
> +	struct path path;
> +	int ret;
> +
> +	if (issue_flags & IO_URING_F_NONBLOCK)
> +		return -EAGAIN;
> +
> +retry:
> +	ret = do_user_path_at_empty(AT_FDCWD, ix->filename, lookup_flags, &path);
> +	if (!ret) {
> +		ret = do_getxattr(mnt_user_ns(path.mnt),
> +				path.dentry,
> +				ix->ctx.kname->name,
> +				(void __user *)ix->ctx.value,
> +				ix->ctx.size);
> +
> +		path_put(&path);
> +		if (retry_estale(ret, lookup_flags)) {
> +			lookup_flags |= LOOKUP_REVAL;
> +			goto retry;
> +		}
> +	}
> +	putname(ix->filename);
> +
> +	__io_getxattr_finish(req, ret);
> +	return 0;
> +}

Looking at that one...  Is there any reason to have that loop (from retry: to
putname() call) outside of fs/xattr.c?  Come to think of that, why bother
polluting your struct io_xattr with ->filename?

Note, BTW, that we already have this:
static ssize_t path_getxattr(const char __user *pathname,
                             const char __user *name, void __user *value,
			     size_t size, unsigned int lookup_flags)
{
	struct path path;
	ssize_t error;
retry:
	error = user_path_at(AT_FDCWD, pathname, lookup_flags, &path);
	if (error)
		return error;
	error = getxattr(mnt_user_ns(path.mnt), path.dentry, name, value, size);
	path_put(&path);
	if (retry_estale(error, lookup_flags)) {
		lookup_flags |= LOOKUP_REVAL;
		goto retry;
	}
	return error;
}
in there.  The only potential benefit here would be to avoid repeated getname
in case of having hit -ESTALE and going to repeat the entire fucking pathwalk
with maximal paranoia, asking the server(s) involved to revalidate on every
step, etc.

If we end up going there, who the hell *cares* about the costs of less than
a page worth of copy_from_user()?  We are already on a very slow path as it
is, so what's the point?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v10 5/5] io_uring: add fgetxattr and getxattr support
  2021-12-30  1:41   ` Al Viro
@ 2021-12-30  1:46     ` Al Viro
  2021-12-30 20:01     ` Stefan Roesch
  1 sibling, 0 replies; 26+ messages in thread
From: Al Viro @ 2021-12-30  1:46 UTC (permalink / raw)
  To: Stefan Roesch
  Cc: io-uring, linux-fsdevel, kernel-team, torvalds, christian.brauner

On Thu, Dec 30, 2021 at 01:41:35AM +0000, Al Viro wrote:
> On Wed, Dec 29, 2021 at 12:30:02PM -0800, Stefan Roesch wrote:
> 
> > +static int io_getxattr(struct io_kiocb *req, unsigned int issue_flags)
> > +{
> > +	struct io_xattr *ix = &req->xattr;
> > +	unsigned int lookup_flags = LOOKUP_FOLLOW;
> > +	struct path path;
> > +	int ret;
> > +
> > +	if (issue_flags & IO_URING_F_NONBLOCK)
> > +		return -EAGAIN;
> > +
> > +retry:
> > +	ret = do_user_path_at_empty(AT_FDCWD, ix->filename, lookup_flags, &path);
> > +	if (!ret) {
> > +		ret = do_getxattr(mnt_user_ns(path.mnt),
> > +				path.dentry,
> > +				ix->ctx.kname->name,
> > +				(void __user *)ix->ctx.value,
> > +				ix->ctx.size);
> > +
> > +		path_put(&path);
> > +		if (retry_estale(ret, lookup_flags)) {
> > +			lookup_flags |= LOOKUP_REVAL;
> > +			goto retry;
> > +		}
> > +	}
> > +	putname(ix->filename);
> > +
> > +	__io_getxattr_finish(req, ret);
> > +	return 0;
> > +}
> 
> Looking at that one...  Is there any reason to have that loop (from retry: to
> putname() call) outside of fs/xattr.c?  Come to think of that, why bother
> polluting your struct io_xattr with ->filename?
> 
> Note, BTW, that we already have this:
> static ssize_t path_getxattr(const char __user *pathname,
>                              const char __user *name, void __user *value,
> 			     size_t size, unsigned int lookup_flags)
> {
> 	struct path path;
> 	ssize_t error;
> retry:
> 	error = user_path_at(AT_FDCWD, pathname, lookup_flags, &path);
> 	if (error)
> 		return error;
> 	error = getxattr(mnt_user_ns(path.mnt), path.dentry, name, value, size);
> 	path_put(&path);
> 	if (retry_estale(error, lookup_flags)) {
> 		lookup_flags |= LOOKUP_REVAL;
> 		goto retry;
> 	}
> 	return error;
> }
> in there.  The only potential benefit here would be to avoid repeated getname
> in case of having hit -ESTALE and going to repeat the entire fucking pathwalk
> with maximal paranoia, asking the server(s) involved to revalidate on every
> step, etc.
> 
> If we end up going there, who the hell *cares* about the costs of less than
> a page worth of copy_from_user()?  We are already on a very slow path as it
> is, so what's the point?

BTW, if the answer is along the lines of "we want to copy the name in prep
phase fo $REASONS", I would like to hear what it is that makes getxattr()
different from statx() in that respect.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v10 5/5] io_uring: add fgetxattr and getxattr support
  2021-12-30  1:41   ` Al Viro
  2021-12-30  1:46     ` Al Viro
@ 2021-12-30 20:01     ` Stefan Roesch
  1 sibling, 0 replies; 26+ messages in thread
From: Stefan Roesch @ 2021-12-30 20:01 UTC (permalink / raw)
  To: Al Viro; +Cc: io-uring, linux-fsdevel, kernel-team, torvalds, christian.brauner



On 12/29/21 5:41 PM, Al Viro wrote:
> On Wed, Dec 29, 2021 at 12:30:02PM -0800, Stefan Roesch wrote:
> 
>> +static int io_getxattr(struct io_kiocb *req, unsigned int issue_flags)
>> +{
>> +	struct io_xattr *ix = &req->xattr;
>> +	unsigned int lookup_flags = LOOKUP_FOLLOW;
>> +	struct path path;
>> +	int ret;
>> +
>> +	if (issue_flags & IO_URING_F_NONBLOCK)
>> +		return -EAGAIN;
>> +
>> +retry:
>> +	ret = do_user_path_at_empty(AT_FDCWD, ix->filename, lookup_flags, &path);
>> +	if (!ret) {
>> +		ret = do_getxattr(mnt_user_ns(path.mnt),
>> +				path.dentry,
>> +				ix->ctx.kname->name,
>> +				(void __user *)ix->ctx.value,
>> +				ix->ctx.size);
>> +
>> +		path_put(&path);
>> +		if (retry_estale(ret, lookup_flags)) {
>> +			lookup_flags |= LOOKUP_REVAL;
>> +			goto retry;
>> +		}
>> +	}
>> +	putname(ix->filename);
>> +
>> +	__io_getxattr_finish(req, ret);
>> +	return 0;
>> +}
> 
> Looking at that one...  Is there any reason to have that loop (from retry: to
> putname() call) outside of fs/xattr.c?  Come to think of that, why bother
> polluting your struct io_xattr with ->filename?
> 
> Note, BTW, that we already have this:
> static ssize_t path_getxattr(const char __user *pathname,
>                              const char __user *name, void __user *value,
> 			     size_t size, unsigned int lookup_flags)
> {
> 	struct path path;
> 	ssize_t error;
> retry:
> 	error = user_path_at(AT_FDCWD, pathname, lookup_flags, &path);
> 	if (error)
> 		return error;
> 	error = getxattr(mnt_user_ns(path.mnt), path.dentry, name, value, size);
> 	path_put(&path);
> 	if (retry_estale(error, lookup_flags)) {
> 		lookup_flags |= LOOKUP_REVAL;
> 		goto retry;
> 	}
> 	return error;
> }
> in there.  The only potential benefit here would be to avoid repeated getname
> in case of having hit -ESTALE and going to repeat the entire fucking pathwalk
> with maximal paranoia, asking the server(s) involved to revalidate on every
> step, etc.
> 
> If we end up going there, who the hell *cares* about the costs of less than
> a page worth of copy_from_user()?  We are already on a very slow path as it
> is, so what's the point?

I think Jens already answered this why we capture the parameters during the prep
step. From Jens:

"

- The prep of it, this happens inline from the system call where the
  request, or requests, are submitted. The prep phase should ensure that
  argument structs are stable. Hence a caller can prep a request and
  have memory on stack, as long as it submits before it becomes invalid.
  An example of that are iovecs for readv/writev. The caller does not
  need to have them stable for the duration of the request, just across
  submit. That's the io_${cmd}_prep() helpers.

- The execution of it. May be separate from prep and from an async
  worker. Where the lower layers don't support a nonblocking attempt,
  they are always done async. The statx stuff is an example of that.

Hence prep needs to copy from userland on the prep side always for the
statx family, as execution will happen out-of-line from the submission.

"

Otherwise we need to copy the path value the user passed in, storing a filename struct
seems to be the better choice.

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2021-12-30 23:02 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-12-29 20:29 [PATCH v10 0/5] io_uring: add xattr support Stefan Roesch
2021-12-29 20:29 ` [PATCH v10 1/5] fs: split off do_user_path_at_empty from user_path_at_empty() Stefan Roesch
2021-12-30  0:49   ` Al Viro
2021-12-30 19:57     ` Stefan Roesch
2021-12-29 20:29 ` [PATCH v10 2/5] fs: split off setxattr_copy and do_setxattr function from setxattr Stefan Roesch
2021-12-30  1:15   ` Al Viro
2021-12-30  9:41     ` Christian Brauner
2021-12-30 19:57     ` Stefan Roesch
2021-12-29 20:30 ` [PATCH v10 3/5] fs: split off do_getxattr from getxattr Stefan Roesch
2021-12-29 20:30 ` [PATCH v10 4/5] io_uring: add fsetxattr and setxattr support Stefan Roesch
2021-12-30  1:58   ` Al Viro
2021-12-30  2:17   ` Al Viro
2021-12-30  2:19     ` Al Viro
2021-12-30  3:04     ` Al Viro
2021-12-30 10:12       ` Christian Brauner
2021-12-30 16:16         ` Al Viro
2021-12-30 18:01           ` Christian Brauner
2021-12-30 19:09             ` Jens Axboe
2021-12-30 22:24               ` Al Viro
2021-12-30 22:46                 ` Jens Axboe
2021-12-30 23:02                   ` Al Viro
2021-12-30 20:18     ` Stefan Roesch
2021-12-29 20:30 ` [PATCH v10 5/5] io_uring: add fgetxattr and getxattr support Stefan Roesch
2021-12-30  1:41   ` Al Viro
2021-12-30  1:46     ` Al Viro
2021-12-30 20:01     ` Stefan Roesch

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox