public inbox for [email protected]
 help / color / mirror / Atom feed
* [PATCH v5 00/10] io_uring: add mkdir, [sym]linkat and mknodat support
@ 2021-06-03  5:18 Dmitry Kadashev
  2021-06-03  5:18 ` [PATCH v5 01/10] fs: make do_mkdirat() take struct filename Dmitry Kadashev
                   ` (12 more replies)
  0 siblings, 13 replies; 53+ messages in thread
From: Dmitry Kadashev @ 2021-06-03  5:18 UTC (permalink / raw)
  To: Jens Axboe, Alexander Viro, Christian Brauner
  Cc: Pavel Begunkov, linux-fsdevel, io-uring, Dmitry Kadashev

This started out as an attempt to add mkdirat support to io_uring which
is heavily based on renameat() / unlinkat() support.

During the review process more operations were added (linkat, symlinkat,
mknodat) mainly to keep things uniform internally (in namei.c), and
with things changed in namei.c adding support for these operations to
io_uring is trivial, so that was done too. See
https://lore.kernel.org/io-uring/20210514145259.wtl4xcsp52woi6ab@wittgenstein/

The first patch is preparation with no functional changes, makes
do_mkdirat accept struct filename pointer rather than the user string.

The second one leverages that to implement mkdirat in io_uring.

3-6 just convert other similar do_* functions in namei.c to accept
struct filename, for uniformity with do_mkdirat, do_renameat and
do_unlinkat. No functional changes there.

7 changes do_* helpers in namei.c to return ints rather than some of
them returning ints and some longs.

8-10 add symlinkat, linkat and mknodat support to io_uring
(correspondingly).

Based on for-5.14/io_uring.

v5:
- rebase
- add symlinkat, linkat and mknodat support to io_uring

v4:
- update do_mknodat, do_symlinkat and do_linkat to accept struct
  filename for uniformity with do_mkdirat, do_renameat and do_unlinkat;

v3:
- rebase;

v2:
- do not mess with struct filename's refcount in do_mkdirat, instead add
  and use __filename_create() that does not drop the name on success;


Dmitry Kadashev (10):
  fs: make do_mkdirat() take struct filename
  io_uring: add support for IORING_OP_MKDIRAT
  fs: make do_mknodat() take struct filename
  fs: make do_symlinkat() take struct filename
  namei: add getname_uflags()
  fs: make do_linkat() take struct filename
  fs: update do_*() helpers to return ints
  io_uring: add support for IORING_OP_SYMLINKAT
  io_uring: add support for IORING_OP_LINKAT
  io_uring: add support for IORING_OP_MKNODAT

 fs/exec.c                     |   8 +-
 fs/internal.h                 |  10 +-
 fs/io_uring.c                 | 240 ++++++++++++++++++++++++++++++++++
 fs/namei.c                    | 137 ++++++++++++-------
 include/linux/fs.h            |   1 +
 include/uapi/linux/io_uring.h |   6 +
 6 files changed, 349 insertions(+), 53 deletions(-)

-- 
2.30.2


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v5 01/10] fs: make do_mkdirat() take struct filename
  2021-06-03  5:18 [PATCH v5 00/10] io_uring: add mkdir, [sym]linkat and mknodat support Dmitry Kadashev
@ 2021-06-03  5:18 ` Dmitry Kadashev
  2021-06-03  5:18 ` [PATCH v5 02/10] io_uring: add support for IORING_OP_MKDIRAT Dmitry Kadashev
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 53+ messages in thread
From: Dmitry Kadashev @ 2021-06-03  5:18 UTC (permalink / raw)
  To: Jens Axboe, Alexander Viro, Christian Brauner
  Cc: Pavel Begunkov, linux-fsdevel, io-uring, Dmitry Kadashev

Pass in the struct filename pointers instead of the user string, and
update the three callers to do the same. This is heavily based on
commit dbea8d345177 ("fs: make do_renameat2() take struct filename").

This behaves like do_unlinkat() and do_renameat2().

Cc: Al Viro <[email protected]>
Signed-off-by: Dmitry Kadashev <[email protected]>
---
 fs/internal.h |  1 +
 fs/namei.c    | 22 ++++++++++++++++------
 2 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/fs/internal.h b/fs/internal.h
index 6aeae7ef3380..848e165ef0f1 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -77,6 +77,7 @@ long do_unlinkat(int dfd, struct filename *name);
 int may_linkat(struct user_namespace *mnt_userns, struct path *link);
 int do_renameat2(int olddfd, struct filename *oldname, int newdfd,
 		 struct filename *newname, unsigned int flags);
+long do_mkdirat(int dfd, struct filename *name, umode_t mode);
 
 /*
  * namespace.c
diff --git a/fs/namei.c b/fs/namei.c
index 79b0ff9b151e..49317c018341 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3556,7 +3556,7 @@ struct file *do_file_open_root(struct dentry *dentry, struct vfsmount *mnt,
 	return file;
 }
 
-static struct dentry *filename_create(int dfd, struct filename *name,
+static struct dentry *__filename_create(int dfd, struct filename *name,
 				struct path *path, unsigned int lookup_flags)
 {
 	struct dentry *dentry = ERR_PTR(-EEXIST);
@@ -3612,7 +3612,6 @@ static struct dentry *filename_create(int dfd, struct filename *name,
 		error = err2;
 		goto fail;
 	}
-	putname(name);
 	return dentry;
 fail:
 	dput(dentry);
@@ -3627,6 +3626,16 @@ static struct dentry *filename_create(int dfd, struct filename *name,
 	return dentry;
 }
 
+static inline struct dentry *filename_create(int dfd, struct filename *name,
+				struct path *path, unsigned int lookup_flags)
+{
+	struct dentry *res = __filename_create(dfd, name, path, lookup_flags);
+
+	if (!IS_ERR(res))
+		putname(name);
+	return res;
+}
+
 struct dentry *kern_path_create(int dfd, const char *pathname,
 				struct path *path, unsigned int lookup_flags)
 {
@@ -3817,7 +3826,7 @@ int vfs_mkdir(struct user_namespace *mnt_userns, struct inode *dir,
 }
 EXPORT_SYMBOL(vfs_mkdir);
 
-static long do_mkdirat(int dfd, const char __user *pathname, umode_t mode)
+long do_mkdirat(int dfd, struct filename *name, umode_t mode)
 {
 	struct dentry *dentry;
 	struct path path;
@@ -3825,7 +3834,7 @@ static long do_mkdirat(int dfd, const char __user *pathname, umode_t mode)
 	unsigned int lookup_flags = LOOKUP_DIRECTORY;
 
 retry:
-	dentry = user_path_create(dfd, pathname, &path, lookup_flags);
+	dentry = __filename_create(dfd, name, &path, lookup_flags);
 	if (IS_ERR(dentry))
 		return PTR_ERR(dentry);
 
@@ -3843,17 +3852,18 @@ static long do_mkdirat(int dfd, const char __user *pathname, umode_t mode)
 		lookup_flags |= LOOKUP_REVAL;
 		goto retry;
 	}
+	putname(name);
 	return error;
 }
 
 SYSCALL_DEFINE3(mkdirat, int, dfd, const char __user *, pathname, umode_t, mode)
 {
-	return do_mkdirat(dfd, pathname, mode);
+	return do_mkdirat(dfd, getname(pathname), mode);
 }
 
 SYSCALL_DEFINE2(mkdir, const char __user *, pathname, umode_t, mode)
 {
-	return do_mkdirat(AT_FDCWD, pathname, mode);
+	return do_mkdirat(AT_FDCWD, getname(pathname), mode);
 }
 
 /**
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v5 02/10] io_uring: add support for IORING_OP_MKDIRAT
  2021-06-03  5:18 [PATCH v5 00/10] io_uring: add mkdir, [sym]linkat and mknodat support Dmitry Kadashev
  2021-06-03  5:18 ` [PATCH v5 01/10] fs: make do_mkdirat() take struct filename Dmitry Kadashev
@ 2021-06-03  5:18 ` Dmitry Kadashev
  2021-06-22 11:41   ` Pavel Begunkov
  2021-06-22 17:41   ` Pavel Begunkov
  2021-06-03  5:18 ` [PATCH v5 03/10] fs: make do_mknodat() take struct filename Dmitry Kadashev
                   ` (10 subsequent siblings)
  12 siblings, 2 replies; 53+ messages in thread
From: Dmitry Kadashev @ 2021-06-03  5:18 UTC (permalink / raw)
  To: Jens Axboe, Alexander Viro, Christian Brauner
  Cc: Pavel Begunkov, linux-fsdevel, io-uring, Dmitry Kadashev

IORING_OP_MKDIRAT behaves like mkdirat(2) and takes the same flags
and arguments.

Signed-off-by: Dmitry Kadashev <[email protected]>
---
 fs/io_uring.c                 | 55 +++++++++++++++++++++++++++++++++++
 include/uapi/linux/io_uring.h |  1 +
 2 files changed, 56 insertions(+)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index a1ca6badff36..8ab4eb559520 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -665,6 +665,13 @@ struct io_unlink {
 	struct filename			*filename;
 };
 
+struct io_mkdir {
+	struct file			*file;
+	int				dfd;
+	umode_t				mode;
+	struct filename			*filename;
+};
+
 struct io_completion {
 	struct file			*file;
 	struct list_head		list;
@@ -809,6 +816,7 @@ struct io_kiocb {
 		struct io_shutdown	shutdown;
 		struct io_rename	rename;
 		struct io_unlink	unlink;
+		struct io_mkdir		mkdir;
 		/* use only after cleaning per-op data, see io_clean_op() */
 		struct io_completion	compl;
 	};
@@ -1021,6 +1029,7 @@ static const struct io_op_def io_op_defs[] = {
 	},
 	[IORING_OP_RENAMEAT] = {},
 	[IORING_OP_UNLINKAT] = {},
+	[IORING_OP_MKDIRAT] = {},
 };
 
 static bool io_disarm_next(struct io_kiocb *req);
@@ -3530,6 +3539,44 @@ static int io_unlinkat(struct io_kiocb *req, unsigned int issue_flags)
 	return 0;
 }
 
+static int io_mkdirat_prep(struct io_kiocb *req,
+			    const struct io_uring_sqe *sqe)
+{
+	struct io_mkdir *mkd = &req->mkdir;
+	const char __user *fname;
+
+	if (unlikely(req->flags & REQ_F_FIXED_FILE))
+		return -EBADF;
+
+	mkd->dfd = READ_ONCE(sqe->fd);
+	mkd->mode = READ_ONCE(sqe->len);
+
+	fname = u64_to_user_ptr(READ_ONCE(sqe->addr));
+	mkd->filename = getname(fname);
+	if (IS_ERR(mkd->filename))
+		return PTR_ERR(mkd->filename);
+
+	req->flags |= REQ_F_NEED_CLEANUP;
+	return 0;
+}
+
+static int io_mkdirat(struct io_kiocb *req, int issue_flags)
+{
+	struct io_mkdir *mkd = &req->mkdir;
+	int ret;
+
+	if (issue_flags & IO_URING_F_NONBLOCK)
+		return -EAGAIN;
+
+	ret = do_mkdirat(mkd->dfd, mkd->filename, mkd->mode);
+
+	req->flags &= ~REQ_F_NEED_CLEANUP;
+	if (ret < 0)
+		req_set_fail_links(req);
+	io_req_complete(req, ret);
+	return 0;
+}
+
 static int io_shutdown_prep(struct io_kiocb *req,
 			    const struct io_uring_sqe *sqe)
 {
@@ -5936,6 +5983,8 @@ static int io_req_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 		return io_renameat_prep(req, sqe);
 	case IORING_OP_UNLINKAT:
 		return io_unlinkat_prep(req, sqe);
+	case IORING_OP_MKDIRAT:
+		return io_mkdirat_prep(req, sqe);
 	}
 
 	printk_once(KERN_WARNING "io_uring: unhandled opcode %d\n",
@@ -6077,6 +6126,9 @@ static void io_clean_op(struct io_kiocb *req)
 		case IORING_OP_UNLINKAT:
 			putname(req->unlink.filename);
 			break;
+		case IORING_OP_MKDIRAT:
+			putname(req->mkdir.filename);
+			break;
 		}
 		req->flags &= ~REQ_F_NEED_CLEANUP;
 	}
@@ -6203,6 +6255,9 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
 	case IORING_OP_UNLINKAT:
 		ret = io_unlinkat(req, issue_flags);
 		break;
+	case IORING_OP_MKDIRAT:
+		ret = io_mkdirat(req, issue_flags);
+		break;
 	default:
 		ret = -EINVAL;
 		break;
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index e1ae46683301..bf9d720d371f 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -137,6 +137,7 @@ enum {
 	IORING_OP_SHUTDOWN,
 	IORING_OP_RENAMEAT,
 	IORING_OP_UNLINKAT,
+	IORING_OP_MKDIRAT,
 
 	/* this goes last, obviously */
 	IORING_OP_LAST,
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v5 03/10] fs: make do_mknodat() take struct filename
  2021-06-03  5:18 [PATCH v5 00/10] io_uring: add mkdir, [sym]linkat and mknodat support Dmitry Kadashev
  2021-06-03  5:18 ` [PATCH v5 01/10] fs: make do_mkdirat() take struct filename Dmitry Kadashev
  2021-06-03  5:18 ` [PATCH v5 02/10] io_uring: add support for IORING_OP_MKDIRAT Dmitry Kadashev
@ 2021-06-03  5:18 ` Dmitry Kadashev
  2021-06-03  5:18 ` [PATCH v5 04/10] fs: make do_symlinkat() " Dmitry Kadashev
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 53+ messages in thread
From: Dmitry Kadashev @ 2021-06-03  5:18 UTC (permalink / raw)
  To: Jens Axboe, Alexander Viro, Christian Brauner
  Cc: Pavel Begunkov, linux-fsdevel, io-uring, Dmitry Kadashev

Pass in the struct filename pointers instead of the user string, for
uniformity with the recently converted do_unlinkat(), do_renameat(),
do_mkdirat().

Cc: Al Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Link: https://lore.kernel.org/io-uring/20210330071700.kpjoyp5zlni7uejm@wittgenstein/
Signed-off-by: Dmitry Kadashev <[email protected]>
---
 fs/namei.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 49317c018341..9fc981e28788 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3724,7 +3724,7 @@ static int may_mknod(umode_t mode)
 	}
 }
 
-static long do_mknodat(int dfd, const char __user *filename, umode_t mode,
+static long do_mknodat(int dfd, struct filename *name, umode_t mode,
 		unsigned int dev)
 {
 	struct user_namespace *mnt_userns;
@@ -3735,9 +3735,9 @@ static long do_mknodat(int dfd, const char __user *filename, umode_t mode,
 
 	error = may_mknod(mode);
 	if (error)
-		return error;
+		goto out1;
 retry:
-	dentry = user_path_create(dfd, filename, &path, lookup_flags);
+	dentry = __filename_create(dfd, name, &path, lookup_flags);
 	if (IS_ERR(dentry))
 		return PTR_ERR(dentry);
 
@@ -3745,7 +3745,7 @@ static long do_mknodat(int dfd, const char __user *filename, umode_t mode,
 		mode &= ~current_umask();
 	error = security_path_mknod(&path, dentry, mode, dev);
 	if (error)
-		goto out;
+		goto out2;
 
 	mnt_userns = mnt_user_ns(path.mnt);
 	switch (mode & S_IFMT) {
@@ -3764,24 +3764,27 @@ static long do_mknodat(int dfd, const char __user *filename, umode_t mode,
 					  dentry, mode, 0);
 			break;
 	}
-out:
+out2:
 	done_path_create(&path, dentry);
 	if (retry_estale(error, lookup_flags)) {
 		lookup_flags |= LOOKUP_REVAL;
 		goto retry;
 	}
+out1:
+	if (!IS_ERR(name))
+		putname(name);
 	return error;
 }
 
 SYSCALL_DEFINE4(mknodat, int, dfd, const char __user *, filename, umode_t, mode,
 		unsigned int, dev)
 {
-	return do_mknodat(dfd, filename, mode, dev);
+	return do_mknodat(dfd, getname(filename), mode, dev);
 }
 
 SYSCALL_DEFINE3(mknod, const char __user *, filename, umode_t, mode, unsigned, dev)
 {
-	return do_mknodat(AT_FDCWD, filename, mode, dev);
+	return do_mknodat(AT_FDCWD, getname(filename), mode, dev);
 }
 
 /**
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v5 04/10] fs: make do_symlinkat() take struct filename
  2021-06-03  5:18 [PATCH v5 00/10] io_uring: add mkdir, [sym]linkat and mknodat support Dmitry Kadashev
                   ` (2 preceding siblings ...)
  2021-06-03  5:18 ` [PATCH v5 03/10] fs: make do_mknodat() take struct filename Dmitry Kadashev
@ 2021-06-03  5:18 ` Dmitry Kadashev
  2021-06-03  5:18 ` [PATCH v5 05/10] namei: add getname_uflags() Dmitry Kadashev
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 53+ messages in thread
From: Dmitry Kadashev @ 2021-06-03  5:18 UTC (permalink / raw)
  To: Jens Axboe, Alexander Viro, Christian Brauner
  Cc: Pavel Begunkov, linux-fsdevel, io-uring, Dmitry Kadashev

Pass in the struct filename pointers instead of the user string, for
uniformity with the recently converted do_mkdnodat(), do_unlinkat(),
do_renameat(), do_mkdirat().

Cc: Al Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Link: https://lore.kernel.org/io-uring/20210330071700.kpjoyp5zlni7uejm@wittgenstein/
Signed-off-by: Dmitry Kadashev <[email protected]>
---
 fs/namei.c | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 9fc981e28788..76572d703e82 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4189,23 +4189,23 @@ int vfs_symlink(struct user_namespace *mnt_userns, struct inode *dir,
 }
 EXPORT_SYMBOL(vfs_symlink);
 
-static long do_symlinkat(const char __user *oldname, int newdfd,
-		  const char __user *newname)
+static long do_symlinkat(struct filename *from, int newdfd,
+		  struct filename *to)
 {
 	int error;
-	struct filename *from;
 	struct dentry *dentry;
 	struct path path;
 	unsigned int lookup_flags = 0;
 
-	from = getname(oldname);
-	if (IS_ERR(from))
-		return PTR_ERR(from);
+	if (IS_ERR(from)) {
+		error = PTR_ERR(from);
+		goto out_putboth;
+	}
 retry:
-	dentry = user_path_create(newdfd, newname, &path, lookup_flags);
+	dentry = __filename_create(newdfd, to, &path, lookup_flags);
 	error = PTR_ERR(dentry);
 	if (IS_ERR(dentry))
-		goto out_putname;
+		goto out_putfrom;
 
 	error = security_path_symlink(&path, dentry, from->name);
 	if (!error) {
@@ -4220,20 +4220,24 @@ static long do_symlinkat(const char __user *oldname, int newdfd,
 		lookup_flags |= LOOKUP_REVAL;
 		goto retry;
 	}
-out_putname:
-	putname(from);
+out_putboth:
+	if (!IS_ERR(to))
+		putname(to);
+out_putfrom:
+	if (!IS_ERR(from))
+		putname(from);
 	return error;
 }
 
 SYSCALL_DEFINE3(symlinkat, const char __user *, oldname,
 		int, newdfd, const char __user *, newname)
 {
-	return do_symlinkat(oldname, newdfd, newname);
+	return do_symlinkat(getname(oldname), newdfd, getname(newname));
 }
 
 SYSCALL_DEFINE2(symlink, const char __user *, oldname, const char __user *, newname)
 {
-	return do_symlinkat(oldname, AT_FDCWD, newname);
+	return do_symlinkat(getname(oldname), AT_FDCWD, getname(newname));
 }
 
 /**
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v5 05/10] namei: add getname_uflags()
  2021-06-03  5:18 [PATCH v5 00/10] io_uring: add mkdir, [sym]linkat and mknodat support Dmitry Kadashev
                   ` (3 preceding siblings ...)
  2021-06-03  5:18 ` [PATCH v5 04/10] fs: make do_symlinkat() " Dmitry Kadashev
@ 2021-06-03  5:18 ` Dmitry Kadashev
  2021-06-03  5:18 ` [PATCH v5 06/10] fs: make do_linkat() take struct filename Dmitry Kadashev
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 53+ messages in thread
From: Dmitry Kadashev @ 2021-06-03  5:18 UTC (permalink / raw)
  To: Jens Axboe, Alexander Viro, Christian Brauner
  Cc: Pavel Begunkov, linux-fsdevel, io-uring, Dmitry Kadashev

There are a couple of places where we already open-code the (flags &
AT_EMPTY_PATH) check and io_uring will likely add another one in the
future.  Let's just add a simple helper getname_uflags() that handles
this directly and use it.

Cc: Al Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Link: https://lore.kernel.org/io-uring/20210415100815.edrn4a7cy26wkowe@wittgenstein/
Signed-off-by: Christian Brauner <[email protected]>
Signed-off-by: Dmitry Kadashev <[email protected]>
Acked-by: Christian Brauner <[email protected]>
---
 fs/exec.c          | 8 ++------
 fs/namei.c         | 8 ++++++++
 include/linux/fs.h | 1 +
 3 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 18594f11c31f..df33ecaf2111 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -2069,10 +2069,8 @@ SYSCALL_DEFINE5(execveat,
 		const char __user *const __user *, envp,
 		int, flags)
 {
-	int lookup_flags = (flags & AT_EMPTY_PATH) ? LOOKUP_EMPTY : 0;
-
 	return do_execveat(fd,
-			   getname_flags(filename, lookup_flags, NULL),
+			   getname_uflags(filename, flags),
 			   argv, envp, flags);
 }
 
@@ -2090,10 +2088,8 @@ COMPAT_SYSCALL_DEFINE5(execveat, int, fd,
 		       const compat_uptr_t __user *, envp,
 		       int,  flags)
 {
-	int lookup_flags = (flags & AT_EMPTY_PATH) ? LOOKUP_EMPTY : 0;
-
 	return compat_do_execveat(fd,
-				  getname_flags(filename, lookup_flags, NULL),
+				  getname_uflags(filename, flags),
 				  argv, envp, flags);
 }
 #endif
diff --git a/fs/namei.c b/fs/namei.c
index 76572d703e82..010455938826 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -203,6 +203,14 @@ getname_flags(const char __user *filename, int flags, int *empty)
 	return result;
 }
 
+struct filename *
+getname_uflags(const char __user *filename, int uflags)
+{
+	int flags = (uflags & AT_EMPTY_PATH) ? LOOKUP_EMPTY : 0;
+
+	return getname_flags(filename, flags, NULL);
+}
+
 struct filename *
 getname(const char __user * filename)
 {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c3c88fdb9b2a..5885a68d2c12 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2781,6 +2781,7 @@ static inline struct file *file_clone_open(struct file *file)
 extern int filp_close(struct file *, fl_owner_t id);
 
 extern struct filename *getname_flags(const char __user *, int, int *);
+extern struct filename *getname_uflags(const char __user *, int);
 extern struct filename *getname(const char __user *);
 extern struct filename *getname_kernel(const char *);
 extern void putname(struct filename *name);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v5 06/10] fs: make do_linkat() take struct filename
  2021-06-03  5:18 [PATCH v5 00/10] io_uring: add mkdir, [sym]linkat and mknodat support Dmitry Kadashev
                   ` (4 preceding siblings ...)
  2021-06-03  5:18 ` [PATCH v5 05/10] namei: add getname_uflags() Dmitry Kadashev
@ 2021-06-03  5:18 ` Dmitry Kadashev
  2021-06-03  5:18 ` [PATCH v5 07/10] fs: update do_*() helpers to return ints Dmitry Kadashev
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 53+ messages in thread
From: Dmitry Kadashev @ 2021-06-03  5:18 UTC (permalink / raw)
  To: Jens Axboe, Alexander Viro, Christian Brauner
  Cc: Pavel Begunkov, linux-fsdevel, io-uring, Dmitry Kadashev

Pass in the struct filename pointers instead of the user string, for
uniformity with do_renameat2, do_unlinkat, do_mknodat, etc.

Cc: Al Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Link: https://lore.kernel.org/io-uring/20210330071700.kpjoyp5zlni7uejm@wittgenstein/
Signed-off-by: Dmitry Kadashev <[email protected]>
---
 fs/namei.c | 59 +++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 41 insertions(+), 18 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 010455938826..07b1619dd343 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2447,7 +2447,7 @@ static int path_lookupat(struct nameidata *nd, unsigned flags, struct path *path
 	return err;
 }
 
-int filename_lookup(int dfd, struct filename *name, unsigned flags,
+static int __filename_lookup(int dfd, struct filename *name, unsigned flags,
 		    struct path *path, struct path *root)
 {
 	int retval;
@@ -2469,7 +2469,18 @@ int filename_lookup(int dfd, struct filename *name, unsigned flags,
 		audit_inode(name, path->dentry,
 			    flags & LOOKUP_MOUNTPOINT ? AUDIT_INODE_NOEVAL : 0);
 	restore_nameidata();
-	putname(name);
+	if (retval)
+		putname(name);
+	return retval;
+}
+
+int filename_lookup(int dfd, struct filename *name, unsigned flags,
+		    struct path *path, struct path *root)
+{
+	int retval = __filename_lookup(dfd, name, flags, path, root);
+
+	if (!retval)
+		putname(name);
 	return retval;
 }
 
@@ -4346,8 +4357,8 @@ EXPORT_SYMBOL(vfs_link);
  * with linux 2.0, and to avoid hard-linking to directories
  * and other special files.  --ADM
  */
-static int do_linkat(int olddfd, const char __user *oldname, int newdfd,
-	      const char __user *newname, int flags)
+static int do_linkat(int olddfd, struct filename *old, int newdfd,
+	      struct filename *new, int flags)
 {
 	struct user_namespace *mnt_userns;
 	struct dentry *new_dentry;
@@ -4356,31 +4367,36 @@ static int do_linkat(int olddfd, const char __user *oldname, int newdfd,
 	int how = 0;
 	int error;
 
-	if ((flags & ~(AT_SYMLINK_FOLLOW | AT_EMPTY_PATH)) != 0)
-		return -EINVAL;
+	if ((flags & ~(AT_SYMLINK_FOLLOW | AT_EMPTY_PATH)) != 0) {
+		error = -EINVAL;
+		goto out_putnames;
+	}
 	/*
 	 * To use null names we require CAP_DAC_READ_SEARCH
 	 * This ensures that not everyone will be able to create
 	 * handlink using the passed filedescriptor.
 	 */
-	if (flags & AT_EMPTY_PATH) {
-		if (!capable(CAP_DAC_READ_SEARCH))
-			return -ENOENT;
-		how = LOOKUP_EMPTY;
+	if (flags & AT_EMPTY_PATH && !capable(CAP_DAC_READ_SEARCH)) {
+		error = -ENOENT;
+		goto out_putnames;
 	}
 
 	if (flags & AT_SYMLINK_FOLLOW)
 		how |= LOOKUP_FOLLOW;
 retry:
-	error = user_path_at(olddfd, oldname, how, &old_path);
+	error = __filename_lookup(olddfd, old, how, &old_path, NULL);
 	if (error)
-		return error;
+		goto out_putnew;
 
-	new_dentry = user_path_create(newdfd, newname, &new_path,
+	new_dentry = __filename_create(newdfd, new, &new_path,
 					(how & LOOKUP_REVAL));
 	error = PTR_ERR(new_dentry);
-	if (IS_ERR(new_dentry))
-		goto out;
+	if (IS_ERR(new_dentry)) {
+		// On error `new` is freed by __filename_create, prevent extra freeing
+		// below
+		new = ERR_PTR(error);
+		goto out_putpath;
+	}
 
 	error = -EXDEV;
 	if (old_path.mnt != new_path.mnt)
@@ -4408,8 +4424,14 @@ static int do_linkat(int olddfd, const char __user *oldname, int newdfd,
 		how |= LOOKUP_REVAL;
 		goto retry;
 	}
-out:
+out_putpath:
 	path_put(&old_path);
+out_putnames:
+	if (!IS_ERR(old))
+		putname(old);
+out_putnew:
+	if (!IS_ERR(new))
+		putname(new);
 
 	return error;
 }
@@ -4417,12 +4439,13 @@ static int do_linkat(int olddfd, const char __user *oldname, int newdfd,
 SYSCALL_DEFINE5(linkat, int, olddfd, const char __user *, oldname,
 		int, newdfd, const char __user *, newname, int, flags)
 {
-	return do_linkat(olddfd, oldname, newdfd, newname, flags);
+	return do_linkat(olddfd, getname_uflags(oldname, flags),
+		newdfd, getname(newname), flags);
 }
 
 SYSCALL_DEFINE2(link, const char __user *, oldname, const char __user *, newname)
 {
-	return do_linkat(AT_FDCWD, oldname, AT_FDCWD, newname, 0);
+	return do_linkat(AT_FDCWD, getname(oldname), AT_FDCWD, getname(newname), 0);
 }
 
 /**
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v5 07/10] fs: update do_*() helpers to return ints
  2021-06-03  5:18 [PATCH v5 00/10] io_uring: add mkdir, [sym]linkat and mknodat support Dmitry Kadashev
                   ` (5 preceding siblings ...)
  2021-06-03  5:18 ` [PATCH v5 06/10] fs: make do_linkat() take struct filename Dmitry Kadashev
@ 2021-06-03  5:18 ` Dmitry Kadashev
  2021-06-03  5:18 ` [PATCH v5 08/10] io_uring: add support for IORING_OP_SYMLINKAT Dmitry Kadashev
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 53+ messages in thread
From: Dmitry Kadashev @ 2021-06-03  5:18 UTC (permalink / raw)
  To: Jens Axboe, Alexander Viro, Christian Brauner
  Cc: Pavel Begunkov, linux-fsdevel, io-uring, Dmitry Kadashev

Update the following to return int rather than long, for uniformity with
the rest of the do_* helpers in namei.c:

* do_rmdir()
* do_unlinkat()
* do_mkdirat()
* do_mknodat()
* do_symlinkat()

Cc: Al Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Link: https://lore.kernel.org/io-uring/20210514143202.dmzfcgz5hnauy7ze@wittgenstein/
Signed-off-by: Dmitry Kadashev <[email protected]>
---
 fs/internal.h |  6 +++---
 fs/namei.c    | 10 +++++-----
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/internal.h b/fs/internal.h
index 848e165ef0f1..207a455e32d3 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -72,12 +72,12 @@ extern int filename_lookup(int dfd, struct filename *name, unsigned flags,
 			   struct path *path, struct path *root);
 extern int vfs_path_lookup(struct dentry *, struct vfsmount *,
 			   const char *, unsigned int, struct path *);
-long do_rmdir(int dfd, struct filename *name);
-long do_unlinkat(int dfd, struct filename *name);
+int do_rmdir(int dfd, struct filename *name);
+int do_unlinkat(int dfd, struct filename *name);
 int may_linkat(struct user_namespace *mnt_userns, struct path *link);
 int do_renameat2(int olddfd, struct filename *oldname, int newdfd,
 		 struct filename *newname, unsigned int flags);
-long do_mkdirat(int dfd, struct filename *name, umode_t mode);
+int do_mkdirat(int dfd, struct filename *name, umode_t mode);
 
 /*
  * namespace.c
diff --git a/fs/namei.c b/fs/namei.c
index 07b1619dd343..f99de6e294ad 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3743,7 +3743,7 @@ static int may_mknod(umode_t mode)
 	}
 }
 
-static long do_mknodat(int dfd, struct filename *name, umode_t mode,
+static int do_mknodat(int dfd, struct filename *name, umode_t mode,
 		unsigned int dev)
 {
 	struct user_namespace *mnt_userns;
@@ -3848,7 +3848,7 @@ int vfs_mkdir(struct user_namespace *mnt_userns, struct inode *dir,
 }
 EXPORT_SYMBOL(vfs_mkdir);
 
-long do_mkdirat(int dfd, struct filename *name, umode_t mode)
+int do_mkdirat(int dfd, struct filename *name, umode_t mode)
 {
 	struct dentry *dentry;
 	struct path path;
@@ -3943,7 +3943,7 @@ int vfs_rmdir(struct user_namespace *mnt_userns, struct inode *dir,
 }
 EXPORT_SYMBOL(vfs_rmdir);
 
-long do_rmdir(int dfd, struct filename *name)
+int do_rmdir(int dfd, struct filename *name)
 {
 	struct user_namespace *mnt_userns;
 	int error = 0;
@@ -4081,7 +4081,7 @@ EXPORT_SYMBOL(vfs_unlink);
  * writeout happening, and we don't want to prevent access to the directory
  * while waiting on the I/O.
  */
-long do_unlinkat(int dfd, struct filename *name)
+int do_unlinkat(int dfd, struct filename *name)
 {
 	int error;
 	struct dentry *dentry;
@@ -4208,7 +4208,7 @@ int vfs_symlink(struct user_namespace *mnt_userns, struct inode *dir,
 }
 EXPORT_SYMBOL(vfs_symlink);
 
-static long do_symlinkat(struct filename *from, int newdfd,
+static int do_symlinkat(struct filename *from, int newdfd,
 		  struct filename *to)
 {
 	int error;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v5 08/10] io_uring: add support for IORING_OP_SYMLINKAT
  2021-06-03  5:18 [PATCH v5 00/10] io_uring: add mkdir, [sym]linkat and mknodat support Dmitry Kadashev
                   ` (6 preceding siblings ...)
  2021-06-03  5:18 ` [PATCH v5 07/10] fs: update do_*() helpers to return ints Dmitry Kadashev
@ 2021-06-03  5:18 ` Dmitry Kadashev
  2021-06-22 11:36   ` Pavel Begunkov
  2021-06-03  5:18 ` [PATCH v5 09/10] io_uring: add support for IORING_OP_LINKAT Dmitry Kadashev
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 53+ messages in thread
From: Dmitry Kadashev @ 2021-06-03  5:18 UTC (permalink / raw)
  To: Jens Axboe, Alexander Viro, Christian Brauner
  Cc: Pavel Begunkov, linux-fsdevel, io-uring, Dmitry Kadashev

IORING_OP_SYMLINKAT behaves like symlinkat(2) and takes the same flags
and arguments.

Suggested-by: Christian Brauner <[email protected]>
Link: https://lore.kernel.org/io-uring/20210514145259.wtl4xcsp52woi6ab@wittgenstein/
Signed-off-by: Dmitry Kadashev <[email protected]>
---
 fs/internal.h                 |  1 +
 fs/io_uring.c                 | 64 ++++++++++++++++++++++++++++++++++-
 fs/namei.c                    |  3 +-
 include/uapi/linux/io_uring.h |  1 +
 4 files changed, 66 insertions(+), 3 deletions(-)

diff --git a/fs/internal.h b/fs/internal.h
index 207a455e32d3..3b3954214385 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -78,6 +78,7 @@ int may_linkat(struct user_namespace *mnt_userns, struct path *link);
 int do_renameat2(int olddfd, struct filename *oldname, int newdfd,
 		 struct filename *newname, unsigned int flags);
 int do_mkdirat(int dfd, struct filename *name, umode_t mode);
+int do_symlinkat(struct filename *from, int newdfd, struct filename *to);
 
 /*
  * namespace.c
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 8ab4eb559520..5fdba9b381e5 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -672,6 +672,13 @@ struct io_mkdir {
 	struct filename			*filename;
 };
 
+struct io_symlink {
+	struct file			*file;
+	int				new_dfd;
+	struct filename			*oldpath;
+	struct filename			*newpath;
+};
+
 struct io_completion {
 	struct file			*file;
 	struct list_head		list;
@@ -817,6 +824,7 @@ struct io_kiocb {
 		struct io_rename	rename;
 		struct io_unlink	unlink;
 		struct io_mkdir		mkdir;
+		struct io_symlink	symlink;
 		/* use only after cleaning per-op data, see io_clean_op() */
 		struct io_completion	compl;
 	};
@@ -1030,6 +1038,7 @@ static const struct io_op_def io_op_defs[] = {
 	[IORING_OP_RENAMEAT] = {},
 	[IORING_OP_UNLINKAT] = {},
 	[IORING_OP_MKDIRAT] = {},
+	[IORING_OP_SYMLINKAT] = {},
 };
 
 static bool io_disarm_next(struct io_kiocb *req);
@@ -3572,7 +3581,51 @@ static int io_mkdirat(struct io_kiocb *req, int issue_flags)
 
 	req->flags &= ~REQ_F_NEED_CLEANUP;
 	if (ret < 0)
-		req_set_fail_links(req);
+		req_set_fail(req);
+	io_req_complete(req, ret);
+	return 0;
+}
+
+static int io_symlinkat_prep(struct io_kiocb *req,
+			    const struct io_uring_sqe *sqe)
+{
+	struct io_symlink *sl = &req->symlink;
+	const char __user *oldpath, *newpath;
+
+	if (unlikely(req->flags & REQ_F_FIXED_FILE))
+		return -EBADF;
+
+	sl->new_dfd = READ_ONCE(sqe->fd);
+	oldpath = u64_to_user_ptr(READ_ONCE(sqe->addr));
+	newpath = u64_to_user_ptr(READ_ONCE(sqe->addr2));
+
+	sl->oldpath = getname(oldpath);
+	if (IS_ERR(sl->oldpath))
+		return PTR_ERR(sl->oldpath);
+
+	sl->newpath = getname(newpath);
+	if (IS_ERR(sl->newpath)) {
+		putname(sl->oldpath);
+		return PTR_ERR(sl->newpath);
+	}
+
+	req->flags |= REQ_F_NEED_CLEANUP;
+	return 0;
+}
+
+static int io_symlinkat(struct io_kiocb *req, int issue_flags)
+{
+	struct io_symlink *sl = &req->symlink;
+	int ret;
+
+	if (issue_flags & IO_URING_F_NONBLOCK)
+		return -EAGAIN;
+
+	ret = do_symlinkat(sl->oldpath, sl->new_dfd, sl->newpath);
+
+	req->flags &= ~REQ_F_NEED_CLEANUP;
+	if (ret < 0)
+		req_set_fail(req);
 	io_req_complete(req, ret);
 	return 0;
 }
@@ -5985,6 +6038,8 @@ static int io_req_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 		return io_unlinkat_prep(req, sqe);
 	case IORING_OP_MKDIRAT:
 		return io_mkdirat_prep(req, sqe);
+	case IORING_OP_SYMLINKAT:
+		return io_symlinkat_prep(req, sqe);
 	}
 
 	printk_once(KERN_WARNING "io_uring: unhandled opcode %d\n",
@@ -6129,6 +6184,10 @@ static void io_clean_op(struct io_kiocb *req)
 		case IORING_OP_MKDIRAT:
 			putname(req->mkdir.filename);
 			break;
+		case IORING_OP_SYMLINKAT:
+			putname(req->symlink.oldpath);
+			putname(req->symlink.newpath);
+			break;
 		}
 		req->flags &= ~REQ_F_NEED_CLEANUP;
 	}
@@ -6258,6 +6317,9 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
 	case IORING_OP_MKDIRAT:
 		ret = io_mkdirat(req, issue_flags);
 		break;
+	case IORING_OP_SYMLINKAT:
+		ret = io_symlinkat(req, issue_flags);
+		break;
 	default:
 		ret = -EINVAL;
 		break;
diff --git a/fs/namei.c b/fs/namei.c
index f99de6e294ad..f5b0379d2f8c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4208,8 +4208,7 @@ int vfs_symlink(struct user_namespace *mnt_userns, struct inode *dir,
 }
 EXPORT_SYMBOL(vfs_symlink);
 
-static int do_symlinkat(struct filename *from, int newdfd,
-		  struct filename *to)
+int do_symlinkat(struct filename *from, int newdfd, struct filename *to)
 {
 	int error;
 	struct dentry *dentry;
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index bf9d720d371f..7b8a78d9c947 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -138,6 +138,7 @@ enum {
 	IORING_OP_RENAMEAT,
 	IORING_OP_UNLINKAT,
 	IORING_OP_MKDIRAT,
+	IORING_OP_SYMLINKAT,
 
 	/* this goes last, obviously */
 	IORING_OP_LAST,
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v5 09/10] io_uring: add support for IORING_OP_LINKAT
  2021-06-03  5:18 [PATCH v5 00/10] io_uring: add mkdir, [sym]linkat and mknodat support Dmitry Kadashev
                   ` (7 preceding siblings ...)
  2021-06-03  5:18 ` [PATCH v5 08/10] io_uring: add support for IORING_OP_SYMLINKAT Dmitry Kadashev
@ 2021-06-03  5:18 ` Dmitry Kadashev
  2021-06-22 11:48   ` Pavel Begunkov
  2021-06-03  5:18 ` [PATCH v5 10/10] io_uring: add support for IORING_OP_MKNODAT Dmitry Kadashev
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 53+ messages in thread
From: Dmitry Kadashev @ 2021-06-03  5:18 UTC (permalink / raw)
  To: Jens Axboe, Alexander Viro, Christian Brauner
  Cc: Pavel Begunkov, linux-fsdevel, io-uring, Dmitry Kadashev

IORING_OP_LINKAT behaves like linkat(2) and takes the same flags and
arguments.

In some internal places 'hardlink' is used instead of 'link' to avoid
confusion with the SQE links. Name 'link' conflicts with the existing
'link' member of io_kiocb.

Suggested-by: Christian Brauner <[email protected]>
Link: https://lore.kernel.org/io-uring/20210514145259.wtl4xcsp52woi6ab@wittgenstein/
Signed-off-by: Dmitry Kadashev <[email protected]>
---
 fs/internal.h                 |  2 ++
 fs/io_uring.c                 | 67 +++++++++++++++++++++++++++++++++++
 fs/namei.c                    |  2 +-
 include/uapi/linux/io_uring.h |  2 ++
 4 files changed, 72 insertions(+), 1 deletion(-)

diff --git a/fs/internal.h b/fs/internal.h
index 3b3954214385..15a7d210cc67 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -79,6 +79,8 @@ int do_renameat2(int olddfd, struct filename *oldname, int newdfd,
 		 struct filename *newname, unsigned int flags);
 int do_mkdirat(int dfd, struct filename *name, umode_t mode);
 int do_symlinkat(struct filename *from, int newdfd, struct filename *to);
+int do_linkat(int olddfd, struct filename *old, int newdfd,
+			struct filename *new, int flags);
 
 /*
  * namespace.c
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 5fdba9b381e5..31e1aa7dd90b 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -679,6 +679,15 @@ struct io_symlink {
 	struct filename			*newpath;
 };
 
+struct io_hardlink {
+	struct file			*file;
+	int				old_dfd;
+	int				new_dfd;
+	struct filename			*oldpath;
+	struct filename			*newpath;
+	int				flags;
+};
+
 struct io_completion {
 	struct file			*file;
 	struct list_head		list;
@@ -825,6 +834,7 @@ struct io_kiocb {
 		struct io_unlink	unlink;
 		struct io_mkdir		mkdir;
 		struct io_symlink	symlink;
+		struct io_hardlink	hardlink;
 		/* use only after cleaning per-op data, see io_clean_op() */
 		struct io_completion	compl;
 	};
@@ -1039,6 +1049,7 @@ static const struct io_op_def io_op_defs[] = {
 	[IORING_OP_UNLINKAT] = {},
 	[IORING_OP_MKDIRAT] = {},
 	[IORING_OP_SYMLINKAT] = {},
+	[IORING_OP_LINKAT] = {},
 };
 
 static bool io_disarm_next(struct io_kiocb *req);
@@ -3630,6 +3641,53 @@ static int io_symlinkat(struct io_kiocb *req, int issue_flags)
 	return 0;
 }
 
+static int io_linkat_prep(struct io_kiocb *req,
+			    const struct io_uring_sqe *sqe)
+{
+	struct io_hardlink *lnk = &req->hardlink;
+	const char __user *oldf, *newf;
+
+	if (unlikely(req->flags & REQ_F_FIXED_FILE))
+		return -EBADF;
+
+	lnk->old_dfd = READ_ONCE(sqe->fd);
+	lnk->new_dfd = READ_ONCE(sqe->len);
+	oldf = u64_to_user_ptr(READ_ONCE(sqe->addr));
+	newf = u64_to_user_ptr(READ_ONCE(sqe->addr2));
+	lnk->flags = READ_ONCE(sqe->hardlink_flags);
+
+	lnk->oldpath = getname(oldf);
+	if (IS_ERR(lnk->oldpath))
+		return PTR_ERR(lnk->oldpath);
+
+	lnk->newpath = getname(newf);
+	if (IS_ERR(lnk->newpath)) {
+		putname(lnk->oldpath);
+		return PTR_ERR(lnk->newpath);
+	}
+
+	req->flags |= REQ_F_NEED_CLEANUP;
+	return 0;
+}
+
+static int io_linkat(struct io_kiocb *req, int issue_flags)
+{
+	struct io_hardlink *lnk = &req->hardlink;
+	int ret;
+
+	if (issue_flags & IO_URING_F_NONBLOCK)
+		return -EAGAIN;
+
+	ret = do_linkat(lnk->old_dfd, lnk->oldpath, lnk->new_dfd,
+				lnk->newpath, lnk->flags);
+
+	req->flags &= ~REQ_F_NEED_CLEANUP;
+	if (ret < 0)
+		req_set_fail(req);
+	io_req_complete(req, ret);
+	return 0;
+}
+
 static int io_shutdown_prep(struct io_kiocb *req,
 			    const struct io_uring_sqe *sqe)
 {
@@ -6040,6 +6098,8 @@ static int io_req_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 		return io_mkdirat_prep(req, sqe);
 	case IORING_OP_SYMLINKAT:
 		return io_symlinkat_prep(req, sqe);
+	case IORING_OP_LINKAT:
+		return io_linkat_prep(req, sqe);
 	}
 
 	printk_once(KERN_WARNING "io_uring: unhandled opcode %d\n",
@@ -6188,6 +6248,10 @@ static void io_clean_op(struct io_kiocb *req)
 			putname(req->symlink.oldpath);
 			putname(req->symlink.newpath);
 			break;
+		case IORING_OP_LINKAT:
+			putname(req->hardlink.oldpath);
+			putname(req->hardlink.newpath);
+			break;
 		}
 		req->flags &= ~REQ_F_NEED_CLEANUP;
 	}
@@ -6320,6 +6384,9 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
 	case IORING_OP_SYMLINKAT:
 		ret = io_symlinkat(req, issue_flags);
 		break;
+	case IORING_OP_LINKAT:
+		ret = io_linkat(req, issue_flags);
+		break;
 	default:
 		ret = -EINVAL;
 		break;
diff --git a/fs/namei.c b/fs/namei.c
index f5b0379d2f8c..b85e457c43b7 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4356,7 +4356,7 @@ EXPORT_SYMBOL(vfs_link);
  * with linux 2.0, and to avoid hard-linking to directories
  * and other special files.  --ADM
  */
-static int do_linkat(int olddfd, struct filename *old, int newdfd,
+int do_linkat(int olddfd, struct filename *old, int newdfd,
 	      struct filename *new, int flags)
 {
 	struct user_namespace *mnt_userns;
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 7b8a78d9c947..510e64a0a9c3 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -44,6 +44,7 @@ struct io_uring_sqe {
 		__u32		splice_flags;
 		__u32		rename_flags;
 		__u32		unlink_flags;
+		__u32		hardlink_flags;
 	};
 	__u64	user_data;	/* data to be passed back at completion time */
 	union {
@@ -139,6 +140,7 @@ enum {
 	IORING_OP_UNLINKAT,
 	IORING_OP_MKDIRAT,
 	IORING_OP_SYMLINKAT,
+	IORING_OP_LINKAT,
 
 	/* this goes last, obviously */
 	IORING_OP_LAST,
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v5 10/10] io_uring: add support for IORING_OP_MKNODAT
  2021-06-03  5:18 [PATCH v5 00/10] io_uring: add mkdir, [sym]linkat and mknodat support Dmitry Kadashev
                   ` (8 preceding siblings ...)
  2021-06-03  5:18 ` [PATCH v5 09/10] io_uring: add support for IORING_OP_LINKAT Dmitry Kadashev
@ 2021-06-03  5:18 ` Dmitry Kadashev
  2021-06-22 11:52   ` Pavel Begunkov
  2021-06-18  6:24 ` [PATCH v5 00/10] io_uring: add mkdir, [sym]linkat and mknodat support Dmitry Kadashev
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 53+ messages in thread
From: Dmitry Kadashev @ 2021-06-03  5:18 UTC (permalink / raw)
  To: Jens Axboe, Alexander Viro, Christian Brauner
  Cc: Pavel Begunkov, linux-fsdevel, io-uring, Dmitry Kadashev

IORING_OP_MKNODAT behaves like mknodat(2) and takes the same flags and
arguments.

Suggested-by: Christian Brauner <[email protected]>
Link: https://lore.kernel.org/io-uring/20210514145259.wtl4xcsp52woi6ab@wittgenstein/
Signed-off-by: Dmitry Kadashev <[email protected]>
---
 fs/internal.h                 |  2 ++
 fs/io_uring.c                 | 56 +++++++++++++++++++++++++++++++++++
 fs/namei.c                    |  2 +-
 include/uapi/linux/io_uring.h |  2 ++
 4 files changed, 61 insertions(+), 1 deletion(-)

diff --git a/fs/internal.h b/fs/internal.h
index 15a7d210cc67..c6fb9974006f 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -81,6 +81,8 @@ int do_mkdirat(int dfd, struct filename *name, umode_t mode);
 int do_symlinkat(struct filename *from, int newdfd, struct filename *to);
 int do_linkat(int olddfd, struct filename *old, int newdfd,
 			struct filename *new, int flags);
+int do_mknodat(int dfd, struct filename *name, umode_t mode,
+		unsigned int dev);
 
 /*
  * namespace.c
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 31e1aa7dd90b..475632374af8 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -688,6 +688,14 @@ struct io_hardlink {
 	int				flags;
 };
 
+struct io_mknod {
+	struct file			*file;
+	int				dfd;
+	umode_t				mode;
+	struct filename		*filename;
+	unsigned int		dev;
+};
+
 struct io_completion {
 	struct file			*file;
 	struct list_head		list;
@@ -835,6 +843,7 @@ struct io_kiocb {
 		struct io_mkdir		mkdir;
 		struct io_symlink	symlink;
 		struct io_hardlink	hardlink;
+		struct io_mknod		mknod;
 		/* use only after cleaning per-op data, see io_clean_op() */
 		struct io_completion	compl;
 	};
@@ -1050,6 +1059,7 @@ static const struct io_op_def io_op_defs[] = {
 	[IORING_OP_MKDIRAT] = {},
 	[IORING_OP_SYMLINKAT] = {},
 	[IORING_OP_LINKAT] = {},
+	[IORING_OP_MKNODAT] = {},
 };
 
 static bool io_disarm_next(struct io_kiocb *req);
@@ -3687,6 +3697,44 @@ static int io_linkat(struct io_kiocb *req, int issue_flags)
 	io_req_complete(req, ret);
 	return 0;
 }
+static int io_mknodat_prep(struct io_kiocb *req,
+			    const struct io_uring_sqe *sqe)
+{
+	struct io_mknod *mkn = &req->mknod;
+	const char __user *fname;
+
+	if (unlikely(req->flags & REQ_F_FIXED_FILE))
+		return -EBADF;
+
+	mkn->dfd = READ_ONCE(sqe->fd);
+	mkn->mode = READ_ONCE(sqe->len);
+	fname = u64_to_user_ptr(READ_ONCE(sqe->addr));
+	mkn->dev = READ_ONCE(sqe->mknod_dev);
+
+	mkn->filename = getname(fname);
+	if (IS_ERR(mkn->filename))
+		return PTR_ERR(mkn->filename);
+
+	req->flags |= REQ_F_NEED_CLEANUP;
+	return 0;
+}
+
+static int io_mknodat(struct io_kiocb *req, int issue_flags)
+{
+	struct io_mknod *mkn = &req->mknod;
+	int ret;
+
+	if (issue_flags & IO_URING_F_NONBLOCK)
+		return -EAGAIN;
+
+	ret = do_mknodat(mkn->dfd, mkn->filename, mkn->mode, mkn->dev);
+
+	req->flags &= ~REQ_F_NEED_CLEANUP;
+	if (ret < 0)
+		req_set_fail(req);
+	io_req_complete(req, ret);
+	return 0;
+}
 
 static int io_shutdown_prep(struct io_kiocb *req,
 			    const struct io_uring_sqe *sqe)
@@ -6100,6 +6148,8 @@ static int io_req_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 		return io_symlinkat_prep(req, sqe);
 	case IORING_OP_LINKAT:
 		return io_linkat_prep(req, sqe);
+	case IORING_OP_MKNODAT:
+		return io_mknodat_prep(req, sqe);
 	}
 
 	printk_once(KERN_WARNING "io_uring: unhandled opcode %d\n",
@@ -6252,6 +6302,9 @@ static void io_clean_op(struct io_kiocb *req)
 			putname(req->hardlink.oldpath);
 			putname(req->hardlink.newpath);
 			break;
+		case IORING_OP_MKNODAT:
+			putname(req->mknod.filename);
+			break;
 		}
 		req->flags &= ~REQ_F_NEED_CLEANUP;
 	}
@@ -6387,6 +6440,9 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
 	case IORING_OP_LINKAT:
 		ret = io_linkat(req, issue_flags);
 		break;
+	case IORING_OP_MKNODAT:
+		ret = io_mknodat(req, issue_flags);
+		break;
 	default:
 		ret = -EINVAL;
 		break;
diff --git a/fs/namei.c b/fs/namei.c
index b85e457c43b7..a4b1848b5dd1 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3743,7 +3743,7 @@ static int may_mknod(umode_t mode)
 	}
 }
 
-static int do_mknodat(int dfd, struct filename *name, umode_t mode,
+int do_mknodat(int dfd, struct filename *name, umode_t mode,
 		unsigned int dev)
 {
 	struct user_namespace *mnt_userns;
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 510e64a0a9c3..824b37f53a28 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -45,6 +45,7 @@ struct io_uring_sqe {
 		__u32		rename_flags;
 		__u32		unlink_flags;
 		__u32		hardlink_flags;
+		__u32		mknod_dev;
 	};
 	__u64	user_data;	/* data to be passed back at completion time */
 	union {
@@ -141,6 +142,7 @@ enum {
 	IORING_OP_MKDIRAT,
 	IORING_OP_SYMLINKAT,
 	IORING_OP_LINKAT,
+	IORING_OP_MKNODAT,
 
 	/* this goes last, obviously */
 	IORING_OP_LAST,
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [PATCH v5 00/10] io_uring: add mkdir, [sym]linkat and mknodat support
  2021-06-03  5:18 [PATCH v5 00/10] io_uring: add mkdir, [sym]linkat and mknodat support Dmitry Kadashev
                   ` (9 preceding siblings ...)
  2021-06-03  5:18 ` [PATCH v5 10/10] io_uring: add support for IORING_OP_MKNODAT Dmitry Kadashev
@ 2021-06-18  6:24 ` Dmitry Kadashev
  2021-06-18 16:10   ` Jens Axboe
  2021-06-21 15:57 ` Jens Axboe
  2021-06-22 11:56 ` Pavel Begunkov
  12 siblings, 1 reply; 53+ messages in thread
From: Dmitry Kadashev @ 2021-06-18  6:24 UTC (permalink / raw)
  To: Jens Axboe, Alexander Viro, Christian Brauner
  Cc: Pavel Begunkov, linux-fsdevel, io-uring

On Thu, Jun 3, 2021 at 12:18 PM Dmitry Kadashev <[email protected]> wrote:
>
> This started out as an attempt to add mkdirat support to io_uring which
> is heavily based on renameat() / unlinkat() support.
>
> During the review process more operations were added (linkat, symlinkat,
> mknodat) mainly to keep things uniform internally (in namei.c), and
> with things changed in namei.c adding support for these operations to
> io_uring is trivial, so that was done too. See
> https://lore.kernel.org/io-uring/20210514145259.wtl4xcsp52woi6ab@wittgenstein/

Ping. Jens, are we waiting for the audit change to be merged before this
can go in?

-- 
Dmitry Kadashev

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v5 00/10] io_uring: add mkdir, [sym]linkat and mknodat support
  2021-06-18  6:24 ` [PATCH v5 00/10] io_uring: add mkdir, [sym]linkat and mknodat support Dmitry Kadashev
@ 2021-06-18 16:10   ` Jens Axboe
  2021-06-21 15:21     ` Jens Axboe
  0 siblings, 1 reply; 53+ messages in thread
From: Jens Axboe @ 2021-06-18 16:10 UTC (permalink / raw)
  To: Dmitry Kadashev, Alexander Viro, Christian Brauner
  Cc: Pavel Begunkov, linux-fsdevel, io-uring

On 6/18/21 12:24 AM, Dmitry Kadashev wrote:
> On Thu, Jun 3, 2021 at 12:18 PM Dmitry Kadashev <[email protected]> wrote:
>>
>> This started out as an attempt to add mkdirat support to io_uring which
>> is heavily based on renameat() / unlinkat() support.
>>
>> During the review process more operations were added (linkat, symlinkat,
>> mknodat) mainly to keep things uniform internally (in namei.c), and
>> with things changed in namei.c adding support for these operations to
>> io_uring is trivial, so that was done too. See
>> https://lore.kernel.org/io-uring/20210514145259.wtl4xcsp52woi6ab@wittgenstein/
> 
> Ping. Jens, are we waiting for the audit change to be merged before this
> can go in?

Not necessarily, as that should go in for 5.14 anyway.

Al, are you OK with the generic changes?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v5 00/10] io_uring: add mkdir, [sym]linkat and mknodat support
  2021-06-18 16:10   ` Jens Axboe
@ 2021-06-21 15:21     ` Jens Axboe
  2021-06-22  8:12       ` Christian Brauner
  2021-06-22  8:26       ` Dmitry Kadashev
  0 siblings, 2 replies; 53+ messages in thread
From: Jens Axboe @ 2021-06-21 15:21 UTC (permalink / raw)
  To: Dmitry Kadashev, Alexander Viro, Christian Brauner
  Cc: Pavel Begunkov, linux-fsdevel, io-uring

On 6/18/21 10:10 AM, Jens Axboe wrote:
> On 6/18/21 12:24 AM, Dmitry Kadashev wrote:
>> On Thu, Jun 3, 2021 at 12:18 PM Dmitry Kadashev <[email protected]> wrote:
>>>
>>> This started out as an attempt to add mkdirat support to io_uring which
>>> is heavily based on renameat() / unlinkat() support.
>>>
>>> During the review process more operations were added (linkat, symlinkat,
>>> mknodat) mainly to keep things uniform internally (in namei.c), and
>>> with things changed in namei.c adding support for these operations to
>>> io_uring is trivial, so that was done too. See
>>> https://lore.kernel.org/io-uring/20210514145259.wtl4xcsp52woi6ab@wittgenstein/
>>
>> Ping. Jens, are we waiting for the audit change to be merged before this
>> can go in?
> 
> Not necessarily, as that should go in for 5.14 anyway.
> 
> Al, are you OK with the generic changes?

I have tentatively queued this up.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v5 00/10] io_uring: add mkdir, [sym]linkat and mknodat support
  2021-06-03  5:18 [PATCH v5 00/10] io_uring: add mkdir, [sym]linkat and mknodat support Dmitry Kadashev
                   ` (10 preceding siblings ...)
  2021-06-18  6:24 ` [PATCH v5 00/10] io_uring: add mkdir, [sym]linkat and mknodat support Dmitry Kadashev
@ 2021-06-21 15:57 ` Jens Axboe
  2021-06-21 15:59   ` Jens Axboe
  2021-06-22 11:56 ` Pavel Begunkov
  12 siblings, 1 reply; 53+ messages in thread
From: Jens Axboe @ 2021-06-21 15:57 UTC (permalink / raw)
  To: Dmitry Kadashev, Alexander Viro, Christian Brauner
  Cc: Pavel Begunkov, linux-fsdevel, io-uring

On 6/2/21 11:18 PM, Dmitry Kadashev wrote:
> This started out as an attempt to add mkdirat support to io_uring which
> is heavily based on renameat() / unlinkat() support.
> 
> During the review process more operations were added (linkat, symlinkat,
> mknodat) mainly to keep things uniform internally (in namei.c), and
> with things changed in namei.c adding support for these operations to
> io_uring is trivial, so that was done too. See
> https://lore.kernel.org/io-uring/20210514145259.wtl4xcsp52woi6ab@wittgenstein/
> 
> The first patch is preparation with no functional changes, makes
> do_mkdirat accept struct filename pointer rather than the user string.
> 
> The second one leverages that to implement mkdirat in io_uring.
> 
> 3-6 just convert other similar do_* functions in namei.c to accept
> struct filename, for uniformity with do_mkdirat, do_renameat and
> do_unlinkat. No functional changes there.
> 
> 7 changes do_* helpers in namei.c to return ints rather than some of
> them returning ints and some longs.
> 
> 8-10 add symlinkat, linkat and mknodat support to io_uring
> (correspondingly).
> 
> Based on for-5.14/io_uring.

Can you send in the liburing tests as well?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v5 00/10] io_uring: add mkdir, [sym]linkat and mknodat support
  2021-06-21 15:57 ` Jens Axboe
@ 2021-06-21 15:59   ` Jens Axboe
  0 siblings, 0 replies; 53+ messages in thread
From: Jens Axboe @ 2021-06-21 15:59 UTC (permalink / raw)
  To: Dmitry Kadashev, Alexander Viro, Christian Brauner
  Cc: Pavel Begunkov, linux-fsdevel, io-uring

On 6/21/21 9:57 AM, Jens Axboe wrote:
> On 6/2/21 11:18 PM, Dmitry Kadashev wrote:
>> This started out as an attempt to add mkdirat support to io_uring which
>> is heavily based on renameat() / unlinkat() support.
>>
>> During the review process more operations were added (linkat, symlinkat,
>> mknodat) mainly to keep things uniform internally (in namei.c), and
>> with things changed in namei.c adding support for these operations to
>> io_uring is trivial, so that was done too. See
>> https://lore.kernel.org/io-uring/20210514145259.wtl4xcsp52woi6ab@wittgenstein/
>>
>> The first patch is preparation with no functional changes, makes
>> do_mkdirat accept struct filename pointer rather than the user string.
>>
>> The second one leverages that to implement mkdirat in io_uring.
>>
>> 3-6 just convert other similar do_* functions in namei.c to accept
>> struct filename, for uniformity with do_mkdirat, do_renameat and
>> do_unlinkat. No functional changes there.
>>
>> 7 changes do_* helpers in namei.c to return ints rather than some of
>> them returning ints and some longs.
>>
>> 8-10 add symlinkat, linkat and mknodat support to io_uring
>> (correspondingly).
>>
>> Based on for-5.14/io_uring.
> 
> Can you send in the liburing tests as well?

Ah nevermind, you already did...

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v5 00/10] io_uring: add mkdir, [sym]linkat and mknodat support
  2021-06-21 15:21     ` Jens Axboe
@ 2021-06-22  8:12       ` Christian Brauner
  2021-06-22  8:34         ` Dmitry Kadashev
  2021-06-22 17:26         ` Jens Axboe
  2021-06-22  8:26       ` Dmitry Kadashev
  1 sibling, 2 replies; 53+ messages in thread
From: Christian Brauner @ 2021-06-22  8:12 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Dmitry Kadashev, Alexander Viro, Pavel Begunkov, linux-fsdevel,
	io-uring

On Mon, Jun 21, 2021 at 09:21:23AM -0600, Jens Axboe wrote:
> On 6/18/21 10:10 AM, Jens Axboe wrote:
> > On 6/18/21 12:24 AM, Dmitry Kadashev wrote:
> >> On Thu, Jun 3, 2021 at 12:18 PM Dmitry Kadashev <[email protected]> wrote:
> >>>
> >>> This started out as an attempt to add mkdirat support to io_uring which
> >>> is heavily based on renameat() / unlinkat() support.
> >>>
> >>> During the review process more operations were added (linkat, symlinkat,
> >>> mknodat) mainly to keep things uniform internally (in namei.c), and
> >>> with things changed in namei.c adding support for these operations to
> >>> io_uring is trivial, so that was done too. See
> >>> https://lore.kernel.org/io-uring/20210514145259.wtl4xcsp52woi6ab@wittgenstein/
> >>
> >> Ping. Jens, are we waiting for the audit change to be merged before this
> >> can go in?
> > 
> > Not necessarily, as that should go in for 5.14 anyway.
> > 
> > Al, are you OK with the generic changes?
> 
> I have tentatively queued this up.

Hey Dmitry,
hey Jens,

The additional op codes and suggested rework is partially on me. So I
should share the blame in case this gets a NAK:

Acked-by: Christian Brauner <[email protected]>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v5 00/10] io_uring: add mkdir, [sym]linkat and mknodat support
  2021-06-21 15:21     ` Jens Axboe
  2021-06-22  8:12       ` Christian Brauner
@ 2021-06-22  8:26       ` Dmitry Kadashev
  1 sibling, 0 replies; 53+ messages in thread
From: Dmitry Kadashev @ 2021-06-22  8:26 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Alexander Viro, Christian Brauner, Pavel Begunkov, linux-fsdevel,
	io-uring

On Mon, Jun 21, 2021 at 10:21 PM Jens Axboe <[email protected]> wrote:
>
> I have tentatively queued this up.

Tentative "woohoo!"

-- 
Dmitry Kadashev

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v5 00/10] io_uring: add mkdir, [sym]linkat and mknodat support
  2021-06-22  8:12       ` Christian Brauner
@ 2021-06-22  8:34         ` Dmitry Kadashev
  2021-06-29 13:06           ` Christian Brauner
  2021-06-22 17:26         ` Jens Axboe
  1 sibling, 1 reply; 53+ messages in thread
From: Dmitry Kadashev @ 2021-06-22  8:34 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Jens Axboe, Alexander Viro, Pavel Begunkov, linux-fsdevel,
	io-uring

On Tue, Jun 22, 2021 at 3:12 PM Christian Brauner
<[email protected]> wrote:
>
> On Mon, Jun 21, 2021 at 09:21:23AM -0600, Jens Axboe wrote:
> > On 6/18/21 10:10 AM, Jens Axboe wrote:
> > > On 6/18/21 12:24 AM, Dmitry Kadashev wrote:
> > >> On Thu, Jun 3, 2021 at 12:18 PM Dmitry Kadashev <[email protected]> wrote:
> > >>>
> > >>> This started out as an attempt to add mkdirat support to io_uring which
> > >>> is heavily based on renameat() / unlinkat() support.
> > >>>
> > >>> During the review process more operations were added (linkat, symlinkat,
> > >>> mknodat) mainly to keep things uniform internally (in namei.c), and
> > >>> with things changed in namei.c adding support for these operations to
> > >>> io_uring is trivial, so that was done too. See
> > >>> https://lore.kernel.org/io-uring/20210514145259.wtl4xcsp52woi6ab@wittgenstein/
> > >>
> > >> Ping. Jens, are we waiting for the audit change to be merged before this
> > >> can go in?
> > >
> > > Not necessarily, as that should go in for 5.14 anyway.
> > >
> > > Al, are you OK with the generic changes?
> >
> > I have tentatively queued this up.
>
> Hey Dmitry,
> hey Jens,
>
> The additional op codes and suggested rework is partially on me. So I
> should share the blame in case this gets a NAK:
>
> Acked-by: Christian Brauner <[email protected]>

Hi Christian,

Thank you very much for your help with this! Just wanted to say that you
were quite supportive and really helped me a lot.

-- 
Dmitry

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v5 08/10] io_uring: add support for IORING_OP_SYMLINKAT
  2021-06-03  5:18 ` [PATCH v5 08/10] io_uring: add support for IORING_OP_SYMLINKAT Dmitry Kadashev
@ 2021-06-22 11:36   ` Pavel Begunkov
  2021-06-23  5:45     ` Dmitry Kadashev
  0 siblings, 1 reply; 53+ messages in thread
From: Pavel Begunkov @ 2021-06-22 11:36 UTC (permalink / raw)
  To: Dmitry Kadashev, Jens Axboe, Alexander Viro, Christian Brauner
  Cc: linux-fsdevel, io-uring

On 6/3/21 6:18 AM, Dmitry Kadashev wrote:
> IORING_OP_SYMLINKAT behaves like symlinkat(2) and takes the same flags
> and arguments.
> 
> Suggested-by: Christian Brauner <[email protected]>
> Link: https://lore.kernel.org/io-uring/20210514145259.wtl4xcsp52woi6ab@wittgenstein/
> Signed-off-by: Dmitry Kadashev <[email protected]>
> ---
>  fs/internal.h                 |  1 +
>  fs/io_uring.c                 | 64 ++++++++++++++++++++++++++++++++++-
>  fs/namei.c                    |  3 +-
>  include/uapi/linux/io_uring.h |  1 +
>  4 files changed, 66 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/internal.h b/fs/internal.h
> index 207a455e32d3..3b3954214385 100644

[...]

>  
>  static bool io_disarm_next(struct io_kiocb *req);
> @@ -3572,7 +3581,51 @@ static int io_mkdirat(struct io_kiocb *req, int issue_flags)
>  
>  	req->flags &= ~REQ_F_NEED_CLEANUP;
>  	if (ret < 0)
> -		req_set_fail_links(req);
> +		req_set_fail(req);

This means one of the previous patches doesn't compile. Let's fix it.

> +	io_req_complete(req, ret);
> +	return 0;
> +}
> +

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v5 02/10] io_uring: add support for IORING_OP_MKDIRAT
  2021-06-03  5:18 ` [PATCH v5 02/10] io_uring: add support for IORING_OP_MKDIRAT Dmitry Kadashev
@ 2021-06-22 11:41   ` Pavel Begunkov
  2021-06-22 11:50     ` Pavel Begunkov
  2021-06-22 17:41   ` Pavel Begunkov
  1 sibling, 1 reply; 53+ messages in thread
From: Pavel Begunkov @ 2021-06-22 11:41 UTC (permalink / raw)
  To: Dmitry Kadashev, Jens Axboe, Alexander Viro, Christian Brauner
  Cc: linux-fsdevel, io-uring

On 6/3/21 6:18 AM, Dmitry Kadashev wrote:
> IORING_OP_MKDIRAT behaves like mkdirat(2) and takes the same flags
> and arguments.
> 
> Signed-off-by: Dmitry Kadashev <[email protected]>
> ---
>  fs/io_uring.c                 | 55 +++++++++++++++++++++++++++++++++++
>  include/uapi/linux/io_uring.h |  1 +
>  2 files changed, 56 insertions(+)
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index a1ca6badff36..8ab4eb559520 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -665,6 +665,13 @@ struct io_unlink {
>  	struct filename			*filename;
>  };
>  
> +struct io_mkdir {
> +	struct file			*file;
> +	int				dfd;
> +	umode_t				mode;
> +	struct filename			*filename;
> +};
> +
>  struct io_completion {
>  	struct file			*file;
>  	struct list_head		list;
> @@ -809,6 +816,7 @@ struct io_kiocb {
>  		struct io_shutdown	shutdown;
>  		struct io_rename	rename;
>  		struct io_unlink	unlink;
> +		struct io_mkdir		mkdir;
>  		/* use only after cleaning per-op data, see io_clean_op() */
>  		struct io_completion	compl;
>  	};
> @@ -1021,6 +1029,7 @@ static const struct io_op_def io_op_defs[] = {
>  	},
>  	[IORING_OP_RENAMEAT] = {},
>  	[IORING_OP_UNLINKAT] = {},
> +	[IORING_OP_MKDIRAT] = {},
>  };
>  
>  static bool io_disarm_next(struct io_kiocb *req);
> @@ -3530,6 +3539,44 @@ static int io_unlinkat(struct io_kiocb *req, unsigned int issue_flags)
>  	return 0;
>  }
>  
> +static int io_mkdirat_prep(struct io_kiocb *req,
> +			    const struct io_uring_sqe *sqe)
> +{
> +	struct io_mkdir *mkd = &req->mkdir;
> +	const char __user *fname;
> +
> +	if (unlikely(req->flags & REQ_F_FIXED_FILE))
> +		return -EBADF;
> +
> +	mkd->dfd = READ_ONCE(sqe->fd);
> +	mkd->mode = READ_ONCE(sqe->len);
> +
> +	fname = u64_to_user_ptr(READ_ONCE(sqe->addr));
> +	mkd->filename = getname(fname);
> +	if (IS_ERR(mkd->filename))
> +		return PTR_ERR(mkd->filename);

We have to check unused fields, e.g. buf_index and off,
to be able to use them in the future if needed. 

if (sqe->buf_index || sqe->off)
	return -EINVAL;

Please double check what fields are not used, and
same goes for all other opcodes.

> +
> +	req->flags |= REQ_F_NEED_CLEANUP;
> +	return 0;
> +}
> +
> +static int io_mkdirat(struct io_kiocb *req, int issue_flags)
> +{
> +	struct io_mkdir *mkd = &req->mkdir;
> +	int ret;
> +
> +	if (issue_flags & IO_URING_F_NONBLOCK)
> +		return -EAGAIN;
> +
> +	ret = do_mkdirat(mkd->dfd, mkd->filename, mkd->mode);
> +
> +	req->flags &= ~REQ_F_NEED_CLEANUP;
> +	if (ret < 0)
> +		req_set_fail_links(req);
> +	io_req_complete(req, ret);
> +	return 0;
> +}
> +
>  static int io_shutdown_prep(struct io_kiocb *req,
>  			    const struct io_uring_sqe *sqe)
>  {
> @@ -5936,6 +5983,8 @@ static int io_req_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>  		return io_renameat_prep(req, sqe);
>  	case IORING_OP_UNLINKAT:
>  		return io_unlinkat_prep(req, sqe);
> +	case IORING_OP_MKDIRAT:
> +		return io_mkdirat_prep(req, sqe);
>  	}
>  
>  	printk_once(KERN_WARNING "io_uring: unhandled opcode %d\n",
> @@ -6077,6 +6126,9 @@ static void io_clean_op(struct io_kiocb *req)
>  		case IORING_OP_UNLINKAT:
>  			putname(req->unlink.filename);
>  			break;
> +		case IORING_OP_MKDIRAT:
> +			putname(req->mkdir.filename);
> +			break;
>  		}
>  		req->flags &= ~REQ_F_NEED_CLEANUP;
>  	}
> @@ -6203,6 +6255,9 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
>  	case IORING_OP_UNLINKAT:
>  		ret = io_unlinkat(req, issue_flags);
>  		break;
> +	case IORING_OP_MKDIRAT:
> +		ret = io_mkdirat(req, issue_flags);
> +		break;
>  	default:
>  		ret = -EINVAL;
>  		break;
> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
> index e1ae46683301..bf9d720d371f 100644
> --- a/include/uapi/linux/io_uring.h
> +++ b/include/uapi/linux/io_uring.h
> @@ -137,6 +137,7 @@ enum {
>  	IORING_OP_SHUTDOWN,
>  	IORING_OP_RENAMEAT,
>  	IORING_OP_UNLINKAT,
> +	IORING_OP_MKDIRAT,
>  
>  	/* this goes last, obviously */
>  	IORING_OP_LAST,
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v5 09/10] io_uring: add support for IORING_OP_LINKAT
  2021-06-03  5:18 ` [PATCH v5 09/10] io_uring: add support for IORING_OP_LINKAT Dmitry Kadashev
@ 2021-06-22 11:48   ` Pavel Begunkov
  2021-06-23  6:09     ` Dmitry Kadashev
  0 siblings, 1 reply; 53+ messages in thread
From: Pavel Begunkov @ 2021-06-22 11:48 UTC (permalink / raw)
  To: Dmitry Kadashev, Jens Axboe, Alexander Viro, Christian Brauner
  Cc: linux-fsdevel, io-uring

On 6/3/21 6:18 AM, Dmitry Kadashev wrote:
> IORING_OP_LINKAT behaves like linkat(2) and takes the same flags and
> arguments.
> 
> In some internal places 'hardlink' is used instead of 'link' to avoid
> confusion with the SQE links. Name 'link' conflicts with the existing
> 'link' member of io_kiocb.
> 
> Suggested-by: Christian Brauner <[email protected]>
> Link: https://lore.kernel.org/io-uring/20210514145259.wtl4xcsp52woi6ab@wittgenstein/
> Signed-off-by: Dmitry Kadashev <[email protected]>
> ---
>  fs/internal.h                 |  2 ++
>  fs/io_uring.c                 | 67 +++++++++++++++++++++++++++++++++++
>  fs/namei.c                    |  2 +-
>  include/uapi/linux/io_uring.h |  2 ++
>  4 files changed, 72 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/internal.h b/fs/internal.h
> index 3b3954214385..15a7d210cc67 100644
> --- a/fs/internal.h
> +++ b/fs/internal.h

[...]
> +
> +static int io_linkat(struct io_kiocb *req, int issue_flags)
> +{
> +	struct io_hardlink *lnk = &req->hardlink;
> +	int ret;
> +
> +	if (issue_flags & IO_URING_F_NONBLOCK)
> +		return -EAGAIN;
> +
> +	ret = do_linkat(lnk->old_dfd, lnk->oldpath, lnk->new_dfd,
> +				lnk->newpath, lnk->flags);

I'm curious, what's difference b/w SYMLINK and just LINK that
one doesn't use old_dfd and another does? Can it be
supported/wished by someone in the future? In that case I'd rather
reserve and verify a field for old_dfd for both, even if one
won't really support it for now.

> +
> +	req->flags &= ~REQ_F_NEED_CLEANUP;
> +	if (ret < 0)
> +		req_set_fail(req);
> +	io_req_complete(req, ret);
> +	return 0;
> +}
> +

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v5 02/10] io_uring: add support for IORING_OP_MKDIRAT
  2021-06-22 11:41   ` Pavel Begunkov
@ 2021-06-22 11:50     ` Pavel Begunkov
  2021-06-23  6:41       ` Dmitry Kadashev
  0 siblings, 1 reply; 53+ messages in thread
From: Pavel Begunkov @ 2021-06-22 11:50 UTC (permalink / raw)
  To: Dmitry Kadashev, Jens Axboe, Alexander Viro, Christian Brauner
  Cc: linux-fsdevel, io-uring

On 6/22/21 12:41 PM, Pavel Begunkov wrote:
> On 6/3/21 6:18 AM, Dmitry Kadashev wrote:
>> IORING_OP_MKDIRAT behaves like mkdirat(2) and takes the same flags
>> and arguments.
>>
>> Signed-off-by: Dmitry Kadashev <[email protected]>
>> ---
>>  fs/io_uring.c                 | 55 +++++++++++++++++++++++++++++++++++
>>  include/uapi/linux/io_uring.h |  1 +
>>  2 files changed, 56 insertions(+)
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index a1ca6badff36..8ab4eb559520 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -665,6 +665,13 @@ struct io_unlink {
>>  	struct filename			*filename;
>>  };
>>  
>> +struct io_mkdir {
>> +	struct file			*file;
>> +	int				dfd;
>> +	umode_t				mode;
>> +	struct filename			*filename;
>> +};
>> +
>>  struct io_completion {
>>  	struct file			*file;
>>  	struct list_head		list;
>> @@ -809,6 +816,7 @@ struct io_kiocb {
>>  		struct io_shutdown	shutdown;
>>  		struct io_rename	rename;
>>  		struct io_unlink	unlink;
>> +		struct io_mkdir		mkdir;
>>  		/* use only after cleaning per-op data, see io_clean_op() */
>>  		struct io_completion	compl;
>>  	};
>> @@ -1021,6 +1029,7 @@ static const struct io_op_def io_op_defs[] = {
>>  	},
>>  	[IORING_OP_RENAMEAT] = {},
>>  	[IORING_OP_UNLINKAT] = {},
>> +	[IORING_OP_MKDIRAT] = {},
>>  };
>>  
>>  static bool io_disarm_next(struct io_kiocb *req);
>> @@ -3530,6 +3539,44 @@ static int io_unlinkat(struct io_kiocb *req, unsigned int issue_flags)
>>  	return 0;
>>  }
>>  
>> +static int io_mkdirat_prep(struct io_kiocb *req,
>> +			    const struct io_uring_sqe *sqe)
>> +{
>> +	struct io_mkdir *mkd = &req->mkdir;
>> +	const char __user *fname;
>> +
>> +	if (unlikely(req->flags & REQ_F_FIXED_FILE))
>> +		return -EBADF;
>> +
>> +	mkd->dfd = READ_ONCE(sqe->fd);
>> +	mkd->mode = READ_ONCE(sqe->len);
>> +
>> +	fname = u64_to_user_ptr(READ_ONCE(sqe->addr));
>> +	mkd->filename = getname(fname);
>> +	if (IS_ERR(mkd->filename))
>> +		return PTR_ERR(mkd->filename);
> 
> We have to check unused fields, e.g. buf_index and off,
> to be able to use them in the future if needed. 
> 
> if (sqe->buf_index || sqe->off)
> 	return -EINVAL;
> 
> Please double check what fields are not used, and
> same goes for all other opcodes.

+ opcode specific flags, e.g.

if (sqe->rw_flags)
	return -EINVAL;

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v5 10/10] io_uring: add support for IORING_OP_MKNODAT
  2021-06-03  5:18 ` [PATCH v5 10/10] io_uring: add support for IORING_OP_MKNODAT Dmitry Kadashev
@ 2021-06-22 11:52   ` Pavel Begunkov
  2021-06-23  6:26     ` Dmitry Kadashev
  0 siblings, 1 reply; 53+ messages in thread
From: Pavel Begunkov @ 2021-06-22 11:52 UTC (permalink / raw)
  To: Dmitry Kadashev, Jens Axboe, Alexander Viro, Christian Brauner
  Cc: linux-fsdevel, io-uring

On 6/3/21 6:18 AM, Dmitry Kadashev wrote:
> IORING_OP_MKNODAT behaves like mknodat(2) and takes the same flags and
> arguments.
> 
> Suggested-by: Christian Brauner <[email protected]>
> Link: https://lore.kernel.org/io-uring/20210514145259.wtl4xcsp52woi6ab@wittgenstein/
> Signed-off-by: Dmitry Kadashev <[email protected]>
> ---
>  fs/internal.h                 |  2 ++
>  fs/io_uring.c                 | 56 +++++++++++++++++++++++++++++++++++
>  fs/namei.c                    |  2 +-
>  include/uapi/linux/io_uring.h |  2 ++
>  4 files changed, 61 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/internal.h b/fs/internal.h
> index 15a7d210cc67..c6fb9974006f 100644

[...]

>  static bool io_disarm_next(struct io_kiocb *req);
> @@ -3687,6 +3697,44 @@ static int io_linkat(struct io_kiocb *req, int issue_flags)
>  	io_req_complete(req, ret);
>  	return 0;
>  }
> +static int io_mknodat_prep(struct io_kiocb *req,
> +			    const struct io_uring_sqe *sqe)
> +{
> +	struct io_mknod *mkn = &req->mknod;
> +	const char __user *fname;
> +
> +	if (unlikely(req->flags & REQ_F_FIXED_FILE))
> +		return -EBADF;

IOPOLL won't support it, and the check is missing.
Probably for other opcodes as well.

if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
	return -EINVAL;

> +
> +	mkn->dfd = READ_ONCE(sqe->fd);
> +	mkn->mode = READ_ONCE(sqe->len);
> +	fname = u64_to_user_ptr(READ_ONCE(sqe->addr));
> +	mkn->dev = READ_ONCE(sqe->mknod_dev);
> +
> +	mkn->filename = getname(fname);
> +	if (IS_ERR(mkn->filename))
> +		return PTR_ERR(mkn->filename);
> +
> +	req->flags |= REQ_F_NEED_CLEANUP;
> +	return 0;
> +}

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v5 00/10] io_uring: add mkdir, [sym]linkat and mknodat support
  2021-06-03  5:18 [PATCH v5 00/10] io_uring: add mkdir, [sym]linkat and mknodat support Dmitry Kadashev
                   ` (11 preceding siblings ...)
  2021-06-21 15:57 ` Jens Axboe
@ 2021-06-22 11:56 ` Pavel Begunkov
  2021-06-22 17:26   ` Jens Axboe
  2021-06-23  5:35   ` Dmitry Kadashev
  12 siblings, 2 replies; 53+ messages in thread
From: Pavel Begunkov @ 2021-06-22 11:56 UTC (permalink / raw)
  To: Dmitry Kadashev, Jens Axboe, Alexander Viro, Christian Brauner
  Cc: linux-fsdevel, io-uring

On 6/3/21 6:18 AM, Dmitry Kadashev wrote:
> This started out as an attempt to add mkdirat support to io_uring which
> is heavily based on renameat() / unlinkat() support.
> 
> During the review process more operations were added (linkat, symlinkat,
> mknodat) mainly to keep things uniform internally (in namei.c), and
> with things changed in namei.c adding support for these operations to
> io_uring is trivial, so that was done too. See
> https://lore.kernel.org/io-uring/20210514145259.wtl4xcsp52woi6ab@wittgenstein/

io_uring part looks good in general, just small comments. However, I
believe we should respin it, because there should be build problems
in the middle.


> The first patch is preparation with no functional changes, makes
> do_mkdirat accept struct filename pointer rather than the user string.
> 
> The second one leverages that to implement mkdirat in io_uring.
> 
> 3-6 just convert other similar do_* functions in namei.c to accept
> struct filename, for uniformity with do_mkdirat, do_renameat and
> do_unlinkat. No functional changes there.
> 
> 7 changes do_* helpers in namei.c to return ints rather than some of
> them returning ints and some longs.
> 
> 8-10 add symlinkat, linkat and mknodat support to io_uring
> (correspondingly).
> 
> Based on for-5.14/io_uring.
> 
> v5:
> - rebase
> - add symlinkat, linkat and mknodat support to io_uring
> 
> v4:
> - update do_mknodat, do_symlinkat and do_linkat to accept struct
>   filename for uniformity with do_mkdirat, do_renameat and do_unlinkat;
> 
> v3:
> - rebase;
> 
> v2:
> - do not mess with struct filename's refcount in do_mkdirat, instead add
>   and use __filename_create() that does not drop the name on success;
> 
> 
> Dmitry Kadashev (10):
>   fs: make do_mkdirat() take struct filename
>   io_uring: add support for IORING_OP_MKDIRAT
>   fs: make do_mknodat() take struct filename
>   fs: make do_symlinkat() take struct filename
>   namei: add getname_uflags()
>   fs: make do_linkat() take struct filename
>   fs: update do_*() helpers to return ints
>   io_uring: add support for IORING_OP_SYMLINKAT
>   io_uring: add support for IORING_OP_LINKAT
>   io_uring: add support for IORING_OP_MKNODAT
> 
>  fs/exec.c                     |   8 +-
>  fs/internal.h                 |  10 +-
>  fs/io_uring.c                 | 240 ++++++++++++++++++++++++++++++++++
>  fs/namei.c                    | 137 ++++++++++++-------
>  include/linux/fs.h            |   1 +
>  include/uapi/linux/io_uring.h |   6 +
>  6 files changed, 349 insertions(+), 53 deletions(-)
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v5 00/10] io_uring: add mkdir, [sym]linkat and mknodat support
  2021-06-22 11:56 ` Pavel Begunkov
@ 2021-06-22 17:26   ` Jens Axboe
  2021-06-22 17:28     ` Pavel Begunkov
  2021-06-23  5:35   ` Dmitry Kadashev
  1 sibling, 1 reply; 53+ messages in thread
From: Jens Axboe @ 2021-06-22 17:26 UTC (permalink / raw)
  To: Pavel Begunkov, Dmitry Kadashev, Alexander Viro,
	Christian Brauner
  Cc: linux-fsdevel, io-uring

On 6/22/21 5:56 AM, Pavel Begunkov wrote:
> On 6/3/21 6:18 AM, Dmitry Kadashev wrote:
>> This started out as an attempt to add mkdirat support to io_uring which
>> is heavily based on renameat() / unlinkat() support.
>>
>> During the review process more operations were added (linkat, symlinkat,
>> mknodat) mainly to keep things uniform internally (in namei.c), and
>> with things changed in namei.c adding support for these operations to
>> io_uring is trivial, so that was done too. See
>> https://lore.kernel.org/io-uring/20210514145259.wtl4xcsp52woi6ab@wittgenstein/
> 
> io_uring part looks good in general, just small comments. However, I
> believe we should respin it, because there should be build problems
> in the middle.

I can drop it, if Dmitry wants to respin. I do think that we could
easily drop mknodat and not really lose anything there, better to
reserve the op for something a bit more useful.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v5 00/10] io_uring: add mkdir, [sym]linkat and mknodat support
  2021-06-22  8:12       ` Christian Brauner
  2021-06-22  8:34         ` Dmitry Kadashev
@ 2021-06-22 17:26         ` Jens Axboe
  1 sibling, 0 replies; 53+ messages in thread
From: Jens Axboe @ 2021-06-22 17:26 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Dmitry Kadashev, Alexander Viro, Pavel Begunkov, linux-fsdevel,
	io-uring

On 6/22/21 2:12 AM, Christian Brauner wrote:
> On Mon, Jun 21, 2021 at 09:21:23AM -0600, Jens Axboe wrote:
>> On 6/18/21 10:10 AM, Jens Axboe wrote:
>>> On 6/18/21 12:24 AM, Dmitry Kadashev wrote:
>>>> On Thu, Jun 3, 2021 at 12:18 PM Dmitry Kadashev <[email protected]> wrote:
>>>>>
>>>>> This started out as an attempt to add mkdirat support to io_uring which
>>>>> is heavily based on renameat() / unlinkat() support.
>>>>>
>>>>> During the review process more operations were added (linkat, symlinkat,
>>>>> mknodat) mainly to keep things uniform internally (in namei.c), and
>>>>> with things changed in namei.c adding support for these operations to
>>>>> io_uring is trivial, so that was done too. See
>>>>> https://lore.kernel.org/io-uring/20210514145259.wtl4xcsp52woi6ab@wittgenstein/
>>>>
>>>> Ping. Jens, are we waiting for the audit change to be merged before this
>>>> can go in?
>>>
>>> Not necessarily, as that should go in for 5.14 anyway.
>>>
>>> Al, are you OK with the generic changes?
>>
>> I have tentatively queued this up.
> 
> Hey Dmitry,
> hey Jens,
> 
> The additional op codes and suggested rework is partially on me. So I
> should share the blame in case this gets a NAK:
> 
> Acked-by: Christian Brauner <[email protected]>

Thanks Christian!

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v5 00/10] io_uring: add mkdir, [sym]linkat and mknodat support
  2021-06-22 17:26   ` Jens Axboe
@ 2021-06-22 17:28     ` Pavel Begunkov
  2021-06-22 17:32       ` Jens Axboe
  0 siblings, 1 reply; 53+ messages in thread
From: Pavel Begunkov @ 2021-06-22 17:28 UTC (permalink / raw)
  To: Jens Axboe, Dmitry Kadashev, Alexander Viro, Christian Brauner
  Cc: linux-fsdevel, io-uring

On 6/22/21 6:26 PM, Jens Axboe wrote:
> On 6/22/21 5:56 AM, Pavel Begunkov wrote:
>> On 6/3/21 6:18 AM, Dmitry Kadashev wrote:
>>> This started out as an attempt to add mkdirat support to io_uring which
>>> is heavily based on renameat() / unlinkat() support.
>>>
>>> During the review process more operations were added (linkat, symlinkat,
>>> mknodat) mainly to keep things uniform internally (in namei.c), and
>>> with things changed in namei.c adding support for these operations to
>>> io_uring is trivial, so that was done too. See
>>> https://lore.kernel.org/io-uring/20210514145259.wtl4xcsp52woi6ab@wittgenstein/
>>
>> io_uring part looks good in general, just small comments. However, I
>> believe we should respin it, because there should be build problems
>> in the middle.
> 
> I can drop it, if Dmitry wants to respin. I do think that we could
> easily drop mknodat and not really lose anything there, better to
> reserve the op for something a bit more useful.

I can try it and send a fold in, if you want.
Other changes may be on top

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v5 00/10] io_uring: add mkdir, [sym]linkat and mknodat support
  2021-06-22 17:28     ` Pavel Begunkov
@ 2021-06-22 17:32       ` Jens Axboe
  2021-06-23  5:37         ` Dmitry Kadashev
  2021-06-23  5:49         ` Dmitry Kadashev
  0 siblings, 2 replies; 53+ messages in thread
From: Jens Axboe @ 2021-06-22 17:32 UTC (permalink / raw)
  To: Pavel Begunkov, Dmitry Kadashev, Alexander Viro,
	Christian Brauner
  Cc: linux-fsdevel, io-uring

On 6/22/21 11:28 AM, Pavel Begunkov wrote:
> On 6/22/21 6:26 PM, Jens Axboe wrote:
>> On 6/22/21 5:56 AM, Pavel Begunkov wrote:
>>> On 6/3/21 6:18 AM, Dmitry Kadashev wrote:
>>>> This started out as an attempt to add mkdirat support to io_uring which
>>>> is heavily based on renameat() / unlinkat() support.
>>>>
>>>> During the review process more operations were added (linkat, symlinkat,
>>>> mknodat) mainly to keep things uniform internally (in namei.c), and
>>>> with things changed in namei.c adding support for these operations to
>>>> io_uring is trivial, so that was done too. See
>>>> https://lore.kernel.org/io-uring/20210514145259.wtl4xcsp52woi6ab@wittgenstein/
>>>
>>> io_uring part looks good in general, just small comments. However, I
>>> believe we should respin it, because there should be build problems
>>> in the middle.
>>
>> I can drop it, if Dmitry wants to respin. I do think that we could
>> easily drop mknodat and not really lose anything there, better to
>> reserve the op for something a bit more useful.
> 
> I can try it and send a fold in, if you want.
> Other changes may be on top

Sure that works too, will rebase in any case, and I'd like to add
Christian's ack as well. I'll just re-apply with the fold-ins.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v5 02/10] io_uring: add support for IORING_OP_MKDIRAT
  2021-06-03  5:18 ` [PATCH v5 02/10] io_uring: add support for IORING_OP_MKDIRAT Dmitry Kadashev
  2021-06-22 11:41   ` Pavel Begunkov
@ 2021-06-22 17:41   ` Pavel Begunkov
  2021-06-23  0:41     ` Jens Axboe
  2021-06-23  5:50     ` Dmitry Kadashev
  1 sibling, 2 replies; 53+ messages in thread
From: Pavel Begunkov @ 2021-06-22 17:41 UTC (permalink / raw)
  To: Dmitry Kadashev, Jens Axboe, Alexander Viro, Christian Brauner
  Cc: linux-fsdevel, io-uring

On 6/3/21 6:18 AM, Dmitry Kadashev wrote:
> IORING_OP_MKDIRAT behaves like mkdirat(2) and takes the same flags
> and arguments.

Jens, a fold-in er discussed, and it will get you
a conflict at 8/10


diff --git a/fs/io_uring.c b/fs/io_uring.c
index 4b215e0f8dd8..c0e469ebd22d 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -3589,7 +3589,7 @@ static int io_mkdirat(struct io_kiocb *req, int issue_flags)
 
 	req->flags &= ~REQ_F_NEED_CLEANUP;
 	if (ret < 0)
-		req_set_fail_links(req);
+		req_set_fail(req);
 	io_req_complete(req, ret);
 	return 0;
 }


> Signed-off-by: Dmitry Kadashev <[email protected]>
> ---
>  fs/io_uring.c                 | 55 +++++++++++++++++++++++++++++++++++
>  include/uapi/linux/io_uring.h |  1 +
>  2 files changed, 56 insertions(+)
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index a1ca6badff36..8ab4eb559520 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -665,6 +665,13 @@ struct io_unlink {
>  	struct filename			*filename;
>  };
>  
> +struct io_mkdir {
> +	struct file			*file;
> +	int				dfd;
> +	umode_t				mode;
> +	struct filename			*filename;
> +};
> +
>  struct io_completion {
>  	struct file			*file;
>  	struct list_head		list;
> @@ -809,6 +816,7 @@ struct io_kiocb {
>  		struct io_shutdown	shutdown;
>  		struct io_rename	rename;
>  		struct io_unlink	unlink;
> +		struct io_mkdir		mkdir;
>  		/* use only after cleaning per-op data, see io_clean_op() */
>  		struct io_completion	compl;
>  	};
> @@ -1021,6 +1029,7 @@ static const struct io_op_def io_op_defs[] = {
>  	},
>  	[IORING_OP_RENAMEAT] = {},
>  	[IORING_OP_UNLINKAT] = {},
> +	[IORING_OP_MKDIRAT] = {},
>  };
>  
>  static bool io_disarm_next(struct io_kiocb *req);
> @@ -3530,6 +3539,44 @@ static int io_unlinkat(struct io_kiocb *req, unsigned int issue_flags)
>  	return 0;
>  }
>  
> +static int io_mkdirat_prep(struct io_kiocb *req,
> +			    const struct io_uring_sqe *sqe)
> +{
> +	struct io_mkdir *mkd = &req->mkdir;
> +	const char __user *fname;
> +
> +	if (unlikely(req->flags & REQ_F_FIXED_FILE))
> +		return -EBADF;
> +
> +	mkd->dfd = READ_ONCE(sqe->fd);
> +	mkd->mode = READ_ONCE(sqe->len);
> +
> +	fname = u64_to_user_ptr(READ_ONCE(sqe->addr));
> +	mkd->filename = getname(fname);
> +	if (IS_ERR(mkd->filename))
> +		return PTR_ERR(mkd->filename);
> +
> +	req->flags |= REQ_F_NEED_CLEANUP;
> +	return 0;
> +}
> +
> +static int io_mkdirat(struct io_kiocb *req, int issue_flags)
> +{
> +	struct io_mkdir *mkd = &req->mkdir;
> +	int ret;
> +
> +	if (issue_flags & IO_URING_F_NONBLOCK)
> +		return -EAGAIN;
> +
> +	ret = do_mkdirat(mkd->dfd, mkd->filename, mkd->mode);
> +
> +	req->flags &= ~REQ_F_NEED_CLEANUP;
> +	if (ret < 0)
> +		req_set_fail_links(req);
> +	io_req_complete(req, ret);
> +	return 0;
> +}
> +
>  static int io_shutdown_prep(struct io_kiocb *req,
>  			    const struct io_uring_sqe *sqe)
>  {
> @@ -5936,6 +5983,8 @@ static int io_req_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>  		return io_renameat_prep(req, sqe);
>  	case IORING_OP_UNLINKAT:
>  		return io_unlinkat_prep(req, sqe);
> +	case IORING_OP_MKDIRAT:
> +		return io_mkdirat_prep(req, sqe);
>  	}
>  
>  	printk_once(KERN_WARNING "io_uring: unhandled opcode %d\n",
> @@ -6077,6 +6126,9 @@ static void io_clean_op(struct io_kiocb *req)
>  		case IORING_OP_UNLINKAT:
>  			putname(req->unlink.filename);
>  			break;
> +		case IORING_OP_MKDIRAT:
> +			putname(req->mkdir.filename);
> +			break;
>  		}
>  		req->flags &= ~REQ_F_NEED_CLEANUP;
>  	}
> @@ -6203,6 +6255,9 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
>  	case IORING_OP_UNLINKAT:
>  		ret = io_unlinkat(req, issue_flags);
>  		break;
> +	case IORING_OP_MKDIRAT:
> +		ret = io_mkdirat(req, issue_flags);
> +		break;
>  	default:
>  		ret = -EINVAL;
>  		break;
> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
> index e1ae46683301..bf9d720d371f 100644
> --- a/include/uapi/linux/io_uring.h
> +++ b/include/uapi/linux/io_uring.h
> @@ -137,6 +137,7 @@ enum {
>  	IORING_OP_SHUTDOWN,
>  	IORING_OP_RENAMEAT,
>  	IORING_OP_UNLINKAT,
> +	IORING_OP_MKDIRAT,
>  
>  	/* this goes last, obviously */
>  	IORING_OP_LAST,
> 

-- 
Pavel Begunkov

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [PATCH v5 02/10] io_uring: add support for IORING_OP_MKDIRAT
  2021-06-22 17:41   ` Pavel Begunkov
@ 2021-06-23  0:41     ` Jens Axboe
  2021-06-23  5:50     ` Dmitry Kadashev
  1 sibling, 0 replies; 53+ messages in thread
From: Jens Axboe @ 2021-06-23  0:41 UTC (permalink / raw)
  To: Pavel Begunkov, Dmitry Kadashev, Alexander Viro,
	Christian Brauner
  Cc: linux-fsdevel, io-uring

On 6/22/21 11:41 AM, Pavel Begunkov wrote:
> On 6/3/21 6:18 AM, Dmitry Kadashev wrote:
>> IORING_OP_MKDIRAT behaves like mkdirat(2) and takes the same flags
>> and arguments.
> 
> Jens, a fold-in er discussed, and it will get you
> a conflict at 8/10

Folded in, thanks.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v5 00/10] io_uring: add mkdir, [sym]linkat and mknodat support
  2021-06-22 11:56 ` Pavel Begunkov
  2021-06-22 17:26   ` Jens Axboe
@ 2021-06-23  5:35   ` Dmitry Kadashev
  2021-06-24  2:37     ` Jens Axboe
  1 sibling, 1 reply; 53+ messages in thread
From: Dmitry Kadashev @ 2021-06-23  5:35 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: Jens Axboe, Alexander Viro, Christian Brauner, linux-fsdevel,
	io-uring

On Tue, Jun 22, 2021 at 6:56 PM Pavel Begunkov <[email protected]> wrote:
>
> On 6/3/21 6:18 AM, Dmitry Kadashev wrote:
> > This started out as an attempt to add mkdirat support to io_uring which
> > is heavily based on renameat() / unlinkat() support.
> >
> > During the review process more operations were added (linkat, symlinkat,
> > mknodat) mainly to keep things uniform internally (in namei.c), and
> > with things changed in namei.c adding support for these operations to
> > io_uring is trivial, so that was done too. See
> > https://lore.kernel.org/io-uring/20210514145259.wtl4xcsp52woi6ab@wittgenstein/
>
> io_uring part looks good in general, just small comments. However, I
> believe we should respin it, because there should be build problems
> in the middle.

I knew my celebration was premature! :)

-- 
Dmitry Kadashev

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v5 00/10] io_uring: add mkdir, [sym]linkat and mknodat support
  2021-06-22 17:32       ` Jens Axboe
@ 2021-06-23  5:37         ` Dmitry Kadashev
  2021-06-23  5:49         ` Dmitry Kadashev
  1 sibling, 0 replies; 53+ messages in thread
From: Dmitry Kadashev @ 2021-06-23  5:37 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Pavel Begunkov, Alexander Viro, Christian Brauner, linux-fsdevel,
	io-uring

On Wed, Jun 23, 2021 at 12:32 AM Jens Axboe <[email protected]> wrote:
>
> On 6/22/21 11:28 AM, Pavel Begunkov wrote:
> > On 6/22/21 6:26 PM, Jens Axboe wrote:
> >> On 6/22/21 5:56 AM, Pavel Begunkov wrote:
> >>> On 6/3/21 6:18 AM, Dmitry Kadashev wrote:
> >>>> This started out as an attempt to add mkdirat support to io_uring which
> >>>> is heavily based on renameat() / unlinkat() support.
> >>>>
> >>>> During the review process more operations were added (linkat, symlinkat,
> >>>> mknodat) mainly to keep things uniform internally (in namei.c), and
> >>>> with things changed in namei.c adding support for these operations to
> >>>> io_uring is trivial, so that was done too. See
> >>>> https://lore.kernel.org/io-uring/20210514145259.wtl4xcsp52woi6ab@wittgenstein/
> >>>
> >>> io_uring part looks good in general, just small comments. However, I
> >>> believe we should respin it, because there should be build problems
> >>> in the middle.
> >>
> >> I can drop it, if Dmitry wants to respin. I do think that we could
> >> easily drop mknodat and not really lose anything there, better to
> >> reserve the op for something a bit more useful.
> >
> > I can try it and send a fold in, if you want.
> > Other changes may be on top
>
> Sure that works too, will rebase in any case, and I'd like to add
> Christian's ack as well. I'll just re-apply with the fold-ins.

Thanks Jens and Pavel, I'll process the comments and will send a follow-up
patchset.

-- 
Dmitry Kadashev

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v5 08/10] io_uring: add support for IORING_OP_SYMLINKAT
  2021-06-22 11:36   ` Pavel Begunkov
@ 2021-06-23  5:45     ` Dmitry Kadashev
  0 siblings, 0 replies; 53+ messages in thread
From: Dmitry Kadashev @ 2021-06-23  5:45 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: Jens Axboe, Alexander Viro, Christian Brauner, linux-fsdevel,
	io-uring

On Tue, Jun 22, 2021 at 6:37 PM Pavel Begunkov <[email protected]> wrote:
>
> On 6/3/21 6:18 AM, Dmitry Kadashev wrote:
> > IORING_OP_SYMLINKAT behaves like symlinkat(2) and takes the same flags
> > and arguments.
> >
> > Suggested-by: Christian Brauner <[email protected]>
> > Link: https://lore.kernel.org/io-uring/20210514145259.wtl4xcsp52woi6ab@wittgenstein/
> > Signed-off-by: Dmitry Kadashev <[email protected]>
> > ---
> >  fs/internal.h                 |  1 +
> >  fs/io_uring.c                 | 64 ++++++++++++++++++++++++++++++++++-
> >  fs/namei.c                    |  3 +-
> >  include/uapi/linux/io_uring.h |  1 +
> >  4 files changed, 66 insertions(+), 3 deletions(-)
> >
> > diff --git a/fs/internal.h b/fs/internal.h
> > index 207a455e32d3..3b3954214385 100644
>
> [...]
>
> >
> >  static bool io_disarm_next(struct io_kiocb *req);
> > @@ -3572,7 +3581,51 @@ static int io_mkdirat(struct io_kiocb *req, int issue_flags)
> >
> >       req->flags &= ~REQ_F_NEED_CLEANUP;
> >       if (ret < 0)
> > -             req_set_fail_links(req);
> > +             req_set_fail(req);
>
> This means one of the previous patches doesn't compile. Let's fix it.

This is an artifact from yet another rebase, sorry. This should have been a
part of the MKDIR opcode commit, but got squashed with a wrong one. But I can
see you've already figured that out and sent for Jens to fold it in. Thanks!

-- 
Dmitry Kadashev

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v5 00/10] io_uring: add mkdir, [sym]linkat and mknodat support
  2021-06-22 17:32       ` Jens Axboe
  2021-06-23  5:37         ` Dmitry Kadashev
@ 2021-06-23  5:49         ` Dmitry Kadashev
  2021-06-24  2:37           ` Jens Axboe
  1 sibling, 1 reply; 53+ messages in thread
From: Dmitry Kadashev @ 2021-06-23  5:49 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Pavel Begunkov, Alexander Viro, Christian Brauner, linux-fsdevel,
	io-uring

On Wed, Jun 23, 2021 at 12:32 AM Jens Axboe <[email protected]> wrote:
>
> On 6/22/21 11:28 AM, Pavel Begunkov wrote:
> > On 6/22/21 6:26 PM, Jens Axboe wrote:
> >> On 6/22/21 5:56 AM, Pavel Begunkov wrote:
> >>> On 6/3/21 6:18 AM, Dmitry Kadashev wrote:
> >>>> This started out as an attempt to add mkdirat support to io_uring which
> >>>> is heavily based on renameat() / unlinkat() support.
> >>>>
> >>>> During the review process more operations were added (linkat, symlinkat,
> >>>> mknodat) mainly to keep things uniform internally (in namei.c), and
> >>>> with things changed in namei.c adding support for these operations to
> >>>> io_uring is trivial, so that was done too. See
> >>>> https://lore.kernel.org/io-uring/20210514145259.wtl4xcsp52woi6ab@wittgenstein/
> >>>
> >>> io_uring part looks good in general, just small comments. However, I
> >>> believe we should respin it, because there should be build problems
> >>> in the middle.
> >>
> >> I can drop it, if Dmitry wants to respin. I do think that we could
> >> easily drop mknodat and not really lose anything there, better to
> >> reserve the op for something a bit more useful.
> >
> > I can try it and send a fold in, if you want.
> > Other changes may be on top
>
> Sure that works too, will rebase in any case, and I'd like to add
> Christian's ack as well. I'll just re-apply with the fold-ins.

Jens, it seems that during this rebase you've accidentally squashed the
"fs: update do_*() helpers to return ints" and
"io_uring: add support for IORING_OP_SYMLINKAT" commits, so there is only the
former one in your tree, but it actually adds the SYMLINKAT opcode to io uring
(in addition to changing the helpers return types).

-- 
Dmitry Kadashev

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v5 02/10] io_uring: add support for IORING_OP_MKDIRAT
  2021-06-22 17:41   ` Pavel Begunkov
  2021-06-23  0:41     ` Jens Axboe
@ 2021-06-23  5:50     ` Dmitry Kadashev
  1 sibling, 0 replies; 53+ messages in thread
From: Dmitry Kadashev @ 2021-06-23  5:50 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: Jens Axboe, Alexander Viro, Christian Brauner, linux-fsdevel,
	io-uring

On Wed, Jun 23, 2021 at 12:41 AM Pavel Begunkov <[email protected]> wrote:
>
> On 6/3/21 6:18 AM, Dmitry Kadashev wrote:
> > IORING_OP_MKDIRAT behaves like mkdirat(2) and takes the same flags
> > and arguments.
>
> Jens, a fold-in er discussed, and it will get you
> a conflict at 8/10

Thanks, Pavel

-- 
Dmitry Kadashev

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v5 09/10] io_uring: add support for IORING_OP_LINKAT
  2021-06-22 11:48   ` Pavel Begunkov
@ 2021-06-23  6:09     ` Dmitry Kadashev
  2021-06-23 13:13       ` Pavel Begunkov
  0 siblings, 1 reply; 53+ messages in thread
From: Dmitry Kadashev @ 2021-06-23  6:09 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: Jens Axboe, Alexander Viro, Christian Brauner, linux-fsdevel,
	io-uring

On Tue, Jun 22, 2021 at 6:48 PM Pavel Begunkov <[email protected]> wrote:
>
> On 6/3/21 6:18 AM, Dmitry Kadashev wrote:
> > IORING_OP_LINKAT behaves like linkat(2) and takes the same flags and
> > arguments.
> >
> > In some internal places 'hardlink' is used instead of 'link' to avoid
> > confusion with the SQE links. Name 'link' conflicts with the existing
> > 'link' member of io_kiocb.
> >
> > Suggested-by: Christian Brauner <[email protected]>
> > Link: https://lore.kernel.org/io-uring/20210514145259.wtl4xcsp52woi6ab@wittgenstein/
> > Signed-off-by: Dmitry Kadashev <[email protected]>
> > ---
> >  fs/internal.h                 |  2 ++
> >  fs/io_uring.c                 | 67 +++++++++++++++++++++++++++++++++++
> >  fs/namei.c                    |  2 +-
> >  include/uapi/linux/io_uring.h |  2 ++
> >  4 files changed, 72 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/internal.h b/fs/internal.h
> > index 3b3954214385..15a7d210cc67 100644
> > --- a/fs/internal.h
> > +++ b/fs/internal.h
>
> [...]
> > +
> > +static int io_linkat(struct io_kiocb *req, int issue_flags)
> > +{
> > +     struct io_hardlink *lnk = &req->hardlink;
> > +     int ret;
> > +
> > +     if (issue_flags & IO_URING_F_NONBLOCK)
> > +             return -EAGAIN;
> > +
> > +     ret = do_linkat(lnk->old_dfd, lnk->oldpath, lnk->new_dfd,
> > +                             lnk->newpath, lnk->flags);
>
> I'm curious, what's difference b/w SYMLINK and just LINK that
> one doesn't use old_dfd and another does?

Symlink's content does not have to exist, it's pretty much an arbitrary string.
E.g. try `ln -s http://example.com/ foo` :)

> Can it be supported/wished by someone in the future?

I don't really know. I guess it could be imagined that someone wants to try and
resolve the full target name against some dfd. But to me the whole idea looks
inherently problematic. Accepting the old dfd feels like the path is going to
be resolved, and historically it is not the case, and we'd need a special dfd
value to mean "do not resolve", and AT_FDCWD won't work for this (since it
means "resolve against the CWD", not "do not resolve").

> In that case I'd rather reserve and verify a field for old_dfd for both, even
> if one won't really support it for now.

If I understand you correctly, at this point you mean just checking that
old_dfd is not set (i.e. == -1)? I'll add a check.

-- 
Dmitry Kadashev

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v5 10/10] io_uring: add support for IORING_OP_MKNODAT
  2021-06-22 11:52   ` Pavel Begunkov
@ 2021-06-23  6:26     ` Dmitry Kadashev
  2021-06-23 11:58       ` Pavel Begunkov
  2021-06-24  2:36       ` Jens Axboe
  0 siblings, 2 replies; 53+ messages in thread
From: Dmitry Kadashev @ 2021-06-23  6:26 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: Jens Axboe, Alexander Viro, Christian Brauner, linux-fsdevel,
	io-uring

On Tue, Jun 22, 2021 at 6:52 PM Pavel Begunkov <[email protected]> wrote:
>
> On 6/3/21 6:18 AM, Dmitry Kadashev wrote:
> > IORING_OP_MKNODAT behaves like mknodat(2) and takes the same flags and
> > arguments.
> >
> > Suggested-by: Christian Brauner <[email protected]>
> > Link: https://lore.kernel.org/io-uring/20210514145259.wtl4xcsp52woi6ab@wittgenstein/
> > Signed-off-by: Dmitry Kadashev <[email protected]>
> > ---
> >  fs/internal.h                 |  2 ++
> >  fs/io_uring.c                 | 56 +++++++++++++++++++++++++++++++++++
> >  fs/namei.c                    |  2 +-
> >  include/uapi/linux/io_uring.h |  2 ++
> >  4 files changed, 61 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/internal.h b/fs/internal.h
> > index 15a7d210cc67..c6fb9974006f 100644
>
> [...]
>
> >  static bool io_disarm_next(struct io_kiocb *req);
> > @@ -3687,6 +3697,44 @@ static int io_linkat(struct io_kiocb *req, int issue_flags)
> >       io_req_complete(req, ret);
> >       return 0;
> >  }
> > +static int io_mknodat_prep(struct io_kiocb *req,
> > +                         const struct io_uring_sqe *sqe)
> > +{
> > +     struct io_mknod *mkn = &req->mknod;
> > +     const char __user *fname;
> > +
> > +     if (unlikely(req->flags & REQ_F_FIXED_FILE))
> > +             return -EBADF;
>
> IOPOLL won't support it, and the check is missing.
> Probably for other opcodes as well.
>
> if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
>         return -EINVAL;

This change is based on some other similar opcodes (unlinkat, renameat) that
were added a while ago. Those lack the check as well. I guess I'll just prepare
a patch that adds the checks to all of them. Thanks, Pavel.

Jens, separately it's my understanding that you do not want the MKNODAT opcode
at all, should I drop this from the subsequent series?

-- 
Dmitry Kadashev

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v5 02/10] io_uring: add support for IORING_OP_MKDIRAT
  2021-06-22 11:50     ` Pavel Begunkov
@ 2021-06-23  6:41       ` Dmitry Kadashev
  2021-06-23 11:53         ` Pavel Begunkov
  0 siblings, 1 reply; 53+ messages in thread
From: Dmitry Kadashev @ 2021-06-23  6:41 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: Jens Axboe, Alexander Viro, Christian Brauner, linux-fsdevel,
	io-uring

On Tue, Jun 22, 2021 at 6:50 PM Pavel Begunkov <[email protected]> wrote:
>
> On 6/22/21 12:41 PM, Pavel Begunkov wrote:
> > On 6/3/21 6:18 AM, Dmitry Kadashev wrote:
> >> IORING_OP_MKDIRAT behaves like mkdirat(2) and takes the same flags
> >> and arguments.
> >>
> >> Signed-off-by: Dmitry Kadashev <[email protected]>
> >> ---
> >>  fs/io_uring.c                 | 55 +++++++++++++++++++++++++++++++++++
> >>  include/uapi/linux/io_uring.h |  1 +
> >>  2 files changed, 56 insertions(+)
> >>
> >> diff --git a/fs/io_uring.c b/fs/io_uring.c
> >> index a1ca6badff36..8ab4eb559520 100644
> >> --- a/fs/io_uring.c
> >> +++ b/fs/io_uring.c
> >> @@ -665,6 +665,13 @@ struct io_unlink {
> >>      struct filename                 *filename;
> >>  };
> >>
> >> +struct io_mkdir {
> >> +    struct file                     *file;
> >> +    int                             dfd;
> >> +    umode_t                         mode;
> >> +    struct filename                 *filename;
> >> +};
> >> +
> >>  struct io_completion {
> >>      struct file                     *file;
> >>      struct list_head                list;
> >> @@ -809,6 +816,7 @@ struct io_kiocb {
> >>              struct io_shutdown      shutdown;
> >>              struct io_rename        rename;
> >>              struct io_unlink        unlink;
> >> +            struct io_mkdir         mkdir;
> >>              /* use only after cleaning per-op data, see io_clean_op() */
> >>              struct io_completion    compl;
> >>      };
> >> @@ -1021,6 +1029,7 @@ static const struct io_op_def io_op_defs[] = {
> >>      },
> >>      [IORING_OP_RENAMEAT] = {},
> >>      [IORING_OP_UNLINKAT] = {},
> >> +    [IORING_OP_MKDIRAT] = {},
> >>  };
> >>
> >>  static bool io_disarm_next(struct io_kiocb *req);
> >> @@ -3530,6 +3539,44 @@ static int io_unlinkat(struct io_kiocb *req, unsigned int issue_flags)
> >>      return 0;
> >>  }
> >>
> >> +static int io_mkdirat_prep(struct io_kiocb *req,
> >> +                        const struct io_uring_sqe *sqe)
> >> +{
> >> +    struct io_mkdir *mkd = &req->mkdir;
> >> +    const char __user *fname;
> >> +
> >> +    if (unlikely(req->flags & REQ_F_FIXED_FILE))
> >> +            return -EBADF;
> >> +
> >> +    mkd->dfd = READ_ONCE(sqe->fd);
> >> +    mkd->mode = READ_ONCE(sqe->len);
> >> +
> >> +    fname = u64_to_user_ptr(READ_ONCE(sqe->addr));
> >> +    mkd->filename = getname(fname);
> >> +    if (IS_ERR(mkd->filename))
> >> +            return PTR_ERR(mkd->filename);
> >
> > We have to check unused fields, e.g. buf_index and off,
> > to be able to use them in the future if needed.
> >
> > if (sqe->buf_index || sqe->off)
> >       return -EINVAL;
> >
> > Please double check what fields are not used, and
> > same goes for all other opcodes.

This changeset is based on some other ops that were added a while ago
(renameat, unlinkat), which lack the check as well. I guess I'll just go over
all of them and add the checks in a single patch if that's OK.

I'd imagine READ_ONCE is to be used in those checks though, isn't it? Some of
the existing checks like this lack it too btw. I suppose I can fix those in a
separate commit if that makes sense.

>
> + opcode specific flags, e.g.
>
> if (sqe->rw_flags)
>         return -EINVAL;

-- 
Dmitry Kadashev

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v5 02/10] io_uring: add support for IORING_OP_MKDIRAT
  2021-06-23  6:41       ` Dmitry Kadashev
@ 2021-06-23 11:53         ` Pavel Begunkov
  2021-06-24 11:11           ` Dmitry Kadashev
  0 siblings, 1 reply; 53+ messages in thread
From: Pavel Begunkov @ 2021-06-23 11:53 UTC (permalink / raw)
  To: Dmitry Kadashev
  Cc: Jens Axboe, Alexander Viro, Christian Brauner, linux-fsdevel,
	io-uring

On 6/23/21 7:41 AM, Dmitry Kadashev wrote:
> On Tue, Jun 22, 2021 at 6:50 PM Pavel Begunkov <[email protected]> wrote:
>>
>> On 6/22/21 12:41 PM, Pavel Begunkov wrote:
>>> On 6/3/21 6:18 AM, Dmitry Kadashev wrote:
>>>> IORING_OP_MKDIRAT behaves like mkdirat(2) and takes the same flags
>>>> and arguments.
>>>>
>>>> Signed-off-by: Dmitry Kadashev <[email protected]>
>>>> ---

[...]

>>> We have to check unused fields, e.g. buf_index and off,
>>> to be able to use them in the future if needed.
>>>
>>> if (sqe->buf_index || sqe->off)
>>>       return -EINVAL;
>>>
>>> Please double check what fields are not used, and
>>> same goes for all other opcodes.
> 
> This changeset is based on some other ops that were added a while ago
> (renameat, unlinkat), which lack the check as well. I guess I'll just go over

That's not great... Thanks for letting know

> all of them and add the checks in a single patch if that's OK.

For newly added opcodes a single patch on top is ok, rename and
unlink should be patched separately.

> I'd imagine READ_ONCE is to be used in those checks though, isn't it? Some of
> the existing checks like this lack it too btw. I suppose I can fix those in a
> separate commit if that makes sense.

When we really use a field there should be a READ_ONCE(),
but I wouldn't care about those we check for compatibility
reasons, but that's only my opinion.

>> + opcode specific flags, e.g.
>>
>> if (sqe->rw_flags)
>>         return -EINVAL;
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v5 10/10] io_uring: add support for IORING_OP_MKNODAT
  2021-06-23  6:26     ` Dmitry Kadashev
@ 2021-06-23 11:58       ` Pavel Begunkov
  2021-06-24  2:36       ` Jens Axboe
  1 sibling, 0 replies; 53+ messages in thread
From: Pavel Begunkov @ 2021-06-23 11:58 UTC (permalink / raw)
  To: Dmitry Kadashev
  Cc: Jens Axboe, Alexander Viro, Christian Brauner, linux-fsdevel,
	io-uring

On 6/23/21 7:26 AM, Dmitry Kadashev wrote:
> On Tue, Jun 22, 2021 at 6:52 PM Pavel Begunkov <[email protected]> wrote:
>>
>> On 6/3/21 6:18 AM, Dmitry Kadashev wrote:
>>> IORING_OP_MKNODAT behaves like mknodat(2) and takes the same flags and
>>> arguments.
>>>
>>> Suggested-by: Christian Brauner <[email protected]>
>>> Link: https://lore.kernel.org/io-uring/20210514145259.wtl4xcsp52woi6ab@wittgenstein/
>>> Signed-off-by: Dmitry Kadashev <[email protected]>
>>> ---
>>>  fs/internal.h                 |  2 ++
>>>  fs/io_uring.c                 | 56 +++++++++++++++++++++++++++++++++++
>>>  fs/namei.c                    |  2 +-
>>>  include/uapi/linux/io_uring.h |  2 ++
>>>  4 files changed, 61 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/fs/internal.h b/fs/internal.h
>>> index 15a7d210cc67..c6fb9974006f 100644
>>
>> [...]
>>
>>>  static bool io_disarm_next(struct io_kiocb *req);
>>> @@ -3687,6 +3697,44 @@ static int io_linkat(struct io_kiocb *req, int issue_flags)
>>>       io_req_complete(req, ret);
>>>       return 0;
>>>  }
>>> +static int io_mknodat_prep(struct io_kiocb *req,
>>> +                         const struct io_uring_sqe *sqe)
>>> +{
>>> +     struct io_mknod *mkn = &req->mknod;
>>> +     const char __user *fname;
>>> +
>>> +     if (unlikely(req->flags & REQ_F_FIXED_FILE))
>>> +             return -EBADF;
>>
>> IOPOLL won't support it, and the check is missing.
>> Probably for other opcodes as well.
>>
>> if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
>>         return -EINVAL;
> 
> This change is based on some other similar opcodes (unlinkat, renameat) that
> were added a while ago. Those lack the check as well. I guess I'll just prepare
> a patch that adds the checks to all of them. Thanks, Pavel.
> 
> Jens, separately it's my understanding that you do not want the MKNODAT opcode
> at all, should I drop this from the subsequent series?

I guess that comment was to my note that the patch with it
doesn't compile, but as it should be already fixed up...

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v5 09/10] io_uring: add support for IORING_OP_LINKAT
  2021-06-23  6:09     ` Dmitry Kadashev
@ 2021-06-23 13:13       ` Pavel Begunkov
  0 siblings, 0 replies; 53+ messages in thread
From: Pavel Begunkov @ 2021-06-23 13:13 UTC (permalink / raw)
  To: Dmitry Kadashev
  Cc: Jens Axboe, Alexander Viro, Christian Brauner, linux-fsdevel,
	io-uring

On 6/23/21 7:09 AM, Dmitry Kadashev wrote:
> On Tue, Jun 22, 2021 at 6:48 PM Pavel Begunkov <[email protected]> wrote:
>>
>> On 6/3/21 6:18 AM, Dmitry Kadashev wrote:
>>> IORING_OP_LINKAT behaves like linkat(2) and takes the same flags and
>>> arguments.
>>>
>>> In some internal places 'hardlink' is used instead of 'link' to avoid
>>> confusion with the SQE links. Name 'link' conflicts with the existing
>>> 'link' member of io_kiocb.
>>>
>>> Suggested-by: Christian Brauner <[email protected]>
>>> Link: https://lore.kernel.org/io-uring/20210514145259.wtl4xcsp52woi6ab@wittgenstein/
>>> Signed-off-by: Dmitry Kadashev <[email protected]>
>>> ---
>>>  fs/internal.h                 |  2 ++
>>>  fs/io_uring.c                 | 67 +++++++++++++++++++++++++++++++++++
>>>  fs/namei.c                    |  2 +-
>>>  include/uapi/linux/io_uring.h |  2 ++
>>>  4 files changed, 72 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/fs/internal.h b/fs/internal.h
>>> index 3b3954214385..15a7d210cc67 100644
>>> --- a/fs/internal.h
>>> +++ b/fs/internal.h
>>
>> [...]
>>> +
>>> +static int io_linkat(struct io_kiocb *req, int issue_flags)
>>> +{
>>> +     struct io_hardlink *lnk = &req->hardlink;
>>> +     int ret;
>>> +
>>> +     if (issue_flags & IO_URING_F_NONBLOCK)
>>> +             return -EAGAIN;
>>> +
>>> +     ret = do_linkat(lnk->old_dfd, lnk->oldpath, lnk->new_dfd,
>>> +                             lnk->newpath, lnk->flags);
>>
>> I'm curious, what's difference b/w SYMLINK and just LINK that
>> one doesn't use old_dfd and another does?
> 
> Symlink's content does not have to exist, it's pretty much an arbitrary string.
> E.g. try `ln -s http://example.com/ foo` :)
> 
>> Can it be supported/wished by someone in the future?
> 
> I don't really know. I guess it could be imagined that someone wants to try and
> resolve the full target name against some dfd. But to me the whole idea looks
> inherently problematic. Accepting the old dfd feels like the path is going to
> be resolved, and historically it is not the case, and we'd need a special dfd
> value to mean "do not resolve", and AT_FDCWD won't work for this (since it
> means "resolve against the CWD", not "do not resolve").

I see, I don't know it good enough to reason, but have to throw the question
into the air, ...

>> In that case I'd rather reserve and verify a field for old_dfd for both, even
>> if one won't really support it for now.
> 
> If I understand you correctly, at this point you mean just checking that
> old_dfd is not set (i.e. == -1)? I'll add a check.

... and we have all 5.14 to fix it and other parts if needed, so let's
leave it as is

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v5 10/10] io_uring: add support for IORING_OP_MKNODAT
  2021-06-23  6:26     ` Dmitry Kadashev
  2021-06-23 11:58       ` Pavel Begunkov
@ 2021-06-24  2:36       ` Jens Axboe
  1 sibling, 0 replies; 53+ messages in thread
From: Jens Axboe @ 2021-06-24  2:36 UTC (permalink / raw)
  To: Dmitry Kadashev, Pavel Begunkov
  Cc: Alexander Viro, Christian Brauner, linux-fsdevel, io-uring

On 6/23/21 12:26 AM, Dmitry Kadashev wrote:
> On Tue, Jun 22, 2021 at 6:52 PM Pavel Begunkov <[email protected]> wrote:
>>
>> On 6/3/21 6:18 AM, Dmitry Kadashev wrote:
>>> IORING_OP_MKNODAT behaves like mknodat(2) and takes the same flags and
>>> arguments.
>>>
>>> Suggested-by: Christian Brauner <[email protected]>
>>> Link: https://lore.kernel.org/io-uring/20210514145259.wtl4xcsp52woi6ab@wittgenstein/
>>> Signed-off-by: Dmitry Kadashev <[email protected]>
>>> ---
>>>  fs/internal.h                 |  2 ++
>>>  fs/io_uring.c                 | 56 +++++++++++++++++++++++++++++++++++
>>>  fs/namei.c                    |  2 +-
>>>  include/uapi/linux/io_uring.h |  2 ++
>>>  4 files changed, 61 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/fs/internal.h b/fs/internal.h
>>> index 15a7d210cc67..c6fb9974006f 100644
>>
>> [...]
>>
>>>  static bool io_disarm_next(struct io_kiocb *req);
>>> @@ -3687,6 +3697,44 @@ static int io_linkat(struct io_kiocb *req, int issue_flags)
>>>       io_req_complete(req, ret);
>>>       return 0;
>>>  }
>>> +static int io_mknodat_prep(struct io_kiocb *req,
>>> +                         const struct io_uring_sqe *sqe)
>>> +{
>>> +     struct io_mknod *mkn = &req->mknod;
>>> +     const char __user *fname;
>>> +
>>> +     if (unlikely(req->flags & REQ_F_FIXED_FILE))
>>> +             return -EBADF;
>>
>> IOPOLL won't support it, and the check is missing.
>> Probably for other opcodes as well.
>>
>> if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
>>         return -EINVAL;
> 
> This change is based on some other similar opcodes (unlinkat, renameat) that
> were added a while ago. Those lack the check as well. I guess I'll just prepare
> a patch that adds the checks to all of them. Thanks, Pavel.
> 
> Jens, separately it's my understanding that you do not want the MKNODAT opcode
> at all, should I drop this from the subsequent series?

Right, just drop that one for now. Would be great if you could resend
the series with the suggested fixes folded in. Might as well just do
a clean sweep.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v5 00/10] io_uring: add mkdir, [sym]linkat and mknodat support
  2021-06-23  5:35   ` Dmitry Kadashev
@ 2021-06-24  2:37     ` Jens Axboe
  0 siblings, 0 replies; 53+ messages in thread
From: Jens Axboe @ 2021-06-24  2:37 UTC (permalink / raw)
  To: Dmitry Kadashev, Pavel Begunkov
  Cc: Alexander Viro, Christian Brauner, linux-fsdevel, io-uring

On 6/22/21 11:35 PM, Dmitry Kadashev wrote:
> On Tue, Jun 22, 2021 at 6:56 PM Pavel Begunkov <[email protected]> wrote:
>>
>> On 6/3/21 6:18 AM, Dmitry Kadashev wrote:
>>> This started out as an attempt to add mkdirat support to io_uring which
>>> is heavily based on renameat() / unlinkat() support.
>>>
>>> During the review process more operations were added (linkat, symlinkat,
>>> mknodat) mainly to keep things uniform internally (in namei.c), and
>>> with things changed in namei.c adding support for these operations to
>>> io_uring is trivial, so that was done too. See
>>> https://lore.kernel.org/io-uring/20210514145259.wtl4xcsp52woi6ab@wittgenstein/
>>
>> io_uring part looks good in general, just small comments. However, I
>> believe we should respin it, because there should be build problems
>> in the middle.
> 
> I knew my celebration was premature! :)

It's all good, just respin with the requested fixes :-)

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v5 00/10] io_uring: add mkdir, [sym]linkat and mknodat support
  2021-06-23  5:49         ` Dmitry Kadashev
@ 2021-06-24  2:37           ` Jens Axboe
  2021-06-24 10:55             ` Dmitry Kadashev
  0 siblings, 1 reply; 53+ messages in thread
From: Jens Axboe @ 2021-06-24  2:37 UTC (permalink / raw)
  To: Dmitry Kadashev
  Cc: Pavel Begunkov, Alexander Viro, Christian Brauner, linux-fsdevel,
	io-uring

On 6/22/21 11:49 PM, Dmitry Kadashev wrote:
> On Wed, Jun 23, 2021 at 12:32 AM Jens Axboe <[email protected]> wrote:
>>
>> On 6/22/21 11:28 AM, Pavel Begunkov wrote:
>>> On 6/22/21 6:26 PM, Jens Axboe wrote:
>>>> On 6/22/21 5:56 AM, Pavel Begunkov wrote:
>>>>> On 6/3/21 6:18 AM, Dmitry Kadashev wrote:
>>>>>> This started out as an attempt to add mkdirat support to io_uring which
>>>>>> is heavily based on renameat() / unlinkat() support.
>>>>>>
>>>>>> During the review process more operations were added (linkat, symlinkat,
>>>>>> mknodat) mainly to keep things uniform internally (in namei.c), and
>>>>>> with things changed in namei.c adding support for these operations to
>>>>>> io_uring is trivial, so that was done too. See
>>>>>> https://lore.kernel.org/io-uring/20210514145259.wtl4xcsp52woi6ab@wittgenstein/
>>>>>
>>>>> io_uring part looks good in general, just small comments. However, I
>>>>> believe we should respin it, because there should be build problems
>>>>> in the middle.
>>>>
>>>> I can drop it, if Dmitry wants to respin. I do think that we could
>>>> easily drop mknodat and not really lose anything there, better to
>>>> reserve the op for something a bit more useful.
>>>
>>> I can try it and send a fold in, if you want.
>>> Other changes may be on top
>>
>> Sure that works too, will rebase in any case, and I'd like to add
>> Christian's ack as well. I'll just re-apply with the fold-ins.
> 
> Jens, it seems that during this rebase you've accidentally squashed the
> "fs: update do_*() helpers to return ints" and
> "io_uring: add support for IORING_OP_SYMLINKAT" commits, so there is only the
> former one in your tree, but it actually adds the SYMLINKAT opcode to io uring
> (in addition to changing the helpers return types).

Man, I wonder what happened there. I'll just drop the series, so when you
resend this one (hopefully soon if it's for 5.14...), just make it
against the current branch.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v5 00/10] io_uring: add mkdir, [sym]linkat and mknodat support
  2021-06-24  2:37           ` Jens Axboe
@ 2021-06-24 10:55             ` Dmitry Kadashev
  0 siblings, 0 replies; 53+ messages in thread
From: Dmitry Kadashev @ 2021-06-24 10:55 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Pavel Begunkov, Alexander Viro, Christian Brauner, linux-fsdevel,
	io-uring

On Thu, Jun 24, 2021 at 9:37 AM Jens Axboe <[email protected]> wrote:
>
> On 6/22/21 11:49 PM, Dmitry Kadashev wrote:
> >
> > Jens, it seems that during this rebase you've accidentally squashed the
> > "fs: update do_*() helpers to return ints" and
> > "io_uring: add support for IORING_OP_SYMLINKAT" commits, so there is only the
> > former one in your tree, but it actually adds the SYMLINKAT opcode to io uring
> > (in addition to changing the helpers return types).
>
> Man, I wonder what happened there. I'll just drop the series, so when you
> resend this one (hopefully soon if it's for 5.14...), just make it
> against the current branch.

Sure, I'll send v6 shortly

-- 
Dmitry Kadashev

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v5 02/10] io_uring: add support for IORING_OP_MKDIRAT
  2021-06-23 11:53         ` Pavel Begunkov
@ 2021-06-24 11:11           ` Dmitry Kadashev
  2021-06-24 12:21             ` Pavel Begunkov
  0 siblings, 1 reply; 53+ messages in thread
From: Dmitry Kadashev @ 2021-06-24 11:11 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: Jens Axboe, Alexander Viro, Christian Brauner, linux-fsdevel,
	io-uring

On Wed, Jun 23, 2021 at 6:54 PM Pavel Begunkov <[email protected]> wrote:
>
> On 6/23/21 7:41 AM, Dmitry Kadashev wrote:
> > I'd imagine READ_ONCE is to be used in those checks though, isn't it? Some of
> > the existing checks like this lack it too btw. I suppose I can fix those in a
> > separate commit if that makes sense.
>
> When we really use a field there should be a READ_ONCE(),
> but I wouldn't care about those we check for compatibility
> reasons, but that's only my opinion.

I'm not sure how the compatibility check reads are special. The code is
either correct or not. If a compatibility check has correctness problems
then it's pretty much as bad as any other part of the code having such
problems, no?

That said, I'll just go ahead and use the approach that the rest of the
code (or rather most of it) uses (no READ_ONCE). If it needs fixing then
the whole bunch can probably be fixed in one go (either a single patch
or a series).

Thanks for your help, Pavel!

-- 
Dmitry Kadashev

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v5 02/10] io_uring: add support for IORING_OP_MKDIRAT
  2021-06-24 11:11           ` Dmitry Kadashev
@ 2021-06-24 12:21             ` Pavel Begunkov
  2021-06-28  8:17               ` Dmitry Kadashev
  0 siblings, 1 reply; 53+ messages in thread
From: Pavel Begunkov @ 2021-06-24 12:21 UTC (permalink / raw)
  To: Dmitry Kadashev
  Cc: Jens Axboe, Alexander Viro, Christian Brauner, linux-fsdevel,
	io-uring

On 6/24/21 12:11 PM, Dmitry Kadashev wrote:
> On Wed, Jun 23, 2021 at 6:54 PM Pavel Begunkov <[email protected]> wrote:
>>
>> On 6/23/21 7:41 AM, Dmitry Kadashev wrote:
>>> I'd imagine READ_ONCE is to be used in those checks though, isn't it? Some of
>>> the existing checks like this lack it too btw. I suppose I can fix those in a
>>> separate commit if that makes sense.
>>
>> When we really use a field there should be a READ_ONCE(),
>> but I wouldn't care about those we check for compatibility
>> reasons, but that's only my opinion.
> 
> I'm not sure how the compatibility check reads are special. The code is
> either correct or not. If a compatibility check has correctness problems
> then it's pretty much as bad as any other part of the code having such
> problems, no?

If it reads and verifies a values first, e.g. index into some internal
array, and then compiler plays a joke and reloads it, we might be
absolutely screwed expecting 'segfaults', kernel data leakages and all
the fun stuff.

If that's a compatibility check, whether it's loaded earlier or later,
or whatever, it's not a big deal, the userspace can in any case change
the memory at any moment it wishes, even tightly around the moment
we're reading it.


> That said, I'll just go ahead and use the approach that the rest of the
> code (or rather most of it) uses (no READ_ONCE). If it needs fixing then
> the whole bunch can probably be fixed in one go (either a single patch
> or a series).
> 
> Thanks for your help, Pavel!
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v5 02/10] io_uring: add support for IORING_OP_MKDIRAT
  2021-06-24 12:21             ` Pavel Begunkov
@ 2021-06-28  8:17               ` Dmitry Kadashev
  2021-07-07 14:06                 ` Pavel Begunkov
  0 siblings, 1 reply; 53+ messages in thread
From: Dmitry Kadashev @ 2021-06-28  8:17 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: Jens Axboe, Alexander Viro, Christian Brauner, linux-fsdevel,
	io-uring

On Thu, Jun 24, 2021 at 7:22 PM Pavel Begunkov <[email protected]> wrote:
>
> On 6/24/21 12:11 PM, Dmitry Kadashev wrote:
> > On Wed, Jun 23, 2021 at 6:54 PM Pavel Begunkov <[email protected]> wrote:
> >>
> >> On 6/23/21 7:41 AM, Dmitry Kadashev wrote:
> >>> I'd imagine READ_ONCE is to be used in those checks though, isn't it? Some of
> >>> the existing checks like this lack it too btw. I suppose I can fix those in a
> >>> separate commit if that makes sense.
> >>
> >> When we really use a field there should be a READ_ONCE(),
> >> but I wouldn't care about those we check for compatibility
> >> reasons, but that's only my opinion.
> >
> > I'm not sure how the compatibility check reads are special. The code is
> > either correct or not. If a compatibility check has correctness problems
> > then it's pretty much as bad as any other part of the code having such
> > problems, no?
>
> If it reads and verifies a values first, e.g. index into some internal
> array, and then compiler plays a joke and reloads it, we might be
> absolutely screwed expecting 'segfaults', kernel data leakages and all
> the fun stuff.
>
> If that's a compatibility check, whether it's loaded earlier or later,
> or whatever, it's not a big deal, the userspace can in any case change
> the memory at any moment it wishes, even tightly around the moment
> we're reading it.

Sorry for the slow reply, I have to balance this with my actual job that
is not directly related to the kernel development :)

I'm no kernel concurrency expert (actually I'm not any kind of kernel
expert), but my understanding is READ_ONCE does not just mean "do not
read more than once", but rather "read exactly once" (and more than
that), and if it's not applied then the compiler is within its rights to
optimize the read out, so the compatibility check can effectively be
disabled.

I don't think it's likely to happen, but "bad things do not happen in
practice" and "it is technically correct" are two different things :)

FWIW I'm not arguing it has to be changed, I just want to understand
things better (and if it helps to spot a bug at some point then great).
So if my reasoning is wrong then please point out where. And if it's
just the simplicity / clarity of the code that is the goal here and any
negative effects are considered to be unlikely then it's OK, I can
understand that.

-- 
Dmitry Kadashev

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v5 00/10] io_uring: add mkdir, [sym]linkat and mknodat support
  2021-06-22  8:34         ` Dmitry Kadashev
@ 2021-06-29 13:06           ` Christian Brauner
  0 siblings, 0 replies; 53+ messages in thread
From: Christian Brauner @ 2021-06-29 13:06 UTC (permalink / raw)
  To: Dmitry Kadashev
  Cc: Jens Axboe, Alexander Viro, Pavel Begunkov, linux-fsdevel,
	io-uring

On Tue, Jun 22, 2021 at 03:34:37PM +0700, Dmitry Kadashev wrote:
> On Tue, Jun 22, 2021 at 3:12 PM Christian Brauner
> <[email protected]> wrote:
> >
> > On Mon, Jun 21, 2021 at 09:21:23AM -0600, Jens Axboe wrote:
> > > On 6/18/21 10:10 AM, Jens Axboe wrote:
> > > > On 6/18/21 12:24 AM, Dmitry Kadashev wrote:
> > > >> On Thu, Jun 3, 2021 at 12:18 PM Dmitry Kadashev <[email protected]> wrote:
> > > >>>
> > > >>> This started out as an attempt to add mkdirat support to io_uring which
> > > >>> is heavily based on renameat() / unlinkat() support.
> > > >>>
> > > >>> During the review process more operations were added (linkat, symlinkat,
> > > >>> mknodat) mainly to keep things uniform internally (in namei.c), and
> > > >>> with things changed in namei.c adding support for these operations to
> > > >>> io_uring is trivial, so that was done too. See
> > > >>> https://lore.kernel.org/io-uring/20210514145259.wtl4xcsp52woi6ab@wittgenstein/
> > > >>
> > > >> Ping. Jens, are we waiting for the audit change to be merged before this
> > > >> can go in?
> > > >
> > > > Not necessarily, as that should go in for 5.14 anyway.
> > > >
> > > > Al, are you OK with the generic changes?
> > >
> > > I have tentatively queued this up.
> >
> > Hey Dmitry,
> > hey Jens,
> >
> > The additional op codes and suggested rework is partially on me. So I
> > should share the blame in case this gets a NAK:
> >
> > Acked-by: Christian Brauner <[email protected]>
> 
> Hi Christian,
> 
> Thank you very much for your help with this! Just wanted to say that you
> were quite supportive and really helped me a lot.

Hey Dmitry,

Thank you for saying that! Much appreciated.
Thanks for working on this!

Christian

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v5 02/10] io_uring: add support for IORING_OP_MKDIRAT
  2021-06-28  8:17               ` Dmitry Kadashev
@ 2021-07-07 14:06                 ` Pavel Begunkov
  2021-07-12 12:44                   ` Dmitry Kadashev
  0 siblings, 1 reply; 53+ messages in thread
From: Pavel Begunkov @ 2021-07-07 14:06 UTC (permalink / raw)
  To: Dmitry Kadashev
  Cc: Jens Axboe, Alexander Viro, Christian Brauner, linux-fsdevel,
	io-uring

On 6/28/21 9:17 AM, Dmitry Kadashev wrote:
> On Thu, Jun 24, 2021 at 7:22 PM Pavel Begunkov <[email protected]> wrote:
>>
>> On 6/24/21 12:11 PM, Dmitry Kadashev wrote:
>>> On Wed, Jun 23, 2021 at 6:54 PM Pavel Begunkov <[email protected]> wrote:
>>>>
>>>> On 6/23/21 7:41 AM, Dmitry Kadashev wrote:
>>>>> I'd imagine READ_ONCE is to be used in those checks though, isn't it? Some of
>>>>> the existing checks like this lack it too btw. I suppose I can fix those in a
>>>>> separate commit if that makes sense.
>>>>
>>>> When we really use a field there should be a READ_ONCE(),
>>>> but I wouldn't care about those we check for compatibility
>>>> reasons, but that's only my opinion.
>>>
>>> I'm not sure how the compatibility check reads are special. The code is
>>> either correct or not. If a compatibility check has correctness problems
>>> then it's pretty much as bad as any other part of the code having such
>>> problems, no?
>>
>> If it reads and verifies a values first, e.g. index into some internal
>> array, and then compiler plays a joke and reloads it, we might be
>> absolutely screwed expecting 'segfaults', kernel data leakages and all
>> the fun stuff.
>>
>> If that's a compatibility check, whether it's loaded earlier or later,
>> or whatever, it's not a big deal, the userspace can in any case change
>> the memory at any moment it wishes, even tightly around the moment
>> we're reading it.
> 
> Sorry for the slow reply, I have to balance this with my actual job that
> is not directly related to the kernel development :)
> 
> I'm no kernel concurrency expert (actually I'm not any kind of kernel
> expert), but my understanding is READ_ONCE does not just mean "do not
> read more than once", but rather "read exactly once" (and more than
> that), and if it's not applied then the compiler is within its rights to
> optimize the read out, so the compatibility check can effectively be
> disabled.

Yep, as they say it's about all the "inventive" transformations
compilers can do, double read is just one of those that may turn very
nasty for us.

One big difference for me is whether it have a potential to crash the
kernel or not, though it's just one side.

Compilers can't drop the check just because, it first should be proven
to be safe to do, and there are all sorts barriers around and
limitations on how CQEs and SQEs are used, making impossible to alias
memory. E.g. CQEs and SQEs can't be reused in a single syscall, they're
only written and read respectively, and so on. Maybe, the only one I'd
worry about is the call to io_commit_sqring(), i.e. for SQE reads not
happening after it, but we need to take a look whether it's
theoretically possible.

> I don't think it's likely to happen, but "bad things do not happen in
> practice" and "it is technically correct" are two different things :)
> 
> FWIW I'm not arguing it has to be changed, I just want to understand
> things better (and if it helps to spot a bug at some point then great).
> So if my reasoning is wrong then please point out where. And if it's
> just the simplicity / clarity of the code that is the goal here and any
> negative effects are considered to be unlikely then it's OK, I can
> understand that.

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v5 02/10] io_uring: add support for IORING_OP_MKDIRAT
  2021-07-07 14:06                 ` Pavel Begunkov
@ 2021-07-12 12:44                   ` Dmitry Kadashev
  2021-07-12 13:14                     ` Pavel Begunkov
  0 siblings, 1 reply; 53+ messages in thread
From: Dmitry Kadashev @ 2021-07-12 12:44 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: Jens Axboe, Alexander Viro, Christian Brauner, linux-fsdevel,
	io-uring

On Wed, Jul 7, 2021 at 9:06 PM Pavel Begunkov <[email protected]> wrote:
>
> On 6/28/21 9:17 AM, Dmitry Kadashev wrote:
> > On Thu, Jun 24, 2021 at 7:22 PM Pavel Begunkov <[email protected]> wrote:
> >>
> >> On 6/24/21 12:11 PM, Dmitry Kadashev wrote:
> >>> On Wed, Jun 23, 2021 at 6:54 PM Pavel Begunkov <[email protected]> wrote:
> >>>>
> >>>> On 6/23/21 7:41 AM, Dmitry Kadashev wrote:
> >>>>> I'd imagine READ_ONCE is to be used in those checks though, isn't it? Some of
> >>>>> the existing checks like this lack it too btw. I suppose I can fix those in a
> >>>>> separate commit if that makes sense.
> >>>>
> >>>> When we really use a field there should be a READ_ONCE(),
> >>>> but I wouldn't care about those we check for compatibility
> >>>> reasons, but that's only my opinion.
> >>>
> >>> I'm not sure how the compatibility check reads are special. The code is
> >>> either correct or not. If a compatibility check has correctness problems
> >>> then it's pretty much as bad as any other part of the code having such
> >>> problems, no?
> >>
> >> If it reads and verifies a values first, e.g. index into some internal
> >> array, and then compiler plays a joke and reloads it, we might be
> >> absolutely screwed expecting 'segfaults', kernel data leakages and all
> >> the fun stuff.
> >>
> >> If that's a compatibility check, whether it's loaded earlier or later,
> >> or whatever, it's not a big deal, the userspace can in any case change
> >> the memory at any moment it wishes, even tightly around the moment
> >> we're reading it.
> >
> > Sorry for the slow reply, I have to balance this with my actual job that
> > is not directly related to the kernel development :)
> >
> > I'm no kernel concurrency expert (actually I'm not any kind of kernel
> > expert), but my understanding is READ_ONCE does not just mean "do not
> > read more than once", but rather "read exactly once" (and more than
> > that), and if it's not applied then the compiler is within its rights to
> > optimize the read out, so the compatibility check can effectively be
> > disabled.
>
> Yep, as they say it's about all the "inventive" transformations
> compilers can do, double read is just one of those that may turn very
> nasty for us.
>
> One big difference for me is whether it have a potential to crash the
> kernel or not, though it's just one side.

Ah, that makes sense.

> Compilers can't drop the check just because, it first should be proven
> to be safe to do, and there are all sorts barriers around and
> limitations on how CQEs and SQEs are used, making impossible to alias
> memory. E.g. CQEs and SQEs can't be reused in a single syscall, they're
> only written and read respectively, and so on. Maybe, the only one I'd
> worry about is the call to io_commit_sqring(), i.e. for SQE reads not
> happening after it, but we need to take a look whether it's
> theoretically possible.

Thanks for the explanation, Pavel!

-- 
Dmitry Kadashev

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v5 02/10] io_uring: add support for IORING_OP_MKDIRAT
  2021-07-12 12:44                   ` Dmitry Kadashev
@ 2021-07-12 13:14                     ` Pavel Begunkov
  0 siblings, 0 replies; 53+ messages in thread
From: Pavel Begunkov @ 2021-07-12 13:14 UTC (permalink / raw)
  To: Dmitry Kadashev
  Cc: Jens Axboe, Alexander Viro, Christian Brauner, linux-fsdevel,
	io-uring

On 7/12/21 1:44 PM, Dmitry Kadashev wrote:
> On Wed, Jul 7, 2021 at 9:06 PM Pavel Begunkov <[email protected]> wrote:
>> On 6/28/21 9:17 AM, Dmitry Kadashev wrote:
>>> On Thu, Jun 24, 2021 at 7:22 PM Pavel Begunkov <[email protected]> wrote:
>>>> On 6/24/21 12:11 PM, Dmitry Kadashev wrote:
>>>>> On Wed, Jun 23, 2021 at 6:54 PM Pavel Begunkov <[email protected]> wrote:
>>>>>> On 6/23/21 7:41 AM, Dmitry Kadashev wrote:
>>>>>>> I'd imagine READ_ONCE is to be used in those checks though, isn't it? Some of
>>>>>>> the existing checks like this lack it too btw. I suppose I can fix those in a
>>>>>>> separate commit if that makes sense.
>>>>>>
>>>>>> When we really use a field there should be a READ_ONCE(),
>>>>>> but I wouldn't care about those we check for compatibility
>>>>>> reasons, but that's only my opinion.
>>>>>
>>>>> I'm not sure how the compatibility check reads are special. The code is
>>>>> either correct or not. If a compatibility check has correctness problems
>>>>> then it's pretty much as bad as any other part of the code having such
>>>>> problems, no?
>>>>
>>>> If it reads and verifies a values first, e.g. index into some internal
>>>> array, and then compiler plays a joke and reloads it, we might be
>>>> absolutely screwed expecting 'segfaults', kernel data leakages and all
>>>> the fun stuff.
>>>>
>>>> If that's a compatibility check, whether it's loaded earlier or later,
>>>> or whatever, it's not a big deal, the userspace can in any case change
>>>> the memory at any moment it wishes, even tightly around the moment
>>>> we're reading it.
>>>
>>> Sorry for the slow reply, I have to balance this with my actual job that
>>> is not directly related to the kernel development :)
>>>
>>> I'm no kernel concurrency expert (actually I'm not any kind of kernel
>>> expert), but my understanding is READ_ONCE does not just mean "do not
>>> read more than once", but rather "read exactly once" (and more than
>>> that), and if it's not applied then the compiler is within its rights to
>>> optimize the read out, so the compatibility check can effectively be
>>> disabled.
>>
>> Yep, as they say it's about all the "inventive" transformations
>> compilers can do, double read is just one of those that may turn very
>> nasty for us.
>>
>> One big difference for me is whether it have a potential to crash the
>> kernel or not, though it's just one side.
> 
> Ah, that makes sense.
> 
>> Compilers can't drop the check just because, it first should be proven
>> to be safe to do, and there are all sorts barriers around and
>> limitations on how CQEs and SQEs are used, making impossible to alias
>> memory. E.g. CQEs and SQEs can't be reused in a single syscall, they're
>> only written and read respectively, and so on. Maybe, the only one I'd
>> worry about is the call to io_commit_sqring(), i.e. for SQE reads not
>> happening after it, but we need to take a look whether it's
>> theoretically possible.
> 
> Thanks for the explanation, Pavel!

btw, that was for why we were rather reluctant about that. It could
be a good idea to add READ_ONCE as you mentioned, at least would be
less confusing. 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 53+ messages in thread

end of thread, other threads:[~2021-07-12 13:14 UTC | newest]

Thread overview: 53+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-06-03  5:18 [PATCH v5 00/10] io_uring: add mkdir, [sym]linkat and mknodat support Dmitry Kadashev
2021-06-03  5:18 ` [PATCH v5 01/10] fs: make do_mkdirat() take struct filename Dmitry Kadashev
2021-06-03  5:18 ` [PATCH v5 02/10] io_uring: add support for IORING_OP_MKDIRAT Dmitry Kadashev
2021-06-22 11:41   ` Pavel Begunkov
2021-06-22 11:50     ` Pavel Begunkov
2021-06-23  6:41       ` Dmitry Kadashev
2021-06-23 11:53         ` Pavel Begunkov
2021-06-24 11:11           ` Dmitry Kadashev
2021-06-24 12:21             ` Pavel Begunkov
2021-06-28  8:17               ` Dmitry Kadashev
2021-07-07 14:06                 ` Pavel Begunkov
2021-07-12 12:44                   ` Dmitry Kadashev
2021-07-12 13:14                     ` Pavel Begunkov
2021-06-22 17:41   ` Pavel Begunkov
2021-06-23  0:41     ` Jens Axboe
2021-06-23  5:50     ` Dmitry Kadashev
2021-06-03  5:18 ` [PATCH v5 03/10] fs: make do_mknodat() take struct filename Dmitry Kadashev
2021-06-03  5:18 ` [PATCH v5 04/10] fs: make do_symlinkat() " Dmitry Kadashev
2021-06-03  5:18 ` [PATCH v5 05/10] namei: add getname_uflags() Dmitry Kadashev
2021-06-03  5:18 ` [PATCH v5 06/10] fs: make do_linkat() take struct filename Dmitry Kadashev
2021-06-03  5:18 ` [PATCH v5 07/10] fs: update do_*() helpers to return ints Dmitry Kadashev
2021-06-03  5:18 ` [PATCH v5 08/10] io_uring: add support for IORING_OP_SYMLINKAT Dmitry Kadashev
2021-06-22 11:36   ` Pavel Begunkov
2021-06-23  5:45     ` Dmitry Kadashev
2021-06-03  5:18 ` [PATCH v5 09/10] io_uring: add support for IORING_OP_LINKAT Dmitry Kadashev
2021-06-22 11:48   ` Pavel Begunkov
2021-06-23  6:09     ` Dmitry Kadashev
2021-06-23 13:13       ` Pavel Begunkov
2021-06-03  5:18 ` [PATCH v5 10/10] io_uring: add support for IORING_OP_MKNODAT Dmitry Kadashev
2021-06-22 11:52   ` Pavel Begunkov
2021-06-23  6:26     ` Dmitry Kadashev
2021-06-23 11:58       ` Pavel Begunkov
2021-06-24  2:36       ` Jens Axboe
2021-06-18  6:24 ` [PATCH v5 00/10] io_uring: add mkdir, [sym]linkat and mknodat support Dmitry Kadashev
2021-06-18 16:10   ` Jens Axboe
2021-06-21 15:21     ` Jens Axboe
2021-06-22  8:12       ` Christian Brauner
2021-06-22  8:34         ` Dmitry Kadashev
2021-06-29 13:06           ` Christian Brauner
2021-06-22 17:26         ` Jens Axboe
2021-06-22  8:26       ` Dmitry Kadashev
2021-06-21 15:57 ` Jens Axboe
2021-06-21 15:59   ` Jens Axboe
2021-06-22 11:56 ` Pavel Begunkov
2021-06-22 17:26   ` Jens Axboe
2021-06-22 17:28     ` Pavel Begunkov
2021-06-22 17:32       ` Jens Axboe
2021-06-23  5:37         ` Dmitry Kadashev
2021-06-23  5:49         ` Dmitry Kadashev
2021-06-24  2:37           ` Jens Axboe
2021-06-24 10:55             ` Dmitry Kadashev
2021-06-23  5:35   ` Dmitry Kadashev
2021-06-24  2:37     ` Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox