public inbox for [email protected]
 help / color / mirror / Atom feed
* [PATCH v2 0/3] io_uring: add getdents64 support
@ 2021-11-24 23:16 Stefan Roesch
  2021-11-24 23:16 ` [PATCH v2 1/3] fs: add parameter use_fpos to iterate_dir function Stefan Roesch
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Stefan Roesch @ 2021-11-24 23:16 UTC (permalink / raw)
  To: io-uring, linux-fsdevel; +Cc: shr

This series adds support for getdents64 in liburing. The intent is to
provide a more complete I/O interface for io_uring.

Patch 1: fs: add parameter use_fpos to iterate_dir()
  This adds a new parameter to the function iterate_dir() so the
  caller can specify if the position is the file position or the
  position stored in the buffer context.

Patch 2: fs: split off vfs_getdents function from getdents64 system call
  This splits of the iterate_dir part of the syscall in its own
  dedicated function. This allows to call the function directly from
  liburing.

Patch 3: io_uring: add support for getdents64
  Adds the functions to io_uring to support getdents64.

There is also a patch series for the changes to liburing. This includes
a new test. The patch series is called "liburing: add getdents support."

The following tests have been performed:
- new liburing getdents test program has been run
- xfstests have been run
- both tests have been repeated with the kernel memory leak checker
  and no leaks have been reported.

Signed-off-by: Stefan Roesch <[email protected]>
---
V2: Updated the iterate_dir calls in fs/ksmbd, fs/ecryptfs and arch/alpha with
    the additional parameter.

Stefan Roesch (3):
  fs: add parameter use_fpos to iterate_dir function
  fs: split off vfs_getdents function of getdents64 syscall
  io_uring: add support for getdents64

 arch/alpha/kernel/osf_sys.c   |  2 +-
 fs/ecryptfs/file.c            |  2 +-
 fs/exportfs/expfs.c           |  2 +-
 fs/internal.h                 |  8 +++++
 fs/io_uring.c                 | 52 ++++++++++++++++++++++++++++
 fs/ksmbd/smb2pdu.c            |  2 +-
 fs/ksmbd/vfs.c                |  4 +--
 fs/nfsd/nfs4recover.c         |  2 +-
 fs/nfsd/vfs.c                 |  2 +-
 fs/overlayfs/readdir.c        |  6 ++--
 fs/readdir.c                  | 64 ++++++++++++++++++++++++++---------
 include/linux/fs.h            |  2 +-
 include/uapi/linux/io_uring.h |  1 +
 13 files changed, 121 insertions(+), 28 deletions(-)


base-commit: f0afafc21027c39544a2c1d889b0cff75b346932
-- 
2.30.2


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v2 1/3] fs: add parameter use_fpos to iterate_dir function
  2021-11-24 23:16 [PATCH v2 0/3] io_uring: add getdents64 support Stefan Roesch
@ 2021-11-24 23:16 ` Stefan Roesch
  2021-11-24 23:16 ` [PATCH v2 2/3] fs: split off vfs_getdents function of getdents64 syscall Stefan Roesch
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Stefan Roesch @ 2021-11-24 23:16 UTC (permalink / raw)
  To: io-uring, linux-fsdevel; +Cc: shr

This adds the use_fpos parameter to the iterate_dir function.
If use_fpos is true it uses the file position in the file
structure (existing behavior). If use_fpos is false, it uses
the pos in the context structure.

This change is required to support getdents in io_uring.

Signed-off-by: Stefan Roesch <[email protected]>
---
 arch/alpha/kernel/osf_sys.c |  2 +-
 fs/ecryptfs/file.c          |  2 +-
 fs/exportfs/expfs.c         |  2 +-
 fs/ksmbd/smb2pdu.c          |  2 +-
 fs/ksmbd/vfs.c              |  4 ++--
 fs/nfsd/nfs4recover.c       |  2 +-
 fs/nfsd/vfs.c               |  2 +-
 fs/overlayfs/readdir.c      |  6 +++---
 fs/readdir.c                | 28 ++++++++++++++++++++--------
 include/linux/fs.h          |  2 +-
 10 files changed, 32 insertions(+), 20 deletions(-)

diff --git a/arch/alpha/kernel/osf_sys.c b/arch/alpha/kernel/osf_sys.c
index 8bbeebb73cf0..bf9e6a5d65a9 100644
--- a/arch/alpha/kernel/osf_sys.c
+++ b/arch/alpha/kernel/osf_sys.c
@@ -162,7 +162,7 @@ SYSCALL_DEFINE4(osf_getdirentries, unsigned int, fd,
 	if (!arg.file)
 		return -EBADF;
 
-	error = iterate_dir(arg.file, &buf.ctx);
+	error = iterate_dir(arg.file, &buf.ctx, true);
 	if (error >= 0)
 		error = buf.error;
 	if (count != buf.count)
diff --git a/fs/ecryptfs/file.c b/fs/ecryptfs/file.c
index 18d5b91cb573..d85e51cb9a88 100644
--- a/fs/ecryptfs/file.c
+++ b/fs/ecryptfs/file.c
@@ -109,7 +109,7 @@ static int ecryptfs_readdir(struct file *file, struct dir_context *ctx)
 		.sb = inode->i_sb,
 	};
 	lower_file = ecryptfs_file_to_lower(file);
-	rc = iterate_dir(lower_file, &buf.ctx);
+	rc = iterate_dir(lower_file, &buf.ctx, true);
 	ctx->pos = buf.ctx.pos;
 	if (rc < 0)
 		goto out;
diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c
index 0106eba46d5a..0f303356f907 100644
--- a/fs/exportfs/expfs.c
+++ b/fs/exportfs/expfs.c
@@ -323,7 +323,7 @@ static int get_name(const struct path *path, char *name, struct dentry *child)
 	while (1) {
 		int old_seq = buffer.sequence;
 
-		error = iterate_dir(file, &buffer.ctx);
+		error = iterate_dir(file, &buffer.ctx, true);
 		if (buffer.found) {
 			error = 0;
 			break;
diff --git a/fs/ksmbd/smb2pdu.c b/fs/ksmbd/smb2pdu.c
index 121f8e8c70ac..77424966b38c 100644
--- a/fs/ksmbd/smb2pdu.c
+++ b/fs/ksmbd/smb2pdu.c
@@ -3923,7 +3923,7 @@ int smb2_query_dir(struct ksmbd_work *work)
 	dir_fp->readdir_data.private		= &query_dir_private;
 	set_ctx_actor(&dir_fp->readdir_data.ctx, __query_dir);
 
-	rc = iterate_dir(dir_fp->filp, &dir_fp->readdir_data.ctx);
+	rc = iterate_dir(dir_fp->filp, &dir_fp->readdir_data.ctx, true);
 	if (rc == 0)
 		restart_ctx(&dir_fp->readdir_data.ctx);
 	if (rc == -ENOSPC)
diff --git a/fs/ksmbd/vfs.c b/fs/ksmbd/vfs.c
index 19d36393974c..1755a2a22275 100644
--- a/fs/ksmbd/vfs.c
+++ b/fs/ksmbd/vfs.c
@@ -1136,7 +1136,7 @@ int ksmbd_vfs_empty_dir(struct ksmbd_file *fp)
 	set_ctx_actor(&readdir_data.ctx, __dir_empty);
 	readdir_data.dirent_count = 0;
 
-	err = iterate_dir(fp->filp, &readdir_data.ctx);
+	err = iterate_dir(fp->filp, &readdir_data.ctx, true);
 	if (readdir_data.dirent_count > 2)
 		err = -ENOTEMPTY;
 	else
@@ -1186,7 +1186,7 @@ static int ksmbd_vfs_lookup_in_dir(struct path *dir, char *name, size_t namelen)
 	if (IS_ERR(dfilp))
 		return PTR_ERR(dfilp);
 
-	ret = iterate_dir(dfilp, &readdir_data.ctx);
+	ret = iterate_dir(dfilp, &readdir_data.ctx, true);
 	if (readdir_data.dirent_count > 0)
 		ret = 0;
 	fput(dfilp);
diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c
index 6fedc49726bf..013b1a3530c9 100644
--- a/fs/nfsd/nfs4recover.c
+++ b/fs/nfsd/nfs4recover.c
@@ -307,7 +307,7 @@ nfsd4_list_rec_dir(recdir_func *f, struct nfsd_net *nn)
 		return status;
 	}
 
-	status = iterate_dir(nn->rec_file, &ctx.ctx);
+	status = iterate_dir(nn->rec_file, &ctx.ctx, true);
 	inode_lock_nested(d_inode(dir), I_MUTEX_PARENT);
 
 	list_for_each_entry_safe(entry, tmp, &ctx.names, list) {
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index c99857689e2c..cd7a7d783fa7 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1980,7 +1980,7 @@ static __be32 nfsd_buffered_readdir(struct file *file, struct svc_fh *fhp,
 		buf.used = 0;
 		buf.full = 0;
 
-		host_err = iterate_dir(file, &buf.ctx);
+		host_err = iterate_dir(file, &buf.ctx, true);
 		if (buf.full)
 			host_err = 0;
 
diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
index 150fdf3bc68d..089150315942 100644
--- a/fs/overlayfs/readdir.c
+++ b/fs/overlayfs/readdir.c
@@ -306,7 +306,7 @@ static inline int ovl_dir_read(struct path *realpath,
 	do {
 		rdd->count = 0;
 		rdd->err = 0;
-		err = iterate_dir(realfile, &rdd->ctx);
+		err = iterate_dir(realfile, &rdd->ctx, true);
 		if (err >= 0)
 			err = rdd->err;
 	} while (!err && rdd->count);
@@ -722,7 +722,7 @@ static int ovl_iterate_real(struct file *file, struct dir_context *ctx)
 			return PTR_ERR(rdt.cache);
 	}
 
-	err = iterate_dir(od->realfile, &rdt.ctx);
+	err = iterate_dir(od->realfile, &rdt.ctx, true);
 	ctx->pos = rdt.ctx.pos;
 
 	return err;
@@ -753,7 +753,7 @@ static int ovl_iterate(struct file *file, struct dir_context *ctx)
 		      OVL_TYPE_MERGE(ovl_path_type(dentry->d_parent))))) {
 			err = ovl_iterate_real(file, ctx);
 		} else {
-			err = iterate_dir(od->realfile, ctx);
+			err = iterate_dir(od->realfile, ctx, true);
 		}
 		goto out;
 	}
diff --git a/fs/readdir.c b/fs/readdir.c
index 09e8ed7d4161..8ea5b5f45a78 100644
--- a/fs/readdir.c
+++ b/fs/readdir.c
@@ -21,6 +21,7 @@
 #include <linux/unistd.h>
 #include <linux/compat.h>
 #include <linux/uaccess.h>
+#include "internal.h"
 
 #include <asm/unaligned.h>
 
@@ -36,8 +37,14 @@
 	unsafe_copy_to_user(dst, src, len, label);		\
 } while (0)
 
-
-int iterate_dir(struct file *file, struct dir_context *ctx)
+/**
+ * iterate_dir - iterate over directory
+ * @file    : pointer to file struct of directory
+ * @ctx     : pointer to directory ctx structure
+ * @use_fpos: true : use file offset
+ *            false: use pos in ctx structure
+ */
+int iterate_dir(struct file *file, struct dir_context *ctx, bool use_fpos)
 {
 	struct inode *inode = file_inode(file);
 	bool shared = false;
@@ -60,12 +67,17 @@ int iterate_dir(struct file *file, struct dir_context *ctx)
 
 	res = -ENOENT;
 	if (!IS_DEADDIR(inode)) {
-		ctx->pos = file->f_pos;
+		if (use_fpos)
+			ctx->pos = file->f_pos;
+
 		if (shared)
 			res = file->f_op->iterate_shared(file, ctx);
 		else
 			res = file->f_op->iterate(file, ctx);
-		file->f_pos = ctx->pos;
+
+		if (use_fpos)
+			file->f_pos = ctx->pos;
+
 		fsnotify_access(file);
 		file_accessed(file);
 	}
@@ -190,7 +202,7 @@ SYSCALL_DEFINE3(old_readdir, unsigned int, fd,
 	if (!f.file)
 		return -EBADF;
 
-	error = iterate_dir(f.file, &buf.ctx);
+	error = iterate_dir(f.file, &buf.ctx, true);
 	if (buf.result)
 		error = buf.result;
 
@@ -283,7 +295,7 @@ SYSCALL_DEFINE3(getdents, unsigned int, fd,
 	if (!f.file)
 		return -EBADF;
 
-	error = iterate_dir(f.file, &buf.ctx);
+	error = iterate_dir(f.file, &buf.ctx, true);
 	if (error >= 0)
 		error = buf.error;
 	if (buf.prev_reclen) {
@@ -448,7 +460,7 @@ COMPAT_SYSCALL_DEFINE3(old_readdir, unsigned int, fd,
 	if (!f.file)
 		return -EBADF;
 
-	error = iterate_dir(f.file, &buf.ctx);
+	error = iterate_dir(f.file, &buf.ctx, true);
 	if (buf.result)
 		error = buf.result;
 
@@ -534,7 +546,7 @@ COMPAT_SYSCALL_DEFINE3(getdents, unsigned int, fd,
 	if (!f.file)
 		return -EBADF;
 
-	error = iterate_dir(f.file, &buf.ctx);
+	error = iterate_dir(f.file, &buf.ctx, true);
 	if (error >= 0)
 		error = buf.error;
 	if (buf.prev_reclen) {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 1cb616fc1105..ba4f49c4ac41 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3343,7 +3343,7 @@ const char *simple_get_link(struct dentry *, struct inode *,
 			    struct delayed_call *);
 extern const struct inode_operations simple_symlink_inode_operations;
 
-extern int iterate_dir(struct file *, struct dir_context *);
+extern int iterate_dir(struct file *file, struct dir_context *ctx, bool use_fpos);
 
 int vfs_fstatat(int dfd, const char __user *filename, struct kstat *stat,
 		int flags);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 2/3] fs: split off vfs_getdents function of getdents64 syscall
  2021-11-24 23:16 [PATCH v2 0/3] io_uring: add getdents64 support Stefan Roesch
  2021-11-24 23:16 ` [PATCH v2 1/3] fs: add parameter use_fpos to iterate_dir function Stefan Roesch
@ 2021-11-24 23:16 ` Stefan Roesch
  2021-11-24 23:17 ` [PATCH v2 3/3] io_uring: add support for getdents64 Stefan Roesch
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Stefan Roesch @ 2021-11-24 23:16 UTC (permalink / raw)
  To: io-uring, linux-fsdevel; +Cc: shr

This splits off the vfs_getdents function from the getdents64 system
call. This allows io_uring to call the function.

Signed-off-by: Stefan Roesch <[email protected]>
---
 fs/internal.h |  8 ++++++++
 fs/readdir.c  | 36 ++++++++++++++++++++++++++++--------
 2 files changed, 36 insertions(+), 8 deletions(-)

diff --git a/fs/internal.h b/fs/internal.h
index 7979ff8d168c..355be993b9f1 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -194,3 +194,11 @@ long splice_file_to_pipe(struct file *in,
 			 struct pipe_inode_info *opipe,
 			 loff_t *offset,
 			 size_t len, unsigned int flags);
+
+/*
+ * fs/readdir.c
+ */
+struct linux_dirent64;
+
+int vfs_getdents(struct file *file, struct linux_dirent64 __user *dirent,
+		 unsigned int count, s64 pos);
diff --git a/fs/readdir.c b/fs/readdir.c
index 8ea5b5f45a78..fc5b50fb160b 100644
--- a/fs/readdir.c
+++ b/fs/readdir.c
@@ -363,22 +363,26 @@ static int filldir64(struct dir_context *ctx, const char *name, int namlen,
 	return -EFAULT;
 }
 
-SYSCALL_DEFINE3(getdents64, unsigned int, fd,
-		struct linux_dirent64 __user *, dirent, unsigned int, count)
+/**
+ * vfs_getdents - getdents without fdget
+ * @file    : pointer to file struct of directory
+ * @dirent  : pointer to user directory structure
+ * @count   : size of buffer
+ * @ctx_pos : if file pos is used, pass -1,
+ *            if ctx pos is used, pass ctx pos
+ */
+int vfs_getdents(struct file *file, struct linux_dirent64 __user *dirent,
+		 unsigned int count, s64 ctx_pos)
 {
-	struct fd f;
 	struct getdents_callback64 buf = {
 		.ctx.actor = filldir64,
+		.ctx.pos = ctx_pos,
 		.count = count,
 		.current_dir = dirent
 	};
 	int error;
 
-	f = fdget_pos(fd);
-	if (!f.file)
-		return -EBADF;
-
-	error = iterate_dir(f.file, &buf.ctx);
+	error = iterate_dir(file, &buf.ctx, ctx_pos < 0);
 	if (error >= 0)
 		error = buf.error;
 	if (buf.prev_reclen) {
@@ -391,6 +395,22 @@ SYSCALL_DEFINE3(getdents64, unsigned int, fd,
 		else
 			error = count - buf.count;
 	}
+
+	return error;
+}
+
+SYSCALL_DEFINE3(getdents64, unsigned int, fd,
+		struct linux_dirent64 __user *, dirent, unsigned int, count)
+{
+	struct fd f;
+	int error;
+
+	f = fdget_pos(fd);
+	if (!f.file)
+		return -EBADF;
+
+	error = vfs_getdents(f.file, dirent, count, -1);
+
 	fdput_pos(f);
 	return error;
 }
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 3/3] io_uring: add support for getdents64
  2021-11-24 23:16 [PATCH v2 0/3] io_uring: add getdents64 support Stefan Roesch
  2021-11-24 23:16 ` [PATCH v2 1/3] fs: add parameter use_fpos to iterate_dir function Stefan Roesch
  2021-11-24 23:16 ` [PATCH v2 2/3] fs: split off vfs_getdents function of getdents64 syscall Stefan Roesch
@ 2021-11-24 23:17 ` Stefan Roesch
  2021-11-25  4:42 ` [PATCH v2 0/3] io_uring: add getdents64 support Clay Harris
  2021-12-15 21:09 ` Jens Axboe
  4 siblings, 0 replies; 9+ messages in thread
From: Stefan Roesch @ 2021-11-24 23:17 UTC (permalink / raw)
  To: io-uring, linux-fsdevel; +Cc: shr

This adds support for getdents64 to io_uring.

Signed-off-by: Stefan Roesch <[email protected]>
---
 fs/io_uring.c                 | 52 +++++++++++++++++++++++++++++++++++
 include/uapi/linux/io_uring.h |  1 +
 2 files changed, 53 insertions(+)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index f666a0e7f5e8..d423050276e8 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -691,6 +691,13 @@ struct io_hardlink {
 	int				flags;
 };
 
+struct io_getdents {
+	struct file			*file;
+	struct linux_dirent64 __user	*dirent;
+	unsigned int			count;
+	loff_t				pos;
+};
+
 struct io_async_connect {
 	struct sockaddr_storage		address;
 };
@@ -856,6 +863,7 @@ struct io_kiocb {
 		struct io_mkdir		mkdir;
 		struct io_symlink	symlink;
 		struct io_hardlink	hardlink;
+		struct io_getdents	getdents;
 	};
 
 	u8				opcode;
@@ -1105,6 +1113,9 @@ static const struct io_op_def io_op_defs[] = {
 	[IORING_OP_MKDIRAT] = {},
 	[IORING_OP_SYMLINKAT] = {},
 	[IORING_OP_LINKAT] = {},
+	[IORING_OP_GETDENTS] = {
+		.needs_file		= 1,
+	},
 };
 
 /* requests with any of those set should undergo io_disarm_next() */
@@ -3971,6 +3982,42 @@ static int io_linkat(struct io_kiocb *req, unsigned int issue_flags)
 	return 0;
 }
 
+static int io_getdents_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+{
+	struct io_getdents *getdents = &req->getdents;
+
+	if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
+		return -EINVAL;
+	if (sqe->ioprio || sqe->rw_flags || sqe->buf_index)
+		return -EINVAL;
+
+	getdents->pos = READ_ONCE(sqe->off);
+	getdents->dirent = u64_to_user_ptr(READ_ONCE(sqe->addr));
+	getdents->count = READ_ONCE(sqe->len);
+
+	return 0;
+}
+
+static int io_getdents(struct io_kiocb *req, unsigned int issue_flags)
+{
+	struct io_getdents *getdents = &req->getdents;
+	int ret;
+
+	if (issue_flags & IO_URING_F_NONBLOCK)
+		return -EAGAIN;
+
+	ret = vfs_getdents(req->file, getdents->dirent, getdents->count, getdents->pos);
+	if (ret < 0) {
+		if (ret == -ERESTARTSYS)
+			ret = -EINTR;
+
+		req_set_fail(req);
+	}
+
+	io_req_complete(req, ret);
+	return 0;
+}
+
 static int io_shutdown_prep(struct io_kiocb *req,
 			    const struct io_uring_sqe *sqe)
 {
@@ -6486,6 +6533,8 @@ static int io_req_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 		return io_symlinkat_prep(req, sqe);
 	case IORING_OP_LINKAT:
 		return io_linkat_prep(req, sqe);
+	case IORING_OP_GETDENTS:
+		return io_getdents_prep(req, sqe);
 	}
 
 	printk_once(KERN_WARNING "io_uring: unhandled opcode %d\n",
@@ -6768,6 +6817,9 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
 	case IORING_OP_LINKAT:
 		ret = io_linkat(req, issue_flags);
 		break;
+	case IORING_OP_GETDENTS:
+		ret = io_getdents(req, issue_flags);
+		break;
 	default:
 		ret = -EINVAL;
 		break;
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 787f491f0d2a..57dc88db5793 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -143,6 +143,7 @@ enum {
 	IORING_OP_MKDIRAT,
 	IORING_OP_SYMLINKAT,
 	IORING_OP_LINKAT,
+	IORING_OP_GETDENTS,
 
 	/* this goes last, obviously */
 	IORING_OP_LAST,
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 0/3] io_uring: add getdents64 support
  2021-11-24 23:16 [PATCH v2 0/3] io_uring: add getdents64 support Stefan Roesch
                   ` (2 preceding siblings ...)
  2021-11-24 23:17 ` [PATCH v2 3/3] io_uring: add support for getdents64 Stefan Roesch
@ 2021-11-25  4:42 ` Clay Harris
  2021-12-01  6:01   ` Stefan Roesch
  2021-12-15 21:09 ` Jens Axboe
  4 siblings, 1 reply; 9+ messages in thread
From: Clay Harris @ 2021-11-25  4:42 UTC (permalink / raw)
  To: Stefan Roesch; +Cc: io-uring, linux-fsdevel


I seem to recall making a few comments the last time a getdents64
for io_uring was proposed; in particular I wanted to bring up one
here.  This applies only to altering the internal interface, which
io_uring would use, although wiring up a new syscall would be a nice
addition.

The current interface has 2 issues:

1)
getdents64 requires at least two calls to read a directory.
One or more to get the dents and a final call to see the EOF.
With small directories, this means literally 50% of the calls
are wasted.

2)
The fpos cannot be changed atomically with a read, so it is not
possible to to safely perform concurrent reads on the same fd.

But, the kernel knows (most, if not all of the time) that it is at
EOF at the time it returns the last buffer.  So, it would be very
useful to get an EOF indicator back with the final buffer.  This
could just a flag, or for instance make an fpos parameter which is
both input and output, returning the (post read) fpos or zero at
EOF.

Futhermore, for input, one could supply:
	0:	Start from directory beginning
	-1:	Read from current position
	other:	(output from previous call) Read from here

On Wed, Nov 24 2021 at 15:16:57 -0800, Stefan Roesch quoth thus:

> This series adds support for getdents64 in liburing. The intent is to
> provide a more complete I/O interface for io_uring.
> 
> Patch 1: fs: add parameter use_fpos to iterate_dir()
>   This adds a new parameter to the function iterate_dir() so the
>   caller can specify if the position is the file position or the
>   position stored in the buffer context.
> 
> Patch 2: fs: split off vfs_getdents function from getdents64 system call
>   This splits of the iterate_dir part of the syscall in its own
>   dedicated function. This allows to call the function directly from
>   liburing.
> 
> Patch 3: io_uring: add support for getdents64
>   Adds the functions to io_uring to support getdents64.
> 
> There is also a patch series for the changes to liburing. This includes
> a new test. The patch series is called "liburing: add getdents support."
> 
> The following tests have been performed:
> - new liburing getdents test program has been run
> - xfstests have been run
> - both tests have been repeated with the kernel memory leak checker
>   and no leaks have been reported.
> 
> Signed-off-by: Stefan Roesch <[email protected]>
> ---
> V2: Updated the iterate_dir calls in fs/ksmbd, fs/ecryptfs and arch/alpha with
>     the additional parameter.
> 
> Stefan Roesch (3):
>   fs: add parameter use_fpos to iterate_dir function
>   fs: split off vfs_getdents function of getdents64 syscall
>   io_uring: add support for getdents64
> 
>  arch/alpha/kernel/osf_sys.c   |  2 +-
>  fs/ecryptfs/file.c            |  2 +-
>  fs/exportfs/expfs.c           |  2 +-
>  fs/internal.h                 |  8 +++++
>  fs/io_uring.c                 | 52 ++++++++++++++++++++++++++++
>  fs/ksmbd/smb2pdu.c            |  2 +-
>  fs/ksmbd/vfs.c                |  4 +--
>  fs/nfsd/nfs4recover.c         |  2 +-
>  fs/nfsd/vfs.c                 |  2 +-
>  fs/overlayfs/readdir.c        |  6 ++--
>  fs/readdir.c                  | 64 ++++++++++++++++++++++++++---------
>  include/linux/fs.h            |  2 +-
>  include/uapi/linux/io_uring.h |  1 +
>  13 files changed, 121 insertions(+), 28 deletions(-)
> 
> 
> base-commit: f0afafc21027c39544a2c1d889b0cff75b346932
> -- 
> 2.30.2

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 0/3] io_uring: add getdents64 support
  2021-11-25  4:42 ` [PATCH v2 0/3] io_uring: add getdents64 support Clay Harris
@ 2021-12-01  6:01   ` Stefan Roesch
  2021-12-01  7:11     ` Clay Harris
  0 siblings, 1 reply; 9+ messages in thread
From: Stefan Roesch @ 2021-12-01  6:01 UTC (permalink / raw)
  To: Clay Harris; +Cc: io-uring, linux-fsdevel



On 11/24/21 8:42 PM, Clay Harris wrote:
> 
> I seem to recall making a few comments the last time a getdents64
> for io_uring was proposed; in particular I wanted to bring up one
> here.  This applies only to altering the internal interface, which
> io_uring would use, although wiring up a new syscall would be a nice
> addition.
> 
> The current interface has 2 issues:
> 
> 1)
> getdents64 requires at least two calls to read a directory.
> One or more to get the dents and a final call to see the EOF.
> With small directories, this means literally 50% of the calls
> are wasted.
> 
> 2)
> The fpos cannot be changed atomically with a read, so it is not
> possible to to safely perform concurrent reads on the same fd.
> 
> But, the kernel knows (most, if not all of the time) that it is at
> EOF at the time it returns the last buffer.  So, it would be very
> useful to get an EOF indicator back with the final buffer.  This
> could just a flag, or for instance make an fpos parameter which is
> both input and output, returning the (post read) fpos or zero at
> EOF.
> 
> Futhermore, for input, one could supply:
> 	0:	Start from directory beginning
> 	-1:	Read from current position
> 	other:	(output from previous call) Read from here
> 

While I can understand the wish to optimize the getdents call, this
has its own set of challenges:

- The getdents API is following the logic of other read API's. None
  of these API's has the logic you described above. This would be
  inconsistent.
- The eof needs to be stored in another field. The dirent structure
  does not have space in the field, so a new data structure needs to be defined.
- However the goal is to provide a familiar interface to the user.
- If the user wants to reduce the number of calls he can still provide
  a bigger user buffer.

> On Wed, Nov 24 2021 at 15:16:57 -0800, Stefan Roesch quoth thus:
> 
>> This series adds support for getdents64 in liburing. The intent is to
>> provide a more complete I/O interface for io_uring.
>>
>> Patch 1: fs: add parameter use_fpos to iterate_dir()
>>   This adds a new parameter to the function iterate_dir() so the
>>   caller can specify if the position is the file position or the
>>   position stored in the buffer context.
>>
>> Patch 2: fs: split off vfs_getdents function from getdents64 system call
>>   This splits of the iterate_dir part of the syscall in its own
>>   dedicated function. This allows to call the function directly from
>>   liburing.
>>
>> Patch 3: io_uring: add support for getdents64
>>   Adds the functions to io_uring to support getdents64.
>>
>> There is also a patch series for the changes to liburing. This includes
>> a new test. The patch series is called "liburing: add getdents support."
>>
>> The following tests have been performed:
>> - new liburing getdents test program has been run
>> - xfstests have been run
>> - both tests have been repeated with the kernel memory leak checker
>>   and no leaks have been reported.
>>
>> Signed-off-by: Stefan Roesch <[email protected]>
>> ---
>> V2: Updated the iterate_dir calls in fs/ksmbd, fs/ecryptfs and arch/alpha with
>>     the additional parameter.
>>
>> Stefan Roesch (3):
>>   fs: add parameter use_fpos to iterate_dir function
>>   fs: split off vfs_getdents function of getdents64 syscall
>>   io_uring: add support for getdents64
>>
>>  arch/alpha/kernel/osf_sys.c   |  2 +-
>>  fs/ecryptfs/file.c            |  2 +-
>>  fs/exportfs/expfs.c           |  2 +-
>>  fs/internal.h                 |  8 +++++
>>  fs/io_uring.c                 | 52 ++++++++++++++++++++++++++++
>>  fs/ksmbd/smb2pdu.c            |  2 +-
>>  fs/ksmbd/vfs.c                |  4 +--
>>  fs/nfsd/nfs4recover.c         |  2 +-
>>  fs/nfsd/vfs.c                 |  2 +-
>>  fs/overlayfs/readdir.c        |  6 ++--
>>  fs/readdir.c                  | 64 ++++++++++++++++++++++++++---------
>>  include/linux/fs.h            |  2 +-
>>  include/uapi/linux/io_uring.h |  1 +
>>  13 files changed, 121 insertions(+), 28 deletions(-)
>>
>>
>> base-commit: f0afafc21027c39544a2c1d889b0cff75b346932
>> -- 
>> 2.30.2

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 0/3] io_uring: add getdents64 support
  2021-12-01  6:01   ` Stefan Roesch
@ 2021-12-01  7:11     ` Clay Harris
  0 siblings, 0 replies; 9+ messages in thread
From: Clay Harris @ 2021-12-01  7:11 UTC (permalink / raw)
  To: Stefan Roesch; +Cc: io-uring, linux-fsdevel

On Tue, Nov 30 2021 at 22:01:30 -0800, Stefan Roesch quoth thus:

> 
> 
> On 11/24/21 8:42 PM, Clay Harris wrote:
> > 
> > I seem to recall making a few comments the last time a getdents64
> > for io_uring was proposed; in particular I wanted to bring up one
> > here.  This applies only to altering the internal interface, which
> > io_uring would use, although wiring up a new syscall would be a nice
> > addition.
> > 
> > The current interface has 2 issues:
> > 
> > 1)
> > getdents64 requires at least two calls to read a directory.
> > One or more to get the dents and a final call to see the EOF.
> > With small directories, this means literally 50% of the calls
> > are wasted.
> > 
> > 2)
> > The fpos cannot be changed atomically with a read, so it is not
> > possible to to safely perform concurrent reads on the same fd.
> > 
> > But, the kernel knows (most, if not all of the time) that it is at
> > EOF at the time it returns the last buffer.  So, it would be very
> > useful to get an EOF indicator back with the final buffer.  This
> > could just a flag, or for instance make an fpos parameter which is
> > both input and output, returning the (post read) fpos or zero at
> > EOF.
> > 
> > Futhermore, for input, one could supply:
> > 	0:	Start from directory beginning
> > 	-1:	Read from current position
> > 	other:	(output from previous call) Read from here
> > 

Thank you for taking the time to respond!

> While I can understand the wish to optimize the getdents call, this
> has its own set of challenges:
> 
> - The getdents API is following the logic of other read API's. None
>   of these API's has the logic you described above. This would be
>   inconsistent.

The point was that the familiar interface can be improved by changing its API.

Consider as if you were implementing a new syscall:
ssize_t getdents2(int fd, void *buf, size_t count, off_t *offset, int flags);

This is close enough to pread() that no one would be slowed down by its
unfamiliarity.  It would work just like getdents64(), except that the fpos
at the end of the read would always be returned in offset.  At EOF, fpos 0
would be returned.  On the calling side, one would set a flag and set offset
to a value other than -1 to cause it to atomically change fpos before the read.

Just considering an io_uring (only) call, io_uring is already a new interface,
so the inconsistency point doesn't really seem to apply.

> - The eof needs to be stored in another field. The dirent structure
>   does not have space in the field, so a new data structure needs to be defined.

I believe the important part of the idea is the EOF processing, so I'll drop
the fpos discussion for now.  In this case you could use exactly the old
getdents64() API for io_uring with the minor addition of a returned flag bit.
My understanding, which I'll admit could be flawed, is that the directory
iterator should be testable for the EOF condition.  If correct, I'd think
that testing it and setting a bit in the cqe flags field is all you'd need.
So, as long as the internal call has access to the iterator, no new structures
are required.

> - However the goal is to provide a familiar interface to the user.
> - If the user wants to reduce the number of calls he can still provide
>   a bigger user buffer.

In io_uring, if there is no EOF flag as suggested above, the best you
could probably due is to queue two requests, IOSQE_IO_HARDLINK-ed together,
every time you want some directory entries, and then look for a zero length
return.  That's extra logic, the extra wasted call, and already doubling the
required buffering.

> 
> > On Wed, Nov 24 2021 at 15:16:57 -0800, Stefan Roesch quoth thus:
> > 
> >> This series adds support for getdents64 in liburing. The intent is to
> >> provide a more complete I/O interface for io_uring.
> >>
> >> Patch 1: fs: add parameter use_fpos to iterate_dir()
> >>   This adds a new parameter to the function iterate_dir() so the
> >>   caller can specify if the position is the file position or the
> >>   position stored in the buffer context.
> >>
> >> Patch 2: fs: split off vfs_getdents function from getdents64 system call
> >>   This splits of the iterate_dir part of the syscall in its own
> >>   dedicated function. This allows to call the function directly from
> >>   liburing.
> >>
> >> Patch 3: io_uring: add support for getdents64
> >>   Adds the functions to io_uring to support getdents64.
> >>
> >> There is also a patch series for the changes to liburing. This includes
> >> a new test. The patch series is called "liburing: add getdents support."
> >>
> >> The following tests have been performed:
> >> - new liburing getdents test program has been run
> >> - xfstests have been run
> >> - both tests have been repeated with the kernel memory leak checker
> >>   and no leaks have been reported.
> >>
> >> Signed-off-by: Stefan Roesch <[email protected]>
> >> ---
> >> V2: Updated the iterate_dir calls in fs/ksmbd, fs/ecryptfs and arch/alpha with
> >>     the additional parameter.
> >>
> >> Stefan Roesch (3):
> >>   fs: add parameter use_fpos to iterate_dir function
> >>   fs: split off vfs_getdents function of getdents64 syscall
> >>   io_uring: add support for getdents64
> >>
> >>  arch/alpha/kernel/osf_sys.c   |  2 +-
> >>  fs/ecryptfs/file.c            |  2 +-
> >>  fs/exportfs/expfs.c           |  2 +-
> >>  fs/internal.h                 |  8 +++++
> >>  fs/io_uring.c                 | 52 ++++++++++++++++++++++++++++
> >>  fs/ksmbd/smb2pdu.c            |  2 +-
> >>  fs/ksmbd/vfs.c                |  4 +--
> >>  fs/nfsd/nfs4recover.c         |  2 +-
> >>  fs/nfsd/vfs.c                 |  2 +-
> >>  fs/overlayfs/readdir.c        |  6 ++--
> >>  fs/readdir.c                  | 64 ++++++++++++++++++++++++++---------
> >>  include/linux/fs.h            |  2 +-
> >>  include/uapi/linux/io_uring.h |  1 +
> >>  13 files changed, 121 insertions(+), 28 deletions(-)
> >>
> >>
> >> base-commit: f0afafc21027c39544a2c1d889b0cff75b346932
> >> -- 
> >> 2.30.2

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 0/3] io_uring: add getdents64 support
  2021-11-24 23:16 [PATCH v2 0/3] io_uring: add getdents64 support Stefan Roesch
                   ` (3 preceding siblings ...)
  2021-11-25  4:42 ` [PATCH v2 0/3] io_uring: add getdents64 support Clay Harris
@ 2021-12-15 21:09 ` Jens Axboe
  2021-12-31 23:27   ` Al Viro
  4 siblings, 1 reply; 9+ messages in thread
From: Jens Axboe @ 2021-12-15 21:09 UTC (permalink / raw)
  To: Stefan Roesch, io-uring, linux-fsdevel, Al Viro

On 11/24/21 4:16 PM, Stefan Roesch wrote:
> This series adds support for getdents64 in liburing. The intent is to
> provide a more complete I/O interface for io_uring.
> 
> Patch 1: fs: add parameter use_fpos to iterate_dir()
>   This adds a new parameter to the function iterate_dir() so the
>   caller can specify if the position is the file position or the
>   position stored in the buffer context.
> 
> Patch 2: fs: split off vfs_getdents function from getdents64 system call
>   This splits of the iterate_dir part of the syscall in its own
>   dedicated function. This allows to call the function directly from
>   liburing.
> 
> Patch 3: io_uring: add support for getdents64
>   Adds the functions to io_uring to support getdents64.
> 
> There is also a patch series for the changes to liburing. This includes
> a new test. The patch series is called "liburing: add getdents support."

Al, ping on this one as well, same question on the VFS side.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 0/3] io_uring: add getdents64 support
  2021-12-15 21:09 ` Jens Axboe
@ 2021-12-31 23:27   ` Al Viro
  0 siblings, 0 replies; 9+ messages in thread
From: Al Viro @ 2021-12-31 23:27 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Stefan Roesch, io-uring, linux-fsdevel

On Wed, Dec 15, 2021 at 02:09:52PM -0700, Jens Axboe wrote:
> On 11/24/21 4:16 PM, Stefan Roesch wrote:
> > This series adds support for getdents64 in liburing. The intent is to
> > provide a more complete I/O interface for io_uring.
> > 
> > Patch 1: fs: add parameter use_fpos to iterate_dir()
> >   This adds a new parameter to the function iterate_dir() so the
> >   caller can specify if the position is the file position or the
> >   position stored in the buffer context.
> > 
> > Patch 2: fs: split off vfs_getdents function from getdents64 system call
> >   This splits of the iterate_dir part of the syscall in its own
> >   dedicated function. This allows to call the function directly from
> >   liburing.
> > 
> > Patch 3: io_uring: add support for getdents64
> >   Adds the functions to io_uring to support getdents64.
> > 
> > There is also a patch series for the changes to liburing. This includes
> > a new test. The patch series is called "liburing: add getdents support."
> 
> Al, ping on this one as well, same question on the VFS side.

First of all, apologies for being MIA - had been sick for a couple of
months, so right now I'm digging through the huge pile of mail.

The problem is not with VFS side - it's with ->iterate_dir() *instances*.
The logics for validation of file position in there is not pretty, and
it's based upon the assumption that all position changes (other than
those from ->iterate_dir() itself) come through lseek(2), with its
per-opened-file exclusion with getdents(2).

Your API just shoves an arbitrary number at them, with no indication
as far as struct file instance is concerned that something might need
to be checked.  Hell, the file might've been just opened with no
directory-changing operations done since then; of course its position
is valid.  And has nothing whatsoever with the number you put into
ctx->pos.

Normalization is sufficiently costly to show up on real-world loads.
Even "assume it bogus" flag somewhere in ctx will seriously cost on
io_uring side.  And that doesn't help with the case of tmpfs et.al.
where position is ignored - there's a per-instance cursor moved
around on lseek/readdir and that's the authoritative thing there.

So please, use a saner userland ABI - pread-like stuff is a really,
really bad idea for directories.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-12-31 23:27 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-11-24 23:16 [PATCH v2 0/3] io_uring: add getdents64 support Stefan Roesch
2021-11-24 23:16 ` [PATCH v2 1/3] fs: add parameter use_fpos to iterate_dir function Stefan Roesch
2021-11-24 23:16 ` [PATCH v2 2/3] fs: split off vfs_getdents function of getdents64 syscall Stefan Roesch
2021-11-24 23:17 ` [PATCH v2 3/3] io_uring: add support for getdents64 Stefan Roesch
2021-11-25  4:42 ` [PATCH v2 0/3] io_uring: add getdents64 support Clay Harris
2021-12-01  6:01   ` Stefan Roesch
2021-12-01  7:11     ` Clay Harris
2021-12-15 21:09 ` Jens Axboe
2021-12-31 23:27   ` Al Viro

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox