* [PATCH v5 0/2] io_uring: add support for IORING_OP_GETDENTS
@ 2021-03-30 11:17 Lennert Buytenhek
2021-03-30 11:18 ` [PATCH v5 1/2] readdir: split the core of getdents64(2) out into vfs_getdents() Lennert Buytenhek
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Lennert Buytenhek @ 2021-03-30 11:17 UTC (permalink / raw)
To: io-uring; +Cc: Tavian Barnes, Dmitry Kadashev
(These patches depend on IORING_OP_MKDIRAT going in first.)
These patches add support for IORING_OP_GETDENTS, which is a new io_uring
opcode that more or less does an lseek(sqe->fd, sqe->off, SEEK_SET)
followed by a getdents64(sqe->fd, (void *)sqe->addr, sqe->len).
A dumb test program which recursively scans through a directory tree
and prints the names of all directories and files it encounters along
the way is available here:
https://krautbox.wantstofly.org/~buytenh/uringfind-v3.c
Changes since v4:
- Make IORING_OP_GETDENTS read from the directory's current position
if the specified offset value is -1 (IORING_FEAT_RW_CUR_POS).
(Requested / pointed out by Tavian Barnes.)
- Rebase onto for-5.13/io_uring as of 2021/03/30 plus v3 of Dmitry
Kadashev's "io_uring: add mkdirat support".
Changes since v3:
- Made locking in io_getdents() unconditional, as the prior
optimization was racy. (Pointed out by Pavel Begunkov.)
- Rebase onto for-5.13/io_uring as of 2021/03/12 plus a manually
applied version of the mkdirat patch.
Changes since v2 RFC:
- Rebase onto io_uring-2021-02-17 plus a manually applied version of
the mkdirat patch. The latter is needed because userland (liburing)
has already merged the opcode for IORING_OP_MKDIRAT (in commit
"io_uring.h: 5.12 pending kernel sync") while this opcode isn't in
the kernel yet (as of io_uring-2021-02-17), and this means that this
can't be merged until IORING_OP_MKDIRAT is merged.
- Adapt to changes made in "io_uring: replace force_nonblock with flags"
that are in io_uring-2021-02-17.
Changes since v1 RFC:
- Drop the trailing '64' from IORING_OP_GETDENTS64 (suggested by
Matthew Wilcox).
- Instead of requiring that sqe->off be zero, use this field to pass
in a directory offset to start reading from. For the first
IORING_OP_GETDENTS call on a directory, this can be set to zero,
and for subsequent calls, it can be set to the ->d_off field of
the last struct linux_dirent64 returned by the previous call.
Lennert Buytenhek (2):
readdir: split the core of getdents64(2) out into vfs_getdents()
io_uring: add support for IORING_OP_GETDENTS
fs/io_uring.c | 66 ++++++++++++++++++++++++++++++++++++++++++
fs/readdir.c | 25 ++++++++++-----
include/linux/fs.h | 4 ++
include/uapi/linux/io_uring.h | 1
4 files changed, 88 insertions(+), 8 deletions(-)
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v5 1/2] readdir: split the core of getdents64(2) out into vfs_getdents()
2021-03-30 11:17 [PATCH v5 0/2] io_uring: add support for IORING_OP_GETDENTS Lennert Buytenhek
@ 2021-03-30 11:18 ` Lennert Buytenhek
2021-03-30 11:19 ` [PATCH v5 2/2] io_uring: add support for IORING_OP_GETDENTS Lennert Buytenhek
2021-04-04 4:42 ` [PATCH v5 0/2] " Clay Harris
2 siblings, 0 replies; 7+ messages in thread
From: Lennert Buytenhek @ 2021-03-30 11:18 UTC (permalink / raw)
To: io-uring; +Cc: Al Viro, linux-fsdevel
So that IORING_OP_GETDENTS may use it, split out the core of the
getdents64(2) syscall into a helper function, vfs_getdents().
vfs_getdents() calls into filesystems' ->iterate{,_shared}() which
expect serialization on struct file, which means that callers of
vfs_getdents() are responsible for either using fdget_pos() or
performing the equivalent serialization by hand.
Cc: Al Viro <[email protected]>
Signed-off-by: Lennert Buytenhek <[email protected]>
---
fs/readdir.c | 25 +++++++++++++++++--------
include/linux/fs.h | 4 ++++
2 files changed, 21 insertions(+), 8 deletions(-)
diff --git a/fs/readdir.c b/fs/readdir.c
index 19434b3c982c..f52167c1eb61 100644
--- a/fs/readdir.c
+++ b/fs/readdir.c
@@ -348,10 +348,9 @@ static int filldir64(struct dir_context *ctx, const char *name, int namlen,
return -EFAULT;
}
-SYSCALL_DEFINE3(getdents64, unsigned int, fd,
- struct linux_dirent64 __user *, dirent, unsigned int, count)
+int vfs_getdents(struct file *file, struct linux_dirent64 __user *dirent,
+ unsigned int count)
{
- struct fd f;
struct getdents_callback64 buf = {
.ctx.actor = filldir64,
.count = count,
@@ -359,11 +358,7 @@ SYSCALL_DEFINE3(getdents64, unsigned int, fd,
};
int error;
- f = fdget_pos(fd);
- if (!f.file)
- return -EBADF;
-
- error = iterate_dir(f.file, &buf.ctx);
+ error = iterate_dir(file, &buf.ctx);
if (error >= 0)
error = buf.error;
if (buf.prev_reclen) {
@@ -376,6 +371,20 @@ SYSCALL_DEFINE3(getdents64, unsigned int, fd,
else
error = count - buf.count;
}
+ return error;
+}
+
+SYSCALL_DEFINE3(getdents64, unsigned int, fd,
+ struct linux_dirent64 __user *, dirent, unsigned int, count)
+{
+ struct fd f;
+ int error;
+
+ f = fdget_pos(fd);
+ if (!f.file)
+ return -EBADF;
+
+ error = vfs_getdents(f.file, dirent, count);
fdput_pos(f);
return error;
}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index ec8f3ddf4a6a..c03235883e18 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3227,6 +3227,10 @@ extern const struct inode_operations simple_symlink_inode_operations;
extern int iterate_dir(struct file *, struct dir_context *);
+struct linux_dirent64;
+int vfs_getdents(struct file *file, struct linux_dirent64 __user *dirent,
+ unsigned int count);
+
int vfs_fstatat(int dfd, const char __user *filename, struct kstat *stat,
int flags);
int vfs_fstat(int fd, struct kstat *stat);
--
2.30.2
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH v5 2/2] io_uring: add support for IORING_OP_GETDENTS
2021-03-30 11:17 [PATCH v5 0/2] io_uring: add support for IORING_OP_GETDENTS Lennert Buytenhek
2021-03-30 11:18 ` [PATCH v5 1/2] readdir: split the core of getdents64(2) out into vfs_getdents() Lennert Buytenhek
@ 2021-03-30 11:19 ` Lennert Buytenhek
2021-03-30 12:11 ` Pavel Begunkov
2021-04-04 4:42 ` [PATCH v5 0/2] " Clay Harris
2 siblings, 1 reply; 7+ messages in thread
From: Lennert Buytenhek @ 2021-03-30 11:19 UTC (permalink / raw)
To: io-uring
IORING_OP_GETDENTS behaves much like getdents64(2) and takes the same
arguments, but with a small twist: it takes an additional offset
argument, and reading from the specified directory starts at the given
offset.
Internally, if necessary, IORING_OP_GETDENTS will vfs_llseek() to
the right directory position before calling vfs_getdents().
For the first IORING_OP_GETDENTS call on a directory, the offset
parameter can be set to zero, and for subsequent calls, it can be
set to the ->d_off field of the last struct linux_dirent64 returned
by the previous IORING_OP_GETDENTS call.
Alternatively, specifying an offset argument of -1 will read from
the directory's current file offset (IORING_FEAT_RW_CUR_POS).
Signed-off-by: Lennert Buytenhek <[email protected]>
---
fs/io_uring.c | 66 +++++++++++++++++++++++++++++++++++
include/uapi/linux/io_uring.h | 1 +
2 files changed, 67 insertions(+)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index f4ff3da821a5..90637d5a34b9 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -670,6 +670,13 @@ struct io_mkdir {
struct filename *filename;
};
+struct io_getdents {
+ struct file *file;
+ struct linux_dirent64 __user *dirent;
+ unsigned int count;
+ loff_t pos;
+};
+
struct io_completion {
struct file *file;
struct list_head list;
@@ -811,6 +818,7 @@ struct io_kiocb {
struct io_rename rename;
struct io_unlink unlink;
struct io_mkdir mkdir;
+ struct io_getdents getdents;
/* use only after cleaning per-op data, see io_clean_op() */
struct io_completion compl;
};
@@ -1025,6 +1033,9 @@ static const struct io_op_def io_op_defs[] = {
[IORING_OP_RENAMEAT] = {},
[IORING_OP_UNLINKAT] = {},
[IORING_OP_MKDIRAT] = {},
+ [IORING_OP_GETDENTS] = {
+ .needs_file = 1,
+ },
};
static bool io_disarm_next(struct io_kiocb *req);
@@ -4314,6 +4325,56 @@ static int io_sync_file_range(struct io_kiocb *req, unsigned int issue_flags)
return 0;
}
+static int io_getdents_prep(struct io_kiocb *req,
+ const struct io_uring_sqe *sqe)
+{
+ struct io_getdents *getdents = &req->getdents;
+
+ if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
+ return -EINVAL;
+ if (sqe->ioprio || sqe->rw_flags || sqe->buf_index)
+ return -EINVAL;
+
+ getdents->pos = READ_ONCE(sqe->off);
+ getdents->dirent = u64_to_user_ptr(READ_ONCE(sqe->addr));
+ getdents->count = READ_ONCE(sqe->len);
+ return 0;
+}
+
+static int io_getdents(struct io_kiocb *req, unsigned int issue_flags)
+{
+ struct io_getdents *getdents = &req->getdents;
+ int ret = 0;
+
+ /* getdents always requires a blocking context */
+ if (issue_flags & IO_URING_F_NONBLOCK)
+ return -EAGAIN;
+
+ /* for vfs_llseek and to serialize ->iterate_shared() on this file */
+ mutex_lock(&req->file->f_pos_lock);
+
+ if (getdents->pos != -1 && getdents->pos != req->file->f_pos) {
+ loff_t res = vfs_llseek(req->file, getdents->pos, SEEK_SET);
+ if (res < 0)
+ ret = res;
+ }
+
+ if (ret == 0) {
+ ret = vfs_getdents(req->file, getdents->dirent,
+ getdents->count);
+ }
+
+ mutex_unlock(&req->file->f_pos_lock);
+
+ if (ret < 0) {
+ if (ret == -ERESTARTSYS)
+ ret = -EINTR;
+ req_set_fail_links(req);
+ }
+ io_req_complete(req, ret);
+ return 0;
+}
+
#if defined(CONFIG_NET)
static int io_setup_async_msg(struct io_kiocb *req,
struct io_async_msghdr *kmsg)
@@ -5991,6 +6052,8 @@ static int io_req_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
return io_unlinkat_prep(req, sqe);
case IORING_OP_MKDIRAT:
return io_mkdirat_prep(req, sqe);
+ case IORING_OP_GETDENTS:
+ return io_getdents_prep(req, sqe);
}
printk_once(KERN_WARNING "io_uring: unhandled opcode %d\n",
@@ -6253,6 +6316,9 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
case IORING_OP_MKDIRAT:
ret = io_mkdirat(req, issue_flags);
break;
+ case IORING_OP_GETDENTS:
+ ret = io_getdents(req, issue_flags);
+ break;
default:
ret = -EINVAL;
break;
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index cf26a94ab880..0693a6e4d6bb 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -138,6 +138,7 @@ enum {
IORING_OP_RENAMEAT,
IORING_OP_UNLINKAT,
IORING_OP_MKDIRAT,
+ IORING_OP_GETDENTS,
/* this goes last, obviously */
IORING_OP_LAST,
--
2.30.2
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH v5 2/2] io_uring: add support for IORING_OP_GETDENTS
2021-03-30 11:19 ` [PATCH v5 2/2] io_uring: add support for IORING_OP_GETDENTS Lennert Buytenhek
@ 2021-03-30 12:11 ` Pavel Begunkov
0 siblings, 0 replies; 7+ messages in thread
From: Pavel Begunkov @ 2021-03-30 12:11 UTC (permalink / raw)
To: Lennert Buytenhek, io-uring
On 30/03/2021 12:19, Lennert Buytenhek wrote:
> IORING_OP_GETDENTS behaves much like getdents64(2) and takes the same
> arguments, but with a small twist: it takes an additional offset
> argument, and reading from the specified directory starts at the given
> offset.
>
> Internally, if necessary, IORING_OP_GETDENTS will vfs_llseek() to
> the right directory position before calling vfs_getdents().
>
> For the first IORING_OP_GETDENTS call on a directory, the offset
> parameter can be set to zero, and for subsequent calls, it can be
> set to the ->d_off field of the last struct linux_dirent64 returned
> by the previous IORING_OP_GETDENTS call.
I still consider this API being quite a mess. In particular, changing
file->pos even with specified offset is neither convenient for users
nor good performance-wise, and just looks weird.
I haven't been following the last discussion, but iirc Matthew proposed
how to do it right. If you want to "get it done quick", just seek
position back after doing your stuff, because once this patch is merged
we have to maintain the behaviour.
> Alternatively, specifying an offset argument of -1 will read from
> the directory's current file offset (IORING_FEAT_RW_CUR_POS).
>
> Signed-off-by: Lennert Buytenhek <[email protected]>
> ---
> fs/io_uring.c | 66 +++++++++++++++++++++++++++++++++++
> include/uapi/linux/io_uring.h | 1 +
> 2 files changed, 67 insertions(+)
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index f4ff3da821a5..90637d5a34b9 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -670,6 +670,13 @@ struct io_mkdir {
> struct filename *filename;
> };
>
> +struct io_getdents {
> + struct file *file;
> + struct linux_dirent64 __user *dirent;
> + unsigned int count;
> + loff_t pos;
> +};
> +
> struct io_completion {
> struct file *file;
> struct list_head list;
> @@ -811,6 +818,7 @@ struct io_kiocb {
> struct io_rename rename;
> struct io_unlink unlink;
> struct io_mkdir mkdir;
> + struct io_getdents getdents;
> /* use only after cleaning per-op data, see io_clean_op() */
> struct io_completion compl;
> };
> @@ -1025,6 +1033,9 @@ static const struct io_op_def io_op_defs[] = {
> [IORING_OP_RENAMEAT] = {},
> [IORING_OP_UNLINKAT] = {},
> [IORING_OP_MKDIRAT] = {},
> + [IORING_OP_GETDENTS] = {
> + .needs_file = 1,
> + },
> };
>
> static bool io_disarm_next(struct io_kiocb *req);
> @@ -4314,6 +4325,56 @@ static int io_sync_file_range(struct io_kiocb *req, unsigned int issue_flags)
> return 0;
> }
>
> +static int io_getdents_prep(struct io_kiocb *req,
> + const struct io_uring_sqe *sqe)
> +{
> + struct io_getdents *getdents = &req->getdents;
> +
> + if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
> + return -EINVAL;
> + if (sqe->ioprio || sqe->rw_flags || sqe->buf_index)
> + return -EINVAL;
> +
> + getdents->pos = READ_ONCE(sqe->off);
> + getdents->dirent = u64_to_user_ptr(READ_ONCE(sqe->addr));
> + getdents->count = READ_ONCE(sqe->len);
> + return 0;
> +}
> +
> +static int io_getdents(struct io_kiocb *req, unsigned int issue_flags)
> +{
> + struct io_getdents *getdents = &req->getdents;
> + int ret = 0;
> +
> + /* getdents always requires a blocking context */
> + if (issue_flags & IO_URING_F_NONBLOCK)
> + return -EAGAIN;
> +
> + /* for vfs_llseek and to serialize ->iterate_shared() on this file */
> + mutex_lock(&req->file->f_pos_lock);
> +
> + if (getdents->pos != -1 && getdents->pos != req->file->f_pos) {
> + loff_t res = vfs_llseek(req->file, getdents->pos, SEEK_SET);
> + if (res < 0)
> + ret = res;
> + }
> +
> + if (ret == 0) {
> + ret = vfs_getdents(req->file, getdents->dirent,
> + getdents->count);
> + }
> +
> + mutex_unlock(&req->file->f_pos_lock);
> +
> + if (ret < 0) {
> + if (ret == -ERESTARTSYS)
> + ret = -EINTR;
> + req_set_fail_links(req);
> + }
> + io_req_complete(req, ret);
> + return 0;
> +}
> +
> #if defined(CONFIG_NET)
> static int io_setup_async_msg(struct io_kiocb *req,
> struct io_async_msghdr *kmsg)
> @@ -5991,6 +6052,8 @@ static int io_req_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
> return io_unlinkat_prep(req, sqe);
> case IORING_OP_MKDIRAT:
> return io_mkdirat_prep(req, sqe);
> + case IORING_OP_GETDENTS:
> + return io_getdents_prep(req, sqe);
> }
>
> printk_once(KERN_WARNING "io_uring: unhandled opcode %d\n",
> @@ -6253,6 +6316,9 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
> case IORING_OP_MKDIRAT:
> ret = io_mkdirat(req, issue_flags);
> break;
> + case IORING_OP_GETDENTS:
> + ret = io_getdents(req, issue_flags);
> + break;
> default:
> ret = -EINVAL;
> break;
> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
> index cf26a94ab880..0693a6e4d6bb 100644
> --- a/include/uapi/linux/io_uring.h
> +++ b/include/uapi/linux/io_uring.h
> @@ -138,6 +138,7 @@ enum {
> IORING_OP_RENAMEAT,
> IORING_OP_UNLINKAT,
> IORING_OP_MKDIRAT,
> + IORING_OP_GETDENTS,
>
> /* this goes last, obviously */
> IORING_OP_LAST,
>
--
Pavel Begunkov
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v5 0/2] io_uring: add support for IORING_OP_GETDENTS
2021-03-30 11:17 [PATCH v5 0/2] io_uring: add support for IORING_OP_GETDENTS Lennert Buytenhek
2021-03-30 11:18 ` [PATCH v5 1/2] readdir: split the core of getdents64(2) out into vfs_getdents() Lennert Buytenhek
2021-03-30 11:19 ` [PATCH v5 2/2] io_uring: add support for IORING_OP_GETDENTS Lennert Buytenhek
@ 2021-04-04 4:42 ` Clay Harris
2021-04-04 17:25 ` Tavian Barnes
2 siblings, 1 reply; 7+ messages in thread
From: Clay Harris @ 2021-04-04 4:42 UTC (permalink / raw)
To: Lennert Buytenhek; +Cc: io-uring, Tavian Barnes, Dmitry Kadashev
On Tue, Mar 30 2021 at 14:17:21 +0300, Lennert Buytenhek quoth thus:
> (These patches depend on IORING_OP_MKDIRAT going in first.)
>
> These patches add support for IORING_OP_GETDENTS, which is a new io_uring
> opcode that more or less does an lseek(sqe->fd, sqe->off, SEEK_SET)
> followed by a getdents64(sqe->fd, (void *)sqe->addr, sqe->len).
>
> A dumb test program which recursively scans through a directory tree
> and prints the names of all directories and files it encounters along
> the way is available here:
>
> https://krautbox.wantstofly.org/~buytenh/uringfind-v3.c
>
> Changes since v4:
>
> - Make IORING_OP_GETDENTS read from the directory's current position
> if the specified offset value is -1 (IORING_FEAT_RW_CUR_POS).
> (Requested / pointed out by Tavian Barnes.)
This seems like a good feature. As I understand it, this would
allow submitting pairs of IORING_OP_GETDENTS with IOSQE_IO_HARDLINK
wherein the first specifies the current offset and the second specifies
offset -1, thereby halfing the number of kernel round trips for N getdents64.
If the entire directory fits into the first buffer, the second would
indicate EOF. This would certainly seem like a win, but note there
are diminishing returns as the directory size increases, versus just
doubling the buffer size.
An alternate / additional idea you may wish to consider is changing
getdents64 itself.
Ordinary read functions must return 0 length to indicate EOF, because
they can return arbitrary data. This is not the case for getdents64.
1) Define a struct linux_dirent of minimum size containing an abnormal
value as a sentinel. d_off = 0 or -1 should work.
2) Implement a flag for getdents64.
IF
the flag is set AND
we are returning a non-zero length buffer AND
there is room in the buffer for the sentinel structure AND
a getdents64 call using the d_off of the last struct in the
buffer would return EOF
THEN
append the sentinel struct to the buffer.
Using the arrangement, we would still handle a 0 length return as an
EOF, but if we see the sentinel struct, we can skip the additional call
altogether. The saves all of the pairing of buffers and extra logic,
and unless we're unlucky and the sentinel structure did not fit in
the buffer at EOF, would always reduce the number of getdents64
calls by one.
Moreover, if the flag was available outside of io_uring, for smaller
directories, this feature would cut the number of directory reads
of readdir(3) by up to half.
> - Rebase onto for-5.13/io_uring as of 2021/03/30 plus v3 of Dmitry
> Kadashev's "io_uring: add mkdirat support".
>
> Changes since v3:
>
> - Made locking in io_getdents() unconditional, as the prior
> optimization was racy. (Pointed out by Pavel Begunkov.)
>
> - Rebase onto for-5.13/io_uring as of 2021/03/12 plus a manually
> applied version of the mkdirat patch.
>
> Changes since v2 RFC:
>
> - Rebase onto io_uring-2021-02-17 plus a manually applied version of
> the mkdirat patch. The latter is needed because userland (liburing)
> has already merged the opcode for IORING_OP_MKDIRAT (in commit
> "io_uring.h: 5.12 pending kernel sync") while this opcode isn't in
> the kernel yet (as of io_uring-2021-02-17), and this means that this
> can't be merged until IORING_OP_MKDIRAT is merged.
>
> - Adapt to changes made in "io_uring: replace force_nonblock with flags"
> that are in io_uring-2021-02-17.
>
> Changes since v1 RFC:
>
> - Drop the trailing '64' from IORING_OP_GETDENTS64 (suggested by
> Matthew Wilcox).
>
> - Instead of requiring that sqe->off be zero, use this field to pass
> in a directory offset to start reading from. For the first
> IORING_OP_GETDENTS call on a directory, this can be set to zero,
> and for subsequent calls, it can be set to the ->d_off field of
> the last struct linux_dirent64 returned by the previous call.
>
> Lennert Buytenhek (2):
> readdir: split the core of getdents64(2) out into vfs_getdents()
> io_uring: add support for IORING_OP_GETDENTS
>
> fs/io_uring.c | 66 ++++++++++++++++++++++++++++++++++++++++++
> fs/readdir.c | 25 ++++++++++-----
> include/linux/fs.h | 4 ++
> include/uapi/linux/io_uring.h | 1
> 4 files changed, 88 insertions(+), 8 deletions(-)
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v5 0/2] io_uring: add support for IORING_OP_GETDENTS
2021-04-04 4:42 ` [PATCH v5 0/2] " Clay Harris
@ 2021-04-04 17:25 ` Tavian Barnes
2021-04-05 9:52 ` Clay Harris
0 siblings, 1 reply; 7+ messages in thread
From: Tavian Barnes @ 2021-04-04 17:25 UTC (permalink / raw)
To: Clay Harris; +Cc: Lennert Buytenhek, io-uring, Dmitry Kadashev
On Sun, 4 Apr 2021 at 00:42, Clay Harris <[email protected]> wrote:
> On Tue, Mar 30 2021 at 14:17:21 +0300, Lennert Buytenhek quoth thus:
>
> > ...
> >
> > - Make IORING_OP_GETDENTS read from the directory's current position
> > if the specified offset value is -1 (IORING_FEAT_RW_CUR_POS).
> > (Requested / pointed out by Tavian Barnes.)
>
> This seems like a good feature. As I understand it, this would
> allow submitting pairs of IORING_OP_GETDENTS with IOSQE_IO_HARDLINK
> wherein the first specifies the current offset and the second specifies
> offset -1, thereby halfing the number of kernel round trips for N getdents64.
Yep, that was my main motivation for this suggestion.
> If the entire directory fits into the first buffer, the second would
> indicate EOF. This would certainly seem like a win, but note there
> are diminishing returns as the directory size increases, versus just
> doubling the buffer size.
True, but most directories are small, so I expect it would be a
benefit most of the time. Even for big directories you still get two
buffers filled with one syscall, same as if you did a conventional
getdents64() with twice as big a buffer.
> An alternate / additional idea you may wish to consider is changing
> getdents64 itself.
>
> Ordinary read functions must return 0 length to indicate EOF, because
> they can return arbitrary data. This is not the case for getdents64.
>
> 1) Define a struct linux_dirent of minimum size containing an abnormal
> value as a sentinel. d_off = 0 or -1 should work.
>
> 2) Implement a flag for getdents64.
Sadly getdents64() doesn't take a flags argument. We'd probably need
a new syscall.
> IF
> the flag is set AND
> we are returning a non-zero length buffer AND
> there is room in the buffer for the sentinel structure AND
> a getdents64 call using the d_off of the last struct in the
> buffer would return EOF
> THEN
> append the sentinel struct to the buffer.
>
>
> Using the arrangement, we would still handle a 0 length return as an
> EOF, but if we see the sentinel struct, we can skip the additional call
> altogether. The saves all of the pairing of buffers and extra logic,
> and unless we're unlucky and the sentinel structure did not fit in
> the buffer at EOF, would always reduce the number of getdents64
> calls by one.
>
> Moreover, if the flag was available outside of io_uring, for smaller
> directories, this feature would cut the number of directory reads
> of readdir(3) by up to half.
If we need a new syscall anyway, the calling convention could be
adjusted to indicate EOF more easily than that, e.g.
int getdents2(int fd, void *buf, size_t *size, unsigned long flags);
With 0 being EOF, 1 being not-EOF, and -1 for error, or something.
--
Tavian Barnes
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v5 0/2] io_uring: add support for IORING_OP_GETDENTS
2021-04-04 17:25 ` Tavian Barnes
@ 2021-04-05 9:52 ` Clay Harris
0 siblings, 0 replies; 7+ messages in thread
From: Clay Harris @ 2021-04-05 9:52 UTC (permalink / raw)
To: Tavian Barnes; +Cc: Lennert Buytenhek, io-uring, Dmitry Kadashev
On Sun, Apr 04 2021 at 13:25:00 -0400, Tavian Barnes quoth thus:
> On Sun, 4 Apr 2021 at 00:42, Clay Harris <[email protected]> wrote:
> > On Tue, Mar 30 2021 at 14:17:21 +0300, Lennert Buytenhek quoth thus:
> >
> > > ...
> > >
> > > - Make IORING_OP_GETDENTS read from the directory's current position
> > > if the specified offset value is -1 (IORING_FEAT_RW_CUR_POS).
> > > (Requested / pointed out by Tavian Barnes.)
> >
> > This seems like a good feature. As I understand it, this would
> > allow submitting pairs of IORING_OP_GETDENTS with IOSQE_IO_HARDLINK
> > wherein the first specifies the current offset and the second specifies
> > offset -1, thereby halfing the number of kernel round trips for N getdents64.
>
> Yep, that was my main motivation for this suggestion.
>
> > If the entire directory fits into the first buffer, the second would
> > indicate EOF. This would certainly seem like a win, but note there
> > are diminishing returns as the directory size increases, versus just
> > doubling the buffer size.
>
> True, but most directories are small, so I expect it would be a
> benefit most of the time. Even for big directories you still get two
> buffers filled with one syscall, same as if you did a conventional
> getdents64() with twice as big a buffer.
>
> > An alternate / additional idea you may wish to consider is changing
> > getdents64 itself.
> >
> > Ordinary read functions must return 0 length to indicate EOF, because
> > they can return arbitrary data. This is not the case for getdents64.
> >
> > 1) Define a struct linux_dirent of minimum size containing an abnormal
> > value as a sentinel. d_off = 0 or -1 should work.
> >
> > 2) Implement a flag for getdents64.
>
> Sadly getdents64() doesn't take a flags argument. We'd probably need
> a new syscall.
>
> > IF
> > the flag is set AND
> > we are returning a non-zero length buffer AND
> > there is room in the buffer for the sentinel structure AND
> > a getdents64 call using the d_off of the last struct in the
> > buffer would return EOF
> > THEN
> > append the sentinel struct to the buffer.
> >
> >
> > Using the arrangement, we would still handle a 0 length return as an
> > EOF, but if we see the sentinel struct, we can skip the additional call
> > altogether. The saves all of the pairing of buffers and extra logic,
> > and unless we're unlucky and the sentinel structure did not fit in
> > the buffer at EOF, would always reduce the number of getdents64
> > calls by one.
> >
> > Moreover, if the flag was available outside of io_uring, for smaller
> > directories, this feature would cut the number of directory reads
> > of readdir(3) by up to half.
>
> If we need a new syscall anyway, the calling convention could be
> adjusted to indicate EOF more easily than that, e.g.
>
> int getdents2(int fd, void *buf, size_t *size, unsigned long flags);
>
> With 0 being EOF, 1 being not-EOF, and -1 for error, or something.
Good point! I was hedging the idea above with "if the flag was
available outside of io_uring" to allow an internal-only flag.
A new syscall would certainly be more useful as it would improve
every readdir(3) call, and of course could be called from inside
io_uring more efficiently than the current getdents64.
Alas, adding a new syscall like this is a little beyond my level of
expertise. Would you (or anyone else reading this) have any interest in
implementing a getdents2?
> --
> Tavian Barnes
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2021-04-05 9:52 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-03-30 11:17 [PATCH v5 0/2] io_uring: add support for IORING_OP_GETDENTS Lennert Buytenhek
2021-03-30 11:18 ` [PATCH v5 1/2] readdir: split the core of getdents64(2) out into vfs_getdents() Lennert Buytenhek
2021-03-30 11:19 ` [PATCH v5 2/2] io_uring: add support for IORING_OP_GETDENTS Lennert Buytenhek
2021-03-30 12:11 ` Pavel Begunkov
2021-04-04 4:42 ` [PATCH v5 0/2] " Clay Harris
2021-04-04 17:25 ` Tavian Barnes
2021-04-05 9:52 ` Clay Harris
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox