* [PATCH v3 00/10] add support for name_to, open_by_handle_at() to io_uring
@ 2025-09-12 15:28 Thomas Bertschinger
2025-09-12 15:28 ` [PATCH v3 01/10] fhandle: create helper for name_to_handle_at(2) Thomas Bertschinger
` (9 more replies)
0 siblings, 10 replies; 16+ messages in thread
From: Thomas Bertschinger @ 2025-09-12 15:28 UTC
To: io-uring, axboe, linux-fsdevel, viro, brauner, linux-nfs,
linux-xfs, cem, chuck.lever, jlayton, amir73il
Cc: Thomas Bertschinger
This series adds support for name_to_handle_at() and open_by_handle_at()
to io_uring. The idea is for these opcodes to be useful for userspace
NFS servers that want to use io_uring.
For both syscalls, io_uring will initially attempt to complete the
operation using only cached data, and will fall back to running in async
context when that is not possible.
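The two-stage flow described above can be modelled in plain C. This is a minimal userspace sketch, not the io_uring code: try_lookup(), issue(), and the "cached" flag are hypothetical stand-ins for the real lookup and issue paths.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for a lookup. 'cached' models whether the data
 * needed to finish the operation is already in memory.
 */
static int try_lookup(bool cached, bool nonblock)
{
	if (nonblock && !cached)
		return -EAGAIN;	/* would have to block, so refuse */
	return 0;		/* completed, possibly after blocking */
}

/*
 * Hypothetical model of the issue path: a non-blocking attempt first,
 * then a retry from a context that is allowed to block.
 */
static int issue(bool cached, bool *went_async)
{
	int ret = try_lookup(cached, true);

	*went_async = false;
	if (ret == -EAGAIN) {
		*went_async = true;
		ret = try_lookup(cached, false);
	}
	return ret;
}
```

In the sketch, only the uncached case pays for the second attempt; the cached case completes inline.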
Supporting this for open_by_handle_at() requires a way to tell the
filesystem that it should not block in its fh_to_dentry()
implementation. This is done with a new flag, FILEID_CACHED, which the
VFS sets in the file handle. A filesystem that supports this flag
advertises it with a new export flag, EXPORT_OP_NONBLOCK, so that the
VFS never passes FILEID_CACHED to a filesystem that does not know about
it.
Support for the new FILEID_CACHED flag is added for xfs.
v3 is mostly the same as [v2], with minor changes.
v2 -> v3:
- rename do_filp_path_open() -> do_file_handle_open()
- rename the parameter fileid_type in xfs_fs_fh_to_{dentry,parent}() to
fileid_type_flags
- a few minor style fixups reported by checkpatch.pl
- fix incorrect use of '&' instead of '&&' in exportfs_decode_fh_raw()
- add docs for EXPORT_OP_NONBLOCK in Documentation/filesystems/nfs/exporting.rst
[v2] https://lore.kernel.org/linux-fsdevel/20250910214927.480316-1-tahbertschinger@gmail.com/
[v1] https://lore.kernel.org/linux-fsdevel/20250814235431.995876-1-tahbertschinger@gmail.com/
Thomas Bertschinger (10):
fhandle: create helper for name_to_handle_at(2)
io_uring: add support for IORING_OP_NAME_TO_HANDLE_AT
fhandle: helper for allocating, reading struct file_handle
fhandle: create do_file_handle_open() helper
fhandle: make do_file_handle_open() take struct open_flags
exportfs: allow VFS flags in struct file_handle
exportfs: new FILEID_CACHED flag for non-blocking fh lookup
io_uring: add __io_open_prep() helper
io_uring: add support for IORING_OP_OPEN_BY_HANDLE_AT
xfs: add support for non-blocking fh_to_dentry()
Documentation/filesystems/nfs/exporting.rst | 6 +
fs/exportfs/expfs.c | 14 +-
fs/fhandle.c | 155 +++++++++-------
fs/internal.h | 13 ++
fs/xfs/xfs_export.c | 34 +++-
fs/xfs/xfs_export.h | 3 +-
fs/xfs/xfs_handle.c | 2 +-
include/linux/exportfs.h | 34 +++-
include/uapi/linux/io_uring.h | 3 +
io_uring/opdef.c | 26 +++
io_uring/openclose.c | 191 +++++++++++++++++++-
io_uring/openclose.h | 13 ++
12 files changed, 409 insertions(+), 85 deletions(-)
base-commit: 76eeb9b8de9880ca38696b2fb56ac45ac0a25c6c
--
2.51.0
* [PATCH v3 01/10] fhandle: create helper for name_to_handle_at(2)
From: Thomas Bertschinger @ 2025-09-12 15:28 UTC
To: io-uring, axboe, linux-fsdevel, viro, brauner, linux-nfs,
linux-xfs, cem, chuck.lever, jlayton, amir73il
Cc: Thomas Bertschinger
Create a helper do_sys_name_to_handle_at() that takes an additional
argument, lookup_flags, beyond the syscall arguments.
Because name_to_handle_at(2) doesn't take any lookup flags, it always
passes 0 for this argument.
Future callers like io_uring may pass LOOKUP_CACHED in order to request
a non-blocking lookup.
This helper's name is confusingly similar to do_sys_name_to_handle()
which takes care of returning the file handle, once the filename has
been turned into a struct path. To distinguish the names more clearly,
rename the latter to do_path_to_handle().
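For context, the syscall wrapped by this helper is typically driven from userspace in two steps: probe for the handle size, then encode. The following is a minimal sketch, assuming the glibc wrapper for name_to_handle_at(2); encode_handle() is a hypothetical name used only for illustration.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>

/*
 * Encode a file handle for 'path', resolved relative to the current
 * working directory. Returns a malloc()ed handle that the caller must
 * free(), or NULL with errno set.
 */
static struct file_handle *encode_handle(const char *path, int *mount_id)
{
	struct file_handle *fh, *bigger;
	int saved;

	fh = malloc(sizeof(*fh));
	if (!fh)
		return NULL;

	/*
	 * Probe: with handle_bytes == 0 the call fails with EOVERFLOW and
	 * stores the required payload size in handle_bytes.
	 */
	fh->handle_bytes = 0;
	if (name_to_handle_at(AT_FDCWD, path, fh, mount_id, 0) != -1) {
		free(fh);
		errno = EINVAL;	/* unexpected success with no buffer */
		return NULL;
	}
	if (errno != EOVERFLOW) {
		saved = errno;
		free(fh);
		errno = saved;
		return NULL;
	}

	/* Grow the buffer to the reported size and encode for real. */
	bigger = realloc(fh, sizeof(*fh) + fh->handle_bytes);
	if (!bigger) {
		free(fh);
		errno = ENOMEM;
		return NULL;
	}
	fh = bigger;
	if (name_to_handle_at(AT_FDCWD, path, fh, mount_id, 0) == -1) {
		saved = errno;
		free(fh);
		errno = saved;
		return NULL;
	}
	return fh;
}
```

Both calls in the sketch pass 0 for the flags argument, which corresponds to the syscall path that passes 0 for lookup_flags.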
Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
---
fs/fhandle.c | 61 ++++++++++++++++++++++++++++-----------------------
fs/internal.h | 9 ++++++++
2 files changed, 43 insertions(+), 27 deletions(-)
diff --git a/fs/fhandle.c b/fs/fhandle.c
index 68a7d2861c58..605ad8e7d93d 100644
--- a/fs/fhandle.c
+++ b/fs/fhandle.c
@@ -14,10 +14,10 @@
#include "internal.h"
#include "mount.h"
-static long do_sys_name_to_handle(const struct path *path,
- struct file_handle __user *ufh,
- void __user *mnt_id, bool unique_mntid,
- int fh_flags)
+static long do_path_to_handle(const struct path *path,
+ struct file_handle __user *ufh,
+ void __user *mnt_id, bool unique_mntid,
+ int fh_flags)
{
long retval;
struct file_handle f_handle;
@@ -111,27 +111,11 @@ static long do_sys_name_to_handle(const struct path *path,
return retval;
}
-/**
- * sys_name_to_handle_at: convert name to handle
- * @dfd: directory relative to which name is interpreted if not absolute
- * @name: name that should be converted to handle.
- * @handle: resulting file handle
- * @mnt_id: mount id of the file system containing the file
- * (u64 if AT_HANDLE_MNT_ID_UNIQUE, otherwise int)
- * @flag: flag value to indicate whether to follow symlink or not
- * and whether a decodable file handle is required.
- *
- * @handle->handle_size indicate the space available to store the
- * variable part of the file handle in bytes. If there is not
- * enough space, the field is updated to return the minimum
- * value required.
- */
-SYSCALL_DEFINE5(name_to_handle_at, int, dfd, const char __user *, name,
- struct file_handle __user *, handle, void __user *, mnt_id,
- int, flag)
+long do_sys_name_to_handle_at(int dfd, const char __user *name,
+ struct file_handle __user *handle,
+ void __user *mnt_id, int flag, int lookup_flags)
{
struct path path;
- int lookup_flags;
int fh_flags = 0;
int err;
@@ -155,19 +139,42 @@ SYSCALL_DEFINE5(name_to_handle_at, int, dfd, const char __user *, name,
else if (flag & AT_HANDLE_CONNECTABLE)
fh_flags |= EXPORT_FH_CONNECTABLE;
- lookup_flags = (flag & AT_SYMLINK_FOLLOW) ? LOOKUP_FOLLOW : 0;
+ if (flag & AT_SYMLINK_FOLLOW)
+ lookup_flags |= LOOKUP_FOLLOW;
if (flag & AT_EMPTY_PATH)
lookup_flags |= LOOKUP_EMPTY;
err = user_path_at(dfd, name, lookup_flags, &path);
if (!err) {
- err = do_sys_name_to_handle(&path, handle, mnt_id,
- flag & AT_HANDLE_MNT_ID_UNIQUE,
- fh_flags);
+ err = do_path_to_handle(&path, handle, mnt_id,
+ flag & AT_HANDLE_MNT_ID_UNIQUE,
+ fh_flags);
path_put(&path);
}
return err;
}
+/**
+ * sys_name_to_handle_at: convert name to handle
+ * @dfd: directory relative to which name is interpreted if not absolute
+ * @name: name that should be converted to handle.
+ * @handle: resulting file handle
+ * @mnt_id: mount id of the file system containing the file
+ * (u64 if AT_HANDLE_MNT_ID_UNIQUE, otherwise int)
+ * @flag: flag value to indicate whether to follow symlink or not
+ * and whether a decodable file handle is required.
+ *
+ * @handle->handle_size indicate the space available to store the
+ * variable part of the file handle in bytes. If there is not
+ * enough space, the field is updated to return the minimum
+ * value required.
+ */
+SYSCALL_DEFINE5(name_to_handle_at, int, dfd, const char __user *, name,
+ struct file_handle __user *, handle, void __user *, mnt_id,
+ int, flag)
+{
+ return do_sys_name_to_handle_at(dfd, name, handle, mnt_id, flag, 0);
+}
+
static int get_path_anchor(int fd, struct path *root)
{
if (fd >= 0) {
diff --git a/fs/internal.h b/fs/internal.h
index 38e8aab27bbd..c972f8ade52d 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -355,3 +355,12 @@ int anon_inode_getattr(struct mnt_idmap *idmap, const struct path *path,
int anon_inode_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
struct iattr *attr);
void pidfs_get_root(struct path *path);
+
+/*
+ * fs/fhandle.c
+ */
+#ifdef CONFIG_FHANDLE
+long do_sys_name_to_handle_at(int dfd, const char __user *name,
+ struct file_handle __user *handle,
+ void __user *mnt_id, int flag, int lookup_flags);
+#endif /* CONFIG_FHANDLE */
--
2.51.0
* [PATCH v3 02/10] io_uring: add support for IORING_OP_NAME_TO_HANDLE_AT
From: Thomas Bertschinger @ 2025-09-12 15:28 UTC
To: io-uring, axboe, linux-fsdevel, viro, brauner, linux-nfs,
linux-xfs, cem, chuck.lever, jlayton, amir73il
Cc: Thomas Bertschinger
Add support for name_to_handle_at(2) to io_uring.
Like openat*(), this tries to do a non-blocking lookup first and resorts
to async lookup when that fails.
This uses sqe->fd for the directory fd, sqe->addr for the path, ->addr2
for the file handle, ->addr3 for the mount ID, and
->name_to_handle_flags for the flags. The file handle and the mount ID
are filled in by the kernel.
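The field mapping above can be illustrated from the submitter's side. This is only a sketch: struct sketch_sqe and sketch_prep_name_to_handle() are hypothetical, and the struct models just the fields this opcode reads, not the real layout of struct io_uring_sqe.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the SQE fields read by the prep handler.
 * The real struct io_uring_sqe has more fields and a different layout.
 */
struct sketch_sqe {
	int32_t  fd;			/* directory fd (dfd) */
	uint64_t addr;			/* user pointer to the path */
	uint64_t addr2;			/* user pointer to struct file_handle */
	uint64_t addr3;			/* user pointer to the mount ID */
	uint32_t name_to_handle_flags;	/* AT_* flags */
};

/* Fill the modelled SQE the way the commit message describes. */
static void sketch_prep_name_to_handle(struct sketch_sqe *sqe, int dfd,
				       const char *path, void *handle,
				       void *mount_id, uint32_t flags)
{
	sqe->fd = dfd;
	sqe->addr = (uint64_t)(uintptr_t)path;
	sqe->addr2 = (uint64_t)(uintptr_t)handle;
	sqe->addr3 = (uint64_t)(uintptr_t)mount_id;
	sqe->name_to_handle_flags = flags;
}
```

The handle and mount ID buffers have to stay valid until the request completes, since the kernel writes to them.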
Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com>
---
include/uapi/linux/io_uring.h | 2 ++
io_uring/opdef.c | 11 +++++++++
io_uring/openclose.c | 45 +++++++++++++++++++++++++++++++++++
io_uring/openclose.h | 5 ++++
4 files changed, 63 insertions(+)
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 6957dc539d83..a4aa83ad9527 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -74,6 +74,7 @@ struct io_uring_sqe {
__u32 install_fd_flags;
__u32 nop_flags;
__u32 pipe_flags;
+ __u32 name_to_handle_flags;
};
__u64 user_data; /* data to be passed back at completion time */
/* pack this to avoid bogus arm OABI complaints */
@@ -289,6 +290,7 @@ enum io_uring_op {
IORING_OP_READV_FIXED,
IORING_OP_WRITEV_FIXED,
IORING_OP_PIPE,
+ IORING_OP_NAME_TO_HANDLE_AT,
/* this goes last, obviously */
IORING_OP_LAST,
diff --git a/io_uring/opdef.c b/io_uring/opdef.c
index 9568785810d9..76306c9e0ecd 100644
--- a/io_uring/opdef.c
+++ b/io_uring/opdef.c
@@ -574,6 +574,14 @@ const struct io_issue_def io_issue_defs[] = {
.prep = io_pipe_prep,
.issue = io_pipe,
},
+ [IORING_OP_NAME_TO_HANDLE_AT] = {
+#if defined(CONFIG_FHANDLE)
+ .prep = io_name_to_handle_at_prep,
+ .issue = io_name_to_handle_at,
+#else
+ .prep = io_eopnotsupp_prep,
+#endif
+ },
};
const struct io_cold_def io_cold_defs[] = {
@@ -824,6 +832,9 @@ const struct io_cold_def io_cold_defs[] = {
[IORING_OP_PIPE] = {
.name = "PIPE",
},
+ [IORING_OP_NAME_TO_HANDLE_AT] = {
+ .name = "NAME_TO_HANDLE_AT",
+ },
};
const char *io_uring_get_opcode(u8 opcode)
diff --git a/io_uring/openclose.c b/io_uring/openclose.c
index d70700e5cef8..884a66e56643 100644
--- a/io_uring/openclose.c
+++ b/io_uring/openclose.c
@@ -27,6 +27,15 @@ struct io_open {
unsigned long nofile;
};
+struct io_name_to_handle {
+ struct file *file;
+ int dfd;
+ int flags;
+ struct file_handle __user *ufh;
+ char __user *path;
+ void __user *mount_id;
+};
+
struct io_close {
struct file *file;
int fd;
@@ -187,6 +196,42 @@ void io_open_cleanup(struct io_kiocb *req)
putname(open->filename);
}
+#if defined(CONFIG_FHANDLE)
+int io_name_to_handle_at_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+{
+ struct io_name_to_handle *nh = io_kiocb_to_cmd(req, struct io_name_to_handle);
+
+ nh->dfd = READ_ONCE(sqe->fd);
+ nh->flags = READ_ONCE(sqe->name_to_handle_flags);
+ nh->path = u64_to_user_ptr(READ_ONCE(sqe->addr));
+ nh->ufh = u64_to_user_ptr(READ_ONCE(sqe->addr2));
+ nh->mount_id = u64_to_user_ptr(READ_ONCE(sqe->addr3));
+
+ return 0;
+}
+
+int io_name_to_handle_at(struct io_kiocb *req, unsigned int issue_flags)
+{
+ struct io_name_to_handle *nh = io_kiocb_to_cmd(req, struct io_name_to_handle);
+ int lookup_flags = 0;
+ long ret;
+
+ if (issue_flags & IO_URING_F_NONBLOCK)
+ lookup_flags = LOOKUP_CACHED;
+
+ ret = do_sys_name_to_handle_at(nh->dfd, nh->path, nh->ufh, nh->mount_id,
+ nh->flags, lookup_flags);
+
+ if (ret == -EAGAIN && (issue_flags & IO_URING_F_NONBLOCK))
+ return -EAGAIN;
+
+ if (ret < 0)
+ req_set_fail(req);
+ io_req_set_res(req, ret, 0);
+ return IOU_COMPLETE;
+}
+#endif /* CONFIG_FHANDLE */
+
int __io_close_fixed(struct io_ring_ctx *ctx, unsigned int issue_flags,
unsigned int offset)
{
diff --git a/io_uring/openclose.h b/io_uring/openclose.h
index 4ca2a9935abc..2fc1c8d35d0b 100644
--- a/io_uring/openclose.h
+++ b/io_uring/openclose.h
@@ -10,6 +10,11 @@ void io_open_cleanup(struct io_kiocb *req);
int io_openat2_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
int io_openat2(struct io_kiocb *req, unsigned int issue_flags);
+#if defined(CONFIG_FHANDLE)
+int io_name_to_handle_at_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
+int io_name_to_handle_at(struct io_kiocb *req, unsigned int issue_flags);
+#endif /* CONFIG_FHANDLE */
+
int io_close_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
int io_close(struct io_kiocb *req, unsigned int issue_flags);
--
2.51.0
* [PATCH v3 03/10] fhandle: helper for allocating, reading struct file_handle
From: Thomas Bertschinger @ 2025-09-12 15:28 UTC
To: io-uring, axboe, linux-fsdevel, viro, brauner, linux-nfs,
linux-xfs, cem, chuck.lever, jlayton, amir73il
Cc: Thomas Bertschinger
Pull the code for allocating a struct file_handle and copying it from
userspace into a dedicated helper function, get_user_handle().
do_handle_open() is updated to call get_user_handle() prior to calling
handle_to_path(), and the latter now takes a kernel pointer as a
parameter instead of a __user pointer.
This new helper and handle_to_path() are also exposed in
fs/internal.h. A subsequent commit uses them to support
open_by_handle_at(2) in io_uring.
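The shape of the helper (copy the fixed header, validate it, allocate, then copy the variable-length payload) can be modelled in userspace. This is a sketch under stated assumptions: struct sketch_handle, sketch_get_handle(), and the 128-byte limit are hypothetical, and a short source buffer stands in for a faulting user copy.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAX_HANDLE_SZ 128	/* assumed limit, for the sketch only */

/* Hypothetical mirror of a header followed by a variable-size payload. */
struct sketch_handle {
	unsigned int handle_bytes;
	int handle_type;
	unsigned char f_handle[];
};

/*
 * Copy a handle out of 'src', which has 'src_len' readable bytes.
 * Returns 0 and stores a malloc()ed handle in *out, or a negative errno.
 */
static int sketch_get_handle(const void *src, size_t src_len,
			     struct sketch_handle **out)
{
	struct sketch_handle hdr, *h;

	/* Step 1: copy and validate the fixed-size header. */
	if (src_len < sizeof(hdr))
		return -EFAULT;
	memcpy(&hdr, src, sizeof(hdr));
	if (hdr.handle_bytes == 0 || hdr.handle_bytes > SKETCH_MAX_HANDLE_SZ)
		return -EINVAL;

	/* Step 2: allocate room for header plus payload. */
	h = malloc(sizeof(*h) + hdr.handle_bytes);
	if (!h)
		return -ENOMEM;
	*h = hdr;

	/* Step 3: copy the payload, releasing the buffer on failure. */
	if (src_len - sizeof(hdr) < hdr.handle_bytes) {
		free(h);
		return -EFAULT;
	}
	memcpy(h->f_handle, (const unsigned char *)src + sizeof(hdr),
	       hdr.handle_bytes);
	*out = h;
	return 0;
}
```

Returning the handle through an out parameter is a choice made for the sketch; the helper in this patch returns the pointer directly and encodes errors with ERR_PTR().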
Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
---
fs/fhandle.c | 63 +++++++++++++++++++++++++++++----------------------
fs/internal.h | 3 +++
2 files changed, 39 insertions(+), 27 deletions(-)
diff --git a/fs/fhandle.c b/fs/fhandle.c
index 605ad8e7d93d..4ba23229758c 100644
--- a/fs/fhandle.c
+++ b/fs/fhandle.c
@@ -330,25 +330,44 @@ static inline int may_decode_fh(struct handle_to_path_ctx *ctx,
return 0;
}
-static int handle_to_path(int mountdirfd, struct file_handle __user *ufh,
- struct path *path, unsigned int o_flags)
+struct file_handle *get_user_handle(struct file_handle __user *ufh)
{
- int retval = 0;
struct file_handle f_handle;
- struct file_handle *handle __free(kfree) = NULL;
- struct handle_to_path_ctx ctx = {};
- const struct export_operations *eops;
+ struct file_handle *handle;
if (copy_from_user(&f_handle, ufh, sizeof(struct file_handle)))
- return -EFAULT;
+ return ERR_PTR(-EFAULT);
if ((f_handle.handle_bytes > MAX_HANDLE_SZ) ||
(f_handle.handle_bytes == 0))
- return -EINVAL;
+ return ERR_PTR(-EINVAL);
if (f_handle.handle_type < 0 ||
FILEID_USER_FLAGS(f_handle.handle_type) & ~FILEID_VALID_USER_FLAGS)
- return -EINVAL;
+ return ERR_PTR(-EINVAL);
+
+ handle = kmalloc(struct_size(handle, f_handle, f_handle.handle_bytes),
+ GFP_KERNEL);
+ if (!handle)
+ return ERR_PTR(-ENOMEM);
+
+ /* copy the full handle */
+ *handle = f_handle;
+ if (copy_from_user(&handle->f_handle, &ufh->f_handle,
+ f_handle.handle_bytes)) {
+ kfree(handle);
+ return ERR_PTR(-EFAULT);
+ }
+
+ return handle;
+}
+
+int handle_to_path(int mountdirfd, struct file_handle *handle,
+ struct path *path, unsigned int o_flags)
+{
+ int retval = 0;
+ struct handle_to_path_ctx ctx = {};
+ const struct export_operations *eops;
retval = get_path_anchor(mountdirfd, &ctx.root);
if (retval)
@@ -362,31 +381,16 @@ static int handle_to_path(int mountdirfd, struct file_handle __user *ufh,
if (retval)
goto out_path;
- handle = kmalloc(struct_size(handle, f_handle, f_handle.handle_bytes),
- GFP_KERNEL);
- if (!handle) {
- retval = -ENOMEM;
- goto out_path;
- }
- /* copy the full handle */
- *handle = f_handle;
- if (copy_from_user(&handle->f_handle,
- &ufh->f_handle,
- f_handle.handle_bytes)) {
- retval = -EFAULT;
- goto out_path;
- }
-
/*
* If handle was encoded with AT_HANDLE_CONNECTABLE, verify that we
* are decoding an fd with connected path, which is accessible from
* the mount fd path.
*/
- if (f_handle.handle_type & FILEID_IS_CONNECTABLE) {
+ if (handle->handle_type & FILEID_IS_CONNECTABLE) {
ctx.fh_flags |= EXPORT_FH_CONNECTABLE;
ctx.flags |= HANDLE_CHECK_SUBTREE;
}
- if (f_handle.handle_type & FILEID_IS_DIR)
+ if (handle->handle_type & FILEID_IS_DIR)
ctx.fh_flags |= EXPORT_FH_DIR_ONLY;
/* Filesystem code should not be exposed to user flags */
handle->handle_type &= ~FILEID_USER_FLAGS_MASK;
@@ -400,12 +404,17 @@ static int handle_to_path(int mountdirfd, struct file_handle __user *ufh,
static long do_handle_open(int mountdirfd, struct file_handle __user *ufh,
int open_flag)
{
+ struct file_handle *handle __free(kfree) = NULL;
long retval = 0;
struct path path __free(path_put) = {};
struct file *file;
const struct export_operations *eops;
- retval = handle_to_path(mountdirfd, ufh, &path, open_flag);
+ handle = get_user_handle(ufh);
+ if (IS_ERR(handle))
+ return PTR_ERR(handle);
+
+ retval = handle_to_path(mountdirfd, handle, &path, open_flag);
if (retval)
return retval;
diff --git a/fs/internal.h b/fs/internal.h
index c972f8ade52d..ab80f83ded47 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -363,4 +363,7 @@ void pidfs_get_root(struct path *path);
long do_sys_name_to_handle_at(int dfd, const char __user *name,
struct file_handle __user *handle,
void __user *mnt_id, int flag, int lookup_flags);
+struct file_handle *get_user_handle(struct file_handle __user *ufh);
+int handle_to_path(int mountdirfd, struct file_handle *handle,
+ struct path *path, unsigned int o_flags);
#endif /* CONFIG_FHANDLE */
--
2.51.0
* [PATCH v3 04/10] fhandle: create do_file_handle_open() helper
From: Thomas Bertschinger @ 2025-09-12 15:28 UTC
To: io-uring, axboe, linux-fsdevel, viro, brauner, linux-nfs,
linux-xfs, cem, chuck.lever, jlayton, amir73il
Cc: Thomas Bertschinger
This pulls the code for opening a file, after its handle has been
converted to a struct path, into a new helper function.
This function will be used by io_uring once io_uring supports
open_by_handle_at(2).
Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com>
---
fs/fhandle.c | 21 +++++++++++++++------
fs/internal.h | 1 +
2 files changed, 16 insertions(+), 6 deletions(-)
diff --git a/fs/fhandle.c b/fs/fhandle.c
index 4ba23229758c..b018fa482b03 100644
--- a/fs/fhandle.c
+++ b/fs/fhandle.c
@@ -401,6 +401,20 @@ int handle_to_path(int mountdirfd, struct file_handle *handle,
return retval;
}
+struct file *do_file_handle_open(struct path *path, int open_flag)
+{
+ const struct export_operations *eops;
+ struct file *file;
+
+ eops = path->mnt->mnt_sb->s_export_op;
+ if (eops->open)
+ file = eops->open(path, open_flag);
+ else
+ file = file_open_root(path, "", open_flag, 0);
+
+ return file;
+}
+
static long do_handle_open(int mountdirfd, struct file_handle __user *ufh,
int open_flag)
{
@@ -408,7 +422,6 @@ static long do_handle_open(int mountdirfd, struct file_handle __user *ufh,
long retval = 0;
struct path path __free(path_put) = {};
struct file *file;
- const struct export_operations *eops;
handle = get_user_handle(ufh);
if (IS_ERR(handle))
@@ -422,11 +435,7 @@ static long do_handle_open(int mountdirfd, struct file_handle __user *ufh,
if (fd < 0)
return fd;
- eops = path.mnt->mnt_sb->s_export_op;
- if (eops->open)
- file = eops->open(&path, open_flag);
- else
- file = file_open_root(&path, "", open_flag, 0);
+ file = do_file_handle_open(&path, open_flag);
if (IS_ERR(file))
return PTR_ERR(file);
diff --git a/fs/internal.h b/fs/internal.h
index ab80f83ded47..0a3d90d30d96 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -366,4 +366,5 @@ long do_sys_name_to_handle_at(int dfd, const char __user *name,
struct file_handle *get_user_handle(struct file_handle __user *ufh);
int handle_to_path(int mountdirfd, struct file_handle *handle,
struct path *path, unsigned int o_flags);
+struct file *do_file_handle_open(struct path *path, int open_flag);
#endif /* CONFIG_FHANDLE */
--
2.51.0
* [PATCH v3 05/10] fhandle: make do_file_handle_open() take struct open_flags
From: Thomas Bertschinger @ 2025-09-12 15:28 UTC
To: io-uring, axboe, linux-fsdevel, viro, brauner, linux-nfs,
linux-xfs, cem, chuck.lever, jlayton, amir73il
Cc: Thomas Bertschinger
This allows the caller to pass additional flags, such as lookup flags,
if desired.
This will be used by io_uring to support non-blocking
open_by_handle_at(2).
Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com>
---
fs/fhandle.c | 14 ++++++++++----
fs/internal.h | 2 +-
2 files changed, 11 insertions(+), 5 deletions(-)
diff --git a/fs/fhandle.c b/fs/fhandle.c
index b018fa482b03..7cc17e03e632 100644
--- a/fs/fhandle.c
+++ b/fs/fhandle.c
@@ -401,16 +401,16 @@ int handle_to_path(int mountdirfd, struct file_handle *handle,
return retval;
}
-struct file *do_file_handle_open(struct path *path, int open_flag)
+struct file *do_file_handle_open(struct path *path, struct open_flags *op)
{
const struct export_operations *eops;
struct file *file;
eops = path->mnt->mnt_sb->s_export_op;
if (eops->open)
- file = eops->open(path, open_flag);
+ file = eops->open(path, op->open_flag);
else
- file = file_open_root(path, "", open_flag, 0);
+ file = do_file_open_root(path, "", op);
return file;
}
@@ -422,6 +422,8 @@ static long do_handle_open(int mountdirfd, struct file_handle __user *ufh,
long retval = 0;
struct path path __free(path_put) = {};
struct file *file;
+ struct open_flags op;
+ struct open_how how;
handle = get_user_handle(ufh);
if (IS_ERR(handle))
@@ -435,7 +437,11 @@ static long do_handle_open(int mountdirfd, struct file_handle __user *ufh,
if (fd < 0)
return fd;
- file = do_file_handle_open(&path, open_flag);
+ how = build_open_how(open_flag, 0);
+ retval = build_open_flags(&how, &op);
+ if (retval)
+ return retval;
+ file = do_file_handle_open(&path, &op);
if (IS_ERR(file))
return PTR_ERR(file);
diff --git a/fs/internal.h b/fs/internal.h
index 0a3d90d30d96..2d107383a534 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -366,5 +366,5 @@ long do_sys_name_to_handle_at(int dfd, const char __user *name,
struct file_handle *get_user_handle(struct file_handle __user *ufh);
int handle_to_path(int mountdirfd, struct file_handle *handle,
struct path *path, unsigned int o_flags);
-struct file *do_file_handle_open(struct path *path, int open_flag);
+struct file *do_file_handle_open(struct path *path, struct open_flags *op);
#endif /* CONFIG_FHANDLE */
--
2.51.0
* [PATCH v3 06/10] exportfs: allow VFS flags in struct file_handle
From: Thomas Bertschinger @ 2025-09-12 15:28 UTC
To: io-uring, axboe, linux-fsdevel, viro, brauner, linux-nfs,
linux-xfs, cem, chuck.lever, jlayton, amir73il
Cc: Thomas Bertschinger
The handle_type field of struct file_handle is already being used to
pass "user" flags to open_by_handle_at() in the upper 16 bits.
Bits 8..15 are still unused, as FS implementations are expected to only
set the lower 8 bits.
This change prepares the VFS to pass flags to FS implementations of
fh_to_{dentry,parent}() using the previously unused bits 8..15 of
handle_type.
The user is prevented from setting VFS flags in a file handle--such a
handle will be rejected by open_by_handle_at(2). Only the VFS can set
those flags before passing the handle to the FS.
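The resulting three-way split of handle_type, and the check applied to handles coming from userspace, can be sketched as follows. The three masks are the ones defined by this patch; SKETCH_VALID_USER_FLAGS and sketch_user_type_ok() are hypothetical, and the value 0x30000 is an assumption standing in for FILEID_VALID_USER_FLAGS.

```c
#include <stdbool.h>

/* Masks as defined by this patch. */
#define FILEID_HANDLE_TYPE_MASK	0xff
#define FILEID_TYPE(type)	((type) & FILEID_HANDLE_TYPE_MASK)
#define FILEID_FS_FLAGS_MASK	0xff00
#define FILEID_FS_FLAGS(flags)	((flags) & FILEID_FS_FLAGS_MASK)
#define FILEID_USER_FLAGS_MASK	0xffff0000
#define FILEID_USER_FLAGS(type)	((type) & FILEID_USER_FLAGS_MASK)

/* Assumed for the sketch: the user flags that userspace may pass in. */
#define SKETCH_VALID_USER_FLAGS	0x30000

/* Model of the validation applied to a handle_type from userspace. */
static bool sketch_user_type_ok(int handle_type)
{
	if (handle_type < 0)
		return false;
	/* Bits 8..15 belong to the VFS; userspace must leave them clear. */
	if (FILEID_FS_FLAGS(handle_type))
		return false;
	/* Only known user flags are accepted in the upper 16 bits. */
	if (FILEID_USER_FLAGS(handle_type) & ~SKETCH_VALID_USER_FLAGS)
		return false;
	return true;
}
```

With these masks, a handle_type of 0x101 is rejected because bit 8 is a VFS flag, while 0x10001 passes under the assumed set of valid user flags.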
Suggested-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
---
fs/exportfs/expfs.c | 2 +-
fs/fhandle.c | 2 +-
include/linux/exportfs.h | 29 ++++++++++++++++++++++++++---
3 files changed, 28 insertions(+), 5 deletions(-)
diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c
index d3e55de4a2a2..949ce6ef6c4e 100644
--- a/fs/exportfs/expfs.c
+++ b/fs/exportfs/expfs.c
@@ -391,7 +391,7 @@ int exportfs_encode_inode_fh(struct inode *inode, struct fid *fid,
else
type = nop->encode_fh(inode, fid->raw, max_len, parent);
- if (type > 0 && FILEID_USER_FLAGS(type)) {
+ if (type > 0 && (type & ~FILEID_HANDLE_TYPE_MASK)) {
pr_warn_once("%s: unexpected fh type value 0x%x from fstype %s.\n",
__func__, type, inode->i_sb->s_type->name);
return -EINVAL;
diff --git a/fs/fhandle.c b/fs/fhandle.c
index 7cc17e03e632..2dc669aeb520 100644
--- a/fs/fhandle.c
+++ b/fs/fhandle.c
@@ -342,7 +342,7 @@ struct file_handle *get_user_handle(struct file_handle __user *ufh)
(f_handle.handle_bytes == 0))
return ERR_PTR(-EINVAL);
- if (f_handle.handle_type < 0 ||
+ if (f_handle.handle_type < 0 || FILEID_FS_FLAGS(f_handle.handle_type) ||
FILEID_USER_FLAGS(f_handle.handle_type) & ~FILEID_VALID_USER_FLAGS)
return ERR_PTR(-EINVAL);
diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h
index cfb0dd1ea49c..30a9791d88e0 100644
--- a/include/linux/exportfs.h
+++ b/include/linux/exportfs.h
@@ -173,10 +173,33 @@ struct handle_to_path_ctx {
#define EXPORT_FH_DIR_ONLY 0x4 /* Only decode file handle for a directory */
/*
- * Filesystems use only lower 8 bits of file_handle type for fid_type.
- * name_to_handle_at() uses upper 16 bits of type as user flags to be
- * interpreted by open_by_handle_at().
+ * The 32 bits of the handle_type field of struct file_handle are used for a few
+ * different purposes:
+ *
+ * Filesystems use only lower 8 bits of file_handle type for fid_type.
+ *
+ * VFS uses bits 8..15 of the handle_type to pass flags to the FS
+ * implementation of fh_to_{dentry,parent}().
+ *
+ * name_to_handle_at() uses upper 16 bits of type as user flags to be
+ * interpreted by open_by_handle_at().
+ *
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * | user flags | VFS flags | fid_type |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * (MSB) (LSB)
+ *
+ * Filesystems are expected not to fill in any bits outside of fid_type in
+ * their encode_fh() implementation.
*/
+#define FILEID_HANDLE_TYPE_MASK 0xff
+#define FILEID_TYPE(type) ((type) & FILEID_HANDLE_TYPE_MASK)
+
+/* VFS flags: */
+#define FILEID_FS_FLAGS_MASK 0xff00
+#define FILEID_FS_FLAGS(flags) ((flags) & FILEID_FS_FLAGS_MASK)
+
+/* User flags: */
#define FILEID_USER_FLAGS_MASK 0xffff0000
#define FILEID_USER_FLAGS(type) ((type) & FILEID_USER_FLAGS_MASK)
--
2.51.0
* [PATCH v3 07/10] exportfs: new FILEID_CACHED flag for non-blocking fh lookup
From: Thomas Bertschinger @ 2025-09-12 15:28 UTC
To: io-uring, axboe, linux-fsdevel, viro, brauner, linux-nfs,
linux-xfs, cem, chuck.lever, jlayton, amir73il
Cc: Thomas Bertschinger
This defines a new flag FILEID_CACHED that the VFS can set in the
handle_type field of struct file_handle to request that the FS
implementations of fh_to_{dentry,parent}() only complete if they can
satisfy the request with cached data.
Because not every FS implementation will recognize this new flag, those
that do recognize the flag can indicate their support using a new
export flag, EXPORT_OP_NONBLOCK.
If FILEID_CACHED is set in a file handle, but the filesystem does not
set EXPORT_OP_NONBLOCK, then the VFS will return -EAGAIN without
attempting to call into the filesystem code.
exportfs_decode_fh_raw() is updated to respect the new flag by returning
-EAGAIN when it would need to do an operation that may not be possible
with only cached data.
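The gate described above can be modelled outside the kernel. In this sketch FILEID_CACHED uses the value from this patch, while SKETCH_OP_NONBLOCK, struct sketch_export_ops, sketch_fs_decode(), and sketch_decode() are hypothetical names; in particular, 0x1 is a placeholder and not the real value of EXPORT_OP_NONBLOCK.

```c
#include <errno.h>

#define FILEID_CACHED		0x100	/* value from this patch */
#define SKETCH_OP_NONBLOCK	0x1	/* placeholder for EXPORT_OP_NONBLOCK */

/* Hypothetical, reduced stand-in for struct export_operations. */
struct sketch_export_ops {
	unsigned long flags;
	int (*fh_to_dentry)(int fileid_type);
};

/* Counts how often the modelled filesystem callback was entered. */
static int sketch_calls;

/* Hypothetical filesystem decode callback that always succeeds. */
static int sketch_fs_decode(int fileid_type)
{
	(void)fileid_type;
	sketch_calls++;
	return 0;
}

/*
 * Model of the VFS-side gate: a cached-only request is never handed to
 * a filesystem that has not advertised support for it.
 */
static int sketch_decode(const struct sketch_export_ops *ops, int fileid_type)
{
	if ((fileid_type & FILEID_CACHED) &&
	    !(ops->flags & SKETCH_OP_NONBLOCK))
		return -EAGAIN;
	return ops->fh_to_dentry(fileid_type);
}
```

A filesystem that does not set the export flag therefore never sees FILEID_CACHED in its callback, and the caller can retry the decode without the flag from a context that may block.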
Suggested-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com>
---
I didn't apply Amir's Reviewed-by for this patch because I added the
Documentation section, which was not reviewed in v2.
Documentation/filesystems/nfs/exporting.rst | 6 ++++++
fs/exportfs/expfs.c | 12 ++++++++++++
fs/fhandle.c | 2 ++
include/linux/exportfs.h | 5 +++++
4 files changed, 25 insertions(+)
diff --git a/Documentation/filesystems/nfs/exporting.rst b/Documentation/filesystems/nfs/exporting.rst
index de64d2d002a2..70f46eaeb0d4 100644
--- a/Documentation/filesystems/nfs/exporting.rst
+++ b/Documentation/filesystems/nfs/exporting.rst
@@ -238,3 +238,9 @@ following flags are defined:
all of an inode's dirty data on last close. Exports that behave this
way should set EXPORT_OP_FLUSH_ON_CLOSE so that NFSD knows to skip
waiting for writeback when closing such files.
+
+ EXPORT_OP_NONBLOCK - FS supports fh_to_{dentry,parent}() using cached data
+ When performing open_by_handle_at(2) using io_uring, it is useful to
+ complete the file open using only cached data when possible, otherwise
+ failing with -EAGAIN. This flag indicates that the filesystem supports this
+ mode of operation.
diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c
index 949ce6ef6c4e..e2cfdd9d6392 100644
--- a/fs/exportfs/expfs.c
+++ b/fs/exportfs/expfs.c
@@ -441,6 +441,7 @@ exportfs_decode_fh_raw(struct vfsmount *mnt, struct fid *fid, int fh_len,
void *context)
{
const struct export_operations *nop = mnt->mnt_sb->s_export_op;
+ bool decode_cached = fileid_type & FILEID_CACHED;
struct dentry *result, *alias;
char nbuf[NAME_MAX+1];
int err;
@@ -453,6 +454,10 @@ exportfs_decode_fh_raw(struct vfsmount *mnt, struct fid *fid, int fh_len,
*/
if (!exportfs_can_decode_fh(nop))
return ERR_PTR(-ESTALE);
+
+ if (decode_cached && !(nop->flags & EXPORT_OP_NONBLOCK))
+ return ERR_PTR(-EAGAIN);
+
result = nop->fh_to_dentry(mnt->mnt_sb, fid, fh_len, fileid_type);
if (IS_ERR_OR_NULL(result))
return result;
@@ -481,6 +486,10 @@ exportfs_decode_fh_raw(struct vfsmount *mnt, struct fid *fid, int fh_len,
* filesystem root.
*/
if (result->d_flags & DCACHE_DISCONNECTED) {
+ err = -EAGAIN;
+ if (decode_cached)
+ goto err_result;
+
err = reconnect_path(mnt, result, nbuf);
if (err)
goto err_result;
@@ -526,6 +535,9 @@ exportfs_decode_fh_raw(struct vfsmount *mnt, struct fid *fid, int fh_len,
err = PTR_ERR(target_dir);
if (IS_ERR(target_dir))
goto err_result;
+ err = -EAGAIN;
+ if (decode_cached && (target_dir->d_flags & DCACHE_DISCONNECTED))
+ goto err_result;
/*
* And as usual we need to make sure the parent directory is
diff --git a/fs/fhandle.c b/fs/fhandle.c
index 2dc669aeb520..509ff8983f94 100644
--- a/fs/fhandle.c
+++ b/fs/fhandle.c
@@ -273,6 +273,8 @@ static int do_handle_to_path(struct file_handle *handle, struct path *path,
if (IS_ERR_OR_NULL(dentry)) {
if (dentry == ERR_PTR(-ENOMEM))
return -ENOMEM;
+ if (dentry == ERR_PTR(-EAGAIN))
+ return -EAGAIN;
return -ESTALE;
}
path->dentry = dentry;
diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h
index 30a9791d88e0..8238b6f67956 100644
--- a/include/linux/exportfs.h
+++ b/include/linux/exportfs.h
@@ -199,6 +199,8 @@ struct handle_to_path_ctx {
#define FILEID_FS_FLAGS_MASK 0xff00
#define FILEID_FS_FLAGS(flags) ((flags) & FILEID_FS_FLAGS_MASK)
+#define FILEID_CACHED 0x100 /* Use only cached data when decoding handle */
+
/* User flags: */
#define FILEID_USER_FLAGS_MASK 0xffff0000
#define FILEID_USER_FLAGS(type) ((type) & FILEID_USER_FLAGS_MASK)
@@ -303,6 +305,9 @@ struct export_operations {
*/
#define EXPORT_OP_FLUSH_ON_CLOSE (0x20) /* fs flushes file data on close */
#define EXPORT_OP_NOLOCKS (0x40) /* no file locking support */
+#define EXPORT_OP_NONBLOCK (0x80) /* Filesystem supports non-
+ blocking fh_to_dentry()
+ */
unsigned long flags;
};
--
2.51.0
* [PATCH v3 08/10] io_uring: add __io_open_prep() helper
2025-09-12 15:28 [PATCH v3 00/10] add support for name_to, open_by_handle_at() to io_uring Thomas Bertschinger
` (6 preceding siblings ...)
2025-09-12 15:28 ` [PATCH v3 07/10] exportfs: new FILEID_CACHED flag for non-blocking fh lookup Thomas Bertschinger
@ 2025-09-12 15:28 ` Thomas Bertschinger
2025-09-17 14:18 ` Jens Axboe
2025-09-12 15:28 ` [PATCH v3 09/10] io_uring: add support for IORING_OP_OPEN_BY_HANDLE_AT Thomas Bertschinger
2025-09-12 15:28 ` [PATCH v3 10/10] xfs: add support for non-blocking fh_to_dentry() Thomas Bertschinger
9 siblings, 1 reply; 16+ messages in thread
From: Thomas Bertschinger @ 2025-09-12 15:28 UTC (permalink / raw)
To: io-uring, axboe, linux-fsdevel, viro, brauner, linux-nfs,
linux-xfs, cem, chuck.lever, jlayton, amir73il
Cc: Thomas Bertschinger
This adds a helper, __io_open_prep(), which does the part of preparing
for an open that is shared between openat*() and open_by_handle_at().
It excludes reading in the user path or file handle--this will be done
by functions specific to the kind of open().
Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com>
---
io_uring/openclose.c | 35 +++++++++++++++++++++++++----------
1 file changed, 25 insertions(+), 10 deletions(-)
diff --git a/io_uring/openclose.c b/io_uring/openclose.c
index 884a66e56643..4da2afdb9773 100644
--- a/io_uring/openclose.c
+++ b/io_uring/openclose.c
@@ -58,11 +58,10 @@ static bool io_openat_force_async(struct io_open *open)
return open->how.flags & (O_TRUNC | O_CREAT | __O_TMPFILE);
}
-static int __io_openat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+/* Prep for open that is common to both openat*() and open_by_handle_at() */
+static int __io_open_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
{
struct io_open *open = io_kiocb_to_cmd(req, struct io_open);
- const char __user *fname;
- int ret;
if (unlikely(sqe->buf_index))
return -EINVAL;
@@ -74,6 +73,29 @@ static int __io_openat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe
open->how.flags |= O_LARGEFILE;
open->dfd = READ_ONCE(sqe->fd);
+
+ open->file_slot = READ_ONCE(sqe->file_index);
+ if (open->file_slot && (open->how.flags & O_CLOEXEC))
+ return -EINVAL;
+
+ open->nofile = rlimit(RLIMIT_NOFILE);
+
+ if (io_openat_force_async(open))
+ req->flags |= REQ_F_FORCE_ASYNC;
+
+ return 0;
+}
+
+static int __io_openat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+{
+ struct io_open *open = io_kiocb_to_cmd(req, struct io_open);
+ const char __user *fname;
+ int ret;
+
+ ret = __io_open_prep(req, sqe);
+ if (ret)
+ return ret;
+
fname = u64_to_user_ptr(READ_ONCE(sqe->addr));
open->filename = getname(fname);
if (IS_ERR(open->filename)) {
@@ -82,14 +104,7 @@ static int __io_openat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe
return ret;
}
- open->file_slot = READ_ONCE(sqe->file_index);
- if (open->file_slot && (open->how.flags & O_CLOEXEC))
- return -EINVAL;
-
- open->nofile = rlimit(RLIMIT_NOFILE);
req->flags |= REQ_F_NEED_CLEANUP;
- if (io_openat_force_async(open))
- req->flags |= REQ_F_FORCE_ASYNC;
return 0;
}
--
2.51.0
* [PATCH v3 09/10] io_uring: add support for IORING_OP_OPEN_BY_HANDLE_AT
2025-09-12 15:28 [PATCH v3 00/10] add support for name_to, open_by_handle_at() to io_uring Thomas Bertschinger
` (7 preceding siblings ...)
2025-09-12 15:28 ` [PATCH v3 08/10] io_uring: add __io_open_prep() helper Thomas Bertschinger
@ 2025-09-12 15:28 ` Thomas Bertschinger
2025-09-17 14:18 ` Jens Axboe
2025-09-12 15:28 ` [PATCH v3 10/10] xfs: add support for non-blocking fh_to_dentry() Thomas Bertschinger
9 siblings, 1 reply; 16+ messages in thread
From: Thomas Bertschinger @ 2025-09-12 15:28 UTC (permalink / raw)
To: io-uring, axboe, linux-fsdevel, viro, brauner, linux-nfs,
linux-xfs, cem, chuck.lever, jlayton, amir73il
Cc: Thomas Bertschinger
This adds support for open_by_handle_at(2) to io_uring.
First an attempt to do a non-blocking open by handle is made. If that
fails, for example, because the target inode is not cached, a blocking
attempt is made.
Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com>
---
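For illustration only, the two-stage behaviour described in the commit message (a non-blocking attempt first, then a blocking retry) can be modelled in userspace. This is a sketch under stated assumptions: struct model_handle, model_issue() and model_open_by_handle() are hypothetical names, the returned descriptor is an arbitrary placeholder, and the real retry is driven by io_uring's async machinery rather than by a direct second call.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a handle whose target may or may not be cached. */
struct model_handle {
	bool inode_cached;
};

/* One issue attempt: a non-blocking attempt fails with -EAGAIN on a cache miss. */
static int model_issue(const struct model_handle *h, bool nonblock)
{
	if (nonblock && !h->inode_cached)
		return -EAGAIN;
	return 3; /* arbitrary placeholder file descriptor */
}

/* Try the non-blocking path first and fall back to a blocking attempt. */
static int model_open_by_handle(const struct model_handle *h, int *attempts)
{
	int ret;

	*attempts = 1;
	ret = model_issue(h, true);
	if (ret == -EAGAIN) {
		*attempts = 2;
		ret = model_issue(h, false);
	}
	return ret;
}
```

In the model, a cached target completes in one attempt, while an uncached one costs exactly one extra attempt and never surfaces -EAGAIN to the caller.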
include/uapi/linux/io_uring.h | 1 +
io_uring/opdef.c | 15 +++++
io_uring/openclose.c | 111 ++++++++++++++++++++++++++++++++++
io_uring/openclose.h | 8 +++
4 files changed, 135 insertions(+)
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index a4aa83ad9527..c571929e7807 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -291,6 +291,7 @@ enum io_uring_op {
IORING_OP_WRITEV_FIXED,
IORING_OP_PIPE,
IORING_OP_NAME_TO_HANDLE_AT,
+ IORING_OP_OPEN_BY_HANDLE_AT,
/* this goes last, obviously */
IORING_OP_LAST,
diff --git a/io_uring/opdef.c b/io_uring/opdef.c
index 76306c9e0ecd..1aa36f3f30de 100644
--- a/io_uring/opdef.c
+++ b/io_uring/opdef.c
@@ -580,6 +580,15 @@ const struct io_issue_def io_issue_defs[] = {
.issue = io_name_to_handle_at,
#else
.prep = io_eopnotsupp_prep,
+#endif
+ },
+ [IORING_OP_OPEN_BY_HANDLE_AT] = {
+#if defined(CONFIG_FHANDLE)
+ .prep = io_open_by_handle_at_prep,
+ .issue = io_open_by_handle_at,
+ .async_size = sizeof(struct io_open_handle_async),
+#else
+ .prep = io_eopnotsupp_prep,
#endif
},
};
@@ -835,6 +844,12 @@ const struct io_cold_def io_cold_defs[] = {
[IORING_OP_NAME_TO_HANDLE_AT] = {
.name = "NAME_TO_HANDLE_AT",
},
+ [IORING_OP_OPEN_BY_HANDLE_AT] = {
+ .name = "OPEN_BY_HANDLE_AT",
+#if defined(CONFIG_FHANDLE)
+ .cleanup = io_open_by_handle_cleanup,
+#endif
+ }
};
const char *io_uring_get_opcode(u8 opcode)
diff --git a/io_uring/openclose.c b/io_uring/openclose.c
index 4da2afdb9773..289d61373567 100644
--- a/io_uring/openclose.c
+++ b/io_uring/openclose.c
@@ -1,6 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
#include <linux/kernel.h>
#include <linux/errno.h>
+#include <linux/exportfs.h>
#include <linux/fs.h>
#include <linux/file.h>
#include <linux/fdtable.h>
@@ -245,6 +246,116 @@ int io_name_to_handle_at(struct io_kiocb *req, unsigned int issue_flags)
io_req_set_res(req, ret, 0);
return IOU_COMPLETE;
}
+
+int io_open_by_handle_at_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+{
+ struct io_open *open = io_kiocb_to_cmd(req, struct io_open);
+ struct io_open_handle_async *ah;
+ u64 flags;
+ int ret;
+
+ flags = READ_ONCE(sqe->open_flags);
+ open->how = build_open_how(flags, 0);
+
+ ret = __io_open_prep(req, sqe);
+ if (ret)
+ return ret;
+
+ ah = io_uring_alloc_async_data(NULL, req);
+ if (!ah)
+ return -ENOMEM;
+ memset(&ah->path, 0, sizeof(ah->path));
+ ah->handle = get_user_handle(u64_to_user_ptr(READ_ONCE(sqe->addr)));
+ if (IS_ERR(ah->handle))
+ return PTR_ERR(ah->handle);
+
+ req->flags |= REQ_F_NEED_CLEANUP;
+
+ return 0;
+}
+
+int io_open_by_handle_at(struct io_kiocb *req, unsigned int issue_flags)
+{
+ struct io_open *open = io_kiocb_to_cmd(req, struct io_open);
+ struct io_open_handle_async *ah = req->async_data;
+ bool nonblock_set = open->how.flags & O_NONBLOCK;
+ bool fixed = !!open->file_slot;
+ struct file *file;
+ struct open_flags op;
+ int ret;
+
+ ret = build_open_flags(&open->how, &op);
+ if (ret)
+ goto err;
+
+ if (issue_flags & IO_URING_F_NONBLOCK)
+ ah->handle->handle_type |= FILEID_CACHED;
+ else
+ ah->handle->handle_type &= ~FILEID_CACHED;
+
+ if (!ah->path.dentry) {
+ /*
+ * Handle has not yet been converted to path, either because
+ * this is our first try, or because we tried previously with
+ * IO_URING_F_NONBLOCK set, and failed.
+ */
+ ret = handle_to_path(open->dfd, ah->handle, &ah->path, op.open_flag);
+ if (ret == -EAGAIN && (issue_flags & IO_URING_F_NONBLOCK))
+ return -EAGAIN;
+
+ if (ret)
+ goto err;
+ }
+
+ if (!fixed) {
+ ret = __get_unused_fd_flags(open->how.flags, open->nofile);
+ if (ret < 0)
+ goto err;
+ }
+
+ if (issue_flags & IO_URING_F_NONBLOCK) {
+ WARN_ON_ONCE(io_openat_force_async(open));
+ op.lookup_flags |= LOOKUP_CACHED;
+ op.open_flag |= O_NONBLOCK;
+ }
+ file = do_file_handle_open(&ah->path, &op);
+
+ if (IS_ERR(file)) {
+ if (!fixed)
+ put_unused_fd(ret);
+ ret = PTR_ERR(file);
+ if (ret == -EAGAIN && (issue_flags & IO_URING_F_NONBLOCK))
+ return -EAGAIN;
+ goto err;
+ }
+
+ if ((issue_flags & IO_URING_F_NONBLOCK) && !nonblock_set)
+ file->f_flags &= ~O_NONBLOCK;
+
+ if (!fixed)
+ fd_install(ret, file);
+ else
+ ret = io_fixed_fd_install(req, issue_flags, file,
+ open->file_slot);
+
+err:
+ io_open_by_handle_cleanup(req);
+ req->flags &= ~REQ_F_NEED_CLEANUP;
+ if (ret < 0)
+ req_set_fail(req);
+ io_req_set_res(req, ret, 0);
+ return IOU_COMPLETE;
+}
+
+void io_open_by_handle_cleanup(struct io_kiocb *req)
+{
+ struct io_open_handle_async *ah = req->async_data;
+
+ if (ah->path.dentry)
+ path_put(&ah->path);
+
+ kfree(ah->handle);
+}
#endif /* CONFIG_FHANDLE */
int __io_close_fixed(struct io_ring_ctx *ctx, unsigned int issue_flags,
diff --git a/io_uring/openclose.h b/io_uring/openclose.h
index 2fc1c8d35d0b..f966859a8a92 100644
--- a/io_uring/openclose.h
+++ b/io_uring/openclose.h
@@ -10,9 +10,17 @@ void io_open_cleanup(struct io_kiocb *req);
int io_openat2_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
int io_openat2(struct io_kiocb *req, unsigned int issue_flags);
+struct io_open_handle_async {
+ struct file_handle *handle;
+ struct path path;
+};
+
#if defined(CONFIG_FHANDLE)
int io_name_to_handle_at_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
int io_name_to_handle_at(struct io_kiocb *req, unsigned int issue_flags);
+int io_open_by_handle_at_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
+int io_open_by_handle_at(struct io_kiocb *req, unsigned int issue_flags);
+void io_open_by_handle_cleanup(struct io_kiocb *req);
#endif /* CONFIG_FHANDLE */
int io_close_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
--
2.51.0
* [PATCH v3 10/10] xfs: add support for non-blocking fh_to_dentry()
2025-09-12 15:28 [PATCH v3 00/10] add support for name_to, open_by_handle_at() to io_uring Thomas Bertschinger
` (8 preceding siblings ...)
2025-09-12 15:28 ` [PATCH v3 09/10] io_uring: add support for IORING_OP_OPEN_BY_HANDLE_AT Thomas Bertschinger
@ 2025-09-12 15:28 ` Thomas Bertschinger
2025-09-12 22:51 ` Dave Chinner
9 siblings, 1 reply; 16+ messages in thread
From: Thomas Bertschinger @ 2025-09-12 15:28 UTC (permalink / raw)
To: io-uring, axboe, linux-fsdevel, viro, brauner, linux-nfs,
linux-xfs, cem, chuck.lever, jlayton, amir73il
Cc: Thomas Bertschinger
This is to support using open_by_handle_at(2) via io_uring. It is useful
for io_uring to request that opening a file via handle be completed
using only cached data, or fail with -EAGAIN if that is not possible.
The signature of xfs_nfs_get_inode() is extended with a new flags
argument that allows callers to specify XFS_IGET_INCORE.
That flag is set when the VFS passes the FILEID_CACHED flag via the
fileid_type argument.
Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com>
Acked-by: Amir Goldstein <amir73il@gmail.com>
---
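For illustration only, the error translation this patch adds to xfs_nfs_get_inode() can be modelled in userspace. This is a sketch, not the XFS code: MODEL_IGET_INCORE is a placeholder value rather than the real XFS_IGET_INCORE definition, model_map_iget_error() is a hypothetical name, and the existing -ESTALE translations are left out.

```c
#include <errno.h>

/* Placeholder bit; the real XFS_IGET_INCORE value lives in the XFS headers. */
#define MODEL_IGET_INCORE 0x8u

/*
 * Models the new switch case: a cache miss (-ENODATA) becomes -EAGAIN,
 * but only when the caller asked for an in-core-only lookup.
 */
static int model_map_iget_error(int error, unsigned int flags)
{
	switch (error) {
	case -ENODATA:
		if (flags & MODEL_IGET_INCORE)
			error = -EAGAIN;
		break;
	default:
		break;
	}
	return error;
}
```

Callers that pass no flags, such as xfs_khandle_to_inode() in this patch, see the error unchanged.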
fs/xfs/xfs_export.c | 34 ++++++++++++++++++++++++++--------
fs/xfs/xfs_export.h | 3 ++-
fs/xfs/xfs_handle.c | 2 +-
3 files changed, 29 insertions(+), 10 deletions(-)
diff --git a/fs/xfs/xfs_export.c b/fs/xfs/xfs_export.c
index 201489d3de08..6a57ed8fd9b7 100644
--- a/fs/xfs/xfs_export.c
+++ b/fs/xfs/xfs_export.c
@@ -106,7 +106,8 @@ struct inode *
xfs_nfs_get_inode(
struct super_block *sb,
u64 ino,
- u32 generation)
+ u32 generation,
+ uint flags)
{
xfs_mount_t *mp = XFS_M(sb);
xfs_inode_t *ip;
@@ -123,7 +124,9 @@ xfs_nfs_get_inode(
* fine and not an indication of a corrupted filesystem as clients can
* send invalid file handles and we have to handle it gracefully..
*/
- error = xfs_iget(mp, NULL, ino, XFS_IGET_UNTRUSTED, 0, &ip);
+ flags |= XFS_IGET_UNTRUSTED;
+
+ error = xfs_iget(mp, NULL, ino, flags, 0, &ip);
if (error) {
/*
@@ -140,6 +143,10 @@ xfs_nfs_get_inode(
case -EFSCORRUPTED:
error = -ESTALE;
break;
+ case -ENODATA:
+ if (flags & XFS_IGET_INCORE)
+ error = -EAGAIN;
+ break;
default:
break;
}
@@ -170,10 +177,15 @@ xfs_nfs_get_inode(
STATIC struct dentry *
xfs_fs_fh_to_dentry(struct super_block *sb, struct fid *fid,
- int fh_len, int fileid_type)
+ int fh_len, int fileid_type_flags)
{
+ int fileid_type = FILEID_TYPE(fileid_type_flags);
struct xfs_fid64 *fid64 = (struct xfs_fid64 *)fid;
struct inode *inode = NULL;
+ uint flags = 0;
+
+ if (fileid_type_flags & FILEID_CACHED)
+ flags = XFS_IGET_INCORE;
if (fh_len < xfs_fileid_length(fileid_type))
return NULL;
@@ -181,11 +193,11 @@ xfs_fs_fh_to_dentry(struct super_block *sb, struct fid *fid,
switch (fileid_type) {
case FILEID_INO32_GEN_PARENT:
case FILEID_INO32_GEN:
- inode = xfs_nfs_get_inode(sb, fid->i32.ino, fid->i32.gen);
+ inode = xfs_nfs_get_inode(sb, fid->i32.ino, fid->i32.gen, flags);
break;
case FILEID_INO32_GEN_PARENT | XFS_FILEID_TYPE_64FLAG:
case FILEID_INO32_GEN | XFS_FILEID_TYPE_64FLAG:
- inode = xfs_nfs_get_inode(sb, fid64->ino, fid64->gen);
+ inode = xfs_nfs_get_inode(sb, fid64->ino, fid64->gen, flags);
break;
}
@@ -194,10 +206,15 @@ xfs_fs_fh_to_dentry(struct super_block *sb, struct fid *fid,
STATIC struct dentry *
xfs_fs_fh_to_parent(struct super_block *sb, struct fid *fid,
- int fh_len, int fileid_type)
+ int fh_len, int fileid_type_flags)
{
+ int fileid_type = FILEID_TYPE(fileid_type_flags);
struct xfs_fid64 *fid64 = (struct xfs_fid64 *)fid;
struct inode *inode = NULL;
+ uint flags = 0;
+
+ if (fileid_type_flags & FILEID_CACHED)
+ flags = XFS_IGET_INCORE;
if (fh_len < xfs_fileid_length(fileid_type))
return NULL;
@@ -205,11 +222,11 @@ xfs_fs_fh_to_parent(struct super_block *sb, struct fid *fid,
switch (fileid_type) {
case FILEID_INO32_GEN_PARENT:
inode = xfs_nfs_get_inode(sb, fid->i32.parent_ino,
- fid->i32.parent_gen);
+ fid->i32.parent_gen, flags);
break;
case FILEID_INO32_GEN_PARENT | XFS_FILEID_TYPE_64FLAG:
inode = xfs_nfs_get_inode(sb, fid64->parent_ino,
- fid64->parent_gen);
+ fid64->parent_gen, flags);
break;
}
@@ -248,4 +265,5 @@ const struct export_operations xfs_export_operations = {
.map_blocks = xfs_fs_map_blocks,
.commit_blocks = xfs_fs_commit_blocks,
#endif
+ .flags = EXPORT_OP_NONBLOCK,
};
diff --git a/fs/xfs/xfs_export.h b/fs/xfs/xfs_export.h
index 3cd85e8901a5..9addfcd5b1e1 100644
--- a/fs/xfs/xfs_export.h
+++ b/fs/xfs/xfs_export.h
@@ -57,6 +57,7 @@ struct xfs_fid64 {
/* This flag goes on the wire. Don't play with it. */
#define XFS_FILEID_TYPE_64FLAG 0x80 /* NFS fileid has 64bit inodes */
-struct inode *xfs_nfs_get_inode(struct super_block *sb, u64 ino, u32 gen);
+struct inode *xfs_nfs_get_inode(struct super_block *sb, u64 ino, u32 gen,
+ uint flags);
#endif /* __XFS_EXPORT_H__ */
diff --git a/fs/xfs/xfs_handle.c b/fs/xfs/xfs_handle.c
index f19fce557354..7d877ff504d6 100644
--- a/fs/xfs/xfs_handle.c
+++ b/fs/xfs/xfs_handle.c
@@ -193,7 +193,7 @@ xfs_khandle_to_inode(
return ERR_PTR(-EINVAL);
inode = xfs_nfs_get_inode(mp->m_super, handle->ha_fid.fid_ino,
- handle->ha_fid.fid_gen);
+ handle->ha_fid.fid_gen, 0);
if (IS_ERR(inode))
return ERR_CAST(inode);
--
2.51.0
* Re: [PATCH v3 07/10] exportfs: new FILEID_CACHED flag for non-blocking fh lookup
2025-09-12 15:28 ` [PATCH v3 07/10] exportfs: new FILEID_CACHED flag for non-blocking fh lookup Thomas Bertschinger
@ 2025-09-12 16:28 ` Amir Goldstein
0 siblings, 0 replies; 16+ messages in thread
From: Amir Goldstein @ 2025-09-12 16:28 UTC (permalink / raw)
To: Thomas Bertschinger
Cc: io-uring, axboe, linux-fsdevel, viro, brauner, linux-nfs,
linux-xfs, cem, chuck.lever, jlayton
On Fri, Sep 12, 2025 at 5:27 PM Thomas Bertschinger
<tahbertschinger@gmail.com> wrote:
>
> This defines a new flag FILEID_CACHED that the VFS can set in the
> handle_type field of struct file_handle to request that the FS
> implementations of fh_to_{dentry,parent}() only complete if they can
> satisfy the request with cached data.
>
> Because not every FS implementation will recognize this new flag, those
> that do recognize the flag can indicate their support using a new
> export flag, EXPORT_OP_NONBLOCK.
>
> If FILEID_CACHED is set in a file handle, but the filesystem does not
> set EXPORT_OP_NONBLOCK, then the VFS will return -EAGAIN without
> attempting to call into the filesystem code.
>
> exportfs_decode_fh_raw() is updated to respect the new flag by returning
> -EAGAIN when it would need to do an operation that may not be possible
> with only cached data.
>
> Suggested-by: Amir Goldstein <amir73il@gmail.com>
> Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com>
> ---
> I didn't apply Amir's Reviewed-by for this patch because I added the
> Documentation section, which was not reviewed in v2.
Documentation looks good.
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Thanks,
Amir.
>
> Documentation/filesystems/nfs/exporting.rst | 6 ++++++
> fs/exportfs/expfs.c | 12 ++++++++++++
> fs/fhandle.c | 2 ++
> include/linux/exportfs.h | 5 +++++
> 4 files changed, 25 insertions(+)
>
> diff --git a/Documentation/filesystems/nfs/exporting.rst b/Documentation/filesystems/nfs/exporting.rst
> index de64d2d002a2..70f46eaeb0d4 100644
> --- a/Documentation/filesystems/nfs/exporting.rst
> +++ b/Documentation/filesystems/nfs/exporting.rst
> @@ -238,3 +238,9 @@ following flags are defined:
> all of an inode's dirty data on last close. Exports that behave this
> way should set EXPORT_OP_FLUSH_ON_CLOSE so that NFSD knows to skip
> waiting for writeback when closing such files.
> +
> + EXPORT_OP_NONBLOCK - FS supports fh_to_{dentry,parent}() using cached data
> + When performing open_by_handle_at(2) using io_uring, it is useful to
> + complete the file open using only cached data when possible, otherwise
> + failing with -EAGAIN. This flag indicates that the filesystem supports this
> + mode of operation.
> diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c
> index 949ce6ef6c4e..e2cfdd9d6392 100644
> --- a/fs/exportfs/expfs.c
> +++ b/fs/exportfs/expfs.c
> @@ -441,6 +441,7 @@ exportfs_decode_fh_raw(struct vfsmount *mnt, struct fid *fid, int fh_len,
> void *context)
> {
> const struct export_operations *nop = mnt->mnt_sb->s_export_op;
> + bool decode_cached = fileid_type & FILEID_CACHED;
> struct dentry *result, *alias;
> char nbuf[NAME_MAX+1];
> int err;
> @@ -453,6 +454,10 @@ exportfs_decode_fh_raw(struct vfsmount *mnt, struct fid *fid, int fh_len,
> */
> if (!exportfs_can_decode_fh(nop))
> return ERR_PTR(-ESTALE);
> +
> + if (decode_cached && !(nop->flags & EXPORT_OP_NONBLOCK))
> + return ERR_PTR(-EAGAIN);
> +
> result = nop->fh_to_dentry(mnt->mnt_sb, fid, fh_len, fileid_type);
> if (IS_ERR_OR_NULL(result))
> return result;
> @@ -481,6 +486,10 @@ exportfs_decode_fh_raw(struct vfsmount *mnt, struct fid *fid, int fh_len,
> * filesystem root.
> */
> if (result->d_flags & DCACHE_DISCONNECTED) {
> + err = -EAGAIN;
> + if (decode_cached)
> + goto err_result;
> +
> err = reconnect_path(mnt, result, nbuf);
> if (err)
> goto err_result;
> @@ -526,6 +535,9 @@ exportfs_decode_fh_raw(struct vfsmount *mnt, struct fid *fid, int fh_len,
> err = PTR_ERR(target_dir);
> if (IS_ERR(target_dir))
> goto err_result;
> + err = -EAGAIN;
> + if (decode_cached && (target_dir->d_flags & DCACHE_DISCONNECTED))
> + goto err_result;
>
> /*
> * And as usual we need to make sure the parent directory is
> diff --git a/fs/fhandle.c b/fs/fhandle.c
> index 2dc669aeb520..509ff8983f94 100644
> --- a/fs/fhandle.c
> +++ b/fs/fhandle.c
> @@ -273,6 +273,8 @@ static int do_handle_to_path(struct file_handle *handle, struct path *path,
> if (IS_ERR_OR_NULL(dentry)) {
> if (dentry == ERR_PTR(-ENOMEM))
> return -ENOMEM;
> + if (dentry == ERR_PTR(-EAGAIN))
> + return -EAGAIN;
> return -ESTALE;
> }
> path->dentry = dentry;
> diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h
> index 30a9791d88e0..8238b6f67956 100644
> --- a/include/linux/exportfs.h
> +++ b/include/linux/exportfs.h
> @@ -199,6 +199,8 @@ struct handle_to_path_ctx {
> #define FILEID_FS_FLAGS_MASK 0xff00
> #define FILEID_FS_FLAGS(flags) ((flags) & FILEID_FS_FLAGS_MASK)
>
> +#define FILEID_CACHED 0x100 /* Use only cached data when decoding handle */
> +
> /* User flags: */
> #define FILEID_USER_FLAGS_MASK 0xffff0000
> #define FILEID_USER_FLAGS(type) ((type) & FILEID_USER_FLAGS_MASK)
> @@ -303,6 +305,9 @@ struct export_operations {
> */
> #define EXPORT_OP_FLUSH_ON_CLOSE (0x20) /* fs flushes file data on close */
> #define EXPORT_OP_NOLOCKS (0x40) /* no file locking support */
> +#define EXPORT_OP_NONBLOCK (0x80) /* Filesystem supports non-
> + blocking fh_to_dentry()
> + */
> unsigned long flags;
> };
>
> --
> 2.51.0
>
* Re: [PATCH v3 10/10] xfs: add support for non-blocking fh_to_dentry()
2025-09-12 15:28 ` [PATCH v3 10/10] xfs: add support for non-blocking fh_to_dentry() Thomas Bertschinger
@ 2025-09-12 22:51 ` Dave Chinner
0 siblings, 0 replies; 16+ messages in thread
From: Dave Chinner @ 2025-09-12 22:51 UTC (permalink / raw)
To: Thomas Bertschinger
Cc: io-uring, axboe, linux-fsdevel, viro, brauner, linux-nfs,
linux-xfs, cem, chuck.lever, jlayton, amir73il
On Fri, Sep 12, 2025 at 09:28:55AM -0600, Thomas Bertschinger wrote:
> This is to support using open_by_handle_at(2) via io_uring. It is useful
> for io_uring to request that opening a file via handle be completed
> using only cached data, or fail with -EAGAIN if that is not possible.
>
> The signature of xfs_nfs_get_inode() is extended with a new flags
> argument that allows callers to specify XFS_IGET_INCORE.
>
> That flag is set when the VFS passes the FILEID_CACHED flag via the
> fileid_type argument.
>
> Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com>
> Acked-by: Amir Goldstein <amir73il@gmail.com>
> ---
> fs/xfs/xfs_export.c | 34 ++++++++++++++++++++++++++--------
> fs/xfs/xfs_export.h | 3 ++-
> fs/xfs/xfs_handle.c | 2 +-
> 3 files changed, 29 insertions(+), 10 deletions(-)
>
> diff --git a/fs/xfs/xfs_export.c b/fs/xfs/xfs_export.c
> index 201489d3de08..6a57ed8fd9b7 100644
> --- a/fs/xfs/xfs_export.c
> +++ b/fs/xfs/xfs_export.c
> @@ -106,7 +106,8 @@ struct inode *
> xfs_nfs_get_inode(
> struct super_block *sb,
> u64 ino,
> - u32 generation)
> + u32 generation,
> + uint flags)
> {
> xfs_mount_t *mp = XFS_M(sb);
> xfs_inode_t *ip;
> @@ -123,7 +124,9 @@ xfs_nfs_get_inode(
> * fine and not an indication of a corrupted filesystem as clients can
> * send invalid file handles and we have to handle it gracefully..
> */
> - error = xfs_iget(mp, NULL, ino, XFS_IGET_UNTRUSTED, 0, &ip);
> + flags |= XFS_IGET_UNTRUSTED;
> +
> + error = xfs_iget(mp, NULL, ino, flags, 0, &ip);
> if (error) {
>
> /*
> @@ -140,6 +143,10 @@ xfs_nfs_get_inode(
> case -EFSCORRUPTED:
> error = -ESTALE;
> break;
> + case -ENODATA:
> + if (flags & XFS_IGET_INCORE)
> + error = -EAGAIN;
> + break;
> default:
> break;
> }
> @@ -170,10 +177,15 @@ xfs_nfs_get_inode(
>
> STATIC struct dentry *
> xfs_fs_fh_to_dentry(struct super_block *sb, struct fid *fid,
> - int fh_len, int fileid_type)
> + int fh_len, int fileid_type_flags)
> {
> + int fileid_type = FILEID_TYPE(fileid_type_flags);
> struct xfs_fid64 *fid64 = (struct xfs_fid64 *)fid;
> struct inode *inode = NULL;
> + uint flags = 0;
> +
> + if (fileid_type_flags & FILEID_CACHED)
> + flags = XFS_IGET_INCORE;
XFS_IGET_INCORE doesn't guarantee non-blocking lookup behaviour. It
never has and it never will. It simply means we return inodes that
are already fully instantiated or it fails with either EAGAIN or
ENODATA.
IOWs, XFS_IGET_INCORE exploits the internal XFS inode cache
architecture (cache lookups are done under RCU locks, so cannot
block). The resultant cleanup that needs to be done once a ilookup
fails before another attempt can be made is done outside RCU, and
the lookup is most definitely allowed to block in those paths before
it returns -EAGAIN to the outer lookup loop. It is mostly pure luck
that we don't have any sleeping locks in various internal "need to
retry the lookup" paths right now.
Exposing XFS_IGET_INCORE functionality to the outside world does not
fill me with joy, especially to a userspace ABI. i.e. this takes a
rarely used, niche internal filesystem behaviour, redefines how it
is supposed to behave and what it guarantees to callers without
actually defining those semantics, and then requires the filesystem
to support it forever more (because io_uring is kernel/userspace
ABI).
IOWs, this is a NACK on using XFS_IGET_INCORE for FILEID_CACHED. The
semantics that are required by io_uring are non-blocking lookups,
and that should be defined by a new flag (say XFS_IGET_NONBLOCK)
with clearly defined and agreed upon semantics.
Indeed, this shows the semantic problem with defining the generic
filehandle behaviour as FILEID_CACHED. io_uring does not want
-cached- inode lookups, it wants *non-blocking* inode lookups.
These are *not* equivalent lookup semantics.
e.g. find_inode_fast() has FILEID_CACHED compatible semantics - it
will return either a referenced, fully instantiated cached inode or
null.
However, find_inode_fast() does *not have non-blocking behaviour*.
If it finds an inode being freed, it will block until that inode has
been removed from the cache, then it will retry the lookup and
return NULL because the inode is no longer found in the cache.
IOWs, "only return in-cache inodes" is fundamentally the wrong
semantic to implement for non-blocking filehandle decoding. The API
needs to ask for non-blocking lookup semantics, not "in-cache"
lookup semantics.
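
For illustration only, the difference between the two semantics can be modelled in userspace. This is a sketch under stated assumptions, not a description of find_inode_fast() or of any XFS function: the three states, struct model_result and both lookup functions are hypothetical, the "slept" field merely records that a real implementation would have waited, and -ENOENT stands in for a NULL result.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical cache states for a single inode. */
enum model_state { MODEL_ABSENT, MODEL_CACHED, MODEL_BEING_FREED };

struct model_result {
	int err;    /* 0 on a cache hit, negative errno otherwise */
	bool slept; /* true if the lookup would have had to wait */
};

/* "In-cache only" semantics: never reads from disk, but may still wait. */
static struct model_result model_lookup_cached(enum model_state s)
{
	struct model_result r = { .err = 0, .slept = false };

	if (s == MODEL_BEING_FREED) {
		r.slept = true;   /* wait for the eviction to finish ... */
		s = MODEL_ABSENT; /* ... then retry and find nothing */
	}
	if (s == MODEL_ABSENT)
		r.err = -ENOENT;
	return r;
}

/* Non-blocking semantics: anything that would wait is reported as -EAGAIN. */
static struct model_result model_lookup_nonblock(enum model_state s)
{
	struct model_result r = { .err = 0, .slept = false };

	if (s != MODEL_CACHED)
		r.err = -EAGAIN;
	return r;
}
```

Both functions agree on a plain hit; they differ on the inode that is being freed, which is the case that matters for a caller that must not sleep.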
-Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH v3 02/10] io_uring: add support for IORING_OP_NAME_TO_HANDLE_AT
2025-09-12 15:28 ` [PATCH v3 02/10] io_uring: add support for IORING_OP_NAME_TO_HANDLE_AT Thomas Bertschinger
@ 2025-09-17 14:14 ` Jens Axboe
0 siblings, 0 replies; 16+ messages in thread
From: Jens Axboe @ 2025-09-17 14:14 UTC (permalink / raw)
To: Thomas Bertschinger, io-uring, linux-fsdevel, viro, brauner,
linux-nfs, linux-xfs, cem, chuck.lever, jlayton, amir73il
On 9/12/25 9:28 AM, Thomas Bertschinger wrote:
> +#if defined(CONFIG_FHANDLE)
> +int io_name_to_handle_at_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
> +{
> + struct io_name_to_handle *nh = io_kiocb_to_cmd(req, struct io_name_to_handle);
> +
> + nh->dfd = READ_ONCE(sqe->fd);
> + nh->flags = READ_ONCE(sqe->name_to_handle_flags);
> + nh->path = u64_to_user_ptr(READ_ONCE(sqe->addr));
> + nh->ufh = u64_to_user_ptr(READ_ONCE(sqe->addr2));
> + nh->mount_id = u64_to_user_ptr(READ_ONCE(sqe->addr3));
> +
> + return 0;
> +}
Should probably include a:
if (sqe->len)
return -EINVAL;
to allow for using that field in the future, should that become
necessary.
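
For illustration only, the reasoning behind rejecting a non-zero unused field can be modelled in userspace. This is a sketch: struct model_sqe is a hypothetical, cut-down stand-in for struct io_uring_sqe, and model_prep() is not the io_uring prep handler.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for the submission queue entry. */
struct model_sqe {
	int32_t  fd;
	uint32_t len;  /* not used by this opcode today */
	uint64_t addr;
};

/*
 * Reject requests that set a field the opcode does not use. Because no
 * valid request can carry a non-zero value there, the field can later be
 * given a meaning without breaking existing callers.
 */
static int model_prep(const struct model_sqe *sqe)
{
	if (sqe->len)
		return -EINVAL;
	return 0;
}
```

Without the check, applications could pass garbage in the field and keep working, which would make any later use of it an ABI break.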
Outside of that, this patch looks fine to me.
--
Jens Axboe
* Re: [PATCH v3 09/10] io_uring: add support for IORING_OP_OPEN_BY_HANDLE_AT
2025-09-12 15:28 ` [PATCH v3 09/10] io_uring: add support for IORING_OP_OPEN_BY_HANDLE_AT Thomas Bertschinger
@ 2025-09-17 14:18 ` Jens Axboe
0 siblings, 0 replies; 16+ messages in thread
From: Jens Axboe @ 2025-09-17 14:18 UTC (permalink / raw)
To: Thomas Bertschinger, io-uring, linux-fsdevel, viro, brauner,
linux-nfs, linux-xfs, cem, chuck.lever, jlayton, amir73il
On 9/12/25 9:28 AM, Thomas Bertschinger wrote:
> +int io_open_by_handle_at_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
> +{
> + struct io_open *open = io_kiocb_to_cmd(req, struct io_open);
> + struct io_open_handle_async *ah;
> + u64 flags;
> + int ret;
> +
> + flags = READ_ONCE(sqe->open_flags);
> + open->how = build_open_how(flags, 0);
Maybe kill 'flags' here as it's only used to pass into build_open_how()?
open->how = build_open_how(READ_ONCE(sqe->open_flags), 0);
> + ret = __io_open_prep(req, sqe);
> + if (ret)
> + return ret;
> +
> + ah = io_uring_alloc_async_data(NULL, req);
> + if (!ah)
> + return -ENOMEM;
> + memset(&ah->path, 0, sizeof(ah->path));
> + ah->handle = get_user_handle(u64_to_user_ptr(READ_ONCE(sqe->addr)));
> + if (IS_ERR(ah->handle))
> + return PTR_ERR(ah->handle);
> +
> + req->flags |= REQ_F_NEED_CLEANUP;
> +
> + return 0;
Prudent to do something ala:
if (IS_ERR(ah->handle)) {
ret = PTR_ERR(ah->handle);
ah->handle = NULL;
return ret;
}
> +void io_open_by_handle_cleanup(struct io_kiocb *req)
> +{
> + struct io_open_handle_async *ah = req->async_data;
> +
> + if (ah->path.dentry)
> + path_put(&ah->path);
> +
> + kfree(ah->handle);
> +}
kfree(ah->handle);
ah->handle = NULL;
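
For illustration only, the two suggestions above can be modelled in userspace. The concern appears to be that a cleanup path must never free an error cookie or free the same pointer twice. This is a sketch under stated assumptions: the model_* helpers imitate the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() with an assumed error range of 4095, and struct model_async is a hypothetical stand-in for struct io_open_handle_async.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace imitations of the kernel's error-pointer helpers (assumed range). */
#define MODEL_MAX_ERRNO 4095

static void *model_err_ptr(long err) { return (void *)(intptr_t)err; }
static long model_ptr_err(const void *p) { return (long)(intptr_t)p; }
static int model_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Hypothetical stand-in for the async data that owns the handle. */
struct model_async {
	void *handle;
};

/* Prep: on failure, leave handle as NULL instead of an error cookie. */
static int model_prep(struct model_async *ah, void *fetched)
{
	ah->handle = fetched;
	if (model_is_err(ah->handle)) {
		int ret = (int)model_ptr_err(ah->handle);

		ah->handle = NULL;
		return ret;
	}
	return 0;
}

/* Cleanup: free(NULL) is a no-op, so running this twice is harmless. */
static void model_cleanup(struct model_async *ah)
{
	free(ah->handle);
	ah->handle = NULL;
}
```

With both changes in place, cleanup is safe to call after a failed prep and safe to call more than once.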
Just a few minor nits, overall this looks fine.
--
Jens Axboe
* Re: [PATCH v3 08/10] io_uring: add __io_open_prep() helper
2025-09-12 15:28 ` [PATCH v3 08/10] io_uring: add __io_open_prep() helper Thomas Bertschinger
@ 2025-09-17 14:18 ` Jens Axboe
0 siblings, 0 replies; 16+ messages in thread
From: Jens Axboe @ 2025-09-17 14:18 UTC (permalink / raw)
To: Thomas Bertschinger, io-uring, linux-fsdevel, viro, brauner,
linux-nfs, linux-xfs, cem, chuck.lever, jlayton, amir73il
On 9/12/25 9:28 AM, Thomas Bertschinger wrote:
> This adds a helper, __io_open_prep(), which does the part of preparing
> for an open that is shared between openat*() and open_by_handle_at().
>
> It excludes reading in the user path or file handle--this will be done
> by functions specific to the kind of open().
Looks fine to me.
--
Jens Axboe
Thread overview: 16+ messages
2025-09-12 15:28 [PATCH v3 00/10] add support for name_to, open_by_handle_at() to io_uring Thomas Bertschinger
2025-09-12 15:28 ` [PATCH v3 01/10] fhandle: create helper for name_to_handle_at(2) Thomas Bertschinger
2025-09-12 15:28 ` [PATCH v3 02/10] io_uring: add support for IORING_OP_NAME_TO_HANDLE_AT Thomas Bertschinger
2025-09-17 14:14 ` Jens Axboe
2025-09-12 15:28 ` [PATCH v3 03/10] fhandle: helper for allocating, reading struct file_handle Thomas Bertschinger
2025-09-12 15:28 ` [PATCH v3 04/10] fhandle: create do_file_handle_open() helper Thomas Bertschinger
2025-09-12 15:28 ` [PATCH v3 05/10] fhandle: make do_file_handle_open() take struct open_flags Thomas Bertschinger
2025-09-12 15:28 ` [PATCH v3 06/10] exportfs: allow VFS flags in struct file_handle Thomas Bertschinger
2025-09-12 15:28 ` [PATCH v3 07/10] exportfs: new FILEID_CACHED flag for non-blocking fh lookup Thomas Bertschinger
2025-09-12 16:28 ` Amir Goldstein
2025-09-12 15:28 ` [PATCH v3 08/10] io_uring: add __io_open_prep() helper Thomas Bertschinger
2025-09-17 14:18 ` Jens Axboe
2025-09-12 15:28 ` [PATCH v3 09/10] io_uring: add support for IORING_OP_OPEN_BY_HANDLE_AT Thomas Bertschinger
2025-09-17 14:18 ` Jens Axboe
2025-09-12 15:28 ` [PATCH v3 10/10] xfs: add support for non-blocking fh_to_dentry() Thomas Bertschinger
2025-09-12 22:51 ` Dave Chinner