* [PATCH v8 00/16] fuse: fuse-over-io-uring
@ 2024-12-09 14:56 Bernd Schubert
2024-12-09 14:56 ` [PATCH v8 01/16] fuse: rename to fuse_dev_end_requests and make non-static Bernd Schubert
` (15 more replies)
0 siblings, 16 replies; 22+ messages in thread
From: Bernd Schubert @ 2024-12-09 14:56 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
Bernd Schubert
This adds support for io-uring communication between the kernel and
the userspace daemon using the opcode IORING_OP_URING_CMD. The basic
approach was taken from ublk.
The motivation for these patches is to increase fuse performance
by:
- Reducing kernel/userspace context switches
- Part of that is given by the ring itself - handling multiple
requests on either side of kernel/userspace without the need
to switch per request
- Part of that is FUSE_URING_REQ_COMMIT_AND_FETCH, i.e. submitting
the result of a request and fetching the next fuse request
in one step, in contrast to the legacy read/write on /dev/fuse
- Core and NUMA affinity - one ring per core, which avoids
CPU core context switches
A more detailed motivation description can be found in the
introduction of the previous patch series
https://lore.kernel.org/r/[email protected]
That description also includes benchmark results with RFCv1.
Performance with the current series needs to be tested, but will
be lower, as several optimization patches are missing, like
wake-up on the same core. These optimizations will be submitted
after merging the main changes.
The corresponding libfuse patches are on my uring branch, but need
cleanup for submission - that will be done once the kernel design
has settled
https://github.com/bsbernd/libfuse/tree/uring
Testing with that libfuse branch is possible by running something
like:
example/passthrough_hp -o allow_other --debug-fuse --nopassthrough \
--uring --uring-q-depth=128 /scratch/source /scratch/dest
With the --debug-fuse option one should see CQE in the request type,
if requests are received via io-uring:
cqe unique: 4, opcode: GETATTR (3), nodeid: 1, insize: 16, pid: 7060
unique: 4, result=104
Without the --uring option "cqe" is replaced by the default "dev"
dev unique: 4, opcode: GETATTR (3), nodeid: 1, insize: 56, pid: 7117
unique: 4, success, outsize: 120
Future work
- different payload sizes per ring
- zero copy
Signed-off-by: Bernd Schubert <[email protected]>
---
Changes in v8:
- Move the lock in fuse_uring_create_queue to have initialization before
taking fc->lock (Joanne)
- Avoid double assignment of ring_ent->cmd (Pavel)
- Set a global ring max_payload size instead of per ring-entry (Joanne)
- Also consider fc->max_pages for the max payload size (instead of
fc->max_write only) (Joanne)
- Fix freeing of ring->queues in error case in fuse_uring_create (Joanne)
- Fix allocation size of the ring, including queues was a leftover from
previous patches (Miklos, Joanne)
- Move FUSE_URING_IOV_SEGS definition to the top of dev_uring.c (Joanne)
- Update Documentation/filesystems/fuse-io-uring.rst and use 'io-uring'
instead of 'uring' (Pavel)
- Rename SQE op codes to FUSE_IO_URING_CMD_REGISTER and
FUSE_IO_URING_CMD_COMMIT_AND_FETCH
- Use READ_ONCE on data in 80B SQE section (struct fuse_uring_cmd_req)
(Pavel)
- Add back sanity check for IO_URING_F_SQE128 (had been initially there,
but got lost between different version updates) (Pavel)
- Remove pr_devel logs (Miklos)
- Only set fuse_uring_cmd() in file_operations in the last patch
and mark that function with __maybe_unused before that, to avoid
potential compiler warnings (Pavel)
- Add missing sanity check for qid < ring->nr_queues
- Add check for fc->initialized - FUSE_IO_URING_CMD_REGISTER must only
arrive after FUSE_INIT in order to know the max payload size
- Add 'struct fuse_uring_ent_in_out' and add the commit id to it.
For now the commit id is the request unique, but it can be any number
that identifies the corresponding struct fuse_ring_ent object.
The current way via struct fuse_req needs a struct fuse_pqueue per
queue (>2kb per core/queue), has hash overhead and is not suitable
for request types without a unique (like future updates for notify)
- Increase FUSE_KERNEL_MINOR_VERSION to 42
- Separate out making fuse_request_find/fuse_req_hash/fuse_pqueue_init
non-static into its own patch to simplify review
- Don't return too early in fuse_uring_copy_to_ring, to always update
'ring_ent_in_out'
- Code simplification of fuse_uring_next_fuse_req()
- fuse_uring_commit_fetch was accidentally doing a full copy on stack
of queue->fpq
- Separate out setting and getting values from io_uring_cmd *cmd->pdu
into functions
- Fix freeing of queue->ent_released (was accidentally in the wrong
function)
- Remove the queue from fuse_uring_cmd_pdu, ring_ent is enough since
v7
- Return -EAGAIN for IO_URING_F_CANCEL when ring-entries are in the
wrong state. To be clarified with io-uring upstream if that is right.
- Slight simplification by using list_first_entry_or_null instead of
extra checks if the list is empty
- Link to v7: https://lore.kernel.org/r/[email protected]
Changes in v7:
- Bug fixes:
- Removed unsetting ring->ready as that brought up a lock
order violation for fc->bg_lock/queue->lock
- Check for !fc->connected in fuse_uring_cmd(), tear down issues
came up with large ring sizes without that.
- Removal of (arg->size == 0) condition and warning in fuse_copy_args
as that is actually expected for some op codes.
- New init flag: FUSE_OVER_IO_URING to tell fuse-server about over-io-uring
capability
- Use fuse_set_zero_arg0() to set arg0 and rename to struct fuse_zero_header
(I hope I got Miklos' suggestion right)
- Simplification of fuse_uring_ent_avail()
- Renamed some structs in uapi/linux/fuse.h to fuse_uring
(from fuse_ring) to be consistent
- Removal of 'if 0' in fuse_uring_cmd()
- Return -E... directly in fuse_uring_cmd() instead of setting err first
and removal of goto's in that function.
- Just a simple WARN_ON_ONCE() for (oh->unique & FUSE_INT_REQ_BIT) as
that code should be unreachable
- Removal of several pr_devel and some pr_warn() messages
- Removed RFC as it passed several xfstests runs now
- Link to v6: https://lore.kernel.org/r/[email protected]
Changes in v6:
- Update to linux-6.12
- Use 'struct fuse_iqueue_ops' and redirect fiq->ops once
the ring is ready.
- Fix return code from fuse_uring_copy_from_ring on
copy_from_user failure (Dan Carpenter / kernel test robot)
- Avoid list iteration in fuse_uring_cancel (Joanne)
- Simplified struct fuse_ring_req_header
- Adds a new 'struct fuse_ring_ent_in_out'
- Fix assigning ring->queues[qid] in fuse_uring_create_queue,
it was too early, resulting in races
- Add back 'FRRS_INVALID = 0' to ensure ring-ent states always
have a value > 0
- Avoid assigning struct io_uring_cmd *cmd->pdu multiple times;
once when setting up IO_URING_F_CANCEL is sufficient for sending
the request as well.
- Link to v5: https://lore.kernel.org/r/[email protected]
Changes in v5:
- Main focus in v5 is the separation of headers from payload,
which required introducing 'struct fuse_zero_in'.
- Addressed several teardown issues, that were a regression in v4.
- Fixed "BUG: sleeping function called" due to allocation while
holding a lock reported by David Wei
- Fix function comment reported by kernel test robot
- Fix set but unused variable reported by test robot
- Link to v4: https://lore.kernel.org/r/[email protected]
Changes in v4:
- Removal of ioctls, all configuration is done dynamically
on the arrival of FUSE_URING_REQ_FETCH
- ring entries are not (and cannot be without config ioctls)
allocated as array of the ring/queue - removal of the tag
variable. Finding ring entries on FUSE_URING_REQ_COMMIT_AND_FETCH
is more cumbersome now and needs an almost unused
struct fuse_pqueue per fuse_ring_queue and uses the unique
id of fuse requests.
- No device clones needed to work around hanging mounts
on fuse-server/daemon termination, handled by IO_URING_F_CANCEL
- Removal of sync/async ring entry types
- Addressed some of Joanne's comments, but probably not all
- Only very basic tests run for v3, as more updates should follow quickly.
Changes in v3:
- Removed the __wake_on_current_cpu optimization (for now,
as that needs to go through another subsystem/tree);
removing it means a significant performance drop
- Removed MMAP (Miklos)
- Switched to two IOCTLs, instead of one ioctl that had a field
for subcommands (ring and queue config) (Miklos)
- The ring entry state is a single state and not a bitmask anymore
(Josef)
- Addressed several other comments from Josef (I need to go over
the RFCv2 review again, I'm not sure if everything is addressed
already)
- Link to v3: https://lore.kernel.org/r/20240901-b4-fuse-uring-rfcv3-without-mmap-v3-0-9207f7391444@ddn.com
- Link to v2: https://lore.kernel.org/all/[email protected]/
- Link to v1: https://lore.kernel.org/r/[email protected]
---
Bernd Schubert (16):
fuse: rename to fuse_dev_end_requests and make non-static
fuse: Move fuse_get_dev to header file
fuse: Move request bits
fuse: Add fuse-io-uring design documentation
fuse: make args->in_args[0] to be always the header
fuse: {io-uring} Handle SQEs - register commands
fuse: Make fuse_copy non static
fuse: Add fuse-io-uring handling into fuse_copy
fuse: {io-uring} Make hash-list req unique finding functions non-static
fuse: Add io-uring sqe commit and fetch support
fuse: {io-uring} Handle teardown of ring entries
fuse: {io-uring} Make fuse_dev_queue_{interrupt,forget} non-static
fuse: Allow to queue fg requests through io-uring
fuse: Allow to queue bg requests through io-uring
fuse: {io-uring} Prevent mount point hang on fuse-server termination
fuse: enable fuse-over-io-uring
Documentation/filesystems/fuse-io-uring.rst | 101 +++
fs/fuse/Kconfig | 12 +
fs/fuse/Makefile | 1 +
fs/fuse/dax.c | 11 +-
fs/fuse/dev.c | 124 +--
fs/fuse/dev_uring.c | 1308 +++++++++++++++++++++++++++
fs/fuse/dev_uring_i.h | 207 +++++
fs/fuse/dir.c | 32 +-
fs/fuse/fuse_dev_i.h | 68 ++
fs/fuse/fuse_i.h | 27 +
fs/fuse/inode.c | 12 +-
fs/fuse/xattr.c | 7 +-
include/uapi/linux/fuse.h | 76 +-
13 files changed, 1910 insertions(+), 76 deletions(-)
---
base-commit: e70140ba0d2b1a30467d4af6bcfe761327b9ec95
change-id: 20241015-fuse-uring-for-6-10-rfc4-61d0fc6851f8
Best regards,
--
Bernd Schubert <[email protected]>
* [PATCH v8 01/16] fuse: rename to fuse_dev_end_requests and make non-static
2024-12-09 14:56 [PATCH v8 00/16] fuse: fuse-over-io-uring Bernd Schubert
@ 2024-12-09 14:56 ` Bernd Schubert
2024-12-09 14:56 ` [PATCH v8 02/16] fuse: Move fuse_get_dev to header file Bernd Schubert
` (14 subsequent siblings)
15 siblings, 0 replies; 22+ messages in thread
From: Bernd Schubert @ 2024-12-09 14:56 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
Bernd Schubert
This function is needed by dev_uring.c to clean ring queues,
so make it non-static. As a non-static function, 'end_requests'
should carry the fuse_ prefix, hence the rename to fuse_dev_end_requests.
Signed-off-by: Bernd Schubert <[email protected]>
Reviewed-by: Josef Bacik <[email protected]>
Reviewed-by: Joanne Koong <[email protected]>
---
fs/fuse/dev.c | 11 +++++------
fs/fuse/fuse_dev_i.h | 14 ++++++++++++++
2 files changed, 19 insertions(+), 6 deletions(-)
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 27ccae63495d14ea339aa6c8da63d0ac44fc8885..757f2c797d68aa217c0e120f6f16e4a24808ecae 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -7,6 +7,7 @@
*/
#include "fuse_i.h"
+#include "fuse_dev_i.h"
#include <linux/init.h>
#include <linux/module.h>
@@ -34,8 +35,6 @@ MODULE_ALIAS("devname:fuse");
static struct kmem_cache *fuse_req_cachep;
-static void end_requests(struct list_head *head);
-
static struct fuse_dev *fuse_get_dev(struct file *file)
{
/*
@@ -1885,7 +1884,7 @@ static void fuse_resend(struct fuse_conn *fc)
spin_unlock(&fiq->lock);
list_for_each_entry(req, &to_queue, list)
clear_bit(FR_PENDING, &req->flags);
- end_requests(&to_queue);
+ fuse_dev_end_requests(&to_queue);
return;
}
/* iq and pq requests are both oldest to newest */
@@ -2204,7 +2203,7 @@ static __poll_t fuse_dev_poll(struct file *file, poll_table *wait)
}
/* Abort all requests on the given list (pending or processing) */
-static void end_requests(struct list_head *head)
+void fuse_dev_end_requests(struct list_head *head)
{
while (!list_empty(head)) {
struct fuse_req *req;
@@ -2307,7 +2306,7 @@ void fuse_abort_conn(struct fuse_conn *fc)
wake_up_all(&fc->blocked_waitq);
spin_unlock(&fc->lock);
- end_requests(&to_end);
+ fuse_dev_end_requests(&to_end);
} else {
spin_unlock(&fc->lock);
}
@@ -2337,7 +2336,7 @@ int fuse_dev_release(struct inode *inode, struct file *file)
list_splice_init(&fpq->processing[i], &to_end);
spin_unlock(&fpq->lock);
- end_requests(&to_end);
+ fuse_dev_end_requests(&to_end);
/* Are we the last open device? */
if (atomic_dec_and_test(&fc->dev_count)) {
diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
new file mode 100644
index 0000000000000000000000000000000000000000..4fcff2223fa60fbfb844a3f8e1252a523c4c01af
--- /dev/null
+++ b/fs/fuse/fuse_dev_i.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * FUSE: Filesystem in Userspace
+ * Copyright (C) 2001-2008 Miklos Szeredi <[email protected]>
+ */
+#ifndef _FS_FUSE_DEV_I_H
+#define _FS_FUSE_DEV_I_H
+
+#include <linux/types.h>
+
+void fuse_dev_end_requests(struct list_head *head);
+
+#endif
+
--
2.43.0
* [PATCH v8 02/16] fuse: Move fuse_get_dev to header file
2024-12-09 14:56 [PATCH v8 00/16] fuse: fuse-over-io-uring Bernd Schubert
2024-12-09 14:56 ` [PATCH v8 01/16] fuse: rename to fuse_dev_end_requests and make non-static Bernd Schubert
@ 2024-12-09 14:56 ` Bernd Schubert
2024-12-09 14:56 ` [PATCH v8 03/16] fuse: Move request bits Bernd Schubert
` (13 subsequent siblings)
15 siblings, 0 replies; 22+ messages in thread
From: Bernd Schubert @ 2024-12-09 14:56 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
Bernd Schubert
Another preparation patch, as this function will be needed by
fuse/dev.c and fuse/dev_uring.c.
Signed-off-by: Bernd Schubert <[email protected]>
Reviewed-by: Josef Bacik <[email protected]>
Reviewed-by: Joanne Koong <[email protected]>
---
fs/fuse/dev.c | 9 ---------
fs/fuse/fuse_dev_i.h | 9 +++++++++
2 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 757f2c797d68aa217c0e120f6f16e4a24808ecae..3db3282bdac4613788ec8d6d29bfc56241086609 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -35,15 +35,6 @@ MODULE_ALIAS("devname:fuse");
static struct kmem_cache *fuse_req_cachep;
-static struct fuse_dev *fuse_get_dev(struct file *file)
-{
- /*
- * Lockless access is OK, because file->private data is set
- * once during mount and is valid until the file is released.
- */
- return READ_ONCE(file->private_data);
-}
-
static void fuse_request_init(struct fuse_mount *fm, struct fuse_req *req)
{
INIT_LIST_HEAD(&req->list);
diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
index 4fcff2223fa60fbfb844a3f8e1252a523c4c01af..e7ea1b21c18204335c52406de5291f0c47d654f5 100644
--- a/fs/fuse/fuse_dev_i.h
+++ b/fs/fuse/fuse_dev_i.h
@@ -8,6 +8,15 @@
#include <linux/types.h>
+static inline struct fuse_dev *fuse_get_dev(struct file *file)
+{
+ /*
+ * Lockless access is OK, because file->private data is set
+ * once during mount and is valid until the file is released.
+ */
+ return READ_ONCE(file->private_data);
+}
+
void fuse_dev_end_requests(struct list_head *head);
#endif
--
2.43.0
* [PATCH v8 03/16] fuse: Move request bits
2024-12-09 14:56 [PATCH v8 00/16] fuse: fuse-over-io-uring Bernd Schubert
2024-12-09 14:56 ` [PATCH v8 01/16] fuse: rename to fuse_dev_end_requests and make non-static Bernd Schubert
2024-12-09 14:56 ` [PATCH v8 02/16] fuse: Move fuse_get_dev to header file Bernd Schubert
@ 2024-12-09 14:56 ` Bernd Schubert
2024-12-09 14:56 ` [PATCH v8 04/16] fuse: Add fuse-io-uring design documentation Bernd Schubert
` (12 subsequent siblings)
15 siblings, 0 replies; 22+ messages in thread
From: Bernd Schubert @ 2024-12-09 14:56 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
Bernd Schubert
These are needed by fuse-over-io-uring.
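For readers unfamiliar with these bits, here is a minimal userspace
sketch (not kernel code) of the ID scheme they encode. The two macros
are copied from this patch; next_unique(), interrupt_id() and
is_interrupt_id() are hypothetical helpers named only for illustration.
The sketch assumes the request counter advances by FUSE_REQ_ID_STEP per
request and that an interrupt reuses the request ID with
FUSE_INT_REQ_BIT set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Copied from the patch: ordinary requests have even IDs, interrupt IDs are odd */
#define FUSE_INT_REQ_BIT (1ULL << 0)
#define FUSE_REQ_ID_STEP (1ULL << 1)

/* Hypothetical helper: advance the request counter and return the new even ID */
uint64_t next_unique(uint64_t *reqctr)
{
	*reqctr += FUSE_REQ_ID_STEP;
	return *reqctr;
}

/* Hypothetical helper: derive the interrupt ID that belongs to a request ID */
uint64_t interrupt_id(uint64_t unique)
{
	/* Ordinary request IDs are expected to be even */
	assert((unique & FUSE_INT_REQ_BIT) == 0);
	return unique | FUSE_INT_REQ_BIT;
}

/* Hypothetical helper: true if the ID carries the interrupt bit */
bool is_interrupt_id(uint64_t unique)
{
	return (unique & FUSE_INT_REQ_BIT) != 0;
}
```

Because the step is 2, bit 0 of an ordinary ID is always clear, so a
reply with bit 0 set in its unique can be recognized as referring to an
interrupt rather than to a regular request.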
Signed-off-by: Bernd Schubert <[email protected]>
Reviewed-by: Josef Bacik <[email protected]>
Reviewed-by: Joanne Koong <[email protected]>
---
fs/fuse/dev.c | 4 ----
fs/fuse/fuse_dev_i.h | 4 ++++
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 3db3282bdac4613788ec8d6d29bfc56241086609..4f8825de9e05b9ffd291ac5bff747a10a70df0b4 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -29,10 +29,6 @@
MODULE_ALIAS_MISCDEV(FUSE_MINOR);
MODULE_ALIAS("devname:fuse");
-/* Ordinary requests have even IDs, while interrupts IDs are odd */
-#define FUSE_INT_REQ_BIT (1ULL << 0)
-#define FUSE_REQ_ID_STEP (1ULL << 1)
-
static struct kmem_cache *fuse_req_cachep;
static void fuse_request_init(struct fuse_mount *fm, struct fuse_req *req)
diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
index e7ea1b21c18204335c52406de5291f0c47d654f5..08a7e88e002773fcd18c25a229c7aa6450831401 100644
--- a/fs/fuse/fuse_dev_i.h
+++ b/fs/fuse/fuse_dev_i.h
@@ -8,6 +8,10 @@
#include <linux/types.h>
+/* Ordinary requests have even IDs, while interrupts IDs are odd */
+#define FUSE_INT_REQ_BIT (1ULL << 0)
+#define FUSE_REQ_ID_STEP (1ULL << 1)
+
static inline struct fuse_dev *fuse_get_dev(struct file *file)
{
/*
--
2.43.0
* [PATCH v8 04/16] fuse: Add fuse-io-uring design documentation
2024-12-09 14:56 [PATCH v8 00/16] fuse: fuse-over-io-uring Bernd Schubert
` (2 preceding siblings ...)
2024-12-09 14:56 ` [PATCH v8 03/16] fuse: Move request bits Bernd Schubert
@ 2024-12-09 14:56 ` Bernd Schubert
2024-12-09 14:56 ` [PATCH v8 05/16] fuse: make args->in_args[0] to be always the header Bernd Schubert
` (11 subsequent siblings)
15 siblings, 0 replies; 22+ messages in thread
From: Bernd Schubert @ 2024-12-09 14:56 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
Bernd Schubert
Signed-off-by: Bernd Schubert <[email protected]>
---
Documentation/filesystems/fuse-io-uring.rst | 101 ++++++++++++++++++++++++++++
1 file changed, 101 insertions(+)
diff --git a/Documentation/filesystems/fuse-io-uring.rst b/Documentation/filesystems/fuse-io-uring.rst
new file mode 100644
index 0000000000000000000000000000000000000000..6299b65072a8468f08cc4f6978c386546bb9559a
--- /dev/null
+++ b/Documentation/filesystems/fuse-io-uring.rst
@@ -0,0 +1,101 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=======================================
+FUSE-over-io-uring design documentation
+=======================================
+
+This documentation covers basic details how the fuse
+kernel/userspace communication through io-uring is configured
+and works. For generic details about FUSE see fuse.rst.
+
+This document also covers the current interface, which is
+still in development and might change.
+
+Limitations
+===========
+As of now not all requests types are supported through io-uring, userspace
+is required to also handle requests through /dev/fuse after io-uring setup
+is complete. Specifically notifications (initiated from the daemon side)
+ and interrupts.
+
+Fuse io-uring configuration
+===========================
+
+Fuse kernel requests are queued through the classical /dev/fuse
+read/write interface - until io-uring setup is complete.
+
+In order to set up fuse-over-io-uring fuse-server (user-space)
+needs to submit SQEs (opcode = IORING_OP_URING_CMD) to the /dev/fuse
+connection file descriptor. Initial submit is with the sub command
+FUSE_URING_REQ_REGISTER, which will just register entries to be
+available in the kernel.
+
+Once at least one entry per queue is submitted, kernel starts
+to enqueue to ring queues.
+Note, every CPU core has its own fuse-io-uring queue.
+Userspace handles the CQE/fuse-request and submits the result as
+subcommand FUSE_URING_REQ_COMMIT_AND_FETCH - kernel completes
+the requests and also marks the entry available again. If there are
+pending requests waiting the request will be immediately submitted
+to the daemon again.
+
+Initial SQE
+-----------
+
+ | | FUSE filesystem daemon
+ | |
+ | | >io_uring_submit()
+ | | IORING_OP_URING_CMD /
+ | | FUSE_URING_REQ_FETCH
+ | | [wait cqe]
+ | | >io_uring_wait_cqe() or
+ | | >io_uring_submit_and_wait()
+ | |
+ | >fuse_uring_cmd() |
+ | >fuse_uring_fetch() |
+ | >fuse_uring_ent_release() |
+
+
+Sending requests with CQEs
+--------------------------
+
+ | | FUSE filesystem daemon
+ | | [waiting for CQEs]
+ | "rm /mnt/fuse/file" |
+ | |
+ | >sys_unlink() |
+ | >fuse_unlink() |
+ | [allocate request] |
+ | >__fuse_request_send() |
+ | ... |
+ | >fuse_uring_queue_fuse_req |
+ | [queue request on fg or |
+ | bg queue] |
+ | >fuse_uring_assign_ring_entry() |
+ | >fuse_uring_send_to_ring() |
+ | >fuse_uring_copy_to_ring() |
+ | >io_uring_cmd_done() |
+ | >request_wait_answer() |
+ | [sleep on req->waitq] |
+ | | [receives and handles CQE]
+ | | [submit result and fetch next]
+ | | >io_uring_submit()
+ | | IORING_OP_URING_CMD/
+ | | FUSE_URING_REQ_COMMIT_AND_FETCH
+ | >fuse_uring_cmd() |
+ | >fuse_uring_commit_and_release() |
+ | >fuse_uring_copy_from_ring() |
+ | [ copy the result to the fuse req] |
+ | >fuse_uring_req_end_and_get_next() |
+ | >fuse_request_end() |
+ | [wake up req->waitq] |
+ | >fuse_uring_ent_release_and_fetch()|
+ | [wait or handle next req] |
+ | |
+ | |
+ | [req->waitq woken up] |
+ | <fuse_unlink() |
+ | <sys_unlink() |
+
+
+
--
2.43.0
* [PATCH v8 05/16] fuse: make args->in_args[0] to be always the header
2024-12-09 14:56 [PATCH v8 00/16] fuse: fuse-over-io-uring Bernd Schubert
` (3 preceding siblings ...)
2024-12-09 14:56 ` [PATCH v8 04/16] fuse: Add fuse-io-uring design documentation Bernd Schubert
@ 2024-12-09 14:56 ` Bernd Schubert
2024-12-09 14:56 ` [PATCH v8 06/16] fuse: {io-uring} Handle SQEs - register commands Bernd Schubert
` (10 subsequent siblings)
15 siblings, 0 replies; 22+ messages in thread
From: Bernd Schubert @ 2024-12-09 14:56 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
Bernd Schubert
This change sets up FUSE operations to always have headers in
args.in_args[0], even for opcodes without an actual header.
This step prepares for a clean separation of payload from headers,
initially it is used by fuse-over-io-uring.
For opcodes without a header, we use a zero-sized struct as a
placeholder. This approach:
- Keeps things consistent across all FUSE operations
- Will help with payload alignment later
- Avoids future issues when header sizes change
Op codes that already have an op code specific header do not
need modification.
Op codes that have neither payload nor op code headers
are not modified either (FUSE_READLINK and FUSE_DESTROY).
FUSE_BATCH_FORGET already has the header in the right place,
but does not use fuse_copy_args. As fuse-over-io-uring currently
does not handle forgets, this does not matter for now, but header
separation will later need special attention for that op code.
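The resulting argument layout can be sketched in plain userspace C as
follows. This is an illustration under stated assumptions, not the
kernel implementation: the sketch_* types and functions are simplified,
hypothetical stand-ins for struct fuse_args and fuse_set_zero_arg0(),
and the placeholder size is written as a literal 0 because the
zero-sized empty struct used by the patch relies on a GNU C extension.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified, hypothetical stand-ins for the kernel's argument structs */
struct sketch_in_arg {
	size_t size;
	const void *value;
};

struct sketch_args {
	unsigned int in_numargs;
	struct sketch_in_arg in_args[3];
};

/* Mirrors the idea of fuse_set_zero_arg0(): slot 0 is an empty header */
void sketch_set_zero_arg0(struct sketch_args *args)
{
	args->in_args[0].size = 0;
	args->in_args[0].value = NULL;
}

/* Name-only op code (e.g. unlink): empty header in slot 0, name in slot 1 */
void sketch_fill_name_args(struct sketch_args *args, const char *name)
{
	args->in_numargs = 2;
	sketch_set_zero_arg0(args);
	args->in_args[1].size = strlen(name) + 1; /* include the trailing NUL */
	args->in_args[1].value = name;
}

/* With the header always in slot 0, the payload is everything after it */
size_t sketch_payload_size(const struct sketch_args *args)
{
	size_t total = 0;
	unsigned int i;

	assert(args->in_numargs >= 1 && args->in_numargs <= 3);
	for (i = 1; i < args->in_numargs; i++)
		total += args->in_args[i].size;
	return total;
}
```

The point of the convention is visible in sketch_payload_size(): code
that separates headers from payload can treat slot 0 uniformly and does
not need per-opcode knowledge of whether a header exists.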
Signed-off-by: Bernd Schubert <[email protected]>
Reviewed-by: Joanne Koong <[email protected]>
---
fs/fuse/dax.c | 11 ++++++-----
fs/fuse/dev.c | 9 +++++----
fs/fuse/dir.c | 32 ++++++++++++++++++--------------
fs/fuse/fuse_i.h | 13 +++++++++++++
fs/fuse/xattr.c | 7 ++++---
5 files changed, 46 insertions(+), 26 deletions(-)
diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
index 9abbc2f2894f905099b48862d776083e6075fbba..0b6ee6dd1fd6569a12f1a44c24ca178163b0da81 100644
--- a/fs/fuse/dax.c
+++ b/fs/fuse/dax.c
@@ -240,11 +240,12 @@ static int fuse_send_removemapping(struct inode *inode,
args.opcode = FUSE_REMOVEMAPPING;
args.nodeid = fi->nodeid;
- args.in_numargs = 2;
- args.in_args[0].size = sizeof(*inargp);
- args.in_args[0].value = inargp;
- args.in_args[1].size = inargp->count * sizeof(*remove_one);
- args.in_args[1].value = remove_one;
+ args.in_numargs = 3;
+ fuse_set_zero_arg0(&args);
+ args.in_args[1].size = sizeof(*inargp);
+ args.in_args[1].value = inargp;
+ args.in_args[2].size = inargp->count * sizeof(*remove_one);
+ args.in_args[2].value = remove_one;
return fuse_simple_request(fm, &args);
}
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 4f8825de9e05b9ffd291ac5bff747a10a70df0b4..623c5a067c1841e8210b5b4e063e7b6690f1825a 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1746,7 +1746,7 @@ static int fuse_retrieve(struct fuse_mount *fm, struct inode *inode,
args = &ap->args;
args->nodeid = outarg->nodeid;
args->opcode = FUSE_NOTIFY_REPLY;
- args->in_numargs = 2;
+ args->in_numargs = 3;
args->in_pages = true;
args->end = fuse_retrieve_end;
@@ -1774,9 +1774,10 @@ static int fuse_retrieve(struct fuse_mount *fm, struct inode *inode,
}
ra->inarg.offset = outarg->offset;
ra->inarg.size = total_len;
- args->in_args[0].size = sizeof(ra->inarg);
- args->in_args[0].value = &ra->inarg;
- args->in_args[1].size = total_len;
+ fuse_set_zero_arg0(args);
+ args->in_args[1].size = sizeof(ra->inarg);
+ args->in_args[1].value = &ra->inarg;
+ args->in_args[2].size = total_len;
err = fuse_simple_notify_reply(fm, args, outarg->notify_unique);
if (err)
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 494ac372ace07ab4ea06c13a404ecc1d2ccb4f23..1c6126069ee7fcce522fbb7bcec21c9392982413 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -175,9 +175,10 @@ static void fuse_lookup_init(struct fuse_conn *fc, struct fuse_args *args,
memset(outarg, 0, sizeof(struct fuse_entry_out));
args->opcode = FUSE_LOOKUP;
args->nodeid = nodeid;
- args->in_numargs = 1;
- args->in_args[0].size = name->len + 1;
- args->in_args[0].value = name->name;
+ args->in_numargs = 2;
+ fuse_set_zero_arg0(args);
+ args->in_args[1].size = name->len + 1;
+ args->in_args[1].value = name->name;
args->out_numargs = 1;
args->out_args[0].size = sizeof(struct fuse_entry_out);
args->out_args[0].value = outarg;
@@ -928,11 +929,12 @@ static int fuse_symlink(struct mnt_idmap *idmap, struct inode *dir,
FUSE_ARGS(args);
args.opcode = FUSE_SYMLINK;
- args.in_numargs = 2;
- args.in_args[0].size = entry->d_name.len + 1;
- args.in_args[0].value = entry->d_name.name;
- args.in_args[1].size = len;
- args.in_args[1].value = link;
+ args.in_numargs = 3;
+ fuse_set_zero_arg0(&args);
+ args.in_args[1].size = entry->d_name.len + 1;
+ args.in_args[1].value = entry->d_name.name;
+ args.in_args[2].size = len;
+ args.in_args[2].value = link;
return create_new_entry(idmap, fm, &args, dir, entry, S_IFLNK);
}
@@ -992,9 +994,10 @@ static int fuse_unlink(struct inode *dir, struct dentry *entry)
args.opcode = FUSE_UNLINK;
args.nodeid = get_node_id(dir);
- args.in_numargs = 1;
- args.in_args[0].size = entry->d_name.len + 1;
- args.in_args[0].value = entry->d_name.name;
+ args.in_numargs = 2;
+ fuse_set_zero_arg0(&args);
+ args.in_args[1].size = entry->d_name.len + 1;
+ args.in_args[1].value = entry->d_name.name;
err = fuse_simple_request(fm, &args);
if (!err) {
fuse_dir_changed(dir);
@@ -1015,9 +1018,10 @@ static int fuse_rmdir(struct inode *dir, struct dentry *entry)
args.opcode = FUSE_RMDIR;
args.nodeid = get_node_id(dir);
- args.in_numargs = 1;
- args.in_args[0].size = entry->d_name.len + 1;
- args.in_args[0].value = entry->d_name.name;
+ args.in_numargs = 2;
+ fuse_set_zero_arg0(&args);
+ args.in_args[1].size = entry->d_name.len + 1;
+ args.in_args[1].value = entry->d_name.name;
err = fuse_simple_request(fm, &args);
if (!err) {
fuse_dir_changed(dir);
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 74744c6f286003251564d1235f4d2ca8654d661b..babddd05303796d689a64f0f5a890066b43170ac 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -947,6 +947,19 @@ struct fuse_mount {
struct rcu_head rcu;
};
+/*
+ * Empty header for FUSE opcodes without specific header needs.
+ * Used as a placeholder in args->in_args[0] for consistency
+ * across all FUSE operations, simplifying request handling.
+ */
+struct fuse_zero_header {};
+
+static inline void fuse_set_zero_arg0(struct fuse_args *args)
+{
+ args->in_args[0].size = sizeof(struct fuse_zero_header);
+ args->in_args[0].value = NULL;
+}
+
static inline struct fuse_mount *get_fuse_mount_super(struct super_block *sb)
{
return sb->s_fs_info;
diff --git a/fs/fuse/xattr.c b/fs/fuse/xattr.c
index 9f568d345c51236ddd421b162820a4ea9b0734f4..93dfb06b6cea045d6df90c61c900680968bda39f 100644
--- a/fs/fuse/xattr.c
+++ b/fs/fuse/xattr.c
@@ -164,9 +164,10 @@ int fuse_removexattr(struct inode *inode, const char *name)
args.opcode = FUSE_REMOVEXATTR;
args.nodeid = get_node_id(inode);
- args.in_numargs = 1;
- args.in_args[0].size = strlen(name) + 1;
- args.in_args[0].value = name;
+ args.in_numargs = 2;
+ fuse_set_zero_arg0(&args);
+ args.in_args[1].size = strlen(name) + 1;
+ args.in_args[1].value = name;
err = fuse_simple_request(fm, &args);
if (err == -ENOSYS) {
fm->fc->no_removexattr = 1;
--
2.43.0
* [PATCH v8 06/16] fuse: {io-uring} Handle SQEs - register commands
2024-12-09 14:56 [PATCH v8 00/16] fuse: fuse-over-io-uring Bernd Schubert
` (4 preceding siblings ...)
2024-12-09 14:56 ` [PATCH v8 05/16] fuse: make args->in_args[0] to be always the header Bernd Schubert
@ 2024-12-09 14:56 ` Bernd Schubert
2024-12-12 1:29 ` Joanne Koong
2024-12-09 14:56 ` [PATCH v8 07/16] fuse: Make fuse_copy non static Bernd Schubert
` (9 subsequent siblings)
15 siblings, 1 reply; 22+ messages in thread
From: Bernd Schubert @ 2024-12-09 14:56 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
Bernd Schubert
This adds basic support for ring SQEs (with opcode=IORING_OP_URING_CMD).
For now only FUSE_IO_URING_CMD_REGISTER is handled to register queue
entries.
Signed-off-by: Bernd Schubert <[email protected]>
---
fs/fuse/Kconfig | 12 ++
fs/fuse/Makefile | 1 +
fs/fuse/dev_uring.c | 339 ++++++++++++++++++++++++++++++++++++++++++++++
fs/fuse/dev_uring_i.h | 118 ++++++++++++++++
fs/fuse/fuse_i.h | 5 +
fs/fuse/inode.c | 10 ++
include/uapi/linux/fuse.h | 76 ++++++++++-
7 files changed, 560 insertions(+), 1 deletion(-)
diff --git a/fs/fuse/Kconfig b/fs/fuse/Kconfig
index 8674dbfbe59dbf79c304c587b08ebba3cfe405be..ca215a3cba3e310d1359d069202193acdcdb172b 100644
--- a/fs/fuse/Kconfig
+++ b/fs/fuse/Kconfig
@@ -63,3 +63,15 @@ config FUSE_PASSTHROUGH
to be performed directly on a backing file.
If you want to allow passthrough operations, answer Y.
+
+config FUSE_IO_URING
+ bool "FUSE communication over io-uring"
+ default y
+ depends on FUSE_FS
+ depends on IO_URING
+ help
+ This allows sending FUSE requests over the io-uring interface and
+ also adds request core affinity.
+
+ If you want to allow fuse server/client communication through io-uring,
+ answer Y
diff --git a/fs/fuse/Makefile b/fs/fuse/Makefile
index 2c372180d631eb340eca36f19ee2c2686de9714d..3f0f312a31c1cc200c0c91a086b30a8318e39d94 100644
--- a/fs/fuse/Makefile
+++ b/fs/fuse/Makefile
@@ -15,5 +15,6 @@ fuse-y += iomode.o
fuse-$(CONFIG_FUSE_DAX) += dax.o
fuse-$(CONFIG_FUSE_PASSTHROUGH) += passthrough.o
fuse-$(CONFIG_SYSCTL) += sysctl.o
+fuse-$(CONFIG_FUSE_IO_URING) += dev_uring.o
virtiofs-y := virtio_fs.o
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
new file mode 100644
index 0000000000000000000000000000000000000000..f0c5807c94a55f9c9e2aa95ad078724971ddd125
--- /dev/null
+++ b/fs/fuse/dev_uring.c
@@ -0,0 +1,339 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * FUSE: Filesystem in Userspace
+ * Copyright (c) 2023-2024 DataDirect Networks.
+ */
+
+#include "fuse_i.h"
+#include "dev_uring_i.h"
+#include "fuse_dev_i.h"
+
+#include <linux/fs.h>
+#include <linux/io_uring/cmd.h>
+
+#ifdef CONFIG_FUSE_IO_URING
+static bool __read_mostly enable_uring;
+module_param(enable_uring, bool, 0644);
+MODULE_PARM_DESC(enable_uring,
+ "Enable userspace communication through io-uring");
+#endif
+
+#define FUSE_URING_IOV_SEGS 2 /* header and payload */
+
+
+bool fuse_uring_enabled(void)
+{
+ return enable_uring;
+}
+
+static int fuse_ring_ent_unset_userspace(struct fuse_ring_ent *ent)
+{
+ struct fuse_ring_queue *queue = ent->queue;
+
+ lockdep_assert_held(&queue->lock);
+
+ if (WARN_ON_ONCE(ent->state != FRRS_USERSPACE))
+ return -EIO;
+
+ ent->state = FRRS_COMMIT;
+ list_move(&ent->list, &queue->ent_commit_queue);
+
+ return 0;
+}
+
+void fuse_uring_destruct(struct fuse_conn *fc)
+{
+ struct fuse_ring *ring = fc->ring;
+ int qid;
+
+ if (!ring)
+ return;
+
+ for (qid = 0; qid < ring->nr_queues; qid++) {
+ struct fuse_ring_queue *queue = ring->queues[qid];
+
+ if (!queue)
+ continue;
+
+ WARN_ON(!list_empty(&queue->ent_avail_queue));
+ WARN_ON(!list_empty(&queue->ent_commit_queue));
+
+ kfree(queue);
+ ring->queues[qid] = NULL;
+ }
+
+ kfree(ring->queues);
+ kfree(ring);
+ fc->ring = NULL;
+}
+
+/*
+ * Basic ring setup for this connection based on the provided configuration
+ */
+static struct fuse_ring *fuse_uring_create(struct fuse_conn *fc)
+{
+ struct fuse_ring *ring = NULL;
+ size_t nr_queues = num_possible_cpus();
+ struct fuse_ring *res = NULL;
+ size_t max_payload_size;
+
+ ring = kzalloc(sizeof(*fc->ring), GFP_KERNEL_ACCOUNT);
+ if (!ring)
+ return NULL;
+
+ ring->queues = kcalloc(nr_queues, sizeof(struct fuse_ring_queue *),
+ GFP_KERNEL_ACCOUNT);
+ if (!ring->queues)
+ goto out_err;
+
+ max_payload_size = max_t(size_t, FUSE_MIN_READ_BUFFER, fc->max_write);
+ max_payload_size =
+ max_t(size_t, max_payload_size, fc->max_pages * PAGE_SIZE);
+
+ spin_lock(&fc->lock);
+ if (fc->ring) {
+ /* race, another thread created the ring in the meantime */
+ spin_unlock(&fc->lock);
+ res = fc->ring;
+ goto out_err;
+ }
+
+ fc->ring = ring;
+ ring->nr_queues = nr_queues;
+ ring->fc = fc;
+ ring->max_payload_sz = max_payload_size;
+
+ spin_unlock(&fc->lock);
+ return ring;
+
+out_err:
+ kfree(ring->queues);
+ kfree(ring);
+ return res;
+}
+
+static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
+ int qid)
+{
+ struct fuse_conn *fc = ring->fc;
+ struct fuse_ring_queue *queue;
+
+ queue = kzalloc(sizeof(*queue), GFP_KERNEL_ACCOUNT);
+ if (!queue)
+ return ERR_PTR(-ENOMEM);
+ queue->qid = qid;
+ queue->ring = ring;
+ spin_lock_init(&queue->lock);
+
+ INIT_LIST_HEAD(&queue->ent_avail_queue);
+ INIT_LIST_HEAD(&queue->ent_commit_queue);
+
+ spin_lock(&fc->lock);
+ if (ring->queues[qid]) {
+ spin_unlock(&fc->lock);
+ kfree(queue);
+ return ring->queues[qid];
+ }
+
+ WRITE_ONCE(ring->queues[qid], queue);
+ spin_unlock(&fc->lock);
+
+ return queue;
+}
+
+/*
+ * Make a ring entry available for fuse_req assignment
+ */
+static void fuse_uring_ent_avail(struct fuse_ring_ent *ring_ent,
+ struct fuse_ring_queue *queue)
+{
+ list_move(&ring_ent->list, &queue->ent_avail_queue);
+ ring_ent->state = FRRS_WAIT;
+}
+
+/*
+ * fuse_uring_req_fetch command handling
+ */
+static void _fuse_uring_register(struct fuse_ring_ent *ring_ent,
+ struct io_uring_cmd *cmd,
+ unsigned int issue_flags)
+{
+ struct fuse_ring_queue *queue = ring_ent->queue;
+
+ spin_lock(&queue->lock);
+ fuse_uring_ent_avail(ring_ent, queue);
+ spin_unlock(&queue->lock);
+}
+
+/*
+ * sqe->addr is a ptr to an iovec array, iov[0] has the headers, iov[1]
+ * the payload
+ */
+static int fuse_uring_get_iovec_from_sqe(const struct io_uring_sqe *sqe,
+ struct iovec iov[FUSE_URING_IOV_SEGS])
+{
+ struct iovec __user *uiov = u64_to_user_ptr(READ_ONCE(sqe->addr));
+ struct iov_iter iter;
+ ssize_t ret;
+
+ if (sqe->len != FUSE_URING_IOV_SEGS)
+ return -EINVAL;
+
+ /*
+ * Direction for buffer access will actually be READ and WRITE;
+ * using WRITE for the import should include READ access as well.
+ */
+ ret = import_iovec(WRITE, uiov, FUSE_URING_IOV_SEGS,
+ FUSE_URING_IOV_SEGS, &iov, &iter);
+ if (ret < 0)
+ return ret;
+
+ return 0;
+}
+
+/* Register header and payload buffer with the kernel and fetch a request */
+static int fuse_uring_register(struct io_uring_cmd *cmd,
+ unsigned int issue_flags, struct fuse_conn *fc)
+{
+ const struct fuse_uring_cmd_req *cmd_req = io_uring_sqe_cmd(cmd->sqe);
+ struct fuse_ring *ring = fc->ring;
+ struct fuse_ring_queue *queue;
+ struct fuse_ring_ent *ring_ent;
+ int err;
+ struct iovec iov[FUSE_URING_IOV_SEGS];
+ size_t payload_size;
+ unsigned int qid = READ_ONCE(cmd_req->qid);
+
+ err = fuse_uring_get_iovec_from_sqe(cmd->sqe, iov);
+ if (err) {
+ pr_info_ratelimited("Failed to get iovec from sqe, err=%d\n",
+ err);
+ return err;
+ }
+
+ err = -ENOMEM;
+ if (!ring) {
+ ring = fuse_uring_create(fc);
+ if (!ring)
+ return err;
+ }
+
+ if (qid >= ring->nr_queues) {
+ pr_info_ratelimited("fuse: Invalid ring qid %u\n", qid);
+ return -EINVAL;
+ }
+
+ err = -ENOMEM;
+ queue = ring->queues[qid];
+ if (!queue) {
+ queue = fuse_uring_create_queue(ring, qid);
+ if (IS_ERR(queue))
+ return PTR_ERR(queue);
+ }
+
+ /*
+ * The queue created above does not need to be destructed in
+ * case of entry errors below; that is done at ring destruction time.
+ */
+
+ ring_ent = kzalloc(sizeof(*ring_ent), GFP_KERNEL_ACCOUNT);
+ if (!ring_ent)
+ return err;
+
+ INIT_LIST_HEAD(&ring_ent->list);
+
+ ring_ent->queue = queue;
+ ring_ent->cmd = cmd;
+
+ err = -EINVAL;
+ if (iov[0].iov_len < sizeof(struct fuse_uring_req_header)) {
+ pr_info_ratelimited("Invalid header len %zu\n", iov[0].iov_len);
+ goto err;
+ }
+
+ ring_ent->headers = iov[0].iov_base;
+ ring_ent->payload = iov[1].iov_base;
+ payload_size = iov[1].iov_len;
+
+ if (payload_size < ring->max_payload_sz) {
+ pr_info_ratelimited("Invalid req payload len %zu\n",
+ payload_size);
+ goto err;
+ }
+
+ spin_lock(&queue->lock);
+
+ /*
+ * FUSE_IO_URING_CMD_REGISTER is an initialization exception, needs
+ * state override
+ */
+ ring_ent->state = FRRS_USERSPACE;
+ err = fuse_ring_ent_unset_userspace(ring_ent);
+ spin_unlock(&queue->lock);
+ if (WARN_ON_ONCE(err))
+ goto err;
+
+ _fuse_uring_register(ring_ent, cmd, issue_flags);
+
+ return 0;
+err:
+ list_del_init(&ring_ent->list);
+ kfree(ring_ent);
+ return err;
+}
+
+/*
+ * Entry function from io_uring to handle the given passthrough command
+ * (op code IORING_OP_URING_CMD)
+ */
+int __maybe_unused fuse_uring_cmd(struct io_uring_cmd *cmd,
+ unsigned int issue_flags)
+{
+ struct fuse_dev *fud;
+ struct fuse_conn *fc;
+ u32 cmd_op = cmd->cmd_op;
+ int err;
+
+ if (!enable_uring) {
+ pr_info_ratelimited("fuse-io-uring is disabled\n");
+ return -EOPNOTSUPP;
+ }
+
+ /* This extra SQE size holds struct fuse_uring_cmd_req */
+ if (!(issue_flags & IO_URING_F_SQE128))
+ return -EINVAL;
+
+ fud = fuse_get_dev(cmd->file);
+ if (!fud) {
+ pr_info_ratelimited("No fuse device found\n");
+ return -ENOTCONN;
+ }
+ fc = fud->fc;
+
+ if (fc->aborted)
+ return -ECONNABORTED;
+ if (!fc->connected)
+ return -ENOTCONN;
+
+ /*
+ * fuse_uring_register() needs the ring to be initialized,
+ * we need to know the max payload size
+ */
+ if (!fc->initialized)
+ return -EAGAIN;
+
+ switch (cmd_op) {
+ case FUSE_IO_URING_CMD_REGISTER:
+ err = fuse_uring_register(cmd, issue_flags, fc);
+ if (err) {
+ pr_info_once("FUSE_IO_URING_CMD_REGISTER failed err=%d\n",
+ err);
+ return err;
+ }
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ return -EIOCBQUEUED;
+}
diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
new file mode 100644
index 0000000000000000000000000000000000000000..73e9e3063bb038e8341d85cd2a440421275e6aa8
--- /dev/null
+++ b/fs/fuse/dev_uring_i.h
@@ -0,0 +1,118 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * FUSE: Filesystem in Userspace
+ * Copyright (c) 2023-2024 DataDirect Networks.
+ */
+
+#ifndef _FS_FUSE_DEV_URING_I_H
+#define _FS_FUSE_DEV_URING_I_H
+
+#include "fuse_i.h"
+
+#ifdef CONFIG_FUSE_IO_URING
+
+enum fuse_ring_req_state {
+ FRRS_INVALID = 0,
+
+ /* The ring entry was received from userspace and is being processed */
+ FRRS_COMMIT,
+
+ /* The ring entry is waiting for new fuse requests */
+ FRRS_WAIT,
+
+ /* The ring entry is in or on the way to user space */
+ FRRS_USERSPACE,
+};
+
+/** A fuse ring entry, part of the ring queue */
+struct fuse_ring_ent {
+ /* userspace buffer */
+ struct fuse_uring_req_header __user *headers;
+ void __user *payload;
+
+ /* the ring queue that owns the request */
+ struct fuse_ring_queue *queue;
+
+ struct io_uring_cmd *cmd;
+
+ struct list_head list;
+
+ /*
+ * state the request is currently in
+ * (enum fuse_ring_req_state)
+ */
+ unsigned int state;
+
+ struct fuse_req *fuse_req;
+
+ /* commit id to identify the server reply */
+ uint64_t commit_id;
+};
+
+struct fuse_ring_queue {
+ /*
+ * back pointer to the main fuse uring structure that holds this
+ * queue
+ */
+ struct fuse_ring *ring;
+
+ /* queue id, typically also corresponds to the cpu core */
+ unsigned int qid;
+
+ /*
+ * queue lock, taken when any value in the queue changes _and_ also
+ * when a ring entry state changes.
+ */
+ spinlock_t lock;
+
+ /* available ring entries (struct fuse_ring_ent) */
+ struct list_head ent_avail_queue;
+
+ /*
+ * entries in the process of being committed or in the process
+ * of being sent to userspace
+ */
+ struct list_head ent_commit_queue;
+};
+
+/**
+ * Describes if uring is used for communication and holds all the data needed
+ * for uring communication
+ */
+struct fuse_ring {
+ /* back pointer */
+ struct fuse_conn *fc;
+
+ /* number of ring queues */
+ size_t nr_queues;
+
+ /* maximum payload/arg size */
+ size_t max_payload_sz;
+
+ struct fuse_ring_queue **queues;
+};
+
+bool fuse_uring_enabled(void);
+void fuse_uring_destruct(struct fuse_conn *fc);
+int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags);
+
+#else /* CONFIG_FUSE_IO_URING */
+
+struct fuse_ring;
+
+static inline void fuse_uring_create(struct fuse_conn *fc)
+{
+}
+
+static inline void fuse_uring_destruct(struct fuse_conn *fc)
+{
+}
+
+static inline bool fuse_uring_enabled(void)
+{
+ return false;
+}
+
+#endif /* CONFIG_FUSE_IO_URING */
+
+#endif /* _FS_FUSE_DEV_URING_I_H */
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index babddd05303796d689a64f0f5a890066b43170ac..d75dd9b59a5c35b76919db760645464f604517f5 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -923,6 +923,11 @@ struct fuse_conn {
/** IDR for backing files ids */
struct idr backing_files_map;
#endif
+
+#ifdef CONFIG_FUSE_IO_URING
+ /** uring connection information */
+ struct fuse_ring *ring;
+#endif
};
/*
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 3ce4f4e81d09e867c3a7db7b1dbb819f88ed34ef..e4f9bbacfc1bc6f51d5d01b4c47b42cc159ed783 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -7,6 +7,7 @@
*/
#include "fuse_i.h"
+#include "dev_uring_i.h"
#include <linux/pagemap.h>
#include <linux/slab.h>
@@ -992,6 +993,8 @@ static void delayed_release(struct rcu_head *p)
{
struct fuse_conn *fc = container_of(p, struct fuse_conn, rcu);
+ fuse_uring_destruct(fc);
+
put_user_ns(fc->user_ns);
fc->release(fc);
}
@@ -1446,6 +1449,13 @@ void fuse_send_init(struct fuse_mount *fm)
if (IS_ENABLED(CONFIG_FUSE_PASSTHROUGH))
flags |= FUSE_PASSTHROUGH;
+ /*
+ * This is just an information flag for fuse server. No need to check
+ * the reply - server is either sending IORING_OP_URING_CMD or not.
+ */
+ if (fuse_uring_enabled())
+ flags |= FUSE_OVER_IO_URING;
+
ia->in.flags = flags;
ia->in.flags2 = flags >> 32;
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index f1e99458e29e4fdce5273bc3def242342f207ebd..388cb4b93f48575d5e57c27b02f59a80e2fbe93c 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -220,6 +220,15 @@
*
* 7.41
* - add FUSE_ALLOW_IDMAP
+ * 7.42
+ * - Add FUSE_OVER_IO_URING and all other io-uring related flags and data
+ * structures:
+ * - struct fuse_uring_ent_in_out
+ * - struct fuse_uring_req_header
+ * - struct fuse_uring_cmd_req
+ * - FUSE_URING_IN_OUT_HEADER_SZ
+ * - FUSE_URING_OP_IN_OUT_SZ
+ * - enum fuse_uring_cmd
*/
#ifndef _LINUX_FUSE_H
@@ -255,7 +264,7 @@
#define FUSE_KERNEL_VERSION 7
/** Minor version number of this interface */
-#define FUSE_KERNEL_MINOR_VERSION 41
+#define FUSE_KERNEL_MINOR_VERSION 42
/** The node ID of the root inode */
#define FUSE_ROOT_ID 1
@@ -425,6 +434,7 @@ struct fuse_file_lock {
* FUSE_HAS_RESEND: kernel supports resending pending requests, and the high bit
* of the request ID indicates resend requests
* FUSE_ALLOW_IDMAP: allow creation of idmapped mounts
+ * FUSE_OVER_IO_URING: Indicate that Client supports io-uring
*/
#define FUSE_ASYNC_READ (1 << 0)
#define FUSE_POSIX_LOCKS (1 << 1)
@@ -471,6 +481,7 @@ struct fuse_file_lock {
/* Obsolete alias for FUSE_DIRECT_IO_ALLOW_MMAP */
#define FUSE_DIRECT_IO_RELAX FUSE_DIRECT_IO_ALLOW_MMAP
#define FUSE_ALLOW_IDMAP (1ULL << 40)
+#define FUSE_OVER_IO_URING (1ULL << 41)
/**
* CUSE INIT request/reply flags
@@ -1206,4 +1217,67 @@ struct fuse_supp_groups {
uint32_t groups[];
};
+/**
+ * Size of the ring buffer header
+ */
+#define FUSE_URING_IN_OUT_HEADER_SZ 128
+#define FUSE_URING_OP_IN_OUT_SZ 128
+
+struct fuse_uring_ent_in_out {
+ uint64_t flags;
+
+ /*
+ * commit ID to be used in a reply to a ring request (see also
+ * struct fuse_uring_cmd_req)
+ */
+ uint64_t commit_id;
+
+ /* size of user payload buffer */
+ uint32_t payload_sz;
+ uint32_t padding;
+
+ uint64_t reserved;
+};
+
+/**
+ * Header for all fuse-io-uring requests
+ */
+struct fuse_uring_req_header {
+ /* struct fuse_in_header / struct fuse_out_header */
+ char in_out[FUSE_URING_IN_OUT_HEADER_SZ];
+
+ /* per op code structs */
+ char op_in[FUSE_URING_OP_IN_OUT_SZ];
+
+ /* struct fuse_uring_ent_in_out */
+ char ring_ent_in_out[sizeof(struct fuse_uring_ent_in_out)];
+};
+
+/**
+ * sqe commands to the kernel
+ */
+enum fuse_uring_cmd {
+ FUSE_IO_URING_CMD_INVALID = 0,
+
+ /* register the request buffer and fetch a fuse request */
+ FUSE_IO_URING_CMD_REGISTER = 1,
+
+ /* commit fuse request result and fetch next request */
+ FUSE_IO_URING_CMD_COMMIT_AND_FETCH = 2,
+};
+
+/**
+ * In the 80B command area of the SQE.
+ */
+struct fuse_uring_cmd_req {
+ uint64_t flags;
+
+ /* entry identifier for commits */
+ uint64_t commit_id;
+
+ /* queue the command is for (queue index) */
+ uint16_t qid;
+ uint8_t padding[6];
+};
+
#endif /* _LINUX_FUSE_H */
--
2.43.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
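Editorial note on patch 06/16: the register path checks the two user buffers against the header struct size and against the ring's maximum payload size. The following is a minimal userspace sketch of those size rules, assuming the struct layouts shown in the include/uapi/linux/fuse.h hunk above and the existing uapi value FUSE_MIN_READ_BUFFER = 8192. The sketch_* helpers are hypothetical names used only for illustration; they are not part of the patch or of libfuse.

```c
#include <stddef.h>
#include <stdint.h>

/* Constants and structs copied from the uapi hunk of this patch. */
#define FUSE_MIN_READ_BUFFER 8192 /* existing uapi constant */
#define FUSE_URING_IN_OUT_HEADER_SZ 128
#define FUSE_URING_OP_IN_OUT_SZ 128

struct fuse_uring_ent_in_out {
	uint64_t flags;
	uint64_t commit_id;
	uint32_t payload_sz;
	uint32_t padding;
	uint64_t reserved;
};

struct fuse_uring_req_header {
	char in_out[FUSE_URING_IN_OUT_HEADER_SZ];
	char op_in[FUSE_URING_OP_IN_OUT_SZ];
	char ring_ent_in_out[sizeof(struct fuse_uring_ent_in_out)];
};

struct fuse_uring_cmd_req {
	uint64_t flags;
	uint64_t commit_id;
	uint16_t qid;
	uint8_t padding[6];
};

/*
 * Hypothetical helper: layout expectations derived from the field lists.
 * The command struct has to fit into the 80 byte SQE128 command area.
 */
static int sketch_layout_ok(void)
{
	return sizeof(struct fuse_uring_ent_in_out) == 32 &&
	       sizeof(struct fuse_uring_req_header) == 128 + 128 + 32 &&
	       sizeof(struct fuse_uring_cmd_req) <= 80;
}

/*
 * Hypothetical helper mirroring the max_payload_size computation in
 * fuse_uring_create(): the largest of FUSE_MIN_READ_BUFFER, max_write
 * and max_pages * PAGE_SIZE.
 */
static size_t sketch_max_payload(size_t max_write, size_t max_pages,
				 size_t page_size)
{
	size_t sz = FUSE_MIN_READ_BUFFER;

	if (max_write > sz)
		sz = max_write;
	if (max_pages * page_size > sz)
		sz = max_pages * page_size;
	return sz;
}

/*
 * Hypothetical helper mirroring the two length checks in
 * fuse_uring_register(): iov[0] must hold the full header,
 * iov[1] must hold at least the maximum payload.
 */
static int sketch_buffers_ok(size_t header_len, size_t payload_len,
			     size_t max_payload_sz)
{
	return header_len >= sizeof(struct fuse_uring_req_header) &&
	       payload_len >= max_payload_sz;
}
```

A server could call sketch_max_payload() with the max_write and max_pages values it negotiated in FUSE_INIT to size iov[1] before submitting FUSE_IO_URING_CMD_REGISTER. The actual buffer handling in libfuse may differ.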
* [PATCH v8 07/16] fuse: Make fuse_copy non static
2024-12-09 14:56 [PATCH v8 00/16] fuse: fuse-over-io-uring Bernd Schubert
` (5 preceding siblings ...)
2024-12-09 14:56 ` [PATCH v8 06/16] fuse: {io-uring} Handle SQEs - register commands Bernd Schubert
@ 2024-12-09 14:56 ` Bernd Schubert
2024-12-13 0:50 ` Joanne Koong
2024-12-09 14:56 ` [PATCH v8 08/16] fuse: Add fuse-io-uring handling into fuse_copy Bernd Schubert
` (8 subsequent siblings)
15 siblings, 1 reply; 22+ messages in thread
From: Bernd Schubert @ 2024-12-09 14:56 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
Bernd Schubert
Move 'struct fuse_copy_state' and fuse_copy_* functions
to fuse_dev_i.h to make them available for fuse-io-uring.
'copy_out_args()' is renamed to 'fuse_copy_out_args'.
Signed-off-by: Bernd Schubert <[email protected]>
---
fs/fuse/dev.c | 30 ++++++++----------------------
fs/fuse/fuse_dev_i.h | 25 +++++++++++++++++++++++++
2 files changed, 33 insertions(+), 22 deletions(-)
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 623c5a067c1841e8210b5b4e063e7b6690f1825a..6ee7e28a84c80a3e7c8dc933986c0388371ff6cd 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -678,22 +678,8 @@ static int unlock_request(struct fuse_req *req)
return err;
}
-struct fuse_copy_state {
- int write;
- struct fuse_req *req;
- struct iov_iter *iter;
- struct pipe_buffer *pipebufs;
- struct pipe_buffer *currbuf;
- struct pipe_inode_info *pipe;
- unsigned long nr_segs;
- struct page *pg;
- unsigned len;
- unsigned offset;
- unsigned move_pages:1;
-};
-
-static void fuse_copy_init(struct fuse_copy_state *cs, int write,
- struct iov_iter *iter)
+void fuse_copy_init(struct fuse_copy_state *cs, int write,
+ struct iov_iter *iter)
{
memset(cs, 0, sizeof(*cs));
cs->write = write;
@@ -1054,9 +1040,9 @@ static int fuse_copy_one(struct fuse_copy_state *cs, void *val, unsigned size)
}
/* Copy request arguments to/from userspace buffer */
-static int fuse_copy_args(struct fuse_copy_state *cs, unsigned numargs,
- unsigned argpages, struct fuse_arg *args,
- int zeroing)
+int fuse_copy_args(struct fuse_copy_state *cs, unsigned numargs,
+ unsigned argpages, struct fuse_arg *args,
+ int zeroing)
{
int err = 0;
unsigned i;
@@ -1933,8 +1919,8 @@ static struct fuse_req *request_find(struct fuse_pqueue *fpq, u64 unique)
return NULL;
}
-static int copy_out_args(struct fuse_copy_state *cs, struct fuse_args *args,
- unsigned nbytes)
+int fuse_copy_out_args(struct fuse_copy_state *cs, struct fuse_args *args,
+ unsigned nbytes)
{
unsigned reqsize = sizeof(struct fuse_out_header);
@@ -2036,7 +2022,7 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud,
if (oh.error)
err = nbytes != sizeof(oh) ? -EINVAL : 0;
else
- err = copy_out_args(cs, req->args, nbytes);
+ err = fuse_copy_out_args(cs, req->args, nbytes);
fuse_copy_finish(cs);
spin_lock(&fpq->lock);
diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
index 08a7e88e002773fcd18c25a229c7aa6450831401..21eb1bdb492d04f0a406d25bb8d300b34244dce2 100644
--- a/fs/fuse/fuse_dev_i.h
+++ b/fs/fuse/fuse_dev_i.h
@@ -12,6 +12,23 @@
#define FUSE_INT_REQ_BIT (1ULL << 0)
#define FUSE_REQ_ID_STEP (1ULL << 1)
+struct fuse_arg;
+struct fuse_args;
+
+struct fuse_copy_state {
+ int write;
+ struct fuse_req *req;
+ struct iov_iter *iter;
+ struct pipe_buffer *pipebufs;
+ struct pipe_buffer *currbuf;
+ struct pipe_inode_info *pipe;
+ unsigned long nr_segs;
+ struct page *pg;
+ unsigned int len;
+ unsigned int offset;
+ unsigned int move_pages:1;
+};
+
static inline struct fuse_dev *fuse_get_dev(struct file *file)
{
/*
@@ -23,5 +40,13 @@ static inline struct fuse_dev *fuse_get_dev(struct file *file)
void fuse_dev_end_requests(struct list_head *head);
+void fuse_copy_init(struct fuse_copy_state *cs, int write,
+ struct iov_iter *iter);
+int fuse_copy_args(struct fuse_copy_state *cs, unsigned int numargs,
+ unsigned int argpages, struct fuse_arg *args,
+ int zeroing);
+int fuse_copy_out_args(struct fuse_copy_state *cs, struct fuse_args *args,
+ unsigned int nbytes);
+
#endif
--
2.43.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH v8 08/16] fuse: Add fuse-io-uring handling into fuse_copy
2024-12-09 14:56 [PATCH v8 00/16] fuse: fuse-over-io-uring Bernd Schubert
` (6 preceding siblings ...)
2024-12-09 14:56 ` [PATCH v8 07/16] fuse: Make fuse_copy non static Bernd Schubert
@ 2024-12-09 14:56 ` Bernd Schubert
2024-12-13 1:25 ` Joanne Koong
2024-12-09 14:56 ` [PATCH v8 09/16] fuse: {io-uring} Make hash-list req unique finding functions non-static Bernd Schubert
` (7 subsequent siblings)
15 siblings, 1 reply; 22+ messages in thread
From: Bernd Schubert @ 2024-12-09 14:56 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
Bernd Schubert
Add special fuse-io-uring handling to the fuse argument
copy handler.
Signed-off-by: Bernd Schubert <[email protected]>
---
fs/fuse/dev.c | 12 +++++++++++-
fs/fuse/fuse_dev_i.h | 5 +++++
2 files changed, 16 insertions(+), 1 deletion(-)
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 6ee7e28a84c80a3e7c8dc933986c0388371ff6cd..2ba153054f7ba61a870c847cb87d81168220661f 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -786,6 +786,9 @@ static int fuse_copy_do(struct fuse_copy_state *cs, void **val, unsigned *size)
*size -= ncpy;
cs->len -= ncpy;
cs->offset += ncpy;
+ if (cs->is_uring)
+ cs->ring.offset += ncpy;
+
return ncpy;
}
@@ -1922,7 +1925,14 @@ static struct fuse_req *request_find(struct fuse_pqueue *fpq, u64 unique)
int fuse_copy_out_args(struct fuse_copy_state *cs, struct fuse_args *args,
unsigned nbytes)
{
- unsigned reqsize = sizeof(struct fuse_out_header);
+
+ unsigned int reqsize = 0;
+
+ /*
+ * Uring has all headers separated from args - args is payload only
+ */
+ if (!cs->is_uring)
+ reqsize = sizeof(struct fuse_out_header);
reqsize += fuse_len_args(args->out_numargs, args->out_args);
diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
index 21eb1bdb492d04f0a406d25bb8d300b34244dce2..0708730b656b97071de9a5331ef4d51a112c602c 100644
--- a/fs/fuse/fuse_dev_i.h
+++ b/fs/fuse/fuse_dev_i.h
@@ -27,6 +27,11 @@ struct fuse_copy_state {
unsigned int len;
unsigned int offset;
unsigned int move_pages:1;
+ unsigned int is_uring:1;
+ struct {
+ /* overall offset within the user buffer */
+ unsigned int offset;
+ } ring;
};
static inline struct fuse_dev *fuse_get_dev(struct file *file)
--
2.43.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
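Editorial note on patch 08/16: the change to fuse_copy_out_args() means the expected reply size no longer includes struct fuse_out_header when the reply arrives through the ring, because headers travel in a separate buffer. A minimal model of that size rule is sketched below. It assumes the usual 16 byte layout of struct fuse_out_header; the sketch_* names are hypothetical and args_len stands for the value fuse_len_args() would return.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed to match struct fuse_out_header in include/uapi/linux/fuse.h. */
struct sketch_out_header {
	uint32_t len;
	int32_t error;
	uint64_t unique;
};

/*
 * Hypothetical model of the reqsize computation in fuse_copy_out_args():
 * - legacy /dev/fuse write: out header plus the sum of all out args
 * - fuse-io-uring: out args only, the payload buffer carries no header
 */
static size_t sketch_expected_reply_size(int is_uring, size_t args_len)
{
	size_t reqsize = 0;

	if (!is_uring)
		reqsize = sizeof(struct sketch_out_header);

	reqsize += args_len;
	return reqsize;
}
```

For a reply with 116 bytes of out args, the legacy path would expect 132 bytes in total while the ring path would expect 116, which is what payload_sz in struct fuse_uring_ent_in_out is compared against.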
* [PATCH v8 09/16] fuse: {io-uring} Make hash-list req unique finding functions non-static
2024-12-09 14:56 [PATCH v8 00/16] fuse: fuse-over-io-uring Bernd Schubert
` (7 preceding siblings ...)
2024-12-09 14:56 ` [PATCH v8 08/16] fuse: Add fuse-io-uring handling into fuse_copy Bernd Schubert
@ 2024-12-09 14:56 ` Bernd Schubert
2024-12-13 1:41 ` Joanne Koong
2024-12-09 14:56 ` [PATCH v8 10/16] fuse: Add io-uring sqe commit and fetch support Bernd Schubert
` (6 subsequent siblings)
15 siblings, 1 reply; 22+ messages in thread
From: Bernd Schubert @ 2024-12-09 14:56 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
Bernd Schubert
fuse-over-io-uring uses existing functions to find requests based
on their unique id - make these functions non-static.
Signed-off-by: Bernd Schubert <[email protected]>
---
fs/fuse/dev.c | 6 +++---
fs/fuse/fuse_dev_i.h | 6 ++++++
fs/fuse/fuse_i.h | 5 +++++
fs/fuse/inode.c | 2 +-
4 files changed, 15 insertions(+), 4 deletions(-)
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 2ba153054f7ba61a870c847cb87d81168220661f..a45d92431769d4aadaf5c5792086abc5dda3c048 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -220,7 +220,7 @@ u64 fuse_get_unique(struct fuse_iqueue *fiq)
}
EXPORT_SYMBOL_GPL(fuse_get_unique);
-static unsigned int fuse_req_hash(u64 unique)
+unsigned int fuse_req_hash(u64 unique)
{
return hash_long(unique & ~FUSE_INT_REQ_BIT, FUSE_PQ_HASH_BITS);
}
@@ -1910,7 +1910,7 @@ static int fuse_notify(struct fuse_conn *fc, enum fuse_notify_code code,
}
/* Look up request on processing list by unique ID */
-static struct fuse_req *request_find(struct fuse_pqueue *fpq, u64 unique)
+struct fuse_req *fuse_request_find(struct fuse_pqueue *fpq, u64 unique)
{
unsigned int hash = fuse_req_hash(unique);
struct fuse_req *req;
@@ -1994,7 +1994,7 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud,
spin_lock(&fpq->lock);
req = NULL;
if (fpq->connected)
- req = request_find(fpq, oh.unique & ~FUSE_INT_REQ_BIT);
+ req = fuse_request_find(fpq, oh.unique & ~FUSE_INT_REQ_BIT);
err = -ENOENT;
if (!req) {
diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
index 0708730b656b97071de9a5331ef4d51a112c602c..d7bf72dabd84c3896d1447380649e2f4d20b0643 100644
--- a/fs/fuse/fuse_dev_i.h
+++ b/fs/fuse/fuse_dev_i.h
@@ -7,6 +7,7 @@
#define _FS_FUSE_DEV_I_H
#include <linux/types.h>
+#include <linux/fs.h>
/* Ordinary requests have even IDs, while interrupts IDs are odd */
#define FUSE_INT_REQ_BIT (1ULL << 0)
@@ -14,6 +15,8 @@
struct fuse_arg;
struct fuse_args;
+struct fuse_pqueue;
+struct fuse_req;
struct fuse_copy_state {
int write;
@@ -43,6 +46,9 @@ static inline struct fuse_dev *fuse_get_dev(struct file *file)
return READ_ONCE(file->private_data);
}
+unsigned int fuse_req_hash(u64 unique);
+struct fuse_req *fuse_request_find(struct fuse_pqueue *fpq, u64 unique);
+
void fuse_dev_end_requests(struct list_head *head);
void fuse_copy_init(struct fuse_copy_state *cs, int write,
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index d75dd9b59a5c35b76919db760645464f604517f5..e545b0864dd51e82df61cc39bdf65d3d36a418dc 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1237,6 +1237,11 @@ void fuse_change_entry_timeout(struct dentry *entry, struct fuse_entry_out *o);
*/
struct fuse_conn *fuse_conn_get(struct fuse_conn *fc);
+/**
+ * Initialize the fuse processing queue
+ */
+void fuse_pqueue_init(struct fuse_pqueue *fpq);
+
/**
* Initialize fuse_conn
*/
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index e4f9bbacfc1bc6f51d5d01b4c47b42cc159ed783..328797b9aac9a816a4ad2c69b6880dc6ef6222b0 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -938,7 +938,7 @@ static void fuse_iqueue_init(struct fuse_iqueue *fiq,
fiq->priv = priv;
}
-static void fuse_pqueue_init(struct fuse_pqueue *fpq)
+void fuse_pqueue_init(struct fuse_pqueue *fpq)
{
unsigned int i;
--
2.43.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
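Editorial note on patch 09/16: fuse_req_hash() and fuse_request_find() both mask out FUSE_INT_REQ_BIT, so an interrupt ID resolves to the same hash bucket as the request it refers to. The sketch below illustrates that masking in plain C. The two macros are taken from fuse_dev_i.h as shown in the diff; SKETCH_PQ_HASH_BITS and the multiplicative hash are stand-ins chosen for this example and are not the kernel's hash_long() implementation.

```c
#include <stdint.h>

/* From fs/fuse/fuse_dev_i.h: ordinary IDs are even, interrupt IDs are odd. */
#define FUSE_INT_REQ_BIT (1ULL << 0)
#define FUSE_REQ_ID_STEP (1ULL << 1)

/* Assumption for this sketch, stands in for FUSE_PQ_HASH_BITS. */
#define SKETCH_PQ_HASH_BITS 8

/* Strip the interrupt bit to get the ID of the original request. */
static uint64_t sketch_base_id(uint64_t unique)
{
	return unique & ~FUSE_INT_REQ_BIT;
}

static int sketch_is_interrupt_id(uint64_t unique)
{
	return (unique & FUSE_INT_REQ_BIT) != 0;
}

/*
 * Stand-in for fuse_req_hash(): a generic multiplicative hash over the
 * masked ID, reduced to SKETCH_PQ_HASH_BITS bits. The kernel uses
 * hash_long() instead; only the masking step is taken from the patch.
 */
static unsigned int sketch_req_hash(uint64_t unique)
{
	uint64_t h = sketch_base_id(unique) * 0x9E3779B97F4A7C15ULL;

	return (unsigned int)(h >> (64 - SKETCH_PQ_HASH_BITS));
}
```

With this model, request ID 4 and its interrupt ID 5 land in the same bucket, and the next ordinary request ID is 4 + FUSE_REQ_ID_STEP = 6.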
* [PATCH v8 10/16] fuse: Add io-uring sqe commit and fetch support
2024-12-09 14:56 [PATCH v8 00/16] fuse: fuse-over-io-uring Bernd Schubert
` (8 preceding siblings ...)
2024-12-09 14:56 ` [PATCH v8 09/16] fuse: {io-uring} Make hash-list req unique finding functions non-static Bernd Schubert
@ 2024-12-09 14:56 ` Bernd Schubert
2024-12-09 14:56 ` [PATCH v8 11/16] fuse: {io-uring} Handle teardown of ring entries Bernd Schubert
` (5 subsequent siblings)
15 siblings, 0 replies; 22+ messages in thread
From: Bernd Schubert @ 2024-12-09 14:56 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
Bernd Schubert
This adds support for fuse request completion through ring SQEs
(FUSE_IO_URING_CMD_COMMIT_AND_FETCH handling). After committing
the ring entry it becomes available for new fuse requests.
Handling of requests through the ring (SQE/CQE handling)
is complete now.
Fuse request data are copied through the mmaped ring buffer;
there is no zero-copy support yet.
Signed-off-by: Bernd Schubert <[email protected]>
---
fs/fuse/dev_uring.c | 431 ++++++++++++++++++++++++++++++++++++++++++++++++++
fs/fuse/dev_uring_i.h | 12 ++
fs/fuse/fuse_i.h | 4 +
3 files changed, 447 insertions(+)
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index f0c5807c94a55f9c9e2aa95ad078724971ddd125..b43e48dea4eba2d361119735c549f6a6cd461372 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -26,6 +26,19 @@ bool fuse_uring_enabled(void)
return enable_uring;
}
+static void fuse_uring_req_end(struct fuse_ring_ent *ring_ent, bool set_err,
+ int error)
+{
+ struct fuse_req *req = ring_ent->fuse_req;
+
+ if (set_err)
+ req->out.h.error = error;
+
+ clear_bit(FR_SENT, &req->flags);
+ fuse_request_end(ring_ent->fuse_req);
+ ring_ent->fuse_req = NULL;
+}
+
static int fuse_ring_ent_unset_userspace(struct fuse_ring_ent *ent)
{
struct fuse_ring_queue *queue = ent->queue;
@@ -56,8 +69,11 @@ void fuse_uring_destruct(struct fuse_conn *fc)
continue;
WARN_ON(!list_empty(&queue->ent_avail_queue));
+ WARN_ON(!list_empty(&queue->ent_w_req_queue));
WARN_ON(!list_empty(&queue->ent_commit_queue));
+ WARN_ON(!list_empty(&queue->ent_in_userspace));
+ kfree(queue->fpq.processing);
kfree(queue);
ring->queues[qid] = NULL;
}
@@ -117,16 +133,30 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
{
struct fuse_conn *fc = ring->fc;
struct fuse_ring_queue *queue;
+ struct list_head *pq;
queue = kzalloc(sizeof(*queue), GFP_KERNEL_ACCOUNT);
if (!queue)
return ERR_PTR(-ENOMEM);
+ pq = kcalloc(FUSE_PQ_HASH_SIZE, sizeof(struct list_head), GFP_KERNEL);
+ if (!pq) {
+ kfree(queue);
+ return ERR_PTR(-ENOMEM);
+ }
+
+ kfree(queue->fpq.processing);
queue->qid = qid;
queue->ring = ring;
spin_lock_init(&queue->lock);
INIT_LIST_HEAD(&queue->ent_avail_queue);
INIT_LIST_HEAD(&queue->ent_commit_queue);
+ INIT_LIST_HEAD(&queue->ent_w_req_queue);
+ INIT_LIST_HEAD(&queue->ent_in_userspace);
+ INIT_LIST_HEAD(&queue->fuse_req_queue);
+
+ queue->fpq.processing = pq;
+ fuse_pqueue_init(&queue->fpq);
spin_lock(&fc->lock);
if (ring->queues[qid]) {
@@ -141,6 +171,210 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
return queue;
}
+/*
+ * Checks for errors and stores them in the request
+ */
+static int fuse_uring_out_header_has_err(struct fuse_out_header *oh,
+ struct fuse_req *req,
+ struct fuse_conn *fc)
+{
+ int err;
+
+ err = -EINVAL;
+ if (oh->unique == 0) {
+ /* Not supported through io-uring yet */
+ pr_warn_once("notify through fuse-io-uring not supported\n");
+ goto seterr;
+ }
+
+ err = -EINVAL;
+ if (oh->error <= -ERESTARTSYS || oh->error > 0)
+ goto seterr;
+
+ if (oh->error) {
+ err = oh->error;
+ goto err;
+ }
+
+ err = -ENOENT;
+ if ((oh->unique & ~FUSE_INT_REQ_BIT) != req->in.h.unique) {
+ pr_warn_ratelimited("unique mismatch, expected: %llu got %llu\n",
+ req->in.h.unique,
+ oh->unique & ~FUSE_INT_REQ_BIT);
+ goto seterr;
+ }
+
+ /*
+ * Is it an interrupt reply ID?
+ * XXX: Not supported through fuse-io-uring yet, it should not even
+ * find the request - should not happen.
+ */
+ WARN_ON_ONCE(oh->unique & FUSE_INT_REQ_BIT);
+
+ return 0;
+
+seterr:
+ oh->error = err;
+err:
+ return err;
+}
+
+static int fuse_uring_copy_from_ring(struct fuse_ring *ring,
+ struct fuse_req *req,
+ struct fuse_ring_ent *ent)
+{
+ struct fuse_copy_state cs;
+ struct fuse_args *args = req->args;
+ struct iov_iter iter;
+ int err, res;
+ struct fuse_uring_ent_in_out ring_in_out;
+
+ res = copy_from_user(&ring_in_out, &ent->headers->ring_ent_in_out,
+ sizeof(ring_in_out));
+ if (res)
+ return -EFAULT;
+
+ err = import_ubuf(ITER_SOURCE, ent->payload, ring->max_payload_sz,
+ &iter);
+ if (err)
+ return err;
+
+ fuse_copy_init(&cs, 0, &iter);
+ cs.is_uring = 1;
+ cs.req = req;
+
+ return fuse_copy_out_args(&cs, args, ring_in_out.payload_sz);
+}
+
+ /*
+ * Copy data from the req to the ring buffer
+ */
+static int fuse_uring_copy_to_ring(struct fuse_ring *ring, struct fuse_req *req,
+ struct fuse_ring_ent *ent)
+{
+ struct fuse_copy_state cs;
+ struct fuse_args *args = req->args;
+ struct fuse_in_arg *in_args = args->in_args;
+ int num_args = args->in_numargs;
+ int err, res;
+ struct iov_iter iter;
+ struct fuse_uring_ent_in_out ent_in_out = {
+ .flags = 0,
+ .commit_id = ent->commit_id,
+ };
+
+ err = import_ubuf(ITER_DEST, ent->payload, ring->max_payload_sz, &iter);
+ if (err) {
+ pr_info_ratelimited("fuse: Import of user buffer failed\n");
+ return err;
+ }
+
+ fuse_copy_init(&cs, 1, &iter);
+ cs.is_uring = 1;
+ cs.req = req;
+
+ if (num_args > 0) {
+ /*
+ * Expectation is that the first argument is the per op header.
+ * Some op codes have that as zero size.
+ */
+ if (args->in_args[0].size > 0) {
+ res = copy_to_user(&ent->headers->op_in, in_args->value,
+ in_args->size);
+ err = res > 0 ? -EFAULT : res;
+ if (err) {
+ pr_info_ratelimited(
+ "Copying the header failed.\n");
+ return err;
+ }
+ }
+ in_args++;
+ num_args--;
+ }
+
+ /* copy the payload */
+ err = fuse_copy_args(&cs, num_args, args->in_pages,
+ (struct fuse_arg *)in_args, 0);
+ if (err) {
+ pr_info_ratelimited("%s fuse_copy_args failed\n", __func__);
+ return err;
+ }
+
+ ent_in_out.payload_sz = cs.ring.offset;
+ res = copy_to_user(&ent->headers->ring_ent_in_out, &ent_in_out,
+ sizeof(ent_in_out));
+ err = res > 0 ? -EFAULT : res;
+ if (err)
+ return err;
+
+ return 0;
+}
+
+static int
+fuse_uring_prepare_send(struct fuse_ring_ent *ring_ent)
+{
+ struct fuse_ring_queue *queue = ring_ent->queue;
+ struct fuse_ring *ring = queue->ring;
+ struct fuse_req *req = ring_ent->fuse_req;
+ int err, res;
+
+ err = -EIO;
+ if (WARN_ON(ring_ent->state != FRRS_FUSE_REQ)) {
+ pr_err("qid=%d ring-req=%p invalid state %d on send\n",
+ queue->qid, ring_ent, ring_ent->state);
+ err = -EIO;
+ goto err;
+ }
+
+ /* copy the request */
+ err = fuse_uring_copy_to_ring(ring, req, ring_ent);
+ if (unlikely(err)) {
+ pr_info_ratelimited("Copy to ring failed: %d\n", err);
+ goto err;
+ }
+
+ /* copy fuse_in_header */
+ res = copy_to_user(&ring_ent->headers->in_out, &req->in.h,
+ sizeof(req->in.h));
+ err = res > 0 ? -EFAULT : res;
+ if (err)
+ goto err;
+
+ set_bit(FR_SENT, &req->flags);
+ return 0;
+
+err:
+ fuse_uring_req_end(ring_ent, true, err);
+ return err;
+}
+
+/*
+ * Write data to the ring buffer and send the request to userspace,
+ * userspace will read it.
+ * This is comparable with classical read(/dev/fuse).
+ */
+static int fuse_uring_send_next_to_ring(struct fuse_ring_ent *ring_ent,
+ unsigned int issue_flags)
+{
+ int err = 0;
+ struct fuse_ring_queue *queue = ring_ent->queue;
+
+ err = fuse_uring_prepare_send(ring_ent);
+ if (err)
+ goto err;
+
+ spin_lock(&queue->lock);
+ ring_ent->state = FRRS_USERSPACE;
+ list_move(&ring_ent->list, &queue->ent_in_userspace);
+ spin_unlock(&queue->lock);
+
+ io_uring_cmd_done(ring_ent->cmd, 0, 0, issue_flags);
+ return 0;
+
+err:
+ return err;
+}
+
/*
* Make a ring entry available for fuse_req assignment
*/
@@ -151,6 +385,195 @@ static void fuse_uring_ent_avail(struct fuse_ring_ent *ring_ent,
ring_ent->state = FRRS_WAIT;
}
+/* Used to find the request on SQE commit */
+static void fuse_uring_add_to_pq(struct fuse_ring_ent *ring_ent,
+ struct fuse_req *req)
+{
+ struct fuse_ring_queue *queue = ring_ent->queue;
+ struct fuse_pqueue *fpq = &queue->fpq;
+ unsigned int hash;
+
+ /* commit_id is the unique id of the request */
+ ring_ent->commit_id = req->in.h.unique;
+
+ req->ring_entry = ring_ent;
+ hash = fuse_req_hash(ring_ent->commit_id);
+ list_move_tail(&req->list, &fpq->processing[hash]);
+}
+
+/*
+ * Assign a fuse queue entry to the given entry
+ */
+static void fuse_uring_add_req_to_ring_ent(struct fuse_ring_ent *ring_ent,
+ struct fuse_req *req)
+{
+ struct fuse_ring_queue *queue = ring_ent->queue;
+
+ lockdep_assert_held(&queue->lock);
+
+ if (WARN_ON_ONCE(ring_ent->state != FRRS_WAIT &&
+ ring_ent->state != FRRS_COMMIT)) {
+ pr_warn("%s qid=%d state=%d\n", __func__, ring_ent->queue->qid,
+ ring_ent->state);
+ }
+ list_del_init(&req->list);
+ clear_bit(FR_PENDING, &req->flags);
+ ring_ent->fuse_req = req;
+ ring_ent->state = FRRS_FUSE_REQ;
+ list_move(&ring_ent->list, &queue->ent_w_req_queue);
+ fuse_uring_add_to_pq(ring_ent, req);
+}
+
+/*
+ * Release the ring entry and fetch the next fuse request if available
+ *
+ * @return true if a new request has been fetched
+ */
+static bool fuse_uring_ent_assign_req(struct fuse_ring_ent *ring_ent)
+ __must_hold(&queue->lock)
+{
+ struct fuse_req *req;
+ struct fuse_ring_queue *queue = ring_ent->queue;
+ struct list_head *req_queue = &queue->fuse_req_queue;
+
+ lockdep_assert_held(&queue->lock);
+
+ /* get and assign the next request while still holding the lock */
+ req = list_first_entry_or_null(req_queue, struct fuse_req, list);
+ if (req) {
+ fuse_uring_add_req_to_ring_ent(ring_ent, req);
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * Read data from the ring buffer, which user space has written to
+ * This is comparable with handling of classical write(/dev/fuse).
+ * Also make the ring request available again for new fuse requests.
+ */
+static void fuse_uring_commit(struct fuse_ring_ent *ring_ent,
+ unsigned int issue_flags)
+{
+ struct fuse_ring *ring = ring_ent->queue->ring;
+ struct fuse_conn *fc = ring->fc;
+ struct fuse_req *req = ring_ent->fuse_req;
+ ssize_t err = 0;
+ bool set_err = false;
+
+ err = copy_from_user(&req->out.h, &ring_ent->headers->in_out,
+ sizeof(req->out.h));
+ if (err) {
+ req->out.h.error = -EFAULT;
+ goto out;
+ }
+
+ err = fuse_uring_out_header_has_err(&req->out.h, req, fc);
+ if (err) {
+ /* req->out.h.error already set */
+ goto out;
+ }
+
+ err = fuse_uring_copy_from_ring(ring, req, ring_ent);
+ if (err)
+ set_err = true;
+
+out:
+ fuse_uring_req_end(ring_ent, set_err, err);
+}
+
+/*
+ * Get the next fuse req and send it
+ */
+static void fuse_uring_next_fuse_req(struct fuse_ring_ent *ring_ent,
+ struct fuse_ring_queue *queue,
+ unsigned int issue_flags)
+{
+ int err;
+ bool has_next;
+
+retry:
+ spin_lock(&queue->lock);
+ fuse_uring_ent_avail(ring_ent, queue);
+ has_next = fuse_uring_ent_assign_req(ring_ent);
+ spin_unlock(&queue->lock);
+
+ if (has_next) {
+ err = fuse_uring_send_next_to_ring(ring_ent, issue_flags);
+ if (err)
+ goto retry;
+ }
+}
+
+/* FUSE_IO_URING_CMD_COMMIT_AND_FETCH handler */
+static int fuse_uring_commit_fetch(struct io_uring_cmd *cmd, int issue_flags,
+ struct fuse_conn *fc)
+{
+ const struct fuse_uring_cmd_req *cmd_req = io_uring_sqe_cmd(cmd->sqe);
+ struct fuse_ring_ent *ring_ent;
+ int err;
+ struct fuse_ring *ring = fc->ring;
+ struct fuse_ring_queue *queue;
+ uint64_t commit_id = READ_ONCE(cmd_req->commit_id);
+ unsigned int qid = READ_ONCE(cmd_req->qid);
+ struct fuse_pqueue *fpq;
+ struct fuse_req *req;
+
+ err = -ENOTCONN;
+ if (!ring)
+ return err;
+
+ if (qid >= ring->nr_queues)
+ return -EINVAL;
+
+ queue = ring->queues[qid];
+ if (!queue)
+ return err;
+ fpq = &queue->fpq;
+
+ spin_lock(&queue->lock);
+ /* Find a request based on the unique ID of the fuse request
+ * This should get revised, as it needs a hash calculation and list
+ * search. And full struct fuse_pqueue is needed (memory overhead).
+ * As well as the link from req to ring_ent.
+ */
+ req = fuse_request_find(fpq, commit_id);
+ err = -ENOENT;
+ if (!req) {
+ pr_info("qid=%d commit_id %llu not found\n", queue->qid,
+ commit_id);
+ spin_unlock(&queue->lock);
+ return err;
+ }
+ list_del_init(&req->list);
+ ring_ent = req->ring_entry;
+ req->ring_entry = NULL;
+
+ err = fuse_ring_ent_unset_userspace(ring_ent);
+ if (err != 0) {
+ pr_info_ratelimited("qid=%d commit_id %llu state %d\n",
+ queue->qid, commit_id, ring_ent->state);
+ spin_unlock(&queue->lock);
+ return err;
+ }
+
+ ring_ent->cmd = cmd;
+ spin_unlock(&queue->lock);
+
+ /* without the queue lock, as other locks are taken */
+ fuse_uring_commit(ring_ent, issue_flags);
+
+ /*
+ * Fetching the next request is absolutely required as queued
+ * fuse requests would otherwise not get processed - committing
+ * and fetching is done in one step vs legacy fuse, which has separated
+ * read (fetch request) and write (commit result).
+ */
+ fuse_uring_next_fuse_req(ring_ent, queue, issue_flags);
+ return 0;
+}
+
/*
* fuse_uring_req_fetch command handling
*/
@@ -331,6 +754,14 @@ int __maybe_unused fuse_uring_cmd(struct io_uring_cmd *cmd,
return err;
}
break;
+ case FUSE_IO_URING_CMD_COMMIT_AND_FETCH:
+ err = fuse_uring_commit_fetch(cmd, issue_flags, fc);
+ if (err) {
+ pr_info_once("FUSE_IO_URING_COMMIT_AND_FETCH failed err=%d\n",
+ err);
+ return err;
+ }
+ break;
default:
return -EINVAL;
}
diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
index 73e9e3063bb038e8341d85cd2a440421275e6aa8..6149d43dc9438a0dec400a9cebb8c8b7755d66b0 100644
--- a/fs/fuse/dev_uring_i.h
+++ b/fs/fuse/dev_uring_i.h
@@ -20,6 +20,9 @@ enum fuse_ring_req_state {
/* The ring entry is waiting for new fuse requests */
FRRS_WAIT,
+ /* The ring entry got assigned a fuse req */
+ FRRS_FUSE_REQ,
+
/* The ring entry is in or on the way to user space */
FRRS_USERSPACE,
};
@@ -72,7 +75,16 @@ struct fuse_ring_queue {
* entries in the process of being committed or in the process
* to be send to userspace
*/
+ struct list_head ent_w_req_queue;
struct list_head ent_commit_queue;
+
+ /* entries in userspace */
+ struct list_head ent_in_userspace;
+
+ /* fuse requests waiting for an entry slot */
+ struct list_head fuse_req_queue;
+
+ struct fuse_pqueue fpq;
};
/**
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index e545b0864dd51e82df61cc39bdf65d3d36a418dc..e71556894bc25808581424ec7bdd4afeebc81f15 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -438,6 +438,10 @@ struct fuse_req {
/** fuse_mount this request belongs to */
struct fuse_mount *fm;
+
+#ifdef CONFIG_FUSE_IO_URING
+ void *ring_entry;
+#endif
};
struct fuse_iqueue;
--
2.43.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH v8 11/16] fuse: {io-uring} Handle teardown of ring entries
2024-12-09 14:56 [PATCH v8 00/16] fuse: fuse-over-io-uring Bernd Schubert
` (9 preceding siblings ...)
2024-12-09 14:56 ` [PATCH v8 10/16] fuse: Add io-uring sqe commit and fetch support Bernd Schubert
@ 2024-12-09 14:56 ` Bernd Schubert
2024-12-09 14:56 ` [PATCH v8 12/16] fuse: {io-uring} Make fuse_dev_queue_{interrupt,forget} non-static Bernd Schubert
` (4 subsequent siblings)
15 siblings, 0 replies; 22+ messages in thread
From: Bernd Schubert @ 2024-12-09 14:56 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
Bernd Schubert
On teardown, struct file_operations::uring_cmd requests
need to be completed by calling io_uring_cmd_done().
Not completing all ring entries would result in busy io-uring
tasks that emit periodic warning messages and in an unreleased
struct file.
Additionally, the fuse connection, and with it the ring, can
only get released when all io-uring commands are completed.
Completion is done for ring entries that are
a) in waiting state for new fuse requests - io_uring_cmd_done
is needed
b) already in userspace - io_uring_cmd_done through teardown
is not needed, the request can just get released. If the fuse server
is still active and commits such a ring entry, fuse_uring_cmd()
already checks whether the connection is active and, if it is not,
completes the io-uring command itself with -ENOTCONN. I.e. special
handling is not needed.
This scheme is basically represented by the ring entry states
FRRS_WAIT and FRRS_USERSPACE.
Entries in state:
- FRRS_INIT: No action needed, they do not contribute to
ring->queue_refs yet
- All other states: Are currently processed by other tasks,
async teardown is needed and it has to wait until these entries
reach one of the two states above. It could also be solved without
an async teardown task, but that would require additional if
conditions in hot code paths. Also in my personal opinion the code
looks cleaner with async teardown.
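For illustration only, the per-state handling described above can be
modelled in plain userspace C. This is a rough sketch, not the kernel
code: all sketch_* names are made up here, and the queue_refs counter
merely mimics ring->queue_refs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace mirror of the ring entry states; not kernel code. */
enum sketch_state {
	SKETCH_INIT,      /* not yet counted in queue_refs */
	SKETCH_WAIT,      /* waiting for a fuse request */
	SKETCH_FUSE_REQ,  /* currently processed by another task */
	SKETCH_USERSPACE, /* handed to the fuse server */
};

enum sketch_action {
	SKETCH_SKIP,         /* nothing to do */
	SKETCH_CMD_DONE,     /* complete the io-uring command, then free */
	SKETCH_RELEASE_ONLY, /* free without completing the command */
	SKETCH_RETRY_LATER,  /* leave for the next async teardown run */
};

/* Map an entry state to the teardown action from the commit message. */
static enum sketch_action sketch_teardown_action(enum sketch_state state)
{
	switch (state) {
	case SKETCH_INIT:
		return SKETCH_SKIP;
	case SKETCH_WAIT:
		return SKETCH_CMD_DONE;
	case SKETCH_USERSPACE:
		return SKETCH_RELEASE_ONLY;
	default:
		return SKETCH_RETRY_LATER;
	}
}

/*
 * One teardown pass: every entry that can be released drops one
 * reference. A non-zero result means the async worker has to run again.
 */
static int sketch_teardown_pass(const enum sketch_state *ents, size_t n,
				int queue_refs)
{
	for (size_t i = 0; i < n; i++) {
		enum sketch_action a = sketch_teardown_action(ents[i]);

		if (a == SKETCH_CMD_DONE || a == SKETCH_RELEASE_ONLY)
			queue_refs--;
	}
	return queue_refs;
}
```

A pass that leaves references behind corresponds to the case where the
delayed work re-schedules itself.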
Signed-off-by: Bernd Schubert <[email protected]>
---
fs/fuse/dev.c | 9 +++
fs/fuse/dev_uring.c | 206 ++++++++++++++++++++++++++++++++++++++++++++++++++
fs/fuse/dev_uring_i.h | 51 +++++++++++++
3 files changed, 266 insertions(+)
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index a45d92431769d4aadaf5c5792086abc5dda3c048..8da0e6437250b8136643e47bf960dd809ce06f78 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -6,6 +6,7 @@
See the file COPYING.
*/
+#include "dev_uring_i.h"
#include "fuse_i.h"
#include "fuse_dev_i.h"
@@ -2291,6 +2292,12 @@ void fuse_abort_conn(struct fuse_conn *fc)
spin_unlock(&fc->lock);
fuse_dev_end_requests(&to_end);
+
+ /*
+ * fc->lock must not be taken to avoid conflicts with io-uring
+ * locks
+ */
+ fuse_uring_abort(fc);
} else {
spin_unlock(&fc->lock);
}
@@ -2302,6 +2309,8 @@ void fuse_wait_aborted(struct fuse_conn *fc)
/* matches implicit memory barrier in fuse_drop_waiting() */
smp_mb();
wait_event(fc->blocked_waitq, atomic_read(&fc->num_waiting) == 0);
+
+ fuse_uring_wait_stopped_queues(fc);
}
int fuse_dev_release(struct inode *inode, struct file *file)
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index b43e48dea4eba2d361119735c549f6a6cd461372..60bcddec773d1cf3bbefc674fdbdfb7823b7fbc1 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -54,6 +54,37 @@ static int fuse_ring_ent_unset_userspace(struct fuse_ring_ent *ent)
return 0;
}
+/* Abort all requests queued on the given ring queue */
+static void fuse_uring_abort_end_queue_requests(struct fuse_ring_queue *queue)
+{
+ struct fuse_req *req;
+ LIST_HEAD(req_list);
+
+ spin_lock(&queue->lock);
+ list_for_each_entry(req, &queue->fuse_req_queue, list)
+ clear_bit(FR_PENDING, &req->flags);
+ list_splice_init(&queue->fuse_req_queue, &req_list);
+ spin_unlock(&queue->lock);
+
+ /* must not hold queue lock to avoid order issues with fi->lock */
+ fuse_dev_end_requests(&req_list);
+}
+
+void fuse_uring_abort_end_requests(struct fuse_ring *ring)
+{
+ int qid;
+ struct fuse_ring_queue *queue;
+
+ for (qid = 0; qid < ring->nr_queues; qid++) {
+ queue = READ_ONCE(ring->queues[qid]);
+ if (!queue)
+ continue;
+
+ queue->stopped = true;
+ fuse_uring_abort_end_queue_requests(queue);
+ }
+}
+
void fuse_uring_destruct(struct fuse_conn *fc)
{
struct fuse_ring *ring = fc->ring;
@@ -114,10 +145,13 @@ static struct fuse_ring *fuse_uring_create(struct fuse_conn *fc)
goto out_err;
}
+ init_waitqueue_head(&ring->stop_waitq);
+
fc->ring = ring;
ring->nr_queues = nr_queues;
ring->fc = fc;
ring->max_payload_sz = max_payload_size;
+ atomic_set(&ring->queue_refs, 0);
spin_unlock(&fc->lock);
return ring;
@@ -171,6 +205,174 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
return queue;
}
+static void fuse_uring_stop_fuse_req_end(struct fuse_ring_ent *ent)
+{
+ struct fuse_req *req = ent->fuse_req;
+
+ /* remove entry from fuse_pqueue->processing */
+ list_del_init(&req->list);
+ ent->fuse_req = NULL;
+ clear_bit(FR_SENT, &req->flags);
+ req->out.h.error = -ECONNABORTED;
+ fuse_request_end(req);
+}
+
+/*
+ * Release a request/entry on connection tear down
+ */
+static void fuse_uring_entry_teardown(struct fuse_ring_ent *ent,
+ bool need_cmd_done)
+{
+ /*
+ * fuse_request_end() might take other locks like fi->lock and
+ * can lead to lock ordering issues
+ */
+ lockdep_assert_not_held(&ent->queue->lock);
+
+ if (need_cmd_done)
+ io_uring_cmd_done(ent->cmd, -ENOTCONN, 0,
+ IO_URING_F_UNLOCKED);
+
+ if (ent->fuse_req)
+ fuse_uring_stop_fuse_req_end(ent);
+
+ list_del_init(&ent->list);
+ kfree(ent);
+}
+
+static void fuse_uring_stop_list_entries(struct list_head *head,
+ struct fuse_ring_queue *queue,
+ enum fuse_ring_req_state exp_state)
+{
+ struct fuse_ring *ring = queue->ring;
+ struct fuse_ring_ent *ent, *next;
+ ssize_t queue_refs = SSIZE_MAX;
+ LIST_HEAD(to_teardown);
+
+ spin_lock(&queue->lock);
+ list_for_each_entry_safe(ent, next, head, list) {
+ if (ent->state != exp_state) {
+ pr_warn("entry teardown qid=%d state=%d expected=%d\n",
+ queue->qid, ent->state, exp_state);
+ continue;
+ }
+
+ list_move(&ent->list, &to_teardown);
+ }
+ spin_unlock(&queue->lock);
+
+ /* no queue lock to avoid lock order issues */
+ list_for_each_entry_safe(ent, next, &to_teardown, list) {
+ bool need_cmd_done = ent->state != FRRS_USERSPACE;
+
+ fuse_uring_entry_teardown(ent, need_cmd_done);
+ queue_refs = atomic_dec_return(&ring->queue_refs);
+
+ WARN_ON_ONCE(queue_refs < 0);
+ }
+}
+
+static void fuse_uring_teardown_entries(struct fuse_ring_queue *queue)
+{
+ fuse_uring_stop_list_entries(&queue->ent_in_userspace, queue,
+ FRRS_USERSPACE);
+ fuse_uring_stop_list_entries(&queue->ent_avail_queue, queue, FRRS_WAIT);
+}
+
+/*
+ * Log state debug info
+ */
+static void fuse_uring_log_ent_state(struct fuse_ring *ring)
+{
+ int qid;
+ struct fuse_ring_ent *ent;
+
+ for (qid = 0; qid < ring->nr_queues; qid++) {
+ struct fuse_ring_queue *queue = ring->queues[qid];
+
+ if (!queue)
+ continue;
+
+ spin_lock(&queue->lock);
+ /*
+ * Log entries from the intermediate queue, the other queues
+ * should be empty
+ */
+ list_for_each_entry(ent, &queue->ent_w_req_queue, list) {
+ pr_info(" ent-req-queue ring=%p qid=%d ent=%p state=%d\n",
+ ring, qid, ent, ent->state);
+ }
+ list_for_each_entry(ent, &queue->ent_commit_queue, list) {
+ pr_info(" ent-commit-queue ring=%p qid=%d ent=%p state=%d\n",
+ ring, qid, ent, ent->state);
+ }
+ spin_unlock(&queue->lock);
+ }
+ ring->stop_debug_log = 1;
+}
+
+static void fuse_uring_async_stop_queues(struct work_struct *work)
+{
+ int qid;
+ struct fuse_ring *ring =
+ container_of(work, struct fuse_ring, async_teardown_work.work);
+
+ /* XXX code dup */
+ for (qid = 0; qid < ring->nr_queues; qid++) {
+ struct fuse_ring_queue *queue = READ_ONCE(ring->queues[qid]);
+
+ if (!queue)
+ continue;
+
+ fuse_uring_teardown_entries(queue);
+ }
+
+ /*
+ * Some ring entries might be in the middle of IO operations,
+ * i.e. in the process of getting handled by file_operations::uring_cmd
+ * or on the way to userspace - we could handle that with conditions in
+ * run time code, but it is easier/cleaner to have an async tear down
+ * handler that re-runs while there are still queue references left.
+ */
+ if (atomic_read(&ring->queue_refs) > 0) {
+ if (time_after(jiffies,
+ ring->teardown_time + FUSE_URING_TEARDOWN_TIMEOUT))
+ fuse_uring_log_ent_state(ring);
+
+ schedule_delayed_work(&ring->async_teardown_work,
+ FUSE_URING_TEARDOWN_INTERVAL);
+ } else {
+ wake_up_all(&ring->stop_waitq);
+ }
+}
+
+/*
+ * Stop the ring queues
+ */
+void fuse_uring_stop_queues(struct fuse_ring *ring)
+{
+ int qid;
+
+ for (qid = 0; qid < ring->nr_queues; qid++) {
+ struct fuse_ring_queue *queue = READ_ONCE(ring->queues[qid]);
+
+ if (!queue)
+ continue;
+
+ fuse_uring_teardown_entries(queue);
+ }
+
+ if (atomic_read(&ring->queue_refs) > 0) {
+ ring->teardown_time = jiffies;
+ INIT_DELAYED_WORK(&ring->async_teardown_work,
+ fuse_uring_async_stop_queues);
+ schedule_delayed_work(&ring->async_teardown_work,
+ FUSE_URING_TEARDOWN_INTERVAL);
+ } else {
+ wake_up_all(&ring->stop_waitq);
+ }
+}
+
/*
* Checks for errors and stores it into the request
*/
@@ -532,6 +734,9 @@ static int fuse_uring_commit_fetch(struct io_uring_cmd *cmd, int issue_flags,
return err;
fpq = &queue->fpq;
+ if (!READ_ONCE(fc->connected) || READ_ONCE(queue->stopped))
+ return err;
+
spin_lock(&queue->lock);
/* Find a request based on the unique ID of the fuse request
* This should get revised, as it needs a hash calculation and list
@@ -696,6 +901,7 @@ static int fuse_uring_register(struct io_uring_cmd *cmd,
if (WARN_ON_ONCE(err))
goto err;
+ atomic_inc(&ring->queue_refs);
_fuse_uring_register(ring_ent, cmd, issue_flags);
return 0;
diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
index 6149d43dc9438a0dec400a9cebb8c8b7755d66b0..392894d7b6fb15472d72945150517a9f0a029253 100644
--- a/fs/fuse/dev_uring_i.h
+++ b/fs/fuse/dev_uring_i.h
@@ -11,6 +11,9 @@
#ifdef CONFIG_FUSE_IO_URING
+#define FUSE_URING_TEARDOWN_TIMEOUT (5 * HZ)
+#define FUSE_URING_TEARDOWN_INTERVAL (HZ/20)
+
enum fuse_ring_req_state {
FRRS_INVALID = 0,
@@ -85,6 +88,8 @@ struct fuse_ring_queue {
struct list_head fuse_req_queue;
struct fuse_pqueue fpq;
+
+ bool stopped;
};
/**
@@ -102,12 +107,51 @@ struct fuse_ring {
size_t max_payload_sz;
struct fuse_ring_queue **queues;
+ /*
+ * Log ring entry states once on stop when entries cannot be
+ * released
+ */
+ unsigned int stop_debug_log : 1;
+
+ wait_queue_head_t stop_waitq;
+
+ /* async tear down */
+ struct delayed_work async_teardown_work;
+
+ /* log */
+ unsigned long teardown_time;
+
+ atomic_t queue_refs;
};
bool fuse_uring_enabled(void);
void fuse_uring_destruct(struct fuse_conn *fc);
+void fuse_uring_stop_queues(struct fuse_ring *ring);
+void fuse_uring_abort_end_requests(struct fuse_ring *ring);
int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags);
+static inline void fuse_uring_abort(struct fuse_conn *fc)
+{
+ struct fuse_ring *ring = fc->ring;
+
+ if (ring == NULL)
+ return;
+
+ if (atomic_read(&ring->queue_refs) > 0) {
+ fuse_uring_abort_end_requests(ring);
+ fuse_uring_stop_queues(ring);
+ }
+}
+
+static inline void fuse_uring_wait_stopped_queues(struct fuse_conn *fc)
+{
+ struct fuse_ring *ring = fc->ring;
+
+ if (ring)
+ wait_event(ring->stop_waitq,
+ atomic_read(&ring->queue_refs) == 0);
+}
+
#else /* CONFIG_FUSE_IO_URING */
struct fuse_ring;
@@ -125,6 +169,13 @@ static inline bool fuse_uring_enabled(void)
return false;
}
+static inline void fuse_uring_abort(struct fuse_conn *fc)
+{
+}
+
+static inline void fuse_uring_wait_stopped_queues(struct fuse_conn *fc)
+{
+}
#endif /* CONFIG_FUSE_IO_URING */
#endif /* _FS_FUSE_DEV_URING_I_H */
--
2.43.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH v8 12/16] fuse: {io-uring} Make fuse_dev_queue_{interrupt,forget} non-static
2024-12-09 14:56 [PATCH v8 00/16] fuse: fuse-over-io-uring Bernd Schubert
` (10 preceding siblings ...)
2024-12-09 14:56 ` [PATCH v8 11/16] fuse: {io-uring} Handle teardown of ring entries Bernd Schubert
@ 2024-12-09 14:56 ` Bernd Schubert
2024-12-09 14:56 ` [PATCH v8 13/16] fuse: Allow to queue fg requests through io-uring Bernd Schubert
` (3 subsequent siblings)
15 siblings, 0 replies; 22+ messages in thread
From: Bernd Schubert @ 2024-12-09 14:56 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
Bernd Schubert
These functions are also needed by fuse-over-io-uring.
Signed-off-by: Bernd Schubert <[email protected]>
---
fs/fuse/dev.c | 5 +++--
fs/fuse/fuse_dev_i.h | 5 +++++
2 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 8da0e6437250b8136643e47bf960dd809ce06f78..71f2baf1481b95b7fe10250e348cfba427199720 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -237,7 +237,8 @@ __releases(fiq->lock)
spin_unlock(&fiq->lock);
}
-static void fuse_dev_queue_forget(struct fuse_iqueue *fiq, struct fuse_forget_link *forget)
+void fuse_dev_queue_forget(struct fuse_iqueue *fiq,
+ struct fuse_forget_link *forget)
{
spin_lock(&fiq->lock);
if (fiq->connected) {
@@ -250,7 +251,7 @@ static void fuse_dev_queue_forget(struct fuse_iqueue *fiq, struct fuse_forget_li
}
}
-static void fuse_dev_queue_interrupt(struct fuse_iqueue *fiq, struct fuse_req *req)
+void fuse_dev_queue_interrupt(struct fuse_iqueue *fiq, struct fuse_req *req)
{
spin_lock(&fiq->lock);
if (list_empty(&req->intr_entry)) {
diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
index d7bf72dabd84c3896d1447380649e2f4d20b0643..1d1c1e9848fba8dae46651e28809f73e165e74fe 100644
--- a/fs/fuse/fuse_dev_i.h
+++ b/fs/fuse/fuse_dev_i.h
@@ -17,6 +17,8 @@ struct fuse_arg;
struct fuse_args;
struct fuse_pqueue;
struct fuse_req;
+struct fuse_iqueue;
+struct fuse_forget_link;
struct fuse_copy_state {
int write;
@@ -58,6 +60,9 @@ int fuse_copy_args(struct fuse_copy_state *cs, unsigned int numargs,
int zeroing);
int fuse_copy_out_args(struct fuse_copy_state *cs, struct fuse_args *args,
unsigned int nbytes);
+void fuse_dev_queue_forget(struct fuse_iqueue *fiq,
+ struct fuse_forget_link *forget);
+void fuse_dev_queue_interrupt(struct fuse_iqueue *fiq, struct fuse_req *req);
#endif
--
2.43.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH v8 13/16] fuse: Allow to queue fg requests through io-uring
2024-12-09 14:56 [PATCH v8 00/16] fuse: fuse-over-io-uring Bernd Schubert
` (11 preceding siblings ...)
2024-12-09 14:56 ` [PATCH v8 12/16] fuse: {io-uring} Make fuse_dev_queue_{interrupt,forget} non-static Bernd Schubert
@ 2024-12-09 14:56 ` Bernd Schubert
2024-12-10 23:14 ` kernel test robot
2024-12-09 14:56 ` [PATCH v8 14/16] fuse: Allow to queue bg " Bernd Schubert
` (2 subsequent siblings)
15 siblings, 1 reply; 22+ messages in thread
From: Bernd Schubert @ 2024-12-09 14:56 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
Bernd Schubert
This prepares queueing and sending foreground requests through
io-uring.
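As a rough userspace sketch of the queueing decision in this patch
(not the kernel code): the queue is picked by the submitting CPU, with
a fallback to queue 0 when the CPU number is out of range, and a
request is either handed to an available ring entry right away or
parked until an entry becomes free. The sketch_* names and the counters
are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-queue bookkeeping; the kernel uses lists instead. */
struct sketch_queue {
	int avail_entries; /* ring entries waiting for a request */
	int queued_reqs;   /* requests waiting for a ring entry */
	int stopped;       /* set on teardown */
};

/* Pick the queue of the submitting CPU, fall back to queue 0. */
static size_t sketch_pick_qid(unsigned int cpu, size_t nr_queues)
{
	return cpu < nr_queues ? cpu : 0;
}

/*
 * Returns 1 if the request was assigned to a ring entry right away,
 * 0 if it was parked on the queue, -ENOTCONN if the queue is stopped.
 */
static int sketch_queue_req(struct sketch_queue *q)
{
	if (q->stopped)
		return -ENOTCONN;

	if (q->avail_entries > 0) {
		q->avail_entries--;
		return 1;
	}

	q->queued_reqs++;
	return 0;
}
```

Parked requests are picked up later, when a ring entry gets committed
and fetches the next request.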
Signed-off-by: Bernd Schubert <[email protected]>
---
fs/fuse/dev_uring.c | 175 ++++++++++++++++++++++++++++++++++++++++++++++++++
fs/fuse/dev_uring_i.h | 8 +++
2 files changed, 183 insertions(+)
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index 60bcddec773d1cf3bbefc674fdbdfb7823b7fbc1..5767fb7a501ac7253aa8a598a1aba87b65da0898 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -26,6 +26,29 @@ bool fuse_uring_enabled(void)
return enable_uring;
}
+struct fuse_uring_cmd_pdu {
+ struct fuse_ring_ent *ring_ent;
+};
+
+const struct fuse_iqueue_ops fuse_io_uring_ops;
+
+static void fuse_uring_cmd_set_ring_ent(struct io_uring_cmd *cmd,
+ struct fuse_ring_ent *ring_ent)
+{
+ struct fuse_uring_cmd_pdu *pdu =
+ io_uring_cmd_to_pdu(cmd, struct fuse_uring_cmd_pdu);
+
+ pdu->ring_ent = ring_ent;
+}
+
+static struct fuse_ring_ent *fuse_uring_cmd_to_ring_ent(struct io_uring_cmd *cmd)
+{
+ struct fuse_uring_cmd_pdu *pdu =
+ io_uring_cmd_to_pdu(cmd, struct fuse_uring_cmd_pdu);
+
+ return pdu->ring_ent;
+}
+
static void fuse_uring_req_end(struct fuse_ring_ent *ring_ent, bool set_err,
int error)
{
@@ -779,6 +802,31 @@ static int fuse_uring_commit_fetch(struct io_uring_cmd *cmd, int issue_flags,
return 0;
}
+static bool is_ring_ready(struct fuse_ring *ring, int current_qid)
+{
+ int qid;
+ struct fuse_ring_queue *queue;
+ bool ready = true;
+
+ for (qid = 0; qid < ring->nr_queues && ready; qid++) {
+ if (current_qid == qid)
+ continue;
+
+ queue = ring->queues[qid];
+ if (!queue) {
+ ready = false;
+ break;
+ }
+
+ spin_lock(&queue->lock);
+ if (list_empty(&queue->ent_avail_queue))
+ ready = false;
+ spin_unlock(&queue->lock);
+ }
+
+ return ready;
+}
+
/*
* fuse_uring_req_fetch command handling
*/
@@ -787,10 +835,22 @@ static void _fuse_uring_register(struct fuse_ring_ent *ring_ent,
unsigned int issue_flags)
{
struct fuse_ring_queue *queue = ring_ent->queue;
+ struct fuse_ring *ring = queue->ring;
+ struct fuse_conn *fc = ring->fc;
+ struct fuse_iqueue *fiq = &fc->iq;
spin_lock(&queue->lock);
fuse_uring_ent_avail(ring_ent, queue);
spin_unlock(&queue->lock);
+
+ if (!ring->ready) {
+ bool ready = is_ring_ready(ring, queue->qid);
+
+ if (ready) {
+ WRITE_ONCE(ring->ready, true);
+ fiq->ops = &fuse_io_uring_ops;
+ }
+ }
}
/*
@@ -974,3 +1034,118 @@ int __maybe_unused fuse_uring_cmd(struct io_uring_cmd *cmd,
return -EIOCBQUEUED;
}
+
+/*
+ * This prepares and sends the ring request in fuse-uring task context.
+ * User buffers are not mapped yet - the application does not have permission
+ * to write to it - this has to be executed in ring task context.
+ */
+static void
+fuse_uring_send_req_in_task(struct io_uring_cmd *cmd,
+ unsigned int issue_flags)
+{
+ struct fuse_ring_ent *ring_ent = fuse_uring_cmd_to_ring_ent(cmd);
+ struct fuse_ring_queue *queue = ring_ent->queue;
+ int err;
+
+ if (unlikely(issue_flags & IO_URING_F_TASK_DEAD)) {
+ err = -ECANCELED;
+ goto terminating;
+ }
+
+ err = fuse_uring_prepare_send(ring_ent);
+ if (err)
+ goto err;
+
+terminating:
+ spin_lock(&queue->lock);
+ ring_ent->state = FRRS_USERSPACE;
+ list_move(&ring_ent->list, &queue->ent_in_userspace);
+ spin_unlock(&queue->lock);
+ io_uring_cmd_done(cmd, err, 0, issue_flags);
+ return;
+err:
+ fuse_uring_next_fuse_req(ring_ent, queue, issue_flags);
+}
+
+static struct fuse_ring_queue *fuse_uring_task_to_queue(struct fuse_ring *ring)
+{
+ unsigned int qid;
+ struct fuse_ring_queue *queue;
+
+ qid = task_cpu(current);
+
+ if (WARN_ONCE(qid >= ring->nr_queues,
+ "Core number (%u) exceeds nr queues (%zu)\n", qid,
+ ring->nr_queues))
+ qid = 0;
+
+ queue = ring->queues[qid];
+ if (WARN_ONCE(!queue, "Missing queue for qid %d\n", qid))
+ return NULL;
+
+ return queue;
+}
+
+/* queue a fuse request and send it if a ring entry is available */
+void fuse_uring_queue_fuse_req(struct fuse_iqueue *fiq, struct fuse_req *req)
+{
+ struct fuse_conn *fc = req->fm->fc;
+ struct fuse_ring *ring = fc->ring;
+ struct fuse_ring_queue *queue;
+ struct fuse_ring_ent *ring_ent = NULL;
+ int err;
+
+ err = -EINVAL;
+ queue = fuse_uring_task_to_queue(ring);
+ if (!queue)
+ goto err;
+
+ if (req->in.h.opcode != FUSE_NOTIFY_REPLY)
+ req->in.h.unique = fuse_get_unique(fiq);
+
+ spin_lock(&queue->lock);
+ err = -ENOTCONN;
+ if (unlikely(queue->stopped))
+ goto err_unlock;
+
+ ring_ent = list_first_entry_or_null(&queue->ent_avail_queue,
+ struct fuse_ring_ent, list);
+ if (ring_ent)
+ fuse_uring_add_req_to_ring_ent(ring_ent, req);
+ else
+ list_add_tail(&req->list, &queue->fuse_req_queue);
+ spin_unlock(&queue->lock);
+
+ if (ring_ent) {
+ struct io_uring_cmd *cmd = ring_ent->cmd;
+
+ err = -EIO;
+ if (WARN_ON_ONCE(ring_ent->state != FRRS_FUSE_REQ))
+ goto err;
+
+ fuse_uring_cmd_set_ring_ent(cmd, ring_ent);
+ io_uring_cmd_complete_in_task(cmd, fuse_uring_send_req_in_task);
+ }
+
+ return;
+
+err_unlock:
+ spin_unlock(&queue->lock);
+err:
+ req->out.h.error = err;
+ clear_bit(FR_PENDING, &req->flags);
+ fuse_request_end(req);
+}
+
+const struct fuse_iqueue_ops fuse_io_uring_ops = {
+ /* should be sent over io-uring as enhancement */
+ .send_forget = fuse_dev_queue_forget,
+
+ /*
+ * could be sent over io-uring, but interrupts should be rare,
+ * no need to make the code complex
+ */
+ .send_interrupt = fuse_dev_queue_interrupt,
+ .send_req = fuse_uring_queue_fuse_req,
+};
diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
index 392894d7b6fb15472d72945150517a9f0a029253..bea4fd1532083b98dc04ba65c9a6cae2d7e36714 100644
--- a/fs/fuse/dev_uring_i.h
+++ b/fs/fuse/dev_uring_i.h
@@ -122,6 +122,8 @@ struct fuse_ring {
unsigned long teardown_time;
atomic_t queue_refs;
+
+ bool ready;
};
bool fuse_uring_enabled(void);
@@ -129,6 +131,7 @@ void fuse_uring_destruct(struct fuse_conn *fc);
void fuse_uring_stop_queues(struct fuse_ring *ring);
void fuse_uring_abort_end_requests(struct fuse_ring *ring);
int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags);
+void fuse_uring_queue_fuse_req(struct fuse_iqueue *fiq, struct fuse_req *req);
static inline void fuse_uring_abort(struct fuse_conn *fc)
{
@@ -152,6 +155,11 @@ static inline void fuse_uring_wait_stopped_queues(struct fuse_conn *fc)
atomic_read(&ring->queue_refs) == 0);
}
+static inline bool fuse_uring_ready(struct fuse_conn *fc)
+{
+ return fc->ring && fc->ring->ready;
+}
+
#else /* CONFIG_FUSE_IO_URING */
struct fuse_ring;
--
2.43.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH v8 14/16] fuse: Allow to queue bg requests through io-uring
2024-12-09 14:56 [PATCH v8 00/16] fuse: fuse-over-io-uring Bernd Schubert
` (12 preceding siblings ...)
2024-12-09 14:56 ` [PATCH v8 13/16] fuse: Allow to queue fg requests through io-uring Bernd Schubert
@ 2024-12-09 14:56 ` Bernd Schubert
2024-12-09 14:56 ` [PATCH v8 15/16] fuse: {io-uring} Prevent mount point hang on fuse-server termination Bernd Schubert
2024-12-09 14:56 ` [PATCH v8 16/16] fuse: enable fuse-over-io-uring Bernd Schubert
15 siblings, 0 replies; 22+ messages in thread
From: Bernd Schubert @ 2024-12-09 14:56 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
Bernd Schubert
This prepares queueing and sending background requests through
io-uring.
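The flush rule used by fuse_uring_flush_bg() in this patch can be
modelled in userspace as below. This is only a sketch with plain
counters instead of lists and locks; the sketch_* names are made up for
illustration.

```c
#include <assert.h>

/* Hypothetical counters standing in for the fc and queue state. */
struct sketch_bg {
	unsigned int fc_active;     /* active bg requests, whole connection */
	unsigned int fc_max;        /* global max_background limit */
	unsigned int queue_active;  /* active bg requests on this queue */
	unsigned int queue_waiting; /* bg requests waiting on this queue */
};

/*
 * Move waiting bg requests to the send queue while the global limit
 * allows it. A queue with no active bg request may always move one,
 * even if the global limit is already reached.
 * Returns the number of requests moved.
 */
static unsigned int sketch_flush_bg(struct sketch_bg *s)
{
	unsigned int moved = 0;

	while ((s->fc_active < s->fc_max || !s->queue_active) &&
	       s->queue_waiting > 0) {
		s->fc_active++;
		s->queue_active++;
		s->queue_waiting--;
		moved++;
	}
	return moved;
}
```

With the global limit already reached, an idle queue still moves
exactly one request, so it does not depend on a wake-up from another
queue.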
Signed-off-by: Bernd Schubert <[email protected]>
---
fs/fuse/dev.c | 26 +++++++++++++-
fs/fuse/dev_uring.c | 99 +++++++++++++++++++++++++++++++++++++++++++++++++++
fs/fuse/dev_uring_i.h | 6 ++++
3 files changed, 130 insertions(+), 1 deletion(-)
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 71f2baf1481b95b7fe10250e348cfba427199720..8f8aaf74ee8dfbe8837f48811138d4ff99b44bba 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -568,7 +568,25 @@ ssize_t __fuse_simple_request(struct mnt_idmap *idmap,
return ret;
}
-static bool fuse_request_queue_background(struct fuse_req *req)
+#ifdef CONFIG_FUSE_IO_URING
+static bool fuse_request_queue_background_uring(struct fuse_conn *fc,
+ struct fuse_req *req)
+{
+ struct fuse_iqueue *fiq = &fc->iq;
+
+ req->in.h.unique = fuse_get_unique(fiq);
+ req->in.h.len = sizeof(struct fuse_in_header) +
+ fuse_len_args(req->args->in_numargs,
+ (struct fuse_arg *) req->args->in_args);
+
+ return fuse_uring_queue_bq_req(req);
+}
+#endif
+
+/*
+ * @return true if queued
+ */
+static int fuse_request_queue_background(struct fuse_req *req)
{
struct fuse_mount *fm = req->fm;
struct fuse_conn *fc = fm->fc;
@@ -580,6 +598,12 @@ static bool fuse_request_queue_background(struct fuse_req *req)
atomic_inc(&fc->num_waiting);
}
__set_bit(FR_ISREPLY, &req->flags);
+
+#ifdef CONFIG_FUSE_IO_URING
+ if (fuse_uring_ready(fc))
+ return fuse_request_queue_background_uring(fc, req);
+#endif
+
spin_lock(&fc->bg_lock);
if (likely(fc->connected)) {
fc->num_background++;
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index 5767fb7a501ac7253aa8a598a1aba87b65da0898..8bdfb6fcfa51976cd121bee7f2e8dec1ff9aa916 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -49,10 +49,52 @@ static struct fuse_ring_ent *fuse_uring_cmd_to_ring_ent(struct io_uring_cmd *cmd
return pdu->ring_ent;
}
+static void fuse_uring_flush_bg(struct fuse_ring_queue *queue)
+{
+ struct fuse_ring *ring = queue->ring;
+ struct fuse_conn *fc = ring->fc;
+
+ lockdep_assert_held(&queue->lock);
+ lockdep_assert_held(&fc->bg_lock);
+
+ /*
+ * Allow one bg request per queue, ignoring global fc limits.
+ * This prevents a single queue from consuming all resources and
+ * eliminates the need for remote queue wake-ups when global
+ * limits are met but this queue has no more waiting requests.
+ */
+ while ((fc->active_background < fc->max_background ||
+ !queue->active_background) &&
+ (!list_empty(&queue->fuse_req_bg_queue))) {
+ struct fuse_req *req;
+
+ req = list_first_entry(&queue->fuse_req_bg_queue,
+ struct fuse_req, list);
+ fc->active_background++;
+ queue->active_background++;
+
+ list_move_tail(&req->list, &queue->fuse_req_queue);
+ }
+}
+
static void fuse_uring_req_end(struct fuse_ring_ent *ring_ent, bool set_err,
int error)
{
+ struct fuse_ring_queue *queue = ring_ent->queue;
struct fuse_req *req = ring_ent->fuse_req;
+ struct fuse_ring *ring = queue->ring;
+ struct fuse_conn *fc = ring->fc;
+
+ lockdep_assert_not_held(&queue->lock);
+ spin_lock(&queue->lock);
+ if (test_bit(FR_BACKGROUND, &req->flags)) {
+ queue->active_background--;
+ spin_lock(&fc->bg_lock);
+ fuse_uring_flush_bg(queue);
+ spin_unlock(&fc->bg_lock);
+ }
+
+ spin_unlock(&queue->lock);
if (set_err)
req->out.h.error = error;
@@ -97,6 +139,7 @@ void fuse_uring_abort_end_requests(struct fuse_ring *ring)
{
int qid;
struct fuse_ring_queue *queue;
+ struct fuse_conn *fc = ring->fc;
for (qid = 0; qid < ring->nr_queues; qid++) {
queue = READ_ONCE(ring->queues[qid]);
@@ -104,6 +147,13 @@ void fuse_uring_abort_end_requests(struct fuse_ring *ring)
continue;
queue->stopped = true;
+
+ WARN_ON_ONCE(ring->fc->max_background != UINT_MAX);
+ spin_lock(&queue->lock);
+ spin_lock(&fc->bg_lock);
+ fuse_uring_flush_bg(queue);
+ spin_unlock(&fc->bg_lock);
+ spin_unlock(&queue->lock);
fuse_uring_abort_end_queue_requests(queue);
}
}
@@ -211,6 +261,7 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
INIT_LIST_HEAD(&queue->ent_w_req_queue);
INIT_LIST_HEAD(&queue->ent_in_userspace);
INIT_LIST_HEAD(&queue->fuse_req_queue);
+ INIT_LIST_HEAD(&queue->fuse_req_bg_queue);
queue->fpq.processing = pq;
fuse_pqueue_init(&queue->fpq);
@@ -1138,6 +1189,54 @@ void fuse_uring_queue_fuse_req(struct fuse_iqueue *fiq, struct fuse_req *req)
fuse_request_end(req);
}
+bool fuse_uring_queue_bq_req(struct fuse_req *req)
+{
+ struct fuse_conn *fc = req->fm->fc;
+ struct fuse_ring *ring = fc->ring;
+ struct fuse_ring_queue *queue;
+ struct fuse_ring_ent *ring_ent = NULL;
+
+ queue = fuse_uring_task_to_queue(ring);
+ if (!queue)
+ return false;
+
+ spin_lock(&queue->lock);
+ if (unlikely(queue->stopped)) {
+ spin_unlock(&queue->lock);
+ return false;
+ }
+
+ list_add_tail(&req->list, &queue->fuse_req_bg_queue);
+
+ ring_ent = list_first_entry_or_null(&queue->ent_avail_queue,
+ struct fuse_ring_ent, list);
+ spin_lock(&fc->bg_lock);
+ fc->num_background++;
+ if (fc->num_background == fc->max_background)
+ fc->blocked = 1;
+ fuse_uring_flush_bg(queue);
+ spin_unlock(&fc->bg_lock);
+
+ /*
+ * Due to bg_queue flush limits there might be other bg requests
+ * in the queue that need to be handled first. Or no further req
+ * might be available.
+ */
+ req = list_first_entry_or_null(&queue->fuse_req_queue, struct fuse_req,
+ list);
+ if (ring_ent && req) {
+ struct io_uring_cmd *cmd = ring_ent->cmd;
+
+ fuse_uring_add_req_to_ring_ent(ring_ent, req);
+
+ fuse_uring_cmd_set_ring_ent(cmd, ring_ent);
+ io_uring_cmd_complete_in_task(cmd, fuse_uring_send_req_in_task);
+ }
+ spin_unlock(&queue->lock);
+
+ return true;
+}
+
const struct fuse_iqueue_ops fuse_io_uring_ops = {
/* should be send over io-uring as enhancement */
.send_forget = fuse_dev_queue_forget,
diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
index bea4fd1532083b98dc04ba65c9a6cae2d7e36714..8c5e3ac630f245192c380d132b665d95b8f446a4 100644
--- a/fs/fuse/dev_uring_i.h
+++ b/fs/fuse/dev_uring_i.h
@@ -87,8 +87,13 @@ struct fuse_ring_queue {
/* fuse requests waiting for an entry slot */
struct list_head fuse_req_queue;
+ /* background fuse requests */
+ struct list_head fuse_req_bg_queue;
+
struct fuse_pqueue fpq;
+ unsigned int active_background;
+
bool stopped;
};
@@ -132,6 +137,7 @@ void fuse_uring_stop_queues(struct fuse_ring *ring);
void fuse_uring_abort_end_requests(struct fuse_ring *ring);
int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags);
void fuse_uring_queue_fuse_req(struct fuse_iqueue *fiq, struct fuse_req *req);
+bool fuse_uring_queue_bq_req(struct fuse_req *req);
static inline void fuse_uring_abort(struct fuse_conn *fc)
{
--
2.43.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH v8 15/16] fuse: {io-uring} Prevent mount point hang on fuse-server termination
2024-12-09 14:56 [PATCH v8 00/16] fuse: fuse-over-io-uring Bernd Schubert
` (13 preceding siblings ...)
2024-12-09 14:56 ` [PATCH v8 14/16] fuse: Allow to queue bg " Bernd Schubert
@ 2024-12-09 14:56 ` Bernd Schubert
2024-12-09 14:56 ` [PATCH v8 16/16] fuse: enable fuse-over-io-uring Bernd Schubert
15 siblings, 0 replies; 22+ messages in thread
From: Bernd Schubert @ 2024-12-09 14:56 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
Bernd Schubert
When the fuse-server terminates while the fuse-client or kernel
still has queued URING_CMDs, these commands retain references
to the struct file used by the fuse connection. This prevents
fuse_dev_release() from being invoked, resulting in a hung mount
point.
This patch addresses the issue by making queued URING_CMDs
cancelable, allowing fuse_dev_release() to proceed as expected
and preventing the mount point from hanging.
Signed-off-by: Bernd Schubert <[email protected]>
---
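A note on the lifetime rule this patch introduces: teardown no longer
frees a ring entry but parks it on a released list, so that a late
IO_URING_F_CANCEL can still dereference the entry. The entries are
freed only at ring destruction. The following is a minimal userspace
sketch of that pattern. The model_* names are hypothetical, the
released list is a simple singly linked list, and locking and the
io_uring_cmd_done() call are omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

enum model_state {
	MODEL_WAIT,      /* entry waits for a request (models FRRS_WAIT) */
	MODEL_USERSPACE, /* entry handed to userspace */
	MODEL_RELEASED,  /* torn down, but memory still valid */
};

struct model_ent {
	enum model_state state;
	struct model_ent *next; /* link in the released list */
};

struct model_queue {
	struct model_ent *released; /* parked entries, freed at destruct */
};

/* Teardown parks the entry instead of freeing it */
static void model_teardown(struct model_queue *q, struct model_ent *ent)
{
	ent->state = MODEL_RELEASED;
	ent->next = q->released;
	q->released = ent;
}

/*
 * Cancel handler: reads ent->state through a direct pointer, which is
 * only safe because teardown did not free the entry.
 * Returns 0 if the entry was waiting and is now completed, -EAGAIN
 * otherwise.
 */
static int model_cancel(struct model_ent *ent)
{
	if (ent->state == MODEL_WAIT) {
		ent->state = MODEL_USERSPACE;
		return 0;
	}
	return -EAGAIN;
}

/* Destruct frees all parked entries and returns how many were freed */
static unsigned int model_destruct(struct model_queue *q)
{
	unsigned int freed = 0;

	while (q->released) {
		struct model_ent *ent = q->released;

		q->released = ent->next;
		free(ent);
		freed++;
	}
	return freed;
}
```

Calling model_cancel() on an entry that already went through
model_teardown() is harmless in this model: it sees MODEL_RELEASED and
returns -EAGAIN instead of touching freed memory.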
fs/fuse/dev_uring.c | 87 ++++++++++++++++++++++++++++++++++++++++++---------
fs/fuse/dev_uring_i.h | 12 +++++++
2 files changed, 85 insertions(+), 14 deletions(-)
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index 8bdfb6fcfa51976cd121bee7f2e8dec1ff9aa916..be7eaf7cc569ff77f8ebdff323634b84ea0a3f63 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -168,6 +168,7 @@ void fuse_uring_destruct(struct fuse_conn *fc)
for (qid = 0; qid < ring->nr_queues; qid++) {
struct fuse_ring_queue *queue = ring->queues[qid];
+ struct fuse_ring_ent *ent, *next;
if (!queue)
continue;
@@ -177,6 +178,12 @@ void fuse_uring_destruct(struct fuse_conn *fc)
WARN_ON(!list_empty(&queue->ent_commit_queue));
WARN_ON(!list_empty(&queue->ent_in_userspace));
+ list_for_each_entry_safe(ent, next, &queue->ent_released,
+ list) {
+ list_del_init(&ent->list);
+ kfree(ent);
+ }
+
kfree(queue->fpq.processing);
kfree(queue);
ring->queues[qid] = NULL;
@@ -262,6 +269,7 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
INIT_LIST_HEAD(&queue->ent_in_userspace);
INIT_LIST_HEAD(&queue->fuse_req_queue);
INIT_LIST_HEAD(&queue->fuse_req_bg_queue);
+ INIT_LIST_HEAD(&queue->ent_released);
queue->fpq.processing = pq;
fuse_pqueue_init(&queue->fpq);
@@ -294,24 +302,27 @@ static void fuse_uring_stop_fuse_req_end(struct fuse_ring_ent *ent)
/*
* Release a request/entry on connection tear down
*/
-static void fuse_uring_entry_teardown(struct fuse_ring_ent *ent,
- bool need_cmd_done)
+static void fuse_uring_entry_teardown(struct fuse_ring_ent *ent)
{
- /*
- * fuse_request_end() might take other locks like fi->lock and
- * can lead to lock ordering issues
- */
- lockdep_assert_not_held(&ent->queue->lock);
+ struct fuse_ring_queue *queue = ent->queue;
- if (need_cmd_done)
+ if (ent->need_cmd_done)
io_uring_cmd_done(ent->cmd, -ENOTCONN, 0,
IO_URING_F_UNLOCKED);
if (ent->fuse_req)
fuse_uring_stop_fuse_req_end(ent);
- list_del_init(&ent->list);
- kfree(ent);
+ /*
+ * The entry must not be freed immediately: IO_URING_F_CANCEL accesses
+ * entries through a direct pointer, without checking the list state
+ * first, so freeing here could race with daemon termination (which
+ * triggers IO_URING_F_CANCEL).
+ */
+ spin_lock(&queue->lock);
+ list_move(&ent->list, &queue->ent_released);
+ ent->state = FRRS_RELEASED;
+ spin_unlock(&queue->lock);
}
static void fuse_uring_stop_list_entries(struct list_head *head,
@@ -331,15 +342,15 @@ static void fuse_uring_stop_list_entries(struct list_head *head,
continue;
}
+ ent->need_cmd_done = ent->state != FRRS_USERSPACE;
+ ent->state = FRRS_TEARDOWN;
list_move(&ent->list, &to_teardown);
}
spin_unlock(&queue->lock);
/* no queue lock to avoid lock order issues */
list_for_each_entry_safe(ent, next, &to_teardown, list) {
- bool need_cmd_done = ent->state != FRRS_USERSPACE;
-
- fuse_uring_entry_teardown(ent, need_cmd_done);
+ fuse_uring_entry_teardown(ent);
queue_refs = atomic_dec_return(&ring->queue_refs);
WARN_ON_ONCE(queue_refs < 0);
@@ -447,6 +458,49 @@ void fuse_uring_stop_queues(struct fuse_ring *ring)
}
}
+/*
+ * Handle IO_URING_F_CANCEL, which typically arrives on daemon termination.
+ *
+ * Releasing the last entry should trigger fuse_dev_release() if
+ * the daemon was terminated
+ */
+static int fuse_uring_cancel(struct io_uring_cmd *cmd, unsigned int issue_flags)
+{
+ struct fuse_ring_ent *ent = fuse_uring_cmd_to_ring_ent(cmd);
+ struct fuse_ring_queue *queue;
+ bool need_cmd_done = false;
+ int ret = 0;
+
+ /*
+ * direct access on ent - it must not be destructed as long as
+ * IO_URING_F_CANCEL might come up
+ */
+ queue = ent->queue;
+ spin_lock(&queue->lock);
+ if (ent->state == FRRS_WAIT) {
+ ent->state = FRRS_USERSPACE;
+ list_move(&ent->list, &queue->ent_in_userspace);
+ need_cmd_done = true;
+ }
+ spin_unlock(&queue->lock);
+
+ if (need_cmd_done) {
+ io_uring_cmd_done(cmd, -ENOTCONN, 0, issue_flags);
+ } else {
+ /* io-uring handles resending */
+ ret = -EAGAIN;
+ }
+
+ return ret;
+}
+
+static void fuse_uring_prepare_cancel(struct io_uring_cmd *cmd, int issue_flags,
+ struct fuse_ring_ent *ring_ent)
+{
+ fuse_uring_cmd_set_ring_ent(cmd, ring_ent);
+ io_uring_cmd_mark_cancelable(cmd, issue_flags);
+}
+
/*
* Checks for errors and stores it into the request
*/
@@ -841,6 +895,7 @@ static int fuse_uring_commit_fetch(struct io_uring_cmd *cmd, int issue_flags,
spin_unlock(&queue->lock);
/* without the queue lock, as other locks are taken */
+ fuse_uring_prepare_cancel(ring_ent->cmd, issue_flags, ring_ent);
fuse_uring_commit(ring_ent, issue_flags);
/*
@@ -890,6 +945,8 @@ static void _fuse_uring_register(struct fuse_ring_ent *ring_ent,
struct fuse_conn *fc = ring->fc;
struct fuse_iqueue *fiq = &fc->iq;
+ fuse_uring_prepare_cancel(ring_ent->cmd, issue_flags, ring_ent);
+
spin_lock(&queue->lock);
fuse_uring_ent_avail(ring_ent, queue);
spin_unlock(&queue->lock);
@@ -1039,6 +1096,9 @@ int __maybe_unused fuse_uring_cmd(struct io_uring_cmd *cmd,
return -EOPNOTSUPP;
}
+ if ((unlikely(issue_flags & IO_URING_F_CANCEL)))
+ return fuse_uring_cancel(cmd, issue_flags);
+
/* This extra SQE size holds struct fuse_uring_cmd_req */
if (!(issue_flags & IO_URING_F_SQE128))
return -EINVAL;
@@ -1170,7 +1230,6 @@ void fuse_uring_queue_fuse_req(struct fuse_iqueue *fiq, struct fuse_req *req)
if (ring_ent) {
struct io_uring_cmd *cmd = ring_ent->cmd;
-
err = -EIO;
if (WARN_ON_ONCE(ring_ent->state != FRRS_FUSE_REQ))
goto err;
diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
index 8c5e3ac630f245192c380d132b665d95b8f446a4..4e670022ada2827657feec8e5165e56dbfb86037 100644
--- a/fs/fuse/dev_uring_i.h
+++ b/fs/fuse/dev_uring_i.h
@@ -28,6 +28,12 @@ enum fuse_ring_req_state {
/* The ring entry is in or on the way to user space */
FRRS_USERSPACE,
+
+ /* The ring entry is in teardown */
+ FRRS_TEARDOWN,
+
+ /* The ring entry is released, but not freed yet */
+ FRRS_RELEASED,
};
/** A fuse ring entry, part of the ring queue */
@@ -49,6 +55,9 @@ struct fuse_ring_ent {
*/
unsigned int state;
+ /* The entry needs io_uring_cmd_done for teardown */
+ unsigned int need_cmd_done:1;
+
struct fuse_req *fuse_req;
/* commit id to identify the server reply */
@@ -84,6 +93,9 @@ struct fuse_ring_queue {
/* entries in userspace */
struct list_head ent_in_userspace;
+ /* entries that are released */
+ struct list_head ent_released;
+
/* fuse requests waiting for an entry slot */
struct list_head fuse_req_queue;
--
2.43.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH v8 16/16] fuse: enable fuse-over-io-uring
2024-12-09 14:56 [PATCH v8 00/16] fuse: fuse-over-io-uring Bernd Schubert
` (14 preceding siblings ...)
2024-12-09 14:56 ` [PATCH v8 15/16] fuse: {io-uring} Prevent mount point hang on fuse-server termination Bernd Schubert
@ 2024-12-09 14:56 ` Bernd Schubert
15 siblings, 0 replies; 22+ messages in thread
From: Bernd Schubert @ 2024-12-09 14:56 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
Bernd Schubert
All required parts are handled now, so fuse-io-uring can
be enabled.
Signed-off-by: Bernd Schubert <[email protected]>
---
fs/fuse/dev.c | 3 +++
fs/fuse/dev_uring.c | 3 +--
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 8f8aaf74ee8dfbe8837f48811138d4ff99b44bba..e2b1d7d6ff67c77e029383419783c46cbdb53e78 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -2492,6 +2492,9 @@ const struct file_operations fuse_dev_operations = {
.fasync = fuse_dev_fasync,
.unlocked_ioctl = fuse_dev_ioctl,
.compat_ioctl = compat_ptr_ioctl,
+#ifdef CONFIG_FUSE_IO_URING
+ .uring_cmd = fuse_uring_cmd,
+#endif
};
EXPORT_SYMBOL_GPL(fuse_dev_operations);
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index be7eaf7cc569ff77f8ebdff323634b84ea0a3f63..183917edfc1afe41e806528f762951a0233dd66f 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -1083,8 +1083,7 @@ static int fuse_uring_register(struct io_uring_cmd *cmd,
* Entry function from io_uring to handle the given passthrough command
* (op cocde IORING_OP_URING_CMD)
*/
-int __maybe_unused fuse_uring_cmd(struct io_uring_cmd *cmd,
- unsigned int issue_flags)
+int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
{
struct fuse_dev *fud;
struct fuse_conn *fc;
--
2.43.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH v8 13/16] fuse: Allow to queue fg requests through io-uring
2024-12-09 14:56 ` [PATCH v8 13/16] fuse: Allow to queue fg requests through io-uring Bernd Schubert
@ 2024-12-10 23:14 ` kernel test robot
0 siblings, 0 replies; 22+ messages in thread
From: kernel test robot @ 2024-12-10 23:14 UTC (permalink / raw)
To: Bernd Schubert, Miklos Szeredi
Cc: oe-kbuild-all, Jens Axboe, Pavel Begunkov, linux-fsdevel,
io-uring, Joanne Koong, Josef Bacik, Amir Goldstein, Ming Lei,
David Wei, bernd, Bernd Schubert
Hi Bernd,
kernel test robot noticed the following build warnings:
[auto build test WARNING on e70140ba0d2b1a30467d4af6bcfe761327b9ec95]
url: https://github.com/intel-lab-lkp/linux/commits/Bernd-Schubert/fuse-rename-to-fuse_dev_end_requests-and-make-non-static/20241210-003313
base: e70140ba0d2b1a30467d4af6bcfe761327b9ec95
patch link: https://lore.kernel.org/r/20241209-fuse-uring-for-6-10-rfc4-v8-13-d9f9f2642be3%40ddn.com
patch subject: [PATCH v8 13/16] fuse: Allow to queue fg requests through io-uring
config: m68k-randconfig-r112-20241211 (https://download.01.org/0day-ci/archive/20241211/[email protected]/config)
compiler: m68k-linux-gcc (GCC) 14.2.0
reproduce: (https://download.01.org/0day-ci/archive/20241211/[email protected]/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
sparse warnings: (new ones prefixed by >>)
>> fs/fuse/dev_uring.c:33:30: sparse: sparse: symbol 'fuse_io_uring_ops' was not declared. Should it be static?
vim +/fuse_io_uring_ops +33 fs/fuse/dev_uring.c
32
> 33 const struct fuse_iqueue_ops fuse_io_uring_ops;
34
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v8 06/16] fuse: {io-uring} Handle SQEs - register commands
2024-12-09 14:56 ` [PATCH v8 06/16] fuse: {io-uring} Handle SQEs - register commands Bernd Schubert
@ 2024-12-12 1:29 ` Joanne Koong
0 siblings, 0 replies; 22+ messages in thread
From: Joanne Koong @ 2024-12-12 1:29 UTC (permalink / raw)
To: Bernd Schubert
Cc: Miklos Szeredi, Jens Axboe, Pavel Begunkov, linux-fsdevel,
io-uring, Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd
On Mon, Dec 9, 2024 at 6:57 AM Bernd Schubert <[email protected]> wrote:
>
> This adds basic support for ring SQEs (with opcode=IORING_OP_URING_CMD).
> For now only FUSE_IO_URING_CMD_REGISTER is handled to register queue
> entries.
>
> Signed-off-by: Bernd Schubert <[email protected]>
Nice, thanks for your work on this! Left a few comments below
> ---
> fs/fuse/Kconfig | 12 ++
> fs/fuse/Makefile | 1 +
> fs/fuse/dev_uring.c | 339 ++++++++++++++++++++++++++++++++++++++++++++++
> fs/fuse/dev_uring_i.h | 118 ++++++++++++++++
> fs/fuse/fuse_i.h | 5 +
> fs/fuse/inode.c | 10 ++
> include/uapi/linux/fuse.h | 76 ++++++++++-
> 7 files changed, 560 insertions(+), 1 deletion(-)
>
> diff --git a/fs/fuse/Kconfig b/fs/fuse/Kconfig
> index 8674dbfbe59dbf79c304c587b08ebba3cfe405be..ca215a3cba3e310d1359d069202193acdcdb172b 100644
> --- a/fs/fuse/Kconfig
> +++ b/fs/fuse/Kconfig
> @@ -63,3 +63,15 @@ config FUSE_PASSTHROUGH
> to be performed directly on a backing file.
>
> If you want to allow passthrough operations, answer Y.
> +
> +config FUSE_IO_URING
> + bool "FUSE communication over io-uring"
> + default y
> + depends on FUSE_FS
> + depends on IO_URING
> + help
> + This allows sending FUSE requests over the io-uring interface and
> + also adds request core affinity.
> +
> + If you want to allow fuse server/client communication through io-uring,
> + answer Y
> diff --git a/fs/fuse/Makefile b/fs/fuse/Makefile
> index 2c372180d631eb340eca36f19ee2c2686de9714d..3f0f312a31c1cc200c0c91a086b30a8318e39d94 100644
> --- a/fs/fuse/Makefile
> +++ b/fs/fuse/Makefile
> @@ -15,5 +15,6 @@ fuse-y += iomode.o
> fuse-$(CONFIG_FUSE_DAX) += dax.o
> fuse-$(CONFIG_FUSE_PASSTHROUGH) += passthrough.o
> fuse-$(CONFIG_SYSCTL) += sysctl.o
> +fuse-$(CONFIG_FUSE_IO_URING) += dev_uring.o
>
> virtiofs-y := virtio_fs.o
> diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..f0c5807c94a55f9c9e2aa95ad078724971ddd125
> --- /dev/null
> +++ b/fs/fuse/dev_uring.c
> @@ -0,0 +1,339 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * FUSE: Filesystem in Userspace
> + * Copyright (c) 2023-2024 DataDirect Networks.
> + */
> +
> +#include "fuse_i.h"
> +#include "dev_uring_i.h"
> +#include "fuse_dev_i.h"
> +
> +#include <linux/fs.h>
> +#include <linux/io_uring/cmd.h>
> +
> +#ifdef CONFIG_FUSE_IO_URING
> +static bool __read_mostly enable_uring;
> +module_param(enable_uring, bool, 0644);
> +MODULE_PARM_DESC(enable_uring,
> + "Enable userspace communication through io-uring");
> +#endif
> +
> +#define FUSE_URING_IOV_SEGS 2 /* header and payload */
> +
> +
> +bool fuse_uring_enabled(void)
> +{
> + return enable_uring;
> +}
> +
> +static int fuse_ring_ent_unset_userspace(struct fuse_ring_ent *ent)
Instead of the name fuse_ring_ent_unset_userspace(), what are your
thoughts on naming it fuse_ring_ent_set_commit()?
fuse_ring_ent_set_commit() sounds more representative to me of what
this function is intended for than fuse_ring_ent_unset_userspace(),
especially as it'll also be called by fuse_uring_commit_fetch().
> +{
> + struct fuse_ring_queue *queue = ent->queue;
> +
> + lockdep_assert_held(&queue->lock);
> +
> + if (WARN_ON_ONCE(ent->state != FRRS_USERSPACE))
> + return -EIO;
> +
> + ent->state = FRRS_COMMIT;
> + list_move(&ent->list, &queue->ent_commit_queue);
> +
> + return 0;
> +}
> +
> +void fuse_uring_destruct(struct fuse_conn *fc)
> +{
> + struct fuse_ring *ring = fc->ring;
> + int qid;
> +
> + if (!ring)
> + return;
> +
> + for (qid = 0; qid < ring->nr_queues; qid++) {
> + struct fuse_ring_queue *queue = ring->queues[qid];
> +
> + if (!queue)
> + continue;
> +
> + WARN_ON(!list_empty(&queue->ent_avail_queue));
> + WARN_ON(!list_empty(&queue->ent_commit_queue));
> +
> + kfree(queue);
> + ring->queues[qid] = NULL;
> + }
> +
> + kfree(ring->queues);
> + kfree(ring);
> + fc->ring = NULL;
> +}
> +
> +/*
> + * Basic ring setup for this connection based on the provided configuration
> + */
> +static struct fuse_ring *fuse_uring_create(struct fuse_conn *fc)
> +{
> + struct fuse_ring *ring = NULL;
nit: don't need to set to NULL here since it gets set immediately
> + size_t nr_queues = num_possible_cpus();
> + struct fuse_ring *res = NULL;
> + size_t max_payload_size;
> +
> + ring = kzalloc(sizeof(*fc->ring), GFP_KERNEL_ACCOUNT);
> + if (!ring)
> + return NULL;
> +
> + ring->queues = kcalloc(nr_queues, sizeof(struct fuse_ring_queue *),
> + GFP_KERNEL_ACCOUNT);
> + if (!ring->queues)
> + goto out_err;
> +
> + max_payload_size = max_t(size_t, FUSE_MIN_READ_BUFFER, fc->max_write);
I think we can just use max here instead of max_t since
FUSE_MIN_READ_BUFFER is never negative so the signed to unsigned
promotion will be okay
> + max_payload_size =
> + max_t(size_t, max_payload_size, fc->max_pages * PAGE_SIZE);
Same here, i think we can just use max here instead of max_t
> +
> + spin_lock(&fc->lock);
> + if (fc->ring) {
> + /* race, another thread created the ring in the meantime */
> + spin_unlock(&fc->lock);
> + res = fc->ring;
> + goto out_err;
> + }
> +
> + fc->ring = ring;
> + ring->nr_queues = nr_queues;
> + ring->fc = fc;
> + ring->max_payload_sz = max_payload_size;
> +
> + spin_unlock(&fc->lock);
> + return ring;
> +
> +out_err:
> + kfree(ring->queues);
> + kfree(ring);
> + return res;
> +}
> +
> +static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
> + int qid)
> +{
> + struct fuse_conn *fc = ring->fc;
> + struct fuse_ring_queue *queue;
> +
> + queue = kzalloc(sizeof(*queue), GFP_KERNEL_ACCOUNT);
> + if (!queue)
> + return ERR_PTR(-ENOMEM);
I think we need to return NULL here since fuse_uring_register() checks
"if (!queue)" for error
> + queue->qid = qid;
> + queue->ring = ring;
> + spin_lock_init(&queue->lock);
> +
> + INIT_LIST_HEAD(&queue->ent_avail_queue);
> + INIT_LIST_HEAD(&queue->ent_commit_queue);
> +
> + spin_lock(&fc->lock);
> + if (ring->queues[qid]) {
> + spin_unlock(&fc->lock);
> + kfree(queue);
> + return ring->queues[qid];
> + }
> +
> + WRITE_ONCE(ring->queues[qid], queue);
Thanks for your explanation on v7 about why this needs WRITE_ONCE.
Might be worth including that as a comment here for future readers.
> + spin_unlock(&fc->lock);
> +
> + return queue;
> +}
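To illustrate the ERR_PTR(-ENOMEM) versus "if (!queue)" mismatch noted
above for fuse_uring_create_queue(): an errno-encoded pointer is not
NULL, so a NULL check in the caller does not see the failure. A minimal
userspace sketch follows. The model_* helpers are hypothetical
simplified stand-ins for the kernel's ERR_PTR()/IS_ERR(), and casting a
negative integer to a pointer is implementation-defined in standard C,
although it behaves as shown on common platforms.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer (models ERR_PTR()) */
static inline void *model_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* True if the pointer lies in the errno range (models IS_ERR()) */
static inline bool model_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Allocation failure reported as an encoded errno */
static void *model_create_queue_errptr(void)
{
	return model_err_ptr(-ENOMEM);
}

/* Allocation failure reported as NULL */
static void *model_create_queue_null(void)
{
	return NULL;
}

/* Caller that only checks for NULL, like "if (!queue)" */
static bool model_caller_sees_failure(const void *queue)
{
	return queue == NULL;
}
```

A caller that only tests for NULL treats the encoded error as a valid
queue pointer. Either the callee returns NULL or the caller switches to
an IS_ERR()-style check. Mixing the two conventions hides the error.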
> +
> +/*
> + * Make a ring entry available for fuse_req assignment
> + */
> +static void fuse_uring_ent_avail(struct fuse_ring_ent *ring_ent,
> + struct fuse_ring_queue *queue)
> +{
> + list_move(&ring_ent->list, &queue->ent_avail_queue);
> + ring_ent->state = FRRS_WAIT;
Just curious what your thoughts are on this - would it make sense to
rename FRRS_WAIT to FRRS_AVAILABLE? It seems like FRRS_WAIT is the
state where the entry is available for new requests, and
FRRS_AVAILABLE might be more descriptive of a name than FRRS_WAIT?
Feel free to nix the idea though if you hate it
> +}
> +
> +/*
> + * fuse_uring_req_fetch command handling
> + */
> +static void _fuse_uring_register(struct fuse_ring_ent *ring_ent,
> + struct io_uring_cmd *cmd,
> + unsigned int issue_flags)
> +{
> + struct fuse_ring_queue *queue = ring_ent->queue;
> +
> + spin_lock(&queue->lock);
> + fuse_uring_ent_avail(ring_ent, queue);
> + spin_unlock(&queue->lock);
> +}
> +
> +/*
> + * sqe->addr is a ptr to an iovec array, iov[0] has the headers, iov[1]
> + * the payload
> + */
> +static int fuse_uring_get_iovec_from_sqe(const struct io_uring_sqe *sqe,
> + struct iovec iov[FUSE_URING_IOV_SEGS])
> +{
> + struct iovec __user *uiov = u64_to_user_ptr(READ_ONCE(sqe->addr));
> + struct iov_iter iter;
> + ssize_t ret;
> +
> + if (sqe->len != FUSE_URING_IOV_SEGS)
> + return -EINVAL;
> +
> + /*
> + * Direction for buffer access will actually be READ and WRITE,
> + * using write for the import should include READ access as well.
> + */
> + ret = import_iovec(WRITE, uiov, FUSE_URING_IOV_SEGS,
> + FUSE_URING_IOV_SEGS, &iov, &iter);
> + if (ret < 0)
> + return ret;
> +
> + return 0;
> +}
> +
> +/* Register header and payload buffer with the kernel and fetch a request */
> +static int fuse_uring_register(struct io_uring_cmd *cmd,
> + unsigned int issue_flags, struct fuse_conn *fc)
> +{
> + const struct fuse_uring_cmd_req *cmd_req = io_uring_sqe_cmd(cmd->sqe);
> + struct fuse_ring *ring = fc->ring;
> + struct fuse_ring_queue *queue;
> + struct fuse_ring_ent *ring_ent;
> + int err;
> + struct iovec iov[FUSE_URING_IOV_SEGS];
> + size_t payload_size;
> + unsigned int qid = READ_ONCE(cmd_req->qid);
Why do we need READ_ONCE()? I looked at the ublk_drv.c code and they
do this too for some io_uring_sqe_cmd()s but not for others. My (maybe
wrong) understanding is that cmd_req->qid won't ever be concurrently
modified?
> +
> + err = fuse_uring_get_iovec_from_sqe(cmd->sqe, iov);
> + if (err) {
> + pr_info_ratelimited("Failed to get iovec from sqe, err=%d\n",
> + err);
> + return err;
> + }
> +
> + err = -ENOMEM;
> + if (!ring) {
> + ring = fuse_uring_create(fc);
> + if (!ring)
> + return err;
> + }
> +
> + if (qid >= ring->nr_queues) {
> + pr_info_ratelimited("fuse: Invalid ring qid %u\n", qid);
> + return -EINVAL;
> + }
> +
> + err = -ENOMEM;
> + queue = ring->queues[qid];
> + if (!queue) {
> + queue = fuse_uring_create_queue(ring, qid);
> + if (!queue)
> + return err;
> + }
> +
> + /*
> + * The created queue above does not need to be destructed in
> + * case of entry errors below, will be done at ring destruction time.
> + */
> +
> + ring_ent = kzalloc(sizeof(*ring_ent), GFP_KERNEL_ACCOUNT);
> + if (!ring_ent)
> + return err;
> +
> + INIT_LIST_HEAD(&ring_ent->list);
> +
> + ring_ent->queue = queue;
> + ring_ent->cmd = cmd;
> +
> + err = -EINVAL;
> + if (iov[0].iov_len < sizeof(struct fuse_uring_req_header)) {
> + pr_info_ratelimited("Invalid header len %zu\n", iov[0].iov_len);
> + goto err;
> + }
> +
> + ring_ent->headers = iov[0].iov_base;
> + ring_ent->payload = iov[1].iov_base;
> + payload_size = iov[1].iov_len;
> +
> + if (payload_size < ring->max_payload_sz) {
> + pr_info_ratelimited("Invalid req payload len %zu\n",
> + payload_size);
> + goto err;
> + }
> +
> + spin_lock(&queue->lock);
> +
> + /*
> + * FUSE_IO_URING_CMD_REGISTER is an initialization exception, needs
> + * state override
> + */
> + ring_ent->state = FRRS_USERSPACE;
> + err = fuse_ring_ent_unset_userspace(ring_ent);
> + spin_unlock(&queue->lock);
> + if (WARN_ON_ONCE(err))
imo, the WARN_ON_ONCE isn't necessary since this condition has the
WARN_ON_ONCE() already in fuse_ring_ent_unset_userspace()
> + goto err;
> +
This looks good to me but it might look even cleaner to move the
ring_ent logic into another function and then call that here.
> + _fuse_uring_register(ring_ent, cmd, issue_flags);
IMO, _fuse_uring_register() as a function name is too similar to
fuse_uring_register(). Maybe "fuse_uring_do_register()" instead? kind
of like how there's fuse_dev_write() and fuse_dev_do_write()?
> +
> + return 0;
> +err:
> + list_del_init(&ring_ent->list);
> + kfree(ring_ent);
> + return err;
> +}
> +
> +/*
> + * Entry function from io_uring to handle the given passthrough command
> + * (op cocde IORING_OP_URING_CMD)
nit: "cocde" -> "code"
> + */
> +int __maybe_unused fuse_uring_cmd(struct io_uring_cmd *cmd,
> + unsigned int issue_flags)
> +{
> + struct fuse_dev *fud;
> + struct fuse_conn *fc;
> + u32 cmd_op = cmd->cmd_op;
> + int err;
> +
> + if (!enable_uring) {
> + pr_info_ratelimited("fuse-io-uring is disabled\n");
> + return -EOPNOTSUPP;
> + }
> +
> + /* This extra SQE size holds struct fuse_uring_cmd_req */
> + if (!(issue_flags & IO_URING_F_SQE128))
> + return -EINVAL;
> +
> + fud = fuse_get_dev(cmd->file);
> + if (!fud) {
> + pr_info_ratelimited("No fuse device found\n");
> + return -ENOTCONN;
> + }
> + fc = fud->fc;
> +
> + if (fc->aborted)
> + return -ECONNABORTED;
> + if (!fc->connected)
> + return -ENOTCONN;
> +
> + /*
> + * fuse_uring_register() needs the ring to be initialized,
> + * we need to know the max payload size
> + */
Does this comment belong here?
> + if (!fc->initialized)
> + return -EAGAIN;
> +
> + switch (cmd_op) {
> + case FUSE_IO_URING_CMD_REGISTER:
Nice, this opcode name seems a lot more clear to me.
> + err = fuse_uring_register(cmd, issue_flags, fc);
> + if (err) {
> + pr_info_once("FUSE_IO_URING_CMD_REGISTER failed err=%d\n",
pr_info instead of pr_info_once seems more useful here. My
understanding of pr_info_once is that this message would get printed
only once during the kernel's lifetime, but there could be multiple
fuse servers wanting to use io-uring
> + err);
> + return err;
> + }
> + break;
> + default:
> + return -EINVAL;
> + }
> +
> + return -EIOCBQUEUED;
> +}
> diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
> new file mode 100644
> index 0000000000000000000000000000000000000000..73e9e3063bb038e8341d85cd2a440421275e6aa8
> --- /dev/null
> +++ b/fs/fuse/dev_uring_i.h
> @@ -0,0 +1,118 @@
> +/* SPDX-License-Identifier: GPL-2.0
> + *
> + * FUSE: Filesystem in Userspace
> + * Copyright (c) 2023-2024 DataDirect Networks.
> + */
> +
> +#ifndef _FS_FUSE_DEV_URING_I_H
> +#define _FS_FUSE_DEV_URING_I_H
> +
> +#include "fuse_i.h"
> +
> +#ifdef CONFIG_FUSE_IO_URING
> +
> +enum fuse_ring_req_state {
> + FRRS_INVALID = 0,
> +
> + /* The ring entry received from userspace and it is being processed */
> + FRRS_COMMIT,
> +
> + /* The ring entry is waiting for new fuse requests */
> + FRRS_WAIT,
> +
> + /* The ring entry is in or on the way to user space */
> + FRRS_USERSPACE,
> +};
> +
> +/** A fuse ring entry, part of the ring queue */
> +struct fuse_ring_ent {
> + /* userspace buffer */
> + struct fuse_uring_req_header __user *headers;
> + void *__user *payload;
Is this supposed to be void __user *payload or void *__user *payload?
i see the definition for iovec as
struct iovec {
void *iov_base;
size_t iov_len;
};
and then in fuse_uring_register() we do "ring_ent->payload =
iov[1].iov_base". It seems like this should be "void __user *payload"?
> +
> + /* the ring queue that owns the request */
> + struct fuse_ring_queue *queue;
> +
> + struct io_uring_cmd *cmd;
> +
> + struct list_head list;
> +
> + /*
> + * state the request is currently in
> + * (enum fuse_ring_req_state)
> + */
> + unsigned int state;
Any reason why we don't define this as "enum fuse_ring_req_state
state;"? Then we could get rid of that 2nd line in the comment as well
Might also be worth including a comment here that it's protected by
the ring queue spinlock.
> +
> + struct fuse_req *fuse_req;
> +
> + /* commit id to identify the server reply */
> + uint64_t commit_id;
> +};
> +
> +struct fuse_ring_queue {
> + /*
> + * back pointer to the main fuse uring structure that holds this
> + * queue
> + */
> + struct fuse_ring *ring;
> +
> + /* queue id, typically also corresponds to the cpu core */
If I'm understanding it correctly, qid will always correspond to the
cpu core, correct? Should we get rid of "typically" here? I think that
sets the expectation that it might not.
> + unsigned int qid;
> +
> + /*
> + * queue lock, taken when any value in the queue changes _and_ also
> + * a ring entry state changes.
> + */
> + spinlock_t lock;
> +
> + /* available ring entries (struct fuse_ring_ent) */
> + struct list_head ent_avail_queue;
IMO, the names could just be "avail_queue" and "commit_queue"
instead of "ent_avail_queue" and "ent_commit_queue".
> +
> + /*
> + * entries in the process of being committed or in the process
> + * to be send to userspace
nit: "send" -> "sent"
> + */
> + struct list_head ent_commit_queue;
> +};
> +
> +/**
> + * Describes if uring is for communication and holds alls the data needed
> + * for uring communication
> + */
IMO, this could just be "Holds all the data needed for uring
communication". I think the first part of this comment (e.g. "describes
if uring is for communication") applies more to the "bool
fuse_uring_enabled(void);" line.
> +struct fuse_ring {
> + /* back pointer */
> + struct fuse_conn *fc;
> +
> + /* number of ring queues */
> + size_t nr_queues;
> +
> + /* maximum payload/arg size */
> + size_t max_payload_sz;
> +
> + struct fuse_ring_queue **queues;
> +};
> +
> +bool fuse_uring_enabled(void);
> +void fuse_uring_destruct(struct fuse_conn *fc);
> +int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags);
> +
> +#else /* CONFIG_FUSE_IO_URING */
> +
> +struct fuse_ring;
> +
> +static inline void fuse_uring_create(struct fuse_conn *fc)
> +{
> +}
> +
> +static inline void fuse_uring_destruct(struct fuse_conn *fc)
> +{
> +}
> +
> +static inline bool fuse_uring_enabled(void)
> +{
> + return false;
> +}
> +
> +#endif /* CONFIG_FUSE_IO_URING */
> +
> +#endif /* _FS_FUSE_DEV_URING_I_H */
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index babddd05303796d689a64f0f5a890066b43170ac..d75dd9b59a5c35b76919db760645464f604517f5 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -923,6 +923,11 @@ struct fuse_conn {
> /** IDR for backing files ids */
> struct idr backing_files_map;
> #endif
> +
> +#ifdef CONFIG_FUSE_IO_URING
> + /** uring connection information*/
> + struct fuse_ring *ring;
> +#endif
> };
>
> /*
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index 3ce4f4e81d09e867c3a7db7b1dbb819f88ed34ef..e4f9bbacfc1bc6f51d5d01b4c47b42cc159ed783 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -7,6 +7,7 @@
> */
>
> #include "fuse_i.h"
> +#include "dev_uring_i.h"
>
> #include <linux/pagemap.h>
> #include <linux/slab.h>
> @@ -992,6 +993,8 @@ static void delayed_release(struct rcu_head *p)
> {
> struct fuse_conn *fc = container_of(p, struct fuse_conn, rcu);
>
> + fuse_uring_destruct(fc);
> +
> put_user_ns(fc->user_ns);
> fc->release(fc);
> }
> @@ -1446,6 +1449,13 @@ void fuse_send_init(struct fuse_mount *fm)
> if (IS_ENABLED(CONFIG_FUSE_PASSTHROUGH))
> flags |= FUSE_PASSTHROUGH;
>
> + /*
> + * This is just an information flag for fuse server. No need to check
> + * the reply - server is either sending IORING_OP_URING_CMD or not.
> + */
> + if (fuse_uring_enabled())
> + flags |= FUSE_OVER_IO_URING;
> +
> ia->in.flags = flags;
> ia->in.flags2 = flags >> 32;
>
> diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
> index f1e99458e29e4fdce5273bc3def242342f207ebd..388cb4b93f48575d5e57c27b02f59a80e2fbe93c 100644
> --- a/include/uapi/linux/fuse.h
> +++ b/include/uapi/linux/fuse.h
> @@ -220,6 +220,15 @@
> *
> * 7.41
> * - add FUSE_ALLOW_IDMAP
> + * 7.42
> + * - Add FUSE_OVER_IO_URING and all other io-uring related flags and data
> + * structures:
> + * - struct fuse_uring_ent_in_out
> + * - struct fuse_uring_req_header
> + * - struct fuse_uring_cmd_req
> + * - FUSE_URING_IN_OUT_HEADER_SZ
> + * - FUSE_URING_OP_IN_OUT_SZ
> + * - enum fuse_uring_cmd
> */
>
> #ifndef _LINUX_FUSE_H
> @@ -255,7 +264,7 @@
> #define FUSE_KERNEL_VERSION 7
>
> /** Minor version number of this interface */
> -#define FUSE_KERNEL_MINOR_VERSION 41
> +#define FUSE_KERNEL_MINOR_VERSION 42
>
> /** The node ID of the root inode */
> #define FUSE_ROOT_ID 1
> @@ -425,6 +434,7 @@ struct fuse_file_lock {
> * FUSE_HAS_RESEND: kernel supports resending pending requests, and the high bit
> * of the request ID indicates resend requests
> * FUSE_ALLOW_IDMAP: allow creation of idmapped mounts
> + * FUSE_OVER_IO_URING: Indicate that Client supports io-uring
nit: "Client" -> "client"
> */
> #define FUSE_ASYNC_READ (1 << 0)
> #define FUSE_POSIX_LOCKS (1 << 1)
> @@ -471,6 +481,7 @@ struct fuse_file_lock {
> /* Obsolete alias for FUSE_DIRECT_IO_ALLOW_MMAP */
> #define FUSE_DIRECT_IO_RELAX FUSE_DIRECT_IO_ALLOW_MMAP
> #define FUSE_ALLOW_IDMAP (1ULL << 40)
> +#define FUSE_OVER_IO_URING (1ULL << 41)
>
> /**
> * CUSE INIT request/reply flags
> @@ -1206,4 +1217,67 @@ struct fuse_supp_groups {
> uint32_t groups[];
> };
>
> +/**
> + * Size of the ring buffer header
> + */
> +#define FUSE_URING_IN_OUT_HEADER_SZ 128
> +#define FUSE_URING_OP_IN_OUT_SZ 128
> +
> +struct fuse_uring_ent_in_out {
> + uint64_t flags;
> +
> + /*
> + * commit ID to be used in a reply to a ring request (see also
> + * struct fuse_uring_cmd_req)
> + */
> + uint64_t commit_id;
> +
> + /* size of use payload buffer */
nit: "use" -> "user"
> + uint32_t payload_sz;
> + uint32_t padding;
> +
> + uint64_t reserved;
> +};
If I'm understanding it correctly, this is a header specific to a
fuse-uring entry? Might be worth including that as a comment at the
top, just to be explicit. It took me a bit of digging to figure out
that this is to be used as a header.
> +
> +/**
> + * Header for all fuse-io-uring requests
> + */
> +struct fuse_uring_req_header {
> + /* struct fuse_in / struct fuse_out */
> + char in_out[FUSE_URING_IN_OUT_HEADER_SZ];
Does this hold struct fuse_in_header / struct fuse_out_header? (I
see the comment says "struct fuse_in / struct fuse_out", but I don't
see those structs defined anywhere but maybe I'm missing something)
> +
> + /* per op code structs */
IMO, "per op header" would be a more descriptive comment.
> + char op_in[FUSE_URING_OP_IN_OUT_SZ];
> +
> + /* struct fuse_ring_in_out */
> + char ring_ent_in_out[sizeof(struct fuse_uring_ent_in_out)];
Just curious, is there a reason this can't be "struct
fuse_uring_ent_in_out ent_in_out;" instead of having it defined as a
char array?
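As a rough check of what the typed member would mean for the layout, the sketch below compares the two variants in userspace. The struct names req_header_as_posted and req_header_typed are hypothetical; the field definitions and size macros are copied from the quoted patch. On common ABIs both variants have the same size and the same member offset; the typed variant does raise the alignment requirement of the containing struct from 1 to that of uint64_t.

```c
#include <stddef.h>
#include <stdint.h>

#define FUSE_URING_IN_OUT_HEADER_SZ 128
#define FUSE_URING_OP_IN_OUT_SZ 128

struct fuse_uring_ent_in_out {
	uint64_t flags;
	uint64_t commit_id;
	uint32_t payload_sz;
	uint32_t padding;
	uint64_t reserved;
};

/* Layout as posted in the patch: three opaque char arrays. */
struct req_header_as_posted {
	char in_out[FUSE_URING_IN_OUT_HEADER_SZ];
	char op_in[FUSE_URING_OP_IN_OUT_SZ];
	char ring_ent_in_out[sizeof(struct fuse_uring_ent_in_out)];
};

/* Hypothetical variant with a typed member instead of a char array. */
struct req_header_typed {
	char in_out[FUSE_URING_IN_OUT_HEADER_SZ];
	char op_in[FUSE_URING_OP_IN_OUT_SZ];
	struct fuse_uring_ent_in_out ring_ent_in_out;
};

/* Both variants are expected to describe the same bytes. */
_Static_assert(sizeof(struct req_header_as_posted) ==
	       sizeof(struct req_header_typed),
	       "typed member must not change the header size");
_Static_assert(offsetof(struct req_header_as_posted, ring_ent_in_out) ==
	       offsetof(struct req_header_typed, ring_ent_in_out),
	       "typed member must not move the entry header");
```

If the static assertions hold on the targeted architectures, the change would be a readability improvement without an ABI change.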
> +};
> +
> +/**
> + * sqe commands to the kernel
> + */
> +enum fuse_uring_cmd {
> + FUSE_IO_URING_CMD_INVALID = 0,
> +
> + /* register the request buffer and fetch a fuse request */
> + FUSE_IO_URING_CMD_REGISTER = 1,
> +
> + /* commit fuse request result and fetch next request */
> + FUSE_IO_URING_CMD_COMMIT_AND_FETCH = 2,
> +};
> +
> +/**
> + * In the 80B command area of the SQE.
> + */
> +struct fuse_uring_cmd_req {
> + uint64_t flags;
> +
> + /* entry identifier for commits */
> + uint64_t commit_id;
> +
> + /* queue the command is for (queue index) */
> + uint16_t qid;
> + uint8_t padding[6];
> +};
> +
> #endif /* _LINUX_FUSE_H */
>
> --
> 2.43.0
>
* Re: [PATCH v8 07/16] fuse: Make fuse_copy non static
2024-12-09 14:56 ` [PATCH v8 07/16] fuse: Make fuse_copy non static Bernd Schubert
@ 2024-12-13 0:50 ` Joanne Koong
0 siblings, 0 replies; 22+ messages in thread
From: Joanne Koong @ 2024-12-13 0:50 UTC (permalink / raw)
To: Bernd Schubert
Cc: Miklos Szeredi, Jens Axboe, Pavel Begunkov, linux-fsdevel,
io-uring, Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd
On Mon, Dec 9, 2024 at 6:57 AM Bernd Schubert <[email protected]> wrote:
>
> Move 'struct fuse_copy_state' and fuse_copy_* functions
> to fuse_dev_i.h to make it available for fuse-io-uring.
> 'copy_out_args()' is renamed to 'fuse_copy_out_args'.
>
> Signed-off-by: Bernd Schubert <[email protected]>
LGTM.
Reviewed-by: Joanne Koong <[email protected]>
> ---
> fs/fuse/dev.c | 30 ++++++++----------------------
> fs/fuse/fuse_dev_i.h | 25 +++++++++++++++++++++++++
> 2 files changed, 33 insertions(+), 22 deletions(-)
>
> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> index 623c5a067c1841e8210b5b4e063e7b6690f1825a..6ee7e28a84c80a3e7c8dc933986c0388371ff6cd 100644
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -678,22 +678,8 @@ static int unlock_request(struct fuse_req *req)
> return err;
> }
>
> -struct fuse_copy_state {
> - int write;
> - struct fuse_req *req;
> - struct iov_iter *iter;
> - struct pipe_buffer *pipebufs;
> - struct pipe_buffer *currbuf;
> - struct pipe_inode_info *pipe;
> - unsigned long nr_segs;
> - struct page *pg;
> - unsigned len;
> - unsigned offset;
> - unsigned move_pages:1;
> -};
> -
> -static void fuse_copy_init(struct fuse_copy_state *cs, int write,
> - struct iov_iter *iter)
> +void fuse_copy_init(struct fuse_copy_state *cs, int write,
> + struct iov_iter *iter)
> {
> memset(cs, 0, sizeof(*cs));
> cs->write = write;
> @@ -1054,9 +1040,9 @@ static int fuse_copy_one(struct fuse_copy_state *cs, void *val, unsigned size)
> }
>
> /* Copy request arguments to/from userspace buffer */
> -static int fuse_copy_args(struct fuse_copy_state *cs, unsigned numargs,
> - unsigned argpages, struct fuse_arg *args,
> - int zeroing)
> +int fuse_copy_args(struct fuse_copy_state *cs, unsigned numargs,
> + unsigned argpages, struct fuse_arg *args,
> + int zeroing)
> {
> int err = 0;
> unsigned i;
> @@ -1933,8 +1919,8 @@ static struct fuse_req *request_find(struct fuse_pqueue *fpq, u64 unique)
> return NULL;
> }
>
> -static int copy_out_args(struct fuse_copy_state *cs, struct fuse_args *args,
> - unsigned nbytes)
> +int fuse_copy_out_args(struct fuse_copy_state *cs, struct fuse_args *args,
> + unsigned nbytes)
> {
> unsigned reqsize = sizeof(struct fuse_out_header);
>
> @@ -2036,7 +2022,7 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud,
> if (oh.error)
> err = nbytes != sizeof(oh) ? -EINVAL : 0;
> else
> - err = copy_out_args(cs, req->args, nbytes);
> + err = fuse_copy_out_args(cs, req->args, nbytes);
> fuse_copy_finish(cs);
>
> spin_lock(&fpq->lock);
> diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
> index 08a7e88e002773fcd18c25a229c7aa6450831401..21eb1bdb492d04f0a406d25bb8d300b34244dce2 100644
> --- a/fs/fuse/fuse_dev_i.h
> +++ b/fs/fuse/fuse_dev_i.h
> @@ -12,6 +12,23 @@
> #define FUSE_INT_REQ_BIT (1ULL << 0)
> #define FUSE_REQ_ID_STEP (1ULL << 1)
>
> +struct fuse_arg;
> +struct fuse_args;
> +
> +struct fuse_copy_state {
> + int write;
> + struct fuse_req *req;
> + struct iov_iter *iter;
> + struct pipe_buffer *pipebufs;
> + struct pipe_buffer *currbuf;
> + struct pipe_inode_info *pipe;
> + unsigned long nr_segs;
> + struct page *pg;
> + unsigned int len;
> + unsigned int offset;
> + unsigned int move_pages:1;
> +};
> +
> static inline struct fuse_dev *fuse_get_dev(struct file *file)
> {
> /*
> @@ -23,5 +40,13 @@ static inline struct fuse_dev *fuse_get_dev(struct file *file)
>
> void fuse_dev_end_requests(struct list_head *head);
>
> +void fuse_copy_init(struct fuse_copy_state *cs, int write,
> + struct iov_iter *iter);
nit: indentation of this line is misaligned
> +int fuse_copy_args(struct fuse_copy_state *cs, unsigned int numargs,
> + unsigned int argpages, struct fuse_arg *args,
> + int zeroing);
> +int fuse_copy_out_args(struct fuse_copy_state *cs, struct fuse_args *args,
> + unsigned int nbytes);
> +
> #endif
>
>
> --
> 2.43.0
>
* Re: [PATCH v8 08/16] fuse: Add fuse-io-uring handling into fuse_copy
2024-12-09 14:56 ` [PATCH v8 08/16] fuse: Add fuse-io-uring handling into fuse_copy Bernd Schubert
@ 2024-12-13 1:25 ` Joanne Koong
0 siblings, 0 replies; 22+ messages in thread
From: Joanne Koong @ 2024-12-13 1:25 UTC (permalink / raw)
To: Bernd Schubert
Cc: Miklos Szeredi, Jens Axboe, Pavel Begunkov, linux-fsdevel,
io-uring, Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd
On Mon, Dec 9, 2024 at 6:57 AM Bernd Schubert <[email protected]> wrote:
>
> Add special fuse-io-uring into the fuse argument
> copy handler.
>
> Signed-off-by: Bernd Schubert <[email protected]>
> ---
> fs/fuse/dev.c | 12 +++++++++++-
> fs/fuse/fuse_dev_i.h | 5 +++++
> 2 files changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> index 6ee7e28a84c80a3e7c8dc933986c0388371ff6cd..2ba153054f7ba61a870c847cb87d81168220661f 100644
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -786,6 +786,9 @@ static int fuse_copy_do(struct fuse_copy_state *cs, void **val, unsigned *size)
> *size -= ncpy;
> cs->len -= ncpy;
> cs->offset += ncpy;
> + if (cs->is_uring)
> + cs->ring.offset += ncpy;
> +
> return ncpy;
> }
>
> @@ -1922,7 +1925,14 @@ static struct fuse_req *request_find(struct fuse_pqueue *fpq, u64 unique)
> int fuse_copy_out_args(struct fuse_copy_state *cs, struct fuse_args *args,
> unsigned nbytes)
> {
> - unsigned reqsize = sizeof(struct fuse_out_header);
> +
> + unsigned int reqsize = 0;
> +
> + /*
> + * Uring has all headers separated from args - args is payload only
> + */
> + if (!cs->is_uring)
> + reqsize = sizeof(struct fuse_out_header);
>
> reqsize += fuse_len_args(args->out_numargs, args->out_args);
>
> diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
> index 21eb1bdb492d04f0a406d25bb8d300b34244dce2..0708730b656b97071de9a5331ef4d51a112c602c 100644
> --- a/fs/fuse/fuse_dev_i.h
> +++ b/fs/fuse/fuse_dev_i.h
> @@ -27,6 +27,11 @@ struct fuse_copy_state {
> unsigned int len;
> unsigned int offset;
> unsigned int move_pages:1;
> + unsigned int is_uring:1;
> + struct {
> + /* overall offset with the user buffer */
> + unsigned int offset;
> + } ring;
I find it a bit unintuitive that this is named offset when it's used
only to keep track of the payload size. Maybe this should be renamed?
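To make the naming concern concrete, here is a simplified userspace sketch of the accounting in the tail of fuse_copy_do(). It is not the kernel function: copy_state_sketch, copy_do_sketch() and the field name copied_sz are hypothetical, and the copy is a plain memcpy into a segment buffer. The per-segment offset restarts for each new segment, while the ring counter only ever grows, which is why it behaves like a running payload size.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily trimmed copy state; not the kernel's struct. */
struct copy_state_sketch {
	char *buf;		/* current segment */
	unsigned int len;	/* bytes left in the current segment */
	unsigned int offset;	/* position inside the current segment */
	unsigned int is_uring:1;
	struct {
		/* total payload bytes copied so far (hypothetical name) */
		unsigned int copied_sz;
	} ring;
};

/* Copy as much of *val as fits into the current segment. */
static unsigned int copy_do_sketch(struct copy_state_sketch *cs,
				   const char **val, unsigned int *size)
{
	unsigned int ncpy = *size < cs->len ? *size : cs->len;

	memcpy(cs->buf + cs->offset, *val, ncpy);
	*val += ncpy;
	*size -= ncpy;
	cs->len -= ncpy;
	cs->offset += ncpy;
	if (cs->is_uring)
		cs->ring.copied_sz += ncpy;

	return ncpy;
}
```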
Thanks,
Joanne
> };
>
> static inline struct fuse_dev *fuse_get_dev(struct file *file)
>
> --
> 2.43.0
>
* Re: [PATCH v8 09/16] fuse: {io-uring} Make hash-list req unique finding functions non-static
2024-12-09 14:56 ` [PATCH v8 09/16] fuse: {io-uring} Make hash-list req unique finding functions non-static Bernd Schubert
@ 2024-12-13 1:41 ` Joanne Koong
0 siblings, 0 replies; 22+ messages in thread
From: Joanne Koong @ 2024-12-13 1:41 UTC (permalink / raw)
To: Bernd Schubert
Cc: Miklos Szeredi, Jens Axboe, Pavel Begunkov, linux-fsdevel,
io-uring, Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd
On Mon, Dec 9, 2024 at 6:57 AM Bernd Schubert <[email protected]> wrote:
>
> fuse-over-io-uring uses existing functions to find requests based
> on their unique id - make these functions non-static.
>
> Signed-off-by: Bernd Schubert <[email protected]>
Reviewed-by: Joanne Koong <[email protected]>
> ---
> fs/fuse/dev.c | 6 +++---
> fs/fuse/fuse_dev_i.h | 6 ++++++
> fs/fuse/fuse_i.h | 5 +++++
> fs/fuse/inode.c | 2 +-
> 4 files changed, 15 insertions(+), 4 deletions(-)
>
> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> index 2ba153054f7ba61a870c847cb87d81168220661f..a45d92431769d4aadaf5c5792086abc5dda3c048 100644
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -220,7 +220,7 @@ u64 fuse_get_unique(struct fuse_iqueue *fiq)
> }
> EXPORT_SYMBOL_GPL(fuse_get_unique);
>
> -static unsigned int fuse_req_hash(u64 unique)
> +unsigned int fuse_req_hash(u64 unique)
> {
> return hash_long(unique & ~FUSE_INT_REQ_BIT, FUSE_PQ_HASH_BITS);
> }
> @@ -1910,7 +1910,7 @@ static int fuse_notify(struct fuse_conn *fc, enum fuse_notify_code code,
> }
>
> /* Look up request on processing list by unique ID */
> -static struct fuse_req *request_find(struct fuse_pqueue *fpq, u64 unique)
> +struct fuse_req *fuse_request_find(struct fuse_pqueue *fpq, u64 unique)
> {
> unsigned int hash = fuse_req_hash(unique);
> struct fuse_req *req;
> @@ -1994,7 +1994,7 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud,
> spin_lock(&fpq->lock);
> req = NULL;
> if (fpq->connected)
> - req = request_find(fpq, oh.unique & ~FUSE_INT_REQ_BIT);
> + req = fuse_request_find(fpq, oh.unique & ~FUSE_INT_REQ_BIT);
>
> err = -ENOENT;
> if (!req) {
> diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
> index 0708730b656b97071de9a5331ef4d51a112c602c..d7bf72dabd84c3896d1447380649e2f4d20b0643 100644
> --- a/fs/fuse/fuse_dev_i.h
> +++ b/fs/fuse/fuse_dev_i.h
> @@ -7,6 +7,7 @@
> #define _FS_FUSE_DEV_I_H
>
> #include <linux/types.h>
> +#include <linux/fs.h>
Is this include needed?
>
> /* Ordinary requests have even IDs, while interrupts IDs are odd */
> #define FUSE_INT_REQ_BIT (1ULL << 0)
> @@ -14,6 +15,8 @@
>
> struct fuse_arg;
> struct fuse_args;
> +struct fuse_pqueue;
> +struct fuse_req;
>
> struct fuse_copy_state {
> int write;
> @@ -43,6 +46,9 @@ static inline struct fuse_dev *fuse_get_dev(struct file *file)
> return READ_ONCE(file->private_data);
> }
>
> +unsigned int fuse_req_hash(u64 unique);
> +struct fuse_req *fuse_request_find(struct fuse_pqueue *fpq, u64 unique);
> +
> void fuse_dev_end_requests(struct list_head *head);
>
> void fuse_copy_init(struct fuse_copy_state *cs, int write,
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index d75dd9b59a5c35b76919db760645464f604517f5..e545b0864dd51e82df61cc39bdf65d3d36a418dc 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -1237,6 +1237,11 @@ void fuse_change_entry_timeout(struct dentry *entry, struct fuse_entry_out *o);
> */
> struct fuse_conn *fuse_conn_get(struct fuse_conn *fc);
>
> +/**
> + * Initialize the fuse processing queue
> + */
> +void fuse_pqueue_init(struct fuse_pqueue *fpq);
> +
> /**
> * Initialize fuse_conn
> */
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index e4f9bbacfc1bc6f51d5d01b4c47b42cc159ed783..328797b9aac9a816a4ad2c69b6880dc6ef6222b0 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -938,7 +938,7 @@ static void fuse_iqueue_init(struct fuse_iqueue *fiq,
> fiq->priv = priv;
> }
>
> -static void fuse_pqueue_init(struct fuse_pqueue *fpq)
> +void fuse_pqueue_init(struct fuse_pqueue *fpq)
> {
> unsigned int i;
>
>
> --
> 2.43.0
>
Thread overview: 22+ messages
2024-12-09 14:56 [PATCH v8 00/16] fuse: fuse-over-io-uring Bernd Schubert
2024-12-09 14:56 ` [PATCH v8 01/16] fuse: rename to fuse_dev_end_requests and make non-static Bernd Schubert
2024-12-09 14:56 ` [PATCH v8 02/16] fuse: Move fuse_get_dev to header file Bernd Schubert
2024-12-09 14:56 ` [PATCH v8 03/16] fuse: Move request bits Bernd Schubert
2024-12-09 14:56 ` [PATCH v8 04/16] fuse: Add fuse-io-uring design documentation Bernd Schubert
2024-12-09 14:56 ` [PATCH v8 05/16] fuse: make args->in_args[0] to be always the header Bernd Schubert
2024-12-09 14:56 ` [PATCH v8 06/16] fuse: {io-uring} Handle SQEs - register commands Bernd Schubert
2024-12-12 1:29 ` Joanne Koong
2024-12-09 14:56 ` [PATCH v8 07/16] fuse: Make fuse_copy non static Bernd Schubert
2024-12-13 0:50 ` Joanne Koong
2024-12-09 14:56 ` [PATCH v8 08/16] fuse: Add fuse-io-uring handling into fuse_copy Bernd Schubert
2024-12-13 1:25 ` Joanne Koong
2024-12-09 14:56 ` [PATCH v8 09/16] fuse: {io-uring} Make hash-list req unique finding functions non-static Bernd Schubert
2024-12-13 1:41 ` Joanne Koong
2024-12-09 14:56 ` [PATCH v8 10/16] fuse: Add io-uring sqe commit and fetch support Bernd Schubert
2024-12-09 14:56 ` [PATCH v8 11/16] fuse: {io-uring} Handle teardown of ring entries Bernd Schubert
2024-12-09 14:56 ` [PATCH v8 12/16] fuse: {io-uring} Make fuse_dev_queue_{interrupt,forget} non-static Bernd Schubert
2024-12-09 14:56 ` [PATCH v8 13/16] fuse: Allow to queue fg requests through io-uring Bernd Schubert
2024-12-10 23:14 ` kernel test robot
2024-12-09 14:56 ` [PATCH v8 14/16] fuse: Allow to queue bg " Bernd Schubert
2024-12-09 14:56 ` [PATCH v8 15/16] fuse: {io-uring} Prevent mount point hang on fuse-server termination Bernd Schubert
2024-12-09 14:56 ` [PATCH v8 16/16] fuse: enable fuse-over-io-uring Bernd Schubert