public inbox for [email protected]
 help / color / mirror / Atom feed
* [PATCH v9 00/17] fuse: fuse-over-io-uring
@ 2025-01-07  0:25 Bernd Schubert
  2025-01-07  0:25 ` [PATCH v9 01/17] fuse: rename to fuse_dev_end_requests and make non-static Bernd Schubert
                   ` (17 more replies)
  0 siblings, 18 replies; 45+ messages in thread
From: Bernd Schubert @ 2025-01-07  0:25 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
	Bernd Schubert

This adds support for io-uring communication between kernel and
userspace daemon using opcode the IORING_OP_URING_CMD. The basic
approach was taken from ublk.

Motivation for these patches is all to increase fuse performance,
by:
- Reducing kernel/userspace context switches
    - Part of that is given by the ring ring - handling multiple
      requests on either side of kernel/userspace without the need
      to switch per request
    - Part of that is FUSE_URING_REQ_COMMIT_AND_FETCH, i.e. submitting
      the result of a request and fetching the next fuse request
      in one step. In contrary to legacy read/write to /dev/fuse
- Core and numa affinity - one ring per core, which allows to
  avoid cpu core context switches

A more detailed motivation description can be found in the
introction of previous patch series
https://lore.kernel.org/r/[email protected]
That description also includes benchmark results with RFCv1.
Performance with the current series needs to be tested, but will
be lower, as several optimization patches are missing, like
wake-up on the same core. These optimizations will be submitted
after merging the main changes.

The corresponding libfuse patches are on my uring branch, but needs
cleanup for submission - that will be done once the kernel design
will not change anymore
https://github.com/bsbernd/libfuse/tree/uring

Testing with that libfuse branch is possible by running something
like:

example/passthrough_hp -o allow_other --debug-fuse --nopassthrough \
--uring  --uring-q-depth=128 /scratch/source /scratch/dest

With the --debug-fuse option one should see CQE in the request type,
if requests are received via io-uring:

cqe unique: 4, opcode: GETATTR (3), nodeid: 1, insize: 16, pid: 7060
    unique: 4, result=104

Without the --uring option "cqe" is replaced by the default "dev"

dev unique: 4, opcode: GETATTR (3), nodeid: 1, insize: 56, pid: 7117
   unique: 4, success, outsize: 120

Future work
- different payload sizes per ring
- zero copy

Signed-off-by: Bernd Schubert <[email protected]>
---
Changes in v9:
- Fixed a queue->lock/fc->bg_lock order issue, fuse_block_alloc() now waits
  until fc->io_uring is ready
- Renamed fuse_ring_ent_unset_userspace to fuse_ring_ent_set_commit (Joanne)
- No need to initialize *ring to NULL in fuse_uring_create (Joanne)
- Use max() instead of max_t in fuse_uring_create (Joanne)
- Rename FRRS_WAIT to FRRS_AVAILABLE (Joanne)
- Add comment for WRITE_ONCE(ring->queues[qid], ...) (Joanne)
- Rename _fuse_uring_register to fuse_uring_do_register (Joanne)
- Split out fuse_uring_create_ring_ent() (Joanne)
- Use 'struct fuse_uring_ent_in_out' instead of char[] in
  fuse_uring_req_header (Joanne)
- Set fuse_ring_ent->cmd to NULL to ensure io-uring commands cannot
  be used two times (Pavel). That also allows to simplify
  fuse_uring_entry_teardown().
- Fix return value on allocation failure in fuse_uring_create_queue (Joanne)
- Renamed struct fuse_copy_state.ring.offset to .copied_sz
- static const struct fuse_iqueue_ops fuse_io_uring_ops (kernel test robot)
- ring_ent->commit_id was removed and req->in.h.unique is set in the request
  header as commit id.
- Rename of "ring_ent" to "ent" in several functions
- Rename struct fuse_uring_cmd_pdu to struct fuse_uring_pdu
- Link to v8: https://lore.kernel.org/r/[email protected]
- No return code from fuse_uring_cancel(), io-uring handles
  resending IO_URING_F_CANCEL on its own (Pavel)

Changes in v8:
- Move the lock in fuse_uring_create_queue to have initialization before
  taking fc->lock (Joanne)
- Avoid double assignment of ring_ent->cmd (Pavel)
- Set a global ring may_payload size instead of per ring-entry (Joanne)
- Also consider fc->max_pages for the max payload size (instead of
  fc->max_write only) (Joanne)
- Fix freeing of ring->queues in error case in fuse_uring_create (Joanne)
- Fix allocation size of the ring, including queues was a leftover from
  previous patches (Miklos, Joanne)
- Move FUSE_URING_IOV_SEGS definiton to the top of dev_uring.c (Joanne)a
- Update Documentation/filesystems/fuse-io-uring.rst and use 'io-uring'
  instead of 'uring' (Pavel)
- Rename SQE op codes to FUSE_IO_URING_CMD_REGISTER and
  FUSE_IO_URING_CMD_COMMIT_AND_FETCH
- Use READ_ONCE on data in 80B SQE section (struct fuse_uring_cmd_req)
  (Pavel)
- Add back sanity check for IO_URING_F_SQE128 (had been initially there,
  but got lost between different version updates) (Pavel)
- Remove pr_devel logs (Miklos)
- Only set fuse_uring_cmd() in to file_operations in the last patch
  and mark that function with __maybe_unused before, to avoid potential
  compiler warnings (Pavel)
- Add missing sanity for qid < ring->nr_queues
- Add check for fc->initialized - FUSE_IO_URING_CMD_REGISTER must only
  arrive after FUSE_INIT in order to know the max payload size
- Add in 'struct fuse_uring_ent_in_out' and add in the commit id.
  For now the commit id is the request unique, but any number
  that can identify the corresponding struct fuse_ring_ent object.
  The current way via struct fuse_req needs struct fuse_pqueue per
  queue (>2kb per core/queue), has hash overhead and is not suitable
  for requests types without a unique (like future updates for notify
- Increase FUSE_KERNEL_MINOR_VERSION to 42
- Separate out make fuse_request_find/fuse_req_hash/fuse_pqueue_init
  non-static to simplify review
- Don't return too early in fuse_uring_copy_to_ring, to always update
  'ring_ent_in_out'
- Code simplification of fuse_uring_next_fuse_req()
- fuse_uring_commit_fetch was accidentally doing a full copy on stack
  of queue->fpq
- Separate out setting and getting values from io_uring_cmd *cmd->pdu
  into functions
- Fix freeing of queue->ent_released (was accidentally in the wrong
  function)
- Remove the queue from fuse_uring_cmd_pdu, ring_ent is enough since
  v7
- Return -EAGAIN for IO_URING_F_CANCEL when ring-entries are in the
  wrong state. To be clarified with io-uring upstream if that is right.
- Slight simplifaction by using list_first_entry_or_null instead of
  extra checks if the list is empty
- Link to v7: https://lore.kernel.org/r/[email protected]

Changes in v7:
- Bug fixes:
   - Removed unsetting ring->ready as that brought up a lock
     order violation for fc->bg_lock/queue->lock
   - Check for !fc->connected in fuse_uring_cmd(), tear down issues
     came up with large ring sizes without that.
   - Removal of (arg->size == 0) condition and warning in fuse_copy_args
     as that is actually expected for some op codes.
- New init flag: FUSE_OVER_IO_URING to tell fuse-server about over-io-uring
                 capability
- Use fuse_set_zero_arg0() to set arg0 and rename to struct fuse_zero_header
  (I hope I got Miklos suggestion right)
- Simplification of fuse_uring_ent_avail()
- Renamed some structs in uapi/linux/fuse.h to fuse_uring
  (from fuse_ring) to be consistent
- Removal of 'if 0' in fuse_uring_cmd()
- Return -E... directly in fuse_uring_cmd() instead of setting err first
  and removal of goto's in that function.
- Just a simple WARN_ON_ONCE() for (oh->unique & FUSE_INT_REQ_BIT) as
  that code should be unreachable
- Removal of several pr_devel and some pr_warn() messages
- Removed RFC as it passed several xfstests runs now
- Link to v6: https://lore.kernel.org/r/[email protected]

Changes in v6:
- Update to linux-6.12
- Use 'struct fuse_iqueue_ops' and redirect fiq->ops once
  the ring is ready.
- Fix return code from fuse_uring_copy_from_ring on
  copy_from_user failure (Dan Carpenter / kernel test robot)
- Avoid list iteration in fuse_uring_cancel (Joanne)
- Simplified struct fuse_ring_req_header
	- Adds a new 'struct struct fuse_ring_ent_in_out'
- Fix assigning ring->queues[qid] in fuse_uring_create_queue,
  it was too early, resulting in races
- Add back 'FRRS_INVALID = 0' to ensure ring-ent states always
  have a value > 0
- Avoid assigning struct io_uring_cmd *cmd->pdu multiple times,
  once on settings up IO_URING_F_CANCEL is sufficient for sending
  the request as well.
- Link to v5: https://lore.kernel.org/r/[email protected]

Changes in v5:
- Main focus in v5 is the separation of headers from payload,
  which required to introduce 'struct fuse_zero_in'.
- Addressed several teardown issues, that were a regression in v4.
- Fixed "BUG: sleeping function called" due to allocation while
  holding a lock reported by David Wei
- Fix function comment reported by kernel test rebot
- Fix set but unused variabled reported by test robot
- Link to v4: https://lore.kernel.org/r/[email protected]

Changes in v4:
- Removal of ioctls, all configuration is done dynamically
  on the arrival of FUSE_URING_REQ_FETCH
- ring entries are not (and cannot be without config ioctls)
  allocated as array of the ring/queue - removal of the tag
  variable. Finding ring entries on FUSE_URING_REQ_COMMIT_AND_FETCH
  is more cumbersome now and needs an almost unused
  struct fuse_pqueue per fuse_ring_queue and uses the unique
  id of fuse requests.
- No device clones needed for to workaroung hanging mounts
  on fuse-server/daemon termination, handled by IO_URING_F_CANCEL
- Removal of sync/async ring entry types
- Addressed some of Joannes comments, but probably not all
- Only very basic tests run for v3, as more updates should follow quickly.

Changes in v3
- Removed the __wake_on_current_cpu optimization (for now
  as that needs to go through another subsystem/tree) ,
  removing it means a significant performance drop)
- Removed MMAP (Miklos)
- Switched to two IOCTLs, instead of one ioctl that had a field
  for subcommands (ring and queue config) (Miklos)
- The ring entry state is a single state and not a bitmask anymore
  (Josef)
- Addressed several other comments from Josef (I need to go over
  the RFCv2 review again, I'm not sure if everything is addressed
  already)

- Link to v3: https://lore.kernel.org/r/20240901-b4-fuse-uring-rfcv3-without-mmap-v3-0-9207f7391444@ddn.com
- Link to v2: https://lore.kernel.org/all/[email protected]/
- Link to v1: https://lore.kernel.org/r/[email protected]

---
Bernd Schubert (17):
      fuse: rename to fuse_dev_end_requests and make non-static
      fuse: Move fuse_get_dev to header file
      fuse: Move request bits
      fuse: Add fuse-io-uring design documentation
      fuse: make args->in_args[0] to be always the header
      fuse: {io-uring} Handle SQEs - register commands
      fuse: Make fuse_copy non static
      fuse: Add fuse-io-uring handling into fuse_copy
      fuse: {io-uring} Make hash-list req unique finding functions non-static
      fuse: Add io-uring sqe commit and fetch support
      fuse: {io-uring} Handle teardown of ring entries
      fuse: {io-uring} Make fuse_dev_queue_{interrupt,forget} non-static
      fuse: Allow to queue fg requests through io-uring
      fuse: Allow to queue bg requests through io-uring
      fuse: {io-uring} Prevent mount point hang on fuse-server termination
      fuse: block request allocation until io-uring init is complete
      fuse: enable fuse-over-io-uring

 Documentation/filesystems/fuse-io-uring.rst |  101 ++
 fs/fuse/Kconfig                             |   12 +
 fs/fuse/Makefile                            |    1 +
 fs/fuse/dax.c                               |   11 +-
 fs/fuse/dev.c                               |  127 +--
 fs/fuse/dev_uring.c                         | 1318 +++++++++++++++++++++++++++
 fs/fuse/dev_uring_i.h                       |  205 +++++
 fs/fuse/dir.c                               |   32 +-
 fs/fuse/fuse_dev_i.h                        |   67 ++
 fs/fuse/fuse_i.h                            |   30 +
 fs/fuse/inode.c                             |   14 +-
 fs/fuse/xattr.c                             |    7 +-
 include/uapi/linux/fuse.h                   |   76 +-
 13 files changed, 1924 insertions(+), 77 deletions(-)
---
base-commit: 5428dc1906dde5fb5ab283cda4714011f9811aa1
change-id: 20241015-fuse-uring-for-6-10-rfc4-61d0fc6851f8

Best regards,
-- 
Bernd Schubert <[email protected]>


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH v9 01/17] fuse: rename to fuse_dev_end_requests and make non-static
  2025-01-07  0:25 [PATCH v9 00/17] fuse: fuse-over-io-uring Bernd Schubert
@ 2025-01-07  0:25 ` Bernd Schubert
  2025-01-07  0:25 ` [PATCH v9 02/17] fuse: Move fuse_get_dev to header file Bernd Schubert
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 45+ messages in thread
From: Bernd Schubert @ 2025-01-07  0:25 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
	Bernd Schubert

This function is needed by fuse_uring.c to clean ring queues,
so make it non static. Especially in non-static mode the function
name 'end_requests' should be prefixed with fuse_

Signed-off-by: Bernd Schubert <[email protected]>
Reviewed-by: Josef Bacik <[email protected]>
Reviewed-by: Joanne Koong <[email protected]>
---
 fs/fuse/dev.c        | 11 +++++------
 fs/fuse/fuse_dev_i.h | 14 ++++++++++++++
 2 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 27ccae63495d14ea339aa6c8da63d0ac44fc8885..757f2c797d68aa217c0e120f6f16e4a24808ecae 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -7,6 +7,7 @@
 */
 
 #include "fuse_i.h"
+#include "fuse_dev_i.h"
 
 #include <linux/init.h>
 #include <linux/module.h>
@@ -34,8 +35,6 @@ MODULE_ALIAS("devname:fuse");
 
 static struct kmem_cache *fuse_req_cachep;
 
-static void end_requests(struct list_head *head);
-
 static struct fuse_dev *fuse_get_dev(struct file *file)
 {
 	/*
@@ -1885,7 +1884,7 @@ static void fuse_resend(struct fuse_conn *fc)
 		spin_unlock(&fiq->lock);
 		list_for_each_entry(req, &to_queue, list)
 			clear_bit(FR_PENDING, &req->flags);
-		end_requests(&to_queue);
+		fuse_dev_end_requests(&to_queue);
 		return;
 	}
 	/* iq and pq requests are both oldest to newest */
@@ -2204,7 +2203,7 @@ static __poll_t fuse_dev_poll(struct file *file, poll_table *wait)
 }
 
 /* Abort all requests on the given list (pending or processing) */
-static void end_requests(struct list_head *head)
+void fuse_dev_end_requests(struct list_head *head)
 {
 	while (!list_empty(head)) {
 		struct fuse_req *req;
@@ -2307,7 +2306,7 @@ void fuse_abort_conn(struct fuse_conn *fc)
 		wake_up_all(&fc->blocked_waitq);
 		spin_unlock(&fc->lock);
 
-		end_requests(&to_end);
+		fuse_dev_end_requests(&to_end);
 	} else {
 		spin_unlock(&fc->lock);
 	}
@@ -2337,7 +2336,7 @@ int fuse_dev_release(struct inode *inode, struct file *file)
 			list_splice_init(&fpq->processing[i], &to_end);
 		spin_unlock(&fpq->lock);
 
-		end_requests(&to_end);
+		fuse_dev_end_requests(&to_end);
 
 		/* Are we the last open device? */
 		if (atomic_dec_and_test(&fc->dev_count)) {
diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
new file mode 100644
index 0000000000000000000000000000000000000000..4fcff2223fa60fbfb844a3f8e1252a523c4c01af
--- /dev/null
+++ b/fs/fuse/fuse_dev_i.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * FUSE: Filesystem in Userspace
+ * Copyright (C) 2001-2008  Miklos Szeredi <[email protected]>
+ */
+#ifndef _FS_FUSE_DEV_I_H
+#define _FS_FUSE_DEV_I_H
+
+#include <linux/types.h>
+
+void fuse_dev_end_requests(struct list_head *head);
+
+#endif
+

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v9 02/17] fuse: Move fuse_get_dev to header file
  2025-01-07  0:25 [PATCH v9 00/17] fuse: fuse-over-io-uring Bernd Schubert
  2025-01-07  0:25 ` [PATCH v9 01/17] fuse: rename to fuse_dev_end_requests and make non-static Bernd Schubert
@ 2025-01-07  0:25 ` Bernd Schubert
  2025-01-07  0:25 ` [PATCH v9 03/17] fuse: Move request bits Bernd Schubert
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 45+ messages in thread
From: Bernd Schubert @ 2025-01-07  0:25 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
	Bernd Schubert

Another preparation patch, as this function will be needed by
fuse/dev.c and fuse/dev_uring.c.

Signed-off-by: Bernd Schubert <[email protected]>
Reviewed-by: Josef Bacik <[email protected]>
Reviewed-by: Joanne Koong <[email protected]>
---
 fs/fuse/dev.c        | 9 ---------
 fs/fuse/fuse_dev_i.h | 9 +++++++++
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 757f2c797d68aa217c0e120f6f16e4a24808ecae..3db3282bdac4613788ec8d6d29bfc56241086609 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -35,15 +35,6 @@ MODULE_ALIAS("devname:fuse");
 
 static struct kmem_cache *fuse_req_cachep;
 
-static struct fuse_dev *fuse_get_dev(struct file *file)
-{
-	/*
-	 * Lockless access is OK, because file->private data is set
-	 * once during mount and is valid until the file is released.
-	 */
-	return READ_ONCE(file->private_data);
-}
-
 static void fuse_request_init(struct fuse_mount *fm, struct fuse_req *req)
 {
 	INIT_LIST_HEAD(&req->list);
diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
index 4fcff2223fa60fbfb844a3f8e1252a523c4c01af..e7ea1b21c18204335c52406de5291f0c47d654f5 100644
--- a/fs/fuse/fuse_dev_i.h
+++ b/fs/fuse/fuse_dev_i.h
@@ -8,6 +8,15 @@
 
 #include <linux/types.h>
 
+static inline struct fuse_dev *fuse_get_dev(struct file *file)
+{
+	/*
+	 * Lockless access is OK, because file->private data is set
+	 * once during mount and is valid until the file is released.
+	 */
+	return READ_ONCE(file->private_data);
+}
+
 void fuse_dev_end_requests(struct list_head *head);
 
 #endif

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v9 03/17] fuse: Move request bits
  2025-01-07  0:25 [PATCH v9 00/17] fuse: fuse-over-io-uring Bernd Schubert
  2025-01-07  0:25 ` [PATCH v9 01/17] fuse: rename to fuse_dev_end_requests and make non-static Bernd Schubert
  2025-01-07  0:25 ` [PATCH v9 02/17] fuse: Move fuse_get_dev to header file Bernd Schubert
@ 2025-01-07  0:25 ` Bernd Schubert
  2025-01-07  0:25 ` [PATCH v9 04/17] fuse: Add fuse-io-uring design documentation Bernd Schubert
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 45+ messages in thread
From: Bernd Schubert @ 2025-01-07  0:25 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
	Bernd Schubert

These are needed by fuse-over-io-uring.

Signed-off-by: Bernd Schubert <[email protected]>
Reviewed-by: Josef Bacik <[email protected]>
Reviewed-by: Joanne Koong <[email protected]>
---
 fs/fuse/dev.c        | 4 ----
 fs/fuse/fuse_dev_i.h | 4 ++++
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 3db3282bdac4613788ec8d6d29bfc56241086609..4f8825de9e05b9ffd291ac5bff747a10a70df0b4 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -29,10 +29,6 @@
 MODULE_ALIAS_MISCDEV(FUSE_MINOR);
 MODULE_ALIAS("devname:fuse");
 
-/* Ordinary requests have even IDs, while interrupts IDs are odd */
-#define FUSE_INT_REQ_BIT (1ULL << 0)
-#define FUSE_REQ_ID_STEP (1ULL << 1)
-
 static struct kmem_cache *fuse_req_cachep;
 
 static void fuse_request_init(struct fuse_mount *fm, struct fuse_req *req)
diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
index e7ea1b21c18204335c52406de5291f0c47d654f5..08a7e88e002773fcd18c25a229c7aa6450831401 100644
--- a/fs/fuse/fuse_dev_i.h
+++ b/fs/fuse/fuse_dev_i.h
@@ -8,6 +8,10 @@
 
 #include <linux/types.h>
 
+/* Ordinary requests have even IDs, while interrupts IDs are odd */
+#define FUSE_INT_REQ_BIT (1ULL << 0)
+#define FUSE_REQ_ID_STEP (1ULL << 1)
+
 static inline struct fuse_dev *fuse_get_dev(struct file *file)
 {
 	/*

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v9 04/17] fuse: Add fuse-io-uring design documentation
  2025-01-07  0:25 [PATCH v9 00/17] fuse: fuse-over-io-uring Bernd Schubert
                   ` (2 preceding siblings ...)
  2025-01-07  0:25 ` [PATCH v9 03/17] fuse: Move request bits Bernd Schubert
@ 2025-01-07  0:25 ` Bernd Schubert
  2025-01-07  0:25 ` [PATCH v9 05/17] fuse: make args->in_args[0] to be always the header Bernd Schubert
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 45+ messages in thread
From: Bernd Schubert @ 2025-01-07  0:25 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
	Bernd Schubert

Signed-off-by: Bernd Schubert <[email protected]>
---
 Documentation/filesystems/fuse-io-uring.rst | 101 ++++++++++++++++++++++++++++
 1 file changed, 101 insertions(+)

diff --git a/Documentation/filesystems/fuse-io-uring.rst b/Documentation/filesystems/fuse-io-uring.rst
new file mode 100644
index 0000000000000000000000000000000000000000..6299b65072a8468f08cc4f6978c386546bb9559a
--- /dev/null
+++ b/Documentation/filesystems/fuse-io-uring.rst
@@ -0,0 +1,101 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=======================================
+FUSE-over-io-uring design documentation
+=======================================
+
+This documentation covers basic details how the fuse
+kernel/userspace communication through io-uring is configured
+and works. For generic details about FUSE see fuse.rst.
+
+This document also covers the current interface, which is
+still in development and might change.
+
+Limitations
+===========
+As of now not all requests types are supported through io-uring, userspace
+is required to also handle requests through /dev/fuse after io-uring setup
+is complete.  Specifically notifications (initiated from the daemon side)
+ and interrupts.
+
+Fuse io-uring configuration
+===========================
+
+Fuse kernel requests are queued through the classical /dev/fuse
+read/write interface - until io-uring setup is complete.
+
+In order to set up fuse-over-io-uring fuse-server (user-space)
+needs to submit SQEs (opcode = IORING_OP_URING_CMD) to the /dev/fuse
+connection file descriptor. Initial submit is with the sub command
+FUSE_URING_REQ_REGISTER, which will just register entries to be
+available in the kernel.
+
+Once at least one entry per queue is submitted, kernel starts
+to enqueue to ring queues.
+Note, every CPU core has its own fuse-io-uring queue.
+Userspace handles the CQE/fuse-request and submits the result as
+subcommand FUSE_URING_REQ_COMMIT_AND_FETCH - kernel completes
+the requests and also marks the entry available again. If there are
+pending requests waiting the request will be immediately submitted
+to the daemon again.
+
+Initial SQE
+-----------
+
+ |                                    |  FUSE filesystem daemon
+ |                                    |
+ |                                    |  >io_uring_submit()
+ |                                    |   IORING_OP_URING_CMD /
+ |                                    |   FUSE_URING_REQ_FETCH
+ |                                    |  [wait cqe]
+ |                                    |   >io_uring_wait_cqe() or
+ |                                    |   >io_uring_submit_and_wait()
+ |                                    |
+ |  >fuse_uring_cmd()                 |
+ |   >fuse_uring_fetch()              |
+ |    >fuse_uring_ent_release()       |
+
+
+Sending requests with CQEs
+--------------------------
+
+ |                                         |  FUSE filesystem daemon
+ |                                         |  [waiting for CQEs]
+ |  "rm /mnt/fuse/file"                    |
+ |                                         |
+ |  >sys_unlink()                          |
+ |    >fuse_unlink()                       |
+ |      [allocate request]                 |
+ |      >__fuse_request_send()             |
+ |        ...                              |
+ |       >fuse_uring_queue_fuse_req        |
+ |        [queue request on fg or          |
+ |          bg queue]                      |
+ |         >fuse_uring_assign_ring_entry() |
+ |         >fuse_uring_send_to_ring()      |
+ |          >fuse_uring_copy_to_ring()     |
+ |          >io_uring_cmd_done()           |
+ |          >request_wait_answer()         |
+ |           [sleep on req->waitq]         |
+ |                                         |  [receives and handles CQE]
+ |                                         |  [submit result and fetch next]
+ |                                         |  >io_uring_submit()
+ |                                         |   IORING_OP_URING_CMD/
+ |                                         |   FUSE_URING_REQ_COMMIT_AND_FETCH
+ |  >fuse_uring_cmd()                      |
+ |   >fuse_uring_commit_and_release()      |
+ |    >fuse_uring_copy_from_ring()         |
+ |     [ copy the result to the fuse req]  |
+ |     >fuse_uring_req_end_and_get_next()  |
+ |      >fuse_request_end()                |
+ |       [wake up req->waitq]              |
+ |      >fuse_uring_ent_release_and_fetch()|
+ |       [wait or handle next req]         |
+ |                                         |
+ |                                         |
+ |       [req->waitq woken up]             |
+ |    <fuse_unlink()                       |
+ |  <sys_unlink()                          |
+
+
+

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v9 05/17] fuse: make args->in_args[0] to be always the header
  2025-01-07  0:25 [PATCH v9 00/17] fuse: fuse-over-io-uring Bernd Schubert
                   ` (3 preceding siblings ...)
  2025-01-07  0:25 ` [PATCH v9 04/17] fuse: Add fuse-io-uring design documentation Bernd Schubert
@ 2025-01-07  0:25 ` Bernd Schubert
  2025-01-07  0:25 ` [PATCH v9 06/17] fuse: {io-uring} Handle SQEs - register commands Bernd Schubert
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 45+ messages in thread
From: Bernd Schubert @ 2025-01-07  0:25 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
	Bernd Schubert

This change sets up FUSE operations to always have headers in
args.in_args[0], even for opcodes without an actual header.
This step prepares for a clean separation of payload from headers,
initially it is used by fuse-over-io-uring.

For opcodes without a header, we use a zero-sized struct as a
placeholder. This approach:
- Keeps things consistent across all FUSE operations
- Will help with payload alignment later
- Avoids future issues when header sizes change

Op codes that already have an op code specific header do not
need modification.
Op codes that have neither payload nor op code headers
are not modified either (FUSE_READLINK and FUSE_DESTROY).
FUSE_BATCH_FORGET already has the header in the right place,
but is not using fuse_copy_args - as -over-uring is currently
not handling forgets it does not matter for now, but header
separation will later need special attention for that op code.

Signed-off-by: Bernd Schubert <[email protected]>
Reviewed-by: Joanne Koong <[email protected]>
---
 fs/fuse/dax.c    | 11 ++++++-----
 fs/fuse/dev.c    |  9 +++++----
 fs/fuse/dir.c    | 32 ++++++++++++++++++--------------
 fs/fuse/fuse_i.h | 13 +++++++++++++
 fs/fuse/xattr.c  |  7 ++++---
 5 files changed, 46 insertions(+), 26 deletions(-)

diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
index 9abbc2f2894f905099b48862d776083e6075fbba..0b6ee6dd1fd6569a12f1a44c24ca178163b0da81 100644
--- a/fs/fuse/dax.c
+++ b/fs/fuse/dax.c
@@ -240,11 +240,12 @@ static int fuse_send_removemapping(struct inode *inode,
 
 	args.opcode = FUSE_REMOVEMAPPING;
 	args.nodeid = fi->nodeid;
-	args.in_numargs = 2;
-	args.in_args[0].size = sizeof(*inargp);
-	args.in_args[0].value = inargp;
-	args.in_args[1].size = inargp->count * sizeof(*remove_one);
-	args.in_args[1].value = remove_one;
+	args.in_numargs = 3;
+	fuse_set_zero_arg0(&args);
+	args.in_args[1].size = sizeof(*inargp);
+	args.in_args[1].value = inargp;
+	args.in_args[2].size = inargp->count * sizeof(*remove_one);
+	args.in_args[2].value = remove_one;
 	return fuse_simple_request(fm, &args);
 }
 
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 4f8825de9e05b9ffd291ac5bff747a10a70df0b4..623c5a067c1841e8210b5b4e063e7b6690f1825a 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1746,7 +1746,7 @@ static int fuse_retrieve(struct fuse_mount *fm, struct inode *inode,
 	args = &ap->args;
 	args->nodeid = outarg->nodeid;
 	args->opcode = FUSE_NOTIFY_REPLY;
-	args->in_numargs = 2;
+	args->in_numargs = 3;
 	args->in_pages = true;
 	args->end = fuse_retrieve_end;
 
@@ -1774,9 +1774,10 @@ static int fuse_retrieve(struct fuse_mount *fm, struct inode *inode,
 	}
 	ra->inarg.offset = outarg->offset;
 	ra->inarg.size = total_len;
-	args->in_args[0].size = sizeof(ra->inarg);
-	args->in_args[0].value = &ra->inarg;
-	args->in_args[1].size = total_len;
+	fuse_set_zero_arg0(args);
+	args->in_args[1].size = sizeof(ra->inarg);
+	args->in_args[1].value = &ra->inarg;
+	args->in_args[2].size = total_len;
 
 	err = fuse_simple_notify_reply(fm, args, outarg->notify_unique);
 	if (err)
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 494ac372ace07ab4ea06c13a404ecc1d2ccb4f23..1c6126069ee7fcce522fbb7bcec21c9392982413 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -175,9 +175,10 @@ static void fuse_lookup_init(struct fuse_conn *fc, struct fuse_args *args,
 	memset(outarg, 0, sizeof(struct fuse_entry_out));
 	args->opcode = FUSE_LOOKUP;
 	args->nodeid = nodeid;
-	args->in_numargs = 1;
-	args->in_args[0].size = name->len + 1;
-	args->in_args[0].value = name->name;
+	args->in_numargs = 2;
+	fuse_set_zero_arg0(args);
+	args->in_args[1].size = name->len + 1;
+	args->in_args[1].value = name->name;
 	args->out_numargs = 1;
 	args->out_args[0].size = sizeof(struct fuse_entry_out);
 	args->out_args[0].value = outarg;
@@ -928,11 +929,12 @@ static int fuse_symlink(struct mnt_idmap *idmap, struct inode *dir,
 	FUSE_ARGS(args);
 
 	args.opcode = FUSE_SYMLINK;
-	args.in_numargs = 2;
-	args.in_args[0].size = entry->d_name.len + 1;
-	args.in_args[0].value = entry->d_name.name;
-	args.in_args[1].size = len;
-	args.in_args[1].value = link;
+	args.in_numargs = 3;
+	fuse_set_zero_arg0(&args);
+	args.in_args[1].size = entry->d_name.len + 1;
+	args.in_args[1].value = entry->d_name.name;
+	args.in_args[2].size = len;
+	args.in_args[2].value = link;
 	return create_new_entry(idmap, fm, &args, dir, entry, S_IFLNK);
 }
 
@@ -992,9 +994,10 @@ static int fuse_unlink(struct inode *dir, struct dentry *entry)
 
 	args.opcode = FUSE_UNLINK;
 	args.nodeid = get_node_id(dir);
-	args.in_numargs = 1;
-	args.in_args[0].size = entry->d_name.len + 1;
-	args.in_args[0].value = entry->d_name.name;
+	args.in_numargs = 2;
+	fuse_set_zero_arg0(&args);
+	args.in_args[1].size = entry->d_name.len + 1;
+	args.in_args[1].value = entry->d_name.name;
 	err = fuse_simple_request(fm, &args);
 	if (!err) {
 		fuse_dir_changed(dir);
@@ -1015,9 +1018,10 @@ static int fuse_rmdir(struct inode *dir, struct dentry *entry)
 
 	args.opcode = FUSE_RMDIR;
 	args.nodeid = get_node_id(dir);
-	args.in_numargs = 1;
-	args.in_args[0].size = entry->d_name.len + 1;
-	args.in_args[0].value = entry->d_name.name;
+	args.in_numargs = 2;
+	fuse_set_zero_arg0(&args);
+	args.in_args[1].size = entry->d_name.len + 1;
+	args.in_args[1].value = entry->d_name.name;
 	err = fuse_simple_request(fm, &args);
 	if (!err) {
 		fuse_dir_changed(dir);
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 74744c6f286003251564d1235f4d2ca8654d661b..babddd05303796d689a64f0f5a890066b43170ac 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -947,6 +947,19 @@ struct fuse_mount {
 	struct rcu_head rcu;
 };
 
+/*
+ * Empty header for FUSE opcodes without specific header needs.
+ * Used as a placeholder in args->in_args[0] for consistency
+ * across all FUSE operations, simplifying request handling.
+ */
+struct fuse_zero_header {};
+
+static inline void fuse_set_zero_arg0(struct fuse_args *args)
+{
+	args->in_args[0].size = sizeof(struct fuse_zero_header);
+	args->in_args[0].value = NULL;
+}
+
 static inline struct fuse_mount *get_fuse_mount_super(struct super_block *sb)
 {
 	return sb->s_fs_info;
diff --git a/fs/fuse/xattr.c b/fs/fuse/xattr.c
index 9f568d345c51236ddd421b162820a4ea9b0734f4..93dfb06b6cea045d6df90c61c900680968bda39f 100644
--- a/fs/fuse/xattr.c
+++ b/fs/fuse/xattr.c
@@ -164,9 +164,10 @@ int fuse_removexattr(struct inode *inode, const char *name)
 
 	args.opcode = FUSE_REMOVEXATTR;
 	args.nodeid = get_node_id(inode);
-	args.in_numargs = 1;
-	args.in_args[0].size = strlen(name) + 1;
-	args.in_args[0].value = name;
+	args.in_numargs = 2;
+	fuse_set_zero_arg0(&args);
+	args.in_args[1].size = strlen(name) + 1;
+	args.in_args[1].value = name;
 	err = fuse_simple_request(fm, &args);
 	if (err == -ENOSYS) {
 		fm->fc->no_removexattr = 1;

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v9 06/17] fuse: {io-uring} Handle SQEs - register commands
  2025-01-07  0:25 [PATCH v9 00/17] fuse: fuse-over-io-uring Bernd Schubert
                   ` (4 preceding siblings ...)
  2025-01-07  0:25 ` [PATCH v9 05/17] fuse: make args->in_args[0] to be always the header Bernd Schubert
@ 2025-01-07  0:25 ` Bernd Schubert
  2025-01-07  9:56   ` Luis Henriques
  2025-01-17 11:06   ` Pavel Begunkov
  2025-01-07  0:25 ` [PATCH v9 07/17] fuse: Make fuse_copy non static Bernd Schubert
                   ` (11 subsequent siblings)
  17 siblings, 2 replies; 45+ messages in thread
From: Bernd Schubert @ 2025-01-07  0:25 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
	Bernd Schubert

This adds basic support for ring SQEs (with opcode=IORING_OP_URING_CMD).
For now only FUSE_IO_URING_CMD_REGISTER is handled to register queue
entries.

Signed-off-by: Bernd Schubert <[email protected]>
---
 fs/fuse/Kconfig           |  12 ++
 fs/fuse/Makefile          |   1 +
 fs/fuse/dev_uring.c       | 333 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/fuse/dev_uring_i.h     | 116 ++++++++++++++++
 fs/fuse/fuse_i.h          |   5 +
 fs/fuse/inode.c           |  10 ++
 include/uapi/linux/fuse.h |  76 ++++++++++-
 7 files changed, 552 insertions(+), 1 deletion(-)

diff --git a/fs/fuse/Kconfig b/fs/fuse/Kconfig
index 8674dbfbe59dbf79c304c587b08ebba3cfe405be..ca215a3cba3e310d1359d069202193acdcdb172b 100644
--- a/fs/fuse/Kconfig
+++ b/fs/fuse/Kconfig
@@ -63,3 +63,15 @@ config FUSE_PASSTHROUGH
 	  to be performed directly on a backing file.
 
 	  If you want to allow passthrough operations, answer Y.
+
+config FUSE_IO_URING
+	bool "FUSE communication over io-uring"
+	default y
+	depends on FUSE_FS
+	depends on IO_URING
+	help
+	  This allows sending FUSE requests over the io-uring interface and
+          also adds request core affinity.
+
+	  If you want to allow fuse server/client communication through io-uring,
+	  answer Y
diff --git a/fs/fuse/Makefile b/fs/fuse/Makefile
index 2c372180d631eb340eca36f19ee2c2686de9714d..3f0f312a31c1cc200c0c91a086b30a8318e39d94 100644
--- a/fs/fuse/Makefile
+++ b/fs/fuse/Makefile
@@ -15,5 +15,6 @@ fuse-y += iomode.o
 fuse-$(CONFIG_FUSE_DAX) += dax.o
 fuse-$(CONFIG_FUSE_PASSTHROUGH) += passthrough.o
 fuse-$(CONFIG_SYSCTL) += sysctl.o
+fuse-$(CONFIG_FUSE_IO_URING) += dev_uring.o
 
 virtiofs-y := virtio_fs.o
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
new file mode 100644
index 0000000000000000000000000000000000000000..b44ba4033615e01041313c040035b6da6af0ee17
--- /dev/null
+++ b/fs/fuse/dev_uring.c
@@ -0,0 +1,333 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * FUSE: Filesystem in Userspace
+ * Copyright (c) 2023-2024 DataDirect Networks.
+ */
+
+#include "fuse_i.h"
+#include "dev_uring_i.h"
+#include "fuse_dev_i.h"
+
+#include <linux/fs.h>
+#include <linux/io_uring/cmd.h>
+
+#ifdef CONFIG_FUSE_IO_URING
+static bool __read_mostly enable_uring;
+module_param(enable_uring, bool, 0644);
+MODULE_PARM_DESC(enable_uring,
+		 "Enable userspace communication through io-uring");
+#endif
+
+#define FUSE_URING_IOV_SEGS 2 /* header and payload */
+
+
+bool fuse_uring_enabled(void)
+{
+	return enable_uring;
+}
+
+void fuse_uring_destruct(struct fuse_conn *fc)
+{
+	struct fuse_ring *ring = fc->ring;
+	int qid;
+
+	if (!ring)
+		return;
+
+	for (qid = 0; qid < ring->nr_queues; qid++) {
+		struct fuse_ring_queue *queue = ring->queues[qid];
+
+		if (!queue)
+			continue;
+
+		WARN_ON(!list_empty(&queue->ent_avail_queue));
+		WARN_ON(!list_empty(&queue->ent_commit_queue));
+
+		kfree(queue);
+		ring->queues[qid] = NULL;
+	}
+
+	kfree(ring->queues);
+	kfree(ring);
+	fc->ring = NULL;
+}
+
+/*
+ * Basic ring setup for this connection based on the provided configuration
+ */
+static struct fuse_ring *fuse_uring_create(struct fuse_conn *fc)
+{
+	struct fuse_ring *ring;
+	size_t nr_queues = num_possible_cpus();
+	struct fuse_ring *res = NULL;
+	size_t max_payload_size;
+
+	ring = kzalloc(sizeof(*fc->ring), GFP_KERNEL_ACCOUNT);
+	if (!ring)
+		return NULL;
+
+	ring->queues = kcalloc(nr_queues, sizeof(struct fuse_ring_queue *),
+			       GFP_KERNEL_ACCOUNT);
+	if (!ring->queues)
+		goto out_err;
+
+	max_payload_size = max(FUSE_MIN_READ_BUFFER, fc->max_write);
+	max_payload_size = max(max_payload_size, fc->max_pages * PAGE_SIZE);
+
+	spin_lock(&fc->lock);
+	if (fc->ring) {
+		/* race, another thread created the ring in the meantime */
+		spin_unlock(&fc->lock);
+		res = fc->ring;
+		goto out_err;
+	}
+
+	fc->ring = ring;
+	ring->nr_queues = nr_queues;
+	ring->fc = fc;
+	ring->max_payload_sz = max_payload_size;
+
+	spin_unlock(&fc->lock);
+	return ring;
+
+out_err:
+	kfree(ring->queues);
+	kfree(ring);
+	return res;
+}
+
+static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
+						       int qid)
+{
+	struct fuse_conn *fc = ring->fc;
+	struct fuse_ring_queue *queue;
+
+	queue = kzalloc(sizeof(*queue), GFP_KERNEL_ACCOUNT);
+	if (!queue)
+		return NULL;
+	queue->qid = qid;
+	queue->ring = ring;
+	spin_lock_init(&queue->lock);
+
+	INIT_LIST_HEAD(&queue->ent_avail_queue);
+	INIT_LIST_HEAD(&queue->ent_commit_queue);
+
+	spin_lock(&fc->lock);
+	if (ring->queues[qid]) {
+		spin_unlock(&fc->lock);
+		kfree(queue);
+		return ring->queues[qid];
+	}
+
+	/*
+	 * write_once and lock as the caller mostly doesn't take the lock at all
+	 */
+	WRITE_ONCE(ring->queues[qid], queue);
+	spin_unlock(&fc->lock);
+
+	return queue;
+}
+
+/*
+ * Make a ring entry available for fuse_req assignment
+ */
+static void fuse_uring_ent_avail(struct fuse_ring_ent *ring_ent,
+				 struct fuse_ring_queue *queue)
+{
+	list_move(&ring_ent->list, &queue->ent_avail_queue);
+	ring_ent->state = FRRS_AVAILABLE;
+}
+
+/*
+ * fuse_uring_req_fetch command handling
+ */
+static void fuse_uring_do_register(struct fuse_ring_ent *ring_ent,
+				   struct io_uring_cmd *cmd,
+				   unsigned int issue_flags)
+{
+	struct fuse_ring_queue *queue = ring_ent->queue;
+
+	spin_lock(&queue->lock);
+	fuse_uring_ent_avail(ring_ent, queue);
+	spin_unlock(&queue->lock);
+}
+
+/*
+ * sqe->addr is a ptr to an iovec array, iov[0] has the headers, iov[1]
+ * the payload
+ */
+static int fuse_uring_get_iovec_from_sqe(const struct io_uring_sqe *sqe,
+					 struct iovec iov[FUSE_URING_IOV_SEGS])
+{
+	struct iovec __user *uiov = u64_to_user_ptr(READ_ONCE(sqe->addr));
+	struct iov_iter iter;
+	ssize_t ret;
+
+	if (sqe->len != FUSE_URING_IOV_SEGS)
+		return -EINVAL;
+
+	/*
+	 * Direction for buffer access will actually be READ and WRITE,
+	 * using write for the import should include READ access as well.
+	 */
+	ret = import_iovec(WRITE, uiov, FUSE_URING_IOV_SEGS,
+			   FUSE_URING_IOV_SEGS, &iov, &iter);
+	if (ret < 0)
+		return ret;
+
+	return 0;
+}
+
+static struct fuse_ring_ent *
+fuse_uring_create_ring_ent(struct io_uring_cmd *cmd,
+			   struct fuse_ring_queue *queue)
+{
+	struct fuse_ring *ring = queue->ring;
+	struct fuse_ring_ent *ent;
+	size_t payload_size;
+	struct iovec iov[FUSE_URING_IOV_SEGS];
+	int err;
+
+	err = fuse_uring_get_iovec_from_sqe(cmd->sqe, iov);
+	if (err) {
+		pr_info_ratelimited("Failed to get iovec from sqe, err=%d\n",
+				    err);
+		return ERR_PTR(err);
+	}
+
+	/*
+	 * The created queue above does not need to be destructed in
+	 * case of entry errors below, will be done at ring destruction time.
+	 */
+	err = -ENOMEM;
+	ent = kzalloc(sizeof(*ent), GFP_KERNEL_ACCOUNT);
+	if (!ent)
+		return ERR_PTR(err);
+
+	INIT_LIST_HEAD(&ent->list);
+
+	ent->queue = queue;
+	ent->cmd = cmd;
+
+	err = -EINVAL;
+	if (iov[0].iov_len < sizeof(struct fuse_uring_req_header)) {
+		pr_info_ratelimited("Invalid header len %zu\n", iov[0].iov_len);
+		return ERR_PTR(err);
+	}
+
+	ent->headers = iov[0].iov_base;
+	ent->payload = iov[1].iov_base;
+	payload_size = iov[1].iov_len;
+
+	if (payload_size < ring->max_payload_sz) {
+		pr_info_ratelimited("Invalid req payload len %zu\n",
+				    payload_size);
+		return ERR_PTR(err);
+	}
+
+	return ent;
+}
+
+/* Register header and payload buffer with the kernel and fetch a request */
+static int fuse_uring_register(struct io_uring_cmd *cmd,
+			       unsigned int issue_flags, struct fuse_conn *fc)
+{
+	const struct fuse_uring_cmd_req *cmd_req = io_uring_sqe_cmd(cmd->sqe);
+	struct fuse_ring *ring = fc->ring;
+	struct fuse_ring_queue *queue;
+	struct fuse_ring_ent *ring_ent;
+	int err;
+	struct iovec iov[FUSE_URING_IOV_SEGS];
+	unsigned int qid = READ_ONCE(cmd_req->qid);
+
+	err = fuse_uring_get_iovec_from_sqe(cmd->sqe, iov);
+	if (err) {
+		pr_info_ratelimited("Failed to get iovec from sqe, err=%d\n",
+				    err);
+		return err;
+	}
+
+	err = -ENOMEM;
+	if (!ring) {
+		ring = fuse_uring_create(fc);
+		if (!ring)
+			return err;
+	}
+
+	if (qid >= ring->nr_queues) {
+		pr_info_ratelimited("fuse: Invalid ring qid %u\n", qid);
+		return -EINVAL;
+	}
+
+	err = -ENOMEM;
+	queue = ring->queues[qid];
+	if (!queue) {
+		queue = fuse_uring_create_queue(ring, qid);
+		if (!queue)
+			return err;
+	}
+
+	ring_ent = fuse_uring_create_ring_ent(cmd, queue);
+	if (IS_ERR(ring_ent))
+		return PTR_ERR(ring_ent);
+
+	fuse_uring_do_register(ring_ent, cmd, issue_flags);
+
+	return 0;
+}
+
+/*
+ * Entry function from io_uring to handle the given passthrough command
+ * (op code IORING_OP_URING_CMD)
+ */
+int __maybe_unused fuse_uring_cmd(struct io_uring_cmd *cmd,
+				  unsigned int issue_flags)
+{
+	struct fuse_dev *fud;
+	struct fuse_conn *fc;
+	u32 cmd_op = cmd->cmd_op;
+	int err;
+
+	if (!enable_uring) {
+		pr_info_ratelimited("fuse-io-uring is disabled\n");
+		return -EOPNOTSUPP;
+	}
+
+	/* This extra SQE size holds struct fuse_uring_cmd_req */
+	if (!(issue_flags & IO_URING_F_SQE128))
+		return -EINVAL;
+
+	fud = fuse_get_dev(cmd->file);
+	if (!fud) {
+		pr_info_ratelimited("No fuse device found\n");
+		return -ENOTCONN;
+	}
+	fc = fud->fc;
+
+	if (fc->aborted)
+		return -ECONNABORTED;
+	if (!fc->connected)
+		return -ENOTCONN;
+
+	/*
+	 * fuse_uring_register() needs the ring to be initialized,
+	 * we need to know the max payload size
+	 */
+	if (!fc->initialized)
+		return -EAGAIN;
+
+	switch (cmd_op) {
+	case FUSE_IO_URING_CMD_REGISTER:
+		err = fuse_uring_register(cmd, issue_flags, fc);
+		if (err) {
+			pr_info_once("FUSE_IO_URING_CMD_REGISTER failed err=%d\n",
+				     err);
+			return err;
+		}
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return -EIOCBQUEUED;
+}
diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
new file mode 100644
index 0000000000000000000000000000000000000000..4e46dd65196d26dabc62dada33b17de9aa511c08
--- /dev/null
+++ b/fs/fuse/dev_uring_i.h
@@ -0,0 +1,116 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * FUSE: Filesystem in Userspace
+ * Copyright (c) 2023-2024 DataDirect Networks.
+ */
+
+#ifndef _FS_FUSE_DEV_URING_I_H
+#define _FS_FUSE_DEV_URING_I_H
+
+#include "fuse_i.h"
+
+#ifdef CONFIG_FUSE_IO_URING
+
+enum fuse_ring_req_state {
+	FRRS_INVALID = 0,
+
+	/* The ring entry received from userspace and it is being processed */
+	FRRS_COMMIT,
+
+	/* The ring entry is waiting for new fuse requests */
+	FRRS_AVAILABLE,
+
+	/* The ring entry is in or on the way to user space */
+	FRRS_USERSPACE,
+};
+
+/** A fuse ring entry, part of the ring queue */
+struct fuse_ring_ent {
+	/* userspace buffer */
+	struct fuse_uring_req_header __user *headers;
+	void __user *payload;
+
+	/* the ring queue that owns the request */
+	struct fuse_ring_queue *queue;
+
+	/* fields below are protected by queue->lock */
+
+	struct io_uring_cmd *cmd;
+
+	struct list_head list;
+
+	enum fuse_ring_req_state state;
+
+	struct fuse_req *fuse_req;
+
+	/* commit id to identify the server reply */
+	uint64_t commit_id;
+};
+
+struct fuse_ring_queue {
+	/*
+	 * back pointer to the main fuse uring structure that holds this
+	 * queue
+	 */
+	struct fuse_ring *ring;
+
+	/* queue id, corresponds to the cpu core */
+	unsigned int qid;
+
+	/*
+	 * queue lock, taken when any value in the queue changes _and_ also
+	 * a ring entry state changes.
+	 */
+	spinlock_t lock;
+
+	/* available ring entries (struct fuse_ring_ent) */
+	struct list_head ent_avail_queue;
+
+	/*
+	 * entries in the process of being committed or in the process
+	 * to be sent to userspace
+	 */
+	struct list_head ent_commit_queue;
+};
+
+/**
+ * Describes if uring is for communication and holds alls the data needed
+ * for uring communication
+ */
+struct fuse_ring {
+	/* back pointer */
+	struct fuse_conn *fc;
+
+	/* number of ring queues */
+	size_t nr_queues;
+
+	/* maximum payload/arg size */
+	size_t max_payload_sz;
+
+	struct fuse_ring_queue **queues;
+};
+
+bool fuse_uring_enabled(void);
+void fuse_uring_destruct(struct fuse_conn *fc);
+int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags);
+
+#else /* CONFIG_FUSE_IO_URING */
+
+struct fuse_ring;
+
+static inline void fuse_uring_create(struct fuse_conn *fc)
+{
+}
+
+static inline void fuse_uring_destruct(struct fuse_conn *fc)
+{
+}
+
+static inline bool fuse_uring_enabled(void)
+{
+	return false;
+}
+
+#endif /* CONFIG_FUSE_IO_URING */
+
+#endif /* _FS_FUSE_DEV_URING_I_H */
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index babddd05303796d689a64f0f5a890066b43170ac..d75dd9b59a5c35b76919db760645464f604517f5 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -923,6 +923,11 @@ struct fuse_conn {
 	/** IDR for backing files ids */
 	struct idr backing_files_map;
 #endif
+
+#ifdef CONFIG_FUSE_IO_URING
+	/**  uring connection information*/
+	struct fuse_ring *ring;
+#endif
 };
 
 /*
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 3ce4f4e81d09e867c3a7db7b1dbb819f88ed34ef..e4f9bbacfc1bc6f51d5d01b4c47b42cc159ed783 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -7,6 +7,7 @@
 */
 
 #include "fuse_i.h"
+#include "dev_uring_i.h"
 
 #include <linux/pagemap.h>
 #include <linux/slab.h>
@@ -992,6 +993,8 @@ static void delayed_release(struct rcu_head *p)
 {
 	struct fuse_conn *fc = container_of(p, struct fuse_conn, rcu);
 
+	fuse_uring_destruct(fc);
+
 	put_user_ns(fc->user_ns);
 	fc->release(fc);
 }
@@ -1446,6 +1449,13 @@ void fuse_send_init(struct fuse_mount *fm)
 	if (IS_ENABLED(CONFIG_FUSE_PASSTHROUGH))
 		flags |= FUSE_PASSTHROUGH;
 
+	/*
+	 * This is just an information flag for fuse server. No need to check
+	 * the reply - server is either sending IORING_OP_URING_CMD or not.
+	 */
+	if (fuse_uring_enabled())
+		flags |= FUSE_OVER_IO_URING;
+
 	ia->in.flags = flags;
 	ia->in.flags2 = flags >> 32;
 
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index f1e99458e29e4fdce5273bc3def242342f207ebd..5e0eb41d967e9de5951673de4405a3ed22cdd8e2 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -220,6 +220,15 @@
  *
  *  7.41
  *  - add FUSE_ALLOW_IDMAP
+ *  7.42
+ *  - Add FUSE_OVER_IO_URING and all other io-uring related flags and data
+ *    structures:
+ *    - struct fuse_uring_ent_in_out
+ *    - struct fuse_uring_req_header
+ *    - struct fuse_uring_cmd_req
+ *    - FUSE_URING_IN_OUT_HEADER_SZ
+ *    - FUSE_URING_OP_IN_OUT_SZ
+ *    - enum fuse_uring_cmd
  */
 
 #ifndef _LINUX_FUSE_H
@@ -255,7 +264,7 @@
 #define FUSE_KERNEL_VERSION 7
 
 /** Minor version number of this interface */
-#define FUSE_KERNEL_MINOR_VERSION 41
+#define FUSE_KERNEL_MINOR_VERSION 42
 
 /** The node ID of the root inode */
 #define FUSE_ROOT_ID 1
@@ -425,6 +434,7 @@ struct fuse_file_lock {
  * FUSE_HAS_RESEND: kernel supports resending pending requests, and the high bit
  *		    of the request ID indicates resend requests
  * FUSE_ALLOW_IDMAP: allow creation of idmapped mounts
+ * FUSE_OVER_IO_URING: Indicate that client supports io-uring
  */
 #define FUSE_ASYNC_READ		(1 << 0)
 #define FUSE_POSIX_LOCKS	(1 << 1)
@@ -471,6 +481,7 @@ struct fuse_file_lock {
 /* Obsolete alias for FUSE_DIRECT_IO_ALLOW_MMAP */
 #define FUSE_DIRECT_IO_RELAX	FUSE_DIRECT_IO_ALLOW_MMAP
 #define FUSE_ALLOW_IDMAP	(1ULL << 40)
+#define FUSE_OVER_IO_URING	(1ULL << 41)
 
 /**
  * CUSE INIT request/reply flags
@@ -1206,4 +1217,67 @@ struct fuse_supp_groups {
 	uint32_t	groups[];
 };
 
+/**
+ * Size of the ring buffer header
+ */
+#define FUSE_URING_IN_OUT_HEADER_SZ 128
+#define FUSE_URING_OP_IN_OUT_SZ 128
+
+/* Used as part of the fuse_uring_req_header */
+struct fuse_uring_ent_in_out {
+	uint64_t flags;
+
+	/*
+	 * commit ID to be used in a reply to a ring request (see also
+	 * struct fuse_uring_cmd_req)
+	 */
+	uint64_t commit_id;
+
+	/* size of user payload buffer */
+	uint32_t payload_sz;
+	uint32_t padding;
+
+	uint64_t reserved;
+};
+
+/**
+ * Header for all fuse-io-uring requests
+ */
+struct fuse_uring_req_header {
+	/* struct fuse_in_header / struct fuse_out_header */
+	char in_out[FUSE_URING_IN_OUT_HEADER_SZ];
+
+	/* per op code header */
+	char op_in[FUSE_URING_OP_IN_OUT_SZ];
+
+	struct fuse_uring_ent_in_out ring_ent_in_out;
+};
+
+/**
+ * sqe commands to the kernel
+ */
+enum fuse_uring_cmd {
+	FUSE_IO_URING_CMD_INVALID = 0,
+
+	/* register the request buffer and fetch a fuse request */
+	FUSE_IO_URING_CMD_REGISTER = 1,
+
+	/* commit fuse request result and fetch next request */
+	FUSE_IO_URING_CMD_COMMIT_AND_FETCH = 2,
+};
+
+/**
+ * In the 80B command area of the SQE.
+ */
+struct fuse_uring_cmd_req {
+	uint64_t flags;
+
+	/* entry identifier for commits */
+	uint64_t commit_id;
+
+	/* queue the command is for (queue index) */
+	uint16_t qid;
+	uint8_t padding[6];
+};
+
 #endif /* _LINUX_FUSE_H */

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v9 07/17] fuse: Make fuse_copy non static
  2025-01-07  0:25 [PATCH v9 00/17] fuse: fuse-over-io-uring Bernd Schubert
                   ` (5 preceding siblings ...)
  2025-01-07  0:25 ` [PATCH v9 06/17] fuse: {io-uring} Handle SQEs - register commands Bernd Schubert
@ 2025-01-07  0:25 ` Bernd Schubert
  2025-01-07  0:25 ` [PATCH v9 08/17] fuse: Add fuse-io-uring handling into fuse_copy Bernd Schubert
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 45+ messages in thread
From: Bernd Schubert @ 2025-01-07  0:25 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
	Bernd Schubert

Move 'struct fuse_copy_state' and fuse_copy_* functions
to fuse_dev_i.h to make it available for fuse-io-uring.
'copy_out_args()' is renamed to 'fuse_copy_out_args'.

Signed-off-by: Bernd Schubert <[email protected]>
Reviewed-by: Joanne Koong <[email protected]>
---
 fs/fuse/dev.c        | 30 ++++++++----------------------
 fs/fuse/fuse_dev_i.h | 25 +++++++++++++++++++++++++
 2 files changed, 33 insertions(+), 22 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 623c5a067c1841e8210b5b4e063e7b6690f1825a..6ee7e28a84c80a3e7c8dc933986c0388371ff6cd 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -678,22 +678,8 @@ static int unlock_request(struct fuse_req *req)
 	return err;
 }
 
-struct fuse_copy_state {
-	int write;
-	struct fuse_req *req;
-	struct iov_iter *iter;
-	struct pipe_buffer *pipebufs;
-	struct pipe_buffer *currbuf;
-	struct pipe_inode_info *pipe;
-	unsigned long nr_segs;
-	struct page *pg;
-	unsigned len;
-	unsigned offset;
-	unsigned move_pages:1;
-};
-
-static void fuse_copy_init(struct fuse_copy_state *cs, int write,
-			   struct iov_iter *iter)
+void fuse_copy_init(struct fuse_copy_state *cs, int write,
+		    struct iov_iter *iter)
 {
 	memset(cs, 0, sizeof(*cs));
 	cs->write = write;
@@ -1054,9 +1040,9 @@ static int fuse_copy_one(struct fuse_copy_state *cs, void *val, unsigned size)
 }
 
 /* Copy request arguments to/from userspace buffer */
-static int fuse_copy_args(struct fuse_copy_state *cs, unsigned numargs,
-			  unsigned argpages, struct fuse_arg *args,
-			  int zeroing)
+int fuse_copy_args(struct fuse_copy_state *cs, unsigned numargs,
+		   unsigned argpages, struct fuse_arg *args,
+		   int zeroing)
 {
 	int err = 0;
 	unsigned i;
@@ -1933,8 +1919,8 @@ static struct fuse_req *request_find(struct fuse_pqueue *fpq, u64 unique)
 	return NULL;
 }
 
-static int copy_out_args(struct fuse_copy_state *cs, struct fuse_args *args,
-			 unsigned nbytes)
+int fuse_copy_out_args(struct fuse_copy_state *cs, struct fuse_args *args,
+		       unsigned nbytes)
 {
 	unsigned reqsize = sizeof(struct fuse_out_header);
 
@@ -2036,7 +2022,7 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud,
 	if (oh.error)
 		err = nbytes != sizeof(oh) ? -EINVAL : 0;
 	else
-		err = copy_out_args(cs, req->args, nbytes);
+		err = fuse_copy_out_args(cs, req->args, nbytes);
 	fuse_copy_finish(cs);
 
 	spin_lock(&fpq->lock);
diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
index 08a7e88e002773fcd18c25a229c7aa6450831401..21eb1bdb492d04f0a406d25bb8d300b34244dce2 100644
--- a/fs/fuse/fuse_dev_i.h
+++ b/fs/fuse/fuse_dev_i.h
@@ -12,6 +12,23 @@
 #define FUSE_INT_REQ_BIT (1ULL << 0)
 #define FUSE_REQ_ID_STEP (1ULL << 1)
 
+struct fuse_arg;
+struct fuse_args;
+
+struct fuse_copy_state {
+	int write;
+	struct fuse_req *req;
+	struct iov_iter *iter;
+	struct pipe_buffer *pipebufs;
+	struct pipe_buffer *currbuf;
+	struct pipe_inode_info *pipe;
+	unsigned long nr_segs;
+	struct page *pg;
+	unsigned int len;
+	unsigned int offset;
+	unsigned int move_pages:1;
+};
+
 static inline struct fuse_dev *fuse_get_dev(struct file *file)
 {
 	/*
@@ -23,5 +40,13 @@ static inline struct fuse_dev *fuse_get_dev(struct file *file)
 
 void fuse_dev_end_requests(struct list_head *head);
 
+void fuse_copy_init(struct fuse_copy_state *cs, int write,
+			   struct iov_iter *iter);
+int fuse_copy_args(struct fuse_copy_state *cs, unsigned int numargs,
+		   unsigned int argpages, struct fuse_arg *args,
+		   int zeroing);
+int fuse_copy_out_args(struct fuse_copy_state *cs, struct fuse_args *args,
+		       unsigned int nbytes);
+
 #endif
 

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v9 08/17] fuse: Add fuse-io-uring handling into fuse_copy
  2025-01-07  0:25 [PATCH v9 00/17] fuse: fuse-over-io-uring Bernd Schubert
                   ` (6 preceding siblings ...)
  2025-01-07  0:25 ` [PATCH v9 07/17] fuse: Make fuse_copy non static Bernd Schubert
@ 2025-01-07  0:25 ` Bernd Schubert
  2025-01-10 22:18   ` Joanne Koong
  2025-01-07  0:25 ` [PATCH v9 09/17] fuse: {io-uring} Make hash-list req unique finding functions non-static Bernd Schubert
                   ` (9 subsequent siblings)
  17 siblings, 1 reply; 45+ messages in thread
From: Bernd Schubert @ 2025-01-07  0:25 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
	Bernd Schubert

Add special fuse-io-uring into the fuse argument
copy handler.

Signed-off-by: Bernd Schubert <[email protected]>
---
 fs/fuse/dev.c        | 12 +++++++++++-
 fs/fuse/fuse_dev_i.h |  4 ++++
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 6ee7e28a84c80a3e7c8dc933986c0388371ff6cd..8b03a540e151daa1f62986aa79030e9e7a456059 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -786,6 +786,9 @@ static int fuse_copy_do(struct fuse_copy_state *cs, void **val, unsigned *size)
 	*size -= ncpy;
 	cs->len -= ncpy;
 	cs->offset += ncpy;
+	if (cs->is_uring)
+		cs->ring.copied_sz += ncpy;
+
 	return ncpy;
 }
 
@@ -1922,7 +1925,14 @@ static struct fuse_req *request_find(struct fuse_pqueue *fpq, u64 unique)
 int fuse_copy_out_args(struct fuse_copy_state *cs, struct fuse_args *args,
 		       unsigned nbytes)
 {
-	unsigned reqsize = sizeof(struct fuse_out_header);
+
+	unsigned int reqsize = 0;
+
+	/*
+	 * Uring has all headers separated from args - args is payload only
+	 */
+	if (!cs->is_uring)
+		reqsize = sizeof(struct fuse_out_header);
 
 	reqsize += fuse_len_args(args->out_numargs, args->out_args);
 
diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
index 21eb1bdb492d04f0a406d25bb8d300b34244dce2..4a8a4feb2df53fb84938a6711e6bcfd0f1b9f615 100644
--- a/fs/fuse/fuse_dev_i.h
+++ b/fs/fuse/fuse_dev_i.h
@@ -27,6 +27,10 @@ struct fuse_copy_state {
 	unsigned int len;
 	unsigned int offset;
 	unsigned int move_pages:1;
+	unsigned int is_uring:1;
+	struct {
+		unsigned int copied_sz; /* copied size into the user buffer */
+	} ring;
 };
 
 static inline struct fuse_dev *fuse_get_dev(struct file *file)

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v9 09/17] fuse: {io-uring} Make hash-list req unique finding functions non-static
  2025-01-07  0:25 [PATCH v9 00/17] fuse: fuse-over-io-uring Bernd Schubert
                   ` (7 preceding siblings ...)
  2025-01-07  0:25 ` [PATCH v9 08/17] fuse: Add fuse-io-uring handling into fuse_copy Bernd Schubert
@ 2025-01-07  0:25 ` Bernd Schubert
  2025-01-07  0:25 ` [PATCH v9 10/17] fuse: Add io-uring sqe commit and fetch support Bernd Schubert
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 45+ messages in thread
From: Bernd Schubert @ 2025-01-07  0:25 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
	Bernd Schubert

fuse-over-io-uring uses existing functions to find requests based
on their unique id - make these functions non-static.

Signed-off-by: Bernd Schubert <[email protected]>
Reviewed-by: Joanne Koong <[email protected]>
---
 fs/fuse/dev.c        | 6 +++---
 fs/fuse/fuse_dev_i.h | 6 ++++++
 fs/fuse/fuse_i.h     | 5 +++++
 fs/fuse/inode.c      | 2 +-
 4 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 8b03a540e151daa1f62986aa79030e9e7a456059..aa33eba51c51dff6af2cdcf60bed9c3f6b4bc0d0 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -220,7 +220,7 @@ u64 fuse_get_unique(struct fuse_iqueue *fiq)
 }
 EXPORT_SYMBOL_GPL(fuse_get_unique);
 
-static unsigned int fuse_req_hash(u64 unique)
+unsigned int fuse_req_hash(u64 unique)
 {
 	return hash_long(unique & ~FUSE_INT_REQ_BIT, FUSE_PQ_HASH_BITS);
 }
@@ -1910,7 +1910,7 @@ static int fuse_notify(struct fuse_conn *fc, enum fuse_notify_code code,
 }
 
 /* Look up request on processing list by unique ID */
-static struct fuse_req *request_find(struct fuse_pqueue *fpq, u64 unique)
+struct fuse_req *fuse_request_find(struct fuse_pqueue *fpq, u64 unique)
 {
 	unsigned int hash = fuse_req_hash(unique);
 	struct fuse_req *req;
@@ -1994,7 +1994,7 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud,
 	spin_lock(&fpq->lock);
 	req = NULL;
 	if (fpq->connected)
-		req = request_find(fpq, oh.unique & ~FUSE_INT_REQ_BIT);
+		req = fuse_request_find(fpq, oh.unique & ~FUSE_INT_REQ_BIT);
 
 	err = -ENOENT;
 	if (!req) {
diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
index 4a8a4feb2df53fb84938a6711e6bcfd0f1b9f615..599a61536f8c85b3631b8584247a917bda92e719 100644
--- a/fs/fuse/fuse_dev_i.h
+++ b/fs/fuse/fuse_dev_i.h
@@ -7,6 +7,7 @@
 #define _FS_FUSE_DEV_I_H
 
 #include <linux/types.h>
+#include <linux/fs.h>
 
 /* Ordinary requests have even IDs, while interrupts IDs are odd */
 #define FUSE_INT_REQ_BIT (1ULL << 0)
@@ -14,6 +15,8 @@
 
 struct fuse_arg;
 struct fuse_args;
+struct fuse_pqueue;
+struct fuse_req;
 
 struct fuse_copy_state {
 	int write;
@@ -42,6 +45,9 @@ static inline struct fuse_dev *fuse_get_dev(struct file *file)
 	return READ_ONCE(file->private_data);
 }
 
+unsigned int fuse_req_hash(u64 unique);
+struct fuse_req *fuse_request_find(struct fuse_pqueue *fpq, u64 unique);
+
 void fuse_dev_end_requests(struct list_head *head);
 
 void fuse_copy_init(struct fuse_copy_state *cs, int write,
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index d75dd9b59a5c35b76919db760645464f604517f5..e545b0864dd51e82df61cc39bdf65d3d36a418dc 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1237,6 +1237,11 @@ void fuse_change_entry_timeout(struct dentry *entry, struct fuse_entry_out *o);
  */
 struct fuse_conn *fuse_conn_get(struct fuse_conn *fc);
 
+/**
+ * Initialize the fuse processing queue
+ */
+void fuse_pqueue_init(struct fuse_pqueue *fpq);
+
 /**
  * Initialize fuse_conn
  */
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index e4f9bbacfc1bc6f51d5d01b4c47b42cc159ed783..328797b9aac9a816a4ad2c69b6880dc6ef6222b0 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -938,7 +938,7 @@ static void fuse_iqueue_init(struct fuse_iqueue *fiq,
 	fiq->priv = priv;
 }
 
-static void fuse_pqueue_init(struct fuse_pqueue *fpq)
+void fuse_pqueue_init(struct fuse_pqueue *fpq)
 {
 	unsigned int i;
 

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v9 10/17] fuse: Add io-uring sqe commit and fetch support
  2025-01-07  0:25 [PATCH v9 00/17] fuse: fuse-over-io-uring Bernd Schubert
                   ` (8 preceding siblings ...)
  2025-01-07  0:25 ` [PATCH v9 09/17] fuse: {io-uring} Make hash-list req unique finding functions non-static Bernd Schubert
@ 2025-01-07  0:25 ` Bernd Schubert
  2025-01-07 14:42   ` Luis Henriques
                     ` (2 more replies)
  2025-01-07  0:25 ` [PATCH v9 11/17] fuse: {io-uring} Handle teardown of ring entries Bernd Schubert
                   ` (7 subsequent siblings)
  17 siblings, 3 replies; 45+ messages in thread
From: Bernd Schubert @ 2025-01-07  0:25 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
	Bernd Schubert

This adds support for fuse request completion through ring SQEs
(FUSE_URING_CMD_COMMIT_AND_FETCH handling). After committing
the ring entry it becomes available for new fuse requests.
Handling of requests through the ring (SQE/CQE handling)
is complete now.

Fuse request data are copied through the mmaped ring buffer,
there is no support for any zero copy yet.

Signed-off-by: Bernd Schubert <[email protected]>
---
 fs/fuse/dev_uring.c   | 450 ++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/fuse/dev_uring_i.h |  12 ++
 fs/fuse/fuse_i.h      |   4 +
 3 files changed, 466 insertions(+)

diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index b44ba4033615e01041313c040035b6da6af0ee17..f44e66a7ea577390da87e9ac7d118a9416898c28 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -26,6 +26,19 @@ bool fuse_uring_enabled(void)
 	return enable_uring;
 }
 
+static void fuse_uring_req_end(struct fuse_ring_ent *ring_ent, bool set_err,
+			       int error)
+{
+	struct fuse_req *req = ring_ent->fuse_req;
+
+	if (set_err)
+		req->out.h.error = error;
+
+	clear_bit(FR_SENT, &req->flags);
+	fuse_request_end(ring_ent->fuse_req);
+	ring_ent->fuse_req = NULL;
+}
+
 void fuse_uring_destruct(struct fuse_conn *fc)
 {
 	struct fuse_ring *ring = fc->ring;
@@ -41,8 +54,11 @@ void fuse_uring_destruct(struct fuse_conn *fc)
 			continue;
 
 		WARN_ON(!list_empty(&queue->ent_avail_queue));
+		WARN_ON(!list_empty(&queue->ent_w_req_queue));
 		WARN_ON(!list_empty(&queue->ent_commit_queue));
+		WARN_ON(!list_empty(&queue->ent_in_userspace));
 
+		kfree(queue->fpq.processing);
 		kfree(queue);
 		ring->queues[qid] = NULL;
 	}
@@ -101,20 +117,34 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
 {
 	struct fuse_conn *fc = ring->fc;
 	struct fuse_ring_queue *queue;
+	struct list_head *pq;
 
 	queue = kzalloc(sizeof(*queue), GFP_KERNEL_ACCOUNT);
 	if (!queue)
 		return NULL;
+	pq = kcalloc(FUSE_PQ_HASH_SIZE, sizeof(struct list_head), GFP_KERNEL);
+	if (!pq) {
+		kfree(queue);
+		return NULL;
+	}
+
 	queue->qid = qid;
 	queue->ring = ring;
 	spin_lock_init(&queue->lock);
 
 	INIT_LIST_HEAD(&queue->ent_avail_queue);
 	INIT_LIST_HEAD(&queue->ent_commit_queue);
+	INIT_LIST_HEAD(&queue->ent_w_req_queue);
+	INIT_LIST_HEAD(&queue->ent_in_userspace);
+	INIT_LIST_HEAD(&queue->fuse_req_queue);
+
+	queue->fpq.processing = pq;
+	fuse_pqueue_init(&queue->fpq);
 
 	spin_lock(&fc->lock);
 	if (ring->queues[qid]) {
 		spin_unlock(&fc->lock);
+		kfree(queue->fpq.processing);
 		kfree(queue);
 		return ring->queues[qid];
 	}
@@ -128,6 +158,214 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
 	return queue;
 }
 
+/*
+ * Checks for errors and stores it into the request
+ */
+static int fuse_uring_out_header_has_err(struct fuse_out_header *oh,
+					 struct fuse_req *req,
+					 struct fuse_conn *fc)
+{
+	int err;
+
+	err = -EINVAL;
+	if (oh->unique == 0) {
+		/* Not supportd through io-uring yet */
+		pr_warn_once("notify through fuse-io-uring not supported\n");
+		goto seterr;
+	}
+
+	err = -EINVAL;
+	if (oh->error <= -ERESTARTSYS || oh->error > 0)
+		goto seterr;
+
+	if (oh->error) {
+		err = oh->error;
+		goto err;
+	}
+
+	err = -ENOENT;
+	if ((oh->unique & ~FUSE_INT_REQ_BIT) != req->in.h.unique) {
+		pr_warn_ratelimited("unique mismatch, expected: %llu got %llu\n",
+				    req->in.h.unique,
+				    oh->unique & ~FUSE_INT_REQ_BIT);
+		goto seterr;
+	}
+
+	/*
+	 * Is it an interrupt reply ID?
+	 * XXX: Not supported through fuse-io-uring yet, it should not even
+	 *      find the request - should not happen.
+	 */
+	WARN_ON_ONCE(oh->unique & FUSE_INT_REQ_BIT);
+
+	return 0;
+
+seterr:
+	oh->error = err;
+err:
+	return err;
+}
+
+static int fuse_uring_copy_from_ring(struct fuse_ring *ring,
+				     struct fuse_req *req,
+				     struct fuse_ring_ent *ent)
+{
+	struct fuse_copy_state cs;
+	struct fuse_args *args = req->args;
+	struct iov_iter iter;
+	int err, res;
+	struct fuse_uring_ent_in_out ring_in_out;
+
+	res = copy_from_user(&ring_in_out, &ent->headers->ring_ent_in_out,
+			     sizeof(ring_in_out));
+	if (res)
+		return -EFAULT;
+
+	err = import_ubuf(ITER_SOURCE, ent->payload, ring->max_payload_sz,
+			  &iter);
+	if (err)
+		return err;
+
+	fuse_copy_init(&cs, 0, &iter);
+	cs.is_uring = 1;
+	cs.req = req;
+
+	return fuse_copy_out_args(&cs, args, ring_in_out.payload_sz);
+}
+
+ /*
+  * Copy data from the req to the ring buffer
+  */
+static int fuse_uring_copy_to_ring(struct fuse_ring *ring, struct fuse_req *req,
+				   struct fuse_ring_ent *ent)
+{
+	struct fuse_copy_state cs;
+	struct fuse_args *args = req->args;
+	struct fuse_in_arg *in_args = args->in_args;
+	int num_args = args->in_numargs;
+	int err, res;
+	struct iov_iter iter;
+	struct fuse_uring_ent_in_out ent_in_out = {
+		.flags = 0,
+		.commit_id = ent->commit_id,
+	};
+
+	if (WARN_ON(ent_in_out.commit_id == 0))
+		return -EINVAL;
+
+	err = import_ubuf(ITER_DEST, ent->payload, ring->max_payload_sz, &iter);
+	if (err) {
+		pr_info_ratelimited("fuse: Import of user buffer failed\n");
+		return err;
+	}
+
+	fuse_copy_init(&cs, 1, &iter);
+	cs.is_uring = 1;
+	cs.req = req;
+
+	if (num_args > 0) {
+		/*
+		 * Expectation is that the first argument is the per op header.
+		 * Some op code have that as zero.
+		 */
+		if (args->in_args[0].size > 0) {
+			res = copy_to_user(&ent->headers->op_in, in_args->value,
+					   in_args->size);
+			err = res > 0 ? -EFAULT : res;
+			if (err) {
+				pr_info_ratelimited(
+					"Copying the header failed.\n");
+				return err;
+			}
+		}
+		in_args++;
+		num_args--;
+	}
+
+	/* copy the payload */
+	err = fuse_copy_args(&cs, num_args, args->in_pages,
+			     (struct fuse_arg *)in_args, 0);
+	if (err) {
+		pr_info_ratelimited("%s fuse_copy_args failed\n", __func__);
+		return err;
+	}
+
+	ent_in_out.payload_sz = cs.ring.copied_sz;
+	res = copy_to_user(&ent->headers->ring_ent_in_out, &ent_in_out,
+			   sizeof(ent_in_out));
+	err = res > 0 ? -EFAULT : res;
+	if (err)
+		return err;
+
+	return 0;
+}
+
+static int
+fuse_uring_prepare_send(struct fuse_ring_ent *ring_ent)
+{
+	struct fuse_ring_queue *queue = ring_ent->queue;
+	struct fuse_ring *ring = queue->ring;
+	struct fuse_req *req = ring_ent->fuse_req;
+	int err, res;
+
+	err = -EIO;
+	if (WARN_ON(ring_ent->state != FRRS_FUSE_REQ)) {
+		pr_err("qid=%d ring-req=%p invalid state %d on send\n",
+		       queue->qid, ring_ent, ring_ent->state);
+		err = -EIO;
+		goto err;
+	}
+
+	/* copy the request */
+	err = fuse_uring_copy_to_ring(ring, req, ring_ent);
+	if (unlikely(err)) {
+		pr_info_ratelimited("Copy to ring failed: %d\n", err);
+		goto err;
+	}
+
+	/* copy fuse_in_header */
+	res = copy_to_user(&ring_ent->headers->in_out, &req->in.h,
+			   sizeof(req->in.h));
+	err = res > 0 ? -EFAULT : res;
+	if (err)
+		goto err;
+
+	set_bit(FR_SENT, &req->flags);
+	return 0;
+
+err:
+	fuse_uring_req_end(ring_ent, true, err);
+	return err;
+}
+
+/*
+ * Write data to the ring buffer and send the request to userspace,
+ * userspace will read it
+ * This is comparable with classical read(/dev/fuse)
+ */
+static int fuse_uring_send_next_to_ring(struct fuse_ring_ent *ring_ent,
+					unsigned int issue_flags)
+{
+	int err = 0;
+	struct fuse_ring_queue *queue = ring_ent->queue;
+
+	err = fuse_uring_prepare_send(ring_ent);
+	if (err)
+		goto err;
+
+	spin_lock(&queue->lock);
+	ring_ent->state = FRRS_USERSPACE;
+	list_move(&ring_ent->list, &queue->ent_in_userspace);
+	spin_unlock(&queue->lock);
+
+	io_uring_cmd_done(ring_ent->cmd, 0, 0, issue_flags);
+	ring_ent->cmd = NULL;
+	return 0;
+
+err:
+	return err;
+}
+
 /*
  * Make a ring entry available for fuse_req assignment
  */
@@ -138,6 +376,210 @@ static void fuse_uring_ent_avail(struct fuse_ring_ent *ring_ent,
 	ring_ent->state = FRRS_AVAILABLE;
 }
 
+/* Used to find the request on SQE commit */
+static void fuse_uring_add_to_pq(struct fuse_ring_ent *ring_ent,
+				 struct fuse_req *req)
+{
+	struct fuse_ring_queue *queue = ring_ent->queue;
+	struct fuse_pqueue *fpq = &queue->fpq;
+	unsigned int hash;
+
+	/* commit_id is the unique id of the request */
+	ring_ent->commit_id = req->in.h.unique;
+
+	req->ring_entry = ring_ent;
+	hash = fuse_req_hash(ring_ent->commit_id);
+	list_move_tail(&req->list, &fpq->processing[hash]);
+}
+
+/*
+ * Assign a fuse queue entry to the given entry
+ */
+static void fuse_uring_add_req_to_ring_ent(struct fuse_ring_ent *ring_ent,
+					   struct fuse_req *req)
+{
+	struct fuse_ring_queue *queue = ring_ent->queue;
+
+	lockdep_assert_held(&queue->lock);
+
+	if (WARN_ON_ONCE(ring_ent->state != FRRS_AVAILABLE &&
+			 ring_ent->state != FRRS_COMMIT)) {
+		pr_warn("%s qid=%d state=%d\n", __func__, ring_ent->queue->qid,
+			ring_ent->state);
+	}
+	list_del_init(&req->list);
+	clear_bit(FR_PENDING, &req->flags);
+	ring_ent->fuse_req = req;
+	ring_ent->state = FRRS_FUSE_REQ;
+	list_move(&ring_ent->list, &queue->ent_w_req_queue);
+	fuse_uring_add_to_pq(ring_ent, req);
+}
+
+/*
+ * Release the ring entry and fetch the next fuse request if available
+ *
+ * @return true if a new request has been fetched
+ */
+static bool fuse_uring_ent_assign_req(struct fuse_ring_ent *ring_ent)
+	__must_hold(&queue->lock)
+{
+	struct fuse_req *req;
+	struct fuse_ring_queue *queue = ring_ent->queue;
+	struct list_head *req_queue = &queue->fuse_req_queue;
+
+	lockdep_assert_held(&queue->lock);
+
+	/* get and assign the next entry while it is still holding the lock */
+	req = list_first_entry_or_null(req_queue, struct fuse_req, list);
+	if (req) {
+		fuse_uring_add_req_to_ring_ent(ring_ent, req);
+		return true;
+	}
+
+	return false;
+}
+
+/*
+ * Read data from the ring buffer, which user space has written to
+ * This is comparible with handling of classical write(/dev/fuse).
+ * Also make the ring request available again for new fuse requests.
+ */
+static void fuse_uring_commit(struct fuse_ring_ent *ring_ent,
+			      unsigned int issue_flags)
+{
+	struct fuse_ring *ring = ring_ent->queue->ring;
+	struct fuse_conn *fc = ring->fc;
+	struct fuse_req *req = ring_ent->fuse_req;
+	ssize_t err = 0;
+	bool set_err = false;
+
+	err = copy_from_user(&req->out.h, &ring_ent->headers->in_out,
+			     sizeof(req->out.h));
+	if (err) {
+		req->out.h.error = err;
+		goto out;
+	}
+
+	err = fuse_uring_out_header_has_err(&req->out.h, req, fc);
+	if (err) {
+		/* req->out.h.error already set */
+		goto out;
+	}
+
+	err = fuse_uring_copy_from_ring(ring, req, ring_ent);
+	if (err)
+		set_err = true;
+
+out:
+	fuse_uring_req_end(ring_ent, set_err, err);
+}
+
+/*
+ * Get the next fuse req and send it
+ */
+static void fuse_uring_next_fuse_req(struct fuse_ring_ent *ring_ent,
+				     struct fuse_ring_queue *queue,
+				     unsigned int issue_flags)
+{
+	int err;
+	bool has_next;
+
+retry:
+	spin_lock(&queue->lock);
+	fuse_uring_ent_avail(ring_ent, queue);
+	has_next = fuse_uring_ent_assign_req(ring_ent);
+	spin_unlock(&queue->lock);
+
+	if (has_next) {
+		err = fuse_uring_send_next_to_ring(ring_ent, issue_flags);
+		if (err)
+			goto retry;
+	}
+}
+
+static int fuse_ring_ent_set_commit(struct fuse_ring_ent *ent)
+{
+	struct fuse_ring_queue *queue = ent->queue;
+
+	lockdep_assert_held(&queue->lock);
+
+	if (WARN_ON_ONCE(ent->state != FRRS_USERSPACE))
+		return -EIO;
+
+	ent->state = FRRS_COMMIT;
+	list_move(&ent->list, &queue->ent_commit_queue);
+
+	return 0;
+}
+
+/* FUSE_URING_CMD_COMMIT_AND_FETCH handler */
+static int fuse_uring_commit_fetch(struct io_uring_cmd *cmd, int issue_flags,
+				   struct fuse_conn *fc)
+{
+	const struct fuse_uring_cmd_req *cmd_req = io_uring_sqe_cmd(cmd->sqe);
+	struct fuse_ring_ent *ring_ent;
+	int err;
+	struct fuse_ring *ring = fc->ring;
+	struct fuse_ring_queue *queue;
+	uint64_t commit_id = READ_ONCE(cmd_req->commit_id);
+	unsigned int qid = READ_ONCE(cmd_req->qid);
+	struct fuse_pqueue *fpq;
+	struct fuse_req *req;
+
+	err = -ENOTCONN;
+	if (!ring)
+		return err;
+
+	if (qid >= ring->nr_queues)
+		return -EINVAL;
+
+	queue = ring->queues[qid];
+	if (!queue)
+		return err;
+	fpq = &queue->fpq;
+
+	spin_lock(&queue->lock);
+	/* Find a request based on the unique ID of the fuse request
+	 * This should get revised, as it needs a hash calculation and list
+	 * search. And full struct fuse_pqueue is needed (memory overhead).
+	 * As well as the link from req to ring_ent.
+	 */
+	req = fuse_request_find(fpq, commit_id);
+	err = -ENOENT;
+	if (!req) {
+		pr_info("qid=%d commit_id %llu not found\n", queue->qid,
+			commit_id);
+		spin_unlock(&queue->lock);
+		return err;
+	}
+	list_del_init(&req->list);
+	ring_ent = req->ring_entry;
+	req->ring_entry = NULL;
+
+	err = fuse_ring_ent_set_commit(ring_ent);
+	if (err != 0) {
+		pr_info_ratelimited("qid=%d commit_id %llu state %d",
+				    queue->qid, commit_id, ring_ent->state);
+		spin_unlock(&queue->lock);
+		return err;
+	}
+
+	ring_ent->cmd = cmd;
+	spin_unlock(&queue->lock);
+
+	/* without the queue lock, as other locks are taken */
+	fuse_uring_commit(ring_ent, issue_flags);
+
+	/*
+	 * Fetching the next request is absolutely required as queued
+	 * fuse requests would otherwise not get processed - committing
+	 * and fetching is done in one step vs legacy fuse, which has separated
+	 * read (fetch request) and write (commit result).
+	 */
+	fuse_uring_next_fuse_req(ring_ent, queue, issue_flags);
+	return 0;
+}
+
 /*
  * fuse_uring_req_fetch command handling
  */
@@ -325,6 +767,14 @@ int __maybe_unused fuse_uring_cmd(struct io_uring_cmd *cmd,
 			return err;
 		}
 		break;
+	case FUSE_IO_URING_CMD_COMMIT_AND_FETCH:
+		err = fuse_uring_commit_fetch(cmd, issue_flags, fc);
+		if (err) {
+			pr_info_once("FUSE_IO_URING_COMMIT_AND_FETCH failed err=%d\n",
+				     err);
+			return err;
+		}
+		break;
 	default:
 		return -EINVAL;
 	}
diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
index 4e46dd65196d26dabc62dada33b17de9aa511c08..80f1c62d4df7f0ca77c4d5179068df6ffdbf7d85 100644
--- a/fs/fuse/dev_uring_i.h
+++ b/fs/fuse/dev_uring_i.h
@@ -20,6 +20,9 @@ enum fuse_ring_req_state {
 	/* The ring entry is waiting for new fuse requests */
 	FRRS_AVAILABLE,
 
+	/* The ring entry got assigned a fuse req */
+	FRRS_FUSE_REQ,
+
 	/* The ring entry is in or on the way to user space */
 	FRRS_USERSPACE,
 };
@@ -70,7 +73,16 @@ struct fuse_ring_queue {
 	 * entries in the process of being committed or in the process
 	 * to be sent to userspace
 	 */
+	struct list_head ent_w_req_queue;
 	struct list_head ent_commit_queue;
+
+	/* entries in userspace */
+	struct list_head ent_in_userspace;
+
+	/* fuse requests waiting for an entry slot */
+	struct list_head fuse_req_queue;
+
+	struct fuse_pqueue fpq;
 };
 
 /**
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index e545b0864dd51e82df61cc39bdf65d3d36a418dc..e71556894bc25808581424ec7bdd4afeebc81f15 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -438,6 +438,10 @@ struct fuse_req {
 
 	/** fuse_mount this request belongs to */
 	struct fuse_mount *fm;
+
+#ifdef CONFIG_FUSE_IO_URING
+	void *ring_entry;
+#endif
 };
 
 struct fuse_iqueue;

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v9 11/17] fuse: {io-uring} Handle teardown of ring entries
  2025-01-07  0:25 [PATCH v9 00/17] fuse: fuse-over-io-uring Bernd Schubert
                   ` (9 preceding siblings ...)
  2025-01-07  0:25 ` [PATCH v9 10/17] fuse: Add io-uring sqe commit and fetch support Bernd Schubert
@ 2025-01-07  0:25 ` Bernd Schubert
  2025-01-07 15:31   ` Luis Henriques
  2025-01-17 11:23   ` Pavel Begunkov
  2025-01-07  0:25 ` [PATCH v9 12/17] fuse: {io-uring} Make fuse_dev_queue_{interrupt,forget} non-static Bernd Schubert
                   ` (6 subsequent siblings)
  17 siblings, 2 replies; 45+ messages in thread
From: Bernd Schubert @ 2025-01-07  0:25 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
	Bernd Schubert

On teardown struct file_operations::uring_cmd requests
need to be completed by calling io_uring_cmd_done().
Not completing all ring entries would result in busy io-uring
tasks giving warning messages in intervals and unreleased
struct file.

Additionally the fuse connection and with that the ring can
only get released when all io-uring commands are completed.

Completion is done with ring entries that are
a) in waiting state for new fuse requests - io_uring_cmd_done
is needed

b) already in userspace - io_uring_cmd_done through teardown
is not needed, the request can just get released. If fuse server
is still active and commits such a ring entry, fuse_uring_cmd()
already checks if the connection is active and then complete the
io-uring itself with -ENOTCONN. I.e. special handling is not
needed.

This scheme is basically represented by the ring entry state
FRRS_WAIT and FRRS_USERSPACE.

Entries in state:
- FRRS_INIT: No action needed, do not contribute to
  ring->queue_refs yet
- All other states: Are currently processed by other tasks,
  async teardown is needed and it has to wait for the two
  states above. It could be also solved without an async
  teardown task, but would require additional if conditions
  in hot code paths. Also in my personal opinion the code
  looks cleaner with async teardown.

Signed-off-by: Bernd Schubert <[email protected]>
---
 fs/fuse/dev.c         |   9 +++
 fs/fuse/dev_uring.c   | 198 ++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/fuse/dev_uring_i.h |  51 +++++++++++++
 3 files changed, 258 insertions(+)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index aa33eba51c51dff6af2cdcf60bed9c3f6b4bc0d0..1c21e491e891196c77c7f6135cdc2aece785d399 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -6,6 +6,7 @@
   See the file COPYING.
 */
 
+#include "dev_uring_i.h"
 #include "fuse_i.h"
 #include "fuse_dev_i.h"
 
@@ -2291,6 +2292,12 @@ void fuse_abort_conn(struct fuse_conn *fc)
 		spin_unlock(&fc->lock);
 
 		fuse_dev_end_requests(&to_end);
+
+		/*
+		 * fc->lock must not be taken to avoid conflicts with io-uring
+		 * locks
+		 */
+		fuse_uring_abort(fc);
 	} else {
 		spin_unlock(&fc->lock);
 	}
@@ -2302,6 +2309,8 @@ void fuse_wait_aborted(struct fuse_conn *fc)
 	/* matches implicit memory barrier in fuse_drop_waiting() */
 	smp_mb();
 	wait_event(fc->blocked_waitq, atomic_read(&fc->num_waiting) == 0);
+
+	fuse_uring_wait_stopped_queues(fc);
 }
 
 int fuse_dev_release(struct inode *inode, struct file *file)
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index f44e66a7ea577390da87e9ac7d118a9416898c28..01a908b2ef9ada14b759ca047eab40b4c4431d89 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -39,6 +39,37 @@ static void fuse_uring_req_end(struct fuse_ring_ent *ring_ent, bool set_err,
 	ring_ent->fuse_req = NULL;
 }
 
+/* Abort all list queued request on the given ring queue */
+static void fuse_uring_abort_end_queue_requests(struct fuse_ring_queue *queue)
+{
+	struct fuse_req *req;
+	LIST_HEAD(req_list);
+
+	spin_lock(&queue->lock);
+	list_for_each_entry(req, &queue->fuse_req_queue, list)
+		clear_bit(FR_PENDING, &req->flags);
+	list_splice_init(&queue->fuse_req_queue, &req_list);
+	spin_unlock(&queue->lock);
+
+	/* must not hold queue lock to avoid order issues with fi->lock */
+	fuse_dev_end_requests(&req_list);
+}
+
+void fuse_uring_abort_end_requests(struct fuse_ring *ring)
+{
+	int qid;
+	struct fuse_ring_queue *queue;
+
+	for (qid = 0; qid < ring->nr_queues; qid++) {
+		queue = READ_ONCE(ring->queues[qid]);
+		if (!queue)
+			continue;
+
+		queue->stopped = true;
+		fuse_uring_abort_end_queue_requests(queue);
+	}
+}
+
 void fuse_uring_destruct(struct fuse_conn *fc)
 {
 	struct fuse_ring *ring = fc->ring;
@@ -98,10 +129,13 @@ static struct fuse_ring *fuse_uring_create(struct fuse_conn *fc)
 		goto out_err;
 	}
 
+	init_waitqueue_head(&ring->stop_waitq);
+
 	fc->ring = ring;
 	ring->nr_queues = nr_queues;
 	ring->fc = fc;
 	ring->max_payload_sz = max_payload_size;
+	atomic_set(&ring->queue_refs, 0);
 
 	spin_unlock(&fc->lock);
 	return ring;
@@ -158,6 +192,166 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
 	return queue;
 }
 
+static void fuse_uring_stop_fuse_req_end(struct fuse_ring_ent *ent)
+{
+	struct fuse_req *req = ent->fuse_req;
+
+	/* remove entry from fuse_pqueue->processing */
+	list_del_init(&req->list);
+	ent->fuse_req = NULL;
+	clear_bit(FR_SENT, &req->flags);
+	req->out.h.error = -ECONNABORTED;
+	fuse_request_end(req);
+}
+
+/*
+ * Release a request/entry on connection tear down
+ */
+static void fuse_uring_entry_teardown(struct fuse_ring_ent *ent)
+{
+	if (ent->cmd) {
+		io_uring_cmd_done(ent->cmd, -ENOTCONN, 0, IO_URING_F_UNLOCKED);
+		ent->cmd = NULL;
+	}
+
+	if (ent->fuse_req)
+		fuse_uring_stop_fuse_req_end(ent);
+
+	list_del_init(&ent->list);
+	kfree(ent);
+}
+
+static void fuse_uring_stop_list_entries(struct list_head *head,
+					 struct fuse_ring_queue *queue,
+					 enum fuse_ring_req_state exp_state)
+{
+	struct fuse_ring *ring = queue->ring;
+	struct fuse_ring_ent *ent, *next;
+	ssize_t queue_refs = SSIZE_MAX;
+	LIST_HEAD(to_teardown);
+
+	spin_lock(&queue->lock);
+	list_for_each_entry_safe(ent, next, head, list) {
+		if (ent->state != exp_state) {
+			pr_warn("entry teardown qid=%d state=%d expected=%d",
+				queue->qid, ent->state, exp_state);
+			continue;
+		}
+
+		list_move(&ent->list, &to_teardown);
+	}
+	spin_unlock(&queue->lock);
+
+	/* no queue lock to avoid lock order issues */
+	list_for_each_entry_safe(ent, next, &to_teardown, list) {
+		fuse_uring_entry_teardown(ent);
+		queue_refs = atomic_dec_return(&ring->queue_refs);
+		WARN_ON_ONCE(queue_refs < 0);
+	}
+}
+
+static void fuse_uring_teardown_entries(struct fuse_ring_queue *queue)
+{
+	fuse_uring_stop_list_entries(&queue->ent_in_userspace, queue,
+				     FRRS_USERSPACE);
+	fuse_uring_stop_list_entries(&queue->ent_avail_queue, queue,
+				     FRRS_AVAILABLE);
+}
+
+/*
+ * Log state debug info
+ */
+static void fuse_uring_log_ent_state(struct fuse_ring *ring)
+{
+	int qid;
+	struct fuse_ring_ent *ent;
+
+	for (qid = 0; qid < ring->nr_queues; qid++) {
+		struct fuse_ring_queue *queue = ring->queues[qid];
+
+		if (!queue)
+			continue;
+
+		spin_lock(&queue->lock);
+		/*
+		 * Log entries from the intermediate queue, the other queues
+		 * should be empty
+		 */
+		list_for_each_entry(ent, &queue->ent_w_req_queue, list) {
+			pr_info(" ent-req-queue ring=%p qid=%d ent=%p state=%d\n",
+				ring, qid, ent, ent->state);
+		}
+		list_for_each_entry(ent, &queue->ent_commit_queue, list) {
+			pr_info(" ent-req-queue ring=%p qid=%d ent=%p state=%d\n",
+				ring, qid, ent, ent->state);
+		}
+		spin_unlock(&queue->lock);
+	}
+	ring->stop_debug_log = 1;
+}
+
+static void fuse_uring_async_stop_queues(struct work_struct *work)
+{
+	int qid;
+	struct fuse_ring *ring =
+		container_of(work, struct fuse_ring, async_teardown_work.work);
+
+	/* XXX code dup */
+	for (qid = 0; qid < ring->nr_queues; qid++) {
+		struct fuse_ring_queue *queue = READ_ONCE(ring->queues[qid]);
+
+		if (!queue)
+			continue;
+
+		fuse_uring_teardown_entries(queue);
+	}
+
+	/*
+	 * Some ring entries are might be in the middle of IO operations,
+	 * i.e. in process to get handled by file_operations::uring_cmd
+	 * or on the way to userspace - we could handle that with conditions in
+	 * run time code, but easier/cleaner to have an async tear down handler
+	 * If there are still queue references left
+	 */
+	if (atomic_read(&ring->queue_refs) > 0) {
+		if (time_after(jiffies,
+			       ring->teardown_time + FUSE_URING_TEARDOWN_TIMEOUT))
+			fuse_uring_log_ent_state(ring);
+
+		schedule_delayed_work(&ring->async_teardown_work,
+				      FUSE_URING_TEARDOWN_INTERVAL);
+	} else {
+		wake_up_all(&ring->stop_waitq);
+	}
+}
+
+/*
+ * Stop the ring queues
+ */
+void fuse_uring_stop_queues(struct fuse_ring *ring)
+{
+	int qid;
+
+	for (qid = 0; qid < ring->nr_queues; qid++) {
+		struct fuse_ring_queue *queue = READ_ONCE(ring->queues[qid]);
+
+		if (!queue)
+			continue;
+
+		fuse_uring_teardown_entries(queue);
+	}
+
+	if (atomic_read(&ring->queue_refs) > 0) {
+		ring->teardown_time = jiffies;
+		INIT_DELAYED_WORK(&ring->async_teardown_work,
+				  fuse_uring_async_stop_queues);
+		schedule_delayed_work(&ring->async_teardown_work,
+				      FUSE_URING_TEARDOWN_INTERVAL);
+	} else {
+		wake_up_all(&ring->stop_waitq);
+	}
+}
+
 /*
  * Checks for errors and stores it into the request
  */
@@ -538,6 +732,9 @@ static int fuse_uring_commit_fetch(struct io_uring_cmd *cmd, int issue_flags,
 		return err;
 	fpq = &queue->fpq;
 
+	if (!READ_ONCE(fc->connected) || READ_ONCE(queue->stopped))
+		return err;
+
 	spin_lock(&queue->lock);
 	/* Find a request based on the unique ID of the fuse request
 	 * This should get revised, as it needs a hash calculation and list
@@ -667,6 +864,7 @@ fuse_uring_create_ring_ent(struct io_uring_cmd *cmd,
 		return ERR_PTR(err);
 	}
 
+	atomic_inc(&ring->queue_refs);
 	return ent;
 }
 
diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
index 80f1c62d4df7f0ca77c4d5179068df6ffdbf7d85..ee5aeccae66caaf9a4dccbbbc785820836182668 100644
--- a/fs/fuse/dev_uring_i.h
+++ b/fs/fuse/dev_uring_i.h
@@ -11,6 +11,9 @@
 
 #ifdef CONFIG_FUSE_IO_URING
 
+#define FUSE_URING_TEARDOWN_TIMEOUT (5 * HZ)
+#define FUSE_URING_TEARDOWN_INTERVAL (HZ/20)
+
 enum fuse_ring_req_state {
 	FRRS_INVALID = 0,
 
@@ -83,6 +86,8 @@ struct fuse_ring_queue {
 	struct list_head fuse_req_queue;
 
 	struct fuse_pqueue fpq;
+
+	bool stopped;
 };
 
 /**
@@ -100,12 +105,51 @@ struct fuse_ring {
 	size_t max_payload_sz;
 
 	struct fuse_ring_queue **queues;
+	/*
+	 * Log ring entry states onces on stop when entries cannot be
+	 * released
+	 */
+	unsigned int stop_debug_log : 1;
+
+	wait_queue_head_t stop_waitq;
+
+	/* async tear down */
+	struct delayed_work async_teardown_work;
+
+	/* log */
+	unsigned long teardown_time;
+
+	atomic_t queue_refs;
 };
 
 bool fuse_uring_enabled(void);
 void fuse_uring_destruct(struct fuse_conn *fc);
+void fuse_uring_stop_queues(struct fuse_ring *ring);
+void fuse_uring_abort_end_requests(struct fuse_ring *ring);
 int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags);
 
+static inline void fuse_uring_abort(struct fuse_conn *fc)
+{
+	struct fuse_ring *ring = fc->ring;
+
+	if (ring == NULL)
+		return;
+
+	if (atomic_read(&ring->queue_refs) > 0) {
+		fuse_uring_abort_end_requests(ring);
+		fuse_uring_stop_queues(ring);
+	}
+}
+
+static inline void fuse_uring_wait_stopped_queues(struct fuse_conn *fc)
+{
+	struct fuse_ring *ring = fc->ring;
+
+	if (ring)
+		wait_event(ring->stop_waitq,
+			   atomic_read(&ring->queue_refs) == 0);
+}
+
 #else /* CONFIG_FUSE_IO_URING */
 
 struct fuse_ring;
@@ -123,6 +167,13 @@ static inline bool fuse_uring_enabled(void)
 	return false;
 }
 
+static inline void fuse_uring_abort(struct fuse_conn *fc)
+{
+}
+
+static inline void fuse_uring_wait_stopped_queues(struct fuse_conn *fc)
+{
+}
 #endif /* CONFIG_FUSE_IO_URING */
 
 #endif /* _FS_FUSE_DEV_URING_I_H */

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v9 12/17] fuse: {io-uring} Make fuse_dev_queue_{interrupt,forget} non-static
  2025-01-07  0:25 [PATCH v9 00/17] fuse: fuse-over-io-uring Bernd Schubert
                   ` (10 preceding siblings ...)
  2025-01-07  0:25 ` [PATCH v9 11/17] fuse: {io-uring} Handle teardown of ring entries Bernd Schubert
@ 2025-01-07  0:25 ` Bernd Schubert
  2025-01-07  0:25 ` [PATCH v9 13/17] fuse: Allow to queue fg requests through io-uring Bernd Schubert
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 45+ messages in thread
From: Bernd Schubert @ 2025-01-07  0:25 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
	Bernd Schubert

These functions are also needed by fuse-over-io-uring.

Signed-off-by: Bernd Schubert <[email protected]>
---
 fs/fuse/dev.c        | 5 +++--
 fs/fuse/fuse_dev_i.h | 5 +++++
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 1c21e491e891196c77c7f6135cdc2aece785d399..ecf2f805f456222fda02598397beba41fc356460 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -237,7 +237,8 @@ __releases(fiq->lock)
 	spin_unlock(&fiq->lock);
 }
 
-static void fuse_dev_queue_forget(struct fuse_iqueue *fiq, struct fuse_forget_link *forget)
+void fuse_dev_queue_forget(struct fuse_iqueue *fiq,
+			   struct fuse_forget_link *forget)
 {
 	spin_lock(&fiq->lock);
 	if (fiq->connected) {
@@ -250,7 +251,7 @@ static void fuse_dev_queue_forget(struct fuse_iqueue *fiq, struct fuse_forget_li
 	}
 }
 
-static void fuse_dev_queue_interrupt(struct fuse_iqueue *fiq, struct fuse_req *req)
+void fuse_dev_queue_interrupt(struct fuse_iqueue *fiq, struct fuse_req *req)
 {
 	spin_lock(&fiq->lock);
 	if (list_empty(&req->intr_entry)) {
diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
index 599a61536f8c85b3631b8584247a917bda92e719..429661ae065464c62a093cf719c5ece38686bbbe 100644
--- a/fs/fuse/fuse_dev_i.h
+++ b/fs/fuse/fuse_dev_i.h
@@ -17,6 +17,8 @@ struct fuse_arg;
 struct fuse_args;
 struct fuse_pqueue;
 struct fuse_req;
+struct fuse_iqueue;
+struct fuse_forget_link;
 
 struct fuse_copy_state {
 	int write;
@@ -57,6 +59,9 @@ int fuse_copy_args(struct fuse_copy_state *cs, unsigned int numargs,
 		   int zeroing);
 int fuse_copy_out_args(struct fuse_copy_state *cs, struct fuse_args *args,
 		       unsigned int nbytes);
+void fuse_dev_queue_forget(struct fuse_iqueue *fiq,
+			   struct fuse_forget_link *forget);
+void fuse_dev_queue_interrupt(struct fuse_iqueue *fiq, struct fuse_req *req);
 
 #endif
 

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v9 13/17] fuse: Allow to queue fg requests through io-uring
  2025-01-07  0:25 [PATCH v9 00/17] fuse: fuse-over-io-uring Bernd Schubert
                   ` (11 preceding siblings ...)
  2025-01-07  0:25 ` [PATCH v9 12/17] fuse: {io-uring} Make fuse_dev_queue_{interrupt,forget} non-static Bernd Schubert
@ 2025-01-07  0:25 ` Bernd Schubert
  2025-01-07 15:54   ` Luis Henriques
                     ` (2 more replies)
  2025-01-07  0:25 ` [PATCH v9 14/17] fuse: Allow to queue bg " Bernd Schubert
                   ` (4 subsequent siblings)
  17 siblings, 3 replies; 45+ messages in thread
From: Bernd Schubert @ 2025-01-07  0:25 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
	Bernd Schubert

This prepares queueing and sending foreground requests through
io-uring.

Signed-off-by: Bernd Schubert <[email protected]>
---
 fs/fuse/dev_uring.c   | 185 ++++++++++++++++++++++++++++++++++++++++++++++++--
 fs/fuse/dev_uring_i.h |  11 ++-
 2 files changed, 187 insertions(+), 9 deletions(-)

diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index 01a908b2ef9ada14b759ca047eab40b4c4431d89..89a22a4eee23cbba49bac7a2d2126bb51193326f 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -26,6 +26,29 @@ bool fuse_uring_enabled(void)
 	return enable_uring;
 }
 
+struct fuse_uring_pdu {
+	struct fuse_ring_ent *ring_ent;
+};
+
+static const struct fuse_iqueue_ops fuse_io_uring_ops;
+
+static void uring_cmd_set_ring_ent(struct io_uring_cmd *cmd,
+				   struct fuse_ring_ent *ring_ent)
+{
+	struct fuse_uring_pdu *pdu =
+		io_uring_cmd_to_pdu(cmd, struct fuse_uring_pdu);
+
+	pdu->ring_ent = ring_ent;
+}
+
+static struct fuse_ring_ent *uring_cmd_to_ring_ent(struct io_uring_cmd *cmd)
+{
+	struct fuse_uring_pdu *pdu =
+		io_uring_cmd_to_pdu(cmd, struct fuse_uring_pdu);
+
+	return pdu->ring_ent;
+}
+
 static void fuse_uring_req_end(struct fuse_ring_ent *ring_ent, bool set_err,
 			       int error)
 {
@@ -441,7 +464,7 @@ static int fuse_uring_copy_to_ring(struct fuse_ring *ring, struct fuse_req *req,
 	struct iov_iter iter;
 	struct fuse_uring_ent_in_out ent_in_out = {
 		.flags = 0,
-		.commit_id = ent->commit_id,
+		.commit_id = req->in.h.unique,
 	};
 
 	if (WARN_ON(ent_in_out.commit_id == 0))
@@ -460,7 +483,7 @@ static int fuse_uring_copy_to_ring(struct fuse_ring *ring, struct fuse_req *req,
 	if (num_args > 0) {
 		/*
 		 * Expectation is that the first argument is the per op header.
-		 * Some op code have that as zero.
+		 * Some op code have that as zero size.
 		 */
 		if (args->in_args[0].size > 0) {
 			res = copy_to_user(&ent->headers->op_in, in_args->value,
@@ -578,11 +601,8 @@ static void fuse_uring_add_to_pq(struct fuse_ring_ent *ring_ent,
 	struct fuse_pqueue *fpq = &queue->fpq;
 	unsigned int hash;
 
-	/* commit_id is the unique id of the request */
-	ring_ent->commit_id = req->in.h.unique;
-
 	req->ring_entry = ring_ent;
-	hash = fuse_req_hash(ring_ent->commit_id);
+	hash = fuse_req_hash(req->in.h.unique);
 	list_move_tail(&req->list, &fpq->processing[hash]);
 }
 
@@ -777,6 +797,31 @@ static int fuse_uring_commit_fetch(struct io_uring_cmd *cmd, int issue_flags,
 	return 0;
 }
 
+static bool is_ring_ready(struct fuse_ring *ring, int current_qid)
+{
+	int qid;
+	struct fuse_ring_queue *queue;
+	bool ready = true;
+
+	for (qid = 0; qid < ring->nr_queues && ready; qid++) {
+		if (current_qid == qid)
+			continue;
+
+		queue = ring->queues[qid];
+		if (!queue) {
+			ready = false;
+			break;
+		}
+
+		spin_lock(&queue->lock);
+		if (list_empty(&queue->ent_avail_queue))
+			ready = false;
+		spin_unlock(&queue->lock);
+	}
+
+	return ready;
+}
+
 /*
  * fuse_uring_req_fetch command handling
  */
@@ -785,10 +830,22 @@ static void fuse_uring_do_register(struct fuse_ring_ent *ring_ent,
 				   unsigned int issue_flags)
 {
 	struct fuse_ring_queue *queue = ring_ent->queue;
+	struct fuse_ring *ring = queue->ring;
+	struct fuse_conn *fc = ring->fc;
+	struct fuse_iqueue *fiq = &fc->iq;
 
 	spin_lock(&queue->lock);
 	fuse_uring_ent_avail(ring_ent, queue);
 	spin_unlock(&queue->lock);
+
+	if (!ring->ready) {
+		bool ready = is_ring_ready(ring, queue->qid);
+
+		if (ready) {
+			WRITE_ONCE(ring->ready, true);
+			fiq->ops = &fuse_io_uring_ops;
+		}
+	}
 }
 
 /*
@@ -979,3 +1036,119 @@ int __maybe_unused fuse_uring_cmd(struct io_uring_cmd *cmd,
 
 	return -EIOCBQUEUED;
 }
+
+/*
+ * This prepares and sends the ring request in fuse-uring task context.
+ * User buffers are not mapped yet - the application does not have permission
+ * to write to it - this has to be executed in ring task context.
+ */
+static void
+fuse_uring_send_req_in_task(struct io_uring_cmd *cmd,
+			    unsigned int issue_flags)
+{
+	struct fuse_ring_ent *ent = uring_cmd_to_ring_ent(cmd);
+	struct fuse_ring_queue *queue = ent->queue;
+	int err;
+
+	if (unlikely(issue_flags & IO_URING_F_TASK_DEAD)) {
+		err = -ECANCELED;
+		goto terminating;
+	}
+
+	err = fuse_uring_prepare_send(ent);
+	if (err)
+		goto err;
+
+terminating:
+	spin_lock(&queue->lock);
+	ent->state = FRRS_USERSPACE;
+	list_move(&ent->list, &queue->ent_in_userspace);
+	spin_unlock(&queue->lock);
+	io_uring_cmd_done(cmd, err, 0, issue_flags);
+	ent->cmd = NULL;
+	return;
+err:
+	fuse_uring_next_fuse_req(ent, queue, issue_flags);
+}
+
+static struct fuse_ring_queue *fuse_uring_task_to_queue(struct fuse_ring *ring)
+{
+	unsigned int qid;
+	struct fuse_ring_queue *queue;
+
+	qid = task_cpu(current);
+
+	if (WARN_ONCE(qid >= ring->nr_queues,
+		      "Core number (%u) exceeds nr ueues (%zu)\n", qid,
+		      ring->nr_queues))
+		qid = 0;
+
+	queue = ring->queues[qid];
+	if (WARN_ONCE(!queue, "Missing queue for qid %d\n", qid))
+		return NULL;
+
+	return queue;
+}
+
+/* queue a fuse request and send it if a ring entry is available */
+void fuse_uring_queue_fuse_req(struct fuse_iqueue *fiq, struct fuse_req *req)
+{
+	struct fuse_conn *fc = req->fm->fc;
+	struct fuse_ring *ring = fc->ring;
+	struct fuse_ring_queue *queue;
+	struct fuse_ring_ent *ent = NULL;
+	int err;
+
+	err = -EINVAL;
+	queue = fuse_uring_task_to_queue(ring);
+	if (!queue)
+		goto err;
+
+	if (req->in.h.opcode != FUSE_NOTIFY_REPLY)
+		req->in.h.unique = fuse_get_unique(fiq);
+
+	spin_lock(&queue->lock);
+	err = -ENOTCONN;
+	if (unlikely(queue->stopped))
+		goto err_unlock;
+
+	ent = list_first_entry_or_null(&queue->ent_avail_queue,
+				       struct fuse_ring_ent, list);
+	if (ent)
+		fuse_uring_add_req_to_ring_ent(ent, req);
+	else
+		list_add_tail(&req->list, &queue->fuse_req_queue);
+	spin_unlock(&queue->lock);
+
+	if (ent) {
+		struct io_uring_cmd *cmd = ent->cmd;
+
+		err = -EIO;
+		if (WARN_ON_ONCE(ent->state != FRRS_FUSE_REQ))
+			goto err;
+
+		uring_cmd_set_ring_ent(cmd, ent);
+		io_uring_cmd_complete_in_task(cmd, fuse_uring_send_req_in_task);
+	}
+
+	return;
+
+err_unlock:
+	spin_unlock(&queue->lock);
+err:
+	req->out.h.error = err;
+	clear_bit(FR_PENDING, &req->flags);
+	fuse_request_end(req);
+}
+
+static const struct fuse_iqueue_ops fuse_io_uring_ops = {
+	/* should be send over io-uring as enhancement */
+	.send_forget = fuse_dev_queue_forget,
+
+	/*
+	 * could be send over io-uring, but interrupts should be rare,
+	 * no need to make the code complex
+	 */
+	.send_interrupt = fuse_dev_queue_interrupt,
+	.send_req = fuse_uring_queue_fuse_req,
+};
diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
index ee5aeccae66caaf9a4dccbbbc785820836182668..cda330978faa019ceedf161f50d86db976b072e2 100644
--- a/fs/fuse/dev_uring_i.h
+++ b/fs/fuse/dev_uring_i.h
@@ -48,9 +48,6 @@ struct fuse_ring_ent {
 	enum fuse_ring_req_state state;
 
 	struct fuse_req *fuse_req;
-
-	/* commit id to identify the server reply */
-	uint64_t commit_id;
 };
 
 struct fuse_ring_queue {
@@ -120,6 +117,8 @@ struct fuse_ring {
 	unsigned long teardown_time;
 
 	atomic_t queue_refs;
+
+	bool ready;
 };
 
 bool fuse_uring_enabled(void);
@@ -127,6 +126,7 @@ void fuse_uring_destruct(struct fuse_conn *fc);
 void fuse_uring_stop_queues(struct fuse_ring *ring);
 void fuse_uring_abort_end_requests(struct fuse_ring *ring);
 int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags);
+void fuse_uring_queue_fuse_req(struct fuse_iqueue *fiq, struct fuse_req *req);
 
 static inline void fuse_uring_abort(struct fuse_conn *fc)
 {
@@ -150,6 +150,11 @@ static inline void fuse_uring_wait_stopped_queues(struct fuse_conn *fc)
 			   atomic_read(&ring->queue_refs) == 0);
 }
 
+static inline bool fuse_uring_ready(struct fuse_conn *fc)
+{
+	return fc->ring && fc->ring->ready;
+}
+
 #else /* CONFIG_FUSE_IO_URING */
 
 struct fuse_ring;

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v9 14/17] fuse: Allow to queue bg requests through io-uring
  2025-01-07  0:25 [PATCH v9 00/17] fuse: fuse-over-io-uring Bernd Schubert
                   ` (12 preceding siblings ...)
  2025-01-07  0:25 ` [PATCH v9 13/17] fuse: Allow to queue fg requests through io-uring Bernd Schubert
@ 2025-01-07  0:25 ` Bernd Schubert
  2025-01-17 11:49   ` Pavel Begunkov
  2025-01-07  0:25 ` [PATCH v9 15/17] fuse: {io-uring} Prevent mount point hang on fuse-server termination Bernd Schubert
                   ` (3 subsequent siblings)
  17 siblings, 1 reply; 45+ messages in thread
From: Bernd Schubert @ 2025-01-07  0:25 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
	Bernd Schubert

This prepares queueing and sending background requests through
io-uring.

Signed-off-by: Bernd Schubert <[email protected]>
---
 fs/fuse/dev.c         | 24 ++++++++++++-
 fs/fuse/dev_uring.c   | 99 +++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/fuse/dev_uring_i.h | 12 +++++++
 3 files changed, 134 insertions(+), 1 deletion(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index ecf2f805f456222fda02598397beba41fc356460..afafa960d4725d9b64b22f17bf09c846219396d6 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -568,7 +568,25 @@ ssize_t __fuse_simple_request(struct mnt_idmap *idmap,
 	return ret;
 }
 
-static bool fuse_request_queue_background(struct fuse_req *req)
+#ifdef CONFIG_FUSE_IO_URING
+static bool fuse_request_queue_background_uring(struct fuse_conn *fc,
+					       struct fuse_req *req)
+{
+	struct fuse_iqueue *fiq = &fc->iq;
+
+	req->in.h.unique = fuse_get_unique(fiq);
+	req->in.h.len = sizeof(struct fuse_in_header) +
+		fuse_len_args(req->args->in_numargs,
+			      (struct fuse_arg *) req->args->in_args);
+
+	return fuse_uring_queue_bq_req(req);
+}
+#endif
+
+/*
+ * @return true if queued
+ */
+static int fuse_request_queue_background(struct fuse_req *req)
 {
 	struct fuse_mount *fm = req->fm;
 	struct fuse_conn *fc = fm->fc;
@@ -580,6 +598,10 @@ static bool fuse_request_queue_background(struct fuse_req *req)
 		atomic_inc(&fc->num_waiting);
 	}
 	__set_bit(FR_ISREPLY, &req->flags);
+
+	if (fuse_uring_ready(fc))
+		return fuse_request_queue_background_uring(fc, req);
+
 	spin_lock(&fc->bg_lock);
 	if (likely(fc->connected)) {
 		fc->num_background++;
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index 89a22a4eee23cbba49bac7a2d2126bb51193326f..4e4385dff9315d25aa8c37a37f1e902aec3fcd20 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -49,10 +49,52 @@ static struct fuse_ring_ent *uring_cmd_to_ring_ent(struct io_uring_cmd *cmd)
 	return pdu->ring_ent;
 }
 
+static void fuse_uring_flush_bg(struct fuse_ring_queue *queue)
+{
+	struct fuse_ring *ring = queue->ring;
+	struct fuse_conn *fc = ring->fc;
+
+	lockdep_assert_held(&queue->lock);
+	lockdep_assert_held(&fc->bg_lock);
+
+	/*
+	 * Allow one bg request per queue, ignoring global fc limits.
+	 * This prevents a single queue from consuming all resources and
+	 * eliminates the need for remote queue wake-ups when global
+	 * limits are met but this queue has no more waiting requests.
+	 */
+	while ((fc->active_background < fc->max_background ||
+		!queue->active_background) &&
+	       (!list_empty(&queue->fuse_req_bg_queue))) {
+		struct fuse_req *req;
+
+		req = list_first_entry(&queue->fuse_req_bg_queue,
+				       struct fuse_req, list);
+		fc->active_background++;
+		queue->active_background++;
+
+		list_move_tail(&req->list, &queue->fuse_req_queue);
+	}
+}
+
 static void fuse_uring_req_end(struct fuse_ring_ent *ring_ent, bool set_err,
 			       int error)
 {
+	struct fuse_ring_queue *queue = ring_ent->queue;
 	struct fuse_req *req = ring_ent->fuse_req;
+	struct fuse_ring *ring = queue->ring;
+	struct fuse_conn *fc = ring->fc;
+
+	lockdep_assert_not_held(&queue->lock);
+	spin_lock(&queue->lock);
+	if (test_bit(FR_BACKGROUND, &req->flags)) {
+		queue->active_background--;
+		spin_lock(&fc->bg_lock);
+		fuse_uring_flush_bg(queue);
+		spin_unlock(&fc->bg_lock);
+	}
+
+	spin_unlock(&queue->lock);
 
 	if (set_err)
 		req->out.h.error = error;
@@ -82,6 +124,7 @@ void fuse_uring_abort_end_requests(struct fuse_ring *ring)
 {
 	int qid;
 	struct fuse_ring_queue *queue;
+	struct fuse_conn *fc = ring->fc;
 
 	for (qid = 0; qid < ring->nr_queues; qid++) {
 		queue = READ_ONCE(ring->queues[qid]);
@@ -89,6 +132,13 @@ void fuse_uring_abort_end_requests(struct fuse_ring *ring)
 			continue;
 
 		queue->stopped = true;
+
+		WARN_ON_ONCE(ring->fc->max_background != UINT_MAX);
+		spin_lock(&queue->lock);
+		spin_lock(&fc->bg_lock);
+		fuse_uring_flush_bg(queue);
+		spin_unlock(&fc->bg_lock);
+		spin_unlock(&queue->lock);
 		fuse_uring_abort_end_queue_requests(queue);
 	}
 }
@@ -194,6 +244,7 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
 	INIT_LIST_HEAD(&queue->ent_w_req_queue);
 	INIT_LIST_HEAD(&queue->ent_in_userspace);
 	INIT_LIST_HEAD(&queue->fuse_req_queue);
+	INIT_LIST_HEAD(&queue->fuse_req_bg_queue);
 
 	queue->fpq.processing = pq;
 	fuse_pqueue_init(&queue->fpq);
@@ -1141,6 +1192,54 @@ void fuse_uring_queue_fuse_req(struct fuse_iqueue *fiq, struct fuse_req *req)
 	fuse_request_end(req);
 }
 
+bool fuse_uring_queue_bq_req(struct fuse_req *req)
+{
+	struct fuse_conn *fc = req->fm->fc;
+	struct fuse_ring *ring = fc->ring;
+	struct fuse_ring_queue *queue;
+	struct fuse_ring_ent *ent = NULL;
+
+	queue = fuse_uring_task_to_queue(ring);
+	if (!queue)
+		return false;
+
+	spin_lock(&queue->lock);
+	if (unlikely(queue->stopped)) {
+		spin_unlock(&queue->lock);
+		return false;
+	}
+
+	list_add_tail(&req->list, &queue->fuse_req_bg_queue);
+
+	ent = list_first_entry_or_null(&queue->ent_avail_queue,
+				       struct fuse_ring_ent, list);
+	spin_lock(&fc->bg_lock);
+	fc->num_background++;
+	if (fc->num_background == fc->max_background)
+		fc->blocked = 1;
+	fuse_uring_flush_bg(queue);
+	spin_unlock(&fc->bg_lock);
+
+	/*
+	 * Due to bg_queue flush limits there might be other bg requests
+	 * in the queue that need to be handled first. Or no further req
+	 * might be available.
+	 */
+	req = list_first_entry_or_null(&queue->fuse_req_queue, struct fuse_req,
+				       list);
+	if (ent && req) {
+		struct io_uring_cmd *cmd = ent->cmd;
+
+		fuse_uring_add_req_to_ring_ent(ent, req);
+
+		uring_cmd_set_ring_ent(cmd, ent);
+		io_uring_cmd_complete_in_task(cmd, fuse_uring_send_req_in_task);
+	}
+	spin_unlock(&queue->lock);
+
+	return true;
+}
+
 static const struct fuse_iqueue_ops fuse_io_uring_ops = {
 	/* should be send over io-uring as enhancement */
 	.send_forget = fuse_dev_queue_forget,
diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
index cda330978faa019ceedf161f50d86db976b072e2..a4271f4e55aa9d2d9b42f3d2c4095887f9563351 100644
--- a/fs/fuse/dev_uring_i.h
+++ b/fs/fuse/dev_uring_i.h
@@ -82,8 +82,13 @@ struct fuse_ring_queue {
 	/* fuse requests waiting for an entry slot */
 	struct list_head fuse_req_queue;
 
+	/* background fuse requests */
+	struct list_head fuse_req_bg_queue;
+
 	struct fuse_pqueue fpq;
 
+	unsigned int active_background;
+
 	bool stopped;
 };
 
@@ -127,6 +132,7 @@ void fuse_uring_stop_queues(struct fuse_ring *ring);
 void fuse_uring_abort_end_requests(struct fuse_ring *ring);
 int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags);
 void fuse_uring_queue_fuse_req(struct fuse_iqueue *fiq, struct fuse_req *req);
+bool fuse_uring_queue_bq_req(struct fuse_req *req);
 
 static inline void fuse_uring_abort(struct fuse_conn *fc)
 {
@@ -179,6 +185,12 @@ static inline void fuse_uring_abort(struct fuse_conn *fc)
 static inline void fuse_uring_wait_stopped_queues(struct fuse_conn *fc)
 {
 }
+
+static inline bool fuse_uring_ready(struct fuse_conn *fc)
+{
+	return false;
+}
+
 #endif /* CONFIG_FUSE_IO_URING */
 
 #endif /* _FS_FUSE_DEV_URING_I_H */

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v9 15/17] fuse: {io-uring} Prevent mount point hang on fuse-server termination
  2025-01-07  0:25 [PATCH v9 00/17] fuse: fuse-over-io-uring Bernd Schubert
                   ` (13 preceding siblings ...)
  2025-01-07  0:25 ` [PATCH v9 14/17] fuse: Allow to queue bg " Bernd Schubert
@ 2025-01-07  0:25 ` Bernd Schubert
  2025-01-07 16:14   ` Luis Henriques
  2025-01-17 11:52   ` Pavel Begunkov
  2025-01-07  0:25 ` [PATCH v9 16/17] fuse: block request allocation until io-uring init is complete Bernd Schubert
                   ` (2 subsequent siblings)
  17 siblings, 2 replies; 45+ messages in thread
From: Bernd Schubert @ 2025-01-07  0:25 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
	Bernd Schubert

When the fuse-server terminates while the fuse-client or kernel
still has queued URING_CMDs, these commands retain references
to the struct file used by the fuse connection. This prevents
fuse_dev_release() from being invoked, resulting in a hung mount
point.

This patch addresses the issue by making queued URING_CMDs
cancelable, allowing fuse_dev_release() to proceed as expected
and preventing the mount point from hanging.

Signed-off-by: Bernd Schubert <[email protected]>
---
 fs/fuse/dev.c         |  2 ++
 fs/fuse/dev_uring.c   | 71 ++++++++++++++++++++++++++++++++++++++++++++++++---
 fs/fuse/dev_uring_i.h |  9 +++++++
 3 files changed, 79 insertions(+), 3 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index afafa960d4725d9b64b22f17bf09c846219396d6..1b593b23f7b8c319ec38c7e726dabf516965500e 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -599,8 +599,10 @@ static int fuse_request_queue_background(struct fuse_req *req)
 	}
 	__set_bit(FR_ISREPLY, &req->flags);
 
+#ifdef CONFIG_FUSE_IO_URING
 	if (fuse_uring_ready(fc))
 		return fuse_request_queue_background_uring(fc, req);
+#endif
 
 	spin_lock(&fc->bg_lock);
 	if (likely(fc->connected)) {
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index 4e4385dff9315d25aa8c37a37f1e902aec3fcd20..cdd3917b365f4040c0f147648b09af9a41e2f49e 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -153,6 +153,7 @@ void fuse_uring_destruct(struct fuse_conn *fc)
 
 	for (qid = 0; qid < ring->nr_queues; qid++) {
 		struct fuse_ring_queue *queue = ring->queues[qid];
+		struct fuse_ring_ent *ent, *next;
 
 		if (!queue)
 			continue;
@@ -162,6 +163,12 @@ void fuse_uring_destruct(struct fuse_conn *fc)
 		WARN_ON(!list_empty(&queue->ent_commit_queue));
 		WARN_ON(!list_empty(&queue->ent_in_userspace));
 
+		list_for_each_entry_safe(ent, next, &queue->ent_released,
+					 list) {
+			list_del_init(&ent->list);
+			kfree(ent);
+		}
+
 		kfree(queue->fpq.processing);
 		kfree(queue);
 		ring->queues[qid] = NULL;
@@ -245,6 +252,7 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
 	INIT_LIST_HEAD(&queue->ent_in_userspace);
 	INIT_LIST_HEAD(&queue->fuse_req_queue);
 	INIT_LIST_HEAD(&queue->fuse_req_bg_queue);
+	INIT_LIST_HEAD(&queue->ent_released);
 
 	queue->fpq.processing = pq;
 	fuse_pqueue_init(&queue->fpq);
@@ -283,6 +291,7 @@ static void fuse_uring_stop_fuse_req_end(struct fuse_ring_ent *ent)
  */
 static void fuse_uring_entry_teardown(struct fuse_ring_ent *ent)
 {
+	struct fuse_ring_queue *queue = ent->queue;
 	if (ent->cmd) {
 		io_uring_cmd_done(ent->cmd, -ENOTCONN, 0, IO_URING_F_UNLOCKED);
 		ent->cmd = NULL;
@@ -291,8 +300,16 @@ static void fuse_uring_entry_teardown(struct fuse_ring_ent *ent)
 	if (ent->fuse_req)
 		fuse_uring_stop_fuse_req_end(ent);
 
-	list_del_init(&ent->list);
-	kfree(ent);
+	/*
+	 * The entry must not be freed immediately, due to access of direct
+	 * pointer access of entries through IO_URING_F_CANCEL - there is a risk
+	 * of race between daemon termination (which triggers IO_URING_F_CANCEL
+	 * and accesses entries without checking the list state first
+	 */
+	spin_lock(&queue->lock);
+	list_move(&ent->list, &queue->ent_released);
+	ent->state = FRRS_RELEASED;
+	spin_unlock(&queue->lock);
 }
 
 static void fuse_uring_stop_list_entries(struct list_head *head,
@@ -312,6 +329,7 @@ static void fuse_uring_stop_list_entries(struct list_head *head,
 			continue;
 		}
 
+		ent->state = FRRS_TEARDOWN;
 		list_move(&ent->list, &to_teardown);
 	}
 	spin_unlock(&queue->lock);
@@ -426,6 +444,46 @@ void fuse_uring_stop_queues(struct fuse_ring *ring)
 	}
 }
 
+/*
+ * Handle IO_URING_F_CANCEL, typically should come on daemon termination.
+ *
+ * Releasing the last entry should trigger fuse_dev_release() if
+ * the daemon was terminated
+ */
+static void fuse_uring_cancel(struct io_uring_cmd *cmd,
+			      unsigned int issue_flags)
+{
+	struct fuse_ring_ent *ent = uring_cmd_to_ring_ent(cmd);
+	struct fuse_ring_queue *queue;
+	bool need_cmd_done = false;
+
+	/*
+	 * direct access on ent - it must not be destructed as long as
+	 * IO_URING_F_CANCEL might come up
+	 */
+	queue = ent->queue;
+	spin_lock(&queue->lock);
+	if (ent->state == FRRS_AVAILABLE) {
+		ent->state = FRRS_USERSPACE;
+		list_move(&ent->list, &queue->ent_in_userspace);
+		need_cmd_done = true;
+		ent->cmd = NULL;
+	}
+	spin_unlock(&queue->lock);
+
+	if (need_cmd_done) {
+		/* no queue lock to avoid lock order issues */
+		io_uring_cmd_done(cmd, -ENOTCONN, 0, issue_flags);
+	}
+}
+
+static void fuse_uring_prepare_cancel(struct io_uring_cmd *cmd, int issue_flags,
+				      struct fuse_ring_ent *ring_ent)
+{
+	uring_cmd_set_ring_ent(cmd, ring_ent);
+	io_uring_cmd_mark_cancelable(cmd, issue_flags);
+}
+
 /*
  * Checks for errors and stores it into the request
  */
@@ -836,6 +894,7 @@ static int fuse_uring_commit_fetch(struct io_uring_cmd *cmd, int issue_flags,
 	spin_unlock(&queue->lock);
 
 	/* without the queue lock, as other locks are taken */
+	fuse_uring_prepare_cancel(ring_ent->cmd, issue_flags, ring_ent);
 	fuse_uring_commit(ring_ent, issue_flags);
 
 	/*
@@ -885,6 +944,8 @@ static void fuse_uring_do_register(struct fuse_ring_ent *ring_ent,
 	struct fuse_conn *fc = ring->fc;
 	struct fuse_iqueue *fiq = &fc->iq;
 
+	fuse_uring_prepare_cancel(ring_ent->cmd, issue_flags, ring_ent);
+
 	spin_lock(&queue->lock);
 	fuse_uring_ent_avail(ring_ent, queue);
 	spin_unlock(&queue->lock);
@@ -1041,6 +1102,11 @@ int __maybe_unused fuse_uring_cmd(struct io_uring_cmd *cmd,
 		return -EOPNOTSUPP;
 	}
 
+	if ((unlikely(issue_flags & IO_URING_F_CANCEL))) {
+		fuse_uring_cancel(cmd, issue_flags);
+		return 0;
+	}
+
 	/* This extra SQE size holds struct fuse_uring_cmd_req */
 	if (!(issue_flags & IO_URING_F_SQE128))
 		return -EINVAL;
@@ -1173,7 +1239,6 @@ void fuse_uring_queue_fuse_req(struct fuse_iqueue *fiq, struct fuse_req *req)
 
 	if (ent) {
 		struct io_uring_cmd *cmd = ent->cmd;
-
 		err = -EIO;
 		if (WARN_ON_ONCE(ent->state != FRRS_FUSE_REQ))
 			goto err;
diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
index a4271f4e55aa9d2d9b42f3d2c4095887f9563351..af2b3de829949a778d60493f36588fea67a4ba85 100644
--- a/fs/fuse/dev_uring_i.h
+++ b/fs/fuse/dev_uring_i.h
@@ -28,6 +28,12 @@ enum fuse_ring_req_state {
 
 	/* The ring entry is in or on the way to user space */
 	FRRS_USERSPACE,
+
+	/* The ring entry is in teardown */
+	FRRS_TEARDOWN,
+
+	/* The ring entry is released, but not freed yet */
+	FRRS_RELEASED,
 };
 
 /** A fuse ring entry, part of the ring queue */
@@ -79,6 +85,9 @@ struct fuse_ring_queue {
 	/* entries in userspace */
 	struct list_head ent_in_userspace;
 
+	/* entries that are released */
+	struct list_head ent_released;
+
 	/* fuse requests waiting for an entry slot */
 	struct list_head fuse_req_queue;
 

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v9 16/17] fuse: block request allocation until io-uring init is complete
  2025-01-07  0:25 [PATCH v9 00/17] fuse: fuse-over-io-uring Bernd Schubert
                   ` (14 preceding siblings ...)
  2025-01-07  0:25 ` [PATCH v9 15/17] fuse: {io-uring} Prevent mount point hang on fuse-server termination Bernd Schubert
@ 2025-01-07  0:25 ` Bernd Schubert
  2025-01-07  0:25 ` [PATCH v9 17/17] fuse: enable fuse-over-io-uring Bernd Schubert
  2025-01-17  9:07 ` [PATCH v9 00/17] fuse: fuse-over-io-uring Miklos Szeredi
  17 siblings, 0 replies; 45+ messages in thread
From: Bernd Schubert @ 2025-01-07  0:25 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
	Bernd Schubert

From: Bernd Schubert <[email protected]>

Avoid races and block request allocation until io-uring
queues are ready.

This is a especially important for background requests,
as bg request completion might cause lock order inversion
of the typical queue->lock and then fc->bg_lock

    fuse_request_end
       spin_lock(&fc->bg_lock);
       flush_bg_queue
         fuse_send_one
           fuse_uring_queue_fuse_req
           spin_lock(&queue->lock);

Signed-off-by: Bernd Schubert <[email protected]>
---
 fs/fuse/dev.c       | 3 ++-
 fs/fuse/dev_uring.c | 1 +
 fs/fuse/fuse_i.h    | 3 +++
 fs/fuse/inode.c     | 2 ++
 4 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 1b593b23f7b8c319ec38c7e726dabf516965500e..f002e8a096f97ba8b6e039309292942995c901c5 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -76,7 +76,8 @@ void fuse_set_initialized(struct fuse_conn *fc)
 
 static bool fuse_block_alloc(struct fuse_conn *fc, bool for_background)
 {
-	return !fc->initialized || (for_background && fc->blocked);
+	return !fc->initialized || (for_background && fc->blocked) ||
+	       (fc->io_uring && !fuse_uring_ready(fc));
 }
 
 static void fuse_drop_waiting(struct fuse_conn *fc)
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index cdd3917b365f4040c0f147648b09af9a41e2f49e..45815dee2d969650efe0df199cc3bd1f3e2e08f7 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -956,6 +956,7 @@ static void fuse_uring_do_register(struct fuse_ring_ent *ring_ent,
 		if (ready) {
 			WRITE_ONCE(ring->ready, true);
 			fiq->ops = &fuse_io_uring_ops;
+			wake_up_all(&fc->blocked_waitq);
 		}
 	}
 }
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index e71556894bc25808581424ec7bdd4afeebc81f15..886c3af2195892cb2ca0a171cd7b930b6e92484c 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -867,6 +867,9 @@ struct fuse_conn {
 	/* Use pages instead of pointer for kernel I/O */
 	unsigned int use_pages_for_kvec_io:1;
 
+	/* Use io_uring for communication */
+	unsigned int io_uring;
+
 	/** Maximum stack depth for passthrough backing files */
 	int max_stack_depth;
 
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 328797b9aac9a816a4ad2c69b6880dc6ef6222b0..e9db2cb8c150878634728685af0fa15e7ade628f 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -1390,6 +1390,8 @@ static void process_init_reply(struct fuse_mount *fm, struct fuse_args *args,
 				else
 					ok = false;
 			}
+			if (flags & FUSE_OVER_IO_URING && fuse_uring_enabled())
+				fc->io_uring = 1;
 		} else {
 			ra_pages = fc->max_read / PAGE_SIZE;
 			fc->no_lock = 1;

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v9 17/17] fuse: enable fuse-over-io-uring
  2025-01-07  0:25 [PATCH v9 00/17] fuse: fuse-over-io-uring Bernd Schubert
                   ` (15 preceding siblings ...)
  2025-01-07  0:25 ` [PATCH v9 16/17] fuse: block request allocation until io-uring init is complete Bernd Schubert
@ 2025-01-07  0:25 ` Bernd Schubert
  2025-01-17 11:52   ` Pavel Begunkov
  2025-01-17  9:07 ` [PATCH v9 00/17] fuse: fuse-over-io-uring Miklos Szeredi
  17 siblings, 1 reply; 45+ messages in thread
From: Bernd Schubert @ 2025-01-07  0:25 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
	Bernd Schubert

All required parts are handled now, fuse-io-uring can
be enabled.

Signed-off-by: Bernd Schubert <[email protected]>
---
 fs/fuse/dev.c       | 3 +++
 fs/fuse/dev_uring.c | 3 +--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index f002e8a096f97ba8b6e039309292942995c901c5..5b5f789b37eb68811832d905ca05b59a0d5a2b2a 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -2493,6 +2493,9 @@ const struct file_operations fuse_dev_operations = {
 	.fasync		= fuse_dev_fasync,
 	.unlocked_ioctl = fuse_dev_ioctl,
 	.compat_ioctl   = compat_ptr_ioctl,
+#ifdef CONFIG_FUSE_IO_URING
+	.uring_cmd	= fuse_uring_cmd,
+#endif
 };
 EXPORT_SYMBOL_GPL(fuse_dev_operations);
 
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index 45815dee2d969650efe0df199cc3bd1f3e2e08f7..946b8468959c14de9e0698d63b52c99fe6ad352b 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -1090,8 +1090,7 @@ static int fuse_uring_register(struct io_uring_cmd *cmd,
  * Entry function from io_uring to handle the given passthrough command
  * (op code IORING_OP_URING_CMD)
  */
-int __maybe_unused fuse_uring_cmd(struct io_uring_cmd *cmd,
-				  unsigned int issue_flags)
+int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
 {
 	struct fuse_dev *fud;
 	struct fuse_conn *fc;

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH v9 06/17] fuse: {io-uring} Handle SQEs - register commands
  2025-01-07  0:25 ` [PATCH v9 06/17] fuse: {io-uring} Handle SQEs - register commands Bernd Schubert
@ 2025-01-07  9:56   ` Luis Henriques
  2025-01-07 12:07     ` Bernd Schubert
  2025-01-17 11:06   ` Pavel Begunkov
  1 sibling, 1 reply; 45+ messages in thread
From: Luis Henriques @ 2025-01-07  9:56 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Miklos Szeredi, Jens Axboe, Pavel Begunkov, linux-fsdevel,
	io-uring, Joanne Koong, Josef Bacik, Amir Goldstein, Ming Lei,
	David Wei, bernd

Hi Bernd,

On Tue, Jan 07 2025, Bernd Schubert wrote:

> This adds basic support for ring SQEs (with opcode=IORING_OP_URING_CMD).
> For now only FUSE_IO_URING_CMD_REGISTER is handled to register queue
> entries.

Please find below two (minor) comments I had already for v8.  Hopefully
this time I'll finish reviewing rev v9!

> Signed-off-by: Bernd Schubert <[email protected]>
> ---
>  fs/fuse/Kconfig           |  12 ++
>  fs/fuse/Makefile          |   1 +
>  fs/fuse/dev_uring.c       | 333 ++++++++++++++++++++++++++++++++++++++++++++++
>  fs/fuse/dev_uring_i.h     | 116 ++++++++++++++++
>  fs/fuse/fuse_i.h          |   5 +
>  fs/fuse/inode.c           |  10 ++
>  include/uapi/linux/fuse.h |  76 ++++++++++-
>  7 files changed, 552 insertions(+), 1 deletion(-)
>
> diff --git a/fs/fuse/Kconfig b/fs/fuse/Kconfig
> index 8674dbfbe59dbf79c304c587b08ebba3cfe405be..ca215a3cba3e310d1359d069202193acdcdb172b 100644
> --- a/fs/fuse/Kconfig
> +++ b/fs/fuse/Kconfig
> @@ -63,3 +63,15 @@ config FUSE_PASSTHROUGH
>  	  to be performed directly on a backing file.
>  
>  	  If you want to allow passthrough operations, answer Y.
> +
> +config FUSE_IO_URING
> +	bool "FUSE communication over io-uring"
> +	default y
> +	depends on FUSE_FS
> +	depends on IO_URING
> +	help
> +	  This allows sending FUSE requests over the io-uring interface and
> +          also adds request core affinity.
> +
> +	  If you want to allow fuse server/client communication through io-uring,
> +	  answer Y
> diff --git a/fs/fuse/Makefile b/fs/fuse/Makefile
> index 2c372180d631eb340eca36f19ee2c2686de9714d..3f0f312a31c1cc200c0c91a086b30a8318e39d94 100644
> --- a/fs/fuse/Makefile
> +++ b/fs/fuse/Makefile
> @@ -15,5 +15,6 @@ fuse-y += iomode.o
>  fuse-$(CONFIG_FUSE_DAX) += dax.o
>  fuse-$(CONFIG_FUSE_PASSTHROUGH) += passthrough.o
>  fuse-$(CONFIG_SYSCTL) += sysctl.o
> +fuse-$(CONFIG_FUSE_IO_URING) += dev_uring.o
>  
>  virtiofs-y := virtio_fs.o
> diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..b44ba4033615e01041313c040035b6da6af0ee17
> --- /dev/null
> +++ b/fs/fuse/dev_uring.c
> @@ -0,0 +1,333 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * FUSE: Filesystem in Userspace
> + * Copyright (c) 2023-2024 DataDirect Networks.
> + */
> +
> +#include "fuse_i.h"
> +#include "dev_uring_i.h"
> +#include "fuse_dev_i.h"
> +
> +#include <linux/fs.h>
> +#include <linux/io_uring/cmd.h>
> +
> +#ifdef CONFIG_FUSE_IO_URING

I guess this #ifdef a leftover from older versions, and should probably be
removed.  This file is compiled only if FUSE_IO_URING is defined.

> +static bool __read_mostly enable_uring;
> +module_param(enable_uring, bool, 0644);
> +MODULE_PARM_DESC(enable_uring,
> +		 "Enable userspace communication through io-uring");
> +#endif
> +
> +#define FUSE_URING_IOV_SEGS 2 /* header and payload */
> +
> +
> +bool fuse_uring_enabled(void)
> +{
> +	return enable_uring;
> +}
> +
> +void fuse_uring_destruct(struct fuse_conn *fc)
> +{
> +	struct fuse_ring *ring = fc->ring;
> +	int qid;
> +
> +	if (!ring)
> +		return;
> +
> +	for (qid = 0; qid < ring->nr_queues; qid++) {
> +		struct fuse_ring_queue *queue = ring->queues[qid];
> +
> +		if (!queue)
> +			continue;
> +
> +		WARN_ON(!list_empty(&queue->ent_avail_queue));
> +		WARN_ON(!list_empty(&queue->ent_commit_queue));
> +
> +		kfree(queue);
> +		ring->queues[qid] = NULL;
> +	}
> +
> +	kfree(ring->queues);
> +	kfree(ring);
> +	fc->ring = NULL;
> +}
> +
> +/*
> + * Basic ring setup for this connection based on the provided configuration
> + */
> +static struct fuse_ring *fuse_uring_create(struct fuse_conn *fc)
> +{
> +	struct fuse_ring *ring;
> +	size_t nr_queues = num_possible_cpus();
> +	struct fuse_ring *res = NULL;
> +	size_t max_payload_size;
> +
> +	ring = kzalloc(sizeof(*fc->ring), GFP_KERNEL_ACCOUNT);
> +	if (!ring)
> +		return NULL;
> +
> +	ring->queues = kcalloc(nr_queues, sizeof(struct fuse_ring_queue *),
> +			       GFP_KERNEL_ACCOUNT);
> +	if (!ring->queues)
> +		goto out_err;
> +
> +	max_payload_size = max(FUSE_MIN_READ_BUFFER, fc->max_write);
> +	max_payload_size = max(max_payload_size, fc->max_pages * PAGE_SIZE);
> +
> +	spin_lock(&fc->lock);
> +	if (fc->ring) {
> +		/* race, another thread created the ring in the meantime */
> +		spin_unlock(&fc->lock);
> +		res = fc->ring;
> +		goto out_err;
> +	}
> +
> +	fc->ring = ring;
> +	ring->nr_queues = nr_queues;
> +	ring->fc = fc;
> +	ring->max_payload_sz = max_payload_size;
> +
> +	spin_unlock(&fc->lock);
> +	return ring;
> +
> +out_err:
> +	kfree(ring->queues);
> +	kfree(ring);
> +	return res;
> +}
> +
> +static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
> +						       int qid)
> +{
> +	struct fuse_conn *fc = ring->fc;
> +	struct fuse_ring_queue *queue;
> +
> +	queue = kzalloc(sizeof(*queue), GFP_KERNEL_ACCOUNT);
> +	if (!queue)
> +		return NULL;
> +	queue->qid = qid;
> +	queue->ring = ring;
> +	spin_lock_init(&queue->lock);
> +
> +	INIT_LIST_HEAD(&queue->ent_avail_queue);
> +	INIT_LIST_HEAD(&queue->ent_commit_queue);
> +
> +	spin_lock(&fc->lock);
> +	if (ring->queues[qid]) {
> +		spin_unlock(&fc->lock);
> +		kfree(queue);
> +		return ring->queues[qid];
> +	}
> +
> +	/*
> +	 * write_once and lock as the caller mostly doesn't take the lock at all
> +	 */
> +	WRITE_ONCE(ring->queues[qid], queue);
> +	spin_unlock(&fc->lock);
> +
> +	return queue;
> +}
> +
> +/*
> + * Make a ring entry available for fuse_req assignment
> + */
> +static void fuse_uring_ent_avail(struct fuse_ring_ent *ring_ent,
> +				 struct fuse_ring_queue *queue)
> +{
> +	list_move(&ring_ent->list, &queue->ent_avail_queue);
> +	ring_ent->state = FRRS_AVAILABLE;
> +}
> +
> +/*
> + * fuse_uring_req_fetch command handling
> + */
> +static void fuse_uring_do_register(struct fuse_ring_ent *ring_ent,
> +				   struct io_uring_cmd *cmd,
> +				   unsigned int issue_flags)
> +{
> +	struct fuse_ring_queue *queue = ring_ent->queue;
> +
> +	spin_lock(&queue->lock);
> +	fuse_uring_ent_avail(ring_ent, queue);
> +	spin_unlock(&queue->lock);
> +}
> +
> +/*
> + * sqe->addr is a ptr to an iovec array, iov[0] has the headers, iov[1]
> + * the payload
> + */
> +static int fuse_uring_get_iovec_from_sqe(const struct io_uring_sqe *sqe,
> +					 struct iovec iov[FUSE_URING_IOV_SEGS])
> +{
> +	struct iovec __user *uiov = u64_to_user_ptr(READ_ONCE(sqe->addr));
> +	struct iov_iter iter;
> +	ssize_t ret;
> +
> +	if (sqe->len != FUSE_URING_IOV_SEGS)
> +		return -EINVAL;
> +
> +	/*
> +	 * Direction for buffer access will actually be READ and WRITE,
> +	 * using write for the import should include READ access as well.
> +	 */
> +	ret = import_iovec(WRITE, uiov, FUSE_URING_IOV_SEGS,
> +			   FUSE_URING_IOV_SEGS, &iov, &iter);
> +	if (ret < 0)
> +		return ret;
> +
> +	return 0;
> +}
> +
> +static struct fuse_ring_ent *
> +fuse_uring_create_ring_ent(struct io_uring_cmd *cmd,
> +			   struct fuse_ring_queue *queue)
> +{
> +	struct fuse_ring *ring = queue->ring;
> +	struct fuse_ring_ent *ent;
> +	size_t payload_size;
> +	struct iovec iov[FUSE_URING_IOV_SEGS];
> +	int err;
> +
> +	err = fuse_uring_get_iovec_from_sqe(cmd->sqe, iov);
> +	if (err) {
> +		pr_info_ratelimited("Failed to get iovec from sqe, err=%d\n",
> +				    err);
> +		return ERR_PTR(err);
> +	}
> +
> +	/*
> +	 * The created queue above does not need to be destructed in
> +	 * case of entry errors below, will be done at ring destruction time.
> +	 */
> +	err = -ENOMEM;
> +	ent = kzalloc(sizeof(*ent), GFP_KERNEL_ACCOUNT);
> +	if (!ent)
> +		return ERR_PTR(err);

'ent' isn't being freed on the error paths below.

Cheers,
-- 
Luís

> 
> +
> +	INIT_LIST_HEAD(&ent->list);
> +
> +	ent->queue = queue;
> +	ent->cmd = cmd;
> +
> +	err = -EINVAL;
> +	if (iov[0].iov_len < sizeof(struct fuse_uring_req_header)) {
> +		pr_info_ratelimited("Invalid header len %zu\n", iov[0].iov_len);
> +		return ERR_PTR(err);
> +	}
> +
> +	ent->headers = iov[0].iov_base;
> +	ent->payload = iov[1].iov_base;
> +	payload_size = iov[1].iov_len;
> +
> +	if (payload_size < ring->max_payload_sz) {
> +		pr_info_ratelimited("Invalid req payload len %zu\n",
> +				    payload_size);
> +		return ERR_PTR(err);
> +	}
> +
> +	return ent;
> +}
> +
> +/* Register header and payload buffer with the kernel and fetch a request */
> +static int fuse_uring_register(struct io_uring_cmd *cmd,
> +			       unsigned int issue_flags, struct fuse_conn *fc)
> +{
> +	const struct fuse_uring_cmd_req *cmd_req = io_uring_sqe_cmd(cmd->sqe);
> +	struct fuse_ring *ring = fc->ring;
> +	struct fuse_ring_queue *queue;
> +	struct fuse_ring_ent *ring_ent;
> +	int err;
> +	struct iovec iov[FUSE_URING_IOV_SEGS];
> +	unsigned int qid = READ_ONCE(cmd_req->qid);
> +
> +	err = fuse_uring_get_iovec_from_sqe(cmd->sqe, iov);
> +	if (err) {
> +		pr_info_ratelimited("Failed to get iovec from sqe, err=%d\n",
> +				    err);
> +		return err;
> +	}
> +
> +	err = -ENOMEM;
> +	if (!ring) {
> +		ring = fuse_uring_create(fc);
> +		if (!ring)
> +			return err;
> +	}
> +
> +	if (qid >= ring->nr_queues) {
> +		pr_info_ratelimited("fuse: Invalid ring qid %u\n", qid);
> +		return -EINVAL;
> +	}
> +
> +	err = -ENOMEM;
> +	queue = ring->queues[qid];
> +	if (!queue) {
> +		queue = fuse_uring_create_queue(ring, qid);
> +		if (!queue)
> +			return err;
> +	}
> +
> +	ring_ent = fuse_uring_create_ring_ent(cmd, queue);
> +	if (IS_ERR(ring_ent))
> +		return PTR_ERR(ring_ent);
> +
> +	fuse_uring_do_register(ring_ent, cmd, issue_flags);
> +
> +	return 0;
> +}
> +
> +/*
> + * Entry function from io_uring to handle the given passthrough command
> + * (op code IORING_OP_URING_CMD)
> + */
> +int __maybe_unused fuse_uring_cmd(struct io_uring_cmd *cmd,
> +				  unsigned int issue_flags)
> +{
> +	struct fuse_dev *fud;
> +	struct fuse_conn *fc;
> +	u32 cmd_op = cmd->cmd_op;
> +	int err;
> +
> +	if (!enable_uring) {
> +		pr_info_ratelimited("fuse-io-uring is disabled\n");
> +		return -EOPNOTSUPP;
> +	}
> +
> +	/* This extra SQE size holds struct fuse_uring_cmd_req */
> +	if (!(issue_flags & IO_URING_F_SQE128))
> +		return -EINVAL;
> +
> +	fud = fuse_get_dev(cmd->file);
> +	if (!fud) {
> +		pr_info_ratelimited("No fuse device found\n");
> +		return -ENOTCONN;
> +	}
> +	fc = fud->fc;
> +
> +	if (fc->aborted)
> +		return -ECONNABORTED;
> +	if (!fc->connected)
> +		return -ENOTCONN;
> +
> +	/*
> +	 * fuse_uring_register() needs the ring to be initialized,
> +	 * we need to know the max payload size
> +	 */
> +	if (!fc->initialized)
> +		return -EAGAIN;
> +
> +	switch (cmd_op) {
> +	case FUSE_IO_URING_CMD_REGISTER:
> +		err = fuse_uring_register(cmd, issue_flags, fc);
> +		if (err) {
> +			pr_info_once("FUSE_IO_URING_CMD_REGISTER failed err=%d\n",
> +				     err);
> +			return err;
> +		}
> +		break;
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	return -EIOCBQUEUED;
> +}
> diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
> new file mode 100644
> index 0000000000000000000000000000000000000000..4e46dd65196d26dabc62dada33b17de9aa511c08
> --- /dev/null
> +++ b/fs/fuse/dev_uring_i.h
> @@ -0,0 +1,116 @@
> +/* SPDX-License-Identifier: GPL-2.0
> + *
> + * FUSE: Filesystem in Userspace
> + * Copyright (c) 2023-2024 DataDirect Networks.
> + */
> +
> +#ifndef _FS_FUSE_DEV_URING_I_H
> +#define _FS_FUSE_DEV_URING_I_H
> +
> +#include "fuse_i.h"
> +
> +#ifdef CONFIG_FUSE_IO_URING
> +
> +enum fuse_ring_req_state {
> +	FRRS_INVALID = 0,
> +
> +	/* The ring entry received from userspace and it is being processed */
> +	FRRS_COMMIT,
> +
> +	/* The ring entry is waiting for new fuse requests */
> +	FRRS_AVAILABLE,
> +
> +	/* The ring entry is in or on the way to user space */
> +	FRRS_USERSPACE,
> +};
> +
> +/** A fuse ring entry, part of the ring queue */
> +struct fuse_ring_ent {
> +	/* userspace buffer */
> +	struct fuse_uring_req_header __user *headers;
> +	void __user *payload;
> +
> +	/* the ring queue that owns the request */
> +	struct fuse_ring_queue *queue;
> +
> +	/* fields below are protected by queue->lock */
> +
> +	struct io_uring_cmd *cmd;
> +
> +	struct list_head list;
> +
> +	enum fuse_ring_req_state state;
> +
> +	struct fuse_req *fuse_req;
> +
> +	/* commit id to identify the server reply */
> +	uint64_t commit_id;
> +};
> +
> +struct fuse_ring_queue {
> +	/*
> +	 * back pointer to the main fuse uring structure that holds this
> +	 * queue
> +	 */
> +	struct fuse_ring *ring;
> +
> +	/* queue id, corresponds to the cpu core */
> +	unsigned int qid;
> +
> +	/*
> +	 * queue lock, taken when any value in the queue changes _and_ also
> +	 * a ring entry state changes.
> +	 */
> +	spinlock_t lock;
> +
> +	/* available ring entries (struct fuse_ring_ent) */
> +	struct list_head ent_avail_queue;
> +
> +	/*
> +	 * entries in the process of being committed or in the process
> +	 * to be sent to userspace
> +	 */
> +	struct list_head ent_commit_queue;
> +};
> +
> +/**
> + * Describes if uring is for communication and holds alls the data needed
> + * for uring communication
> + */
> +struct fuse_ring {
> +	/* back pointer */
> +	struct fuse_conn *fc;
> +
> +	/* number of ring queues */
> +	size_t nr_queues;
> +
> +	/* maximum payload/arg size */
> +	size_t max_payload_sz;
> +
> +	struct fuse_ring_queue **queues;
> +};
> +
> +bool fuse_uring_enabled(void);
> +void fuse_uring_destruct(struct fuse_conn *fc);
> +int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags);
> +
> +#else /* CONFIG_FUSE_IO_URING */
> +
> +struct fuse_ring;
> +
> +static inline void fuse_uring_create(struct fuse_conn *fc)
> +{
> +}
> +
> +static inline void fuse_uring_destruct(struct fuse_conn *fc)
> +{
> +}
> +
> +static inline bool fuse_uring_enabled(void)
> +{
> +	return false;
> +}
> +
> +#endif /* CONFIG_FUSE_IO_URING */
> +
> +#endif /* _FS_FUSE_DEV_URING_I_H */
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index babddd05303796d689a64f0f5a890066b43170ac..d75dd9b59a5c35b76919db760645464f604517f5 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -923,6 +923,11 @@ struct fuse_conn {
>  	/** IDR for backing files ids */
>  	struct idr backing_files_map;
>  #endif
> +
> +#ifdef CONFIG_FUSE_IO_URING
> +	/**  uring connection information*/
> +	struct fuse_ring *ring;
> +#endif
>  };
>  
>  /*
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index 3ce4f4e81d09e867c3a7db7b1dbb819f88ed34ef..e4f9bbacfc1bc6f51d5d01b4c47b42cc159ed783 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -7,6 +7,7 @@
>  */
>  
>  #include "fuse_i.h"
> +#include "dev_uring_i.h"
>  
>  #include <linux/pagemap.h>
>  #include <linux/slab.h>
> @@ -992,6 +993,8 @@ static void delayed_release(struct rcu_head *p)
>  {
>  	struct fuse_conn *fc = container_of(p, struct fuse_conn, rcu);
>  
> +	fuse_uring_destruct(fc);
> +
>  	put_user_ns(fc->user_ns);
>  	fc->release(fc);
>  }
> @@ -1446,6 +1449,13 @@ void fuse_send_init(struct fuse_mount *fm)
>  	if (IS_ENABLED(CONFIG_FUSE_PASSTHROUGH))
>  		flags |= FUSE_PASSTHROUGH;
>  
> +	/*
> +	 * This is just an information flag for fuse server. No need to check
> +	 * the reply - server is either sending IORING_OP_URING_CMD or not.
> +	 */
> +	if (fuse_uring_enabled())
> +		flags |= FUSE_OVER_IO_URING;
> +
>  	ia->in.flags = flags;
>  	ia->in.flags2 = flags >> 32;
>  
> diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
> index f1e99458e29e4fdce5273bc3def242342f207ebd..5e0eb41d967e9de5951673de4405a3ed22cdd8e2 100644
> --- a/include/uapi/linux/fuse.h
> +++ b/include/uapi/linux/fuse.h
> @@ -220,6 +220,15 @@
>   *
>   *  7.41
>   *  - add FUSE_ALLOW_IDMAP
> + *  7.42
> + *  - Add FUSE_OVER_IO_URING and all other io-uring related flags and data
> + *    structures:
> + *    - struct fuse_uring_ent_in_out
> + *    - struct fuse_uring_req_header
> + *    - struct fuse_uring_cmd_req
> + *    - FUSE_URING_IN_OUT_HEADER_SZ
> + *    - FUSE_URING_OP_IN_OUT_SZ
> + *    - enum fuse_uring_cmd
>   */
>  
>  #ifndef _LINUX_FUSE_H
> @@ -255,7 +264,7 @@
>  #define FUSE_KERNEL_VERSION 7
>  
>  /** Minor version number of this interface */
> -#define FUSE_KERNEL_MINOR_VERSION 41
> +#define FUSE_KERNEL_MINOR_VERSION 42
>  
>  /** The node ID of the root inode */
>  #define FUSE_ROOT_ID 1
> @@ -425,6 +434,7 @@ struct fuse_file_lock {
>   * FUSE_HAS_RESEND: kernel supports resending pending requests, and the high bit
>   *		    of the request ID indicates resend requests
>   * FUSE_ALLOW_IDMAP: allow creation of idmapped mounts
> + * FUSE_OVER_IO_URING: Indicate that client supports io-uring
>   */
>  #define FUSE_ASYNC_READ		(1 << 0)
>  #define FUSE_POSIX_LOCKS	(1 << 1)
> @@ -471,6 +481,7 @@ struct fuse_file_lock {
>  /* Obsolete alias for FUSE_DIRECT_IO_ALLOW_MMAP */
>  #define FUSE_DIRECT_IO_RELAX	FUSE_DIRECT_IO_ALLOW_MMAP
>  #define FUSE_ALLOW_IDMAP	(1ULL << 40)
> +#define FUSE_OVER_IO_URING	(1ULL << 41)
>  
>  /**
>   * CUSE INIT request/reply flags
> @@ -1206,4 +1217,67 @@ struct fuse_supp_groups {
>  	uint32_t	groups[];
>  };
>  
> +/**
> + * Size of the ring buffer header
> + */
> +#define FUSE_URING_IN_OUT_HEADER_SZ 128
> +#define FUSE_URING_OP_IN_OUT_SZ 128
> +
> +/* Used as part of the fuse_uring_req_header */
> +struct fuse_uring_ent_in_out {
> +	uint64_t flags;
> +
> +	/*
> +	 * commit ID to be used in a reply to a ring request (see also
> +	 * struct fuse_uring_cmd_req)
> +	 */
> +	uint64_t commit_id;
> +
> +	/* size of user payload buffer */
> +	uint32_t payload_sz;
> +	uint32_t padding;
> +
> +	uint64_t reserved;
> +};
> +
> +/**
> + * Header for all fuse-io-uring requests
> + */
> +struct fuse_uring_req_header {
> +	/* struct fuse_in_header / struct fuse_out_header */
> +	char in_out[FUSE_URING_IN_OUT_HEADER_SZ];
> +
> +	/* per op code header */
> +	char op_in[FUSE_URING_OP_IN_OUT_SZ];
> +
> +	struct fuse_uring_ent_in_out ring_ent_in_out;
> +};
> +
> +/**
> + * sqe commands to the kernel
> + */
> +enum fuse_uring_cmd {
> +	FUSE_IO_URING_CMD_INVALID = 0,
> +
> +	/* register the request buffer and fetch a fuse request */
> +	FUSE_IO_URING_CMD_REGISTER = 1,
> +
> +	/* commit fuse request result and fetch next request */
> +	FUSE_IO_URING_CMD_COMMIT_AND_FETCH = 2,
> +};
> +
> +/**
> + * In the 80B command area of the SQE.
> + */
> +struct fuse_uring_cmd_req {
> +	uint64_t flags;
> +
> +	/* entry identifier for commits */
> +	uint64_t commit_id;
> +
> +	/* queue the command is for (queue index) */
> +	uint16_t qid;
> +	uint8_t padding[6];
> +};
> +
>  #endif /* _LINUX_FUSE_H */
>
> -- 
> 2.43.0
>
>


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v9 06/17] fuse: {io-uring} Handle SQEs - register commands
  2025-01-07  9:56   ` Luis Henriques
@ 2025-01-07 12:07     ` Bernd Schubert
  0 siblings, 0 replies; 45+ messages in thread
From: Bernd Schubert @ 2025-01-07 12:07 UTC (permalink / raw)
  To: Luis Henriques, Bernd Schubert
  Cc: Miklos Szeredi, Jens Axboe, Pavel Begunkov, linux-fsdevel,
	io-uring, Joanne Koong, Josef Bacik, Amir Goldstein, Ming Lei,
	David Wei, bernd



On 1/7/25 10:56, Luis Henriques wrote:
> Hi Bernd,
> 
> On Tue, Jan 07 2025, Bernd Schubert wrote:
> 
>> This adds basic support for ring SQEs (with opcode=IORING_OP_URING_CMD).
>> For now only FUSE_IO_URING_CMD_REGISTER is handled to register queue
>> entries.
> 
> Please find below two (minor) comments I had already for v8.  Hopefully
> this time I'll finish reviewing rev v9!


Thank you very much, both fixed in v10 branch, I will wait a bit for
further reviews before sending out v10.
I think the leak of ring_ent in error cases is actually new v9 - I think
I wanted to move up error checking (before the allocation) and then forgot
about that :/ Done now, no need to allocate at all if the IOV is not correct.


Thanks,
Bernd

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v9 10/17] fuse: Add io-uring sqe commit and fetch support
  2025-01-07  0:25 ` [PATCH v9 10/17] fuse: Add io-uring sqe commit and fetch support Bernd Schubert
@ 2025-01-07 14:42   ` Luis Henriques
  2025-01-07 15:59     ` Bernd Schubert
  2025-01-13 22:44   ` Joanne Koong
  2025-01-17 11:18   ` Pavel Begunkov
  2 siblings, 1 reply; 45+ messages in thread
From: Luis Henriques @ 2025-01-07 14:42 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Miklos Szeredi, Jens Axboe, Pavel Begunkov, linux-fsdevel,
	io-uring, Joanne Koong, Josef Bacik, Amir Goldstein, Ming Lei,
	David Wei, bernd

Hi,

On Tue, Jan 07 2025, Bernd Schubert wrote:

> This adds support for fuse request completion through ring SQEs
> (FUSE_URING_CMD_COMMIT_AND_FETCH handling). After committing
> the ring entry it becomes available for new fuse requests.
> Handling of requests through the ring (SQE/CQE handling)
> is complete now.
>
> Fuse request data are copied through the mmaped ring buffer,
> there is no support for any zero copy yet.

Please find below a few more comments.

Also, please note that I'm trying to understand this patchset (and the
whole fuse-over-io-uring thing), so most of my comments are minor nits.
And those that are not may simply be wrong!  I'm just noting them as I
navigate through the code.

(And by the way, thanks for this work!)

> Signed-off-by: Bernd Schubert <[email protected]>
> ---
>  fs/fuse/dev_uring.c   | 450 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/fuse/dev_uring_i.h |  12 ++
>  fs/fuse/fuse_i.h      |   4 +
>  3 files changed, 466 insertions(+)
>
> diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
> index b44ba4033615e01041313c040035b6da6af0ee17..f44e66a7ea577390da87e9ac7d118a9416898c28 100644
> --- a/fs/fuse/dev_uring.c
> +++ b/fs/fuse/dev_uring.c
> @@ -26,6 +26,19 @@ bool fuse_uring_enabled(void)
>  	return enable_uring;
>  }
>  
> +static void fuse_uring_req_end(struct fuse_ring_ent *ring_ent, bool set_err,
> +			       int error)
> +{
> +	struct fuse_req *req = ring_ent->fuse_req;
> +
> +	if (set_err)
> +		req->out.h.error = error;
> +
> +	clear_bit(FR_SENT, &req->flags);
> +	fuse_request_end(ring_ent->fuse_req);
> +	ring_ent->fuse_req = NULL;
> +}
> +
>  void fuse_uring_destruct(struct fuse_conn *fc)
>  {
>  	struct fuse_ring *ring = fc->ring;
> @@ -41,8 +54,11 @@ void fuse_uring_destruct(struct fuse_conn *fc)
>  			continue;
>  
>  		WARN_ON(!list_empty(&queue->ent_avail_queue));
> +		WARN_ON(!list_empty(&queue->ent_w_req_queue));
>  		WARN_ON(!list_empty(&queue->ent_commit_queue));
> +		WARN_ON(!list_empty(&queue->ent_in_userspace));
>  
> +		kfree(queue->fpq.processing);
>  		kfree(queue);
>  		ring->queues[qid] = NULL;
>  	}
> @@ -101,20 +117,34 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
>  {
>  	struct fuse_conn *fc = ring->fc;
>  	struct fuse_ring_queue *queue;
> +	struct list_head *pq;
>  
>  	queue = kzalloc(sizeof(*queue), GFP_KERNEL_ACCOUNT);
>  	if (!queue)
>  		return NULL;
> +	pq = kcalloc(FUSE_PQ_HASH_SIZE, sizeof(struct list_head), GFP_KERNEL);
> +	if (!pq) {
> +		kfree(queue);
> +		return NULL;
> +	}
> +
>  	queue->qid = qid;
>  	queue->ring = ring;
>  	spin_lock_init(&queue->lock);
>  
>  	INIT_LIST_HEAD(&queue->ent_avail_queue);
>  	INIT_LIST_HEAD(&queue->ent_commit_queue);
> +	INIT_LIST_HEAD(&queue->ent_w_req_queue);
> +	INIT_LIST_HEAD(&queue->ent_in_userspace);
> +	INIT_LIST_HEAD(&queue->fuse_req_queue);
> +
> +	queue->fpq.processing = pq;
> +	fuse_pqueue_init(&queue->fpq);
>  
>  	spin_lock(&fc->lock);
>  	if (ring->queues[qid]) {
>  		spin_unlock(&fc->lock);
> +		kfree(queue->fpq.processing);
>  		kfree(queue);
>  		return ring->queues[qid];
>  	}
> @@ -128,6 +158,214 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
>  	return queue;
>  }
>  
> +/*
> + * Checks for errors and stores it into the request
> + */
> +static int fuse_uring_out_header_has_err(struct fuse_out_header *oh,
> +					 struct fuse_req *req,
> +					 struct fuse_conn *fc)
> +{
> +	int err;
> +
> +	err = -EINVAL;
> +	if (oh->unique == 0) {
> +		/* Not supportd through io-uring yet */

typo: "supported"

> +		pr_warn_once("notify through fuse-io-uring not supported\n");
> +		goto seterr;
> +	}
> +
> +	err = -EINVAL;

Not really needed, it already has this value.

> +	if (oh->error <= -ERESTARTSYS || oh->error > 0)
> +		goto seterr;
> +
> +	if (oh->error) {
> +		err = oh->error;
> +		goto err;
> +	}
> +
> +	err = -ENOENT;
> +	if ((oh->unique & ~FUSE_INT_REQ_BIT) != req->in.h.unique) {
> +		pr_warn_ratelimited("unique mismatch, expected: %llu got %llu\n",
> +				    req->in.h.unique,
> +				    oh->unique & ~FUSE_INT_REQ_BIT);
> +		goto seterr;
> +	}
> +
> +	/*
> +	 * Is it an interrupt reply ID?
> +	 * XXX: Not supported through fuse-io-uring yet, it should not even
> +	 *      find the request - should not happen.
> +	 */
> +	WARN_ON_ONCE(oh->unique & FUSE_INT_REQ_BIT);
> +
> +	return 0;
> +
> +seterr:
> +	oh->error = err;
> +err:
> +	return err;
> +}
> +
> +static int fuse_uring_copy_from_ring(struct fuse_ring *ring,
> +				     struct fuse_req *req,
> +				     struct fuse_ring_ent *ent)
> +{
> +	struct fuse_copy_state cs;
> +	struct fuse_args *args = req->args;
> +	struct iov_iter iter;
> +	int err, res;

nit: no need for two variables; one of the 'int' variables could be
removed.  There are other functions with a similar pattern, but this was
the first one that caught my attention.

> +	struct fuse_uring_ent_in_out ring_in_out;
> +
> +	res = copy_from_user(&ring_in_out, &ent->headers->ring_ent_in_out,
> +			     sizeof(ring_in_out));
> +	if (res)
> +		return -EFAULT;
> +
> +	err = import_ubuf(ITER_SOURCE, ent->payload, ring->max_payload_sz,
> +			  &iter);
> +	if (err)
> +		return err;
> +
> +	fuse_copy_init(&cs, 0, &iter);
> +	cs.is_uring = 1;
> +	cs.req = req;
> +
> +	return fuse_copy_out_args(&cs, args, ring_in_out.payload_sz);
> +}
> +
> + /*
> +  * Copy data from the req to the ring buffer
> +  */

nit: extra space in comment indentation.

> 
> +static int fuse_uring_copy_to_ring(struct fuse_ring *ring, struct fuse_req *req,
> +				   struct fuse_ring_ent *ent)
> +{
> +	struct fuse_copy_state cs;
> +	struct fuse_args *args = req->args;
> +	struct fuse_in_arg *in_args = args->in_args;
> +	int num_args = args->in_numargs;
> +	int err, res;
> +	struct iov_iter iter;
> +	struct fuse_uring_ent_in_out ent_in_out = {
> +		.flags = 0,
> +		.commit_id = ent->commit_id,
> +	};
> +
> +	if (WARN_ON(ent_in_out.commit_id == 0))
> +		return -EINVAL;
> +
> +	err = import_ubuf(ITER_DEST, ent->payload, ring->max_payload_sz, &iter);
> +	if (err) {
> +		pr_info_ratelimited("fuse: Import of user buffer failed\n");
> +		return err;
> +	}
> +
> +	fuse_copy_init(&cs, 1, &iter);
> +	cs.is_uring = 1;
> +	cs.req = req;
> +
> +	if (num_args > 0) {
> +		/*
> +		 * Expectation is that the first argument is the per op header.
> +		 * Some op code have that as zero.
> +		 */
> +		if (args->in_args[0].size > 0) {
> +			res = copy_to_user(&ent->headers->op_in, in_args->value,
> +					   in_args->size);
> +			err = res > 0 ? -EFAULT : res;
> +			if (err) {
> +				pr_info_ratelimited(
> +					"Copying the header failed.\n");
> +				return err;
> +			}
> +		}
> +		in_args++;
> +		num_args--;
> +	}
> +
> +	/* copy the payload */
> +	err = fuse_copy_args(&cs, num_args, args->in_pages,
> +			     (struct fuse_arg *)in_args, 0);
> +	if (err) {
> +		pr_info_ratelimited("%s fuse_copy_args failed\n", __func__);
> +		return err;
> +	}
> +
> +	ent_in_out.payload_sz = cs.ring.copied_sz;
> +	res = copy_to_user(&ent->headers->ring_ent_in_out, &ent_in_out,
> +			   sizeof(ent_in_out));
> +	err = res > 0 ? -EFAULT : res;
> +	if (err)
> +		return err;

Simply return err? :-)

> +
> +	return 0;
> +}
> +
> +static int
> +fuse_uring_prepare_send(struct fuse_ring_ent *ring_ent)
> +{
> +	struct fuse_ring_queue *queue = ring_ent->queue;
> +	struct fuse_ring *ring = queue->ring;
> +	struct fuse_req *req = ring_ent->fuse_req;
> +	int err, res;
> +
> +	err = -EIO;
> +	if (WARN_ON(ring_ent->state != FRRS_FUSE_REQ)) {
> +		pr_err("qid=%d ring-req=%p invalid state %d on send\n",
> +		       queue->qid, ring_ent, ring_ent->state);
> +		err = -EIO;

'err' initialized twice.  One of these could be removed.

> +		goto err;
> +	}
> +
> +	/* copy the request */
> +	err = fuse_uring_copy_to_ring(ring, req, ring_ent);
> +	if (unlikely(err)) {
> +		pr_info_ratelimited("Copy to ring failed: %d\n", err);
> +		goto err;
> +	}
> +
> +	/* copy fuse_in_header */
> +	res = copy_to_user(&ring_ent->headers->in_out, &req->in.h,
> +			   sizeof(req->in.h));
> +	err = res > 0 ? -EFAULT : res;
> +	if (err)
> +		goto err;
> +
> +	set_bit(FR_SENT, &req->flags);
> +	return 0;
> +
> +err:
> +	fuse_uring_req_end(ring_ent, true, err);
> +	return err;
> +}
> +
> +/*
> + * Write data to the ring buffer and send the request to userspace,
> + * userspace will read it
> + * This is comparable with classical read(/dev/fuse)
> + */
> +static int fuse_uring_send_next_to_ring(struct fuse_ring_ent *ring_ent,
> +					unsigned int issue_flags)
> +{
> +	int err = 0;
> +	struct fuse_ring_queue *queue = ring_ent->queue;
> +
> +	err = fuse_uring_prepare_send(ring_ent);
> +	if (err)
> +		goto err;

Since this is the only place where this label is used, it could simply
return 'err' and the label removed.

> +
> +	spin_lock(&queue->lock);
> +	ring_ent->state = FRRS_USERSPACE;
> +	list_move(&ring_ent->list, &queue->ent_in_userspace);
> +	spin_unlock(&queue->lock);
> +
> +	io_uring_cmd_done(ring_ent->cmd, 0, 0, issue_flags);
> +	ring_ent->cmd = NULL;
> +	return 0;
> +
> +err:
> +	return err;
> +}
> +
>  /*
>   * Make a ring entry available for fuse_req assignment
>   */
> @@ -138,6 +376,210 @@ static void fuse_uring_ent_avail(struct fuse_ring_ent *ring_ent,
>  	ring_ent->state = FRRS_AVAILABLE;
>  }
>  
> +/* Used to find the request on SQE commit */
> +static void fuse_uring_add_to_pq(struct fuse_ring_ent *ring_ent,
> +				 struct fuse_req *req)
> +{
> +	struct fuse_ring_queue *queue = ring_ent->queue;
> +	struct fuse_pqueue *fpq = &queue->fpq;
> +	unsigned int hash;
> +
> +	/* commit_id is the unique id of the request */
> +	ring_ent->commit_id = req->in.h.unique;
> +
> +	req->ring_entry = ring_ent;
> +	hash = fuse_req_hash(ring_ent->commit_id);
> +	list_move_tail(&req->list, &fpq->processing[hash]);
> +}
> +
> +/*
> + * Assign a fuse queue entry to the given entry
> + */
> +static void fuse_uring_add_req_to_ring_ent(struct fuse_ring_ent *ring_ent,
> +					   struct fuse_req *req)
> +{
> +	struct fuse_ring_queue *queue = ring_ent->queue;
> +
> +	lockdep_assert_held(&queue->lock);
> +
> +	if (WARN_ON_ONCE(ring_ent->state != FRRS_AVAILABLE &&
> +			 ring_ent->state != FRRS_COMMIT)) {
> +		pr_warn("%s qid=%d state=%d\n", __func__, ring_ent->queue->qid,
> +			ring_ent->state);
> +	}
> +	list_del_init(&req->list);
> +	clear_bit(FR_PENDING, &req->flags);
> +	ring_ent->fuse_req = req;
> +	ring_ent->state = FRRS_FUSE_REQ;
> +	list_move(&ring_ent->list, &queue->ent_w_req_queue);
> +	fuse_uring_add_to_pq(ring_ent, req);
> +}
> +
> +/*
> + * Release the ring entry and fetch the next fuse request if available
> + *
> + * @return true if a new request has been fetched
> + */
> +static bool fuse_uring_ent_assign_req(struct fuse_ring_ent *ring_ent)
> +	__must_hold(&queue->lock)
> +{
> +	struct fuse_req *req;
> +	struct fuse_ring_queue *queue = ring_ent->queue;
> +	struct list_head *req_queue = &queue->fuse_req_queue;
> +
> +	lockdep_assert_held(&queue->lock);
> +
> +	/* get and assign the next entry while it is still holding the lock */
> +	req = list_first_entry_or_null(req_queue, struct fuse_req, list);
> +	if (req) {
> +		fuse_uring_add_req_to_ring_ent(ring_ent, req);
> +		return true;
> +	}
> +
> +	return false;
> +}
> +
> +/*
> + * Read data from the ring buffer, which user space has written to
> + * This is comparible with handling of classical write(/dev/fuse).

nit: "comparable"

> + * Also make the ring request available again for new fuse requests.
> + */
> +static void fuse_uring_commit(struct fuse_ring_ent *ring_ent,
> +			      unsigned int issue_flags)
> +{
> +	struct fuse_ring *ring = ring_ent->queue->ring;
> +	struct fuse_conn *fc = ring->fc;
> +	struct fuse_req *req = ring_ent->fuse_req;
> +	ssize_t err = 0;
> +	bool set_err = false;
> +
> +	err = copy_from_user(&req->out.h, &ring_ent->headers->in_out,
> +			     sizeof(req->out.h));
> +	if (err) {
> +		req->out.h.error = err;
> +		goto out;
> +	}
> +
> +	err = fuse_uring_out_header_has_err(&req->out.h, req, fc);
> +	if (err) {
> +		/* req->out.h.error already set */
> +		goto out;
> +	}
> +
> +	err = fuse_uring_copy_from_ring(ring, req, ring_ent);
> +	if (err)
> +		set_err = true;
> +
> +out:
> +	fuse_uring_req_end(ring_ent, set_err, err);
> +}
> +
> +/*
> + * Get the next fuse req and send it
> + */
> +static void fuse_uring_next_fuse_req(struct fuse_ring_ent *ring_ent,
> +				     struct fuse_ring_queue *queue,
> +				     unsigned int issue_flags)
> +{
> +	int err;
> +	bool has_next;
> +
> +retry:
> +	spin_lock(&queue->lock);
> +	fuse_uring_ent_avail(ring_ent, queue);
> +	has_next = fuse_uring_ent_assign_req(ring_ent);
> +	spin_unlock(&queue->lock);
> +
> +	if (has_next) {
> +		err = fuse_uring_send_next_to_ring(ring_ent, issue_flags);
> +		if (err)
> +			goto retry;

I wonder whether this is safe.  Maybe this is *obviously* safe, but I'm
still trying to understand this patchset; so, for me, it is not :-)

Would it be worth it trying to limit the maximum number of retries?

> +	}
> +}
> +
> +static int fuse_ring_ent_set_commit(struct fuse_ring_ent *ent)
> +{
> +	struct fuse_ring_queue *queue = ent->queue;
> +
> +	lockdep_assert_held(&queue->lock);
> +
> +	if (WARN_ON_ONCE(ent->state != FRRS_USERSPACE))
> +		return -EIO;
> +
> +	ent->state = FRRS_COMMIT;
> +	list_move(&ent->list, &queue->ent_commit_queue);
> +
> +	return 0;
> +}
> +
> +/* FUSE_URING_CMD_COMMIT_AND_FETCH handler */
> +static int fuse_uring_commit_fetch(struct io_uring_cmd *cmd, int issue_flags,
> +				   struct fuse_conn *fc)
> +{
> +	const struct fuse_uring_cmd_req *cmd_req = io_uring_sqe_cmd(cmd->sqe);
> +	struct fuse_ring_ent *ring_ent;
> +	int err;
> +	struct fuse_ring *ring = fc->ring;
> +	struct fuse_ring_queue *queue;
> +	uint64_t commit_id = READ_ONCE(cmd_req->commit_id);
> +	unsigned int qid = READ_ONCE(cmd_req->qid);
> +	struct fuse_pqueue *fpq;
> +	struct fuse_req *req;
> +
> +	err = -ENOTCONN;
> +	if (!ring)
> +		return err;
> +
> +	if (qid >= ring->nr_queues)
> +		return -EINVAL;
> +
> +	queue = ring->queues[qid];
> +	if (!queue)
> +		return err;
> +	fpq = &queue->fpq;
> +
> +	spin_lock(&queue->lock);
> +	/* Find a request based on the unique ID of the fuse request
> +	 * This should get revised, as it needs a hash calculation and list
> +	 * search. And full struct fuse_pqueue is needed (memory overhead).
> +	 * As well as the link from req to ring_ent.
> +	 */
> +	req = fuse_request_find(fpq, commit_id);
> +	err = -ENOENT;
> +	if (!req) {
> +		pr_info("qid=%d commit_id %llu not found\n", queue->qid,
> +			commit_id);
> +		spin_unlock(&queue->lock);
> +		return err;
> +	}
> +	list_del_init(&req->list);
> +	ring_ent = req->ring_entry;
> +	req->ring_entry = NULL;
> +
> +	err = fuse_ring_ent_set_commit(ring_ent);
> +	if (err != 0) {

I'm probably missing something, but because we removed 'req' from the list
above, aren't we leaking it if we get an error here?

Cheers,
-- 
Luís

> +		pr_info_ratelimited("qid=%d commit_id %llu state %d",
> +				    queue->qid, commit_id, ring_ent->state);
> +		spin_unlock(&queue->lock);
> +		return err;
> +	}
> +
> +	ring_ent->cmd = cmd;
> +	spin_unlock(&queue->lock);
> +
> +	/* without the queue lock, as other locks are taken */
> +	fuse_uring_commit(ring_ent, issue_flags);
> +
> +	/*
> +	 * Fetching the next request is absolutely required as queued
> +	 * fuse requests would otherwise not get processed - committing
> +	 * and fetching is done in one step vs legacy fuse, which has separated
> +	 * read (fetch request) and write (commit result).
> +	 */
> +	fuse_uring_next_fuse_req(ring_ent, queue, issue_flags);
> +	return 0;
> +}
> +
>  /*
>   * fuse_uring_req_fetch command handling
>   */
> @@ -325,6 +767,14 @@ int __maybe_unused fuse_uring_cmd(struct io_uring_cmd *cmd,
>  			return err;
>  		}
>  		break;
> +	case FUSE_IO_URING_CMD_COMMIT_AND_FETCH:
> +		err = fuse_uring_commit_fetch(cmd, issue_flags, fc);
> +		if (err) {
> +			pr_info_once("FUSE_IO_URING_COMMIT_AND_FETCH failed err=%d\n",
> +				     err);
> +			return err;
> +		}
> +		break;
>  	default:
>  		return -EINVAL;
>  	}
> diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
> index 4e46dd65196d26dabc62dada33b17de9aa511c08..80f1c62d4df7f0ca77c4d5179068df6ffdbf7d85 100644
> --- a/fs/fuse/dev_uring_i.h
> +++ b/fs/fuse/dev_uring_i.h
> @@ -20,6 +20,9 @@ enum fuse_ring_req_state {
>  	/* The ring entry is waiting for new fuse requests */
>  	FRRS_AVAILABLE,
>  
> +	/* The ring entry got assigned a fuse req */
> +	FRRS_FUSE_REQ,
> +
>  	/* The ring entry is in or on the way to user space */
>  	FRRS_USERSPACE,
>  };
> @@ -70,7 +73,16 @@ struct fuse_ring_queue {
>  	 * entries in the process of being committed or in the process
>  	 * to be sent to userspace
>  	 */
> +	struct list_head ent_w_req_queue;
>  	struct list_head ent_commit_queue;
> +
> +	/* entries in userspace */
> +	struct list_head ent_in_userspace;
> +
> +	/* fuse requests waiting for an entry slot */
> +	struct list_head fuse_req_queue;
> +
> +	struct fuse_pqueue fpq;
>  };
>  
>  /**
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index e545b0864dd51e82df61cc39bdf65d3d36a418dc..e71556894bc25808581424ec7bdd4afeebc81f15 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -438,6 +438,10 @@ struct fuse_req {
>  
>  	/** fuse_mount this request belongs to */
>  	struct fuse_mount *fm;
> +
> +#ifdef CONFIG_FUSE_IO_URING
> +	void *ring_entry;
> +#endif
>  };
>  
>  struct fuse_iqueue;
>
> -- 
> 2.43.0
>
>


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v9 11/17] fuse: {io-uring} Handle teardown of ring entries
  2025-01-07  0:25 ` [PATCH v9 11/17] fuse: {io-uring} Handle teardown of ring entries Bernd Schubert
@ 2025-01-07 15:31   ` Luis Henriques
  2025-01-17 11:23   ` Pavel Begunkov
  1 sibling, 0 replies; 45+ messages in thread
From: Luis Henriques @ 2025-01-07 15:31 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Miklos Szeredi, Jens Axboe, Pavel Begunkov, linux-fsdevel,
	io-uring, Joanne Koong, Josef Bacik, Amir Goldstein, Ming Lei,
	David Wei, bernd

On Tue, Jan 07 2025, Bernd Schubert wrote:

> On teardown struct file_operations::uring_cmd requests
> need to be completed by calling io_uring_cmd_done().
> Not completing all ring entries would result in busy io-uring
> tasks giving warning messages in intervals and unreleased
> struct file.
>
> Additionally the fuse connection and with that the ring can
> only get released when all io-uring commands are completed.
>
> Completion is done with ring entries that are
> a) in waiting state for new fuse requests - io_uring_cmd_done
> is needed
>
> b) already in userspace - io_uring_cmd_done through teardown
> is not needed, the request can just get released. If fuse server
> is still active and commits such a ring entry, fuse_uring_cmd()
> already checks if the connection is active and then complete the
> io-uring itself with -ENOTCONN. I.e. special handling is not
> needed.
>
> This scheme is basically represented by the ring entry state
> FRRS_WAIT and FRRS_USERSPACE.
>
> Entries in state:
> - FRRS_INIT: No action needed, do not contribute to
>   ring->queue_refs yet
> - All other states: Are currently processed by other tasks,
>   async teardown is needed and it has to wait for the two
>   states above. It could be also solved without an async
>   teardown task, but would require additional if conditions
>   in hot code paths. Also in my personal opinion the code
>   looks cleaner with async teardown.
>
> Signed-off-by: Bernd Schubert <[email protected]>
> ---
>  fs/fuse/dev.c         |   9 +++
>  fs/fuse/dev_uring.c   | 198 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/fuse/dev_uring_i.h |  51 +++++++++++++
>  3 files changed, 258 insertions(+)
>
> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> index aa33eba51c51dff6af2cdcf60bed9c3f6b4bc0d0..1c21e491e891196c77c7f6135cdc2aece785d399 100644
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -6,6 +6,7 @@
>    See the file COPYING.
>  */
>  
> +#include "dev_uring_i.h"
>  #include "fuse_i.h"
>  #include "fuse_dev_i.h"
>  
> @@ -2291,6 +2292,12 @@ void fuse_abort_conn(struct fuse_conn *fc)
>  		spin_unlock(&fc->lock);
>  
>  		fuse_dev_end_requests(&to_end);
> +
> +		/*
> +		 * fc->lock must not be taken to avoid conflicts with io-uring
> +		 * locks
> +		 */
> +		fuse_uring_abort(fc);
>  	} else {
>  		spin_unlock(&fc->lock);
>  	}
> @@ -2302,6 +2309,8 @@ void fuse_wait_aborted(struct fuse_conn *fc)
>  	/* matches implicit memory barrier in fuse_drop_waiting() */
>  	smp_mb();
>  	wait_event(fc->blocked_waitq, atomic_read(&fc->num_waiting) == 0);
> +
> +	fuse_uring_wait_stopped_queues(fc);
>  }
>  
>  int fuse_dev_release(struct inode *inode, struct file *file)
> diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
> index f44e66a7ea577390da87e9ac7d118a9416898c28..01a908b2ef9ada14b759ca047eab40b4c4431d89 100644
> --- a/fs/fuse/dev_uring.c
> +++ b/fs/fuse/dev_uring.c
> @@ -39,6 +39,37 @@ static void fuse_uring_req_end(struct fuse_ring_ent *ring_ent, bool set_err,
>  	ring_ent->fuse_req = NULL;
>  }
>  
> +/* Abort all list queued request on the given ring queue */
> +static void fuse_uring_abort_end_queue_requests(struct fuse_ring_queue *queue)
> +{
> +	struct fuse_req *req;
> +	LIST_HEAD(req_list);
> +
> +	spin_lock(&queue->lock);
> +	list_for_each_entry(req, &queue->fuse_req_queue, list)
> +		clear_bit(FR_PENDING, &req->flags);
> +	list_splice_init(&queue->fuse_req_queue, &req_list);
> +	spin_unlock(&queue->lock);
> +
> +	/* must not hold queue lock to avoid order issues with fi->lock */
> +	fuse_dev_end_requests(&req_list);
> +}
> +
> +void fuse_uring_abort_end_requests(struct fuse_ring *ring)
> +{
> +	int qid;
> +	struct fuse_ring_queue *queue;
> +
> +	for (qid = 0; qid < ring->nr_queues; qid++) {
> +		queue = READ_ONCE(ring->queues[qid]);
> +		if (!queue)
> +			continue;
> +
> +		queue->stopped = true;
> +		fuse_uring_abort_end_queue_requests(queue);
> +	}
> +}
> +
>  void fuse_uring_destruct(struct fuse_conn *fc)
>  {
>  	struct fuse_ring *ring = fc->ring;
> @@ -98,10 +129,13 @@ static struct fuse_ring *fuse_uring_create(struct fuse_conn *fc)
>  		goto out_err;
>  	}
>  
> +	init_waitqueue_head(&ring->stop_waitq);
> +
>  	fc->ring = ring;
>  	ring->nr_queues = nr_queues;
>  	ring->fc = fc;
>  	ring->max_payload_sz = max_payload_size;
> +	atomic_set(&ring->queue_refs, 0);
>  
>  	spin_unlock(&fc->lock);
>  	return ring;
> @@ -158,6 +192,166 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
>  	return queue;
>  }
>  
> +static void fuse_uring_stop_fuse_req_end(struct fuse_ring_ent *ent)
> +{
> +	struct fuse_req *req = ent->fuse_req;
> +
> +	/* remove entry from fuse_pqueue->processing */
> +	list_del_init(&req->list);
> +	ent->fuse_req = NULL;
> +	clear_bit(FR_SENT, &req->flags);
> +	req->out.h.error = -ECONNABORTED;
> +	fuse_request_end(req);
> +}
> +
> +/*
> + * Release a request/entry on connection tear down
> + */
> +static void fuse_uring_entry_teardown(struct fuse_ring_ent *ent)
> +{
> +	if (ent->cmd) {
> +		io_uring_cmd_done(ent->cmd, -ENOTCONN, 0, IO_URING_F_UNLOCKED);
> +		ent->cmd = NULL;
> +	}
> +
> +	if (ent->fuse_req)
> +		fuse_uring_stop_fuse_req_end(ent);
> +
> +	list_del_init(&ent->list);
> +	kfree(ent);
> +}
> +
> +static void fuse_uring_stop_list_entries(struct list_head *head,
> +					 struct fuse_ring_queue *queue,
> +					 enum fuse_ring_req_state exp_state)
> +{
> +	struct fuse_ring *ring = queue->ring;
> +	struct fuse_ring_ent *ent, *next;
> +	ssize_t queue_refs = SSIZE_MAX;
> +	LIST_HEAD(to_teardown);
> +
> +	spin_lock(&queue->lock);
> +	list_for_each_entry_safe(ent, next, head, list) {
> +		if (ent->state != exp_state) {
> +			pr_warn("entry teardown qid=%d state=%d expected=%d",
> +				queue->qid, ent->state, exp_state);
> +			continue;
> +		}
> +
> +		list_move(&ent->list, &to_teardown);
> +	}
> +	spin_unlock(&queue->lock);
> +
> +	/* no queue lock to avoid lock order issues */
> +	list_for_each_entry_safe(ent, next, &to_teardown, list) {
> +		fuse_uring_entry_teardown(ent);
> +		queue_refs = atomic_dec_return(&ring->queue_refs);
> +		WARN_ON_ONCE(queue_refs < 0);
> +	}
> +}
> +
> +static void fuse_uring_teardown_entries(struct fuse_ring_queue *queue)
> +{
> +	fuse_uring_stop_list_entries(&queue->ent_in_userspace, queue,
> +				     FRRS_USERSPACE);
> +	fuse_uring_stop_list_entries(&queue->ent_avail_queue, queue,
> +				     FRRS_AVAILABLE);
> +}
> +
> +/*
> + * Log state debug info
> + */
> +static void fuse_uring_log_ent_state(struct fuse_ring *ring)
> +{
> +	int qid;
> +	struct fuse_ring_ent *ent;
> +
> +	for (qid = 0; qid < ring->nr_queues; qid++) {
> +		struct fuse_ring_queue *queue = ring->queues[qid];
> +
> +		if (!queue)
> +			continue;
> +
> +		spin_lock(&queue->lock);
> +		/*
> +		 * Log entries from the intermediate queue, the other queues
> +		 * should be empty
> +		 */
> +		list_for_each_entry(ent, &queue->ent_w_req_queue, list) {
> +			pr_info(" ent-req-queue ring=%p qid=%d ent=%p state=%d\n",
> +				ring, qid, ent, ent->state);
> +		}
> +		list_for_each_entry(ent, &queue->ent_commit_queue, list) {
> +			pr_info(" ent-req-queue ring=%p qid=%d ent=%p state=%d\n",

Probably copy&paste: the above string 'ent-req-queue' should probably be
'ent-commit-queue' or something similar.

> +				ring, qid, ent, ent->state);
> +		}
> +		spin_unlock(&queue->lock);
> +	}
> +	ring->stop_debug_log = 1;
> +}
> +
> +static void fuse_uring_async_stop_queues(struct work_struct *work)
> +{
> +	int qid;
> +	struct fuse_ring *ring =
> +		container_of(work, struct fuse_ring, async_teardown_work.work);
> +
> +	/* XXX code dup */

Yeah, I guess the delayed work callback could simply call
fuse_uring_stop_queues(), which would do different things depending on the
value of ring->teardown_time (0 or jiffies).  Which could also be
confusing.

> 
> +	for (qid = 0; qid < ring->nr_queues; qid++) {
> +		struct fuse_ring_queue *queue = READ_ONCE(ring->queues[qid]);
> +
> +		if (!queue)
> +			continue;
> +
> +		fuse_uring_teardown_entries(queue);
> +	}
> +
> +	/*
> +	 * Some ring entries are might be in the middle of IO operations,

nit: remove extra 'are'.

> +	 * i.e. in process to get handled by file_operations::uring_cmd
> +	 * or on the way to userspace - we could handle that with conditions in
> +	 * run time code, but easier/cleaner to have an async tear down handler
> +	 * If there are still queue references left
> +	 */
> +	if (atomic_read(&ring->queue_refs) > 0) {
> +		if (time_after(jiffies,
> +			       ring->teardown_time + FUSE_URING_TEARDOWN_TIMEOUT))
> +			fuse_uring_log_ent_state(ring);
> +
> +		schedule_delayed_work(&ring->async_teardown_work,
> +				      FUSE_URING_TEARDOWN_INTERVAL);
> +	} else {
> +		wake_up_all(&ring->stop_waitq);
> +	}
> +}
> +
> +/*
> + * Stop the ring queues
> + */
> +void fuse_uring_stop_queues(struct fuse_ring *ring)
> +{
> +	int qid;
> +
> +	for (qid = 0; qid < ring->nr_queues; qid++) {
> +		struct fuse_ring_queue *queue = READ_ONCE(ring->queues[qid]);
> +
> +		if (!queue)
> +			continue;
> +
> +		fuse_uring_teardown_entries(queue);
> +	}
> +
> +	if (atomic_read(&ring->queue_refs) > 0) {
> +		ring->teardown_time = jiffies;
> +		INIT_DELAYED_WORK(&ring->async_teardown_work,
> +				  fuse_uring_async_stop_queues);
> +		schedule_delayed_work(&ring->async_teardown_work,
> +				      FUSE_URING_TEARDOWN_INTERVAL);
> +	} else {
> +		wake_up_all(&ring->stop_waitq);
> +	}
> +}
> +
>  /*
>   * Checks for errors and stores it into the request
>   */
> @@ -538,6 +732,9 @@ static int fuse_uring_commit_fetch(struct io_uring_cmd *cmd, int issue_flags,
>  		return err;
>  	fpq = &queue->fpq;
>  
> +	if (!READ_ONCE(fc->connected) || READ_ONCE(queue->stopped))
> +		return err;
> +
>  	spin_lock(&queue->lock);
>  	/* Find a request based on the unique ID of the fuse request
>  	 * This should get revised, as it needs a hash calculation and list
> @@ -667,6 +864,7 @@ fuse_uring_create_ring_ent(struct io_uring_cmd *cmd,
>  		return ERR_PTR(err);
>  	}
>  
> +	atomic_inc(&ring->queue_refs);
>  	return ent;
>  }
>  
> diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
> index 80f1c62d4df7f0ca77c4d5179068df6ffdbf7d85..ee5aeccae66caaf9a4dccbbbc785820836182668 100644
> --- a/fs/fuse/dev_uring_i.h
> +++ b/fs/fuse/dev_uring_i.h
> @@ -11,6 +11,9 @@
>  
>  #ifdef CONFIG_FUSE_IO_URING
>  
> +#define FUSE_URING_TEARDOWN_TIMEOUT (5 * HZ)
> +#define FUSE_URING_TEARDOWN_INTERVAL (HZ/20)
> +
>  enum fuse_ring_req_state {
>  	FRRS_INVALID = 0,
>  
> @@ -83,6 +86,8 @@ struct fuse_ring_queue {
>  	struct list_head fuse_req_queue;
>  
>  	struct fuse_pqueue fpq;
> +
> +	bool stopped;
>  };
>  
>  /**
> @@ -100,12 +105,51 @@ struct fuse_ring {
>  	size_t max_payload_sz;
>  
>  	struct fuse_ring_queue **queues;
> +	/*
> +	 * Log ring entry states onces on stop when entries cannot be

typo: "once"

> +	 * released
> +	 */
> +	unsigned int stop_debug_log : 1;
> +
> +	wait_queue_head_t stop_waitq;
> +
> +	/* async tear down */
> +	struct delayed_work async_teardown_work;
> +
> +	/* log */
> +	unsigned long teardown_time;
> +
> +	atomic_t queue_refs;
>  };
>  
>  bool fuse_uring_enabled(void);
>  void fuse_uring_destruct(struct fuse_conn *fc);
> +void fuse_uring_stop_queues(struct fuse_ring *ring);
> +void fuse_uring_abort_end_requests(struct fuse_ring *ring);
>  int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags);
>  
> +static inline void fuse_uring_abort(struct fuse_conn *fc)
> +{
> +	struct fuse_ring *ring = fc->ring;
> +
> +	if (ring == NULL)
> +		return;
> +
> +	if (atomic_read(&ring->queue_refs) > 0) {
> +		fuse_uring_abort_end_requests(ring);
> +		fuse_uring_stop_queues(ring);
> +	}
> +}
> +
> +static inline void fuse_uring_wait_stopped_queues(struct fuse_conn *fc)
> +{
> +	struct fuse_ring *ring = fc->ring;
> +
> +	if (ring)
> +		wait_event(ring->stop_waitq,
> +			   atomic_read(&ring->queue_refs) == 0);
> +}
> +
>  #else /* CONFIG_FUSE_IO_URING */
>  
>  struct fuse_ring;
> @@ -123,6 +167,13 @@ static inline bool fuse_uring_enabled(void)
>  	return false;
>  }
>  
> +static inline void fuse_uring_abort(struct fuse_conn *fc)
> +{
> +}
> +
> +static inline void fuse_uring_wait_stopped_queues(struct fuse_conn *fc)
> +{
> +}
>  #endif /* CONFIG_FUSE_IO_URING */
>  
>  #endif /* _FS_FUSE_DEV_URING_I_H */
>
> -- 
> 2.43.0
>
>

-- 
Luís

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v9 13/17] fuse: Allow to queue fg requests through io-uring
  2025-01-07  0:25 ` [PATCH v9 13/17] fuse: Allow to queue fg requests through io-uring Bernd Schubert
@ 2025-01-07 15:54   ` Luis Henriques
  2025-01-07 18:59     ` Bernd Schubert
  2025-01-17 11:47   ` Pavel Begunkov
  2025-01-17 21:52   ` Bernd Schubert
  2 siblings, 1 reply; 45+ messages in thread
From: Luis Henriques @ 2025-01-07 15:54 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Miklos Szeredi, Jens Axboe, Pavel Begunkov, linux-fsdevel,
	io-uring, Joanne Koong, Josef Bacik, Amir Goldstein, Ming Lei,
	David Wei, bernd

On Tue, Jan 07 2025, Bernd Schubert wrote:

> This prepares queueing and sending foreground requests through
> io-uring.
>
> Signed-off-by: Bernd Schubert <[email protected]>
> ---
>  fs/fuse/dev_uring.c   | 185 ++++++++++++++++++++++++++++++++++++++++++++++++--
>  fs/fuse/dev_uring_i.h |  11 ++-
>  2 files changed, 187 insertions(+), 9 deletions(-)
>
> diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
> index 01a908b2ef9ada14b759ca047eab40b4c4431d89..89a22a4eee23cbba49bac7a2d2126bb51193326f 100644
> --- a/fs/fuse/dev_uring.c
> +++ b/fs/fuse/dev_uring.c
> @@ -26,6 +26,29 @@ bool fuse_uring_enabled(void)
>  	return enable_uring;
>  }
>  
> +struct fuse_uring_pdu {
> +	struct fuse_ring_ent *ring_ent;
> +};
> +
> +static const struct fuse_iqueue_ops fuse_io_uring_ops;
> +
> +static void uring_cmd_set_ring_ent(struct io_uring_cmd *cmd,
> +				   struct fuse_ring_ent *ring_ent)
> +{
> +	struct fuse_uring_pdu *pdu =
> +		io_uring_cmd_to_pdu(cmd, struct fuse_uring_pdu);
> +
> +	pdu->ring_ent = ring_ent;
> +}
> +
> +static struct fuse_ring_ent *uring_cmd_to_ring_ent(struct io_uring_cmd *cmd)
> +{
> +	struct fuse_uring_pdu *pdu =
> +		io_uring_cmd_to_pdu(cmd, struct fuse_uring_pdu);
> +
> +	return pdu->ring_ent;
> +}
> +
>  static void fuse_uring_req_end(struct fuse_ring_ent *ring_ent, bool set_err,
>  			       int error)
>  {
> @@ -441,7 +464,7 @@ static int fuse_uring_copy_to_ring(struct fuse_ring *ring, struct fuse_req *req,
>  	struct iov_iter iter;
>  	struct fuse_uring_ent_in_out ent_in_out = {
>  		.flags = 0,
> -		.commit_id = ent->commit_id,
> +		.commit_id = req->in.h.unique,
>  	};
>  
>  	if (WARN_ON(ent_in_out.commit_id == 0))
> @@ -460,7 +483,7 @@ static int fuse_uring_copy_to_ring(struct fuse_ring *ring, struct fuse_req *req,
>  	if (num_args > 0) {
>  		/*
>  		 * Expectation is that the first argument is the per op header.
> -		 * Some op code have that as zero.
> +		 * Some op code have that as zero size.
>  		 */
>  		if (args->in_args[0].size > 0) {
>  			res = copy_to_user(&ent->headers->op_in, in_args->value,
> @@ -578,11 +601,8 @@ static void fuse_uring_add_to_pq(struct fuse_ring_ent *ring_ent,
>  	struct fuse_pqueue *fpq = &queue->fpq;
>  	unsigned int hash;
>  
> -	/* commit_id is the unique id of the request */
> -	ring_ent->commit_id = req->in.h.unique;
> -
>  	req->ring_entry = ring_ent;
> -	hash = fuse_req_hash(ring_ent->commit_id);
> +	hash = fuse_req_hash(req->in.h.unique);
>  	list_move_tail(&req->list, &fpq->processing[hash]);
>  }
>  
> @@ -777,6 +797,31 @@ static int fuse_uring_commit_fetch(struct io_uring_cmd *cmd, int issue_flags,
>  	return 0;
>  }
>  
> +static bool is_ring_ready(struct fuse_ring *ring, int current_qid)
> +{
> +	int qid;
> +	struct fuse_ring_queue *queue;
> +	bool ready = true;
> +
> +	for (qid = 0; qid < ring->nr_queues && ready; qid++) {
> +		if (current_qid == qid)
> +			continue;
> +
> +		queue = ring->queues[qid];
> +		if (!queue) {
> +			ready = false;
> +			break;
> +		}
> +
> +		spin_lock(&queue->lock);
> +		if (list_empty(&queue->ent_avail_queue))
> +			ready = false;
> +		spin_unlock(&queue->lock);
> +	}
> +
> +	return ready;
> +}
> +
>  /*
>   * fuse_uring_req_fetch command handling
>   */
> @@ -785,10 +830,22 @@ static void fuse_uring_do_register(struct fuse_ring_ent *ring_ent,
>  				   unsigned int issue_flags)
>  {
>  	struct fuse_ring_queue *queue = ring_ent->queue;
> +	struct fuse_ring *ring = queue->ring;
> +	struct fuse_conn *fc = ring->fc;
> +	struct fuse_iqueue *fiq = &fc->iq;
>  
>  	spin_lock(&queue->lock);
>  	fuse_uring_ent_avail(ring_ent, queue);
>  	spin_unlock(&queue->lock);
> +
> +	if (!ring->ready) {
> +		bool ready = is_ring_ready(ring, queue->qid);
> +
> +		if (ready) {
> +			WRITE_ONCE(ring->ready, true);
> +			fiq->ops = &fuse_io_uring_ops;

Shouldn't we be taking the fiq->lock to protect the above operation?

> +		}
> +	}
>  }
>  
>  /*
> @@ -979,3 +1036,119 @@ int __maybe_unused fuse_uring_cmd(struct io_uring_cmd *cmd,
>  
>  	return -EIOCBQUEUED;
>  }
> +
> +/*
> + * This prepares and sends the ring request in fuse-uring task context.
> + * User buffers are not mapped yet - the application does not have permission
> + * to write to it - this has to be executed in ring task context.
> + */
> +static void
> +fuse_uring_send_req_in_task(struct io_uring_cmd *cmd,
> +			    unsigned int issue_flags)
> +{
> +	struct fuse_ring_ent *ent = uring_cmd_to_ring_ent(cmd);
> +	struct fuse_ring_queue *queue = ent->queue;
> +	int err;
> +
> +	if (unlikely(issue_flags & IO_URING_F_TASK_DEAD)) {
> +		err = -ECANCELED;
> +		goto terminating;
> +	}
> +
> +	err = fuse_uring_prepare_send(ent);
> +	if (err)
> +		goto err;

Suggestion: simplify this function flow.  Something like:

	int err = 0;

	if (unlikely(issue_flags & IO_URING_F_TASK_DEAD))
		err = -ECANCELED;
	else if (fuse_uring_prepare_send(ent)) {
		fuse_uring_next_fuse_req(ent, queue, issue_flags);
		return;
	}
	spin_lock(&queue->lock);
        [...]

> +		goto terminating;
> +	}
> +
> +	err = fuse_uring_prepare_send(ent);
> +	if (err)
> +		goto err;

> +
> +terminating:
> +	spin_lock(&queue->lock);
> +	ent->state = FRRS_USERSPACE;
> +	list_move(&ent->list, &queue->ent_in_userspace);
> +	spin_unlock(&queue->lock);
> +	io_uring_cmd_done(cmd, err, 0, issue_flags);
> +	ent->cmd = NULL;
> +	return;
> +err:
> +	fuse_uring_next_fuse_req(ent, queue, issue_flags);
> +}
> +
> +static struct fuse_ring_queue *fuse_uring_task_to_queue(struct fuse_ring *ring)
> +{
> +	unsigned int qid;
> +	struct fuse_ring_queue *queue;
> +
> +	qid = task_cpu(current);
> +
> +	if (WARN_ONCE(qid >= ring->nr_queues,
> +		      "Core number (%u) exceeds nr ueues (%zu)\n", qid,

typo: 'queues'

> +		      ring->nr_queues))
> +		qid = 0;
> +
> +	queue = ring->queues[qid];
> +	if (WARN_ONCE(!queue, "Missing queue for qid %d\n", qid))
> +		return NULL;

nit: no need for this if statement.  The WARN_ONCE() is enough.

Cheers,
-- 
Luís

> +
> +	return queue;
> +}
> +
> +/* queue a fuse request and send it if a ring entry is available */
> +void fuse_uring_queue_fuse_req(struct fuse_iqueue *fiq, struct fuse_req *req)
> +{
> +	struct fuse_conn *fc = req->fm->fc;
> +	struct fuse_ring *ring = fc->ring;
> +	struct fuse_ring_queue *queue;
> +	struct fuse_ring_ent *ent = NULL;
> +	int err;
> +
> +	err = -EINVAL;
> +	queue = fuse_uring_task_to_queue(ring);
> +	if (!queue)
> +		goto err;
> +
> +	if (req->in.h.opcode != FUSE_NOTIFY_REPLY)
> +		req->in.h.unique = fuse_get_unique(fiq);
> +
> +	spin_lock(&queue->lock);
> +	err = -ENOTCONN;
> +	if (unlikely(queue->stopped))
> +		goto err_unlock;
> +
> +	ent = list_first_entry_or_null(&queue->ent_avail_queue,
> +				       struct fuse_ring_ent, list);
> +	if (ent)
> +		fuse_uring_add_req_to_ring_ent(ent, req);
> +	else
> +		list_add_tail(&req->list, &queue->fuse_req_queue);
> +	spin_unlock(&queue->lock);
> +
> +	if (ent) {
> +		struct io_uring_cmd *cmd = ent->cmd;
> +
> +		err = -EIO;
> +		if (WARN_ON_ONCE(ent->state != FRRS_FUSE_REQ))
> +			goto err;
> +
> +		uring_cmd_set_ring_ent(cmd, ent);
> +		io_uring_cmd_complete_in_task(cmd, fuse_uring_send_req_in_task);
> +	}
> +
> +	return;
> +
> +err_unlock:
> +	spin_unlock(&queue->lock);
> +err:
> +	req->out.h.error = err;
> +	clear_bit(FR_PENDING, &req->flags);
> +	fuse_request_end(req);
> +}
> +
> +static const struct fuse_iqueue_ops fuse_io_uring_ops = {
> +	/* should be send over io-uring as enhancement */
> +	.send_forget = fuse_dev_queue_forget,
> +
> +	/*
> +	 * could be send over io-uring, but interrupts should be rare,
> +	 * no need to make the code complex
> +	 */
> +	.send_interrupt = fuse_dev_queue_interrupt,
> +	.send_req = fuse_uring_queue_fuse_req,
> +};
> diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
> index ee5aeccae66caaf9a4dccbbbc785820836182668..cda330978faa019ceedf161f50d86db976b072e2 100644
> --- a/fs/fuse/dev_uring_i.h
> +++ b/fs/fuse/dev_uring_i.h
> @@ -48,9 +48,6 @@ struct fuse_ring_ent {
>  	enum fuse_ring_req_state state;
>  
>  	struct fuse_req *fuse_req;
> -
> -	/* commit id to identify the server reply */
> -	uint64_t commit_id;
>  };
>  
>  struct fuse_ring_queue {
> @@ -120,6 +117,8 @@ struct fuse_ring {
>  	unsigned long teardown_time;
>  
>  	atomic_t queue_refs;
> +
> +	bool ready;
>  };
>  
>  bool fuse_uring_enabled(void);
> @@ -127,6 +126,7 @@ void fuse_uring_destruct(struct fuse_conn *fc);
>  void fuse_uring_stop_queues(struct fuse_ring *ring);
>  void fuse_uring_abort_end_requests(struct fuse_ring *ring);
>  int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags);
> +void fuse_uring_queue_fuse_req(struct fuse_iqueue *fiq, struct fuse_req *req);
>  
>  static inline void fuse_uring_abort(struct fuse_conn *fc)
>  {
> @@ -150,6 +150,11 @@ static inline void fuse_uring_wait_stopped_queues(struct fuse_conn *fc)
>  			   atomic_read(&ring->queue_refs) == 0);
>  }
>  
> +static inline bool fuse_uring_ready(struct fuse_conn *fc)
> +{
> +	return fc->ring && fc->ring->ready;
> +}
> +
>  #else /* CONFIG_FUSE_IO_URING */
>  
>  struct fuse_ring;
>
> -- 
> 2.43.0
>
>


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v9 10/17] fuse: Add io-uring sqe commit and fetch support
  2025-01-07 14:42   ` Luis Henriques
@ 2025-01-07 15:59     ` Bernd Schubert
  2025-01-07 16:21       ` Luis Henriques
  0 siblings, 1 reply; 45+ messages in thread
From: Bernd Schubert @ 2025-01-07 15:59 UTC (permalink / raw)
  To: Luis Henriques, Bernd Schubert
  Cc: Miklos Szeredi, Jens Axboe, Pavel Begunkov, linux-fsdevel,
	io-uring, Joanne Koong, Josef Bacik, Amir Goldstein, Ming Lei,
	David Wei



On 1/7/25 15:42, Luis Henriques wrote:
> Hi,
> 
> On Tue, Jan 07 2025, Bernd Schubert wrote:
> 
>> This adds support for fuse request completion through ring SQEs
>> (FUSE_URING_CMD_COMMIT_AND_FETCH handling). After committing
>> the ring entry it becomes available for new fuse requests.
>> Handling of requests through the ring (SQE/CQE handling)
>> is complete now.
>>
>> Fuse request data are copied through the mmaped ring buffer,
>> there is no support for any zero copy yet.
> 
> Please find below a few more comments.

Thanks, I fixed all comments, except of retry in fuse_uring_next_fuse_req.


[...]

> 
> Also, please note that I'm trying to understand this patchset (and the
> whole fuse-over-io-uring thing), so most of my comments are minor nits.
> And those that are not may simply be wrong!  I'm just noting them as I
> navigate through the code.
> 
> (And by the way, thanks for this work!)
> 
>> +/*
>> + * Get the next fuse req and send it
>> + */
>> +static void fuse_uring_next_fuse_req(struct fuse_ring_ent *ring_ent,
>> +				     struct fuse_ring_queue *queue,
>> +				     unsigned int issue_flags)
>> +{
>> +	int err;
>> +	bool has_next;
>> +
>> +retry:
>> +	spin_lock(&queue->lock);
>> +	fuse_uring_ent_avail(ring_ent, queue);
>> +	has_next = fuse_uring_ent_assign_req(ring_ent);
>> +	spin_unlock(&queue->lock);
>> +
>> +	if (has_next) {
>> +		err = fuse_uring_send_next_to_ring(ring_ent, issue_flags);
>> +		if (err)
>> +			goto retry;
> 
> I wonder whether this is safe.  Maybe this is *obviously* safe, but I'm
> still trying to understand this patchset; so, for me, it is not :-)
> 
> Would it be worth it trying to limit the maximum number of retries?

No, we cannot limit retries. Let's do a simple example with one ring
entry and also just one queue. Multiple applications create fuse
requests. The first application fills the only available ring entry
and submits it, the others just get queued in queue->fuse_req_queue.
After that the application just waits request_wait_answer()

On commit of the first request the ring task has to take the next
request from queue->fuse_req_queue - if something fails with that
request it has to complete it and proceed to the next request.
If we would introduce a max-retries here, it would put the ring entry
on hold (FRRS_AVAILABLE) and until another application comes, it would
forever wait there. The applications waiting in request_wait_answer
would never complete either.


> 
>> +	}
>> +}
>> +
>> +static int fuse_ring_ent_set_commit(struct fuse_ring_ent *ent)
>> +{
>> +	struct fuse_ring_queue *queue = ent->queue;
>> +
>> +	lockdep_assert_held(&queue->lock);
>> +
>> +	if (WARN_ON_ONCE(ent->state != FRRS_USERSPACE))
>> +		return -EIO;
>> +
>> +	ent->state = FRRS_COMMIT;
>> +	list_move(&ent->list, &queue->ent_commit_queue);
>> +
>> +	return 0;
>> +}
>> +
>> +/* FUSE_URING_CMD_COMMIT_AND_FETCH handler */
>> +static int fuse_uring_commit_fetch(struct io_uring_cmd *cmd, int issue_flags,
>> +				   struct fuse_conn *fc)
>> +{
>> +	const struct fuse_uring_cmd_req *cmd_req = io_uring_sqe_cmd(cmd->sqe);
>> +	struct fuse_ring_ent *ring_ent;
>> +	int err;
>> +	struct fuse_ring *ring = fc->ring;
>> +	struct fuse_ring_queue *queue;
>> +	uint64_t commit_id = READ_ONCE(cmd_req->commit_id);
>> +	unsigned int qid = READ_ONCE(cmd_req->qid);
>> +	struct fuse_pqueue *fpq;
>> +	struct fuse_req *req;
>> +
>> +	err = -ENOTCONN;
>> +	if (!ring)
>> +		return err;
>> +
>> +	if (qid >= ring->nr_queues)
>> +		return -EINVAL;
>> +
>> +	queue = ring->queues[qid];
>> +	if (!queue)
>> +		return err;
>> +	fpq = &queue->fpq;
>> +
>> +	spin_lock(&queue->lock);
>> +	/* Find a request based on the unique ID of the fuse request
>> +	 * This should get revised, as it needs a hash calculation and list
>> +	 * search. And full struct fuse_pqueue is needed (memory overhead).
>> +	 * As well as the link from req to ring_ent.
>> +	 */
>> +	req = fuse_request_find(fpq, commit_id);
>> +	err = -ENOENT;
>> +	if (!req) {
>> +		pr_info("qid=%d commit_id %llu not found\n", queue->qid,
>> +			commit_id);
>> +		spin_unlock(&queue->lock);
>> +		return err;
>> +	}
>> +	list_del_init(&req->list);
>> +	ring_ent = req->ring_entry;
>> +	req->ring_entry = NULL;
>> +
>> +	err = fuse_ring_ent_set_commit(ring_ent);
>> +	if (err != 0) {
> 
> I'm probably missing something, but because we removed 'req' from the list
> above, aren't we leaking it if we get an error here?

Hmm, yeah, that is debatable. We basically have a grave error here.
Either kernel or userspace are doing something wrong. Though probably
you are right and we should end the request with EIO.


Thanks,
Bernd




^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v9 15/17] fuse: {io-uring} Prevent mount point hang on fuse-server termination
  2025-01-07  0:25 ` [PATCH v9 15/17] fuse: {io-uring} Prevent mount point hang on fuse-server termination Bernd Schubert
@ 2025-01-07 16:14   ` Luis Henriques
  2025-01-07 19:03     ` Bernd Schubert
  2025-01-17 11:52   ` Pavel Begunkov
  1 sibling, 1 reply; 45+ messages in thread
From: Luis Henriques @ 2025-01-07 16:14 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Miklos Szeredi, Jens Axboe, Pavel Begunkov, linux-fsdevel,
	io-uring, Joanne Koong, Josef Bacik, Amir Goldstein, Ming Lei,
	David Wei, bernd

On Tue, Jan 07 2025, Bernd Schubert wrote:

> When the fuse-server terminates while the fuse-client or kernel
> still has queued URING_CMDs, these commands retain references
> to the struct file used by the fuse connection. This prevents
> fuse_dev_release() from being invoked, resulting in a hung mount
> point.
>
> This patch addresses the issue by making queued URING_CMDs
> cancelable, allowing fuse_dev_release() to proceed as expected
> and preventing the mount point from hanging.
>
> Signed-off-by: Bernd Schubert <[email protected]>
> ---
>  fs/fuse/dev.c         |  2 ++
>  fs/fuse/dev_uring.c   | 71 ++++++++++++++++++++++++++++++++++++++++++++++++---
>  fs/fuse/dev_uring_i.h |  9 +++++++
>  3 files changed, 79 insertions(+), 3 deletions(-)
>
> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> index afafa960d4725d9b64b22f17bf09c846219396d6..1b593b23f7b8c319ec38c7e726dabf516965500e 100644
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -599,8 +599,10 @@ static int fuse_request_queue_background(struct fuse_req *req)
>  	}
>  	__set_bit(FR_ISREPLY, &req->flags);
>  
> +#ifdef CONFIG_FUSE_IO_URING
>  	if (fuse_uring_ready(fc))
>  		return fuse_request_queue_background_uring(fc, req);
> +#endif

I guess this should be moved to the previous patch.

Cheers,
-- 
Luís

>  
>  	spin_lock(&fc->bg_lock);
>  	if (likely(fc->connected)) {
> diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
> index 4e4385dff9315d25aa8c37a37f1e902aec3fcd20..cdd3917b365f4040c0f147648b09af9a41e2f49e 100644
> --- a/fs/fuse/dev_uring.c
> +++ b/fs/fuse/dev_uring.c
> @@ -153,6 +153,7 @@ void fuse_uring_destruct(struct fuse_conn *fc)
>  
>  	for (qid = 0; qid < ring->nr_queues; qid++) {
>  		struct fuse_ring_queue *queue = ring->queues[qid];
> +		struct fuse_ring_ent *ent, *next;
>  
>  		if (!queue)
>  			continue;
> @@ -162,6 +163,12 @@ void fuse_uring_destruct(struct fuse_conn *fc)
>  		WARN_ON(!list_empty(&queue->ent_commit_queue));
>  		WARN_ON(!list_empty(&queue->ent_in_userspace));
>  
> +		list_for_each_entry_safe(ent, next, &queue->ent_released,
> +					 list) {
> +			list_del_init(&ent->list);
> +			kfree(ent);
> +		}
> +
>  		kfree(queue->fpq.processing);
>  		kfree(queue);
>  		ring->queues[qid] = NULL;
> @@ -245,6 +252,7 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
>  	INIT_LIST_HEAD(&queue->ent_in_userspace);
>  	INIT_LIST_HEAD(&queue->fuse_req_queue);
>  	INIT_LIST_HEAD(&queue->fuse_req_bg_queue);
> +	INIT_LIST_HEAD(&queue->ent_released);
>  
>  	queue->fpq.processing = pq;
>  	fuse_pqueue_init(&queue->fpq);
> @@ -283,6 +291,7 @@ static void fuse_uring_stop_fuse_req_end(struct fuse_ring_ent *ent)
>   */
>  static void fuse_uring_entry_teardown(struct fuse_ring_ent *ent)
>  {
> +	struct fuse_ring_queue *queue = ent->queue;
>  	if (ent->cmd) {
>  		io_uring_cmd_done(ent->cmd, -ENOTCONN, 0, IO_URING_F_UNLOCKED);
>  		ent->cmd = NULL;
> @@ -291,8 +300,16 @@ static void fuse_uring_entry_teardown(struct fuse_ring_ent *ent)
>  	if (ent->fuse_req)
>  		fuse_uring_stop_fuse_req_end(ent);
>  
> -	list_del_init(&ent->list);
> -	kfree(ent);
> +	/*
> +	 * The entry must not be freed immediately, due to access of direct
> +	 * pointer access of entries through IO_URING_F_CANCEL - there is a risk
> +	 * of race between daemon termination (which triggers IO_URING_F_CANCEL
> +	 * and accesses entries without checking the list state first
> +	 */
> +	spin_lock(&queue->lock);
> +	list_move(&ent->list, &queue->ent_released);
> +	ent->state = FRRS_RELEASED;
> +	spin_unlock(&queue->lock);
>  }
>  
>  static void fuse_uring_stop_list_entries(struct list_head *head,
> @@ -312,6 +329,7 @@ static void fuse_uring_stop_list_entries(struct list_head *head,
>  			continue;
>  		}
>  
> +		ent->state = FRRS_TEARDOWN;
>  		list_move(&ent->list, &to_teardown);
>  	}
>  	spin_unlock(&queue->lock);
> @@ -426,6 +444,46 @@ void fuse_uring_stop_queues(struct fuse_ring *ring)
>  	}
>  }
>  
> +/*
> + * Handle IO_URING_F_CANCEL, typically should come on daemon termination.
> + *
> + * Releasing the last entry should trigger fuse_dev_release() if
> + * the daemon was terminated
> + */
> +static void fuse_uring_cancel(struct io_uring_cmd *cmd,
> +			      unsigned int issue_flags)
> +{
> +	struct fuse_ring_ent *ent = uring_cmd_to_ring_ent(cmd);
> +	struct fuse_ring_queue *queue;
> +	bool need_cmd_done = false;
> +
> +	/*
> +	 * direct access on ent - it must not be destructed as long as
> +	 * IO_URING_F_CANCEL might come up
> +	 */
> +	queue = ent->queue;
> +	spin_lock(&queue->lock);
> +	if (ent->state == FRRS_AVAILABLE) {
> +		ent->state = FRRS_USERSPACE;
> +		list_move(&ent->list, &queue->ent_in_userspace);
> +		need_cmd_done = true;
> +		ent->cmd = NULL;
> +	}
> +	spin_unlock(&queue->lock);
> +
> +	if (need_cmd_done) {
> +		/* no queue lock to avoid lock order issues */
> +		io_uring_cmd_done(cmd, -ENOTCONN, 0, issue_flags);
> +	}
> +}
> +
> +static void fuse_uring_prepare_cancel(struct io_uring_cmd *cmd, int issue_flags,
> +				      struct fuse_ring_ent *ring_ent)
> +{
> +	uring_cmd_set_ring_ent(cmd, ring_ent);
> +	io_uring_cmd_mark_cancelable(cmd, issue_flags);
> +}
> +
>  /*
>   * Checks for errors and stores it into the request
>   */
> @@ -836,6 +894,7 @@ static int fuse_uring_commit_fetch(struct io_uring_cmd *cmd, int issue_flags,
>  	spin_unlock(&queue->lock);
>  
>  	/* without the queue lock, as other locks are taken */
> +	fuse_uring_prepare_cancel(ring_ent->cmd, issue_flags, ring_ent);
>  	fuse_uring_commit(ring_ent, issue_flags);
>  
>  	/*
> @@ -885,6 +944,8 @@ static void fuse_uring_do_register(struct fuse_ring_ent *ring_ent,
>  	struct fuse_conn *fc = ring->fc;
>  	struct fuse_iqueue *fiq = &fc->iq;
>  
> +	fuse_uring_prepare_cancel(ring_ent->cmd, issue_flags, ring_ent);
> +
>  	spin_lock(&queue->lock);
>  	fuse_uring_ent_avail(ring_ent, queue);
>  	spin_unlock(&queue->lock);
> @@ -1041,6 +1102,11 @@ int __maybe_unused fuse_uring_cmd(struct io_uring_cmd *cmd,
>  		return -EOPNOTSUPP;
>  	}
>  
> +	if ((unlikely(issue_flags & IO_URING_F_CANCEL))) {
> +		fuse_uring_cancel(cmd, issue_flags);
> +		return 0;
> +	}
> +
>  	/* This extra SQE size holds struct fuse_uring_cmd_req */
>  	if (!(issue_flags & IO_URING_F_SQE128))
>  		return -EINVAL;
> @@ -1173,7 +1239,6 @@ void fuse_uring_queue_fuse_req(struct fuse_iqueue *fiq, struct fuse_req *req)
>  
>  	if (ent) {
>  		struct io_uring_cmd *cmd = ent->cmd;
> -
>  		err = -EIO;
>  		if (WARN_ON_ONCE(ent->state != FRRS_FUSE_REQ))
>  			goto err;
> diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
> index a4271f4e55aa9d2d9b42f3d2c4095887f9563351..af2b3de829949a778d60493f36588fea67a4ba85 100644
> --- a/fs/fuse/dev_uring_i.h
> +++ b/fs/fuse/dev_uring_i.h
> @@ -28,6 +28,12 @@ enum fuse_ring_req_state {
>  
>  	/* The ring entry is in or on the way to user space */
>  	FRRS_USERSPACE,
> +
> +	/* The ring entry is in teardown */
> +	FRRS_TEARDOWN,
> +
> +	/* The ring entry is released, but not freed yet */
> +	FRRS_RELEASED,
>  };
>  
>  /** A fuse ring entry, part of the ring queue */
> @@ -79,6 +85,9 @@ struct fuse_ring_queue {
>  	/* entries in userspace */
>  	struct list_head ent_in_userspace;
>  
> +	/* entries that are released */
> +	struct list_head ent_released;
> +
>  	/* fuse requests waiting for an entry slot */
>  	struct list_head fuse_req_queue;
>  
>
> -- 
> 2.43.0
>
>


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v9 10/17] fuse: Add io-uring sqe commit and fetch support
  2025-01-07 15:59     ` Bernd Schubert
@ 2025-01-07 16:21       ` Luis Henriques
  0 siblings, 0 replies; 45+ messages in thread
From: Luis Henriques @ 2025-01-07 16:21 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Bernd Schubert, Miklos Szeredi, Jens Axboe, Pavel Begunkov,
	linux-fsdevel, io-uring, Joanne Koong, Josef Bacik,
	Amir Goldstein, Ming Lei, David Wei

On Tue, Jan 07 2025, Bernd Schubert wrote:

> On 1/7/25 15:42, Luis Henriques wrote:
>> Hi,
>> On Tue, Jan 07 2025, Bernd Schubert wrote:
>> 
>>> This adds support for fuse request completion through ring SQEs
>>> (FUSE_URING_CMD_COMMIT_AND_FETCH handling). After committing
>>> the ring entry it becomes available for new fuse requests.
>>> Handling of requests through the ring (SQE/CQE handling)
>>> is complete now.
>>>
>>> Fuse request data are copied through the mmaped ring buffer,
>>> there is no support for any zero copy yet.
>> Please find below a few more comments.
>
> Thanks, I fixed all comments, except of retry in fuse_uring_next_fuse_req.

Awesome, thanks for taking those comments into account.

> [...]
>
>> Also, please note that I'm trying to understand this patchset (and the
>> whole fuse-over-io-uring thing), so most of my comments are minor nits.
>> And those that are not may simply be wrong!  I'm just noting them as I
>> navigate through the code.
>> (And by the way, thanks for this work!)
>> 
>>> +/*
>>> + * Get the next fuse req and send it
>>> + */
>>> +static void fuse_uring_next_fuse_req(struct fuse_ring_ent *ring_ent,
>>> +				     struct fuse_ring_queue *queue,
>>> +				     unsigned int issue_flags)
>>> +{
>>> +	int err;
>>> +	bool has_next;
>>> +
>>> +retry:
>>> +	spin_lock(&queue->lock);
>>> +	fuse_uring_ent_avail(ring_ent, queue);
>>> +	has_next = fuse_uring_ent_assign_req(ring_ent);
>>> +	spin_unlock(&queue->lock);
>>> +
>>> +	if (has_next) {
>>> +		err = fuse_uring_send_next_to_ring(ring_ent, issue_flags);
>>> +		if (err)
>>> +			goto retry;
>> I wonder whether this is safe.  Maybe this is *obviously* safe, but I'm
>> still trying to understand this patchset; so, for me, it is not :-)
>> Would it be worth it trying to limit the maximum number of retries?
>
> No, we cannot limit retries. Let's do a simple example with one ring
> entry and also just one queue. Multiple applications create fuse
> requests. The first application fills the only available ring entry
> and submits it, the others just get queued in queue->fuse_req_queue.
> After that the application just waits request_wait_answer()
>
> On commit of the first request the ring task has to take the next
> request from queue->fuse_req_queue - if something fails with that
> request it has to complete it and proceed to the next request.
> If we would introduce a max-retries here, it would put the ring entry
> on hold (FRRS_AVAILABLE) and until another application comes, it would
> forever wait there. The applications waiting in request_wait_answer
> would never complete either.

Oh! OK, I see it now.  I totally misunderstood it then.  Thanks for taking
your taking explaining it.

Cheers,
-- 
Luís

>>> +	}
>>> +}
>>> +
>>> +static int fuse_ring_ent_set_commit(struct fuse_ring_ent *ent)
>>> +{
>>> +	struct fuse_ring_queue *queue = ent->queue;
>>> +
>>> +	lockdep_assert_held(&queue->lock);
>>> +
>>> +	if (WARN_ON_ONCE(ent->state != FRRS_USERSPACE))
>>> +		return -EIO;
>>> +
>>> +	ent->state = FRRS_COMMIT;
>>> +	list_move(&ent->list, &queue->ent_commit_queue);
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +/* FUSE_URING_CMD_COMMIT_AND_FETCH handler */
>>> +static int fuse_uring_commit_fetch(struct io_uring_cmd *cmd, int issue_flags,
>>> +				   struct fuse_conn *fc)
>>> +{
>>> +	const struct fuse_uring_cmd_req *cmd_req = io_uring_sqe_cmd(cmd->sqe);
>>> +	struct fuse_ring_ent *ring_ent;
>>> +	int err;
>>> +	struct fuse_ring *ring = fc->ring;
>>> +	struct fuse_ring_queue *queue;
>>> +	uint64_t commit_id = READ_ONCE(cmd_req->commit_id);
>>> +	unsigned int qid = READ_ONCE(cmd_req->qid);
>>> +	struct fuse_pqueue *fpq;
>>> +	struct fuse_req *req;
>>> +
>>> +	err = -ENOTCONN;
>>> +	if (!ring)
>>> +		return err;
>>> +
>>> +	if (qid >= ring->nr_queues)
>>> +		return -EINVAL;
>>> +
>>> +	queue = ring->queues[qid];
>>> +	if (!queue)
>>> +		return err;
>>> +	fpq = &queue->fpq;
>>> +
>>> +	spin_lock(&queue->lock);
>>> +	/* Find a request based on the unique ID of the fuse request
>>> +	 * This should get revised, as it needs a hash calculation and list
>>> +	 * search. And full struct fuse_pqueue is needed (memory overhead).
>>> +	 * As well as the link from req to ring_ent.
>>> +	 */
>>> +	req = fuse_request_find(fpq, commit_id);
>>> +	err = -ENOENT;
>>> +	if (!req) {
>>> +		pr_info("qid=%d commit_id %llu not found\n", queue->qid,
>>> +			commit_id);
>>> +		spin_unlock(&queue->lock);
>>> +		return err;
>>> +	}
>>> +	list_del_init(&req->list);
>>> +	ring_ent = req->ring_entry;
>>> +	req->ring_entry = NULL;
>>> +
>>> +	err = fuse_ring_ent_set_commit(ring_ent);
>>> +	if (err != 0) {
>> I'm probably missing something, but because we removed 'req' from the list
>> above, aren't we leaking it if we get an error here?
>
> Hmm, yeah, that is debatable. We basically have a grave error here.
> Either kernel or userspace are doing something wrong. Though probably
> you are right and we should end the request with EIO.
>
>
> Thanks,
> Bernd
>
>
>


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v9 13/17] fuse: Allow to queue fg requests through io-uring
  2025-01-07 15:54   ` Luis Henriques
@ 2025-01-07 18:59     ` Bernd Schubert
  2025-01-07 21:25       ` Luis Henriques
  0 siblings, 1 reply; 45+ messages in thread
From: Bernd Schubert @ 2025-01-07 18:59 UTC (permalink / raw)
  To: Luis Henriques, Bernd Schubert
  Cc: Miklos Szeredi, Jens Axboe, Pavel Begunkov, linux-fsdevel,
	io-uring, Joanne Koong, Josef Bacik, Amir Goldstein, Ming Lei,
	David Wei



On 1/7/25 16:54, Luis Henriques wrote:

[...]

>> @@ -785,10 +830,22 @@ static void fuse_uring_do_register(struct fuse_ring_ent *ring_ent,
>>   				   unsigned int issue_flags)
>>   {
>>   	struct fuse_ring_queue *queue = ring_ent->queue;
>> +	struct fuse_ring *ring = queue->ring;
>> +	struct fuse_conn *fc = ring->fc;
>> +	struct fuse_iqueue *fiq = &fc->iq;
>>   
>>   	spin_lock(&queue->lock);
>>   	fuse_uring_ent_avail(ring_ent, queue);
>>   	spin_unlock(&queue->lock);
>> +
>> +	if (!ring->ready) {
>> +		bool ready = is_ring_ready(ring, queue->qid);
>> +
>> +		if (ready) {
>> +			WRITE_ONCE(ring->ready, true);
>> +			fiq->ops = &fuse_io_uring_ops;
> 
> Shouldn't we be taking the fiq->lock to protect the above operation?

I switched the order and changed it to WRITE_ONCE. fiq->lock would
require that doing the operations would also hold lock.
Also see "[PATCH v9 16/17] fuse: block request allocation until",
there should be no races anyone.

> 
>> +		}
>> +	}
>>   }
>>   
>>   /*
>> @@ -979,3 +1036,119 @@ int __maybe_unused fuse_uring_cmd(struct io_uring_cmd *cmd,
>>   
>>   	return -EIOCBQUEUED;
>>   }
>> +
>> +/*
>> + * This prepares and sends the ring request in fuse-uring task context.
>> + * User buffers are not mapped yet - the application does not have permission
>> + * to write to it - this has to be executed in ring task context.
>> + */
>> +static void
>> +fuse_uring_send_req_in_task(struct io_uring_cmd *cmd,
>> +			    unsigned int issue_flags)
>> +{
>> +	struct fuse_ring_ent *ent = uring_cmd_to_ring_ent(cmd);
>> +	struct fuse_ring_queue *queue = ent->queue;
>> +	int err;
>> +
>> +	if (unlikely(issue_flags & IO_URING_F_TASK_DEAD)) {
>> +		err = -ECANCELED;
>> +		goto terminating;
>> +	}
>> +
>> +	err = fuse_uring_prepare_send(ent);
>> +	if (err)
>> +		goto err;
> 
> Suggestion: simplify this function flow.  Something like:
> 
> 	int err = 0;
> 
> 	if (unlikely(issue_flags & IO_URING_F_TASK_DEAD))
> 		err = -ECANCELED;
> 	else if (fuse_uring_prepare_send(ent)) {
> 		fuse_uring_next_fuse_req(ent, queue, issue_flags);
> 		return;
> 	}
> 	spin_lock(&queue->lock);
>          [...]

That makes it look like fuse_uring_prepare_send is not an
error, but expected. How about like this?

static void
fuse_uring_send_req_in_task(struct io_uring_cmd *cmd,
			    unsigned int issue_flags)
{
	struct fuse_ring_ent *ent = uring_cmd_to_ring_ent(cmd);
	struct fuse_ring_queue *queue = ent->queue;
	int err;

	if (!(issue_flags & IO_URING_F_TASK_DEAD)) {
		err = fuse_uring_prepare_send(ent);
		if (err) {
			fuse_uring_next_fuse_req(ent, queue, issue_flags);
			return;
		}
	} else {
		err = -ECANCELED;
	}

	spin_lock(&queue->lock);
	ent->state = FRRS_USERSPACE;
	list_move(&ent->list, &queue->ent_in_userspace);
	spin_unlock(&queue->lock);

	io_uring_cmd_done(cmd, err, 0, issue_flags);
	ent->cmd = NULL;
}



Thanks,
Bernd

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v9 15/17] fuse: {io-uring} Prevent mount point hang on fuse-server termination
  2025-01-07 16:14   ` Luis Henriques
@ 2025-01-07 19:03     ` Bernd Schubert
  0 siblings, 0 replies; 45+ messages in thread
From: Bernd Schubert @ 2025-01-07 19:03 UTC (permalink / raw)
  To: Luis Henriques, Bernd Schubert
  Cc: Miklos Szeredi, Jens Axboe, Pavel Begunkov, linux-fsdevel,
	io-uring, Joanne Koong, Josef Bacik, Amir Goldstein, Ming Lei,
	David Wei



On 1/7/25 17:14, Luis Henriques wrote:
> On Tue, Jan 07 2025, Bernd Schubert wrote:
> 
>> When the fuse-server terminates while the fuse-client or kernel
>> still has queued URING_CMDs, these commands retain references
>> to the struct file used by the fuse connection. This prevents
>> fuse_dev_release() from being invoked, resulting in a hung mount
>> point.
>>
>> This patch addresses the issue by making queued URING_CMDs
>> cancelable, allowing fuse_dev_release() to proceed as expected
>> and preventing the mount point from hanging.
>>
>> Signed-off-by: Bernd Schubert <[email protected]>
>> ---
>>   fs/fuse/dev.c         |  2 ++
>>   fs/fuse/dev_uring.c   | 71 ++++++++++++++++++++++++++++++++++++++++++++++++---
>>   fs/fuse/dev_uring_i.h |  9 +++++++
>>   3 files changed, 79 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
>> index afafa960d4725d9b64b22f17bf09c846219396d6..1b593b23f7b8c319ec38c7e726dabf516965500e 100644
>> --- a/fs/fuse/dev.c
>> +++ b/fs/fuse/dev.c
>> @@ -599,8 +599,10 @@ static int fuse_request_queue_background(struct fuse_req *req)
>>   	}
>>   	__set_bit(FR_ISREPLY, &req->flags);
>>   
>> +#ifdef CONFIG_FUSE_IO_URING
>>   	if (fuse_uring_ready(fc))
>>   		return fuse_request_queue_background_uring(fc, req);
>> +#endif
> 

Oh sorry, I had tried to remove the ifdef in v9, but didn't succeed
and added back in the wrong patch.


Thanks,
Bernd

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v9 13/17] fuse: Allow to queue fg requests through io-uring
  2025-01-07 18:59     ` Bernd Schubert
@ 2025-01-07 21:25       ` Luis Henriques
  0 siblings, 0 replies; 45+ messages in thread
From: Luis Henriques @ 2025-01-07 21:25 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Luis Henriques, Bernd Schubert, Miklos Szeredi, Jens Axboe,
	Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei

On Tue, Jan 07 2025, Bernd Schubert wrote:

> On 1/7/25 16:54, Luis Henriques wrote:
>
> [...]
>
>>> @@ -785,10 +830,22 @@ static void fuse_uring_do_register(struct fuse_ring_ent *ring_ent,
>>>   				   unsigned int issue_flags)
>>>   {
>>>   	struct fuse_ring_queue *queue = ring_ent->queue;
>>> +	struct fuse_ring *ring = queue->ring;
>>> +	struct fuse_conn *fc = ring->fc;
>>> +	struct fuse_iqueue *fiq = &fc->iq;
>>>     	spin_lock(&queue->lock);
>>>   	fuse_uring_ent_avail(ring_ent, queue);
>>>   	spin_unlock(&queue->lock);
>>> +
>>> +	if (!ring->ready) {
>>> +		bool ready = is_ring_ready(ring, queue->qid);
>>> +
>>> +		if (ready) {
>>> +			WRITE_ONCE(ring->ready, true);
>>> +			fiq->ops = &fuse_io_uring_ops;
>> Shouldn't we be taking the fiq->lock to protect the above operation?
>
> I switched the order and changed it to WRITE_ONCE. fiq->lock would
> require that doing the operations would also hold lock.
> Also see "[PATCH v9 16/17] fuse: block request allocation until",
> there should be no races anyone.

OK, great.  I still need to go read the code a few more times, I guess.
Thank you for your help understanding this code, Bernd.

Cheers,
-- 
Luís

>> 
>>> +		}
>>> +	}
>>>   }
>>>     /*
>>> @@ -979,3 +1036,119 @@ int __maybe_unused fuse_uring_cmd(struct io_uring_cmd *cmd,
>>>     	return -EIOCBQUEUED;
>>>   }
>>> +
>>> +/*
>>> + * This prepares and sends the ring request in fuse-uring task context.
>>> + * User buffers are not mapped yet - the application does not have permission
>>> + * to write to it - this has to be executed in ring task context.
>>> + */
>>> +static void
>>> +fuse_uring_send_req_in_task(struct io_uring_cmd *cmd,
>>> +			    unsigned int issue_flags)
>>> +{
>>> +	struct fuse_ring_ent *ent = uring_cmd_to_ring_ent(cmd);
>>> +	struct fuse_ring_queue *queue = ent->queue;
>>> +	int err;
>>> +
>>> +	if (unlikely(issue_flags & IO_URING_F_TASK_DEAD)) {
>>> +		err = -ECANCELED;
>>> +		goto terminating;
>>> +	}
>>> +
>>> +	err = fuse_uring_prepare_send(ent);
>>> +	if (err)
>>> +		goto err;
>> Suggestion: simplify this function flow.  Something like:
>> 	int err = 0;
>> 	if (unlikely(issue_flags & IO_URING_F_TASK_DEAD))
>> 		err = -ECANCELED;
>> 	else if (fuse_uring_prepare_send(ent)) {
>> 		fuse_uring_next_fuse_req(ent, queue, issue_flags);
>> 		return;
>> 	}
>> 	spin_lock(&queue->lock);
>>          [...]
>
> That makes it look like fuse_uring_prepare_send is not an
> error, but expected. How about like this?
>
> static void
> fuse_uring_send_req_in_task(struct io_uring_cmd *cmd,
> 			    unsigned int issue_flags)
> {
> 	struct fuse_ring_ent *ent = uring_cmd_to_ring_ent(cmd);
> 	struct fuse_ring_queue *queue = ent->queue;
> 	int err;
>
> 	if (!(issue_flags & IO_URING_F_TASK_DEAD)) {
> 		err = fuse_uring_prepare_send(ent);
> 		if (err) {
> 			fuse_uring_next_fuse_req(ent, queue, issue_flags);
> 			return;
> 		}
> 	} else {
> 		err = -ECANCELED;
> 	}
>
> 	spin_lock(&queue->lock);
> 	ent->state = FRRS_USERSPACE;
> 	list_move(&ent->list, &queue->ent_in_userspace);
> 	spin_unlock(&queue->lock);
>
> 	io_uring_cmd_done(cmd, err, 0, issue_flags);
> 	ent->cmd = NULL;
> }
>
>
>
> Thanks,
> Bernd

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v9 08/17] fuse: Add fuse-io-uring handling into fuse_copy
  2025-01-07  0:25 ` [PATCH v9 08/17] fuse: Add fuse-io-uring handling into fuse_copy Bernd Schubert
@ 2025-01-10 22:18   ` Joanne Koong
  0 siblings, 0 replies; 45+ messages in thread
From: Joanne Koong @ 2025-01-10 22:18 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Miklos Szeredi, Jens Axboe, Pavel Begunkov, linux-fsdevel,
	io-uring, Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd

On Mon, Jan 6, 2025 at 4:25 PM Bernd Schubert <[email protected]> wrote:
>
> Add special fuse-io-uring into the fuse argument
> copy handler.
>
> Signed-off-by: Bernd Schubert <[email protected]>

Reviewed-by: Joanne Koong <[email protected]>

> ---
>  fs/fuse/dev.c        | 12 +++++++++++-
>  fs/fuse/fuse_dev_i.h |  4 ++++
>  2 files changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> index 6ee7e28a84c80a3e7c8dc933986c0388371ff6cd..8b03a540e151daa1f62986aa79030e9e7a456059 100644
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -786,6 +786,9 @@ static int fuse_copy_do(struct fuse_copy_state *cs, void **val, unsigned *size)
>         *size -= ncpy;
>         cs->len -= ncpy;
>         cs->offset += ncpy;
> +       if (cs->is_uring)
> +               cs->ring.copied_sz += ncpy;
> +
>         return ncpy;
>  }
>
> @@ -1922,7 +1925,14 @@ static struct fuse_req *request_find(struct fuse_pqueue *fpq, u64 unique)
>  int fuse_copy_out_args(struct fuse_copy_state *cs, struct fuse_args *args,
>                        unsigned nbytes)
>  {
> -       unsigned reqsize = sizeof(struct fuse_out_header);
> +
> +       unsigned int reqsize = 0;
> +
> +       /*
> +        * Uring has all headers separated from args - args is payload only
> +        */
> +       if (!cs->is_uring)
> +               reqsize = sizeof(struct fuse_out_header);
>
>         reqsize += fuse_len_args(args->out_numargs, args->out_args);
>
> diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
> index 21eb1bdb492d04f0a406d25bb8d300b34244dce2..4a8a4feb2df53fb84938a6711e6bcfd0f1b9f615 100644
> --- a/fs/fuse/fuse_dev_i.h
> +++ b/fs/fuse/fuse_dev_i.h
> @@ -27,6 +27,10 @@ struct fuse_copy_state {
>         unsigned int len;
>         unsigned int offset;
>         unsigned int move_pages:1;
> +       unsigned int is_uring:1;
> +       struct {
> +               unsigned int copied_sz; /* copied size into the user buffer */
> +       } ring;
>  };
>
>  static inline struct fuse_dev *fuse_get_dev(struct file *file)
>
> --
> 2.43.0
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v9 10/17] fuse: Add io-uring sqe commit and fetch support
  2025-01-07  0:25 ` [PATCH v9 10/17] fuse: Add io-uring sqe commit and fetch support Bernd Schubert
  2025-01-07 14:42   ` Luis Henriques
@ 2025-01-13 22:44   ` Joanne Koong
  2025-01-20  0:33     ` Bernd Schubert
  2025-01-17 11:18   ` Pavel Begunkov
  2 siblings, 1 reply; 45+ messages in thread
From: Joanne Koong @ 2025-01-13 22:44 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Miklos Szeredi, Jens Axboe, Pavel Begunkov, linux-fsdevel,
	io-uring, Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd

On Mon, Jan 6, 2025 at 4:25 PM Bernd Schubert <[email protected]> wrote:
>
> This adds support for fuse request completion through ring SQEs
> (FUSE_URING_CMD_COMMIT_AND_FETCH handling). After committing
> the ring entry it becomes available for new fuse requests.
> Handling of requests through the ring (SQE/CQE handling)
> is complete now.
>
> Fuse request data are copied through the mmaped ring buffer,
> there is no support for any zero copy yet.
>
> Signed-off-by: Bernd Schubert <[email protected]>
> ---
>  fs/fuse/dev_uring.c   | 450 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/fuse/dev_uring_i.h |  12 ++
>  fs/fuse/fuse_i.h      |   4 +
>  3 files changed, 466 insertions(+)
>
> diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
> index b44ba4033615e01041313c040035b6da6af0ee17..f44e66a7ea577390da87e9ac7d118a9416898c28 100644
> --- a/fs/fuse/dev_uring.c
> +++ b/fs/fuse/dev_uring.c
> @@ -26,6 +26,19 @@ bool fuse_uring_enabled(void)
>         return enable_uring;
>  }
>
> +static void fuse_uring_req_end(struct fuse_ring_ent *ring_ent, bool set_err,
> +                              int error)
> +{
> +       struct fuse_req *req = ring_ent->fuse_req;
> +
> +       if (set_err)
> +               req->out.h.error = error;

I think we could get away with not having the "bool set_err" as an
argument if we do "if (error)" directly. AFAICT, we can use the value
of error directly since  it always returns zero on success and any
non-zero value is considered an error.

> +
> +       clear_bit(FR_SENT, &req->flags);
> +       fuse_request_end(ring_ent->fuse_req);
> +       ring_ent->fuse_req = NULL;
> +}
> +
>  void fuse_uring_destruct(struct fuse_conn *fc)
>  {
>         struct fuse_ring *ring = fc->ring;
> @@ -41,8 +54,11 @@ void fuse_uring_destruct(struct fuse_conn *fc)
>                         continue;
>
>                 WARN_ON(!list_empty(&queue->ent_avail_queue));
> +               WARN_ON(!list_empty(&queue->ent_w_req_queue));
>                 WARN_ON(!list_empty(&queue->ent_commit_queue));
> +               WARN_ON(!list_empty(&queue->ent_in_userspace));
>
> +               kfree(queue->fpq.processing);
>                 kfree(queue);
>                 ring->queues[qid] = NULL;
>         }
> @@ -101,20 +117,34 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
>  {
>         struct fuse_conn *fc = ring->fc;
>         struct fuse_ring_queue *queue;
> +       struct list_head *pq;
>
>         queue = kzalloc(sizeof(*queue), GFP_KERNEL_ACCOUNT);
>         if (!queue)
>                 return NULL;
> +       pq = kcalloc(FUSE_PQ_HASH_SIZE, sizeof(struct list_head), GFP_KERNEL);
> +       if (!pq) {
> +               kfree(queue);
> +               return NULL;
> +       }
> +
>         queue->qid = qid;
>         queue->ring = ring;
>         spin_lock_init(&queue->lock);
>
>         INIT_LIST_HEAD(&queue->ent_avail_queue);
>         INIT_LIST_HEAD(&queue->ent_commit_queue);
> +       INIT_LIST_HEAD(&queue->ent_w_req_queue);
> +       INIT_LIST_HEAD(&queue->ent_in_userspace);
> +       INIT_LIST_HEAD(&queue->fuse_req_queue);
> +
> +       queue->fpq.processing = pq;
> +       fuse_pqueue_init(&queue->fpq);
>
>         spin_lock(&fc->lock);
>         if (ring->queues[qid]) {
>                 spin_unlock(&fc->lock);
> +               kfree(queue->fpq.processing);
>                 kfree(queue);
>                 return ring->queues[qid];
>         }
> @@ -128,6 +158,214 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
>         return queue;
>  }
>
> +/*
> + * Checks for errors and stores it into the request
> + */
> +static int fuse_uring_out_header_has_err(struct fuse_out_header *oh,
> +                                        struct fuse_req *req,
> +                                        struct fuse_conn *fc)
> +{
> +       int err;
> +
> +       err = -EINVAL;
> +       if (oh->unique == 0) {
> +               /* Not supportd through io-uring yet */
> +               pr_warn_once("notify through fuse-io-uring not supported\n");
> +               goto seterr;
> +       }
> +
> +       err = -EINVAL;
> +       if (oh->error <= -ERESTARTSYS || oh->error > 0)
> +               goto seterr;
> +
> +       if (oh->error) {
> +               err = oh->error;
> +               goto err;
> +       }
> +
> +       err = -ENOENT;
> +       if ((oh->unique & ~FUSE_INT_REQ_BIT) != req->in.h.unique) {
> +               pr_warn_ratelimited("unique mismatch, expected: %llu got %llu\n",
> +                                   req->in.h.unique,
> +                                   oh->unique & ~FUSE_INT_REQ_BIT);
> +               goto seterr;
> +       }
> +
> +       /*
> +        * Is it an interrupt reply ID?
> +        * XXX: Not supported through fuse-io-uring yet, it should not even
> +        *      find the request - should not happen.
> +        */
> +       WARN_ON_ONCE(oh->unique & FUSE_INT_REQ_BIT);
> +
> +       return 0;
> +
> +seterr:
> +       oh->error = err;
> +err:
> +       return err;
> +}
> +
> +static int fuse_uring_copy_from_ring(struct fuse_ring *ring,
> +                                    struct fuse_req *req,
> +                                    struct fuse_ring_ent *ent)
> +{
> +       struct fuse_copy_state cs;
> +       struct fuse_args *args = req->args;
> +       struct iov_iter iter;
> +       int err, res;
> +       struct fuse_uring_ent_in_out ring_in_out;
> +
> +       res = copy_from_user(&ring_in_out, &ent->headers->ring_ent_in_out,
> +                            sizeof(ring_in_out));
> +       if (res)
> +               return -EFAULT;
> +
> +       err = import_ubuf(ITER_SOURCE, ent->payload, ring->max_payload_sz,
> +                         &iter);
> +       if (err)
> +               return err;
> +
> +       fuse_copy_init(&cs, 0, &iter);
> +       cs.is_uring = 1;
> +       cs.req = req;
> +
> +       return fuse_copy_out_args(&cs, args, ring_in_out.payload_sz);
> +}
> +
> + /*
> +  * Copy data from the req to the ring buffer
> +  */
> +static int fuse_uring_copy_to_ring(struct fuse_ring *ring, struct fuse_req *req,
> +                                  struct fuse_ring_ent *ent)
> +{
> +       struct fuse_copy_state cs;
> +       struct fuse_args *args = req->args;
> +       struct fuse_in_arg *in_args = args->in_args;
> +       int num_args = args->in_numargs;
> +       int err, res;
> +       struct iov_iter iter;
> +       struct fuse_uring_ent_in_out ent_in_out = {
> +               .flags = 0,
> +               .commit_id = ent->commit_id,
> +       };
> +
> +       if (WARN_ON(ent_in_out.commit_id == 0))
> +               return -EINVAL;
> +
> +       err = import_ubuf(ITER_DEST, ent->payload, ring->max_payload_sz, &iter);
> +       if (err) {
> +               pr_info_ratelimited("fuse: Import of user buffer failed\n");
> +               return err;
> +       }
> +
> +       fuse_copy_init(&cs, 1, &iter);
> +       cs.is_uring = 1;
> +       cs.req = req;
> +
> +       if (num_args > 0) {
> +               /*
> +                * Expectation is that the first argument is the per op header.
> +                * Some op code have that as zero.
> +                */
> +               if (args->in_args[0].size > 0) {
> +                       res = copy_to_user(&ent->headers->op_in, in_args->value,
> +                                          in_args->size);
> +                       err = res > 0 ? -EFAULT : res;
> +                       if (err) {
> +                               pr_info_ratelimited(
> +                                       "Copying the header failed.\n");
> +                               return err;
> +                       }
> +               }
> +               in_args++;
> +               num_args--;
> +       }
> +
> +       /* copy the payload */
> +       err = fuse_copy_args(&cs, num_args, args->in_pages,
> +                            (struct fuse_arg *)in_args, 0);
> +       if (err) {
> +               pr_info_ratelimited("%s fuse_copy_args failed\n", __func__);
> +               return err;
> +       }
> +
> +       ent_in_out.payload_sz = cs.ring.copied_sz;
> +       res = copy_to_user(&ent->headers->ring_ent_in_out, &ent_in_out,
> +                          sizeof(ent_in_out));
> +       err = res > 0 ? -EFAULT : res;
> +       if (err)
> +               return err;
> +
> +       return 0;
> +}
> +
> +static int
> +fuse_uring_prepare_send(struct fuse_ring_ent *ring_ent)
> +{
> +       struct fuse_ring_queue *queue = ring_ent->queue;
> +       struct fuse_ring *ring = queue->ring;
> +       struct fuse_req *req = ring_ent->fuse_req;
> +       int err, res;
> +
> +       err = -EIO;
> +       if (WARN_ON(ring_ent->state != FRRS_FUSE_REQ)) {
> +               pr_err("qid=%d ring-req=%p invalid state %d on send\n",
> +                      queue->qid, ring_ent, ring_ent->state);
> +               err = -EIO;
> +               goto err;
> +       }
> +
> +       /* copy the request */
> +       err = fuse_uring_copy_to_ring(ring, req, ring_ent);
> +       if (unlikely(err)) {
> +               pr_info_ratelimited("Copy to ring failed: %d\n", err);
> +               goto err;
> +       }
> +
> +       /* copy fuse_in_header */
> +       res = copy_to_user(&ring_ent->headers->in_out, &req->in.h,
> +                          sizeof(req->in.h));
> +       err = res > 0 ? -EFAULT : res;
> +       if (err)
> +               goto err;
> +
> +       set_bit(FR_SENT, &req->flags);
> +       return 0;
> +
> +err:
> +       fuse_uring_req_end(ring_ent, true, err);
> +       return err;
> +}
> +
> +/*
> + * Write data to the ring buffer and send the request to userspace,
> + * userspace will read it
> + * This is comparable with classical read(/dev/fuse)
> + */
> +static int fuse_uring_send_next_to_ring(struct fuse_ring_ent *ring_ent,
> +                                       unsigned int issue_flags)
> +{
> +       int err = 0;
> +       struct fuse_ring_queue *queue = ring_ent->queue;
> +
> +       err = fuse_uring_prepare_send(ring_ent);
> +       if (err)
> +               goto err;
> +
> +       spin_lock(&queue->lock);
> +       ring_ent->state = FRRS_USERSPACE;
> +       list_move(&ring_ent->list, &queue->ent_in_userspace);
> +       spin_unlock(&queue->lock);
> +
> +       io_uring_cmd_done(ring_ent->cmd, 0, 0, issue_flags);
> +       ring_ent->cmd = NULL;
> +       return 0;
> +
> +err:
> +       return err;
> +}
> +
>  /*
>   * Make a ring entry available for fuse_req assignment
>   */
> @@ -138,6 +376,210 @@ static void fuse_uring_ent_avail(struct fuse_ring_ent *ring_ent,
>         ring_ent->state = FRRS_AVAILABLE;
>  }
>
> +/* Used to find the request on SQE commit */
> +static void fuse_uring_add_to_pq(struct fuse_ring_ent *ring_ent,
> +                                struct fuse_req *req)
> +{
> +       struct fuse_ring_queue *queue = ring_ent->queue;
> +       struct fuse_pqueue *fpq = &queue->fpq;
> +       unsigned int hash;
> +
> +       /* commit_id is the unique id of the request */
> +       ring_ent->commit_id = req->in.h.unique;
> +
> +       req->ring_entry = ring_ent;
> +       hash = fuse_req_hash(ring_ent->commit_id);
> +       list_move_tail(&req->list, &fpq->processing[hash]);
> +}
> +
> +/*
> + * Assign a fuse queue entry to the given entry
> + */
> +static void fuse_uring_add_req_to_ring_ent(struct fuse_ring_ent *ring_ent,
> +                                          struct fuse_req *req)
> +{
> +       struct fuse_ring_queue *queue = ring_ent->queue;
> +
> +       lockdep_assert_held(&queue->lock);
> +
> +       if (WARN_ON_ONCE(ring_ent->state != FRRS_AVAILABLE &&
> +                        ring_ent->state != FRRS_COMMIT)) {
> +               pr_warn("%s qid=%d state=%d\n", __func__, ring_ent->queue->qid,
> +                       ring_ent->state);
> +       }
> +       list_del_init(&req->list);
> +       clear_bit(FR_PENDING, &req->flags);
> +       ring_ent->fuse_req = req;
> +       ring_ent->state = FRRS_FUSE_REQ;
> +       list_move(&ring_ent->list, &queue->ent_w_req_queue);
> +       fuse_uring_add_to_pq(ring_ent, req);
> +}
> +
> +/*
> + * Release the ring entry and fetch the next fuse request if available
> + *
> + * @return true if a new request has been fetched
> + */
> +static bool fuse_uring_ent_assign_req(struct fuse_ring_ent *ring_ent)
> +       __must_hold(&queue->lock)
> +{
> +       struct fuse_req *req;
> +       struct fuse_ring_queue *queue = ring_ent->queue;
> +       struct list_head *req_queue = &queue->fuse_req_queue;
> +
> +       lockdep_assert_held(&queue->lock);
> +
> +       /* get and assign the next entry while it is still holding the lock */
> +       req = list_first_entry_or_null(req_queue, struct fuse_req, list);
> +       if (req) {
> +               fuse_uring_add_req_to_ring_ent(ring_ent, req);
> +               return true;
> +       }
> +
> +       return false;
> +}
> +
> +/*
> + * Read data from the ring buffer, which user space has written to
> + * This is comparible with handling of classical write(/dev/fuse).
> + * Also make the ring request available again for new fuse requests.
> + */
> +static void fuse_uring_commit(struct fuse_ring_ent *ring_ent,
> +                             unsigned int issue_flags)
> +{
> +       struct fuse_ring *ring = ring_ent->queue->ring;
> +       struct fuse_conn *fc = ring->fc;
> +       struct fuse_req *req = ring_ent->fuse_req;
> +       ssize_t err = 0;
> +       bool set_err = false;
> +
> +       err = copy_from_user(&req->out.h, &ring_ent->headers->in_out,
> +                            sizeof(req->out.h));
> +       if (err) {
> +               req->out.h.error = err;
> +               goto out;
> +       }
> +
> +       err = fuse_uring_out_header_has_err(&req->out.h, req, fc);
> +       if (err) {
> +               /* req->out.h.error already set */
> +               goto out;
> +       }
> +
> +       err = fuse_uring_copy_from_ring(ring, req, ring_ent);
> +       if (err)
> +               set_err = true;
> +
> +out:
> +       fuse_uring_req_end(ring_ent, set_err, err);
> +}
> +
> +/*
> + * Get the next fuse req and send it
> + */
> +static void fuse_uring_next_fuse_req(struct fuse_ring_ent *ring_ent,
> +                                    struct fuse_ring_queue *queue,
> +                                    unsigned int issue_flags)
> +{
> +       int err;
> +       bool has_next;
> +
> +retry:
> +       spin_lock(&queue->lock);
> +       fuse_uring_ent_avail(ring_ent, queue);
> +       has_next = fuse_uring_ent_assign_req(ring_ent);
> +       spin_unlock(&queue->lock);
> +
> +       if (has_next) {
> +               err = fuse_uring_send_next_to_ring(ring_ent, issue_flags);
> +               if (err)
> +                       goto retry;
> +       }
> +}
> +
> +static int fuse_ring_ent_set_commit(struct fuse_ring_ent *ent)
> +{
> +       struct fuse_ring_queue *queue = ent->queue;
> +
> +       lockdep_assert_held(&queue->lock);
> +
> +       if (WARN_ON_ONCE(ent->state != FRRS_USERSPACE))
> +               return -EIO;
> +
> +       ent->state = FRRS_COMMIT;
> +       list_move(&ent->list, &queue->ent_commit_queue);
> +
> +       return 0;
> +}
> +
> +/* FUSE_URING_CMD_COMMIT_AND_FETCH handler */
> +static int fuse_uring_commit_fetch(struct io_uring_cmd *cmd, int issue_flags,
> +                                  struct fuse_conn *fc)
> +{
> +       const struct fuse_uring_cmd_req *cmd_req = io_uring_sqe_cmd(cmd->sqe);
> +       struct fuse_ring_ent *ring_ent;
> +       int err;
> +       struct fuse_ring *ring = fc->ring;
> +       struct fuse_ring_queue *queue;
> +       uint64_t commit_id = READ_ONCE(cmd_req->commit_id);
> +       unsigned int qid = READ_ONCE(cmd_req->qid);
> +       struct fuse_pqueue *fpq;
> +       struct fuse_req *req;
> +
> +       err = -ENOTCONN;
> +       if (!ring)
> +               return err;
> +
> +       if (qid >= ring->nr_queues)
> +               return -EINVAL;
> +
> +       queue = ring->queues[qid];
> +       if (!queue)
> +               return err;
> +       fpq = &queue->fpq;
> +
> +       spin_lock(&queue->lock);
> +       /* Find a request based on the unique ID of the fuse request
> +        * This should get revised, as it needs a hash calculation and list
> +        * search. And full struct fuse_pqueue is needed (memory overhead).
> +        * As well as the link from req to ring_ent.
> +        */

imo, the hash calculation and list search seems ok. I can't think of a
more optimal way of doing it. Instead of using the full struct
fuse_pqueue, I think we could just have the "struct list_head
*processing" defined inside "struct fuse_ring_queue" and change
fuse_request_find() to take in a list_head. I don't think we need a
dedicated spinlock for the list either. We can just reuse queue->lock,
as that's (currently) always held already when the processing list is
accessed.


> +       req = fuse_request_find(fpq, commit_id);
> +       err = -ENOENT;
> +       if (!req) {
> +               pr_info("qid=%d commit_id %llu not found\n", queue->qid,
> +                       commit_id);
> +               spin_unlock(&queue->lock);
> +               return err;
> +       }
> +       list_del_init(&req->list);
> +       ring_ent = req->ring_entry;
> +       req->ring_entry = NULL;

Do we need to set this to NULL, given that the request will be cleaned
up later in fuse_uring_req_end() anyways?

> +
> +       err = fuse_ring_ent_set_commit(ring_ent);
> +       if (err != 0) {
> +               pr_info_ratelimited("qid=%d commit_id %llu state %d",
> +                                   queue->qid, commit_id, ring_ent->state);
> +               spin_unlock(&queue->lock);
> +               return err;
> +       }
> +
> +       ring_ent->cmd = cmd;
> +       spin_unlock(&queue->lock);
> +
> +       /* without the queue lock, as other locks are taken */
> +       fuse_uring_commit(ring_ent, issue_flags);
> +
> +       /*
> +        * Fetching the next request is absolutely required as queued
> +        * fuse requests would otherwise not get processed - committing
> +        * and fetching is done in one step vs legacy fuse, which has separated
> +        * read (fetch request) and write (commit result).
> +        */
> +       fuse_uring_next_fuse_req(ring_ent, queue, issue_flags);

If there's no request ready to read next, then no request will be
fetched and this will return. However, as I understand it, once the
uring is registered, userspace should only be interacting with the
uring via FUSE_IO_URING_CMD_COMMIT_AND_FETCH. However for the case
where no request was ready to read, it seems like userspace would have
nothing to commit when it wants to fetch the next request?

A more general question though: I imagine the most common use case
from the server side is waiting / polling until there is a request to
fetch. Could we not just do that here in the kernel instead with
adding a waitqueue mechanism and having fuse_uring_next_fuse_req()
only return when there is a request available? It seems like that
would reduce the amount of overhead instead of doing the
waiting/checking from the server side?

> +       return 0;
> +}
> +
>  /*
>   * fuse_uring_req_fetch command handling
>   */
> @@ -325,6 +767,14 @@ int __maybe_unused fuse_uring_cmd(struct io_uring_cmd *cmd,
>                         return err;
>                 }
>                 break;
> +       case FUSE_IO_URING_CMD_COMMIT_AND_FETCH:
> +               err = fuse_uring_commit_fetch(cmd, issue_flags, fc);
> +               if (err) {
> +                       pr_info_once("FUSE_IO_URING_COMMIT_AND_FETCH failed err=%d\n",
> +                                    err);
> +                       return err;
> +               }
> +               break;
>         default:
>                 return -EINVAL;
>         }
> diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
> index 4e46dd65196d26dabc62dada33b17de9aa511c08..80f1c62d4df7f0ca77c4d5179068df6ffdbf7d85 100644
> --- a/fs/fuse/dev_uring_i.h
> +++ b/fs/fuse/dev_uring_i.h
> @@ -20,6 +20,9 @@ enum fuse_ring_req_state {
>         /* The ring entry is waiting for new fuse requests */
>         FRRS_AVAILABLE,
>
> +       /* The ring entry got assigned a fuse req */
> +       FRRS_FUSE_REQ,
> +
>         /* The ring entry is in or on the way to user space */
>         FRRS_USERSPACE,
>  };
> @@ -70,7 +73,16 @@ struct fuse_ring_queue {
>          * entries in the process of being committed or in the process
>          * to be sent to userspace
>          */
> +       struct list_head ent_w_req_queue;

What does the w in this stand for? I find the name ambiguous here.


Thanks,
Joanne

>         struct list_head ent_commit_queue;
> +
> +       /* entries in userspace */
> +       struct list_head ent_in_userspace;
> +
> +       /* fuse requests waiting for an entry slot */
> +       struct list_head fuse_req_queue;
> +
> +       struct fuse_pqueue fpq;
>  };
>
>  /**
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index e545b0864dd51e82df61cc39bdf65d3d36a418dc..e71556894bc25808581424ec7bdd4afeebc81f15 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -438,6 +438,10 @@ struct fuse_req {
>
>         /** fuse_mount this request belongs to */
>         struct fuse_mount *fm;
> +
> +#ifdef CONFIG_FUSE_IO_URING
> +       void *ring_entry;
> +#endif
>  };
>
>  struct fuse_iqueue;
>
> --
> 2.43.0
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v9 00/17] fuse: fuse-over-io-uring
  2025-01-07  0:25 [PATCH v9 00/17] fuse: fuse-over-io-uring Bernd Schubert
                   ` (16 preceding siblings ...)
  2025-01-07  0:25 ` [PATCH v9 17/17] fuse: enable fuse-over-io-uring Bernd Schubert
@ 2025-01-17  9:07 ` Miklos Szeredi
  2025-01-17  9:12   ` Bernd Schubert
  17 siblings, 1 reply; 45+ messages in thread
From: Miklos Szeredi @ 2025-01-17  9:07 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd

On Tue, 7 Jan 2025 at 01:25, Bernd Schubert <[email protected]> wrote:
>
> This adds support for io-uring communication between kernel and
> userspace daemon using opcode the IORING_OP_URING_CMD. The basic
> approach was taken from ublk.

I think this is in a good shape.   Let's pull v10 into
fuse.git#for-next and maybe we can have go at v6.14.

Any objections?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v9 00/17] fuse: fuse-over-io-uring
  2025-01-17  9:07 ` [PATCH v9 00/17] fuse: fuse-over-io-uring Miklos Szeredi
@ 2025-01-17  9:12   ` Bernd Schubert
  2025-01-17 12:01     ` Pavel Begunkov
  0 siblings, 1 reply; 45+ messages in thread
From: Bernd Schubert @ 2025-01-17  9:12 UTC (permalink / raw)
  To: Miklos Szeredi, Bernd Schubert
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei



On 1/17/25 10:07, Miklos Szeredi wrote:
> On Tue, 7 Jan 2025 at 01:25, Bernd Schubert <[email protected]> wrote:
>>
>> This adds support for io-uring communication between kernel and
>> userspace daemon using opcode the IORING_OP_URING_CMD. The basic
>> approach was taken from ublk.
> 
> I think this is in a good shape.   Let's pull v10 into
> fuse.git#for-next and maybe we can have go at v6.14.
> 
> Any objections?

Sounds great, I will have v10 in the next hours (got distracted all
week), there is a start up race fix I found in our branch with page
pinning (which slows down start up).


Thanks,
Bernd

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v9 06/17] fuse: {io-uring} Handle SQEs - register commands
  2025-01-07  0:25 ` [PATCH v9 06/17] fuse: {io-uring} Handle SQEs - register commands Bernd Schubert
  2025-01-07  9:56   ` Luis Henriques
@ 2025-01-17 11:06   ` Pavel Begunkov
  2025-01-19 22:47     ` Bernd Schubert
  1 sibling, 1 reply; 45+ messages in thread
From: Pavel Begunkov @ 2025-01-17 11:06 UTC (permalink / raw)
  To: Bernd Schubert, Miklos Szeredi
  Cc: Jens Axboe, linux-fsdevel, io-uring, Joanne Koong, Josef Bacik,
	Amir Goldstein, Ming Lei, David Wei, bernd

On 1/7/25 00:25, Bernd Schubert wrote:
> This adds basic support for ring SQEs (with opcode=IORING_OP_URING_CMD).
> For now only FUSE_IO_URING_CMD_REGISTER is handled to register queue
> entries.
> 
> Signed-off-by: Bernd Schubert <[email protected]>
> ---
...

Apart from mentioned by others and the comment below lgtm

Reviewed-by: Pavel Begunkov <[email protected]>


> diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..b44ba4033615e01041313c040035b6da6af0ee17
> --- /dev/null
> +++ b/fs/fuse/dev_uring.c
> @@ -0,0 +1,333 @@
...> +/* Register header and payload buffer with the kernel and fetch a request */
> +static int fuse_uring_register(struct io_uring_cmd *cmd,
> +			       unsigned int issue_flags, struct fuse_conn *fc)
> +{
> +	const struct fuse_uring_cmd_req *cmd_req = io_uring_sqe_cmd(cmd->sqe);
> +	struct fuse_ring *ring = fc->ring;
> +	struct fuse_ring_queue *queue;
> +	struct fuse_ring_ent *ring_ent;
> +	int err;
> +	struct iovec iov[FUSE_URING_IOV_SEGS];
> +	unsigned int qid = READ_ONCE(cmd_req->qid);
> +
> +	err = fuse_uring_get_iovec_from_sqe(cmd->sqe, iov);

Looks like leftovers? Not used, and it's repeated in
fuse_uring_create_ring_ent().


> +	if (err) {
> +		pr_info_ratelimited("Failed to get iovec from sqe, err=%d\n",
> +				    err);
> +		return err;
> +	}
...

-- 
Pavel Begunkov


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v9 10/17] fuse: Add io-uring sqe commit and fetch support
  2025-01-07  0:25 ` [PATCH v9 10/17] fuse: Add io-uring sqe commit and fetch support Bernd Schubert
  2025-01-07 14:42   ` Luis Henriques
  2025-01-13 22:44   ` Joanne Koong
@ 2025-01-17 11:18   ` Pavel Begunkov
  2025-01-17 11:20     ` Bernd Schubert
  2 siblings, 1 reply; 45+ messages in thread
From: Pavel Begunkov @ 2025-01-17 11:18 UTC (permalink / raw)
  To: Bernd Schubert, Miklos Szeredi
  Cc: Jens Axboe, linux-fsdevel, io-uring, Joanne Koong, Josef Bacik,
	Amir Goldstein, Ming Lei, David Wei, bernd

On 1/7/25 00:25, Bernd Schubert wrote:
> This adds support for fuse request completion through ring SQEs
> (FUSE_URING_CMD_COMMIT_AND_FETCH handling). After committing
> the ring entry it becomes available for new fuse requests.
> Handling of requests through the ring (SQE/CQE handling)
> is complete now.
> 
> Fuse request data are copied through the mmaped ring buffer,
> there is no support for any zero copy yet.

Reviewed-by: Pavel Begunkov <[email protected]> # io_uring

> 
> Signed-off-by: Bernd Schubert <[email protected]>
> ---
>   fs/fuse/dev_uring.c   | 450 ++++++++++++++++++++++++++++++++++++++++++++++++++
>   fs/fuse/dev_uring_i.h |  12 ++
>   fs/fuse/fuse_i.h      |   4 +
>   3 files changed, 466 insertions(+)
> 
> diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
> index b44ba4033615e01041313c040035b6da6af0ee17..f44e66a7ea577390da87e9ac7d118a9416898c28 100644
> --- a/fs/fuse/dev_uring.c
> +++ b/fs/fuse/dev_uring.c
> @@ -26,6 +26,19 @@ bool fuse_uring_enabled(void)
...
> +
> +/*
> + * Write data to the ring buffer and send the request to userspace,
> + * userspace will read it
> + * This is comparable with classical read(/dev/fuse)
> + */
> +static int fuse_uring_send_next_to_ring(struct fuse_ring_ent *ring_ent,
> +					unsigned int issue_flags)
> +{
> +	int err = 0;
> +	struct fuse_ring_queue *queue = ring_ent->queue;
> +
> +	err = fuse_uring_prepare_send(ring_ent);
> +	if (err)
> +		goto err;
> +
> +	spin_lock(&queue->lock);
> +	ring_ent->state = FRRS_USERSPACE;
> +	list_move(&ring_ent->list, &queue->ent_in_userspace);
> +	spin_unlock(&queue->lock);
> +
> +	io_uring_cmd_done(ring_ent->cmd, 0, 0, issue_flags);
> +	ring_ent->cmd = NULL;

I haven't checked if it races with some reallocation, but
you might want to consider clearing it under the spin.

spin_lock(&queue->lock);
...
cmd = ring_ent->cmd;
ring_ent->cmd = NULL;
spin_unlock(&queue->lock);

io_uring_cmd_done(cmd);

Can be done on top if even needed.

-- 
Pavel Begunkov


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v9 10/17] fuse: Add io-uring sqe commit and fetch support
  2025-01-17 11:18   ` Pavel Begunkov
@ 2025-01-17 11:20     ` Bernd Schubert
  0 siblings, 0 replies; 45+ messages in thread
From: Bernd Schubert @ 2025-01-17 11:20 UTC (permalink / raw)
  To: Pavel Begunkov, Bernd Schubert, Miklos Szeredi
  Cc: Jens Axboe, linux-fsdevel, io-uring, Joanne Koong, Josef Bacik,
	Amir Goldstein, Ming Lei, David Wei



On 1/17/25 12:18, Pavel Begunkov wrote:
> On 1/7/25 00:25, Bernd Schubert wrote:
>> This adds support for fuse request completion through ring SQEs
>> (FUSE_URING_CMD_COMMIT_AND_FETCH handling). After committing
>> the ring entry it becomes available for new fuse requests.
>> Handling of requests through the ring (SQE/CQE handling)
>> is complete now.
>>
>> Fuse request data are copied through the mmaped ring buffer,
>> there is no support for any zero copy yet.
> 
> Reviewed-by: Pavel Begunkov <[email protected]> # io_uring
> 
>>
>> Signed-off-by: Bernd Schubert <[email protected]>
>> ---
>>   fs/fuse/dev_uring.c   | 450 ++++++++++++++++++++++++++++++++++++++++
>> ++++++++++
>>   fs/fuse/dev_uring_i.h |  12 ++
>>   fs/fuse/fuse_i.h      |   4 +
>>   3 files changed, 466 insertions(+)
>>
>> diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
>> index
>> b44ba4033615e01041313c040035b6da6af0ee17..f44e66a7ea577390da87e9ac7d118a9416898c28 100644
>> --- a/fs/fuse/dev_uring.c
>> +++ b/fs/fuse/dev_uring.c
>> @@ -26,6 +26,19 @@ bool fuse_uring_enabled(void)
> ...
>> +
>> +/*
>> + * Write data to the ring buffer and send the request to userspace,
>> + * userspace will read it
>> + * This is comparable with classical read(/dev/fuse)
>> + */
>> +static int fuse_uring_send_next_to_ring(struct fuse_ring_ent *ring_ent,
>> +                    unsigned int issue_flags)
>> +{
>> +    int err = 0;
>> +    struct fuse_ring_queue *queue = ring_ent->queue;
>> +
>> +    err = fuse_uring_prepare_send(ring_ent);
>> +    if (err)
>> +        goto err;
>> +
>> +    spin_lock(&queue->lock);
>> +    ring_ent->state = FRRS_USERSPACE;
>> +    list_move(&ring_ent->list, &queue->ent_in_userspace);
>> +    spin_unlock(&queue->lock);
>> +
>> +    io_uring_cmd_done(ring_ent->cmd, 0, 0, issue_flags);
>> +    ring_ent->cmd = NULL;
> 
> I haven't checked if it races with some reallocation, but
> you might want to consider clearing it under the spin.
> 
> spin_lock(&queue->lock);
> ...
> cmd = ring_ent->cmd;
> ring_ent->cmd = NULL;
> spin_unlock(&queue->lock);
> 
> io_uring_cmd_done(cmd);
> 
> Can be done on top if even needed.


Yes, thanks for your review! That is what I actually have in the ddn
branch, when I found the startup race.


Thanks,
Bernd



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v9 11/17] fuse: {io-uring} Handle teardown of ring entries
  2025-01-07  0:25 ` [PATCH v9 11/17] fuse: {io-uring} Handle teardown of ring entries Bernd Schubert
  2025-01-07 15:31   ` Luis Henriques
@ 2025-01-17 11:23   ` Pavel Begunkov
  1 sibling, 0 replies; 45+ messages in thread
From: Pavel Begunkov @ 2025-01-17 11:23 UTC (permalink / raw)
  To: Bernd Schubert, Miklos Szeredi
  Cc: Jens Axboe, linux-fsdevel, io-uring, Joanne Koong, Josef Bacik,
	Amir Goldstein, Ming Lei, David Wei, bernd

On 1/7/25 00:25, Bernd Schubert wrote:
> On teardown struct file_operations::uring_cmd requests
> need to be completed by calling io_uring_cmd_done().
> Not completing all ring entries would result in busy io-uring
> tasks giving warning messages in intervals and unreleased
> struct file.
> 
> Additionally the fuse connection and with that the ring can
> only get released when all io-uring commands are completed.
> 
> Completion is done with ring entries that are
> a) in waiting state for new fuse requests - io_uring_cmd_done
> is needed
> 
> b) already in userspace - io_uring_cmd_done through teardown
> is not needed, the request can just get released. If fuse server
> is still active and commits such a ring entry, fuse_uring_cmd()
> already checks if the connection is active and then complete the
> io-uring itself with -ENOTCONN. I.e. special handling is not
> needed.
> 
> This scheme is basically represented by the ring entry state
> FRRS_WAIT and FRRS_USERSPACE.

Looks reasonable

Reviewed-by: Pavel Begunkov <[email protected]>

> 
> Entries in state:
> - FRRS_INIT: No action needed, do not contribute to
>    ring->queue_refs yet
> - All other states: Are currently processed by other tasks,
>    async teardown is needed and it has to wait for the two
>    states above. It could be also solved without an async
>    teardown task, but would require additional if conditions
>    in hot code paths. Also in my personal opinion the code
>    looks cleaner with async teardown.
> 
> Signed-off-by: Bernd Schubert <[email protected]>
> ---
>   fs/fuse/dev.c         |   9 +++
>   fs/fuse/dev_uring.c   | 198 ++++++++++++++++++++++++++++++++++++++++++++++++++
>   fs/fuse/dev_uring_i.h |  51 +++++++++++++
>   3 files changed, 258 insertions(+)
-- 
Pavel Begunkov


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v9 13/17] fuse: Allow to queue fg requests through io-uring
  2025-01-07  0:25 ` [PATCH v9 13/17] fuse: Allow to queue fg requests through io-uring Bernd Schubert
  2025-01-07 15:54   ` Luis Henriques
@ 2025-01-17 11:47   ` Pavel Begunkov
  2025-01-17 21:52   ` Bernd Schubert
  2 siblings, 0 replies; 45+ messages in thread
From: Pavel Begunkov @ 2025-01-17 11:47 UTC (permalink / raw)
  To: Bernd Schubert, Miklos Szeredi
  Cc: Jens Axboe, linux-fsdevel, io-uring, Joanne Koong, Josef Bacik,
	Amir Goldstein, Ming Lei, David Wei, bernd

On 1/7/25 00:25, Bernd Schubert wrote:
> This prepares queueing and sending foreground requests through
> io-uring.

Reviewed-by: Pavel Begunkov <[email protected]> # io_uring


> Signed-off-by: Bernd Schubert <[email protected]>
> ---
>   fs/fuse/dev_uring.c   | 185 ++++++++++++++++++++++++++++++++++++++++++++++++--
>   fs/fuse/dev_uring_i.h |  11 ++-
>   2 files changed, 187 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
> index 01a908b2ef9ada14b759ca047eab40b4c4431d89..89a22a4eee23cbba49bac7a2d2126bb51193326f 100644
> --- a/fs/fuse/dev_uring.c
> +++ b/fs/fuse/dev_uring.c
> @@ -26,6 +26,29 @@ bool fuse_uring_enabled(void)
>   	return enable_uring;
>   }
>  
...
> +
> +/*
> + * This prepares and sends the ring request in fuse-uring task context.
> + * User buffers are not mapped yet - the application does not have permission
> + * to write to it - this has to be executed in ring task context.
> + */
> +static void
> +fuse_uring_send_req_in_task(struct io_uring_cmd *cmd,
> +			    unsigned int issue_flags)
> +{
> +	struct fuse_ring_ent *ent = uring_cmd_to_ring_ent(cmd);
> +	struct fuse_ring_queue *queue = ent->queue;
> +	int err;
> +
> +	if (unlikely(issue_flags & IO_URING_F_TASK_DEAD)) {
> +		err = -ECANCELED;
> +		goto terminating;
> +	}
> +
> +	err = fuse_uring_prepare_send(ent);
> +	if (err)
> +		goto err;
> +
> +terminating:
> +	spin_lock(&queue->lock);
> +	ent->state = FRRS_USERSPACE;
> +	list_move(&ent->list, &queue->ent_in_userspace);
> +	spin_unlock(&queue->lock);
> +	io_uring_cmd_done(cmd, err, 0, issue_flags);
> +	ent->cmd = NULL;

Might be worth moving inside the critical section as well.


-- 
Pavel Begunkov


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v9 14/17] fuse: Allow to queue bg requests through io-uring
  2025-01-07  0:25 ` [PATCH v9 14/17] fuse: Allow to queue bg " Bernd Schubert
@ 2025-01-17 11:49   ` Pavel Begunkov
  0 siblings, 0 replies; 45+ messages in thread
From: Pavel Begunkov @ 2025-01-17 11:49 UTC (permalink / raw)
  To: Bernd Schubert, Miklos Szeredi
  Cc: Jens Axboe, linux-fsdevel, io-uring, Joanne Koong, Josef Bacik,
	Amir Goldstein, Ming Lei, David Wei, bernd

On 1/7/25 00:25, Bernd Schubert wrote:
> This prepares queueing and sending background requests through
> io-uring.


Reviewed-by: Pavel Begunkov <[email protected]> # io_uring

> 
> Signed-off-by: Bernd Schubert <[email protected]>
> ---
>   fs/fuse/dev.c         | 24 ++++++++++++-
>   fs/fuse/dev_uring.c   | 99 +++++++++++++++++++++++++++++++++++++++++++++++++++
>   fs/fuse/dev_uring_i.h | 12 +++++++
>   3 files changed, 134 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> index ecf2f805f456222fda02598397beba41fc356460..afafa960d4725d9b64b22f17bf09c846219396d6 100644
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c

-- 
Pavel Begunkov


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v9 15/17] fuse: {io-uring} Prevent mount point hang on fuse-server termination
  2025-01-07  0:25 ` [PATCH v9 15/17] fuse: {io-uring} Prevent mount point hang on fuse-server termination Bernd Schubert
  2025-01-07 16:14   ` Luis Henriques
@ 2025-01-17 11:52   ` Pavel Begunkov
  1 sibling, 0 replies; 45+ messages in thread
From: Pavel Begunkov @ 2025-01-17 11:52 UTC (permalink / raw)
  To: Bernd Schubert, Miklos Szeredi
  Cc: Jens Axboe, linux-fsdevel, io-uring, Joanne Koong, Josef Bacik,
	Amir Goldstein, Ming Lei, David Wei, bernd

On 1/7/25 00:25, Bernd Schubert wrote:
> When the fuse-server terminates while the fuse-client or kernel
> still has queued URING_CMDs, these commands retain references
> to the struct file used by the fuse connection. This prevents
> fuse_dev_release() from being invoked, resulting in a hung mount
> point.

lgtm

Reviewed-by: Pavel Begunkov <[email protected]> # io_uring

> 
> This patch addresses the issue by making queued URING_CMDs
> cancelable, allowing fuse_dev_release() to proceed as expected
> and preventing the mount point from hanging.
> 
> Signed-off-by: Bernd Schubert <[email protected]>
> ---
>   fs/fuse/dev.c         |  2 ++
>   fs/fuse/dev_uring.c   | 71 ++++++++++++++++++++++++++++++++++++++++++++++++---
>   fs/fuse/dev_uring_i.h |  9 +++++++
>   3 files changed, 79 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> index afafa960d4725d9b64b22f17bf09c846219396d6..1b593b23f7b8c319ec38c7e726dabf516965500e 100644
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -599,8 +599,10 @@ static int fuse_request_queue_background(struct fuse_req *req)
>   	}
>   	__set_bit(FR_ISREPLY, &req->flags);
>   
> +#ifdef CONFIG_FUSE_IO_URING
>   	if (fuse_uring_ready(fc))
>   		return fuse_request_queue_background_uring(fc, req);
> +#endif


Looks like it should've been a part of some earlier commit.

>   
>   	spin_lock(&fc->bg_lock);
>   	if (likely(fc->connected)) {


-- 
Pavel Begunkov


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v9 17/17] fuse: enable fuse-over-io-uring
  2025-01-07  0:25 ` [PATCH v9 17/17] fuse: enable fuse-over-io-uring Bernd Schubert
@ 2025-01-17 11:52   ` Pavel Begunkov
  0 siblings, 0 replies; 45+ messages in thread
From: Pavel Begunkov @ 2025-01-17 11:52 UTC (permalink / raw)
  To: Bernd Schubert, Miklos Szeredi
  Cc: Jens Axboe, linux-fsdevel, io-uring, Joanne Koong, Josef Bacik,
	Amir Goldstein, Ming Lei, David Wei, bernd

On 1/7/25 00:25, Bernd Schubert wrote:
> All required parts are handled now, fuse-io-uring can
> be enabled.

Reviewed-by: Pavel Begunkov <[email protected]>

-- 
Pavel Begunkov


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v9 00/17] fuse: fuse-over-io-uring
  2025-01-17  9:12   ` Bernd Schubert
@ 2025-01-17 12:01     ` Pavel Begunkov
  0 siblings, 0 replies; 45+ messages in thread
From: Pavel Begunkov @ 2025-01-17 12:01 UTC (permalink / raw)
  To: Bernd Schubert, Miklos Szeredi, Bernd Schubert
  Cc: Jens Axboe, linux-fsdevel, io-uring, Joanne Koong, Josef Bacik,
	Amir Goldstein, Ming Lei, David Wei

On 1/17/25 09:12, Bernd Schubert wrote:
> On 1/17/25 10:07, Miklos Szeredi wrote:
>> On Tue, 7 Jan 2025 at 01:25, Bernd Schubert <[email protected]> wrote:
>>>
>>> This adds support for io-uring communication between kernel and
>>> userspace daemon using opcode the IORING_OP_URING_CMD. The basic
>>> approach was taken from ublk.
>>
>> I think this is in a good shape.   Let's pull v10 into
>> fuse.git#for-next and maybe we can have go at v6.14.
>>
>> Any objections?

Sounds right, io_uring adjacent bits look good. Bernd, feel free
to stick to the series in general:

Reviewed-by: Pavel Begunkov <[email protected]> # io_uring


> Sounds great, I will have v10 in the next hours (got distracted all
> week), there is a start up race fix I found in our branch with page
> pinning (which slows down start up).

-- 
Pavel Begunkov


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v9 13/17] fuse: Allow to queue fg requests through io-uring
  2025-01-07  0:25 ` [PATCH v9 13/17] fuse: Allow to queue fg requests through io-uring Bernd Schubert
  2025-01-07 15:54   ` Luis Henriques
  2025-01-17 11:47   ` Pavel Begunkov
@ 2025-01-17 21:52   ` Bernd Schubert
  2 siblings, 0 replies; 45+ messages in thread
From: Bernd Schubert @ 2025-01-17 21:52 UTC (permalink / raw)
  To: Bernd Schubert, Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei


> +/* queue a fuse request and send it if a ring entry is available */
> +void fuse_uring_queue_fuse_req(struct fuse_iqueue *fiq, struct fuse_req *req)
> +{
> +	struct fuse_conn *fc = req->fm->fc;
> +	struct fuse_ring *ring = fc->ring;
> +	struct fuse_ring_queue *queue;
> +	struct fuse_ring_ent *ent = NULL;
> +	int err;
> +
> +	err = -EINVAL;
> +	queue = fuse_uring_task_to_queue(ring);
> +	if (!queue)
> +		goto err;
> +
> +	if (req->in.h.opcode != FUSE_NOTIFY_REPLY)
> +		req->in.h.unique = fuse_get_unique(fiq);
> +
> +	spin_lock(&queue->lock);
> +	err = -ENOTCONN;
> +	if (unlikely(queue->stopped))
> +		goto err_unlock;
> +
> +	ent = list_first_entry_or_null(&queue->ent_avail_queue,
> +				       struct fuse_ring_ent, list);
> +	if (ent)
> +		fuse_uring_add_req_to_ring_ent(ent, req);
> +	else
> +		list_add_tail(&req->list, &queue->fuse_req_queue);
> +	spin_unlock(&queue->lock);
> +
> +	if (ent) {
> +		struct io_uring_cmd *cmd = ent->cmd;
> +
> +		err = -EIO;
> +		if (WARN_ON_ONCE(ent->state != FRRS_FUSE_REQ))
> +			goto err;


I noticed this - this is wrong, as ent would be in nirvana state if
this condition would ever happen.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v9 06/17] fuse: {io-uring} Handle SQEs - register commands
  2025-01-17 11:06   ` Pavel Begunkov
@ 2025-01-19 22:47     ` Bernd Schubert
  0 siblings, 0 replies; 45+ messages in thread
From: Bernd Schubert @ 2025-01-19 22:47 UTC (permalink / raw)
  To: Pavel Begunkov, Bernd Schubert, Miklos Szeredi
  Cc: Jens Axboe, linux-fsdevel, io-uring, Joanne Koong, Josef Bacik,
	Amir Goldstein, Ming Lei, David Wei



On 1/17/25 12:06, Pavel Begunkov wrote:
> On 1/7/25 00:25, Bernd Schubert wrote:
>> This adds basic support for ring SQEs (with opcode=IORING_OP_URING_CMD).
>> For now only FUSE_IO_URING_CMD_REGISTER is handled to register queue
>> entries.
>>
>> Signed-off-by: Bernd Schubert <[email protected]>
>> ---
> ...
> 
> Apart from mentioned by others and the comment below lgtm
> 
> Reviewed-by: Pavel Begunkov <[email protected]>
> 
> 
>> diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
>> new file mode 100644
>> index
>> 0000000000000000000000000000000000000000..b44ba4033615e01041313c040035b6da6af0ee17
>> --- /dev/null
>> +++ b/fs/fuse/dev_uring.c
>> @@ -0,0 +1,333 @@
> ...> +/* Register header and payload buffer with the kernel and fetch a
> request */
>> +static int fuse_uring_register(struct io_uring_cmd *cmd,
>> +                   unsigned int issue_flags, struct fuse_conn *fc)
>> +{
>> +    const struct fuse_uring_cmd_req *cmd_req = io_uring_sqe_cmd(cmd-
>> >sqe);
>> +    struct fuse_ring *ring = fc->ring;
>> +    struct fuse_ring_queue *queue;
>> +    struct fuse_ring_ent *ring_ent;
>> +    int err;
>> +    struct iovec iov[FUSE_URING_IOV_SEGS];
>> +    unsigned int qid = READ_ONCE(cmd_req->qid);
>> +
>> +    err = fuse_uring_get_iovec_from_sqe(cmd->sqe, iov);
> 
> Looks like leftovers? Not used, and it's repeated in
> fuse_uring_create_ring_ent().

Yep, thank you fixed. I hope there is nothing like that left
anymore. I run static analyzers today - nothing found.


Thanks,
Bernd

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v9 10/17] fuse: Add io-uring sqe commit and fetch support
  2025-01-13 22:44   ` Joanne Koong
@ 2025-01-20  0:33     ` Bernd Schubert
  0 siblings, 0 replies; 45+ messages in thread
From: Bernd Schubert @ 2025-01-20  0:33 UTC (permalink / raw)
  To: Joanne Koong, Bernd Schubert
  Cc: Miklos Szeredi, Jens Axboe, Pavel Begunkov, linux-fsdevel,
	io-uring, Josef Bacik, Amir Goldstein, Ming Lei, David Wei

[-- Attachment #1: Type: text/plain, Size: 22967 bytes --]

Hi Joanne,

sorry for my late reply, I was occupied all week. 

On 1/13/25 23:44, Joanne Koong wrote:
> On Mon, Jan 6, 2025 at 4:25 PM Bernd Schubert <[email protected]> wrote:
>>
>> This adds support for fuse request completion through ring SQEs
>> (FUSE_URING_CMD_COMMIT_AND_FETCH handling). After committing
>> the ring entry it becomes available for new fuse requests.
>> Handling of requests through the ring (SQE/CQE handling)
>> is complete now.
>>
>> Fuse request data are copied through the mmaped ring buffer,
>> there is no support for any zero copy yet.
>>
>> Signed-off-by: Bernd Schubert <[email protected]>
>> ---
>>  fs/fuse/dev_uring.c   | 450 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>  fs/fuse/dev_uring_i.h |  12 ++
>>  fs/fuse/fuse_i.h      |   4 +
>>  3 files changed, 466 insertions(+)
>>
>> diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
>> index b44ba4033615e01041313c040035b6da6af0ee17..f44e66a7ea577390da87e9ac7d118a9416898c28 100644
>> --- a/fs/fuse/dev_uring.c
>> +++ b/fs/fuse/dev_uring.c
>> @@ -26,6 +26,19 @@ bool fuse_uring_enabled(void)
>>         return enable_uring;
>>  }
>>
>> +static void fuse_uring_req_end(struct fuse_ring_ent *ring_ent, bool set_err,
>> +                              int error)
>> +{
>> +       struct fuse_req *req = ring_ent->fuse_req;
>> +
>> +       if (set_err)
>> +               req->out.h.error = error;
> 
> I think we could get away with not having the "bool set_err" as an
> argument if we do "if (error)" directly. AFAICT, we can use the value
> of error directly since  it always returns zero on success and any
> non-zero value is considered an error.

I had done this because of fuse_uring_commit()


	err = fuse_uring_out_header_has_err(&req->out.h, req, fc);
	if (err) {
		/* req->out.h.error already set */
		goto out;
	}


In fuse_uring_out_header_has_err() the header might already have the
error code, but there are other errors as well. Well, setting an 
existing error code saves us a few lines and conditions, so you are
probably right and I removed that argument now.


> 
>> +
>> +       clear_bit(FR_SENT, &req->flags);
>> +       fuse_request_end(ring_ent->fuse_req);
>> +       ring_ent->fuse_req = NULL;
>> +}
>> +
>>  void fuse_uring_destruct(struct fuse_conn *fc)
>>  {
>>         struct fuse_ring *ring = fc->ring;
>> @@ -41,8 +54,11 @@ void fuse_uring_destruct(struct fuse_conn *fc)
>>                         continue;
>>
>>                 WARN_ON(!list_empty(&queue->ent_avail_queue));
>> +               WARN_ON(!list_empty(&queue->ent_w_req_queue));
>>                 WARN_ON(!list_empty(&queue->ent_commit_queue));
>> +               WARN_ON(!list_empty(&queue->ent_in_userspace));
>>
>> +               kfree(queue->fpq.processing);
>>                 kfree(queue);
>>                 ring->queues[qid] = NULL;
>>         }
>> @@ -101,20 +117,34 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
>>  {
>>         struct fuse_conn *fc = ring->fc;
>>         struct fuse_ring_queue *queue;
>> +       struct list_head *pq;
>>
>>         queue = kzalloc(sizeof(*queue), GFP_KERNEL_ACCOUNT);
>>         if (!queue)
>>                 return NULL;
>> +       pq = kcalloc(FUSE_PQ_HASH_SIZE, sizeof(struct list_head), GFP_KERNEL);
>> +       if (!pq) {
>> +               kfree(queue);
>> +               return NULL;
>> +       }
>> +
>>         queue->qid = qid;
>>         queue->ring = ring;
>>         spin_lock_init(&queue->lock);
>>
>>         INIT_LIST_HEAD(&queue->ent_avail_queue);
>>         INIT_LIST_HEAD(&queue->ent_commit_queue);
>> +       INIT_LIST_HEAD(&queue->ent_w_req_queue);
>> +       INIT_LIST_HEAD(&queue->ent_in_userspace);
>> +       INIT_LIST_HEAD(&queue->fuse_req_queue);
>> +
>> +       queue->fpq.processing = pq;
>> +       fuse_pqueue_init(&queue->fpq);
>>
>>         spin_lock(&fc->lock);
>>         if (ring->queues[qid]) {
>>                 spin_unlock(&fc->lock);
>> +               kfree(queue->fpq.processing);
>>                 kfree(queue);
>>                 return ring->queues[qid];
>>         }
>> @@ -128,6 +158,214 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
>>         return queue;
>>  }
>>
>> +/*
>> + * Checks for errors and stores it into the request
>> + */
>> +static int fuse_uring_out_header_has_err(struct fuse_out_header *oh,
>> +                                        struct fuse_req *req,
>> +                                        struct fuse_conn *fc)
>> +{
>> +       int err;
>> +
>> +       err = -EINVAL;
>> +       if (oh->unique == 0) {
>> +               /* Not supportd through io-uring yet */
>> +               pr_warn_once("notify through fuse-io-uring not supported\n");
>> +               goto seterr;
>> +       }
>> +
>> +       err = -EINVAL;
>> +       if (oh->error <= -ERESTARTSYS || oh->error > 0)
>> +               goto seterr;
>> +
>> +       if (oh->error) {
>> +               err = oh->error;
>> +               goto err;
>> +       }
>> +
>> +       err = -ENOENT;
>> +       if ((oh->unique & ~FUSE_INT_REQ_BIT) != req->in.h.unique) {
>> +               pr_warn_ratelimited("unique mismatch, expected: %llu got %llu\n",
>> +                                   req->in.h.unique,
>> +                                   oh->unique & ~FUSE_INT_REQ_BIT);
>> +               goto seterr;
>> +       }
>> +
>> +       /*
>> +        * Is it an interrupt reply ID?
>> +        * XXX: Not supported through fuse-io-uring yet, it should not even
>> +        *      find the request - should not happen.
>> +        */
>> +       WARN_ON_ONCE(oh->unique & FUSE_INT_REQ_BIT);
>> +
>> +       return 0;
>> +
>> +seterr:
>> +       oh->error = err;
>> +err:
>> +       return err;
>> +}
>> +
>> +static int fuse_uring_copy_from_ring(struct fuse_ring *ring,
>> +                                    struct fuse_req *req,
>> +                                    struct fuse_ring_ent *ent)
>> +{
>> +       struct fuse_copy_state cs;
>> +       struct fuse_args *args = req->args;
>> +       struct iov_iter iter;
>> +       int err, res;
>> +       struct fuse_uring_ent_in_out ring_in_out;
>> +
>> +       res = copy_from_user(&ring_in_out, &ent->headers->ring_ent_in_out,
>> +                            sizeof(ring_in_out));
>> +       if (res)
>> +               return -EFAULT;
>> +
>> +       err = import_ubuf(ITER_SOURCE, ent->payload, ring->max_payload_sz,
>> +                         &iter);
>> +       if (err)
>> +               return err;
>> +
>> +       fuse_copy_init(&cs, 0, &iter);
>> +       cs.is_uring = 1;
>> +       cs.req = req;
>> +
>> +       return fuse_copy_out_args(&cs, args, ring_in_out.payload_sz);
>> +}
>> +
>> + /*
>> +  * Copy data from the req to the ring buffer
>> +  */
>> +static int fuse_uring_copy_to_ring(struct fuse_ring *ring, struct fuse_req *req,
>> +                                  struct fuse_ring_ent *ent)
>> +{
>> +       struct fuse_copy_state cs;
>> +       struct fuse_args *args = req->args;
>> +       struct fuse_in_arg *in_args = args->in_args;
>> +       int num_args = args->in_numargs;
>> +       int err, res;
>> +       struct iov_iter iter;
>> +       struct fuse_uring_ent_in_out ent_in_out = {
>> +               .flags = 0,
>> +               .commit_id = ent->commit_id,
>> +       };
>> +
>> +       if (WARN_ON(ent_in_out.commit_id == 0))
>> +               return -EINVAL;
>> +
>> +       err = import_ubuf(ITER_DEST, ent->payload, ring->max_payload_sz, &iter);
>> +       if (err) {
>> +               pr_info_ratelimited("fuse: Import of user buffer failed\n");
>> +               return err;
>> +       }
>> +
>> +       fuse_copy_init(&cs, 1, &iter);
>> +       cs.is_uring = 1;
>> +       cs.req = req;
>> +
>> +       if (num_args > 0) {
>> +               /*
>> +                * Expectation is that the first argument is the per op header.
>> +                * Some op code have that as zero.
>> +                */
>> +               if (args->in_args[0].size > 0) {
>> +                       res = copy_to_user(&ent->headers->op_in, in_args->value,
>> +                                          in_args->size);
>> +                       err = res > 0 ? -EFAULT : res;
>> +                       if (err) {
>> +                               pr_info_ratelimited(
>> +                                       "Copying the header failed.\n");
>> +                               return err;
>> +                       }
>> +               }
>> +               in_args++;
>> +               num_args--;
>> +       }
>> +
>> +       /* copy the payload */
>> +       err = fuse_copy_args(&cs, num_args, args->in_pages,
>> +                            (struct fuse_arg *)in_args, 0);
>> +       if (err) {
>> +               pr_info_ratelimited("%s fuse_copy_args failed\n", __func__);
>> +               return err;
>> +       }
>> +
>> +       ent_in_out.payload_sz = cs.ring.copied_sz;
>> +       res = copy_to_user(&ent->headers->ring_ent_in_out, &ent_in_out,
>> +                          sizeof(ent_in_out));
>> +       err = res > 0 ? -EFAULT : res;
>> +       if (err)
>> +               return err;
>> +
>> +       return 0;
>> +}
>> +
>> +static int
>> +fuse_uring_prepare_send(struct fuse_ring_ent *ring_ent)
>> +{
>> +       struct fuse_ring_queue *queue = ring_ent->queue;
>> +       struct fuse_ring *ring = queue->ring;
>> +       struct fuse_req *req = ring_ent->fuse_req;
>> +       int err, res;
>> +
>> +       err = -EIO;
>> +       if (WARN_ON(ring_ent->state != FRRS_FUSE_REQ)) {
>> +               pr_err("qid=%d ring-req=%p invalid state %d on send\n",
>> +                      queue->qid, ring_ent, ring_ent->state);
>> +               err = -EIO;
>> +               goto err;
>> +       }
>> +
>> +       /* copy the request */
>> +       err = fuse_uring_copy_to_ring(ring, req, ring_ent);
>> +       if (unlikely(err)) {
>> +               pr_info_ratelimited("Copy to ring failed: %d\n", err);
>> +               goto err;
>> +       }
>> +
>> +       /* copy fuse_in_header */
>> +       res = copy_to_user(&ring_ent->headers->in_out, &req->in.h,
>> +                          sizeof(req->in.h));
>> +       err = res > 0 ? -EFAULT : res;
>> +       if (err)
>> +               goto err;
>> +
>> +       set_bit(FR_SENT, &req->flags);
>> +       return 0;
>> +
>> +err:
>> +       fuse_uring_req_end(ring_ent, true, err);
>> +       return err;
>> +}
>> +
>> +/*
>> + * Write data to the ring buffer and send the request to userspace,
>> + * userspace will read it
>> + * This is comparable with classical read(/dev/fuse)
>> + */
>> +static int fuse_uring_send_next_to_ring(struct fuse_ring_ent *ring_ent,
>> +                                       unsigned int issue_flags)
>> +{
>> +       int err = 0;
>> +       struct fuse_ring_queue *queue = ring_ent->queue;
>> +
>> +       err = fuse_uring_prepare_send(ring_ent);
>> +       if (err)
>> +               goto err;
>> +
>> +       spin_lock(&queue->lock);
>> +       ring_ent->state = FRRS_USERSPACE;
>> +       list_move(&ring_ent->list, &queue->ent_in_userspace);
>> +       spin_unlock(&queue->lock);
>> +
>> +       io_uring_cmd_done(ring_ent->cmd, 0, 0, issue_flags);
>> +       ring_ent->cmd = NULL;
>> +       return 0;
>> +
>> +err:
>> +       return err;
>> +}
>> +
>>  /*
>>   * Make a ring entry available for fuse_req assignment
>>   */
>> @@ -138,6 +376,210 @@ static void fuse_uring_ent_avail(struct fuse_ring_ent *ring_ent,
>>         ring_ent->state = FRRS_AVAILABLE;
>>  }
>>
>> +/* Used to find the request on SQE commit */
>> +static void fuse_uring_add_to_pq(struct fuse_ring_ent *ring_ent,
>> +                                struct fuse_req *req)
>> +{
>> +       struct fuse_ring_queue *queue = ring_ent->queue;
>> +       struct fuse_pqueue *fpq = &queue->fpq;
>> +       unsigned int hash;
>> +
>> +       /* commit_id is the unique id of the request */
>> +       ring_ent->commit_id = req->in.h.unique;
>> +
>> +       req->ring_entry = ring_ent;
>> +       hash = fuse_req_hash(ring_ent->commit_id);
>> +       list_move_tail(&req->list, &fpq->processing[hash]);
>> +}
>> +
>> +/*
>> + * Assign a fuse queue entry to the given entry
>> + */
>> +static void fuse_uring_add_req_to_ring_ent(struct fuse_ring_ent *ring_ent,
>> +                                          struct fuse_req *req)
>> +{
>> +       struct fuse_ring_queue *queue = ring_ent->queue;
>> +
>> +       lockdep_assert_held(&queue->lock);
>> +
>> +       if (WARN_ON_ONCE(ring_ent->state != FRRS_AVAILABLE &&
>> +                        ring_ent->state != FRRS_COMMIT)) {
>> +               pr_warn("%s qid=%d state=%d\n", __func__, ring_ent->queue->qid,
>> +                       ring_ent->state);
>> +       }
>> +       list_del_init(&req->list);
>> +       clear_bit(FR_PENDING, &req->flags);
>> +       ring_ent->fuse_req = req;
>> +       ring_ent->state = FRRS_FUSE_REQ;
>> +       list_move(&ring_ent->list, &queue->ent_w_req_queue);
>> +       fuse_uring_add_to_pq(ring_ent, req);
>> +}
>> +
>> +/*
>> + * Release the ring entry and fetch the next fuse request if available
>> + *
>> + * @return true if a new request has been fetched
>> + */
>> +static bool fuse_uring_ent_assign_req(struct fuse_ring_ent *ring_ent)
>> +       __must_hold(&queue->lock)
>> +{
>> +       struct fuse_req *req;
>> +       struct fuse_ring_queue *queue = ring_ent->queue;
>> +       struct list_head *req_queue = &queue->fuse_req_queue;
>> +
>> +       lockdep_assert_held(&queue->lock);
>> +
>> +       /* get and assign the next entry while it is still holding the lock */
>> +       req = list_first_entry_or_null(req_queue, struct fuse_req, list);
>> +       if (req) {
>> +               fuse_uring_add_req_to_ring_ent(ring_ent, req);
>> +               return true;
>> +       }
>> +
>> +       return false;
>> +}
>> +
>> +/*
>> + * Read data from the ring buffer, which user space has written to
>> + * This is comparible with handling of classical write(/dev/fuse).
>> + * Also make the ring request available again for new fuse requests.
>> + */
>> +static void fuse_uring_commit(struct fuse_ring_ent *ring_ent,
>> +                             unsigned int issue_flags)
>> +{
>> +       struct fuse_ring *ring = ring_ent->queue->ring;
>> +       struct fuse_conn *fc = ring->fc;
>> +       struct fuse_req *req = ring_ent->fuse_req;
>> +       ssize_t err = 0;
>> +       bool set_err = false;
>> +
>> +       err = copy_from_user(&req->out.h, &ring_ent->headers->in_out,
>> +                            sizeof(req->out.h));
>> +       if (err) {
>> +               req->out.h.error = err;
>> +               goto out;
>> +       }
>> +
>> +       err = fuse_uring_out_header_has_err(&req->out.h, req, fc);
>> +       if (err) {
>> +               /* req->out.h.error already set */
>> +               goto out;
>> +       }
>> +
>> +       err = fuse_uring_copy_from_ring(ring, req, ring_ent);
>> +       if (err)
>> +               set_err = true;
>> +
>> +out:
>> +       fuse_uring_req_end(ring_ent, set_err, err);
>> +}
>> +
>> +/*
>> + * Get the next fuse req and send it
>> + */
>> +static void fuse_uring_next_fuse_req(struct fuse_ring_ent *ring_ent,
>> +                                    struct fuse_ring_queue *queue,
>> +                                    unsigned int issue_flags)
>> +{
>> +       int err;
>> +       bool has_next;
>> +
>> +retry:
>> +       spin_lock(&queue->lock);
>> +       fuse_uring_ent_avail(ring_ent, queue);
>> +       has_next = fuse_uring_ent_assign_req(ring_ent);
>> +       spin_unlock(&queue->lock);
>> +
>> +       if (has_next) {
>> +               err = fuse_uring_send_next_to_ring(ring_ent, issue_flags);
>> +               if (err)
>> +                       goto retry;
>> +       }
>> +}
>> +
>> +static int fuse_ring_ent_set_commit(struct fuse_ring_ent *ent)
>> +{
>> +       struct fuse_ring_queue *queue = ent->queue;
>> +
>> +       lockdep_assert_held(&queue->lock);
>> +
>> +       if (WARN_ON_ONCE(ent->state != FRRS_USERSPACE))
>> +               return -EIO;
>> +
>> +       ent->state = FRRS_COMMIT;
>> +       list_move(&ent->list, &queue->ent_commit_queue);
>> +
>> +       return 0;
>> +}
>> +
>> +/* FUSE_URING_CMD_COMMIT_AND_FETCH handler */
>> +static int fuse_uring_commit_fetch(struct io_uring_cmd *cmd, int issue_flags,
>> +                                  struct fuse_conn *fc)
>> +{
>> +       const struct fuse_uring_cmd_req *cmd_req = io_uring_sqe_cmd(cmd->sqe);
>> +       struct fuse_ring_ent *ring_ent;
>> +       int err;
>> +       struct fuse_ring *ring = fc->ring;
>> +       struct fuse_ring_queue *queue;
>> +       uint64_t commit_id = READ_ONCE(cmd_req->commit_id);
>> +       unsigned int qid = READ_ONCE(cmd_req->qid);
>> +       struct fuse_pqueue *fpq;
>> +       struct fuse_req *req;
>> +
>> +       err = -ENOTCONN;
>> +       if (!ring)
>> +               return err;
>> +
>> +       if (qid >= ring->nr_queues)
>> +               return -EINVAL;
>> +
>> +       queue = ring->queues[qid];
>> +       if (!queue)
>> +               return err;
>> +       fpq = &queue->fpq;
>> +
>> +       spin_lock(&queue->lock);
>> +       /* Find a request based on the unique ID of the fuse request
>> +        * This should get revised, as it needs a hash calculation and list
>> +        * search. And full struct fuse_pqueue is needed (memory overhead).
>> +        * As well as the link from req to ring_ent.
>> +        */
> 
> imo, the hash calculation and list search seems ok. I can't think of a
> more optimal way of doing it. Instead of using the full struct
> fuse_pqueue, I think we could just have the "struct list_head
> *processing" defined inside "struct fuse_ring_queue" and change
> fuse_request_find() to take in a list_head. I don't think we need a
> dedicated spinlock for the list either. We can just reuse queue->lock,
> as that's (currently) always held already when the processing list is
> accessed.


Please see the attached patch, which uses xarray. Totally untested, though.
I actually found an issue while writing this patch - FR_PENDING was cleared
without holding fiq->lock, but that is important for request_wait_answer().
If something removes req from the list, we entirely loose the ring entry -
can never be used anymore. Personally I think the attached patch is safer.


> 
> 
>> +       req = fuse_request_find(fpq, commit_id);
>> +       err = -ENOENT;
>> +       if (!req) {
>> +               pr_info("qid=%d commit_id %llu not found\n", queue->qid,
>> +                       commit_id);
>> +               spin_unlock(&queue->lock);
>> +               return err;
>> +       }
>> +       list_del_init(&req->list);
>> +       ring_ent = req->ring_entry;
>> +       req->ring_entry = NULL;
> 
> Do we need to set this to NULL, given that the request will be cleaned
> up later in fuse_uring_req_end() anyways?

It is not explicitly set to NULL in that function. Would you mind to keep
it safe? 

> 
>> +
>> +       err = fuse_ring_ent_set_commit(ring_ent);
>> +       if (err != 0) {
>> +               pr_info_ratelimited("qid=%d commit_id %llu state %d",
>> +                                   queue->qid, commit_id, ring_ent->state);
>> +               spin_unlock(&queue->lock);
>> +               return err;
>> +       }
>> +
>> +       ring_ent->cmd = cmd;
>> +       spin_unlock(&queue->lock);
>> +
>> +       /* without the queue lock, as other locks are taken */
>> +       fuse_uring_commit(ring_ent, issue_flags);
>> +
>> +       /*
>> +        * Fetching the next request is absolutely required as queued
>> +        * fuse requests would otherwise not get processed - committing
>> +        * and fetching is done in one step vs legacy fuse, which has separated
>> +        * read (fetch request) and write (commit result).
>> +        */
>> +       fuse_uring_next_fuse_req(ring_ent, queue, issue_flags);
> 
> If there's no request ready to read next, then no request will be
> fetched and this will return. However, as I understand it, once the
> uring is registered, userspace should only be interacting with the
> uring via FUSE_IO_URING_CMD_COMMIT_AND_FETCH. However for the case
> where no request was ready to read, it seems like userspace would have
> nothing to commit when it wants to fetch the next request?

We have

FUSE_IO_URING_CMD_REGISTER 
FUSE_IO_URING_CMD_COMMIT_AND_FETCH


After _CMD_REGISTER the corresponding ring-entry is ready to get fuse
requests and waiting. After it gets a request assigned and handles it
by fuse server the _COMMIT_AND_FETCH scheme applies. Did you possibly
miss that _CMD_REGISTER will already have it waiting?


> 
> A more general question though: I imagine the most common use case
> from the server side is waiting / polling until there is a request to
> fetch. Could we not just do that here in the kernel instead with
> adding a waitqueue mechanism and having fuse_uring_next_fuse_req()
> only return when there is a request available? It seems like that
> would reduce the amount of overhead instead of doing the
> waiting/checking from the server side?

The io-uring interface says that we should return -EIOCBQUEUED. If we
would wait here, other SQEs that are submitted in parallel by
fuse-server couldn't be handled anymore, as we wouldn't return
to io-uring (all of this is in io-uring task context).

> 
>> +       return 0;
>> +}
>> +
>>  /*
>>   * fuse_uring_req_fetch command handling
>>   */
>> @@ -325,6 +767,14 @@ int __maybe_unused fuse_uring_cmd(struct io_uring_cmd *cmd,
>>                         return err;
>>                 }
>>                 break;
>> +       case FUSE_IO_URING_CMD_COMMIT_AND_FETCH:
>> +               err = fuse_uring_commit_fetch(cmd, issue_flags, fc);
>> +               if (err) {
>> +                       pr_info_once("FUSE_IO_URING_COMMIT_AND_FETCH failed err=%d\n",
>> +                                    err);
>> +                       return err;
>> +               }
>> +               break;
>>         default:
>>                 return -EINVAL;
>>         }
>> diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
>> index 4e46dd65196d26dabc62dada33b17de9aa511c08..80f1c62d4df7f0ca77c4d5179068df6ffdbf7d85 100644
>> --- a/fs/fuse/dev_uring_i.h
>> +++ b/fs/fuse/dev_uring_i.h
>> @@ -20,6 +20,9 @@ enum fuse_ring_req_state {
>>         /* The ring entry is waiting for new fuse requests */
>>         FRRS_AVAILABLE,
>>
>> +       /* The ring entry got assigned a fuse req */
>> +       FRRS_FUSE_REQ,
>> +
>>         /* The ring entry is in or on the way to user space */
>>         FRRS_USERSPACE,
>>  };
>> @@ -70,7 +73,16 @@ struct fuse_ring_queue {
>>          * entries in the process of being committed or in the process
>>          * to be sent to userspace
>>          */
>> +       struct list_head ent_w_req_queue;
> 
> What does the w in this stand for? I find the name ambiguous here.

"entry-with-request-queue".  Do you have another naming suggestion?


Thanks,
Bernd


[-- Attachment #2: ent-xarray.patch --]
[-- Type: text/x-patch, Size: 5767 bytes --]

commit 48eacf47fc2f6a6a2bab35579451825282eb4f1f
Author: Bernd Schubert <[email protected]>
Date:   Mon Jan 20 00:34:41 2025 +0100

    xarray

diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index 40a0c19ab4d7..b8d2cea1f72b 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -7,6 +7,7 @@
 #include "fuse_i.h"
 #include "dev_uring_i.h"
 #include "fuse_dev_i.h"
+#include <linux/xarray.h>
 
 #include <linux/fs.h>
 #include <linux/io_uring/cmd.h>
@@ -55,7 +56,6 @@ void fuse_uring_destruct(struct fuse_conn *fc)
 		WARN_ON(!list_empty(&queue->ent_commit_queue));
 		WARN_ON(!list_empty(&queue->ent_in_userspace));
 
-		kfree(queue->fpq.processing);
 		kfree(queue);
 		ring->queues[qid] = NULL;
 	}
@@ -135,13 +135,11 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
 	INIT_LIST_HEAD(&queue->ent_in_userspace);
 	INIT_LIST_HEAD(&queue->fuse_req_queue);
 
-	queue->fpq.processing = pq;
-	fuse_pqueue_init(&queue->fpq);
+	xa_init(&queue->ent_xa);
 
 	spin_lock(&fc->lock);
 	if (ring->queues[qid]) {
 		spin_unlock(&fc->lock);
-		kfree(queue->fpq.processing);
 		kfree(queue);
 		return ring->queues[qid];
 	}
@@ -240,7 +238,7 @@ static int fuse_uring_args_to_ring(struct fuse_ring *ring, struct fuse_req *req,
 	struct iov_iter iter;
 	struct fuse_uring_ent_in_out ent_in_out = {
 		.flags = 0,
-		.commit_id = req->in.h.unique,
+		.commit_id = ent->id
 	};
 
 	err = import_ubuf(ITER_DEST, ent->payload, ring->max_payload_sz, &iter);
@@ -373,19 +371,6 @@ static void fuse_uring_ent_avail(struct fuse_ring_ent *ent,
 	ent->state = FRRS_AVAILABLE;
 }
 
-/* Used to find the request on SQE commit */
-static void fuse_uring_add_to_pq(struct fuse_ring_ent *ent,
-				 struct fuse_req *req)
-{
-	struct fuse_ring_queue *queue = ent->queue;
-	struct fuse_pqueue *fpq = &queue->fpq;
-	unsigned int hash;
-
-	req->ring_entry = ent;
-	hash = fuse_req_hash(req->in.h.unique);
-	list_move_tail(&req->list, &fpq->processing[hash]);
-}
-
 /*
  * Assign a fuse queue entry to the given entry
  */
@@ -410,7 +395,9 @@ static void fuse_uring_add_req_to_ring_ent(struct fuse_ring_ent *ent,
 	ent->fuse_req = req;
 	ent->state = FRRS_FUSE_REQ;
 	list_move(&ent->list, &queue->ent_w_req_queue);
-	fuse_uring_add_to_pq(ent, req);
+
+	WARN_ON_ONCE(!list_empty(&ent->proc_list));
+	list_move_tail(&req->list, &ent->proc_list);
 }
 
 /*
@@ -450,6 +437,15 @@ static void fuse_uring_commit(struct fuse_ring_ent *ent,
 	struct fuse_req *req = ent->fuse_req;
 	ssize_t err = 0;
 
+	/*
+	 * The request was removed from proc_list - we are not going to further
+	 * process it
+	 */
+	if (list_empty(&ent->proc_list))
+		return;
+
+	list_del_init(&req->list);
+
 	err = copy_from_user(&req->out.h, &ent->headers->in_out,
 			     sizeof(req->out.h));
 	if (err) {
@@ -506,6 +502,12 @@ static int fuse_ring_ent_set_commit(struct fuse_ring_ent *ent)
 	return 0;
 }
 
+static struct fuse_ring_ent *
+fuse_uring_find_ring_ent(struct fuse_ring_queue *queue, u32 id)
+{
+	return xa_load(&queue->ent_xa, id);
+}
+
 /* FUSE_URING_CMD_COMMIT_AND_FETCH handler */
 static int fuse_uring_commit_fetch(struct io_uring_cmd *cmd, int issue_flags,
 				   struct fuse_conn *fc)
@@ -517,7 +519,6 @@ static int fuse_uring_commit_fetch(struct io_uring_cmd *cmd, int issue_flags,
 	struct fuse_ring_queue *queue;
 	uint64_t commit_id = READ_ONCE(cmd_req->commit_id);
 	unsigned int qid = READ_ONCE(cmd_req->qid);
-	struct fuse_pqueue *fpq;
 	struct fuse_req *req;
 
 	err = -ENOTCONN;
@@ -530,28 +531,20 @@ static int fuse_uring_commit_fetch(struct io_uring_cmd *cmd, int issue_flags,
 	queue = ring->queues[qid];
 	if (!queue)
 		return err;
-	fpq = &queue->fpq;
 
 	spin_lock(&queue->lock);
-	/* Find a request based on the unique ID of the fuse request
-	 * This should get revised, as it needs a hash calculation and list
-	 * search. And full struct fuse_pqueue is needed (memory overhead).
-	 * As well as the link from req to ring_ent.
-	 */
-	req = fuse_request_find(fpq, commit_id);
+
 	err = -ENOENT;
-	if (!req) {
-		pr_info("qid=%d commit_id %llu not found\n", queue->qid,
-			commit_id);
+	ent = fuse_uring_find_ring_ent(queue, commit_id);
+	if (!ent) {
 		spin_unlock(&queue->lock);
 		return err;
 	}
-	list_del_init(&req->list);
-	ent = req->ring_entry;
-	req->ring_entry = NULL;
+
+	req = ent->fuse_req;
 
 	err = fuse_ring_ent_set_commit(ent);
-	if (err != 0) {
+	if (err != 0 && !list_empty(&ent->proc_list)) {
 		pr_info_ratelimited("qid=%d commit_id %llu state %d",
 				    queue->qid, commit_id, ent->state);
 		spin_unlock(&queue->lock);
@@ -658,11 +651,19 @@ fuse_uring_create_ring_ent(struct io_uring_cmd *cmd,
 		return ERR_PTR(err);
 
 	INIT_LIST_HEAD(&ent->list);
+	INIT_LIST_HEAD(&ent->proc_list);
 
 	ent->queue = queue;
 	ent->headers = iov[0].iov_base;
 	ent->payload = iov[1].iov_base;
 
+	// Generate a unique ID and add to XArray
+	err = xa_alloc(&queue->ent_xa, &ent->id, ent, xa_limit_32b, GFP_KERNEL);
+	if (err) {
+		kfree(ent);
+		return ERR_PTR(err);
+	}
+
 	return ent;
 }
 
diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
index 44bf237f0d5a..b77f1a485c8b 100644
--- a/fs/fuse/dev_uring_i.h
+++ b/fs/fuse/dev_uring_i.h
@@ -44,7 +44,16 @@ struct fuse_ring_ent {
 
 	enum fuse_ring_req_state state;
 
+	/*
+	 * processing queue, as list to comply with remaining fuse code
+	 * that expects the entry on a list and might also remove it
+	 * from the list
+	 */
+	struct list_head proc_list;
+
 	struct fuse_req *fuse_req;
+
+	u32 id; /* entry ID*/
 };
 
 struct fuse_ring_queue {
@@ -79,7 +88,8 @@ struct fuse_ring_queue {
 	/* fuse requests waiting for an entry slot */
 	struct list_head fuse_req_queue;
 
-	struct fuse_pqueue fpq;
+	/* XArray to store and find ring entries */
+	struct xarray ent_xa;
 };
 
 /**

^ permalink raw reply related	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2025-01-20  0:33 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-07  0:25 [PATCH v9 00/17] fuse: fuse-over-io-uring Bernd Schubert
2025-01-07  0:25 ` [PATCH v9 01/17] fuse: rename to fuse_dev_end_requests and make non-static Bernd Schubert
2025-01-07  0:25 ` [PATCH v9 02/17] fuse: Move fuse_get_dev to header file Bernd Schubert
2025-01-07  0:25 ` [PATCH v9 03/17] fuse: Move request bits Bernd Schubert
2025-01-07  0:25 ` [PATCH v9 04/17] fuse: Add fuse-io-uring design documentation Bernd Schubert
2025-01-07  0:25 ` [PATCH v9 05/17] fuse: make args->in_args[0] to be always the header Bernd Schubert
2025-01-07  0:25 ` [PATCH v9 06/17] fuse: {io-uring} Handle SQEs - register commands Bernd Schubert
2025-01-07  9:56   ` Luis Henriques
2025-01-07 12:07     ` Bernd Schubert
2025-01-17 11:06   ` Pavel Begunkov
2025-01-19 22:47     ` Bernd Schubert
2025-01-07  0:25 ` [PATCH v9 07/17] fuse: Make fuse_copy non static Bernd Schubert
2025-01-07  0:25 ` [PATCH v9 08/17] fuse: Add fuse-io-uring handling into fuse_copy Bernd Schubert
2025-01-10 22:18   ` Joanne Koong
2025-01-07  0:25 ` [PATCH v9 09/17] fuse: {io-uring} Make hash-list req unique finding functions non-static Bernd Schubert
2025-01-07  0:25 ` [PATCH v9 10/17] fuse: Add io-uring sqe commit and fetch support Bernd Schubert
2025-01-07 14:42   ` Luis Henriques
2025-01-07 15:59     ` Bernd Schubert
2025-01-07 16:21       ` Luis Henriques
2025-01-13 22:44   ` Joanne Koong
2025-01-20  0:33     ` Bernd Schubert
2025-01-17 11:18   ` Pavel Begunkov
2025-01-17 11:20     ` Bernd Schubert
2025-01-07  0:25 ` [PATCH v9 11/17] fuse: {io-uring} Handle teardown of ring entries Bernd Schubert
2025-01-07 15:31   ` Luis Henriques
2025-01-17 11:23   ` Pavel Begunkov
2025-01-07  0:25 ` [PATCH v9 12/17] fuse: {io-uring} Make fuse_dev_queue_{interrupt,forget} non-static Bernd Schubert
2025-01-07  0:25 ` [PATCH v9 13/17] fuse: Allow to queue fg requests through io-uring Bernd Schubert
2025-01-07 15:54   ` Luis Henriques
2025-01-07 18:59     ` Bernd Schubert
2025-01-07 21:25       ` Luis Henriques
2025-01-17 11:47   ` Pavel Begunkov
2025-01-17 21:52   ` Bernd Schubert
2025-01-07  0:25 ` [PATCH v9 14/17] fuse: Allow to queue bg " Bernd Schubert
2025-01-17 11:49   ` Pavel Begunkov
2025-01-07  0:25 ` [PATCH v9 15/17] fuse: {io-uring} Prevent mount point hang on fuse-server termination Bernd Schubert
2025-01-07 16:14   ` Luis Henriques
2025-01-07 19:03     ` Bernd Schubert
2025-01-17 11:52   ` Pavel Begunkov
2025-01-07  0:25 ` [PATCH v9 16/17] fuse: block request allocation until io-uring init is complete Bernd Schubert
2025-01-07  0:25 ` [PATCH v9 17/17] fuse: enable fuse-over-io-uring Bernd Schubert
2025-01-17 11:52   ` Pavel Begunkov
2025-01-17  9:07 ` [PATCH v9 00/17] fuse: fuse-over-io-uring Miklos Szeredi
2025-01-17  9:12   ` Bernd Schubert
2025-01-17 12:01     ` Pavel Begunkov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox