public inbox for [email protected]
 help / color / mirror / Atom feed
* [PATCH RFC v6 00/16] fuse: fuse-over-io-uring
@ 2024-11-21 23:43 Bernd Schubert
  2024-11-21 23:43 ` [PATCH RFC v6 01/16] fuse: rename to fuse_dev_end_requests and make non-static Bernd Schubert
                   ` (15 more replies)
  0 siblings, 16 replies; 24+ messages in thread
From: Bernd Schubert @ 2024-11-21 23:43 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
	Bernd Schubert

[Still marked as RFC as I need to run tests on this version
and need to review myself. The design should be complete now.]

This adds support for uring communication between kernel and
userspace daemon using opcode the IORING_OP_URING_CMD. The basic
approach was taken from ublk.

Motivation for these patches is all to increase fuse performance,
by:
- Reducing kernel/userspace context switches
    - Part of that is given by the ring ring - handling multiple 
      requests on either side of kernel/userspace without the need
      to switch per request
    - Part of that is FUSE_URING_REQ_COMMIT_AND_FETCH, i.e. submitting 
      the result of a request and fetching the next fuse request
      in one step. In contrary to legacy read/write to /dev/fuse
- Core and numa affinity - one ring per core, which allows to
  avoid cpu core context switches

A more detailed motivation description can be found in the
introction of previous patch series
https://lore.kernel.org/r/[email protected]
That description also includes benchmark results with RFCv1.
Performance with the current series needs to be tested, but will
be lower, as several optimization patches are missing, like
wake-up on the same core. These optimizations will be submitted
after merging the main changes.

The corresponding libfuse patches are on my uring branch, but needs
cleanup for submission - that will be done once the kernel design
will not change anymore
https://github.com/bsbernd/libfuse/tree/uring

Testing with that libfuse branch is possible by running something
like:

example/passthrough_hp -o allow_other --debug-fuse --nopassthrough \
--uring  --uring-q-depth=128 /scratch/source /scratch/dest

With the --debug-fuse option one should see CQE in the request type,
if requests are received via io-uring:

cqe unique: 4, opcode: GETATTR (3), nodeid: 1, insize: 16, pid: 7060
    unique: 4, result=104

Without the --uring option "cqe" is replaced by the default "dev"

dev unique: 4, opcode: GETATTR (3), nodeid: 1, insize: 56, pid: 7117
   unique: 4, success, outsize: 120

Future work
- different payload sizes per ring
- zero copy

Signed-off-by: Bernd Schubert <[email protected]>
---
Changes in v6:
- Update to linux-6.12
- Use 'struct fuse_iqueue_ops' and redirect fiq->ops once
  the ring is ready.
- Fix return code from fuse_uring_copy_from_ring on
  copy_from_user failure (Dan Carpenter / kernel test robot)
- Avoid list iteration in fuse_uring_cancel (Joanne)
- Simplified struct fuse_ring_req_header
	- Adds a new 'struct struct fuse_ring_ent_in_out'
- Fix assigning ring->queues[qid] in fuse_uring_create_queue,
  it was too early, resulting in races
- Add back 'FRRS_INVALID = 0' to ensure ring-ent states always
  have a value > 0
- Avoid assigning struct io_uring_cmd *cmd->pdu multiple times,
  once on settings up IO_URING_F_CANCEL is sufficient for sending
  the request as well.
- Link to v5: https://lore.kernel.org/r/[email protected]

Changes in v5:
- Main focus in v5 is the separation of headers from payload,
  which required to introduce 'struct fuse_zero_in'.
- Addressed several teardown issues, that were a regression in v4.
- Fixed "BUG: sleeping function called" due to allocation while
  holding a lock reported by David Wei
- Fix function comment reported by kernel test rebot
- Fix set but unused variabled reported by test robot
- Link to v4: https://lore.kernel.org/r/[email protected]

Changes in v4:
- Removal of ioctls, all configuration is done dynamically
  on the arrival of FUSE_URING_REQ_FETCH
- ring entries are not (and cannot be without config ioctls)
  allocated as array of the ring/queue - removal of the tag
  variable. Finding ring entries on FUSE_URING_REQ_COMMIT_AND_FETCH
  is more cumbersome now and needs an almost unused 
  struct fuse_pqueue per fuse_ring_queue and uses the unique
  id of fuse requests.
- No device clones needed for to workaroung hanging mounts
  on fuse-server/daemon termination, handled by IO_URING_F_CANCEL
- Removal of sync/async ring entry types
- Addressed some of Joannes comments, but probably not all
- Only very basic tests run for v3, as more updates should follow quickly.

Changes in v3
- Removed the __wake_on_current_cpu optimization (for now
  as that needs to go through another subsystem/tree) ,
  removing it means a significant performance drop)
- Removed MMAP (Miklos)
- Switched to two IOCTLs, instead of one ioctl that had a field
  for subcommands (ring and queue config) (Miklos)
- The ring entry state is a single state and not a bitmask anymore
  (Josef)
- Addressed several other comments from Josef (I need to go over
  the RFCv2 review again, I'm not sure if everything is addressed
  already)

- Link to v3: https://lore.kernel.org/r/20240901-b4-fuse-uring-rfcv3-without-mmap-v3-0-9207f7391444@ddn.com
- Link to v2: https://lore.kernel.org/all/[email protected]/
- Link to v1: https://lore.kernel.org/r/[email protected]

---
Bernd Schubert (15):
      fuse: rename to fuse_dev_end_requests and make non-static
      fuse: Move fuse_get_dev to header file
      fuse: Move request bits
      fuse: Add fuse-io-uring design documentation
      fuse: make args->in_args[0] to be always the header
      fuse: {uring} Handle SQEs - register commands
      fuse: Make fuse_copy non static
      fuse: Add fuse-io-uring handling into fuse_copy
      fuse: {uring} Add uring sqe commit and fetch support
      fuse: {uring} Handle teardown of ring entries
      fuse: {uring} Allow to queue fg requests through io-uring
      fuse: {uring} Allow to queue to the ring
      fuse: {uring} Handle IO_URING_F_TASK_DEAD
      fuse: {io-uring} Prevent mount point hang on fuse-server termination
      fuse: enable fuse-over-io-uring

Pavel Begunkov (1):
      io_uring/cmd: let cmds to know about dying task

 Documentation/filesystems/fuse-io-uring.rst |  101 ++
 fs/fuse/Kconfig                             |   12 +
 fs/fuse/Makefile                            |    1 +
 fs/fuse/dax.c                               |   13 +-
 fs/fuse/dev.c                               |  139 +--
 fs/fuse/dev_uring.c                         | 1339 +++++++++++++++++++++++++++
 fs/fuse/dev_uring_i.h                       |  201 ++++
 fs/fuse/dir.c                               |   41 +-
 fs/fuse/fuse_dev_i.h                        |   69 ++
 fs/fuse/fuse_i.h                            |   21 +
 fs/fuse/inode.c                             |    5 +-
 fs/fuse/xattr.c                             |    9 +-
 include/linux/io_uring_types.h              |    1 +
 include/uapi/linux/fuse.h                   |   57 ++
 io_uring/uring_cmd.c                        |    6 +-
 15 files changed, 1939 insertions(+), 76 deletions(-)
---
base-commit: 3022e9d00ebec31ed435ae0844e3f235dba998a9
change-id: 20241015-fuse-uring-for-6-10-rfc4-61d0fc6851f8

Best regards,
-- 
Bernd Schubert <[email protected]>


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH RFC v6 01/16] fuse: rename to fuse_dev_end_requests and make non-static
  2024-11-21 23:43 [PATCH RFC v6 00/16] fuse: fuse-over-io-uring Bernd Schubert
@ 2024-11-21 23:43 ` Bernd Schubert
  2024-11-21 23:43 ` [PATCH RFC v6 02/16] fuse: Move fuse_get_dev to header file Bernd Schubert
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 24+ messages in thread
From: Bernd Schubert @ 2024-11-21 23:43 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
	Bernd Schubert

This function is needed by fuse_uring.c to clean ring queues,
so make it non static. Especially in non-static mode the function
name 'end_requests' should be prefixed with fuse_

Signed-off-by: Bernd Schubert <[email protected]>
Reviewed-by: Josef Bacik <[email protected]>
---
 fs/fuse/dev.c        | 11 +++++------
 fs/fuse/fuse_dev_i.h | 15 +++++++++++++++
 2 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 1f64ae6d7a69e53c8d96f2e1f5caca3ff2b4ab26..09b73044a9b6748767d2479dda0a09a97b8b4c0f 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -7,6 +7,7 @@
 */
 
 #include "fuse_i.h"
+#include "fuse_dev_i.h"
 
 #include <linux/init.h>
 #include <linux/module.h>
@@ -34,8 +35,6 @@ MODULE_ALIAS("devname:fuse");
 
 static struct kmem_cache *fuse_req_cachep;
 
-static void end_requests(struct list_head *head);
-
 static struct fuse_dev *fuse_get_dev(struct file *file)
 {
 	/*
@@ -1873,7 +1872,7 @@ static void fuse_resend(struct fuse_conn *fc)
 		spin_unlock(&fiq->lock);
 		list_for_each_entry(req, &to_queue, list)
 			clear_bit(FR_PENDING, &req->flags);
-		end_requests(&to_queue);
+		fuse_dev_end_requests(&to_queue);
 		return;
 	}
 	/* iq and pq requests are both oldest to newest */
@@ -2192,7 +2191,7 @@ static __poll_t fuse_dev_poll(struct file *file, poll_table *wait)
 }
 
 /* Abort all requests on the given list (pending or processing) */
-static void end_requests(struct list_head *head)
+void fuse_dev_end_requests(struct list_head *head)
 {
 	while (!list_empty(head)) {
 		struct fuse_req *req;
@@ -2295,7 +2294,7 @@ void fuse_abort_conn(struct fuse_conn *fc)
 		wake_up_all(&fc->blocked_waitq);
 		spin_unlock(&fc->lock);
 
-		end_requests(&to_end);
+		fuse_dev_end_requests(&to_end);
 	} else {
 		spin_unlock(&fc->lock);
 	}
@@ -2325,7 +2324,7 @@ int fuse_dev_release(struct inode *inode, struct file *file)
 			list_splice_init(&fpq->processing[i], &to_end);
 		spin_unlock(&fpq->lock);
 
-		end_requests(&to_end);
+		fuse_dev_end_requests(&to_end);
 
 		/* Are we the last open device? */
 		if (atomic_dec_and_test(&fc->dev_count)) {
diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
new file mode 100644
index 0000000000000000000000000000000000000000..5a1b8a2775d84274abee46eabb3000345b2d9da0
--- /dev/null
+++ b/fs/fuse/fuse_dev_i.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * FUSE: Filesystem in Userspace
+ * Copyright (C) 2001-2008  Miklos Szeredi <[email protected]>
+ */
+#ifndef _FS_FUSE_DEV_I_H
+#define _FS_FUSE_DEV_I_H
+
+#include <linux/types.h>
+
+void fuse_dev_end_requests(struct list_head *head);
+
+#endif
+
+

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH RFC v6 02/16] fuse: Move fuse_get_dev to header file
  2024-11-21 23:43 [PATCH RFC v6 00/16] fuse: fuse-over-io-uring Bernd Schubert
  2024-11-21 23:43 ` [PATCH RFC v6 01/16] fuse: rename to fuse_dev_end_requests and make non-static Bernd Schubert
@ 2024-11-21 23:43 ` Bernd Schubert
  2024-11-21 23:43 ` [PATCH RFC v6 03/16] fuse: Move request bits Bernd Schubert
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 24+ messages in thread
From: Bernd Schubert @ 2024-11-21 23:43 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
	Bernd Schubert

Another preparation patch, as this function will be needed by
fuse/dev.c and fuse/dev_uring.c.

Signed-off-by: Bernd Schubert <[email protected]>
Reviewed-by: Josef Bacik <[email protected]>
---
 fs/fuse/dev.c        | 9 ---------
 fs/fuse/fuse_dev_i.h | 9 +++++++++
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 09b73044a9b6748767d2479dda0a09a97b8b4c0f..649513b55906d2aef99f79a942c2c63113796b5a 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -35,15 +35,6 @@ MODULE_ALIAS("devname:fuse");
 
 static struct kmem_cache *fuse_req_cachep;
 
-static struct fuse_dev *fuse_get_dev(struct file *file)
-{
-	/*
-	 * Lockless access is OK, because file->private data is set
-	 * once during mount and is valid until the file is released.
-	 */
-	return READ_ONCE(file->private_data);
-}
-
 static void fuse_request_init(struct fuse_mount *fm, struct fuse_req *req)
 {
 	INIT_LIST_HEAD(&req->list);
diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
index 5a1b8a2775d84274abee46eabb3000345b2d9da0..b38e67b3f889f3fa08f7279e3309cde908527146 100644
--- a/fs/fuse/fuse_dev_i.h
+++ b/fs/fuse/fuse_dev_i.h
@@ -8,6 +8,15 @@
 
 #include <linux/types.h>
 
+static inline struct fuse_dev *fuse_get_dev(struct file *file)
+{
+	/*
+	 * Lockless access is OK, because file->private data is set
+	 * once during mount and is valid until the file is released.
+	 */
+	return READ_ONCE(file->private_data);
+}
+
 void fuse_dev_end_requests(struct list_head *head);
 
 #endif

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH RFC v6 03/16] fuse: Move request bits
  2024-11-21 23:43 [PATCH RFC v6 00/16] fuse: fuse-over-io-uring Bernd Schubert
  2024-11-21 23:43 ` [PATCH RFC v6 01/16] fuse: rename to fuse_dev_end_requests and make non-static Bernd Schubert
  2024-11-21 23:43 ` [PATCH RFC v6 02/16] fuse: Move fuse_get_dev to header file Bernd Schubert
@ 2024-11-21 23:43 ` Bernd Schubert
  2024-11-21 23:43 ` [PATCH RFC v6 04/16] fuse: Add fuse-io-uring design documentation Bernd Schubert
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 24+ messages in thread
From: Bernd Schubert @ 2024-11-21 23:43 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
	Bernd Schubert

These are needed by dev_uring functions as well

Signed-off-by: Bernd Schubert <[email protected]>
Reviewed-by: Josef Bacik <[email protected]>
---
 fs/fuse/dev.c        | 4 ----
 fs/fuse/fuse_dev_i.h | 4 ++++
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 649513b55906d2aef99f79a942c2c63113796b5a..fd8898b0c1cca4d117982d5208d78078472b0dfb 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -29,10 +29,6 @@
 MODULE_ALIAS_MISCDEV(FUSE_MINOR);
 MODULE_ALIAS("devname:fuse");
 
-/* Ordinary requests have even IDs, while interrupts IDs are odd */
-#define FUSE_INT_REQ_BIT (1ULL << 0)
-#define FUSE_REQ_ID_STEP (1ULL << 1)
-
 static struct kmem_cache *fuse_req_cachep;
 
 static void fuse_request_init(struct fuse_mount *fm, struct fuse_req *req)
diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
index b38e67b3f889f3fa08f7279e3309cde908527146..6c506f040d5fb57dae746880c657a95637ac50ce 100644
--- a/fs/fuse/fuse_dev_i.h
+++ b/fs/fuse/fuse_dev_i.h
@@ -8,6 +8,10 @@
 
 #include <linux/types.h>
 
+/* Ordinary requests have even IDs, while interrupts IDs are odd */
+#define FUSE_INT_REQ_BIT (1ULL << 0)
+#define FUSE_REQ_ID_STEP (1ULL << 1)
+
 static inline struct fuse_dev *fuse_get_dev(struct file *file)
 {
 	/*

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH RFC v6 04/16] fuse: Add fuse-io-uring design documentation
  2024-11-21 23:43 [PATCH RFC v6 00/16] fuse: fuse-over-io-uring Bernd Schubert
                   ` (2 preceding siblings ...)
  2024-11-21 23:43 ` [PATCH RFC v6 03/16] fuse: Move request bits Bernd Schubert
@ 2024-11-21 23:43 ` Bernd Schubert
  2024-11-21 23:43 ` [PATCH RFC v6 05/16] fuse: make args->in_args[0] to be always the header Bernd Schubert
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 24+ messages in thread
From: Bernd Schubert @ 2024-11-21 23:43 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
	Bernd Schubert

Signed-off-by: Bernd Schubert <[email protected]>
---
 Documentation/filesystems/fuse-io-uring.rst | 101 ++++++++++++++++++++++++++++
 1 file changed, 101 insertions(+)

diff --git a/Documentation/filesystems/fuse-io-uring.rst b/Documentation/filesystems/fuse-io-uring.rst
new file mode 100644
index 0000000000000000000000000000000000000000..50fdba1ea566588be3663e29b04bb9bbb6c9e4fb
--- /dev/null
+++ b/Documentation/filesystems/fuse-io-uring.rst
@@ -0,0 +1,101 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============================
+FUSE Uring design documentation
+==============================
+
+This documentation covers basic details how the fuse
+kernel/userspace communication through uring is configured
+and works. For generic details about FUSE see fuse.rst.
+
+This document also covers the current interface, which is
+still in development and might change.
+
+Limitations
+===========
+As of now not all requests types are supported through uring, userspace
+is required to also handle requests through /dev/fuse after
+uring setup is complete.  Specifically notifications (initiated from
+the daemon side) and interrupts.
+
+Fuse io-uring configuration
+========================
+
+Fuse kernel requests are queued through the classical /dev/fuse
+read/write interface - until uring setup is complete.
+
+In order to set up fuse-over-io-uring fuse-server (user-space)
+needs to submit SQEs (opcode = IORING_OP_URING_CMD) to the
+/dev/fuse connection file descriptor. Initial submit is with
+the sub command FUSE_URING_REQ_FETCH, which will just register
+entries to be available in the kernel.
+
+Once at least one entry per queue is submitted, kernel starts
+to enqueue to ring queues.
+Note, every CPU core has its own fuse-io-uring queue.
+Userspace handles the CQE/fuse-request and submits the result as
+subcommand FUSE_URING_REQ_COMMIT_AND_FETCH - kernel completes
+the requests and also marks the entry available again. If there are
+pending requests waiting the request will be immediately submitted
+to the daemon again.
+
+Initial SQE
+-----------
+
+ |                                    |  FUSE filesystem daemon
+ |                                    |
+ |                                    |  >io_uring_submit()
+ |                                    |   IORING_OP_URING_CMD /
+ |                                    |   FUSE_URING_REQ_FETCH
+ |                                    |  [wait cqe]
+ |                                    |   >io_uring_wait_cqe() or
+ |                                    |   >io_uring_submit_and_wait()
+ |                                    |
+ |  >fuse_uring_cmd()                 |
+ |   >fuse_uring_fetch()              |
+ |    >fuse_uring_ent_release()       |
+
+
+Sending requests with CQEs
+--------------------------
+
+ |                                         |  FUSE filesystem daemon
+ |                                         |  [waiting for CQEs]
+ |  "rm /mnt/fuse/file"                    |
+ |                                         |
+ |  >sys_unlink()                          |
+ |    >fuse_unlink()                       |
+ |      [allocate request]                 |
+ |      >__fuse_request_send()             |
+ |        ...                              |
+ |       >fuse_uring_queue_fuse_req        |
+ |        [queue request on fg or          |
+ |          bg queue]                      |
+ |         >fuse_uring_assign_ring_entry() |
+ |         >fuse_uring_send_to_ring()      |
+ |          >fuse_uring_copy_to_ring()     |
+ |          >io_uring_cmd_done()           |
+ |          >request_wait_answer()         |
+ |           [sleep on req->waitq]         |
+ |                                         |  [receives and handles CQE]
+ |                                         |  [submit result and fetch next]
+ |                                         |  >io_uring_submit()
+ |                                         |   IORING_OP_URING_CMD/
+ |                                         |   FUSE_URING_REQ_COMMIT_AND_FETCH
+ |  >fuse_uring_cmd()                      |
+ |   >fuse_uring_commit_and_release()      |
+ |    >fuse_uring_copy_from_ring()         |
+ |     [ copy the result to the fuse req]  |
+ |     >fuse_uring_req_end_and_get_next()  |
+ |      >fuse_request_end()                |
+ |       [wake up req->waitq]              |
+ |      >fuse_uring_ent_release_and_fetch()|
+ |       [wait or handle next req]         |
+ |                                         |
+ |                                         |
+ |       [req->waitq woken up]             |
+ |    <fuse_unlink()                       |
+ |  <sys_unlink()                          |
+
+
+

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH RFC v6 05/16] fuse: make args->in_args[0] to be always the header
  2024-11-21 23:43 [PATCH RFC v6 00/16] fuse: fuse-over-io-uring Bernd Schubert
                   ` (3 preceding siblings ...)
  2024-11-21 23:43 ` [PATCH RFC v6 04/16] fuse: Add fuse-io-uring design documentation Bernd Schubert
@ 2024-11-21 23:43 ` Bernd Schubert
  2024-11-23  9:01   ` Miklos Szeredi
  2024-11-21 23:43 ` [PATCH RFC v6 06/16] fuse: {uring} Handle SQEs - register commands Bernd Schubert
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 24+ messages in thread
From: Bernd Schubert @ 2024-11-21 23:43 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
	Bernd Schubert

This change sets up FUSE operations to have headers in args.in_args[0],
even for opcodes without an actual header. We do this to prepare for
cleanly separating payload from headers in the future.

For opcodes without a header, we use a zero-sized struct as a
placeholder. This approach:
- Keeps things consistent across all FUSE operations
- Will help with payload alignment later
- Avoids future issues when header sizes change

Signed-off-by: Bernd Schubert <[email protected]>
---
 fs/fuse/dax.c    | 13 ++++++++-----
 fs/fuse/dev.c    | 24 ++++++++++++++++++++----
 fs/fuse/dir.c    | 41 +++++++++++++++++++++++++++--------------
 fs/fuse/fuse_i.h |  7 +++++++
 fs/fuse/xattr.c  |  9 ++++++---
 5 files changed, 68 insertions(+), 26 deletions(-)

diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
index 12ef91d170bb3091ac35a33d2b9dc38330b00948..e459b8134ccb089f971bebf8da1f7fc5199c1271 100644
--- a/fs/fuse/dax.c
+++ b/fs/fuse/dax.c
@@ -237,14 +237,17 @@ static int fuse_send_removemapping(struct inode *inode,
 	struct fuse_inode *fi = get_fuse_inode(inode);
 	struct fuse_mount *fm = get_fuse_mount(inode);
 	FUSE_ARGS(args);
+	struct fuse_zero_in zero_arg;
 
 	args.opcode = FUSE_REMOVEMAPPING;
 	args.nodeid = fi->nodeid;
-	args.in_numargs = 2;
-	args.in_args[0].size = sizeof(*inargp);
-	args.in_args[0].value = inargp;
-	args.in_args[1].size = inargp->count * sizeof(*remove_one);
-	args.in_args[1].value = remove_one;
+	args.in_numargs = 3;
+	args.in_args[0].size = sizeof(zero_arg);
+	args.in_args[0].value = &zero_arg;
+	args.in_args[1].size = sizeof(*inargp);
+	args.in_args[1].value = inargp;
+	args.in_args[2].size = inargp->count * sizeof(*remove_one);
+	args.in_args[2].value = remove_one;
 	return fuse_simple_request(fm, &args);
 }
 
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index fd8898b0c1cca4d117982d5208d78078472b0dfb..6cb45b5332c45f322e9163469ffd114cbc07dc4f 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1053,6 +1053,19 @@ static int fuse_copy_args(struct fuse_copy_state *cs, unsigned numargs,
 
 	for (i = 0; !err && i < numargs; i++)  {
 		struct fuse_arg *arg = &args[i];
+
+		/* zero headers */
+		if (arg->size == 0) {
+			if (WARN_ON_ONCE(i != 0)) {
+				if (cs->req)
+					pr_err_once(
+						"fuse: zero size header in opcode %d\n",
+						cs->req->in.h.opcode);
+				return -EINVAL;
+			}
+			continue;
+		}
+
 		if (i == numargs - 1 && argpages)
 			err = fuse_copy_pages(cs, arg->size, zeroing);
 		else
@@ -1709,6 +1722,7 @@ static int fuse_retrieve(struct fuse_mount *fm, struct inode *inode,
 	size_t args_size = sizeof(*ra);
 	struct fuse_args_pages *ap;
 	struct fuse_args *args;
+	struct fuse_zero_in zero_arg;
 
 	offset = outarg->offset & ~PAGE_MASK;
 	file_size = i_size_read(inode);
@@ -1735,7 +1749,7 @@ static int fuse_retrieve(struct fuse_mount *fm, struct inode *inode,
 	args = &ap->args;
 	args->nodeid = outarg->nodeid;
 	args->opcode = FUSE_NOTIFY_REPLY;
-	args->in_numargs = 2;
+	args->in_numargs = 3;
 	args->in_pages = true;
 	args->end = fuse_retrieve_end;
 
@@ -1762,9 +1776,11 @@ static int fuse_retrieve(struct fuse_mount *fm, struct inode *inode,
 	}
 	ra->inarg.offset = outarg->offset;
 	ra->inarg.size = total_len;
-	args->in_args[0].size = sizeof(ra->inarg);
-	args->in_args[0].value = &ra->inarg;
-	args->in_args[1].size = total_len;
+	args->in_args[0].size = sizeof(zero_arg);
+	args->in_args[0].value = &zero_arg;
+	args->in_args[1].size = sizeof(ra->inarg);
+	args->in_args[1].value = &ra->inarg;
+	args->in_args[2].size = total_len;
 
 	err = fuse_simple_notify_reply(fm, args, outarg->notify_unique);
 	if (err)
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 54104dd48af7c94b312f1a8671c8905542d456c4..bea9fba2b1473750c70a1c336d695c5c205d9c07 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -172,12 +172,16 @@ static void fuse_lookup_init(struct fuse_conn *fc, struct fuse_args *args,
 			     u64 nodeid, const struct qstr *name,
 			     struct fuse_entry_out *outarg)
 {
+	struct fuse_zero_in zero_arg;
+
 	memset(outarg, 0, sizeof(struct fuse_entry_out));
 	args->opcode = FUSE_LOOKUP;
 	args->nodeid = nodeid;
-	args->in_numargs = 1;
-	args->in_args[0].size = name->len + 1;
-	args->in_args[0].value = name->name;
+	args->in_numargs = 2;
+	args->in_args[0].size = sizeof(zero_arg);
+	args->in_args[0].value = &zero_arg;
+	args->in_args[1].size = name->len + 1;
+	args->in_args[1].value = name->name;
 	args->out_numargs = 1;
 	args->out_args[0].size = sizeof(struct fuse_entry_out);
 	args->out_args[0].value = outarg;
@@ -922,16 +926,19 @@ static int fuse_mkdir(struct mnt_idmap *idmap, struct inode *dir,
 static int fuse_symlink(struct mnt_idmap *idmap, struct inode *dir,
 			struct dentry *entry, const char *link)
 {
+	struct fuse_zero_in zero_arg;
 	struct fuse_mount *fm = get_fuse_mount(dir);
 	unsigned len = strlen(link) + 1;
 	FUSE_ARGS(args);
 
 	args.opcode = FUSE_SYMLINK;
-	args.in_numargs = 2;
-	args.in_args[0].size = entry->d_name.len + 1;
-	args.in_args[0].value = entry->d_name.name;
-	args.in_args[1].size = len;
-	args.in_args[1].value = link;
+	args.in_numargs = 3;
+	args.in_args[0].size = sizeof(zero_arg);
+	args.in_args[0].value = &zero_arg;
+	args.in_args[1].size = entry->d_name.len + 1;
+	args.in_args[1].value = entry->d_name.name;
+	args.in_args[2].size = len;
+	args.in_args[2].value = link;
 	return create_new_entry(idmap, fm, &args, dir, entry, S_IFLNK);
 }
 
@@ -982,6 +989,7 @@ static void fuse_entry_unlinked(struct dentry *entry)
 
 static int fuse_unlink(struct inode *dir, struct dentry *entry)
 {
+	struct fuse_zero_in inarg;
 	int err;
 	struct fuse_mount *fm = get_fuse_mount(dir);
 	FUSE_ARGS(args);
@@ -991,9 +999,11 @@ static int fuse_unlink(struct inode *dir, struct dentry *entry)
 
 	args.opcode = FUSE_UNLINK;
 	args.nodeid = get_node_id(dir);
-	args.in_numargs = 1;
-	args.in_args[0].size = entry->d_name.len + 1;
-	args.in_args[0].value = entry->d_name.name;
+	args.in_numargs = 2;
+	args.in_args[0].size = sizeof(inarg);
+	args.in_args[0].value = &inarg;
+	args.in_args[1].size = entry->d_name.len + 1;
+	args.in_args[1].value = entry->d_name.name;
 	err = fuse_simple_request(fm, &args);
 	if (!err) {
 		fuse_dir_changed(dir);
@@ -1005,6 +1015,7 @@ static int fuse_unlink(struct inode *dir, struct dentry *entry)
 
 static int fuse_rmdir(struct inode *dir, struct dentry *entry)
 {
+	struct fuse_zero_in zero_arg;
 	int err;
 	struct fuse_mount *fm = get_fuse_mount(dir);
 	FUSE_ARGS(args);
@@ -1014,9 +1025,11 @@ static int fuse_rmdir(struct inode *dir, struct dentry *entry)
 
 	args.opcode = FUSE_RMDIR;
 	args.nodeid = get_node_id(dir);
-	args.in_numargs = 1;
-	args.in_args[0].size = entry->d_name.len + 1;
-	args.in_args[0].value = entry->d_name.name;
+	args.in_numargs = 2;
+	args.in_args[0].size = sizeof(zero_arg);
+	args.in_args[0].value = &zero_arg;
+	args.in_args[1].size = entry->d_name.len + 1;
+	args.in_args[1].value = entry->d_name.name;
 	err = fuse_simple_request(fm, &args);
 	if (!err) {
 		fuse_dir_changed(dir);
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index e6cc3d552b1382fc43bfe5191efc46e956ca268c..d9c79cc5318f9591c313e233335d40931d6c7f58 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -938,6 +938,13 @@ struct fuse_mount {
 	struct rcu_head rcu;
 };
 
+/*
+ * Empty header for FUSE opcodes without specific header needs.
+ * Used as a placeholder in args->in_args[0] for consistency
+ * across all FUSE operations, simplifying request handling.
+ */
+struct fuse_zero_in {};
+
 static inline struct fuse_mount *get_fuse_mount_super(struct super_block *sb)
 {
 	return sb->s_fs_info;
diff --git a/fs/fuse/xattr.c b/fs/fuse/xattr.c
index 9f568d345c51236ddd421b162820a4ea9b0734f4..c26afacbe53c1a164e27d6253360e0e1808e2ea6 100644
--- a/fs/fuse/xattr.c
+++ b/fs/fuse/xattr.c
@@ -158,15 +158,18 @@ int fuse_removexattr(struct inode *inode, const char *name)
 	struct fuse_mount *fm = get_fuse_mount(inode);
 	FUSE_ARGS(args);
 	int err;
+	struct fuse_zero_in zero_arg;
 
 	if (fm->fc->no_removexattr)
 		return -EOPNOTSUPP;
 
 	args.opcode = FUSE_REMOVEXATTR;
 	args.nodeid = get_node_id(inode);
-	args.in_numargs = 1;
-	args.in_args[0].size = strlen(name) + 1;
-	args.in_args[0].value = name;
+	args.in_numargs = 2;
+	args.in_args[0].size = sizeof(zero_arg);
+	args.in_args[0].value = &zero_arg;
+	args.in_args[1].size = strlen(name) + 1;
+	args.in_args[1].value = name;
 	err = fuse_simple_request(fm, &args);
 	if (err == -ENOSYS) {
 		fm->fc->no_removexattr = 1;

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH RFC v6 06/16] fuse: {uring} Handle SQEs - register commands
  2024-11-21 23:43 [PATCH RFC v6 00/16] fuse: fuse-over-io-uring Bernd Schubert
                   ` (4 preceding siblings ...)
  2024-11-21 23:43 ` [PATCH RFC v6 05/16] fuse: make args->in_args[0] to be always the header Bernd Schubert
@ 2024-11-21 23:43 ` Bernd Schubert
  2024-11-23  9:52   ` Miklos Szeredi
  2024-11-21 23:43 ` [PATCH RFC v6 07/16] fuse: Make fuse_copy non static Bernd Schubert
                   ` (9 subsequent siblings)
  15 siblings, 1 reply; 24+ messages in thread
From: Bernd Schubert @ 2024-11-21 23:43 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
	Bernd Schubert

This adds basic support for ring SQEs (with opcode=IORING_OP_URING_CMD).
For now only FUSE_URING_REQ_FETCH is handled to register queue entries.

Signed-off-by: Bernd Schubert <[email protected]>
---
 fs/fuse/Kconfig           |  12 ++
 fs/fuse/Makefile          |   1 +
 fs/fuse/dev.c             |   4 +
 fs/fuse/dev_uring.c       | 348 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/fuse/dev_uring_i.h     | 109 +++++++++++++++
 fs/fuse/fuse_dev_i.h      |   1 +
 fs/fuse/fuse_i.h          |   5 +
 fs/fuse/inode.c           |   3 +
 include/uapi/linux/fuse.h |  57 ++++++++
 9 files changed, 540 insertions(+)

diff --git a/fs/fuse/Kconfig b/fs/fuse/Kconfig
index 8674dbfbe59dbf79c304c587b08ebba3cfe405be..11f37cefc94b2af5a675c238801560c822b95f1a 100644
--- a/fs/fuse/Kconfig
+++ b/fs/fuse/Kconfig
@@ -63,3 +63,15 @@ config FUSE_PASSTHROUGH
 	  to be performed directly on a backing file.
 
 	  If you want to allow passthrough operations, answer Y.
+
+config FUSE_IO_URING
+	bool "FUSE communication over io-uring"
+	default y
+	depends on FUSE_FS
+	depends on IO_URING
+	help
+	  This allows sending FUSE requests over the IO uring interface and
+          also adds request core affinity.
+
+	  If you want to allow fuse server/client communication through io-uring,
+	  answer Y
diff --git a/fs/fuse/Makefile b/fs/fuse/Makefile
index ce0ff7a9007b94b4ab246b5271f227d126c768e8..fcf16b1c391a9bf11ca9f3a25b137acdb203ac47 100644
--- a/fs/fuse/Makefile
+++ b/fs/fuse/Makefile
@@ -14,5 +14,6 @@ fuse-y := dev.o dir.o file.o inode.o control.o xattr.o acl.o readdir.o ioctl.o
 fuse-y += iomode.o
 fuse-$(CONFIG_FUSE_DAX) += dax.o
 fuse-$(CONFIG_FUSE_PASSTHROUGH) += passthrough.o
+fuse-$(CONFIG_FUSE_IO_URING) += dev_uring.o
 
 virtiofs-y := virtio_fs.o
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 6cb45b5332c45f322e9163469ffd114cbc07dc4f..53f60fb5de230635d1a158ae5c40d6b2c314ecd2 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -6,6 +6,7 @@
   See the file COPYING.
 */
 
+#include "dev_uring_i.h"
 #include "fuse_i.h"
 #include "fuse_dev_i.h"
 
@@ -2467,6 +2468,9 @@ const struct file_operations fuse_dev_operations = {
 	.fasync		= fuse_dev_fasync,
 	.unlocked_ioctl = fuse_dev_ioctl,
 	.compat_ioctl   = compat_ptr_ioctl,
+#ifdef CONFIG_FUSE_IO_URING
+	.uring_cmd	= fuse_uring_cmd,
+#endif
 };
 EXPORT_SYMBOL_GPL(fuse_dev_operations);
 
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
new file mode 100644
index 0000000000000000000000000000000000000000..ef5e40dcbc5154d8665c7c7ad46123c4a1d621ee
--- /dev/null
+++ b/fs/fuse/dev_uring.c
@@ -0,0 +1,348 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * FUSE: Filesystem in Userspace
+ * Copyright (c) 2023-2024 DataDirect Networks.
+ */
+
+#include <linux/fs.h>
+
+#include "fuse_i.h"
+#include "dev_uring_i.h"
+#include "fuse_dev_i.h"
+
+#include <linux/io_uring/cmd.h>
+
+#ifdef CONFIG_FUSE_IO_URING
+static bool __read_mostly enable_uring;
+module_param(enable_uring, bool, 0644);
+MODULE_PARM_DESC(enable_uring,
+		 "Enable uring userspace communication through uring.");
+#endif
+
+static int fuse_ring_ent_unset_userspace(struct fuse_ring_ent *ent)
+{
+	struct fuse_ring_queue *queue = ent->queue;
+
+	lockdep_assert_held(&queue->lock);
+
+	if (WARN_ON_ONCE(ent->state != FRRS_USERSPACE))
+		return -EIO;
+
+	ent->state = FRRS_COMMIT;
+	list_move(&ent->list, &queue->ent_commit_queue);
+
+	return 0;
+}
+
+void fuse_uring_destruct(struct fuse_conn *fc)
+{
+	struct fuse_ring *ring = fc->ring;
+	int qid;
+
+	if (!ring)
+		return;
+
+	for (qid = 0; qid < ring->nr_queues; qid++) {
+		struct fuse_ring_queue *queue = ring->queues[qid];
+
+		if (!queue)
+			continue;
+
+		WARN_ON(!list_empty(&queue->ent_avail_queue));
+		WARN_ON(!list_empty(&queue->ent_commit_queue));
+
+		kfree(queue);
+		ring->queues[qid] = NULL;
+	}
+
+	kfree(ring->queues);
+	kfree(ring);
+	fc->ring = NULL;
+}
+
+#define FUSE_URING_IOV_SEGS 2 /* header and payload */
+
+/*
+ * Basic ring setup for this connection based on the provided configuration
+ */
+static struct fuse_ring *fuse_uring_create(struct fuse_conn *fc)
+{
+	struct fuse_ring *ring = NULL;
+	size_t nr_queues = num_possible_cpus();
+	struct fuse_ring *res = NULL;
+
+	ring = kzalloc(sizeof(*fc->ring) +
+			       nr_queues * sizeof(struct fuse_ring_queue),
+		       GFP_KERNEL_ACCOUNT);
+	if (!ring)
+		return NULL;
+
+	ring->queues = kcalloc(nr_queues, sizeof(struct fuse_ring_queue *),
+			       GFP_KERNEL_ACCOUNT);
+	if (!ring->queues)
+		goto out_err;
+
+	spin_lock(&fc->lock);
+	if (fc->ring) {
+		/* race, another thread created the ring in the mean time */
+		spin_unlock(&fc->lock);
+		res = fc->ring;
+		goto out_err;
+	}
+
+	fc->ring = ring;
+	ring->nr_queues = nr_queues;
+	ring->fc = fc;
+
+	spin_unlock(&fc->lock);
+	return ring;
+
+out_err:
+	if (ring)
+		kfree(ring->queues);
+	kfree(ring);
+	return res;
+}
+
+static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
+						       int qid)
+{
+	struct fuse_conn *fc = ring->fc;
+	struct fuse_ring_queue *queue;
+
+	queue = kzalloc(sizeof(*queue), GFP_KERNEL_ACCOUNT);
+	if (!queue)
+		return ERR_PTR(-ENOMEM);
+	spin_lock(&fc->lock);
+	if (ring->queues[qid]) {
+		spin_unlock(&fc->lock);
+		kfree(queue);
+		return ring->queues[qid];
+	}
+
+	queue->qid = qid;
+	queue->ring = ring;
+	spin_lock_init(&queue->lock);
+
+	INIT_LIST_HEAD(&queue->ent_avail_queue);
+	INIT_LIST_HEAD(&queue->ent_commit_queue);
+
+	WRITE_ONCE(ring->queues[qid], queue);
+	spin_unlock(&fc->lock);
+
+	return queue;
+}
+
+/*
+ * Put a ring request onto hold, it is no longer used for now.
+ */
+static void fuse_uring_ent_avail(struct fuse_ring_ent *ring_ent,
+				 struct fuse_ring_queue *queue)
+	__must_hold(&queue->lock)
+{
+	struct fuse_ring *ring = queue->ring;
+
+	lockdep_assert_held(&queue->lock);
+
+	/* unsets all previous flags - basically resets */
+	pr_devel("%s ring=%p qid=%d state=%d\n", __func__, ring,
+		 ring_ent->queue->qid, ring_ent->state);
+
+	if (WARN_ON(ring_ent->state != FRRS_COMMIT)) {
+		pr_warn("%s qid=%d state=%d\n", __func__, ring_ent->queue->qid,
+			ring_ent->state);
+		return;
+	}
+
+	list_move(&ring_ent->list, &queue->ent_avail_queue);
+
+	ring_ent->state = FRRS_WAIT;
+}
+
+/*
+ * fuse_uring_req_fetch command handling
+ */
+static void _fuse_uring_fetch(struct fuse_ring_ent *ring_ent,
+			      struct io_uring_cmd *cmd,
+			      unsigned int issue_flags)
+{
+	struct fuse_ring_queue *queue = ring_ent->queue;
+
+	spin_lock(&queue->lock);
+	fuse_uring_ent_avail(ring_ent, queue);
+	ring_ent->cmd = cmd;
+	spin_unlock(&queue->lock);
+}
+
+/*
+ * sqe->addr is a ptr to an iovec array, iov[0] has the headers, iov[1]
+ * the payload
+ */
+static int fuse_uring_get_iovec_from_sqe(const struct io_uring_sqe *sqe,
+					 struct iovec iov[FUSE_URING_IOV_SEGS])
+{
+	struct iovec __user *uiov = u64_to_user_ptr(READ_ONCE(sqe->addr));
+	struct iov_iter iter;
+	ssize_t ret;
+
+	if (sqe->len != FUSE_URING_IOV_SEGS)
+		return -EINVAL;
+
+	/*
+	 * Direction for buffer access will actually be READ and WRITE,
+	 * using write for the import should include READ access as well.
+	 */
+	ret = import_iovec(WRITE, uiov, FUSE_URING_IOV_SEGS,
+			   FUSE_URING_IOV_SEGS, &iov, &iter);
+	if (ret < 0)
+		return ret;
+
+	return 0;
+}
+
+static int fuse_uring_fetch(struct io_uring_cmd *cmd, unsigned int issue_flags,
+			    struct fuse_conn *fc)
+{
+	const struct fuse_uring_cmd_req *cmd_req = io_uring_sqe_cmd(cmd->sqe);
+	struct fuse_ring *ring = fc->ring;
+	struct fuse_ring_queue *queue;
+	struct fuse_ring_ent *ring_ent;
+	int err;
+	struct iovec iov[FUSE_URING_IOV_SEGS];
+
+	err = fuse_uring_get_iovec_from_sqe(cmd->sqe, iov);
+	if (err) {
+		pr_info_ratelimited("Failed to get iovec from sqe, err=%d\n",
+				    err);
+		return err;
+	}
+
+#if 0
+	/* Does not work as sending over io-uring is async */
+	err = -ETXTBSY;
+	if (fc->initialized) {
+		pr_info_ratelimited(
+			"Received FUSE_URING_REQ_FETCH after connection is initialized\n");
+		return err;
+	}
+#endif
+
+	err = -ENOMEM;
+	if (!ring) {
+		ring = fuse_uring_create(fc);
+		if (!ring)
+			return err;
+	}
+
+	queue = ring->queues[cmd_req->qid];
+	if (!queue) {
+		queue = fuse_uring_create_queue(ring, cmd_req->qid);
+		if (!queue)
+			return err;
+	}
+
+	/*
+	 * The created queue above does not need to be destructed in
+	 * case of entry errors below, will be done at ring destruction time.
+	 */
+
+	ring_ent = kzalloc(sizeof(*ring_ent), GFP_KERNEL_ACCOUNT);
+	if (ring_ent == NULL)
+		return err;
+
+	INIT_LIST_HEAD(&ring_ent->list);
+
+	ring_ent->queue = queue;
+	ring_ent->cmd = cmd;
+
+	err = -EINVAL;
+	if (iov[0].iov_len < sizeof(struct fuse_ring_req_header)) {
+		pr_info_ratelimited("Invalid header len %zu\n", iov[0].iov_len);
+		goto err;
+	}
+
+	ring_ent->headers = iov[0].iov_base;
+	ring_ent->payload = iov[1].iov_base;
+	ring_ent->max_arg_len = iov[1].iov_len;
+
+	if (ring_ent->max_arg_len <
+	    max_t(size_t, FUSE_MIN_READ_BUFFER, fc->max_write)) {
+		pr_info_ratelimited("Invalid req payload len %zu\n",
+				    ring_ent->max_arg_len);
+		goto err;
+	}
+
+	spin_lock(&queue->lock);
+
+	/*
+	 * FUSE_URING_REQ_FETCH is an initialization exception, needs
+	 * state override
+	 */
+	ring_ent->state = FRRS_USERSPACE;
+	err = fuse_ring_ent_unset_userspace(ring_ent);
+	spin_unlock(&queue->lock);
+	if (WARN_ON_ONCE(err != 0))
+		goto err;
+
+	_fuse_uring_fetch(ring_ent, cmd, issue_flags);
+
+	return 0;
+err:
+	list_del_init(&ring_ent->list);
+	kfree(ring_ent);
+	return err;
+}
+
+/*
+ * Entry function from io_uring to handle the given passthrough command
+ * (op cocde IORING_OP_URING_CMD)
+ */
+int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
+{
+	const struct fuse_uring_cmd_req *cmd_req = io_uring_sqe_cmd(cmd->sqe);
+	struct fuse_dev *fud;
+	struct fuse_conn *fc;
+	u32 cmd_op = cmd->cmd_op;
+	int err = 0;
+
+	/* Disabled for now, especially as teardown is not implemented yet */
+	err = -EOPNOTSUPP;
+	pr_info_ratelimited("fuse-io-uring is not enabled yet\n");
+	return err;
+
+	err = -EOPNOTSUPP;
+	if (!enable_uring) {
+		pr_info_ratelimited("uring is disabled\n");
+		return err;
+	}
+
+	err = -ENOTCONN;
+	fud = fuse_get_dev(cmd->file);
+	if (!fud) {
+		pr_info_ratelimited("No fuse device found\n");
+		return err;
+	}
+	fc = fud->fc;
+
+	if (fc->aborted)
+		return err;
+
+	switch (cmd_op) {
+	case FUSE_URING_REQ_FETCH:
+		err = fuse_uring_fetch(cmd, issue_flags, fc);
+		if (err) {
+			pr_info_once("fuse_uring_fetch failed err=%d\n", err);
+			return err;
+		}
+		break;
+	default:
+		err = -EINVAL;
+		pr_devel("Unknown uring command %d", cmd_op);
+		return err;
+	}
+
+	pr_devel("uring cmd op=%d, qid=%d ID=%llu ret=%d\n", cmd_op,
+		 cmd_req->qid, cmd_req->commit_id, err);
+
+	return -EIOCBQUEUED;
+}
diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
new file mode 100644
index 0000000000000000000000000000000000000000..fab6f2a6c14b9de0aa8ec525ab17e59315c31e6a
--- /dev/null
+++ b/fs/fuse/dev_uring_i.h
@@ -0,0 +1,109 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * FUSE: Filesystem in Userspace
+ * Copyright (c) 2023-2024 DataDirect Networks.
+ */
+
+#ifndef _FS_FUSE_DEV_URING_I_H
+#define _FS_FUSE_DEV_URING_I_H
+
+#include "fuse_i.h"
+
+#ifdef CONFIG_FUSE_IO_URING
+
+enum fuse_ring_req_state {
+	FRRS_INVALID = 0,
+
+	/* ring entry received from userspace and it being processed */
+	FRRS_COMMIT,
+
+	/* The ring request waits for a new fuse request */
+	FRRS_WAIT,
+
+	/* request is in or on the way to user space */
+	FRRS_USERSPACE,
+};
+
+/** A fuse ring entry, part of the ring queue */
+struct fuse_ring_ent {
+	/* userspace buffer */
+	struct fuse_ring_req_header __user *headers;
+	void *__user *payload;
+
+	/* the ring queue that owns the request */
+	struct fuse_ring_queue *queue;
+
+	struct io_uring_cmd *cmd;
+
+	struct list_head list;
+
+	/* size of payload buffer */
+	size_t max_arg_len;
+
+	/*
+	 * state the request is currently in
+	 * (enum fuse_ring_req_state)
+	 */
+	unsigned int state;
+
+	struct fuse_req *fuse_req;
+};
+
+struct fuse_ring_queue {
+	/*
+	 * back pointer to the main fuse uring structure that holds this
+	 * queue
+	 */
+	struct fuse_ring *ring;
+
+	/* queue id, typically also corresponds to the cpu core */
+	unsigned int qid;
+
+	/*
+	 * queue lock, taken when any value in the queue changes _and_ also
+	 * a ring entry state changes.
+	 */
+	spinlock_t lock;
+
+	/* available ring entries (struct fuse_ring_ent) */
+	struct list_head ent_avail_queue;
+
+	/*
+	 * entries in the process of being committed or in the process
+	 * to be send to userspace
+	 */
+	struct list_head ent_commit_queue;
+};
+
+/**
+ * Describes if uring is for communication and holds alls the data needed
+ * for uring communication
+ */
+struct fuse_ring {
+	/* back pointer */
+	struct fuse_conn *fc;
+
+	/* number of ring queues */
+	size_t nr_queues;
+
+	struct fuse_ring_queue **queues;
+};
+
+void fuse_uring_destruct(struct fuse_conn *fc);
+int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags);
+
+#else /* CONFIG_FUSE_IO_URING */
+
+struct fuse_ring;
+
+static inline void fuse_uring_create(struct fuse_conn *fc)
+{
+}
+
+static inline void fuse_uring_destruct(struct fuse_conn *fc)
+{
+}
+
+#endif /* CONFIG_FUSE_IO_URING */
+
+#endif /* _FS_FUSE_DEV_URING_I_H */
diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
index 6c506f040d5fb57dae746880c657a95637ac50ce..e82cbf9c569af4f271ba0456cb49e0a5116bf36b 100644
--- a/fs/fuse/fuse_dev_i.h
+++ b/fs/fuse/fuse_dev_i.h
@@ -8,6 +8,7 @@
 
 #include <linux/types.h>
 
+
 /* Ordinary requests have even IDs, while interrupts IDs are odd */
 #define FUSE_INT_REQ_BIT (1ULL << 0)
 #define FUSE_REQ_ID_STEP (1ULL << 1)
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index d9c79cc5318f9591c313e233335d40931d6c7f58..5e009a3511d3dd4e9c0e8b4f08ebb271831b1236 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -914,6 +914,11 @@ struct fuse_conn {
 	/** IDR for backing files ids */
 	struct idr backing_files_map;
 #endif
+
+#ifdef CONFIG_FUSE_IO_URING
+	/**  uring connection information*/
+	struct fuse_ring *ring;
+#endif
 };
 
 /*
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index fd3321e29a3e569bf06be22a5383cf34fd42c051..c8f72a50047ac1dfc7e52e9f4e49716a016326ff 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -7,6 +7,7 @@
 */
 
 #include "fuse_i.h"
+#include "dev_uring_i.h"
 
 #include <linux/pagemap.h>
 #include <linux/slab.h>
@@ -959,6 +960,8 @@ static void delayed_release(struct rcu_head *p)
 {
 	struct fuse_conn *fc = container_of(p, struct fuse_conn, rcu);
 
+	fuse_uring_destruct(fc);
+
 	put_user_ns(fc->user_ns);
 	fc->release(fc);
 }
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index f1e99458e29e4fdce5273bc3def242342f207ebd..623ffe6a5b20d73dffc8f2abe781953540c79c9d 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -1206,4 +1206,61 @@ struct fuse_supp_groups {
 	uint32_t	groups[];
 };
 
+/**
+ * Size of the ring buffer header
+ */
+#define FUSE_RING_IN_OUT_HEADER_SZ 128
+#define FUSE_RING_OP_IN_OUT_SZ 128
+
+struct fuse_ring_ent_in_out {
+	uint64_t flags;
+
+	/* size of use payload buffer */
+	uint32_t payload_sz;
+	uint32_t padding;
+
+	uint8_t reserved[30];
+};
+
+/**
+ * This structure mapped onto the
+ */
+struct fuse_ring_req_header {
+	/* struct fuse_in / struct fuse_out */
+	char in_out[FUSE_RING_IN_OUT_HEADER_SZ];
+
+	/* per op code structs */
+	char op_in[FUSE_RING_OP_IN_OUT_SZ];
+
+	/* struct fuse_ring_in_out */
+	char ring_ent_in_out[sizeof(struct fuse_ring_ent_in_out)];
+};
+
+/**
+ * sqe commands to the kernel
+ */
+enum fuse_uring_cmd {
+	FUSE_URING_REQ_INVALID = 0,
+
+	/* submit sqe to kernel to get a request */
+	FUSE_URING_REQ_FETCH = 1,
+
+	/* commit result and fetch next request */
+	FUSE_URING_REQ_COMMIT_AND_FETCH = 2,
+};
+
+/**
+ * In the 80B command area of the SQE.
+ */
+struct fuse_uring_cmd_req {
+	uint64_t flags;
+
+	/* entry identifier */
+	uint64_t commit_id;
+
+	/* queue the command is for (queue index) */
+	uint16_t qid;
+	uint8_t padding[6];
+};
+
 #endif /* _LINUX_FUSE_H */

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH RFC v6 07/16] fuse: Make fuse_copy non static
  2024-11-21 23:43 [PATCH RFC v6 00/16] fuse: fuse-over-io-uring Bernd Schubert
                   ` (5 preceding siblings ...)
  2024-11-21 23:43 ` [PATCH RFC v6 06/16] fuse: {uring} Handle SQEs - register commands Bernd Schubert
@ 2024-11-21 23:43 ` Bernd Schubert
  2024-11-21 23:43 ` [PATCH RFC v6 08/16] fuse: Add fuse-io-uring handling into fuse_copy Bernd Schubert
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 24+ messages in thread
From: Bernd Schubert @ 2024-11-21 23:43 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
	Bernd Schubert

Move 'struct fuse_copy_state' and fuse_copy_* functions
to fuse_dev_i.h to make it available for fuse-uring.
'copy_out_args()' is renamed to 'fuse_copy_out_args'.

Signed-off-by: Bernd Schubert <[email protected]>
---
 fs/fuse/dev.c        | 30 ++++++++----------------------
 fs/fuse/fuse_dev_i.h | 25 +++++++++++++++++++++++++
 2 files changed, 33 insertions(+), 22 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 53f60fb5de230635d1a158ae5c40d6b2c314ecd2..aa654989d768df3dd82690ef071bbe0aa87817b0 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -679,22 +679,8 @@ static int unlock_request(struct fuse_req *req)
 	return err;
 }
 
-struct fuse_copy_state {
-	int write;
-	struct fuse_req *req;
-	struct iov_iter *iter;
-	struct pipe_buffer *pipebufs;
-	struct pipe_buffer *currbuf;
-	struct pipe_inode_info *pipe;
-	unsigned long nr_segs;
-	struct page *pg;
-	unsigned len;
-	unsigned offset;
-	unsigned move_pages:1;
-};
-
-static void fuse_copy_init(struct fuse_copy_state *cs, int write,
-			   struct iov_iter *iter)
+void fuse_copy_init(struct fuse_copy_state *cs, int write,
+		    struct iov_iter *iter)
 {
 	memset(cs, 0, sizeof(*cs));
 	cs->write = write;
@@ -1045,9 +1031,9 @@ static int fuse_copy_one(struct fuse_copy_state *cs, void *val, unsigned size)
 }
 
 /* Copy request arguments to/from userspace buffer */
-static int fuse_copy_args(struct fuse_copy_state *cs, unsigned numargs,
-			  unsigned argpages, struct fuse_arg *args,
-			  int zeroing)
+int fuse_copy_args(struct fuse_copy_state *cs, unsigned numargs,
+		   unsigned argpages, struct fuse_arg *args,
+		   int zeroing)
 {
 	int err = 0;
 	unsigned i;
@@ -1937,8 +1923,8 @@ static struct fuse_req *request_find(struct fuse_pqueue *fpq, u64 unique)
 	return NULL;
 }
 
-static int copy_out_args(struct fuse_copy_state *cs, struct fuse_args *args,
-			 unsigned nbytes)
+int fuse_copy_out_args(struct fuse_copy_state *cs, struct fuse_args *args,
+		       unsigned nbytes)
 {
 	unsigned reqsize = sizeof(struct fuse_out_header);
 
@@ -2040,7 +2026,7 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud,
 	if (oh.error)
 		err = nbytes != sizeof(oh) ? -EINVAL : 0;
 	else
-		err = copy_out_args(cs, req->args, nbytes);
+		err = fuse_copy_out_args(cs, req->args, nbytes);
 	fuse_copy_finish(cs);
 
 	spin_lock(&fpq->lock);
diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
index e82cbf9c569af4f271ba0456cb49e0a5116bf36b..f36e304cd62c8302aed95de89926fc894f602cfd 100644
--- a/fs/fuse/fuse_dev_i.h
+++ b/fs/fuse/fuse_dev_i.h
@@ -13,6 +13,23 @@
 #define FUSE_INT_REQ_BIT (1ULL << 0)
 #define FUSE_REQ_ID_STEP (1ULL << 1)
 
+struct fuse_arg;
+struct fuse_args;
+
+struct fuse_copy_state {
+	int write;
+	struct fuse_req *req;
+	struct iov_iter *iter;
+	struct pipe_buffer *pipebufs;
+	struct pipe_buffer *currbuf;
+	struct pipe_inode_info *pipe;
+	unsigned long nr_segs;
+	struct page *pg;
+	unsigned int len;
+	unsigned int offset;
+	unsigned int move_pages:1;
+};
+
 static inline struct fuse_dev *fuse_get_dev(struct file *file)
 {
 	/*
@@ -24,6 +41,14 @@ static inline struct fuse_dev *fuse_get_dev(struct file *file)
 
 void fuse_dev_end_requests(struct list_head *head);
 
+void fuse_copy_init(struct fuse_copy_state *cs, int write,
+			   struct iov_iter *iter);
+int fuse_copy_args(struct fuse_copy_state *cs, unsigned int numargs,
+		   unsigned int argpages, struct fuse_arg *args,
+		   int zeroing);
+int fuse_copy_out_args(struct fuse_copy_state *cs, struct fuse_args *args,
+		       unsigned int nbytes);
+
 #endif
 
 

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH RFC v6 08/16] fuse: Add fuse-io-uring handling into fuse_copy
  2024-11-21 23:43 [PATCH RFC v6 00/16] fuse: fuse-over-io-uring Bernd Schubert
                   ` (6 preceding siblings ...)
  2024-11-21 23:43 ` [PATCH RFC v6 07/16] fuse: Make fuse_copy non static Bernd Schubert
@ 2024-11-21 23:43 ` Bernd Schubert
  2024-11-21 23:43 ` [PATCH RFC v6 09/16] fuse: {uring} Add uring sqe commit and fetch support Bernd Schubert
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 24+ messages in thread
From: Bernd Schubert @ 2024-11-21 23:43 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
	Bernd Schubert

Add special fuse-io-uring into the fuse argument
copy handler.

Signed-off-by: Bernd Schubert <[email protected]>
---
 fs/fuse/dev.c         | 12 +++++++++++-
 fs/fuse/dev_uring_i.h |  6 +++---
 fs/fuse/fuse_dev_i.h  |  5 +++++
 3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index aa654989d768df3dd82690ef071bbe0aa87817b0..ad65ede9c7723bb6f3589e64b8eef7429fa4b488 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -787,6 +787,9 @@ static int fuse_copy_do(struct fuse_copy_state *cs, void **val, unsigned *size)
 	*size -= ncpy;
 	cs->len -= ncpy;
 	cs->offset += ncpy;
+	if (cs->is_uring)
+		cs->ring.offset += ncpy;
+
 	return ncpy;
 }
 
@@ -1926,7 +1929,14 @@ static struct fuse_req *request_find(struct fuse_pqueue *fpq, u64 unique)
 int fuse_copy_out_args(struct fuse_copy_state *cs, struct fuse_args *args,
 		       unsigned nbytes)
 {
-	unsigned reqsize = sizeof(struct fuse_out_header);
+
+	unsigned int reqsize = 0;
+
+	/*
+	 * Uring has all headers separated from args - args is payload only
+	 */
+	if (!cs->is_uring)
+		reqsize = sizeof(struct fuse_out_header);
 
 	reqsize += fuse_len_args(args->out_numargs, args->out_args);
 
diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
index fab6f2a6c14b9de0aa8ec525ab17e59315c31e6a..17767e373d31969fe2987fed31c66b5077f209c6 100644
--- a/fs/fuse/dev_uring_i.h
+++ b/fs/fuse/dev_uring_i.h
@@ -14,13 +14,13 @@
 enum fuse_ring_req_state {
 	FRRS_INVALID = 0,
 
-	/* ring entry received from userspace and it being processed */
+	/* The ring entry received from userspace and it being processed */
 	FRRS_COMMIT,
 
-	/* The ring request waits for a new fuse request */
+	/* The ring entry is waiting for new fuse requests */
 	FRRS_WAIT,
 
-	/* request is in or on the way to user space */
+	/* The ring entry is in or on the way to user space */
 	FRRS_USERSPACE,
 };
 
diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
index f36e304cd62c8302aed95de89926fc894f602cfd..7ecb103af6f0feca99eb8940872c6a5ccf2e5186 100644
--- a/fs/fuse/fuse_dev_i.h
+++ b/fs/fuse/fuse_dev_i.h
@@ -28,6 +28,11 @@ struct fuse_copy_state {
 	unsigned int len;
 	unsigned int offset;
 	unsigned int move_pages:1;
+	unsigned int is_uring:1;
+	struct {
+		/* overall offset with the user buffer */
+		unsigned int offset;
+	} ring;
 };
 
 static inline struct fuse_dev *fuse_get_dev(struct file *file)

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH RFC v6 09/16] fuse: {uring} Add uring sqe commit and fetch support
  2024-11-21 23:43 [PATCH RFC v6 00/16] fuse: fuse-over-io-uring Bernd Schubert
                   ` (7 preceding siblings ...)
  2024-11-21 23:43 ` [PATCH RFC v6 08/16] fuse: Add fuse-io-uring handling into fuse_copy Bernd Schubert
@ 2024-11-21 23:43 ` Bernd Schubert
  2024-11-21 23:43 ` [PATCH RFC v6 10/16] fuse: {uring} Handle teardown of ring entries Bernd Schubert
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 24+ messages in thread
From: Bernd Schubert @ 2024-11-21 23:43 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
	Bernd Schubert

This adds support for fuse request completion through ring SQEs
(FUSE_URING_REQ_COMMIT_AND_FETCH handling). After committing
the ring entry it becomes available for new fuse requests.
Handling of requests through the ring (SQE/CQE handling)
is complete now.

Fuse request data are copied through the mmaped ring buffer,
there is no support for any zero copy yet.

Signed-off-by: Bernd Schubert <[email protected]>
---
 fs/fuse/dev.c         |   6 +-
 fs/fuse/dev_uring.c   | 451 ++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/fuse/dev_uring_i.h |  12 ++
 fs/fuse/fuse_dev_i.h  |   7 +-
 fs/fuse/fuse_i.h      |   9 +
 fs/fuse/inode.c       |   2 +-
 6 files changed, 482 insertions(+), 5 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index ad65ede9c7723bb6f3589e64b8eef7429fa4b488..15dc168fd789bf11f27fae11a732a3dfc60de97d 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -221,7 +221,7 @@ u64 fuse_get_unique(struct fuse_iqueue *fiq)
 }
 EXPORT_SYMBOL_GPL(fuse_get_unique);
 
-static unsigned int fuse_req_hash(u64 unique)
+unsigned int fuse_req_hash(u64 unique)
 {
 	return hash_long(unique & ~FUSE_INT_REQ_BIT, FUSE_PQ_HASH_BITS);
 }
@@ -1914,7 +1914,7 @@ static int fuse_notify(struct fuse_conn *fc, enum fuse_notify_code code,
 }
 
 /* Look up request on processing list by unique ID */
-static struct fuse_req *request_find(struct fuse_pqueue *fpq, u64 unique)
+struct fuse_req *fuse_request_find(struct fuse_pqueue *fpq, u64 unique)
 {
 	unsigned int hash = fuse_req_hash(unique);
 	struct fuse_req *req;
@@ -1998,7 +1998,7 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud,
 	spin_lock(&fpq->lock);
 	req = NULL;
 	if (fpq->connected)
-		req = request_find(fpq, oh.unique & ~FUSE_INT_REQ_BIT);
+		req = fuse_request_find(fpq, oh.unique & ~FUSE_INT_REQ_BIT);
 
 	err = -ENOENT;
 	if (!req) {
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index ef5e40dcbc5154d8665c7c7ad46123c4a1d621ee..46aab7f7ee0680e84e3a62ae99e664d8b0f85421 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -19,6 +19,24 @@ MODULE_PARM_DESC(enable_uring,
 		 "Enable uring userspace communication through uring.");
 #endif
 
+#define FUSE_URING_IOV_SEGS 2 /* header and payload */
+
+/*
+ * Finalize a fuse request, then fetch and send the next entry, if available
+ */
+static void fuse_uring_req_end(struct fuse_ring_ent *ring_ent, bool set_err,
+			       int error)
+{
+	struct fuse_req *req = ring_ent->fuse_req;
+
+	if (set_err)
+		req->out.h.error = error;
+
+	clear_bit(FR_SENT, &req->flags);
+	fuse_request_end(ring_ent->fuse_req);
+	ring_ent->fuse_req = NULL;
+}
+
 static int fuse_ring_ent_unset_userspace(struct fuse_ring_ent *ent)
 {
 	struct fuse_ring_queue *queue = ent->queue;
@@ -49,8 +67,11 @@ void fuse_uring_destruct(struct fuse_conn *fc)
 			continue;
 
 		WARN_ON(!list_empty(&queue->ent_avail_queue));
+		WARN_ON(!list_empty(&queue->ent_w_req_queue));
 		WARN_ON(!list_empty(&queue->ent_commit_queue));
+		WARN_ON(!list_empty(&queue->ent_in_userspace));
 
+		kfree(queue->fpq.processing);
 		kfree(queue);
 		ring->queues[qid] = NULL;
 	}
@@ -109,13 +130,21 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
 {
 	struct fuse_conn *fc = ring->fc;
 	struct fuse_ring_queue *queue;
+	struct list_head *pq;
 
 	queue = kzalloc(sizeof(*queue), GFP_KERNEL_ACCOUNT);
 	if (!queue)
 		return ERR_PTR(-ENOMEM);
+	pq = kcalloc(FUSE_PQ_HASH_SIZE, sizeof(struct list_head), GFP_KERNEL);
+	if (!pq) {
+		kfree(queue);
+		return ERR_PTR(-ENOMEM);
+	}
+
 	spin_lock(&fc->lock);
 	if (ring->queues[qid]) {
 		spin_unlock(&fc->lock);
+		kfree(queue->fpq.processing);
 		kfree(queue);
 		return ring->queues[qid];
 	}
@@ -126,6 +155,12 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
 
 	INIT_LIST_HEAD(&queue->ent_avail_queue);
 	INIT_LIST_HEAD(&queue->ent_commit_queue);
+	INIT_LIST_HEAD(&queue->ent_w_req_queue);
+	INIT_LIST_HEAD(&queue->ent_in_userspace);
+	INIT_LIST_HEAD(&queue->fuse_req_queue);
+
+	queue->fpq.processing = pq;
+	fuse_pqueue_init(&queue->fpq);
 
 	WRITE_ONCE(ring->queues[qid], queue);
 	spin_unlock(&fc->lock);
@@ -133,6 +168,232 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
 	return queue;
 }
 
+static void
+fuse_uring_async_send_to_ring(struct io_uring_cmd *cmd,
+			      unsigned int issue_flags)
+{
+	io_uring_cmd_done(cmd, 0, 0, issue_flags);
+}
+
+/*
+ * Checks for errors and stores it into the request
+ */
+static int fuse_uring_out_header_has_err(struct fuse_out_header *oh,
+					 struct fuse_req *req,
+					 struct fuse_conn *fc)
+{
+	int err;
+
+	if (oh->unique == 0) {
+		/* Not supportd through request based uring, this needs another
+		 * ring from user space to kernel
+		 */
+		pr_warn("Unsupported fuse-notify\n");
+		err = -EINVAL;
+		goto seterr;
+	}
+
+	if (oh->error <= -512 || oh->error > 0) {
+		err = -EINVAL;
+		goto seterr;
+	}
+
+	if (oh->error) {
+		err = oh->error;
+		pr_devel("%s:%d err=%d op=%d req-ret=%d", __func__, __LINE__,
+			 err, req->args->opcode, req->out.h.error);
+		goto err; /* error already set */
+	}
+
+	if ((oh->unique & ~FUSE_INT_REQ_BIT) != req->in.h.unique) {
+		pr_warn("Unpexted seqno mismatch, expected: %llu got %llu\n",
+			req->in.h.unique, oh->unique & ~FUSE_INT_REQ_BIT);
+		err = -ENOENT;
+		goto seterr;
+	}
+
+	/* Is it an interrupt reply ID?	 */
+	if (oh->unique & FUSE_INT_REQ_BIT) {
+		err = 0;
+		if (oh->error == -ENOSYS)
+			fc->no_interrupt = 1;
+		else if (oh->error == -EAGAIN) {
+			/* XXX Interrupts not handled yet */
+			/* err = queue_interrupt(req); */
+			pr_warn("Intrerupt EAGAIN not supported yet");
+			err = -EINVAL;
+		}
+
+		goto seterr;
+	}
+
+	return 0;
+
+seterr:
+	pr_devel("%s:%d err=%d op=%d req-ret=%d", __func__, __LINE__, err,
+		 req->args->opcode, req->out.h.error);
+	oh->error = err;
+err:
+	return err;
+}
+
+static int fuse_uring_copy_from_ring(struct fuse_ring *ring,
+				     struct fuse_req *req,
+				     struct fuse_ring_ent *ent)
+{
+	struct fuse_copy_state cs;
+	struct fuse_args *args = req->args;
+	struct iov_iter iter;
+	int err, res;
+	struct fuse_ring_ent_in_out ring_in_out;
+
+	res = copy_from_user(&ring_in_out, &ent->headers->ring_ent_in_out,
+			     sizeof(ring_in_out));
+	if (res)
+		return -EFAULT;
+
+	err = import_ubuf(ITER_SOURCE, ent->payload, ent->max_arg_len, &iter);
+	if (err)
+		return err;
+
+	fuse_copy_init(&cs, 0, &iter);
+	cs.is_uring = 1;
+	cs.req = req;
+
+	return fuse_copy_out_args(&cs, args, ring_in_out.payload_sz);
+}
+
+ /*
+  * Copy data from the req to the ring buffer
+  */
+static int fuse_uring_copy_to_ring(struct fuse_ring *ring, struct fuse_req *req,
+				   struct fuse_ring_ent *ent)
+{
+	struct fuse_copy_state cs;
+	struct fuse_args *args = req->args;
+	struct fuse_in_arg *in_args = args->in_args;
+	int num_args = args->in_numargs;
+	int err, res;
+	struct iov_iter iter;
+	struct fuse_ring_ent_in_out ring_in_out = { .flags = 0 };
+
+	if (num_args == 0)
+		return 0;
+
+	err = import_ubuf(ITER_DEST, ent->payload, ent->max_arg_len, &iter);
+	if (err) {
+		pr_info_ratelimited("Import user buffer failed\n");
+		return err;
+	}
+
+	fuse_copy_init(&cs, 1, &iter);
+	cs.is_uring = 1;
+	cs.req = req;
+
+	/*
+	 * Expectation is that the first argument is the per op header.
+	 * Some op code have that as zero.
+	 */
+	if (args->in_args[0].size > 0) {
+		res = copy_to_user(&ent->headers->op_in, in_args->value,
+				   in_args->size);
+		err = res > 0 ? -EFAULT : res;
+		if (err) {
+			pr_info_ratelimited("Copying the header failed.\n");
+			return err;
+		}
+	}
+	in_args++;
+	num_args--;
+
+	/* copy the payload */
+	err = fuse_copy_args(&cs, num_args, args->in_pages,
+			     (struct fuse_arg *)in_args, 0);
+	if (err) {
+		pr_info_ratelimited("%s fuse_copy_args failed\n", __func__);
+		return err;
+	}
+
+	ring_in_out.payload_sz = cs.ring.offset;
+	res = copy_to_user(&ent->headers->ring_ent_in_out, &ring_in_out,
+			   sizeof(ring_in_out));
+	err = res > 0 ? -EFAULT : res;
+	if (err)
+		return err;
+
+	return 0;
+}
+
+static int
+fuse_uring_prepare_send(struct fuse_ring_ent *ring_ent)
+{
+	struct fuse_ring_queue *queue = ring_ent->queue;
+	struct fuse_ring *ring = queue->ring;
+	struct fuse_req *req = ring_ent->fuse_req;
+	int err = 0, res;
+
+	if (WARN_ON(ring_ent->state != FRRS_FUSE_REQ)) {
+		pr_err("qid=%d ring-req=%p invalid state %d on send\n",
+		       queue->qid, ring_ent, ring_ent->state);
+		err = -EIO;
+	}
+
+	if (err)
+		return err;
+
+	pr_devel("%s qid=%d state=%d cmd-done op=%d unique=%llu\n", __func__,
+		 queue->qid, ring_ent->state, req->in.h.opcode,
+		 req->in.h.unique);
+
+	/* copy the request */
+	err = fuse_uring_copy_to_ring(ring, req, ring_ent);
+	if (unlikely(err)) {
+		pr_info("Copy to ring failed: %d\n", err);
+		goto err;
+	}
+
+	/* copy fuse_in_header */
+	res = copy_to_user(&ring_ent->headers->in_out, &req->in.h,
+			   sizeof(req->in.h));
+	err = res > 0 ? -EFAULT : res;
+	if (err)
+		goto err;
+
+	set_bit(FR_SENT, &req->flags);
+	return 0;
+
+err:
+	fuse_uring_req_end(ring_ent, true, err);
+	return err;
+}
+
+/*
+ * Write data to the ring buffer and send the request to userspace,
+ * userspace will read it
+ * This is comparable with classical read(/dev/fuse)
+ */
+static int fuse_uring_send_next_to_ring(struct fuse_ring_ent *ring_ent)
+{
+	int err = 0;
+	struct fuse_ring_queue *queue = ring_ent->queue;
+
+	err = fuse_uring_prepare_send(ring_ent);
+	if (err)
+		goto err;
+
+	spin_lock(&queue->lock);
+	ring_ent->state = FRRS_USERSPACE;
+	list_move(&ring_ent->list, &queue->ent_in_userspace);
+	spin_unlock(&queue->lock);
+
+	io_uring_cmd_complete_in_task(ring_ent->cmd,
+				      fuse_uring_async_send_to_ring);
+	return 0;
+
+err:
+	return err;
+}
+
 /*
  * Put a ring request onto hold, it is no longer used for now.
  */
@@ -159,6 +420,193 @@ static void fuse_uring_ent_avail(struct fuse_ring_ent *ring_ent,
 	ring_ent->state = FRRS_WAIT;
 }
 
+/* Used to find the request on SQE commit */
+static void fuse_uring_add_to_pq(struct fuse_ring_ent *ring_ent,
+				 struct fuse_req *req)
+{
+	struct fuse_ring_queue *queue = ring_ent->queue;
+	struct fuse_pqueue *fpq = &queue->fpq;
+	unsigned int hash;
+
+	req->ring_entry = ring_ent;
+	hash = fuse_req_hash(req->in.h.unique);
+	list_move_tail(&req->list, &fpq->processing[hash]);
+}
+
+/*
+ * Assign a fuse queue entry to the given entry
+ */
+static void fuse_uring_add_req_to_ring_ent(struct fuse_ring_ent *ring_ent,
+					   struct fuse_req *req)
+{
+	struct fuse_ring_queue *queue = ring_ent->queue;
+
+	lockdep_assert_held(&queue->lock);
+
+	if (WARN_ON_ONCE(ring_ent->state != FRRS_WAIT &&
+			 ring_ent->state != FRRS_COMMIT)) {
+		pr_warn("%s qid=%d state=%d\n", __func__, ring_ent->queue->qid,
+			ring_ent->state);
+	}
+	list_del_init(&req->list);
+	clear_bit(FR_PENDING, &req->flags);
+	ring_ent->fuse_req = req;
+	ring_ent->state = FRRS_FUSE_REQ;
+	list_move(&ring_ent->list, &queue->ent_w_req_queue);
+	fuse_uring_add_to_pq(ring_ent, req);
+}
+
+/*
+ * Release the ring entry and fetch the next fuse request if available
+ *
+ * @return true if a new request has been fetched
+ */
+static bool fuse_uring_ent_assign_req(struct fuse_ring_ent *ring_ent)
+	__must_hold(&queue->lock)
+{
+	struct fuse_req *req = NULL;
+	struct fuse_ring_queue *queue = ring_ent->queue;
+	struct list_head *req_queue = &queue->fuse_req_queue;
+
+	lockdep_assert_held(&queue->lock);
+
+	/* get and assign the next entry while it is still holding the lock */
+	if (!list_empty(req_queue)) {
+		req = list_first_entry(req_queue, struct fuse_req, list);
+		fuse_uring_add_req_to_ring_ent(ring_ent, req);
+	}
+
+	return req ? true : false;
+}
+
+/*
+ * Read data from the ring buffer, which user space has written to
+ * This is comparible with handling of classical write(/dev/fuse).
+ * Also make the ring request available again for new fuse requests.
+ */
+static void fuse_uring_commit(struct fuse_ring_ent *ring_ent,
+			      unsigned int issue_flags)
+{
+	struct fuse_ring *ring = ring_ent->queue->ring;
+	struct fuse_conn *fc = ring->fc;
+	struct fuse_req *req = ring_ent->fuse_req;
+	ssize_t err = 0;
+	bool set_err = false;
+
+	err = copy_from_user(&req->out.h, &ring_ent->headers->in_out,
+			     sizeof(req->out.h));
+	if (err) {
+		req->out.h.error = err;
+		goto out;
+	}
+
+	err = fuse_uring_out_header_has_err(&req->out.h, req, fc);
+	if (err) {
+		/* req->out.h.error already set */
+		pr_devel("%s:%d err=%zd oh->err=%d\n", __func__, __LINE__, err,
+			 req->out.h.error);
+		goto out;
+	}
+
+	err = fuse_uring_copy_from_ring(ring, req, ring_ent);
+	if (err)
+		set_err = true;
+
+out:
+	pr_devel("%s:%d ret=%zd op=%d req-ret=%d\n", __func__, __LINE__, err,
+		 req->args->opcode, req->out.h.error);
+	fuse_uring_req_end(ring_ent, set_err, err);
+}
+
+/*
+ * Get the next fuse req and send it
+ */
+static void fuse_uring_next_fuse_req(struct fuse_ring_ent *ring_ent,
+				    struct fuse_ring_queue *queue)
+{
+	int has_next, err;
+	int prev_state = ring_ent->state;
+
+	do {
+		spin_lock(&queue->lock);
+		has_next = fuse_uring_ent_assign_req(ring_ent);
+		if (!has_next) {
+			fuse_uring_ent_avail(ring_ent, queue);
+			spin_unlock(&queue->lock);
+			break; /* no request left */
+		}
+		spin_unlock(&queue->lock);
+
+		err = fuse_uring_send_next_to_ring(ring_ent);
+		if (err)
+			ring_ent->state = prev_state;
+	} while (err);
+}
+
+/* FUSE_URING_REQ_COMMIT_AND_FETCH handler */
+static int fuse_uring_commit_fetch(struct io_uring_cmd *cmd, int issue_flags,
+				   struct fuse_conn *fc)
+{
+	const struct fuse_uring_cmd_req *cmd_req = io_uring_sqe_cmd(cmd->sqe);
+	struct fuse_ring_ent *ring_ent;
+	int err;
+	struct fuse_ring *ring = fc->ring;
+	struct fuse_ring_queue *queue;
+	uint64_t commit_id = cmd_req->commit_id;
+	struct fuse_pqueue fpq;
+	struct fuse_req *req;
+
+	err = -ENOTCONN;
+	if (!ring)
+		return err;
+
+	queue = ring->queues[cmd_req->qid];
+	if (!queue)
+		return err;
+	fpq = queue->fpq;
+
+	spin_lock(&queue->lock);
+	/* Find a request based on the unique ID of the fuse request
+	 * This should get revised, as it needs a hash calculation and list
+	 * search. And full struct fuse_pqueue is needed (memory overhead).
+	 * As well as the link from req to ring_ent.
+	 */
+	req = fuse_request_find(&fpq, commit_id);
+	err = -ENOENT;
+	if (!req) {
+		pr_info("qid=%d commit_id %llu not found\n", queue->qid,
+			commit_id);
+		spin_unlock(&queue->lock);
+		return err;
+	}
+	list_del_init(&req->list);
+	ring_ent = req->ring_entry;
+	req->ring_entry = NULL;
+
+	err = fuse_ring_ent_unset_userspace(ring_ent);
+	if (err != 0) {
+		pr_info_ratelimited("qid=%d commit_id %llu state %d",
+				    queue->qid, commit_id, ring_ent->state);
+		spin_unlock(&queue->lock);
+		return err;
+	}
+
+	ring_ent->cmd = cmd;
+	spin_unlock(&queue->lock);
+
+	/* without the queue lock, as other locks are taken */
+	fuse_uring_commit(ring_ent, issue_flags);
+
+	/*
+	 * Fetching the next request is absolutely required as queued
+	 * fuse requests would otherwise not get processed - committing
+	 * and fetching is done in one step vs legacy fuse, which has separated
+	 * read (fetch request) and write (commit result).
+	 */
+	fuse_uring_next_fuse_req(ring_ent, queue);
+	return 0;
+}
+
 /*
  * fuse_uring_req_fetch command handling
  */
@@ -335,6 +783,9 @@ int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
 			return err;
 		}
 		break;
+	case FUSE_URING_REQ_COMMIT_AND_FETCH:
+		err = fuse_uring_commit_fetch(cmd, issue_flags, fc);
+		break;
 	default:
 		err = -EINVAL;
 		pr_devel("Unknown uring command %d", cmd_op);
diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
index 17767e373d31969fe2987fed31c66b5077f209c6..65e8ca9bcb10f11b1b62f2b59cda979da961ebd4 100644
--- a/fs/fuse/dev_uring_i.h
+++ b/fs/fuse/dev_uring_i.h
@@ -20,6 +20,9 @@ enum fuse_ring_req_state {
 	/* The ring entry is waiting for new fuse requests */
 	FRRS_WAIT,
 
+	/* The ring entry got assigned a fuse req */
+	FRRS_FUSE_REQ,
+
 	/* The ring entry is in or on the way to user space */
 	FRRS_USERSPACE,
 };
@@ -72,7 +75,16 @@ struct fuse_ring_queue {
 	 * entries in the process of being committed or in the process
 	 * to be send to userspace
 	 */
+	struct list_head ent_w_req_queue;
 	struct list_head ent_commit_queue;
+
+	/* entries in userspace */
+	struct list_head ent_in_userspace;
+
+	/* fuse requests waiting for an entry slot */
+	struct list_head fuse_req_queue;
+
+	struct fuse_pqueue fpq;
 };
 
 /**
diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
index 7ecb103af6f0feca99eb8940872c6a5ccf2e5186..a8d578b99a14239c05b4a496a4b3b1396eb768dd 100644
--- a/fs/fuse/fuse_dev_i.h
+++ b/fs/fuse/fuse_dev_i.h
@@ -7,7 +7,7 @@
 #define _FS_FUSE_DEV_I_H
 
 #include <linux/types.h>
-
+#include <linux/fs.h>
 
 /* Ordinary requests have even IDs, while interrupts IDs are odd */
 #define FUSE_INT_REQ_BIT (1ULL << 0)
@@ -15,6 +15,8 @@
 
 struct fuse_arg;
 struct fuse_args;
+struct fuse_pqueue;
+struct fuse_req;
 
 struct fuse_copy_state {
 	int write;
@@ -44,6 +46,9 @@ static inline struct fuse_dev *fuse_get_dev(struct file *file)
 	return READ_ONCE(file->private_data);
 }
 
+unsigned int fuse_req_hash(u64 unique);
+struct fuse_req *fuse_request_find(struct fuse_pqueue *fpq, u64 unique);
+
 void fuse_dev_end_requests(struct list_head *head);
 
 void fuse_copy_init(struct fuse_copy_state *cs, int write,
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 5e009a3511d3dd4e9c0e8b4f08ebb271831b1236..55cac719ed7355a73546c148f1b2c257fa1b70f7 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -435,6 +435,10 @@ struct fuse_req {
 
 	/** fuse_mount this request belongs to */
 	struct fuse_mount *fm;
+
+#ifdef CONFIG_FUSE_IO_URING
+	void *ring_entry;
+#endif
 };
 
 struct fuse_iqueue;
@@ -1215,6 +1219,11 @@ void fuse_change_entry_timeout(struct dentry *entry, struct fuse_entry_out *o);
  */
 struct fuse_conn *fuse_conn_get(struct fuse_conn *fc);
 
+/**
+ * Initialize the fuse processing queue
+ */
+void fuse_pqueue_init(struct fuse_pqueue *fpq);
+
 /**
  * Initialize fuse_conn
  */
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index c8f72a50047ac1dfc7e52e9f4e49716a016326ff..b0d44176601f6f7591042d8553596888a7490a85 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -906,7 +906,7 @@ static void fuse_iqueue_init(struct fuse_iqueue *fiq,
 	fiq->priv = priv;
 }
 
-static void fuse_pqueue_init(struct fuse_pqueue *fpq)
+void fuse_pqueue_init(struct fuse_pqueue *fpq)
 {
 	unsigned int i;
 

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH RFC v6 10/16] fuse: {uring} Handle teardown of ring entries
  2024-11-21 23:43 [PATCH RFC v6 00/16] fuse: fuse-over-io-uring Bernd Schubert
                   ` (8 preceding siblings ...)
  2024-11-21 23:43 ` [PATCH RFC v6 09/16] fuse: {uring} Add uring sqe commit and fetch support Bernd Schubert
@ 2024-11-21 23:43 ` Bernd Schubert
  2024-11-21 23:43 ` [PATCH RFC v6 11/16] fuse: {uring} Allow to queue fg requests through io-uring Bernd Schubert
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 24+ messages in thread
From: Bernd Schubert @ 2024-11-21 23:43 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
	Bernd Schubert

On teardown struct file_operations::uring_cmd requests
need to be completed by calling io_uring_cmd_done().
Not completing all ring entries would result in busy io-uring
tasks giving warning messages in intervals and unreleased
struct file.

Additionally the fuse connection and with that the ring can
only get released when all io-uring commands are completed.

Completion is done with ring entries that are
a) in waiting state for new fuse requests - io_uring_cmd_done
is needed

b) already in userspace - io_uring_cmd_done through teardown
is not needed, the request can just get released. If fuse server
is still active and commits such a ring entry, fuse_uring_cmd()
already checks if the connection is active and then complete the
io-uring itself with -ENOTCONN. I.e. special handling is not
needed.

This scheme is basically represented by the ring entry state
FRRS_WAIT and FRRS_USERSPACE.

Entries in state:
- FRRS_INIT: No action needed, do not contribute to
  ring->queue_refs yet
- All other states: Are currently processed by other tasks,
  async teardown is needed and it has to wait for the two
  states above. It could be also solved without an async
  teardown task, but would require additional if conditions
  in hot code paths. Also in my personal opinion the code
  looks cleaner with async teardown.

Signed-off-by: Bernd Schubert <[email protected]>
---
 fs/fuse/dev.c         |   8 ++
 fs/fuse/dev_uring.c   | 216 ++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/fuse/dev_uring_i.h |  51 ++++++++++++
 3 files changed, 275 insertions(+)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 15dc168fd789bf11f27fae11a732a3dfc60de97d..17a76d0c964f1ecd27dd447504c94646f4ba6b6e 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -2295,6 +2295,12 @@ void fuse_abort_conn(struct fuse_conn *fc)
 		spin_unlock(&fc->lock);
 
 		fuse_dev_end_requests(&to_end);
+
+		/*
+		 * fc->lock must not be taken to avoid conflicts with io-uring
+		 * locks
+		 */
+		fuse_uring_abort(fc);
 	} else {
 		spin_unlock(&fc->lock);
 	}
@@ -2306,6 +2312,8 @@ void fuse_wait_aborted(struct fuse_conn *fc)
 	/* matches implicit memory barrier in fuse_drop_waiting() */
 	smp_mb();
 	wait_event(fc->blocked_waitq, atomic_read(&fc->num_waiting) == 0);
+
+	fuse_uring_wait_stopped_queues(fc);
 }
 
 int fuse_dev_release(struct inode *inode, struct file *file)
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index 46aab7f7ee0680e84e3a62ae99e664d8b0f85421..19d5d3eafced090a84651b21a9f65cd8b3414435 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -52,6 +52,37 @@ static int fuse_ring_ent_unset_userspace(struct fuse_ring_ent *ent)
 	return 0;
 }
 
+/* Abort all list queued request on the given ring queue */
+static void fuse_uring_abort_end_queue_requests(struct fuse_ring_queue *queue)
+{
+	struct fuse_req *req;
+	LIST_HEAD(req_list);
+
+	spin_lock(&queue->lock);
+	list_for_each_entry(req, &queue->fuse_req_queue, list)
+		clear_bit(FR_PENDING, &req->flags);
+	list_splice_init(&queue->fuse_req_queue, &req_list);
+	spin_unlock(&queue->lock);
+
+	/* must not hold queue lock to avoid order issues with fi->lock */
+	fuse_dev_end_requests(&req_list);
+}
+
+void fuse_uring_abort_end_requests(struct fuse_ring *ring)
+{
+	int qid;
+	struct fuse_ring_queue *queue;
+
+	for (qid = 0; qid < ring->nr_queues; qid++) {
+		queue = READ_ONCE(ring->queues[qid]);
+		if (!queue)
+			continue;
+
+		queue->stopped = true;
+		fuse_uring_abort_end_queue_requests(queue);
+	}
+}
+
 void fuse_uring_destruct(struct fuse_conn *fc)
 {
 	struct fuse_ring *ring = fc->ring;
@@ -111,9 +142,12 @@ static struct fuse_ring *fuse_uring_create(struct fuse_conn *fc)
 		goto out_err;
 	}
 
+	init_waitqueue_head(&ring->stop_waitq);
+
 	fc->ring = ring;
 	ring->nr_queues = nr_queues;
 	ring->fc = fc;
+	atomic_set(&ring->queue_refs, 0);
 
 	spin_unlock(&fc->lock);
 	return ring;
@@ -175,6 +209,182 @@ fuse_uring_async_send_to_ring(struct io_uring_cmd *cmd,
 	io_uring_cmd_done(cmd, 0, 0, issue_flags);
 }
 
+static void fuse_uring_stop_fuse_req_end(struct fuse_ring_ent *ent)
+{
+	struct fuse_req *req = ent->fuse_req;
+
+	/* remove entry from fuse_pqueue->processing */
+	list_del_init(&req->list);
+	ent->fuse_req = NULL;
+	clear_bit(FR_SENT, &req->flags);
+	req->out.h.error = -ECONNABORTED;
+	fuse_request_end(req);
+}
+
+/*
+ * Release a request/entry on connection tear down
+ */
+static void fuse_uring_entry_teardown(struct fuse_ring_ent *ent,
+					 bool need_cmd_done)
+{
+	struct fuse_ring_queue *queue = ent->queue;
+
+	/*
+	 * fuse_request_end() might take other locks like fi->lock and
+	 * can lead to lock ordering issues
+	 */
+	lockdep_assert_not_held(&ent->queue->lock);
+
+	if (need_cmd_done) {
+		pr_devel("qid=%d sending cmd_done\n", queue->qid);
+
+		io_uring_cmd_done(ent->cmd, -ENOTCONN, 0,
+				  IO_URING_F_UNLOCKED);
+	}
+
+	if (ent->fuse_req)
+		fuse_uring_stop_fuse_req_end(ent);
+
+	list_del_init(&ent->list);
+	kfree(ent);
+}
+
+static void fuse_uring_stop_list_entries(struct list_head *head,
+					 struct fuse_ring_queue *queue,
+					 enum fuse_ring_req_state exp_state)
+{
+	struct fuse_ring *ring = queue->ring;
+	struct fuse_ring_ent *ent, *next;
+	ssize_t queue_refs = SSIZE_MAX;
+	LIST_HEAD(to_teardown);
+
+	spin_lock(&queue->lock);
+	list_for_each_entry_safe(ent, next, head, list) {
+		if (ent->state != exp_state) {
+			pr_warn("entry teardown qid=%d state=%d expected=%d",
+				queue->qid, ent->state, exp_state);
+			continue;
+		}
+
+		list_move(&ent->list, &to_teardown);
+	}
+	spin_unlock(&queue->lock);
+
+	/* no queue lock to avoid lock order issues */
+	list_for_each_entry_safe(ent, next, &to_teardown, list) {
+		bool need_cmd_done = ent->state != FRRS_USERSPACE;
+
+		fuse_uring_entry_teardown(ent, need_cmd_done);
+		queue_refs = atomic_dec_return(&ring->queue_refs);
+
+		if (WARN_ON_ONCE(queue_refs < 0))
+			pr_warn("qid=%d queue_refs=%zd", queue->qid,
+				queue_refs);
+	}
+}
+
+static void fuse_uring_teardown_entries(struct fuse_ring_queue *queue)
+{
+	fuse_uring_stop_list_entries(&queue->ent_in_userspace, queue,
+				     FRRS_USERSPACE);
+	fuse_uring_stop_list_entries(&queue->ent_avail_queue, queue, FRRS_WAIT);
+}
+
+/*
+ * Log state debug info
+ */
+static void fuse_uring_log_ent_state(struct fuse_ring *ring)
+{
+	int qid;
+	struct fuse_ring_ent *ent;
+
+	for (qid = 0; qid < ring->nr_queues; qid++) {
+		struct fuse_ring_queue *queue = ring->queues[qid];
+
+		if (!queue)
+			continue;
+
+		spin_lock(&queue->lock);
+		/*
+		 * Log entries from the intermediate queue, the other queues
+		 * should be empty
+		 */
+		list_for_each_entry(ent, &queue->ent_w_req_queue, list) {
+			pr_info(" ent-req-queue ring=%p qid=%d ent=%p state=%d\n",
+				ring, qid, ent, ent->state);
+		}
+		list_for_each_entry(ent, &queue->ent_commit_queue, list) {
+			pr_info(" ent-req-queue ring=%p qid=%d ent=%p state=%d\n",
+				ring, qid, ent, ent->state);
+		}
+		spin_unlock(&queue->lock);
+	}
+	ring->stop_debug_log = 1;
+}
+
+static void fuse_uring_async_stop_queues(struct work_struct *work)
+{
+	int qid;
+	struct fuse_ring *ring =
+		container_of(work, struct fuse_ring, async_teardown_work.work);
+
+	/* XXX code dup */
+	for (qid = 0; qid < ring->nr_queues; qid++) {
+		struct fuse_ring_queue *queue = READ_ONCE(ring->queues[qid]);
+
+		if (!queue)
+			continue;
+
+		fuse_uring_teardown_entries(queue);
+	}
+
+	/*
+	 * Some ring entries are might be in the middle of IO operations,
+	 * i.e. in process to get handled by file_operations::uring_cmd
+	 * or on the way to userspace - we could handle that with conditions in
+	 * run time code, but easier/cleaner to have an async tear down handler
+	 * If there are still queue references left
+	 */
+	if (atomic_read(&ring->queue_refs) > 0) {
+		if (time_after(jiffies,
+			       ring->teardown_time + FUSE_URING_TEARDOWN_TIMEOUT))
+			fuse_uring_log_ent_state(ring);
+
+		schedule_delayed_work(&ring->async_teardown_work,
+				      FUSE_URING_TEARDOWN_INTERVAL);
+	} else {
+		wake_up_all(&ring->stop_waitq);
+	}
+}
+
+/*
+ * Stop the ring queues
+ */
+void fuse_uring_stop_queues(struct fuse_ring *ring)
+{
+	int qid;
+
+	for (qid = 0; qid < ring->nr_queues; qid++) {
+		struct fuse_ring_queue *queue = READ_ONCE(ring->queues[qid]);
+
+		if (!queue)
+			continue;
+
+		fuse_uring_teardown_entries(queue);
+	}
+
+	if (atomic_read(&ring->queue_refs) > 0) {
+		pr_info("ring=%p scheduling async queue stop\n", ring);
+		ring->teardown_time = jiffies;
+		INIT_DELAYED_WORK(&ring->async_teardown_work,
+				  fuse_uring_async_stop_queues);
+		schedule_delayed_work(&ring->async_teardown_work,
+				      FUSE_URING_TEARDOWN_INTERVAL);
+	} else {
+		wake_up_all(&ring->stop_waitq);
+	}
+}
+
 /*
  * Checks for errors and stores it into the request
  */
@@ -565,6 +775,9 @@ static int fuse_uring_commit_fetch(struct io_uring_cmd *cmd, int issue_flags,
 		return err;
 	fpq = queue->fpq;
 
+	if (!READ_ONCE(fc->connected) || READ_ONCE(queue->stopped))
+		return err;
+
 	spin_lock(&queue->lock);
 	/* Find a request based on the unique ID of the fuse request
 	 * This should get revised, as it needs a hash calculation and list
@@ -732,6 +945,7 @@ static int fuse_uring_fetch(struct io_uring_cmd *cmd, unsigned int issue_flags,
 	if (WARN_ON_ONCE(err != 0))
 		goto err;
 
+	atomic_inc(&ring->queue_refs);
 	_fuse_uring_fetch(ring_ent, cmd, issue_flags);
 
 	return 0;
@@ -758,6 +972,8 @@ int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
 	pr_info_ratelimited("fuse-io-uring is not enabled yet\n");
 	return err;
 
+	pr_devel("%s:%d received: cmd op %d\n", __func__, __LINE__, cmd_op);
+
 	err = -EOPNOTSUPP;
 	if (!enable_uring) {
 		pr_info_ratelimited("uring is disabled\n");
diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
index 65e8ca9bcb10f11b1b62f2b59cda979da961ebd4..e567a20731d76f47b7ebe3f31da4a9348f6d2bc8 100644
--- a/fs/fuse/dev_uring_i.h
+++ b/fs/fuse/dev_uring_i.h
@@ -11,6 +11,9 @@
 
 #ifdef CONFIG_FUSE_IO_URING
 
+#define FUSE_URING_TEARDOWN_TIMEOUT (5 * HZ)
+#define FUSE_URING_TEARDOWN_INTERVAL (HZ/20)
+
 enum fuse_ring_req_state {
 	FRRS_INVALID = 0,
 
@@ -85,6 +88,8 @@ struct fuse_ring_queue {
 	struct list_head fuse_req_queue;
 
 	struct fuse_pqueue fpq;
+
+	bool stopped;
 };
 
 /**
@@ -99,11 +104,50 @@ struct fuse_ring {
 	size_t nr_queues;
 
 	struct fuse_ring_queue **queues;
+	/*
+	 * Log ring entry states onces on stop when entries cannot be
+	 * released
+	 */
+	unsigned int stop_debug_log : 1;
+
+	wait_queue_head_t stop_waitq;
+
+	/* async tear down */
+	struct delayed_work async_teardown_work;
+
+	/* log */
+	unsigned long teardown_time;
+
+	atomic_t queue_refs;
 };
 
 void fuse_uring_destruct(struct fuse_conn *fc);
+void fuse_uring_stop_queues(struct fuse_ring *ring);
+void fuse_uring_abort_end_requests(struct fuse_ring *ring);
 int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags);
 
+static inline void fuse_uring_abort(struct fuse_conn *fc)
+{
+	struct fuse_ring *ring = fc->ring;
+
+	if (ring == NULL)
+		return;
+
+	if (atomic_read(&ring->queue_refs) > 0) {
+		fuse_uring_abort_end_requests(ring);
+		fuse_uring_stop_queues(ring);
+	}
+}
+
+static inline void fuse_uring_wait_stopped_queues(struct fuse_conn *fc)
+{
+	struct fuse_ring *ring = fc->ring;
+
+	if (ring)
+		wait_event(ring->stop_waitq,
+			   atomic_read(&ring->queue_refs) == 0);
+}
+
 #else /* CONFIG_FUSE_IO_URING */
 
 struct fuse_ring;
@@ -116,6 +160,13 @@ static inline void fuse_uring_destruct(struct fuse_conn *fc)
 {
 }
 
+static inline void fuse_uring_abort(struct fuse_conn *fc)
+{
+}
+
+static inline void fuse_uring_wait_stopped_queues(struct fuse_conn *fc)
+{
+}
 #endif /* CONFIG_FUSE_IO_URING */
 
 #endif /* _FS_FUSE_DEV_URING_I_H */

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH RFC v6 11/16] fuse: {uring} Allow to queue fg requests through io-uring
  2024-11-21 23:43 [PATCH RFC v6 00/16] fuse: fuse-over-io-uring Bernd Schubert
                   ` (9 preceding siblings ...)
  2024-11-21 23:43 ` [PATCH RFC v6 10/16] fuse: {uring} Handle teardown of ring entries Bernd Schubert
@ 2024-11-21 23:43 ` Bernd Schubert
  2024-11-21 23:43 ` [PATCH RFC v6 12/16] fuse: {uring} Allow to queue to the ring Bernd Schubert
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 24+ messages in thread
From: Bernd Schubert @ 2024-11-21 23:43 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
	Bernd Schubert

This prepares queueing and sending foreground requests through
io-uring.

Signed-off-by: Bernd Schubert <[email protected]>
---
 fs/fuse/dev.c         |   5 +-
 fs/fuse/dev_uring.c   | 159 ++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/fuse/dev_uring_i.h |  11 ++++
 fs/fuse/fuse_dev_i.h  |   5 ++
 4 files changed, 178 insertions(+), 2 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 17a76d0c964f1ecd27dd447504c94646f4ba6b6e..ff7fd5c1096e8bb1f3479c2ac353c9a14fbf7ecd 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -237,7 +237,8 @@ __releases(fiq->lock)
 	spin_unlock(&fiq->lock);
 }
 
-static void fuse_dev_queue_forget(struct fuse_iqueue *fiq, struct fuse_forget_link *forget)
+void fuse_dev_queue_forget(struct fuse_iqueue *fiq,
+			   struct fuse_forget_link *forget)
 {
 	spin_lock(&fiq->lock);
 	if (fiq->connected) {
@@ -250,7 +251,7 @@ static void fuse_dev_queue_forget(struct fuse_iqueue *fiq, struct fuse_forget_li
 	}
 }
 
-static void fuse_dev_queue_interrupt(struct fuse_iqueue *fiq, struct fuse_req *req)
+void fuse_dev_queue_interrupt(struct fuse_iqueue *fiq, struct fuse_req *req)
 {
 	spin_lock(&fiq->lock);
 	if (list_empty(&req->intr_entry)) {
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index 19d5d3eafced090a84651b21a9f65cd8b3414435..d8653b4fd990000c8de073089416944877b4a3a8 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -21,6 +21,12 @@ MODULE_PARM_DESC(enable_uring,
 
 #define FUSE_URING_IOV_SEGS 2 /* header and payload */
 
+struct fuse_uring_cmd_pdu {
+	struct fuse_ring_ent *ring_ent;
+};
+
+const struct fuse_iqueue_ops fuse_io_uring_ops;
+
 /*
  * Finalize a fuse request, then fetch and send the next entry, if available
  */
@@ -820,6 +826,31 @@ static int fuse_uring_commit_fetch(struct io_uring_cmd *cmd, int issue_flags,
 	return 0;
 }
 
+static bool is_ring_ready(struct fuse_ring *ring, int current_qid)
+{
+	int qid;
+	struct fuse_ring_queue *queue;
+	bool ready = true;
+
+	for (qid = 0; qid < ring->nr_queues && ready; qid++) {
+		if (current_qid == qid)
+			continue;
+
+		queue = ring->queues[qid];
+		if (!queue) {
+			ready = false;
+			break;
+		}
+
+		spin_lock(&queue->lock);
+		if (list_empty(&queue->ent_avail_queue))
+			ready = false;
+		spin_unlock(&queue->lock);
+	}
+
+	return ready;
+}
+
 /*
  * fuse_uring_req_fetch command handling
  */
@@ -828,11 +859,23 @@ static void _fuse_uring_fetch(struct fuse_ring_ent *ring_ent,
 			      unsigned int issue_flags)
 {
 	struct fuse_ring_queue *queue = ring_ent->queue;
+	struct fuse_ring *ring = queue->ring;
+	struct fuse_conn *fc = ring->fc;
+	struct fuse_iqueue *fiq = &fc->iq;
 
 	spin_lock(&queue->lock);
 	fuse_uring_ent_avail(ring_ent, queue);
 	ring_ent->cmd = cmd;
 	spin_unlock(&queue->lock);
+
+	if (!ring->ready) {
+		bool ready = is_ring_ready(ring, queue->qid);
+
+		if (ready) {
+			WRITE_ONCE(ring->ready, true);
+			fiq->ops = &fuse_io_uring_ops;
+		}
+	}
 }
 
 /*
@@ -1013,3 +1056,119 @@ int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
 
 	return -EIOCBQUEUED;
 }
+
+/*
+ * This prepares and sends the ring request in fuse-uring task context.
+ * User buffers are not mapped yet - the application does not have permission
+ * to write to it - this has to be executed in ring task context.
+ */
+static void
+fuse_uring_send_req_in_task(struct io_uring_cmd *cmd,
+			    unsigned int issue_flags)
+{
+	struct fuse_uring_cmd_pdu *pdu = (struct fuse_uring_cmd_pdu *)cmd->pdu;
+	struct fuse_ring_ent *ring_ent = pdu->ring_ent;
+	struct fuse_ring_queue *queue = ring_ent->queue;
+	int err;
+
+	BUILD_BUG_ON(sizeof(pdu) > sizeof(cmd->pdu));
+
+	err = fuse_uring_prepare_send(ring_ent);
+	if (err)
+		goto err;
+
+	io_uring_cmd_done(cmd, 0, 0, issue_flags);
+
+	spin_lock(&queue->lock);
+	ring_ent->state = FRRS_USERSPACE;
+	list_move(&ring_ent->list, &queue->ent_in_userspace);
+	spin_unlock(&queue->lock);
+	return;
+err:
+	fuse_uring_next_fuse_req(ring_ent, queue);
+}
+
+static struct fuse_ring_queue *fuse_uring_task_to_queue(struct fuse_ring *ring)
+{
+	unsigned int qid;
+	struct fuse_ring_queue *queue;
+
+	qid = task_cpu(current);
+
+	if (WARN_ONCE(qid >= ring->nr_queues,
+		      "Core number (%u) exceeds nr ueues (%zu)\n", qid,
+		      ring->nr_queues))
+		qid = 0;
+
+	queue = ring->queues[qid];
+	if (WARN_ONCE(!queue, "Missing queue for qid %d\n", qid))
+		return NULL;
+
+	return queue;
+}
+
+/* queue a fuse request and send it if a ring entry is available */
+void fuse_uring_queue_fuse_req(struct fuse_iqueue *fiq, struct fuse_req *req)
+{
+	struct fuse_conn *fc = req->fm->fc;
+	struct fuse_ring *ring = fc->ring;
+	struct fuse_ring_queue *queue;
+	struct fuse_ring_ent *ring_ent = NULL;
+	int err;
+
+	err = -EINVAL;
+	queue = fuse_uring_task_to_queue(ring);
+	if (!queue)
+		goto err;
+
+	if (req->in.h.opcode != FUSE_NOTIFY_REPLY)
+		req->in.h.unique = fuse_get_unique(fiq);
+	spin_lock(&queue->lock);
+	err = -ENOTCONN;
+	if (unlikely(queue->stopped))
+		goto err_unlock;
+
+	if (!list_empty(&queue->ent_avail_queue)) {
+		ring_ent = list_first_entry(&queue->ent_avail_queue,
+					    struct fuse_ring_ent, list);
+
+		fuse_uring_add_req_to_ring_ent(ring_ent, req);
+	} else {
+		list_add_tail(&req->list, &queue->fuse_req_queue);
+	}
+	spin_unlock(&queue->lock);
+
+	if (ring_ent) {
+		struct io_uring_cmd *cmd = ring_ent->cmd;
+		struct fuse_uring_cmd_pdu *pdu =
+			(struct fuse_uring_cmd_pdu *)cmd->pdu;
+
+		err = -EIO;
+		if (WARN_ON_ONCE(ring_ent->state != FRRS_FUSE_REQ))
+			goto err;
+
+		pdu->ring_ent = ring_ent;
+		io_uring_cmd_complete_in_task(cmd, fuse_uring_send_req_in_task);
+	}
+
+	return;
+
+err_unlock:
+	spin_unlock(&queue->lock);
+err:
+	req->out.h.error = err;
+	clear_bit(FR_PENDING, &req->flags);
+	fuse_request_end(req);
+}
+
+const struct fuse_iqueue_ops fuse_io_uring_ops = {
+	/* should be send over io-uring as enhancement */
+	.send_forget = fuse_dev_queue_forget,
+
+	/*
+	 * could be send over io-uring, but interrupts should be rare,
+	 * no need to make the code complex
+	 */
+	.send_interrupt = fuse_dev_queue_interrupt,
+	.send_req = fuse_uring_queue_fuse_req,
+};
diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
index e567a20731d76f47b7ebe3f31da4a9348f6d2bc8..57aa3ed04447eb832e5a0463f06969a04154b181 100644
--- a/fs/fuse/dev_uring_i.h
+++ b/fs/fuse/dev_uring_i.h
@@ -119,12 +119,15 @@ struct fuse_ring {
 	unsigned long teardown_time;
 
 	atomic_t queue_refs;
+
+	bool ready;
 };
 
 void fuse_uring_destruct(struct fuse_conn *fc);
 void fuse_uring_stop_queues(struct fuse_ring *ring);
 void fuse_uring_abort_end_requests(struct fuse_ring *ring);
 int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags);
+void fuse_uring_queue_fuse_req(struct fuse_iqueue *fiq, struct fuse_req *req);
 
 static inline void fuse_uring_abort(struct fuse_conn *fc)
 {
@@ -133,6 +136,8 @@ static inline void fuse_uring_abort(struct fuse_conn *fc)
 	if (ring == NULL)
 		return;
 
+	WRITE_ONCE(ring->ready, false);
+
 	if (atomic_read(&ring->queue_refs) > 0) {
 		fuse_uring_abort_end_requests(ring);
 		fuse_uring_stop_queues(ring);
@@ -148,6 +153,11 @@ static inline void fuse_uring_wait_stopped_queues(struct fuse_conn *fc)
 			   atomic_read(&ring->queue_refs) == 0);
 }
 
+static inline bool fuse_uring_ready(struct fuse_conn *fc)
+{
+	return fc->ring && fc->ring->ready;
+}
+
 #else /* CONFIG_FUSE_IO_URING */
 
 struct fuse_ring;
@@ -167,6 +177,7 @@ static inline void fuse_uring_abort(struct fuse_conn *fc)
 static inline void fuse_uring_wait_stopped_queues(struct fuse_conn *fc)
 {
 }
+
 #endif /* CONFIG_FUSE_IO_URING */
 
 #endif /* _FS_FUSE_DEV_URING_I_H */
diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
index a8d578b99a14239c05b4a496a4b3b1396eb768dd..545aeae93400c6b3ba49c8fc17993a9692665416 100644
--- a/fs/fuse/fuse_dev_i.h
+++ b/fs/fuse/fuse_dev_i.h
@@ -17,6 +17,8 @@ struct fuse_arg;
 struct fuse_args;
 struct fuse_pqueue;
 struct fuse_req;
+struct fuse_iqueue *fiq;
+struct fuse_forget_link *forget;
 
 struct fuse_copy_state {
 	int write;
@@ -58,6 +60,9 @@ int fuse_copy_args(struct fuse_copy_state *cs, unsigned int numargs,
 		   int zeroing);
 int fuse_copy_out_args(struct fuse_copy_state *cs, struct fuse_args *args,
 		       unsigned int nbytes);
+void fuse_dev_queue_forget(struct fuse_iqueue *fiq,
+			   struct fuse_forget_link *forget);
+void fuse_dev_queue_interrupt(struct fuse_iqueue *fiq, struct fuse_req *req);
 
 #endif
 

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH RFC v6 12/16] fuse: {uring} Allow to queue to the ring
  2024-11-21 23:43 [PATCH RFC v6 00/16] fuse: fuse-over-io-uring Bernd Schubert
                   ` (10 preceding siblings ...)
  2024-11-21 23:43 ` [PATCH RFC v6 11/16] fuse: {uring} Allow to queue fg requests through io-uring Bernd Schubert
@ 2024-11-21 23:43 ` Bernd Schubert
  2024-11-21 23:43 ` [PATCH RFC v6 13/16] io_uring/cmd: let cmds to know about dying task Bernd Schubert
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 24+ messages in thread
From: Bernd Schubert @ 2024-11-21 23:43 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
	Bernd Schubert

This enables enqueuing requests through fuse uring queues.

For initial simplicity requests are always allocated the normal way
then added to ring queues lists and only then copied to ring queue
entries. Later on the allocation and adding the requests to a list
can be avoided, by directly using a ring entry. This introduces
some code complexity and is therefore not done for now.

Signed-off-by: Bernd Schubert <[email protected]>
---
 fs/fuse/dev.c         |  26 ++++++++++++-
 fs/fuse/dev_uring.c   | 103 ++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/fuse/dev_uring_i.h |   6 +++
 fs/fuse/fuse_dev_i.h  |   4 +-
 4 files changed, 136 insertions(+), 3 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index ff7fd5c1096e8bb1f3479c2ac353c9a14fbf7ecd..dc8fc46efca82d30afb64b6c0e6a361fd951ca33 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -568,7 +568,25 @@ ssize_t __fuse_simple_request(struct mnt_idmap *idmap,
 	return ret;
 }
 
-static bool fuse_request_queue_background(struct fuse_req *req)
+#ifdef CONFIG_FUSE_IO_URING
+static bool fuse_request_queue_background_uring(struct fuse_conn *fc,
+					       struct fuse_req *req)
+{
+	struct fuse_iqueue *fiq = &fc->iq;
+
+	req->in.h.unique = fuse_get_unique(fiq);
+	req->in.h.len = sizeof(struct fuse_in_header) +
+		fuse_len_args(req->args->in_numargs,
+			      (struct fuse_arg *) req->args->in_args);
+
+	return fuse_uring_queue_bq_req(req);
+}
+#endif
+
+/*
+ * @return true if queued
+ */
+static int fuse_request_queue_background(struct fuse_req *req)
 {
 	struct fuse_mount *fm = req->fm;
 	struct fuse_conn *fc = fm->fc;
@@ -580,6 +598,12 @@ static bool fuse_request_queue_background(struct fuse_req *req)
 		atomic_inc(&fc->num_waiting);
 	}
 	__set_bit(FR_ISREPLY, &req->flags);
+
+#ifdef CONFIG_FUSE_IO_URING
+	if (fuse_uring_ready(fc))
+		return fuse_request_queue_background_uring(fc, req);
+#endif
+
 	spin_lock(&fc->bg_lock);
 	if (likely(fc->connected)) {
 		fc->num_background++;
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index d8653b4fd990000c8de073089416944877b4a3a8..36ff1df1633880d66c23b13b425f70c6796c1c2c 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -27,13 +27,55 @@ struct fuse_uring_cmd_pdu {
 
 const struct fuse_iqueue_ops fuse_io_uring_ops;
 
+static void fuse_uring_flush_bg(struct fuse_ring_queue *queue)
+{
+	struct fuse_ring *ring = queue->ring;
+	struct fuse_conn *fc = ring->fc;
+
+	lockdep_assert_held(&queue->lock);
+	lockdep_assert_held(&fc->bg_lock);
+
+	/*
+	 * Allow one bg request per queue, ignoring global fc limits.
+	 * This prevents a single queue from consuming all resources and
+	 * eliminates the need for remote queue wake-ups when global
+	 * limits are met but this queue has no more waiting requests.
+	 */
+	while ((fc->active_background < fc->max_background ||
+		!queue->active_background) &&
+	       (!list_empty(&queue->fuse_req_bg_queue))) {
+		struct fuse_req *req;
+
+		req = list_first_entry(&queue->fuse_req_bg_queue,
+				       struct fuse_req, list);
+		fc->active_background++;
+		queue->active_background++;
+
+		list_move_tail(&req->list, &queue->fuse_req_queue);
+	}
+}
+
 /*
  * Finalize a fuse request, then fetch and send the next entry, if available
  */
 static void fuse_uring_req_end(struct fuse_ring_ent *ring_ent, bool set_err,
 			       int error)
 {
+	struct fuse_ring_queue *queue = ring_ent->queue;
 	struct fuse_req *req = ring_ent->fuse_req;
+	struct fuse_ring *ring = queue->ring;
+	struct fuse_conn *fc = ring->fc;
+
+	lockdep_assert_not_held(&queue->lock);
+	spin_lock(&queue->lock);
+	if (test_bit(FR_BACKGROUND, &req->flags)) {
+		queue->active_background--;
+		spin_lock(&fc->bg_lock);
+		fuse_uring_flush_bg(queue);
+		spin_unlock(&fc->bg_lock);
+	}
+
+	spin_unlock(&queue->lock);
 
 	if (set_err)
 		req->out.h.error = error;
@@ -78,6 +120,7 @@ void fuse_uring_abort_end_requests(struct fuse_ring *ring)
 {
 	int qid;
 	struct fuse_ring_queue *queue;
+	struct fuse_conn *fc = ring->fc;
 
 	for (qid = 0; qid < ring->nr_queues; qid++) {
 		queue = READ_ONCE(ring->queues[qid]);
@@ -85,6 +128,13 @@ void fuse_uring_abort_end_requests(struct fuse_ring *ring)
 			continue;
 
 		queue->stopped = true;
+
+		WARN_ON_ONCE(ring->fc->max_background != UINT_MAX);
+		spin_lock(&queue->lock);
+		spin_lock(&fc->bg_lock);
+		fuse_uring_flush_bg(queue);
+		spin_unlock(&fc->bg_lock);
+		spin_unlock(&queue->lock);
 		fuse_uring_abort_end_queue_requests(queue);
 	}
 }
@@ -198,6 +248,7 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
 	INIT_LIST_HEAD(&queue->ent_w_req_queue);
 	INIT_LIST_HEAD(&queue->ent_in_userspace);
 	INIT_LIST_HEAD(&queue->fuse_req_queue);
+	INIT_LIST_HEAD(&queue->fuse_req_bg_queue);
 
 	queue->fpq.processing = pq;
 	fuse_pqueue_init(&queue->fpq);
@@ -1161,6 +1212,58 @@ void fuse_uring_queue_fuse_req(struct fuse_iqueue *fiq, struct fuse_req *req)
 	fuse_request_end(req);
 }
 
+bool fuse_uring_queue_bq_req(struct fuse_req *req)
+{
+	struct fuse_conn *fc = req->fm->fc;
+	struct fuse_ring *ring = fc->ring;
+	struct fuse_ring_queue *queue;
+	struct fuse_ring_ent *ring_ent = NULL;
+
+	queue = fuse_uring_task_to_queue(ring);
+	if (!queue)
+		return false;
+
+	spin_lock(&queue->lock);
+	if (unlikely(queue->stopped)) {
+		spin_unlock(&queue->lock);
+		return false;
+	}
+
+	list_add_tail(&req->list, &queue->fuse_req_bg_queue);
+
+	if (!list_empty(&queue->ent_avail_queue))
+		ring_ent = list_first_entry(&queue->ent_avail_queue,
+					    struct fuse_ring_ent, list);
+
+	spin_lock(&fc->bg_lock);
+	fc->num_background++;
+	if (fc->num_background == fc->max_background)
+		fc->blocked = 1;
+	fuse_uring_flush_bg(queue);
+	spin_unlock(&fc->bg_lock);
+
+	/*
+	 * Due to bg_queue flush limits there might be other bg requests
+	 * in the queue that need to be handled first. Or no further req
+	 * might be available.
+	 */
+	req = list_first_entry_or_null(&queue->fuse_req_queue, struct fuse_req,
+				       list);
+	if (ring_ent && req) {
+		struct io_uring_cmd *cmd = ring_ent->cmd;
+		struct fuse_uring_cmd_pdu *pdu =
+			(struct fuse_uring_cmd_pdu *)cmd->pdu;
+
+		fuse_uring_add_req_to_ring_ent(ring_ent, req);
+
+		pdu->ring_ent = ring_ent;
+		io_uring_cmd_complete_in_task(cmd, fuse_uring_send_req_in_task);
+	}
+	spin_unlock(&queue->lock);
+
+	return true;
+}
+
 const struct fuse_iqueue_ops fuse_io_uring_ops = {
 	/* should be send over io-uring as enhancement */
 	.send_forget = fuse_dev_queue_forget,
diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
index 57aa3ed04447eb832e5a0463f06969a04154b181..8426337361c72a30dca8f6fd9012ea3827160091 100644
--- a/fs/fuse/dev_uring_i.h
+++ b/fs/fuse/dev_uring_i.h
@@ -87,8 +87,13 @@ struct fuse_ring_queue {
 	/* fuse requests waiting for an entry slot */
 	struct list_head fuse_req_queue;
 
+	/* background fuse requests */
+	struct list_head fuse_req_bg_queue;
+
 	struct fuse_pqueue fpq;
 
+	unsigned int active_background;
+
 	bool stopped;
 };
 
@@ -128,6 +133,7 @@ void fuse_uring_stop_queues(struct fuse_ring *ring);
 void fuse_uring_abort_end_requests(struct fuse_ring *ring);
 int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags);
 void fuse_uring_queue_fuse_req(struct fuse_iqueue *fiq, struct fuse_req *req);
+bool fuse_uring_queue_bq_req(struct fuse_req *req);
 
 static inline void fuse_uring_abort(struct fuse_conn *fc)
 {
diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
index 545aeae93400c6b3ba49c8fc17993a9692665416..853333d6fcd3382286532d03ef3cec8ab4979fe7 100644
--- a/fs/fuse/fuse_dev_i.h
+++ b/fs/fuse/fuse_dev_i.h
@@ -17,8 +17,8 @@ struct fuse_arg;
 struct fuse_args;
 struct fuse_pqueue;
 struct fuse_req;
-struct fuse_iqueue *fiq;
-struct fuse_forget_link *forget;
+struct fuse_iqueue;
+struct fuse_forget_link;
 
 struct fuse_copy_state {
 	int write;

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH RFC v6 13/16] io_uring/cmd: let cmds to know about dying task
  2024-11-21 23:43 [PATCH RFC v6 00/16] fuse: fuse-over-io-uring Bernd Schubert
                   ` (11 preceding siblings ...)
  2024-11-21 23:43 ` [PATCH RFC v6 12/16] fuse: {uring} Allow to queue to the ring Bernd Schubert
@ 2024-11-21 23:43 ` Bernd Schubert
  2024-11-21 23:43 ` [PATCH RFC v6 14/16] fuse: {uring} Handle IO_URING_F_TASK_DEAD Bernd Schubert
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 24+ messages in thread
From: Bernd Schubert @ 2024-11-21 23:43 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
	Bernd Schubert

From: Pavel Begunkov <[email protected]>

When the taks that submitted a request is dying, a task work for that
request might get run by a kernel thread or even worse by a half
dismantled task. We can't just cancel the task work without running the
callback as the cmd might need to do some clean up, so pass a flag
instead. If set, it's not safe to access any task resources and the
callback is expected to cancel the cmd ASAP.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 include/linux/io_uring_types.h | 1 +
 io_uring/uring_cmd.c           | 6 +++++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 4b9ba523978d203ae23fb4ec9622d6e4e35a5e36..2ee5dc105b58ab48aef347a845509367a8241b9b 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -37,6 +37,7 @@ enum io_uring_cmd_flags {
 	/* set when uring wants to cancel a previously issued command */
 	IO_URING_F_CANCEL		= (1 << 11),
 	IO_URING_F_COMPAT		= (1 << 12),
+	IO_URING_F_TASK_DEAD		= (1 << 13),
 };
 
 struct io_wq_work_node {
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 39c3c816ec7882b9aa26cd45df6ade531379e40f..38b6ccb4e55a1e85d204263272a280d3272557a4 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -119,9 +119,13 @@ EXPORT_SYMBOL_GPL(io_uring_cmd_mark_cancelable);
 static void io_uring_cmd_work(struct io_kiocb *req, struct io_tw_state *ts)
 {
 	struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
+	unsigned int flags = IO_URING_F_COMPLETE_DEFER;
+
+	if (req->task->flags & PF_EXITING)
+		flags |= IO_URING_F_TASK_DEAD;
 
 	/* task_work executor checks the deffered list completion */
-	ioucmd->task_work_cb(ioucmd, IO_URING_F_COMPLETE_DEFER);
+	ioucmd->task_work_cb(ioucmd, flags);
 }
 
 void __io_uring_cmd_do_in_task(struct io_uring_cmd *ioucmd,

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH RFC v6 14/16] fuse: {uring} Handle IO_URING_F_TASK_DEAD
  2024-11-21 23:43 [PATCH RFC v6 00/16] fuse: fuse-over-io-uring Bernd Schubert
                   ` (12 preceding siblings ...)
  2024-11-21 23:43 ` [PATCH RFC v6 13/16] io_uring/cmd: let cmds to know about dying task Bernd Schubert
@ 2024-11-21 23:43 ` Bernd Schubert
  2024-11-21 23:43 ` [PATCH RFC v6 15/16] fuse: {io-uring} Prevent mount point hang on fuse-server termination Bernd Schubert
  2024-11-21 23:43 ` [PATCH RFC v6 16/16] fuse: enable fuse-over-io-uring Bernd Schubert
  15 siblings, 0 replies; 24+ messages in thread
From: Bernd Schubert @ 2024-11-21 23:43 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
	Bernd Schubert

The ring task is terminating, it not safe to still access
its resources. Also no need for further actions.

Signed-off-by: Bernd Schubert <[email protected]>
---
 fs/fuse/dev_uring.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index 36ff1df1633880d66c23b13b425f70c6796c1c2c..d0f8f0932e1715babebbc715c1846a5052419eb9 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -1124,16 +1124,22 @@ fuse_uring_send_req_in_task(struct io_uring_cmd *cmd,
 
 	BUILD_BUG_ON(sizeof(pdu) > sizeof(cmd->pdu));
 
+	if (unlikely(issue_flags & IO_URING_F_TASK_DEAD)) {
+		err = -ECANCELED;
+		goto terminating;
+	}
+
 	err = fuse_uring_prepare_send(ring_ent);
 	if (err)
 		goto err;
 
-	io_uring_cmd_done(cmd, 0, 0, issue_flags);
-
+terminating:
 	spin_lock(&queue->lock);
 	ring_ent->state = FRRS_USERSPACE;
 	list_move(&ring_ent->list, &queue->ent_in_userspace);
 	spin_unlock(&queue->lock);
+	io_uring_cmd_done(cmd, err, 0, issue_flags);
+
 	return;
 err:
 	fuse_uring_next_fuse_req(ring_ent, queue);

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH RFC v6 15/16] fuse: {io-uring} Prevent mount point hang on fuse-server termination
  2024-11-21 23:43 [PATCH RFC v6 00/16] fuse: fuse-over-io-uring Bernd Schubert
                   ` (13 preceding siblings ...)
  2024-11-21 23:43 ` [PATCH RFC v6 14/16] fuse: {uring} Handle IO_URING_F_TASK_DEAD Bernd Schubert
@ 2024-11-21 23:43 ` Bernd Schubert
  2024-11-21 23:43 ` [PATCH RFC v6 16/16] fuse: enable fuse-over-io-uring Bernd Schubert
  15 siblings, 0 replies; 24+ messages in thread
From: Bernd Schubert @ 2024-11-21 23:43 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
	Bernd Schubert

When the fuse-server terminates while the fuse-client or kernel
still has queued URING_CMDs, these commands retain references
to the struct file used by the fuse connection. This prevents
fuse_dev_release() from being invoked, resulting in a hung mount
point.

This patch addresses the issue by making queued URING_CMDs
cancelable, allowing fuse_dev_release() to proceed as expected
and preventing the mount point from hanging.

Signed-off-by: Bernd Schubert <[email protected]>
---
 fs/fuse/dev_uring.c   | 103 ++++++++++++++++++++++++++++++++++++++++----------
 fs/fuse/dev_uring_i.h |  12 ++++++
 2 files changed, 94 insertions(+), 21 deletions(-)

diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index d0f8f0932e1715babebbc715c1846a5052419eb9..b7a6c3946611a9fdecd4996117b45b3081ad6edd 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -23,6 +23,7 @@ MODULE_PARM_DESC(enable_uring,
 
 struct fuse_uring_cmd_pdu {
 	struct fuse_ring_ent *ring_ent;
+	struct fuse_ring_queue *queue;
 };
 
 const struct fuse_iqueue_ops fuse_io_uring_ops;
@@ -221,6 +222,7 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
 	struct fuse_conn *fc = ring->fc;
 	struct fuse_ring_queue *queue;
 	struct list_head *pq;
+	struct fuse_ring_ent *ent, *next;
 
 	queue = kzalloc(sizeof(*queue), GFP_KERNEL_ACCOUNT);
 	if (!queue)
@@ -249,6 +251,12 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
 	INIT_LIST_HEAD(&queue->ent_in_userspace);
 	INIT_LIST_HEAD(&queue->fuse_req_queue);
 	INIT_LIST_HEAD(&queue->fuse_req_bg_queue);
+	INIT_LIST_HEAD(&queue->ent_released);
+
+	list_for_each_entry_safe(ent, next, &queue->ent_released, list) {
+		list_del_init(&ent->list);
+		kfree(ent);
+	}
 
 	queue->fpq.processing = pq;
 	fuse_pqueue_init(&queue->fpq);
@@ -281,8 +289,7 @@ static void fuse_uring_stop_fuse_req_end(struct fuse_ring_ent *ent)
 /*
  * Release a request/entry on connection tear down
  */
-static void fuse_uring_entry_teardown(struct fuse_ring_ent *ent,
-					 bool need_cmd_done)
+static void fuse_uring_entry_teardown(struct fuse_ring_ent *ent)
 {
 	struct fuse_ring_queue *queue = ent->queue;
 
@@ -292,7 +299,7 @@ static void fuse_uring_entry_teardown(struct fuse_ring_ent *ent,
 	 */
 	lockdep_assert_not_held(&ent->queue->lock);
 
-	if (need_cmd_done) {
+	if (ent->need_cmd_done) {
 		pr_devel("qid=%d sending cmd_done\n", queue->qid);
 
 		io_uring_cmd_done(ent->cmd, -ENOTCONN, 0,
@@ -302,8 +309,16 @@ static void fuse_uring_entry_teardown(struct fuse_ring_ent *ent,
 	if (ent->fuse_req)
 		fuse_uring_stop_fuse_req_end(ent);
 
-	list_del_init(&ent->list);
-	kfree(ent);
+	/*
+	 * The entry must not be freed immediately, due to access of direct
+	 * pointer access of entries through IO_URING_F_CANCEL - there is a risk
+	 * of race between daemon termination (which triggers IO_URING_F_CANCEL
+	 * and accesses entries without checking the list state first
+	 */
+	spin_lock(&queue->lock);
+	list_move(&ent->list, &queue->ent_released);
+	ent->state = FRRS_RELEASED;
+	spin_unlock(&queue->lock);
 }
 
 static void fuse_uring_stop_list_entries(struct list_head *head,
@@ -323,15 +338,15 @@ static void fuse_uring_stop_list_entries(struct list_head *head,
 			continue;
 		}
 
+		ent->need_cmd_done = ent->state != FRRS_USERSPACE;
+		ent->state = FRRS_TEARDOWN;
 		list_move(&ent->list, &to_teardown);
 	}
 	spin_unlock(&queue->lock);
 
 	/* no queue lock to avoid lock order issues */
 	list_for_each_entry_safe(ent, next, &to_teardown, list) {
-		bool need_cmd_done = ent->state != FRRS_USERSPACE;
-
-		fuse_uring_entry_teardown(ent, need_cmd_done);
+		fuse_uring_entry_teardown(ent);
 		queue_refs = atomic_dec_return(&ring->queue_refs);
 
 		if (WARN_ON_ONCE(queue_refs < 0))
@@ -442,6 +457,49 @@ void fuse_uring_stop_queues(struct fuse_ring *ring)
 	}
 }
 
+/*
+ * Handle IO_URING_F_CANCEL, typically should come on daemon termination
+ */
+static void fuse_uring_cancel(struct io_uring_cmd *cmd,
+			      unsigned int issue_flags, struct fuse_conn *fc)
+{
+	struct fuse_uring_cmd_pdu *pdu = (struct fuse_uring_cmd_pdu *)cmd->pdu;
+	struct fuse_ring_queue *queue = pdu->queue;
+	struct fuse_ring_ent *ent = pdu->ring_ent;
+	bool need_cmd_done = false;
+
+	/*
+	 * direct access on ent - it must not be destructed as long as
+	 * IO_URING_F_CANCEL might come up
+	 */
+	spin_lock(&queue->lock);
+	if (ent->state == FRRS_WAIT) {
+		ent->state = FRRS_USERSPACE;
+		list_move(&ent->list, &queue->ent_in_userspace);
+		need_cmd_done = true;
+	}
+	spin_unlock(&queue->lock);
+
+	if (need_cmd_done)
+		io_uring_cmd_done(cmd, -ENOTCONN, 0, issue_flags);
+
+	/*
+	 * releasing the last entry should trigger fuse_dev_release() if
+	 * the daemon was terminated
+	 */
+}
+
+static void fuse_uring_prepare_cancel(struct io_uring_cmd *cmd, int issue_flags,
+				      struct fuse_ring_ent *ring_ent)
+{
+	struct fuse_uring_cmd_pdu *pdu = (struct fuse_uring_cmd_pdu *)cmd->pdu;
+
+	pdu->ring_ent = ring_ent;
+	pdu->queue = ring_ent->queue;
+
+	io_uring_cmd_mark_cancelable(cmd, issue_flags);
+}
+
 /*
  * Checks for errors and stores it into the request
  */
@@ -665,7 +723,8 @@ static int fuse_uring_send_next_to_ring(struct fuse_ring_ent *ring_ent)
  * Put a ring request onto hold, it is no longer used for now.
  */
 static void fuse_uring_ent_avail(struct fuse_ring_ent *ring_ent,
-				 struct fuse_ring_queue *queue)
+				 struct fuse_ring_queue *queue,
+				 unsigned int issue_flags)
 	__must_hold(&queue->lock)
 {
 	struct fuse_ring *ring = queue->ring;
@@ -682,6 +741,7 @@ static void fuse_uring_ent_avail(struct fuse_ring_ent *ring_ent,
 		return;
 	}
 
+	fuse_uring_prepare_cancel(ring_ent->cmd, issue_flags, ring_ent);
 	list_move(&ring_ent->list, &queue->ent_avail_queue);
 
 	ring_ent->state = FRRS_WAIT;
@@ -789,7 +849,8 @@ static void fuse_uring_commit(struct fuse_ring_ent *ring_ent,
  * Get the next fuse req and send it
  */
 static void fuse_uring_next_fuse_req(struct fuse_ring_ent *ring_ent,
-				    struct fuse_ring_queue *queue)
+				    struct fuse_ring_queue *queue,
+				    unsigned int issue_flags)
 {
 	int has_next, err;
 	int prev_state = ring_ent->state;
@@ -798,7 +859,7 @@ static void fuse_uring_next_fuse_req(struct fuse_ring_ent *ring_ent,
 		spin_lock(&queue->lock);
 		has_next = fuse_uring_ent_assign_req(ring_ent);
 		if (!has_next) {
-			fuse_uring_ent_avail(ring_ent, queue);
+			fuse_uring_ent_avail(ring_ent, queue, issue_flags);
 			spin_unlock(&queue->lock);
 			break; /* no request left */
 		}
@@ -873,7 +934,7 @@ static int fuse_uring_commit_fetch(struct io_uring_cmd *cmd, int issue_flags,
 	 * and fetching is done in one step vs legacy fuse, which has separated
 	 * read (fetch request) and write (commit result).
 	 */
-	fuse_uring_next_fuse_req(ring_ent, queue);
+	fuse_uring_next_fuse_req(ring_ent, queue, issue_flags);
 	return 0;
 }
 
@@ -915,7 +976,7 @@ static void _fuse_uring_fetch(struct fuse_ring_ent *ring_ent,
 	struct fuse_iqueue *fiq = &fc->iq;
 
 	spin_lock(&queue->lock);
-	fuse_uring_ent_avail(ring_ent, queue);
+	fuse_uring_ent_avail(ring_ent, queue, issue_flags);
 	ring_ent->cmd = cmd;
 	spin_unlock(&queue->lock);
 
@@ -1085,6 +1146,11 @@ int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
 	if (fc->aborted)
 		return err;
 
+	if ((unlikely(issue_flags & IO_URING_F_CANCEL))) {
+		fuse_uring_cancel(cmd, issue_flags, fc);
+		return 0;
+	}
+
 	switch (cmd_op) {
 	case FUSE_URING_REQ_FETCH:
 		err = fuse_uring_fetch(cmd, issue_flags, fc);
@@ -1142,7 +1208,7 @@ fuse_uring_send_req_in_task(struct io_uring_cmd *cmd,
 
 	return;
 err:
-	fuse_uring_next_fuse_req(ring_ent, queue);
+	fuse_uring_next_fuse_req(ring_ent, queue, issue_flags);
 }
 
 static struct fuse_ring_queue *fuse_uring_task_to_queue(struct fuse_ring *ring)
@@ -1197,14 +1263,11 @@ void fuse_uring_queue_fuse_req(struct fuse_iqueue *fiq, struct fuse_req *req)
 
 	if (ring_ent) {
 		struct io_uring_cmd *cmd = ring_ent->cmd;
-		struct fuse_uring_cmd_pdu *pdu =
-			(struct fuse_uring_cmd_pdu *)cmd->pdu;
-
 		err = -EIO;
 		if (WARN_ON_ONCE(ring_ent->state != FRRS_FUSE_REQ))
 			goto err;
 
-		pdu->ring_ent = ring_ent;
+		/* pdu already set by preparing IO_URING_F_CANCEL */
 		io_uring_cmd_complete_in_task(cmd, fuse_uring_send_req_in_task);
 	}
 
@@ -1257,12 +1320,10 @@ bool fuse_uring_queue_bq_req(struct fuse_req *req)
 				       list);
 	if (ring_ent && req) {
 		struct io_uring_cmd *cmd = ring_ent->cmd;
-		struct fuse_uring_cmd_pdu *pdu =
-			(struct fuse_uring_cmd_pdu *)cmd->pdu;
 
 		fuse_uring_add_req_to_ring_ent(ring_ent, req);
 
-		pdu->ring_ent = ring_ent;
+		/* pdu already set by preparing IO_URING_F_CANCEL */
 		io_uring_cmd_complete_in_task(cmd, fuse_uring_send_req_in_task);
 	}
 	spin_unlock(&queue->lock);
diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
index 8426337361c72a30dca8f6fd9012ea3827160091..6af7754249623102f48a4c5c924a21b20851925f 100644
--- a/fs/fuse/dev_uring_i.h
+++ b/fs/fuse/dev_uring_i.h
@@ -28,6 +28,12 @@ enum fuse_ring_req_state {
 
 	/* The ring entry is in or on the way to user space */
 	FRRS_USERSPACE,
+
+	/* The ring entry is in teardown */
+	FRRS_TEARDOWN,
+
+	/* The ring entry is released, but not freed yet */
+	FRRS_RELEASED,
 };
 
 /** A fuse ring entry, part of the ring queue */
@@ -52,6 +58,9 @@ struct fuse_ring_ent {
 	 */
 	unsigned int state;
 
+	/* The entry needs io_uring_cmd_done for teardown */
+	unsigned int need_cmd_done;
+
 	struct fuse_req *fuse_req;
 };
 
@@ -84,6 +93,9 @@ struct fuse_ring_queue {
 	/* entries in userspace */
 	struct list_head ent_in_userspace;
 
+	/* entries that are released */
+	struct list_head ent_released;
+
 	/* fuse requests waiting for an entry slot */
 	struct list_head fuse_req_queue;
 

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH RFC v6 16/16] fuse: enable fuse-over-io-uring
  2024-11-21 23:43 [PATCH RFC v6 00/16] fuse: fuse-over-io-uring Bernd Schubert
                   ` (14 preceding siblings ...)
  2024-11-21 23:43 ` [PATCH RFC v6 15/16] fuse: {io-uring} Prevent mount point hang on fuse-server termination Bernd Schubert
@ 2024-11-21 23:43 ` Bernd Schubert
  15 siblings, 0 replies; 24+ messages in thread
From: Bernd Schubert @ 2024-11-21 23:43 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
	Bernd Schubert

All required parts are handled now, fuse-io-uring can
be enabled.

Signed-off-by: Bernd Schubert <[email protected]>
---
 fs/fuse/dev_uring.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index b7a6c3946611a9fdecd4996117b45b3081ad6edd..91105b8f674baacbd3b16bede8678686ff2c1896 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -1122,11 +1122,6 @@ int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
 	u32 cmd_op = cmd->cmd_op;
 	int err = 0;
 
-	/* Disabled for now, especially as teardown is not implemented yet */
-	err = -EOPNOTSUPP;
-	pr_info_ratelimited("fuse-io-uring is not enabled yet\n");
-	return err;
-
 	pr_devel("%s:%d received: cmd op %d\n", __func__, __LINE__, cmd_op);
 
 	err = -EOPNOTSUPP;

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC v6 05/16] fuse: make args->in_args[0] to be always the header
  2024-11-21 23:43 ` [PATCH RFC v6 05/16] fuse: make args->in_args[0] to be always the header Bernd Schubert
@ 2024-11-23  9:01   ` Miklos Szeredi
  2024-11-25 14:45     ` Bernd Schubert
  0 siblings, 1 reply; 24+ messages in thread
From: Miklos Szeredi @ 2024-11-23  9:01 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd

On Fri, 22 Nov 2024 at 00:44, Bernd Schubert <[email protected]> wrote:

> diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
> index 12ef91d170bb3091ac35a33d2b9dc38330b00948..e459b8134ccb089f971bebf8da1f7fc5199c1271 100644
> --- a/fs/fuse/dax.c
> +++ b/fs/fuse/dax.c
> @@ -237,14 +237,17 @@ static int fuse_send_removemapping(struct inode *inode,
>         struct fuse_inode *fi = get_fuse_inode(inode);
>         struct fuse_mount *fm = get_fuse_mount(inode);
>         FUSE_ARGS(args);
> +       struct fuse_zero_in zero_arg;

I'd move this to global scope (i.e. just a single instance for all
uses) and rename to zero_header.

> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> index fd8898b0c1cca4d117982d5208d78078472b0dfb..6cb45b5332c45f322e9163469ffd114cbc07dc4f 100644
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -1053,6 +1053,19 @@ static int fuse_copy_args(struct fuse_copy_state *cs, unsigned numargs,
>
>         for (i = 0; !err && i < numargs; i++)  {
>                 struct fuse_arg *arg = &args[i];
> +
> +               /* zero headers */
> +               if (arg->size == 0) {
> +                       if (WARN_ON_ONCE(i != 0)) {
> +                               if (cs->req)
> +                                       pr_err_once(
> +                                               "fuse: zero size header in opcode %d\n",
> +                                               cs->req->in.h.opcode);
> +                               return -EINVAL;
> +                       }

Just keep the WARN_ON_ONCE() and drop everything else, including
return -EINVAL.  The same thing should happen without the arg->size ==
0 check.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC v6 06/16] fuse: {uring} Handle SQEs - register commands
  2024-11-21 23:43 ` [PATCH RFC v6 06/16] fuse: {uring} Handle SQEs - register commands Bernd Schubert
@ 2024-11-23  9:52   ` Miklos Szeredi
  2024-11-23 12:42     ` Bernd Schubert
  0 siblings, 1 reply; 24+ messages in thread
From: Miklos Szeredi @ 2024-11-23  9:52 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd

On Fri, 22 Nov 2024 at 00:44, Bernd Schubert <[email protected]> wrote:

> +static struct fuse_ring *fuse_uring_create(struct fuse_conn *fc)
> +{
> +       struct fuse_ring *ring = NULL;
> +       size_t nr_queues = num_possible_cpus();
> +       struct fuse_ring *res = NULL;
> +
> +       ring = kzalloc(sizeof(*fc->ring) +
> +                              nr_queues * sizeof(struct fuse_ring_queue),

Left over from a previous version?

> +static void fuse_uring_ent_avail(struct fuse_ring_ent *ring_ent,
> +                                struct fuse_ring_queue *queue)
> +       __must_hold(&queue->lock)
> +{
> +       struct fuse_ring *ring = queue->ring;
> +
> +       lockdep_assert_held(&queue->lock);
> +
> +       /* unsets all previous flags - basically resets */
> +       pr_devel("%s ring=%p qid=%d state=%d\n", __func__, ring,
> +                ring_ent->queue->qid, ring_ent->state);
> +
> +       if (WARN_ON(ring_ent->state != FRRS_COMMIT)) {
> +               pr_warn("%s qid=%d state=%d\n", __func__, ring_ent->queue->qid,
> +                       ring_ent->state);
> +               return;
> +       }
> +
> +       list_move(&ring_ent->list, &queue->ent_avail_queue);
> +
> +       ring_ent->state = FRRS_WAIT;
> +}

AFAICS this function is essentially just one line, the rest is
debugging.   While it's good for initial development it's bad for
review because the of the bad signal to noise ratio (the debugging
part is irrelevant for code review).

Would it make sense to post the non-RFC version without most of the
pr_debug()/pr_warn() stuff and just keep the simple WARN_ON() lines
that signal if something has gone wrong.

Long term we could get rid of some of that too.   E.g ring_ent->state
seems to be there just for debugging, but if the code is clean enough
we don't need to have a separate state.

> +#if 0
> +       /* Does not work as sending over io-uring is async */
> +       err = -ETXTBSY;
> +       if (fc->initialized) {
> +               pr_info_ratelimited(
> +                       "Received FUSE_URING_REQ_FETCH after connection is initialized\n");
> +               return err;
> +       }
> +#endif

I fail to remember what's up with this.  Why is it important that
FETCH is sent before INIT reply?

> diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
> index 6c506f040d5fb57dae746880c657a95637ac50ce..e82cbf9c569af4f271ba0456cb49e0a5116bf36b 100644
> --- a/fs/fuse/fuse_dev_i.h
> +++ b/fs/fuse/fuse_dev_i.h
> @@ -8,6 +8,7 @@
>
>  #include <linux/types.h>
>
> +

Unneeded extra line.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC v6 06/16] fuse: {uring} Handle SQEs - register commands
  2024-11-23  9:52   ` Miklos Szeredi
@ 2024-11-23 12:42     ` Bernd Schubert
  2024-11-23 13:09       ` Miklos Szeredi
  0 siblings, 1 reply; 24+ messages in thread
From: Bernd Schubert @ 2024-11-23 12:42 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd



On 11/23/24 10:52, Miklos Szeredi wrote:
> On Fri, 22 Nov 2024 at 00:44, Bernd Schubert <[email protected]> wrote:
> 
>> +static struct fuse_ring *fuse_uring_create(struct fuse_conn *fc)
>> +{
>> +       struct fuse_ring *ring = NULL;
>> +       size_t nr_queues = num_possible_cpus();
>> +       struct fuse_ring *res = NULL;
>> +
>> +       ring = kzalloc(sizeof(*fc->ring) +
>> +                              nr_queues * sizeof(struct fuse_ring_queue),
> 
> Left over from a previous version?

Why? This struct holds all the queues? We could also put into fc, but
it would take additional memory, even if uring is not used.

> 
>> +static void fuse_uring_ent_avail(struct fuse_ring_ent *ring_ent,
>> +                                struct fuse_ring_queue *queue)
>> +       __must_hold(&queue->lock)
>> +{
>> +       struct fuse_ring *ring = queue->ring;
>> +
>> +       lockdep_assert_held(&queue->lock);
>> +
>> +       /* unsets all previous flags - basically resets */
>> +       pr_devel("%s ring=%p qid=%d state=%d\n", __func__, ring,
>> +                ring_ent->queue->qid, ring_ent->state);
>> +
>> +       if (WARN_ON(ring_ent->state != FRRS_COMMIT)) {
>> +               pr_warn("%s qid=%d state=%d\n", __func__, ring_ent->queue->qid,
>> +                       ring_ent->state);
>> +               return;
>> +       }
>> +
>> +       list_move(&ring_ent->list, &queue->ent_avail_queue);
>> +
>> +       ring_ent->state = FRRS_WAIT;
>> +}
> 
> AFAICS this function is essentially just one line, the rest is
> debugging.   While it's good for initial development it's bad for
> review because the of the bad signal to noise ratio (the debugging
> part is irrelevant for code review).
> 
> Would it make sense to post the non-RFC version without most of the
> pr_debug()/pr_warn() stuff and just keep the simple WARN_ON() lines
> that signal if something has gone wrong.

I can remove the pr_debug/pr_warn lines for the non-RFC version, that
will come early next week.

> 
> Long term we could get rid of some of that too.   E.g ring_ent->state
> seems to be there just for debugging, but if the code is clean enough
> we don't need to have a separate state.

Almost, please see 

[PATCH RFC v6 15/16] fuse: {io-uring} Prevent mount point hang on fuse-server termination

there you really need a ring state, because access is outside of lists.
Unless you want to iterate over the lists, if the the entry is still
in there. Please see the discussion with Joanne in RFC v5.
I have also added in v6 15/16 comments about non-list access.

> 
>> +#if 0
>> +       /* Does not work as sending over io-uring is async */
>> +       err = -ETXTBSY;
>> +       if (fc->initialized) {
>> +               pr_info_ratelimited(
>> +                       "Received FUSE_URING_REQ_FETCH after connection is initialized\n");
>> +               return err;
>> +       }
>> +#endif
> 
> I fail to remember what's up with this.  Why is it important that
> FETCH is sent before INIT reply?

That is basically what I try to explain with the comment,
it does not work. I had left it in the code, so that you would
run over it, looks like that worked  :)

Even though libfuse sends the SQEs before
setting up /dev/fuse threads, handling the SQEs takes longer.
So what happens is that while IORING_OP_URING_CMD/FUSE_URING_REQ_FETCH
are coming in, FUSE_INIT reply gets through. In userspace we do not
know at all, when these SQEs are registered, because we don't get
a reply. Even worse, we don't even know if io-uring works at all and
cannot adjust number of /dev/fuse handling threads. Here setup with
ioctls had a clear advantage - there was a clear reply.

The other issue is, that we will probably first need handle FUSE_INIT
in userspace before sending SQEs at all, in order to know the payload
buffer size.

> 
>> diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
>> index 6c506f040d5fb57dae746880c657a95637ac50ce..e82cbf9c569af4f271ba0456cb49e0a5116bf36b 100644
>> --- a/fs/fuse/fuse_dev_i.h
>> +++ b/fs/fuse/fuse_dev_i.h
>> @@ -8,6 +8,7 @@
>>
>>  #include <linux/types.h>
>>
>> +
> 
> Unneeded extra line.

Yeah, I didn't review these patches on my own yet, there are probably
more tiny things like that all over the place - will be done by next
with in the non-RFC version.


Thanks,
Bernd

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC v6 06/16] fuse: {uring} Handle SQEs - register commands
  2024-11-23 12:42     ` Bernd Schubert
@ 2024-11-23 13:09       ` Miklos Szeredi
  2024-11-24 21:25         ` Bernd Schubert
  0 siblings, 1 reply; 24+ messages in thread
From: Miklos Szeredi @ 2024-11-23 13:09 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd

On Sat, 23 Nov 2024 at 13:42, Bernd Schubert <[email protected]> wrote:
>
>
>
> On 11/23/24 10:52, Miklos Szeredi wrote:
> > On Fri, 22 Nov 2024 at 00:44, Bernd Schubert <[email protected]> wrote:
> >
> >> +static struct fuse_ring *fuse_uring_create(struct fuse_conn *fc)
> >> +{
> >> +       struct fuse_ring *ring = NULL;
> >> +       size_t nr_queues = num_possible_cpus();
> >> +       struct fuse_ring *res = NULL;
> >> +
> >> +       ring = kzalloc(sizeof(*fc->ring) +
> >> +                              nr_queues * sizeof(struct fuse_ring_queue),
> >
> > Left over from a previous version?
>
> Why? This struct holds all the queues? We could also put into fc, but
> it would take additional memory, even if uring is not used.

But fuse_ring_queue is allocated on demand in
fuse_uring_create_queue().  Where is this space actually gets used?

> there you really need a ring state, because access is outside of lists.
> Unless you want to iterate over the lists, if the the entry is still
> in there. Please see the discussion with Joanne in RFC v5.
> I have also added in v6 15/16 comments about non-list access.

Okay, let that be then.

> Even though libfuse sends the SQEs before
> setting up /dev/fuse threads, handling the SQEs takes longer.
> So what happens is that while IORING_OP_URING_CMD/FUSE_URING_REQ_FETCH
> are coming in, FUSE_INIT reply gets through. In userspace we do not
> know at all, when these SQEs are registered, because we don't get
> a reply. Even worse, we don't even know if io-uring works at all and
> cannot adjust number of /dev/fuse handling threads. Here setup with
> ioctls had a clear advantage - there was a clear reply.

Server could negotiate fuse uring availability in INIT, which is how
all other feature negotiations work.

> The other issue is, that we will probably first need handle FUSE_INIT
> in userspace before sending SQEs at all, in order to know the payload
> buffer size.

Yeah.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC v6 06/16] fuse: {uring} Handle SQEs - register commands
  2024-11-23 13:09       ` Miklos Szeredi
@ 2024-11-24 21:25         ` Bernd Schubert
  0 siblings, 0 replies; 24+ messages in thread
From: Bernd Schubert @ 2024-11-24 21:25 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd

On 11/23/24 14:09, Miklos Szeredi wrote:
> On Sat, 23 Nov 2024 at 13:42, Bernd Schubert <[email protected]> wrote:
>>
>>
>>
>> On 11/23/24 10:52, Miklos Szeredi wrote:
>>> On Fri, 22 Nov 2024 at 00:44, Bernd Schubert <[email protected]> wrote:
>>>
>>>> +static struct fuse_ring *fuse_uring_create(struct fuse_conn *fc)
>>>> +{
>>>> +       struct fuse_ring *ring = NULL;
>>>> +       size_t nr_queues = num_possible_cpus();
>>>> +       struct fuse_ring *res = NULL;
>>>> +
>>>> +       ring = kzalloc(sizeof(*fc->ring) +
>>>> +                              nr_queues * sizeof(struct fuse_ring_queue),
>>>
>>> Left over from a previous version?
>>
>> Why? This struct holds all the queues? We could also put into fc, but
>> it would take additional memory, even if uring is not used.
> 
> But fuse_ring_queue is allocated on demand in
> fuse_uring_create_queue().  Where is this space actually gets used?


In fuse_uring_fetch()

	err = -ENOMEM;
	if (!ring) {
		ring = fuse_uring_create(fc);
		if (!ring)
			return err;
	}

	queue = ring->queues[cmd_req->qid];
	if (!queue) {
		queue = fuse_uring_create_queue(ring, cmd_req->qid);
		if (!queue)
			return err;
	}

I.е. the ring object is created dynamically. Btw, I still a bit struggling
with struct names - maybe 'struct fuse_ring_pool' is a better name?


> 
>> there you really need a ring state, because access is outside of lists.
>> Unless you want to iterate over the lists, if the the entry is still
>> in there. Please see the discussion with Joanne in RFC v5.
>> I have also added in v6 15/16 comments about non-list access.
> 
> Okay, let that be then.
> 
>> Even though libfuse sends the SQEs before
>> setting up /dev/fuse threads, handling the SQEs takes longer.
>> So what happens is that while IORING_OP_URING_CMD/FUSE_URING_REQ_FETCH
>> are coming in, FUSE_INIT reply gets through. In userspace we do not
>> know at all, when these SQEs are registered, because we don't get
>> a reply. Even worse, we don't even know if io-uring works at all and
>> cannot adjust number of /dev/fuse handling threads. Here setup with
>> ioctls had a clear advantage - there was a clear reply.
> 
> Server could negotiate fuse uring availability in INIT, which is how
> all other feature negotiations work.
> 
>> The other issue is, that we will probably first need handle FUSE_INIT
>> in userspace before sending SQEs at all, in order to know the payload
>> buffer size.
> 
> Yeah.

Fine with me will move it in libfuse


Thanks,
Bernd

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC v6 05/16] fuse: make args->in_args[0] to be always the header
  2024-11-23  9:01   ` Miklos Szeredi
@ 2024-11-25 14:45     ` Bernd Schubert
  2024-11-25 15:34       ` Miklos Szeredi
  0 siblings, 1 reply; 24+ messages in thread
From: Bernd Schubert @ 2024-11-25 14:45 UTC (permalink / raw)
  To: Miklos Szeredi, Bernd Schubert
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd



On 11/23/24 10:01, Miklos Szeredi wrote:
> On Fri, 22 Nov 2024 at 00:44, Bernd Schubert <[email protected]> wrote:
> 
>> diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
>> index 12ef91d170bb3091ac35a33d2b9dc38330b00948..e459b8134ccb089f971bebf8da1f7fc5199c1271 100644
>> --- a/fs/fuse/dax.c
>> +++ b/fs/fuse/dax.c
>> @@ -237,14 +237,17 @@ static int fuse_send_removemapping(struct inode *inode,
>>         struct fuse_inode *fi = get_fuse_inode(inode);
>>         struct fuse_mount *fm = get_fuse_mount(inode);
>>         FUSE_ARGS(args);
>> +       struct fuse_zero_in zero_arg;
> 
> I'd move this to global scope (i.e. just a single instance for all
> uses) and rename to zero_header.
> 
>> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
>> index fd8898b0c1cca4d117982d5208d78078472b0dfb..6cb45b5332c45f322e9163469ffd114cbc07dc4f 100644
>> --- a/fs/fuse/dev.c
>> +++ b/fs/fuse/dev.c
>> @@ -1053,6 +1053,19 @@ static int fuse_copy_args(struct fuse_copy_state *cs, unsigned numargs,
>>
>>         for (i = 0; !err && i < numargs; i++)  {
>>                 struct fuse_arg *arg = &args[i];
>> +
>> +               /* zero headers */
>> +               if (arg->size == 0) {
>> +                       if (WARN_ON_ONCE(i != 0)) {
>> +                               if (cs->req)
>> +                                       pr_err_once(
>> +                                               "fuse: zero size header in opcode %d\n",
>> +                                               cs->req->in.h.opcode);
>> +                               return -EINVAL;
>> +                       }
> 
> Just keep the WARN_ON_ONCE() and drop everything else, including
> return -EINVAL.  The same thing should happen without the arg->size ==
> 0 check

I have to remove the WARN_ON_ONCE condition altogether, gets triggered by
/dev/fuse read (i.e. with io-uring being disabled), in generic/062, 
op code=39 (FUSE_IOCTL).
Without the pr_err_once() and printing the op code it would have been impossible
to see which op code that is - the trace does not help here.


Thanks,
Bernd

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC v6 05/16] fuse: make args->in_args[0] to be always the header
  2024-11-25 14:45     ` Bernd Schubert
@ 2024-11-25 15:34       ` Miklos Szeredi
  0 siblings, 0 replies; 24+ messages in thread
From: Miklos Szeredi @ 2024-11-25 15:34 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Bernd Schubert, Jens Axboe, Pavel Begunkov, linux-fsdevel,
	io-uring, Joanne Koong, Josef Bacik, Amir Goldstein, Ming Lei,
	David Wei, bernd

On Mon, 25 Nov 2024 at 15:46, Bernd Schubert <[email protected]> wrote:

> I have to remove the WARN_ON_ONCE condition altogether, gets triggered by
> /dev/fuse read (i.e. with io-uring being disabled), in generic/062,
> op code=39 (FUSE_IOCTL).
> Without the pr_err_once() and printing the op code it would have been impossible
> to see which op code that is - the trace does not help here.

It should, since commit 396b209e405a ("fuse: add simple request tracepoints").

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2024-11-25 15:34 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-21 23:43 [PATCH RFC v6 00/16] fuse: fuse-over-io-uring Bernd Schubert
2024-11-21 23:43 ` [PATCH RFC v6 01/16] fuse: rename to fuse_dev_end_requests and make non-static Bernd Schubert
2024-11-21 23:43 ` [PATCH RFC v6 02/16] fuse: Move fuse_get_dev to header file Bernd Schubert
2024-11-21 23:43 ` [PATCH RFC v6 03/16] fuse: Move request bits Bernd Schubert
2024-11-21 23:43 ` [PATCH RFC v6 04/16] fuse: Add fuse-io-uring design documentation Bernd Schubert
2024-11-21 23:43 ` [PATCH RFC v6 05/16] fuse: make args->in_args[0] to be always the header Bernd Schubert
2024-11-23  9:01   ` Miklos Szeredi
2024-11-25 14:45     ` Bernd Schubert
2024-11-25 15:34       ` Miklos Szeredi
2024-11-21 23:43 ` [PATCH RFC v6 06/16] fuse: {uring} Handle SQEs - register commands Bernd Schubert
2024-11-23  9:52   ` Miklos Szeredi
2024-11-23 12:42     ` Bernd Schubert
2024-11-23 13:09       ` Miklos Szeredi
2024-11-24 21:25         ` Bernd Schubert
2024-11-21 23:43 ` [PATCH RFC v6 07/16] fuse: Make fuse_copy non static Bernd Schubert
2024-11-21 23:43 ` [PATCH RFC v6 08/16] fuse: Add fuse-io-uring handling into fuse_copy Bernd Schubert
2024-11-21 23:43 ` [PATCH RFC v6 09/16] fuse: {uring} Add uring sqe commit and fetch support Bernd Schubert
2024-11-21 23:43 ` [PATCH RFC v6 10/16] fuse: {uring} Handle teardown of ring entries Bernd Schubert
2024-11-21 23:43 ` [PATCH RFC v6 11/16] fuse: {uring} Allow to queue fg requests through io-uring Bernd Schubert
2024-11-21 23:43 ` [PATCH RFC v6 12/16] fuse: {uring} Allow to queue to the ring Bernd Schubert
2024-11-21 23:43 ` [PATCH RFC v6 13/16] io_uring/cmd: let cmds to know about dying task Bernd Schubert
2024-11-21 23:43 ` [PATCH RFC v6 14/16] fuse: {uring} Handle IO_URING_F_TASK_DEAD Bernd Schubert
2024-11-21 23:43 ` [PATCH RFC v6 15/16] fuse: {io-uring} Prevent mount point hang on fuse-server termination Bernd Schubert
2024-11-21 23:43 ` [PATCH RFC v6 16/16] fuse: enable fuse-over-io-uring Bernd Schubert

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox