public inbox for [email protected]
* [PATCH RFC v5 00/16] fuse: fuse-over-io-uring
@ 2024-11-07 17:03 Bernd Schubert
  2024-11-07 17:03 ` [PATCH RFC v5 01/16] fuse: rename to fuse_dev_end_requests and make non-static Bernd Schubert
                   ` (15 more replies)
  0 siblings, 16 replies; 29+ messages in thread
From: Bernd Schubert @ 2024-11-07 17:03 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
	Bernd Schubert

This adds support for io-uring communication between the kernel and
the userspace daemon using the opcode IORING_OP_URING_CMD. The basic
approach was taken from ublk. The patches are in RFC state;
some major changes are still to be expected.

The motivation for these patches is to increase fuse performance.
With fuse-over-io-uring, requests avoid core switching (application
on core X, processing by the fuse server on a random core Y) and use
shared memory between kernel and userspace to transfer data.
Similar approaches have been taken by ZUFS and FUSE2, though
not over io-uring, but through ioctl IOs:

https://lwn.net/Articles/756625/
https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git/log/?h=fuse2

Avoiding cache line bouncing on NUMA systems was discussed
between Amir and Miklos before, and Miklos had posted
part of the private discussion here:
https://lore.kernel.org/linux-fsdevel/CAJfpegtL3NXPNgK1kuJR8kLu3WkVC_ErBPRfToLEiA_0=w3=hA@mail.gmail.com/

This cache line bouncing should be reduced by these patches, as
a) Switching between kernel and userspace is reduced by 50%,
as the request fetch (by read) and result commit (by write) are replaced
by a single commit-and-fetch command.
b) Submitting via the ring can avoid context switches altogether.
Note: As of now userspace still needs to transition to the kernel to
submit the result, though it might be possible to
avoid that as well, for example either with IORING_SETUP_SQPOLL
(basic testing did not show a performance advantage for now), or
the task that is submitting fuse requests to the ring could also
poll for results (needs additional work).
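
The 50% figure in a) can be illustrated with a small counting model. This is
only a sketch under stated assumptions: it assumes one read() plus one write()
per request in /dev/fuse mode, and one initial fetch submit followed by one
combined commit-and-fetch submit per request in ring mode. The function names
are hypothetical and the code does not talk to /dev/fuse or io_uring.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical model: count userspace->kernel transitions of a fuse
 * server for a given number of handled requests.
 */

/* Classic /dev/fuse loop: read() to fetch plus write() to commit. */
static unsigned long transitions_dev_fuse(unsigned long nr_requests)
{
	assert(nr_requests <= ULONG_MAX / 2);	/* guard against overflow */
	return 2 * nr_requests;
}

/*
 * Ring mode (assumed): one initial submit registers the ring entry,
 * then every request costs a single commit-and-fetch submit, which
 * also makes the entry available for the next request.
 */
static unsigned long transitions_uring(unsigned long nr_requests)
{
	assert(nr_requests < ULONG_MAX);	/* guard against overflow */
	return 1 + nr_requests;
}
```

For a large number of requests the ratio transitions_uring(n) /
transitions_dev_fuse(n) approaches 0.5 in this model, which is the reduction
claimed above.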

I had also noticed waitq wake-up latencies in fuse before
https://lore.kernel.org/lkml/[email protected]/T/

This spinning approach helped with performance (>40% improvement
for file creates), but due to random server side thread/core utilization
spinning cannot be well controlled in /dev/fuse mode.
With fuse-over-io-uring requests are handled on the same core
(sync requests) or on core+1 (large async requests) and performance
improvements are achieved without spinning.

Splice/zero-copy is not supported yet. Ming Lei is working
on io-uring support for ublk_drv; we can probably also use
that approach for fuse and get better zero copy than splice.
https://lore.kernel.org/io-uring/[email protected]/

RFCv1 and RFCv2 have been tested with multiple xfstests runs in a VM
(32 cores) with a kernel that has several debug options
enabled (like KASAN and MSAN). RFCv3 is not that well tested yet.
O_DIRECT is currently not working well with /dev/fuse, nor with
these patches; a patch has been submitted to fix that (although
the approach was refused):
https://www.spinics.net/lists/linux-fsdevel/msg280028.html

Up to RFCv2, a nice effect in io-uring mode was that xfstests ran faster
(like generic/522 ~2400s /dev/fuse vs. ~1600s patched), though still
slowly, as this is with ASAN/leak-detection/etc.
With RFCv3 and mmap removed, the overall run time is approximately the same,
though some optimizations are removed in RFCv3, like submitting to
the ring from the task that created the fuse request (hence, without
io_uring_cmd_complete_in_task()).

The corresponding libfuse patches are on my uring branch,
but need cleanup for submission, which will happen during the next
few days.
https://github.com/bsbernd/libfuse/tree/uring

Testing with that libfuse branch is possible by running something
like:

example/passthrough_hp -o allow_other --debug-fuse --nopassthrough \
--uring --uring-per-core-queue --uring-fg-depth=1 --uring-bg-depth=1 \
/scratch/source /scratch/dest

With the --debug-fuse option one should see CQE in the request type,
if requests are received via io-uring:

cqe unique: 4, opcode: GETATTR (3), nodeid: 1, insize: 16, pid: 7060
    unique: 4, result=104

Without the --uring option "cqe" is replaced by the default "dev"

dev unique: 4, opcode: GETATTR (3), nodeid: 1, insize: 56, pid: 7117
   unique: 4, success, outsize: 120

TODO list for next RFC version
- make the buffer layout exactly the same as /dev/fuse IO
- different request size - a large ring queue size currently needs
too much memory, even if most of the queue size is needed for small
IOs

Future work
- zero copy

I had run quite a few benchmarks with linux-6.2 before LSFMMBPF2023,
which resulted in some tuning patches (at the end of the
patch series).

Benchmark results (with RFC v1)
=======================================

The system used for the benchmark is a 32-core (HyperThreading enabled)
Xeon E5-2650 system. I don't have local disks attached that could do
more than 5GB/s IOs, so for the paged and dio results a patched version of
passthrough_hp was used that bypasses the final reads/writes.

paged reads
-----------
            128K IO size                      1024K IO size
jobs   /dev/fuse     uring    gain     /dev/fuse    uring   gain
 1        1117        1921    1.72        1902       1942   1.02
 2        2502        3527    1.41        3066       3260   1.06
 4        5052        6125    1.21        5994       6097   1.02
 8        6273       10855    1.73        7101      10491   1.48
16        6373       11320    1.78        7660      11419   1.49
24        6111        9015    1.48        7600       9029   1.19
32        5725        7968    1.39        6986       7961   1.14
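
The gain column is the uring result divided by the /dev/fuse result, rounded
to two decimals. A minimal sketch of that arithmetic follows, using the 128K
column of the table above as input; the struct and function names are
illustrative only and not part of the benchmark tooling.

```c
#include <assert.h>
#include <stdio.h>

struct result {
	unsigned jobs;
	unsigned dev_fuse;
	unsigned uring;
};

/* 128K paged-read column, copied from the table above. */
static const struct result paged_128k[] = {
	{  1, 1117,  1921 }, {  2, 2502,  3527 }, {  4, 5052,  6125 },
	{  8, 6273, 10855 }, { 16, 6373, 11320 }, { 24, 6111,  9015 },
	{ 32, 5725,  7968 },
};

/* Gain in hundredths (172 means 1.72), rounded to nearest. */
static unsigned gain_x100(unsigned dev_fuse, unsigned uring)
{
	assert(dev_fuse > 0);
	return (uring * 100u + dev_fuse / 2) / dev_fuse;
}

/* Print one line per job count with the computed gain. */
static void print_gains(void)
{
	size_t n = sizeof(paged_128k) / sizeof(paged_128k[0]);

	for (size_t i = 0; i < n; i++) {
		unsigned g = gain_x100(paged_128k[i].dev_fuse,
				       paged_128k[i].uring);

		printf("%2u jobs: gain %u.%02u\n",
		       paged_128k[i].jobs, g / 100, g % 100);
	}
}
```

Calling print_gains() reproduces the 128K gain column above (1.72 down to
1.39).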

dio reads (1024K)
-----------------

jobs   /dev/fuse    uring    gain
 1        2023       3998    2.42
 2        3375       7950    2.83
 4        3823      15022    3.58
 8        7796      22591    2.77
16        8520      27864    3.27
24        8361      20617    2.55
32        8717      12971    1.55

mmap reads (4K)
---------------
(sequential, I probably should have made it random, sequential exposes
a rather interesting/weird 'optimized' memcpy issue - sequential becomes
reversed order 4K read)
https://lore.kernel.org/linux-fsdevel/[email protected]/

jobs  /dev/fuse     uring    gain
1       130          323     2.49
2       219          538     2.46
4       503         1040     2.07
8       1472        2039     1.38
16      2191        3518     1.61
24      2453        4561     1.86
32      2178        5628     2.58

(Results on request. Setting MAP_HUGETLB much improves performance
for both; io-uring mode then has only a slight advantage.)

creates/s
----------
threads /dev/fuse     uring   gain
1          3944       10121   2.57
2          8580       24524   2.86
4         16628       44426   2.67
8         46746       56716   1.21
16        79740      102966   1.29
20        80284      119502   1.49

(the gain drop with >=8 cores needs to be investigated)

Jens had done some benchmarks with v3 and noticed only a
25% improvement and half of the CPU time usage, but v3
removes several optimizations (like waking the same core
and avoiding io_uring_cmd_done in an extra task context).
These optimizations will be submitted once the core work
is merged.

Signed-off-by: Bernd Schubert <[email protected]>
---
Changes in v5:
- Main focus in v5 is the separation of headers from payload,
  which required introducing 'struct fuse_zero_in'.
- Addressed several teardown issues that were a regression in v4.
- Fixed "BUG: sleeping function called" due to allocation while
  holding a lock, reported by David Wei
- Fix function comment reported by kernel test robot
- Fix set but unused variable reported by test robot
- Link to v4: https://lore.kernel.org/r/[email protected]

Changes in v4:
- Removal of ioctls, all configuration is done dynamically
  on the arrival of FUSE_URING_REQ_FETCH
- ring entries are not (and cannot be without config ioctls)
  allocated as array of the ring/queue - removal of the tag
  variable. Finding ring entries on FUSE_URING_REQ_COMMIT_AND_FETCH
  is more cumbersome now and needs an almost unused 
  struct fuse_pqueue per fuse_ring_queue and uses the unique
  id of fuse requests.
- No device clones needed to work around hanging mounts
  on fuse-server/daemon termination, handled by IO_URING_F_CANCEL
- Removal of sync/async ring entry types
- Addressed some of Joanne's comments, but probably not all
- Only very basic tests run for v3, as more updates should follow quickly.

Changes in v3
- Removed the __wake_on_current_cpu optimization (for now,
  as that needs to go through another subsystem/tree);
  removing it means a significant performance drop
- Removed MMAP (Miklos)
- Switched to two IOCTLs, instead of one ioctl that had a field
  for subcommands (ring and queue config) (Miklos)
- The ring entry state is a single state and not a bitmask anymore
  (Josef)
- Addressed several other comments from Josef (I need to go over
  the RFCv2 review again, I'm not sure if everything is addressed
  already)

- Link to v3: https://lore.kernel.org/r/20240901-b4-fuse-uring-rfcv3-without-mmap-v3-0-9207f7391444@ddn.com
- Link to v2: https://lore.kernel.org/all/[email protected]/
- Link to v1: https://lore.kernel.org/r/[email protected]

---
Bernd Schubert (15):
      fuse: rename to fuse_dev_end_requests and make non-static
      fuse: Move fuse_get_dev to header file
      fuse: Move request bits
      fuse: Add fuse-io-uring design documentation
      fuse: make args->in_args[0] to be always the header
      fuse: {uring} Handle SQEs - register commands
      fuse: Make fuse_copy non static
      fuse: Add fuse-io-uring handling into fuse_copy
      fuse: {uring} Add uring sqe commit and fetch support
      fuse: {uring} Handle teardown of ring entries
      fuse: {uring} Add a ring queue and send method
      fuse: {uring} Allow to queue to the ring
      fuse: {uring} Handle IO_URING_F_TASK_DEAD
      fuse: {io-uring} Prevent mount point hang on fuse-server termination
      fuse: enable fuse-over-io-uring

Pavel Begunkov (1):
      io_uring/cmd: let cmds to know about dying task

 Documentation/filesystems/fuse-io-uring.rst |  101 +++
 fs/fuse/Kconfig                             |   12 +
 fs/fuse/Makefile                            |    1 +
 fs/fuse/dax.c                               |   13 +-
 fs/fuse/dev.c                               |  174 ++--
 fs/fuse/dev_uring.c                         | 1208 +++++++++++++++++++++++++++
 fs/fuse/dev_uring_i.h                       |  191 +++++
 fs/fuse/dir.c                               |   41 +-
 fs/fuse/fuse_dev_i.h                        |   64 ++
 fs/fuse/fuse_i.h                            |   21 +
 fs/fuse/inode.c                             |    5 +-
 fs/fuse/xattr.c                             |    9 +-
 include/linux/io_uring_types.h              |    1 +
 include/uapi/linux/fuse.h                   |   57 ++
 io_uring/uring_cmd.c                        |    6 +-
 15 files changed, 1827 insertions(+), 77 deletions(-)
---
base-commit: 0c3836482481200ead7b416ca80c68a29cfdaabd
change-id: 20241015-fuse-uring-for-6-10-rfc4-61d0fc6851f8

Best regards,
-- 
Bernd Schubert <[email protected]>



* [PATCH RFC v5 01/16] fuse: rename to fuse_dev_end_requests and make non-static
  2024-11-07 17:03 [PATCH RFC v5 00/16] fuse: fuse-over-io-uring Bernd Schubert
@ 2024-11-07 17:03 ` Bernd Schubert
  2024-11-07 17:03 ` [PATCH RFC v5 02/16] fuse: Move fuse_get_dev to header file Bernd Schubert
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 29+ messages in thread
From: Bernd Schubert @ 2024-11-07 17:03 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
	Bernd Schubert

This function is needed by dev_uring.c to clean up ring queues,
so make it non-static. As a non-static function, its name
'end_requests' should also carry the fuse_ prefix, hence the rename
to fuse_dev_end_requests.

Signed-off-by: Bernd Schubert <[email protected]>
Reviewed-by: Josef Bacik <[email protected]>
---
 fs/fuse/dev.c        |  7 ++++---
 fs/fuse/fuse_dev_i.h | 15 +++++++++++++++
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 9eb191b5c4de124b3b469f5487beebbaf7630eb3..74cb9ae900525890543e0d79a5a89e5d43d31c9c 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -7,6 +7,7 @@
 */
 
 #include "fuse_i.h"
+#include "fuse_dev_i.h"
 
 #include <linux/init.h>
 #include <linux/module.h>
@@ -2136,7 +2137,7 @@ static __poll_t fuse_dev_poll(struct file *file, poll_table *wait)
 }
 
 /* Abort all requests on the given list (pending or processing) */
-static void end_requests(struct list_head *head)
+void fuse_dev_end_requests(struct list_head *head)
 {
 	while (!list_empty(head)) {
 		struct fuse_req *req;
@@ -2239,7 +2240,7 @@ void fuse_abort_conn(struct fuse_conn *fc)
 		wake_up_all(&fc->blocked_waitq);
 		spin_unlock(&fc->lock);
 
-		end_requests(&to_end);
+		fuse_dev_end_requests(&to_end);
 	} else {
 		spin_unlock(&fc->lock);
 	}
@@ -2269,7 +2270,7 @@ int fuse_dev_release(struct inode *inode, struct file *file)
 			list_splice_init(&fpq->processing[i], &to_end);
 		spin_unlock(&fpq->lock);
 
-		end_requests(&to_end);
+		fuse_dev_end_requests(&to_end);
 
 		/* Are we the last open device? */
 		if (atomic_dec_and_test(&fc->dev_count)) {
diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
new file mode 100644
index 0000000000000000000000000000000000000000..5a1b8a2775d84274abee46eabb3000345b2d9da0
--- /dev/null
+++ b/fs/fuse/fuse_dev_i.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * FUSE: Filesystem in Userspace
+ * Copyright (C) 2001-2008  Miklos Szeredi <[email protected]>
+ */
+#ifndef _FS_FUSE_DEV_I_H
+#define _FS_FUSE_DEV_I_H
+
+#include <linux/types.h>
+
+void fuse_dev_end_requests(struct list_head *head);
+
+#endif
+
+

-- 
2.43.0



* [PATCH RFC v5 02/16] fuse: Move fuse_get_dev to header file
  2024-11-07 17:03 [PATCH RFC v5 00/16] fuse: fuse-over-io-uring Bernd Schubert
  2024-11-07 17:03 ` [PATCH RFC v5 01/16] fuse: rename to fuse_dev_end_requests and make non-static Bernd Schubert
@ 2024-11-07 17:03 ` Bernd Schubert
  2024-11-07 17:03 ` [PATCH RFC v5 03/16] fuse: Move request bits Bernd Schubert
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 29+ messages in thread
From: Bernd Schubert @ 2024-11-07 17:03 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
	Bernd Schubert

Another preparation patch, as this function will be needed by
fuse/dev.c and fuse/dev_uring.c.

Signed-off-by: Bernd Schubert <[email protected]>
Reviewed-by: Josef Bacik <[email protected]>
---
 fs/fuse/dev.c        | 9 ---------
 fs/fuse/fuse_dev_i.h | 9 +++++++++
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 74cb9ae900525890543e0d79a5a89e5d43d31c9c..9ac69fd2cead0d1fe062dc3405a7dedcd1d36691 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -32,15 +32,6 @@ MODULE_ALIAS("devname:fuse");
 
 static struct kmem_cache *fuse_req_cachep;
 
-static struct fuse_dev *fuse_get_dev(struct file *file)
-{
-	/*
-	 * Lockless access is OK, because file->private data is set
-	 * once during mount and is valid until the file is released.
-	 */
-	return READ_ONCE(file->private_data);
-}
-
 static void fuse_request_init(struct fuse_mount *fm, struct fuse_req *req)
 {
 	INIT_LIST_HEAD(&req->list);
diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
index 5a1b8a2775d84274abee46eabb3000345b2d9da0..b38e67b3f889f3fa08f7279e3309cde908527146 100644
--- a/fs/fuse/fuse_dev_i.h
+++ b/fs/fuse/fuse_dev_i.h
@@ -8,6 +8,15 @@
 
 #include <linux/types.h>
 
+static inline struct fuse_dev *fuse_get_dev(struct file *file)
+{
+	/*
+	 * Lockless access is OK, because file->private data is set
+	 * once during mount and is valid until the file is released.
+	 */
+	return READ_ONCE(file->private_data);
+}
+
 void fuse_dev_end_requests(struct list_head *head);
 
 #endif

-- 
2.43.0



* [PATCH RFC v5 03/16] fuse: Move request bits
  2024-11-07 17:03 [PATCH RFC v5 00/16] fuse: fuse-over-io-uring Bernd Schubert
  2024-11-07 17:03 ` [PATCH RFC v5 01/16] fuse: rename to fuse_dev_end_requests and make non-static Bernd Schubert
  2024-11-07 17:03 ` [PATCH RFC v5 02/16] fuse: Move fuse_get_dev to header file Bernd Schubert
@ 2024-11-07 17:03 ` Bernd Schubert
  2024-11-07 17:03 ` [PATCH RFC v5 04/16] fuse: Add fuse-io-uring design documentation Bernd Schubert
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 29+ messages in thread
From: Bernd Schubert @ 2024-11-07 17:03 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
	Bernd Schubert

These are needed by dev_uring functions as well.
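
For readers unfamiliar with the two macros moved here, the following userspace
sketch shows how the even/odd ID scheme works. The macro values are taken from
the diff below; struct id_source and the helper names are hypothetical
stand-ins, not the kernel's own functions.

```c
#include <assert.h>
#include <stdint.h>

/* Values as in the diff below. */
#define FUSE_INT_REQ_BIT (1ULL << 0)
#define FUSE_REQ_ID_STEP (1ULL << 1)

/* Hypothetical stand-in for the per-connection request counter. */
struct id_source {
	uint64_t reqctr;
};

/* Ordinary requests advance in steps of 2, so their IDs stay even. */
static uint64_t next_unique(struct id_source *src)
{
	src->reqctr += FUSE_REQ_ID_STEP;
	return src->reqctr;
}

/* An interrupt for a request reuses its ID with the low bit set (odd). */
static uint64_t interrupt_unique(uint64_t unique)
{
	assert(!(unique & FUSE_INT_REQ_BIT));
	return unique | FUSE_INT_REQ_BIT;
}

static int is_interrupt(uint64_t unique)
{
	return (unique & FUSE_INT_REQ_BIT) != 0;
}

/* Strip the interrupt bit to get back the ID of the original request. */
static uint64_t original_unique(uint64_t unique)
{
	return unique & ~FUSE_INT_REQ_BIT;
}
```

Because the low bit is reserved, an interrupt ID can always be mapped back to
the request it refers to without a separate lookup table.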

Signed-off-by: Bernd Schubert <[email protected]>
Reviewed-by: Josef Bacik <[email protected]>
---
 fs/fuse/dev.c        | 4 ----
 fs/fuse/fuse_dev_i.h | 4 ++++
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 9ac69fd2cead0d1fe062dc3405a7dedcd1d36691..dbc222f9b0f0e590ce3ef83077e6b4cff03cff65 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -26,10 +26,6 @@
 MODULE_ALIAS_MISCDEV(FUSE_MINOR);
 MODULE_ALIAS("devname:fuse");
 
-/* Ordinary requests have even IDs, while interrupts IDs are odd */
-#define FUSE_INT_REQ_BIT (1ULL << 0)
-#define FUSE_REQ_ID_STEP (1ULL << 1)
-
 static struct kmem_cache *fuse_req_cachep;
 
 static void fuse_request_init(struct fuse_mount *fm, struct fuse_req *req)
diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
index b38e67b3f889f3fa08f7279e3309cde908527146..6c506f040d5fb57dae746880c657a95637ac50ce 100644
--- a/fs/fuse/fuse_dev_i.h
+++ b/fs/fuse/fuse_dev_i.h
@@ -8,6 +8,10 @@
 
 #include <linux/types.h>
 
+/* Ordinary requests have even IDs, while interrupts IDs are odd */
+#define FUSE_INT_REQ_BIT (1ULL << 0)
+#define FUSE_REQ_ID_STEP (1ULL << 1)
+
 static inline struct fuse_dev *fuse_get_dev(struct file *file)
 {
 	/*

-- 
2.43.0



* [PATCH RFC v5 04/16] fuse: Add fuse-io-uring design documentation
  2024-11-07 17:03 [PATCH RFC v5 00/16] fuse: fuse-over-io-uring Bernd Schubert
                   ` (2 preceding siblings ...)
  2024-11-07 17:03 ` [PATCH RFC v5 03/16] fuse: Move request bits Bernd Schubert
@ 2024-11-07 17:03 ` Bernd Schubert
  2024-11-07 17:03 ` [PATCH RFC v5 05/16] fuse: make args->in_args[0] to be always the header Bernd Schubert
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 29+ messages in thread
From: Bernd Schubert @ 2024-11-07 17:03 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
	Bernd Schubert

Signed-off-by: Bernd Schubert <[email protected]>
---
 Documentation/filesystems/fuse-io-uring.rst | 101 ++++++++++++++++++++++++++++
 1 file changed, 101 insertions(+)

diff --git a/Documentation/filesystems/fuse-io-uring.rst b/Documentation/filesystems/fuse-io-uring.rst
new file mode 100644
index 0000000000000000000000000000000000000000..50fdba1ea566588be3663e29b04bb9bbb6c9e4fb
--- /dev/null
+++ b/Documentation/filesystems/fuse-io-uring.rst
@@ -0,0 +1,101 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============================
+FUSE Uring design documentation
+===============================
+
+This documentation covers basic details of how the fuse
+kernel/userspace communication through uring is configured
+and works. For generic details about FUSE see fuse.rst.
+
+This document also covers the current interface, which is
+still in development and might change.
+
+Limitations
+===========
+As of now not all request types are supported through uring; userspace
+is required to also handle requests through /dev/fuse after
+uring setup is complete, specifically notifications (initiated from
+the daemon side) and interrupts.
+
+Fuse io-uring configuration
+===========================
+
+Fuse kernel requests are queued through the classical /dev/fuse
+read/write interface - until uring setup is complete.
+
+In order to set up fuse-over-io-uring fuse-server (user-space)
+needs to submit SQEs (opcode = IORING_OP_URING_CMD) to the
+/dev/fuse connection file descriptor. Initial submit is with
+the sub command FUSE_URING_REQ_FETCH, which will just register
+entries to be available in the kernel.
+
+Once at least one entry per queue is submitted, kernel starts
+to enqueue to ring queues.
+Note, every CPU core has its own fuse-io-uring queue.
+Userspace handles the CQE/fuse-request and submits the result as
+subcommand FUSE_URING_REQ_COMMIT_AND_FETCH - kernel completes
+the requests and also marks the entry available again. If there are
+pending requests waiting the request will be immediately submitted
+to the daemon again.
+
+Initial SQE
+-----------
+
+ |                                    |  FUSE filesystem daemon
+ |                                    |
+ |                                    |  >io_uring_submit()
+ |                                    |   IORING_OP_URING_CMD /
+ |                                    |   FUSE_URING_REQ_FETCH
+ |                                    |  [wait cqe]
+ |                                    |   >io_uring_wait_cqe() or
+ |                                    |   >io_uring_submit_and_wait()
+ |                                    |
+ |  >fuse_uring_cmd()                 |
+ |   >fuse_uring_fetch()              |
+ |    >fuse_uring_ent_release()       |
+
+
+Sending requests with CQEs
+--------------------------
+
+ |                                         |  FUSE filesystem daemon
+ |                                         |  [waiting for CQEs]
+ |  "rm /mnt/fuse/file"                    |
+ |                                         |
+ |  >sys_unlink()                          |
+ |    >fuse_unlink()                       |
+ |      [allocate request]                 |
+ |      >__fuse_request_send()             |
+ |        ...                              |
+ |       >fuse_uring_queue_fuse_req        |
+ |        [queue request on fg or          |
+ |          bg queue]                      |
+ |         >fuse_uring_assign_ring_entry() |
+ |         >fuse_uring_send_to_ring()      |
+ |          >fuse_uring_copy_to_ring()     |
+ |          >io_uring_cmd_done()           |
+ |          >request_wait_answer()         |
+ |           [sleep on req->waitq]         |
+ |                                         |  [receives and handles CQE]
+ |                                         |  [submit result and fetch next]
+ |                                         |  >io_uring_submit()
+ |                                         |   IORING_OP_URING_CMD/
+ |                                         |   FUSE_URING_REQ_COMMIT_AND_FETCH
+ |  >fuse_uring_cmd()                      |
+ |   >fuse_uring_commit_and_release()      |
+ |    >fuse_uring_copy_from_ring()         |
+ |     [ copy the result to the fuse req]  |
+ |     >fuse_uring_req_end_and_get_next()  |
+ |      >fuse_request_end()                |
+ |       [wake up req->waitq]              |
+ |      >fuse_uring_ent_release_and_fetch()|
+ |       [wait or handle next req]         |
+ |                                         |
+ |                                         |
+ |       [req->waitq woken up]             |
+ |    <fuse_unlink()                       |
+ |  <sys_unlink()                          |
+
+
+

-- 
2.43.0
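
The FETCH / COMMIT_AND_FETCH cycle described in the design document above can
be modelled with a small userspace simulation. This is a sketch under stated
assumptions, not the kernel implementation: the state names, struct ring_queue,
and all function names are hypothetical, and a single queue with a fixed depth
stands in for the per-core queues.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical entry states, not the kernel's enum. */
enum ent_state { ENT_UNREGISTERED, ENT_AVAILABLE, ENT_USERSPACE };

#define QUEUE_DEPTH 2
#define MAX_PENDING 8

struct ring_queue {
	enum ent_state state[QUEUE_DEPTH];
	int req[QUEUE_DEPTH];		/* request held by the entry, -1 if none */
	int pending[MAX_PENDING];	/* FIFO of requests waiting for an entry */
	size_t head, count;
	int completed;
};

static void queue_init(struct ring_queue *q)
{
	for (size_t i = 0; i < QUEUE_DEPTH; i++) {
		q->state[i] = ENT_UNREGISTERED;
		q->req[i] = -1;
	}
	q->head = q->count = 0;
	q->completed = 0;
}

/* Hand the next pending request to entry idx, or mark it available. */
static int assign_next(struct ring_queue *q, size_t idx)
{
	int req = -1;

	if (q->count > 0) {
		req = q->pending[q->head];
		q->head = (q->head + 1) % MAX_PENDING;
		q->count--;
	}
	q->req[idx] = req;
	q->state[idx] = req >= 0 ? ENT_USERSPACE : ENT_AVAILABLE;
	return req;
}

/* Models the initial FUSE_URING_REQ_FETCH: registers the entry. */
static int cmd_fetch(struct ring_queue *q, size_t idx)
{
	assert(idx < QUEUE_DEPTH && q->state[idx] == ENT_UNREGISTERED);
	return assign_next(q, idx);
}

/* Kernel side: returns the entry index used, or -1 if the request is queued. */
static int queue_request(struct ring_queue *q, int req)
{
	assert(req >= 0);
	for (size_t i = 0; i < QUEUE_DEPTH; i++) {
		if (q->state[i] == ENT_AVAILABLE) {
			q->req[i] = req;
			q->state[i] = ENT_USERSPACE;
			return (int)i;
		}
	}
	assert(q->count < MAX_PENDING);
	q->pending[(q->head + q->count) % MAX_PENDING] = req;
	q->count++;
	return -1;
}

/* Models FUSE_URING_REQ_COMMIT_AND_FETCH: complete, then fetch the next. */
static int cmd_commit_and_fetch(struct ring_queue *q, size_t idx)
{
	assert(idx < QUEUE_DEPTH && q->state[idx] == ENT_USERSPACE);
	q->completed++;
	return assign_next(q, idx);
}
```

The point of the model is the last function: committing a result and
re-arming the entry happen in one step, so a pending request can be handed to
the daemon immediately without a separate fetch.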



* [PATCH RFC v5 05/16] fuse: make args->in_args[0] to be always the header
  2024-11-07 17:03 [PATCH RFC v5 00/16] fuse: fuse-over-io-uring Bernd Schubert
                   ` (3 preceding siblings ...)
  2024-11-07 17:03 ` [PATCH RFC v5 04/16] fuse: Add fuse-io-uring design documentation Bernd Schubert
@ 2024-11-07 17:03 ` Bernd Schubert
  2024-11-14 20:57   ` Joanne Koong
  2024-11-07 17:03 ` [PATCH RFC v5 06/16] fuse: {uring} Handle SQEs - register commands Bernd Schubert
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 29+ messages in thread
From: Bernd Schubert @ 2024-11-07 17:03 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
	Bernd Schubert

This change sets up FUSE operations to have headers in args.in_args[0],
even for opcodes without an actual header. We do this to prepare for
cleanly separating payload from headers in the future.

For opcodes without a header, we use a zero-sized struct as a
placeholder. This approach:
- Keeps things consistent across all FUSE operations
- Will help with payload alignment later
- Avoids future issues when header sizes change
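
The placeholder convention can be tried out in userspace. The sketch below
mirrors the rule added to fuse_copy_args() in this patch (a zero-sized
argument is skipped at index 0 and rejected anywhere else), but struct arg and
copy_args() are simplified, hypothetical stand-ins. Note that an empty struct
having size 0 is a GNU C extension, so this assumes gcc or clang in C mode.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Empty placeholder header; sizeof is 0 with GNU C (gcc/clang). */
struct zero_in {};

/* Simplified stand-in for the kernel's argument descriptor. */
struct arg {
	size_t size;
	const void *value;
};

/*
 * Copy all arguments back to back into buf.
 * Returns the number of bytes copied, -EINVAL for a zero-sized
 * argument that is not the header slot, -ENOSPC if buf is too small.
 */
static long copy_args(char *buf, size_t buflen,
		      const struct arg *args, unsigned numargs)
{
	size_t off = 0;

	for (unsigned i = 0; i < numargs; i++) {
		if (args[i].size == 0) {
			if (i != 0)
				return -EINVAL;	/* only the header may be empty */
			continue;		/* nothing to copy for it */
		}
		if (args[i].size > buflen - off)
			return -ENOSPC;
		memcpy(buf + off, args[i].value, args[i].size);
		off += args[i].size;
		assert(off <= buflen);
	}
	return (long)off;
}
```

With this layout a caller such as an unlink-style request passes the empty
header in slot 0 and the name in slot 1, and only the name bytes end up in the
buffer.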

Signed-off-by: Bernd Schubert <[email protected]>
---
 fs/fuse/dax.c    | 13 ++++++++-----
 fs/fuse/dev.c    | 24 ++++++++++++++++++++----
 fs/fuse/dir.c    | 41 +++++++++++++++++++++++++++--------------
 fs/fuse/fuse_i.h |  7 +++++++
 fs/fuse/xattr.c  |  9 ++++++---
 5 files changed, 68 insertions(+), 26 deletions(-)

diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
index 12ef91d170bb3091ac35a33d2b9dc38330b00948..e459b8134ccb089f971bebf8da1f7fc5199c1271 100644
--- a/fs/fuse/dax.c
+++ b/fs/fuse/dax.c
@@ -237,14 +237,17 @@ static int fuse_send_removemapping(struct inode *inode,
 	struct fuse_inode *fi = get_fuse_inode(inode);
 	struct fuse_mount *fm = get_fuse_mount(inode);
 	FUSE_ARGS(args);
+	struct fuse_zero_in zero_arg;
 
 	args.opcode = FUSE_REMOVEMAPPING;
 	args.nodeid = fi->nodeid;
-	args.in_numargs = 2;
-	args.in_args[0].size = sizeof(*inargp);
-	args.in_args[0].value = inargp;
-	args.in_args[1].size = inargp->count * sizeof(*remove_one);
-	args.in_args[1].value = remove_one;
+	args.in_numargs = 3;
+	args.in_args[0].size = sizeof(zero_arg);
+	args.in_args[0].value = &zero_arg;
+	args.in_args[1].size = sizeof(*inargp);
+	args.in_args[1].value = inargp;
+	args.in_args[2].size = inargp->count * sizeof(*remove_one);
+	args.in_args[2].value = remove_one;
 	return fuse_simple_request(fm, &args);
 }
 
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index dbc222f9b0f0e590ce3ef83077e6b4cff03cff65..6effef4073da3dad2f6140761eca98147a41d88d 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1007,6 +1007,19 @@ static int fuse_copy_args(struct fuse_copy_state *cs, unsigned numargs,
 
 	for (i = 0; !err && i < numargs; i++)  {
 		struct fuse_arg *arg = &args[i];
+
+		/* zero headers */
+		if (arg->size == 0) {
+			if (WARN_ON_ONCE(i != 0)) {
+				if (cs->req)
+					pr_err_once(
+						"fuse: zero size header in opcode %d\n",
+						cs->req->in.h.opcode);
+				return -EINVAL;
+			}
+			continue;
+		}
+
 		if (i == numargs - 1 && argpages)
 			err = fuse_copy_pages(cs, arg->size, zeroing);
 		else
@@ -1662,6 +1675,7 @@ static int fuse_retrieve(struct fuse_mount *fm, struct inode *inode,
 	size_t args_size = sizeof(*ra);
 	struct fuse_args_pages *ap;
 	struct fuse_args *args;
+	struct fuse_zero_in zero_arg;
 
 	offset = outarg->offset & ~PAGE_MASK;
 	file_size = i_size_read(inode);
@@ -1688,7 +1702,7 @@ static int fuse_retrieve(struct fuse_mount *fm, struct inode *inode,
 	args = &ap->args;
 	args->nodeid = outarg->nodeid;
 	args->opcode = FUSE_NOTIFY_REPLY;
-	args->in_numargs = 2;
+	args->in_numargs = 3;
 	args->in_pages = true;
 	args->end = fuse_retrieve_end;
 
@@ -1715,9 +1729,11 @@ static int fuse_retrieve(struct fuse_mount *fm, struct inode *inode,
 	}
 	ra->inarg.offset = outarg->offset;
 	ra->inarg.size = total_len;
-	args->in_args[0].size = sizeof(ra->inarg);
-	args->in_args[0].value = &ra->inarg;
-	args->in_args[1].size = total_len;
+	args->in_args[0].size = sizeof(zero_arg);
+	args->in_args[0].value = &zero_arg;
+	args->in_args[1].size = sizeof(ra->inarg);
+	args->in_args[1].value = &ra->inarg;
+	args->in_args[2].size = total_len;
 
 	err = fuse_simple_notify_reply(fm, args, outarg->notify_unique);
 	if (err)
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 2b0d4781f39484d50d1fd7f4f673d8b08c5fd7cf..6d67d7f8e6b4460c759df3fb293e169bcc78a897 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -172,12 +172,16 @@ static void fuse_lookup_init(struct fuse_conn *fc, struct fuse_args *args,
 			     u64 nodeid, const struct qstr *name,
 			     struct fuse_entry_out *outarg)
 {
+	struct fuse_zero_in zero_arg;
+
 	memset(outarg, 0, sizeof(struct fuse_entry_out));
 	args->opcode = FUSE_LOOKUP;
 	args->nodeid = nodeid;
-	args->in_numargs = 1;
-	args->in_args[0].size = name->len + 1;
-	args->in_args[0].value = name->name;
+	args->in_numargs = 2;
+	args->in_args[0].size = sizeof(zero_arg);
+	args->in_args[0].value = &zero_arg;
+	args->in_args[1].size = name->len + 1;
+	args->in_args[1].value = name->name;
 	args->out_numargs = 1;
 	args->out_args[0].size = sizeof(struct fuse_entry_out);
 	args->out_args[0].value = outarg;
@@ -915,16 +919,19 @@ static int fuse_mkdir(struct mnt_idmap *idmap, struct inode *dir,
 static int fuse_symlink(struct mnt_idmap *idmap, struct inode *dir,
 			struct dentry *entry, const char *link)
 {
+	struct fuse_zero_in zero_arg;
 	struct fuse_mount *fm = get_fuse_mount(dir);
 	unsigned len = strlen(link) + 1;
 	FUSE_ARGS(args);
 
 	args.opcode = FUSE_SYMLINK;
-	args.in_numargs = 2;
-	args.in_args[0].size = entry->d_name.len + 1;
-	args.in_args[0].value = entry->d_name.name;
-	args.in_args[1].size = len;
-	args.in_args[1].value = link;
+	args.in_numargs = 3;
+	args.in_args[0].size = sizeof(zero_arg);
+	args.in_args[0].value = &zero_arg;
+	args.in_args[1].size = entry->d_name.len + 1;
+	args.in_args[1].value = entry->d_name.name;
+	args.in_args[2].size = len;
+	args.in_args[2].value = link;
 	return create_new_entry(fm, &args, dir, entry, S_IFLNK);
 }
 
@@ -975,6 +982,7 @@ static void fuse_entry_unlinked(struct dentry *entry)
 
 static int fuse_unlink(struct inode *dir, struct dentry *entry)
 {
+	struct fuse_zero_in inarg;
 	int err;
 	struct fuse_mount *fm = get_fuse_mount(dir);
 	FUSE_ARGS(args);
@@ -984,9 +992,11 @@ static int fuse_unlink(struct inode *dir, struct dentry *entry)
 
 	args.opcode = FUSE_UNLINK;
 	args.nodeid = get_node_id(dir);
-	args.in_numargs = 1;
-	args.in_args[0].size = entry->d_name.len + 1;
-	args.in_args[0].value = entry->d_name.name;
+	args.in_numargs = 2;
+	args.in_args[0].size = sizeof(inarg);
+	args.in_args[0].value = &inarg;
+	args.in_args[1].size = entry->d_name.len + 1;
+	args.in_args[1].value = entry->d_name.name;
 	err = fuse_simple_request(fm, &args);
 	if (!err) {
 		fuse_dir_changed(dir);
@@ -998,6 +1008,7 @@ static int fuse_unlink(struct inode *dir, struct dentry *entry)
 
 static int fuse_rmdir(struct inode *dir, struct dentry *entry)
 {
+	struct fuse_zero_in zero_arg;
 	int err;
 	struct fuse_mount *fm = get_fuse_mount(dir);
 	FUSE_ARGS(args);
@@ -1007,9 +1018,11 @@ static int fuse_rmdir(struct inode *dir, struct dentry *entry)
 
 	args.opcode = FUSE_RMDIR;
 	args.nodeid = get_node_id(dir);
-	args.in_numargs = 1;
-	args.in_args[0].size = entry->d_name.len + 1;
-	args.in_args[0].value = entry->d_name.name;
+	args.in_numargs = 2;
+	args.in_args[0].size = sizeof(zero_arg);
+	args.in_args[0].value = &zero_arg;
+	args.in_args[1].size = entry->d_name.len + 1;
+	args.in_args[1].value = entry->d_name.name;
 	err = fuse_simple_request(fm, &args);
 	if (!err) {
 		fuse_dir_changed(dir);
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index f2391961031374d8d55916c326c6472f0c03aae6..e2d1d90dfdb13b2c3e7de4789501ee45d3bf7794 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -941,6 +941,13 @@ struct fuse_mount {
 	struct rcu_head rcu;
 };
 
+/*
+ * Empty header for FUSE opcodes without specific header needs.
+ * Used as a placeholder in args->in_args[0] for consistency
+ * across all FUSE operations, simplifying request handling.
+ */
+struct fuse_zero_in {};
+
 static inline struct fuse_mount *get_fuse_mount_super(struct super_block *sb)
 {
 	return sb->s_fs_info;
diff --git a/fs/fuse/xattr.c b/fs/fuse/xattr.c
index 5b423fdbb13f8f17c3982e96dd0de836662092b0..2df1efd2e9bdb46571148f484d7927044f31c184 100644
--- a/fs/fuse/xattr.c
+++ b/fs/fuse/xattr.c
@@ -158,15 +158,18 @@ int fuse_removexattr(struct inode *inode, const char *name)
 	struct fuse_mount *fm = get_fuse_mount(inode);
 	FUSE_ARGS(args);
 	int err;
+	struct fuse_zero_in zero_arg;
 
 	if (fm->fc->no_removexattr)
 		return -EOPNOTSUPP;
 
 	args.opcode = FUSE_REMOVEXATTR;
 	args.nodeid = get_node_id(inode);
-	args.in_numargs = 1;
-	args.in_args[0].size = strlen(name) + 1;
-	args.in_args[0].value = name;
+	args.in_numargs = 2;
+	args.in_args[0].size = sizeof(zero_arg);
+	args.in_args[0].value = &zero_arg;
+	args.in_args[1].size = strlen(name) + 1;
+	args.in_args[1].value = name;
 	err = fuse_simple_request(fm, &args);
 	if (err == -ENOSYS) {
 		fm->fc->no_removexattr = 1;

-- 
2.43.0
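
For illustration only, the convention this patch establishes (in_args[0]
is always the per-opcode header, possibly of size zero) can be modelled in
userspace as below. This is a sketch, not kernel code: struct model_arg,
model_payload_len() and model_unlink_payload_len() are made-up names, and
the empty struct fuse_zero_in is represented simply by a size of 0. Only
the argument order (header first, name second) is taken from the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-in for the kernel's struct fuse_in_arg. */
struct model_arg {
	size_t size;
	const void *value;
};

/*
 * Sum the sizes of all arguments that follow the header. Because
 * in_args[0] is always the header, a consumer can skip it
 * unconditionally, even when the header is empty (size 0).
 */
static size_t model_payload_len(const struct model_arg *args, size_t numargs)
{
	size_t len = 0;
	size_t i;

	assert(numargs >= 1);	/* the header slot must always exist */

	for (i = 1; i < numargs; i++)
		len += args[i].size;
	return len;
}

/* Build the argument list the way fuse_unlink() does after this patch. */
static size_t model_unlink_payload_len(const char *name)
{
	struct model_arg args[2];

	args[0].size = 0;			/* empty header placeholder */
	args[0].value = NULL;
	args[1].size = strlen(name) + 1;	/* name plus trailing NUL */
	args[1].value = name;

	return model_payload_len(args, 2);
}
```

With this layout a consumer does not need a per-opcode special case to
find where the payload starts.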


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH RFC v5 06/16] fuse: {uring} Handle SQEs - register commands
  2024-11-07 17:03 [PATCH RFC v5 00/16] fuse: fuse-over-io-uring Bernd Schubert
                   ` (4 preceding siblings ...)
  2024-11-07 17:03 ` [PATCH RFC v5 05/16] fuse: make args->in_args[0] to be always the header Bernd Schubert
@ 2024-11-07 17:03 ` Bernd Schubert
  2024-11-07 17:03 ` [PATCH RFC v5 07/16] fuse: Make fuse_copy non static Bernd Schubert
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 29+ messages in thread
From: Bernd Schubert @ 2024-11-07 17:03 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
	Bernd Schubert

This adds basic support for ring SQEs (with opcode=IORING_OP_URING_CMD).
For now only FUSE_URING_REQ_FETCH is handled to register queue entries.

Signed-off-by: Bernd Schubert <[email protected]>
---
 fs/fuse/Kconfig           |  12 ++
 fs/fuse/Makefile          |   1 +
 fs/fuse/dev.c             |   4 +
 fs/fuse/dev_uring.c       | 349 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/fuse/dev_uring_i.h     | 108 ++++++++++++++
 fs/fuse/fuse_dev_i.h      |   1 +
 fs/fuse/fuse_i.h          |   5 +
 fs/fuse/inode.c           |   3 +
 include/uapi/linux/fuse.h |  57 ++++++++
 9 files changed, 540 insertions(+)

diff --git a/fs/fuse/Kconfig b/fs/fuse/Kconfig
index 8674dbfbe59dbf79c304c587b08ebba3cfe405be..11f37cefc94b2af5a675c238801560c822b95f1a 100644
--- a/fs/fuse/Kconfig
+++ b/fs/fuse/Kconfig
@@ -63,3 +63,15 @@ config FUSE_PASSTHROUGH
 	  to be performed directly on a backing file.
 
 	  If you want to allow passthrough operations, answer Y.
+
+config FUSE_IO_URING
+	bool "FUSE communication over io-uring"
+	default y
+	depends on FUSE_FS
+	depends on IO_URING
+	help
+	  This allows sending FUSE requests over the io_uring interface and
+	  also adds request core affinity.
+
+	  If you want to allow fuse server/client communication through io-uring,
+	  answer Y.
diff --git a/fs/fuse/Makefile b/fs/fuse/Makefile
index 6e0228c6d0cba9541c8668efb86b83094751d469..7193a14374fd3a08b901ef53fbbea7c31b12f22c 100644
--- a/fs/fuse/Makefile
+++ b/fs/fuse/Makefile
@@ -11,5 +11,6 @@ fuse-y := dev.o dir.o file.o inode.o control.o xattr.o acl.o readdir.o ioctl.o
 fuse-y += iomode.o
 fuse-$(CONFIG_FUSE_DAX) += dax.o
 fuse-$(CONFIG_FUSE_PASSTHROUGH) += passthrough.o
+fuse-$(CONFIG_FUSE_IO_URING) += dev_uring.o
 
 virtiofs-y := virtio_fs.o
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 6effef4073da3dad2f6140761eca98147a41d88d..d4e7d69f79cec192cb456aedfb7d4a2a274fea80 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -6,6 +6,7 @@
   See the file COPYING.
 */
 
+#include "dev_uring_i.h"
 #include "fuse_i.h"
 #include "fuse_dev_i.h"
 
@@ -2414,6 +2415,9 @@ const struct file_operations fuse_dev_operations = {
 	.fasync		= fuse_dev_fasync,
 	.unlocked_ioctl = fuse_dev_ioctl,
 	.compat_ioctl   = compat_ptr_ioctl,
+#ifdef CONFIG_FUSE_IO_URING
+	.uring_cmd	= fuse_uring_cmd,
+#endif
 };
 EXPORT_SYMBOL_GPL(fuse_dev_operations);
 
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
new file mode 100644
index 0000000000000000000000000000000000000000..ce0a41b00613133ea1b8062290bc960b95254ac9
--- /dev/null
+++ b/fs/fuse/dev_uring.c
@@ -0,0 +1,349 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * FUSE: Filesystem in Userspace
+ * Copyright (c) 2023-2024 DataDirect Networks.
+ */
+
+#include <linux/fs.h>
+
+#include "fuse_i.h"
+#include "dev_uring_i.h"
+#include "fuse_dev_i.h"
+
+#include <linux/io_uring/cmd.h>
+
+#ifdef CONFIG_FUSE_IO_URING
+static bool __read_mostly enable_uring;
+module_param(enable_uring, bool, 0644);
+MODULE_PARM_DESC(enable_uring,
+		 "Enable userspace communication through io-uring.");
+#endif
+
+static int fuse_ring_ent_unset_userspace(struct fuse_ring_ent *ent)
+{
+	struct fuse_ring_queue *queue = ent->queue;
+
+	lockdep_assert_held(&queue->lock);
+
+	if (WARN_ON_ONCE(ent->state != FRRS_USERSPACE))
+		return -EIO;
+
+	ent->state = FRRS_COMMIT;
+	list_move(&ent->list, &queue->ent_intermediate_queue);
+
+	return 0;
+}
+
+void fuse_uring_destruct(struct fuse_conn *fc)
+{
+	struct fuse_ring *ring = fc->ring;
+	int qid;
+
+	if (!ring)
+		return;
+
+	for (qid = 0; qid < ring->nr_queues; qid++) {
+		struct fuse_ring_queue *queue = ring->queues[qid];
+
+		if (!queue)
+			continue;
+
+		WARN_ON(!list_empty(&queue->ent_avail_queue));
+		WARN_ON(!list_empty(&queue->ent_intermediate_queue));
+
+		kfree(queue);
+		ring->queues[qid] = NULL;
+	}
+
+	kfree(ring->queues);
+	kfree(ring);
+	fc->ring = NULL;
+}
+
+#define FUSE_URING_IOV_SEGS 2 /* header and payload */
+
+/*
+ * Basic ring setup for this connection based on the provided configuration
+ */
+static struct fuse_ring *fuse_uring_create(struct fuse_conn *fc)
+{
+	struct fuse_ring *ring = NULL;
+	size_t nr_queues = num_possible_cpus();
+	struct fuse_ring *res = NULL;
+
+	ring = kzalloc(sizeof(*fc->ring) +
+			       nr_queues * sizeof(struct fuse_ring_queue),
+		       GFP_KERNEL_ACCOUNT);
+	if (!ring)
+		return NULL;
+
+	ring->queues = kcalloc(nr_queues, sizeof(struct fuse_ring_queue *),
+			       GFP_KERNEL_ACCOUNT);
+	if (!ring->queues)
+		goto out_err;
+
+	spin_lock(&fc->lock);
+	if (fc->ring) {
+		/* race, another thread created the ring in the meantime */
+		spin_unlock(&fc->lock);
+		res = fc->ring;
+		goto out_err;
+	}
+
+	fc->ring = ring;
+	ring->nr_queues = nr_queues;
+	ring->fc = fc;
+
+	spin_unlock(&fc->lock);
+	return ring;
+
+out_err:
+	if (ring)
+		kfree(ring->queues);
+	kfree(ring);
+	return res;
+}
+
+static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
+						       int qid)
+{
+	struct fuse_conn *fc = ring->fc;
+	struct fuse_ring_queue *queue;
+
+	queue = kzalloc(sizeof(*queue), GFP_KERNEL_ACCOUNT);
+	if (!queue)
+		return ERR_PTR(-ENOMEM);
+	spin_lock(&fc->lock);
+	if (ring->queues[qid]) {
+		spin_unlock(&fc->lock);
+		kfree(queue);
+		return ring->queues[qid];
+	}
+	ring->queues[qid] = queue;
+
+	queue->qid = qid;
+	queue->ring = ring;
+	spin_lock_init(&queue->lock);
+
+	INIT_LIST_HEAD(&queue->ent_avail_queue);
+	INIT_LIST_HEAD(&queue->ent_intermediate_queue);
+
+	spin_unlock(&fc->lock);
+
+	return queue;
+}
+
+/*
+ * Put a ring request on hold; it is not in use for now.
+ */
+static void fuse_uring_ent_avail(struct fuse_ring_ent *ring_ent,
+				 struct fuse_ring_queue *queue)
+	__must_hold(&queue->lock)
+{
+	struct fuse_ring *ring = queue->ring;
+
+	lockdep_assert_held(&queue->lock);
+
+	/* unsets all previous flags - basically resets */
+	pr_devel("%s ring=%p qid=%d state=%d\n", __func__, ring,
+		 ring_ent->queue->qid, ring_ent->state);
+
+	if (WARN_ON(ring_ent->state != FRRS_COMMIT)) {
+		pr_warn("%s qid=%d state=%d\n", __func__, ring_ent->queue->qid,
+			ring_ent->state);
+		return;
+	}
+
+	list_move(&ring_ent->list, &queue->ent_avail_queue);
+
+	ring_ent->state = FRRS_WAIT;
+}
+
+/*
+ * fuse_uring_req_fetch command handling
+ */
+static void _fuse_uring_fetch(struct fuse_ring_ent *ring_ent,
+			      struct io_uring_cmd *cmd,
+			      unsigned int issue_flags)
+{
+	struct fuse_ring_queue *queue = ring_ent->queue;
+
+	spin_lock(&queue->lock);
+	fuse_uring_ent_avail(ring_ent, queue);
+	ring_ent->cmd = cmd;
+	spin_unlock(&queue->lock);
+}
+
+/*
+ * sqe->addr is a ptr to an iovec array, iov[0] has the headers, iov[1]
+ * the payload
+ */
+static int fuse_uring_get_iovec_from_sqe(const struct io_uring_sqe *sqe,
+					 struct iovec iov[FUSE_URING_IOV_SEGS])
+{
+	struct iovec __user *uiov = u64_to_user_ptr(READ_ONCE(sqe->addr));
+	struct iov_iter iter;
+	ssize_t ret;
+
+	if (sqe->len != FUSE_URING_IOV_SEGS)
+		return -EINVAL;
+
+	/*
+	 * Direction for buffer access will actually be READ and WRITE,
+	 * using write for the import should include READ access as well.
+	 */
+	ret = import_iovec(WRITE, uiov, FUSE_URING_IOV_SEGS,
+			   FUSE_URING_IOV_SEGS, &iov, &iter);
+	if (ret < 0)
+		return ret;
+
+	return 0;
+}
+
+static int fuse_uring_fetch(struct io_uring_cmd *cmd, unsigned int issue_flags,
+			    struct fuse_conn *fc)
+{
+	const struct fuse_uring_cmd_req *cmd_req = io_uring_sqe_cmd(cmd->sqe);
+	struct fuse_ring *ring = fc->ring;
+	struct fuse_ring_queue *queue;
+	struct fuse_ring_ent *ring_ent;
+	int err;
+	struct iovec iov[FUSE_URING_IOV_SEGS];
+
+	err = fuse_uring_get_iovec_from_sqe(cmd->sqe, iov);
+	if (err) {
+		pr_info_ratelimited("Failed to get iovec from sqe, err=%d\n",
+				    err);
+		return err;
+	}
+
+#if 0
+	/* Does not work as sending over io-uring is async */
+	err = -ETXTBSY;
+	if (fc->initialized) {
+		pr_info_ratelimited(
+			"Received FUSE_URING_REQ_FETCH after connection is initialized\n");
+		return err;
+	}
+#endif
+
+	err = -ENOMEM;
+	if (!ring) {
+		ring = fuse_uring_create(fc);
+		if (!ring)
+			return err;
+	}
+
+	queue = ring->queues[cmd_req->qid];
+	if (!queue) {
+		queue = fuse_uring_create_queue(ring, cmd_req->qid);
+		if (IS_ERR(queue))
+			return PTR_ERR(queue);
+	}
+
+	/*
+	 * The created queue above does not need to be destructed in
+	 * case of entry errors below, will be done at ring destruction time.
+	 */
+
+	ring_ent = kzalloc(sizeof(*ring_ent), GFP_KERNEL_ACCOUNT);
+	if (ring_ent == NULL)
+		return err;
+
+	INIT_LIST_HEAD(&ring_ent->list);
+
+	ring_ent->queue = queue;
+	ring_ent->cmd = cmd;
+
+	err = -EINVAL;
+	if (iov[0].iov_len < sizeof(struct fuse_ring_req_header)) {
+		pr_info_ratelimited("Invalid header len %zu\n", iov[0].iov_len);
+		goto err;
+	}
+
+	ring_ent->headers = iov[0].iov_base;
+	ring_ent->payload = iov[1].iov_base;
+	ring_ent->max_arg_len = iov[1].iov_len;
+
+	if (ring_ent->max_arg_len <
+	    max_t(size_t, FUSE_MIN_READ_BUFFER, fc->max_write)) {
+		pr_info_ratelimited("Invalid req payload len %zu\n",
+				    ring_ent->max_arg_len);
+		goto err;
+	}
+
+	spin_lock(&queue->lock);
+
+	/*
+	 * FUSE_URING_REQ_FETCH is an initialization exception, needs
+	 * state override
+	 */
+	ring_ent->state = FRRS_USERSPACE;
+	err = fuse_ring_ent_unset_userspace(ring_ent);
+	spin_unlock(&queue->lock);
+	if (WARN_ON_ONCE(err != 0))
+		goto err;
+
+	_fuse_uring_fetch(ring_ent, cmd, issue_flags);
+
+	return 0;
+err:
+	list_del_init(&ring_ent->list);
+	kfree(ring_ent);
+	return err;
+}
+
+/*
+ * Entry function from io_uring to handle the given passthrough command
+ * (opcode IORING_OP_URING_CMD)
+ */
+int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
+{
+	const struct fuse_uring_cmd_req *cmd_req = io_uring_sqe_cmd(cmd->sqe);
+	struct fuse_dev *fud;
+	struct fuse_conn *fc;
+	u32 cmd_op = cmd->cmd_op;
+	int err = 0;
+
+	/* Disabled for now, especially as teardown is not implemented yet */
+	err = -EOPNOTSUPP;
+	pr_info_ratelimited("fuse-io-uring is not enabled yet\n");
+	goto out;
+
+	err = -EOPNOTSUPP;
+	if (!enable_uring) {
+		pr_info_ratelimited("uring is disabled\n");
+		goto out;
+	}
+
+	err = -ENOTCONN;
+	fud = fuse_get_dev(cmd->file);
+	if (!fud) {
+		pr_info_ratelimited("No fuse device found\n");
+		goto out;
+	}
+	fc = fud->fc;
+
+	if (fc->aborted)
+		goto out;
+
+	switch (cmd_op) {
+	case FUSE_URING_REQ_FETCH:
+		err = fuse_uring_fetch(cmd, issue_flags, fc);
+		if (err)
+			pr_info_once("fuse_uring_fetch failed err=%d\n", err);
+		break;
+	default:
+		err = -EINVAL;
+		pr_devel("Unknown uring command %d", cmd_op);
+		goto out;
+	}
+out:
+	pr_devel("uring cmd op=%d, qid=%d ID=%llu ret=%d\n", cmd_op,
+		 cmd_req->qid, cmd_req->commit_id, err);
+
+	if (err < 0)
+		io_uring_cmd_done(cmd, err, 0, issue_flags);
+
+	return -EIOCBQUEUED;
+}
diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
new file mode 100644
index 0000000000000000000000000000000000000000..11798080896372c72692228ff7072bbee6a63e53
--- /dev/null
+++ b/fs/fuse/dev_uring_i.h
@@ -0,0 +1,108 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * FUSE: Filesystem in Userspace
+ * Copyright (c) 2023-2024 DataDirect Networks.
+ */
+
+#ifndef _FS_FUSE_DEV_URING_I_H
+#define _FS_FUSE_DEV_URING_I_H
+
+#include "fuse_i.h"
+
+#ifdef CONFIG_FUSE_IO_URING
+
+enum fuse_ring_req_state {
+
+	/* ring entry received from userspace and it is being processed */
+	FRRS_COMMIT,
+
+	/* The ring request waits for a new fuse request */
+	FRRS_WAIT,
+
+	/* request is in or on the way to user space */
+	FRRS_USERSPACE,
+};
+
+/** A fuse ring entry, part of the ring queue */
+struct fuse_ring_ent {
+	/* userspace buffer */
+	struct fuse_ring_req_header __user *headers;
+	void *__user *payload;
+
+	/* the ring queue that owns the request */
+	struct fuse_ring_queue *queue;
+
+	struct io_uring_cmd *cmd;
+
+	struct list_head list;
+
+	/* size of payload buffer */
+	size_t max_arg_len;
+
+	/*
+	 * state the request is currently in
+	 * (enum fuse_ring_req_state)
+	 */
+	unsigned int state;
+
+	struct fuse_req *fuse_req;
+};
+
+struct fuse_ring_queue {
+	/*
+	 * back pointer to the main fuse uring structure that holds this
+	 * queue
+	 */
+	struct fuse_ring *ring;
+
+	/* queue id, typically also corresponds to the cpu core */
+	unsigned int qid;
+
+	/*
+	 * queue lock, taken when any value in the queue changes _and_ also
+	 * a ring entry state changes.
+	 */
+	spinlock_t lock;
+
+	/* available ring entries (struct fuse_ring_ent) */
+	struct list_head ent_avail_queue;
+
+	/*
+	 * entries in the process of being committed or in the process
+	 * of being sent to userspace
+	 */
+	struct list_head ent_intermediate_queue;
+};
+
+/**
+ * Describes whether uring is used for communication and holds all the data
+ * needed for uring communication
+ */
+struct fuse_ring {
+	/* back pointer */
+	struct fuse_conn *fc;
+
+	/* number of ring queues */
+	size_t nr_queues;
+
+	struct fuse_ring_queue **queues;
+};
+
+void fuse_uring_destruct(struct fuse_conn *fc);
+int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags);
+
+#else /* CONFIG_FUSE_IO_URING */
+
+struct fuse_ring;
+
+static inline void fuse_uring_create(struct fuse_conn *fc)
+{
+}
+
+static inline void fuse_uring_destruct(struct fuse_conn *fc)
+{
+}
+
+#endif /* CONFIG_FUSE_IO_URING */
+
+#endif /* _FS_FUSE_DEV_URING_I_H */
diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
index 6c506f040d5fb57dae746880c657a95637ac50ce..e82cbf9c569af4f271ba0456cb49e0a5116bf36b 100644
--- a/fs/fuse/fuse_dev_i.h
+++ b/fs/fuse/fuse_dev_i.h
@@ -8,6 +8,7 @@
 
 #include <linux/types.h>
 
+
 /* Ordinary requests have even IDs, while interrupts IDs are odd */
 #define FUSE_INT_REQ_BIT (1ULL << 0)
 #define FUSE_REQ_ID_STEP (1ULL << 1)
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index e2d1d90dfdb13b2c3e7de4789501ee45d3bf7794..91c2e7e35cdbd470894a8a9cd026b77368b7a4b6 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -917,6 +917,11 @@ struct fuse_conn {
 	/** IDR for backing files ids */
 	struct idr backing_files_map;
 #endif
+
+#ifdef CONFIG_FUSE_IO_URING
+	/** uring connection information */
+	struct fuse_ring *ring;
+#endif
 };
 
 /*
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 99e44ea7d8756ded7145f38b49d129b361b991ba..59f8fb7b915f052f892d587a0f9a8dc17cf750ce 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -7,6 +7,7 @@
 */
 
 #include "fuse_i.h"
+#include "dev_uring_i.h"
 
 #include <linux/pagemap.h>
 #include <linux/slab.h>
@@ -947,6 +948,8 @@ static void delayed_release(struct rcu_head *p)
 {
 	struct fuse_conn *fc = container_of(p, struct fuse_conn, rcu);
 
+	fuse_uring_destruct(fc);
+
 	put_user_ns(fc->user_ns);
 	fc->release(fc);
 }
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index d08b99d60f6fd6d0d072d01ad6bcc1b48da0a242..2fddc2e29f86cec25b05832ae7a622898a84b00f 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -1186,4 +1186,61 @@ struct fuse_supp_groups {
 	uint32_t	groups[];
 };
 
+/**
+ * Size of the ring buffer header
+ */
+#define FUSE_HEADER_SZ 256
+#define FUSE_IN_OUT_HEADER_SZ 128
+
+/**
+ * This structure mapped onto the
+ */
+struct fuse_ring_req_header {
+	union {
+		char ring_header[FUSE_HEADER_SZ];
+
+		struct {
+			uint64_t flags;
+
+			uint32_t in_out_arg_len;
+			uint32_t padding;
+			union {
+				char in_out[FUSE_IN_OUT_HEADER_SZ];
+				struct fuse_in_header in;
+				struct fuse_out_header out;
+			};
+
+			/* fuse operation header */
+			char op_in[];
+		};
+	};
+};
+
+/**
+ * sqe commands to the kernel
+ */
+enum fuse_uring_cmd {
+	FUSE_URING_REQ_INVALID = 0,
+
+	/* submit sqe to kernel to get a request */
+	FUSE_URING_REQ_FETCH = 1,
+
+	/* commit result and fetch next request */
+	FUSE_URING_REQ_COMMIT_AND_FETCH = 2,
+};
+
+/**
+ * In the 80B command area of the SQE.
+ */
+struct fuse_uring_cmd_req {
+	uint64_t flags;
+
+	/* entry identifier */
+	uint64_t commit_id;
+
+	/* queue the command is for (queue index) */
+	uint16_t qid;
+	uint8_t padding[6];
+};
+
 #endif /* _LINUX_FUSE_H */

-- 
2.43.0
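
For readers following the entry life cycle in this patch, the state
transitions can be modelled in plain userspace C as below. This is only a
sketch under stated assumptions: the model_* names are hypothetical, there
is no locking and no list handling, and model_ent_avail() returns an error
where the kernel's fuse_uring_ent_avail() returns void and only warns. The
three states and the order of transitions on FUSE_URING_REQ_FETCH are
taken from the diff above.

```c
#include <assert.h>
#include <errno.h>

/* Mirrors enum fuse_ring_req_state from this patch. */
enum model_state {
	MODEL_FRRS_COMMIT,	/* received from userspace, being processed */
	MODEL_FRRS_WAIT,	/* waiting for a new fuse request */
	MODEL_FRRS_USERSPACE,	/* in or on the way to userspace */
};

struct model_ent {
	enum model_state state;
};

/* Like fuse_ring_ent_unset_userspace(): USERSPACE -> COMMIT, else -EIO. */
static int model_unset_userspace(struct model_ent *ent)
{
	if (ent->state != MODEL_FRRS_USERSPACE)
		return -EIO;
	ent->state = MODEL_FRRS_COMMIT;
	return 0;
}

/* Like fuse_uring_ent_avail(): COMMIT -> WAIT; any other state is refused. */
static int model_ent_avail(struct model_ent *ent)
{
	if (ent->state != MODEL_FRRS_COMMIT)
		return -EINVAL;
	ent->state = MODEL_FRRS_WAIT;
	return 0;
}

/* Registration path of fuse_uring_fetch(), reduced to the state changes. */
static int model_register(struct model_ent *ent)
{
	int err;

	/* FETCH is an initialization exception and overrides the state. */
	ent->state = MODEL_FRRS_USERSPACE;
	err = model_unset_userspace(ent);
	if (err)
		return err;

	err = model_ent_avail(ent);
	if (!err)
		assert(ent->state == MODEL_FRRS_WAIT);
	return err;
}
```

After registration an entry sits in the WAIT state, which corresponds to
being on ent_avail_queue in the kernel code.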


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH RFC v5 07/16] fuse: Make fuse_copy non static
  2024-11-07 17:03 [PATCH RFC v5 00/16] fuse: fuse-over-io-uring Bernd Schubert
                   ` (5 preceding siblings ...)
  2024-11-07 17:03 ` [PATCH RFC v5 06/16] fuse: {uring} Handle SQEs - register commands Bernd Schubert
@ 2024-11-07 17:03 ` Bernd Schubert
  2024-11-07 17:03 ` [PATCH RFC v5 08/16] fuse: Add fuse-io-uring handling into fuse_copy Bernd Schubert
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 29+ messages in thread
From: Bernd Schubert @ 2024-11-07 17:03 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
	Bernd Schubert

Move 'struct fuse_copy_state' and the fuse_copy_* functions
to fuse_dev_i.h to make them available to fuse-uring.
'copy_out_args()' is renamed to 'fuse_copy_out_args()'.

Signed-off-by: Bernd Schubert <[email protected]>
---
 fs/fuse/dev.c        | 30 ++++++++----------------------
 fs/fuse/fuse_dev_i.h | 25 +++++++++++++++++++++++++
 2 files changed, 33 insertions(+), 22 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index d4e7d69f79cec192cb456aedfb7d4a2a274fea80..f210f91a937b24e75a467e943cdec4581900e061 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -630,22 +630,8 @@ static int unlock_request(struct fuse_req *req)
 	return err;
 }
 
-struct fuse_copy_state {
-	int write;
-	struct fuse_req *req;
-	struct iov_iter *iter;
-	struct pipe_buffer *pipebufs;
-	struct pipe_buffer *currbuf;
-	struct pipe_inode_info *pipe;
-	unsigned long nr_segs;
-	struct page *pg;
-	unsigned len;
-	unsigned offset;
-	unsigned move_pages:1;
-};
-
-static void fuse_copy_init(struct fuse_copy_state *cs, int write,
-			   struct iov_iter *iter)
+void fuse_copy_init(struct fuse_copy_state *cs, int write,
+		    struct iov_iter *iter)
 {
 	memset(cs, 0, sizeof(*cs));
 	cs->write = write;
@@ -999,9 +985,9 @@ static int fuse_copy_one(struct fuse_copy_state *cs, void *val, unsigned size)
 }
 
 /* Copy request arguments to/from userspace buffer */
-static int fuse_copy_args(struct fuse_copy_state *cs, unsigned numargs,
-			  unsigned argpages, struct fuse_arg *args,
-			  int zeroing)
+int fuse_copy_args(struct fuse_copy_state *cs, unsigned numargs,
+		   unsigned argpages, struct fuse_arg *args,
+		   int zeroing)
 {
 	int err = 0;
 	unsigned i;
@@ -1883,8 +1869,8 @@ static struct fuse_req *request_find(struct fuse_pqueue *fpq, u64 unique)
 	return NULL;
 }
 
-static int copy_out_args(struct fuse_copy_state *cs, struct fuse_args *args,
-			 unsigned nbytes)
+int fuse_copy_out_args(struct fuse_copy_state *cs, struct fuse_args *args,
+		       unsigned nbytes)
 {
 	unsigned reqsize = sizeof(struct fuse_out_header);
 
@@ -1986,7 +1972,7 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud,
 	if (oh.error)
 		err = nbytes != sizeof(oh) ? -EINVAL : 0;
 	else
-		err = copy_out_args(cs, req->args, nbytes);
+		err = fuse_copy_out_args(cs, req->args, nbytes);
 	fuse_copy_finish(cs);
 
 	spin_lock(&fpq->lock);
diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
index e82cbf9c569af4f271ba0456cb49e0a5116bf36b..f36e304cd62c8302aed95de89926fc894f602cfd 100644
--- a/fs/fuse/fuse_dev_i.h
+++ b/fs/fuse/fuse_dev_i.h
@@ -13,6 +13,23 @@
 #define FUSE_INT_REQ_BIT (1ULL << 0)
 #define FUSE_REQ_ID_STEP (1ULL << 1)
 
+struct fuse_arg;
+struct fuse_args;
+
+struct fuse_copy_state {
+	int write;
+	struct fuse_req *req;
+	struct iov_iter *iter;
+	struct pipe_buffer *pipebufs;
+	struct pipe_buffer *currbuf;
+	struct pipe_inode_info *pipe;
+	unsigned long nr_segs;
+	struct page *pg;
+	unsigned int len;
+	unsigned int offset;
+	unsigned int move_pages:1;
+};
+
 static inline struct fuse_dev *fuse_get_dev(struct file *file)
 {
 	/*
@@ -24,6 +41,14 @@ static inline struct fuse_dev *fuse_get_dev(struct file *file)
 
 void fuse_dev_end_requests(struct list_head *head);
 
+void fuse_copy_init(struct fuse_copy_state *cs, int write,
+			   struct iov_iter *iter);
+int fuse_copy_args(struct fuse_copy_state *cs, unsigned int numargs,
+		   unsigned int argpages, struct fuse_arg *args,
+		   int zeroing);
+int fuse_copy_out_args(struct fuse_copy_state *cs, struct fuse_args *args,
+		       unsigned int nbytes);
+
 #endif
 
 

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH RFC v5 08/16] fuse: Add fuse-io-uring handling into fuse_copy
  2024-11-07 17:03 [PATCH RFC v5 00/16] fuse: fuse-over-io-uring Bernd Schubert
                   ` (6 preceding siblings ...)
  2024-11-07 17:03 ` [PATCH RFC v5 07/16] fuse: Make fuse_copy non static Bernd Schubert
@ 2024-11-07 17:03 ` Bernd Schubert
  2024-11-07 17:03 ` [PATCH RFC v5 09/16] fuse: {uring} Add uring sqe commit and fetch support Bernd Schubert
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 29+ messages in thread
From: Bernd Schubert @ 2024-11-07 17:03 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
	Bernd Schubert

Add special fuse-io-uring handling to the fuse argument
copy handler.

Signed-off-by: Bernd Schubert <[email protected]>
---
 fs/fuse/dev.c        | 12 +++++++++++-
 fs/fuse/fuse_dev_i.h |  5 +++++
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index f210f91a937b24e75a467e943cdec4581900e061..4ca67c8ae0e28072383478d6ee7ad7791566b6ce 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -738,6 +738,9 @@ static int fuse_copy_do(struct fuse_copy_state *cs, void **val, unsigned *size)
 	*size -= ncpy;
 	cs->len -= ncpy;
 	cs->offset += ncpy;
+	if (cs->is_uring)
+		cs->ring.offset += ncpy;
+
 	return ncpy;
 }
 
@@ -1872,7 +1875,14 @@ static struct fuse_req *request_find(struct fuse_pqueue *fpq, u64 unique)
 int fuse_copy_out_args(struct fuse_copy_state *cs, struct fuse_args *args,
 		       unsigned nbytes)
 {
-	unsigned reqsize = sizeof(struct fuse_out_header);
+
+	unsigned int reqsize = 0;
+
+	/*
+	 * Uring has all headers separated from args - args is payload only
+	 */
+	if (!cs->is_uring)
+		reqsize = sizeof(struct fuse_out_header);
 
 	reqsize += fuse_len_args(args->out_numargs, args->out_args);
 
diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
index f36e304cd62c8302aed95de89926fc894f602cfd..7ecb103af6f0feca99eb8940872c6a5ccf2e5186 100644
--- a/fs/fuse/fuse_dev_i.h
+++ b/fs/fuse/fuse_dev_i.h
@@ -28,6 +28,11 @@ struct fuse_copy_state {
 	unsigned int len;
 	unsigned int offset;
 	unsigned int move_pages:1;
+	unsigned int is_uring:1;
+	struct {
+		/* overall offset within the user buffer */
+		unsigned int offset;
+	} ring;
 };
 
 static inline struct fuse_dev *fuse_get_dev(struct file *file)

-- 
2.43.0
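
The size accounting changed in fuse_copy_out_args() can be illustrated with
a small userspace model. It is a sketch only: model_expected_reply_size()
and struct model_out_header are hypothetical names, the latter assumed to
have the same three fields as struct fuse_out_header in the uapi header.
The rule itself (the header is counted for /dev/fuse replies, but not for
uring replies, where args are payload only) is taken from the diff above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to match the layout of struct fuse_out_header (uapi fuse.h). */
struct model_out_header {
	uint32_t len;
	int32_t error;
	uint64_t unique;
};

/*
 * Number of bytes a reply is expected to carry for the given output
 * arguments. With uring the headers live in a separate buffer, so only
 * the argument sizes are counted.
 */
static size_t model_expected_reply_size(int is_uring,
					const size_t *out_arg_sizes,
					size_t numargs)
{
	size_t reqsize = 0;
	size_t i;

	assert(out_arg_sizes != NULL || numargs == 0);

	if (!is_uring)
		reqsize = sizeof(struct model_out_header);

	for (i = 0; i < numargs; i++)
		reqsize += out_arg_sizes[i];

	return reqsize;
}
```

For the same output arguments, the two transports therefore differ by
exactly the size of the out header.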


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH RFC v5 09/16] fuse: {uring} Add uring sqe commit and fetch support
  2024-11-07 17:03 [PATCH RFC v5 00/16] fuse: fuse-over-io-uring Bernd Schubert
                   ` (7 preceding siblings ...)
  2024-11-07 17:03 ` [PATCH RFC v5 08/16] fuse: Add fuse-io-uring handling into fuse_copy Bernd Schubert
@ 2024-11-07 17:03 ` Bernd Schubert
  2024-11-07 17:03 ` [PATCH RFC v5 10/16] fuse: {uring} Handle teardown of ring entries Bernd Schubert
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 29+ messages in thread
From: Bernd Schubert @ 2024-11-07 17:03 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
	Bernd Schubert

This adds support for fuse request completion through ring SQEs
(FUSE_URING_REQ_COMMIT_AND_FETCH handling). After committing
the ring entry it becomes available for new fuse requests.
Handling of requests through the ring (SQE/CQE handling)
is complete now.

Fuse request data are copied through the mmapped ring buffer;
there is no support for zero copy yet.

Signed-off-by: Bernd Schubert <[email protected]>
---
 fs/fuse/dev.c         |   6 +-
 fs/fuse/dev_uring.c   | 449 ++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/fuse/dev_uring_i.h |  11 ++
 fs/fuse/fuse_dev_i.h  |   7 +-
 fs/fuse/fuse_i.h      |   9 +
 fs/fuse/inode.c       |   2 +-
 6 files changed, 479 insertions(+), 5 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 4ca67c8ae0e28072383478d6ee7ad7791566b6ce..b085176ea824bd612a8736e00c9b6f8f9e232208 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -188,7 +188,7 @@ u64 fuse_get_unique(struct fuse_iqueue *fiq)
 }
 EXPORT_SYMBOL_GPL(fuse_get_unique);
 
-static unsigned int fuse_req_hash(u64 unique)
+unsigned int fuse_req_hash(u64 unique)
 {
 	return hash_long(unique & ~FUSE_INT_REQ_BIT, FUSE_PQ_HASH_BITS);
 }
@@ -1860,7 +1860,7 @@ static int fuse_notify(struct fuse_conn *fc, enum fuse_notify_code code,
 }
 
 /* Look up request on processing list by unique ID */
-static struct fuse_req *request_find(struct fuse_pqueue *fpq, u64 unique)
+struct fuse_req *fuse_request_find(struct fuse_pqueue *fpq, u64 unique)
 {
 	unsigned int hash = fuse_req_hash(unique);
 	struct fuse_req *req;
@@ -1944,7 +1944,7 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud,
 	spin_lock(&fpq->lock);
 	req = NULL;
 	if (fpq->connected)
-		req = request_find(fpq, oh.unique & ~FUSE_INT_REQ_BIT);
+		req = fuse_request_find(fpq, oh.unique & ~FUSE_INT_REQ_BIT);
 
 	err = -ENOENT;
 	if (!req) {
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index ce0a41b00613133ea1b8062290bc960b95254ac9..4f8a0bd1e2192bfbc310eb53dd8e89274e6f479b 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -19,6 +19,24 @@ MODULE_PARM_DESC(enable_uring,
 		 "Enable uring userspace communication through uring.");
 #endif
 
+#define FUSE_URING_IOV_SEGS 2 /* header and payload */
+
+/*
+ * Finalize a fuse request, then fetch and send the next entry, if available
+ */
+static void fuse_uring_req_end(struct fuse_ring_ent *ring_ent, bool set_err,
+			       int error)
+{
+	struct fuse_req *req = ring_ent->fuse_req;
+
+	if (set_err)
+		req->out.h.error = error;
+
+	clear_bit(FR_SENT, &req->flags);
+	fuse_request_end(ring_ent->fuse_req);
+	ring_ent->fuse_req = NULL;
+}
+
 static int fuse_ring_ent_unset_userspace(struct fuse_ring_ent *ent)
 {
 	struct fuse_ring_queue *queue = ent->queue;
@@ -50,7 +68,9 @@ void fuse_uring_destruct(struct fuse_conn *fc)
 
 		WARN_ON(!list_empty(&queue->ent_avail_queue));
 		WARN_ON(!list_empty(&queue->ent_intermediate_queue));
+		WARN_ON(!list_empty(&queue->ent_in_userspace));
 
+		kfree(queue->fpq.processing);
 		kfree(queue);
 		ring->queues[qid] = NULL;
 	}
@@ -109,13 +129,21 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
 {
 	struct fuse_conn *fc = ring->fc;
 	struct fuse_ring_queue *queue;
+	struct list_head *pq;
 
 	queue = kzalloc(sizeof(*queue), GFP_KERNEL_ACCOUNT);
 	if (!queue)
 		return ERR_PTR(-ENOMEM);
+	pq = kcalloc(FUSE_PQ_HASH_SIZE, sizeof(struct list_head), GFP_KERNEL);
+	if (!pq) {
+		kfree(queue);
+		return ERR_PTR(-ENOMEM);
+	}
+
 	spin_lock(&fc->lock);
 	if (ring->queues[qid]) {
 		spin_unlock(&fc->lock);
+		kfree(pq);
 		kfree(queue);
 		return ring->queues[qid];
 	}
@@ -127,12 +155,244 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
 
 	INIT_LIST_HEAD(&queue->ent_avail_queue);
 	INIT_LIST_HEAD(&queue->ent_intermediate_queue);
+	INIT_LIST_HEAD(&queue->ent_in_userspace);
+	INIT_LIST_HEAD(&queue->fuse_req_queue);
+
+	queue->fpq.processing = pq;
+	fuse_pqueue_init(&queue->fpq);
 
 	spin_unlock(&fc->lock);
 
 	return queue;
 }
 
+static void
+fuse_uring_async_send_to_ring(struct io_uring_cmd *cmd,
+			      unsigned int issue_flags)
+{
+	io_uring_cmd_done(cmd, 0, 0, issue_flags);
+}
+
+/*
+ * Checks for errors and stores them in the request
+ */
+static int fuse_uring_out_header_has_err(struct fuse_out_header *oh,
+					 struct fuse_req *req,
+					 struct fuse_conn *fc)
+{
+	int err;
+
+	if (oh->unique == 0) {
+		/* Not supported through request-based uring; this needs
+		 * another ring from user space to kernel
+		 */
+		pr_warn("Unsupported fuse-notify\n");
+		err = -EINVAL;
+		goto seterr;
+	}
+
+	if (oh->error <= -512 || oh->error > 0) {
+		err = -EINVAL;
+		goto seterr;
+	}
+
+	if (oh->error) {
+		err = oh->error;
+		pr_devel("%s:%d err=%d op=%d req-ret=%d", __func__, __LINE__,
+			 err, req->args->opcode, req->out.h.error);
+		goto err; /* error already set */
+	}
+
+	if ((oh->unique & ~FUSE_INT_REQ_BIT) != req->in.h.unique) {
+		pr_warn("Unexpected seqno mismatch, expected: %llu got %llu\n",
+			req->in.h.unique, oh->unique & ~FUSE_INT_REQ_BIT);
+		err = -ENOENT;
+		goto seterr;
+	}
+
+	/* Is it an interrupt reply ID?	 */
+	if (oh->unique & FUSE_INT_REQ_BIT) {
+		err = 0;
+		if (oh->error == -ENOSYS)
+			fc->no_interrupt = 1;
+		else if (oh->error == -EAGAIN) {
+			/* XXX Interrupts not handled yet */
+			/* err = queue_interrupt(req); */
+			pr_warn("Interrupt EAGAIN not supported yet");
+			err = -EINVAL;
+		}
+
+		goto seterr;
+	}
+
+	return 0;
+
+seterr:
+	pr_devel("%s:%d err=%d op=%d req-ret=%d", __func__, __LINE__, err,
+		 req->args->opcode, req->out.h.error);
+	oh->error = err;
+err:
+	return err;
+}
+
+static int fuse_uring_copy_from_ring(struct fuse_ring *ring,
+				     struct fuse_req *req,
+				     struct fuse_ring_ent *ent)
+{
+	struct fuse_copy_state cs;
+	struct fuse_args *args = req->args;
+	struct iov_iter iter;
+	int err;
+	int res_arg_len;
+
+	err = copy_from_user(&res_arg_len, &ent->headers->in_out_arg_len,
+			     sizeof(res_arg_len));
+	if (err)
+		return err;
+
+	err = import_ubuf(ITER_SOURCE, ent->payload, ent->max_arg_len, &iter);
+	if (err)
+		return err;
+
+	fuse_copy_init(&cs, 0, &iter);
+	cs.is_uring = 1;
+	cs.req = req;
+
+	return fuse_copy_out_args(&cs, args, res_arg_len);
+}
+
+/*
+ * Copy data from the req to the ring buffer
+ */
+static int fuse_uring_copy_to_ring(struct fuse_ring *ring, struct fuse_req *req,
+				   struct fuse_ring_ent *ent)
+{
+	struct fuse_copy_state cs;
+	struct fuse_args *args = req->args;
+	struct fuse_in_arg *in_args = args->in_args;
+	int num_args = args->in_numargs;
+	int err, res;
+	struct iov_iter iter;
+
+	if (num_args == 0)
+		return 0;
+
+	err = import_ubuf(ITER_DEST, ent->payload, ent->max_arg_len, &iter);
+	if (err) {
+		pr_info_ratelimited("Import user buffer failed\n");
+		return err;
+	}
+
+	fuse_copy_init(&cs, 1, &iter);
+	cs.is_uring = 1;
+	cs.req = req;
+
+	/*
+	 * Expectation is that the first argument is the header, for some
+	 * operations it might be zero.
+	 */
+	if (args->in_args[0].size > 0) {
+		res = copy_to_user(&ent->headers->op_in, in_args->value,
+				   in_args->size);
+		err = res > 0 ? -EFAULT : res;
+		if (err) {
+			pr_info_ratelimited("Copying the header failed.\n");
+			return err;
+		}
+	}
+
+	/* Skip the already handled header */
+	in_args++;
+	num_args--;
+
+	err = fuse_copy_args(&cs, num_args, args->in_pages,
+			     (struct fuse_arg *)in_args, 0);
+	if (err) {
+		pr_info_ratelimited("%s fuse_copy_args failed\n", __func__);
+		return err;
+	}
+
+	BUILD_BUG_ON((sizeof(ent->headers->in_out_arg_len) !=
+		      sizeof(cs.ring.offset)));
+	res = copy_to_user(&ent->headers->in_out_arg_len, &cs.ring.offset,
+			   sizeof(ent->headers->in_out_arg_len));
+	err = res > 0 ? -EFAULT : res;
+	if (err)
+		return err;
+
+	return 0;
+}
+
+static int
+fuse_uring_prepare_send(struct fuse_ring_ent *ring_ent)
+{
+	struct fuse_ring_queue *queue = ring_ent->queue;
+	struct fuse_ring *ring = queue->ring;
+	struct fuse_req *req = ring_ent->fuse_req;
+	int err = 0, res;
+
+	if (WARN_ON(ring_ent->state != FRRS_FUSE_REQ)) {
+		pr_err("qid=%d ring-req=%p invalid state %d on send\n",
+		       queue->qid, ring_ent, ring_ent->state);
+		err = -EIO;
+	}
+
+	if (err)
+		return err;
+
+	pr_devel("%s qid=%d state=%d cmd-done op=%d unique=%llu\n", __func__,
+		 queue->qid, ring_ent->state, req->in.h.opcode,
+		 req->in.h.unique);
+
+	/* copy the request */
+	err = fuse_uring_copy_to_ring(ring, req, ring_ent);
+	if (unlikely(err)) {
+		pr_info("Copy to ring failed: %d\n", err);
+		goto err;
+	}
+
+	/* copy fuse_in_header */
+	res = copy_to_user(&ring_ent->headers->in, &req->in.h,
+			   sizeof(req->in.h));
+	err = res > 0 ? -EFAULT : res;
+	if (err)
+		goto err;
+
+	set_bit(FR_SENT, &req->flags);
+	return 0;
+
+err:
+	fuse_uring_req_end(ring_ent, true, err);
+	return err;
+}
+
+/*
+ * Write data to the ring buffer and send the request to userspace,
+ * userspace will read it
+ * This is comparable with classical read(/dev/fuse)
+ */
+static int fuse_uring_send_next_to_ring(struct fuse_ring_ent *ring_ent)
+{
+	int err = 0;
+	struct fuse_ring_queue *queue = ring_ent->queue;
+
+	err = fuse_uring_prepare_send(ring_ent);
+	if (err)
+		goto err;
+
+	spin_lock(&queue->lock);
+	ring_ent->state = FRRS_USERSPACE;
+	list_move(&ring_ent->list, &queue->ent_in_userspace);
+	spin_unlock(&queue->lock);
+
+	io_uring_cmd_complete_in_task(ring_ent->cmd,
+				      fuse_uring_async_send_to_ring);
+	return 0;
+
+err:
+	return err;
+}
+
 /*
  * Put a ring request onto hold, it is no longer used for now.
  */
@@ -159,6 +419,192 @@ static void fuse_uring_ent_avail(struct fuse_ring_ent *ring_ent,
 	ring_ent->state = FRRS_WAIT;
 }
 
+/* Used to find the request on SQE commit */
+static void fuse_uring_add_to_pq(struct fuse_ring_ent *ring_ent)
+{
+	struct fuse_ring_queue *queue = ring_ent->queue;
+	struct fuse_req *req = ring_ent->fuse_req;
+	struct fuse_pqueue *fpq = &queue->fpq;
+	unsigned int hash;
+
+	hash = fuse_req_hash(req->in.h.unique);
+	list_move_tail(&req->list, &fpq->processing[hash]);
+	req->ring_entry = ring_ent;
+}
+
+/*
+ * Assign a fuse request to the given ring entry
+ */
+static void fuse_uring_add_req_to_ring_ent(struct fuse_ring_ent *ring_ent,
+					   struct fuse_req *req)
+{
+	lockdep_assert_held(&ring_ent->queue->lock);
+
+	if (WARN_ON_ONCE(ring_ent->state != FRRS_WAIT &&
+			 ring_ent->state != FRRS_COMMIT)) {
+		pr_warn("%s qid=%d state=%d\n", __func__, ring_ent->queue->qid,
+			ring_ent->state);
+	}
+	list_del_init(&req->list);
+	clear_bit(FR_PENDING, &req->flags);
+	ring_ent->fuse_req = req;
+	ring_ent->state = FRRS_FUSE_REQ;
+
+	fuse_uring_add_to_pq(ring_ent);
+}
+
+/*
+ * Release the ring entry and fetch the next fuse request if available
+ *
+ * @return true if a new request has been fetched
+ */
+static bool fuse_uring_ent_assign_req(struct fuse_ring_ent *ring_ent)
+	__must_hold(&queue->lock)
+{
+	struct fuse_req *req = NULL;
+	struct fuse_ring_queue *queue = ring_ent->queue;
+	struct list_head *req_queue = &queue->fuse_req_queue;
+
+	lockdep_assert_held(&queue->lock);
+
+	/* get and assign the next entry while it is still holding the lock */
+	if (!list_empty(req_queue)) {
+		req = list_first_entry(req_queue, struct fuse_req, list);
+		fuse_uring_add_req_to_ring_ent(ring_ent, req);
+		list_move(&ring_ent->list, &queue->ent_intermediate_queue);
+	}
+
+	return req ? true : false;
+}
+
+/*
+ * Read data from the ring buffer, which user space has written to
+ * This is comparable with handling of classical write(/dev/fuse).
+ * Also make the ring request available again for new fuse requests.
+ */
+static void fuse_uring_commit(struct fuse_ring_ent *ring_ent,
+			      unsigned int issue_flags)
+{
+	struct fuse_ring *ring = ring_ent->queue->ring;
+	struct fuse_conn *fc = ring->fc;
+	struct fuse_req *req = ring_ent->fuse_req;
+	ssize_t err = 0;
+	bool set_err = false;
+
+	if (copy_from_user(&req->out.h, &ring_ent->headers->out,
+			   sizeof(req->out.h))) {
+		err = -EFAULT;
+		req->out.h.error = err;
+		goto out;
+	}
+
+	err = fuse_uring_out_header_has_err(&req->out.h, req, fc);
+	if (err) {
+		/* req->out.h.error already set */
+		pr_devel("%s:%d err=%zd oh->err=%d\n", __func__, __LINE__, err,
+			 req->out.h.error);
+		goto out;
+	}
+
+	err = fuse_uring_copy_from_ring(ring, req, ring_ent);
+	if (err)
+		set_err = true;
+
+out:
+	pr_devel("%s:%d ret=%zd op=%d req-ret=%d\n", __func__, __LINE__, err,
+		 req->args->opcode, req->out.h.error);
+	fuse_uring_req_end(ring_ent, set_err, err);
+}
+
+/*
+ * Get the next fuse req and send it
+ */
+static void fuse_uring_next_fuse_req(struct fuse_ring_ent *ring_ent,
+				    struct fuse_ring_queue *queue)
+{
+	int has_next, err;
+	int prev_state = ring_ent->state;
+
+	do {
+		spin_lock(&queue->lock);
+		has_next = fuse_uring_ent_assign_req(ring_ent);
+		if (!has_next) {
+			fuse_uring_ent_avail(ring_ent, queue);
+			spin_unlock(&queue->lock);
+			break; /* no request left */
+		}
+		spin_unlock(&queue->lock);
+
+		err = fuse_uring_send_next_to_ring(ring_ent);
+		if (err)
+			ring_ent->state = prev_state;
+	} while (err);
+}
+
+/* FUSE_URING_REQ_COMMIT_AND_FETCH handler */
+static int fuse_uring_commit_fetch(struct io_uring_cmd *cmd, int issue_flags,
+				   struct fuse_conn *fc)
+{
+	const struct fuse_uring_cmd_req *cmd_req = io_uring_sqe_cmd(cmd->sqe);
+	struct fuse_ring_ent *ring_ent;
+	int err;
+	struct fuse_ring *ring = fc->ring;
+	struct fuse_ring_queue *queue;
+	uint64_t commit_id = cmd_req->commit_id;
+	struct fuse_pqueue fpq;
+	struct fuse_req *req;
+
+	err = -ENOTCONN;
+	if (!ring)
+		return err;
+
+	queue = ring->queues[cmd_req->qid];
+	if (!queue)
+		return err;
+	fpq = queue->fpq;
+
+	spin_lock(&queue->lock);
+	/* Find a request based on the unique ID of the fuse request
+	 * This should get revised, as it needs a hash calculation and list
+	 * search. And full struct fuse_pqueue is needed (memory overhead).
+	 * As well as the link from req to ring_ent.
+	 */
+	req = fuse_request_find(&fpq, commit_id);
+	err = -ENOENT;
+	if (!req) {
+		pr_info("qid=%d commit_id %llu not found\n", queue->qid,
+			commit_id);
+		spin_unlock(&queue->lock);
+		return err;
+	}
+	list_del_init(&req->list);
+	ring_ent = req->ring_entry;
+	req->ring_entry = NULL;
+
+	err = fuse_ring_ent_unset_userspace(ring_ent);
+	if (err != 0) {
+		pr_info_ratelimited("qid=%d commit_id %llu state %d",
+				    queue->qid, commit_id, ring_ent->state);
+		spin_unlock(&queue->lock);
+		return err;
+	}
+
+	ring_ent->cmd = cmd;
+	spin_unlock(&queue->lock);
+
+	/* without the queue lock, as other locks are taken */
+	fuse_uring_commit(ring_ent, issue_flags);
+
+	/*
+	 * Fetching the next request is absolutely required as queued
+	 * fuse requests would otherwise not get processed - committing
+	 * and fetching is done in one step vs legacy fuse, which has separated
+	 * read (fetch request) and write (commit result).
+	 */
+	fuse_uring_next_fuse_req(ring_ent, queue);
+	return 0;
+}
+
 /*
  * fuse_uring_req_fetch command handling
  */
@@ -333,6 +779,9 @@ int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
 		if (err)
 			pr_info_once("fuse_uring_fetch failed err=%d\n", err);
 		break;
+	case FUSE_URING_REQ_COMMIT_AND_FETCH:
+		err = fuse_uring_commit_fetch(cmd, issue_flags, fc);
+		break;
 	default:
 		err = -EINVAL;
 		pr_devel("Unknown uring command %d", cmd_op);
diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
index 11798080896372c72692228ff7072bbee6a63e53..c7bac19e91b781fc9ccce540e39d99b39b751f6b 100644
--- a/fs/fuse/dev_uring_i.h
+++ b/fs/fuse/dev_uring_i.h
@@ -19,6 +19,9 @@ enum fuse_ring_req_state {
 	/* The ring request waits for a new fuse request */
 	FRRS_WAIT,
 
+	/* The ring req got assigned a fuse req */
+	FRRS_FUSE_REQ,
+
 	/* request is in or on the way to user space */
 	FRRS_USERSPACE,
 };
@@ -72,6 +75,14 @@ struct fuse_ring_queue {
 	 * to be send to userspace
 	 */
 	struct list_head ent_intermediate_queue;
+
+	/* entries in userspace */
+	struct list_head ent_in_userspace;
+
+	/* fuse requests waiting for an entry slot */
+	struct list_head fuse_req_queue;
+
+	struct fuse_pqueue fpq;
 };
 
 /**
diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
index 7ecb103af6f0feca99eb8940872c6a5ccf2e5186..a8d578b99a14239c05b4a496a4b3b1396eb768dd 100644
--- a/fs/fuse/fuse_dev_i.h
+++ b/fs/fuse/fuse_dev_i.h
@@ -7,7 +7,7 @@
 #define _FS_FUSE_DEV_I_H
 
 #include <linux/types.h>
-
+#include <linux/fs.h>
 
 /* Ordinary requests have even IDs, while interrupts IDs are odd */
 #define FUSE_INT_REQ_BIT (1ULL << 0)
@@ -15,6 +15,8 @@
 
 struct fuse_arg;
 struct fuse_args;
+struct fuse_pqueue;
+struct fuse_req;
 
 struct fuse_copy_state {
 	int write;
@@ -44,6 +46,9 @@ static inline struct fuse_dev *fuse_get_dev(struct file *file)
 	return READ_ONCE(file->private_data);
 }
 
+unsigned int fuse_req_hash(u64 unique);
+struct fuse_req *fuse_request_find(struct fuse_pqueue *fpq, u64 unique);
+
 void fuse_dev_end_requests(struct list_head *head);
 
 void fuse_copy_init(struct fuse_copy_state *cs, int write,
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 91c2e7e35cdbd470894a8a9cd026b77368b7a4b6..8bb6bd1854e41afb52a0d0081fa5fc6bfdfa58d8 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -435,6 +435,10 @@ struct fuse_req {
 
 	/** fuse_mount this request belongs to */
 	struct fuse_mount *fm;
+
+#ifdef CONFIG_FUSE_IO_URING
+	void *ring_entry;
+#endif
 };
 
 struct fuse_iqueue;
@@ -1207,6 +1211,11 @@ void fuse_change_entry_timeout(struct dentry *entry, struct fuse_entry_out *o);
  */
 struct fuse_conn *fuse_conn_get(struct fuse_conn *fc);
 
+/**
+ * Initialize the fuse processing queue
+ */
+void fuse_pqueue_init(struct fuse_pqueue *fpq);
+
 /**
  * Initialize fuse_conn
  */
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 59f8fb7b915f052f892d587a0f9a8dc17cf750ce..a1179c1e212b7a1cfd6e69f20dd5fcbe18c6202b 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -894,7 +894,7 @@ static void fuse_iqueue_init(struct fuse_iqueue *fiq,
 	fiq->priv = priv;
 }
 
-static void fuse_pqueue_init(struct fuse_pqueue *fpq)
+void fuse_pqueue_init(struct fuse_pqueue *fpq)
 {
 	unsigned int i;
 

-- 
2.43.0



* [PATCH RFC v5 10/16] fuse: {uring} Handle teardown of ring entries
  2024-11-07 17:03 [PATCH RFC v5 00/16] fuse: fuse-over-io-uring Bernd Schubert
                   ` (8 preceding siblings ...)
  2024-11-07 17:03 ` [PATCH RFC v5 09/16] fuse: {uring} Add uring sqe commit and fetch support Bernd Schubert
@ 2024-11-07 17:03 ` Bernd Schubert
  2024-11-07 17:03 ` [PATCH RFC v5 11/16] fuse: {uring} Add a ring queue and send method Bernd Schubert
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 29+ messages in thread
From: Bernd Schubert @ 2024-11-07 17:03 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
	Bernd Schubert

On teardown, struct file_operations::uring_cmd requests
need to be completed by calling io_uring_cmd_done().
Leaving ring entries uncompleted would result in busy io-uring
tasks that print warning messages at intervals and in an
unreleased struct file.

Additionally the fuse connection and with that the ring can
only get released when all io-uring commands are completed.

Completion is done with ring entries that are
a) in waiting state for new fuse requests - io_uring_cmd_done
is needed

b) already in userspace - io_uring_cmd_done through teardown
is not needed, the request can just get released. If the fuse
server is still active and commits such a ring entry,
fuse_uring_cmd() already checks whether the connection is active
and otherwise completes the io-uring command itself with
-ENOTCONN, i.e. special handling is not needed.

This scheme is basically represented by the ring entry states
FRRS_WAIT and FRRS_USERSPACE.

Entries in state:
- FRRS_INIT: No action needed, these do not contribute to
  ring->queue_refs yet
- All other states: Entries are currently processed by other
  tasks, so async teardown is needed and has to wait until they
  reach one of the two states above. This could also be solved
  without an async teardown task, but that would require
  additional if conditions in hot code paths. Also, in my
  personal opinion the code looks cleaner with async teardown.
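
To make the per-state handling above concrete, here is a minimal
userspace sketch. It is not part of the patch: the names ent_state,
teardown_action, teardown_action_for() and teardown_pass() are made up
for illustration, and the code only models which entries drop a
reference from ring->queue_refs during one teardown pass.

```c
#include <assert.h>

/* Hypothetical stand-ins for the FRRS_* ring entry states */
enum ent_state { ST_INIT, ST_WAIT, ST_FUSE_REQ, ST_USERSPACE };

/* Hypothetical names for what teardown does with an entry */
enum teardown_action {
	TD_NONE,		/* init state: holds no queue reference yet */
	TD_CMD_DONE_AND_FREE,	/* waiting: io_uring_cmd_done() is needed */
	TD_FREE_ONLY,		/* in userspace: just release the entry */
	TD_RETRY_LATER,		/* in flight: left to the async teardown */
};

static enum teardown_action teardown_action_for(enum ent_state state)
{
	switch (state) {
	case ST_INIT:
		return TD_NONE;
	case ST_WAIT:
		return TD_CMD_DONE_AND_FREE;
	case ST_USERSPACE:
		return TD_FREE_ONLY;
	default:
		return TD_RETRY_LATER;
	}
}

/*
 * Model of one teardown pass: every entry that can be released drops
 * one reference. Returns the references still held afterwards; a
 * value > 0 means another (delayed) pass has to be scheduled.
 */
static int teardown_pass(const enum ent_state *ents, int nr_ents,
			 int queue_refs)
{
	int i;

	for (i = 0; i < nr_ents; i++) {
		enum teardown_action act = teardown_action_for(ents[i]);

		if (act == TD_CMD_DONE_AND_FREE || act == TD_FREE_ONLY)
			queue_refs--;
	}

	assert(queue_refs >= 0);
	return queue_refs;
}
```

With one waiting entry, one entry in userspace and one entry that
just got a fuse request assigned, a pass releases two references and
leaves one, which is the case the delayed work item retries.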

Signed-off-by: Bernd Schubert <[email protected]>
---
 fs/fuse/dev.c         |   8 ++
 fs/fuse/dev_uring.c   | 211 ++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/fuse/dev_uring_i.h |  51 ++++++++++++
 3 files changed, 270 insertions(+)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index b085176ea824bd612a8736e00c9b6f8f9e232208..d0321619c3bdcb2ee592b9f83dbee192a3ff734a 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -2241,6 +2241,12 @@ void fuse_abort_conn(struct fuse_conn *fc)
 		spin_unlock(&fc->lock);
 
 		fuse_dev_end_requests(&to_end);
+
+		/*
+		 * fc->lock must not be taken to avoid conflicts with io-uring
+		 * locks
+		 */
+		fuse_uring_abort(fc);
 	} else {
 		spin_unlock(&fc->lock);
 	}
@@ -2252,6 +2258,8 @@ void fuse_wait_aborted(struct fuse_conn *fc)
 	/* matches implicit memory barrier in fuse_drop_waiting() */
 	smp_mb();
 	wait_event(fc->blocked_waitq, atomic_read(&fc->num_waiting) == 0);
+
+	fuse_uring_wait_stopped_queues(fc);
 }
 
 int fuse_dev_release(struct inode *inode, struct file *file)
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index 4f8a0bd1e2192bfbc310eb53dd8e89274e6f479b..2f5665518d3f66bf2ae20c0274e277ee94adc491 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -52,6 +52,37 @@ static int fuse_ring_ent_unset_userspace(struct fuse_ring_ent *ent)
 	return 0;
 }
 
+/* Abort all requests queued on the list of the given ring queue */
+static void fuse_uring_abort_end_queue_requests(struct fuse_ring_queue *queue)
+{
+	struct fuse_req *req;
+	LIST_HEAD(req_list);
+
+	spin_lock(&queue->lock);
+	list_for_each_entry(req, &queue->fuse_req_queue, list)
+		clear_bit(FR_PENDING, &req->flags);
+	list_splice_init(&queue->fuse_req_queue, &req_list);
+	spin_unlock(&queue->lock);
+
+	/* must not hold queue lock to avoid order issues with fi->lock */
+	fuse_dev_end_requests(&req_list);
+}
+
+void fuse_uring_abort_end_requests(struct fuse_ring *ring)
+{
+	int qid;
+
+	for (qid = 0; qid < ring->nr_queues; qid++) {
+		struct fuse_ring_queue *queue = ring->queues[qid];
+
+		if (!queue)
+			continue;
+
+		queue->stopped = true;
+		fuse_uring_abort_end_queue_requests(queue);
+	}
+}
+
 void fuse_uring_destruct(struct fuse_conn *fc)
 {
 	struct fuse_ring *ring = fc->ring;
@@ -110,9 +141,12 @@ static struct fuse_ring *fuse_uring_create(struct fuse_conn *fc)
 		goto out_err;
 	}
 
+	init_waitqueue_head(&ring->stop_waitq);
+
 	fc->ring = ring;
 	ring->nr_queues = nr_queues;
 	ring->fc = fc;
+	atomic_set(&ring->queue_refs, 0);
 
 	spin_unlock(&fc->lock);
 	return ring;
@@ -173,6 +207,177 @@ fuse_uring_async_send_to_ring(struct io_uring_cmd *cmd,
 	io_uring_cmd_done(cmd, 0, 0, issue_flags);
 }
 
+static void fuse_uring_stop_fuse_req_end(struct fuse_ring_ent *ent)
+{
+	struct fuse_req *req = ent->fuse_req;
+
+	/* remove entry from fuse_pqueue->processing */
+	list_del_init(&req->list);
+	ent->fuse_req = NULL;
+	clear_bit(FR_SENT, &req->flags);
+	req->out.h.error = -ECONNABORTED;
+	fuse_request_end(req);
+}
+
+/*
+ * Release a request/entry on connection tear down
+ */
+static void fuse_uring_entry_teardown(struct fuse_ring_ent *ent,
+					 bool need_cmd_done)
+{
+	struct fuse_ring_queue *queue = ent->queue;
+
+	/*
+	 * fuse_request_end() might take other locks like fi->lock and
+	 * can lead to lock ordering issues
+	 */
+	lockdep_assert_not_held(&ent->queue->lock);
+
+	if (need_cmd_done) {
+		pr_devel("qid=%d sending cmd_done\n", queue->qid);
+
+		io_uring_cmd_done(ent->cmd, -ENOTCONN, 0,
+				  IO_URING_F_UNLOCKED);
+	}
+
+	if (ent->fuse_req)
+		fuse_uring_stop_fuse_req_end(ent);
+
+	list_del_init(&ent->list);
+	kfree(ent);
+}
+
+static void fuse_uring_stop_list_entries(struct list_head *head,
+					 struct fuse_ring_queue *queue,
+					 enum fuse_ring_req_state exp_state)
+{
+	struct fuse_ring *ring = queue->ring;
+	struct fuse_ring_ent *ent, *next;
+	ssize_t queue_refs = SSIZE_MAX;
+	LIST_HEAD(to_teardown);
+
+	spin_lock(&queue->lock);
+	list_for_each_entry_safe(ent, next, head, list) {
+		if (ent->state != exp_state) {
+			pr_warn("entry teardown qid=%d state=%d expected=%d",
+				queue->qid, ent->state, exp_state);
+			continue;
+		}
+
+		list_move(&ent->list, &to_teardown);
+	}
+	spin_unlock(&queue->lock);
+
+	/* no queue lock to avoid lock order issues */
+	list_for_each_entry_safe(ent, next, &to_teardown, list) {
+		bool need_cmd_done = ent->state != FRRS_USERSPACE;
+
+		fuse_uring_entry_teardown(ent, need_cmd_done);
+		queue_refs = atomic_dec_return(&ring->queue_refs);
+
+		if (WARN_ON_ONCE(queue_refs < 0))
+			pr_warn("qid=%d queue_refs=%zd", queue->qid,
+				queue_refs);
+	}
+}
+
+static void fuse_uring_teardown_entries(struct fuse_ring_queue *queue)
+{
+	fuse_uring_stop_list_entries(&queue->ent_in_userspace, queue,
+				     FRRS_USERSPACE);
+	fuse_uring_stop_list_entries(&queue->ent_avail_queue, queue, FRRS_WAIT);
+}
+
+/*
+ * Log state debug info
+ */
+static void fuse_uring_log_ent_state(struct fuse_ring *ring)
+{
+	int qid;
+	struct fuse_ring_ent *ent;
+
+	for (qid = 0; qid < ring->nr_queues; qid++) {
+		struct fuse_ring_queue *queue = ring->queues[qid];
+
+		if (!queue)
+			continue;
+
+		spin_lock(&queue->lock);
+		/*
+		 * Log entries from the intermediate queue, the other queues
+		 * should be empty
+		 */
+		list_for_each_entry(ent, &queue->ent_intermediate_queue, list) {
+			pr_info("ring=%p qid=%d ent=%p state=%d\n", ring, qid,
+				ent, ent->state);
+		}
+		spin_unlock(&queue->lock);
+	}
+	ring->stop_debug_log = 1;
+}
+
+static void fuse_uring_async_stop_queues(struct work_struct *work)
+{
+	int qid;
+	struct fuse_ring *ring =
+		container_of(work, struct fuse_ring, async_teardown_work.work);
+
+	for (qid = 0; qid < ring->nr_queues; qid++) {
+		struct fuse_ring_queue *queue = ring->queues[qid];
+
+		if (!queue)
+			continue;
+
+		fuse_uring_teardown_entries(queue);
+	}
+
+	/*
+	 * Some ring entries might be in the middle of IO operations,
+	 * i.e. in the process of being handled by file_operations::uring_cmd
+	 * or on the way to userspace. That could be handled with conditions
+	 * in run time code, but an async teardown handler is easier and
+	 * cleaner: reschedule while queue references are still left.
+	 */
+	if (atomic_read(&ring->queue_refs) > 0) {
+		if (time_after(jiffies,
+			       ring->teardown_time + FUSE_URING_TEARDOWN_TIMEOUT))
+			fuse_uring_log_ent_state(ring);
+
+		schedule_delayed_work(&ring->async_teardown_work,
+				      FUSE_URING_TEARDOWN_INTERVAL);
+	} else {
+		wake_up_all(&ring->stop_waitq);
+	}
+}
+
+/*
+ * Stop the ring queues
+ */
+void fuse_uring_stop_queues(struct fuse_ring *ring)
+{
+	int qid;
+
+	for (qid = 0; qid < ring->nr_queues; qid++) {
+		struct fuse_ring_queue *queue = ring->queues[qid];
+
+		if (!queue)
+			continue;
+
+		fuse_uring_teardown_entries(queue);
+	}
+
+	if (atomic_read(&ring->queue_refs) > 0) {
+		pr_info("ring=%p scheduling async queue stop\n", ring);
+		ring->teardown_time = jiffies;
+		INIT_DELAYED_WORK(&ring->async_teardown_work,
+				  fuse_uring_async_stop_queues);
+		schedule_delayed_work(&ring->async_teardown_work,
+				      FUSE_URING_TEARDOWN_INTERVAL);
+	} else {
+		wake_up_all(&ring->stop_waitq);
+	}
+}
+
 /*
  * Checks for errors and stores it into the request
  */
@@ -563,6 +768,9 @@ static int fuse_uring_commit_fetch(struct io_uring_cmd *cmd, int issue_flags,
 		return err;
 	fpq = queue->fpq;
 
+	if (!READ_ONCE(fc->connected) || READ_ONCE(queue->stopped))
+		return err;
+
 	spin_lock(&queue->lock);
 	/* Find a request based on the unique ID of the fuse request
 	 * This should get revised, as it needs a hash calculation and list
@@ -730,6 +938,7 @@ static int fuse_uring_fetch(struct io_uring_cmd *cmd, unsigned int issue_flags,
 	if (WARN_ON_ONCE(err != 0))
 		goto err;
 
+	atomic_inc(&ring->queue_refs);
 	_fuse_uring_fetch(ring_ent, cmd, issue_flags);
 
 	return 0;
@@ -756,6 +965,8 @@ int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
 	pr_info_ratelimited("fuse-io-uring is not enabled yet\n");
 	goto out;
 
+	pr_devel("%s:%d received: cmd op %d\n", __func__, __LINE__, cmd_op);
+
 	err = -EOPNOTSUPP;
 	if (!enable_uring) {
 		pr_info_ratelimited("uring is disabled\n");
diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
index c7bac19e91b781fc9ccce540e39d99b39b751f6b..c9497fc94373a6e071161c205e77279fd0ada741 100644
--- a/fs/fuse/dev_uring_i.h
+++ b/fs/fuse/dev_uring_i.h
@@ -11,6 +11,9 @@
 
 #ifdef CONFIG_FUSE_IO_URING
 
+#define FUSE_URING_TEARDOWN_TIMEOUT (5 * HZ)
+#define FUSE_URING_TEARDOWN_INTERVAL (HZ/20)
+
 enum fuse_ring_req_state {
 
 	/* ring entry received from userspace and it being processed */
@@ -83,6 +86,8 @@ struct fuse_ring_queue {
 	struct list_head fuse_req_queue;
 
 	struct fuse_pqueue fpq;
+
+	bool stopped;
 };
 
 /**
@@ -97,11 +102,50 @@ struct fuse_ring {
 	size_t nr_queues;
 
 	struct fuse_ring_queue **queues;
+	/*
+	 * Log ring entry states once on stop when entries cannot be
+	 * released
+	 */
+	unsigned int stop_debug_log : 1;
+
+	wait_queue_head_t stop_waitq;
+
+	/* async tear down */
+	struct delayed_work async_teardown_work;
+
+	/* log */
+	unsigned long teardown_time;
+
+	atomic_t queue_refs;
 };
 
 void fuse_uring_destruct(struct fuse_conn *fc);
+void fuse_uring_stop_queues(struct fuse_ring *ring);
+void fuse_uring_abort_end_requests(struct fuse_ring *ring);
 int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags);
 
+static inline void fuse_uring_abort(struct fuse_conn *fc)
+{
+	struct fuse_ring *ring = fc->ring;
+
+	if (ring == NULL)
+		return;
+
+	if (atomic_read(&ring->queue_refs) > 0) {
+		fuse_uring_abort_end_requests(ring);
+		fuse_uring_stop_queues(ring);
+	}
+}
+
+static inline void fuse_uring_wait_stopped_queues(struct fuse_conn *fc)
+{
+	struct fuse_ring *ring = fc->ring;
+
+	if (ring)
+		wait_event(ring->stop_waitq,
+			   atomic_read(&ring->queue_refs) == 0);
+}
+
 #else /* CONFIG_FUSE_IO_URING */
 
 struct fuse_ring;
@@ -114,6 +158,13 @@ static inline void fuse_uring_destruct(struct fuse_conn *fc)
 {
 }
 
+static inline void fuse_uring_abort(struct fuse_conn *fc)
+{
+}
+
+static inline void fuse_uring_wait_stopped_queues(struct fuse_conn *fc)
+{
+}
 #endif /* CONFIG_FUSE_IO_URING */
 
 #endif /* _FS_FUSE_DEV_URING_I_H */

-- 
2.43.0



* [PATCH RFC v5 11/16] fuse: {uring} Add a ring queue and send method
  2024-11-07 17:03 [PATCH RFC v5 00/16] fuse: fuse-over-io-uring Bernd Schubert
                   ` (9 preceding siblings ...)
  2024-11-07 17:03 ` [PATCH RFC v5 10/16] fuse: {uring} Handle teardown of ring entries Bernd Schubert
@ 2024-11-07 17:03 ` Bernd Schubert
  2024-11-07 17:03 ` [PATCH RFC v5 12/16] fuse: {uring} Allow to queue to the ring Bernd Schubert
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 29+ messages in thread
From: Bernd Schubert @ 2024-11-07 17:03 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
	Bernd Schubert

This prepares queueing and sending through io-uring.

Signed-off-by: Bernd Schubert <[email protected]>
---
 fs/fuse/dev_uring.c   | 101 ++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/fuse/dev_uring_i.h |   7 ++++
 2 files changed, 108 insertions(+)

diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index 2f5665518d3f66bf2ae20c0274e277ee94adc491..84f5c330bac296c65ff676d454065963082fa116 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -21,6 +21,10 @@ MODULE_PARM_DESC(enable_uring,
 
 #define FUSE_URING_IOV_SEGS 2 /* header and payload */
 
+struct fuse_uring_cmd_pdu {
+	struct fuse_ring_ent *ring_ent;
+};
+
 /*
  * Finalize a fuse request, then fetch and send the next entry, if available
  */
@@ -1007,3 +1011,100 @@ int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
 
 	return -EIOCBQUEUED;
 }
+
+/*
+ * This prepares and sends the ring request in fuse-uring task context.
+ * User buffers are not mapped yet - the application does not have permission
+ * to write to it - this has to be executed in ring task context.
+ * XXX: Map and pin user pages and avoid this function.
+ */
+static void
+fuse_uring_send_req_in_task(struct io_uring_cmd *cmd,
+			    unsigned int issue_flags)
+{
+	struct fuse_uring_cmd_pdu *pdu = (struct fuse_uring_cmd_pdu *)cmd->pdu;
+	struct fuse_ring_ent *ring_ent = pdu->ring_ent;
+	struct fuse_ring_queue *queue = ring_ent->queue;
+	int err;
+
+	BUILD_BUG_ON(sizeof(*pdu) > sizeof(cmd->pdu));
+
+	err = fuse_uring_prepare_send(ring_ent);
+	if (err)
+		goto err;
+
+	io_uring_cmd_done(cmd, 0, 0, issue_flags);
+
+	spin_lock(&queue->lock);
+	ring_ent->state = FRRS_USERSPACE;
+	list_move(&ring_ent->list, &queue->ent_in_userspace);
+	spin_unlock(&queue->lock);
+	return;
+err:
+	fuse_uring_next_fuse_req(ring_ent, queue);
+}
+
+/* queue a fuse request and send it if a ring entry is available */
+int fuse_uring_queue_fuse_req(struct fuse_conn *fc, struct fuse_req *req)
+{
+	struct fuse_ring *ring = fc->ring;
+	struct fuse_ring_queue *queue;
+	int qid = 0;
+	struct fuse_ring_ent *ring_ent = NULL;
+	int res;
+
+	/*
+	 * async requests are best handled on another core, the current
+	 * core can do application/page handling, while the async request
+	 * is handled on another core in userspace.
+	 * For sync request the application has to wait - no processing, so
+	 * the request should continue on the current core and avoid context
+	 * switches.
+	 * XXX This should be on the same numa node and not busy - is there
+	 * a scheduler function available  that could make this decision?
+	 * It should also not persistently switch between cores - makes
+	 * it hard for the scheduler.
+	 */
+	qid = task_cpu(current);
+
+	if (WARN_ONCE(qid >= ring->nr_queues,
+		      "Core number (%d) exceeds nr queues (%zu)\n", qid,
+		      ring->nr_queues))
+		qid = 0;
+
+	queue = ring->queues[qid];
+	if (WARN_ONCE(!queue, "Missing queue for qid %d\n", qid))
+		return -EINVAL;
+
+	spin_lock(&queue->lock);
+
+	if (unlikely(queue->stopped)) {
+		res = -ENOTCONN;
+		goto err_unlock;
+	}
+
+	list_add_tail(&req->list, &queue->fuse_req_queue);
+
+	if (!list_empty(&queue->ent_avail_queue)) {
+		ring_ent = list_first_entry(&queue->ent_avail_queue,
+					    struct fuse_ring_ent, list);
+		list_del_init(&ring_ent->list);
+		fuse_uring_add_req_to_ring_ent(ring_ent, req);
+	}
+	spin_unlock(&queue->lock);
+
+	if (ring_ent) {
+		struct io_uring_cmd *cmd = ring_ent->cmd;
+		struct fuse_uring_cmd_pdu *pdu =
+			(struct fuse_uring_cmd_pdu *)cmd->pdu;
+
+		pdu->ring_ent = ring_ent;
+		io_uring_cmd_complete_in_task(cmd, fuse_uring_send_req_in_task);
+	}
+
+	return 0;
+
+err_unlock:
+	spin_unlock(&queue->lock);
+	return res;
+}
diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
index c9497fc94373a6e071161c205e77279fd0ada741..c442e53cefe5fea998a04bb060861569bece0459 100644
--- a/fs/fuse/dev_uring_i.h
+++ b/fs/fuse/dev_uring_i.h
@@ -123,6 +123,7 @@ void fuse_uring_destruct(struct fuse_conn *fc);
 void fuse_uring_stop_queues(struct fuse_ring *ring);
 void fuse_uring_abort_end_requests(struct fuse_ring *ring);
 int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags);
+int fuse_uring_queue_fuse_req(struct fuse_conn *fc, struct fuse_req *req);
 
 static inline void fuse_uring_abort(struct fuse_conn *fc)
 {
@@ -165,6 +166,12 @@ static inline void fuse_uring_abort(struct fuse_conn *fc)
 static inline void fuse_uring_wait_stopped_queues(struct fuse_conn *fc)
 {
 }
+
+static inline int
+fuse_uring_queue_fuse_req(struct fuse_conn *fc, struct fuse_req *req)
+{
+	return -EPFNOSUPPORT;
+}
 #endif /* CONFIG_FUSE_IO_URING */
 
 #endif /* _FS_FUSE_DEV_URING_I_H */

-- 
2.43.0



* [PATCH RFC v5 12/16] fuse: {uring} Allow to queue to the ring
  2024-11-07 17:03 [PATCH RFC v5 00/16] fuse: fuse-over-io-uring Bernd Schubert
                   ` (10 preceding siblings ...)
  2024-11-07 17:03 ` [PATCH RFC v5 11/16] fuse: {uring} Add a ring queue and send method Bernd Schubert
@ 2024-11-07 17:03 ` Bernd Schubert
  2024-11-07 17:03 ` [PATCH RFC v5 13/16] io_uring/cmd: let cmds to know about dying task Bernd Schubert
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 29+ messages in thread
From: Bernd Schubert @ 2024-11-07 17:03 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
	Bernd Schubert

This enables enqueuing requests through fuse uring queues.

For initial simplicity requests are always allocated the normal way,
then added to the list of a ring queue and only then copied to ring
queue entries. Later on the allocation and adding the requests to a
list can be avoided by directly using a ring entry. This introduces
some code complexity and is therefore not done for now.
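
The queue-then-assign order described above can be modelled in a few
lines of userspace C. This is only a sketch under simplifying
assumptions: struct model_queue, MODEL_MAX_REQS and model_queue_req()
are hypothetical names, requests are plain integer ids, and locking
as well as the actual copy into the ring entry are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_REQS 8

/* Hypothetical, heavily reduced stand-in for a ring queue */
struct model_queue {
	int pending[MODEL_MAX_REQS];	/* requests waiting for an entry */
	size_t nr_pending;
	size_t nr_avail_ents;		/* idle ring entries */
	bool stopped;
};

/*
 * Queue an already allocated request. Returns 1 if an idle ring entry
 * took the request right away, 0 if it stays on the pending list and
 * -1 if the queue is stopped or the model's list is full.
 */
static int model_queue_req(struct model_queue *q, int req_id)
{
	if (q->stopped || q->nr_pending == MODEL_MAX_REQS)
		return -1;

	/* the request always goes onto the queue list first */
	q->pending[q->nr_pending++] = req_id;

	if (q->nr_avail_ents == 0)
		return 0;

	/* an idle entry takes the request that was just queued */
	q->nr_pending--;
	q->nr_avail_ents--;
	return 1;
}
```

A request that finds an idle entry leaves the list again immediately;
otherwise it waits until an entry is committed and becomes available.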

FIXME: Needs update with new function pointers in fuse-next.

Signed-off-by: Bernd Schubert <[email protected]>
---
 fs/fuse/dev.c         | 70 +++++++++++++++++++++++++++++++++++++++++++++------
 fs/fuse/dev_uring.c   | 33 ++++++++++++++++++++++++
 fs/fuse/dev_uring_i.h | 14 +++++++++++
 3 files changed, 110 insertions(+), 7 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index d0321619c3bdcb2ee592b9f83dbee192a3ff734a..c31bccc667dfafbbb09ef04ababd401558a9c321 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -211,13 +211,23 @@ const struct fuse_iqueue_ops fuse_dev_fiq_ops = {
 };
 EXPORT_SYMBOL_GPL(fuse_dev_fiq_ops);
 
-static void queue_request_and_unlock(struct fuse_iqueue *fiq,
+static void queue_request_and_unlock(struct fuse_conn *fc,
 				     struct fuse_req *req)
 __releases(fiq->lock)
 {
+	struct fuse_iqueue *fiq = &fc->iq;
+
 	req->in.h.len = sizeof(struct fuse_in_header) +
 		fuse_len_args(req->args->in_numargs,
 			      (struct fuse_arg *) req->args->in_args);
+
+	if (fuse_uring_ready(fc)) {
+		/* this lock is not needed at all for ring req handling */
+		spin_unlock(&fiq->lock);
+		fuse_uring_queue_fuse_req(fc, req);
+		return;
+	}
+
 	list_add_tail(&req->list, &fiq->pending);
 	fiq->ops->wake_pending_and_unlock(fiq);
 }
@@ -254,7 +264,7 @@ static void flush_bg_queue(struct fuse_conn *fc)
 		fc->active_background++;
 		spin_lock(&fiq->lock);
 		req->in.h.unique = fuse_get_unique(fiq);
-		queue_request_and_unlock(fiq, req);
+		queue_request_and_unlock(fc, req);
 	}
 }
 
@@ -398,7 +408,8 @@ static void request_wait_answer(struct fuse_req *req)
 
 static void __fuse_request_send(struct fuse_req *req)
 {
-	struct fuse_iqueue *fiq = &req->fm->fc->iq;
+	struct fuse_conn *fc = req->fm->fc;
+	struct fuse_iqueue *fiq = &fc->iq;
 
 	BUG_ON(test_bit(FR_BACKGROUND, &req->flags));
 	spin_lock(&fiq->lock);
@@ -410,7 +421,7 @@ static void __fuse_request_send(struct fuse_req *req)
 		/* acquire extra reference, since request is still needed
 		   after fuse_request_end() */
 		__fuse_get_request(req);
-		queue_request_and_unlock(fiq, req);
+		queue_request_and_unlock(fc, req);
 
 		request_wait_answer(req);
 		/* Pairs with smp_wmb() in fuse_request_end() */
@@ -480,6 +491,10 @@ ssize_t fuse_simple_request(struct fuse_mount *fm, struct fuse_args *args)
 	if (args->force) {
 		atomic_inc(&fc->num_waiting);
 		req = fuse_request_alloc(fm, GFP_KERNEL | __GFP_NOFAIL);
+		if (unlikely(!req)) {
+			ret = -ENOTCONN;
+			goto err;
+		}
 
 		if (!args->nocreds)
 			fuse_force_creds(req);
@@ -507,16 +522,55 @@ ssize_t fuse_simple_request(struct fuse_mount *fm, struct fuse_args *args)
 	}
 	fuse_put_request(req);
 
+err:
 	return ret;
 }
 
-static bool fuse_request_queue_background(struct fuse_req *req)
+static bool fuse_request_queue_background_uring(struct fuse_conn *fc,
+					       struct fuse_req *req)
+{
+	struct fuse_iqueue *fiq = &fc->iq;
+	int err;
+
+	req->in.h.unique = fuse_get_unique(fiq);
+	req->in.h.len = sizeof(struct fuse_in_header) +
+		fuse_len_args(req->args->in_numargs,
+			      (struct fuse_arg *) req->args->in_args);
+
+	err = fuse_uring_queue_fuse_req(fc, req);
+	if (!err) {
+		/* XXX remove this and let its users rely on per-queue values
+		 * to avoid the shared spin lock...
+		 * Is this needed at all?
+		 */
+		spin_lock(&fc->bg_lock);
+		fc->num_background++;
+		fc->active_background++;
+
+
+		/* XXX block when per ring queues get occupied */
+		if (fc->num_background == fc->max_background)
+			fc->blocked = 1;
+		spin_unlock(&fc->bg_lock);
+	}
+
+	return err ? false : true;
+}
+
+/*
+ * @return true if queued
+ */
+static int fuse_request_queue_background(struct fuse_req *req)
 {
 	struct fuse_mount *fm = req->fm;
 	struct fuse_conn *fc = fm->fc;
 	bool queued = false;
 
 	WARN_ON(!test_bit(FR_BACKGROUND, &req->flags));
+
+	if (fuse_uring_ready(fc))
+		return fuse_request_queue_background_uring(fc, req);
+
 	if (!test_bit(FR_WAITING, &req->flags)) {
 		__set_bit(FR_WAITING, &req->flags);
 		atomic_inc(&fc->num_waiting);
@@ -569,7 +623,8 @@ static int fuse_simple_notify_reply(struct fuse_mount *fm,
 				    struct fuse_args *args, u64 unique)
 {
 	struct fuse_req *req;
-	struct fuse_iqueue *fiq = &fm->fc->iq;
+	struct fuse_conn *fc = fm->fc;
+	struct fuse_iqueue *fiq = &fc->iq;
 	int err = 0;
 
 	req = fuse_get_req(fm, false);
@@ -583,7 +638,7 @@ static int fuse_simple_notify_reply(struct fuse_mount *fm,
 
 	spin_lock(&fiq->lock);
 	if (fiq->connected) {
-		queue_request_and_unlock(fiq, req);
+		queue_request_and_unlock(fc, req);
 	} else {
 		err = -ENODEV;
 		spin_unlock(&fiq->lock);
@@ -2199,6 +2254,7 @@ void fuse_abort_conn(struct fuse_conn *fc)
 		spin_unlock(&fc->bg_lock);
 
 		fuse_set_initialized(fc);
+
 		list_for_each_entry(fud, &fc->devices, entry) {
 			struct fuse_pqueue *fpq = &fud->pq;
 
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index 84f5c330bac296c65ff676d454065963082fa116..5cd80988ee592679d9791a6528805f7dc8d58709 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -817,6 +817,31 @@ static int fuse_uring_commit_fetch(struct io_uring_cmd *cmd, int issue_flags,
 	return 0;
 }
 
+static bool is_ring_ready(struct fuse_ring *ring, int current_qid)
+{
+	int qid;
+	struct fuse_ring_queue *queue;
+	bool ready = true;
+
+	for (qid = 0; qid < ring->nr_queues && ready; qid++) {
+		if (current_qid == qid)
+			continue;
+
+		queue = ring->queues[qid];
+		if (!queue) {
+			ready = false;
+			break;
+		}
+
+		spin_lock(&queue->lock);
+		if (list_empty(&queue->ent_avail_queue))
+			ready = false;
+		spin_unlock(&queue->lock);
+	}
+
+	return ready;
+}
+
 /*
  * fuse_uring_req_fetch command handling
  */
@@ -825,11 +850,19 @@ static void _fuse_uring_fetch(struct fuse_ring_ent *ring_ent,
 			      unsigned int issue_flags)
 {
 	struct fuse_ring_queue *queue = ring_ent->queue;
+	struct fuse_ring *ring = queue->ring;
 
 	spin_lock(&queue->lock);
 	fuse_uring_ent_avail(ring_ent, queue);
 	ring_ent->cmd = cmd;
 	spin_unlock(&queue->lock);
+
+	if (!ring->ready) {
+		bool ready = is_ring_ready(ring, queue->qid);
+
+		if (ready)
+			WRITE_ONCE(ring->ready, true);
+	}
 }
 
 /*
diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
index c442e53cefe5fea998a04bb060861569bece0459..7951a8a96702190beba0596212c90b60da659aca 100644
--- a/fs/fuse/dev_uring_i.h
+++ b/fs/fuse/dev_uring_i.h
@@ -117,6 +117,8 @@ struct fuse_ring {
 	unsigned long teardown_time;
 
 	atomic_t queue_refs;
+
+	bool ready;
 };
 
 void fuse_uring_destruct(struct fuse_conn *fc);
@@ -132,6 +134,8 @@ static inline void fuse_uring_abort(struct fuse_conn *fc)
 	if (ring == NULL)
 		return;
 
+	WRITE_ONCE(ring->ready, false);
+
 	if (atomic_read(&ring->queue_refs) > 0) {
 		fuse_uring_abort_end_requests(ring);
 		fuse_uring_stop_queues(ring);
@@ -147,6 +151,11 @@ static inline void fuse_uring_wait_stopped_queues(struct fuse_conn *fc)
 			   atomic_read(&ring->queue_refs) == 0);
 }
 
+static inline bool fuse_uring_ready(struct fuse_conn *fc)
+{
+	return fc->ring && fc->ring->ready;
+}
+
 #else /* CONFIG_FUSE_IO_URING */
 
 struct fuse_ring;
@@ -167,6 +176,11 @@ static inline void fuse_uring_wait_stopped_queues(struct fuse_conn *fc)
 {
 }
 
+static inline bool fuse_uring_ready(struct fuse_conn *fc)
+{
+	return false;
+}
+
 static inline int
 fuse_uring_queue_fuse_req(struct fuse_conn *fc, struct fuse_req *req)
 {

-- 
2.43.0


* [PATCH RFC v5 13/16] io_uring/cmd: let cmds to know about dying task
  2024-11-07 17:03 [PATCH RFC v5 00/16] fuse: fuse-over-io-uring Bernd Schubert
                   ` (11 preceding siblings ...)
  2024-11-07 17:03 ` [PATCH RFC v5 12/16] fuse: {uring} Allow to queue to the ring Bernd Schubert
@ 2024-11-07 17:03 ` Bernd Schubert
  2024-11-07 17:03 ` [PATCH RFC v5 14/16] fuse: {uring} Handle IO_URING_F_TASK_DEAD Bernd Schubert
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 29+ messages in thread
From: Bernd Schubert @ 2024-11-07 17:03 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
	Bernd Schubert

From: Pavel Begunkov <[email protected]>

When the task that submitted a request is dying, a task work for that
request might get run by a kernel thread or even worse by a half
dismantled task. We can't just cancel the task work without running the
callback as the cmd might need to do some clean up, so pass a flag
instead. If set, it's not safe to access any task resources and the
callback is expected to cancel the cmd ASAP.
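
The flag handling described above can be sketched in plain C as
follows. This is only an illustration under stated assumptions: the
tw_* names, the flag values and the struct are hypothetical, and the
real dispatcher in this patch tests PF_EXITING on req->task.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag bits; the real ones live in enum io_uring_cmd_flags. */
#define TW_F_COMPLETE_DEFER	(1u << 0)
#define TW_F_TASK_DEAD		(1u << 1)

/* Hypothetical stand-in for the submitting task. */
struct tw_task {
	bool exiting;
};

/* Dispatcher side: always run the callback, but tell it the task is dying. */
static unsigned int tw_flags_for(const struct tw_task *task)
{
	unsigned int flags = TW_F_COMPLETE_DEFER;

	if (task->exiting)
		flags |= TW_F_TASK_DEAD;
	return flags;
}

/*
 * Callback side: when the flag is set, do not touch task resources and
 * cancel right away. Otherwise do the normal work, represented here by
 * setting *touched_task_resources.
 */
static int tw_callback(unsigned int issue_flags, int *touched_task_resources)
{
	if (issue_flags & TW_F_TASK_DEAD)
		return -ECANCELED;

	*touched_task_resources = 1;
	return 0;
}
```

The point of passing a flag instead of dropping the task work is that
the callback still runs once and gets a chance to clean up.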

Changed to
if (req->task != current)
based on discussion with Jens; it still needs to be double-checked
whether this is really needed.

Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Bernd Schubert <[email protected]>
---
 include/linux/io_uring_types.h | 1 +
 io_uring/uring_cmd.c           | 6 +++++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 7abdc09271245ff7de3fb9a905ca78b7561e37eb..869a81c63e4970576155043fce7fe656293d7f58 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -37,6 +37,7 @@ enum io_uring_cmd_flags {
 	/* set when uring wants to cancel a previously issued command */
 	IO_URING_F_CANCEL		= (1 << 11),
 	IO_URING_F_COMPAT		= (1 << 12),
+	IO_URING_F_TASK_DEAD		= (1 << 13),
 };
 
 struct io_wq_work_node {
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 21ac5fb2d5f087e1174d5c94815d580972db6e3f..405a39fdd91c9abf741c2b3b6bde2f4d850312ae 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -119,9 +119,13 @@ EXPORT_SYMBOL_GPL(io_uring_cmd_mark_cancelable);
 static void io_uring_cmd_work(struct io_kiocb *req, struct io_tw_state *ts)
 {
 	struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
+	unsigned int flags = IO_URING_F_COMPLETE_DEFER;
+
+	if (req->task->flags & PF_EXITING)
+		flags |= IO_URING_F_TASK_DEAD;
 
 	/* task_work executor checks the deffered list completion */
-	ioucmd->task_work_cb(ioucmd, IO_URING_F_COMPLETE_DEFER);
+	ioucmd->task_work_cb(ioucmd, flags);
 }
 
 void __io_uring_cmd_do_in_task(struct io_uring_cmd *ioucmd,

-- 
2.43.0


* [PATCH RFC v5 14/16] fuse: {uring} Handle IO_URING_F_TASK_DEAD
  2024-11-07 17:03 [PATCH RFC v5 00/16] fuse: fuse-over-io-uring Bernd Schubert
                   ` (12 preceding siblings ...)
  2024-11-07 17:03 ` [PATCH RFC v5 13/16] io_uring/cmd: let cmds to know about dying task Bernd Schubert
@ 2024-11-07 17:03 ` Bernd Schubert
  2024-11-07 17:03 ` [PATCH RFC v5 15/16] fuse: {io-uring} Prevent mount point hang on fuse-server termination Bernd Schubert
  2024-11-07 17:04 ` [PATCH RFC v5 16/16] fuse: enable fuse-over-io-uring Bernd Schubert
  15 siblings, 0 replies; 29+ messages in thread
From: Bernd Schubert @ 2024-11-07 17:03 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
	Bernd Schubert

The ring task is terminating, so it is not safe to access its
resources anymore. No further action is needed either.

Signed-off-by: Bernd Schubert <[email protected]>
---
 fs/fuse/dev_uring.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index 5cd80988ee592679d9791a6528805f7dc8d58709..6af515458695ccb2e32cc8c62c45471e6710c15f 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -1062,16 +1062,22 @@ fuse_uring_send_req_in_task(struct io_uring_cmd *cmd,
 
 	BUILD_BUG_ON(sizeof(pdu) > sizeof(cmd->pdu));
 
+	if (unlikely(issue_flags & IO_URING_F_TASK_DEAD)) {
+		err = -ECANCELED;
+		goto terminating;
+	}
+
 	err = fuse_uring_prepare_send(ring_ent);
 	if (err)
 		goto err;
 
-	io_uring_cmd_done(cmd, 0, 0, issue_flags);
-
+terminating:
 	spin_lock(&queue->lock);
 	ring_ent->state = FRRS_USERSPACE;
 	list_move(&ring_ent->list, &queue->ent_in_userspace);
 	spin_unlock(&queue->lock);
+	io_uring_cmd_done(cmd, err, 0, issue_flags);
+
 	return;
 err:
 	fuse_uring_next_fuse_req(ring_ent, queue);

-- 
2.43.0


* [PATCH RFC v5 15/16] fuse: {io-uring} Prevent mount point hang on fuse-server termination
  2024-11-07 17:03 [PATCH RFC v5 00/16] fuse: fuse-over-io-uring Bernd Schubert
                   ` (13 preceding siblings ...)
  2024-11-07 17:03 ` [PATCH RFC v5 14/16] fuse: {uring} Handle IO_URING_F_TASK_DEAD Bernd Schubert
@ 2024-11-07 17:03 ` Bernd Schubert
  2024-11-18 19:32   ` Joanne Koong
  2024-11-18 23:30   ` Joanne Koong
  2024-11-07 17:04 ` [PATCH RFC v5 16/16] fuse: enable fuse-over-io-uring Bernd Schubert
  15 siblings, 2 replies; 29+ messages in thread
From: Bernd Schubert @ 2024-11-07 17:03 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
	Bernd Schubert

When the fuse-server terminates while the fuse-client or kernel
still has queued URING_CMDs, these commands retain references
to the struct file used by the fuse connection. This prevents
fuse_dev_release() from being invoked, resulting in a hung mount
point.

This patch addresses the issue by making queued URING_CMDs
cancelable, allowing fuse_dev_release() to proceed as expected
and preventing the mount point from hanging.
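
The cancel path can be pictured with the following userspace sketch.
It is an illustration only: the cq_* types are hypothetical, a singly
linked list replaces the kernel list_head, and the queue lock is
omitted. An entry is cancelled only if it is still waiting on the
available list; it is then moved to the in-userspace list and
completed with -ENOTCONN.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum cq_state { CQ_WAIT, CQ_USERSPACE };

/* Hypothetical stand-in for struct fuse_ring_ent. */
struct cq_ent {
	enum cq_state state;
	int result;		/* value the command was completed with */
	bool completed;
	struct cq_ent *next;
};

/* Hypothetical stand-in for struct fuse_ring_queue. */
struct cq_queue {
	struct cq_ent *avail;		/* entries waiting for a request */
	struct cq_ent *in_userspace;	/* entries handed back to the daemon */
};

/* Returns true if target was found waiting and has been cancelled. */
static bool cq_cancel(struct cq_queue *q, struct cq_ent *target)
{
	struct cq_ent **pp;

	/* Linear search, as in the patch; costly for large queues. */
	for (pp = &q->avail; *pp; pp = &(*pp)->next) {
		if (*pp == target)
			break;
	}
	if (!*pp || target->state != CQ_WAIT)
		return false;

	/* Unlink from the available list, push onto the userspace list. */
	*pp = target->next;
	target->next = q->in_userspace;
	q->in_userspace = target;
	target->state = CQ_USERSPACE;

	/* Complete the command so its file reference can be dropped. */
	target->result = -ENOTCONN;
	target->completed = true;
	return true;
}
```

Entries that are not on the available list are left alone in this
sketch, matching the early return in fuse_uring_cancel().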

Signed-off-by: Bernd Schubert <[email protected]>
---
 fs/fuse/dev_uring.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 70 insertions(+), 6 deletions(-)

diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index 6af515458695ccb2e32cc8c62c45471e6710c15f..b465da41c42c47eaf69f09bab1423061bc8fcc68 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -23,6 +23,7 @@ MODULE_PARM_DESC(enable_uring,
 
 struct fuse_uring_cmd_pdu {
 	struct fuse_ring_ent *ring_ent;
+	struct fuse_ring_queue *queue;
 };
 
 /*
@@ -382,6 +383,61 @@ void fuse_uring_stop_queues(struct fuse_ring *ring)
 	}
 }
 
+/*
+ * Handle IO_URING_F_CANCEL, typically should come on daemon termination
+ */
+static void fuse_uring_cancel(struct io_uring_cmd *cmd,
+			      unsigned int issue_flags, struct fuse_conn *fc)
+{
+	struct fuse_uring_cmd_pdu *pdu = (struct fuse_uring_cmd_pdu *)cmd->pdu;
+	struct fuse_ring_queue *queue = pdu->queue;
+	struct fuse_ring_ent *ent;
+	bool found = false;
+	bool need_cmd_done = false;
+
+	spin_lock(&queue->lock);
+
+	/* XXX: This is cumbersome for large queues. */
+	list_for_each_entry(ent, &queue->ent_avail_queue, list) {
+		if (pdu->ring_ent == ent) {
+			found = true;
+			break;
+		}
+	}
+
+	if (!found) {
+		pr_info("qid=%d Did not find ent=%p\n", queue->qid, pdu->ring_ent);
+		spin_unlock(&queue->lock);
+		return;
+	}
+
+	if (ent->state == FRRS_WAIT) {
+		ent->state = FRRS_USERSPACE;
+		list_move(&ent->list, &queue->ent_in_userspace);
+		need_cmd_done = true;
+	}
+	spin_unlock(&queue->lock);
+
+	if (need_cmd_done)
+		io_uring_cmd_done(cmd, -ENOTCONN, 0, issue_flags);
+
+	/*
+	 * releasing the last entry should trigger fuse_dev_release() if
+	 * the daemon was terminated
+	 */
+}
+
+static void fuse_uring_prepare_cancel(struct io_uring_cmd *cmd, int issue_flags,
+				      struct fuse_ring_ent *ring_ent)
+{
+	struct fuse_uring_cmd_pdu *pdu = (struct fuse_uring_cmd_pdu *)cmd->pdu;
+
+	pdu->ring_ent = ring_ent;
+	pdu->queue = ring_ent->queue;
+
+	io_uring_cmd_mark_cancelable(cmd, issue_flags);
+}
+
 /*
  * Checks for errors and stores it into the request
  */
@@ -606,7 +662,8 @@ static int fuse_uring_send_next_to_ring(struct fuse_ring_ent *ring_ent)
  * Put a ring request onto hold, it is no longer used for now.
  */
 static void fuse_uring_ent_avail(struct fuse_ring_ent *ring_ent,
-				 struct fuse_ring_queue *queue)
+				 struct fuse_ring_queue *queue,
+				 unsigned int issue_flags)
 	__must_hold(&queue->lock)
 {
 	struct fuse_ring *ring = queue->ring;
@@ -626,6 +683,7 @@ static void fuse_uring_ent_avail(struct fuse_ring_ent *ring_ent,
 	list_move(&ring_ent->list, &queue->ent_avail_queue);
 
 	ring_ent->state = FRRS_WAIT;
+	fuse_uring_prepare_cancel(ring_ent->cmd, issue_flags, ring_ent);
 }
 
 /* Used to find the request on SQE commit */
@@ -729,7 +787,8 @@ static void fuse_uring_commit(struct fuse_ring_ent *ring_ent,
  * Get the next fuse req and send it
  */
 static void fuse_uring_next_fuse_req(struct fuse_ring_ent *ring_ent,
-				    struct fuse_ring_queue *queue)
+				    struct fuse_ring_queue *queue,
+				    unsigned int issue_flags)
 {
 	int has_next, err;
 	int prev_state = ring_ent->state;
@@ -738,7 +797,7 @@ static void fuse_uring_next_fuse_req(struct fuse_ring_ent *ring_ent,
 		spin_lock(&queue->lock);
 		has_next = fuse_uring_ent_assign_req(ring_ent);
 		if (!has_next) {
-			fuse_uring_ent_avail(ring_ent, queue);
+			fuse_uring_ent_avail(ring_ent, queue, issue_flags);
 			spin_unlock(&queue->lock);
 			break; /* no request left */
 		}
@@ -813,7 +872,7 @@ static int fuse_uring_commit_fetch(struct io_uring_cmd *cmd, int issue_flags,
 	 * and fetching is done in one step vs legacy fuse, which has separated
 	 * read (fetch request) and write (commit result).
 	 */
-	fuse_uring_next_fuse_req(ring_ent, queue);
+	fuse_uring_next_fuse_req(ring_ent, queue, issue_flags);
 	return 0;
 }
 
@@ -853,7 +912,7 @@ static void _fuse_uring_fetch(struct fuse_ring_ent *ring_ent,
 	struct fuse_ring *ring = queue->ring;
 
 	spin_lock(&queue->lock);
-	fuse_uring_ent_avail(ring_ent, queue);
+	fuse_uring_ent_avail(ring_ent, queue, issue_flags);
 	ring_ent->cmd = cmd;
 	spin_unlock(&queue->lock);
 
@@ -1021,6 +1080,11 @@ int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
 	if (fc->aborted)
 		goto out;
 
+	if ((unlikely(issue_flags & IO_URING_F_CANCEL))) {
+		fuse_uring_cancel(cmd, issue_flags, fc);
+		return 0;
+	}
+
 	switch (cmd_op) {
 	case FUSE_URING_REQ_FETCH:
 		err = fuse_uring_fetch(cmd, issue_flags, fc);
@@ -1080,7 +1144,7 @@ fuse_uring_send_req_in_task(struct io_uring_cmd *cmd,
 
 	return;
 err:
-	fuse_uring_next_fuse_req(ring_ent, queue);
+	fuse_uring_next_fuse_req(ring_ent, queue, issue_flags);
 }
 
 /* queue a fuse request and send it if a ring entry is available */

-- 
2.43.0


* [PATCH RFC v5 16/16] fuse: enable fuse-over-io-uring
  2024-11-07 17:03 [PATCH RFC v5 00/16] fuse: fuse-over-io-uring Bernd Schubert
                   ` (14 preceding siblings ...)
  2024-11-07 17:03 ` [PATCH RFC v5 15/16] fuse: {io-uring} Prevent mount point hang on fuse-server termination Bernd Schubert
@ 2024-11-07 17:04 ` Bernd Schubert
  15 siblings, 0 replies; 29+ messages in thread
From: Bernd Schubert @ 2024-11-07 17:04 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
	Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
	Bernd Schubert

All required parts are handled now, so fuse-io-uring can
be enabled.

Signed-off-by: Bernd Schubert <[email protected]>
---
 fs/fuse/dev_uring.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index b465da41c42c47eaf69f09bab1423061bc8fcc68..2ee7d5ba260bc4b54927af1a856dabcf7d725edb 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -1056,11 +1056,6 @@ int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
 	u32 cmd_op = cmd->cmd_op;
 	int err = 0;
 
-	/* Disabled for now, especially as teardown is not implemented yet */
-	err = -EOPNOTSUPP;
-	pr_info_ratelimited("fuse-io-uring is not enabled yet\n");
-	goto out;
-
 	pr_devel("%s:%d received: cmd op %d\n", __func__, __LINE__, cmd_op);
 
 	err = -EOPNOTSUPP;

-- 
2.43.0


* Re: [PATCH RFC v5 05/16] fuse: make args->in_args[0] to be always the header
  2024-11-07 17:03 ` [PATCH RFC v5 05/16] fuse: make args->in_args[0] to be always the header Bernd Schubert
@ 2024-11-14 20:57   ` Joanne Koong
  2024-11-14 21:05     ` Bernd Schubert
  0 siblings, 1 reply; 29+ messages in thread
From: Joanne Koong @ 2024-11-14 20:57 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Miklos Szeredi, Jens Axboe, Pavel Begunkov, linux-fsdevel,
	io-uring, Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd

On Thu, Nov 7, 2024 at 9:04 AM Bernd Schubert <[email protected]> wrote:
>
> This change sets up FUSE operations to have headers in args.in_args[0],
> even for opcodes without an actual header. We do this to prepare for
> cleanly separating payload from headers in the future.
>
> For opcodes without a header, we use a zero-sized struct as a
> placeholder. This approach:
> - Keeps things consistent across all FUSE operations
> - Will help with payload alignment later
> - Avoids future issues when header sizes change
>
> Signed-off-by: Bernd Schubert <[email protected]>
> ---
>  fs/fuse/dax.c    | 13 ++++++++-----
>  fs/fuse/dev.c    | 24 ++++++++++++++++++++----
>  fs/fuse/dir.c    | 41 +++++++++++++++++++++++++++--------------
>  fs/fuse/fuse_i.h |  7 +++++++
>  fs/fuse/xattr.c  |  9 ++++++---
>  5 files changed, 68 insertions(+), 26 deletions(-)
>
> diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
> index 12ef91d170bb3091ac35a33d2b9dc38330b00948..e459b8134ccb089f971bebf8da1f7fc5199c1271 100644
> --- a/fs/fuse/dax.c
> +++ b/fs/fuse/dax.c
> @@ -237,14 +237,17 @@ static int fuse_send_removemapping(struct inode *inode,
>         struct fuse_inode *fi = get_fuse_inode(inode);
>         struct fuse_mount *fm = get_fuse_mount(inode);
>         FUSE_ARGS(args);
> +       struct fuse_zero_in zero_arg;
>
>         args.opcode = FUSE_REMOVEMAPPING;
>         args.nodeid = fi->nodeid;
> -       args.in_numargs = 2;
> -       args.in_args[0].size = sizeof(*inargp);
> -       args.in_args[0].value = inargp;
> -       args.in_args[1].size = inargp->count * sizeof(*remove_one);
> -       args.in_args[1].value = remove_one;
> +       args.in_numargs = 3;
> +       args.in_args[0].size = sizeof(zero_arg);
> +       args.in_args[0].value = &zero_arg;
> +       args.in_args[1].size = sizeof(*inargp);
> +       args.in_args[1].value = inargp;
> +       args.in_args[2].size = inargp->count * sizeof(*remove_one);
> +       args.in_args[2].value = remove_one;
>         return fuse_simple_request(fm, &args);
>  }
>
> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> index dbc222f9b0f0e590ce3ef83077e6b4cff03cff65..6effef4073da3dad2f6140761eca98147a41d88d 100644
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -1007,6 +1007,19 @@ static int fuse_copy_args(struct fuse_copy_state *cs, unsigned numargs,
>
>         for (i = 0; !err && i < numargs; i++)  {
>                 struct fuse_arg *arg = &args[i];
> +
> +               /* zero headers */
> +               if (arg->size == 0) {
> +                       if (WARN_ON_ONCE(i != 0)) {
> +                               if (cs->req)
> +                                       pr_err_once(
> +                                               "fuse: zero size header in opcode %d\n",
> +                                               cs->req->in.h.opcode);
> +                               return -EINVAL;
> +                       }
> +                       continue;
> +               }
> +
>                 if (i == numargs - 1 && argpages)
>                         err = fuse_copy_pages(cs, arg->size, zeroing);
>                 else
> @@ -1662,6 +1675,7 @@ static int fuse_retrieve(struct fuse_mount *fm, struct inode *inode,
>         size_t args_size = sizeof(*ra);
>         struct fuse_args_pages *ap;
>         struct fuse_args *args;
> +       struct fuse_zero_in zero_arg;
>
>         offset = outarg->offset & ~PAGE_MASK;
>         file_size = i_size_read(inode);
> @@ -1688,7 +1702,7 @@ static int fuse_retrieve(struct fuse_mount *fm, struct inode *inode,
>         args = &ap->args;
>         args->nodeid = outarg->nodeid;
>         args->opcode = FUSE_NOTIFY_REPLY;
> -       args->in_numargs = 2;
> +       args->in_numargs = 3;
>         args->in_pages = true;
>         args->end = fuse_retrieve_end;
>
> @@ -1715,9 +1729,11 @@ static int fuse_retrieve(struct fuse_mount *fm, struct inode *inode,
>         }
>         ra->inarg.offset = outarg->offset;
>         ra->inarg.size = total_len;
> -       args->in_args[0].size = sizeof(ra->inarg);
> -       args->in_args[0].value = &ra->inarg;
> -       args->in_args[1].size = total_len;
> +       args->in_args[0].size = sizeof(zero_arg);
> +       args->in_args[0].value = &zero_arg;
> +       args->in_args[1].size = sizeof(ra->inarg);
> +       args->in_args[1].value = &ra->inarg;
> +       args->in_args[2].size = total_len;
>
>         err = fuse_simple_notify_reply(fm, args, outarg->notify_unique);
>         if (err)

Do we also need to add a zero arg header for FUSE_READLINK,
FUSE_DESTROY, and FUSE_BATCH_FORGET requests as well?


Thanks,
Joanne

> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> index 2b0d4781f39484d50d1fd7f4f673d8b08c5fd7cf..6d67d7f8e6b4460c759df3fb293e169bcc78a897 100644
> --- a/fs/fuse/dir.c
> +++ b/fs/fuse/dir.c
> @@ -172,12 +172,16 @@ static void fuse_lookup_init(struct fuse_conn *fc, struct fuse_args *args,
>                              u64 nodeid, const struct qstr *name,
>                              struct fuse_entry_out *outarg)
>  {
> +       struct fuse_zero_in zero_arg;
> +
>         memset(outarg, 0, sizeof(struct fuse_entry_out));
>         args->opcode = FUSE_LOOKUP;
>         args->nodeid = nodeid;
> -       args->in_numargs = 1;
> -       args->in_args[0].size = name->len + 1;
> -       args->in_args[0].value = name->name;
> +       args->in_numargs = 2;
> +       args->in_args[0].size = sizeof(zero_arg);
> +       args->in_args[0].value = &zero_arg;
> +       args->in_args[1].size = name->len + 1;
> +       args->in_args[1].value = name->name;
>         args->out_numargs = 1;
>         args->out_args[0].size = sizeof(struct fuse_entry_out);
>         args->out_args[0].value = outarg;
> @@ -915,16 +919,19 @@ static int fuse_mkdir(struct mnt_idmap *idmap, struct inode *dir,
>  static int fuse_symlink(struct mnt_idmap *idmap, struct inode *dir,
>                         struct dentry *entry, const char *link)
>  {
> +       struct fuse_zero_in zero_arg;
>         struct fuse_mount *fm = get_fuse_mount(dir);
>         unsigned len = strlen(link) + 1;
>         FUSE_ARGS(args);
>
>         args.opcode = FUSE_SYMLINK;
> -       args.in_numargs = 2;
> -       args.in_args[0].size = entry->d_name.len + 1;
> -       args.in_args[0].value = entry->d_name.name;
> -       args.in_args[1].size = len;
> -       args.in_args[1].value = link;
> +       args.in_numargs = 3;
> +       args.in_args[0].size = sizeof(zero_arg);
> +       args.in_args[0].value = &zero_arg;
> +       args.in_args[1].size = entry->d_name.len + 1;
> +       args.in_args[1].value = entry->d_name.name;
> +       args.in_args[2].size = len;
> +       args.in_args[2].value = link;
>         return create_new_entry(fm, &args, dir, entry, S_IFLNK);
>  }
>
> @@ -975,6 +982,7 @@ static void fuse_entry_unlinked(struct dentry *entry)
>
>  static int fuse_unlink(struct inode *dir, struct dentry *entry)
>  {
> +       struct fuse_zero_in inarg;
>         int err;
>         struct fuse_mount *fm = get_fuse_mount(dir);
>         FUSE_ARGS(args);
> @@ -984,9 +992,11 @@ static int fuse_unlink(struct inode *dir, struct dentry *entry)
>
>         args.opcode = FUSE_UNLINK;
>         args.nodeid = get_node_id(dir);
> -       args.in_numargs = 1;
> -       args.in_args[0].size = entry->d_name.len + 1;
> -       args.in_args[0].value = entry->d_name.name;
> +       args.in_numargs = 2;
> +       args.in_args[0].size = sizeof(inarg);
> +       args.in_args[0].value = &inarg;
> +       args.in_args[1].size = entry->d_name.len + 1;
> +       args.in_args[1].value = entry->d_name.name;
>         err = fuse_simple_request(fm, &args);
>         if (!err) {
>                 fuse_dir_changed(dir);
> @@ -998,6 +1008,7 @@ static int fuse_unlink(struct inode *dir, struct dentry *entry)
>
>  static int fuse_rmdir(struct inode *dir, struct dentry *entry)
>  {
> +       struct fuse_zero_in zero_arg;
>         int err;
>         struct fuse_mount *fm = get_fuse_mount(dir);
>         FUSE_ARGS(args);
> @@ -1007,9 +1018,11 @@ static int fuse_rmdir(struct inode *dir, struct dentry *entry)
>
>         args.opcode = FUSE_RMDIR;
>         args.nodeid = get_node_id(dir);
> -       args.in_numargs = 1;
> -       args.in_args[0].size = entry->d_name.len + 1;
> -       args.in_args[0].value = entry->d_name.name;
> +       args.in_numargs = 2;
> +       args.in_args[0].size = sizeof(zero_arg);
> +       args.in_args[0].value = &zero_arg;
> +       args.in_args[1].size = entry->d_name.len + 1;
> +       args.in_args[1].value = entry->d_name.name;
>         err = fuse_simple_request(fm, &args);
>         if (!err) {
>                 fuse_dir_changed(dir);
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index f2391961031374d8d55916c326c6472f0c03aae6..e2d1d90dfdb13b2c3e7de4789501ee45d3bf7794 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -941,6 +941,13 @@ struct fuse_mount {
>         struct rcu_head rcu;
>  };
>
> +/*
> + * Empty header for FUSE opcodes without specific header needs.
> + * Used as a placeholder in args->in_args[0] for consistency
> + * across all FUSE operations, simplifying request handling.
> + */
> +struct fuse_zero_in {};
> +
>  static inline struct fuse_mount *get_fuse_mount_super(struct super_block *sb)
>  {
>         return sb->s_fs_info;
> diff --git a/fs/fuse/xattr.c b/fs/fuse/xattr.c
> index 5b423fdbb13f8f17c3982e96dd0de836662092b0..2df1efd2e9bdb46571148f484d7927044f31c184 100644
> --- a/fs/fuse/xattr.c
> +++ b/fs/fuse/xattr.c
> @@ -158,15 +158,18 @@ int fuse_removexattr(struct inode *inode, const char *name)
>         struct fuse_mount *fm = get_fuse_mount(inode);
>         FUSE_ARGS(args);
>         int err;
> +       struct fuse_zero_in zero_arg;
>
>         if (fm->fc->no_removexattr)
>                 return -EOPNOTSUPP;
>
>         args.opcode = FUSE_REMOVEXATTR;
>         args.nodeid = get_node_id(inode);
> -       args.in_numargs = 1;
> -       args.in_args[0].size = strlen(name) + 1;
> -       args.in_args[0].value = name;
> +       args.in_numargs = 2;
> +       args.in_args[0].size = sizeof(zero_arg);
> +       args.in_args[0].value = &zero_arg;
> +       args.in_args[1].size = strlen(name) + 1;
> +       args.in_args[1].value = name;
>         err = fuse_simple_request(fm, &args);
>         if (err == -ENOSYS) {
>                 fm->fc->no_removexattr = 1;
>
> --
> 2.43.0
>

* Re: [PATCH RFC v5 05/16] fuse: make args->in_args[0] to be always the header
  2024-11-14 20:57   ` Joanne Koong
@ 2024-11-14 21:05     ` Bernd Schubert
  2024-11-14 21:29       ` Joanne Koong
  0 siblings, 1 reply; 29+ messages in thread
From: Bernd Schubert @ 2024-11-14 21:05 UTC (permalink / raw)
  To: Joanne Koong, Bernd Schubert
  Cc: Miklos Szeredi, Jens Axboe, Pavel Begunkov, linux-fsdevel,
	io-uring, Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd



On 11/14/24 21:57, Joanne Koong wrote:
> On Thu, Nov 7, 2024 at 9:04 AM Bernd Schubert <[email protected]> wrote:
>>
>> This change sets up FUSE operations to have headers in args.in_args[0],
>> even for opcodes without an actual header. We do this to prepare for
>> cleanly separating payload from headers in the future.
>>
>> For opcodes without a header, we use a zero-sized struct as a
>> placeholder. This approach:
>> - Keeps things consistent across all FUSE operations
>> - Will help with payload alignment later
>> - Avoids future issues when header sizes change
>>
>> Signed-off-by: Bernd Schubert <[email protected]>
>> ---
>>  fs/fuse/dax.c    | 13 ++++++++-----
>>  fs/fuse/dev.c    | 24 ++++++++++++++++++++----
>>  fs/fuse/dir.c    | 41 +++++++++++++++++++++++++++--------------
>>  fs/fuse/fuse_i.h |  7 +++++++
>>  fs/fuse/xattr.c  |  9 ++++++---
>>  5 files changed, 68 insertions(+), 26 deletions(-)
>>
>> diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
>> index 12ef91d170bb3091ac35a33d2b9dc38330b00948..e459b8134ccb089f971bebf8da1f7fc5199c1271 100644
>> --- a/fs/fuse/dax.c
>> +++ b/fs/fuse/dax.c
>> @@ -237,14 +237,17 @@ static int fuse_send_removemapping(struct inode *inode,
>>         struct fuse_inode *fi = get_fuse_inode(inode);
>>         struct fuse_mount *fm = get_fuse_mount(inode);
>>         FUSE_ARGS(args);
>> +       struct fuse_zero_in zero_arg;
>>
>>         args.opcode = FUSE_REMOVEMAPPING;
>>         args.nodeid = fi->nodeid;
>> -       args.in_numargs = 2;
>> -       args.in_args[0].size = sizeof(*inargp);
>> -       args.in_args[0].value = inargp;
>> -       args.in_args[1].size = inargp->count * sizeof(*remove_one);
>> -       args.in_args[1].value = remove_one;
>> +       args.in_numargs = 3;
>> +       args.in_args[0].size = sizeof(zero_arg);
>> +       args.in_args[0].value = &zero_arg;
>> +       args.in_args[1].size = sizeof(*inargp);
>> +       args.in_args[1].value = inargp;
>> +       args.in_args[2].size = inargp->count * sizeof(*remove_one);
>> +       args.in_args[2].value = remove_one;
>>         return fuse_simple_request(fm, &args);
>>  }
>>
>> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
>> index dbc222f9b0f0e590ce3ef83077e6b4cff03cff65..6effef4073da3dad2f6140761eca98147a41d88d 100644
>> --- a/fs/fuse/dev.c
>> +++ b/fs/fuse/dev.c
>> @@ -1007,6 +1007,19 @@ static int fuse_copy_args(struct fuse_copy_state *cs, unsigned numargs,
>>
>>         for (i = 0; !err && i < numargs; i++)  {
>>                 struct fuse_arg *arg = &args[i];
>> +
>> +               /* zero headers */
>> +               if (arg->size == 0) {
>> +                       if (WARN_ON_ONCE(i != 0)) {
>> +                               if (cs->req)
>> +                                       pr_err_once(
>> +                                               "fuse: zero size header in opcode %d\n",
>> +                                               cs->req->in.h.opcode);
>> +                               return -EINVAL;
>> +                       }
>> +                       continue;
>> +               }
>> +
>>                 if (i == numargs - 1 && argpages)
>>                         err = fuse_copy_pages(cs, arg->size, zeroing);
>>                 else
>> @@ -1662,6 +1675,7 @@ static int fuse_retrieve(struct fuse_mount *fm, struct inode *inode,
>>         size_t args_size = sizeof(*ra);
>>         struct fuse_args_pages *ap;
>>         struct fuse_args *args;
>> +       struct fuse_zero_in zero_arg;
>>
>>         offset = outarg->offset & ~PAGE_MASK;
>>         file_size = i_size_read(inode);
>> @@ -1688,7 +1702,7 @@ static int fuse_retrieve(struct fuse_mount *fm, struct inode *inode,
>>         args = &ap->args;
>>         args->nodeid = outarg->nodeid;
>>         args->opcode = FUSE_NOTIFY_REPLY;
>> -       args->in_numargs = 2;
>> +       args->in_numargs = 3;
>>         args->in_pages = true;
>>         args->end = fuse_retrieve_end;
>>
>> @@ -1715,9 +1729,11 @@ static int fuse_retrieve(struct fuse_mount *fm, struct inode *inode,
>>         }
>>         ra->inarg.offset = outarg->offset;
>>         ra->inarg.size = total_len;
>> -       args->in_args[0].size = sizeof(ra->inarg);
>> -       args->in_args[0].value = &ra->inarg;
>> -       args->in_args[1].size = total_len;
>> +       args->in_args[0].size = sizeof(zero_arg);
>> +       args->in_args[0].value = &zero_arg;
>> +       args->in_args[1].size = sizeof(ra->inarg);
>> +       args->in_args[1].value = &ra->inarg;
>> +       args->in_args[2].size = total_len;
>>
>>         err = fuse_simple_notify_reply(fm, args, outarg->notify_unique);
>>         if (err)
> 
> Do we also need to add a zero arg header for FUSE_READLINK,
> FUSE_DESTROY, and FUSE_BATCH_FORGET requests as well?
> 

Thanks for looking at the patch! I should have added to the commit message
that I didn't modify these, as they don't have an in argument at all.
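
As a rough illustration of why opcodes without any in argument need no
placeholder, here is a small userspace model of the patched copy loop. It
is only a sketch: struct arg_model and copy_args_model() are hypothetical
names, not kernel code. With numargs == 0 the loop never runs, so there is
nothing to skip; a zero-sized entry is tolerated only at index 0.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct fuse_arg. */
struct arg_model {
	size_t size;
	const void *value;
};

/*
 * Walk the args the way the patched loop does: a zero-sized entry is
 * accepted (and skipped) only at index 0, anywhere else it is an error.
 * Returns the number of bytes that would be copied, or -EINVAL.
 */
static long copy_args_model(const struct arg_model *args, unsigned int numargs)
{
	long copied = 0;
	unsigned int i;

	for (i = 0; i < numargs; i++) {
		if (args[i].size == 0) {
			if (i != 0)
				return -EINVAL;
			continue; /* zero-sized header placeholder */
		}
		copied += (long)args[i].size;
	}
	return copied;
}
```

A request with in_numargs == 0 simply yields 0 bytes here, while a
zero-sized entry at any index other than 0 is rejected.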


Thanks,
Bernd

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC v5 05/16] fuse: make args->in_args[0] to be always the header
  2024-11-14 21:05     ` Bernd Schubert
@ 2024-11-14 21:29       ` Joanne Koong
  2024-11-14 22:06         ` Bernd Schubert
  0 siblings, 1 reply; 29+ messages in thread
From: Joanne Koong @ 2024-11-14 21:29 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Bernd Schubert, Miklos Szeredi, Jens Axboe, Pavel Begunkov,
	linux-fsdevel, io-uring, Josef Bacik, Amir Goldstein, Ming Lei,
	David Wei, bernd

On Thu, Nov 14, 2024 at 1:05 PM Bernd Schubert
<[email protected]> wrote:
>
>
>
> On 11/14/24 21:57, Joanne Koong wrote:
> > On Thu, Nov 7, 2024 at 9:04 AM Bernd Schubert <[email protected]> wrote:
> >>
> >> This change sets up FUSE operations to have headers in args.in_args[0],
> >> even for opcodes without an actual header. We do this to prepare for
> >> cleanly separating payload from headers in the future.
> >>
> >> For opcodes without a header, we use a zero-sized struct as a
> >> placeholder. This approach:
> >> - Keeps things consistent across all FUSE operations
> >> - Will help with payload alignment later
> >> - Avoids future issues when header sizes change
> >>
> >> Signed-off-by: Bernd Schubert <[email protected]>
> >> ---
> >>  fs/fuse/dax.c    | 13 ++++++++-----
> >>  fs/fuse/dev.c    | 24 ++++++++++++++++++++----
> >>  fs/fuse/dir.c    | 41 +++++++++++++++++++++++++++--------------
> >>  fs/fuse/fuse_i.h |  7 +++++++
> >>  fs/fuse/xattr.c  |  9 ++++++---
> >>  5 files changed, 68 insertions(+), 26 deletions(-)
> >>
> >> diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
> >> index 12ef91d170bb3091ac35a33d2b9dc38330b00948..e459b8134ccb089f971bebf8da1f7fc5199c1271 100644
> >> --- a/fs/fuse/dax.c
> >> +++ b/fs/fuse/dax.c
> >> @@ -237,14 +237,17 @@ static int fuse_send_removemapping(struct inode *inode,
> >>         struct fuse_inode *fi = get_fuse_inode(inode);
> >>         struct fuse_mount *fm = get_fuse_mount(inode);
> >>         FUSE_ARGS(args);
> >> +       struct fuse_zero_in zero_arg;
> >>
> >>         args.opcode = FUSE_REMOVEMAPPING;
> >>         args.nodeid = fi->nodeid;
> >> -       args.in_numargs = 2;
> >> -       args.in_args[0].size = sizeof(*inargp);
> >> -       args.in_args[0].value = inargp;
> >> -       args.in_args[1].size = inargp->count * sizeof(*remove_one);
> >> -       args.in_args[1].value = remove_one;
> >> +       args.in_numargs = 3;
> >> +       args.in_args[0].size = sizeof(zero_arg);
> >> +       args.in_args[0].value = &zero_arg;
> >> +       args.in_args[1].size = sizeof(*inargp);
> >> +       args.in_args[1].value = inargp;
> >> +       args.in_args[2].size = inargp->count * sizeof(*remove_one);
> >> +       args.in_args[2].value = remove_one;
> >>         return fuse_simple_request(fm, &args);
> >>  }
> >>
> >> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> >> index dbc222f9b0f0e590ce3ef83077e6b4cff03cff65..6effef4073da3dad2f6140761eca98147a41d88d 100644
> >> --- a/fs/fuse/dev.c
> >> +++ b/fs/fuse/dev.c
> >> @@ -1007,6 +1007,19 @@ static int fuse_copy_args(struct fuse_copy_state *cs, unsigned numargs,
> >>
> >>         for (i = 0; !err && i < numargs; i++)  {
> >>                 struct fuse_arg *arg = &args[i];
> >> +
> >> +               /* zero headers */
> >> +               if (arg->size == 0) {
> >> +                       if (WARN_ON_ONCE(i != 0)) {
> >> +                               if (cs->req)
> >> +                                       pr_err_once(
> >> +                                               "fuse: zero size header in opcode %d\n",
> >> +                                               cs->req->in.h.opcode);
> >> +                               return -EINVAL;
> >> +                       }
> >> +                       continue;
> >> +               }
> >> +
> >>                 if (i == numargs - 1 && argpages)
> >>                         err = fuse_copy_pages(cs, arg->size, zeroing);
> >>                 else
> >> @@ -1662,6 +1675,7 @@ static int fuse_retrieve(struct fuse_mount *fm, struct inode *inode,
> >>         size_t args_size = sizeof(*ra);
> >>         struct fuse_args_pages *ap;
> >>         struct fuse_args *args;
> >> +       struct fuse_zero_in zero_arg;
> >>
> >>         offset = outarg->offset & ~PAGE_MASK;
> >>         file_size = i_size_read(inode);
> >> @@ -1688,7 +1702,7 @@ static int fuse_retrieve(struct fuse_mount *fm, struct inode *inode,
> >>         args = &ap->args;
> >>         args->nodeid = outarg->nodeid;
> >>         args->opcode = FUSE_NOTIFY_REPLY;
> >> -       args->in_numargs = 2;
> >> +       args->in_numargs = 3;
> >>         args->in_pages = true;
> >>         args->end = fuse_retrieve_end;
> >>
> >> @@ -1715,9 +1729,11 @@ static int fuse_retrieve(struct fuse_mount *fm, struct inode *inode,
> >>         }
> >>         ra->inarg.offset = outarg->offset;
> >>         ra->inarg.size = total_len;
> >> -       args->in_args[0].size = sizeof(ra->inarg);
> >> -       args->in_args[0].value = &ra->inarg;
> >> -       args->in_args[1].size = total_len;
> >> +       args->in_args[0].size = sizeof(zero_arg);
> >> +       args->in_args[0].value = &zero_arg;
> >> +       args->in_args[1].size = sizeof(ra->inarg);
> >> +       args->in_args[1].value = &ra->inarg;
> >> +       args->in_args[2].size = total_len;
> >>
> >>         err = fuse_simple_notify_reply(fm, args, outarg->notify_unique);
> >>         if (err)
> >
> > Do we also need to add a zero arg header for FUSE_READLINK,
> > FUSE_DESTROY, and FUSE_BATCH_FORGET requests as well?
> >
>
> Thanks for looking at the patch! I should have added to the commit message
> that I didn't modify these, as they don't have an in argument at all.
>

Thanks for clarifying! (and apologies for the late review. I haven't
been keeping up with these patches since RFC v3 but I'm planning to
get up to speed and take a deeper look at these tomorrow + next week).

I think the FUSE_BATCH_FORGET request does use in args, depending on
the number of forget requests.


Thanks,
Joanne
>
> Thanks,
> Bernd

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC v5 05/16] fuse: make args->in_args[0] to be always the header
  2024-11-14 21:29       ` Joanne Koong
@ 2024-11-14 22:06         ` Bernd Schubert
  2024-11-15  0:49           ` Joanne Koong
  0 siblings, 1 reply; 29+ messages in thread
From: Bernd Schubert @ 2024-11-14 22:06 UTC (permalink / raw)
  To: Joanne Koong
  Cc: Bernd Schubert, Miklos Szeredi, Jens Axboe, Pavel Begunkov,
	linux-fsdevel, io-uring, Josef Bacik, Amir Goldstein, Ming Lei,
	David Wei, bernd



On 11/14/24 22:29, Joanne Koong wrote:
> On Thu, Nov 14, 2024 at 1:05 PM Bernd Schubert
> <[email protected]> wrote:
>>
>>
>>
>> On 11/14/24 21:57, Joanne Koong wrote:
>>> On Thu, Nov 7, 2024 at 9:04 AM Bernd Schubert <[email protected]> wrote:
>>>>
>>>> This change sets up FUSE operations to have headers in args.in_args[0],
>>>> even for opcodes without an actual header. We do this to prepare for
>>>> cleanly separating payload from headers in the future.
>>>>
>>>> For opcodes without a header, we use a zero-sized struct as a
>>>> placeholder. This approach:
>>>> - Keeps things consistent across all FUSE operations
>>>> - Will help with payload alignment later
>>>> - Avoids future issues when header sizes change
>>>>
>>>> Signed-off-by: Bernd Schubert <[email protected]>
>>>> ---
>>>>  fs/fuse/dax.c    | 13 ++++++++-----
>>>>  fs/fuse/dev.c    | 24 ++++++++++++++++++++----
>>>>  fs/fuse/dir.c    | 41 +++++++++++++++++++++++++++--------------
>>>>  fs/fuse/fuse_i.h |  7 +++++++
>>>>  fs/fuse/xattr.c  |  9 ++++++---
>>>>  5 files changed, 68 insertions(+), 26 deletions(-)
>>>>
>>>> diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
>>>> index 12ef91d170bb3091ac35a33d2b9dc38330b00948..e459b8134ccb089f971bebf8da1f7fc5199c1271 100644
>>>> --- a/fs/fuse/dax.c
>>>> +++ b/fs/fuse/dax.c
>>>> @@ -237,14 +237,17 @@ static int fuse_send_removemapping(struct inode *inode,
>>>>         struct fuse_inode *fi = get_fuse_inode(inode);
>>>>         struct fuse_mount *fm = get_fuse_mount(inode);
>>>>         FUSE_ARGS(args);
>>>> +       struct fuse_zero_in zero_arg;
>>>>
>>>>         args.opcode = FUSE_REMOVEMAPPING;
>>>>         args.nodeid = fi->nodeid;
>>>> -       args.in_numargs = 2;
>>>> -       args.in_args[0].size = sizeof(*inargp);
>>>> -       args.in_args[0].value = inargp;
>>>> -       args.in_args[1].size = inargp->count * sizeof(*remove_one);
>>>> -       args.in_args[1].value = remove_one;
>>>> +       args.in_numargs = 3;
>>>> +       args.in_args[0].size = sizeof(zero_arg);
>>>> +       args.in_args[0].value = &zero_arg;
>>>> +       args.in_args[1].size = sizeof(*inargp);
>>>> +       args.in_args[1].value = inargp;
>>>> +       args.in_args[2].size = inargp->count * sizeof(*remove_one);
>>>> +       args.in_args[2].value = remove_one;
>>>>         return fuse_simple_request(fm, &args);
>>>>  }
>>>>
>>>> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
>>>> index dbc222f9b0f0e590ce3ef83077e6b4cff03cff65..6effef4073da3dad2f6140761eca98147a41d88d 100644
>>>> --- a/fs/fuse/dev.c
>>>> +++ b/fs/fuse/dev.c
>>>> @@ -1007,6 +1007,19 @@ static int fuse_copy_args(struct fuse_copy_state *cs, unsigned numargs,
>>>>
>>>>         for (i = 0; !err && i < numargs; i++)  {
>>>>                 struct fuse_arg *arg = &args[i];
>>>> +
>>>> +               /* zero headers */
>>>> +               if (arg->size == 0) {
>>>> +                       if (WARN_ON_ONCE(i != 0)) {
>>>> +                               if (cs->req)
>>>> +                                       pr_err_once(
>>>> +                                               "fuse: zero size header in opcode %d\n",
>>>> +                                               cs->req->in.h.opcode);
>>>> +                               return -EINVAL;
>>>> +                       }
>>>> +                       continue;
>>>> +               }
>>>> +
>>>>                 if (i == numargs - 1 && argpages)
>>>>                         err = fuse_copy_pages(cs, arg->size, zeroing);
>>>>                 else
>>>> @@ -1662,6 +1675,7 @@ static int fuse_retrieve(struct fuse_mount *fm, struct inode *inode,
>>>>         size_t args_size = sizeof(*ra);
>>>>         struct fuse_args_pages *ap;
>>>>         struct fuse_args *args;
>>>> +       struct fuse_zero_in zero_arg;
>>>>
>>>>         offset = outarg->offset & ~PAGE_MASK;
>>>>         file_size = i_size_read(inode);
>>>> @@ -1688,7 +1702,7 @@ static int fuse_retrieve(struct fuse_mount *fm, struct inode *inode,
>>>>         args = &ap->args;
>>>>         args->nodeid = outarg->nodeid;
>>>>         args->opcode = FUSE_NOTIFY_REPLY;
>>>> -       args->in_numargs = 2;
>>>> +       args->in_numargs = 3;
>>>>         args->in_pages = true;
>>>>         args->end = fuse_retrieve_end;
>>>>
>>>> @@ -1715,9 +1729,11 @@ static int fuse_retrieve(struct fuse_mount *fm, struct inode *inode,
>>>>         }
>>>>         ra->inarg.offset = outarg->offset;
>>>>         ra->inarg.size = total_len;
>>>> -       args->in_args[0].size = sizeof(ra->inarg);
>>>> -       args->in_args[0].value = &ra->inarg;
>>>> -       args->in_args[1].size = total_len;
>>>> +       args->in_args[0].size = sizeof(zero_arg);
>>>> +       args->in_args[0].value = &zero_arg;
>>>> +       args->in_args[1].size = sizeof(ra->inarg);
>>>> +       args->in_args[1].value = &ra->inarg;
>>>> +       args->in_args[2].size = total_len;
>>>>
>>>>         err = fuse_simple_notify_reply(fm, args, outarg->notify_unique);
>>>>         if (err)
>>>
>>> Do we also need to add a zero arg header for FUSE_READLINK,
>>> FUSE_DESTROY, and FUSE_BATCH_FORGET requests as well?
>>>
>>
>> Thanks for looking at the patch! I should have added to the commit message
>> that I didn't modify these, as they don't have an in argument at all.
>>
> 
> Thanks for clarifying! (and apologies for the late review. I haven't
> been keeping up with these patches since RFC v3 but I'm planning to
> get up to speed and take a deeper look at these tomorrow + next week).

No worries at all... I'm also very late with reviewing your patches.
I'm close to the next fuse-over-io-uring version, just fixing some bg
accounting issues that have been in all RFC versions so far.

> 
> I think the FUSE_BATCH_FORGET request does use in args, depending on
> the number of forget requests.

Ah right, but it does not use fuse_copy_args() and args->in_args[idx];
it is handled as a special case. And just looking it up again, the header
is in the right place. The issue would be more for over-io-uring, which
would need to copy into the payload. However, the current over-io-uring
patches don't handle forgets at all; they go over /dev/fuse. Unless you
disagree, I think we can do forgets over io-uring later on, as an
optimization.
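
For reference, a sketch of why the batch forget payload depends on the
number of forgets. The two structs below are local mirrors of the uapi
layout as I remember it (an assumption, please check include/uapi/linux/fuse.h),
and batch_forget_payload_len() is a hypothetical helper, not an existing
function.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirror of one forget entry (layout reproduced from memory). */
struct forget_one_model {
	uint64_t nodeid;
	uint64_t nlookup;
};

/* Local mirror of the fixed batch-forget header (from memory as well). */
struct batch_forget_in_model {
	uint32_t count;
	uint32_t dummy;
};

/* Bytes that follow the generic request header for `count` forgets. */
static size_t batch_forget_payload_len(uint32_t count)
{
	return sizeof(struct batch_forget_in_model) +
	       (size_t)count * sizeof(struct forget_one_model);
}
```

So the fixed part stays at the front and only the trailing array grows
with the number of forgets.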


Thanks,
Bernd



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC v5 05/16] fuse: make args->in_args[0] to be always the header
  2024-11-14 22:06         ` Bernd Schubert
@ 2024-11-15  0:49           ` Joanne Koong
  0 siblings, 0 replies; 29+ messages in thread
From: Joanne Koong @ 2024-11-15  0:49 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Bernd Schubert, Miklos Szeredi, Jens Axboe, Pavel Begunkov,
	linux-fsdevel, io-uring, Josef Bacik, Amir Goldstein, Ming Lei,
	David Wei, bernd

On Thu, Nov 14, 2024 at 2:06 PM Bernd Schubert
<[email protected]> wrote:
>
>
>
> On 11/14/24 22:29, Joanne Koong wrote:
> > On Thu, Nov 14, 2024 at 1:05 PM Bernd Schubert
> > <[email protected]> wrote:
> >>
> >>
> >>
> >> On 11/14/24 21:57, Joanne Koong wrote:
> >>> On Thu, Nov 7, 2024 at 9:04 AM Bernd Schubert <[email protected]> wrote:
> >>>>
> >>>> This change sets up FUSE operations to have headers in args.in_args[0],
> >>>> even for opcodes without an actual header. We do this to prepare for
> >>>> cleanly separating payload from headers in the future.
> >>>>
> >>>> For opcodes without a header, we use a zero-sized struct as a
> >>>> placeholder. This approach:
> >>>> - Keeps things consistent across all FUSE operations
> >>>> - Will help with payload alignment later
> >>>> - Avoids future issues when header sizes change
> >>>>
> >>>> Signed-off-by: Bernd Schubert <[email protected]>
> >>>> ---
> >>>>  fs/fuse/dax.c    | 13 ++++++++-----
> >>>>  fs/fuse/dev.c    | 24 ++++++++++++++++++++----
> >>>>  fs/fuse/dir.c    | 41 +++++++++++++++++++++++++++--------------
> >>>>  fs/fuse/fuse_i.h |  7 +++++++
> >>>>  fs/fuse/xattr.c  |  9 ++++++---
> >>>>  5 files changed, 68 insertions(+), 26 deletions(-)
> >>>>
> >>>> diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
> >>>> index 12ef91d170bb3091ac35a33d2b9dc38330b00948..e459b8134ccb089f971bebf8da1f7fc5199c1271 100644
> >>>> --- a/fs/fuse/dax.c
> >>>> +++ b/fs/fuse/dax.c
> >>>> @@ -237,14 +237,17 @@ static int fuse_send_removemapping(struct inode *inode,
> >>>>         struct fuse_inode *fi = get_fuse_inode(inode);
> >>>>         struct fuse_mount *fm = get_fuse_mount(inode);
> >>>>         FUSE_ARGS(args);
> >>>> +       struct fuse_zero_in zero_arg;
> >>>>
> >>>>         args.opcode = FUSE_REMOVEMAPPING;
> >>>>         args.nodeid = fi->nodeid;
> >>>> -       args.in_numargs = 2;
> >>>> -       args.in_args[0].size = sizeof(*inargp);
> >>>> -       args.in_args[0].value = inargp;
> >>>> -       args.in_args[1].size = inargp->count * sizeof(*remove_one);
> >>>> -       args.in_args[1].value = remove_one;
> >>>> +       args.in_numargs = 3;
> >>>> +       args.in_args[0].size = sizeof(zero_arg);
> >>>> +       args.in_args[0].value = &zero_arg;
> >>>> +       args.in_args[1].size = sizeof(*inargp);
> >>>> +       args.in_args[1].value = inargp;
> >>>> +       args.in_args[2].size = inargp->count * sizeof(*remove_one);
> >>>> +       args.in_args[2].value = remove_one;
> >>>>         return fuse_simple_request(fm, &args);
> >>>>  }
> >>>>
> >>>> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> >>>> index dbc222f9b0f0e590ce3ef83077e6b4cff03cff65..6effef4073da3dad2f6140761eca98147a41d88d 100644
> >>>> --- a/fs/fuse/dev.c
> >>>> +++ b/fs/fuse/dev.c
> >>>> @@ -1007,6 +1007,19 @@ static int fuse_copy_args(struct fuse_copy_state *cs, unsigned numargs,
> >>>>
> >>>>         for (i = 0; !err && i < numargs; i++)  {
> >>>>                 struct fuse_arg *arg = &args[i];
> >>>> +
> >>>> +               /* zero headers */
> >>>> +               if (arg->size == 0) {
> >>>> +                       if (WARN_ON_ONCE(i != 0)) {
> >>>> +                               if (cs->req)
> >>>> +                                       pr_err_once(
> >>>> +                                               "fuse: zero size header in opcode %d\n",
> >>>> +                                               cs->req->in.h.opcode);
> >>>> +                               return -EINVAL;
> >>>> +                       }
> >>>> +                       continue;
> >>>> +               }
> >>>> +
> >>>>                 if (i == numargs - 1 && argpages)
> >>>>                         err = fuse_copy_pages(cs, arg->size, zeroing);
> >>>>                 else
> >>>> @@ -1662,6 +1675,7 @@ static int fuse_retrieve(struct fuse_mount *fm, struct inode *inode,
> >>>>         size_t args_size = sizeof(*ra);
> >>>>         struct fuse_args_pages *ap;
> >>>>         struct fuse_args *args;
> >>>> +       struct fuse_zero_in zero_arg;
> >>>>
> >>>>         offset = outarg->offset & ~PAGE_MASK;
> >>>>         file_size = i_size_read(inode);
> >>>> @@ -1688,7 +1702,7 @@ static int fuse_retrieve(struct fuse_mount *fm, struct inode *inode,
> >>>>         args = &ap->args;
> >>>>         args->nodeid = outarg->nodeid;
> >>>>         args->opcode = FUSE_NOTIFY_REPLY;
> >>>> -       args->in_numargs = 2;
> >>>> +       args->in_numargs = 3;
> >>>>         args->in_pages = true;
> >>>>         args->end = fuse_retrieve_end;
> >>>>
> >>>> @@ -1715,9 +1729,11 @@ static int fuse_retrieve(struct fuse_mount *fm, struct inode *inode,
> >>>>         }
> >>>>         ra->inarg.offset = outarg->offset;
> >>>>         ra->inarg.size = total_len;
> >>>> -       args->in_args[0].size = sizeof(ra->inarg);
> >>>> -       args->in_args[0].value = &ra->inarg;
> >>>> -       args->in_args[1].size = total_len;
> >>>> +       args->in_args[0].size = sizeof(zero_arg);
> >>>> +       args->in_args[0].value = &zero_arg;
> >>>> +       args->in_args[1].size = sizeof(ra->inarg);
> >>>> +       args->in_args[1].value = &ra->inarg;
> >>>> +       args->in_args[2].size = total_len;
> >>>>
> >>>>         err = fuse_simple_notify_reply(fm, args, outarg->notify_unique);
> >>>>         if (err)
> >>>
> >>> Do we also need to add a zero arg header for FUSE_READLINK,
> >>> FUSE_DESTROY, and FUSE_BATCH_FORGET requests as well?
> >>>
> >>
> >> Thanks for looking at the patch! I should have added to the commit message
> >> that I didn't modify these, as they don't have an in argument at all.
> >>
> >
> > Thanks for clarifying! (and apologies for the late review. I haven't
> > been keeping up with these patches since RFC v3 but I'm planning to
> > get up to speed and take a deeper look at these tomorrow + next week).
>
> No worries at all... I'm also very late with reviewing your patches.
> I'm close for the next fuse-io-version, just fixing some bg accounting
> issues that had been in all rfc versions so far.
>

Awesome, I'll wait until your next fuse io version to review then.
Thanks for trucking along on this - I'm very excited to use this.

> >
> > I think the FUSE_BATCH_FORGET request does use in args, depending on
> > the number of forget requests.
>
> Ah right, but it does not use fuse_copy_args and args->in_args[idx] -
> is very special. And just looking it up again, the header is in the
> right place. Issue would be more for over-io-uring to copy into the
> payload. However, current over-io-uring patches don't handle forgets
> at all - it goes over /dev/fuse. Unless you disagree, I think we can
> do forgets later on over io-uring as optimization.
>

Not important at all - was just noting it in case you had meant to
include it as part of this patch.


Thanks,
Joanne
>
> Thanks,
> Bernd
>
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC v5 15/16] fuse: {io-uring} Prevent mount point hang on fuse-server termination
  2024-11-07 17:03 ` [PATCH RFC v5 15/16] fuse: {io-uring} Prevent mount point hang on fuse-server termination Bernd Schubert
@ 2024-11-18 19:32   ` Joanne Koong
  2024-11-18 19:55     ` Bernd Schubert
  2024-11-18 23:30   ` Joanne Koong
  1 sibling, 1 reply; 29+ messages in thread
From: Joanne Koong @ 2024-11-18 19:32 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Miklos Szeredi, Jens Axboe, Pavel Begunkov, linux-fsdevel,
	io-uring, Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd

On Thu, Nov 7, 2024 at 9:04 AM Bernd Schubert <[email protected]> wrote:
>
> When the fuse-server terminates while the fuse-client or kernel
> still has queued URING_CMDs, these commands retain references
> to the struct file used by the fuse connection. This prevents
> fuse_dev_release() from being invoked, resulting in a hung mount
> point.

Could you explain the flow of what happens after a fuse server
terminates? How does that trigger the IO_URING_F_CANCEL uring cmd?

>
> This patch addresses the issue by making queued URING_CMDs
> cancelable, allowing fuse_dev_release() to proceed as expected
> and preventing the mount point from hanging.
>
> Signed-off-by: Bernd Schubert <[email protected]>
> ---
>  fs/fuse/dev_uring.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 70 insertions(+), 6 deletions(-)
>
> diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
> index 6af515458695ccb2e32cc8c62c45471e6710c15f..b465da41c42c47eaf69f09bab1423061bc8fcc68 100644
> --- a/fs/fuse/dev_uring.c
> +++ b/fs/fuse/dev_uring.c
> @@ -23,6 +23,7 @@ MODULE_PARM_DESC(enable_uring,
>
>  struct fuse_uring_cmd_pdu {
>         struct fuse_ring_ent *ring_ent;
> +       struct fuse_ring_queue *queue;
>  };
>
>  /*
> @@ -382,6 +383,61 @@ void fuse_uring_stop_queues(struct fuse_ring *ring)
>         }
>  }
>
> +/*
> + * Handle IO_URING_F_CANCEL, typically should come on daemon termination
> + */
> +static void fuse_uring_cancel(struct io_uring_cmd *cmd,
> +                             unsigned int issue_flags, struct fuse_conn *fc)
> +{
> +       struct fuse_uring_cmd_pdu *pdu = (struct fuse_uring_cmd_pdu *)cmd->pdu;
> +       struct fuse_ring_queue *queue = pdu->queue;
> +       struct fuse_ring_ent *ent;
> +       bool found = false;
> +       bool need_cmd_done = false;
> +
> +       spin_lock(&queue->lock);
> +
> +       /* XXX: This is cumbersome for large queues. */
> +       list_for_each_entry(ent, &queue->ent_avail_queue, list) {
> +               if (pdu->ring_ent == ent) {
> +                       found = true;
> +                       break;
> +               }
> +       }
> +
> +       if (!found) {
> +               pr_info("qid=%d Did not find ent=%p", queue->qid, ent);
> +               spin_unlock(&queue->lock);
> +               return;
> +       }
> +
> +       if (ent->state == FRRS_WAIT) {
> +               ent->state = FRRS_USERSPACE;
> +               list_move(&ent->list, &queue->ent_in_userspace);
> +               need_cmd_done = true;
> +       }
> +       spin_unlock(&queue->lock);
> +
> +       if (need_cmd_done)
> +               io_uring_cmd_done(cmd, -ENOTCONN, 0, issue_flags);
> +
> +       /*
> +        * releasing the last entry should trigger fuse_dev_release() if
> +        * the daemon was terminated
> +        */
> +}
> +
> +static void fuse_uring_prepare_cancel(struct io_uring_cmd *cmd, int issue_flags,
> +                                     struct fuse_ring_ent *ring_ent)
> +{
> +       struct fuse_uring_cmd_pdu *pdu = (struct fuse_uring_cmd_pdu *)cmd->pdu;
> +
> +       pdu->ring_ent = ring_ent;
> +       pdu->queue = ring_ent->queue;
> +
> +       io_uring_cmd_mark_cancelable(cmd, issue_flags);
> +}
> +
>  /*
>   * Checks for errors and stores it into the request
>   */
> @@ -606,7 +662,8 @@ static int fuse_uring_send_next_to_ring(struct fuse_ring_ent *ring_ent)
>   * Put a ring request onto hold, it is no longer used for now.
>   */
>  static void fuse_uring_ent_avail(struct fuse_ring_ent *ring_ent,
> -                                struct fuse_ring_queue *queue)
> +                                struct fuse_ring_queue *queue,
> +                                unsigned int issue_flags)
>         __must_hold(&queue->lock)
>  {
>         struct fuse_ring *ring = queue->ring;
> @@ -626,6 +683,7 @@ static void fuse_uring_ent_avail(struct fuse_ring_ent *ring_ent,
>         list_move(&ring_ent->list, &queue->ent_avail_queue);
>
>         ring_ent->state = FRRS_WAIT;
> +       fuse_uring_prepare_cancel(ring_ent->cmd, issue_flags, ring_ent);
>  }
>
>  /* Used to find the request on SQE commit */
> @@ -729,7 +787,8 @@ static void fuse_uring_commit(struct fuse_ring_ent *ring_ent,
>   * Get the next fuse req and send it
>   */
>  static void fuse_uring_next_fuse_req(struct fuse_ring_ent *ring_ent,
> -                                   struct fuse_ring_queue *queue)
> +                                   struct fuse_ring_queue *queue,
> +                                   unsigned int issue_flags)
>  {
>         int has_next, err;
>         int prev_state = ring_ent->state;
> @@ -738,7 +797,7 @@ static void fuse_uring_next_fuse_req(struct fuse_ring_ent *ring_ent,
>                 spin_lock(&queue->lock);
>                 has_next = fuse_uring_ent_assign_req(ring_ent);
>                 if (!has_next) {
> -                       fuse_uring_ent_avail(ring_ent, queue);
> +                       fuse_uring_ent_avail(ring_ent, queue, issue_flags);
>                         spin_unlock(&queue->lock);
>                         break; /* no request left */
>                 }
> @@ -813,7 +872,7 @@ static int fuse_uring_commit_fetch(struct io_uring_cmd *cmd, int issue_flags,
>          * and fetching is done in one step vs legacy fuse, which has separated
>          * read (fetch request) and write (commit result).
>          */
> -       fuse_uring_next_fuse_req(ring_ent, queue);
> +       fuse_uring_next_fuse_req(ring_ent, queue, issue_flags);
>         return 0;
>  }
>
> @@ -853,7 +912,7 @@ static void _fuse_uring_fetch(struct fuse_ring_ent *ring_ent,
>         struct fuse_ring *ring = queue->ring;
>
>         spin_lock(&queue->lock);
> -       fuse_uring_ent_avail(ring_ent, queue);
> +       fuse_uring_ent_avail(ring_ent, queue, issue_flags);
>         ring_ent->cmd = cmd;
>         spin_unlock(&queue->lock);
>
> @@ -1021,6 +1080,11 @@ int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
>         if (fc->aborted)
>                 goto out;
>
> +       if ((unlikely(issue_flags & IO_URING_F_CANCEL))) {
> +               fuse_uring_cancel(cmd, issue_flags, fc);
> +               return 0;
> +       }
> +
>         switch (cmd_op) {
>         case FUSE_URING_REQ_FETCH:
>                 err = fuse_uring_fetch(cmd, issue_flags, fc);
> @@ -1080,7 +1144,7 @@ fuse_uring_send_req_in_task(struct io_uring_cmd *cmd,
>
>         return;
>  err:
> -       fuse_uring_next_fuse_req(ring_ent, queue);
> +       fuse_uring_next_fuse_req(ring_ent, queue, issue_flags);
>  }
>
>  /* queue a fuse request and send it if a ring entry is available */
>
> --
> 2.43.0
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC v5 15/16] fuse: {io-uring} Prevent mount point hang on fuse-server termination
  2024-11-18 19:32   ` Joanne Koong
@ 2024-11-18 19:55     ` Bernd Schubert
  2024-11-18 23:10       ` Joanne Koong
  0 siblings, 1 reply; 29+ messages in thread
From: Bernd Schubert @ 2024-11-18 19:55 UTC (permalink / raw)
  To: Joanne Koong, Bernd Schubert
  Cc: Miklos Szeredi, Jens Axboe, Pavel Begunkov, linux-fsdevel,
	io-uring, Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd



On 11/18/24 20:32, Joanne Koong wrote:
> On Thu, Nov 7, 2024 at 9:04 AM Bernd Schubert <[email protected]> wrote:
>>
>> When the fuse-server terminates while the fuse-client or kernel
>> still has queued URING_CMDs, these commands retain references
>> to the struct file used by the fuse connection. This prevents
>> fuse_dev_release() from being invoked, resulting in a hung mount
>> point.
> 
> Could you explain the flow of what happens after a fuse server
> terminates? How does that trigger the IO_URING_F_CANCEL uring cmd?

This is all about daemon termination while the mount point is still
alive. Without this patch, even a plain (non-forced) umount hangs.
Without queued IORING_OP_URING_CMDs, daemon termination results in a
call into fuse_dev_release(). With queued IORING_OP_URING_CMDs this
does not happen, as each of these commands holds a file reference.

IO_URING_F_CANCEL is triggered from io-uring, I guess when
the io-uring file descriptor is released
(note: 'io-uring fd' != '/dev/fuse fd').
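
The reference problem described above can be sketched as a small userspace
model. This is only an illustration, not kernel code: struct refmodel_file
and all refmodel_* helpers are hypothetical names, and the 'released' flag
merely stands in for fuse_dev_release() being called once the last
reference is dropped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of a refcounted file; not the kernel's struct file. */
struct refmodel_file {
	int refs;       /* outstanding references */
	bool released;  /* set when the last reference is dropped */
};

/* A queued command takes a reference on the file. */
static void refmodel_cmd_queue(struct refmodel_file *f)
{
	f->refs++;
}

/* Drop one reference; the release hook only runs at zero. */
static void refmodel_put(struct refmodel_file *f)
{
	if (--f->refs == 0)
		f->released = true;  /* stands in for fuse_dev_release() */
}

/* Daemon termination drops only the reference held by the fd itself. */
static void refmodel_daemon_exit(struct refmodel_file *f)
{
	refmodel_put(f);
}

/* Completing or cancelling a queued command drops the reference it held. */
static void refmodel_cmd_done(struct refmodel_file *f)
{
	refmodel_put(f);
}
```

In this model, the release hook cannot run after daemon exit until every
queued command has been completed, which is what making the commands
cancelable is meant to allow.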


I guess I need to improve the commit message a bit.



Cheers,
Bernd






* Re: [PATCH RFC v5 15/16] fuse: {io-uring} Prevent mount point hang on fuse-server termination
  2024-11-18 19:55     ` Bernd Schubert
@ 2024-11-18 23:10       ` Joanne Koong
  0 siblings, 0 replies; 29+ messages in thread
From: Joanne Koong @ 2024-11-18 23:10 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Bernd Schubert, Miklos Szeredi, Jens Axboe, Pavel Begunkov,
	linux-fsdevel, io-uring, Josef Bacik, Amir Goldstein, Ming Lei,
	David Wei, bernd

On Mon, Nov 18, 2024 at 11:55 AM Bernd Schubert
<[email protected]> wrote:
>
>
> On 11/18/24 20:32, Joanne Koong wrote:
> > On Thu, Nov 7, 2024 at 9:04 AM Bernd Schubert <[email protected]> wrote:
> >>
> >> When the fuse-server terminates while the fuse-client or kernel
> >> still has queued URING_CMDs, these commands retain references
> >> to the struct file used by the fuse connection. This prevents
> >> fuse_dev_release() from being invoked, resulting in a hung mount
> >> point.
> >
> > Could you explain the flow of what happens after a fuse server
> > terminates? How does that trigger the IO_URING_F_CANCEL uring cmd?
>
> This is all about daemon termination, when the mount point is still
> alive. Basically without this patch even plain (non forced umount)
> hangs.
> Without queued IORING_OP_URING_CMDs there is a call into
> fuse_dev_release() on daemon termination, with queued
> IORING_OP_URING_CMDs this doesn't happen as each of these commands
> holds a file reference.
>
> IO_URING_F_CANCEL is triggered from io-uring, I guess when
> the io-uring file descriptor is released
> (note: 'io-uring fd' != '/dev/fuse fd').

Gotcha. I took a look at the io_uring code, and the call chain looks
something like this:

io_uring_release()
  io_ring_ctx_wait_and_kill()
    io_ring_exit_work()
      io_uring_try_cancel_requests()
        io_uring_try_cancel_uring_cmd()
          file->f_op->uring_cmd() w/ IO_URING_F_CANCEL set in issue_flags
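
As a rough userspace sketch of the last step of that chain: on ring
release, the same handler is called again for each still-queued command,
now with a cancel bit set, and the handler checks that bit before its
opcode switch (as fuse_uring_cmd() does in the patch). Everything named
dispatch_* or DISPATCH_* is hypothetical, and DISPATCH_F_CANCEL is only a
stand-in with an arbitrary value, not the kernel's IO_URING_F_CANCEL.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for IO_URING_F_CANCEL; the value is arbitrary. */
#define DISPATCH_F_CANCEL (1u << 0)

enum dispatch_op { DISPATCH_REQ_FETCH, DISPATCH_REQ_COMMIT };

struct dispatch_cmd {
	enum dispatch_op op;
	bool cancelled;   /* set by the cancel path */
	bool dispatched;  /* set by the normal opcode path */
};

/* Shape of the handler: the cancel flag is checked before the opcode switch. */
static int dispatch_uring_cmd(struct dispatch_cmd *cmd, unsigned int issue_flags)
{
	if (issue_flags & DISPATCH_F_CANCEL) {
		cmd->cancelled = true;
		return 0;
	}

	switch (cmd->op) {
	case DISPATCH_REQ_FETCH:
	case DISPATCH_REQ_COMMIT:
		cmd->dispatched = true;
		return 0;
	}
	return -1;
}

/* Models ring release: every queued command is re-issued with the cancel flag. */
static int dispatch_ring_release(struct dispatch_cmd *cmds, int n)
{
	int cancelled = 0;

	for (int i = 0; i < n; i++) {
		dispatch_uring_cmd(&cmds[i], DISPATCH_F_CANCEL);
		if (cmds[i].cancelled)
			cancelled++;
	}
	return cancelled;
}
```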

>
>
> I guess I need to improve the commit message a bit.
>
>
>
> Cheers,
> Bernd
>
>
>
>


* Re: [PATCH RFC v5 15/16] fuse: {io-uring} Prevent mount point hang on fuse-server termination
  2024-11-07 17:03 ` [PATCH RFC v5 15/16] fuse: {io-uring} Prevent mount point hang on fuse-server termination Bernd Schubert
  2024-11-18 19:32   ` Joanne Koong
@ 2024-11-18 23:30   ` Joanne Koong
  2024-11-18 23:47     ` Bernd Schubert
  1 sibling, 1 reply; 29+ messages in thread
From: Joanne Koong @ 2024-11-18 23:30 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Miklos Szeredi, Jens Axboe, Pavel Begunkov, linux-fsdevel,
	io-uring, Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd

On Thu, Nov 7, 2024 at 9:04 AM Bernd Schubert <[email protected]> wrote:
>
> When the fuse-server terminates while the fuse-client or kernel
> still has queued URING_CMDs, these commands retain references
> to the struct file used by the fuse connection. This prevents
> fuse_dev_release() from being invoked, resulting in a hung mount
> point.
>
> This patch addresses the issue by making queued URING_CMDs
> cancelable, allowing fuse_dev_release() to proceed as expected
> and preventing the mount point from hanging.
>
> Signed-off-by: Bernd Schubert <[email protected]>
> ---
>  fs/fuse/dev_uring.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 70 insertions(+), 6 deletions(-)
>
> diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
> index 6af515458695ccb2e32cc8c62c45471e6710c15f..b465da41c42c47eaf69f09bab1423061bc8fcc68 100644
> --- a/fs/fuse/dev_uring.c
> +++ b/fs/fuse/dev_uring.c
> @@ -23,6 +23,7 @@ MODULE_PARM_DESC(enable_uring,
>
>  struct fuse_uring_cmd_pdu {
>         struct fuse_ring_ent *ring_ent;
> +       struct fuse_ring_queue *queue;
>  };
>
>  /*
> @@ -382,6 +383,61 @@ void fuse_uring_stop_queues(struct fuse_ring *ring)
>         }
>  }
>
> +/*
> + * Handle IO_URING_F_CANCEL, typically should come on daemon termination
> + */
> +static void fuse_uring_cancel(struct io_uring_cmd *cmd,
> +                             unsigned int issue_flags, struct fuse_conn *fc)
> +{
> +       struct fuse_uring_cmd_pdu *pdu = (struct fuse_uring_cmd_pdu *)cmd->pdu;
> +       struct fuse_ring_queue *queue = pdu->queue;
> +       struct fuse_ring_ent *ent;
> +       bool found = false;
> +       bool need_cmd_done = false;
> +
> +       spin_lock(&queue->lock);
> +
> +       /* XXX: This is cumbersome for large queues. */
> +       list_for_each_entry(ent, &queue->ent_avail_queue, list) {
> +               if (pdu->ring_ent == ent) {
> +                       found = true;
> +                       break;
> +               }
> +       }

Do we have to check that the entry is on the ent_avail_queue, or can
we assume that if the ent->state is FRRS_WAIT, the only queue it'll be
on is the ent_avail_queue? I see only one case where this isn't true,
for teardown in fuse_uring_stop_list_entries() - if we had a
workaround for this, eg having some teardown state signifying that
io_uring_cmd_done() needs to be called on the cmd and clearing
FRRS_WAIT, then we could get rid of iteration through ent_avail_queue
for every cancelled cmd.

> +
> +       if (!found) {
> +               pr_info("qid=%d Did not find ent=%p", queue->qid, ent);
> +               spin_unlock(&queue->lock);
> +               return;
> +       }
> +
> +       if (ent->state == FRRS_WAIT) {
> +               ent->state = FRRS_USERSPACE;
> +               list_move(&ent->list, &queue->ent_in_userspace);
> +               need_cmd_done = true;
> +       }
> +       spin_unlock(&queue->lock);
> +
> +       if (need_cmd_done)
> +               io_uring_cmd_done(cmd, -ENOTCONN, 0, issue_flags);
> +
> +       /*
> +        * releasing the last entry should trigger fuse_dev_release() if
> +        * the daemon was terminated
> +        */
> +}
> +
> +static void fuse_uring_prepare_cancel(struct io_uring_cmd *cmd, int issue_flags,
> +                                     struct fuse_ring_ent *ring_ent)
> +{
> +       struct fuse_uring_cmd_pdu *pdu = (struct fuse_uring_cmd_pdu *)cmd->pdu;
> +
> +       pdu->ring_ent = ring_ent;
> +       pdu->queue = ring_ent->queue;
> +
> +       io_uring_cmd_mark_cancelable(cmd, issue_flags);
> +}
> +
>  /*
>   * Checks for errors and stores it into the request
>   */
> @@ -606,7 +662,8 @@ static int fuse_uring_send_next_to_ring(struct fuse_ring_ent *ring_ent)
>   * Put a ring request onto hold, it is no longer used for now.
>   */
>  static void fuse_uring_ent_avail(struct fuse_ring_ent *ring_ent,
> -                                struct fuse_ring_queue *queue)
> +                                struct fuse_ring_queue *queue,
> +                                unsigned int issue_flags)
>         __must_hold(&queue->lock)
>  {
>         struct fuse_ring *ring = queue->ring;
> @@ -626,6 +683,7 @@ static void fuse_uring_ent_avail(struct fuse_ring_ent *ring_ent,
>         list_move(&ring_ent->list, &queue->ent_avail_queue);
>
>         ring_ent->state = FRRS_WAIT;
> +       fuse_uring_prepare_cancel(ring_ent->cmd, issue_flags, ring_ent);
>  }
>
>  /* Used to find the request on SQE commit */
> @@ -729,7 +787,8 @@ static void fuse_uring_commit(struct fuse_ring_ent *ring_ent,
>   * Get the next fuse req and send it
>   */
>  static void fuse_uring_next_fuse_req(struct fuse_ring_ent *ring_ent,
> -                                   struct fuse_ring_queue *queue)
> +                                   struct fuse_ring_queue *queue,
> +                                   unsigned int issue_flags)
>  {
>         int has_next, err;
>         int prev_state = ring_ent->state;
> @@ -738,7 +797,7 @@ static void fuse_uring_next_fuse_req(struct fuse_ring_ent *ring_ent,
>                 spin_lock(&queue->lock);
>                 has_next = fuse_uring_ent_assign_req(ring_ent);
>                 if (!has_next) {
> -                       fuse_uring_ent_avail(ring_ent, queue);
> +                       fuse_uring_ent_avail(ring_ent, queue, issue_flags);
>                         spin_unlock(&queue->lock);
>                         break; /* no request left */
>                 }
> @@ -813,7 +872,7 @@ static int fuse_uring_commit_fetch(struct io_uring_cmd *cmd, int issue_flags,
>          * and fetching is done in one step vs legacy fuse, which has separated
>          * read (fetch request) and write (commit result).
>          */
> -       fuse_uring_next_fuse_req(ring_ent, queue);
> +       fuse_uring_next_fuse_req(ring_ent, queue, issue_flags);
>         return 0;
>  }
>
> @@ -853,7 +912,7 @@ static void _fuse_uring_fetch(struct fuse_ring_ent *ring_ent,
>         struct fuse_ring *ring = queue->ring;
>
>         spin_lock(&queue->lock);
> -       fuse_uring_ent_avail(ring_ent, queue);
> +       fuse_uring_ent_avail(ring_ent, queue, issue_flags);
>         ring_ent->cmd = cmd;
>         spin_unlock(&queue->lock);
>
> @@ -1021,6 +1080,11 @@ int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
>         if (fc->aborted)
>                 goto out;
>
> +       if ((unlikely(issue_flags & IO_URING_F_CANCEL))) {
> +               fuse_uring_cancel(cmd, issue_flags, fc);
> +               return 0;
> +       }
> +
>         switch (cmd_op) {
>         case FUSE_URING_REQ_FETCH:
>                 err = fuse_uring_fetch(cmd, issue_flags, fc);
> @@ -1080,7 +1144,7 @@ fuse_uring_send_req_in_task(struct io_uring_cmd *cmd,
>
>         return;
>  err:
> -       fuse_uring_next_fuse_req(ring_ent, queue);
> +       fuse_uring_next_fuse_req(ring_ent, queue, issue_flags);
>  }
>
>  /* queue a fuse request and send it if a ring entry is available */
>
> --
> 2.43.0
>


* Re: [PATCH RFC v5 15/16] fuse: {io-uring} Prevent mount point hang on fuse-server termination
  2024-11-18 23:30   ` Joanne Koong
@ 2024-11-18 23:47     ` Bernd Schubert
  2024-11-19  2:02       ` Joanne Koong
  0 siblings, 1 reply; 29+ messages in thread
From: Bernd Schubert @ 2024-11-18 23:47 UTC (permalink / raw)
  To: Joanne Koong, Bernd Schubert
  Cc: Miklos Szeredi, Jens Axboe, Pavel Begunkov, linux-fsdevel,
	io-uring, Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd



On 11/19/24 00:30, Joanne Koong wrote:
> On Thu, Nov 7, 2024 at 9:04 AM Bernd Schubert <[email protected]> wrote:
>>
>> When the fuse-server terminates while the fuse-client or kernel
>> still has queued URING_CMDs, these commands retain references
>> to the struct file used by the fuse connection. This prevents
>> fuse_dev_release() from being invoked, resulting in a hung mount
>> point.
>>
>> This patch addresses the issue by making queued URING_CMDs
>> cancelable, allowing fuse_dev_release() to proceed as expected
>> and preventing the mount point from hanging.
>>
>> Signed-off-by: Bernd Schubert <[email protected]>
>> ---
>>  fs/fuse/dev_uring.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++-----
>>  1 file changed, 70 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
>> index 6af515458695ccb2e32cc8c62c45471e6710c15f..b465da41c42c47eaf69f09bab1423061bc8fcc68 100644
>> --- a/fs/fuse/dev_uring.c
>> +++ b/fs/fuse/dev_uring.c
>> @@ -23,6 +23,7 @@ MODULE_PARM_DESC(enable_uring,
>>
>>  struct fuse_uring_cmd_pdu {
>>         struct fuse_ring_ent *ring_ent;
>> +       struct fuse_ring_queue *queue;
>>  };
>>
>>  /*
>> @@ -382,6 +383,61 @@ void fuse_uring_stop_queues(struct fuse_ring *ring)
>>         }
>>  }
>>
>> +/*
>> + * Handle IO_URING_F_CANCEL, typically should come on daemon termination
>> + */
>> +static void fuse_uring_cancel(struct io_uring_cmd *cmd,
>> +                             unsigned int issue_flags, struct fuse_conn *fc)
>> +{
>> +       struct fuse_uring_cmd_pdu *pdu = (struct fuse_uring_cmd_pdu *)cmd->pdu;
>> +       struct fuse_ring_queue *queue = pdu->queue;
>> +       struct fuse_ring_ent *ent;
>> +       bool found = false;
>> +       bool need_cmd_done = false;
>> +
>> +       spin_lock(&queue->lock);
>> +
>> +       /* XXX: This is cumbersome for large queues. */
>> +       list_for_each_entry(ent, &queue->ent_avail_queue, list) {
>> +               if (pdu->ring_ent == ent) {
>> +                       found = true;
>> +                       break;
>> +               }
>> +       }
> 
> Do we have to check that the entry is on the ent_avail_queue, or can
> we assume that if the ent->state is FRRS_WAIT, the only queue it'll be
> on is the ent_avail_queue? I see only one case where this isn't true,
> for teardown in fuse_uring_stop_list_entries() - if we had a
> workaround for this, eg having some teardown state signifying that
> io_uring_cmd_done() needs to be called on the cmd and clearing
> FRRS_WAIT, then we could get rid of iteration through ent_avail_queue
> for every cancelled cmd.


I'm scared that we would run into races - I don't want to access memory
pointed to by pdu->ring_ent before I'm sure it is on the list.
Remember the long discussion Miklos and I had about the tiny 'tag'
variable and finding requests using existing hash lists [0]?
Personally I would prefer an array of

struct queue_entries {
	struct fuse_ring_ent *ring_ent;
	bool valid;
};


in struct fuse_ring_queue {
    ...
    struct queue_entries *entries[];
}

And that array would only get freed on queue destruction. Besides
avoiding hash lists, it would also allow using 'valid' to tell whether
we can access the ring_ent and then check its state.
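
A minimal userspace sketch of that idea, under a few assumptions that are
not in the patch: the cancel path is handed a slot index ('tag') instead of
a raw pointer, locking is omitted, and all slot_* / SLOT_* names are
hypothetical. The slot array outlives the entries, so 'valid' can be
checked before the entry pointer is ever dereferenced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

enum slot_state { SLOT_WAIT, SLOT_USERSPACE };

struct slot_ring_ent {
	enum slot_state state;
};

struct slot_entry {
	struct slot_ring_ent *ring_ent;
	bool valid;                  /* false once ring_ent has been freed */
};

struct slot_queue {
	size_t nr_entries;
	struct slot_entry *entries;  /* freed only on queue destruction */
};

static struct slot_queue *slot_queue_create(size_t n)
{
	struct slot_queue *q = malloc(sizeof(*q));

	if (!q)
		return NULL;
	q->entries = calloc(n, sizeof(*q->entries));
	if (!q->entries) {
		free(q);
		return NULL;
	}
	q->nr_entries = n;
	for (size_t i = 0; i < n; i++) {
		/* calloc() zeroes the entry, so it starts in SLOT_WAIT */
		q->entries[i].ring_ent = calloc(1, sizeof(struct slot_ring_ent));
		q->entries[i].valid = q->entries[i].ring_ent != NULL;
	}
	return q;
}

/* Teardown of one entry: free it and invalidate its slot. */
static void slot_entry_teardown(struct slot_queue *q, size_t tag)
{
	if (tag >= q->nr_entries || !q->entries[tag].valid)
		return;
	free(q->entries[tag].ring_ent);
	q->entries[tag].ring_ent = NULL;
	q->entries[tag].valid = false;
}

/* Cancel: O(1) lookup by tag; returns true if the command must be completed. */
static bool slot_cancel(struct slot_queue *q, size_t tag)
{
	struct slot_entry *slot;

	if (tag >= q->nr_entries)
		return false;
	slot = &q->entries[tag];
	if (!slot->valid || slot->ring_ent->state != SLOT_WAIT)
		return false;
	slot->ring_ent->state = SLOT_USERSPACE;
	return true;
}

static void slot_queue_destroy(struct slot_queue *q)
{
	for (size_t i = 0; i < q->nr_entries; i++)
		free(q->entries[i].ring_ent);  /* free(NULL) is a no-op */
	free(q->entries);
	free(q);
}
```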

Thanks,
Bernd


[0] https://lore.kernel.org/linux-fsdevel/CAJfpegu_UQ1BNp0UDHeOZFWwUoXbJ_LP4W=o+UX6MB3DsJbH8g@mail.gmail.com/T/#t


* Re: [PATCH RFC v5 15/16] fuse: {io-uring} Prevent mount point hang on fuse-server termination
  2024-11-18 23:47     ` Bernd Schubert
@ 2024-11-19  2:02       ` Joanne Koong
  2024-11-19  9:32         ` Bernd Schubert
  0 siblings, 1 reply; 29+ messages in thread
From: Joanne Koong @ 2024-11-19  2:02 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Bernd Schubert, Miklos Szeredi, Jens Axboe, Pavel Begunkov,
	linux-fsdevel, io-uring, Josef Bacik, Amir Goldstein, Ming Lei,
	David Wei, bernd

On Mon, Nov 18, 2024 at 3:47 PM Bernd Schubert
<[email protected]> wrote:
>
> On 11/19/24 00:30, Joanne Koong wrote:
> > On Thu, Nov 7, 2024 at 9:04 AM Bernd Schubert <[email protected]> wrote:
> >>
> >> When the fuse-server terminates while the fuse-client or kernel
> >> still has queued URING_CMDs, these commands retain references
> >> to the struct file used by the fuse connection. This prevents
> >> fuse_dev_release() from being invoked, resulting in a hung mount
> >> point.
> >>
> >> This patch addresses the issue by making queued URING_CMDs
> >> cancelable, allowing fuse_dev_release() to proceed as expected
> >> and preventing the mount point from hanging.
> >>
> >> Signed-off-by: Bernd Schubert <[email protected]>
> >> ---
> >>  fs/fuse/dev_uring.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++-----
> >>  1 file changed, 70 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
> >> index 6af515458695ccb2e32cc8c62c45471e6710c15f..b465da41c42c47eaf69f09bab1423061bc8fcc68 100644
> >> --- a/fs/fuse/dev_uring.c
> >> +++ b/fs/fuse/dev_uring.c
> >> @@ -23,6 +23,7 @@ MODULE_PARM_DESC(enable_uring,
> >>
> >>  struct fuse_uring_cmd_pdu {
> >>         struct fuse_ring_ent *ring_ent;
> >> +       struct fuse_ring_queue *queue;
> >>  };
> >>
> >>  /*
> >> @@ -382,6 +383,61 @@ void fuse_uring_stop_queues(struct fuse_ring *ring)
> >>         }
> >>  }
> >>
> >> +/*
> >> + * Handle IO_URING_F_CANCEL, typically should come on daemon termination
> >> + */
> >> +static void fuse_uring_cancel(struct io_uring_cmd *cmd,
> >> +                             unsigned int issue_flags, struct fuse_conn *fc)
> >> +{
> >> +       struct fuse_uring_cmd_pdu *pdu = (struct fuse_uring_cmd_pdu *)cmd->pdu;
> >> +       struct fuse_ring_queue *queue = pdu->queue;
> >> +       struct fuse_ring_ent *ent;
> >> +       bool found = false;
> >> +       bool need_cmd_done = false;
> >> +
> >> +       spin_lock(&queue->lock);
> >> +
> >> +       /* XXX: This is cumbersome for large queues. */
> >> +       list_for_each_entry(ent, &queue->ent_avail_queue, list) {
> >> +               if (pdu->ring_ent == ent) {
> >> +                       found = true;
> >> +                       break;
> >> +               }
> >> +       }
> >
> > Do we have to check that the entry is on the ent_avail_queue, or can
> > we assume that if the ent->state is FRRS_WAIT, the only queue it'll be
> > on is the ent_avail_queue? I see only one case where this isn't true,
> > for teardown in fuse_uring_stop_list_entries() - if we had a
> > workaround for this, eg having some teardown state signifying that
> > io_uring_cmd_done() needs to be called on the cmd and clearing
> > FRRS_WAIT, then we could get rid of iteration through ent_avail_queue
> > for every cancelled cmd.
>
>
> I'm scared that we would run into races - I don't want to access memory
> pointed to by pdu->ring_ent before I'm sure it is on the list.

Oh, I was seeing that we mark the cmd as cancellable (eg in
fuse_uring_prepare_cancel()) only after the ring_ent is moved to the
ent_avail_queue (in fuse_uring_ent_avail()) and that this is done in
the scope of the queue->lock, so we would only call into
fuse_uring_cancel() when the ring_ent is on the list. Could there
still be a race condition here?

Alternatively, inspired by your "bool valid;" idea below, maybe
another solution would be having a bit in "struct fuse_ring_ent"
tracking if io_uring_cmd_done() needs to be called on it?

This is fairly unimportant though - this part could always be
optimized in a future patchset if you think it needs to be optimized,
but was just curious if these would work.


Thanks,
Joanne

> Remember the long discussion Miklos and I had about the tiny 'tag'
> variable and finding requests using existing hash lists [0] ?
> Personally I would prefer an array of
>
> struct queue_entries {
>         struct fuse_ring_ent *ring_ent;
>         bool valid;
> }
>
>
> in struct fuse_ring_queue {
>     ...
>     struct queue_entries *entries[]
> }
>
> And that array would only get freed on queue destruction. Besides
> avoiding hash lists, it would also allow to use 'valid' to know if
> we can access the ring_entry and then check the state.
>
> Thanks,
> Bernd
>
>
> [0] https://lore.kernel.org/linux-fsdevel/CAJfpegu_UQ1BNp0UDHeOZFWwUoXbJ_LP4W=o+UX6MB3DsJbH8g@mail.gmail.com/T/#t


* Re: [PATCH RFC v5 15/16] fuse: {io-uring} Prevent mount point hang on fuse-server termination
  2024-11-19  2:02       ` Joanne Koong
@ 2024-11-19  9:32         ` Bernd Schubert
  0 siblings, 0 replies; 29+ messages in thread
From: Bernd Schubert @ 2024-11-19  9:32 UTC (permalink / raw)
  To: Joanne Koong
  Cc: Bernd Schubert, Miklos Szeredi, Jens Axboe, Pavel Begunkov,
	linux-fsdevel, io-uring, Josef Bacik, Amir Goldstein, Ming Lei,
	David Wei, bernd



On 11/19/24 03:02, Joanne Koong wrote:
> On Mon, Nov 18, 2024 at 3:47 PM Bernd Schubert
> <[email protected]> wrote:
>>
>> On 11/19/24 00:30, Joanne Koong wrote:
>>> On Thu, Nov 7, 2024 at 9:04 AM Bernd Schubert <[email protected]> wrote:
>>>>
>>>> When the fuse-server terminates while the fuse-client or kernel
>>>> still has queued URING_CMDs, these commands retain references
>>>> to the struct file used by the fuse connection. This prevents
>>>> fuse_dev_release() from being invoked, resulting in a hung mount
>>>> point.
>>>>
>>>> This patch addresses the issue by making queued URING_CMDs
>>>> cancelable, allowing fuse_dev_release() to proceed as expected
>>>> and preventing the mount point from hanging.
>>>>
>>>> Signed-off-by: Bernd Schubert <[email protected]>
>>>> ---
>>>>  fs/fuse/dev_uring.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++-----
>>>>  1 file changed, 70 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
>>>> index 6af515458695ccb2e32cc8c62c45471e6710c15f..b465da41c42c47eaf69f09bab1423061bc8fcc68 100644
>>>> --- a/fs/fuse/dev_uring.c
>>>> +++ b/fs/fuse/dev_uring.c
>>>> @@ -23,6 +23,7 @@ MODULE_PARM_DESC(enable_uring,
>>>>
>>>>  struct fuse_uring_cmd_pdu {
>>>>         struct fuse_ring_ent *ring_ent;
>>>> +       struct fuse_ring_queue *queue;
>>>>  };
>>>>
>>>>  /*
>>>> @@ -382,6 +383,61 @@ void fuse_uring_stop_queues(struct fuse_ring *ring)
>>>>         }
>>>>  }
>>>>
>>>> +/*
>>>> + * Handle IO_URING_F_CANCEL, typically should come on daemon termination
>>>> + */
>>>> +static void fuse_uring_cancel(struct io_uring_cmd *cmd,
>>>> +                             unsigned int issue_flags, struct fuse_conn *fc)
>>>> +{
>>>> +       struct fuse_uring_cmd_pdu *pdu = (struct fuse_uring_cmd_pdu *)cmd->pdu;
>>>> +       struct fuse_ring_queue *queue = pdu->queue;
>>>> +       struct fuse_ring_ent *ent;
>>>> +       bool found = false;
>>>> +       bool need_cmd_done = false;
>>>> +
>>>> +       spin_lock(&queue->lock);
>>>> +
>>>> +       /* XXX: This is cumbersome for large queues. */
>>>> +       list_for_each_entry(ent, &queue->ent_avail_queue, list) {
>>>> +               if (pdu->ring_ent == ent) {
>>>> +                       found = true;
>>>> +                       break;
>>>> +               }
>>>> +       }
>>>
>>> Do we have to check that the entry is on the ent_avail_queue, or can
>>> we assume that if the ent->state is FRRS_WAIT, the only queue it'll be
>>> on is the ent_avail_queue? I see only one case where this isn't true,
>>> for teardown in fuse_uring_stop_list_entries() - if we had a
>>> workaround for this, eg having some teardown state signifying that
>>> io_uring_cmd_done() needs to be called on the cmd and clearing
>>> FRRS_WAIT, then we could get rid of iteration through ent_avail_queue
>>> for every cancelled cmd.
>>
>>
>> I'm scared that we would run into races - I don't want to access memory
>> pointed to by pdu->ring_ent before I'm sure it is on the list.
> 
> Oh, I was seeing that we mark the cmd as cancellable (eg in
> fuse_uring_prepare_cancel()) only after the ring_ent is moved to the
> ent_avail_queue (in fuse_uring_ent_avail()) and that this is done in
> the scope of the queue->lock, so we would only call into
> fuse_uring_cancel() when the ring_ent is on the list. Could there
> still be a race condition here?
> 
> Alternatively, inspired by your "bool valid;" idea below, maybe
> another solution would be having a bit in "struct fuse_ring_ent"
> tracking if io_uring_cmd_done() needs to be called on it?

What I mean is that daemon termination might race with normal umount.
Umount does everything cleanly and iterates through lists, but might
free 'struct fuse_ring_ent', see fuse_uring_entry_teardown().
On the other hand, daemon termination with IO_URING_F_CANCEL has 
the pointer to ring_ent - but that pointer might be already freed 
by umount. That also means another bit in "struct fuse_ring_ent" 
won't help.

> 
> This is fairly unimportant though - this part could always be
> optimized in a future patchset if you think it needs to be optimized,
> but was just curious if these would work.
> 

I'm going to change the logic a bit and will introduce another list,
'freeable_ring_ent'. Entries will be moved to that new list and
only freed in fuse_uring_destruct(). After that, IO_URING_F_CANCEL
can check the state of ring_ent directly.
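
A rough userspace sketch of that deferred-free scheme, with locking left
out. All defer_* / DEFER_* names are hypothetical, and the extra teardown
state is an assumption made for the illustration. The point is that
teardown only parks the entry on a list, so a later cancel can still
dereference its pointer and look at the state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

enum defer_state { DEFER_WAIT, DEFER_USERSPACE, DEFER_TEARDOWN };

struct defer_ent {
	enum defer_state state;
	struct defer_ent *next_freeable;
};

struct defer_ring {
	struct defer_ent *freeable;  /* torn-down entries, not yet freed */
	int nr_freed;
};

static struct defer_ent *defer_ent_alloc(void)
{
	/* calloc() zeroes the entry, so it starts in DEFER_WAIT */
	return calloc(1, sizeof(struct defer_ent));
}

/* Teardown (e.g. umount): mark the entry and park it instead of freeing it. */
static void defer_ent_teardown(struct defer_ring *ring, struct defer_ent *ent)
{
	ent->state = DEFER_TEARDOWN;
	ent->next_freeable = ring->freeable;
	ring->freeable = ent;
}

/* Cancel: the pointer stays valid until destruct, so the state can be checked. */
static bool defer_cancel(struct defer_ent *ent)
{
	if (ent->state != DEFER_WAIT)
		return false;
	ent->state = DEFER_USERSPACE;
	return true;  /* caller would now complete the command */
}

/* Destruct: the only place where entries are actually freed. */
static void defer_ring_destruct(struct defer_ring *ring)
{
	while (ring->freeable) {
		struct defer_ent *ent = ring->freeable;

		ring->freeable = ent->next_freeable;
		free(ent);
		ring->nr_freed++;
	}
}
```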


Thanks for the discussion!


Bernd


end of thread, other threads:[~2024-11-19  9:32 UTC | newest]

Thread overview: 29+ messages
2024-11-07 17:03 [PATCH RFC v5 00/16] fuse: fuse-over-io-uring Bernd Schubert
2024-11-07 17:03 ` [PATCH RFC v5 01/16] fuse: rename to fuse_dev_end_requests and make non-static Bernd Schubert
2024-11-07 17:03 ` [PATCH RFC v5 02/16] fuse: Move fuse_get_dev to header file Bernd Schubert
2024-11-07 17:03 ` [PATCH RFC v5 03/16] fuse: Move request bits Bernd Schubert
2024-11-07 17:03 ` [PATCH RFC v5 04/16] fuse: Add fuse-io-uring design documentation Bernd Schubert
2024-11-07 17:03 ` [PATCH RFC v5 05/16] fuse: make args->in_args[0] to be always the header Bernd Schubert
2024-11-14 20:57   ` Joanne Koong
2024-11-14 21:05     ` Bernd Schubert
2024-11-14 21:29       ` Joanne Koong
2024-11-14 22:06         ` Bernd Schubert
2024-11-15  0:49           ` Joanne Koong
2024-11-07 17:03 ` [PATCH RFC v5 06/16] fuse: {uring} Handle SQEs - register commands Bernd Schubert
2024-11-07 17:03 ` [PATCH RFC v5 07/16] fuse: Make fuse_copy non static Bernd Schubert
2024-11-07 17:03 ` [PATCH RFC v5 08/16] fuse: Add fuse-io-uring handling into fuse_copy Bernd Schubert
2024-11-07 17:03 ` [PATCH RFC v5 09/16] fuse: {uring} Add uring sqe commit and fetch support Bernd Schubert
2024-11-07 17:03 ` [PATCH RFC v5 10/16] fuse: {uring} Handle teardown of ring entries Bernd Schubert
2024-11-07 17:03 ` [PATCH RFC v5 11/16] fuse: {uring} Add a ring queue and send method Bernd Schubert
2024-11-07 17:03 ` [PATCH RFC v5 12/16] fuse: {uring} Allow to queue to the ring Bernd Schubert
2024-11-07 17:03 ` [PATCH RFC v5 13/16] io_uring/cmd: let cmds to know about dying task Bernd Schubert
2024-11-07 17:03 ` [PATCH RFC v5 14/16] fuse: {uring} Handle IO_URING_F_TASK_DEAD Bernd Schubert
2024-11-07 17:03 ` [PATCH RFC v5 15/16] fuse: {io-uring} Prevent mount point hang on fuse-server termination Bernd Schubert
2024-11-18 19:32   ` Joanne Koong
2024-11-18 19:55     ` Bernd Schubert
2024-11-18 23:10       ` Joanne Koong
2024-11-18 23:30   ` Joanne Koong
2024-11-18 23:47     ` Bernd Schubert
2024-11-19  2:02       ` Joanne Koong
2024-11-19  9:32         ` Bernd Schubert
2024-11-07 17:04 ` [PATCH RFC v5 16/16] fuse: enable fuse-over-io-uring Bernd Schubert
