* [PATCH RFC v4 00/15] fuse: fuse-over-io-uring
@ 2024-10-16 0:05 Bernd Schubert
2024-10-16 0:05 ` [PATCH RFC v4 01/15] fuse: rename to fuse_dev_end_requests and make non-static Bernd Schubert
` (17 more replies)
0 siblings, 18 replies; 36+ messages in thread
From: Bernd Schubert @ 2024-10-16 0:05 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Amir Goldstein, Ming Lei, Bernd Schubert, Josef Bacik
This adds support for uring communication between kernel and
userspace daemon using the opcode IORING_OP_URING_CMD. The basic
approach was taken from ublk. The patches are in RFC state;
some major changes are still to be expected.
The motivation for these patches is to increase fuse performance.
With fuse-over-io-uring, requests avoid core switching (application
on core X, processing by the fuse server on a random core Y) and use
shared memory between kernel and userspace to transfer data.
Similar approaches have been taken by ZUFS and FUSE2, though
not over io-uring, but through ioctl IOs
https://lwn.net/Articles/756625/
https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git/log/?h=fuse2
Avoiding cache line bouncing on NUMA systems was discussed
between Amir and Miklos before, and Miklos had posted
part of the private discussion here:
https://lore.kernel.org/linux-fsdevel/CAJfpegtL3NXPNgK1kuJR8kLu3WkVC_ErBPRfToLEiA_0=w3=hA@mail.gmail.com/
This cache line bouncing should be reduced by these patches, as
a) switching between kernel and userspace is reduced by 50%,
as the request fetch (by read) and result commit (by write) are replaced
by a single commit-and-fetch command
b) submitting via the ring can avoid context switches entirely.
Note: As of now userspace still needs to transition to the kernel to
submit the result, though it might be possible to
avoid that as well, for example with IORING_SETUP_SQPOLL
(basic testing did not show a performance advantage so far), or
the task that submits fuse requests to the ring could also
poll for results (needs additional work).
I had also noticed waitq wake-up latencies in fuse before
https://lore.kernel.org/lkml/[email protected]/T/
This spinning approach helped with performance (>40% improvement
for file creates), but due to random server-side thread/core utilization,
spinning cannot be well controlled in /dev/fuse mode.
With fuse-over-io-uring requests are handled on the same core
(sync requests) or on core+1 (large async requests) and performance
improvements are achieved without spinning.
Splice/zero-copy is not supported yet. Ming Lei is working
on zero-copy io-uring support for ublk_drv; we can probably also use
that approach for fuse and get better zero-copy than splice.
https://lore.kernel.org/io-uring/[email protected]/
RFCv1 and RFCv2 have been tested with multiple xfstest runs in a VM
(32 cores) with a kernel that has several debug options
enabled (like KASAN and MSAN). RFCv3 is not that well tested yet.
O_DIRECT is currently not working well with /dev/fuse, and that also
affects these patches; a patch had been submitted to fix that (although
the approach was rejected)
https://www.spinics.net/lists/linux-fsdevel/msg280028.html
Up to RFCv2 a nice effect in io-uring mode was that xfstests ran faster
(like generic/522: ~2400s /dev/fuse vs. ~1600s patched), though still
slow as this is with ASAN/leak-detection/etc.
With RFCv3 and mmap removed, overall run time was approximately the same,
though some optimizations were removed in RFCv3, like submitting to
the ring from the task that created the fuse request (hence, without
io_uring_cmd_complete_in_task()).
The corresponding libfuse patches are on my uring branch,
but need cleanup for submission - that will happen during the
next days.
https://github.com/bsbernd/libfuse/tree/uring
Testing with that libfuse branch is possible by running something
like:
example/passthrough_hp -o allow_other --debug-fuse --nopassthrough \
--uring --uring-per-core-queue --uring-fg-depth=1 --uring-bg-depth=1 \
/scratch/source /scratch/dest
With the --debug-fuse option one should see "cqe" as the request prefix
if requests are received via io-uring:
cqe unique: 4, opcode: GETATTR (3), nodeid: 1, insize: 16, pid: 7060
unique: 4, result=104
Without the --uring option, "cqe" is replaced by the default "dev":
dev unique: 4, opcode: GETATTR (3), nodeid: 1, insize: 56, pid: 7117
unique: 4, success, outsize: 120
TODO list for next RFC version
- make the buffer layout exactly the same as /dev/fuse IO
- different request sizes - a large ring queue size currently needs
too much memory, even though most of the queue is needed only for small
IOs
Future work
- zero copy
I had run quite some benchmarks with linux-6.2 before LSFMMBPF2023,
which resulted in some tuning patches (at the end of the
patch series).
Benchmark results (with RFC v1)
=======================================
System used for the benchmark is a 32 core (HyperThreading enabled)
Xeon E5-2650 system. I don't have local disks attached that could do
>5GB/s IOs, for paged and dio results a patched version of passthrough-hp
was used that bypasses final reads/writes.
paged reads
-----------
          128K IO size            1024K IO size
jobs  /dev/fuse  uring  gain  /dev/fuse  uring  gain
   1       1117   1921  1.72       1902   1942  1.02
   2       2502   3527  1.41       3066   3260  1.06
   4       5052   6125  1.21       5994   6097  1.02
   8       6273  10855  1.73       7101  10491  1.48
  16       6373  11320  1.78       7660  11419  1.49
  24       6111   9015  1.48       7600   9029  1.19
  32       5725   7968  1.39       6986   7961  1.14
dio reads (1024K)
-----------------
jobs  /dev/fuse  uring   gain
   1       2023   3998   2.42
   2       3375   7950   2.83
   4       3823  15022   3.58
   8       7796  22591   2.77
  16       8520  27864   3.27
  24       8361  20617   2.55
  32       8717  12971   1.55
mmap reads (4K)
---------------
(sequential; I probably should have made it random - sequential exposes
a rather interesting/weird 'optimized' memcpy issue, where sequential
becomes reversed-order 4K reads)
https://lore.kernel.org/linux-fsdevel/[email protected]/
jobs  /dev/fuse  uring  gain
   1        130    323  2.49
   2        219    538  2.46
   4        503   1040  2.07
   8       1472   2039  1.38
  16       2191   3518  1.61
  24       2453   4561  1.86
  32       2178   5628  2.58
(Results on request; setting MAP_HUGETLB much improves performance
for both, io-uring mode then has only a slight advantage.)
creates/s
----------
threads  /dev/fuse   uring  gain
      1       3944   10121  2.57
      2       8580   24524  2.86
      4      16628   44426  2.67
      8      46746   56716  1.21
     16      79740  102966  1.29
     20      80284  119502  1.49
(the gain drop with >=8 cores needs to be investigated)
Jens had done some benchmarks with v3 and noticed only a
25% improvement at half the CPU usage, but v3
removes several optimizations (like waking the same core
and avoiding io_uring_cmd_done in extra task context).
These optimizations will be submitted once the core work
is merged.
Remaining TODO list for RFCv3:
--------------------------------
1) Let the ring configure ioctl return information,
like mmap/queue-buf size
Right now libfuse and kernel have lots of duplicated setup code,
and any kind of pointer/offset mismatch results in a non-working
ring that is hard to debug - probably better when the kernel does
the calculations and returns the values to the server side
2) In combination with 1, ring requests should retrieve their
userspace address and length from kernel side instead of
calculating it through the mmaped queue buffer on their own.
(Introduction of FUSE_URING_BUF_ADDR_FETCH)
3) Add log buffer into the ioctl and ring-request
This is to provide better error messages (instead of just
an errno)
4) Multiple IO sizes per queue
Small IOs and metadata requests do not need large buffer sizes;
we need multiple IO sizes per queue.
5) FUSE_INTERRUPT handling
These are not handled yet; the kernel side is probably not difficult
anymore, as ring entries take fuse requests through lists.
TODO:
======
- separate buffer for fuse headers to always handle alignment
Signed-off-by: Bernd Schubert <[email protected]>
---
Changes in v4:
- Removal of ioctls, all configuration is done dynamically
on the arrival of FUSE_URING_REQ_FETCH
- ring entries are not (and cannot be, without config ioctls)
allocated as an array of the ring/queue - removal of the tag
variable. Finding ring entries on FUSE_URING_REQ_COMMIT_AND_FETCH
is more cumbersome now, needs an almost unused
struct fuse_pqueue per fuse_ring_queue and uses the unique
id of fuse requests.
- No device clones needed to work around hanging mounts
on fuse-server/daemon termination; handled by IO_URING_F_CANCEL
- Removal of sync/async ring entry types
- Addressed some of Joanne's comments, but probably not all
- Only very basic tests run for v3, as more updates should follow quickly.
Changes in v3
- Removed the __wake_on_current_cpu optimization for now,
as that needs to go through another subsystem/tree;
removing it means a significant performance drop
- Removed MMAP (Miklos)
- Switched to two IOCTLs, instead of one ioctl that had a field
for subcommands (ring and queue config) (Miklos)
- The ring entry state is a single state and not a bitmask anymore
(Josef)
- Addressed several other comments from Josef (I need to go over
the RFCv2 review again, I'm not sure if everything is addressed
already)
- Link to v3: https://lore.kernel.org/r/20240901-b4-fuse-uring-rfcv3-without-mmap-v3-0-9207f7391444@ddn.com
- Link to v2: https://lore.kernel.org/all/[email protected]/
- Link to v1: https://lore.kernel.org/r/[email protected]
---
Bernd Schubert (14):
fuse: rename to fuse_dev_end_requests and make non-static
fuse: Move fuse_get_dev to header file
fuse: Move request bits
fuse: Add fuse-io-uring design documentation
fuse: {uring} Handle SQEs - register commands
fuse: Make fuse_copy non static
fuse: Add buffer offset for uring into fuse_copy_state
fuse: {uring} Add uring sqe commit and fetch support
fuse: {uring} Handle teardown of ring entries
fuse: {uring} Add a ring queue and send method
fuse: {uring} Allow to queue to the ring
fuse: {uring} Handle IO_URING_F_TASK_DEAD
fuse: {io-uring} Prevent mount point hang on fuse-server termination
fuse: enable fuse-over-io-uring
Pavel Begunkov (1):
io_uring/cmd: let cmds to know about dying task
Documentation/filesystems/fuse-io-uring.rst | 101 +++
fs/fuse/Kconfig | 12 +
fs/fuse/Makefile | 1 +
fs/fuse/dev.c | 155 ++--
fs/fuse/dev_uring.c | 1130 +++++++++++++++++++++++++++
fs/fuse/dev_uring_i.h | 199 +++++
fs/fuse/fuse_dev_i.h | 64 ++
fs/fuse/fuse_i.h | 14 +
fs/fuse/inode.c | 5 +-
include/linux/io_uring_types.h | 1 +
include/uapi/linux/fuse.h | 70 ++
io_uring/uring_cmd.c | 6 +-
12 files changed, 1706 insertions(+), 52 deletions(-)
---
base-commit: 0c3836482481200ead7b416ca80c68a29cfdaabd
change-id: 20241015-fuse-uring-for-6-10-rfc4-61d0fc6851f8
Best regards,
--
Bernd Schubert <[email protected]>
^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH RFC v4 01/15] fuse: rename to fuse_dev_end_requests and make non-static
2024-10-16 0:05 [PATCH RFC v4 00/15] fuse: fuse-over-io-uring Bernd Schubert
@ 2024-10-16 0:05 ` Bernd Schubert
2024-10-16 0:05 ` [PATCH RFC v4 02/15] fuse: Move fuse_get_dev to header file Bernd Schubert
` (16 subsequent siblings)
17 siblings, 0 replies; 36+ messages in thread
From: Bernd Schubert @ 2024-10-16 0:05 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Amir Goldstein, Ming Lei, Bernd Schubert, Josef Bacik
This function is needed by fuse_uring.c to clean ring queues,
so make it non-static. Also, as a non-static function, the
name 'end_requests' should be prefixed with fuse_.
Signed-off-by: Bernd Schubert <[email protected]>
Reviewed-by: Josef Bacik <[email protected]>
---
fs/fuse/dev.c | 7 ++++---
fs/fuse/fuse_dev_i.h | 15 +++++++++++++++
2 files changed, 19 insertions(+), 3 deletions(-)
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 9eb191b5c4de124b3b469f5487beebbaf7630eb3..74cb9ae900525890543e0d79a5a89e5d43d31c9c 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -7,6 +7,7 @@
*/
#include "fuse_i.h"
+#include "fuse_dev_i.h"
#include <linux/init.h>
#include <linux/module.h>
@@ -2136,7 +2137,7 @@ static __poll_t fuse_dev_poll(struct file *file, poll_table *wait)
}
/* Abort all requests on the given list (pending or processing) */
-static void end_requests(struct list_head *head)
+void fuse_dev_end_requests(struct list_head *head)
{
while (!list_empty(head)) {
struct fuse_req *req;
@@ -2239,7 +2240,7 @@ void fuse_abort_conn(struct fuse_conn *fc)
wake_up_all(&fc->blocked_waitq);
spin_unlock(&fc->lock);
- end_requests(&to_end);
+ fuse_dev_end_requests(&to_end);
} else {
spin_unlock(&fc->lock);
}
@@ -2269,7 +2270,7 @@ int fuse_dev_release(struct inode *inode, struct file *file)
list_splice_init(&fpq->processing[i], &to_end);
spin_unlock(&fpq->lock);
- end_requests(&to_end);
+ fuse_dev_end_requests(&to_end);
/* Are we the last open device? */
if (atomic_dec_and_test(&fc->dev_count)) {
diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
new file mode 100644
index 0000000000000000000000000000000000000000..5a1b8a2775d84274abee46eabb3000345b2d9da0
--- /dev/null
+++ b/fs/fuse/fuse_dev_i.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * FUSE: Filesystem in Userspace
+ * Copyright (C) 2001-2008 Miklos Szeredi <[email protected]>
+ */
+#ifndef _FS_FUSE_DEV_I_H
+#define _FS_FUSE_DEV_I_H
+
+#include <linux/types.h>
+
+void fuse_dev_end_requests(struct list_head *head);
+
+#endif
+
+
--
2.43.0
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH RFC v4 02/15] fuse: Move fuse_get_dev to header file
2024-10-16 0:05 [PATCH RFC v4 00/15] fuse: fuse-over-io-uring Bernd Schubert
2024-10-16 0:05 ` [PATCH RFC v4 01/15] fuse: rename to fuse_dev_end_requests and make non-static Bernd Schubert
@ 2024-10-16 0:05 ` Bernd Schubert
2024-10-16 0:05 ` [PATCH RFC v4 03/15] fuse: Move request bits Bernd Schubert
` (15 subsequent siblings)
17 siblings, 0 replies; 36+ messages in thread
From: Bernd Schubert @ 2024-10-16 0:05 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Amir Goldstein, Ming Lei, Bernd Schubert, Josef Bacik
Another preparation patch, as this function will be needed by
fuse/dev.c and fuse/dev_uring.c.
Signed-off-by: Bernd Schubert <[email protected]>
Reviewed-by: Josef Bacik <[email protected]>
---
fs/fuse/dev.c | 9 ---------
fs/fuse/fuse_dev_i.h | 9 +++++++++
2 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 74cb9ae900525890543e0d79a5a89e5d43d31c9c..9ac69fd2cead0d1fe062dc3405a7dedcd1d36691 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -32,15 +32,6 @@ MODULE_ALIAS("devname:fuse");
static struct kmem_cache *fuse_req_cachep;
-static struct fuse_dev *fuse_get_dev(struct file *file)
-{
- /*
- * Lockless access is OK, because file->private data is set
- * once during mount and is valid until the file is released.
- */
- return READ_ONCE(file->private_data);
-}
-
static void fuse_request_init(struct fuse_mount *fm, struct fuse_req *req)
{
INIT_LIST_HEAD(&req->list);
diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
index 5a1b8a2775d84274abee46eabb3000345b2d9da0..b38e67b3f889f3fa08f7279e3309cde908527146 100644
--- a/fs/fuse/fuse_dev_i.h
+++ b/fs/fuse/fuse_dev_i.h
@@ -8,6 +8,15 @@
#include <linux/types.h>
+static inline struct fuse_dev *fuse_get_dev(struct file *file)
+{
+ /*
+ * Lockless access is OK, because file->private data is set
+ * once during mount and is valid until the file is released.
+ */
+ return READ_ONCE(file->private_data);
+}
+
void fuse_dev_end_requests(struct list_head *head);
#endif
--
2.43.0
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH RFC v4 03/15] fuse: Move request bits
2024-10-16 0:05 [PATCH RFC v4 00/15] fuse: fuse-over-io-uring Bernd Schubert
2024-10-16 0:05 ` [PATCH RFC v4 01/15] fuse: rename to fuse_dev_end_requests and make non-static Bernd Schubert
2024-10-16 0:05 ` [PATCH RFC v4 02/15] fuse: Move fuse_get_dev to header file Bernd Schubert
@ 2024-10-16 0:05 ` Bernd Schubert
2024-10-16 0:05 ` [PATCH RFC v4 04/15] fuse: Add fuse-io-uring design documentation Bernd Schubert
` (14 subsequent siblings)
17 siblings, 0 replies; 36+ messages in thread
From: Bernd Schubert @ 2024-10-16 0:05 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Amir Goldstein, Ming Lei, Bernd Schubert, Josef Bacik
These are needed by dev_uring functions as well.
Signed-off-by: Bernd Schubert <[email protected]>
Reviewed-by: Josef Bacik <[email protected]>
---
fs/fuse/dev.c | 4 ----
fs/fuse/fuse_dev_i.h | 4 ++++
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 9ac69fd2cead0d1fe062dc3405a7dedcd1d36691..dbc222f9b0f0e590ce3ef83077e6b4cff03cff65 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -26,10 +26,6 @@
MODULE_ALIAS_MISCDEV(FUSE_MINOR);
MODULE_ALIAS("devname:fuse");
-/* Ordinary requests have even IDs, while interrupts IDs are odd */
-#define FUSE_INT_REQ_BIT (1ULL << 0)
-#define FUSE_REQ_ID_STEP (1ULL << 1)
-
static struct kmem_cache *fuse_req_cachep;
static void fuse_request_init(struct fuse_mount *fm, struct fuse_req *req)
diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
index b38e67b3f889f3fa08f7279e3309cde908527146..6c506f040d5fb57dae746880c657a95637ac50ce 100644
--- a/fs/fuse/fuse_dev_i.h
+++ b/fs/fuse/fuse_dev_i.h
@@ -8,6 +8,10 @@
#include <linux/types.h>
+/* Ordinary requests have even IDs, while interrupts IDs are odd */
+#define FUSE_INT_REQ_BIT (1ULL << 0)
+#define FUSE_REQ_ID_STEP (1ULL << 1)
+
static inline struct fuse_dev *fuse_get_dev(struct file *file)
{
/*
--
2.43.0
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH RFC v4 04/15] fuse: Add fuse-io-uring design documentation
2024-10-16 0:05 [PATCH RFC v4 00/15] fuse: fuse-over-io-uring Bernd Schubert
` (2 preceding siblings ...)
2024-10-16 0:05 ` [PATCH RFC v4 03/15] fuse: Move request bits Bernd Schubert
@ 2024-10-16 0:05 ` Bernd Schubert
2024-10-16 0:05 ` [PATCH RFC v4 05/15] fuse: {uring} Handle SQEs - register commands Bernd Schubert
` (13 subsequent siblings)
17 siblings, 0 replies; 36+ messages in thread
From: Bernd Schubert @ 2024-10-16 0:05 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Amir Goldstein, Ming Lei, Bernd Schubert
Signed-off-by: Bernd Schubert <[email protected]>
---
Documentation/filesystems/fuse-io-uring.rst | 101 ++++++++++++++++++++++++++++
1 file changed, 101 insertions(+)
diff --git a/Documentation/filesystems/fuse-io-uring.rst b/Documentation/filesystems/fuse-io-uring.rst
new file mode 100644
index 0000000000000000000000000000000000000000..50fdba1ea566588be3663e29b04bb9bbb6c9e4fb
--- /dev/null
+++ b/Documentation/filesystems/fuse-io-uring.rst
@@ -0,0 +1,101 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============================
+FUSE Uring design documentation
+===============================
+
+This documentation covers basic details of how the fuse
+kernel/userspace communication through uring is configured
+and works. For generic details about FUSE see fuse.rst.
+
+This document also covers the current interface, which is
+still in development and might change.
+
+Limitations
+===========
+As of now not all request types are supported through uring; userspace
+is required to also handle requests through /dev/fuse after
+uring setup is complete - specifically notifications (initiated from
+the daemon side) and interrupts.
+
+Fuse io-uring configuration
+===========================
+
+Fuse kernel requests are queued through the classical /dev/fuse
+read/write interface - until uring setup is complete.
+
+In order to set up fuse-over-io-uring fuse-server (user-space)
+needs to submit SQEs (opcode = IORING_OP_URING_CMD) to the
+/dev/fuse connection file descriptor. Initial submit is with
+the sub command FUSE_URING_REQ_FETCH, which will just register
+entries to be available in the kernel.
+
+Once at least one entry per queue is submitted, kernel starts
+to enqueue to ring queues.
+Note, every CPU core has its own fuse-io-uring queue.
+Userspace handles the CQE/fuse-request and submits the result as
+subcommand FUSE_URING_REQ_COMMIT_AND_FETCH - the kernel completes
+the request and also marks the entry available again. If there are
+pending requests waiting, a request will be immediately submitted
+to the daemon again.
+
+Initial SQE
+-----------
+
+ | | FUSE filesystem daemon
+ | |
+ | | >io_uring_submit()
+ | | IORING_OP_URING_CMD /
+ | | FUSE_URING_REQ_FETCH
+ | | [wait cqe]
+ | | >io_uring_wait_cqe() or
+ | | >io_uring_submit_and_wait()
+ | |
+ | >fuse_uring_cmd() |
+ | >fuse_uring_fetch() |
+ | >fuse_uring_ent_release() |
+
+
+Sending requests with CQEs
+--------------------------
+
+ | | FUSE filesystem daemon
+ | | [waiting for CQEs]
+ | "rm /mnt/fuse/file" |
+ | |
+ | >sys_unlink() |
+ | >fuse_unlink() |
+ | [allocate request] |
+ | >__fuse_request_send() |
+ | ... |
+ | >fuse_uring_queue_fuse_req |
+ | [queue request on fg or |
+ | bg queue] |
+ | >fuse_uring_assign_ring_entry() |
+ | >fuse_uring_send_to_ring() |
+ | >fuse_uring_copy_to_ring() |
+ | >io_uring_cmd_done() |
+ | >request_wait_answer() |
+ | [sleep on req->waitq] |
+ | | [receives and handles CQE]
+ | | [submit result and fetch next]
+ | | >io_uring_submit()
+ | | IORING_OP_URING_CMD/
+ | | FUSE_URING_REQ_COMMIT_AND_FETCH
+ | >fuse_uring_cmd() |
+ | >fuse_uring_commit_and_release() |
+ | >fuse_uring_copy_from_ring() |
+ | [ copy the result to the fuse req] |
+ | >fuse_uring_req_end_and_get_next() |
+ | >fuse_request_end() |
+ | [wake up req->waitq] |
+ | >fuse_uring_ent_release_and_fetch()|
+ | [wait or handle next req] |
+ | |
+ | |
+ | [req->waitq woken up] |
+ | <fuse_unlink() |
+ | <sys_unlink() |
+
+
+
--
2.43.0
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH RFC v4 05/15] fuse: {uring} Handle SQEs - register commands
2024-10-16 0:05 [PATCH RFC v4 00/15] fuse: fuse-over-io-uring Bernd Schubert
` (3 preceding siblings ...)
2024-10-16 0:05 ` [PATCH RFC v4 04/15] fuse: Add fuse-io-uring design documentation Bernd Schubert
@ 2024-10-16 0:05 ` Bernd Schubert
2024-10-16 0:05 ` [PATCH RFC v4 06/15] fuse: Make fuse_copy non static Bernd Schubert
` (12 subsequent siblings)
17 siblings, 0 replies; 36+ messages in thread
From: Bernd Schubert @ 2024-10-16 0:05 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Amir Goldstein, Ming Lei, Bernd Schubert
This adds basic support for ring SQEs (with opcode=IORING_OP_URING_CMD).
For now only FUSE_URING_REQ_FETCH is handled to register queue entries.
Signed-off-by: Bernd Schubert <[email protected]>
---
fs/fuse/Kconfig | 12 ++
fs/fuse/Makefile | 1 +
fs/fuse/dev.c | 4 +
fs/fuse/dev_uring.c | 297 ++++++++++++++++++++++++++++++++++++++++++++++
fs/fuse/dev_uring_i.h | 105 ++++++++++++++++
fs/fuse/fuse_dev_i.h | 1 +
fs/fuse/fuse_i.h | 5 +
fs/fuse/inode.c | 3 +
include/uapi/linux/fuse.h | 70 +++++++++++
9 files changed, 498 insertions(+)
diff --git a/fs/fuse/Kconfig b/fs/fuse/Kconfig
index 8674dbfbe59dbf79c304c587b08ebba3cfe405be..11f37cefc94b2af5a675c238801560c822b95f1a 100644
--- a/fs/fuse/Kconfig
+++ b/fs/fuse/Kconfig
@@ -63,3 +63,15 @@ config FUSE_PASSTHROUGH
to be performed directly on a backing file.
If you want to allow passthrough operations, answer Y.
+
+config FUSE_IO_URING
+ bool "FUSE communication over io-uring"
+ default y
+ depends on FUSE_FS
+ depends on IO_URING
+ help
+ This allows sending FUSE requests over the io-uring interface and
+ also adds request core affinity.
+
+ If you want to allow fuse server/client communication through io-uring,
+ answer Y.
diff --git a/fs/fuse/Makefile b/fs/fuse/Makefile
index 6e0228c6d0cba9541c8668efb86b83094751d469..7193a14374fd3a08b901ef53fbbea7c31b12f22c 100644
--- a/fs/fuse/Makefile
+++ b/fs/fuse/Makefile
@@ -11,5 +11,6 @@ fuse-y := dev.o dir.o file.o inode.o control.o xattr.o acl.o readdir.o ioctl.o
fuse-y += iomode.o
fuse-$(CONFIG_FUSE_DAX) += dax.o
fuse-$(CONFIG_FUSE_PASSTHROUGH) += passthrough.o
+fuse-$(CONFIG_FUSE_IO_URING) += dev_uring.o
virtiofs-y := virtio_fs.o
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index dbc222f9b0f0e590ce3ef83077e6b4cff03cff65..8e8d887bb3dfacec074753ebba7bd504335b5a18 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -6,6 +6,7 @@
See the file COPYING.
*/
+#include "dev_uring_i.h"
#include "fuse_i.h"
#include "fuse_dev_i.h"
@@ -2398,6 +2399,9 @@ const struct file_operations fuse_dev_operations = {
.fasync = fuse_dev_fasync,
.unlocked_ioctl = fuse_dev_ioctl,
.compat_ioctl = compat_ptr_ioctl,
+#ifdef CONFIG_FUSE_IO_URING
+ .uring_cmd = fuse_uring_cmd,
+#endif
};
EXPORT_SYMBOL_GPL(fuse_dev_operations);
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
new file mode 100644
index 0000000000000000000000000000000000000000..724ac6ae67d301a7bdc5b36a20d620ff8be63b18
--- /dev/null
+++ b/fs/fuse/dev_uring.c
@@ -0,0 +1,297 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * FUSE: Filesystem in Userspace
+ * Copyright (c) 2023-2024 DataDirect Networks.
+ */
+
+#include <linux/fs.h>
+
+#include "fuse_i.h"
+#include "dev_uring_i.h"
+#include "fuse_dev_i.h"
+
+#include <linux/io_uring/cmd.h>
+
+#ifdef CONFIG_FUSE_IO_URING
+static bool __read_mostly enable_uring;
+module_param(enable_uring, bool, 0644);
+MODULE_PARM_DESC(enable_uring,
+ "Enable userspace communication through io-uring.");
+#endif
+
+static int fuse_ring_ring_ent_unset_userspace(struct fuse_ring_ent *ent)
+{
+ struct fuse_ring_queue *queue = ent->queue;
+
+ lockdep_assert_held(&queue->lock);
+
+ if (WARN_ON_ONCE(ent->state != FRRS_USERSPACE))
+ return -EIO;
+
+ ent->state = FRRS_COMMIT;
+ list_move(&ent->list, &queue->ent_intermediate_queue);
+
+ return 0;
+}
+
+void fuse_uring_destruct(struct fuse_conn *fc)
+{
+ struct fuse_ring *ring = fc->ring;
+ int qid;
+
+ if (!ring)
+ return;
+
+ for (qid = 0; qid < ring->nr_queues; qid++) {
+ struct fuse_ring_queue *queue = ring->queues[qid];
+
+ if (!queue)
+ continue;
+
+ WARN_ON(!list_empty(&queue->ent_avail_queue));
+ WARN_ON(!list_empty(&queue->ent_intermediate_queue));
+
+ kfree(queue);
+ ring->queues[qid] = NULL;
+ }
+
+ kfree(ring->queues);
+ kfree(ring);
+ fc->ring = NULL;
+}
+
+/*
+ * Basic ring setup for this connection based on the provided configuration
+ */
+static struct fuse_ring *fuse_uring_create(struct fuse_conn *fc)
+{
+ struct fuse_ring *ring = NULL;
+ size_t nr_queues = num_possible_cpus();
+ struct fuse_ring *res = NULL;
+
+ ring = kzalloc(sizeof(*fc->ring) +
+ nr_queues * sizeof(struct fuse_ring_queue),
+ GFP_KERNEL_ACCOUNT);
+ if (!ring)
+ return NULL;
+
+ ring->queues = kcalloc(nr_queues, sizeof(struct fuse_ring_queue *),
+ GFP_KERNEL_ACCOUNT);
+ if (!ring->queues)
+ goto out_err;
+
+ spin_lock(&fc->lock);
+ if (fc->ring) {
+ /* race, another thread created the ring in the mean time */
+ spin_unlock(&fc->lock);
+ res = fc->ring;
+ goto out_err;
+ }
+
+ fc->ring = ring;
+ ring->nr_queues = nr_queues;
+ ring->fc = fc;
+
+ spin_unlock(&fc->lock);
+ return ring;
+
+out_err:
+ if (ring)
+ kfree(ring->queues);
+ kfree(ring);
+ return res;
+}
+
+static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
+ int qid)
+{
+ struct fuse_conn *fc = ring->fc;
+ struct fuse_ring_queue *queue;
+
+ queue = kzalloc(sizeof(*queue), GFP_KERNEL_ACCOUNT);
+ if (!queue)
+ return ERR_PTR(-ENOMEM);
+ spin_lock(&fc->lock);
+ if (ring->queues[qid]) {
+ spin_unlock(&fc->lock);
+ kfree(queue);
+ return ring->queues[qid];
+ }
+ ring->queues[qid] = queue;
+
+ queue->qid = qid;
+ queue->ring = ring;
+ spin_lock_init(&queue->lock);
+
+ INIT_LIST_HEAD(&queue->ent_avail_queue);
+ INIT_LIST_HEAD(&queue->ent_intermediate_queue);
+
+ spin_unlock(&fc->lock);
+
+ return queue;
+}
+
+/*
+ * Put a ring request onto hold, it is no longer used for now.
+ */
+static void fuse_uring_ent_avail(struct fuse_ring_ent *ring_ent,
+ struct fuse_ring_queue *queue)
+ __must_hold(&queue->lock)
+{
+ struct fuse_ring *ring = queue->ring;
+
+ lockdep_assert_held(&queue->lock);
+
+ /* unsets all previous flags - basically resets */
+ pr_devel("%s ring=%p qid=%d state=%d\n", __func__, ring,
+ ring_ent->queue->qid, ring_ent->state);
+
+ if (WARN_ON(ring_ent->state != FRRS_COMMIT)) {
+ pr_warn("%s qid=%d state=%d\n", __func__, ring_ent->queue->qid,
+ ring_ent->state);
+ return;
+ }
+
+ list_move(&ring_ent->list, &queue->ent_avail_queue);
+
+ ring_ent->state = FRRS_WAIT;
+}
+
+/*
+ * fuse_uring_req_fetch command handling
+ */
+static void _fuse_uring_fetch(struct fuse_ring_ent *ring_ent,
+ struct io_uring_cmd *cmd,
+ unsigned int issue_flags)
+{
+ struct fuse_ring_queue *queue = ring_ent->queue;
+
+ spin_lock(&queue->lock);
+ fuse_uring_ent_avail(ring_ent, queue);
+ ring_ent->cmd = cmd;
+ spin_unlock(&queue->lock);
+}
+
+static int fuse_uring_fetch(struct io_uring_cmd *cmd, unsigned int issue_flags,
+ struct fuse_conn *fc)
+{
+ const struct fuse_uring_cmd_req *cmd_req = io_uring_sqe_cmd(cmd->sqe);
+ struct fuse_ring *ring = fc->ring;
+ struct fuse_ring_queue *queue;
+ struct fuse_ring_ent *ring_ent;
+ int err;
+
+#if 0
+ /* Does not work as sending over io-uring is async */
+ err = -ETXTBSY;
+ if (fc->initialized) {
+ pr_info_ratelimited(
+ "Received FUSE_URING_REQ_FETCH after connection is initialized\n");
+ return err;
+ }
+#endif
+
+ err = -ENOMEM;
+ if (!ring) {
+ ring = fuse_uring_create(fc);
+ if (!ring)
+ return err;
+ }
+
+ queue = ring->queues[cmd_req->qid];
+ if (!queue) {
+ queue = fuse_uring_create_queue(ring, cmd_req->qid);
+ if (!queue)
+ return err;
+ }
+
+ /*
+ * The created queue above does not need to be destructed in
+ * case of entry errors below, will be done at ring destruction time.
+ */
+
+ ring_ent = kzalloc(sizeof(*ring_ent), GFP_KERNEL_ACCOUNT);
+ if (ring_ent == NULL)
+ return err;
+
+ ring_ent->queue = queue;
+ ring_ent->cmd = cmd;
+ ring_ent->rreq = (struct fuse_ring_req __user *)cmd_req->buf_ptr;
+ ring_ent->max_arg_len = cmd_req->buf_len - sizeof(*ring_ent->rreq);
+ INIT_LIST_HEAD(&ring_ent->list);
+
+ spin_lock(&queue->lock);
+
+ /*
+ * FUSE_URING_REQ_FETCH is an initialization exception, needs
+ * state override
+ */
+ ring_ent->state = FRRS_USERSPACE;
+ err = fuse_ring_ring_ent_unset_userspace(ring_ent);
+ spin_unlock(&queue->lock);
+ if (WARN_ON_ONCE(err != 0))
+ goto err;
+
+ _fuse_uring_fetch(ring_ent, cmd, issue_flags);
+
+ return 0;
+err:
+ list_del_init(&ring_ent->list);
+ kfree(ring_ent);
+ return err;
+}
+
+/**
+ * Entry function from io_uring to handle the given passthrough command
+ * (opcode IORING_OP_URING_CMD)
+ */
+int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
+{
+ const struct fuse_uring_cmd_req *cmd_req = io_uring_sqe_cmd(cmd->sqe);
+ struct fuse_dev *fud;
+ struct fuse_conn *fc;
+ u32 cmd_op = cmd->cmd_op;
+ int err = 0;
+
+ pr_devel("%s:%d received: cmd op %d\n", __func__, __LINE__, cmd_op);
+
+ /* Disabled for now, especially as teardown is not implemented yet */
+ err = -EOPNOTSUPP;
+ pr_info_ratelimited("fuse-io-uring is not enabled yet\n");
+ goto out;
+
+ err = -EOPNOTSUPP;
+ if (!enable_uring) {
+ pr_info_ratelimited("uring is disabled\n");
+ goto out;
+ }
+
+ err = -ENOTCONN;
+ fud = fuse_get_dev(cmd->file);
+ if (!fud) {
+ pr_info_ratelimited("No fuse device found\n");
+ goto out;
+ }
+ fc = fud->fc;
+
+ if (fc->aborted)
+ goto out;
+
+ switch (cmd_op) {
+ case FUSE_URING_REQ_FETCH:
+ err = fuse_uring_fetch(cmd, issue_flags, fc);
+ break;
+ default:
+ err = -EINVAL;
+ pr_devel("Unknown uring command %d", cmd_op);
+ goto out;
+ }
+out:
+ pr_devel("uring cmd op=%d, qid=%d ID=%llu ret=%d\n", cmd_op,
+ cmd_req->qid, cmd_req->commit_id, err);
+
+ if (err < 0)
+ io_uring_cmd_done(cmd, err, 0, issue_flags);
+
+ return -EIOCBQUEUED;
+}
diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
new file mode 100644
index 0000000000000000000000000000000000000000..9a763262c6a5a781a36c3825529d729efef80e78
--- /dev/null
+++ b/fs/fuse/dev_uring_i.h
@@ -0,0 +1,105 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * FUSE: Filesystem in Userspace
+ * Copyright (c) 2023-2024 DataDirect Networks.
+ */
+
+#ifndef _FS_FUSE_DEV_URING_I_H
+#define _FS_FUSE_DEV_URING_I_H
+
+#include "fuse_i.h"
+
+#ifdef CONFIG_FUSE_IO_URING
+
+enum fuse_ring_req_state {
+
+ /* ring entry received from userspace and is being processed */
+ FRRS_COMMIT,
+
+ /* The ring request waits for a new fuse request */
+ FRRS_WAIT,
+
+ /* request is in or on the way to user space */
+ FRRS_USERSPACE,
+};
+
+/** A fuse ring entry, part of the ring queue */
+struct fuse_ring_ent {
+ /* userspace buffer */
+ struct fuse_ring_req __user *rreq;
+
+ /* the ring queue that owns the request */
+ struct fuse_ring_queue *queue;
+
+ struct io_uring_cmd *cmd;
+
+ struct list_head list;
+
+ /*
+ * state the request is currently in
+ * (enum fuse_ring_req_state)
+ */
+ unsigned int state;
+
+ /* struct fuse_ring_req::in_out_arg size */
+ size_t max_arg_len;
+};
+
+struct fuse_ring_queue {
+ /*
+ * back pointer to the main fuse uring structure that holds this
+ * queue
+ */
+ struct fuse_ring *ring;
+
+ /* queue id, typically also corresponds to the cpu core */
+ unsigned int qid;
+
+ /*
+ * queue lock, taken when any value in the queue changes _and_ also
+ * a ring entry state changes.
+ */
+ spinlock_t lock;
+
+ /* available ring entries (struct fuse_ring_ent) */
+ struct list_head ent_avail_queue;
+
+ /*
+ * entries in the process of being committed or about
+ * to be sent to userspace
+ */
+ struct list_head ent_intermediate_queue;
+};
+
+/**
+ * Describes whether uring is used for communication and holds all
+ * the data needed for uring communication
+ */
+struct fuse_ring {
+ /* back pointer */
+ struct fuse_conn *fc;
+
+ /* number of ring queues */
+ size_t nr_queues;
+
+ struct fuse_ring_queue **queues;
+};
+
+void fuse_uring_destruct(struct fuse_conn *fc);
+int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags);
+
+#else /* CONFIG_FUSE_IO_URING */
+
+struct fuse_ring;
+
+static inline void fuse_uring_create(struct fuse_conn *fc)
+{
+}
+
+static inline void fuse_uring_destruct(struct fuse_conn *fc)
+{
+}
+
+#endif /* CONFIG_FUSE_IO_URING */
+
+#endif /* _FS_FUSE_DEV_URING_I_H */
diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
index 6c506f040d5fb57dae746880c657a95637ac50ce..e82cbf9c569af4f271ba0456cb49e0a5116bf36b 100644
--- a/fs/fuse/fuse_dev_i.h
+++ b/fs/fuse/fuse_dev_i.h
@@ -8,6 +8,7 @@
#include <linux/types.h>
+
/* Ordinary requests have even IDs, while interrupts IDs are odd */
#define FUSE_INT_REQ_BIT (1ULL << 0)
#define FUSE_REQ_ID_STEP (1ULL << 1)
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index f2391961031374d8d55916c326c6472f0c03aae6..33e81b895fee620b9c2fcc8d9312fec53e3dc227 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -917,6 +917,11 @@ struct fuse_conn {
/** IDR for backing files ids */
struct idr backing_files_map;
#endif
+
+#ifdef CONFIG_FUSE_IO_URING
+ /** uring connection information */
+ struct fuse_ring *ring;
+#endif
};
/*
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 99e44ea7d8756ded7145f38b49d129b361b991ba..59f8fb7b915f052f892d587a0f9a8dc17cf750ce 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -7,6 +7,7 @@
*/
#include "fuse_i.h"
+#include "dev_uring_i.h"
#include <linux/pagemap.h>
#include <linux/slab.h>
@@ -947,6 +948,8 @@ static void delayed_release(struct rcu_head *p)
{
struct fuse_conn *fc = container_of(p, struct fuse_conn, rcu);
+ fuse_uring_destruct(fc);
+
put_user_ns(fc->user_ns);
fc->release(fc);
}
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index d08b99d60f6fd6d0d072d01ad6bcc1b48da0a242..b60a42259f7f735f79e8010e5089f15c34eb9308 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -1186,4 +1186,74 @@ struct fuse_supp_groups {
uint32_t groups[];
};
+/**
+ * Size of the ring buffer header
+ */
+#define FUSE_RING_HEADER_BUF_SIZE 4096
+#define FUSE_RING_MIN_IN_OUT_ARG_SIZE 4096
+
+/*
+ * The request is of background type. The daemon is free to use this
+ * information to handle foreground/background CQEs with different priorities.
+ */
+#define FUSE_RING_REQ_FLAG_ASYNC (1ull << 0)
+
+/**
+ * This structure is mapped onto the userspace ring buffer
+ */
+struct fuse_ring_req {
+ union {
+ /* The first 4K are command data */
+ char ring_header[FUSE_RING_HEADER_BUF_SIZE];
+
+ struct {
+ uint64_t flags;
+
+ uint32_t in_out_arg_len;
+ uint32_t padding;
+
+ /* kernel fills in, reads out */
+ union {
+ struct fuse_in_header in;
+ struct fuse_out_header out;
+ };
+ };
+ };
+
+ char in_out_arg[];
+};
+
+/**
+ * sqe commands to the kernel
+ */
+enum fuse_uring_cmd {
+ FUSE_URING_REQ_INVALID = 0,
+
+ /* submit sqe to kernel to get a request */
+ FUSE_URING_REQ_FETCH = 1,
+
+ /* commit result and fetch next request */
+ FUSE_URING_REQ_COMMIT_AND_FETCH = 2,
+};
+
+/**
+ * This structure is placed in the 80B command area of the SQE.
+ */
+struct fuse_uring_cmd_req {
+ /* User buffer */
+ uint64_t buf_ptr;
+
+ /* entry identifier */
+ uint64_t commit_id;
+
+ /* queue the command is for (queue index) */
+ uint16_t qid;
+ uint8_t padding[6];
+
+ /* length of the user buffer */
+ uint32_t buf_len;
+
+ uint32_t flags;
+};
+
#endif /* _LINUX_FUSE_H */
--
2.43.0
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH RFC v4 06/15] fuse: Make fuse_copy non static
2024-10-16 0:05 [PATCH RFC v4 00/15] fuse: fuse-over-io-uring Bernd Schubert
` (4 preceding siblings ...)
2024-10-16 0:05 ` [PATCH RFC v4 05/15] fuse: {uring} Handle SQEs - register commands Bernd Schubert
@ 2024-10-16 0:05 ` Bernd Schubert
2024-10-16 0:05 ` [PATCH RFC v4 07/15] fuse: Add buffer offset for uring into fuse_copy_state Bernd Schubert
` (11 subsequent siblings)
17 siblings, 0 replies; 36+ messages in thread
From: Bernd Schubert @ 2024-10-16 0:05 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Amir Goldstein, Ming Lei, Bernd Schubert
Move 'struct fuse_copy_state' and fuse_copy_* functions
to fuse_dev_i.h to make them available for fuse-uring.
'copy_out_args()' is renamed to 'fuse_copy_out_args'.
Signed-off-by: Bernd Schubert <[email protected]>
---
fs/fuse/dev.c | 30 ++++++++----------------------
fs/fuse/fuse_dev_i.h | 25 +++++++++++++++++++++++++
2 files changed, 33 insertions(+), 22 deletions(-)
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 8e8d887bb3dfacec074753ebba7bd504335b5a18..dc4e0f787159a0ce28d29d410f23120aa55cad53 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -630,22 +630,8 @@ static int unlock_request(struct fuse_req *req)
return err;
}
-struct fuse_copy_state {
- int write;
- struct fuse_req *req;
- struct iov_iter *iter;
- struct pipe_buffer *pipebufs;
- struct pipe_buffer *currbuf;
- struct pipe_inode_info *pipe;
- unsigned long nr_segs;
- struct page *pg;
- unsigned len;
- unsigned offset;
- unsigned move_pages:1;
-};
-
-static void fuse_copy_init(struct fuse_copy_state *cs, int write,
- struct iov_iter *iter)
+void fuse_copy_init(struct fuse_copy_state *cs, int write,
+ struct iov_iter *iter)
{
memset(cs, 0, sizeof(*cs));
cs->write = write;
@@ -999,9 +985,9 @@ static int fuse_copy_one(struct fuse_copy_state *cs, void *val, unsigned size)
}
/* Copy request arguments to/from userspace buffer */
-static int fuse_copy_args(struct fuse_copy_state *cs, unsigned numargs,
- unsigned argpages, struct fuse_arg *args,
- int zeroing)
+int fuse_copy_args(struct fuse_copy_state *cs, unsigned numargs,
+ unsigned argpages, struct fuse_arg *args,
+ int zeroing)
{
int err = 0;
unsigned i;
@@ -1867,8 +1853,8 @@ static struct fuse_req *request_find(struct fuse_pqueue *fpq, u64 unique)
return NULL;
}
-static int copy_out_args(struct fuse_copy_state *cs, struct fuse_args *args,
- unsigned nbytes)
+int fuse_copy_out_args(struct fuse_copy_state *cs, struct fuse_args *args,
+ unsigned nbytes)
{
unsigned reqsize = sizeof(struct fuse_out_header);
@@ -1970,7 +1956,7 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud,
if (oh.error)
err = nbytes != sizeof(oh) ? -EINVAL : 0;
else
- err = copy_out_args(cs, req->args, nbytes);
+ err = fuse_copy_out_args(cs, req->args, nbytes);
fuse_copy_finish(cs);
spin_lock(&fpq->lock);
diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
index e82cbf9c569af4f271ba0456cb49e0a5116bf36b..f36e304cd62c8302aed95de89926fc894f602cfd 100644
--- a/fs/fuse/fuse_dev_i.h
+++ b/fs/fuse/fuse_dev_i.h
@@ -13,6 +13,23 @@
#define FUSE_INT_REQ_BIT (1ULL << 0)
#define FUSE_REQ_ID_STEP (1ULL << 1)
+struct fuse_arg;
+struct fuse_args;
+
+struct fuse_copy_state {
+ int write;
+ struct fuse_req *req;
+ struct iov_iter *iter;
+ struct pipe_buffer *pipebufs;
+ struct pipe_buffer *currbuf;
+ struct pipe_inode_info *pipe;
+ unsigned long nr_segs;
+ struct page *pg;
+ unsigned int len;
+ unsigned int offset;
+ unsigned int move_pages:1;
+};
+
static inline struct fuse_dev *fuse_get_dev(struct file *file)
{
/*
@@ -24,6 +41,14 @@ static inline struct fuse_dev *fuse_get_dev(struct file *file)
void fuse_dev_end_requests(struct list_head *head);
+void fuse_copy_init(struct fuse_copy_state *cs, int write,
+ struct iov_iter *iter);
+int fuse_copy_args(struct fuse_copy_state *cs, unsigned int numargs,
+ unsigned int argpages, struct fuse_arg *args,
+ int zeroing);
+int fuse_copy_out_args(struct fuse_copy_state *cs, struct fuse_args *args,
+ unsigned int nbytes);
+
#endif
--
2.43.0
* [PATCH RFC v4 07/15] fuse: Add buffer offset for uring into fuse_copy_state
2024-10-16 0:05 [PATCH RFC v4 00/15] fuse: fuse-over-io-uring Bernd Schubert
` (5 preceding siblings ...)
2024-10-16 0:05 ` [PATCH RFC v4 06/15] fuse: Make fuse_copy non static Bernd Schubert
@ 2024-10-16 0:05 ` Bernd Schubert
2024-10-16 0:05 ` [PATCH RFC v4 08/15] fuse: {uring} Add uring sqe commit and fetch support Bernd Schubert
` (10 subsequent siblings)
17 siblings, 0 replies; 36+ messages in thread
From: Bernd Schubert @ 2024-10-16 0:05 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Amir Goldstein, Ming Lei, Bernd Schubert
This is needed to know the overall size of the copy.
Signed-off-by: Bernd Schubert <[email protected]>
---
fs/fuse/dev.c | 13 ++++++++++++-
fs/fuse/fuse_dev_i.h | 5 +++++
2 files changed, 17 insertions(+), 1 deletion(-)
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index dc4e0f787159a0ce28d29d410f23120aa55cad53..12836b44de9164e750f2a5f4c4d78c5c934a32b4 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -738,6 +738,9 @@ static int fuse_copy_do(struct fuse_copy_state *cs, void **val, unsigned *size)
*size -= ncpy;
cs->len -= ncpy;
cs->offset += ncpy;
+ if (cs->is_uring)
+ cs->ring.offset += ncpy;
+
return ncpy;
}
@@ -1856,7 +1859,15 @@ static struct fuse_req *request_find(struct fuse_pqueue *fpq, u64 unique)
int fuse_copy_out_args(struct fuse_copy_state *cs, struct fuse_args *args,
unsigned nbytes)
{
- unsigned reqsize = sizeof(struct fuse_out_header);
+
+ unsigned int reqsize = 0;
+
+ /*
+ * Uring has the out header outside of args
+ * XXX: This uring exception will probably change
+ */
+ if (!cs->is_uring)
+ reqsize = sizeof(struct fuse_out_header);
reqsize += fuse_len_args(args->out_numargs, args->out_args);
diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
index f36e304cd62c8302aed95de89926fc894f602cfd..7ecb103af6f0feca99eb8940872c6a5ccf2e5186 100644
--- a/fs/fuse/fuse_dev_i.h
+++ b/fs/fuse/fuse_dev_i.h
@@ -28,6 +28,11 @@ struct fuse_copy_state {
unsigned int len;
unsigned int offset;
unsigned int move_pages:1;
+ unsigned int is_uring:1;
+ struct {
+ /* overall offset with the user buffer */
+ unsigned int offset;
+ } ring;
};
static inline struct fuse_dev *fuse_get_dev(struct file *file)
--
2.43.0
* [PATCH RFC v4 08/15] fuse: {uring} Add uring sqe commit and fetch support
2024-10-16 0:05 [PATCH RFC v4 00/15] fuse: fuse-over-io-uring Bernd Schubert
` (6 preceding siblings ...)
2024-10-16 0:05 ` [PATCH RFC v4 07/15] fuse: Add buffer offset for uring into fuse_copy_state Bernd Schubert
@ 2024-10-16 0:05 ` Bernd Schubert
2024-10-16 0:05 ` [PATCH RFC v4 09/15] fuse: {uring} Handle teardown of ring entries Bernd Schubert
` (9 subsequent siblings)
17 siblings, 0 replies; 36+ messages in thread
From: Bernd Schubert @ 2024-10-16 0:05 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Amir Goldstein, Ming Lei, Bernd Schubert
This adds support for fuse request completion through ring SQEs
(FUSE_URING_REQ_COMMIT_AND_FETCH handling). After committing
the ring entry it becomes available for new fuse requests.
Handling of requests through the ring (SQE/CQE handling)
is complete now.
Fuse request data are copied through the mmapped ring buffer;
there is no support for any zero copy yet.
Signed-off-by: Bernd Schubert <[email protected]>
---
fs/fuse/dev.c | 6 +-
fs/fuse/dev_uring.c | 430 ++++++++++++++++++++++++++++++++++++++++++++++++++
fs/fuse/dev_uring_i.h | 13 ++
fs/fuse/fuse_dev_i.h | 7 +-
fs/fuse/fuse_i.h | 9 ++
fs/fuse/inode.c | 2 +-
6 files changed, 462 insertions(+), 5 deletions(-)
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 12836b44de9164e750f2a5f4c4d78c5c934a32b4..fdb43640db5fdbe6b6232e1b2e2259e3117d237d 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -188,7 +188,7 @@ u64 fuse_get_unique(struct fuse_iqueue *fiq)
}
EXPORT_SYMBOL_GPL(fuse_get_unique);
-static unsigned int fuse_req_hash(u64 unique)
+unsigned int fuse_req_hash(u64 unique)
{
return hash_long(unique & ~FUSE_INT_REQ_BIT, FUSE_PQ_HASH_BITS);
}
@@ -1844,7 +1844,7 @@ static int fuse_notify(struct fuse_conn *fc, enum fuse_notify_code code,
}
/* Look up request on processing list by unique ID */
-static struct fuse_req *request_find(struct fuse_pqueue *fpq, u64 unique)
+struct fuse_req *fuse_request_find(struct fuse_pqueue *fpq, u64 unique)
{
unsigned int hash = fuse_req_hash(unique);
struct fuse_req *req;
@@ -1929,7 +1929,7 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud,
spin_lock(&fpq->lock);
req = NULL;
if (fpq->connected)
- req = request_find(fpq, oh.unique & ~FUSE_INT_REQ_BIT);
+ req = fuse_request_find(fpq, oh.unique & ~FUSE_INT_REQ_BIT);
err = -ENOENT;
if (!req) {
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index 724ac6ae67d301a7bdc5b36a20d620ff8be63b18..0c39d5c1c62a1c496782e5c54b9f72a70cffdfa2 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -19,6 +19,22 @@ MODULE_PARM_DESC(enable_uring,
"Enable userspace communication through uring.");
#endif
+/*
+ * Finalize a fuse request, then fetch and send the next entry, if available
+ */
+static void fuse_uring_req_end(struct fuse_ring_ent *ring_ent, bool set_err,
+ int error)
+{
+ struct fuse_req *req = ring_ent->fuse_req;
+
+ if (set_err)
+ req->out.h.error = error;
+
+ clear_bit(FR_SENT, &req->flags);
+ fuse_request_end(ring_ent->fuse_req);
+ ring_ent->fuse_req = NULL;
+}
+
static int fuse_ring_ring_ent_unset_userspace(struct fuse_ring_ent *ent)
{
struct fuse_ring_queue *queue = ent->queue;
@@ -50,7 +66,9 @@ void fuse_uring_destruct(struct fuse_conn *fc)
WARN_ON(!list_empty(&queue->ent_avail_queue));
WARN_ON(!list_empty(&queue->ent_intermediate_queue));
+ WARN_ON(!list_empty(&queue->ent_in_userspace));
+ kfree(queue->fpq.processing);
kfree(queue);
ring->queues[qid] = NULL;
}
@@ -107,6 +125,7 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
{
struct fuse_conn *fc = ring->fc;
struct fuse_ring_queue *queue;
+ struct list_head *pq;
queue = kzalloc(sizeof(*queue), GFP_KERNEL_ACCOUNT);
if (!queue)
@@ -114,6 +133,7 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
spin_lock(&fc->lock);
if (ring->queues[qid]) {
spin_unlock(&fc->lock);
+ kfree(queue->fpq.processing);
kfree(queue);
return ring->queues[qid];
}
@@ -125,12 +145,228 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
INIT_LIST_HEAD(&queue->ent_avail_queue);
INIT_LIST_HEAD(&queue->ent_intermediate_queue);
+ INIT_LIST_HEAD(&queue->ent_in_userspace);
+ INIT_LIST_HEAD(&queue->fuse_req_queue);
+
+ pq = kcalloc(FUSE_PQ_HASH_SIZE, sizeof(struct list_head), GFP_KERNEL);
+ if (!pq) {
+ kfree(queue);
+ return ERR_PTR(-ENOMEM);
+ }
+ queue->fpq.processing = pq;
+ fuse_pqueue_init(&queue->fpq);
spin_unlock(&fc->lock);
return queue;
}
+static void
+fuse_uring_async_send_to_ring(struct io_uring_cmd *cmd,
+ unsigned int issue_flags)
+{
+ io_uring_cmd_done(cmd, 0, 0, issue_flags);
+}
+
+/*
+ * Checks for errors and stores it into the request
+ */
+static int fuse_uring_out_header_has_err(struct fuse_out_header *oh,
+ struct fuse_req *req,
+ struct fuse_conn *fc)
+{
+ int err;
+
+ if (oh->unique == 0) {
+ /* Not supported through request-based uring, this needs another
+ * ring from user space to kernel
+ */
+ pr_warn("Unsupported fuse-notify\n");
+ err = -EINVAL;
+ goto seterr;
+ }
+
+ if (oh->error <= -512 || oh->error > 0) {
+ err = -EINVAL;
+ goto seterr;
+ }
+
+ if (oh->error) {
+ err = oh->error;
+ pr_devel("%s:%d err=%d op=%d req-ret=%d\n", __func__, __LINE__,
+ err, req->args->opcode, req->out.h.error);
+ goto err; /* error already set */
+ }
+
+ if ((oh->unique & ~FUSE_INT_REQ_BIT) != req->in.h.unique) {
+ pr_warn("Unexpected seqno mismatch, expected: %llu got %llu\n",
+ req->in.h.unique, oh->unique & ~FUSE_INT_REQ_BIT);
+ err = -ENOENT;
+ goto seterr;
+ }
+
+ /* Is it an interrupt reply ID? */
+ if (oh->unique & FUSE_INT_REQ_BIT) {
+ err = 0;
+ if (oh->error == -ENOSYS)
+ fc->no_interrupt = 1;
+ else if (oh->error == -EAGAIN) {
+ /* XXX Interrupts not handled yet */
+ /* err = queue_interrupt(req); */
+ pr_warn("Interrupt EAGAIN not supported yet\n");
+ err = -EINVAL;
+ }
+
+ goto seterr;
+ }
+
+ return 0;
+
+seterr:
+ pr_devel("%s:%d err=%d op=%d req-ret=%d\n", __func__, __LINE__, err,
+ req->args->opcode, req->out.h.error);
+ oh->error = err;
+err:
+ pr_devel("%s:%d err=%d op=%d req-ret=%d\n", __func__, __LINE__, err,
+ req->args->opcode, req->out.h.error);
+ return err;
+}
+
+static int fuse_uring_copy_from_ring(struct fuse_ring *ring,
+ struct fuse_req *req,
+ struct fuse_ring_ent *ent)
+{
+ struct fuse_ring_req __user *rreq = ent->rreq;
+ struct fuse_copy_state cs;
+ struct fuse_args *args = req->args;
+ struct iov_iter iter;
+ int err;
+ int res_arg_len;
+
+ err = copy_from_user(&res_arg_len, &rreq->in_out_arg_len,
+ sizeof(res_arg_len));
+ if (err)
+ return err;
+
+ err = import_ubuf(ITER_SOURCE, (void __user *)&rreq->in_out_arg,
+ ent->max_arg_len, &iter);
+ if (err)
+ return err;
+
+ fuse_copy_init(&cs, 0, &iter);
+ cs.is_uring = 1;
+ cs.req = req;
+
+ return fuse_copy_out_args(&cs, args, res_arg_len);
+}
+
+ /*
+ * Copy data from the req to the ring buffer
+ */
+static int fuse_uring_copy_to_ring(struct fuse_ring *ring, struct fuse_req *req,
+ struct fuse_ring_ent *ent)
+{
+ struct fuse_ring_req __user *rreq = ent->rreq;
+ struct fuse_copy_state cs;
+ struct fuse_args *args = req->args;
+ int err, res;
+ struct iov_iter iter;
+
+ err = import_ubuf(ITER_DEST, (void __user *)&rreq->in_out_arg,
+ ent->max_arg_len, &iter);
+ if (err) {
+ pr_info("Import user buffer failed\n");
+ return err;
+ }
+
+ fuse_copy_init(&cs, 1, &iter);
+ cs.is_uring = 1;
+ cs.req = req;
+ err = fuse_copy_args(&cs, args->in_numargs, args->in_pages,
+ (struct fuse_arg *)args->in_args, 0);
+ if (err) {
+ pr_info("%s fuse_copy_args failed\n", __func__);
+ return err;
+ }
+
+ BUILD_BUG_ON((sizeof(rreq->in_out_arg_len) != sizeof(cs.ring.offset)));
+ res = copy_to_user(&rreq->in_out_arg_len, &cs.ring.offset,
+ sizeof(rreq->in_out_arg_len));
+ err = res > 0 ? -EFAULT : res;
+
+ return err;
+}
+
+static int
+fuse_uring_prepare_send(struct fuse_ring_ent *ring_ent)
+{
+ struct fuse_ring_req *rreq = ring_ent->rreq;
+ struct fuse_ring_queue *queue = ring_ent->queue;
+ struct fuse_ring *ring = queue->ring;
+ struct fuse_req *req = ring_ent->fuse_req;
+ int err = 0, res;
+
+ if (WARN_ON(ring_ent->state != FRRS_FUSE_REQ)) {
+ pr_err("qid=%d ring-req=%p buf_req=%p invalid state %d on send\n",
+ queue->qid, ring_ent, rreq, ring_ent->state);
+ err = -EIO;
+ }
+
+ if (err)
+ return err;
+
+ pr_devel("%s qid=%d state=%d cmd-done op=%d unique=%llu\n", __func__,
+ queue->qid, ring_ent->state, req->in.h.opcode,
+ req->in.h.unique);
+
+ /* copy the request */
+ err = fuse_uring_copy_to_ring(ring, req, ring_ent);
+ if (unlikely(err)) {
+ pr_info("Copy to ring failed: %d\n", err);
+ goto err;
+ }
+
+ /* copy fuse_in_header */
+ res = copy_to_user(&rreq->in, &req->in.h, sizeof(rreq->in));
+ err = res > 0 ? -EFAULT : res;
+ if (err)
+ goto err;
+
+ set_bit(FR_SENT, &req->flags);
+ return 0;
+
+err:
+ fuse_uring_req_end(ring_ent, true, err);
+ return err;
+}
+
+/*
+ * Write data to the ring buffer and send the request to userspace;
+ * userspace will read it.
+ * This is comparable with a classical read(/dev/fuse).
+ */
+static int fuse_uring_send_next_to_ring(struct fuse_ring_ent *ring_ent)
+{
+ int err = 0;
+ struct fuse_ring_queue *queue = ring_ent->queue;
+
+ err = fuse_uring_prepare_send(ring_ent);
+ if (err)
+ goto err;
+
+ spin_lock(&queue->lock);
+ ring_ent->state = FRRS_USERSPACE;
+ list_move(&ring_ent->list, &queue->ent_in_userspace);
+ spin_unlock(&queue->lock);
+
+ io_uring_cmd_complete_in_task(ring_ent->cmd,
+ fuse_uring_async_send_to_ring);
+ return 0;
+
+err:
+ return err;
+}
+
/*
* Put a ring request onto hold, it is no longer used for now.
*/
@@ -157,6 +393,197 @@ static void fuse_uring_ent_avail(struct fuse_ring_ent *ring_ent,
ring_ent->state = FRRS_WAIT;
}
+/* Used to find the request on SQE commit */
+static void fuse_uring_add_to_pq(struct fuse_ring_ent *ring_ent)
+{
+ struct fuse_ring_queue *queue = ring_ent->queue;
+ struct fuse_req *req = ring_ent->fuse_req;
+ struct fuse_pqueue *fpq = &queue->fpq;
+ unsigned int hash;
+
+ hash = fuse_req_hash(req->in.h.unique);
+ list_move_tail(&req->list, &fpq->processing[hash]);
+ req->ring_entry = ring_ent;
+}
+
+/*
+ * Assign a fuse queue entry to the given entry
+ */
+static void fuse_uring_add_req_to_ring_ent(struct fuse_ring_ent *ring_ent,
+ struct fuse_req *req)
+{
+ lockdep_assert_held(&ring_ent->queue->lock);
+
+ if (WARN_ON_ONCE(ring_ent->state != FRRS_WAIT &&
+ ring_ent->state != FRRS_COMMIT)) {
+ pr_warn("%s qid=%d state=%d\n", __func__, ring_ent->queue->qid,
+ ring_ent->state);
+ }
+ list_del_init(&req->list);
+ clear_bit(FR_PENDING, &req->flags);
+ ring_ent->fuse_req = req;
+ ring_ent->state = FRRS_FUSE_REQ;
+
+ fuse_uring_add_to_pq(ring_ent);
+}
+
+/*
+ * Release the ring entry and fetch the next fuse request if available
+ *
+ * @return true if a new request has been fetched
+ */
+static bool fuse_uring_ent_assign_req(struct fuse_ring_ent *ring_ent)
+ __must_hold(&queue->lock)
+{
+ struct fuse_req *req = NULL;
+ struct fuse_ring_queue *queue = ring_ent->queue;
+ struct list_head *req_queue = &queue->fuse_req_queue;
+
+ lockdep_assert_held(&queue->lock);
+
+ /* get and assign the next entry while it is still holding the lock */
+ if (!list_empty(req_queue)) {
+ req = list_first_entry(req_queue, struct fuse_req, list);
+ fuse_uring_add_req_to_ring_ent(ring_ent, req);
+ list_move(&ring_ent->list, &queue->ent_intermediate_queue);
+ }
+
+ return req ? true : false;
+}
+
+/*
+ * Read data from the ring buffer, which user space has written to.
+ * This is comparable with the handling of a classical write(/dev/fuse).
+ * Also make the ring request available again for new fuse requests.
+ */
+static void fuse_uring_commit(struct fuse_ring_ent *ring_ent,
+ unsigned int issue_flags)
+{
+ struct fuse_ring *ring = ring_ent->queue->ring;
+ struct fuse_conn *fc = ring->fc;
+ struct fuse_ring_req *rreq = ring_ent->rreq;
+ struct fuse_req *req = ring_ent->fuse_req;
+ ssize_t err = 0;
+ bool set_err = false;
+
+ err = copy_from_user(&req->out.h, &rreq->out, sizeof(req->out.h));
+ if (err) {
+ req->out.h.error = err;
+ goto out;
+ }
+
+ err = fuse_uring_out_header_has_err(&req->out.h, req, fc);
+ if (err) {
+ /* req->out.h.error already set */
+ pr_devel("%s:%d err=%zd oh->err=%d\n", __func__, __LINE__, err,
+ req->out.h.error);
+ goto out;
+ }
+
+ err = fuse_uring_copy_from_ring(ring, req, ring_ent);
+ if (err)
+ set_err = true;
+
+out:
+ pr_devel("%s:%d ret=%zd op=%d req-ret=%d\n", __func__, __LINE__, err,
+ req->args->opcode, req->out.h.error);
+ fuse_uring_req_end(ring_ent, set_err, err);
+}
+
+/*
+ * Get the next fuse req and send it
+ */
+static void fuse_uring_next_fuse_req(struct fuse_ring_ent *ring_ent,
+ struct fuse_ring_queue *queue)
+{
+ int has_next, err;
+ int prev_state = ring_ent->state;
+
+ do {
+ spin_lock(&queue->lock);
+ has_next = fuse_uring_ent_assign_req(ring_ent);
+ if (!has_next) {
+ fuse_uring_ent_avail(ring_ent, queue);
+ spin_unlock(&queue->lock);
+ break; /* no request left */
+ }
+ spin_unlock(&queue->lock);
+
+ err = fuse_uring_send_next_to_ring(ring_ent);
+ if (err) {
+ ring_ent->state = prev_state;
+ continue;
+ }
+
+ err = 0;
+ } while (err);
+}
+
+/* FUSE_URING_REQ_COMMIT_AND_FETCH handler */
+static int fuse_uring_commit_fetch(struct io_uring_cmd *cmd, int issue_flags,
+ struct fuse_conn *fc)
+{
+ const struct fuse_uring_cmd_req *cmd_req = io_uring_sqe_cmd(cmd->sqe);
+ struct fuse_ring_ent *ring_ent;
+ int err;
+ struct fuse_ring *ring = fc->ring;
+ struct fuse_ring_queue *queue;
+ uint64_t commit_id = cmd_req->commit_id;
+ struct fuse_pqueue fpq;
+ unsigned int hash;
+ struct fuse_req *req;
+
+ err = -ENOTCONN;
+ if (!ring)
+ return err;
+
+ queue = ring->queues[cmd_req->qid];
+ if (!queue)
+ return err;
+ fpq = queue->fpq;
+
+ spin_lock(&queue->lock);
+ /* Find a request based on the unique ID of the fuse request
+ * This should get revised, as it needs a hash calculation and list
+ * search. And full struct fuse_pqueue is needed (memory overhead).
+ * As well as the link from req to ring_ent.
+ */
+ hash = fuse_req_hash(commit_id);
+ req = fuse_request_find(&fpq, commit_id);
+ err = -ENOENT;
+ if (!req) {
+ pr_info("qid=%d commit_id %llu not found\n", queue->qid,
+ commit_id);
+ spin_unlock(&queue->lock);
+ return err;
+ }
+ ring_ent = req->ring_entry;
+ req->ring_entry = NULL;
+
+ err = fuse_ring_ring_ent_unset_userspace(ring_ent);
+ if (err != 0) {
+ pr_info_ratelimited("qid=%d commit_id %llu state %d\n",
+ queue->qid, commit_id, ring_ent->state);
+ spin_unlock(&queue->lock);
+ return err;
+ }
+
+ ring_ent->cmd = cmd;
+ spin_unlock(&queue->lock);
+
+ /* without the queue lock, as other locks are taken */
+ fuse_uring_commit(ring_ent, issue_flags);
+
+ /*
+ * Fetching the next request is absolutely required as queued
+ * fuse requests would otherwise not get processed - committing
+ * and fetching is done in one step vs legacy fuse, which has separate
+ * read (fetch request) and write (commit result) operations.
+ */
+ fuse_uring_next_fuse_req(ring_ent, queue);
+ return 0;
+}
+
/*
* fuse_uring_req_fetch command handling
*/
@@ -281,6 +708,9 @@ int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
case FUSE_URING_REQ_FETCH:
err = fuse_uring_fetch(cmd, issue_flags, fc);
break;
+ case FUSE_URING_REQ_COMMIT_AND_FETCH:
+ err = fuse_uring_commit_fetch(cmd, issue_flags, fc);
+ break;
default:
err = -EINVAL;
pr_devel("Unknown uring command %d", cmd_op);
diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
index 9a763262c6a5a781a36c3825529d729efef80e78..9bc7f490b02acb46aa7bbb31d5ce55a4d2787a60 100644
--- a/fs/fuse/dev_uring_i.h
+++ b/fs/fuse/dev_uring_i.h
@@ -19,6 +19,9 @@ enum fuse_ring_req_state {
/* The ring request waits for a new fuse request */
FRRS_WAIT,
+ /* The ring req got assigned a fuse req */
+ FRRS_FUSE_REQ,
+
/* request is in or on the way to user space */
FRRS_USERSPACE,
};
@@ -43,6 +46,8 @@ struct fuse_ring_ent {
/* struct fuse_ring_req::in_out_arg size */
size_t max_arg_len;
+
+ struct fuse_req *fuse_req;
};
struct fuse_ring_queue {
@@ -69,6 +74,14 @@ struct fuse_ring_queue {
* to be send to userspace
*/
struct list_head ent_intermediate_queue;
+
+ /* entries in userspace */
+ struct list_head ent_in_userspace;
+
+ /* fuse requests waiting for an entry slot */
+ struct list_head fuse_req_queue;
+
+ struct fuse_pqueue fpq;
};
/**
diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
index 7ecb103af6f0feca99eb8940872c6a5ccf2e5186..a8d578b99a14239c05b4a496a4b3b1396eb768dd 100644
--- a/fs/fuse/fuse_dev_i.h
+++ b/fs/fuse/fuse_dev_i.h
@@ -7,7 +7,7 @@
#define _FS_FUSE_DEV_I_H
#include <linux/types.h>
-
+#include <linux/fs.h>
/* Ordinary requests have even IDs, while interrupts IDs are odd */
#define FUSE_INT_REQ_BIT (1ULL << 0)
@@ -15,6 +15,8 @@
struct fuse_arg;
struct fuse_args;
+struct fuse_pqueue;
+struct fuse_req;
struct fuse_copy_state {
int write;
@@ -44,6 +46,9 @@ static inline struct fuse_dev *fuse_get_dev(struct file *file)
return READ_ONCE(file->private_data);
}
+unsigned int fuse_req_hash(u64 unique);
+struct fuse_req *fuse_request_find(struct fuse_pqueue *fpq, u64 unique);
+
void fuse_dev_end_requests(struct list_head *head);
void fuse_copy_init(struct fuse_copy_state *cs, int write,
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 33e81b895fee620b9c2fcc8d9312fec53e3dc227..f1ddaba92869518db8854512ec8dd5ed0a0eeaa7 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -435,6 +435,10 @@ struct fuse_req {
/** fuse_mount this request belongs to */
struct fuse_mount *fm;
+
+#ifdef CONFIG_FUSE_IO_URING
+ void *ring_entry;
+#endif
};
struct fuse_iqueue;
@@ -1200,6 +1204,11 @@ void fuse_change_entry_timeout(struct dentry *entry, struct fuse_entry_out *o);
*/
struct fuse_conn *fuse_conn_get(struct fuse_conn *fc);
+/**
+ * Initialize the fuse processing queue
+ */
+void fuse_pqueue_init(struct fuse_pqueue *fpq);
+
/**
* Initialize fuse_conn
*/
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 59f8fb7b915f052f892d587a0f9a8dc17cf750ce..a1179c1e212b7a1cfd6e69f20dd5fcbe18c6202b 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -894,7 +894,7 @@ static void fuse_iqueue_init(struct fuse_iqueue *fiq,
fiq->priv = priv;
}
-static void fuse_pqueue_init(struct fuse_pqueue *fpq)
+void fuse_pqueue_init(struct fuse_pqueue *fpq)
{
unsigned int i;
--
2.43.0
* [PATCH RFC v4 09/15] fuse: {uring} Handle teardown of ring entries
2024-10-16 0:05 [PATCH RFC v4 00/15] fuse: fuse-over-io-uring Bernd Schubert
` (7 preceding siblings ...)
2024-10-16 0:05 ` [PATCH RFC v4 08/15] fuse: {uring} Add uring sqe commit and fetch support Bernd Schubert
@ 2024-10-16 0:05 ` Bernd Schubert
2024-10-16 0:05 ` [PATCH RFC v4 10/15] fuse: {uring} Add a ring queue and send method Bernd Schubert
` (8 subsequent siblings)
17 siblings, 0 replies; 36+ messages in thread
From: Bernd Schubert @ 2024-10-16 0:05 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Amir Goldstein, Ming Lei, Bernd Schubert
On teardown, struct file_operations::uring_cmd requests
need to be completed by calling io_uring_cmd_done().
Not completing all ring entries would result in busy io-uring
tasks that emit periodic warning messages and in an unreleased
struct file.
Additionally the fuse connection and with that the ring can
only get released when all io-uring commands are completed.
Completion is done with ring entries that are
a) in waiting state for new fuse requests - io_uring_cmd_done
is needed
b) already in userspace - io_uring_cmd_done through teardown
is not needed, the request can just get released. If the fuse server
is still active and commits such a ring entry, fuse_uring_cmd()
already checks if the connection is active and then completes the
io-uring command itself with -ENOTCONN. I.e. special handling is not
needed.
This scheme is basically represented by the ring entry states
FRRS_WAIT and FRRS_USERSPACE.
Entries in state:
- FRRS_INIT: No action needed, do not contribute to
ring->queue_refs yet
- All other states: Are currently processed by other tasks,
async teardown is needed and it has to wait for the two
states above. It could also be solved without an async
teardown task, but that would require additional if conditions
in hot code paths. Also, in my personal opinion, the code
looks cleaner with async teardown.
Signed-off-by: Bernd Schubert <[email protected]>
---
fs/fuse/dev.c | 8 ++
fs/fuse/dev_uring.c | 212 +++++++++++++++++++++++++++++++++++++++++++++++++-
fs/fuse/dev_uring_i.h | 62 +++++++++++++++
3 files changed, 279 insertions(+), 3 deletions(-)
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index fdb43640db5fdbe6b6232e1b2e2259e3117d237d..c8cc5fb2cfada29226f578a6273e8d6d34ab59ab 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -2226,6 +2226,12 @@ void fuse_abort_conn(struct fuse_conn *fc)
spin_unlock(&fc->lock);
fuse_dev_end_requests(&to_end);
+
+ /*
+ * fc->lock must not be taken to avoid conflicts with io-uring
+ * locks
+ */
+ fuse_uring_abort(fc);
} else {
spin_unlock(&fc->lock);
}
@@ -2237,6 +2243,8 @@ void fuse_wait_aborted(struct fuse_conn *fc)
/* matches implicit memory barrier in fuse_drop_waiting() */
smp_mb();
wait_event(fc->blocked_waitq, atomic_read(&fc->num_waiting) == 0);
+
+ fuse_uring_wait_stopped_queues(fc);
}
int fuse_dev_release(struct inode *inode, struct file *file)
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index 0c39d5c1c62a1c496782e5c54b9f72a70cffdfa2..455a42a6b9348dda15dd082d3bfd778279f61e0b 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -50,6 +50,36 @@ static int fuse_ring_ring_ent_unset_userspace(struct fuse_ring_ent *ent)
return 0;
}
+/* Abort all requests queued on the given ring queue */
+static void fuse_uring_abort_end_queue_requests(struct fuse_ring_queue *queue)
+{
+ struct fuse_req *req;
+ LIST_HEAD(req_list);
+
+ spin_lock(&queue->lock);
+ list_for_each_entry(req, &queue->fuse_req_queue, list)
+ clear_bit(FR_PENDING, &req->flags);
+ list_splice_init(&queue->fuse_req_queue, &req_list);
+ spin_unlock(&queue->lock);
+
+ /* must not hold queue lock to avoid order issues with fi->lock */
+ fuse_dev_end_requests(&req_list);
+}
+
+void fuse_uring_abort_end_requests(struct fuse_ring *ring)
+{
+ int qid;
+
+ for (qid = 0; qid < ring->nr_queues; qid++) {
+ struct fuse_ring_queue *queue = ring->queues[qid];
+
+ if (!queue)
+ continue;
+
+ fuse_uring_abort_end_queue_requests(queue);
+ }
+}
+
void fuse_uring_destruct(struct fuse_conn *fc)
{
struct fuse_ring *ring = fc->ring;
@@ -106,9 +136,12 @@ static struct fuse_ring *fuse_uring_create(struct fuse_conn *fc)
goto out_err;
}
+ init_waitqueue_head(&ring->stop_waitq);
+
fc->ring = ring;
ring->nr_queues = nr_queues;
ring->fc = fc;
+ atomic_set(&ring->queue_refs, 0);
spin_unlock(&fc->lock);
return ring;
@@ -168,6 +201,175 @@ fuse_uring_async_send_to_ring(struct io_uring_cmd *cmd,
io_uring_cmd_done(cmd, 0, 0, issue_flags);
}
+static void fuse_uring_stop_fuse_req_end(struct fuse_ring_ent *ent)
+{
+ struct fuse_req *req = ent->fuse_req;
+
+ ent->fuse_req = NULL;
+ clear_bit(FR_SENT, &req->flags);
+ req->out.h.error = -ECONNABORTED;
+ fuse_request_end(req);
+}
+
+/*
+ * Release a request/entry on connection tear down
+ */
+static void fuse_uring_entry_teardown(struct fuse_ring_ent *ent,
+ bool need_cmd_done)
+{
+ struct fuse_ring_queue *queue = ent->queue;
+
+ /*
+ * fuse_request_end() might take other locks like fi->lock and
+ * can lead to lock ordering issues
+ */
+ lockdep_assert_not_held(&ent->queue->lock);
+
+ if (need_cmd_done) {
+ pr_devel("qid=%d sending cmd_done\n", queue->qid);
+
+ io_uring_cmd_done(ent->cmd, -ENOTCONN, 0,
+ IO_URING_F_UNLOCKED);
+ }
+
+ if (ent->fuse_req)
+ fuse_uring_stop_fuse_req_end(ent);
+
+ list_del_init(&ent->list);
+ kfree(ent);
+}
+
+static void fuse_uring_stop_list_entries(struct list_head *head,
+ struct fuse_ring_queue *queue,
+ enum fuse_ring_req_state exp_state)
+{
+ struct fuse_ring *ring = queue->ring;
+ struct fuse_ring_ent *ent, *next;
+ ssize_t queue_refs = SSIZE_MAX;
+ LIST_HEAD(to_teardown);
+
+ spin_lock(&queue->lock);
+ list_for_each_entry_safe(ent, next, head, list) {
+ if (ent->state != exp_state) {
+ pr_warn("entry teardown qid=%d state=%d expected=%d\n",
+ queue->qid, ent->state, exp_state);
+ continue;
+ }
+
+ list_move(&ent->list, &to_teardown);
+ }
+ spin_unlock(&queue->lock);
+
+ /* no queue lock to avoid lock order issues */
+ list_for_each_entry_safe(ent, next, &to_teardown, list) {
+ bool need_cmd_done = ent->state != FRRS_USERSPACE;
+
+ fuse_uring_entry_teardown(ent, need_cmd_done);
+ queue_refs = atomic_dec_return(&ring->queue_refs);
+
+ if (WARN_ON_ONCE(queue_refs < 0))
+ pr_warn("qid=%d queue_refs=%zd\n", queue->qid,
+ queue_refs);
+ }
+}
+
+static void fuse_uring_stop_queue(struct fuse_ring_queue *queue)
+{
+ fuse_uring_stop_list_entries(&queue->ent_in_userspace, queue,
+ FRRS_USERSPACE);
+ fuse_uring_stop_list_entries(&queue->ent_avail_queue, queue, FRRS_WAIT);
+}
+
+/*
+ * Log state debug info
+ */
+static void fuse_uring_log_ent_state(struct fuse_ring *ring)
+{
+ int qid;
+ struct fuse_ring_ent *ent;
+
+ for (qid = 0; qid < ring->nr_queues; qid++) {
+ struct fuse_ring_queue *queue = ring->queues[qid];
+
+ if (!queue)
+ continue;
+
+ spin_lock(&queue->lock);
+ /*
+ * Log entries from the intermediate queue, the other queues
+ * should be empty
+ */
+ list_for_each_entry(ent, &queue->ent_intermediate_queue, list) {
+ pr_info("ring=%p qid=%d ent=%p state=%d\n", ring, qid,
+ ent, ent->state);
+ }
+ spin_unlock(&queue->lock);
+ }
+ ring->stop_debug_log = 1;
+}
+
+static void fuse_uring_async_stop_queues(struct work_struct *work)
+{
+ int qid;
+ struct fuse_ring *ring =
+ container_of(work, struct fuse_ring, async_teardown_work.work);
+
+ for (qid = 0; qid < ring->nr_queues; qid++) {
+ struct fuse_ring_queue *queue = ring->queues[qid];
+
+ if (!queue)
+ continue;
+
+ fuse_uring_stop_queue(queue);
+ }
+
+ /*
+ * Some ring entries might still be in the middle of IO operations,
+ * i.e. in the process of being handled by file_operations::uring_cmd
+ * or on the way to userspace - this could be handled with conditions
+ * in run-time code, but it is easier/cleaner to have an async teardown
+ * handler that retries while queue references are left.
+ */
+ if (atomic_read(&ring->queue_refs) > 0) {
+ if (time_after(jiffies,
+ ring->teardown_time + FUSE_URING_TEARDOWN_TIMEOUT))
+ fuse_uring_log_ent_state(ring);
+
+ schedule_delayed_work(&ring->async_teardown_work,
+ FUSE_URING_TEARDOWN_INTERVAL);
+ } else {
+ wake_up_all(&ring->stop_waitq);
+ }
+}
+
+/*
+ * Stop the ring queues
+ */
+void fuse_uring_stop_queues(struct fuse_ring *ring)
+{
+ int qid;
+
+ for (qid = 0; qid < ring->nr_queues; qid++) {
+ struct fuse_ring_queue *queue = ring->queues[qid];
+
+ if (!queue)
+ continue;
+
+ fuse_uring_stop_queue(queue);
+ }
+
+ if (atomic_read(&ring->queue_refs) > 0) {
+ pr_info("ring=%p scheduling async queue stop\n", ring);
+ ring->teardown_time = jiffies;
+ INIT_DELAYED_WORK(&ring->async_teardown_work,
+ fuse_uring_async_stop_queues);
+ schedule_delayed_work(&ring->async_teardown_work,
+ FUSE_URING_TEARDOWN_INTERVAL);
+ } else {
+ wake_up_all(&ring->stop_waitq);
+ }
+}
+
/*
* Checks for errors and stores it into the request
*/
@@ -542,6 +744,9 @@ static int fuse_uring_commit_fetch(struct io_uring_cmd *cmd, int issue_flags,
return err;
fpq = queue->fpq;
+ if (!READ_ONCE(fc->connected) || READ_ONCE(queue->stopped))
+ return err;
+
spin_lock(&queue->lock);
/* Find a request based on the unique ID of the fuse request
* This should get revised, as it needs a hash calculation and list
@@ -659,6 +864,7 @@ static int fuse_uring_fetch(struct io_uring_cmd *cmd, unsigned int issue_flags,
if (WARN_ON_ONCE(err != 0))
goto err;
+ atomic_inc(&ring->queue_refs);
_fuse_uring_fetch(ring_ent, cmd, issue_flags);
return 0;
@@ -680,13 +886,13 @@ int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
u32 cmd_op = cmd->cmd_op;
int err = 0;
- pr_devel("%s:%d received: cmd op %d\n", __func__, __LINE__, cmd_op);
-
/* Disabled for now, especially as teardown is not implemented yet */
err = -EOPNOTSUPP;
pr_info_ratelimited("fuse-io-uring is not enabled yet\n");
goto out;
+ pr_devel("%s:%d received: cmd op %d\n", __func__, __LINE__, cmd_op);
+
err = -EOPNOTSUPP;
if (!enable_uring) {
pr_info_ratelimited("uring is disabled\n");
@@ -709,7 +915,7 @@ int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
err = fuse_uring_fetch(cmd, issue_flags, fc);
break;
case FUSE_URING_REQ_COMMIT_AND_FETCH:
- ret = fuse_uring_commit_fetch(cmd, issue_flags, fc);
+ err = fuse_uring_commit_fetch(cmd, issue_flags, fc);
break;
default:
err = -EINVAL;
diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
index 9bc7f490b02acb46aa7bbb31d5ce55a4d2787a60..c19e439cd51316bdabdd16901659e97b2ff90875 100644
--- a/fs/fuse/dev_uring_i.h
+++ b/fs/fuse/dev_uring_i.h
@@ -11,6 +11,9 @@
#ifdef CONFIG_FUSE_IO_URING
+#define FUSE_URING_TEARDOWN_TIMEOUT (5 * HZ)
+#define FUSE_URING_TEARDOWN_INTERVAL (HZ/20)
+
enum fuse_ring_req_state {
/* ring entry received from userspace and it being processed */
@@ -82,6 +85,8 @@ struct fuse_ring_queue {
struct list_head fuse_req_queue;
struct fuse_pqueue fpq;
+
+ bool stopped;
};
/**
@@ -96,11 +101,61 @@ struct fuse_ring {
size_t nr_queues;
struct fuse_ring_queue **queues;
+ /*
+ * Log ring entry states once on stop when entries cannot be
+ * released
+ */
+ unsigned int stop_debug_log : 1;
+
+ wait_queue_head_t stop_waitq;
+
+ /* async tear down */
+ struct delayed_work async_teardown_work;
+
+ /* log */
+ unsigned long teardown_time;
+
+ atomic_t queue_refs;
};
void fuse_uring_destruct(struct fuse_conn *fc);
+void fuse_uring_stop_queues(struct fuse_ring *ring);
+void fuse_uring_abort_end_requests(struct fuse_ring *ring);
int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags);
+static inline void fuse_uring_set_stopped_queues(struct fuse_ring *ring)
+{
+ int qid;
+
+ for (qid = 0; qid < ring->nr_queues; qid++) {
+ struct fuse_ring_queue *queue = ring->queues[qid];
+
+ WRITE_ONCE(queue->stopped, true);
+ }
+}
+
+static inline void fuse_uring_abort(struct fuse_conn *fc)
+{
+ struct fuse_ring *ring = fc->ring;
+
+ if (ring == NULL)
+ return;
+
+ if (atomic_read(&ring->queue_refs) > 0) {
+ fuse_uring_abort_end_requests(ring);
+ fuse_uring_stop_queues(ring);
+ }
+}
+
+static inline void fuse_uring_wait_stopped_queues(struct fuse_conn *fc)
+{
+ struct fuse_ring *ring = fc->ring;
+
+ if (ring)
+ wait_event(ring->stop_waitq,
+ atomic_read(&ring->queue_refs) == 0);
+}
+
#else /* CONFIG_FUSE_IO_URING */
struct fuse_ring;
@@ -113,6 +168,13 @@ static inline void fuse_uring_destruct(struct fuse_conn *fc)
{
}
+static inline void fuse_uring_abort(struct fuse_conn *fc)
+{
+}
+
+static inline void fuse_uring_wait_stopped_queues(struct fuse_conn *fc)
+{
+}
#endif /* CONFIG_FUSE_IO_URING */
#endif /* _FS_FUSE_DEV_URING_I_H */
--
2.43.0
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH RFC v4 10/15] fuse: {uring} Add a ring queue and send method
2024-10-16 0:05 [PATCH RFC v4 00/15] fuse: fuse-over-io-uring Bernd Schubert
` (8 preceding siblings ...)
2024-10-16 0:05 ` [PATCH RFC v4 09/15] fuse: {uring} Handle teardown of ring entries Bernd Schubert
@ 2024-10-16 0:05 ` Bernd Schubert
2024-10-16 0:05 ` [PATCH RFC v4 11/15] fuse: {uring} Allow to queue to the ring Bernd Schubert
` (7 subsequent siblings)
17 siblings, 0 replies; 36+ messages in thread
From: Bernd Schubert @ 2024-10-16 0:05 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Amir Goldstein, Ming Lei, Bernd Schubert
This prepares queueing and sending through io-uring.
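The queue-selection policy used by fuse_uring_queue_fuse_req() in this
patch (pick the qid from the submitting task's CPU, fall back to queue 0
when the CPU number exceeds the number of configured queues) can be
sketched as a stand-alone helper. `pick_qid()` is a hypothetical name for
illustration, not a function from the patch:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Sketch of the qid selection: the queue is chosen from the CPU the
 * request was submitted on, so sync requests keep running on the same
 * core and avoid context switches; if the CPU number exceeds the number
 * of queues (warned about in the patch), fall back to qid 0.
 */
static size_t pick_qid(size_t submitting_cpu, size_t nr_queues)
{
	return submitting_cpu < nr_queues ? submitting_cpu : 0;
}
```

As the XXX comment in the patch notes, this is a placeholder policy; a
NUMA- and load-aware choice is left for later.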
Signed-off-by: Bernd Schubert <[email protected]>
---
fs/fuse/dev_uring.c | 101 ++++++++++++++++++++++++++++++++++++++++++++++++++
fs/fuse/dev_uring_i.h | 7 ++++
2 files changed, 108 insertions(+)
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index 455a42a6b9348dda15dd082d3bfd778279f61e0b..3f1c39bb43e24a7f9c5d4cdd507f56fe6358f2fd 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -19,6 +19,10 @@ MODULE_PARM_DESC(enable_uring,
"Enable uring userspace communication through uring.");
#endif
+struct fuse_uring_cmd_pdu {
+ struct fuse_ring_ent *ring_ent;
+};
+
/*
* Finalize a fuse request, then fetch and send the next entry, if available
*/
@@ -931,3 +935,100 @@ int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
return -EIOCBQUEUED;
}
+
+/*
+ * This prepares and sends the ring request in fuse-uring task context.
+ * User buffers are not mapped yet - the application does not have permission
+ * to write to them - this has to be executed in ring task context.
+ * XXX: Map and pin user pages and avoid this function.
+ */
+static void
+fuse_uring_send_req_in_task(struct io_uring_cmd *cmd,
+ unsigned int issue_flags)
+{
+ struct fuse_uring_cmd_pdu *pdu = (struct fuse_uring_cmd_pdu *)cmd->pdu;
+ struct fuse_ring_ent *ring_ent = pdu->ring_ent;
+ struct fuse_ring_queue *queue = ring_ent->queue;
+ int err;
+
+ BUILD_BUG_ON(sizeof(pdu) > sizeof(cmd->pdu));
+
+ err = fuse_uring_prepare_send(ring_ent);
+ if (err)
+ goto err;
+
+ io_uring_cmd_done(cmd, 0, 0, issue_flags);
+
+ spin_lock(&queue->lock);
+ ring_ent->state = FRRS_USERSPACE;
+ list_move(&ring_ent->list, &queue->ent_in_userspace);
+ spin_unlock(&queue->lock);
+ return;
+err:
+ fuse_uring_next_fuse_req(ring_ent, queue);
+}
+
+/* queue a fuse request and send it if a ring entry is available */
+int fuse_uring_queue_fuse_req(struct fuse_conn *fc, struct fuse_req *req)
+{
+ struct fuse_ring *ring = fc->ring;
+ struct fuse_ring_queue *queue;
+ int qid = 0;
+ struct fuse_ring_ent *ring_ent = NULL;
+ int res;
+
+ /*
+ * async requests are best handled on another core, the current
+ * core can do application/page handling, while the async request
+ * is handled on another core in userspace.
+ * For sync requests the application has to wait - no processing, so
+ * the request should continue on the current core and avoid context
+ * switches.
+ * XXX This should be on the same numa node and not busy - is there
+ * a scheduler function available that could make this decision?
+ * It should also not persistently switch between cores - makes
+ * it hard for the scheduler.
+ */
+ qid = task_cpu(current);
+
+ if (WARN_ONCE(qid >= ring->nr_queues,
+ "Core number (%u) exceeds nr queues (%zu)\n", qid,
+ ring->nr_queues))
+ qid = 0;
+
+ queue = ring->queues[qid];
+ if (WARN_ONCE(!queue, "Missing queue for qid %d\n", qid))
+ return -EINVAL;
+
+ spin_lock(&queue->lock);
+
+ if (unlikely(queue->stopped)) {
+ res = -ENOTCONN;
+ goto err_unlock;
+ }
+
+ list_add_tail(&req->list, &queue->fuse_req_queue);
+
+ if (!list_empty(&queue->ent_avail_queue)) {
+ ring_ent = list_first_entry(&queue->ent_avail_queue,
+ struct fuse_ring_ent, list);
+ list_del_init(&ring_ent->list);
+ fuse_uring_add_req_to_ring_ent(ring_ent, req);
+ }
+ spin_unlock(&queue->lock);
+
+ if (ring_ent) {
+ struct io_uring_cmd *cmd = ring_ent->cmd;
+ struct fuse_uring_cmd_pdu *pdu =
+ (struct fuse_uring_cmd_pdu *)cmd->pdu;
+
+ pdu->ring_ent = ring_ent;
+ io_uring_cmd_complete_in_task(cmd, fuse_uring_send_req_in_task);
+ }
+
+ return 0;
+
+err_unlock:
+ spin_unlock(&queue->lock);
+ return res;
+}
diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
index c19e439cd51316bdabdd16901659e97b2ff90875..4f5586684cb8fec3ddc825511cb6b935f5cf85d6 100644
--- a/fs/fuse/dev_uring_i.h
+++ b/fs/fuse/dev_uring_i.h
@@ -122,6 +122,7 @@ void fuse_uring_destruct(struct fuse_conn *fc);
void fuse_uring_stop_queues(struct fuse_ring *ring);
void fuse_uring_abort_end_requests(struct fuse_ring *ring);
int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags);
+int fuse_uring_queue_fuse_req(struct fuse_conn *fc, struct fuse_req *req);
static inline void fuse_uring_set_stopped_queues(struct fuse_ring *ring)
{
@@ -175,6 +176,12 @@ static inline void fuse_uring_abort(struct fuse_conn *fc)
static inline void fuse_uring_wait_stopped_queues(struct fuse_conn *fc)
{
}
+
+static inline int
+fuse_uring_queue_fuse_req(struct fuse_conn *fc, struct fuse_req *req)
+{
+ return -EPFNOSUPPORT;
+}
#endif /* CONFIG_FUSE_IO_URING */
#endif /* _FS_FUSE_DEV_URING_I_H */
--
2.43.0
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH RFC v4 11/15] fuse: {uring} Allow to queue to the ring
2024-10-16 0:05 [PATCH RFC v4 00/15] fuse: fuse-over-io-uring Bernd Schubert
` (9 preceding siblings ...)
2024-10-16 0:05 ` [PATCH RFC v4 10/15] fuse: {uring} Add a ring queue and send method Bernd Schubert
@ 2024-10-16 0:05 ` Bernd Schubert
2024-10-16 0:05 ` [PATCH RFC v4 12/15] io_uring/cmd: let cmds to know about dying task Bernd Schubert
` (6 subsequent siblings)
17 siblings, 0 replies; 36+ messages in thread
From: Bernd Schubert @ 2024-10-16 0:05 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Amir Goldstein, Ming Lei, Bernd Schubert
This enables enqueuing requests through fuse uring queues.
For initial simplicity, requests are always allocated the normal
way, then added to the ring queue lists, and only then copied to
ring queue entries. Later on, the allocation and list handling can
be avoided by using a ring entry directly. That introduces some
code complexity and is therefore not done for now.
FIXME: Needs update with new function pointers in fuse-next.
Signed-off-by: Bernd Schubert <[email protected]>
---
fs/fuse/dev.c | 74 +++++++++++++++++++++++++++++++++++++++++++++------
fs/fuse/dev_uring.c | 33 +++++++++++++++++++++++
fs/fuse/dev_uring_i.h | 12 +++++++++
3 files changed, 111 insertions(+), 8 deletions(-)
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index c8cc5fb2cfada29226f578a6273e8d6d34ab59ab..a8b261ae0290ab1fae9c8c0de293d699e16dab2c 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -211,13 +211,24 @@ const struct fuse_iqueue_ops fuse_dev_fiq_ops = {
};
EXPORT_SYMBOL_GPL(fuse_dev_fiq_ops);
-static void queue_request_and_unlock(struct fuse_iqueue *fiq,
- struct fuse_req *req)
+
+static void queue_request_and_unlock(struct fuse_conn *fc,
+ struct fuse_req *req, bool allow_uring)
__releases(fiq->lock)
{
+ struct fuse_iqueue *fiq = &fc->iq;
+
req->in.h.len = sizeof(struct fuse_in_header) +
fuse_len_args(req->args->in_numargs,
(struct fuse_arg *) req->args->in_args);
+
+ if (allow_uring && fuse_uring_ready(fc)) {
+ /* this lock is not needed at all for ring req handling */
+ spin_unlock(&fiq->lock);
+ fuse_uring_queue_fuse_req(fc, req);
+ return;
+ }
+
list_add_tail(&req->list, &fiq->pending);
fiq->ops->wake_pending_and_unlock(fiq);
}
@@ -254,7 +265,7 @@ static void flush_bg_queue(struct fuse_conn *fc)
fc->active_background++;
spin_lock(&fiq->lock);
req->in.h.unique = fuse_get_unique(fiq);
- queue_request_and_unlock(fiq, req);
+ queue_request_and_unlock(fc, req, true);
}
}
@@ -398,7 +409,8 @@ static void request_wait_answer(struct fuse_req *req)
static void __fuse_request_send(struct fuse_req *req)
{
- struct fuse_iqueue *fiq = &req->fm->fc->iq;
+ struct fuse_conn *fc = req->fm->fc;
+ struct fuse_iqueue *fiq = &fc->iq;
BUG_ON(test_bit(FR_BACKGROUND, &req->flags));
spin_lock(&fiq->lock);
@@ -410,7 +422,7 @@ static void __fuse_request_send(struct fuse_req *req)
/* acquire extra reference, since request is still needed
after fuse_request_end() */
__fuse_get_request(req);
- queue_request_and_unlock(fiq, req);
+ queue_request_and_unlock(fc, req, true);
request_wait_answer(req);
/* Pairs with smp_wmb() in fuse_request_end() */
@@ -480,6 +492,10 @@ ssize_t fuse_simple_request(struct fuse_mount *fm, struct fuse_args *args)
if (args->force) {
atomic_inc(&fc->num_waiting);
req = fuse_request_alloc(fm, GFP_KERNEL | __GFP_NOFAIL);
+ if (unlikely(!req)) {
+ ret = -ENOTCONN;
+ goto err;
+ }
if (!args->nocreds)
fuse_force_creds(req);
@@ -507,16 +523,55 @@ ssize_t fuse_simple_request(struct fuse_mount *fm, struct fuse_args *args)
}
fuse_put_request(req);
+err:
return ret;
}
-static bool fuse_request_queue_background(struct fuse_req *req)
+static bool fuse_request_queue_background_uring(struct fuse_conn *fc,
+ struct fuse_req *req)
+{
+ struct fuse_iqueue *fiq = &fc->iq;
+ int err;
+
+ req->in.h.unique = fuse_get_unique(fiq);
+ req->in.h.len = sizeof(struct fuse_in_header) +
+ fuse_len_args(req->args->in_numargs,
+ (struct fuse_arg *) req->args->in_args);
+
+ err = fuse_uring_queue_fuse_req(fc, req);
+ if (!err) {
+ /* XXX remove and let the users of that use per-queue values -
+ * avoid the shared spin lock...
+ * Is this needed at all?
+ */
+ spin_lock(&fc->bg_lock);
+ fc->num_background++;
+ fc->active_background++;
+
+
+ /* XXX block when per ring queues get occupied */
+ if (fc->num_background == fc->max_background)
+ fc->blocked = 1;
+ spin_unlock(&fc->bg_lock);
+ }
+
+ return err ? false : true;
+}
+
+/*
+ * @return true if queued
+ */
+static int fuse_request_queue_background(struct fuse_req *req)
{
struct fuse_mount *fm = req->fm;
struct fuse_conn *fc = fm->fc;
bool queued = false;
WARN_ON(!test_bit(FR_BACKGROUND, &req->flags));
+
+ if (fuse_uring_ready(fc))
+ return fuse_request_queue_background_uring(fc, req);
+
if (!test_bit(FR_WAITING, &req->flags)) {
__set_bit(FR_WAITING, &req->flags);
atomic_inc(&fc->num_waiting);
@@ -569,7 +624,8 @@ static int fuse_simple_notify_reply(struct fuse_mount *fm,
struct fuse_args *args, u64 unique)
{
struct fuse_req *req;
- struct fuse_iqueue *fiq = &fm->fc->iq;
+ struct fuse_conn *fc = fm->fc;
+ struct fuse_iqueue *fiq = &fc->iq;
int err = 0;
req = fuse_get_req(fm, false);
@@ -583,7 +639,8 @@ static int fuse_simple_notify_reply(struct fuse_mount *fm,
spin_lock(&fiq->lock);
if (fiq->connected) {
- queue_request_and_unlock(fiq, req);
+ /* uring for notify not supported yet */
+ queue_request_and_unlock(fc, req, false);
} else {
err = -ENODEV;
spin_unlock(&fiq->lock);
@@ -2184,6 +2241,7 @@ void fuse_abort_conn(struct fuse_conn *fc)
spin_unlock(&fc->bg_lock);
fuse_set_initialized(fc);
+
list_for_each_entry(fud, &fc->devices, entry) {
struct fuse_pqueue *fpq = &fud->pq;
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index 3f1c39bb43e24a7f9c5d4cdd507f56fe6358f2fd..6af14a32e908bcb82767ab1bf1f78d83329f801a 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -793,6 +793,31 @@ static int fuse_uring_commit_fetch(struct io_uring_cmd *cmd, int issue_flags,
return 0;
}
+static bool is_ring_ready(struct fuse_ring *ring, int current_qid)
+{
+ int qid;
+ struct fuse_ring_queue *queue;
+ bool ready = true;
+
+ for (qid = 0; qid < ring->nr_queues && ready; qid++) {
+ if (current_qid == qid)
+ continue;
+
+ queue = ring->queues[qid];
+ if (!queue) {
+ ready = false;
+ break;
+ }
+
+ spin_lock(&queue->lock);
+ if (list_empty(&queue->ent_avail_queue))
+ ready = false;
+ spin_unlock(&queue->lock);
+ }
+
+ return ready;
+}
+
/*
* fuse_uring_req_fetch command handling
*/
@@ -801,11 +826,19 @@ static void _fuse_uring_fetch(struct fuse_ring_ent *ring_ent,
unsigned int issue_flags)
{
struct fuse_ring_queue *queue = ring_ent->queue;
+ struct fuse_ring *ring = queue->ring;
spin_lock(&queue->lock);
fuse_uring_ent_avail(ring_ent, queue);
ring_ent->cmd = cmd;
spin_unlock(&queue->lock);
+
+ if (!ring->ready) {
+ bool ready = is_ring_ready(ring, queue->qid);
+
+ if (ready)
+ WRITE_ONCE(ring->ready, true);
+ }
}
static int fuse_uring_fetch(struct io_uring_cmd *cmd, unsigned int issue_flags,
diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
index 4f5586684cb8fec3ddc825511cb6b935f5cf85d6..931eef6f770e967784bbe9b354bc61d9bfe7ff8d 100644
--- a/fs/fuse/dev_uring_i.h
+++ b/fs/fuse/dev_uring_i.h
@@ -116,6 +116,8 @@ struct fuse_ring {
unsigned long teardown_time;
atomic_t queue_refs;
+
+ bool ready;
};
void fuse_uring_destruct(struct fuse_conn *fc);
@@ -157,6 +159,11 @@ static inline void fuse_uring_wait_stopped_queues(struct fuse_conn *fc)
atomic_read(&ring->queue_refs) == 0);
}
+static inline bool fuse_uring_ready(struct fuse_conn *fc)
+{
+ return fc->ring && fc->ring->ready;
+}
+
#else /* CONFIG_FUSE_IO_URING */
struct fuse_ring;
@@ -177,6 +184,11 @@ static inline void fuse_uring_wait_stopped_queues(struct fuse_conn *fc)
{
}
+static inline bool fuse_uring_ready(struct fuse_conn *fc)
+{
+ return false;
+}
+
static inline int
fuse_uring_queue_fuse_req(struct fuse_conn *fc, struct fuse_req *req)
{
--
2.43.0
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH RFC v4 12/15] io_uring/cmd: let cmds to know about dying task
2024-10-16 0:05 [PATCH RFC v4 00/15] fuse: fuse-over-io-uring Bernd Schubert
` (10 preceding siblings ...)
2024-10-16 0:05 ` [PATCH RFC v4 11/15] fuse: {uring} Allow to queue to the ring Bernd Schubert
@ 2024-10-16 0:05 ` Bernd Schubert
2024-11-04 0:28 ` Pavel Begunkov
2024-10-16 0:05 ` [PATCH RFC v4 13/15] fuse: {uring} Handle IO_URING_F_TASK_DEAD Bernd Schubert
` (5 subsequent siblings)
17 siblings, 1 reply; 36+ messages in thread
From: Bernd Schubert @ 2024-10-16 0:05 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Amir Goldstein, Ming Lei, Bernd Schubert
From: Pavel Begunkov <[email protected]>
When the task that submitted a request is dying, a task work for that
request might get run by a kernel thread or even worse by a half
dismantled task. We can't just cancel the task work without running the
callback as the cmd might need to do some clean up, so pass a flag
instead. If set, it's not safe to access any task resources and the
callback is expected to cancel the cmd ASAP.
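A minimal user-space model of this flag protocol, under stated
assumptions: only IO_URING_F_TASK_DEAD mirrors the patch; the
IO_URING_F_COMPLETE_DEFER bit value and the callback are illustrative
stand-ins, not the io_uring implementation.

```c
#include <assert.h>
#include <stdbool.h>

#define IO_URING_F_COMPLETE_DEFER (1u << 0)	/* value illustrative */
#define IO_URING_F_TASK_DEAD      (1u << 13)	/* as added by the patch */

/* models io_uring_cmd_work(): tag the flags when run off the task */
static unsigned int build_flags(bool task_is_current)
{
	unsigned int flags = IO_URING_F_COMPLETE_DEFER;

	if (!task_is_current)
		flags |= IO_URING_F_TASK_DEAD;
	return flags;
}

/*
 * Hypothetical cmd callback: when the flag is set it must not touch
 * any task resources and is expected to cancel the cmd ASAP.
 */
static int cmd_callback(unsigned int flags)
{
	if (flags & IO_URING_F_TASK_DEAD)
		return -125;	/* -ECANCELED on Linux */
	return 0;		/* safe to do normal completion work */
}
```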
Signed-off-by: Pavel Begunkov <[email protected]>
---
include/linux/io_uring_types.h | 1 +
io_uring/uring_cmd.c | 6 +++++-
2 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 7abdc09271245ff7de3fb9a905ca78b7561e37eb..869a81c63e4970576155043fce7fe656293d7f58 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -37,6 +37,7 @@ enum io_uring_cmd_flags {
/* set when uring wants to cancel a previously issued command */
IO_URING_F_CANCEL = (1 << 11),
IO_URING_F_COMPAT = (1 << 12),
+ IO_URING_F_TASK_DEAD = (1 << 13),
};
struct io_wq_work_node {
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 21ac5fb2d5f087e1174d5c94815d580972db6e3f..82c6001cc0696bbcbebb92153e1461f2a9aeebc3 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -119,9 +119,13 @@ EXPORT_SYMBOL_GPL(io_uring_cmd_mark_cancelable);
static void io_uring_cmd_work(struct io_kiocb *req, struct io_tw_state *ts)
{
struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
+ unsigned int flags = IO_URING_F_COMPLETE_DEFER;
+
+ if (req->task != current)
+ flags |= IO_URING_F_TASK_DEAD;
/* task_work executor checks the deffered list completion */
- ioucmd->task_work_cb(ioucmd, IO_URING_F_COMPLETE_DEFER);
+ ioucmd->task_work_cb(ioucmd, flags);
}
void __io_uring_cmd_do_in_task(struct io_uring_cmd *ioucmd,
--
2.43.0
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH RFC v4 13/15] fuse: {uring} Handle IO_URING_F_TASK_DEAD
2024-10-16 0:05 [PATCH RFC v4 00/15] fuse: fuse-over-io-uring Bernd Schubert
` (11 preceding siblings ...)
2024-10-16 0:05 ` [PATCH RFC v4 12/15] io_uring/cmd: let cmds to know about dying task Bernd Schubert
@ 2024-10-16 0:05 ` Bernd Schubert
2024-10-16 0:05 ` [PATCH RFC v4 14/15] fuse: {io-uring} Prevent mount point hang on fuse-server termination Bernd Schubert
` (4 subsequent siblings)
17 siblings, 0 replies; 36+ messages in thread
From: Bernd Schubert @ 2024-10-16 0:05 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Amir Goldstein, Ming Lei, Bernd Schubert
The ring task is terminating, so it is not safe to still access
its resources; no further actions are needed either.
Signed-off-by: Bernd Schubert <[email protected]>
---
fs/fuse/dev_uring.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index 6af14a32e908bcb82767ab1bf1f78d83329f801a..6632c9163b8a51c39e07258fea631cf9383ce538 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -986,16 +986,22 @@ fuse_uring_send_req_in_task(struct io_uring_cmd *cmd,
BUILD_BUG_ON(sizeof(pdu) > sizeof(cmd->pdu));
+ if (unlikely(issue_flags & IO_URING_F_TASK_DEAD)) {
+ err = -ECANCELED;
+ goto terminating;
+ }
+
err = fuse_uring_prepare_send(ring_ent);
if (err)
goto err;
- io_uring_cmd_done(cmd, 0, 0, issue_flags);
-
+terminating:
spin_lock(&queue->lock);
ring_ent->state = FRRS_USERSPACE;
list_move(&ring_ent->list, &queue->ent_in_userspace);
spin_unlock(&queue->lock);
+ io_uring_cmd_done(cmd, err, 0, issue_flags);
+
return;
err:
fuse_uring_next_fuse_req(ring_ent, queue);
--
2.43.0
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH RFC v4 14/15] fuse: {io-uring} Prevent mount point hang on fuse-server termination
2024-10-16 0:05 [PATCH RFC v4 00/15] fuse: fuse-over-io-uring Bernd Schubert
` (12 preceding siblings ...)
2024-10-16 0:05 ` [PATCH RFC v4 13/15] fuse: {uring} Handle IO_URING_F_TASK_DEAD Bernd Schubert
@ 2024-10-16 0:05 ` Bernd Schubert
2024-10-16 0:05 ` [PATCH RFC v4 15/15] fuse: enable fuse-over-io-uring Bernd Schubert
` (3 subsequent siblings)
17 siblings, 0 replies; 36+ messages in thread
From: Bernd Schubert @ 2024-10-16 0:05 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Amir Goldstein, Ming Lei, Bernd Schubert
When the fuse-server terminates while the fuse-client or kernel
still has queued URING_CMDs, these commands retain references
to the struct file used by the fuse connection. This prevents
fuse_dev_release() from being invoked, resulting in a hung mount
point.
This patch addresses the issue by making queued URING_CMDs
cancelable, allowing fuse_dev_release() to proceed as expected
and preventing the mount point from hanging.
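The cancel decision the patch makes under the queue lock can be sketched
in plain C. Only the FRRS_* states come from the series; the helper name
is hypothetical: an entry still waiting for work owns a queued URING_CMD
that must be completed (io_uring_cmd_done() with -ENOTCONN) so its file
reference gets dropped, while an entry already handed to userspace is
left alone.

```c
#include <assert.h>
#include <stdbool.h>

enum frrs_state { FRRS_INIT, FRRS_WAIT, FRRS_USERSPACE };

/*
 * Sketch of fuse_uring_cancel(): returns true when the entry holds a
 * queued command that needs io_uring_cmd_done(); in that case ownership
 * is handed to the server side by moving it to FRRS_USERSPACE, so a
 * second cancel attempt is a no-op.
 */
static bool cancel_needs_cmd_done(enum frrs_state *state)
{
	if (*state == FRRS_WAIT) {
		*state = FRRS_USERSPACE;
		return true;
	}
	return false;
}
```

Releasing the last such command is what finally lets fuse_dev_release()
run after the daemon has been terminated.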
Signed-off-by: Bernd Schubert <[email protected]>
---
fs/fuse/dev_uring.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 62 insertions(+)
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index 6632c9163b8a51c39e07258fea631cf9383ce538..5603831d490c64045ff402140c317019e69f8987 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -21,6 +21,7 @@ MODULE_PARM_DESC(enable_uring,
struct fuse_uring_cmd_pdu {
struct fuse_ring_ent *ring_ent;
+ struct fuse_ring_queue *queue;
};
/*
@@ -374,6 +375,61 @@ void fuse_uring_stop_queues(struct fuse_ring *ring)
}
}
+/*
+ * Handle IO_URING_F_CANCEL, which typically comes on daemon termination
+ */
+static void fuse_uring_cancel(struct io_uring_cmd *cmd,
+ unsigned int issue_flags, struct fuse_conn *fc)
+{
+ struct fuse_uring_cmd_pdu *pdu = (struct fuse_uring_cmd_pdu *)cmd->pdu;
+ struct fuse_ring_queue *queue = pdu->queue;
+ struct fuse_ring_ent *ent;
+ bool found = false;
+ bool need_cmd_done = false;
+
+ spin_lock(&queue->lock);
+
+ /* XXX: This is cumbersome for large queues. */
+ list_for_each_entry(ent, &queue->ent_avail_queue, list) {
+ if (pdu->ring_ent == ent) {
+ found = true;
+ break;
+ }
+ }
+
+ if (!found) {
+ pr_info("qid=%d did not find ent=%p\n", queue->qid, ent);
+ spin_unlock(&queue->lock);
+ return;
+ }
+
+ if (ent->state == FRRS_WAIT) {
+ ent->state = FRRS_USERSPACE;
+ list_move(&ent->list, &queue->ent_in_userspace);
+ need_cmd_done = true;
+ }
+ spin_unlock(&queue->lock);
+
+ if (need_cmd_done)
+ io_uring_cmd_done(cmd, -ENOTCONN, 0, issue_flags);
+
+ /*
+ * releasing the last entry should trigger fuse_dev_release() if
+ * the daemon was terminated
+ */
+}
+
+static void fuse_uring_prepare_cancel(struct io_uring_cmd *cmd, int issue_flags,
+ struct fuse_ring_ent *ring_ent)
+{
+ struct fuse_uring_cmd_pdu *pdu = (struct fuse_uring_cmd_pdu *)cmd->pdu;
+
+ pdu->ring_ent = ring_ent;
+ pdu->queue = ring_ent->queue;
+
+ io_uring_cmd_mark_cancelable(cmd, issue_flags);
+}
+
/*
* Checks for errors and stores it into the request
*/
@@ -902,6 +958,7 @@ static int fuse_uring_fetch(struct io_uring_cmd *cmd, unsigned int issue_flags,
goto err;
atomic_inc(&ring->queue_refs);
+ fuse_uring_prepare_cancel(cmd, issue_flags, ring_ent);
_fuse_uring_fetch(ring_ent, cmd, issue_flags);
return 0;
@@ -947,6 +1004,11 @@ int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
if (fc->aborted)
goto out;
+ if (unlikely(issue_flags & IO_URING_F_CANCEL)) {
+ fuse_uring_cancel(cmd, issue_flags, fc);
+ return 0;
+ }
+
switch (cmd_op) {
case FUSE_URING_REQ_FETCH:
err = fuse_uring_fetch(cmd, issue_flags, fc);
--
2.43.0
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH RFC v4 15/15] fuse: enable fuse-over-io-uring
2024-10-16 0:05 [PATCH RFC v4 00/15] fuse: fuse-over-io-uring Bernd Schubert
` (13 preceding siblings ...)
2024-10-16 0:05 ` [PATCH RFC v4 14/15] fuse: {io-uring} Prevent mount point hang on fuse-server termination Bernd Schubert
@ 2024-10-16 0:05 ` Bernd Schubert
2024-10-16 0:08 ` [PATCH RFC v4 00/15] fuse: fuse-over-io-uring Bernd Schubert
` (2 subsequent siblings)
17 siblings, 0 replies; 36+ messages in thread
From: Bernd Schubert @ 2024-10-16 0:05 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Amir Goldstein, Ming Lei, Bernd Schubert
All required parts are handled now; fuse-io-uring can
be enabled.
Signed-off-by: Bernd Schubert <[email protected]>
---
fs/fuse/dev_uring.c | 5 -----
1 file changed, 5 deletions(-)
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index 5603831d490c64045ff402140c317019e69f8987..e518d4379aa1e239612d0776fc5a734dbc20ce90 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -980,11 +980,6 @@ int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
u32 cmd_op = cmd->cmd_op;
int err = 0;
- /* Disabled for now, especially as teardown is not implemented yet */
- err = -EOPNOTSUPP;
- pr_info_ratelimited("fuse-io-uring is not enabled yet\n");
- goto out;
-
pr_devel("%s:%d received: cmd op %d\n", __func__, __LINE__, cmd_op);
err = -EOPNOTSUPP;
--
2.43.0
* Re: [PATCH RFC v4 00/15] fuse: fuse-over-io-uring
2024-10-16 0:05 [PATCH RFC v4 00/15] fuse: fuse-over-io-uring Bernd Schubert
` (14 preceding siblings ...)
2024-10-16 0:05 ` [PATCH RFC v4 15/15] fuse: enable fuse-over-io-uring Bernd Schubert
@ 2024-10-16 0:08 ` Bernd Schubert
2024-10-21 4:06 ` David Wei
2024-10-22 22:10 ` David Wei
17 siblings, 0 replies; 36+ messages in thread
From: Bernd Schubert @ 2024-10-16 0:08 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Amir Goldstein, Ming Lei, Josef Bacik
Please note that this is a preview only to show the current status.
V5 should follow soon to separate the headers into their own buffer.
I actually hope that this v4 is the last RFC version.
Thanks,
Bernd
* Re: [PATCH RFC v4 00/15] fuse: fuse-over-io-uring
2024-10-16 0:05 [PATCH RFC v4 00/15] fuse: fuse-over-io-uring Bernd Schubert
` (15 preceding siblings ...)
2024-10-16 0:08 ` [PATCH RFC v4 00/15] fuse: fuse-over-io-uring Bernd Schubert
@ 2024-10-21 4:06 ` David Wei
2024-10-21 11:47 ` Bernd Schubert
2024-10-22 22:10 ` David Wei
17 siblings, 1 reply; 36+ messages in thread
From: David Wei @ 2024-10-21 4:06 UTC (permalink / raw)
To: Bernd Schubert, Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Amir Goldstein, Ming Lei, Josef Bacik
On 2024-10-15 17:05, Bernd Schubert wrote:
[...]
>
> The corresponding libfuse patches are on my uring branch,
> but need cleanup for submission - will happen during the next
> days.
> https://github.com/bsbernd/libfuse/tree/uring
>
> Testing with that libfuse branch is possible by running something
> like:
>
> example/passthrough_hp -o allow_other --debug-fuse --nopassthrough \
> --uring --uring-per-core-queue --uring-fg-depth=1 --uring-bg-depth=1 \
> /scratch/source /scratch/dest
>
> With the --debug-fuse option one should see CQE in the request type,
> if requests are received via io-uring:
>
> cqe unique: 4, opcode: GETATTR (3), nodeid: 1, insize: 16, pid: 7060
> unique: 4, result=104
>
> Without the --uring option "cqe" is replaced by the default "dev"
>
> dev unique: 4, opcode: GETATTR (3), nodeid: 1, insize: 56, pid: 7117
> unique: 4, success, outsize: 120
Hi Bernd, I applied this patchset to io_uring-6.12 branch with some
minor conflicts. I'm running the following command:
$ sudo ./build/example/passthrough_hp -o allow_other --debug-fuse --nopassthrough \
--uring --uring-per-core-queue --uring-fg-depth=1 --uring-bg-depth=1 \
/home/vmuser/scratch/source /home/vmuser/scratch/dest
FUSE library version: 3.17.0
Creating ring per-core-queue=1 sync-depth=1 async-depth=1 arglen=1052672
dev unique: 2, opcode: INIT (26), nodeid: 0, insize: 104, pid: 0
INIT: 7.40
flags=0x73fffffb
max_readahead=0x00020000
INIT: 7.40
flags=0x4041f429
max_readahead=0x00020000
max_write=0x00100000
max_background=0
congestion_threshold=0
time_gran=1
unique: 2, success, outsize: 80
I created the source and dest folders which are both empty.
I see the following in dmesg:
[ 2453.197510] uring is disabled
[ 2453.198525] uring is disabled
[ 2453.198749] uring is disabled
...
If I then try to list the directory /home/vmuser/scratch:
$ ls -l /home/vmuser/scratch
ls: cannot access 'dest': Software caused connection abort
And passthrough_hp terminates.
My kconfig:
CONFIG_FUSE_FS=m
CONFIG_FUSE_PASSTHROUGH=y
CONFIG_FUSE_IO_URING=y
I'll look into it next week, but do you see anything obviously wrong?
* Re: [PATCH RFC v4 00/15] fuse: fuse-over-io-uring
2024-10-21 4:06 ` David Wei
@ 2024-10-21 11:47 ` Bernd Schubert
2024-10-21 20:57 ` David Wei
0 siblings, 1 reply; 36+ messages in thread
From: Bernd Schubert @ 2024-10-21 11:47 UTC (permalink / raw)
To: David Wei, Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Amir Goldstein, Ming Lei, Josef Bacik
Hi David,
On 10/21/24 06:06, David Wei wrote:
>
> On 2024-10-15 17:05, Bernd Schubert wrote:
> [...]
>>
...
> Hi Bernd, I applied this patchset to io_uring-6.12 branch with some
> minor conflicts. I'm running the following command:
>
> [...]
> I see the following in dmesg:
>
> [ 2453.197510] uring is disabled
> [ 2453.198525] uring is disabled
> [ 2453.198749] uring is disabled
> ...
>
> If I then try to list the directory /home/vmuser/scratch:
>
> $ ls -l /home/vmuser/scratch
> ls: cannot access 'dest': Software caused connection abort
>
> And passthrough_hp terminates.
>
> My kconfig:
>
> CONFIG_FUSE_FS=m
> CONFIG_FUSE_PASSTHROUGH=y
> CONFIG_FUSE_IO_URING=y
>
> I'll look into it next week, but do you see anything obviously wrong?
thanks for testing it! I just pushed a fix to my libfuse branches to
avoid the abort for -EOPNOTSUPP. It will gracefully fall back to
/dev/fuse IO now.
Could you please use the rfcv4 branch, as the plain uring
branch will soon get incompatible updates for rfc5?
https://github.com/bsbernd/libfuse/tree/uring-for-rfcv4
The short answer to let you enable fuse-io-uring:
echo 1 >/sys/module/fuse/parameters/enable_uring
(With that the "uring is disabled" should be fixed.)
The long answer, for Miklos and others:
IOCTL removal introduced a design issue, as now fuse-client
(kernel) does not know if fuse-server/libfuse wants to set
up io-uring communication.
It is not even possible to forbid FUSE_URING_REQ_FETCH after
the FUSE_INIT reply, as io-uring is async. What happens is that
fuse-client (kernel) receives all FUSE_URING_REQ_FETCH commands
only after the FUSE_INIT reply, even though FUSE_URING_REQ_FETCH
is sent out from libfuse *before* replying to FUSE_INIT.
I had also added a comment for that into the code.
The other issue is that libfuse now does not know whether the kernel
supports fuse-io-uring. That has some implications:
- libfuse cannot print a clear startup-time message like
"Kernel does not support fuse-over-io-uring, falling back to /dev/fuse IO"
- In the fallback code path one might want to adjust the number of libfuse
/dev/fuse threads: with io-uring, a single /dev/fuse thread (just to
handle FUSE_INTERRUPT) is typically sufficient, while the fallback
needs more.
My suggestion is that we introduce a new FUSE_URING_REQ_REGISTER command
(or replace FUSE_URING_REQ_FETCH with it) and have fuse-server wait for
completion of that command before sending out FUSE_URING_REQ_FETCH.
Thanks,
Bernd
* Re: [PATCH RFC v4 00/15] fuse: fuse-over-io-uring
2024-10-21 11:47 ` Bernd Schubert
@ 2024-10-21 20:57 ` David Wei
2024-10-22 10:24 ` Bernd Schubert
0 siblings, 1 reply; 36+ messages in thread
From: David Wei @ 2024-10-21 20:57 UTC (permalink / raw)
To: Bernd Schubert, Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Amir Goldstein, Ming Lei, Josef Bacik
On 2024-10-21 04:47, Bernd Schubert wrote:
> Hi David,
>
> On 10/21/24 06:06, David Wei wrote:
>> [...]
>
>
> thanks for testing it! I just pushed a fix to my libfuse branches to
> avoid the abort for -EOPNOTSUPP. It will gracefully fall back to
> /dev/fuse IO now.
>
> Could you please use the rfcv4 branch, as the plain uring
> branch will soon get incompatible updates for rfc5?
>
> https://github.com/bsbernd/libfuse/tree/uring-for-rfcv4
>
>
> The short answer to let you enable fuse-io-uring:
>
> echo 1 >/sys/module/fuse/parameters/enable_uring
>
>
> (With that the "uring is disabled" should be fixed.)
Thanks, using this branch fixed the issue and now I can see the dest
folder mirroring that of the source folder. There are two issues I
noticed:
[63490.068211] ---[ end trace 0000000000000000 ]---
[64010.242963] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:330
[64010.243531] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 11057, name: fuse-ring-1
[64010.244092] preempt_count: 1, expected: 0
[64010.244346] RCU nest depth: 0, expected: 0
[64010.244599] 2 locks held by fuse-ring-1/11057:
[64010.244886] #0: ffff888105db20a8 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_enter+0x900/0xd80
[64010.245476] #1: ffff88810f941818 (&fc->lock){+.+.}-{2:2}, at: fuse_uring_cmd+0x83e/0x1890 [fuse]
[64010.246031] CPU: 1 UID: 0 PID: 11057 Comm: fuse-ring-1 Tainted: G W 6.11.0-10089-g0d2090ccdbbe #2
[64010.246655] Tainted: [W]=WARN
[64010.246853] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[64010.247542] Call Trace:
[64010.247705] <TASK>
[64010.247860] dump_stack_lvl+0xb0/0xd0
[64010.248090] __might_resched+0x2f8/0x510
[64010.248338] __kmalloc_cache_noprof+0x2aa/0x390
[64010.248614] ? lockdep_init_map_type+0x2cb/0x7b0
[64010.248923] ? fuse_uring_cmd+0xcc2/0x1890 [fuse]
[64010.249215] fuse_uring_cmd+0xcc2/0x1890 [fuse]
[64010.249506] io_uring_cmd+0x214/0x500
[64010.249745] io_issue_sqe+0x588/0x1810
[64010.249999] ? __pfx_io_issue_sqe+0x10/0x10
[64010.250254] ? io_alloc_async_data+0x88/0x120
[64010.250516] ? io_alloc_async_data+0x88/0x120
[64010.250811] ? io_uring_cmd_prep+0x2eb/0x9f0
[64010.251103] io_submit_sqes+0x796/0x1f80
[64010.251387] __do_sys_io_uring_enter+0x90a/0xd80
[64010.251696] ? do_user_addr_fault+0x26f/0xb60
[64010.251991] ? __pfx___do_sys_io_uring_enter+0x10/0x10
[64010.252333] ? __up_read+0x3ba/0x750
[64010.252565] ? __pfx___up_read+0x10/0x10
[64010.252868] do_syscall_64+0x68/0x140
[64010.253121] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[64010.253444] RIP: 0033:0x7f03a03fb7af
[64010.253679] Code: 45 0f b6 90 d0 00 00 00 41 8b b8 cc 00 00 00 45 31 c0 41 b9 08 00 00 00 41 83 e2 01 41 c1 e2 04 41 09 c2 b8 aa 01 00 00 0f 05 <c3> a8 02 74 cc f0 48 83 0c 24 00 49 8b 40 20 8b 00 a8 01 74 bc b8
[64010.254801] RSP: 002b:00007f039f3ffd08 EFLAGS: 00000246 ORIG_RAX: 00000000000001aa
[64010.255261] RAX: ffffffffffffffda RBX: 0000561ab7c1ced0 RCX: 00007f03a03fb7af
[64010.255695] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000009
[64010.256127] RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000008
[64010.256556] R10: 0000000000000000 R11: 0000000000000246 R12: 0000561ab7c1d7a8
[64010.256990] R13: 0000561ab7c1da00 R14: 0000561ab7c1d520 R15: 0000000000000001
[64010.257442] </TASK>
If I am already in dest when I do the mount using passthrough_hp and
then e.g. ls, it hangs indefinitely even if I kill passthrough_hp.
* Re: [PATCH RFC v4 00/15] fuse: fuse-over-io-uring
2024-10-21 20:57 ` David Wei
@ 2024-10-22 10:24 ` Bernd Schubert
2024-10-22 12:46 ` Bernd Schubert
2024-10-22 17:12 ` David Wei
0 siblings, 2 replies; 36+ messages in thread
From: Bernd Schubert @ 2024-10-22 10:24 UTC (permalink / raw)
To: David Wei, Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Amir Goldstein, Ming Lei, Josef Bacik
On 10/21/24 22:57, David Wei wrote:
> On 2024-10-21 04:47, Bernd Schubert wrote:
>> [...]
>
> Thanks, using this branch fixed the issue and now I can see the dest
> folder mirroring that of the source folder. There are two issues I
> noticed:
>
> [63490.068211] ---[ end trace 0000000000000000 ]---
> [64010.242963] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:330
> [64010.243531] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 11057, name: fuse-ring-1
> [64010.244092] preempt_count: 1, expected: 0
> [64010.244346] RCU nest depth: 0, expected: 0
> [64010.244599] 2 locks held by fuse-ring-1/11057:
> [64010.244886] #0: ffff888105db20a8 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_enter+0x900/0xd80
> [64010.245476] #1: ffff88810f941818 (&fc->lock){+.+.}-{2:2}, at: fuse_uring_cmd+0x83e/0x1890 [fuse]
> [64010.246031] CPU: 1 UID: 0 PID: 11057 Comm: fuse-ring-1 Tainted: G W 6.11.0-10089-g0d2090ccdbbe #2
> [64010.246655] Tainted: [W]=WARN
> [64010.246853] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
> [64010.247542] Call Trace:
> [64010.247705] <TASK>
> [64010.247860] dump_stack_lvl+0xb0/0xd0
> [64010.248090] __might_resched+0x2f8/0x510
> [64010.248338] __kmalloc_cache_noprof+0x2aa/0x390
> [64010.248614] ? lockdep_init_map_type+0x2cb/0x7b0
> [64010.248923] ? fuse_uring_cmd+0xcc2/0x1890 [fuse]
> [64010.249215] fuse_uring_cmd+0xcc2/0x1890 [fuse]
> [64010.249506] io_uring_cmd+0x214/0x500
> [64010.249745] io_issue_sqe+0x588/0x1810
> [64010.249999] ? __pfx_io_issue_sqe+0x10/0x10
> [64010.250254] ? io_alloc_async_data+0x88/0x120
> [64010.250516] ? io_alloc_async_data+0x88/0x120
> [64010.250811] ? io_uring_cmd_prep+0x2eb/0x9f0
> [64010.251103] io_submit_sqes+0x796/0x1f80
> [64010.251387] __do_sys_io_uring_enter+0x90a/0xd80
> [64010.251696] ? do_user_addr_fault+0x26f/0xb60
> [64010.251991] ? __pfx___do_sys_io_uring_enter+0x10/0x10
> [64010.252333] ? __up_read+0x3ba/0x750
> [64010.252565] ? __pfx___up_read+0x10/0x10
> [64010.252868] do_syscall_64+0x68/0x140
> [64010.253121] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [64010.253444] RIP: 0033:0x7f03a03fb7af
> [64010.253679] Code: 45 0f b6 90 d0 00 00 00 41 8b b8 cc 00 00 00 45 31 c0 41 b9 08 00 00 00 41 83 e2 01 41 c1 e2 04 41 09 c2 b8 aa 01 00 00 0f 05 <c3> a8 02 74 cc f0 48 83 0c 24 00 49 8b 40 20 8b 00 a8 01 74 bc b8
> [64010.254801] RSP: 002b:00007f039f3ffd08 EFLAGS: 00000246 ORIG_RAX: 00000000000001aa
> [64010.255261] RAX: ffffffffffffffda RBX: 0000561ab7c1ced0 RCX: 00007f03a03fb7af
> [64010.255695] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000009
> [64010.256127] RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000008
> [64010.256556] R10: 0000000000000000 R11: 0000000000000246 R12: 0000561ab7c1d7a8
> [64010.256990] R13: 0000561ab7c1da00 R14: 0000561ab7c1d520 R15: 0000000000000001
> [64010.257442] </TASK>
Regarding issue one, does this patch solve it?
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index e518d4379aa1..304919bc12fb 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -168,6 +168,12 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
queue = kzalloc(sizeof(*queue), GFP_KERNEL_ACCOUNT);
if (!queue)
return ERR_PTR(-ENOMEM);
+ pq = kcalloc(FUSE_PQ_HASH_SIZE, sizeof(struct list_head), GFP_KERNEL);
+ if (!pq) {
+ kfree(queue);
+ return ERR_PTR(-ENOMEM);
+ }
+
spin_lock(&fc->lock);
if (ring->queues[qid]) {
spin_unlock(&fc->lock);
@@ -186,11 +192,6 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
INIT_LIST_HEAD(&queue->ent_in_userspace);
INIT_LIST_HEAD(&queue->fuse_req_queue);
- pq = kcalloc(FUSE_PQ_HASH_SIZE, sizeof(struct list_head), GFP_KERNEL);
- if (!pq) {
- kfree(queue);
- return ERR_PTR(-ENOMEM);
- }
queue->fpq.processing = pq;
fuse_pqueue_init(&queue->fpq);
I think we don't need GFP_ATOMIC; we can do the allocations before taking
the lock. This pq allocation is new in v4, I forgot to put it into the
right place, and it slipped through my very basic testing (I'm
concentrating on the design changes for now - testing will come back
with v6).
>
> If I am already in dest when I do the mount using passthrough_hp and
> then e.g. ls, it hangs indefinitely even if I kill passthrough_hp.
I'm going to check in a bit. I hope it is not a recursion issue.
Thanks,
Bernd
* Re: [PATCH RFC v4 00/15] fuse: fuse-over-io-uring
2024-10-22 10:24 ` Bernd Schubert
@ 2024-10-22 12:46 ` Bernd Schubert
2024-10-22 17:10 ` David Wei
2024-10-22 17:12 ` David Wei
1 sibling, 1 reply; 36+ messages in thread
From: Bernd Schubert @ 2024-10-22 12:46 UTC (permalink / raw)
To: David Wei, Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Amir Goldstein, Ming Lei, Josef Bacik
On 10/22/24 12:24, Bernd Schubert wrote:
> On 10/21/24 22:57, David Wei wrote:
>> If I am already in dest when I do the mount using passthrough_hp and
>> then e.g. ls, it hangs indefinitely even if I kill passthrough_hp.
>
> I'm going to check in a bit. I hope it is not a recursion issue.
>
Hmm, I cannot reproduce this
bernd@squeeze1 dest>pwd
/scratch/dest
bernd@squeeze1 dest>/home/bernd/src/libfuse/github//build-debian/example/passthrough_hp -o allow_other --nopassthrough --uring --uring-per-core-queue --uring-fg-depth=1 --uring-bg-depth=1 /scratch/source /scratch/dest
bernd@squeeze1 dest>ll
total 6.4G
drwxr-xr-x 2 fusetests fusetests 4.0K Jul 30 17:59 scratch_mnt
drwxr-xr-x 2 fusetests fusetests 4.0K Jul 30 17:59 test_dir
-rw-r--r-- 1 bernd bernd 50G Sep 12 14:20 testfile
-rwxr-xr-x 1 bernd bernd 6.3G Sep 12 14:39 testfile1
Same when running in foreground and doing operations from another console
cqe unique: 4, opcode: GETATTR (3), nodeid: 1, insize: 16, pid: 732
unique: 4, result=104
cqe unique: 6, opcode: STATFS (17), nodeid: 1, insize: 0, pid: 732
unique: 6, result=80
In order to check it is not a recursion issue I also switched my VM to
one core - still no issue. What is your setup?
Also, I'm still on 6.10. I want to send out v5 with separated headers
later this week, and v6 (maybe without RFC) for 6.12 next week.
Thanks,
Bernd
* Re: [PATCH RFC v4 00/15] fuse: fuse-over-io-uring
2024-10-22 12:46 ` Bernd Schubert
@ 2024-10-22 17:10 ` David Wei
0 siblings, 0 replies; 36+ messages in thread
From: David Wei @ 2024-10-22 17:10 UTC (permalink / raw)
To: Bernd Schubert, Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Amir Goldstein, Ming Lei, Josef Bacik
On 2024-10-22 05:46, Bernd Schubert wrote:
>
>
> On 10/22/24 12:24, Bernd Schubert wrote:
>> On 10/21/24 22:57, David Wei wrote:
>>> If I am already in dest when I do the mount using passthrough_hp and
>>> then e.g. ls, it hangs indefinitely even if I kill passthrough_hp.
>>
>> I'm going to check in a bit. I hope it is not a recursion issue.
>>
>
> Hmm, I cannot reproduce this
> [...]
> In order to check it is not a recursion issue I also switched my VM to
> one core - still no issue. What is your setup?
> Also, I'm still on 6.10, I want to send out v5 with separated headers
> later this week and next week v6 (and maybe without RFC) for 6.12 next
> week.
I tried this again and could not repro anymore. I think your latest
libfuse that falls back to /dev/fuse fixed it. Sorry for the noise!
>
>
> Thanks,
> Bernd
* Re: [PATCH RFC v4 00/15] fuse: fuse-over-io-uring
2024-10-22 10:24 ` Bernd Schubert
2024-10-22 12:46 ` Bernd Schubert
@ 2024-10-22 17:12 ` David Wei
1 sibling, 0 replies; 36+ messages in thread
From: David Wei @ 2024-10-22 17:12 UTC (permalink / raw)
To: Bernd Schubert, Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Amir Goldstein, Ming Lei, Josef Bacik
On 2024-10-22 03:24, Bernd Schubert wrote:
>
>
> On 10/21/24 22:57, David Wei wrote:
>> [...]
>
> Regarding issue one, does this patch solve it?
>
> diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
> index e518d4379aa1..304919bc12fb 100644
> --- a/fs/fuse/dev_uring.c
> +++ b/fs/fuse/dev_uring.c
> @@ -168,6 +168,12 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
> queue = kzalloc(sizeof(*queue), GFP_KERNEL_ACCOUNT);
> if (!queue)
> return ERR_PTR(-ENOMEM);
> + pq = kcalloc(FUSE_PQ_HASH_SIZE, sizeof(struct list_head), GFP_KERNEL);
> + if (!pq) {
> + kfree(queue);
> + return ERR_PTR(-ENOMEM);
> + }
> +
> spin_lock(&fc->lock);
> if (ring->queues[qid]) {
> spin_unlock(&fc->lock);
> @@ -186,11 +192,6 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
> INIT_LIST_HEAD(&queue->ent_in_userspace);
> INIT_LIST_HEAD(&queue->fuse_req_queue);
>
> - pq = kcalloc(FUSE_PQ_HASH_SIZE, sizeof(struct list_head), GFP_KERNEL);
> - if (!pq) {
> - kfree(queue);
> - return ERR_PTR(-ENOMEM);
> - }
> queue->fpq.processing = pq;
> fuse_pqueue_init(&queue->fpq);
>
>
> I think we don't need GFP_ATOMIC, but can do allocations before taking
> the lock. This pq allocation is new in v4 and I forgot to put it into
> the right place and it slipped through my very basic testing (I'm
> concentrating on the design changes for now - testing will come back
> with v6).
Thanks, this patch fixed it for me.
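The reordering in the patch above can be modeled in plain userspace C. This is only a sketch with hypothetical names: a pthread mutex stands in for fc->lock, which in the kernel is a spinlock that must never be held across a sleeping GFP_KERNEL allocation - hence doing both allocations before taking the lock:

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define PQ_HASH_SIZE 256
#define MAX_QUEUES 4

struct pq_bucket { void *next, *prev; };

struct queue {
	struct pq_bucket *pq;	/* processing hash table */
	int qid;
};

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static struct queue *queues[MAX_QUEUES];

/* Corrected ordering from the patch: do every sleeping allocation
 * *before* taking the lock, so nothing can sleep (or bail out
 * mid-setup) while the lock is held. */
static struct queue *create_queue(int qid)
{
	struct queue *q = calloc(1, sizeof(*q));

	if (!q)
		return NULL;
	q->pq = calloc(PQ_HASH_SIZE, sizeof(*q->pq));
	if (!q->pq) {
		free(q);
		return NULL;
	}
	q->qid = qid;

	pthread_mutex_lock(&lock);
	if (queues[qid]) {		/* lost a creation race */
		struct queue *existing = queues[qid];

		pthread_mutex_unlock(&lock);
		free(q->pq);
		free(q);
		return existing;
	}
	queues[qid] = q;
	pthread_mutex_unlock(&lock);
	return q;
}
```

The failure path also becomes simpler: on allocation failure nothing has been published under the lock yet, so cleanup is just two free() calls.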
>
>>
>> If I am already in dest when I do the mount using passthrough_hp and
>> then e.g. ls, it hangs indefinitely even if I kill passthrough_hp.
>
> I'm going to check in a bit. I hope it is not a recursion issue.
>
>
> Thanks,
> Bernd
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH RFC v4 00/15] fuse: fuse-over-io-uring
2024-10-16 0:05 [PATCH RFC v4 00/15] fuse: fuse-over-io-uring Bernd Schubert
` (16 preceding siblings ...)
2024-10-21 4:06 ` David Wei
@ 2024-10-22 22:10 ` David Wei
2024-11-04 8:24 ` Bernd Schubert
17 siblings, 1 reply; 36+ messages in thread
From: David Wei @ 2024-10-22 22:10 UTC (permalink / raw)
To: Bernd Schubert, Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Amir Goldstein, Ming Lei, Josef Bacik
On 2024-10-15 17:05, Bernd Schubert wrote:
> RFCv1 and RFCv2 have been tested with multiple xfstest runs in a VM
> (32 cores) with a kernel that has several debug options
> enabled (like KASAN and MSAN). RFCv3 is not that well tested yet.
> O_DIRECT is currently not working well with /dev/fuse and
> also with these patches; a patch has been submitted to fix that (although
> the approach was rejected)
> https://www.spinics.net/lists/linux-fsdevel/msg280028.html
Hi Bernd, I applied this patch and the associated libfuse patch at:
https://github.com/bsbernd/libfuse/tree/aligned-writes
I have a simple Python FUSE client that is still returning EINVAL for
write():
import mmap
import os
import sys
from multiprocessing import shared_memory
from os import O_CREAT, O_DIRECT, O_RDWR

with open(sys.argv[1], 'r+b') as f:
    mmapped_file = mmap.mmap(f.fileno(), 0)
shm = shared_memory.SharedMemory(create=True, size=mmapped_file.size())
shm.buf[:mmapped_file.size()] = mmapped_file[:]
fd = os.open("/home/vmuser/scratch/dest/out", O_RDWR|O_CREAT|O_DIRECT)
with open(fd, 'w+b') as f2:
    f2.write(bytes(shm.buf))
mmapped_file.close()
shm.unlink()
shm.close()
I'll keep looking at this but letting you know in case it's something
obvious again.
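For context, write() on an O_DIRECT fd typically returns EINVAL when the user buffer address or the transfer length is not aligned to the block size. A hedged sketch of an aligned write in C - helper names are hypothetical and 4096 is an assumed alignment; the real requirement depends on the filesystem and device:

```c
#define _GNU_SOURCE		/* for O_DIRECT */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static size_t padded_len(size_t n, size_t align)
{
	/* round n up to the next multiple of align */
	return (n + align - 1) / align * align;
}

/* O_DIRECT needs both an aligned buffer address (posix_memalign) and
 * an aligned transfer size (padding), otherwise write() fails with
 * EINVAL. */
static int aligned_direct_write(const char *path, const void *data,
				size_t len, size_t align)
{
	size_t padded = padded_len(len, align);
	void *buf;
	int fd;
	ssize_t ret;

	if (posix_memalign(&buf, align, padded))
		return -1;
	memset(buf, 0, padded);
	memcpy(buf, data, len);

	fd = open(path, O_WRONLY | O_CREAT | O_DIRECT, 0644);
	if (fd < 0) {
		free(buf);
		return -1;
	}
	ret = write(fd, buf, padded);
	if (ret == (ssize_t)padded)
		ret = ftruncate(fd, len);	/* drop the zero padding */
	close(fd);
	free(buf);
	return ret < 0 ? -1 : 0;
}
```

The Python repro above pads neither the shared-memory buffer nor the write length, so a file whose size is not block-aligned would hit exactly this EINVAL.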
* Re: [PATCH RFC v4 12/15] io_uring/cmd: let cmds to know about dying task
2024-10-16 0:05 ` [PATCH RFC v4 12/15] io_uring/cmd: let cmds to know about dying task Bernd Schubert
@ 2024-11-04 0:28 ` Pavel Begunkov
2024-11-04 22:15 ` Bernd Schubert
0 siblings, 1 reply; 36+ messages in thread
From: Pavel Begunkov @ 2024-11-04 0:28 UTC (permalink / raw)
To: Bernd Schubert, Miklos Szeredi
Cc: Jens Axboe, linux-fsdevel, io-uring, Joanne Koong, Amir Goldstein,
Ming Lei
On 10/16/24 01:05, Bernd Schubert wrote:
> From: Pavel Begunkov <[email protected]>
>
> When the task that submitted a request is dying, a task work for that
> request might get run by a kernel thread or even worse by a half
> dismantled task. We can't just cancel the task work without running the
> callback as the cmd might need to do some clean up, so pass a flag
> instead. If set, it's not safe to access any task resources and the
> callback is expected to cancel the cmd ASAP.
>
> Signed-off-by: Pavel Begunkov <[email protected]>
> ---
> include/linux/io_uring_types.h | 1 +
> io_uring/uring_cmd.c | 6 +++++-
> 2 files changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
> index 7abdc09271245ff7de3fb9a905ca78b7561e37eb..869a81c63e4970576155043fce7fe656293d7f58 100644
> --- a/include/linux/io_uring_types.h
> +++ b/include/linux/io_uring_types.h
> @@ -37,6 +37,7 @@ enum io_uring_cmd_flags {
> /* set when uring wants to cancel a previously issued command */
> IO_URING_F_CANCEL = (1 << 11),
> IO_URING_F_COMPAT = (1 << 12),
> + IO_URING_F_TASK_DEAD = (1 << 13),
> };
>
> struct io_wq_work_node {
> diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
> index 21ac5fb2d5f087e1174d5c94815d580972db6e3f..82c6001cc0696bbcbebb92153e1461f2a9aeebc3 100644
> --- a/io_uring/uring_cmd.c
> +++ b/io_uring/uring_cmd.c
> @@ -119,9 +119,13 @@ EXPORT_SYMBOL_GPL(io_uring_cmd_mark_cancelable);
> static void io_uring_cmd_work(struct io_kiocb *req, struct io_tw_state *ts)
> {
> struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
> + unsigned int flags = IO_URING_F_COMPLETE_DEFER;
> +
> + if (req->task != current)
> + flags |= IO_URING_F_TASK_DEAD;
Bernd, please don't change patches under my name without any
notice. This check is wrong, just stick to the original
https://lore.kernel.org/io-uring/[email protected]/
In general if you need to change something, either stick your
name, so that I know it might be a derivative, or reflect it in
the commit message, e.g.
Signed-off-by: initial author
[Person 2: changed this and that]
Signed-off-by: person 2
> Also, a quick note that btrfs also needs the patch, so it'll likely
get queued via either io_uring or btrfs trees for next.
> /* task_work executor checks the deffered list completion */
> - ioucmd->task_work_cb(ioucmd, IO_URING_F_COMPLETE_DEFER);
> + ioucmd->task_work_cb(ioucmd, flags);
> }
>
> void __io_uring_cmd_do_in_task(struct io_uring_cmd *ioucmd,
>
--
Pavel Begunkov
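As a rough userspace model of the contract this flag establishes (all names are hypothetical stand-ins, no io_uring APIs are involved): the cmd's task-work callback inspects the flag and cancels immediately instead of touching task state:

```c
#include <errno.h>

enum {
	F_COMPLETE_DEFER = 1 << 0,	/* stands in for IO_URING_F_COMPLETE_DEFER */
	F_TASK_DEAD      = 1 << 1,	/* stands in for IO_URING_F_TASK_DEAD */
};

struct cmd {
	int result;
	int done;
};

/* Contract from the patch: when the task-dead flag is set, the
 * callback must not touch any task resources (mm, files, ...) and is
 * expected to cancel the command ASAP. */
static void cmd_task_work_cb(struct cmd *c, unsigned int flags)
{
	if (flags & F_TASK_DEAD) {
		c->result = -ECANCELED;	/* cancel; task context is gone */
		c->done = 1;
		return;
	}
	c->result = 0;			/* normal path: task context is safe */
	c->done = 1;
}
```

The key point is that the task work always runs - it cannot simply be skipped, because the cmd may need cleanup - so the flag is the only way to tell the callback that the task's resources are off limits.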
* Re: [PATCH RFC v4 00/15] fuse: fuse-over-io-uring
2024-10-22 22:10 ` David Wei
@ 2024-11-04 8:24 ` Bernd Schubert
2024-11-04 23:02 ` Bernd Schubert
0 siblings, 1 reply; 36+ messages in thread
From: Bernd Schubert @ 2024-11-04 8:24 UTC (permalink / raw)
To: David Wei, Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Amir Goldstein, Ming Lei, Josef Bacik
Hi David,
On 10/23/24 00:10, David Wei wrote:
>
> On 2024-10-15 17:05, Bernd Schubert wrote:
>> RFCv1 and RFCv2 have been tested with multiple xfstest runs in a VM
>> (32 cores) with a kernel that has several debug options
>> enabled (like KASAN and MSAN). RFCv3 is not that well tested yet.
>> O_DIRECT is currently not working well with /dev/fuse and
>> also these patches, a patch has been submitted to fix that (although
>> the approach is refused)
>> https://www.spinics.net/lists/linux-fsdevel/msg280028.html
>
> Hi Bernd, I applied this patch and the associated libfuse patch at:
>
> https://github.com/bsbernd/libfuse/tree/aligned-writes
>
> I have a simple Python FUSE client that is still returning EINVAL for
> write():
>
> with open(sys.argv[1], 'r+b') as f:
>     mmapped_file = mmap.mmap(f.fileno(), 0)
> shm = shared_memory.SharedMemory(create=True, size=mmapped_file.size())
> shm.buf[:mmapped_file.size()] = mmapped_file[:]
> fd = os.open("/home/vmuser/scratch/dest/out", O_RDWR|O_CREAT|O_DIRECT)
> with open(fd, 'w+b') as f2:
>     f2.write(bytes(shm.buf))
> mmapped_file.close()
> shm.unlink()
> shm.close()
>
> I'll keep looking at this but letting you know in case it's something
> obvious again.
The 'aligned-writes' libfuse branch would need another kernel patch. Please
hold on a little bit - I hope to send out a new version later today or
tomorrow that separates headers from payload, so that alignment is guaranteed.
Thanks,
Bernd
* Re: [PATCH RFC v4 12/15] io_uring/cmd: let cmds to know about dying task
2024-11-04 0:28 ` Pavel Begunkov
@ 2024-11-04 22:15 ` Bernd Schubert
2024-11-05 1:08 ` Pavel Begunkov
0 siblings, 1 reply; 36+ messages in thread
From: Bernd Schubert @ 2024-11-04 22:15 UTC (permalink / raw)
To: Pavel Begunkov, Miklos Szeredi
Cc: Jens Axboe, [email protected],
[email protected], Joanne Koong, Amir Goldstein, Ming Lei
On 11/4/24 01:28, Pavel Begunkov wrote:
> On 10/16/24 01:05, Bernd Schubert wrote:
>> From: Pavel Begunkov <[email protected]>
>>
>> When the task that submitted a request is dying, a task work for that
>> request might get run by a kernel thread or even worse by a half
>> dismantled task. We can't just cancel the task work without running the
>> callback as the cmd might need to do some clean up, so pass a flag
>> instead. If set, it's not safe to access any task resources and the
>> callback is expected to cancel the cmd ASAP.
>>
>> Signed-off-by: Pavel Begunkov <[email protected]>
>> ---
>> include/linux/io_uring_types.h | 1 +
>> io_uring/uring_cmd.c | 6 +++++-
>> 2 files changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/io_uring_types.h
>> b/include/linux/io_uring_types.h
>> index
>> 7abdc09271245ff7de3fb9a905ca78b7561e37eb..869a81c63e4970576155043fce7fe656293d7f58 100644
>> --- a/include/linux/io_uring_types.h
>> +++ b/include/linux/io_uring_types.h
>> @@ -37,6 +37,7 @@ enum io_uring_cmd_flags {
>> /* set when uring wants to cancel a previously issued command */
>> IO_URING_F_CANCEL = (1 << 11),
>> IO_URING_F_COMPAT = (1 << 12),
>> + IO_URING_F_TASK_DEAD = (1 << 13),
>> };
>> struct io_wq_work_node {
>> diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
>> index
>> 21ac5fb2d5f087e1174d5c94815d580972db6e3f..82c6001cc0696bbcbebb92153e1461f2a9aeebc3 100644
>> --- a/io_uring/uring_cmd.c
>> +++ b/io_uring/uring_cmd.c
>> @@ -119,9 +119,13 @@ EXPORT_SYMBOL_GPL(io_uring_cmd_mark_cancelable);
>> static void io_uring_cmd_work(struct io_kiocb *req, struct
>> io_tw_state *ts)
>> {
>> struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct
>> io_uring_cmd);
>> + unsigned int flags = IO_URING_F_COMPLETE_DEFER;
>> +
>> + if (req->task != current)
>> + flags |= IO_URING_F_TASK_DEAD;
>
> Bernd, please don't change patches under my name without any
> notice. This check is wrong, just stick to the original
>
> https://lore.kernel.org/io-uring/[email protected]/
>
> In general if you need to change something, either stick your
> name, so that I know it might be a derivative, or reflect it in
> the commit message, e.g.
>
> Signed-off-by: initial author
> [Person 2: changed this and that]
> Signed-off-by: person 2
Oh sorry, for sure. I totally forgot to update the commit message.
Somehow the initial version didn't trigger. I need to double check to
see if there wasn't a testing issue on my side - going to check tomorrow.
>
> Also, a quick note that btrfs also needs the patch, so it'll likely
> get queued via either io_uring or btrfs trees for next.
Thanks, good to know, one patch less to carry :)
Thanks,
Bernd
* Re: [PATCH RFC v4 00/15] fuse: fuse-over-io-uring
2024-11-04 8:24 ` Bernd Schubert
@ 2024-11-04 23:02 ` Bernd Schubert
0 siblings, 0 replies; 36+ messages in thread
From: Bernd Schubert @ 2024-11-04 23:02 UTC (permalink / raw)
To: David Wei, Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, [email protected],
[email protected], Joanne Koong, Amir Goldstein, Ming Lei,
Josef Bacik
On 11/4/24 09:24, Bernd Schubert wrote:
> Hi David,
>
> On 10/23/24 00:10, David Wei wrote:
>>
>> On 2024-10-15 17:05, Bernd Schubert wrote:
>>> RFCv1 and RFCv2 have been tested with multiple xfstest runs in a VM
>>> (32 cores) with a kernel that has several debug options
>>> enabled (like KASAN and MSAN). RFCv3 is not that well tested yet.
>>> O_DIRECT is currently not working well with /dev/fuse and
>>> also these patches, a patch has been submitted to fix that (although
>>> the approach is refused)
>>> https://www.spinics.net/lists/linux-fsdevel/msg280028.html
>>
>> Hi Bernd, I applied this patch and the associated libfuse patch at:
>>
>> https://github.com/bsbernd/libfuse/tree/aligned-writes
>>
>> I have a simple Python FUSE client that is still returning EINVAL for
>> write():
>>
>> with open(sys.argv[1], 'r+b') as f:
>>     mmapped_file = mmap.mmap(f.fileno(), 0)
>> shm = shared_memory.SharedMemory(create=True, size=mmapped_file.size())
>> shm.buf[:mmapped_file.size()] = mmapped_file[:]
>> fd = os.open("/home/vmuser/scratch/dest/out", O_RDWR|O_CREAT|O_DIRECT)
>> with open(fd, 'w+b') as f2:
>>     f2.write(bytes(shm.buf))
>> mmapped_file.close()
>> shm.unlink()
>> shm.close()
>>
>> I'll keep looking at this but letting you know in case it's something
>> obvious again.
>
> the 'aligned-writes' libfuse branch would need another kernel patch. Please
> hold on a little bit, I hope to send out a new version later today or
> tomorrow that separates headers from payload - alignment is guaranteed.
>
If you are very brave, you could try out this (sorry, still on 6.10)
https://github.com/bsbernd/linux/tree/fuse-uring-for-6.10-rfc5
https://github.com/bsbernd/libfuse/tree/uring
Right now #fuse-uring-for-6.10-rfc5 is rather similar to
fuse-uring-for-6.10-rfc4, with two additional patches to
separate headers from payload. The head commit, which
updates fuse-io-uring, is going to be rebased into the
other commits tomorrow.
Also, I just noticed a teardown issue when the daemon
is killed while IO is going on - busy inodes on sb shutdown.
Some fuse requests are probably not correctly released;
I guess that is also already present in rfcv4. I will look
into it in the morning.
Thanks,
Bernd
* Re: [PATCH RFC v4 12/15] io_uring/cmd: let cmds to know about dying task
2024-11-04 22:15 ` Bernd Schubert
@ 2024-11-05 1:08 ` Pavel Begunkov
2024-11-05 23:02 ` Bernd Schubert
0 siblings, 1 reply; 36+ messages in thread
From: Pavel Begunkov @ 2024-11-05 1:08 UTC (permalink / raw)
To: Bernd Schubert, Miklos Szeredi
Cc: Jens Axboe, [email protected],
[email protected], Joanne Koong, Amir Goldstein, Ming Lei
On 11/4/24 22:15, Bernd Schubert wrote:
> On 11/4/24 01:28, Pavel Begunkov wrote:
...
>> In general if you need to change something, either stick your
>> name, so that I know it might be a derivative, or reflect it in
>> the commit message, e.g.
>>
>> Signed-off-by: initial author
>> [Person 2: changed this and that]
>> Signed-off-by: person 2
>
> Oh sorry, for sure. I totally forgot to update the commit message.
>
> Somehow the initial version didn't trigger. I need to double check to
"Didn't trigger" like in "kernel was still crashing"?
FWIW, the original version is how it's handled in several places
across io_uring, and the difference is a gap for !DEFER_TASKRUN
when a task_work is queued somewhere in between when a task has
started going through exit() but hasn't got PF_EXITING set yet.
IOW, it should be harder to hit.
--
Pavel Begunkov
* Re: [PATCH RFC v4 12/15] io_uring/cmd: let cmds to know about dying task
2024-11-05 1:08 ` Pavel Begunkov
@ 2024-11-05 23:02 ` Bernd Schubert
2024-11-06 0:14 ` Pavel Begunkov
2024-11-06 4:44 ` Ming Lei
0 siblings, 2 replies; 36+ messages in thread
From: Bernd Schubert @ 2024-11-05 23:02 UTC (permalink / raw)
To: Pavel Begunkov, Miklos Szeredi
Cc: Jens Axboe, [email protected],
[email protected], Joanne Koong, Amir Goldstein, Ming Lei
On 11/5/24 02:08, Pavel Begunkov wrote:
> On 11/4/24 22:15, Bernd Schubert wrote:
>> On 11/4/24 01:28, Pavel Begunkov wrote:
> ...
>>> In general if you need to change something, either stick your
>>> name, so that I know it might be a derivative, or reflect it in
>>> the commit message, e.g.
>>>
>>> Signed-off-by: initial author
>>> [Person 2: changed this and that]
>>> Signed-off-by: person 2
>>
>> Oh sorry, for sure. I totally forgot to update the commit message.
>>
>> Somehow the initial version didn't trigger. I need to double check to
>
> "Didn't trigger" like in "kernel was still crashing"?
My initial problem was a crash in iov_iter_get_pages2() on process
kill. And when I tested your initial patch, IO_URING_F_TASK_DEAD didn't
get set. Jens then asked to test with the version that I have in my
branch and that worked fine. Although in the meantime I wonder if
I made a test mistake (like just reloading fuse.ko instead of rebooting
with the new kernel). I just fixed a couple of issues in my branch
(basically ready for the next version), and will test the initial patch
again first thing in the morning.
>
> FWIW, the original version is how it's handled in several places
> across io_uring, and the difference is a gap for !DEFER_TASKRUN
> when a task_work is queued somewhere in between when a task is
> started going through exit() but haven't got PF_EXITING set yet.
> IOW, should be harder to hit.
>
Does that mean that the test for PF_EXITING is racy and we cannot
entirely rely on it?
Thanks,
Bernd
* Re: [PATCH RFC v4 12/15] io_uring/cmd: let cmds to know about dying task
2024-11-05 23:02 ` Bernd Schubert
@ 2024-11-06 0:14 ` Pavel Begunkov
2024-11-06 19:28 ` Bernd Schubert
2024-11-06 4:44 ` Ming Lei
1 sibling, 1 reply; 36+ messages in thread
From: Pavel Begunkov @ 2024-11-06 0:14 UTC (permalink / raw)
To: Bernd Schubert, Miklos Szeredi
Cc: Jens Axboe, [email protected],
[email protected], Joanne Koong, Amir Goldstein, Ming Lei
On 11/5/24 23:02, Bernd Schubert wrote:
> On 11/5/24 02:08, Pavel Begunkov wrote:
>> On 11/4/24 22:15, Bernd Schubert wrote:
>>> On 11/4/24 01:28, Pavel Begunkov wrote:
>> ...
>>>> In general if you need to change something, either stick your
>>>> name, so that I know it might be a derivative, or reflect it in
>>>> the commit message, e.g.
>>>>
>>>> Signed-off-by: initial author
>>>> [Person 2: changed this and that]
>>>> Signed-off-by: person 2
>>>
>>> Oh sorry, for sure. I totally forgot to update the commit message.
>>>
>>> Somehow the initial version didn't trigger. I need to double check to
>>
>> "Didn't trigger" like in "kernel was still crashing"?
>
> My initial problem was a crash in iov_iter_get_pages2() on process
> kill. And when I tested your initial patch IO_URING_F_TASK_DEAD didn't
> get set. Jens then asked to test with the version that I have in my
> branch and that worked fine. Although in the mean time I wonder if
> I made test mistake (like just fuse.ko reload instead of reboot with
> new kernel). Just fixed a couple of issues in my branch (basically
> ready for the next version send), will test the initial patch
> again as first thing in the morning.
I see. Please let me know if it doesn't work; it's not specific
to fuse - if there is a problem it'd also affect other core
io_uring parts.
>> FWIW, the original version is how it's handled in several places
>> across io_uring, and the difference is a gap for !DEFER_TASKRUN
>> when a task_work is queued somewhere in between when a task is
>> started going through exit() but haven't got PF_EXITING set yet.
>> IOW, should be harder to hit.
>>
>
> Does that mean that the test for PF_EXITING is racy and we cannot
> entirely rely on it?
No, the PF_EXITING check was fine, even though it'll look
different now for unrelated reasons. What I'm saying is that the
callback can get executed from the desired task, i.e.
req->task == current, but it can happen from a late exit(2)/etc.
path where the task is botched and likely doesn't have ->mm.
--
Pavel Begunkov
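The difference between the two checks can be sketched in plain C. All names here are hypothetical stand-ins - PF_EXITING's value and struct task are not the kernel definitions - the point is only which check catches the late-exit case Pavel describes:

```c
#include <stdbool.h>

#define PF_EXITING 0x4		/* stands in for the kernel task flag */

struct task {
	unsigned int flags;
	void *mm;		/* NULL once the address space is torn down */
};

/* Check from the RFC v4 patch: only detects that the work was handed
 * off to a different task (e.g. a kthread). */
static bool dead_by_task_ptr(const struct task *req_task,
			     const struct task *cur)
{
	return req_task != cur;
}

/* Check from the original patch: also catches the submitting task
 * itself running the work late in exit(), when PF_EXITING is set and
 * ->mm may already be gone. */
static bool dead_by_pf_exiting(const struct task *cur)
{
	return cur->flags & PF_EXITING;
}
```

That is the scenario above: req->task == current holds, yet the task is half dismantled, so the pointer comparison reports "alive" while the flag check reports "dying".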
* Re: [PATCH RFC v4 12/15] io_uring/cmd: let cmds to know about dying task
2024-11-05 23:02 ` Bernd Schubert
2024-11-06 0:14 ` Pavel Begunkov
@ 2024-11-06 4:44 ` Ming Lei
2024-11-06 19:34 ` Bernd Schubert
1 sibling, 1 reply; 36+ messages in thread
From: Ming Lei @ 2024-11-06 4:44 UTC (permalink / raw)
To: Bernd Schubert
Cc: Pavel Begunkov, Miklos Szeredi, Jens Axboe,
[email protected], [email protected],
Joanne Koong, Amir Goldstein, Ming Lei
On Wed, Nov 6, 2024 at 7:02 AM Bernd Schubert <[email protected]> wrote:
>
>
>
> On 11/5/24 02:08, Pavel Begunkov wrote:
> > On 11/4/24 22:15, Bernd Schubert wrote:
> >> On 11/4/24 01:28, Pavel Begunkov wrote:
> > ...
> >>> In general if you need to change something, either stick your
> >>> name, so that I know it might be a derivative, or reflect it in
> >>> the commit message, e.g.
> >>>
> >>> Signed-off-by: initial author
> >>> [Person 2: changed this and that]
> >>> Signed-off-by: person 2
> >>
> >> Oh sorry, for sure. I totally forgot to update the commit message.
> >>
> >> Somehow the initial version didn't trigger. I need to double check to
> >
> > "Didn't trigger" like in "kernel was still crashing"?
>
> My initial problem was a crash in iov_iter_get_pages2() on process
> kill. And when I tested your initial patch IO_URING_F_TASK_DEAD didn't
> get set. Jens then asked to test with the version that I have in my
> branch and that worked fine. Although in the mean time I wonder if
> I made test mistake (like just fuse.ko reload instead of reboot with
> new kernel). Just fixed a couple of issues in my branch (basically
> ready for the next version send), will test the initial patch
> again as first thing in the morning.
>
>
> >
> > FWIW, the original version is how it's handled in several places
> > across io_uring, and the difference is a gap for !DEFER_TASKRUN
> > when a task_work is queued somewhere in between when a task is
> > started going through exit() but haven't got PF_EXITING set yet.
> > IOW, should be harder to hit.
> >
>
> Does that mean that the test for PF_EXITING is racy and we cannot
> entirely rely on it?
Another solution is to mark the uring_cmd as cancelable via
io_uring_cmd_mark_cancelable(), which provides a chance to cancel the
cmd in the current context.
Thanks,
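A userspace sketch of that mechanism - only the registration/sweep shape is modeled, with hypothetical names, not the real io_uring API: a command registers itself as cancelable, and ring teardown sweeps the registry, invoking each callback with a cancel flag:

```c
#include <stddef.h>

struct ucmd;
typedef void (*ucmd_cb)(struct ucmd *, unsigned int);

struct ucmd {
	ucmd_cb cb;
	int canceled;
};

#define F_CANCEL (1u << 0)	/* stands in for IO_URING_F_CANCEL */
#define MAX_CANCELABLE 16

static struct ucmd *cancelable[MAX_CANCELABLE];
static int ncancelable;

/* Models io_uring_cmd_mark_cancelable(): remember the command so
 * teardown can reach it later. */
static void mark_cancelable(struct ucmd *c)
{
	if (ncancelable < MAX_CANCELABLE)
		cancelable[ncancelable++] = c;
}

/* Models the cancel sweep on ring teardown: every registered command
 * gets its callback invoked with the cancel flag and must clean up. */
static void try_cancel_all(void)
{
	int i;

	for (i = 0; i < ncancelable; i++)
		cancelable[i]->cb(cancelable[i], F_CANCEL);
	ncancelable = 0;
}

static void demo_cb(struct ucmd *c, unsigned int flags)
{
	if (flags & F_CANCEL)
		c->canceled = 1;
}
```

As Pavel notes further down the thread, this sweep can run from a kthread and does not reach already-queued task_work, so it complements rather than replaces the task-dead flag.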
* Re: [PATCH RFC v4 12/15] io_uring/cmd: let cmds to know about dying task
2024-11-06 0:14 ` Pavel Begunkov
@ 2024-11-06 19:28 ` Bernd Schubert
0 siblings, 0 replies; 36+ messages in thread
From: Bernd Schubert @ 2024-11-06 19:28 UTC (permalink / raw)
To: Pavel Begunkov, Miklos Szeredi
Cc: Jens Axboe, [email protected],
[email protected], Joanne Koong, Amir Goldstein, Ming Lei
On 11/6/24 01:14, Pavel Begunkov wrote:
> On 11/5/24 23:02, Bernd Schubert wrote:
>> On 11/5/24 02:08, Pavel Begunkov wrote:
>>> On 11/4/24 22:15, Bernd Schubert wrote:
>>>> On 11/4/24 01:28, Pavel Begunkov wrote:
>>> ...
>>>>> In general if you need to change something, either stick your
>>>>> name, so that I know it might be a derivative, or reflect it in
>>>>> the commit message, e.g.
>>>>>
>>>>> Signed-off-by: initial author
>>>>> [Person 2: changed this and that]
>>>>> Signed-off-by: person 2
>>>>
>>>> Oh sorry, for sure. I totally forgot to update the commit message.
>>>>
>>>> Somehow the initial version didn't trigger. I need to double check to
>>>
>>> "Didn't trigger" like in "kernel was still crashing"?
>>
>> My initial problem was a crash in iov_iter_get_pages2() on process
>> kill. And when I tested your initial patch IO_URING_F_TASK_DEAD didn't
>> get set. Jens then asked to test with the version that I have in my
>> branch and that worked fine. Although in the mean time I wonder if
>> I made test mistake (like just fuse.ko reload instead of reboot with
>> new kernel). Just fixed a couple of issues in my branch (basically
>> ready for the next version send), will test the initial patch
>> again as first thing in the morning.
>
> I see. Please let know if it doesn't work, it's not specific
> to fuse, if there is a problem it'd also affects other core
> io_uring parts.
In my current branch that situation is rather hard to hit, but I
eventually got IO_URING_F_TASK_DEAD (>40 test rounds vs. every time
before...) with the initial patch version - I think my testing was
flawed back then.
>
>>> FWIW, the original version is how it's handled in several places
>>> across io_uring, and the difference is a gap for !DEFER_TASKRUN
>>> when a task_work is queued somewhere in between when a task is
>>> started going through exit() but haven't got PF_EXITING set yet.
>>> IOW, should be harder to hit.
>>>
>>
>> Does that mean that the test for PF_EXITING is racy and we cannot
>> entirely rely on it?
>
> No, the PF_EXITING check was fine, even though it'll look
> different now for unrelated reasons. What I'm saying is that the
> callback can get executed from the desired task, i.e.
> req->task == current, but it can happen from a late exit(2)/etc.
> path where the task is botched and likely doesn't have ->mm.
>
Ah ok, thanks for the info!
Thanks,
Bernd
* Re: [PATCH RFC v4 12/15] io_uring/cmd: let cmds to know about dying task
2024-11-06 4:44 ` Ming Lei
@ 2024-11-06 19:34 ` Bernd Schubert
2024-11-07 16:11 ` Pavel Begunkov
0 siblings, 1 reply; 36+ messages in thread
From: Bernd Schubert @ 2024-11-06 19:34 UTC (permalink / raw)
To: Ming Lei
Cc: Pavel Begunkov, Miklos Szeredi, Jens Axboe,
[email protected], [email protected],
Joanne Koong, Amir Goldstein, Ming Lei
On 11/6/24 05:44, Ming Lei wrote:
> On Wed, Nov 6, 2024 at 7:02 AM Bernd Schubert <[email protected]> wrote:
>>
>>
>>
>> On 11/5/24 02:08, Pavel Begunkov wrote:
>>>
>>> FWIW, the original version is how it's handled in several places
>>> across io_uring, and the difference is a gap for !DEFER_TASKRUN
>>> when a task_work is queued somewhere in between when a task is
>>> started going through exit() but haven't got PF_EXITING set yet.
>>> IOW, should be harder to hit.
>>>
>>
>> Does that mean that the test for PF_EXITING is racy and we cannot
>> entirely rely on it?
>
> Another solution is to mark uring_cmd as io_uring_cmd_mark_cancelable(),
> which provides a chance to cancel cmd in the current context.
Yeah, I have that, see
[PATCH RFC v4 14/15] fuse: {io-uring} Prevent mount point hang on fuse-server termination
As I just wrote to Pavel, getting IO_URING_F_TASK_DEAD is rather hard
in my current branch. IO_URING_F_CANCEL didn't make a difference;
I had specifically tried to disable it - still neither
IO_URING_F_TASK_DEAD nor the crash got easily triggered. So I
re-enabled IO_URING_F_CANCEL and then eventually
got IO_URING_F_TASK_DEAD - i.e. without checking the underlying code,
it looks like we need both as safety measures.
Thanks,
Bernd
* Re: [PATCH RFC v4 12/15] io_uring/cmd: let cmds to know about dying task
2024-11-06 19:34 ` Bernd Schubert
@ 2024-11-07 16:11 ` Pavel Begunkov
0 siblings, 0 replies; 36+ messages in thread
From: Pavel Begunkov @ 2024-11-07 16:11 UTC (permalink / raw)
To: Bernd Schubert, Ming Lei
Cc: Miklos Szeredi, Jens Axboe, [email protected],
[email protected], Joanne Koong, Amir Goldstein, Ming Lei
On 11/6/24 19:34, Bernd Schubert wrote:
> On 11/6/24 05:44, Ming Lei wrote:
>> On Wed, Nov 6, 2024 at 7:02 AM Bernd Schubert <[email protected]> wrote:
>>>
>>>
>>>
>>> On 11/5/24 02:08, Pavel Begunkov wrote:
>>>>
>>>> FWIW, the original version is how it's handled in several places
>>>> across io_uring, and the difference is a gap for !DEFER_TASKRUN
>>>> when a task_work is queued somewhere in between when a task is
>>>> started going through exit() but haven't got PF_EXITING set yet.
>>>> IOW, should be harder to hit.
>>>>
>>>
>>> Does that mean that the test for PF_EXITING is racy and we cannot
>>> entirely rely on it?
>>
>> Another solution is to mark uring_cmd as io_uring_cmd_mark_cancelable(),
>> which provides a chance to cancel cmd in the current context.
In short, F_CANCEL is not going to help, unfortunately.
The F_CANCEL path can be, and likely is, triggered from a kthread instead
of the original task. See the call sites of io_uring_try_cancel_requests(),
where the task termination/exit path, i.e. io_uring_cancel_generic(), in
most cases will skip the call because of the tctx_inflight() check.
Also, io_uring doesn't try to cancel queued task_work (the callback
is supposed to check whether it needs to fail the request), so if someone
queued up a task_work, including via __io_uring_cmd_do_in_task() and
friends, even F_CANCEL won't be able to cancel it.
> Yeah, I have that, see
> [PATCH RFC v4 14/15] fuse: {io-uring} Prevent mount point hang on fuse-server termination
>
> As I just wrote to Pavel, getting IO_URING_F_TASK_DEAD is rather hard
> in my current branch.IO_URING_F_CANCEL didn't make a difference ,
> I had especially tried to disable it - still neither
> IO_URING_F_TASK_DEAD nor the crash got easily triggered. So I
> reenabled IO_URING_F_CANCEL and then eventually
> got IO_URING_F_TASK_DEAD - i.e. without checking the underlying code,
> looks like we need both for safety measures.
--
Pavel Begunkov
end of thread, other threads:[~2024-11-07 16:11 UTC | newest]
Thread overview: 36+ messages
2024-10-16 0:05 [PATCH RFC v4 00/15] fuse: fuse-over-io-uring Bernd Schubert
2024-10-16 0:05 ` [PATCH RFC v4 01/15] fuse: rename to fuse_dev_end_requests and make non-static Bernd Schubert
2024-10-16 0:05 ` [PATCH RFC v4 02/15] fuse: Move fuse_get_dev to header file Bernd Schubert
2024-10-16 0:05 ` [PATCH RFC v4 03/15] fuse: Move request bits Bernd Schubert
2024-10-16 0:05 ` [PATCH RFC v4 04/15] fuse: Add fuse-io-uring design documentation Bernd Schubert
2024-10-16 0:05 ` [PATCH RFC v4 05/15] fuse: {uring} Handle SQEs - register commands Bernd Schubert
2024-10-16 0:05 ` [PATCH RFC v4 06/15] fuse: Make fuse_copy non static Bernd Schubert
2024-10-16 0:05 ` [PATCH RFC v4 07/15] fuse: Add buffer offset for uring into fuse_copy_state Bernd Schubert
2024-10-16 0:05 ` [PATCH RFC v4 08/15] fuse: {uring} Add uring sqe commit and fetch support Bernd Schubert
2024-10-16 0:05 ` [PATCH RFC v4 09/15] fuse: {uring} Handle teardown of ring entries Bernd Schubert
2024-10-16 0:05 ` [PATCH RFC v4 10/15] fuse: {uring} Add a ring queue and send method Bernd Schubert
2024-10-16 0:05 ` [PATCH RFC v4 11/15] fuse: {uring} Allow to queue to the ring Bernd Schubert
2024-10-16 0:05 ` [PATCH RFC v4 12/15] io_uring/cmd: let cmds to know about dying task Bernd Schubert
2024-11-04 0:28 ` Pavel Begunkov
2024-11-04 22:15 ` Bernd Schubert
2024-11-05 1:08 ` Pavel Begunkov
2024-11-05 23:02 ` Bernd Schubert
2024-11-06 0:14 ` Pavel Begunkov
2024-11-06 19:28 ` Bernd Schubert
2024-11-06 4:44 ` Ming Lei
2024-11-06 19:34 ` Bernd Schubert
2024-11-07 16:11 ` Pavel Begunkov
2024-10-16 0:05 ` [PATCH RFC v4 13/15] fuse: {uring} Handle IO_URING_F_TASK_DEAD Bernd Schubert
2024-10-16 0:05 ` [PATCH RFC v4 14/15] fuse: {io-uring} Prevent mount point hang on fuse-server termination Bernd Schubert
2024-10-16 0:05 ` [PATCH RFC v4 15/15] fuse: enable fuse-over-io-uring Bernd Schubert
2024-10-16 0:08 ` [PATCH RFC v4 00/15] fuse: fuse-over-io-uring Bernd Schubert
2024-10-21 4:06 ` David Wei
2024-10-21 11:47 ` Bernd Schubert
2024-10-21 20:57 ` David Wei
2024-10-22 10:24 ` Bernd Schubert
2024-10-22 12:46 ` Bernd Schubert
2024-10-22 17:10 ` David Wei
2024-10-22 17:12 ` David Wei
2024-10-22 22:10 ` David Wei
2024-11-04 8:24 ` Bernd Schubert
2024-11-04 23:02 ` Bernd Schubert