* [PATCH RFC v7 01/16] fuse: rename to fuse_dev_end_requests and make non-static
2024-11-27 13:40 [PATCH RFC v7 00/16] fuse: fuse-over-io-uring Bernd Schubert
@ 2024-11-27 13:40 ` Bernd Schubert
2024-11-28 0:19 ` Joanne Koong
2024-11-27 13:40 ` [PATCH RFC v7 02/16] fuse: Move fuse_get_dev to header file Bernd Schubert
` (16 subsequent siblings)
17 siblings, 1 reply; 37+ messages in thread
From: Bernd Schubert @ 2024-11-27 13:40 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
Bernd Schubert
This function is needed by fuse_uring.c to clean ring queues,
so make it non-static. As a non-static function it also needs
the fuse_ prefix: rename 'end_requests' to 'fuse_dev_end_requests'.
Signed-off-by: Bernd Schubert <[email protected]>
Reviewed-by: Josef Bacik <[email protected]>
---
fs/fuse/dev.c | 11 +++++------
fs/fuse/fuse_dev_i.h | 14 ++++++++++++++
2 files changed, 19 insertions(+), 6 deletions(-)
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 1f64ae6d7a69e53c8d96f2e1f5caca3ff2b4ab26..09b73044a9b6748767d2479dda0a09a97b8b4c0f 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -7,6 +7,7 @@
*/
#include "fuse_i.h"
+#include "fuse_dev_i.h"
#include <linux/init.h>
#include <linux/module.h>
@@ -34,8 +35,6 @@ MODULE_ALIAS("devname:fuse");
static struct kmem_cache *fuse_req_cachep;
-static void end_requests(struct list_head *head);
-
static struct fuse_dev *fuse_get_dev(struct file *file)
{
/*
@@ -1873,7 +1872,7 @@ static void fuse_resend(struct fuse_conn *fc)
spin_unlock(&fiq->lock);
list_for_each_entry(req, &to_queue, list)
clear_bit(FR_PENDING, &req->flags);
- end_requests(&to_queue);
+ fuse_dev_end_requests(&to_queue);
return;
}
/* iq and pq requests are both oldest to newest */
@@ -2192,7 +2191,7 @@ static __poll_t fuse_dev_poll(struct file *file, poll_table *wait)
}
/* Abort all requests on the given list (pending or processing) */
-static void end_requests(struct list_head *head)
+void fuse_dev_end_requests(struct list_head *head)
{
while (!list_empty(head)) {
struct fuse_req *req;
@@ -2295,7 +2294,7 @@ void fuse_abort_conn(struct fuse_conn *fc)
wake_up_all(&fc->blocked_waitq);
spin_unlock(&fc->lock);
- end_requests(&to_end);
+ fuse_dev_end_requests(&to_end);
} else {
spin_unlock(&fc->lock);
}
@@ -2325,7 +2324,7 @@ int fuse_dev_release(struct inode *inode, struct file *file)
list_splice_init(&fpq->processing[i], &to_end);
spin_unlock(&fpq->lock);
- end_requests(&to_end);
+ fuse_dev_end_requests(&to_end);
/* Are we the last open device? */
if (atomic_dec_and_test(&fc->dev_count)) {
diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
new file mode 100644
index 0000000000000000000000000000000000000000..4fcff2223fa60fbfb844a3f8e1252a523c4c01af
--- /dev/null
+++ b/fs/fuse/fuse_dev_i.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * FUSE: Filesystem in Userspace
+ * Copyright (C) 2001-2008 Miklos Szeredi <[email protected]>
+ */
+#ifndef _FS_FUSE_DEV_I_H
+#define _FS_FUSE_DEV_I_H
+
+#include <linux/types.h>
+
+void fuse_dev_end_requests(struct list_head *head);
+
+#endif
+
--
2.43.0
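The function being renamed here follows the standard kernel drain pattern: pop entries off a list head until it is empty, ending each request as it goes. A minimal userspace sketch of that pattern — the list helpers and `struct fake_req` below are simplified stand-ins for `<linux/list.h>` and `struct fuse_req`, not the kernel code:

```c
#include <assert.h>
#include <stddef.h>

/* Tiny userspace analogue of the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h) { h->next = h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

/* Illustrative request; 'list' must stay the first member so the cast
 * in drain_requests() is valid. */
struct fake_req {
	struct list_head list;
	int ended;	/* stands in for fuse_request_end() having run */
};

/* Mirrors the shape of fuse_dev_end_requests(): while the list is not
 * empty, detach the first entry and "end" it. Returns how many ended. */
static int drain_requests(struct list_head *head)
{
	int ended = 0;

	while (!list_empty(head)) {
		struct fake_req *req = (struct fake_req *)head->next;

		list_del_init(&req->list);
		req->ended = 1;
		ended++;
	}
	return ended;
}
```

The kernel version additionally clears request flags and calls the real completion path, but the control flow is the same.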
* Re: [PATCH RFC v7 01/16] fuse: rename to fuse_dev_end_requests and make non-static
2024-11-27 13:40 ` [PATCH RFC v7 01/16] fuse: rename to fuse_dev_end_requests and make non-static Bernd Schubert
@ 2024-11-28 0:19 ` Joanne Koong
0 siblings, 0 replies; 37+ messages in thread
From: Joanne Koong @ 2024-11-28 0:19 UTC (permalink / raw)
To: Bernd Schubert
Cc: Miklos Szeredi, Jens Axboe, Pavel Begunkov, linux-fsdevel,
io-uring, Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd
On Wed, Nov 27, 2024 at 5:41 AM Bernd Schubert <[email protected]> wrote:
>
> This function is needed by fuse_uring.c to clean ring queues,
> so make it non static. Especially in non-static mode the function
> name 'end_requests' should be prefixed with fuse_
>
> Signed-off-by: Bernd Schubert <[email protected]>
> Reviewed-by: Josef Bacik <[email protected]>
Reviewed-by: Joanne Koong <[email protected]>
* [PATCH RFC v7 02/16] fuse: Move fuse_get_dev to header file
2024-11-27 13:40 [PATCH RFC v7 00/16] fuse: fuse-over-io-uring Bernd Schubert
2024-11-27 13:40 ` [PATCH RFC v7 01/16] fuse: rename to fuse_dev_end_requests and make non-static Bernd Schubert
@ 2024-11-27 13:40 ` Bernd Schubert
2024-11-28 0:20 ` Joanne Koong
2024-11-27 13:40 ` [PATCH RFC v7 03/16] fuse: Move request bits Bernd Schubert
` (15 subsequent siblings)
17 siblings, 1 reply; 37+ messages in thread
From: Bernd Schubert @ 2024-11-27 13:40 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
Bernd Schubert
Another preparation patch, as this function will be needed by
fuse/dev.c and fuse/dev_uring.c.
Signed-off-by: Bernd Schubert <[email protected]>
Reviewed-by: Josef Bacik <[email protected]>
---
fs/fuse/dev.c | 9 ---------
fs/fuse/fuse_dev_i.h | 9 +++++++++
2 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 09b73044a9b6748767d2479dda0a09a97b8b4c0f..649513b55906d2aef99f79a942c2c63113796b5a 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -35,15 +35,6 @@ MODULE_ALIAS("devname:fuse");
static struct kmem_cache *fuse_req_cachep;
-static struct fuse_dev *fuse_get_dev(struct file *file)
-{
- /*
- * Lockless access is OK, because file->private data is set
- * once during mount and is valid until the file is released.
- */
- return READ_ONCE(file->private_data);
-}
-
static void fuse_request_init(struct fuse_mount *fm, struct fuse_req *req)
{
INIT_LIST_HEAD(&req->list);
diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
index 4fcff2223fa60fbfb844a3f8e1252a523c4c01af..e7ea1b21c18204335c52406de5291f0c47d654f5 100644
--- a/fs/fuse/fuse_dev_i.h
+++ b/fs/fuse/fuse_dev_i.h
@@ -8,6 +8,15 @@
#include <linux/types.h>
+static inline struct fuse_dev *fuse_get_dev(struct file *file)
+{
+ /*
+ * Lockless access is OK, because file->private data is set
+ * once during mount and is valid until the file is released.
+ */
+ return READ_ONCE(file->private_data);
+}
+
void fuse_dev_end_requests(struct list_head *head);
#endif
--
2.43.0
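The comment in the moved helper explains why a lockless `READ_ONCE()` is enough: `file->private_data` is published once at mount time and only read afterwards. A userspace sketch of that publish-once/read-lockless pattern, using C11 atomics in place of the kernel's `READ_ONCE()`/`WRITE_ONCE()` (all names here are illustrative, not the kernel API; acquire/release is slightly stronger than the kernel's relaxed `READ_ONCE()` so the sketch also covers publication of the pointee):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct fake_dev { int id; };

struct fake_file {
	/* set once during "mount", valid until the file is released */
	_Atomic(struct fake_dev *) private_data;
};

/* "mount": publish the device pointer exactly once */
static void fake_mount(struct fake_file *file, struct fake_dev *fud)
{
	atomic_store_explicit(&file->private_data, fud,
			      memory_order_release);
}

/* analogue of fuse_get_dev(): lockless, tear-free read */
static struct fake_dev *fake_get_dev(struct fake_file *file)
{
	return atomic_load_explicit(&file->private_data,
				    memory_order_acquire);
}
```

Because there is no concurrent rewrite of the pointer during the file's lifetime, no lock is needed — the only requirement is that the read cannot tear, which the atomic (kernel: `READ_ONCE()`) guarantees.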
* Re: [PATCH RFC v7 02/16] fuse: Move fuse_get_dev to header file
2024-11-27 13:40 ` [PATCH RFC v7 02/16] fuse: Move fuse_get_dev to header file Bernd Schubert
@ 2024-11-28 0:20 ` Joanne Koong
0 siblings, 0 replies; 37+ messages in thread
From: Joanne Koong @ 2024-11-28 0:20 UTC (permalink / raw)
To: Bernd Schubert
Cc: Miklos Szeredi, Jens Axboe, Pavel Begunkov, linux-fsdevel,
io-uring, Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd
On Wed, Nov 27, 2024 at 5:41 AM Bernd Schubert <[email protected]> wrote:
>
> Another preparation patch, as this function will be needed by
> fuse/dev.c and fuse/dev_uring.c.
>
> Signed-off-by: Bernd Schubert <[email protected]>
> Reviewed-by: Josef Bacik <[email protected]>
Reviewed-by: Joanne Koong <[email protected]>
* [PATCH RFC v7 03/16] fuse: Move request bits
2024-11-27 13:40 [PATCH RFC v7 00/16] fuse: fuse-over-io-uring Bernd Schubert
2024-11-27 13:40 ` [PATCH RFC v7 01/16] fuse: rename to fuse_dev_end_requests and make non-static Bernd Schubert
2024-11-27 13:40 ` [PATCH RFC v7 02/16] fuse: Move fuse_get_dev to header file Bernd Schubert
@ 2024-11-27 13:40 ` Bernd Schubert
2024-11-28 0:21 ` Joanne Koong
2024-11-27 13:40 ` [PATCH RFC v7 04/16] fuse: Add fuse-io-uring design documentation Bernd Schubert
` (14 subsequent siblings)
17 siblings, 1 reply; 37+ messages in thread
From: Bernd Schubert @ 2024-11-27 13:40 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
Bernd Schubert
These bits are needed by the dev_uring functions as well.
Signed-off-by: Bernd Schubert <[email protected]>
Reviewed-by: Josef Bacik <[email protected]>
---
fs/fuse/dev.c | 4 ----
fs/fuse/fuse_dev_i.h | 4 ++++
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 649513b55906d2aef99f79a942c2c63113796b5a..fd8898b0c1cca4d117982d5208d78078472b0dfb 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -29,10 +29,6 @@
MODULE_ALIAS_MISCDEV(FUSE_MINOR);
MODULE_ALIAS("devname:fuse");
-/* Ordinary requests have even IDs, while interrupts IDs are odd */
-#define FUSE_INT_REQ_BIT (1ULL << 0)
-#define FUSE_REQ_ID_STEP (1ULL << 1)
-
static struct kmem_cache *fuse_req_cachep;
static void fuse_request_init(struct fuse_mount *fm, struct fuse_req *req)
diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
index e7ea1b21c18204335c52406de5291f0c47d654f5..08a7e88e002773fcd18c25a229c7aa6450831401 100644
--- a/fs/fuse/fuse_dev_i.h
+++ b/fs/fuse/fuse_dev_i.h
@@ -8,6 +8,10 @@
#include <linux/types.h>
+/* Ordinary requests have even IDs, while interrupts IDs are odd */
+#define FUSE_INT_REQ_BIT (1ULL << 0)
+#define FUSE_REQ_ID_STEP (1ULL << 1)
+
static inline struct fuse_dev *fuse_get_dev(struct file *file)
{
/*
--
2.43.0
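The two bits being moved encode how fuse allocates request IDs: ordinary IDs advance in steps of `FUSE_REQ_ID_STEP` (so they stay even), and an interrupt for a request reuses that request's ID with the low bit set. A small self-contained illustration — only the two macros come from the patch, the helper names are ours:

```c
#include <assert.h>
#include <stdint.h>

/* Ordinary requests have even IDs, while interrupts IDs are odd */
#define FUSE_INT_REQ_BIT (1ULL << 0)
#define FUSE_REQ_ID_STEP (1ULL << 1)

/* next ordinary (even) request ID, as fuse_get_unique() does */
static uint64_t next_unique(uint64_t *reqctr)
{
	*reqctr += FUSE_REQ_ID_STEP;
	return *reqctr;
}

/* ID used when sending an interrupt for request 'unique' */
static uint64_t interrupt_unique(uint64_t unique)
{
	return unique | FUSE_INT_REQ_BIT;
}

/* map a (possibly interrupt) ID back to the ordinary request ID,
 * as the reply-matching code does when looking up a request */
static uint64_t base_unique(uint64_t unique)
{
	return unique & ~FUSE_INT_REQ_BIT;
}
```

This is why the bits must be visible outside dev.c: any code that matches replies to requests — including the uring path added later in this series — needs to mask off the interrupt bit the same way.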
* Re: [PATCH RFC v7 03/16] fuse: Move request bits
2024-11-27 13:40 ` [PATCH RFC v7 03/16] fuse: Move request bits Bernd Schubert
@ 2024-11-28 0:21 ` Joanne Koong
0 siblings, 0 replies; 37+ messages in thread
From: Joanne Koong @ 2024-11-28 0:21 UTC (permalink / raw)
To: Bernd Schubert
Cc: Miklos Szeredi, Jens Axboe, Pavel Begunkov, linux-fsdevel,
io-uring, Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd
On Wed, Nov 27, 2024 at 5:41 AM Bernd Schubert <[email protected]> wrote:
>
> These are needed by dev_uring functions as well
>
> Signed-off-by: Bernd Schubert <[email protected]>
> Reviewed-by: Josef Bacik <[email protected]>
Reviewed-by: Joanne Koong <[email protected]>
* [PATCH RFC v7 04/16] fuse: Add fuse-io-uring design documentation
2024-11-27 13:40 [PATCH RFC v7 00/16] fuse: fuse-over-io-uring Bernd Schubert
` (2 preceding siblings ...)
2024-11-27 13:40 ` [PATCH RFC v7 03/16] fuse: Move request bits Bernd Schubert
@ 2024-11-27 13:40 ` Bernd Schubert
2024-12-03 12:30 ` Pavel Begunkov
2024-11-27 13:40 ` [PATCH RFC v7 05/16] fuse: make args->in_args[0] to be always the header Bernd Schubert
` (13 subsequent siblings)
17 siblings, 1 reply; 37+ messages in thread
From: Bernd Schubert @ 2024-11-27 13:40 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
Bernd Schubert
Signed-off-by: Bernd Schubert <[email protected]>
---
Documentation/filesystems/fuse-io-uring.rst | 101 ++++++++++++++++++++++++++++
1 file changed, 101 insertions(+)
diff --git a/Documentation/filesystems/fuse-io-uring.rst b/Documentation/filesystems/fuse-io-uring.rst
new file mode 100644
index 0000000000000000000000000000000000000000..50fdba1ea566588be3663e29b04bb9bbb6c9e4fb
--- /dev/null
+++ b/Documentation/filesystems/fuse-io-uring.rst
@@ -0,0 +1,101 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============================
+FUSE Uring design documentation
+===============================
+
+This documentation covers the basic details of how the fuse
+kernel/userspace communication through uring is configured
+and works. For generic details about FUSE see fuse.rst.
+
+This document also covers the current interface, which is
+still in development and might change.
+
+Limitations
+===========
+As of now not all request types are supported through uring; userspace
+is required to also handle requests through /dev/fuse after
+uring setup is complete. Specifically, notifications (initiated from
+the daemon side) and interrupts are not supported over uring.
+
+Fuse io-uring configuration
+===========================
+
+Fuse kernel requests are queued through the classical /dev/fuse
+read/write interface - until uring setup is complete.
+
+In order to set up fuse-over-io-uring, the fuse-server (user-space)
+needs to submit SQEs (opcode = IORING_OP_URING_CMD) to the
+/dev/fuse connection file descriptor. The initial submit uses
+the sub-command FUSE_URING_REQ_FETCH, which just registers
+entries to be available in the kernel.
+
+Once at least one entry per queue is submitted, the kernel starts
+to enqueue requests to the ring queues.
+Note that every CPU core has its own fuse-io-uring queue.
+Userspace handles the CQE/fuse-request and submits the result as
+the sub-command FUSE_URING_REQ_COMMIT_AND_FETCH - the kernel completes
+the request and also marks the entry as available again. If there are
+pending requests waiting, the next request is immediately submitted
+to the daemon again.
+
+Initial SQE
+-----------
+
+ | | FUSE filesystem daemon
+ | |
+ | | >io_uring_submit()
+ | | IORING_OP_URING_CMD /
+ | | FUSE_URING_REQ_FETCH
+ | | [wait cqe]
+ | | >io_uring_wait_cqe() or
+ | | >io_uring_submit_and_wait()
+ | |
+ | >fuse_uring_cmd() |
+ | >fuse_uring_fetch() |
+ | >fuse_uring_ent_release() |
+
+
+Sending requests with CQEs
+--------------------------
+
+ | | FUSE filesystem daemon
+ | | [waiting for CQEs]
+ | "rm /mnt/fuse/file" |
+ | |
+ | >sys_unlink() |
+ | >fuse_unlink() |
+ | [allocate request] |
+ | >__fuse_request_send() |
+ | ... |
+ | >fuse_uring_queue_fuse_req |
+ | [queue request on fg or |
+ | bg queue] |
+ | >fuse_uring_assign_ring_entry() |
+ | >fuse_uring_send_to_ring() |
+ | >fuse_uring_copy_to_ring() |
+ | >io_uring_cmd_done() |
+ | >request_wait_answer() |
+ | [sleep on req->waitq] |
+ | | [receives and handles CQE]
+ | | [submit result and fetch next]
+ | | >io_uring_submit()
+ | | IORING_OP_URING_CMD/
+ | | FUSE_URING_REQ_COMMIT_AND_FETCH
+ | >fuse_uring_cmd() |
+ | >fuse_uring_commit_and_release() |
+ | >fuse_uring_copy_from_ring() |
+ | [ copy the result to the fuse req] |
+ | >fuse_uring_req_end_and_get_next() |
+ | >fuse_request_end() |
+ | [wake up req->waitq] |
+ | >fuse_uring_ent_release_and_fetch()|
+ | [wait or handle next req] |
+ | |
+ | |
+ | [req->waitq woken up] |
+ | <fuse_unlink() |
+ | <sys_unlink() |
+
+
+
--
2.43.0
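The "Initial SQE" flow above boils down to filling an `IORING_OP_URING_CMD` SQE per ring entry. A sketch of what that preparation looks like from the daemon side — deliberately *not* using `<liburing.h>` or the still-in-flux RFC uapi: `struct mini_sqe` only mirrors the io_uring SQE fields relevant here, and the two constants are placeholders (real code must take them from `<linux/io_uring.h>` and the fuse uapi header added by this series):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define IORING_OP_URING_CMD  46	/* placeholder; use <linux/io_uring.h> */
#define FUSE_URING_REQ_FETCH  1	/* placeholder sub-command value */

/* Trimmed stand-in for the fields of struct io_uring_sqe used here. */
struct mini_sqe {
	uint8_t  opcode;	/* IORING_OP_URING_CMD */
	int32_t  fd;		/* /dev/fuse connection fd */
	uint32_t cmd_op;	/* fuse-over-io-uring sub-command */
	uint64_t user_data;	/* cookie to find the entry on completion */
};

/* Prepare one "register this ring entry" SQE, as the daemon would do
 * once per queue entry before the initial io_uring_submit(). */
static void prep_fetch_sqe(struct mini_sqe *sqe, int fuse_fd, uint64_t ent)
{
	memset(sqe, 0, sizeof(*sqe));
	sqe->opcode = IORING_OP_URING_CMD;
	sqe->fd = fuse_fd;
	sqe->cmd_op = FUSE_URING_REQ_FETCH;
	sqe->user_data = ent;
}
```

After submitting at least one such SQE per queue, the daemon parks in `io_uring_wait_cqe()`; completions then carry fuse requests, which are answered with the same opcode but the COMMIT_AND_FETCH sub-command, matching the second diagram.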
* Re: [PATCH RFC v7 04/16] fuse: Add fuse-io-uring design documentation
2024-11-27 13:40 ` [PATCH RFC v7 04/16] fuse: Add fuse-io-uring design documentation Bernd Schubert
@ 2024-12-03 12:30 ` Pavel Begunkov
0 siblings, 0 replies; 37+ messages in thread
From: Pavel Begunkov @ 2024-12-03 12:30 UTC (permalink / raw)
To: Bernd Schubert, Miklos Szeredi
Cc: Jens Axboe, linux-fsdevel, io-uring, Joanne Koong, Josef Bacik,
Amir Goldstein, Ming Lei, David Wei, bernd
On 11/27/24 13:40, Bernd Schubert wrote:
> Signed-off-by: Bernd Schubert <[email protected]>
> ---
> Documentation/filesystems/fuse-io-uring.rst | 101 ++++++++++++++++++++++++++++
> 1 file changed, 101 insertions(+)
>
> diff --git a/Documentation/filesystems/fuse-io-uring.rst b/Documentation/filesystems/fuse-io-uring.rst
> new file mode 100644
> index 0000000000000000000000000000000000000000..50fdba1ea566588be3663e29b04bb9bbb6c9e4fb
> --- /dev/null
> +++ b/Documentation/filesystems/fuse-io-uring.rst
> @@ -0,0 +1,101 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +===============================
> +FUSE Uring design documentation
> +==============================
I'd suggest s/uring/io-uring/ (excluding func names/etc.); there
are several such instances here. It probably wouldn't confuse anyone,
but I don't see a reason why we'd deviate from the original name.
--
Pavel Begunkov
* [PATCH RFC v7 05/16] fuse: make args->in_args[0] to be always the header
2024-11-27 13:40 [PATCH RFC v7 00/16] fuse: fuse-over-io-uring Bernd Schubert
` (3 preceding siblings ...)
2024-11-27 13:40 ` [PATCH RFC v7 04/16] fuse: Add fuse-io-uring design documentation Bernd Schubert
@ 2024-11-27 13:40 ` Bernd Schubert
2024-11-28 0:27 ` Joanne Koong
2024-11-27 13:40 ` [PATCH RFC v7 06/16] fuse: {uring} Handle SQEs - register commands Bernd Schubert
` (12 subsequent siblings)
17 siblings, 1 reply; 37+ messages in thread
From: Bernd Schubert @ 2024-11-27 13:40 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
Bernd Schubert
This change sets up FUSE operations to have headers in args.in_args[0],
even for opcodes without an actual header. We do this to prepare for
cleanly separating payload from headers in the future.
For opcodes without a header, we use a zero-sized struct as a
placeholder. This approach:
- Keeps things consistent across all FUSE operations
- Will help with payload alignment later
- Avoids future issues when header sizes change
Signed-off-by: Bernd Schubert <[email protected]>
---
fs/fuse/dax.c | 11 ++++++-----
fs/fuse/dev.c | 9 +++++----
fs/fuse/dir.c | 32 ++++++++++++++++++--------------
fs/fuse/fuse_i.h | 13 +++++++++++++
fs/fuse/xattr.c | 7 ++++---
5 files changed, 46 insertions(+), 26 deletions(-)
diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
index 12ef91d170bb3091ac35a33d2b9dc38330b00948..44bd30d448e4e8c1d8c6b0499e50564fa0828efc 100644
--- a/fs/fuse/dax.c
+++ b/fs/fuse/dax.c
@@ -240,11 +240,12 @@ static int fuse_send_removemapping(struct inode *inode,
args.opcode = FUSE_REMOVEMAPPING;
args.nodeid = fi->nodeid;
- args.in_numargs = 2;
- args.in_args[0].size = sizeof(*inargp);
- args.in_args[0].value = inargp;
- args.in_args[1].size = inargp->count * sizeof(*remove_one);
- args.in_args[1].value = remove_one;
+ args.in_numargs = 3;
+ fuse_set_zero_arg0(&args);
+ args.in_args[1].size = sizeof(*inargp);
+ args.in_args[1].value = inargp;
+ args.in_args[2].size = inargp->count * sizeof(*remove_one);
+ args.in_args[2].value = remove_one;
return fuse_simple_request(fm, &args);
}
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index fd8898b0c1cca4d117982d5208d78078472b0dfb..63c3865aebb7811fdf4a5729b2181ee8321421dc 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1735,7 +1735,7 @@ static int fuse_retrieve(struct fuse_mount *fm, struct inode *inode,
args = &ap->args;
args->nodeid = outarg->nodeid;
args->opcode = FUSE_NOTIFY_REPLY;
- args->in_numargs = 2;
+ args->in_numargs = 3;
args->in_pages = true;
args->end = fuse_retrieve_end;
@@ -1762,9 +1762,10 @@ static int fuse_retrieve(struct fuse_mount *fm, struct inode *inode,
}
ra->inarg.offset = outarg->offset;
ra->inarg.size = total_len;
- args->in_args[0].size = sizeof(ra->inarg);
- args->in_args[0].value = &ra->inarg;
- args->in_args[1].size = total_len;
+ fuse_set_zero_arg0(args);
+ args->in_args[1].size = sizeof(ra->inarg);
+ args->in_args[1].value = &ra->inarg;
+ args->in_args[2].size = total_len;
err = fuse_simple_notify_reply(fm, args, outarg->notify_unique);
if (err)
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 54104dd48af7c94b312f1a8671c8905542d456c4..ccb240d4262f9399c9c90434aaeaf76b50f223ad 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -175,9 +175,10 @@ static void fuse_lookup_init(struct fuse_conn *fc, struct fuse_args *args,
memset(outarg, 0, sizeof(struct fuse_entry_out));
args->opcode = FUSE_LOOKUP;
args->nodeid = nodeid;
- args->in_numargs = 1;
- args->in_args[0].size = name->len + 1;
- args->in_args[0].value = name->name;
+ args->in_numargs = 2;
+ fuse_set_zero_arg0(args);
+ args->in_args[1].size = name->len + 1;
+ args->in_args[1].value = name->name;
args->out_numargs = 1;
args->out_args[0].size = sizeof(struct fuse_entry_out);
args->out_args[0].value = outarg;
@@ -927,11 +928,12 @@ static int fuse_symlink(struct mnt_idmap *idmap, struct inode *dir,
FUSE_ARGS(args);
args.opcode = FUSE_SYMLINK;
- args.in_numargs = 2;
- args.in_args[0].size = entry->d_name.len + 1;
- args.in_args[0].value = entry->d_name.name;
- args.in_args[1].size = len;
- args.in_args[1].value = link;
+ args.in_numargs = 3;
+ fuse_set_zero_arg0(&args);
+ args.in_args[1].size = entry->d_name.len + 1;
+ args.in_args[1].value = entry->d_name.name;
+ args.in_args[2].size = len;
+ args.in_args[2].value = link;
return create_new_entry(idmap, fm, &args, dir, entry, S_IFLNK);
}
@@ -991,9 +993,10 @@ static int fuse_unlink(struct inode *dir, struct dentry *entry)
args.opcode = FUSE_UNLINK;
args.nodeid = get_node_id(dir);
- args.in_numargs = 1;
- args.in_args[0].size = entry->d_name.len + 1;
- args.in_args[0].value = entry->d_name.name;
+ args.in_numargs = 2;
+ fuse_set_zero_arg0(&args);
+ args.in_args[1].size = entry->d_name.len + 1;
+ args.in_args[1].value = entry->d_name.name;
err = fuse_simple_request(fm, &args);
if (!err) {
fuse_dir_changed(dir);
@@ -1014,9 +1017,10 @@ static int fuse_rmdir(struct inode *dir, struct dentry *entry)
args.opcode = FUSE_RMDIR;
args.nodeid = get_node_id(dir);
- args.in_numargs = 1;
- args.in_args[0].size = entry->d_name.len + 1;
- args.in_args[0].value = entry->d_name.name;
+ args.in_numargs = 2;
+ fuse_set_zero_arg0(&args);
+ args.in_args[1].size = entry->d_name.len + 1;
+ args.in_args[1].value = entry->d_name.name;
err = fuse_simple_request(fm, &args);
if (!err) {
fuse_dir_changed(dir);
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index e6cc3d552b1382fc43bfe5191efc46e956ca268c..e3748751e231d0991c050b31bdd84db0b8016f9f 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -938,6 +938,19 @@ struct fuse_mount {
struct rcu_head rcu;
};
+/*
+ * Empty header for FUSE opcodes without specific header needs.
+ * Used as a placeholder in args->in_args[0] for consistency
+ * across all FUSE operations, simplifying request handling.
+ */
+struct fuse_zero_header {};
+
+static inline void fuse_set_zero_arg0(struct fuse_args *args)
+{
+ args->in_args[0].size = sizeof(struct fuse_zero_header);
+ args->in_args[0].value = NULL;
+}
+
static inline struct fuse_mount *get_fuse_mount_super(struct super_block *sb)
{
return sb->s_fs_info;
diff --git a/fs/fuse/xattr.c b/fs/fuse/xattr.c
index 9f568d345c51236ddd421b162820a4ea9b0734f4..93dfb06b6cea045d6df90c61c900680968bda39f 100644
--- a/fs/fuse/xattr.c
+++ b/fs/fuse/xattr.c
@@ -164,9 +164,10 @@ int fuse_removexattr(struct inode *inode, const char *name)
args.opcode = FUSE_REMOVEXATTR;
args.nodeid = get_node_id(inode);
- args.in_numargs = 1;
- args.in_args[0].size = strlen(name) + 1;
- args.in_args[0].value = name;
+ args.in_numargs = 2;
+ fuse_set_zero_arg0(&args);
+ args.in_args[1].size = strlen(name) + 1;
+ args.in_args[1].value = name;
err = fuse_simple_request(fm, &args);
if (err == -ENOSYS) {
fm->fc->no_removexattr = 1;
--
2.43.0
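The in_args[0] convention this patch introduces can be shown with a trimmed userspace model — the structs below are simplified illustrations of `fs/fuse/fuse_i.h`, not the real layout. Note that `struct fuse_zero_header {}` with `sizeof == 0` relies on the GNU C extension the kernel builds with (ISO C forbids empty structs, and C++ would give it size 1):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct fuse_zero_header {};	/* GNU C: empty struct, sizeof == 0 */

struct fuse_in_arg {
	unsigned int size;
	const void *value;
};

/* Trimmed model of struct fuse_args: three input-arg slots are enough
 * for every opcode touched by this patch. */
struct fuse_args {
	unsigned int in_numargs;
	struct fuse_in_arg in_args[3];
};

/* Same shape as the helper added to fuse_i.h: slot 0 always holds the
 * (possibly zero-sized) header, so the payload starts at in_args[1]. */
static void fuse_set_zero_arg0(struct fuse_args *args)
{
	args->in_args[0].size = sizeof(struct fuse_zero_header);
	args->in_args[0].value = NULL;
}

/* e.g. FUSE_UNLINK after this patch: header placeholder + name payload */
static void setup_unlink(struct fuse_args *args, const char *name)
{
	memset(args, 0, sizeof(*args));
	args->in_numargs = 2;
	fuse_set_zero_arg0(args);
	args->in_args[1].size = (unsigned int)strlen(name) + 1;
	args->in_args[1].value = name;
}
```

With every opcode laid out this way, later patches can treat `in_args[0]` uniformly as "the header" when copying requests into the ring buffer, without per-opcode special cases.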
* Re: [PATCH RFC v7 05/16] fuse: make args->in_args[0] to be always the header
2024-11-27 13:40 ` [PATCH RFC v7 05/16] fuse: make args->in_args[0] to be always the header Bernd Schubert
@ 2024-11-28 0:27 ` Joanne Koong
0 siblings, 0 replies; 37+ messages in thread
From: Joanne Koong @ 2024-11-28 0:27 UTC (permalink / raw)
To: Bernd Schubert
Cc: Miklos Szeredi, Jens Axboe, Pavel Begunkov, linux-fsdevel,
io-uring, Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd
On Wed, Nov 27, 2024 at 5:41 AM Bernd Schubert <[email protected]> wrote:
>
> This change sets up FUSE operations to have headers in args.in_args[0],
> even for opcodes without an actual header. We do this to prepare for
> cleanly separating payload from headers in the future.
>
> For opcodes without a header, we use a zero-sized struct as a
> placeholder. This approach:
> - Keeps things consistent across all FUSE operations
> - Will help with payload alignment later
> - Avoids future issues when header sizes change
>
> Signed-off-by: Bernd Schubert <[email protected]>
This LGTM. It might be worth mentioning in the commit message that
this won't add zero-sized arg headers to request types that don't take
in any in args.
Reviewed-by: Joanne Koong <[email protected]>
> +{
> + args->in_args[0].size = sizeof(struct fuse_zero_header);
> + args->in_args[0].value = NULL;
> +}
> +
> static inline struct fuse_mount *get_fuse_mount_super(struct super_block *sb)
> {
> return sb->s_fs_info;
> diff --git a/fs/fuse/xattr.c b/fs/fuse/xattr.c
> index 9f568d345c51236ddd421b162820a4ea9b0734f4..93dfb06b6cea045d6df90c61c900680968bda39f 100644
> --- a/fs/fuse/xattr.c
> +++ b/fs/fuse/xattr.c
> @@ -164,9 +164,10 @@ int fuse_removexattr(struct inode *inode, const char *name)
>
> args.opcode = FUSE_REMOVEXATTR;
> args.nodeid = get_node_id(inode);
> - args.in_numargs = 1;
> - args.in_args[0].size = strlen(name) + 1;
> - args.in_args[0].value = name;
> + args.in_numargs = 2;
> + fuse_set_zero_arg0(&args);
> + args.in_args[1].size = strlen(name) + 1;
> + args.in_args[1].value = name;
> err = fuse_simple_request(fm, &args);
> if (err == -ENOSYS) {
> fm->fc->no_removexattr = 1;
>
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH RFC v7 06/16] fuse: {uring} Handle SQEs - register commands
2024-11-27 13:40 [PATCH RFC v7 00/16] fuse: fuse-over-io-uring Bernd Schubert
` (4 preceding siblings ...)
2024-11-27 13:40 ` [PATCH RFC v7 05/16] fuse: make args->in_args[0] to be always the header Bernd Schubert
@ 2024-11-27 13:40 ` Bernd Schubert
2024-11-28 2:23 ` Joanne Koong
` (2 more replies)
2024-11-27 13:40 ` [PATCH RFC v7 07/16] fuse: Make fuse_copy non static Bernd Schubert
` (11 subsequent siblings)
17 siblings, 3 replies; 37+ messages in thread
From: Bernd Schubert @ 2024-11-27 13:40 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
Bernd Schubert
This adds basic support for ring SQEs (with opcode=IORING_OP_URING_CMD).
For now only FUSE_URING_REQ_FETCH is handled to register queue entries.
Signed-off-by: Bernd Schubert <[email protected]>
---
fs/fuse/Kconfig | 12 ++
fs/fuse/Makefile | 1 +
fs/fuse/dev.c | 4 +
fs/fuse/dev_uring.c | 318 ++++++++++++++++++++++++++++++++++++++++++++++
fs/fuse/dev_uring_i.h | 115 +++++++++++++++++
fs/fuse/fuse_i.h | 5 +
fs/fuse/inode.c | 10 ++
include/uapi/linux/fuse.h | 67 ++++++++++
8 files changed, 532 insertions(+)
diff --git a/fs/fuse/Kconfig b/fs/fuse/Kconfig
index 8674dbfbe59dbf79c304c587b08ebba3cfe405be..11f37cefc94b2af5a675c238801560c822b95f1a 100644
--- a/fs/fuse/Kconfig
+++ b/fs/fuse/Kconfig
@@ -63,3 +63,15 @@ config FUSE_PASSTHROUGH
to be performed directly on a backing file.
If you want to allow passthrough operations, answer Y.
+
+config FUSE_IO_URING
+ bool "FUSE communication over io-uring"
+ default y
+ depends on FUSE_FS
+ depends on IO_URING
+ help
+ This allows sending FUSE requests over the IO uring interface and
+ also adds request core affinity.
+
+ If you want to allow fuse server/client communication through io-uring,
+	  answer Y.
diff --git a/fs/fuse/Makefile b/fs/fuse/Makefile
index ce0ff7a9007b94b4ab246b5271f227d126c768e8..fcf16b1c391a9bf11ca9f3a25b137acdb203ac47 100644
--- a/fs/fuse/Makefile
+++ b/fs/fuse/Makefile
@@ -14,5 +14,6 @@ fuse-y := dev.o dir.o file.o inode.o control.o xattr.o acl.o readdir.o ioctl.o
fuse-y += iomode.o
fuse-$(CONFIG_FUSE_DAX) += dax.o
fuse-$(CONFIG_FUSE_PASSTHROUGH) += passthrough.o
+fuse-$(CONFIG_FUSE_IO_URING) += dev_uring.o
virtiofs-y := virtio_fs.o
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 63c3865aebb7811fdf4a5729b2181ee8321421dc..0770373492ae9ee83c4154fede9dcfd7be9fb33d 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -6,6 +6,7 @@
See the file COPYING.
*/
+#include "dev_uring_i.h"
#include "fuse_i.h"
#include "fuse_dev_i.h"
@@ -2452,6 +2453,9 @@ const struct file_operations fuse_dev_operations = {
.fasync = fuse_dev_fasync,
.unlocked_ioctl = fuse_dev_ioctl,
.compat_ioctl = compat_ptr_ioctl,
+#ifdef CONFIG_FUSE_IO_URING
+ .uring_cmd = fuse_uring_cmd,
+#endif
};
EXPORT_SYMBOL_GPL(fuse_dev_operations);
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
new file mode 100644
index 0000000000000000000000000000000000000000..af9c5f116ba1dcf6c01d0359d1a06491c92c32f9
--- /dev/null
+++ b/fs/fuse/dev_uring.c
@@ -0,0 +1,318 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * FUSE: Filesystem in Userspace
+ * Copyright (c) 2023-2024 DataDirect Networks.
+ */
+
+#include <linux/fs.h>
+
+#include "fuse_i.h"
+#include "dev_uring_i.h"
+#include "fuse_dev_i.h"
+
+#include <linux/io_uring/cmd.h>
+
+#ifdef CONFIG_FUSE_IO_URING
+static bool __read_mostly enable_uring;
+module_param(enable_uring, bool, 0644);
+MODULE_PARM_DESC(enable_uring,
+ "Enable uring userspace communication through uring.");
+#endif
+
+bool fuse_uring_enabled(void)
+{
+ return enable_uring;
+}
+
+static int fuse_ring_ent_unset_userspace(struct fuse_ring_ent *ent)
+{
+ struct fuse_ring_queue *queue = ent->queue;
+
+ lockdep_assert_held(&queue->lock);
+
+ if (WARN_ON_ONCE(ent->state != FRRS_USERSPACE))
+ return -EIO;
+
+ ent->state = FRRS_COMMIT;
+ list_move(&ent->list, &queue->ent_commit_queue);
+
+ return 0;
+}
+
+void fuse_uring_destruct(struct fuse_conn *fc)
+{
+ struct fuse_ring *ring = fc->ring;
+ int qid;
+
+ if (!ring)
+ return;
+
+ for (qid = 0; qid < ring->nr_queues; qid++) {
+ struct fuse_ring_queue *queue = ring->queues[qid];
+
+ if (!queue)
+ continue;
+
+ WARN_ON(!list_empty(&queue->ent_avail_queue));
+ WARN_ON(!list_empty(&queue->ent_commit_queue));
+
+ kfree(queue);
+ ring->queues[qid] = NULL;
+ }
+
+ kfree(ring->queues);
+ kfree(ring);
+ fc->ring = NULL;
+}
+
+#define FUSE_URING_IOV_SEGS 2 /* header and payload */
+
+/*
+ * Basic ring setup for this connection based on the provided configuration
+ */
+static struct fuse_ring *fuse_uring_create(struct fuse_conn *fc)
+{
+ struct fuse_ring *ring = NULL;
+ size_t nr_queues = num_possible_cpus();
+ struct fuse_ring *res = NULL;
+
+ ring = kzalloc(sizeof(*fc->ring) +
+ nr_queues * sizeof(struct fuse_ring_queue),
+ GFP_KERNEL_ACCOUNT);
+ if (!ring)
+ return NULL;
+
+ ring->queues = kcalloc(nr_queues, sizeof(struct fuse_ring_queue *),
+ GFP_KERNEL_ACCOUNT);
+ if (!ring->queues)
+ goto out_err;
+
+ spin_lock(&fc->lock);
+ if (fc->ring) {
+ /* race, another thread created the ring in the mean time */
+ spin_unlock(&fc->lock);
+ res = fc->ring;
+ goto out_err;
+ }
+
+ fc->ring = ring;
+ ring->nr_queues = nr_queues;
+ ring->fc = fc;
+
+ spin_unlock(&fc->lock);
+ return ring;
+
+out_err:
+ if (ring)
+ kfree(ring->queues);
+ kfree(ring);
+ return res;
+}
+
+static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
+ int qid)
+{
+ struct fuse_conn *fc = ring->fc;
+ struct fuse_ring_queue *queue;
+
+ queue = kzalloc(sizeof(*queue), GFP_KERNEL_ACCOUNT);
+ if (!queue)
+ return ERR_PTR(-ENOMEM);
+ spin_lock(&fc->lock);
+ if (ring->queues[qid]) {
+ spin_unlock(&fc->lock);
+ kfree(queue);
+ return ring->queues[qid];
+ }
+
+ queue->qid = qid;
+ queue->ring = ring;
+ spin_lock_init(&queue->lock);
+
+ INIT_LIST_HEAD(&queue->ent_avail_queue);
+ INIT_LIST_HEAD(&queue->ent_commit_queue);
+
+ WRITE_ONCE(ring->queues[qid], queue);
+ spin_unlock(&fc->lock);
+
+ return queue;
+}
+
+/*
+ * Make a ring entry available for fuse_req assignment
+ */
+static void fuse_uring_ent_avail(struct fuse_ring_ent *ring_ent,
+ struct fuse_ring_queue *queue)
+{
+ list_move(&ring_ent->list, &queue->ent_avail_queue);
+ ring_ent->state = FRRS_WAIT;
+}
+
+/*
+ * fuse_uring_req_fetch command handling
+ */
+static void _fuse_uring_fetch(struct fuse_ring_ent *ring_ent,
+ struct io_uring_cmd *cmd,
+ unsigned int issue_flags)
+{
+ struct fuse_ring_queue *queue = ring_ent->queue;
+
+ spin_lock(&queue->lock);
+ fuse_uring_ent_avail(ring_ent, queue);
+ ring_ent->cmd = cmd;
+ spin_unlock(&queue->lock);
+}
+
+/*
+ * sqe->addr is a ptr to an iovec array, iov[0] has the headers, iov[1]
+ * the payload
+ */
+static int fuse_uring_get_iovec_from_sqe(const struct io_uring_sqe *sqe,
+ struct iovec iov[FUSE_URING_IOV_SEGS])
+{
+ struct iovec __user *uiov = u64_to_user_ptr(READ_ONCE(sqe->addr));
+ struct iov_iter iter;
+ ssize_t ret;
+
+ if (sqe->len != FUSE_URING_IOV_SEGS)
+ return -EINVAL;
+
+ /*
+ * Direction for buffer access will actually be READ and WRITE,
+ * using write for the import should include READ access as well.
+ */
+ ret = import_iovec(WRITE, uiov, FUSE_URING_IOV_SEGS,
+ FUSE_URING_IOV_SEGS, &iov, &iter);
+ if (ret < 0)
+ return ret;
+
+ return 0;
+}
+
+static int fuse_uring_fetch(struct io_uring_cmd *cmd, unsigned int issue_flags,
+ struct fuse_conn *fc)
+{
+ const struct fuse_uring_cmd_req *cmd_req = io_uring_sqe_cmd(cmd->sqe);
+ struct fuse_ring *ring = fc->ring;
+ struct fuse_ring_queue *queue;
+ struct fuse_ring_ent *ring_ent;
+ int err;
+ struct iovec iov[FUSE_URING_IOV_SEGS];
+
+ err = fuse_uring_get_iovec_from_sqe(cmd->sqe, iov);
+ if (err) {
+ pr_info_ratelimited("Failed to get iovec from sqe, err=%d\n",
+ err);
+ return err;
+ }
+
+ err = -ENOMEM;
+ if (!ring) {
+ ring = fuse_uring_create(fc);
+ if (!ring)
+ return err;
+ }
+
+ queue = ring->queues[cmd_req->qid];
+ if (!queue) {
+ queue = fuse_uring_create_queue(ring, cmd_req->qid);
+ if (!queue)
+ return err;
+ }
+
+ /*
+ * The created queue above does not need to be destructed in
+ * case of entry errors below, will be done at ring destruction time.
+ */
+
+ ring_ent = kzalloc(sizeof(*ring_ent), GFP_KERNEL_ACCOUNT);
+ if (ring_ent == NULL)
+ return err;
+
+ INIT_LIST_HEAD(&ring_ent->list);
+
+ ring_ent->queue = queue;
+ ring_ent->cmd = cmd;
+
+ err = -EINVAL;
+ if (iov[0].iov_len < sizeof(struct fuse_uring_req_header)) {
+ pr_info_ratelimited("Invalid header len %zu\n", iov[0].iov_len);
+ goto err;
+ }
+
+ ring_ent->headers = iov[0].iov_base;
+ ring_ent->payload = iov[1].iov_base;
+ ring_ent->max_arg_len = iov[1].iov_len;
+
+ if (ring_ent->max_arg_len <
+ max_t(size_t, FUSE_MIN_READ_BUFFER, fc->max_write)) {
+ pr_info_ratelimited("Invalid req payload len %zu\n",
+ ring_ent->max_arg_len);
+ goto err;
+ }
+
+ spin_lock(&queue->lock);
+
+ /*
+ * FUSE_URING_REQ_FETCH is an initialization exception, needs
+ * state override
+ */
+ ring_ent->state = FRRS_USERSPACE;
+ err = fuse_ring_ent_unset_userspace(ring_ent);
+ spin_unlock(&queue->lock);
+ if (WARN_ON_ONCE(err != 0))
+ goto err;
+
+ _fuse_uring_fetch(ring_ent, cmd, issue_flags);
+
+ return 0;
+err:
+ list_del_init(&ring_ent->list);
+ kfree(ring_ent);
+ return err;
+}
+
+/*
+ * Entry function from io_uring to handle the given passthrough command
+ * (op code IORING_OP_URING_CMD)
+ */
+int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
+{
+ struct fuse_dev *fud;
+ struct fuse_conn *fc;
+ u32 cmd_op = cmd->cmd_op;
+ int err;
+
+ /* Disabled for now, especially as teardown is not implemented yet */
+ pr_info_ratelimited("fuse-io-uring is not enabled yet\n");
+ return -EOPNOTSUPP;
+
+ if (!enable_uring) {
+ pr_info_ratelimited("fuse-io-uring is disabled\n");
+ return -EOPNOTSUPP;
+ }
+
+ fud = fuse_get_dev(cmd->file);
+ if (!fud) {
+ pr_info_ratelimited("No fuse device found\n");
+ return -ENOTCONN;
+ }
+ fc = fud->fc;
+
+ if (!fc->connected || fc->aborted)
+ return fc->aborted ? -ECONNABORTED : -ENOTCONN;
+
+ switch (cmd_op) {
+ case FUSE_URING_REQ_FETCH:
+ err = fuse_uring_fetch(cmd, issue_flags, fc);
+ if (err) {
+ pr_info_once("fuse_uring_fetch failed err=%d\n", err);
+ return err;
+ }
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ return -EIOCBQUEUED;
+}
diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
new file mode 100644
index 0000000000000000000000000000000000000000..75c644cc0b2bb3721b08f8695964815d53f46e92
--- /dev/null
+++ b/fs/fuse/dev_uring_i.h
@@ -0,0 +1,115 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * FUSE: Filesystem in Userspace
+ * Copyright (c) 2023-2024 DataDirect Networks.
+ */
+
+#ifndef _FS_FUSE_DEV_URING_I_H
+#define _FS_FUSE_DEV_URING_I_H
+
+#include "fuse_i.h"
+
+#ifdef CONFIG_FUSE_IO_URING
+
+enum fuse_ring_req_state {
+ FRRS_INVALID = 0,
+
+ /* The ring entry received from userspace and it being processed */
+ FRRS_COMMIT,
+
+ /* The ring entry is waiting for new fuse requests */
+ FRRS_WAIT,
+
+ /* The ring entry is in or on the way to user space */
+ FRRS_USERSPACE,
+};
+
+/** A fuse ring entry, part of the ring queue */
+struct fuse_ring_ent {
+ /* userspace buffer */
+ struct fuse_uring_req_header __user *headers;
+	void __user *payload;
+
+ /* the ring queue that owns the request */
+ struct fuse_ring_queue *queue;
+
+ struct io_uring_cmd *cmd;
+
+ struct list_head list;
+
+ /* size of payload buffer */
+ size_t max_arg_len;
+
+ /*
+ * state the request is currently in
+ * (enum fuse_ring_req_state)
+ */
+ unsigned int state;
+
+ struct fuse_req *fuse_req;
+};
+
+struct fuse_ring_queue {
+ /*
+ * back pointer to the main fuse uring structure that holds this
+ * queue
+ */
+ struct fuse_ring *ring;
+
+ /* queue id, typically also corresponds to the cpu core */
+ unsigned int qid;
+
+ /*
+ * queue lock, taken when any value in the queue changes _and_ also
+ * a ring entry state changes.
+ */
+ spinlock_t lock;
+
+ /* available ring entries (struct fuse_ring_ent) */
+ struct list_head ent_avail_queue;
+
+ /*
+ * entries in the process of being committed or in the process
+ * to be send to userspace
+ */
+ struct list_head ent_commit_queue;
+};
+
+/**
+ * Describes if uring is used for communication and holds all the data needed
+ * for uring communication
+ */
+struct fuse_ring {
+ /* back pointer */
+ struct fuse_conn *fc;
+
+ /* number of ring queues */
+ size_t nr_queues;
+
+ struct fuse_ring_queue **queues;
+};
+
+bool fuse_uring_enabled(void);
+void fuse_uring_destruct(struct fuse_conn *fc);
+int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags);
+
+#else /* CONFIG_FUSE_IO_URING */
+
+struct fuse_ring;
+
+static inline void fuse_uring_create(struct fuse_conn *fc)
+{
+}
+
+static inline void fuse_uring_destruct(struct fuse_conn *fc)
+{
+}
+
+static inline bool fuse_uring_enabled(void)
+{
+ return false;
+}
+
+#endif /* CONFIG_FUSE_IO_URING */
+
+#endif /* _FS_FUSE_DEV_URING_I_H */
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index e3748751e231d0991c050b31bdd84db0b8016f9f..a21256ec4c3b4bd7c67eae2d03b68d87dcc8234b 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -914,6 +914,11 @@ struct fuse_conn {
/** IDR for backing files ids */
struct idr backing_files_map;
#endif
+
+#ifdef CONFIG_FUSE_IO_URING
+	/** uring connection information */
+ struct fuse_ring *ring;
+#endif
};
/*
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index fd3321e29a3e569bf06be22a5383cf34fd42c051..76267c79e920204175e5713853de8214c5555d46 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -7,6 +7,7 @@
*/
#include "fuse_i.h"
+#include "dev_uring_i.h"
#include <linux/pagemap.h>
#include <linux/slab.h>
@@ -959,6 +960,8 @@ static void delayed_release(struct rcu_head *p)
{
struct fuse_conn *fc = container_of(p, struct fuse_conn, rcu);
+ fuse_uring_destruct(fc);
+
put_user_ns(fc->user_ns);
fc->release(fc);
}
@@ -1413,6 +1416,13 @@ void fuse_send_init(struct fuse_mount *fm)
if (IS_ENABLED(CONFIG_FUSE_PASSTHROUGH))
flags |= FUSE_PASSTHROUGH;
+ /*
+ * This is just an information flag for fuse server. No need to check
+ * the reply - server is either sending IORING_OP_URING_CMD or not.
+ */
+ if (fuse_uring_enabled())
+ flags |= FUSE_OVER_IO_URING;
+
ia->in.flags = flags;
ia->in.flags2 = flags >> 32;
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index f1e99458e29e4fdce5273bc3def242342f207ebd..6d39077edf8cde4fa77130efcec16323839a676c 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -220,6 +220,14 @@
*
* 7.41
* - add FUSE_ALLOW_IDMAP
+ * 7.42
+ * - Add FUSE_OVER_IO_URING and all other io-uring related flags and data
+ * structures:
+ * - fuse_uring_ent_in_out
+ * - fuse_uring_req_header
+ * - fuse_uring_cmd_req
+ * - FUSE_URING_IN_OUT_HEADER_SZ
+ * - FUSE_URING_OP_IN_OUT_SZ
*/
#ifndef _LINUX_FUSE_H
@@ -425,6 +433,7 @@ struct fuse_file_lock {
* FUSE_HAS_RESEND: kernel supports resending pending requests, and the high bit
* of the request ID indicates resend requests
* FUSE_ALLOW_IDMAP: allow creation of idmapped mounts
+ * FUSE_OVER_IO_URING: Indicates that the client supports io-uring
*/
#define FUSE_ASYNC_READ (1 << 0)
#define FUSE_POSIX_LOCKS (1 << 1)
@@ -471,6 +480,7 @@ struct fuse_file_lock {
/* Obsolete alias for FUSE_DIRECT_IO_ALLOW_MMAP */
#define FUSE_DIRECT_IO_RELAX FUSE_DIRECT_IO_ALLOW_MMAP
#define FUSE_ALLOW_IDMAP (1ULL << 40)
+#define FUSE_OVER_IO_URING (1ULL << 41)
/**
* CUSE INIT request/reply flags
@@ -1206,4 +1216,61 @@ struct fuse_supp_groups {
uint32_t groups[];
};
+/**
+ * Size of the ring buffer header
+ */
+#define FUSE_URING_IN_OUT_HEADER_SZ 128
+#define FUSE_URING_OP_IN_OUT_SZ 128
+
+struct fuse_uring_ent_in_out {
+ uint64_t flags;
+
+	/* size of user payload buffer */
+ uint32_t payload_sz;
+ uint32_t padding;
+
+ uint8_t reserved[30];
+};
+
+/**
+ * This structure is mapped onto the ring entry header buffer
+ */
+struct fuse_uring_req_header {
+ /* struct fuse_in / struct fuse_out */
+ char in_out[FUSE_URING_IN_OUT_HEADER_SZ];
+
+ /* per op code structs */
+ char op_in[FUSE_URING_OP_IN_OUT_SZ];
+
+ /* struct fuse_ring_in_out */
+ char ring_ent_in_out[sizeof(struct fuse_uring_ent_in_out)];
+};
+
+/**
+ * sqe commands to the kernel
+ */
+enum fuse_uring_cmd {
+ FUSE_URING_REQ_INVALID = 0,
+
+ /* submit sqe to kernel to get a request */
+ FUSE_URING_REQ_FETCH = 1,
+
+ /* commit result and fetch next request */
+ FUSE_URING_REQ_COMMIT_AND_FETCH = 2,
+};
+
+/**
+ * In the 80B command area of the SQE.
+ */
+struct fuse_uring_cmd_req {
+ uint64_t flags;
+
+ /* entry identifier */
+ uint64_t commit_id;
+
+ /* queue the command is for (queue index) */
+ uint16_t qid;
+ uint8_t padding[6];
+};
+
#endif /* _LINUX_FUSE_H */
--
2.43.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH RFC v7 06/16] fuse: {uring} Handle SQEs - register commands
2024-11-27 13:40 ` [PATCH RFC v7 06/16] fuse: {uring} Handle SQEs - register commands Bernd Schubert
@ 2024-11-28 2:23 ` Joanne Koong
2024-11-28 18:20 ` Bernd Schubert
2024-12-03 13:24 ` Pavel Begunkov
2024-12-03 13:38 ` Pavel Begunkov
2 siblings, 1 reply; 37+ messages in thread
From: Joanne Koong @ 2024-11-28 2:23 UTC (permalink / raw)
To: Bernd Schubert
Cc: Miklos Szeredi, Jens Axboe, Pavel Begunkov, linux-fsdevel,
io-uring, Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd
On Wed, Nov 27, 2024 at 5:41 AM Bernd Schubert <[email protected]> wrote:
>
> This adds basic support for ring SQEs (with opcode=IORING_OP_URING_CMD).
> For now only FUSE_URING_REQ_FETCH is handled to register queue entries.
>
> Signed-off-by: Bernd Schubert <[email protected]>
> ---
> fs/fuse/Kconfig | 12 ++
> fs/fuse/Makefile | 1 +
> fs/fuse/dev.c | 4 +
> fs/fuse/dev_uring.c | 318 ++++++++++++++++++++++++++++++++++++++++++++++
> fs/fuse/dev_uring_i.h | 115 +++++++++++++++++
> fs/fuse/fuse_i.h | 5 +
> fs/fuse/inode.c | 10 ++
> include/uapi/linux/fuse.h | 67 ++++++++++
> 8 files changed, 532 insertions(+)
>
> diff --git a/fs/fuse/Kconfig b/fs/fuse/Kconfig
> index 8674dbfbe59dbf79c304c587b08ebba3cfe405be..11f37cefc94b2af5a675c238801560c822b95f1a 100644
> --- a/fs/fuse/Kconfig
> +++ b/fs/fuse/Kconfig
> @@ -63,3 +63,15 @@ config FUSE_PASSTHROUGH
> to be performed directly on a backing file.
>
> If you want to allow passthrough operations, answer Y.
> +
> +config FUSE_IO_URING
> + bool "FUSE communication over io-uring"
> + default y
> + depends on FUSE_FS
> + depends on IO_URING
> + help
> + This allows sending FUSE requests over the IO uring interface and
> + also adds request core affinity.
> +
> + If you want to allow fuse server/client communication through io-uring,
> + answer Y
> diff --git a/fs/fuse/Makefile b/fs/fuse/Makefile
> index ce0ff7a9007b94b4ab246b5271f227d126c768e8..fcf16b1c391a9bf11ca9f3a25b137acdb203ac47 100644
> --- a/fs/fuse/Makefile
> +++ b/fs/fuse/Makefile
> @@ -14,5 +14,6 @@ fuse-y := dev.o dir.o file.o inode.o control.o xattr.o acl.o readdir.o ioctl.o
> fuse-y += iomode.o
> fuse-$(CONFIG_FUSE_DAX) += dax.o
> fuse-$(CONFIG_FUSE_PASSTHROUGH) += passthrough.o
> +fuse-$(CONFIG_FUSE_IO_URING) += dev_uring.o
>
> virtiofs-y := virtio_fs.o
> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> index 63c3865aebb7811fdf4a5729b2181ee8321421dc..0770373492ae9ee83c4154fede9dcfd7be9fb33d 100644
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -6,6 +6,7 @@
> See the file COPYING.
> */
>
> +#include "dev_uring_i.h"
> #include "fuse_i.h"
> #include "fuse_dev_i.h"
>
> @@ -2452,6 +2453,9 @@ const struct file_operations fuse_dev_operations = {
> .fasync = fuse_dev_fasync,
> .unlocked_ioctl = fuse_dev_ioctl,
> .compat_ioctl = compat_ptr_ioctl,
> +#ifdef CONFIG_FUSE_IO_URING
> + .uring_cmd = fuse_uring_cmd,
> +#endif
> };
> EXPORT_SYMBOL_GPL(fuse_dev_operations);
>
> diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..af9c5f116ba1dcf6c01d0359d1a06491c92c32f9
> --- /dev/null
> +++ b/fs/fuse/dev_uring.c
> @@ -0,0 +1,318 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * FUSE: Filesystem in Userspace
> + * Copyright (c) 2023-2024 DataDirect Networks.
> + */
> +
> +#include <linux/fs.h>
nit: for consistency, should this line be placed directly above the
other "#include <linux..." line?
> +
> +#include "fuse_i.h"
> +#include "dev_uring_i.h"
> +#include "fuse_dev_i.h"
> +
> +#include <linux/io_uring/cmd.h>
> +
> +#ifdef CONFIG_FUSE_IO_URING
> +static bool __read_mostly enable_uring;
> +module_param(enable_uring, bool, 0644);
> +MODULE_PARM_DESC(enable_uring,
> + "Enable uring userspace communication through uring.");
nit: The double uring seems a bit repetitive to me. Maybe just "enable
uring userspace communication" or "enable userspace communication
through uring"? Also, super nit but I noticed the other
MODULE_PARM_DESCs don't have trailing periods.
> +#endif
> +
> +bool fuse_uring_enabled(void)
> +{
> + return enable_uring;
> +}
> +
> +static int fuse_ring_ent_unset_userspace(struct fuse_ring_ent *ent)
> +{
> + struct fuse_ring_queue *queue = ent->queue;
> +
> + lockdep_assert_held(&queue->lock);
> +
> + if (WARN_ON_ONCE(ent->state != FRRS_USERSPACE))
> + return -EIO;
> +
> + ent->state = FRRS_COMMIT;
> + list_move(&ent->list, &queue->ent_commit_queue);
> +
> + return 0;
> +}
> +
> +void fuse_uring_destruct(struct fuse_conn *fc)
> +{
> + struct fuse_ring *ring = fc->ring;
> + int qid;
> +
> + if (!ring)
> + return;
> +
> + for (qid = 0; qid < ring->nr_queues; qid++) {
> + struct fuse_ring_queue *queue = ring->queues[qid];
> +
> + if (!queue)
> + continue;
> +
> + WARN_ON(!list_empty(&queue->ent_avail_queue));
> + WARN_ON(!list_empty(&queue->ent_commit_queue));
> +
> + kfree(queue);
> + ring->queues[qid] = NULL;
> + }
> +
> + kfree(ring->queues);
> + kfree(ring);
> + fc->ring = NULL;
> +}
> +
> +#define FUSE_URING_IOV_SEGS 2 /* header and payload */
nit: to make the code flow easier, might be better to move #defines to
the top of the file after the includes.
> +
> +/*
> + * Basic ring setup for this connection based on the provided configuration
> + */
> +static struct fuse_ring *fuse_uring_create(struct fuse_conn *fc)
> +{
> + struct fuse_ring *ring = NULL;
> + size_t nr_queues = num_possible_cpus();
> + struct fuse_ring *res = NULL;
> +
> + ring = kzalloc(sizeof(*fc->ring) +
> + nr_queues * sizeof(struct fuse_ring_queue),
I think you just need kzalloc(sizeof(*fc->ring)); here since you're
allocating ring->queues later below
> + GFP_KERNEL_ACCOUNT);
> + if (!ring)
> + return NULL;
> +
> + ring->queues = kcalloc(nr_queues, sizeof(struct fuse_ring_queue *),
> + GFP_KERNEL_ACCOUNT);
> + if (!ring->queues)
> + goto out_err;
> +
> + spin_lock(&fc->lock);
> + if (fc->ring) {
> + /* race, another thread created the ring in the mean time */
nit: s/mean time/meantime
> + spin_unlock(&fc->lock);
> + res = fc->ring;
> + goto out_err;
> + }
> +
> + fc->ring = ring;
> + ring->nr_queues = nr_queues;
> + ring->fc = fc;
> +
> + spin_unlock(&fc->lock);
> + return ring;
> +
> +out_err:
> + if (ring)
I think you meant if (ring->queues)
> + kfree(ring->queues);
> + kfree(ring);
> + return res;
> +}
> +
> +static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
> + int qid)
> +{
> + struct fuse_conn *fc = ring->fc;
> + struct fuse_ring_queue *queue;
> +
> + queue = kzalloc(sizeof(*queue), GFP_KERNEL_ACCOUNT);
> + if (!queue)
> + return ERR_PTR(-ENOMEM);
> + spin_lock(&fc->lock);
This probably doesn't make much of a difference but might be better to
minimize logic inside the lock, eg do the queue initialization stuff
outside the lock
> + if (ring->queues[qid]) {
> + spin_unlock(&fc->lock);
> + kfree(queue);
> + return ring->queues[qid];
> + }
> +
> + queue->qid = qid;
> + queue->ring = ring;
> + spin_lock_init(&queue->lock);
> +
> + INIT_LIST_HEAD(&queue->ent_avail_queue);
> + INIT_LIST_HEAD(&queue->ent_commit_queue);
> +
> + WRITE_ONCE(ring->queues[qid], queue);
Just curious, why do we need WRITE_ONCE here if it's already protected
by the fc->lock?
> + spin_unlock(&fc->lock);
> +
> + return queue;
> +}
> +
> +/*
> + * Make a ring entry available for fuse_req assignment
> + */
> +static void fuse_uring_ent_avail(struct fuse_ring_ent *ring_ent,
> + struct fuse_ring_queue *queue)
> +{
> + list_move(&ring_ent->list, &queue->ent_avail_queue);
> + ring_ent->state = FRRS_WAIT;
> +}
> +
> +/*
> + * fuse_uring_req_fetch command handling
> + */
> +static void _fuse_uring_fetch(struct fuse_ring_ent *ring_ent,
> + struct io_uring_cmd *cmd,
> + unsigned int issue_flags)
> +{
> + struct fuse_ring_queue *queue = ring_ent->queue;
> +
> + spin_lock(&queue->lock);
> + fuse_uring_ent_avail(ring_ent, queue);
> + ring_ent->cmd = cmd;
> + spin_unlock(&queue->lock);
> +}
> +
> +/*
> + * sqe->addr is a ptr to an iovec array, iov[0] has the headers, iov[1]
> + * the payload
> + */
> +static int fuse_uring_get_iovec_from_sqe(const struct io_uring_sqe *sqe,
> + struct iovec iov[FUSE_URING_IOV_SEGS])
> +{
> + struct iovec __user *uiov = u64_to_user_ptr(READ_ONCE(sqe->addr));
> + struct iov_iter iter;
> + ssize_t ret;
> +
> + if (sqe->len != FUSE_URING_IOV_SEGS)
> + return -EINVAL;
> +
> + /*
> + * Direction for buffer access will actually be READ and WRITE,
> + * using write for the import should include READ access as well.
> + */
> + ret = import_iovec(WRITE, uiov, FUSE_URING_IOV_SEGS,
> + FUSE_URING_IOV_SEGS, &iov, &iter);
> + if (ret < 0)
> + return ret;
> +
> + return 0;
> +}
> +
> +static int fuse_uring_fetch(struct io_uring_cmd *cmd, unsigned int issue_flags,
> + struct fuse_conn *fc)
> +{
> + const struct fuse_uring_cmd_req *cmd_req = io_uring_sqe_cmd(cmd->sqe);
> + struct fuse_ring *ring = fc->ring;
> + struct fuse_ring_queue *queue;
> + struct fuse_ring_ent *ring_ent;
> + int err;
> + struct iovec iov[FUSE_URING_IOV_SEGS];
> +
> + err = fuse_uring_get_iovec_from_sqe(cmd->sqe, iov);
> + if (err) {
> + pr_info_ratelimited("Failed to get iovec from sqe, err=%d\n",
> + err);
> + return err;
> + }
> +
> + err = -ENOMEM;
> + if (!ring) {
> + ring = fuse_uring_create(fc);
> + if (!ring)
> + return err;
> + }
> +
> + queue = ring->queues[cmd_req->qid];
> + if (!queue) {
> + queue = fuse_uring_create_queue(ring, cmd_req->qid);
> + if (!queue)
> + return err;
> + }
> +
> + /*
> + * The created queue above does not need to be destructed in
> + * case of entry errors below, will be done at ring destruction time.
> + */
> +
> + ring_ent = kzalloc(sizeof(*ring_ent), GFP_KERNEL_ACCOUNT);
> + if (ring_ent == NULL)
nit: !ring_ent
> + return err;
> +
> + INIT_LIST_HEAD(&ring_ent->list);
> +
> + ring_ent->queue = queue;
> + ring_ent->cmd = cmd;
> +
> + err = -EINVAL;
> + if (iov[0].iov_len < sizeof(struct fuse_uring_req_header)) {
> + pr_info_ratelimited("Invalid header len %zu\n", iov[0].iov_len);
> + goto err;
> + }
> +
> + ring_ent->headers = iov[0].iov_base;
> + ring_ent->payload = iov[1].iov_base;
> + ring_ent->max_arg_len = iov[1].iov_len;
> +
> + if (ring_ent->max_arg_len <
> + max_t(size_t, FUSE_MIN_READ_BUFFER, fc->max_write)) {
If I'm understanding this correctly, iov[0] is the header and iov[1]
is the payload. Is this right that the payload len must always be
equal to fc->max_write?
Also, do we need to take into account fc->max_pages too?
> + pr_info_ratelimited("Invalid req payload len %zu\n",
> + ring_ent->max_arg_len);
> + goto err;
> + }
> +
> + spin_lock(&queue->lock);
> +
> + /*
> + * FUSE_URING_REQ_FETCH is an initialization exception, needs
> + * state override
> + */
> + ring_ent->state = FRRS_USERSPACE;
> + err = fuse_ring_ent_unset_userspace(ring_ent);
> + spin_unlock(&queue->lock);
> + if (WARN_ON_ONCE(err != 0))
nit: WARN_ON_ONCE(err)
> + goto err;
> +
> + _fuse_uring_fetch(ring_ent, cmd, issue_flags);
> +
> + return 0;
> +err:
> + list_del_init(&ring_ent->list);
> + kfree(ring_ent);
> + return err;
> +}
> +
> +/*
> + * Entry function from io_uring to handle the given passthrough command
> + * (op code IORING_OP_URING_CMD)
> + */
> +int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
> +{
> + struct fuse_dev *fud;
> + struct fuse_conn *fc;
> + u32 cmd_op = cmd->cmd_op;
> + int err;
> +
> + /* Disabled for now, especially as teardown is not implemented yet */
> + pr_info_ratelimited("fuse-io-uring is not enabled yet\n");
> + return -EOPNOTSUPP;
> +
> + if (!enable_uring) {
> + pr_info_ratelimited("fuse-io-uring is disabled\n");
> + return -EOPNOTSUPP;
> + }
> +
> + fud = fuse_get_dev(cmd->file);
> + if (!fud) {
> + pr_info_ratelimited("No fuse device found\n");
> + return -ENOTCONN;
> + }
> + fc = fud->fc;
> +
> + if (!fc->connected || fc->aborted)
> + return fc->aborted ? -ECONNABORTED : -ENOTCONN;
I find
if (fc->aborted)
return -ECONNABORTED;
if (!fc->connected)
return -ENOTCONN;
easier to read
> +
> + switch (cmd_op) {
> + case FUSE_URING_REQ_FETCH:
FUSE_URING_REQ_FETCH is only used for initialization, would it be
clearer if this was named FUSE_URING_INIT or something like that?
> + err = fuse_uring_fetch(cmd, issue_flags, fc);
> + if (err) {
> + pr_info_once("fuse_uring_fetch failed err=%d\n", err);
> + return err;
> + }
> + break;
> + default:
> + return -EINVAL;
> + }
> +
> + return -EIOCBQUEUED;
> +}
> diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
> new file mode 100644
> index 0000000000000000000000000000000000000000..75c644cc0b2bb3721b08f8695964815d53f46e92
> --- /dev/null
> +++ b/fs/fuse/dev_uring_i.h
> @@ -0,0 +1,115 @@
> +/* SPDX-License-Identifier: GPL-2.0
> + *
> + * FUSE: Filesystem in Userspace
> + * Copyright (c) 2023-2024 DataDirect Networks.
> + */
> +
> +#ifndef _FS_FUSE_DEV_URING_I_H
> +#define _FS_FUSE_DEV_URING_I_H
> +
> +#include "fuse_i.h"
> +
> +#ifdef CONFIG_FUSE_IO_URING
> +
> +enum fuse_ring_req_state {
> + FRRS_INVALID = 0,
> +
> + /* The ring entry received from userspace and it being processed */
nit: "it is being processed"
> + FRRS_COMMIT,
> +
> + /* The ring entry is waiting for new fuse requests */
> + FRRS_WAIT,
> +
> + /* The ring entry is in or on the way to user space */
> + FRRS_USERSPACE,
> +};
> +
> +/** A fuse ring entry, part of the ring queue */
> +struct fuse_ring_ent {
> + /* userspace buffer */
> + struct fuse_uring_req_header __user *headers;
> + void *__user *payload;
> +
> + /* the ring queue that owns the request */
> + struct fuse_ring_queue *queue;
> +
> + struct io_uring_cmd *cmd;
> +
> + struct list_head list;
> +
> + /* size of payload buffer */
> + size_t max_arg_len;
> +
> + /*
> + * state the request is currently in
> + * (enum fuse_ring_req_state)
> + */
> + unsigned int state;
> +
> + struct fuse_req *fuse_req;
> +};
> +
> +struct fuse_ring_queue {
> + /*
> + * back pointer to the main fuse uring structure that holds this
> + * queue
> + */
> + struct fuse_ring *ring;
> +
> + /* queue id, typically also corresponds to the cpu core */
> + unsigned int qid;
> +
> + /*
> + * queue lock, taken when any value in the queue changes _and_ also
> + * a ring entry state changes.
> + */
> + spinlock_t lock;
> +
> + /* available ring entries (struct fuse_ring_ent) */
> + struct list_head ent_avail_queue;
> +
> + /*
> + * entries in the process of being committed or in the process
> + * to be send to userspace
> + */
> + struct list_head ent_commit_queue;
> +};
> +
> +/**
> + * Describes if uring is for communication and holds alls the data needed
> + * for uring communication
> + */
> +struct fuse_ring {
> + /* back pointer */
> + struct fuse_conn *fc;
> +
> + /* number of ring queues */
> + size_t nr_queues;
> +
> + struct fuse_ring_queue **queues;
> +};
> +
> +bool fuse_uring_enabled(void);
> +void fuse_uring_destruct(struct fuse_conn *fc);
> +int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags);
> +
> +#else /* CONFIG_FUSE_IO_URING */
> +
> +struct fuse_ring;
> +
> +static inline void fuse_uring_create(struct fuse_conn *fc)
> +{
> +}
> +
> +static inline void fuse_uring_destruct(struct fuse_conn *fc)
> +{
> +}
> +
> +static inline bool fuse_uring_enabled(void)
> +{
> + return false;
> +}
> +
> +#endif /* CONFIG_FUSE_IO_URING */
> +
> +#endif /* _FS_FUSE_DEV_URING_I_H */
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index e3748751e231d0991c050b31bdd84db0b8016f9f..a21256ec4c3b4bd7c67eae2d03b68d87dcc8234b 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -914,6 +914,11 @@ struct fuse_conn {
> /** IDR for backing files ids */
> struct idr backing_files_map;
> #endif
> +
> +#ifdef CONFIG_FUSE_IO_URING
> + /** uring connection information*/
> + struct fuse_ring *ring;
> +#endif
> };
>
> /*
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index fd3321e29a3e569bf06be22a5383cf34fd42c051..76267c79e920204175e5713853de8214c5555d46 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -7,6 +7,7 @@
> */
>
> #include "fuse_i.h"
> +#include "dev_uring_i.h"
>
> #include <linux/pagemap.h>
> #include <linux/slab.h>
> @@ -959,6 +960,8 @@ static void delayed_release(struct rcu_head *p)
> {
> struct fuse_conn *fc = container_of(p, struct fuse_conn, rcu);
>
> + fuse_uring_destruct(fc);
> +
> put_user_ns(fc->user_ns);
> fc->release(fc);
> }
> @@ -1413,6 +1416,13 @@ void fuse_send_init(struct fuse_mount *fm)
> if (IS_ENABLED(CONFIG_FUSE_PASSTHROUGH))
> flags |= FUSE_PASSTHROUGH;
>
> + /*
> + * This is just an information flag for fuse server. No need to check
> + * the reply - server is either sending IORING_OP_URING_CMD or not.
> + */
> + if (fuse_uring_enabled())
> + flags |= FUSE_OVER_IO_URING;
> +
> ia->in.flags = flags;
> ia->in.flags2 = flags >> 32;
>
> diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
> index f1e99458e29e4fdce5273bc3def242342f207ebd..6d39077edf8cde4fa77130efcec16323839a676c 100644
> --- a/include/uapi/linux/fuse.h
> +++ b/include/uapi/linux/fuse.h
> @@ -220,6 +220,14 @@
> *
> * 7.41
> * - add FUSE_ALLOW_IDMAP
> + * 7.42
> + * - Add FUSE_OVER_IO_URING and all other io-uring related flags and data
> + * structures:
> + * - fuse_uring_ent_in_out
> + * - fuse_uring_req_header
> + * - fuse_uring_cmd_req
> + * - FUSE_URING_IN_OUT_HEADER_SZ
> + * - FUSE_URING_OP_IN_OUT_SZ
> */
>
> #ifndef _LINUX_FUSE_H
> @@ -425,6 +433,7 @@ struct fuse_file_lock {
> * FUSE_HAS_RESEND: kernel supports resending pending requests, and the high bit
> * of the request ID indicates resend requests
> * FUSE_ALLOW_IDMAP: allow creation of idmapped mounts
> + * FUSE_OVER_IO_URING: Indicate that Client supports io-uring
> */
> #define FUSE_ASYNC_READ (1 << 0)
> #define FUSE_POSIX_LOCKS (1 << 1)
> @@ -471,6 +480,7 @@ struct fuse_file_lock {
> /* Obsolete alias for FUSE_DIRECT_IO_ALLOW_MMAP */
> #define FUSE_DIRECT_IO_RELAX FUSE_DIRECT_IO_ALLOW_MMAP
> #define FUSE_ALLOW_IDMAP (1ULL << 40)
> +#define FUSE_OVER_IO_URING (1ULL << 41)
>
> /**
> * CUSE INIT request/reply flags
> @@ -1206,4 +1216,61 @@ struct fuse_supp_groups {
> uint32_t groups[];
> };
>
> +/**
> + * Size of the ring buffer header
> + */
> +#define FUSE_URING_IN_OUT_HEADER_SZ 128
> +#define FUSE_URING_OP_IN_OUT_SZ 128
> +
> +struct fuse_uring_ent_in_out {
> + uint64_t flags;
> +
> + /* size of use payload buffer */
> + uint32_t payload_sz;
> + uint32_t padding;
> +
> + uint8_t reserved[30];
out of curiosity, how was 30 chosen here? I think this makes the
struct 46 bytes?
> +};
> +
> +/**
> + * This structure mapped onto the
> + */
> +struct fuse_uring_req_header {
> + /* struct fuse_in / struct fuse_out */
> + char in_out[FUSE_URING_IN_OUT_HEADER_SZ];
> +
> + /* per op code structs */
> + char op_in[FUSE_URING_OP_IN_OUT_SZ];
> +
> + /* struct fuse_ring_in_out */
> + char ring_ent_in_out[sizeof(struct fuse_uring_ent_in_out)];
> +};
> +
> +/**
> + * sqe commands to the kernel
> + */
> +enum fuse_uring_cmd {
> + FUSE_URING_REQ_INVALID = 0,
> +
> + /* submit sqe to kernel to get a request */
> + FUSE_URING_REQ_FETCH = 1,
> +
> + /* commit result and fetch next request */
> + FUSE_URING_REQ_COMMIT_AND_FETCH = 2,
> +};
> +
> +/**
> + * In the 80B command area of the SQE.
> + */
> +struct fuse_uring_cmd_req {
> + uint64_t flags;
> +
> + /* entry identifier */
> + uint64_t commit_id;
> +
> + /* queue the command is for (queue index) */
> + uint16_t qid;
> + uint8_t padding[6];
> +};
> +
> #endif /* _LINUX_FUSE_H */
>
I'll be out the rest of this week for an American holiday
(Thanksgiving) and next week for a work trip but will review the rest
of this patchset after that when I get back.
Thanks,
Joanne
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH RFC v7 06/16] fuse: {uring} Handle SQEs - register commands
2024-11-28 2:23 ` Joanne Koong
@ 2024-11-28 18:20 ` Bernd Schubert
0 siblings, 0 replies; 37+ messages in thread
From: Bernd Schubert @ 2024-11-28 18:20 UTC (permalink / raw)
To: Joanne Koong, Bernd Schubert
Cc: Miklos Szeredi, Jens Axboe, Pavel Begunkov, linux-fsdevel,
io-uring, Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd
Hi Joanne,
thanks for your careful review!
On 11/28/24 03:23, Joanne Koong wrote:
> On Wed, Nov 27, 2024 at 5:41 AM Bernd Schubert <[email protected]> wrote:
>>
>> This adds basic support for ring SQEs (with opcode=IORING_OP_URING_CMD).
>> For now only FUSE_URING_REQ_FETCH is handled to register queue entries.
>>
>> Signed-off-by: Bernd Schubert <[email protected]>
>> ---
>> fs/fuse/Kconfig | 12 ++
>> fs/fuse/Makefile | 1 +
>> fs/fuse/dev.c | 4 +
>> fs/fuse/dev_uring.c | 318 ++++++++++++++++++++++++++++++++++++++++++++++
>> fs/fuse/dev_uring_i.h | 115 +++++++++++++++++
>> fs/fuse/fuse_i.h | 5 +
>> fs/fuse/inode.c | 10 ++
>> include/uapi/linux/fuse.h | 67 ++++++++++
>> 8 files changed, 532 insertions(+)
>>
>> diff --git a/fs/fuse/Kconfig b/fs/fuse/Kconfig
>> index 8674dbfbe59dbf79c304c587b08ebba3cfe405be..11f37cefc94b2af5a675c238801560c822b95f1a 100644
>> --- a/fs/fuse/Kconfig
>> +++ b/fs/fuse/Kconfig
>> @@ -63,3 +63,15 @@ config FUSE_PASSTHROUGH
>> to be performed directly on a backing file.
>>
>> If you want to allow passthrough operations, answer Y.
>> +
>> +config FUSE_IO_URING
>> + bool "FUSE communication over io-uring"
>> + default y
>> + depends on FUSE_FS
>> + depends on IO_URING
>> + help
>> + This allows sending FUSE requests over the IO uring interface and
>> + also adds request core affinity.
>> +
>> + If you want to allow fuse server/client communication through io-uring,
>> + answer Y
>> diff --git a/fs/fuse/Makefile b/fs/fuse/Makefile
>> index ce0ff7a9007b94b4ab246b5271f227d126c768e8..fcf16b1c391a9bf11ca9f3a25b137acdb203ac47 100644
>> --- a/fs/fuse/Makefile
>> +++ b/fs/fuse/Makefile
>> @@ -14,5 +14,6 @@ fuse-y := dev.o dir.o file.o inode.o control.o xattr.o acl.o readdir.o ioctl.o
>> fuse-y += iomode.o
>> fuse-$(CONFIG_FUSE_DAX) += dax.o
>> fuse-$(CONFIG_FUSE_PASSTHROUGH) += passthrough.o
>> +fuse-$(CONFIG_FUSE_IO_URING) += dev_uring.o
>>
>> virtiofs-y := virtio_fs.o
>> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
>> index 63c3865aebb7811fdf4a5729b2181ee8321421dc..0770373492ae9ee83c4154fede9dcfd7be9fb33d 100644
>> --- a/fs/fuse/dev.c
>> +++ b/fs/fuse/dev.c
>> @@ -6,6 +6,7 @@
>> See the file COPYING.
>> */
>>
>> +#include "dev_uring_i.h"
>> #include "fuse_i.h"
>> #include "fuse_dev_i.h"
>>
>> @@ -2452,6 +2453,9 @@ const struct file_operations fuse_dev_operations = {
>> .fasync = fuse_dev_fasync,
>> .unlocked_ioctl = fuse_dev_ioctl,
>> .compat_ioctl = compat_ptr_ioctl,
>> +#ifdef CONFIG_FUSE_IO_URING
>> + .uring_cmd = fuse_uring_cmd,
>> +#endif
>> };
>> EXPORT_SYMBOL_GPL(fuse_dev_operations);
>>
>> diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..af9c5f116ba1dcf6c01d0359d1a06491c92c32f9
>> --- /dev/null
>> +++ b/fs/fuse/dev_uring.c
>> @@ -0,0 +1,318 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * FUSE: Filesystem in Userspace
>> + * Copyright (c) 2023-2024 DataDirect Networks.
>> + */
>> +
>> +#include <linux/fs.h>
>
> nit: for consistency, should this line be placed directly above the
> other "#include <linux..." line?
>
>> +
>> +#include "fuse_i.h"
>> +#include "dev_uring_i.h"
>> +#include "fuse_dev_i.h"
>> +
>> +#include <linux/io_uring/cmd.h>
>> +
>> +#ifdef CONFIG_FUSE_IO_URING
>> +static bool __read_mostly enable_uring;
>> +module_param(enable_uring, bool, 0644);
>> +MODULE_PARM_DESC(enable_uring,
>> + "Enable uring userspace communication through uring.");
>
> nit: The double uring seems a bit repetitive to me. Maybe just "enable
> uring userspace communication" or "enable userspace communication
> through uring"? Also, super nit but I noticed the other
> MODULE_PARM_DESCs don't have trailing periods.
Fixed to "Enable userspace communication through io-uring"
>
>> +#endif
>> +
>> +bool fuse_uring_enabled(void)
>> +{
>> + return enable_uring;
>> +}
>> +
>> +static int fuse_ring_ent_unset_userspace(struct fuse_ring_ent *ent)
>> +{
>> + struct fuse_ring_queue *queue = ent->queue;
>> +
>> + lockdep_assert_held(&queue->lock);
>> +
>> + if (WARN_ON_ONCE(ent->state != FRRS_USERSPACE))
>> + return -EIO;
>> +
>> + ent->state = FRRS_COMMIT;
>> + list_move(&ent->list, &queue->ent_commit_queue);
>> +
>> + return 0;
>> +}
>> +
>> +void fuse_uring_destruct(struct fuse_conn *fc)
>> +{
>> + struct fuse_ring *ring = fc->ring;
>> + int qid;
>> +
>> + if (!ring)
>> + return;
>> +
>> + for (qid = 0; qid < ring->nr_queues; qid++) {
>> + struct fuse_ring_queue *queue = ring->queues[qid];
>> +
>> + if (!queue)
>> + continue;
>> +
>> + WARN_ON(!list_empty(&queue->ent_avail_queue));
>> + WARN_ON(!list_empty(&queue->ent_commit_queue));
>> +
>> + kfree(queue);
>> + ring->queues[qid] = NULL;
>> + }
>> +
>> + kfree(ring->queues);
>> + kfree(ring);
>> + fc->ring = NULL;
>> +}
>> +
>> +#define FUSE_URING_IOV_SEGS 2 /* header and payload */
>
> nit: to make the code flow easier, might be better to move #defines to
> the top of the file after the includes.
Fixed.
>
>> +
>> +/*
>> + * Basic ring setup for this connection based on the provided configuration
>> + */
>> +static struct fuse_ring *fuse_uring_create(struct fuse_conn *fc)
>> +{
>> + struct fuse_ring *ring = NULL;
>> + size_t nr_queues = num_possible_cpus();
>> + struct fuse_ring *res = NULL;
>> +
>> + ring = kzalloc(sizeof(*fc->ring) +
>> + nr_queues * sizeof(struct fuse_ring_queue),
>
> I think you just need kzalloc(sizeof(*fc->ring)); here since you're
> allocating ring->queues later below
Ah, now I see what Miklos probably meant "with left over from previous
patches". Sorry Miklos, I had misunderstood what you meant and thanks
Joanne for spotting it.
>
>> + GFP_KERNEL_ACCOUNT);
>> + if (!ring)
>> + return NULL;
>> +
>> + ring->queues = kcalloc(nr_queues, sizeof(struct fuse_ring_queue *),
>> + GFP_KERNEL_ACCOUNT);
>> + if (!ring->queues)
>> + goto out_err;
>> +
>> + spin_lock(&fc->lock);
>> + if (fc->ring) {
>> + /* race, another thread created the ring in the mean time */
>
> nit: s/mean time/meantime
Fixed.
>
>> + spin_unlock(&fc->lock);
>> + res = fc->ring;
>> + goto out_err;
>> + }
>> +
>> + fc->ring = ring;
>> + ring->nr_queues = nr_queues;
>> + ring->fc = fc;
>> +
>> + spin_unlock(&fc->lock);
>> + return ring;
>> +
>> +out_err:
>> + if (ring)
>
> I think you meant if (ring->queues)
I really meant ring, initially ring allocation error was via this goto. I just
removed the condition now.
>
>> + kfree(ring->queues);
>> + kfree(ring);
>> + return res;
>> +}
>> +
>> +static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
>> + int qid)
>> +{
>> + struct fuse_conn *fc = ring->fc;
>> + struct fuse_ring_queue *queue;
>> +
>> + queue = kzalloc(sizeof(*queue), GFP_KERNEL_ACCOUNT);
>> + if (!queue)
>> + return ERR_PTR(-ENOMEM);
>> + spin_lock(&fc->lock);
>
> This probably doesn't make much of a difference but might be better to
> minimize logic inside the lock, eg do the queue initialization stuff
> outside the lock
Absolutely, fixed.
>
>> + if (ring->queues[qid]) {
>> + spin_unlock(&fc->lock);
>> + kfree(queue);
>> + return ring->queues[qid];
>> + }
>> +
>> + queue->qid = qid;
>> + queue->ring = ring;
>> + spin_lock_init(&queue->lock);
>> +
>> + INIT_LIST_HEAD(&queue->ent_avail_queue);
>> + INIT_LIST_HEAD(&queue->ent_commit_queue);
>> +
>> + WRITE_ONCE(ring->queues[qid], queue);
>
> Just curious, why do we need WRITE_ONCE here if it's already protected
> by the fc->lock?
fuse_uring_fetch() (going to rename it to _register) obtains the ring
without holding a lock. Only when that fails does it call into
fuse_uring_create(), which then takes the lock.
I.e. this is optimized to not need the lock at all.
Btw, the interesting part here is that I always thought that with a single
libfuse ring thread we should never run into races, until I noticed
with large queue sizes (>128) that there were races in fuse_uring_fetch(),
because fuse_uring_create_queue() had assigned "ring->queues[qid] = queue"
too early, before initialization of the spinlock (lockdep complained
that the queue spinlock was not initialized).
>
>> + spin_unlock(&fc->lock);
>> +
>> + return queue;
>> +}
>> +
>> +/*
>> + * Make a ring entry available for fuse_req assignment
>> + */
>> +static void fuse_uring_ent_avail(struct fuse_ring_ent *ring_ent,
>> + struct fuse_ring_queue *queue)
>> +{
>> + list_move(&ring_ent->list, &queue->ent_avail_queue);
>> + ring_ent->state = FRRS_WAIT;
>> +}
>> +
>> +/*
>> + * fuse_uring_req_fetch command handling
>> + */
>> +static void _fuse_uring_fetch(struct fuse_ring_ent *ring_ent,
>> + struct io_uring_cmd *cmd,
>> + unsigned int issue_flags)
>> +{
>> + struct fuse_ring_queue *queue = ring_ent->queue;
>> +
>> + spin_lock(&queue->lock);
>> + fuse_uring_ent_avail(ring_ent, queue);
>> + ring_ent->cmd = cmd;
>> + spin_unlock(&queue->lock);
>> +}
>> +
>> +/*
>> + * sqe->addr is a ptr to an iovec array, iov[0] has the headers, iov[1]
>> + * the payload
>> + */
>> +static int fuse_uring_get_iovec_from_sqe(const struct io_uring_sqe *sqe,
>> + struct iovec iov[FUSE_URING_IOV_SEGS])
>> +{
>> + struct iovec __user *uiov = u64_to_user_ptr(READ_ONCE(sqe->addr));
>> + struct iov_iter iter;
>> + ssize_t ret;
>> +
>> + if (sqe->len != FUSE_URING_IOV_SEGS)
>> + return -EINVAL;
>> +
>> + /*
>> + * Direction for buffer access will actually be READ and WRITE,
>> + * using write for the import should include READ access as well.
>> + */
>> + ret = import_iovec(WRITE, uiov, FUSE_URING_IOV_SEGS,
>> + FUSE_URING_IOV_SEGS, &iov, &iter);
>> + if (ret < 0)
>> + return ret;
>> +
>> + return 0;
>> +}
>> +
>> +static int fuse_uring_fetch(struct io_uring_cmd *cmd, unsigned int issue_flags,
>> + struct fuse_conn *fc)
>> +{
>> + const struct fuse_uring_cmd_req *cmd_req = io_uring_sqe_cmd(cmd->sqe);
>> + struct fuse_ring *ring = fc->ring;
>> + struct fuse_ring_queue *queue;
>> + struct fuse_ring_ent *ring_ent;
>> + int err;
>> + struct iovec iov[FUSE_URING_IOV_SEGS];
>> +
>> + err = fuse_uring_get_iovec_from_sqe(cmd->sqe, iov);
>> + if (err) {
>> + pr_info_ratelimited("Failed to get iovec from sqe, err=%d\n",
>> + err);
>> + return err;
>> + }
>> +
>> + err = -ENOMEM;
>> + if (!ring) {
>> + ring = fuse_uring_create(fc);
>> + if (!ring)
>> + return err;
>> + }
>> +
>> + queue = ring->queues[cmd_req->qid];
>> + if (!queue) {
>> + queue = fuse_uring_create_queue(ring, cmd_req->qid);
>> + if (!queue)
>> + return err;
>> + }
>> +
>> + /*
>> + * The created queue above does not need to be destructed in
>> + * case of entry errors below, will be done at ring destruction time.
>> + */
>> +
>> + ring_ent = kzalloc(sizeof(*ring_ent), GFP_KERNEL_ACCOUNT);
>> + if (ring_ent == NULL)
>
> nit: !ring_ent
Fixed.
>
>> + return err;
>> +
>> + INIT_LIST_HEAD(&ring_ent->list);
>> +
>> + ring_ent->queue = queue;
>> + ring_ent->cmd = cmd;
>> +
>> + err = -EINVAL;
>> + if (iov[0].iov_len < sizeof(struct fuse_uring_req_header)) {
>> + pr_info_ratelimited("Invalid header len %zu\n", iov[0].iov_len);
>> + goto err;
>> + }
>> +
>> + ring_ent->headers = iov[0].iov_base;
>> + ring_ent->payload = iov[1].iov_base;
>> + ring_ent->max_arg_len = iov[1].iov_len;
>> +
>> + if (ring_ent->max_arg_len <
>> + max_t(size_t, FUSE_MIN_READ_BUFFER, fc->max_write)) {
>
> If I'm understanding this correctly, iov[0] is the header and iov[1]
> is the payload. Is this right that the payload len must always be
> equal to fc->max_write?
I'm going to drop ring_ent->max_arg_len for the next version, but it will
eventually come back once I add support for multiple payload sizes.
Typically we will need many small requests, fewer mid-size requests and a few
large requests. Just think of a queue size of 128 and a recent system
with 192 cores.
>
> Also, do we need to take into account fc->max_pages too?
Oh yes, thanks for spotting!
>
>> + pr_info_ratelimited("Invalid req payload len %zu\n",
>> + ring_ent->max_arg_len);
>> + goto err;
>> + }
>> +
>> + spin_lock(&queue->lock);
>> +
>> + /*
>> + * FUSE_URING_REQ_FETCH is an initialization exception, needs
>> + * state override
>> + */
>> + ring_ent->state = FRRS_USERSPACE;
>> + err = fuse_ring_ent_unset_userspace(ring_ent);
>> + spin_unlock(&queue->lock);
>> + if (WARN_ON_ONCE(err != 0))
>
> nit: WARN_ON_ONCE(err)
Fixed.
>
>> + goto err;
>> +
>> + _fuse_uring_fetch(ring_ent, cmd, issue_flags);
>> +
>> + return 0;
>> +err:
>> + list_del_init(&ring_ent->list);
>> + kfree(ring_ent);
>> + return err;
>> +}
>> +
>> +/*
>> + * Entry function from io_uring to handle the given passthrough command
>> + * (op cocde IORING_OP_URING_CMD)
>> + */
>> +int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
>> +{
>> + struct fuse_dev *fud;
>> + struct fuse_conn *fc;
>> + u32 cmd_op = cmd->cmd_op;
>> + int err;
>> +
>> + /* Disabled for now, especially as teardown is not implemented yet */
>> + pr_info_ratelimited("fuse-io-uring is not enabled yet\n");
>> + return -EOPNOTSUPP;
>> +
>> + if (!enable_uring) {
>> + pr_info_ratelimited("fuse-io-uring is disabled\n");
>> + return -EOPNOTSUPP;
>> + }
>> +
>> + fud = fuse_get_dev(cmd->file);
>> + if (!fud) {
>> + pr_info_ratelimited("No fuse device found\n");
>> + return -ENOTCONN;
>> + }
>> + fc = fud->fc;
>> +
>> + if (!fc->connected || fc->aborted)
>> + return fc->aborted ? -ECONNABORTED : -ENOTCONN;
>
> I find
> if (fc->aborted)
> return -ECONNABORTED;
> if (!fc->connected)
> return -ENOTCONN;
>
> easier to read
Fine with me.
>> +
>> + switch (cmd_op) {
>> + case FUSE_URING_REQ_FETCH:
>
> FUSE_URING_REQ_FETCH is only used for initialization, would it be
> clearer if this was named FUSE_URING_INIT or something like that?
FUSE_URING_CMD_REGISTER?
>
>> + err = fuse_uring_fetch(cmd, issue_flags, fc);
>> + if (err) {
>> + pr_info_once("fuse_uring_fetch failed err=%d\n", err);
>> + return err;
>> + }
>> + break;
>> + default:
>> + return -EINVAL;
>> + }
>> +
>> + return -EIOCBQUEUED;
>> +}
>> diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..75c644cc0b2bb3721b08f8695964815d53f46e92
>> --- /dev/null
>> +++ b/fs/fuse/dev_uring_i.h
>> @@ -0,0 +1,115 @@
>> +/* SPDX-License-Identifier: GPL-2.0
>> + *
>> + * FUSE: Filesystem in Userspace
>> + * Copyright (c) 2023-2024 DataDirect Networks.
>> + */
>> +
>> +#ifndef _FS_FUSE_DEV_URING_I_H
>> +#define _FS_FUSE_DEV_URING_I_H
>> +
>> +#include "fuse_i.h"
>> +
>> +#ifdef CONFIG_FUSE_IO_URING
>> +
>> +enum fuse_ring_req_state {
>> + FRRS_INVALID = 0,
>> +
>> + /* The ring entry received from userspace and it being processed */
>
> nit: "it is being processed"
>
>> + FRRS_COMMIT,
>> +
>> + /* The ring entry is waiting for new fuse requests */
>> + FRRS_WAIT,
>> +
>> + /* The ring entry is in or on the way to user space */
>> + FRRS_USERSPACE,
>> +};
>> +
>> +/** A fuse ring entry, part of the ring queue */
>> +struct fuse_ring_ent {
>> + /* userspace buffer */
>> + struct fuse_uring_req_header __user *headers;
>> + void *__user *payload;
>> +
>> + /* the ring queue that owns the request */
>> + struct fuse_ring_queue *queue;
>> +
>> + struct io_uring_cmd *cmd;
>> +
>> + struct list_head list;
>> +
>> + /* size of payload buffer */
>> + size_t max_arg_len;
>> +
>> + /*
>> + * state the request is currently in
>> + * (enum fuse_ring_req_state)
>> + */
>> + unsigned int state;
>> +
>> + struct fuse_req *fuse_req;
>> +};
>> +
>> +struct fuse_ring_queue {
>> + /*
>> + * back pointer to the main fuse uring structure that holds this
>> + * queue
>> + */
>> + struct fuse_ring *ring;
>> +
>> + /* queue id, typically also corresponds to the cpu core */
>> + unsigned int qid;
>> +
>> + /*
>> + * queue lock, taken when any value in the queue changes _and_ also
>> + * a ring entry state changes.
>> + */
>> + spinlock_t lock;
>> +
>> + /* available ring entries (struct fuse_ring_ent) */
>> + struct list_head ent_avail_queue;
>> +
>> + /*
>> + * entries in the process of being committed or in the process
>> + * to be send to userspace
>> + */
>> + struct list_head ent_commit_queue;
>> +};
>> +
>> +/**
>> + * Describes if uring is for communication and holds alls the data needed
>> + * for uring communication
>> + */
>> +struct fuse_ring {
>> + /* back pointer */
>> + struct fuse_conn *fc;
>> +
>> + /* number of ring queues */
>> + size_t nr_queues;
>> +
>> + struct fuse_ring_queue **queues;
>> +};
>> +
>> +bool fuse_uring_enabled(void);
>> +void fuse_uring_destruct(struct fuse_conn *fc);
>> +int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags);
>> +
>> +#else /* CONFIG_FUSE_IO_URING */
>> +
>> +struct fuse_ring;
>> +
>> +static inline void fuse_uring_create(struct fuse_conn *fc)
>> +{
>> +}
>> +
>> +static inline void fuse_uring_destruct(struct fuse_conn *fc)
>> +{
>> +}
>> +
>> +static inline bool fuse_uring_enabled(void)
>> +{
>> + return false;
>> +}
>> +
>> +#endif /* CONFIG_FUSE_IO_URING */
>> +
>> +#endif /* _FS_FUSE_DEV_URING_I_H */
>> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
>> index e3748751e231d0991c050b31bdd84db0b8016f9f..a21256ec4c3b4bd7c67eae2d03b68d87dcc8234b 100644
>> --- a/fs/fuse/fuse_i.h
>> +++ b/fs/fuse/fuse_i.h
>> @@ -914,6 +914,11 @@ struct fuse_conn {
>> /** IDR for backing files ids */
>> struct idr backing_files_map;
>> #endif
>> +
>> +#ifdef CONFIG_FUSE_IO_URING
>> + /** uring connection information*/
>> + struct fuse_ring *ring;
>> +#endif
>> };
>>
>> /*
>> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
>> index fd3321e29a3e569bf06be22a5383cf34fd42c051..76267c79e920204175e5713853de8214c5555d46 100644
>> --- a/fs/fuse/inode.c
>> +++ b/fs/fuse/inode.c
>> @@ -7,6 +7,7 @@
>> */
>>
>> #include "fuse_i.h"
>> +#include "dev_uring_i.h"
>>
>> #include <linux/pagemap.h>
>> #include <linux/slab.h>
>> @@ -959,6 +960,8 @@ static void delayed_release(struct rcu_head *p)
>> {
>> struct fuse_conn *fc = container_of(p, struct fuse_conn, rcu);
>>
>> + fuse_uring_destruct(fc);
>> +
>> put_user_ns(fc->user_ns);
>> fc->release(fc);
>> }
>> @@ -1413,6 +1416,13 @@ void fuse_send_init(struct fuse_mount *fm)
>> if (IS_ENABLED(CONFIG_FUSE_PASSTHROUGH))
>> flags |= FUSE_PASSTHROUGH;
>>
>> + /*
>> + * This is just an information flag for fuse server. No need to check
>> + * the reply - server is either sending IORING_OP_URING_CMD or not.
>> + */
>> + if (fuse_uring_enabled())
>> + flags |= FUSE_OVER_IO_URING;
>> +
>> ia->in.flags = flags;
>> ia->in.flags2 = flags >> 32;
>>
>> diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
>> index f1e99458e29e4fdce5273bc3def242342f207ebd..6d39077edf8cde4fa77130efcec16323839a676c 100644
>> --- a/include/uapi/linux/fuse.h
>> +++ b/include/uapi/linux/fuse.h
>> @@ -220,6 +220,14 @@
>> *
>> * 7.41
>> * - add FUSE_ALLOW_IDMAP
>> + * 7.42
>> + * - Add FUSE_OVER_IO_URING and all other io-uring related flags and data
>> + * structures:
>> + * - fuse_uring_ent_in_out
>> + * - fuse_uring_req_header
>> + * - fuse_uring_cmd_req
>> + * - FUSE_URING_IN_OUT_HEADER_SZ
>> + * - FUSE_URING_OP_IN_OUT_SZ
>> */
>>
>> #ifndef _LINUX_FUSE_H
>> @@ -425,6 +433,7 @@ struct fuse_file_lock {
>> * FUSE_HAS_RESEND: kernel supports resending pending requests, and the high bit
>> * of the request ID indicates resend requests
>> * FUSE_ALLOW_IDMAP: allow creation of idmapped mounts
>> + * FUSE_OVER_IO_URING: Indicate that Client supports io-uring
>> */
>> #define FUSE_ASYNC_READ (1 << 0)
>> #define FUSE_POSIX_LOCKS (1 << 1)
>> @@ -471,6 +480,7 @@ struct fuse_file_lock {
>> /* Obsolete alias for FUSE_DIRECT_IO_ALLOW_MMAP */
>> #define FUSE_DIRECT_IO_RELAX FUSE_DIRECT_IO_ALLOW_MMAP
>> #define FUSE_ALLOW_IDMAP (1ULL << 40)
>> +#define FUSE_OVER_IO_URING (1ULL << 41)
>>
>> /**
>> * CUSE INIT request/reply flags
>> @@ -1206,4 +1216,61 @@ struct fuse_supp_groups {
>> uint32_t groups[];
>> };
>>
>> +/**
>> + * Size of the ring buffer header
>> + */
>> +#define FUSE_URING_IN_OUT_HEADER_SZ 128
>> +#define FUSE_URING_OP_IN_OUT_SZ 128
>> +
>> +struct fuse_uring_ent_in_out {
>> + uint64_t flags;
>> +
>> + /* size of use payload buffer */
>> + uint32_t payload_sz;
>> + uint32_t padding;
>> +
>> + uint8_t reserved[30];
>
> out of curiosity, how was 30 chosen here? I think this makes the
> struct 46 bytes?
Clearly a bug, that was supposed to be 16. I changed it to
uint64_t reserved[2];
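With the fix, the struct comes out naturally aligned at 8 + 4 + 4 + 16 = 32 bytes. A compile-time size check makes that explicit — the sketch below is a hypothetical userspace replica of the revised uapi struct, not part of the patch:

```c
#include <stdint.h>

struct fuse_uring_ent_in_out {
	uint64_t flags;

	/* size of the payload buffer */
	uint32_t payload_sz;
	uint32_t padding;

	uint64_t reserved[2]; /* was uint8_t reserved[30], i.e. 46 bytes */
};

/* compile-time check: 8 + 4 + 4 + 16 = 32 bytes, no implicit padding */
_Static_assert(sizeof(struct fuse_uring_ent_in_out) == 32,
	       "fuse_uring_ent_in_out must stay 32 bytes");
```

For uapi structs the kernel convention would be a BUILD_BUG_ON() or explicit sizing like this, so the layout can never silently change across architectures.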
>
>> +};
>> +
>> +/**
>> + * This structure mapped onto the
>> + */
>> +struct fuse_uring_req_header {
>> + /* struct fuse_in / struct fuse_out */
>> + char in_out[FUSE_URING_IN_OUT_HEADER_SZ];
>> +
>> + /* per op code structs */
>> + char op_in[FUSE_URING_OP_IN_OUT_SZ];
>> +
>> + /* struct fuse_ring_in_out */
>> + char ring_ent_in_out[sizeof(struct fuse_uring_ent_in_out)];
>> +};
>> +
>> +/**
>> + * sqe commands to the kernel
>> + */
>> +enum fuse_uring_cmd {
>> + FUSE_URING_REQ_INVALID = 0,
>> +
>> + /* submit sqe to kernel to get a request */
>> + FUSE_URING_REQ_FETCH = 1,
>> +
>> + /* commit result and fetch next request */
>> + FUSE_URING_REQ_COMMIT_AND_FETCH = 2,
>> +};
>> +
>> +/**
>> + * In the 80B command area of the SQE.
>> + */
>> +struct fuse_uring_cmd_req {
>> + uint64_t flags;
>> +
>> + /* entry identifier */
>> + uint64_t commit_id;
>> +
>> + /* queue the command is for (queue index) */
>> + uint16_t qid;
>> + uint8_t padding[6];
>> +};
>> +
>> #endif /* _LINUX_FUSE_H */
>>
>
> I'll be out the rest of this week for an American holiday
> (Thanksgiving) and next week for a work trip but will review the rest
> of this patchset after that when i get back.
Thanks so much for your review! Happy Thanksgiving and safe travels!
Thanks,
Bernd
* Re: [PATCH RFC v7 06/16] fuse: {uring} Handle SQEs - register commands
2024-11-27 13:40 ` [PATCH RFC v7 06/16] fuse: {uring} Handle SQEs - register commands Bernd Schubert
2024-11-28 2:23 ` Joanne Koong
@ 2024-12-03 13:24 ` Pavel Begunkov
2024-12-03 13:49 ` Bernd Schubert
2024-12-03 13:38 ` Pavel Begunkov
2 siblings, 1 reply; 37+ messages in thread
From: Pavel Begunkov @ 2024-12-03 13:24 UTC (permalink / raw)
To: Bernd Schubert, Miklos Szeredi
Cc: Jens Axboe, linux-fsdevel, io-uring, Joanne Koong, Josef Bacik,
Amir Goldstein, Ming Lei, David Wei, bernd
On 11/27/24 13:40, Bernd Schubert wrote:
> This adds basic support for ring SQEs (with opcode=IORING_OP_URING_CMD).
> For now only FUSE_URING_REQ_FETCH is handled to register queue entries.
>
> Signed-off-by: Bernd Schubert <[email protected]>
> ---
> fs/fuse/Kconfig | 12 ++
> fs/fuse/Makefile | 1 +
> fs/fuse/dev.c | 4 +
> fs/fuse/dev_uring.c | 318 ++++++++++++++++++++++++++++++++++++++++++++++
> fs/fuse/dev_uring_i.h | 115 +++++++++++++++++
> fs/fuse/fuse_i.h | 5 +
> fs/fuse/inode.c | 10 ++
> include/uapi/linux/fuse.h | 67 ++++++++++
> 8 files changed, 532 insertions(+)
>
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..af9c5f116ba1dcf6c01d0359d1a06491c92c32f9
> --- /dev/null
> +++ b/fs/fuse/dev_uring.c
...
> +
> +/*
> + * fuse_uring_req_fetch command handling
> + */
> +static void _fuse_uring_fetch(struct fuse_ring_ent *ring_ent,
> + struct io_uring_cmd *cmd,
> + unsigned int issue_flags)
> +{
> + struct fuse_ring_queue *queue = ring_ent->queue;
> +
> + spin_lock(&queue->lock);
> + fuse_uring_ent_avail(ring_ent, queue);
> + ring_ent->cmd = cmd;
> + spin_unlock(&queue->lock);
> +}
> +
> +/*
> + * sqe->addr is a ptr to an iovec array, iov[0] has the headers, iov[1]
> + * the payload
> + */
> +static int fuse_uring_get_iovec_from_sqe(const struct io_uring_sqe *sqe,
> + struct iovec iov[FUSE_URING_IOV_SEGS])
> +{
> + struct iovec __user *uiov = u64_to_user_ptr(READ_ONCE(sqe->addr));
> + struct iov_iter iter;
> + ssize_t ret;
> +
> + if (sqe->len != FUSE_URING_IOV_SEGS)
> + return -EINVAL;
> +
> + /*
> + * Direction for buffer access will actually be READ and WRITE,
> + * using write for the import should include READ access as well.
> + */
> + ret = import_iovec(WRITE, uiov, FUSE_URING_IOV_SEGS,
> + FUSE_URING_IOV_SEGS, &iov, &iter);
You're throwing away the iterator, I'd be a bit cautious about it.
FUSE_URING_IOV_SEGS is 2, so it should avoid ITER_UBUF, but Jens
can say if it's abuse of the API or not.
Fwiw, it's not the first place I know of that just wants to get
an iovec without playing games with different iterator modes.
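Independent of how the import is done, what the fetch path ultimately needs from the two segments can be sketched as a standalone validation — a hypothetical userspace analogue of the checks in fuse_uring_fetch(), with illustrative names and error choices:

```c
#include <sys/uio.h>
#include <errno.h>
#include <stddef.h>

#define FUSE_URING_IOV_SEGS 2 /* iov[0] is the header, iov[1] the payload */

/*
 * Sketch of the post-import checks: exactly two segments, a header
 * segment large enough for the fixed-size request header, and a
 * non-empty payload segment.
 */
static int validate_fuse_iov(const struct iovec *iov, unsigned int nr_segs,
			     size_t hdr_sz)
{
	if (nr_segs != FUSE_URING_IOV_SEGS)
		return -EINVAL;
	if (iov[0].iov_len < hdr_sz)
		return -EINVAL;
	if (iov[1].iov_len == 0)
		return -EINVAL;
	return 0;
}
```

The kernel additionally needs the user pointers themselves checked at access time, which is what import_iovec() and the later copy helpers provide; this sketch only covers the length/segment-count sanity checks.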
> + if (ret < 0)
> + return ret;
> +
> + return 0;
> +}
> +
> +static int fuse_uring_fetch(struct io_uring_cmd *cmd, unsigned int issue_flags,
> + struct fuse_conn *fc)
> +{
> + const struct fuse_uring_cmd_req *cmd_req = io_uring_sqe_cmd(cmd->sqe);
You need to check that the ring is setup with SQE128.
(!(issue_flags & IO_URING_F_SQE128))
// fail
> + struct fuse_ring *ring = fc->ring;
> + struct fuse_ring_queue *queue;
> + struct fuse_ring_ent *ring_ent;
> + int err;
> + struct iovec iov[FUSE_URING_IOV_SEGS];
> +
> + err = fuse_uring_get_iovec_from_sqe(cmd->sqe, iov);
> + if (err) {
> + pr_info_ratelimited("Failed to get iovec from sqe, err=%d\n",
> + err);
> + return err;
> + }
> +
> + err = -ENOMEM;
> + if (!ring) {
> + ring = fuse_uring_create(fc);
> + if (!ring)
> + return err;
> + }
> +
> + queue = ring->queues[cmd_req->qid];
> + if (!queue) {
> + queue = fuse_uring_create_queue(ring, cmd_req->qid);
> + if (!queue)
> + return err;
> + }
> +
> + /*
> + * The created queue above does not need to be destructed in
> + * case of entry errors below, will be done at ring destruction time.
> + */
> +
> + ring_ent = kzalloc(sizeof(*ring_ent), GFP_KERNEL_ACCOUNT);
> + if (ring_ent == NULL)
> + return err;
> +
> + INIT_LIST_HEAD(&ring_ent->list);
> +
> + ring_ent->queue = queue;
> + ring_ent->cmd = cmd;
nit: seems it's also set immediately after in
_fuse_uring_fetch().
> +
> + err = -EINVAL;
> + if (iov[0].iov_len < sizeof(struct fuse_uring_req_header)) {
> + pr_info_ratelimited("Invalid header len %zu\n", iov[0].iov_len);
> + goto err;
> + }
> +
...
> +/*
> + * Entry function from io_uring to handle the given passthrough command
> + * (op code IORING_OP_URING_CMD)
> + */
> +int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
> +{
> + struct fuse_dev *fud;
> + struct fuse_conn *fc;
> + u32 cmd_op = cmd->cmd_op;
> + int err;
> +
> + /* Disabled for now, especially as teardown is not implemented yet */
> + pr_info_ratelimited("fuse-io-uring is not enabled yet\n");
> + return -EOPNOTSUPP;
Do compilers give warnings about such things? Unreachable code, maybe.
I don't care much, but if they do to avoid breaking CONFIG_WERROR you
might want to do sth about it. E.g. I'd usually mark the function
__maybe_unused and not set it into fops until a later patch.
...
--
Pavel Begunkov
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH RFC v7 06/16] fuse: {uring} Handle SQEs - register commands
2024-12-03 13:24 ` Pavel Begunkov
@ 2024-12-03 13:49 ` Bernd Schubert
2024-12-03 14:16 ` Pavel Begunkov
0 siblings, 1 reply; 37+ messages in thread
From: Bernd Schubert @ 2024-12-03 13:49 UTC (permalink / raw)
To: Pavel Begunkov, Miklos Szeredi
Cc: Jens Axboe, linux-fsdevel, io-uring, Joanne Koong, Josef Bacik,
Amir Goldstein, Ming Lei, David Wei, bernd
On 12/3/24 14:24, Pavel Begunkov wrote:
> On 11/27/24 13:40, Bernd Schubert wrote:
>> This adds basic support for ring SQEs (with opcode=IORING_OP_URING_CMD).
>> For now only FUSE_URING_REQ_FETCH is handled to register queue entries.
>>
>> Signed-off-by: Bernd Schubert <[email protected]>
>> ---
>> fs/fuse/Kconfig | 12 ++
>> fs/fuse/Makefile | 1 +
>> fs/fuse/dev.c | 4 +
>> fs/fuse/dev_uring.c | 318 ++++++++++++++++++++++++++++++++++++
>> ++++++++++
>> fs/fuse/dev_uring_i.h | 115 +++++++++++++++++
>> fs/fuse/fuse_i.h | 5 +
>> fs/fuse/inode.c | 10 ++
>> include/uapi/linux/fuse.h | 67 ++++++++++
>> 8 files changed, 532 insertions(+)
>>
>
> diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
>> new file mode 100644
>> index
>> 0000000000000000000000000000000000000000..af9c5f116ba1dcf6c01d0359d1a06491c92c32f9
>> --- /dev/null
>> +++ b/fs/fuse/dev_uring.c
> ...
>> +
>> +/*
>> + * fuse_uring_req_fetch command handling
>> + */
>> +static void _fuse_uring_fetch(struct fuse_ring_ent *ring_ent,
>> + struct io_uring_cmd *cmd,
>> + unsigned int issue_flags)
>> +{
>> + struct fuse_ring_queue *queue = ring_ent->queue;
>> +
>> + spin_lock(&queue->lock);
>> + fuse_uring_ent_avail(ring_ent, queue);
>> + ring_ent->cmd = cmd;
>> + spin_unlock(&queue->lock);
>> +}
>> +
>> +/*
>> + * sqe->addr is a ptr to an iovec array, iov[0] has the headers, iov[1]
>> + * the payload
>> + */
>> +static int fuse_uring_get_iovec_from_sqe(const struct io_uring_sqe *sqe,
>> + struct iovec iov[FUSE_URING_IOV_SEGS])
>> +{
>> + struct iovec __user *uiov = u64_to_user_ptr(READ_ONCE(sqe->addr));
>> + struct iov_iter iter;
>> + ssize_t ret;
>> +
>> + if (sqe->len != FUSE_URING_IOV_SEGS)
>> + return -EINVAL;
>> +
>> + /*
>> + * Direction for buffer access will actually be READ and WRITE,
>> + * using write for the import should include READ access as well.
>> + */
>> + ret = import_iovec(WRITE, uiov, FUSE_URING_IOV_SEGS,
>> + FUSE_URING_IOV_SEGS, &iov, &iter);
>
> You're throwing away the iterator, I'd be a bit cautious about it.
> FUSE_URING_IOV_SEGS is 2, so it should avoid ITER_UBUF, but Jens
> can say if it's abuse of the API or not.
>
> Fwiw, it's not the first place I know of that just want to get
> an iovec avoiding playing games with different iterator modes.
Shall I create a new exported function like import_iovec_from_user()
that duplicates the relevant parts of __import_iovec()? I could also
let __import_iovec() use that new function, although there will be
less inlining with -O2.
>
>
>> + if (ret < 0)
>> + return ret;
>> +
>> + return 0;
>> +}
>> +
>> +static int fuse_uring_fetch(struct io_uring_cmd *cmd, unsigned int
>> issue_flags,
>> + struct fuse_conn *fc)
>> +{
>> + const struct fuse_uring_cmd_req *cmd_req = io_uring_sqe_cmd(cmd-
>> >sqe);
>
> You need to check that the ring is setup with SQE128.
>
> (!(issue_flags & IO_URING_F_SQE128))
> // fail
>
>> + struct fuse_ring *ring = fc->ring;
>> + struct fuse_ring_queue *queue;
>> + struct fuse_ring_ent *ring_ent;
>> + int err;
>> + struct iovec iov[FUSE_URING_IOV_SEGS];
>> +
>> + err = fuse_uring_get_iovec_from_sqe(cmd->sqe, iov);
>> + if (err) {
>> + pr_info_ratelimited("Failed to get iovec from sqe, err=%d\n",
>> + err);
>> + return err;
>> + }
>> +
>> + err = -ENOMEM;
>> + if (!ring) {
>> + ring = fuse_uring_create(fc);
>> + if (!ring)
>> + return err;
>> + }
>> +
>> + queue = ring->queues[cmd_req->qid];
>> + if (!queue) {
>> + queue = fuse_uring_create_queue(ring, cmd_req->qid);
>> + if (!queue)
>> + return err;
>> + }
>> +
>> + /*
>> + * The created queue above does not need to be destructed in
>> + * case of entry errors below, will be done at ring destruction
>> time.
>> + */
>> +
>> + ring_ent = kzalloc(sizeof(*ring_ent), GFP_KERNEL_ACCOUNT);
>> + if (ring_ent == NULL)
>> + return err;
>> +
>> + INIT_LIST_HEAD(&ring_ent->list);
>> +
>> + ring_ent->queue = queue;
>> + ring_ent->cmd = cmd;
>
> nit: seems it's also set immediately after in
> _fuse_uring_fetch().
>
>> +
>> + err = -EINVAL;
>> + if (iov[0].iov_len < sizeof(struct fuse_uring_req_header)) {
>> + pr_info_ratelimited("Invalid header len %zu\n", iov[0].iov_len);
>> + goto err;
>> + }
>> +
> ...
>> +/*
>> + * Entry function from io_uring to handle the given passthrough command
>> + * (op code IORING_OP_URING_CMD)
>> + */
>> +int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
>> +{
>> + struct fuse_dev *fud;
>> + struct fuse_conn *fc;
>> + u32 cmd_op = cmd->cmd_op;
>> + int err;
>> +
>> + /* Disabled for now, especially as teardown is not implemented
>> yet */
>> + pr_info_ratelimited("fuse-io-uring is not enabled yet\n");
>> + return -EOPNOTSUPP;
>
> Do compilers give warnings about such things? Unreachable code, maybe.
> I don't care much, but if they do to avoid breaking CONFIG_WERROR you
> might want to do sth about it. E.g. I'd usually mark the function
> __maybe_unused and not set it into fops until a later patch.
I don't get any warning, but I can also do what you suggest.
Thanks,
Bernd
* Re: [PATCH RFC v7 06/16] fuse: {uring} Handle SQEs - register commands
2024-12-03 13:49 ` Bernd Schubert
@ 2024-12-03 14:16 ` Pavel Begunkov
0 siblings, 0 replies; 37+ messages in thread
From: Pavel Begunkov @ 2024-12-03 14:16 UTC (permalink / raw)
To: Bernd Schubert, Miklos Szeredi
Cc: Jens Axboe, linux-fsdevel, io-uring, Joanne Koong, Josef Bacik,
Amir Goldstein, Ming Lei, David Wei, bernd
On 12/3/24 13:49, Bernd Schubert wrote:
> On 12/3/24 14:24, Pavel Begunkov wrote:
>> On 11/27/24 13:40, Bernd Schubert wrote:
>>> This adds basic support for ring SQEs (with opcode=IORING_OP_URING_CMD).
>>> For now only FUSE_URING_REQ_FETCH is handled to register queue entries.
...
>>> + /*
>>> + * Direction for buffer access will actually be READ and WRITE,
>>> + * using write for the import should include READ access as well.
>>> + */
>>> + ret = import_iovec(WRITE, uiov, FUSE_URING_IOV_SEGS,
>>> + FUSE_URING_IOV_SEGS, &iov, &iter);
>>
>> You're throwing away the iterator, I'd be a bit cautious about it.
>> FUSE_URING_IOV_SEGS is 2, so it should avoid ITER_UBUF, but Jens
>> can say if it's abuse of the API or not.
>>
>> Fwiw, it's not the first place I know of that just want to get
>> an iovec avoiding playing games with different iterator modes.
>
>
> Shall I create new exported function like import_iovec_from_user()
> that duplicates all the parts from __import_iovec()? I could also
> let __import_iovec() use that new function, although there will be
> less inlining with -02.
I'd say we can just leave it as is for now and deal with it later,
unless someone complains.
>>> +int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
>>> +{
>>> + struct fuse_dev *fud;
>>> + struct fuse_conn *fc;
>>> + u32 cmd_op = cmd->cmd_op;
>>> + int err;
>>> +
>>> + /* Disabled for now, especially as teardown is not implemented
>>> yet */
>>> + pr_info_ratelimited("fuse-io-uring is not enabled yet\n");
>>> + return -EOPNOTSUPP;
>>
>> Do compilers give warnings about such things? Unreachable code, maybe.
>> I don't care much, but if they do to avoid breaking CONFIG_WERROR you
>> might want to do sth about it. E.g. I'd usually mark the function
>> __maybe_unused and not set it into fops until a later patch.
>
>
> I don't get any warning, but I can also do what you suggest.
Got it, no preference then.
--
Pavel Begunkov
* Re: [PATCH RFC v7 06/16] fuse: {uring} Handle SQEs - register commands
2024-11-27 13:40 ` [PATCH RFC v7 06/16] fuse: {uring} Handle SQEs - register commands Bernd Schubert
2024-11-28 2:23 ` Joanne Koong
2024-12-03 13:24 ` Pavel Begunkov
@ 2024-12-03 13:38 ` Pavel Begunkov
2 siblings, 0 replies; 37+ messages in thread
From: Pavel Begunkov @ 2024-12-03 13:38 UTC (permalink / raw)
To: Bernd Schubert, Miklos Szeredi
Cc: Jens Axboe, linux-fsdevel, io-uring, Joanne Koong, Josef Bacik,
Amir Goldstein, Ming Lei, David Wei, bernd
On 11/27/24 13:40, Bernd Schubert wrote:
...
> +static int fuse_uring_fetch(struct io_uring_cmd *cmd, unsigned int issue_flags,
> + struct fuse_conn *fc)
> +{
> + const struct fuse_uring_cmd_req *cmd_req = io_uring_sqe_cmd(cmd->sqe);
> + struct fuse_ring *ring = fc->ring;
> + struct fuse_ring_queue *queue;
> + struct fuse_ring_ent *ring_ent;
> + int err;
> + struct iovec iov[FUSE_URING_IOV_SEGS];
> +
> + err = fuse_uring_get_iovec_from_sqe(cmd->sqe, iov);
> + if (err) {
> + pr_info_ratelimited("Failed to get iovec from sqe, err=%d\n",
> + err);
> + return err;
> + }
> +
> + err = -ENOMEM;
> + if (!ring) {
> + ring = fuse_uring_create(fc);
> + if (!ring)
> + return err;
> + }
> +
> + queue = ring->queues[cmd_req->qid];
cmd_req points into an SQE, which can be modified by user at any moment.
All reads should be READ_ONCE(). And unless you can tolerate getting
different values you must avoid reading it twice. It shouldn't
normally happen unless the user is buggy / malicious.
> + if (!queue) {
> + queue = fuse_uring_create_queue(ring, cmd_req->qid);
> + if (!queue)
> + return err;
> + }
> +
> + /*
> + * The created queue above does not need to be destructed in
> + * case of entry errors below, will be done at ring destruction time.
> + */
> +
--
Pavel Begunkov
* [PATCH RFC v7 07/16] fuse: Make fuse_copy non static
2024-11-27 13:40 [PATCH RFC v7 00/16] fuse: fuse-over-io-uring Bernd Schubert
` (5 preceding siblings ...)
2024-11-27 13:40 ` [PATCH RFC v7 06/16] fuse: {uring} Handle SQEs - register commands Bernd Schubert
@ 2024-11-27 13:40 ` Bernd Schubert
2024-11-27 13:40 ` [PATCH RFC v7 08/16] fuse: Add fuse-io-uring handling into fuse_copy Bernd Schubert
` (10 subsequent siblings)
17 siblings, 0 replies; 37+ messages in thread
From: Bernd Schubert @ 2024-11-27 13:40 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
Bernd Schubert
Move 'struct fuse_copy_state' and the fuse_copy_* functions
to fuse_dev_i.h to make them available for fuse-uring.
'copy_out_args()' is renamed to 'fuse_copy_out_args'.
Signed-off-by: Bernd Schubert <[email protected]>
---
fs/fuse/dev.c | 30 ++++++++----------------------
fs/fuse/fuse_dev_i.h | 25 +++++++++++++++++++++++++
2 files changed, 33 insertions(+), 22 deletions(-)
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 0770373492ae9ee83c4154fede9dcfd7be9fb33d..94314bb61ea16b1ea4a42a1a4b36fbf5cbf7ef70 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -679,22 +679,8 @@ static int unlock_request(struct fuse_req *req)
return err;
}
-struct fuse_copy_state {
- int write;
- struct fuse_req *req;
- struct iov_iter *iter;
- struct pipe_buffer *pipebufs;
- struct pipe_buffer *currbuf;
- struct pipe_inode_info *pipe;
- unsigned long nr_segs;
- struct page *pg;
- unsigned len;
- unsigned offset;
- unsigned move_pages:1;
-};
-
-static void fuse_copy_init(struct fuse_copy_state *cs, int write,
- struct iov_iter *iter)
+void fuse_copy_init(struct fuse_copy_state *cs, int write,
+ struct iov_iter *iter)
{
memset(cs, 0, sizeof(*cs));
cs->write = write;
@@ -1045,9 +1031,9 @@ static int fuse_copy_one(struct fuse_copy_state *cs, void *val, unsigned size)
}
/* Copy request arguments to/from userspace buffer */
-static int fuse_copy_args(struct fuse_copy_state *cs, unsigned numargs,
- unsigned argpages, struct fuse_arg *args,
- int zeroing)
+int fuse_copy_args(struct fuse_copy_state *cs, unsigned numargs,
+ unsigned argpages, struct fuse_arg *args,
+ int zeroing)
{
int err = 0;
unsigned i;
@@ -1922,8 +1908,8 @@ static struct fuse_req *request_find(struct fuse_pqueue *fpq, u64 unique)
return NULL;
}
-static int copy_out_args(struct fuse_copy_state *cs, struct fuse_args *args,
- unsigned nbytes)
+int fuse_copy_out_args(struct fuse_copy_state *cs, struct fuse_args *args,
+ unsigned nbytes)
{
unsigned reqsize = sizeof(struct fuse_out_header);
@@ -2025,7 +2011,7 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud,
if (oh.error)
err = nbytes != sizeof(oh) ? -EINVAL : 0;
else
- err = copy_out_args(cs, req->args, nbytes);
+ err = fuse_copy_out_args(cs, req->args, nbytes);
fuse_copy_finish(cs);
spin_lock(&fpq->lock);
diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
index 08a7e88e002773fcd18c25a229c7aa6450831401..21eb1bdb492d04f0a406d25bb8d300b34244dce2 100644
--- a/fs/fuse/fuse_dev_i.h
+++ b/fs/fuse/fuse_dev_i.h
@@ -12,6 +12,23 @@
#define FUSE_INT_REQ_BIT (1ULL << 0)
#define FUSE_REQ_ID_STEP (1ULL << 1)
+struct fuse_arg;
+struct fuse_args;
+
+struct fuse_copy_state {
+ int write;
+ struct fuse_req *req;
+ struct iov_iter *iter;
+ struct pipe_buffer *pipebufs;
+ struct pipe_buffer *currbuf;
+ struct pipe_inode_info *pipe;
+ unsigned long nr_segs;
+ struct page *pg;
+ unsigned int len;
+ unsigned int offset;
+ unsigned int move_pages:1;
+};
+
static inline struct fuse_dev *fuse_get_dev(struct file *file)
{
/*
@@ -23,5 +40,13 @@ static inline struct fuse_dev *fuse_get_dev(struct file *file)
void fuse_dev_end_requests(struct list_head *head);
+void fuse_copy_init(struct fuse_copy_state *cs, int write,
+ struct iov_iter *iter);
+int fuse_copy_args(struct fuse_copy_state *cs, unsigned int numargs,
+ unsigned int argpages, struct fuse_arg *args,
+ int zeroing);
+int fuse_copy_out_args(struct fuse_copy_state *cs, struct fuse_args *args,
+ unsigned int nbytes);
+
#endif
--
2.43.0
* [PATCH RFC v7 08/16] fuse: Add fuse-io-uring handling into fuse_copy
2024-11-27 13:40 [PATCH RFC v7 00/16] fuse: fuse-over-io-uring Bernd Schubert
` (6 preceding siblings ...)
2024-11-27 13:40 ` [PATCH RFC v7 07/16] fuse: Make fuse_copy non static Bernd Schubert
@ 2024-11-27 13:40 ` Bernd Schubert
2024-11-27 13:40 ` [PATCH RFC v7 09/16] fuse: {uring} Add uring sqe commit and fetch support Bernd Schubert
` (9 subsequent siblings)
17 siblings, 0 replies; 37+ messages in thread
From: Bernd Schubert @ 2024-11-27 13:40 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
Bernd Schubert
Add special fuse-io-uring handling to the fuse argument
copy handler.
Signed-off-by: Bernd Schubert <[email protected]>
---
fs/fuse/dev.c | 12 +++++++++++-
fs/fuse/fuse_dev_i.h | 5 +++++
2 files changed, 16 insertions(+), 1 deletion(-)
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 94314bb61ea16b1ea4a42a1a4b36fbf5cbf7ef70..51d9c14a8ede9348661921f5d5a6d837c76f91ee 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -787,6 +787,9 @@ static int fuse_copy_do(struct fuse_copy_state *cs, void **val, unsigned *size)
*size -= ncpy;
cs->len -= ncpy;
cs->offset += ncpy;
+ if (cs->is_uring)
+ cs->ring.offset += ncpy;
+
return ncpy;
}
@@ -1911,7 +1914,14 @@ static struct fuse_req *request_find(struct fuse_pqueue *fpq, u64 unique)
int fuse_copy_out_args(struct fuse_copy_state *cs, struct fuse_args *args,
unsigned nbytes)
{
- unsigned reqsize = sizeof(struct fuse_out_header);
+
+ unsigned int reqsize = 0;
+
+ /*
+ * Uring has all headers separated from args - args is payload only
+ */
+ if (!cs->is_uring)
+ reqsize = sizeof(struct fuse_out_header);
reqsize += fuse_len_args(args->out_numargs, args->out_args);
diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
index 21eb1bdb492d04f0a406d25bb8d300b34244dce2..0708730b656b97071de9a5331ef4d51a112c602c 100644
--- a/fs/fuse/fuse_dev_i.h
+++ b/fs/fuse/fuse_dev_i.h
@@ -27,6 +27,11 @@ struct fuse_copy_state {
unsigned int len;
unsigned int offset;
unsigned int move_pages:1;
+ unsigned int is_uring:1;
+ struct {
+ /* overall offset with the user buffer */
+ unsigned int offset;
+ } ring;
};
static inline struct fuse_dev *fuse_get_dev(struct file *file)
--
2.43.0
* [PATCH RFC v7 09/16] fuse: {uring} Add uring sqe commit and fetch support
2024-11-27 13:40 [PATCH RFC v7 00/16] fuse: fuse-over-io-uring Bernd Schubert
` (7 preceding siblings ...)
2024-11-27 13:40 ` [PATCH RFC v7 08/16] fuse: Add fuse-io-uring handling into fuse_copy Bernd Schubert
@ 2024-11-27 13:40 ` Bernd Schubert
2024-12-03 13:47 ` Pavel Begunkov
2024-11-27 13:40 ` [PATCH RFC v7 10/16] fuse: {uring} Handle teardown of ring entries Bernd Schubert
` (8 subsequent siblings)
17 siblings, 1 reply; 37+ messages in thread
From: Bernd Schubert @ 2024-11-27 13:40 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
Bernd Schubert
This adds support for fuse request completion through ring SQEs
(FUSE_URING_REQ_COMMIT_AND_FETCH handling). After committing
the ring entry it becomes available for new fuse requests.
Handling of requests through the ring (SQE/CQE handling)
is complete now.
Fuse request data are copied through the mmapped ring buffer;
there is no support for any zero copy yet.
Signed-off-by: Bernd Schubert <[email protected]>
---
fs/fuse/dev.c | 6 +-
fs/fuse/dev_uring.c | 423 ++++++++++++++++++++++++++++++++++++++++++++++++++
fs/fuse/dev_uring_i.h | 12 ++
fs/fuse/fuse_dev_i.h | 6 +
fs/fuse/fuse_i.h | 9 ++
fs/fuse/inode.c | 2 +-
6 files changed, 454 insertions(+), 4 deletions(-)
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 51d9c14a8ede9348661921f5d5a6d837c76f91ee..143ce48a1898d911aafbba2b2e734805509169c2 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -221,7 +221,7 @@ u64 fuse_get_unique(struct fuse_iqueue *fiq)
}
EXPORT_SYMBOL_GPL(fuse_get_unique);
-static unsigned int fuse_req_hash(u64 unique)
+unsigned int fuse_req_hash(u64 unique)
{
return hash_long(unique & ~FUSE_INT_REQ_BIT, FUSE_PQ_HASH_BITS);
}
@@ -1899,7 +1899,7 @@ static int fuse_notify(struct fuse_conn *fc, enum fuse_notify_code code,
}
/* Look up request on processing list by unique ID */
-static struct fuse_req *request_find(struct fuse_pqueue *fpq, u64 unique)
+struct fuse_req *fuse_request_find(struct fuse_pqueue *fpq, u64 unique)
{
unsigned int hash = fuse_req_hash(unique);
struct fuse_req *req;
@@ -1983,7 +1983,7 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud,
spin_lock(&fpq->lock);
req = NULL;
if (fpq->connected)
- req = request_find(fpq, oh.unique & ~FUSE_INT_REQ_BIT);
+ req = fuse_request_find(fpq, oh.unique & ~FUSE_INT_REQ_BIT);
err = -ENOENT;
if (!req) {
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index af9c5f116ba1dcf6c01d0359d1a06491c92c32f9..7bb07f5ba436fcb89537f0821f08a7167da52902 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -24,6 +24,19 @@ bool fuse_uring_enabled(void)
return enable_uring;
}
+static void fuse_uring_req_end(struct fuse_ring_ent *ring_ent, bool set_err,
+ int error)
+{
+ struct fuse_req *req = ring_ent->fuse_req;
+
+ if (set_err)
+ req->out.h.error = error;
+
+ clear_bit(FR_SENT, &req->flags);
+ fuse_request_end(ring_ent->fuse_req);
+ ring_ent->fuse_req = NULL;
+}
+
static int fuse_ring_ent_unset_userspace(struct fuse_ring_ent *ent)
{
struct fuse_ring_queue *queue = ent->queue;
@@ -54,8 +67,11 @@ void fuse_uring_destruct(struct fuse_conn *fc)
continue;
WARN_ON(!list_empty(&queue->ent_avail_queue));
+ WARN_ON(!list_empty(&queue->ent_w_req_queue));
WARN_ON(!list_empty(&queue->ent_commit_queue));
+ WARN_ON(!list_empty(&queue->ent_in_userspace));
+ kfree(queue->fpq.processing);
kfree(queue);
ring->queues[qid] = NULL;
}
@@ -114,13 +130,21 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
{
struct fuse_conn *fc = ring->fc;
struct fuse_ring_queue *queue;
+ struct list_head *pq;
queue = kzalloc(sizeof(*queue), GFP_KERNEL_ACCOUNT);
if (!queue)
return ERR_PTR(-ENOMEM);
+ pq = kcalloc(FUSE_PQ_HASH_SIZE, sizeof(struct list_head), GFP_KERNEL);
+ if (!pq) {
+ kfree(queue);
+ return ERR_PTR(-ENOMEM);
+ }
+
spin_lock(&fc->lock);
if (ring->queues[qid]) {
spin_unlock(&fc->lock);
+ kfree(queue->fpq.processing);
kfree(queue);
return ring->queues[qid];
}
@@ -131,6 +155,12 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
INIT_LIST_HEAD(&queue->ent_avail_queue);
INIT_LIST_HEAD(&queue->ent_commit_queue);
+ INIT_LIST_HEAD(&queue->ent_w_req_queue);
+ INIT_LIST_HEAD(&queue->ent_in_userspace);
+ INIT_LIST_HEAD(&queue->fuse_req_queue);
+
+ queue->fpq.processing = pq;
+ fuse_pqueue_init(&queue->fpq);
WRITE_ONCE(ring->queues[qid], queue);
spin_unlock(&fc->lock);
@@ -138,6 +168,213 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
return queue;
}
+static void
+fuse_uring_async_send_to_ring(struct io_uring_cmd *cmd,
+ unsigned int issue_flags)
+{
+ io_uring_cmd_done(cmd, 0, 0, issue_flags);
+}
+
+/*
> + * Checks for errors and stores them in the request
+ */
+static int fuse_uring_out_header_has_err(struct fuse_out_header *oh,
+ struct fuse_req *req,
+ struct fuse_conn *fc)
+{
+ int err;
+
+ err = -EINVAL;
+ if (oh->unique == 0) {
> + /* Not supported through io-uring yet */
+ pr_warn_once("fuse: notify through fuse-io-uring not supported\n");
+ goto seterr;
+ }
+
+ err = -EINVAL;
+ if (oh->error <= -ERESTARTSYS || oh->error > 0)
+ goto seterr;
+
+ if (oh->error) {
+ err = oh->error;
+ goto err;
+ }
+
+ err = -ENOENT;
+ if ((oh->unique & ~FUSE_INT_REQ_BIT) != req->in.h.unique) {
+ pr_warn_ratelimited("Unexpected seqno mismatch, expected: %llu got %llu\n",
+ req->in.h.unique, oh->unique & ~FUSE_INT_REQ_BIT);
+ goto seterr;
+ }
+
+ /*
+ * Is it an interrupt reply ID?
+ * XXX: Not supported through fuse-io-uring yet, it should not even
+ * find the request - should not happen.
+ */
+ WARN_ON_ONCE(oh->unique & FUSE_INT_REQ_BIT);
+
+ return 0;
+
+seterr:
+ oh->error = err;
+err:
+ return err;
+}
+
+static int fuse_uring_copy_from_ring(struct fuse_ring *ring,
+ struct fuse_req *req,
+ struct fuse_ring_ent *ent)
+{
+ struct fuse_copy_state cs;
+ struct fuse_args *args = req->args;
+ struct iov_iter iter;
+ int err, res;
+ struct fuse_uring_ent_in_out ring_in_out;
+
+ res = copy_from_user(&ring_in_out, &ent->headers->ring_ent_in_out,
+ sizeof(ring_in_out));
+ if (res)
+ return -EFAULT;
+
+ err = import_ubuf(ITER_SOURCE, ent->payload, ent->max_arg_len, &iter);
+ if (err)
+ return err;
+
+ fuse_copy_init(&cs, 0, &iter);
+ cs.is_uring = 1;
+ cs.req = req;
+
+ return fuse_copy_out_args(&cs, args, ring_in_out.payload_sz);
+}
+
+ /*
+ * Copy data from the req to the ring buffer
+ */
+static int fuse_uring_copy_to_ring(struct fuse_ring *ring, struct fuse_req *req,
+ struct fuse_ring_ent *ent)
+{
+ struct fuse_copy_state cs;
+ struct fuse_args *args = req->args;
+ struct fuse_in_arg *in_args = args->in_args;
+ int num_args = args->in_numargs;
+ int err, res;
+ struct iov_iter iter;
+ struct fuse_uring_ent_in_out ring_in_out = { .flags = 0 };
+
+ if (num_args == 0)
+ return 0;
+
+ err = import_ubuf(ITER_DEST, ent->payload, ent->max_arg_len, &iter);
+ if (err) {
+ pr_info_ratelimited("fuse: Import of user buffer failed\n");
+ return err;
+ }
+
+ fuse_copy_init(&cs, 1, &iter);
+ cs.is_uring = 1;
+ cs.req = req;
+
+ /*
> + * Expectation is that the first argument is the per-op header.
> + * Some op codes have that as zero.
+ */
+ if (args->in_args[0].size > 0) {
+ res = copy_to_user(&ent->headers->op_in, in_args->value,
+ in_args->size);
+ err = res > 0 ? -EFAULT : res;
+ if (err) {
+ pr_info_ratelimited("Copying the header failed.\n");
+ return err;
+ }
+ }
+ in_args++;
+ num_args--;
+
+ /* copy the payload */
+ err = fuse_copy_args(&cs, num_args, args->in_pages,
+ (struct fuse_arg *)in_args, 0);
+ if (err) {
+ pr_info_ratelimited("%s fuse_copy_args failed\n", __func__);
+ return err;
+ }
+
+ ring_in_out.payload_sz = cs.ring.offset;
+ res = copy_to_user(&ent->headers->ring_ent_in_out, &ring_in_out,
+ sizeof(ring_in_out));
+ err = res > 0 ? -EFAULT : res;
+ if (err)
+ return err;
+
+ return 0;
+}
+
+static int
+fuse_uring_prepare_send(struct fuse_ring_ent *ring_ent)
+{
+ struct fuse_ring_queue *queue = ring_ent->queue;
+ struct fuse_ring *ring = queue->ring;
+ struct fuse_req *req = ring_ent->fuse_req;
+ int err = 0, res;
+
+ if (WARN_ON(ring_ent->state != FRRS_FUSE_REQ)) {
+ pr_err("qid=%d ring-req=%p invalid state %d on send\n",
+ queue->qid, ring_ent, ring_ent->state);
+ err = -EIO;
+ }
+
+ if (err)
+ return err;
+
+ /* copy the request */
+ err = fuse_uring_copy_to_ring(ring, req, ring_ent);
+ if (unlikely(err)) {
+ pr_info("Copy to ring failed: %d\n", err);
+ goto err;
+ }
+
+ /* copy fuse_in_header */
+ res = copy_to_user(&ring_ent->headers->in_out, &req->in.h,
+ sizeof(req->in.h));
+ err = res > 0 ? -EFAULT : res;
+ if (err)
+ goto err;
+
+ set_bit(FR_SENT, &req->flags);
+ return 0;
+
+err:
+ fuse_uring_req_end(ring_ent, true, err);
+ return err;
+}
+
+/*
+ * Write data to the ring buffer and send the request to userspace,
+ * userspace will read it
+ * This is comparable with classical read(/dev/fuse)
+ */
+static int fuse_uring_send_next_to_ring(struct fuse_ring_ent *ring_ent)
+{
+ int err = 0;
+ struct fuse_ring_queue *queue = ring_ent->queue;
+
+ err = fuse_uring_prepare_send(ring_ent);
+ if (err)
+ goto err;
+
+ spin_lock(&queue->lock);
+ ring_ent->state = FRRS_USERSPACE;
+ list_move(&ring_ent->list, &queue->ent_in_userspace);
+ spin_unlock(&queue->lock);
+
+ io_uring_cmd_complete_in_task(ring_ent->cmd,
+ fuse_uring_async_send_to_ring);
+ return 0;
+
+err:
+ return err;
+}
+
/*
* Make a ring entry available for fuse_req assignment
*/
@@ -148,6 +385,189 @@ static void fuse_uring_ent_avail(struct fuse_ring_ent *ring_ent,
ring_ent->state = FRRS_WAIT;
}
+/* Used to find the request on SQE commit */
+static void fuse_uring_add_to_pq(struct fuse_ring_ent *ring_ent,
+ struct fuse_req *req)
+{
+ struct fuse_ring_queue *queue = ring_ent->queue;
+ struct fuse_pqueue *fpq = &queue->fpq;
+ unsigned int hash;
+
+ req->ring_entry = ring_ent;
+ hash = fuse_req_hash(req->in.h.unique);
+ list_move_tail(&req->list, &fpq->processing[hash]);
+}
+
+/*
+ * Assign a fuse queue entry to the given entry
+ */
+static void fuse_uring_add_req_to_ring_ent(struct fuse_ring_ent *ring_ent,
+ struct fuse_req *req)
+{
+ struct fuse_ring_queue *queue = ring_ent->queue;
+
+ lockdep_assert_held(&queue->lock);
+
+ if (WARN_ON_ONCE(ring_ent->state != FRRS_WAIT &&
+ ring_ent->state != FRRS_COMMIT)) {
+ pr_warn("%s qid=%d state=%d\n", __func__, ring_ent->queue->qid,
+ ring_ent->state);
+ }
+ list_del_init(&req->list);
+ clear_bit(FR_PENDING, &req->flags);
+ ring_ent->fuse_req = req;
+ ring_ent->state = FRRS_FUSE_REQ;
+ list_move(&ring_ent->list, &queue->ent_w_req_queue);
+ fuse_uring_add_to_pq(ring_ent, req);
+}
+
+/*
+ * Release the ring entry and fetch the next fuse request if available
+ *
+ * @return true if a new request has been fetched
+ */
+static bool fuse_uring_ent_assign_req(struct fuse_ring_ent *ring_ent)
+ __must_hold(&queue->lock)
+{
+ struct fuse_req *req = NULL;
+ struct fuse_ring_queue *queue = ring_ent->queue;
+ struct list_head *req_queue = &queue->fuse_req_queue;
+
+ lockdep_assert_held(&queue->lock);
+
> + /* get and assign the next entry while still holding the lock */
+ if (!list_empty(req_queue)) {
+ req = list_first_entry(req_queue, struct fuse_req, list);
+ fuse_uring_add_req_to_ring_ent(ring_ent, req);
+ }
+
+ return req ? true : false;
+}
+
+/*
+ * Read data from the ring buffer, which user space has written to
> + * This is comparable with handling of classical write(/dev/fuse).
+ * Also make the ring request available again for new fuse requests.
+ */
+static void fuse_uring_commit(struct fuse_ring_ent *ring_ent,
+ unsigned int issue_flags)
+{
+ struct fuse_ring *ring = ring_ent->queue->ring;
+ struct fuse_conn *fc = ring->fc;
+ struct fuse_req *req = ring_ent->fuse_req;
+ ssize_t err = 0;
+ bool set_err = false;
+
+ err = copy_from_user(&req->out.h, &ring_ent->headers->in_out,
+ sizeof(req->out.h));
+ if (err) {
+ req->out.h.error = err;
+ goto out;
+ }
+
+ err = fuse_uring_out_header_has_err(&req->out.h, req, fc);
+ if (err) {
+ /* req->out.h.error already set */
+ goto out;
+ }
+
+ err = fuse_uring_copy_from_ring(ring, req, ring_ent);
+ if (err)
+ set_err = true;
+
+out:
+ fuse_uring_req_end(ring_ent, set_err, err);
+}
+
+/*
+ * Get the next fuse req and send it
+ */
+static void fuse_uring_next_fuse_req(struct fuse_ring_ent *ring_ent,
+ struct fuse_ring_queue *queue)
+{
+ int has_next, err;
+ int prev_state = ring_ent->state;
+
+ do {
+ spin_lock(&queue->lock);
+ has_next = fuse_uring_ent_assign_req(ring_ent);
+ if (!has_next) {
+ fuse_uring_ent_avail(ring_ent, queue);
+ spin_unlock(&queue->lock);
+ break; /* no request left */
+ }
+ spin_unlock(&queue->lock);
+
+ err = fuse_uring_send_next_to_ring(ring_ent);
+ if (err)
+ ring_ent->state = prev_state;
+ } while (err);
+}
+
+/* FUSE_URING_REQ_COMMIT_AND_FETCH handler */
+static int fuse_uring_commit_fetch(struct io_uring_cmd *cmd, int issue_flags,
+ struct fuse_conn *fc)
+{
+ const struct fuse_uring_cmd_req *cmd_req = io_uring_sqe_cmd(cmd->sqe);
+ struct fuse_ring_ent *ring_ent;
+ int err;
+ struct fuse_ring *ring = fc->ring;
+ struct fuse_ring_queue *queue;
+ uint64_t commit_id = cmd_req->commit_id;
+ struct fuse_pqueue fpq;
+ struct fuse_req *req;
+
+ err = -ENOTCONN;
+ if (!ring)
+ return err;
+
+ queue = ring->queues[cmd_req->qid];
+ if (!queue)
+ return err;
+ fpq = queue->fpq;
+
+ spin_lock(&queue->lock);
+	/*
+	 * Find a request based on the unique ID of the fuse request.
+	 * This should get revised, as it needs a hash calculation and list
+	 * search. A full struct fuse_pqueue is also needed (memory overhead),
+	 * as well as the link from req to ring_ent.
+	 */
+ req = fuse_request_find(&fpq, commit_id);
+ err = -ENOENT;
+ if (!req) {
+ pr_info("qid=%d commit_id %llu not found\n", queue->qid,
+ commit_id);
+ spin_unlock(&queue->lock);
+ return err;
+ }
+ list_del_init(&req->list);
+ ring_ent = req->ring_entry;
+ req->ring_entry = NULL;
+
+ err = fuse_ring_ent_unset_userspace(ring_ent);
+ if (err != 0) {
+ pr_info_ratelimited("qid=%d commit_id %llu state %d",
+ queue->qid, commit_id, ring_ent->state);
+ spin_unlock(&queue->lock);
+ return err;
+ }
+
+ ring_ent->cmd = cmd;
+ spin_unlock(&queue->lock);
+
+ /* without the queue lock, as other locks are taken */
+ fuse_uring_commit(ring_ent, issue_flags);
+
+ /*
+	 * Fetching the next request is absolutely required, as queued
+	 * fuse requests would otherwise not get processed - committing
+	 * and fetching are done in one step, unlike legacy fuse, which has
+	 * separate read (fetch request) and write (commit result) steps.
+ */
+ fuse_uring_next_fuse_req(ring_ent, queue);
+ return 0;
+}
+
/*
* fuse_uring_req_fetch command handling
*/
@@ -310,6 +730,9 @@ int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
return err;
}
break;
+ case FUSE_URING_REQ_COMMIT_AND_FETCH:
+ err = fuse_uring_commit_fetch(cmd, issue_flags, fc);
+ break;
default:
return -EINVAL;
}
diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
index 75c644cc0b2bb3721b08f8695964815d53f46e92..b2722828bd20dce5f5bd7884ba87861fb6e0d97b 100644
--- a/fs/fuse/dev_uring_i.h
+++ b/fs/fuse/dev_uring_i.h
@@ -20,6 +20,9 @@ enum fuse_ring_req_state {
/* The ring entry is waiting for new fuse requests */
FRRS_WAIT,
+ /* The ring entry got assigned a fuse req */
+ FRRS_FUSE_REQ,
+
/* The ring entry is in or on the way to user space */
FRRS_USERSPACE,
};
@@ -72,7 +75,16 @@ struct fuse_ring_queue {
* entries in the process of being committed or in the process
* to be send to userspace
*/
+ struct list_head ent_w_req_queue;
struct list_head ent_commit_queue;
+
+ /* entries in userspace */
+ struct list_head ent_in_userspace;
+
+ /* fuse requests waiting for an entry slot */
+ struct list_head fuse_req_queue;
+
+ struct fuse_pqueue fpq;
};
/**
diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
index 0708730b656b97071de9a5331ef4d51a112c602c..d7bf72dabd84c3896d1447380649e2f4d20b0643 100644
--- a/fs/fuse/fuse_dev_i.h
+++ b/fs/fuse/fuse_dev_i.h
@@ -7,6 +7,7 @@
#define _FS_FUSE_DEV_I_H
#include <linux/types.h>
+#include <linux/fs.h>
/* Ordinary requests have even IDs, while interrupts IDs are odd */
#define FUSE_INT_REQ_BIT (1ULL << 0)
@@ -14,6 +15,8 @@
struct fuse_arg;
struct fuse_args;
+struct fuse_pqueue;
+struct fuse_req;
struct fuse_copy_state {
int write;
@@ -43,6 +46,9 @@ static inline struct fuse_dev *fuse_get_dev(struct file *file)
return READ_ONCE(file->private_data);
}
+unsigned int fuse_req_hash(u64 unique);
+struct fuse_req *fuse_request_find(struct fuse_pqueue *fpq, u64 unique);
+
void fuse_dev_end_requests(struct list_head *head);
void fuse_copy_init(struct fuse_copy_state *cs, int write,
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index a21256ec4c3b4bd7c67eae2d03b68d87dcc8234b..dd628996bbd45b375ca36694067f7be3e58698fa 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -435,6 +435,10 @@ struct fuse_req {
/** fuse_mount this request belongs to */
struct fuse_mount *fm;
+
+#ifdef CONFIG_FUSE_IO_URING
+ void *ring_entry;
+#endif
};
struct fuse_iqueue;
@@ -1221,6 +1225,11 @@ void fuse_change_entry_timeout(struct dentry *entry, struct fuse_entry_out *o);
*/
struct fuse_conn *fuse_conn_get(struct fuse_conn *fc);
+/**
+ * Initialize the fuse processing queue
+ */
+void fuse_pqueue_init(struct fuse_pqueue *fpq);
+
/**
* Initialize fuse_conn
*/
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 76267c79e920204175e5713853de8214c5555d46..2943d25bff9a96fde8c3b51ea8fb02eedd3e6004 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -906,7 +906,7 @@ static void fuse_iqueue_init(struct fuse_iqueue *fiq,
fiq->priv = priv;
}
-static void fuse_pqueue_init(struct fuse_pqueue *fpq)
+void fuse_pqueue_init(struct fuse_pqueue *fpq)
{
unsigned int i;
--
2.43.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH RFC v7 09/16] fuse: {uring} Add uring sqe commit and fetch support
2024-11-27 13:40 ` [PATCH RFC v7 09/16] fuse: {uring} Add uring sqe commit and fetch support Bernd Schubert
@ 2024-12-03 13:47 ` Pavel Begunkov
0 siblings, 0 replies; 37+ messages in thread
From: Pavel Begunkov @ 2024-12-03 13:47 UTC (permalink / raw)
To: Bernd Schubert, Miklos Szeredi
Cc: Jens Axboe, linux-fsdevel, io-uring, Joanne Koong, Josef Bacik,
Amir Goldstein, Ming Lei, David Wei, bernd
On 11/27/24 13:40, Bernd Schubert wrote:
> This adds support for fuse request completion through ring SQEs
> (FUSE_URING_REQ_COMMIT_AND_FETCH handling). After committing
> the ring entry it becomes available for new fuse requests.
> Handling of requests through the ring (SQE/CQE handling)
> is complete now.
>
> Fuse request data are copied through the mmaped ring buffer,
> there is no support for any zero copy yet.
>
> Signed-off-by: Bernd Schubert <[email protected]>
> ---
...
> diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
> index af9c5f116ba1dcf6c01d0359d1a06491c92c32f9..7bb07f5ba436fcb89537f0821f08a7167da52902 100644
> --- a/fs/fuse/dev_uring.c
> +++ b/fs/fuse/dev_uring.c
> @@ -24,6 +24,19 @@ bool fuse_uring_enabled(void)
...
> +/*
> + * Write data to the ring buffer and send the request to userspace,
> + * userspace will read it
> + * This is comparable with classical read(/dev/fuse)
> + */
> +static int fuse_uring_send_next_to_ring(struct fuse_ring_ent *ring_ent)
> +{
> + int err = 0;
> + struct fuse_ring_queue *queue = ring_ent->queue;
> +
> + err = fuse_uring_prepare_send(ring_ent);
> + if (err)
> + goto err;
> +
> + spin_lock(&queue->lock);
> + ring_ent->state = FRRS_USERSPACE;
> + list_move(&ring_ent->list, &queue->ent_in_userspace);
> + spin_unlock(&queue->lock);
> +
> + io_uring_cmd_complete_in_task(ring_ent->cmd,
> + fuse_uring_async_send_to_ring);
Just io_uring_cmd_done should be enough.
io_uring_cmd_done(cmd, 0, 0, issue_flags);
> + return 0;
> +
> +err:
> + return err;
> +}
> +
> /*
> * Make a ring entry available for fuse_req assignment
> */
> @@ -148,6 +385,189 @@ static void fuse_uring_ent_avail(struct fuse_ring_ent *ring_ent,
> ring_ent->state = FRRS_WAIT;
> }
>
...
> +
> +/* FUSE_URING_REQ_COMMIT_AND_FETCH handler */
> +static int fuse_uring_commit_fetch(struct io_uring_cmd *cmd, int issue_flags,
> + struct fuse_conn *fc)
> +{
> + const struct fuse_uring_cmd_req *cmd_req = io_uring_sqe_cmd(cmd->sqe);
Same comment about checking SQE128, fuse_uring_cmd() sounds like a good
place for it.
> + struct fuse_ring_ent *ring_ent;
> + int err;
> + struct fuse_ring *ring = fc->ring;
> + struct fuse_ring_queue *queue;
> + uint64_t commit_id = cmd_req->commit_id;
> + struct fuse_pqueue fpq;
> + struct fuse_req *req;
> +
> + err = -ENOTCONN;
> + if (!ring)
> + return err;
> +
> + queue = ring->queues[cmd_req->qid];
READ_ONCE()
> + if (!queue)
> + return err;
> + fpq = queue->fpq;
> +
--
Pavel Begunkov
^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH RFC v7 10/16] fuse: {uring} Handle teardown of ring entries
2024-11-27 13:40 [PATCH RFC v7 00/16] fuse: fuse-over-io-uring Bernd Schubert
` (8 preceding siblings ...)
2024-11-27 13:40 ` [PATCH RFC v7 09/16] fuse: {uring} Add uring sqe commit and fetch support Bernd Schubert
@ 2024-11-27 13:40 ` Bernd Schubert
2024-11-27 13:40 ` [PATCH RFC v7 11/16] fuse: {uring} Allow to queue fg requests through io-uring Bernd Schubert
` (7 subsequent siblings)
17 siblings, 0 replies; 37+ messages in thread
From: Bernd Schubert @ 2024-11-27 13:40 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
Bernd Schubert
On teardown, struct file_operations::uring_cmd requests
need to be completed by calling io_uring_cmd_done().
Not completing all ring entries would result in busy io-uring
tasks that print warning messages at intervals, and in an
unreleased struct file.
Additionally, the fuse connection, and with it the ring, can
only get released when all io-uring commands are completed.
Completion is done for ring entries that are
a) in waiting state for new fuse requests - io_uring_cmd_done
is needed
b) already in userspace - io_uring_cmd_done through teardown
is not needed, the request can just get released. If the fuse
server is still active and commits such a ring entry, fuse_uring_cmd()
already checks whether the connection is active and then completes the
io-uring command itself with -ENOTCONN, i.e. special handling is not
needed.
This scheme is basically represented by the ring entry states
FRRS_WAIT and FRRS_USERSPACE.
Entries in state:
- FRRS_INIT: No action needed, do not contribute to
ring->queue_refs yet
- All other states: Are currently processed by other tasks,
so async teardown is needed and it has to wait for the two
states above. It could also be solved without an async
teardown task, but that would require additional if conditions
in hot code paths. Also, in my personal opinion, the code
looks cleaner with async teardown.
Signed-off-by: Bernd Schubert <[email protected]>
---
fs/fuse/dev.c | 8 ++
fs/fuse/dev_uring.c | 206 ++++++++++++++++++++++++++++++++++++++++++++++++++
fs/fuse/dev_uring_i.h | 51 +++++++++++++
3 files changed, 265 insertions(+)
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 143ce48a1898d911aafbba2b2e734805509169c2..ed85a5f2d6c360b8401b174bc97cc135d87e90d9 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -2280,6 +2280,12 @@ void fuse_abort_conn(struct fuse_conn *fc)
spin_unlock(&fc->lock);
fuse_dev_end_requests(&to_end);
+
+ /*
+ * fc->lock must not be taken to avoid conflicts with io-uring
+ * locks
+ */
+ fuse_uring_abort(fc);
} else {
spin_unlock(&fc->lock);
}
@@ -2291,6 +2297,8 @@ void fuse_wait_aborted(struct fuse_conn *fc)
/* matches implicit memory barrier in fuse_drop_waiting() */
smp_mb();
wait_event(fc->blocked_waitq, atomic_read(&fc->num_waiting) == 0);
+
+ fuse_uring_wait_stopped_queues(fc);
}
int fuse_dev_release(struct inode *inode, struct file *file)
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index 7bb07f5ba436fcb89537f0821f08a7167da52902..b5bb78b5f5902c8b87f3e196baaa05640380046d 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -52,6 +52,37 @@ static int fuse_ring_ent_unset_userspace(struct fuse_ring_ent *ent)
return 0;
}
+/* Abort all list queued requests on the given ring queue */
+static void fuse_uring_abort_end_queue_requests(struct fuse_ring_queue *queue)
+{
+ struct fuse_req *req;
+ LIST_HEAD(req_list);
+
+ spin_lock(&queue->lock);
+ list_for_each_entry(req, &queue->fuse_req_queue, list)
+ clear_bit(FR_PENDING, &req->flags);
+ list_splice_init(&queue->fuse_req_queue, &req_list);
+ spin_unlock(&queue->lock);
+
+ /* must not hold queue lock to avoid order issues with fi->lock */
+ fuse_dev_end_requests(&req_list);
+}
+
+void fuse_uring_abort_end_requests(struct fuse_ring *ring)
+{
+ int qid;
+ struct fuse_ring_queue *queue;
+
+ for (qid = 0; qid < ring->nr_queues; qid++) {
+ queue = READ_ONCE(ring->queues[qid]);
+ if (!queue)
+ continue;
+
+ queue->stopped = true;
+ fuse_uring_abort_end_queue_requests(queue);
+ }
+}
+
void fuse_uring_destruct(struct fuse_conn *fc)
{
struct fuse_ring *ring = fc->ring;
@@ -111,9 +142,12 @@ static struct fuse_ring *fuse_uring_create(struct fuse_conn *fc)
goto out_err;
}
+ init_waitqueue_head(&ring->stop_waitq);
+
fc->ring = ring;
ring->nr_queues = nr_queues;
ring->fc = fc;
+ atomic_set(&ring->queue_refs, 0);
spin_unlock(&fc->lock);
return ring;
@@ -175,6 +209,174 @@ fuse_uring_async_send_to_ring(struct io_uring_cmd *cmd,
io_uring_cmd_done(cmd, 0, 0, issue_flags);
}
+static void fuse_uring_stop_fuse_req_end(struct fuse_ring_ent *ent)
+{
+ struct fuse_req *req = ent->fuse_req;
+
+ /* remove entry from fuse_pqueue->processing */
+ list_del_init(&req->list);
+ ent->fuse_req = NULL;
+ clear_bit(FR_SENT, &req->flags);
+ req->out.h.error = -ECONNABORTED;
+ fuse_request_end(req);
+}
+
+/*
+ * Release a request/entry on connection tear down
+ */
+static void fuse_uring_entry_teardown(struct fuse_ring_ent *ent,
+ bool need_cmd_done)
+{
+ /*
+ * fuse_request_end() might take other locks like fi->lock and
+ * can lead to lock ordering issues
+ */
+ lockdep_assert_not_held(&ent->queue->lock);
+
+ if (need_cmd_done)
+ io_uring_cmd_done(ent->cmd, -ENOTCONN, 0,
+ IO_URING_F_UNLOCKED);
+
+ if (ent->fuse_req)
+ fuse_uring_stop_fuse_req_end(ent);
+
+ list_del_init(&ent->list);
+ kfree(ent);
+}
+
+static void fuse_uring_stop_list_entries(struct list_head *head,
+ struct fuse_ring_queue *queue,
+ enum fuse_ring_req_state exp_state)
+{
+ struct fuse_ring *ring = queue->ring;
+ struct fuse_ring_ent *ent, *next;
+ ssize_t queue_refs = SSIZE_MAX;
+ LIST_HEAD(to_teardown);
+
+ spin_lock(&queue->lock);
+ list_for_each_entry_safe(ent, next, head, list) {
+ if (ent->state != exp_state) {
+ pr_warn("entry teardown qid=%d state=%d expected=%d",
+ queue->qid, ent->state, exp_state);
+ continue;
+ }
+
+ list_move(&ent->list, &to_teardown);
+ }
+ spin_unlock(&queue->lock);
+
+ /* no queue lock to avoid lock order issues */
+ list_for_each_entry_safe(ent, next, &to_teardown, list) {
+ bool need_cmd_done = ent->state != FRRS_USERSPACE;
+
+ fuse_uring_entry_teardown(ent, need_cmd_done);
+ queue_refs = atomic_dec_return(&ring->queue_refs);
+
+ WARN_ON_ONCE(queue_refs < 0);
+ }
+}
+
+static void fuse_uring_teardown_entries(struct fuse_ring_queue *queue)
+{
+ fuse_uring_stop_list_entries(&queue->ent_in_userspace, queue,
+ FRRS_USERSPACE);
+ fuse_uring_stop_list_entries(&queue->ent_avail_queue, queue, FRRS_WAIT);
+}
+
+/*
+ * Log state debug info
+ */
+static void fuse_uring_log_ent_state(struct fuse_ring *ring)
+{
+ int qid;
+ struct fuse_ring_ent *ent;
+
+ for (qid = 0; qid < ring->nr_queues; qid++) {
+ struct fuse_ring_queue *queue = ring->queues[qid];
+
+ if (!queue)
+ continue;
+
+ spin_lock(&queue->lock);
+ /*
+ * Log entries from the intermediate queue, the other queues
+ * should be empty
+ */
+ list_for_each_entry(ent, &queue->ent_w_req_queue, list) {
+ pr_info(" ent-req-queue ring=%p qid=%d ent=%p state=%d\n",
+ ring, qid, ent, ent->state);
+ }
+ list_for_each_entry(ent, &queue->ent_commit_queue, list) {
+ pr_info(" ent-commit-queue ring=%p qid=%d ent=%p state=%d\n",
+ ring, qid, ent, ent->state);
+ }
+ spin_unlock(&queue->lock);
+ }
+ ring->stop_debug_log = 1;
+}
+
+static void fuse_uring_async_stop_queues(struct work_struct *work)
+{
+ int qid;
+ struct fuse_ring *ring =
+ container_of(work, struct fuse_ring, async_teardown_work.work);
+
+ /* XXX code dup */
+ for (qid = 0; qid < ring->nr_queues; qid++) {
+ struct fuse_ring_queue *queue = READ_ONCE(ring->queues[qid]);
+
+ if (!queue)
+ continue;
+
+ fuse_uring_teardown_entries(queue);
+ }
+
+ /*
+ * Some ring entries might be in the middle of IO operations,
+ * i.e. in the process of getting handled by file_operations::uring_cmd
+ * or on the way to userspace - we could handle that with conditions in
+ * run time code, but it is easier/cleaner to have an async teardown
+ * handler if there are still queue references left.
+ */
+ if (atomic_read(&ring->queue_refs) > 0) {
+ if (time_after(jiffies,
+ ring->teardown_time + FUSE_URING_TEARDOWN_TIMEOUT))
+ fuse_uring_log_ent_state(ring);
+
+ schedule_delayed_work(&ring->async_teardown_work,
+ FUSE_URING_TEARDOWN_INTERVAL);
+ } else {
+ wake_up_all(&ring->stop_waitq);
+ }
+}
+
+/*
+ * Stop the ring queues
+ */
+void fuse_uring_stop_queues(struct fuse_ring *ring)
+{
+ int qid;
+
+ for (qid = 0; qid < ring->nr_queues; qid++) {
+ struct fuse_ring_queue *queue = READ_ONCE(ring->queues[qid]);
+
+ if (!queue)
+ continue;
+
+ fuse_uring_teardown_entries(queue);
+ }
+
+ if (atomic_read(&ring->queue_refs) > 0) {
+ ring->teardown_time = jiffies;
+ INIT_DELAYED_WORK(&ring->async_teardown_work,
+ fuse_uring_async_stop_queues);
+ schedule_delayed_work(&ring->async_teardown_work,
+ FUSE_URING_TEARDOWN_INTERVAL);
+ } else {
+ wake_up_all(&ring->stop_waitq);
+ }
+}
+
/*
* Checks for errors and stores it into the request
*/
@@ -526,6 +728,9 @@ static int fuse_uring_commit_fetch(struct io_uring_cmd *cmd, int issue_flags,
return err;
fpq = queue->fpq;
+ if (!READ_ONCE(fc->connected) || READ_ONCE(queue->stopped))
+ return err;
+
spin_lock(&queue->lock);
/* Find a request based on the unique ID of the fuse request
* This should get revised, as it needs a hash calculation and list
@@ -683,6 +888,7 @@ static int fuse_uring_fetch(struct io_uring_cmd *cmd, unsigned int issue_flags,
if (WARN_ON_ONCE(err != 0))
goto err;
+ atomic_inc(&ring->queue_refs);
_fuse_uring_fetch(ring_ent, cmd, issue_flags);
return 0;
diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
index b2722828bd20dce5f5bd7884ba87861fb6e0d97b..52259714ffc59a38b21f834ae5e317fe818863dc 100644
--- a/fs/fuse/dev_uring_i.h
+++ b/fs/fuse/dev_uring_i.h
@@ -11,6 +11,9 @@
#ifdef CONFIG_FUSE_IO_URING
+#define FUSE_URING_TEARDOWN_TIMEOUT (5 * HZ)
+#define FUSE_URING_TEARDOWN_INTERVAL (HZ/20)
+
enum fuse_ring_req_state {
FRRS_INVALID = 0,
@@ -85,6 +88,8 @@ struct fuse_ring_queue {
struct list_head fuse_req_queue;
struct fuse_pqueue fpq;
+
+ bool stopped;
};
/**
@@ -99,12 +104,51 @@ struct fuse_ring {
size_t nr_queues;
struct fuse_ring_queue **queues;
+ /*
+ * Log ring entry states once on stop when entries cannot be
+ * released
+ */
+ unsigned int stop_debug_log : 1;
+
+ wait_queue_head_t stop_waitq;
+
+ /* async tear down */
+ struct delayed_work async_teardown_work;
+
+ /* log */
+ unsigned long teardown_time;
+
+ atomic_t queue_refs;
};
bool fuse_uring_enabled(void);
void fuse_uring_destruct(struct fuse_conn *fc);
+void fuse_uring_stop_queues(struct fuse_ring *ring);
+void fuse_uring_abort_end_requests(struct fuse_ring *ring);
int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags);
+static inline void fuse_uring_abort(struct fuse_conn *fc)
+{
+ struct fuse_ring *ring = fc->ring;
+
+ if (ring == NULL)
+ return;
+
+ if (atomic_read(&ring->queue_refs) > 0) {
+ fuse_uring_abort_end_requests(ring);
+ fuse_uring_stop_queues(ring);
+ }
+}
+
+static inline void fuse_uring_wait_stopped_queues(struct fuse_conn *fc)
+{
+ struct fuse_ring *ring = fc->ring;
+
+ if (ring)
+ wait_event(ring->stop_waitq,
+ atomic_read(&ring->queue_refs) == 0);
+}
+
#else /* CONFIG_FUSE_IO_URING */
struct fuse_ring;
@@ -122,6 +166,13 @@ static inline bool fuse_uring_enabled(void)
return false;
}
+static inline void fuse_uring_abort(struct fuse_conn *fc)
+{
+}
+
+static inline void fuse_uring_wait_stopped_queues(struct fuse_conn *fc)
+{
+}
#endif /* CONFIG_FUSE_IO_URING */
#endif /* _FS_FUSE_DEV_URING_I_H */
--
2.43.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH RFC v7 11/16] fuse: {uring} Allow to queue fg requests through io-uring
2024-11-27 13:40 [PATCH RFC v7 00/16] fuse: fuse-over-io-uring Bernd Schubert
` (9 preceding siblings ...)
2024-11-27 13:40 ` [PATCH RFC v7 10/16] fuse: {uring} Handle teardown of ring entries Bernd Schubert
@ 2024-11-27 13:40 ` Bernd Schubert
2024-12-03 14:09 ` Pavel Begunkov
2024-11-27 13:40 ` [PATCH RFC v7 12/16] fuse: {uring} Allow to queue bg " Bernd Schubert
` (6 subsequent siblings)
17 siblings, 1 reply; 37+ messages in thread
From: Bernd Schubert @ 2024-11-27 13:40 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
Bernd Schubert
This prepares queueing and sending foreground requests through
io-uring.
Signed-off-by: Bernd Schubert <[email protected]>
---
fs/fuse/dev.c | 5 +-
fs/fuse/dev_uring.c | 159 ++++++++++++++++++++++++++++++++++++++++++++++++++
fs/fuse/dev_uring_i.h | 8 +++
fs/fuse/fuse_dev_i.h | 5 ++
4 files changed, 175 insertions(+), 2 deletions(-)
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index ed85a5f2d6c360b8401b174bc97cc135d87e90d9..c53deb690cc9c7958741cd144fcad166b5721e11 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -237,7 +237,8 @@ __releases(fiq->lock)
spin_unlock(&fiq->lock);
}
-static void fuse_dev_queue_forget(struct fuse_iqueue *fiq, struct fuse_forget_link *forget)
+void fuse_dev_queue_forget(struct fuse_iqueue *fiq,
+ struct fuse_forget_link *forget)
{
spin_lock(&fiq->lock);
if (fiq->connected) {
@@ -250,7 +251,7 @@ static void fuse_dev_queue_forget(struct fuse_iqueue *fiq, struct fuse_forget_li
}
}
-static void fuse_dev_queue_interrupt(struct fuse_iqueue *fiq, struct fuse_req *req)
+void fuse_dev_queue_interrupt(struct fuse_iqueue *fiq, struct fuse_req *req)
{
spin_lock(&fiq->lock);
if (list_empty(&req->intr_entry)) {
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index b5bb78b5f5902c8b87f3e196baaa05640380046d..b1c56ccf828ec2d4cd921906fb42901fefcc6cc5 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -24,6 +24,12 @@ bool fuse_uring_enabled(void)
return enable_uring;
}
+struct fuse_uring_cmd_pdu {
+ struct fuse_ring_ent *ring_ent;
+};
+
+const struct fuse_iqueue_ops fuse_io_uring_ops;
+
static void fuse_uring_req_end(struct fuse_ring_ent *ring_ent, bool set_err,
int error)
{
@@ -773,6 +779,31 @@ static int fuse_uring_commit_fetch(struct io_uring_cmd *cmd, int issue_flags,
return 0;
}
+static bool is_ring_ready(struct fuse_ring *ring, int current_qid)
+{
+ int qid;
+ struct fuse_ring_queue *queue;
+ bool ready = true;
+
+ for (qid = 0; qid < ring->nr_queues && ready; qid++) {
+ if (current_qid == qid)
+ continue;
+
+ queue = ring->queues[qid];
+ if (!queue) {
+ ready = false;
+ break;
+ }
+
+ spin_lock(&queue->lock);
+ if (list_empty(&queue->ent_avail_queue))
+ ready = false;
+ spin_unlock(&queue->lock);
+ }
+
+ return ready;
+}
+
/*
* fuse_uring_req_fetch command handling
*/
@@ -781,11 +812,23 @@ static void _fuse_uring_fetch(struct fuse_ring_ent *ring_ent,
unsigned int issue_flags)
{
struct fuse_ring_queue *queue = ring_ent->queue;
+ struct fuse_ring *ring = queue->ring;
+ struct fuse_conn *fc = ring->fc;
+ struct fuse_iqueue *fiq = &fc->iq;
spin_lock(&queue->lock);
fuse_uring_ent_avail(ring_ent, queue);
ring_ent->cmd = cmd;
spin_unlock(&queue->lock);
+
+ if (!ring->ready) {
+ bool ready = is_ring_ready(ring, queue->qid);
+
+ if (ready) {
+ WRITE_ONCE(ring->ready, true);
+ fiq->ops = &fuse_io_uring_ops;
+ }
+ }
}
/*
@@ -945,3 +988,119 @@ int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
return -EIOCBQUEUED;
}
+
+/*
+ * This prepares and sends the ring request in fuse-uring task context.
+ * User buffers are not mapped yet - the application does not have permission
+ * to write to it - this has to be executed in ring task context.
+ */
+static void
+fuse_uring_send_req_in_task(struct io_uring_cmd *cmd,
+ unsigned int issue_flags)
+{
+ struct fuse_uring_cmd_pdu *pdu = (struct fuse_uring_cmd_pdu *)cmd->pdu;
+ struct fuse_ring_ent *ring_ent = pdu->ring_ent;
+ struct fuse_ring_queue *queue = ring_ent->queue;
+ int err;
+
+ BUILD_BUG_ON(sizeof(pdu) > sizeof(cmd->pdu));
+
+ err = fuse_uring_prepare_send(ring_ent);
+ if (err)
+ goto err;
+
+ io_uring_cmd_done(cmd, 0, 0, issue_flags);
+
+ spin_lock(&queue->lock);
+ ring_ent->state = FRRS_USERSPACE;
+ list_move(&ring_ent->list, &queue->ent_in_userspace);
+ spin_unlock(&queue->lock);
+ return;
+err:
+ fuse_uring_next_fuse_req(ring_ent, queue);
+}
+
+static struct fuse_ring_queue *fuse_uring_task_to_queue(struct fuse_ring *ring)
+{
+ unsigned int qid;
+ struct fuse_ring_queue *queue;
+
+ qid = task_cpu(current);
+
+ if (WARN_ONCE(qid >= ring->nr_queues,
+ "Core number (%u) exceeds nr queues (%zu)\n", qid,
+ ring->nr_queues))
+ qid = 0;
+
+ queue = ring->queues[qid];
+ if (WARN_ONCE(!queue, "Missing queue for qid %d\n", qid))
+ return NULL;
+
+ return queue;
+}
+
+/* queue a fuse request and send it if a ring entry is available */
+void fuse_uring_queue_fuse_req(struct fuse_iqueue *fiq, struct fuse_req *req)
+{
+ struct fuse_conn *fc = req->fm->fc;
+ struct fuse_ring *ring = fc->ring;
+ struct fuse_ring_queue *queue;
+ struct fuse_ring_ent *ring_ent = NULL;
+ int err;
+
+ err = -EINVAL;
+ queue = fuse_uring_task_to_queue(ring);
+ if (!queue)
+ goto err;
+
+ if (req->in.h.opcode != FUSE_NOTIFY_REPLY)
+ req->in.h.unique = fuse_get_unique(fiq);
+ spin_lock(&queue->lock);
+ err = -ENOTCONN;
+ if (unlikely(queue->stopped))
+ goto err_unlock;
+
+ if (!list_empty(&queue->ent_avail_queue)) {
+ ring_ent = list_first_entry(&queue->ent_avail_queue,
+ struct fuse_ring_ent, list);
+
+ fuse_uring_add_req_to_ring_ent(ring_ent, req);
+ } else {
+ list_add_tail(&req->list, &queue->fuse_req_queue);
+ }
+ spin_unlock(&queue->lock);
+
+ if (ring_ent) {
+ struct io_uring_cmd *cmd = ring_ent->cmd;
+ struct fuse_uring_cmd_pdu *pdu =
+ (struct fuse_uring_cmd_pdu *)cmd->pdu;
+
+ err = -EIO;
+ if (WARN_ON_ONCE(ring_ent->state != FRRS_FUSE_REQ))
+ goto err;
+
+ pdu->ring_ent = ring_ent;
+ io_uring_cmd_complete_in_task(cmd, fuse_uring_send_req_in_task);
+ }
+
+ return;
+
+err_unlock:
+ spin_unlock(&queue->lock);
+err:
+ req->out.h.error = err;
+ clear_bit(FR_PENDING, &req->flags);
+ fuse_request_end(req);
+}
+
+const struct fuse_iqueue_ops fuse_io_uring_ops = {
+ /* should be sent over io-uring as an enhancement */
+ .send_forget = fuse_dev_queue_forget,
+
+ /*
+ * could be sent over io-uring, but interrupts should be rare,
+ * no need to make the code complex
+ */
+ .send_interrupt = fuse_dev_queue_interrupt,
+ .send_req = fuse_uring_queue_fuse_req,
+};
diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
index 52259714ffc59a38b21f834ae5e317fe818863dc..19867d27894f9d985e224111ea586c82b4b4cfe8 100644
--- a/fs/fuse/dev_uring_i.h
+++ b/fs/fuse/dev_uring_i.h
@@ -119,6 +119,8 @@ struct fuse_ring {
unsigned long teardown_time;
atomic_t queue_refs;
+
+ bool ready;
};
bool fuse_uring_enabled(void);
@@ -126,6 +128,7 @@ void fuse_uring_destruct(struct fuse_conn *fc);
void fuse_uring_stop_queues(struct fuse_ring *ring);
void fuse_uring_abort_end_requests(struct fuse_ring *ring);
int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags);
+void fuse_uring_queue_fuse_req(struct fuse_iqueue *fiq, struct fuse_req *req);
static inline void fuse_uring_abort(struct fuse_conn *fc)
{
@@ -149,6 +152,11 @@ static inline void fuse_uring_wait_stopped_queues(struct fuse_conn *fc)
atomic_read(&ring->queue_refs) == 0);
}
+static inline bool fuse_uring_ready(struct fuse_conn *fc)
+{
+ return fc->ring && fc->ring->ready;
+}
+
#else /* CONFIG_FUSE_IO_URING */
struct fuse_ring;
diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
index d7bf72dabd84c3896d1447380649e2f4d20b0643..1d1c1e9848fba8dae46651e28809f73e165e74fe 100644
--- a/fs/fuse/fuse_dev_i.h
+++ b/fs/fuse/fuse_dev_i.h
@@ -17,6 +17,8 @@ struct fuse_arg;
struct fuse_args;
struct fuse_pqueue;
struct fuse_req;
+struct fuse_iqueue;
+struct fuse_forget_link;
struct fuse_copy_state {
int write;
@@ -58,6 +60,9 @@ int fuse_copy_args(struct fuse_copy_state *cs, unsigned int numargs,
int zeroing);
int fuse_copy_out_args(struct fuse_copy_state *cs, struct fuse_args *args,
unsigned int nbytes);
+void fuse_dev_queue_forget(struct fuse_iqueue *fiq,
+ struct fuse_forget_link *forget);
+void fuse_dev_queue_interrupt(struct fuse_iqueue *fiq, struct fuse_req *req);
#endif
--
2.43.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH RFC v7 11/16] fuse: {uring} Allow to queue fg requests through io-uring
2024-11-27 13:40 ` [PATCH RFC v7 11/16] fuse: {uring} Allow to queue fg requests through io-uring Bernd Schubert
@ 2024-12-03 14:09 ` Pavel Begunkov
2024-12-03 22:46 ` Bernd Schubert
0 siblings, 1 reply; 37+ messages in thread
From: Pavel Begunkov @ 2024-12-03 14:09 UTC (permalink / raw)
To: Bernd Schubert, Miklos Szeredi
Cc: Jens Axboe, linux-fsdevel, io-uring, Joanne Koong, Josef Bacik,
Amir Goldstein, Ming Lei, David Wei, bernd
On 11/27/24 13:40, Bernd Schubert wrote:
> This prepares queueing and sending foreground requests through
> io-uring.
>
> Signed-off-by: Bernd Schubert <[email protected]>
> ---
> fs/fuse/dev.c | 5 +-
> fs/fuse/dev_uring.c | 159 ++++++++++++++++++++++++++++++++++++++++++++++++++
> fs/fuse/dev_uring_i.h | 8 +++
> fs/fuse/fuse_dev_i.h | 5 ++
> 4 files changed, 175 insertions(+), 2 deletions(-)
>
...
> +
> +/*
> + * This prepares and sends the ring request in fuse-uring task context.
> + * User buffers are not mapped yet - the application does not have permission
> + * to write to it - this has to be executed in ring task context.
> + */
> +static void
> +fuse_uring_send_req_in_task(struct io_uring_cmd *cmd,
> + unsigned int issue_flags)
> +{
> + struct fuse_uring_cmd_pdu *pdu = (struct fuse_uring_cmd_pdu *)cmd->pdu;
> + struct fuse_ring_ent *ring_ent = pdu->ring_ent;
> + struct fuse_ring_queue *queue = ring_ent->queue;
> + int err;
> +
> + BUILD_BUG_ON(sizeof(pdu) > sizeof(cmd->pdu));
> +
> + err = fuse_uring_prepare_send(ring_ent);
> + if (err)
> + goto err;
> +
> + io_uring_cmd_done(cmd, 0, 0, issue_flags);
> +
> + spin_lock(&queue->lock);
> + ring_ent->state = FRRS_USERSPACE;
I haven't followed the cancellation/teardown path well, but don't
you need to set FRRS_USERSPACE before io_uring_cmd_done()?
E.g. while we're just before the spin_lock above here, can
fuse_uring_stop_list_entries() find the request, see that the state
is not FRRS_USERSPACE
bool need_cmd_done = ent->state != FRRS_USERSPACE;
and call io_uring_cmd_done a second time?
--
Pavel Begunkov
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH RFC v7 11/16] fuse: {uring} Allow to queue fg requests through io-uring
2024-12-03 14:09 ` Pavel Begunkov
@ 2024-12-03 22:46 ` Bernd Schubert
0 siblings, 0 replies; 37+ messages in thread
From: Bernd Schubert @ 2024-12-03 22:46 UTC (permalink / raw)
To: Pavel Begunkov, Bernd Schubert, Miklos Szeredi
Cc: Jens Axboe, linux-fsdevel, io-uring, Joanne Koong, Josef Bacik,
Amir Goldstein, Ming Lei, David Wei, bernd
On 12/3/24 15:09, Pavel Begunkov wrote:
> On 11/27/24 13:40, Bernd Schubert wrote:
>> This prepares queueing and sending foreground requests through
>> io-uring.
>>
>> Signed-off-by: Bernd Schubert <[email protected]>
>> ---
>> fs/fuse/dev.c | 5 +-
>> fs/fuse/dev_uring.c | 159 ++++++++++++++++++++++++++++++++++++++++
>> ++++++++++
>> fs/fuse/dev_uring_i.h | 8 +++
>> fs/fuse/fuse_dev_i.h | 5 ++
>> 4 files changed, 175 insertions(+), 2 deletions(-)
>>
> ...
>> +
>> +/*
>> + * This prepares and sends the ring request in fuse-uring task context.
>> + * User buffers are not mapped yet - the application does not have
>> permission
>> + * to write to it - this has to be executed in ring task context.
>> + */
>> +static void
>> +fuse_uring_send_req_in_task(struct io_uring_cmd *cmd,
>> + unsigned int issue_flags)
>> +{
>> + struct fuse_uring_cmd_pdu *pdu = (struct fuse_uring_cmd_pdu
>> *)cmd->pdu;
>> + struct fuse_ring_ent *ring_ent = pdu->ring_ent;
>> + struct fuse_ring_queue *queue = ring_ent->queue;
>> + int err;
>> +
>> + BUILD_BUG_ON(sizeof(pdu) > sizeof(cmd->pdu));
>> +
>> + err = fuse_uring_prepare_send(ring_ent);
>> + if (err)
>> + goto err;
>> +
>> + io_uring_cmd_done(cmd, 0, 0, issue_flags);
>> +
>> + spin_lock(&queue->lock);
>> + ring_ent->state = FRRS_USERSPACE;
>
> I haven't followed the cancellation/teardown path well, but don't
> you need to set FRRS_USERSPACE before io_uring_cmd_done()?
>
> E.g. while we're just before the spin_lock above here, can
> fuse_uring_stop_list_entries() find the request, see that the state
> is not FRRS_USERSPACE
>
> bool need_cmd_done = ent->state != FRRS_USERSPACE;
>
> and call io_uring_cmd_done a second time?
Sorry about the confusion, I had actually already fixed it in patch 14,
the one that added handling of IO_URING_F_TASK_DEAD and that you asked
to merge into this patch. Obviously, at least that part should have
been part of this patch from the start.
Thanks again for your thorough review!
Bernd
^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH RFC v7 12/16] fuse: {uring} Allow to queue bg requests through io-uring
2024-11-27 13:40 [PATCH RFC v7 00/16] fuse: fuse-over-io-uring Bernd Schubert
` (10 preceding siblings ...)
2024-11-27 13:40 ` [PATCH RFC v7 11/16] fuse: {uring} Allow to queue fg requests through io-uring Bernd Schubert
@ 2024-11-27 13:40 ` Bernd Schubert
2024-11-27 13:40 ` [PATCH RFC v7 13/16] io_uring/cmd: let cmds to know about dying task Bernd Schubert
` (5 subsequent siblings)
17 siblings, 0 replies; 37+ messages in thread
From: Bernd Schubert @ 2024-11-27 13:40 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
Bernd Schubert
This prepares queueing and sending background requests through
io-uring.
Signed-off-by: Bernd Schubert <[email protected]>
---
fs/fuse/dev.c | 26 ++++++++++++-
fs/fuse/dev_uring.c | 103 ++++++++++++++++++++++++++++++++++++++++++++++++++
fs/fuse/dev_uring_i.h | 6 +++
3 files changed, 134 insertions(+), 1 deletion(-)
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index c53deb690cc9c7958741cd144fcad166b5721e11..5134bd158ce0d24f77a4b622b87ccb83b8c92f3f 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -568,7 +568,25 @@ ssize_t __fuse_simple_request(struct mnt_idmap *idmap,
return ret;
}
-static bool fuse_request_queue_background(struct fuse_req *req)
+#ifdef CONFIG_FUSE_IO_URING
+static bool fuse_request_queue_background_uring(struct fuse_conn *fc,
+ struct fuse_req *req)
+{
+ struct fuse_iqueue *fiq = &fc->iq;
+
+ req->in.h.unique = fuse_get_unique(fiq);
+ req->in.h.len = sizeof(struct fuse_in_header) +
+ fuse_len_args(req->args->in_numargs,
+ (struct fuse_arg *) req->args->in_args);
+
+ return fuse_uring_queue_bq_req(req);
+}
+#endif
+
+/*
+ * @return true if queued
+ */
+static int fuse_request_queue_background(struct fuse_req *req)
{
struct fuse_mount *fm = req->fm;
struct fuse_conn *fc = fm->fc;
@@ -580,6 +598,12 @@ static bool fuse_request_queue_background(struct fuse_req *req)
atomic_inc(&fc->num_waiting);
}
__set_bit(FR_ISREPLY, &req->flags);
+
+#ifdef CONFIG_FUSE_IO_URING
+ if (fuse_uring_ready(fc))
+ return fuse_request_queue_background_uring(fc, req);
+#endif
+
spin_lock(&fc->bg_lock);
if (likely(fc->connected)) {
fc->num_background++;
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index b1c56ccf828ec2d4cd921906fb42901fefcc6cc5..66addb5d00c36d84a0d8d1f470f5ae10d8ee3f6f 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -30,10 +30,52 @@ struct fuse_uring_cmd_pdu {
const struct fuse_iqueue_ops fuse_io_uring_ops;
+static void fuse_uring_flush_bg(struct fuse_ring_queue *queue)
+{
+ struct fuse_ring *ring = queue->ring;
+ struct fuse_conn *fc = ring->fc;
+
+ lockdep_assert_held(&queue->lock);
+ lockdep_assert_held(&fc->bg_lock);
+
+ /*
+ * Allow one bg request per queue, ignoring global fc limits.
+ * This prevents a single queue from consuming all resources and
+ * eliminates the need for remote queue wake-ups when global
+ * limits are met but this queue has no more waiting requests.
+ */
+ while ((fc->active_background < fc->max_background ||
+ !queue->active_background) &&
+ (!list_empty(&queue->fuse_req_bg_queue))) {
+ struct fuse_req *req;
+
+ req = list_first_entry(&queue->fuse_req_bg_queue,
+ struct fuse_req, list);
+ fc->active_background++;
+ queue->active_background++;
+
+ list_move_tail(&req->list, &queue->fuse_req_queue);
+ }
+}
+
static void fuse_uring_req_end(struct fuse_ring_ent *ring_ent, bool set_err,
int error)
{
+ struct fuse_ring_queue *queue = ring_ent->queue;
struct fuse_req *req = ring_ent->fuse_req;
+ struct fuse_ring *ring = queue->ring;
+ struct fuse_conn *fc = ring->fc;
+
+ lockdep_assert_not_held(&queue->lock);
+ spin_lock(&queue->lock);
+ if (test_bit(FR_BACKGROUND, &req->flags)) {
+ queue->active_background--;
+ spin_lock(&fc->bg_lock);
+ fuse_uring_flush_bg(queue);
+ spin_unlock(&fc->bg_lock);
+ }
+
+ spin_unlock(&queue->lock);
if (set_err)
req->out.h.error = error;
@@ -78,6 +120,7 @@ void fuse_uring_abort_end_requests(struct fuse_ring *ring)
{
int qid;
struct fuse_ring_queue *queue;
+ struct fuse_conn *fc = ring->fc;
for (qid = 0; qid < ring->nr_queues; qid++) {
queue = READ_ONCE(ring->queues[qid]);
@@ -85,6 +128,13 @@ void fuse_uring_abort_end_requests(struct fuse_ring *ring)
continue;
queue->stopped = true;
+
+ WARN_ON_ONCE(ring->fc->max_background != UINT_MAX);
+ spin_lock(&queue->lock);
+ spin_lock(&fc->bg_lock);
+ fuse_uring_flush_bg(queue);
+ spin_unlock(&fc->bg_lock);
+ spin_unlock(&queue->lock);
fuse_uring_abort_end_queue_requests(queue);
}
}
@@ -198,6 +248,7 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
INIT_LIST_HEAD(&queue->ent_w_req_queue);
INIT_LIST_HEAD(&queue->ent_in_userspace);
INIT_LIST_HEAD(&queue->fuse_req_queue);
+ INIT_LIST_HEAD(&queue->fuse_req_bg_queue);
queue->fpq.processing = pq;
fuse_pqueue_init(&queue->fpq);
@@ -1093,6 +1144,58 @@ void fuse_uring_queue_fuse_req(struct fuse_iqueue *fiq, struct fuse_req *req)
fuse_request_end(req);
}
+bool fuse_uring_queue_bq_req(struct fuse_req *req)
+{
+ struct fuse_conn *fc = req->fm->fc;
+ struct fuse_ring *ring = fc->ring;
+ struct fuse_ring_queue *queue;
+ struct fuse_ring_ent *ring_ent = NULL;
+
+ queue = fuse_uring_task_to_queue(ring);
+ if (!queue)
+ return false;
+
+ spin_lock(&queue->lock);
+ if (unlikely(queue->stopped)) {
+ spin_unlock(&queue->lock);
+ return false;
+ }
+
+ list_add_tail(&req->list, &queue->fuse_req_bg_queue);
+
+ if (!list_empty(&queue->ent_avail_queue))
+ ring_ent = list_first_entry(&queue->ent_avail_queue,
+ struct fuse_ring_ent, list);
+
+ spin_lock(&fc->bg_lock);
+ fc->num_background++;
+ if (fc->num_background == fc->max_background)
+ fc->blocked = 1;
+ fuse_uring_flush_bg(queue);
+ spin_unlock(&fc->bg_lock);
+
+ /*
+ * Due to bg_queue flush limits there might be other bg requests
+ * in the queue that need to be handled first. Or no further req
+ * might be available.
+ */
+ req = list_first_entry_or_null(&queue->fuse_req_queue, struct fuse_req,
+ list);
+ if (ring_ent && req) {
+ struct io_uring_cmd *cmd = ring_ent->cmd;
+ struct fuse_uring_cmd_pdu *pdu =
+ (struct fuse_uring_cmd_pdu *)cmd->pdu;
+
+ fuse_uring_add_req_to_ring_ent(ring_ent, req);
+
+ pdu->ring_ent = ring_ent;
+ io_uring_cmd_complete_in_task(cmd, fuse_uring_send_req_in_task);
+ }
+ spin_unlock(&queue->lock);
+
+ return true;
+}
+
const struct fuse_iqueue_ops fuse_io_uring_ops = {
/* should be send over io-uring as enhancement */
.send_forget = fuse_dev_queue_forget,
diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
index 19867d27894f9d985e224111ea586c82b4b4cfe8..0826fb1c03e2e38dedad56552ea09461965e248f 100644
--- a/fs/fuse/dev_uring_i.h
+++ b/fs/fuse/dev_uring_i.h
@@ -87,8 +87,13 @@ struct fuse_ring_queue {
/* fuse requests waiting for an entry slot */
struct list_head fuse_req_queue;
+ /* background fuse requests */
+ struct list_head fuse_req_bg_queue;
+
struct fuse_pqueue fpq;
+ unsigned int active_background;
+
bool stopped;
};
@@ -129,6 +134,7 @@ void fuse_uring_stop_queues(struct fuse_ring *ring);
void fuse_uring_abort_end_requests(struct fuse_ring *ring);
int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags);
void fuse_uring_queue_fuse_req(struct fuse_iqueue *fiq, struct fuse_req *req);
+bool fuse_uring_queue_bq_req(struct fuse_req *req);
static inline void fuse_uring_abort(struct fuse_conn *fc)
{
--
2.43.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH RFC v7 13/16] io_uring/cmd: let cmds to know about dying task
2024-11-27 13:40 [PATCH RFC v7 00/16] fuse: fuse-over-io-uring Bernd Schubert
` (11 preceding siblings ...)
2024-11-27 13:40 ` [PATCH RFC v7 12/16] fuse: {uring} Allow to queue bg " Bernd Schubert
@ 2024-11-27 13:40 ` Bernd Schubert
2024-12-03 12:15 ` Pavel Begunkov
2024-11-27 13:40 ` [PATCH RFC v7 14/16] fuse: {uring} Handle IO_URING_F_TASK_DEAD Bernd Schubert
` (4 subsequent siblings)
17 siblings, 1 reply; 37+ messages in thread
From: Bernd Schubert @ 2024-11-27 13:40 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
Bernd Schubert
From: Pavel Begunkov <[email protected]>
When the task that submitted a request is dying, a task work for that
request might get run by a kernel thread or even worse by a half
dismantled task. We can't just cancel the task work without running the
callback as the cmd might need to do some clean up, so pass a flag
instead. If set, it's not safe to access any task resources and the
callback is expected to cancel the cmd ASAP.
Signed-off-by: Pavel Begunkov <[email protected]>
---
include/linux/io_uring_types.h | 1 +
io_uring/uring_cmd.c | 6 +++++-
2 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 4b9ba523978d203ae23fb4ec9622d6e4e35a5e36..2ee5dc105b58ab48aef347a845509367a8241b9b 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -37,6 +37,7 @@ enum io_uring_cmd_flags {
/* set when uring wants to cancel a previously issued command */
IO_URING_F_CANCEL = (1 << 11),
IO_URING_F_COMPAT = (1 << 12),
+ IO_URING_F_TASK_DEAD = (1 << 13),
};
struct io_wq_work_node {
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 39c3c816ec7882b9aa26cd45df6ade531379e40f..38b6ccb4e55a1e85d204263272a280d3272557a4 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -119,9 +119,13 @@ EXPORT_SYMBOL_GPL(io_uring_cmd_mark_cancelable);
static void io_uring_cmd_work(struct io_kiocb *req, struct io_tw_state *ts)
{
struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
+ unsigned int flags = IO_URING_F_COMPLETE_DEFER;
+
+ if (req->task->flags & PF_EXITING)
+ flags |= IO_URING_F_TASK_DEAD;
/* task_work executor checks the deffered list completion */
- ioucmd->task_work_cb(ioucmd, IO_URING_F_COMPLETE_DEFER);
+ ioucmd->task_work_cb(ioucmd, flags);
}
void __io_uring_cmd_do_in_task(struct io_uring_cmd *ioucmd,
--
2.43.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH RFC v7 13/16] io_uring/cmd: let cmds to know about dying task
2024-11-27 13:40 ` [PATCH RFC v7 13/16] io_uring/cmd: let cmds to know about dying task Bernd Schubert
@ 2024-12-03 12:15 ` Pavel Begunkov
2024-12-03 12:15 ` Bernd Schubert
0 siblings, 1 reply; 37+ messages in thread
From: Pavel Begunkov @ 2024-12-03 12:15 UTC (permalink / raw)
To: Bernd Schubert, Miklos Szeredi
Cc: Jens Axboe, linux-fsdevel, io-uring, Joanne Koong, Josef Bacik,
Amir Goldstein, Ming Lei, David Wei, bernd
On 11/27/24 13:40, Bernd Schubert wrote:
> From: Pavel Begunkov <[email protected]>
>
> When the task that submitted a request is dying, a task work for that
> request might get run by a kernel thread or even worse by a half
> dismantled task. We can't just cancel the task work without running the
> callback as the cmd might need to do some clean up, so pass a flag
> instead. If set, it's not safe to access any task resources and the
> callback is expected to cancel the cmd ASAP.
It was merged through btrfs tree, so when you rebase patches onto
something branching off v6.13-rc1 it should already be there.
--
Pavel Begunkov
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH RFC v7 13/16] io_uring/cmd: let cmds to know about dying task
2024-12-03 12:15 ` Pavel Begunkov
@ 2024-12-03 12:15 ` Bernd Schubert
0 siblings, 0 replies; 37+ messages in thread
From: Bernd Schubert @ 2024-12-03 12:15 UTC (permalink / raw)
To: Pavel Begunkov, Miklos Szeredi
Cc: Jens Axboe, linux-fsdevel, io-uring, Joanne Koong, Josef Bacik,
Amir Goldstein, Ming Lei, David Wei, bernd
On 12/3/24 13:15, Pavel Begunkov wrote:
> On 11/27/24 13:40, Bernd Schubert wrote:
>> From: Pavel Begunkov <[email protected]>
>>
>> When the task that submitted a request is dying, a task work for that
>> request might get run by a kernel thread or even worse by a half
>> dismantled task. We can't just cancel the task work without running the
>> callback as the cmd might need to do some clean up, so pass a flag
>> instead. If set, it's not safe to access any task resources and the
>> callback is expected to cancel the cmd ASAP.
>
> It was merged through btrfs tree, so when you rebase patches onto
> something branching off v6.13-rc1 it should already be there.
Yeah thanks, I had just rebased the branch to 6.13 and had already
noticed and dropped this patch.
Thanks,
Bernd
^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH RFC v7 14/16] fuse: {uring} Handle IO_URING_F_TASK_DEAD
2024-11-27 13:40 [PATCH RFC v7 00/16] fuse: fuse-over-io-uring Bernd Schubert
` (12 preceding siblings ...)
2024-11-27 13:40 ` [PATCH RFC v7 13/16] io_uring/cmd: let cmds to know about dying task Bernd Schubert
@ 2024-11-27 13:40 ` Bernd Schubert
2024-12-03 12:20 ` Pavel Begunkov
2024-11-27 13:40 ` [PATCH RFC v7 15/16] fuse: {io-uring} Prevent mount point hang on fuse-server termination Bernd Schubert
` (3 subsequent siblings)
17 siblings, 1 reply; 37+ messages in thread
From: Bernd Schubert @ 2024-11-27 13:40 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
Bernd Schubert
The ring task is terminating, so it is not safe to still access
its resources. There is also no need for further actions.
Signed-off-by: Bernd Schubert <[email protected]>
---
fs/fuse/dev_uring.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index 66addb5d00c36d84a0d8d1f470f5ae10d8ee3f6f..94dc3f56d4ab604eb4b87d3b9731567e3a214b0a 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -1056,16 +1056,22 @@ fuse_uring_send_req_in_task(struct io_uring_cmd *cmd,
BUILD_BUG_ON(sizeof(pdu) > sizeof(cmd->pdu));
+ if (unlikely(issue_flags & IO_URING_F_TASK_DEAD)) {
+ err = -ECANCELED;
+ goto terminating;
+ }
+
err = fuse_uring_prepare_send(ring_ent);
if (err)
goto err;
- io_uring_cmd_done(cmd, 0, 0, issue_flags);
-
+terminating:
spin_lock(&queue->lock);
ring_ent->state = FRRS_USERSPACE;
list_move(&ring_ent->list, &queue->ent_in_userspace);
spin_unlock(&queue->lock);
+ io_uring_cmd_done(cmd, err, 0, issue_flags);
+
return;
err:
fuse_uring_next_fuse_req(ring_ent, queue);
--
2.43.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH RFC v7 14/16] fuse: {uring} Handle IO_URING_F_TASK_DEAD
2024-11-27 13:40 ` [PATCH RFC v7 14/16] fuse: {uring} Handle IO_URING_F_TASK_DEAD Bernd Schubert
@ 2024-12-03 12:20 ` Pavel Begunkov
0 siblings, 0 replies; 37+ messages in thread
From: Pavel Begunkov @ 2024-12-03 12:20 UTC (permalink / raw)
To: Bernd Schubert, Miklos Szeredi
Cc: Jens Axboe, linux-fsdevel, io-uring, Joanne Koong, Josef Bacik,
Amir Goldstein, Ming Lei, David Wei, bernd
On 11/27/24 13:40, Bernd Schubert wrote:
> The ring task is terminating, so it is not safe to still access
> its resources. There is also no need for further actions.
>
> Signed-off-by: Bernd Schubert <[email protected]>
> ---
> fs/fuse/dev_uring.c | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
> index 66addb5d00c36d84a0d8d1f470f5ae10d8ee3f6f..94dc3f56d4ab604eb4b87d3b9731567e3a214b0a 100644
> --- a/fs/fuse/dev_uring.c
> +++ b/fs/fuse/dev_uring.c
> @@ -1056,16 +1056,22 @@ fuse_uring_send_req_in_task(struct io_uring_cmd *cmd,
>
> BUILD_BUG_ON(sizeof(pdu) > sizeof(cmd->pdu));
>
> + if (unlikely(issue_flags & IO_URING_F_TASK_DEAD)) {
> + err = -ECANCELED;
> + goto terminating;
> + }
Can you just fix up the patch that added the function? Patch 11, I
assume. No need to keep a bug for a couple of commits when you can
avoid introducing it in the first place.
--
Pavel Begunkov
^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH RFC v7 15/16] fuse: {io-uring} Prevent mount point hang on fuse-server termination
2024-11-27 13:40 [PATCH RFC v7 00/16] fuse: fuse-over-io-uring Bernd Schubert
` (13 preceding siblings ...)
2024-11-27 13:40 ` [PATCH RFC v7 14/16] fuse: {uring} Handle IO_URING_F_TASK_DEAD Bernd Schubert
@ 2024-11-27 13:40 ` Bernd Schubert
2024-11-27 13:40 ` [PATCH RFC v7 16/16] fuse: enable fuse-over-io-uring Bernd Schubert
` (2 subsequent siblings)
17 siblings, 0 replies; 37+ messages in thread
From: Bernd Schubert @ 2024-11-27 13:40 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
Bernd Schubert
When the fuse-server terminates while the fuse-client or kernel
still has queued URING_CMDs, these commands retain references
to the struct file used by the fuse connection. This prevents
fuse_dev_release() from being invoked, resulting in a hung mount
point.
This patch addresses the issue by making queued URING_CMDs
cancelable, allowing fuse_dev_release() to proceed as expected
and preventing the mount point from hanging.
Signed-off-by: Bernd Schubert <[email protected]>
---
fs/fuse/dev_uring.c | 109 ++++++++++++++++++++++++++++++++++++++------------
fs/fuse/dev_uring_i.h | 12 ++++++
2 files changed, 95 insertions(+), 26 deletions(-)
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index 94dc3f56d4ab604eb4b87d3b9731567e3a214b0a..fe24e31bbfecec526f88bc5b82b0aa132357c1cc 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -26,6 +26,7 @@ bool fuse_uring_enabled(void)
struct fuse_uring_cmd_pdu {
struct fuse_ring_ent *ring_ent;
+ struct fuse_ring_queue *queue;
};
const struct fuse_iqueue_ops fuse_io_uring_ops;
@@ -221,6 +222,7 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
struct fuse_conn *fc = ring->fc;
struct fuse_ring_queue *queue;
struct list_head *pq;
+ struct fuse_ring_ent *ent, *next;
queue = kzalloc(sizeof(*queue), GFP_KERNEL_ACCOUNT);
if (!queue)
@@ -249,6 +251,12 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
INIT_LIST_HEAD(&queue->ent_in_userspace);
INIT_LIST_HEAD(&queue->fuse_req_queue);
INIT_LIST_HEAD(&queue->fuse_req_bg_queue);
+ INIT_LIST_HEAD(&queue->ent_released);
+
+ list_for_each_entry_safe(ent, next, &queue->ent_released, list) {
+ list_del_init(&ent->list);
+ kfree(ent);
+ }
queue->fpq.processing = pq;
fuse_pqueue_init(&queue->fpq);
@@ -281,24 +289,27 @@ static void fuse_uring_stop_fuse_req_end(struct fuse_ring_ent *ent)
/*
* Release a request/entry on connection tear down
*/
-static void fuse_uring_entry_teardown(struct fuse_ring_ent *ent,
- bool need_cmd_done)
+static void fuse_uring_entry_teardown(struct fuse_ring_ent *ent)
{
- /*
- * fuse_request_end() might take other locks like fi->lock and
- * can lead to lock ordering issues
- */
- lockdep_assert_not_held(&ent->queue->lock);
+ struct fuse_ring_queue *queue = ent->queue;
- if (need_cmd_done)
+ if (ent->need_cmd_done)
io_uring_cmd_done(ent->cmd, -ENOTCONN, 0,
IO_URING_F_UNLOCKED);
if (ent->fuse_req)
fuse_uring_stop_fuse_req_end(ent);
- list_del_init(&ent->list);
- kfree(ent);
+ /*
+ * The entry must not be freed immediately, due to direct pointer
+ * access of entries through IO_URING_F_CANCEL - there is a risk of a
+ * race between daemon termination (which triggers IO_URING_F_CANCEL)
+ * and access of entries without checking the list state first.
+ */
+ spin_lock(&queue->lock);
+ list_move(&ent->list, &queue->ent_released);
+ ent->state = FRRS_RELEASED;
+ spin_unlock(&queue->lock);
}
static void fuse_uring_stop_list_entries(struct list_head *head,
@@ -318,15 +329,15 @@ static void fuse_uring_stop_list_entries(struct list_head *head,
continue;
}
+ ent->need_cmd_done = ent->state != FRRS_USERSPACE;
+ ent->state = FRRS_TEARDOWN;
list_move(&ent->list, &to_teardown);
}
spin_unlock(&queue->lock);
/* no queue lock to avoid lock order issues */
list_for_each_entry_safe(ent, next, &to_teardown, list) {
- bool need_cmd_done = ent->state != FRRS_USERSPACE;
-
- fuse_uring_entry_teardown(ent, need_cmd_done);
+ fuse_uring_entry_teardown(ent);
queue_refs = atomic_dec_return(&ring->queue_refs);
WARN_ON_ONCE(queue_refs < 0);
@@ -434,6 +445,49 @@ void fuse_uring_stop_queues(struct fuse_ring *ring)
}
}
+/*
+ * Handle IO_URING_F_CANCEL, typically should come on daemon termination
+ */
+static void fuse_uring_cancel(struct io_uring_cmd *cmd,
+ unsigned int issue_flags, struct fuse_conn *fc)
+{
+ struct fuse_uring_cmd_pdu *pdu = (struct fuse_uring_cmd_pdu *)cmd->pdu;
+ struct fuse_ring_queue *queue = pdu->queue;
+ struct fuse_ring_ent *ent = pdu->ring_ent;
+ bool need_cmd_done = false;
+
+ /*
+ * direct access on ent - it must not be destructed as long as
+ * IO_URING_F_CANCEL might come up
+ */
+ spin_lock(&queue->lock);
+ if (ent->state == FRRS_WAIT) {
+ ent->state = FRRS_USERSPACE;
+ list_move(&ent->list, &queue->ent_in_userspace);
+ need_cmd_done = true;
+ }
+ spin_unlock(&queue->lock);
+
+ if (need_cmd_done)
+ io_uring_cmd_done(cmd, -ENOTCONN, 0, issue_flags);
+
+ /*
+ * releasing the last entry should trigger fuse_dev_release() if
+ * the daemon was terminated
+ */
+}
+
+static void fuse_uring_prepare_cancel(struct io_uring_cmd *cmd, int issue_flags,
+ struct fuse_ring_ent *ring_ent)
+{
+ struct fuse_uring_cmd_pdu *pdu = (struct fuse_uring_cmd_pdu *)cmd->pdu;
+
+ pdu->ring_ent = ring_ent;
+ pdu->queue = ring_ent->queue;
+
+ io_uring_cmd_mark_cancelable(cmd, issue_flags);
+}
+
/*
* Checks for errors and stores it into the request
*/
@@ -638,8 +692,10 @@ static int fuse_uring_send_next_to_ring(struct fuse_ring_ent *ring_ent)
* Make a ring entry available for fuse_req assignment
*/
static void fuse_uring_ent_avail(struct fuse_ring_ent *ring_ent,
- struct fuse_ring_queue *queue)
+ struct fuse_ring_queue *queue,
+ unsigned int issue_flags)
{
+ fuse_uring_prepare_cancel(ring_ent->cmd, issue_flags, ring_ent);
list_move(&ring_ent->list, &queue->ent_avail_queue);
ring_ent->state = FRRS_WAIT;
}
@@ -742,7 +798,8 @@ static void fuse_uring_commit(struct fuse_ring_ent *ring_ent,
* Get the next fuse req and send it
*/
static void fuse_uring_next_fuse_req(struct fuse_ring_ent *ring_ent,
- struct fuse_ring_queue *queue)
+ struct fuse_ring_queue *queue,
+ unsigned int issue_flags)
{
int has_next, err;
int prev_state = ring_ent->state;
@@ -751,7 +808,7 @@ static void fuse_uring_next_fuse_req(struct fuse_ring_ent *ring_ent,
spin_lock(&queue->lock);
has_next = fuse_uring_ent_assign_req(ring_ent);
if (!has_next) {
- fuse_uring_ent_avail(ring_ent, queue);
+ fuse_uring_ent_avail(ring_ent, queue, issue_flags);
spin_unlock(&queue->lock);
break; /* no request left */
}
@@ -826,7 +883,7 @@ static int fuse_uring_commit_fetch(struct io_uring_cmd *cmd, int issue_flags,
* and fetching is done in one step vs legacy fuse, which has separated
* read (fetch request) and write (commit result).
*/
- fuse_uring_next_fuse_req(ring_ent, queue);
+ fuse_uring_next_fuse_req(ring_ent, queue, issue_flags);
return 0;
}
@@ -868,7 +925,7 @@ static void _fuse_uring_fetch(struct fuse_ring_ent *ring_ent,
struct fuse_iqueue *fiq = &fc->iq;
spin_lock(&queue->lock);
- fuse_uring_ent_avail(ring_ent, queue);
+ fuse_uring_ent_avail(ring_ent, queue, issue_flags);
ring_ent->cmd = cmd;
spin_unlock(&queue->lock);
@@ -1022,6 +1079,11 @@ int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
if (!fc->connected || fc->aborted)
return fc->aborted ? -ECONNABORTED : -ENOTCONN;
+ if ((unlikely(issue_flags & IO_URING_F_CANCEL))) {
+ fuse_uring_cancel(cmd, issue_flags, fc);
+ return 0;
+ }
+
switch (cmd_op) {
case FUSE_URING_REQ_FETCH:
err = fuse_uring_fetch(cmd, issue_flags, fc);
@@ -1074,7 +1136,7 @@ fuse_uring_send_req_in_task(struct io_uring_cmd *cmd,
return;
err:
- fuse_uring_next_fuse_req(ring_ent, queue);
+ fuse_uring_next_fuse_req(ring_ent, queue, issue_flags);
}
static struct fuse_ring_queue *fuse_uring_task_to_queue(struct fuse_ring *ring)
@@ -1129,14 +1191,11 @@ void fuse_uring_queue_fuse_req(struct fuse_iqueue *fiq, struct fuse_req *req)
if (ring_ent) {
struct io_uring_cmd *cmd = ring_ent->cmd;
- struct fuse_uring_cmd_pdu *pdu =
- (struct fuse_uring_cmd_pdu *)cmd->pdu;
-
err = -EIO;
if (WARN_ON_ONCE(ring_ent->state != FRRS_FUSE_REQ))
goto err;
- pdu->ring_ent = ring_ent;
+ /* pdu already set by preparing IO_URING_F_CANCEL */
io_uring_cmd_complete_in_task(cmd, fuse_uring_send_req_in_task);
}
@@ -1189,12 +1248,10 @@ bool fuse_uring_queue_bq_req(struct fuse_req *req)
list);
if (ring_ent && req) {
struct io_uring_cmd *cmd = ring_ent->cmd;
- struct fuse_uring_cmd_pdu *pdu =
- (struct fuse_uring_cmd_pdu *)cmd->pdu;
fuse_uring_add_req_to_ring_ent(ring_ent, req);
- pdu->ring_ent = ring_ent;
+ /* pdu already set by preparing IO_URING_F_CANCEL */
io_uring_cmd_complete_in_task(cmd, fuse_uring_send_req_in_task);
}
spin_unlock(&queue->lock);
diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
index 0826fb1c03e2e38dedad56552ea09461965e248f..df541247f07e413923b13b6bf203f301e8c1710a 100644
--- a/fs/fuse/dev_uring_i.h
+++ b/fs/fuse/dev_uring_i.h
@@ -28,6 +28,12 @@ enum fuse_ring_req_state {
/* The ring entry is in or on the way to user space */
FRRS_USERSPACE,
+
+ /* The ring entry is in teardown */
+ FRRS_TEARDOWN,
+
+ /* The ring entry is released, but not freed yet */
+ FRRS_RELEASED,
};
/** A fuse ring entry, part of the ring queue */
@@ -52,6 +58,9 @@ struct fuse_ring_ent {
*/
unsigned int state;
+ /* The entry needs io_uring_cmd_done for teardown */
+ unsigned int need_cmd_done;
+
struct fuse_req *fuse_req;
};
@@ -84,6 +93,9 @@ struct fuse_ring_queue {
/* entries in userspace */
struct list_head ent_in_userspace;
+ /* entries that are released */
+ struct list_head ent_released;
+
/* fuse requests waiting for an entry slot */
struct list_head fuse_req_queue;
--
2.43.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH RFC v7 16/16] fuse: enable fuse-over-io-uring
2024-11-27 13:40 [PATCH RFC v7 00/16] fuse: fuse-over-io-uring Bernd Schubert
` (14 preceding siblings ...)
2024-11-27 13:40 ` [PATCH RFC v7 15/16] fuse: {io-uring} Prevent mount point hang on fuse-server termination Bernd Schubert
@ 2024-11-27 13:40 ` Bernd Schubert
2024-11-27 13:45 ` [PATCH RFC v7 00/16] fuse: fuse-over-io-uring Bernd Schubert
2024-12-03 14:24 ` Pavel Begunkov
17 siblings, 0 replies; 37+ messages in thread
From: Bernd Schubert @ 2024-11-27 13:40 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd,
Bernd Schubert
All required parts are handled now, fuse-io-uring can
be enabled.
Signed-off-by: Bernd Schubert <[email protected]>
---
fs/fuse/dev_uring.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index fe24e31bbfecec526f88bc5b82b0aa132357c1cc..0039be997ededd29b094d84001958de3de91fe5a 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -1060,9 +1060,6 @@ int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
u32 cmd_op = cmd->cmd_op;
int err;
- /* Disabled for now, especially as teardown is not implemented yet */
- pr_info_ratelimited("fuse-io-uring is not enabled yet\n");
- return -EOPNOTSUPP;
if (!enable_uring) {
pr_info_ratelimited("fuse-io-uring is disabled\n");
--
2.43.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH RFC v7 00/16] fuse: fuse-over-io-uring
2024-11-27 13:40 [PATCH RFC v7 00/16] fuse: fuse-over-io-uring Bernd Schubert
` (15 preceding siblings ...)
2024-11-27 13:40 ` [PATCH RFC v7 16/16] fuse: enable fuse-over-io-uring Bernd Schubert
@ 2024-11-27 13:45 ` Bernd Schubert
2024-12-03 14:24 ` Pavel Begunkov
17 siblings, 0 replies; 37+ messages in thread
From: Bernd Schubert @ 2024-11-27 13:45 UTC (permalink / raw)
To: Bernd Schubert, Miklos Szeredi
Cc: Jens Axboe, Pavel Begunkov, linux-fsdevel, io-uring, Joanne Koong,
Josef Bacik, Amir Goldstein, Ming Lei, David Wei, bernd
On 11/27/24 14:40, Bernd Schubert wrote:
> [I removed RFC status as the design should be in place now
> and as xfstests pass. I am still reviewing patches myself, though,
> and also repeating tests with different queue sizes.]
Sorry, my intent was to remove RFC status, but then forgot to
actually unset it :/
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH RFC v7 00/16] fuse: fuse-over-io-uring
2024-11-27 13:40 [PATCH RFC v7 00/16] fuse: fuse-over-io-uring Bernd Schubert
` (16 preceding siblings ...)
2024-11-27 13:45 ` [PATCH RFC v7 00/16] fuse: fuse-over-io-uring Bernd Schubert
@ 2024-12-03 14:24 ` Pavel Begunkov
2024-12-03 14:32 ` Bernd Schubert
17 siblings, 1 reply; 37+ messages in thread
From: Pavel Begunkov @ 2024-12-03 14:24 UTC (permalink / raw)
To: Bernd Schubert, Miklos Szeredi
Cc: Jens Axboe, linux-fsdevel, io-uring, Joanne Koong, Josef Bacik,
Amir Goldstein, Ming Lei, David Wei, bernd
On 11/27/24 13:40, Bernd Schubert wrote:
> [I removed RFC status as the design should be in place now
> and as xfstests pass. I am still reviewing patches myself, though,
> and also repeating tests with different queue sizes.]
I left a few comments, but it looks sane. At least on the io_uring
side nothing weird caught my eye. Cancellations might be a bit
worrisome as usual, so would be nice to give it a good run with
sanitizers.
> This adds support for uring communication between kernel and
> userspace daemon using opcode the IORING_OP_URING_CMD. The basic
> approach was taken from ublk.
>
> Motivation for these patches is to increase fuse performance,
> by:
> - Reducing kernel/userspace context switches
> - Part of that is given by the ring - handling multiple
> requests on either side of kernel/userspace without the need
> to switch per request
> - Part of that is FUSE_URING_REQ_COMMIT_AND_FETCH, i.e. submitting
> the result of a request and fetching the next fuse request
> in one step. In contrast to legacy read/write to /dev/fuse
> - Core and numa affinity - one ring per core, which allows
> avoiding cpu core context switches
>
> A more detailed motivation description can be found in the
> introduction of the previous patch series
> https://lore.kernel.org/r/[email protected]
> That description also includes benchmark results with RFCv1.
> Performance with the current series needs to be tested, but will
> be lower, as several optimization patches are missing, like
> wake-up on the same core. These optimizations will be submitted
> after merging the main changes.
--
Pavel Begunkov
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH RFC v7 00/16] fuse: fuse-over-io-uring
2024-12-03 14:24 ` Pavel Begunkov
@ 2024-12-03 14:32 ` Bernd Schubert
0 siblings, 0 replies; 37+ messages in thread
From: Bernd Schubert @ 2024-12-03 14:32 UTC (permalink / raw)
To: Pavel Begunkov, Bernd Schubert, Miklos Szeredi
Cc: Jens Axboe, linux-fsdevel, io-uring, Joanne Koong, Josef Bacik,
Amir Goldstein, Ming Lei, David Wei, bernd
On 12/3/24 15:24, Pavel Begunkov wrote:
> On 11/27/24 13:40, Bernd Schubert wrote:
>> [I removed RFC status as the design should be in place now
>> and as xfstests pass. I am still reviewing patches myself, though,
>> and also repeating tests with different queue sizes.]
>
> I left a few comments, but it looks sane. At least on the io_uring
> side nothing weird caught my eye. Cancellations might be a bit
> worrisome as usual, so would be nice to give it a good run with
> sanitizers.
Thanks a lot for your reviews, a new series is in preparation and I
will send it out tomorrow to give it a test run overnight. I'm
running xfstests on a kernel that has lockdep and ASAN enabled, which
is why it takes around 15 hours (with/without FOPEN_DIRECT_IO).
And definitely, cancellation/teardown is the part that I spent most
time on.
Thanks,
Bernd
^ permalink raw reply [flat|nested] 37+ messages in thread