public inbox for [email protected]
 help / color / mirror / Atom feed
From: Bernd Schubert <[email protected]>
To: Joanne Koong <[email protected]>,
	Bernd Schubert <[email protected]>
Cc: Miklos Szeredi <[email protected]>, Jens Axboe <[email protected]>,
	Pavel Begunkov <[email protected]>,
	[email protected], [email protected],
	Josef Bacik <[email protected]>,
	Amir Goldstein <[email protected]>,
	Ming Lei <[email protected]>, David Wei <[email protected]>,
	[email protected]
Subject: Re: [PATCH RFC v7 06/16] fuse: {uring} Handle SQEs - register commands
Date: Thu, 28 Nov 2024 19:20:21 +0100	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <CAJnrk1YnWFQYG9VTr_1iUwcJmQEg3LemGOGkiqwbaqa4EaMUWw@mail.gmail.com>

Hi Joanne,

thanks for your careful review!

On 11/28/24 03:23, Joanne Koong wrote:
> On Wed, Nov 27, 2024 at 5:41 AM Bernd Schubert <[email protected]> wrote:
>>
>> This adds basic support for ring SQEs (with opcode=IORING_OP_URING_CMD).
>> For now only FUSE_URING_REQ_FETCH is handled to register queue entries.
>>
>> Signed-off-by: Bernd Schubert <[email protected]>
>> ---
>>  fs/fuse/Kconfig           |  12 ++
>>  fs/fuse/Makefile          |   1 +
>>  fs/fuse/dev.c             |   4 +
>>  fs/fuse/dev_uring.c       | 318 ++++++++++++++++++++++++++++++++++++++++++++++
>>  fs/fuse/dev_uring_i.h     | 115 +++++++++++++++++
>>  fs/fuse/fuse_i.h          |   5 +
>>  fs/fuse/inode.c           |  10 ++
>>  include/uapi/linux/fuse.h |  67 ++++++++++
>>  8 files changed, 532 insertions(+)
>>
>> diff --git a/fs/fuse/Kconfig b/fs/fuse/Kconfig
>> index 8674dbfbe59dbf79c304c587b08ebba3cfe405be..11f37cefc94b2af5a675c238801560c822b95f1a 100644
>> --- a/fs/fuse/Kconfig
>> +++ b/fs/fuse/Kconfig
>> @@ -63,3 +63,15 @@ config FUSE_PASSTHROUGH
>>           to be performed directly on a backing file.
>>
>>           If you want to allow passthrough operations, answer Y.
>> +
>> +config FUSE_IO_URING
>> +       bool "FUSE communication over io-uring"
>> +       default y
>> +       depends on FUSE_FS
>> +       depends on IO_URING
>> +       help
>> +         This allows sending FUSE requests over the IO uring interface and
>> +          also adds request core affinity.
>> +
>> +         If you want to allow fuse server/client communication through io-uring,
>> +         answer Y
>> diff --git a/fs/fuse/Makefile b/fs/fuse/Makefile
>> index ce0ff7a9007b94b4ab246b5271f227d126c768e8..fcf16b1c391a9bf11ca9f3a25b137acdb203ac47 100644
>> --- a/fs/fuse/Makefile
>> +++ b/fs/fuse/Makefile
>> @@ -14,5 +14,6 @@ fuse-y := dev.o dir.o file.o inode.o control.o xattr.o acl.o readdir.o ioctl.o
>>  fuse-y += iomode.o
>>  fuse-$(CONFIG_FUSE_DAX) += dax.o
>>  fuse-$(CONFIG_FUSE_PASSTHROUGH) += passthrough.o
>> +fuse-$(CONFIG_FUSE_IO_URING) += dev_uring.o
>>
>>  virtiofs-y := virtio_fs.o
>> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
>> index 63c3865aebb7811fdf4a5729b2181ee8321421dc..0770373492ae9ee83c4154fede9dcfd7be9fb33d 100644
>> --- a/fs/fuse/dev.c
>> +++ b/fs/fuse/dev.c
>> @@ -6,6 +6,7 @@
>>    See the file COPYING.
>>  */
>>
>> +#include "dev_uring_i.h"
>>  #include "fuse_i.h"
>>  #include "fuse_dev_i.h"
>>
>> @@ -2452,6 +2453,9 @@ const struct file_operations fuse_dev_operations = {
>>         .fasync         = fuse_dev_fasync,
>>         .unlocked_ioctl = fuse_dev_ioctl,
>>         .compat_ioctl   = compat_ptr_ioctl,
>> +#ifdef CONFIG_FUSE_IO_URING
>> +       .uring_cmd      = fuse_uring_cmd,
>> +#endif
>>  };
>>  EXPORT_SYMBOL_GPL(fuse_dev_operations);
>>
>> diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..af9c5f116ba1dcf6c01d0359d1a06491c92c32f9
>> --- /dev/null
>> +++ b/fs/fuse/dev_uring.c
>> @@ -0,0 +1,318 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * FUSE: Filesystem in Userspace
>> + * Copyright (c) 2023-2024 DataDirect Networks.
>> + */
>> +
>> +#include <linux/fs.h>
> 
> nit: for consistency, should this line be placed directly above the
> other "#include <linux..." line?
> 
>> +
>> +#include "fuse_i.h"
>> +#include "dev_uring_i.h"
>> +#include "fuse_dev_i.h"
>> +
>> +#include <linux/io_uring/cmd.h>
>> +
>> +#ifdef CONFIG_FUSE_IO_URING
>> +static bool __read_mostly enable_uring;
>> +module_param(enable_uring, bool, 0644);
>> +MODULE_PARM_DESC(enable_uring,
>> +                "Enable uring userspace communication through uring.");
> 
> nit: The double uring seems a bit repetitive to me. Maybe just "enable
> uring userspace communication" or "enable userspace communication
> through uring"? Also, super nit but I noticed the other
> MODULE_PARM_DESCs don't have trailing periods.

Fixed to "Enable userspace communication through io-uring"

> 
>> +#endif
>> +
>> +bool fuse_uring_enabled(void)
>> +{
>> +       return enable_uring;
>> +}
>> +
>> +static int fuse_ring_ent_unset_userspace(struct fuse_ring_ent *ent)
>> +{
>> +       struct fuse_ring_queue *queue = ent->queue;
>> +
>> +       lockdep_assert_held(&queue->lock);
>> +
>> +       if (WARN_ON_ONCE(ent->state != FRRS_USERSPACE))
>> +               return -EIO;
>> +
>> +       ent->state = FRRS_COMMIT;
>> +       list_move(&ent->list, &queue->ent_commit_queue);
>> +
>> +       return 0;
>> +}
>> +
>> +void fuse_uring_destruct(struct fuse_conn *fc)
>> +{
>> +       struct fuse_ring *ring = fc->ring;
>> +       int qid;
>> +
>> +       if (!ring)
>> +               return;
>> +
>> +       for (qid = 0; qid < ring->nr_queues; qid++) {
>> +               struct fuse_ring_queue *queue = ring->queues[qid];
>> +
>> +               if (!queue)
>> +                       continue;
>> +
>> +               WARN_ON(!list_empty(&queue->ent_avail_queue));
>> +               WARN_ON(!list_empty(&queue->ent_commit_queue));
>> +
>> +               kfree(queue);
>> +               ring->queues[qid] = NULL;
>> +       }
>> +
>> +       kfree(ring->queues);
>> +       kfree(ring);
>> +       fc->ring = NULL;
>> +}
>> +
>> +#define FUSE_URING_IOV_SEGS 2 /* header and payload */
> 
> nit: to make the code flow easier, might be better to move #defines to
> the top of the file after the includes.

Fixed.

> 
>> +
>> +/*
>> + * Basic ring setup for this connection based on the provided configuration
>> + */
>> +static struct fuse_ring *fuse_uring_create(struct fuse_conn *fc)
>> +{
>> +       struct fuse_ring *ring = NULL;
>> +       size_t nr_queues = num_possible_cpus();
>> +       struct fuse_ring *res = NULL;
>> +
>> +       ring = kzalloc(sizeof(*fc->ring) +
>> +                              nr_queues * sizeof(struct fuse_ring_queue),
> 
> I think you just need kzalloc(sizeof(*fc->ring)); here since you're
> allocating ring->queues later below

Ah, now I see what Miklos probably meant "with left over from previous
patches". Sorry Miklos, I had misunderstood what you meant and thanks 
Joanne for spotting it.

> 
>> +                      GFP_KERNEL_ACCOUNT);
>> +       if (!ring)
>> +               return NULL;
>> +
>> +       ring->queues = kcalloc(nr_queues, sizeof(struct fuse_ring_queue *),
>> +                              GFP_KERNEL_ACCOUNT);
>> +       if (!ring->queues)
>> +               goto out_err;
>> +
>> +       spin_lock(&fc->lock);
>> +       if (fc->ring) {
>> +               /* race, another thread created the ring in the mean time */
> 
> nit: s/mean time/meantime

Fixed.

> 
>> +               spin_unlock(&fc->lock);
>> +               res = fc->ring;
>> +               goto out_err;
>> +       }
>> +
>> +       fc->ring = ring;
>> +       ring->nr_queues = nr_queues;
>> +       ring->fc = fc;
>> +
>> +       spin_unlock(&fc->lock);
>> +       return ring;
>> +
>> +out_err:
>> +       if (ring)
> 
> I think you meant if (ring->queues)

I really meant ring, initially ring allocation error was via this goto. I just
removed the condition now.

> 
>> +               kfree(ring->queues);
>> +       kfree(ring);
>> +       return res;
>> +}
>> +
>> +static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
>> +                                                      int qid)
>> +{
>> +       struct fuse_conn *fc = ring->fc;
>> +       struct fuse_ring_queue *queue;
>> +
>> +       queue = kzalloc(sizeof(*queue), GFP_KERNEL_ACCOUNT);
>> +       if (!queue)
>> +               return ERR_PTR(-ENOMEM);
>> +       spin_lock(&fc->lock);
> 
> This probably doesn't make much of a difference but might be better to
> minimize logic inside the lock, eg do the queue initialization stuff
> outside the lock

Absolutely, fixed.

> 
>> +       if (ring->queues[qid]) {
>> +               spin_unlock(&fc->lock);
>> +               kfree(queue);
>> +               return ring->queues[qid];
>> +       }
>> +
>> +       queue->qid = qid;
>> +       queue->ring = ring;
>> +       spin_lock_init(&queue->lock);
>> +
>> +       INIT_LIST_HEAD(&queue->ent_avail_queue);
>> +       INIT_LIST_HEAD(&queue->ent_commit_queue);
>> +
>> +       WRITE_ONCE(ring->queues[qid], queue);
> 
> Just curious, why do we need WRITE_ONCE here if it's already protected
> by the fc->lock?

fuse_uring_fetch() (going to rename it _register) obtains the the ring
without holding a lock. Only when that fails it calls into 
fuse_uring_create(), which then obtains the lock. 
I.e. this is optimized to not need the lock at all.

Btw, interesting part here is that I always thought that with a single 
libfuse ring thread we should never run into races, until I noticed 
with large queue sizes (>128) that there were races in fuse_uring_fetch, 
because fuse_uring_create_queue had assigned "ring->queues[qid] = queue" 
too early, before initialization of the spin lock (some lockdep
annotations about the queue spin lock not being initialized came up). 


> 
>> +       spin_unlock(&fc->lock);
>> +
>> +       return queue;
>> +}
>> +
>> +/*
>> + * Make a ring entry available for fuse_req assignment
>> + */
>> +static void fuse_uring_ent_avail(struct fuse_ring_ent *ring_ent,
>> +                                struct fuse_ring_queue *queue)
>> +{
>> +       list_move(&ring_ent->list, &queue->ent_avail_queue);
>> +       ring_ent->state = FRRS_WAIT;
>> +}
>> +
>> +/*
>> + * fuse_uring_req_fetch command handling
>> + */
>> +static void _fuse_uring_fetch(struct fuse_ring_ent *ring_ent,
>> +                             struct io_uring_cmd *cmd,
>> +                             unsigned int issue_flags)
>> +{
>> +       struct fuse_ring_queue *queue = ring_ent->queue;
>> +
>> +       spin_lock(&queue->lock);
>> +       fuse_uring_ent_avail(ring_ent, queue);
>> +       ring_ent->cmd = cmd;
>> +       spin_unlock(&queue->lock);
>> +}
>> +
>> +/*
>> + * sqe->addr is a ptr to an iovec array, iov[0] has the headers, iov[1]
>> + * the payload
>> + */
>> +static int fuse_uring_get_iovec_from_sqe(const struct io_uring_sqe *sqe,
>> +                                        struct iovec iov[FUSE_URING_IOV_SEGS])
>> +{
>> +       struct iovec __user *uiov = u64_to_user_ptr(READ_ONCE(sqe->addr));
>> +       struct iov_iter iter;
>> +       ssize_t ret;
>> +
>> +       if (sqe->len != FUSE_URING_IOV_SEGS)
>> +               return -EINVAL;
>> +
>> +       /*
>> +        * Direction for buffer access will actually be READ and WRITE,
>> +        * using write for the import should include READ access as well.
>> +        */
>> +       ret = import_iovec(WRITE, uiov, FUSE_URING_IOV_SEGS,
>> +                          FUSE_URING_IOV_SEGS, &iov, &iter);
>> +       if (ret < 0)
>> +               return ret;
>> +
>> +       return 0;
>> +}
>> +
>> +static int fuse_uring_fetch(struct io_uring_cmd *cmd, unsigned int issue_flags,
>> +                           struct fuse_conn *fc)
>> +{
>> +       const struct fuse_uring_cmd_req *cmd_req = io_uring_sqe_cmd(cmd->sqe);
>> +       struct fuse_ring *ring = fc->ring;
>> +       struct fuse_ring_queue *queue;
>> +       struct fuse_ring_ent *ring_ent;
>> +       int err;
>> +       struct iovec iov[FUSE_URING_IOV_SEGS];
>> +
>> +       err = fuse_uring_get_iovec_from_sqe(cmd->sqe, iov);
>> +       if (err) {
>> +               pr_info_ratelimited("Failed to get iovec from sqe, err=%d\n",
>> +                                   err);
>> +               return err;
>> +       }
>> +
>> +       err = -ENOMEM;
>> +       if (!ring) {
>> +               ring = fuse_uring_create(fc);
>> +               if (!ring)
>> +                       return err;
>> +       }
>> +
>> +       queue = ring->queues[cmd_req->qid];
>> +       if (!queue) {
>> +               queue = fuse_uring_create_queue(ring, cmd_req->qid);
>> +               if (!queue)
>> +                       return err;
>> +       }
>> +
>> +       /*
>> +        * The created queue above does not need to be destructed in
>> +        * case of entry errors below, will be done at ring destruction time.
>> +        */
>> +
>> +       ring_ent = kzalloc(sizeof(*ring_ent), GFP_KERNEL_ACCOUNT);
>> +       if (ring_ent == NULL)
> 
> nit: !ring_ent

Fixed.

> 
>> +               return err;
>> +
>> +       INIT_LIST_HEAD(&ring_ent->list);
>> +
>> +       ring_ent->queue = queue;
>> +       ring_ent->cmd = cmd;
>> +
>> +       err = -EINVAL;
>> +       if (iov[0].iov_len < sizeof(struct fuse_uring_req_header)) {
>> +               pr_info_ratelimited("Invalid header len %zu\n", iov[0].iov_len);
>> +               goto err;
>> +       }
>> +
>> +       ring_ent->headers = iov[0].iov_base;
>> +       ring_ent->payload = iov[1].iov_base;
>> +       ring_ent->max_arg_len = iov[1].iov_len;
>> +
>> +       if (ring_ent->max_arg_len <
>> +           max_t(size_t, FUSE_MIN_READ_BUFFER, fc->max_write)) {
> 
> If I'm understanding this correctly, iov[0] is the header and iov[1]
> is the payload. Is this right that the payload len must always be
> equal to fc->max_write?

I'm going to drop ring_ent->max_arg_len for the next version, but it will
eventually come back once I had support for multiple payload sizes.
Typically we will need many small requests, less mid size requests and few
large size requests. Just think of a queue size of 128 and a recent system
with 192 cores.


> 
> Also, do we need to take into account fc->max_pages too?

Oh yes, thanks for spotting!

> 
>> +               pr_info_ratelimited("Invalid req payload len %zu\n",
>> +                                   ring_ent->max_arg_len);
>> +               goto err;
>> +       }
>> +
>> +       spin_lock(&queue->lock);
>> +
>> +       /*
>> +        * FUSE_URING_REQ_FETCH is an initialization exception, needs
>> +        * state override
>> +        */
>> +       ring_ent->state = FRRS_USERSPACE;
>> +       err = fuse_ring_ent_unset_userspace(ring_ent);
>> +       spin_unlock(&queue->lock);
>> +       if (WARN_ON_ONCE(err != 0))
> 
> nit: WARN_ON_ONCE(err)

Fixed.

> 
>> +               goto err;
>> +
>> +       _fuse_uring_fetch(ring_ent, cmd, issue_flags);
>> +
>> +       return 0;
>> +err:
>> +       list_del_init(&ring_ent->list);
>> +       kfree(ring_ent);
>> +       return err;
>> +}
>> +
>> +/*
>> + * Entry function from io_uring to handle the given passthrough command
>> + * (op cocde IORING_OP_URING_CMD)
>> + */
>> +int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
>> +{
>> +       struct fuse_dev *fud;
>> +       struct fuse_conn *fc;
>> +       u32 cmd_op = cmd->cmd_op;
>> +       int err;
>> +
>> +       /* Disabled for now, especially as teardown is not implemented yet */
>> +       pr_info_ratelimited("fuse-io-uring is not enabled yet\n");
>> +       return -EOPNOTSUPP;
>> +
>> +       if (!enable_uring) {
>> +               pr_info_ratelimited("fuse-io-uring is disabled\n");
>> +               return -EOPNOTSUPP;
>> +       }
>> +
>> +       fud = fuse_get_dev(cmd->file);
>> +       if (!fud) {
>> +               pr_info_ratelimited("No fuse device found\n");
>> +               return -ENOTCONN;
>> +       }
>> +       fc = fud->fc;
>> +
>> +       if (!fc->connected || fc->aborted)
>> +               return fc->aborted ? -ECONNABORTED : -ENOTCONN;
> 
> I find
> if (fc->aborted)
>   return -ECONNABORTED;
> if (!fc->connected)
>    return -ENOTCONN;
> 
> easier to read

Fine with me.

>> +
>> +       switch (cmd_op) {
>> +       case FUSE_URING_REQ_FETCH:
> 
> FUSE_URING_REQ_FETCH is only used for initialization, would it be
> clearer if this was named FUSE_URING_INIT or something like that?

 FUSE_URING_CMD_REGISTER?

> 
>> +               err = fuse_uring_fetch(cmd, issue_flags, fc);
>> +               if (err) {
>> +                       pr_info_once("fuse_uring_fetch failed err=%d\n", err);
>> +                       return err;
>> +               }
>> +               break;
>> +       default:
>> +               return -EINVAL;
>> +       }
>> +
>> +       return -EIOCBQUEUED;
>> +}
>> diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..75c644cc0b2bb3721b08f8695964815d53f46e92
>> --- /dev/null
>> +++ b/fs/fuse/dev_uring_i.h
>> @@ -0,0 +1,115 @@
>> +/* SPDX-License-Identifier: GPL-2.0
>> + *
>> + * FUSE: Filesystem in Userspace
>> + * Copyright (c) 2023-2024 DataDirect Networks.
>> + */
>> +
>> +#ifndef _FS_FUSE_DEV_URING_I_H
>> +#define _FS_FUSE_DEV_URING_I_H
>> +
>> +#include "fuse_i.h"
>> +
>> +#ifdef CONFIG_FUSE_IO_URING
>> +
>> +enum fuse_ring_req_state {
>> +       FRRS_INVALID = 0,
>> +
>> +       /* The ring entry received from userspace and it being processed */
> 
> nit: "it is being processed"
> 
>> +       FRRS_COMMIT,
>> +
>> +       /* The ring entry is waiting for new fuse requests */
>> +       FRRS_WAIT,
>> +
>> +       /* The ring entry is in or on the way to user space */
>> +       FRRS_USERSPACE,
>> +};
>> +
>> +/** A fuse ring entry, part of the ring queue */
>> +struct fuse_ring_ent {
>> +       /* userspace buffer */
>> +       struct fuse_uring_req_header __user *headers;
>> +       void *__user *payload;
>> +
>> +       /* the ring queue that owns the request */
>> +       struct fuse_ring_queue *queue;
>> +
>> +       struct io_uring_cmd *cmd;
>> +
>> +       struct list_head list;
>> +
>> +       /* size of payload buffer */
>> +       size_t max_arg_len;
>> +
>> +       /*
>> +        * state the request is currently in
>> +        * (enum fuse_ring_req_state)
>> +        */
>> +       unsigned int state;
>> +
>> +       struct fuse_req *fuse_req;
>> +};
>> +
>> +struct fuse_ring_queue {
>> +       /*
>> +        * back pointer to the main fuse uring structure that holds this
>> +        * queue
>> +        */
>> +       struct fuse_ring *ring;
>> +
>> +       /* queue id, typically also corresponds to the cpu core */
>> +       unsigned int qid;
>> +
>> +       /*
>> +        * queue lock, taken when any value in the queue changes _and_ also
>> +        * a ring entry state changes.
>> +        */
>> +       spinlock_t lock;
>> +
>> +       /* available ring entries (struct fuse_ring_ent) */
>> +       struct list_head ent_avail_queue;
>> +
>> +       /*
>> +        * entries in the process of being committed or in the process
>> +        * to be send to userspace
>> +        */
>> +       struct list_head ent_commit_queue;
>> +};
>> +
>> +/**
>> + * Describes if uring is for communication and holds alls the data needed
>> + * for uring communication
>> + */
>> +struct fuse_ring {
>> +       /* back pointer */
>> +       struct fuse_conn *fc;
>> +
>> +       /* number of ring queues */
>> +       size_t nr_queues;
>> +
>> +       struct fuse_ring_queue **queues;
>> +};
>> +
>> +bool fuse_uring_enabled(void);
>> +void fuse_uring_destruct(struct fuse_conn *fc);
>> +int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags);
>> +
>> +#else /* CONFIG_FUSE_IO_URING */
>> +
>> +struct fuse_ring;
>> +
>> +static inline void fuse_uring_create(struct fuse_conn *fc)
>> +{
>> +}
>> +
>> +static inline void fuse_uring_destruct(struct fuse_conn *fc)
>> +{
>> +}
>> +
>> +static inline bool fuse_uring_enabled(void)
>> +{
>> +       return false;
>> +}
>> +
>> +#endif /* CONFIG_FUSE_IO_URING */
>> +
>> +#endif /* _FS_FUSE_DEV_URING_I_H */
>> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
>> index e3748751e231d0991c050b31bdd84db0b8016f9f..a21256ec4c3b4bd7c67eae2d03b68d87dcc8234b 100644
>> --- a/fs/fuse/fuse_i.h
>> +++ b/fs/fuse/fuse_i.h
>> @@ -914,6 +914,11 @@ struct fuse_conn {
>>         /** IDR for backing files ids */
>>         struct idr backing_files_map;
>>  #endif
>> +
>> +#ifdef CONFIG_FUSE_IO_URING
>> +       /**  uring connection information*/
>> +       struct fuse_ring *ring;
>> +#endif
>>  };
>>
>>  /*
>> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
>> index fd3321e29a3e569bf06be22a5383cf34fd42c051..76267c79e920204175e5713853de8214c5555d46 100644
>> --- a/fs/fuse/inode.c
>> +++ b/fs/fuse/inode.c
>> @@ -7,6 +7,7 @@
>>  */
>>
>>  #include "fuse_i.h"
>> +#include "dev_uring_i.h"
>>
>>  #include <linux/pagemap.h>
>>  #include <linux/slab.h>
>> @@ -959,6 +960,8 @@ static void delayed_release(struct rcu_head *p)
>>  {
>>         struct fuse_conn *fc = container_of(p, struct fuse_conn, rcu);
>>
>> +       fuse_uring_destruct(fc);
>> +
>>         put_user_ns(fc->user_ns);
>>         fc->release(fc);
>>  }
>> @@ -1413,6 +1416,13 @@ void fuse_send_init(struct fuse_mount *fm)
>>         if (IS_ENABLED(CONFIG_FUSE_PASSTHROUGH))
>>                 flags |= FUSE_PASSTHROUGH;
>>
>> +       /*
>> +        * This is just an information flag for fuse server. No need to check
>> +        * the reply - server is either sending IORING_OP_URING_CMD or not.
>> +        */
>> +       if (fuse_uring_enabled())
>> +               flags |= FUSE_OVER_IO_URING;
>> +
>>         ia->in.flags = flags;
>>         ia->in.flags2 = flags >> 32;
>>
>> diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
>> index f1e99458e29e4fdce5273bc3def242342f207ebd..6d39077edf8cde4fa77130efcec16323839a676c 100644
>> --- a/include/uapi/linux/fuse.h
>> +++ b/include/uapi/linux/fuse.h
>> @@ -220,6 +220,14 @@
>>   *
>>   *  7.41
>>   *  - add FUSE_ALLOW_IDMAP
>> + *  7.42
>> + *  - Add FUSE_OVER_IO_URING and all other io-uring related flags and data
>> + *    structures:
>> + *    - fuse_uring_ent_in_out
>> + *    - fuse_uring_req_header
>> + *    - fuse_uring_cmd_req
>> + *    - FUSE_URING_IN_OUT_HEADER_SZ
>> + *    - FUSE_URING_OP_IN_OUT_SZ
>>   */
>>
>>  #ifndef _LINUX_FUSE_H
>> @@ -425,6 +433,7 @@ struct fuse_file_lock {
>>   * FUSE_HAS_RESEND: kernel supports resending pending requests, and the high bit
>>   *                 of the request ID indicates resend requests
>>   * FUSE_ALLOW_IDMAP: allow creation of idmapped mounts
>> + * FUSE_OVER_IO_URING: Indicate that Client supports io-uring
>>   */
>>  #define FUSE_ASYNC_READ                (1 << 0)
>>  #define FUSE_POSIX_LOCKS       (1 << 1)
>> @@ -471,6 +480,7 @@ struct fuse_file_lock {
>>  /* Obsolete alias for FUSE_DIRECT_IO_ALLOW_MMAP */
>>  #define FUSE_DIRECT_IO_RELAX   FUSE_DIRECT_IO_ALLOW_MMAP
>>  #define FUSE_ALLOW_IDMAP       (1ULL << 40)
>> +#define FUSE_OVER_IO_URING     (1ULL << 41)
>>
>>  /**
>>   * CUSE INIT request/reply flags
>> @@ -1206,4 +1216,61 @@ struct fuse_supp_groups {
>>         uint32_t        groups[];
>>  };
>>
>> +/**
>> + * Size of the ring buffer header
>> + */
>> +#define FUSE_URING_IN_OUT_HEADER_SZ 128
>> +#define FUSE_URING_OP_IN_OUT_SZ 128
>> +
>> +struct fuse_uring_ent_in_out {
>> +       uint64_t flags;
>> +
>> +       /* size of use payload buffer */
>> +       uint32_t payload_sz;
>> +       uint32_t padding;
>> +
>> +       uint8_t reserved[30];
> 
> out of curiosity, how was 30 chosen here? I think this makes the
> struct 46 bytes?

Clearly a bug, that was supposed to be 16. I changed it to
	uint64_t reserved[2];


> 
>> +};
>> +
>> +/**
>> + * This structure mapped onto the
>> + */
>> +struct fuse_uring_req_header {
>> +       /* struct fuse_in / struct fuse_out */
>> +       char in_out[FUSE_URING_IN_OUT_HEADER_SZ];
>> +
>> +       /* per op code structs */
>> +       char op_in[FUSE_URING_OP_IN_OUT_SZ];
>> +
>> +       /* struct fuse_ring_in_out */
>> +       char ring_ent_in_out[sizeof(struct fuse_uring_ent_in_out)];
>> +};
>> +
>> +/**
>> + * sqe commands to the kernel
>> + */
>> +enum fuse_uring_cmd {
>> +       FUSE_URING_REQ_INVALID = 0,
>> +
>> +       /* submit sqe to kernel to get a request */
>> +       FUSE_URING_REQ_FETCH = 1,
>> +
>> +       /* commit result and fetch next request */
>> +       FUSE_URING_REQ_COMMIT_AND_FETCH = 2,
>> +};
>> +
>> +/**
>> + * In the 80B command area of the SQE.
>> + */
>> +struct fuse_uring_cmd_req {
>> +       uint64_t flags;
>> +
>> +       /* entry identifier */
>> +       uint64_t commit_id;
>> +
>> +       /* queue the command is for (queue index) */
>> +       uint16_t qid;
>> +       uint8_t padding[6];
>> +};
>> +
>>  #endif /* _LINUX_FUSE_H */
>>
> 
> I'll be out the rest of this week for an American holiday
> (Thanksgiving) and next week for a work trip but will review the rest
> of this patchset after that when i get back.

Thanks so much for your review! Happy Thanksgiving and safe travels!


Thanks,
Bernd

  reply	other threads:[~2024-11-28 18:20 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-27 13:40 [PATCH RFC v7 00/16] fuse: fuse-over-io-uring Bernd Schubert
2024-11-27 13:40 ` [PATCH RFC v7 01/16] fuse: rename to fuse_dev_end_requests and make non-static Bernd Schubert
2024-11-28  0:19   ` Joanne Koong
2024-11-27 13:40 ` [PATCH RFC v7 02/16] fuse: Move fuse_get_dev to header file Bernd Schubert
2024-11-28  0:20   ` Joanne Koong
2024-11-27 13:40 ` [PATCH RFC v7 03/16] fuse: Move request bits Bernd Schubert
2024-11-28  0:21   ` Joanne Koong
2024-11-27 13:40 ` [PATCH RFC v7 04/16] fuse: Add fuse-io-uring design documentation Bernd Schubert
2024-12-03 12:30   ` Pavel Begunkov
2024-11-27 13:40 ` [PATCH RFC v7 05/16] fuse: make args->in_args[0] to be always the header Bernd Schubert
2024-11-28  0:27   ` Joanne Koong
2024-11-27 13:40 ` [PATCH RFC v7 06/16] fuse: {uring} Handle SQEs - register commands Bernd Schubert
2024-11-28  2:23   ` Joanne Koong
2024-11-28 18:20     ` Bernd Schubert [this message]
2024-12-03 13:24   ` Pavel Begunkov
2024-12-03 13:49     ` Bernd Schubert
2024-12-03 14:16       ` Pavel Begunkov
2024-12-03 13:38   ` Pavel Begunkov
2024-11-27 13:40 ` [PATCH RFC v7 07/16] fuse: Make fuse_copy non static Bernd Schubert
2024-11-27 13:40 ` [PATCH RFC v7 08/16] fuse: Add fuse-io-uring handling into fuse_copy Bernd Schubert
2024-11-27 13:40 ` [PATCH RFC v7 09/16] fuse: {uring} Add uring sqe commit and fetch support Bernd Schubert
2024-12-03 13:47   ` Pavel Begunkov
2024-11-27 13:40 ` [PATCH RFC v7 10/16] fuse: {uring} Handle teardown of ring entries Bernd Schubert
2024-11-27 13:40 ` [PATCH RFC v7 11/16] fuse: {uring} Allow to queue fg requests through io-uring Bernd Schubert
2024-12-03 14:09   ` Pavel Begunkov
2024-12-03 22:46     ` Bernd Schubert
2024-11-27 13:40 ` [PATCH RFC v7 12/16] fuse: {uring} Allow to queue bg " Bernd Schubert
2024-11-27 13:40 ` [PATCH RFC v7 13/16] io_uring/cmd: let cmds to know about dying task Bernd Schubert
2024-12-03 12:15   ` Pavel Begunkov
2024-12-03 12:15     ` Bernd Schubert
2024-11-27 13:40 ` [PATCH RFC v7 14/16] fuse: {uring} Handle IO_URING_F_TASK_DEAD Bernd Schubert
2024-12-03 12:20   ` Pavel Begunkov
2024-11-27 13:40 ` [PATCH RFC v7 15/16] fuse: {io-uring} Prevent mount point hang on fuse-server termination Bernd Schubert
2024-11-27 13:40 ` [PATCH RFC v7 16/16] fuse: enable fuse-over-io-uring Bernd Schubert
2024-11-27 13:45 ` [PATCH RFC v7 00/16] fuse: fuse-over-io-uring Bernd Schubert
2024-12-03 14:24 ` Pavel Begunkov
2024-12-03 14:32   ` Bernd Schubert

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox