public inbox for [email protected]
 help / color / mirror / Atom feed
From: Bernd Schubert <[email protected]>
To: Miklos Szeredi <[email protected]>
Cc: Jens Axboe <[email protected]>,
	Pavel Begunkov <[email protected]>,
	 [email protected], [email protected],
	 Joanne Koong <[email protected]>,
	Josef Bacik <[email protected]>,
	 Amir Goldstein <[email protected]>,
	Ming Lei <[email protected]>,  David Wei <[email protected]>,
	[email protected],  Bernd Schubert <[email protected]>
Subject: [PATCH RFC v7 00/16] fuse: fuse-over-io-uring
Date: Wed, 27 Nov 2024 14:40:17 +0100	[thread overview]
Message-ID: <[email protected]> (raw)

[I removed RFC status as the design should be in place now
and as xfstests pass. I still reviewing patches myself, though
and also repeatings tests with different queue sizes.]

This adds support for uring communication between kernel and
userspace daemon using opcode the IORING_OP_URING_CMD. The basic
approach was taken from ublk.

Motivation for these patches is all to increase fuse performance,
by:
- Reducing kernel/userspace context switches
    - Part of that is given by the ring ring - handling multiple
      requests on either side of kernel/userspace without the need
      to switch per request
    - Part of that is FUSE_URING_REQ_COMMIT_AND_FETCH, i.e. submitting
      the result of a request and fetching the next fuse request
      in one step. In contrary to legacy read/write to /dev/fuse
- Core and numa affinity - one ring per core, which allows to
  avoid cpu core context switches

A more detailed motivation description can be found in the
introction of previous patch series
https://lore.kernel.org/r/[email protected]
That description also includes benchmark results with RFCv1.
Performance with the current series needs to be tested, but will
be lower, as several optimization patches are missing, like
wake-up on the same core. These optimizations will be submitted
after merging the main changes.

The corresponding libfuse patches are on my uring branch, but needs
cleanup for submission - that will be done once the kernel design
will not change anymore
https://github.com/bsbernd/libfuse/tree/uring

Testing with that libfuse branch is possible by running something
like:

example/passthrough_hp -o allow_other --debug-fuse --nopassthrough \
--uring  --uring-q-depth=128 /scratch/source /scratch/dest

With the --debug-fuse option one should see CQE in the request type,
if requests are received via io-uring:

cqe unique: 4, opcode: GETATTR (3), nodeid: 1, insize: 16, pid: 7060
    unique: 4, result=104

Without the --uring option "cqe" is replaced by the default "dev"

dev unique: 4, opcode: GETATTR (3), nodeid: 1, insize: 56, pid: 7117
   unique: 4, success, outsize: 120

Future work
- different payload sizes per ring
- zero copy

Signed-off-by: Bernd Schubert <[email protected]>
---
Changes in v7:
- Bug fixes:
   - Removed unsetting ring->ready as that brought up a lock
     order violation for fc->bg_lock/queue->lock
   - Check for !fc->connected in fuse_uring_cmd(), tear down issues
     came up with large ring sizes without that.
   - Removal of (arg->size == 0) condition and warning in fuse_copy_args
     as that is actually expected for some op codes.
- New init flag: FUSE_OVER_IO_URING to tell fuse-server about over-io-uring
                 capability
- Use fuse_set_zero_arg0() to set arg0 and rename to struct fuse_zero_header
  (I hope I got Miklos suggestion right)
- Simplification of fuse_uring_ent_avail()
- Renamed some structs in uapi/linux/fuse.h to fuse_uring
  (from fuse_ring) to be consistent
- Removal of 'if 0' in fuse_uring_cmd()
- Return -E... directly in fuse_uring_cmd() instead of setting err first
  and removal of goto's in that function.
- Just a simple WARN_ON_ONCE() for (oh->unique & FUSE_INT_REQ_BIT) as
  that code should be unreachable
- Removal of several pr_devel and some pr_warn() messages
- Removed RFC as it passed several xfstests runs now
- Link to v6: https://lore.kernel.org/r/[email protected]

Changes in v6:
- Update to linux-6.12
- Use 'struct fuse_iqueue_ops' and redirect fiq->ops once
  the ring is ready.
- Fix return code from fuse_uring_copy_from_ring on
  copy_from_user failure (Dan Carpenter / kernel test robot)
- Avoid list iteration in fuse_uring_cancel (Joanne)
- Simplified struct fuse_ring_req_header
	- Adds a new 'struct struct fuse_ring_ent_in_out'
- Fix assigning ring->queues[qid] in fuse_uring_create_queue,
  it was too early, resulting in races
- Add back 'FRRS_INVALID = 0' to ensure ring-ent states always
  have a value > 0
- Avoid assigning struct io_uring_cmd *cmd->pdu multiple times,
  once on settings up IO_URING_F_CANCEL is sufficient for sending
  the request as well.
- Link to v5: https://lore.kernel.org/r/[email protected]

Changes in v5:
- Main focus in v5 is the separation of headers from payload,
  which required to introduce 'struct fuse_zero_in'.
- Addressed several teardown issues, that were a regression in v4.
- Fixed "BUG: sleeping function called" due to allocation while
  holding a lock reported by David Wei
- Fix function comment reported by kernel test rebot
- Fix set but unused variabled reported by test robot
- Link to v4: https://lore.kernel.org/r/[email protected]

Changes in v4:
- Removal of ioctls, all configuration is done dynamically
  on the arrival of FUSE_URING_REQ_FETCH
- ring entries are not (and cannot be without config ioctls)
  allocated as array of the ring/queue - removal of the tag
  variable. Finding ring entries on FUSE_URING_REQ_COMMIT_AND_FETCH
  is more cumbersome now and needs an almost unused
  struct fuse_pqueue per fuse_ring_queue and uses the unique
  id of fuse requests.
- No device clones needed for to workaroung hanging mounts
  on fuse-server/daemon termination, handled by IO_URING_F_CANCEL
- Removal of sync/async ring entry types
- Addressed some of Joannes comments, but probably not all
- Only very basic tests run for v3, as more updates should follow quickly.

Changes in v3
- Removed the __wake_on_current_cpu optimization (for now
  as that needs to go through another subsystem/tree) ,
  removing it means a significant performance drop)
- Removed MMAP (Miklos)
- Switched to two IOCTLs, instead of one ioctl that had a field
  for subcommands (ring and queue config) (Miklos)
- The ring entry state is a single state and not a bitmask anymore
  (Josef)
- Addressed several other comments from Josef (I need to go over
  the RFCv2 review again, I'm not sure if everything is addressed
  already)

- Link to v3: https://lore.kernel.org/r/20240901-b4-fuse-uring-rfcv3-without-mmap-v3-0-9207f7391444@ddn.com
- Link to v2: https://lore.kernel.org/all/[email protected]/
- Link to v1: https://lore.kernel.org/r/[email protected]

---
Bernd Schubert (15):
      fuse: rename to fuse_dev_end_requests and make non-static
      fuse: Move fuse_get_dev to header file
      fuse: Move request bits
      fuse: Add fuse-io-uring design documentation
      fuse: make args->in_args[0] to be always the header
      fuse: {uring} Handle SQEs - register commands
      fuse: Make fuse_copy non static
      fuse: Add fuse-io-uring handling into fuse_copy
      fuse: {uring} Add uring sqe commit and fetch support
      fuse: {uring} Handle teardown of ring entries
      fuse: {uring} Allow to queue fg requests through io-uring
      fuse: {uring} Allow to queue bg requests through io-uring
      fuse: {uring} Handle IO_URING_F_TASK_DEAD
      fuse: {io-uring} Prevent mount point hang on fuse-server termination
      fuse: enable fuse-over-io-uring

Pavel Begunkov (1):
      io_uring/cmd: let cmds to know about dying task

 Documentation/filesystems/fuse-io-uring.rst |  101 +++
 fs/fuse/Kconfig                             |   12 +
 fs/fuse/Makefile                            |    1 +
 fs/fuse/dax.c                               |   11 +-
 fs/fuse/dev.c                               |  124 +--
 fs/fuse/dev_uring.c                         | 1269 +++++++++++++++++++++++++++
 fs/fuse/dev_uring_i.h                       |  204 +++++
 fs/fuse/dir.c                               |   32 +-
 fs/fuse/fuse_dev_i.h                        |   68 ++
 fs/fuse/fuse_i.h                            |   27 +
 fs/fuse/inode.c                             |   12 +-
 fs/fuse/xattr.c                             |    7 +-
 include/linux/io_uring_types.h              |    1 +
 include/uapi/linux/fuse.h                   |   67 ++
 io_uring/uring_cmd.c                        |    6 +-
 15 files changed, 1866 insertions(+), 76 deletions(-)
---
base-commit: 3022e9d00ebec31ed435ae0844e3f235dba998a9
change-id: 20241015-fuse-uring-for-6-10-rfc4-61d0fc6851f8

Best regards,
-- 
Bernd Schubert <[email protected]>


             reply	other threads:[~2024-11-27 13:40 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-27 13:40 Bernd Schubert [this message]
2024-11-27 13:40 ` [PATCH RFC v7 01/16] fuse: rename to fuse_dev_end_requests and make non-static Bernd Schubert
2024-11-28  0:19   ` Joanne Koong
2024-11-27 13:40 ` [PATCH RFC v7 02/16] fuse: Move fuse_get_dev to header file Bernd Schubert
2024-11-28  0:20   ` Joanne Koong
2024-11-27 13:40 ` [PATCH RFC v7 03/16] fuse: Move request bits Bernd Schubert
2024-11-28  0:21   ` Joanne Koong
2024-11-27 13:40 ` [PATCH RFC v7 04/16] fuse: Add fuse-io-uring design documentation Bernd Schubert
2024-12-03 12:30   ` Pavel Begunkov
2024-11-27 13:40 ` [PATCH RFC v7 05/16] fuse: make args->in_args[0] to be always the header Bernd Schubert
2024-11-28  0:27   ` Joanne Koong
2024-11-27 13:40 ` [PATCH RFC v7 06/16] fuse: {uring} Handle SQEs - register commands Bernd Schubert
2024-11-28  2:23   ` Joanne Koong
2024-11-28 18:20     ` Bernd Schubert
2024-12-03 13:24   ` Pavel Begunkov
2024-12-03 13:49     ` Bernd Schubert
2024-12-03 14:16       ` Pavel Begunkov
2024-12-03 13:38   ` Pavel Begunkov
2024-11-27 13:40 ` [PATCH RFC v7 07/16] fuse: Make fuse_copy non static Bernd Schubert
2024-11-27 13:40 ` [PATCH RFC v7 08/16] fuse: Add fuse-io-uring handling into fuse_copy Bernd Schubert
2024-11-27 13:40 ` [PATCH RFC v7 09/16] fuse: {uring} Add uring sqe commit and fetch support Bernd Schubert
2024-12-03 13:47   ` Pavel Begunkov
2024-11-27 13:40 ` [PATCH RFC v7 10/16] fuse: {uring} Handle teardown of ring entries Bernd Schubert
2024-11-27 13:40 ` [PATCH RFC v7 11/16] fuse: {uring} Allow to queue fg requests through io-uring Bernd Schubert
2024-12-03 14:09   ` Pavel Begunkov
2024-12-03 22:46     ` Bernd Schubert
2024-11-27 13:40 ` [PATCH RFC v7 12/16] fuse: {uring} Allow to queue bg " Bernd Schubert
2024-11-27 13:40 ` [PATCH RFC v7 13/16] io_uring/cmd: let cmds to know about dying task Bernd Schubert
2024-12-03 12:15   ` Pavel Begunkov
2024-12-03 12:15     ` Bernd Schubert
2024-11-27 13:40 ` [PATCH RFC v7 14/16] fuse: {uring} Handle IO_URING_F_TASK_DEAD Bernd Schubert
2024-12-03 12:20   ` Pavel Begunkov
2024-11-27 13:40 ` [PATCH RFC v7 15/16] fuse: {io-uring} Prevent mount point hang on fuse-server termination Bernd Schubert
2024-11-27 13:40 ` [PATCH RFC v7 16/16] fuse: enable fuse-over-io-uring Bernd Schubert
2024-11-27 13:45 ` [PATCH RFC v7 00/16] fuse: fuse-over-io-uring Bernd Schubert
2024-12-03 14:24 ` Pavel Begunkov
2024-12-03 14:32   ` Bernd Schubert

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20241127-fuse-uring-for-6-10-rfc4-v7-0-934b3a69baca@ddn.com \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox