From: Bernd Schubert <[email protected]>
To: Miklos Szeredi <[email protected]>
Cc: Jens Axboe <[email protected]>,
Pavel Begunkov <[email protected]>,
[email protected], [email protected],
Joanne Koong <[email protected]>,
Josef Bacik <[email protected]>,
Amir Goldstein <[email protected]>,
Ming Lei <[email protected]>, David Wei <[email protected]>,
[email protected], Bernd Schubert <[email protected]>
Subject: [PATCH v9 00/17] fuse: fuse-over-io-uring
Date: Tue, 07 Jan 2025 01:25:05 +0100
Message-ID: <20250107-fuse-uring-for-6-10-rfc4-v9-0-9c786f9a7a9d@ddn.com>
This adds support for io-uring communication between the kernel and
the userspace daemon using the opcode IORING_OP_URING_CMD. The basic
approach was taken from ublk.
The motivation for these patches is to increase fuse performance
by:
- Reducing kernel/userspace context switches
  - Part of that is given by the ring itself - handling multiple
    requests on either side of kernel/userspace without the need
    to switch per request
  - Part of that is FUSE_URING_REQ_COMMIT_AND_FETCH, i.e. submitting
    the result of a request and fetching the next fuse request
    in one step, in contrast to legacy read/write to /dev/fuse
- Core and NUMA affinity - one ring per core, which avoids
  switching between CPU cores
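As a rough illustration of the first point, here is a counting model
under simplifying assumptions, not a measurement. With legacy /dev/fuse
the daemon typically issues one read() and one write() per request,
while with a ring a single io_uring_enter() call can submit several
commit-and-fetch SQEs at once. The function names and the 'batch'
parameter are hypothetical and exist only for this sketch; the initial
registration of ring entries is ignored.

```c
#include <assert.h>

/* Legacy /dev/fuse: one read() to fetch the request, one write() to reply. */
unsigned long legacy_syscalls(unsigned long nr_requests)
{
	return 2 * nr_requests;
}

/*
 * io-uring: each commit-and-fetch SQE both replies and re-arms the entry.
 * Assumption: up to 'batch' such SQEs go out per io_uring_enter() call.
 */
unsigned long uring_syscalls(unsigned long nr_requests, unsigned long batch)
{
	assert(batch > 0);
	return (nr_requests + batch - 1) / batch;	/* ceil(n / batch) */
}
```

In this model 1000 requests cost 2000 syscalls on the legacy path and
125 with a batch of 8; the real saving depends on how many requests are
actually pending at the same time.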
A more detailed description of the motivation can be found in the
introduction of the previous patch series
https://lore.kernel.org/r/[email protected]
That description also includes benchmark results with RFCv1.
Performance with the current series needs to be tested, but will
be lower, as several optimization patches are missing, like
wake-up on the same core. These optimizations will be submitted
after merging the main changes.
The corresponding libfuse patches are on my uring branch, but need
cleanup for submission - that will be done once the kernel design
is no longer changing
https://github.com/bsbernd/libfuse/tree/uring
Testing with that libfuse branch is possible by running something
like:
example/passthrough_hp -o allow_other --debug-fuse --nopassthrough \
--uring --uring-q-depth=128 /scratch/source /scratch/dest
With the --debug-fuse option one should see CQE in the request type,
if requests are received via io-uring:
cqe unique: 4, opcode: GETATTR (3), nodeid: 1, insize: 16, pid: 7060
unique: 4, result=104
Without the --uring option "cqe" is replaced by the default "dev"
dev unique: 4, opcode: GETATTR (3), nodeid: 1, insize: 56, pid: 7117
unique: 4, success, outsize: 120
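When comparing longer debug logs, the two request-line formats shown
above can be told apart mechanically. The following is a minimal
sketch that assumes the line layout is exactly as in the sample output;
the type and function names are hypothetical and not part of libfuse.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum transport { VIA_DEV, VIA_URING };

struct req_line {
	enum transport via;		/* "dev" or "cqe" prefix */
	unsigned long long unique;
	char opcode[32];		/* e.g. "GETATTR" */
	int opcode_nr;			/* e.g. 3 */
};

/*
 * Returns 0 for a request line, -1 for anything else (e.g. reply lines
 * such as "unique: 4, result=104"). On failure *out may be partly written.
 */
int parse_req_line(const char *line, struct req_line *out)
{
	char tag[4];

	assert(line && out);
	if (sscanf(line, "%3s unique: %llu, opcode: %31s (%d)",
		   tag, &out->unique, out->opcode, &out->opcode_nr) != 4)
		return -1;
	if (strcmp(tag, "cqe") == 0)
		out->via = VIA_URING;
	else if (strcmp(tag, "dev") == 0)
		out->via = VIA_DEV;
	else
		return -1;
	return 0;
}
```

Feeding each log line to parse_req_line() and counting VIA_URING versus
VIA_DEV results gives a quick check that requests really arrive via
io-uring.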
Future work
- different payload sizes per ring
- zero copy
Signed-off-by: Bernd Schubert <[email protected]>
---
Changes in v9:
- Fixed a queue->lock/fc->bg_lock order issue, fuse_block_alloc() now waits
until fc->io_uring is ready
- Renamed fuse_ring_ent_unset_userspace to fuse_ring_ent_set_commit (Joanne)
- No need to initialize *ring to NULL in fuse_uring_create (Joanne)
- Use max() instead of max_t in fuse_uring_create (Joanne)
- Rename FRRS_WAIT to FRRS_AVAILABLE (Joanne)
- Add comment for WRITE_ONCE(ring->queues[qid], ...) (Joanne)
- Rename _fuse_uring_register to fuse_uring_do_register (Joanne)
- Split out fuse_uring_create_ring_ent() (Joanne)
- Use 'struct fuse_uring_ent_in_out' instead of char[] in
fuse_uring_req_header (Joanne)
- Set fuse_ring_ent->cmd to NULL to ensure io-uring commands cannot
be used twice (Pavel). That also simplifies
fuse_uring_entry_teardown().
- Fix return value on allocation failure in fuse_uring_create_queue (Joanne)
- Renamed struct fuse_copy_state.ring.offset to .copied_sz
- static const struct fuse_iqueue_ops fuse_io_uring_ops (kernel test robot)
- ring_ent->commit_id was removed and req->in.h.unique is set in the request
header as commit id.
- Rename of "ring_ent" to "ent" in several functions
- Rename struct fuse_uring_cmd_pdu to struct fuse_uring_pdu
- Link to v8: https://lore.kernel.org/r/[email protected]
- No return code from fuse_uring_cancel(), io-uring handles
resending IO_URING_F_CANCEL on its own (Pavel)
Changes in v8:
- Move the lock in fuse_uring_create_queue to have initialization before
taking fc->lock (Joanne)
- Avoid double assignment of ring_ent->cmd (Pavel)
- Set a global ring max_payload size instead of per ring-entry (Joanne)
- Also consider fc->max_pages for the max payload size (instead of
fc->max_write only) (Joanne)
- Fix freeing of ring->queues in error case in fuse_uring_create (Joanne)
- Fix allocation size of the ring, including queues was a leftover from
previous patches (Miklos, Joanne)
- Move FUSE_URING_IOV_SEGS definition to the top of dev_uring.c (Joanne)
- Update Documentation/filesystems/fuse-io-uring.rst and use 'io-uring'
instead of 'uring' (Pavel)
- Rename SQE op codes to FUSE_IO_URING_CMD_REGISTER and
FUSE_IO_URING_CMD_COMMIT_AND_FETCH
- Use READ_ONCE on data in 80B SQE section (struct fuse_uring_cmd_req)
(Pavel)
- Add back sanity check for IO_URING_F_SQE128 (had been initially there,
but got lost between different version updates) (Pavel)
- Remove pr_devel logs (Miklos)
- Only set fuse_uring_cmd() in file_operations in the last patch
and mark that function with __maybe_unused before that, to avoid
potential compiler warnings (Pavel)
- Add missing sanity for qid < ring->nr_queues
- Add check for fc->initialized - FUSE_IO_URING_CMD_REGISTER must only
arrive after FUSE_INIT in order to know the max payload size
- Add 'struct fuse_uring_ent_in_out' and add the commit id to it.
For now the commit id is the request unique, but it could be any number
that can identify the corresponding struct fuse_ring_ent object.
The current way via struct fuse_req needs struct fuse_pqueue per
queue (>2kb per core/queue), has hash overhead and is not suitable
for request types without a unique (like future updates for notify)
- Increase FUSE_KERNEL_MINOR_VERSION to 42
- Split out making fuse_request_find/fuse_req_hash/fuse_pqueue_init
non-static into its own patch to simplify review
- Don't return too early in fuse_uring_copy_to_ring, to always update
'ring_ent_in_out'
- Code simplification of fuse_uring_next_fuse_req()
- fuse_uring_commit_fetch was accidentally doing a full copy on stack
of queue->fpq
- Separate out setting and getting values from io_uring_cmd *cmd->pdu
into functions
- Fix freeing of queue->ent_released (was accidentally in the wrong
function)
- Remove the queue from fuse_uring_cmd_pdu, ring_ent is enough since
v7
- Return -EAGAIN for IO_URING_F_CANCEL when ring-entries are in the
wrong state. To be clarified with io-uring upstream if that is right.
- Slight simplification by using list_first_entry_or_null instead of
extra checks if the list is empty
- Link to v7: https://lore.kernel.org/r/[email protected]
Changes in v7:
- Bug fixes:
- Removed unsetting ring->ready as that brought up a lock
order violation for fc->bg_lock/queue->lock
- Check for !fc->connected in fuse_uring_cmd(), tear down issues
came up with large ring sizes without that.
- Removal of (arg->size == 0) condition and warning in fuse_copy_args
as that is actually expected for some op codes.
- New init flag: FUSE_OVER_IO_URING to tell fuse-server about over-io-uring
capability
- Use fuse_set_zero_arg0() to set arg0 and rename to struct fuse_zero_header
(I hope I got Miklos' suggestion right)
- Simplification of fuse_uring_ent_avail()
- Renamed some structs in uapi/linux/fuse.h to fuse_uring
(from fuse_ring) to be consistent
- Removal of 'if 0' in fuse_uring_cmd()
- Return -E... directly in fuse_uring_cmd() instead of setting err first
and removal of goto's in that function.
- Just a simple WARN_ON_ONCE() for (oh->unique & FUSE_INT_REQ_BIT) as
that code should be unreachable
- Removal of several pr_devel and some pr_warn() messages
- Removed RFC as it passed several xfstests runs now
- Link to v6: https://lore.kernel.org/r/[email protected]
Changes in v6:
- Update to linux-6.12
- Use 'struct fuse_iqueue_ops' and redirect fiq->ops once
the ring is ready.
- Fix return code from fuse_uring_copy_from_ring on
copy_from_user failure (Dan Carpenter / kernel test robot)
- Avoid list iteration in fuse_uring_cancel (Joanne)
- Simplified struct fuse_ring_req_header
- Adds a new 'struct fuse_ring_ent_in_out'
- Fix assigning ring->queues[qid] in fuse_uring_create_queue,
it was too early, resulting in races
- Add back 'FRRS_INVALID = 0' to ensure ring-ent states always
have a value > 0
- Avoid assigning struct io_uring_cmd *cmd->pdu multiple times;
once on setting up IO_URING_F_CANCEL is sufficient for sending
the request as well.
- Link to v5: https://lore.kernel.org/r/[email protected]
Changes in v5:
- Main focus in v5 is the separation of headers from payload,
which required introducing 'struct fuse_zero_in'.
- Addressed several teardown issues that were regressions in v4.
- Fixed "BUG: sleeping function called" due to allocation while
holding a lock reported by David Wei
- Fix function comment reported by kernel test robot
- Fix set but unused variable reported by test robot
- Link to v4: https://lore.kernel.org/r/[email protected]
Changes in v4:
- Removal of ioctls, all configuration is done dynamically
on the arrival of FUSE_URING_REQ_FETCH
- ring entries are not (and cannot be without config ioctls)
allocated as array of the ring/queue - removal of the tag
variable. Finding ring entries on FUSE_URING_REQ_COMMIT_AND_FETCH
is more cumbersome now and needs an almost unused
struct fuse_pqueue per fuse_ring_queue and uses the unique
id of fuse requests.
- No device clones needed to work around hanging mounts
on fuse-server/daemon termination, handled by IO_URING_F_CANCEL
- Removal of sync/async ring entry types
- Addressed some of Joanne's comments, but probably not all
- Only very basic tests run for v3, as more updates should follow quickly.
Changes in v3:
- Removed the __wake_on_current_cpu optimization (for now,
as that needs to go through another subsystem/tree);
removing it means a significant performance drop
- Removed MMAP (Miklos)
- Switched to two IOCTLs, instead of one ioctl that had a field
for subcommands (ring and queue config) (Miklos)
- The ring entry state is a single state and not a bitmask anymore
(Josef)
- Addressed several other comments from Josef (I need to go over
the RFCv2 review again, I'm not sure if everything is addressed
already)
- Link to v3: https://lore.kernel.org/r/20240901-b4-fuse-uring-rfcv3-without-mmap-v3-0-9207f7391444@ddn.com
- Link to v2: https://lore.kernel.org/all/[email protected]/
- Link to v1: https://lore.kernel.org/r/[email protected]
---
Bernd Schubert (17):
fuse: rename to fuse_dev_end_requests and make non-static
fuse: Move fuse_get_dev to header file
fuse: Move request bits
fuse: Add fuse-io-uring design documentation
fuse: make args->in_args[0] to be always the header
fuse: {io-uring} Handle SQEs - register commands
fuse: Make fuse_copy non static
fuse: Add fuse-io-uring handling into fuse_copy
fuse: {io-uring} Make hash-list req unique finding functions non-static
fuse: Add io-uring sqe commit and fetch support
fuse: {io-uring} Handle teardown of ring entries
fuse: {io-uring} Make fuse_dev_queue_{interrupt,forget} non-static
fuse: Allow to queue fg requests through io-uring
fuse: Allow to queue bg requests through io-uring
fuse: {io-uring} Prevent mount point hang on fuse-server termination
fuse: block request allocation until io-uring init is complete
fuse: enable fuse-over-io-uring
Documentation/filesystems/fuse-io-uring.rst | 101 ++
fs/fuse/Kconfig | 12 +
fs/fuse/Makefile | 1 +
fs/fuse/dax.c | 11 +-
fs/fuse/dev.c | 127 +--
fs/fuse/dev_uring.c | 1318 +++++++++++++++++++++++++++
fs/fuse/dev_uring_i.h | 205 +++++
fs/fuse/dir.c | 32 +-
fs/fuse/fuse_dev_i.h | 67 ++
fs/fuse/fuse_i.h | 30 +
fs/fuse/inode.c | 14 +-
fs/fuse/xattr.c | 7 +-
include/uapi/linux/fuse.h | 76 +-
13 files changed, 1924 insertions(+), 77 deletions(-)
---
base-commit: 5428dc1906dde5fb5ab283cda4714011f9811aa1
change-id: 20241015-fuse-uring-for-6-10-rfc4-61d0fc6851f8
Best regards,
--
Bernd Schubert <[email protected]>