From: Ming Lei <[email protected]>
To: Bernd Schubert <[email protected]>
Cc: Jens Axboe <[email protected]>,
[email protected], Pavel Begunkov <[email protected]>,
Miklos Szeredi <[email protected]>,
Joanne Koong <[email protected]>,
Josef Bacik <[email protected]>
Subject: Re: Large CQE for fuse headers
Date: Mon, 14 Oct 2024 10:44:33 +0800 [thread overview]
Message-ID: <ZwyFke6PayyOznP_@fedora> (raw)
In-Reply-To: <[email protected]>
On Sun, Oct 13, 2024 at 11:20:53PM +0200, Bernd Schubert wrote:
>
>
> On 10/12/24 16:38, Jens Axboe wrote:
> > On 10/11/24 7:55 PM, Ming Lei wrote:
> >> On Fri, Oct 11, 2024 at 4:56 AM Bernd Schubert
> >> <[email protected]> wrote:
> >>>
> >>> Hello,
> >>>
> >>> as discussed during LPC, we would like to have large CQE sizes, at least
> >>> 256B. Ideally 256B for fuse, but CQE512 might be a bit too much...
> >>>
> >>> Pavel said that this should be ok, but it would be better to have the CQE
> >>> size as function argument.
> >>> Could you give me some hints how this should look like and especially how
> >>> we are going to communicate the CQE size to the kernel? I guess just adding
> >>> IORING_SETUP_CQE256 / IORING_SETUP_CQE512 would be much easier.
> >>>
> >>> I'm basically through with other changes Miklos had been asking for and
> >>> moving fuse headers into the CQE is next.
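[Editorial sketch: userspace today derives the CQE stride from the setup flags, and a hypothetical IORING_SETUP_CQE256/CQE512 would extend the same scheme. Only IORING_SETUP_CQE32 exists upstream; the CQE256/CQE512 flag values below are invented for illustration.]

```c
#include <assert.h>
#include <stddef.h>

/* Real flag from <linux/io_uring.h>; doubles the CQE to 32 bytes. */
#define IORING_SETUP_CQE32   (1U << 11)
/* Hypothetical flags for this discussion -- NOT upstream. */
#define IORING_SETUP_CQE256  (1U << 30)
#define IORING_SETUP_CQE512  (1U << 31)

/* The base struct io_uring_cqe is 16 bytes (user_data + res + flags);
 * the setup flags scale the per-entry stride for the whole ring. */
static size_t cqe_size(unsigned setup_flags)
{
    if (setup_flags & IORING_SETUP_CQE512)
        return 512;
    if (setup_flags & IORING_SETUP_CQE256)
        return 256;
    if (setup_flags & IORING_SETUP_CQE32)
        return 32;
    return 16;
}
```

Note that because the stride is a ring-wide property, every completion on the ring pays for the largest size, which is Ming's flexibility concern above.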
> >>
> >> A big CQE may not be efficient: there is a copy from kernel to CQE
> >> and from CQE to userspace. It is also not flexible, since CQE size
> >> is a ring-wide property; if it is big, every CQE from this ring has
> >> to be big.
> >
> > There isn't really a copy - the kernel fills it in, generally the
> > application itself, just in the kernel, and then the application can
> > read it on that side. It's the same memory, and it'll also generally be
> > cache hot when the application reaps it. Unless a lot of time has passed,
> > obviously.
> >
> > That said, yeah bigger sqe/cqe is less ideal than smaller ones,
> > obviously. Currently you can fit 4 normal cqes in a cache line, or a
> > single sqe. Making either of them bigger will obviously bloat that.
> >
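[Editorial sketch: the sizes Jens cites can be checked directly. A plain CQE is 16 bytes, so four fit in a 64-byte cache line, while an SQE is 64 bytes and fills a line by itself. The struct below is a local replica of the default uapi CQE layout, not an include of the real header.]

```c
#include <assert.h>
#include <stdint.h>

/* Local replica of the default struct io_uring_cqe layout
 * (see <linux/io_uring.h>). */
struct cqe16 {
    uint64_t user_data;   /* sqe->user_data, passed back */
    int32_t  res;         /* result code for this event */
    uint32_t flags;       /* IORING_CQE_F_* flags */
};

#define CACHE_LINE 64     /* typical x86-64 cache line size */
#define SQE_SIZE   64     /* sizeof(struct io_uring_sqe) */

static unsigned cqes_per_cache_line(void)
{
    return CACHE_LINE / sizeof(struct cqe16);
}
```

A 256-byte CQE would instead span four cache lines per completion, which is the bloat being weighed here.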
> >> If you are saying uring_cmd, another way is to mapped one area for
> >> this purpose, the fuse driver can write fuse headers to this indexed
> >> mmap buffer, and userspace read it, which is just efficient, without
> >> io_uring core changes. ublk uses this way to fill IO request header.
> >> But it requires each command to have a unique tag.
> >
> > That may indeed be a decent idea for this too. You don't even need fancy
> > tagging, you can just use the cqe index for your tag too, as it should
> > not be bigger than the cq ring space. Then you can get away with
> > just using normal cqe sizes, and just have a shared region between the
> > two where data gets written by the uring_cmd completion, and the app can
> > access it directly from userspace.
>
> Would be good if Miklos could chime in here, adding back mmap for headers
> wouldn't be difficult, but would add back more fuse-uring startup and
> tear-down code.
>
> From a performance point of view, I don't know much about CPU cache
> prefetching, but shouldn't the CPU cache logic be able to easily prefetch
> larger linear io-uring rings into 2nd/3rd level caches? And if the
> fuse header is in a separate buffer, it can't auto-prefetch that
> without additional instructions? I.e., how would the CPU cache logic
> know about these additional memory areas?
It also depends on how the fuse user code consumes the big CQE payload:
if the fuse header needs to stay in memory for a while, you may have to
copy it somewhere for post-processing, since io_uring (the kernel) needs
the CQE to be returned back asap.
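[Editorial sketch of the copy-out Ming describes: if the header must outlive the CQE, the consumer copies the payload before the ring slot is released (io_uring_cqe_seen() in liburing terms), because the kernel reuses the slot for later completions. The 256-byte layout is a stand-in invented for illustration.]

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for a hypothetical 256-byte CQE carrying a fuse header
 * inline after the normal fields; layout invented for illustration. */
struct big_cqe {
    uint64_t user_data;
    int32_t  res;
    uint32_t flags;
    uint8_t  payload[240];   /* inline fuse header bytes */
};

/* Copy the payload out before the CQE slot is handed back to the
 * kernel; after that point the slot may be overwritten. */
static void consume_cqe(const struct big_cqe *cqe, uint8_t *saved)
{
    memcpy(saved, cqe->payload, sizeof(cqe->payload));
    /* ... mark the CQE seen here; cqe->payload is no longer stable ... */
}
```

This extra memcpy is exactly the cost that a separate long-lived mmap'ed header region would avoid.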
Thanks,
Ming
Thread overview: 23+ messages
2024-10-10 20:56 Large CQE for fuse headers Bernd Schubert
2024-10-11 17:57 ` Jens Axboe
2024-10-11 18:35 ` Bernd Schubert
2024-10-11 18:39 ` Jens Axboe
2024-10-11 19:03 ` Bernd Schubert
2024-10-11 19:24 ` Jens Axboe
2024-10-11 21:38 ` Pavel Begunkov
2024-10-12 1:55 ` Ming Lei
2024-10-12 14:38 ` Jens Axboe
2024-10-13 21:20 ` Bernd Schubert
2024-10-14 2:44 ` Ming Lei [this message]
2024-10-14 11:10 ` Miklos Szeredi
2024-10-14 12:47 ` Bernd Schubert
2024-10-14 13:34 ` Pavel Begunkov
2024-10-14 15:21 ` Bernd Schubert
2024-10-14 17:48 ` Pavel Begunkov
2024-10-14 21:27 ` Bernd Schubert
2024-10-16 10:54 ` Miklos Szeredi
2024-10-16 11:53 ` Bernd Schubert
2024-10-16 12:24 ` Miklos Szeredi
2024-10-17 0:59 ` Ming Lei
2024-10-14 13:20 ` Bernd Schubert
2024-10-14 10:31 ` Miklos Szeredi