public inbox for [email protected]
From: Jeff Moyer <[email protected]>
To: Pierre Labat <[email protected]>
Cc: "'io-uring\@vger.kernel.org'" <[email protected]>
Subject: Re: FYI, fsnotify contention with aio and io_uring.
Date: Mon, 07 Aug 2023 16:11:14 -0400
Message-ID: <[email protected]>
In-Reply-To: <SJ0PR08MB6494F5A32B7C60A5AD8B33C2AB09A@SJ0PR08MB6494.namprd08.prod.outlook.com> (Pierre Labat's message of "Fri, 4 Aug 2023 17:47:25 +0000")

Hi, Pierre,

Pierre Labat <[email protected]> writes:

> Hi,
>
> This is an FYI; maybe you already know about this, but in case you don't...
>
> I was pushing the limit of the number of NVMe read IOPS that FIO plus
> the Linux OS can handle. For that, I have something special under the
> Linux nvme driver. As a consequence, I am not limited by whatever the
> NVMe SSD's max IOPS or IO latency would be.
>
> As I cranked up the number of system cores and FIO jobs doing direct 4k
> random reads on /dev/nvme0n1, I hit a wall. The IOPS scaling slows
> (less than linear), and at around 15 FIO jobs on 15 core threads the
> overall IOPS in fact goes down as I add more FIO jobs. For example,
> on a system with 24 cores/48 threads, when I go beyond 15 FIO jobs,
> the overall IOPS starts to go down.
>
> This happens with both io_uring and aio. I was using kernel version
> 6.3.9 with one namespace (/dev/nvme0n1).

[snip]

> As you can see, 76% of the CPU on the box is sucked up by
> lockref_get_not_zero() and lockref_put_return().  Looking at the code,
> there is contention when io_uring calls fsnotify_access().

Is there a FAN_MODIFY fsnotify watch set on /dev?  If so, it might be a
good idea to find out what set it and why.
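
One way to look for such a watch, as a rough sketch only: inotify and
fanotify marks show up in /proc/<pid>/fdinfo/<fd> (see proc(5)), with
the watched inode and the event mask printed in hex. The helper names
below (parse_marks, find_watchers) are made up for illustration. The
sketch matches on inode number only and ignores the device, so a hit
is a hint rather than proof, and it skips fanotify mount marks. It
checks both the access and modify bits, since the path reported above
is fsnotify_access(). Run it as root to see every process.

```python
import os
import re

# Low event bits shared by inotify and fanotify masks
# (IN_ACCESS/FAN_ACCESS = 0x1, IN_MODIFY/FAN_MODIFY = 0x2).
FS_ACCESS = 0x1
FS_MODIFY = 0x2

# Matches inode mark lines such as:
#   inotify wd:1 ino:2 sdev:5 mask:3 ignored_mask:0 ...
#   fanotify ino:a sdev:5 mflags:0 mask:3b ignored_mask:0 ...
MARK_RE = re.compile(
    r"^(inotify|fanotify)\s.*\bino:([0-9a-f]+).*\bmask:([0-9a-f]+)"
)


def parse_marks(fdinfo_text):
    """Return (kind, inode, mask) for each inode mark line in fdinfo text."""
    marks = []
    for line in fdinfo_text.splitlines():
        m = MARK_RE.match(line)
        if m:
            marks.append((m.group(1), int(m.group(2), 16), int(m.group(3), 16)))
    return marks


def find_watchers(path="/dev", proc="/proc"):
    """List (pid, kind, mask) for marks on path that include access/modify."""
    target_ino = os.stat(path).st_ino  # assumption: inode match is enough
    hits = []
    for pid in filter(str.isdigit, os.listdir(proc)):
        fd_dir = os.path.join(proc, pid, "fdinfo")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission
        for fd in fds:
            try:
                with open(os.path.join(fd_dir, fd)) as f:
                    text = f.read()
            except OSError:
                continue
            for kind, ino, mask in parse_marks(text):
                if ino == target_ino and mask & (FS_ACCESS | FS_MODIFY):
                    hits.append((int(pid), kind, mask))
    return hits


if __name__ == "__main__":
    for pid, kind, mask in find_watchers("/dev"):
        print(f"pid {pid}: {kind} mark on /dev, mask {mask:#x}")
```

Any pid it prints can then be looked up in /proc/<pid>/comm to see
which program set the watch.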

> The filesystem code fsnotify_access() ends up calling dget_parent()
> and later dput() to take a reference on the parent directory (that
> would be /dev/ in our case), and later release the reference.  This is
> done (get+put) for each IO.
>
> The dget increments a unique/same counter (for the /dev/ directory)
> and dput decrements this same counter.
>
> As a consequence, we have 24 cores/48 threads fighting to get the same
> counter into their cache to modify it, at a rate of millions of IOPS.
> That is disastrous.
>
> To work around that problem, and continue my scalability testing, I
> hacked io_uring and aio to set the flag FMODE_NONOTIFY in the struct
> file->f_mode of the file on which IOs are done.  Doing that forces
> fsnotify to do nothing. The IOPS immediately went up more than 4X and
> the fsnotify thrashing disappeared.
>
> Maybe it would be a good idea to add an option to FIO to disable
> fsnotify on the file[s] on which IOs are issued?

Maybe I'm wrong, but that sounds like an abuse of the FMODE_NONOTIFY
flag.

> Or to take a reference on the file parent directory only once when fio
> starts?

Let's decide on whether or not the application is following best
practices, first, starting with answering the questions I asked above.

Cheers,
Jeff


Thread overview: 8+ messages
2023-08-04 17:47 FYI, fsnotify contention with aio and io_uring Pierre Labat
2023-08-07 20:11 ` Jeff Moyer [this message]
2023-08-08 21:41   ` Jens Axboe
2023-08-09 16:33     ` [EXT] " Pierre Labat
2023-08-09 17:14       ` Jeff Moyer
2023-08-14 16:30         ` Pierre Labat
2023-08-29 21:54           ` Pierre Labat
2023-09-14 19:11             ` Jeff Moyer