From: Jeff Moyer <[email protected]>
To: Pierre Labat <[email protected]>
Cc: 'Jens Axboe' <[email protected]>,
"'io-uring\@vger.kernel.org'" <[email protected]>
Subject: Re: [EXT] Re: FYI, fsnotify contention with aio and io_uring.
Date: Thu, 14 Sep 2023 15:11:49 -0400 [thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <SJ0PR08MB64949350D2580D27863FBFDFABE7A@SJ0PR08MB6494.namprd08.prod.outlook.com> (Pierre Labat's message of "Tue, 29 Aug 2023 21:54:42 +0000")
[-- Attachment #1: Type: text/plain, Size: 7386 bytes --]
Pierre Labat <[email protected]> writes:
> Hi,
>
> Had some time to re-do some testing.
>
> 1) Pipewire (its wireplumber daemon) set a watch on the children of the directory /dev via inotify.
> I removed that (disabled pipewire), but still had the fsnotify
> overhead when using aio/io_uring at high IOPS across several threads on
> several cores.
>
> 2) I then noticed that udev set a watch (via inotify) on the files in /dev.
> This is due to a rule in /usr/lib/udev/rules.d/60-block.rules
> # watch metadata changes, caused by tools closing the device node which was opened for writing
> ACTION!="remove", SUBSYSTEM=="block", \
> KERNEL=="loop*|mmcblk*[0-9]|msblk*[0-9]|mspblk*[0-9]|nvme*|sd*|vd*|xvd*|bcache*|cciss*|dasd*|ubd*|ubi*|scm*|pmem*|nbd*|zd*", \
> OPTIONS+="watch"
> I removed "nvme*" from this rule (I am testing on /dev/nvme0n1), then finally the fsnotify overhead disappeared.
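(Side note, in case it is useful to anyone reading along: rather than editing
the file under /usr/lib/udev/rules.d, the same effect can be had with a local
override, since a rules file of the same name in /etc/udev/rules.d shadows the
one shipped in /usr/lib/udev/rules.d. An untested sketch of such an override,
with nvme* simply dropped from the KERNEL match:

  # /etc/udev/rules.d/60-block.rules -- local copy shadowing the distro rule
  ACTION!="remove", SUBSYSTEM=="block", \
    KERNEL=="loop*|mmcblk*[0-9]|msblk*[0-9]|mspblk*[0-9]|sd*|vd*|xvd*|bcache*|cciss*|dasd*|ubd*|ubi*|scm*|pmem*|nbd*|zd*", \
    OPTIONS+="watch"

That keeps the workaround out of the package-managed file.)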
Interesting. I don't see that behavior. I set up a null block device
with the following parameters:
modprobe null_blk submit_queues=96 queue_mode=2 gb=350 bs=4096 completion_nsec=0 hw_queue_depth=1024
And I ran the following fio job:
---
[global]
ioengine=io_uring
iodepth=8
direct=1
rw=read
filename=/dev/nullb0
cpus_allowed=0-95
cpus_allowed_policy=split
size=1g
offset_increment=10g
[32thread]
numjobs=32
---
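(The job file can be saved under any name, say nullb-read.fio, the name being
just an example, and run with "fio nullb-read.fio".)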
If there are no watches on /dev or /dev/nullb0, then I see 70-79GiB/s
throughput from this job. If I add a watch on /dev/nullb0, there
appears to be a small performance hit, but it is within the run-to-run
variation. If I instead add a watch to /dev, the throughput drops to
~10GiB/s. So, I think this matches your initial report (the perf top
output closely matched yours).
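For anyone wanting to reproduce the two cases, one simple way to place such a
watch (this is just an illustration, assuming the inotify-tools package is
installed; any inotify or fanotify consumer on those paths should show the
same effect) is to run, in another terminal, either:

  inotifywait -m -e access -e modify /dev          # watch on the directory
or
  inotifywait -m -e access -e modify /dev/nullb0   # watch on the device node

while the fio job runs.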
Can you run the attached script to verify that nothing is watching /dev
when you have udev configured to watch the nvme device, and report back?
Run it with the path as the argument, so "inotify-watchers.sh /dev".
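In case the attachment gets stripped somewhere along the way, the idea is just
to walk /proc and look for inotify marks on the target inode. A rough sketch
of the same check (this is not the attached script, and the fdinfo line format
is kernel-dependent, so treat it as an illustration only):

  #!/bin/sh
  # List processes holding an inotify watch on the inode of the given path.
  path="$1"
  ino_hex=$(printf '%x' "$(stat -c %i "$path")")
  for f in /proc/[0-9]*/fdinfo/*; do
      # inotify fds show lines like: inotify wd:3 ino:<hex> sdev:<hex> mask:...
      if grep -q "^inotify wd:.* ino:${ino_hex} " "$f" 2>/dev/null; then
          pid=$(echo "$f" | cut -d/ -f3)
          echo "pid $pid ($(cat /proc/$pid/comm 2>/dev/null)) watches $path"
      fi
  done

It only covers inotify; fanotify marks show up with a different fdinfo format.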
If there is no watch on /dev, and you still see a performance problem,
then we'll need to start investigating that. A good starting point
would be details of how you are testing, along with new perf top output.
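For the perf side, something along the lines of "perf top -g" while the job is
running, or "perf record -a -g -- sleep 10" followed by "perf report", should
be enough to capture it; whatever you used for the original report works too.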
-Jeff
> 3) I think there is nothing wrong with Pipewire and udev; they simply want to watch what is going on in /dev.
> I don't think they are interested in (and it is not the goal/charter
> of fsnotify) quantifying millions of read/write accesses per second to a
> file they watch. There are other tools for that, optimized for that task.
>
> I think that, to avoid the overhead, the fsnotify subsystem should be
> refined to coalesce high-frequency read/write file accesses.
> Or the code (like aio/io_uring) generating high-frequency fsnotify events should do the coalescing itself.
> Or the user should be given a way to turn off fsnotify calls for reads/writes on a specific file.
> Right now, the only way to work around the CPU overhead without hacking is
> to disable the services watching /dev. That means people can't use these
> services anymore, which doesn't seem right.
>
> Regards,
>
> Pierre
>
>
>> -----Original Message-----
>> From: Pierre Labat
>> Sent: Monday, August 14, 2023 9:31 AM
>> To: Jeff Moyer <[email protected]>
>> Cc: Jens Axboe <[email protected]>; '[email protected]' <io-
>> [email protected]>
>> Subject: RE: [EXT] Re: FYI, fsnotify contention with aio and io_uring.
>>
>> Hi Jeff,
>>
>> Indeed, by default, in my configuration, pipewire is running.
>> When I can re-test, I'll disable it and see if that removes the problem.
>> Thanks for the hint!
>>
>> Pierre
>>
>> > -----Original Message-----
>> > From: Jeff Moyer <[email protected]>
>> > Sent: Wednesday, August 9, 2023 10:15 AM
>> > To: Pierre Labat <[email protected]>
>> > Cc: Jens Axboe <[email protected]>; '[email protected]' <io-
>> > [email protected]>
>> > Subject: Re: [EXT] Re: FYI, fsnotify contention with aio and io_uring.
>> >
>> >
>> >
>> > Pierre Labat <[email protected]> writes:
>> >
>> > >
>> > > Hi Jeff and Jens,
>> > >
>> > > About "FAN_MODIFY fsnotify watch set on /dev".
>> > >
>> > > I was using the Fedora 34 distro (with a 6.3.9 kernel) and fio, without
>> > > any particular/specific settings.
>> > > I tried to see what could watch /dev but failed at that.
>> > > I used the inotify-info tool, but that displays watchers using the
>> > > inotify interface, and nothing was watching /dev via inotify.
>> > > I need to figure out how to do the same for the fanotify interface.
>> > > I'll look at it again and let you know.
>> >
>> > You wouldn't happen to be running pipewire, would you?
>> >
>> > https://gitlab.freedesktop.org/pipewire/pipewire/-/commit/88f0dbd6fcd0a412fc4bece22afdc3ba0151e4cf
>> >
>> > -Jeff
>> >
>> > >
>> > > Regards,
>> > >
>> > > Pierre
>> > >
>> > >
>> > >
>> > >> -----Original Message-----
>> > >> From: Jens Axboe <[email protected]>
>> > >> Sent: Tuesday, August 8, 2023 2:41 PM
>> > >> To: Jeff Moyer <[email protected]>; Pierre Labat
>> > >> <[email protected]>
>> > >> Cc: '[email protected]' <[email protected]>
>> > >> Subject: [EXT] Re: FYI, fsnotify contention with aio and io_uring.
>> > >>
>> > >>
>> > >>
>> > >> On 8/7/23 2:11 PM, Jeff Moyer wrote:
>> > >> > Hi, Pierre,
>> > >> >
>> > >> > Pierre Labat <[email protected]> writes:
>> > >> >
>> > >> >> Hi,
>> > >> >>
>> > >> >> This is FYI; maybe you already know about this, but in case you
>> > >> >> don't....
>> > >> >>
>> > >> >> I was pushing the limit of the number of NVMe read IOPS that FIO
>> > >> >> + the Linux OS can handle. For that, I have something special
>> > >> >> under the Linux nvme driver. As a consequence, I am not limited
>> > >> >> by whatever the NVMe SSD's max IOPS or IO latency would be.
>> > >> >>
>> > >> >> As I cranked up the number of system cores and FIO jobs doing
>> > >> >> direct 4k random reads on /dev/nvme0n1, I hit a wall. The IOPS
>> > >> >> scaling slows (becomes less than linear), and around 15 FIO jobs
>> > >> >> on 15 core threads the overall IOPS in fact goes down as I add
>> > >> >> more FIO jobs. For example, on a system with 24 cores/48 threads,
>> > >> >> when I go beyond 15 FIO jobs, the overall IOPS starts to go down.
>> > >> >>
>> > >> >> The same happens for both io_uring and aio. I was using kernel
>> > >> >> version 6.3.9 and one namespace (/dev/nvme0n1).
>> > >> >
>> > >> > [snip]
>> > >> >
>> > >> >> As you can see, 76% of the CPU on the box is sucked up by
>> > >> >> lockref_get_not_zero() and lockref_put_return(). Looking at the
>> > >> >> code, there is contention when io_uring calls fsnotify_access().
>> > >> >
>> > >> > Is there a FAN_MODIFY fsnotify watch set on /dev? If so, it
>> > >> > might be a good idea to find out what set it and why.
>> > >>
>> > >> This would be my guess too; some distros do seem to do that. The
>> > >> notification bits scale horribly; nobody should use them for anything
>> > >> high performance...
>> > >>
>> > >> --
>> > >> Jens Axboe
[-- Attachment #2: inotify-watchers.sh --]
[-- Type: application/x-sh, Size: 795 bytes --]
Thread overview: 8+ messages
2023-08-04 17:47 FYI, fsnotify contention with aio and io_uring Pierre Labat
2023-08-07 20:11 ` Jeff Moyer
2023-08-08 21:41 ` Jens Axboe
2023-08-09 16:33 ` [EXT] " Pierre Labat
2023-08-09 17:14 ` Jeff Moyer
2023-08-14 16:30 ` Pierre Labat
2023-08-29 21:54 ` Pierre Labat
2023-09-14 19:11 ` Jeff Moyer [this message]