public inbox for [email protected]
 help / color / mirror / Atom feed
* Phoronix pts fio io_uring test regression report on upstream v6.1 and v5.15
@ 2023-01-19 21:36 Saeed Mirzamohammadi
  2023-01-20  4:12 ` Jens Axboe
  0 siblings, 1 reply; 6+ messages in thread
From: Saeed Mirzamohammadi @ 2023-01-19 21:36 UTC (permalink / raw)
  To: io-uring; +Cc: axboe, asml.silence, linux-kernel

Hello,
 
I'm reporting a performance regression after the commit below on phoronix pts/fio test and with the config that is added in the end of this email:

Link: https://lore.kernel.org/all/[email protected]/

commit 7b3188e7ed54102a5dcc73d07727f41fb528f7c8
Author: Jens Axboe [email protected]
Date:   Mon Aug 30 19:37:41 2021 -0600
 
    io_uring: IORING_OP_WRITE needs hash_reg_file set
 
We observed regression on the latest v6.1.y and v5.15.y upstream kernels (Haven't tested other stable kernels). We noticed that performance regression improved 45% after the revert of the commit above.
 
All of the benchmarks below have experienced around ~45% regression.
phoronix-pts-fio-1.15.0-RandomWrite-EngineIO_uring-BufferedNo-DirectYes-BlockSize4KB-MB-s_xfs
phoronix-pts-fio-1.15.0-SequentialWrite-EngineIO_uring-BufferedNo-DirectYes-BlockSize4KB-MB-s_xfs
phoronix-pts-fio-1.15.0-SequentialWrite-EngineIO_uring-BufferedYes-DirectNo-BlockSize4KB-MB-s_xfs
 
We tend to see this regression on 4KB BlockSize tests.
 
We tried out changing force_async but that has no effect on the result. Also, backported a modified version of the patch mentioned here (https://lkml.org/lkml/2022/7/20/854) but that didn't affect performance.
 
Do you have any suggestions on any fixes or what else we can try to narrow down the issue?
 
Thanks a bunch,
Saeed
--------
 
Here is more info on the benchmark and system:
 
Here is the config for fio:
[global]
rw=randwrite
ioengine=io_uring
iodepth=64
size=1g
direct=1
buffered=0
startdelay=5
force_async=4
ramp_time=5
runtime=20
time_based
disk_util=0
clat_percentiles=0
disable_lat=1
disable_clat=1
disable_slat=1
filename=/data/fiofile
[test]
name=test
bs=4k
stonewall
 
df -Th output (file is on /data/):
Filesystem                 Type      Size  Used Avail Use% Mounted on
devtmpfs                   devtmpfs  252G     0  252G   0% /dev
tmpfs                      tmpfs     252G     0  252G   0% /dev/shm
tmpfs                      tmpfs     252G   18M  252G   1% /run
tmpfs                      tmpfs     252G     0  252G   0% /sys/fs/cgroup
/dev/mapper/ocivolume-root xfs        89G   17G   73G  19% /
/dev/mapper/ocivolume-oled xfs        10G  143M  9.9G   2% /var/oled
/dev/sda2                  xfs      1014M  643M  372M  64% /boot
/dev/sda1                  vfat      100M  5.0M   95M   6% /boot/efi
tmpfs                      tmpfs      51G     0   51G   0% /run/user/0
tmpfs                      tmpfs      51G     0   51G   0% /run/user/987
/dev/mapper/tank-lvm       xfs       100G  1.8G   99G   2% /data

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Phoronix pts fio io_uring test regression report on upstream v6.1 and v5.15
  2023-01-19 21:36 Phoronix pts fio io_uring test regression report on upstream v6.1 and v5.15 Saeed Mirzamohammadi
@ 2023-01-20  4:12 ` Jens Axboe
  2023-01-26  0:22   ` Saeed Mirzamohammadi
  0 siblings, 1 reply; 6+ messages in thread
From: Jens Axboe @ 2023-01-20  4:12 UTC (permalink / raw)
  To: Saeed Mirzamohammadi, io-uring; +Cc: asml.silence, linux-kernel

On 1/19/23 2:36?PM, Saeed Mirzamohammadi wrote:
> Hello,
>  
> I'm reporting a performance regression after the commit below on
> phoronix pts/fio test and with the config that is added in the end of
> this email:
> 
> Link: https://lore.kernel.org/all/[email protected]/
> 
> commit 7b3188e7ed54102a5dcc73d07727f41fb528f7c8
> Author: Jens Axboe [email protected]
> Date:   Mon Aug 30 19:37:41 2021 -0600
>  
>     io_uring: IORING_OP_WRITE needs hash_reg_file set
>  
> We observed regression on the latest v6.1.y and v5.15.y upstream
> kernels (Haven't tested other stable kernels). We noticed that
> performance regression improved 45% after the revert of the commit
> above.
>  
> All of the benchmarks below have experienced around ~45% regression.
> phoronix-pts-fio-1.15.0-RandomWrite-EngineIO_uring-BufferedNo-DirectYes-BlockSize4KB-MB-s_xfs
> phoronix-pts-fio-1.15.0-SequentialWrite-EngineIO_uring-BufferedNo-DirectYes-BlockSize4KB-MB-s_xfs
> phoronix-pts-fio-1.15.0-SequentialWrite-EngineIO_uring-BufferedYes-DirectNo-BlockSize4KB-MB-s_xfs
>  
> We tend to see this regression on 4KB BlockSize tests.
>  
> We tried out changing force_async but that has no effect on the
> result. Also, backported a modified version of the patch mentioned
> here (https://lkml.org/lkml/2022/7/20/854) but that didn't affect
> performance.
>  
> Do you have any suggestions on any fixes or what else we can try to
> narrow down the issue?

This is really mostly by design - the previous approach of not hashing
buffered writes on regular files would cause a lot of inode lock
contention due to lots of threads hammering on that.

That said, for XFS, we don't need to serialize on O_DIRECT writes. Don't
think we currently have a way to detect this as it isn't really
advertised. Something like the below might work, with the caveat that
this is totally untested.


diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 595a5bcf46b9..85fdc6f2efa4 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1171,7 +1171,8 @@ xfs_file_open(
 {
 	if (xfs_is_shutdown(XFS_M(inode->i_sb)))
 		return -EIO;
-	file->f_mode |= FMODE_NOWAIT | FMODE_BUF_RASYNC | FMODE_BUF_WASYNC;
+	file->f_mode |= FMODE_NOWAIT | FMODE_BUF_RASYNC | FMODE_BUF_WASYNC |
+			FMODE_ODIRECT_PARALLEL;
 	return generic_file_open(inode, file);
 }
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c1769a2c5d70..8541b9e53c2d 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -166,6 +166,9 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
 /* File supports DIRECT IO */
 #define	FMODE_CAN_ODIRECT	((__force fmode_t)0x400000)
 
+/* File supports parallel O_DIRECT writes */
+#define	FMODE_ODIRECT_PARALLEL	((__force fmode_t)0x800000)
+
 /* File was opened by fanotify and shouldn't generate fanotify events */
 #define FMODE_NONOTIFY		((__force fmode_t)0x4000000)
 
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index e680685e8a00..1409f6f69b13 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -424,7 +424,12 @@ static void io_prep_async_work(struct io_kiocb *req)
 		req->flags |= io_file_get_flags(req->file) << REQ_F_SUPPORT_NOWAIT_BIT;
 
 	if (req->flags & REQ_F_ISREG) {
-		if (def->hash_reg_file || (ctx->flags & IORING_SETUP_IOPOLL))
+		bool should_hash = def->hash_reg_file;
+
+		if (should_hash && (req->file->f_flags & O_DIRECT) &&
+		    (req->file->f_mode & FMODE_ODIRECT_PARALLEL))
+			should_hash = false;
+		if (should_hash || (ctx->flags & IORING_SETUP_IOPOLL))
 			io_wq_hash_work(&req->work, file_inode(req->file));
 	} else if (!req->file || !S_ISBLK(file_inode(req->file)->i_mode)) {
 		if (def->unbound_nonreg_file)

-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: Phoronix pts fio io_uring test regression report on upstream v6.1 and v5.15
  2023-01-20  4:12 ` Jens Axboe
@ 2023-01-26  0:22   ` Saeed Mirzamohammadi
  2023-01-26  0:28     ` Jens Axboe
  0 siblings, 1 reply; 6+ messages in thread
From: Saeed Mirzamohammadi @ 2023-01-26  0:22 UTC (permalink / raw)
  To: Jens Axboe, [email protected]
  Cc: [email protected], [email protected]

Hi Jens,

I applied your patch (with a minor conflict in xfs_file_open() since FMODE_BUF_WASYNC isn't in v5.15) and did the same series of tests on the v5.15 kernel. All the io_uring benchmarks regressed 20-45% after it. I haven't tested on v6.1 yet.

Thanks,
Saeed
________________________________________
From: Jens Axboe <[email protected]>
Sent: Thursday, January 19, 2023 8:12 PM
To: Saeed Mirzamohammadi; [email protected]
Cc: [email protected]; [email protected]
Subject: Re: Phoronix pts fio io_uring test regression report on upstream v6.1 and v5.15

On 1/19/23 2:36?PM, Saeed Mirzamohammadi wrote:
> Hello,
>
> I'm reporting a performance regression after the commit below on
> phoronix pts/fio test and with the config that is added in the end of
> this email:
>
> Link: https://urldefense.com/v3/__https://lore.kernel.org/all/[email protected]/__;!!ACWV5N9M2RV99hQ!KM7c30r4OiqvkxVW44cyWb3rZr85i28yKto8WAAcj8OgWAOp-ebzcrHuggGw96ivMFCARikAEWwjhUBYFqujONc$
>
> commit 7b3188e7ed54102a5dcc73d07727f41fb528f7c8
> Author: Jens Axboe [email protected]
> Date:   Mon Aug 30 19:37:41 2021 -0600
>
>     io_uring: IORING_OP_WRITE needs hash_reg_file set
>
> We observed regression on the latest v6.1.y and v5.15.y upstream
> kernels (Haven't tested other stable kernels). We noticed that
> performance regression improved 45% after the revert of the commit
> above.
>
> All of the benchmarks below have experienced around ~45% regression.
> phoronix-pts-fio-1.15.0-RandomWrite-EngineIO_uring-BufferedNo-DirectYes-BlockSize4KB-MB-s_xfs
> phoronix-pts-fio-1.15.0-SequentialWrite-EngineIO_uring-BufferedNo-DirectYes-BlockSize4KB-MB-s_xfs
> phoronix-pts-fio-1.15.0-SequentialWrite-EngineIO_uring-BufferedYes-DirectNo-BlockSize4KB-MB-s_xfs
>
> We tend to see this regression on 4KB BlockSize tests.
>
> We tried out changing force_async but that has no effect on the
> result. Also, backported a modified version of the patch mentioned
> here (https://urldefense.com/v3/__https://lkml.org/lkml/2022/7/20/854__;!!ACWV5N9M2RV99hQ!KM7c30r4OiqvkxVW44cyWb3rZr85i28yKto8WAAcj8OgWAOp-ebzcrHuggGw96ivMFCARikAEWwjhUBYbOxQftI$ ) but that didn't affect
> performance.
>
> Do you have any suggestions on any fixes or what else we can try to
> narrow down the issue?

This is really mostly by design - the previous approach of not hashing
buffered writes on regular files would cause a lot of inode lock
contention due to lots of threads hammering on that.

That said, for XFS, we don't need to serialize on O_DIRECT writes. Don't
think we currently have a way to detect this as it isn't really
advertised. Something like the below might work, with the caveat that
this is totally untested.


diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 595a5bcf46b9..85fdc6f2efa4 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1171,7 +1171,8 @@ xfs_file_open(
 {
        if (xfs_is_shutdown(XFS_M(inode->i_sb)))
                return -EIO;
-       file->f_mode |= FMODE_NOWAIT | FMODE_BUF_RASYNC | FMODE_BUF_WASYNC;
+       file->f_mode |= FMODE_NOWAIT | FMODE_BUF_RASYNC | FMODE_BUF_WASYNC |
+                       FMODE_ODIRECT_PARALLEL;
        return generic_file_open(inode, file);
 }

diff --git a/include/linux/fs.h b/include/linux/fs.h
index c1769a2c5d70..8541b9e53c2d 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -166,6 +166,9 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
 /* File supports DIRECT IO */
 #define        FMODE_CAN_ODIRECT       ((__force fmode_t)0x400000)

+/* File supports parallel O_DIRECT writes */
+#define        FMODE_ODIRECT_PARALLEL  ((__force fmode_t)0x800000)
+
 /* File was opened by fanotify and shouldn't generate fanotify events */
 #define FMODE_NONOTIFY         ((__force fmode_t)0x4000000)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index e680685e8a00..1409f6f69b13 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -424,7 +424,12 @@ static void io_prep_async_work(struct io_kiocb *req)
                req->flags |= io_file_get_flags(req->file) << REQ_F_SUPPORT_NOWAIT_BIT;

        if (req->flags & REQ_F_ISREG) {
-               if (def->hash_reg_file || (ctx->flags & IORING_SETUP_IOPOLL))
+               bool should_hash = def->hash_reg_file;
+
+               if (should_hash && (req->file->f_flags & O_DIRECT) &&
+                   (req->file->f_mode & FMODE_ODIRECT_PARALLEL))
+                       should_hash = false;
+               if (should_hash || (ctx->flags & IORING_SETUP_IOPOLL))
                        io_wq_hash_work(&req->work, file_inode(req->file));
        } else if (!req->file || !S_ISBLK(file_inode(req->file)->i_mode)) {
                if (def->unbound_nonreg_file)

--
Jens Axboe


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: Phoronix pts fio io_uring test regression report on upstream v6.1 and v5.15
  2023-01-26  0:22   ` Saeed Mirzamohammadi
@ 2023-01-26  0:28     ` Jens Axboe
       [not found]       ` <[email protected]>
  0 siblings, 1 reply; 6+ messages in thread
From: Jens Axboe @ 2023-01-26  0:28 UTC (permalink / raw)
  To: Saeed Mirzamohammadi, [email protected]
  Cc: [email protected], [email protected]

On 1/25/23 5:22?PM, Saeed Mirzamohammadi wrote:
> Hi Jens,
> 
> I applied your patch (with a minor conflict in xfs_file_open() since FMODE_BUF_WASYNC isn't in v5.15) and did the same series of tests on the v5.15 kernel. All the io_uring benchmarks regressed 20-45% after it. I haven't tested on v6.1 yet.

It should basically make the behavior the same as before once you apply
the patch, so please pass on the patch that you applied for 5.15 so we
can take a closer look.

Also, please don't top post. Replies go below the context, that's normal
OSS etiquette. Just like I did in the first one, and in this one.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Phoronix pts fio io_uring test regression report on upstream v6.1 and v5.15
       [not found]       ` <[email protected]>
@ 2023-01-26 18:35         ` Jens Axboe
  2023-02-14 18:58           ` Saeed Mirzamohammadi
  0 siblings, 1 reply; 6+ messages in thread
From: Jens Axboe @ 2023-01-26 18:35 UTC (permalink / raw)
  To: Saeed Mirzamohammadi
  Cc: [email protected], [email protected],
	[email protected]

On 1/26/23 11:04 AM, Saeed Mirzamohammadi wrote:
> Hi Jens,
> 
>> On Jan 25, 2023, at 4:28 PM, Jens Axboe <[email protected]> wrote:
>> 
>> On 1/25/23 5:22?PM, Saeed Mirzamohammadi wrote:
>>> Hi Jens,
>>> 
>>> I applied your patch (with a minor conflict in xfs_file_open() since FMODE_BUF_WASYNC isn't in v5.15) and did the same series of tests on the v5.15 kernel. All the io_uring benchmarks regressed 20-45% after it. I haven't tested on v6.1 yet.
>> 
>> It should basically make the behavior the same as before once you apply
>> the patch, so please pass on the patch that you applied for 5.15 so we
>> can take a closer look.
> 
> Attached the patch.

I tested the upstream variant, and it does what it's supposed to and
gets parallel writes on O_DIRECT. Unpatched, any dio write results in:

             fio-566     [000] .....   131.071108: io_uring_queue_async_work: ring 00000000706cb6c0, request 00000000b21691c4, user_data 0xaaab0e8e4c00, opcode WRITE, flags 0xe0040000, hashed queue, work 000000002c5aeb79

and after the patch:

             fio-376     [000] .....    24.590994: io_uring_queue_async_work: ring 000000007bdb650a, request 000000006b5350e0, user_data 0xaaab1b3e3c00, opcode WRITE, flags 0xe0040000, normal queue, work 00000000e3e81955

where the hashed queued is serialized based on the inode, and the normal
queue is not (eg they run in parallel).

As mentioned, the fio job being used isn't representative of anything
that should actually be run, the async flag really only exists for
experimentation. Do you have a real workload that is seeing a regression?
If yes, does that real workload change performance with the patch?

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Phoronix pts fio io_uring test regression report on upstream v6.1 and v5.15
  2023-01-26 18:35         ` Jens Axboe
@ 2023-02-14 18:58           ` Saeed Mirzamohammadi
  0 siblings, 0 replies; 6+ messages in thread
From: Saeed Mirzamohammadi @ 2023-02-14 18:58 UTC (permalink / raw)
  To: Jens Axboe
  Cc: [email protected], [email protected],
	[email protected]

Hi Jens,

> On Jan 26, 2023, at 10:35 AM, Jens Axboe <[email protected]> wrote:
> 
> On 1/26/23 11:04 AM, Saeed Mirzamohammadi wrote:
>> Hi Jens,
>> 
>>> On Jan 25, 2023, at 4:28 PM, Jens Axboe <[email protected]> wrote:
>>> 
>>> On 1/25/23 5:22?PM, Saeed Mirzamohammadi wrote:
>>>> Hi Jens,
>>>> 
>>>> I applied your patch (with a minor conflict in xfs_file_open() since FMODE_BUF_WASYNC isn't in v5.15) and did the same series of tests on the v5.15 kernel. All the io_uring benchmarks regressed 20-45% after it. I haven't tested on v6.1 yet.
>>> 
>>> It should basically make the behavior the same as before once you apply
>>> the patch, so please pass on the patch that you applied for 5.15 so we
>>> can take a closer look.
>> 
>> Attached the patch.
> 
> I tested the upstream variant, and it does what it's supposed to and
> gets parallel writes on O_DIRECT. Unpatched, any dio write results in:
> 
>             fio-566     [000] .....   131.071108: io_uring_queue_async_work: ring 00000000706cb6c0, request 00000000b21691c4, user_data 0xaaab0e8e4c00, opcode WRITE, flags 0xe0040000, hashed queue, work 000000002c5aeb79
> 
> and after the patch:
> 
>             fio-376     [000] .....    24.590994: io_uring_queue_async_work: ring 000000007bdb650a, request 000000006b5350e0, user_data 0xaaab1b3e3c00, opcode WRITE, flags 0xe0040000, normal queue, work 00000000e3e81955
> 

Thanks for looking into this.

> where the hashed queued is serialized based on the inode, and the normal
> queue is not (eg they run in parallel).
> 
> As mentioned, the fio job being used isn't representative of anything
> that should actually be run, the async flag really only exists for
> experimentation. Do you have a real workload that is seeing a regression?
> If yes, does that real workload change performance with the patch?

I tested without the async flag but didn’t see any change in the performance.

I haven’t tested any real workload yet. I’ll share with you if I noticed anything.

Thanks,
Saeed

p.s. I experienced multipathd issues with the patch that I had to work through. Never without the patch.

> 
> -- 
> Jens Axboe


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-02-14 18:58 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-01-19 21:36 Phoronix pts fio io_uring test regression report on upstream v6.1 and v5.15 Saeed Mirzamohammadi
2023-01-20  4:12 ` Jens Axboe
2023-01-26  0:22   ` Saeed Mirzamohammadi
2023-01-26  0:28     ` Jens Axboe
     [not found]       ` <[email protected]>
2023-01-26 18:35         ` Jens Axboe
2023-02-14 18:58           ` Saeed Mirzamohammadi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox