From: Hao_Xu <[email protected]>
To: Jens Axboe <[email protected]>,
Matthew Wilcox <[email protected]>,
[email protected]
Cc: Johannes Weiner <[email protected]>,
Andrew Morton <[email protected]>
Subject: Re: Loophole in async page I/O
Date: Thu, 15 Oct 2020 20:17:25 +0800
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>
On 2020/10/15 7:27 PM, Hao_Xu wrote:
> On 2020/10/15 4:57 AM, Jens Axboe wrote:
>> On 10/14/20 2:31 PM, Hao_Xu wrote:
>>> Hi Jens,
>>> I've done some tests with the new fix code and readahead disabled from
>>> userspace. Here are some results.
>>> As for the perf reports, I'm new to kernel internals, so I'm still
>>> investigating them. I'll keep looking into what causes the differences
>>> among the four perf reports (copy_user_enhanced_fast_string() in
>>> particular catches my eye).
>>>
>>> My environment is:
>>> server: physical server
>>> kernel: mainline 5.9.0-rc8+ latest commit 6f2f486d57c4d562cdf4
>>> fs: ext4
>>> device: nvme ssd
>>> fio: 3.20
>>>
>>> I did the tests by setting and then commenting out the line:
>>> filp->f_mode |= FMODE_BUF_RASYNC;
>>> in ext4_file_open() in fs/ext4/file.c.
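For reference, the spot I was toggling is in ext4_file_open(). A trimmed
sketch from memory, so treat the surrounding lines as approximate:

static int ext4_file_open(struct inode *inode, struct file *filp)
{
        ...
        /* toggle this line to enable/disable async buffered reads for ext4 */
        filp->f_mode |= FMODE_BUF_RASYNC;
        return dquot_file_open(inode, filp);
}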
>>
>> You don't have to modify the kernel; if you use a newer fio, then you can
>> essentially just add:
>>
>> --force_async=1
>>
>> after setting the engine to io_uring to get the same effect. Just a
>> heads up, as that might make it easier for you.
>>
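That would make things easier, thanks. For the record, the invocation I
have in mind is along these lines (the file path is only a placeholder,
iodepth is what I vary from 1 to 128, and I haven't double-checked the
exact spelling of the force_async option, so please take this as a
sketch):

  fio --name=randread --filename=/path/to/ext4/testfile --size=4G \
      --rw=randread --bs=4k --direct=0 --iodepth=32 \
      --ioengine=io_uring --force_async=1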
>>> The IOPS with readahead disabled from userspace are below:
>>>
>>> With the new fix code (which forces readahead):
>>> QD/Test  FMODE_BUF_RASYNC set    FMODE_BUF_RASYNC not set
>>>    1     10.8k                   10.3k
>>>    2     21.2k                   20.1k
>>>    4     41.1k                   39.1k
>>>    8     76.1k                   72.2k
>>>   16     133k                    126k
>>>   32     169k                    147k
>>>   64     176k                    160k
>>>  128     (1) 187k                (2) 156k
>>>
>>> Now the async buffered reads feature looks better in terms of IOPS,
>>> but it still looks similar to the async buffered reads feature in the
>>> mainline code.
>>
>> I'd say it looks better all around. And what you're completely
>> forgetting here is that when FMODE_BUF_RASYNC isn't set, then you're
>> using QD number of async workers to achieve that result. Hence you have
>> 1..128 threads potentially running on that one, vs having a _single_
>> process running with FMODE_BUF_RASYNC.
> I totally agree with this; the server I use has many CPUs, which lets
> the multiple async workers run truly in parallel.
>
>>
>>> With the mainline code (the fix in commit c8d317aa1887 ("io_uring: fix
>>> async buffered reads when readahead is disabled")):
>>> QD/Test  FMODE_BUF_RASYNC set    FMODE_BUF_RASYNC not set
>>>    1     10.9k                   10.2k
>>>    2     21.6k                   20.2k
>>>    4     41.0k                   39.9k
>>>    8     79.7k                   75.9k
>>>   16     141k                    138k
>>>   32     169k                    237k
>>>   64     190k                    316k
>>>  128     (3) 195k                (4) 315k
>>>
>>> Considering the numbers at (1), (2), (3) and (4), the new fix doesn't
>>> seem to fix the slowdown; instead it makes the number at (4) drop to
>>> the number at (2).
>>
>> Not sure why there would be a difference between 2 and 4, that does seem
>> odd. I'll see if I can reproduce that. More questions below.
>>
>>> The perf reports for situations (1), (2), (3) and (4) are:
>>> (1)
>>> # Overhead  Command  Shared Object     Symbol
>>> # ........  .......  ................  ..................................
>>> #
>>>   10.19%    fio      [kernel.vmlinux]  [k] copy_user_enhanced_fast_string
>>>    8.53%    fio      fio               [.] clock_thread_fn
>>>    4.67%    fio      [kernel.vmlinux]  [k] xas_load
>>>    2.18%    fio      [kernel.vmlinux]  [k] clear_page_erms
>>>    2.02%    fio      libc-2.24.so      [.] __memset_avx2_erms
>>>    1.55%    fio      [kernel.vmlinux]  [k] mutex_unlock
>>>    1.51%    fio      [kernel.vmlinux]  [k] shmem_getpage_gfp
>>>    1.48%    fio      [kernel.vmlinux]  [k] native_irq_return_iret
>>>    1.48%    fio      [kernel.vmlinux]  [k] get_page_from_freelist
>>>    1.46%    fio      [kernel.vmlinux]  [k] generic_file_buffered_read
>>>    1.45%    fio      [nvme]            [k] nvme_irq
>>>    1.25%    fio      [kernel.vmlinux]  [k] __list_del_entry_valid
>>>    1.22%    fio      [kernel.vmlinux]  [k] free_pcppages_bulk
>>>    1.15%    fio      [kernel.vmlinux]  [k] _raw_spin_lock
>>>    1.12%    fio      fio               [.] get_io_u
>>>    0.81%    fio      [ext4]            [k] ext4_mpage_readpages
>>>    0.78%    fio      fio               [.] fio_gettime
>>>    0.76%    fio      [kernel.vmlinux]  [k] find_get_entries
>>>    0.75%    fio      [vdso]            [.] __vdso_clock_gettime
>>>    0.73%    fio      [kernel.vmlinux]  [k] release_pages
>>>    0.68%    fio      [kernel.vmlinux]  [k] find_get_entry
>>>    0.68%    fio      fio               [.] io_u_queued_complete
>>>    0.67%    fio      [kernel.vmlinux]  [k] io_async_buf_func
>>>    0.65%    fio      [kernel.vmlinux]  [k] io_submit_sqes
>>
>> These profiles are of marginal use, as you're only profiling fio itself,
>> not all of the async workers that are running for !FMODE_BUF_RASYNC.
>>
> Ah, I got it. Thanks.
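Right, for the next round I'll profile system-wide so the io-wq workers
are included as well, i.e. roughly (not the exact command used for the
reports above):

  perf record -a -g -- sleep 5   # while the fio job is running
  perf report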
>> How long does the test run? It looks suspect that clock_thread_fn shows
>> up in the profiles at all.
>>
> it runs about 5 msec, randread 4G with bs=4k
Sorry, 5 seconds, not 5 msec.
>> And is it actually doing IO, or are you using shm/tmpfs for this test?
>> Isn't ext4 hosting the file? I see a lot of shmem_getpage_gfp(), makes
>> me a little confused.
>>
> I'm using ext4 on a real NVMe SSD device. From the call stack,
> shmem_getpage_gfp() is reached from __memset_avx2_erms in libc.
> There are ext4-related functions in all four reports.
> I'm doing more checking to see whether it is my test process that causes
> the high IOPS in case (4).