From: Martin Raiber <[email protected]>
To: [email protected], [email protected], [email protected]
Subject: Re: [PATCH] btrfs: Prevent nowait or async read from doing sync IO
Date: Mon, 8 Mar 2021 19:03:30 +0000
Message-ID: <0102017813390ff7-879ae32a-99ac-4c52-a01d-58c85686d2f8-000000@eu-west-1.amazonses.com>
In-Reply-To: <[email protected]>

On 26.02.2021 18:00 David Sterba wrote:
> On Fri, Jan 08, 2021 at 12:02:48AM +0000, Martin Raiber wrote:
>> When reading from btrfs file via io_uring I get following
>> call traces:
>>
>> [<0>] wait_on_page_bit+0x12b/0x270
>> [<0>] read_extent_buffer_pages+0x2ad/0x360
>> [<0>] btree_read_extent_buffer_pages+0x97/0x110
>> [<0>] read_tree_block+0x36/0x60
>> [<0>] read_block_for_search.isra.0+0x1a9/0x360
>> [<0>] btrfs_search_slot+0x23d/0x9f0
>> [<0>] btrfs_lookup_csum+0x75/0x170
>> [<0>] btrfs_lookup_bio_sums+0x23d/0x630
>> [<0>] btrfs_submit_data_bio+0x109/0x180
>> [<0>] submit_one_bio+0x44/0x70
>> [<0>] extent_readahead+0x37a/0x3a0
>> [<0>] read_pages+0x8e/0x1f0
>> [<0>] page_cache_ra_unbounded+0x1aa/0x1f0
>> [<0>] generic_file_buffered_read+0x3eb/0x830
>> [<0>] io_iter_do_read+0x1a/0x40
>> [<0>] io_read+0xde/0x350
>> [<0>] io_issue_sqe+0x5cd/0xed0
>> [<0>] __io_queue_sqe+0xf9/0x370
>> [<0>] io_submit_sqes+0x637/0x910
>> [<0>] __x64_sys_io_uring_enter+0x22e/0x390
>> [<0>] do_syscall_64+0x33/0x80
>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>
>> Prevent those by setting IOCB_NOIO before calling
>> generic_file_buffered_read.
>>
>> Async read has the same problem. So disable that by removing
>> FMODE_BUF_RASYNC. This was added with commit
>> 8730f12b7962b21ea9ad2756abce1e205d22db84 ("btrfs: flag files as
>> supporting buffered async reads") with 5.9. Io_uring will read
>> the data via worker threads if it can't be read without sync IO
>> this way.
>>
>> Signed-off-by: Martin Raiber <[email protected]>
>> ---
>> fs/btrfs/file.c | 15 +++++++++++++--
>> 1 file changed, 13 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
>> index 0e41459b8..8bb561f6d 100644
>> --- a/fs/btrfs/file.c
>> +++ b/fs/btrfs/file.c
>> @@ -3589,7 +3589,7 @@ static loff_t btrfs_file_llseek(struct file *file, loff_t offset, int whence)
>>
>> static int btrfs_file_open(struct inode *inode, struct file *filp)
>> {
>> - filp->f_mode |= FMODE_NOWAIT | FMODE_BUF_RASYNC;
>> + filp->f_mode |= FMODE_NOWAIT;
>> return generic_file_open(inode, filp);
>> }
>>
>> @@ -3639,7 +3639,18 @@ static ssize_t btrfs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
>> return ret;
>> }
>>
>> - return generic_file_buffered_read(iocb, to, ret);
>> + if (iocb->ki_flags & IOCB_NOWAIT)
>> + iocb->ki_flags |= IOCB_NOIO;
>> +
>> + ret = generic_file_buffered_read(iocb, to, ret);
>> +
>> + if (iocb->ki_flags & IOCB_NOWAIT) {
>> + iocb->ki_flags &= ~IOCB_NOIO;
>> + if (ret == 0)
>> + ret = -EAGAIN;
>> + }
> Christoph has some doubts about the code,
> https://lore.kernel.org/lkml/[email protected]/
>
> The patch has been in for-next but as I'm not sure it's correct and
> don't have a reproducer, I'll remove it again. We do want to fix the
> warning, maybe there's only something trivial missing but we need to be
> sure, I don't have enough expertise here.

The general gist of the criticism is kind of correct. However, it is generic_file_buffered_read/filemap_read that handles IOCB_NOIO. IOCB_NOIO has only been used by gfs2 so far; it was added in 5.8 with commit 41da51bce36f44eefc1e3d0f47d18841cbd065ba ....

However, I cannot see how to find out whether readahead was called with IOCB_NOWAIT from extent_readahead/btrfs_readahead/readahead_control. So should an additional parameter be added to address_space_operations.readahead? As mentioned, this is not too relevant to btrfs (because of the CRC calculation), but making readahead async in all cases (incl. IOCB_WAITQ) would be the proper solution.

W.r.t. testing: The lowest-effort way I can think of is to add an io_uring switch to xfs_io, so that xfstests can be run using io_uring (where possible). Then check via tracing/perf that no call stack contains both io_uring_enter and wait_on_page_bit (or any other blocking call).
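
The stack check described above could be automated along the following lines. This is only a minimal sketch, not part of the patch or of xfstests: the function names stack_blocks_in_io_uring() and count_blocking_stacks() are hypothetical, the input is assumed to be plain-text stacks separated by blank lines (for example, post-processed perf or /proc/<pid>/stack output), and the list of blocking symbols is an assumption beyond wait_on_page_bit.

```c
/* Sketch: flag stack traces that show a blocking call under io_uring_enter. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Symbols treated as blocking. Assumption: extend this list as needed. */
static const char *const blocking_syms[] = {
	"wait_on_page_bit",
	"io_schedule",
};

/*
 * Return true if a single stack trace (all frames in one string) contains
 * an io_uring_enter frame together with at least one blocking symbol.
 */
bool stack_blocks_in_io_uring(const char *stack)
{
	assert(stack != NULL);

	/* Substring match also covers __x64_sys_io_uring_enter. */
	if (strstr(stack, "io_uring_enter") == NULL)
		return false;

	for (size_t i = 0; i < sizeof(blocking_syms) / sizeof(blocking_syms[0]); i++) {
		if (strstr(stack, blocking_syms[i]) != NULL)
			return true;
	}
	return false;
}

/*
 * Count suspect stacks in a text buffer where individual stacks are
 * separated by one or more blank lines.
 */
size_t count_blocking_stacks(const char *text)
{
	size_t count = 0;
	const char *p = text;

	assert(text != NULL);

	while (*p != '\0') {
		const char *end = strstr(p, "\n\n");
		size_t len = end ? (size_t)(end - p) : strlen(p);
		char *one = malloc(len + 1);

		if (one == NULL)
			return count; /* out of memory: report what was counted so far */
		memcpy(one, p, len);
		one[len] = '\0';
		if (stack_blocks_in_io_uring(one))
			count++;
		free(one);

		/* Skip this stack and the blank line(s) that follow it. */
		p += len;
		while (*p == '\n')
			p++;
	}
	return count;
}
```

A non-zero result from count_blocking_stacks() on the collected traces would suggest that some io_uring submission path still performed a blocking wait. Plain substring matching is used for simplicity, so a symbol whose name merely contains one of the listed strings would also match.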