From: Jens Axboe <[email protected]>
To: Matthew Wilcox <[email protected]>, [email protected]
Cc: Johannes Weiner <[email protected]>, Hao_Xu <[email protected]>
Subject: Re: Loophole in async page I/O
Date: Mon, 12 Oct 2020 16:22:43 -0600
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>

On 10/12/20 4:08 PM, Jens Axboe wrote:
> On 10/12/20 3:13 PM, Matthew Wilcox wrote:
>> This one's pretty unlikely, but there's a case in buffered reads where
>> an IOCB_WAITQ read can end up sleeping.
>>
>> generic_file_buffered_read():
>> 		page = find_get_page(mapping, index);
>> ...
>> 		if (!PageUptodate(page)) {
>> ...
>> 			if (iocb->ki_flags & IOCB_WAITQ) {
>> ...
>> 				error = wait_on_page_locked_async(page,
>> 								iocb->ki_waitq);
>> wait_on_page_locked_async():
>> 	if (!PageLocked(page))
>> 		return 0;
>> (back to generic_file_buffered_read):
>> 			if (!mapping->a_ops->is_partially_uptodate(page,
>> 							offset, iter->count))
>> 				goto page_not_up_to_date_locked;
>>
>> page_not_up_to_date_locked:
>> 		if (iocb->ki_flags & (IOCB_NOIO | IOCB_NOWAIT)) {
>> 			unlock_page(page);
>> 			put_page(page);
>> 			goto would_block;
>> 		}
>> ...
>> 		error = mapping->a_ops->readpage(filp, page);
>> (will unlock page on I/O completion)
>> 		if (!PageUptodate(page)) {
>> 			error = lock_page_killable(page);
>>
>> So if we have IOCB_WAITQ set but IOCB_NOWAIT clear, we'll call ->readpage()
>> and wait for the I/O to complete. I can't quite figure out if this is
>> intentional -- I think not; if I understand the semantics right, we
>> should be returning -EIOCBQUEUED and punting to an I/O thread to
>> kick off the I/O and wait.
>>
>> I think the right fix is to return -EIOCBQUEUED from
>> wait_on_page_locked_async() if the page isn't locked. ie this:
>>
>> @@ -1258,7 +1258,7 @@ static int wait_on_page_locked_async(struct page *page,
>>  				     struct wait_page_queue *wait)
>>  {
>>  	if (!PageLocked(page))
>> -		return 0;
>> +		return -EIOCBQUEUED;
>>  	return __wait_on_page_locked_async(compound_head(page), wait, false);
>>  }
>>
>> But as I said, I'm not sure what the semantics are supposed to be.
>
> If NOWAIT isn't set, then the issue attempt is from the helper thread
> already, and IOCB_WAITQ shouldn't be set either (the latter doesn't
> matter for this discussion). So it's totally fine and expected to block
> at that point.
>
> Hmm actually, I believe that:
>
> commit c8d317aa1887b40b188ec3aaa6e9e524333caed1
> Author: Hao Xu <[email protected]>
> Date: Tue Sep 29 20:00:45 2020 +0800
>
> io_uring: fix async buffered reads when readahead is disabled
>
> maybe messed up that case, so we could block off the retry-path. I'll
> take a closer look, looks like that can be the case if read-ahead is
> disabled.
>
> In general, we can only return -EIOCBQUEUED if the IO has been started
> or is in progress already. That means we can safely rely on being told
> when it's unlocked/done. If we need to block, we should be returning
> -EAGAIN, which would punt to a worker thread.

Something like the below might be a better solution - just always use
the read-ahead to generate the IO, for the requested range. That won't
issue any IO beyond what we asked for. And ensure we don't clear NOWAIT
on the io_uring side for retry.

Totally untested... Just trying to get the idea across. We might need
some low cap on req_count in case the range is large. Hao Xu, can you
try with this? Thinking of your read-ahead disabled slowdown as well,
this could very well be the reason why.

diff --git a/fs/io_uring.c b/fs/io_uring.c
index aae0ef2ec34d..9a2dfe132665 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -3107,7 +3107,6 @@ static bool io_rw_should_retry(struct io_kiocb *req)
 	wait->wait.flags = 0;
 	INIT_LIST_HEAD(&wait->wait.entry);
 	kiocb->ki_flags |= IOCB_WAITQ;
-	kiocb->ki_flags &= ~IOCB_NOWAIT;
 	kiocb->ki_waitq = wait;
 
 	io_get_req_task(req);
diff --git a/mm/readahead.c b/mm/readahead.c
index 3c9a8dd7c56c..693af86d171d 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -568,15 +568,16 @@ void page_cache_sync_readahead(struct address_space *mapping,
 			       struct file_ra_state *ra, struct file *filp,
 			       pgoff_t index, unsigned long req_count)
 {
-	/* no read-ahead */
-	if (!ra->ra_pages)
-		return;
-
 	if (blk_cgroup_congested())
 		return;
 
-	/* be dumb */
-	if (filp && (filp->f_mode & FMODE_RANDOM)) {
+	/*
+	 * Even if read-ahead is disabled, issue this request as read-ahead
+	 * as we'll need it to satisfy the requested range. The forced
+	 * read-ahead will do the right thing and limit the read to just the
+	 * requested range.
+	 */
+	if (!ra->ra_pages || (filp && (filp->f_mode & FMODE_RANDOM))) {
 		force_page_cache_readahead(mapping, filp, index, req_count);
 		return;
 	}

--
Jens Axboe
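
The "low cap on req_count" mentioned in the message is not spelled out in the patch. A minimal sketch of what such a clamp might look like follows; the constant `FORCED_RA_MAX_PAGES`, its value, and the helper name `cap_req_count()` are all hypothetical, since the message proposes no specific limit:

```c
#include <assert.h>

/* Hypothetical cap, in pages. The message does not suggest a value. */
#define FORCED_RA_MAX_PAGES 32UL

_Static_assert(FORCED_RA_MAX_PAGES > 0, "cap must allow at least one page");

/*
 * Clamp the number of pages requested from a forced read-ahead so that
 * one very large read cannot queue an unbounded amount of I/O at once.
 * Pages beyond the cap would be picked up by later read attempts.
 */
static unsigned long cap_req_count(unsigned long req_count)
{
	return req_count < FORCED_RA_MAX_PAGES ? req_count
					       : FORCED_RA_MAX_PAGES;
}
```

In the patched page_cache_sync_readahead() this would presumably be applied to req_count before the call to force_page_cache_readahead(); whether a cap is needed at all is left open in the message.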