From: Jens Axboe <[email protected]>
To: Linus Torvalds <[email protected]>
Cc: io-uring <[email protected]>,
linux-fsdevel <[email protected]>,
Al Viro <[email protected]>
Subject: Re: [PATCH 2/3] io_uring: use iov_iter state save/restore helpers
Date: Tue, 14 Sep 2021 17:02:27 -0600
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>

On 9/14/21 1:37 PM, Jens Axboe wrote:
> On 9/14/21 12:45 PM, Linus Torvalds wrote:
>> On Tue, Sep 14, 2021 at 7:18 AM Jens Axboe <[email protected]> wrote:
>>>
>>>
>>> + iov_iter_restore(iter, state);
>>> +
>> ...
>>> rw->bytes_done += ret;
>>> + iov_iter_advance(iter, ret);
>>> + if (!iov_iter_count(iter))
>>> + break;
>>> + iov_iter_save_state(iter, state);
>>
>> Ok, so now you keep iov_iter and the state always in sync by just
>> always resetting the iter back and then walking it forward explicitly
>> - and re-saving the state.
>>
>> That seems safe, if potentially unnecessarily expensive.
>
> Right, it's not ideal if it's a big range of IO, then it'll definitely
> be noticeable. But not too worried about it, at least not for now...
>
>> I guess re-walking lots of iovec entries is actually very unlikely in
>> practice, so maybe this "stupid brute-force" model is the right one.
>
> Not sure what the alternative is here. We could do something similar to
> __io_import_fixed() as we're only dealing with iter types where we can
> do that, but probably best left as a later optimization if it's deemed
> necessary.
>
>> I do find the odd "use __state vs rw->state" to be very confusing,
>> though. Particularly in io_read(), where you do this:
>>
>> + iov_iter_restore(iter, state);
>> +
>> ret2 = io_setup_async_rw(req, iovec, inline_vecs, iter, true);
>> if (ret2)
>> return ret2;
>>
>> iovec = NULL;
>> rw = req->async_data;
>> - /* now use our persistent iterator, if we aren't already */
>> - iter = &rw->iter;
>> + /* now use our persistent iterator and state, if we aren't already */
>> + if (iter != &rw->iter) {
>> + iter = &rw->iter;
>> + state = &rw->iter_state;
>> + }
>>
>> do {
>> - io_size -= ret;
>> rw->bytes_done += ret;
>> + iov_iter_advance(iter, ret);
>> + if (!iov_iter_count(iter))
>> + break;
>> + iov_iter_save_state(iter, state);
>>
>>
>> Note how it first does that iov_iter_restore() on iter/state, but
>> then it *replaces* the iter/state pointers, and then it does
>> iov_iter_advance() on the replacement ones.
>
> We restore the iter so it's the same as before we did the read_iter
> call, and then setup a consistent copy of the iov/iter in case we need
> to punt this request for retry. rw->iter should have the same state as
> iter at this point, and since rw->iter is the copy we'll use going
> forward, we're advancing that one in case ret > 0.
>
> The other case is that no persistent state is needed, and then iter
> remains the same.
>
> I'll take a second look at this part and see if I can make it a bit more
> straightforward, or at least comment it properly.

I hacked up something that shortens the iter for the initial IO, so we
could more easily test the retry path and the state. It really is a
hack, but the idea was to issue 64K IO from fio, and then the initial
attempt would be truncated to anywhere from 4K-60K. That forces a retry.

I ran this with both 16 segments and 8 segments, verifying that it
hits both the UIO_FASTIOV and alloc paths.
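
For anyone not familiar with the two paths, here is a rough userspace
sketch of the distinction, not the kernel code. Up to UIO_FASTIOV (8)
segments fit in the on-stack inline_vecs array, anything larger needs a
separate allocation. pick_iovec(), FAST_SEGS and the `allocated` out
parameter are names made up for this illustration.

```c
#include <stdlib.h>
#include <sys/uio.h>

/* Same value as the kernel's UIO_FASTIOV, renamed for a userspace build. */
#define FAST_SEGS 8

/*
 * Hypothetical helper: return storage for nr_segs iovecs. The caller's
 * inline array is used when it is large enough, otherwise the storage
 * is heap allocated. *allocated is set to the heap pointer that must be
 * freed later, or to NULL on the inline path.
 */
static struct iovec *pick_iovec(struct iovec *inline_vecs, size_t nr_segs,
				struct iovec **allocated)
{
	*allocated = NULL;
	if (nr_segs <= FAST_SEGS)
		return inline_vecs;

	*allocated = calloc(nr_segs, sizeof(struct iovec));
	return *allocated;
}
```

With this model, 8 segments stay on the inline path and 16 segments take
the alloc path, which is why both counts were tested.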

I did find one issue with that, see the last hunk in the hack. We
need to increment rw->bytes_done if we don't break, or set ret to
0 if we do. Otherwise that last ret ends up being accounted twice.
But apart from that, it passes data verification runs.
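
To make the double accounting concrete, here is a toy userspace model
of the loop, not the real io_read(). It assumes the completion path
reports rw->bytes_done plus the final ret, and that the values in
reads[] sum to exactly `total`. toy_total() and its parameters are
invented for this sketch.

```c
#include <stddef.h>

/*
 * Toy model of the retry loop's accounting. reads[] holds what each
 * read attempt returns and must sum to exactly total. The return value
 * models what completion would report, assuming it adds bytes_done to
 * the final ret.
 */
static size_t toy_total(const size_t *reads, size_t nr, size_t total,
			int account_before_break)
{
	size_t remaining = total;	/* stands in for iov_iter_count() */
	size_t bytes_done = 0;		/* stands in for rw->bytes_done */
	size_t ret = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		ret = reads[i];
		if (account_before_break)
			bytes_done += ret;	/* old order */
		remaining -= ret;		/* iov_iter_advance() */
		if (!remaining)
			break;
		if (!account_before_break)
			bytes_done += ret;	/* order from the last hunk */
	}
	return bytes_done + ret;
}
```

For a 64K request that completes as 16K followed by 48K, the old order
reports 112K in this model, while the order from the last hunk reports
the expected 64K.
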
diff --git a/fs/io_uring.c b/fs/io_uring.c
index dc1ff47e3221..484c86252f9d 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -744,6 +744,7 @@ enum {
 	REQ_F_NOWAIT_READ_BIT,
 	REQ_F_NOWAIT_WRITE_BIT,
 	REQ_F_ISREG_BIT,
+	REQ_F_TRUNCATED_BIT,
 	/* not a real bit, just to check we're not overflowing the space */
 	__REQ_F_LAST_BIT,
@@ -797,6 +798,7 @@ enum {
 	REQ_F_REFCOUNT = BIT(REQ_F_REFCOUNT_BIT),
 	/* there is a linked timeout that has to be armed */
 	REQ_F_ARM_LTIMEOUT = BIT(REQ_F_ARM_LTIMEOUT_BIT),
+	REQ_F_TRUNCATED = BIT(REQ_F_TRUNCATED_BIT),
 };
 struct async_poll {
@@ -3454,11 +3456,12 @@ static int io_read(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs;
 	struct kiocb *kiocb = &req->rw.kiocb;
-	struct iov_iter __iter, *iter = &__iter;
+	struct iov_iter __i, __iter, *iter = &__iter;
 	struct io_async_rw *rw = req->async_data;
 	bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
 	struct iov_iter_state __state, *state;
 	ssize_t ret, ret2;
+	bool do_restore = false;
 	if (rw) {
 		iter = &rw->iter;
@@ -3492,8 +3495,25 @@ static int io_read(struct io_kiocb *req, unsigned int issue_flags)
 		return ret;
 	}
+	if (!(req->flags & REQ_F_TRUNCATED) && !(iov_iter_count(iter) & 4095)) {
+		int nr_vecs;
+
+		__i = *iter;
+		nr_vecs = 1 + (prandom_u32() % iter->nr_segs);
+		iter->nr_segs = nr_vecs;
+		iter->count = nr_vecs * 8192;
+		req->flags |= REQ_F_TRUNCATED;
+		do_restore = true;
+	}
+
 	ret = io_iter_do_read(req, iter);
+	if (ret == -EAGAIN) {
+		req->flags &= ~REQ_F_TRUNCATED;
+		*iter = __i;
+		do_restore = false;
+	}
+
 	if (ret == -EAGAIN || (req->flags & REQ_F_REISSUE)) {
 		req->flags &= ~REQ_F_REISSUE;
 		/* IOPOLL retry should happen for io-wq threads */
@@ -3513,6 +3533,9 @@ static int io_read(struct io_kiocb *req, unsigned int issue_flags)
 	iov_iter_restore(iter, state);
+	if (do_restore)
+		*iter = __i;
+
 	ret2 = io_setup_async_rw(req, iovec, inline_vecs, iter, true);
 	if (ret2)
 		return ret2;
@@ -3526,10 +3549,10 @@ static int io_read(struct io_kiocb *req, unsigned int issue_flags)
 	}
 	do {
-		rw->bytes_done += ret;
 		iov_iter_advance(iter, ret);
 		if (!iov_iter_count(iter))
 			break;
+		rw->bytes_done += ret;
 		iov_iter_save_state(iter, state);
 		/* if we can retry, do so with the callbacks armed */
--
Jens Axboe