* [PATCHSET v2 0/3] Add ability to save/restore iov_iter state
@ 2021-09-14 14:17 Jens Axboe
2021-09-14 14:17 ` [PATCH 1/3] iov_iter: add helper to save " Jens Axboe
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Jens Axboe @ 2021-09-14 14:17 UTC (permalink / raw)
To: io-uring, linux-fsdevel; +Cc: torvalds, viro
Hi,
Linus didn't particularly love the iov_iter->truncated addition and how
it was used, and it was hard to disagree with that. Instead of relying
on tracking ->truncated, add a few pieces of state so we can safely
handle partial or errored read/write attempts (that we want to retry).
Then we can get rid of the iov_iter addition, and at the same time
handle cases that weren't handled correctly before.
I've run this through vectored read/write with io_uring on the commonly
problematic cases (dm and low depth SCSI device) which trigger these
conditions often, and it seems to pass muster.
For a discussion on this topic, see the thread here:
https://lore.kernel.org/linux-fsdevel/CAHk-=wiacKV4Gh-MYjteU0LwNBSGpWrK-Ov25HdqB1ewinrFPg@mail.gmail.com/
You can find these patches here:
https://git.kernel.dk/cgit/linux-block/log/?h=iov_iter.2
Changes since v1:
- Drop 'did_bytes' from iov_iter_restore(). Only two cases in io_uring
used it, and one of them had to be changed with v2. Better to just
make the subsequent iov_iter_advance() explicit at that point.
- Cleanup and sanitize the io_uring side, and ensure it's sane around
worker retries. No more digging into iov_iter_state from io_uring, we
use it just for save/restore purposes.
--
Jens Axboe
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 1/3] iov_iter: add helper to save iov_iter state
2021-09-14 14:17 [PATCHSET v2 0/3] Add ability to save/restore iov_iter state Jens Axboe
@ 2021-09-14 14:17 ` Jens Axboe
2021-09-14 14:17 ` [PATCH 2/3] io_uring: use iov_iter state save/restore helpers Jens Axboe
2021-09-14 14:17 ` [PATCH 3/3] Revert "iov_iter: track truncated size" Jens Axboe
2 siblings, 0 replies; 7+ messages in thread
From: Jens Axboe @ 2021-09-14 14:17 UTC (permalink / raw)
To: io-uring, linux-fsdevel; +Cc: torvalds, viro, Jens Axboe
In an ideal world, when someone is passed an iov_iter and returns X bytes,
then X bytes would have been consumed/advanced from the iov_iter. But we
have use cases that always consume the entire iterator, a few examples
of that are iomap and bdev O_DIRECT. This means we cannot rely on the
state of the iov_iter once we've called ->read_iter() or ->write_iter().
This would be easier if we didn't always have to deal with truncate of
the iov_iter, as rewinding would be trivial without that. We recently
added a commit to track the truncate state, but that grew the iov_iter
by 8 bytes and wasn't the best solution.
Implement a helper to save enough of the iov_iter state to sanely restore
it after we've called the read/write iterator helpers. This currently
only works for IOVEC/BVEC/KVEC as that's all we need, support for other
iterator types are left as an exercise for the reader.
Link: https://lore.kernel.org/linux-fsdevel/CAHk-=wiacKV4Gh-MYjteU0LwNBSGpWrK-Ov25HdqB1ewinrFPg@mail.gmail.com/
Signed-off-by: Jens Axboe <[email protected]>
---
include/linux/uio.h | 15 +++++++++++++++
lib/iov_iter.c | 36 ++++++++++++++++++++++++++++++++++++
2 files changed, 51 insertions(+)
diff --git a/include/linux/uio.h b/include/linux/uio.h
index 5265024e8b90..984c4ab74859 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -27,6 +27,12 @@ enum iter_type {
ITER_DISCARD,
};
+struct iov_iter_state {
+ size_t iov_offset;
+ size_t count;
+ unsigned long nr_segs;
+};
+
struct iov_iter {
u8 iter_type;
bool data_source;
@@ -55,6 +61,14 @@ static inline enum iter_type iov_iter_type(const struct iov_iter *i)
return i->iter_type;
}
+static inline void iov_iter_save_state(struct iov_iter *iter,
+ struct iov_iter_state *state)
+{
+ state->iov_offset = iter->iov_offset;
+ state->count = iter->count;
+ state->nr_segs = iter->nr_segs;
+}
+
static inline bool iter_is_iovec(const struct iov_iter *i)
{
return iov_iter_type(i) == ITER_IOVEC;
@@ -233,6 +247,7 @@ ssize_t iov_iter_get_pages(struct iov_iter *i, struct page **pages,
ssize_t iov_iter_get_pages_alloc(struct iov_iter *i, struct page ***pages,
size_t maxsize, size_t *start);
int iov_iter_npages(const struct iov_iter *i, int maxpages);
+void iov_iter_restore(struct iov_iter *i, struct iov_iter_state *state);
const void *dup_iter(struct iov_iter *new, struct iov_iter *old, gfp_t flags);
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index f2d50d69a6c3..755c10c5138c 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1972,3 +1972,39 @@ int import_single_range(int rw, void __user *buf, size_t len,
return 0;
}
EXPORT_SYMBOL(import_single_range);
+
+/**
+ * iov_iter_restore() - Restore a &struct iov_iter to the same state as when
+ * iov_iter_save_state() was called.
+ *
+ * @i: &struct iov_iter to restore
+ * @state: state to restore from
+ *
+ * Used after iov_iter_save_state() to bring restore @i, if operations may
+ * have advanced it.
+ *
+ * Note: only works on ITER_IOVEC, ITER_BVEC, and ITER_KVEC
+ */
+void iov_iter_restore(struct iov_iter *i, struct iov_iter_state *state)
+{
+ if (WARN_ON_ONCE(!iov_iter_is_bvec(i) && !iter_is_iovec(i)) &&
+ !iov_iter_is_kvec(i))
+ return;
+ i->iov_offset = state->iov_offset;
+ i->count = state->count;
+ /*
+ * For the *vec iters, nr_segs + iov is constant - if we increment
+ * the vec, then we also decrement the nr_segs count. Hence we don't
+ * need to track both of these, just one is enough and we can deduct
+ * the other from that. ITER_KVEC and ITER_IOVEC are the same struct
+ * size, so we can just increment the iov pointer as they are unionzed.
+ * ITER_BVEC _may_ be the same size on some archs, but on others it is
+ * not. Be safe and handle it separately.
+ */
+ BUILD_BUG_ON(sizeof(struct iovec) != sizeof(struct kvec));
+ if (iov_iter_is_bvec(i))
+ i->bvec -= state->nr_segs - i->nr_segs;
+ else
+ i->iov -= state->nr_segs - i->nr_segs;
+ i->nr_segs = state->nr_segs;
+}
--
2.33.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 2/3] io_uring: use iov_iter state save/restore helpers
2021-09-14 14:17 [PATCHSET v2 0/3] Add ability to save/restore iov_iter state Jens Axboe
2021-09-14 14:17 ` [PATCH 1/3] iov_iter: add helper to save " Jens Axboe
@ 2021-09-14 14:17 ` Jens Axboe
2021-09-14 18:45 ` Linus Torvalds
2021-09-14 14:17 ` [PATCH 3/3] Revert "iov_iter: track truncated size" Jens Axboe
2 siblings, 1 reply; 7+ messages in thread
From: Jens Axboe @ 2021-09-14 14:17 UTC (permalink / raw)
To: io-uring, linux-fsdevel; +Cc: torvalds, viro, Jens Axboe
Get rid of the need to do re-expand and revert on an iterator when we
encounter a short IO, or failure that warrants a retry. Use the new
state save/restore helpers instead.
We keep the iov_iter_state persistent across retries, if we need to
restart the read or write operation. If there's a pending retry, the
operation will always exit with the state correctly saved.
Signed-off-by: Jens Axboe <[email protected]>
---
fs/io_uring.c | 62 ++++++++++++++++++++++++++++++++++-----------------
1 file changed, 41 insertions(+), 21 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 855ea544807f..dbc97d440801 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -712,6 +712,7 @@ struct io_async_rw {
struct iovec fast_iov[UIO_FASTIOV];
const struct iovec *free_iovec;
struct iov_iter iter;
+ struct iov_iter_state iter_state;
size_t bytes_done;
struct wait_page_queue wpq;
};
@@ -2608,8 +2609,7 @@ static bool io_resubmit_prep(struct io_kiocb *req)
if (!rw)
return !io_req_prep_async(req);
- /* may have left rw->iter inconsistent on -EIOCBQUEUED */
- iov_iter_revert(&rw->iter, req->result - iov_iter_count(&rw->iter));
+ iov_iter_restore(&rw->iter, &rw->iter_state);
return true;
}
@@ -3310,12 +3310,16 @@ static int io_setup_async_rw(struct io_kiocb *req, const struct iovec *iovec,
if (!force && !io_op_defs[req->opcode].needs_async_setup)
return 0;
if (!req->async_data) {
+ struct io_async_rw *iorw;
+
if (io_alloc_async_data(req)) {
kfree(iovec);
return -ENOMEM;
}
io_req_map_rw(req, iovec, fast_iov, iter);
+ iorw = req->async_data;
+ iov_iter_save_state(&iorw->iter, &iorw->iter_state);
}
return 0;
}
@@ -3334,6 +3338,7 @@ static inline int io_rw_prep_async(struct io_kiocb *req, int rw)
iorw->free_iovec = iov;
if (iov)
req->flags |= REQ_F_NEED_CLEANUP;
+ iov_iter_save_state(&iorw->iter, &iorw->iter_state);
return 0;
}
@@ -3437,19 +3442,23 @@ static int io_read(struct io_kiocb *req, unsigned int issue_flags)
struct kiocb *kiocb = &req->rw.kiocb;
struct iov_iter __iter, *iter = &__iter;
struct io_async_rw *rw = req->async_data;
- ssize_t io_size, ret, ret2;
bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
+ struct iov_iter_state __state, *state;
+ ssize_t ret, ret2;
if (rw) {
iter = &rw->iter;
+ state = &rw->iter_state;
+ iov_iter_restore(iter, state);
iovec = NULL;
} else {
ret = io_import_iovec(READ, req, &iovec, iter, !force_nonblock);
if (ret < 0)
return ret;
+ state = &__state;
+ iov_iter_save_state(iter, state);
}
- io_size = iov_iter_count(iter);
- req->result = io_size;
+ req->result = iov_iter_count(iter);
/* Ensure we clear previously set non-block flag */
if (!force_nonblock)
@@ -3463,7 +3472,7 @@ static int io_read(struct io_kiocb *req, unsigned int issue_flags)
return ret ?: -EAGAIN;
}
- ret = rw_verify_area(READ, req->file, io_kiocb_ppos(kiocb), io_size);
+ ret = rw_verify_area(READ, req->file, io_kiocb_ppos(kiocb), req->result);
if (unlikely(ret)) {
kfree(iovec);
return ret;
@@ -3479,30 +3488,36 @@ static int io_read(struct io_kiocb *req, unsigned int issue_flags)
/* no retry on NONBLOCK nor RWF_NOWAIT */
if (req->flags & REQ_F_NOWAIT)
goto done;
- /* some cases will consume bytes even on error returns */
- iov_iter_reexpand(iter, iter->count + iter->truncated);
- iov_iter_revert(iter, io_size - iov_iter_count(iter));
ret = 0;
} else if (ret == -EIOCBQUEUED) {
goto out_free;
- } else if (ret <= 0 || ret == io_size || !force_nonblock ||
+ } else if (ret <= 0 || ret == req->result || !force_nonblock ||
(req->flags & REQ_F_NOWAIT) || !need_read_all(req)) {
/* read all, failed, already did sync or don't want to retry */
goto done;
}
+ iov_iter_restore(iter, state);
+
ret2 = io_setup_async_rw(req, iovec, inline_vecs, iter, true);
if (ret2)
return ret2;
iovec = NULL;
rw = req->async_data;
- /* now use our persistent iterator, if we aren't already */
- iter = &rw->iter;
+ /* now use our persistent iterator and state, if we aren't already */
+ if (iter != &rw->iter) {
+ iter = &rw->iter;
+ state = &rw->iter_state;
+ }
do {
- io_size -= ret;
rw->bytes_done += ret;
+ iov_iter_advance(iter, ret);
+ if (!iov_iter_count(iter))
+ break;
+ iov_iter_save_state(iter, state);
+
/* if we can retry, do so with the callbacks armed */
if (!io_rw_should_retry(req)) {
kiocb->ki_flags &= ~IOCB_WAITQ;
@@ -3520,7 +3535,7 @@ static int io_read(struct io_kiocb *req, unsigned int issue_flags)
return 0;
/* we got some bytes, but not all. retry. */
kiocb->ki_flags &= ~IOCB_WAITQ;
- } while (ret > 0 && ret < io_size);
+ } while (ret > 0);
done:
kiocb_done(kiocb, ret, issue_flags);
out_free:
@@ -3543,19 +3558,24 @@ static int io_write(struct io_kiocb *req, unsigned int issue_flags)
struct kiocb *kiocb = &req->rw.kiocb;
struct iov_iter __iter, *iter = &__iter;
struct io_async_rw *rw = req->async_data;
- ssize_t ret, ret2, io_size;
bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
+ struct iov_iter_state __state, *state;
+ ssize_t ret, ret2;
if (rw) {
iter = &rw->iter;
+ state = &rw->iter_state;
+ iov_iter_restore(iter, state);
iovec = NULL;
} else {
ret = io_import_iovec(WRITE, req, &iovec, iter, !force_nonblock);
if (ret < 0)
return ret;
+ state = &__state;
+ iov_iter_save_state(iter, state);
}
- io_size = iov_iter_count(iter);
- req->result = io_size;
+ req->result = iov_iter_count(iter);
+ ret2 = 0;
/* Ensure we clear previously set non-block flag */
if (!force_nonblock)
@@ -3572,7 +3592,7 @@ static int io_write(struct io_kiocb *req, unsigned int issue_flags)
(req->flags & REQ_F_ISREG))
goto copy_iov;
- ret = rw_verify_area(WRITE, req->file, io_kiocb_ppos(kiocb), io_size);
+ ret = rw_verify_area(WRITE, req->file, io_kiocb_ppos(kiocb), req->result);
if (unlikely(ret))
goto out_free;
@@ -3619,9 +3639,9 @@ static int io_write(struct io_kiocb *req, unsigned int issue_flags)
kiocb_done(kiocb, ret2, issue_flags);
} else {
copy_iov:
- /* some cases will consume bytes even on error returns */
- iov_iter_reexpand(iter, iter->count + iter->truncated);
- iov_iter_revert(iter, io_size - iov_iter_count(iter));
+ iov_iter_restore(iter, state);
+ if (ret2 > 0)
+ iov_iter_advance(iter, ret2);
ret = io_setup_async_rw(req, iovec, inline_vecs, iter, false);
return ret ?: -EAGAIN;
}
--
2.33.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 3/3] Revert "iov_iter: track truncated size"
2021-09-14 14:17 [PATCHSET v2 0/3] Add ability to save/restore iov_iter state Jens Axboe
2021-09-14 14:17 ` [PATCH 1/3] iov_iter: add helper to save " Jens Axboe
2021-09-14 14:17 ` [PATCH 2/3] io_uring: use iov_iter state save/restore helpers Jens Axboe
@ 2021-09-14 14:17 ` Jens Axboe
2 siblings, 0 replies; 7+ messages in thread
From: Jens Axboe @ 2021-09-14 14:17 UTC (permalink / raw)
To: io-uring, linux-fsdevel; +Cc: torvalds, viro, Jens Axboe
This reverts commit 2112ff5ce0c1128fe7b4d19cfe7f2b8ce5b595fa.
We no longer need to track the truncation count, the one user that did
need it has been converted to using iov_iter_restore() instead.
Signed-off-by: Jens Axboe <[email protected]>
---
include/linux/uio.h | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/include/linux/uio.h b/include/linux/uio.h
index 984c4ab74859..207101a9c5c3 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -53,7 +53,6 @@ struct iov_iter {
};
loff_t xarray_start;
};
- size_t truncated;
};
static inline enum iter_type iov_iter_type(const struct iov_iter *i)
@@ -270,10 +269,8 @@ static inline void iov_iter_truncate(struct iov_iter *i, u64 count)
* conversion in assignement is by definition greater than all
* values of size_t, including old i->count.
*/
- if (i->count > count) {
- i->truncated += i->count - count;
+ if (i->count > count)
i->count = count;
- }
}
/*
@@ -282,7 +279,6 @@ static inline void iov_iter_truncate(struct iov_iter *i, u64 count)
*/
static inline void iov_iter_reexpand(struct iov_iter *i, size_t count)
{
- i->truncated -= count - i->count;
i->count = count;
}
--
2.33.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 2/3] io_uring: use iov_iter state save/restore helpers
2021-09-14 14:17 ` [PATCH 2/3] io_uring: use iov_iter state save/restore helpers Jens Axboe
@ 2021-09-14 18:45 ` Linus Torvalds
2021-09-14 19:37 ` Jens Axboe
0 siblings, 1 reply; 7+ messages in thread
From: Linus Torvalds @ 2021-09-14 18:45 UTC (permalink / raw)
To: Jens Axboe; +Cc: io-uring, linux-fsdevel, Al Viro
On Tue, Sep 14, 2021 at 7:18 AM Jens Axboe <[email protected]> wrote:
>
>
> + iov_iter_restore(iter, state);
> +
...
> rw->bytes_done += ret;
> + iov_iter_advance(iter, ret);
> + if (!iov_iter_count(iter))
> + break;
> + iov_iter_save_state(iter, state);
Ok, so now you keep iovb_iter and the state always in sync by just
always resetting the iter back and then walking it forward explicitly
- and re-saving the state.
That seems safe, if potentially unnecessarily expensive.
I guess re-walking lots of iovec entries is actually very unlikely in
practice, so maybe this "stupid brute-force" model is the right one.
I do find the odd "use __state vs rw->state" to be very confusing,
though. Particularly in io_read(), where you do this:
+ iov_iter_restore(iter, state);
+
ret2 = io_setup_async_rw(req, iovec, inline_vecs, iter, true);
if (ret2)
return ret2;
iovec = NULL;
rw = req->async_data;
- /* now use our persistent iterator, if we aren't already */
- iter = &rw->iter;
+ /* now use our persistent iterator and state, if we aren't already */
+ if (iter != &rw->iter) {
+ iter = &rw->iter;
+ state = &rw->iter_state;
+ }
do {
- io_size -= ret;
rw->bytes_done += ret;
+ iov_iter_advance(iter, ret);
+ if (!iov_iter_count(iter))
+ break;
+ iov_iter_save_state(iter, state);
Note how it first does that iov_iter_restore() on iter/state, buit
then it *replaces&* the iter/state pointers, and then it does
iov_iter_advance() on the replacement ones.
I don't see how that could be right. You're doing iov_iter_advance()
on something else than the one you restored to the original values.
And if it is right, it's sure confusing as hell.
Linus
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 2/3] io_uring: use iov_iter state save/restore helpers
2021-09-14 18:45 ` Linus Torvalds
@ 2021-09-14 19:37 ` Jens Axboe
2021-09-14 23:02 ` Jens Axboe
0 siblings, 1 reply; 7+ messages in thread
From: Jens Axboe @ 2021-09-14 19:37 UTC (permalink / raw)
To: Linus Torvalds; +Cc: io-uring, linux-fsdevel, Al Viro
On 9/14/21 12:45 PM, Linus Torvalds wrote:
> On Tue, Sep 14, 2021 at 7:18 AM Jens Axboe <[email protected]> wrote:
>>
>>
>> + iov_iter_restore(iter, state);
>> +
> ...
>> rw->bytes_done += ret;
>> + iov_iter_advance(iter, ret);
>> + if (!iov_iter_count(iter))
>> + break;
>> + iov_iter_save_state(iter, state);
>
> Ok, so now you keep iovb_iter and the state always in sync by just
> always resetting the iter back and then walking it forward explicitly
> - and re-saving the state.
>
> That seems safe, if potentially unnecessarily expensive.
Right, it's not ideal if it's a big range of IO, then it'll definitely
be noticeable. But not too worried about it, at least not for now...
> I guess re-walking lots of iovec entries is actually very unlikely in
> practice, so maybe this "stupid brute-force" model is the right one.
Not sure what the alternative is here. We could do something similar to
__io_import_fixed() as we're only dealing with iter types where we can
do that, but probably best left as a later optimization if it's deemed
necessary.
> I do find the odd "use __state vs rw->state" to be very confusing,
> though. Particularly in io_read(), where you do this:
>
> + iov_iter_restore(iter, state);
> +
> ret2 = io_setup_async_rw(req, iovec, inline_vecs, iter, true);
> if (ret2)
> return ret2;
>
> iovec = NULL;
> rw = req->async_data;
> - /* now use our persistent iterator, if we aren't already */
> - iter = &rw->iter;
> + /* now use our persistent iterator and state, if we aren't already */
> + if (iter != &rw->iter) {
> + iter = &rw->iter;
> + state = &rw->iter_state;
> + }
>
> do {
> - io_size -= ret;
> rw->bytes_done += ret;
> + iov_iter_advance(iter, ret);
> + if (!iov_iter_count(iter))
> + break;
> + iov_iter_save_state(iter, state);
>
>
> Note how it first does that iov_iter_restore() on iter/state, buit
> then it *replaces&* the iter/state pointers, and then it does
> iov_iter_advance() on the replacement ones.
We restore the iter so it's the same as before we did the read_iter
call, and then setup a consistent copy of the iov/iter in case we need
to punt this request for retry. rw->iter should have the same state as
iter at this point, and since rw->iter is the copy we'll use going
forward, we're advancing that one in case ret > 0.
The other case is that no persistent state is needed, and then iter
remains the same.
I'll take a second look at this part and see if I can make it a bit more
straight forward, or at least comment it properly.
--
Jens Axboe
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 2/3] io_uring: use iov_iter state save/restore helpers
2021-09-14 19:37 ` Jens Axboe
@ 2021-09-14 23:02 ` Jens Axboe
0 siblings, 0 replies; 7+ messages in thread
From: Jens Axboe @ 2021-09-14 23:02 UTC (permalink / raw)
To: Linus Torvalds; +Cc: io-uring, linux-fsdevel, Al Viro
On 9/14/21 1:37 PM, Jens Axboe wrote:
> On 9/14/21 12:45 PM, Linus Torvalds wrote:
>> On Tue, Sep 14, 2021 at 7:18 AM Jens Axboe <[email protected]> wrote:
>>>
>>>
>>> + iov_iter_restore(iter, state);
>>> +
>> ...
>>> rw->bytes_done += ret;
>>> + iov_iter_advance(iter, ret);
>>> + if (!iov_iter_count(iter))
>>> + break;
>>> + iov_iter_save_state(iter, state);
>>
>> Ok, so now you keep iovb_iter and the state always in sync by just
>> always resetting the iter back and then walking it forward explicitly
>> - and re-saving the state.
>>
>> That seems safe, if potentially unnecessarily expensive.
>
> Right, it's not ideal if it's a big range of IO, then it'll definitely
> be noticeable. But not too worried about it, at least not for now...
>
>> I guess re-walking lots of iovec entries is actually very unlikely in
>> practice, so maybe this "stupid brute-force" model is the right one.
>
> Not sure what the alternative is here. We could do something similar to
> __io_import_fixed() as we're only dealing with iter types where we can
> do that, but probably best left as a later optimization if it's deemed
> necessary.
>
>> I do find the odd "use __state vs rw->state" to be very confusing,
>> though. Particularly in io_read(), where you do this:
>>
>> + iov_iter_restore(iter, state);
>> +
>> ret2 = io_setup_async_rw(req, iovec, inline_vecs, iter, true);
>> if (ret2)
>> return ret2;
>>
>> iovec = NULL;
>> rw = req->async_data;
>> - /* now use our persistent iterator, if we aren't already */
>> - iter = &rw->iter;
>> + /* now use our persistent iterator and state, if we aren't already */
>> + if (iter != &rw->iter) {
>> + iter = &rw->iter;
>> + state = &rw->iter_state;
>> + }
>>
>> do {
>> - io_size -= ret;
>> rw->bytes_done += ret;
>> + iov_iter_advance(iter, ret);
>> + if (!iov_iter_count(iter))
>> + break;
>> + iov_iter_save_state(iter, state);
>>
>>
>> Note how it first does that iov_iter_restore() on iter/state, buit
>> then it *replaces&* the iter/state pointers, and then it does
>> iov_iter_advance() on the replacement ones.
>
> We restore the iter so it's the same as before we did the read_iter
> call, and then setup a consistent copy of the iov/iter in case we need
> to punt this request for retry. rw->iter should have the same state as
> iter at this point, and since rw->iter is the copy we'll use going
> forward, we're advancing that one in case ret > 0.
>
> The other case is that no persistent state is needed, and then iter
> remains the same.
>
> I'll take a second look at this part and see if I can make it a bit more
> straight forward, or at least comment it properly.
I hacked up something that shortens the iter for the initial IO, so we
could more easily test the retry path and the state. It really is a
hack, but the idea was to issue 64K io from fio, and then the initial
attempt would be anywhere from 4K-60K truncated. That forces retry.
I ran this with both 16 segments and 8 segments, verifying that it
hits both the UIO_FASTIOV and alloc path.
I did find one issue with that, see the last hunk in the hack. We
need to increment rw->bytes_done if we don't break, or set ret to
0 if we do. Otherwise that last ret ends up being accounted twice.
But apart from that, it passes data verification runs.
diff --git a/fs/io_uring.c b/fs/io_uring.c
index dc1ff47e3221..484c86252f9d 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -744,6 +744,7 @@ enum {
REQ_F_NOWAIT_READ_BIT,
REQ_F_NOWAIT_WRITE_BIT,
REQ_F_ISREG_BIT,
+ REQ_F_TRUNCATED_BIT,
/* not a real bit, just to check we're not overflowing the space */
__REQ_F_LAST_BIT,
@@ -797,6 +798,7 @@ enum {
REQ_F_REFCOUNT = BIT(REQ_F_REFCOUNT_BIT),
/* there is a linked timeout that has to be armed */
REQ_F_ARM_LTIMEOUT = BIT(REQ_F_ARM_LTIMEOUT_BIT),
+ REQ_F_TRUNCATED = BIT(REQ_F_TRUNCATED_BIT),
};
struct async_poll {
@@ -3454,11 +3456,12 @@ static int io_read(struct io_kiocb *req, unsigned int issue_flags)
{
struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs;
struct kiocb *kiocb = &req->rw.kiocb;
- struct iov_iter __iter, *iter = &__iter;
+ struct iov_iter __i, __iter, *iter = &__iter;
struct io_async_rw *rw = req->async_data;
bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
struct iov_iter_state __state, *state;
ssize_t ret, ret2;
+ bool do_restore = false;
if (rw) {
iter = &rw->iter;
@@ -3492,8 +3495,25 @@ static int io_read(struct io_kiocb *req, unsigned int issue_flags)
return ret;
}
+ if (!(req->flags & REQ_F_TRUNCATED) && !(iov_iter_count(iter) & 4095)) {
+ int nr_vecs;
+
+ __i = *iter;
+ nr_vecs = 1 + (prandom_u32() % iter->nr_segs);
+ iter->nr_segs = nr_vecs;
+ iter->count = nr_vecs * 8192;
+ req->flags |= REQ_F_TRUNCATED;
+ do_restore = true;
+ }
+
ret = io_iter_do_read(req, iter);
+ if (ret == -EAGAIN) {
+ req->flags &= ~REQ_F_TRUNCATED;
+ *iter = __i;
+ do_restore = false;
+ }
+
if (ret == -EAGAIN || (req->flags & REQ_F_REISSUE)) {
req->flags &= ~REQ_F_REISSUE;
/* IOPOLL retry should happen for io-wq threads */
@@ -3513,6 +3533,9 @@ static int io_read(struct io_kiocb *req, unsigned int issue_flags)
iov_iter_restore(iter, state);
+ if (do_restore)
+ *iter = __i;
+
ret2 = io_setup_async_rw(req, iovec, inline_vecs, iter, true);
if (ret2)
return ret2;
@@ -3526,10 +3549,10 @@ static int io_read(struct io_kiocb *req, unsigned int issue_flags)
}
do {
- rw->bytes_done += ret;
iov_iter_advance(iter, ret);
if (!iov_iter_count(iter))
break;
+ rw->bytes_done += ret;
iov_iter_save_state(iter, state);
/* if we can retry, do so with the callbacks armed */
--
Jens Axboe
^ permalink raw reply related [flat|nested] 7+ messages in thread
end of thread, other threads:[~2021-09-14 23:02 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-09-14 14:17 [PATCHSET v2 0/3] Add ability to save/restore iov_iter state Jens Axboe
2021-09-14 14:17 ` [PATCH 1/3] iov_iter: add helper to save " Jens Axboe
2021-09-14 14:17 ` [PATCH 2/3] io_uring: use iov_iter state save/restore helpers Jens Axboe
2021-09-14 18:45 ` Linus Torvalds
2021-09-14 19:37 ` Jens Axboe
2021-09-14 23:02 ` Jens Axboe
2021-09-14 14:17 ` [PATCH 3/3] Revert "iov_iter: track truncated size" Jens Axboe
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox