[PATCH] io_uring/rw: ensure retry isn't lost for write


* [PATCH] io_uring/rw: ensure retry isn't lost for write
       [not found] <CGME20240422134215epcas5p4b5dcd1a5cd0308be5e43f691d7f92947@epcas5p4.samsung.com>
@ 2024-04-22 13:35 ` Anuj Gupta
  2024-04-23 12:15   ` Anuj gupta
  2024-04-23 14:00   ` Pavel Begunkov
  0 siblings, 2 replies; 6+ messages in thread
From: Anuj Gupta @ 2024-04-22 13:35 UTC
  To: axboe; +Cc: io-uring, anuj1072538, Anuj Gupta

In case of write, the iov_iter gets updated before retry kicks in.
Restore the iov_iter before retrying. It can be reproduced by issuing
a write greater than device limit.

Fixes: df604d2ad480 (io_uring/rw: ensure retry condition isn't lost)

Signed-off-by: Anuj Gupta <[email protected]>
---
 io_uring/rw.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/io_uring/rw.c b/io_uring/rw.c
index 4fed829fe97c..9fadb29ec34f 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -1035,8 +1035,10 @@ int io_write(struct io_kiocb *req, unsigned int issue_flags)
 	else
 		ret2 = -EINVAL;
 
-	if (req->flags & REQ_F_REISSUE)
+	if (req->flags & REQ_F_REISSUE) {
+		iov_iter_restore(&io->iter, &io->iter_state);
 		return IOU_ISSUE_SKIP_COMPLETE;
+	}
 
 	/*
 	 * Raw bdev writes will return -EOPNOTSUPP for IOCB_NOWAIT. Just
-- 
2.25.1



* Re: [PATCH] io_uring/rw: ensure retry isn't lost for write
  2024-04-22 13:35 ` [PATCH] io_uring/rw: ensure retry isn't lost for write Anuj Gupta
@ 2024-04-23 12:15   ` Anuj gupta
  2024-04-23 14:00   ` Pavel Begunkov
  1 sibling, 0 replies; 6+ messages in thread
From: Anuj gupta @ 2024-04-23 12:15 UTC
  To: Anuj Gupta; +Cc: axboe, io-uring

On Mon, Apr 22, 2024 at 11:19 PM Anuj Gupta <[email protected]> wrote:
>
> In case of write, the iov_iter gets updated before retry kicks in.
> Restore the iov_iter before retrying. It can be reproduced by issuing
> a write greater than device limit.
>
> Fixes: df604d2ad480 (io_uring/rw: ensure retry condition isn't lost)
>
> Signed-off-by: Anuj Gupta <[email protected]>
> ---
>  io_uring/rw.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/io_uring/rw.c b/io_uring/rw.c
> index 4fed829fe97c..9fadb29ec34f 100644
> --- a/io_uring/rw.c
> +++ b/io_uring/rw.c
> @@ -1035,8 +1035,10 @@ int io_write(struct io_kiocb *req, unsigned int issue_flags)
>         else
>                 ret2 = -EINVAL;
>
> -       if (req->flags & REQ_F_REISSUE)
> +       if (req->flags & REQ_F_REISSUE) {
> +               iov_iter_restore(&io->iter, &io->iter_state);
>                 return IOU_ISSUE_SKIP_COMPLETE;
> +       }
>
>         /*
>          * Raw bdev writes will return -EOPNOTSUPP for IOCB_NOWAIT. Just
> --
> 2.25.1
>

Looking more into it, no write happens in case of a retry. This is because
the first call to blkdev_direct_write advances the iter and updates the
count to 0. Since the I/O needs to be split, retry handling gets triggered.
We don't restore the iter, so the retry happens with count=0, hence no I/O.
This doesn't happen in case of a read, as blkdev_read_iter reverts the iter
and restores the right count value [3].

NVMe device limit [1]
Fio command used[2]

[1]
# cat /sys/block/nvme0n1/queue/max_hw_sectors_kb
512

[2]
fio -iodepth=1 -rw=write -direct=1 -ioengine=io_uring -bs=1M -numjobs=1 \
-offset=0 -size=1M -group_reporting -filename=/dev/nvme0n1 -name=io_uring

[3]
static ssize_t blkdev_read_iter(struct kiocb *iocb, struct iov_iter *to)
{
        if (iocb->ki_flags & IOCB_DIRECT) {
                ret = blkdev_direct_IO(iocb, to);
                if (ret >= 0) {
                        iocb->ki_pos += ret;
                        count -= ret;
                }
                iov_iter_revert(to, count - iov_iter_count(to));
                if (ret < 0 || !count)
                        goto reexpand;


* Re: [PATCH] io_uring/rw: ensure retry isn't lost for write
  2024-04-22 13:35 ` [PATCH] io_uring/rw: ensure retry isn't lost for write Anuj Gupta
  2024-04-23 12:15   ` Anuj gupta
@ 2024-04-23 14:00   ` Pavel Begunkov
  2024-04-24 13:36     ` Jens Axboe
  1 sibling, 1 reply; 6+ messages in thread
From: Pavel Begunkov @ 2024-04-23 14:00 UTC
  To: Anuj Gupta, axboe; +Cc: io-uring, anuj1072538

On 4/22/24 14:35, Anuj Gupta wrote:
> In case of write, the iov_iter gets updated before retry kicks in.
> Restore the iov_iter before retrying. It can be reproduced by issuing
> a write greater than device limit.
> 
> Fixes: df604d2ad480 (io_uring/rw: ensure retry condition isn't lost)
> 
> Signed-off-by: Anuj Gupta <[email protected]>
> ---
>   io_uring/rw.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/io_uring/rw.c b/io_uring/rw.c
> index 4fed829fe97c..9fadb29ec34f 100644
> --- a/io_uring/rw.c
> +++ b/io_uring/rw.c
> @@ -1035,8 +1035,10 @@ int io_write(struct io_kiocb *req, unsigned int issue_flags)
>   	else
>   		ret2 = -EINVAL;
>   
> -	if (req->flags & REQ_F_REISSUE)
> +	if (req->flags & REQ_F_REISSUE) {
> +		iov_iter_restore(&io->iter, &io->iter_state);
>   		return IOU_ISSUE_SKIP_COMPLETE;

That races with resubmission of the request; if it can happen from
io-wq, that'd corrupt the iter. Nor do I believe that the fix that this
patch fixes is correct, see

https://lore.kernel.org/linux-block/Zh505790%2FoufXqMn@fedora/T/#mb24d3dca84eb2d83878ea218cb0efaae34c9f026

Jens, I'd suggest to revert "io_uring/rw: ensure retry condition
isn't lost". I don't think we can sanely reissue from the callback
unless there are better ownership rules over kiocb and iter, e.g.
never touch the iter after calling the kiocb's callback.

> +	}
>   
>   	/*
>   	 * Raw bdev writes will return -EOPNOTSUPP for IOCB_NOWAIT. Just

-- 
Pavel Begunkov


* Re: [PATCH] io_uring/rw: ensure retry isn't lost for write
  2024-04-23 14:00   ` Pavel Begunkov
@ 2024-04-24 13:36     ` Jens Axboe
  2024-04-24 15:04       ` Pavel Begunkov
  0 siblings, 1 reply; 6+ messages in thread
From: Jens Axboe @ 2024-04-24 13:36 UTC
  To: Pavel Begunkov, Anuj Gupta; +Cc: io-uring, anuj1072538

On 4/23/24 8:00 AM, Pavel Begunkov wrote:
> On 4/22/24 14:35, Anuj Gupta wrote:
>> In case of write, the iov_iter gets updated before retry kicks in.
>> Restore the iov_iter before retrying. It can be reproduced by issuing
>> a write greater than device limit.
>>
>> Fixes: df604d2ad480 (io_uring/rw: ensure retry condition isn't lost)
>>
>> Signed-off-by: Anuj Gupta <[email protected]>
>> ---
>>   io_uring/rw.c | 4 +++-
>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/io_uring/rw.c b/io_uring/rw.c
>> index 4fed829fe97c..9fadb29ec34f 100644
>> --- a/io_uring/rw.c
>> +++ b/io_uring/rw.c
>> @@ -1035,8 +1035,10 @@ int io_write(struct io_kiocb *req, unsigned int issue_flags)
>>       else
>>           ret2 = -EINVAL;
>>   -    if (req->flags & REQ_F_REISSUE)
>> +    if (req->flags & REQ_F_REISSUE) {
>> +        iov_iter_restore(&io->iter, &io->iter_state);
>>           return IOU_ISSUE_SKIP_COMPLETE;
> 
> That races with resubmission of the request; if it can happen from
> io-wq, that'd corrupt the iter. Nor do I believe that the fix that this
> patch fixes is correct, see
> 
> https://lore.kernel.org/linux-block/Zh505790%2FoufXqMn@fedora/T/#mb24d3dca84eb2d83878ea218cb0efaae34c9f026
> 
> Jens, I'd suggest to revert "io_uring/rw: ensure retry condition
> isn't lost". I don't think we can sanely reissue from the callback
> unless there are better ownership rules over kiocb and iter, e.g.
> never touch the iter after calling the kiocb's callback.

It is a problem, but I don't believe it's a new one. If we revert the
existing fix, then we'll have to deal with the failure to end the IO due
to the (now) missing same thread group check, though. Which should be
doable, but would be nice to get this cleaned and cleared up once and
for all.

-- 
Jens Axboe



* Re: [PATCH] io_uring/rw: ensure retry isn't lost for write
  2024-04-24 13:36     ` Jens Axboe
@ 2024-04-24 15:04       ` Pavel Begunkov
  2024-04-25 15:15         ` Jens Axboe
  0 siblings, 1 reply; 6+ messages in thread
From: Pavel Begunkov @ 2024-04-24 15:04 UTC
  To: Jens Axboe, Anuj Gupta; +Cc: io-uring, anuj1072538

On 4/24/24 14:36, Jens Axboe wrote:
> On 4/23/24 8:00 AM, Pavel Begunkov wrote:
>> On 4/22/24 14:35, Anuj Gupta wrote:
>>> In case of write, the iov_iter gets updated before retry kicks in.
>>> Restore the iov_iter before retrying. It can be reproduced by issuing
>>> a write greater than device limit.
>>>
>>> Fixes: df604d2ad480 (io_uring/rw: ensure retry condition isn't lost)
>>>
>>> Signed-off-by: Anuj Gupta <[email protected]>
>>> ---
>>>    io_uring/rw.c | 4 +++-
>>>    1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/io_uring/rw.c b/io_uring/rw.c
>>> index 4fed829fe97c..9fadb29ec34f 100644
>>> --- a/io_uring/rw.c
>>> +++ b/io_uring/rw.c
>>> @@ -1035,8 +1035,10 @@ int io_write(struct io_kiocb *req, unsigned int issue_flags)
>>>        else
>>>            ret2 = -EINVAL;
>>>    -    if (req->flags & REQ_F_REISSUE)
>>> +    if (req->flags & REQ_F_REISSUE) {
>>> +        iov_iter_restore(&io->iter, &io->iter_state);
>>>            return IOU_ISSUE_SKIP_COMPLETE;
>>
>> That races with resubmission of the request; if it can happen from
>> io-wq, that'd corrupt the iter. Nor do I believe that the fix that this
>> patch fixes is correct, see
>>
>> https://lore.kernel.org/linux-block/Zh505790%2FoufXqMn@fedora/T/#mb24d3dca84eb2d83878ea218cb0efaae34c9f026
>>
>> Jens, I'd suggest to revert "io_uring/rw: ensure retry condition
>> isn't lost". I don't think we can sanely reissue from the callback
>> unless there are better ownership rules over kiocb and iter, e.g.
>> never touch the iter after calling the kiocb's callback.
> 
> It is a problem, but I don't believe it's a new one. If we revert the
> existing fix, then we'll have to deal with the failure to end the IO due
> to the (now) missing same thread group check, though. Which should be

My bad, I meant reverting the patch that removed thread group checks
together with its fixes.

> doable, but would be nice to get this cleaned and cleared up once and
> for all.

It's not like I'm in love with that chunk of code, if anything the
group check was quite feeble and quite, but replacing it with something
clean but buggy is questionable...
Do you think it was broken before? Because I don't see any simple
way to fix it without propagating reissue back to io_read/write.

-- 
Pavel Begunkov


* Re: [PATCH] io_uring/rw: ensure retry isn't lost for write
  2024-04-24 15:04       ` Pavel Begunkov
@ 2024-04-25 15:15         ` Jens Axboe
  0 siblings, 0 replies; 6+ messages in thread
From: Jens Axboe @ 2024-04-25 15:15 UTC
  To: Pavel Begunkov, Anuj Gupta; +Cc: io-uring, anuj1072538

On 4/24/24 9:04 AM, Pavel Begunkov wrote:
> On 4/24/24 14:36, Jens Axboe wrote:
>> On 4/23/24 8:00 AM, Pavel Begunkov wrote:
>>> On 4/22/24 14:35, Anuj Gupta wrote:
>>>> In case of write, the iov_iter gets updated before retry kicks in.
>>>> Restore the iov_iter before retrying. It can be reproduced by issuing
>>>> a write greater than device limit.
>>>>
>>>> Fixes: df604d2ad480 (io_uring/rw: ensure retry condition isn't lost)
>>>>
>>>> Signed-off-by: Anuj Gupta <[email protected]>
>>>> ---
>>>>    io_uring/rw.c | 4 +++-
>>>>    1 file changed, 3 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/io_uring/rw.c b/io_uring/rw.c
>>>> index 4fed829fe97c..9fadb29ec34f 100644
>>>> --- a/io_uring/rw.c
>>>> +++ b/io_uring/rw.c
>>>> @@ -1035,8 +1035,10 @@ int io_write(struct io_kiocb *req, unsigned int issue_flags)
>>>>        else
>>>>            ret2 = -EINVAL;
>>>>    -    if (req->flags & REQ_F_REISSUE)
>>>> +    if (req->flags & REQ_F_REISSUE) {
>>>> +        iov_iter_restore(&io->iter, &io->iter_state);
>>>>            return IOU_ISSUE_SKIP_COMPLETE;
>>>
>>> That races with resubmission of the request; if it can happen from
>>> io-wq, that'd corrupt the iter. Nor do I believe that the fix that this
>>> patch fixes is correct, see
>>>
>>> https://lore.kernel.org/linux-block/Zh505790%2FoufXqMn@fedora/T/#mb24d3dca84eb2d83878ea218cb0efaae34c9f026
>>>
>>> Jens, I'd suggest to revert "io_uring/rw: ensure retry condition
>>> isn't lost". I don't think we can sanely reissue from the callback
>>> unless there are better ownership rules over kiocb and iter, e.g.
>>> never touch the iter after calling the kiocb's callback.
>>
>> It is a problem, but I don't believe it's a new one. If we revert the
>> existing fix, then we'll have to deal with the failure to end the IO due
>> to the (now) missing same thread group check, though. Which should be
> 
> My bad, I meant reverting the patch that removed thread group checks
> together with its fixes.

Gotcha, yeah let's do that for now. It's a bit annoying as with the
async data prep we can sanely retry anything at this point, and avoid
any random -EAGAIN bubbling back to userspace. But we do have some gaps
to cover in terms of either missing that (what the 2nd patch attempted
to do), so doesn't look like we can sanely cover that for now.

I did a revert (ish) commit, will send it out to the list shortly.

>> doable, but would be nice to get this cleaned and cleared up once and
>> for all.
> 
> It's not like I'm in love with that chunk of code, if anything the
> group check was quite feeble and quite, but replacing it with something
> clean but buggy is questionable...

It's just an awful work-around that isn't needed anymore, as it's meant
to check if we can sanely re-import. With the current code base, there's
never any need to re-import anything, and we can always sanely retry.
The problem is just that we need to be able to handle that...

> Do you think it was broken before? Because I don't see any simple
> way to fix it without propagating reissue back to io_read/write.

It's just always felt a bit fragile in how we attempt to catch the
reissue flag, never quite loved that part. Seems to me it could only be
completely solid if we remove the need to check this in the read/write
issue path completely, and leave it to the callback side. It all really
(again) boils back to how the lower levels don't handle this
consistently. If we bubbled back -EAGAIN through the issue path always,
it'd be trivial to handle. But we don't, so handling it completion side
seems like the saner choice.

-- 
Jens Axboe

