* Polled I/O cannot find completions
@ 2020-03-27 2:57 Bijan Mottahedeh
2020-03-27 15:36 ` Jens Axboe
0 siblings, 1 reply; 6+ messages in thread
From: Bijan Mottahedeh @ 2020-03-27 2:57 UTC (permalink / raw)
To: Jens Axboe; +Cc: io-uring
I'm seeing poll threads hang as I increase the number of threads in
polled fio tests. I think this is because of polling on BLK_QC_T_NONE
cookie, which will never succeed.
A related problem however, is that the meaning of BLK_QC_T_NONE seems to
be ambiguous.
Specifically, the following cases return BLK_QC_T_NONE which I think
would be problematic for polled io:
generic_make_request()
...
if (current->bio_list) {
bio_list_add(¤t->bio_list[0], bio);
goto out;
}
In this case the request is delayed but should get a cookie eventually.
How does the caller know what the right action is in this case for a
polled request? Polling would never succeed.
__blk_mq_issue_directly()
...
case BLK_STS_RESOURCE:
case BLK_STS_DEV_RESOURCE:
blk_mq_update_dispatch_busy(hctx, true);
__blk_mq_requeue_request(rq);
break;
In this case, cookie is not updated and would keep its default
BLK_QC_T_NONE value from blk_mq_make_request(). However, this request
will eventually be reissued, so again, how would the caller poll for the
completion of this request?
blk_mq_try_issue_directly()
...
ret = __blk_mq_try_issue_directly(hctx, rq, cookie, false, true);
if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE)
blk_mq_request_bypass_insert(rq, false, true);
Am I missing something here?
Incidentally, I don't see BLK_QC_T_EAGAIN used anywhere, should it be?
Thanks.
--bijan
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Polled I/O cannot find completions
2020-03-27 2:57 Polled I/O cannot find completions Bijan Mottahedeh
@ 2020-03-27 15:36 ` Jens Axboe
2020-03-27 16:31 ` Bijan Mottahedeh
0 siblings, 1 reply; 6+ messages in thread
From: Jens Axboe @ 2020-03-27 15:36 UTC (permalink / raw)
To: Bijan Mottahedeh; +Cc: io-uring, [email protected]
CC'ing linux-block, this isn't an io_uring issue.
On 3/26/20 8:57 PM, Bijan Mottahedeh wrote:
> I'm seeing poll threads hang as I increase the number of threads in
> polled fio tests. I think this is because of polling on BLK_QC_T_NONE
> cookie, which will never succeed.
>
> A related problem however, is that the meaning of BLK_QC_T_NONE seems to
> be ambiguous.
>
> Specifically, the following cases return BLK_QC_T_NONE which I think
> would be problematic for polled io:
>
>
> generic_make_request()
> ...
> if (current->bio_list) {
> bio_list_add(¤t->bio_list[0], bio);
> goto out;
> }
>
> In this case the request is delayed but should get a cookie eventually.
> How does the caller know what the right action is in this case for a
> polled request? Polling would never succeed.
>
>
> __blk_mq_issue_directly()
> ...
> case BLK_STS_RESOURCE:
> case BLK_STS_DEV_RESOURCE:
> blk_mq_update_dispatch_busy(hctx, true);
> __blk_mq_requeue_request(rq);
> break;
>
> In this case, cookie is not updated and would keep its default
> BLK_QC_T_NONE value from blk_mq_make_request(). However, this request
> will eventually be reissued, so again, how would the caller poll for the
> completion of this request?
>
> blk_mq_try_issue_directly()
> ...
> ret = __blk_mq_try_issue_directly(hctx, rq, cookie, false, true);
> if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE)
> blk_mq_request_bypass_insert(rq, false, true);
>
> Am I missing something here?
>
> Incidentally, I don't see BLK_QC_T_EAGAIN used anywhere, should it be?
>
> Thanks.
>
> --bijan
>
>
>
>
--
Jens Axboe
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Polled I/O cannot find completions
2020-03-27 15:36 ` Jens Axboe
@ 2020-03-27 16:31 ` Bijan Mottahedeh
2020-03-27 16:35 ` Jens Axboe
0 siblings, 1 reply; 6+ messages in thread
From: Bijan Mottahedeh @ 2020-03-27 16:31 UTC (permalink / raw)
To: Jens Axboe; +Cc: io-uring, [email protected]
Does io_uring though have to deal with BLK_QC_T_NONE at all? Or are you
saying that it should never receive that result?
That's one of the things I'm not clear about.
--bijan
> CC'ing linux-block, this isn't an io_uring issue.
>
>
> On 3/26/20 8:57 PM, Bijan Mottahedeh wrote:
>> I'm seeing poll threads hang as I increase the number of threads in
>> polled fio tests. I think this is because of polling on BLK_QC_T_NONE
>> cookie, which will never succeed.
>>
>> A related problem however, is that the meaning of BLK_QC_T_NONE seems to
>> be ambiguous.
>>
>> Specifically, the following cases return BLK_QC_T_NONE which I think
>> would be problematic for polled io:
>>
>>
>> generic_make_request()
>> ...
>> if (current->bio_list) {
>> bio_list_add(¤t->bio_list[0], bio);
>> goto out;
>> }
>>
>> In this case the request is delayed but should get a cookie eventually.
>> How does the caller know what the right action is in this case for a
>> polled request? Polling would never succeed.
>>
>>
>> __blk_mq_issue_directly()
>> ...
>> case BLK_STS_RESOURCE:
>> case BLK_STS_DEV_RESOURCE:
>> blk_mq_update_dispatch_busy(hctx, true);
>> __blk_mq_requeue_request(rq);
>> break;
>>
>> In this case, cookie is not updated and would keep its default
>> BLK_QC_T_NONE value from blk_mq_make_request(). However, this request
>> will eventually be reissued, so again, how would the caller poll for the
>> completion of this request?
>>
>> blk_mq_try_issue_directly()
>> ...
>> ret = __blk_mq_try_issue_directly(hctx, rq, cookie, false, true);
>> if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE)
>> blk_mq_request_bypass_insert(rq, false, true);
>>
>> Am I missing something here?
>>
>> Incidentally, I don't see BLK_QC_T_EAGAIN used anywhere, should it be?
>>
>> Thanks.
>>
>> --bijan
>>
>>
>>
>>
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Polled I/O cannot find completions
2020-03-27 16:31 ` Bijan Mottahedeh
@ 2020-03-27 16:35 ` Jens Axboe
2020-03-31 18:43 ` Bijan Mottahedeh
0 siblings, 1 reply; 6+ messages in thread
From: Jens Axboe @ 2020-03-27 16:35 UTC (permalink / raw)
To: Bijan Mottahedeh; +Cc: io-uring, [email protected]
On 3/27/20 10:31 AM, Bijan Mottahedeh wrote:
> Does io_uring though have to deal with BLK_QC_T_NONE at all? Or are you
> saying that it should never receive that result?
> That's one of the things I'm not clear about.
BLK_QC_T_* are block cookies, they are only valid in the block layer.
Only the poll handler called should have to deal with them, inside
their f_op->iopoll() handler. It's simply passed from the queue to
the poll side.
So no, io_uring shouldn't have to deal with them at all.
The problem, as I see it, is if the block layer returns BLK_QC_T_NONE
and the IO was actually queued and requires polling to be found. We'd
end up with IO timeouts for handling those requests, and that's not a
good thing...
>> On 3/26/20 8:57 PM, Bijan Mottahedeh wrote:
>>> I'm seeing poll threads hang as I increase the number of threads in
>>> polled fio tests. I think this is because of polling on BLK_QC_T_NONE
>>> cookie, which will never succeed.
>>>
>>> A related problem however, is that the meaning of BLK_QC_T_NONE seems to
>>> be ambiguous.
>>>
>>> Specifically, the following cases return BLK_QC_T_NONE which I think
>>> would be problematic for polled io:
>>>
>>>
>>> generic_make_request()
>>> ...
>>> if (current->bio_list) {
>>> bio_list_add(¤t->bio_list[0], bio);
>>> goto out;
>>> }
>>>
>>> In this case the request is delayed but should get a cookie eventually.
>>> How does the caller know what the right action is in this case for a
>>> polled request? Polling would never succeed.
>>>
>>>
>>> __blk_mq_issue_directly()
>>> ...
>>> case BLK_STS_RESOURCE:
>>> case BLK_STS_DEV_RESOURCE:
>>> blk_mq_update_dispatch_busy(hctx, true);
>>> __blk_mq_requeue_request(rq);
>>> break;
>>>
>>> In this case, cookie is not updated and would keep its default
>>> BLK_QC_T_NONE value from blk_mq_make_request(). However, this request
>>> will eventually be reissued, so again, how would the caller poll for the
>>> completion of this request?
>>>
>>> blk_mq_try_issue_directly()
>>> ...
>>> ret = __blk_mq_try_issue_directly(hctx, rq, cookie, false, true);
>>> if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE)
>>> blk_mq_request_bypass_insert(rq, false, true);
>>>
>>> Am I missing something here?
>>>
>>> Incidentally, I don't see BLK_QC_T_EAGAIN used anywhere, should it be?
Pretty sure that's a leftover from when the attempts was made to pass
back -EAGAIN inline instead of through the bio end_io handler.
--
Jens Axboe
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Polled I/O cannot find completions
2020-03-27 16:35 ` Jens Axboe
@ 2020-03-31 18:43 ` Bijan Mottahedeh
2020-04-01 1:01 ` Bijan Mottahedeh
0 siblings, 1 reply; 6+ messages in thread
From: Bijan Mottahedeh @ 2020-03-31 18:43 UTC (permalink / raw)
To: Jens Axboe; +Cc: io-uring, [email protected]
>> Does io_uring though have to deal with BLK_QC_T_NONE at all? Or are you
>> saying that it should never receive that result?
>> That's one of the things I'm not clear about.
> BLK_QC_T_* are block cookies, they are only valid in the block layer.
> Only the poll handler called should have to deal with them, inside
> their f_op->iopoll() handler. It's simply passed from the queue to
> the poll side.
>
> So no, io_uring shouldn't have to deal with them at all.
>
> The problem, as I see it, is if the block layer returns BLK_QC_T_NONE
> and the IO was actually queued and requires polling to be found. We'd
> end up with IO timeouts for handling those requests, and that's not a
> good thing...
I see requests in io_do_iopoll() on poll_list with req->res == -EAGAIN,
I think because the completion happened after an issued request was
added to poll_list in io_iopoll_req_issued().
How should we deal with such a request, reissue unconditionally or
something else?
--bijan
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Polled I/O cannot find completions
2020-03-31 18:43 ` Bijan Mottahedeh
@ 2020-04-01 1:01 ` Bijan Mottahedeh
0 siblings, 0 replies; 6+ messages in thread
From: Bijan Mottahedeh @ 2020-04-01 1:01 UTC (permalink / raw)
To: Jens Axboe; +Cc: io-uring, [email protected]
On 3/31/2020 11:43 AM, Bijan Mottahedeh wrote:
>
>>> Does io_uring though have to deal with BLK_QC_T_NONE at all? Or are
>>> you
>>> saying that it should never receive that result?
>>> That's one of the things I'm not clear about.
>> BLK_QC_T_* are block cookies, they are only valid in the block layer.
>> Only the poll handler called should have to deal with them, inside
>> their f_op->iopoll() handler. It's simply passed from the queue to
>> the poll side.
>>
>> So no, io_uring shouldn't have to deal with them at all.
>>
>> The problem, as I see it, is if the block layer returns BLK_QC_T_NONE
>> and the IO was actually queued and requires polling to be found. We'd
>> end up with IO timeouts for handling those requests, and that's not a
>> good thing...
>
> I see requests in io_do_iopoll() on poll_list with req->res ==
> -EAGAIN, I think because the completion happened after an issued
> request was added to poll_list in io_iopoll_req_issued().
>
> How should we deal with such a request, reissue unconditionally or
> something else?
>
I mimicked the done processing code in io_iopoll_complete() for -EAGAIN
as a test. I can now get further and don't see polling threads hang; in
fact, I eventually see I/O timeouts as you noted.
It seems that there might be two separate issues here. Makes sense?
Thanks.
--bijan
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 62bd410..a3e3a4e 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1738,11 +1738,24 @@ static void io_iopoll_complete(struct
io_ring_ctx *ctx,
io_free_req_many(ctx, &rb);
}
+static void io_iopoll_queue(struct list_head *again)
+{
+ struct io_kiocb *req;
+
+ while (!list_empty(again)) {
+ req = list_first_entry(again, struct io_kiocb, list);
+ list_del(&req->list);
+ refcount_inc(&req->refs);
+ io_queue_async_work(req);
+ }
+}
+
static int io_do_iopoll(struct io_ring_ctx *ctx, unsigned int *nr_events,
long min)
{
struct io_kiocb *req, *tmp;
LIST_HEAD(done);
+ LIST_HEAD(again);
bool spin;
int ret;
@@ -1757,9 +1770,9 @@ static int io_do_iopoll(struct io_ring_ctx *ctx,
unsigned
struct kiocb *kiocb = &req->rw.kiocb;
/*
- * Move completed entries to our local list. If we find a
- * request that requires polling, break out and complete
- * the done list first, if we have entries there.
+ * Move completed and retryable entries to our local lists.
+ * If we find a request that requires polling, break out
+ * and complete those lists first, if we have entries there.
*/
if (req->flags & REQ_F_IOPOLL_COMPLETED) {
list_move_tail(&req->list, &done);
@@ -1768,6 +1781,13 @@ static int io_do_iopoll(struct io_ring_ctx *ctx,
unsigned
if (!list_empty(&done))
break;
+ if (req->result == -EAGAIN) {
+ list_move_tail(&req->list, &again);
+ continue;
+ }
+ if (!list_empty(&again))
+ break;
+
ret = kiocb->ki_filp->f_op->iopoll(kiocb, spin);
if (ret < 0)
break;
@@ -1780,6 +1800,9 @@ static int io_do_iopoll(struct io_ring_ctx *ctx,
unsigned
if (!list_empty(&done))
io_iopoll_complete(ctx, nr_events, &done);
+ if (!list_empty(&again))
+ io_iopoll_queue(&again);
+
return ret;
}
^ permalink raw reply related [flat|nested] 6+ messages in thread
end of thread, other threads:[~2020-04-01 1:01 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-03-27 2:57 Polled I/O cannot find completions Bijan Mottahedeh
2020-03-27 15:36 ` Jens Axboe
2020-03-27 16:31 ` Bijan Mottahedeh
2020-03-27 16:35 ` Jens Axboe
2020-03-31 18:43 ` Bijan Mottahedeh
2020-04-01 1:01 ` Bijan Mottahedeh
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox