* Polled I/O cannot find completions @ 2020-03-27 2:57 Bijan Mottahedeh 2020-03-27 15:36 ` Jens Axboe 0 siblings, 1 reply; 6+ messages in thread From: Bijan Mottahedeh @ 2020-03-27 2:57 UTC (permalink / raw) To: Jens Axboe; +Cc: io-uring I'm seeing poll threads hang as I increase the number of threads in polled fio tests. I think this is because of polling on BLK_QC_T_NONE cookie, which will never succeed. A related problem however, is that the meaning of BLK_QC_T_NONE seems to be ambiguous. Specifically, the following cases return BLK_QC_T_NONE which I think would be problematic for polled io: generic_make_request() ... if (current->bio_list) { bio_list_add(¤t->bio_list[0], bio); goto out; } In this case the request is delayed but should get a cookie eventually. How does the caller know what the right action is in this case for a polled request? Polling would never succeed. __blk_mq_issue_directly() ... case BLK_STS_RESOURCE: case BLK_STS_DEV_RESOURCE: blk_mq_update_dispatch_busy(hctx, true); __blk_mq_requeue_request(rq); break; In this case, cookie is not updated and would keep its default BLK_QC_T_NONE value from blk_mq_make_request(). However, this request will eventually be reissued, so again, how would the caller poll for the completion of this request? blk_mq_try_issue_directly() ... ret = __blk_mq_try_issue_directly(hctx, rq, cookie, false, true); if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE) blk_mq_request_bypass_insert(rq, false, true); Am I missing something here? Incidentally, I don't see BLK_QC_T_EAGAIN used anywhere, should it be? Thanks. --bijan ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Polled I/O cannot find completions 2020-03-27 2:57 Polled I/O cannot find completions Bijan Mottahedeh @ 2020-03-27 15:36 ` Jens Axboe 2020-03-27 16:31 ` Bijan Mottahedeh 0 siblings, 1 reply; 6+ messages in thread From: Jens Axboe @ 2020-03-27 15:36 UTC (permalink / raw) To: Bijan Mottahedeh; +Cc: io-uring, [email protected] CC'ing linux-block, this isn't an io_uring issue. On 3/26/20 8:57 PM, Bijan Mottahedeh wrote: > I'm seeing poll threads hang as I increase the number of threads in > polled fio tests. I think this is because of polling on BLK_QC_T_NONE > cookie, which will never succeed. > > A related problem however, is that the meaning of BLK_QC_T_NONE seems to > be ambiguous. > > Specifically, the following cases return BLK_QC_T_NONE which I think > would be problematic for polled io: > > > generic_make_request() > ... > if (current->bio_list) { > bio_list_add(¤t->bio_list[0], bio); > goto out; > } > > In this case the request is delayed but should get a cookie eventually. > How does the caller know what the right action is in this case for a > polled request? Polling would never succeed. > > > __blk_mq_issue_directly() > ... > case BLK_STS_RESOURCE: > case BLK_STS_DEV_RESOURCE: > blk_mq_update_dispatch_busy(hctx, true); > __blk_mq_requeue_request(rq); > break; > > In this case, cookie is not updated and would keep its default > BLK_QC_T_NONE value from blk_mq_make_request(). However, this request > will eventually be reissued, so again, how would the caller poll for the > completion of this request? > > blk_mq_try_issue_directly() > ... > ret = __blk_mq_try_issue_directly(hctx, rq, cookie, false, true); > if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE) > blk_mq_request_bypass_insert(rq, false, true); > > Am I missing something here? > > Incidentally, I don't see BLK_QC_T_EAGAIN used anywhere, should it be? > > Thanks. > > --bijan > > > > -- Jens Axboe ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Polled I/O cannot find completions 2020-03-27 15:36 ` Jens Axboe @ 2020-03-27 16:31 ` Bijan Mottahedeh 2020-03-27 16:35 ` Jens Axboe 0 siblings, 1 reply; 6+ messages in thread From: Bijan Mottahedeh @ 2020-03-27 16:31 UTC (permalink / raw) To: Jens Axboe; +Cc: io-uring, [email protected] Does io_uring though have to deal with BLK_QC_T_NONE at all? Or are you saying that it should never receive that result? That's one of the things I'm not clear about. --bijan > CC'ing linux-block, this isn't an io_uring issue. > > > On 3/26/20 8:57 PM, Bijan Mottahedeh wrote: >> I'm seeing poll threads hang as I increase the number of threads in >> polled fio tests. I think this is because of polling on BLK_QC_T_NONE >> cookie, which will never succeed. >> >> A related problem however, is that the meaning of BLK_QC_T_NONE seems to >> be ambiguous. >> >> Specifically, the following cases return BLK_QC_T_NONE which I think >> would be problematic for polled io: >> >> >> generic_make_request() >> ... >> if (current->bio_list) { >> bio_list_add(¤t->bio_list[0], bio); >> goto out; >> } >> >> In this case the request is delayed but should get a cookie eventually. >> How does the caller know what the right action is in this case for a >> polled request? Polling would never succeed. >> >> >> __blk_mq_issue_directly() >> ... >> case BLK_STS_RESOURCE: >> case BLK_STS_DEV_RESOURCE: >> blk_mq_update_dispatch_busy(hctx, true); >> __blk_mq_requeue_request(rq); >> break; >> >> In this case, cookie is not updated and would keep its default >> BLK_QC_T_NONE value from blk_mq_make_request(). However, this request >> will eventually be reissued, so again, how would the caller poll for the >> completion of this request? >> >> blk_mq_try_issue_directly() >> ... >> ret = __blk_mq_try_issue_directly(hctx, rq, cookie, false, true); >> if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE) >> blk_mq_request_bypass_insert(rq, false, true); >> >> Am I missing something here? >> >> Incidentally, I don't see BLK_QC_T_EAGAIN used anywhere, should it be? >> >> Thanks. >> >> --bijan >> >> >> >> > ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Polled I/O cannot find completions 2020-03-27 16:31 ` Bijan Mottahedeh @ 2020-03-27 16:35 ` Jens Axboe 2020-03-31 18:43 ` Bijan Mottahedeh 0 siblings, 1 reply; 6+ messages in thread From: Jens Axboe @ 2020-03-27 16:35 UTC (permalink / raw) To: Bijan Mottahedeh; +Cc: io-uring, [email protected] On 3/27/20 10:31 AM, Bijan Mottahedeh wrote: > Does io_uring though have to deal with BLK_QC_T_NONE at all? Or are you > saying that it should never receive that result? > That's one of the things I'm not clear about. BLK_QC_T_* are block cookies, they are only valid in the block layer. Only the poll handler called should have to deal with them, inside their f_op->iopoll() handler. It's simply passed from the queue to the poll side. So no, io_uring shouldn't have to deal with them at all. The problem, as I see it, is if the block layer returns BLK_QC_T_NONE and the IO was actually queued and requires polling to be found. We'd end up with IO timeouts for handling those requests, and that's not a good thing... >> On 3/26/20 8:57 PM, Bijan Mottahedeh wrote: >>> I'm seeing poll threads hang as I increase the number of threads in >>> polled fio tests. I think this is because of polling on BLK_QC_T_NONE >>> cookie, which will never succeed. >>> >>> A related problem however, is that the meaning of BLK_QC_T_NONE seems to >>> be ambiguous. >>> >>> Specifically, the following cases return BLK_QC_T_NONE which I think >>> would be problematic for polled io: >>> >>> >>> generic_make_request() >>> ... >>> if (current->bio_list) { >>> bio_list_add(¤t->bio_list[0], bio); >>> goto out; >>> } >>> >>> In this case the request is delayed but should get a cookie eventually. >>> How does the caller know what the right action is in this case for a >>> polled request? Polling would never succeed. >>> >>> >>> __blk_mq_issue_directly() >>> ... >>> case BLK_STS_RESOURCE: >>> case BLK_STS_DEV_RESOURCE: >>> blk_mq_update_dispatch_busy(hctx, true); >>> __blk_mq_requeue_request(rq); >>> break; >>> >>> In this case, cookie is not updated and would keep its default >>> BLK_QC_T_NONE value from blk_mq_make_request(). However, this request >>> will eventually be reissued, so again, how would the caller poll for the >>> completion of this request? >>> >>> blk_mq_try_issue_directly() >>> ... >>> ret = __blk_mq_try_issue_directly(hctx, rq, cookie, false, true); >>> if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE) >>> blk_mq_request_bypass_insert(rq, false, true); >>> >>> Am I missing something here? >>> >>> Incidentally, I don't see BLK_QC_T_EAGAIN used anywhere, should it be? Pretty sure that's a leftover from when the attempts was made to pass back -EAGAIN inline instead of through the bio end_io handler. -- Jens Axboe ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Polled I/O cannot find completions 2020-03-27 16:35 ` Jens Axboe @ 2020-03-31 18:43 ` Bijan Mottahedeh 2020-04-01 1:01 ` Bijan Mottahedeh 0 siblings, 1 reply; 6+ messages in thread From: Bijan Mottahedeh @ 2020-03-31 18:43 UTC (permalink / raw) To: Jens Axboe; +Cc: io-uring, [email protected] >> Does io_uring though have to deal with BLK_QC_T_NONE at all? Or are you >> saying that it should never receive that result? >> That's one of the things I'm not clear about. > BLK_QC_T_* are block cookies, they are only valid in the block layer. > Only the poll handler called should have to deal with them, inside > their f_op->iopoll() handler. It's simply passed from the queue to > the poll side. > > So no, io_uring shouldn't have to deal with them at all. > > The problem, as I see it, is if the block layer returns BLK_QC_T_NONE > and the IO was actually queued and requires polling to be found. We'd > end up with IO timeouts for handling those requests, and that's not a > good thing... I see requests in io_do_iopoll() on poll_list with req->res == -EAGAIN, I think because the completion happened after an issued request was added to poll_list in io_iopoll_req_issued(). How should we deal with such a request, reissue unconditionally or something else? --bijan ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Polled I/O cannot find completions 2020-03-31 18:43 ` Bijan Mottahedeh @ 2020-04-01 1:01 ` Bijan Mottahedeh 0 siblings, 0 replies; 6+ messages in thread From: Bijan Mottahedeh @ 2020-04-01 1:01 UTC (permalink / raw) To: Jens Axboe; +Cc: io-uring, [email protected] On 3/31/2020 11:43 AM, Bijan Mottahedeh wrote: > >>> Does io_uring though have to deal with BLK_QC_T_NONE at all? Or are >>> you >>> saying that it should never receive that result? >>> That's one of the things I'm not clear about. >> BLK_QC_T_* are block cookies, they are only valid in the block layer. >> Only the poll handler called should have to deal with them, inside >> their f_op->iopoll() handler. It's simply passed from the queue to >> the poll side. >> >> So no, io_uring shouldn't have to deal with them at all. >> >> The problem, as I see it, is if the block layer returns BLK_QC_T_NONE >> and the IO was actually queued and requires polling to be found. We'd >> end up with IO timeouts for handling those requests, and that's not a >> good thing... > > I see requests in io_do_iopoll() on poll_list with req->res == > -EAGAIN, I think because the completion happened after an issued > request was added to poll_list in io_iopoll_req_issued(). > > How should we deal with such a request, reissue unconditionally or > something else? > I mimicked the done processing code in io_iopoll_complete() for -EAGAIN as a test. I can now get further and don't see polling threads hang; in fact, I eventually see I/O timeouts as you noted. It seems that there might be two separate issues here. Makes sense? Thanks. --bijan diff --git a/fs/io_uring.c b/fs/io_uring.c index 62bd410..a3e3a4e 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -1738,11 +1738,24 @@ static void io_iopoll_complete(struct io_ring_ctx *ctx, io_free_req_many(ctx, &rb); } +static void io_iopoll_queue(struct list_head *again) +{ + struct io_kiocb *req; + + while (!list_empty(again)) { + req = list_first_entry(again, struct io_kiocb, list); + list_del(&req->list); + refcount_inc(&req->refs); + io_queue_async_work(req); + } +} + static int io_do_iopoll(struct io_ring_ctx *ctx, unsigned int *nr_events, long min) { struct io_kiocb *req, *tmp; LIST_HEAD(done); + LIST_HEAD(again); bool spin; int ret; @@ -1757,9 +1770,9 @@ static int io_do_iopoll(struct io_ring_ctx *ctx, unsigned struct kiocb *kiocb = &req->rw.kiocb; /* - * Move completed entries to our local list. If we find a - * request that requires polling, break out and complete - * the done list first, if we have entries there. + * Move completed and retryable entries to our local lists. + * If we find a request that requires polling, break out + * and complete those lists first, if we have entries there. */ if (req->flags & REQ_F_IOPOLL_COMPLETED) { list_move_tail(&req->list, &done); @@ -1768,6 +1781,13 @@ static int io_do_iopoll(struct io_ring_ctx *ctx, unsigned if (!list_empty(&done)) break; + if (req->result == -EAGAIN) { + list_move_tail(&req->list, &again); + continue; + } + if (!list_empty(&again)) + break; + ret = kiocb->ki_filp->f_op->iopoll(kiocb, spin); if (ret < 0) break; @@ -1780,6 +1800,9 @@ static int io_do_iopoll(struct io_ring_ctx *ctx, unsigned if (!list_empty(&done)) io_iopoll_complete(ctx, nr_events, &done); + if (!list_empty(&again)) + io_iopoll_queue(&again); + return ret; } ^ permalink raw reply related [flat|nested] 6+ messages in thread
end of thread, other threads:[~2020-04-01 1:01 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2020-03-27 2:57 Polled I/O cannot find completions Bijan Mottahedeh 2020-03-27 15:36 ` Jens Axboe 2020-03-27 16:31 ` Bijan Mottahedeh 2020-03-27 16:35 ` Jens Axboe 2020-03-31 18:43 ` Bijan Mottahedeh 2020-04-01 1:01 ` Bijan Mottahedeh
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox