From: Ming Lei <[email protected]>
To: Mike Snitzer <[email protected]>
Cc: Jens Axboe <[email protected]>,
[email protected], [email protected]
Subject: Re: [RFC PATCH] io_uring: reissue in case -EAGAIN is returned after io issue returns
Date: Wed, 6 Apr 2022 10:09:44 +0800 [thread overview]
Message-ID: <Ykz2aF3VgyyVG46m@T590> (raw)
In-Reply-To: <[email protected]>
On Mon, Apr 04, 2022 at 12:51:30PM -0400, Mike Snitzer wrote:
> On Sun, Apr 03 2022 at 7:45P -0400,
> Ming Lei <[email protected]> wrote:
>
> > -EAGAIN still may return after io issue returns, and REQ_F_REISSUE is
> > set in io_complete_rw_iopoll(), but the req never gets chance to be handled.
> > io_iopoll_check doesn't handle this situation, and io hang can be caused.
> >
> > Current dm io polling may return -EAGAIN after bio submission is
> > returned, also blk-throttle might trigger this situation too.
> >
> > Cc: Mike Snitzer <[email protected]>
> > Signed-off-by: Ming Lei <[email protected]>
>
> I first reverted commit 5291984004ed ("dm: fix bio polling to handle
> possibile BLK_STS_AGAIN") then applied this patch and verified this
> fixes the DM bio polling hangs. Nice work!
>
> But interestingly with this fio test (against dm-linear ontop of
> null_blk with queue_mode=2 submit_queues=8 poll_queues=2 bs=4096 gb=16):
>
> fio --bs=4096 --ioengine=io_uring --fixedbufs --registerfiles --hipri=1 \
> --iodepth=16 --iodepth_batch_submit=16 --iodepth_batch_complete_min=16 \
> --filename=/dev/mapper/linear --direct=1 --runtime=20 --numjobs=16 \
> --rw=randread --name=test --group_reporting --norandommap
16jobs in io_uring/aio test is overkill.
>
> I get 3186k IOPS with your patch to have io_uring retry (and commit
> 5291984004ed reverted), but 4305k IOPS if leave commit 5291984004ed
> applied (and DM resorts to retrying any -EAGAIN _without_ polling).
IMO, commit 5291984004ed shouldn't be reverted, which is reasonable to
retry on underlying IO for dm.
This patch is for making io_uring more reliable, since the current
io_uring code only handles -EAGAIN from submission code path, and
-EAGAIN/REISSUE isn't handled if it is returned during ->poll(),
then the io hang is caused.
Jens, what do you think of this patch? Does io_uring need to handle
-EAGAIN in this case?
>
> Jens rightly pointed out to me that polling tests that exhaust tags
> are bogus anyway (because such unbounded IO defeats the point of
> polling). Jens also thinks my result, with commit 5291984004ed
> applied, is somehow bogus and not to be trusted ;) He is very likely
> correct, and the failing likely in the null_blk driver -- I'm
> skeptical of that driver given it cannot pass fio verify testing
> (e.g. --do_verify=1 --verify=crc32c --verify_async=1) with or without
> polling.
Because it is null block...
>
> Review comments inlined below.
>
> > ---
> > fs/io-wq.h | 13 +++++
> > fs/io_uring.c | 128 ++++++++++++++++++++++++++++----------------------
> > 2 files changed, 86 insertions(+), 55 deletions(-)
> >
> > diff --git a/fs/io-wq.h b/fs/io-wq.h
> > index dbecd27656c7..4ca4863664fb 100644
> > --- a/fs/io-wq.h
> > +++ b/fs/io-wq.h
> > @@ -96,6 +96,19 @@ static inline void wq_list_add_head(struct io_wq_work_node *node,
> > WRITE_ONCE(list->first, node);
> > }
> >
> > +static inline void wq_list_remove(struct io_wq_work_list *list,
> > + struct io_wq_work_node *prev,
> > + struct io_wq_work_node *node)
> > +{
> > + if (!prev)
> > + WRITE_ONCE(list->first, node->next);
> > + else
> > + prev->next = node->next;
> > +
> > + if (node == list->last)
> > + list->last = prev;
> > +}
> > +
> > static inline void wq_list_cut(struct io_wq_work_list *list,
> > struct io_wq_work_node *last,
> > struct io_wq_work_node *prev)
> > diff --git a/fs/io_uring.c b/fs/io_uring.c
> > index 59e54a6854b7..6db5514e10ca 100644
> > --- a/fs/io_uring.c
> > +++ b/fs/io_uring.c
> > @@ -2759,6 +2759,65 @@ static inline bool io_run_task_work(void)
> > return false;
> > }
> >
> > +#ifdef CONFIG_BLOCK
> > +static bool io_resubmit_prep(struct io_kiocb *req)
> > +{
> > + struct io_async_rw *rw = req->async_data;
> > +
> > + if (!req_has_async_data(req))
> > + return !io_req_prep_async(req);
> > + iov_iter_restore(&rw->s.iter, &rw->s.iter_state);
> > + return true;
> > +}
> > +
> > +static bool io_rw_should_reissue(struct io_kiocb *req)
> > +{
> > + umode_t mode = file_inode(req->file)->i_mode;
> > + struct io_ring_ctx *ctx = req->ctx;
> > +
> > + if (!S_ISBLK(mode) && !S_ISREG(mode))
> > + return false;
> > + if ((req->flags & REQ_F_NOWAIT) || (io_wq_current_is_worker() &&
> > + !(ctx->flags & IORING_SETUP_IOPOLL)))
> > + return false;
> > + /*
> > + * If ref is dying, we might be running poll reap from the exit work.
> > + * Don't attempt to reissue from that path, just let it fail with
> > + * -EAGAIN.
> > + */
> > + if (percpu_ref_is_dying(&ctx->refs))
> > + return false;
> > + /*
> > + * Play it safe and assume not safe to re-import and reissue if we're
> > + * not in the original thread group (or in task context).
> > + */
> > + if (!same_thread_group(req->task, current) || !in_task())
> > + return false;
> > + return true;
> > +}
> > +#else
> > +static bool io_resubmit_prep(struct io_kiocb *req)
> > +{
> > + return false;
> > +}
> > +static bool io_rw_should_reissue(struct io_kiocb *req)
> > +{
> > + return false;
> > +}
> > +#endif
> > +
> > +static void do_io_reissue(struct io_kiocb *req, int ret)
> > +{
> > + if (req->flags & REQ_F_REISSUE) {
> > + req->flags &= ~REQ_F_REISSUE;
> > + if (io_resubmit_prep(req))
> > + io_req_task_queue_reissue(req);
> > + else
> > + io_req_task_queue_fail(req, ret);
> > + }
> > +}
>
> Minor nit but: I'd leave caller to check for REQ_F_REISSUE.
>
> > +
> > +
> > static int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
> > {
> > struct io_wq_work_node *pos, *start, *prev;
> > @@ -2786,6 +2845,13 @@ static int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
> > if (READ_ONCE(req->iopoll_completed))
> > break;
> >
> > + /*
> > + * Once REISSUE flag is set, the req has been done, and we
> > + * have to retry
> > + */
> > + if (req->flags & REQ_F_REISSUE)
> > + break;
> > +
> > ret = kiocb->ki_filp->f_op->iopoll(kiocb, &iob, poll_flags);
> > if (unlikely(ret < 0))
> > return ret;
> > @@ -2807,6 +2873,12 @@ static int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
> > wq_list_for_each_resume(pos, prev) {
> > struct io_kiocb *req = container_of(pos, struct io_kiocb, comp_list);
> >
> > + if (req->flags & REQ_F_REISSUE) {
> > + wq_list_remove(&ctx->iopoll_list, prev, pos);
> > + do_io_reissue(req, -EIO);
> > + break;
> > + }
> > +
>
> That way you'll avoid redundant checks for REQ_F_REISSUE here.
Another do_io_reissue() needn't to remove req from ->iopoll_list, that
is why the check is done here.
Thanks,
Ming
next prev parent reply other threads:[~2022-04-06 12:03 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-04-03 11:45 [RFC PATCH] io_uring: reissue in case -EAGAIN is returned after io issue returns Ming Lei
2022-04-04 16:51 ` Mike Snitzer
2022-04-06 2:09 ` Ming Lei [this message]
2022-04-06 16:52 ` Mike Snitzer
2022-04-06 2:20 ` Jens Axboe
2022-04-06 3:57 ` Ming Lei
2022-04-06 12:58 ` Jens Axboe
2022-04-06 13:21 ` Ming Lei
2022-04-06 16:38 ` Mike Snitzer
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Ykz2aF3VgyyVG46m@T590 \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox