public inbox for io-uring@vger.kernel.org
From: Yi Zhang <yi.zhang@redhat.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: io-uring <io-uring@vger.kernel.org>, Ming Lei <ming.lei@redhat.com>
Subject: Re: [PATCH v2] io_uring: fix IOPOLL with passthrough I/O
Date: Thu, 15 Jan 2026 06:08:32 +0800
Message-ID: <CAHj4cs8=5Lifi8U+8mCnknxODYAeqD2_fS-zvkmWeb4hCp9z-A@mail.gmail.com>
In-Reply-To: <c008dbd2-6436-40da-b5c6-f34844878a6f@kernel.dk>

On Wed, Jan 14, 2026 at 11:29 PM Jens Axboe <axboe@kernel.dk> wrote:
>
> A previous commit improving IOPOLL made an incorrect assumption that
> task_work isn't used with IOPOLL. This can cause crashes when doing
> passthrough I/O on nvme, where queueing the completion task_work will
> trample on the same memory that holds the completed list of requests.
>
> Fix it up by shuffling the members around, so we're not sharing any
> parts that end up getting used in this path.
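
To illustrate the failure mode described in the quoted commit message, here
is a minimal userspace sketch. It is not kernel code and not part of the
patch. The types fake_task_work and fake_req and the function
iopoll_linkage_clobbered() are hypothetical stand-ins, assumed only for
illustration: they model a list entry that shares a union with task_work
state, as iopoll_node did before this fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for illustration; not the kernel definitions. */
struct list_head {
	struct list_head *next, *prev;
};

struct fake_task_work {
	void *node;		/* stand-in for the task_work list node */
	void (*func)(void);	/* stand-in for the task_work callback */
};

struct fake_req {
	/* Both members occupy the same bytes, as in the pre-fix layout. */
	union {
		struct fake_task_work	tw;
		struct list_head	iopoll_node;
	};
};

static void fake_complete(void)
{
}

/* Returns true if "queueing task_work" destroyed the list linkage. */
bool iopoll_linkage_clobbered(void)
{
	struct list_head head = { &head, &head };
	struct fake_req req;

	/* Link the request onto the pending list. */
	req.iopoll_node.next = &head;
	req.iopoll_node.prev = &head;
	head.next = &req.iopoll_node;
	head.prev = &req.iopoll_node;
	assert(req.iopoll_node.next == &head);

	/* Queue the completion: this writes over the list pointers. */
	req.tw.node = NULL;
	req.tw.func = fake_complete;

	/* The list still points at req, but req no longer points back. */
	return req.iopoll_node.next != &head;
}
```

In this sketch the list head still references the request after the
overwrite, so any later list walk would follow a pointer that is no longer
valid. Moving the list entry into a union whose other members are not used
on this path avoids the overlap.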

I tested v2 and confirmed that the issue is fixed:

Tested-by: Yi Zhang <yi.zhang@redhat.com>


# ./check nvme/049
nvme/049 => nvme0n1 (basic test for uring-passthrough I/O on /dev/ngX) [passed]
    runtime    ...  7.991s
nvme/049 => nvme1n1 (basic test for uring-passthrough I/O on /dev/ngX) [passed]
    runtime    ...  7.970s
nvme/049 => nvme2n1 (basic test for uring-passthrough I/O on /dev/ngX) [passed]
    runtime    ...  7.965s
nvme/049 => nvme3n1 (basic test for uring-passthrough I/O on /dev/ngX) [passed]
    runtime    ...  7.975s
nvme/049 => nvme4n1 (basic test for uring-passthrough I/O on /dev/ngX) [passed]
    runtime    ...  8.003s
nvme/049 => nvme5n1 (basic test for uring-passthrough I/O on /dev/ngX) [passed]
    runtime    ...  7.999s

>
> Fixes: 3c7d76d6128a ("io_uring: IOPOLL polling improvements")
> Reported-by: Yi Zhang <yi.zhang@redhat.com>
> Link: https://lore.kernel.org/linux-block/CAHj4cs_SLPj9v9w5MgfzHKy+983enPx3ZQY2kMuMJ1202DBefw@mail.gmail.com/
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>
> ---
>
> v2: ensure ->iopoll_start is read before doing actual polling
>
> diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
> index e4c804f99c30..211686ad89fd 100644
> --- a/include/linux/io_uring_types.h
> +++ b/include/linux/io_uring_types.h
> @@ -713,13 +713,10 @@ struct io_kiocb {
>         atomic_t                        refs;
>         bool                            cancel_seq_set;
>
> -       /*
> -        * IOPOLL doesn't use task_work, so use the ->iopoll_node list
> -        * entry to manage pending iopoll requests.
> -        */
>         union {
>                 struct io_task_work     io_task_work;
> -               struct list_head        iopoll_node;
> +               /* For IOPOLL setup queues, with hybrid polling */
> +               u64                     iopoll_start;
>         };
>
>         union {
> @@ -728,8 +725,8 @@ struct io_kiocb {
>                  * poll
>                  */
>                 struct hlist_node       hash_node;
> -               /* For IOPOLL setup queues, with hybrid polling */
> -               u64                     iopoll_start;
> +               /* IOPOLL completion handling */
> +               struct list_head        iopoll_node;
>                 /* for private io_kiocb freeing */
>                 struct rcu_head         rcu_head;
>         };
> diff --git a/io_uring/rw.c b/io_uring/rw.c
> index 307f1f39d9f3..c33c533a267e 100644
> --- a/io_uring/rw.c
> +++ b/io_uring/rw.c
> @@ -1296,12 +1296,13 @@ static int io_uring_hybrid_poll(struct io_kiocb *req,
>                                 struct io_comp_batch *iob, unsigned int poll_flags)
>  {
>         struct io_ring_ctx *ctx = req->ctx;
> -       u64 runtime, sleep_time;
> +       u64 runtime, sleep_time, iopoll_start;
>         int ret;
>
> +       iopoll_start = READ_ONCE(req->iopoll_start);
>         sleep_time = io_hybrid_iopoll_delay(ctx, req);
>         ret = io_uring_classic_poll(req, iob, poll_flags);
> -       runtime = ktime_get_ns() - req->iopoll_start - sleep_time;
> +       runtime = ktime_get_ns() - iopoll_start - sleep_time;
>
>         /*
>          * Use minimum sleep time if we're polling devices with different
> --
> Jens Axboe
>
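
The v2 note above ("ensure ->iopoll_start is read before doing actual
polling") can be illustrated with a small userspace sketch. One plausible
reading, stated here as an assumption, is that since iopoll_start now shares
a union with io_task_work, polling may complete the request and overwrite
the timestamp, so it has to be saved first. The names hybrid_req,
fake_poll(), runtime_snapshot() and runtime_late_read() are hypothetical,
and a plain load stands in for the kernel's READ_ONCE().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a request; not the kernel's io_kiocb. */
struct hybrid_req {
	union {
		struct {
			void *node;
			void (*func)(void);
		} tw;			/* stand-in for io_task_work */
		uint64_t iopoll_start;	/* timestamp in nanoseconds */
	};
};

static void fake_done(void)
{
}

/* Stand-in for polling that completes the request and queues task_work. */
static void fake_poll(struct hybrid_req *req)
{
	req->tw.node = NULL;
	req->tw.func = fake_done;
}

/* Fixed ordering: save the timestamp before polling can overwrite it. */
uint64_t runtime_snapshot(struct hybrid_req *req, uint64_t now)
{
	uint64_t start = req->iopoll_start;

	fake_poll(req);
	return now - start;
}

/* Pre-v2 ordering: the timestamp bytes are read after being overwritten. */
uint64_t runtime_late_read(struct hybrid_req *req, uint64_t now)
{
	fake_poll(req);
	return now - req->iopoll_start;
}
```

With iopoll_start set to 1000 and now set to 1500, the snapshot variant
yields a runtime of 500, while the late read computes its result from
whatever the task_work fields left behind in those bytes.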


-- 
Best Regards,
  Yi Zhang


Thread overview: 3+ messages
2026-01-14 15:28 [PATCH v2] io_uring: fix IOPOLL with passthrough I/O Jens Axboe
2026-01-14 22:08 ` Yi Zhang [this message]
2026-01-15  1:42 ` Ming Lei