From: Jens Axboe <[email protected]>
To: "Haeuptle, Michael" <[email protected]>,
"[email protected]" <[email protected]>
Subject: Re: Bug? CQE.res = -EAGAIN with nvme multipath driver
Date: Mon, 6 Jan 2025 19:39:31 -0700 [thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
On 1/6/25 7:33 PM, Jens Axboe wrote:
> On 1/6/25 4:53 PM, Jens Axboe wrote:
>> On 1/6/25 1:03 PM, Haeuptle, Michael wrote:
>>> Hello,
>>>
>>> I'm using the nvme multipath driver (NVMF/RDMA) and io-uring. When a
>>> path goes away, I sometimes get a CQE.res = -EAGAIN in user space.
>>> This is unexpected since the nvme multipath driver should handle this
>>> transparently. It's somewhat workload related but easy to reproduce
>>> with fio.
>>>
>>> The multipath driver uses kblockd worker to re-queue the failed NVME
>>> bios
>>> (https://github.com/torvalds/linux/blob/13563da6ffcf49b8b45772e40b35f96926a7ee1e/drivers/nvme/host/multipath.c#L126).
>>> The original request is ended.
>>>
>>> When the nvme_requeue_work callback is executed, the blk layer tries
>>> to allocate a new request for the bios but that fails and the bio
>>> status is set to BLK_STS_AGAIN
>>> (https://elixir.bootlin.com/linux/v6.12.6/source/block/blk-mq.c#L2987).
>>> The failure to allocate a new req seems to be due to all tags for the
>>> queue being used up.
>>>
>>> Eventually, this makes it into io_uring's io_rw_should_reissue and
>>> hits same_thread_group(req->tctx->task, current) = false (in
>>> https://github.com/torvalds/linux/blob/13563da6ffcf49b8b45772e40b35f96926a7ee1e/io_uring/rw.c#L437).
>>> As a result, CQE.res = -EAGAIN and thrown back to the user space
>>> program.
>>>
>>> Here's a stack dump when we hit same_thread_group(req->tctx->task,
>>> current) = false
>>>
>>> kernel: [237700.098733] dump_stack_lvl+0x44/0x5c
>>> kernel: [237700.098737] io_rw_should_reissue.cold+0x5d/0x64
>>> kernel: [237700.098742] io_complete_rw+0x9a/0xc0
>>> kernel: [237700.098745] blkdev_bio_end_io_async+0x33/0x80
>>> kernel: [237700.098749] blk_mq_submit_bio+0x5b5/0x620
>>> kernel: [237700.098756] submit_bio_noacct_nocheck+0x163/0x370
>>> kernel: [237700.098760] ? submit_bio_noacct+0x79/0x4b0
>>> kernel: [237700.098764] nvme_requeue_work+0x4b/0x60 [nvme_core]
>>> kernel: [237700.098776] process_one_work+0x1c7/0x380
>>> kernel: [237700.098782] worker_thread+0x4d/0x380
>>> kernel: [237700.098786] ? _raw_spin_lock_irqsave+0x23/0x50
>>> kernel: [237700.098791] ? rescuer_thread+0x3a0/0x3a0
>>> kernel: [237700.098794] kthread+0xe9/0x110
>>> kernel: [237700.098798] ? kthread_complete_and_exit+0x20/0x20
>>> kernel: [237700.098802] ret_from_fork+0x22/0x30
>>> kernel: [237700.098811] </TASK>
>>>
>>> Is the same_thread_group() check really needed in this case? The
>>> thread groups are certainly different here. Any side effects if this
>>> check is removed?
>>
>> It's there for safety reasons - across all request types, it's not
>> always safe. For this case, the check absolutely does not need to be
>> there. So probably best to ponder ways to bypass it selectively.
>>
>> Let me ponder a bit what the best approach would be here...
>
> Actually I think we can just remove it. The actual retry will happen out
> of context anyway, and the comment about the import is no longer valid
> as the import will have been done upfront since 6.10.
>
> Do you want to send a patch for that, or do you want me to send one out
> referencing this report?
Also see:
commit 039a2e800bcd5beb89909d1a488abf3d647642cf
Author: Jens Axboe <[email protected]>
Date: Thu Apr 25 09:04:32 2024 -0600
io_uring/rw: reinstate thread check for retries
let me take a closer look tomorrow...
--
Jens Axboe
Thread overview: 4+ messages
2025-01-06 20:03 Bug? CQE.res = -EAGAIN with nvme multipath driver Haeuptle, Michael
2025-01-06 23:53 ` Jens Axboe
2025-01-07 2:33 ` Jens Axboe
2025-01-07 2:39 ` Jens Axboe [this message]