* Re: ublk: kernel crash when killing SPDK application
[not found] ` <d2179120-171b-47ba-b664-23242981ef19@nvidia.com>
@ 2025-04-22 13:42 ` Ming Lei
2025-04-22 13:47 ` Jared Holzman
0 siblings, 1 reply; 2+ messages in thread
From: Ming Lei @ 2025-04-22 13:42 UTC (permalink / raw)
To: Jared Holzman
Cc: Guy Eisenberg, linux-block@vger.kernel.org, axboe@kernel.dk,
Yoav Cohen, Omri Levi, Ofer Oshri, io-uring
On Tue, Apr 22, 2025 at 02:43:06PM +0300, Jared Holzman wrote:
> On 15/04/2025 15:56, Ming Lei wrote:
> > On Tue, Apr 15, 2025 at 10:58:37AM +0000, Guy Eisenberg wrote:
...
>
> Hi Ming,
>
> Unfortunately your patch did not solve the issue, it is still happening (6.14 Kernel)
>
> I believe the issue is that ublk_cancel_cmd() is calling io_uring_cmd_done() on a uring_cmd that is currently scheduled as a task work by io_uring_cmd_complete_in_task()
>
> I reproduced with the patch below and saw the warning I added shortly before the crash. The dmesg log is attached.
>
> I'm not sure how to solve the issue though. Unless we wait for the task work to complete in ublk_cancel cmd. I can't see any way to cancel the task work
>
> Would appreciate your assistance,
>
> Regards,
>
> Jared
>
> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> index ca9a67b5b537..d9f544206b36 100644
> --- a/drivers/block/ublk_drv.c
> +++ b/drivers/block/ublk_drv.c
> @@ -72,6 +72,10 @@
> (UBLK_PARAM_TYPE_BASIC | UBLK_PARAM_TYPE_DISCARD | \
> UBLK_PARAM_TYPE_DEVT | UBLK_PARAM_TYPE_ZONED)
>
> +#ifndef IORING_URING_CMD_TW_SCHED
> + #define IORING_URING_CMD_TW_SCHED (1U << 31)
> +#endif
> +
> struct ublk_rq_data {
> struct llist_node node;
>
> @@ -1236,6 +1240,7 @@ static void ublk_rq_task_work_cb(struct io_uring_cmd *cmd, unsigned issue_flags)
> struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd);
> struct ublk_queue *ubq = pdu->ubq;
>
> + cmd->flags &= ~IORING_URING_CMD_TW_SCHED;
> ublk_forward_io_cmds(ubq, issue_flags);
> }
>
> @@ -1245,7 +1250,7 @@ static void ublk_queue_cmd(struct ublk_queue *ubq, struct request *rq)
>
> if (llist_add(&data->node, &ubq->io_cmds)) {
> struct ublk_io *io = &ubq->ios[rq->tag];
> -
> + io->cmd->flags |= IORING_URING_CMD_TW_SCHED;
> io_uring_cmd_complete_in_task(io->cmd, ublk_rq_task_work_cb);
> }
> }
> @@ -1498,8 +1503,10 @@ static void ublk_cancel_cmd(struct ublk_queue *ubq, struct ublk_io *io,
> io->flags |= UBLK_IO_FLAG_CANCELED;
> spin_unlock(&ubq->cancel_lock);
>
> - if (!done)
> + if (!done) {
> + WARN_ON_ONCE(io->cmd->flags & IORING_URING_CMD_TW_SCHED);
> io_uring_cmd_done(io->cmd, UBLK_IO_RES_ABORT, 0, issue_flags);
> + }
> }
>
> /*
> @@ -1925,6 +1932,7 @@ static inline int ublk_ch_uring_cmd_local(struct io_uring_cmd *cmd,
> static void ublk_ch_uring_cmd_cb(struct io_uring_cmd *cmd,
> unsigned int issue_flags)
> {
> + cmd->flags &= ~IORING_URING_CMD_TW_SCHED;
> ublk_ch_uring_cmd_local(cmd, issue_flags);
> }
>
> @@ -1937,6 +1945,7 @@ static int ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
>
> /* well-implemented server won't run into unlocked */
> if (unlikely(issue_flags & IO_URING_F_UNLOCKED)) {
> + cmd->flags |= IORING_URING_CMD_TW_SCHED;
> io_uring_cmd_complete_in_task(cmd, ublk_ch_uring_cmd_cb);
> return -EIOCBQUEUED;
> }
> diff --git a/include/linux/io_uring/cmd.h b/include/linux/io_uring/cmd.h
> index abd0c8bd950b..3ac2ef7bd99a 100644
> --- a/include/linux/io_uring/cmd.h
> +++ b/include/linux/io_uring/cmd.h
> @@ -7,6 +7,7 @@
>
> /* only top 8 bits of sqe->uring_cmd_flags for kernel internal use */
> #define IORING_URING_CMD_CANCELABLE (1U << 30)
> +#define IORING_URING_CMD_TW_SCHED (1U << 31)
>
> struct io_uring_cmd {
> struct file *file;
>
Nice debug patch!
Your patch and the dmesg log has shown the race between io_uring_cmd_complete_in_task()
and io_uring_cmd_done() <- ublk_cancel_cmd().
In theory, io_uring should have the knowledge to cover it, but I guess it
might be a bit hard.
I will try to cook a ublk fix tomorrow for you to test.
Thanks,
Ming
^ permalink raw reply [flat|nested] 2+ messages in thread
* Re: ublk: kernel crash when killing SPDK application
2025-04-22 13:42 ` ublk: kernel crash when killing SPDK application Ming Lei
@ 2025-04-22 13:47 ` Jared Holzman
0 siblings, 0 replies; 2+ messages in thread
From: Jared Holzman @ 2025-04-22 13:47 UTC (permalink / raw)
To: Ming Lei
Cc: Guy Eisenberg, linux-block@vger.kernel.org, axboe@kernel.dk,
Yoav Cohen, Omri Levi, Ofer Oshri, io-uring
On 22/04/2025 16:42, Ming Lei wrote:
> On Tue, Apr 22, 2025 at 02:43:06PM +0300, Jared Holzman wrote:
>> On 15/04/2025 15:56, Ming Lei wrote:
>>> On Tue, Apr 15, 2025 at 10:58:37AM +0000, Guy Eisenberg wrote:
>
> ...
>
>>
>> Hi Ming,
>>
>> Unfortunately your patch did not solve the issue, it is still happening (6.14 Kernel)
>>
>> I believe the issue is that ublk_cancel_cmd() is calling io_uring_cmd_done() on a uring_cmd that is currently scheduled as a task work by io_uring_cmd_complete_in_task()
>>
>> I reproduced with the patch below and saw the warning I added shortly before the crash. The dmesg log is attached.
>>
>> I'm not sure how to solve the issue though. Unless we wait for the task work to complete in ublk_cancel cmd. I can't see any way to cancel the task work
>>
>> Would appreciate your assistance,
>>
>> Regards,
>>
>> Jared
>>
>> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
>> index ca9a67b5b537..d9f544206b36 100644
>> --- a/drivers/block/ublk_drv.c
>> +++ b/drivers/block/ublk_drv.c
>> @@ -72,6 +72,10 @@
>> (UBLK_PARAM_TYPE_BASIC | UBLK_PARAM_TYPE_DISCARD | \
>> UBLK_PARAM_TYPE_DEVT | UBLK_PARAM_TYPE_ZONED)
>>
>> +#ifndef IORING_URING_CMD_TW_SCHED
>> + #define IORING_URING_CMD_TW_SCHED (1U << 31)
>> +#endif
>> +
>> struct ublk_rq_data {
>> struct llist_node node;
>>
>> @@ -1236,6 +1240,7 @@ static void ublk_rq_task_work_cb(struct io_uring_cmd *cmd, unsigned issue_flags)
>> struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd);
>> struct ublk_queue *ubq = pdu->ubq;
>>
>> + cmd->flags &= ~IORING_URING_CMD_TW_SCHED;
>> ublk_forward_io_cmds(ubq, issue_flags);
>> }
>>
>> @@ -1245,7 +1250,7 @@ static void ublk_queue_cmd(struct ublk_queue *ubq, struct request *rq)
>>
>> if (llist_add(&data->node, &ubq->io_cmds)) {
>> struct ublk_io *io = &ubq->ios[rq->tag];
>> -
>> + io->cmd->flags |= IORING_URING_CMD_TW_SCHED;
>> io_uring_cmd_complete_in_task(io->cmd, ublk_rq_task_work_cb);
>> }
>> }
>> @@ -1498,8 +1503,10 @@ static void ublk_cancel_cmd(struct ublk_queue *ubq, struct ublk_io *io,
>> io->flags |= UBLK_IO_FLAG_CANCELED;
>> spin_unlock(&ubq->cancel_lock);
>>
>> - if (!done)
>> + if (!done) {
>> + WARN_ON_ONCE(io->cmd->flags & IORING_URING_CMD_TW_SCHED);
>> io_uring_cmd_done(io->cmd, UBLK_IO_RES_ABORT, 0, issue_flags);
>> + }
>> }
>>
>> /*
>> @@ -1925,6 +1932,7 @@ static inline int ublk_ch_uring_cmd_local(struct io_uring_cmd *cmd,
>> static void ublk_ch_uring_cmd_cb(struct io_uring_cmd *cmd,
>> unsigned int issue_flags)
>> {
>> + cmd->flags &= ~IORING_URING_CMD_TW_SCHED;
>> ublk_ch_uring_cmd_local(cmd, issue_flags);
>> }
>>
>> @@ -1937,6 +1945,7 @@ static int ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
>>
>> /* well-implemented server won't run into unlocked */
>> if (unlikely(issue_flags & IO_URING_F_UNLOCKED)) {
>> + cmd->flags |= IORING_URING_CMD_TW_SCHED;
>> io_uring_cmd_complete_in_task(cmd, ublk_ch_uring_cmd_cb);
>> return -EIOCBQUEUED;
>> }
>> diff --git a/include/linux/io_uring/cmd.h b/include/linux/io_uring/cmd.h
>> index abd0c8bd950b..3ac2ef7bd99a 100644
>> --- a/include/linux/io_uring/cmd.h
>> +++ b/include/linux/io_uring/cmd.h
>> @@ -7,6 +7,7 @@
>>
>> /* only top 8 bits of sqe->uring_cmd_flags for kernel internal use */
>> #define IORING_URING_CMD_CANCELABLE (1U << 30)
>> +#define IORING_URING_CMD_TW_SCHED (1U << 31)
>>
>> struct io_uring_cmd {
>> struct file *file;
>>
>
> Nice debug patch!
>
> Your patch and the dmesg log has shown the race between io_uring_cmd_complete_in_task()
> and io_uring_cmd_done() <- ublk_cancel_cmd().
>
> In theory, io_uring should have the knowledge to cover it, but I guess it
> might be a bit hard.
>
> I will try to cook a ublk fix tomorrow for you to test.
>
> Thanks,
> Ming
>
That would be great. Much appreciated!
Regards,
Jared
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2025-04-22 13:47 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <IA1PR12MB645841796CB4C76F62F24522A9B22@IA1PR12MB6458.namprd12.prod.outlook.com>
[not found] ` <Z_5XdWPQa7cq1nDJ@fedora>
[not found] ` <d2179120-171b-47ba-b664-23242981ef19@nvidia.com>
2025-04-22 13:42 ` ublk: kernel crash when killing SPDK application Ming Lei
2025-04-22 13:47 ` Jared Holzman
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox