public inbox for io-uring@vger.kernel.org
From: Ming Lei <ming.lei@redhat.com>
To: Jared Holzman <jholzman@nvidia.com>
Cc: Guy Eisenberg <geisenberg@nvidia.com>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"axboe@kernel.dk" <axboe@kernel.dk>, Yoav Cohen <yoav@nvidia.com>,
	Omri Levi <omril@nvidia.com>, Ofer Oshri <ofer@nvidia.com>,
	io-uring@vger.kernel.org
Subject: Re: ublk: kernel crash when killing SPDK application
Date: Tue, 22 Apr 2025 21:42:04 +0800	[thread overview]
Message-ID: <aAecrLIivK5ioeOk@fedora> (raw)
In-Reply-To: <d2179120-171b-47ba-b664-23242981ef19@nvidia.com>

On Tue, Apr 22, 2025 at 02:43:06PM +0300, Jared Holzman wrote:
> On 15/04/2025 15:56, Ming Lei wrote:
> > On Tue, Apr 15, 2025 at 10:58:37AM +0000, Guy Eisenberg wrote:

...

> 
> Hi Ming,
> 
> Unfortunately your patch did not solve the issue; it is still happening (6.14 kernel).
> 
> I believe the issue is that ublk_cancel_cmd() calls io_uring_cmd_done() on a uring_cmd that is currently scheduled as task work by io_uring_cmd_complete_in_task().
> 
> I reproduced it with the patch below and saw the warning I added shortly before the crash. The dmesg log is attached.
> 
> I'm not sure how to solve the issue, though, short of waiting in ublk_cancel_cmd() for the task work to complete. I can't see any way to cancel the task work.
> 
> Would appreciate your assistance,
> 
> Regards,
> 
> Jared
> 
> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> index ca9a67b5b537..d9f544206b36 100644
> --- a/drivers/block/ublk_drv.c
> +++ b/drivers/block/ublk_drv.c
> @@ -72,6 +72,10 @@
>  	(UBLK_PARAM_TYPE_BASIC | UBLK_PARAM_TYPE_DISCARD | \
>  	 UBLK_PARAM_TYPE_DEVT | UBLK_PARAM_TYPE_ZONED)
>  
> +#ifndef IORING_URING_CMD_TW_SCHED
> +#define IORING_URING_CMD_TW_SCHED (1U << 31)
> +#endif
> +
>  struct ublk_rq_data {
>  	struct llist_node node;
>  
> @@ -1236,6 +1240,7 @@ static void ublk_rq_task_work_cb(struct io_uring_cmd *cmd, unsigned issue_flags)
>  	struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd);
>  	struct ublk_queue *ubq = pdu->ubq;
>  
> +	cmd->flags &= ~IORING_URING_CMD_TW_SCHED;
>  	ublk_forward_io_cmds(ubq, issue_flags);
>  }
>  
> @@ -1245,7 +1250,7 @@ static void ublk_queue_cmd(struct ublk_queue *ubq, struct request *rq)
>  
>  	if (llist_add(&data->node, &ubq->io_cmds)) {
>  		struct ublk_io *io = &ubq->ios[rq->tag];
> -
> +		io->cmd->flags |= IORING_URING_CMD_TW_SCHED;
>  		io_uring_cmd_complete_in_task(io->cmd, ublk_rq_task_work_cb);
>  	}
>  }
> @@ -1498,8 +1503,10 @@ static void ublk_cancel_cmd(struct ublk_queue *ubq, struct ublk_io *io,
>  		io->flags |= UBLK_IO_FLAG_CANCELED;
>  	spin_unlock(&ubq->cancel_lock);
>  
> -	if (!done)
> +	if (!done) {
> +		WARN_ON_ONCE(io->cmd->flags & IORING_URING_CMD_TW_SCHED);
>  		io_uring_cmd_done(io->cmd, UBLK_IO_RES_ABORT, 0, issue_flags);
> +	}
>  }
>  
>  /*
> @@ -1925,6 +1932,7 @@ static inline int ublk_ch_uring_cmd_local(struct io_uring_cmd *cmd,
>  static void ublk_ch_uring_cmd_cb(struct io_uring_cmd *cmd,
>  		unsigned int issue_flags)
>  {
> +	cmd->flags &= ~IORING_URING_CMD_TW_SCHED;
>  	ublk_ch_uring_cmd_local(cmd, issue_flags);
>  }
>  
> @@ -1937,6 +1945,7 @@ static int ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
>  
>  	/* well-implemented server won't run into unlocked */
>  	if (unlikely(issue_flags & IO_URING_F_UNLOCKED)) {
> +		cmd->flags |= IORING_URING_CMD_TW_SCHED;
>  		io_uring_cmd_complete_in_task(cmd, ublk_ch_uring_cmd_cb);
>  		return -EIOCBQUEUED;
>  	}
> diff --git a/include/linux/io_uring/cmd.h b/include/linux/io_uring/cmd.h
> index abd0c8bd950b..3ac2ef7bd99a 100644
> --- a/include/linux/io_uring/cmd.h
> +++ b/include/linux/io_uring/cmd.h
> @@ -7,6 +7,7 @@
>  
>  /* only top 8 bits of sqe->uring_cmd_flags for kernel internal use */
>  #define IORING_URING_CMD_CANCELABLE	(1U << 30)
> +#define IORING_URING_CMD_TW_SCHED	(1U << 31)
>  
>  struct io_uring_cmd {
>  	struct file	*file;
> 

Nice debug patch!

Your patch and the dmesg log have shown the race between io_uring_cmd_complete_in_task()
and io_uring_cmd_done() <- ublk_cancel_cmd().

In theory, io_uring should have enough knowledge to cover this race itself,
but I guess that might be a bit hard.

I will try to cook a ublk fix tomorrow for you to test.

Thanks,
Ming



Thread overview: 2+ messages
     [not found] <IA1PR12MB645841796CB4C76F62F24522A9B22@IA1PR12MB6458.namprd12.prod.outlook.com>
     [not found] ` <Z_5XdWPQa7cq1nDJ@fedora>
     [not found]   ` <d2179120-171b-47ba-b664-23242981ef19@nvidia.com>
2025-04-22 13:42     ` Ming Lei [this message]
2025-04-22 13:47       ` ublk: kernel crash when killing SPDK application Jared Holzman
