From: Ming Lei <[email protected]>
To: Pavel Begunkov <[email protected]>
Cc: Jens Axboe <[email protected]>,
	[email protected], Kanchan Joshi <[email protected]>,
	[email protected]
Subject: Re: [PATCH] io_uring: complete request via task work in case of DEFER_TASKRUN
Date: Sun, 16 Apr 2023 18:05:39 +0800
Message-ID: <ZDvIc4uDT9F2Mej/@ovpn-8-16.pek2.redhat.com>
In-Reply-To: <[email protected]>

On Sun, Apr 16, 2023 at 12:15:20AM +0100, Pavel Begunkov wrote:
> On 4/14/23 16:42, Ming Lei wrote:
> > On Fri, Apr 14, 2023 at 04:07:52PM +0100, Pavel Begunkov wrote:
> > > On 4/14/23 14:53, Ming Lei wrote:
> > > > On Fri, Apr 14, 2023 at 02:01:26PM +0100, Pavel Begunkov wrote:
> > > > > On 4/14/23 08:53, Ming Lei wrote:
> > > > > > > So far io_req_complete_post() only covers DEFER_TASKRUN by completing
> > > > > > > the request via task work when the request is completed from IOWQ.
> > > > > > > 
> > > > > > > However, a uring command could be completed from any context, and if
> > > > > > > io_uring is set up with DEFER_TASKRUN, the command is required to be
> > > > > > > completed from the current context; otherwise a wait on
> > > > > > > IORING_ENTER_GETEVENTS can't be woken up and may hang forever.
> > > > > 
> > > > > fwiw, there is one legit exception, when the task is half dead
> > > > > task_work will be executed by a kthread. It should be fine as it
> > > > > locks the ctx down, but I can't help but wonder whether it's only
> > > > > ublk_cancel_queue() affected or there are more places in ublk?
> > > > 
> > > > No, it isn't.
> > > > 
> > > > It isn't triggered on nvme-pt just because command is always done
> > > > in task context.
> > > > 
> > > > And we know more uring command cases are coming.
> > > 
> > > Because all requests and cmds but ublk complete it from another
> > > task, ublk is special in this regard.
> > 
> > Not sure it is true, because it is technically allowed to call
> > io_uring_cmd_done() from another task. And it would be friendlier for
> > drivers not to restrict the caller to the task context, especially since
> > we have another API, io_uring_cmd_complete_in_task().
> 
> I agree that the cmd io_uring API can do better.
> 
> 
> > > I have several more not so related questions:
> > > 
> > > 1) Can requests be submitted by some other task than ->ubq_daemon?
> > 
> > Yeah, requests can be submitted by another task, but the ublk driver
> > doesn't allow it because it has no knowledge of when the io_uring context
> > goes away, so it has to limit requests to ->ubq_daemon only, and then
> > use this task's information to check whether the io_uring context
> > is going to exit. When the io_uring context is dying, we need to
> > abort these uring commands (which may never complete), see
> > ublk_cancel_queue().
> > 
> > The only difference is that the uring command may never complete,
> > because a uring cmd is only completed when the associated block request
> > arrives. The situation could be improved by adding an API/callback for
> > notifying about io_uring exit.
> 
> Got it. And it sounds like you can use IORING_SETUP_SINGLE_ISSUER
> and possibly IORING_SETUP_DEFER_TASKRUN, if not already.

The ublk driver is simple, but the userspace ublk server can be quite
complicated and needs flexible settings, so in theory we shouldn't put
any limits on userspace.

> 
> 
> > > Looking at
> > > 
> > > static int ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
> > > {
> > >      ...
> > >      if (ubq->ubq_daemon && ubq->ubq_daemon != current)
> > >         goto out;
> > > }
> > > 
> > > ublk_queue_cmd() avoiding io_uring way of delivery and using
> > > raw task_work doesn't seem great. Especially with TWA_SIGNAL_NO_IPI.
> > 
> > Yeah, it has been on my todo list to kill task work. In ublk's early days,
> 
> I see
> 
> > task work just performed better than io_uring_cmd_complete_in_task(), but
> > the gap has become pretty small or even invisible now.
> 
> It seems a bit strange, non DEFER_TASKRUN tw is almost identical to what
> you do, see __io_req_task_work_add(). Maybe it's extra callbacks on the
> execution side.
> 
> Did you try DEFER_TASKRUN? Not sure it suits your case as there are
> limitations, but the queueing side of it, as well as execution and
> waiting are well optimised and should do better.

I tried DEFER_TASKRUN, which needs this fix, and didn't see an obvious IOPS
boost over IORING_SETUP_COOP_TASKRUN, which does make a big difference.
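
For reference, a minimal sketch of how a ublk server might pick its ring
setup flags for the two modes compared above. The enum taskrun_mode and both
helper functions are hypothetical names used only for illustration. The flag
values are copied from the UAPI header <linux/io_uring.h> and defined locally
so the sketch builds without it. The validity check encodes only the
documented rule that IORING_SETUP_DEFER_TASKRUN requires
IORING_SETUP_SINGLE_ISSUER, and nothing else the kernel may check:

```c
#include <assert.h>
#include <stdbool.h>

/* Values copied from the Linux UAPI header <linux/io_uring.h>; defined
 * locally (assumption: header may be absent or too old) so this builds
 * anywhere. */
#ifndef IORING_SETUP_COOP_TASKRUN
#define IORING_SETUP_COOP_TASKRUN	(1U << 8)
#endif
#ifndef IORING_SETUP_SINGLE_ISSUER
#define IORING_SETUP_SINGLE_ISSUER	(1U << 12)
#endif
#ifndef IORING_SETUP_DEFER_TASKRUN
#define IORING_SETUP_DEFER_TASKRUN	(1U << 13)
#endif

/* Hypothetical knob for a ublk-server-like program; not part of any API. */
enum taskrun_mode {
	TASKRUN_DEFAULT,	/* no task-run related flags */
	TASKRUN_COOP,		/* IORING_SETUP_COOP_TASKRUN */
	TASKRUN_DEFER,		/* IORING_SETUP_DEFER_TASKRUN */
};

/* DEFER_TASKRUN is only accepted together with SINGLE_ISSUER. */
static bool setup_flags_valid(unsigned int flags)
{
	if ((flags & IORING_SETUP_DEFER_TASKRUN) &&
	    !(flags & IORING_SETUP_SINGLE_ISSUER))
		return false;
	return true;
}

/* Map the hypothetical mode to io_uring setup flags. */
static unsigned int setup_flags_for(enum taskrun_mode mode)
{
	unsigned int flags = 0;

	switch (mode) {
	case TASKRUN_COOP:
		flags |= IORING_SETUP_COOP_TASKRUN;
		break;
	case TASKRUN_DEFER:
		/* Implies a single submitting task, as ublk has today. */
		flags |= IORING_SETUP_SINGLE_ISSUER |
			 IORING_SETUP_DEFER_TASKRUN;
		break;
	case TASKRUN_DEFAULT:
		break;
	}

	assert(setup_flags_valid(flags));
	return flags;
}
```

The result would be passed as the flags argument of liburing's
io_uring_queue_init(); older kernels may still reject flags they don't know.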

> 
> 
> > > 2) What is the purpose of the two lines below? I see how
> > > UBLK_F_URING_CMD_COMP_IN_TASK is used, but don't understand
> > > why it changes depending on whether it's a module or not.
> > 
> > task work isn't available when ublk is built as a module.
> 
> Ah, makes sense now, thanks
> 
> > > 3) The long comment in ublk_queue_cmd() seems quite scary.
> > > If you have a cmd / io_uring request, it holds a ctx reference
> > > and is always allowed to use io_uring's task_work infra like
> > > io_uring_cmd_complete_in_task(). Why is it different for ublk?
> > 
> > The thing is that we don't know if there is an io_uring request for the
> > incoming blk request. UBLK_IO_FLAG_ABORTED just means that the io_uring
> > context is dead, and we can't use io_uring_cmd_complete_in_task() any
> > more.
> 
> Roughly got it, IIUC, there might not be a (valid) io_uring
> request backing this block request in the first place because of
> this aborting thing.

I am working on adding a notifier callback in io_uring_try_cancel_requests(),
and it looks like it works. This way, the ublk server implementation can
become quite flexible and aborting becomes simpler; for example, the limit of
a single per-queue submitter is no longer needed, and I remember that the
SPDK guys did complain about this kind of limit.


Thanks,
Ming

