public inbox for [email protected]
 help / color / mirror / Atom feed
* [PATCH] io_uring: complete request via task work in case of DEFER_TASKRUN
@ 2023-04-14  7:53 ` Ming Lei
  2023-04-14 11:52   ` Kanchan Joshi
                     ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Ming Lei @ 2023-04-14  7:53 UTC (permalink / raw)
  To: Jens Axboe, io-uring; +Cc: Ming Lei, Kanchan Joshi

So far io_req_complete_post() only covers DEFER_TASKRUN by completing
request via task work when the request is completed from IOWQ.

However, uring command could be completed from any context, and if io
uring is setup with DEFER_TASKRUN, the command is required to be
completed from current context, otherwise wait on IORING_ENTER_GETEVENTS
can't be wakeup, and may hang forever.

The issue can be observed on removing ublk device, but turns out it is
one generic issue for uring command & DEFER_TASKRUN, so solve it in
io_uring core code.

Link: https://lore.kernel.org/linux-block/[email protected]/
Reported-by: Jens Axboe <[email protected]>
Cc: Kanchan Joshi <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
---
 io_uring/io_uring.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 9083a8466ebf..9f6f92ed60b2 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1012,7 +1012,7 @@ static void __io_req_complete_post(struct io_kiocb *req, unsigned issue_flags)
 
 void io_req_complete_post(struct io_kiocb *req, unsigned issue_flags)
 {
-	if (req->ctx->task_complete && (issue_flags & IO_URING_F_IOWQ)) {
+	if (req->ctx->task_complete && req->ctx->submitter_task != current) {
 		req->io_task_work.func = io_req_task_complete;
 		io_req_task_work_add(req);
 	} else if (!(issue_flags & IO_URING_F_UNLOCKED) ||
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] io_uring: complete request via task work in case of DEFER_TASKRUN
  2023-04-14  7:53 ` [PATCH] io_uring: complete request via task work in case of DEFER_TASKRUN Ming Lei
@ 2023-04-14 11:52   ` Kanchan Joshi
  2023-04-14 12:39   ` Jens Axboe
  2023-04-14 13:01   ` Pavel Begunkov
  2 siblings, 0 replies; 11+ messages in thread
From: Kanchan Joshi @ 2023-04-14 11:52 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, io-uring

[-- Attachment #1: Type: text/plain, Size: 1902 bytes --]

On Fri, Apr 14, 2023 at 03:53:13PM +0800, Ming Lei wrote:
>So far io_req_complete_post() only covers DEFER_TASKRUN by completing
>request via task work when the request is completed from IOWQ.
>
>However, uring command could be completed from any context, and if io
>uring is setup with DEFER_TASKRUN, the command is required to be
>completed from current context, otherwise wait on IORING_ENTER_GETEVENTS
>can't be wakeup, and may hang forever.
>
>The issue can be observed on removing ublk device, but turns out it is
>one generic issue for uring command & DEFER_TASKRUN, so solve it in
>io_uring core code.

Thanks for sharing, this has been fine for nvme-passthrough side though.
We usually test with DEFER_TASKRUN option, as both fio and t/io_uring
set the option.

>Link: https://lore.kernel.org/linux-block/[email protected]/
>Reported-by: Jens Axboe <[email protected]>
>Cc: Kanchan Joshi <[email protected]>
>Signed-off-by: Ming Lei <[email protected]>
>---
> io_uring/io_uring.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>index 9083a8466ebf..9f6f92ed60b2 100644
>--- a/io_uring/io_uring.c
>+++ b/io_uring/io_uring.c
>@@ -1012,7 +1012,7 @@ static void __io_req_complete_post(struct io_kiocb *req, unsigned issue_flags)
>
> void io_req_complete_post(struct io_kiocb *req, unsigned issue_flags)
> {
>-	if (req->ctx->task_complete && (issue_flags & IO_URING_F_IOWQ)) {
>+	if (req->ctx->task_complete && req->ctx->submitter_task != current) {
> 		req->io_task_work.func = io_req_task_complete;
> 		io_req_task_work_add(req);

In nvme-side, we always complete in task context, so this seems bit hard
to produce.
But this patch ensures that task-work is setup if it is needed, and
caller/driver did not get to set that explicitly. So looks fine to me.
FWIW, I do not see regression in nvme tests.

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] io_uring: complete request via task work in case of DEFER_TASKRUN
  2023-04-14  7:53 ` [PATCH] io_uring: complete request via task work in case of DEFER_TASKRUN Ming Lei
  2023-04-14 11:52   ` Kanchan Joshi
@ 2023-04-14 12:39   ` Jens Axboe
  2023-04-14 13:01   ` Pavel Begunkov
  2 siblings, 0 replies; 11+ messages in thread
From: Jens Axboe @ 2023-04-14 12:39 UTC (permalink / raw)
  To: io-uring, Ming Lei; +Cc: Kanchan Joshi


On Fri, 14 Apr 2023 15:53:13 +0800, Ming Lei wrote:
> So far io_req_complete_post() only covers DEFER_TASKRUN by completing
> request via task work when the request is completed from IOWQ.
> 
> However, uring command could be completed from any context, and if io
> uring is setup with DEFER_TASKRUN, the command is required to be
> completed from current context, otherwise wait on IORING_ENTER_GETEVENTS
> can't be wakeup, and may hang forever.
> 
> [...]

Applied, thanks!

[1/1] io_uring: complete request via task work in case of DEFER_TASKRUN
      commit: 860e1c7f8b0b43fbf91b4d689adfaa13adb89452

Best regards,
-- 
Jens Axboe




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] io_uring: complete request via task work in case of DEFER_TASKRUN
  2023-04-14  7:53 ` [PATCH] io_uring: complete request via task work in case of DEFER_TASKRUN Ming Lei
  2023-04-14 11:52   ` Kanchan Joshi
  2023-04-14 12:39   ` Jens Axboe
@ 2023-04-14 13:01   ` Pavel Begunkov
  2023-04-14 13:53     ` Ming Lei
  2 siblings, 1 reply; 11+ messages in thread
From: Pavel Begunkov @ 2023-04-14 13:01 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, io-uring; +Cc: Kanchan Joshi

On 4/14/23 08:53, Ming Lei wrote:
> So far io_req_complete_post() only covers DEFER_TASKRUN by completing
> request via task work when the request is completed from IOWQ.
> 
> However, uring command could be completed from any context, and if io
> uring is setup with DEFER_TASKRUN, the command is required to be
> completed from current context, otherwise wait on IORING_ENTER_GETEVENTS
> can't be wakeup, and may hang forever.

fwiw, there is one legit exception, when the task is half dead
task_work will be executed by a kthread. It should be fine as it
locks the ctx down, but I can't help but wonder whether it's only
ublk_cancel_queue() affected or there are more places in ublk?

One more thing, cmds should not be setting issue_flags but only
forwarding what the core io_uring code passed, it'll get tons of
bugs in no time otherwise.

static void ublk_cancel_queue(struct ublk_queue *ubq)
{
     ...
     io_uring_cmd_done(io->cmd, UBLK_IO_RES_ABORT, 0,
                       IO_URING_F_UNLOCKED);
}

Can we replace it with task_work? It should be cold, and I
assume ublk_cancel_queue() doesn't assume that all requests will
put down by the end of the function as io_uring_cmd_done()
can offload it in any case.

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] io_uring: complete request via task work in case of DEFER_TASKRUN
  2023-04-14 13:01   ` Pavel Begunkov
@ 2023-04-14 13:53     ` Ming Lei
  2023-04-14 14:13       ` Kanchan Joshi
  2023-04-14 15:07       ` Pavel Begunkov
  0 siblings, 2 replies; 11+ messages in thread
From: Ming Lei @ 2023-04-14 13:53 UTC (permalink / raw)
  To: Pavel Begunkov; +Cc: Jens Axboe, io-uring, Kanchan Joshi, ming.lei

On Fri, Apr 14, 2023 at 02:01:26PM +0100, Pavel Begunkov wrote:
> On 4/14/23 08:53, Ming Lei wrote:
> > So far io_req_complete_post() only covers DEFER_TASKRUN by completing
> > request via task work when the request is completed from IOWQ.
> > 
> > However, uring command could be completed from any context, and if io
> > uring is setup with DEFER_TASKRUN, the command is required to be
> > completed from current context, otherwise wait on IORING_ENTER_GETEVENTS
> > can't be wakeup, and may hang forever.
> 
> fwiw, there is one legit exception, when the task is half dead
> task_work will be executed by a kthread. It should be fine as it
> locks the ctx down, but I can't help but wonder whether it's only
> ublk_cancel_queue() affected or there are more places in ublk?

No, it isn't.

It isn't triggered on nvme-pt just because command is always done
in task context.

And we know more uring command cases are coming.

> 
> One more thing, cmds should not be setting issue_flags but only
> forwarding what the core io_uring code passed, it'll get tons of
> bugs in no time otherwise.

Here io_uring_cmd_done() is changed to this way recently, and it
could be another topic.

> 
> static void ublk_cancel_queue(struct ublk_queue *ubq)
> {
>     ...
>     io_uring_cmd_done(io->cmd, UBLK_IO_RES_ABORT, 0,
>                       IO_URING_F_UNLOCKED);
> }
> 
> Can we replace it with task_work? It should be cold, and I
> assume ublk_cancel_queue() doesn't assume that all requests will
> put down by the end of the function as io_uring_cmd_done()
> can offload it in any case.

But it isn't specific for ublk, any caller of io_uring_cmd_done()
has such issue since io_uring_cmd_done() is one generic API.


thanks,
Ming


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] io_uring: complete request via task work in case of DEFER_TASKRUN
  2023-04-14 13:53     ` Ming Lei
@ 2023-04-14 14:13       ` Kanchan Joshi
  2023-04-14 14:53         ` Ming Lei
  2023-04-14 15:07       ` Pavel Begunkov
  1 sibling, 1 reply; 11+ messages in thread
From: Kanchan Joshi @ 2023-04-14 14:13 UTC (permalink / raw)
  To: Ming Lei; +Cc: Pavel Begunkov, Jens Axboe, io-uring

[-- Attachment #1: Type: text/plain, Size: 1350 bytes --]

On Fri, Apr 14, 2023 at 09:53:15PM +0800, Ming Lei wrote:
>On Fri, Apr 14, 2023 at 02:01:26PM +0100, Pavel Begunkov wrote:
>> On 4/14/23 08:53, Ming Lei wrote:
>> > So far io_req_complete_post() only covers DEFER_TASKRUN by completing
>> > request via task work when the request is completed from IOWQ.
>> >
>> > However, uring command could be completed from any context, and if io
>> > uring is setup with DEFER_TASKRUN, the command is required to be
>> > completed from current context, otherwise wait on IORING_ENTER_GETEVENTS
>> > can't be wakeup, and may hang forever.
>>
>> fwiw, there is one legit exception, when the task is half dead
>> task_work will be executed by a kthread. It should be fine as it
>> locks the ctx down, but I can't help but wonder whether it's only
>> ublk_cancel_queue() affected or there are more places in ublk?
>
>No, it isn't.
>
>It isn't triggered on nvme-pt just because command is always done
>in task context.
>
>And we know more uring command cases are coming.

FWIW, the model I had in mind (in initial days) was this -
1) io_uring_cmd_done is a simple API, it just posts one/two results into
reuglar/big SQE
2) for anything complex completion (that requires task-work), it will
use another API io_uring_cmd_complete_in_task with the provider-specific
callback (that will call above simple API eventually)


[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] io_uring: complete request via task work in case of DEFER_TASKRUN
  2023-04-14 14:13       ` Kanchan Joshi
@ 2023-04-14 14:53         ` Ming Lei
  0 siblings, 0 replies; 11+ messages in thread
From: Ming Lei @ 2023-04-14 14:53 UTC (permalink / raw)
  To: Kanchan Joshi; +Cc: Pavel Begunkov, Jens Axboe, io-uring, ming.lei

On Fri, Apr 14, 2023 at 07:43:15PM +0530, Kanchan Joshi wrote:
> On Fri, Apr 14, 2023 at 09:53:15PM +0800, Ming Lei wrote:
> > On Fri, Apr 14, 2023 at 02:01:26PM +0100, Pavel Begunkov wrote:
> > > On 4/14/23 08:53, Ming Lei wrote:
> > > > So far io_req_complete_post() only covers DEFER_TASKRUN by completing
> > > > request via task work when the request is completed from IOWQ.
> > > >
> > > > However, uring command could be completed from any context, and if io
> > > > uring is setup with DEFER_TASKRUN, the command is required to be
> > > > completed from current context, otherwise wait on IORING_ENTER_GETEVENTS
> > > > can't be wakeup, and may hang forever.
> > > 
> > > fwiw, there is one legit exception, when the task is half dead
> > > task_work will be executed by a kthread. It should be fine as it
> > > locks the ctx down, but I can't help but wonder whether it's only
> > > ublk_cancel_queue() affected or there are more places in ublk?
> > 
> > No, it isn't.
> > 
> > It isn't triggered on nvme-pt just because command is always done
> > in task context.
> > 
> > And we know more uring command cases are coming.
> 
> FWIW, the model I had in mind (in initial days) was this -
> 1) io_uring_cmd_done is a simple API, it just posts one/two results into
> reuglar/big SQE
> 2) for anything complex completion (that requires task-work), it will
> use another API io_uring_cmd_complete_in_task with the provider-specific
> callback (that will call above simple API eventually)
 
IMO, the current two APIs type are fine, from ublk viewpoint at least.

io_uring setup/setting is transparent/invisible to driver, and it is reasonable
for the two interfaces to hide any io_uring implementation details.

Meantime driver should be free to choose either of the two.


Thanks, 
Ming


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] io_uring: complete request via task work in case of DEFER_TASKRUN
  2023-04-14 13:53     ` Ming Lei
  2023-04-14 14:13       ` Kanchan Joshi
@ 2023-04-14 15:07       ` Pavel Begunkov
  2023-04-14 15:42         ` Ming Lei
  1 sibling, 1 reply; 11+ messages in thread
From: Pavel Begunkov @ 2023-04-14 15:07 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, io-uring, Kanchan Joshi

On 4/14/23 14:53, Ming Lei wrote:
> On Fri, Apr 14, 2023 at 02:01:26PM +0100, Pavel Begunkov wrote:
>> On 4/14/23 08:53, Ming Lei wrote:
>>> So far io_req_complete_post() only covers DEFER_TASKRUN by completing
>>> request via task work when the request is completed from IOWQ.
>>>
>>> However, uring command could be completed from any context, and if io
>>> uring is setup with DEFER_TASKRUN, the command is required to be
>>> completed from current context, otherwise wait on IORING_ENTER_GETEVENTS
>>> can't be wakeup, and may hang forever.
>>
>> fwiw, there is one legit exception, when the task is half dead
>> task_work will be executed by a kthread. It should be fine as it
>> locks the ctx down, but I can't help but wonder whether it's only
>> ublk_cancel_queue() affected or there are more places in ublk?
> 
> No, it isn't.
> 
> It isn't triggered on nvme-pt just because command is always done
> in task context.
> 
> And we know more uring command cases are coming.

Because all requests and cmds but ublk complete it from another
task, ublk is special in this regard.

I have several more not so related questions:

1) Can requests be submitted by some other task than ->ubq_daemon?
Looking at

static int ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
{
     ...
     if (ubq->ubq_daemon && ubq->ubq_daemon != current)
        goto out;
}

ublk_queue_cmd() avoiding io_uring way of delivery and using
raw task_work doesn't seem great. Especially with TWA_SIGNAL_NO_IPI.

2) What the purpose of the two lines below? I see how
UBLK_F_URING_CMD_COMP_IN_TASK is used, but don't understand
why it changes depending on whether it's a module or not.

3) The long comment in ublk_queue_cmd() seems quite scary.
If you have a cmd / io_uring request it hold a ctx reference
and is always allowed to use io_uring's task_work infra like
io_uring_cmd_complete_in_task(). Why it's different for ublk?

>>
>> One more thing, cmds should not be setting issue_flags but only
>> forwarding what the core io_uring code passed, it'll get tons of
>> bugs in no time otherwise.
> 
> Here io_uring_cmd_done() is changed to this way recently, and it
> could be another topic.

And it's abused, but as you said, not particularly related
to this patch.


>> static void ublk_cancel_queue(struct ublk_queue *ubq)
>> {
>>      ...
>>      io_uring_cmd_done(io->cmd, UBLK_IO_RES_ABORT, 0,
>>                        IO_URING_F_UNLOCKED);
>> }
>>
>> Can we replace it with task_work? It should be cold, and I
>> assume ublk_cancel_queue() doesn't assume that all requests will
>> put down by the end of the function as io_uring_cmd_done()
>> can offload it in any case.
> 
> But it isn't specific for ublk, any caller of io_uring_cmd_done()
> has such issue since io_uring_cmd_done() is one generic API.

Well, fair enough, considering that IO_URING_F_UNLOCKED was
just added (*still naively hoping it'll be clean up*)

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] io_uring: complete request via task work in case of DEFER_TASKRUN
  2023-04-14 15:07       ` Pavel Begunkov
@ 2023-04-14 15:42         ` Ming Lei
  2023-04-15 23:15           ` Pavel Begunkov
  0 siblings, 1 reply; 11+ messages in thread
From: Ming Lei @ 2023-04-14 15:42 UTC (permalink / raw)
  To: Pavel Begunkov; +Cc: Jens Axboe, io-uring, Kanchan Joshi, ming.lei

On Fri, Apr 14, 2023 at 04:07:52PM +0100, Pavel Begunkov wrote:
> On 4/14/23 14:53, Ming Lei wrote:
> > On Fri, Apr 14, 2023 at 02:01:26PM +0100, Pavel Begunkov wrote:
> > > On 4/14/23 08:53, Ming Lei wrote:
> > > > So far io_req_complete_post() only covers DEFER_TASKRUN by completing
> > > > request via task work when the request is completed from IOWQ.
> > > > 
> > > > However, uring command could be completed from any context, and if io
> > > > uring is setup with DEFER_TASKRUN, the command is required to be
> > > > completed from current context, otherwise wait on IORING_ENTER_GETEVENTS
> > > > can't be wakeup, and may hang forever.
> > > 
> > > fwiw, there is one legit exception, when the task is half dead
> > > task_work will be executed by a kthread. It should be fine as it
> > > locks the ctx down, but I can't help but wonder whether it's only
> > > ublk_cancel_queue() affected or there are more places in ublk?
> > 
> > No, it isn't.
> > 
> > It isn't triggered on nvme-pt just because command is always done
> > in task context.
> > 
> > And we know more uring command cases are coming.
> 
> Because all requests and cmds but ublk complete it from another
> task, ublk is special in this regard.

Not sure it is true, cause it is allowed to call io_uring_cmd_done from other
task technically. And it could be more friendly for driver to not limit
its caller in the task context. Especially we have another API of
io_uring_cmd_complete_in_task().

> 
> I have several more not so related questions:
> 
> 1) Can requests be submitted by some other task than ->ubq_daemon?

Yeah, requests can be submitted by other task, but ublk driver doesn't
allow it because ublk driver has not knowledge when the io_uring context
goes away, so has to limit requests submitted from ->ubq_daemon only,
then use this task's information for checking if the io_uring context
is going to exit. When the io_uring context is dying, we need to
abort these uring commands(may never complete), see ublk_cancel_queue().

The only difference is that the uring command may never complete,
because one uring cmd is only completed when the associated block request
is coming. The situation could be improved by adding API/callback for
notifying io_uring exit.


> Looking at
> 
> static int ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
> {
>     ...
>     if (ubq->ubq_daemon && ubq->ubq_daemon != current)
>        goto out;
> }
> 
> ublk_queue_cmd() avoiding io_uring way of delivery and using
> raw task_work doesn't seem great. Especially with TWA_SIGNAL_NO_IPI.

Yeah, it has been in my todo list to kill task work. In ublk early time,
task work just performs better than io_uring_cmd_complete_in_task(), but
the gap becomes pretty small or even not visible now. 

> 
> 2) What the purpose of the two lines below? I see how
> UBLK_F_URING_CMD_COMP_IN_TASK is used, but don't understand
> why it changes depending on whether it's a module or not.

task work isn't available in case of building ublk as module.

> 
> 3) The long comment in ublk_queue_cmd() seems quite scary.
> If you have a cmd / io_uring request it hold a ctx reference
> and is always allowed to use io_uring's task_work infra like
> io_uring_cmd_complete_in_task(). Why it's different for ublk?

The thing is that we don't know if there is io_uring request for the
coming blk request. UBLK_IO_FLAG_ABORTED just means that the io_uring
context is dead, and we can't use io_uring_cmd_complete_in_task() any
more.

> 
> > > 
> > > One more thing, cmds should not be setting issue_flags but only
> > > forwarding what the core io_uring code passed, it'll get tons of
> > > bugs in no time otherwise.
> > 
> > Here io_uring_cmd_done() is changed to this way recently, and it
> > could be another topic.
> 
> And it's abused, but as you said, not particularly related
> to this patch.
> 
> 
> > > static void ublk_cancel_queue(struct ublk_queue *ubq)
> > > {
> > >      ...
> > >      io_uring_cmd_done(io->cmd, UBLK_IO_RES_ABORT, 0,
> > >                        IO_URING_F_UNLOCKED);
> > > }
> > > 
> > > Can we replace it with task_work? It should be cold, and I
> > > assume ublk_cancel_queue() doesn't assume that all requests will
> > > put down by the end of the function as io_uring_cmd_done()
> > > can offload it in any case.
> > 
> > But it isn't specific for ublk, any caller of io_uring_cmd_done()
> > has such issue since io_uring_cmd_done() is one generic API.
> 
> Well, fair enough, considering that IO_URING_F_UNLOCKED was
> just added (*still naively hoping it'll be clean up*)

IMO, it is reasonable for io_uring_cmd_done to hide any io_uring
implementation details, even the task context concept, but not
sure if it is doable.


Thanks,
Ming


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] io_uring: complete request via task work in case of DEFER_TASKRUN
  2023-04-14 15:42         ` Ming Lei
@ 2023-04-15 23:15           ` Pavel Begunkov
  2023-04-16 10:05             ` Ming Lei
  0 siblings, 1 reply; 11+ messages in thread
From: Pavel Begunkov @ 2023-04-15 23:15 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, io-uring, Kanchan Joshi

On 4/14/23 16:42, Ming Lei wrote:
> On Fri, Apr 14, 2023 at 04:07:52PM +0100, Pavel Begunkov wrote:
>> On 4/14/23 14:53, Ming Lei wrote:
>>> On Fri, Apr 14, 2023 at 02:01:26PM +0100, Pavel Begunkov wrote:
>>>> On 4/14/23 08:53, Ming Lei wrote:
>>>>> So far io_req_complete_post() only covers DEFER_TASKRUN by completing
>>>>> request via task work when the request is completed from IOWQ.
>>>>>
>>>>> However, uring command could be completed from any context, and if io
>>>>> uring is setup with DEFER_TASKRUN, the command is required to be
>>>>> completed from current context, otherwise wait on IORING_ENTER_GETEVENTS
>>>>> can't be wakeup, and may hang forever.
>>>>
>>>> fwiw, there is one legit exception, when the task is half dead
>>>> task_work will be executed by a kthread. It should be fine as it
>>>> locks the ctx down, but I can't help but wonder whether it's only
>>>> ublk_cancel_queue() affected or there are more places in ublk?
>>>
>>> No, it isn't.
>>>
>>> It isn't triggered on nvme-pt just because command is always done
>>> in task context.
>>>
>>> And we know more uring command cases are coming.
>>
>> Because all requests and cmds but ublk complete it from another
>> task, ublk is special in this regard.
> 
> Not sure it is true, cause it is allowed to call io_uring_cmd_done from other
> task technically. And it could be more friendly for driver to not limit
> its caller in the task context. Especially we have another API of
> io_uring_cmd_complete_in_task().

I agree that the cmd io_uring API can do better.


>> I have several more not so related questions:
>>
>> 1) Can requests be submitted by some other task than ->ubq_daemon?
> 
> Yeah, requests can be submitted by other task, but ublk driver doesn't
> allow it because ublk driver has not knowledge when the io_uring context
> goes away, so has to limit requests submitted from ->ubq_daemon only,
> then use this task's information for checking if the io_uring context
> is going to exit. When the io_uring context is dying, we need to
> abort these uring commands(may never complete), see ublk_cancel_queue().
> 
> The only difference is that the uring command may never complete,
> because one uring cmd is only completed when the associated block request
> is coming. The situation could be improved by adding API/callback for
> notifying io_uring exit.

Got it. And it sounds like you can use IORING_SETUP_SINGLE_ISSUER
and possibly IORING_SETUP_DEFER_TASKRUN, if not already.


>> Looking at
>>
>> static int ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
>> {
>>      ...
>>      if (ubq->ubq_daemon && ubq->ubq_daemon != current)
>>         goto out;
>> }
>>
>> ublk_queue_cmd() avoiding io_uring way of delivery and using
>> raw task_work doesn't seem great. Especially with TWA_SIGNAL_NO_IPI.
> 
> Yeah, it has been in my todo list to kill task work. In ublk early time,

I see

> task work just performs better than io_uring_cmd_complete_in_task(), but
> the gap becomes pretty small or even not visible now.

It seems a bit strange, non DEFER_TASKRUN tw is almost identical to what
you do, see __io_req_task_work_add(). Maybe it's extra callbacks on the
execution side.

Did you try DEFER_TASKRUN? Not sure it suits your case as there are
limitations, but the queueing side of it, as well as execution and
waiting are well optimised and should do better.


>> 2) What the purpose of the two lines below? I see how
>> UBLK_F_URING_CMD_COMP_IN_TASK is used, but don't understand
>> why it changes depending on whether it's a module or not.
> 
> task work isn't available in case of building ublk as module.

Ah, makes sense now, thanks

>> 3) The long comment in ublk_queue_cmd() seems quite scary.
>> If you have a cmd / io_uring request it hold a ctx reference
>> and is always allowed to use io_uring's task_work infra like
>> io_uring_cmd_complete_in_task(). Why it's different for ublk?
> 
> The thing is that we don't know if there is io_uring request for the
> coming blk request. UBLK_IO_FLAG_ABORTED just means that the io_uring
> context is dead, and we can't use io_uring_cmd_complete_in_task() any
> more.

Roughly got it, IIUC, there might not be a (valid) io_uring
request backing this block request in the first place because of
this aborting thing.


>>>> One more thing, cmds should not be setting issue_flags but only
>>>> forwarding what the core io_uring code passed, it'll get tons of
>>>> bugs in no time otherwise.
>>>
>>> Here io_uring_cmd_done() is changed to this way recently, and it
>>> could be another topic.
>>
>> And it's abused, but as you said, not particularly related
>> to this patch.
>>
>>
>>>> static void ublk_cancel_queue(struct ublk_queue *ubq)
>>>> {
>>>>       ...
>>>>       io_uring_cmd_done(io->cmd, UBLK_IO_RES_ABORT, 0,
>>>>                         IO_URING_F_UNLOCKED);
>>>> }
>>>>
>>>> Can we replace it with task_work? It should be cold, and I
>>>> assume ublk_cancel_queue() doesn't assume that all requests will
>>>> put down by the end of the function as io_uring_cmd_done()
>>>> can offload it in any case.
>>>
>>> But it isn't specific for ublk, any caller of io_uring_cmd_done()
>>> has such issue since io_uring_cmd_done() is one generic API.
>>
>> Well, fair enough, considering that IO_URING_F_UNLOCKED was
>> just added (*still naively hoping it'll be clean up*)
> 
> IMO, it is reasonable for io_uring_cmd_done to hide any io_uring
> implementation details, even the task context concept, but not
> sure if it is doable.

I agree that there should be a function doing the right thing
without extra flags, i.e. completing via tw, and there should
also be a helper for more advanced performant cases like when
we know the context.

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] io_uring: complete request via task work in case of DEFER_TASKRUN
  2023-04-15 23:15           ` Pavel Begunkov
@ 2023-04-16 10:05             ` Ming Lei
  0 siblings, 0 replies; 11+ messages in thread
From: Ming Lei @ 2023-04-16 10:05 UTC (permalink / raw)
  To: Pavel Begunkov; +Cc: Jens Axboe, io-uring, Kanchan Joshi, ming.lei

On Sun, Apr 16, 2023 at 12:15:20AM +0100, Pavel Begunkov wrote:
> On 4/14/23 16:42, Ming Lei wrote:
> > On Fri, Apr 14, 2023 at 04:07:52PM +0100, Pavel Begunkov wrote:
> > > On 4/14/23 14:53, Ming Lei wrote:
> > > > On Fri, Apr 14, 2023 at 02:01:26PM +0100, Pavel Begunkov wrote:
> > > > > On 4/14/23 08:53, Ming Lei wrote:
> > > > > > So far io_req_complete_post() only covers DEFER_TASKRUN by completing
> > > > > > request via task work when the request is completed from IOWQ.
> > > > > > 
> > > > > > However, uring command could be completed from any context, and if io
> > > > > > uring is setup with DEFER_TASKRUN, the command is required to be
> > > > > > completed from current context, otherwise wait on IORING_ENTER_GETEVENTS
> > > > > > can't be wakeup, and may hang forever.
> > > > > 
> > > > > fwiw, there is one legit exception, when the task is half dead
> > > > > task_work will be executed by a kthread. It should be fine as it
> > > > > locks the ctx down, but I can't help but wonder whether it's only
> > > > > ublk_cancel_queue() affected or there are more places in ublk?
> > > > 
> > > > No, it isn't.
> > > > 
> > > > It isn't triggered on nvme-pt just because command is always done
> > > > in task context.
> > > > 
> > > > And we know more uring command cases are coming.
> > > 
> > > Because all requests and cmds but ublk complete it from another
> > > task, ublk is special in this regard.
> > 
> > Not sure it is true, cause it is allowed to call io_uring_cmd_done from other
> > task technically. And it could be more friendly for driver to not limit
> > its caller in the task context. Especially we have another API of
> > io_uring_cmd_complete_in_task().
> 
> I agree that the cmd io_uring API can do better.
> 
> 
> > > I have several more not so related questions:
> > > 
> > > 1) Can requests be submitted by some other task than ->ubq_daemon?
> > 
> > Yeah, requests can be submitted by other task, but ublk driver doesn't
> > allow it because ublk driver has not knowledge when the io_uring context
> > goes away, so has to limit requests submitted from ->ubq_daemon only,
> > then use this task's information for checking if the io_uring context
> > is going to exit. When the io_uring context is dying, we need to
> > abort these uring commands(may never complete), see ublk_cancel_queue().
> > 
> > The only difference is that the uring command may never complete,
> > because one uring cmd is only completed when the associated block request
> > is coming. The situation could be improved by adding API/callback for
> > notifying io_uring exit.
> 
> Got it. And it sounds like you can use IORING_SETUP_SINGLE_ISSUER
> and possibly IORING_SETUP_DEFER_TASKRUN, if not already.

ublk driver is simple, but the userspace ublk server can be quite
complicated and need flexible setting, and we shouldn't put any limit
on userspace in theory.

> 
> 
> > > Looking at
> > > 
> > > static int ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
> > > {
> > >      ...
> > >      if (ubq->ubq_daemon && ubq->ubq_daemon != current)
> > >         goto out;
> > > }
> > > 
> > > ublk_queue_cmd() avoiding io_uring way of delivery and using
> > > raw task_work doesn't seem great. Especially with TWA_SIGNAL_NO_IPI.
> > 
> > Yeah, it has been in my todo list to kill task work. In ublk early time,
> 
> I see
> 
> > task work just performs better than io_uring_cmd_complete_in_task(), but
> > the gap becomes pretty small or even not visible now.
> 
> It seems a bit strange, non DEFER_TASKRUN tw is almost identical to what
> you do, see __io_req_task_work_add(). Maybe it's extra callbacks on the
> execution side.
> 
> Did you try DEFER_TASKRUN? Not sure it suits your case as there are
> limitations, but the queueing side of it, as well as execution and
> waiting are well optimised and should do better.

I tried DEFER_TASKRUN which need this fix, not see obvious IOPS boost
against IORING_SETUP_COOP_TASKRUN, which does make big difference.

> 
> 
> > > 2) What the purpose of the two lines below? I see how
> > > UBLK_F_URING_CMD_COMP_IN_TASK is used, but don't understand
> > > why it changes depending on whether it's a module or not.
> > 
> > task work isn't available in case of building ublk as module.
> 
> Ah, makes sense now, thanks
> 
> > > 3) The long comment in ublk_queue_cmd() seems quite scary.
> > > If you have a cmd / io_uring request it hold a ctx reference
> > > and is always allowed to use io_uring's task_work infra like
> > > io_uring_cmd_complete_in_task(). Why it's different for ublk?
> > 
> > The thing is that we don't know if there is io_uring request for the
> > coming blk request. UBLK_IO_FLAG_ABORTED just means that the io_uring
> > context is dead, and we can't use io_uring_cmd_complete_in_task() any
> > more.
> 
> Roughly got it, IIUC, there might not be a (valid) io_uring
> request backing this block request in the first place because of
> this aborting thing.

I am working on adding notifier cb in io_uring_try_cancel_requests(),
and looks it works. With this way, ublk server implementation can become
quite flexible and aborting becomes simpler, such as, not need limit of
single per-queue submitter any more, and I remember that spdk guys did
complain this kind of limit.


Thanks,
Ming


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2023-04-16 10:06 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CGME20230414075422epcas5p3ae5de53e643a448f19df82a7a1d5cd1c@epcas5p3.samsung.com>
2023-04-14  7:53 ` [PATCH] io_uring: complete request via task work in case of DEFER_TASKRUN Ming Lei
2023-04-14 11:52   ` Kanchan Joshi
2023-04-14 12:39   ` Jens Axboe
2023-04-14 13:01   ` Pavel Begunkov
2023-04-14 13:53     ` Ming Lei
2023-04-14 14:13       ` Kanchan Joshi
2023-04-14 14:53         ` Ming Lei
2023-04-14 15:07       ` Pavel Begunkov
2023-04-14 15:42         ` Ming Lei
2023-04-15 23:15           ` Pavel Begunkov
2023-04-16 10:05             ` Ming Lei

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox