* [PATCH v2 01/14] io_uring/cmd: kill one issue_flags to tw conversion
2024-03-18 0:41 [PATCH v2 00/14] remove aux CQE caches Pavel Begunkov
@ 2024-03-18 0:41 ` Pavel Begunkov
2024-03-18 0:41 ` [PATCH v2 02/14] io_uring/cmd: fix tw <-> issue_flags conversion Pavel Begunkov
` (12 subsequent siblings)
13 siblings, 0 replies; 41+ messages in thread
From: Pavel Begunkov @ 2024-03-18 0:41 UTC (permalink / raw)
To: io-uring; +Cc: linux-block, Jens Axboe, asml.silence, Kanchan Joshi, Ming Lei
io_uring cmd converts struct io_tw_state to issue_flags and later back
to io_tw_state; it's awfully ill-fated, not to mention that the
intermediate issue_flags state is not correct.
Get rid of the last conversion, drag through tw everything that came
with IO_URING_F_UNLOCKED, and replace io_req_task_complete() with a
direct call to io_req_complete_defer(), at least for the time being.
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/7f0d5ddfb5335d038bfd8db50656a1d69daed37f.1710514702.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>
---
io_uring/uring_cmd.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 42f63adfa54a..f197e8c22965 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -100,11 +100,11 @@ void io_uring_cmd_done(struct io_uring_cmd *ioucmd, ssize_t ret, ssize_t res2,
if (req->ctx->flags & IORING_SETUP_IOPOLL) {
/* order with io_iopoll_req_issued() checking ->iopoll_complete */
smp_store_release(&req->iopoll_completed, 1);
+ } else if (!(issue_flags & IO_URING_F_UNLOCKED)) {
+ io_req_complete_defer(req);
} else {
- struct io_tw_state ts = {
- .locked = !(issue_flags & IO_URING_F_UNLOCKED),
- };
- io_req_task_complete(req, &ts);
+ req->io_task_work.func = io_req_task_complete;
+ io_req_task_work_add(req);
}
}
EXPORT_SYMBOL_GPL(io_uring_cmd_done);
--
2.44.0
^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH v2 02/14] io_uring/cmd: fix tw <-> issue_flags conversion
2024-03-18 0:41 [PATCH v2 00/14] remove aux CQE caches Pavel Begunkov
2024-03-18 0:41 ` [PATCH v2 01/14] io_uring/cmd: kill one issue_flags to tw conversion Pavel Begunkov
@ 2024-03-18 0:41 ` Pavel Begunkov
2024-03-18 2:23 ` Ming Lei
2024-03-18 0:41 ` [PATCH v2 03/14] io_uring/cmd: make io_uring_cmd_done irq safe Pavel Begunkov
` (11 subsequent siblings)
13 siblings, 1 reply; 41+ messages in thread
From: Pavel Begunkov @ 2024-03-18 0:41 UTC (permalink / raw)
To: io-uring; +Cc: linux-block, Jens Axboe, asml.silence, Kanchan Joshi, Ming Lei
!IO_URING_F_UNLOCKED does not translate to availability of the deferred
completion infra; IO_URING_F_COMPLETE_DEFER does, and that's what we should
pass and look for in order to use io_req_complete_defer() and other variants.
Luckily, it's not a real problem as two wrongs actually made it right,
at least as far as io_uring_cmd_work() goes.
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/eb08e72e837106963bc7bc7dccfd93d646cc7f36.1710514702.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>
---
io_uring/uring_cmd.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index f197e8c22965..ec38a8d4836d 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -56,7 +56,11 @@ EXPORT_SYMBOL_GPL(io_uring_cmd_mark_cancelable);
static void io_uring_cmd_work(struct io_kiocb *req, struct io_tw_state *ts)
{
struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
- unsigned issue_flags = ts->locked ? 0 : IO_URING_F_UNLOCKED;
+ unsigned issue_flags = IO_URING_F_UNLOCKED;
+
+ /* locked task_work executor checks the deffered list completion */
+ if (ts->locked)
+ issue_flags = IO_URING_F_COMPLETE_DEFER;
ioucmd->task_work_cb(ioucmd, issue_flags);
}
@@ -100,7 +104,9 @@ void io_uring_cmd_done(struct io_uring_cmd *ioucmd, ssize_t ret, ssize_t res2,
if (req->ctx->flags & IORING_SETUP_IOPOLL) {
/* order with io_iopoll_req_issued() checking ->iopoll_complete */
smp_store_release(&req->iopoll_completed, 1);
- } else if (!(issue_flags & IO_URING_F_UNLOCKED)) {
+ } else if (issue_flags & IO_URING_F_COMPLETE_DEFER) {
+ if (WARN_ON_ONCE(issue_flags & IO_URING_F_UNLOCKED))
+ return;
io_req_complete_defer(req);
} else {
req->io_task_work.func = io_req_task_complete;
--
2.44.0
* Re: [PATCH v2 02/14] io_uring/cmd: fix tw <-> issue_flags conversion
2024-03-18 0:41 ` [PATCH v2 02/14] io_uring/cmd: fix tw <-> issue_flags conversion Pavel Begunkov
@ 2024-03-18 2:23 ` Ming Lei
2024-03-18 2:25 ` Jens Axboe
0 siblings, 1 reply; 41+ messages in thread
From: Ming Lei @ 2024-03-18 2:23 UTC (permalink / raw)
To: Pavel Begunkov; +Cc: io-uring, linux-block, Jens Axboe, Kanchan Joshi
On Mon, Mar 18, 2024 at 12:41:47AM +0000, Pavel Begunkov wrote:
> !IO_URING_F_UNLOCKED does not translate to availability of the deferred
> completion infra, IO_URING_F_COMPLETE_DEFER does, that what we should
> pass and look for to use io_req_complete_defer() and other variants.
>
> Luckily, it's not a real problem as two wrongs actually made it right,
> at least as far as io_uring_cmd_work() goes.
>
> Signed-off-by: Pavel Begunkov <[email protected]>
> Link: https://lore.kernel.org/r/eb08e72e837106963bc7bc7dccfd93d646cc7f36.1710514702.git.asml.silence@gmail.com
> Signed-off-by: Jens Axboe <[email protected]>
> ---
> io_uring/uring_cmd.c | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
> index f197e8c22965..ec38a8d4836d 100644
> --- a/io_uring/uring_cmd.c
> +++ b/io_uring/uring_cmd.c
> @@ -56,7 +56,11 @@ EXPORT_SYMBOL_GPL(io_uring_cmd_mark_cancelable);
> static void io_uring_cmd_work(struct io_kiocb *req, struct io_tw_state *ts)
> {
> struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
> - unsigned issue_flags = ts->locked ? 0 : IO_URING_F_UNLOCKED;
> + unsigned issue_flags = IO_URING_F_UNLOCKED;
> +
> + /* locked task_work executor checks the deffered list completion */
> + if (ts->locked)
> + issue_flags = IO_URING_F_COMPLETE_DEFER;
>
> ioucmd->task_work_cb(ioucmd, issue_flags);
> }
> @@ -100,7 +104,9 @@ void io_uring_cmd_done(struct io_uring_cmd *ioucmd, ssize_t ret, ssize_t res2,
> if (req->ctx->flags & IORING_SETUP_IOPOLL) {
> /* order with io_iopoll_req_issued() checking ->iopoll_complete */
> smp_store_release(&req->iopoll_completed, 1);
> - } else if (!(issue_flags & IO_URING_F_UNLOCKED)) {
> + } else if (issue_flags & IO_URING_F_COMPLETE_DEFER) {
> + if (WARN_ON_ONCE(issue_flags & IO_URING_F_UNLOCKED))
> + return;
> io_req_complete_defer(req);
> } else {
> req->io_task_work.func = io_req_task_complete;
'git-bisect' shows the reported warning starts from this patch.
Thanks,
Ming
* Re: [PATCH v2 02/14] io_uring/cmd: fix tw <-> issue_flags conversion
2024-03-18 2:23 ` Ming Lei
@ 2024-03-18 2:25 ` Jens Axboe
2024-03-18 2:32 ` Pavel Begunkov
0 siblings, 1 reply; 41+ messages in thread
From: Jens Axboe @ 2024-03-18 2:25 UTC (permalink / raw)
To: Ming Lei, Pavel Begunkov; +Cc: io-uring, linux-block, Kanchan Joshi
On 3/17/24 8:23 PM, Ming Lei wrote:
> On Mon, Mar 18, 2024 at 12:41:47AM +0000, Pavel Begunkov wrote:
>> !IO_URING_F_UNLOCKED does not translate to availability of the deferred
>> completion infra, IO_URING_F_COMPLETE_DEFER does, that what we should
>> pass and look for to use io_req_complete_defer() and other variants.
>>
>> Luckily, it's not a real problem as two wrongs actually made it right,
>> at least as far as io_uring_cmd_work() goes.
>>
>> Signed-off-by: Pavel Begunkov <[email protected]>
>> Link: https://lore.kernel.org/r/eb08e72e837106963bc7bc7dccfd93d646cc7f36.1710514702.git.asml.silence@gmail.com
>> Signed-off-by: Jens Axboe <[email protected]>
>> ---
>> io_uring/uring_cmd.c | 10 ++++++++--
>> 1 file changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
>> index f197e8c22965..ec38a8d4836d 100644
>> --- a/io_uring/uring_cmd.c
>> +++ b/io_uring/uring_cmd.c
>> @@ -56,7 +56,11 @@ EXPORT_SYMBOL_GPL(io_uring_cmd_mark_cancelable);
>> static void io_uring_cmd_work(struct io_kiocb *req, struct io_tw_state *ts)
>> {
>> struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
>> - unsigned issue_flags = ts->locked ? 0 : IO_URING_F_UNLOCKED;
>> + unsigned issue_flags = IO_URING_F_UNLOCKED;
>> +
>> + /* locked task_work executor checks the deffered list completion */
>> + if (ts->locked)
>> + issue_flags = IO_URING_F_COMPLETE_DEFER;
>>
>> ioucmd->task_work_cb(ioucmd, issue_flags);
>> }
>> @@ -100,7 +104,9 @@ void io_uring_cmd_done(struct io_uring_cmd *ioucmd, ssize_t ret, ssize_t res2,
>> if (req->ctx->flags & IORING_SETUP_IOPOLL) {
>> /* order with io_iopoll_req_issued() checking ->iopoll_complete */
>> smp_store_release(&req->iopoll_completed, 1);
>> - } else if (!(issue_flags & IO_URING_F_UNLOCKED)) {
>> + } else if (issue_flags & IO_URING_F_COMPLETE_DEFER) {
>> + if (WARN_ON_ONCE(issue_flags & IO_URING_F_UNLOCKED))
>> + return;
>> io_req_complete_defer(req);
>> } else {
>> req->io_task_work.func = io_req_task_complete;
>
> 'git-bisect' shows the reported warning starts from this patch.
That does make sense, as probably:
+ /* locked task_work executor checks the deffered list completion */
+ if (ts->locked)
+ issue_flags = IO_URING_F_COMPLETE_DEFER;
this assumption isn't true, and that would mess with the task management
(which is in your oops).
--
Jens Axboe
* Re: [PATCH v2 02/14] io_uring/cmd: fix tw <-> issue_flags conversion
2024-03-18 2:25 ` Jens Axboe
@ 2024-03-18 2:32 ` Pavel Begunkov
2024-03-18 2:40 ` Jens Axboe
0 siblings, 1 reply; 41+ messages in thread
From: Pavel Begunkov @ 2024-03-18 2:32 UTC (permalink / raw)
To: Jens Axboe, Ming Lei; +Cc: io-uring, linux-block, Kanchan Joshi
On 3/18/24 02:25, Jens Axboe wrote:
> On 3/17/24 8:23 PM, Ming Lei wrote:
>> On Mon, Mar 18, 2024 at 12:41:47AM +0000, Pavel Begunkov wrote:
>>> !IO_URING_F_UNLOCKED does not translate to availability of the deferred
>>> completion infra, IO_URING_F_COMPLETE_DEFER does, that what we should
>>> pass and look for to use io_req_complete_defer() and other variants.
>>>
>>> Luckily, it's not a real problem as two wrongs actually made it right,
>>> at least as far as io_uring_cmd_work() goes.
>>>
>>> Signed-off-by: Pavel Begunkov <[email protected]>
>>> Link: https://lore.kernel.org/r/eb08e72e837106963bc7bc7dccfd93d646cc7f36.1710514702.git.asml.silence@gmail.com
>>> Signed-off-by: Jens Axboe <[email protected]>
oops, I should've removed all the signed-offs
>>> ---
>>> io_uring/uring_cmd.c | 10 ++++++++--
>>> 1 file changed, 8 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
>>> index f197e8c22965..ec38a8d4836d 100644
>>> --- a/io_uring/uring_cmd.c
>>> +++ b/io_uring/uring_cmd.c
>>> @@ -56,7 +56,11 @@ EXPORT_SYMBOL_GPL(io_uring_cmd_mark_cancelable);
>>> static void io_uring_cmd_work(struct io_kiocb *req, struct io_tw_state *ts)
>>> {
>>> struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
>>> - unsigned issue_flags = ts->locked ? 0 : IO_URING_F_UNLOCKED;
>>> + unsigned issue_flags = IO_URING_F_UNLOCKED;
>>> +
>>> + /* locked task_work executor checks the deffered list completion */
>>> + if (ts->locked)
>>> + issue_flags = IO_URING_F_COMPLETE_DEFER;
>>>
>>> ioucmd->task_work_cb(ioucmd, issue_flags);
>>> }
>>> @@ -100,7 +104,9 @@ void io_uring_cmd_done(struct io_uring_cmd *ioucmd, ssize_t ret, ssize_t res2,
>>> if (req->ctx->flags & IORING_SETUP_IOPOLL) {
>>> /* order with io_iopoll_req_issued() checking ->iopoll_complete */
>>> smp_store_release(&req->iopoll_completed, 1);
>>> - } else if (!(issue_flags & IO_URING_F_UNLOCKED)) {
>>> + } else if (issue_flags & IO_URING_F_COMPLETE_DEFER) {
>>> + if (WARN_ON_ONCE(issue_flags & IO_URING_F_UNLOCKED))
>>> + return;
>>> io_req_complete_defer(req);
>>> } else {
>>> req->io_task_work.func = io_req_task_complete;
>>
>> 'git-bisect' shows the reported warning starts from this patch.
Thanks Ming
>
> That does make sense, as probably:
>
> + /* locked task_work executor checks the deffered list completion */
> + if (ts->locked)
> + issue_flags = IO_URING_F_COMPLETE_DEFER;
>
> this assumption isn't true, and that would mess with the task management
> (which is in your oops).
I'm missing it, how is it not true?
static void ctx_flush_and_put(struct io_ring_ctx *ctx, struct io_tw_state *ts)
{
...
if (ts->locked) {
io_submit_flush_completions(ctx);
...
}
}
static __cold void io_fallback_req_func(struct work_struct *work)
{
...
mutex_lock(&ctx->uring_lock);
llist_for_each_entry_safe(req, tmp, node, io_task_work.node)
req->io_task_work.func(req, &ts);
io_submit_flush_completions(ctx);
mutex_unlock(&ctx->uring_lock);
...
}
--
Pavel Begunkov
* Re: [PATCH v2 02/14] io_uring/cmd: fix tw <-> issue_flags conversion
2024-03-18 2:32 ` Pavel Begunkov
@ 2024-03-18 2:40 ` Jens Axboe
2024-03-18 2:43 ` Pavel Begunkov
2024-03-18 2:47 ` Ming Lei
0 siblings, 2 replies; 41+ messages in thread
From: Jens Axboe @ 2024-03-18 2:40 UTC (permalink / raw)
To: Pavel Begunkov, Ming Lei; +Cc: io-uring, linux-block, Kanchan Joshi
On 3/17/24 8:32 PM, Pavel Begunkov wrote:
> On 3/18/24 02:25, Jens Axboe wrote:
>> On 3/17/24 8:23 PM, Ming Lei wrote:
>>> On Mon, Mar 18, 2024 at 12:41:47AM +0000, Pavel Begunkov wrote:
>>>> !IO_URING_F_UNLOCKED does not translate to availability of the deferred
>>>> completion infra, IO_URING_F_COMPLETE_DEFER does, that what we should
>>>> pass and look for to use io_req_complete_defer() and other variants.
>>>>
>>>> Luckily, it's not a real problem as two wrongs actually made it right,
>>>> at least as far as io_uring_cmd_work() goes.
>>>>
>>>> Signed-off-by: Pavel Begunkov <[email protected]>
>>>> Link: https://lore.kernel.org/r/eb08e72e837106963bc7bc7dccfd93d646cc7f36.1710514702.git.asml.silence@gmail.com
>>>> Signed-off-by: Jens Axboe <[email protected]>
>
> oops, I should've removed all the signed-offs
>
>>>> ---
>>>> io_uring/uring_cmd.c | 10 ++++++++--
>>>> 1 file changed, 8 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
>>>> index f197e8c22965..ec38a8d4836d 100644
>>>> --- a/io_uring/uring_cmd.c
>>>> +++ b/io_uring/uring_cmd.c
>>>> @@ -56,7 +56,11 @@ EXPORT_SYMBOL_GPL(io_uring_cmd_mark_cancelable);
>>>> static void io_uring_cmd_work(struct io_kiocb *req, struct io_tw_state *ts)
>>>> {
>>>> struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
>>>> - unsigned issue_flags = ts->locked ? 0 : IO_URING_F_UNLOCKED;
>>>> + unsigned issue_flags = IO_URING_F_UNLOCKED;
>>>> +
>>>> + /* locked task_work executor checks the deffered list completion */
>>>> + if (ts->locked)
>>>> + issue_flags = IO_URING_F_COMPLETE_DEFER;
>>>> ioucmd->task_work_cb(ioucmd, issue_flags);
>>>> }
>>>> @@ -100,7 +104,9 @@ void io_uring_cmd_done(struct io_uring_cmd *ioucmd, ssize_t ret, ssize_t res2,
>>>> if (req->ctx->flags & IORING_SETUP_IOPOLL) {
>>>> /* order with io_iopoll_req_issued() checking ->iopoll_complete */
>>>> smp_store_release(&req->iopoll_completed, 1);
>>>> - } else if (!(issue_flags & IO_URING_F_UNLOCKED)) {
>>>> + } else if (issue_flags & IO_URING_F_COMPLETE_DEFER) {
>>>> + if (WARN_ON_ONCE(issue_flags & IO_URING_F_UNLOCKED))
>>>> + return;
>>>> io_req_complete_defer(req);
>>>> } else {
>>>> req->io_task_work.func = io_req_task_complete;
>>>
>>> 'git-bisect' shows the reported warning starts from this patch.
>
> Thanks Ming
>
>>
>> That does make sense, as probably:
>>
>> + /* locked task_work executor checks the deffered list completion */
>> + if (ts->locked)
>> + issue_flags = IO_URING_F_COMPLETE_DEFER;
>>
>> this assumption isn't true, and that would mess with the task management
>> (which is in your oops).
>
> I'm missing it, how it's not true?
>
>
> static void ctx_flush_and_put(struct io_ring_ctx *ctx, struct io_tw_state *ts)
> {
> ...
> if (ts->locked) {
> io_submit_flush_completions(ctx);
> ...
> }
> }
>
> static __cold void io_fallback_req_func(struct work_struct *work)
> {
> ...
> mutex_lock(&ctx->uring_lock);
> llist_for_each_entry_safe(req, tmp, node, io_task_work.node)
> req->io_task_work.func(req, &ts);
> io_submit_flush_completions(ctx);
> mutex_unlock(&ctx->uring_lock);
> ...
> }
I took a look too, and don't immediately see it. Those are also the two
only cases I found, and before the patches, looks fine too.
So no immediate answer there... But I can confirm that before this
patch, the test passes fine. With the patch, it goes boom pretty quick.
Either directly off putting the task, or an unrelated memory crash
instead.
--
Jens Axboe
* Re: [PATCH v2 02/14] io_uring/cmd: fix tw <-> issue_flags conversion
2024-03-18 2:40 ` Jens Axboe
@ 2024-03-18 2:43 ` Pavel Begunkov
2024-03-18 2:46 ` Jens Axboe
2024-03-18 2:47 ` Ming Lei
1 sibling, 1 reply; 41+ messages in thread
From: Pavel Begunkov @ 2024-03-18 2:43 UTC (permalink / raw)
To: Jens Axboe, Ming Lei; +Cc: io-uring, linux-block, Kanchan Joshi
On 3/18/24 02:40, Jens Axboe wrote:
> On 3/17/24 8:32 PM, Pavel Begunkov wrote:
>> On 3/18/24 02:25, Jens Axboe wrote:
>>> On 3/17/24 8:23 PM, Ming Lei wrote:
>>>> On Mon, Mar 18, 2024 at 12:41:47AM +0000, Pavel Begunkov wrote:
>>>>> !IO_URING_F_UNLOCKED does not translate to availability of the deferred
>>>>> completion infra, IO_URING_F_COMPLETE_DEFER does, that what we should
>>>>> pass and look for to use io_req_complete_defer() and other variants.
>>>>>
>>>>> Luckily, it's not a real problem as two wrongs actually made it right,
>>>>> at least as far as io_uring_cmd_work() goes.
>>>>>
>>>>> Signed-off-by: Pavel Begunkov <[email protected]>
>>>>> Link: https://lore.kernel.org/r/eb08e72e837106963bc7bc7dccfd93d646cc7f36.1710514702.git.asml.silence@gmail.com
>>>>> Signed-off-by: Jens Axboe <[email protected]>
>>
>> oops, I should've removed all the signed-offs
>>
>>>>> ---
>>>>> io_uring/uring_cmd.c | 10 ++++++++--
>>>>> 1 file changed, 8 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
>>>>> index f197e8c22965..ec38a8d4836d 100644
>>>>> --- a/io_uring/uring_cmd.c
>>>>> +++ b/io_uring/uring_cmd.c
>>>>> @@ -56,7 +56,11 @@ EXPORT_SYMBOL_GPL(io_uring_cmd_mark_cancelable);
>>>>> static void io_uring_cmd_work(struct io_kiocb *req, struct io_tw_state *ts)
>>>>> {
>>>>> struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
>>>>> - unsigned issue_flags = ts->locked ? 0 : IO_URING_F_UNLOCKED;
>>>>> + unsigned issue_flags = IO_URING_F_UNLOCKED;
>>>>> +
>>>>> + /* locked task_work executor checks the deffered list completion */
>>>>> + if (ts->locked)
>>>>> + issue_flags = IO_URING_F_COMPLETE_DEFER;
>>>>> ioucmd->task_work_cb(ioucmd, issue_flags);
>>>>> }
>>>>> @@ -100,7 +104,9 @@ void io_uring_cmd_done(struct io_uring_cmd *ioucmd, ssize_t ret, ssize_t res2,
>>>>> if (req->ctx->flags & IORING_SETUP_IOPOLL) {
>>>>> /* order with io_iopoll_req_issued() checking ->iopoll_complete */
>>>>> smp_store_release(&req->iopoll_completed, 1);
>>>>> - } else if (!(issue_flags & IO_URING_F_UNLOCKED)) {
>>>>> + } else if (issue_flags & IO_URING_F_COMPLETE_DEFER) {
>>>>> + if (WARN_ON_ONCE(issue_flags & IO_URING_F_UNLOCKED))
>>>>> + return;
>>>>> io_req_complete_defer(req);
>>>>> } else {
>>>>> req->io_task_work.func = io_req_task_complete;
>>>>
>>>> 'git-bisect' shows the reported warning starts from this patch.
>>
>> Thanks Ming
>>
>>>
>>> That does make sense, as probably:
>>>
>>> + /* locked task_work executor checks the deffered list completion */
>>> + if (ts->locked)
>>> + issue_flags = IO_URING_F_COMPLETE_DEFER;
>>>
>>> this assumption isn't true, and that would mess with the task management
>>> (which is in your oops).
>>
>> I'm missing it, how it's not true?
>>
>>
>> static void ctx_flush_and_put(struct io_ring_ctx *ctx, struct io_tw_state *ts)
>> {
>> ...
>> if (ts->locked) {
>> io_submit_flush_completions(ctx);
>> ...
>> }
>> }
>>
>> static __cold void io_fallback_req_func(struct work_struct *work)
>> {
>> ...
>> mutex_lock(&ctx->uring_lock);
>> llist_for_each_entry_safe(req, tmp, node, io_task_work.node)
>> req->io_task_work.func(req, &ts);
>> io_submit_flush_completions(ctx);
>> mutex_unlock(&ctx->uring_lock);
>> ...
>> }
>
> I took a look too, and don't immediately see it. Those are also the two
> only cases I found, and before the patches, looks fine too.
>
> So no immediate answer there... But I can confirm that before this
> patch, test passes fine. With the patch, it goes boom pretty quick.
Which test is that? ublk generic/004?
> Either directly off putting the task, or an unrelated memory crash
> instead.
If tw locks, it should be checking for deferred completions;
that's the rule. Maybe there is some rogue conversion and ->locked
came not from the task work executor... I'll need to stare more
closely
--
Pavel Begunkov
* Re: [PATCH v2 02/14] io_uring/cmd: fix tw <-> issue_flags conversion
2024-03-18 2:43 ` Pavel Begunkov
@ 2024-03-18 2:46 ` Jens Axboe
0 siblings, 0 replies; 41+ messages in thread
From: Jens Axboe @ 2024-03-18 2:46 UTC (permalink / raw)
To: Pavel Begunkov, Ming Lei; +Cc: io-uring, linux-block, Kanchan Joshi
On 3/17/24 8:43 PM, Pavel Begunkov wrote:
>>>>>> diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
>>>>>> index f197e8c22965..ec38a8d4836d 100644
>>>>>> --- a/io_uring/uring_cmd.c
>>>>>> +++ b/io_uring/uring_cmd.c
>>>>>> @@ -56,7 +56,11 @@ EXPORT_SYMBOL_GPL(io_uring_cmd_mark_cancelable);
>>>>>> static void io_uring_cmd_work(struct io_kiocb *req, struct io_tw_state *ts)
>>>>>> {
>>>>>> struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
>>>>>> - unsigned issue_flags = ts->locked ? 0 : IO_URING_F_UNLOCKED;
>>>>>> + unsigned issue_flags = IO_URING_F_UNLOCKED;
>>>>>> +
>>>>>> + /* locked task_work executor checks the deffered list completion */
>>>>>> + if (ts->locked)
>>>>>> + issue_flags = IO_URING_F_COMPLETE_DEFER;
>>>>>> ioucmd->task_work_cb(ioucmd, issue_flags);
>>>>>> }
>>>>>> @@ -100,7 +104,9 @@ void io_uring_cmd_done(struct io_uring_cmd *ioucmd, ssize_t ret, ssize_t res2,
>>>>>> if (req->ctx->flags & IORING_SETUP_IOPOLL) {
>>>>>> /* order with io_iopoll_req_issued() checking ->iopoll_complete */
>>>>>> smp_store_release(&req->iopoll_completed, 1);
>>>>>> - } else if (!(issue_flags & IO_URING_F_UNLOCKED)) {
>>>>>> + } else if (issue_flags & IO_URING_F_COMPLETE_DEFER) {
>>>>>> + if (WARN_ON_ONCE(issue_flags & IO_URING_F_UNLOCKED))
>>>>>> + return;
>>>>>> io_req_complete_defer(req);
>>>>>> } else {
>>>>>> req->io_task_work.func = io_req_task_complete;
>>>>>
>>>>> 'git-bisect' shows the reported warning starts from this patch.
>>>
>>> Thanks Ming
>>>
>>>>
>>>> That does make sense, as probably:
>>>>
>>>> + /* locked task_work executor checks the deffered list completion */
>>>> + if (ts->locked)
>>>> + issue_flags = IO_URING_F_COMPLETE_DEFER;
>>>>
>>>> this assumption isn't true, and that would mess with the task management
>>>> (which is in your oops).
>>>
>>> I'm missing it, how it's not true?
>>>
>>>
>>> static void ctx_flush_and_put(struct io_ring_ctx *ctx, struct io_tw_state *ts)
>>> {
>>> ...
>>> if (ts->locked) {
>>> io_submit_flush_completions(ctx);
>>> ...
>>> }
>>> }
>>>
>>> static __cold void io_fallback_req_func(struct work_struct *work)
>>> {
>>> ...
>>> mutex_lock(&ctx->uring_lock);
>>> llist_for_each_entry_safe(req, tmp, node, io_task_work.node)
>>> req->io_task_work.func(req, &ts);
>>> io_submit_flush_completions(ctx);
>>> mutex_unlock(&ctx->uring_lock);
>>> ...
>>> }
>>
>> I took a look too, and don't immediately see it. Those are also the two
>> only cases I found, and before the patches, looks fine too.
>>
>> So no immediate answer there... But I can confirm that before this
>> patch, test passes fine. With the patch, it goes boom pretty quick.
>
> Which test is that? ublk generic/004?
Yep, it won't make it through one run of:
sudo make test T=generic/004
with it.
>> Either directly off putting the task, or an unrelated memory crash
>> instead.
>
> If tw locks it should be checking for deferred completions,
> that's the rule. Maybe there is some rouge conversion and locked
> came not from task work executor... I'll need to stare more
> closely
Yeah not sure what it is, but I hope you can reproduce with the above.
--
Jens Axboe
* Re: [PATCH v2 02/14] io_uring/cmd: fix tw <-> issue_flags conversion
2024-03-18 2:40 ` Jens Axboe
2024-03-18 2:43 ` Pavel Begunkov
@ 2024-03-18 2:47 ` Ming Lei
2024-03-18 3:11 ` Jens Axboe
1 sibling, 1 reply; 41+ messages in thread
From: Ming Lei @ 2024-03-18 2:47 UTC (permalink / raw)
To: Jens Axboe; +Cc: Pavel Begunkov, io-uring, linux-block, Kanchan Joshi
On Sun, Mar 17, 2024 at 08:40:59PM -0600, Jens Axboe wrote:
> On 3/17/24 8:32 PM, Pavel Begunkov wrote:
> > On 3/18/24 02:25, Jens Axboe wrote:
> >> On 3/17/24 8:23 PM, Ming Lei wrote:
> >>> On Mon, Mar 18, 2024 at 12:41:47AM +0000, Pavel Begunkov wrote:
> >>>> !IO_URING_F_UNLOCKED does not translate to availability of the deferred
> >>>> completion infra, IO_URING_F_COMPLETE_DEFER does, that what we should
> >>>> pass and look for to use io_req_complete_defer() and other variants.
> >>>>
> >>>> Luckily, it's not a real problem as two wrongs actually made it right,
> >>>> at least as far as io_uring_cmd_work() goes.
> >>>>
> >>>> Signed-off-by: Pavel Begunkov <[email protected]>
> >>>> Link: https://lore.kernel.org/r/eb08e72e837106963bc7bc7dccfd93d646cc7f36.1710514702.git.asml.silence@gmail.com
> >>>> Signed-off-by: Jens Axboe <[email protected]>
> >
> > oops, I should've removed all the signed-offs
> >
> >>>> ---
> >>>> io_uring/uring_cmd.c | 10 ++++++++--
> >>>> 1 file changed, 8 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
> >>>> index f197e8c22965..ec38a8d4836d 100644
> >>>> --- a/io_uring/uring_cmd.c
> >>>> +++ b/io_uring/uring_cmd.c
> >>>> @@ -56,7 +56,11 @@ EXPORT_SYMBOL_GPL(io_uring_cmd_mark_cancelable);
> >>>> static void io_uring_cmd_work(struct io_kiocb *req, struct io_tw_state *ts)
> >>>> {
> >>>> struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
> >>>> - unsigned issue_flags = ts->locked ? 0 : IO_URING_F_UNLOCKED;
> >>>> + unsigned issue_flags = IO_URING_F_UNLOCKED;
> >>>> +
> >>>> + /* locked task_work executor checks the deffered list completion */
> >>>> + if (ts->locked)
> >>>> + issue_flags = IO_URING_F_COMPLETE_DEFER;
> >>>> ioucmd->task_work_cb(ioucmd, issue_flags);
> >>>> }
> >>>> @@ -100,7 +104,9 @@ void io_uring_cmd_done(struct io_uring_cmd *ioucmd, ssize_t ret, ssize_t res2,
> >>>> if (req->ctx->flags & IORING_SETUP_IOPOLL) {
> >>>> /* order with io_iopoll_req_issued() checking ->iopoll_complete */
> >>>> smp_store_release(&req->iopoll_completed, 1);
> >>>> - } else if (!(issue_flags & IO_URING_F_UNLOCKED)) {
> >>>> + } else if (issue_flags & IO_URING_F_COMPLETE_DEFER) {
> >>>> + if (WARN_ON_ONCE(issue_flags & IO_URING_F_UNLOCKED))
> >>>> + return;
> >>>> io_req_complete_defer(req);
> >>>> } else {
> >>>> req->io_task_work.func = io_req_task_complete;
> >>>
> >>> 'git-bisect' shows the reported warning starts from this patch.
> >
> > Thanks Ming
> >
> >>
> >> That does make sense, as probably:
> >>
> >> + /* locked task_work executor checks the deffered list completion */
> >> + if (ts->locked)
> >> + issue_flags = IO_URING_F_COMPLETE_DEFER;
> >>
> >> this assumption isn't true, and that would mess with the task management
> >> (which is in your oops).
> >
> > I'm missing it, how it's not true?
> >
> >
> > static void ctx_flush_and_put(struct io_ring_ctx *ctx, struct io_tw_state *ts)
> > {
> > ...
> > if (ts->locked) {
> > io_submit_flush_completions(ctx);
> > ...
> > }
> > }
> >
> > static __cold void io_fallback_req_func(struct work_struct *work)
> > {
> > ...
> > mutex_lock(&ctx->uring_lock);
> > llist_for_each_entry_safe(req, tmp, node, io_task_work.node)
> > req->io_task_work.func(req, &ts);
> > io_submit_flush_completions(ctx);
> > mutex_unlock(&ctx->uring_lock);
> > ...
> > }
>
> I took a look too, and don't immediately see it. Those are also the two
> only cases I found, and before the patches, looks fine too.
>
> So no immediate answer there... But I can confirm that before this
> patch, test passes fine. With the patch, it goes boom pretty quick.
> Either directly off putting the task, or an unrelated memory crash
> instead.
In ublk, the translated 'issue_flags' is passed to io_uring_cmd_done()
from ioucmd->task_work_cb() (__ublk_rq_task_work()). That might be
related to the problem.
Thanks,
Ming
* Re: [PATCH v2 02/14] io_uring/cmd: fix tw <-> issue_flags conversion
2024-03-18 2:47 ` Ming Lei
@ 2024-03-18 3:11 ` Jens Axboe
2024-03-18 3:24 ` Pavel Begunkov
2024-03-18 6:59 ` Ming Lei
0 siblings, 2 replies; 41+ messages in thread
From: Jens Axboe @ 2024-03-18 3:11 UTC (permalink / raw)
To: Ming Lei; +Cc: Pavel Begunkov, io-uring, linux-block, Kanchan Joshi
On 3/17/24 8:47 PM, Ming Lei wrote:
> On Sun, Mar 17, 2024 at 08:40:59PM -0600, Jens Axboe wrote:
>> On 3/17/24 8:32 PM, Pavel Begunkov wrote:
>>> On 3/18/24 02:25, Jens Axboe wrote:
>>>> On 3/17/24 8:23 PM, Ming Lei wrote:
>>>>> On Mon, Mar 18, 2024 at 12:41:47AM +0000, Pavel Begunkov wrote:
>>>>>> !IO_URING_F_UNLOCKED does not translate to availability of the deferred
>>>>>> completion infra, IO_URING_F_COMPLETE_DEFER does, that what we should
>>>>>> pass and look for to use io_req_complete_defer() and other variants.
>>>>>>
>>>>>> Luckily, it's not a real problem as two wrongs actually made it right,
>>>>>> at least as far as io_uring_cmd_work() goes.
>>>>>>
>>>>>> Signed-off-by: Pavel Begunkov <[email protected]>
>>>>>> Link: https://lore.kernel.org/r/eb08e72e837106963bc7bc7dccfd93d646cc7f36.1710514702.git.asml.silence@gmail.com
>>>>>> Signed-off-by: Jens Axboe <[email protected]>
>>>
>>> oops, I should've removed all the signed-offs
>>>
>>>>>> ---
>>>>>> io_uring/uring_cmd.c | 10 ++++++++--
>>>>>> 1 file changed, 8 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
>>>>>> index f197e8c22965..ec38a8d4836d 100644
>>>>>> --- a/io_uring/uring_cmd.c
>>>>>> +++ b/io_uring/uring_cmd.c
>>>>>> @@ -56,7 +56,11 @@ EXPORT_SYMBOL_GPL(io_uring_cmd_mark_cancelable);
>>>>>> static void io_uring_cmd_work(struct io_kiocb *req, struct io_tw_state *ts)
>>>>>> {
>>>>>> struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
>>>>>> - unsigned issue_flags = ts->locked ? 0 : IO_URING_F_UNLOCKED;
>>>>>> + unsigned issue_flags = IO_URING_F_UNLOCKED;
>>>>>> +
>>>>>> + /* locked task_work executor checks the deffered list completion */
>>>>>> + if (ts->locked)
>>>>>> + issue_flags = IO_URING_F_COMPLETE_DEFER;
>>>>>> ioucmd->task_work_cb(ioucmd, issue_flags);
>>>>>> }
>>>>>> @@ -100,7 +104,9 @@ void io_uring_cmd_done(struct io_uring_cmd *ioucmd, ssize_t ret, ssize_t res2,
>>>>>> if (req->ctx->flags & IORING_SETUP_IOPOLL) {
>>>>>> /* order with io_iopoll_req_issued() checking ->iopoll_complete */
>>>>>> smp_store_release(&req->iopoll_completed, 1);
>>>>>> - } else if (!(issue_flags & IO_URING_F_UNLOCKED)) {
>>>>>> + } else if (issue_flags & IO_URING_F_COMPLETE_DEFER) {
>>>>>> + if (WARN_ON_ONCE(issue_flags & IO_URING_F_UNLOCKED))
>>>>>> + return;
>>>>>> io_req_complete_defer(req);
>>>>>> } else {
>>>>>> req->io_task_work.func = io_req_task_complete;
>>>>>
>>>>> 'git-bisect' shows the reported warning starts from this patch.
>>>
>>> Thanks Ming
>>>
>>>>
>>>> That does make sense, as probably:
>>>>
>>>> + /* locked task_work executor checks the deffered list completion */
>>>> + if (ts->locked)
>>>> + issue_flags = IO_URING_F_COMPLETE_DEFER;
>>>>
>>>> this assumption isn't true, and that would mess with the task management
>>>> (which is in your oops).
>>>
>>> I'm missing it, how it's not true?
>>>
>>>
>>> static void ctx_flush_and_put(struct io_ring_ctx *ctx, struct io_tw_state *ts)
>>> {
>>> ...
>>> if (ts->locked) {
>>> io_submit_flush_completions(ctx);
>>> ...
>>> }
>>> }
>>>
>>> static __cold void io_fallback_req_func(struct work_struct *work)
>>> {
>>> ...
>>> mutex_lock(&ctx->uring_lock);
>>> llist_for_each_entry_safe(req, tmp, node, io_task_work.node)
>>> req->io_task_work.func(req, &ts);
>>> io_submit_flush_completions(ctx);
>>> mutex_unlock(&ctx->uring_lock);
>>> ...
>>> }
>>
>> I took a look too, and don't immediately see it. Those are also the two
>> only cases I found, and before the patches, looks fine too.
>>
>> So no immediate answer there... But I can confirm that before this
>> patch, test passes fine. With the patch, it goes boom pretty quick.
>> Either directly off putting the task, or an unrelated memory crash
>> instead.
>
> In ublk, the translated 'issue_flags' is passed to io_uring_cmd_done()
> from ioucmd->task_work_cb()(__ublk_rq_task_work()). That might be
> related with the reason.
Or maybe ublk is doing multiple invocations of task_work completions? I
added this:
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index a2cb8da3cc33..ba7641b380a9 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -739,6 +739,7 @@ static void io_put_task_remote(struct task_struct *task)
{
struct io_uring_task *tctx = task->io_uring;
+ WARN_ON_ONCE(!percpu_counter_read(&tctx->inflight));
percpu_counter_sub(&tctx->inflight, 1);
if (unlikely(atomic_read(&tctx->in_cancel)))
wake_up(&tctx->wait);
and hit this:
[ 77.386845] ------------[ cut here ]------------
[ 77.387128] WARNING: CPU: 5 PID: 109 at io_uring/io_uring.c:742
io_put_task_remote+0x164/0x1a8
[ 77.387608] Modules linked in:
[ 77.387784] CPU: 5 PID: 109 Comm: kworker/5:1 Not tainted
6.8.0-11436-g340741d86a53-dirty #5750
[ 77.388277] Hardware name: linux,dummy-virt (DT)
[ 77.388601] Workqueue: events io_fallback_req_func
[ 77.388930] pstate: 81400005 (Nzcv daif +PAN -UAO -TCO +DIT -SSBS
BTYPE=--)
[ 77.389402] pc : io_put_task_remote+0x164/0x1a8
[ 77.389711] lr : __io_submit_flush_completions+0x8b8/0x1308
[ 77.390087] sp : ffff800087327a60
[ 77.390317] x29: ffff800087327a60 x28: 1fffe0002040b329 x27:
1fffe0002040b32f
[ 77.390817] x26: ffff000103c4e900 x25: ffff000102059900 x24:
ffff000104670000
[ 77.391314] x23: ffff0000d2195000 x22: 00000000002ce20c x21:
ffff0000ced4fcc8
[ 77.391787] x20: ffff0000ced4fc00 x19: ffff000103c4e900 x18:
0000000000000000
[ 77.392209] x17: ffff8000814b0c34 x16: ffff8000814affac x15:
ffff8000814ac4a8
[ 77.392633] x14: ffff80008069327c x13: ffff800080018c9c x12:
ffff600020789d26
[ 77.393055] x11: 1fffe00020789d25 x10: ffff600020789d25 x9 :
dfff800000000000
[ 77.393479] x8 : 00009fffdf8762db x7 : ffff000103c4e92b x6 :
0000000000000001
[ 77.393904] x5 : ffff000103c4e928 x4 : ffff600020789d26 x3 :
1fffe0001a432a7a
[ 77.394334] x2 : 1fffe00019da9f9a x1 : 0000000000000000 x0 :
0000000000000000
[ 77.394761] Call trace:
[ 77.394913] io_put_task_remote+0x164/0x1a8
[ 77.395168] __io_submit_flush_completions+0x8b8/0x1308
[ 77.395481] io_fallback_req_func+0x138/0x1e8
[ 77.395742] process_one_work+0x538/0x1048
[ 77.395992] worker_thread+0x760/0xbd4
[ 77.396221] kthread+0x2dc/0x368
[ 77.396417] ret_from_fork+0x10/0x20
[ 77.396634] ---[ end trace 0000000000000000 ]---
[ 77.397706] ------------[ cut here ]------------
which is showing either an imbalance in the task references, or multiple
completions from the same io_uring request.
Anyway, I'll pop back in tomorrow, but hopefully the above is somewhat
useful at least. I'd suspect the __ublk_rq_task_work() abort check for
current != ubq->ubq_daemon and what happens off that.
--
Jens Axboe
^ permalink raw reply related [flat|nested] 41+ messages in thread
* Re: [PATCH v2 02/14] io_uring/cmd: fix tw <-> issue_flags conversion
2024-03-18 3:11 ` Jens Axboe
@ 2024-03-18 3:24 ` Pavel Begunkov
2024-03-18 6:59 ` Ming Lei
1 sibling, 0 replies; 41+ messages in thread
From: Pavel Begunkov @ 2024-03-18 3:24 UTC (permalink / raw)
To: Jens Axboe, Ming Lei; +Cc: io-uring, linux-block, Kanchan Joshi
On 3/18/24 03:11, Jens Axboe wrote:
> On 3/17/24 8:47 PM, Ming Lei wrote:
>> On Sun, Mar 17, 2024 at 08:40:59PM -0600, Jens Axboe wrote:
>>> On 3/17/24 8:32 PM, Pavel Begunkov wrote:
>>>> On 3/18/24 02:25, Jens Axboe wrote:
>>>>> On 3/17/24 8:23 PM, Ming Lei wrote:
>>>>>> On Mon, Mar 18, 2024 at 12:41:47AM +0000, Pavel Begunkov wrote:
>>>>>>> !IO_URING_F_UNLOCKED does not translate to availability of the deferred
>>>>>>> completion infra, IO_URING_F_COMPLETE_DEFER does, that what we should
>>>>>>> pass and look for to use io_req_complete_defer() and other variants.
>>>>>>>
>>>>>>> Luckily, it's not a real problem as two wrongs actually made it right,
>>>>>>> at least as far as io_uring_cmd_work() goes.
>>>>>>>
>>>>>>> Signed-off-by: Pavel Begunkov <[email protected]>
>>>>>>> Link: https://lore.kernel.org/r/eb08e72e837106963bc7bc7dccfd93d646cc7f36.1710514702.git.asml.silence@gmail.com
>>>>>>> Signed-off-by: Jens Axboe <[email protected]>
>>>>
>>>> oops, I should've removed all the signed-offs
>>>>
>>>>>>> ---
>>>>>>> io_uring/uring_cmd.c | 10 ++++++++--
>>>>>>> 1 file changed, 8 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
>>>>>>> index f197e8c22965..ec38a8d4836d 100644
>>>>>>> --- a/io_uring/uring_cmd.c
>>>>>>> +++ b/io_uring/uring_cmd.c
>>>>>>> @@ -56,7 +56,11 @@ EXPORT_SYMBOL_GPL(io_uring_cmd_mark_cancelable);
>>>>>>> static void io_uring_cmd_work(struct io_kiocb *req, struct io_tw_state *ts)
>>>>>>> {
>>>>>>> struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
>>>>>>> - unsigned issue_flags = ts->locked ? 0 : IO_URING_F_UNLOCKED;
>>>>>>> + unsigned issue_flags = IO_URING_F_UNLOCKED;
>>>>>>> +
>>>>>>> + /* locked task_work executor checks the deffered list completion */
>>>>>>> + if (ts->locked)
>>>>>>> + issue_flags = IO_URING_F_COMPLETE_DEFER;
>>>>>>> ioucmd->task_work_cb(ioucmd, issue_flags);
>>>>>>> }
>>>>>>> @@ -100,7 +104,9 @@ void io_uring_cmd_done(struct io_uring_cmd *ioucmd, ssize_t ret, ssize_t res2,
>>>>>>> if (req->ctx->flags & IORING_SETUP_IOPOLL) {
>>>>>>> /* order with io_iopoll_req_issued() checking ->iopoll_complete */
>>>>>>> smp_store_release(&req->iopoll_completed, 1);
>>>>>>> - } else if (!(issue_flags & IO_URING_F_UNLOCKED)) {
>>>>>>> + } else if (issue_flags & IO_URING_F_COMPLETE_DEFER) {
>>>>>>> + if (WARN_ON_ONCE(issue_flags & IO_URING_F_UNLOCKED))
>>>>>>> + return;
>>>>>>> io_req_complete_defer(req);
>>>>>>> } else {
>>>>>>> req->io_task_work.func = io_req_task_complete;
>>>>>>
>>>>>> 'git-bisect' shows the reported warning starts from this patch.
>>>>
>>>> Thanks Ming
>>>>
>>>>>
>>>>> That does make sense, as probably:
>>>>>
>>>>> + /* locked task_work executor checks the deffered list completion */
>>>>> + if (ts->locked)
>>>>> + issue_flags = IO_URING_F_COMPLETE_DEFER;
>>>>>
>>>>> this assumption isn't true, and that would mess with the task management
>>>>> (which is in your oops).
>>>>
>>>> I'm missing it, how it's not true?
>>>>
>>>>
>>>> static void ctx_flush_and_put(struct io_ring_ctx *ctx, struct io_tw_state *ts)
>>>> {
>>>> ...
>>>> if (ts->locked) {
>>>> io_submit_flush_completions(ctx);
>>>> ...
>>>> }
>>>> }
>>>>
>>>> static __cold void io_fallback_req_func(struct work_struct *work)
>>>> {
>>>> ...
>>>> mutex_lock(&ctx->uring_lock);
>>>> llist_for_each_entry_safe(req, tmp, node, io_task_work.node)
>>>> req->io_task_work.func(req, &ts);
>>>> io_submit_flush_completions(ctx);
>>>> mutex_unlock(&ctx->uring_lock);
>>>> ...
>>>> }
>>>
>>> I took a look too, and don't immediately see it. Those are also the two
>>> only cases I found, and before the patches, looks fine too.
>>>
>>> So no immediate answer there... But I can confirm that before this
>>> patch, test passes fine. With the patch, it goes boom pretty quick.
>>> Either directly off putting the task, or an unrelated memory crash
>>> instead.
>>
>> In ublk, the translated 'issue_flags' is passed to io_uring_cmd_done()
>> from ioucmd->task_work_cb()(__ublk_rq_task_work()). That might be
>> related with the reason.
>
> Or maybe ublk is doing multiple invocations of task_work completions? I
> added this:
>
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index a2cb8da3cc33..ba7641b380a9 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -739,6 +739,7 @@ static void io_put_task_remote(struct task_struct *task)
> {
> struct io_uring_task *tctx = task->io_uring;
>
> + WARN_ON_ONCE(!percpu_counter_read(&tctx->inflight));
> percpu_counter_sub(&tctx->inflight, 1);
> if (unlikely(atomic_read(&tctx->in_cancel)))
> wake_up(&tctx->wait);
>
> and hit this:
>
> [ 77.386845] ------------[ cut here ]------------
> [ 77.387128] WARNING: CPU: 5 PID: 109 at io_uring/io_uring.c:742
> io_put_task_remote+0x164/0x1a8
> [ 77.387608] Modules linked in:
> [ 77.387784] CPU: 5 PID: 109 Comm: kworker/5:1 Not tainted
> 6.8.0-11436-g340741d86a53-dirty #5750
> [ 77.388277] Hardware name: linux,dummy-virt (DT)
> [ 77.388601] Workqueue: events io_fallback_req_func
> [ 77.388930] pstate: 81400005 (Nzcv daif +PAN -UAO -TCO +DIT -SSBS
> BTYPE=--)
> [ 77.389402] pc : io_put_task_remote+0x164/0x1a8
> [ 77.389711] lr : __io_submit_flush_completions+0x8b8/0x1308
> [ 77.390087] sp : ffff800087327a60
> [ 77.390317] x29: ffff800087327a60 x28: 1fffe0002040b329 x27:
> 1fffe0002040b32f
> [ 77.390817] x26: ffff000103c4e900 x25: ffff000102059900 x24:
> ffff000104670000
> [ 77.391314] x23: ffff0000d2195000 x22: 00000000002ce20c x21:
> ffff0000ced4fcc8
> [ 77.391787] x20: ffff0000ced4fc00 x19: ffff000103c4e900 x18:
> 0000000000000000
> [ 77.392209] x17: ffff8000814b0c34 x16: ffff8000814affac x15:
> ffff8000814ac4a8
> [ 77.392633] x14: ffff80008069327c x13: ffff800080018c9c x12:
> ffff600020789d26
> [ 77.393055] x11: 1fffe00020789d25 x10: ffff600020789d25 x9 :
> dfff800000000000
> [ 77.393479] x8 : 00009fffdf8762db x7 : ffff000103c4e92b x6 :
> 0000000000000001
> [ 77.393904] x5 : ffff000103c4e928 x4 : ffff600020789d26 x3 :
> 1fffe0001a432a7a
> [ 77.394334] x2 : 1fffe00019da9f9a x1 : 0000000000000000 x0 :
> 0000000000000000
> [ 77.394761] Call trace:
> [ 77.394913] io_put_task_remote+0x164/0x1a8
> [ 77.395168] __io_submit_flush_completions+0x8b8/0x1308
> [ 77.395481] io_fallback_req_func+0x138/0x1e8
> [ 77.395742] process_one_work+0x538/0x1048
> [ 77.395992] worker_thread+0x760/0xbd4
> [ 77.396221] kthread+0x2dc/0x368
> [ 77.396417] ret_from_fork+0x10/0x20
> [ 77.396634] ---[ end trace 0000000000000000 ]---
> [ 77.397706] ------------[ cut here ]------------
>
> which is showing either an imbalance in the task references, or multiple
> completions from the same io_uring request.
>
> Anyway, I'll pop back in tomorrow, but hopefully the above is somewhat
> useful at least. I'd suspect the __ublk_rq_task_work() abort check for
> current != ubq->ubq_daemon and what happens off that.
We can enable refcounting for all requests; then it should trigger
on a double free, i.e. by adding io_req_set_refcount(req) somewhere in
io_init_req().
--
Pavel Begunkov
* Re: [PATCH v2 02/14] io_uring/cmd: fix tw <-> issue_flags conversion
2024-03-18 3:11 ` Jens Axboe
2024-03-18 3:24 ` Pavel Begunkov
@ 2024-03-18 6:59 ` Ming Lei
2024-03-18 11:45 ` Pavel Begunkov
1 sibling, 1 reply; 41+ messages in thread
From: Ming Lei @ 2024-03-18 6:59 UTC (permalink / raw)
To: Jens Axboe; +Cc: Pavel Begunkov, io-uring, linux-block, Kanchan Joshi
On Sun, Mar 17, 2024 at 09:11:27PM -0600, Jens Axboe wrote:
> On 3/17/24 8:47 PM, Ming Lei wrote:
> > On Sun, Mar 17, 2024 at 08:40:59PM -0600, Jens Axboe wrote:
> >> On 3/17/24 8:32 PM, Pavel Begunkov wrote:
> >>> On 3/18/24 02:25, Jens Axboe wrote:
> >>>> On 3/17/24 8:23 PM, Ming Lei wrote:
> >>>>> On Mon, Mar 18, 2024 at 12:41:47AM +0000, Pavel Begunkov wrote:
> >>>>>> !IO_URING_F_UNLOCKED does not translate to availability of the deferred
> >>>>>> completion infra, IO_URING_F_COMPLETE_DEFER does, that what we should
> >>>>>> pass and look for to use io_req_complete_defer() and other variants.
> >>>>>>
> >>>>>> Luckily, it's not a real problem as two wrongs actually made it right,
> >>>>>> at least as far as io_uring_cmd_work() goes.
> >>>>>>
> >>>>>> Signed-off-by: Pavel Begunkov <[email protected]>
> >>>>>> Link: https://lore.kernel.org/r/eb08e72e837106963bc7bc7dccfd93d646cc7f36.1710514702.git.asml.silence@gmail.com
> >>>>>> Signed-off-by: Jens Axboe <[email protected]>
> >>>
> >>> oops, I should've removed all the signed-offs
> >>>
> >>>>>> ---
> >>>>>> io_uring/uring_cmd.c | 10 ++++++++--
> >>>>>> 1 file changed, 8 insertions(+), 2 deletions(-)
> >>>>>>
> >>>>>> diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
> >>>>>> index f197e8c22965..ec38a8d4836d 100644
> >>>>>> --- a/io_uring/uring_cmd.c
> >>>>>> +++ b/io_uring/uring_cmd.c
> >>>>>> @@ -56,7 +56,11 @@ EXPORT_SYMBOL_GPL(io_uring_cmd_mark_cancelable);
> >>>>>> static void io_uring_cmd_work(struct io_kiocb *req, struct io_tw_state *ts)
> >>>>>> {
> >>>>>> struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
> >>>>>> - unsigned issue_flags = ts->locked ? 0 : IO_URING_F_UNLOCKED;
> >>>>>> + unsigned issue_flags = IO_URING_F_UNLOCKED;
> >>>>>> +
> >>>>>> + /* locked task_work executor checks the deffered list completion */
> >>>>>> + if (ts->locked)
> >>>>>> + issue_flags = IO_URING_F_COMPLETE_DEFER;
> >>>>>> ioucmd->task_work_cb(ioucmd, issue_flags);
> >>>>>> }
> >>>>>> @@ -100,7 +104,9 @@ void io_uring_cmd_done(struct io_uring_cmd *ioucmd, ssize_t ret, ssize_t res2,
> >>>>>> if (req->ctx->flags & IORING_SETUP_IOPOLL) {
> >>>>>> /* order with io_iopoll_req_issued() checking ->iopoll_complete */
> >>>>>> smp_store_release(&req->iopoll_completed, 1);
> >>>>>> - } else if (!(issue_flags & IO_URING_F_UNLOCKED)) {
> >>>>>> + } else if (issue_flags & IO_URING_F_COMPLETE_DEFER) {
> >>>>>> + if (WARN_ON_ONCE(issue_flags & IO_URING_F_UNLOCKED))
> >>>>>> + return;
> >>>>>> io_req_complete_defer(req);
> >>>>>> } else {
> >>>>>> req->io_task_work.func = io_req_task_complete;
> >>>>>
> >>>>> 'git-bisect' shows the reported warning starts from this patch.
> >>>
> >>> Thanks Ming
> >>>
> >>>>
> >>>> That does make sense, as probably:
> >>>>
> >>>> + /* locked task_work executor checks the deffered list completion */
> >>>> + if (ts->locked)
> >>>> + issue_flags = IO_URING_F_COMPLETE_DEFER;
> >>>>
> >>>> this assumption isn't true, and that would mess with the task management
> >>>> (which is in your oops).
> >>>
> >>> I'm missing it, how it's not true?
> >>>
> >>>
> >>> static void ctx_flush_and_put(struct io_ring_ctx *ctx, struct io_tw_state *ts)
> >>> {
> >>> ...
> >>> if (ts->locked) {
> >>> io_submit_flush_completions(ctx);
> >>> ...
> >>> }
> >>> }
> >>>
> >>> static __cold void io_fallback_req_func(struct work_struct *work)
> >>> {
> >>> ...
> >>> mutex_lock(&ctx->uring_lock);
> >>> llist_for_each_entry_safe(req, tmp, node, io_task_work.node)
> >>> req->io_task_work.func(req, &ts);
> >>> io_submit_flush_completions(ctx);
> >>> mutex_unlock(&ctx->uring_lock);
> >>> ...
> >>> }
> >>
> >> I took a look too, and don't immediately see it. Those are also the two
> >> only cases I found, and before the patches, looks fine too.
> >>
> >> So no immediate answer there... But I can confirm that before this
> >> patch, test passes fine. With the patch, it goes boom pretty quick.
> >> Either directly off putting the task, or an unrelated memory crash
> >> instead.
> >
> > In ublk, the translated 'issue_flags' is passed to io_uring_cmd_done()
> > from ioucmd->task_work_cb()(__ublk_rq_task_work()). That might be
> > related with the reason.
>
> Or maybe ublk is doing multiple invocations of task_work completions? I
> added this:
Yes, your debug log and pointer do help.

This patch converts the zero flag (!IO_URING_F_UNLOCKED) into
IO_URING_F_COMPLETE_DEFER, and one call site was easy to miss. The fix
follows; it needs to be folded into patch 2.
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 5d4b448fdc50..22f2b52390a9 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3259,7 +3259,8 @@ static bool io_uring_try_cancel_uring_cmd(struct io_ring_ctx *ctx,
/* ->sqe isn't available if no async data */
if (!req_has_async_data(req))
cmd->sqe = NULL;
- file->f_op->uring_cmd(cmd, IO_URING_F_CANCEL);
+ file->f_op->uring_cmd(cmd, IO_URING_F_CANCEL |
+ IO_URING_F_COMPLETE_DEFER);
ret = true;
}
}
Thanks,
Ming
* Re: [PATCH v2 02/14] io_uring/cmd: fix tw <-> issue_flags conversion
2024-03-18 6:59 ` Ming Lei
@ 2024-03-18 11:45 ` Pavel Begunkov
0 siblings, 0 replies; 41+ messages in thread
From: Pavel Begunkov @ 2024-03-18 11:45 UTC (permalink / raw)
To: Ming Lei, Jens Axboe; +Cc: io-uring, linux-block, Kanchan Joshi
On 3/18/24 06:59, Ming Lei wrote:
> On Sun, Mar 17, 2024 at 09:11:27PM -0600, Jens Axboe wrote:
>> On 3/17/24 8:47 PM, Ming Lei wrote:
>>> On Sun, Mar 17, 2024 at 08:40:59PM -0600, Jens Axboe wrote:
>>>> On 3/17/24 8:32 PM, Pavel Begunkov wrote:
>>>>> On 3/18/24 02:25, Jens Axboe wrote:
>>>>>> On 3/17/24 8:23 PM, Ming Lei wrote:
>>>>>>> On Mon, Mar 18, 2024 at 12:41:47AM +0000, Pavel Begunkov wrote:
>>>>>>>> !IO_URING_F_UNLOCKED does not translate to availability of the deferred
>>>>>>>> completion infra, IO_URING_F_COMPLETE_DEFER does, that what we should
>>>>>>>> pass and look for to use io_req_complete_defer() and other variants.
>>>>>>>>
>>>>>>>> Luckily, it's not a real problem as two wrongs actually made it right,
>>>>>>>> at least as far as io_uring_cmd_work() goes.
>>>>>>>>
>>>>>>>> Signed-off-by: Pavel Begunkov <[email protected]>
>>>>>>>> Link: https://lore.kernel.org/r/eb08e72e837106963bc7bc7dccfd93d646cc7f36.1710514702.git.asml.silence@gmail.com
>>>>>>>> Signed-off-by: Jens Axboe <[email protected]>
>>>>>
>>>>> oops, I should've removed all the signed-offs
>>>>>
>>>>>>>> ---
>>>>>>>> io_uring/uring_cmd.c | 10 ++++++++--
>>>>>>>> 1 file changed, 8 insertions(+), 2 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
>>>>>>>> index f197e8c22965..ec38a8d4836d 100644
>>>>>>>> --- a/io_uring/uring_cmd.c
>>>>>>>> +++ b/io_uring/uring_cmd.c
>>>>>>>> @@ -56,7 +56,11 @@ EXPORT_SYMBOL_GPL(io_uring_cmd_mark_cancelable);
>>>>>>>> static void io_uring_cmd_work(struct io_kiocb *req, struct io_tw_state *ts)
>>>>>>>> {
>>>>>>>> struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
>>>>>>>> - unsigned issue_flags = ts->locked ? 0 : IO_URING_F_UNLOCKED;
>>>>>>>> + unsigned issue_flags = IO_URING_F_UNLOCKED;
>>>>>>>> +
>>>>>>>> + /* locked task_work executor checks the deffered list completion */
>>>>>>>> + if (ts->locked)
>>>>>>>> + issue_flags = IO_URING_F_COMPLETE_DEFER;
>>>>>>>> ioucmd->task_work_cb(ioucmd, issue_flags);
>>>>>>>> }
>>>>>>>> @@ -100,7 +104,9 @@ void io_uring_cmd_done(struct io_uring_cmd *ioucmd, ssize_t ret, ssize_t res2,
>>>>>>>> if (req->ctx->flags & IORING_SETUP_IOPOLL) {
>>>>>>>> /* order with io_iopoll_req_issued() checking ->iopoll_complete */
>>>>>>>> smp_store_release(&req->iopoll_completed, 1);
>>>>>>>> - } else if (!(issue_flags & IO_URING_F_UNLOCKED)) {
>>>>>>>> + } else if (issue_flags & IO_URING_F_COMPLETE_DEFER) {
>>>>>>>> + if (WARN_ON_ONCE(issue_flags & IO_URING_F_UNLOCKED))
>>>>>>>> + return;
>>>>>>>> io_req_complete_defer(req);
>>>>>>>> } else {
>>>>>>>> req->io_task_work.func = io_req_task_complete;
>>>>>>>
>>>>>>> 'git-bisect' shows the reported warning starts from this patch.
>>>>>
>>>>> Thanks Ming
>>>>>
>>>>>>
>>>>>> That does make sense, as probably:
>>>>>>
>>>>>> + /* locked task_work executor checks the deffered list completion */
>>>>>> + if (ts->locked)
>>>>>> + issue_flags = IO_URING_F_COMPLETE_DEFER;
>>>>>>
>>>>>> this assumption isn't true, and that would mess with the task management
>>>>>> (which is in your oops).
>>>>>
>>>>> I'm missing it, how it's not true?
>>>>>
>>>>>
>>>>> static void ctx_flush_and_put(struct io_ring_ctx *ctx, struct io_tw_state *ts)
>>>>> {
>>>>> ...
>>>>> if (ts->locked) {
>>>>> io_submit_flush_completions(ctx);
>>>>> ...
>>>>> }
>>>>> }
>>>>>
>>>>> static __cold void io_fallback_req_func(struct work_struct *work)
>>>>> {
>>>>> ...
>>>>> mutex_lock(&ctx->uring_lock);
>>>>> llist_for_each_entry_safe(req, tmp, node, io_task_work.node)
>>>>> req->io_task_work.func(req, &ts);
>>>>> io_submit_flush_completions(ctx);
>>>>> mutex_unlock(&ctx->uring_lock);
>>>>> ...
>>>>> }
>>>>
>>>> I took a look too, and don't immediately see it. Those are also the two
>>>> only cases I found, and before the patches, looks fine too.
>>>>
>>>> So no immediate answer there... But I can confirm that before this
>>>> patch, test passes fine. With the patch, it goes boom pretty quick.
>>>> Either directly off putting the task, or an unrelated memory crash
>>>> instead.
>>>
>>> In ublk, the translated 'issue_flags' is passed to io_uring_cmd_done()
>>> from ioucmd->task_work_cb()(__ublk_rq_task_work()). That might be
>>> related with the reason.
>>
>> Or maybe ublk is doing multiple invocations of task_work completions? I
>> added this:
>
> Yes, your debug log and pointer do help.
>
> This patch converts the zero flag (!IO_URING_F_UNLOCKED) into
> IO_URING_F_COMPLETE_DEFER, and one call site was easy to miss. The fix
> follows; it needs to be folded into patch 2.
Thanks, I was totally unaware of this chunk. As a side note, it would be
better moved into uring_cmd; I'll make the change.
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index 5d4b448fdc50..22f2b52390a9 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -3259,7 +3259,8 @@ static bool io_uring_try_cancel_uring_cmd(struct io_ring_ctx *ctx,
> /* ->sqe isn't available if no async data */
> if (!req_has_async_data(req))
> cmd->sqe = NULL;
> - file->f_op->uring_cmd(cmd, IO_URING_F_CANCEL);
> + file->f_op->uring_cmd(cmd, IO_URING_F_CANCEL |
> + IO_URING_F_COMPLETE_DEFER);
> ret = true;
> }
> }
--
Pavel Begunkov
* [PATCH v2 03/14] io_uring/cmd: make io_uring_cmd_done irq safe
2024-03-18 0:41 [PATCH v2 00/14] remove aux CQE caches Pavel Begunkov
2024-03-18 0:41 ` [PATCH v2 01/14] io_uring/cmd: kill one issue_flags to tw conversion Pavel Begunkov
2024-03-18 0:41 ` [PATCH v2 02/14] io_uring/cmd: fix tw <-> issue_flags conversion Pavel Begunkov
@ 2024-03-18 0:41 ` Pavel Begunkov
2024-03-18 8:10 ` Ming Lei
2024-03-18 0:41 ` [PATCH v2 04/14] io_uring/cmd: introduce io_uring_cmd_complete Pavel Begunkov
` (10 subsequent siblings)
13 siblings, 1 reply; 41+ messages in thread
From: Pavel Begunkov @ 2024-03-18 0:41 UTC (permalink / raw)
To: io-uring; +Cc: linux-block, Jens Axboe, asml.silence, Kanchan Joshi, Ming Lei
io_uring_cmd_done() is called from the irq context and is considered to
be irq safe, however that's not the case if the driver requires
cancellations because io_uring_cmd_del_cancelable() could try to take
the uring_lock mutex.
Clean up the confusion, by deferring cancellation handling to
locked task_work if we came into io_uring_cmd_done() from iowq
or other IO_URING_F_UNLOCKED path.
Signed-off-by: Pavel Begunkov <[email protected]>
---
io_uring/uring_cmd.c | 24 +++++++++++++++++-------
1 file changed, 17 insertions(+), 7 deletions(-)
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index ec38a8d4836d..9590081feb2d 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -14,19 +14,18 @@
#include "rsrc.h"
#include "uring_cmd.h"
-static void io_uring_cmd_del_cancelable(struct io_uring_cmd *cmd,
- unsigned int issue_flags)
+static void io_uring_cmd_del_cancelable(struct io_uring_cmd *cmd)
{
struct io_kiocb *req = cmd_to_io_kiocb(cmd);
struct io_ring_ctx *ctx = req->ctx;
+ lockdep_assert_held(&ctx->uring_lock);
+
if (!(cmd->flags & IORING_URING_CMD_CANCELABLE))
return;
cmd->flags &= ~IORING_URING_CMD_CANCELABLE;
- io_ring_submit_lock(ctx, issue_flags);
hlist_del(&req->hash_node);
- io_ring_submit_unlock(ctx, issue_flags);
}
/*
@@ -44,6 +43,9 @@ void io_uring_cmd_mark_cancelable(struct io_uring_cmd *cmd,
struct io_kiocb *req = cmd_to_io_kiocb(cmd);
struct io_ring_ctx *ctx = req->ctx;
+ if (WARN_ON_ONCE(ctx->flags & IORING_SETUP_IOPOLL))
+ return;
+
if (!(cmd->flags & IORING_URING_CMD_CANCELABLE)) {
cmd->flags |= IORING_URING_CMD_CANCELABLE;
io_ring_submit_lock(ctx, issue_flags);
@@ -84,6 +86,15 @@ static inline void io_req_set_cqe32_extra(struct io_kiocb *req,
req->big_cqe.extra2 = extra2;
}
+static void io_req_cmd_task_complete(struct io_kiocb *req, struct io_tw_state *ts)
+{
+ struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
+
+ io_tw_lock(req->ctx, ts);
+ io_uring_cmd_del_cancelable(ioucmd);
+ io_req_task_complete(req, ts);
+}
+
/*
* Called by consumers of io_uring_cmd, if they originally returned
* -EIOCBQUEUED upon receiving the command.
@@ -93,8 +104,6 @@ void io_uring_cmd_done(struct io_uring_cmd *ioucmd, ssize_t ret, ssize_t res2,
{
struct io_kiocb *req = cmd_to_io_kiocb(ioucmd);
- io_uring_cmd_del_cancelable(ioucmd, issue_flags);
-
if (ret < 0)
req_set_fail(req);
@@ -107,9 +116,10 @@ void io_uring_cmd_done(struct io_uring_cmd *ioucmd, ssize_t ret, ssize_t res2,
} else if (issue_flags & IO_URING_F_COMPLETE_DEFER) {
if (WARN_ON_ONCE(issue_flags & IO_URING_F_UNLOCKED))
return;
+ io_uring_cmd_del_cancelable(ioucmd);
io_req_complete_defer(req);
} else {
- req->io_task_work.func = io_req_task_complete;
+ req->io_task_work.func = io_req_cmd_task_complete;
io_req_task_work_add(req);
}
}
--
2.44.0
* Re: [PATCH v2 03/14] io_uring/cmd: make io_uring_cmd_done irq safe
2024-03-18 0:41 ` [PATCH v2 03/14] io_uring/cmd: make io_uring_cmd_done irq safe Pavel Begunkov
@ 2024-03-18 8:10 ` Ming Lei
2024-03-18 11:50 ` Pavel Begunkov
0 siblings, 1 reply; 41+ messages in thread
From: Ming Lei @ 2024-03-18 8:10 UTC (permalink / raw)
To: Pavel Begunkov; +Cc: io-uring, linux-block, Jens Axboe, Kanchan Joshi
On Mon, Mar 18, 2024 at 12:41:48AM +0000, Pavel Begunkov wrote:
> io_uring_cmd_done() is called from the irq context and is considered to
> be irq safe, however that's not the case if the driver requires
> cancellations because io_uring_cmd_del_cancelable() could try to take
> the uring_lock mutex.
>
> Clean up the confusion, by deferring cancellation handling to
> locked task_work if we came into io_uring_cmd_done() from iowq
> or other IO_URING_F_UNLOCKED path.
>
> Signed-off-by: Pavel Begunkov <[email protected]>
> ---
> io_uring/uring_cmd.c | 24 +++++++++++++++++-------
> 1 file changed, 17 insertions(+), 7 deletions(-)
>
> diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
> index ec38a8d4836d..9590081feb2d 100644
> --- a/io_uring/uring_cmd.c
> +++ b/io_uring/uring_cmd.c
> @@ -14,19 +14,18 @@
> #include "rsrc.h"
> #include "uring_cmd.h"
>
> -static void io_uring_cmd_del_cancelable(struct io_uring_cmd *cmd,
> - unsigned int issue_flags)
> +static void io_uring_cmd_del_cancelable(struct io_uring_cmd *cmd)
> {
> struct io_kiocb *req = cmd_to_io_kiocb(cmd);
> struct io_ring_ctx *ctx = req->ctx;
>
> + lockdep_assert_held(&ctx->uring_lock);
> +
> if (!(cmd->flags & IORING_URING_CMD_CANCELABLE))
> return;
>
> cmd->flags &= ~IORING_URING_CMD_CANCELABLE;
> - io_ring_submit_lock(ctx, issue_flags);
> hlist_del(&req->hash_node);
> - io_ring_submit_unlock(ctx, issue_flags);
> }
>
> /*
> @@ -44,6 +43,9 @@ void io_uring_cmd_mark_cancelable(struct io_uring_cmd *cmd,
> struct io_kiocb *req = cmd_to_io_kiocb(cmd);
> struct io_ring_ctx *ctx = req->ctx;
>
> + if (WARN_ON_ONCE(ctx->flags & IORING_SETUP_IOPOLL))
> + return;
> +
This way a cancelable command can't be used in an iopoll context, which
isn't reasonable; supporting iopoll has been on the ublk todo list,
especially since a single ring context is shared for both commands and normal IO.
Thanks,
Ming
* Re: [PATCH v2 03/14] io_uring/cmd: make io_uring_cmd_done irq safe
2024-03-18 8:10 ` Ming Lei
@ 2024-03-18 11:50 ` Pavel Begunkov
2024-03-18 11:59 ` Ming Lei
0 siblings, 1 reply; 41+ messages in thread
From: Pavel Begunkov @ 2024-03-18 11:50 UTC (permalink / raw)
To: Ming Lei; +Cc: io-uring, linux-block, Jens Axboe, Kanchan Joshi
On 3/18/24 08:10, Ming Lei wrote:
> On Mon, Mar 18, 2024 at 12:41:48AM +0000, Pavel Begunkov wrote:
>> io_uring_cmd_done() is called from the irq context and is considered to
>> be irq safe, however that's not the case if the driver requires
>> cancellations because io_uring_cmd_del_cancelable() could try to take
>> the uring_lock mutex.
>>
>> Clean up the confusion, by deferring cancellation handling to
>> locked task_work if we came into io_uring_cmd_done() from iowq
>> or other IO_URING_F_UNLOCKED path.
>>
>> Signed-off-by: Pavel Begunkov <[email protected]>
>> ---
>> io_uring/uring_cmd.c | 24 +++++++++++++++++-------
>> 1 file changed, 17 insertions(+), 7 deletions(-)
>>
>> diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
>> index ec38a8d4836d..9590081feb2d 100644
>> --- a/io_uring/uring_cmd.c
>> +++ b/io_uring/uring_cmd.c
>> @@ -14,19 +14,18 @@
>> #include "rsrc.h"
>> #include "uring_cmd.h"
>>
>> -static void io_uring_cmd_del_cancelable(struct io_uring_cmd *cmd,
>> - unsigned int issue_flags)
>> +static void io_uring_cmd_del_cancelable(struct io_uring_cmd *cmd)
>> {
>> struct io_kiocb *req = cmd_to_io_kiocb(cmd);
>> struct io_ring_ctx *ctx = req->ctx;
>>
>> + lockdep_assert_held(&ctx->uring_lock);
>> +
>> if (!(cmd->flags & IORING_URING_CMD_CANCELABLE))
>> return;
>>
>> cmd->flags &= ~IORING_URING_CMD_CANCELABLE;
>> - io_ring_submit_lock(ctx, issue_flags);
>> hlist_del(&req->hash_node);
>> - io_ring_submit_unlock(ctx, issue_flags);
>> }
>>
>> /*
>> @@ -44,6 +43,9 @@ void io_uring_cmd_mark_cancelable(struct io_uring_cmd *cmd,
>> struct io_kiocb *req = cmd_to_io_kiocb(cmd);
>> struct io_ring_ctx *ctx = req->ctx;
>>
>> + if (WARN_ON_ONCE(ctx->flags & IORING_SETUP_IOPOLL))
>> + return;
>> +
>
> This way a cancelable command can't be used in an iopoll context, which
> isn't reasonable; supporting iopoll has been on the ublk todo list,
> especially since a single ring context is shared for both commands and normal IO.
That's something that can be solved when it's needed, hence the
warning so it isn't missed. It would need del_cancelable on the
->iopoll side, but that depends on resolving the "leaving in the
cancel queue" problem.
--
Pavel Begunkov
* Re: [PATCH v2 03/14] io_uring/cmd: make io_uring_cmd_done irq safe
2024-03-18 11:50 ` Pavel Begunkov
@ 2024-03-18 11:59 ` Ming Lei
2024-03-18 12:46 ` Pavel Begunkov
0 siblings, 1 reply; 41+ messages in thread
From: Ming Lei @ 2024-03-18 11:59 UTC (permalink / raw)
To: Pavel Begunkov; +Cc: io-uring, linux-block, Jens Axboe, Kanchan Joshi
On Mon, Mar 18, 2024 at 11:50:32AM +0000, Pavel Begunkov wrote:
> On 3/18/24 08:10, Ming Lei wrote:
> > On Mon, Mar 18, 2024 at 12:41:48AM +0000, Pavel Begunkov wrote:
> > > io_uring_cmd_done() is called from the irq context and is considered to
> > > be irq safe, however that's not the case if the driver requires
> > > cancellations because io_uring_cmd_del_cancelable() could try to take
> > > the uring_lock mutex.
> > >
> > > Clean up the confusion, by deferring cancellation handling to
> > > locked task_work if we came into io_uring_cmd_done() from iowq
> > > or other IO_URING_F_UNLOCKED path.
> > >
> > > Signed-off-by: Pavel Begunkov <[email protected]>
> > > ---
> > > io_uring/uring_cmd.c | 24 +++++++++++++++++-------
> > > 1 file changed, 17 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
> > > index ec38a8d4836d..9590081feb2d 100644
> > > --- a/io_uring/uring_cmd.c
> > > +++ b/io_uring/uring_cmd.c
> > > @@ -14,19 +14,18 @@
> > > #include "rsrc.h"
> > > #include "uring_cmd.h"
> > > -static void io_uring_cmd_del_cancelable(struct io_uring_cmd *cmd,
> > > - unsigned int issue_flags)
> > > +static void io_uring_cmd_del_cancelable(struct io_uring_cmd *cmd)
> > > {
> > > struct io_kiocb *req = cmd_to_io_kiocb(cmd);
> > > struct io_ring_ctx *ctx = req->ctx;
> > > + lockdep_assert_held(&ctx->uring_lock);
> > > +
> > > if (!(cmd->flags & IORING_URING_CMD_CANCELABLE))
> > > return;
> > > cmd->flags &= ~IORING_URING_CMD_CANCELABLE;
> > > - io_ring_submit_lock(ctx, issue_flags);
> > > hlist_del(&req->hash_node);
> > > - io_ring_submit_unlock(ctx, issue_flags);
> > > }
> > > /*
> > > @@ -44,6 +43,9 @@ void io_uring_cmd_mark_cancelable(struct io_uring_cmd *cmd,
> > > struct io_kiocb *req = cmd_to_io_kiocb(cmd);
> > > struct io_ring_ctx *ctx = req->ctx;
> > > + if (WARN_ON_ONCE(ctx->flags & IORING_SETUP_IOPOLL))
> > > + return;
> > > +
> >
> > This way cancelable commands can't be used in an iopoll context, which
> > isn't reasonable; supporting iopoll has been on the ublk todo list,
> > especially since a single ring context is shared for both commands and normal IO.
>
> That's something that can be solved when it's needed, and hence the
> warning so it's not missed. That would need del_cancelable on the
> ->iopoll side, but depends on the "leaving in cancel queue"
> problem resolution.
The current code is actually working with iopoll, so adding the warning
can be a regression. Maybe someone has been using ublk with iopoll.
Thanks,
Ming
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH v2 03/14] io_uring/cmd: make io_uring_cmd_done irq safe
2024-03-18 11:59 ` Ming Lei
@ 2024-03-18 12:46 ` Pavel Begunkov
2024-03-18 13:09 ` Ming Lei
0 siblings, 1 reply; 41+ messages in thread
From: Pavel Begunkov @ 2024-03-18 12:46 UTC (permalink / raw)
To: Ming Lei; +Cc: io-uring, linux-block, Jens Axboe, Kanchan Joshi
On 3/18/24 11:59, Ming Lei wrote:
> On Mon, Mar 18, 2024 at 11:50:32AM +0000, Pavel Begunkov wrote:
>> On 3/18/24 08:10, Ming Lei wrote:
>>> On Mon, Mar 18, 2024 at 12:41:48AM +0000, Pavel Begunkov wrote:
>>>> io_uring_cmd_done() is called from the irq context and is considered to
>>>> be irq safe, however that's not the case if the driver requires
>>>> cancellations because io_uring_cmd_del_cancelable() could try to take
>>>> the uring_lock mutex.
>>>>
>>>> Clean up the confusion by deferring cancellation handling to
>>>> locked task_work if we came into io_uring_cmd_done() from iowq
>>>> or another IO_URING_F_UNLOCKED path.
>>>>
>>>> Signed-off-by: Pavel Begunkov <[email protected]>
>>>> ---
>>>> io_uring/uring_cmd.c | 24 +++++++++++++++++-------
>>>> 1 file changed, 17 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
>>>> index ec38a8d4836d..9590081feb2d 100644
>>>> --- a/io_uring/uring_cmd.c
>>>> +++ b/io_uring/uring_cmd.c
>>>> @@ -14,19 +14,18 @@
>>>> #include "rsrc.h"
>>>> #include "uring_cmd.h"
>>>> -static void io_uring_cmd_del_cancelable(struct io_uring_cmd *cmd,
>>>> - unsigned int issue_flags)
>>>> +static void io_uring_cmd_del_cancelable(struct io_uring_cmd *cmd)
>>>> {
>>>> struct io_kiocb *req = cmd_to_io_kiocb(cmd);
>>>> struct io_ring_ctx *ctx = req->ctx;
>>>> + lockdep_assert_held(&ctx->uring_lock);
>>>> +
>>>> if (!(cmd->flags & IORING_URING_CMD_CANCELABLE))
>>>> return;
>>>> cmd->flags &= ~IORING_URING_CMD_CANCELABLE;
>>>> - io_ring_submit_lock(ctx, issue_flags);
>>>> hlist_del(&req->hash_node);
>>>> - io_ring_submit_unlock(ctx, issue_flags);
>>>> }
>>>> /*
>>>> @@ -44,6 +43,9 @@ void io_uring_cmd_mark_cancelable(struct io_uring_cmd *cmd,
>>>> struct io_kiocb *req = cmd_to_io_kiocb(cmd);
>>>> struct io_ring_ctx *ctx = req->ctx;
>>>> + if (WARN_ON_ONCE(ctx->flags & IORING_SETUP_IOPOLL))
>>>> + return;
>>>> +
>>>
>>> This way cancelable commands can't be used in an iopoll context, which
>>> isn't reasonable; supporting iopoll has been on the ublk todo list,
>>> especially since a single ring context is shared for both commands and normal IO.
>>
>> That's something that can be solved when it's needed, and hence the
>> warning so it's not missed. That would need del_cancelable on the
>> ->iopoll side, but depends on the "leaving in cancel queue"
>> problem resolution.
>
> The current code is actually working with iopoll, so adding the warning
> can be a regression. Maybe someone has been using ublk with iopoll.
Hmm, I don't see ->uring_cmd_iopoll implemented anywhere, and w/o
it io_uring should fail the request:
int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags)
{
...
if (ctx->flags & IORING_SETUP_IOPOLL) {
if (!file->f_op->uring_cmd_iopoll)
return -EOPNOTSUPP;
issue_flags |= IO_URING_F_IOPOLL;
}
}
Did I miss it somewhere?
--
Pavel Begunkov
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH v2 03/14] io_uring/cmd: make io_uring_cmd_done irq safe
2024-03-18 12:46 ` Pavel Begunkov
@ 2024-03-18 13:09 ` Ming Lei
0 siblings, 0 replies; 41+ messages in thread
From: Ming Lei @ 2024-03-18 13:09 UTC (permalink / raw)
To: Pavel Begunkov; +Cc: io-uring, linux-block, Jens Axboe, Kanchan Joshi
On Mon, Mar 18, 2024 at 12:46:25PM +0000, Pavel Begunkov wrote:
> On 3/18/24 11:59, Ming Lei wrote:
> > On Mon, Mar 18, 2024 at 11:50:32AM +0000, Pavel Begunkov wrote:
> > > On 3/18/24 08:10, Ming Lei wrote:
> > > > On Mon, Mar 18, 2024 at 12:41:48AM +0000, Pavel Begunkov wrote:
> > > > > io_uring_cmd_done() is called from the irq context and is considered to
> > > > > be irq safe, however that's not the case if the driver requires
> > > > > cancellations because io_uring_cmd_del_cancelable() could try to take
> > > > > the uring_lock mutex.
> > > > >
> > > > > Clean up the confusion by deferring cancellation handling to
> > > > > locked task_work if we came into io_uring_cmd_done() from iowq
> > > > > or another IO_URING_F_UNLOCKED path.
> > > > >
> > > > > Signed-off-by: Pavel Begunkov <[email protected]>
> > > > > ---
> > > > > io_uring/uring_cmd.c | 24 +++++++++++++++++-------
> > > > > 1 file changed, 17 insertions(+), 7 deletions(-)
> > > > >
> > > > > diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
> > > > > index ec38a8d4836d..9590081feb2d 100644
> > > > > --- a/io_uring/uring_cmd.c
> > > > > +++ b/io_uring/uring_cmd.c
> > > > > @@ -14,19 +14,18 @@
> > > > > #include "rsrc.h"
> > > > > #include "uring_cmd.h"
> > > > > -static void io_uring_cmd_del_cancelable(struct io_uring_cmd *cmd,
> > > > > - unsigned int issue_flags)
> > > > > +static void io_uring_cmd_del_cancelable(struct io_uring_cmd *cmd)
> > > > > {
> > > > > struct io_kiocb *req = cmd_to_io_kiocb(cmd);
> > > > > struct io_ring_ctx *ctx = req->ctx;
> > > > > + lockdep_assert_held(&ctx->uring_lock);
> > > > > +
> > > > > if (!(cmd->flags & IORING_URING_CMD_CANCELABLE))
> > > > > return;
> > > > > cmd->flags &= ~IORING_URING_CMD_CANCELABLE;
> > > > > - io_ring_submit_lock(ctx, issue_flags);
> > > > > hlist_del(&req->hash_node);
> > > > > - io_ring_submit_unlock(ctx, issue_flags);
> > > > > }
> > > > > /*
> > > > > @@ -44,6 +43,9 @@ void io_uring_cmd_mark_cancelable(struct io_uring_cmd *cmd,
> > > > > struct io_kiocb *req = cmd_to_io_kiocb(cmd);
> > > > > struct io_ring_ctx *ctx = req->ctx;
> > > > > + if (WARN_ON_ONCE(ctx->flags & IORING_SETUP_IOPOLL))
> > > > > + return;
> > > > > +
> > > >
> > > > This way cancelable commands can't be used in an iopoll context, which
> > > > isn't reasonable; supporting iopoll has been on the ublk todo list,
> > > > especially since a single ring context is shared for both commands and normal IO.
> > >
> > > That's something that can be solved when it's needed, and hence the
> > > warning so it's not missed. That would need del_cancelable on the
> > > ->iopoll side, but depends on the "leaving in cancel queue"
> > > problem resolution.
> >
> > The current code is actually working with iopoll, so adding the warning
> > can be a regression. Maybe someone has been using ublk with iopoll.
>
> Hmm, I don't see ->uring_cmd_iopoll implemented anywhere, and w/o
> it io_uring should fail the request:
>
> int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags)
> {
> ...
> if (ctx->flags & IORING_SETUP_IOPOLL) {
> if (!file->f_op->uring_cmd_iopoll)
> return -EOPNOTSUPP;
> issue_flags |= IO_URING_F_IOPOLL;
> }
> }
>
> Did I miss it somewhere?
Looks like I missed the point; the WARN() is fine now.
Thanks,
Ming
^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH v2 04/14] io_uring/cmd: introduce io_uring_cmd_complete
2024-03-18 0:41 [PATCH v2 00/14] remove aux CQE caches Pavel Begunkov
` (2 preceding siblings ...)
2024-03-18 0:41 ` [PATCH v2 03/14] io_uring/cmd: make io_uring_cmd_done irq safe Pavel Begunkov
@ 2024-03-18 0:41 ` Pavel Begunkov
2024-03-18 0:41 ` [PATCH v2 05/14] ublk: don't hard code IO_URING_F_UNLOCKED Pavel Begunkov
` (9 subsequent siblings)
13 siblings, 0 replies; 41+ messages in thread
From: Pavel Begunkov @ 2024-03-18 0:41 UTC (permalink / raw)
To: io-uring; +Cc: linux-block, Jens Axboe, asml.silence, Kanchan Joshi, Ming Lei
io_uring_cmd_complete() does exactly what io_uring_cmd_done() does, that
is completing the request, but doesn't take an issue_flags argument. We
have a couple of users hardcoding some random issue_flags values in
drivers, which they absolutely should not do. This function will be used
to get rid of them. Also, add comments warning users that they're only
allowed to pass issue_flags that were given from io_uring.
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/bb7e81aa31f9c878780d46e379d106124a7aa102.1710514702.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>
---
include/linux/io_uring/cmd.h | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/include/linux/io_uring/cmd.h b/include/linux/io_uring/cmd.h
index e453a997c060..9cbe986eab7d 100644
--- a/include/linux/io_uring/cmd.h
+++ b/include/linux/io_uring/cmd.h
@@ -26,12 +26,25 @@ static inline const void *io_uring_sqe_cmd(const struct io_uring_sqe *sqe)
#if defined(CONFIG_IO_URING)
int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
struct iov_iter *iter, void *ioucmd);
+
+/*
+ * Completes the request, i.e. posts an io_uring CQE and deallocates @ioucmd
+ * and a corresponding io_uring request.
+ *
+ * Note: the caller must never invent the @issue_flags mask, it's only allowed
+ * to pass what has been provided by the core io_uring code.
+ */
void io_uring_cmd_done(struct io_uring_cmd *cmd, ssize_t ret, ssize_t res2,
unsigned issue_flags);
+
void __io_uring_cmd_do_in_task(struct io_uring_cmd *ioucmd,
void (*task_work_cb)(struct io_uring_cmd *, unsigned),
unsigned flags);
+/*
+ * The caller must never invent the @issue_flags mask, it's only allowed
+ * to pass what has been provided by the core io_uring code.
+ */
void io_uring_cmd_mark_cancelable(struct io_uring_cmd *cmd,
unsigned int issue_flags);
@@ -56,6 +69,17 @@ static inline void io_uring_cmd_mark_cancelable(struct io_uring_cmd *cmd,
}
#endif
+/*
+ * Completes the request, i.e. posts an io_uring CQE and deallocates @ioucmd
+ * and a corresponding io_uring request. Similar to io_uring_cmd_done() but
+ * without issue_flags argument.
+ */
+static inline void io_uring_cmd_complete(struct io_uring_cmd *ioucmd,
+ ssize_t ret, ssize_t res2)
+{
+ io_uring_cmd_done(ioucmd, ret, res2, IO_URING_F_UNLOCKED);
+}
+
/* users must follow the IOU_F_TWQ_LAZY_WAKE semantics */
static inline void io_uring_cmd_do_in_task_lazy(struct io_uring_cmd *ioucmd,
void (*task_work_cb)(struct io_uring_cmd *, unsigned))
--
2.44.0
^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH v2 05/14] ublk: don't hard code IO_URING_F_UNLOCKED
2024-03-18 0:41 [PATCH v2 00/14] remove aux CQE caches Pavel Begunkov
` (3 preceding siblings ...)
2024-03-18 0:41 ` [PATCH v2 04/14] io_uring/cmd: introduce io_uring_cmd_complete Pavel Begunkov
@ 2024-03-18 0:41 ` Pavel Begunkov
2024-03-18 8:16 ` Ming Lei
2024-03-18 0:41 ` [PATCH v2 06/14] nvme/io_uring: " Pavel Begunkov
` (8 subsequent siblings)
13 siblings, 1 reply; 41+ messages in thread
From: Pavel Begunkov @ 2024-03-18 0:41 UTC (permalink / raw)
To: io-uring; +Cc: linux-block, Jens Axboe, asml.silence, Kanchan Joshi, Ming Lei
uring_cmd implementations should not try to guess issue_flags, just use
a newly added io_uring_cmd_complete(). We're losing an optimisation in
the cancellation path in ublk_uring_cmd_cancel_fn(), but the assumption
is that we don't care that much about it.
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/2f7bc9fbc98b11412d10b8fd88e58e35614e3147.1710514702.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>
---
drivers/block/ublk_drv.c | 18 ++++++++----------
1 file changed, 8 insertions(+), 10 deletions(-)
diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index bea3d5cf8a83..97dceecadab2 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -1417,8 +1417,7 @@ static bool ublk_abort_requests(struct ublk_device *ub, struct ublk_queue *ubq)
return true;
}
-static void ublk_cancel_cmd(struct ublk_queue *ubq, struct ublk_io *io,
- unsigned int issue_flags)
+static void ublk_cancel_cmd(struct ublk_queue *ubq, struct ublk_io *io)
{
bool done;
@@ -1432,15 +1431,14 @@ static void ublk_cancel_cmd(struct ublk_queue *ubq, struct ublk_io *io,
spin_unlock(&ubq->cancel_lock);
if (!done)
- io_uring_cmd_done(io->cmd, UBLK_IO_RES_ABORT, 0, issue_flags);
+ io_uring_cmd_complete(io->cmd, UBLK_IO_RES_ABORT, 0);
}
/*
* The ublk char device won't be closed when calling cancel fn, so both
* ublk device and queue are guaranteed to be live
*/
-static void ublk_uring_cmd_cancel_fn(struct io_uring_cmd *cmd,
- unsigned int issue_flags)
+static void ublk_uring_cmd_cancel_fn(struct io_uring_cmd *cmd)
{
struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd);
struct ublk_queue *ubq = pdu->ubq;
@@ -1464,7 +1462,7 @@ static void ublk_uring_cmd_cancel_fn(struct io_uring_cmd *cmd,
io = &ubq->ios[pdu->tag];
WARN_ON_ONCE(io->cmd != cmd);
- ublk_cancel_cmd(ubq, io, issue_flags);
+ ublk_cancel_cmd(ubq, io);
if (need_schedule) {
if (ublk_can_use_recovery(ub))
@@ -1484,7 +1482,7 @@ static void ublk_cancel_queue(struct ublk_queue *ubq)
int i;
for (i = 0; i < ubq->q_depth; i++)
- ublk_cancel_cmd(ubq, &ubq->ios[i], IO_URING_F_UNLOCKED);
+ ublk_cancel_cmd(ubq, &ubq->ios[i]);
}
/* Cancel all pending commands, must be called after del_gendisk() returns */
@@ -1777,7 +1775,7 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd,
return -EIOCBQUEUED;
out:
- io_uring_cmd_done(cmd, ret, 0, issue_flags);
+ io_uring_cmd_complete(cmd, ret, 0);
pr_devel("%s: complete: cmd op %d, tag %d ret %x io_flags %x\n",
__func__, cmd_op, tag, ret, io->flags);
return -EIOCBQUEUED;
@@ -1842,7 +1840,7 @@ static void ublk_ch_uring_cmd_cb(struct io_uring_cmd *cmd,
static int ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
{
if (unlikely(issue_flags & IO_URING_F_CANCEL)) {
- ublk_uring_cmd_cancel_fn(cmd, issue_flags);
+ ublk_uring_cmd_cancel_fn(cmd);
return 0;
}
@@ -2930,7 +2928,7 @@ static int ublk_ctrl_uring_cmd(struct io_uring_cmd *cmd,
if (ub)
ublk_put_device(ub);
out:
- io_uring_cmd_done(cmd, ret, 0, issue_flags);
+ io_uring_cmd_complete(cmd, ret, 0);
pr_devel("%s: cmd done ret %d cmd_op %x, dev id %d qid %d\n",
__func__, ret, cmd->cmd_op, header->dev_id, header->queue_id);
return -EIOCBQUEUED;
--
2.44.0
^ permalink raw reply related [flat|nested] 41+ messages in thread
* Re: [PATCH v2 05/14] ublk: don't hard code IO_URING_F_UNLOCKED
2024-03-18 0:41 ` [PATCH v2 05/14] ublk: don't hard code IO_URING_F_UNLOCKED Pavel Begunkov
@ 2024-03-18 8:16 ` Ming Lei
2024-03-18 12:52 ` Pavel Begunkov
0 siblings, 1 reply; 41+ messages in thread
From: Ming Lei @ 2024-03-18 8:16 UTC (permalink / raw)
To: Pavel Begunkov; +Cc: io-uring, linux-block, Jens Axboe, Kanchan Joshi
On Mon, Mar 18, 2024 at 12:41:50AM +0000, Pavel Begunkov wrote:
> uring_cmd implementations should not try to guess issue_flags, just use
> a newly added io_uring_cmd_complete(). We're loosing an optimisation in
> the cancellation path in ublk_uring_cmd_cancel_fn(), but the assumption
> is that we don't care that much about it.
>
> Signed-off-by: Pavel Begunkov <[email protected]>
> Link: https://lore.kernel.org/r/2f7bc9fbc98b11412d10b8fd88e58e35614e3147.1710514702.git.asml.silence@gmail.com
> Signed-off-by: Jens Axboe <[email protected]>
> ---
> drivers/block/ublk_drv.c | 18 ++++++++----------
> 1 file changed, 8 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> index bea3d5cf8a83..97dceecadab2 100644
> --- a/drivers/block/ublk_drv.c
> +++ b/drivers/block/ublk_drv.c
> @@ -1417,8 +1417,7 @@ static bool ublk_abort_requests(struct ublk_device *ub, struct ublk_queue *ubq)
> return true;
> }
>
> -static void ublk_cancel_cmd(struct ublk_queue *ubq, struct ublk_io *io,
> - unsigned int issue_flags)
> +static void ublk_cancel_cmd(struct ublk_queue *ubq, struct ublk_io *io)
> {
> bool done;
>
> @@ -1432,15 +1431,14 @@ static void ublk_cancel_cmd(struct ublk_queue *ubq, struct ublk_io *io,
> spin_unlock(&ubq->cancel_lock);
>
> if (!done)
> - io_uring_cmd_done(io->cmd, UBLK_IO_RES_ABORT, 0, issue_flags);
> + io_uring_cmd_complete(io->cmd, UBLK_IO_RES_ABORT, 0);
> }
>
> /*
> * The ublk char device won't be closed when calling cancel fn, so both
> * ublk device and queue are guaranteed to be live
> */
> -static void ublk_uring_cmd_cancel_fn(struct io_uring_cmd *cmd,
> - unsigned int issue_flags)
> +static void ublk_uring_cmd_cancel_fn(struct io_uring_cmd *cmd)
> {
> struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd);
> struct ublk_queue *ubq = pdu->ubq;
> @@ -1464,7 +1462,7 @@ static void ublk_uring_cmd_cancel_fn(struct io_uring_cmd *cmd,
>
> io = &ubq->ios[pdu->tag];
> WARN_ON_ONCE(io->cmd != cmd);
> - ublk_cancel_cmd(ubq, io, issue_flags);
> + ublk_cancel_cmd(ubq, io);
.cancel_fn is always called with .uring_lock held, so this 'issue_flags' can't
be removed; otherwise a double task run is caused, because .cancel_fn
can be called multiple times if the request stays in ctx->cancelable_uring_cmd.
Thanks,
Ming
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH v2 05/14] ublk: don't hard code IO_URING_F_UNLOCKED
2024-03-18 8:16 ` Ming Lei
@ 2024-03-18 12:52 ` Pavel Begunkov
2024-03-18 13:37 ` Pavel Begunkov
2024-03-18 14:34 ` Ming Lei
0 siblings, 2 replies; 41+ messages in thread
From: Pavel Begunkov @ 2024-03-18 12:52 UTC (permalink / raw)
To: Ming Lei; +Cc: io-uring, linux-block, Jens Axboe, Kanchan Joshi
On 3/18/24 08:16, Ming Lei wrote:
> On Mon, Mar 18, 2024 at 12:41:50AM +0000, Pavel Begunkov wrote:
>> uring_cmd implementations should not try to guess issue_flags, just use
>> a newly added io_uring_cmd_complete(). We're losing an optimisation in
>> the cancellation path in ublk_uring_cmd_cancel_fn(), but the assumption
>> is that we don't care that much about it.
>>
>> Signed-off-by: Pavel Begunkov <[email protected]>
>> Link: https://lore.kernel.org/r/2f7bc9fbc98b11412d10b8fd88e58e35614e3147.1710514702.git.asml.silence@gmail.com
>> Signed-off-by: Jens Axboe <[email protected]>
>> ---
>> drivers/block/ublk_drv.c | 18 ++++++++----------
>> 1 file changed, 8 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
>> index bea3d5cf8a83..97dceecadab2 100644
>> --- a/drivers/block/ublk_drv.c
>> +++ b/drivers/block/ublk_drv.c
>> @@ -1417,8 +1417,7 @@ static bool ublk_abort_requests(struct ublk_device *ub, struct ublk_queue *ubq)
>> return true;
>> }
>>
>> -static void ublk_cancel_cmd(struct ublk_queue *ubq, struct ublk_io *io,
>> - unsigned int issue_flags)
>> +static void ublk_cancel_cmd(struct ublk_queue *ubq, struct ublk_io *io)
>> {
>> bool done;
>>
>> @@ -1432,15 +1431,14 @@ static void ublk_cancel_cmd(struct ublk_queue *ubq, struct ublk_io *io,
>> spin_unlock(&ubq->cancel_lock);
>>
>> if (!done)
>> - io_uring_cmd_done(io->cmd, UBLK_IO_RES_ABORT, 0, issue_flags);
>> + io_uring_cmd_complete(io->cmd, UBLK_IO_RES_ABORT, 0);
>> }
>>
>> /*
>> * The ublk char device won't be closed when calling cancel fn, so both
>> * ublk device and queue are guaranteed to be live
>> */
>> -static void ublk_uring_cmd_cancel_fn(struct io_uring_cmd *cmd,
>> - unsigned int issue_flags)
>> +static void ublk_uring_cmd_cancel_fn(struct io_uring_cmd *cmd)
>> {
>> struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd);
>> struct ublk_queue *ubq = pdu->ubq;
>> @@ -1464,7 +1462,7 @@ static void ublk_uring_cmd_cancel_fn(struct io_uring_cmd *cmd,
>>
>> io = &ubq->ios[pdu->tag];
>> WARN_ON_ONCE(io->cmd != cmd);
>> - ublk_cancel_cmd(ubq, io, issue_flags);
>> + ublk_cancel_cmd(ubq, io);
>
> .cancel_fn is always called with .uring_lock held, so this 'issue_flags' can't
> be removed; otherwise a double task run is caused, because .cancel_fn
> can be called multiple times if the request stays in ctx->cancelable_uring_cmd.
I see, that's exactly why I was asking whether it can be deferred
to tw. Let me see if I can get by without that patch, but honestly
it's a horrible abuse of the ring state. Any ideas how that can be
cleaned up?
--
Pavel Begunkov
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH v2 05/14] ublk: don't hard code IO_URING_F_UNLOCKED
2024-03-18 12:52 ` Pavel Begunkov
@ 2024-03-18 13:37 ` Pavel Begunkov
2024-03-18 14:32 ` Pavel Begunkov
2024-03-18 14:34 ` Ming Lei
1 sibling, 1 reply; 41+ messages in thread
From: Pavel Begunkov @ 2024-03-18 13:37 UTC (permalink / raw)
To: Ming Lei; +Cc: io-uring, linux-block, Jens Axboe, Kanchan Joshi
On 3/18/24 12:52, Pavel Begunkov wrote:
> On 3/18/24 08:16, Ming Lei wrote:
>> On Mon, Mar 18, 2024 at 12:41:50AM +0000, Pavel Begunkov wrote:
>>> uring_cmd implementations should not try to guess issue_flags, just use
>>> a newly added io_uring_cmd_complete(). We're losing an optimisation in
>>> the cancellation path in ublk_uring_cmd_cancel_fn(), but the assumption
>>> is that we don't care that much about it.
>>>
>>> Signed-off-by: Pavel Begunkov <[email protected]>
>>> Link: https://lore.kernel.org/r/2f7bc9fbc98b11412d10b8fd88e58e35614e3147.1710514702.git.asml.silence@gmail.com
>>> Signed-off-by: Jens Axboe <[email protected]>
>>> ---
>>> drivers/block/ublk_drv.c | 18 ++++++++----------
>>> 1 file changed, 8 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
>>> index bea3d5cf8a83..97dceecadab2 100644
>>> --- a/drivers/block/ublk_drv.c
>>> +++ b/drivers/block/ublk_drv.c
>>> @@ -1417,8 +1417,7 @@ static bool ublk_abort_requests(struct ublk_device *ub, struct ublk_queue *ubq)
>>> return true;
>>> }
>>> -static void ublk_cancel_cmd(struct ublk_queue *ubq, struct ublk_io *io,
>>> - unsigned int issue_flags)
>>> +static void ublk_cancel_cmd(struct ublk_queue *ubq, struct ublk_io *io)
>>> {
>>> bool done;
>>> @@ -1432,15 +1431,14 @@ static void ublk_cancel_cmd(struct ublk_queue *ubq, struct ublk_io *io,
>>> spin_unlock(&ubq->cancel_lock);
>>> if (!done)
>>> - io_uring_cmd_done(io->cmd, UBLK_IO_RES_ABORT, 0, issue_flags);
>>> + io_uring_cmd_complete(io->cmd, UBLK_IO_RES_ABORT, 0);
>>> }
>>> /*
>>> * The ublk char device won't be closed when calling cancel fn, so both
>>> * ublk device and queue are guaranteed to be live
>>> */
>>> -static void ublk_uring_cmd_cancel_fn(struct io_uring_cmd *cmd,
>>> - unsigned int issue_flags)
>>> +static void ublk_uring_cmd_cancel_fn(struct io_uring_cmd *cmd)
>>> {
>>> struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd);
>>> struct ublk_queue *ubq = pdu->ubq;
>>> @@ -1464,7 +1462,7 @@ static void ublk_uring_cmd_cancel_fn(struct io_uring_cmd *cmd,
>>> io = &ubq->ios[pdu->tag];
>>> WARN_ON_ONCE(io->cmd != cmd);
>>> - ublk_cancel_cmd(ubq, io, issue_flags);
>>> + ublk_cancel_cmd(ubq, io);
>>
>> .cancel_fn is always called with .uring_lock held, so this 'issue_flags' can't
>> be removed; otherwise a double task run is caused, because .cancel_fn
>> can be called multiple times if the request stays in ctx->cancelable_uring_cmd.
>
> I see, that's exactly why I was asking whether it can be deferred
> to tw. Let me see if I can get by without that patch, but honestly
> it's a horrible abuse of the ring state. Any ideas how that can be
> cleaned up?
I assume io_uring_try_cancel_uring_cmd() can run in parallel with
completions, so there can be two parallel calls to ->uring_cmd
(e.g. io-wq + cancel), which gives me shivers. Also, I'd rather
not cancel another task's requests in place, e.g. from
io_submit_flush_completions(), but it complicates things.
Is there any argument against removing requests from the cancellation
list in io_uring_try_cancel_uring_cmd() before calling ->uring_cmd?
io_uring_try_cancel_uring_cmd() {
lock();
for_each_req() {
remove_req_from_cancel_list(req);
req->file->uring_cmd();
}
unlock();
}
--
Pavel Begunkov
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH v2 05/14] ublk: don't hard code IO_URING_F_UNLOCKED
2024-03-18 13:37 ` Pavel Begunkov
@ 2024-03-18 14:32 ` Pavel Begunkov
2024-03-18 14:39 ` Ming Lei
0 siblings, 1 reply; 41+ messages in thread
From: Pavel Begunkov @ 2024-03-18 14:32 UTC (permalink / raw)
To: Ming Lei; +Cc: io-uring, linux-block, Jens Axboe, Kanchan Joshi
On 3/18/24 13:37, Pavel Begunkov wrote:
> On 3/18/24 12:52, Pavel Begunkov wrote:
>> On 3/18/24 08:16, Ming Lei wrote:
>>> On Mon, Mar 18, 2024 at 12:41:50AM +0000, Pavel Begunkov wrote:
>>>> uring_cmd implementations should not try to guess issue_flags, just use
>>>> a newly added io_uring_cmd_complete(). We're losing an optimisation in
>>>> the cancellation path in ublk_uring_cmd_cancel_fn(), but the assumption
>>>> is that we don't care that much about it.
>>>>
>>>> Signed-off-by: Pavel Begunkov <[email protected]>
>>>> Link: https://lore.kernel.org/r/2f7bc9fbc98b11412d10b8fd88e58e35614e3147.1710514702.git.asml.silence@gmail.com
>>>> Signed-off-by: Jens Axboe <[email protected]>
>>>> ---
>>>> drivers/block/ublk_drv.c | 18 ++++++++----------
>>>> 1 file changed, 8 insertions(+), 10 deletions(-)
>>>>
>>>> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
>>>> index bea3d5cf8a83..97dceecadab2 100644
>>>> --- a/drivers/block/ublk_drv.c
>>>> +++ b/drivers/block/ublk_drv.c
>>>> @@ -1417,8 +1417,7 @@ static bool ublk_abort_requests(struct ublk_device *ub, struct ublk_queue *ubq)
>>>> return true;
>>>> }
>>>> -static void ublk_cancel_cmd(struct ublk_queue *ubq, struct ublk_io *io,
>>>> - unsigned int issue_flags)
>>>> +static void ublk_cancel_cmd(struct ublk_queue *ubq, struct ublk_io *io)
>>>> {
>>>> bool done;
>>>> @@ -1432,15 +1431,14 @@ static void ublk_cancel_cmd(struct ublk_queue *ubq, struct ublk_io *io,
>>>> spin_unlock(&ubq->cancel_lock);
>>>> if (!done)
>>>> - io_uring_cmd_done(io->cmd, UBLK_IO_RES_ABORT, 0, issue_flags);
>>>> + io_uring_cmd_complete(io->cmd, UBLK_IO_RES_ABORT, 0);
>>>> }
>>>> /*
>>>> * The ublk char device won't be closed when calling cancel fn, so both
>>>> * ublk device and queue are guaranteed to be live
>>>> */
>>>> -static void ublk_uring_cmd_cancel_fn(struct io_uring_cmd *cmd,
>>>> - unsigned int issue_flags)
>>>> +static void ublk_uring_cmd_cancel_fn(struct io_uring_cmd *cmd)
>>>> {
>>>> struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd);
>>>> struct ublk_queue *ubq = pdu->ubq;
>>>> @@ -1464,7 +1462,7 @@ static void ublk_uring_cmd_cancel_fn(struct io_uring_cmd *cmd,
>>>> io = &ubq->ios[pdu->tag];
>>>> WARN_ON_ONCE(io->cmd != cmd);
>>>> - ublk_cancel_cmd(ubq, io, issue_flags);
>>>> + ublk_cancel_cmd(ubq, io);
>>>
>>> .cancel_fn is always called with .uring_lock held, so this 'issue_flags' can't
>>> be removed; otherwise a double task run is caused, because .cancel_fn
>>> can be called multiple times if the request stays in ctx->cancelable_uring_cmd.
>>
>> I see, that's exactly why I was asking whether it can be deferred
>> to tw. Let me see if I can get by without that patch, but honestly
>> it's a horrible abuse of the ring state. Any ideas how that can be
>> cleaned up?
>
> I assume io_uring_try_cancel_uring_cmd() can run in parallel with
> completions, so there can be two parallel calls to ->uring_cmd
> (e.g. io-wq + cancel), which gives me shivers. Also, I'd rather
> not cancel another task's requests in place, e.g. from
> io_submit_flush_completions(), but it complicates things.
I'm wrong about flush_completions though; the task there cancels only
its own requests:
io_uring_try_cancel_uring_cmd() {
...
if (!cancel_all && req->task != task)
continue;
}
> Is there any argument against removing requests from the cancellation
> list in io_uring_try_cancel_uring_cmd() before calling ->uring_cmd?
>
> io_uring_try_cancel_uring_cmd() {
> lock();
> for_each_req() {
> remove_req_from_cancel_list(req);
> req->file->uring_cmd();
> }
> unlock();
> }
>
--
Pavel Begunkov
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH v2 05/14] ublk: don't hard code IO_URING_F_UNLOCKED
2024-03-18 14:32 ` Pavel Begunkov
@ 2024-03-18 14:39 ` Ming Lei
0 siblings, 0 replies; 41+ messages in thread
From: Ming Lei @ 2024-03-18 14:39 UTC (permalink / raw)
To: Pavel Begunkov; +Cc: io-uring, linux-block, Jens Axboe, Kanchan Joshi
On Mon, Mar 18, 2024 at 02:32:16PM +0000, Pavel Begunkov wrote:
> On 3/18/24 13:37, Pavel Begunkov wrote:
> > On 3/18/24 12:52, Pavel Begunkov wrote:
> > > On 3/18/24 08:16, Ming Lei wrote:
> > > > On Mon, Mar 18, 2024 at 12:41:50AM +0000, Pavel Begunkov wrote:
> > > > > uring_cmd implementations should not try to guess issue_flags, just use
> > > > > a newly added io_uring_cmd_complete(). We're losing an optimisation in
> > > > > the cancellation path in ublk_uring_cmd_cancel_fn(), but the assumption
> > > > > is that we don't care that much about it.
> > > > >
> > > > > Signed-off-by: Pavel Begunkov <[email protected]>
> > > > > Link: https://lore.kernel.org/r/2f7bc9fbc98b11412d10b8fd88e58e35614e3147.1710514702.git.asml.silence@gmail.com
> > > > > Signed-off-by: Jens Axboe <[email protected]>
> > > > > ---
> > > > > drivers/block/ublk_drv.c | 18 ++++++++----------
> > > > > 1 file changed, 8 insertions(+), 10 deletions(-)
> > > > >
> > > > > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> > > > > index bea3d5cf8a83..97dceecadab2 100644
> > > > > --- a/drivers/block/ublk_drv.c
> > > > > +++ b/drivers/block/ublk_drv.c
> > > > > @@ -1417,8 +1417,7 @@ static bool ublk_abort_requests(struct ublk_device *ub, struct ublk_queue *ubq)
> > > > > return true;
> > > > > }
> > > > > -static void ublk_cancel_cmd(struct ublk_queue *ubq, struct ublk_io *io,
> > > > > - unsigned int issue_flags)
> > > > > +static void ublk_cancel_cmd(struct ublk_queue *ubq, struct ublk_io *io)
> > > > > {
> > > > > bool done;
> > > > > @@ -1432,15 +1431,14 @@ static void ublk_cancel_cmd(struct ublk_queue *ubq, struct ublk_io *io,
> > > > > spin_unlock(&ubq->cancel_lock);
> > > > > if (!done)
> > > > > - io_uring_cmd_done(io->cmd, UBLK_IO_RES_ABORT, 0, issue_flags);
> > > > > + io_uring_cmd_complete(io->cmd, UBLK_IO_RES_ABORT, 0);
> > > > > }
> > > > > /*
> > > > > * The ublk char device won't be closed when calling cancel fn, so both
> > > > > * ublk device and queue are guaranteed to be live
> > > > > */
> > > > > -static void ublk_uring_cmd_cancel_fn(struct io_uring_cmd *cmd,
> > > > > - unsigned int issue_flags)
> > > > > +static void ublk_uring_cmd_cancel_fn(struct io_uring_cmd *cmd)
> > > > > {
> > > > > struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd);
> > > > > struct ublk_queue *ubq = pdu->ubq;
> > > > > @@ -1464,7 +1462,7 @@ static void ublk_uring_cmd_cancel_fn(struct io_uring_cmd *cmd,
> > > > > io = &ubq->ios[pdu->tag];
> > > > > WARN_ON_ONCE(io->cmd != cmd);
> > > > > - ublk_cancel_cmd(ubq, io, issue_flags);
> > > > > + ublk_cancel_cmd(ubq, io);
> > > >
> > > > .cancel_fn is always called with .uring_lock held, so this 'issue_flags' can't
> > > > be removed, otherwise double task run is caused because .cancel_fn
> > > > can be called multiple times if the request stays in ctx->cancelable_uring_cmd.
> > >
> > > I see, that's exactly why I was asking whether it can be deferred
> > > to tw. Let me see if I can get by without that patch, but honestly
> > > it's a horrible abuse of the ring state. Any ideas how that can be
> > > cleaned up?
> >
> > I assume io_uring_try_cancel_uring_cmd() can run in parallel with
> > completions, so there can be two parallel calls to ->uring_cmd
> > (e.g. io-wq + cancel), which gives me shivers. Also, I'd rather
> > not cancel in place requests of another task, as
> > io_submit_flush_completions() does, but it complicates things.
>
> I'm wrong though on flush_completions, the task there cancels only
> its own requests
>
> io_uring_try_cancel_uring_cmd() {
> ...
> if (!cancel_all && req->task != task)
> continue;
> }
>
>
> > Is there any argument against removing requests from the cancellation
> > list in io_uring_try_cancel_uring_cmd() before calling ->uring_cmd?
> >
> > io_uring_try_cancel_uring_cmd() {
> > lock();
> > for_each_req() {
> > remove_req_from_cancel_list(req);
> > req->file->uring_cmd();
> > }
> > unlock();
Also the req may not be ready to cancel in ->uring_cmd(), so it
should be allowed to retry in the future if it isn't canceled this time.
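The scheme under discussion might be modelled in userspace roughly as follows (a sketch only; the `fake_*` names and list layout are invented, not the kernel's): pop each request off the cancelable list before invoking its cancel handler, and re-queue anything the handler reports as not yet ready so a later pass can retry it.

```c
#include <assert.h>
#include <stddef.h>

struct fake_req {
	struct fake_req *next;
	int ready;        /* can this request be cancelled now? */
	int cancelled;
};

struct fake_ctx {
	struct fake_req *cancelable;  /* models ctx->cancelable_uring_cmd */
};

static void list_push(struct fake_req **head, struct fake_req *req)
{
	req->next = *head;
	*head = req;
}

static struct fake_req *list_pop(struct fake_req **head)
{
	struct fake_req *req = *head;

	if (req)
		*head = req->next;
	return req;
}

/* Stand-in for req->file->uring_cmd(..., IO_URING_F_CANCEL). */
static int fake_uring_cmd_cancel(struct fake_req *req)
{
	if (!req->ready)
		return 0;       /* not cancellable yet, caller must retry */
	req->cancelled = 1;
	return 1;
}

/* One pass of the proposed io_uring_try_cancel_uring_cmd() loop:
 * remove the req from the list first, re-add it only on failure. */
static void try_cancel_pass(struct fake_ctx *ctx)
{
	struct fake_req *retry = NULL, *req;

	while ((req = list_pop(&ctx->cancelable)) != NULL) {
		if (!fake_uring_cmd_cancel(req))
			list_push(&retry, req);  /* keep for a later pass */
	}
	ctx->cancelable = retry;
}
```

Since each request is off the list while its handler runs, a concurrent cancellation pass cannot invoke the handler for it twice; a not-yet-ready request simply survives to the next pass.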
Thanks,
Ming
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH v2 05/14] ublk: don't hard code IO_URING_F_UNLOCKED
2024-03-18 12:52 ` Pavel Begunkov
2024-03-18 13:37 ` Pavel Begunkov
@ 2024-03-18 14:34 ` Ming Lei
2024-03-18 15:08 ` Pavel Begunkov
1 sibling, 1 reply; 41+ messages in thread
From: Ming Lei @ 2024-03-18 14:34 UTC (permalink / raw)
To: Pavel Begunkov; +Cc: io-uring, linux-block, Jens Axboe, Kanchan Joshi
On Mon, Mar 18, 2024 at 12:52:33PM +0000, Pavel Begunkov wrote:
> On 3/18/24 08:16, Ming Lei wrote:
> > On Mon, Mar 18, 2024 at 12:41:50AM +0000, Pavel Begunkov wrote:
> > > uring_cmd implementations should not try to guess issue_flags, just use
> > > > > a newly added io_uring_cmd_complete(). We're losing an optimisation in
> > > the cancellation path in ublk_uring_cmd_cancel_fn(), but the assumption
> > > is that we don't care that much about it.
> > >
> > > Signed-off-by: Pavel Begunkov <[email protected]>
> > > Link: https://lore.kernel.org/r/2f7bc9fbc98b11412d10b8fd88e58e35614e3147.1710514702.git.asml.silence@gmail.com
> > > Signed-off-by: Jens Axboe <[email protected]>
> > > ---
> > > drivers/block/ublk_drv.c | 18 ++++++++----------
> > > 1 file changed, 8 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> > > index bea3d5cf8a83..97dceecadab2 100644
> > > --- a/drivers/block/ublk_drv.c
> > > +++ b/drivers/block/ublk_drv.c
> > > @@ -1417,8 +1417,7 @@ static bool ublk_abort_requests(struct ublk_device *ub, struct ublk_queue *ubq)
> > > return true;
> > > }
> > > -static void ublk_cancel_cmd(struct ublk_queue *ubq, struct ublk_io *io,
> > > - unsigned int issue_flags)
> > > +static void ublk_cancel_cmd(struct ublk_queue *ubq, struct ublk_io *io)
> > > {
> > > bool done;
> > > @@ -1432,15 +1431,14 @@ static void ublk_cancel_cmd(struct ublk_queue *ubq, struct ublk_io *io,
> > > spin_unlock(&ubq->cancel_lock);
> > > if (!done)
> > > - io_uring_cmd_done(io->cmd, UBLK_IO_RES_ABORT, 0, issue_flags);
> > > + io_uring_cmd_complete(io->cmd, UBLK_IO_RES_ABORT, 0);
> > > }
> > > /*
> > > * The ublk char device won't be closed when calling cancel fn, so both
> > > * ublk device and queue are guaranteed to be live
> > > */
> > > -static void ublk_uring_cmd_cancel_fn(struct io_uring_cmd *cmd,
> > > - unsigned int issue_flags)
> > > +static void ublk_uring_cmd_cancel_fn(struct io_uring_cmd *cmd)
> > > {
> > > struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd);
> > > struct ublk_queue *ubq = pdu->ubq;
> > > @@ -1464,7 +1462,7 @@ static void ublk_uring_cmd_cancel_fn(struct io_uring_cmd *cmd,
> > > io = &ubq->ios[pdu->tag];
> > > WARN_ON_ONCE(io->cmd != cmd);
> > > - ublk_cancel_cmd(ubq, io, issue_flags);
> > > + ublk_cancel_cmd(ubq, io);
> >
> > .cancel_fn is always called with .uring_lock held, so this 'issue_flags' can't
> > be removed, otherwise double task run is caused because .cancel_fn
> > can be called multiple times if the request stays in ctx->cancelable_uring_cmd.
>
> I see, that's exactly why I was asking whether it can be deferred
> to tw. Let me see if I can get by without that patch, but honestly
> it's a horrible abuse of the ring state. Any ideas how that can be
> cleaned up?
Simply deferring io_uring_cmd_done() in ublk_cancel_cmd() to tw still triggers a
warning in __put_task_struct(), so I'd suggest adding the patch until
it is root-caused & fixed.
Thanks,
Ming
* Re: [PATCH v2 05/14] ublk: don't hard code IO_URING_F_UNLOCKED
2024-03-18 14:34 ` Ming Lei
@ 2024-03-18 15:08 ` Pavel Begunkov
2024-03-18 15:16 ` Ming Lei
0 siblings, 1 reply; 41+ messages in thread
From: Pavel Begunkov @ 2024-03-18 15:08 UTC (permalink / raw)
To: Ming Lei; +Cc: io-uring, linux-block, Jens Axboe, Kanchan Joshi
On 3/18/24 14:34, Ming Lei wrote:
> On Mon, Mar 18, 2024 at 12:52:33PM +0000, Pavel Begunkov wrote:
>> On 3/18/24 08:16, Ming Lei wrote:
>>> On Mon, Mar 18, 2024 at 12:41:50AM +0000, Pavel Begunkov wrote:
>>>> uring_cmd implementations should not try to guess issue_flags, just use
>>>> a newly added io_uring_cmd_complete(). We're losing an optimisation in
>>>> the cancellation path in ublk_uring_cmd_cancel_fn(), but the assumption
>>>> is that we don't care that much about it.
>>>>
>>>> Signed-off-by: Pavel Begunkov <[email protected]>
>>>> Link: https://lore.kernel.org/r/2f7bc9fbc98b11412d10b8fd88e58e35614e3147.1710514702.git.asml.silence@gmail.com
>>>> Signed-off-by: Jens Axboe <[email protected]>
>>>> ---
>>>> drivers/block/ublk_drv.c | 18 ++++++++----------
>>>> 1 file changed, 8 insertions(+), 10 deletions(-)
>>>>
>>>> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
>>>> index bea3d5cf8a83..97dceecadab2 100644
>>>> --- a/drivers/block/ublk_drv.c
>>>> +++ b/drivers/block/ublk_drv.c
>>>> @@ -1417,8 +1417,7 @@ static bool ublk_abort_requests(struct ublk_device *ub, struct ublk_queue *ubq)
>>>> return true;
>>>> }
>>>> -static void ublk_cancel_cmd(struct ublk_queue *ubq, struct ublk_io *io,
>>>> - unsigned int issue_flags)
>>>> +static void ublk_cancel_cmd(struct ublk_queue *ubq, struct ublk_io *io)
>>>> {
>>>> bool done;
>>>> @@ -1432,15 +1431,14 @@ static void ublk_cancel_cmd(struct ublk_queue *ubq, struct ublk_io *io,
>>>> spin_unlock(&ubq->cancel_lock);
>>>> if (!done)
>>>> - io_uring_cmd_done(io->cmd, UBLK_IO_RES_ABORT, 0, issue_flags);
>>>> + io_uring_cmd_complete(io->cmd, UBLK_IO_RES_ABORT, 0);
>>>> }
>>>> /*
>>>> * The ublk char device won't be closed when calling cancel fn, so both
>>>> * ublk device and queue are guaranteed to be live
>>>> */
>>>> -static void ublk_uring_cmd_cancel_fn(struct io_uring_cmd *cmd,
>>>> - unsigned int issue_flags)
>>>> +static void ublk_uring_cmd_cancel_fn(struct io_uring_cmd *cmd)
>>>> {
>>>> struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd);
>>>> struct ublk_queue *ubq = pdu->ubq;
>>>> @@ -1464,7 +1462,7 @@ static void ublk_uring_cmd_cancel_fn(struct io_uring_cmd *cmd,
>>>> io = &ubq->ios[pdu->tag];
>>>> WARN_ON_ONCE(io->cmd != cmd);
>>>> - ublk_cancel_cmd(ubq, io, issue_flags);
>>>> + ublk_cancel_cmd(ubq, io);
>>>
>>> .cancel_fn is always called with .uring_lock held, so this 'issue_flags' can't
>>> be removed, otherwise double task run is caused because .cancel_fn
>>> can be called multiple times if the request stays in ctx->cancelable_uring_cmd.
>>
>> I see, that's exactly why I was asking whether it can be deferred
>> to tw. Let me see if I can get by without that patch, but honestly
>> it's a horrible abuse of the ring state. Any ideas how that can be
>> cleaned up?
>
> Simply deferring io_uring_cmd_done() in ublk_cancel_cmd() to tw still triggers a
> warning in __put_task_struct(), so I'd suggest adding the patch until
> it is root-caused & fixed.
I mean drop the patch[es] changing how ublk passes issue_flags
around, moving cancellation point and all related, and leave it
to later, really hoping we'll figure out how to do it better.
--
Pavel Begunkov
* Re: [PATCH v2 05/14] ublk: don't hard code IO_URING_F_UNLOCKED
2024-03-18 15:08 ` Pavel Begunkov
@ 2024-03-18 15:16 ` Ming Lei
0 siblings, 0 replies; 41+ messages in thread
From: Ming Lei @ 2024-03-18 15:16 UTC (permalink / raw)
To: Pavel Begunkov; +Cc: io-uring, linux-block, Jens Axboe, Kanchan Joshi
On Mon, Mar 18, 2024 at 03:08:19PM +0000, Pavel Begunkov wrote:
> On 3/18/24 14:34, Ming Lei wrote:
> > On Mon, Mar 18, 2024 at 12:52:33PM +0000, Pavel Begunkov wrote:
> > > On 3/18/24 08:16, Ming Lei wrote:
> > > > On Mon, Mar 18, 2024 at 12:41:50AM +0000, Pavel Begunkov wrote:
> > > > > uring_cmd implementations should not try to guess issue_flags, just use
> > > > > a newly added io_uring_cmd_complete(). We're losing an optimisation in
> > > > > the cancellation path in ublk_uring_cmd_cancel_fn(), but the assumption
> > > > > is that we don't care that much about it.
> > > > >
> > > > > Signed-off-by: Pavel Begunkov <[email protected]>
> > > > > Link: https://lore.kernel.org/r/2f7bc9fbc98b11412d10b8fd88e58e35614e3147.1710514702.git.asml.silence@gmail.com
> > > > > Signed-off-by: Jens Axboe <[email protected]>
> > > > > ---
> > > > > drivers/block/ublk_drv.c | 18 ++++++++----------
> > > > > 1 file changed, 8 insertions(+), 10 deletions(-)
> > > > >
> > > > > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> > > > > index bea3d5cf8a83..97dceecadab2 100644
> > > > > --- a/drivers/block/ublk_drv.c
> > > > > +++ b/drivers/block/ublk_drv.c
> > > > > @@ -1417,8 +1417,7 @@ static bool ublk_abort_requests(struct ublk_device *ub, struct ublk_queue *ubq)
> > > > > return true;
> > > > > }
> > > > > -static void ublk_cancel_cmd(struct ublk_queue *ubq, struct ublk_io *io,
> > > > > - unsigned int issue_flags)
> > > > > +static void ublk_cancel_cmd(struct ublk_queue *ubq, struct ublk_io *io)
> > > > > {
> > > > > bool done;
> > > > > @@ -1432,15 +1431,14 @@ static void ublk_cancel_cmd(struct ublk_queue *ubq, struct ublk_io *io,
> > > > > spin_unlock(&ubq->cancel_lock);
> > > > > if (!done)
> > > > > - io_uring_cmd_done(io->cmd, UBLK_IO_RES_ABORT, 0, issue_flags);
> > > > > + io_uring_cmd_complete(io->cmd, UBLK_IO_RES_ABORT, 0);
> > > > > }
> > > > > /*
> > > > > * The ublk char device won't be closed when calling cancel fn, so both
> > > > > * ublk device and queue are guaranteed to be live
> > > > > */
> > > > > -static void ublk_uring_cmd_cancel_fn(struct io_uring_cmd *cmd,
> > > > > - unsigned int issue_flags)
> > > > > +static void ublk_uring_cmd_cancel_fn(struct io_uring_cmd *cmd)
> > > > > {
> > > > > struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd);
> > > > > struct ublk_queue *ubq = pdu->ubq;
> > > > > @@ -1464,7 +1462,7 @@ static void ublk_uring_cmd_cancel_fn(struct io_uring_cmd *cmd,
> > > > > io = &ubq->ios[pdu->tag];
> > > > > WARN_ON_ONCE(io->cmd != cmd);
> > > > > - ublk_cancel_cmd(ubq, io, issue_flags);
> > > > > + ublk_cancel_cmd(ubq, io);
> > > >
> > > > .cancel_fn is always called with .uring_lock held, so this 'issue_flags' can't
> > > > be removed, otherwise double task run is caused because .cancel_fn
> > > > can be called multiple times if the request stays in ctx->cancelable_uring_cmd.
> > >
> > > I see, that's exactly why I was asking whether it can be deferred
> > > to tw. Let me see if I can get by without that patch, but honestly
> > > it's a horrible abuse of the ring state. Any ideas how that can be
> > > cleaned up?
> >
> > Simply deferring io_uring_cmd_done() in ublk_cancel_cmd() to tw still triggers
> > warning in __put_task_struct(), so I'd suggest to add the patch until
> > it is root-cause & fixed.
>
> I mean drop the patch[es] changing how ublk passes issue_flags
> around, moving cancellation point and all related, and leave it
> to later, really hoping we'll figure out how to do it better.
Looks fine for me.
Thanks,
Ming
* [PATCH v2 06/14] nvme/io_uring: don't hard code IO_URING_F_UNLOCKED
2024-03-18 0:41 [PATCH v2 00/14] remove aux CQE caches Pavel Begunkov
` (4 preceding siblings ...)
2024-03-18 0:41 ` [PATCH v2 05/14] ublk: don't hard code IO_URING_F_UNLOCKED Pavel Begunkov
@ 2024-03-18 0:41 ` Pavel Begunkov
2024-03-18 13:26 ` Kanchan Joshi
2024-03-18 0:41 ` [PATCH v2 07/14] io_uring/rw: avoid punting to io-wq directly Pavel Begunkov
` (7 subsequent siblings)
13 siblings, 1 reply; 41+ messages in thread
From: Pavel Begunkov @ 2024-03-18 0:41 UTC (permalink / raw)
To: io-uring; +Cc: linux-block, Jens Axboe, asml.silence, Kanchan Joshi, Ming Lei
uring_cmd implementations should not try to guess issue_flags, use a
freshly added helper io_uring_cmd_complete() instead.
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/c661cc48f3dd4a09ace5f9d93f5d498cbf3de583.1710514702.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>
---
drivers/nvme/host/ioctl.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index 3dfd5ae99ae0..1a7b5af42dbc 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -426,10 +426,13 @@ static enum rq_end_io_ret nvme_uring_cmd_end_io(struct request *req,
* For iopoll, complete it directly.
* Otherwise, move the completion to task work.
*/
- if (blk_rq_is_poll(req))
- nvme_uring_task_cb(ioucmd, IO_URING_F_UNLOCKED);
- else
+ if (blk_rq_is_poll(req)) {
+ if (pdu->bio)
+ blk_rq_unmap_user(pdu->bio);
+ io_uring_cmd_complete(ioucmd, pdu->status, pdu->result);
+ } else {
io_uring_cmd_do_in_task_lazy(ioucmd, nvme_uring_task_cb);
+ }
return RQ_END_IO_FREE;
}
--
2.44.0
* [PATCH v2 07/14] io_uring/rw: avoid punting to io-wq directly
2024-03-18 0:41 [PATCH v2 00/14] remove aux CQE caches Pavel Begunkov
` (5 preceding siblings ...)
2024-03-18 0:41 ` [PATCH v2 06/14] nvme/io_uring: " Pavel Begunkov
@ 2024-03-18 0:41 ` Pavel Begunkov
2024-03-18 0:41 ` [PATCH v2 08/14] io_uring: force tw ctx locking Pavel Begunkov
` (6 subsequent siblings)
13 siblings, 0 replies; 41+ messages in thread
From: Pavel Begunkov @ 2024-03-18 0:41 UTC (permalink / raw)
To: io-uring; +Cc: linux-block, Jens Axboe, asml.silence, Kanchan Joshi, Ming Lei
kiocb_done() shouldn't have to care about specifically redirecting requests
to io-wq. Remove the hop to tw that then queues io-wq work; return -EAGAIN
and let the core io_uring code handle the offloading.
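As a rough userspace model of this control-flow change (illustrative names, not the kernel API): the completion helper returns -EAGAIN instead of queueing io-wq itself, and only the core issue path turns that into a punt.

```c
#include <assert.h>
#include <errno.h>

/* Toy model of the idea above: kiocb_done()-like code reports a
 * reissue as -EAGAIN, and only the core issue path decides to
 * offload.  Not the kernel API. */

enum outcome { COMPLETED, PUNTED_TO_IOWQ };

static int fake_kiocb_done(int reissue)
{
	if (reissue)
		return -EAGAIN;   /* no tw hop, no direct io-wq queueing */
	return 0;
}

static enum outcome fake_core_issue(int reissue)
{
	int ret = fake_kiocb_done(reissue);

	if (ret == -EAGAIN)
		return PUNTED_TO_IOWQ;  /* core code owns the offload */
	return COMPLETED;
}
```

Centralising the punt decision in one caller is the point: the per-opcode path only reports "try again", it no longer chooses the execution context.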
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/595021a813bebf0256e80b95bcc594377096e08b.1710514702.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>
---
io_uring/io_uring.c | 8 ++++----
io_uring/io_uring.h | 1 -
io_uring/rw.c | 8 +-------
3 files changed, 5 insertions(+), 12 deletions(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 5d4b448fdc50..9d7bbdea6db5 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -498,7 +498,7 @@ static void io_prep_async_link(struct io_kiocb *req)
}
}
-void io_queue_iowq(struct io_kiocb *req, struct io_tw_state *ts_dont_use)
+static void io_queue_iowq(struct io_kiocb *req)
{
struct io_kiocb *link = io_prep_linked_timeout(req);
struct io_uring_task *tctx = req->task->io_uring;
@@ -1505,7 +1505,7 @@ void io_req_task_submit(struct io_kiocb *req, struct io_tw_state *ts)
if (unlikely(req->task->flags & PF_EXITING))
io_req_defer_failed(req, -EFAULT);
else if (req->flags & REQ_F_FORCE_ASYNC)
- io_queue_iowq(req, ts);
+ io_queue_iowq(req);
else
io_queue_sqe(req);
}
@@ -2088,7 +2088,7 @@ static void io_queue_async(struct io_kiocb *req, int ret)
break;
case IO_APOLL_ABORTED:
io_kbuf_recycle(req, 0);
- io_queue_iowq(req, NULL);
+ io_queue_iowq(req);
break;
case IO_APOLL_OK:
break;
@@ -2135,7 +2135,7 @@ static void io_queue_sqe_fallback(struct io_kiocb *req)
if (unlikely(req->ctx->drain_active))
io_drain_req(req);
else
- io_queue_iowq(req, NULL);
+ io_queue_iowq(req);
}
}
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 6426ee382276..472ba5692ba8 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -79,7 +79,6 @@ struct file *io_file_get_fixed(struct io_kiocb *req, int fd,
void __io_req_task_work_add(struct io_kiocb *req, unsigned flags);
bool io_alloc_async_data(struct io_kiocb *req);
void io_req_task_queue(struct io_kiocb *req);
-void io_queue_iowq(struct io_kiocb *req, struct io_tw_state *ts_dont_use);
void io_req_task_complete(struct io_kiocb *req, struct io_tw_state *ts);
void io_req_task_queue_fail(struct io_kiocb *req, int ret);
void io_req_task_submit(struct io_kiocb *req, struct io_tw_state *ts);
diff --git a/io_uring/rw.c b/io_uring/rw.c
index 0585ebcc9773..576934dbf833 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -187,12 +187,6 @@ static inline loff_t *io_kiocb_update_pos(struct io_kiocb *req)
return NULL;
}
-static void io_req_task_queue_reissue(struct io_kiocb *req)
-{
- req->io_task_work.func = io_queue_iowq;
- io_req_task_work_add(req);
-}
-
#ifdef CONFIG_BLOCK
static bool io_resubmit_prep(struct io_kiocb *req)
{
@@ -405,7 +399,7 @@ static int kiocb_done(struct io_kiocb *req, ssize_t ret,
if (req->flags & REQ_F_REISSUE) {
req->flags &= ~REQ_F_REISSUE;
if (io_resubmit_prep(req))
- io_req_task_queue_reissue(req);
+ return -EAGAIN;
else
io_req_task_queue_fail(req, final_ret);
}
--
2.44.0
* [PATCH v2 08/14] io_uring: force tw ctx locking
2024-03-18 0:41 [PATCH v2 00/14] remove aux CQE caches Pavel Begunkov
` (6 preceding siblings ...)
2024-03-18 0:41 ` [PATCH v2 07/14] io_uring/rw: avoid punting to io-wq directly Pavel Begunkov
@ 2024-03-18 0:41 ` Pavel Begunkov
2024-03-18 0:41 ` [PATCH v2 09/14] io_uring: remove struct io_tw_state::locked Pavel Begunkov
` (5 subsequent siblings)
13 siblings, 0 replies; 41+ messages in thread
From: Pavel Begunkov @ 2024-03-18 0:41 UTC (permalink / raw)
To: io-uring; +Cc: linux-block, Jens Axboe, asml.silence, Kanchan Joshi, Ming Lei
We can run normal task_work without locking the ctx, however we try to
lock anyway and most handlers prefer or require it locked. It might have
been interesting for a multi-submitter ring with high contention completing
async read/write requests via task_work, however that will still need to
go through io_req_complete_post() and potentially take the lock for
rsrc node putting or some other case.
In other words, it's hard to care about it, so always force the locking.
The case described would also want the lock because of various io_uring caches.
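The locking policy change can be illustrated with a small pthread model (purely a sketch; the real code uses ctx->uring_lock and io_tw_state): the old runner opportunistically trylocked and needed an unlocked fallback, the new one always locks.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t uring_lock = PTHREAD_MUTEX_INITIALIZER;

/* Old scheme: returns 1 if it ran with the lock held, 0 if it had
 * to take the slower unlocked fallback path. */
static int run_tw_trylock(void)
{
	if (pthread_mutex_trylock(&uring_lock) != 0)
		return 0;
	/* ... locked handler work, batched completions ... */
	pthread_mutex_unlock(&uring_lock);
	return 1;
}

/* New scheme: always lock, so handlers can assume the locked
 * (deferred-completion) path unconditionally. */
static int run_tw_locked(void)
{
	pthread_mutex_lock(&uring_lock);
	/* ... handler work ... */
	pthread_mutex_unlock(&uring_lock);
	return 1;
}
```

The trade-off the commit message accepts: under contention the new scheme blocks instead of falling back, but every handler now sees exactly one execution context.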
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/1f7f31f4075e766343055ff0d07482992038d467.1710514702.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>
---
io_uring/io_uring.c | 21 +++++++++------------
1 file changed, 9 insertions(+), 12 deletions(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 9d7bbdea6db5..3184d57f9a35 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1191,8 +1191,9 @@ struct llist_node *io_handle_tw_list(struct llist_node *node,
if (req->ctx != ctx) {
ctx_flush_and_put(ctx, &ts);
ctx = req->ctx;
- /* if not contended, grab and improve batching */
- ts.locked = mutex_trylock(&ctx->uring_lock);
+
+ ts.locked = true;
+ mutex_lock(&ctx->uring_lock);
percpu_ref_get(&ctx->refs);
}
INDIRECT_CALL_2(req->io_task_work.func,
@@ -1453,11 +1454,9 @@ static int __io_run_local_work(struct io_ring_ctx *ctx, struct io_tw_state *ts,
if (io_run_local_work_continue(ctx, ret, min_events))
goto again;
- if (ts->locked) {
- io_submit_flush_completions(ctx);
- if (io_run_local_work_continue(ctx, ret, min_events))
- goto again;
- }
+ io_submit_flush_completions(ctx);
+ if (io_run_local_work_continue(ctx, ret, min_events))
+ goto again;
trace_io_uring_local_work_run(ctx, ret, loops);
return ret;
@@ -1481,14 +1480,12 @@ static inline int io_run_local_work_locked(struct io_ring_ctx *ctx,
static int io_run_local_work(struct io_ring_ctx *ctx, int min_events)
{
- struct io_tw_state ts = {};
+ struct io_tw_state ts = { .locked = true };
int ret;
- ts.locked = mutex_trylock(&ctx->uring_lock);
+ mutex_lock(&ctx->uring_lock);
ret = __io_run_local_work(ctx, &ts, min_events);
- if (ts.locked)
- mutex_unlock(&ctx->uring_lock);
-
+ mutex_unlock(&ctx->uring_lock);
return ret;
}
--
2.44.0
* [PATCH v2 09/14] io_uring: remove struct io_tw_state::locked
2024-03-18 0:41 [PATCH v2 00/14] remove aux CQE caches Pavel Begunkov
` (7 preceding siblings ...)
2024-03-18 0:41 ` [PATCH v2 08/14] io_uring: force tw ctx locking Pavel Begunkov
@ 2024-03-18 0:41 ` Pavel Begunkov
2024-03-18 0:41 ` [PATCH v2 10/14] io_uring: refactor io_fill_cqe_req_aux Pavel Begunkov
` (4 subsequent siblings)
13 siblings, 0 replies; 41+ messages in thread
From: Pavel Begunkov @ 2024-03-18 0:41 UTC (permalink / raw)
To: io-uring; +Cc: linux-block, Jens Axboe, asml.silence, Kanchan Joshi, Ming Lei
ctx is always locked for task_work now, so get rid of struct
io_tw_state::locked. Note I'm stopping one step before removing
io_tw_state altogether, which is now empty, because it still serves the
purpose of indicating which function is a tw callback and forcing users
not to invoke them carelessly out of a wrong context. The removal can
always be done later.
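The resulting contract might be sketched in userspace as follows (toy names, not the kernel's): io_tw_lock() stops taking the lock and degenerates into an assertion that task_work always runs with it held.

```c
#include <assert.h>

struct toy_ctx {
	int uring_lock_held;
};

static void toy_tw_lock(struct toy_ctx *ctx)
{
	/* was: if (!ts->locked) { lock(); ts->locked = true; } */
	assert(ctx->uring_lock_held);   /* models lockdep_assert_held() */
}

/* A tw callback: no "locked" flag to consult any more, it simply
 * relies on the runner's invariant. */
static int toy_tw_callback(struct toy_ctx *ctx)
{
	toy_tw_lock(ctx);
	return 1;   /* would complete via the deferred path */
}

static int toy_run_task_work(struct toy_ctx *ctx)
{
	int ran;

	ctx->uring_lock_held = 1;    /* runner always locks now */
	ran = toy_tw_callback(ctx);
	ctx->uring_lock_held = 0;
	return ran;
}
```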
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/04482ca7bfea9eb47ba483c590aa46c83315a6ea.1710514702.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>
---
include/linux/io_uring_types.h | 2 --
io_uring/io_uring.c | 31 ++++++++-----------------------
io_uring/io_uring.h | 5 +----
io_uring/poll.c | 2 +-
io_uring/rw.c | 6 ++----
io_uring/timeout.c | 8 ++------
io_uring/uring_cmd.c | 8 ++------
io_uring/waitid.c | 2 +-
8 files changed, 17 insertions(+), 47 deletions(-)
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index e24893625085..5a2afbc93887 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -439,8 +439,6 @@ struct io_ring_ctx {
};
struct io_tw_state {
- /* ->uring_lock is taken, callbacks can use io_tw_lock to lock it */
- bool locked;
};
enum {
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 3184d57f9a35..ef089f6367ea 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -252,14 +252,12 @@ static __cold void io_fallback_req_func(struct work_struct *work)
fallback_work.work);
struct llist_node *node = llist_del_all(&ctx->fallback_llist);
struct io_kiocb *req, *tmp;
- struct io_tw_state ts = { .locked = true, };
+ struct io_tw_state ts = {};
percpu_ref_get(&ctx->refs);
mutex_lock(&ctx->uring_lock);
llist_for_each_entry_safe(req, tmp, node, io_task_work.node)
req->io_task_work.func(req, &ts);
- if (WARN_ON_ONCE(!ts.locked))
- return;
io_submit_flush_completions(ctx);
mutex_unlock(&ctx->uring_lock);
percpu_ref_put(&ctx->refs);
@@ -1163,11 +1161,9 @@ static void ctx_flush_and_put(struct io_ring_ctx *ctx, struct io_tw_state *ts)
return;
if (ctx->flags & IORING_SETUP_TASKRUN_FLAG)
atomic_andnot(IORING_SQ_TASKRUN, &ctx->rings->sq_flags);
- if (ts->locked) {
- io_submit_flush_completions(ctx);
- mutex_unlock(&ctx->uring_lock);
- ts->locked = false;
- }
+
+ io_submit_flush_completions(ctx);
+ mutex_unlock(&ctx->uring_lock);
percpu_ref_put(&ctx->refs);
}
@@ -1191,8 +1187,6 @@ struct llist_node *io_handle_tw_list(struct llist_node *node,
if (req->ctx != ctx) {
ctx_flush_and_put(ctx, &ts);
ctx = req->ctx;
-
- ts.locked = true;
mutex_lock(&ctx->uring_lock);
percpu_ref_get(&ctx->refs);
}
@@ -1465,22 +1459,16 @@ static int __io_run_local_work(struct io_ring_ctx *ctx, struct io_tw_state *ts,
static inline int io_run_local_work_locked(struct io_ring_ctx *ctx,
int min_events)
{
- struct io_tw_state ts = { .locked = true, };
- int ret;
+ struct io_tw_state ts = {};
if (llist_empty(&ctx->work_llist))
return 0;
-
- ret = __io_run_local_work(ctx, &ts, min_events);
- /* shouldn't happen! */
- if (WARN_ON_ONCE(!ts.locked))
- mutex_lock(&ctx->uring_lock);
- return ret;
+ return __io_run_local_work(ctx, &ts, min_events);
}
static int io_run_local_work(struct io_ring_ctx *ctx, int min_events)
{
- struct io_tw_state ts = { .locked = true };
+ struct io_tw_state ts = {};
int ret;
mutex_lock(&ctx->uring_lock);
@@ -1708,10 +1696,7 @@ static int io_iopoll_check(struct io_ring_ctx *ctx, long min)
void io_req_task_complete(struct io_kiocb *req, struct io_tw_state *ts)
{
- if (ts->locked)
- io_req_complete_defer(req);
- else
- io_req_complete_post(req, IO_URING_F_UNLOCKED);
+ io_req_complete_defer(req);
}
/*
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 472ba5692ba8..6cad3ef3408b 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -344,10 +344,7 @@ static inline bool io_task_work_pending(struct io_ring_ctx *ctx)
static inline void io_tw_lock(struct io_ring_ctx *ctx, struct io_tw_state *ts)
{
- if (!ts->locked) {
- mutex_lock(&ctx->uring_lock);
- ts->locked = true;
- }
+ lockdep_assert_held(&ctx->uring_lock);
}
/*
diff --git a/io_uring/poll.c b/io_uring/poll.c
index 6db1dcb2c797..8901dd118e50 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -322,7 +322,7 @@ static int io_poll_check_events(struct io_kiocb *req, struct io_tw_state *ts)
__poll_t mask = mangle_poll(req->cqe.res &
req->apoll_events);
- if (!io_fill_cqe_req_aux(req, ts->locked, mask,
+ if (!io_fill_cqe_req_aux(req, true, mask,
IORING_CQE_F_MORE)) {
io_req_set_res(req, mask, 0);
return IOU_POLL_REMOVE_POLL_USE_RES;
diff --git a/io_uring/rw.c b/io_uring/rw.c
index 576934dbf833..c7f9246ff508 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -305,11 +305,9 @@ void io_req_rw_complete(struct io_kiocb *req, struct io_tw_state *ts)
io_req_io_end(req);
- if (req->flags & (REQ_F_BUFFER_SELECTED|REQ_F_BUFFER_RING)) {
- unsigned issue_flags = ts->locked ? 0 : IO_URING_F_UNLOCKED;
+ if (req->flags & (REQ_F_BUFFER_SELECTED|REQ_F_BUFFER_RING))
+ req->cqe.flags |= io_put_kbuf(req, 0);
- req->cqe.flags |= io_put_kbuf(req, issue_flags);
- }
io_req_task_complete(req, ts);
}
diff --git a/io_uring/timeout.c b/io_uring/timeout.c
index 7fd7dbb211d6..0a48e6acd0b2 100644
--- a/io_uring/timeout.c
+++ b/io_uring/timeout.c
@@ -72,10 +72,7 @@ static void io_timeout_complete(struct io_kiocb *req, struct io_tw_state *ts)
struct io_ring_ctx *ctx = req->ctx;
if (!io_timeout_finish(timeout, data)) {
- bool filled;
- filled = io_fill_cqe_req_aux(req, ts->locked, -ETIME,
- IORING_CQE_F_MORE);
- if (filled) {
+ if (io_fill_cqe_req_aux(req, true, -ETIME, IORING_CQE_F_MORE)) {
/* re-arm timer */
spin_lock_irq(&ctx->timeout_lock);
list_add(&timeout->list, ctx->timeout_list.prev);
@@ -301,7 +298,6 @@ int io_timeout_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd)
static void io_req_task_link_timeout(struct io_kiocb *req, struct io_tw_state *ts)
{
- unsigned issue_flags = ts->locked ? 0 : IO_URING_F_UNLOCKED;
struct io_timeout *timeout = io_kiocb_to_cmd(req, struct io_timeout);
struct io_kiocb *prev = timeout->prev;
int ret = -ENOENT;
@@ -313,7 +309,7 @@ static void io_req_task_link_timeout(struct io_kiocb *req, struct io_tw_state *t
.data = prev->cqe.user_data,
};
- ret = io_try_cancel(req->task->io_uring, &cd, issue_flags);
+ ret = io_try_cancel(req->task->io_uring, &cd, 0);
}
io_req_set_res(req, ret ?: -ETIME, 0);
io_req_task_complete(req, ts);
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 9590081feb2d..77c29f6883d3 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -58,13 +58,9 @@ EXPORT_SYMBOL_GPL(io_uring_cmd_mark_cancelable);
static void io_uring_cmd_work(struct io_kiocb *req, struct io_tw_state *ts)
{
struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
- unsigned issue_flags = IO_URING_F_UNLOCKED;
- /* locked task_work executor checks the deffered list completion */
- if (ts->locked)
- issue_flags = IO_URING_F_COMPLETE_DEFER;
-
- ioucmd->task_work_cb(ioucmd, issue_flags);
+ /* task_work executor checks the deffered list completion */
+ ioucmd->task_work_cb(ioucmd, IO_URING_F_COMPLETE_DEFER);
}
void __io_uring_cmd_do_in_task(struct io_uring_cmd *ioucmd,
diff --git a/io_uring/waitid.c b/io_uring/waitid.c
index 77d340666cb9..6362ec20abc0 100644
--- a/io_uring/waitid.c
+++ b/io_uring/waitid.c
@@ -118,7 +118,7 @@ static int io_waitid_finish(struct io_kiocb *req, int ret)
static void io_waitid_complete(struct io_kiocb *req, int ret)
{
struct io_waitid *iw = io_kiocb_to_cmd(req, struct io_waitid);
- struct io_tw_state ts = { .locked = true };
+ struct io_tw_state ts = {};
/* anyone completing better be holding a reference */
WARN_ON_ONCE(!(atomic_read(&iw->refs) & IO_WAITID_REF_MASK));
--
2.44.0
* [PATCH v2 10/14] io_uring: refactor io_fill_cqe_req_aux
2024-03-18 0:41 [PATCH v2 00/14] remove aux CQE caches Pavel Begunkov
` (8 preceding siblings ...)
2024-03-18 0:41 ` [PATCH v2 09/14] io_uring: remove struct io_tw_state::locked Pavel Begunkov
@ 2024-03-18 0:41 ` Pavel Begunkov
2024-03-18 0:41 ` [PATCH v2 11/14] io_uring: get rid of intermediate aux cqe caches Pavel Begunkov
` (3 subsequent siblings)
13 siblings, 0 replies; 41+ messages in thread
From: Pavel Begunkov @ 2024-03-18 0:41 UTC (permalink / raw)
To: io-uring; +Cc: linux-block, Jens Axboe, asml.silence, Kanchan Joshi, Ming Lei
The restriction on multishot execution context disallowing io-wq is
driven by the rules of io_fill_cqe_req_aux(): it should only be called in
the master task context, either from the syscall path or in task_work.
Since task_work now always takes the ctx lock implying
IO_URING_F_COMPLETE_DEFER, we can just assume that the function is
always called with its defer argument set to true.
Kill the argument. Also rename the function for more consistency, as
"fill" in CQE-related functions usually meant raw interfaces that only
copy data into the CQ, without any of the locking, user waking and
other accounting that "post" functions take care of.
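The fill/post naming rule described here can be sketched with a toy CQ (all names invented for illustration): "fill" only copies an entry into the ring, while "post" wraps the copy with locking and a wakeup.

```c
#include <assert.h>

#define CQ_ENTRIES 8

struct toy_cq {
	unsigned long long user_data[CQ_ENTRIES];
	int res[CQ_ENTRIES];
	unsigned tail;
	int locked;
	int wakeups;
};

/* "fill": raw copy into the ring, caller handles synchronisation */
static int toy_fill_cqe(struct toy_cq *cq, unsigned long long ud, int res)
{
	if (cq->tail == CQ_ENTRIES)
		return 0;               /* would overflow */
	cq->user_data[cq->tail] = ud;
	cq->res[cq->tail] = res;
	cq->tail++;
	return 1;
}

/* "post": lock, fill, unlock, wake waiters */
static int toy_post_cqe(struct toy_cq *cq, unsigned long long ud, int res)
{
	int filled;

	cq->locked = 1;
	filled = toy_fill_cqe(cq, ud, res);
	cq->locked = 0;
	cq->wakeups++;
	return filled;
}
```

Under this convention the renamed io_req_post_cqe() sits on the "post" side: it is the variant callers may use without doing the CQ bookkeeping themselves.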
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/777fb7fbd2a3ba526a876fc422288c5f65283f12.1710514702.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>
---
io_uring/io_uring.c | 16 +++-------------
io_uring/io_uring.h | 2 +-
io_uring/net.c | 6 ++----
io_uring/poll.c | 3 +--
io_uring/rw.c | 4 +---
io_uring/timeout.c | 2 +-
6 files changed, 9 insertions(+), 24 deletions(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index ef089f6367ea..30542dda1473 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -913,40 +913,30 @@ static void __io_flush_post_cqes(struct io_ring_ctx *ctx)
state->cqes_count = 0;
}
-static bool __io_post_aux_cqe(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags,
- bool allow_overflow)
+bool io_post_aux_cqe(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags)
{
bool filled;
io_cq_lock(ctx);
filled = io_fill_cqe_aux(ctx, user_data, res, cflags);
- if (!filled && allow_overflow)
+ if (!filled)
filled = io_cqring_event_overflow(ctx, user_data, res, cflags, 0, 0);
io_cq_unlock_post(ctx);
return filled;
}
-bool io_post_aux_cqe(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags)
-{
- return __io_post_aux_cqe(ctx, user_data, res, cflags, true);
-}
-
/*
* A helper for multishot requests posting additional CQEs.
* Should only be used from a task_work including IO_URING_F_MULTISHOT.
*/
-bool io_fill_cqe_req_aux(struct io_kiocb *req, bool defer, s32 res, u32 cflags)
+bool io_req_post_cqe(struct io_kiocb *req, s32 res, u32 cflags)
{
struct io_ring_ctx *ctx = req->ctx;
u64 user_data = req->cqe.user_data;
struct io_uring_cqe *cqe;
lockdep_assert(!io_wq_current_is_worker());
-
- if (!defer)
- return __io_post_aux_cqe(ctx, user_data, res, cflags, false);
-
lockdep_assert_held(&ctx->uring_lock);
if (ctx->submit_state.cqes_count == ARRAY_SIZE(ctx->completion_cqes)) {
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 6cad3ef3408b..4bc96470e591 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -67,7 +67,7 @@ int io_run_task_work_sig(struct io_ring_ctx *ctx);
void io_req_defer_failed(struct io_kiocb *req, s32 res);
void io_req_complete_post(struct io_kiocb *req, unsigned issue_flags);
bool io_post_aux_cqe(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags);
-bool io_fill_cqe_req_aux(struct io_kiocb *req, bool defer, s32 res, u32 cflags);
+bool io_req_post_cqe(struct io_kiocb *req, s32 res, u32 cflags);
void __io_commit_cqring_flush(struct io_ring_ctx *ctx);
struct page **io_pin_pages(unsigned long ubuf, unsigned long len, int *npages);
diff --git a/io_uring/net.c b/io_uring/net.c
index 1e7665ff6ef7..ed798e185bbf 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -706,8 +706,7 @@ static inline bool io_recv_finish(struct io_kiocb *req, int *ret,
* receive from this socket.
*/
if ((req->flags & REQ_F_APOLL_MULTISHOT) && !mshot_finished &&
- io_fill_cqe_req_aux(req, issue_flags & IO_URING_F_COMPLETE_DEFER,
- *ret, cflags | IORING_CQE_F_MORE)) {
+ io_req_post_cqe(req, *ret, cflags | IORING_CQE_F_MORE)) {
struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
int mshot_retry_ret = IOU_ISSUE_SKIP_COMPLETE;
@@ -1428,8 +1427,7 @@ int io_accept(struct io_kiocb *req, unsigned int issue_flags)
if (ret < 0)
return ret;
- if (io_fill_cqe_req_aux(req, issue_flags & IO_URING_F_COMPLETE_DEFER,
- ret, IORING_CQE_F_MORE))
+ if (io_req_post_cqe(req, ret, IORING_CQE_F_MORE))
goto retry;
io_req_set_res(req, ret, 0);
diff --git a/io_uring/poll.c b/io_uring/poll.c
index 8901dd118e50..5d55bbf1de15 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -322,8 +322,7 @@ static int io_poll_check_events(struct io_kiocb *req, struct io_tw_state *ts)
__poll_t mask = mangle_poll(req->cqe.res &
req->apoll_events);
- if (!io_fill_cqe_req_aux(req, true, mask,
- IORING_CQE_F_MORE)) {
+ if (!io_req_post_cqe(req, mask, IORING_CQE_F_MORE)) {
io_req_set_res(req, mask, 0);
return IOU_POLL_REMOVE_POLL_USE_RES;
}
diff --git a/io_uring/rw.c b/io_uring/rw.c
index c7f9246ff508..35216e8adc29 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -955,9 +955,7 @@ int io_read_mshot(struct io_kiocb *req, unsigned int issue_flags)
cflags = io_put_kbuf(req, issue_flags);
rw->len = 0; /* similarly to above, reset len to 0 */
- if (io_fill_cqe_req_aux(req,
- issue_flags & IO_URING_F_COMPLETE_DEFER,
- ret, cflags | IORING_CQE_F_MORE)) {
+ if (io_req_post_cqe(req, ret, cflags | IORING_CQE_F_MORE)) {
if (issue_flags & IO_URING_F_MULTISHOT) {
/*
* Force retry, as we might have more data to
diff --git a/io_uring/timeout.c b/io_uring/timeout.c
index 0a48e6acd0b2..3458ca550b83 100644
--- a/io_uring/timeout.c
+++ b/io_uring/timeout.c
@@ -72,7 +72,7 @@ static void io_timeout_complete(struct io_kiocb *req, struct io_tw_state *ts)
struct io_ring_ctx *ctx = req->ctx;
if (!io_timeout_finish(timeout, data)) {
- if (io_fill_cqe_req_aux(req, true, -ETIME, IORING_CQE_F_MORE)) {
+ if (io_req_post_cqe(req, -ETIME, IORING_CQE_F_MORE)) {
/* re-arm timer */
spin_lock_irq(&ctx->timeout_lock);
list_add(&timeout->list, ctx->timeout_list.prev);
--
2.44.0
^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH v2 11/14] io_uring: get rid of intermediate aux cqe caches
2024-03-18 0:41 [PATCH v2 00/14] remove aux CQE caches Pavel Begunkov
` (9 preceding siblings ...)
2024-03-18 0:41 ` [PATCH v2 10/14] io_uring: refactor io_fill_cqe_req_aux Pavel Begunkov
@ 2024-03-18 0:41 ` Pavel Begunkov
2024-03-18 0:41 ` [PATCH v2 12/14] io_uring: remove current check from complete_post Pavel Begunkov
` (2 subsequent siblings)
13 siblings, 0 replies; 41+ messages in thread
From: Pavel Begunkov @ 2024-03-18 0:41 UTC (permalink / raw)
To: io-uring; +Cc: linux-block, Jens Axboe, asml.silence, Kanchan Joshi, Ming Lei
io_post_aux_cqe(), which is used for multishot requests, delays
completions by putting CQEs into a temporary array for the purpose of
completion lock/flush batching.
DEFER_TASKRUN doesn't need any locking, so for it we can put completions
directly into the CQ and defer post completion handling with a flag.
That leaves !DEFER_TASKRUN, which is not that interesting / hot for
multishot requests, so use conditional locking with a deferred flush
for it.
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/0eb3f55722540a11b036d3c90771220eb082d65e.1710514702.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>
---
include/linux/io_uring_types.h | 3 +-
io_uring/io_uring.c | 64 ++++++++--------------------------
2 files changed, 15 insertions(+), 52 deletions(-)
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 5a2afbc93887..ea7e5488b3be 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -205,6 +205,7 @@ struct io_submit_state {
bool plug_started;
bool need_plug;
+ bool cq_flush;
unsigned short submit_nr;
unsigned int cqes_count;
struct blk_plug plug;
@@ -342,8 +343,6 @@ struct io_ring_ctx {
unsigned cq_last_tm_flush;
} ____cacheline_aligned_in_smp;
- struct io_uring_cqe completion_cqes[16];
-
spinlock_t completion_lock;
/* IRQ completion list, under ->completion_lock */
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 30542dda1473..8c485bcb5cb7 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -176,7 +176,7 @@ static struct ctl_table kernel_io_uring_disabled_table[] = {
static inline void io_submit_flush_completions(struct io_ring_ctx *ctx)
{
if (!wq_list_empty(&ctx->submit_state.compl_reqs) ||
- ctx->submit_state.cqes_count)
+ ctx->submit_state.cq_flush)
__io_submit_flush_completions(ctx);
}
@@ -636,6 +636,12 @@ static inline void __io_cq_lock(struct io_ring_ctx *ctx)
spin_lock(&ctx->completion_lock);
}
+static inline void __io_cq_unlock(struct io_ring_ctx *ctx)
+{
+ if (!ctx->lockless_cq)
+ spin_unlock(&ctx->completion_lock);
+}
+
static inline void io_cq_lock(struct io_ring_ctx *ctx)
__acquires(ctx->completion_lock)
{
@@ -888,31 +894,6 @@ static bool io_fill_cqe_aux(struct io_ring_ctx *ctx, u64 user_data, s32 res,
return false;
}
-static void __io_flush_post_cqes(struct io_ring_ctx *ctx)
- __must_hold(&ctx->uring_lock)
-{
- struct io_submit_state *state = &ctx->submit_state;
- unsigned int i;
-
- lockdep_assert_held(&ctx->uring_lock);
- for (i = 0; i < state->cqes_count; i++) {
- struct io_uring_cqe *cqe = &ctx->completion_cqes[i];
-
- if (!io_fill_cqe_aux(ctx, cqe->user_data, cqe->res, cqe->flags)) {
- if (ctx->lockless_cq) {
- spin_lock(&ctx->completion_lock);
- io_cqring_event_overflow(ctx, cqe->user_data,
- cqe->res, cqe->flags, 0, 0);
- spin_unlock(&ctx->completion_lock);
- } else {
- io_cqring_event_overflow(ctx, cqe->user_data,
- cqe->res, cqe->flags, 0, 0);
- }
- }
- }
- state->cqes_count = 0;
-}
-
bool io_post_aux_cqe(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags)
{
bool filled;
@@ -933,31 +914,16 @@ bool io_post_aux_cqe(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags
bool io_req_post_cqe(struct io_kiocb *req, s32 res, u32 cflags)
{
struct io_ring_ctx *ctx = req->ctx;
- u64 user_data = req->cqe.user_data;
- struct io_uring_cqe *cqe;
+ bool posted;
lockdep_assert(!io_wq_current_is_worker());
lockdep_assert_held(&ctx->uring_lock);
- if (ctx->submit_state.cqes_count == ARRAY_SIZE(ctx->completion_cqes)) {
- __io_cq_lock(ctx);
- __io_flush_post_cqes(ctx);
- /* no need to flush - flush is deferred */
- __io_cq_unlock_post(ctx);
- }
-
- /* For defered completions this is not as strict as it is otherwise,
- * however it's main job is to prevent unbounded posted completions,
- * and in that it works just as well.
- */
- if (test_bit(IO_CHECK_CQ_OVERFLOW_BIT, &ctx->check_cq))
- return false;
-
- cqe = &ctx->completion_cqes[ctx->submit_state.cqes_count++];
- cqe->user_data = user_data;
- cqe->res = res;
- cqe->flags = cflags;
- return true;
+ __io_cq_lock(ctx);
+ posted = io_fill_cqe_aux(ctx, req->cqe.user_data, res, cflags);
+ ctx->submit_state.cq_flush = true;
+ __io_cq_unlock_post(ctx);
+ return posted;
}
static void __io_req_complete_post(struct io_kiocb *req, unsigned issue_flags)
@@ -1551,9 +1517,6 @@ void __io_submit_flush_completions(struct io_ring_ctx *ctx)
struct io_wq_work_node *node;
__io_cq_lock(ctx);
- /* must come first to preserve CQE ordering in failure cases */
- if (state->cqes_count)
- __io_flush_post_cqes(ctx);
__wq_list_for_each(node, &state->compl_reqs) {
struct io_kiocb *req = container_of(node, struct io_kiocb,
comp_list);
@@ -1575,6 +1538,7 @@ void __io_submit_flush_completions(struct io_ring_ctx *ctx)
io_free_batch_list(ctx, state->compl_reqs.first);
INIT_WQ_LIST(&state->compl_reqs);
}
+ ctx->submit_state.cq_flush = false;
}
static unsigned io_cqring_events(struct io_ring_ctx *ctx)
--
2.44.0
^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH v2 12/14] io_uring: remove current check from complete_post
2024-03-18 0:41 [PATCH v2 00/14] remove aux CQE caches Pavel Begunkov
` (10 preceding siblings ...)
2024-03-18 0:41 ` [PATCH v2 11/14] io_uring: get rid of intermediate aux cqe caches Pavel Begunkov
@ 2024-03-18 0:41 ` Pavel Begunkov
2024-03-18 0:41 ` [PATCH v2 13/14] io_uring: refactor io_req_complete_post() Pavel Begunkov
2024-03-18 0:41 ` [PATCH v2 14/14] io_uring: clean up io_lockdep_assert_cq_locked Pavel Begunkov
13 siblings, 0 replies; 41+ messages in thread
From: Pavel Begunkov @ 2024-03-18 0:41 UTC (permalink / raw)
To: io-uring; +Cc: linux-block, Jens Axboe, asml.silence, Kanchan Joshi, Ming Lei
task_work execution is now always locked, and we shouldn't get into
io_req_complete_post() from it. That means that complete_post() is
always called outside of the original task context, so we don't even
need to check current.
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/c6a57b44418fe12d76656f0a1be8c982f5151e20.1710538932.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>
---
io_uring/io_uring.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 8c485bcb5cb7..0b89fab65bdc 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -978,7 +978,7 @@ void io_req_complete_post(struct io_kiocb *req, unsigned issue_flags)
{
struct io_ring_ctx *ctx = req->ctx;
- if (ctx->task_complete && ctx->submitter_task != current) {
+ if (ctx->task_complete) {
req->io_task_work.func = io_req_task_complete;
io_req_task_work_add(req);
} else if (!(issue_flags & IO_URING_F_UNLOCKED) ||
--
2.44.0
* [PATCH v2 13/14] io_uring: refactor io_req_complete_post()
2024-03-18 0:41 [PATCH v2 00/14] remove aux CQE caches Pavel Begunkov
` (11 preceding siblings ...)
2024-03-18 0:41 ` [PATCH v2 12/14] io_uring: remove current check from complete_post Pavel Begunkov
@ 2024-03-18 0:41 ` Pavel Begunkov
2024-03-18 0:41 ` [PATCH v2 14/14] io_uring: clean up io_lockdep_assert_cq_locked Pavel Begunkov
13 siblings, 0 replies; 41+ messages in thread
From: Pavel Begunkov @ 2024-03-18 0:41 UTC (permalink / raw)
To: io-uring; +Cc: linux-block, Jens Axboe, asml.silence, Kanchan Joshi, Ming Lei
Make io_req_complete_post() push all IORING_SETUP_IOPOLL requests to
task_work; it's much cleaner and is what should normally happen. We
couldn't do it before because of the possibility of looping in
complete_post() -> tw -> complete_post() -> ...
Also, unexport the function and inline __io_req_complete_post().
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/f0d46b81e799e36d85d4daf12e2696c022bf88fb.1710538932.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>
---
io_uring/io_uring.c | 29 +++++++++++------------------
io_uring/io_uring.h | 1 -
2 files changed, 11 insertions(+), 19 deletions(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 0b89fab65bdc..68c69b553b17 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -926,11 +926,21 @@ bool io_req_post_cqe(struct io_kiocb *req, s32 res, u32 cflags)
return posted;
}
-static void __io_req_complete_post(struct io_kiocb *req, unsigned issue_flags)
+static void io_req_complete_post(struct io_kiocb *req, unsigned issue_flags)
{
struct io_ring_ctx *ctx = req->ctx;
struct io_rsrc_node *rsrc_node = NULL;
+ /*
+ * Handle special CQ sync cases via task_work. DEFER_TASKRUN requires
+ * the submitter task context, IOPOLL protects with uring_lock.
+ */
+ if (ctx->task_complete || (ctx->flags & IORING_SETUP_IOPOLL)) {
+ req->io_task_work.func = io_req_task_complete;
+ io_req_task_work_add(req);
+ return;
+ }
+
io_cq_lock(ctx);
if (!(req->flags & REQ_F_CQE_SKIP)) {
if (!io_fill_cqe_req(ctx, req))
@@ -974,23 +984,6 @@ static void __io_req_complete_post(struct io_kiocb *req, unsigned issue_flags)
}
}
-void io_req_complete_post(struct io_kiocb *req, unsigned issue_flags)
-{
- struct io_ring_ctx *ctx = req->ctx;
-
- if (ctx->task_complete) {
- req->io_task_work.func = io_req_task_complete;
- io_req_task_work_add(req);
- } else if (!(issue_flags & IO_URING_F_UNLOCKED) ||
- !(ctx->flags & IORING_SETUP_IOPOLL)) {
- __io_req_complete_post(req, issue_flags);
- } else {
- mutex_lock(&ctx->uring_lock);
- __io_req_complete_post(req, issue_flags & ~IO_URING_F_UNLOCKED);
- mutex_unlock(&ctx->uring_lock);
- }
-}
-
void io_req_defer_failed(struct io_kiocb *req, s32 res)
__must_hold(&ctx->uring_lock)
{
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 4bc96470e591..db6cab40bbbf 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -65,7 +65,6 @@ bool io_cqe_cache_refill(struct io_ring_ctx *ctx, bool overflow);
void io_req_cqe_overflow(struct io_kiocb *req);
int io_run_task_work_sig(struct io_ring_ctx *ctx);
void io_req_defer_failed(struct io_kiocb *req, s32 res);
-void io_req_complete_post(struct io_kiocb *req, unsigned issue_flags);
bool io_post_aux_cqe(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags);
bool io_req_post_cqe(struct io_kiocb *req, s32 res, u32 cflags);
void __io_commit_cqring_flush(struct io_ring_ctx *ctx);
--
2.44.0
* [PATCH v2 14/14] io_uring: clean up io_lockdep_assert_cq_locked
2024-03-18 0:41 [PATCH v2 00/14] remove aux CQE caches Pavel Begunkov
` (12 preceding siblings ...)
2024-03-18 0:41 ` [PATCH v2 13/14] io_uring: refactor io_req_complete_post() Pavel Begunkov
@ 2024-03-18 0:41 ` Pavel Begunkov
13 siblings, 0 replies; 41+ messages in thread
From: Pavel Begunkov @ 2024-03-18 0:41 UTC (permalink / raw)
To: io-uring; +Cc: linux-block, Jens Axboe, asml.silence, Kanchan Joshi, Ming Lei
Move CONFIG_PROVE_LOCKING checks inside of io_lockdep_assert_cq_locked()
and kill the else branch.
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/3c7296e943992cf64daa70d0fdfe0d3c87a37c6f.1710538932.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>
---
io_uring/io_uring.h | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index db6cab40bbbf..85f4c8c1e846 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -119,9 +119,9 @@ enum {
void io_eventfd_ops(struct rcu_head *rcu);
void io_activate_pollwq(struct io_ring_ctx *ctx);
-#if defined(CONFIG_PROVE_LOCKING)
static inline void io_lockdep_assert_cq_locked(struct io_ring_ctx *ctx)
{
+#if defined(CONFIG_PROVE_LOCKING)
lockdep_assert(in_task());
if (ctx->flags & IORING_SETUP_IOPOLL) {
@@ -140,12 +140,8 @@ static inline void io_lockdep_assert_cq_locked(struct io_ring_ctx *ctx)
else
lockdep_assert(current == ctx->submitter_task);
}
-}
-#else
-static inline void io_lockdep_assert_cq_locked(struct io_ring_ctx *ctx)
-{
-}
#endif
+}
static inline void io_req_task_work_add(struct io_kiocb *req)
{
--
2.44.0