* [PATCH] test/defer: fix deadlock when io_uring_submit fail
From: lizetao @ 2025-01-15 13:10 UTC (permalink / raw)
To: Jens Axboe, Pavel Begunkov; +Cc: [email protected]
While running fault injection tests against defer.t, the following fault was injected:
FAULT_INJECTION: forcing a failure.
name fail_usercopy, interval 1, probability 0, space 0, times 0
CPU: 12 UID: 0 PID: 18795 Comm: defer.t Tainted: G O 6.13.0-rc6-gf2a0a37b174b #17
Tainted: [O]=OOT_MODULE
Hardware name: linux,dummy-virt (DT)
Call trace:
show_stack+0x20/0x38 (C)
dump_stack_lvl+0x78/0x90
dump_stack+0x1c/0x28
should_fail_ex+0x544/0x648
should_fail+0x14/0x20
should_fail_usercopy+0x1c/0x28
get_timespec64+0x7c/0x258
__io_timeout_prep+0x31c/0x798
io_link_timeout_prep+0x1c/0x30
io_submit_sqes+0x59c/0x1d50
__arm64_sys_io_uring_enter+0x8dc/0xfa0
invoke_syscall+0x74/0x270
el0_svc_common.constprop.0+0xb4/0x240
do_el0_svc+0x48/0x68
el0_svc+0x38/0x78
el0t_64_sync_handler+0xc8/0xd0
el0t_64_sync+0x198/0x1a0
defer.t then hangs with the following stack:
io_cqring_wait+0xa64/0x1060
__arm64_sys_io_uring_enter+0x46c/0xfa0
invoke_syscall+0x74/0x270
el0_svc_common.constprop.0+0xb4/0x240
do_el0_svc+0x48/0x68
el0_svc+0x38/0x78
el0t_64_sync_handler+0xc8/0xd0
el0t_64_sync+0x198/0x1a0
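This happens because, once a request fails during submission, the kernel stops submitting and io_uring_submit() returns fewer than the number of SQEs the test prepared, yet wait_cqes() still waits for ctx->nr completions. The SQEs that were never submitted never post a CQE, so io_uring_wait_cqe() blocks forever.
Fix this by passing wait_cqes() the number of requests that were actually submitted (the return value of io_uring_submit()), so it only waits for completions that can arrive.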
Fixes: 6f6de47d6126 ("test/defer: Test deferring with drain and links")
Signed-off-by: Li Zetao <[email protected]>
---
test/defer.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/test/defer.c b/test/defer.c
index b0770ef..2447be0 100644
--- a/test/defer.c
+++ b/test/defer.c
@@ -69,12 +69,12 @@ err:
 	return 1;
 }
 
-static int wait_cqes(struct test_context *ctx)
+static int wait_cqes(struct test_context *ctx, int num)
 {
 	int ret, i;
 	struct io_uring_cqe *cqe;
 
-	for (i = 0; i < ctx->nr; i++) {
+	for (i = 0; i < num; i++) {
 		ret = io_uring_wait_cqe(ctx->ring, &cqe);
 
 		if (ret < 0) {
@@ -105,7 +105,7 @@ static int test_canceled_userdata(struct io_uring *ring)
 		goto err;
 	}
 
-	if (wait_cqes(&ctx))
+	if (wait_cqes(&ctx, ret))
 		goto err;
 
 	for (i = 0; i < nr; i++) {
@@ -139,7 +139,7 @@ static int test_thread_link_cancel(struct io_uring *ring)
 		goto err;
 	}
 
-	if (wait_cqes(&ctx))
+	if (wait_cqes(&ctx, ret))
 		goto err;
 
 	for (i = 0; i < nr; i++) {
@@ -185,7 +185,7 @@ static int test_drain_with_linked_timeout(struct io_uring *ring)
 		goto err;
 	}
 
-	if (wait_cqes(&ctx))
+	if (wait_cqes(&ctx, ret))
 		goto err;
 
 	free_context(&ctx);
@@ -212,7 +212,7 @@ static int run_drained(struct io_uring *ring, int nr)
 		goto err;
 	}
 
-	if (wait_cqes(&ctx))
+	if (wait_cqes(&ctx, ret))
 		goto err;
 
 	free_context(&ctx);
--
2.33.0
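The counting argument behind the fix can be illustrated with a tiny stand-alone model. This is a simplified sketch, not liburing or kernel code: the names model_submit() and model_wait_cqes() are hypothetical, and the model assumes that without IORING_SETUP_SUBMIT_ALL the kernel stops right after the SQE whose prep fails (a simplification), and that every SQE it consumed, including the failed one, eventually posts exactly one CQE.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of io_uring_submit() without IORING_SETUP_SUBMIT_ALL.
 * nr SQEs are queued; the SQE at index fail_at fails prep (-1: none fails).
 * Simplifying assumption: the kernel consumes SQEs up to and including the
 * failing one, then stops; later SQEs stay in the SQ ring, unsubmitted.
 * Returns the number of SQEs consumed, i.e. what io_uring_submit() returns.
 */
static int model_submit(int nr, int fail_at)
{
	assert(nr > 0 && fail_at >= -1 && fail_at < nr);

	if (fail_at < 0)
		return nr;
	return fail_at + 1;
}

/*
 * Hypothetical model of the test's wait_cqes() loop. Assumes each consumed
 * SQE posts exactly one CQE, so `posted` CQEs will eventually arrive.
 * Returns true if all `want` calls to io_uring_wait_cqe() can be satisfied,
 * false if the loop would block forever on a CQE that never comes.
 */
static bool model_wait_cqes(int posted, int want)
{
	return want <= posted;
}
```

For example, with 8 prepared SQEs and a prep failure at index 3, model_submit(8, 3) returns 4. Waiting for ctx->nr (8) completions corresponds to the old wait_cqes() and never finishes; waiting for the submit return value (4), as the patch does, terminates.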
* Re: [PATCH] test/defer: fix deadlock when io_uring_submit fail
From: Jens Axboe @ 2025-01-16 14:51 UTC (permalink / raw)
To: lizetao, Pavel Begunkov; +Cc: [email protected]
On 1/15/25 6:10 AM, lizetao wrote:
> This is because after the submission fails, the defer.t testcase is still waiting to submit the failed request, resulting in an eventual deadlock.
> Solve the problem by telling wait_cqes the number of requests to wait for.
I suspect this would be fixed by setting IORING_SETUP_SUBMIT_ALL for
ring init, something probably all/most tests should set.
--
Jens Axboe
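The effect of the IORING_SETUP_SUBMIT_ALL suggestion can be sketched with the same kind of simplified model, extended with a flag. Again this is hypothetical, not kernel code: model_consumed() and model_wait_terminates() are illustrative names, and the model assumes that with SUBMIT_ALL the kernel keeps consuming SQEs after one fails prep (the failure is reported in that request's CQE), so every queued SQE posts one CQE and waiting for all ctx->nr completions no longer hangs.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model: how many of nr queued SQEs the kernel consumes when
 * the SQE at index fail_at fails prep (-1: none fails).
 * Assumptions: without submit_all, submission stops after the failing SQE;
 * with submit_all (IORING_SETUP_SUBMIT_ALL), the failed request completes
 * with an error CQE and the remaining SQEs are still submitted.
 */
static int model_consumed(int nr, int fail_at, bool submit_all)
{
	assert(nr > 0 && fail_at >= -1 && fail_at < nr);

	if (fail_at < 0 || submit_all)
		return nr;
	return fail_at + 1;
}

/* True if waiting for `want` CQEs terminates, assuming one CQE per consumed SQE. */
static bool model_wait_terminates(int nr, int fail_at, bool submit_all, int want)
{
	return want <= model_consumed(nr, fail_at, submit_all);
}
```

In this model, waiting for all nr completions terminates only when SUBMIT_ALL is set, while waiting only for what io_uring_submit() reported works in either mode.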
* RE: [PATCH] test/defer: fix deadlock when io_uring_submit fail
From: lizetao @ 2025-01-18 9:42 UTC (permalink / raw)
To: Jens Axboe, Pavel Begunkov; +Cc: [email protected]
Hi,
> -----Original Message-----
> From: Jens Axboe <[email protected]>
> Sent: Thursday, January 16, 2025 10:51 PM
> To: lizetao <[email protected]>; Pavel Begunkov <[email protected]>
> Cc: [email protected]
> Subject: Re: [PATCH] test/defer: fix deadlock when io_uring_submit fail
>
> I suspect this would be fixed by setting IORING_SETUP_SUBMIT_ALL for ring init,
> something probably all/most tests should set.
I tested it and found that IORING_SETUP_SUBMIT_ALL does solve this problem.
Should I fix just this test, or add IORING_SETUP_SUBMIT_ALL to the common ring setup path to
head off similar problems in other tests?
>
> --
> Jens Axboe
---
Li Zetao
* Re: [PATCH] test/defer: fix deadlock when io_uring_submit fail
From: Jens Axboe @ 2025-01-18 15:43 UTC (permalink / raw)
To: lizetao, Pavel Begunkov; +Cc: [email protected]
On 1/18/25 2:42 AM, lizetao wrote:
> Hi,
>
>
>
> I tested it and found that IORING_SETUP_SUBMIT_ALL does solve this
> problem. Should I fix just this test, or add IORING_SETUP_SUBMIT_ALL
> to the common ring setup path to head off similar problems in other
> tests?
I think just fix up this one. We really should have all the tests use
t_create_ring*() first, and those helpers should just set SUBMIT_ALL.
But that's a separate change.
--
Jens Axboe