public inbox for [email protected]
* [PATCH] test/defer: fix deadlock when io_uring_submit fail
@ 2025-01-15 13:10 lizetao
  2025-01-16 14:51 ` Jens Axboe
  0 siblings, 1 reply; 4+ messages in thread
From: lizetao @ 2025-01-15 13:10 UTC
  To: Jens Axboe, Pavel Begunkov; +Cc: [email protected]

While performing fault injection testing, a bug report was triggered:

  FAULT_INJECTION: forcing a failure.
  name fail_usercopy, interval 1, probability 0, space 0, times 0
  CPU: 12 UID: 0 PID: 18795 Comm: defer.t Tainted: G           O       6.13.0-rc6-gf2a0a37b174b #17
  Tainted: [O]=OOT_MODULE
  Hardware name: linux,dummy-virt (DT)
  Call trace:
   show_stack+0x20/0x38 (C)
   dump_stack_lvl+0x78/0x90
   dump_stack+0x1c/0x28
   should_fail_ex+0x544/0x648
   should_fail+0x14/0x20
   should_fail_usercopy+0x1c/0x28
   get_timespec64+0x7c/0x258
   __io_timeout_prep+0x31c/0x798
   io_link_timeout_prep+0x1c/0x30
   io_submit_sqes+0x59c/0x1d50
   __arm64_sys_io_uring_enter+0x8dc/0xfa0
   invoke_syscall+0x74/0x270
   el0_svc_common.constprop.0+0xb4/0x240
   do_el0_svc+0x48/0x68
   el0_svc+0x38/0x78
   el0t_64_sync_handler+0xc8/0xd0
   el0t_64_sync+0x198/0x1a0

The deadlock stack is as follows:

  io_cqring_wait+0xa64/0x1060
  __arm64_sys_io_uring_enter+0x46c/0xfa0
  invoke_syscall+0x74/0x270
  el0_svc_common.constprop.0+0xb4/0x240
  do_el0_svc+0x48/0x68
  el0_svc+0x38/0x78
  el0t_64_sync_handler+0xc8/0xd0
  el0t_64_sync+0x198/0x1a0

This happens because, after the submission fails, the defer.t test case
still waits for completions of requests that were never submitted, so it
blocks forever in io_cqring_wait().
Solve the problem by passing wait_cqes() the number of requests that were
actually submitted.

Fixes: 6f6de47d6126 ("test/defer: Test deferring with drain and links")
Signed-off-by: Li Zetao <[email protected]>
---
 test/defer.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/test/defer.c b/test/defer.c
index b0770ef..2447be0 100644
--- a/test/defer.c
+++ b/test/defer.c
@@ -69,12 +69,12 @@ err:
 	return 1;
 }
 
-static int wait_cqes(struct test_context *ctx)
+static int wait_cqes(struct test_context *ctx, int num)
 {
 	int ret, i;
 	struct io_uring_cqe *cqe;
 
-	for (i = 0; i < ctx->nr; i++) {
+	for (i = 0; i < num; i++) {
 		ret = io_uring_wait_cqe(ctx->ring, &cqe);
 
 		if (ret < 0) {
@@ -105,7 +105,7 @@ static int test_canceled_userdata(struct io_uring *ring)
 		goto err;
 	}
 
-	if (wait_cqes(&ctx))
+	if (wait_cqes(&ctx, ret))
 		goto err;
 
 	for (i = 0; i < nr; i++) {
@@ -139,7 +139,7 @@ static int test_thread_link_cancel(struct io_uring *ring)
 		goto err;
 	}
 
-	if (wait_cqes(&ctx))
+	if (wait_cqes(&ctx, ret))
 		goto err;
 
 	for (i = 0; i < nr; i++) {
@@ -185,7 +185,7 @@ static int test_drain_with_linked_timeout(struct io_uring *ring)
 		goto err;
 	}
 
-	if (wait_cqes(&ctx))
+	if (wait_cqes(&ctx, ret))
 		goto err;
 
 	free_context(&ctx);
@@ -212,7 +212,7 @@ static int run_drained(struct io_uring *ring, int nr)
 		goto err;
 	}
 
-	if (wait_cqes(&ctx))
+	if (wait_cqes(&ctx, ret))
 		goto err;
 
 	free_context(&ctx);
--
2.33.0
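
The effect of the change can be modeled without a real ring. The following is a minimal sketch, not liburing code: `struct fake_ring`, `fake_submit()`, `fake_wait_cqes()` and `run_model()` are hypothetical stand-ins for the ring, `io_uring_submit()`, the test's `wait_cqes()` and a test body. It assumes that every submitted request eventually posts exactly one completion. Where the real test would block forever, the model returns -1.

```c
#include <assert.h>

/* Hypothetical stand-in for a ring: counts completions that will arrive. */
struct fake_ring {
	int cqes_available;
};

/*
 * Stand-in for io_uring_submit(): only 'accepted' of the 'prepared'
 * requests are submitted, and each submitted request yields one CQE
 * (an assumption of this model). Returns the number submitted.
 */
static int fake_submit(struct fake_ring *ring, int prepared, int accepted)
{
	assert(prepared >= 0 && accepted >= 0);
	if (accepted > prepared)
		accepted = prepared;
	ring->cqes_available += accepted;
	return accepted;
}

/*
 * Stand-in for wait_cqes(ctx, num): consume 'num' completions.
 * Returns 0 on success, -1 where the real test would block forever.
 */
static int fake_wait_cqes(struct fake_ring *ring, int num)
{
	int i;

	for (i = 0; i < num; i++) {
		if (ring->cqes_available == 0)
			return -1;	/* no completion will ever arrive */
		ring->cqes_available--;
	}
	return 0;
}

/* Submit a batch, then wait for 'wait_for' completions. */
static int run_model(int prepared, int accepted, int wait_for)
{
	struct fake_ring ring = { 0 };

	fake_submit(&ring, prepared, accepted);
	return fake_wait_cqes(&ring, wait_for);
}
```

In this model, `run_model(4, 2, 4)` corresponds to the old behavior (wait for `ctx->nr`) and reports the hang, while `run_model(4, 2, 2)` corresponds to waiting for the return value of the submit call and completes.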


* Re: [PATCH] test/defer: fix deadlock when io_uring_submit fail
  2025-01-15 13:10 [PATCH] test/defer: fix deadlock when io_uring_submit fail lizetao
@ 2025-01-16 14:51 ` Jens Axboe
  2025-01-18  9:42   ` lizetao
  0 siblings, 1 reply; 4+ messages in thread
From: Jens Axboe @ 2025-01-16 14:51 UTC
  To: lizetao, Pavel Begunkov; +Cc: [email protected]

On 1/15/25 6:10 AM, lizetao wrote:
> This happens because, after the submission fails, the defer.t test case
> still waits for completions of requests that were never submitted, so it
> blocks forever in io_cqring_wait().
> Solve the problem by passing wait_cqes() the number of requests that were
> actually submitted.

I suspect this would be fixed by setting IORING_SETUP_SUBMIT_ALL for
ring init, something probably all/most tests should set.

-- 
Jens Axboe
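
For context, `IORING_SETUP_SUBMIT_ALL` is a ring setup flag that liburing callers pass in the `flags` argument of `io_uring_queue_init()`. With it set, the kernel keeps submitting the rest of a batch after one request errors out during submission instead of stopping there. A simplified model of that difference follows. `model_submit()` and `model_demo()` are hypothetical and only mimic the counting, under the assumption that a request failing at submission still counts as consumed and still posts an error completion.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified, hypothetical model of one submission batch.
 * fails[i] is true if request i errors out while being submitted.
 * Returns how many requests were consumed from the batch.
 */
static int model_submit(const bool *fails, int nr, bool submit_all)
{
	int i;

	assert(nr >= 0);
	for (i = 0; i < nr; i++) {
		/* The errored request itself still counts as consumed. */
		if (fails[i] && !submit_all)
			return i + 1;	/* stop at the first error */
	}
	return nr;
}

/* Batch of four requests in which the second one fails. */
static int model_demo(bool submit_all)
{
	const bool fails[4] = { false, true, false, false };

	return model_submit(fails, 4, submit_all);
}
```

With the flag clear, the model consumes 2 of the 4 requests, so a test that waits for 4 completions would hang. With the flag set, all 4 are consumed and waiting for the full batch is safe.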



* RE: [PATCH] test/defer: fix deadlock when io_uring_submit fail
  2025-01-16 14:51 ` Jens Axboe
@ 2025-01-18  9:42   ` lizetao
  2025-01-18 15:43     ` Jens Axboe
  0 siblings, 1 reply; 4+ messages in thread
From: lizetao @ 2025-01-18  9:42 UTC
  To: Jens Axboe, Pavel Begunkov; +Cc: [email protected]

Hi,

> I suspect this would be fixed by setting IORING_SETUP_SUBMIT_ALL for ring init,
> something probably all/most tests should set.


I tested it and found that IORING_SETUP_SUBMIT_ALL does indeed solve this problem.
Should I fix only this test, or add IORING_SETUP_SUBMIT_ALL to the common path
so that it covers most similar problems?

---
Li Zetao



* Re: [PATCH] test/defer: fix deadlock when io_uring_submit fail
  2025-01-18  9:42   ` lizetao
@ 2025-01-18 15:43     ` Jens Axboe
  0 siblings, 0 replies; 4+ messages in thread
From: Jens Axboe @ 2025-01-18 15:43 UTC
  To: lizetao, Pavel Begunkov; +Cc: [email protected]

On 1/18/25 2:42 AM, lizetao wrote:
> Hi,
> 
>> I suspect this would be fixed by setting IORING_SETUP_SUBMIT_ALL for ring init,
>> something probably all/most tests should set.
> 
> 
> I tested it and found that IORING_SETUP_SUBMIT_ALL does indeed solve
> this problem. Should I fix only this test, or add
> IORING_SETUP_SUBMIT_ALL to the common path so that it covers most
> similar problems?

I think just fix up this one. We really should have all the tests use
t_create_ring*() first, and those helpers should just set SUBMIT_ALL.
But that's a separate change.

-- 
Jens Axboe
