public inbox for [email protected]
* crash on connect
@ 2020-02-20 14:19 Glauber Costa
  2020-02-20 16:17 ` Jens Axboe
  0 siblings, 1 reply; 12+ messages in thread
From: Glauber Costa @ 2020-02-20 14:19 UTC (permalink / raw)
  To: io-uring, Jens Axboe, Avi Kivity

[-- Attachment #1: Type: text/plain, Size: 587 bytes --]

Hi there, me again

Kernel is at 043f0b67f2ab8d1af418056bc0cc6f0623d31347

This test is easier to explain: it essentially issues a connect and a
shutdown right away.

It currently fails due to no fault of io_uring. But every now and then
it crashes (you may have to run more than once to get it to crash)

Instructions are similar to my last test.
Except the test to build is now "tests/unit/connect_test"
Code is at [email protected]:glommer/seastar.git  branch io-uring-connect-crash

Run it with ./build/release/tests/unit/connect_test -- -c1
--reactor-backend=uring

Backtrace attached

[-- Attachment #2: uring-connect.txt --]
[-- Type: text/plain, Size: 4567 bytes --]

[  732.030514] BUG: unable to handle page fault for address: 0000000000002008
[  732.030666] #PF: supervisor write access in kernel mode
[  732.030807] #PF: error_code(0x0002) - not-present page
[  732.030946] PGD 8000000fe304d067 P4D 8000000fe304d067 PUD fe4745067 PMD 0 
[  732.031131] Oops: 0002 [#1] SMP PTI
[  732.031355] CPU: 0 PID: 1656 Comm: connect_test Not tainted 5.6.0-rc1+ #39
[  732.031583] Hardware name: ASUS All Series/X99-A, BIOS 3402 08/18/2016
[  732.031817] RIP: 0010:__io_queue_sqe+0x4ac/0x4f0
[  732.032044] Code: 13 4d 85 d2 75 d8 4c 8b 64 24 18 4c 8b 7c 24 08 e9 c3 fe ff ff 48 8b 43 60 48 85 c0 74 20 48 8b 53 58 48 89 10 48 85 d2 74 04 <48> 89 42 08 48 c7 43 58 00 00 00 00 48 c7 43 60 00 00 00 00 48 8b
[  732.032300] RSP: 0018:ffffb9eec11c7d20 EFLAGS: 00010006
[  732.032564] RAX: ffffe62e7e5b9700 RBX: ffff99966ee25700 RCX: dead000000000122
[  732.032817] RDX: 0000000000002000 RSI: ffff999676b10580 RDI: ffff999676b105b0
[  732.033067] RBP: ffffb9eec11c7db0 R08: ffff99966c3ce848 R09: ffff99966ee25700
[  732.033319] R10: ffffffffa0e639a0 R11: ffff99966ee257a8 R12: 0000000000000000
[  732.033572] R13: ffff999676b105c0 R14: fffffffffffffff5 R15: ffff999663058040
[  732.033827] FS:  00007ffff2897700(0000) GS:ffff99967fa00000(0000) knlGS:0000000000000000
[  732.034080] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  732.034334] CR2: 0000000000002008 CR3: 0000000fe20f6005 CR4: 00000000001606f0
[  732.034596] Call Trace:
[  732.034853]  ? io_poll_queue_proc+0x30/0x30
[  732.035112]  ? kmem_cache_alloc+0x1a4/0x230
[  732.035355]  io_submit_sqes+0x772/0xad0
[  732.035614]  ? __wake_up_common_lock+0x87/0xc0
[  732.035857]  ? sock_has_perm+0x80/0xa0
[  732.036107]  __x64_sys_io_uring_enter+0x253/0x350
[  732.036364]  do_syscall_64+0x5b/0x190
[  732.036615]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  732.036870] RIP: 0033:0x7ffff5b6dc4d
[  732.037125] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 0b 72 0c 00 f7 d8 64 89 01 48
[  732.037525] RSP: 002b:00007ffff28930b8 EFLAGS: 00000246 ORIG_RAX: 00000000000001aa
[  732.037805] RAX: ffffffffffffffda RBX: 0000000000000011 RCX: 00007ffff5b6dc4d
[  732.038083] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000007
[  732.038364] RBP: 00007ffff2893120 R08: 0000000000000000 R09: 0000000000000008
[  732.038647] R10: 0000000000000000 R11: 0000000000000246 R12: 00006160000117e0
[  732.038928] R13: 00007ffff2893260 R14: 00000000013cd408 R15: 000060200002aa10
[  732.039213] Modules linked in: iptable_mangle xt_CHECKSUM iptable_nat xt_MASQUERADE nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables xfs libcrc32c snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm snd_hda_codec irqbypass snd_hwdep snd_hda_core crct10dif_pclmul crc32_pclmul snd_seq ghash_clmulni_intel intel_cstate snd_seq_device eeepc_wmi intel_uncore snd_pcm asus_wmi wmi_bmof iTCO_wdt pcspkr intel_rapl_perf sparse_keymap rfkill snd_timer i2c_i801 iTCO_vendor_support mei_me snd mei intel_wmi_thunderbolt lpc_ich soundcore nouveau video i2c_algo_bit drm_kms_helper cec ttm drm e1000e mxm_wmi nvme crc32c_intel nvme_core wmi fuse
[  732.040520] CR2: 0000000000002008
[  732.040873] ---[ end trace ad0acf94c0df32bf ]---
[  732.041226] RIP: 0010:__io_queue_sqe+0x4ac/0x4f0
[  732.041579] Code: 13 4d 85 d2 75 d8 4c 8b 64 24 18 4c 8b 7c 24 08 e9 c3 fe ff ff 48 8b 43 60 48 85 c0 74 20 48 8b 53 58 48 89 10 48 85 d2 74 04 <48> 89 42 08 48 c7 43 58 00 00 00 00 48 c7 43 60 00 00 00 00 48 8b
[  732.041965] RSP: 0018:ffffb9eec11c7d20 EFLAGS: 00010006
[  732.042343] RAX: ffffe62e7e5b9700 RBX: ffff99966ee25700 RCX: dead000000000122
[  732.042720] RDX: 0000000000002000 RSI: ffff999676b10580 RDI: ffff999676b105b0
[  732.043102] RBP: ffffb9eec11c7db0 R08: ffff99966c3ce848 R09: ffff99966ee25700
[  732.043483] R10: ffffffffa0e639a0 R11: ffff99966ee257a8 R12: 0000000000000000
[  732.043865] R13: ffff999676b105c0 R14: fffffffffffffff5 R15: ffff999663058040
[  732.044246] FS:  00007ffff2897700(0000) GS:ffff99967fa00000(0000) knlGS:0000000000000000
[  732.044627] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  732.045010] CR2: 0000000000002008 CR3: 0000000fe20f6005 CR4: 00000000001606f0

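A minimal sketch for reading the oops above, assuming the register values and the marked bytes in the "Code:" line are copied correctly from the attachment. The bytes at the `<48>` marker, `48 89 42 08`, decode on x86-64 to `mov %rax,0x8(%rdx)`, so the faulting address should be RDX + 8. The helper name `parse_registers` is hypothetical, and nothing here is confirmed elsewhere in the thread; it only checks that the dump is self-consistent.

```python
import re

# Register lines copied from the oops above (only the ones needed here).
OOPS = """\
RAX: ffffe62e7e5b9700 RBX: ffff99966ee25700 RCX: dead000000000122
RDX: 0000000000002000 RSI: ffff999676b10580 RDI: ffff999676b105b0
CR2: 0000000000002008 CR3: 0000000fe20f6005 CR4: 00000000001606f0
"""

def parse_registers(text):
    """Return {name: value} for every 'NAME: <16 hex digits>' pair in the dump."""
    pairs = re.findall(r"\b([A-Z][A-Z0-9]+): ([0-9a-f]{16})\b", text)
    return {name: int(value, 16) for name, value in pairs}

regs = parse_registers(OOPS)

# Faulting instruction: 48 89 42 08 -> mov %rax,0x8(%rdx)
# The store target is therefore RDX + 8.
fault_addr = regs["RDX"] + 0x8

# CR2 holds the address that triggered the page fault.
print(hex(fault_addr), fault_addr == regs["CR2"])
```

The computed address matches CR2 (0x2008), which suggests the store went through a bogus small pointer in RDX (0x2000) rather than a NULL one.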


* Re: crash on connect
  2020-02-20 14:19 crash on connect Glauber Costa
@ 2020-02-20 16:17 ` Jens Axboe
  2020-02-20 16:29   ` Jens Axboe
  0 siblings, 1 reply; 12+ messages in thread
From: Jens Axboe @ 2020-02-20 16:17 UTC (permalink / raw)
  To: Glauber Costa, io-uring, Avi Kivity

On 2/20/20 7:19 AM, Glauber Costa wrote:
> Hi there, me again
> 
> Kernel is at 043f0b67f2ab8d1af418056bc0cc6f0623d31347
> 
> This test is easier to explain: it essentially issues a connect and a
> shutdown right away.
> 
> It currently fails due to no fault of io_uring. But every now and then
> it crashes (you may have to run more than once to get it to crash)
> 
> Instructions are similar to my last test.
> Except the test to build is now "tests/unit/connect_test"
> Code is at [email protected]:glommer/seastar.git  branch io-uring-connect-crash
> 
> Run it with ./build/release/tests/unit/connect_test -- -c1
> --reactor-backend=uring
> 
> Backtrace attached

Perfect thanks, I'll take a look!


-- 
Jens Axboe



* Re: crash on connect
  2020-02-20 16:17 ` Jens Axboe
@ 2020-02-20 16:29   ` Jens Axboe
  2020-02-20 16:34     ` Glauber Costa
  0 siblings, 1 reply; 12+ messages in thread
From: Jens Axboe @ 2020-02-20 16:29 UTC (permalink / raw)
  To: Glauber Costa, io-uring, Avi Kivity

On 2/20/20 9:17 AM, Jens Axboe wrote:
> On 2/20/20 7:19 AM, Glauber Costa wrote:
>> Hi there, me again
>>
>> Kernel is at 043f0b67f2ab8d1af418056bc0cc6f0623d31347
>>
>> This test is easier to explain: it essentially issues a connect and a
>> shutdown right away.
>>
>> It currently fails due to no fault of io_uring. But every now and then
>> it crashes (you may have to run more than once to get it to crash)
>>
>> Instructions are similar to my last test.
>> Except the test to build is now "tests/unit/connect_test"
>> Code is at [email protected]:glommer/seastar.git  branch io-uring-connect-crash
>>
>> Run it with ./build/release/tests/unit/connect_test -- -c1
>> --reactor-backend=uring
>>
>> Backtrace attached
> 
> Perfect thanks, I'll take a look!

Haven't managed to crash it yet, but every run complains:

got to shutdown of 10 with refcnt: 2
Refs being all dropped, calling forget for 10
terminate called after throwing an instance of 'fmt::v6::format_error'
  what():  argument index out of range
unknown location(0): fatal error: in "unixdomain_server": signal: SIGABRT (application abort requested)

Not sure if that's causing it not to fail here.

-- 
Jens Axboe



* Re: crash on connect
  2020-02-20 16:29   ` Jens Axboe
@ 2020-02-20 16:34     ` Glauber Costa
  2020-02-20 16:38       ` Jens Axboe
  0 siblings, 1 reply; 12+ messages in thread
From: Glauber Costa @ 2020-02-20 16:34 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, Avi Kivity

On Thu, Feb 20, 2020 at 11:29 AM Jens Axboe <[email protected]> wrote:
>
> On 2/20/20 9:17 AM, Jens Axboe wrote:
> > On 2/20/20 7:19 AM, Glauber Costa wrote:
> >> Hi there, me again
> >>
> >> Kernel is at 043f0b67f2ab8d1af418056bc0cc6f0623d31347
> >>
> >> This test is easier to explain: it essentially issues a connect and a
> >> shutdown right away.
> >>
> >> It currently fails due to no fault of io_uring. But every now and then
> >> it crashes (you may have to run more than once to get it to crash)
> >>
> >> Instructions are similar to my last test.
> >> Except the test to build is now "tests/unit/connect_test"
> >> Code is at [email protected]:glommer/seastar.git  branch io-uring-connect-crash
> >>
> >> Run it with ./build/release/tests/unit/connect_test -- -c1
> >> --reactor-backend=uring
> >>
> >> Backtrace attached
> >
> > Perfect thanks, I'll take a look!
>
> Haven't managed to crash it yet, but every run complains:
>
> got to shutdown of 10 with refcnt: 2
> Refs being all dropped, calling forget for 10
> terminate called after throwing an instance of 'fmt::v6::format_error'
>   what():  argument index out of range
> unknown location(0): fatal error: in "unixdomain_server": signal: SIGABRT (application abort requested)
>
> Not sure if that's causing it not to fail here.

Ok, that means it "passed". (I was in the process of figuring out
where I got this wrong when I started seeing the crashes)

>
> --
> Jens Axboe
>


* Re: crash on connect
  2020-02-20 16:34     ` Glauber Costa
@ 2020-02-20 16:38       ` Jens Axboe
  2020-02-20 16:52         ` Glauber Costa
  0 siblings, 1 reply; 12+ messages in thread
From: Jens Axboe @ 2020-02-20 16:38 UTC (permalink / raw)
  To: Glauber Costa; +Cc: io-uring, Avi Kivity

On 2/20/20 9:34 AM, Glauber Costa wrote:
> On Thu, Feb 20, 2020 at 11:29 AM Jens Axboe <[email protected]> wrote:
>>
>> On 2/20/20 9:17 AM, Jens Axboe wrote:
>>> On 2/20/20 7:19 AM, Glauber Costa wrote:
>>>> Hi there, me again
>>>>
>>>> Kernel is at 043f0b67f2ab8d1af418056bc0cc6f0623d31347
>>>>
>>>> This test is easier to explain: it essentially issues a connect and a
>>>> shutdown right away.
>>>>
>>>> It currently fails due to no fault of io_uring. But every now and then
>>>> it crashes (you may have to run more than once to get it to crash)
>>>>
>>>> Instructions are similar to my last test.
>>>> Except the test to build is now "tests/unit/connect_test"
>>>> Code is at [email protected]:glommer/seastar.git  branch io-uring-connect-crash
>>>>
>>>> Run it with ./build/release/tests/unit/connect_test -- -c1
>>>> --reactor-backend=uring
>>>>
>>>> Backtrace attached
>>>
>>> Perfect thanks, I'll take a look!
>>
>> Haven't managed to crash it yet, but every run complains:
>>
>> got to shutdown of 10 with refcnt: 2
>> Refs being all dropped, calling forget for 10
>> terminate called after throwing an instance of 'fmt::v6::format_error'
>>   what():  argument index out of range
>> unknown location(0): fatal error: in "unixdomain_server": signal: SIGABRT (application abort requested)
>>
>> Not sure if that's causing it not to fail here.
> 
> Ok, that means it "passed". (I was in the process of figuring out
> where I got this wrong when I started seeing the crashes)

Can you do, in your kernel dir:

$ gdb vmlinux
[...]
(gdb) l *__io_queue_sqe+0x4a

and see what it says?

-- 
Jens Axboe



* Re: crash on connect
  2020-02-20 16:38       ` Jens Axboe
@ 2020-02-20 16:52         ` Glauber Costa
  2020-02-20 17:28           ` Jens Axboe
  0 siblings, 1 reply; 12+ messages in thread
From: Glauber Costa @ 2020-02-20 16:52 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, Avi Kivity

On Thu, Feb 20, 2020 at 11:39 AM Jens Axboe <[email protected]> wrote:
>
> On 2/20/20 9:34 AM, Glauber Costa wrote:
> > On Thu, Feb 20, 2020 at 11:29 AM Jens Axboe <[email protected]> wrote:
> >>
> >> On 2/20/20 9:17 AM, Jens Axboe wrote:
> >>> On 2/20/20 7:19 AM, Glauber Costa wrote:
> >>>> Hi there, me again
> >>>>
> >>>> Kernel is at 043f0b67f2ab8d1af418056bc0cc6f0623d31347
> >>>>
> >>>> This test is easier to explain: it essentially issues a connect and a
> >>>> shutdown right away.
> >>>>
> >>>> It currently fails due to no fault of io_uring. But every now and then
> >>>> it crashes (you may have to run more than once to get it to crash)
> >>>>
> >>>> Instructions are similar to my last test.
> >>>> Except the test to build is now "tests/unit/connect_test"
> >>>> Code is at [email protected]:glommer/seastar.git  branch io-uring-connect-crash
> >>>>
> >>>> Run it with ./build/release/tests/unit/connect_test -- -c1
> >>>> --reactor-backend=uring
> >>>>
> >>>> Backtrace attached
> >>>
> >>> Perfect thanks, I'll take a look!
> >>
> >> Haven't managed to crash it yet, but every run complains:
> >>
> >> got to shutdown of 10 with refcnt: 2
> >> Refs being all dropped, calling forget for 10
> >> terminate called after throwing an instance of 'fmt::v6::format_error'
> >>   what():  argument index out of range
> >> unknown location(0): fatal error: in "unixdomain_server": signal: SIGABRT (application abort requested)
> >>
> >> Not sure if that's causing it not to fail here.
> >
> > Ok, that means it "passed". (I was in the process of figuring out
> > where I got this wrong when I started seeing the crashes)
>
> Can you do, in your kernel dir:
>
> $ gdb vmlinux
> [...]
> (gdb) l *__io_queue_sqe+0x4a
>
> and see what it says?

0xffffffff81375ada is in __io_queue_sqe (fs/io_uring.c:4814).
4809 struct io_kiocb *linked_timeout;
4810 struct io_kiocb *nxt = NULL;
4811 int ret;
4812
4813 again:
4814 linked_timeout = io_prep_linked_timeout(req);
4815
4816 ret = io_issue_sqe(req, sqe, &nxt, true);
4817
4818 /*

(I am not using timeouts, just async_cancel)
>
> --
> Jens Axboe
>


* Re: crash on connect
  2020-02-20 16:52         ` Glauber Costa
@ 2020-02-20 17:28           ` Jens Axboe
  2020-02-20 17:33             ` Glauber Costa
       [not found]             ` <CAD-J=zbdrZJ2nKgH3Ob=QAAM9Ci439T9DduNxvetK9B_52LDOQ@mail.gmail.com>
  0 siblings, 2 replies; 12+ messages in thread
From: Jens Axboe @ 2020-02-20 17:28 UTC (permalink / raw)
  To: Glauber Costa; +Cc: io-uring, Avi Kivity

On 2/20/20 9:52 AM, Glauber Costa wrote:
> On Thu, Feb 20, 2020 at 11:39 AM Jens Axboe <[email protected]> wrote:
>>
>> On 2/20/20 9:34 AM, Glauber Costa wrote:
>>> On Thu, Feb 20, 2020 at 11:29 AM Jens Axboe <[email protected]> wrote:
>>>>
>>>> On 2/20/20 9:17 AM, Jens Axboe wrote:
>>>>> On 2/20/20 7:19 AM, Glauber Costa wrote:
>>>>>> Hi there, me again
>>>>>>
>>>>>> Kernel is at 043f0b67f2ab8d1af418056bc0cc6f0623d31347
>>>>>>
>>>>>> This test is easier to explain: it essentially issues a connect and a
>>>>>> shutdown right away.
>>>>>>
>>>>>> It currently fails due to no fault of io_uring. But every now and then
>>>>>> it crashes (you may have to run more than once to get it to crash)
>>>>>>
>>>>>> Instructions are similar to my last test.
>>>>>> Except the test to build is now "tests/unit/connect_test"
>>>>>> Code is at [email protected]:glommer/seastar.git  branch io-uring-connect-crash
>>>>>>
>>>>>> Run it with ./build/release/tests/unit/connect_test -- -c1
>>>>>> --reactor-backend=uring
>>>>>>
>>>>>> Backtrace attached
>>>>>
>>>>> Perfect thanks, I'll take a look!
>>>>
>>>> Haven't managed to crash it yet, but every run complains:
>>>>
>>>> got to shutdown of 10 with refcnt: 2
>>>> Refs being all dropped, calling forget for 10
>>>> terminate called after throwing an instance of 'fmt::v6::format_error'
>>>>   what():  argument index out of range
>>>> unknown location(0): fatal error: in "unixdomain_server": signal: SIGABRT (application abort requested)
>>>>
>>>> Not sure if that's causing it not to fail here.
>>>
>>> Ok, that means it "passed". (I was in the process of figuring out
>>> where I got this wrong when I started seeing the crashes)
>>
>> Can you do, in your kernel dir:
>>
>> $ gdb vmlinux
>> [...]
>> (gdb) l *__io_queue_sqe+0x4a
>>
>> and see what it says?
> 
> 0xffffffff81375ada is in __io_queue_sqe (fs/io_uring.c:4814).
> 4809 struct io_kiocb *linked_timeout;
> 4810 struct io_kiocb *nxt = NULL;
> 4811 int ret;
> 4812
> 4813 again:
> 4814 linked_timeout = io_prep_linked_timeout(req);
> 4815
> 4816 ret = io_issue_sqe(req, sqe, &nxt, true);
> 4817
> 4818 /*
> 
> (I am not using timeouts, just async_cancel)

Can't seem to hit it here, went through thousands of iterations...
I'll keep trying.

If you have time, you can try and enable CONFIG_KASAN=y and see if
you can hit it with that.

-- 
Jens Axboe



* Re: crash on connect
  2020-02-20 17:28           ` Jens Axboe
@ 2020-02-20 17:33             ` Glauber Costa
       [not found]             ` <CAD-J=zbdrZJ2nKgH3Ob=QAAM9Ci439T9DduNxvetK9B_52LDOQ@mail.gmail.com>
  1 sibling, 0 replies; 12+ messages in thread
From: Glauber Costa @ 2020-02-20 17:33 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, Avi Kivity

On Thu, Feb 20, 2020 at 12:28 PM Jens Axboe <[email protected]> wrote:
>
> On 2/20/20 9:52 AM, Glauber Costa wrote:
> > On Thu, Feb 20, 2020 at 11:39 AM Jens Axboe <[email protected]> wrote:
> >>
> >> On 2/20/20 9:34 AM, Glauber Costa wrote:
> >>> On Thu, Feb 20, 2020 at 11:29 AM Jens Axboe <[email protected]> wrote:
> >>>>
> >>>> On 2/20/20 9:17 AM, Jens Axboe wrote:
> >>>>> On 2/20/20 7:19 AM, Glauber Costa wrote:
> >>>>>> Hi there, me again
> >>>>>>
> >>>>>> Kernel is at 043f0b67f2ab8d1af418056bc0cc6f0623d31347
> >>>>>>
> >>>>>> This test is easier to explain: it essentially issues a connect and a
> >>>>>> shutdown right away.
> >>>>>>
> >>>>>> It currently fails due to no fault of io_uring. But every now and then
> >>>>>> it crashes (you may have to run more than once to get it to crash)
> >>>>>>
> >>>>>> Instructions are similar to my last test.
> >>>>>> Except the test to build is now "tests/unit/connect_test"
> >>>>>> Code is at [email protected]:glommer/seastar.git  branch io-uring-connect-crash
> >>>>>>
> >>>>>> Run it with ./build/release/tests/unit/connect_test -- -c1
> >>>>>> --reactor-backend=uring
> >>>>>>
> >>>>>> Backtrace attached
> >>>>>
> >>>>> Perfect thanks, I'll take a look!
> >>>>
> >>>> Haven't managed to crash it yet, but every run complains:
> >>>>
> >>>> got to shutdown of 10 with refcnt: 2
> >>>> Refs being all dropped, calling forget for 10
> >>>> terminate called after throwing an instance of 'fmt::v6::format_error'
> >>>>   what():  argument index out of range
> >>>> unknown location(0): fatal error: in "unixdomain_server": signal: SIGABRT (application abort requested)
> >>>>
> >>>> Not sure if that's causing it not to fail here.
> >>>
> >>> Ok, that means it "passed". (I was in the process of figuring out
> >>> where I got this wrong when I started seeing the crashes)
> >>
> >> Can you do, in your kernel dir:
> >>
> >> $ gdb vmlinux
> >> [...]
> >> (gdb) l *__io_queue_sqe+0x4a
> >>
> >> and see what it says?
> >
> > 0xffffffff81375ada is in __io_queue_sqe (fs/io_uring.c:4814).
> > 4809 struct io_kiocb *linked_timeout;
> > 4810 struct io_kiocb *nxt = NULL;
> > 4811 int ret;
> > 4812
> > 4813 again:
> > 4814 linked_timeout = io_prep_linked_timeout(req);
> > 4815
> > 4816 ret = io_issue_sqe(req, sqe, &nxt, true);
> > 4817
> > 4818 /*
> >
> > (I am not using timeouts, just async_cancel)
>
> Can't seem to hit it here, went through thousands of iterations...
> I'll keep trying.
>
> If you have time, you can try and enable CONFIG_KASAN=y and see if
> you can hit it with that.

sure thing. will let you know

>
> --
> Jens Axboe
>


* Re: crash on connect
       [not found]             ` <CAD-J=zbdrZJ2nKgH3Ob=QAAM9Ci439T9DduNxvetK9B_52LDOQ@mail.gmail.com>
@ 2020-02-20 19:12               ` Jens Axboe
  2020-02-20 19:19                 ` Glauber Costa
  0 siblings, 1 reply; 12+ messages in thread
From: Jens Axboe @ 2020-02-20 19:12 UTC (permalink / raw)
  To: Glauber Costa; +Cc: io-uring, Avi Kivity

On 2/20/20 11:45 AM, Glauber Costa wrote:
> On Thu, Feb 20, 2020 at 12:28 PM Jens Axboe <[email protected]> wrote:
>>
>> On 2/20/20 9:52 AM, Glauber Costa wrote:
>>> On Thu, Feb 20, 2020 at 11:39 AM Jens Axboe <[email protected]> wrote:
>>>>
>>>> On 2/20/20 9:34 AM, Glauber Costa wrote:
>>>>> On Thu, Feb 20, 2020 at 11:29 AM Jens Axboe <[email protected]> wrote:
>>>>>>
>>>>>> On 2/20/20 9:17 AM, Jens Axboe wrote:
>>>>>>> On 2/20/20 7:19 AM, Glauber Costa wrote:
>>>>>>>> Hi there, me again
>>>>>>>>
>>>>>>>> Kernel is at 043f0b67f2ab8d1af418056bc0cc6f0623d31347
>>>>>>>>
>>>>>>>> This test is easier to explain: it essentially issues a connect and a
>>>>>>>> shutdown right away.
>>>>>>>>
>>>>>>>> It currently fails due to no fault of io_uring. But every now and then
>>>>>>>> it crashes (you may have to run more than once to get it to crash)
>>>>>>>>
>>>>>>>> Instructions are similar to my last test.
>>>>>>>> Except the test to build is now "tests/unit/connect_test"
>>>>>>>> Code is at [email protected]:glommer/seastar.git  branch io-uring-connect-crash
>>>>>>>>
>>>>>>>> Run it with ./build/release/tests/unit/connect_test -- -c1
>>>>>>>> --reactor-backend=uring
>>>>>>>>
>>>>>>>> Backtrace attached
>>>>>>>
>>>>>>> Perfect thanks, I'll take a look!
>>>>>>
>>>>>> Haven't managed to crash it yet, but every run complains:
>>>>>>
>>>>>> got to shutdown of 10 with refcnt: 2
>>>>>> Refs being all dropped, calling forget for 10
>>>>>> terminate called after throwing an instance of 'fmt::v6::format_error'
>>>>>>   what():  argument index out of range
>>>>>> unknown location(0): fatal error: in "unixdomain_server": signal: SIGABRT (application abort requested)
>>>>>>
>>>>>> Not sure if that's causing it not to fail here.
>>>>>
>>>>> Ok, that means it "passed". (I was in the process of figuring out
>>>>> where I got this wrong when I started seeing the crashes)
>>>>
>>>> Can you do, in your kernel dir:
>>>>
>>>> $ gdb vmlinux
>>>> [...]
>>>> (gdb) l *__io_queue_sqe+0x4a
>>>>
>>>> and see what it says?
>>>
>>> 0xffffffff81375ada is in __io_queue_sqe (fs/io_uring.c:4814).
>>> 4809 struct io_kiocb *linked_timeout;
>>> 4810 struct io_kiocb *nxt = NULL;
>>> 4811 int ret;
>>> 4812
>>> 4813 again:
>>> 4814 linked_timeout = io_prep_linked_timeout(req);
>>> 4815
>>> 4816 ret = io_issue_sqe(req, sqe, &nxt, true);
>>> 4817
>>> 4818 /*
>>>
>>> (I am not using timeouts, just async_cancel)
>>
>> Can't seem to hit it here, went through thousands of iterations...
>> I'll keep trying.
>>
>> If you have time, you can try and enable CONFIG_KASAN=y and see if
>> you can hit it with that.
> 
> I can
> 
> Attaching full dmesg

Can you try the latest? It's sha d8154e605f84. Before you do, can you
do the lookup on __io_queue_sqe+0x639 with gdb?

-- 
Jens Axboe



* Re: crash on connect
  2020-02-20 19:12               ` Jens Axboe
@ 2020-02-20 19:19                 ` Glauber Costa
  2020-02-20 19:36                   ` Glauber Costa
  0 siblings, 1 reply; 12+ messages in thread
From: Glauber Costa @ 2020-02-20 19:19 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, Avi Kivity

On Thu, Feb 20, 2020 at 2:12 PM Jens Axboe <[email protected]> wrote:
>
> On 2/20/20 11:45 AM, Glauber Costa wrote:
> > On Thu, Feb 20, 2020 at 12:28 PM Jens Axboe <[email protected]> wrote:
> >>
> >> On 2/20/20 9:52 AM, Glauber Costa wrote:
> >>> On Thu, Feb 20, 2020 at 11:39 AM Jens Axboe <[email protected]> wrote:
> >>>>
> >>>> On 2/20/20 9:34 AM, Glauber Costa wrote:
> >>>>> On Thu, Feb 20, 2020 at 11:29 AM Jens Axboe <[email protected]> wrote:
> >>>>>>
> >>>>>> On 2/20/20 9:17 AM, Jens Axboe wrote:
> >>>>>>> On 2/20/20 7:19 AM, Glauber Costa wrote:
> >>>>>>>> Hi there, me again
> >>>>>>>>
> >>>>>>>> Kernel is at 043f0b67f2ab8d1af418056bc0cc6f0623d31347
> >>>>>>>>
> >>>>>>>> This test is easier to explain: it essentially issues a connect and a
> >>>>>>>> shutdown right away.
> >>>>>>>>
> >>>>>>>> It currently fails due to no fault of io_uring. But every now and then
> >>>>>>>> it crashes (you may have to run more than once to get it to crash)
> >>>>>>>>
> >>>>>>>> Instructions are similar to my last test.
> >>>>>>>> Except the test to build is now "tests/unit/connect_test"
> >>>>>>>> Code is at [email protected]:glommer/seastar.git  branch io-uring-connect-crash
> >>>>>>>>
> >>>>>>>> Run it with ./build/release/tests/unit/connect_test -- -c1
> >>>>>>>> --reactor-backend=uring
> >>>>>>>>
> >>>>>>>> Backtrace attached
> >>>>>>>
> >>>>>>> Perfect thanks, I'll take a look!
> >>>>>>
> >>>>>> Haven't managed to crash it yet, but every run complains:
> >>>>>>
> >>>>>> got to shutdown of 10 with refcnt: 2
> >>>>>> Refs being all dropped, calling forget for 10
> >>>>>> terminate called after throwing an instance of 'fmt::v6::format_error'
> >>>>>>   what():  argument index out of range
> >>>>>> unknown location(0): fatal error: in "unixdomain_server": signal: SIGABRT (application abort requested)
> >>>>>>
> >>>>>> Not sure if that's causing it not to fail here.
> >>>>>
> >>>>> Ok, that means it "passed". (I was in the process of figuring out
> >>>>> where I got this wrong when I started seeing the crashes)
> >>>>
> >>>> Can you do, in your kernel dir:
> >>>>
> >>>> $ gdb vmlinux
> >>>> [...]
> >>>> (gdb) l *__io_queue_sqe+0x4a
> >>>>
> >>>> and see what it says?
> >>>
> >>> 0xffffffff81375ada is in __io_queue_sqe (fs/io_uring.c:4814).
> >>> 4809 struct io_kiocb *linked_timeout;
> >>> 4810 struct io_kiocb *nxt = NULL;
> >>> 4811 int ret;
> >>> 4812
> >>> 4813 again:
> >>> 4814 linked_timeout = io_prep_linked_timeout(req);
> >>> 4815
> >>> 4816 ret = io_issue_sqe(req, sqe, &nxt, true);
> >>> 4817
> >>> 4818 /*
> >>>
> >>> (I am not using timeouts, just async_cancel)
> >>
> >> Can't seem to hit it here, went through thousands of iterations...
> >> I'll keep trying.
> >>
> >> If you have time, you can try and enable CONFIG_KASAN=y and see if
> >> you can hit it with that.
> >
> > I can
> >
> > Attaching full dmesg
>
> Can you try the latest? It's sha d8154e605f84. Before you do, can you
> do the lookup on __io_queue_sqe+0x639 with gdb?

Moving to that hash now. In the meantime, so I don't delay your fun:

(gdb) l *__io_queue_sqe+0x639
0xffffffff81566c19 is in __io_queue_sqe (./include/linux/compiler.h:226).
221 {
222 switch (size) {
223 case 1: *(volatile __u8 *)p = *(__u8 *)res; break;
224 case 2: *(volatile __u16 *)p = *(__u16 *)res; break;
225 case 4: *(volatile __u32 *)p = *(__u32 *)res; break;
226 case 8: *(volatile __u64 *)p = *(__u64 *)res; break;
227 default:
228 barrier();
229 __builtin_memcpy((void *)p, (const void *)res, size);
230 barrier();


>
> --
> Jens Axboe
>


* Re: crash on connect
  2020-02-20 19:19                 ` Glauber Costa
@ 2020-02-20 19:36                   ` Glauber Costa
  2020-02-20 20:08                     ` Jens Axboe
  0 siblings, 1 reply; 12+ messages in thread
From: Glauber Costa @ 2020-02-20 19:36 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, Avi Kivity

On Thu, Feb 20, 2020 at 2:19 PM Glauber Costa <[email protected]> wrote:
>
> On Thu, Feb 20, 2020 at 2:12 PM Jens Axboe <[email protected]> wrote:
> >
> > On 2/20/20 11:45 AM, Glauber Costa wrote:
> > > On Thu, Feb 20, 2020 at 12:28 PM Jens Axboe <[email protected]> wrote:
> > >>
> > >> On 2/20/20 9:52 AM, Glauber Costa wrote:
> > >>> On Thu, Feb 20, 2020 at 11:39 AM Jens Axboe <[email protected]> wrote:
> > >>>>
> > >>>> On 2/20/20 9:34 AM, Glauber Costa wrote:
> > >>>>> On Thu, Feb 20, 2020 at 11:29 AM Jens Axboe <[email protected]> wrote:
> > >>>>>>
> > >>>>>> On 2/20/20 9:17 AM, Jens Axboe wrote:
> > >>>>>>> On 2/20/20 7:19 AM, Glauber Costa wrote:
> > >>>>>>>> Hi there, me again
> > >>>>>>>>
> > >>>>>>>> Kernel is at 043f0b67f2ab8d1af418056bc0cc6f0623d31347
> > >>>>>>>>
> > >>>>>>>> This test is easier to explain: it essentially issues a connect and a
> > >>>>>>>> shutdown right away.
> > >>>>>>>>
> > >>>>>>>> It currently fails due to no fault of io_uring. But every now and then
> > >>>>>>>> it crashes (you may have to run more than once to get it to crash)
> > >>>>>>>>
> > >>>>>>>> Instructions are similar to my last test.
> > >>>>>>>> Except the test to build is now "tests/unit/connect_test"
> > >>>>>>>> Code is at [email protected]:glommer/seastar.git  branch io-uring-connect-crash
> > >>>>>>>>
> > >>>>>>>> Run it with ./build/release/tests/unit/connect_test -- -c1
> > >>>>>>>> --reactor-backend=uring
> > >>>>>>>>
> > >>>>>>>> Backtrace attached
> > >>>>>>>
> > >>>>>>> Perfect thanks, I'll take a look!
> > >>>>>>
> > >>>>>> Haven't managed to crash it yet, but every run complains:
> > >>>>>>
> > >>>>>> got to shutdown of 10 with refcnt: 2
> > >>>>>> Refs being all dropped, calling forget for 10
> > >>>>>> terminate called after throwing an instance of 'fmt::v6::format_error'
> > >>>>>>   what():  argument index out of range
> > >>>>>> unknown location(0): fatal error: in "unixdomain_server": signal: SIGABRT (application abort requested)
> > >>>>>>
> > >>>>>> Not sure if that's causing it not to fail here.
> > >>>>>
> > >>>>> Ok, that means it "passed". (I was in the process of figuring out
> > >>>>> where I got this wrong when I started seeing the crashes)
> > >>>>
> > >>>> Can you do, in your kernel dir:
> > >>>>
> > >>>> $ gdb vmlinux
> > >>>> [...]
> > >>>> (gdb) l *__io_queue_sqe+0x4a
> > >>>>
> > >>>> and see what it says?
> > >>>
> > >>> 0xffffffff81375ada is in __io_queue_sqe (fs/io_uring.c:4814).
> > >>> 4809 struct io_kiocb *linked_timeout;
> > >>> 4810 struct io_kiocb *nxt = NULL;
> > >>> 4811 int ret;
> > >>> 4812
> > >>> 4813 again:
> > >>> 4814 linked_timeout = io_prep_linked_timeout(req);
> > >>> 4815
> > >>> 4816 ret = io_issue_sqe(req, sqe, &nxt, true);
> > >>> 4817
> > >>> 4818 /*
> > >>>
> > >>> (I am not using timeouts, just async_cancel)
> > >>
> > >> Can't seem to hit it here, went through thousands of iterations...
> > >> I'll keep trying.
> > >>
> > >> If you have time, you can try and enable CONFIG_KASAN=y and see if
> > >> you can hit it with that.
> > >
> > > I can
> > >
> > > Attaching full dmesg
> >
> > Can you try the latest? It's sha d8154e605f84.

10 runs, no crashes.

Thanks!

> > Before you do, can you
> > do the lookup on __io_queue_sqe+0x639 with gdb?
>
> Moving to that hash now. In the meantime, so I don't delay your fun:
>
> (gdb) l *__io_queue_sqe+0x639
> 0xffffffff81566c19 is in __io_queue_sqe (./include/linux/compiler.h:226).
> 221 {
> 222 switch (size) {
> 223 case 1: *(volatile __u8 *)p = *(__u8 *)res; break;
> 224 case 2: *(volatile __u16 *)p = *(__u16 *)res; break;
> 225 case 4: *(volatile __u32 *)p = *(__u32 *)res; break;
> 226 case 8: *(volatile __u64 *)p = *(__u64 *)res; break;
> 227 default:
> 228 barrier();
> 229 __builtin_memcpy((void *)p, (const void *)res, size);
> 230 barrier();
>
>
> >
> > --
> > Jens Axboe
> >


* Re: crash on connect
  2020-02-20 19:36                   ` Glauber Costa
@ 2020-02-20 20:08                     ` Jens Axboe
  0 siblings, 0 replies; 12+ messages in thread
From: Jens Axboe @ 2020-02-20 20:08 UTC (permalink / raw)
  To: Glauber Costa; +Cc: io-uring, Avi Kivity

On 2/20/20 12:36 PM, Glauber Costa wrote:
> On Thu, Feb 20, 2020 at 2:19 PM Glauber Costa <[email protected]> wrote:
>>
>> On Thu, Feb 20, 2020 at 2:12 PM Jens Axboe <[email protected]> wrote:
>>>
>>> On 2/20/20 11:45 AM, Glauber Costa wrote:
>>>> On Thu, Feb 20, 2020 at 12:28 PM Jens Axboe <[email protected]> wrote:
>>>>>
>>>>> On 2/20/20 9:52 AM, Glauber Costa wrote:
>>>>>> On Thu, Feb 20, 2020 at 11:39 AM Jens Axboe <[email protected]> wrote:
>>>>>>>
>>>>>>> On 2/20/20 9:34 AM, Glauber Costa wrote:
>>>>>>>> On Thu, Feb 20, 2020 at 11:29 AM Jens Axboe <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> On 2/20/20 9:17 AM, Jens Axboe wrote:
>>>>>>>>>> On 2/20/20 7:19 AM, Glauber Costa wrote:
>>>>>>>>>>> Hi there, me again
>>>>>>>>>>>
>>>>>>>>>>> Kernel is at 043f0b67f2ab8d1af418056bc0cc6f0623d31347
>>>>>>>>>>>
>>>>>>>>>>> This test is easier to explain: it essentially issues a connect and a
>>>>>>>>>>> shutdown right away.
>>>>>>>>>>>
>>>>>>>>>>> It currently fails due to no fault of io_uring. But every now and then
>>>>>>>>>>> it crashes (you may have to run more than once to get it to crash)
>>>>>>>>>>>
>>>>>>>>>>> Instructions are similar to my last test.
>>>>>>>>>>> Except the test to build is now "tests/unit/connect_test"
>>>>>>>>>>> Code is at [email protected]:glommer/seastar.git  branch io-uring-connect-crash
>>>>>>>>>>>
>>>>>>>>>>> Run it with ./build/release/tests/unit/connect_test -- -c1
>>>>>>>>>>> --reactor-backend=uring
>>>>>>>>>>>
>>>>>>>>>>> Backtrace attached
>>>>>>>>>>
>>>>>>>>>> Perfect thanks, I'll take a look!
>>>>>>>>>
>>>>>>>>> Haven't managed to crash it yet, but every run complains:
>>>>>>>>>
>>>>>>>>> got to shutdown of 10 with refcnt: 2
>>>>>>>>> Refs being all dropped, calling forget for 10
>>>>>>>>> terminate called after throwing an instance of 'fmt::v6::format_error'
>>>>>>>>>   what():  argument index out of range
>>>>>>>>> unknown location(0): fatal error: in "unixdomain_server": signal: SIGABRT (application abort requested)
>>>>>>>>>
>>>>>>>>> Not sure if that's causing it not to fail here.
>>>>>>>>
>>>>>>>> Ok, that means it "passed". (I was in the process of figuring out
>>>>>>>> where I got this wrong when I started seeing the crashes)
>>>>>>>
>>>>>>> Can you do, in your kernel dir:
>>>>>>>
>>>>>>> $ gdb vmlinux
>>>>>>> [...]
>>>>>>> (gdb) l *__io_queue_sqe+0x4a
>>>>>>>
>>>>>>> and see what it says?
>>>>>>
>>>>>> 0xffffffff81375ada is in __io_queue_sqe (fs/io_uring.c:4814).
>>>>>> 4809 struct io_kiocb *linked_timeout;
>>>>>> 4810 struct io_kiocb *nxt = NULL;
>>>>>> 4811 int ret;
>>>>>> 4812
>>>>>> 4813 again:
>>>>>> 4814 linked_timeout = io_prep_linked_timeout(req);
>>>>>> 4815
>>>>>> 4816 ret = io_issue_sqe(req, sqe, &nxt, true);
>>>>>> 4817
>>>>>> 4818 /*
>>>>>>
>>>>>> (I am not using timeouts, just async_cancel)
>>>>>
>>>>> Can't seem to hit it here, went through thousands of iterations...
>>>>> I'll keep trying.
>>>>>
>>>>> If you have time, you can try and enable CONFIG_KASAN=y and see if
>>>>> you can hit it with that.
>>>>
>>>> I can
>>>>
>>>> Attaching full dmesg
>>>
>>> Can you try the latest? It's sha d8154e605f84.
> 
> 10 runs, no crashes.
> 
> Thanks!

Great! Thanks for reporting and the quick testing.

-- 
Jens Axboe



end of thread, other threads:[~2020-02-20 20:08 UTC | newest]

Thread overview: 12+ messages
2020-02-20 14:19 crash on connect Glauber Costa
2020-02-20 16:17 ` Jens Axboe
2020-02-20 16:29   ` Jens Axboe
2020-02-20 16:34     ` Glauber Costa
2020-02-20 16:38       ` Jens Axboe
2020-02-20 16:52         ` Glauber Costa
2020-02-20 17:28           ` Jens Axboe
2020-02-20 17:33             ` Glauber Costa
     [not found]             ` <CAD-J=zbdrZJ2nKgH3Ob=QAAM9Ci439T9DduNxvetK9B_52LDOQ@mail.gmail.com>
2020-02-20 19:12               ` Jens Axboe
2020-02-20 19:19                 ` Glauber Costa
2020-02-20 19:36                   ` Glauber Costa
2020-02-20 20:08                     ` Jens Axboe
