public inbox for [email protected]
* Re: Using SQPOLL for-5.11/io_uring kernel NULL pointer dereference
       [not found] <CAAss7+pgQN7uPFaLakd+K4yZH6TRcMHELQV0wAA2NUxPpYEL_Q@mail.gmail.com>
@ 2020-11-07  0:51 ` Josef
  2020-11-07 12:10   ` Pavel Begunkov
  0 siblings, 1 reply; 5+ messages in thread
From: Josef @ 2020-11-07  0:51 UTC (permalink / raw)
  To: Jens Axboe, io-uring; +Cc: norman

On Sat, 7 Nov 2020 at 01:45, Josef <[email protected]> wrote:
>
> Hi,
>
> I came across some strange behaviour in some netty-io_uring tests when
> using SQPOLL, which looks like a bug to me. However, I don't know how to
> reproduce it, as the error occurs randomly and leads to a kernel
> "freeze". I spent all day trying to figure out how to reproduce this
> error... any idea what the cause is?
>
> branch: for-5.11/io_uring
> last commit 34f98f655639b32f28c30c27dbbea57f8c304d9c
>
> (please don't waste your time as I'll take a look on the weekend)
>
> ---
> Josef Grieb

I forgot to mention that some cores are running at 100% CPU usage
when the error occurs.

----
Josef Grieb


* Re: Using SQPOLL for-5.11/io_uring kernel NULL pointer dereference
  2020-11-07  0:51 ` Using SQPOLL for-5.11/io_uring kernel NULL pointer dereference Josef
@ 2020-11-07 12:10   ` Pavel Begunkov
  2020-11-07 14:09     ` Josef
  0 siblings, 1 reply; 5+ messages in thread
From: Pavel Begunkov @ 2020-11-07 12:10 UTC (permalink / raw)
  To: Josef, Jens Axboe, io-uring; +Cc: norman

On 07/11/2020 00:51, Josef wrote:
> On Sat, 7 Nov 2020 at 01:45, Josef <[email protected]> wrote:
>> I came across some strange behaviour in some netty-io_uring tests when
>> using SQPOLL which seems like a bug to me, however I don't know how to
>> reproduce it, as the error occurs randomly which leads to a kernel
>> "freeze",  I spend all day trying to figure out how to reproduce this
>> error...any idea what the cause is?
>>
>> branch: for-5.11/io_uring
>> last commit 34f98f655639b32f28c30c27dbbea57f8c304d9c
>>
>> (please don't waste your time as I'll take a look on the weekend)
>>
> I forgot to mention that same cores are running at 100% cpu usage,
> when error occurs

I didn't get the first email. Is it a "kernel NULL pointer dereference"
as in the subject, or just a freeze?

Also:
- anything in dmesg? (>5 min after the freeze)
- did you locate which test hangs it? If so, what does it use? E.g. SQPOLL
sharing, IOPOLL, etc.
- is it send/recvmsg or send/recv that you use? Any others?
- does this happen often?
- you may try `funcgraph __io_sq_thread -H`, or even with `io_sq_thread`
(funcgraph is from bpftools). Or catch it with some other tool.

-- 
Pavel Begunkov


* Re: Using SQPOLL for-5.11/io_uring kernel NULL pointer dereference
  2020-11-07 12:10   ` Pavel Begunkov
@ 2020-11-07 14:09     ` Josef
  2020-11-07 16:28       ` Pavel Begunkov
  0 siblings, 1 reply; 5+ messages in thread
From: Josef @ 2020-11-07 14:09 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring, Jens Axboe; +Cc: norman

> I haven't got the first email, is it "kernel NULL pointer dereference"
> as in the subject or just freeze?

That's weird... the attached log file was probably too big.
Here is the dmesg log:
https://gist.github.com/1Jo1/3d0bcefc18f097265f0dc1ef054a87c0

> - did you locate which test hangs it? If so what it uses? e.g. SQPOLL
> sharing, IOPOLL., etc.

Yes, it uses SQPOLL without sharing; IOPOLL is not enabled, and the async
flag is enabled.

> - is it send/recvmsg, send/recv you use? any other?

No, the tests that trigger the error use these operations: OP_READ,
OP_WRITE, OP_POLL_ADD, OP_POLL_REMOVE, OP_CLOSE, OP_ACCEPT, OP_TIMEOUT
(the async flag is enabled for OP_READ, OP_WRITE and OP_CLOSE).

> - does this happen often?

yeah quite often

> - you may try `funcgraph __io_sq_thread -H` or even with `io_sq_thread`
> (funcgraph is from bpftools). Or catch that with some other tools.

I'm not very familiar with these tools (or with kernel debugging in
general); I'll take a look tomorrow.




* Re: Using SQPOLL for-5.11/io_uring kernel NULL pointer dereference
  2020-11-07 14:09     ` Josef
@ 2020-11-07 16:28       ` Pavel Begunkov
  2020-11-07 20:03         ` Pavel Begunkov
  0 siblings, 1 reply; 5+ messages in thread
From: Pavel Begunkov @ 2020-11-07 16:28 UTC (permalink / raw)
  To: Josef, io-uring, Jens Axboe; +Cc: norman

On 07/11/2020 14:09, Josef wrote:
>> I haven't got the first email, is it "kernel NULL pointer dereference"
>> as in the subject or just freeze?
> 
> that's weird..probably the size of the attached log file is too big...
> here dmesg log file
> https://gist.github.com/1Jo1/3d0bcefc18f097265f0dc1ef054a87c0

That's much better with the log, thanks! I'll take a look later

> 
>> - did you locate which test hangs it? If so what it uses? e.g. SQPOLL
>> sharing, IOPOLL., etc.
> 
> yes, it uses SQPOLL, without sharing, IPOLL is not enabled, and Async
> Flag is enabled
> 
>> - is it send/recvmsg, send/recv you use? any other?
> 
> no the tests which occurs the error use these operations: OP_READ,
> OP_WRITE, OP_POLL_ADD, OP_POLL_REMOVE, OP_CLOSE, OP_ACCEPT, OP_TIMEOUT
> (OP_READ, OP_WRITE and OP_CLOSE async flag is enabled)
> 
>> - does this happen often?
> 
> yeah quite often
> 
>> - you may try `funcgraph __io_sq_thread -H` or even with `io_sq_thread`
>> (funcgraph is from bpftools). Or catch that with some other tools.
> 
> I'm not quite familiar with these tools( kernel debugging in general)
> I'll take a look tomorrow

-- 
Pavel Begunkov


* Re: Using SQPOLL for-5.11/io_uring kernel NULL pointer dereference
  2020-11-07 16:28       ` Pavel Begunkov
@ 2020-11-07 20:03         ` Pavel Begunkov
  0 siblings, 0 replies; 5+ messages in thread
From: Pavel Begunkov @ 2020-11-07 20:03 UTC (permalink / raw)
  To: Josef, io-uring, Jens Axboe; +Cc: norman

On 07/11/2020 16:28, Pavel Begunkov wrote:
> On 07/11/2020 14:09, Josef wrote:
>>> I haven't got the first email, is it "kernel NULL pointer dereference"
>>> as in the subject or just freeze?
>>
>> that's weird..probably the size of the attached log file is too big...
>> here dmesg log file
>> https://gist.github.com/1Jo1/3d0bcefc18f097265f0dc1ef054a87c0
> 
> That's much better with the log, thanks! I'll take a look later

OK, we get into fget_many() without ->files, and it's clear how that
can happen. I'll write up a patch.

> 
>>
>>> - did you locate which test hangs it? If so what it uses? e.g. SQPOLL
>>> sharing, IOPOLL., etc.
>>
>> yes, it uses SQPOLL, without sharing, IPOLL is not enabled, and Async
>> Flag is enabled
>>
>>> - is it send/recvmsg, send/recv you use? any other?
>>
>> no the tests which occurs the error use these operations: OP_READ,
>> OP_WRITE, OP_POLL_ADD, OP_POLL_REMOVE, OP_CLOSE, OP_ACCEPT, OP_TIMEOUT
>> (OP_READ, OP_WRITE and OP_CLOSE async flag is enabled)
>>
>>> - does this happen often?
>>
>> yeah quite often
>>
>>> - you may try `funcgraph __io_sq_thread -H` or even with `io_sq_thread`
>>> (funcgraph is from bpftools). Or catch that with some other tools.
>>
>> I'm not quite familiar with these tools( kernel debugging in general)
>> I'll take a look tomorrow
> 

-- 
Pavel Begunkov

