* Re: Using SQPOLL for-5.11/io_uring kernel NULL pointer dereference
  [not found] <CAAss7+pgQN7uPFaLakd+K4yZH6TRcMHELQV0wAA2NUxPpYEL_Q@mail.gmail.com>
From: Josef @ 2020-11-07 0:51 UTC
To: Jens Axboe, io-uring; +Cc: norman

On Sat, 7 Nov 2020 at 01:45, Josef <[email protected]> wrote:
>
> Hi,
>
> I came across some strange behaviour in some netty-io_uring tests when
> using SQPOLL which seems like a bug to me. However, I don't know how to
> reproduce it, as the error occurs randomly and leads to a kernel
> "freeze". I spent all day trying to figure out how to reproduce this
> error... any idea what the cause is?
>
> branch: for-5.11/io_uring
> last commit 34f98f655639b32f28c30c27dbbea57f8c304d9c
>
> (please don't waste your time as I'll take a look on the weekend)
>
> ---
> Josef Grieb

I forgot to mention that some cores are running at 100% CPU usage
when the error occurs

---
Josef Grieb

* Re: Using SQPOLL for-5.11/io_uring kernel NULL pointer dereference
From: Pavel Begunkov @ 2020-11-07 12:10 UTC
To: Josef, Jens Axboe, io-uring; +Cc: norman

On 07/11/2020 00:51, Josef wrote:
> On Sat, 7 Nov 2020 at 01:45, Josef <[email protected]> wrote:
>> I came across some strange behaviour in some netty-io_uring tests when
>> using SQPOLL which seems like a bug to me. However, I don't know how to
>> reproduce it, as the error occurs randomly and leads to a kernel
>> "freeze". I spent all day trying to figure out how to reproduce this
>> error... any idea what the cause is?
>>
>> branch: for-5.11/io_uring
>> last commit 34f98f655639b32f28c30c27dbbea57f8c304d9c
>>
>> (please don't waste your time as I'll take a look on the weekend)
>>
> I forgot to mention that some cores are running at 100% CPU usage
> when the error occurs

I haven't got the first email. Is it a "kernel NULL pointer dereference"
as in the subject, or just a freeze?

also
- anything in dmesg? (>5 min after freeze)
- did you locate which test hangs it? If so, what does it use? e.g. SQPOLL
  sharing, IOPOLL, etc.
- is it send/recvmsg, send/recv you use? any other?
- does this happen often?
- you may try `funcgraph __io_sq_thread -H` or even with `io_sq_thread`
  (funcgraph is from bpftools). Or catch that with some other tools.

--
Pavel Begunkov

* Re: Using SQPOLL for-5.11/io_uring kernel NULL pointer dereference
From: Josef @ 2020-11-07 14:09 UTC
To: Pavel Begunkov, io-uring, Jens Axboe; +Cc: norman

> I haven't got the first email. Is it a "kernel NULL pointer dereference"
> as in the subject, or just a freeze?

that's weird... probably the attached log file was too big.
Here is the dmesg log file:
https://gist.github.com/1Jo1/3d0bcefc18f097265f0dc1ef054a87c0

> - did you locate which test hangs it? If so, what does it use? e.g. SQPOLL
> sharing, IOPOLL, etc.

yes, it uses SQPOLL without sharing, IOPOLL is not enabled, and the async
flag is enabled

> - is it send/recvmsg, send/recv you use? any other?

no, the tests in which the error occurs use these operations: OP_READ,
OP_WRITE, OP_POLL_ADD, OP_POLL_REMOVE, OP_CLOSE, OP_ACCEPT, OP_TIMEOUT
(the async flag is enabled for OP_READ, OP_WRITE and OP_CLOSE)

> - does this happen often?

yeah, quite often

> - you may try `funcgraph __io_sq_thread -H` or even with `io_sq_thread`
> (funcgraph is from bpftools). Or catch that with some other tools.

I'm not very familiar with these tools (or kernel debugging in general).
I'll take a look tomorrow

* Re: Using SQPOLL for-5.11/io_uring kernel NULL pointer dereference
From: Pavel Begunkov @ 2020-11-07 16:28 UTC
To: Josef, io-uring, Jens Axboe; +Cc: norman

On 07/11/2020 14:09, Josef wrote:
>> I haven't got the first email. Is it a "kernel NULL pointer dereference"
>> as in the subject, or just a freeze?
>
> that's weird... probably the attached log file was too big.
> Here is the dmesg log file:
> https://gist.github.com/1Jo1/3d0bcefc18f097265f0dc1ef054a87c0

That's much better with the log, thanks! I'll take a look later

>
>> - did you locate which test hangs it? If so, what does it use? e.g. SQPOLL
>> sharing, IOPOLL, etc.
>
> yes, it uses SQPOLL without sharing, IOPOLL is not enabled, and the async
> flag is enabled
>
>> - is it send/recvmsg, send/recv you use? any other?
>
> no, the tests in which the error occurs use these operations: OP_READ,
> OP_WRITE, OP_POLL_ADD, OP_POLL_REMOVE, OP_CLOSE, OP_ACCEPT, OP_TIMEOUT
> (the async flag is enabled for OP_READ, OP_WRITE and OP_CLOSE)
>
>> - does this happen often?
>
> yeah, quite often
>
>> - you may try `funcgraph __io_sq_thread -H` or even with `io_sq_thread`
>> (funcgraph is from bpftools). Or catch that with some other tools.
>
> I'm not very familiar with these tools (or kernel debugging in general).
> I'll take a look tomorrow

--
Pavel Begunkov

* Re: Using SQPOLL for-5.11/io_uring kernel NULL pointer dereference
From: Pavel Begunkov @ 2020-11-07 20:03 UTC
To: Josef, io-uring, Jens Axboe; +Cc: norman

On 07/11/2020 16:28, Pavel Begunkov wrote:
> On 07/11/2020 14:09, Josef wrote:
>>> I haven't got the first email. Is it a "kernel NULL pointer dereference"
>>> as in the subject, or just a freeze?
>>
>> that's weird... probably the attached log file was too big.
>> Here is the dmesg log file:
>> https://gist.github.com/1Jo1/3d0bcefc18f097265f0dc1ef054a87c0
>
> That's much better with the log, thanks! I'll take a look later

Ok, we get into fget_many() without ->files, and it's clear how it
may happen. I'll write up a patch.

>
>>
>>> - did you locate which test hangs it? If so, what does it use? e.g. SQPOLL
>>> sharing, IOPOLL, etc.
>>
>> yes, it uses SQPOLL without sharing, IOPOLL is not enabled, and the async
>> flag is enabled
>>
>>> - is it send/recvmsg, send/recv you use? any other?
>>
>> no, the tests in which the error occurs use these operations: OP_READ,
>> OP_WRITE, OP_POLL_ADD, OP_POLL_REMOVE, OP_CLOSE, OP_ACCEPT, OP_TIMEOUT
>> (the async flag is enabled for OP_READ, OP_WRITE and OP_CLOSE)
>>
>>> - does this happen often?
>>
>> yeah, quite often
>>
>>> - you may try `funcgraph __io_sq_thread -H` or even with `io_sq_thread`
>>> (funcgraph is from bpftools). Or catch that with some other tools.
>>
>> I'm not very familiar with these tools (or kernel debugging in general).
>> I'll take a look tomorrow
>

--
Pavel Begunkov
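
[Editor's illustration] The diagnosis in the last message (a file lookup is reached while ->files is unset) can be pictured with a small user-space toy model. This is only a sketch under stated assumptions: it is not kernel code and it is not the patch mentioned above, which does not appear in this thread and may take a different approach, such as making sure ->files is populated before the lookup. Every `toy_*` name and `TOY_NFDS` are hypothetical and exist purely for illustration.

```c
#include <stddef.h>

/* Toy stand-ins only: every name here is hypothetical, not a kernel type. */
#define TOY_NFDS 4

struct toy_file {
    int id;
};

struct toy_files {
    struct toy_file *slots[TOY_NFDS];  /* tiny fixed-size descriptor table */
};

struct toy_task {
    struct toy_files *files;           /* may be NULL, as in the report */
};

/*
 * Unguarded lookup: dereferences task->files unconditionally.
 * Calling this with task->files == NULL is the user-space analogue of the
 * reported NULL pointer dereference (undefined behaviour, usually a crash).
 */
struct toy_file *toy_lookup_unguarded(const struct toy_task *task, unsigned fd)
{
    if (fd >= TOY_NFDS)
        return NULL;
    return task->files->slots[fd];
}

/*
 * Guarded lookup: refuses to proceed when the table pointer is unset.
 * This is one possible defensive shape, shown only to make the failure
 * mode concrete.
 */
struct toy_file *toy_lookup_guarded(const struct toy_task *task, unsigned fd)
{
    if (task->files == NULL)
        return NULL;                   /* no table attached: fail the lookup */
    if (fd >= TOY_NFDS)
        return NULL;                   /* descriptor out of range */
    return task->files->slots[fd];
}
```

In this model, a task whose `files` pointer is NULL gets NULL back from `toy_lookup_guarded()`, whereas `toy_lookup_unguarded()` would dereference the NULL pointer on the same input.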