* [RFC 0/1] io_uring: preserve work->mm since actual work processing may need it @ 2020-04-09 22:03 Bijan Mottahedeh 2020-04-09 22:03 ` [RFC 1/1] " Bijan Mottahedeh 0 siblings, 1 reply; 10+ messages in thread From: Bijan Mottahedeh @ 2020-04-09 22:03 UTC (permalink / raw) To: axboe; +Cc: io-uring The liburing madvise test crashes the system with a NULL pointer dereference because io_madvise() is passing a NULL mm value, previously cleared in io_wq_switch_mm(), to do_advise(). I'm not clear why work->mm is being cleared, especially since it seems to run contrary to what the comment above it states, but in any case preserving the work->mm value gets rid of the crash. -------------------------------------------------------------------------- Running test madvise [ 165.733724] BUG: kernel NULL pointer dereference, address: 0000000000000138 [ 165.735088] #PF: supervisor read access in kernel mode [ 165.736027] #PF: error_code(0x0000) - not-present page [ 165.736971] PGD 8000000fa3c32067 P4D 8000000fa3c32067 PUD fc4e17067 PMD 0 [ 165.738254] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC PTI [ 165.739140] CPU: 18 PID: 30105 Comm: io_wqe_worker-0 Not tainted 5.6.0-next-1 [ 165.740640] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-4 [ 165.742721] RIP: 0010:__lock_acquire.isra.29+0x37/0x6c0 [ 165.743656] Code: 25 40 8e 01 00 53 48 83 ec 18 44 8b 35 e6 2f 61 01 45 85 fc [ 165.747020] RSP: 0018:ffffc9000b08bba0 EFLAGS: 00010097 [ 165.747989] RAX: 0000000000000000 RBX: 0000000000000130 RCX: 0000000000000001 [ 165.749276] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000130 [ 165.750552] RBP: ffff888fa35224c0 R08: 0000000000000000 R09: 0000000000000000 [ 165.751862] R10: 0000000000000130 R11: 0000000000000000 R12: 0000000000000000 [ 165.753195] R13: 0000000000000001 R14: 0000000000000000 R15: 00007f5c4ecea000 [ 165.754490] FS: 0000000000000000(0000) GS:ffff888ff4600000(0000) knlGS:00000 [ 165.756007] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 165.757054] CR2: 0000000000000138 CR3: 0000000fc709c002 CR4: 0000000000160ee0 [ 165.758339] Call Trace: [ 165.758805] ? load_balance+0x1b4/0xd00 [ 165.759525] lock_acquire+0xf9/0x160 [ 165.760202] ? do_madvise+0xa59/0xb20 [ 165.760894] down_read+0x3c/0xe0 [ 165.761479] ? do_madvise+0xa59/0xb20 [ 165.762188] do_madvise+0xa59/0xb20 [ 165.762830] ? kvm_sched_clock_read+0xd/0x20 [ 165.763643] ? free_debug_processing+0x291/0x2c8 [ 165.764535] ? do_raw_spin_unlock+0x83/0x90 [ 165.765303] ? free_debug_processing+0x291/0x2c8 [ 165.766184] io_issue_sqe+0xafa/0x11e0 [ 165.766867] ? kvm_sched_clock_read+0xd/0x20 [ 165.767641] ? __free_pages_ok+0x3db/0x550 [ 165.768390] ? _raw_spin_unlock+0x1f/0x30 [ 165.769129] io_wq_submit_work+0x2f/0x80 [ 165.769800] io_worker_handle_work+0x38a/0x540 [ 165.770650] io_wqe_worker+0x32a/0x370 [ 165.771342] kthread+0x118/0x120 [ 165.771948] ? io_worker_handle_work+0x540/0x540 [ 165.772784] ? kthread_insert_work_sanity_check+0x60/0x60 [ 165.773766] ret_from_fork+0x1f/0x30 [ 165.774419] Modules linked in: xfs dm_mod sr_mod sd_mod cdrom crc32c_intel nt [ 165.777124] CR2: 0000000000000138 [ 165.777733] ---[ end trace 2a1a5b9c912bd387 ]--- Bijan Mottahedeh (1): io_uring: preserve work->mm since actual work processing may need it fs/io-wq.c | 2 -- 1 file changed, 2 deletions(-) -- 1.8.3.1 ^ permalink raw reply [flat|nested] 10+ messages in thread
* [RFC 1/1] io_uring: preserve work->mm since actual work processing may need it 2020-04-09 22:03 [RFC 0/1] io_uring: preserve work->mm since actual work processing may need it Bijan Mottahedeh @ 2020-04-09 22:03 ` Bijan Mottahedeh 2020-04-10 8:47 ` Pavel Begunkov 0 siblings, 1 reply; 10+ messages in thread From: Bijan Mottahedeh @ 2020-04-09 22:03 UTC (permalink / raw) To: axboe; +Cc: io-uring Do not clear work->mm since io_madvise() passes it to do_madvise() when the request is actually processed. Signed-off-by: Bijan Mottahedeh <[email protected]> --- fs/io-wq.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/fs/io-wq.c b/fs/io-wq.c index 4023c98..4d20754 100644 --- a/fs/io-wq.c +++ b/fs/io-wq.c @@ -431,8 +431,6 @@ static void io_wq_switch_mm(struct io_worker *worker, struct io_wq_work *work) if (!worker->mm) set_fs(USER_DS); worker->mm = work->mm; - /* hang on to this mm */ - work->mm = NULL; return; } -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [RFC 1/1] io_uring: preserve work->mm since actual work processing may need it 2020-04-09 22:03 ` [RFC 1/1] " Bijan Mottahedeh @ 2020-04-10 8:47 ` Pavel Begunkov 2020-04-10 16:54 ` Bijan Mottahedeh 0 siblings, 1 reply; 10+ messages in thread From: Pavel Begunkov @ 2020-04-10 8:47 UTC (permalink / raw) To: Bijan Mottahedeh, axboe; +Cc: io-uring On 4/10/2020 1:03 AM, Bijan Mottahedeh wrote: > Do not clear work->mm since io_madvise() passes it to do_madvise() > when the request is actually processed. As I see, this down_read() from the trace is down_read(¤t->mm->mmap_sem), where current->mm is set by use_mm() just several lines above your change. So, what do you mean by passing? I don't see do_madvise() __explicitly__ accepting mm as an argument. What tree do you use? Extra patches on top? > > Signed-off-by: Bijan Mottahedeh <[email protected]> > --- > fs/io-wq.c | 2 -- > 1 file changed, 2 deletions(-) > > diff --git a/fs/io-wq.c b/fs/io-wq.c > index 4023c98..4d20754 100644 > --- a/fs/io-wq.c > +++ b/fs/io-wq.c > @@ -431,8 +431,6 @@ static void io_wq_switch_mm(struct io_worker *worker, struct io_wq_work *work) > if (!worker->mm) > set_fs(USER_DS); > worker->mm = work->mm; > - /* hang on to this mm */ > - work->mm = NULL; > return; > } > > -- Pavel Begunkov ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC 1/1] io_uring: preserve work->mm since actual work processing may need it 2020-04-10 8:47 ` Pavel Begunkov @ 2020-04-10 16:54 ` Bijan Mottahedeh 2020-04-10 17:51 ` Pavel Begunkov 0 siblings, 1 reply; 10+ messages in thread From: Bijan Mottahedeh @ 2020-04-10 16:54 UTC (permalink / raw) To: Pavel Begunkov, axboe; +Cc: io-uring > As I see, this down_read() from the trace is > down_read(¤t->mm->mmap_sem), where current->mm is set by use_mm() > just several lines above your change. So, what do you mean by passing? I > don't see do_madvise() __explicitly__ accepting mm as an argument. I think the sequence is: io_madvise() -> do_madvise(NULL, req->work.mm, ma->addr, ma->len, ma->advice) ^^^^^^^^^^^^ -> down_read(&mm->mmap_sem) I added an assert in do_madvise() for a NULL mm value and hit it running the test. > What tree do you use? Extra patches on top? I'm using next-20200409 with no patches. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC 1/1] io_uring: preserve work->mm since actual work processing may need it 2020-04-10 16:54 ` Bijan Mottahedeh @ 2020-04-10 17:51 ` Pavel Begunkov 2020-04-10 17:57 ` Pavel Begunkov 2020-04-10 19:09 ` Bijan Mottahedeh 0 siblings, 2 replies; 10+ messages in thread From: Pavel Begunkov @ 2020-04-10 17:51 UTC (permalink / raw) To: Bijan Mottahedeh, axboe; +Cc: io-uring On 10/04/2020 19:54, Bijan Mottahedeh wrote: > >> As I see, this down_read() from the trace is >> down_read(¤t->mm->mmap_sem), where current->mm is set by use_mm() >> just several lines above your change. So, what do you mean by passing? I >> don't see do_madvise() __explicitly__ accepting mm as an argument. > > I think the sequence is: > > io_madvise() > -> do_madvise(NULL, req->work.mm, ma->addr, ma->len, ma->advice) > ^^^^^^^^^^^^ > -> down_read(&mm->mmap_sem) > > I added an assert in do_madvise() for a NULL mm value and hit it running the test. > >> What tree do you use? Extra patches on top? > > I'm using next-20200409 with no patches. I see, it came from 676a179 ("mm: pass task and mm to do_madvise"), which isn't in Jen's tree. I don't think your patch will do, because it changes mm refcounting with extra mmdrop() in io_req_work_drop_env(). That's assuming it worked well before. Better fix then is to make it ```do_madvise(NULL, current->mm, ...)``` as it actually was at some point in the mentioned patch (v5). -- Pavel Begunkov ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC 1/1] io_uring: preserve work->mm since actual work processing may need it 2020-04-10 17:51 ` Pavel Begunkov @ 2020-04-10 17:57 ` Pavel Begunkov 2020-04-10 19:09 ` Bijan Mottahedeh 1 sibling, 0 replies; 10+ messages in thread From: Pavel Begunkov @ 2020-04-10 17:57 UTC (permalink / raw) To: Bijan Mottahedeh, axboe; +Cc: io-uring On 10/04/2020 20:51, Pavel Begunkov wrote: >> I added an assert in do_madvise() for a NULL mm value and hit it running the test. >> >>> What tree do you use? Extra patches on top? >> >> I'm using next-20200409 with no patches. > > I see, it came from 676a179 ("mm: pass task and mm to do_madvise"), which isn't > in Jen's tree oops, sorry for mistyping your name. > > I don't think your patch will do, because it changes mm refcounting with extra > mmdrop() in io_req_work_drop_env(). That's assuming it worked well before. > > Better fix then is to make it ```do_madvise(NULL, current->mm, ...)``` > as it actually was at some point in the mentioned patch (v5). Jens, how this should be handled? Through what tree it has to go? -- Pavel Begunkov ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC 1/1] io_uring: preserve work->mm since actual work processing may need it 2020-04-10 17:51 ` Pavel Begunkov 2020-04-10 17:57 ` Pavel Begunkov @ 2020-04-10 19:09 ` Bijan Mottahedeh 2020-04-11 2:17 ` Jens Axboe 1 sibling, 1 reply; 10+ messages in thread From: Bijan Mottahedeh @ 2020-04-10 19:09 UTC (permalink / raw) To: Pavel Begunkov, axboe; +Cc: io-uring On 4/10/2020 10:51 AM, Pavel Begunkov wrote: > On 10/04/2020 19:54, Bijan Mottahedeh wrote: >>> As I see, this down_read() from the trace is >>> down_read(¤t->mm->mmap_sem), where current->mm is set by use_mm() >>> just several lines above your change. So, what do you mean by passing? I >>> don't see do_madvise() __explicitly__ accepting mm as an argument. >> I think the sequence is: >> >> io_madvise() >> -> do_madvise(NULL, req->work.mm, ma->addr, ma->len, ma->advice) >> ^^^^^^^^^^^^ >> -> down_read(&mm->mmap_sem) >> >> I added an assert in do_madvise() for a NULL mm value and hit it running the test. >> >>> What tree do you use? Extra patches on top? >> I'm using next-20200409 with no patches. > I see, it came from 676a179 ("mm: pass task and mm to do_madvise"), which isn't > in Jen's tree. > > I don't think your patch will do, because it changes mm refcounting with extra > mmdrop() in io_req_work_drop_env(). That's assuming it worked well before. > > Better fix then is to make it ```do_madvise(NULL, current->mm, ...)``` > as it actually was at some point in the mentioned patch (v5). > Ok. Jens had suggested to use req->work.mm in the patch comments so let's just get him to confirm: "I think we want to use req->work.mm here - it'll be the same as current->mm at this point, but it makes it clear that we're using a grabbed mm." ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC 1/1] io_uring: preserve work->mm since actual work processing may need it 2020-04-10 19:09 ` Bijan Mottahedeh @ 2020-04-11 2:17 ` Jens Axboe 2020-04-16 20:24 ` Minchan Kim 0 siblings, 1 reply; 10+ messages in thread From: Jens Axboe @ 2020-04-11 2:17 UTC (permalink / raw) To: Bijan Mottahedeh, Pavel Begunkov; +Cc: io-uring, Minchan Kim On 4/10/20 12:09 PM, Bijan Mottahedeh wrote: > On 4/10/2020 10:51 AM, Pavel Begunkov wrote: >> On 10/04/2020 19:54, Bijan Mottahedeh wrote: >>>> As I see, this down_read() from the trace is >>>> down_read(¤t->mm->mmap_sem), where current->mm is set by use_mm() >>>> just several lines above your change. So, what do you mean by passing? I >>>> don't see do_madvise() __explicitly__ accepting mm as an argument. >>> I think the sequence is: >>> >>> io_madvise() >>> -> do_madvise(NULL, req->work.mm, ma->addr, ma->len, ma->advice) >>> ^^^^^^^^^^^^ >>> -> down_read(&mm->mmap_sem) >>> >>> I added an assert in do_madvise() for a NULL mm value and hit it running the test. >>> >>>> What tree do you use? Extra patches on top? >>> I'm using next-20200409 with no patches. >> I see, it came from 676a179 ("mm: pass task and mm to do_madvise"), which isn't >> in Jen's tree. >> >> I don't think your patch will do, because it changes mm refcounting with extra >> mmdrop() in io_req_work_drop_env(). That's assuming it worked well before. >> >> Better fix then is to make it ```do_madvise(NULL, current->mm, ...)``` >> as it actually was at some point in the mentioned patch (v5). >> > Ok. Jens had suggested to use req->work.mm in the patch comments so > let's just get him to confirm: > > "I think we want to use req->work.mm here - it'll be the same as > current->mm at this point, but it makes it clear that we're using a > grabbed mm." We should just use current->mm, as that matches at that point anyway since IORING_OP_MADVISE had needs_mm set. Minchan, can you please make that change? -- Jens Axboe ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC 1/1] io_uring: preserve work->mm since actual work processing may need it 2020-04-11 2:17 ` Jens Axboe @ 2020-04-16 20:24 ` Minchan Kim 2020-04-16 20:30 ` Jens Axboe 0 siblings, 1 reply; 10+ messages in thread From: Minchan Kim @ 2020-04-16 20:24 UTC (permalink / raw) To: Jens Axboe; +Cc: Bijan Mottahedeh, Pavel Begunkov, io-uring Hi Jens, Sorry for the late. On Fri, Apr 10, 2020 at 08:17:29PM -0600, Jens Axboe wrote: > On 4/10/20 12:09 PM, Bijan Mottahedeh wrote: > > On 4/10/2020 10:51 AM, Pavel Begunkov wrote: > >> On 10/04/2020 19:54, Bijan Mottahedeh wrote: > >>>> As I see, this down_read() from the trace is > >>>> down_read(¤t->mm->mmap_sem), where current->mm is set by use_mm() > >>>> just several lines above your change. So, what do you mean by passing? I > >>>> don't see do_madvise() __explicitly__ accepting mm as an argument. > >>> I think the sequence is: > >>> > >>> io_madvise() > >>> -> do_madvise(NULL, req->work.mm, ma->addr, ma->len, ma->advice) > >>> ^^^^^^^^^^^^ > >>> -> down_read(&mm->mmap_sem) > >>> > >>> I added an assert in do_madvise() for a NULL mm value and hit it running the test. > >>> > >>>> What tree do you use? Extra patches on top? > >>> I'm using next-20200409 with no patches. > >> I see, it came from 676a179 ("mm: pass task and mm to do_madvise"), which isn't > >> in Jen's tree. > >> > >> I don't think your patch will do, because it changes mm refcounting with extra > >> mmdrop() in io_req_work_drop_env(). That's assuming it worked well before. > >> > >> Better fix then is to make it ```do_madvise(NULL, current->mm, ...)``` > >> as it actually was at some point in the mentioned patch (v5). > >> > > Ok. Jens had suggested to use req->work.mm in the patch comments so > > let's just get him to confirm: > > > > "I think we want to use req->work.mm here - it'll be the same as > > current->mm at this point, but it makes it clear that we're using a > > grabbed mm." > > We should just use current->mm, as that matches at that point anyway > since IORING_OP_MADVISE had needs_mm set. > > Minchan, can you please make that change? Do you mean this? diff --git a/fs/io_uring.c b/fs/io_uring.c index a9537cd77aeb..3edbb4764993 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -3280,7 +3280,7 @@ static int io_madvise(struct io_kiocb *req, bool force_nonblock) if (force_nonblock) return -EAGAIN; - ret = do_madvise(NULL, req->work.mm, ma->addr, ma->len, ma->advice); + ret = do_madvise(NULL, current->mm, ma->addr, ma->len, ma->advice); if (ret < 0) req_set_fail_links(req); io_cqring_add_event(req, ret); Since I have a plan to resend whole patchset again, I will carry on that. ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [RFC 1/1] io_uring: preserve work->mm since actual work processing may need it 2020-04-16 20:24 ` Minchan Kim @ 2020-04-16 20:30 ` Jens Axboe 0 siblings, 0 replies; 10+ messages in thread From: Jens Axboe @ 2020-04-16 20:30 UTC (permalink / raw) To: Minchan Kim; +Cc: Bijan Mottahedeh, Pavel Begunkov, io-uring On 4/16/20 2:24 PM, Minchan Kim wrote: > Hi Jens, > > Sorry for the late. > > On Fri, Apr 10, 2020 at 08:17:29PM -0600, Jens Axboe wrote: >> On 4/10/20 12:09 PM, Bijan Mottahedeh wrote: >>> On 4/10/2020 10:51 AM, Pavel Begunkov wrote: >>>> On 10/04/2020 19:54, Bijan Mottahedeh wrote: >>>>>> As I see, this down_read() from the trace is >>>>>> down_read(¤t->mm->mmap_sem), where current->mm is set by use_mm() >>>>>> just several lines above your change. So, what do you mean by passing? I >>>>>> don't see do_madvise() __explicitly__ accepting mm as an argument. >>>>> I think the sequence is: >>>>> >>>>> io_madvise() >>>>> -> do_madvise(NULL, req->work.mm, ma->addr, ma->len, ma->advice) >>>>> ^^^^^^^^^^^^ >>>>> -> down_read(&mm->mmap_sem) >>>>> >>>>> I added an assert in do_madvise() for a NULL mm value and hit it running the test. >>>>> >>>>>> What tree do you use? Extra patches on top? >>>>> I'm using next-20200409 with no patches. >>>> I see, it came from 676a179 ("mm: pass task and mm to do_madvise"), which isn't >>>> in Jen's tree. >>>> >>>> I don't think your patch will do, because it changes mm refcounting with extra >>>> mmdrop() in io_req_work_drop_env(). That's assuming it worked well before. >>>> >>>> Better fix then is to make it ```do_madvise(NULL, current->mm, ...)``` >>>> as it actually was at some point in the mentioned patch (v5). >>>> >>> Ok. Jens had suggested to use req->work.mm in the patch comments so >>> let's just get him to confirm: >>> >>> "I think we want to use req->work.mm here - it'll be the same as >>> current->mm at this point, but it makes it clear that we're using a >>> grabbed mm." >> >> We should just use current->mm, as that matches at that point anyway >> since IORING_OP_MADVISE had needs_mm set. >> >> Minchan, can you please make that change? > > Do you mean this? > > diff --git a/fs/io_uring.c b/fs/io_uring.c > index a9537cd77aeb..3edbb4764993 100644 > --- a/fs/io_uring.c > +++ b/fs/io_uring.c > @@ -3280,7 +3280,7 @@ static int io_madvise(struct io_kiocb *req, bool force_nonblock) > if (force_nonblock) > return -EAGAIN; > > - ret = do_madvise(NULL, req->work.mm, ma->addr, ma->len, ma->advice); > + ret = do_madvise(NULL, current->mm, ma->addr, ma->len, ma->advice); > if (ret < 0) > req_set_fail_links(req); > io_cqring_add_event(req, ret); > > Since I have a plan to resend whole patchset again, I will carry on > that. Yeah exactly like that, thanks! -- Jens Axboe ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2020-04-16 20:30 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2020-04-09 22:03 [RFC 0/1] io_uring: preserve work->mm since actual work processing may need it Bijan Mottahedeh 2020-04-09 22:03 ` [RFC 1/1] " Bijan Mottahedeh 2020-04-10 8:47 ` Pavel Begunkov 2020-04-10 16:54 ` Bijan Mottahedeh 2020-04-10 17:51 ` Pavel Begunkov 2020-04-10 17:57 ` Pavel Begunkov 2020-04-10 19:09 ` Bijan Mottahedeh 2020-04-11 2:17 ` Jens Axboe 2020-04-16 20:24 ` Minchan Kim 2020-04-16 20:30 ` Jens Axboe
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox