* [RFC 0/1] io_uring: preserve work->mm since actual work processing may need it
@ 2020-04-09 22:03 Bijan Mottahedeh
2020-04-09 22:03 ` [RFC 1/1] " Bijan Mottahedeh
0 siblings, 1 reply; 10+ messages in thread
From: Bijan Mottahedeh @ 2020-04-09 22:03 UTC (permalink / raw)
To: axboe; +Cc: io-uring
The liburing madvise test crashes the system with a NULL pointer
dereference because io_madvise() is passing a NULL mm value, previously
cleared in io_wq_switch_mm(), to do_advise().
I'm not clear why work->mm is being cleared, especially since it seems
to run contrary to what the comment above it states, but in any case
preserving the work->mm value gets rid of the crash.
--------------------------------------------------------------------------
Running test madvise
[ 165.733724] BUG: kernel NULL pointer dereference, address: 0000000000000138
[ 165.735088] #PF: supervisor read access in kernel mode
[ 165.736027] #PF: error_code(0x0000) - not-present page
[ 165.736971] PGD 8000000fa3c32067 P4D 8000000fa3c32067 PUD fc4e17067 PMD 0
[ 165.738254] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
[ 165.739140] CPU: 18 PID: 30105 Comm: io_wqe_worker-0 Not tainted 5.6.0-next-1
[ 165.740640] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-4
[ 165.742721] RIP: 0010:__lock_acquire.isra.29+0x37/0x6c0
[ 165.743656] Code: 25 40 8e 01 00 53 48 83 ec 18 44 8b 35 e6 2f 61 01 45 85 fc
[ 165.747020] RSP: 0018:ffffc9000b08bba0 EFLAGS: 00010097
[ 165.747989] RAX: 0000000000000000 RBX: 0000000000000130 RCX: 0000000000000001
[ 165.749276] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000130
[ 165.750552] RBP: ffff888fa35224c0 R08: 0000000000000000 R09: 0000000000000000
[ 165.751862] R10: 0000000000000130 R11: 0000000000000000 R12: 0000000000000000
[ 165.753195] R13: 0000000000000001 R14: 0000000000000000 R15: 00007f5c4ecea000
[ 165.754490] FS: 0000000000000000(0000) GS:ffff888ff4600000(0000) knlGS:00000
[ 165.756007] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 165.757054] CR2: 0000000000000138 CR3: 0000000fc709c002 CR4: 0000000000160ee0
[ 165.758339] Call Trace:
[ 165.758805] ? load_balance+0x1b4/0xd00
[ 165.759525] lock_acquire+0xf9/0x160
[ 165.760202] ? do_madvise+0xa59/0xb20
[ 165.760894] down_read+0x3c/0xe0
[ 165.761479] ? do_madvise+0xa59/0xb20
[ 165.762188] do_madvise+0xa59/0xb20
[ 165.762830] ? kvm_sched_clock_read+0xd/0x20
[ 165.763643] ? free_debug_processing+0x291/0x2c8
[ 165.764535] ? do_raw_spin_unlock+0x83/0x90
[ 165.765303] ? free_debug_processing+0x291/0x2c8
[ 165.766184] io_issue_sqe+0xafa/0x11e0
[ 165.766867] ? kvm_sched_clock_read+0xd/0x20
[ 165.767641] ? __free_pages_ok+0x3db/0x550
[ 165.768390] ? _raw_spin_unlock+0x1f/0x30
[ 165.769129] io_wq_submit_work+0x2f/0x80
[ 165.769800] io_worker_handle_work+0x38a/0x540
[ 165.770650] io_wqe_worker+0x32a/0x370
[ 165.771342] kthread+0x118/0x120
[ 165.771948] ? io_worker_handle_work+0x540/0x540
[ 165.772784] ? kthread_insert_work_sanity_check+0x60/0x60
[ 165.773766] ret_from_fork+0x1f/0x30
[ 165.774419] Modules linked in: xfs dm_mod sr_mod sd_mod cdrom crc32c_intel nt
[ 165.777124] CR2: 0000000000000138
[ 165.777733] ---[ end trace 2a1a5b9c912bd387 ]---
Bijan Mottahedeh (1):
io_uring: preserve work->mm since actual work processing may need it
fs/io-wq.c | 2 --
1 file changed, 2 deletions(-)
--
1.8.3.1
^ permalink raw reply [flat|nested] 10+ messages in thread
* [RFC 1/1] io_uring: preserve work->mm since actual work processing may need it
2020-04-09 22:03 [RFC 0/1] io_uring: preserve work->mm since actual work processing may need it Bijan Mottahedeh
@ 2020-04-09 22:03 ` Bijan Mottahedeh
2020-04-10 8:47 ` Pavel Begunkov
0 siblings, 1 reply; 10+ messages in thread
From: Bijan Mottahedeh @ 2020-04-09 22:03 UTC (permalink / raw)
To: axboe; +Cc: io-uring
Do not clear work->mm since io_madvise() passes it to do_madvise()
when the request is actually processed.
Signed-off-by: Bijan Mottahedeh <[email protected]>
---
fs/io-wq.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/fs/io-wq.c b/fs/io-wq.c
index 4023c98..4d20754 100644
--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -431,8 +431,6 @@ static void io_wq_switch_mm(struct io_worker *worker, struct io_wq_work *work)
if (!worker->mm)
set_fs(USER_DS);
worker->mm = work->mm;
- /* hang on to this mm */
- work->mm = NULL;
return;
}
--
1.8.3.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [RFC 1/1] io_uring: preserve work->mm since actual work processing may need it
2020-04-09 22:03 ` [RFC 1/1] " Bijan Mottahedeh
@ 2020-04-10 8:47 ` Pavel Begunkov
2020-04-10 16:54 ` Bijan Mottahedeh
0 siblings, 1 reply; 10+ messages in thread
From: Pavel Begunkov @ 2020-04-10 8:47 UTC (permalink / raw)
To: Bijan Mottahedeh, axboe; +Cc: io-uring
On 4/10/2020 1:03 AM, Bijan Mottahedeh wrote:
> Do not clear work->mm since io_madvise() passes it to do_madvise()
> when the request is actually processed.
As I see, this down_read() from the trace is
down_read(¤t->mm->mmap_sem), where current->mm is set by use_mm()
just several lines above your change. So, what do you mean by passing? I
don't see do_madvise() __explicitly__ accepting mm as an argument.
What tree do you use? Extra patches on top?
>
> Signed-off-by: Bijan Mottahedeh <[email protected]>
> ---
> fs/io-wq.c | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/fs/io-wq.c b/fs/io-wq.c
> index 4023c98..4d20754 100644
> --- a/fs/io-wq.c
> +++ b/fs/io-wq.c
> @@ -431,8 +431,6 @@ static void io_wq_switch_mm(struct io_worker *worker, struct io_wq_work *work)
> if (!worker->mm)
> set_fs(USER_DS);
> worker->mm = work->mm;
> - /* hang on to this mm */
> - work->mm = NULL;
> return;
> }
>
>
--
Pavel Begunkov
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC 1/1] io_uring: preserve work->mm since actual work processing may need it
2020-04-10 8:47 ` Pavel Begunkov
@ 2020-04-10 16:54 ` Bijan Mottahedeh
2020-04-10 17:51 ` Pavel Begunkov
0 siblings, 1 reply; 10+ messages in thread
From: Bijan Mottahedeh @ 2020-04-10 16:54 UTC (permalink / raw)
To: Pavel Begunkov, axboe; +Cc: io-uring
> As I see, this down_read() from the trace is
> down_read(¤t->mm->mmap_sem), where current->mm is set by use_mm()
> just several lines above your change. So, what do you mean by passing? I
> don't see do_madvise() __explicitly__ accepting mm as an argument.
I think the sequence is:
io_madvise()
-> do_madvise(NULL, req->work.mm, ma->addr, ma->len, ma->advice)
^^^^^^^^^^^^
-> down_read(&mm->mmap_sem)
I added an assert in do_madvise() for a NULL mm value and hit it running
the test.
> What tree do you use? Extra patches on top?
I'm using next-20200409 with no patches.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC 1/1] io_uring: preserve work->mm since actual work processing may need it
2020-04-10 16:54 ` Bijan Mottahedeh
@ 2020-04-10 17:51 ` Pavel Begunkov
2020-04-10 17:57 ` Pavel Begunkov
2020-04-10 19:09 ` Bijan Mottahedeh
0 siblings, 2 replies; 10+ messages in thread
From: Pavel Begunkov @ 2020-04-10 17:51 UTC (permalink / raw)
To: Bijan Mottahedeh, axboe; +Cc: io-uring
On 10/04/2020 19:54, Bijan Mottahedeh wrote:
>
>> As I see, this down_read() from the trace is
>> down_read(¤t->mm->mmap_sem), where current->mm is set by use_mm()
>> just several lines above your change. So, what do you mean by passing? I
>> don't see do_madvise() __explicitly__ accepting mm as an argument.
>
> I think the sequence is:
>
> io_madvise()
> -> do_madvise(NULL, req->work.mm, ma->addr, ma->len, ma->advice)
> ^^^^^^^^^^^^
> -> down_read(&mm->mmap_sem)
>
> I added an assert in do_madvise() for a NULL mm value and hit it running the test.
>
>> What tree do you use? Extra patches on top?
>
> I'm using next-20200409 with no patches.
I see, it came from 676a179 ("mm: pass task and mm to do_madvise"), which isn't
in Jen's tree.
I don't think your patch will do, because it changes mm refcounting with extra
mmdrop() in io_req_work_drop_env(). That's assuming it worked well before.
Better fix then is to make it ```do_madvise(NULL, current->mm, ...)```
as it actually was at some point in the mentioned patch (v5).
--
Pavel Begunkov
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC 1/1] io_uring: preserve work->mm since actual work processing may need it
2020-04-10 17:51 ` Pavel Begunkov
@ 2020-04-10 17:57 ` Pavel Begunkov
2020-04-10 19:09 ` Bijan Mottahedeh
1 sibling, 0 replies; 10+ messages in thread
From: Pavel Begunkov @ 2020-04-10 17:57 UTC (permalink / raw)
To: Bijan Mottahedeh, axboe; +Cc: io-uring
On 10/04/2020 20:51, Pavel Begunkov wrote:
>> I added an assert in do_madvise() for a NULL mm value and hit it running the test.
>>
>>> What tree do you use? Extra patches on top?
>>
>> I'm using next-20200409 with no patches.
>
> I see, it came from 676a179 ("mm: pass task and mm to do_madvise"), which isn't
> in Jen's tree
oops, sorry for mistyping your name.
>
> I don't think your patch will do, because it changes mm refcounting with extra
> mmdrop() in io_req_work_drop_env(). That's assuming it worked well before.
>
> Better fix then is to make it ```do_madvise(NULL, current->mm, ...)```
> as it actually was at some point in the mentioned patch (v5).
Jens, how this should be handled? Through what tree it has to go?
--
Pavel Begunkov
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC 1/1] io_uring: preserve work->mm since actual work processing may need it
2020-04-10 17:51 ` Pavel Begunkov
2020-04-10 17:57 ` Pavel Begunkov
@ 2020-04-10 19:09 ` Bijan Mottahedeh
2020-04-11 2:17 ` Jens Axboe
1 sibling, 1 reply; 10+ messages in thread
From: Bijan Mottahedeh @ 2020-04-10 19:09 UTC (permalink / raw)
To: Pavel Begunkov, axboe; +Cc: io-uring
On 4/10/2020 10:51 AM, Pavel Begunkov wrote:
> On 10/04/2020 19:54, Bijan Mottahedeh wrote:
>>> As I see, this down_read() from the trace is
>>> down_read(¤t->mm->mmap_sem), where current->mm is set by use_mm()
>>> just several lines above your change. So, what do you mean by passing? I
>>> don't see do_madvise() __explicitly__ accepting mm as an argument.
>> I think the sequence is:
>>
>> io_madvise()
>> -> do_madvise(NULL, req->work.mm, ma->addr, ma->len, ma->advice)
>> ^^^^^^^^^^^^
>> -> down_read(&mm->mmap_sem)
>>
>> I added an assert in do_madvise() for a NULL mm value and hit it running the test.
>>
>>> What tree do you use? Extra patches on top?
>> I'm using next-20200409 with no patches.
> I see, it came from 676a179 ("mm: pass task and mm to do_madvise"), which isn't
> in Jen's tree.
>
> I don't think your patch will do, because it changes mm refcounting with extra
> mmdrop() in io_req_work_drop_env(). That's assuming it worked well before.
>
> Better fix then is to make it ```do_madvise(NULL, current->mm, ...)```
> as it actually was at some point in the mentioned patch (v5).
>
Ok. Jens had suggested to use req->work.mm in the patch comments so
let's just get him to confirm:
"I think we want to use req->work.mm here - it'll be the same as
current->mm at this point, but it makes it clear that we're using a
grabbed mm."
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC 1/1] io_uring: preserve work->mm since actual work processing may need it
2020-04-10 19:09 ` Bijan Mottahedeh
@ 2020-04-11 2:17 ` Jens Axboe
2020-04-16 20:24 ` Minchan Kim
0 siblings, 1 reply; 10+ messages in thread
From: Jens Axboe @ 2020-04-11 2:17 UTC (permalink / raw)
To: Bijan Mottahedeh, Pavel Begunkov; +Cc: io-uring, Minchan Kim
On 4/10/20 12:09 PM, Bijan Mottahedeh wrote:
> On 4/10/2020 10:51 AM, Pavel Begunkov wrote:
>> On 10/04/2020 19:54, Bijan Mottahedeh wrote:
>>>> As I see, this down_read() from the trace is
>>>> down_read(¤t->mm->mmap_sem), where current->mm is set by use_mm()
>>>> just several lines above your change. So, what do you mean by passing? I
>>>> don't see do_madvise() __explicitly__ accepting mm as an argument.
>>> I think the sequence is:
>>>
>>> io_madvise()
>>> -> do_madvise(NULL, req->work.mm, ma->addr, ma->len, ma->advice)
>>> ^^^^^^^^^^^^
>>> -> down_read(&mm->mmap_sem)
>>>
>>> I added an assert in do_madvise() for a NULL mm value and hit it running the test.
>>>
>>>> What tree do you use? Extra patches on top?
>>> I'm using next-20200409 with no patches.
>> I see, it came from 676a179 ("mm: pass task and mm to do_madvise"), which isn't
>> in Jen's tree.
>>
>> I don't think your patch will do, because it changes mm refcounting with extra
>> mmdrop() in io_req_work_drop_env(). That's assuming it worked well before.
>>
>> Better fix then is to make it ```do_madvise(NULL, current->mm, ...)```
>> as it actually was at some point in the mentioned patch (v5).
>>
> Ok. Jens had suggested to use req->work.mm in the patch comments so
> let's just get him to confirm:
>
> "I think we want to use req->work.mm here - it'll be the same as
> current->mm at this point, but it makes it clear that we're using a
> grabbed mm."
We should just use current->mm, as that matches at that point anyway
since IORING_OP_MADVISE had needs_mm set.
Minchan, can you please make that change?
--
Jens Axboe
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC 1/1] io_uring: preserve work->mm since actual work processing may need it
2020-04-11 2:17 ` Jens Axboe
@ 2020-04-16 20:24 ` Minchan Kim
2020-04-16 20:30 ` Jens Axboe
0 siblings, 1 reply; 10+ messages in thread
From: Minchan Kim @ 2020-04-16 20:24 UTC (permalink / raw)
To: Jens Axboe; +Cc: Bijan Mottahedeh, Pavel Begunkov, io-uring
Hi Jens,
Sorry for the late.
On Fri, Apr 10, 2020 at 08:17:29PM -0600, Jens Axboe wrote:
> On 4/10/20 12:09 PM, Bijan Mottahedeh wrote:
> > On 4/10/2020 10:51 AM, Pavel Begunkov wrote:
> >> On 10/04/2020 19:54, Bijan Mottahedeh wrote:
> >>>> As I see, this down_read() from the trace is
> >>>> down_read(¤t->mm->mmap_sem), where current->mm is set by use_mm()
> >>>> just several lines above your change. So, what do you mean by passing? I
> >>>> don't see do_madvise() __explicitly__ accepting mm as an argument.
> >>> I think the sequence is:
> >>>
> >>> io_madvise()
> >>> -> do_madvise(NULL, req->work.mm, ma->addr, ma->len, ma->advice)
> >>> ^^^^^^^^^^^^
> >>> -> down_read(&mm->mmap_sem)
> >>>
> >>> I added an assert in do_madvise() for a NULL mm value and hit it running the test.
> >>>
> >>>> What tree do you use? Extra patches on top?
> >>> I'm using next-20200409 with no patches.
> >> I see, it came from 676a179 ("mm: pass task and mm to do_madvise"), which isn't
> >> in Jen's tree.
> >>
> >> I don't think your patch will do, because it changes mm refcounting with extra
> >> mmdrop() in io_req_work_drop_env(). That's assuming it worked well before.
> >>
> >> Better fix then is to make it ```do_madvise(NULL, current->mm, ...)```
> >> as it actually was at some point in the mentioned patch (v5).
> >>
> > Ok. Jens had suggested to use req->work.mm in the patch comments so
> > let's just get him to confirm:
> >
> > "I think we want to use req->work.mm here - it'll be the same as
> > current->mm at this point, but it makes it clear that we're using a
> > grabbed mm."
>
> We should just use current->mm, as that matches at that point anyway
> since IORING_OP_MADVISE had needs_mm set.
>
> Minchan, can you please make that change?
Do you mean this?
diff --git a/fs/io_uring.c b/fs/io_uring.c
index a9537cd77aeb..3edbb4764993 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -3280,7 +3280,7 @@ static int io_madvise(struct io_kiocb *req, bool force_nonblock)
if (force_nonblock)
return -EAGAIN;
- ret = do_madvise(NULL, req->work.mm, ma->addr, ma->len, ma->advice);
+ ret = do_madvise(NULL, current->mm, ma->addr, ma->len, ma->advice);
if (ret < 0)
req_set_fail_links(req);
io_cqring_add_event(req, ret);
Since I have a plan to resend whole patchset again, I will carry on
that.
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [RFC 1/1] io_uring: preserve work->mm since actual work processing may need it
2020-04-16 20:24 ` Minchan Kim
@ 2020-04-16 20:30 ` Jens Axboe
0 siblings, 0 replies; 10+ messages in thread
From: Jens Axboe @ 2020-04-16 20:30 UTC (permalink / raw)
To: Minchan Kim; +Cc: Bijan Mottahedeh, Pavel Begunkov, io-uring
On 4/16/20 2:24 PM, Minchan Kim wrote:
> Hi Jens,
>
> Sorry for the late.
>
> On Fri, Apr 10, 2020 at 08:17:29PM -0600, Jens Axboe wrote:
>> On 4/10/20 12:09 PM, Bijan Mottahedeh wrote:
>>> On 4/10/2020 10:51 AM, Pavel Begunkov wrote:
>>>> On 10/04/2020 19:54, Bijan Mottahedeh wrote:
>>>>>> As I see, this down_read() from the trace is
>>>>>> down_read(¤t->mm->mmap_sem), where current->mm is set by use_mm()
>>>>>> just several lines above your change. So, what do you mean by passing? I
>>>>>> don't see do_madvise() __explicitly__ accepting mm as an argument.
>>>>> I think the sequence is:
>>>>>
>>>>> io_madvise()
>>>>> -> do_madvise(NULL, req->work.mm, ma->addr, ma->len, ma->advice)
>>>>> ^^^^^^^^^^^^
>>>>> -> down_read(&mm->mmap_sem)
>>>>>
>>>>> I added an assert in do_madvise() for a NULL mm value and hit it running the test.
>>>>>
>>>>>> What tree do you use? Extra patches on top?
>>>>> I'm using next-20200409 with no patches.
>>>> I see, it came from 676a179 ("mm: pass task and mm to do_madvise"), which isn't
>>>> in Jen's tree.
>>>>
>>>> I don't think your patch will do, because it changes mm refcounting with extra
>>>> mmdrop() in io_req_work_drop_env(). That's assuming it worked well before.
>>>>
>>>> Better fix then is to make it ```do_madvise(NULL, current->mm, ...)```
>>>> as it actually was at some point in the mentioned patch (v5).
>>>>
>>> Ok. Jens had suggested to use req->work.mm in the patch comments so
>>> let's just get him to confirm:
>>>
>>> "I think we want to use req->work.mm here - it'll be the same as
>>> current->mm at this point, but it makes it clear that we're using a
>>> grabbed mm."
>>
>> We should just use current->mm, as that matches at that point anyway
>> since IORING_OP_MADVISE had needs_mm set.
>>
>> Minchan, can you please make that change?
>
> Do you mean this?
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index a9537cd77aeb..3edbb4764993 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -3280,7 +3280,7 @@ static int io_madvise(struct io_kiocb *req, bool force_nonblock)
> if (force_nonblock)
> return -EAGAIN;
>
> - ret = do_madvise(NULL, req->work.mm, ma->addr, ma->len, ma->advice);
> + ret = do_madvise(NULL, current->mm, ma->addr, ma->len, ma->advice);
> if (ret < 0)
> req_set_fail_links(req);
> io_cqring_add_event(req, ret);
>
> Since I have a plan to resend whole patchset again, I will carry on
> that.
Yeah exactly like that, thanks!
--
Jens Axboe
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2020-04-16 20:30 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-04-09 22:03 [RFC 0/1] io_uring: preserve work->mm since actual work processing may need it Bijan Mottahedeh
2020-04-09 22:03 ` [RFC 1/1] " Bijan Mottahedeh
2020-04-10 8:47 ` Pavel Begunkov
2020-04-10 16:54 ` Bijan Mottahedeh
2020-04-10 17:51 ` Pavel Begunkov
2020-04-10 17:57 ` Pavel Begunkov
2020-04-10 19:09 ` Bijan Mottahedeh
2020-04-11 2:17 ` Jens Axboe
2020-04-16 20:24 ` Minchan Kim
2020-04-16 20:30 ` Jens Axboe
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox