* Re: [syzbot] WARNING in mntput_no_expire (2) [not found] <[email protected]> @ 2021-04-01 15:45 ` Christian Brauner 2021-04-01 16:09 ` Jens Axboe 0 siblings, 1 reply; 31+ messages in thread From: Christian Brauner @ 2021-04-01 15:45 UTC (permalink / raw) To: syzbot, axboe; +Cc: linux-fsdevel, linux-kernel, syzkaller-bugs, viro, io-uring On Thu, Apr 01, 2021 at 02:09:20AM -0700, syzbot wrote: > Hello, > > syzbot found the following issue on: > > HEAD commit: d19cc4bf Merge tag 'trace-v5.12-rc5' of git://git.kernel.o.. > git tree: upstream > console output: https://syzkaller.appspot.com/x/log.txt?x=1018f281d00000 > kernel config: https://syzkaller.appspot.com/x/.config?x=d1a3d65a48dbd1bc > dashboard link: https://syzkaller.appspot.com/bug?extid=c88a7030da47945a3cc3 > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=12f50d11d00000 > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=137694a1d00000 > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > Reported-by: [email protected] > > ------------[ cut here ]------------ > WARNING: CPU: 1 PID: 8409 at fs/namespace.c:1186 mntput_no_expire+0xaca/0xcb0 fs/namespace.c:1186 > Modules linked in: > CPU: 1 PID: 8409 Comm: syz-executor035 Not tainted 5.12.0-rc5-syzkaller #0 > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 > RIP: 0010:mntput_no_expire+0xaca/0xcb0 fs/namespace.c:1186 > Code: ff 48 c7 c2 e0 cb 78 89 be c2 02 00 00 48 c7 c7 a0 cb 78 89 c6 05 e5 6d e5 0b 01 e8 ff e1 f6 06 e9 3f fd ff ff e8 c6 a5 a8 ff <0f> 0b e9 fc fc ff ff e8 ba a5 a8 ff e8 55 dc 94 ff 31 ff 89 c5 89 > RSP: 0018:ffffc9000165fc78 EFLAGS: 00010293 > RAX: 0000000000000000 RBX: 1ffff920002cbf95 RCX: 0000000000000000 > RDX: ffff88802072d4c0 RSI: ffffffff81cb4b8a RDI: 0000000000000003 > RBP: ffff888011656900 R08: 0000000000000000 R09: ffffffff8fa978af > R10: ffffffff81cb4884 R11: 0000000000000000 R12: 0000000000000008 > R13: ffffc9000165fcc8 R14: dffffc0000000000 R15: 00000000ffffffff > FS: 0000000000000000(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 000055a722053160 CR3: 000000000bc8e000 CR4: 00000000001506e0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > Call Trace: > mntput fs/namespace.c:1232 [inline] > cleanup_mnt+0x523/0x530 fs/namespace.c:1132 > task_work_run+0xdd/0x1a0 kernel/task_work.c:140 > exit_task_work include/linux/task_work.h:30 [inline] > do_exit+0xbfc/0x2a60 kernel/exit.c:825 > do_group_exit+0x125/0x310 kernel/exit.c:922 > __do_sys_exit_group kernel/exit.c:933 [inline] > __se_sys_exit_group kernel/exit.c:931 [inline] > __x64_sys_exit_group+0x3a/0x50 kernel/exit.c:931 > do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 > entry_SYSCALL_64_after_hwframe+0x44/0xae > RIP: 0033:0x446af9 > Code: Unable to access opcode bytes at RIP 0x446acf. > RSP: 002b:00000000005dfe48 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 > RAX: ffffffffffffffda RBX: 00000000004ce450 RCX: 0000000000446af9 > RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000001 > RBP: 0000000000000001 R08: ffffffffffffffbc R09: 0000000000000000 > R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004ce450 > R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000001 [+Cc Jens + io_uring] Hm, this reproducer uses io_uring and it's the io_uring_enter() that triggers this reliably. With this reproducer I've managed to reproduce the issue on v5.12-rc4, and v5.12-rc3, v5.12-rc2 and v5.12-rc1. It's not reproducible at 9820b4dca0f9c6b7ab8b4307286cdace171b724d which is the commit immediately before the first v5.12 io_uring merge. It's first reproducible with the first io_uring merge for v5.12, i.e. 5bbb336ba75d95611a7b9456355b48705016bdb1 Christian ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [syzbot] WARNING in mntput_no_expire (2) 2021-04-01 15:45 ` [syzbot] WARNING in mntput_no_expire (2) Christian Brauner @ 2021-04-01 16:09 ` Jens Axboe 2021-04-01 17:46 ` Christian Brauner 0 siblings, 1 reply; 31+ messages in thread From: Jens Axboe @ 2021-04-01 16:09 UTC (permalink / raw) To: Christian Brauner, syzbot Cc: linux-fsdevel, linux-kernel, syzkaller-bugs, viro, io-uring On 4/1/21 9:45 AM, Christian Brauner wrote: > On Thu, Apr 01, 2021 at 02:09:20AM -0700, syzbot wrote: >> Hello, >> >> syzbot found the following issue on: >> >> HEAD commit: d19cc4bf Merge tag 'trace-v5.12-rc5' of git://git.kernel.o.. >> git tree: upstream >> console output: https://syzkaller.appspot.com/x/log.txt?x=1018f281d00000 >> kernel config: https://syzkaller.appspot.com/x/.config?x=d1a3d65a48dbd1bc >> dashboard link: https://syzkaller.appspot.com/bug?extid=c88a7030da47945a3cc3 >> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=12f50d11d00000 >> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=137694a1d00000 >> >> IMPORTANT: if you fix the issue, please add the following tag to the commit: >> Reported-by: [email protected] >> >> ------------[ cut here ]------------ >> WARNING: CPU: 1 PID: 8409 at fs/namespace.c:1186 mntput_no_expire+0xaca/0xcb0 fs/namespace.c:1186 >> Modules linked in: >> CPU: 1 PID: 8409 Comm: syz-executor035 Not tainted 5.12.0-rc5-syzkaller #0 >> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 >> RIP: 0010:mntput_no_expire+0xaca/0xcb0 fs/namespace.c:1186 >> Code: ff 48 c7 c2 e0 cb 78 89 be c2 02 00 00 48 c7 c7 a0 cb 78 89 c6 05 e5 6d e5 0b 01 e8 ff e1 f6 06 e9 3f fd ff ff e8 c6 a5 a8 ff <0f> 0b e9 fc fc ff ff e8 ba a5 a8 ff e8 55 dc 94 ff 31 ff 89 c5 89 >> RSP: 0018:ffffc9000165fc78 EFLAGS: 00010293 >> RAX: 0000000000000000 RBX: 1ffff920002cbf95 RCX: 0000000000000000 >> RDX: ffff88802072d4c0 RSI: ffffffff81cb4b8a RDI: 0000000000000003 >> RBP: ffff888011656900 R08: 0000000000000000 R09: ffffffff8fa978af >> R10: ffffffff81cb4884 R11: 0000000000000000 R12: 0000000000000008 >> R13: ffffc9000165fcc8 R14: dffffc0000000000 R15: 00000000ffffffff >> FS: 0000000000000000(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000 >> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> CR2: 000055a722053160 CR3: 000000000bc8e000 CR4: 00000000001506e0 >> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >> Call Trace: >> mntput fs/namespace.c:1232 [inline] >> cleanup_mnt+0x523/0x530 fs/namespace.c:1132 >> task_work_run+0xdd/0x1a0 kernel/task_work.c:140 >> exit_task_work include/linux/task_work.h:30 [inline] >> do_exit+0xbfc/0x2a60 kernel/exit.c:825 >> do_group_exit+0x125/0x310 kernel/exit.c:922 >> __do_sys_exit_group kernel/exit.c:933 [inline] >> __se_sys_exit_group kernel/exit.c:931 [inline] >> __x64_sys_exit_group+0x3a/0x50 kernel/exit.c:931 >> do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 >> entry_SYSCALL_64_after_hwframe+0x44/0xae >> RIP: 0033:0x446af9 >> Code: Unable to access opcode bytes at RIP 0x446acf. >> RSP: 002b:00000000005dfe48 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 >> RAX: ffffffffffffffda RBX: 00000000004ce450 RCX: 0000000000446af9 >> RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000001 >> RBP: 0000000000000001 R08: ffffffffffffffbc R09: 0000000000000000 >> R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004ce450 >> R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000001 > > [+Cc Jens + io_uring] > > Hm, this reproducer uses io_uring and it's the io_uring_enter() that > triggers this reliably. With this reproducer I've managed to reproduce > the issue on v5.12-rc4, and v5.12-rc3, v5.12-rc2 and v5.12-rc1. > It's not reproducible at > 9820b4dca0f9c6b7ab8b4307286cdace171b724d > which is the commit immediately before the first v5.12 io_uring merge. > It's first reproducible with the first io_uring merge for v5.12, i.e. > 5bbb336ba75d95611a7b9456355b48705016bdb1 Thanks, that's good info. I'll take a look at it and see if I can reproduce. -- Jens Axboe ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [syzbot] WARNING in mntput_no_expire (2) 2021-04-01 16:09 ` Jens Axboe @ 2021-04-01 17:46 ` Christian Brauner 2021-04-01 17:59 ` Christian Brauner 0 siblings, 1 reply; 31+ messages in thread From: Christian Brauner @ 2021-04-01 17:46 UTC (permalink / raw) To: Jens Axboe Cc: syzbot, linux-fsdevel, linux-kernel, syzkaller-bugs, viro, io-uring On Thu, Apr 01, 2021 at 10:09:18AM -0600, Jens Axboe wrote: > On 4/1/21 9:45 AM, Christian Brauner wrote: > > On Thu, Apr 01, 2021 at 02:09:20AM -0700, syzbot wrote: > >> Hello, > >> > >> syzbot found the following issue on: > >> > >> HEAD commit: d19cc4bf Merge tag 'trace-v5.12-rc5' of git://git.kernel.o.. > >> git tree: upstream > >> console output: https://syzkaller.appspot.com/x/log.txt?x=1018f281d00000 > >> kernel config: https://syzkaller.appspot.com/x/.config?x=d1a3d65a48dbd1bc > >> dashboard link: https://syzkaller.appspot.com/bug?extid=c88a7030da47945a3cc3 > >> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=12f50d11d00000 > >> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=137694a1d00000 > >> > >> IMPORTANT: if you fix the issue, please add the following tag to the commit: > >> Reported-by: [email protected] > >> > >> ------------[ cut here ]------------ > >> WARNING: CPU: 1 PID: 8409 at fs/namespace.c:1186 mntput_no_expire+0xaca/0xcb0 fs/namespace.c:1186 > >> Modules linked in: > >> CPU: 1 PID: 8409 Comm: syz-executor035 Not tainted 5.12.0-rc5-syzkaller #0 > >> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 > >> RIP: 0010:mntput_no_expire+0xaca/0xcb0 fs/namespace.c:1186 > >> Code: ff 48 c7 c2 e0 cb 78 89 be c2 02 00 00 48 c7 c7 a0 cb 78 89 c6 05 e5 6d e5 0b 01 e8 ff e1 f6 06 e9 3f fd ff ff e8 c6 a5 a8 ff <0f> 0b e9 fc fc ff ff e8 ba a5 a8 ff e8 55 dc 94 ff 31 ff 89 c5 89 > >> RSP: 0018:ffffc9000165fc78 EFLAGS: 00010293 > >> RAX: 0000000000000000 RBX: 1ffff920002cbf95 RCX: 0000000000000000 > >> RDX: ffff88802072d4c0 RSI: ffffffff81cb4b8a RDI: 0000000000000003 > >> RBP: ffff888011656900 R08: 0000000000000000 R09: ffffffff8fa978af > >> R10: ffffffff81cb4884 R11: 0000000000000000 R12: 0000000000000008 > >> R13: ffffc9000165fcc8 R14: dffffc0000000000 R15: 00000000ffffffff > >> FS: 0000000000000000(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000 > >> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > >> CR2: 000055a722053160 CR3: 000000000bc8e000 CR4: 00000000001506e0 > >> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > >> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > >> Call Trace: > >> mntput fs/namespace.c:1232 [inline] > >> cleanup_mnt+0x523/0x530 fs/namespace.c:1132 > >> task_work_run+0xdd/0x1a0 kernel/task_work.c:140 > >> exit_task_work include/linux/task_work.h:30 [inline] > >> do_exit+0xbfc/0x2a60 kernel/exit.c:825 > >> do_group_exit+0x125/0x310 kernel/exit.c:922 > >> __do_sys_exit_group kernel/exit.c:933 [inline] > >> __se_sys_exit_group kernel/exit.c:931 [inline] > >> __x64_sys_exit_group+0x3a/0x50 kernel/exit.c:931 > >> do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 > >> entry_SYSCALL_64_after_hwframe+0x44/0xae > >> RIP: 0033:0x446af9 > >> Code: Unable to access opcode bytes at RIP 0x446acf. > >> RSP: 002b:00000000005dfe48 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 > >> RAX: ffffffffffffffda RBX: 00000000004ce450 RCX: 0000000000446af9 > >> RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000001 > >> RBP: 0000000000000001 R08: ffffffffffffffbc R09: 0000000000000000 > >> R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004ce450 > >> R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000001 > > > > [+Cc Jens + io_uring] > > > > Hm, this reproducer uses io_uring and it's the io_uring_enter() that > > triggers this reliably. With this reproducer I've managed to reproduce > > the issue on v5.12-rc4, and v5.12-rc3, v5.12-rc2 and v5.12-rc1. > > It's not reproducible at > > 9820b4dca0f9c6b7ab8b4307286cdace171b724d > > which is the commit immediately before the first v5.12 io_uring merge. > > It's first reproducible with the first io_uring merge for v5.12, i.e. > > 5bbb336ba75d95611a7b9456355b48705016bdb1 > > Thanks, that's good info. I'll take a look at it and see if I can > reproduce. Ok, I was deep into this anyway and it didn't make much sense to do anything else at that point so I bisected this a bit further. The first bad commit is: commit 3a81fd02045c329f25e5900fa61f613c9b317644 Author: Jens Axboe <[email protected]> Date: Thu Dec 10 12:25:36 2020 -0700 io_uring: enable LOOKUP_CACHED path resolution for filename lookups Instead of being pessimistic and assume that path lookup will block, use LOOKUP_CACHED to attempt just a cached lookup. This ensures that the fast path is always done inline, and we only punt to async context if IO is needed to satisfy the lookup. For forced nonblock open attempts, mark the file O_NONBLOCK over the actual ->open() call as well. We can safely clear this again before doing fd_install(), so it'll never be user visible that we fiddled with it. This greatly improves the performance of file open where the dentry is already cached: ached 5.10-git 5.10-git+LOOKUP_CACHED Speedup --------------------------------------------------------------- 33% 1,014,975 900,474 1.1x 89% 545,466 292,937 1.9x 100% 435,636 151,475 2.9x The more cache hot we are, the faster the inline LOOKUP_CACHED optimization helps. This is unsurprising and expected, as a thread offload becomes a more dominant part of the total overhead. If we look at io_uring tracing, doing an IORING_OP_OPENAT on a file that isn't in the dentry cache will yield: 275.550481: io_uring_create: ring 00000000ddda6278, fd 3 sq size 8, cq size 16, flags 0 275.550491: io_uring_submit_sqe: ring 00000000ddda6278, op 18, data 0x0, non block 1, sq_thread 0 275.550498: io_uring_queue_async_work: ring 00000000ddda6278, request 00000000c0267d17, flags 69760, normal queue, work 000000003d683991 275.550502: io_uring_cqring_wait: ring 00000000ddda6278, min_events 1 275.550556: io_uring_complete: ring 00000000ddda6278, user_data 0x0, result 4 which shows a failed nonblock lookup, then punt to worker, and then we complete with fd == 4. This takes 65 usec in total. Re-running the same test case again: 281.253956: io_uring_create: ring 0000000008207252, fd 3 sq size 8, cq size 16, flags 0 281.253967: io_uring_submit_sqe: ring 0000000008207252, op 18, data 0x0, non block 1, sq_thread 0 281.253973: io_uring_complete: ring 0000000008207252, user_data 0x0, result 4 shows the same request completing inline, also returning fd == 4. This takes 6 usec. Signed-off-by: Jens Axboe <[email protected]> ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [syzbot] WARNING in mntput_no_expire (2) 2021-04-01 17:46 ` Christian Brauner @ 2021-04-01 17:59 ` Christian Brauner 2021-04-01 19:11 ` Al Viro 0 siblings, 1 reply; 31+ messages in thread From: Christian Brauner @ 2021-04-01 17:59 UTC (permalink / raw) To: Jens Axboe, viro Cc: syzbot, linux-fsdevel, linux-kernel, syzkaller-bugs, io-uring On Thu, Apr 01, 2021 at 07:46:13PM +0200, Christian Brauner wrote: > On Thu, Apr 01, 2021 at 10:09:18AM -0600, Jens Axboe wrote: > > On 4/1/21 9:45 AM, Christian Brauner wrote: > > > On Thu, Apr 01, 2021 at 02:09:20AM -0700, syzbot wrote: > > >> Hello, > > >> > > >> syzbot found the following issue on: > > >> > > >> HEAD commit: d19cc4bf Merge tag 'trace-v5.12-rc5' of git://git.kernel.o.. > > >> git tree: upstream > > >> console output: https://syzkaller.appspot.com/x/log.txt?x=1018f281d00000 > > >> kernel config: https://syzkaller.appspot.com/x/.config?x=d1a3d65a48dbd1bc > > >> dashboard link: https://syzkaller.appspot.com/bug?extid=c88a7030da47945a3cc3 > > >> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=12f50d11d00000 > > >> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=137694a1d00000 > > >> > > >> IMPORTANT: if you fix the issue, please add the following tag to the commit: > > >> Reported-by: [email protected] > > >> > > >> ------------[ cut here ]------------ > > >> WARNING: CPU: 1 PID: 8409 at fs/namespace.c:1186 mntput_no_expire+0xaca/0xcb0 fs/namespace.c:1186 > > >> Modules linked in: > > >> CPU: 1 PID: 8409 Comm: syz-executor035 Not tainted 5.12.0-rc5-syzkaller #0 > > >> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 > > >> RIP: 0010:mntput_no_expire+0xaca/0xcb0 fs/namespace.c:1186 > > >> Code: ff 48 c7 c2 e0 cb 78 89 be c2 02 00 00 48 c7 c7 a0 cb 78 89 c6 05 e5 6d e5 0b 01 e8 ff e1 f6 06 e9 3f fd ff ff e8 c6 a5 a8 ff <0f> 0b e9 fc fc ff ff e8 ba a5 a8 ff e8 55 dc 94 ff 31 ff 89 c5 89 > > >> RSP: 0018:ffffc9000165fc78 EFLAGS: 00010293 > > >> RAX: 0000000000000000 RBX: 1ffff920002cbf95 RCX: 0000000000000000 > > >> RDX: ffff88802072d4c0 RSI: ffffffff81cb4b8a RDI: 0000000000000003 > > >> RBP: ffff888011656900 R08: 0000000000000000 R09: ffffffff8fa978af > > >> R10: ffffffff81cb4884 R11: 0000000000000000 R12: 0000000000000008 > > >> R13: ffffc9000165fcc8 R14: dffffc0000000000 R15: 00000000ffffffff > > >> FS: 0000000000000000(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000 > > >> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > >> CR2: 000055a722053160 CR3: 000000000bc8e000 CR4: 00000000001506e0 > > >> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > >> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > > >> Call Trace: > > >> mntput fs/namespace.c:1232 [inline] > > >> cleanup_mnt+0x523/0x530 fs/namespace.c:1132 > > >> task_work_run+0xdd/0x1a0 kernel/task_work.c:140 > > >> exit_task_work include/linux/task_work.h:30 [inline] > > >> do_exit+0xbfc/0x2a60 kernel/exit.c:825 > > >> do_group_exit+0x125/0x310 kernel/exit.c:922 > > >> __do_sys_exit_group kernel/exit.c:933 [inline] > > >> __se_sys_exit_group kernel/exit.c:931 [inline] > > >> __x64_sys_exit_group+0x3a/0x50 kernel/exit.c:931 > > >> do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 > > >> entry_SYSCALL_64_after_hwframe+0x44/0xae > > >> RIP: 0033:0x446af9 > > >> Code: Unable to access opcode bytes at RIP 0x446acf. > > >> RSP: 002b:00000000005dfe48 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 > > >> RAX: ffffffffffffffda RBX: 00000000004ce450 RCX: 0000000000446af9 > > >> RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000001 > > >> RBP: 0000000000000001 R08: ffffffffffffffbc R09: 0000000000000000 > > >> R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004ce450 > > >> R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000001 > > > > > > [+Cc Jens + io_uring] > > > > > > Hm, this reproducer uses io_uring and it's the io_uring_enter() that > > > triggers this reliably. With this reproducer I've managed to reproduce > > > the issue on v5.12-rc4, and v5.12-rc3, v5.12-rc2 and v5.12-rc1. > > > It's not reproducible at > > > 9820b4dca0f9c6b7ab8b4307286cdace171b724d > > > which is the commit immediately before the first v5.12 io_uring merge. > > > It's first reproducible with the first io_uring merge for v5.12, i.e. > > > 5bbb336ba75d95611a7b9456355b48705016bdb1 > > > > Thanks, that's good info. I'll take a look at it and see if I can > > reproduce. > > Ok, I was deep into this anyway and it didn't make much sense to do > anything else at that point so I bisected this a bit further. The first > bad commit is: > > commit 3a81fd02045c329f25e5900fa61f613c9b317644 > Author: Jens Axboe <[email protected]> > Date: Thu Dec 10 12:25:36 2020 -0700 > > io_uring: enable LOOKUP_CACHED path resolution for filename lookups > > Instead of being pessimistic and assume that path lookup will block, use > LOOKUP_CACHED to attempt just a cached lookup. This ensures that the > fast path is always done inline, and we only punt to async context if > IO is needed to satisfy the lookup. > > For forced nonblock open attempts, mark the file O_NONBLOCK over the > actual ->open() call as well. We can safely clear this again before > doing fd_install(), so it'll never be user visible that we fiddled with > it. > > This greatly improves the performance of file open where the dentry is > already cached: > > ached 5.10-git 5.10-git+LOOKUP_CACHED Speedup > --------------------------------------------------------------- > 33% 1,014,975 900,474 1.1x > 89% 545,466 292,937 1.9x > 100% 435,636 151,475 2.9x > > The more cache hot we are, the faster the inline LOOKUP_CACHED > optimization helps. This is unsurprising and expected, as a thread > offload becomes a more dominant part of the total overhead. If we look > at io_uring tracing, doing an IORING_OP_OPENAT on a file that isn't in > the dentry cache will yield: > > 275.550481: io_uring_create: ring 00000000ddda6278, fd 3 sq size 8, cq size 16, flags 0 > 275.550491: io_uring_submit_sqe: ring 00000000ddda6278, op 18, data 0x0, non block 1, sq_thread 0 > 275.550498: io_uring_queue_async_work: ring 00000000ddda6278, request 00000000c0267d17, flags 69760, normal queue, work 000000003d683991 > 275.550502: io_uring_cqring_wait: ring 00000000ddda6278, min_events 1 > 275.550556: io_uring_complete: ring 00000000ddda6278, user_data 0x0, result 4 > > which shows a failed nonblock lookup, then punt to worker, and then we > complete with fd == 4. This takes 65 usec in total. Re-running the same > test case again: > > 281.253956: io_uring_create: ring 0000000008207252, fd 3 sq size 8, cq size 16, flags 0 > 281.253967: io_uring_submit_sqe: ring 0000000008207252, op 18, data 0x0, non block 1, sq_thread 0 > 281.253973: io_uring_complete: ring 0000000008207252, user_data 0x0, result 4 > > shows the same request completing inline, also returning fd == 4. This > takes 6 usec. > > Signed-off-by: Jens Axboe <[email protected]> I _think_ I see what the issue is. It seems that an assumption made in this commit might be wrong and we're missing a mnt_add_count() bump that we would otherwise have gotten if we've moved the failure handling into the unlazy helpers themselves. Al, does that sound plausible? commit eacd9aa8cedeb412842c7b339adbaa0477fdd5ad Author: Al Viro <[email protected]> Date: Mon Feb 15 12:03:23 2021 -0500 fix handling of nd->depth on LOOKUP_CACHED failures in try_to_unlazy* After switching to non-RCU mode, we want nd->depth to match the number of entries in nd->stack[] that need eventual path_put(). legitimize_links() takes care of that on failures; unfortunately, failure exits added for LOOKUP_CACHED do not. We could add the logics for that into those failure exits, both in try_to_unlazy() and in try_to_unlazy_next(), but since both checks are immediately followed by legitimize_links() and there's no calls of legitimize_links() other than those two... It's easier to move the check (and required handling of nd->depth on failure) into legitimize_links() itself. [caught by Jens: ... and since we are zeroing ->depth here, we need to do drop_links() first] Fixes: 6c6ec2b0a3e0 "fs: add support for LOOKUP_CACHED" Tested-by: Jens Axboe <[email protected]> Signed-off-by: Al Viro <[email protected]> diff --git a/fs/namei.c b/fs/namei.c index 4cae88733a5c..de74ad2bc6e2 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -630,6 +630,11 @@ static inline bool legitimize_path(struct nameidata *nd, static bool legitimize_links(struct nameidata *nd) { int i; + if (unlikely(nd->flags & LOOKUP_CACHED)) { + drop_links(nd); + nd->depth = 0; + return false; + } for (i = 0; i < nd->depth; i++) { struct saved *last = nd->stack + i; if (unlikely(!legitimize_path(nd, &last->link, last->seq))) { @@ -686,8 +691,6 @@ static bool try_to_unlazy(struct nameidata *nd) BUG_ON(!(nd->flags & LOOKUP_RCU)); nd->flags &= ~LOOKUP_RCU; - if (nd->flags & LOOKUP_CACHED) - goto out1; if (unlikely(!legitimize_links(nd))) goto out1; if (unlikely(!legitimize_path(nd, &nd->path, nd->seq))) @@ -724,8 +727,6 @@ static bool try_to_unlazy_next(struct nameidata *nd, struct dentry *dentry, unsi BUG_ON(!(nd->flags & LOOKUP_RCU)); nd->flags &= ~LOOKUP_RCU; - if (nd->flags & LOOKUP_CACHED) - goto out2; if (unlikely(!legitimize_links(nd))) goto out2; if (unlikely(!legitimize_mnt(nd->path.mnt, nd->m_seq))) ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [syzbot] WARNING in mntput_no_expire (2) 2021-04-01 17:59 ` Christian Brauner @ 2021-04-01 19:11 ` Al Viro 2021-04-04 2:34 ` Al Viro 0 siblings, 1 reply; 31+ messages in thread From: Al Viro @ 2021-04-01 19:11 UTC (permalink / raw) To: Christian Brauner Cc: Jens Axboe, syzbot, linux-fsdevel, linux-kernel, syzkaller-bugs, io-uring On Thu, Apr 01, 2021 at 07:59:19PM +0200, Christian Brauner wrote: > I _think_ I see what the issue is. It seems that an assumption made in > this commit might be wrong and we're missing a mnt_add_count() bump that > we would otherwise have gotten if we've moved the failure handling into > the unlazy helpers themselves. > > Al, does that sound plausible? mnt_add_count() on _what_? Failure in legitimize_links() ends up with nd->path.mnt zeroed, in both callers. So which vfsmount would be affected? Rules: in RCU mode: no mounts pinned out of RCU mode: nd->path.mnt and all nd->stack[i].link.mnt for i below nd->depth are either NULL or pinned Transition from RCU to non-RCU mode happens in try_to_unlazy() and try_to_unlazy_next(). References (if any) are dropped by eventual terminate_walk() (after that the contents of nameidata is junk). __legitimize_mnt() is the primitive for pinning. Return values: 0 -- successfully pinned (or given NULL as an argument) 1 -- failed, refcount not affected -1 -- failed, refcount bumped. It stays in RCU mode in all cases. One user is __legitimize_path(); it also stays in RCU mode. If it fails to legitimize path->mnt, it will zero it *IF* __legitimize_mnt() reports that refcount hadn't been taken. In all other cases, path->mnt is pinned. IOW, the caller is responsible for path_put() regardless of the outcome. Another user is legitimize_mnt(). _That_ will make sure that refcount is unaffected in case of failure (IOW, if __legitimize_mnt() reports failure with refcount bumped, we drop out of RCU mode, do mntput() and go back). On failure in legitimize_links() we either leave nd->depth equal to zero (in which case all nd->stack[...].link.mnt are to be ignored) or we set it one higher than the last attempted legitimize_path() in there. In the latter case, all entries in nd->stack below the value we put into nd->depth had legitimize_path() called (and thus have ->mnt either NULL or pinned) and everything starting from nd->depth is to be ignored. nd->path handling: 1) Callers of legitimize_links() are responsible for zeroing nd->path.mnt on legitimize_links() failure. Both do that, AFAICS. 2) in try_to_unlazy() we proceed to call legitimize_path() on nd->path. Once that call is done, we have nd->path.mnt pinned or NULL, so nothing further is needed with it. 3) in try_to_unlazy_next() we use legitimize_mnt() instead. Failure of that is handled by zeroing nd->path.mnt; success means that nd->path.mnt is pinned and should be left alone. We could use __legitimize_mnt() in try_to_unlazy_next() (basically, substitute the body of legitimize_mnt() there and massage it a bit), but that ends up being harder to follow: res = __legitimize_mnt(nd->path.mnt, nd->m_seq); if (unlikely(res)) { if (res < 0) // pinned, leave it there goto out1; else // not pinned, zero it goto out2; } instead of if (unlikely(!legitimize_mnt(nd->path.mnt, nd->m_seq))) goto out2; we have now. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [syzbot] WARNING in mntput_no_expire (2) 2021-04-01 19:11 ` Al Viro @ 2021-04-04 2:34 ` Al Viro 2021-04-04 2:38 ` Al Viro 2021-04-04 11:34 ` Christian Brauner 0 siblings, 2 replies; 31+ messages in thread From: Al Viro @ 2021-04-04 2:34 UTC (permalink / raw) To: Christian Brauner Cc: Jens Axboe, syzbot, linux-fsdevel, linux-kernel, syzkaller-bugs, io-uring On Thu, Apr 01, 2021 at 07:11:12PM +0000, Al Viro wrote: > > I _think_ I see what the issue is. It seems that an assumption made in > > this commit might be wrong and we're missing a mnt_add_count() bump that > > we would otherwise have gotten if we've moved the failure handling into > > the unlazy helpers themselves. > > > > Al, does that sound plausible? > > mnt_add_count() on _what_? Failure in legitimize_links() ends up with > nd->path.mnt zeroed, in both callers. So which vfsmount would be > affected? Could you turn that WARN_ON(count < 0) into if (WARN_ON(count < 0)) printk(KERN_ERR "id = %d, dev = %s, count = %d\n", mnt->mnt_id, mnt->mnt_sb->s_id, count); add system("cat /proc/self/mountinfo"); right after sandbox_common() call and try to reproduce that? I really wonder what mount is it happening to. BTW, how painful would it be to teach syzcaller to turn those cascades of NONFAILING(*(uint8_t*)0x20000080 = 0x12); NONFAILING(*(uint8_t*)0x20000081 = 0); NONFAILING(*(uint16_t*)0x20000082 = 0); NONFAILING(*(uint32_t*)0x20000084 = 0xffffff9c); NONFAILING(*(uint64_t*)0x20000088 = 0); NONFAILING(*(uint64_t*)0x20000090 = 0x20000180); NONFAILING(memcpy((void*)0x20000180, "./file0\000", 8)); NONFAILING(*(uint32_t*)0x20000098 = 0); NONFAILING(*(uint32_t*)0x2000009c = 0x80); NONFAILING(*(uint64_t*)0x200000a0 = 0x23456); .... NONFAILING(syz_io_uring_submit(r[1], r[2], 0x20000080, 0)); into something more readable? Bloody annoyance every time... Sure, I can manually translate it into struct io_uring_sqe *sqe = (void *)0x20000080; char *s = (void *)0x20000180; memset(sqe, '\0', sizeof(*sqe)); sqe->opcode = 0x12; // IORING_OP_OPENAT? sqe->fd = -100; // AT_FDCWD? sqe->addr = s; strcpy(s, "./file0"); sqe->open_flags = 0x80; // O_EXCL??? sqe->user_data = 0x23456; // random tag? syz_io_uring_submit(r[1], r[2], (unsigned long)p, 0); but it's really annoying as hell, especially since syz_io_uring_submit() comes from syzcaller and the damn thing _knows_ that the third argument is sodding io_uring_sqe, and never passed to anything other than memcpy() in there, at that, so the exact address can't matter. Incidentally, solitary O_EXCL (without O_CREAT) is... curious. Does that sucker still trigger without it? I.e. with store to 0x2000009c replaced with storing 0? ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [syzbot] WARNING in mntput_no_expire (2) 2021-04-04 2:34 ` Al Viro @ 2021-04-04 2:38 ` Al Viro 2021-04-04 11:34 ` Christian Brauner 1 sibling, 0 replies; 31+ messages in thread From: Al Viro @ 2021-04-04 2:38 UTC (permalink / raw) To: Christian Brauner Cc: Jens Axboe, syzbot, linux-fsdevel, linux-kernel, syzkaller-bugs, io-uring On Sun, Apr 04, 2021 at 02:34:08AM +0000, Al Viro wrote: > I really wonder what mount is it happening to. BTW, how painful would > it be to teach syzcaller to turn those cascades of > NONFAILING(*(uint8_t*)0x20000080 = 0x12); > NONFAILING(*(uint8_t*)0x20000081 = 0); > NONFAILING(*(uint16_t*)0x20000082 = 0); > NONFAILING(*(uint32_t*)0x20000084 = 0xffffff9c); > NONFAILING(*(uint64_t*)0x20000088 = 0); > NONFAILING(*(uint64_t*)0x20000090 = 0x20000180); > NONFAILING(memcpy((void*)0x20000180, "./file0\000", 8)); > NONFAILING(*(uint32_t*)0x20000098 = 0); > NONFAILING(*(uint32_t*)0x2000009c = 0x80); > NONFAILING(*(uint64_t*)0x200000a0 = 0x23456); > .... > NONFAILING(syz_io_uring_submit(r[1], r[2], 0x20000080, 0)); > into something more readable? Bloody annoyance every time... Sure, I can > manually translate it into > struct io_uring_sqe *sqe = (void *)0x20000080; > char *s = (void *)0x20000180; > memset(sqe, '\0', sizeof(*sqe)); > sqe->opcode = 0x12; // IORING_OP_OPENAT? > sqe->fd = -100; // AT_FDCWD? > sqe->addr = s; > strcpy(s, "./file0"); > sqe->open_flags = 0x80; // O_EXCL??? > sqe->user_data = 0x23456; // random tag? > syz_io_uring_submit(r[1], r[2], (unsigned long)p, 0); > but it's really annoying as hell, especially since syz_io_uring_submit() > comes from syzcaller and the damn thing _knows_ that the third argument > is sodding io_uring_sqe, and never passed to anything other than > memcpy() in there, at that, so the exact address can't matter. ... especially since the native syzcaller reproducer clearly *does* have that information. Simply putting that into comments side-by-side with what gets put into C reproducer would be nice, especially if it goes with field names... ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [syzbot] WARNING in mntput_no_expire (2) 2021-04-04 2:34 ` Al Viro 2021-04-04 2:38 ` Al Viro @ 2021-04-04 11:34 ` Christian Brauner 2021-04-04 15:56 ` Al Viro 1 sibling, 1 reply; 31+ messages in thread From: Christian Brauner @ 2021-04-04 11:34 UTC (permalink / raw) To: Al Viro Cc: Jens Axboe, syzbot, linux-fsdevel, linux-kernel, syzkaller-bugs, io-uring On Sun, Apr 04, 2021 at 02:34:08AM +0000, Al Viro wrote: > On Thu, Apr 01, 2021 at 07:11:12PM +0000, Al Viro wrote: > > > > I _think_ I see what the issue is. It seems that an assumption made in > > > this commit might be wrong and we're missing a mnt_add_count() bump that > > > we would otherwise have gotten if we've moved the failure handling into > > > the unlazy helpers themselves. > > > > > > Al, does that sound plausible? > > > > mnt_add_count() on _what_? Failure in legitimize_links() ends up with > > nd->path.mnt zeroed, in both callers. So which vfsmount would be > > affected? It looks to me like it's the vfsmount of the nd->root after we called chroot or pivot_root. > > Could you turn that WARN_ON(count < 0) into > if (WARN_ON(count < 0)) > printk(KERN_ERR "id = %d, dev = %s, count = %d\n", > mnt->mnt_id, > mnt->mnt_sb->s_id, > count); > add system("cat /proc/self/mountinfo"); right after sandbox_common() > call and try to reproduce that? Sorry for not replying to your earlier mail but I've been debugging this too. My current theory is that it's related to LOOKUP_ROOT_GRABBED when LOOKUP_CACHED is specified _possibly_ with an interaction how create_io_thread() is created with CLONE_FS. The reproducer requires you either have called pivot_root() or chroot() in order for the failure to happen. So I think the fact that we skip legitimize_root() when LOOKUP_CACHED is set might figure into this. I can keep digging. Funny enough I already placed a printk statement into the place you wanted one too so I just amended mine. Here's what you get: If pivot pivot_root() is used before the chroot() you get: [ 637.464555] AAAA: count(-1) | mnt_mntpoint(/) | mnt->mnt.mnt_root(/) | id(579) | dev(tmpfs) if you only call chroot, i.e. make the pivot_root() branch a simple if (true) you get: [ 955.206117] AAAA: count(-2) | mnt_mntpoint(/) | mnt->mnt.mnt_root(/) | id(580) | dev(tmpfs) The cat /proc/self/mountinfo is for the id(579) below: 514 513 8:2 / / rw,relatime - ext4 /dev/sda2 rw 515 514 0:5 / /dev rw,nosuid,noexec,relatime - devtmpfs udev rw,size=302716k,nr_inodes=75679,mode=755 516 515 0:26 / /dev/pts rw,nosuid,noexec,relatime - devpts devpts rw,gid=5,mode=620,ptmxmode=000 517 515 0:28 / /dev/shm rw,nosuid,nodev - tmpfs tmpfs rw 518 515 0:48 / /dev/hugepages rw,relatime - hugetlbfs hugetlbfs rw,pagesize=2M 519 515 0:21 / /dev/mqueue rw,nosuid,nodev,noexec,relatime - mqueue mqueue rw 520 514 0:27 / /run rw,nosuid,nodev,noexec,relatime - tmpfs tmpfs rw,size=62152k,mode=755 521 520 0:29 / /run/lock rw,nosuid,nodev,noexec,relatime - tmpfs tmpfs rw,size=5120k 522 520 0:49 / /run/lxd_agent rw,relatime - tmpfs tmpfs rw,size=51200k,mode=700 523 520 0:59 / /run/user/1000 rw,nosuid,nodev,relatime - tmpfs tmpfs rw,size=62148k,nr_inodes=15537,mode=700,uid=1000,gid=1000 524 514 0:24 / /sys rw,nosuid,nodev,noexec,relatime - sysfs sysfs rw 525 524 0:6 / /sys/kernel/security rw,nosuid,nodev,noexec,relatime - securityfs securityfs rw 526 524 0:30 / /sys/fs/cgroup ro,nosuid,nodev,noexec - tmpfs tmpfs ro,size=4096k,nr_inodes=1024,mode=755 527 526 0:31 /../../.. /sys/fs/cgroup/unified rw,nosuid,nodev,noexec,relatime - cgroup2 cgroup2 rw 528 526 0:32 /../../.. /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,xattr,name=systemd 529 526 0:36 /../../.. /sys/fs/cgroup/memory rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,memory 530 526 0:37 /.. /sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,cpu,cpuacct 531 526 0:38 / /sys/fs/cgroup/hugetlb rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,hugetlb 549 526 0:39 /.. /sys/fs/cgroup/blkio rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,blkio 550 526 0:40 /../../.. /sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,freezer 551 526 0:41 /../../.. /sys/fs/cgroup/pids rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,pids 552 526 0:42 / /sys/fs/cgroup/net_cls,net_prio rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,net_cls,net_prio 553 526 0:43 / /sys/fs/cgroup/perf_event rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,perf_event 554 526 0:44 / /sys/fs/cgroup/cpuset rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,cpuset,clone_children 555 526 0:45 /.. /sys/fs/cgroup/devices rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,devices 556 526 0:46 / /sys/fs/cgroup/rdma rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,rdma 557 524 0:33 / /sys/fs/pstore rw,nosuid,nodev,noexec,relatime - pstore pstore rw 558 524 0:34 / /sys/firmware/efi/efivars rw,nosuid,nodev,noexec,relatime - efivarfs efivarfs rw 559 524 0:35 / /sys/fs/bpf rw,nosuid,nodev,noexec,relatime - bpf none rw,mode=700 560 524 0:7 / /sys/kernel/debug rw,nosuid,nodev,noexec,relatime - debugfs debugfs rw 561 524 0:12 / /sys/kernel/tracing rw,nosuid,nodev,noexec,relatime - tracefs tracefs rw 562 524 0:51 / /sys/fs/fuse/connections rw,nosuid,nodev,noexec,relatime - fusectl fusectl rw 563 524 0:20 / /sys/kernel/config rw,nosuid,nodev,noexec,relatime - configfs configfs rw 564 514 0:25 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw 565 564 0:47 / /proc/sys/fs/binfmt_misc rw,relatime - autofs systemd-1 rw,fd=29,pgrp=0,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=31453 566 565 0:52 / /proc/sys/fs/binfmt_misc rw,nosuid,nodev,noexec,relatime - binfmt_misc binfmt_misc rw 567 514 0:50 / /home/ubuntu/src/compiled rw,relatime - virtiofs lxd_lxc rw 568 514 8:1 / /boot/efi rw,relatime - vfat /dev/sda1 rw,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro 569 514 0:57 / /var/lib/lxcfs rw,nosuid,nodev,relatime - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other > > I really wonder what mount is it happening to. BTW, how painful would > it be to teach syzcaller to turn those cascades of > NONFAILING(*(uint8_t*)0x20000080 = 0x12); > NONFAILING(*(uint8_t*)0x20000081 = 0); > NONFAILING(*(uint16_t*)0x20000082 = 0); > NONFAILING(*(uint32_t*)0x20000084 = 0xffffff9c); > NONFAILING(*(uint64_t*)0x20000088 = 0); > NONFAILING(*(uint64_t*)0x20000090 = 0x20000180); > NONFAILING(memcpy((void*)0x20000180, "./file0\000", 8)); > NONFAILING(*(uint32_t*)0x20000098 = 0); > NONFAILING(*(uint32_t*)0x2000009c = 0x80); > NONFAILING(*(uint64_t*)0x200000a0 = 0x23456); > .... > NONFAILING(syz_io_uring_submit(r[1], r[2], 0x20000080, 0)); > into something more readable? Bloody annoyance every time... Sure, I can > manually translate it into > struct io_uring_sqe *sqe = (void *)0x20000080; > char *s = (void *)0x20000180; > memset(sqe, '\0', sizeof(*sqe)); > sqe->opcode = 0x12; // IORING_OP_OPENAT? > sqe->fd = -100; // AT_FDCWD? > sqe->addr = s; > strcpy(s, "./file0"); > sqe->open_flags = 0x80; // O_EXCL??? > sqe->user_data = 0x23456; // random tag? > syz_io_uring_submit(r[1], r[2], (unsigned long)p, 0); > but it's really annoying as hell, especially since syz_io_uring_submit() > comes from syzcaller and the damn thing _knows_ that the third argument > is sodding io_uring_sqe, and never passed to anything other than > memcpy() in there, at that, so the exact address can't matter. Yeah, it's terrible to get an idea of what's going on in such produces. I did the same exercise as you. I should probably have replied with all of that data sooner so you wouldn't have to do the same shite I had to. Oh well. > > Incidentally, solitary O_EXCL (without O_CREAT) is... curious. Does that > sucker still trigger without it? I.e. with store to 0x2000009c replaced > with storing 0? Yeah, I did that as well yesterday morning and it still triggers. Christian ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [syzbot] WARNING in mntput_no_expire (2) 2021-04-04 11:34 ` Christian Brauner @ 2021-04-04 15:56 ` Al Viro 2021-04-04 16:40 ` Christian Brauner 0 siblings, 1 reply; 31+ messages in thread From: Al Viro @ 2021-04-04 15:56 UTC (permalink / raw) To: Christian Brauner Cc: Jens Axboe, syzbot, linux-fsdevel, linux-kernel, syzkaller-bugs, io-uring On Sun, Apr 04, 2021 at 01:34:45PM +0200, Christian Brauner wrote: > Sorry for not replying to your earlier mail but I've been debugging this > too. My current theory is that it's related to LOOKUP_ROOT_GRABBED when > LOOKUP_CACHED is specified _possibly_ with an interaction how > create_io_thread() is created with CLONE_FS. The reproducer requires you > either have called pivot_root() or chroot() in order for the failure to > happen. So I think the fact that we skip legitimize_root() when > LOOKUP_CACHED is set might figure into this. I can keep digging. > > Funny enough I already placed a printk statement into the place you > wanted one too so I just amended mine. Here's what you get: > > If pivot pivot_root() is used before the chroot() you get: > > [ 637.464555] AAAA: count(-1) | mnt_mntpoint(/) | mnt->mnt.mnt_root(/) | id(579) | dev(tmpfs) > > if you only call chroot, i.e. make the pivot_root() branch a simple > if (true) you get: > > [ 955.206117] AAAA: count(-2) | mnt_mntpoint(/) | mnt->mnt.mnt_root(/) | id(580) | dev(tmpfs) Very interesting. What happens if you call loop() twice? And now I wonder whether it's root or cwd, actually... Hmm... How about this: fd = open("/proc/self/mountinfo", 0); mkdir("./newroot/foo", 0777); mount("./newroot/foo", "./newroot/foo", 0, MS_BIND, NULL); chroot("./newroot"); chdir("/foo"); while (1) { static char buf[4096]; int n = read(fd, buf, 4096); if (n <= 0) break; write(1, buf, n); } close(fd); drop_caps(); loop(); as the end of namespace_sandbox_proc(), instead of chroot("./newroot"); chdir("/"); drop_caps(); loop(); sequence we have there? > The cat /proc/self/mountinfo is for the id(579) below: ... and it misses the damn thing, since we call it before the mount in question had been created ;-/ So we'd probably be better off not trying to be clever and just doing that as explicit (and completely untested) read-write loop above. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [syzbot] WARNING in mntput_no_expire (2) 2021-04-04 15:56 ` Al Viro @ 2021-04-04 16:40 ` Christian Brauner 2021-04-04 16:44 ` Al Viro 2021-04-04 16:52 ` Christian Brauner 0 siblings, 2 replies; 31+ messages in thread From: Christian Brauner @ 2021-04-04 16:40 UTC (permalink / raw) To: Al Viro Cc: Jens Axboe, syzbot, linux-fsdevel, linux-kernel, syzkaller-bugs, io-uring On Sun, Apr 04, 2021 at 03:56:02PM +0000, Al Viro wrote: > On Sun, Apr 04, 2021 at 01:34:45PM +0200, Christian Brauner wrote: > > > Sorry for not replying to your earlier mail but I've been debugging this > > too. My current theory is that it's related to LOOKUP_ROOT_GRABBED when > > LOOKUP_CACHED is specified _possibly_ with an interaction how > > create_io_thread() is created with CLONE_FS. The reproducer requires you > > either have called pivot_root() or chroot() in order for the failure to > > happen. So I think the fact that we skip legitimize_root() when > > LOOKUP_CACHED is set might figure into this. I can keep digging. > > > > > Funny enough I already placed a printk statement into the place you > > wanted one too so I just amended mine. Here's what you get: > > > > If pivot pivot_root() is used before the chroot() you get: > > > > [ 637.464555] AAAA: count(-1) | mnt_mntpoint(/) | mnt->mnt.mnt_root(/) | id(579) | dev(tmpfs) > > > > if you only call chroot, i.e. make the pivot_root() branch a simple > > if (true) you get: > > > > [ 955.206117] AAAA: count(-2) | mnt_mntpoint(/) | mnt->mnt.mnt_root(/) | id(580) | dev(tmpfs) > > Very interesting. What happens if you call loop() twice? And now I wonder > whether it's root or cwd, actually... Hmm... > > How about this: > fd = open("/proc/self/mountinfo", 0); > mkdir("./newroot/foo", 0777); > mount("./newroot/foo", "./newroot/foo", 0, MS_BIND, NULL); > chroot("./newroot"); > chdir("/foo"); > while (1) { > static char buf[4096]; > int n = read(fd, buf, 4096); > if (n <= 0) > break; > write(1, buf, n); > } > close(fd); > drop_caps(); > loop(); > as the end of namespace_sandbox_proc(), instead of > chroot("./newroot"); > chdir("/"); > drop_caps(); > loop(); > sequence we have there? Uhum, well then we oops properly with a null-deref. f1-vm login: [ 395.046971][ T5856] general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN [ 395.049716][ T5856] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] [ 395.052847][ T5856] CPU: 1 PID: 5856 Comm: iou-wrk-5851 Tainted: G E 5.12.0-rc5-020f68e042a19f59784f0962004d848181d13b9e #46 [ 395.056594][ T5856] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009)/LXD, BIOS 0.0.0 02/06/2015 [ 395.058962][ T5856] RIP: 0010:inode_permission+0x4e/0x530 [ 395.060362][ T5856] Code: ff 89 de e8 34 42 a9 ff 85 db 0f 85 ef 00 00 00 e8 87 40 a9 ff 4c 8d 7d 02 48 b8 00 00 00 00 00 fc ff df 4c 89 fa 48 c1 ea 03 <0f> b6 14 02 4c 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 c2 [ 395.065442][ T5856] RSP: 0018:ffffc9000681f640 EFLAGS: 00010246 [ 395.067274][ T5856] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff84ac11fc [ 395.069834][ T5856] RDX: 0000000000000000 RSI: ffff888013210000 RDI: 0000000000000002 [ 395.072527][ T5856] RBP: 0000000000000000 R08: 0000000000000001 R09: ffffed1000044f22 [ 395.075287][ T5856] R10: ffff888031c105d3 R11: ffffed1000044f21 R12: ffffc9000681f8e0 [ 395.077026][ T5856] R13: 0000000000000001 R14: ffffffff8e441e60 R15: 0000000000000002 [ 395.079001][ T5856] FS: 00007f136886f800(0000) GS:ffff888015a80000(0000) knlGS:0000000000000000 [ 395.081181][ T5856] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 395.082659][ T5856] CR2: 000055ecfefe8e70 CR3: 00000000236b7000 CR4: 0000000000350ee0 [ 395.084727][ T5856] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 395.087117][ T5856] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 395.089613][ T5856] Call Trace: [ 395.090570][ T5856] link_path_walk.part.0+0x790/0xc10 [ 395.091735][ T5856] ? write_comp_data+0x2a/0x90 [ 395.092931][ T5856] ? __sanitizer_cov_trace_pc+0x1d/0x50 [ 395.094093][ T5856] ? walk_component+0x6a0/0x6a0 [ 395.095021][ T5856] ? percpu_counter_add_batch+0xbc/0x180 [ 395.096255][ T5856] path_openat+0x269/0x2790 [ 395.097305][ T5856] ? path_lookupat.isra.0+0x530/0x530 [ 395.098391][ T5856] ? ret_from_fork+0x1f/0x30 [ 395.099431][ T5856] ? lockdep_hardirqs_on_prepare+0x400/0x400 [ 395.101326][ T5856] do_filp_open+0x208/0x270 [ 395.102894][ T5856] ? rcu_read_lock_bh_held+0xb0/0xb0 [ 395.104525][ T5856] ? may_open_dev+0xf0/0xf0 [ 395.105968][ T5856] ? do_raw_spin_lock+0x125/0x2e0 [ 395.107426][ T5856] ? write_comp_data+0x2a/0x90 [ 395.108651][ T5856] ? __sanitizer_cov_trace_pc+0x1d/0x50 [ 395.110033][ T5856] ? _raw_spin_unlock+0x29/0x40 [ 395.111397][ T5856] ? alloc_fd+0x499/0x640 [ 395.112783][ T5856] io_openat2+0x1d1/0x8f0 [ 395.114237][ T5856] ? __lock_acquire+0x1847/0x5850 [ 395.115982][ T5856] ? io_req_complete_post+0xa90/0xa90 [ 395.117347][ T5856] ? write_comp_data+0x2a/0x90 [ 395.118428][ T5856] ? io_issue_sqe+0x5ac0/0x5ac0 [ 395.119497][ T5856] io_issue_sqe+0x2a2/0x5ac0 [ 395.120707][ T5856] ? io_poll_complete.constprop.0+0x100/0x100 [ 395.130417][ T5856] ? find_held_lock+0x2d/0x110 [ 395.139772][ T5856] ? io_worker_handle_work+0x55d/0x13b0 [ 395.149051][ T5856] ? lock_downgrade+0x690/0x690 [ 395.158262][ T5856] ? rwlock_bug.part.0+0x90/0x90 [ 395.171997][ T5856] ? io_issue_sqe+0x5ac0/0x5ac0 [ 395.180513][ T5856] io_wq_submit_work+0x1da/0x640 [ 395.193646][ T5856] ? io_issue_sqe+0x5ac0/0x5ac0 [ 395.202240][ T5856] io_worker_handle_work+0x70a/0x13b0 [ 395.210276][ T5856] ? rwlock_bug.part.0+0x90/0x90 [ 395.218389][ T5856] io_wqe_worker+0x2ec/0xd00 [ 395.226152][ T5856] ? io_worker_handle_work+0x13b0/0x13b0 [ 395.238492][ T5856] ? ret_from_fork+0x8/0x30 [ 395.245595][ T5856] ? lock_downgrade+0x690/0x690 [ 395.252630][ T5856] ? do_raw_spin_lock+0x125/0x2e0 [ 395.259801][ T5856] ? rwlock_bug.part.0+0x90/0x90 [ 395.271613][ T5856] ? write_comp_data+0x2a/0x90 [ 395.279000][ T5856] ? _raw_spin_unlock_irq+0x24/0x50 [ 395.285999][ T5856] ? io_worker_handle_work+0x13b0/0x13b0 [ 395.297727][ T5856] ret_from_fork+0x1f/0x30 [ 395.304546][ T5856] Modules linked in: efi_pstore(E) efivarfs(E) [ 395.311931][ T5851] general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#2] PREEMPT SMP KASAN [ 395.313449][ T5856] ---[ end trace e9d61a560dd12328 ]--- ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [syzbot] WARNING in mntput_no_expire (2) 2021-04-04 16:40 ` Christian Brauner @ 2021-04-04 16:44 ` Al Viro 2021-04-04 17:05 ` Christian Brauner 2021-04-04 16:52 ` Christian Brauner 1 sibling, 1 reply; 31+ messages in thread From: Al Viro @ 2021-04-04 16:44 UTC (permalink / raw) To: Christian Brauner Cc: Jens Axboe, syzbot, linux-fsdevel, linux-kernel, syzkaller-bugs, io-uring On Sun, Apr 04, 2021 at 06:40:40PM +0200, Christian Brauner wrote: > > Very interesting. What happens if you call loop() twice? And now I wonder > > whether it's root or cwd, actually... Hmm... > > > > How about this: > > fd = open("/proc/self/mountinfo", 0); > > mkdir("./newroot/foo", 0777); > > mount("./newroot/foo", "./newroot/foo", 0, MS_BIND, NULL); > > chroot("./newroot"); > > chdir("/foo"); > > while (1) { > > static char buf[4096]; > > int n = read(fd, buf, 4096); > > if (n <= 0) > > break; > > write(1, buf, n); > > } > > close(fd); > > drop_caps(); > > loop(); > > as the end of namespace_sandbox_proc(), instead of > > chroot("./newroot"); > > chdir("/"); > > drop_caps(); > > loop(); > > sequence we have there? > > Uhum, well then we oops properly with a null-deref. Cute... Could you dump namei.o (ideally - with namei.s) from your build someplace public? ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [syzbot] WARNING in mntput_no_expire (2) 2021-04-04 16:44 ` Al Viro @ 2021-04-04 17:05 ` Christian Brauner 2021-04-04 18:50 ` Al Viro 0 siblings, 1 reply; 31+ messages in thread From: Christian Brauner @ 2021-04-04 17:05 UTC (permalink / raw) To: Al Viro Cc: Jens Axboe, syzbot, linux-fsdevel, linux-kernel, syzkaller-bugs, io-uring On Sun, Apr 04, 2021 at 04:44:06PM +0000, Al Viro wrote: > On Sun, Apr 04, 2021 at 06:40:40PM +0200, Christian Brauner wrote: > > > > Very interesting. What happens if you call loop() twice? And now I wonder > > > whether it's root or cwd, actually... Hmm... > > > > > > How about this: > > > fd = open("/proc/self/mountinfo", 0); > > > mkdir("./newroot/foo", 0777); > > > mount("./newroot/foo", "./newroot/foo", 0, MS_BIND, NULL); > > > chroot("./newroot"); > > > chdir("/foo"); > > > while (1) { > > > static char buf[4096]; > > > int n = read(fd, buf, 4096); > > > if (n <= 0) > > > break; > > > write(1, buf, n); > > > } > > > close(fd); > > > drop_caps(); > > > loop(); > > > as the end of namespace_sandbox_proc(), instead of > > > chroot("./newroot"); > > > chdir("/"); > > > drop_caps(); > > > loop(); > > > sequence we have there? > > > > Uhum, well then we oops properly with a null-deref. > > Cute... Could you dump namei.o (ideally - with namei.s) from your build > someplace public? Yeah, I have at least namei.o https://drive.google.com/file/d/1AvO1St0YltIrA86DXjp1Xg3ojtS9owGh/view?usp=sharing Christian ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [syzbot] WARNING in mntput_no_expire (2) 2021-04-04 17:05 ` Christian Brauner @ 2021-04-04 18:50 ` Al Viro 2021-04-04 20:17 ` Al Viro 0 siblings, 1 reply; 31+ messages in thread From: Al Viro @ 2021-04-04 18:50 UTC (permalink / raw) To: Christian Brauner Cc: Jens Axboe, syzbot, linux-fsdevel, linux-kernel, syzkaller-bugs, io-uring On Sun, Apr 04, 2021 at 07:05:13PM +0200, Christian Brauner wrote: > On Sun, Apr 04, 2021 at 04:44:06PM +0000, Al Viro wrote: > > On Sun, Apr 04, 2021 at 06:40:40PM +0200, Christian Brauner wrote: > > > > > > Very interesting. What happens if you call loop() twice? And now I wonder > > > > whether it's root or cwd, actually... Hmm... > > > > > > > > How about this: > > > > fd = open("/proc/self/mountinfo", 0); > > > > mkdir("./newroot/foo", 0777); > > > > mount("./newroot/foo", "./newroot/foo", 0, MS_BIND, NULL); > > > > chroot("./newroot"); > > > > chdir("/foo"); > > > > while (1) { > > > > static char buf[4096]; > > > > int n = read(fd, buf, 4096); > > > > if (n <= 0) > > > > break; > > > > write(1, buf, n); > > > > } > > > > close(fd); > > > > drop_caps(); > > > > loop(); > > > > as the end of namespace_sandbox_proc(), instead of > > > > chroot("./newroot"); > > > > chdir("/"); > > > > drop_caps(); > > > > loop(); > > > > sequence we have there? > > > > > > Uhum, well then we oops properly with a null-deref. > > > > Cute... Could you dump namei.o (ideally - with namei.s) from your build > > someplace public? > > Yeah, I have at least namei.o > > https://drive.google.com/file/d/1AvO1St0YltIrA86DXjp1Xg3ojtS9owGh/view?usp=sharing *grumble* Is it reproducible without KASAN? Would be much easier to follow the produced asm... ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [syzbot] WARNING in mntput_no_expire (2) 2021-04-04 18:50 ` Al Viro @ 2021-04-04 20:17 ` Al Viro 2021-04-05 11:44 ` Christian Brauner 0 siblings, 1 reply; 31+ messages in thread From: Al Viro @ 2021-04-04 20:17 UTC (permalink / raw) To: Christian Brauner Cc: Jens Axboe, syzbot, linux-fsdevel, linux-kernel, syzkaller-bugs, io-uring On Sun, Apr 04, 2021 at 06:50:10PM +0000, Al Viro wrote: > > Yeah, I have at least namei.o > > > > https://drive.google.com/file/d/1AvO1St0YltIrA86DXjp1Xg3ojtS9owGh/view?usp=sharing > > *grumble* > > Is it reproducible without KASAN? Would be much easier to follow the produced > asm... Looks like inode_permission(_, NULL, _) from may_lookup(nd). I.e. nd->inode == NULL. Mind slapping BUG_ON(!nd->inode) right before may_lookup() call in link_path_walk() and trying to reproduce that oops? ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [syzbot] WARNING in mntput_no_expire (2) 2021-04-04 20:17 ` Al Viro @ 2021-04-05 11:44 ` Christian Brauner 2021-04-05 16:18 ` Al Viro 0 siblings, 1 reply; 31+ messages in thread From: Christian Brauner @ 2021-04-05 11:44 UTC (permalink / raw) To: Al Viro Cc: Jens Axboe, syzbot, linux-fsdevel, linux-kernel, syzkaller-bugs, io-uring On Sun, Apr 04, 2021 at 08:17:21PM +0000, Al Viro wrote: > On Sun, Apr 04, 2021 at 06:50:10PM +0000, Al Viro wrote: > > > > Yeah, I have at least namei.o > > > > > > https://drive.google.com/file/d/1AvO1St0YltIrA86DXjp1Xg3ojtS9owGh/view?usp=sharing > > > > *grumble* > > > > Is it reproducible without KASAN? Would be much easier to follow the produced > > asm... > > Looks like inode_permission(_, NULL, _) from may_lookup(nd). I.e. > nd->inode == NULL. Yeah, I already saw that. > > Mind slapping BUG_ON(!nd->inode) right before may_lookup() call in > link_path_walk() and trying to reproduce that oops? Yep, no problem. If you run the reproducer in a loop for a little while you eventually trigger the BUG_ON() and then you get the following splat (and then an endless loop) in [1] with nd->inode NULL. _But_ I managed to debug this further and was able to trigger the BUG_ON() directly in path_init() in the AT_FDCWD branch (after all its AT_FDCWD(./file0) with the patch in [3] (it's in LOOKUP_RCU) the corresponding splat is in [2]. So the crash happens for a PF_IO_WORKER thread with a NULL nd->inode for the PF_IO_WORKER's pwd (The PF_IO_WORKER seems to be in async context.). [3]: diff --git a/fs/namei.c b/fs/namei.c index 28a70diff --git a/fs/namei.c b/fs/namei.c index 28a7006bdb04..8a31ccb61c45 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -2336,8 +2336,8 @@ static const char *path_init(struct nameidata *nd, unsigned flags) /* Relative pathname -- get the starting-point it is relative to. */ if (nd->dfd == AT_FDCWD) { + struct fs_struct *fs = current->fs; if (flags & LOOKUP_RCU) { - struct fs_struct *fs = current->fs; unsigned seq; do { @@ -2347,9 +2347,14 @@ static const char *path_init(struct nameidata *nd, unsigned flags) nd->seq = __read_seqcount_begin(&nd->path.dentry->d_seq); } while (read_seqcount_retry(&fs->seq, seq)); } else { - get_fs_pwd(current->fs, &nd->path); + get_fs_pwd(fs, &nd->path); nd->inode = nd->path.dentry->d_inode; } + + BUG_ON(!fs->users); + BUG_ON(fs->users < 0); + BUG_ON(!nd->inode && (current->flags & PF_IO_WORKER)); + BUG_ON(!nd->inode); } else { /* Caller must check execute permissions on the starting path component */ struct fd f = fdget_raw(nd->dfd); [1]: [ 257.564526] ------------[ cut here ]------------ [ 257.564549] kernel BUG at fs/namei.c:2208! [ 257.564998] ------------[ cut here ]------------ [ 257.565773] kernel BUG at fs/namei.c:2208! [ 257.567632] invalid opcode: 0000 [#1] PREEMPT SMP KASAN [ 257.569113] CPU: 3 PID: 6086 Comm: uring_viro Tainted: G W E 5.12.0-rc5-b47394a22e3dce3d03165e48ef0462f41c198ac7 #47 [ 257.572028] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009)/LXD, BIOS 0.0.0 02/06/2015 [ 257.573687] RIP: 0010:link_path_walk.part.0+0x6c9/0xd30 [ 257.575008] Code: 00 48 8b 43 18 48 89 44 24 18 48 8b 44 24 38 80 38 00 0f 85 b1 04 00 00 49 8b 6e 30 48 85 ed 0f 85 82 fb ff ff e8 67 35 a8 ff <0f> 0b e8 60 35 a8 ff 49 83 c7 01 48 bf 00 00 00 00 00 fc ff df 4c [ 257.580853] RSP: 0018:ffffc900052af5c0 EFLAGS: 00010246 [ 257.582513] RAX: 0000000000000000 RBX: ffff88802f7b2da0 RCX: ffffffff8ecd175c [ 257.584211] RDX: 0000000000000000 RSI: ffff8880133a0000 RDI: 0000000000000002 [ 257.585985] RBP: 0000000000000000 R08: 0000000000000001 R09: fffffbfff388f13e [ 257.588061] R10: ffff8880322e0353 R11: fffffbfff388f13d R12: ffffc900052af820 [ 257.590059] R13: ffff8880133a0000 R14: ffffc900052af820 R15: ffff888031149120 [ 257.592605] FS: 00007f445b331800(0000) GS:ffff888015b80000(0000) knlGS:0000000000000000 [ 257.595266] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 257.597262] CR2: 00005578c2c17008 CR3: 00000000307a4000 CR4: 0000000000350ee0 [ 257.599546] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 257.601331] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 257.602938] Call Trace: [ 257.603764] ? write_comp_data+0x2a/0x90 [ 257.605637] ? __sanitizer_cov_trace_pc+0x1d/0x50 [ 257.607107] ? path_init+0x662/0x1800 [ 257.608713] ? walk_component+0x6a0/0x6a0 [ 257.610125] path_openat+0x269/0x2790 [ 257.611601] ? path_lookupat.isra.0+0x530/0x530 [ 257.613180] ? rcu_read_lock_bh_held+0xb0/0xb0 [ 257.614586] ? lockdep_hardirqs_on_prepare+0x400/0x400 [ 257.615898] do_filp_open+0x197/0x270 [ 257.616881] ? rcu_read_lock_bh_held+0xb0/0xb0 [ 257.618293] ? may_open_dev+0xf0/0xf0 [ 257.619700] ? do_raw_spin_lock+0x125/0x2e0 [ 257.621159] ? write_comp_data+0x2a/0x90 [ 257.622574] ? __sanitizer_cov_trace_pc+0x1d/0x50 [ 257.624271] ? _raw_spin_unlock+0x29/0x40 [ 257.625737] ? alloc_fd+0x499/0x640 [ 257.627075] io_openat2+0x1d1/0x8f0 [ 257.628458] ? io_req_complete_post+0xa90/0xa90 [ 257.629921] ? __lock_acquire+0x1847/0x5850 [ 257.631108] ? write_comp_data+0x2a/0x90 [ 257.632069] io_issue_sqe+0x2a2/0x5ac0 [ 257.633112] ? lockdep_hardirqs_on_prepare+0x400/0x400 [ 257.634534] ? io_poll_complete.constprop.0+0x100/0x100 [ 257.636129] ? rcu_read_lock_sched_held+0xa1/0xd0 [ 257.647374] ? rcu_read_lock_bh_held+0xb0/0xb0 [ 257.656776] ? find_held_lock+0x2d/0x110 [ 257.665481] ? __might_fault+0xd8/0x180 [ 257.674198] __io_queue_sqe+0x19f/0xcf0 [ 257.688028] ? __check_object_size+0x1b4/0x4e0 [ 257.696724] ? __ia32_sys_io_uring_setup+0x70/0x70 [ 257.705177] ? write_comp_data+0x2a/0x90 [ 257.713125] io_queue_sqe+0x612/0xb70 [ 257.720757] io_submit_sqes+0x517d/0x6650 [ 257.733042] ? __x64_sys_io_uring_enter+0xb15/0xdd0 [ 257.741308] __x64_sys_io_uring_enter+0xb15/0xdd0 [ 257.748782] ? __ia32_sys_io_uring_enter+0xdd0/0xdd0 [ 257.756094] ? rcu_read_lock_bh_held+0xb0/0xb0 [ 257.764428] ? syscall_enter_from_user_mode+0x27/0x70 [ 257.773331] do_syscall_64+0x2d/0x70 [ 257.780447] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 257.787936] RIP: 0033:0x7f445b44767d [ 257.795186] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d bb f7 0c 00 f7 d8 64 89 01 48 [ 257.816184] RSP: 002b:00005578c0f91e98 EFLAGS: 00000212 ORIG_RAX: 00000000000001aa [ 257.824286] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f445b44767d [ 257.832553] RDX: 0000000000000000 RSI: 00000000000045f5 RDI: 0000000000000003 [ 257.844189] RBP: 00005578c0f91f70 R08: 0000000000000000 R09: 0000000000000000 [ 257.853290] R10: 0000000000000000 R11: 0000000000000212 R12: 00005578c0e64640 [ 257.861454] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 257.874753] Modules linked in: efi_pstore(E) efivarfs(E) [ 257.883235] invalid opcode: 0000 [#2] PREEMPT SMP KASAN [ 257.891763] ---[ end trace a40a2f46a1222bc8 ]--- [2]: [ 228.106433] ------------[ cut here ]------------ [ 228.106449] kernel BUG at fs/namei.c:2356! [ 228.106596] ------------[ cut here ]------------ [ 228.109523] invalid opcode: 0000 [#1] PREEMPT SMP KASAN [ 228.110025] kernel BUG at fs/namei.c:2357! [ 228.111226] CPU: 1 PID: 6052 Comm: iou-wrk-6049 Tainted: G W E 5.12.0-rc5-cc402ba42485f28e932348e7627433eb1dccef5c #51 [ 228.115791] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009)/LXD, BIOS 0.0.0 02/06/2015 [ 228.118150] RIP: 0010:path_init+0x141c/0x1730 [ 228.119523] Code: fb ff ff 48 c7 c7 00 47 c1 91 e8 4f c9 f0 ff e9 b5 ed ff ff 48 c7 c7 40 46 c1 91 e8 3e c9 f0 ff e9 5b ee ff ff e8 b4 78 a8 ff <0f> 0b 48 8b bd 60 ff ff ff e8 26 c9 f0 ff e9 52 f5 ff ff e8 1c c9 [ 228.124889] RSP: 0018:ffffc90004d8f6c0 EFLAGS: 00010246 [ 228.126702] RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffffff884cd98c [ 228.129228] RDX: 0000000000000000 RSI: ffff8880007a9d00 RDI: 0000000000000002 [ 228.131765] RBP: ffffc90004d8f760 R08: 0000000000000001 R09: ffffed10026e1ba2 [ 228.133962] R10: ffff88801370dd0b R11: ffffed10026e1ba1 R12: ffff88801370dd08 [ 228.135843] R13: ffff88801370dd00 R14: ffffc90004d8f8e0 R15: 0000000000000001 [ 228.137812] FS: 00007fd5db164800(0000) GS:ffff888015a80000(0000) knlGS:0000000000000000 [ 228.140069] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 228.141894] CR2: 00007fe892801000 CR3: 0000000036041000 CR4: 0000000000350ee0 [ 228.144485] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 228.146914] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 228.149326] Call Trace: [ 228.150372] ? write_comp_data+0x2a/0x90 [ 228.151870] ? percpu_counter_add_batch+0xbc/0x180 [ 228.153667] path_openat+0x192/0x2790 [ 228.155038] ? path_lookupat.isra.0+0x530/0x530 [ 228.156573] ? ret_from_fork+0x1f/0x30 [ 228.157757] ? lockdep_hardirqs_on_prepare+0x400/0x400 [ 228.159136] do_filp_open+0x208/0x270 [ 228.160252] ? rcu_read_lock_bh_held+0xb0/0xb0 [ 228.161598] ? may_open_dev+0xf0/0xf0 [ 228.162653] ? do_raw_spin_lock+0x125/0x2e0 [ 228.163856] ? write_comp_data+0x2a/0x90 [ 228.165213] ? __sanitizer_cov_trace_pc+0x1d/0x50 [ 228.166782] ? _raw_spin_unlock+0x29/0x40 [ 228.168176] ? alloc_fd+0x499/0x640 [ 228.169402] io_openat2+0x1d1/0x8f0 [ 228.170607] ? __lock_acquire+0x1847/0x5850 [ 228.172381] ? io_req_complete_post+0xa90/0xa90 [ 228.174218] ? write_comp_data+0x2a/0x90 [ 228.175843] ? io_issue_sqe+0x5ac0/0x5ac0 [ 228.177600] io_issue_sqe+0x2a2/0x5ac0 [ 228.179183] ? io_poll_complete.constprop.0+0x100/0x100 [ 228.180821] ? find_held_lock+0x2d/0x110 [ 228.181963] ? io_worker_handle_work+0x55d/0x13b0 [ 228.183154] ? lock_downgrade+0x690/0x690 [ 228.192463] ? rwlock_bug.part.0+0x90/0x90 [ 228.202181] ? io_issue_sqe+0x5ac0/0x5ac0 [ 228.211962] io_wq_submit_work+0x1da/0x640 [ 228.227152] ? io_issue_sqe+0x5ac0/0x5ac0 [ 228.236852] io_worker_handle_work+0x70a/0x13b0 [ 228.246583] ? rwlock_bug.part.0+0x90/0x90 [ 228.256009] io_wqe_worker+0x2ec/0xd00 [ 228.264954] ? io_worker_handle_work+0x13b0/0x13b0 [ 228.274045] ? ret_from_fork+0x8/0x30 [ 228.288365] ? lock_downgrade+0x690/0x690 [ 228.297032] ? do_raw_spin_lock+0x125/0x2e0 [ 228.305457] ? rwlock_bug.part.0+0x90/0x90 [ 228.313717] ? write_comp_data+0x2a/0x90 [ 228.321685] ? _raw_spin_unlock_irq+0x24/0x50 [ 228.329596] ? io_worker_handle_work+0x13b0/0x13b0 [ 228.343015] ret_from_fork+0x1f/0x30 [ 228.351033] Modules linked in: efi_pstore(E) efivarfs(E) [ 228.359699] invalid opcode: 0000 [#2] PREEMPT SMP KASAN [ 228.370439] CPU: 3 PID: 6049 Comm: uring_viro Tainted: G D W E 5.12.0-rc5-cc402ba42485f28e932348e7627433eb1dccef5c #51 [ 228.394578] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009)/LXD, BIOS 0.0.0 02/06/2015 [ 228.396859] ---[ end trace 5ac6a4c645048000 ]--- ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [syzbot] WARNING in mntput_no_expire (2) 2021-04-05 11:44 ` Christian Brauner @ 2021-04-05 16:18 ` Al Viro 2021-04-05 17:08 ` Christian Brauner 0 siblings, 1 reply; 31+ messages in thread From: Al Viro @ 2021-04-05 16:18 UTC (permalink / raw) To: Christian Brauner Cc: Jens Axboe, syzbot, linux-fsdevel, linux-kernel, syzkaller-bugs, io-uring On Mon, Apr 05, 2021 at 01:44:37PM +0200, Christian Brauner wrote: > On Sun, Apr 04, 2021 at 08:17:21PM +0000, Al Viro wrote: > > On Sun, Apr 04, 2021 at 06:50:10PM +0000, Al Viro wrote: > > > > > > Yeah, I have at least namei.o > > > > > > > > https://drive.google.com/file/d/1AvO1St0YltIrA86DXjp1Xg3ojtS9owGh/view?usp=sharing > > > > > > *grumble* > > > > > > Is it reproducible without KASAN? Would be much easier to follow the produced > > > asm... > > > > Looks like inode_permission(_, NULL, _) from may_lookup(nd). I.e. > > nd->inode == NULL. > > Yeah, I already saw that. > > > > > Mind slapping BUG_ON(!nd->inode) right before may_lookup() call in > > link_path_walk() and trying to reproduce that oops? > > Yep, no problem. If you run the reproducer in a loop for a little while > you eventually trigger the BUG_ON() and then you get the following splat > (and then an endless loop) in [1] with nd->inode NULL. > > _But_ I managed to debug this further and was able to trigger the BUG_ON() > directly in path_init() in the AT_FDCWD branch (after all its AT_FDCWD(./file0) > with the patch in [3] (it's in LOOKUP_RCU) the corresponding splat is in [2]. > So the crash happens for a PF_IO_WORKER thread with a NULL nd->inode for the > PF_IO_WORKER's pwd (The PF_IO_WORKER seems to be in async context.). So we find current->fs->pwd.dentry negative, with current->fs->seq sampled equal before and after that? Lovely... The only places where we assign anything to ->pwd.dentry are void set_fs_pwd(struct fs_struct *fs, const struct path *path) { struct path old_pwd; path_get(path); spin_lock(&fs->lock); write_seqcount_begin(&fs->seq); old_pwd = fs->pwd; fs->pwd = *path; write_seqcount_end(&fs->seq); spin_unlock(&fs->lock); if (old_pwd.dentry) path_put(&old_pwd); } where we have ->seq bumped between dget new/assignment/ dput old, copy_fs_struct() where we have spin_lock(&old->lock); fs->root = old->root; path_get(&fs->root); fs->pwd = old->pwd; path_get(&fs->pwd); spin_unlock(&old->lock); fs being freshly allocated instance that couldn't have been observed by anyone and chroot_fs_refs(), where we have spin_lock(&fs->lock); write_seqcount_begin(&fs->seq); hits += replace_path(&fs->root, old_root, new_root); hits += replace_path(&fs->pwd, old_root, new_root); write_seqcount_end(&fs->seq); while (hits--) { count++; path_get(new_root); } spin_unlock(&fs->lock); ... static inline int replace_path(struct path *p, const struct path *old, const struct path *new) { if (likely(p->dentry != old->dentry || p->mnt != old->mnt)) return 0; *p = *new; return 1; } Here we have new_root->dentry pinned from the very beginning, and assignments are wrapped into bumps of ->seq. Moreover, we are holding ->lock through that sequence (as all writers do), so these references can't be dropped before path_get() bumps new_root->dentry refcount. chroot_fs_refs() is called only by pivot_root(2): chroot_fs_refs(&root, &new); and there new is set by error = user_path_at(AT_FDCWD, new_root, LOOKUP_FOLLOW | LOOKUP_DIRECTORY, &new); if (error) goto out0; which pins new.dentry *and* verifies that it's positive and a directory, at that. Since pinned positive dentry can't be made negative by anybody else, we know it will remain in that state until path_put(&new); well downstream of chroot_fs_refs(). In copy_fs_struct() we are copying someone's ->pwd, so it's also pinned positive. And it won't be dropped outside of old->lock, so by the time somebody manages to drop the reference in old, path_get() effects will be visible (old->lock serving as a barrier). That leaves set_fs_pwd() calls: fs/init.c:54: set_fs_pwd(current->fs, &path); init_chdir(), path set by LOOKUP_DIRECTORY patwalk. Pinned positive. fs/namespace.c:4207: set_fs_pwd(current->fs, &root); init_mount_tree(), root.dentry being ->mnt_root of rootfs. Pinned positive (and it would've oopsed much earlier had that been it) fs/namespace.c:4485: set_fs_pwd(fs, &root); mntns_install(), root filled by successful LOOKUP_DOWN for "/" from mnt_ns->root. Should be pinned positive. fs/open.c:501: set_fs_pwd(current->fs, &path); chdir(2), path set by LOOKUP_DIRECTORY pathwalk. Pinned positive. fs/open.c:528: set_fs_pwd(current->fs, &f.file->f_path); fchdir(2), file->f_path of any opened file. Pinned positive. kernel/usermode_driver.c:130: set_fs_pwd(current->fs, &umd_info->wd); umd_setup(), ->wd.dentry equal to ->wd.mnt->mnt_root, should be pinned positive. kernel/nsproxy.c:509: set_fs_pwd(me->fs, &nsset->fs->pwd); commit_nsset(). Let's see what's going on there... if ((flags & CLONE_NEWNS) && (flags & ~CLONE_NEWNS)) { set_fs_root(me->fs, &nsset->fs->root); set_fs_pwd(me->fs, &nsset->fs->pwd); } In those conditions nsset.fs has come from copy_fs_struct() done in prepare_nsset(). And the only thing that might've been done to it would be those set_fs_pwd() in mntns_install() (I'm not fond of the entire nsset->fs thing - looks like papering over bad calling conventions, but anyway) Now, I might've missed some insanity (direct assignments to ->pwd.dentry, etc. - wouldn't be the first time io_uring folks went "layering? wassat? we'll just poke in whatever we can reach"), but I don't see anything obvious of that sort in the area... OK, how about this: in path_init(), right after do { seq = read_seqcount_begin(&fs->seq); nd->path = fs->pwd; nd->inode = nd->path.dentry->d_inode; nd->seq = __read_seqcount_begin(&nd->path.dentry->d_seq); } while (read_seqcount_retry(&fs->seq, seq)); slap if (!nd->inode) { // should never, ever happen struct dentry *fucked = nd->path.dentry; printk(KERN_ERR "%pd4 %d %x %p %d %d", fucked, d_count(fucked), fucked->d_flags, fs, fs->users, seq); BUG_ON(1); return ERR_PTR(-EINVAL); } and see what it catches? ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [syzbot] WARNING in mntput_no_expire (2) 2021-04-05 16:18 ` Al Viro @ 2021-04-05 17:08 ` Christian Brauner 2021-04-05 18:23 ` Al Viro 0 siblings, 1 reply; 31+ messages in thread From: Christian Brauner @ 2021-04-05 17:08 UTC (permalink / raw) To: Al Viro Cc: Jens Axboe, syzbot, linux-fsdevel, linux-kernel, syzkaller-bugs, io-uring On Mon, Apr 05, 2021 at 04:18:58PM +0000, Al Viro wrote: > On Mon, Apr 05, 2021 at 01:44:37PM +0200, Christian Brauner wrote: > > On Sun, Apr 04, 2021 at 08:17:21PM +0000, Al Viro wrote: > > > On Sun, Apr 04, 2021 at 06:50:10PM +0000, Al Viro wrote: > > > > > > > > Yeah, I have at least namei.o > > > > > > > > > > https://drive.google.com/file/d/1AvO1St0YltIrA86DXjp1Xg3ojtS9owGh/view?usp=sharing > > > > > > > > *grumble* > > > > > > > > Is it reproducible without KASAN? Would be much easier to follow the produced > > > > asm... > > > > > > Looks like inode_permission(_, NULL, _) from may_lookup(nd). I.e. > > > nd->inode == NULL. > > > > Yeah, I already saw that. > > > > > > > > Mind slapping BUG_ON(!nd->inode) right before may_lookup() call in > > > link_path_walk() and trying to reproduce that oops? > > > > Yep, no problem. If you run the reproducer in a loop for a little while > > you eventually trigger the BUG_ON() and then you get the following splat > > (and then an endless loop) in [1] with nd->inode NULL. > > > > _But_ I managed to debug this further and was able to trigger the BUG_ON() > > directly in path_init() in the AT_FDCWD branch (after all its AT_FDCWD(./file0) > > with the patch in [3] (it's in LOOKUP_RCU) the corresponding splat is in [2]. > > So the crash happens for a PF_IO_WORKER thread with a NULL nd->inode for the > > PF_IO_WORKER's pwd (The PF_IO_WORKER seems to be in async context.). > > So we find current->fs->pwd.dentry negative, with current->fs->seq sampled > equal before and after that? Lovely... The only places where we assign > anything to ->pwd.dentry are > void set_fs_pwd(struct fs_struct *fs, const struct path *path) > { > struct path old_pwd; > > path_get(path); > spin_lock(&fs->lock); > write_seqcount_begin(&fs->seq); > old_pwd = fs->pwd; > fs->pwd = *path; > write_seqcount_end(&fs->seq); > spin_unlock(&fs->lock); > > if (old_pwd.dentry) > path_put(&old_pwd); > } > where we have ->seq bumped between dget new/assignment/ dput old, > copy_fs_struct() where we have > spin_lock(&old->lock); > fs->root = old->root; > path_get(&fs->root); > fs->pwd = old->pwd; > path_get(&fs->pwd); > spin_unlock(&old->lock); > fs being freshly allocated instance that couldn't have been observed > by anyone and chroot_fs_refs(), where we have > spin_lock(&fs->lock); > write_seqcount_begin(&fs->seq); > hits += replace_path(&fs->root, old_root, new_root); > hits += replace_path(&fs->pwd, old_root, new_root); > write_seqcount_end(&fs->seq); > while (hits--) { > count++; > path_get(new_root); > } > spin_unlock(&fs->lock); > ... > static inline int replace_path(struct path *p, const struct path *old, const struct path *new) > { > if (likely(p->dentry != old->dentry || p->mnt != old->mnt)) > return 0; > *p = *new; > return 1; > } > Here we have new_root->dentry pinned from the very beginning, > and assignments are wrapped into bumps of ->seq. Moreover, > we are holding ->lock through that sequence (as all writers > do), so these references can't be dropped before path_get() > bumps new_root->dentry refcount. > > chroot_fs_refs() is called only by pivot_root(2): > chroot_fs_refs(&root, &new); > and there new is set by > error = user_path_at(AT_FDCWD, new_root, > LOOKUP_FOLLOW | LOOKUP_DIRECTORY, &new); > if (error) > goto out0; > which pins new.dentry *and* verifies that it's positive and a directory, > at that. Since pinned positive dentry can't be made negative by anybody > else, we know it will remain in that state until > path_put(&new); > well downstream of chroot_fs_refs(). In copy_fs_struct() we are > copying someone's ->pwd, so it's also pinned positive. And it > won't be dropped outside of old->lock, so by the time somebody > manages to drop the reference in old, path_get() effects will be > visible (old->lock serving as a barrier). > > That leaves set_fs_pwd() calls: > fs/init.c:54: set_fs_pwd(current->fs, &path); > init_chdir(), path set by LOOKUP_DIRECTORY patwalk. Pinned positive. > fs/namespace.c:4207: set_fs_pwd(current->fs, &root); > init_mount_tree(), root.dentry being ->mnt_root of rootfs. Pinned > positive (and it would've oopsed much earlier had that been it) > fs/namespace.c:4485: set_fs_pwd(fs, &root); > mntns_install(), root filled by successful LOOKUP_DOWN for "/" > from mnt_ns->root. Should be pinned positive. > fs/open.c:501: set_fs_pwd(current->fs, &path); > chdir(2), path set by LOOKUP_DIRECTORY pathwalk. Pinned positive. > fs/open.c:528: set_fs_pwd(current->fs, &f.file->f_path); > fchdir(2), file->f_path of any opened file. Pinned positive. > kernel/usermode_driver.c:130: set_fs_pwd(current->fs, &umd_info->wd); > umd_setup(), ->wd.dentry equal to ->wd.mnt->mnt_root, should be pinned positive. > kernel/nsproxy.c:509: set_fs_pwd(me->fs, &nsset->fs->pwd); > commit_nsset(). Let's see what's going on there... > > if ((flags & CLONE_NEWNS) && (flags & ~CLONE_NEWNS)) { > set_fs_root(me->fs, &nsset->fs->root); > set_fs_pwd(me->fs, &nsset->fs->pwd); > } > In those conditions nsset.fs has come from copy_fs_struct() done in > prepare_nsset(). And the only thing that might've been done to it > would be those set_fs_pwd() in mntns_install() (I'm not fond of the > entire nsset->fs thing - looks like papering over bad calling > conventions, but anyway) > > Now, I might've missed some insanity (direct assignments to ->pwd.dentry, > etc. - wouldn't be the first time io_uring folks went "layering? wassat? > we'll just poke in whatever we can reach"), but I don't see anything > obvious of that sort in the area... > > OK, how about this: in path_init(), right after > do { > seq = read_seqcount_begin(&fs->seq); > nd->path = fs->pwd; > nd->inode = nd->path.dentry->d_inode; > nd->seq = __read_seqcount_begin(&nd->path.dentry->d_seq); > } while (read_seqcount_retry(&fs->seq, seq)); > slap > if (!nd->inode) { > // should never, ever happen > struct dentry *fucked = nd->path.dentry; > printk(KERN_ERR "%pd4 %d %x %p %d %d", fucked, d_count(fucked), > fucked->d_flags, fs, fs->users, seq); > BUG_ON(1); > return ERR_PTR(-EINVAL); > } > and see what it catches? Ah dentry count of -127 looks... odd. [ 246.102077] /newroot/foo -127 18008 ffff888012819000 6 0 [ 246.102240] ------------[ cut here ]------------ [ 246.102264] /newroot/foo -127 18008 ffff888012819000 6 6 [ 246.104163] ------------[ cut here ]------------ [ 246.104943] kernel BUG at fs/namei.c:2359! [ 246.106342] kernel BUG at fs/namei.c:2359! [ 246.106385] invalid opcode: 0000 [#1] PREEMPT SMP KASAN [ 246.110540] CPU: 0 PID: 6345 Comm: uring_viro Tainted: G W E 5.12.0-rc5-1ebc00aa82b08217d1fc4eef5435f8499783194c #53 [ 246.113725] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009)/LXD, BIOS 0.0.0 02/06/2015 [ 246.116115] RIP: 0010:path_init.cold+0xbb/0xea [ 246.117711] Code: d0 7c 04 84 d2 75 4b 4c 8b 45 98 41 89 d9 44 89 f9 4c 89 e6 41 8b 94 24 d0 00 00 00 48 c7 c7 20 93 1a b1 41 55 e8 1c 70 fe ff <0f> 0b 48 8b 7d b8 e8 1f ee e6 f8 e9 55 ff ff ff 48 8b 7d 98 e8 01 [ 246.124372] RSP: 0018:ffffc900073275f0 EFLAGS: 00010282 [ 246.126466] RAX: 000000000000002c RBX: 0000000000000006 RCX: 0000000000000000 [ 246.129685] RDX: 0000000000000000 RSI: ffff88801354d700 RDI: fffff52000e64eb0 [ 246.131400] RBP: ffffc900073276a0 R08: 000000000000002c R09: ffffed1002b46045 [ 246.133241] R10: ffff888015a30227 R11: ffffed1002b46044 R12: ffff8880303eb028 [ 246.135124] R13: 0000000000000006 R14: ffffc90007327820 R15: 0000000000018008 [ 246.136931] FS: 00007f8695724800(0000) GS:ffff888015a00000(0000) knlGS:0000000000000000 [ 246.139247] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 246.141604] CR2: 000055fdf3c11008 CR3: 000000002e9dd000 CR4: 0000000000350ef0 [ 246.143437] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 246.145337] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 246.147114] Call Trace: [ 246.147910] ? write_comp_data+0x2a/0x90 [ 246.149010] path_openat+0x192/0x2790 [ 246.150062] ? path_lookupat.isra.0+0x530/0x530 [ 246.151295] ? rcu_read_lock_bh_held+0xb0/0xb0 [ 246.152561] ? lockdep_hardirqs_on_prepare+0x400/0x400 [ 246.154071] do_filp_open+0x197/0x270 [ 246.155157] ? rcu_read_lock_bh_held+0xb0/0xb0 [ 246.156392] ? may_open_dev+0xf0/0xf0 [ 246.157969] ? do_raw_spin_lock+0x125/0x2e0 [ 246.159167] ? write_comp_data+0x2a/0x90 [ 246.160319] ? __sanitizer_cov_trace_pc+0x1d/0x50 [ 246.161649] ? _raw_spin_unlock+0x29/0x40 [ 246.162823] ? alloc_fd+0x499/0x640 [ 246.164092] io_openat2+0x1d1/0x8f0 [ 246.165403] ? io_req_complete_post+0xa90/0xa90 [ 246.166974] ? __lock_acquire+0x1847/0x5850 [ 246.168455] ? write_comp_data+0x2a/0x90 [ 246.169877] io_issue_sqe+0x2a2/0x5ac0 [ 246.171226] ? lockdep_hardirqs_on_prepare+0x400/0x400 [ 246.173079] ? io_poll_complete.constprop.0+0x100/0x100 [ 246.174960] ? rcu_read_lock_sched_held+0xa1/0xd0 [ 246.176468] ? rcu_read_lock_bh_held+0xb0/0xb0 [ 246.187303] ? find_held_lock+0x2d/0x110 [ 246.197951] ? __might_fault+0xd8/0x180 [ 246.208458] __io_queue_sqe+0x19f/0xcf0 [ 246.218694] ? __check_object_size+0x1b4/0x4e0 [ 246.228802] ? __ia32_sys_io_uring_setup+0x70/0x70 [ 246.239099] ? write_comp_data+0x2a/0x90 [ 246.249152] io_queue_sqe+0x612/0xb70 [ 246.258967] io_submit_sqes+0x517d/0x6650 [ 246.268445] ? __x64_sys_io_uring_enter+0xb15/0xdd0 [ 246.282682] __x64_sys_io_uring_enter+0xb15/0xdd0 [ 246.292110] ? __ia32_sys_io_uring_enter+0xdd0/0xdd0 [ 246.301285] ? rcu_read_lock_bh_held+0xb0/0xb0 [ 246.310116] ? syscall_enter_from_user_mode+0x27/0x70 [ 246.318845] do_syscall_64+0x2d/0x70 [ 246.327328] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 246.336188] RIP: 0033:0x7f869583a67d [ 246.344145] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d bb f7 0c 00 f7 d8 64 89 01 48 [ 246.367649] RSP: 002b:000055fdf2a89e98 EFLAGS: 00000212 ORIG_RAX: 00000000000001aa [ 246.377100] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f869583a67d [ 246.386119] RDX: 0000000000000000 RSI: 00000000000045f5 RDI: 0000000000000003 [ 246.395095] RBP: 000055fdf2a89f70 R08: 0000000000000000 R09: 0000000000000000 [ 246.403925] R10: 0000000000000000 R11: 0000000000000212 R12: 000055fdf295c640 [ 246.412977] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 246.425046] Modules linked in: efi_pstore(E) efivarfs(E) [ 246.434681] invalid opcode: 0000 [#2] PREEMPT SMP KASAN [ 246.435099] ---[ end trace b331351bc5a092fa ]--- ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [syzbot] WARNING in mntput_no_expire (2) 2021-04-05 17:08 ` Christian Brauner @ 2021-04-05 18:23 ` Al Viro 2021-04-05 18:28 ` Al Viro 0 siblings, 1 reply; 31+ messages in thread From: Al Viro @ 2021-04-05 18:23 UTC (permalink / raw) To: Christian Brauner Cc: Jens Axboe, syzbot, linux-fsdevel, linux-kernel, syzkaller-bugs, io-uring On Mon, Apr 05, 2021 at 07:08:01PM +0200, Christian Brauner wrote: > Ah dentry count of -127 looks... odd. dead + 1... void lockref_mark_dead(struct lockref *lockref) { assert_spin_locked(&lockref->lock); lockref->count = -128; } IOW, a leaked (uncounted) reference to dentry, that got dget() called on it after dentry had been freed. IOW, current->fs->pwd.dentry happens to point to an already freed struct dentry here. Joy... Could you slap spin_lock(¤t->fs->lock); WARN_ON(d_count(current->fs->pwd.dentry) < 0); spin_unlock(¤t->fs->lock); before and after calls of io_issue_sqe() and see if it triggers? We definitely are seeing buggered dentry refcounting here. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [syzbot] WARNING in mntput_no_expire (2) 2021-04-05 18:23 ` Al Viro @ 2021-04-05 18:28 ` Al Viro 2021-04-05 20:07 ` Christian Brauner 0 siblings, 1 reply; 31+ messages in thread From: Al Viro @ 2021-04-05 18:28 UTC (permalink / raw) To: Christian Brauner Cc: Jens Axboe, syzbot, linux-fsdevel, linux-kernel, syzkaller-bugs, io-uring On Mon, Apr 05, 2021 at 06:23:49PM +0000, Al Viro wrote: > On Mon, Apr 05, 2021 at 07:08:01PM +0200, Christian Brauner wrote: > > > Ah dentry count of -127 looks... odd. > > dead + 1... > > void lockref_mark_dead(struct lockref *lockref) > { > assert_spin_locked(&lockref->lock); > lockref->count = -128; > } > > IOW, a leaked (uncounted) reference to dentry, that got dget() called on > it after dentry had been freed. > > IOW, current->fs->pwd.dentry happens to point to an already freed > struct dentry here. Joy... > > Could you slap > > spin_lock(¤t->fs->lock); > WARN_ON(d_count(current->fs->pwd.dentry) < 0); > spin_unlock(¤t->fs->lock); > > before and after calls of io_issue_sqe() and see if it triggers? We definitely > are seeing buggered dentry refcounting here. Check if this helps, please. diff --git a/fs/namei.c b/fs/namei.c index 216f16e74351..82344f1139ff 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -2289,6 +2289,9 @@ static const char *path_init(struct nameidata *nd, unsigned flags) int error; const char *s = nd->name->name; + nd->path.mnt = NULL; + nd->path.dentry = NULL; + /* LOOKUP_CACHED requires RCU, ask caller to retry */ if ((flags & (LOOKUP_RCU | LOOKUP_CACHED)) == LOOKUP_CACHED) return ERR_PTR(-EAGAIN); @@ -2322,8 +2325,6 @@ static const char *path_init(struct nameidata *nd, unsigned flags) } nd->root.mnt = NULL; - nd->path.mnt = NULL; - nd->path.dentry = NULL; /* Absolute pathname -- fetch the root (LOOKUP_IN_ROOT uses nd->dfd). */ if (*s == '/' && !(flags & LOOKUP_IN_ROOT)) { ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [syzbot] WARNING in mntput_no_expire (2) 2021-04-05 18:28 ` Al Viro @ 2021-04-05 20:07 ` Christian Brauner 2021-04-06 1:38 ` Al Viro 0 siblings, 1 reply; 31+ messages in thread From: Christian Brauner @ 2021-04-05 20:07 UTC (permalink / raw) To: Al Viro Cc: Jens Axboe, syzbot, linux-fsdevel, linux-kernel, syzkaller-bugs, io-uring On Mon, Apr 05, 2021 at 06:28:54PM +0000, Al Viro wrote: > On Mon, Apr 05, 2021 at 06:23:49PM +0000, Al Viro wrote: > > On Mon, Apr 05, 2021 at 07:08:01PM +0200, Christian Brauner wrote: > > > > > Ah dentry count of -127 looks... odd. > > > > dead + 1... > > > > void lockref_mark_dead(struct lockref *lockref) > > { > > assert_spin_locked(&lockref->lock); > > lockref->count = -128; > > } > > > > IOW, a leaked (uncounted) reference to dentry, that got dget() called on > > it after dentry had been freed. > > > > IOW, current->fs->pwd.dentry happens to point to an already freed > > struct dentry here. Joy... > > > > Could you slap > > > > spin_lock(¤t->fs->lock); > > WARN_ON(d_count(current->fs->pwd.dentry) < 0); > > spin_unlock(¤t->fs->lock); > > > > before and after calls of io_issue_sqe() and see if it triggers? We definitely > > are seeing buggered dentry refcounting here. > > Check if this helps, please. > > diff --git a/fs/namei.c b/fs/namei.c > index 216f16e74351..82344f1139ff 100644 > --- a/fs/namei.c > +++ b/fs/namei.c > @@ -2289,6 +2289,9 @@ static const char *path_init(struct nameidata *nd, unsigned flags) > int error; > const char *s = nd->name->name; > > + nd->path.mnt = NULL; > + nd->path.dentry = NULL; > + > /* LOOKUP_CACHED requires RCU, ask caller to retry */ > if ((flags & (LOOKUP_RCU | LOOKUP_CACHED)) == LOOKUP_CACHED) > return ERR_PTR(-EAGAIN); > @@ -2322,8 +2325,6 @@ static const char *path_init(struct nameidata *nd, unsigned flags) > } > > nd->root.mnt = NULL; > - nd->path.mnt = NULL; > - nd->path.dentry = NULL; > > /* Absolute pathname -- fetch the root (LOOKUP_IN_ROOT uses nd->dfd). */ > if (*s == '/' && !(flags & LOOKUP_IN_ROOT)) { Bingo. That fixes it. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [syzbot] WARNING in mntput_no_expire (2) 2021-04-05 20:07 ` Christian Brauner @ 2021-04-06 1:38 ` Al Viro 2021-04-06 2:24 ` Al Viro 2021-04-06 12:35 ` Christian Brauner 0 siblings, 2 replies; 31+ messages in thread From: Al Viro @ 2021-04-06 1:38 UTC (permalink / raw) To: Christian Brauner Cc: Jens Axboe, syzbot, linux-fsdevel, linux-kernel, syzkaller-bugs, io-uring On Mon, Apr 05, 2021 at 10:07:37PM +0200, Christian Brauner wrote: > > diff --git a/fs/namei.c b/fs/namei.c > > index 216f16e74351..82344f1139ff 100644 > > --- a/fs/namei.c > > +++ b/fs/namei.c > > @@ -2289,6 +2289,9 @@ static const char *path_init(struct nameidata *nd, unsigned flags) > > int error; > > const char *s = nd->name->name; > > > > + nd->path.mnt = NULL; > > + nd->path.dentry = NULL; > > + > > /* LOOKUP_CACHED requires RCU, ask caller to retry */ > > if ((flags & (LOOKUP_RCU | LOOKUP_CACHED)) == LOOKUP_CACHED) > > return ERR_PTR(-EAGAIN); > > @@ -2322,8 +2325,6 @@ static const char *path_init(struct nameidata *nd, unsigned flags) > > } > > > > nd->root.mnt = NULL; > > - nd->path.mnt = NULL; > > - nd->path.dentry = NULL; > > > > /* Absolute pathname -- fetch the root (LOOKUP_IN_ROOT uses nd->dfd). */ > > if (*s == '/' && !(flags & LOOKUP_IN_ROOT)) { > > Bingo. That fixes it. *grumble* OK, I suppose it'll do for backports, but longer term... I don't like how convoluted the rules for nameidata fields' validity are. In particular, for nd->path I would rather have it * cleared in set_nameidata() * cleared when it become invalid. That would be * places that drop rcu_read_lock() without having legitimized the sucker (already done, except for terminate_walk()) * terminate_walk() in non-RCU case after path_put(&nd->path) OTOH... wait a sec - the whole thing is this cycle regression, so... Could you verify that the variant below fixes that crap? Make sure nd->path.mnt and nd->path.dentry are always valid pointers Initialize them in set_nameidata() and make sure that terminate_walk() clears them once the pointers become potentially invalid (i.e. we leave RCU mode or drop them in non-RCU one). Currently we have "path_init() always initializes them and nobody accesses them outside of path_init()/terminate_walk() segments", which is asking for trouble. With that change we would have nd->path.{mnt,dentry} 1) always valid - NULL or pointing to currently allocated objects. 2) non-NULL while we are successfully walking 3) NULL when we are not walking at all 4) contributing to refcounts whenever non-NULL outside of RCU mode. Hopefully-fixes: 6c6ec2b0a3e0 ("fs: add support for LOOKUP_CACHED") Signed-off-by: Al Viro <[email protected]> --- diff --git a/fs/namei.c b/fs/namei.c index 216f16e74351..fc8760d4314e 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -579,6 +579,8 @@ static void set_nameidata(struct nameidata *p, int dfd, struct filename *name) p->stack = p->internal; p->dfd = dfd; p->name = name; + p->path.mnt = NULL; + p->path.dentry = NULL; p->total_link_count = old ? old->total_link_count : 0; p->saved = old; current->nameidata = p; @@ -652,6 +654,8 @@ static void terminate_walk(struct nameidata *nd) rcu_read_unlock(); } nd->depth = 0; + nd->path.mnt = NULL; + nd->path.dentry = NULL; } /* path_put is needed afterwards regardless of success or failure */ @@ -2322,8 +2326,6 @@ static const char *path_init(struct nameidata *nd, unsigned flags) } nd->root.mnt = NULL; - nd->path.mnt = NULL; - nd->path.dentry = NULL; /* Absolute pathname -- fetch the root (LOOKUP_IN_ROOT uses nd->dfd). */ if (*s == '/' && !(flags & LOOKUP_IN_ROOT)) { ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [syzbot] WARNING in mntput_no_expire (2) 2021-04-06 1:38 ` Al Viro @ 2021-04-06 2:24 ` Al Viro 2021-04-06 12:35 ` Christian Brauner 1 sibling, 0 replies; 31+ messages in thread From: Al Viro @ 2021-04-06 2:24 UTC (permalink / raw) To: Christian Brauner Cc: Jens Axboe, syzbot, linux-fsdevel, linux-kernel, syzkaller-bugs, io-uring On Tue, Apr 06, 2021 at 01:38:39AM +0000, Al Viro wrote: > On Mon, Apr 05, 2021 at 10:07:37PM +0200, Christian Brauner wrote: > > > > diff --git a/fs/namei.c b/fs/namei.c > > > index 216f16e74351..82344f1139ff 100644 > > > --- a/fs/namei.c > > > +++ b/fs/namei.c > > > @@ -2289,6 +2289,9 @@ static const char *path_init(struct nameidata *nd, unsigned flags) > > > int error; > > > const char *s = nd->name->name; > > > > > > + nd->path.mnt = NULL; > > > + nd->path.dentry = NULL; > > > + > > > /* LOOKUP_CACHED requires RCU, ask caller to retry */ > > > if ((flags & (LOOKUP_RCU | LOOKUP_CACHED)) == LOOKUP_CACHED) > > > return ERR_PTR(-EAGAIN); > > > @@ -2322,8 +2325,6 @@ static const char *path_init(struct nameidata *nd, unsigned flags) > > > } > > > > > > nd->root.mnt = NULL; > > > - nd->path.mnt = NULL; > > > - nd->path.dentry = NULL; > > > > > > /* Absolute pathname -- fetch the root (LOOKUP_IN_ROOT uses nd->dfd). */ > > > if (*s == '/' && !(flags & LOOKUP_IN_ROOT)) { > > > > Bingo. That fixes it. > > *grumble* > > OK, I suppose it'll do for backports, but longer term... I don't like how > convoluted the rules for nameidata fields' validity are. In particular, > for nd->path I would rather have it > * cleared in set_nameidata() > * cleared when it become invalid. That would be > * places that drop rcu_read_lock() without having legitimized the sucker > (already done, except for terminate_walk()) > * terminate_walk() in non-RCU case after path_put(&nd->path) > > OTOH... wait a sec - the whole thing is this cycle regression, so... BTW, there's another piece of brittleness that almost accidentally doesn't end up biting us - nd->depth should be cleared in set_nameidata(), not in path_init(). In this case we are saved by the fact that the only really early failure in path_init() can't happen on the first call, so if we do leave the sucker before zeroing nd->depth, we are saved by the fact that terminate_walk() has just been called and it *does* clear nd->depth. It's obviously cleaner to have it initialized from the very beginning and not bother with it in path_init() at all. Separate patch, though - this is not, strictly speaking, a bug. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [syzbot] WARNING in mntput_no_expire (2) 2021-04-06 1:38 ` Al Viro 2021-04-06 2:24 ` Al Viro @ 2021-04-06 12:35 ` Christian Brauner 2021-04-06 13:13 ` Al Viro 1 sibling, 1 reply; 31+ messages in thread From: Christian Brauner @ 2021-04-06 12:35 UTC (permalink / raw) To: Al Viro Cc: Jens Axboe, syzbot, linux-fsdevel, linux-kernel, syzkaller-bugs, io-uring On Tue, Apr 06, 2021 at 01:38:39AM +0000, Al Viro wrote: > On Mon, Apr 05, 2021 at 10:07:37PM +0200, Christian Brauner wrote: > > > > diff --git a/fs/namei.c b/fs/namei.c > > > index 216f16e74351..82344f1139ff 100644 > > > --- a/fs/namei.c > > > +++ b/fs/namei.c > > > @@ -2289,6 +2289,9 @@ static const char *path_init(struct nameidata *nd, unsigned flags) > > > int error; > > > const char *s = nd->name->name; > > > > > > + nd->path.mnt = NULL; > > > + nd->path.dentry = NULL; > > > + > > > /* LOOKUP_CACHED requires RCU, ask caller to retry */ > > > if ((flags & (LOOKUP_RCU | LOOKUP_CACHED)) == LOOKUP_CACHED) > > > return ERR_PTR(-EAGAIN); > > > @@ -2322,8 +2325,6 @@ static const char *path_init(struct nameidata *nd, unsigned flags) > > > } > > > > > > nd->root.mnt = NULL; > > > - nd->path.mnt = NULL; > > > - nd->path.dentry = NULL; > > > > > > /* Absolute pathname -- fetch the root (LOOKUP_IN_ROOT uses nd->dfd). */ > > > if (*s == '/' && !(flags & LOOKUP_IN_ROOT)) { > > > > Bingo. That fixes it. > > *grumble* > > OK, I suppose it'll do for backports, but longer term... I don't like how > convoluted the rules for nameidata fields' validity are. In particular, > for nd->path I would rather have it > * cleared in set_nameidata() > * cleared when it become invalid. That would be > * places that drop rcu_read_lock() without having legitimized the sucker > (already done, except for terminate_walk()) > * terminate_walk() in non-RCU case after path_put(&nd->path) > > OTOH... wait a sec - the whole thing is this cycle regression, so... > > Could you verify that the variant below fixes that crap? > > Make sure nd->path.mnt and nd->path.dentry are always valid pointers > > Initialize them in set_nameidata() and make sure that terminate_walk() clears them > once the pointers become potentially invalid (i.e. we leave RCU mode or drop them > in non-RCU one). Currently we have "path_init() always initializes them and nobody > accesses them outside of path_init()/terminate_walk() segments", which is asking > for trouble. > > With that change we would have nd->path.{mnt,dentry} > 1) always valid - NULL or pointing to currently allocated objects. > 2) non-NULL while we are successfully walking > 3) NULL when we are not walking at all > 4) contributing to refcounts whenever non-NULL outside of RCU mode. > > Hopefully-fixes: 6c6ec2b0a3e0 ("fs: add support for LOOKUP_CACHED") > Signed-off-by: Al Viro <[email protected]> > --- > diff --git a/fs/namei.c b/fs/namei.c > index 216f16e74351..fc8760d4314e 100644 > --- a/fs/namei.c > +++ b/fs/namei.c > @@ -579,6 +579,8 @@ static void set_nameidata(struct nameidata *p, int dfd, struct filename *name) > p->stack = p->internal; > p->dfd = dfd; > p->name = name; > + p->path.mnt = NULL; > + p->path.dentry = NULL; > p->total_link_count = old ? old->total_link_count : 0; > p->saved = old; > current->nameidata = p; > @@ -652,6 +654,8 @@ static void terminate_walk(struct nameidata *nd) > rcu_read_unlock(); > } > nd->depth = 0; > + nd->path.mnt = NULL; > + nd->path.dentry = NULL; > } > > /* path_put is needed afterwards regardless of success or failure */ > @@ -2322,8 +2326,6 @@ static const char *path_init(struct nameidata *nd, unsigned flags) > } > > nd->root.mnt = NULL; > - nd->path.mnt = NULL; > - nd->path.dentry = NULL; > > /* Absolute pathname -- fetch the root (LOOKUP_IN_ROOT uses nd->dfd). */ > if (*s == '/' && !(flags & LOOKUP_IN_ROOT)) { Yeah, that works too. Though I'm not a fan of this open-coding as it looks pretty brittle especially if we ever introduce another LOOKUP_* flag that requires us to clear some other field carefully. I think a tiny static inline void helper might make it easier to grok what's going on. And while we're at it might I bring up the possibility of an additional cleanup of how we currently call path_init(). Right now we pass the return value from path_init() directly into e.g. link_path_walk() which as a first thing checks for error. Which feels rather wrong and has always confused me when looking at these codepaths. I get that it might make sense for reasons unrelated to path_init() that link_path_walk() checks its first argument for error but path_init() should be checked for error right away especially now that we return early when LOOKUP_CACHED is set without LOOKUP_RCU. I have two/three additional patches on top of the one below that massage path_lookupat(), path_openat() to get rid of such - imho - hard to parse while constructs: while (!(error = link_path_walk(s, nd)) && (s = <foo>(nd, file, op)) != NULL) ; in favor of a for (;;) like we do in link_path_walk() itself already with early exits to a cleanup label. I'm not too fond of the if (!error) error = foo(); if (!error) error = bar(); terminate_walk(); return error; thing especially in longer functions such as path_lookupat() where it gets convoluted pretty quickly. I think it would be cleaner to have something like [1]. The early exists make the code easier to reason about imho. But I get that that's a style discussion. If you want to see the whole thing and not just the patch below I'm happy to send it though. Here's the tweaked fix for the immediate syzbot issue only: From d6d669aaaba64debb9c433e281109b5494ad9bc5 Mon Sep 17 00:00:00 2001 From: Al Viro <[email protected]> Date: Tue, 6 Apr 2021 13:20:26 +0200 Subject: [PATCH 1/3] namei: Ensure nd->path.{dentry,mnt} are always valid pointers Initialize them in set_nameidata() and make sure that terminate_walk() clears them once the pointers become potentially invalid (i.e. we leave RCU mode or drop them in non-RCU one). Currently we have "path_init() always initializes them and nobody accesses them outside of path_init()/terminate_walk() segments", which is asking for trouble. With that change we would have nd->path.{mnt,dentry} 1) always valid - NULL or pointing to currently allocated objects. 2) non-NULL while we are successfully walking 3) NULL when we are not walking at all 4) contributing to refcounts whenever non-NULL outside of RCU mode. Fixes: 6c6ec2b0a3e0 ("fs: add support for LOOKUP_CACHED") Reported-by: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: Al Viro <[email protected]> Signed-off-by: Christian Brauner <[email protected]> --- fs/namei.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/fs/namei.c b/fs/namei.c index 216f16e74351..618055419da3 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -573,9 +573,18 @@ struct nameidata { umode_t dir_mode; } __randomize_layout; +static inline void clear_nameidata(struct nameidata *nd) +{ + nd->depth = 0; + nd->path.mnt = NULL; + nd->path.dentry = NULL; +} + static void set_nameidata(struct nameidata *p, int dfd, struct filename *name) { struct nameidata *old = current->nameidata; + + clear_nameidata(p); p->stack = p->internal; p->dfd = dfd; p->name = name; @@ -651,7 +660,7 @@ static void terminate_walk(struct nameidata *nd) nd->flags &= ~LOOKUP_RCU; rcu_read_unlock(); } - nd->depth = 0; + clear_nameidata(nd); } /* path_put is needed afterwards regardless of success or failure */ @@ -2322,8 +2331,6 @@ static const char *path_init(struct nameidata *nd, unsigned flags) } nd->root.mnt = NULL; - nd->path.mnt = NULL; - nd->path.dentry = NULL; /* Absolute pathname -- fetch the root (LOOKUP_IN_ROOT uses nd->dfd). */ if (*s == '/' && !(flags & LOOKUP_IN_ROOT)) { -- 2.27.0 [1]: diff --git a/fs/namei.c b/fs/namei.c index ae1b5cd7b1a9..572a0d6e22cc 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -2424,33 +2424,49 @@ static int path_lookupat(struct nameidata *nd, unsigned flags, struct path *path int err; s = path_init(nd, flags); - if (IS_ERR(s)) - return PTR_ERR(s); + if (IS_ERR(s)) { + err = PTR_ERR(s); + goto out_terminate; + } if (unlikely(flags & LOOKUP_DOWN)) { err = handle_lookup_down(nd); if (unlikely(err < 0)) - s = ERR_PTR(err); + goto out_terminate; } - while (!(err = link_path_walk(s, nd)) && - (s = lookup_last(nd)) != NULL) - ; - if (!err) - err = complete_walk(nd); + for (;;) { + err = link_path_walk(s, nd); + if (err) + goto out_terminate; - if (!err && nd->flags & LOOKUP_DIRECTORY) - if (!d_can_lookup(nd->path.dentry)) - err = -ENOTDIR; - if (!err && unlikely(nd->flags & LOOKUP_MOUNTPOINT)) { + s = lookup_last(nd); + if (!s) + break; + } + + err = complete_walk(nd); + if (err) + goto out_terminate; + + if ((nd->flags & LOOKUP_DIRECTORY) && + (!d_can_lookup(nd->path.dentry))) { + err = -ENOTDIR; + goto out_terminate; + } + + if (unlikely(nd->flags & LOOKUP_MOUNTPOINT)) { err = handle_lookup_down(nd); nd->flags &= ~LOOKUP_JUMPED; // no d_weak_revalidate(), please... + if (err) + goto out_terminate; } - if (!err) { - *path = nd->path; - nd->path.mnt = NULL; - nd->path.dentry = NULL; - } + + *path = nd->path; + nd->path.mnt = NULL; + nd->path.dentry = NULL; + +out_terminate: terminate_walk(nd); return err; } ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [syzbot] WARNING in mntput_no_expire (2) 2021-04-06 12:35 ` Christian Brauner @ 2021-04-06 13:13 ` Al Viro 2021-04-06 13:22 ` Christian Brauner 0 siblings, 1 reply; 31+ messages in thread From: Al Viro @ 2021-04-06 13:13 UTC (permalink / raw) To: Christian Brauner Cc: Jens Axboe, syzbot, linux-fsdevel, linux-kernel, syzkaller-bugs, io-uring On Tue, Apr 06, 2021 at 02:35:05PM +0200, Christian Brauner wrote: > And while we're at it might I bring up the possibility of an additional > cleanup of how we currently call path_init(). > Right now we pass the return value from path_init() directly into e.g. > link_path_walk() which as a first thing checks for error. Which feels > rather wrong and has always confused me when looking at these codepaths. Why? > I get that it might make sense for reasons unrelated to path_init() that > link_path_walk() checks its first argument for error but path_init() > should be checked for error right away especially now that we return > early when LOOKUP_CACHED is set without LOOKUP_RCU. But you are making the _callers_ of path_init() do that, for no good reason. > thing especially in longer functions such as path_lookupat() where it > gets convoluted pretty quickly. I think it would be cleaner to have > something like [1]. The early exists make the code easier to reason > about imho. But I get that that's a style discussion. Your variant is a lot more brittle, actually. > @@ -2424,33 +2424,49 @@ static int path_lookupat(struct nameidata *nd, unsigned flags, struct path *path > int err; > > s = path_init(nd, flags); > - if (IS_ERR(s)) > - return PTR_ERR(s); Where has that come from, BTW? Currently path_lookupat() does no such thing. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [syzbot] WARNING in mntput_no_expire (2) 2021-04-06 13:13 ` Al Viro @ 2021-04-06 13:22 ` Christian Brauner 2021-04-06 14:15 ` Al Viro 0 siblings, 1 reply; 31+ messages in thread From: Christian Brauner @ 2021-04-06 13:22 UTC (permalink / raw) To: Al Viro Cc: Jens Axboe, syzbot, linux-fsdevel, linux-kernel, syzkaller-bugs, io-uring On Tue, Apr 06, 2021 at 01:13:13PM +0000, Al Viro wrote: > On Tue, Apr 06, 2021 at 02:35:05PM +0200, Christian Brauner wrote: > > > And while we're at it might I bring up the possibility of an additional > > cleanup of how we currently call path_init(). > > Right now we pass the return value from path_init() directly into e.g. > > link_path_walk() which as a first thing checks for error. Which feels > > rather wrong and has always confused me when looking at these codepaths. > > Why? Why is a another function in charge of checking the return value of an initialization function. If something like path_init() fails why is the next caller responsible for rejecting it's return value and then we're passing that failure value through the whole function with if (!err) ladders but as I said it's mostly style preferences. > > > I get that it might make sense for reasons unrelated to path_init() that > > link_path_walk() checks its first argument for error but path_init() > > should be checked for error right away especially now that we return > > early when LOOKUP_CACHED is set without LOOKUP_RCU. > > But you are making the _callers_ of path_init() do that, for no good > reason. I'm confused why having callers of functions responsible for checking error values is such an out-of-band concept suddenly. I don't think it's worth arguing over this though. > > > thing especially in longer functions such as path_lookupat() where it > > gets convoluted pretty quickly. I think it would be cleaner to have > > something like [1]. The early exists make the code easier to reason > > about imho. But I get that that's a style discussion. > > Your variant is a lot more brittle, actually. > > > @@ -2424,33 +2424,49 @@ static int path_lookupat(struct nameidata *nd, unsigned flags, struct path *path > > int err; > > > > s = path_init(nd, flags); > > - if (IS_ERR(s)) > > - return PTR_ERR(s); > > Where has that come from, BTW? Currently path_lookupat() does no such thing. Hm? Are you maybe overlooking path_init() which assigns straight into the variable declaration? Or are you referring to sm else? static int path_lookupat(struct nameidata *nd, unsigned flags, struct path *path) { const char *s = path_init(nd, flags); int err; if (unlikely(flags & LOOKUP_DOWN) && !IS_ERR(s)) { err = handle_lookup_down(nd); if (unlikely(err < 0)) s = ERR_PTR(err); } while (!(err = link_path_walk(s, nd)) && (s = lookup_last(nd)) != NULL) ; ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [syzbot] WARNING in mntput_no_expire (2) 2021-04-06 13:22 ` Christian Brauner @ 2021-04-06 14:15 ` Al Viro 2021-04-06 14:23 ` Al Viro 2021-04-06 14:46 ` Christian Brauner 0 siblings, 2 replies; 31+ messages in thread From: Al Viro @ 2021-04-06 14:15 UTC (permalink / raw) To: Christian Brauner Cc: Jens Axboe, syzbot, linux-fsdevel, linux-kernel, syzkaller-bugs, io-uring On Tue, Apr 06, 2021 at 03:22:05PM +0200, Christian Brauner wrote: > Why is a another function in charge of checking the return value of an > initialization function. If something like path_init() fails why is the > next caller responsible for rejecting it's return value and then we're > passing that failure value through the whole function with if (!err) > ladders but as I said it's mostly style preferences. Because otherwise you either need *all* paths leading to link_path_walk() duplicate the logics (and get hurt whenever you miss one) or have "well, in some cases link_path_walk() handles ERR_PTR() given to it, in some cases its caller do" mess. > > > s = path_init(nd, flags); > > > - if (IS_ERR(s)) > > > - return PTR_ERR(s); > > > > Where has that come from, BTW? Currently path_lookupat() does no such thing. > > Hm? Are you maybe overlooking path_init() which assigns straight into > the variable declaration? Or are you referring to sm else? I'm referring to the fact that your diff is with an already modified path_lookupat() _and_ those modifications have managed to introduce a bug your patch reverts. No terminate_walk() paired with that path_init() failure, i.e. path_init() is responsible for cleanups on its (many) failure exits... ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [syzbot] WARNING in mntput_no_expire (2) 2021-04-06 14:15 ` Al Viro @ 2021-04-06 14:23 ` Al Viro 2021-04-06 15:37 ` Jens Axboe 2021-04-06 14:46 ` Christian Brauner 1 sibling, 1 reply; 31+ messages in thread From: Al Viro @ 2021-04-06 14:23 UTC (permalink / raw) To: Christian Brauner Cc: Jens Axboe, syzbot, linux-fsdevel, linux-kernel, syzkaller-bugs, io-uring On Tue, Apr 06, 2021 at 02:15:01PM +0000, Al Viro wrote: > I'm referring to the fact that your diff is with an already modified path_lookupat() > _and_ those modifications have managed to introduce a bug your patch reverts. > No terminate_walk() paired with that path_init() failure, i.e. path_init() is > responsible for cleanups on its (many) failure exits... I can't tell without seeing the variant your diff is against, but at a guess it had a non-trivial amount of trouble with missed rcu_read_unlock() in cases when path_init() fails after having done rcu_read_lock(). For trivial testcase, consider passing -1 for dfd, so that it would fail with -EBADF. Or passing 0 for dfd and "blah" for name (assuming your stdin is not a directory). Sure, you could handle those in path_init() (or delay grabbing rcu_read_lock() in there, spreading it in a bunch of branches), but duplicated cleanup logics for a bunch of failure exits is asking for trouble. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [syzbot] WARNING in mntput_no_expire (2) 2021-04-06 14:23 ` Al Viro @ 2021-04-06 15:37 ` Jens Axboe 0 siblings, 0 replies; 31+ messages in thread From: Jens Axboe @ 2021-04-06 15:37 UTC (permalink / raw) To: Al Viro, Christian Brauner Cc: syzbot, linux-fsdevel, linux-kernel, syzkaller-bugs, io-uring On 4/6/21 8:23 AM, Al Viro wrote: > On Tue, Apr 06, 2021 at 02:15:01PM +0000, Al Viro wrote: > >> I'm referring to the fact that your diff is with an already modified path_lookupat() >> _and_ those modifications have managed to introduce a bug your patch reverts. >> No terminate_walk() paired with that path_init() failure, i.e. path_init() is >> responsible for cleanups on its (many) failure exits... > > I can't tell without seeing the variant your diff is against, but at a guess > it had a non-trivial amount of trouble with missed rcu_read_unlock() in > cases when path_init() fails after having done rcu_read_lock(). For trivial > testcase, consider passing -1 for dfd, so that it would fail with -EBADF. > Or passing 0 for dfd and "blah" for name (assuming your stdin is not a directory). > Sure, you could handle those in path_init() (or delay grabbing rcu_read_lock() > in there, spreading it in a bunch of branches), but duplicated cleanup logics > for a bunch of failure exits is asking for trouble. Thanks for taking care of this Al, fwiw I'm (mostly) out on vacation. -- Jens Axboe ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [syzbot] WARNING in mntput_no_expire (2) 2021-04-06 14:15 ` Al Viro 2021-04-06 14:23 ` Al Viro @ 2021-04-06 14:46 ` Christian Brauner 1 sibling, 0 replies; 31+ messages in thread From: Christian Brauner @ 2021-04-06 14:46 UTC (permalink / raw) To: Al Viro Cc: Jens Axboe, syzbot, linux-fsdevel, linux-kernel, syzkaller-bugs, io-uring On Tue, Apr 06, 2021 at 02:15:01PM +0000, Al Viro wrote: > On Tue, Apr 06, 2021 at 03:22:05PM +0200, Christian Brauner wrote: > > > Why is a another function in charge of checking the return value of an > > initialization function. If something like path_init() fails why is the > > next caller responsible for rejecting it's return value and then we're > > passing that failure value through the whole function with if (!err) > > ladders but as I said it's mostly style preferences. > > Because otherwise you either need *all* paths leading to link_path_walk() > duplicate the logics (and get hurt whenever you miss one) or have "well, > in some cases link_path_walk() handles ERR_PTR() given to it, in some > cases its caller do" mess. > > > > > s = path_init(nd, flags); > > > > - if (IS_ERR(s)) > > > > - return PTR_ERR(s); > > > > > > Where has that come from, BTW? Currently path_lookupat() does no such thing. > > > > Hm? Are you maybe overlooking path_init() which assigns straight into > > the variable declaration? Or are you referring to sm else? > > I'm referring to the fact that your diff is with an already modified path_lookupat() > _and_ those modifications have managed to introduce a bug your patch reverts. > No terminate_walk() paired with that path_init() failure, i.e. path_init() is > responsible for cleanups on its (many) failure exits... Note that the paste post the patch was just a doodle to illustrate the point not sm to review in earnest (I should probably comment prefix things like this with "untested".). ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [syzbot] WARNING in mntput_no_expire (2) 2021-04-04 16:40 ` Christian Brauner 2021-04-04 16:44 ` Al Viro @ 2021-04-04 16:52 ` Christian Brauner 2021-04-04 16:55 ` Christian Brauner 1 sibling, 1 reply; 31+ messages in thread From: Christian Brauner @ 2021-04-04 16:52 UTC (permalink / raw) To: Al Viro Cc: Jens Axboe, syzbot, linux-fsdevel, linux-kernel, syzkaller-bugs, io-uring On Sun, Apr 04, 2021 at 06:40:40PM +0200, Christian Brauner wrote: > On Sun, Apr 04, 2021 at 03:56:02PM +0000, Al Viro wrote: > > On Sun, Apr 04, 2021 at 01:34:45PM +0200, Christian Brauner wrote: > > > > > Sorry for not replying to your earlier mail but I've been debugging this > > > too. My current theory is that it's related to LOOKUP_ROOT_GRABBED when > > > LOOKUP_CACHED is specified _possibly_ with an interaction how > > > create_io_thread() is created with CLONE_FS. The reproducer requires you > > > either have called pivot_root() or chroot() in order for the failure to > > > happen. So I think the fact that we skip legitimize_root() when > > > LOOKUP_CACHED is set might figure into this. I can keep digging. > > > > > > > > Funny enough I already placed a printk statement into the place you > > > wanted one too so I just amended mine. Here's what you get: > > > > > > If pivot pivot_root() is used before the chroot() you get: > > > > > > [ 637.464555] AAAA: count(-1) | mnt_mntpoint(/) | mnt->mnt.mnt_root(/) | id(579) | dev(tmpfs) > > > > > > if you only call chroot, i.e. make the pivot_root() branch a simple > > > if (true) you get: > > > > > > [ 955.206117] AAAA: count(-2) | mnt_mntpoint(/) | mnt->mnt.mnt_root(/) | id(580) | dev(tmpfs) > > > > Very interesting. What happens if you call loop() twice? And now I wonder > > whether it's root or cwd, actually... Hmm... > > > > How about this: > > fd = open("/proc/self/mountinfo", 0); > > mkdir("./newroot/foo", 0777); > > mount("./newroot/foo", "./newroot/foo", 0, MS_BIND, NULL); > > chroot("./newroot"); > > chdir("/foo"); > > while (1) { > > static char buf[4096]; > > int n = read(fd, buf, 4096); > > if (n <= 0) > > break; > > write(1, buf, n); > > } > > close(fd); > > drop_caps(); > > loop(); > > as the end of namespace_sandbox_proc(), instead of > > chroot("./newroot"); > > chdir("/"); > > drop_caps(); > > loop(); > > sequence we have there? > > Uhum, well then we oops properly with a null-deref. And note that the reproducer also requires CLONE_NEWNS which causes the fs_struct to be unshared as well. I'm not completely in the clear what would happen if a new io worker thread were to be created after the caller has called unshare(CLONE_NEWNS). > > f1-vm login: [ 395.046971][ T5856] general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN > [ 395.049716][ T5856] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] > [ 395.052847][ T5856] CPU: 1 PID: 5856 Comm: iou-wrk-5851 Tainted: G E 5.12.0-rc5-020f68e042a19f59784f0962004d848181d13b9e #46 > [ 395.056594][ T5856] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009)/LXD, BIOS 0.0.0 02/06/2015 > [ 395.058962][ T5856] RIP: 0010:inode_permission+0x4e/0x530 > [ 395.060362][ T5856] Code: ff 89 de e8 34 42 a9 ff 85 db 0f 85 ef 00 00 00 e8 87 40 a9 ff 4c 8d 7d 02 48 b8 00 00 00 00 00 fc ff df 4c 89 fa 48 c1 ea 03 <0f> b6 14 02 4c 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 c2 > [ 395.065442][ T5856] RSP: 0018:ffffc9000681f640 EFLAGS: 00010246 > [ 395.067274][ T5856] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff84ac11fc > [ 395.069834][ T5856] RDX: 0000000000000000 RSI: ffff888013210000 RDI: 0000000000000002 > [ 395.072527][ T5856] RBP: 0000000000000000 R08: 0000000000000001 R09: ffffed1000044f22 > [ 395.075287][ T5856] R10: ffff888031c105d3 R11: ffffed1000044f21 R12: ffffc9000681f8e0 > [ 395.077026][ T5856] R13: 0000000000000001 R14: ffffffff8e441e60 R15: 0000000000000002 > [ 395.079001][ T5856] FS: 00007f136886f800(0000) GS:ffff888015a80000(0000) knlGS:0000000000000000 > [ 395.081181][ T5856] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 395.082659][ T5856] CR2: 000055ecfefe8e70 CR3: 00000000236b7000 CR4: 0000000000350ee0 > [ 395.084727][ T5856] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > [ 395.087117][ T5856] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > [ 395.089613][ T5856] Call Trace: > [ 395.090570][ T5856] link_path_walk.part.0+0x790/0xc10 > [ 395.091735][ T5856] ? write_comp_data+0x2a/0x90 > [ 395.092931][ T5856] ? __sanitizer_cov_trace_pc+0x1d/0x50 > [ 395.094093][ T5856] ? walk_component+0x6a0/0x6a0 > [ 395.095021][ T5856] ? percpu_counter_add_batch+0xbc/0x180 > [ 395.096255][ T5856] path_openat+0x269/0x2790 > [ 395.097305][ T5856] ? path_lookupat.isra.0+0x530/0x530 > [ 395.098391][ T5856] ? ret_from_fork+0x1f/0x30 > [ 395.099431][ T5856] ? lockdep_hardirqs_on_prepare+0x400/0x400 > [ 395.101326][ T5856] do_filp_open+0x208/0x270 > [ 395.102894][ T5856] ? rcu_read_lock_bh_held+0xb0/0xb0 > [ 395.104525][ T5856] ? may_open_dev+0xf0/0xf0 > [ 395.105968][ T5856] ? do_raw_spin_lock+0x125/0x2e0 > [ 395.107426][ T5856] ? write_comp_data+0x2a/0x90 > [ 395.108651][ T5856] ? __sanitizer_cov_trace_pc+0x1d/0x50 > [ 395.110033][ T5856] ? _raw_spin_unlock+0x29/0x40 > [ 395.111397][ T5856] ? alloc_fd+0x499/0x640 > [ 395.112783][ T5856] io_openat2+0x1d1/0x8f0 > [ 395.114237][ T5856] ? __lock_acquire+0x1847/0x5850 > [ 395.115982][ T5856] ? io_req_complete_post+0xa90/0xa90 > [ 395.117347][ T5856] ? write_comp_data+0x2a/0x90 > [ 395.118428][ T5856] ? io_issue_sqe+0x5ac0/0x5ac0 > [ 395.119497][ T5856] io_issue_sqe+0x2a2/0x5ac0 > [ 395.120707][ T5856] ? io_poll_complete.constprop.0+0x100/0x100 > [ 395.130417][ T5856] ? find_held_lock+0x2d/0x110 > [ 395.139772][ T5856] ? io_worker_handle_work+0x55d/0x13b0 > [ 395.149051][ T5856] ? lock_downgrade+0x690/0x690 > [ 395.158262][ T5856] ? rwlock_bug.part.0+0x90/0x90 > [ 395.171997][ T5856] ? io_issue_sqe+0x5ac0/0x5ac0 > [ 395.180513][ T5856] io_wq_submit_work+0x1da/0x640 > [ 395.193646][ T5856] ? io_issue_sqe+0x5ac0/0x5ac0 > [ 395.202240][ T5856] io_worker_handle_work+0x70a/0x13b0 > [ 395.210276][ T5856] ? rwlock_bug.part.0+0x90/0x90 > [ 395.218389][ T5856] io_wqe_worker+0x2ec/0xd00 > [ 395.226152][ T5856] ? io_worker_handle_work+0x13b0/0x13b0 > [ 395.238492][ T5856] ? ret_from_fork+0x8/0x30 > [ 395.245595][ T5856] ? lock_downgrade+0x690/0x690 > [ 395.252630][ T5856] ? do_raw_spin_lock+0x125/0x2e0 > [ 395.259801][ T5856] ? rwlock_bug.part.0+0x90/0x90 > [ 395.271613][ T5856] ? write_comp_data+0x2a/0x90 > [ 395.279000][ T5856] ? _raw_spin_unlock_irq+0x24/0x50 > [ 395.285999][ T5856] ? io_worker_handle_work+0x13b0/0x13b0 > [ 395.297727][ T5856] ret_from_fork+0x1f/0x30 > [ 395.304546][ T5856] Modules linked in: efi_pstore(E) efivarfs(E) > [ 395.311931][ T5851] general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#2] PREEMPT SMP KASAN > [ 395.313449][ T5856] ---[ end trace e9d61a560dd12328 ]--- ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [syzbot] WARNING in mntput_no_expire (2) 2021-04-04 16:52 ` Christian Brauner @ 2021-04-04 16:55 ` Christian Brauner 0 siblings, 0 replies; 31+ messages in thread From: Christian Brauner @ 2021-04-04 16:55 UTC (permalink / raw) To: Al Viro Cc: Jens Axboe, syzbot, linux-fsdevel, linux-kernel, syzkaller-bugs, io-uring On Sun, Apr 04, 2021 at 06:52:08PM +0200, Christian Brauner wrote: > On Sun, Apr 04, 2021 at 06:40:40PM +0200, Christian Brauner wrote: > > On Sun, Apr 04, 2021 at 03:56:02PM +0000, Al Viro wrote: > > > On Sun, Apr 04, 2021 at 01:34:45PM +0200, Christian Brauner wrote: > > > > > > > Sorry for not replying to your earlier mail but I've been debugging this > > > > too. My current theory is that it's related to LOOKUP_ROOT_GRABBED when > > > > LOOKUP_CACHED is specified _possibly_ with an interaction how > > > > create_io_thread() is created with CLONE_FS. The reproducer requires you > > > > either have called pivot_root() or chroot() in order for the failure to > > > > happen. So I think the fact that we skip legitimize_root() when > > > > LOOKUP_CACHED is set might figure into this. I can keep digging. > > > > > > > > > > > Funny enough I already placed a printk statement into the place you > > > > wanted one too so I just amended mine. Here's what you get: > > > > > > > > If pivot pivot_root() is used before the chroot() you get: > > > > > > > > [ 637.464555] AAAA: count(-1) | mnt_mntpoint(/) | mnt->mnt.mnt_root(/) | id(579) | dev(tmpfs) > > > > > > > > if you only call chroot, i.e. make the pivot_root() branch a simple > > > > if (true) you get: > > > > > > > > [ 955.206117] AAAA: count(-2) | mnt_mntpoint(/) | mnt->mnt.mnt_root(/) | id(580) | dev(tmpfs) > > > > > > Very interesting. What happens if you call loop() twice? And now I wonder > > > whether it's root or cwd, actually... Hmm... > > > > > > How about this: > > > fd = open("/proc/self/mountinfo", 0); > > > mkdir("./newroot/foo", 0777); > > > mount("./newroot/foo", "./newroot/foo", 0, MS_BIND, NULL); > > > chroot("./newroot"); > > > chdir("/foo"); > > > while (1) { > > > static char buf[4096]; > > > int n = read(fd, buf, 4096); > > > if (n <= 0) > > > break; > > > write(1, buf, n); > > > } > > > close(fd); > > > drop_caps(); > > > loop(); > > > as the end of namespace_sandbox_proc(), instead of > > > chroot("./newroot"); > > > chdir("/"); > > > drop_caps(); > > > loop(); > > > sequence we have there? > > > > Uhum, well then we oops properly with a null-deref. > > And note that the reproducer also requires CLONE_NEWNS which causes the > fs_struct to be unshared as well. I'm not completely in the clear what > would happen if a new io worker thread were to be created after the > caller has called unshare(CLONE_NEWNS). And here's a non-null-deref version: [ 647.257107] AAAA: count(-1) | mnt_mntpoint(foo) | mnt->mnt.mnt_root(foo) | id(1358) | dev(tmpfs) which is 1358 1326 0:66 /newroot/foo /home/ubuntu/syzkaller.Wgqj6W/syz-tmp/newroot/foo rw,relatime - tmpfs rw Just for kicks, here's the full mount table: 1224 513 8:2 / / rw,relatime - ext4 /dev/sda2 rw 1225 1224 0:5 / /dev rw,nosuid,noexec,relatime - devtmpfs udev rw,size=302716k,nr_inodes=75679,mode=755 1226 1225 0:26 / /dev/pts rw,nosuid,noexec,relatime - devpts devpts rw,gid=5,mode=620,ptmxmode=000 1227 1225 0:28 / /dev/shm rw,nosuid,nodev - tmpfs tmpfs rw 1228 1225 0:48 / /dev/hugepages rw,relatime - hugetlbfs hugetlbfs rw,pagesize=2M 1229 1225 0:21 / /dev/mqueue rw,nosuid,nodev,noexec,relatime - mqueue mqueue rw 1230 1224 0:27 / /run rw,nosuid,nodev,noexec,relatime - tmpfs tmpfs rw,size=62152k,mode=755 1231 1230 0:29 / /run/lock rw,nosuid,nodev,noexec,relatime - tmpfs tmpfs rw,size=5120k 1232 1230 0:49 / /run/lxd_agent rw,relatime - tmpfs tmpfs rw,size=51200k,mode=700 1233 1230 0:59 / /run/user/1000 rw,nosuid,nodev,relatime - tmpfs tmpfs rw,size=62148k,nr_inodes=15537,mode=700,uid=1000,gid=1000 1234 1224 0:24 / /sys rw,nosuid,nodev,noexec,relatime - sysfs sysfs rw 1235 1234 0:6 / /sys/kernel/security rw,nosuid,nodev,noexec,relatime - securityfs securityfs rw 1236 1234 0:30 / /sys/fs/cgroup ro,nosuid,nodev,noexec - tmpfs tmpfs ro,size=4096k,nr_inodes=1024,mode=755 1237 1236 0:31 /../../.. /sys/fs/cgroup/unified rw,nosuid,nodev,noexec,relatime - cgroup2 cgroup2 rw 1238 1236 0:32 /../../.. /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,xattr,name=systemd 1239 1236 0:36 / /sys/fs/cgroup/perf_event rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,perf_event 1240 1236 0:37 /.. /sys/fs/cgroup/blkio rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,blkio 1241 1236 0:38 / /sys/fs/cgroup/rdma rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,rdma 1242 1236 0:39 /.. /sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,cpu,cpuacct 1243 1236 0:40 / /sys/fs/cgroup/net_cls,net_prio rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,net_cls,net_prio 1244 1236 0:41 /.. /sys/fs/cgroup/devices rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,devices 1245 1236 0:42 /../../.. /sys/fs/cgroup/memory rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,memory 1246 1236 0:43 / /sys/fs/cgroup/hugetlb rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,hugetlb 1247 1236 0:44 / /sys/fs/cgroup/cpuset rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,cpuset,clone_children 1248 1236 0:45 /../../.. /sys/fs/cgroup/pids rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,pids 1249 1236 0:46 /../../.. /sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,freezer 1250 1234 0:33 / /sys/fs/pstore rw,nosuid,nodev,noexec,relatime - pstore pstore rw 1251 1234 0:34 / /sys/firmware/efi/efivars rw,nosuid,nodev,noexec,relatime - efivarfs efivarfs rw 1252 1234 0:35 / /sys/fs/bpf rw,nosuid,nodev,noexec,relatime - bpf none rw,mode=700 1253 1234 0:7 / /sys/kernel/debug rw,nosuid,nodev,noexec,relatime - debugfs debugfs rw 1254 1234 0:12 / /sys/kernel/tracing rw,nosuid,nodev,noexec,relatime - tracefs tracefs rw 1255 1234 0:51 / /sys/fs/fuse/connections rw,nosuid,nodev,noexec,relatime - fusectl fusectl rw 1256 1234 0:20 / /sys/kernel/config rw,nosuid,nodev,noexec,relatime - configfs configfs rw 1257 1224 0:25 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw 1258 1257 0:47 / /proc/sys/fs/binfmt_misc rw,relatime - autofs systemd-1 rw,fd=29,pgrp=0,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=34137 1259 1258 0:52 / /proc/sys/fs/binfmt_misc rw,nosuid,nodev,noexec,relatime - binfmt_misc binfmt_misc rw 1260 1224 0:50 / /home/ubuntu/src/compiled rw,relatime - virtiofs lxd_lxc rw 1261 1224 8:1 / /boot/efi rw,relatime - vfat /dev/sda1 rw,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro 1262 1224 0:57 / /var/lib/lxcfs rw,nosuid,nodev,relatime - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other 1263 1224 0:60 / /home/ubuntu/syzkaller.qO1bUT/syz-tmp rw,relatime - tmpfs rw 1264 1263 0:5 / /home/ubuntu/syzkaller.qO1bUT/syz-tmp/newroot/dev rw,nosuid,noexec,relatime - devtmpfs udev rw,size=302716k,nr_inodes=75679,mode=755 1265 1264 0:26 / /home/ubuntu/syzkaller.qO1bUT/syz-tmp/newroot/dev/pts rw,nosuid,noexec,relatime - devpts devpts rw,gid=5,mode=620,ptmxmode=000 1266 1264 0:28 / /home/ubuntu/syzkaller.qO1bUT/syz-tmp/newroot/dev/shm rw,nosuid,nodev - tmpfs tmpfs rw 1267 1264 0:48 / /home/ubuntu/syzkaller.qO1bUT/syz-tmp/newroot/dev/hugepages rw,relatime - hugetlbfs hugetlbfs rw,pagesize=2M 1268 1264 0:21 / /home/ubuntu/syzkaller.qO1bUT/syz-tmp/newroot/dev/mqueue rw,nosuid,nodev,noexec,relatime - mqueue mqueue rw 1269 1263 0:61 / /home/ubuntu/syzkaller.qO1bUT/syz-tmp/newroot/proc rw,relatime - proc none rw 1270 1263 0:24 / /home/ubuntu/syzkaller.qO1bUT/syz-tmp/newroot/sys rw,nosuid,nodev,noexec,relatime - sysfs sysfs rw 1271 1270 0:6 / /home/ubuntu/syzkaller.qO1bUT/syz-tmp/newroot/sys/kernel/security rw,nosuid,nodev,noexec,relatime - securityfs securityfs rw 1272 1270 0:30 / /home/ubuntu/syzkaller.qO1bUT/syz-tmp/newroot/sys/fs/cgroup ro,nosuid,nodev,noexec - tmpfs tmpfs ro,size=4096k,nr_inodes=1024,mode=755 1273 1272 0:31 /../../.. /home/ubuntu/syzkaller.qO1bUT/syz-tmp/newroot/sys/fs/cgroup/unified rw,nosuid,nodev,noexec,relatime - cgroup2 cgroup2 rw 1274 1272 0:32 /../../.. /home/ubuntu/syzkaller.qO1bUT/syz-tmp/newroot/sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,xattr,name=systemd 1275 1272 0:36 / /home/ubuntu/syzkaller.qO1bUT/syz-tmp/newroot/sys/fs/cgroup/perf_event rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,perf_event 1276 1272 0:37 /.. /home/ubuntu/syzkaller.qO1bUT/syz-tmp/newroot/sys/fs/cgroup/blkio rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,blkio 1277 1272 0:38 / /home/ubuntu/syzkaller.qO1bUT/syz-tmp/newroot/sys/fs/cgroup/rdma rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,rdma 1278 1272 0:39 /.. /home/ubuntu/syzkaller.qO1bUT/syz-tmp/newroot/sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,cpu,cpuacct 1279 1272 0:40 / /home/ubuntu/syzkaller.qO1bUT/syz-tmp/newroot/sys/fs/cgroup/net_cls,net_prio rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,net_cls,net_prio 1280 1272 0:41 /.. /home/ubuntu/syzkaller.qO1bUT/syz-tmp/newroot/sys/fs/cgroup/devices rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,devices 1281 1272 0:42 /../../.. /home/ubuntu/syzkaller.qO1bUT/syz-tmp/newroot/sys/fs/cgroup/memory rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,memory 1282 1272 0:43 / /home/ubuntu/syzkaller.qO1bUT/syz-tmp/newroot/sys/fs/cgroup/hugetlb rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,hugetlb 1283 1272 0:44 / /home/ubuntu/syzkaller.qO1bUT/syz-tmp/newroot/sys/fs/cgroup/cpuset rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,cpuset,clone_children 1284 1272 0:45 /../../.. /home/ubuntu/syzkaller.qO1bUT/syz-tmp/newroot/sys/fs/cgroup/pids rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,pids 1285 1272 0:46 /../../.. /home/ubuntu/syzkaller.qO1bUT/syz-tmp/newroot/sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,freezer 1286 1270 0:33 / /home/ubuntu/syzkaller.qO1bUT/syz-tmp/newroot/sys/fs/pstore rw,nosuid,nodev,noexec,relatime - pstore pstore rw 1287 1270 0:34 / /home/ubuntu/syzkaller.qO1bUT/syz-tmp/newroot/sys/firmware/efi/efivars rw,nosuid,nodev,noexec,relatime - efivarfs efivarfs rw 1288 1270 0:35 / /home/ubuntu/syzkaller.qO1bUT/syz-tmp/newroot/sys/fs/bpf rw,nosuid,nodev,noexec,relatime - bpf none rw,mode=700 1289 1270 0:7 / /home/ubuntu/syzkaller.qO1bUT/syz-tmp/newroot/sys/kernel/debug rw,nosuid,nodev,noexec,relatime - debugfs debugfs rw 1290 1270 0:12 / /home/ubuntu/syzkaller.qO1bUT/syz-tmp/newroot/sys/kernel/tracing rw,nosuid,nodev,noexec,relatime - tracefs tracefs rw 1291 1270 0:51 / /home/ubuntu/syzkaller.qO1bUT/syz-tmp/newroot/sys/fs/fuse/connections rw,nosuid,nodev,noexec,relatime - fusectl fusectl rw 1292 1270 0:20 / /home/ubuntu/syzkaller.qO1bUT/syz-tmp/newroot/sys/kernel/config rw,nosuid,nodev,noexec,relatime - configfs configfs rw 1293 1263 0:60 /newroot/foo /home/ubuntu/syzkaller.qO1bUT/syz-tmp/newroot/foo rw,relatime - tmpfs rw 1294 1224 0:62 / /home/ubuntu/syzkaller.AVk8bL/syz-tmp rw,relatime - tmpfs rw 1295 1294 0:5 / /home/ubuntu/syzkaller.AVk8bL/syz-tmp/newroot/dev rw,nosuid,noexec,relatime - devtmpfs udev rw,size=302716k,nr_inodes=75679,mode=755 1296 1295 0:26 / /home/ubuntu/syzkaller.AVk8bL/syz-tmp/newroot/dev/pts rw,nosuid,noexec,relatime - devpts devpts rw,gid=5,mode=620,ptmxmode=000 1297 1295 0:28 / /home/ubuntu/syzkaller.AVk8bL/syz-tmp/newroot/dev/shm rw,nosuid,nodev - tmpfs tmpfs rw 1298 1295 0:48 / /home/ubuntu/syzkaller.AVk8bL/syz-tmp/newroot/dev/hugepages rw,relatime - hugetlbfs hugetlbfs rw,pagesize=2M 1299 1295 0:21 / /home/ubuntu/syzkaller.AVk8bL/syz-tmp/newroot/dev/mqueue rw,nosuid,nodev,noexec,relatime - mqueue mqueue rw 1300 1294 0:63 / /home/ubuntu/syzkaller.AVk8bL/syz-tmp/newroot/proc rw,relatime - proc none rw 1301 1294 0:24 / /home/ubuntu/syzkaller.AVk8bL/syz-tmp/newroot/sys rw,nosuid,nodev,noexec,relatime - sysfs sysfs rw 1302 1301 0:6 / /home/ubuntu/syzkaller.AVk8bL/syz-tmp/newroot/sys/kernel/security rw,nosuid,nodev,noexec,relatime - securityfs securityfs rw 1303 1301 0:30 / /home/ubuntu/syzkaller.AVk8bL/syz-tmp/newroot/sys/fs/cgroup ro,nosuid,nodev,noexec - tmpfs tmpfs ro,size=4096k,nr_inodes=1024,mode=755 1304 1303 0:31 /../../.. /home/ubuntu/syzkaller.AVk8bL/syz-tmp/newroot/sys/fs/cgroup/unified rw,nosuid,nodev,noexec,relatime - cgroup2 cgroup2 rw 1305 1303 0:32 /../../.. /home/ubuntu/syzkaller.AVk8bL/syz-tmp/newroot/sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,xattr,name=systemd 1306 1303 0:36 / /home/ubuntu/syzkaller.AVk8bL/syz-tmp/newroot/sys/fs/cgroup/perf_event rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,perf_event 1307 1303 0:37 /.. /home/ubuntu/syzkaller.AVk8bL/syz-tmp/newroot/sys/fs/cgroup/blkio rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,blkio 1308 1303 0:38 / /home/ubuntu/syzkaller.AVk8bL/syz-tmp/newroot/sys/fs/cgroup/rdma rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,rdma 1309 1303 0:39 /.. /home/ubuntu/syzkaller.AVk8bL/syz-tmp/newroot/sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,cpu,cpuacct 1310 1303 0:40 / /home/ubuntu/syzkaller.AVk8bL/syz-tmp/newroot/sys/fs/cgroup/net_cls,net_prio rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,net_cls,net_prio 1311 1303 0:41 /.. /home/ubuntu/syzkaller.AVk8bL/syz-tmp/newroot/sys/fs/cgroup/devices rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,devices 1312 1303 0:42 /../../.. /home/ubuntu/syzkaller.AVk8bL/syz-tmp/newroot/sys/fs/cgroup/memory rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,memory 1313 1303 0:43 / /home/ubuntu/syzkaller.AVk8bL/syz-tmp/newroot/sys/fs/cgroup/hugetlb rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,hugetlb 1314 1303 0:44 / /home/ubuntu/syzkaller.AVk8bL/syz-tmp/newroot/sys/fs/cgroup/cpuset rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,cpuset,clone_children 1315 1303 0:45 /../../.. /home/ubuntu/syzkaller.AVk8bL/syz-tmp/newroot/sys/fs/cgroup/pids rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,pids 1316 1303 0:46 /../../.. /home/ubuntu/syzkaller.AVk8bL/syz-tmp/newroot/sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,freezer 1317 1301 0:33 / /home/ubuntu/syzkaller.AVk8bL/syz-tmp/newroot/sys/fs/pstore rw,nosuid,nodev,noexec,relatime - pstore pstore rw 1318 1301 0:34 / /home/ubuntu/syzkaller.AVk8bL/syz-tmp/newroot/sys/firmware/efi/efivars rw,nosuid,nodev,noexec,relatime - efivarfs efivarfs rw 1319 1301 0:35 / /home/ubuntu/syzkaller.AVk8bL/syz-tmp/newroot/sys/fs/bpf rw,nosuid,nodev,noexec,relatime - bpf none rw,mode=700 1320 1301 0:7 / /home/ubuntu/syzkaller.AVk8bL/syz-tmp/newroot/sys/kernel/debug rw,nosuid,nodev,noexec,relatime - debugfs debugfs rw 1321 1301 0:12 / /home/ubuntu/syzkaller.AVk8bL/syz-tmp/newroot/sys/kernel/tracing rw,nosuid,nodev,noexec,relatime - tracefs tracefs rw 1322 1301 0:51 / /home/ubuntu/syzkaller.AVk8bL/syz-tmp/newroot/sys/fs/fuse/connections rw,nosuid,nodev,noexec,relatime - fusectl fusectl rw 1323 1301 0:20 / /home/ubuntu/syzkaller.AVk8bL/syz-tmp/newroot/sys/kernel/config rw,nosuid,nodev,noexec,relatime - configfs configfs rw 1324 1294 0:62 /newroot/foo /home/ubuntu/syzkaller.AVk8bL/syz-tmp/newroot/foo rw,relatime - tmpfs rw 1326 1224 0:66 / /home/ubuntu/syzkaller.Wgqj6W/syz-tmp rw,relatime - tmpfs rw 1327 1326 0:5 / /home/ubuntu/syzkaller.Wgqj6W/syz-tmp/newroot/dev rw,nosuid,noexec,relatime - devtmpfs udev rw,size=302716k,nr_inodes=75679,mode=755 1328 1327 0:26 / /home/ubuntu/syzkaller.Wgqj6W/syz-tmp/newroot/dev/pts rw,nosuid,noexec,relatime - devpts devpts rw,gid=5,mode=620,ptmxmode=000 1329 1327 0:28 / /home/ubuntu/syzkaller.Wgqj6W/syz-tmp/newroot/dev/shm rw,nosuid,nodev - tmpfs tmpfs rw 1330 1327 0:48 / /home/ubuntu/syzkaller.Wgqj6W/syz-tmp/newroot/dev/hugepages rw,relatime - hugetlbfs hugetlbfs rw,pagesize=2M 1331 1327 0:21 / /home/ubuntu/syzkaller.Wgqj6W/syz-tmp/newroot/dev/mqueue rw,nosuid,nodev,noexec,relatime - mqueue mqueue rw 1332 1326 0:67 / /home/ubuntu/syzkaller.Wgqj6W/syz-tmp/newroot/proc rw,relatime - proc none rw 1333 1326 0:24 / /home/ubuntu/syzkaller.Wgqj6W/syz-tmp/newroot/sys rw,nosuid,nodev,noexec,relatime - sysfs sysfs rw 1334 1333 0:6 / /home/ubuntu/syzkaller.Wgqj6W/syz-tmp/newroot/sys/kernel/security rw,nosuid,nodev,noexec,relatime - securityfs securityfs rw 1335 1333 0:30 / /home/ubuntu/syzkaller.Wgqj6W/syz-tmp/newroot/sys/fs/cgroup ro,nosuid,nodev,noexec - tmpfs tmpfs ro,size=4096k,nr_inodes=1024,mode=755 1336 1335 0:31 /../../.. /home/ubuntu/syzkaller.Wgqj6W/syz-tmp/newroot/sys/fs/cgroup/unified rw,nosuid,nodev,noexec,relatime - cgroup2 cgroup2 rw 1337 1335 0:32 /../../.. /home/ubuntu/syzkaller.Wgqj6W/syz-tmp/newroot/sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,xattr,name=systemd 1338 1335 0:36 / /home/ubuntu/syzkaller.Wgqj6W/syz-tmp/newroot/sys/fs/cgroup/perf_event rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,perf_event 1339 1335 0:37 /.. /home/ubuntu/syzkaller.Wgqj6W/syz-tmp/newroot/sys/fs/cgroup/blkio rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,blkio 1340 1335 0:38 / /home/ubuntu/syzkaller.Wgqj6W/syz-tmp/newroot/sys/fs/cgroup/rdma rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,rdma 1341 1335 0:39 /.. /home/ubuntu/syzkaller.Wgqj6W/syz-tmp/newroot/sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,cpu,cpuacct 1342 1335 0:40 / /home/ubuntu/syzkaller.Wgqj6W/syz-tmp/newroot/sys/fs/cgroup/net_cls,net_prio rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,net_cls,net_prio 1343 1335 0:41 /.. /home/ubuntu/syzkaller.Wgqj6W/syz-tmp/newroot/sys/fs/cgroup/devices rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,devices 1344 1335 0:42 /../../.. /home/ubuntu/syzkaller.Wgqj6W/syz-tmp/newroot/sys/fs/cgroup/memory rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,memory 1345 1335 0:43 / /home/ubuntu/syzkaller.Wgqj6W/syz-tmp/newroot/sys/fs/cgroup/hugetlb rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,hugetlb 1346 1335 0:44 / /home/ubuntu/syzkaller.Wgqj6W/syz-tmp/newroot/sys/fs/cgroup/cpuset rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,cpuset,clone_children 1347 1335 0:45 /../../.. /home/ubuntu/syzkaller.Wgqj6W/syz-tmp/newroot/sys/fs/cgroup/pids rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,pids 1348 1335 0:46 /../../.. /home/ubuntu/syzkaller.Wgqj6W/syz-tmp/newroot/sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,freezer 1349 1333 0:33 / /home/ubuntu/syzkaller.Wgqj6W/syz-tmp/newroot/sys/fs/pstore rw,nosuid,nodev,noexec,relatime - pstore pstore rw 1350 1333 0:34 / /home/ubuntu/syzkaller.Wgqj6W/syz-tmp/newroot/sys/firmware/efi/efivars rw,nosuid,nodev,noexec,relatime - efivarfs efivarfs rw 1351 1333 0:35 / /home/ubuntu/syzkaller.Wgqj6W/syz-tmp/newroot/sys/fs/bpf rw,nosuid,nodev,noexec,relatime - bpf none rw,mode=700 1352 1333 0:7 / /home/ubuntu/syzkaller.Wgqj6W/syz-tmp/newroot/sys/kernel/debug rw,nosuid,nodev,noexec,relatime - debugfs debugfs rw 1353 1333 0:12 / /home/ubuntu/syzkaller.Wgqj6W/syz-tmp/newroot/sys/kernel/tracing rw,nosuid,nodev,noexec,relatime - tracefs tracefs rw 1354 1333 0:51 / /home/ubuntu/syzkaller.Wgqj6W/syz-tmp/newroot/sys/fs/fuse/connections rw,nosuid,nodev,noexec,relatime - fusectl fusectl rw 1355 1333 0:20 / /home/ubuntu/syzkaller.Wgqj6W/syz-tmp/newroot/sys/kernel/config rw,nosuid,nodev,noexec,relatime - configfs configfs rw 1358 1326 0:66 /newroot/foo /home/ubuntu/syzkaller.Wgqj6W/syz-tmp/newroot/foo rw,relatime - tmpfs rw OPCODE(18) | fd(-100) | path(./file0) ^ permalink raw reply [flat|nested] 31+ messages in thread
end of thread, other threads:[~2021-04-06 15:37 UTC | newest]
Thread overview: 31+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <[email protected]>
2021-04-01 15:45 ` [syzbot] WARNING in mntput_no_expire (2) Christian Brauner
2021-04-01 16:09 ` Jens Axboe
2021-04-01 17:46 ` Christian Brauner
2021-04-01 17:59 ` Christian Brauner
2021-04-01 19:11 ` Al Viro
2021-04-04 2:34 ` Al Viro
2021-04-04 2:38 ` Al Viro
2021-04-04 11:34 ` Christian Brauner
2021-04-04 15:56 ` Al Viro
2021-04-04 16:40 ` Christian Brauner
2021-04-04 16:44 ` Al Viro
2021-04-04 17:05 ` Christian Brauner
2021-04-04 18:50 ` Al Viro
2021-04-04 20:17 ` Al Viro
2021-04-05 11:44 ` Christian Brauner
2021-04-05 16:18 ` Al Viro
2021-04-05 17:08 ` Christian Brauner
2021-04-05 18:23 ` Al Viro
2021-04-05 18:28 ` Al Viro
2021-04-05 20:07 ` Christian Brauner
2021-04-06 1:38 ` Al Viro
2021-04-06 2:24 ` Al Viro
2021-04-06 12:35 ` Christian Brauner
2021-04-06 13:13 ` Al Viro
2021-04-06 13:22 ` Christian Brauner
2021-04-06 14:15 ` Al Viro
2021-04-06 14:23 ` Al Viro
2021-04-06 15:37 ` Jens Axboe
2021-04-06 14:46 ` Christian Brauner
2021-04-04 16:52 ` Christian Brauner
2021-04-04 16:55 ` Christian Brauner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox