* INFO: task hung in io_uring_cancel_sqpoll
From: Palash Oswal @ 2021-04-30 14:21 UTC
To: Jens Axboe, io-uring, LKML, syzbot+11bf59db879676f59e52

On Thursday, March 18, 2021 at 9:40:21 PM UTC+5:30 syzbot wrote:
>
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit:    0d7588ab riscv: process: Fix no prototype for arch_dup_tas..
> git tree:       git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git fixes
> console output: https://syzkaller.appspot.com/x/log.txt?x=12dde5aed00000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=81c0b708b31626cc
> dashboard link: https://syzkaller.appspot.com/bug?extid=11bf59db879676f59e52
> userspace arch: riscv64
> CC: [[email protected] [email protected] [email protected] [email protected]]
>
> Unfortunately, I don't have any reproducer for this issue yet.
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: [email protected]
>
> INFO: task iou-sqp-4867:4871 blocked for more than 430 seconds.
>       Not tainted 5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:iou-sqp-4867    state:D stack:    0 pid: 4871 ppid:  4259 flags:0x00000004
> Call Trace:
> [<ffffffe003bc3252>] context_switch kernel/sched/core.c:4324 [inline]
> [<ffffffe003bc3252>] __schedule+0x478/0xdec kernel/sched/core.c:5075
> [<ffffffe003bc3c2a>] schedule+0x64/0x166 kernel/sched/core.c:5154
> [<ffffffe0004b80ee>] io_uring_cancel_sqpoll+0x1de/0x294 fs/io_uring.c:8858
> [<ffffffe0004c19cc>] io_sq_thread+0x548/0xe58 fs/io_uring.c:6750
> [<ffffffe000005570>] ret_from_exception+0x0/0x14
>
> Showing all locks held in the system:
> 3 locks held by kworker/u4:0/7:
> 3 locks held by kworker/1:0/19:
> 1 lock held by khungtaskd/1556:
>  #0: ffffffe00592b5e8 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x32/0x1fa kernel/locking/lockdep.c:6329
> 1 lock held by klogd/3947:
> 2 locks held by getty/4142:
>  #0: ffffffe00f1aa098 (&tty->ldisc_sem){++++}-{0:0}, at: ldsem_down_read+0x3c/0x48 drivers/tty/tty_ldsem.c:340
>  #1: ffffffd010b6f2e8 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0x9ac/0xb08 drivers/tty/n_tty.c:2178
> 2 locks held by kworker/0:1/4375:
>
> =============================================
>
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at [email protected].
>
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
Attaching a C reproducer for this bug:

#define _GNU_SOURCE

#include <fcntl.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static uint64_t current_time_ms(void)
{
	struct timespec ts;
	if (clock_gettime(CLOCK_MONOTONIC, &ts))
		exit(1);
	return (uint64_t)ts.tv_sec * 1000 + (uint64_t)ts.tv_nsec / 1000000;
}

#define SIZEOF_IO_URING_SQE 64
#define SIZEOF_IO_URING_CQE 16
#define SQ_TAIL_OFFSET 64
#define SQ_RING_MASK_OFFSET 256
#define SQ_RING_ENTRIES_OFFSET 264
#define CQ_RING_ENTRIES_OFFSET 268
#define CQ_CQES_OFFSET 320

struct io_sqring_offsets {
	uint32_t head;
	uint32_t tail;
	uint32_t ring_mask;
	uint32_t ring_entries;
	uint32_t flags;
	uint32_t dropped;
	uint32_t array;
	uint32_t resv1;
	uint64_t resv2;
};

struct io_cqring_offsets {
	uint32_t head;
	uint32_t tail;
	uint32_t ring_mask;
	uint32_t ring_entries;
	uint32_t overflow;
	uint32_t cqes;
	uint64_t resv[2];
};

struct io_uring_params {
	uint32_t sq_entries;
	uint32_t cq_entries;
	uint32_t flags;
	uint32_t sq_thread_cpu;
	uint32_t sq_thread_idle;
	uint32_t features;
	uint32_t resv[4];
	struct io_sqring_offsets sq_off;
	struct io_cqring_offsets cq_off;
};

#define IORING_OFF_SQ_RING 0
#define IORING_OFF_SQES 0x10000000ULL

#define sys_io_uring_setup 425
static long syz_io_uring_setup(volatile long a0, volatile long a1,
			       volatile long a2, volatile long a3,
			       volatile long a4, volatile long a5)
{
	uint32_t entries = (uint32_t)a0;
	struct io_uring_params* setup_params = (struct io_uring_params*)a1;
	void* vma1 = (void*)a2;
	void* vma2 = (void*)a3;
	void** ring_ptr_out = (void**)a4;
	void** sqes_ptr_out = (void**)a5;
	uint32_t fd_io_uring = syscall(sys_io_uring_setup, entries, setup_params);
	uint32_t sq_ring_sz = setup_params->sq_off.array +
			      setup_params->sq_entries * sizeof(uint32_t);
	uint32_t cq_ring_sz = setup_params->cq_off.cqes +
			      setup_params->cq_entries * SIZEOF_IO_URING_CQE;
	uint32_t ring_sz = sq_ring_sz > cq_ring_sz ? sq_ring_sz : cq_ring_sz;
	*ring_ptr_out = mmap(vma1, ring_sz, PROT_READ | PROT_WRITE,
			     MAP_SHARED | MAP_POPULATE | MAP_FIXED, fd_io_uring,
			     IORING_OFF_SQ_RING);
	uint32_t sqes_sz = setup_params->sq_entries * SIZEOF_IO_URING_SQE;
	*sqes_ptr_out = mmap(vma2, sqes_sz, PROT_READ | PROT_WRITE,
			     MAP_SHARED | MAP_POPULATE | MAP_FIXED, fd_io_uring,
			     IORING_OFF_SQES);
	return fd_io_uring;
}

static long syz_io_uring_submit(volatile long a0, volatile long a1,
				volatile long a2, volatile long a3)
{
	char* ring_ptr = (char*)a0;
	char* sqes_ptr = (char*)a1;
	char* sqe = (char*)a2;
	uint32_t sqes_index = (uint32_t)a3;
	uint32_t sq_ring_entries = *(uint32_t*)(ring_ptr + SQ_RING_ENTRIES_OFFSET);
	uint32_t cq_ring_entries = *(uint32_t*)(ring_ptr + CQ_RING_ENTRIES_OFFSET);
	uint32_t sq_array_off = (CQ_CQES_OFFSET +
				 cq_ring_entries * SIZEOF_IO_URING_CQE + 63) & ~63;
	if (sq_ring_entries)
		sqes_index %= sq_ring_entries;
	char* sqe_dest = sqes_ptr + sqes_index * SIZEOF_IO_URING_SQE;
	memcpy(sqe_dest, sqe, SIZEOF_IO_URING_SQE);
	uint32_t sq_ring_mask = *(uint32_t*)(ring_ptr + SQ_RING_MASK_OFFSET);
	uint32_t* sq_tail_ptr = (uint32_t*)(ring_ptr + SQ_TAIL_OFFSET);
	uint32_t sq_tail = *sq_tail_ptr & sq_ring_mask;
	uint32_t sq_tail_next = *sq_tail_ptr + 1;
	uint32_t* sq_array = (uint32_t*)(ring_ptr + sq_array_off);
	*(sq_array + sq_tail) = sqes_index;
	__atomic_store_n(sq_tail_ptr, sq_tail_next, __ATOMIC_RELEASE);
	return 0;
}

static void kill_and_wait(int pid, int* status)
{
	kill(-pid, SIGKILL);
	kill(pid, SIGKILL);
	while (waitpid(-1, status, __WALL) != pid) {
	}
}

#define WAIT_FLAGS __WALL

uint64_t r[3] = {0xffffffffffffffff, 0x0, 0x0};

void trigger_bug(void)
{
	intptr_t res = 0;
	*(uint32_t*)0x20000204 = 0;
	*(uint32_t*)0x20000208 = 2;
	*(uint32_t*)0x2000020c = 0;
	*(uint32_t*)0x20000210 = 0;
	*(uint32_t*)0x20000218 = -1;
	memset((void*)0x2000021c, 0, 12);
	res = -1;
	res = syz_io_uring_setup(0x7987, 0x20000200, 0x20400000, 0x20ffd000,
				 0x200000c0, 0x200001c0);
	if (res != -1) {
		r[0] = res;
		r[1] = *(uint64_t*)0x200000c0;
		r[2] = *(uint64_t*)0x200001c0;
	}
	*(uint8_t*)0x20000180 = 0xb;
	*(uint8_t*)0x20000181 = 1;
	*(uint16_t*)0x20000182 = 0;
	*(uint32_t*)0x20000184 = 0;
	*(uint64_t*)0x20000188 = 4;
	*(uint64_t*)0x20000190 = 0x20000140;
	*(uint64_t*)0x20000140 = 0x77359400;
	*(uint64_t*)0x20000148 = 0;
	*(uint32_t*)0x20000198 = 1;
	*(uint32_t*)0x2000019c = 0;
	*(uint64_t*)0x200001a0 = 0;
	*(uint16_t*)0x200001a8 = 0;
	*(uint16_t*)0x200001aa = 0;
	memset((void*)0x200001ac, 0, 20);
	syz_io_uring_submit(r[1], r[2], 0x20000180, 1);
	*(uint32_t*)0x20000544 = 0;
	*(uint32_t*)0x20000548 = 0x36;
	*(uint32_t*)0x2000054c = 0;
	*(uint32_t*)0x20000550 = 0;
	*(uint32_t*)0x20000558 = r[0];
	memset((void*)0x2000055c, 0, 12);
	syz_io_uring_setup(0x4bf1, 0x20000540, 0x20ffd000, 0x20ffc000, 0, 0);
}

int main(void)
{
	syscall(__NR_mmap, 0x20000000ul, 0x1000000ul, 7ul, 0x32ul, -1, 0ul);
	int pid = fork();
	if (pid < 0)
		exit(1);
	if (pid == 0) {
		trigger_bug();
		exit(0);
	}
	int status = 0;
	uint64_t start = current_time_ms();
	for (;;) {
		if (current_time_ms() - start < 1000) {
			continue;
		}
		kill_and_wait(pid, &status);
		break;
	}
	return 0;
}
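[Editor's note: the mmap sizing and SQ-array offset arithmetic in the reproducer's syz_io_uring_setup()/syz_io_uring_submit() helpers can be sanity-checked in isolation. A minimal Python sketch of that arithmetic follows; the hard-coded offsets mirror the reproducer's constants, which match that specific kernel's ring layout rather than anything the uapi guarantees.]

```python
SIZEOF_IO_URING_CQE = 16   # matches the reproducer's constant
CQ_CQES_OFFSET = 320       # offset of the CQE array in this kernel's layout

def ring_mmap_size(sq_array_off, sq_entries, cq_cqes_off, cq_entries):
    # SQ and CQ rings share one mapping; the reproducer maps the larger of the two.
    sq_ring_sz = sq_array_off + sq_entries * 4  # one u32 per SQ-array slot
    cq_ring_sz = cq_cqes_off + cq_entries * SIZEOF_IO_URING_CQE
    return max(sq_ring_sz, cq_ring_sz)

def sq_array_offset(cq_entries):
    # The SQ index array follows the CQEs, rounded up to a 64-byte boundary,
    # exactly as syz_io_uring_submit() computes sq_array_off.
    return (CQ_CQES_OFFSET + cq_entries * SIZEOF_IO_URING_CQE + 63) & ~63

print(sq_array_offset(1))          # → 384 (320 + 16, rounded up to 64 bytes)
print(ring_mmap_size(384, 4, 320, 8))  # → 448 (CQ ring is the larger one here)
```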
* Re: INFO: task hung in io_uring_cancel_sqpoll
From: Pavel Begunkov @ 2021-04-30 14:33 UTC
To: Palash Oswal, Jens Axboe, io-uring, LKML, syzbot+11bf59db879676f59e52

On 4/30/21 3:21 PM, Palash Oswal wrote:
> On Thursday, March 18, 2021 at 9:40:21 PM UTC+5:30 syzbot wrote:
>>
>> Hello,
>>
>> syzbot found the following issue on:
>>
>> HEAD commit:    0d7588ab riscv: process: Fix no prototype for arch_dup_tas..
>> git tree:       git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git fixes
>> console output: https://syzkaller.appspot.com/x/log.txt?x=12dde5aed00000
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=81c0b708b31626cc
>> dashboard link: https://syzkaller.appspot.com/bug?extid=11bf59db879676f59e52
>> userspace arch: riscv64
>>
>> Unfortunately, I don't have any reproducer for this issue yet.

There were so many fixes in 5.12 after this revision, including sqpoll
cancellation related ones... Can you try something more up-to-date? Like
the released 5.12 or for-next.

>> [rest of the syzbot report snipped]
> [C reproducer from the previous message quoted in full; snipped]

--
Pavel Begunkov
* Re: INFO: task hung in io_uring_cancel_sqpoll
From: Palash Oswal @ 2021-04-30 15:02 UTC
To: Pavel Begunkov; +Cc: Jens Axboe, io-uring, LKML, syzbot+11bf59db879676f59e52

On Fri, Apr 30, 2021 at 8:03 PM Pavel Begunkov <[email protected]> wrote:
>
> On 4/30/21 3:21 PM, Palash Oswal wrote:
> > [syzbot report snipped]
>
> There was so many fixes in 5.12 after this revision, including sqpoll
> cancellation related... Can you try something more up-to-date? Like
> released 5.12 or for-next
>

The reproducer works for 5.12.

I tested against the HEAD b1ef997bec4d5cf251bfb5e47f7b04afa49bcdfe
commit on the for-next tree
https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git/?h=for-next
and the reproducer fails.

Palash
* Re: INFO: task hung in io_uring_cancel_sqpoll
From: Pavel Begunkov @ 2021-04-30 15:07 UTC
To: Palash Oswal; +Cc: Jens Axboe, io-uring, LKML, syzbot+11bf59db879676f59e52

On 4/30/21 4:02 PM, Palash Oswal wrote:
> On Fri, Apr 30, 2021 at 8:03 PM Pavel Begunkov <[email protected]> wrote:
>>
>> [syzbot report snipped]
>>
>> There was so many fixes in 5.12 after this revision, including sqpoll
>> cancellation related... Can you try something more up-to-date? Like
>> released 5.12 or for-next
>>
>
> The reproducer works for 5.12.

Ok, any chance you have a syz repro as well? It's easier to read.

>
> I tested against the HEAD b1ef997bec4d5cf251bfb5e47f7b04afa49bcdfe
> commit on for-next tree
> https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git/?h=for-next
> and the reproducer fails.

--
Pavel Begunkov
* Re: INFO: task hung in io_uring_cancel_sqpoll
From: Palash Oswal @ 2021-04-30 15:11 UTC
To: Pavel Begunkov; +Cc: Jens Axboe, io-uring, LKML, syzbot+11bf59db879676f59e52

On Fri, Apr 30, 2021 at 8:37 PM Pavel Begunkov <[email protected]> wrote:
>
> [earlier discussion snipped]
>
> Ok, any chance you have syz repro as well? it's easier to read
>

Syzkaller reproducer:

# {Threaded:false Collide:false Repeat:true RepeatTimes:0 Procs:1 Slowdown:1 Sandbox: Fault:false FaultCall:-1 FaultNth:0 Leak:false NetInjection:false NetDevices:false NetReset:false Cgroups:false BinfmtMisc:false CloseFDs:false KCSAN:false DevlinkPCI:false USB:false VhciInjection:false Wifi:false IEEE802154:false Sysctl:false UseTmpDir:false HandleSegv:false Repro:false Trace:false}
r0 = syz_io_uring_setup(0x7987, &(0x7f0000000200)={0x0, 0x0, 0x2}, &(0x7f0000400000/0xc00000)=nil, &(0x7f0000ffd000/0x3000)=nil, &(0x7f00000000c0)=<r1=>0x0, &(0x7f00000001c0)=<r2=>0x0)
syz_io_uring_submit(r1, r2, &(0x7f0000000180)=@IORING_OP_TIMEOUT={0xb, 0x1, 0x0, 0x0, 0x4, &(0x7f0000000140)={0x77359400}}, 0x1)
syz_io_uring_setup(0x4bf1, &(0x7f0000000540)={0x0, 0x0, 0x36, 0x0, 0x0, 0x0, r0}, &(0x7f0000ffd000/0x2000)=nil, &(0x7f0000ffc000/0x1000)=nil, 0x0, 0x0)
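[Editor's note: the flags word 0x36 in the final syz_io_uring_setup() call can be decoded against the io_uring uapi constants. A small annotation sketch follows; the constant values are copied from include/uapi/linux/io_uring.h (as of v5.12), and the decoding itself is mine, not part of the repro.]

```python
# io_uring setup flag bits, from include/uapi/linux/io_uring.h (v5.12).
IORING_SETUP_FLAGS = {
    1 << 0: "IORING_SETUP_IOPOLL",
    1 << 1: "IORING_SETUP_SQPOLL",
    1 << 2: "IORING_SETUP_SQ_AFF",
    1 << 3: "IORING_SETUP_CQSIZE",
    1 << 4: "IORING_SETUP_CLAMP",
    1 << 5: "IORING_SETUP_ATTACH_WQ",
}

def decode_setup_flags(flags):
    # Return the names of all set flag bits, lowest bit first.
    return [name for bit, name in IORING_SETUP_FLAGS.items() if flags & bit]

print(decode_setup_flags(0x36))
# → ['IORING_SETUP_SQPOLL', 'IORING_SETUP_SQ_AFF',
#    'IORING_SETUP_CLAMP', 'IORING_SETUP_ATTACH_WQ']
```

So the second ring is an SQPOLL ring that attaches to the first ring's workqueue (ATTACH_WQ with wq_fd = r0), while the first ring already holds a pending IORING_OP_TIMEOUT (opcode 0xb) whose timespec value 0x77359400 (2,000,000,000) appears to leave that request in flight more or less indefinitely when SQPOLL cancellation runs.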
* Re: INFO: task hung in io_uring_cancel_sqpoll
From: Pavel Begunkov @ 2021-04-30 18:34 UTC
To: Palash Oswal; +Cc: Jens Axboe, io-uring, LKML, syzbot+11bf59db879676f59e52

On 4/30/21 4:02 PM, Palash Oswal wrote:
> [syzbot report snipped]
>
> The reproducer works for 5.12.
>
> I tested against the HEAD b1ef997bec4d5cf251bfb5e47f7b04afa49bcdfe
> commit on for-next tree
> https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git/?h=for-next
> and the reproducer fails.

Can't reproduce. Does it hang as in the original report's dmesg?
Can you paste logs?

#syz test: git://git.kernel.dk/linux-block io_uring-5.13

--
Pavel Begunkov
* Re: INFO: task hung in io_uring_cancel_sqpoll
From: syzbot @ 2021-04-30 18:34 UTC
To: Pavel Begunkov
Cc: asml.silence, axboe, io-uring, linux-kernel, oswalpalash, syzkaller-upstream-moderation

> [quoted discussion snipped]
>
> Can't reproduce. Does it hang as in the original's report dmesg?
> Can you paste logs?
>
> #syz test: git://git.kernel.dk/linux-block io_uring-5.13

This crash does not have a reproducer. I cannot test it.
* Re: INFO: task hung in io_uring_cancel_sqpoll
From: Pavel Begunkov @ 2021-04-30 21:04 UTC
To: Palash Oswal; +Cc: Jens Axboe, io-uring, LKML, syzbot+11bf59db879676f59e52

On 4/30/21 7:34 PM, Pavel Begunkov wrote:
> [quoted discussion snipped]
>
> Can't reproduce. Does it hang as in the original's report dmesg?
> Can you paste logs?

and `uname -r` if you could

>
> #syz test: git://git.kernel.dk/linux-block io_uring-5.13

--
Pavel Begunkov
* Re: INFO: task hung in io_uring_cancel_sqpoll
From: Palash Oswal @ 2021-05-01 4:51 UTC
To: Pavel Begunkov; +Cc: Jens Axboe, io-uring, LKML, syzbot+11bf59db879676f59e52

On Sat, May 1, 2021 at 2:35 AM Pavel Begunkov <[email protected]> wrote:
>
> On 4/30/21 7:34 PM, Pavel Begunkov wrote:
> > [quoted discussion snipped]
> >
> > Can't reproduce. Does it hang as in the original's report dmesg?
> > Can you paste logs?
>
> and `uname -r` if you could

root@syzkaller:~# echo 30 > /proc/sys/kernel/hung_task_timeout_secs
root@syzkaller:~# uname -a
Linux syzkaller 5.12.0 #112 SMP Sat May 1 10:13:41 IST 2021 x86_64 GNU/Linux
root@syzkaller:~# ./repro
[   70.412424] repro[365]: segfault at 0 ip 0000556d88201005 sp 00007ffc7ddf2cd0 error 6 in repro[556d8]
[   70.417215] Code: cc 8b 75 c0 48 8b 45 e8 41 b9 00 00 00 00 41 89 d0 b9 11 80 00 00 ba 03 00 00 00 48
[  121.593305] INFO: task iou-sqp-365:366 blocked for more than 30 seconds.
[  121.594448]       Not tainted 5.12.0 #112
[  121.595072] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  121.596250] task:iou-sqp-365     state:D stack:    0 pid:  366 ppid:   364 flags:0x00004004
[  121.597514] Call Trace:
[  121.598019]  __schedule+0xb1d/0x1130
[  121.598774]  ? __sched_text_start+0x8/0x8
[  121.599580]  ? io_wq_worker_sleeping+0x145/0x500
[  121.600442]  schedule+0x131/0x1c0
[  121.600902]  io_uring_cancel_sqpoll+0x288/0x350
[  121.601571]  ? io_sq_thread_unpark+0xd0/0xd0
[  121.602410]  ? mutex_lock+0xbb/0x130
[  121.603027]  ? init_wait_entry+0xe0/0xe0
[  121.603573]  ? wait_for_completion_killable_timeout+0x20/0x20
[  121.604454]  io_sq_thread+0x174c/0x18c0
[  121.605014]  ? io_rsrc_put_work+0x380/0x380
[  121.605652]  ? init_wait_entry+0xe0/0xe0
[  121.606428]  ? _raw_spin_lock_irq+0xa5/0x180
[  121.607262]  ? _raw_spin_lock_irqsave+0x190/0x190
[  121.608005]  ? calculate_sigpending+0x6b/0xa0
[  121.608636]  ? io_rsrc_put_work+0x380/0x380
[  121.609301]  ret_from_fork+0x22/0x30
root@syzkaller:~# ps
  PID TTY          TIME CMD
  294 ttyS0    00:00:00 login
  357 ttyS0    00:00:00 bash
  365 ttyS0    00:00:00 repro
  370 ttyS0    00:00:00 ps
root@syzkaller:~# kill -9 365
root@syzkaller:~#
[  305.888970] INFO: task iou-sqp-365:366 blocked for more than 215 seconds.
[  305.893275]       Not tainted 5.12.0 #112
[  305.895507] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  305.899685] task:iou-sqp-365     state:D stack:    0 pid:  366 ppid:     1 flags:0x00004004
[  305.904071] Call Trace:
[  305.905616]  __schedule+0xb1d/0x1130
[  305.907660]  ? __sched_text_start+0x8/0x8
[  305.910314]  ? io_wq_worker_sleeping+0x145/0x500
[  305.913328]  schedule+0x131/0x1c0
[  305.914562]  io_uring_cancel_sqpoll+0x288/0x350
[  305.916513]  ? io_sq_thread_unpark+0xd0/0xd0
[  305.918346]  ? mutex_lock+0xbb/0x130
[  305.919616]  ? init_wait_entry+0xe0/0xe0
[  305.920896]  ? wait_for_completion_killable_timeout+0x20/0x20
[  305.922805]  io_sq_thread+0x174c/0x18c0
[  305.923876]  ? io_rsrc_put_work+0x380/0x380
[  305.924748]  ? init_wait_entry+0xe0/0xe0
[  305.925523]  ? _raw_spin_lock_irq+0xa5/0x180
[  305.926353]  ? _raw_spin_lock_irqsave+0x190/0x190
[  305.927436]  ? calculate_sigpending+0x6b/0xa0
[  305.928266]  ? io_rsrc_put_work+0x380/0x380
[  305.929104]  ret_from_fork+0x22/0x30

The trace from my syzkaller instance:

Syzkaller hit 'INFO: task hung in io_uring_cancel_sqpoll' bug.

syz-executor198[307]: segfault at 0 ip 00000000004020f3 sp 00007ffd21853620 error 6 in syz-executor198379463[401000+96000]
Code: 89 d8 8d 34 90 8b 45 04 ba 03 00 00 00 c1 e0 04 03 45 64 39 c6 48 0f 42 f0 45 31 c9 e8 96 8a 04 00 8b 75 00 41 89 d8 4c 89 ef <49> 89 06 41 b9 00 00 00 10 b9 11 80 00 00 ba 03 00 00 00 c1 e6 06
INFO: task iou-sqp-307:308 blocked for more than 120 seconds.
      Not tainted 5.12.0 #2
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:iou-sqp-307     state:D stack:    0 pid:  308 ppid:   306 flags:0x00004004
Call Trace:
 context_switch kernel/sched/core.c:4322 [inline]
 __schedule+0x75f/0xa10 kernel/sched/core.c:5073
 schedule+0xb7/0x110 kernel/sched/core.c:5152
 io_uring_cancel_sqpoll+0x1c6/0x290 fs/io_uring.c:9018
 io_sq_thread+0xf8c/0x1080 fs/io_uring.c:6836
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
* Re: INFO: task hung in io_uring_cancel_sqpoll 2021-05-01 4:51 ` Palash Oswal @ 2021-05-01 7:39 ` Pavel Begunkov 0 siblings, 0 replies; 20+ messages in thread From: Pavel Begunkov @ 2021-05-01 7:39 UTC (permalink / raw) To: Palash Oswal; +Cc: Jens Axboe, io-uring, LKML, syzbot+11bf59db879676f59e52 On 5/1/21 5:51 AM, Palash Oswal wrote: > On Sat, May 1, 2021 at 2:35 AM Pavel Begunkov <[email protected]> wrote: >> >> On 4/30/21 7:34 PM, Pavel Begunkov wrote: >>> On 4/30/21 4:02 PM, Palash Oswal wrote: >>>> On Fri, Apr 30, 2021 at 8:03 PM Pavel Begunkov <[email protected]> wrote: >>>>> >>>>> On 4/30/21 3:21 PM, Palash Oswal wrote: >>>>>> On Thursday, March 18, 2021 at 9:40:21 PM UTC+5:30 syzbot wrote: >>>>>>> >>>>>>> Hello, >>>>>>> >>>>>>> syzbot found the following issue on: >>>>>>> >>>>>>> HEAD commit: 0d7588ab riscv: process: Fix no prototype for arch_dup_tas.. >>>>>>> git tree: git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git fixes >>>>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=12dde5aed00000 >>>>>>> kernel config: https://syzkaller.appspot.com/x/.config?x=81c0b708b31626cc >>>>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=11bf59db879676f59e52 >>>>>>> userspace arch: riscv64 >>>>>>> CC: [[email protected] [email protected] [email protected] [email protected]] >>>>>>> >>>>>>> Unfortunately, I don't have any reproducer for this issue yet. >>>>> >>>>> There was so many fixes in 5.12 after this revision, including sqpoll >>>>> cancellation related... Can you try something more up-to-date? Like >>>>> released 5.12 or for-next >>>>> >>>> >>>> The reproducer works for 5.12. >>>> >>>> I tested against the HEAD b1ef997bec4d5cf251bfb5e47f7b04afa49bcdfe >>>> commit on for-next tree >>>> https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git/?h=for-next >>>> and the reproducer fails. >>> >>> Can't reproduce. Does it hang as in the original's report dmesg? >>> Can you paste logs? 
>> >> and `uname -r` if you could Great, thanks. I'll take a look later > > root@syzkaller:~# echo 30 > /proc/sys/kernel/hung_task_timeout_secs > root@syzkaller:~# uname -a > Linux syzkaller 5.12.0 #112 SMP Sat May 1 10:13:41 IST 2021 x86_64 GNU/Linux > root@syzkaller:~# ./repro > [ 70.412424] repro[365]: segfault at 0 ip 0000556d88201005 sp > 00007ffc7ddf2cd0 error 6 in repro[556d8] > [ 70.417215] Code: cc 8b 75 c0 48 8b 45 e8 41 b9 00 00 00 00 41 89 > d0 b9 11 80 00 00 ba 03 00 00 00 48 > [ 121.593305] INFO: task iou-sqp-365:366 blocked for more than 30 seconds. > [ 121.594448] Not tainted 5.12.0 #112 > [ 121.595072] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [ 121.596250] task:iou-sqp-365 state:D stack: 0 pid: 366 > ppid: 364 flags:0x00004004 > [ 121.597514] Call Trace: > [ 121.598019] __schedule+0xb1d/0x1130 > [ 121.598774] ? __sched_text_start+0x8/0x8 > [ 121.599580] ? io_wq_worker_sleeping+0x145/0x500 > [ 121.600442] schedule+0x131/0x1c0 > [ 121.600902] io_uring_cancel_sqpoll+0x288/0x350 > [ 121.601571] ? io_sq_thread_unpark+0xd0/0xd0 > [ 121.602410] ? mutex_lock+0xbb/0x130 > [ 121.603027] ? init_wait_entry+0xe0/0xe0 > [ 121.603573] ? wait_for_completion_killable_timeout+0x20/0x20 > [ 121.604454] io_sq_thread+0x174c/0x18c0 > [ 121.605014] ? io_rsrc_put_work+0x380/0x380 > [ 121.605652] ? init_wait_entry+0xe0/0xe0 > [ 121.606428] ? _raw_spin_lock_irq+0xa5/0x180 > [ 121.607262] ? _raw_spin_lock_irqsave+0x190/0x190 > [ 121.608005] ? calculate_sigpending+0x6b/0xa0 > [ 121.608636] ? io_rsrc_put_work+0x380/0x380 > [ 121.609301] ret_from_fork+0x22/0x30 > > > root@syzkaller:~# ps > PID TTY TIME CMD > 294 ttyS0 00:00:00 login > 357 ttyS0 00:00:00 bash > 365 ttyS0 00:00:00 repro > 370 ttyS0 00:00:00 ps > root@syzkaller:~# kill -9 365 > root@syzkaller:~# [ 305.888970] INFO: task iou-sqp-365:366 blocked > for more than 215 seconds. 
> [ 305.893275] Not tainted 5.12.0 #112 > [ 305.895507] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [ 305.899685] task:iou-sqp-365 state:D stack: 0 pid: 366 > ppid: 1 flags:0x00004004 > [ 305.904071] Call Trace: > [ 305.905616] __schedule+0xb1d/0x1130 > [ 305.907660] ? __sched_text_start+0x8/0x8 > [ 305.910314] ? io_wq_worker_sleeping+0x145/0x500 > [ 305.913328] schedule+0x131/0x1c0 > [ 305.914562] io_uring_cancel_sqpoll+0x288/0x350 > [ 305.916513] ? io_sq_thread_unpark+0xd0/0xd0 > [ 305.918346] ? mutex_lock+0xbb/0x130 > [ 305.919616] ? init_wait_entry+0xe0/0xe0 > [ 305.920896] ? wait_for_completion_killable_timeout+0x20/0x20 > [ 305.922805] io_sq_thread+0x174c/0x18c0 > [ 305.923876] ? io_rsrc_put_work+0x380/0x380 > [ 305.924748] ? init_wait_entry+0xe0/0xe0 > [ 305.925523] ? _raw_spin_lock_irq+0xa5/0x180 > [ 305.926353] ? _raw_spin_lock_irqsave+0x190/0x190 > [ 305.927436] ? calculate_sigpending+0x6b/0xa0 > [ 305.928266] ? io_rsrc_put_work+0x380/0x380 > [ 305.929104] ret_from_fork+0x22/0x30 > > The trace from my syzkaller instance: > > Syzkaller hit 'INFO: task hung in io_uring_cancel_sqpoll' bug. > > syz-executor198[307]: segfault at 0 ip 00000000004020f3 sp > 00007ffd21853620 error 6 in syz-executor198379463[401000+96000] > Code: 89 d8 8d 34 90 8b 45 04 ba 03 00 00 00 c1 e0 04 03 45 64 39 c6 > 48 0f 42 f0 45 31 c9 e8 96 8a 04 00 8b 75 00 41 89 d8 4c 89 ef <49> 89 > 06 41 b9 00 00 00 10 b9 11 80 00 00 ba 03 00 00 00 c1 e6 06 > INFO: task iou-sqp-307:308 blocked for more than 120 seconds. > Not tainted 5.12.0 #2 > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
> task:iou-sqp-307 state:D stack: 0 pid: 308 ppid: 306 flags:0x00004004 > Call Trace: > context_switch kernel/sched/core.c:4322 [inline] > __schedule+0x75f/0xa10 kernel/sched/core.c:5073 > schedule+0xb7/0x110 kernel/sched/core.c:5152 > io_uring_cancel_sqpoll+0x1c6/0x290 fs/io_uring.c:9018 > io_sq_thread+0xf8c/0x1080 fs/io_uring.c:6836 > ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294 > -- Pavel Begunkov ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: INFO: task hung in io_uring_cancel_sqpoll 2021-04-30 15:02 ` Palash Oswal 2021-04-30 15:07 ` Pavel Begunkov 2021-04-30 18:34 ` Pavel Begunkov @ 2021-05-02 10:33 ` Pavel Begunkov 2021-05-02 15:01 ` Palash Oswal 2 siblings, 1 reply; 20+ messages in thread From: Pavel Begunkov @ 2021-05-02 10:33 UTC (permalink / raw) To: Palash Oswal; +Cc: Jens Axboe, io-uring, LKML, syzbot+11bf59db879676f59e52 On 4/30/21 4:02 PM, Palash Oswal wrote: > On Fri, Apr 30, 2021 at 8:03 PM Pavel Begunkov <[email protected]> wrote: >> >> On 4/30/21 3:21 PM, Palash Oswal wrote: >>> On Thursday, March 18, 2021 at 9:40:21 PM UTC+5:30 syzbot wrote: >>>> >>>> Hello, >>>> >>>> syzbot found the following issue on: >>>> >>>> HEAD commit: 0d7588ab riscv: process: Fix no prototype for arch_dup_tas.. >>>> git tree: git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git fixes >>>> console output: https://syzkaller.appspot.com/x/log.txt?x=12dde5aed00000 >>>> kernel config: https://syzkaller.appspot.com/x/.config?x=81c0b708b31626cc >>>> dashboard link: https://syzkaller.appspot.com/bug?extid=11bf59db879676f59e52 >>>> userspace arch: riscv64 >>>> CC: [[email protected] [email protected] [email protected] [email protected]] >>>> >>>> Unfortunately, I don't have any reproducer for this issue yet. >> >> There was so many fixes in 5.12 after this revision, including sqpoll >> cancellation related... Can you try something more up-to-date? Like >> released 5.12 or for-next >> > > The reproducer works for 5.12. Tried 5.12, [ 245.467397] INFO: task iou-sqp-2018:2019 blocked for more than 122 seconds. [ 245.467424] Call Trace: [ 245.467432] __schedule+0x365/0x960 [ 245.467444] schedule+0x68/0xe0 [ 245.467450] io_uring_cancel_sqpoll+0xdb/0x110 [ 245.467461] ? wait_woken+0x80/0x80 [ 245.467472] io_sq_thread+0x1c6/0x6c0 [ 245.467482] ? wait_woken+0x80/0x80 [ 245.467491] ? finish_task_switch.isra.0+0xca/0x2e0 [ 245.467497] ? 
io_wq_submit_work+0x140/0x140
[  245.467506]  ret_from_fork+0x22/0x30
[  245.467520] INFO: task iou-sqp-2052:2053 blocked for more than 122 seconds.
[  245.467536] Call Trace:
[  245.467539]  __schedule+0x365/0x960
[  245.467545]  schedule+0x68/0xe0
[  245.467550]  io_uring_cancel_sqpoll+0xdb/0x110
[  245.467559]  ? wait_woken+0x80/0x80
[  245.467568]  io_sq_thread+0x1c6/0x6c0
[  245.467577]  ? wait_woken+0x80/0x80
[  245.467586]  ? finish_task_switch.isra.0+0xca/0x2e0
[  245.467591]  ? io_wq_submit_work+0x140/0x140

>
> I tested against the HEAD b1ef997bec4d5cf251bfb5e47f7b04afa49bcdfe

However, there are a bunch of patches fixing sqpoll cancellations in
5.13, all waiting to be backported, and for-next doesn't trigger the
issue for me.

Are you absolutely sure b1ef997bec4d5cf251bfb5e47f7b04afa49bcdfe
does hit it?

> commit on for-next tree
> https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git/?h=for-next
> and the reproducer fails.

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: INFO: task hung in io_uring_cancel_sqpoll 2021-05-02 10:33 ` Pavel Begunkov @ 2021-05-02 15:01 ` Palash Oswal 2021-05-02 15:56 ` Pavel Begunkov 0 siblings, 1 reply; 20+ messages in thread From: Palash Oswal @ 2021-05-02 15:01 UTC (permalink / raw) To: Pavel Begunkov, Hillf Danton Cc: Jens Axboe, io-uring, LKML, syzbot+11bf59db879676f59e52 On Sun, May 2, 2021 at 4:04 PM Pavel Begunkov <[email protected]> wrote: > > I tested against the HEAD b1ef997bec4d5cf251bfb5e47f7b04afa49bcdfe > > However, there is a bunch patches fixing sqpoll cancellations in > 5.13, all are waiting for backporting. and for-next doesn't trigger > the issue for me. > > Are you absolutely sure b1ef997bec4d5cf251bfb5e47f7b04afa49bcdfe > does hit it? > > > commit on for-next tree > > https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git/?h=for-next > > and the reproducer fails. Hi Pavel and Hillf, I believe there's a little misunderstanding, apologies. The reproducer does not trigger the bug on the for-next tree which has patches for 5.13. The reproducer process exits correctly. Likely one of those commits that will be back-ported to 5.12 will address this issue. When I wrote `the reproducer fails`, I meant to indicate that the bug is not triggered on for-next. I will word it better next time! Palash ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: INFO: task hung in io_uring_cancel_sqpoll 2021-05-02 15:01 ` Palash Oswal @ 2021-05-02 15:56 ` Pavel Begunkov 0 siblings, 0 replies; 20+ messages in thread From: Pavel Begunkov @ 2021-05-02 15:56 UTC (permalink / raw) To: Palash Oswal, Hillf Danton Cc: Jens Axboe, io-uring, LKML, syzbot+11bf59db879676f59e52 On 5/2/21 4:01 PM, Palash Oswal wrote: > On Sun, May 2, 2021 at 4:04 PM Pavel Begunkov <[email protected]> wrote: > >>> I tested against the HEAD b1ef997bec4d5cf251bfb5e47f7b04afa49bcdfe >> >> However, there is a bunch patches fixing sqpoll cancellations in >> 5.13, all are waiting for backporting. and for-next doesn't trigger >> the issue for me. >> >> Are you absolutely sure b1ef997bec4d5cf251bfb5e47f7b04afa49bcdfe >> does hit it? >> >>> commit on for-next tree >>> https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git/?h=for-next >>> and the reproducer fails. > > Hi Pavel and Hillf, > > I believe there's a little misunderstanding, apologies. The reproducer > does not trigger the bug on the for-next tree which has patches for > 5.13. The reproducer process exits correctly. Likely one of those > commits that will be back-ported to 5.12 will address this issue. > When I wrote `the reproducer fails`, I meant to indicate that the bug > is not triggered on for-next. I will word it better next time! I see, great it's clarified -- Pavel Begunkov ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: INFO: task hung in io_uring_cancel_sqpoll 2021-04-30 14:21 ` INFO: task hung in io_uring_cancel_sqpoll Palash Oswal 2021-04-30 14:33 ` Pavel Begunkov @ 2021-05-02 22:12 ` Pavel Begunkov 2021-05-03 4:41 ` Palash Oswal 1 sibling, 1 reply; 20+ messages in thread From: Pavel Begunkov @ 2021-05-02 22:12 UTC (permalink / raw) To: Palash Oswal, Jens Axboe, io-uring, LKML, syzbot+11bf59db879676f59e52 On 4/30/21 3:21 PM, Palash Oswal wrote: > On Thursday, March 18, 2021 at 9:40:21 PM UTC+5:30 syzbot wrote: >> >> Hello, >> >> syzbot found the following issue on: >> >> HEAD commit: 0d7588ab riscv: process: Fix no prototype for arch_dup_tas.. >> git tree: git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git fixes >> console output: https://syzkaller.appspot.com/x/log.txt?x=12dde5aed00000 >> kernel config: https://syzkaller.appspot.com/x/.config?x=81c0b708b31626cc >> dashboard link: https://syzkaller.appspot.com/bug?extid=11bf59db879676f59e52 >> userspace arch: riscv64 >> CC: [[email protected] [email protected] [email protected] [email protected]] >> >> Unfortunately, I don't have any reproducer for this issue yet. >> >> IMPORTANT: if you fix the issue, please add the following tag to the commit: >> Reported-by: [email protected] >> >> INFO: task iou-sqp-4867:4871 blocked for more than 430 seconds. >> Not tainted 5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0 >> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. >> task:iou-sqp-4867 state:D stack: 0 pid: 4871 ppid: 4259 flags:0x00000004 >> Call Trace: >> [<ffffffe003bc3252>] context_switch kernel/sched/core.c:4324 [inline] >> [<ffffffe003bc3252>] __schedule+0x478/0xdec kernel/sched/core.c:5075 >> [<ffffffe003bc3c2a>] schedule+0x64/0x166 kernel/sched/core.c:5154 >> [<ffffffe0004b80ee>] io_uring_cancel_sqpoll+0x1de/0x294 fs/io_uring.c:8858 >> [<ffffffe0004c19cc>] io_sq_thread+0x548/0xe58 fs/io_uring.c:6750 >> [<ffffffe000005570>] ret_from_exception+0x0/0x14 The test might be very useful. 
Would you send a patch to liburing? Or mind the repro being taken as a test? >> >> Showing all locks held in the system: >> 3 locks held by kworker/u4:0/7: >> 3 locks held by kworker/1:0/19: >> 1 lock held by khungtaskd/1556: >> #0: ffffffe00592b5e8 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x32/0x1fa kernel/locking/lockdep.c:6329 >> 1 lock held by klogd/3947: >> 2 locks held by getty/4142: >> #0: ffffffe00f1aa098 (&tty->ldisc_sem){++++}-{0:0}, at: ldsem_down_read+0x3c/0x48 drivers/tty/tty_ldsem.c:340 >> #1: ffffffd010b6f2e8 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0x9ac/0xb08 drivers/tty/n_tty.c:2178 >> 2 locks held by kworker/0:1/4375: >> >> ============================================= >> >> >> >> --- >> This report is generated by a bot. It may contain errors. >> See https://goo.gl/tpsmEJ for more information about syzbot. >> syzbot engineers can be reached at [email protected]. >> >> syzbot will keep track of this issue. See: >> https://goo.gl/tpsmEJ#status for how to communicate with syzbot. 
>
> Attaching a C reproducer for this bug:
>
> #define _GNU_SOURCE
>
> #include <fcntl.h>
> #include <signal.h>
> #include <stdint.h>
> #include <stdlib.h>
> #include <string.h>
> #include <sys/mman.h>
> #include <sys/syscall.h>
> #include <sys/wait.h>
> #include <time.h>
> #include <unistd.h>
>
> static uint64_t current_time_ms(void)
> {
>   struct timespec ts;
>   if (clock_gettime(CLOCK_MONOTONIC, &ts))
>     exit(1);
>   return (uint64_t)ts.tv_sec * 1000 + (uint64_t)ts.tv_nsec / 1000000;
> }
>
> #define SIZEOF_IO_URING_SQE 64
> #define SIZEOF_IO_URING_CQE 16
> #define SQ_TAIL_OFFSET 64
> #define SQ_RING_MASK_OFFSET 256
> #define SQ_RING_ENTRIES_OFFSET 264
> #define CQ_RING_ENTRIES_OFFSET 268
> #define CQ_CQES_OFFSET 320
>
> struct io_sqring_offsets {
>   uint32_t head;
>   uint32_t tail;
>   uint32_t ring_mask;
>   uint32_t ring_entries;
>   uint32_t flags;
>   uint32_t dropped;
>   uint32_t array;
>   uint32_t resv1;
>   uint64_t resv2;
> };
>
> struct io_cqring_offsets {
>   uint32_t head;
>   uint32_t tail;
>   uint32_t ring_mask;
>   uint32_t ring_entries;
>   uint32_t overflow;
>   uint32_t cqes;
>   uint64_t resv[2];
> };
>
> struct io_uring_params {
>   uint32_t sq_entries;
>   uint32_t cq_entries;
>   uint32_t flags;
>   uint32_t sq_thread_cpu;
>   uint32_t sq_thread_idle;
>   uint32_t features;
>   uint32_t resv[4];
>   struct io_sqring_offsets sq_off;
>   struct io_cqring_offsets cq_off;
> };
>
> #define IORING_OFF_SQ_RING 0
> #define IORING_OFF_SQES 0x10000000ULL
>
> #define sys_io_uring_setup 425
> static long syz_io_uring_setup(volatile long a0, volatile long a1,
>     volatile long a2, volatile long a3, volatile long a4, volatile long a5)
> {
>   uint32_t entries = (uint32_t)a0;
>   struct io_uring_params* setup_params = (struct io_uring_params*)a1;
>   void* vma1 = (void*)a2;
>   void* vma2 = (void*)a3;
>   void** ring_ptr_out = (void**)a4;
>   void** sqes_ptr_out = (void**)a5;
>   uint32_t fd_io_uring = syscall(sys_io_uring_setup, entries, setup_params);
>   uint32_t sq_ring_sz = setup_params->sq_off.array +
>       setup_params->sq_entries * sizeof(uint32_t);
>   uint32_t cq_ring_sz = setup_params->cq_off.cqes +
>       setup_params->cq_entries * SIZEOF_IO_URING_CQE;
>   uint32_t ring_sz = sq_ring_sz > cq_ring_sz ? sq_ring_sz : cq_ring_sz;
>   *ring_ptr_out = mmap(vma1, ring_sz, PROT_READ | PROT_WRITE,
>       MAP_SHARED | MAP_POPULATE | MAP_FIXED, fd_io_uring,
>       IORING_OFF_SQ_RING);
>   uint32_t sqes_sz = setup_params->sq_entries * SIZEOF_IO_URING_SQE;
>   *sqes_ptr_out = mmap(vma2, sqes_sz, PROT_READ | PROT_WRITE,
>       MAP_SHARED | MAP_POPULATE | MAP_FIXED, fd_io_uring, IORING_OFF_SQES);
>   return fd_io_uring;
> }
>
> static long syz_io_uring_submit(volatile long a0, volatile long a1,
>     volatile long a2, volatile long a3)
> {
>   char* ring_ptr = (char*)a0;
>   char* sqes_ptr = (char*)a1;
>   char* sqe = (char*)a2;
>   uint32_t sqes_index = (uint32_t)a3;
>   uint32_t sq_ring_entries = *(uint32_t*)(ring_ptr + SQ_RING_ENTRIES_OFFSET);
>   uint32_t cq_ring_entries = *(uint32_t*)(ring_ptr + CQ_RING_ENTRIES_OFFSET);
>   uint32_t sq_array_off = (CQ_CQES_OFFSET + cq_ring_entries *
>       SIZEOF_IO_URING_CQE + 63) & ~63;
>   if (sq_ring_entries)
>     sqes_index %= sq_ring_entries;
>   char* sqe_dest = sqes_ptr + sqes_index * SIZEOF_IO_URING_SQE;
>   memcpy(sqe_dest, sqe, SIZEOF_IO_URING_SQE);
>   uint32_t sq_ring_mask = *(uint32_t*)(ring_ptr + SQ_RING_MASK_OFFSET);
>   uint32_t* sq_tail_ptr = (uint32_t*)(ring_ptr + SQ_TAIL_OFFSET);
>   uint32_t sq_tail = *sq_tail_ptr & sq_ring_mask;
>   uint32_t sq_tail_next = *sq_tail_ptr + 1;
>   uint32_t* sq_array = (uint32_t*)(ring_ptr + sq_array_off);
>   *(sq_array + sq_tail) = sqes_index;
>   __atomic_store_n(sq_tail_ptr, sq_tail_next, __ATOMIC_RELEASE);
>   return 0;
> }
>
> static void kill_and_wait(int pid, int* status)
> {
>   kill(-pid, SIGKILL);
>   kill(pid, SIGKILL);
>   while (waitpid(-1, status, __WALL) != pid) {
>   }
> }
>
> #define WAIT_FLAGS __WALL
>
> uint64_t r[3] = {0xffffffffffffffff, 0x0, 0x0};
>
> void trigger_bug(void)
> {
>   intptr_t res = 0;
>   *(uint32_t*)0x20000204 = 0;
>   *(uint32_t*)0x20000208 = 2;
>   *(uint32_t*)0x2000020c = 0;
>   *(uint32_t*)0x20000210 = 0;
>   *(uint32_t*)0x20000218 = -1;
>   memset((void*)0x2000021c, 0, 12);
>   res = -1;
>   res = syz_io_uring_setup(0x7987, 0x20000200, 0x20400000,
>       0x20ffd000, 0x200000c0, 0x200001c0);
>   if (res != -1) {
>     r[0] = res;
>     r[1] = *(uint64_t*)0x200000c0;
>     r[2] = *(uint64_t*)0x200001c0;
>   }
>   *(uint8_t*)0x20000180 = 0xb;
>   *(uint8_t*)0x20000181 = 1;
>   *(uint16_t*)0x20000182 = 0;
>   *(uint32_t*)0x20000184 = 0;
>   *(uint64_t*)0x20000188 = 4;
>   *(uint64_t*)0x20000190 = 0x20000140;
>   *(uint64_t*)0x20000140 = 0x77359400;
>   *(uint64_t*)0x20000148 = 0;
>   *(uint32_t*)0x20000198 = 1;
>   *(uint32_t*)0x2000019c = 0;
>   *(uint64_t*)0x200001a0 = 0;
>   *(uint16_t*)0x200001a8 = 0;
>   *(uint16_t*)0x200001aa = 0;
>   memset((void*)0x200001ac, 0, 20);
>   syz_io_uring_submit(r[1], r[2], 0x20000180, 1);
>   *(uint32_t*)0x20000544 = 0;
>   *(uint32_t*)0x20000548 = 0x36;
>   *(uint32_t*)0x2000054c = 0;
>   *(uint32_t*)0x20000550 = 0;
>   *(uint32_t*)0x20000558 = r[0];
>   memset((void*)0x2000055c, 0, 12);
>   syz_io_uring_setup(0x4bf1, 0x20000540, 0x20ffd000, 0x20ffc000, 0, 0);
> }
>
> int main(void)
> {
>   syscall(__NR_mmap, 0x20000000ul, 0x1000000ul, 7ul, 0x32ul, -1, 0ul);
>   int pid = fork();
>   if (pid < 0)
>     exit(1);
>   if (pid == 0) {
>     trigger_bug();
>     exit(0);
>   }
>   int status = 0;
>   uint64_t start = current_time_ms();
>   for (;;) {
>     if (current_time_ms() - start < 1000) {
>       continue;
>     }
>     kill_and_wait(pid, &status);
>     break;
>   }
>   return 0;
> }

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: INFO: task hung in io_uring_cancel_sqpoll 2021-05-02 22:12 ` Pavel Begunkov @ 2021-05-03 4:41 ` Palash Oswal 2021-05-03 10:45 ` Pavel Begunkov 0 siblings, 1 reply; 20+ messages in thread From: Palash Oswal @ 2021-05-03 4:41 UTC (permalink / raw) To: Pavel Begunkov; +Cc: Jens Axboe, io-uring, LKML, syzbot+11bf59db879676f59e52 On Mon, May 3, 2021 at 3:42 AM Pavel Begunkov <[email protected]> wrote: > > On 4/30/21 3:21 PM, Palash Oswal wrote: > > On Thursday, March 18, 2021 at 9:40:21 PM UTC+5:30 syzbot wrote: > >> > >> Hello, > >> > >> syzbot found the following issue on: > >> > >> HEAD commit: 0d7588ab riscv: process: Fix no prototype for arch_dup_tas.. > >> git tree: git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git fixes > >> console output: https://syzkaller.appspot.com/x/log.txt?x=12dde5aed00000 > >> kernel config: https://syzkaller.appspot.com/x/.config?x=81c0b708b31626cc > >> dashboard link: https://syzkaller.appspot.com/bug?extid=11bf59db879676f59e52 > >> userspace arch: riscv64 > >> CC: [[email protected] [email protected] [email protected] [email protected]] > >> > >> Unfortunately, I don't have any reproducer for this issue yet. > >> > >> IMPORTANT: if you fix the issue, please add the following tag to the commit: > >> Reported-by: [email protected] > >> > >> INFO: task iou-sqp-4867:4871 blocked for more than 430 seconds. > >> Not tainted 5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0 > >> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
> >> task:iou-sqp-4867 state:D stack: 0 pid: 4871 ppid: 4259 flags:0x00000004 > >> Call Trace: > >> [<ffffffe003bc3252>] context_switch kernel/sched/core.c:4324 [inline] > >> [<ffffffe003bc3252>] __schedule+0x478/0xdec kernel/sched/core.c:5075 > >> [<ffffffe003bc3c2a>] schedule+0x64/0x166 kernel/sched/core.c:5154 > >> [<ffffffe0004b80ee>] io_uring_cancel_sqpoll+0x1de/0x294 fs/io_uring.c:8858 > >> [<ffffffe0004c19cc>] io_sq_thread+0x548/0xe58 fs/io_uring.c:6750 > >> [<ffffffe000005570>] ret_from_exception+0x0/0x14 > > The test might be very useful. Would you send a patch to > liburing? Or mind the repro being taken as a test? > > Pavel, I'm working on a PR for liburing with this test. Do you know how I can address this behavior? root@syzkaller:~/liburing/test# ./runtests.sh sqpoll-cancel-hang Running test sqp[ 15.310997] Running test sqpoll-cancel-hang: oll-cancel-hang: [ 15.333348] sqpoll-cancel-h[305]: segfault at 0 ip 000055ad00e265e3 sp] [ 15.334940] Code: 89 d8 8d 34 90 8b 45 04 ba 03 00 00 00 c1 e0 04 03 46 All tests passed root@syzkaller:~/liburing/test# ./sqpoll-cancel-hang [ 13.572639] sqpoll-cancel-h[298]: segfault at 0 ip 00005634c4a455e3 sp] [ 13.576506] Code: 89 d8 8d 34 90 8b 45 04 ba 03 00 00 00 c1 e0 04 03 46 [ 23.350459] random: crng init done [ 23.352837] random: 7 urandom warning(s) missed due to ratelimiting [ 243.090865] INFO: task iou-sqp-298:299 blocked for more than 120 secon. [ 243.095187] Not tainted 5.12.0 #142 [ 243.099800] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disable. [ 243.105928] task:iou-sqp-298 state:D stack: 0 pid: 299 ppid: 4 [ 243.111044] Call Trace: [ 243.112855] __schedule+0xb1d/0x1130 [ 243.115549] ? __sched_text_start+0x8/0x8 [ 243.118328] ? io_wq_worker_sleeping+0x145/0x500 [ 243.121790] schedule+0x131/0x1c0 [ 243.123698] io_uring_cancel_sqpoll+0x288/0x350 [ 243.125977] ? io_sq_thread_unpark+0xd0/0xd0 [ 243.128966] ? mutex_lock+0xbb/0x130 [ 243.132572] ? init_wait_entry+0xe0/0xe0 [ 243.135429] ? 
wait_for_completion_killable_timeout+0x20/0x20 [ 243.138303] io_sq_thread+0x174c/0x18c0 [ 243.140162] ? io_rsrc_put_work+0x380/0x380 [ 243.141613] ? init_wait_entry+0xe0/0xe0 [ 243.143686] ? _raw_spin_lock_irq+0xa5/0x180 [ 243.145619] ? _raw_spin_lock_irqsave+0x190/0x190 [ 243.147671] ? calculate_sigpending+0x6b/0xa0 [ 243.149036] ? io_rsrc_put_work+0x380/0x380 [ 243.150694] ret_from_fork+0x22/0x30 Palash ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: INFO: task hung in io_uring_cancel_sqpoll
  2021-05-03  4:41                 ` Palash Oswal
@ 2021-05-03 10:45                   ` Pavel Begunkov
  [not found]                         ` <CAGyP=7eoSfh7z638PnP5UF4xVKcrG1jB_qmFo6uPZ7iWfu_2sQ@mail.gmail.com>
  0 siblings, 1 reply; 20+ messages in thread
From: Pavel Begunkov @ 2021-05-03 10:45 UTC (permalink / raw)
  To: Palash Oswal; +Cc: Jens Axboe, io-uring, LKML, syzbot+11bf59db879676f59e52

On 5/3/21 5:41 AM, Palash Oswal wrote:
> On Mon, May 3, 2021 at 3:42 AM Pavel Begunkov <[email protected]> wrote:
>> The test might be very useful. Would you send a patch to
>> liburing? Or mind the repro being taken as a test?
>
> Pavel,
>
> I'm working on a PR for liburing with this test. Do you know how I can

Perfect, thanks

> address this behavior?

As the hang is in sqpoll cancellations, it's most probably fixed in
5.13 and again awaits being taken for stable.

Don't know about the segfaults, but it has been like that for a long
time; syz reproducers often segfault for me and still exit with 0 in
the end. So I wouldn't worry about it.

> root@syzkaller:~/liburing/test# ./runtests.sh sqpoll-cancel-hang
> Running test sqp[   15.310997] Running test sqpoll-cancel-hang:
> oll-cancel-hang:
> [   15.333348] sqpoll-cancel-h[305]: segfault at 0 ip 000055ad00e265e3 sp]
> [   15.334940] Code: 89 d8 8d 34 90 8b 45 04 ba 03 00 00 00 c1 e0 04 03 46
> All tests passed
>
> root@syzkaller:~/liburing/test# ./sqpoll-cancel-hang
> [   13.572639] sqpoll-cancel-h[298]: segfault at 0 ip 00005634c4a455e3 sp]
> [   13.576506] Code: 89 d8 8d 34 90 8b 45 04 ba 03 00 00 00 c1 e0 04 03 46
> [   23.350459] random: crng init done
> [   23.352837] random: 7 urandom warning(s) missed due to ratelimiting
> [  243.090865] INFO: task iou-sqp-298:299 blocked for more than 120 secon.
> [  243.095187]       Not tainted 5.12.0 #142
> [  243.099800] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disable.
> [  243.105928] task:iou-sqp-298     state:D stack:    0 pid:  299 ppid:     4
> [  243.111044] Call Trace:
> [  243.112855]  __schedule+0xb1d/0x1130
> [  243.115549]  ? __sched_text_start+0x8/0x8
> [  243.118328]  ? io_wq_worker_sleeping+0x145/0x500
> [  243.121790]  schedule+0x131/0x1c0
> [  243.123698]  io_uring_cancel_sqpoll+0x288/0x350
> [  243.125977]  ? io_sq_thread_unpark+0xd0/0xd0
> [  243.128966]  ? mutex_lock+0xbb/0x130
> [  243.132572]  ? init_wait_entry+0xe0/0xe0
> [  243.135429]  ? wait_for_completion_killable_timeout+0x20/0x20
> [  243.138303]  io_sq_thread+0x174c/0x18c0
> [  243.140162]  ? io_rsrc_put_work+0x380/0x380
> [  243.141613]  ? init_wait_entry+0xe0/0xe0
> [  243.143686]  ? _raw_spin_lock_irq+0xa5/0x180
> [  243.145619]  ? _raw_spin_lock_irqsave+0x190/0x190
> [  243.147671]  ? calculate_sigpending+0x6b/0xa0
> [  243.149036]  ? io_rsrc_put_work+0x380/0x380
> [  243.150694]  ret_from_fork+0x22/0x30
>
> Palash
>

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 20+ messages in thread
[parent not found: <CAGyP=7eoSfh7z638PnP5UF4xVKcrG1jB_qmFo6uPZ7iWfu_2sQ@mail.gmail.com>]
* Re: INFO: task hung in io_uring_cancel_sqpoll [not found] ` <CAGyP=7eoSfh7z638PnP5UF4xVKcrG1jB_qmFo6uPZ7iWfu_2sQ@mail.gmail.com> @ 2021-05-13 21:28 ` Pavel Begunkov 2021-05-13 21:31 ` Pavel Begunkov 0 siblings, 1 reply; 20+ messages in thread From: Pavel Begunkov @ 2021-05-13 21:28 UTC (permalink / raw) To: Palash Oswal; +Cc: Jens Axboe, io-uring, LKML, syzbot+11bf59db879676f59e52 On 5/10/21 5:47 AM, Palash Oswal wrote: > On Mon, May 3, 2021 at 4:15 PM Pavel Begunkov <[email protected]> > wrote: > >> On 5/3/21 5:41 AM, Palash Oswal wrote: >>> On Mon, May 3, 2021 at 3:42 AM Pavel Begunkov <[email protected]> >> wrote: >>>> The test might be very useful. Would you send a patch to >>>> liburing? Or mind the repro being taken as a test? >>> >>> Pavel, >>> >>> I'm working on a PR for liburing with this test. Do you know how I can >> >> Perfect, thanks >> >>> address this behavior? >> >> As the hang is sqpoll cancellations, it's most probably >> fixed in 5.13 + again awaits to be taken for stable. >> >> Don't know about segfaults, but it was so for long, and >> syz reproducers are ofthen faults for me, and exit with 0 >> in the end. So, I wouldn't worry about it. >> >> > Hey Pavel, > The bug actually fails to reproduce on 5.12 when the fork() call is made by > the runtests.sh script. This causes the program to end correctly, and the > hang does not occur. I verified this on 5.12 where the bug isn't patched. > Just running the `sqpoll-cancel-hang` triggers the bug; whereas running it > after being forked from runtests.sh does not trigger the bug. I see. fyi, it's always good to wait for 5 minutes, because some useful logs are not generated immediately but do timeout based hang detection. I'd think that may be due CLONE_IO and how to whom it binds workers, but can you try first: 1) timeout -s INT -k $TIMEOUT $TIMEOUT ./sqpoll-cancel-hang 2) remove timeout from <liburing>/tests/Makefile and run "./runtests.sh sqpoll-cancel-hang" again looking for faults? 
diff --git a/test/runtests.sh b/test/runtests.sh
index e8f4ae5..2b51dca 100755
--- a/test/runtests.sh
+++ b/test/runtests.sh
@@ -91,7 +91,8 @@ run_test()
 	fi
 
 	# Run the test
-	timeout -s INT -k $TIMEOUT $TIMEOUT ./$test_name $dev
+	# timeout -s INT -k $TIMEOUT $TIMEOUT ./$test_name $dev
+	./$test_name $dev
 	local status=$?
 
 	# Check test status

>
> The segfaults are benign, but notice the "All tests passed" in the previous
> mail. It should not have passed, as the run was on 5.12. Therefore I wanted
> to ask your input on how to address this odd behaviour, where the
> involvement of runtests.sh actually mitigated the bug.
>
>>> root@syzkaller:~/liburing/test# ./runtests.sh sqpoll-cancel-hang
>>> Running test sqp[   15.310997] Running test sqpoll-cancel-hang:
>>> oll-cancel-hang:
>>> [   15.333348] sqpoll-cancel-h[305]: segfault at 0 ip 000055ad00e265e3
>> sp]
>>> [   15.334940] Code: 89 d8 8d 34 90 8b 45 04 ba 03 00 00 00 c1 e0 04 03
>> 46
>>> All tests passed
>>>
>>> root@syzkaller:~/liburing/test# ./sqpoll-cancel-hang
>>> [   13.572639] sqpoll-cancel-h[298]: segfault at 0 ip 00005634c4a455e3
>> sp]
>>> [   13.576506] Code: 89 d8 8d 34 90 8b 45 04 ba 03 00 00 00 c1 e0 04 03
>> 46
>>> [   23.350459] random: crng init done
>>> [   23.352837] random: 7 urandom warning(s) missed due to ratelimiting
>>> [  243.090865] INFO: task iou-sqp-298:299 blocked for more than 120
>> secon.
>>> [  243.095187]       Not tainted 5.12.0 #142
>>> [  243.099800] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>> disable.
>>> [  243.105928] task:iou-sqp-298     state:D stack:    0 pid:  299 ppid:
>> 4
>>> [  243.111044] Call Trace:
>>> [  243.112855]  __schedule+0xb1d/0x1130
>>> [  243.115549]  ? __sched_text_start+0x8/0x8
>>> [  243.118328]  ? io_wq_worker_sleeping+0x145/0x500
>>> [  243.121790]  schedule+0x131/0x1c0
>>> [  243.123698]  io_uring_cancel_sqpoll+0x288/0x350
>>> [  243.125977]  ? io_sq_thread_unpark+0xd0/0xd0
>>> [  243.128966]  ? mutex_lock+0xbb/0x130
>>> [  243.132572]  ? init_wait_entry+0xe0/0xe0
>>> [  243.135429]  ? wait_for_completion_killable_timeout+0x20/0x20
>>> [  243.138303]  io_sq_thread+0x174c/0x18c0
>>> [  243.140162]  ? io_rsrc_put_work+0x380/0x380
>>> [  243.141613]  ? init_wait_entry+0xe0/0xe0
>>> [  243.143686]  ? _raw_spin_lock_irq+0xa5/0x180
>>> [  243.145619]  ? _raw_spin_lock_irqsave+0x190/0x190
>>> [  243.147671]  ? calculate_sigpending+0x6b/0xa0
>>> [  243.149036]  ? io_rsrc_put_work+0x380/0x380
>>> [  243.150694]  ret_from_fork+0x22/0x30
>>>
>>> Palash
>>>
>>
>> -- 
>> Pavel Begunkov
>>
>

-- 
Pavel Begunkov

^ permalink raw reply related	[flat|nested] 20+ messages in thread
* Re: INFO: task hung in io_uring_cancel_sqpoll
  2021-05-13 21:28 ` Pavel Begunkov
@ 2021-05-13 21:31   ` Pavel Begunkov
  2021-05-15  7:44     ` Palash Oswal
  0 siblings, 1 reply; 20+ messages in thread
From: Pavel Begunkov @ 2021-05-13 21:31 UTC (permalink / raw)
To: Palash Oswal; +Cc: Jens Axboe, io-uring, LKML, syzbot+11bf59db879676f59e52

On 5/13/21 10:28 PM, Pavel Begunkov wrote:
> On 5/10/21 5:47 AM, Palash Oswal wrote:
>> On Mon, May 3, 2021 at 4:15 PM Pavel Begunkov <[email protected]>
>> wrote:
>>
>>> On 5/3/21 5:41 AM, Palash Oswal wrote:
>>>> On Mon, May 3, 2021 at 3:42 AM Pavel Begunkov <[email protected]>
>>> wrote:
>>>>> The test might be very useful. Would you send a patch to
>>>>> liburing? Or mind the repro being taken as a test?
>>>>
>>>> Pavel,
>>>>
>>>> I'm working on a PR for liburing with this test. Do you know how I can
>>>
>>> Perfect, thanks
>>>
>>>> address this behavior?
>>>
>>> As the hang is in sqpoll cancellations, it's most probably
>>> fixed in 5.13 and again awaits being taken for stable.
>>>
>>> Don't know about the segfaults, but it has been like that for
>>> a long time; syz reproducers often fault for me and exit with 0
>>> in the end. So I wouldn't worry about it.
>>>
>>>
>> Hey Pavel,
>> The bug actually fails to reproduce on 5.12 when the fork() call is made by
>> the runtests.sh script. This causes the program to end correctly, and the
>> hang does not occur. I verified this on 5.12, where the bug isn't patched.
>> Just running `sqpoll-cancel-hang` triggers the bug, whereas running it
>> after being forked from runtests.sh does not trigger the bug.
>
> I see. FYI, it's always good to wait for 5 minutes, because some useful
> logs are not generated immediately but come from timeout-based hang
> detection.
>
> I'd think that may be due to CLONE_IO and how, and to whom, it binds
> workers, but can you try first:
>
> 1) timeout -s INT -k $TIMEOUT $TIMEOUT ./sqpoll-cancel-hang

edit:

timeout -s INT -k 60 60 ./sqpoll-cancel-hang

And privileged, root/sudo

>
> 2) remove the timeout from <liburing>/test/runtests.sh and run
> "./runtests.sh sqpoll-cancel-hang" again, looking for faults?
>
> diff --git a/test/runtests.sh b/test/runtests.sh
> index e8f4ae5..2b51dca 100755
> --- a/test/runtests.sh
> +++ b/test/runtests.sh
> @@ -91,7 +91,8 @@ run_test()
>  	fi
>
>  	# Run the test
> -	timeout -s INT -k $TIMEOUT $TIMEOUT ./$test_name $dev
> +	# timeout -s INT -k $TIMEOUT $TIMEOUT ./$test_name $dev
> +	./$test_name $dev
>  	local status=$?
>
>  	# Check test status
>
>
>>
>> The segfaults are benign, but notice the "All tests passed" in the previous
>> mail. It should not have passed, as the run was on 5.12. Therefore I wanted
>> to ask your input on how to address this odd behaviour, where the
>> involvement of runtests.sh actually mitigated the bug.
>>
>>
>>
>>>> root@syzkaller:~/liburing/test# ./runtests.sh sqpoll-cancel-hang
>>>> Running test sqp[   15.310997] Running test sqpoll-cancel-hang:
>>>> oll-cancel-hang:
>>>> [   15.333348] sqpoll-cancel-h[305]: segfault at 0 ip 000055ad00e265e3 sp]
>>>> [   15.334940] Code: 89 d8 8d 34 90 8b 45 04 ba 03 00 00 00 c1 e0 04 03 46
>>>> All tests passed
>>>>
>>>> root@syzkaller:~/liburing/test# ./sqpoll-cancel-hang
>>>> [   13.572639] sqpoll-cancel-h[298]: segfault at 0 ip 00005634c4a455e3 sp]
>>>> [   13.576506] Code: 89 d8 8d 34 90 8b 45 04 ba 03 00 00 00 c1 e0 04 03 46
>>>> [   23.350459] random: crng init done
>>>> [   23.352837] random: 7 urandom warning(s) missed due to ratelimiting
>>>> [  243.090865] INFO: task iou-sqp-298:299 blocked for more than 120 secon.
>>>> [  243.095187]       Not tainted 5.12.0 #142
>>>> [  243.099800] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disable.
>>>> [  243.105928] task:iou-sqp-298     state:D stack:    0 pid:  299 ppid:     4
>>>> [  243.111044] Call Trace:
>>>> [  243.112855]  __schedule+0xb1d/0x1130
>>>> [  243.115549]  ? __sched_text_start+0x8/0x8
>>>> [  243.118328]  ? io_wq_worker_sleeping+0x145/0x500
>>>> [  243.121790]  schedule+0x131/0x1c0
>>>> [  243.123698]  io_uring_cancel_sqpoll+0x288/0x350
>>>> [  243.125977]  ? io_sq_thread_unpark+0xd0/0xd0
>>>> [  243.128966]  ? mutex_lock+0xbb/0x130
>>>> [  243.132572]  ? init_wait_entry+0xe0/0xe0
>>>> [  243.135429]  ? wait_for_completion_killable_timeout+0x20/0x20
>>>> [  243.138303]  io_sq_thread+0x174c/0x18c0
>>>> [  243.140162]  ? io_rsrc_put_work+0x380/0x380
>>>> [  243.141613]  ? init_wait_entry+0xe0/0xe0
>>>> [  243.143686]  ? _raw_spin_lock_irq+0xa5/0x180
>>>> [  243.145619]  ? _raw_spin_lock_irqsave+0x190/0x190
>>>> [  243.147671]  ? calculate_sigpending+0x6b/0xa0
>>>> [  243.149036]  ? io_rsrc_put_work+0x380/0x380
>>>> [  243.150694]  ret_from_fork+0x22/0x30
>>>>
>>>> Palash
>>>>
>>>
>>> --
>>> Pavel Begunkov
>>>
>>
>

--
Pavel Begunkov

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: INFO: task hung in io_uring_cancel_sqpoll
  2021-05-13 21:31 ` Pavel Begunkov
@ 2021-05-15  7:44   ` Palash Oswal
  0 siblings, 0 replies; 20+ messages in thread
From: Palash Oswal @ 2021-05-15  7:44 UTC (permalink / raw)
To: Pavel Begunkov; +Cc: Jens Axboe, io-uring, LKML, syzbot+11bf59db879676f59e52

On Fri, May 14, 2021 at 3:01 AM Pavel Begunkov <[email protected]> wrote:
>
> On 5/13/21 10:28 PM, Pavel Begunkov wrote:
> > On 5/10/21 5:47 AM, Palash Oswal wrote:
> >> On Mon, May 3, 2021 at 4:15 PM Pavel Begunkov <[email protected]>
> >> wrote:
> >>
> >>> On 5/3/21 5:41 AM, Palash Oswal wrote:
> >>>> On Mon, May 3, 2021 at 3:42 AM Pavel Begunkov <[email protected]>
> >>> wrote:
> >>>>> The test might be very useful. Would you send a patch to
> >>>>> liburing? Or mind the repro being taken as a test?
> >>>>
> >>>> Pavel,
> >>>>
> >>>> I'm working on a PR for liburing with this test. Do you know how I can
> >>>
> >>> Perfect, thanks
> >>>
> >>>> address this behavior?
> >>>
> >>> As the hang is in sqpoll cancellations, it's most probably
> >>> fixed in 5.13 and again awaits being taken for stable.
> >>>
> >>> Don't know about the segfaults, but it has been like that for
> >>> a long time; syz reproducers often fault for me and exit with 0
> >>> in the end. So I wouldn't worry about it.
> >>>
> >>>
> >> Hey Pavel,
> >> The bug actually fails to reproduce on 5.12 when the fork() call is made by
> >> the runtests.sh script. This causes the program to end correctly, and the
> >> hang does not occur. I verified this on 5.12, where the bug isn't patched.
> >> Just running `sqpoll-cancel-hang` triggers the bug, whereas running it
> >> after being forked from runtests.sh does not trigger the bug.
> >
> > I see. FYI, it's always good to wait for 5 minutes, because some useful
> > logs are not generated immediately but come from timeout-based hang
> > detection.
> >
> > I'd think that may be due to CLONE_IO and how, and to whom, it binds
> > workers, but can you try first:
> >
> > 1) timeout -s INT -k $TIMEOUT $TIMEOUT ./sqpoll-cancel-hang
>
> edit:
>
> timeout -s INT -k 60 60 ./sqpoll-cancel-hang
>
> And privileged, root/sudo
>
> >
> > 2) remove the timeout from <liburing>/test/runtests.sh and run
> > "./runtests.sh sqpoll-cancel-hang" again, looking for faults?
> >
> > diff --git a/test/runtests.sh b/test/runtests.sh
> > index e8f4ae5..2b51dca 100755
> > --- a/test/runtests.sh
> > +++ b/test/runtests.sh
> > @@ -91,7 +91,8 @@ run_test()
> >  	fi
> >
> >  	# Run the test
> > -	timeout -s INT -k $TIMEOUT $TIMEOUT ./$test_name $dev
> > +	# timeout -s INT -k $TIMEOUT $TIMEOUT ./$test_name $dev
> > +	./$test_name $dev
> >  	local status=$?
> >
> >  	# Check test status

root@syzkaller:~/liburing/test# timeout -s INT -k 60 60 ./sqpoll-cancel-hang
[   19.381358] sqpoll-cancel-h[300]: segfault at 0 ip 0000556f7fa325e3 sp 00007ffee497d980 error 6 in s]
[   19.387323] Code: 89 d8 8d 34 90 8b 45 04 ba 03 00 00 00 c1 e0 04 03 45 64 39 c6 48 0f 42 f0 45 31 c6
root@syzkaller:~/liburing/test# [  243.511620] INFO: task iou-sqp-300:301 blocked for more than 120 sec.
[  243.514146]       Not tainted 5.12.0 #142
[  243.515301] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.517629] task:iou-sqp-300     state:D stack:    0 pid:  301 ppid:     1 flags:0x00004004
[  243.520155] Call Trace:
[  243.520956]  __schedule+0xb1d/0x1130
[  243.522102]  ? __sched_text_start+0x8/0x8
[  243.523195]  ? io_wq_worker_sleeping+0x145/0x500
[  243.524588]  schedule+0x131/0x1c0
[  243.525892]  io_uring_cancel_sqpoll+0x288/0x350
[  243.527610]  ? io_sq_thread_unpark+0xd0/0xd0
[  243.529084]  ? mutex_lock+0xbb/0x130
[  243.530327]  ? init_wait_entry+0xe0/0xe0
[  243.532805]  ? wait_for_completion_killable_timeout+0x20/0x20
[  243.535411]  io_sq_thread+0x174c/0x18c0
[  243.536520]  ? io_rsrc_put_work+0x380/0x380
[  243.537904]  ? init_wait_entry+0xe0/0xe0
[  243.538935]  ? _raw_spin_lock_irq+0xa5/0x180
[  243.540203]  ? _raw_spin_lock_irqsave+0x190/0x190
[  243.542398]  ? calculate_sigpending+0x6b/0xa0
[  243.543868]  ? io_rsrc_put_work+0x380/0x380
[  243.545377]  ret_from_fork+0x22/0x30
^C
root@syzkaller:~/liburing/test# ps
  PID TTY          TIME CMD
  269 ttyS0    00:00:00 login
  294 ttyS0    00:00:00 bash
  300 ttyS0    00:00:00 sqpoll-cancel-h
  305 ttyS0    00:00:00 ps

After reboot, and the runtests.sh diff applied (to remove timeout):

root@syzkaller:~/liburing/test# ./runtests.sh sqpoll-cancel-hang
Running test sqp[   45.332140] Running test sqpoll-cancel-hang:
oll-cancel-hang:
[   45.352524] sqpoll-cancel-h[314]: segfault at 0 ip 000056025bd085e3 sp 00007fffb08e20b0 error 6 in s]
[   45.356601] Code: 89 d8 8d 34 90 8b 45 04 ba 03 00 00 00 c1 e0 04 03 45 64 39 c6 48 0f 42 f0 45 31 c6
[  243.019384] INFO: task iou-sqp-314:315 blocked for more than 120 seconds.
[  243.021483]       Not tainted 5.12.0 #142
[  243.022633] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.024651] task:iou-sqp-314     state:D stack:    0 pid:  315 ppid:   313 flags:0x00004004
[  243.026822] Call Trace:
[  243.027478]  __schedule+0xb1d/0x1130
[  243.028382]  ? __sched_text_start+0x8/0x8
[  243.029536]  ? io_wq_worker_sleeping+0x145/0x500
[  243.030932]  schedule+0x131/0x1c0
[  243.031920]  io_uring_cancel_sqpoll+0x288/0x350
[  243.033393]  ? io_sq_thread_unpark+0xd0/0xd0
[  243.034713]  ? mutex_lock+0xbb/0x130
[  243.035775]  ? init_wait_entry+0xe0/0xe0
[  243.037036]  ? wait_for_completion_killable_timeout+0x20/0x20
[  243.039492]  io_sq_thread+0x174c/0x18c0
[  243.040894]  ? io_rsrc_put_work+0x380/0x380
[  243.042463]  ? init_wait_entry+0xe0/0xe0
[  243.043990]  ? _raw_spin_lock_irq+0xa5/0x180
[  243.045581]  ? _raw_spin_lock_irqsave+0x190/0x190
[  243.047545]  ? calculate_sigpending+0x6b/0xa0
[  243.049262]  ? io_rsrc_put_work+0x380/0x380
[  243.050861]  ret_from_fork+0x22/0x30
^C
root@syzkaller:~/liburing/test# ps
  PID TTY          TIME CMD
  285 ttyS0    00:00:00 login
  300 ttyS0    00:00:00 bash
  314 ttyS0    00:00:00 sqpoll-cancel-h
  318 ttyS0    00:00:00 ps

runtests.sh without any changes:

root@syzkaller:~/liburing/test# ./runtests.sh sqpoll-cancel-hang
[   49.634886] Running test sqpoll-cancel-hang:
Running test sqpoll-cancel-hang:
[   49.658365] sqpoll-cancel-h[302]: segfault at 0 ip 000055a76e99c5e3 sp 00007ffdc255d1a0 error 6 in s]
[   49.661703] Code: 89 d8 8d 34 90 8b 45 04 ba 03 00 00 00 c1 e0 04 03 45 64 39 c6 48 0f 42 f0 45 31 c6
Test sqpoll-cancel-hang timed out (may not be a failure)
All tests passed
root@syzkaller:~/liburing/test# ps
  PID TTY          TIME CMD
  269 ttyS0    00:00:00 login
  287 ttyS0    00:00:00 bash
  302 ttyS0    00:00:00 sqpoll-cancel-h
  309 ttyS0    00:00:00 ps
root@syzkaller:~/liburing/test# [  243.324831] INFO: task iou-sqp-302:303 blocked for more than 120 sec.
[  243.328320]       Not tainted 5.12.0 #142
[  243.330361] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.333930] task:iou-sqp-302     state:D stack:    0 pid:  303 ppid:     1 flags:0x00004004
[  243.337758] Call Trace:
[  243.338926]  __schedule+0xb1d/0x1130
[  243.340801]  ? __sched_text_start+0x8/0x8
[  243.342690]  ? io_wq_worker_sleeping+0x145/0x500
[  243.344903]  schedule+0x131/0x1c0
[  243.346626]  io_uring_cancel_sqpoll+0x288/0x350
[  243.348762]  ? io_sq_thread_unpark+0xd0/0xd0
[  243.351036]  ? mutex_lock+0xbb/0x130
[  243.352737]  ? init_wait_entry+0xe0/0xe0
[  243.354673]  ? wait_for_completion_killable_timeout+0x20/0x20
[  243.356989]  io_sq_thread+0x174c/0x18c0
[  243.358559]  ? io_rsrc_put_work+0x380/0x380
[  243.359981]  ? init_wait_entry+0xe0/0xe0
[  243.361185]  ? _raw_spin_lock_irq+0x110/0x180
[  243.362958]  ? _raw_spin_lock_irqsave+0x190/0x190
[  243.364260]  ? calculate_sigpending+0x6b/0xa0
[  243.365763]  ? io_rsrc_put_work+0x380/0x380
[  243.367041]  ret_from_fork+0x22/0x30

>
>
>
> >>
> >> The segfaults are benign, but notice the "All tests passed" in the previous
> >> mail. It should not have passed, as the run was on 5.12. Therefore I wanted
> >> to ask your input on how to address this odd behaviour, where the
> >> involvement of runtests.sh actually mitigated the bug.
> >>
> >>
> >>
> >>>> root@syzkaller:~/liburing/test# ./runtests.sh sqpoll-cancel-hang
> >>>> Running test sqp[   15.310997] Running test sqpoll-cancel-hang:
> >>>> oll-cancel-hang:
> >>>> [   15.333348] sqpoll-cancel-h[305]: segfault at 0 ip 000055ad00e265e3 sp]
> >>>> [   15.334940] Code: 89 d8 8d 34 90 8b 45 04 ba 03 00 00 00 c1 e0 04 03 46
> >>>> All tests passed
> >>>>
> >>>> root@syzkaller:~/liburing/test# ./sqpoll-cancel-hang
> >>>> [   13.572639] sqpoll-cancel-h[298]: segfault at 0 ip 00005634c4a455e3 sp]
> >>>> [   13.576506] Code: 89 d8 8d 34 90 8b 45 04 ba 03 00 00 00 c1 e0 04 03 46
> >>>> [   23.350459] random: crng init done
> >>>> [   23.352837] random: 7 urandom warning(s) missed due to ratelimiting
> >>>> [  243.090865] INFO: task iou-sqp-298:299 blocked for more than 120 secon.
> >>>> [  243.095187]       Not tainted 5.12.0 #142
> >>>> [  243.099800] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disable.
> >>>> [  243.105928] task:iou-sqp-298     state:D stack:    0 pid:  299 ppid:     4
> >>>> [  243.111044] Call Trace:
> >>>> [  243.112855]  __schedule+0xb1d/0x1130
> >>>> [  243.115549]  ? __sched_text_start+0x8/0x8
> >>>> [  243.118328]  ? io_wq_worker_sleeping+0x145/0x500
> >>>> [  243.121790]  schedule+0x131/0x1c0
> >>>> [  243.123698]  io_uring_cancel_sqpoll+0x288/0x350
> >>>> [  243.125977]  ? io_sq_thread_unpark+0xd0/0xd0
> >>>> [  243.128966]  ? mutex_lock+0xbb/0x130
> >>>> [  243.132572]  ? init_wait_entry+0xe0/0xe0
> >>>> [  243.135429]  ? wait_for_completion_killable_timeout+0x20/0x20
> >>>> [  243.138303]  io_sq_thread+0x174c/0x18c0
> >>>> [  243.140162]  ? io_rsrc_put_work+0x380/0x380
> >>>> [  243.141613]  ? init_wait_entry+0xe0/0xe0
> >>>> [  243.143686]  ? _raw_spin_lock_irq+0xa5/0x180
> >>>> [  243.145619]  ? _raw_spin_lock_irqsave+0x190/0x190
> >>>> [  243.147671]  ? calculate_sigpending+0x6b/0xa0
> >>>> [  243.149036]  ? io_rsrc_put_work+0x380/0x380
> >>>> [  243.150694]  ret_from_fork+0x22/0x30
> >>>>
> >>>> Palash
> >>>>
> >>>
> >>> --
> >>> Pavel Begunkov
> >>>
> >>
> >
>
> --
> Pavel Begunkov

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: INFO: task hung in io_uring_cancel_sqpoll
  [not found] <[email protected]>
  [not found] ` <[email protected]>
@ 2021-06-12 15:48 ` syzbot
  1 sibling, 0 replies; 20+ messages in thread
From: syzbot @ 2021-06-12 15:48 UTC (permalink / raw)
To: asml.silence, axboe, hdanton, io-uring, linux-kernel, oswalpalash, syzkaller-upstream-moderation

Auto-closing this bug as obsolete.
Crashes did not happen for a while, no reproducer and no activity.

^ permalink raw reply	[flat|nested] 20+ messages in thread
end of thread, other threads:[~2021-06-12 15:48 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <[email protected]>
     [not found] ` <[email protected]>
     [not found]   ` <[email protected]>
2021-04-30 14:21     ` INFO: task hung in io_uring_cancel_sqpoll Palash Oswal
2021-04-30 14:33       ` Pavel Begunkov
2021-04-30 15:02         ` Palash Oswal
2021-04-30 15:07           ` Pavel Begunkov
2021-04-30 15:11             ` Palash Oswal
2021-04-30 18:34               ` Pavel Begunkov
2021-04-30 18:34                 ` syzbot
2021-04-30 21:04                 ` Pavel Begunkov
2021-05-01  4:51                   ` Palash Oswal
2021-05-01  7:39                     ` Pavel Begunkov
2021-05-02 10:33                       ` Pavel Begunkov
2021-05-02 15:01                         ` Palash Oswal
2021-05-02 15:56                           ` Pavel Begunkov
2021-05-02 22:12   ` Pavel Begunkov
2021-05-03  4:41     ` Palash Oswal
2021-05-03 10:45       ` Pavel Begunkov
     [not found] ` <CAGyP=7eoSfh7z638PnP5UF4xVKcrG1jB_qmFo6uPZ7iWfu_2sQ@mail.gmail.com>
2021-05-13 21:28   ` Pavel Begunkov
2021-05-13 21:31     ` Pavel Begunkov
2021-05-15  7:44       ` Palash Oswal
2021-06-12 15:48 ` syzbot