* [PATCH] block: reexpand iov_iter after read/write @ 2021-04-01 7:18 yangerkun 2021-04-06 1:28 ` yangerkun 2021-04-09 14:49 ` Pavel Begunkov 0 siblings, 2 replies; 18+ messages in thread From: yangerkun @ 2021-04-01 7:18 UTC (permalink / raw) To: viro, axboe, asml.silence; +Cc: linux-fsdevel, linux-block, io-uring, yangerkun We get a bug: BUG: KASAN: slab-out-of-bounds in iov_iter_revert+0x11c/0x404 lib/iov_iter.c:1139 Read of size 8 at addr ffff0000d3fb11f8 by task CPU: 0 PID: 12582 Comm: syz-executor.2 Not tainted 5.10.0-00843-g352c8610ccd2 #2 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0x0/0x2d0 arch/arm64/kernel/stacktrace.c:132 show_stack+0x28/0x34 arch/arm64/kernel/stacktrace.c:196 __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x110/0x164 lib/dump_stack.c:118 print_address_description+0x78/0x5c8 mm/kasan/report.c:385 __kasan_report mm/kasan/report.c:545 [inline] kasan_report+0x148/0x1e4 mm/kasan/report.c:562 check_memory_region_inline mm/kasan/generic.c:183 [inline] __asan_load8+0xb4/0xbc mm/kasan/generic.c:252 iov_iter_revert+0x11c/0x404 lib/iov_iter.c:1139 io_read fs/io_uring.c:3421 [inline] io_issue_sqe+0x2344/0x2d64 fs/io_uring.c:5943 __io_queue_sqe+0x19c/0x520 fs/io_uring.c:6260 io_queue_sqe+0x2a4/0x590 fs/io_uring.c:6326 io_submit_sqe fs/io_uring.c:6395 [inline] io_submit_sqes+0x4c0/0xa04 fs/io_uring.c:6624 __do_sys_io_uring_enter fs/io_uring.c:9013 [inline] __se_sys_io_uring_enter fs/io_uring.c:8960 [inline] __arm64_sys_io_uring_enter+0x190/0x708 fs/io_uring.c:8960 __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline] invoke_syscall arch/arm64/kernel/syscall.c:48 [inline] el0_svc_common arch/arm64/kernel/syscall.c:158 [inline] do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:227 el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367 el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383 el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670 Allocated by task 12570: stack_trace_save+0x80/0xb8 kernel/stacktrace.c:121 kasan_save_stack mm/kasan/common.c:48 [inline] kasan_set_track mm/kasan/common.c:56 [inline] __kasan_kmalloc+0xdc/0x120 mm/kasan/common.c:461 kasan_kmalloc+0xc/0x14 mm/kasan/common.c:475 __kmalloc+0x23c/0x334 mm/slub.c:3970 kmalloc include/linux/slab.h:557 [inline] __io_alloc_async_data+0x68/0x9c fs/io_uring.c:3210 io_setup_async_rw fs/io_uring.c:3229 [inline] io_read fs/io_uring.c:3436 [inline] io_issue_sqe+0x2954/0x2d64 fs/io_uring.c:5943 __io_queue_sqe+0x19c/0x520 fs/io_uring.c:6260 io_queue_sqe+0x2a4/0x590 fs/io_uring.c:6326 io_submit_sqe fs/io_uring.c:6395 [inline] io_submit_sqes+0x4c0/0xa04 fs/io_uring.c:6624 __do_sys_io_uring_enter fs/io_uring.c:9013 [inline] __se_sys_io_uring_enter fs/io_uring.c:8960 [inline] __arm64_sys_io_uring_enter+0x190/0x708 fs/io_uring.c:8960 __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline] invoke_syscall arch/arm64/kernel/syscall.c:48 [inline] el0_svc_common arch/arm64/kernel/syscall.c:158 [inline] do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:227 el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367 el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383 el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670 Freed by task 12570: stack_trace_save+0x80/0xb8 kernel/stacktrace.c:121 kasan_save_stack mm/kasan/common.c:48 [inline] kasan_set_track+0x38/0x6c mm/kasan/common.c:56 kasan_set_free_info+0x20/0x40 mm/kasan/generic.c:355 __kasan_slab_free+0x124/0x150 mm/kasan/common.c:422 kasan_slab_free+0x10/0x1c mm/kasan/common.c:431 slab_free_hook mm/slub.c:1544 [inline] 
 slab_free_freelist_hook mm/slub.c:1577 [inline]
 slab_free mm/slub.c:3142 [inline]
 kfree+0x104/0x38c mm/slub.c:4124
 io_dismantle_req fs/io_uring.c:1855 [inline]
 __io_free_req+0x70/0x254 fs/io_uring.c:1867
 io_put_req_find_next fs/io_uring.c:2173 [inline]
 __io_queue_sqe+0x1fc/0x520 fs/io_uring.c:6279
 __io_req_task_submit+0x154/0x21c fs/io_uring.c:2051
 io_req_task_submit+0x2c/0x44 fs/io_uring.c:2063
 task_work_run+0xdc/0x128 kernel/task_work.c:151
 get_signal+0x6f8/0x980 kernel/signal.c:2562
 do_signal+0x108/0x3a4 arch/arm64/kernel/signal.c:658
 do_notify_resume+0xbc/0x25c arch/arm64/kernel/signal.c:722
 work_pending+0xc/0x180

blkdev_read_iter can truncate the iov_iter's count, since count + pos may
exceed the size of the blkdev. This confuses io_read into thinking the iovec
has been consumed, and once io_read then does the iov_iter_revert we trigger
the slab-out-of-bounds. Fix it by re-expanding the count by the amount that
was truncated off.

blkdev_write_iter can trigger the problem too.

Signed-off-by: yangerkun <[email protected]>
---
 fs/block_dev.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 92ed7d5df677..788e1014576f 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1680,6 +1680,7 @@ ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	struct inode *bd_inode = bdev_file_inode(file);
 	loff_t size = i_size_read(bd_inode);
 	struct blk_plug plug;
+	size_t shorted = 0;
 	ssize_t ret;
 
 	if (bdev_read_only(I_BDEV(bd_inode)))
@@ -1697,12 +1698,17 @@ ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	if ((iocb->ki_flags & (IOCB_NOWAIT | IOCB_DIRECT)) == IOCB_NOWAIT)
 		return -EOPNOTSUPP;
 
-	iov_iter_truncate(from, size - iocb->ki_pos);
+	size -= iocb->ki_pos;
+	if (iov_iter_count(from) > size) {
+		shorted = iov_iter_count(from) - size;
+		iov_iter_truncate(from, size);
+	}
 
 	blk_start_plug(&plug);
 	ret = __generic_file_write_iter(iocb, from);
 	if (ret > 0)
 		ret = generic_write_sync(iocb, ret);
+	iov_iter_reexpand(from, iov_iter_count(from) + shorted);
 	blk_finish_plug(&plug);
 	return ret;
 }
@@ -1714,13 +1720,21 @@ ssize_t blkdev_read_iter(struct kiocb *iocb, struct iov_iter *to)
 	struct inode *bd_inode = bdev_file_inode(file);
 	loff_t size = i_size_read(bd_inode);
 	loff_t pos = iocb->ki_pos;
+	size_t shorted = 0;
+	ssize_t ret;
 
 	if (pos >= size)
 		return 0;
 
 	size -= pos;
-	iov_iter_truncate(to, size);
-	return generic_file_read_iter(iocb, to);
+	if (iov_iter_count(to) > size) {
+		shorted = iov_iter_count(to) - size;
+		iov_iter_truncate(to, size);
+	}
+
+	ret = generic_file_read_iter(iocb, to);
+	iov_iter_reexpand(to, iov_iter_count(to) + shorted);
+	return ret;
 }
 EXPORT_SYMBOL_GPL(blkdev_read_iter);
-- 
2.25.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread
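
For illustration, here is a simplified sketch of the caller-side pattern in io_read() that the truncation breaks. This is not the real fs/io_uring.c code (the helper name and surrounding error handling are made up), but the revert arithmetic is the point: the caller assumes ->read_iter() only ever advances the iterator, so a callee that silently shrinks iter->count makes the revert distance too large and iov_iter_revert() walks back past the start of the iovec array.

static ssize_t buffered_read_attempt(struct kiocb *kiocb, struct iov_iter *iter)
{
	/* how much the caller asked for before calling into the file */
	size_t requested = iov_iter_count(iter);
	ssize_t ret;

	ret = call_read_iter(kiocb->ki_filp, kiocb, iter);
	if (ret == -EAGAIN) {
		/*
		 * Undo whatever this attempt consumed so the request can be
		 * retried.  "requested - remaining" only equals the amount
		 * consumed if ->read_iter() never shrinks iter->count on its
		 * own; blkdev_read_iter() truncating the iter breaks that
		 * assumption and makes this revert distance too large.
		 */
		iov_iter_revert(iter, requested - iov_iter_count(iter));
		/* ... arrange for an async retry ... */
	}
	return ret;
}

With the patch above, blkdev_read_iter() re-expands the count before returning, so "requested - iov_iter_count(iter)" again reflects only what was actually consumed.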
* Re: [PATCH] block: reexpand iov_iter after read/write 2021-04-01 7:18 [PATCH] block: reexpand iov_iter after read/write yangerkun @ 2021-04-06 1:28 ` yangerkun 2021-04-06 11:04 ` Pavel Begunkov 2021-04-09 14:49 ` Pavel Begunkov 1 sibling, 1 reply; 18+ messages in thread From: yangerkun @ 2021-04-06 1:28 UTC (permalink / raw) To: viro, axboe, asml.silence; +Cc: linux-fsdevel, linux-block, io-uring Ping... 在 2021/4/1 15:18, yangerkun 写道: > We get a bug: > > BUG: KASAN: slab-out-of-bounds in iov_iter_revert+0x11c/0x404 > lib/iov_iter.c:1139 > Read of size 8 at addr ffff0000d3fb11f8 by task > > CPU: 0 PID: 12582 Comm: syz-executor.2 Not tainted > 5.10.0-00843-g352c8610ccd2 #2 > Hardware name: linux,dummy-virt (DT) > Call trace: > dump_backtrace+0x0/0x2d0 arch/arm64/kernel/stacktrace.c:132 > show_stack+0x28/0x34 arch/arm64/kernel/stacktrace.c:196 > __dump_stack lib/dump_stack.c:77 [inline] > dump_stack+0x110/0x164 lib/dump_stack.c:118 > print_address_description+0x78/0x5c8 mm/kasan/report.c:385 > __kasan_report mm/kasan/report.c:545 [inline] > kasan_report+0x148/0x1e4 mm/kasan/report.c:562 > check_memory_region_inline mm/kasan/generic.c:183 [inline] > __asan_load8+0xb4/0xbc mm/kasan/generic.c:252 > iov_iter_revert+0x11c/0x404 lib/iov_iter.c:1139 > io_read fs/io_uring.c:3421 [inline] > io_issue_sqe+0x2344/0x2d64 fs/io_uring.c:5943 > __io_queue_sqe+0x19c/0x520 fs/io_uring.c:6260 > io_queue_sqe+0x2a4/0x590 fs/io_uring.c:6326 > io_submit_sqe fs/io_uring.c:6395 [inline] > io_submit_sqes+0x4c0/0xa04 fs/io_uring.c:6624 > __do_sys_io_uring_enter fs/io_uring.c:9013 [inline] > __se_sys_io_uring_enter fs/io_uring.c:8960 [inline] > __arm64_sys_io_uring_enter+0x190/0x708 fs/io_uring.c:8960 > __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline] > invoke_syscall arch/arm64/kernel/syscall.c:48 [inline] > el0_svc_common arch/arm64/kernel/syscall.c:158 [inline] > do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:227 > el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367 > el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383 > el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670 > > Allocated by task 12570: > stack_trace_save+0x80/0xb8 kernel/stacktrace.c:121 > kasan_save_stack mm/kasan/common.c:48 [inline] > kasan_set_track mm/kasan/common.c:56 [inline] > __kasan_kmalloc+0xdc/0x120 mm/kasan/common.c:461 > kasan_kmalloc+0xc/0x14 mm/kasan/common.c:475 > __kmalloc+0x23c/0x334 mm/slub.c:3970 > kmalloc include/linux/slab.h:557 [inline] > __io_alloc_async_data+0x68/0x9c fs/io_uring.c:3210 > io_setup_async_rw fs/io_uring.c:3229 [inline] > io_read fs/io_uring.c:3436 [inline] > io_issue_sqe+0x2954/0x2d64 fs/io_uring.c:5943 > __io_queue_sqe+0x19c/0x520 fs/io_uring.c:6260 > io_queue_sqe+0x2a4/0x590 fs/io_uring.c:6326 > io_submit_sqe fs/io_uring.c:6395 [inline] > io_submit_sqes+0x4c0/0xa04 fs/io_uring.c:6624 > __do_sys_io_uring_enter fs/io_uring.c:9013 [inline] > __se_sys_io_uring_enter fs/io_uring.c:8960 [inline] > __arm64_sys_io_uring_enter+0x190/0x708 fs/io_uring.c:8960 > __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline] > invoke_syscall arch/arm64/kernel/syscall.c:48 [inline] > el0_svc_common arch/arm64/kernel/syscall.c:158 [inline] > do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:227 > el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367 > el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383 > el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670 > > Freed by task 12570: > stack_trace_save+0x80/0xb8 kernel/stacktrace.c:121 > kasan_save_stack mm/kasan/common.c:48 [inline] > 
kasan_set_track+0x38/0x6c mm/kasan/common.c:56 > kasan_set_free_info+0x20/0x40 mm/kasan/generic.c:355 > __kasan_slab_free+0x124/0x150 mm/kasan/common.c:422 > kasan_slab_free+0x10/0x1c mm/kasan/common.c:431 > slab_free_hook mm/slub.c:1544 [inline] > slab_free_freelist_hook mm/slub.c:1577 [inline] > slab_free mm/slub.c:3142 [inline] > kfree+0x104/0x38c mm/slub.c:4124 > io_dismantle_req fs/io_uring.c:1855 [inline] > __io_free_req+0x70/0x254 fs/io_uring.c:1867 > io_put_req_find_next fs/io_uring.c:2173 [inline] > __io_queue_sqe+0x1fc/0x520 fs/io_uring.c:6279 > __io_req_task_submit+0x154/0x21c fs/io_uring.c:2051 > io_req_task_submit+0x2c/0x44 fs/io_uring.c:2063 > task_work_run+0xdc/0x128 kernel/task_work.c:151 > get_signal+0x6f8/0x980 kernel/signal.c:2562 > do_signal+0x108/0x3a4 arch/arm64/kernel/signal.c:658 > do_notify_resume+0xbc/0x25c arch/arm64/kernel/signal.c:722 > work_pending+0xc/0x180 > > blkdev_read_iter can truncate iov_iter's count since the count + pos may > exceed the size of the blkdev. This will confuse io_read that we have > consume the iovec. And once we do the iov_iter_revert in io_read, we > will trigger the slab-out-of-bounds. Fix it by reexpand the count with > size has been truncated. > > blkdev_write_iter can trigger the problem too. > > Signed-off-by: yangerkun <[email protected]> > --- > fs/block_dev.c | 20 +++++++++++++++++--- > 1 file changed, 17 insertions(+), 3 deletions(-) > > diff --git a/fs/block_dev.c b/fs/block_dev.c > index 92ed7d5df677..788e1014576f 100644 > --- a/fs/block_dev.c > +++ b/fs/block_dev.c > @@ -1680,6 +1680,7 @@ ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from) > struct inode *bd_inode = bdev_file_inode(file); > loff_t size = i_size_read(bd_inode); > struct blk_plug plug; > + size_t shorted = 0; > ssize_t ret; > > if (bdev_read_only(I_BDEV(bd_inode))) > @@ -1697,12 +1698,17 @@ ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from) > if ((iocb->ki_flags & (IOCB_NOWAIT | IOCB_DIRECT)) == IOCB_NOWAIT) > return -EOPNOTSUPP; > > - iov_iter_truncate(from, size - iocb->ki_pos); > + size -= iocb->ki_pos; > + if (iov_iter_count(from) > size) { > + shorted = iov_iter_count(from) - size; > + iov_iter_truncate(from, size); > + } > > blk_start_plug(&plug); > ret = __generic_file_write_iter(iocb, from); > if (ret > 0) > ret = generic_write_sync(iocb, ret); > + iov_iter_reexpand(from, iov_iter_count(from) + shorted); > blk_finish_plug(&plug); > return ret; > } > @@ -1714,13 +1720,21 @@ ssize_t blkdev_read_iter(struct kiocb *iocb, struct iov_iter *to) > struct inode *bd_inode = bdev_file_inode(file); > loff_t size = i_size_read(bd_inode); > loff_t pos = iocb->ki_pos; > + size_t shorted = 0; > + ssize_t ret; > > if (pos >= size) > return 0; > > size -= pos; > - iov_iter_truncate(to, size); > - return generic_file_read_iter(iocb, to); > + if (iov_iter_count(to) > size) { > + shorted = iov_iter_count(to) - size; > + iov_iter_truncate(to, size); > + } > + > + ret = generic_file_read_iter(iocb, to); > + iov_iter_reexpand(to, iov_iter_count(to) + shorted); > + return ret; > } > EXPORT_SYMBOL_GPL(blkdev_read_iter); > > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] block: reexpand iov_iter after read/write 2021-04-06 1:28 ` yangerkun @ 2021-04-06 11:04 ` Pavel Begunkov 2021-04-07 14:16 ` yangerkun 0 siblings, 1 reply; 18+ messages in thread From: Pavel Begunkov @ 2021-04-06 11:04 UTC (permalink / raw) To: yangerkun, viro, axboe; +Cc: linux-fsdevel, linux-block, io-uring On 06/04/2021 02:28, yangerkun wrote: > Ping... It wasn't forgotten, but wouln't have worked because of other reasons. With these two already queued, that's a different story. https://git.kernel.dk/cgit/linux-block/commit/?h=io_uring-5.12&id=07204f21577a1d882f0259590c3553fe6a476381 https://git.kernel.dk/cgit/linux-block/commit/?h=io_uring-5.12&id=230d50d448acb6639991440913299e50cacf1daf Can you re-confirm, that the bug is still there (should be) and your patch fixes it? > > 在 2021/4/1 15:18, yangerkun 写道: >> We get a bug: >> >> BUG: KASAN: slab-out-of-bounds in iov_iter_revert+0x11c/0x404 >> lib/iov_iter.c:1139 >> Read of size 8 at addr ffff0000d3fb11f8 by task >> >> CPU: 0 PID: 12582 Comm: syz-executor.2 Not tainted >> 5.10.0-00843-g352c8610ccd2 #2 >> Hardware name: linux,dummy-virt (DT) >> Call trace: >> dump_backtrace+0x0/0x2d0 arch/arm64/kernel/stacktrace.c:132 >> show_stack+0x28/0x34 arch/arm64/kernel/stacktrace.c:196 >> __dump_stack lib/dump_stack.c:77 [inline] >> dump_stack+0x110/0x164 lib/dump_stack.c:118 >> print_address_description+0x78/0x5c8 mm/kasan/report.c:385 >> __kasan_report mm/kasan/report.c:545 [inline] >> kasan_report+0x148/0x1e4 mm/kasan/report.c:562 >> check_memory_region_inline mm/kasan/generic.c:183 [inline] >> __asan_load8+0xb4/0xbc mm/kasan/generic.c:252 >> iov_iter_revert+0x11c/0x404 lib/iov_iter.c:1139 >> io_read fs/io_uring.c:3421 [inline] >> io_issue_sqe+0x2344/0x2d64 fs/io_uring.c:5943 >> __io_queue_sqe+0x19c/0x520 fs/io_uring.c:6260 >> io_queue_sqe+0x2a4/0x590 fs/io_uring.c:6326 >> io_submit_sqe fs/io_uring.c:6395 [inline] >> io_submit_sqes+0x4c0/0xa04 fs/io_uring.c:6624 >> __do_sys_io_uring_enter fs/io_uring.c:9013 [inline] >> __se_sys_io_uring_enter fs/io_uring.c:8960 [inline] >> __arm64_sys_io_uring_enter+0x190/0x708 fs/io_uring.c:8960 >> __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline] >> invoke_syscall arch/arm64/kernel/syscall.c:48 [inline] >> el0_svc_common arch/arm64/kernel/syscall.c:158 [inline] >> do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:227 >> el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367 >> el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383 >> el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670 >> >> Allocated by task 12570: >> stack_trace_save+0x80/0xb8 kernel/stacktrace.c:121 >> kasan_save_stack mm/kasan/common.c:48 [inline] >> kasan_set_track mm/kasan/common.c:56 [inline] >> __kasan_kmalloc+0xdc/0x120 mm/kasan/common.c:461 >> kasan_kmalloc+0xc/0x14 mm/kasan/common.c:475 >> __kmalloc+0x23c/0x334 mm/slub.c:3970 >> kmalloc include/linux/slab.h:557 [inline] >> __io_alloc_async_data+0x68/0x9c fs/io_uring.c:3210 >> io_setup_async_rw fs/io_uring.c:3229 [inline] >> io_read fs/io_uring.c:3436 [inline] >> io_issue_sqe+0x2954/0x2d64 fs/io_uring.c:5943 >> __io_queue_sqe+0x19c/0x520 fs/io_uring.c:6260 >> io_queue_sqe+0x2a4/0x590 fs/io_uring.c:6326 >> io_submit_sqe fs/io_uring.c:6395 [inline] >> io_submit_sqes+0x4c0/0xa04 fs/io_uring.c:6624 >> __do_sys_io_uring_enter fs/io_uring.c:9013 [inline] >> __se_sys_io_uring_enter fs/io_uring.c:8960 [inline] >> __arm64_sys_io_uring_enter+0x190/0x708 fs/io_uring.c:8960 >> __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline] >> invoke_syscall 
arch/arm64/kernel/syscall.c:48 [inline] >> el0_svc_common arch/arm64/kernel/syscall.c:158 [inline] >> do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:227 >> el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367 >> el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383 >> el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670 >> >> Freed by task 12570: >> stack_trace_save+0x80/0xb8 kernel/stacktrace.c:121 >> kasan_save_stack mm/kasan/common.c:48 [inline] >> kasan_set_track+0x38/0x6c mm/kasan/common.c:56 >> kasan_set_free_info+0x20/0x40 mm/kasan/generic.c:355 >> __kasan_slab_free+0x124/0x150 mm/kasan/common.c:422 >> kasan_slab_free+0x10/0x1c mm/kasan/common.c:431 >> slab_free_hook mm/slub.c:1544 [inline] >> slab_free_freelist_hook mm/slub.c:1577 [inline] >> slab_free mm/slub.c:3142 [inline] >> kfree+0x104/0x38c mm/slub.c:4124 >> io_dismantle_req fs/io_uring.c:1855 [inline] >> __io_free_req+0x70/0x254 fs/io_uring.c:1867 >> io_put_req_find_next fs/io_uring.c:2173 [inline] >> __io_queue_sqe+0x1fc/0x520 fs/io_uring.c:6279 >> __io_req_task_submit+0x154/0x21c fs/io_uring.c:2051 >> io_req_task_submit+0x2c/0x44 fs/io_uring.c:2063 >> task_work_run+0xdc/0x128 kernel/task_work.c:151 >> get_signal+0x6f8/0x980 kernel/signal.c:2562 >> do_signal+0x108/0x3a4 arch/arm64/kernel/signal.c:658 >> do_notify_resume+0xbc/0x25c arch/arm64/kernel/signal.c:722 >> work_pending+0xc/0x180 >> >> blkdev_read_iter can truncate iov_iter's count since the count + pos may >> exceed the size of the blkdev. This will confuse io_read that we have >> consume the iovec. And once we do the iov_iter_revert in io_read, we >> will trigger the slab-out-of-bounds. Fix it by reexpand the count with >> size has been truncated. >> >> blkdev_write_iter can trigger the problem too. >> >> Signed-off-by: yangerkun <[email protected]> >> --- >> fs/block_dev.c | 20 +++++++++++++++++--- >> 1 file changed, 17 insertions(+), 3 deletions(-) >> >> diff --git a/fs/block_dev.c b/fs/block_dev.c >> index 92ed7d5df677..788e1014576f 100644 >> --- a/fs/block_dev.c >> +++ b/fs/block_dev.c >> @@ -1680,6 +1680,7 @@ ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from) >> struct inode *bd_inode = bdev_file_inode(file); >> loff_t size = i_size_read(bd_inode); >> struct blk_plug plug; >> + size_t shorted = 0; >> ssize_t ret; >> if (bdev_read_only(I_BDEV(bd_inode))) >> @@ -1697,12 +1698,17 @@ ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from) >> if ((iocb->ki_flags & (IOCB_NOWAIT | IOCB_DIRECT)) == IOCB_NOWAIT) >> return -EOPNOTSUPP; >> - iov_iter_truncate(from, size - iocb->ki_pos); >> + size -= iocb->ki_pos; >> + if (iov_iter_count(from) > size) { >> + shorted = iov_iter_count(from) - size; >> + iov_iter_truncate(from, size); >> + } >> blk_start_plug(&plug); >> ret = __generic_file_write_iter(iocb, from); >> if (ret > 0) >> ret = generic_write_sync(iocb, ret); >> + iov_iter_reexpand(from, iov_iter_count(from) + shorted); >> blk_finish_plug(&plug); >> return ret; >> } >> @@ -1714,13 +1720,21 @@ ssize_t blkdev_read_iter(struct kiocb *iocb, struct iov_iter *to) >> struct inode *bd_inode = bdev_file_inode(file); >> loff_t size = i_size_read(bd_inode); >> loff_t pos = iocb->ki_pos; >> + size_t shorted = 0; >> + ssize_t ret; >> if (pos >= size) >> return 0; >> size -= pos; >> - iov_iter_truncate(to, size); >> - return generic_file_read_iter(iocb, to); >> + if (iov_iter_count(to) > size) { >> + shorted = iov_iter_count(to) - size; >> + iov_iter_truncate(to, size); >> + } >> + >> + ret = generic_file_read_iter(iocb, to); >> + 
iov_iter_reexpand(to, iov_iter_count(to) + shorted); >> + return ret; >> } >> EXPORT_SYMBOL_GPL(blkdev_read_iter); >> -- Pavel Begunkov ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] block: reexpand iov_iter after read/write 2021-04-06 11:04 ` Pavel Begunkov @ 2021-04-07 14:16 ` yangerkun 0 siblings, 0 replies; 18+ messages in thread From: yangerkun @ 2021-04-07 14:16 UTC (permalink / raw) To: Pavel Begunkov, viro, axboe; +Cc: linux-fsdevel, linux-block, io-uring 在 2021/4/6 19:04, Pavel Begunkov 写道: > On 06/04/2021 02:28, yangerkun wrote: >> Ping... > > It wasn't forgotten, but wouln't have worked because of > other reasons. With these two already queued, that's a > different story. > > https://git.kernel.dk/cgit/linux-block/commit/?h=io_uring-5.12&id=07204f21577a1d882f0259590c3553fe6a476381 > https://git.kernel.dk/cgit/linux-block/commit/?h=io_uring-5.12&id=230d50d448acb6639991440913299e50cacf1daf > > Can you re-confirm, that the bug is still there (should be) > and your patch fixes it? Hi, This problem still exists in mainline (2d743660786e Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs), and this patch will fix it. The io_read for loop will return -EAGAIN. This will lead a iov_iter_revert in io_read. Once we truncate iov_iter in blkdev_read_iter, we will see this bug... [ 181.204371][ T4241] loop0: detected capacity change from 0 to 232 [ 181.253683][ T4241] ================================================================== [ 181.255313][ T4241] BUG: KASAN: slab-out-of-bounds in iov_iter_revert+0xd0/0x3e0 [ 181.256723][ T4241] Read of size 8 at addr ffff0000cfbc8ff8 by task a.out/4241 [ 181.257776][ T4241] [ 181.258749][ T4241] CPU: 5 PID: 4241 Comm: a.out Not tainted 5.12.0-rc6-00006-g2d743660786e #1 [ 181.260149][ T4241] Hardware name: linux,dummy-virt (DT) [ 181.261468][ T4241] Call trace: [ 181.262052][ T4241] dump_backtrace+0x0/0x348 [ 181.263139][ T4241] show_stack+0x28/0x38 [ 181.264234][ T4241] dump_stack+0x134/0x1a4 [ 181.265175][ T4241] print_address_description.constprop.0+0x68/0x304 [ 181.266430][ T4241] kasan_report+0x1d0/0x238 [ 181.267308][ T4241] __asan_load8+0x88/0xc0 [ 181.268317][ T4241] iov_iter_revert+0xd0/0x3e0 [ 181.269251][ T4241] io_read+0x310/0x5c0 [ 181.270208][ T4241] io_issue_sqe+0x3fc/0x25d8 [ 181.271134][ T4241] __io_queue_sqe+0xf8/0x480 [ 181.272142][ T4241] io_queue_sqe+0x3a4/0x4c8 [ 181.273053][ T4241] io_submit_sqes+0xd9c/0x22d0 [ 181.274375][ T4241] __arm64_sys_io_uring_enter+0x3d0/0xce0 [ 181.275554][ T4241] do_el0_svc+0xc4/0x228 [ 181.276411][ T4241] el0_svc+0x24/0x30 [ 181.277323][ T4241] el0_sync_handler+0x158/0x160 [ 181.278241][ T4241] el0_sync+0x13c/0x140 [ 181.279287][ T4241] [ 181.279820][ T4241] Allocated by task 4241: [ 181.280699][ T4241] kasan_save_stack+0x24/0x50 [ 181.281626][ T4241] __kasan_kmalloc+0x84/0xa8 [ 181.282578][ T4241] io_wq_create+0x94/0x668 [ 181.283469][ T4241] io_uring_alloc_task_context+0x164/0x368 [ 181.284748][ T4241] io_uring_add_task_file+0x1b0/0x208 [ 181.285865][ T4241] io_uring_setup+0xaac/0x12a0 [ 181.286823][ T4241] __arm64_sys_io_uring_setup+0x34/0x40 [ 181.287957][ T4241] do_el0_svc+0xc4/0x228 [ 181.288906][ T4241] el0_svc+0x24/0x30 [ 181.289816][ T4241] el0_sync_handler+0x158/0x160 [ 181.290751][ T4241] el0_sync+0x13c/0x140 [ 181.291697][ T4241] > >> >> 在 2021/4/1 15:18, yangerkun 写道: >>> We get a bug: >>> >>> BUG: KASAN: slab-out-of-bounds in iov_iter_revert+0x11c/0x404 >>> lib/iov_iter.c:1139 >>> Read of size 8 at addr ffff0000d3fb11f8 by task >>> >>> CPU: 0 PID: 12582 Comm: syz-executor.2 Not tainted >>> 5.10.0-00843-g352c8610ccd2 #2 >>> Hardware name: linux,dummy-virt (DT) >>> Call trace: >>> dump_backtrace+0x0/0x2d0 
arch/arm64/kernel/stacktrace.c:132 >>> show_stack+0x28/0x34 arch/arm64/kernel/stacktrace.c:196 >>> __dump_stack lib/dump_stack.c:77 [inline] >>> dump_stack+0x110/0x164 lib/dump_stack.c:118 >>> print_address_description+0x78/0x5c8 mm/kasan/report.c:385 >>> __kasan_report mm/kasan/report.c:545 [inline] >>> kasan_report+0x148/0x1e4 mm/kasan/report.c:562 >>> check_memory_region_inline mm/kasan/generic.c:183 [inline] >>> __asan_load8+0xb4/0xbc mm/kasan/generic.c:252 >>> iov_iter_revert+0x11c/0x404 lib/iov_iter.c:1139 >>> io_read fs/io_uring.c:3421 [inline] >>> io_issue_sqe+0x2344/0x2d64 fs/io_uring.c:5943 >>> __io_queue_sqe+0x19c/0x520 fs/io_uring.c:6260 >>> io_queue_sqe+0x2a4/0x590 fs/io_uring.c:6326 >>> io_submit_sqe fs/io_uring.c:6395 [inline] >>> io_submit_sqes+0x4c0/0xa04 fs/io_uring.c:6624 >>> __do_sys_io_uring_enter fs/io_uring.c:9013 [inline] >>> __se_sys_io_uring_enter fs/io_uring.c:8960 [inline] >>> __arm64_sys_io_uring_enter+0x190/0x708 fs/io_uring.c:8960 >>> __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline] >>> invoke_syscall arch/arm64/kernel/syscall.c:48 [inline] >>> el0_svc_common arch/arm64/kernel/syscall.c:158 [inline] >>> do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:227 >>> el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367 >>> el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383 >>> el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670 >>> >>> Allocated by task 12570: >>> stack_trace_save+0x80/0xb8 kernel/stacktrace.c:121 >>> kasan_save_stack mm/kasan/common.c:48 [inline] >>> kasan_set_track mm/kasan/common.c:56 [inline] >>> __kasan_kmalloc+0xdc/0x120 mm/kasan/common.c:461 >>> kasan_kmalloc+0xc/0x14 mm/kasan/common.c:475 >>> __kmalloc+0x23c/0x334 mm/slub.c:3970 >>> kmalloc include/linux/slab.h:557 [inline] >>> __io_alloc_async_data+0x68/0x9c fs/io_uring.c:3210 >>> io_setup_async_rw fs/io_uring.c:3229 [inline] >>> io_read fs/io_uring.c:3436 [inline] >>> io_issue_sqe+0x2954/0x2d64 fs/io_uring.c:5943 >>> __io_queue_sqe+0x19c/0x520 fs/io_uring.c:6260 >>> io_queue_sqe+0x2a4/0x590 fs/io_uring.c:6326 >>> io_submit_sqe fs/io_uring.c:6395 [inline] >>> io_submit_sqes+0x4c0/0xa04 fs/io_uring.c:6624 >>> __do_sys_io_uring_enter fs/io_uring.c:9013 [inline] >>> __se_sys_io_uring_enter fs/io_uring.c:8960 [inline] >>> __arm64_sys_io_uring_enter+0x190/0x708 fs/io_uring.c:8960 >>> __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline] >>> invoke_syscall arch/arm64/kernel/syscall.c:48 [inline] >>> el0_svc_common arch/arm64/kernel/syscall.c:158 [inline] >>> do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:227 >>> el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367 >>> el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383 >>> el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670 >>> >>> Freed by task 12570: >>> stack_trace_save+0x80/0xb8 kernel/stacktrace.c:121 >>> kasan_save_stack mm/kasan/common.c:48 [inline] >>> kasan_set_track+0x38/0x6c mm/kasan/common.c:56 >>> kasan_set_free_info+0x20/0x40 mm/kasan/generic.c:355 >>> __kasan_slab_free+0x124/0x150 mm/kasan/common.c:422 >>> kasan_slab_free+0x10/0x1c mm/kasan/common.c:431 >>> slab_free_hook mm/slub.c:1544 [inline] >>> slab_free_freelist_hook mm/slub.c:1577 [inline] >>> slab_free mm/slub.c:3142 [inline] >>> kfree+0x104/0x38c mm/slub.c:4124 >>> io_dismantle_req fs/io_uring.c:1855 [inline] >>> __io_free_req+0x70/0x254 fs/io_uring.c:1867 >>> io_put_req_find_next fs/io_uring.c:2173 [inline] >>> __io_queue_sqe+0x1fc/0x520 fs/io_uring.c:6279 >>> __io_req_task_submit+0x154/0x21c fs/io_uring.c:2051 >>> 
io_req_task_submit+0x2c/0x44 fs/io_uring.c:2063 >>> task_work_run+0xdc/0x128 kernel/task_work.c:151 >>> get_signal+0x6f8/0x980 kernel/signal.c:2562 >>> do_signal+0x108/0x3a4 arch/arm64/kernel/signal.c:658 >>> do_notify_resume+0xbc/0x25c arch/arm64/kernel/signal.c:722 >>> work_pending+0xc/0x180 >>> >>> blkdev_read_iter can truncate iov_iter's count since the count + pos may >>> exceed the size of the blkdev. This will confuse io_read that we have >>> consume the iovec. And once we do the iov_iter_revert in io_read, we >>> will trigger the slab-out-of-bounds. Fix it by reexpand the count with >>> size has been truncated. >>> >>> blkdev_write_iter can trigger the problem too. >>> >>> Signed-off-by: yangerkun <[email protected]> >>> --- >>> fs/block_dev.c | 20 +++++++++++++++++--- >>> 1 file changed, 17 insertions(+), 3 deletions(-) >>> >>> diff --git a/fs/block_dev.c b/fs/block_dev.c >>> index 92ed7d5df677..788e1014576f 100644 >>> --- a/fs/block_dev.c >>> +++ b/fs/block_dev.c >>> @@ -1680,6 +1680,7 @@ ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from) >>> struct inode *bd_inode = bdev_file_inode(file); >>> loff_t size = i_size_read(bd_inode); >>> struct blk_plug plug; >>> + size_t shorted = 0; >>> ssize_t ret; >>> if (bdev_read_only(I_BDEV(bd_inode))) >>> @@ -1697,12 +1698,17 @@ ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from) >>> if ((iocb->ki_flags & (IOCB_NOWAIT | IOCB_DIRECT)) == IOCB_NOWAIT) >>> return -EOPNOTSUPP; >>> - iov_iter_truncate(from, size - iocb->ki_pos); >>> + size -= iocb->ki_pos; >>> + if (iov_iter_count(from) > size) { >>> + shorted = iov_iter_count(from) - size; >>> + iov_iter_truncate(from, size); >>> + } >>> blk_start_plug(&plug); >>> ret = __generic_file_write_iter(iocb, from); >>> if (ret > 0) >>> ret = generic_write_sync(iocb, ret); >>> + iov_iter_reexpand(from, iov_iter_count(from) + shorted); >>> blk_finish_plug(&plug); >>> return ret; >>> } >>> @@ -1714,13 +1720,21 @@ ssize_t blkdev_read_iter(struct kiocb *iocb, struct iov_iter *to) >>> struct inode *bd_inode = bdev_file_inode(file); >>> loff_t size = i_size_read(bd_inode); >>> loff_t pos = iocb->ki_pos; >>> + size_t shorted = 0; >>> + ssize_t ret; >>> if (pos >= size) >>> return 0; >>> size -= pos; >>> - iov_iter_truncate(to, size); >>> - return generic_file_read_iter(iocb, to); >>> + if (iov_iter_count(to) > size) { >>> + shorted = iov_iter_count(to) - size; >>> + iov_iter_truncate(to, size); >>> + } >>> + >>> + ret = generic_file_read_iter(iocb, to); >>> + iov_iter_reexpand(to, iov_iter_count(to) + shorted); >>> + return ret; >>> } >>> EXPORT_SYMBOL_GPL(blkdev_read_iter); >>> > ^ permalink raw reply [flat|nested] 18+ messages in thread
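
For reference, a rough userspace sketch of the kind of reproducer the trace above implies. This is not the actual syzkaller program: the device path, buffer size and offset are made up, and actually hitting the KASAN splat also requires the request to take io_read()'s -EAGAIN / iov_iter_revert() path, which depends on timing and kernel version. The liburing calls themselves are the standard API.

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <liburing.h>

int main(int argc, char **argv)
{
	const char *dev = argc > 1 ? argv[1] : "/dev/loop0";	/* small backing device */
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	void *buf;
	int fd;

	fd = open(dev, O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (posix_memalign(&buf, 4096, 1 << 20) || io_uring_queue_init(8, &ring, 0) < 0)
		return 1;

	sqe = io_uring_get_sqe(&ring);
	/*
	 * Ask for 1 MiB starting just below the end of a ~116 KiB device
	 * (232 sectors, as in the dmesg line above), so count + pos exceeds
	 * i_size and blkdev_read_iter() has to truncate the iov_iter.
	 */
	io_uring_prep_read(sqe, fd, buf, 1 << 20, 100 * 1024);
	io_uring_submit(&ring);

	if (!io_uring_wait_cqe(&ring, &cqe)) {
		printf("read returned %d\n", cqe->res);
		io_uring_cqe_seen(&ring, cqe);
	}
	io_uring_queue_exit(&ring);
	return 0;
}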
* Re: [PATCH] block: reexpand iov_iter after read/write 2021-04-01 7:18 [PATCH] block: reexpand iov_iter after read/write yangerkun 2021-04-06 1:28 ` yangerkun @ 2021-04-09 14:49 ` Pavel Begunkov 2021-04-15 17:37 ` Pavel Begunkov 1 sibling, 1 reply; 18+ messages in thread From: Pavel Begunkov @ 2021-04-09 14:49 UTC (permalink / raw) To: yangerkun, axboe; +Cc: viro, linux-fsdevel, linux-block, io-uring On 01/04/2021 08:18, yangerkun wrote: > We get a bug: > > BUG: KASAN: slab-out-of-bounds in iov_iter_revert+0x11c/0x404 > lib/iov_iter.c:1139 > Read of size 8 at addr ffff0000d3fb11f8 by task > > CPU: 0 PID: 12582 Comm: syz-executor.2 Not tainted > 5.10.0-00843-g352c8610ccd2 #2 > Hardware name: linux,dummy-virt (DT) > Call trace: > dump_backtrace+0x0/0x2d0 arch/arm64/kernel/stacktrace.c:132 > show_stack+0x28/0x34 arch/arm64/kernel/stacktrace.c:196 > __dump_stack lib/dump_stack.c:77 [inline] > dump_stack+0x110/0x164 lib/dump_stack.c:118 > print_address_description+0x78/0x5c8 mm/kasan/report.c:385 > __kasan_report mm/kasan/report.c:545 [inline] > kasan_report+0x148/0x1e4 mm/kasan/report.c:562 > check_memory_region_inline mm/kasan/generic.c:183 [inline] > __asan_load8+0xb4/0xbc mm/kasan/generic.c:252 > iov_iter_revert+0x11c/0x404 lib/iov_iter.c:1139 > io_read fs/io_uring.c:3421 [inline] > io_issue_sqe+0x2344/0x2d64 fs/io_uring.c:5943 > __io_queue_sqe+0x19c/0x520 fs/io_uring.c:6260 > io_queue_sqe+0x2a4/0x590 fs/io_uring.c:6326 > io_submit_sqe fs/io_uring.c:6395 [inline] > io_submit_sqes+0x4c0/0xa04 fs/io_uring.c:6624 > __do_sys_io_uring_enter fs/io_uring.c:9013 [inline] > __se_sys_io_uring_enter fs/io_uring.c:8960 [inline] > __arm64_sys_io_uring_enter+0x190/0x708 fs/io_uring.c:8960 > __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline] > invoke_syscall arch/arm64/kernel/syscall.c:48 [inline] > el0_svc_common arch/arm64/kernel/syscall.c:158 [inline] > do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:227 > el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367 > el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383 > el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670 > > Allocated by task 12570: > stack_trace_save+0x80/0xb8 kernel/stacktrace.c:121 > kasan_save_stack mm/kasan/common.c:48 [inline] > kasan_set_track mm/kasan/common.c:56 [inline] > __kasan_kmalloc+0xdc/0x120 mm/kasan/common.c:461 > kasan_kmalloc+0xc/0x14 mm/kasan/common.c:475 > __kmalloc+0x23c/0x334 mm/slub.c:3970 > kmalloc include/linux/slab.h:557 [inline] > __io_alloc_async_data+0x68/0x9c fs/io_uring.c:3210 > io_setup_async_rw fs/io_uring.c:3229 [inline] > io_read fs/io_uring.c:3436 [inline] > io_issue_sqe+0x2954/0x2d64 fs/io_uring.c:5943 > __io_queue_sqe+0x19c/0x520 fs/io_uring.c:6260 > io_queue_sqe+0x2a4/0x590 fs/io_uring.c:6326 > io_submit_sqe fs/io_uring.c:6395 [inline] > io_submit_sqes+0x4c0/0xa04 fs/io_uring.c:6624 > __do_sys_io_uring_enter fs/io_uring.c:9013 [inline] > __se_sys_io_uring_enter fs/io_uring.c:8960 [inline] > __arm64_sys_io_uring_enter+0x190/0x708 fs/io_uring.c:8960 > __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline] > invoke_syscall arch/arm64/kernel/syscall.c:48 [inline] > el0_svc_common arch/arm64/kernel/syscall.c:158 [inline] > do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:227 > el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367 > el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383 > el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670 > > Freed by task 12570: > stack_trace_save+0x80/0xb8 kernel/stacktrace.c:121 > kasan_save_stack mm/kasan/common.c:48 [inline] > 
kasan_set_track+0x38/0x6c mm/kasan/common.c:56 > kasan_set_free_info+0x20/0x40 mm/kasan/generic.c:355 > __kasan_slab_free+0x124/0x150 mm/kasan/common.c:422 > kasan_slab_free+0x10/0x1c mm/kasan/common.c:431 > slab_free_hook mm/slub.c:1544 [inline] > slab_free_freelist_hook mm/slub.c:1577 [inline] > slab_free mm/slub.c:3142 [inline] > kfree+0x104/0x38c mm/slub.c:4124 > io_dismantle_req fs/io_uring.c:1855 [inline] > __io_free_req+0x70/0x254 fs/io_uring.c:1867 > io_put_req_find_next fs/io_uring.c:2173 [inline] > __io_queue_sqe+0x1fc/0x520 fs/io_uring.c:6279 > __io_req_task_submit+0x154/0x21c fs/io_uring.c:2051 > io_req_task_submit+0x2c/0x44 fs/io_uring.c:2063 > task_work_run+0xdc/0x128 kernel/task_work.c:151 > get_signal+0x6f8/0x980 kernel/signal.c:2562 > do_signal+0x108/0x3a4 arch/arm64/kernel/signal.c:658 > do_notify_resume+0xbc/0x25c arch/arm64/kernel/signal.c:722 > work_pending+0xc/0x180 > > blkdev_read_iter can truncate iov_iter's count since the count + pos may > exceed the size of the blkdev. This will confuse io_read that we have > consume the iovec. And once we do the iov_iter_revert in io_read, we > will trigger the slab-out-of-bounds. Fix it by reexpand the count with > size has been truncated. Looks right, Acked-by: Pavel Begunkov <[email protected]> > > blkdev_write_iter can trigger the problem too. > > Signed-off-by: yangerkun <[email protected]> > --- > fs/block_dev.c | 20 +++++++++++++++++--- > 1 file changed, 17 insertions(+), 3 deletions(-) > > diff --git a/fs/block_dev.c b/fs/block_dev.c > index 92ed7d5df677..788e1014576f 100644 > --- a/fs/block_dev.c > +++ b/fs/block_dev.c > @@ -1680,6 +1680,7 @@ ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from) > struct inode *bd_inode = bdev_file_inode(file); > loff_t size = i_size_read(bd_inode); > struct blk_plug plug; > + size_t shorted = 0; > ssize_t ret; > > if (bdev_read_only(I_BDEV(bd_inode))) > @@ -1697,12 +1698,17 @@ ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from) > if ((iocb->ki_flags & (IOCB_NOWAIT | IOCB_DIRECT)) == IOCB_NOWAIT) > return -EOPNOTSUPP; > > - iov_iter_truncate(from, size - iocb->ki_pos); > + size -= iocb->ki_pos; > + if (iov_iter_count(from) > size) { > + shorted = iov_iter_count(from) - size; > + iov_iter_truncate(from, size); > + } > > blk_start_plug(&plug); > ret = __generic_file_write_iter(iocb, from); > if (ret > 0) > ret = generic_write_sync(iocb, ret); > + iov_iter_reexpand(from, iov_iter_count(from) + shorted); > blk_finish_plug(&plug); > return ret; > } > @@ -1714,13 +1720,21 @@ ssize_t blkdev_read_iter(struct kiocb *iocb, struct iov_iter *to) > struct inode *bd_inode = bdev_file_inode(file); > loff_t size = i_size_read(bd_inode); > loff_t pos = iocb->ki_pos; > + size_t shorted = 0; > + ssize_t ret; > > if (pos >= size) > return 0; > > size -= pos; > - iov_iter_truncate(to, size); > - return generic_file_read_iter(iocb, to); > + if (iov_iter_count(to) > size) { > + shorted = iov_iter_count(to) - size; > + iov_iter_truncate(to, size); > + } > + > + ret = generic_file_read_iter(iocb, to); > + iov_iter_reexpand(to, iov_iter_count(to) + shorted); > + return ret; > } > EXPORT_SYMBOL_GPL(blkdev_read_iter); > > -- Pavel Begunkov ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] block: reexpand iov_iter after read/write 2021-04-09 14:49 ` Pavel Begunkov @ 2021-04-15 17:37 ` Pavel Begunkov 2021-04-15 17:39 ` Pavel Begunkov 0 siblings, 1 reply; 18+ messages in thread From: Pavel Begunkov @ 2021-04-15 17:37 UTC (permalink / raw) To: yangerkun, axboe; +Cc: viro, linux-fsdevel, linux-block, io-uring On 09/04/2021 15:49, Pavel Begunkov wrote: > On 01/04/2021 08:18, yangerkun wrote: >> We get a bug: >> >> BUG: KASAN: slab-out-of-bounds in iov_iter_revert+0x11c/0x404 >> lib/iov_iter.c:1139 >> Read of size 8 at addr ffff0000d3fb11f8 by task >> >> CPU: 0 PID: 12582 Comm: syz-executor.2 Not tainted >> 5.10.0-00843-g352c8610ccd2 #2 >> Hardware name: linux,dummy-virt (DT) >> Call trace: ... >> __asan_load8+0xb4/0xbc mm/kasan/generic.c:252 >> iov_iter_revert+0x11c/0x404 lib/iov_iter.c:1139 >> io_read fs/io_uring.c:3421 [inline] >> io_issue_sqe+0x2344/0x2d64 fs/io_uring.c:5943 >> __io_queue_sqe+0x19c/0x520 fs/io_uring.c:6260 >> io_queue_sqe+0x2a4/0x590 fs/io_uring.c:6326 >> io_submit_sqe fs/io_uring.c:6395 [inline] >> io_submit_sqes+0x4c0/0xa04 fs/io_uring.c:6624 ... >> >> blkdev_read_iter can truncate iov_iter's count since the count + pos may >> exceed the size of the blkdev. This will confuse io_read that we have >> consume the iovec. And once we do the iov_iter_revert in io_read, we >> will trigger the slab-out-of-bounds. Fix it by reexpand the count with >> size has been truncated. > > Looks right, > > Acked-by: Pavel Begunkov <[email protected]> Fwiw, we need to forget to drag it through 5.13 + stable >> >> blkdev_write_iter can trigger the problem too. >> >> Signed-off-by: yangerkun <[email protected]> >> --- >> fs/block_dev.c | 20 +++++++++++++++++--- >> 1 file changed, 17 insertions(+), 3 deletions(-) >> >> diff --git a/fs/block_dev.c b/fs/block_dev.c >> index 92ed7d5df677..788e1014576f 100644 >> --- a/fs/block_dev.c >> +++ b/fs/block_dev.c >> @@ -1680,6 +1680,7 @@ ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from) >> struct inode *bd_inode = bdev_file_inode(file); >> loff_t size = i_size_read(bd_inode); >> struct blk_plug plug; >> + size_t shorted = 0; >> ssize_t ret; >> >> if (bdev_read_only(I_BDEV(bd_inode))) >> @@ -1697,12 +1698,17 @@ ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from) >> if ((iocb->ki_flags & (IOCB_NOWAIT | IOCB_DIRECT)) == IOCB_NOWAIT) >> return -EOPNOTSUPP; >> >> - iov_iter_truncate(from, size - iocb->ki_pos); >> + size -= iocb->ki_pos; >> + if (iov_iter_count(from) > size) { >> + shorted = iov_iter_count(from) - size; >> + iov_iter_truncate(from, size); >> + } >> >> blk_start_plug(&plug); >> ret = __generic_file_write_iter(iocb, from); >> if (ret > 0) >> ret = generic_write_sync(iocb, ret); >> + iov_iter_reexpand(from, iov_iter_count(from) + shorted); >> blk_finish_plug(&plug); >> return ret; >> } >> @@ -1714,13 +1720,21 @@ ssize_t blkdev_read_iter(struct kiocb *iocb, struct iov_iter *to) >> struct inode *bd_inode = bdev_file_inode(file); >> loff_t size = i_size_read(bd_inode); >> loff_t pos = iocb->ki_pos; >> + size_t shorted = 0; >> + ssize_t ret; >> >> if (pos >= size) >> return 0; >> >> size -= pos; >> - iov_iter_truncate(to, size); >> - return generic_file_read_iter(iocb, to); >> + if (iov_iter_count(to) > size) { >> + shorted = iov_iter_count(to) - size; >> + iov_iter_truncate(to, size); >> + } >> + >> + ret = generic_file_read_iter(iocb, to); >> + iov_iter_reexpand(to, iov_iter_count(to) + shorted); >> + return ret; >> } >> EXPORT_SYMBOL_GPL(blkdev_read_iter); >> >> > -- Pavel Begunkov ^ 
permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] block: reexpand iov_iter after read/write 2021-04-15 17:37 ` Pavel Begunkov @ 2021-04-15 17:39 ` Pavel Begunkov 2021-04-28 6:16 ` yangerkun 0 siblings, 1 reply; 18+ messages in thread From: Pavel Begunkov @ 2021-04-15 17:39 UTC (permalink / raw) To: yangerkun, axboe; +Cc: viro, linux-fsdevel, linux-block, io-uring On 15/04/2021 18:37, Pavel Begunkov wrote: > On 09/04/2021 15:49, Pavel Begunkov wrote: >> On 01/04/2021 08:18, yangerkun wrote: >>> We get a bug: >>> >>> BUG: KASAN: slab-out-of-bounds in iov_iter_revert+0x11c/0x404 >>> lib/iov_iter.c:1139 >>> Read of size 8 at addr ffff0000d3fb11f8 by task >>> >>> CPU: 0 PID: 12582 Comm: syz-executor.2 Not tainted >>> 5.10.0-00843-g352c8610ccd2 #2 >>> Hardware name: linux,dummy-virt (DT) >>> Call trace: > ... >>> __asan_load8+0xb4/0xbc mm/kasan/generic.c:252 >>> iov_iter_revert+0x11c/0x404 lib/iov_iter.c:1139 >>> io_read fs/io_uring.c:3421 [inline] >>> io_issue_sqe+0x2344/0x2d64 fs/io_uring.c:5943 >>> __io_queue_sqe+0x19c/0x520 fs/io_uring.c:6260 >>> io_queue_sqe+0x2a4/0x590 fs/io_uring.c:6326 >>> io_submit_sqe fs/io_uring.c:6395 [inline] >>> io_submit_sqes+0x4c0/0xa04 fs/io_uring.c:6624 > ... >>> >>> blkdev_read_iter can truncate iov_iter's count since the count + pos may >>> exceed the size of the blkdev. This will confuse io_read that we have >>> consume the iovec. And once we do the iov_iter_revert in io_read, we >>> will trigger the slab-out-of-bounds. Fix it by reexpand the count with >>> size has been truncated. >> >> Looks right, >> >> Acked-by: Pavel Begunkov <[email protected]> > > Fwiw, we need to forget to drag it through 5.13 + stable Err, yypo, to _not_ forget to 5.13 + stable... > > >>> >>> blkdev_write_iter can trigger the problem too. >>> >>> Signed-off-by: yangerkun <[email protected]> >>> --- >>> fs/block_dev.c | 20 +++++++++++++++++--- >>> 1 file changed, 17 insertions(+), 3 deletions(-) >>> >>> diff --git a/fs/block_dev.c b/fs/block_dev.c >>> index 92ed7d5df677..788e1014576f 100644 >>> --- a/fs/block_dev.c >>> +++ b/fs/block_dev.c >>> @@ -1680,6 +1680,7 @@ ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from) >>> struct inode *bd_inode = bdev_file_inode(file); >>> loff_t size = i_size_read(bd_inode); >>> struct blk_plug plug; >>> + size_t shorted = 0; >>> ssize_t ret; >>> >>> if (bdev_read_only(I_BDEV(bd_inode))) >>> @@ -1697,12 +1698,17 @@ ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from) >>> if ((iocb->ki_flags & (IOCB_NOWAIT | IOCB_DIRECT)) == IOCB_NOWAIT) >>> return -EOPNOTSUPP; >>> >>> - iov_iter_truncate(from, size - iocb->ki_pos); >>> + size -= iocb->ki_pos; >>> + if (iov_iter_count(from) > size) { >>> + shorted = iov_iter_count(from) - size; >>> + iov_iter_truncate(from, size); >>> + } >>> >>> blk_start_plug(&plug); >>> ret = __generic_file_write_iter(iocb, from); >>> if (ret > 0) >>> ret = generic_write_sync(iocb, ret); >>> + iov_iter_reexpand(from, iov_iter_count(from) + shorted); >>> blk_finish_plug(&plug); >>> return ret; >>> } >>> @@ -1714,13 +1720,21 @@ ssize_t blkdev_read_iter(struct kiocb *iocb, struct iov_iter *to) >>> struct inode *bd_inode = bdev_file_inode(file); >>> loff_t size = i_size_read(bd_inode); >>> loff_t pos = iocb->ki_pos; >>> + size_t shorted = 0; >>> + ssize_t ret; >>> >>> if (pos >= size) >>> return 0; >>> >>> size -= pos; >>> - iov_iter_truncate(to, size); >>> - return generic_file_read_iter(iocb, to); >>> + if (iov_iter_count(to) > size) { >>> + shorted = iov_iter_count(to) - size; >>> + iov_iter_truncate(to, size); >>> + } >>> + >>> + ret = 
generic_file_read_iter(iocb, to); >>> + iov_iter_reexpand(to, iov_iter_count(to) + shorted); >>> + return ret; >>> } >>> EXPORT_SYMBOL_GPL(blkdev_read_iter); >>> >>> >> > -- Pavel Begunkov ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] block: reexpand iov_iter after read/write 2021-04-15 17:39 ` Pavel Begunkov @ 2021-04-28 6:16 ` yangerkun 2021-04-30 12:57 ` Pavel Begunkov 0 siblings, 1 reply; 18+ messages in thread From: yangerkun @ 2021-04-28 6:16 UTC (permalink / raw) To: Pavel Begunkov, axboe; +Cc: viro, linux-fsdevel, linux-block, io-uring Hi, Should we pick this patch for 5.13? 在 2021/4/16 1:39, Pavel Begunkov 写道: > On 15/04/2021 18:37, Pavel Begunkov wrote: >> On 09/04/2021 15:49, Pavel Begunkov wrote: >>> On 01/04/2021 08:18, yangerkun wrote: >>>> We get a bug: >>>> >>>> BUG: KASAN: slab-out-of-bounds in iov_iter_revert+0x11c/0x404 >>>> lib/iov_iter.c:1139 >>>> Read of size 8 at addr ffff0000d3fb11f8 by task >>>> >>>> CPU: 0 PID: 12582 Comm: syz-executor.2 Not tainted >>>> 5.10.0-00843-g352c8610ccd2 #2 >>>> Hardware name: linux,dummy-virt (DT) >>>> Call trace: >> ... >>>> __asan_load8+0xb4/0xbc mm/kasan/generic.c:252 >>>> iov_iter_revert+0x11c/0x404 lib/iov_iter.c:1139 >>>> io_read fs/io_uring.c:3421 [inline] >>>> io_issue_sqe+0x2344/0x2d64 fs/io_uring.c:5943 >>>> __io_queue_sqe+0x19c/0x520 fs/io_uring.c:6260 >>>> io_queue_sqe+0x2a4/0x590 fs/io_uring.c:6326 >>>> io_submit_sqe fs/io_uring.c:6395 [inline] >>>> io_submit_sqes+0x4c0/0xa04 fs/io_uring.c:6624 >> ... >>>> >>>> blkdev_read_iter can truncate iov_iter's count since the count + pos may >>>> exceed the size of the blkdev. This will confuse io_read that we have >>>> consume the iovec. And once we do the iov_iter_revert in io_read, we >>>> will trigger the slab-out-of-bounds. Fix it by reexpand the count with >>>> size has been truncated. >>> >>> Looks right, >>> >>> Acked-by: Pavel Begunkov <[email protected]> >> >> Fwiw, we need to forget to drag it through 5.13 + stable > > Err, yypo, to _not_ forget to 5.13 + stable... > >> >> >>>> >>>> blkdev_write_iter can trigger the problem too. 
>>>> >>>> Signed-off-by: yangerkun <[email protected]> >>>> --- >>>> fs/block_dev.c | 20 +++++++++++++++++--- >>>> 1 file changed, 17 insertions(+), 3 deletions(-) >>>> >>>> diff --git a/fs/block_dev.c b/fs/block_dev.c >>>> index 92ed7d5df677..788e1014576f 100644 >>>> --- a/fs/block_dev.c >>>> +++ b/fs/block_dev.c >>>> @@ -1680,6 +1680,7 @@ ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from) >>>> struct inode *bd_inode = bdev_file_inode(file); >>>> loff_t size = i_size_read(bd_inode); >>>> struct blk_plug plug; >>>> + size_t shorted = 0; >>>> ssize_t ret; >>>> >>>> if (bdev_read_only(I_BDEV(bd_inode))) >>>> @@ -1697,12 +1698,17 @@ ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from) >>>> if ((iocb->ki_flags & (IOCB_NOWAIT | IOCB_DIRECT)) == IOCB_NOWAIT) >>>> return -EOPNOTSUPP; >>>> >>>> - iov_iter_truncate(from, size - iocb->ki_pos); >>>> + size -= iocb->ki_pos; >>>> + if (iov_iter_count(from) > size) { >>>> + shorted = iov_iter_count(from) - size; >>>> + iov_iter_truncate(from, size); >>>> + } >>>> >>>> blk_start_plug(&plug); >>>> ret = __generic_file_write_iter(iocb, from); >>>> if (ret > 0) >>>> ret = generic_write_sync(iocb, ret); >>>> + iov_iter_reexpand(from, iov_iter_count(from) + shorted); >>>> blk_finish_plug(&plug); >>>> return ret; >>>> } >>>> @@ -1714,13 +1720,21 @@ ssize_t blkdev_read_iter(struct kiocb *iocb, struct iov_iter *to) >>>> struct inode *bd_inode = bdev_file_inode(file); >>>> loff_t size = i_size_read(bd_inode); >>>> loff_t pos = iocb->ki_pos; >>>> + size_t shorted = 0; >>>> + ssize_t ret; >>>> >>>> if (pos >= size) >>>> return 0; >>>> >>>> size -= pos; >>>> - iov_iter_truncate(to, size); >>>> - return generic_file_read_iter(iocb, to); >>>> + if (iov_iter_count(to) > size) { >>>> + shorted = iov_iter_count(to) - size; >>>> + iov_iter_truncate(to, size); >>>> + } >>>> + >>>> + ret = generic_file_read_iter(iocb, to); >>>> + iov_iter_reexpand(to, iov_iter_count(to) + shorted); >>>> + return ret; >>>> } >>>> EXPORT_SYMBOL_GPL(blkdev_read_iter); >>>> >>>> >>> >> > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] block: reexpand iov_iter after read/write 2021-04-28 6:16 ` yangerkun @ 2021-04-30 12:57 ` Pavel Begunkov 2021-04-30 14:35 ` Al Viro 0 siblings, 1 reply; 18+ messages in thread From: Pavel Begunkov @ 2021-04-30 12:57 UTC (permalink / raw) To: yangerkun, axboe; +Cc: viro, linux-fsdevel, linux-block, io-uring On 4/28/21 7:16 AM, yangerkun wrote: > Hi, > > Should we pick this patch for 5.13? Looks ok to me > > 在 2021/4/16 1:39, Pavel Begunkov 写道: >> On 15/04/2021 18:37, Pavel Begunkov wrote: >>> On 09/04/2021 15:49, Pavel Begunkov wrote: >>>> On 01/04/2021 08:18, yangerkun wrote: >>>>> We get a bug: >>>>> >>>>> BUG: KASAN: slab-out-of-bounds in iov_iter_revert+0x11c/0x404 >>>>> lib/iov_iter.c:1139 >>>>> Read of size 8 at addr ffff0000d3fb11f8 by task >>>>> >>>>> CPU: 0 PID: 12582 Comm: syz-executor.2 Not tainted >>>>> 5.10.0-00843-g352c8610ccd2 #2 >>>>> Hardware name: linux,dummy-virt (DT) >>>>> Call trace: >>> ... >>>>> __asan_load8+0xb4/0xbc mm/kasan/generic.c:252 >>>>> iov_iter_revert+0x11c/0x404 lib/iov_iter.c:1139 >>>>> io_read fs/io_uring.c:3421 [inline] >>>>> io_issue_sqe+0x2344/0x2d64 fs/io_uring.c:5943 >>>>> __io_queue_sqe+0x19c/0x520 fs/io_uring.c:6260 >>>>> io_queue_sqe+0x2a4/0x590 fs/io_uring.c:6326 >>>>> io_submit_sqe fs/io_uring.c:6395 [inline] >>>>> io_submit_sqes+0x4c0/0xa04 fs/io_uring.c:6624 >>> ... >>>>> >>>>> blkdev_read_iter can truncate iov_iter's count since the count + pos may >>>>> exceed the size of the blkdev. This will confuse io_read that we have >>>>> consume the iovec. And once we do the iov_iter_revert in io_read, we >>>>> will trigger the slab-out-of-bounds. Fix it by reexpand the count with >>>>> size has been truncated. >>>> >>>> Looks right, >>>> >>>> Acked-by: Pavel Begunkov <[email protected]> >>> >>> Fwiw, we need to forget to drag it through 5.13 + stable >> >> Err, yypo, to _not_ forget to 5.13 + stable... >> >>> >>> >>>>> >>>>> blkdev_write_iter can trigger the problem too. 
>>>>> >>>>> Signed-off-by: yangerkun <[email protected]> >>>>> --- >>>>> fs/block_dev.c | 20 +++++++++++++++++--- >>>>> 1 file changed, 17 insertions(+), 3 deletions(-) >>>>> >>>>> diff --git a/fs/block_dev.c b/fs/block_dev.c >>>>> index 92ed7d5df677..788e1014576f 100644 >>>>> --- a/fs/block_dev.c >>>>> +++ b/fs/block_dev.c >>>>> @@ -1680,6 +1680,7 @@ ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from) >>>>> struct inode *bd_inode = bdev_file_inode(file); >>>>> loff_t size = i_size_read(bd_inode); >>>>> struct blk_plug plug; >>>>> + size_t shorted = 0; >>>>> ssize_t ret; >>>>> if (bdev_read_only(I_BDEV(bd_inode))) >>>>> @@ -1697,12 +1698,17 @@ ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from) >>>>> if ((iocb->ki_flags & (IOCB_NOWAIT | IOCB_DIRECT)) == IOCB_NOWAIT) >>>>> return -EOPNOTSUPP; >>>>> - iov_iter_truncate(from, size - iocb->ki_pos); >>>>> + size -= iocb->ki_pos; >>>>> + if (iov_iter_count(from) > size) { >>>>> + shorted = iov_iter_count(from) - size; >>>>> + iov_iter_truncate(from, size); >>>>> + } >>>>> blk_start_plug(&plug); >>>>> ret = __generic_file_write_iter(iocb, from); >>>>> if (ret > 0) >>>>> ret = generic_write_sync(iocb, ret); >>>>> + iov_iter_reexpand(from, iov_iter_count(from) + shorted); >>>>> blk_finish_plug(&plug); >>>>> return ret; >>>>> } >>>>> @@ -1714,13 +1720,21 @@ ssize_t blkdev_read_iter(struct kiocb *iocb, struct iov_iter *to) >>>>> struct inode *bd_inode = bdev_file_inode(file); >>>>> loff_t size = i_size_read(bd_inode); >>>>> loff_t pos = iocb->ki_pos; >>>>> + size_t shorted = 0; >>>>> + ssize_t ret; >>>>> if (pos >= size) >>>>> return 0; >>>>> size -= pos; >>>>> - iov_iter_truncate(to, size); >>>>> - return generic_file_read_iter(iocb, to); >>>>> + if (iov_iter_count(to) > size) { >>>>> + shorted = iov_iter_count(to) - size; >>>>> + iov_iter_truncate(to, size); >>>>> + } >>>>> + >>>>> + ret = generic_file_read_iter(iocb, to); >>>>> + iov_iter_reexpand(to, iov_iter_count(to) + shorted); >>>>> + return ret; >>>>> } >>>>> EXPORT_SYMBOL_GPL(blkdev_read_iter); >>>>> >>>> >>> >> -- Pavel Begunkov ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] block: reexpand iov_iter after read/write 2021-04-30 12:57 ` Pavel Begunkov @ 2021-04-30 14:35 ` Al Viro 2021-05-06 16:57 ` Pavel Begunkov 2021-05-06 17:19 ` Jens Axboe 0 siblings, 2 replies; 18+ messages in thread From: Al Viro @ 2021-04-30 14:35 UTC (permalink / raw) To: Pavel Begunkov; +Cc: yangerkun, axboe, linux-fsdevel, linux-block, io-uring On Fri, Apr 30, 2021 at 01:57:22PM +0100, Pavel Begunkov wrote: > On 4/28/21 7:16 AM, yangerkun wrote: > > Hi, > > > > Should we pick this patch for 5.13? > > Looks ok to me Looks sane. BTW, Pavel, could you go over #untested.iov_iter and give it some beating? Ideally - with per-commit profiling to see what speedups/slowdowns do they come with... It's not in the final state (if nothing else, it needs to be rebased on top of xarray stuff, and there will be followup cleanups as well), but I'd appreciate testing and profiling data... It does survive xfstests + LTP syscall tests, but that's about it. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] block: reexpand iov_iter after read/write 2021-04-30 14:35 ` Al Viro @ 2021-05-06 16:57 ` Pavel Begunkov 2021-05-06 17:17 ` Al Viro 2021-05-06 17:19 ` Jens Axboe 1 sibling, 1 reply; 18+ messages in thread From: Pavel Begunkov @ 2021-05-06 16:57 UTC (permalink / raw) To: Al Viro, Jens Axboe; +Cc: yangerkun, linux-fsdevel, linux-block, io-uring On 4/30/21 3:35 PM, Al Viro wrote: > On Fri, Apr 30, 2021 at 01:57:22PM +0100, Pavel Begunkov wrote: >> On 4/28/21 7:16 AM, yangerkun wrote: >>> Hi, >>> >>> Should we pick this patch for 5.13? >> >> Looks ok to me > > Looks sane. BTW, Pavel, could you go over #untested.iov_iter > and give it some beating? Ideally - with per-commit profiling to see > what speedups/slowdowns do they come with... I've heard Jens already tested it out. Jens, is that right? Can you share? especially since you have much more fitting hardware. > > It's not in the final state (if nothing else, it needs to be > rebased on top of xarray stuff, and there will be followup cleanups > as well), but I'd appreciate testing and profiling data... > > It does survive xfstests + LTP syscall tests, but that's about > it. > -- Pavel Begunkov ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] block: reexpand iov_iter after read/write 2021-05-06 16:57 ` Pavel Begunkov @ 2021-05-06 17:17 ` Al Viro 0 siblings, 0 replies; 18+ messages in thread From: Al Viro @ 2021-05-06 17:17 UTC (permalink / raw) To: Pavel Begunkov Cc: Jens Axboe, yangerkun, linux-fsdevel, linux-block, io-uring On Thu, May 06, 2021 at 05:57:02PM +0100, Pavel Begunkov wrote: > On 4/30/21 3:35 PM, Al Viro wrote: > > On Fri, Apr 30, 2021 at 01:57:22PM +0100, Pavel Begunkov wrote: > >> On 4/28/21 7:16 AM, yangerkun wrote: > >>> Hi, > >>> > >>> Should we pick this patch for 5.13? > >> > >> Looks ok to me > > > > Looks sane. BTW, Pavel, could you go over #untested.iov_iter > > and give it some beating? Ideally - with per-commit profiling to see > > what speedups/slowdowns do they come with... > > I've heard Jens already tested it out. Jens, is that right? Can you > share? especially since you have much more fitting hardware. FWIW, the current branch is #untested.iov_iter-3 and the code generated by it at least _looks_ better than with mainline; how much of an improvement does it make would have to be found by profiling... ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] block: reexpand iov_iter after read/write 2021-04-30 14:35 ` Al Viro 2021-05-06 16:57 ` Pavel Begunkov @ 2021-05-06 17:19 ` Jens Axboe 2021-05-06 18:55 ` Al Viro 1 sibling, 1 reply; 18+ messages in thread From: Jens Axboe @ 2021-05-06 17:19 UTC (permalink / raw) To: Al Viro, Pavel Begunkov; +Cc: yangerkun, linux-fsdevel, linux-block, io-uring On 4/30/21 8:35 AM, Al Viro wrote: > On Fri, Apr 30, 2021 at 01:57:22PM +0100, Pavel Begunkov wrote: >> On 4/28/21 7:16 AM, yangerkun wrote: >>> Hi, >>> >>> Should we pick this patch for 5.13? >> >> Looks ok to me > > Looks sane. BTW, Pavel, could you go over #untested.iov_iter > and give it some beating? Ideally - with per-commit profiling to see > what speedups/slowdowns do they come with... > > It's not in the final state (if nothing else, it needs to be > rebased on top of xarray stuff, and there will be followup cleanups > as well), but I'd appreciate testing and profiling data... > > It does survive xfstests + LTP syscall tests, but that's about > it. Al, I ran your v3 branch of that and I didn't see anything in terms of speedups. The test case is something that just writes to eventfd a ton of times, enough to get a picture of the overall runtime. First I ran with the existing baseline, which is eventfd using ->write(): Executed in 436.58 millis fish external usr time 106.21 millis 121.00 micros 106.09 millis sys time 331.32 millis 33.00 micros 331.29 millis Executed in 436.84 millis fish external usr time 113.38 millis 0.00 micros 113.38 millis sys time 324.32 millis 226.00 micros 324.10 millis Then I ran it with the eventfd ->write_iter() patch I posted: Executed in 484.54 millis fish external usr time 93.19 millis 119.00 micros 93.07 millis sys time 391.35 millis 46.00 micros 391.30 millis Executed in 485.45 millis fish external usr time 96.05 millis 0.00 micros 96.05 millis sys time 389.42 millis 216.00 micros 389.20 millis Doing a quick profile, on the latter run with ->write_iter() we're spending 8% of the time in _copy_from_iter(), and 4% in new_sync_write(). That's obviously not there at all for the first case. Both have about 4% in eventfd_write(). Non-iter case spends 1% in copy_from_user(). Finally with your branch pulled in as well, iow using ->write_iter() for eventfd and your iov changes: Executed in 485.26 millis fish external usr time 103.09 millis 70.00 micros 103.03 millis sys time 382.18 millis 83.00 micros 382.09 millis Executed in 485.16 millis fish external usr time 104.07 millis 69.00 micros 104.00 millis sys time 381.09 millis 94.00 micros 381.00 millis and there's no real difference there. We're spending less time in _copy_from_iter() (8% -> 6%) and less time in new_sync_write(), but doesn't seem to manifest itself in reduced runtime. -- Jens Axboe ^ permalink raw reply [flat|nested] 18+ messages in thread
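
For context, a rough guess at the shape of that test case (not Jens' actual program): it simply hammers write(2) on an eventfd, so nearly all of the runtime is the kernel write path being compared, eventfd's ->write() versus ->write_iter() through new_sync_write().

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <sys/eventfd.h>

int main(int argc, char **argv)
{
	long iters = argc > 1 ? atol(argv[1]) : 10 * 1000 * 1000;
	uint64_t one = 1;
	struct timespec a, b;
	int fd = eventfd(0, 0);

	if (fd < 0) {
		perror("eventfd");
		return 1;
	}
	clock_gettime(CLOCK_MONOTONIC, &a);
	for (long i = 0; i < iters; i++) {
		/*
		 * Each write adds 1 to the eventfd counter; it never blocks
		 * here because the counter stays far below its limit.
		 */
		if (write(fd, &one, sizeof(one)) != sizeof(one)) {
			perror("write");
			return 1;
		}
	}
	clock_gettime(CLOCK_MONOTONIC, &b);
	printf("%ld writes in %.2f ms\n", iters,
	       (b.tv_sec - a.tv_sec) * 1e3 + (b.tv_nsec - a.tv_nsec) / 1e6);
	close(fd);
	return 0;
}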
* Re: [PATCH] block: reexpand iov_iter after read/write 2021-05-06 17:19 ` Jens Axboe @ 2021-05-06 18:55 ` Al Viro 2021-05-06 19:15 ` Jens Axboe 0 siblings, 1 reply; 18+ messages in thread From: Al Viro @ 2021-05-06 18:55 UTC (permalink / raw) To: Jens Axboe Cc: Pavel Begunkov, yangerkun, linux-fsdevel, linux-block, io-uring On Thu, May 06, 2021 at 11:19:03AM -0600, Jens Axboe wrote: > Doing a quick profile, on the latter run with ->write_iter() we're > spending 8% of the time in _copy_from_iter(), and 4% in > new_sync_write(). That's obviously not there at all for the first case. > Both have about 4% in eventfd_write(). Non-iter case spends 1% in > copy_from_user(). > > Finally with your branch pulled in as well, iow using ->write_iter() for > eventfd and your iov changes: > > Executed in 485.26 millis fish external > usr time 103.09 millis 70.00 micros 103.03 millis > sys time 382.18 millis 83.00 micros 382.09 millis > > Executed in 485.16 millis fish external > usr time 104.07 millis 69.00 micros 104.00 millis > sys time 381.09 millis 94.00 micros 381.00 millis > > and there's no real difference there. We're spending less time in > _copy_from_iter() (8% -> 6%) and less time in new_sync_write(), but > doesn't seem to manifest itself in reduced runtime. Interesting... do you have instruction-level profiles for _copy_from_iter() and new_sync_write() on the last of those trees? ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] block: reexpand iov_iter after read/write 2021-05-06 18:55 ` Al Viro @ 2021-05-06 19:15 ` Jens Axboe 2021-05-06 21:08 ` Al Viro 0 siblings, 1 reply; 18+ messages in thread From: Jens Axboe @ 2021-05-06 19:15 UTC (permalink / raw) To: Al Viro; +Cc: Pavel Begunkov, yangerkun, linux-fsdevel, linux-block, io-uring [-- Attachment #1: Type: text/plain, Size: 1273 bytes --] On 5/6/21 12:55 PM, Al Viro wrote: > On Thu, May 06, 2021 at 11:19:03AM -0600, Jens Axboe wrote: > >> Doing a quick profile, on the latter run with ->write_iter() we're >> spending 8% of the time in _copy_from_iter(), and 4% in >> new_sync_write(). That's obviously not there at all for the first case. >> Both have about 4% in eventfd_write(). Non-iter case spends 1% in >> copy_from_user(). >> >> Finally with your branch pulled in as well, iow using ->write_iter() for >> eventfd and your iov changes: >> >> Executed in 485.26 millis fish external >> usr time 103.09 millis 70.00 micros 103.03 millis >> sys time 382.18 millis 83.00 micros 382.09 millis >> >> Executed in 485.16 millis fish external >> usr time 104.07 millis 69.00 micros 104.00 millis >> sys time 381.09 millis 94.00 micros 381.00 millis >> >> and there's no real difference there. We're spending less time in >> _copy_from_iter() (8% -> 6%) and less time in new_sync_write(), but >> doesn't seem to manifest itself in reduced runtime. > > Interesting... do you have instruction-level profiles for _copy_from_iter() > and new_sync_write() on the last of those trees? Attached output of perf annotate <func> for that last run. -- Jens Axboe [-- Attachment #2: nsw --] [-- Type: text/plain, Size: 10648 bytes --] Percent | Source code & Disassembly of vmlinux for cycles (72 samples, percent: local period) --------------------------------------------------------------------------------------------------- : : : : Disassembly of section .text: : : ffffffff812cef20 <new_sync_write>: : new_sync_write(): : inc_syscr(current); : return ret; : } : : static ssize_t new_sync_write(struct file *filp, const char __user *buf, size_t len, loff_t *ppos) : { 0.00 : ffffffff812cef20: callq ffffffff8103a8a0 <__fentry__> 0.00 : ffffffff812cef25: push %rbp 0.00 : ffffffff812cef26: mov %rdx,%r8 5.55 : ffffffff812cef29: mov %rsp,%rbp 0.00 : ffffffff812cef2c: push %r12 0.00 : ffffffff812cef2e: push %rbx 0.00 : ffffffff812cef2f: mov %rcx,%r12 0.00 : ffffffff812cef32: sub $0x68,%rsp : struct iovec iov = { .iov_base = (void __user *)buf, .iov_len = len }; 0.00 : ffffffff812cef36: mov %rdx,-0x70(%rbp) : iocb_flags(): : } : : static inline int iocb_flags(struct file *file) : { : int res = 0; : if (file->f_flags & O_APPEND) 0.00 : ffffffff812cef3a: mov 0x40(%rdi),%edx : new_sync_write(): : { 8.33 : ffffffff812cef3d: mov %rdi,%rbx : struct iovec iov = { .iov_base = (void __user *)buf, .iov_len = len }; 0.00 : ffffffff812cef40: mov %rsi,-0x78(%rbp) : iocb_flags(): 0.00 : ffffffff812cef44: mov %edx,%eax 0.00 : ffffffff812cef46: shr $0x6,%eax 0.00 : ffffffff812cef49: and $0x10,%eax : res |= IOCB_APPEND; : if (file->f_flags & O_DIRECT) : res |= IOCB_DIRECT; 0.00 : ffffffff812cef4c: mov %eax,%ecx 0.00 : ffffffff812cef4e: or $0x20000,%ecx 0.00 : ffffffff812cef54: test $0x40,%dh 6.94 : ffffffff812cef57: cmovne %ecx,%eax : if ((file->f_flags & O_DSYNC) || IS_SYNC(file->f_mapping->host)) 0.00 : ffffffff812cef5a: test $0x10,%dh 0.00 : ffffffff812cef5d: jne ffffffff812cef77 <new_sync_write+0x57> 0.00 : ffffffff812cef5f: mov 0xd0(%rdi),%rcx 0.00 : ffffffff812cef66: mov (%rcx),%rcx 0.00 : ffffffff812cef69: 
mov 0x28(%rcx),%rsi 0.00 : ffffffff812cef6d: testb $0x10,0x50(%rsi) 13.89 : ffffffff812cef71: je ffffffff812cf04c <new_sync_write+0x12c> : res |= IOCB_DSYNC; 0.00 : ffffffff812cef77: or $0x2,%eax : if (file->f_flags & __O_SYNC) : res |= IOCB_SYNC; 0.00 : ffffffff812cef7a: mov %eax,%ecx 0.00 : ffffffff812cef7c: or $0x4,%ecx 0.00 : ffffffff812cef7f: and $0x100000,%edx : file_write_hint(): : if (file->f_write_hint != WRITE_LIFE_NOT_SET) 0.00 : ffffffff812cef85: mov 0x34(%rbx),%edx : iocb_flags(): : res |= IOCB_SYNC; 0.00 : ffffffff812cef88: cmovne %ecx,%eax : file_write_hint(): : if (file->f_write_hint != WRITE_LIFE_NOT_SET) 0.00 : ffffffff812cef8b: test %edx,%edx 6.97 : ffffffff812cef8d: jne ffffffff812cf03c <new_sync_write+0x11c> : return file_inode(file)->i_write_hint; 0.00 : ffffffff812cef93: mov 0x20(%rbx),%rdx 0.00 : ffffffff812cef97: movzbl 0x87(%rdx),%edx : get_current(): : : DECLARE_PER_CPU(struct task_struct *, current_task); : : static __always_inline struct task_struct *get_current(void) : { : return this_cpu_read_stable(current_task); 0.00 : ffffffff812cef9e: mov %gs:0x126c0,%rcx : get_current_ioprio(): : * If the calling process has set an I/O priority, use that. Otherwise, return : * the default I/O priority. : */ : static inline int get_current_ioprio(void) : { : struct io_context *ioc = current->io_context; 0.00 : ffffffff812cefa7: mov 0x860(%rcx),%rsi : : if (ioc) 0.00 : ffffffff812cefae: xor %ecx,%ecx 0.00 : ffffffff812cefb0: test %rsi,%rsi 0.00 : ffffffff812cefb3: je ffffffff812cefb9 <new_sync_write+0x99> : return ioc->ioprio; 0.00 : ffffffff812cefb5: movzwl 0x14(%rsi),%ecx : init_sync_kiocb(): : *kiocb = (struct kiocb) { 0.00 : ffffffff812cefb9: shl $0x10,%ecx 12.50 : ffffffff812cefbc: movzwl %dx,%edx 0.00 : ffffffff812cefbf: movq $0x0,-0x38(%rbp) 0.00 : ffffffff812cefc7: movq $0x0,-0x30(%rbp) 0.00 : ffffffff812cefcf: or %ecx,%edx 0.00 : ffffffff812cefd1: movq $0x0,-0x28(%rbp) 0.00 : ffffffff812cefd9: movq $0x0,-0x18(%rbp) 0.00 : ffffffff812cefe1: mov %rbx,-0x40(%rbp) 0.00 : ffffffff812cefe5: mov %eax,-0x20(%rbp) 6.93 : ffffffff812cefe8: mov %edx,-0x1c(%rbp) : new_sync_write(): : struct kiocb kiocb; : struct iov_iter iter; : ssize_t ret; : : init_sync_kiocb(&kiocb, filp); : kiocb.ki_pos = (ppos ? *ppos : 0); 0.00 : ffffffff812cefeb: test %r12,%r12 0.00 : ffffffff812cefee: je ffffffff812cf05b <new_sync_write+0x13b> : iov_iter_init(&iter, WRITE, &iov, 1, len); 0.00 : ffffffff812ceff0: mov $0x1,%esi 0.00 : ffffffff812ceff5: lea -0x68(%rbp),%rdi 0.00 : ffffffff812ceff9: mov $0x1,%ecx 0.00 : ffffffff812ceffe: lea -0x78(%rbp),%rdx : kiocb.ki_pos = (ppos ? 
*ppos : 0); 0.00 : ffffffff812cf002: mov (%r12),%rax 0.00 : ffffffff812cf006: mov %rax,-0x38(%rbp) : iov_iter_init(&iter, WRITE, &iov, 1, len); 8.33 : ffffffff812cf00a: callq ffffffff814c45e0 <iov_iter_init> : call_write_iter(): : return file->f_op->write_iter(kio, iter); 12.51 : ffffffff812cf00f: mov 0x28(%rbx),%rax 0.00 : ffffffff812cf013: lea -0x68(%rbp),%rsi 0.00 : ffffffff812cf017: lea -0x40(%rbp),%rdi 0.00 : ffffffff812cf01b: callq *0x28(%rax) : new_sync_write(): : : ret = call_write_iter(filp, &kiocb, &iter); : BUG_ON(ret == -EIOCBQUEUED); 0.00 : ffffffff812cf01e: cmp $0xfffffffffffffdef,%rax 0.00 : ffffffff812cf024: je ffffffff812cf089 <new_sync_write+0x169> : if (ret > 0 && ppos) 0.00 : ffffffff812cf026: test %rax,%rax 0.00 : ffffffff812cf029: jle ffffffff812cf033 <new_sync_write+0x113> : *ppos = kiocb.ki_pos; 0.00 : ffffffff812cf02b: mov -0x38(%rbp),%rdx 12.49 : ffffffff812cf02f: mov %rdx,(%r12) : return ret; : } 0.00 : ffffffff812cf033: add $0x68,%rsp 0.00 : ffffffff812cf037: pop %rbx 0.00 : ffffffff812cf038: pop %r12 0.00 : ffffffff812cf03a: pop %rbp 0.00 : ffffffff812cf03b: retq : ki_hint_validate(): : if (hint <= max_hint) 0.00 : ffffffff812cf03c: xor %ecx,%ecx 0.00 : ffffffff812cf03e: cmp $0xffff,%edx 0.00 : ffffffff812cf044: cmova %ecx,%edx 0.00 : ffffffff812cf047: jmpq ffffffff812cef9e <new_sync_write+0x7e> : iocb_flags(): : if ((file->f_flags & O_DSYNC) || IS_SYNC(file->f_mapping->host)) 5.55 : ffffffff812cf04c: testb $0x1,0xc(%rcx) 0.00 : ffffffff812cf050: je ffffffff812cef7a <new_sync_write+0x5a> 0.00 : ffffffff812cf056: jmpq ffffffff812cef77 <new_sync_write+0x57> : new_sync_write(): : iov_iter_init(&iter, WRITE, &iov, 1, len); 0.00 : ffffffff812cf05b: mov $0x1,%esi 0.00 : ffffffff812cf060: lea -0x68(%rbp),%rdi 0.00 : ffffffff812cf064: mov $0x1,%ecx 0.00 : ffffffff812cf069: lea -0x78(%rbp),%rdx 0.00 : ffffffff812cf06d: callq ffffffff814c45e0 <iov_iter_init> : call_write_iter(): : return file->f_op->write_iter(kio, iter); 0.00 : ffffffff812cf072: mov 0x28(%rbx),%rax 0.00 : ffffffff812cf076: lea -0x68(%rbp),%rsi 0.00 : ffffffff812cf07a: lea -0x40(%rbp),%rdi 0.00 : ffffffff812cf07e: callq *0x28(%rax) : new_sync_write(): : BUG_ON(ret == -EIOCBQUEUED); 0.00 : ffffffff812cf081: cmp $0xfffffffffffffdef,%rax 0.00 : ffffffff812cf087: jne ffffffff812cf033 <new_sync_write+0x113> 0.00 : ffffffff812cf089: ud2 [-- Attachment #3: cfi --] [-- Type: text/plain, Size: 30346 bytes --] Percent | Source code & Disassembly of vmlinux for cycles (113 samples, percent: local period) ---------------------------------------------------------------------------------------------------- : : : : Disassembly of section .text: : : ffffffff814c6aa0 <_copy_from_iter>: : _copy_from_iter(): : } : EXPORT_SYMBOL_GPL(_copy_mc_to_iter); : #endif /* CONFIG_ARCH_HAS_COPY_MC */ : : size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i) : { 0.00 : ffffffff814c6aa0: push %rbp 7.07 : ffffffff814c6aa1: mov %rdx,%rax 0.00 : ffffffff814c6aa4: mov %rsp,%rbp 0.00 : ffffffff814c6aa7: push %r15 0.00 : ffffffff814c6aa9: push %r14 0.00 : ffffffff814c6aab: push %r13 3.54 : ffffffff814c6aad: push %r12 0.00 : ffffffff814c6aaf: push %rbx 0.00 : ffffffff814c6ab0: sub $0x50,%rsp 0.00 : ffffffff814c6ab4: mov %rdx,-0x78(%rbp) : iov_iter_type(): : }; : }; : : static inline enum iter_type iov_iter_type(const struct iov_iter *i) : { : return i->iter_type; 0.89 : ffffffff814c6ab8: movzbl (%rdx),%edx : _copy_from_iter(): 0.00 : ffffffff814c6abb: mov %rdi,-0x68(%rbp) : if (unlikely(iov_iter_is_pipe(i))) { 0.00 : 
ffffffff814c6abf: cmp $0x3,%dl 0.00 : ffffffff814c6ac2: je ffffffff814c6bd6 <_copy_from_iter+0x136> 0.00 : ffffffff814c6ac8: mov %rax,%rdi : WARN_ON(1); : return 0; : } : if (iter_is_iovec(i)) : might_fault(); : iterate_and_advance(i, bytes, base, len, off, 0.00 : ffffffff814c6acb: mov 0x10(%rax),%rax 0.00 : ffffffff814c6acf: cmp %rsi,%rax 0.00 : ffffffff814c6ad2: cmovbe %rax,%rsi 2.65 : ffffffff814c6ad6: mov %rsi,%r13 0.00 : ffffffff814c6ad9: test %rsi,%rsi 3.52 : ffffffff814c6adc: je ffffffff814c6bdd <_copy_from_iter+0x13d> 1.76 : ffffffff814c6ae2: test %dl,%dl 0.00 : ffffffff814c6ae4: jne ffffffff814c6be2 <_copy_from_iter+0x142> 0.00 : ffffffff814c6aea: mov 0x18(%rdi),%rax 0.00 : ffffffff814c6aee: mov 0x8(%rdi),%r14 0.00 : ffffffff814c6af2: xor %r15d,%r15d 0.00 : ffffffff814c6af5: mov -0x68(%rbp),%rdi 25.58 : ffffffff814c6af9: lea 0x10(%rax),%r12 0.00 : ffffffff814c6afd: jmp ffffffff814c6b0e <_copy_from_iter+0x6e> 0.00 : ffffffff814c6aff: mov -0x68(%rbp),%rax 0.00 : ffffffff814c6b03: lea (%rax,%r15,1),%rdi 0.00 : ffffffff814c6b07: add $0x10,%r12 : { 0.00 : ffffffff814c6b0b: xor %r14d,%r14d : iterate_and_advance(i, bytes, base, len, off, 0.00 : ffffffff814c6b0e: mov -0x8(%r12),%rcx 1.09 : ffffffff814c6b13: lea -0x10(%r12),%rax 0.00 : ffffffff814c6b18: mov %r12,-0x60(%rbp) 0.00 : ffffffff814c6b1c: mov %rax,-0x70(%rbp) 0.00 : ffffffff814c6b20: mov %rcx,%rbx 0.00 : ffffffff814c6b23: sub %r14,%rbx 1.76 : ffffffff814c6b26: cmp %r13,%rbx 0.00 : ffffffff814c6b29: cmova %r13,%rbx 0.00 : ffffffff814c6b2d: test %rbx,%rbx 0.00 : ffffffff814c6b30: je ffffffff814c6b07 <_copy_from_iter+0x67> 0.00 : ffffffff814c6b32: mov -0x10(%r12),%rsi 0.00 : ffffffff814c6b37: mov %rbx,%rax 0.00 : ffffffff814c6b3a: add %r14,%rsi : __chk_range_not_ok(): : */ : if (__builtin_constant_p(size)) : return unlikely(addr > limit - size); : : /* Arbitrary sizes? Be careful about overflow */ : addr += size; 0.00 : ffffffff814c6b3d: add %rsi,%rax 4.42 : ffffffff814c6b40: jb ffffffff814c6bd1 <_copy_from_iter+0x131> : copyin(): : if (access_ok(from, n)) { 0.00 : ffffffff814c6b46: movabs $0x7ffffffff000,%rdx 0.00 : ffffffff814c6b50: cmp %rdx,%rax 3.52 : ffffffff814c6b53: ja ffffffff814c6bd1 <_copy_from_iter+0x131> : copy_user_generic(): : /* : * If CPU has ERMS feature, use copy_user_enhanced_fast_string. : * Otherwise, if CPU has rep_good feature, use copy_user_generic_string. : * Otherwise, use copy_user_generic_unrolled. 
: */ : alternative_call_2(copy_user_generic_unrolled, 0.00 : ffffffff814c6b55: mov %ebx,%edx 0.00 : ffffffff814c6b57: callq ffffffff81523880 <copy_user_generic_unrolled> : _copy_from_iter(): : iterate_and_advance(i, bytes, base, len, off, 6.18 : ffffffff814c6b5c: mov -0x8(%r12),%rcx : copy_user_generic(): : X86_FEATURE_ERMS, : ASM_OUTPUT2("=a" (ret), "=D" (to), "=S" (from), : "=d" (len)), : "1" (to), "2" (from), "3" (len) : : "memory", "rcx", "r8", "r9", "r10", "r11"); : return ret; 0.00 : ffffffff814c6b61: mov %eax,%eax : _copy_from_iter(): 0.00 : ffffffff814c6b63: cltq 0.00 : ffffffff814c6b65: mov %rbx,%rdx 0.00 : ffffffff814c6b68: sub %rbx,%r13 0.00 : ffffffff814c6b6b: sub %rax,%rdx 0.00 : ffffffff814c6b6e: add %rax,%r13 0.00 : ffffffff814c6b71: add %rdx,%r15 3.53 : ffffffff814c6b74: add %r14,%rdx 0.00 : ffffffff814c6b77: cmp %rcx,%rdx 0.00 : ffffffff814c6b7a: jb ffffffff814c6bc4 <_copy_from_iter+0x124> 0.00 : ffffffff814c6b7c: test %r13,%r13 0.00 : ffffffff814c6b7f: jne ffffffff814c6aff <_copy_from_iter+0x5f> 0.00 : ffffffff814c6b85: mov -0x78(%rbp),%rcx 2.66 : ffffffff814c6b89: mov -0x60(%rbp),%rdi 2.65 : ffffffff814c6b8d: mov %rdi,%rax 0.00 : ffffffff814c6b90: sub 0x18(%rcx),%rax 11.54 : ffffffff814c6b94: mov %r13,0x8(%rcx) 0.00 : ffffffff814c6b98: mov %rdi,0x18(%rcx) 12.36 : ffffffff814c6b9c: mov %rcx,%rdi 0.00 : ffffffff814c6b9f: sar $0x4,%rax 0.00 : ffffffff814c6ba3: sub %rax,0x20(%rcx) 0.00 : ffffffff814c6ba7: mov 0x10(%rcx),%rax 0.00 : ffffffff814c6bab: sub %r15,%rax 0.00 : ffffffff814c6bae: mov %rax,0x10(%rdi) : copyin(addr + off, base, len), : memcpy(addr + off, base, len) : ) : : return bytes; : } 3.53 : ffffffff814c6bb2: add $0x50,%rsp 0.00 : ffffffff814c6bb6: mov %r15,%rax 0.00 : ffffffff814c6bb9: pop %rbx 0.00 : ffffffff814c6bba: pop %r12 0.00 : ffffffff814c6bbc: pop %r13 0.00 : ffffffff814c6bbe: pop %r14 0.00 : ffffffff814c6bc0: pop %r15 1.76 : ffffffff814c6bc2: pop %rbp 0.00 : ffffffff814c6bc3: retq 0.00 : ffffffff814c6bc4: mov -0x70(%rbp),%rax 0.00 : ffffffff814c6bc8: mov %rdx,%r13 0.00 : ffffffff814c6bcb: mov %rax,-0x60(%rbp) 0.00 : ffffffff814c6bcf: jmp ffffffff814c6b85 <_copy_from_iter+0xe5> : copyin(): 0.00 : ffffffff814c6bd1: mov %rbx,%rax 0.00 : ffffffff814c6bd4: jmp ffffffff814c6b63 <_copy_from_iter+0xc3> : _copy_from_iter(): : WARN_ON(1); 0.00 : ffffffff814c6bd6: ud2 : return 0; 0.00 : ffffffff814c6bd8: xor %r15d,%r15d 0.00 : ffffffff814c6bdb: jmp ffffffff814c6bb2 <_copy_from_iter+0x112> 0.00 : ffffffff814c6bdd: xor %r15d,%r15d 0.00 : ffffffff814c6be0: jmp ffffffff814c6bb2 <_copy_from_iter+0x112> : iterate_and_advance(i, bytes, base, len, off, 0.00 : ffffffff814c6be2: cmp $0x2,%dl 0.00 : ffffffff814c6be5: je ffffffff814c6e09 <_copy_from_iter+0x369> 0.00 : ffffffff814c6beb: cmp $0x1,%dl 0.00 : ffffffff814c6bee: je ffffffff814c6d6b <_copy_from_iter+0x2cb> 0.00 : ffffffff814c6bf4: mov %rsi,%r15 0.00 : ffffffff814c6bf7: cmp $0x4,%dl 0.00 : ffffffff814c6bfa: jne ffffffff814c6bab <_copy_from_iter+0x10b> 0.00 : ffffffff814c6bfc: mov 0x8(%rdi),%rax 0.00 : ffffffff814c6c00: add 0x20(%rdi),%rax 0.00 : ffffffff814c6c04: movl $0x0,-0x48(%rbp) 0.00 : ffffffff814c6c0b: movq $0x3,-0x40(%rbp) 0.00 : ffffffff814c6c13: movq $0x0,-0x38(%rbp) 0.00 : ffffffff814c6c1b: movq $0x0,-0x30(%rbp) 0.00 : ffffffff814c6c23: mov %eax,%ebx 0.00 : ffffffff814c6c25: shr $0xc,%rax 0.00 : ffffffff814c6c29: mov %rax,%rcx 0.00 : ffffffff814c6c2c: mov %rax,-0x60(%rbp) 0.00 : ffffffff814c6c30: mov 0x18(%rdi),%rax 0.00 : ffffffff814c6c34: and $0xfff,%ebx 0.00 : ffffffff814c6c3a: mov %rcx,-0x50(%rbp) 
0.00 : ffffffff814c6c3e: mov %rax,-0x58(%rbp) 0.00 : ffffffff814c6c42: mov $0xffffffffffffffff,%rsi 0.00 : ffffffff814c6c49: lea -0x58(%rbp),%rdi 0.00 : ffffffff814c6c4d: xor %r15d,%r15d 0.00 : ffffffff814c6c50: callq ffffffff8151fe40 <xas_find> 0.00 : ffffffff814c6c55: mov %rax,%r14 0.00 : ffffffff814c6c58: test %rax,%rax 0.00 : ffffffff814c6c5b: je ffffffff814c6d51 <_copy_from_iter+0x2b1> 0.00 : ffffffff814c6c61: mov %ebx,%r12d : xas_retry(): : * Context: Any context. : * Return: true if the operation needs to be retried. : */ : static inline bool xas_retry(struct xa_state *xas, const void *entry) : { : if (xa_is_zero(entry)) 0.00 : ffffffff814c6c64: cmp $0x406,%r14 0.00 : ffffffff814c6c6b: je ffffffff814c6d1c <_copy_from_iter+0x27c> : return true; : if (!xa_is_retry(entry)) 0.00 : ffffffff814c6c71: cmp $0x402,%r14 0.00 : ffffffff814c6c78: je ffffffff814c6f7f <_copy_from_iter+0x4df> : _copy_from_iter(): 0.00 : ffffffff814c6c7e: test $0x1,%r14b 0.00 : ffffffff814c6c82: jne ffffffff814c6f78 <_copy_from_iter+0x4d8> 0.00 : ffffffff814c6c88: mov %r14,%rdi 0.00 : ffffffff814c6c8b: callq ffffffff81296c00 <PageHuge> 0.00 : ffffffff814c6c90: mov %eax,%ebx 0.00 : ffffffff814c6c92: test %eax,%eax 0.00 : ffffffff814c6c94: jne ffffffff814c6f30 <_copy_from_iter+0x490> 0.00 : ffffffff814c6c9a: mov -0x60(%rbp),%rdi 0.00 : ffffffff814c6c9e: mov 0x20(%r14),%rax 0.00 : ffffffff814c6ca2: mov %edi,%ecx 0.00 : ffffffff814c6ca4: sub %eax,%ecx 0.00 : ffffffff814c6ca6: cmp %rdi,%rax 0.00 : ffffffff814c6ca9: cmovb %ecx,%ebx 0.00 : ffffffff814c6cac: jmp ffffffff814c6d00 <_copy_from_iter+0x260> 0.00 : ffffffff814c6cae: mov %r12d,%eax 0.00 : ffffffff814c6cb1: mov $0x1000,%edx 0.00 : ffffffff814c6cb6: movslq %ebx,%rsi 0.00 : ffffffff814c6cb9: mov -0x68(%rbp),%rcx 0.00 : ffffffff814c6cbd: sub %rax,%rdx 0.00 : ffffffff814c6cc0: cmp %r13,%rdx 0.00 : ffffffff814c6cc3: cmova %r13,%rdx 0.00 : ffffffff814c6cc7: shl $0x6,%rsi 0.00 : ffffffff814c6ccb: add %r14,%rsi : lowmem_page_address(): : */ : #include <linux/vmstat.h> : : static __always_inline void *lowmem_page_address(const struct page *page) : { : return page_to_virt(page); 0.00 : ffffffff814c6cce: sub 0xebda73(%rip),%rsi # ffffffff82384748 <vmemmap_base> : _copy_from_iter(): 0.00 : ffffffff814c6cd5: mov %rdx,%r12 0.00 : ffffffff814c6cd8: lea (%rcx,%r15,1),%rdi 0.00 : ffffffff814c6cdc: add %r12,%r15 : lowmem_page_address(): 0.00 : ffffffff814c6cdf: sar $0x6,%rsi 0.00 : ffffffff814c6ce3: shl $0xc,%rsi 0.00 : ffffffff814c6ce7: add 0xebda6a(%rip),%rsi # ffffffff82384758 <page_offset_base> : _copy_from_iter(): 0.00 : ffffffff814c6cee: add %rax,%rsi : memcpy(): : if (q_size < size) : __read_overflow2(); : } : if (p_size < size || q_size < size) : fortify_panic(__func__); : return __underlying_memcpy(p, q, size); 0.00 : ffffffff814c6cf1: callq ffffffff81a22620 <__memcpy> : _copy_from_iter(): 0.00 : ffffffff814c6cf6: sub %r12,%r13 0.00 : ffffffff814c6cf9: je ffffffff814c6d51 <_copy_from_iter+0x2b1> 0.00 : ffffffff814c6cfb: inc %ebx 0.00 : ffffffff814c6cfd: xor %r12d,%r12d : constant_test_bit(): : } : : static __always_inline bool constant_test_bit(long nr, const volatile unsigned long *addr) : { : return ((1UL << (nr & (BITS_PER_LONG-1))) & : (addr[nr >> _BITOPS_LONG_SHIFT])) != 0; 0.00 : ffffffff814c6d00: mov (%r14),%rax 0.00 : ffffffff814c6d03: shr $0x10,%rax 0.00 : ffffffff814c6d07: and $0x1,%eax : thp_nr_pages(): : */ : static inline int thp_nr_pages(struct page *page) : { : VM_BUG_ON_PGFLAGS(PageTail(page), page); : if (PageHead(page)) : return HPAGE_PMD_NR; 0.00 : 
ffffffff814c6d0a: cmp $0x1,%al 0.00 : ffffffff814c6d0c: sbb %eax,%eax 0.00 : ffffffff814c6d0e: and $0xfffffe01,%eax 0.00 : ffffffff814c6d13: add $0x200,%eax : _copy_from_iter(): 0.00 : ffffffff814c6d18: cmp %eax,%ebx 0.00 : ffffffff814c6d1a: jl ffffffff814c6cae <_copy_from_iter+0x20e> : xas_next_entry(): : * : * Return: The next present entry after the one currently referred to by @xas. : */ : static inline void *xas_next_entry(struct xa_state *xas, unsigned long max) : { : struct xa_node *node = xas->xa_node; 0.00 : ffffffff814c6d1c: mov -0x40(%rbp),%rdi : xas_not_node(): : return ((unsigned long)node & 3) || !node; 0.00 : ffffffff814c6d20: test $0x3,%dil 0.00 : ffffffff814c6d24: setne %cl 0.00 : ffffffff814c6d27: test %rdi,%rdi 0.00 : ffffffff814c6d2a: sete %al 0.00 : ffffffff814c6d2d: or %al,%cl 0.00 : ffffffff814c6d2f: je ffffffff814c6ecc <_copy_from_iter+0x42c> : xas_next_entry(): : return xas_find(xas, max); : if (unlikely(xas->xa_offset == XA_CHUNK_MASK)) : return xas_find(xas, max); : entry = xa_entry(xas->xa, node, xas->xa_offset + 1); : if (unlikely(xa_is_internal(entry))) : return xas_find(xas, max); 0.00 : ffffffff814c6d35: mov $0xffffffffffffffff,%rsi 0.00 : ffffffff814c6d3c: lea -0x58(%rbp),%rdi 0.00 : ffffffff814c6d40: callq ffffffff8151fe40 <xas_find> 0.00 : ffffffff814c6d45: mov %rax,%r14 : _copy_from_iter(): 0.00 : ffffffff814c6d48: test %rax,%rax 0.00 : ffffffff814c6d4b: jne ffffffff814c6c64 <_copy_from_iter+0x1c4> : __rcu_read_unlock(): : } : : static inline void __rcu_read_unlock(void) : { : preempt_enable(); : rcu_read_unlock_strict(); 0.00 : ffffffff814c6d51: callq ffffffff810e12c0 <rcu_read_unlock_strict> : _copy_from_iter(): 0.00 : ffffffff814c6d56: mov -0x78(%rbp),%rax 0.00 : ffffffff814c6d5a: mov -0x78(%rbp),%rdi 0.00 : ffffffff814c6d5e: add %r15,0x8(%rax) 0.00 : ffffffff814c6d62: mov 0x10(%rax),%rax 0.00 : ffffffff814c6d66: jmpq ffffffff814c6bab <_copy_from_iter+0x10b> 0.00 : ffffffff814c6d6b: mov 0x18(%rdi),%rax 0.00 : ffffffff814c6d6f: xor %r15d,%r15d 0.00 : ffffffff814c6d72: mov 0x8(%rdi),%rbx 0.00 : ffffffff814c6d76: mov -0x68(%rbp),%rdi 0.00 : ffffffff814c6d7a: lea 0x10(%rax),%r12 0.00 : ffffffff814c6d7e: mov %r15,%rax 0.00 : ffffffff814c6d81: mov %r12,%r15 0.00 : ffffffff814c6d84: mov %rax,%r12 0.00 : ffffffff814c6d87: jmp ffffffff814c6d97 <_copy_from_iter+0x2f7> 0.00 : ffffffff814c6d89: mov -0x68(%rbp),%rax 0.00 : ffffffff814c6d8d: lea (%rax,%r12,1),%rdi 0.00 : ffffffff814c6d91: add $0x10,%r15 0.00 : ffffffff814c6d95: xor %ebx,%ebx 0.00 : ffffffff814c6d97: mov -0x8(%r15),%r14 0.00 : ffffffff814c6d9b: lea -0x10(%r15),%rax 0.00 : ffffffff814c6d9f: mov %r15,-0x60(%rbp) 0.00 : ffffffff814c6da3: mov %rax,-0x70(%rbp) 0.00 : ffffffff814c6da7: sub %rbx,%r14 0.00 : ffffffff814c6daa: cmp %r13,%r14 0.00 : ffffffff814c6dad: cmova %r13,%r14 0.00 : ffffffff814c6db1: test %r14,%r14 0.00 : ffffffff814c6db4: je ffffffff814c6d91 <_copy_from_iter+0x2f1> 0.00 : ffffffff814c6db6: mov -0x10(%r15),%rsi : memcpy(): 0.00 : ffffffff814c6dba: mov %r14,%rdx : _copy_from_iter(): 0.00 : ffffffff814c6dbd: add %r14,%r12 0.00 : ffffffff814c6dc0: sub %r14,%r13 0.00 : ffffffff814c6dc3: add %rbx,%rsi : memcpy(): 0.00 : ffffffff814c6dc6: callq ffffffff81a22620 <__memcpy> : _copy_from_iter(): 0.00 : ffffffff814c6dcb: lea (%rbx,%r14,1),%rcx 0.00 : ffffffff814c6dcf: cmp %rcx,-0x8(%r15) 0.00 : ffffffff814c6dd3: ja ffffffff814c6eb9 <_copy_from_iter+0x419> 0.00 : ffffffff814c6dd9: test %r13,%r13 0.00 : ffffffff814c6ddc: jne ffffffff814c6d89 <_copy_from_iter+0x2e9> 0.00 : ffffffff814c6dde: mov 
%r12,%r15 0.00 : ffffffff814c6de1: mov -0x78(%rbp),%rdi 0.00 : ffffffff814c6de5: mov -0x60(%rbp),%rcx 0.00 : ffffffff814c6de9: mov %rcx,%rax 0.00 : ffffffff814c6dec: sub 0x18(%rdi),%rax 0.00 : ffffffff814c6df0: mov %r13,0x8(%rdi) 0.00 : ffffffff814c6df4: mov %rcx,0x18(%rdi) 0.00 : ffffffff814c6df8: sar $0x4,%rax 0.00 : ffffffff814c6dfc: sub %rax,0x20(%rdi) 0.00 : ffffffff814c6e00: mov 0x10(%rdi),%rax 0.00 : ffffffff814c6e04: jmpq ffffffff814c6bab <_copy_from_iter+0x10b> 0.00 : ffffffff814c6e09: mov 0x18(%rdi),%r14 0.00 : ffffffff814c6e0d: mov 0x8(%rdi),%r12d 0.00 : ffffffff814c6e11: xor %r15d,%r15d 0.00 : ffffffff814c6e14: mov 0xc(%r14),%eax 0.00 : ffffffff814c6e18: mov 0x8(%r14),%edx 0.00 : ffffffff814c6e1c: mov $0x1000,%esi 0.00 : ffffffff814c6e21: mov -0x68(%rbp),%rdi 0.00 : ffffffff814c6e25: add %r12d,%eax 0.00 : ffffffff814c6e28: sub %r12d,%edx 0.00 : ffffffff814c6e2b: mov %eax,%ecx 0.00 : ffffffff814c6e2d: and $0xfff,%ecx 0.00 : ffffffff814c6e33: cmp %r13,%rdx 0.00 : ffffffff814c6e36: cmova %r13,%rdx 0.00 : ffffffff814c6e3a: sub %rcx,%rsi 0.00 : ffffffff814c6e3d: cmp %rsi,%rdx 0.00 : ffffffff814c6e40: cmovbe %rdx,%rsi 0.00 : ffffffff814c6e44: shr $0xc,%eax 0.00 : ffffffff814c6e47: add %r15,%rdi 0.00 : ffffffff814c6e4a: mov %rsi,%rbx 0.00 : ffffffff814c6e4d: mov %eax,%esi 0.00 : ffffffff814c6e4f: shl $0x6,%rsi 0.00 : ffffffff814c6e53: add (%r14),%rsi : memcpy(): 0.00 : ffffffff814c6e56: mov %rbx,%rdx : _copy_from_iter(): 0.00 : ffffffff814c6e59: add %rbx,%r15 : lowmem_page_address(): 0.00 : ffffffff814c6e5c: sub 0xebd8e5(%rip),%rsi # ffffffff82384748 <vmemmap_base> : _copy_from_iter(): 0.00 : ffffffff814c6e63: add %ebx,%r12d : lowmem_page_address(): 0.00 : ffffffff814c6e66: sar $0x6,%rsi 0.00 : ffffffff814c6e6a: shl $0xc,%rsi 0.00 : ffffffff814c6e6e: add 0xebd8e3(%rip),%rsi # ffffffff82384758 <page_offset_base> : _copy_from_iter(): 0.00 : ffffffff814c6e75: add %rcx,%rsi : memcpy(): 0.00 : ffffffff814c6e78: callq ffffffff81a22620 <__memcpy> : _copy_from_iter(): 0.00 : ffffffff814c6e7d: cmp %r12d,0x8(%r14) 0.00 : ffffffff814c6e81: jne ffffffff814c6e8a <_copy_from_iter+0x3ea> 0.00 : ffffffff814c6e83: add $0x10,%r14 0.00 : ffffffff814c6e87: xor %r12d,%r12d 0.00 : ffffffff814c6e8a: sub %rbx,%r13 0.00 : ffffffff814c6e8d: jne ffffffff814c6e14 <_copy_from_iter+0x374> 0.00 : ffffffff814c6e8f: mov -0x78(%rbp),%rcx 0.00 : ffffffff814c6e93: mov %r12d,%eax 0.00 : ffffffff814c6e96: mov %rax,0x8(%rcx) 0.00 : ffffffff814c6e9a: mov %r14,%rax 0.00 : ffffffff814c6e9d: sub 0x18(%rcx),%rax 0.00 : ffffffff814c6ea1: mov %rcx,%rdi 0.00 : ffffffff814c6ea4: mov %r14,0x18(%rcx) 0.00 : ffffffff814c6ea8: sar $0x4,%rax 0.00 : ffffffff814c6eac: sub %rax,0x20(%rcx) 0.00 : ffffffff814c6eb0: mov 0x10(%rcx),%rax 0.00 : ffffffff814c6eb4: jmpq ffffffff814c6bab <_copy_from_iter+0x10b> 0.00 : ffffffff814c6eb9: mov -0x70(%rbp),%rax 0.00 : ffffffff814c6ebd: mov %r12,%r15 0.00 : ffffffff814c6ec0: mov %rcx,%r13 0.00 : ffffffff814c6ec3: mov %rax,-0x60(%rbp) 0.00 : ffffffff814c6ec7: jmpq ffffffff814c6de1 <_copy_from_iter+0x341> : xas_next_entry(): : if (unlikely(xas_not_node(node) || node->shift || 0.00 : ffffffff814c6ecc: cmpb $0x0,(%rdi) 0.00 : ffffffff814c6ecf: jne ffffffff814c6d35 <_copy_from_iter+0x295> 0.00 : ffffffff814c6ed5: mov -0x50(%rbp),%rsi 0.00 : ffffffff814c6ed9: movzbl -0x46(%rbp),%r9d 0.00 : ffffffff814c6ede: mov %rsi,%r8 0.00 : ffffffff814c6ee1: mov %r9,%rax 0.00 : ffffffff814c6ee4: and $0x3f,%r8d 0.00 : ffffffff814c6ee8: cmp %r8,%r9 0.00 : ffffffff814c6eeb: jne ffffffff814c6d35 <_copy_from_iter+0x295> : if 
(unlikely(xas->xa_index >= max)) 0.00 : ffffffff814c6ef1: cmp $0xffffffffffffffff,%rsi 0.00 : ffffffff814c6ef5: je ffffffff814c6f60 <_copy_from_iter+0x4c0> : if (unlikely(xas->xa_offset == XA_CHUNK_MASK)) 0.00 : ffffffff814c6ef7: cmp $0x3f,%al 0.00 : ffffffff814c6ef9: je ffffffff814c6f4b <_copy_from_iter+0x4ab> : entry = xa_entry(xas->xa, node, xas->xa_offset + 1); 0.00 : ffffffff814c6efb: movzbl %al,%r8d : xa_entry(): : return rcu_dereference_check(node->slots[offset], 0.00 : ffffffff814c6eff: add $0x5,%r8 0.00 : ffffffff814c6f03: mov 0x8(%rdi,%r8,8),%r14 : xa_is_internal(): : return ((unsigned long)entry & 3) == 2; 0.00 : ffffffff814c6f08: mov %r14,%r8 0.00 : ffffffff814c6f0b: and $0x3,%r8d : xas_next_entry(): : if (unlikely(xa_is_internal(entry))) 0.00 : ffffffff814c6f0f: cmp $0x2,%r8 0.00 : ffffffff814c6f13: je ffffffff814c6f37 <_copy_from_iter+0x497> : xas->xa_offset++; 0.00 : ffffffff814c6f15: inc %eax : xas->xa_index++; 0.00 : ffffffff814c6f17: inc %rsi : } while (!entry); 0.00 : ffffffff814c6f1a: mov $0x1,%ecx 0.00 : ffffffff814c6f1f: test %r14,%r14 0.00 : ffffffff814c6f22: je ffffffff814c6ef1 <_copy_from_iter+0x451> 0.00 : ffffffff814c6f24: mov %al,-0x46(%rbp) 0.00 : ffffffff814c6f27: mov %rsi,-0x50(%rbp) 0.00 : ffffffff814c6f2b: jmpq ffffffff814c6c64 <_copy_from_iter+0x1c4> : _copy_from_iter(): 0.00 : ffffffff814c6f30: ud2 0.00 : ffffffff814c6f32: jmpq ffffffff814c6d51 <_copy_from_iter+0x2b1> 0.00 : ffffffff814c6f37: test %cl,%cl 0.00 : ffffffff814c6f39: je ffffffff814c6d35 <_copy_from_iter+0x295> 0.00 : ffffffff814c6f3f: mov %al,-0x46(%rbp) 0.00 : ffffffff814c6f42: mov %rsi,-0x50(%rbp) 0.00 : ffffffff814c6f46: jmpq ffffffff814c6d35 <_copy_from_iter+0x295> 0.00 : ffffffff814c6f4b: test %cl,%cl 0.00 : ffffffff814c6f4d: je ffffffff814c6d35 <_copy_from_iter+0x295> 0.00 : ffffffff814c6f53: movb $0x3f,-0x46(%rbp) 0.00 : ffffffff814c6f57: mov %rsi,-0x50(%rbp) : xas_next_entry(): : return xas_find(xas, max); 0.00 : ffffffff814c6f5b: jmpq ffffffff814c6d35 <_copy_from_iter+0x295> 0.00 : ffffffff814c6f60: test %cl,%cl 0.00 : ffffffff814c6f62: je ffffffff814c6d35 <_copy_from_iter+0x295> 0.00 : ffffffff814c6f68: mov %al,-0x46(%rbp) 0.00 : ffffffff814c6f6b: movq $0xffffffffffffffff,-0x50(%rbp) : return xas_find(xas, max); 0.00 : ffffffff814c6f73: jmpq ffffffff814c6d35 <_copy_from_iter+0x295> : _copy_from_iter(): 0.00 : ffffffff814c6f78: ud2 0.00 : ffffffff814c6f7a: jmpq ffffffff814c6d51 <_copy_from_iter+0x2b1> : xas_reset(): : xas->xa_node = XAS_RESTART; 0.00 : ffffffff814c6f7f: movq $0x3,-0x40(%rbp) : xas_not_node(): : return ((unsigned long)node & 3) || !node; 0.00 : ffffffff814c6f87: jmpq ffffffff814c6d35 <_copy_from_iter+0x295> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] block: reexpand iov_iter after read/write 2021-05-06 19:15 ` Jens Axboe @ 2021-05-06 21:08 ` Al Viro 2021-05-06 21:17 ` Matthew Wilcox 2021-05-07 14:59 ` Jens Axboe 0 siblings, 2 replies; 18+ messages in thread From: Al Viro @ 2021-05-06 21:08 UTC (permalink / raw) To: Jens Axboe Cc: Pavel Begunkov, yangerkun, linux-fsdevel, linux-block, io-uring On Thu, May 06, 2021 at 01:15:01PM -0600, Jens Axboe wrote: > Attached output of perf annotate <func> for that last run. Heh... I wonder if keeping the value of iocb_flags(file) in struct file itself would have a visible effect... ^ permalink raw reply [flat|nested] 18+ messages in thread
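A rough sketch of that idea, purely for illustration (the field name f_iocb_flags and the helper below are assumptions, not existing kernel API at this point): compute iocb_flags() once, whenever f_flags or the inode's sync state can change, and let init_sync_kiocb() read the cached value instead of rederiving it on every synchronous read/write.

/* Illustrative sketch only; field name and helper are made up. */
struct file {
	/* ... existing fields ... */
	unsigned int		f_iocb_flags;	/* cached iocb_flags(file) */
	/* ... */
};

/* refresh after f_flags (or the inode's S_SYNC state) changes,
 * e.g. at open time and in fcntl(F_SETFL) */
static void file_refresh_iocb_flags(struct file *file)
{
	file->f_iocb_flags = iocb_flags(file);
}

static inline void init_sync_kiocb(struct kiocb *kiocb, struct file *filp)
{
	*kiocb = (struct kiocb) {
		.ki_filp = filp,
		.ki_flags = filp->f_iocb_flags,	/* was iocb_flags(filp) */
		.ki_hint = ki_hint_validate(file_write_hint(filp)),
		.ki_ioprio = get_current_ioprio(),
	};
}

That would drop the f_mapping->host dereference and the IS_SYNC() test that account for a noticeable share of the new_sync_write() samples in the profile above.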
* Re: [PATCH] block: reexpand iov_iter after read/write 2021-05-06 21:08 ` Al Viro @ 2021-05-06 21:17 ` Matthew Wilcox 2021-05-07 14:59 ` Jens Axboe 1 sibling, 0 replies; 18+ messages in thread From: Matthew Wilcox @ 2021-05-06 21:17 UTC (permalink / raw) To: Al Viro Cc: Jens Axboe, Pavel Begunkov, yangerkun, linux-fsdevel, linux-block, io-uring On Thu, May 06, 2021 at 09:08:50PM +0000, Al Viro wrote: > On Thu, May 06, 2021 at 01:15:01PM -0600, Jens Axboe wrote: > > > Attached output of perf annotate <func> for that last run. > > Heh... I wonder if keeping the value of iocb_flags(file) in > struct file itself would have a visible effect... I suggested that ... https://lore.kernel.org/linux-fsdevel/[email protected]/ ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] block: reexpand iov_iter after read/write 2021-05-06 21:08 ` Al Viro 2021-05-06 21:17 ` Matthew Wilcox @ 2021-05-07 14:59 ` Jens Axboe 1 sibling, 0 replies; 18+ messages in thread From: Jens Axboe @ 2021-05-07 14:59 UTC (permalink / raw) To: Al Viro; +Cc: Pavel Begunkov, yangerkun, linux-fsdevel, linux-block, io-uring On 5/6/21 3:08 PM, Al Viro wrote: > On Thu, May 06, 2021 at 01:15:01PM -0600, Jens Axboe wrote: > >> Attached output of perf annotate <func> for that last run. > > Heh... I wonder if keeping the value of iocb_flags(file) in > struct file itself would have a visible effect... I tried a quick hack to get rid of the init_sync_kiocb() in new_sync_write() and to just eliminate the ki_flags read in eventfd_write(), since the test case is blocking. That brings us closer to the ->write() method, down 7% vs the previous 10%: Executed in 468.23 millis fish external usr time 95.09 millis 114.00 micros 94.98 millis sys time 372.98 millis 76.00 micros 372.90 millis Executed in 468.97 millis fish external usr time 91.05 millis 89.00 micros 90.96 millis sys time 377.92 millis 69.00 micros 377.85 millis -- Jens Axboe ^ permalink raw reply [flat|nested] 18+ messages in thread
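The hack itself is not included in the mail; the sketch below is a guess at its general shape, not the actual patch. Leaving ki_flags, ki_hint and ki_ioprio at zero is only tolerable because this particular blocking eventfd test never consults them, which is exactly why it is a hack rather than a mergeable change; the matching removal of the ki_flags check in eventfd_write() is not shown.

/* Illustrative only: new_sync_write() with the init_sync_kiocb() setup
 * stripped down to the two fields this test actually needs. */
static ssize_t new_sync_write(struct file *filp, const char __user *buf,
			      size_t len, loff_t *ppos)
{
	struct iovec iov = { .iov_base = (void __user *)buf, .iov_len = len };
	struct kiocb kiocb = {
		.ki_filp = filp,
		.ki_pos = ppos ? *ppos : 0,
		/* ki_flags/ki_hint/ki_ioprio left at 0: hack, see above */
	};
	struct iov_iter iter;
	ssize_t ret;

	iov_iter_init(&iter, WRITE, &iov, 1, len);
	ret = call_write_iter(filp, &kiocb, &iter);
	BUG_ON(ret == -EIOCBQUEUED);
	if (ret > 0 && ppos)
		*ppos = kiocb.ki_pos;
	return ret;
}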
end of thread, other threads: [~2021-05-07 14:59 UTC | newest]

Thread overview: 18+ messages (links below jump to the message on this page):
2021-04-01  7:18 [PATCH] block: reexpand iov_iter after read/write yangerkun
2021-04-06  1:28 ` yangerkun
2021-04-06 11:04 ` Pavel Begunkov
2021-04-07 14:16 ` yangerkun
2021-04-09 14:49 ` Pavel Begunkov
2021-04-15 17:37 ` Pavel Begunkov
2021-04-15 17:39 ` Pavel Begunkov
2021-04-28  6:16 ` yangerkun
2021-04-30 12:57 ` Pavel Begunkov
2021-04-30 14:35 ` Al Viro
2021-05-06 16:57 ` Pavel Begunkov
2021-05-06 17:17 ` Al Viro
2021-05-06 17:19 ` Jens Axboe
2021-05-06 18:55 ` Al Viro
2021-05-06 19:15 ` Jens Axboe
2021-05-06 21:08 ` Al Viro
2021-05-06 21:17 ` Matthew Wilcox
2021-05-07 14:59 ` Jens Axboe