public inbox for [email protected]
* [PATCH] block: reexpand iov_iter after read/write
@ 2021-04-01  7:18 yangerkun
  2021-04-06  1:28 ` yangerkun
  2021-04-09 14:49 ` Pavel Begunkov
  0 siblings, 2 replies; 18+ messages in thread
From: yangerkun @ 2021-04-01  7:18 UTC
  To: viro, axboe, asml.silence; +Cc: linux-fsdevel, linux-block, io-uring, yangerkun

We hit the following bug:

BUG: KASAN: slab-out-of-bounds in iov_iter_revert+0x11c/0x404
lib/iov_iter.c:1139
Read of size 8 at addr ffff0000d3fb11f8 by task

CPU: 0 PID: 12582 Comm: syz-executor.2 Not tainted
5.10.0-00843-g352c8610ccd2 #2
Hardware name: linux,dummy-virt (DT)
Call trace:
 dump_backtrace+0x0/0x2d0 arch/arm64/kernel/stacktrace.c:132
 show_stack+0x28/0x34 arch/arm64/kernel/stacktrace.c:196
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x110/0x164 lib/dump_stack.c:118
 print_address_description+0x78/0x5c8 mm/kasan/report.c:385
 __kasan_report mm/kasan/report.c:545 [inline]
 kasan_report+0x148/0x1e4 mm/kasan/report.c:562
 check_memory_region_inline mm/kasan/generic.c:183 [inline]
 __asan_load8+0xb4/0xbc mm/kasan/generic.c:252
 iov_iter_revert+0x11c/0x404 lib/iov_iter.c:1139
 io_read fs/io_uring.c:3421 [inline]
 io_issue_sqe+0x2344/0x2d64 fs/io_uring.c:5943
 __io_queue_sqe+0x19c/0x520 fs/io_uring.c:6260
 io_queue_sqe+0x2a4/0x590 fs/io_uring.c:6326
 io_submit_sqe fs/io_uring.c:6395 [inline]
 io_submit_sqes+0x4c0/0xa04 fs/io_uring.c:6624
 __do_sys_io_uring_enter fs/io_uring.c:9013 [inline]
 __se_sys_io_uring_enter fs/io_uring.c:8960 [inline]
 __arm64_sys_io_uring_enter+0x190/0x708 fs/io_uring.c:8960
 __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline]
 invoke_syscall arch/arm64/kernel/syscall.c:48 [inline]
 el0_svc_common arch/arm64/kernel/syscall.c:158 [inline]
 do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:227
 el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367
 el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383
 el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670

Allocated by task 12570:
 stack_trace_save+0x80/0xb8 kernel/stacktrace.c:121
 kasan_save_stack mm/kasan/common.c:48 [inline]
 kasan_set_track mm/kasan/common.c:56 [inline]
 __kasan_kmalloc+0xdc/0x120 mm/kasan/common.c:461
 kasan_kmalloc+0xc/0x14 mm/kasan/common.c:475
 __kmalloc+0x23c/0x334 mm/slub.c:3970
 kmalloc include/linux/slab.h:557 [inline]
 __io_alloc_async_data+0x68/0x9c fs/io_uring.c:3210
 io_setup_async_rw fs/io_uring.c:3229 [inline]
 io_read fs/io_uring.c:3436 [inline]
 io_issue_sqe+0x2954/0x2d64 fs/io_uring.c:5943
 __io_queue_sqe+0x19c/0x520 fs/io_uring.c:6260
 io_queue_sqe+0x2a4/0x590 fs/io_uring.c:6326
 io_submit_sqe fs/io_uring.c:6395 [inline]
 io_submit_sqes+0x4c0/0xa04 fs/io_uring.c:6624
 __do_sys_io_uring_enter fs/io_uring.c:9013 [inline]
 __se_sys_io_uring_enter fs/io_uring.c:8960 [inline]
 __arm64_sys_io_uring_enter+0x190/0x708 fs/io_uring.c:8960
 __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline]
 invoke_syscall arch/arm64/kernel/syscall.c:48 [inline]
 el0_svc_common arch/arm64/kernel/syscall.c:158 [inline]
 do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:227
 el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367
 el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383
 el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670

Freed by task 12570:
 stack_trace_save+0x80/0xb8 kernel/stacktrace.c:121
 kasan_save_stack mm/kasan/common.c:48 [inline]
 kasan_set_track+0x38/0x6c mm/kasan/common.c:56
 kasan_set_free_info+0x20/0x40 mm/kasan/generic.c:355
 __kasan_slab_free+0x124/0x150 mm/kasan/common.c:422
 kasan_slab_free+0x10/0x1c mm/kasan/common.c:431
 slab_free_hook mm/slub.c:1544 [inline]
 slab_free_freelist_hook mm/slub.c:1577 [inline]
 slab_free mm/slub.c:3142 [inline]
 kfree+0x104/0x38c mm/slub.c:4124
 io_dismantle_req fs/io_uring.c:1855 [inline]
 __io_free_req+0x70/0x254 fs/io_uring.c:1867
 io_put_req_find_next fs/io_uring.c:2173 [inline]
 __io_queue_sqe+0x1fc/0x520 fs/io_uring.c:6279
 __io_req_task_submit+0x154/0x21c fs/io_uring.c:2051
 io_req_task_submit+0x2c/0x44 fs/io_uring.c:2063
 task_work_run+0xdc/0x128 kernel/task_work.c:151
 get_signal+0x6f8/0x980 kernel/signal.c:2562
 do_signal+0x108/0x3a4 arch/arm64/kernel/signal.c:658
 do_notify_resume+0xbc/0x25c arch/arm64/kernel/signal.c:722
 work_pending+0xc/0x180

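blkdev_read_iter() can truncate the iov_iter's count, since count + pos
may exceed the size of the block device, and it never restores the
truncated part. io_read() computes how much to revert as the count
before the call minus the count after it, so it treats the truncated
bytes as consumed. Once io_read() calls iov_iter_revert() with that
amount, the revert walks back past the start of the iovec and triggers
the slab-out-of-bounds read. Fix it by re-expanding the count by the
amount that was truncated.

blkdev_write_iter() can trigger the same problem.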
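
To make the bookkeeping concrete, here is a minimal userspace sketch.
struct model_iter, model_truncate(), model_advance() and revert_amount()
are hypothetical names made up for illustration; they only mimic the
count handling of iov_iter_truncate(), iov_iter_advance() and
iov_iter_reexpand(), and the revert formula (count before the call
minus count after it) is assumed from the io_read() behaviour described
above, not copied from it.

```c
/*
 * Hypothetical userspace model of the count bookkeeping described
 * above. These names are illustrative, not the kernel iov_iter API.
 */
#include <assert.h>
#include <stddef.h>

struct model_iter {
	size_t count; /* bytes the callee may still use */
};

/* Mimics iov_iter_truncate(): lowers count without advancing. */
static void model_truncate(struct model_iter *i, size_t max)
{
	if (i->count > max)
		i->count = max;
}

/* Mimics iov_iter_advance(): the callee consumes n bytes. */
static void model_advance(struct model_iter *i, size_t n)
{
	assert(n <= i->count);
	i->count -= n;
}

/*
 * One blkdev_read_iter()-style call that consumes 'done' bytes before
 * io_read() decides to revert, followed by the io_read()-style revert
 * amount: count before the call minus count after it.
 */
static size_t revert_amount(size_t io_size, size_t dev_left,
			    size_t done, int reexpand)
{
	struct model_iter it = { .count = io_size };
	size_t shorted = 0;

	if (it.count > dev_left) {
		shorted = it.count - dev_left;
		model_truncate(&it, dev_left);
	}
	model_advance(&it, done);
	if (reexpand)
		it.count += shorted; /* what the patch's iov_iter_reexpand() restores */
	return io_size - it.count;
}
```

For example, with a 4096-byte request, 512 bytes left on the device and
nothing consumed, revert_amount(4096, 512, 0, 0) is 3584 even though
nothing was consumed, while revert_amount(4096, 512, 0, 1) is 0: with
the re-expansion, the revert amount always equals the bytes actually
consumed.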

Signed-off-by: yangerkun <[email protected]>
---
 fs/block_dev.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 92ed7d5df677..788e1014576f 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1680,6 +1680,7 @@ ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	struct inode *bd_inode = bdev_file_inode(file);
 	loff_t size = i_size_read(bd_inode);
 	struct blk_plug plug;
+	size_t shorted = 0;
 	ssize_t ret;
 
 	if (bdev_read_only(I_BDEV(bd_inode)))
@@ -1697,12 +1698,17 @@ ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	if ((iocb->ki_flags & (IOCB_NOWAIT | IOCB_DIRECT)) == IOCB_NOWAIT)
 		return -EOPNOTSUPP;
 
-	iov_iter_truncate(from, size - iocb->ki_pos);
+	size -= iocb->ki_pos;
+	if (iov_iter_count(from) > size) {
+		shorted = iov_iter_count(from) - size;
+		iov_iter_truncate(from, size);
+	}
 
 	blk_start_plug(&plug);
 	ret = __generic_file_write_iter(iocb, from);
 	if (ret > 0)
 		ret = generic_write_sync(iocb, ret);
+	iov_iter_reexpand(from, iov_iter_count(from) + shorted);
 	blk_finish_plug(&plug);
 	return ret;
 }
@@ -1714,13 +1720,21 @@ ssize_t blkdev_read_iter(struct kiocb *iocb, struct iov_iter *to)
 	struct inode *bd_inode = bdev_file_inode(file);
 	loff_t size = i_size_read(bd_inode);
 	loff_t pos = iocb->ki_pos;
+	size_t shorted = 0;
+	ssize_t ret;
 
 	if (pos >= size)
 		return 0;
 
 	size -= pos;
-	iov_iter_truncate(to, size);
-	return generic_file_read_iter(iocb, to);
+	if (iov_iter_count(to) > size) {
+		shorted = iov_iter_count(to) - size;
+		iov_iter_truncate(to, size);
+	}
+
+	ret = generic_file_read_iter(iocb, to);
+	iov_iter_reexpand(to, iov_iter_count(to) + shorted);
+	return ret;
 }
 EXPORT_SYMBOL_GPL(blkdev_read_iter);
 
-- 
2.25.4



* Re: [PATCH] block: reexpand iov_iter after read/write
  2021-04-01  7:18 [PATCH] block: reexpand iov_iter after read/write yangerkun
@ 2021-04-06  1:28 ` yangerkun
  2021-04-06 11:04   ` Pavel Begunkov
  2021-04-09 14:49 ` Pavel Begunkov
  1 sibling, 1 reply; 18+ messages in thread
From: yangerkun @ 2021-04-06  1:28 UTC
  To: viro, axboe, asml.silence; +Cc: linux-fsdevel, linux-block, io-uring

Ping...



* Re: [PATCH] block: reexpand iov_iter after read/write
  2021-04-06  1:28 ` yangerkun
@ 2021-04-06 11:04   ` Pavel Begunkov
  2021-04-07 14:16     ` yangerkun
  0 siblings, 1 reply; 18+ messages in thread
From: Pavel Begunkov @ 2021-04-06 11:04 UTC
  To: yangerkun, viro, axboe; +Cc: linux-fsdevel, linux-block, io-uring

On 06/04/2021 02:28, yangerkun wrote:
> Ping...

It wasn't forgotten, but it wouldn't have worked for
other reasons. With these two patches already queued, it's a
different story.

https://git.kernel.dk/cgit/linux-block/commit/?h=io_uring-5.12&id=07204f21577a1d882f0259590c3553fe6a476381
https://git.kernel.dk/cgit/linux-block/commit/?h=io_uring-5.12&id=230d50d448acb6639991440913299e50cacf1daf

Can you re-confirm that the bug is still there (it should be)
and that your patch fixes it?


-- 
Pavel Begunkov


* Re: [PATCH] block: reexpand iov_iter after read/write
  2021-04-06 11:04   ` Pavel Begunkov
@ 2021-04-07 14:16     ` yangerkun
  0 siblings, 0 replies; 18+ messages in thread
From: yangerkun @ 2021-04-07 14:16 UTC
  To: Pavel Begunkov, viro, axboe; +Cc: linux-fsdevel, linux-block, io-uring



On 2021/4/6 19:04, Pavel Begunkov wrote:
> On 06/04/2021 02:28, yangerkun wrote:
>> Ping...
> 
> It wasn't forgotten, but it wouldn't have worked for
> other reasons. With these two patches already queued, it's a
> different story.
> 
> https://git.kernel.dk/cgit/linux-block/commit/?h=io_uring-5.12&id=07204f21577a1d882f0259590c3553fe6a476381
> https://git.kernel.dk/cgit/linux-block/commit/?h=io_uring-5.12&id=230d50d448acb6639991440913299e50cacf1daf
> 
> Can you re-confirm, that the bug is still there (should be)
> and your patch fixes it?

Hi,

This problem still exists in mainline (2d743660786e Merge branch 'fixes'
of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs), and this
patch fixes it.

io_read() on a loop device returns -EAGAIN, which leads to an
iov_iter_revert() in io_read(). Once blkdev_read_iter() has truncated
the iov_iter, we hit this bug:


[  181.204371][ T4241] loop0: detected capacity change from 0 to 232
[  181.253683][ T4241] ==================================================================
[  181.255313][ T4241] BUG: KASAN: slab-out-of-bounds in iov_iter_revert+0xd0/0x3e0
[  181.256723][ T4241] Read of size 8 at addr ffff0000cfbc8ff8 by task a.out/4241
[  181.257776][ T4241]
[  181.258749][ T4241] CPU: 5 PID: 4241 Comm: a.out Not tainted 5.12.0-rc6-00006-g2d743660786e #1
[  181.260149][ T4241] Hardware name: linux,dummy-virt (DT)
[  181.261468][ T4241] Call trace:
[  181.262052][ T4241]  dump_backtrace+0x0/0x348
[  181.263139][ T4241]  show_stack+0x28/0x38
[  181.264234][ T4241]  dump_stack+0x134/0x1a4
[  181.265175][ T4241]  print_address_description.constprop.0+0x68/0x304
[  181.266430][ T4241]  kasan_report+0x1d0/0x238
[  181.267308][ T4241]  __asan_load8+0x88/0xc0
[  181.268317][ T4241]  iov_iter_revert+0xd0/0x3e0
[  181.269251][ T4241]  io_read+0x310/0x5c0
[  181.270208][ T4241]  io_issue_sqe+0x3fc/0x25d8
[  181.271134][ T4241]  __io_queue_sqe+0xf8/0x480
[  181.272142][ T4241]  io_queue_sqe+0x3a4/0x4c8
[  181.273053][ T4241]  io_submit_sqes+0xd9c/0x22d0
[  181.274375][ T4241]  __arm64_sys_io_uring_enter+0x3d0/0xce0
[  181.275554][ T4241]  do_el0_svc+0xc4/0x228
[  181.276411][ T4241]  el0_svc+0x24/0x30
[  181.277323][ T4241]  el0_sync_handler+0x158/0x160
[  181.278241][ T4241]  el0_sync+0x13c/0x140
[  181.279287][ T4241]
[  181.279820][ T4241] Allocated by task 4241:
[  181.280699][ T4241]  kasan_save_stack+0x24/0x50
[  181.281626][ T4241]  __kasan_kmalloc+0x84/0xa8
[  181.282578][ T4241]  io_wq_create+0x94/0x668
[  181.283469][ T4241]  io_uring_alloc_task_context+0x164/0x368
[  181.284748][ T4241]  io_uring_add_task_file+0x1b0/0x208
[  181.285865][ T4241]  io_uring_setup+0xaac/0x12a0
[  181.286823][ T4241]  __arm64_sys_io_uring_setup+0x34/0x40
[  181.287957][ T4241]  do_el0_svc+0xc4/0x228
[  181.288906][ T4241]  el0_svc+0x24/0x30
[  181.289816][ T4241]  el0_sync_handler+0x158/0x160
[  181.290751][ T4241]  el0_sync+0x13c/0x140
[  181.291697][ T4241]
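
Why an oversized revert turns into an out-of-bounds read can also be
sketched in userspace. The following is a hypothetical model of the
iovec branch of iov_iter_revert() as we understand it; struct
model_seg, struct model_pos and model_revert() are invented names. It
steps back one segment at a time and reads each segment's length, so
unrolling more bytes than were consumed walks off the front of the
array. The model bounds-checks and returns false at that point, where
the kernel loop would instead read the 8-byte length of the element
before the array, which would be consistent with the "Read of size 8"
above.

```c
/*
 * Hypothetical model of the iovec branch of iov_iter_revert().
 * Not kernel code: the names and the bounds check are illustrative.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_seg {
	size_t len;                   /* like iov_len */
};

struct model_pos {
	const struct model_seg *base; /* first segment of the array */
	size_t idx;                   /* index of the current segment */
	size_t offset;                /* bytes consumed in base[idx] */
};

/*
 * Step back 'unroll' bytes. Returns false if that would move before
 * base[0]; the kernel loop has no such check and would go on to read
 * the length of the element before the array.
 */
static bool model_revert(struct model_pos *p, size_t unroll)
{
	assert(p->offset <= p->base[p->idx].len);

	/* Enough bytes consumed in the current segment: just rewind. */
	if (unroll <= p->offset) {
		p->offset -= unroll;
		return true;
	}
	unroll -= p->offset;

	/* Walk back over whole previous segments. */
	while (p->idx > 0) {
		size_t n = p->base[--p->idx].len;

		if (unroll <= n) {
			p->offset = n - unroll;
			return true;
		}
		unroll -= n;
	}
	return false; /* would underflow: the KASAN case */
}
```

With segments of 100 and 200 bytes and the position at offset 50 of the
second segment, reverting 120 bytes lands at offset 30 of the first
segment; from the very start of the array, reverting even 1 byte fails.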




* Re: [PATCH] block: reexpand iov_iter after read/write
  2021-04-01  7:18 [PATCH] block: reexpand iov_iter after read/write yangerkun
  2021-04-06  1:28 ` yangerkun
@ 2021-04-09 14:49 ` Pavel Begunkov
  2021-04-15 17:37   ` Pavel Begunkov
  1 sibling, 1 reply; 18+ messages in thread
From: Pavel Begunkov @ 2021-04-09 14:49 UTC
  To: yangerkun, axboe; +Cc: viro, linux-fsdevel, linux-block, io-uring

On 01/04/2021 08:18, yangerkun wrote:
> We get a bug:
> 
> BUG: KASAN: slab-out-of-bounds in iov_iter_revert+0x11c/0x404
> lib/iov_iter.c:1139
> Read of size 8 at addr ffff0000d3fb11f8 by task
> 
> CPU: 0 PID: 12582 Comm: syz-executor.2 Not tainted
> 5.10.0-00843-g352c8610ccd2 #2
> Hardware name: linux,dummy-virt (DT)
> Call trace:
>  dump_backtrace+0x0/0x2d0 arch/arm64/kernel/stacktrace.c:132
>  show_stack+0x28/0x34 arch/arm64/kernel/stacktrace.c:196
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x110/0x164 lib/dump_stack.c:118
>  print_address_description+0x78/0x5c8 mm/kasan/report.c:385
>  __kasan_report mm/kasan/report.c:545 [inline]
>  kasan_report+0x148/0x1e4 mm/kasan/report.c:562
>  check_memory_region_inline mm/kasan/generic.c:183 [inline]
>  __asan_load8+0xb4/0xbc mm/kasan/generic.c:252
>  iov_iter_revert+0x11c/0x404 lib/iov_iter.c:1139
>  io_read fs/io_uring.c:3421 [inline]
>  io_issue_sqe+0x2344/0x2d64 fs/io_uring.c:5943
>  __io_queue_sqe+0x19c/0x520 fs/io_uring.c:6260
>  io_queue_sqe+0x2a4/0x590 fs/io_uring.c:6326
>  io_submit_sqe fs/io_uring.c:6395 [inline]
>  io_submit_sqes+0x4c0/0xa04 fs/io_uring.c:6624
>  __do_sys_io_uring_enter fs/io_uring.c:9013 [inline]
>  __se_sys_io_uring_enter fs/io_uring.c:8960 [inline]
>  __arm64_sys_io_uring_enter+0x190/0x708 fs/io_uring.c:8960
>  __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline]
>  invoke_syscall arch/arm64/kernel/syscall.c:48 [inline]
>  el0_svc_common arch/arm64/kernel/syscall.c:158 [inline]
>  do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:227
>  el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367
>  el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383
>  el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670
> 
> Allocated by task 12570:
>  stack_trace_save+0x80/0xb8 kernel/stacktrace.c:121
>  kasan_save_stack mm/kasan/common.c:48 [inline]
>  kasan_set_track mm/kasan/common.c:56 [inline]
>  __kasan_kmalloc+0xdc/0x120 mm/kasan/common.c:461
>  kasan_kmalloc+0xc/0x14 mm/kasan/common.c:475
>  __kmalloc+0x23c/0x334 mm/slub.c:3970
>  kmalloc include/linux/slab.h:557 [inline]
>  __io_alloc_async_data+0x68/0x9c fs/io_uring.c:3210
>  io_setup_async_rw fs/io_uring.c:3229 [inline]
>  io_read fs/io_uring.c:3436 [inline]
>  io_issue_sqe+0x2954/0x2d64 fs/io_uring.c:5943
>  __io_queue_sqe+0x19c/0x520 fs/io_uring.c:6260
>  io_queue_sqe+0x2a4/0x590 fs/io_uring.c:6326
>  io_submit_sqe fs/io_uring.c:6395 [inline]
>  io_submit_sqes+0x4c0/0xa04 fs/io_uring.c:6624
>  __do_sys_io_uring_enter fs/io_uring.c:9013 [inline]
>  __se_sys_io_uring_enter fs/io_uring.c:8960 [inline]
>  __arm64_sys_io_uring_enter+0x190/0x708 fs/io_uring.c:8960
>  __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline]
>  invoke_syscall arch/arm64/kernel/syscall.c:48 [inline]
>  el0_svc_common arch/arm64/kernel/syscall.c:158 [inline]
>  do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:227
>  el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367
>  el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383
>  el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670
> 
> Freed by task 12570:
>  stack_trace_save+0x80/0xb8 kernel/stacktrace.c:121
>  kasan_save_stack mm/kasan/common.c:48 [inline]
>  kasan_set_track+0x38/0x6c mm/kasan/common.c:56
>  kasan_set_free_info+0x20/0x40 mm/kasan/generic.c:355
>  __kasan_slab_free+0x124/0x150 mm/kasan/common.c:422
>  kasan_slab_free+0x10/0x1c mm/kasan/common.c:431
>  slab_free_hook mm/slub.c:1544 [inline]
>  slab_free_freelist_hook mm/slub.c:1577 [inline]
>  slab_free mm/slub.c:3142 [inline]
>  kfree+0x104/0x38c mm/slub.c:4124
>  io_dismantle_req fs/io_uring.c:1855 [inline]
>  __io_free_req+0x70/0x254 fs/io_uring.c:1867
>  io_put_req_find_next fs/io_uring.c:2173 [inline]
>  __io_queue_sqe+0x1fc/0x520 fs/io_uring.c:6279
>  __io_req_task_submit+0x154/0x21c fs/io_uring.c:2051
>  io_req_task_submit+0x2c/0x44 fs/io_uring.c:2063
>  task_work_run+0xdc/0x128 kernel/task_work.c:151
>  get_signal+0x6f8/0x980 kernel/signal.c:2562
>  do_signal+0x108/0x3a4 arch/arm64/kernel/signal.c:658
>  do_notify_resume+0xbc/0x25c arch/arm64/kernel/signal.c:722
>  work_pending+0xc/0x180
> 
> blkdev_read_iter can truncate the iov_iter's count, since count + pos may
> exceed the size of the blkdev. This makes io_read believe that more of
> the iovec was consumed than actually was, so the iov_iter_revert in
> io_read rewinds too far and triggers the slab-out-of-bounds. Fix it by
> re-expanding the count by the amount that was truncated.

Looks right,

Acked-by: Pavel Begunkov <[email protected]>

> 
> blkdev_write_iter can trigger the problem too.
> 
> Signed-off-by: yangerkun <[email protected]>
> ---
>  fs/block_dev.c | 20 +++++++++++++++++---
>  1 file changed, 17 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index 92ed7d5df677..788e1014576f 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -1680,6 +1680,7 @@ ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from)
>  	struct inode *bd_inode = bdev_file_inode(file);
>  	loff_t size = i_size_read(bd_inode);
>  	struct blk_plug plug;
> +	size_t shorted = 0;
>  	ssize_t ret;
>  
>  	if (bdev_read_only(I_BDEV(bd_inode)))
> @@ -1697,12 +1698,17 @@ ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from)
>  	if ((iocb->ki_flags & (IOCB_NOWAIT | IOCB_DIRECT)) == IOCB_NOWAIT)
>  		return -EOPNOTSUPP;
>  
> -	iov_iter_truncate(from, size - iocb->ki_pos);
> +	size -= iocb->ki_pos;
> +	if (iov_iter_count(from) > size) {
> +		shorted = iov_iter_count(from) - size;
> +		iov_iter_truncate(from, size);
> +	}
>  
>  	blk_start_plug(&plug);
>  	ret = __generic_file_write_iter(iocb, from);
>  	if (ret > 0)
>  		ret = generic_write_sync(iocb, ret);
> +	iov_iter_reexpand(from, iov_iter_count(from) + shorted);
>  	blk_finish_plug(&plug);
>  	return ret;
>  }
> @@ -1714,13 +1720,21 @@ ssize_t blkdev_read_iter(struct kiocb *iocb, struct iov_iter *to)
>  	struct inode *bd_inode = bdev_file_inode(file);
>  	loff_t size = i_size_read(bd_inode);
>  	loff_t pos = iocb->ki_pos;
> +	size_t shorted = 0;
> +	ssize_t ret;
>  
>  	if (pos >= size)
>  		return 0;
>  
>  	size -= pos;
> -	iov_iter_truncate(to, size);
> -	return generic_file_read_iter(iocb, to);
> +	if (iov_iter_count(to) > size) {
> +		shorted = iov_iter_count(to) - size;
> +		iov_iter_truncate(to, size);
> +	}
> +
> +	ret = generic_file_read_iter(iocb, to);
> +	iov_iter_reexpand(to, iov_iter_count(to) + shorted);
> +	return ret;
>  }
>  EXPORT_SYMBOL_GPL(blkdev_read_iter);
>  
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] block: reexpand iov_iter after read/write
  2021-04-09 14:49 ` Pavel Begunkov
@ 2021-04-15 17:37   ` Pavel Begunkov
  2021-04-15 17:39     ` Pavel Begunkov
  0 siblings, 1 reply; 18+ messages in thread
From: Pavel Begunkov @ 2021-04-15 17:37 UTC (permalink / raw)
  To: yangerkun, axboe; +Cc: viro, linux-fsdevel, linux-block, io-uring

On 09/04/2021 15:49, Pavel Begunkov wrote:
> On 01/04/2021 08:18, yangerkun wrote:
>> We get a bug:
>>
>> BUG: KASAN: slab-out-of-bounds in iov_iter_revert+0x11c/0x404
>> lib/iov_iter.c:1139
>> Read of size 8 at addr ffff0000d3fb11f8 by task
>>
>> CPU: 0 PID: 12582 Comm: syz-executor.2 Not tainted
>> 5.10.0-00843-g352c8610ccd2 #2
>> Hardware name: linux,dummy-virt (DT)
>> Call trace:
...
>>  __asan_load8+0xb4/0xbc mm/kasan/generic.c:252
>>  iov_iter_revert+0x11c/0x404 lib/iov_iter.c:1139
>>  io_read fs/io_uring.c:3421 [inline]
>>  io_issue_sqe+0x2344/0x2d64 fs/io_uring.c:5943
>>  __io_queue_sqe+0x19c/0x520 fs/io_uring.c:6260
>>  io_queue_sqe+0x2a4/0x590 fs/io_uring.c:6326
>>  io_submit_sqe fs/io_uring.c:6395 [inline]
>>  io_submit_sqes+0x4c0/0xa04 fs/io_uring.c:6624
...
>>
>> blkdev_read_iter can truncate the iov_iter's count, since count + pos may
>> exceed the size of the blkdev. This makes io_read believe that more of
>> the iovec was consumed than actually was, so the iov_iter_revert in
>> io_read rewinds too far and triggers the slab-out-of-bounds. Fix it by
>> re-expanding the count by the amount that was truncated.
> 
> Looks right,
> 
> Acked-by: Pavel Begunkov <[email protected]>

Fwiw, we need to forget to drag it through 5.13 + stable

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] block: reexpand iov_iter after read/write
  2021-04-15 17:37   ` Pavel Begunkov
@ 2021-04-15 17:39     ` Pavel Begunkov
  2021-04-28  6:16       ` yangerkun
  0 siblings, 1 reply; 18+ messages in thread
From: Pavel Begunkov @ 2021-04-15 17:39 UTC (permalink / raw)
  To: yangerkun, axboe; +Cc: viro, linux-fsdevel, linux-block, io-uring

On 15/04/2021 18:37, Pavel Begunkov wrote:
> On 09/04/2021 15:49, Pavel Begunkov wrote:
>> On 01/04/2021 08:18, yangerkun wrote:
>>> We get a bug:
>>>
>>> BUG: KASAN: slab-out-of-bounds in iov_iter_revert+0x11c/0x404
>>> lib/iov_iter.c:1139
>>> Read of size 8 at addr ffff0000d3fb11f8 by task
>>>
>>> CPU: 0 PID: 12582 Comm: syz-executor.2 Not tainted
>>> 5.10.0-00843-g352c8610ccd2 #2
>>> Hardware name: linux,dummy-virt (DT)
>>> Call trace:
> ...
>>>  __asan_load8+0xb4/0xbc mm/kasan/generic.c:252
>>>  iov_iter_revert+0x11c/0x404 lib/iov_iter.c:1139
>>>  io_read fs/io_uring.c:3421 [inline]
>>>  io_issue_sqe+0x2344/0x2d64 fs/io_uring.c:5943
>>>  __io_queue_sqe+0x19c/0x520 fs/io_uring.c:6260
>>>  io_queue_sqe+0x2a4/0x590 fs/io_uring.c:6326
>>>  io_submit_sqe fs/io_uring.c:6395 [inline]
>>>  io_submit_sqes+0x4c0/0xa04 fs/io_uring.c:6624
> ...
>>>
>>> blkdev_read_iter can truncate the iov_iter's count, since count + pos may
>>> exceed the size of the blkdev. This makes io_read believe that more of
>>> the iovec was consumed than actually was, so the iov_iter_revert in
>>> io_read rewinds too far and triggers the slab-out-of-bounds. Fix it by
>>> re-expanding the count by the amount that was truncated.
>>
>> Looks right,
>>
>> Acked-by: Pavel Begunkov <[email protected]>
> 
> Fwiw, we need to forget to drag it through 5.13 + stable

Err, typo: I meant to _not_ forget to take it into 5.13 + stable...

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] block: reexpand iov_iter after read/write
  2021-04-15 17:39     ` Pavel Begunkov
@ 2021-04-28  6:16       ` yangerkun
  2021-04-30 12:57         ` Pavel Begunkov
  0 siblings, 1 reply; 18+ messages in thread
From: yangerkun @ 2021-04-28  6:16 UTC (permalink / raw)
  To: Pavel Begunkov, axboe; +Cc: viro, linux-fsdevel, linux-block, io-uring

Hi,

Should we pick this patch for 5.13?

On 2021/4/16 1:39, Pavel Begunkov wrote:
> On 15/04/2021 18:37, Pavel Begunkov wrote:
>> On 09/04/2021 15:49, Pavel Begunkov wrote:
>>> On 01/04/2021 08:18, yangerkun wrote:
>>>> We get a bug:
>>>>
>>>> BUG: KASAN: slab-out-of-bounds in iov_iter_revert+0x11c/0x404
>>>> lib/iov_iter.c:1139
>>>> Read of size 8 at addr ffff0000d3fb11f8 by task
>>>>
>>>> CPU: 0 PID: 12582 Comm: syz-executor.2 Not tainted
>>>> 5.10.0-00843-g352c8610ccd2 #2
>>>> Hardware name: linux,dummy-virt (DT)
>>>> Call trace:
>> ...
>>>>   __asan_load8+0xb4/0xbc mm/kasan/generic.c:252
>>>>   iov_iter_revert+0x11c/0x404 lib/iov_iter.c:1139
>>>>   io_read fs/io_uring.c:3421 [inline]
>>>>   io_issue_sqe+0x2344/0x2d64 fs/io_uring.c:5943
>>>>   __io_queue_sqe+0x19c/0x520 fs/io_uring.c:6260
>>>>   io_queue_sqe+0x2a4/0x590 fs/io_uring.c:6326
>>>>   io_submit_sqe fs/io_uring.c:6395 [inline]
>>>>   io_submit_sqes+0x4c0/0xa04 fs/io_uring.c:6624
>> ...
>>>>
>>>> blkdev_read_iter can truncate the iov_iter's count, since count + pos may
>>>> exceed the size of the blkdev. This makes io_read believe that more of
>>>> the iovec was consumed than actually was, so the iov_iter_revert in
>>>> io_read rewinds too far and triggers the slab-out-of-bounds. Fix it by
>>>> re-expanding the count by the amount that was truncated.
>>>
>>> Looks right,
>>>
>>> Acked-by: Pavel Begunkov <[email protected]>
>>
>> Fwiw, we need to forget to drag it through 5.13 + stable
> 
> Err, typo: I meant to _not_ forget to take it into 5.13 + stable...

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] block: reexpand iov_iter after read/write
  2021-04-28  6:16       ` yangerkun
@ 2021-04-30 12:57         ` Pavel Begunkov
  2021-04-30 14:35           ` Al Viro
  0 siblings, 1 reply; 18+ messages in thread
From: Pavel Begunkov @ 2021-04-30 12:57 UTC (permalink / raw)
  To: yangerkun, axboe; +Cc: viro, linux-fsdevel, linux-block, io-uring

On 4/28/21 7:16 AM, yangerkun wrote:
> Hi,
> 
> Should we pick this patch for 5.13?

Looks ok to me

> 
> On 2021/4/16 1:39, Pavel Begunkov wrote:
>> On 15/04/2021 18:37, Pavel Begunkov wrote:
>>> On 09/04/2021 15:49, Pavel Begunkov wrote:
>>>> On 01/04/2021 08:18, yangerkun wrote:
>>>>> We get a bug:
>>>>>
>>>>> BUG: KASAN: slab-out-of-bounds in iov_iter_revert+0x11c/0x404
>>>>> lib/iov_iter.c:1139
>>>>> Read of size 8 at addr ffff0000d3fb11f8 by task
>>>>>
>>>>> CPU: 0 PID: 12582 Comm: syz-executor.2 Not tainted
>>>>> 5.10.0-00843-g352c8610ccd2 #2
>>>>> Hardware name: linux,dummy-virt (DT)
>>>>> Call trace:
>>> ...
>>>>>   __asan_load8+0xb4/0xbc mm/kasan/generic.c:252
>>>>>   iov_iter_revert+0x11c/0x404 lib/iov_iter.c:1139
>>>>>   io_read fs/io_uring.c:3421 [inline]
>>>>>   io_issue_sqe+0x2344/0x2d64 fs/io_uring.c:5943
>>>>>   __io_queue_sqe+0x19c/0x520 fs/io_uring.c:6260
>>>>>   io_queue_sqe+0x2a4/0x590 fs/io_uring.c:6326
>>>>>   io_submit_sqe fs/io_uring.c:6395 [inline]
>>>>>   io_submit_sqes+0x4c0/0xa04 fs/io_uring.c:6624
>>> ...
>>>>>
>>>>> blkdev_read_iter can truncate the iov_iter's count, since count + pos may
>>>>> exceed the size of the blkdev. This makes io_read believe that more of
>>>>> the iovec was consumed than actually was, so the iov_iter_revert in
>>>>> io_read rewinds too far and triggers the slab-out-of-bounds. Fix it by
>>>>> re-expanding the count by the amount that was truncated.
>>>>
>>>> Looks right,
>>>>
>>>> Acked-by: Pavel Begunkov <[email protected]>
>>>
>>> Fwiw, we need to forget to drag it through 5.13 + stable
>>
>> Err, typo: I meant to _not_ forget to take it into 5.13 + stable...

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] block: reexpand iov_iter after read/write
  2021-04-30 12:57         ` Pavel Begunkov
@ 2021-04-30 14:35           ` Al Viro
  2021-05-06 16:57             ` Pavel Begunkov
  2021-05-06 17:19             ` Jens Axboe
  0 siblings, 2 replies; 18+ messages in thread
From: Al Viro @ 2021-04-30 14:35 UTC (permalink / raw)
  To: Pavel Begunkov; +Cc: yangerkun, axboe, linux-fsdevel, linux-block, io-uring

On Fri, Apr 30, 2021 at 01:57:22PM +0100, Pavel Begunkov wrote:
> On 4/28/21 7:16 AM, yangerkun wrote:
> > Hi,
> > 
> > Should we pick this patch for 5.13?
> 
> Looks ok to me

	Looks sane.  BTW, Pavel, could you go over #untested.iov_iter
and give it some beating?  Ideally - with per-commit profiling to see
what speedups/slowdowns they come with...

	It's not in the final state (if nothing else, it needs to be
rebased on top of xarray stuff, and there will be followup cleanups
as well), but I'd appreciate testing and profiling data...

	It does survive xfstests + LTP syscall tests, but that's about
it.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] block: reexpand iov_iter after read/write
  2021-04-30 14:35           ` Al Viro
@ 2021-05-06 16:57             ` Pavel Begunkov
  2021-05-06 17:17               ` Al Viro
  2021-05-06 17:19             ` Jens Axboe
  1 sibling, 1 reply; 18+ messages in thread
From: Pavel Begunkov @ 2021-05-06 16:57 UTC (permalink / raw)
  To: Al Viro, Jens Axboe; +Cc: yangerkun, linux-fsdevel, linux-block, io-uring

On 4/30/21 3:35 PM, Al Viro wrote:
> On Fri, Apr 30, 2021 at 01:57:22PM +0100, Pavel Begunkov wrote:
>> On 4/28/21 7:16 AM, yangerkun wrote:
>>> Hi,
>>>
>>> Should we pick this patch for 5.13?
>>
>> Looks ok to me
> 
> 	Looks sane.  BTW, Pavel, could you go over #untested.iov_iter
> and give it some beating?  Ideally - with per-commit profiling to see
> what speedups/slowdowns they come with...

I've heard Jens has already tested it. Jens, is that right? Can you
share the results, especially since you have much better-suited hardware?

> 
> 	It's not in the final state (if nothing else, it needs to be
> rebased on top of xarray stuff, and there will be followup cleanups
> as well), but I'd appreciate testing and profiling data...
> 
> 	It does survive xfstests + LTP syscall tests, but that's about
> it.
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] block: reexpand iov_iter after read/write
  2021-05-06 16:57             ` Pavel Begunkov
@ 2021-05-06 17:17               ` Al Viro
  0 siblings, 0 replies; 18+ messages in thread
From: Al Viro @ 2021-05-06 17:17 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: Jens Axboe, yangerkun, linux-fsdevel, linux-block, io-uring

On Thu, May 06, 2021 at 05:57:02PM +0100, Pavel Begunkov wrote:
> On 4/30/21 3:35 PM, Al Viro wrote:
> > On Fri, Apr 30, 2021 at 01:57:22PM +0100, Pavel Begunkov wrote:
> >> On 4/28/21 7:16 AM, yangerkun wrote:
> >>> Hi,
> >>>
> >>> Should we pick this patch for 5.13?
> >>
> >> Looks ok to me
> > 
> > 	Looks sane.  BTW, Pavel, could you go over #untested.iov_iter
> > and give it some beating?  Ideally - with per-commit profiling to see
> > what speedups/slowdowns they come with...
> 
> I've heard Jens has already tested it. Jens, is that right? Can you
> share the results, especially since you have much better-suited hardware?

FWIW, the current branch is #untested.iov_iter-3, and the code generated
by it at least _looks_ better than with mainline; how much of an improvement
it makes would have to be determined by profiling...

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] block: reexpand iov_iter after read/write
  2021-04-30 14:35           ` Al Viro
  2021-05-06 16:57             ` Pavel Begunkov
@ 2021-05-06 17:19             ` Jens Axboe
  2021-05-06 18:55               ` Al Viro
  1 sibling, 1 reply; 18+ messages in thread
From: Jens Axboe @ 2021-05-06 17:19 UTC (permalink / raw)
  To: Al Viro, Pavel Begunkov; +Cc: yangerkun, linux-fsdevel, linux-block, io-uring

On 4/30/21 8:35 AM, Al Viro wrote:
> On Fri, Apr 30, 2021 at 01:57:22PM +0100, Pavel Begunkov wrote:
>> On 4/28/21 7:16 AM, yangerkun wrote:
>>> Hi,
>>>
>>> Should we pick this patch for 5.13?
>>
>> Looks ok to me
> 
> 	Looks sane.  BTW, Pavel, could you go over #untested.iov_iter
> and give it some beating?  Ideally - with per-commit profiling to see
> what speedups/slowdowns they come with...
> 
> 	It's not in the final state (if nothing else, it needs to be
> rebased on top of xarray stuff, and there will be followup cleanups
> as well), but I'd appreciate testing and profiling data...
> 
> 	It does survive xfstests + LTP syscall tests, but that's about
> it.

Al, I ran your v3 branch of that and I didn't see anything in terms
of speedups. The test case is something that just writes to eventfd
a ton of times, enough to get a picture of the overall runtime. First
I ran with the existing baseline, which is eventfd using ->write():

Executed in  436.58 millis    fish           external
   usr time  106.21 millis  121.00 micros  106.09 millis
   sys time  331.32 millis   33.00 micros  331.29 millis

Executed in  436.84 millis    fish           external
   usr time  113.38 millis    0.00 micros  113.38 millis
   sys time  324.32 millis  226.00 micros  324.10 millis

Then I ran it with the eventfd ->write_iter() patch I posted:

Executed in  484.54 millis    fish           external
   usr time   93.19 millis  119.00 micros   93.07 millis
   sys time  391.35 millis   46.00 micros  391.30 millis

Executed in  485.45 millis    fish           external
   usr time   96.05 millis    0.00 micros   96.05 millis
   sys time  389.42 millis  216.00 micros  389.20 millis

Doing a quick profile, on the latter run with ->write_iter() we're
spending 8% of the time in _copy_from_iter(), and 4% in
new_sync_write(). That's obviously not there at all for the first case.
Both have about 4% in eventfd_write(). Non-iter case spends 1% in
copy_from_user().

Finally with your branch pulled in as well, iow using ->write_iter() for
eventfd and your iov changes:

Executed in  485.26 millis    fish           external
   usr time  103.09 millis   70.00 micros  103.03 millis
   sys time  382.18 millis   83.00 micros  382.09 millis

Executed in  485.16 millis    fish           external
   usr time  104.07 millis   69.00 micros  104.00 millis
   sys time  381.09 millis   94.00 micros  381.00 millis

and there's no real difference there. We're spending less time in
_copy_from_iter() (8% -> 6%) and less time in new_sync_write(), but
that doesn't seem to manifest itself in reduced runtime.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] block: reexpand iov_iter after read/write
  2021-05-06 17:19             ` Jens Axboe
@ 2021-05-06 18:55               ` Al Viro
  2021-05-06 19:15                 ` Jens Axboe
  0 siblings, 1 reply; 18+ messages in thread
From: Al Viro @ 2021-05-06 18:55 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Pavel Begunkov, yangerkun, linux-fsdevel, linux-block, io-uring

On Thu, May 06, 2021 at 11:19:03AM -0600, Jens Axboe wrote:

> Doing a quick profile, on the latter run with ->write_iter() we're
> spending 8% of the time in _copy_from_iter(), and 4% in
> new_sync_write(). That's obviously not there at all for the first case.
> Both have about 4% in eventfd_write(). Non-iter case spends 1% in
> copy_from_user().
> 
> Finally with your branch pulled in as well, iow using ->write_iter() for
> eventfd and your iov changes:
> 
> Executed in  485.26 millis    fish           external
>    usr time  103.09 millis   70.00 micros  103.03 millis
>    sys time  382.18 millis   83.00 micros  382.09 millis
> 
> Executed in  485.16 millis    fish           external
>    usr time  104.07 millis   69.00 micros  104.00 millis
>    sys time  381.09 millis   94.00 micros  381.00 millis
> 
> and there's no real difference there. We're spending less time in
> _copy_from_iter() (8% -> 6%) and less time in new_sync_write(), but
> that doesn't seem to manifest itself in reduced runtime.

Interesting... do you have instruction-level profiles for _copy_from_iter()
and new_sync_write() on the last of those trees?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] block: reexpand iov_iter after read/write
  2021-05-06 18:55               ` Al Viro
@ 2021-05-06 19:15                 ` Jens Axboe
  2021-05-06 21:08                   ` Al Viro
  0 siblings, 1 reply; 18+ messages in thread
From: Jens Axboe @ 2021-05-06 19:15 UTC (permalink / raw)
  To: Al Viro; +Cc: Pavel Begunkov, yangerkun, linux-fsdevel, linux-block, io-uring

[-- Attachment #1: Type: text/plain, Size: 1273 bytes --]

On 5/6/21 12:55 PM, Al Viro wrote:
> On Thu, May 06, 2021 at 11:19:03AM -0600, Jens Axboe wrote:
> 
>> Doing a quick profile, on the latter run with ->write_iter() we're
>> spending 8% of the time in _copy_from_iter(), and 4% in
>> new_sync_write(). That's obviously not there at all for the first case.
>> Both have about 4% in eventfd_write(). Non-iter case spends 1% in
>> copy_from_user().
>>
>> Finally with your branch pulled in as well, iow using ->write_iter() for
>> eventfd and your iov changes:
>>
>> Executed in  485.26 millis    fish           external
>>    usr time  103.09 millis   70.00 micros  103.03 millis
>>    sys time  382.18 millis   83.00 micros  382.09 millis
>>
>> Executed in  485.16 millis    fish           external
>>    usr time  104.07 millis   69.00 micros  104.00 millis
>>    sys time  381.09 millis   94.00 micros  381.00 millis
>>
>> and there's no real difference there. We're spending less time in
>> _copy_from_iter() (8% -> 6%) and less time in new_sync_write(), but
>> doesn't seem to manifest itself in reduced runtime.
> 
> Interesting... do you have instruction-level profiles for _copy_from_iter()
> and new_sync_write() on the last of those trees?

Attached is the output of perf annotate <func> for that last run.

-- 
Jens Axboe


[-- Attachment #2: nsw --]
[-- Type: text/plain, Size: 10648 bytes --]

 Percent |	Source code & Disassembly of vmlinux for cycles (72 samples, percent: local period)
---------------------------------------------------------------------------------------------------
         :
         :
         :
         :                      Disassembly of section .text:
         :
         :                      ffffffff812cef20 <new_sync_write>:
         :                      new_sync_write():
         :                      inc_syscr(current);
         :                      return ret;
         :                      }
         :
         :                      static ssize_t new_sync_write(struct file *filp, const char __user *buf, size_t len, loff_t *ppos)
         :                      {
    0.00 :   ffffffff812cef20:       callq  ffffffff8103a8a0 <__fentry__>
    0.00 :   ffffffff812cef25:       push   %rbp
    0.00 :   ffffffff812cef26:       mov    %rdx,%r8
    5.55 :   ffffffff812cef29:       mov    %rsp,%rbp
    0.00 :   ffffffff812cef2c:       push   %r12
    0.00 :   ffffffff812cef2e:       push   %rbx
    0.00 :   ffffffff812cef2f:       mov    %rcx,%r12
    0.00 :   ffffffff812cef32:       sub    $0x68,%rsp
         :                      struct iovec iov = { .iov_base = (void __user *)buf, .iov_len = len };
    0.00 :   ffffffff812cef36:       mov    %rdx,-0x70(%rbp)
         :                      iocb_flags():
         :                      }
         :
         :                      static inline int iocb_flags(struct file *file)
         :                      {
         :                      int res = 0;
         :                      if (file->f_flags & O_APPEND)
    0.00 :   ffffffff812cef3a:       mov    0x40(%rdi),%edx
         :                      new_sync_write():
         :                      {
    8.33 :   ffffffff812cef3d:       mov    %rdi,%rbx
         :                      struct iovec iov = { .iov_base = (void __user *)buf, .iov_len = len };
    0.00 :   ffffffff812cef40:       mov    %rsi,-0x78(%rbp)
         :                      iocb_flags():
    0.00 :   ffffffff812cef44:       mov    %edx,%eax
    0.00 :   ffffffff812cef46:       shr    $0x6,%eax
    0.00 :   ffffffff812cef49:       and    $0x10,%eax
         :                      res |= IOCB_APPEND;
         :                      if (file->f_flags & O_DIRECT)
         :                      res |= IOCB_DIRECT;
    0.00 :   ffffffff812cef4c:       mov    %eax,%ecx
    0.00 :   ffffffff812cef4e:       or     $0x20000,%ecx
    0.00 :   ffffffff812cef54:       test   $0x40,%dh
    6.94 :   ffffffff812cef57:       cmovne %ecx,%eax
         :                      if ((file->f_flags & O_DSYNC) || IS_SYNC(file->f_mapping->host))
    0.00 :   ffffffff812cef5a:       test   $0x10,%dh
    0.00 :   ffffffff812cef5d:       jne    ffffffff812cef77 <new_sync_write+0x57>
    0.00 :   ffffffff812cef5f:       mov    0xd0(%rdi),%rcx
    0.00 :   ffffffff812cef66:       mov    (%rcx),%rcx
    0.00 :   ffffffff812cef69:       mov    0x28(%rcx),%rsi
    0.00 :   ffffffff812cef6d:       testb  $0x10,0x50(%rsi)
   13.89 :   ffffffff812cef71:       je     ffffffff812cf04c <new_sync_write+0x12c>
         :                      res |= IOCB_DSYNC;
    0.00 :   ffffffff812cef77:       or     $0x2,%eax
         :                      if (file->f_flags & __O_SYNC)
         :                      res |= IOCB_SYNC;
    0.00 :   ffffffff812cef7a:       mov    %eax,%ecx
    0.00 :   ffffffff812cef7c:       or     $0x4,%ecx
    0.00 :   ffffffff812cef7f:       and    $0x100000,%edx
         :                      file_write_hint():
         :                      if (file->f_write_hint != WRITE_LIFE_NOT_SET)
    0.00 :   ffffffff812cef85:       mov    0x34(%rbx),%edx
         :                      iocb_flags():
         :                      res |= IOCB_SYNC;
    0.00 :   ffffffff812cef88:       cmovne %ecx,%eax
         :                      file_write_hint():
         :                      if (file->f_write_hint != WRITE_LIFE_NOT_SET)
    0.00 :   ffffffff812cef8b:       test   %edx,%edx
    6.97 :   ffffffff812cef8d:       jne    ffffffff812cf03c <new_sync_write+0x11c>
         :                      return file_inode(file)->i_write_hint;
    0.00 :   ffffffff812cef93:       mov    0x20(%rbx),%rdx
    0.00 :   ffffffff812cef97:       movzbl 0x87(%rdx),%edx
         :                      get_current():
         :
         :                      DECLARE_PER_CPU(struct task_struct *, current_task);
         :
         :                      static __always_inline struct task_struct *get_current(void)
         :                      {
         :                      return this_cpu_read_stable(current_task);
    0.00 :   ffffffff812cef9e:       mov    %gs:0x126c0,%rcx
         :                      get_current_ioprio():
         :                      * If the calling process has set an I/O priority, use that. Otherwise, return
         :                      * the default I/O priority.
         :                      */
         :                      static inline int get_current_ioprio(void)
         :                      {
         :                      struct io_context *ioc = current->io_context;
    0.00 :   ffffffff812cefa7:       mov    0x860(%rcx),%rsi
         :
         :                      if (ioc)
    0.00 :   ffffffff812cefae:       xor    %ecx,%ecx
    0.00 :   ffffffff812cefb0:       test   %rsi,%rsi
    0.00 :   ffffffff812cefb3:       je     ffffffff812cefb9 <new_sync_write+0x99>
         :                      return ioc->ioprio;
    0.00 :   ffffffff812cefb5:       movzwl 0x14(%rsi),%ecx
         :                      init_sync_kiocb():
         :                      *kiocb = (struct kiocb) {
    0.00 :   ffffffff812cefb9:       shl    $0x10,%ecx
   12.50 :   ffffffff812cefbc:       movzwl %dx,%edx
    0.00 :   ffffffff812cefbf:       movq   $0x0,-0x38(%rbp)
    0.00 :   ffffffff812cefc7:       movq   $0x0,-0x30(%rbp)
    0.00 :   ffffffff812cefcf:       or     %ecx,%edx
    0.00 :   ffffffff812cefd1:       movq   $0x0,-0x28(%rbp)
    0.00 :   ffffffff812cefd9:       movq   $0x0,-0x18(%rbp)
    0.00 :   ffffffff812cefe1:       mov    %rbx,-0x40(%rbp)
    0.00 :   ffffffff812cefe5:       mov    %eax,-0x20(%rbp)
    6.93 :   ffffffff812cefe8:       mov    %edx,-0x1c(%rbp)
         :                      new_sync_write():
         :                      struct kiocb kiocb;
         :                      struct iov_iter iter;
         :                      ssize_t ret;
         :
         :                      init_sync_kiocb(&kiocb, filp);
         :                      kiocb.ki_pos = (ppos ? *ppos : 0);
    0.00 :   ffffffff812cefeb:       test   %r12,%r12
    0.00 :   ffffffff812cefee:       je     ffffffff812cf05b <new_sync_write+0x13b>
         :                      iov_iter_init(&iter, WRITE, &iov, 1, len);
    0.00 :   ffffffff812ceff0:       mov    $0x1,%esi
    0.00 :   ffffffff812ceff5:       lea    -0x68(%rbp),%rdi
    0.00 :   ffffffff812ceff9:       mov    $0x1,%ecx
    0.00 :   ffffffff812ceffe:       lea    -0x78(%rbp),%rdx
         :                      kiocb.ki_pos = (ppos ? *ppos : 0);
    0.00 :   ffffffff812cf002:       mov    (%r12),%rax
    0.00 :   ffffffff812cf006:       mov    %rax,-0x38(%rbp)
         :                      iov_iter_init(&iter, WRITE, &iov, 1, len);
    8.33 :   ffffffff812cf00a:       callq  ffffffff814c45e0 <iov_iter_init>
         :                      call_write_iter():
         :                      return file->f_op->write_iter(kio, iter);
   12.51 :   ffffffff812cf00f:       mov    0x28(%rbx),%rax
    0.00 :   ffffffff812cf013:       lea    -0x68(%rbp),%rsi
    0.00 :   ffffffff812cf017:       lea    -0x40(%rbp),%rdi
    0.00 :   ffffffff812cf01b:       callq  *0x28(%rax)
         :                      new_sync_write():
         :
         :                      ret = call_write_iter(filp, &kiocb, &iter);
         :                      BUG_ON(ret == -EIOCBQUEUED);
    0.00 :   ffffffff812cf01e:       cmp    $0xfffffffffffffdef,%rax
    0.00 :   ffffffff812cf024:       je     ffffffff812cf089 <new_sync_write+0x169>
         :                      if (ret > 0 && ppos)
    0.00 :   ffffffff812cf026:       test   %rax,%rax
    0.00 :   ffffffff812cf029:       jle    ffffffff812cf033 <new_sync_write+0x113>
         :                      *ppos = kiocb.ki_pos;
    0.00 :   ffffffff812cf02b:       mov    -0x38(%rbp),%rdx
   12.49 :   ffffffff812cf02f:       mov    %rdx,(%r12)
         :                      return ret;
         :                      }
    0.00 :   ffffffff812cf033:       add    $0x68,%rsp
    0.00 :   ffffffff812cf037:       pop    %rbx
    0.00 :   ffffffff812cf038:       pop    %r12
    0.00 :   ffffffff812cf03a:       pop    %rbp
    0.00 :   ffffffff812cf03b:       retq
         :                      ki_hint_validate():
         :                      if (hint <= max_hint)
    0.00 :   ffffffff812cf03c:       xor    %ecx,%ecx
    0.00 :   ffffffff812cf03e:       cmp    $0xffff,%edx
    0.00 :   ffffffff812cf044:       cmova  %ecx,%edx
    0.00 :   ffffffff812cf047:       jmpq   ffffffff812cef9e <new_sync_write+0x7e>
         :                      iocb_flags():
         :                      if ((file->f_flags & O_DSYNC) || IS_SYNC(file->f_mapping->host))
    5.55 :   ffffffff812cf04c:       testb  $0x1,0xc(%rcx)
    0.00 :   ffffffff812cf050:       je     ffffffff812cef7a <new_sync_write+0x5a>
    0.00 :   ffffffff812cf056:       jmpq   ffffffff812cef77 <new_sync_write+0x57>
         :                      new_sync_write():
         :                      iov_iter_init(&iter, WRITE, &iov, 1, len);
    0.00 :   ffffffff812cf05b:       mov    $0x1,%esi
    0.00 :   ffffffff812cf060:       lea    -0x68(%rbp),%rdi
    0.00 :   ffffffff812cf064:       mov    $0x1,%ecx
    0.00 :   ffffffff812cf069:       lea    -0x78(%rbp),%rdx
    0.00 :   ffffffff812cf06d:       callq  ffffffff814c45e0 <iov_iter_init>
         :                      call_write_iter():
         :                      return file->f_op->write_iter(kio, iter);
    0.00 :   ffffffff812cf072:       mov    0x28(%rbx),%rax
    0.00 :   ffffffff812cf076:       lea    -0x68(%rbp),%rsi
    0.00 :   ffffffff812cf07a:       lea    -0x40(%rbp),%rdi
    0.00 :   ffffffff812cf07e:       callq  *0x28(%rax)
         :                      new_sync_write():
         :                      BUG_ON(ret == -EIOCBQUEUED);
    0.00 :   ffffffff812cf081:       cmp    $0xfffffffffffffdef,%rax
    0.00 :   ffffffff812cf087:       jne    ffffffff812cf033 <new_sync_write+0x113>
    0.00 :   ffffffff812cf089:       ud2

[-- Attachment #3: cfi --]
[-- Type: text/plain, Size: 30346 bytes --]

 Percent |	Source code & Disassembly of vmlinux for cycles (113 samples, percent: local period)
----------------------------------------------------------------------------------------------------
         :
         :
         :
         :                      Disassembly of section .text:
         :
         :                      ffffffff814c6aa0 <_copy_from_iter>:
         :                      _copy_from_iter():
         :                      }
         :                      EXPORT_SYMBOL_GPL(_copy_mc_to_iter);
         :                      #endif /* CONFIG_ARCH_HAS_COPY_MC */
         :
         :                      size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
         :                      {
    0.00 :   ffffffff814c6aa0:       push   %rbp
    7.07 :   ffffffff814c6aa1:       mov    %rdx,%rax
    0.00 :   ffffffff814c6aa4:       mov    %rsp,%rbp
    0.00 :   ffffffff814c6aa7:       push   %r15
    0.00 :   ffffffff814c6aa9:       push   %r14
    0.00 :   ffffffff814c6aab:       push   %r13
    3.54 :   ffffffff814c6aad:       push   %r12
    0.00 :   ffffffff814c6aaf:       push   %rbx
    0.00 :   ffffffff814c6ab0:       sub    $0x50,%rsp
    0.00 :   ffffffff814c6ab4:       mov    %rdx,-0x78(%rbp)
         :                      iov_iter_type():
         :                      };
         :                      };
         :
         :                      static inline enum iter_type iov_iter_type(const struct iov_iter *i)
         :                      {
         :                      return i->iter_type;
    0.89 :   ffffffff814c6ab8:       movzbl (%rdx),%edx
         :                      _copy_from_iter():
    0.00 :   ffffffff814c6abb:       mov    %rdi,-0x68(%rbp)
         :                      if (unlikely(iov_iter_is_pipe(i))) {
    0.00 :   ffffffff814c6abf:       cmp    $0x3,%dl
    0.00 :   ffffffff814c6ac2:       je     ffffffff814c6bd6 <_copy_from_iter+0x136>
    0.00 :   ffffffff814c6ac8:       mov    %rax,%rdi
         :                      WARN_ON(1);
         :                      return 0;
         :                      }
         :                      if (iter_is_iovec(i))
         :                      might_fault();
         :                      iterate_and_advance(i, bytes, base, len, off,
    0.00 :   ffffffff814c6acb:       mov    0x10(%rax),%rax
    0.00 :   ffffffff814c6acf:       cmp    %rsi,%rax
    0.00 :   ffffffff814c6ad2:       cmovbe %rax,%rsi
    2.65 :   ffffffff814c6ad6:       mov    %rsi,%r13
    0.00 :   ffffffff814c6ad9:       test   %rsi,%rsi
    3.52 :   ffffffff814c6adc:       je     ffffffff814c6bdd <_copy_from_iter+0x13d>
    1.76 :   ffffffff814c6ae2:       test   %dl,%dl
    0.00 :   ffffffff814c6ae4:       jne    ffffffff814c6be2 <_copy_from_iter+0x142>
    0.00 :   ffffffff814c6aea:       mov    0x18(%rdi),%rax
    0.00 :   ffffffff814c6aee:       mov    0x8(%rdi),%r14
    0.00 :   ffffffff814c6af2:       xor    %r15d,%r15d
    0.00 :   ffffffff814c6af5:       mov    -0x68(%rbp),%rdi
   25.58 :   ffffffff814c6af9:       lea    0x10(%rax),%r12
    0.00 :   ffffffff814c6afd:       jmp    ffffffff814c6b0e <_copy_from_iter+0x6e>
    0.00 :   ffffffff814c6aff:       mov    -0x68(%rbp),%rax
    0.00 :   ffffffff814c6b03:       lea    (%rax,%r15,1),%rdi
    0.00 :   ffffffff814c6b07:       add    $0x10,%r12
         :                      {
    0.00 :   ffffffff814c6b0b:       xor    %r14d,%r14d
         :                      iterate_and_advance(i, bytes, base, len, off,
    0.00 :   ffffffff814c6b0e:       mov    -0x8(%r12),%rcx
    1.09 :   ffffffff814c6b13:       lea    -0x10(%r12),%rax
    0.00 :   ffffffff814c6b18:       mov    %r12,-0x60(%rbp)
    0.00 :   ffffffff814c6b1c:       mov    %rax,-0x70(%rbp)
    0.00 :   ffffffff814c6b20:       mov    %rcx,%rbx
    0.00 :   ffffffff814c6b23:       sub    %r14,%rbx
    1.76 :   ffffffff814c6b26:       cmp    %r13,%rbx
    0.00 :   ffffffff814c6b29:       cmova  %r13,%rbx
    0.00 :   ffffffff814c6b2d:       test   %rbx,%rbx
    0.00 :   ffffffff814c6b30:       je     ffffffff814c6b07 <_copy_from_iter+0x67>
    0.00 :   ffffffff814c6b32:       mov    -0x10(%r12),%rsi
    0.00 :   ffffffff814c6b37:       mov    %rbx,%rax
    0.00 :   ffffffff814c6b3a:       add    %r14,%rsi
         :                      __chk_range_not_ok():
         :                      */
         :                      if (__builtin_constant_p(size))
         :                      return unlikely(addr > limit - size);
         :
         :                      /* Arbitrary sizes? Be careful about overflow */
         :                      addr += size;
    0.00 :   ffffffff814c6b3d:       add    %rsi,%rax
    4.42 :   ffffffff814c6b40:       jb     ffffffff814c6bd1 <_copy_from_iter+0x131>
         :                      copyin():
         :                      if (access_ok(from, n)) {
    0.00 :   ffffffff814c6b46:       movabs $0x7ffffffff000,%rdx
    0.00 :   ffffffff814c6b50:       cmp    %rdx,%rax
    3.52 :   ffffffff814c6b53:       ja     ffffffff814c6bd1 <_copy_from_iter+0x131>
         :                      copy_user_generic():
         :                      /*
         :                      * If CPU has ERMS feature, use copy_user_enhanced_fast_string.
         :                      * Otherwise, if CPU has rep_good feature, use copy_user_generic_string.
         :                      * Otherwise, use copy_user_generic_unrolled.
         :                      */
         :                      alternative_call_2(copy_user_generic_unrolled,
    0.00 :   ffffffff814c6b55:       mov    %ebx,%edx
    0.00 :   ffffffff814c6b57:       callq  ffffffff81523880 <copy_user_generic_unrolled>
         :                      _copy_from_iter():
         :                      iterate_and_advance(i, bytes, base, len, off,
    6.18 :   ffffffff814c6b5c:       mov    -0x8(%r12),%rcx
         :                      copy_user_generic():
         :                      X86_FEATURE_ERMS,
         :                      ASM_OUTPUT2("=a" (ret), "=D" (to), "=S" (from),
         :                      "=d" (len)),
         :                      "1" (to), "2" (from), "3" (len)
         :                      : "memory", "rcx", "r8", "r9", "r10", "r11");
         :                      return ret;
    0.00 :   ffffffff814c6b61:       mov    %eax,%eax
         :                      _copy_from_iter():
    0.00 :   ffffffff814c6b63:       cltq
    0.00 :   ffffffff814c6b65:       mov    %rbx,%rdx
    0.00 :   ffffffff814c6b68:       sub    %rbx,%r13
    0.00 :   ffffffff814c6b6b:       sub    %rax,%rdx
    0.00 :   ffffffff814c6b6e:       add    %rax,%r13
    0.00 :   ffffffff814c6b71:       add    %rdx,%r15
    3.53 :   ffffffff814c6b74:       add    %r14,%rdx
    0.00 :   ffffffff814c6b77:       cmp    %rcx,%rdx
    0.00 :   ffffffff814c6b7a:       jb     ffffffff814c6bc4 <_copy_from_iter+0x124>
    0.00 :   ffffffff814c6b7c:       test   %r13,%r13
    0.00 :   ffffffff814c6b7f:       jne    ffffffff814c6aff <_copy_from_iter+0x5f>
    0.00 :   ffffffff814c6b85:       mov    -0x78(%rbp),%rcx
    2.66 :   ffffffff814c6b89:       mov    -0x60(%rbp),%rdi
    2.65 :   ffffffff814c6b8d:       mov    %rdi,%rax
    0.00 :   ffffffff814c6b90:       sub    0x18(%rcx),%rax
   11.54 :   ffffffff814c6b94:       mov    %r13,0x8(%rcx)
    0.00 :   ffffffff814c6b98:       mov    %rdi,0x18(%rcx)
   12.36 :   ffffffff814c6b9c:       mov    %rcx,%rdi
    0.00 :   ffffffff814c6b9f:       sar    $0x4,%rax
    0.00 :   ffffffff814c6ba3:       sub    %rax,0x20(%rcx)
    0.00 :   ffffffff814c6ba7:       mov    0x10(%rcx),%rax
    0.00 :   ffffffff814c6bab:       sub    %r15,%rax
    0.00 :   ffffffff814c6bae:       mov    %rax,0x10(%rdi)
         :                      copyin(addr + off, base, len),
         :                      memcpy(addr + off, base, len)
         :                      )
         :
         :                      return bytes;
         :                      }
    3.53 :   ffffffff814c6bb2:       add    $0x50,%rsp
    0.00 :   ffffffff814c6bb6:       mov    %r15,%rax
    0.00 :   ffffffff814c6bb9:       pop    %rbx
    0.00 :   ffffffff814c6bba:       pop    %r12
    0.00 :   ffffffff814c6bbc:       pop    %r13
    0.00 :   ffffffff814c6bbe:       pop    %r14
    0.00 :   ffffffff814c6bc0:       pop    %r15
    1.76 :   ffffffff814c6bc2:       pop    %rbp
    0.00 :   ffffffff814c6bc3:       retq
    0.00 :   ffffffff814c6bc4:       mov    -0x70(%rbp),%rax
    0.00 :   ffffffff814c6bc8:       mov    %rdx,%r13
    0.00 :   ffffffff814c6bcb:       mov    %rax,-0x60(%rbp)
    0.00 :   ffffffff814c6bcf:       jmp    ffffffff814c6b85 <_copy_from_iter+0xe5>
         :                      copyin():
    0.00 :   ffffffff814c6bd1:       mov    %rbx,%rax
    0.00 :   ffffffff814c6bd4:       jmp    ffffffff814c6b63 <_copy_from_iter+0xc3>
         :                      _copy_from_iter():
         :                      WARN_ON(1);
    0.00 :   ffffffff814c6bd6:       ud2
         :                      return 0;
    0.00 :   ffffffff814c6bd8:       xor    %r15d,%r15d
    0.00 :   ffffffff814c6bdb:       jmp    ffffffff814c6bb2 <_copy_from_iter+0x112>
    0.00 :   ffffffff814c6bdd:       xor    %r15d,%r15d
    0.00 :   ffffffff814c6be0:       jmp    ffffffff814c6bb2 <_copy_from_iter+0x112>
         :                      iterate_and_advance(i, bytes, base, len, off,
    0.00 :   ffffffff814c6be2:       cmp    $0x2,%dl
    0.00 :   ffffffff814c6be5:       je     ffffffff814c6e09 <_copy_from_iter+0x369>
    0.00 :   ffffffff814c6beb:       cmp    $0x1,%dl
    0.00 :   ffffffff814c6bee:       je     ffffffff814c6d6b <_copy_from_iter+0x2cb>
    0.00 :   ffffffff814c6bf4:       mov    %rsi,%r15
    0.00 :   ffffffff814c6bf7:       cmp    $0x4,%dl
    0.00 :   ffffffff814c6bfa:       jne    ffffffff814c6bab <_copy_from_iter+0x10b>
    0.00 :   ffffffff814c6bfc:       mov    0x8(%rdi),%rax
    0.00 :   ffffffff814c6c00:       add    0x20(%rdi),%rax
    0.00 :   ffffffff814c6c04:       movl   $0x0,-0x48(%rbp)
    0.00 :   ffffffff814c6c0b:       movq   $0x3,-0x40(%rbp)
    0.00 :   ffffffff814c6c13:       movq   $0x0,-0x38(%rbp)
    0.00 :   ffffffff814c6c1b:       movq   $0x0,-0x30(%rbp)
    0.00 :   ffffffff814c6c23:       mov    %eax,%ebx
    0.00 :   ffffffff814c6c25:       shr    $0xc,%rax
    0.00 :   ffffffff814c6c29:       mov    %rax,%rcx
    0.00 :   ffffffff814c6c2c:       mov    %rax,-0x60(%rbp)
    0.00 :   ffffffff814c6c30:       mov    0x18(%rdi),%rax
    0.00 :   ffffffff814c6c34:       and    $0xfff,%ebx
    0.00 :   ffffffff814c6c3a:       mov    %rcx,-0x50(%rbp)
    0.00 :   ffffffff814c6c3e:       mov    %rax,-0x58(%rbp)
    0.00 :   ffffffff814c6c42:       mov    $0xffffffffffffffff,%rsi
    0.00 :   ffffffff814c6c49:       lea    -0x58(%rbp),%rdi
    0.00 :   ffffffff814c6c4d:       xor    %r15d,%r15d
    0.00 :   ffffffff814c6c50:       callq  ffffffff8151fe40 <xas_find>
    0.00 :   ffffffff814c6c55:       mov    %rax,%r14
    0.00 :   ffffffff814c6c58:       test   %rax,%rax
    0.00 :   ffffffff814c6c5b:       je     ffffffff814c6d51 <_copy_from_iter+0x2b1>
    0.00 :   ffffffff814c6c61:       mov    %ebx,%r12d
         :                      xas_retry():
         :                      * Context: Any context.
         :                      * Return: true if the operation needs to be retried.
         :                      */
         :                      static inline bool xas_retry(struct xa_state *xas, const void *entry)
         :                      {
         :                      if (xa_is_zero(entry))
    0.00 :   ffffffff814c6c64:       cmp    $0x406,%r14
    0.00 :   ffffffff814c6c6b:       je     ffffffff814c6d1c <_copy_from_iter+0x27c>
         :                      return true;
         :                      if (!xa_is_retry(entry))
    0.00 :   ffffffff814c6c71:       cmp    $0x402,%r14
    0.00 :   ffffffff814c6c78:       je     ffffffff814c6f7f <_copy_from_iter+0x4df>
         :                      _copy_from_iter():
    0.00 :   ffffffff814c6c7e:       test   $0x1,%r14b
    0.00 :   ffffffff814c6c82:       jne    ffffffff814c6f78 <_copy_from_iter+0x4d8>
    0.00 :   ffffffff814c6c88:       mov    %r14,%rdi
    0.00 :   ffffffff814c6c8b:       callq  ffffffff81296c00 <PageHuge>
    0.00 :   ffffffff814c6c90:       mov    %eax,%ebx
    0.00 :   ffffffff814c6c92:       test   %eax,%eax
    0.00 :   ffffffff814c6c94:       jne    ffffffff814c6f30 <_copy_from_iter+0x490>
    0.00 :   ffffffff814c6c9a:       mov    -0x60(%rbp),%rdi
    0.00 :   ffffffff814c6c9e:       mov    0x20(%r14),%rax
    0.00 :   ffffffff814c6ca2:       mov    %edi,%ecx
    0.00 :   ffffffff814c6ca4:       sub    %eax,%ecx
    0.00 :   ffffffff814c6ca6:       cmp    %rdi,%rax
    0.00 :   ffffffff814c6ca9:       cmovb  %ecx,%ebx
    0.00 :   ffffffff814c6cac:       jmp    ffffffff814c6d00 <_copy_from_iter+0x260>
    0.00 :   ffffffff814c6cae:       mov    %r12d,%eax
    0.00 :   ffffffff814c6cb1:       mov    $0x1000,%edx
    0.00 :   ffffffff814c6cb6:       movslq %ebx,%rsi
    0.00 :   ffffffff814c6cb9:       mov    -0x68(%rbp),%rcx
    0.00 :   ffffffff814c6cbd:       sub    %rax,%rdx
    0.00 :   ffffffff814c6cc0:       cmp    %r13,%rdx
    0.00 :   ffffffff814c6cc3:       cmova  %r13,%rdx
    0.00 :   ffffffff814c6cc7:       shl    $0x6,%rsi
    0.00 :   ffffffff814c6ccb:       add    %r14,%rsi
         :                      lowmem_page_address():
         :                      */
         :                      #include <linux/vmstat.h>
         :
         :                      static __always_inline void *lowmem_page_address(const struct page *page)
         :                      {
         :                      return page_to_virt(page);
    0.00 :   ffffffff814c6cce:       sub    0xebda73(%rip),%rsi        # ffffffff82384748 <vmemmap_base>
         :                      _copy_from_iter():
    0.00 :   ffffffff814c6cd5:       mov    %rdx,%r12
    0.00 :   ffffffff814c6cd8:       lea    (%rcx,%r15,1),%rdi
    0.00 :   ffffffff814c6cdc:       add    %r12,%r15
         :                      lowmem_page_address():
    0.00 :   ffffffff814c6cdf:       sar    $0x6,%rsi
    0.00 :   ffffffff814c6ce3:       shl    $0xc,%rsi
    0.00 :   ffffffff814c6ce7:       add    0xebda6a(%rip),%rsi        # ffffffff82384758 <page_offset_base>
         :                      _copy_from_iter():
    0.00 :   ffffffff814c6cee:       add    %rax,%rsi
         :                      memcpy():
         :                      if (q_size < size)
         :                      __read_overflow2();
         :                      }
         :                      if (p_size < size || q_size < size)
         :                      fortify_panic(__func__);
         :                      return __underlying_memcpy(p, q, size);
    0.00 :   ffffffff814c6cf1:       callq  ffffffff81a22620 <__memcpy>
         :                      _copy_from_iter():
    0.00 :   ffffffff814c6cf6:       sub    %r12,%r13
    0.00 :   ffffffff814c6cf9:       je     ffffffff814c6d51 <_copy_from_iter+0x2b1>
    0.00 :   ffffffff814c6cfb:       inc    %ebx
    0.00 :   ffffffff814c6cfd:       xor    %r12d,%r12d
         :                      constant_test_bit():
         :                      }
         :
         :                      static __always_inline bool constant_test_bit(long nr, const volatile unsigned long *addr)
         :                      {
         :                      return ((1UL << (nr & (BITS_PER_LONG-1))) &
         :                      (addr[nr >> _BITOPS_LONG_SHIFT])) != 0;
    0.00 :   ffffffff814c6d00:       mov    (%r14),%rax
    0.00 :   ffffffff814c6d03:       shr    $0x10,%rax
    0.00 :   ffffffff814c6d07:       and    $0x1,%eax
         :                      thp_nr_pages():
         :                      */
         :                      static inline int thp_nr_pages(struct page *page)
         :                      {
         :                      VM_BUG_ON_PGFLAGS(PageTail(page), page);
         :                      if (PageHead(page))
         :                      return HPAGE_PMD_NR;
    0.00 :   ffffffff814c6d0a:       cmp    $0x1,%al
    0.00 :   ffffffff814c6d0c:       sbb    %eax,%eax
    0.00 :   ffffffff814c6d0e:       and    $0xfffffe01,%eax
    0.00 :   ffffffff814c6d13:       add    $0x200,%eax
         :                      _copy_from_iter():
    0.00 :   ffffffff814c6d18:       cmp    %eax,%ebx
    0.00 :   ffffffff814c6d1a:       jl     ffffffff814c6cae <_copy_from_iter+0x20e>
         :                      xas_next_entry():
         :                      *
         :                      * Return: The next present entry after the one currently referred to by @xas.
         :                      */
         :                      static inline void *xas_next_entry(struct xa_state *xas, unsigned long max)
         :                      {
         :                      struct xa_node *node = xas->xa_node;
    0.00 :   ffffffff814c6d1c:       mov    -0x40(%rbp),%rdi
         :                      xas_not_node():
         :                      return ((unsigned long)node & 3) || !node;
    0.00 :   ffffffff814c6d20:       test   $0x3,%dil
    0.00 :   ffffffff814c6d24:       setne  %cl
    0.00 :   ffffffff814c6d27:       test   %rdi,%rdi
    0.00 :   ffffffff814c6d2a:       sete   %al
    0.00 :   ffffffff814c6d2d:       or     %al,%cl
    0.00 :   ffffffff814c6d2f:       je     ffffffff814c6ecc <_copy_from_iter+0x42c>
         :                      xas_next_entry():
         :                      return xas_find(xas, max);
         :                      if (unlikely(xas->xa_offset == XA_CHUNK_MASK))
         :                      return xas_find(xas, max);
         :                      entry = xa_entry(xas->xa, node, xas->xa_offset + 1);
         :                      if (unlikely(xa_is_internal(entry)))
         :                      return xas_find(xas, max);
    0.00 :   ffffffff814c6d35:       mov    $0xffffffffffffffff,%rsi
    0.00 :   ffffffff814c6d3c:       lea    -0x58(%rbp),%rdi
    0.00 :   ffffffff814c6d40:       callq  ffffffff8151fe40 <xas_find>
    0.00 :   ffffffff814c6d45:       mov    %rax,%r14
         :                      _copy_from_iter():
    0.00 :   ffffffff814c6d48:       test   %rax,%rax
    0.00 :   ffffffff814c6d4b:       jne    ffffffff814c6c64 <_copy_from_iter+0x1c4>
         :                      __rcu_read_unlock():
         :                      }
         :
         :                      static inline void __rcu_read_unlock(void)
         :                      {
         :                      preempt_enable();
         :                      rcu_read_unlock_strict();
    0.00 :   ffffffff814c6d51:       callq  ffffffff810e12c0 <rcu_read_unlock_strict>
         :                      _copy_from_iter():
    0.00 :   ffffffff814c6d56:       mov    -0x78(%rbp),%rax
    0.00 :   ffffffff814c6d5a:       mov    -0x78(%rbp),%rdi
    0.00 :   ffffffff814c6d5e:       add    %r15,0x8(%rax)
    0.00 :   ffffffff814c6d62:       mov    0x10(%rax),%rax
    0.00 :   ffffffff814c6d66:       jmpq   ffffffff814c6bab <_copy_from_iter+0x10b>
    0.00 :   ffffffff814c6d6b:       mov    0x18(%rdi),%rax
    0.00 :   ffffffff814c6d6f:       xor    %r15d,%r15d
    0.00 :   ffffffff814c6d72:       mov    0x8(%rdi),%rbx
    0.00 :   ffffffff814c6d76:       mov    -0x68(%rbp),%rdi
    0.00 :   ffffffff814c6d7a:       lea    0x10(%rax),%r12
    0.00 :   ffffffff814c6d7e:       mov    %r15,%rax
    0.00 :   ffffffff814c6d81:       mov    %r12,%r15
    0.00 :   ffffffff814c6d84:       mov    %rax,%r12
    0.00 :   ffffffff814c6d87:       jmp    ffffffff814c6d97 <_copy_from_iter+0x2f7>
    0.00 :   ffffffff814c6d89:       mov    -0x68(%rbp),%rax
    0.00 :   ffffffff814c6d8d:       lea    (%rax,%r12,1),%rdi
    0.00 :   ffffffff814c6d91:       add    $0x10,%r15
    0.00 :   ffffffff814c6d95:       xor    %ebx,%ebx
    0.00 :   ffffffff814c6d97:       mov    -0x8(%r15),%r14
    0.00 :   ffffffff814c6d9b:       lea    -0x10(%r15),%rax
    0.00 :   ffffffff814c6d9f:       mov    %r15,-0x60(%rbp)
    0.00 :   ffffffff814c6da3:       mov    %rax,-0x70(%rbp)
    0.00 :   ffffffff814c6da7:       sub    %rbx,%r14
    0.00 :   ffffffff814c6daa:       cmp    %r13,%r14
    0.00 :   ffffffff814c6dad:       cmova  %r13,%r14
    0.00 :   ffffffff814c6db1:       test   %r14,%r14
    0.00 :   ffffffff814c6db4:       je     ffffffff814c6d91 <_copy_from_iter+0x2f1>
    0.00 :   ffffffff814c6db6:       mov    -0x10(%r15),%rsi
         :                      memcpy():
    0.00 :   ffffffff814c6dba:       mov    %r14,%rdx
         :                      _copy_from_iter():
    0.00 :   ffffffff814c6dbd:       add    %r14,%r12
    0.00 :   ffffffff814c6dc0:       sub    %r14,%r13
    0.00 :   ffffffff814c6dc3:       add    %rbx,%rsi
         :                      memcpy():
    0.00 :   ffffffff814c6dc6:       callq  ffffffff81a22620 <__memcpy>
         :                      _copy_from_iter():
    0.00 :   ffffffff814c6dcb:       lea    (%rbx,%r14,1),%rcx
    0.00 :   ffffffff814c6dcf:       cmp    %rcx,-0x8(%r15)
    0.00 :   ffffffff814c6dd3:       ja     ffffffff814c6eb9 <_copy_from_iter+0x419>
    0.00 :   ffffffff814c6dd9:       test   %r13,%r13
    0.00 :   ffffffff814c6ddc:       jne    ffffffff814c6d89 <_copy_from_iter+0x2e9>
    0.00 :   ffffffff814c6dde:       mov    %r12,%r15
    0.00 :   ffffffff814c6de1:       mov    -0x78(%rbp),%rdi
    0.00 :   ffffffff814c6de5:       mov    -0x60(%rbp),%rcx
    0.00 :   ffffffff814c6de9:       mov    %rcx,%rax
    0.00 :   ffffffff814c6dec:       sub    0x18(%rdi),%rax
    0.00 :   ffffffff814c6df0:       mov    %r13,0x8(%rdi)
    0.00 :   ffffffff814c6df4:       mov    %rcx,0x18(%rdi)
    0.00 :   ffffffff814c6df8:       sar    $0x4,%rax
    0.00 :   ffffffff814c6dfc:       sub    %rax,0x20(%rdi)
    0.00 :   ffffffff814c6e00:       mov    0x10(%rdi),%rax
    0.00 :   ffffffff814c6e04:       jmpq   ffffffff814c6bab <_copy_from_iter+0x10b>
    0.00 :   ffffffff814c6e09:       mov    0x18(%rdi),%r14
    0.00 :   ffffffff814c6e0d:       mov    0x8(%rdi),%r12d
    0.00 :   ffffffff814c6e11:       xor    %r15d,%r15d
    0.00 :   ffffffff814c6e14:       mov    0xc(%r14),%eax
    0.00 :   ffffffff814c6e18:       mov    0x8(%r14),%edx
    0.00 :   ffffffff814c6e1c:       mov    $0x1000,%esi
    0.00 :   ffffffff814c6e21:       mov    -0x68(%rbp),%rdi
    0.00 :   ffffffff814c6e25:       add    %r12d,%eax
    0.00 :   ffffffff814c6e28:       sub    %r12d,%edx
    0.00 :   ffffffff814c6e2b:       mov    %eax,%ecx
    0.00 :   ffffffff814c6e2d:       and    $0xfff,%ecx
    0.00 :   ffffffff814c6e33:       cmp    %r13,%rdx
    0.00 :   ffffffff814c6e36:       cmova  %r13,%rdx
    0.00 :   ffffffff814c6e3a:       sub    %rcx,%rsi
    0.00 :   ffffffff814c6e3d:       cmp    %rsi,%rdx
    0.00 :   ffffffff814c6e40:       cmovbe %rdx,%rsi
    0.00 :   ffffffff814c6e44:       shr    $0xc,%eax
    0.00 :   ffffffff814c6e47:       add    %r15,%rdi
    0.00 :   ffffffff814c6e4a:       mov    %rsi,%rbx
    0.00 :   ffffffff814c6e4d:       mov    %eax,%esi
    0.00 :   ffffffff814c6e4f:       shl    $0x6,%rsi
    0.00 :   ffffffff814c6e53:       add    (%r14),%rsi
         :                      memcpy():
    0.00 :   ffffffff814c6e56:       mov    %rbx,%rdx
         :                      _copy_from_iter():
    0.00 :   ffffffff814c6e59:       add    %rbx,%r15
         :                      lowmem_page_address():
    0.00 :   ffffffff814c6e5c:       sub    0xebd8e5(%rip),%rsi        # ffffffff82384748 <vmemmap_base>
         :                      _copy_from_iter():
    0.00 :   ffffffff814c6e63:       add    %ebx,%r12d
         :                      lowmem_page_address():
    0.00 :   ffffffff814c6e66:       sar    $0x6,%rsi
    0.00 :   ffffffff814c6e6a:       shl    $0xc,%rsi
    0.00 :   ffffffff814c6e6e:       add    0xebd8e3(%rip),%rsi        # ffffffff82384758 <page_offset_base>
         :                      _copy_from_iter():
    0.00 :   ffffffff814c6e75:       add    %rcx,%rsi
         :                      memcpy():
    0.00 :   ffffffff814c6e78:       callq  ffffffff81a22620 <__memcpy>
         :                      _copy_from_iter():
    0.00 :   ffffffff814c6e7d:       cmp    %r12d,0x8(%r14)
    0.00 :   ffffffff814c6e81:       jne    ffffffff814c6e8a <_copy_from_iter+0x3ea>
    0.00 :   ffffffff814c6e83:       add    $0x10,%r14
    0.00 :   ffffffff814c6e87:       xor    %r12d,%r12d
    0.00 :   ffffffff814c6e8a:       sub    %rbx,%r13
    0.00 :   ffffffff814c6e8d:       jne    ffffffff814c6e14 <_copy_from_iter+0x374>
    0.00 :   ffffffff814c6e8f:       mov    -0x78(%rbp),%rcx
    0.00 :   ffffffff814c6e93:       mov    %r12d,%eax
    0.00 :   ffffffff814c6e96:       mov    %rax,0x8(%rcx)
    0.00 :   ffffffff814c6e9a:       mov    %r14,%rax
    0.00 :   ffffffff814c6e9d:       sub    0x18(%rcx),%rax
    0.00 :   ffffffff814c6ea1:       mov    %rcx,%rdi
    0.00 :   ffffffff814c6ea4:       mov    %r14,0x18(%rcx)
    0.00 :   ffffffff814c6ea8:       sar    $0x4,%rax
    0.00 :   ffffffff814c6eac:       sub    %rax,0x20(%rcx)
    0.00 :   ffffffff814c6eb0:       mov    0x10(%rcx),%rax
    0.00 :   ffffffff814c6eb4:       jmpq   ffffffff814c6bab <_copy_from_iter+0x10b>
    0.00 :   ffffffff814c6eb9:       mov    -0x70(%rbp),%rax
    0.00 :   ffffffff814c6ebd:       mov    %r12,%r15
    0.00 :   ffffffff814c6ec0:       mov    %rcx,%r13
    0.00 :   ffffffff814c6ec3:       mov    %rax,-0x60(%rbp)
    0.00 :   ffffffff814c6ec7:       jmpq   ffffffff814c6de1 <_copy_from_iter+0x341>
         :                      xas_next_entry():
         :                      if (unlikely(xas_not_node(node) || node->shift ||
    0.00 :   ffffffff814c6ecc:       cmpb   $0x0,(%rdi)
    0.00 :   ffffffff814c6ecf:       jne    ffffffff814c6d35 <_copy_from_iter+0x295>
    0.00 :   ffffffff814c6ed5:       mov    -0x50(%rbp),%rsi
    0.00 :   ffffffff814c6ed9:       movzbl -0x46(%rbp),%r9d
    0.00 :   ffffffff814c6ede:       mov    %rsi,%r8
    0.00 :   ffffffff814c6ee1:       mov    %r9,%rax
    0.00 :   ffffffff814c6ee4:       and    $0x3f,%r8d
    0.00 :   ffffffff814c6ee8:       cmp    %r8,%r9
    0.00 :   ffffffff814c6eeb:       jne    ffffffff814c6d35 <_copy_from_iter+0x295>
         :                      if (unlikely(xas->xa_index >= max))
    0.00 :   ffffffff814c6ef1:       cmp    $0xffffffffffffffff,%rsi
    0.00 :   ffffffff814c6ef5:       je     ffffffff814c6f60 <_copy_from_iter+0x4c0>
         :                      if (unlikely(xas->xa_offset == XA_CHUNK_MASK))
    0.00 :   ffffffff814c6ef7:       cmp    $0x3f,%al
    0.00 :   ffffffff814c6ef9:       je     ffffffff814c6f4b <_copy_from_iter+0x4ab>
         :                      entry = xa_entry(xas->xa, node, xas->xa_offset + 1);
    0.00 :   ffffffff814c6efb:       movzbl %al,%r8d
         :                      xa_entry():
         :                      return rcu_dereference_check(node->slots[offset],
    0.00 :   ffffffff814c6eff:       add    $0x5,%r8
    0.00 :   ffffffff814c6f03:       mov    0x8(%rdi,%r8,8),%r14
         :                      xa_is_internal():
         :                      return ((unsigned long)entry & 3) == 2;
    0.00 :   ffffffff814c6f08:       mov    %r14,%r8
    0.00 :   ffffffff814c6f0b:       and    $0x3,%r8d
         :                      xas_next_entry():
         :                      if (unlikely(xa_is_internal(entry)))
    0.00 :   ffffffff814c6f0f:       cmp    $0x2,%r8
    0.00 :   ffffffff814c6f13:       je     ffffffff814c6f37 <_copy_from_iter+0x497>
         :                      xas->xa_offset++;
    0.00 :   ffffffff814c6f15:       inc    %eax
         :                      xas->xa_index++;
    0.00 :   ffffffff814c6f17:       inc    %rsi
         :                      } while (!entry);
    0.00 :   ffffffff814c6f1a:       mov    $0x1,%ecx
    0.00 :   ffffffff814c6f1f:       test   %r14,%r14
    0.00 :   ffffffff814c6f22:       je     ffffffff814c6ef1 <_copy_from_iter+0x451>
    0.00 :   ffffffff814c6f24:       mov    %al,-0x46(%rbp)
    0.00 :   ffffffff814c6f27:       mov    %rsi,-0x50(%rbp)
    0.00 :   ffffffff814c6f2b:       jmpq   ffffffff814c6c64 <_copy_from_iter+0x1c4>
         :                      _copy_from_iter():
    0.00 :   ffffffff814c6f30:       ud2
    0.00 :   ffffffff814c6f32:       jmpq   ffffffff814c6d51 <_copy_from_iter+0x2b1>
    0.00 :   ffffffff814c6f37:       test   %cl,%cl
    0.00 :   ffffffff814c6f39:       je     ffffffff814c6d35 <_copy_from_iter+0x295>
    0.00 :   ffffffff814c6f3f:       mov    %al,-0x46(%rbp)
    0.00 :   ffffffff814c6f42:       mov    %rsi,-0x50(%rbp)
    0.00 :   ffffffff814c6f46:       jmpq   ffffffff814c6d35 <_copy_from_iter+0x295>
    0.00 :   ffffffff814c6f4b:       test   %cl,%cl
    0.00 :   ffffffff814c6f4d:       je     ffffffff814c6d35 <_copy_from_iter+0x295>
    0.00 :   ffffffff814c6f53:       movb   $0x3f,-0x46(%rbp)
    0.00 :   ffffffff814c6f57:       mov    %rsi,-0x50(%rbp)
         :                      xas_next_entry():
         :                      return xas_find(xas, max);
    0.00 :   ffffffff814c6f5b:       jmpq   ffffffff814c6d35 <_copy_from_iter+0x295>
    0.00 :   ffffffff814c6f60:       test   %cl,%cl
    0.00 :   ffffffff814c6f62:       je     ffffffff814c6d35 <_copy_from_iter+0x295>
    0.00 :   ffffffff814c6f68:       mov    %al,-0x46(%rbp)
    0.00 :   ffffffff814c6f6b:       movq   $0xffffffffffffffff,-0x50(%rbp)
         :                      return xas_find(xas, max);
    0.00 :   ffffffff814c6f73:       jmpq   ffffffff814c6d35 <_copy_from_iter+0x295>
         :                      _copy_from_iter():
    0.00 :   ffffffff814c6f78:       ud2
    0.00 :   ffffffff814c6f7a:       jmpq   ffffffff814c6d51 <_copy_from_iter+0x2b1>
         :                      xas_reset():
         :                      xas->xa_node = XAS_RESTART;
    0.00 :   ffffffff814c6f7f:       movq   $0x3,-0x40(%rbp)
         :                      xas_not_node():
         :                      return ((unsigned long)node & 3) || !node;
    0.00 :   ffffffff814c6f87:       jmpq   ffffffff814c6d35 <_copy_from_iter+0x295>
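The 0.00% stretch beginning at ffffffff814c6d6b appears to be the kernel-memory segment branch of _copy_from_iter(), since it calls __memcpy directly rather than a user-copy routine. It copies segment by segment and then writes the iterator state back. The `sar $0x4` turns the pointer difference into a count of consumed 16-byte segments, and that count is subtracted from the field at offset 0x20. What follows is a simplified userspace model of that bookkeeping, not the kernel's code. `struct sketch_seg`, `struct sketch_iter` and `sketch_copy_from_iter()` are hypothetical names. The mapping of offsets 0x8/0x10/0x18/0x20 to iov_offset/count/segment pointer/nr_segs is an inference from the annotation and is not stated in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical segment type with the same two fields as struct kvec. */
struct sketch_seg {
	const void *base;
	size_t len;
};

/* Hypothetical, simplified iterator state (not the kernel's struct iov_iter). */
struct sketch_iter {
	size_t iov_offset;            /* bytes already consumed in *seg */
	size_t count;                 /* bytes left across all segments */
	const struct sketch_seg *seg; /* current segment */
	unsigned long nr_segs;        /* segments left, including *seg */
};

/* Copy up to @bytes into @to, then advance @i past what was copied. */
static size_t sketch_copy_from_iter(void *to, size_t bytes, struct sketch_iter *i)
{
	const struct sketch_seg *seg = i->seg;
	size_t skip = i->iov_offset;
	size_t copied = 0;

	if (bytes > i->count)
		bytes = i->count;

	while (bytes) {
		size_t n;

		assert(skip <= seg->len);
		n = seg->len - skip;
		if (n > bytes)
			n = bytes;
		if (n) {
			memcpy((char *)to + copied, (const char *)seg->base + skip, n);
			copied += n;
			bytes -= n;
			skip += n;
		}
		if (skip == seg->len) {	/* segment used up: step to the next one */
			seg++;
			skip = 0;
		}
	}

	/* Pointer subtraction divides by sizeof(*seg), like the sar $0x4 above. */
	i->nr_segs -= (unsigned long)(seg - i->seg);
	i->seg = seg;
	i->iov_offset = skip;
	i->count -= copied;
	return copied;
}
```

Reverting, which is what iov_iter_revert() does, means walking this same state backwards. For that reason the offset, segment pointer, segment count and byte count have to stay consistent with one another.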

* Re: [PATCH] block: reexpand iov_iter after read/write
  2021-05-06 19:15                 ` Jens Axboe
@ 2021-05-06 21:08                   ` Al Viro
  2021-05-06 21:17                     ` Matthew Wilcox
  2021-05-07 14:59                     ` Jens Axboe
  0 siblings, 2 replies; 18+ messages in thread
From: Al Viro @ 2021-05-06 21:08 UTC
  To: Jens Axboe
  Cc: Pavel Begunkov, yangerkun, linux-fsdevel, linux-block, io-uring

On Thu, May 06, 2021 at 01:15:01PM -0600, Jens Axboe wrote:

> Attached output of perf annotate <func> for that last run.

Heh...  I wonder if keeping the value of iocb_flags(file) in
struct file itself would have a visible effect...

* Re: [PATCH] block: reexpand iov_iter after read/write
  2021-05-06 21:08                   ` Al Viro
@ 2021-05-06 21:17                     ` Matthew Wilcox
  2021-05-07 14:59                     ` Jens Axboe
  1 sibling, 0 replies; 18+ messages in thread
From: Matthew Wilcox @ 2021-05-06 21:17 UTC
  To: Al Viro
  Cc: Jens Axboe, Pavel Begunkov, yangerkun, linux-fsdevel, linux-block,
	io-uring

On Thu, May 06, 2021 at 09:08:50PM +0000, Al Viro wrote:
> On Thu, May 06, 2021 at 01:15:01PM -0600, Jens Axboe wrote:
> 
> > Attached output of perf annotate <func> for that last run.
> 
> Heh...  I wonder if keeping the value of iocb_flags(file) in
> struct file itself would have a visible effect...

I suggested that ...
https://lore.kernel.org/linux-fsdevel/[email protected]/

* Re: [PATCH] block: reexpand iov_iter after read/write
  2021-05-06 21:08                   ` Al Viro
  2021-05-06 21:17                     ` Matthew Wilcox
@ 2021-05-07 14:59                     ` Jens Axboe
  1 sibling, 0 replies; 18+ messages in thread
From: Jens Axboe @ 2021-05-07 14:59 UTC
  To: Al Viro; +Cc: Pavel Begunkov, yangerkun, linux-fsdevel, linux-block, io-uring

On 5/6/21 3:08 PM, Al Viro wrote:
> On Thu, May 06, 2021 at 01:15:01PM -0600, Jens Axboe wrote:
> 
>> Attached output of perf annotate <func> for that last run.
> 
> Heh...  I wonder if keeping the value of iocb_flags(file) in
> struct file itself would have a visible effect...

I did a quick hack to get rid of the init_sync_kiocb() in new_sync_write(),
and to simply eliminate the ki_flags read in eventfd_write(), since the test
case is blocking. That brings us closer to the ->write() method: the gap is
now 7%, down from the previous 10%:

Executed in  468.23 millis    fish           external
   usr time   95.09 millis  114.00 micros   94.98 millis
   sys time  372.98 millis   76.00 micros  372.90 millis

Executed in  468.97 millis    fish           external
   usr time   91.05 millis   89.00 micros   90.96 millis
   sys time  377.92 millis   69.00 micros  377.85 millis

-- 
Jens Axboe


end of thread

Thread overview: 18+ messages
2021-04-01  7:18 [PATCH] block: reexpand iov_iter after read/write yangerkun
2021-04-06  1:28 ` yangerkun
2021-04-06 11:04   ` Pavel Begunkov
2021-04-07 14:16     ` yangerkun
2021-04-09 14:49 ` Pavel Begunkov
2021-04-15 17:37   ` Pavel Begunkov
2021-04-15 17:39     ` Pavel Begunkov
2021-04-28  6:16       ` yangerkun
2021-04-30 12:57         ` Pavel Begunkov
2021-04-30 14:35           ` Al Viro
2021-05-06 16:57             ` Pavel Begunkov
2021-05-06 17:17               ` Al Viro
2021-05-06 17:19             ` Jens Axboe
2021-05-06 18:55               ` Al Viro
2021-05-06 19:15                 ` Jens Axboe
2021-05-06 21:08                   ` Al Viro
2021-05-06 21:17                     ` Matthew Wilcox
2021-05-07 14:59                     ` Jens Axboe
