public inbox for [email protected]
 help / color / mirror / Atom feed
* [bug report] BUG: KASAN: out-of-bounds in io_req_local_work_add+0x3b1/0x4a0
@ 2023-06-21  7:38 Guangwu Zhang
  2023-06-21 13:40 ` Jens Axboe
  0 siblings, 1 reply; 8+ messages in thread
From: Guangwu Zhang @ 2023-06-21  7:38 UTC (permalink / raw)
  To: linux-block, Ming Lei, Jeff Moyer, io-uring

Hi,
I hit the io_req_local_work_add error while running the liburing tests.

kernel repo:
    Merge branch 'for-6.5/block' into for-next
    * for-6.5/block:
      reiserfs: fix blkdev_put() warning from release_journal_dev()

[ 1733.389012] BUG: KASAN: out-of-bounds in io_req_local_work_add+0x3b1/0x4a0
[ 1733.395900] Read of size 4 at addr ffff888133320458 by task
iou-wrk-97057/97138
[ 1733.403205]
[ 1733.404706] CPU: 4 PID: 97138 Comm: iou-wrk-97057 Kdump: loaded Not
tainted 6.4.0-rc3.kasan+ #1
[ 1733.413404] Hardware name: Dell Inc. PowerEdge R740/06WXJT, BIOS
2.13.3 12/13/2021
[ 1733.420972] Call Trace:
[ 1733.423425]  <TASK>
[ 1733.425533]  dump_stack_lvl+0x33/0x50
[ 1733.429207]  print_address_description.constprop.0+0x2c/0x3e0
[ 1733.434959]  print_report+0xb5/0x270
[ 1733.438539]  ? kasan_addr_to_slab+0x9/0xa0
[ 1733.442639]  ? io_req_local_work_add+0x3b1/0x4a0
[ 1733.447258]  kasan_report+0xcf/0x100
[ 1733.450839]  ? io_req_local_work_add+0x3b1/0x4a0
[ 1733.455456]  io_req_local_work_add+0x3b1/0x4a0
[ 1733.459903]  ? __pfx_io_req_local_work_add+0x10/0x10
[ 1733.464871]  ? __schedule+0x616/0x1530
[ 1733.468622]  __io_req_task_work_add+0x1bc/0x270
[ 1733.473156]  io_issue_sqe+0x55a/0xe80
[ 1733.476831]  io_wq_submit_work+0x23e/0xa00
[ 1733.480930]  io_worker_handle_work+0x2f5/0xa80
[ 1733.485384]  io_wq_worker+0x6c5/0x9d0
[ 1733.489051]  ? __pfx_io_wq_worker+0x10/0x10
[ 1733.493246]  ? _raw_spin_lock_irq+0x82/0xe0
[ 1733.497430]  ? __pfx_io_wq_worker+0x10/0x10
[ 1733.501616]  ret_from_fork+0x29/0x50
[ 1733.505204]  </TASK>
[ 1733.507396]
[ 1733.508894] Allocated by task 97057:
[ 1733.512475]  kasan_save_stack+0x1e/0x40
[ 1733.516313]  kasan_set_track+0x21/0x30
[ 1733.520068]  __kasan_slab_alloc+0x83/0x90
[ 1733.524080]  kmem_cache_alloc_bulk+0x13a/0x1e0
[ 1733.528526]  __io_alloc_req_refill+0x238/0x510
[ 1733.532971]  io_submit_sqes+0x65a/0xcd0
[ 1733.536810]  __do_sys_io_uring_enter+0x4e9/0x830
[ 1733.541430]  do_syscall_64+0x59/0x90
[ 1733.545010]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 1733.550071]
[ 1733.551571] The buggy address belongs to the object at ffff8881333203c0
[ 1733.551571]  which belongs to the cache io_kiocb of size 224
[ 1733.563816] The buggy address is located 152 bytes inside of
[ 1733.563816]  224-byte region [ffff8881333203c0, ffff8881333204a0)
[ 1733.575544]
[ 1733.577042] The buggy address belongs to the physical page:
[ 1733.582617] page:00000000edbe178c refcount:1 mapcount:0
mapping:0000000000000000 index:0x0 pfn:0x133320
[ 1733.592011] head:00000000edbe178c order:1 entire_mapcount:0
nr_pages_mapped:0 pincount:0
[ 1733.600096] memcg:ffff88810cd49001
[ 1733.603501] flags:
0x17ffffc0010200(slab|head|node=0|zone=2|lastcpupid=0x1fffff)
[ 1733.610896] page_type: 0xffffffff()
[ 1733.614390] raw: 0017ffffc0010200 ffff888101222280 ffffea0004473900
0000000000000002
[ 1733.622128] raw: 0000000000000000 0000000000190019 00000001ffffffff
ffff88810cd49001
[ 1733.629866] page dumped because: kasan: bad access detected
[ 1733.635439]
[ 1733.636938] Memory state around the buggy address:
[ 1733.641731]  ffff888133320300: 00 00 00 00 00 00 00 00 00 00 00 00
fc fc fc fc
[ 1733.648952]  ffff888133320380: fc fc fc fc fc fc fc fc 00 00 00 00
00 00 00 00
[ 1733.656169] >ffff888133320400: 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00
[ 1733.663389]                                                        ^
[ 1733.669743]  ffff888133320480: 00 00 00 00 fc fc fc fc fc fc fc fc
fc fc fc fc
[ 1733.676961]  ffff888133320500: 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00

-- 
Guangwu Zhang
Thanks


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [bug report] BUG: KASAN: out-of-bounds in io_req_local_work_add+0x3b1/0x4a0
  2023-06-21  7:38 [bug report] BUG: KASAN: out-of-bounds in io_req_local_work_add+0x3b1/0x4a0 Guangwu Zhang
@ 2023-06-21 13:40 ` Jens Axboe
  2023-06-21 14:04   ` Ming Lei
  2023-06-21 14:04   ` Jens Axboe
  0 siblings, 2 replies; 8+ messages in thread
From: Jens Axboe @ 2023-06-21 13:40 UTC (permalink / raw)
  To: Guangwu Zhang, linux-block, Ming Lei, Jeff Moyer, io-uring

On 6/21/23 1:38 AM, Guangwu Zhang wrote:
> Hi,
> I hit the io_req_local_work_add error while running the liburing tests.
> 
> kernel repo :
>     Merge branch 'for-6.5/block' into for-next
>     * for-6.5/block:
>       reiserfs: fix blkdev_put() warning from release_journal_dev()
> 
> [ 1733.389012] BUG: KASAN: out-of-bounds in io_req_local_work_add+0x3b1/0x4a0
> [ 1733.395900] Read of size 4 at addr ffff888133320458 by task
> iou-wrk-97057/97138
> [ 1733.403205]
> [ 1733.404706] CPU: 4 PID: 97138 Comm: iou-wrk-97057 Kdump: loaded Not
> tainted 6.4.0-rc3.kasan+ #1
> [ 1733.413404] Hardware name: Dell Inc. PowerEdge R740/06WXJT, BIOS
> 2.13.3 12/13/2021
> [ 1733.420972] Call Trace:
> [ 1733.423425]  <TASK>
> [ 1733.425533]  dump_stack_lvl+0x33/0x50
> [ 1733.429207]  print_address_description.constprop.0+0x2c/0x3e0
> [ 1733.434959]  print_report+0xb5/0x270
> [ 1733.438539]  ? kasan_addr_to_slab+0x9/0xa0
> [ 1733.442639]  ? io_req_local_work_add+0x3b1/0x4a0
> [ 1733.447258]  kasan_report+0xcf/0x100
> [ 1733.450839]  ? io_req_local_work_add+0x3b1/0x4a0
> [ 1733.455456]  io_req_local_work_add+0x3b1/0x4a0
> [ 1733.459903]  ? __pfx_io_req_local_work_add+0x10/0x10
> [ 1733.464871]  ? __schedule+0x616/0x1530
> [ 1733.468622]  __io_req_task_work_add+0x1bc/0x270
> [ 1733.473156]  io_issue_sqe+0x55a/0xe80
> [ 1733.476831]  io_wq_submit_work+0x23e/0xa00
> [ 1733.480930]  io_worker_handle_work+0x2f5/0xa80
> [ 1733.485384]  io_wq_worker+0x6c5/0x9d0
> [ 1733.489051]  ? __pfx_io_wq_worker+0x10/0x10
> [ 1733.493246]  ? _raw_spin_lock_irq+0x82/0xe0
> [ 1733.497430]  ? __pfx_io_wq_worker+0x10/0x10
> [ 1733.501616]  ret_from_fork+0x29/0x50
> [ 1733.505204]  </TASK>
> [ 1733.507396]
> [ 1733.508894] Allocated by task 97057:
> [ 1733.512475]  kasan_save_stack+0x1e/0x40
> [ 1733.516313]  kasan_set_track+0x21/0x30
> [ 1733.520068]  __kasan_slab_alloc+0x83/0x90
> [ 1733.524080]  kmem_cache_alloc_bulk+0x13a/0x1e0
> [ 1733.528526]  __io_alloc_req_refill+0x238/0x510
> [ 1733.532971]  io_submit_sqes+0x65a/0xcd0
> [ 1733.536810]  __do_sys_io_uring_enter+0x4e9/0x830
> [ 1733.541430]  do_syscall_64+0x59/0x90
> [ 1733.545010]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
> [ 1733.550071]
> [ 1733.551571] The buggy address belongs to the object at ffff8881333203c0
> [ 1733.551571]  which belongs to the cache io_kiocb of size 224
> [ 1733.563816] The buggy address is located 152 bytes inside of
> [ 1733.563816]  224-byte region [ffff8881333203c0, ffff8881333204a0)
> [ 1733.575544]
> [ 1733.577042] The buggy address belongs to the physical page:
> [ 1733.582617] page:00000000edbe178c refcount:1 mapcount:0
> mapping:0000000000000000 index:0x0 pfn:0x133320
> [ 1733.592011] head:00000000edbe178c order:1 entire_mapcount:0
> nr_pages_mapped:0 pincount:0
> [ 1733.600096] memcg:ffff88810cd49001
> [ 1733.603501] flags:
> 0x17ffffc0010200(slab|head|node=0|zone=2|lastcpupid=0x1fffff)
> [ 1733.610896] page_type: 0xffffffff()
> [ 1733.614390] raw: 0017ffffc0010200 ffff888101222280 ffffea0004473900
> 0000000000000002
> [ 1733.622128] raw: 0000000000000000 0000000000190019 00000001ffffffff
> ffff88810cd49001
> [ 1733.629866] page dumped because: kasan: bad access detected
> [ 1733.635439]
> [ 1733.636938] Memory state around the buggy address:
> [ 1733.641731]  ffff888133320300: 00 00 00 00 00 00 00 00 00 00 00 00
> fc fc fc fc
> [ 1733.648952]  ffff888133320380: fc fc fc fc fc fc fc fc 00 00 00 00
> 00 00 00 00
> [ 1733.656169] >ffff888133320400: 00 00 00 00 00 00 00 00 00 00 00 00
> 00 00 00 00
> [ 1733.663389]                                                        ^
> [ 1733.669743]  ffff888133320480: 00 00 00 00 fc fc fc fc fc fc fc fc
> fc fc fc fc
> [ 1733.676961]  ffff888133320500: 00 00 00 00 00 00 00 00 00 00 00 00
> 00 00 00 00

I appreciate you running tests and sending in failures, but can you
please be more specific about what exactly was run? We seem to need to
do this dance every time, which is just wasting time. So:

1) What test triggered this?
2) Was it invoked with any arguments?

In general, a good bug report should include exactly HOW you ended
up there.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [bug report] BUG: KASAN: out-of-bounds in io_req_local_work_add+0x3b1/0x4a0
  2023-06-21 13:40 ` Jens Axboe
@ 2023-06-21 14:04   ` Ming Lei
  2023-06-21 15:56     ` Jens Axboe
  2023-06-21 14:04   ` Jens Axboe
  1 sibling, 1 reply; 8+ messages in thread
From: Ming Lei @ 2023-06-21 14:04 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Guangwu Zhang, linux-block, Jeff Moyer, io-uring

On Wed, Jun 21, 2023 at 07:40:49AM -0600, Jens Axboe wrote:
> On 6/21/23 1:38 AM, Guangwu Zhang wrote:
> > Hi,
> > I hit the io_req_local_work_add error while running the liburing tests.
> > 
> > kernel repo :
> >     Merge branch 'for-6.5/block' into for-next
> >     * for-6.5/block:
> >       reiserfs: fix blkdev_put() warning from release_journal_dev()
> > 
> > [ 1733.389012] BUG: KASAN: out-of-bounds in io_req_local_work_add+0x3b1/0x4a0
> > [ 1733.395900] Read of size 4 at addr ffff888133320458 by task
> > iou-wrk-97057/97138
> > [ 1733.403205]
> > [ 1733.404706] CPU: 4 PID: 97138 Comm: iou-wrk-97057 Kdump: loaded Not
> > tainted 6.4.0-rc3.kasan+ #1
> > [ 1733.413404] Hardware name: Dell Inc. PowerEdge R740/06WXJT, BIOS
> > 2.13.3 12/13/2021
> > [ 1733.420972] Call Trace:
> > [ 1733.423425]  <TASK>
> > [ 1733.425533]  dump_stack_lvl+0x33/0x50
> > [ 1733.429207]  print_address_description.constprop.0+0x2c/0x3e0
> > [ 1733.434959]  print_report+0xb5/0x270
> > [ 1733.438539]  ? kasan_addr_to_slab+0x9/0xa0
> > [ 1733.442639]  ? io_req_local_work_add+0x3b1/0x4a0
> > [ 1733.447258]  kasan_report+0xcf/0x100
> > [ 1733.450839]  ? io_req_local_work_add+0x3b1/0x4a0
> > [ 1733.455456]  io_req_local_work_add+0x3b1/0x4a0
> > [ 1733.459903]  ? __pfx_io_req_local_work_add+0x10/0x10
> > [ 1733.464871]  ? __schedule+0x616/0x1530
> > [ 1733.468622]  __io_req_task_work_add+0x1bc/0x270
> > [ 1733.473156]  io_issue_sqe+0x55a/0xe80
> > [ 1733.476831]  io_wq_submit_work+0x23e/0xa00
> > [ 1733.480930]  io_worker_handle_work+0x2f5/0xa80
> > [ 1733.485384]  io_wq_worker+0x6c5/0x9d0
> > [ 1733.489051]  ? __pfx_io_wq_worker+0x10/0x10
> > [ 1733.493246]  ? _raw_spin_lock_irq+0x82/0xe0
> > [ 1733.497430]  ? __pfx_io_wq_worker+0x10/0x10
> > [ 1733.501616]  ret_from_fork+0x29/0x50
> > [ 1733.505204]  </TASK>
> > [ 1733.507396]
> > [ 1733.508894] Allocated by task 97057:
> > [ 1733.512475]  kasan_save_stack+0x1e/0x40
> > [ 1733.516313]  kasan_set_track+0x21/0x30
> > [ 1733.520068]  __kasan_slab_alloc+0x83/0x90
> > [ 1733.524080]  kmem_cache_alloc_bulk+0x13a/0x1e0
> > [ 1733.528526]  __io_alloc_req_refill+0x238/0x510
> > [ 1733.532971]  io_submit_sqes+0x65a/0xcd0
> > [ 1733.536810]  __do_sys_io_uring_enter+0x4e9/0x830
> > [ 1733.541430]  do_syscall_64+0x59/0x90
> > [ 1733.545010]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
> > [ 1733.550071]
> > [ 1733.551571] The buggy address belongs to the object at ffff8881333203c0
> > [ 1733.551571]  which belongs to the cache io_kiocb of size 224
> > [ 1733.563816] The buggy address is located 152 bytes inside of
> > [ 1733.563816]  224-byte region [ffff8881333203c0, ffff8881333204a0)
> > [ 1733.575544]
> > [ 1733.577042] The buggy address belongs to the physical page:
> > [ 1733.582617] page:00000000edbe178c refcount:1 mapcount:0
> > mapping:0000000000000000 index:0x0 pfn:0x133320
> > [ 1733.592011] head:00000000edbe178c order:1 entire_mapcount:0
> > nr_pages_mapped:0 pincount:0
> > [ 1733.600096] memcg:ffff88810cd49001
> > [ 1733.603501] flags:
> > 0x17ffffc0010200(slab|head|node=0|zone=2|lastcpupid=0x1fffff)
> > [ 1733.610896] page_type: 0xffffffff()
> > [ 1733.614390] raw: 0017ffffc0010200 ffff888101222280 ffffea0004473900
> > 0000000000000002
> > [ 1733.622128] raw: 0000000000000000 0000000000190019 00000001ffffffff
> > ffff88810cd49001
> > [ 1733.629866] page dumped because: kasan: bad access detected
> > [ 1733.635439]
> > [ 1733.636938] Memory state around the buggy address:
> > [ 1733.641731]  ffff888133320300: 00 00 00 00 00 00 00 00 00 00 00 00
> > fc fc fc fc
> > [ 1733.648952]  ffff888133320380: fc fc fc fc fc fc fc fc 00 00 00 00
> > 00 00 00 00
> > [ 1733.656169] >ffff888133320400: 00 00 00 00 00 00 00 00 00 00 00 00
> > 00 00 00 00
> > [ 1733.663389]                                                        ^
> > [ 1733.669743]  ffff888133320480: 00 00 00 00 fc fc fc fc fc fc fc fc
> > fc fc fc fc
> > [ 1733.676961]  ffff888133320500: 00 00 00 00 00 00 00 00 00 00 00 00
> > 00 00 00 00
> 
> I appreciate you running tests and sending in failures, but can you
> please be more specific about what exactly was run? We seem to need to
> do this dance every time, which is just wasting time. So:
> 
> 1) What test triggered this?
> 2) Was it invoked with any arguments?

I can see the whole dmesg log, and it seems to come from "sq-full-cpp.t /dev/sda":

[ 1340.918880] Running test sq-full-cpp.t:
[ 1341.009225] Running test sq-full-cpp.t /dev/sda:
[ 1342.025292] restraintd[1061]: *** Current Time: Tue Jun 20 21:17:57 2023  Localwatchdog at: Wed Jun 21 01:03:56 2023
[ 1402.053433] restraintd[1061]: *** Current Time: Tue Jun 20 21:18:57 2023  Localwatchdog at: Wed Jun 21 01:03:56 2023
[ 1462.047970] restraintd[1061]: *** Current Time: Tue Jun 20 21:19:57 2023  Localwatchdog at: Wed Jun 21 01:03:56 2023
[-- MARK -- Wed Jun 21 01:20:00 2023]
[ 1522.029598] restraintd[1061]: *** Current Time: Tue Jun 20 21:20:57 2023  Localwatchdog at: Wed Jun 21 01:03:56 2023
[ 1582.029278] restraintd[1061]: *** Current Time: Tue Jun 20 21:21:57 2023  Localwatchdog at: Wed Jun 21 01:03:56 2023
[ 1642.034617] restraintd[1061]: *** Current Time: Tue Jun 20 21:22:57 2023  Localwatchdog at: Wed Jun 21 01:03:56 2023
[ 1702.034344] restraintd[1061]: *** Current Time: Tue Jun 20 21:23:57 2023  Localwatchdog at: Wed Jun 21 01:03:56 2023
[ 1733.381782] ==================================================================
[ 1733.389012] BUG: KASAN: out-of-bounds in io_req_local_work_add+0x3b1/0x4a0
[ 1733.395900] Read of size 4 at addr ffff888133320458 by task iou-wrk-97057/97138


Thanks,
Ming


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [bug report] BUG: KASAN: out-of-bounds in io_req_local_work_add+0x3b1/0x4a0
  2023-06-21 13:40 ` Jens Axboe
  2023-06-21 14:04   ` Ming Lei
@ 2023-06-21 14:04   ` Jens Axboe
  1 sibling, 0 replies; 8+ messages in thread
From: Jens Axboe @ 2023-06-21 14:04 UTC (permalink / raw)
  To: Guangwu Zhang, linux-block, Ming Lei, Jeff Moyer, io-uring

On 6/21/23 7:40 AM, Jens Axboe wrote:
> On 6/21/23 1:38 AM, Guangwu Zhang wrote:
>> Hi,
>> I hit the io_req_local_work_add error while running the liburing tests.
>>
>> kernel repo :
>>     Merge branch 'for-6.5/block' into for-next
>>     * for-6.5/block:
>>       reiserfs: fix blkdev_put() warning from release_journal_dev()
>>
>> [ 1733.389012] BUG: KASAN: out-of-bounds in io_req_local_work_add+0x3b1/0x4a0
>> [ 1733.395900] Read of size 4 at addr ffff888133320458 by task
>> iou-wrk-97057/97138
>> [ 1733.403205]
>> [ 1733.404706] CPU: 4 PID: 97138 Comm: iou-wrk-97057 Kdump: loaded Not
>> tainted 6.4.0-rc3.kasan+ #1
>> [ 1733.413404] Hardware name: Dell Inc. PowerEdge R740/06WXJT, BIOS
>> 2.13.3 12/13/2021
>> [ 1733.420972] Call Trace:
>> [ 1733.423425]  <TASK>
>> [ 1733.425533]  dump_stack_lvl+0x33/0x50
>> [ 1733.429207]  print_address_description.constprop.0+0x2c/0x3e0
>> [ 1733.434959]  print_report+0xb5/0x270
>> [ 1733.438539]  ? kasan_addr_to_slab+0x9/0xa0
>> [ 1733.442639]  ? io_req_local_work_add+0x3b1/0x4a0
>> [ 1733.447258]  kasan_report+0xcf/0x100
>> [ 1733.450839]  ? io_req_local_work_add+0x3b1/0x4a0
>> [ 1733.455456]  io_req_local_work_add+0x3b1/0x4a0
>> [ 1733.459903]  ? __pfx_io_req_local_work_add+0x10/0x10
>> [ 1733.464871]  ? __schedule+0x616/0x1530
>> [ 1733.468622]  __io_req_task_work_add+0x1bc/0x270
>> [ 1733.473156]  io_issue_sqe+0x55a/0xe80
>> [ 1733.476831]  io_wq_submit_work+0x23e/0xa00
>> [ 1733.480930]  io_worker_handle_work+0x2f5/0xa80
>> [ 1733.485384]  io_wq_worker+0x6c5/0x9d0
>> [ 1733.489051]  ? __pfx_io_wq_worker+0x10/0x10
>> [ 1733.493246]  ? _raw_spin_lock_irq+0x82/0xe0
>> [ 1733.497430]  ? __pfx_io_wq_worker+0x10/0x10
>> [ 1733.501616]  ret_from_fork+0x29/0x50
>> [ 1733.505204]  </TASK>
>> [ 1733.507396]
>> [ 1733.508894] Allocated by task 97057:
>> [ 1733.512475]  kasan_save_stack+0x1e/0x40
>> [ 1733.516313]  kasan_set_track+0x21/0x30
>> [ 1733.520068]  __kasan_slab_alloc+0x83/0x90
>> [ 1733.524080]  kmem_cache_alloc_bulk+0x13a/0x1e0
>> [ 1733.528526]  __io_alloc_req_refill+0x238/0x510
>> [ 1733.532971]  io_submit_sqes+0x65a/0xcd0
>> [ 1733.536810]  __do_sys_io_uring_enter+0x4e9/0x830
>> [ 1733.541430]  do_syscall_64+0x59/0x90
>> [ 1733.545010]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
>> [ 1733.550071]
>> [ 1733.551571] The buggy address belongs to the object at ffff8881333203c0
>> [ 1733.551571]  which belongs to the cache io_kiocb of size 224
>> [ 1733.563816] The buggy address is located 152 bytes inside of
>> [ 1733.563816]  224-byte region [ffff8881333203c0, ffff8881333204a0)
>> [ 1733.575544]
>> [ 1733.577042] The buggy address belongs to the physical page:
>> [ 1733.582617] page:00000000edbe178c refcount:1 mapcount:0
>> mapping:0000000000000000 index:0x0 pfn:0x133320
>> [ 1733.592011] head:00000000edbe178c order:1 entire_mapcount:0
>> nr_pages_mapped:0 pincount:0
>> [ 1733.600096] memcg:ffff88810cd49001
>> [ 1733.603501] flags:
>> 0x17ffffc0010200(slab|head|node=0|zone=2|lastcpupid=0x1fffff)
>> [ 1733.610896] page_type: 0xffffffff()
>> [ 1733.614390] raw: 0017ffffc0010200 ffff888101222280 ffffea0004473900
>> 0000000000000002
>> [ 1733.622128] raw: 0000000000000000 0000000000190019 00000001ffffffff
>> ffff88810cd49001
>> [ 1733.629866] page dumped because: kasan: bad access detected
>> [ 1733.635439]
>> [ 1733.636938] Memory state around the buggy address:
>> [ 1733.641731]  ffff888133320300: 00 00 00 00 00 00 00 00 00 00 00 00
>> fc fc fc fc
>> [ 1733.648952]  ffff888133320380: fc fc fc fc fc fc fc fc 00 00 00 00
>> 00 00 00 00
>> [ 1733.656169] >ffff888133320400: 00 00 00 00 00 00 00 00 00 00 00 00
>> 00 00 00 00
>> [ 1733.663389]                                                        ^
>> [ 1733.669743]  ffff888133320480: 00 00 00 00 fc fc fc fc fc fc fc fc
>> fc fc fc fc
>> [ 1733.676961]  ffff888133320500: 00 00 00 00 00 00 00 00 00 00 00 00
>> 00 00 00 00
> 
> I appreciate you running tests and sending in failures, but can you
> please be more specific about what exactly was run? We seem to need to
> do this dance every time, which is just wasting time. So:
> 
> 1) What test triggered this?
> 2) Was it invoked with any arguments?
> 
> In general, a good bug report should include exactly HOW you ended
> up there.

Another thing that's really handy in bug reports like this is a
disassembly of where the bad memory access occurred. We have this:

BUG: KASAN: out-of-bounds in io_req_local_work_add+0x3b1/0x4a0

and you can look that up in gdb, for example.
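For example, something like the following (illustrative only; it assumes
you have a vmlinux with debug info that matches the running kernel, and
a kernel source tree for the faddr2line helper script):

```shell
# Disassemble the function named in the KASAN report, then find the
# instruction at offset 0x3b1 in the output:
gdb -batch -ex 'disassemble io_req_local_work_add' vmlinux

# Or map the symbol+offset from the report straight to a source line
# with the helper script shipped in the kernel tree:
./scripts/faddr2line vmlinux io_req_local_work_add+0x3b1/0x4a0
```

Including that source line (or the disassembly around the faulting
instruction) saves a round trip when the reporter's kernel build does
not match what the developer has on hand.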

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [bug report] BUG: KASAN: out-of-bounds in io_req_local_work_add+0x3b1/0x4a0
  2023-06-21 14:04   ` Ming Lei
@ 2023-06-21 15:56     ` Jens Axboe
  2023-06-23  5:51       ` Guangwu Zhang
  0 siblings, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2023-06-21 15:56 UTC (permalink / raw)
  To: Ming Lei; +Cc: Guangwu Zhang, linux-block, Jeff Moyer, io-uring

On 6/21/23 8:04 AM, Ming Lei wrote:
> On Wed, Jun 21, 2023 at 07:40:49AM -0600, Jens Axboe wrote:
>> On 6/21/23 1:38 AM, Guangwu Zhang wrote:
>>> Hi,
>>> I hit the io_req_local_work_add error while running the liburing tests.
>>>
>>> kernel repo :
>>>     Merge branch 'for-6.5/block' into for-next
>>>     * for-6.5/block:
>>>       reiserfs: fix blkdev_put() warning from release_journal_dev()
>>>
>>> [ 1733.389012] BUG: KASAN: out-of-bounds in io_req_local_work_add+0x3b1/0x4a0
>>> [ 1733.395900] Read of size 4 at addr ffff888133320458 by task
>>> iou-wrk-97057/97138
>>> [ 1733.403205]
>>> [ 1733.404706] CPU: 4 PID: 97138 Comm: iou-wrk-97057 Kdump: loaded Not
>>> tainted 6.4.0-rc3.kasan+ #1
>>> [ 1733.413404] Hardware name: Dell Inc. PowerEdge R740/06WXJT, BIOS
>>> 2.13.3 12/13/2021
>>> [ 1733.420972] Call Trace:
>>> [ 1733.423425]  <TASK>
>>> [ 1733.425533]  dump_stack_lvl+0x33/0x50
>>> [ 1733.429207]  print_address_description.constprop.0+0x2c/0x3e0
>>> [ 1733.434959]  print_report+0xb5/0x270
>>> [ 1733.438539]  ? kasan_addr_to_slab+0x9/0xa0
>>> [ 1733.442639]  ? io_req_local_work_add+0x3b1/0x4a0
>>> [ 1733.447258]  kasan_report+0xcf/0x100
>>> [ 1733.450839]  ? io_req_local_work_add+0x3b1/0x4a0
>>> [ 1733.455456]  io_req_local_work_add+0x3b1/0x4a0
>>> [ 1733.459903]  ? __pfx_io_req_local_work_add+0x10/0x10
>>> [ 1733.464871]  ? __schedule+0x616/0x1530
>>> [ 1733.468622]  __io_req_task_work_add+0x1bc/0x270
>>> [ 1733.473156]  io_issue_sqe+0x55a/0xe80
>>> [ 1733.476831]  io_wq_submit_work+0x23e/0xa00
>>> [ 1733.480930]  io_worker_handle_work+0x2f5/0xa80
>>> [ 1733.485384]  io_wq_worker+0x6c5/0x9d0
>>> [ 1733.489051]  ? __pfx_io_wq_worker+0x10/0x10
>>> [ 1733.493246]  ? _raw_spin_lock_irq+0x82/0xe0
>>> [ 1733.497430]  ? __pfx_io_wq_worker+0x10/0x10
>>> [ 1733.501616]  ret_from_fork+0x29/0x50
>>> [ 1733.505204]  </TASK>
>>> [ 1733.507396]
>>> [ 1733.508894] Allocated by task 97057:
>>> [ 1733.512475]  kasan_save_stack+0x1e/0x40
>>> [ 1733.516313]  kasan_set_track+0x21/0x30
>>> [ 1733.520068]  __kasan_slab_alloc+0x83/0x90
>>> [ 1733.524080]  kmem_cache_alloc_bulk+0x13a/0x1e0
>>> [ 1733.528526]  __io_alloc_req_refill+0x238/0x510
>>> [ 1733.532971]  io_submit_sqes+0x65a/0xcd0
>>> [ 1733.536810]  __do_sys_io_uring_enter+0x4e9/0x830
>>> [ 1733.541430]  do_syscall_64+0x59/0x90
>>> [ 1733.545010]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
>>> [ 1733.550071]
>>> [ 1733.551571] The buggy address belongs to the object at ffff8881333203c0
>>> [ 1733.551571]  which belongs to the cache io_kiocb of size 224
>>> [ 1733.563816] The buggy address is located 152 bytes inside of
>>> [ 1733.563816]  224-byte region [ffff8881333203c0, ffff8881333204a0)
>>> [ 1733.575544]
>>> [ 1733.577042] The buggy address belongs to the physical page:
>>> [ 1733.582617] page:00000000edbe178c refcount:1 mapcount:0
>>> mapping:0000000000000000 index:0x0 pfn:0x133320
>>> [ 1733.592011] head:00000000edbe178c order:1 entire_mapcount:0
>>> nr_pages_mapped:0 pincount:0
>>> [ 1733.600096] memcg:ffff88810cd49001
>>> [ 1733.603501] flags:
>>> 0x17ffffc0010200(slab|head|node=0|zone=2|lastcpupid=0x1fffff)
>>> [ 1733.610896] page_type: 0xffffffff()
>>> [ 1733.614390] raw: 0017ffffc0010200 ffff888101222280 ffffea0004473900
>>> 0000000000000002
>>> [ 1733.622128] raw: 0000000000000000 0000000000190019 00000001ffffffff
>>> ffff88810cd49001
>>> [ 1733.629866] page dumped because: kasan: bad access detected
>>> [ 1733.635439]
>>> [ 1733.636938] Memory state around the buggy address:
>>> [ 1733.641731]  ffff888133320300: 00 00 00 00 00 00 00 00 00 00 00 00
>>> fc fc fc fc
>>> [ 1733.648952]  ffff888133320380: fc fc fc fc fc fc fc fc 00 00 00 00
>>> 00 00 00 00
>>> [ 1733.656169] >ffff888133320400: 00 00 00 00 00 00 00 00 00 00 00 00
>>> 00 00 00 00
>>> [ 1733.663389]                                                        ^
>>> [ 1733.669743]  ffff888133320480: 00 00 00 00 fc fc fc fc fc fc fc fc
>>> fc fc fc fc
>>> [ 1733.676961]  ffff888133320500: 00 00 00 00 00 00 00 00 00 00 00 00
>>> 00 00 00 00
>>
>> I appreciate you running tests and sending in failures, but can you
>> please be more specific about what exactly was run? We seem to need to
>> do this dance every time, which is just wasting time. So:
>>
>> 1) What test triggered this?
>> 2) Was it invoked with any arguments?
> 
> I can see the whole dmesg log, and it seems to come from "sq-full-cpp.t /dev/sda":
> 
> [ 1340.918880] Running test sq-full-cpp.t:
> [ 1341.009225] Running test sq-full-cpp.t /dev/sda:
> [ 1342.025292] restraintd[1061]: *** Current Time: Tue Jun 20 21:17:57 2023  Localwatchdog at: Wed Jun 21 01:03:56 2023
> [ 1402.053433] restraintd[1061]: *** Current Time: Tue Jun 20 21:18:57 2023  Localwatchdog at: Wed Jun 21 01:03:56 2023
> [ 1462.047970] restraintd[1061]: *** Current Time: Tue Jun 20 21:19:57 2023  Localwatchdog at: Wed Jun 21 01:03:56 2023
> [-- MARK -- Wed Jun 21 01:20:00 2023]
> [ 1522.029598] restraintd[1061]: *** Current Time: Tue Jun 20 21:20:57 2023  Localwatchdog at: Wed Jun 21 01:03:56 2023
> [ 1582.029278] restraintd[1061]: *** Current Time: Tue Jun 20 21:21:57 2023  Localwatchdog at: Wed Jun 21 01:03:56 2023
> [ 1642.034617] restraintd[1061]: *** Current Time: Tue Jun 20 21:22:57 2023  Localwatchdog at: Wed Jun 21 01:03:56 2023
> [ 1702.034344] restraintd[1061]: *** Current Time: Tue Jun 20 21:23:57 2023  Localwatchdog at: Wed Jun 21 01:03:56 2023
> [ 1733.381782] ==================================================================
> [ 1733.389012] BUG: KASAN: out-of-bounds in io_req_local_work_add+0x3b1/0x4a0
> [ 1733.395900] Read of size 4 at addr ffff888133320458 by task iou-wrk-97057/97138

I don't think that's right - sq-full-cpp doesn't do anything if passed
an argument, and a lot of time also passes between that test being run
and the warning being spewed. Maybe the tests are run both as root and
non-root? The root-run ones will log to dmesg which test is being run,
the non-root ones will not.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [bug report] BUG: KASAN: out-of-bounds in io_req_local_work_add+0x3b1/0x4a0
  2023-06-21 15:56     ` Jens Axboe
@ 2023-06-23  5:51       ` Guangwu Zhang
  2023-06-23 12:14         ` Jeff Moyer
  2023-06-23 12:30         ` Pavel Begunkov
  0 siblings, 2 replies; 8+ messages in thread
From: Guangwu Zhang @ 2023-06-23  5:51 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Ming Lei, linux-block, Jeff Moyer, io-uring

I only hit the bug once during liburing testing, so it is hard to say
which test case triggered the issue.
Here are the test steps:
1) enable KASAN in the kernel
2) mkfs and mount sda, and set TEST_FILES=/dev/sda
3) copy all the liburing test cases to the mount point
4) run 'make runtests' as root
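The liburing steps above could be scripted roughly like this (a sketch
only; the filesystem type, mount point, and liburing checkout location
are assumptions, not part of the original report):

```shell
# Requires a KASAN-enabled kernel (CONFIG_KASAN=y) and root.
mkfs.xfs -f /dev/sda                 # step 2: mkfs the test disk
mount /dev/sda /mnt/test             #         and mount it
cp -r liburing/test /mnt/test/       # step 3: copy the test cases over
cd liburing
TEST_FILES=/dev/sda make runtests    # step 4: run the suite as root
```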

Here is another test that reproduces the error more easily. The test steps:
1) enable KASAN in the kernel
2) get an NVMe disk via NVMe over FC
3) run fio with the io_uring_cmd engine

fio --numjobs=1 --ioengine=io_uring_cmd --runtime='300' --size='300G'
--filename=/dev/ng5n1 --rw='randrw' --name='fio_test'  --iodepth=64
--bs=8k --group_reporting=1 --cmd_type='nvme'   --fixedbufs
fio_test: (g=0): rw=randrw, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T)
8192B-8192B, ioengine=io_uring_cmd, iodepth=64
fio-3.34-100-g50875
Starting 1 process
fio: io_u error on file /dev/ng5n1: Unknown error -17024: write
offset=119259136, buflen=8192
fio: io_u error on file /dev/ng5n1: Unknown error -17024: write
offset=7563542528, buflen=8192
fio: io_u error on file /dev/ng5n1: Unknown error -17024: write
offset=23744880640, buflen=8192
fio: io_u error on file /dev/ng5n1: Unknown error -17024: write
offset=11413504000, buflen=8192
fio: io_u error on file /dev/ng5n1: Unknown error -17024: write
offset=22324764672, buflen=8192
fio: io_u error on file /dev/ng5n1: Unknown error -17024: write
offset=27687174144, buflen=8192
fio: io_u error on file /dev/ng5n1: Unknown error -17024: write
offset=49106542592, buflen=8192
fio: io_u error on file /dev/ng5n1: Unknown error -17024: write
offset=49600675840, buflen=8192
fio: io_u error on file /dev/ng5n1: Unknown error -17024: write
offset=48269828096, buflen=8192
fio: io_u error on file /dev/ng5n1: Unknown error -17024: write
offset=39076798464, buflen=8192
fio: io_u error on file /dev/ng5n1: Unknown error -17024: write
offset=47050080256, buflen=8192
fio: io_u error on file /dev/ng5n1: Unknown error -17024: write
offset=18889048064, buflen=8192
fio: io_u error on file /dev/ng5n1: Unknown error -17024: write
offset=41649799168, buflen=8192
fio: io_u error on file /dev/ng5n1: Unknown error -17024: write
offset=37820571648, buflen=8192
fio: io_u error on file /dev/ng5n1: Unknown error -17024: write
offset=15420358656, buflen=8192
fio: io_u error on file /dev/ng5n1: Unknown error -17024: write
offset=1672372224, buflen=8192
fio: io_u error on file /dev/ng5n1: Unknown error -17024: write
offset=13677133824, buflen=8192
fio: io_u error on file /dev/ng5n1: Unknown error -17024: write
offset=50066644992, buflen=8192
fio: io_u error on file /dev/ng5n1: Unknown error -17024: write
offset=40153137152, buflen=8192
fio: io_u error on file /dev/ng5n1: Unknown error -17024: write
offset=19285647360, buflen=8192
fio: io_u error on file /dev/ng5n1: Unknown error -17024: write
offset=8948129792, buflen=8192
fio: io_u error on file /dev/ng5n1: Unknown error -17024: write
offset=2271256576, buflen=8192
fio: io_u error on file /dev/ng5n1: Unknown error -17024: write
offset=29889617920, buflen=8192
fio: io_u error on file /dev/ng5n1: Unknown error -17024: write
offset=19249717248, buflen=8192
fio: io_u error on file /dev/ng5n1: Unknown error -17024: write
offset=17358807040, buflen=8192
fio: io_u error on file /dev/ng5n1: Unknown error -17024: write
offset=39550853120, buflen=8192
fio: io_u error on file /dev/ng5n1: Unknown error -17024: write
offset=45227081728, buflen=8192
fio: io_u error on file /dev/ng5n1: Unknown error -17024: write
offset=32457154560, buflen=8192
fio: io_u error on file /dev/ng5n1: Unknown error -17024: write
offset=44406554624, buflen=8192
fio: io_u error on file /dev/ng5n1: Unknown error -17024: write
offset=17701838848, buflen=8192
fio: io_u error on file /dev/ng5n1: Unknown error -17024: write
offset=25463160832, buflen=8192
fio: io_u error on file /dev/ng5n1: Unknown error -17024: write
offset=45162758144, buflen=8192
fio: io_u error on file /dev/ng5n1: Unknown error -17024: write
offset=11501166592, buflen=8192
fio: io_u error on file /dev/ng5n1: Unknown error -17024: write
offset=19937656832, buflen=8192
fio: io_u error on file /dev/ng5n1: Unknown error -17024: write
offset=42879565824, buflen=8192
fio: io_u error on file /dev/ng5n1: Unknown error -17024: write
offset=16697352192, buflen=8192
fio: io_u error on file /dev/ng5n1: Unknown error -17024: write
offset=5585068032, buflen=8192
fio: pid=2451, err=-17024/file:io_u.c:1889, func=io_u error,
error=Unknown error -17024

fio_test: (groupid=0, jobs=1): err=-17024 (file:io_u.c:1889, func=io_u
error, error=Unknown error -17024): pid=2451: Fri Jun 23 01:45:45 2023
  read: IOPS=20.2k, BW=158MiB/s (165MB/s)(230MiB/1461msec)
    slat (usec): min=2, max=466, avg=11.90, stdev=19.43
    clat (usec): min=167, max=738890, avg=1333.54, stdev=19954.27
     lat (usec): min=176, max=738894, avg=1345.44, stdev=19954.17
    clat percentiles (usec):
     |  1.00th=[   359],  5.00th=[   457], 10.00th=[   502], 20.00th=[   562],
     | 30.00th=[   611], 40.00th=[   676], 50.00th=[   742], 60.00th=[   816],
     | 70.00th=[   881], 80.00th=[   947], 90.00th=[  1037], 95.00th=[  1156],
     | 99.00th=[  1434], 99.50th=[  1598], 99.90th=[  3097], 99.95th=[700449],
     | 99.99th=[700449]
   bw (  KiB/s): min=163696, max=307856, per=100.00%, avg=235776.00,
stdev=101936.51, samples=2
   iops        : min=20462, max=38482, avg=29472.00, stdev=12742.06, samples=2
  write: IOPS=20.2k, BW=158MiB/s (165MB/s)(230MiB/1461msec); 0 zone resets
    slat (usec): min=2, max=489, avg=12.08, stdev=18.99
    clat (usec): min=276, max=697804, avg=906.93, stdev=5743.07
     lat (usec): min=321, max=697810, avg=919.01, stdev=5742.89
    clat percentiles (usec):
     |  1.00th=[  453],  5.00th=[  537], 10.00th=[  586], 20.00th=[  652],
     | 30.00th=[  709], 40.00th=[  783], 50.00th=[  840], 60.00th=[  906],
     | 70.00th=[  963], 80.00th=[ 1029], 90.00th=[ 1139], 95.00th=[ 1287],
     | 99.00th=[ 1516], 99.50th=[ 1631], 99.90th=[ 2442], 99.95th=[ 3064],
     | 99.99th=[ 6194]
   bw (  KiB/s): min=163936, max=307936, per=100.00%, avg=235936.00,
stdev=101823.38, samples=2
   iops        : min=20492, max=38492, avg=29492.00, stdev=12727.92, samples=2
  lat (usec)   : 250=0.04%, 500=6.04%, 750=37.07%, 1000=37.96%
  lat (msec)   : 2=18.64%, 4=0.14%, 10=0.01%, 750=0.04%
  cpu          : usr=2.60%, sys=48.15%, ctx=575, majf=0, minf=13
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=99.9%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=29496,29531,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=158MiB/s (165MB/s), 158MiB/s-158MiB/s (165MB/s-165MB/s),
io=230MiB (242MB), run=1461-1461msec
  WRITE: bw=158MiB/s (165MB/s), 158MiB/s-158MiB/s (165MB/s-165MB/s),
io=230MiB (242MB), run=1461-1461msec

[  820.367682] ==================================================================
[  820.374911] Disabling lock debugging due to kernel taint
[ 2624.630806] ==================================================================
[ 2624.638050] BUG: KASAN: out-of-bounds in io_req_local_work_add+0x3b1/0x4a0
[ 2624.644942] Read of size 4 at addr ffff8885102bf3d8 by task iou-wrk-2451/2479
[ 2624.652077]
[ 2624.653576] CPU: 26 PID: 2479 Comm: iou-wrk-2451 Kdump: loaded
Tainted: G    B              6.4.0-rc3.kasan+ #1
[ 2624.663663] Hardware name: Dell Inc. PowerEdge R640/06NR82, BIOS
2.17.1 11/15/2022
[ 2624.671227] Call Trace:
[ 2624.673681]  <IRQ>
[ 2624.675701]  dump_stack_lvl+0x33/0x50
[ 2624.679364]  print_address_description.constprop.0+0x2c/0x3e0
[ 2624.685112]  print_report+0xb5/0x270
[ 2624.688690]  ? kasan_addr_to_slab+0x9/0xa0
[ 2624.692786]  ? io_req_local_work_add+0x3b1/0x4a0
[ 2624.697407]  kasan_report+0xcf/0x100
[ 2624.700985]  ? io_req_local_work_add+0x3b1/0x4a0
[ 2624.705605]  io_req_local_work_add+0x3b1/0x4a0
[ 2624.710052]  ? __pfx_io_req_local_work_add+0x10/0x10
[ 2624.715015]  ? bio_uninit+0x73/0x1f0
[ 2624.718596]  ? bio_endio+0x3bc/0x530
[ 2624.722174]  ? __pfx_nvme_uring_task_cb+0x10/0x10 [nvme_core]
[ 2624.727944]  __io_req_task_work_add+0x1bc/0x270
[ 2624.732480]  nvme_uring_cmd_end_io+0x1c2/0x300 [nvme_core]
[ 2624.737981]  ? __pfx_nvme_uring_cmd_end_io+0x10/0x10 [nvme_core]
[ 2624.744003]  __blk_mq_end_request+0xf4/0x460
[ 2624.748276]  nvme_fc_complete_rq+0x23b/0x320 [nvme_fc]
[ 2624.753426]  blk_complete_reqs+0xa7/0xe0
[ 2624.757349]  __do_softirq+0x19b/0x59a
[ 2624.761016]  __irq_exit_rcu+0x170/0x1d0
[ 2624.764855]  sysvec_call_function_single+0x6f/0x90
[ 2624.769646]  </IRQ>
[ 2624.771753]  <TASK>
[ 2624.773858]  asm_sysvec_call_function_single+0x16/0x20
[ 2624.778997] RIP: 0010:unwind_next_frame+0x9b9/0x1ea0
[ 2624.783963] Code: 40 38 f2 40 0f 9e c6 84 d2 0f 95 c2 40 84 d6 0f
85 3a 0e 00 00 83 e0 07 38 c1 0f 9e c2 84 c9 0f 95 c0 84 c2 0f 85 25
0e 00 00 <41> 0f b6 41 04 c0 e8 04 3c 01 0f 84 b8 01 00 00 3c 04 0f 84
02 08
[ 2624.802708] RSP: 0018:ffff88848f367158 EFLAGS: 00000246
[ 2624.807932] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
[ 2624.815064] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff88848f367238
[ 2624.822197] RBP: ffff88848f367228 R08: ffff88848f367210 R09: ffffffff9f998abc
[ 2624.829329] R10: ffffffff9f998ac0 R11: ffff88848f367528 R12: ffff88848f367218
[ 2624.836462] R13: ffff88848f367230 R14: ffff88848f367530 R15: ffff88848f3671d0
[ 2624.843594]  ? unwind_next_frame+0x8c2/0x1ea0
[ 2624.847953]  ? kasan_set_track+0x21/0x30
[ 2624.851878]  ? kasan_set_track+0x21/0x30
[ 2624.855802]  ? __pfx_stack_trace_consume_entry+0x10/0x10
[ 2624.861117]  arch_stack_walk+0x88/0xf0
[ 2624.864870]  ? __kasan_slab_alloc+0x83/0x90
[ 2624.869055]  stack_trace_save+0x91/0xd0
[ 2624.872893]  ? __pfx_stack_trace_save+0x10/0x10
[ 2624.877424]  ? ret_from_fork+0x29/0x50
[ 2624.881179]  ? __pfx_stack_trace_consume_entry+0x10/0x10
[ 2624.886490]  ? arch_stack_walk+0x88/0xf0
[ 2624.890415]  kasan_save_stack+0x1e/0x40
[ 2624.894254]  ? kasan_save_stack+0x1e/0x40
[ 2624.898268]  ? kasan_set_track+0x21/0x30
[ 2624.902192]  ? ret_from_fork+0x29/0x50
[ 2624.905944]  ? stack_trace_save+0x91/0xd0
[ 2624.909957]  ? __pfx_stack_trace_save+0x10/0x10
[ 2624.914488]  ? __stack_depot_save+0x34/0x350
[ 2624.918762]  ? kasan_save_stack+0x2e/0x40
[ 2624.922773]  ? kasan_save_stack+0x1e/0x40
[ 2624.926786]  ? kasan_set_track+0x21/0x30
[ 2624.930712]  ? __kasan_slab_alloc+0x83/0x90
[ 2624.934896]  ? kmem_cache_alloc+0x190/0x3a0
[ 2624.939082]  ? mempool_alloc+0x100/0x2d0
[ 2624.943008]  ? bio_alloc_bioset+0x1a9/0x770
[ 2624.947195]  ? blk_rq_map_user_iov+0x4a0/0xa60
[ 2624.951640]  ? nvme_map_user_request+0x1dc/0x590 [nvme_core]
[ 2624.957317]  ? nvme_uring_cmd_io+0x8c0/0xcd0 [nvme_core]
[ 2624.962645]  ? io_uring_cmd+0x1f1/0x3d0
[ 2624.966485]  ? io_issue_sqe+0x4ff/0xe80
[ 2624.970324]  ? io_wq_submit_work+0x23e/0xa00
[ 2624.974597]  ? io_worker_handle_work+0x404/0xa80
[ 2624.979217]  ? io_wq_worker+0x6c5/0x9d0
[ 2624.983055]  ? ret_from_fork+0x29/0x50
[ 2624.986806]  ? __pfx_newidle_balance+0x10/0x10
[ 2624.991252]  kasan_set_track+0x21/0x30
[ 2624.995006]  __kasan_slab_alloc+0x83/0x90
[ 2624.999016]  kmem_cache_alloc+0x190/0x3a0
[ 2625.003029]  mempool_alloc+0x100/0x2d0
[ 2625.006782]  ? __pfx_mempool_alloc+0x10/0x10
[ 2625.011054]  ? dma_direct_map_sg+0x1b6/0x3a0
[ 2625.015325]  ? __pfx_dma_direct_map_sg+0x10/0x10
[ 2625.019947]  qla2xxx_get_qpair_sp.constprop.0+0x8b/0x2b0 [qla2xxx]
[ 2625.026186]  qla_nvme_post_cmd+0x273/0x7c0 [qla2xxx]
[ 2625.031202]  nvme_fc_start_fcp_op.part.0+0x5f9/0x1650 [nvme_fc]
[ 2625.037130]  ? finish_wait+0x9a/0x230
[ 2625.040797]  ? __pfx_nvme_fc_start_fcp_op.part.0+0x10/0x10 [nvme_fc]
[ 2625.047159]  nvme_fc_queue_rq+0x3b7/0x730 [nvme_fc]
[ 2625.052044]  ? __pfx_nvme_fc_queue_rq+0x10/0x10 [nvme_fc]
[ 2625.057452]  blk_mq_dispatch_rq_list+0x3d8/0x13c0
[ 2625.062158]  ? __blk_mq_alloc_requests+0x8b0/0xc30
[ 2625.066951]  ? bio_associate_blkg_from_css+0x2fc/0xaf0
[ 2625.072088]  ? __pfx_blk_mq_dispatch_rq_list+0x10/0x10
[ 2625.077227]  ? _raw_spin_lock+0x81/0xe0
[ 2625.081067]  ? __pfx__raw_spin_lock+0x10/0x10
[ 2625.085425]  __blk_mq_sched_dispatch_requests+0x19c/0x480
[ 2625.090825]  ? __pfx___blk_mq_sched_dispatch_requests+0x10/0x10
[ 2625.096743]  ? __pfx_blk_mq_alloc_request+0x10/0x10
[ 2625.101623]  ? __pfx_blk_mq_insert_request+0x10/0x10
[ 2625.106588]  blk_mq_sched_dispatch_requests+0xdd/0x130
[ 2625.111726]  blk_mq_run_hw_queue+0x3dd/0x4e0
[ 2625.115999]  ? __pfx_nvme_uring_cmd_end_io+0x10/0x10 [nvme_core]
[ 2625.122022]  nvme_uring_cmd_io+0x826/0xcd0 [nvme_core]
[ 2625.127180]  ? __pfx_nvme_uring_cmd_io+0x10/0x10 [nvme_core]
[ 2625.132853]  ? __update_idle_core+0x59/0x360
[ 2625.137128]  ? __schedule+0x616/0x1530
[ 2625.140878]  ? native_queued_spin_lock_slowpath+0x163/0xa90
[ 2625.146453]  io_uring_cmd+0x1f1/0x3d0
[ 2625.150117]  io_issue_sqe+0x4ff/0xe80
[ 2625.153782]  io_wq_submit_work+0x23e/0xa00
[ 2625.157881]  io_worker_handle_work+0x404/0xa80
[ 2625.162328]  io_wq_worker+0x6c5/0x9d0
[ 2625.165993]  ? __pfx_io_wq_worker+0x10/0x10
[ 2625.170180]  ? _raw_spin_lock_irq+0x82/0xe0
[ 2625.174365]  ? __pfx_io_wq_worker+0x10/0x10
[ 2625.178549]  ret_from_fork+0x29/0x50
[ 2625.182131]  </TASK>
[ 2625.184320]
[ 2625.185820] Allocated by task 2451:
[ 2625.189313]  kasan_save_stack+0x1e/0x40
[ 2625.193153]  kasan_set_track+0x21/0x30
[ 2625.196906]  __kasan_slab_alloc+0x83/0x90
[ 2625.200917]  kmem_cache_alloc_bulk+0x13a/0x1e0
[ 2625.205361]  __io_alloc_req_refill+0x238/0x510
[ 2625.209807]  io_submit_sqes+0x65a/0xcd0
[ 2625.213648]  __do_sys_io_uring_enter+0x4e9/0x830
[ 2625.218266]  do_syscall_64+0x59/0x90
[ 2625.221846]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 2625.226896]
[ 2625.228397] The buggy address belongs to the object at ffff8885102bf340
[ 2625.228397]  which belongs to the cache io_kiocb of size 224
[ 2625.240640] The buggy address is located 152 bytes inside of
[ 2625.240640]  224-byte region [ffff8885102bf340, ffff8885102bf420)
[ 2625.252367]
[ 2625.253865] The buggy address belongs to the physical page:
[ 2625.259439] page:00000000fe3a9582 refcount:1 mapcount:0
mapping:0000000000000000 index:0x0 pfn:0x5102bc
[ 2625.268824] head:00000000fe3a9582 order:2 entire_mapcount:0
nr_pages_mapped:0 pincount:0
[ 2625.276910] memcg:ffff888495973401
[ 2625.280313] flags:
0x57ffffc0010200(slab|head|node=1|zone=2|lastcpupid=0x1fffff)
[ 2625.287706] page_type: 0xffffffff()
[ 2625.291200] raw: 0057ffffc0010200 ffff888109159040 dead000000000122
0000000000000000
[ 2625.298938] raw: 0000000000000000 0000000000330033 00000001ffffffff
ffff888495973401
[ 2625.306675] page dumped because: kasan: bad access detected
[ 2625.312250]
[ 2625.313747] Memory state around the buggy address:
[ 2625.318540]  ffff8885102bf280: 00 00 00 00 00 00 00 00 00 00 00 00
fc fc fc fc
[ 2625.325760]  ffff8885102bf300: fc fc fc fc fc fc fc fc 00 00 00 00
00 00 00 00
[ 2625.332978] >ffff8885102bf380: 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00
[ 2625.340196]                                                        ^
[ 2625.346548]  ffff8885102bf400: 00 00 00 00 fc fc fc fc fc fc fc fc
fc fc fc fc
[ 2625.353766]  ffff8885102bf480: fc fc fc fc fc fc fc fc fc fc fc fc
fc fc fc fc
[ 2625.360986] ==================================================================



Jens Axboe <[email protected]> 于2023年6月21日周三 23:56写道:
>
> On 6/21/23 8:04?AM, Ming Lei wrote:
> > On Wed, Jun 21, 2023 at 07:40:49AM -0600, Jens Axboe wrote:
> >> On 6/21/23 1:38?AM, Guangwu Zhang wrote:
> >>> HI,
> >>> Found the io_req_local_work_add error when run  liburing testing.
> >>>
> >>> kernel repo :
> >>>     Merge branch 'for-6.5/block' into for-next
> >>>     * for-6.5/block:
> >>>       reiserfs: fix blkdev_put() warning from release_journal_dev()
> >>>
> >>> [ 1733.389012] BUG: KASAN: out-of-bounds in io_req_local_work_add+0x3b1/0x4a0
> >>> [ 1733.395900] Read of size 4 at addr ffff888133320458 by task
> >>> iou-wrk-97057/97138
> >>> [ 1733.403205]
> >>> [ 1733.404706] CPU: 4 PID: 97138 Comm: iou-wrk-97057 Kdump: loaded Not
> >>> tainted 6.4.0-rc3.kasan+ #1
> >>> [ 1733.413404] Hardware name: Dell Inc. PowerEdge R740/06WXJT, BIOS
> >>> 2.13.3 12/13/2021
> >>> [ 1733.420972] Call Trace:
> >>> [ 1733.423425]  <TASK>
> >>> [ 1733.425533]  dump_stack_lvl+0x33/0x50
> >>> [ 1733.429207]  print_address_description.constprop.0+0x2c/0x3e0
> >>> [ 1733.434959]  print_report+0xb5/0x270
> >>> [ 1733.438539]  ? kasan_addr_to_slab+0x9/0xa0
> >>> [ 1733.442639]  ? io_req_local_work_add+0x3b1/0x4a0
> >>> [ 1733.447258]  kasan_report+0xcf/0x100
> >>> [ 1733.450839]  ? io_req_local_work_add+0x3b1/0x4a0
> >>> [ 1733.455456]  io_req_local_work_add+0x3b1/0x4a0
> >>> [ 1733.459903]  ? __pfx_io_req_local_work_add+0x10/0x10
> >>> [ 1733.464871]  ? __schedule+0x616/0x1530
> >>> [ 1733.468622]  __io_req_task_work_add+0x1bc/0x270
> >>> [ 1733.473156]  io_issue_sqe+0x55a/0xe80
> >>> [ 1733.476831]  io_wq_submit_work+0x23e/0xa00
> >>> [ 1733.480930]  io_worker_handle_work+0x2f5/0xa80
> >>> [ 1733.485384]  io_wq_worker+0x6c5/0x9d0
> >>> [ 1733.489051]  ? __pfx_io_wq_worker+0x10/0x10
> >>> [ 1733.493246]  ? _raw_spin_lock_irq+0x82/0xe0
> >>> [ 1733.497430]  ? __pfx_io_wq_worker+0x10/0x10
> >>> [ 1733.501616]  ret_from_fork+0x29/0x50
> >>> [ 1733.505204]  </TASK>
> >>> [ 1733.507396]
> >>> [ 1733.508894] Allocated by task 97057:
> >>> [ 1733.512475]  kasan_save_stack+0x1e/0x40
> >>> [ 1733.516313]  kasan_set_track+0x21/0x30
> >>> [ 1733.520068]  __kasan_slab_alloc+0x83/0x90
> >>> [ 1733.524080]  kmem_cache_alloc_bulk+0x13a/0x1e0
> >>> [ 1733.528526]  __io_alloc_req_refill+0x238/0x510
> >>> [ 1733.532971]  io_submit_sqes+0x65a/0xcd0
> >>> [ 1733.536810]  __do_sys_io_uring_enter+0x4e9/0x830
> >>> [ 1733.541430]  do_syscall_64+0x59/0x90
> >>> [ 1733.545010]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
> >>> [ 1733.550071]
> >>> [ 1733.551571] The buggy address belongs to the object at ffff8881333203c0
> >>> [ 1733.551571]  which belongs to the cache io_kiocb of size 224
> >>> [ 1733.563816] The buggy address is located 152 bytes inside of
> >>> [ 1733.563816]  224-byte region [ffff8881333203c0, ffff8881333204a0)
> >>> [ 1733.575544]
> >>> [ 1733.577042] The buggy address belongs to the physical page:
> >>> [ 1733.582617] page:00000000edbe178c refcount:1 mapcount:0
> >>> mapping:0000000000000000 index:0x0 pfn:0x133320
> >>> [ 1733.592011] head:00000000edbe178c order:1 entire_mapcount:0
> >>> nr_pages_mapped:0 pincount:0
> >>> [ 1733.600096] memcg:ffff88810cd49001
> >>> [ 1733.603501] flags:
> >>> 0x17ffffc0010200(slab|head|node=0|zone=2|lastcpupid=0x1fffff)
> >>> [ 1733.610896] page_type: 0xffffffff()
> >>> [ 1733.614390] raw: 0017ffffc0010200 ffff888101222280 ffffea0004473900
> >>> 0000000000000002
> >>> [ 1733.622128] raw: 0000000000000000 0000000000190019 00000001ffffffff
> >>> ffff88810cd49001
> >>> [ 1733.629866] page dumped because: kasan: bad access detected
> >>> [ 1733.635439]
> >>> [ 1733.636938] Memory state around the buggy address:
> >>> [ 1733.641731]  ffff888133320300: 00 00 00 00 00 00 00 00 00 00 00 00
> >>> fc fc fc fc
> >>> [ 1733.648952]  ffff888133320380: fc fc fc fc fc fc fc fc 00 00 00 00
> >>> 00 00 00 00
> >>> [ 1733.656169] >ffff888133320400: 00 00 00 00 00 00 00 00 00 00 00 00
> >>> 00 00 00 00
> >>> [ 1733.663389]                                                        ^
> >>> [ 1733.669743]  ffff888133320480: 00 00 00 00 fc fc fc fc fc fc fc fc
> >>> fc fc fc fc
> >>> [ 1733.676961]  ffff888133320500: 00 00 00 00 00 00 00 00 00 00 00 00
> >>> 00 00 00 00
> >>
> >> I appreciate you running tests and sending in failures, but can you
> >> please be more specific about what exactly was run? We seem to need to
> >> do this dance every time, which is just wasting time. So:
> >>
> >> 1) What test triggered this?
> >> 2) Was it invoked with any arguments?
> >
> > I can see the whole dmesg log, and it seems from "sq-full-cpp.c /dev/sda":
> >
> > [ 1340.918880] Running test sq-full-cpp.t:
> > [ 1341.009225] Running test sq-full-cpp.t /dev/sda:
> > [ 1342.025292] restraintd[1061]: *** Current Time: Tue Jun 20 21:17:57 2023  Localwatchdog at: Wed Jun 21 01:03:56 2023
> > [ 1402.053433] restraintd[1061]: *** Current Time: Tue Jun 20 21:18:57 2023  Localwatchdog at: Wed Jun 21 01:03:56 2023
> > [ 1462.047970] restraintd[1061]: *** Current Time: Tue Jun 20 21:19:57 2023  Localwatchdog at: Wed Jun 21 01:03:56 2023
> > [-- MARK -- Wed Jun 21 01:20:00 2023]
> > [ 1522.029598] restraintd[1061]: *** Current Time: Tue Jun 20 21:20:57 2023  Localwatchdog at: Wed Jun 21 01:03:56 2023
> > [ 1582.029278] restraintd[1061]: *** Current Time: Tue Jun 20 21:21:57 2023  Localwatchdog at: Wed Jun 21 01:03:56 2023
> > [ 1642.034617] restraintd[1061]: *** Current Time: Tue Jun 20 21:22:57 2023  Localwatchdog at: Wed Jun 21 01:03:56 2023
> > [ 1702.034344] restraintd[1061]: *** Current Time: Tue Jun 20 21:23:57 2023  Localwatchdog at: Wed Jun 21 01:03:56 2023
> > [ 1733.381782] ==================================================================
> > [ 1733.389012] BUG: KASAN: out-of-bounds in io_req_local_work_add+0x3b1/0x4a0
> > [ 1733.395900] Read of size 4 at addr ffff888133320458 by task iou-wrk-97057/97138
>
> I don't think that's right - sq-full-cpp doesn't do anything if passed
> an argument, and a lot of time also passes between that test being run
> and the warning being spewed. Maybe the tests are run both as root and
> non-root? The runs as root will add the dmesg note about which test is
> being run; the non-root runs will not.
>
> --
> Jens Axboe
>


-- 
Guangwu Zhang
Thanks


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [bug report] BUG: KASAN: out-of-bounds in io_req_local_work_add+0x3b1/0x4a0
  2023-06-23  5:51       ` Guangwu Zhang
@ 2023-06-23 12:14         ` Jeff Moyer
  2023-06-23 12:30         ` Pavel Begunkov
  1 sibling, 0 replies; 8+ messages in thread
From: Jeff Moyer @ 2023-06-23 12:14 UTC (permalink / raw)
  To: Guangwu Zhang; +Cc: Jens Axboe, Ming Lei, linux-block, io-uring

Guangwu Zhang <[email protected]> writes:

> Just hit the bug one time with liburing testing, so it's hard to say
> which test case triggered the issue.
> Here are the test steps:
> 1) enable the kernel KASAN option
> 2) mkfs and mount sda, and set TEST_FILES=/dev/sda
> 3) copy all liburing test cases to the mount point
> 4) make runtests as root

Just a note on the test procedure: you shouldn't mount sda *and* set
TEST_FILES to sda.  You can leave TEST_FILES empty for this case.

-Jeff


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [bug report] BUG: KASAN: out-of-bounds in io_req_local_work_add+0x3b1/0x4a0
  2023-06-23  5:51       ` Guangwu Zhang
  2023-06-23 12:14         ` Jeff Moyer
@ 2023-06-23 12:30         ` Pavel Begunkov
  1 sibling, 0 replies; 8+ messages in thread
From: Pavel Begunkov @ 2023-06-23 12:30 UTC (permalink / raw)
  To: Guangwu Zhang, Jens Axboe; +Cc: Ming Lei, linux-block, Jeff Moyer, io-uring

On 6/23/23 06:51, Guangwu Zhang wrote:
> Just hit the bug one time with liburing testing, so it's hard to say
> which test case triggered the issue.
> Here are the test steps:
> 1) enable the kernel KASAN option
> 2) mkfs and mount sda, and set TEST_FILES=/dev/sda
> 3) copy all liburing test cases to the mount point
> 4) make runtests as root

Fwiw, I suspect it's a false positive warning due to
kasan checks we added in io_uring. Guangwu, can you decode
the trace? It should help to confirm whether the assumption
is right.

You can use scripts/decode_stacktrace.sh, pointing it at the
vmlinux and source tree the kernel was built from.

Or do it with gdb as below, replacing "0x3b1" with the offset
from the warning message:

gdb vmlinux --ex "l io_req_local_work_add+0x3b1"


TL;DR;

static void io_req_local_work_add(struct io_kiocb *req)
{
	first = READ_ONCE(ctx->work_llist.first);
	do {
		if (first)
			nr_tw_prev = READ_ONCE(first->nr_tw);
		...
		req->io_task_work.node.next = first;
	} while (!try_cmpxchg(&ctx->work_llist, &first, req));
}

This might poke into a "freed" request, but we won't actually
use the value, as the following cmpxchg will fail. It should
be fine to read it because we're under the RCU read lock
and requests are allocated with SLAB_TYPESAFE_BY_RCU.
The problem here is that the request is not only KASAN-poisoned
on kfree(); there are io_uring caches that poison it as well.


-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-06-23 12:31 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-21  7:38 [bug report] BUG: KASAN: out-of-bounds in io_req_local_work_add+0x3b1/0x4a0 Guangwu Zhang
2023-06-21 13:40 ` Jens Axboe
2023-06-21 14:04   ` Ming Lei
2023-06-21 15:56     ` Jens Axboe
2023-06-23  5:51       ` Guangwu Zhang
2023-06-23 12:14         ` Jeff Moyer
2023-06-23 12:30         ` Pavel Begunkov
2023-06-21 14:04   ` Jens Axboe
