public inbox for [email protected]
From: John Garry <[email protected]>
To: Jens Axboe <[email protected]>, [email protected]
Cc: [email protected]
Subject: Re: [PATCH 2/3] io_uring/rw: handle -EAGAIN retry at IO completion time
Date: Tue, 4 Mar 2025 18:10:38 +0000	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>

On 09/01/2025 18:15, Jens Axboe wrote:
> Rather than try and have io_read/io_write turn REQ_F_REISSUE into
> -EAGAIN, catch the REQ_F_REISSUE when the request is otherwise
> considered as done. This is saner as we know this isn't happening
> during an actual submission, and it removes the need to randomly
> check REQ_F_REISSUE after read/write submission.
> 
> If REQ_F_REISSUE is set, __io_submit_flush_completions() will skip over
> this request in terms of posting a CQE, and the regular request
> cleaning will ensure that it gets reissued via io-wq.
> 
> Signed-off-by: Jens Axboe <[email protected]>

JFYI, this patch causes or exposes an issue in scsi_debug where we get a 
use-after-free:

Starting 10 processes
[    9.445254] 
==================================================================
[    9.446156] BUG: KASAN: slab-use-after-free in bio_poll+0x26b/0x420
[    9.447188] Read of size 4 at addr ff1100014c9b46b4 by task fio/442
[    9.447933]
[    9.448121] CPU: 8 UID: 0 PID: 442 Comm: fio Not tainted 
6.13.0-rc4-00052-gfdf8fc8dce75 #3390
[    9.449161] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[    9.450573] Call Trace:
[    9.450876]  <TASK>
[    9.451186]  dump_stack_lvl+0x53/0x70
[    9.451644]  print_report+0xce/0x660
[    9.452077]  ? sdebug_blk_mq_poll+0x92/0x100
[    9.452639]  ? bio_poll+0x26b/0x420
[    9.453077]  kasan_report+0xc6/0x100
[    9.453537]  ? bio_poll+0x26b/0x420
[    9.453955]  bio_poll+0x26b/0x420
[    9.454374]  ? task_mm_cid_work+0x33e/0x750
[    9.454879]  iocb_bio_iopoll+0x47/0x60
[    9.455355]  io_do_iopoll+0x450/0x10a0
[    9.455814]  ? _raw_spin_lock_irq+0x81/0xe0
[    9.456359]  ? __pfx_io_do_iopoll+0x10/0x10
[    9.456866]  ? mutex_lock+0x8c/0xe0
[    9.457317]  ? __pfx_mutex_lock+0x10/0x10
[    9.457799]  ? __pfx_mutex_unlock+0x10/0x10
[    9.458316]  __do_sys_io_uring_enter+0x7b7/0x12e0
[    9.458866]  ? __pfx___do_sys_io_uring_enter+0x10/0x10
[    9.459515]  ? __pfx___rseq_handle_notify_resume+0x10/0x10
[    9.460202]  ? handle_mm_fault+0x16f/0x400
[    9.460696]  do_syscall_64+0xa6/0x1a0
[    9.461149]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[    9.461787] RIP: 0033:0x560572d148f8
[    9.462234] Code: 1c 01 00 00 48 8b 04 24 83 78 38 00 0f 85 0e 01 00 
00 41 8b 3f 41 ba 01 00 00 00 45 31 c0 45 31 c9 b8 aa 01 00 00 89 ea 0f 
05 <89> c6 85 c0 0f 89 ec 00 00 00 89 44 24 0c e8 55 87 fa ff 8b 74 24
[    9.464489] RSP: 002b:00007ffc5330a600 EFLAGS: 00000246 ORIG_RAX: 
00000000000001aa
[    9.465400] RAX: ffffffffffffffda RBX: 00007f39cd9d9ac0 RCX: 
0000560572d148f8
[    9.466254] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 
0000000000000006
[    9.467114] RBP: 0000000000000001 R08: 0000000000000000 R09: 
0000000000000000
[    9.467962] R10: 0000000000000001 R11: 0000000000000246 R12: 
0000000000000000
[    9.468803] R13: 00007ffc5330a798 R14: 0000000000000001 R15: 
0000560577589630
[    9.469672]  </TASK>
[    9.469950]
[    9.470168] Allocated by task 441:
[    9.470577]  kasan_save_stack+0x33/0x60
[    9.471033]  kasan_save_track+0x14/0x30
[    9.471554]  __kasan_slab_alloc+0x6e/0x70
[    9.472036]  kmem_cache_alloc_noprof+0xe9/0x300
[    9.472599]  mempool_alloc_noprof+0x11a/0x2e0
[    9.473161]  bio_alloc_bioset+0x1ab/0x780
[    9.473634]  blkdev_direct_IO+0x456/0x2130
[    9.474130]  blkdev_write_iter+0x54f/0xb90
[    9.474647]  io_write+0x3b3/0xfe0
[    9.475053]  io_issue_sqe+0x131/0x13e0
[    9.475516]  io_submit_sqes+0x6f6/0x21e0
[    9.475995]  __do_sys_io_uring_enter+0xa1e/0x12e0
[    9.476602]  do_syscall_64+0xa6/0x1a0
[    9.477043]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[    9.477659]
[    9.477848] Freed by task 441:
[    9.478261]  kasan_save_stack+0x33/0x60
[    9.478715]  kasan_save_track+0x14/0x30
[    9.479197]  kasan_save_free_info+0x3b/0x60
[    9.479692]  __kasan_slab_free+0x37/0x50
[    9.480191]  slab_free_after_rcu_debug+0xb1/0x280
[    9.480755]  rcu_core+0x610/0x1a80
[    9.481215]  handle_softirqs+0x1b5/0x5c0
[    9.481696]  irq_exit_rcu+0xaf/0xe0
[    9.482119]  sysvec_apic_timer_interrupt+0x6c/0x80
[    9.482729]  asm_sysvec_apic_timer_interrupt+0x1a/0x20
[    9.483389]
[    9.483581] Last potentially related work creation:
[    9.484174]  kasan_save_stack+0x33/0x60
[    9.484661]  __kasan_record_aux_stack+0x8e/0xa0
[    9.485228]  kmem_cache_free+0x21c/0x370
[    9.485713]  blk_update_request+0x22c/0x1070
[    9.486280]  scsi_end_request+0x6b/0x5d0
[    9.486762]  scsi_io_completion+0xa4/0xda0
[    9.487285]  sdebug_blk_mq_poll_iter+0x189/0x2c0
[    9.487851]  bt_tags_iter+0x15f/0x290
[    9.488310]  __blk_mq_all_tag_iter+0x31d/0x960
[    9.488869]  blk_mq_tagset_busy_iter+0xeb/0x140
[    9.489448]  sdebug_blk_mq_poll+0x92/0x100
[    9.489949]  blk_hctx_poll+0x160/0x330
[    9.490446]  bio_poll+0x182/0x420
[    9.490853]  iocb_bio_iopoll+0x47/0x60
[    9.491343]  io_do_iopoll+0x450/0x10a0
[    9.491798]  __do_sys_io_uring_enter+0x7b7/0x12e0
[    9.492398]  do_syscall_64+0xa6/0x1a0
[    9.492852]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[    9.493484]
[    9.493676] The buggy address belongs to the object at ff1100014c9b4640
[    9.493676]  which belongs to the cache bio-248 of size 248
[    9.495118] The buggy address is located 116 bytes inside of
[    9.495118]  freed 248-byte region [ff1100014c9b4640, ff1100014c9b4738)
[    9.496597]
[    9.496784] The buggy address belongs to the physical page:
[    9.497465] page: refcount:1 mapcount:0 mapping:0000000000000000 
index:0x0 pfn:0x14c9b4
[    9.498464] head: order:2 mapcount:0 entire_mapcount:0 
nr_pages_mapped:0 pincount:0
[    9.499421] flags: 0x200000000000040(head|node=0|zone=2)
[    9.500053] page_type: f5(slab)
[    9.500451] raw: 0200000000000040 ff110001052f8dc0 dead000000000122 
0000000000000000
[    9.501386] raw: 0000000000000000 0000000080330033 00000001f5000000 
0000000000000000
[    9.502333] head: 0200000000000040 ff110001052f8dc0 dead000000000122 
0000000000000000
[    9.503261] head: 0000000000000000 0000000080330033 00000001f5000000 
0000000000000000
[    9.504213] head: 0200000000000002 ffd4000005326d01 ffffffffffffffff 
0000000000000000
[    9.505142] head: 0000000000000004 0000000000000000 00000000ffffffff 
0000000000000000
[    9.506082] page dumped because: kasan: bad access detected
[    9.506752]
[    9.506939] Memory state around the buggy address:
[    9.507560]  ff1100014c9b4580: 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 fc
[    9.508454]  ff1100014c9b4600: fc fc fc fc fc fc fc fc fa fb fb fb fb 
fb fb fb
[    9.509365] >ff1100014c9b4680: fb fb fb fb fb fb fb fb fb fb fb fb fb 
fb fb fb
[    9.510260]                                      ^
[    9.510842]  ff1100014c9b4700: fb fb fb fb fb fb fb fc fc fc fc fc fc 
fc fc fc
[    9.511755]  ff1100014c9b4780: 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00
[    9.512654] 
==================================================================
[    9.513616] Disabling lock debugging due to kernel taint
QEMU: Terminated

Now, scsi_debug does something pretty unorthodox in its mq_poll callback, in 
that it calls blk_mq_tagset_busy_iter() ... -> scsi_done() to complete 
commands inline from the poll path.

However, I see a similar splat for qemu with an nvme device, so it does not 
appear to be specific to scsi_debug:

fio-3.34
Starting 10 processes
[   30.887296] 
==================================================================
[   30.907820] BUG: KASAN: slab-use-after-free in bio_poll+0x26b/0x420
[   30.924793] Read of size 4 at addr ff1100015f775ab4 by task fio/458
[   30.949904]
[   30.952784] CPU: 11 UID: 0 PID: 458 Comm: fio Not tainted 
6.13.0-rc4-00053-gc9c268957b58 #3391
[   31.036344] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[   31.090860] Call Trace:
[   31.153928]  <TASK>
[   31.180060]  dump_stack_lvl+0x53/0x70
[   31.209414]  print_report+0xce/0x660
[   31.220341]  ? bio_poll+0x26b/0x420
[   31.236876]  kasan_report+0xc6/0x100
[   31.253395]  ? bio_poll+0x26b/0x420
[   31.283105]  bio_poll+0x26b/0x420
[   31.304388]  iocb_bio_iopoll+0x47/0x60
[   31.327575]  io_do_iopoll+0x450/0x10a0
[   31.357706]  ? __pfx_io_do_iopoll+0x10/0x10
[   31.381389]  ? io_submit_sqes+0x6f6/0x21e0
[   31.397833]  ? mutex_lock+0x8c/0xe0
[   31.436789]  ? __pfx_mutex_lock+0x10/0x10
[   31.469967]  __do_sys_io_uring_enter+0x7b7/0x12e0
[   31.506017]  ? __pfx___do_sys_io_uring_enter+0x10/0x10
[   31.556819]  ? __pfx___rseq_handle_notify_resume+0x10/0x10
[   31.599749]  ? handle_mm_fault+0x16f/0x400
[   31.637617]  ? up_read+0x1a/0xb0
[   31.658649]  do_syscall_64+0xa6/0x1a0
[   31.715961]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[   31.738610] RIP: 0033:0x558b29f538f8
[   31.758298] Code: 1c 01 00 00 48 8b 04 24 83 78 38 00 0f 85 0e 01 00 
00 41 8b 3f 41 ba 01 00 00 00 45 31 c0 45 31 c9 b8 aa 01 00 00 89 ea 0f 
05 <89> c6 85 c0 0f 89 ec 00 00 00 89 44 24 0c e8 55 87 fa ff 8b 74 24
[   31.868980] RSP: 002b:00007ffd37d51490 EFLAGS: 00000246 ORIG_RAX: 
00000000000001aa
[   31.946356] RAX: ffffffffffffffda RBX: 00007f120ebfeb40 RCX: 
0000558b29f538f8
[   32.044833] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 
0000000000000006
[   32.086849] RBP: 0000000000000001 R08: 0000000000000000 R09: 
0000000000000000
[   32.117522] R10: 0000000000000001 R11: 0000000000000246 R12: 
0000000000000000
[   32.155554] R13: 00007ffd37d51628 R14: 0000000000000001 R15: 
0000558b3c3216b0
[   32.174488]  </TASK>
[   32.183180]
[   32.193202] Allocated by task 458:
[   32.205642]  kasan_save_stack+0x33/0x60
[   32.215908]  kasan_save_track+0x14/0x30
[   32.231828]  __kasan_slab_alloc+0x6e/0x70
[   32.244998]  kmem_cache_alloc_noprof+0xe9/0x300
[   32.263654]  mempool_alloc_noprof+0x11a/0x2e0
[   32.274050]  bio_alloc_bioset+0x1ab/0x780
[   32.286829]  blkdev_direct_IO+0x456/0x2130
[   32.293655]  blkdev_write_iter+0x54f/0xb90
[   32.299844]  io_write+0x3b3/0xfe0
[   32.309428]  io_issue_sqe+0x131/0x13e0
[   32.315319]  io_submit_sqes+0x6f6/0x21e0
[   32.320913]  __do_sys_io_uring_enter+0xa1e/0x12e0
[   32.328091]  do_syscall_64+0xa6/0x1a0
[   32.336915]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[   32.350460]
[   32.355097] Freed by task 455:
[   32.360331]  kasan_save_stack+0x33/0x60
[   32.369595]  kasan_save_track+0x14/0x30
[   32.377397]  kasan_save_free_info+0x3b/0x60
[   32.386598]  __kasan_slab_free+0x37/0x50
[   32.398562]  slab_free_after_rcu_debug+0xb1/0x280
[   32.417108]  rcu_core+0x610/0x1a80
[   32.424947]  handle_softirqs+0x1b5/0x5c0
[   32.434754]  irq_exit_rcu+0xaf/0xe0
[   32.438144]  sysvec_apic_timer_interrupt+0x6c/0x80
[   32.443842]  asm_sysvec_apic_timer_interrupt+0x1a/0x20
[   32.448109]
[   32.449772] Last potentially related work creation:
[   32.454800]  kasan_save_stack+0x33/0x60
[   32.458743]  __kasan_record_aux_stack+0x8e/0xa0
[   32.463802]  kmem_cache_free+0x21c/0x370
[   32.468130]  blk_mq_end_request_batch+0x26b/0x13f0
[   32.473935]  io_do_iopoll+0xa78/0x10a0
[   32.477800]  __do_sys_io_uring_enter+0x7b7/0x12e0
[   32.482678]  do_syscall_64+0xa6/0x1a0
[   32.487671]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[   32.492551]
[   32.494058] The buggy address belongs to the object at ff1100015f775a40
[   32.494058]  which belongs to the cache bio-248 of size 248
[   32.504485] The buggy address is located 116 bytes inside of
[   32.504485]  freed 248-byte region [ff1100015f775a40, ff1100015f775b38)
[   32.518309]
[   32.520370] The buggy address belongs to the physical page:
[   32.526444] page: refcount:1 mapcount:0 mapping:0000000000000000 
index:0x0 pfn:0x15f774
[   32.535554] head: order:2 mapcount:0 entire_mapcount:0 
nr_pages_mapped:0 pincount:0
[   32.542517] flags: 0x200000000000040(head|node=0|zone=2)
[   32.547971] page_type: f5(slab)
[   32.551287] raw: 0200000000000040 ff1100010376af80 dead000000000122 
0000000000000000
[   32.559290] raw: 0000000000000000 0000000000330033 00000001f5000000 
0000000000000000
[   32.566773] head: 0200000000000040 ff1100010376af80 dead000000000122 
0000000000000000
[   32.574046] head: 0000000000000000 0000000000330033 00000001f5000000 
0000000000000000
[   32.581715] head: 0200000000000002 ffd40000057ddd01 ffffffffffffffff 
0000000000000000
[   32.589588] head: 0000000000000004 0000000000000000 00000000ffffffff 
0000000000000000
[   32.596963] page dumped because: kasan: bad access detected
[   32.603473]
[   32.604871] Memory state around the buggy address:
[   32.609617]  ff1100015f775980: 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 fc
[   32.617652]  ff1100015f775a00: fc fc fc fc fc fc fc fc fa fb fb fb fb 
fb fb fb
[   32.625385] >ff1100015f775a80: fb fb fb fb fb fb fb fb fb fb fb fb fb 
fb fb fb
[   32.634014]                                      ^
[   32.637444]  ff1100015f775b00: fb fb fb fb fb fb fb fc fc fc fc fc fc 
fc fc fc
[   32.644158]  ff1100015f775b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00
[   32.651115] 
==================================================================
[   32.659002] Disabling lock debugging due to kernel taint
QEMU: Terminated [W(10)][0.1%][w=150MiB/s][w=38.4k IOPS][eta 01h:24m:16s]

Here's my git bisect log:

git bisect start
# good: [1cbfb828e05171ca2dd77b5988d068e6872480fe] Merge tag
'for-6.14/block-20250118' of git://git.kernel.dk/linux
git bisect good 1cbfb828e05171ca2dd77b5988d068e6872480fe
# bad: [a312e1706ce6c124f04ec85ddece240f3bb2a696] Merge tag
'for-6.14/io_uring-20250119' of git://git.kernel.dk/linux
git bisect bad a312e1706ce6c124f04ec85ddece240f3bb2a696
# good: [3d8b5a22d40435b4a7e58f06ae2cd3506b222898] block: add support
to pass user meta buffer
git bisect good 3d8b5a22d40435b4a7e58f06ae2cd3506b222898
# good: [ce9464081d5168ee0f279d6932ba82260a5b97c4] io_uring/msg_ring:
Drop custom destructor
git bisect good ce9464081d5168ee0f279d6932ba82260a5b97c4
# bad: [d803d123948feffbd992213e144df224097f82b0] io_uring/rw: handle
-EAGAIN retry at IO completion time
git bisect bad d803d123948feffbd992213e144df224097f82b0
# good: [c5f71916146033f9aba108075ff7087022075fd6] io_uring/rw: always
clear ->bytes_done on io_async_rw setup
git bisect good c5f71916146033f9aba108075ff7087022075fd6
# good: [2a51c327d4a4a2eb62d67f4ea13a17efd0f25c5c] io_uring/rsrc:
simplify the bvec iter count calculation
git bisect good 2a51c327d4a4a2eb62d67f4ea13a17efd0f25c5c
# good: [9ac273ae3dc296905b4d61e4c8e7a25592f6d183] io_uring/rw: use
io_rw_recycle() from cleanup path
git bisect good 9ac273ae3dc296905b4d61e4c8e7a25592f6d183
# first bad commit: [d803d123948feffbd992213e144df224097f82b0]
io_uring/rw: handle -EAGAIN retry at IO completion time
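As an aside, a bisect like the one above can also be driven automatically with `git bisect run`. A hedged sketch against a throwaway toy repo (the `state` file and pass/fail test are hypothetical; the real test command would boot the kernel and run the fio workload, failing on the KASAN splat):

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email [email protected]
git config user.name test

# Build a toy history; pretend commits >= 4 contain the regression.
for i in 1 2 3 4 5 6; do
    echo "$i" > state
    git add state
    git commit -qm "commit $i"
done

# Oldest commit is known good, HEAD is known bad.
good=$(git rev-list HEAD | tail -1)
git bisect start HEAD "$good" >/dev/null

# Exit 0 = good, non-zero = bad; git narrows down the first bad commit.
git bisect run sh -c 'test "$(cat state)" -lt 4' >/dev/null
git bisect log | grep 'first bad commit'
```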

Thanks,
John

> ---
>   io_uring/io_uring.c | 15 +++++++--
>   io_uring/rw.c       | 80 ++++++++++++++-------------------------------
>   2 files changed, 38 insertions(+), 57 deletions(-)
> 
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index db198bd435b5..92ba2fdcd087 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -115,7 +115,7 @@
>   				REQ_F_ASYNC_DATA)
>   
>   #define IO_REQ_CLEAN_SLOW_FLAGS (REQ_F_REFCOUNT | REQ_F_LINK | REQ_F_HARDLINK |\
> -				 IO_REQ_CLEAN_FLAGS)
> +				 REQ_F_REISSUE | IO_REQ_CLEAN_FLAGS)
>   
>   #define IO_TCTX_REFS_CACHE_NR	(1U << 10)
>   
> @@ -1403,6 +1403,12 @@ static void io_free_batch_list(struct io_ring_ctx *ctx,
>   						    comp_list);
>   
>   		if (unlikely(req->flags & IO_REQ_CLEAN_SLOW_FLAGS)) {
> +			if (req->flags & REQ_F_REISSUE) {
> +				node = req->comp_list.next;
> +				req->flags &= ~REQ_F_REISSUE;
> +				io_queue_iowq(req);
> +				continue;
> +			}
>   			if (req->flags & REQ_F_REFCOUNT) {
>   				node = req->comp_list.next;
>   				if (!req_ref_put_and_test(req))
> @@ -1442,7 +1448,12 @@ void __io_submit_flush_completions(struct io_ring_ctx *ctx)
>   		struct io_kiocb *req = container_of(node, struct io_kiocb,
>   					    comp_list);
>   
> -		if (!(req->flags & REQ_F_CQE_SKIP) &&
> +		/*
> +		 * Requests marked with REQUEUE should not post a CQE, they
> +		 * will go through the io-wq retry machinery and post one
> +		 * later.
> +		 */
> +		if (!(req->flags & (REQ_F_CQE_SKIP | REQ_F_REISSUE)) &&
>   		    unlikely(!io_fill_cqe_req(ctx, req))) {
>   			if (ctx->lockless_cq) {
>   				spin_lock(&ctx->completion_lock);
> diff --git a/io_uring/rw.c b/io_uring/rw.c
> index afc669048c5d..c52c0515f0a2 100644
> --- a/io_uring/rw.c
> +++ b/io_uring/rw.c
> @@ -202,7 +202,7 @@ static void io_req_rw_cleanup(struct io_kiocb *req, unsigned int issue_flags)
>   	 * mean that the underlying data can be gone at any time. But that
>   	 * should be fixed seperately, and then this check could be killed.
>   	 */
> -	if (!(req->flags & REQ_F_REFCOUNT)) {
> +	if (!(req->flags & (REQ_F_REISSUE | REQ_F_REFCOUNT))) {
>   		req->flags &= ~REQ_F_NEED_CLEANUP;
>   		io_rw_recycle(req, issue_flags);
>   	}
> @@ -455,19 +455,12 @@ static inline loff_t *io_kiocb_update_pos(struct io_kiocb *req)
>   	return NULL;
>   }
>   
> -#ifdef CONFIG_BLOCK
> -static void io_resubmit_prep(struct io_kiocb *req)
> -{
> -	struct io_async_rw *io = req->async_data;
> -	struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
> -
> -	io_meta_restore(io, &rw->kiocb);
> -	iov_iter_restore(&io->iter, &io->iter_state);
> -}
> -
>   static bool io_rw_should_reissue(struct io_kiocb *req)
>   {
> +#ifdef CONFIG_BLOCK
> +	struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
>   	umode_t mode = file_inode(req->file)->i_mode;
> +	struct io_async_rw *io = req->async_data;
>   	struct io_ring_ctx *ctx = req->ctx;
>   
>   	if (!S_ISBLK(mode) && !S_ISREG(mode))
> @@ -488,17 +481,14 @@ static bool io_rw_should_reissue(struct io_kiocb *req)
>   	 */
>   	if (!same_thread_group(req->tctx->task, current) || !in_task())
>   		return false;
> +
> +	io_meta_restore(io, &rw->kiocb);
> +	iov_iter_restore(&io->iter, &io->iter_state);
>   	return true;
> -}
>   #else
> -static void io_resubmit_prep(struct io_kiocb *req)
> -{
> -}
> -static bool io_rw_should_reissue(struct io_kiocb *req)
> -{
>   	return false;
> -}
>   #endif
> +}
>   
>   static void io_req_end_write(struct io_kiocb *req)
>   {
> @@ -525,22 +515,16 @@ static void io_req_io_end(struct io_kiocb *req)
>   	}
>   }
>   
> -static bool __io_complete_rw_common(struct io_kiocb *req, long res)
> +static void __io_complete_rw_common(struct io_kiocb *req, long res)
>   {
> -	if (unlikely(res != req->cqe.res)) {
> -		if (res == -EAGAIN && io_rw_should_reissue(req)) {
> -			/*
> -			 * Reissue will start accounting again, finish the
> -			 * current cycle.
> -			 */
> -			io_req_io_end(req);
> -			req->flags |= REQ_F_REISSUE | REQ_F_BL_NO_RECYCLE;
> -			return true;
> -		}
> +	if (res == req->cqe.res)
> +		return;
> +	if (res == -EAGAIN && io_rw_should_reissue(req)) {
> +		req->flags |= REQ_F_REISSUE | REQ_F_BL_NO_RECYCLE;
> +	} else {
>   		req_set_fail(req);
>   		req->cqe.res = res;
>   	}
> -	return false;
>   }
>   
>   static inline int io_fixup_rw_res(struct io_kiocb *req, long res)
> @@ -583,8 +567,7 @@ static void io_complete_rw(struct kiocb *kiocb, long res)
>   	struct io_kiocb *req = cmd_to_io_kiocb(rw);
>   
>   	if (!kiocb->dio_complete || !(kiocb->ki_flags & IOCB_DIO_CALLER_COMP)) {
> -		if (__io_complete_rw_common(req, res))
> -			return;
> +		__io_complete_rw_common(req, res);
>   		io_req_set_res(req, io_fixup_rw_res(req, res), 0);
>   	}
>   	req->io_task_work.func = io_req_rw_complete;
> @@ -646,26 +629,19 @@ static int kiocb_done(struct io_kiocb *req, ssize_t ret,
>   	if (ret >= 0 && req->flags & REQ_F_CUR_POS)
>   		req->file->f_pos = rw->kiocb.ki_pos;
>   	if (ret >= 0 && (rw->kiocb.ki_complete == io_complete_rw)) {
> -		if (!__io_complete_rw_common(req, ret)) {
> -			/*
> -			 * Safe to call io_end from here as we're inline
> -			 * from the submission path.
> -			 */
> -			io_req_io_end(req);
> -			io_req_set_res(req, final_ret,
> -				       io_put_kbuf(req, ret, issue_flags));
> -			io_req_rw_cleanup(req, issue_flags);
> -			return IOU_OK;
> -		}
> +		__io_complete_rw_common(req, ret);
> +		/*
> +		 * Safe to call io_end from here as we're inline
> +		 * from the submission path.
> +		 */
> +		io_req_io_end(req);
> +		io_req_set_res(req, final_ret, io_put_kbuf(req, ret, issue_flags));
> +		io_req_rw_cleanup(req, issue_flags);
> +		return IOU_OK;
>   	} else {
>   		io_rw_done(&rw->kiocb, ret);
>   	}
>   
> -	if (req->flags & REQ_F_REISSUE) {
> -		req->flags &= ~REQ_F_REISSUE;
> -		io_resubmit_prep(req);
> -		return -EAGAIN;
> -	}
>   	return IOU_ISSUE_SKIP_COMPLETE;
>   }
>   
> @@ -944,8 +920,7 @@ static int __io_read(struct io_kiocb *req, unsigned int issue_flags)
>   	if (ret == -EOPNOTSUPP && force_nonblock)
>   		ret = -EAGAIN;
>   
> -	if (ret == -EAGAIN || (req->flags & REQ_F_REISSUE)) {
> -		req->flags &= ~REQ_F_REISSUE;
> +	if (ret == -EAGAIN) {
>   		/* If we can poll, just do that. */
>   		if (io_file_can_poll(req))
>   			return -EAGAIN;
> @@ -1154,11 +1129,6 @@ int io_write(struct io_kiocb *req, unsigned int issue_flags)
>   	else
>   		ret2 = -EINVAL;
>   
> -	if (req->flags & REQ_F_REISSUE) {
> -		req->flags &= ~REQ_F_REISSUE;
> -		ret2 = -EAGAIN;
> -	}
> -
>   	/*
>   	 * Raw bdev writes will return -EOPNOTSUPP for IOCB_NOWAIT. Just
>   	 * retry them without IOCB_NOWAIT.



Thread overview: 8+ messages
2025-01-09 18:15 [PATCHSET for-next 0/3] Fix read/write -EAGAIN failure cases Jens Axboe
2025-01-09 18:15 ` [PATCH 1/3] io_uring/rw: use io_rw_recycle() from cleanup path Jens Axboe
2025-01-09 18:15 ` [PATCH 2/3] io_uring/rw: handle -EAGAIN retry at IO completion time Jens Axboe
2025-03-04 18:10   ` John Garry [this message]
2025-03-05 16:57     ` John Garry
2025-03-05 17:03       ` Jens Axboe
2025-03-05 21:03         ` Jens Axboe
2025-01-09 18:15 ` [PATCH 3/3] io_uring/rw: don't gate retry on completion context Jens Axboe
