From: John Garry <[email protected]>
To: Jens Axboe <[email protected]>, [email protected]
Cc: [email protected]
Subject: Re: [PATCH 2/3] io_uring/rw: handle -EAGAIN retry at IO completion time
Date: Tue, 4 Mar 2025 18:10:38 +0000 [thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
On 09/01/2025 18:15, Jens Axboe wrote:
> Rather than try and have io_read/io_write turn REQ_F_REISSUE into
> -EAGAIN, catch the REQ_F_REISSUE when the request is otherwise
> considered as done. This is saner as we know this isn't happening
> during an actual submission, and it removes the need to randomly
> check REQ_F_REISSUE after read/write submission.
>
> If REQ_F_REISSUE is set, __io_submit_flush_completions() will skip over
> this request in terms of posting a CQE, and the regular request
> cleaning will ensure that it gets reissued via io-wq.
>
> Signed-off-by: Jens Axboe <[email protected]>
JFYI, this patch causes or exposes an issue in scsi_debug where we get a
use-after-free:
Starting 10 processes
[ 9.445254]
==================================================================
[ 9.446156] BUG: KASAN: slab-use-after-free in bio_poll+0x26b/0x420
[ 9.447188] Read of size 4 at addr ff1100014c9b46b4 by task fio/442
[ 9.447933]
[ 9.448121] CPU: 8 UID: 0 PID: 442 Comm: fio Not tainted
6.13.0-rc4-00052-gfdf8fc8dce75 #3390
[ 9.449161] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 9.450573] Call Trace:
[ 9.450876] <TASK>
[ 9.451186] dump_stack_lvl+0x53/0x70
[ 9.451644] print_report+0xce/0x660
[ 9.452077] ? sdebug_blk_mq_poll+0x92/0x100
[ 9.452639] ? bio_poll+0x26b/0x420
[ 9.453077] kasan_report+0xc6/0x100
[ 9.453537] ? bio_poll+0x26b/0x420
[ 9.453955] bio_poll+0x26b/0x420
[ 9.454374] ? task_mm_cid_work+0x33e/0x750
[ 9.454879] iocb_bio_iopoll+0x47/0x60
[ 9.455355] io_do_iopoll+0x450/0x10a0
[ 9.455814] ? _raw_spin_lock_irq+0x81/0xe0
[ 9.456359] ? __pfx_io_do_iopoll+0x10/0x10
[ 9.456866] ? mutex_lock+0x8c/0xe0
[ 9.457317] ? __pfx_mutex_lock+0x10/0x10
[ 9.457799] ? __pfx_mutex_unlock+0x10/0x10
[ 9.458316] __do_sys_io_uring_enter+0x7b7/0x12e0
[ 9.458866] ? __pfx___do_sys_io_uring_enter+0x10/0x10
[ 9.459515] ? __pfx___rseq_handle_notify_resume+0x10/0x10
[ 9.460202] ? handle_mm_fault+0x16f/0x400
[ 9.460696] do_syscall_64+0xa6/0x1a0
[ 9.461149] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 9.461787] RIP: 0033:0x560572d148f8
[ 9.462234] Code: 1c 01 00 00 48 8b 04 24 83 78 38 00 0f 85 0e 01 00
00 41 8b 3f 41 ba 01 00 00 00 45 31 c0 45 31 c9 b8 aa 01 00 00 89 ea 0f
05 <89> c6 85 c0 0f 89 ec 00 00 00 89 44 24 0c e8 55 87 fa ff 8b 74 24
[ 9.464489] RSP: 002b:00007ffc5330a600 EFLAGS: 00000246 ORIG_RAX:
00000000000001aa
[ 9.465400] RAX: ffffffffffffffda RBX: 00007f39cd9d9ac0 RCX:
0000560572d148f8
[ 9.466254] RDX: 0000000000000001 RSI: 0000000000000000 RDI:
0000000000000006
[ 9.467114] RBP: 0000000000000001 R08: 0000000000000000 R09:
0000000000000000
[ 9.467962] R10: 0000000000000001 R11: 0000000000000246 R12:
0000000000000000
[ 9.468803] R13: 00007ffc5330a798 R14: 0000000000000001 R15:
0000560577589630
[ 9.469672] </TASK>
[ 9.469950]
[ 9.470168] Allocated by task 441:
[ 9.470577] kasan_save_stack+0x33/0x60
[ 9.471033] kasan_save_track+0x14/0x30
[ 9.471554] __kasan_slab_alloc+0x6e/0x70
[ 9.472036] kmem_cache_alloc_noprof+0xe9/0x300
[ 9.472599] mempool_alloc_noprof+0x11a/0x2e0
[ 9.473161] bio_alloc_bioset+0x1ab/0x780
[ 9.473634] blkdev_direct_IO+0x456/0x2130
[ 9.474130] blkdev_write_iter+0x54f/0xb90
[ 9.474647] io_write+0x3b3/0xfe0
[ 9.475053] io_issue_sqe+0x131/0x13e0
[ 9.475516] io_submit_sqes+0x6f6/0x21e0
[ 9.475995] __do_sys_io_uring_enter+0xa1e/0x12e0
[ 9.476602] do_syscall_64+0xa6/0x1a0
[ 9.477043] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 9.477659]
[ 9.477848] Freed by task 441:
[ 9.478261] kasan_save_stack+0x33/0x60
[ 9.478715] kasan_save_track+0x14/0x30
[ 9.479197] kasan_save_free_info+0x3b/0x60
[ 9.479692] __kasan_slab_free+0x37/0x50
[ 9.480191] slab_free_after_rcu_debug+0xb1/0x280
[ 9.480755] rcu_core+0x610/0x1a80
[ 9.481215] handle_softirqs+0x1b5/0x5c0
[ 9.481696] irq_exit_rcu+0xaf/0xe0
[ 9.482119] sysvec_apic_timer_interrupt+0x6c/0x80
[ 9.482729] asm_sysvec_apic_timer_interrupt+0x1a/0x20
[ 9.483389]
[ 9.483581] Last potentially related work creation:
[ 9.484174] kasan_save_stack+0x33/0x60
[ 9.484661] __kasan_record_aux_stack+0x8e/0xa0
[ 9.485228] kmem_cache_free+0x21c/0x370
[ 9.485713] blk_update_request+0x22c/0x1070
[ 9.486280] scsi_end_request+0x6b/0x5d0
[ 9.486762] scsi_io_completion+0xa4/0xda0
[ 9.487285] sdebug_blk_mq_poll_iter+0x189/0x2c0
[ 9.487851] bt_tags_iter+0x15f/0x290
[ 9.488310] __blk_mq_all_tag_iter+0x31d/0x960
[ 9.488869] blk_mq_tagset_busy_iter+0xeb/0x140
[ 9.489448] sdebug_blk_mq_poll+0x92/0x100
[ 9.489949] blk_hctx_poll+0x160/0x330
[ 9.490446] bio_poll+0x182/0x420
[ 9.490853] iocb_bio_iopoll+0x47/0x60
[ 9.491343] io_do_iopoll+0x450/0x10a0
[ 9.491798] __do_sys_io_uring_enter+0x7b7/0x12e0
[ 9.492398] do_syscall_64+0xa6/0x1a0
[ 9.492852] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 9.493484]
[ 9.493676] The buggy address belongs to the object at ff1100014c9b4640
[ 9.493676] which belongs to the cache bio-248 of size 248
[ 9.495118] The buggy address is located 116 bytes inside of
[ 9.495118] freed 248-byte region [ff1100014c9b4640, ff1100014c9b4738)
[ 9.496597]
[ 9.496784] The buggy address belongs to the physical page:
[ 9.497465] page: refcount:1 mapcount:0 mapping:0000000000000000
index:0x0 pfn:0x14c9b4
[ 9.498464] head: order:2 mapcount:0 entire_mapcount:0
nr_pages_mapped:0 pincount:0
[ 9.499421] flags: 0x200000000000040(head|node=0|zone=2)
[ 9.500053] page_type: f5(slab)
[ 9.500451] raw: 0200000000000040 ff110001052f8dc0 dead000000000122
0000000000000000
[ 9.501386] raw: 0000000000000000 0000000080330033 00000001f5000000
0000000000000000
[ 9.502333] head: 0200000000000040 ff110001052f8dc0 dead000000000122
0000000000000000
[ 9.503261] head: 0000000000000000 0000000080330033 00000001f5000000
0000000000000000
[ 9.504213] head: 0200000000000002 ffd4000005326d01 ffffffffffffffff
0000000000000000
[ 9.505142] head: 0000000000000004 0000000000000000 00000000ffffffff
0000000000000000
[ 9.506082] page dumped because: kasan: bad access detected
[ 9.506752]
[ 9.506939] Memory state around the buggy address:
[ 9.507560] ff1100014c9b4580: 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 fc
[ 9.508454] ff1100014c9b4600: fc fc fc fc fc fc fc fc fa fb fb fb fb
fb fb fb
[ 9.509365] >ff1100014c9b4680: fb fb fb fb fb fb fb fb fb fb fb fb fb
fb fb fb
[ 9.510260] ^
[ 9.510842] ff1100014c9b4700: fb fb fb fb fb fb fb fc fc fc fc fc fc
fc fc fc
[ 9.511755] ff1100014c9b4780: 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00
[ 9.512654]
==================================================================
[ 9.513616] Disabling lock debugging due to kernel taint
QEMU: Terminated
Now, scsi_debug does something pretty unorthodox in its mq_poll callback, in
that it calls blk_mq_tagset_busy_iter() ... -> scsi_done().
However, I get a similar use-after-free for QEMU with NVMe:
fio-3.34
Starting 10 processes
[ 30.887296]
==================================================================
[ 30.907820] BUG: KASAN: slab-use-after-free in bio_poll+0x26b/0x420
[ 30.924793] Read of size 4 at addr ff1100015f775ab4 by task fio/458
[ 30.949904]
[ 30.952784] CPU: 11 UID: 0 PID: 458 Comm: fio Not tainted
6.13.0-rc4-00053-gc9c268957b58 #3391
[ 31.036344] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 31.090860] Call Trace:
[ 31.153928] <TASK>
[ 31.180060] dump_stack_lvl+0x53/0x70
[ 31.209414] print_report+0xce/0x660
[ 31.220341] ? bio_poll+0x26b/0x420
[ 31.236876] kasan_report+0xc6/0x100
[ 31.253395] ? bio_poll+0x26b/0x420
[ 31.283105] bio_poll+0x26b/0x420
[ 31.304388] iocb_bio_iopoll+0x47/0x60
[ 31.327575] io_do_iopoll+0x450/0x10a0
[ 31.357706] ? __pfx_io_do_iopoll+0x10/0x10
[ 31.381389] ? io_submit_sqes+0x6f6/0x21e0
[ 31.397833] ? mutex_lock+0x8c/0xe0
[ 31.436789] ? __pfx_mutex_lock+0x10/0x10
[ 31.469967] __do_sys_io_uring_enter+0x7b7/0x12e0
[ 31.506017] ? __pfx___do_sys_io_uring_enter+0x10/0x10
[ 31.556819] ? __pfx___rseq_handle_notify_resume+0x10/0x10
[ 31.599749] ? handle_mm_fault+0x16f/0x400
[ 31.637617] ? up_read+0x1a/0xb0
[ 31.658649] do_syscall_64+0xa6/0x1a0
[ 31.715961] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 31.738610] RIP: 0033:0x558b29f538f8
[ 31.758298] Code: 1c 01 00 00 48 8b 04 24 83 78 38 00 0f 85 0e 01 00
00 41 8b 3f 41 ba 01 00 00 00 45 31 c0 45 31 c9 b8 aa 01 00 00 89 ea 0f
05 <89> c6 85 c0 0f 89 ec 00 00 00 89 44 24 0c e8 55 87 fa ff 8b 74 24
[ 31.868980] RSP: 002b:00007ffd37d51490 EFLAGS: 00000246 ORIG_RAX:
00000000000001aa
[ 31.946356] RAX: ffffffffffffffda RBX: 00007f120ebfeb40 RCX:
0000558b29f538f8
[ 32.044833] RDX: 0000000000000001 RSI: 0000000000000000 RDI:
0000000000000006
[ 32.086849] RBP: 0000000000000001 R08: 0000000000000000 R09:
0000000000000000
[ 32.117522] R10: 0000000000000001 R11: 0000000000000246 R12:
0000000000000000
[ 32.155554] R13: 00007ffd37d51628 R14: 0000000000000001 R15:
0000558b3c3216b0
[ 32.174488] </TASK>
[ 32.183180]
[ 32.193202] Allocated by task 458:
[ 32.205642] kasan_save_stack+0x33/0x60
[ 32.215908] kasan_save_track+0x14/0x30
[ 32.231828] __kasan_slab_alloc+0x6e/0x70
[ 32.244998] kmem_cache_alloc_noprof+0xe9/0x300
[ 32.263654] mempool_alloc_noprof+0x11a/0x2e0
[ 32.274050] bio_alloc_bioset+0x1ab/0x780
[ 32.286829] blkdev_direct_IO+0x456/0x2130
[ 32.293655] blkdev_write_iter+0x54f/0xb90
[ 32.299844] io_write+0x3b3/0xfe0
[ 32.309428] io_issue_sqe+0x131/0x13e0
[ 32.315319] io_submit_sqes+0x6f6/0x21e0
[ 32.320913] __do_sys_io_uring_enter+0xa1e/0x12e0
[ 32.328091] do_syscall_64+0xa6/0x1a0
[ 32.336915] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 32.350460]
[ 32.355097] Freed by task 455:
[ 32.360331] kasan_save_stack+0x33/0x60
[ 32.369595] kasan_save_track+0x14/0x30
[ 32.377397] kasan_save_free_info+0x3b/0x60
[ 32.386598] __kasan_slab_free+0x37/0x50
[ 32.398562] slab_free_after_rcu_debug+0xb1/0x280
[ 32.417108] rcu_core+0x610/0x1a80
[ 32.424947] handle_softirqs+0x1b5/0x5c0
[ 32.434754] irq_exit_rcu+0xaf/0xe0
[ 32.438144] sysvec_apic_timer_interrupt+0x6c/0x80
[ 32.443842] asm_sysvec_apic_timer_interrupt+0x1a/0x20
[ 32.448109]
[ 32.449772] Last potentially related work creation:
[ 32.454800] kasan_save_stack+0x33/0x60
[ 32.458743] __kasan_record_aux_stack+0x8e/0xa0
[ 32.463802] kmem_cache_free+0x21c/0x370
[ 32.468130] blk_mq_end_request_batch+0x26b/0x13f0
[ 32.473935] io_do_iopoll+0xa78/0x10a0
[ 32.477800] __do_sys_io_uring_enter+0x7b7/0x12e0
[ 32.482678] do_syscall_64+0xa6/0x1a0
[ 32.487671] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 32.492551]
[ 32.494058] The buggy address belongs to the object at ff1100015f775a40
[ 32.494058] which belongs to the cache bio-248 of size 248
[ 32.504485] The buggy address is located 116 bytes inside of
[ 32.504485] freed 248-byte region [ff1100015f775a40, ff1100015f775b38)
[ 32.518309]
[ 32.520370] The buggy address belongs to the physical page:
[ 32.526444] page: refcount:1 mapcount:0 mapping:0000000000000000
index:0x0 pfn:0x15f774
[ 32.535554] head: order:2 mapcount:0 entire_mapcount:0
nr_pages_mapped:0 pincount:0
[ 32.542517] flags: 0x200000000000040(head|node=0|zone=2)
[ 32.547971] page_type: f5(slab)
[ 32.551287] raw: 0200000000000040 ff1100010376af80 dead000000000122
0000000000000000
[ 32.559290] raw: 0000000000000000 0000000000330033 00000001f5000000
0000000000000000
[ 32.566773] head: 0200000000000040 ff1100010376af80 dead000000000122
0000000000000000
[ 32.574046] head: 0000000000000000 0000000000330033 00000001f5000000
0000000000000000
[ 32.581715] head: 0200000000000002 ffd40000057ddd01 ffffffffffffffff
0000000000000000
[ 32.589588] head: 0000000000000004 0000000000000000 00000000ffffffff
0000000000000000
[ 32.596963] page dumped because: kasan: bad access detected
[ 32.603473]
[ 32.604871] Memory state around the buggy address:
[ 32.609617] ff1100015f775980: 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 fc
[ 32.617652] ff1100015f775a00: fc fc fc fc fc fc fc fc fa fb fb fb fb
fb fb fb
[ 32.625385] >ff1100015f775a80: fb fb fb fb fb fb fb fb fb fb fb fb fb
fb fb fb
[ 32.634014] ^
[ 32.637444] ff1100015f775b00: fb fb fb fb fb fb fb fc fc fc fc fc fc
fc fc fc
[ 32.644158] ff1100015f775b80: 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00
[ 32.651115]
==================================================================
[ 32.659002] Disabling lock debugging due to kernel taint
QEMU: Terminated [W(10)][0.1%][w=150MiB/s][w=38.4k IOPS][eta 01h:24m:16s]
Here's my git bisect log:
git bisect start
# good: [1cbfb828e05171ca2dd77b5988d068e6872480fe] Merge tag
'for-6.14/block-20250118' of git://git.kernel.dk/linux
git bisect good 1cbfb828e05171ca2dd77b5988d068e6872480fe
# bad: [a312e1706ce6c124f04ec85ddece240f3bb2a696] Merge tag
'for-6.14/io_uring-20250119' of git://git.kernel.dk/linux
git bisect bad a312e1706ce6c124f04ec85ddece240f3bb2a696
# good: [3d8b5a22d40435b4a7e58f06ae2cd3506b222898] block: add support
to pass user meta buffer
git bisect good 3d8b5a22d40435b4a7e58f06ae2cd3506b222898
# good: [ce9464081d5168ee0f279d6932ba82260a5b97c4] io_uring/msg_ring:
Drop custom destructor
git bisect good ce9464081d5168ee0f279d6932ba82260a5b97c4
# bad: [d803d123948feffbd992213e144df224097f82b0] io_uring/rw: handle
-EAGAIN retry at IO completion time
git bisect bad d803d123948feffbd992213e144df224097f82b0
# good: [c5f71916146033f9aba108075ff7087022075fd6] io_uring/rw: always
clear ->bytes_done on io_async_rw setup
git bisect good c5f71916146033f9aba108075ff7087022075fd6
# good: [2a51c327d4a4a2eb62d67f4ea13a17efd0f25c5c] io_uring/rsrc:
simplify the bvec iter count calculation
git bisect good 2a51c327d4a4a2eb62d67f4ea13a17efd0f25c5c
# good: [9ac273ae3dc296905b4d61e4c8e7a25592f6d183] io_uring/rw: use
io_rw_recycle() from cleanup path
git bisect good 9ac273ae3dc296905b4d61e4c8e7a25592f6d183
# first bad commit: [d803d123948feffbd992213e144df224097f82b0]
io_uring/rw: handle -EAGAIN retry at IO completion time
Thanks,
John
> ---
> io_uring/io_uring.c | 15 +++++++--
> io_uring/rw.c | 80 ++++++++++++++-------------------------------
> 2 files changed, 38 insertions(+), 57 deletions(-)
>
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index db198bd435b5..92ba2fdcd087 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -115,7 +115,7 @@
> REQ_F_ASYNC_DATA)
>
> #define IO_REQ_CLEAN_SLOW_FLAGS (REQ_F_REFCOUNT | REQ_F_LINK | REQ_F_HARDLINK |\
> - IO_REQ_CLEAN_FLAGS)
> + REQ_F_REISSUE | IO_REQ_CLEAN_FLAGS)
>
> #define IO_TCTX_REFS_CACHE_NR (1U << 10)
>
> @@ -1403,6 +1403,12 @@ static void io_free_batch_list(struct io_ring_ctx *ctx,
> comp_list);
>
> if (unlikely(req->flags & IO_REQ_CLEAN_SLOW_FLAGS)) {
> + if (req->flags & REQ_F_REISSUE) {
> + node = req->comp_list.next;
> + req->flags &= ~REQ_F_REISSUE;
> + io_queue_iowq(req);
> + continue;
> + }
> if (req->flags & REQ_F_REFCOUNT) {
> node = req->comp_list.next;
> if (!req_ref_put_and_test(req))
> @@ -1442,7 +1448,12 @@ void __io_submit_flush_completions(struct io_ring_ctx *ctx)
> struct io_kiocb *req = container_of(node, struct io_kiocb,
> comp_list);
>
> - if (!(req->flags & REQ_F_CQE_SKIP) &&
> + /*
> + * Requests marked with REQUEUE should not post a CQE, they
> + * will go through the io-wq retry machinery and post one
> + * later.
> + */
> + if (!(req->flags & (REQ_F_CQE_SKIP | REQ_F_REISSUE)) &&
> unlikely(!io_fill_cqe_req(ctx, req))) {
> if (ctx->lockless_cq) {
> spin_lock(&ctx->completion_lock);
> diff --git a/io_uring/rw.c b/io_uring/rw.c
> index afc669048c5d..c52c0515f0a2 100644
> --- a/io_uring/rw.c
> +++ b/io_uring/rw.c
> @@ -202,7 +202,7 @@ static void io_req_rw_cleanup(struct io_kiocb *req, unsigned int issue_flags)
> * mean that the underlying data can be gone at any time. But that
> * should be fixed seperately, and then this check could be killed.
> */
> - if (!(req->flags & REQ_F_REFCOUNT)) {
> + if (!(req->flags & (REQ_F_REISSUE | REQ_F_REFCOUNT))) {
> req->flags &= ~REQ_F_NEED_CLEANUP;
> io_rw_recycle(req, issue_flags);
> }
> @@ -455,19 +455,12 @@ static inline loff_t *io_kiocb_update_pos(struct io_kiocb *req)
> return NULL;
> }
>
> -#ifdef CONFIG_BLOCK
> -static void io_resubmit_prep(struct io_kiocb *req)
> -{
> - struct io_async_rw *io = req->async_data;
> - struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
> -
> - io_meta_restore(io, &rw->kiocb);
> - iov_iter_restore(&io->iter, &io->iter_state);
> -}
> -
> static bool io_rw_should_reissue(struct io_kiocb *req)
> {
> +#ifdef CONFIG_BLOCK
> + struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
> umode_t mode = file_inode(req->file)->i_mode;
> + struct io_async_rw *io = req->async_data;
> struct io_ring_ctx *ctx = req->ctx;
>
> if (!S_ISBLK(mode) && !S_ISREG(mode))
> @@ -488,17 +481,14 @@ static bool io_rw_should_reissue(struct io_kiocb *req)
> */
> if (!same_thread_group(req->tctx->task, current) || !in_task())
> return false;
> +
> + io_meta_restore(io, &rw->kiocb);
> + iov_iter_restore(&io->iter, &io->iter_state);
> return true;
> -}
> #else
> -static void io_resubmit_prep(struct io_kiocb *req)
> -{
> -}
> -static bool io_rw_should_reissue(struct io_kiocb *req)
> -{
> return false;
> -}
> #endif
> +}
>
> static void io_req_end_write(struct io_kiocb *req)
> {
> @@ -525,22 +515,16 @@ static void io_req_io_end(struct io_kiocb *req)
> }
> }
>
> -static bool __io_complete_rw_common(struct io_kiocb *req, long res)
> +static void __io_complete_rw_common(struct io_kiocb *req, long res)
> {
> - if (unlikely(res != req->cqe.res)) {
> - if (res == -EAGAIN && io_rw_should_reissue(req)) {
> - /*
> - * Reissue will start accounting again, finish the
> - * current cycle.
> - */
> - io_req_io_end(req);
> - req->flags |= REQ_F_REISSUE | REQ_F_BL_NO_RECYCLE;
> - return true;
> - }
> + if (res == req->cqe.res)
> + return;
> + if (res == -EAGAIN && io_rw_should_reissue(req)) {
> + req->flags |= REQ_F_REISSUE | REQ_F_BL_NO_RECYCLE;
> + } else {
> req_set_fail(req);
> req->cqe.res = res;
> }
> - return false;
> }
>
> static inline int io_fixup_rw_res(struct io_kiocb *req, long res)
> @@ -583,8 +567,7 @@ static void io_complete_rw(struct kiocb *kiocb, long res)
> struct io_kiocb *req = cmd_to_io_kiocb(rw);
>
> if (!kiocb->dio_complete || !(kiocb->ki_flags & IOCB_DIO_CALLER_COMP)) {
> - if (__io_complete_rw_common(req, res))
> - return;
> + __io_complete_rw_common(req, res);
> io_req_set_res(req, io_fixup_rw_res(req, res), 0);
> }
> req->io_task_work.func = io_req_rw_complete;
> @@ -646,26 +629,19 @@ static int kiocb_done(struct io_kiocb *req, ssize_t ret,
> if (ret >= 0 && req->flags & REQ_F_CUR_POS)
> req->file->f_pos = rw->kiocb.ki_pos;
> if (ret >= 0 && (rw->kiocb.ki_complete == io_complete_rw)) {
> - if (!__io_complete_rw_common(req, ret)) {
> - /*
> - * Safe to call io_end from here as we're inline
> - * from the submission path.
> - */
> - io_req_io_end(req);
> - io_req_set_res(req, final_ret,
> - io_put_kbuf(req, ret, issue_flags));
> - io_req_rw_cleanup(req, issue_flags);
> - return IOU_OK;
> - }
> + __io_complete_rw_common(req, ret);
> + /*
> + * Safe to call io_end from here as we're inline
> + * from the submission path.
> + */
> + io_req_io_end(req);
> + io_req_set_res(req, final_ret, io_put_kbuf(req, ret, issue_flags));
> + io_req_rw_cleanup(req, issue_flags);
> + return IOU_OK;
> } else {
> io_rw_done(&rw->kiocb, ret);
> }
>
> - if (req->flags & REQ_F_REISSUE) {
> - req->flags &= ~REQ_F_REISSUE;
> - io_resubmit_prep(req);
> - return -EAGAIN;
> - }
> return IOU_ISSUE_SKIP_COMPLETE;
> }
>
> @@ -944,8 +920,7 @@ static int __io_read(struct io_kiocb *req, unsigned int issue_flags)
> if (ret == -EOPNOTSUPP && force_nonblock)
> ret = -EAGAIN;
>
> - if (ret == -EAGAIN || (req->flags & REQ_F_REISSUE)) {
> - req->flags &= ~REQ_F_REISSUE;
> + if (ret == -EAGAIN) {
> /* If we can poll, just do that. */
> if (io_file_can_poll(req))
> return -EAGAIN;
> @@ -1154,11 +1129,6 @@ int io_write(struct io_kiocb *req, unsigned int issue_flags)
> else
> ret2 = -EINVAL;
>
> - if (req->flags & REQ_F_REISSUE) {
> - req->flags &= ~REQ_F_REISSUE;
> - ret2 = -EAGAIN;
> - }
> -
> /*
> * Raw bdev writes will return -EOPNOTSUPP for IOCB_NOWAIT. Just
> * retry them without IOCB_NOWAIT.