* io-uring: KASAN failure, presumably
@ 2021-09-12  2:34 Nadav Amit
From: Nadav Amit @ 2021-09-12  2:34 UTC
  To: Jens Axboe, Pavel Begunkov; +Cc: io-uring, Linux Kernel Mailing List

Hello Jens (& Pavel),

I hope you are having a nice weekend. I ran into a KASAN failure in io-uring
which I think is not "my fault".

The failure does not happen very frequently, so my analysis is based on
reading the code. If I understand the failure correctly, then I do not
understand the code well enough; that is to say, I do not see how it was
supposed to work. I would appreciate your feedback.

The failure happens on my own custom kernel (do not try to correlate the line
numbers). The gist of the splat is:

[84142.034456] ==================================================================
[84142.035552] BUG: KASAN: use-after-free in io_req_complete_post (fs/io_uring.c:1629)
[84142.036473] Read of size 4 at addr ffff8881a1577e60 by task memcached/246246
[84142.037415]
[84142.037621] CPU: 0 PID: 246246 Comm: memcached Not tainted 5.13.1+ #236
[84142.038509] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/22/2020
[84142.040151] Call Trace:      
[84142.040495] dump_stack (lib/dump_stack.c:122)
[84142.040962] print_address_description.constprop.0 (mm/kasan/report.c:234)
[84142.041751] ? io_req_complete_post (fs/io_uring.c:1629)
[84142.042365] kasan_report.cold (mm/kasan/report.c:420 mm/kasan/report.c:436)
[84142.042921] ? io_req_complete_post (fs/io_uring.c:1629)
[84142.043534] __asan_load4 (mm/kasan/generic.c:252) 
[84142.044008] io_req_complete_post (fs/io_uring.c:1629) 
[84142.044609] __io_complete_rw.isra.0 (fs/io_uring.c:2525) 
[84142.045264] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4123) 
[84142.045949] io_complete_rw (fs/io_uring.c:2532) 
[84142.046447] handle_userfault (fs/userfaultfd.c:778) 

[snip]

[84142.072667] Freed by task 246231:
[84142.073197] kasan_save_stack (mm/kasan/common.c:39)
[84142.073896] kasan_set_track (mm/kasan/common.c:46)
[84142.074421] kasan_set_free_info (mm/kasan/generic.c:359)
[84142.075015] __kasan_slab_free (mm/kasan/common.c:362 mm/kasan/common.c:325 mm/kasan/common.c:368)
[84142.075578] kmem_cache_free (mm/slub.c:1608 mm/slub.c:3168 mm/slub.c:3184)
[84142.076116] __io_free_req (./arch/x86/include/asm/preempt.h:80 ./include/linux/rcupdate.h:68 ./include/linux/rcupdate.h:655 ./include/linux/percpu-refcount.h:317 ./include/linux/percpu-refcount.h:338 fs/io_uring.c:1802)
[84142.076641] io_free_req (fs/io_uring.c:2113)
[84142.077110] __io_queue_sqe (fs/io_uring.c:2208 fs/io_uring.c:6533)
[84142.077628] io_queue_sqe (fs/io_uring.c:6568)
[84142.078121] io_submit_sqes (fs/io_uring.c:6730 fs/io_uring.c:6838)
[84142.078665] __x64_sys_io_uring_enter (fs/io_uring.c:9428 fs/io_uring.c:9369 fs/io_uring.c:9369)
[84142.079463] do_syscall_64 (arch/x86/entry/common.c:47)
[84142.079967] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:112)


I believe the issue is related to the handling of REQ_F_REISSUE, and
specifically to commit 230d50d448acb ("io_uring: move reissue into regular IO
path"). There seems to be a race between io_write()/io_read() and
__io_complete_rw()/kiocb_done(). Note how, in the splat above, the request
is freed on the submission path (io_free_req() from __io_queue_sqe()),
while the use-after-free read happens on the completion path
(io_req_complete_post() from io_complete_rw()).

__io_complete_rw() sets REQ_F_REISSUE:

        if ((res == -EAGAIN || res == -EOPNOTSUPP) &&
            io_rw_should_reissue(req)) {
                req->flags |= REQ_F_REISSUE;
                return;
        }

kiocb_done() then checks REQ_F_REISSUE and clears it:

        if (check_reissue && req->flags & REQ_F_REISSUE) {
                req->flags &= ~REQ_F_REISSUE;
                ...


These two might race with io_write(), for instance, which issues the I/O
(__io_complete_rw() and kiocb_done() might run immediately after
call_write_iter() is called) and then checks and clears REQ_F_REISSUE
itself:

        if (req->file->f_op->write_iter)
                ret2 = call_write_iter(req->file, kiocb, iter);
        else if (req->file->f_op->write)
                ret2 = loop_rw_iter(WRITE, req, iter);
        else
                ret2 = -EINVAL;

        if (req->flags & REQ_F_REISSUE) {
                req->flags &= ~REQ_F_REISSUE;
                ret2 = -EAGAIN;
        }


So if call_write_iter() returns -EIOCBQUEUED, this return value can be
lost/ignored if kiocb_done() was called with a result of -EAGAIN. Presumably,
other bad things might happen due to the fact that both io_write() and
kiocb_done() see REQ_F_REISSUE set.
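
To make the window concrete, this is the interleaving I have in mind
(my own reconstruction from the code above, so take it with a grain of
salt):

        io_write()                          completion path
        ----------                          ---------------
        call_write_iter() queues the I/O
                                            io_complete_rw() runs with
                                            res == -EAGAIN;
                                            __io_complete_rw() sets
                                            REQ_F_REISSUE and returns
        call_write_iter() returns
        -EIOCBQUEUED
        io_write() sees REQ_F_REISSUE,
        clears it and sets ret2 = -EAGAIN

At this point -EIOCBQUEUED is gone: the submission path treats the
request as if it had failed synchronously with -EAGAIN, even though the
queued I/O will still complete (or already has).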

You might ask why, after enqueuing the IO for async execution, kiocb_done()
would be called with -EAGAIN as a result. Indeed, this might be specific
to my use-case, which is under development (userfaultfd might return
-EAGAIN if the mappings are undergoing changes; presumably -EBUSY or some
wait-queue would be better). Having said that, the current behavior still
seems valid.

So I do not understand the check for REQ_F_REISSUE in io_write()/io_read().
Shouldn't it just be removed? I do not suppose you want to use an atomic
bit-test-and-clear to avoid such a race.
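
To illustrate what I mean by bit-test-and-clear, here is a purely
hypothetical sketch (it assumes req->flags were widened to unsigned long,
since today it is a plain unsigned int while the atomic bitops take an
unsigned long *):

        /*
         * Hypothetical, atomic variant of the check in io_write():
         * REQ_F_REISSUE_BIT is the bit number behind REQ_F_REISSUE.
         * test_and_clear_bit() makes the read-and-clear a single
         * atomic operation, so only one of the racing paths can
         * consume the flag.
         */
        if (test_and_clear_bit(REQ_F_REISSUE_BIT, &req->flags))
                ret2 = -EAGAIN;

Even then, that only makes the flag consumption atomic; it would not by
itself stop the -EIOCBQUEUED result from being overwritten.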

Thanks,
Nadav


Thread overview: 6+ messages
2021-09-12  2:34 io-uring: KASAN failure, presumably Nadav Amit
2021-09-12  4:43 ` io-uring: KASAN failure, presumably REQ_F_REISSUE issue Nadav Amit
2021-09-12 13:00 ` io-uring: KASAN failure, presumably Jens Axboe
2021-09-12 18:15 ` Jens Axboe
2021-09-12 18:21   ` Nadav Amit
2021-09-12 18:24     ` Jens Axboe
