From: Jens Axboe <[email protected]>
To: [email protected]
Subject: [PATCHSET RFC 0/5] Wait on cancelations at release time
Date: Tue, 4 Jun 2024 13:01:27 -0600
Message-ID: <[email protected]>
Hi,

I've posted this before, but did a bit more work on it and am sending it
out again. The idea is to ensure that we've done any fputs that we need
to when a task using a ring exits, so that we don't leave references that
will get put "shortly afterwards". Currently cancelations are done by
ring exit work, which is punted to a kworker. This means that after the
final ->release() on the io_uring fd has completed, there can still be
pending fputs. This can be observed by running the following script:

#!/bin/bash

DEV=/dev/nvme0n1
MNT=/data
ITER=0

while true; do
	echo loop $ITER
	sudo mount $DEV $MNT
	fio --name=test --ioengine=io_uring --iodepth=2 --filename=$MNT/foo --size=1g --buffered=1 --overwrite=0 --numjobs=12 --minimal --rw=randread --thread=1 --output=/dev/null --eta=never &
	# Let fio run for a random 0.0 to 2.9 seconds
	Y=$(($RANDOM % 3))
	X=$(($RANDOM % 10))
	VAL="$Y.$X"
	sleep $VAL
	FIO_PID=$(pidof fio)
	if [ -z "$FIO_PID" ]; then
		((ITER++))
		continue
	fi
	ps -e | grep fio > /dev/null 2>&1
	while [ $? -eq 0 ]; do
		# SIGKILL fio by PID, then reap the background job
		kill -KILL $FIO_PID > /dev/null 2>&1
		echo will wait
		wait > /dev/null 2>&1
		echo done waiting
		ps -e | grep "fio " > /dev/null 2>&1
	done
	# The umount right after the task exit is what exposes the race
	sudo umount $MNT
	if [ $? -ne 0 ]; then
		break
	fi
	((ITER++))
done

which just starts a fio job doing buffered IO, kills it, waits on the task
to exit, and then immediately tries to umount the filesystem. Currently
that will at some point trigger:
[...]
loop 9
will wait(f=12)
done waiting
umount: /data: target is busy.
as the umount raced with the final fputs on the files being accessed
on the mount point.
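
One way to check that the busy state really is transient is to retry the
umount a bounded number of times and log how many attempts it took. The
helper below is a minimal sketch for a test harness only; retry_cmd and its
arguments are hypothetical names, not part of the script above, and it
assumes the failure clears once the pending fputs have run.

```shell
#!/bin/bash
# Hypothetical helper (not from the original script): run a command up to
# $1 times, sleeping $2 seconds between failed attempts, and report how
# many attempts were needed.
retry_cmd() {
	max=$1
	delay=$2
	shift 2
	attempt=1
	while [ "$attempt" -le "$max" ]; do
		if "$@"; then
			echo "succeeded after $attempt attempt(s)"
			return 0
		fi
		sleep "$delay"
		attempt=$((attempt + 1))
	done
	echo "still failing after $max attempts"
	return 1
}

# Possible use in place of the bare umount in the loop (assumed values):
#   retry_cmd 50 0.1 sudo umount "$MNT"
```

If the umount succeeds on a later attempt, that is consistent with the
references being put shortly after the task exit. A retry like this only
papers over the race in a test setup; the series itself aims to close the
window.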

There are a few parts to this:

1) Final fput is done via task_work, but for kernel threads, it's done
   via a delayed work queue. Patches 1+2 allow kernel threads to use
   task_work like other threads, as we can then quiesce the fputs for
   the task rather than needing to flush a system-wide global pending
   list that can have pending final releases for any task or file.

2) Patch 3 moves away from percpu reference counts, as those require
   an RCU sync on freeing. As the goal is to move to sync cancelations
   on exit, this can add considerable latency. Outside of that, percpu
   ref counts provide a lot of guarantees and features that io_uring
   doesn't need, and the new approach is faster.

3) Finally, make the cancelations sync. They are still offloaded to
   a kworker, but the task doing ->release() waits for them to finish.

With this, the above test case works fine, as expected.

I'll send patches 1+2 separately, but wanted to get this out for review
and discussion first.
Patches are against current -git, with io_uring 6.10 and 6.11 pending
changes pulled in. You can also find the patches here:
https://git.kernel.dk/cgit/linux/log/?h=io_uring-exit-cancel

 fs/file_table.c                |  2 +-
 include/linux/io_uring_types.h |  4 +-
 include/linux/sched.h          |  2 +-
 io_uring/Makefile              |  2 +-
 io_uring/io_uring.c            | 77 ++++++++++++++++++++++++----------
 io_uring/io_uring.h            |  3 +-
 io_uring/refs.c                | 58 +++++++++++++++++++++++++
 io_uring/refs.h                | 53 +++++++++++++++++++++++
 io_uring/register.c            |  3 +-
 io_uring/rw.c                  |  3 +-
 io_uring/sqpoll.c              |  3 +-
 kernel/fork.c                  |  2 +-
 12 files changed, 182 insertions(+), 30 deletions(-)

--
Jens Axboe

Thread overview: 13+ messages
2024-06-04 19:01 Jens Axboe [this message]
2024-06-04 19:01 ` [PATCH 1/5] fs: gate final fput task_work on PF_NO_TASKWORK Jens Axboe
2024-06-04 19:01 ` [PATCH 2/5] io_uring: mark exit side kworkers as task_work capable Jens Axboe
2024-06-05 15:01 ` Pavel Begunkov
2024-06-05 18:08 ` Jens Axboe
2024-06-04 19:01 ` [PATCH 3/5] io_uring: move to using private ring references Jens Axboe
2024-06-05 15:11 ` Pavel Begunkov
2024-06-05 16:31 ` Pavel Begunkov
2024-06-05 19:13 ` Pavel Begunkov
2024-06-05 19:29 ` Jens Axboe
2024-06-05 19:39 ` Jens Axboe
2024-06-04 19:01 ` [PATCH 4/5] io_uring: consider ring dead once the ref is marked dying Jens Axboe
2024-06-04 19:01 ` [PATCH 5/5] io_uring: wait for cancelations on final ring put Jens Axboe