From: Jens Axboe <[email protected]>
To: Linus Torvalds <[email protected]>
Cc: [email protected]
Subject: [GIT PULL] io_uring changes for 5.5-rc
Date: Thu, 21 Nov 2019 10:10:46 -0700 [thread overview]
Message-ID: <[email protected]> (raw)
Hi Linus,
Here are the changes for io_uring for 5.5. A lot of stuff has been going
on this cycle, with improving the support for networked IO (and hence
unbounded request completion times) being one of the major themes.
There's been a set of fixes done this week, I'll send those out as well
once we're certain we're fully happy with them. This pull request
contains:
- Unification of the "normal" submit path and the SQPOLL path (Pavel)
- Support for sparse (and bigger) file sets, and updating of those file
sets without needing to unregister/register again.
- Independently sized CQ ring, instead of just making it always 2x the
SQ ring size. This makes it more flexible for networked applications.
- Support for overflowed CQ ring, never dropping events but providing
backpressure on submits.
- Add support for absolute timeouts, not just relative ones.
- Support for generic cancellations. This divorces io_uring from
workqueues as well, which additionally gets us one step closer to
generic async system call support.
- With cancellations, we can support grabbing the process file table as
well, just like we do mm context. This allows support for system calls
that create file descriptors, like accept4() support that's built on
top of that.
- Support for io_uring tracing (Dmitrii)
- Support for linked timeouts. These abort an operation if it isn't
completed by the time noted in the linke timeout.
- Speedup tracking of poll requests
- Various cleanups making the coder easier to follow (Jackie, Pavel,
Bob, YueHaibing, me)
- Update MAINTAINERS with new io_uring list
This will cause a merge conflict with some of the late fixes that
went into mainline. I'm attaching my merge commit of pulling master
into for-next the other day to resolve it. There are three main
conflicts, two first are trivial, the last one not that bad. But figured
I'd attach it for reference.
Please pull!
git://git.kernel.dk/linux-block.git tags/for-5.5/io_uring-20191121
----------------------------------------------------------------
Bob Liu (2):
io_uring: clean up io_uring_cancel_files()
io_uring: introduce req_need_defer()
Dmitrii Dolgov (1):
io_uring: add set of tracing events
Jackie Liu (5):
io_uring: replace s->needs_lock with s->in_async
io_uring: set -EINTR directly when a signal wakes up in io_cqring_wait
io_uring: remove passed in 'ctx' function parameter ctx if possible
io_uring: keep io_put_req only responsible for release and put req
io_uring: separate the io_free_req and io_free_req_find_next interface
Jens Axboe (47):
io_uring: run dependent links inline if possible
io_uring: allow sparse fixed file sets
io_uring: add support for IORING_REGISTER_FILES_UPDATE
io_uring: allow application controlled CQ ring size
io_uring: add support for absolute timeouts
io_uring: add support for canceling timeout requests
io-wq: small threadpool implementation for io_uring
io_uring: replace workqueue usage with io-wq
io_uring: io_uring: add support for async work inheriting files
net: add __sys_accept4_file() helper
io_uring: add support for IORING_OP_ACCEPT
io_uring: protect fixed file indexing with array_index_nospec()
io_uring: support for larger fixed file sets
io_uring: fix race with canceling timeouts
io_uring: io_wq_create() returns an error pointer, not NULL
io_uring: support for generic async request cancel
io_uring: remove io_uring_add_to_prev() trace event
io_uring: add completion trace event
MAINTAINERS: update io_uring entry
io-wq: use proper nesting IRQ disabling spinlocks for cancel
io_uring: enable optimized link handling for IORING_OP_POLL_ADD
io_uring: fixup a few spots where link failure isn't flagged
io_uring: kill dead REQ_F_LINK_DONE flag
io_uring: abstract out io_async_cancel_one() helper
io_uring: add support for linked SQE timeouts
io_uring: make io_cqring_events() take 'ctx' as argument
io_uring: pass in io_kiocb to fill/add CQ handlers
io_uring: add support for backlogged CQ ring
io-wq: io_wqe_run_queue() doesn't need to use list_empty_careful()
io-wq: add support for bounded vs unbunded work
io_uring: properly mark async work as bounded vs unbounded
io_uring: reduce/pack size of io_ring_ctx
io_uring: fix error clear of ->file_table in io_sqe_files_register()
io_uring: convert accept4() -ERESTARTSYS into -EINTR
io_uring: provide fallback request for OOM situations
io_uring: make ASYNC_CANCEL work with poll and timeout
io_uring: flag SQPOLL busy condition to userspace
io_uring: don't do flush cancel under inflight_lock
io_uring: fix -ENOENT issue with linked timer with short timeout
io_uring: use correct "is IO worker" helper
io_uring: fix potential deadlock in io_poll_wake()
io_uring: check for validity of ->rings in teardown
io_wq: add get/put_work handlers to io_wq_create()
io-wq: ensure we have a stable view of ->cur_work for cancellations
io-wq: ensure free/busy list browsing see all items
io-wq: remove now redundant struct io_wq_nulls_list
io_uring: make POLL_ADD/POLL_REMOVE scale better
Pavel Begunkov (8):
io_uring: remove index from sqe_submit
io_uring: Fix mm_fault with READ/WRITE_FIXED
io_uring: Merge io_submit_sqes and io_ring_submit
io_uring: io_queue_link*() right after submit
io_uring: allocate io_kiocb upfront
io_uring: Use submit info inlined into req
io_uring: use inlined struct sqe_submit
io_uring: Fix getting file for non-fd opcodes
YueHaibing (1):
io-wq: use kfree_rcu() to simplify the code
MAINTAINERS | 5 +-
fs/Kconfig | 3 +
fs/Makefile | 1 +
fs/io-wq.c | 1065 +++++++++++++++++++
fs/io-wq.h | 74 ++
fs/io_uring.c | 2213 ++++++++++++++++++++++++++-------------
include/Kbuild | 1 +
include/linux/sched.h | 1 +
include/linux/socket.h | 3 +
include/trace/events/io_uring.h | 358 +++++++
include/uapi/linux/io_uring.h | 24 +-
init/Kconfig | 1 +
kernel/sched/core.c | 16 +-
net/socket.c | 65 +-
14 files changed, 3098 insertions(+), 732 deletions(-)
create mode 100644 fs/io-wq.c
create mode 100644 fs/io-wq.h
create mode 100644 include/trace/events/io_uring.h
commit 22ffc78881bce32ae83dbd315fb15cd7ef8a6e4a
Merge: a3085d8079be b226c9e1f4cb
Author: Jens Axboe <[email protected]>
Date: Fri Nov 15 18:03:21 2019 -0700
Merge branch 'master' into for-next
* master: (98 commits)
afs: Fix race in commit bulk status fetch
KVM: Add a comment describing the /dev/kvm no_compat handling
drm/amdgpu: fix null pointer deref in firmware header printing
rsxx: add missed destroy_workqueue calls in remove
iocost: check active_list of all the ancestors in iocg_activate()
rbd: silence bogus uninitialized warning in rbd_object_map_update_finish()
ceph: increment/decrement dio counter on async requests
ceph: take the inode lock before acquiring cap refs
ALSA: usb-audio: Fix incorrect size check for processing/extension units
KVM: x86/mmu: Take slots_lock when using kvm_mmu_zap_all_fast()
kbuild: tell sparse about the $ARCH
sparc: vdso: fix build error of vdso32
block, bfq: deschedule empty bfq_queues not referred by any process
mmc: sdhci-of-at91: fix quirk2 overwrite
ALSA: usb-audio: Fix incorrect NULL check in create_yamaha_midi_quirk()
io_uring: ensure registered buffer import returns the IO length
io_uring: Fix getting file for timeout
drm/i915/tgl: MOCS table update
Revert "drm/i915/ehl: Update MOCS table for EHL"
KVM: Forbid /dev/kvm being opened by a compat task when CONFIG_KVM_COMPAT=n
...
Signed-off-by: Jens Axboe <[email protected]>
diff --cc fs/io_uring.c
index 25f0e8fd935b,2c819c3c855d..187e1c5021ac
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@@ -340,8 -326,7 +340,9 @@@ struct io_kiocb
#define REQ_F_TIMEOUT 1024 /* timeout request */
#define REQ_F_ISREG 2048 /* regular file */
#define REQ_F_MUST_PUNT 4096 /* must be punted even for NONBLOCK */
- #define REQ_F_INFLIGHT 8192 /* on inflight list */
- #define REQ_F_COMP_LOCKED 16384 /* completion under lock */
+ #define REQ_F_TIMEOUT_NOSEQ 8192 /* no timeout sequence */
++#define REQ_F_INFLIGHT 16384 /* on inflight list */
++#define REQ_F_COMP_LOCKED 32768 /* completion under lock */
u64 user_data;
u32 result;
u32 sequence;
@@@ -482,9 -454,13 +483,13 @@@ static struct io_kiocb *io_get_timeout_
struct io_kiocb *req;
req = list_first_entry_or_null(&ctx->timeout_list, struct io_kiocb, list);
- if (req && !__req_need_defer(req)) {
- list_del_init(&req->list);
- return req;
+ if (req) {
+ if (req->flags & REQ_F_TIMEOUT_NOSEQ)
+ return NULL;
- if (!__io_sequence_defer(ctx, req)) {
++ if (!__req_need_defer(req)) {
+ list_del_init(&req->list);
+ return req;
+ }
}
return NULL;
@@@ -2296,12 -1946,7 +2301,13 @@@ static int io_timeout(struct io_kiocb *
if (get_timespec64(&ts, u64_to_user_ptr(sqe->addr)))
return -EFAULT;
+ if (flags & IORING_TIMEOUT_ABS)
+ mode = HRTIMER_MODE_ABS;
+ else
+ mode = HRTIMER_MODE_REL;
+
+ hrtimer_init(&req->timeout.timer, CLOCK_MONOTONIC, mode);
+ req->flags |= REQ_F_TIMEOUT;
/*
* sqe->off holds how many events that need to occur for this
@@@ -2352,82 -2004,14 +2365,83 @@@
nxt->sequence++;
}
req->sequence -= span;
+ add:
list_add(&req->list, entry);
+ req->timeout.timer.function = io_timeout_fn;
+ hrtimer_start(&req->timeout.timer, timespec64_to_ktime(ts), mode);
spin_unlock_irq(&ctx->completion_lock);
+ return 0;
+}
- hrtimer_init(&req->timeout.timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
- req->timeout.timer.function = io_timeout_fn;
- hrtimer_start(&req->timeout.timer, timespec64_to_ktime(ts),
- HRTIMER_MODE_REL);
+static bool io_cancel_cb(struct io_wq_work *work, void *data)
+{
+ struct io_kiocb *req = container_of(work, struct io_kiocb, work);
+
+ return req->user_data == (unsigned long) data;
+}
+
+static int io_async_cancel_one(struct io_ring_ctx *ctx, void *sqe_addr)
+{
+ enum io_wq_cancel cancel_ret;
+ int ret = 0;
+
+ cancel_ret = io_wq_cancel_cb(ctx->io_wq, io_cancel_cb, sqe_addr);
+ switch (cancel_ret) {
+ case IO_WQ_CANCEL_OK:
+ ret = 0;
+ break;
+ case IO_WQ_CANCEL_RUNNING:
+ ret = -EALREADY;
+ break;
+ case IO_WQ_CANCEL_NOTFOUND:
+ ret = -ENOENT;
+ break;
+ }
+
+ return ret;
+}
+
+static void io_async_find_and_cancel(struct io_ring_ctx *ctx,
+ struct io_kiocb *req, __u64 sqe_addr,
+ struct io_kiocb **nxt)
+{
+ unsigned long flags;
+ int ret;
+
+ ret = io_async_cancel_one(ctx, (void *) (unsigned long) sqe_addr);
+ if (ret != -ENOENT) {
+ spin_lock_irqsave(&ctx->completion_lock, flags);
+ goto done;
+ }
+
+ spin_lock_irqsave(&ctx->completion_lock, flags);
+ ret = io_timeout_cancel(ctx, sqe_addr);
+ if (ret != -ENOENT)
+ goto done;
+ ret = io_poll_cancel(ctx, sqe_addr);
+done:
+ io_cqring_fill_event(req, ret);
+ io_commit_cqring(ctx);
+ spin_unlock_irqrestore(&ctx->completion_lock, flags);
+ io_cqring_ev_posted(ctx);
+
+ if (ret < 0 && (req->flags & REQ_F_LINK))
+ req->flags |= REQ_F_FAIL_LINK;
+ io_put_req_find_next(req, nxt);
+}
+
+static int io_async_cancel(struct io_kiocb *req, const struct io_uring_sqe *sqe,
+ struct io_kiocb **nxt)
+{
+ struct io_ring_ctx *ctx = req->ctx;
+
+ if (unlikely(ctx->flags & IORING_SETUP_IOPOLL))
+ return -EINVAL;
+ if (sqe->flags || sqe->ioprio || sqe->off || sqe->len ||
+ sqe->cancel_flags)
+ return -EINVAL;
+
+ io_async_find_and_cancel(ctx, req, READ_ONCE(sqe->addr), NULL);
return 0;
}
--
Jens Axboe
reply other threads:[~2019-11-21 17:10 UTC|newest]
Thread overview: [no followups] expand[flat|nested] mbox.gz Atom feed
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox