* [PATCH 01/12] io_uring: fix false WARN_ONCE
2021-06-17 17:13 [PATCH for-next 00/12] another round of 5.14 optimisations Pavel Begunkov
@ 2021-06-17 17:13 ` Pavel Begunkov
2021-06-17 17:14 ` [PATCH 02/12] io_uring: refactor io_submit_flush_completions() Pavel Begunkov
` (10 subsequent siblings)
11 siblings, 0 replies; 16+ messages in thread
From: Pavel Begunkov @ 2021-06-17 17:13 UTC (permalink / raw)
To: Jens Axboe, io-uring; +Cc: syzbot+ea2f1484cffe5109dc10
WARNING: CPU: 1 PID: 11749 at fs/io-wq.c:244 io_wqe_wake_worker fs/io-wq.c:244 [inline]
WARNING: CPU: 1 PID: 11749 at fs/io-wq.c:244 io_wqe_enqueue+0x7f6/0x910 fs/io-wq.c:751
A WARN_ON_ONCE() in io_wqe_wake_worker() can be triggered by a valid
userspace setup, so it's not an indicator of a kernel bug. Downgrade it to
pr_warn_once(), and instead reject io_wq creation up front if no bounded
workers were requested.
Reported-by: [email protected]
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io-wq.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/io-wq.c b/fs/io-wq.c
index 897b94530b57..d7acb3dce249 100644
--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -238,7 +238,8 @@ static void io_wqe_wake_worker(struct io_wqe *wqe, struct io_wqe_acct *acct)
* Most likely an attempt to queue unbounded work on an io_wq that
* wasn't setup with any unbounded workers.
*/
- WARN_ON_ONCE(!acct->max_workers);
+ if (unlikely(!acct->max_workers))
+ pr_warn_once("io-wq is not configured for unbound workers");
rcu_read_lock();
ret = io_wqe_activate_free_worker(wqe);
@@ -899,6 +900,8 @@ struct io_wq *io_wq_create(unsigned bounded, struct io_wq_data *data)
if (WARN_ON_ONCE(!data->free_work || !data->do_work))
return ERR_PTR(-EINVAL);
+ if (WARN_ON_ONCE(!bounded))
+ return ERR_PTR(-EINVAL);
wq = kzalloc(struct_size(wq, wqes, nr_node_ids), GFP_KERNEL);
if (!wq)
--
2.31.1
* [PATCH 02/12] io_uring: refactor io_submit_flush_completions()
2021-06-17 17:13 [PATCH for-next 00/12] another round of 5.14 optimisations Pavel Begunkov
2021-06-17 17:13 ` [PATCH 01/12] io_uring: fix false WARN_ONCE Pavel Begunkov
@ 2021-06-17 17:14 ` Pavel Begunkov
2021-06-17 17:14 ` [PATCH 03/12] io_uring: move creds from io-wq work to io_kiocb Pavel Begunkov
` (9 subsequent siblings)
11 siblings, 0 replies; 16+ messages in thread
From: Pavel Begunkov @ 2021-06-17 17:14 UTC (permalink / raw)
To: Jens Axboe, io-uring
struct io_comp_state is always embedded in struct io_ring_ctx, so don't
pass it into io_submit_flush_completions() separately; deriving it from
the ctx makes the interface cleaner and simplifies things for the
compiler.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index d916eb2cef09..5935df64b153 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1059,8 +1059,7 @@ static void __io_queue_sqe(struct io_kiocb *req);
static void io_rsrc_put_work(struct work_struct *work);
static void io_req_task_queue(struct io_kiocb *req);
-static void io_submit_flush_completions(struct io_comp_state *cs,
- struct io_ring_ctx *ctx);
+static void io_submit_flush_completions(struct io_ring_ctx *ctx);
static bool io_poll_remove_waitqs(struct io_kiocb *req);
static int io_req_prep_async(struct io_kiocb *req);
@@ -1879,7 +1878,7 @@ static void ctx_flush_and_put(struct io_ring_ctx *ctx)
return;
if (ctx->submit_state.comp.nr) {
mutex_lock(&ctx->uring_lock);
- io_submit_flush_completions(&ctx->submit_state.comp, ctx);
+ io_submit_flush_completions(ctx);
mutex_unlock(&ctx->uring_lock);
}
percpu_ref_put(&ctx->refs);
@@ -2127,9 +2126,9 @@ static void io_req_free_batch(struct req_batch *rb, struct io_kiocb *req,
list_add(&req->compl.list, &state->comp.free_list);
}
-static void io_submit_flush_completions(struct io_comp_state *cs,
- struct io_ring_ctx *ctx)
+static void io_submit_flush_completions(struct io_ring_ctx *ctx)
{
+ struct io_comp_state *cs = &ctx->submit_state.comp;
int i, nr = cs->nr;
struct io_kiocb *req;
struct req_batch rb;
@@ -6451,7 +6450,7 @@ static void __io_queue_sqe(struct io_kiocb *req)
cs->reqs[cs->nr++] = req;
if (cs->nr == ARRAY_SIZE(cs->reqs))
- io_submit_flush_completions(cs, ctx);
+ io_submit_flush_completions(ctx);
} else {
io_put_req(req);
}
@@ -6651,7 +6650,7 @@ static void io_submit_state_end(struct io_submit_state *state,
if (state->link.head)
io_queue_sqe(state->link.head);
if (state->comp.nr)
- io_submit_flush_completions(&state->comp, ctx);
+ io_submit_flush_completions(ctx);
if (state->plug_started)
blk_finish_plug(&state->plug);
io_state_file_put(state);
--
2.31.1
* [PATCH 03/12] io_uring: move creds from io-wq work to io_kiocb
2021-06-17 17:13 [PATCH for-next 00/12] another round of 5.14 optimisations Pavel Begunkov
2021-06-17 17:13 ` [PATCH 01/12] io_uring: fix false WARN_ONCE Pavel Begunkov
2021-06-17 17:14 ` [PATCH 02/12] io_uring: refactor io_submit_flush_completions() Pavel Begunkov
@ 2021-06-17 17:14 ` Pavel Begunkov
2021-06-17 17:14 ` [PATCH 04/12] io_uring: track request creds with a flag Pavel Begunkov
` (8 subsequent siblings)
11 siblings, 0 replies; 16+ messages in thread
From: Pavel Begunkov @ 2021-06-17 17:14 UTC (permalink / raw)
To: Jens Axboe, io-uring
io-wq doesn't have anything to do with creds anymore, so move ->creds
from struct io_wq_work into the request (aka struct io_kiocb).
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io-wq.h | 1 -
fs/io_uring.c | 24 +++++++++++++-----------
2 files changed, 13 insertions(+), 12 deletions(-)
diff --git a/fs/io-wq.h b/fs/io-wq.h
index af2df0680ee2..32c7b4e82484 100644
--- a/fs/io-wq.h
+++ b/fs/io-wq.h
@@ -87,7 +87,6 @@ static inline void wq_list_del(struct io_wq_work_list *list,
struct io_wq_work {
struct io_wq_work_node list;
- const struct cred *creds;
unsigned flags;
};
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 5935df64b153..2bac5cd4dc91 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -851,6 +851,8 @@ struct io_kiocb {
struct hlist_node hash_node;
struct async_poll *apoll;
struct io_wq_work work;
+ const struct cred *creds;
+
/* store used ubuf, so we can prevent reloading */
struct io_mapped_ubuf *imu;
};
@@ -1234,8 +1236,8 @@ static void io_prep_async_work(struct io_kiocb *req)
const struct io_op_def *def = &io_op_defs[req->opcode];
struct io_ring_ctx *ctx = req->ctx;
- if (!req->work.creds)
- req->work.creds = get_current_cred();
+ if (!req->creds)
+ req->creds = get_current_cred();
req->work.list.next = NULL;
req->work.flags = 0;
@@ -1745,9 +1747,9 @@ static void io_dismantle_req(struct io_kiocb *req)
percpu_ref_put(req->fixed_rsrc_refs);
if (req->async_data)
kfree(req->async_data);
- if (req->work.creds) {
- put_cred(req->work.creds);
- req->work.creds = NULL;
+ if (req->creds) {
+ put_cred(req->creds);
+ req->creds = NULL;
}
}
@@ -6139,8 +6141,8 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
const struct cred *creds = NULL;
int ret;
- if (req->work.creds && req->work.creds != current_cred())
- creds = override_creds(req->work.creds);
+ if (req->creds && req->creds != current_cred())
+ creds = override_creds(req->creds);
switch (req->opcode) {
case IORING_OP_NOP:
@@ -6532,7 +6534,7 @@ static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
atomic_set(&req->refs, 2);
req->task = current;
req->result = 0;
- req->work.creds = NULL;
+ req->creds = NULL;
/* enforce forwards compatibility on users */
if (unlikely(sqe_flags & ~SQE_VALID_FLAGS))
@@ -6550,10 +6552,10 @@ static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
personality = READ_ONCE(sqe->personality);
if (personality) {
- req->work.creds = xa_load(&ctx->personalities, personality);
- if (!req->work.creds)
+ req->creds = xa_load(&ctx->personalities, personality);
+ if (!req->creds)
return -EINVAL;
- get_cred(req->work.creds);
+ get_cred(req->creds);
}
state = &ctx->submit_state;
--
2.31.1
* [PATCH 04/12] io_uring: track request creds with a flag
2021-06-17 17:13 [PATCH for-next 00/12] another round of 5.14 optimisations Pavel Begunkov
` (2 preceding siblings ...)
2021-06-17 17:14 ` [PATCH 03/12] io_uring: move creds from io-wq work to io_kiocb Pavel Begunkov
@ 2021-06-17 17:14 ` Pavel Begunkov
2021-06-17 17:14 ` [PATCH 05/12] io_uring: simplify iovec freeing in io_clean_op() Pavel Begunkov
` (7 subsequent siblings)
11 siblings, 0 replies; 16+ messages in thread
From: Pavel Begunkov @ 2021-06-17 17:14 UTC (permalink / raw)
To: Jens Axboe, io-uring
Currently, creds are considered assigned whenever req->creds is not NULL.
Track that invariant with a new flag in req->flags instead. That way the
field doesn't need to be cleared at init, and the cleanup can be moved
into io_clean_op(), where it is done more efficiently.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 21 +++++++++++++--------
1 file changed, 13 insertions(+), 8 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 2bac5cd4dc91..d0d56243c135 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -718,6 +718,7 @@ enum {
REQ_F_COMPLETE_INLINE_BIT,
REQ_F_REISSUE_BIT,
REQ_F_DONT_REISSUE_BIT,
+ REQ_F_CREDS_BIT,
/* keep async read/write and isreg together and in order */
REQ_F_ASYNC_READ_BIT,
REQ_F_ASYNC_WRITE_BIT,
@@ -771,6 +772,8 @@ enum {
REQ_F_ASYNC_WRITE = BIT(REQ_F_ASYNC_WRITE_BIT),
/* regular file */
REQ_F_ISREG = BIT(REQ_F_ISREG_BIT),
+ /* has creds assigned */
+ REQ_F_CREDS = BIT(REQ_F_CREDS_BIT),
};
struct async_poll {
@@ -1236,8 +1239,10 @@ static void io_prep_async_work(struct io_kiocb *req)
const struct io_op_def *def = &io_op_defs[req->opcode];
struct io_ring_ctx *ctx = req->ctx;
- if (!req->creds)
+ if (!(req->flags & REQ_F_CREDS)) {
+ req->flags |= REQ_F_CREDS;
req->creds = get_current_cred();
+ }
req->work.list.next = NULL;
req->work.flags = 0;
@@ -1623,7 +1628,7 @@ static void io_req_complete_post(struct io_kiocb *req, long res,
static inline bool io_req_needs_clean(struct io_kiocb *req)
{
return req->flags & (REQ_F_BUFFER_SELECTED | REQ_F_NEED_CLEANUP |
- REQ_F_POLLED | REQ_F_INFLIGHT);
+ REQ_F_POLLED | REQ_F_INFLIGHT | REQ_F_CREDS);
}
static void io_req_complete_state(struct io_kiocb *req, long res,
@@ -1747,10 +1752,6 @@ static void io_dismantle_req(struct io_kiocb *req)
percpu_ref_put(req->fixed_rsrc_refs);
if (req->async_data)
kfree(req->async_data);
- if (req->creds) {
- put_cred(req->creds);
- req->creds = NULL;
- }
}
/* must to be called somewhat shortly after putting a request */
@@ -6133,6 +6134,10 @@ static void io_clean_op(struct io_kiocb *req)
atomic_dec(&tctx->inflight_tracked);
req->flags &= ~REQ_F_INFLIGHT;
}
+ if (req->flags & REQ_F_CREDS) {
+ put_cred(req->creds);
+ req->flags &= ~REQ_F_CREDS;
+ }
}
static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
@@ -6141,7 +6146,7 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
const struct cred *creds = NULL;
int ret;
- if (req->creds && req->creds != current_cred())
+ if ((req->flags & REQ_F_CREDS) && req->creds != current_cred())
creds = override_creds(req->creds);
switch (req->opcode) {
@@ -6534,7 +6539,6 @@ static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
atomic_set(&req->refs, 2);
req->task = current;
req->result = 0;
- req->creds = NULL;
/* enforce forwards compatibility on users */
if (unlikely(sqe_flags & ~SQE_VALID_FLAGS))
@@ -6556,6 +6560,7 @@ static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
if (!req->creds)
return -EINVAL;
get_cred(req->creds);
+ req->flags |= REQ_F_CREDS;
}
state = &ctx->submit_state;
--
2.31.1
* [PATCH 05/12] io_uring: simplify iovec freeing in io_clean_op()
2021-06-17 17:13 [PATCH for-next 00/12] another round of 5.14 optimisations Pavel Begunkov
` (3 preceding siblings ...)
2021-06-17 17:14 ` [PATCH 04/12] io_uring: track request creds with a flag Pavel Begunkov
@ 2021-06-17 17:14 ` Pavel Begunkov
2021-06-17 17:14 ` [PATCH 06/12] io_uring: clean all flags in io_clean_op() at once Pavel Begunkov
` (6 subsequent siblings)
11 siblings, 0 replies; 16+ messages in thread
From: Pavel Begunkov @ 2021-06-17 17:14 UTC (permalink / raw)
To: Jens Axboe, io-uring
We don't get REQ_F_NEED_CLEANUP for rw requests unless ->free_iovec is
set, so remove the inline NULL-check optimisation; kfree() is a no-op on
NULL and will take care of it should that ever be the case.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index d0d56243c135..5f92fcc9a41b 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6092,8 +6092,8 @@ static void io_clean_op(struct io_kiocb *req)
case IORING_OP_WRITE_FIXED:
case IORING_OP_WRITE: {
struct io_async_rw *io = req->async_data;
- if (io->free_iovec)
- kfree(io->free_iovec);
+
+ kfree(io->free_iovec);
break;
}
case IORING_OP_RECVMSG:
--
2.31.1
* [PATCH 06/12] io_uring: clean all flags in io_clean_op() at once
2021-06-17 17:13 [PATCH for-next 00/12] another round of 5.14 optimisations Pavel Begunkov
` (4 preceding siblings ...)
2021-06-17 17:14 ` [PATCH 05/12] io_uring: simplify iovec freeing in io_clean_op() Pavel Begunkov
@ 2021-06-17 17:14 ` Pavel Begunkov
2021-06-17 17:14 ` [PATCH 07/12] io_uring: refactor io_get_sequence() Pavel Begunkov
` (5 subsequent siblings)
11 siblings, 0 replies; 16+ messages in thread
From: Pavel Begunkov @ 2021-06-17 17:14 UTC (permalink / raw)
To: Jens Axboe, io-uring
Clear all flags in io_clean_op() at the end in a single operation; it
saves us a couple of operations and a bit of binary size.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 14 ++++++--------
1 file changed, 6 insertions(+), 8 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 5f92fcc9a41b..98932f3786d5 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -109,6 +109,8 @@
#define SQE_VALID_FLAGS (IOSQE_FIXED_FILE|IOSQE_IO_DRAIN|IOSQE_IO_LINK| \
IOSQE_IO_HARDLINK | IOSQE_ASYNC | \
IOSQE_BUFFER_SELECT)
+#define IO_REQ_CLEAN_FLAGS (REQ_F_BUFFER_SELECTED | REQ_F_NEED_CLEANUP | \
+ REQ_F_POLLED | REQ_F_INFLIGHT | REQ_F_CREDS)
#define IO_TCTX_REFS_CACHE_NR (1U << 10)
@@ -1627,8 +1629,7 @@ static void io_req_complete_post(struct io_kiocb *req, long res,
static inline bool io_req_needs_clean(struct io_kiocb *req)
{
- return req->flags & (REQ_F_BUFFER_SELECTED | REQ_F_NEED_CLEANUP |
- REQ_F_POLLED | REQ_F_INFLIGHT | REQ_F_CREDS);
+ return req->flags & IO_REQ_CLEAN_FLAGS;
}
static void io_req_complete_state(struct io_kiocb *req, long res,
@@ -6080,7 +6081,6 @@ static void io_clean_op(struct io_kiocb *req)
kfree(req->sr_msg.kbuf);
break;
}
- req->flags &= ~REQ_F_BUFFER_SELECTED;
}
if (req->flags & REQ_F_NEED_CLEANUP) {
@@ -6121,7 +6121,6 @@ static void io_clean_op(struct io_kiocb *req)
putname(req->unlink.filename);
break;
}
- req->flags &= ~REQ_F_NEED_CLEANUP;
}
if ((req->flags & REQ_F_POLLED) && req->apoll) {
kfree(req->apoll->double_poll);
@@ -6132,12 +6131,11 @@ static void io_clean_op(struct io_kiocb *req)
struct io_uring_task *tctx = req->task->io_uring;
atomic_dec(&tctx->inflight_tracked);
- req->flags &= ~REQ_F_INFLIGHT;
}
- if (req->flags & REQ_F_CREDS) {
+ if (req->flags & REQ_F_CREDS)
put_cred(req->creds);
- req->flags &= ~REQ_F_CREDS;
- }
+
+ req->flags &= ~IO_REQ_CLEAN_FLAGS;
}
static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
--
2.31.1
* [PATCH 07/12] io_uring: refactor io_get_sequence()
2021-06-17 17:13 [PATCH for-next 00/12] another round of 5.14 optimisations Pavel Begunkov
` (5 preceding siblings ...)
2021-06-17 17:14 ` [PATCH 06/12] io_uring: clean all flags in io_clean_op() at once Pavel Begunkov
@ 2021-06-17 17:14 ` Pavel Begunkov
2021-06-17 17:14 ` [PATCH 08/12] io_uring: inline __tctx_task_work() Pavel Begunkov
` (4 subsequent siblings)
11 siblings, 0 replies; 16+ messages in thread
From: Pavel Begunkov @ 2021-06-17 17:14 UTC (permalink / raw)
To: Jens Axboe, io-uring
Clean up io_get_sequence() and add a comment describing the magic around
sequence correction.
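As a worked illustration (the numbers are made up): suppose a link of three
requests was submitted and ctx->cached_sq_head has advanced to 10 by the time
the head is being prepared. io_for_each_link() then walks the three requests
and decrements seq three times, returning 7 — the value cached_sq_head had
before the link's SQEs were consumed, which is the same result the old
nr_reqs-based computation produced.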
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 98932f3786d5..54838cdb2536 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5993,13 +5993,12 @@ static int io_req_prep_async(struct io_kiocb *req)
static u32 io_get_sequence(struct io_kiocb *req)
{
- struct io_kiocb *pos;
- struct io_ring_ctx *ctx = req->ctx;
- u32 nr_reqs = 0;
+ u32 seq = req->ctx->cached_sq_head;
- io_for_each_link(pos, req)
- nr_reqs++;
- return ctx->cached_sq_head - nr_reqs;
+ /* need original cached_sq_head, but it was increased for each req */
+ io_for_each_link(req, req)
+ seq--;
+ return seq;
}
static bool io_drain_req(struct io_kiocb *req)
--
2.31.1
* [PATCH 08/12] io_uring: inline __tctx_task_work()
2021-06-17 17:13 [PATCH for-next 00/12] another round of 5.14 optimisations Pavel Begunkov
` (6 preceding siblings ...)
2021-06-17 17:14 ` [PATCH 07/12] io_uring: refactor io_get_sequence() Pavel Begunkov
@ 2021-06-17 17:14 ` Pavel Begunkov
2021-06-17 17:14 ` [PATCH 09/12] io_uring: optimise task_work submit flushing Pavel Begunkov
` (3 subsequent siblings)
11 siblings, 0 replies; 16+ messages in thread
From: Pavel Begunkov @ 2021-06-17 17:14 UTC (permalink / raw)
To: Jens Axboe, io-uring
Inline __tctx_task_work() into tctx_task_work() in preparation for
further optimisations.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 67 ++++++++++++++++++++++++---------------------------
1 file changed, 31 insertions(+), 36 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 54838cdb2536..d8bc4f82efd1 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1888,48 +1888,43 @@ static void ctx_flush_and_put(struct io_ring_ctx *ctx)
percpu_ref_put(&ctx->refs);
}
-static bool __tctx_task_work(struct io_uring_task *tctx)
-{
- struct io_ring_ctx *ctx = NULL;
- struct io_wq_work_list list;
- struct io_wq_work_node *node;
-
- if (wq_list_empty(&tctx->task_list))
- return false;
-
- spin_lock_irq(&tctx->task_lock);
- list = tctx->task_list;
- INIT_WQ_LIST(&tctx->task_list);
- spin_unlock_irq(&tctx->task_lock);
-
- node = list.first;
- while (node) {
- struct io_wq_work_node *next = node->next;
- struct io_kiocb *req;
-
- req = container_of(node, struct io_kiocb, io_task_work.node);
- if (req->ctx != ctx) {
- ctx_flush_and_put(ctx);
- ctx = req->ctx;
- percpu_ref_get(&ctx->refs);
- }
-
- req->task_work.func(&req->task_work);
- node = next;
- }
-
- ctx_flush_and_put(ctx);
- return list.first != NULL;
-}
-
static void tctx_task_work(struct callback_head *cb)
{
- struct io_uring_task *tctx = container_of(cb, struct io_uring_task, task_work);
+ struct io_uring_task *tctx = container_of(cb, struct io_uring_task,
+ task_work);
clear_bit(0, &tctx->task_state);
- while (__tctx_task_work(tctx))
+ while (!wq_list_empty(&tctx->task_list)) {
+ struct io_ring_ctx *ctx = NULL;
+ struct io_wq_work_list list;
+ struct io_wq_work_node *node;
+
+ spin_lock_irq(&tctx->task_lock);
+ list = tctx->task_list;
+ INIT_WQ_LIST(&tctx->task_list);
+ spin_unlock_irq(&tctx->task_lock);
+
+ node = list.first;
+ while (node) {
+ struct io_wq_work_node *next = node->next;
+ struct io_kiocb *req = container_of(node, struct io_kiocb,
+ io_task_work.node);
+
+ if (req->ctx != ctx) {
+ ctx_flush_and_put(ctx);
+ ctx = req->ctx;
+ percpu_ref_get(&ctx->refs);
+ }
+ req->task_work.func(&req->task_work);
+ node = next;
+ }
+
+ ctx_flush_and_put(ctx);
+ if (!list.first)
+ break;
cond_resched();
+ }
}
static int io_req_task_work_add(struct io_kiocb *req)
--
2.31.1
* [PATCH 09/12] io_uring: optimise task_work submit flushing
2021-06-17 17:13 [PATCH for-next 00/12] another round of 5.14 optimisations Pavel Begunkov
` (7 preceding siblings ...)
2021-06-17 17:14 ` [PATCH 08/12] io_uring: inline __tctx_task_work() Pavel Begunkov
@ 2021-06-17 17:14 ` Pavel Begunkov
2021-06-17 17:14 ` [PATCH 10/12] io_uring: refactor tctx task_work list splicing Pavel Begunkov
` (2 subsequent siblings)
11 siblings, 0 replies; 16+ messages in thread
From: Pavel Begunkov @ 2021-06-17 17:14 UTC (permalink / raw)
To: Jens Axboe, io-uring
tctx_task_work() tries to fetch the next batch of requests, but before
doing so it flushes completions from the previous batch, which may be
sub-optimal. E.g. io_req_task_queue() executes the head of a link, and all
the linked requests may then be enqueued through that same
io_req_task_queue(); there are more cases like that.

Defer the flushing, so completions from several waves of a single
tctx_task_work() invocation are cached and flushed only once at the very
end.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index d8bc4f82efd1..f31f00c6e829 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1890,13 +1890,13 @@ static void ctx_flush_and_put(struct io_ring_ctx *ctx)
static void tctx_task_work(struct callback_head *cb)
{
+ struct io_ring_ctx *ctx = NULL;
struct io_uring_task *tctx = container_of(cb, struct io_uring_task,
task_work);
clear_bit(0, &tctx->task_state);
while (!wq_list_empty(&tctx->task_list)) {
- struct io_ring_ctx *ctx = NULL;
struct io_wq_work_list list;
struct io_wq_work_node *node;
@@ -1920,11 +1920,12 @@ static void tctx_task_work(struct callback_head *cb)
node = next;
}
- ctx_flush_and_put(ctx);
if (!list.first)
break;
cond_resched();
}
+
+ ctx_flush_and_put(ctx);
}
static int io_req_task_work_add(struct io_kiocb *req)
--
2.31.1
* [PATCH 10/12] io_uring: refactor tctx task_work list splicing
2021-06-17 17:13 [PATCH for-next 00/12] another round of 5.14 optimisations Pavel Begunkov
` (8 preceding siblings ...)
2021-06-17 17:14 ` [PATCH 09/12] io_uring: optimise task_work submit flushing Pavel Begunkov
@ 2021-06-17 17:14 ` Pavel Begunkov
2021-06-17 17:14 ` [PATCH 11/12] io_uring: don't resched with empty task_list Pavel Begunkov
2021-06-17 17:14 ` [PATCH 12/12] io_uring: improve in tctx_task_work() resubmission Pavel Begunkov
11 siblings, 0 replies; 16+ messages in thread
From: Pavel Begunkov @ 2021-06-17 17:14 UTC (permalink / raw)
To: Jens Axboe, io-uring
We don't need a full copy of tctx->task_list in tctx_task_work(), only its
first node, so just assign node directly.

Taking into account that task_works are run in the context of the task,
it's very unlikely to first see a non-empty tctx->task_list and then splice
it empty; that can only happen with task_work cancellations, which is an
abnormal slow path anyway. Hence, get rid of the check at the end; it's
there not for validity but for "performance" purposes.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index f31f00c6e829..31afe25596d7 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1897,15 +1897,13 @@ static void tctx_task_work(struct callback_head *cb)
clear_bit(0, &tctx->task_state);
while (!wq_list_empty(&tctx->task_list)) {
- struct io_wq_work_list list;
struct io_wq_work_node *node;
spin_lock_irq(&tctx->task_lock);
- list = tctx->task_list;
+ node = tctx->task_list.first;
INIT_WQ_LIST(&tctx->task_list);
spin_unlock_irq(&tctx->task_lock);
- node = list.first;
while (node) {
struct io_wq_work_node *next = node->next;
struct io_kiocb *req = container_of(node, struct io_kiocb,
@@ -1919,9 +1917,6 @@ static void tctx_task_work(struct callback_head *cb)
req->task_work.func(&req->task_work);
node = next;
}
-
- if (!list.first)
- break;
cond_resched();
}
--
2.31.1
* [PATCH 11/12] io_uring: don't resched with empty task_list
2021-06-17 17:13 [PATCH for-next 00/12] another round of 5.14 optimisations Pavel Begunkov
` (9 preceding siblings ...)
2021-06-17 17:14 ` [PATCH 10/12] io_uring: refactor tctx task_work list splicing Pavel Begunkov
@ 2021-06-17 17:14 ` Pavel Begunkov
2021-06-17 17:14 ` [PATCH 12/12] io_uring: improve in tctx_task_work() resubmission Pavel Begunkov
11 siblings, 0 replies; 16+ messages in thread
From: Pavel Begunkov @ 2021-06-17 17:14 UTC (permalink / raw)
To: Jens Axboe, io-uring
Entering tctx_task_work() with an empty task_list is a strange scenario
that can happen only on rare occasions during task exit, so let's not check
for task_list emptiness in advance and handle it do-while style. The code
is still correct for the empty case, it just does some extra work that we
don't care about.

As an extra step, do the check before cond_resched(), so we don't resched
when there is nothing to execute.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 31afe25596d7..2fdca298e173 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1896,7 +1896,7 @@ static void tctx_task_work(struct callback_head *cb)
clear_bit(0, &tctx->task_state);
- while (!wq_list_empty(&tctx->task_list)) {
+ while (1) {
struct io_wq_work_node *node;
spin_lock_irq(&tctx->task_lock);
@@ -1917,6 +1917,8 @@ static void tctx_task_work(struct callback_head *cb)
req->task_work.func(&req->task_work);
node = next;
}
+ if (wq_list_empty(&tctx->task_list))
+ break;
cond_resched();
}
--
2.31.1
* [PATCH 12/12] io_uring: improve in tctx_task_work() resubmission
2021-06-17 17:13 [PATCH for-next 00/12] another round of 5.14 optimisations Pavel Begunkov
` (10 preceding siblings ...)
2021-06-17 17:14 ` [PATCH 11/12] io_uring: don't resched with empty task_list Pavel Begunkov
@ 2021-06-17 17:14 ` Pavel Begunkov
2021-06-18 15:23 ` Jens Axboe
11 siblings, 1 reply; 16+ messages in thread
From: Pavel Begunkov @ 2021-06-17 17:14 UTC (permalink / raw)
To: Jens Axboe, io-uring
If task_state is cleared, io_req_task_work_add() will take the slow path:
adding a task_work, setting the task_state, waking up the task and so on.
Not to mention it's expensive. tctx_task_work() first clears the state and
then executes all the queued work items, so if any of them resubmits or
adds new task_work items, it would unnecessarily go through the slow path
of io_req_task_work_add().

Let's clear ->task_state at the end instead. We still have to check
->task_list for emptiness afterwards to synchronise with
io_req_task_work_add(), so do that, and set the state back if we're going
to retry, because clearing a task_state that isn't ours on the next
iteration would be buggy.
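For context, the enqueue side we synchronise with follows roughly this shape
(a simplified sketch of io_req_task_work_add() as it looks around this
series; the task_work_add() failure fallback is elided, so treat names and
details as approximate):

	/* simplified sketch, not the verbatim function */
	static void io_req_task_work_add_sketch(struct io_kiocb *req)
	{
		struct io_uring_task *tctx = req->task->io_uring;
		unsigned long flags;

		/* 1) publish the work item */
		spin_lock_irqsave(&tctx->task_lock, flags);
		wq_list_add_tail(&req->io_task_work.node, &tctx->task_list);
		spin_unlock_irqrestore(&tctx->task_lock, flags);

		/* 2) take the slow path only if no tctx_task_work() run is pending */
		if (test_and_set_bit(0, &tctx->task_state))
			return;
		task_work_add(req->task, &tctx->task_work, TWA_SIGNAL);
	}

Because the item is added to ->task_list before ->task_state is tested, a
consumer that clears the bit must re-check the list afterwards, which is
exactly what the hunk below does before breaking out.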
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 2fdca298e173..4353f64c10c4 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1894,8 +1894,6 @@ static void tctx_task_work(struct callback_head *cb)
struct io_uring_task *tctx = container_of(cb, struct io_uring_task,
task_work);
- clear_bit(0, &tctx->task_state);
-
while (1) {
struct io_wq_work_node *node;
@@ -1917,8 +1915,14 @@ static void tctx_task_work(struct callback_head *cb)
req->task_work.func(&req->task_work);
node = next;
}
- if (wq_list_empty(&tctx->task_list))
- break;
+ if (wq_list_empty(&tctx->task_list)) {
+ clear_bit(0, &tctx->task_state);
+ if (wq_list_empty(&tctx->task_list))
+ break;
+ /* another tctx_task_work() is enqueued, yield */
+ if (test_and_set_bit(0, &tctx->task_state))
+ break;
+ }
cond_resched();
}
--
2.31.1
* Re: [PATCH 12/12] io_uring: improve in tctx_task_work() resubmission
2021-06-17 17:14 ` [PATCH 12/12] io_uring: improve in tctx_task_work() resubmission Pavel Begunkov
@ 2021-06-18 15:23 ` Jens Axboe
2021-06-18 15:33 ` Pavel Begunkov
0 siblings, 1 reply; 16+ messages in thread
From: Jens Axboe @ 2021-06-18 15:23 UTC (permalink / raw)
To: Pavel Begunkov, io-uring
On 6/17/21 11:14 AM, Pavel Begunkov wrote:
> If task_state is cleared, io_req_task_work_add() will take the slow path:
> adding a task_work, setting the task_state, waking up the task and so on.
> Not to mention it's expensive. tctx_task_work() first clears the state and
> then executes all the queued work items, so if any of them resubmits or
> adds new task_work items, it would unnecessarily go through the slow path
> of io_req_task_work_add().
>
> Let's clear ->task_state at the end instead. We still have to check
> ->task_list for emptiness afterwards to synchronise with
> io_req_task_work_add(), so do that, and set the state back if we're going
> to retry, because clearing a task_state that isn't ours on the next
> iteration would be buggy.
Are we not re-introducing the problem fixed by 1d5f360dd1a3c by swapping
these around?
--
Jens Axboe
* Re: [PATCH 12/12] io_uring: improve in tctx_task_work() resubmission
2021-06-18 15:23 ` Jens Axboe
@ 2021-06-18 15:33 ` Pavel Begunkov
2021-06-18 15:35 ` Jens Axboe
0 siblings, 1 reply; 16+ messages in thread
From: Pavel Begunkov @ 2021-06-18 15:33 UTC (permalink / raw)
To: Jens Axboe, io-uring
On 6/18/21 4:23 PM, Jens Axboe wrote:
> On 6/17/21 11:14 AM, Pavel Begunkov wrote:
>> If task_state is cleared, io_req_task_work_add() will take the slow path:
>> adding a task_work, setting the task_state, waking up the task and so on.
>> Not to mention it's expensive. tctx_task_work() first clears the state and
>> then executes all the queued work items, so if any of them resubmits or
>> adds new task_work items, it would unnecessarily go through the slow path
>> of io_req_task_work_add().
>>
>> Let's clear ->task_state at the end instead. We still have to check
>> ->task_list for emptiness afterwards to synchronise with
>> io_req_task_work_add(), so do that, and set the state back if we're going
>> to retry, because clearing a task_state that isn't ours on the next
>> iteration would be buggy.
>
> Are we not re-introducing the problem fixed by 1d5f360dd1a3c by swapping
> these around?
if (wq_list_empty(&tctx->task_list)) {
clear_bit(0, &tctx->task_state);
if (wq_list_empty(&tctx->task_list))
break;
... // goto repeat
}
Note the wq_list_empty() right after clear_bit(): it will
retry splicing the list just as the current code does.
There is a bit more to it with restoring the bit, but that
should be fine as well. Alternatively, it could have been
done as below, but I found it uglier:
bool cleared = false;
...
if (wq_list_empty(&tctx->task_list)) {
if (cleared)
break;
cleared = true;
clear_bit(0, &tctx->task_state);
if (wq_list_empty(&tctx->task_list))
break;
goto repeat;
}
--
Pavel Begunkov
* Re: [PATCH 12/12] io_uring: improve in tctx_task_work() resubmission
2021-06-18 15:33 ` Pavel Begunkov
@ 2021-06-18 15:35 ` Jens Axboe
0 siblings, 0 replies; 16+ messages in thread
From: Jens Axboe @ 2021-06-18 15:35 UTC (permalink / raw)
To: Pavel Begunkov, io-uring
On 6/18/21 9:33 AM, Pavel Begunkov wrote:
> On 6/18/21 4:23 PM, Jens Axboe wrote:
>> On 6/17/21 11:14 AM, Pavel Begunkov wrote:
>>> If task_state is cleared, io_req_task_work_add() will take the slow path:
>>> adding a task_work, setting the task_state, waking up the task and so on.
>>> Not to mention it's expensive. tctx_task_work() first clears the state and
>>> then executes all the queued work items, so if any of them resubmits or
>>> adds new task_work items, it would unnecessarily go through the slow path
>>> of io_req_task_work_add().
>>>
>>> Let's clear ->task_state at the end instead. We still have to check
>>> ->task_list for emptiness afterwards to synchronise with
>>> io_req_task_work_add(), so do that, and set the state back if we're going
>>> to retry, because clearing a task_state that isn't ours on the next
>>> iteration would be buggy.
>>
>> Are we not re-introducing the problem fixed by 1d5f360dd1a3c by swapping
>> these around?
>
> if (wq_list_empty(&tctx->task_list)) {
> clear_bit(0, &tctx->task_state);
> if (wq_list_empty(&tctx->task_list))
> break;
> ... // goto repeat
> }
Yeah ok, that should do it.
I've applied the series, thanks.
--
Jens Axboe