* [PATCH 01/11] io_uring: kill fictitious submit iteration index
From: Pavel Begunkov @ 2021-02-18 18:29 UTC
To: Jens Axboe, io-uring
@i and @submitted are very much coupled together, and there is no need
to keep them both. Remove @i; it doesn't change the generated binary
but helps to keep a single source of truth.
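For illustration only (not part of the patch): a minimal userspace C
sketch of the same idea, with made-up names. When two counters always
advance in lockstep and only one carries meaning, the spare one can be
dropped without changing behaviour:

  /* before: i and done always move together */
  unsigned int drain_v1(const int *items, unsigned int nr)
  {
      unsigned int i, done = 0;

      for (i = 0; i < nr; i++) {
          if (items[i] < 0)   /* pretend this is a hard failure */
              break;
          done++;
      }
      return done;
  }

  /* after: done is the single source of truth */
  unsigned int drain_v2(const int *items, unsigned int nr)
  {
      unsigned int done = 0;

      while (done < nr) {
          if (items[done] < 0)
              break;
          done++;
      }
      return done;
  }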
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 4352bcea3d9d..32a6c89e69b1 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6884,7 +6884,7 @@ static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
{
struct io_submit_link link;
- int i, submitted = 0;
+ int submitted = 0;
/* if we have a backlog and couldn't flush it all, return BUSY */
if (test_bit(0, &ctx->sq_check_overflow)) {
@@ -6904,7 +6904,7 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
io_submit_state_start(&ctx->submit_state, nr);
link.head = NULL;
- for (i = 0; i < nr; i++) {
+ while (submitted < nr) {
const struct io_uring_sqe *sqe;
struct io_kiocb *req;
int err;
--
2.24.0
* [PATCH 02/11] io_uring: keep io_*_prep() naming consistent
From: Pavel Begunkov @ 2021-02-18 18:29 UTC
To: Jens Axboe, io-uring
Follow the io_*_prep() naming pattern; fsync and sfr are the only ones
that don't.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 32a6c89e69b1..adb5cd4b760d 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -4008,7 +4008,7 @@ static int io_nop(struct io_kiocb *req, unsigned int issue_flags)
return 0;
}
-static int io_prep_fsync(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+static int io_fsync_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
{
struct io_ring_ctx *ctx = req->ctx;
@@ -4595,7 +4595,7 @@ static int io_close(struct io_kiocb *req, unsigned int issue_flags)
return 0;
}
-static int io_prep_sfr(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+static int io_sfr_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
{
struct io_ring_ctx *ctx = req->ctx;
@@ -6081,9 +6081,9 @@ static int io_req_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
case IORING_OP_POLL_REMOVE:
return io_poll_remove_prep(req, sqe);
case IORING_OP_FSYNC:
- return io_prep_fsync(req, sqe);
+ return io_fsync_prep(req, sqe);
case IORING_OP_SYNC_FILE_RANGE:
- return io_prep_sfr(req, sqe);
+ return io_sfr_prep(req, sqe);
case IORING_OP_SENDMSG:
case IORING_OP_SEND:
return io_sendmsg_prep(req, sqe);
--
2.24.0
* [PATCH 03/11] io_uring: don't duplicate ->file check in sfr
From: Pavel Begunkov @ 2021-02-18 18:29 UTC
To: Jens Axboe, io-uring
IORING_OP_SYNC_FILE_RANGE is marked as .needs_file, so the common path
takes care of assigning and validating req->file; there is no need to
duplicate the check in io_sfr_prep().
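For illustration only (not part of the patch): a hypothetical userspace
sketch of the general pattern, where a per-opcode table lets the
generic init path validate the file once so that per-opcode prep
doesn't repeat the check. All names below are invented:

  #include <errno.h>
  #include <stdbool.h>
  #include <stdio.h>

  struct op_def { bool needs_file; };
  struct request { int opcode; FILE *file; };

  /* hypothetical table, indexed by opcode */
  static const struct op_def op_defs[] = {
      [0] = { .needs_file = true },    /* an sfr-like opcode */
      [1] = { .needs_file = false },   /* a nop-like opcode */
  };

  /* generic init: the only place that checks for a missing file */
  int init_request(struct request *req)
  {
      if (op_defs[req->opcode].needs_file && !req->file)
          return -EBADF;
      return 0;
  }

  /* opcode-specific prep may now assume req->file is valid */
  int sfr_like_prep(struct request *req)
  {
      (void)req;
      return 0;
  }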
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index adb5cd4b760d..db6680bb02d3 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -4599,9 +4599,6 @@ static int io_sfr_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
{
struct io_ring_ctx *ctx = req->ctx;
- if (!req->file)
- return -EBADF;
-
if (unlikely(ctx->flags & IORING_SETUP_IOPOLL))
return -EINVAL;
if (unlikely(sqe->addr || sqe->ioprio || sqe->buf_index))
--
2.24.0
* [PATCH 04/11] io_uring: move io_init_req()'s definition
From: Pavel Begunkov @ 2021-02-18 18:29 UTC
To: Jens Axboe, io-uring
A preparation patch: a symbol-to-symbol move of io_init_req() +
io_check_restriction() a bit up. The submission path is pretty settled
down, so don't worry about backports and move the functions instead of
relying on forward declarations in the future.
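For illustration only: a tiny standalone C sketch (invented names) of
why defining a helper above its caller removes the need for a forward
declaration:

  /* before: the helper is defined below its caller, so a forward
   * declaration is needed */
  int helper(int x);
  int caller_before(int x) { return helper(x) + 1; }
  int helper(int x) { return x * 2; }

  /* after: define the helper first and the declaration goes away */
  int helper2(int x) { return x * 2; }
  int caller_after(int x) { return helper2(x) + 1; }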
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 214 +++++++++++++++++++++++++-------------------------
1 file changed, 107 insertions(+), 107 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index db6680bb02d3..1563853caac5 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -104,6 +104,10 @@
#define IORING_MAX_RESTRICTIONS (IORING_RESTRICTION_LAST + \
IORING_REGISTER_LAST + IORING_OP_LAST)
+#define SQE_VALID_FLAGS (IOSQE_FIXED_FILE|IOSQE_IO_DRAIN|IOSQE_IO_LINK| \
+ IOSQE_IO_HARDLINK | IOSQE_ASYNC | \
+ IOSQE_BUFFER_SELECT)
+
struct io_uring {
u32 head ____cacheline_aligned_in_smp;
u32 tail ____cacheline_aligned_in_smp;
@@ -6639,6 +6643,109 @@ static inline void io_queue_link_head(struct io_kiocb *req)
io_queue_sqe(req, NULL);
}
+/*
+ * Check SQE restrictions (opcode and flags).
+ *
+ * Returns 'true' if SQE is allowed, 'false' otherwise.
+ */
+static inline bool io_check_restriction(struct io_ring_ctx *ctx,
+ struct io_kiocb *req,
+ unsigned int sqe_flags)
+{
+ if (!ctx->restricted)
+ return true;
+
+ if (!test_bit(req->opcode, ctx->restrictions.sqe_op))
+ return false;
+
+ if ((sqe_flags & ctx->restrictions.sqe_flags_required) !=
+ ctx->restrictions.sqe_flags_required)
+ return false;
+
+ if (sqe_flags & ~(ctx->restrictions.sqe_flags_allowed |
+ ctx->restrictions.sqe_flags_required))
+ return false;
+
+ return true;
+}
+
+static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
+ const struct io_uring_sqe *sqe)
+{
+ struct io_submit_state *state;
+ unsigned int sqe_flags;
+ int id, ret = 0;
+
+ req->opcode = READ_ONCE(sqe->opcode);
+ /* same numerical values with corresponding REQ_F_*, safe to copy */
+ req->flags = sqe_flags = READ_ONCE(sqe->flags);
+ req->user_data = READ_ONCE(sqe->user_data);
+ req->async_data = NULL;
+ req->file = NULL;
+ req->ctx = ctx;
+ req->link = NULL;
+ req->fixed_rsrc_refs = NULL;
+ /* one is dropped after submission, the other at completion */
+ refcount_set(&req->refs, 2);
+ req->task = current;
+ req->result = 0;
+
+ /* enforce forwards compatibility on users */
+ if (unlikely(sqe_flags & ~SQE_VALID_FLAGS))
+ return -EINVAL;
+
+ if (unlikely(req->opcode >= IORING_OP_LAST))
+ return -EINVAL;
+
+ if (unlikely(io_sq_thread_acquire_mm_files(ctx, req)))
+ return -EFAULT;
+
+ if (unlikely(!io_check_restriction(ctx, req, sqe_flags)))
+ return -EACCES;
+
+ if ((sqe_flags & IOSQE_BUFFER_SELECT) &&
+ !io_op_defs[req->opcode].buffer_select)
+ return -EOPNOTSUPP;
+
+ id = READ_ONCE(sqe->personality);
+ if (id) {
+ struct io_identity *iod;
+
+ iod = idr_find(&ctx->personality_idr, id);
+ if (unlikely(!iod))
+ return -EINVAL;
+ refcount_inc(&iod->count);
+
+ __io_req_init_async(req);
+ get_cred(iod->creds);
+ req->work.identity = iod;
+ req->work.flags |= IO_WQ_WORK_CREDS;
+ }
+
+ state = &ctx->submit_state;
+
+ /*
+ * Plug now if we have more than 1 IO left after this, and the target
+ * is potentially a read/write to block based storage.
+ */
+ if (!state->plug_started && state->ios_left > 1 &&
+ io_op_defs[req->opcode].plug) {
+ blk_start_plug(&state->plug);
+ state->plug_started = true;
+ }
+
+ if (io_op_defs[req->opcode].needs_file) {
+ bool fixed = req->flags & REQ_F_FIXED_FILE;
+
+ req->file = io_file_get(state, req, READ_ONCE(sqe->fd), fixed);
+ if (unlikely(!req->file))
+ ret = -EBADF;
+ }
+
+ state->ios_left--;
+ return ret;
+}
+
struct io_submit_link {
struct io_kiocb *head;
struct io_kiocb *last;
@@ -6771,113 +6878,6 @@ static const struct io_uring_sqe *io_get_sqe(struct io_ring_ctx *ctx)
return NULL;
}
-/*
- * Check SQE restrictions (opcode and flags).
- *
- * Returns 'true' if SQE is allowed, 'false' otherwise.
- */
-static inline bool io_check_restriction(struct io_ring_ctx *ctx,
- struct io_kiocb *req,
- unsigned int sqe_flags)
-{
- if (!ctx->restricted)
- return true;
-
- if (!test_bit(req->opcode, ctx->restrictions.sqe_op))
- return false;
-
- if ((sqe_flags & ctx->restrictions.sqe_flags_required) !=
- ctx->restrictions.sqe_flags_required)
- return false;
-
- if (sqe_flags & ~(ctx->restrictions.sqe_flags_allowed |
- ctx->restrictions.sqe_flags_required))
- return false;
-
- return true;
-}
-
-#define SQE_VALID_FLAGS (IOSQE_FIXED_FILE|IOSQE_IO_DRAIN|IOSQE_IO_LINK| \
- IOSQE_IO_HARDLINK | IOSQE_ASYNC | \
- IOSQE_BUFFER_SELECT)
-
-static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
- const struct io_uring_sqe *sqe)
-{
- struct io_submit_state *state;
- unsigned int sqe_flags;
- int id, ret = 0;
-
- req->opcode = READ_ONCE(sqe->opcode);
- /* same numerical values with corresponding REQ_F_*, safe to copy */
- req->flags = sqe_flags = READ_ONCE(sqe->flags);
- req->user_data = READ_ONCE(sqe->user_data);
- req->async_data = NULL;
- req->file = NULL;
- req->ctx = ctx;
- req->link = NULL;
- req->fixed_rsrc_refs = NULL;
- /* one is dropped after submission, the other at completion */
- refcount_set(&req->refs, 2);
- req->task = current;
- req->result = 0;
-
- /* enforce forwards compatibility on users */
- if (unlikely(sqe_flags & ~SQE_VALID_FLAGS))
- return -EINVAL;
-
- if (unlikely(req->opcode >= IORING_OP_LAST))
- return -EINVAL;
-
- if (unlikely(io_sq_thread_acquire_mm_files(ctx, req)))
- return -EFAULT;
-
- if (unlikely(!io_check_restriction(ctx, req, sqe_flags)))
- return -EACCES;
-
- if ((sqe_flags & IOSQE_BUFFER_SELECT) &&
- !io_op_defs[req->opcode].buffer_select)
- return -EOPNOTSUPP;
-
- id = READ_ONCE(sqe->personality);
- if (id) {
- struct io_identity *iod;
-
- iod = idr_find(&ctx->personality_idr, id);
- if (unlikely(!iod))
- return -EINVAL;
- refcount_inc(&iod->count);
-
- __io_req_init_async(req);
- get_cred(iod->creds);
- req->work.identity = iod;
- req->work.flags |= IO_WQ_WORK_CREDS;
- }
-
- state = &ctx->submit_state;
-
- /*
- * Plug now if we have more than 1 IO left after this, and the target
- * is potentially a read/write to block based storage.
- */
- if (!state->plug_started && state->ios_left > 1 &&
- io_op_defs[req->opcode].plug) {
- blk_start_plug(&state->plug);
- state->plug_started = true;
- }
-
- if (io_op_defs[req->opcode].needs_file) {
- bool fixed = req->flags & REQ_F_FIXED_FILE;
-
- req->file = io_file_get(state, req, READ_ONCE(sqe->fd), fixed);
- if (unlikely(!req->file))
- ret = -EBADF;
- }
-
- state->ios_left--;
- return ret;
-}
-
static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
{
struct io_submit_link link;
--
2.24.0
* [PATCH 05/11] io_uring: move io_init_req() into io_submit_sqe()
From: Pavel Begunkov @ 2021-02-18 18:29 UTC
To: Jens Axboe, io-uring
Behaves identically, just move the io_init_req() call into the
beginning of io_submit_sqe(). That looks better and unloads
io_submit_sqes().
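For illustration only: a minimal userspace sketch (made-up names) of
the shape of the change, hoisting per-item initialisation out of the
batching loop and into the helper that handles a single item:

  struct item { int value; int ready; };

  int init_item(struct item *it) { it->ready = 1; return 0; }
  int handle_item_inner(struct item *it) { return it->value < 0 ? -1 : 0; }

  /* after the move: the per-item helper does its own init */
  int handle_item(struct item *it)
  {
      int ret = init_item(it);

      if (ret)
          return ret;
      return handle_item_inner(it);
  }

  unsigned int submit_all(struct item *items, unsigned int nr)
  {
      unsigned int done = 0;

      /* the loop no longer knows about per-item init details */
      while (done < nr) {
          done++;
          if (handle_item(&items[done - 1]))
              break;
      }
      return done;
  }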
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 32 +++++++++++++++-----------------
1 file changed, 15 insertions(+), 17 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 1563853caac5..5c9b3b9ff92f 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6751,12 +6751,23 @@ struct io_submit_link {
struct io_kiocb *last;
};
-static int io_submit_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe,
+static int io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
+ const struct io_uring_sqe *sqe,
struct io_submit_link *link)
{
- struct io_ring_ctx *ctx = req->ctx;
int ret;
+ ret = io_init_req(ctx, req, sqe);
+ if (unlikely(ret)) {
+fail_req:
+ io_put_req(req);
+ io_req_complete(req, ret);
+ return ret;
+ }
+
+ trace_io_uring_submit_sqe(ctx, req->opcode, req->user_data,
+ true, ctx->flags & IORING_SETUP_SQPOLL);
+
/*
* If we already have a head request, queue this one for async
* submittal once the head completes. If we don't have a head but
@@ -6782,7 +6793,7 @@ static int io_submit_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe,
if (unlikely(ret)) {
/* fail even hard links since we don't submit */
head->flags |= REQ_F_FAIL_LINK;
- return ret;
+ goto fail_req;
}
trace_io_uring_link(ctx, req, head);
link->last->link = req;
@@ -6904,7 +6915,6 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
while (submitted < nr) {
const struct io_uring_sqe *sqe;
struct io_kiocb *req;
- int err;
req = io_alloc_req(ctx);
if (unlikely(!req)) {
@@ -6919,20 +6929,8 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
}
/* will complete beyond this point, count as submitted */
submitted++;
-
- err = io_init_req(ctx, req, sqe);
- if (unlikely(err)) {
-fail_req:
- io_put_req(req);
- io_req_complete(req, err);
+ if (io_submit_sqe(ctx, req, sqe, &link))
break;
- }
-
- trace_io_uring_submit_sqe(ctx, req->opcode, req->user_data,
- true, ctx->flags & IORING_SETUP_SQPOLL);
- err = io_submit_sqe(req, sqe, &link);
- if (err)
- goto fail_req;
}
if (unlikely(submitted != nr)) {
--
2.24.0
* [PATCH 06/11] io_uring: move req link into submit_state
From: Pavel Begunkov @ 2021-02-18 18:29 UTC
To: Jens Axboe, io-uring
Move struct io_submit_link into submit_state; it is part of the
submission state and so belongs to it. That saves us from explicitly
passing it around, and init/deinit is now nicely hidden in
io_submit_state_[start,end].
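For illustration only: a hypothetical C sketch of the pattern of
folding per-batch link state into the state object, so that the
start/end helpers own its init and teardown and callers stop passing
it around explicitly:

  #include <stddef.h>

  struct link_state { void *head, *last; };

  struct submit_state {
      int ios_left;
      struct link_state link;    /* now lives inside the state */
  };

  void state_start(struct submit_state *s, int max_ios)
  {
      s->ios_left = max_ios;
      s->link.head = NULL;       /* init hidden in start() */
  }

  void state_end(struct submit_state *s)
  {
      if (s->link.head) {
          /* flush the pending link; teardown hidden in end() */
          s->link.head = NULL;
      }
  }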
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 28 ++++++++++++++--------------
1 file changed, 14 insertions(+), 14 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 5c9b3b9ff92f..fe2379179b00 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -283,8 +283,14 @@ struct io_comp_state {
struct list_head locked_free_list;
};
+struct io_submit_link {
+ struct io_kiocb *head;
+ struct io_kiocb *last;
+};
+
struct io_submit_state {
struct blk_plug plug;
+ struct io_submit_link link;
/*
* io_kiocb alloc cache
@@ -6746,15 +6752,10 @@ static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
return ret;
}
-struct io_submit_link {
- struct io_kiocb *head;
- struct io_kiocb *last;
-};
-
static int io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
- const struct io_uring_sqe *sqe,
- struct io_submit_link *link)
+ const struct io_uring_sqe *sqe)
{
+ struct io_submit_link *link = &ctx->submit_state.link;
int ret;
ret = io_init_req(ctx, req, sqe);
@@ -6829,6 +6830,8 @@ static int io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
static void io_submit_state_end(struct io_submit_state *state,
struct io_ring_ctx *ctx)
{
+ if (state->link.head)
+ io_queue_link_head(state->link.head);
if (state->comp.nr)
io_submit_flush_completions(&state->comp, ctx);
if (state->plug_started)
@@ -6844,6 +6847,8 @@ static void io_submit_state_start(struct io_submit_state *state,
{
state->plug_started = false;
state->ios_left = max_ios;
+ /* set only head, no need to init link_last in advance */
+ state->link.head = NULL;
}
static void io_commit_sqring(struct io_ring_ctx *ctx)
@@ -6891,7 +6896,6 @@ static const struct io_uring_sqe *io_get_sqe(struct io_ring_ctx *ctx)
static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
{
- struct io_submit_link link;
int submitted = 0;
/* if we have a backlog and couldn't flush it all, return BUSY */
@@ -6908,9 +6912,7 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
percpu_counter_add(&current->io_uring->inflight, nr);
refcount_add(nr, &current->usage);
-
io_submit_state_start(&ctx->submit_state, nr);
- link.head = NULL;
while (submitted < nr) {
const struct io_uring_sqe *sqe;
@@ -6929,7 +6931,7 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
}
/* will complete beyond this point, count as submitted */
submitted++;
- if (io_submit_sqe(ctx, req, sqe, &link))
+ if (io_submit_sqe(ctx, req, sqe))
break;
}
@@ -6942,10 +6944,8 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
percpu_counter_sub(&tctx->inflight, unused);
put_task_struct_many(current, unused);
}
- if (link.head)
- io_queue_link_head(link.head);
- io_submit_state_end(&ctx->submit_state, ctx);
+ io_submit_state_end(&ctx->submit_state, ctx);
/* Commit SQ ring head once we've consumed and submitted all SQEs */
io_commit_sqring(ctx);
--
2.24.0
* [PATCH 07/11] io_uring: don't submit link on error
From: Pavel Begunkov @ 2021-02-18 18:29 UTC
To: Jens Axboe, io-uring
If we get an error in io_init_req() for a request that would have been
linked, we break the submission but still issue a partially composed
link. That's nasty; fail the whole link instead.
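For illustration only: a standalone sketch (invented names) of the
behavioural point, i.e. when composing a chain fails midway, the
already-collected head gets marked failed instead of being submitted
as a partial chain:

  #include <stddef.h>

  struct node { struct node *next; int failed; };
  struct chain { struct node *head, *last; };

  int chain_add(struct chain *c, struct node *n, int init_err)
  {
      if (init_err) {
          /* don't submit a partially composed chain: fail the head too */
          if (c->head)
              c->head->failed = 1;
          return init_err;
      }
      n->next = NULL;
      if (!c->head) {
          c->head = c->last = n;
      } else {
          c->last->next = n;
          c->last = n;
      }
      return 0;
  }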
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index fe2379179b00..62688866357c 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6763,6 +6763,9 @@ static int io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
fail_req:
io_put_req(req);
io_req_complete(req, ret);
+ /* fail even hard links since we don't submit */
+ if (link->head)
+ link->head->flags |= REQ_F_FAIL_LINK;
return ret;
}
@@ -6791,11 +6794,8 @@ static int io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
ctx->drain_next = 1;
}
ret = io_req_defer_prep(req, sqe);
- if (unlikely(ret)) {
- /* fail even hard links since we don't submit */
- head->flags |= REQ_F_FAIL_LINK;
+ if (unlikely(ret))
goto fail_req;
- }
trace_io_uring_link(ctx, req, head);
link->last->link = req;
link->last = req;
--
2.24.0
* [PATCH 08/11] io_uring: split sqe-prep and async setup
From: Pavel Begunkov @ 2021-02-18 18:29 UTC
To: Jens Axboe, io-uring
There are two kinds of opcode-specific preparations we do. The first is
just initialising req with what is always needed for an opcode and
reading all non-generic SQE fields. The second is copying some of the
stuff, like the iovec, in preparation for punting a request somewhere
async, e.g. to io-wq or for draining. For requests that have tried an
inline execution but still need to be punted, the second prep type is
done by the opcode handler itself.
Currently, we don't explicitly split those preparation steps but
combine both of them into io_*_prep(), altering the behaviour based on
whether ->async_data is allocated. That's pretty messy, hard to follow,
and it also gets in the way of some optimisations.
Split the steps, leave the first type where it is now, and put the
second into a new io_req_prep_async() helper. It may make us do the
opcode switch twice, but it's worth it.
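For illustration only (not the kernel code): a hypothetical userspace
sketch of the split, with one switch that only reads per-opcode fields
inline, and a second, separate switch that copies what is needed only
when a request is punted to async context:

  #include <errno.h>
  #include <stdlib.h>
  #include <string.h>

  enum { OP_SEND, OP_NOP };

  struct request {
      int opcode;
      const char *msg;    /* points at caller-owned memory */
      char *async_copy;   /* owned copy, needed only for async execution */
  };

  /* step 1: always done inline, just reads the "SQE" fields */
  int req_prep(struct request *req, const char *msg)
  {
      switch (req->opcode) {
      case OP_SEND:
          req->msg = msg;
          return 0;
      case OP_NOP:
          return 0;
      }
      return -EINVAL;
  }

  /* step 2: done only when punting, copies what won't stay valid */
  int req_prep_async(struct request *req)
  {
      switch (req->opcode) {
      case OP_SEND: {
          size_t len = strlen(req->msg) + 1;

          req->async_copy = malloc(len);
          if (!req->async_copy)
              return -ENOMEM;
          memcpy(req->async_copy, req->msg, len);
          return 0;
      }
      }
      return 0;
  }

The design point in the sketch is that the second switch runs only on
the punt path, so the hot inline path stays light.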
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 120 +++++++++++++++++++++++++++++---------------------
1 file changed, 70 insertions(+), 50 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 62688866357c..987cfd8db213 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -3472,19 +3472,9 @@ static inline int io_rw_prep_async(struct io_kiocb *req, int rw)
static int io_read_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
{
- ssize_t ret;
-
- ret = io_prep_rw(req, sqe);
- if (ret)
- return ret;
-
if (unlikely(!(req->file->f_mode & FMODE_READ)))
return -EBADF;
-
- /* either don't need iovec imported or already have it */
- if (!req->async_data)
- return 0;
- return io_rw_prep_async(req, READ);
+ return io_prep_rw(req, sqe);
}
/*
@@ -3669,19 +3659,9 @@ static int io_read(struct io_kiocb *req, unsigned int issue_flags)
static int io_write_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
{
- ssize_t ret;
-
- ret = io_prep_rw(req, sqe);
- if (ret)
- return ret;
-
if (unlikely(!(req->file->f_mode & FMODE_WRITE)))
return -EBADF;
-
- /* either don't need iovec imported or already have it */
- if (!req->async_data)
- return 0;
- return io_rw_prep_async(req, WRITE);
+ return io_prep_rw(req, sqe);
}
static int io_write(struct io_kiocb *req, unsigned int issue_flags)
@@ -4668,11 +4648,21 @@ static int io_sendmsg_copy_hdr(struct io_kiocb *req,
req->sr_msg.msg_flags, &iomsg->free_iov);
}
+static int io_sendmsg_prep_async(struct io_kiocb *req)
+{
+ int ret;
+
+ if (!io_op_defs[req->opcode].needs_async_data)
+ return 0;
+ ret = io_sendmsg_copy_hdr(req, req->async_data);
+ if (!ret)
+ req->flags |= REQ_F_NEED_CLEANUP;
+ return ret;
+}
+
static int io_sendmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
{
- struct io_async_msghdr *async_msg = req->async_data;
struct io_sr_msg *sr = &req->sr_msg;
- int ret;
if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
return -EINVAL;
@@ -4685,13 +4675,7 @@ static int io_sendmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
if (req->ctx->compat)
sr->msg_flags |= MSG_CMSG_COMPAT;
#endif
-
- if (!async_msg || !io_op_defs[req->opcode].needs_async_data)
- return 0;
- ret = io_sendmsg_copy_hdr(req, async_msg);
- if (!ret)
- req->flags |= REQ_F_NEED_CLEANUP;
- return ret;
+ return 0;
}
static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
@@ -4885,13 +4869,22 @@ static inline unsigned int io_put_recv_kbuf(struct io_kiocb *req)
return io_put_kbuf(req, req->sr_msg.kbuf);
}
-static int io_recvmsg_prep(struct io_kiocb *req,
- const struct io_uring_sqe *sqe)
+static int io_recvmsg_prep_async(struct io_kiocb *req)
{
- struct io_async_msghdr *async_msg = req->async_data;
- struct io_sr_msg *sr = &req->sr_msg;
int ret;
+ if (!io_op_defs[req->opcode].needs_async_data)
+ return 0;
+ ret = io_recvmsg_copy_hdr(req, req->async_data);
+ if (!ret)
+ req->flags |= REQ_F_NEED_CLEANUP;
+ return ret;
+}
+
+static int io_recvmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+{
+ struct io_sr_msg *sr = &req->sr_msg;
+
if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
return -EINVAL;
@@ -4904,13 +4897,7 @@ static int io_recvmsg_prep(struct io_kiocb *req,
if (req->ctx->compat)
sr->msg_flags |= MSG_CMSG_COMPAT;
#endif
-
- if (!async_msg || !io_op_defs[req->opcode].needs_async_data)
- return 0;
- ret = io_recvmsg_copy_hdr(req, async_msg);
- if (!ret)
- req->flags |= REQ_F_NEED_CLEANUP;
- return ret;
+ return 0;
}
static int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
@@ -5063,10 +5050,17 @@ static int io_accept(struct io_kiocb *req, unsigned int issue_flags)
return 0;
}
+static int io_connect_prep_async(struct io_kiocb *req)
+{
+ struct io_async_connect *io = req->async_data;
+ struct io_connect *conn = &req->connect;
+
+ return move_addr_to_kernel(conn->addr, conn->addr_len, &io->address);
+}
+
static int io_connect_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
{
struct io_connect *conn = &req->connect;
- struct io_async_connect *io = req->async_data;
if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
return -EINVAL;
@@ -5075,12 +5069,7 @@ static int io_connect_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
conn->addr = u64_to_user_ptr(READ_ONCE(sqe->addr));
conn->addr_len = READ_ONCE(sqe->addr2);
-
- if (!io)
- return 0;
-
- return move_addr_to_kernel(conn->addr, conn->addr_len,
- &io->address);
+ return 0;
}
static int io_connect(struct io_kiocb *req, unsigned int issue_flags)
@@ -6148,14 +6137,45 @@ static int io_req_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
return-EINVAL;
}
+static int io_req_prep_async(struct io_kiocb *req)
+{
+ switch (req->opcode) {
+ case IORING_OP_READV:
+ case IORING_OP_READ_FIXED:
+ case IORING_OP_READ:
+ return io_rw_prep_async(req, READ);
+ case IORING_OP_WRITEV:
+ case IORING_OP_WRITE_FIXED:
+ case IORING_OP_WRITE:
+ return io_rw_prep_async(req, WRITE);
+ case IORING_OP_SENDMSG:
+ case IORING_OP_SEND:
+ return io_sendmsg_prep_async(req);
+ case IORING_OP_RECVMSG:
+ case IORING_OP_RECV:
+ return io_recvmsg_prep_async(req);
+ case IORING_OP_CONNECT:
+ return io_connect_prep_async(req);
+ }
+ return 0;
+}
+
static int io_req_defer_prep(struct io_kiocb *req,
const struct io_uring_sqe *sqe)
{
+ int ret;
+
if (!sqe)
return 0;
if (io_alloc_async_data(req))
return -EAGAIN;
- return io_req_prep(req, sqe);
+ ret = io_req_prep(req, sqe);
+ if (ret)
+ return ret;
+ if (req->async_data)
+ return io_req_prep_async(req);
+ return 0;
+
}
static u32 io_get_sequence(struct io_kiocb *req)
--
2.24.0
* [PATCH 09/11] io_uring: do io_*_prep() early in io_submit_sqe()
From: Pavel Begunkov @ 2021-02-18 18:29 UTC
To: Jens Axboe, io-uring
Now that preparations are split from async setup, we can do the first
one pretty early, without spilling it across multiple call sites. And
once it's done, the SQE is not needed anymore, so we can save on
passing it deep into the submission stack.
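For illustration only: a small sketch (hypothetical names) of decoding
the on-ring entry once, early, so that deeper layers take only the
decoded request and the raw entry never travels down the stack:

  struct raw_entry { int opcode; long arg; };
  struct request  { int opcode; long arg; };

  int req_prep(struct request *req, const struct raw_entry *e)
  {
      req->opcode = e->opcode;
      req->arg = e->arg;
      return 0;
  }

  int queue_req(struct request *req)      /* no raw_entry here */
  {
      return req->opcode ? 0 : -1;
  }

  int submit_one(struct request *req, const struct raw_entry *e)
  {
      int ret = req_prep(req, e);    /* done once, up front */

      if (ret)
          return ret;
      return queue_req(req);         /* e is not needed past this point */
  }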
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 59 +++++++++++++++++++++------------------------------
1 file changed, 24 insertions(+), 35 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 987cfd8db213..7d54b0abbb82 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6160,22 +6160,16 @@ static int io_req_prep_async(struct io_kiocb *req)
return 0;
}
-static int io_req_defer_prep(struct io_kiocb *req,
- const struct io_uring_sqe *sqe)
+static int io_req_defer_prep(struct io_kiocb *req)
{
- int ret;
-
- if (!sqe)
+ if (!io_op_defs[req->opcode].needs_async_data)
return 0;
- if (io_alloc_async_data(req))
- return -EAGAIN;
- ret = io_req_prep(req, sqe);
- if (ret)
- return ret;
+ /* some opcodes init it during the inital prep */
if (req->async_data)
- return io_req_prep_async(req);
- return 0;
-
+ return 0;
+ if (__io_alloc_async_data(req))
+ return -EAGAIN;
+ return io_req_prep_async(req);
}
static u32 io_get_sequence(struct io_kiocb *req)
@@ -6191,7 +6185,7 @@ static u32 io_get_sequence(struct io_kiocb *req)
return total_submitted - nr_reqs;
}
-static int io_req_defer(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+static int io_req_defer(struct io_kiocb *req)
{
struct io_ring_ctx *ctx = req->ctx;
struct io_defer_entry *de;
@@ -6208,11 +6202,9 @@ static int io_req_defer(struct io_kiocb *req, const struct io_uring_sqe *sqe)
if (!req_need_defer(req, seq) && list_empty_careful(&ctx->defer_list))
return 0;
- if (!req->async_data) {
- ret = io_req_defer_prep(req, sqe);
- if (ret)
- return ret;
- }
+ ret = io_req_defer_prep(req);
+ if (ret)
+ return ret;
io_prep_async_link(req);
de = kmalloc(sizeof(*de), GFP_KERNEL);
if (!de)
@@ -6631,11 +6623,11 @@ static void __io_queue_sqe(struct io_kiocb *req)
io_queue_linked_timeout(linked_timeout);
}
-static void io_queue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+static void io_queue_sqe(struct io_kiocb *req)
{
int ret;
- ret = io_req_defer(req, sqe);
+ ret = io_req_defer(req);
if (ret) {
if (ret != -EIOCBQUEUED) {
fail_req:
@@ -6644,18 +6636,11 @@ static void io_queue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe)
io_req_complete(req, ret);
}
} else if (req->flags & REQ_F_FORCE_ASYNC) {
- if (!req->async_data) {
- ret = io_req_defer_prep(req, sqe);
- if (unlikely(ret))
- goto fail_req;
- }
+ ret = io_req_defer_prep(req);
+ if (unlikely(ret))
+ goto fail_req;
io_queue_async_work(req);
} else {
- if (sqe) {
- ret = io_req_prep(req, sqe);
- if (unlikely(ret))
- goto fail_req;
- }
__io_queue_sqe(req);
}
}
@@ -6666,7 +6651,7 @@ static inline void io_queue_link_head(struct io_kiocb *req)
io_put_req(req);
io_req_complete(req, -ECANCELED);
} else
- io_queue_sqe(req, NULL);
+ io_queue_sqe(req);
}
/*
@@ -6788,7 +6773,11 @@ static int io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
link->head->flags |= REQ_F_FAIL_LINK;
return ret;
}
+ ret = io_req_prep(req, sqe);
+ if (unlikely(ret))
+ goto fail_req;
+ /* don't need @sqe from now on */
trace_io_uring_submit_sqe(ctx, req->opcode, req->user_data,
true, ctx->flags & IORING_SETUP_SQPOLL);
@@ -6813,7 +6802,7 @@ static int io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
head->flags |= REQ_F_IO_DRAIN;
ctx->drain_next = 1;
}
- ret = io_req_defer_prep(req, sqe);
+ ret = io_req_defer_prep(req);
if (unlikely(ret))
goto fail_req;
trace_io_uring_link(ctx, req, head);
@@ -6831,13 +6820,13 @@ static int io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
ctx->drain_next = 0;
}
if (req->flags & (REQ_F_LINK | REQ_F_HARDLINK)) {
- ret = io_req_defer_prep(req, sqe);
+ ret = io_req_defer_prep(req);
if (unlikely(ret))
req->flags |= REQ_F_FAIL_LINK;
link->head = req;
link->last = req;
} else {
- io_queue_sqe(req, sqe);
+ io_queue_sqe(req);
}
}
--
2.24.0
* [PATCH 10/11] io_uring: don't do async setup for links' heads
From: Pavel Begunkov @ 2021-02-18 18:29 UTC
To: Jens Axboe, io-uring
Now that we can do async setup without holding an SQE, we can skip
io_req_defer_prep() for link heads; the head will be tried inline and
follows all the rules of non-linked requests.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 7d54b0abbb82..45f78fd25ce2 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6820,9 +6820,6 @@ static int io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
ctx->drain_next = 0;
}
if (req->flags & (REQ_F_LINK | REQ_F_HARDLINK)) {
- ret = io_req_defer_prep(req);
- if (unlikely(ret))
- req->flags |= REQ_F_FAIL_LINK;
link->head = req;
link->last = req;
} else {
--
2.24.0
* [PATCH 11/11] io_uring: fail links more in io_submit_sqe()
From: Pavel Begunkov @ 2021-02-18 18:29 UTC
To: Jens Axboe, io-uring
Instead of marking a link with REQ_F_FAIL_LINK on an error and delaying
its failing to the caller, do it eagerly right after getting an error
in io_submit_sqe(). This renders the FAIL_LINK check in
io_queue_link_head() useless, so the helper can be removed.
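For illustration only: a small C sketch (made-up names) of the
eager-vs-deferred point, cancelling the pending chain head right where
the error is detected instead of leaving a failed flag for a later
helper to notice:

  #include <stddef.h>

  struct node { int cancelled; };
  struct chain { struct node *head; };

  void complete_cancelled(struct node *n) { n->cancelled = 1; }

  int submit_one(struct chain *c, struct node *n, int init_err)
  {
      if (init_err) {
          /* fail eagerly: cancel the pending head here and now */
          if (c->head) {
              complete_cancelled(c->head);
              c->head = NULL;
          }
          return init_err;
      }
      (void)n;
      return 0;
  }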
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 21 ++++++++-------------
1 file changed, 8 insertions(+), 13 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 45f78fd25ce2..2fdfe5fa00b0 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6645,15 +6645,6 @@ static void io_queue_sqe(struct io_kiocb *req)
}
}
-static inline void io_queue_link_head(struct io_kiocb *req)
-{
- if (unlikely(req->flags & REQ_F_FAIL_LINK)) {
- io_put_req(req);
- io_req_complete(req, -ECANCELED);
- } else
- io_queue_sqe(req);
-}
-
/*
* Check SQE restrictions (opcode and flags).
*
@@ -6768,9 +6759,13 @@ static int io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
fail_req:
io_put_req(req);
io_req_complete(req, ret);
- /* fail even hard links since we don't submit */
- if (link->head)
+ if (link->head) {
+ /* fail even hard links since we don't submit */
link->head->flags |= REQ_F_FAIL_LINK;
+ io_put_req(link->head);
+ io_req_complete(link->head, -ECANCELED);
+ link->head = NULL;
+ }
return ret;
}
ret = io_req_prep(req, sqe);
@@ -6811,7 +6806,7 @@ static int io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
/* last request of a link, enqueue the link */
if (!(req->flags & (REQ_F_LINK | REQ_F_HARDLINK))) {
- io_queue_link_head(head);
+ io_queue_sqe(head);
link->head = NULL;
}
} else {
@@ -6837,7 +6832,7 @@ static void io_submit_state_end(struct io_submit_state *state,
struct io_ring_ctx *ctx)
{
if (state->link.head)
- io_queue_link_head(state->link.head);
+ io_queue_sqe(state->link.head);
if (state->comp.nr)
io_submit_flush_completions(&state->comp, ctx);
if (state->plug_started)
--
2.24.0
* Re: [PATCH 00/11] submission path cleanups and optimisation
From: Jens Axboe @ 2021-02-18 20:25 UTC
To: Pavel Begunkov, io-uring
On 2/18/21 11:29 AM, Pavel Begunkov wrote:
> Refactor how we do io_req_prep(), which is currently spilled across
> multiple ifs and functions; that's a mess which is hard to validate.
> It also cuts down the amount of work we're doing during submission,
> where a nops(batch=32) test shows 15217 vs 16830 KIOPS, before and
> after respectively.
>
> 1-6 are easy and should change nothing functionally.
>
> 7/11 cancels the whole link, where currently it can be partially
> executed. That happens only in some cases and currently is not
> consistent. That change alters the user-visible behaviour and breaks
> one liburing test, but it looks like the right thing to do. (IMHO,
> the test is buggy in that regard).
>
> 8/11 makes us do one more opcode switch where we previously were
> doing io_req_defer_prep(). That includes all links, but the total
> performance win, removing an extra async setup in 10/11, and just
> making the whole thing cleaner justify it well enough.
Looks good and tests good, I'm going to queue this up for 5.12 as it'll
be easier for later fixes too.
--
Jens Axboe