* [PATCH v4 01/12] io_uring: support CQE32 in io_uring_cqe
2022-04-26 18:21 [PATCH v4 00/12] add large CQE support for io-uring Stefan Roesch
@ 2022-04-26 18:21 ` Stefan Roesch
2022-04-26 18:21 ` [PATCH v4 02/12] io_uring: store add. return values for CQE32 Stefan Roesch
` (11 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: Stefan Roesch @ 2022-04-26 18:21 UTC
To: io-uring, linux-nvme, kernel-team; +Cc: shr, joshi.k, Jens Axboe
This adds the big_cqe array to struct io_uring_cqe to support large
CQEs.
Co-developed-by: Jens Axboe <[email protected]>
Signed-off-by: Stefan Roesch <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Reviewed-by: Kanchan Joshi <[email protected]>
---
include/uapi/linux/io_uring.h | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index ee677dbd6a6d..7020a434e3b1 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -111,6 +111,7 @@ enum {
#define IORING_SETUP_R_DISABLED (1U << 6) /* start with ring disabled */
#define IORING_SETUP_SUBMIT_ALL (1U << 7) /* continue submit on error */
#define IORING_SETUP_SQE128 (1U << 8) /* SQEs are 128b */
+#define IORING_SETUP_CQE32 (1U << 9) /* CQEs are 32b */
enum {
IORING_OP_NOP,
@@ -208,6 +209,12 @@ struct io_uring_cqe {
__u64 user_data; /* sqe->data submission passed back */
__s32 res; /* result code for this event */
__u32 flags;
+
+ /*
+ * If the ring is initialized with IORING_SETUP_CQE32, then this field
+ * contains 16 bytes of padding, doubling the size of the CQE.
+ */
+ __u64 big_cqe[];
};
/*
--
2.30.2
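The layout change in this patch can be sketched in userspace as follows. This is a minimal sketch, not the kernel header: cqe_sketch, cqe_stride and cqe_extra are hypothetical names, and the fixed-width types stand in for __u64/__s32/__u32. It shows that the flexible array member leaves sizeof() at 16 bytes, so the 32-byte slot size has to come from the ring mode rather than from the type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the patched layout; cqe_sketch is an illustrative name. */
struct cqe_sketch {
	uint64_t user_data;
	int32_t  res;
	uint32_t flags;
	uint64_t big_cqe[];	/* flexible array member: adds nothing to sizeof() */
};

/* Bytes one completion slot occupies in the ring for the given mode. */
static size_t cqe_stride(int cqe32)
{
	return sizeof(struct cqe_sketch) << (cqe32 ? 1 : 0);
}

/* Read one of the two extra words; only meaningful on a CQE32 ring. */
static uint64_t cqe_extra(const struct cqe_sketch *cqe, unsigned int idx)
{
	assert(idx < 2);	/* a 32-byte CQE has room for exactly two extra words */
	return cqe->big_cqe[idx];
}
```

Code that walks the ring therefore cannot rely on plain pointer increments of the struct type when the ring was set up with IORING_SETUP_CQE32; it has to step by the stride for that mode.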
* [PATCH v4 02/12] io_uring: store add. return values for CQE32
2022-04-26 18:21 [PATCH v4 00/12] add large CQE support for io-uring Stefan Roesch
2022-04-26 18:21 ` [PATCH v4 01/12] io_uring: support CQE32 in io_uring_cqe Stefan Roesch
@ 2022-04-26 18:21 ` Stefan Roesch
2022-04-26 18:21 ` [PATCH v4 03/12] io_uring: change ring size calculation " Stefan Roesch
` (10 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: Stefan Roesch @ 2022-04-26 18:21 UTC
To: io-uring, linux-nvme, kernel-team; +Cc: shr, joshi.k, Jens Axboe
This reuses the hash list node for the storage we need to hold the two
64-bit values that must be passed back.
Co-developed-by: Jens Axboe <[email protected]>
Signed-off-by: Stefan Roesch <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Reviewed-by: Kanchan Joshi <[email protected]>
---
fs/io_uring.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 4c32cf987ef3..bf2b02518332 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -964,7 +964,13 @@ struct io_kiocb {
atomic_t poll_refs;
struct io_task_work io_task_work;
/* for polled requests, i.e. IORING_OP_POLL_ADD and async armed poll */
- struct hlist_node hash_node;
+ union {
+ struct hlist_node hash_node;
+ struct {
+ u64 extra1;
+ u64 extra2;
+ };
+ };
/* internal polling, see IORING_FEAT_FAST_POLL */
struct async_poll *apoll;
/* opcode allocated if it needs to store data for async defer */
--
2.30.2
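The storage reuse in this patch can be illustrated with a small userspace sketch. All names here (hnode_sketch, req_sketch, req_set_extra) are hypothetical, and hnode_sketch only approximates the kernel's struct hlist_node as two pointers. The sketch assumes, as the patch implies, that a request never needs the hash node and the extra values at the same time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel's struct hlist_node: two pointers. */
struct hnode_sketch {
	struct hnode_sketch *next;
	struct hnode_sketch **pprev;
};

/* Only the overlapping part of the request is modelled here. */
struct req_sketch {
	union {
		struct hnode_sketch hash_node;	/* used while hashed for poll */
		struct {
			uint64_t extra1;	/* first extra CQE32 value */
			uint64_t extra2;	/* second extra CQE32 value */
		};
	};
};

/* Store the two values that will later be copied into big_cqe[]. */
static void req_set_extra(struct req_sketch *req, uint64_t e1, uint64_t e2)
{
	assert(req != NULL);
	req->extra1 = e1;
	req->extra2 = e2;
}
```

On a target with 8-byte pointers the two union arms have the same size, so the request structure does not grow.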
* [PATCH v4 03/12] io_uring: change ring size calculation for CQE32
2022-04-26 18:21 [PATCH v4 00/12] add large CQE support for io-uring Stefan Roesch
2022-04-26 18:21 ` [PATCH v4 01/12] io_uring: support CQE32 in io_uring_cqe Stefan Roesch
2022-04-26 18:21 ` [PATCH v4 02/12] io_uring: store add. return values for CQE32 Stefan Roesch
@ 2022-04-26 18:21 ` Stefan Roesch
2022-04-26 18:21 ` [PATCH v4 04/12] io_uring: add CQE32 setup processing Stefan Roesch
` (9 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: Stefan Roesch @ 2022-04-26 18:21 UTC
To: io-uring, linux-nvme, kernel-team; +Cc: shr, joshi.k, Jens Axboe
This changes the function rings_size to take large CQEs into account.
Co-developed-by: Jens Axboe <[email protected]>
Signed-off-by: Stefan Roesch <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Reviewed-by: Kanchan Joshi <[email protected]>
---
fs/io_uring.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index bf2b02518332..9712483d3a17 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -9693,8 +9693,8 @@ static void *io_mem_alloc(size_t size)
return (void *) __get_free_pages(gfp, get_order(size));
}
-static unsigned long rings_size(unsigned sq_entries, unsigned cq_entries,
- size_t *sq_offset)
+static unsigned long rings_size(struct io_ring_ctx *ctx, unsigned int sq_entries,
+ unsigned int cq_entries, size_t *sq_offset)
{
struct io_rings *rings;
size_t off, sq_array_size;
@@ -9702,6 +9702,10 @@ static unsigned long rings_size(unsigned sq_entries, unsigned cq_entries,
off = struct_size(rings, cqes, cq_entries);
if (off == SIZE_MAX)
return SIZE_MAX;
+ if (ctx->flags & IORING_SETUP_CQE32) {
+ if (check_shl_overflow(off, 1, &off))
+ return SIZE_MAX;
+ }
#ifdef CONFIG_SMP
off = ALIGN(off, SMP_CACHE_BYTES);
@@ -11365,7 +11369,7 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx,
ctx->sq_entries = p->sq_entries;
ctx->cq_entries = p->cq_entries;
- size = rings_size(p->sq_entries, p->cq_entries, &sq_array_offset);
+ size = rings_size(ctx, p->sq_entries, p->cq_entries, &sq_array_offset);
if (size == SIZE_MAX)
return -EOVERFLOW;
--
2.30.2
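The size calculation in this patch can be sketched in userspace as below. This is a sketch under assumptions: cq_region_size and header_bytes are hypothetical names (header_bytes stands in for the fixed part of struct io_rings), and plain comparisons replace the kernel's struct_size() and check_shl_overflow() helpers. As in the patch, the doubling is applied to the whole offset, header included, and overflow is reported as SIZE_MAX.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CQE_BYTES 16u	/* size of a regular CQE */

/* Bytes needed for the ring header plus the CQ array, or SIZE_MAX on overflow. */
static size_t cq_region_size(size_t header_bytes, size_t cq_entries, int cqe32)
{
	size_t off;

	/* Indices are masked with cq_entries - 1, so the count is a power of two. */
	assert(cq_entries && !(cq_entries & (cq_entries - 1)));

	/* header + entries * 16, with the multiplication and addition checked. */
	if (cq_entries > (SIZE_MAX - header_bytes) / CQE_BYTES)
		return SIZE_MAX;
	off = header_bytes + cq_entries * CQE_BYTES;

	if (cqe32) {
		/* Doubling is a left shift by one; refuse if the top bit is set. */
		if (off > SIZE_MAX / 2)
			return SIZE_MAX;
		off <<= 1;
	}
	return off;
}
```

For example, a 64-byte header with 8 entries needs 192 bytes with regular CQEs and 384 bytes with large ones.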
* [PATCH v4 04/12] io_uring: add CQE32 setup processing
2022-04-26 18:21 [PATCH v4 00/12] add large CQE support for io-uring Stefan Roesch
` (2 preceding siblings ...)
2022-04-26 18:21 ` [PATCH v4 03/12] io_uring: change ring size calculation " Stefan Roesch
@ 2022-04-26 18:21 ` Stefan Roesch
2022-04-26 18:21 ` [PATCH v4 05/12] io_uring: add CQE32 completion processing Stefan Roesch
` (8 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: Stefan Roesch @ 2022-04-26 18:21 UTC
To: io-uring, linux-nvme, kernel-team; +Cc: shr, joshi.k
This adds two new functions to set up and fill the CQE32 result structure.
Signed-off-by: Stefan Roesch <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Reviewed-by: Kanchan Joshi <[email protected]>
---
fs/io_uring.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 58 insertions(+)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 9712483d3a17..8cb51676d38d 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2175,12 +2175,70 @@ static inline bool __io_fill_cqe_req_filled(struct io_ring_ctx *ctx,
req->cqe.res, req->cqe.flags);
}
+static inline bool __io_fill_cqe32_req_filled(struct io_ring_ctx *ctx,
+ struct io_kiocb *req)
+{
+ struct io_uring_cqe *cqe;
+ u64 extra1 = req->extra1;
+ u64 extra2 = req->extra2;
+
+ trace_io_uring_complete(req->ctx, req, req->cqe.user_data,
+ req->cqe.res, req->cqe.flags);
+
+ /*
+ * If we can't get a cq entry, userspace overflowed the
+ * submission (by quite a lot). Increment the overflow count in
+ * the ring.
+ */
+ cqe = io_get_cqe(ctx);
+ if (likely(cqe)) {
+ memcpy(cqe, &req->cqe, sizeof(struct io_uring_cqe));
+ cqe->big_cqe[0] = extra1;
+ cqe->big_cqe[1] = extra2;
+ return true;
+ }
+
+ return io_cqring_event_overflow(ctx, req->cqe.user_data,
+ req->cqe.res, req->cqe.flags);
+}
+
static inline bool __io_fill_cqe_req(struct io_kiocb *req, s32 res, u32 cflags)
{
trace_io_uring_complete(req->ctx, req, req->cqe.user_data, res, cflags);
return __io_fill_cqe(req->ctx, req->cqe.user_data, res, cflags);
}
+static inline void __io_fill_cqe32_req(struct io_kiocb *req, s32 res, u32 cflags,
+ u64 extra1, u64 extra2)
+{
+ struct io_ring_ctx *ctx = req->ctx;
+ struct io_uring_cqe *cqe;
+
+ if (WARN_ON_ONCE(!(ctx->flags & IORING_SETUP_CQE32)))
+ return;
+ if (req->flags & REQ_F_CQE_SKIP)
+ return;
+
+ trace_io_uring_complete(ctx, req, req->cqe.user_data, res, cflags);
+
+ /*
+ * If we can't get a cq entry, userspace overflowed the
+ * submission (by quite a lot). Increment the overflow count in
+ * the ring.
+ */
+ cqe = io_get_cqe(ctx);
+ if (likely(cqe)) {
+ WRITE_ONCE(cqe->user_data, req->cqe.user_data);
+ WRITE_ONCE(cqe->res, res);
+ WRITE_ONCE(cqe->flags, cflags);
+ WRITE_ONCE(cqe->big_cqe[0], extra1);
+ WRITE_ONCE(cqe->big_cqe[1], extra2);
+ return;
+ }
+
+ io_cqring_event_overflow(ctx, req->cqe.user_data, res, cflags);
+}
+
static noinline bool io_fill_cqe_aux(struct io_ring_ctx *ctx, u64 user_data,
s32 res, u32 cflags)
{
--
2.30.2
* [PATCH v4 05/12] io_uring: add CQE32 completion processing
2022-04-26 18:21 [PATCH v4 00/12] add large CQE support for io-uring Stefan Roesch
` (3 preceding siblings ...)
2022-04-26 18:21 ` [PATCH v4 04/12] io_uring: add CQE32 setup processing Stefan Roesch
@ 2022-04-26 18:21 ` Stefan Roesch
2022-04-26 18:21 ` [PATCH v4 06/12] io_uring: modify io_get_cqe for CQE32 Stefan Roesch
` (7 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: Stefan Roesch @ 2022-04-26 18:21 UTC
To: io-uring, linux-nvme, kernel-team; +Cc: shr, joshi.k, Jens Axboe
This adds the completion processing for large CQEs and makes sure
that the extra1 and extra2 fields are passed through.
Co-developed-by: Jens Axboe <[email protected]>
Signed-off-by: Stefan Roesch <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Reviewed-by: Kanchan Joshi <[email protected]>
---
fs/io_uring.c | 53 +++++++++++++++++++++++++++++++++++++++++++--------
1 file changed, 45 insertions(+), 8 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 8cb51676d38d..f300130fd9f0 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2247,18 +2247,15 @@ static noinline bool io_fill_cqe_aux(struct io_ring_ctx *ctx, u64 user_data,
return __io_fill_cqe(ctx, user_data, res, cflags);
}
-static void __io_req_complete_post(struct io_kiocb *req, s32 res,
- u32 cflags)
+static void __io_req_complete_put(struct io_kiocb *req)
{
- struct io_ring_ctx *ctx = req->ctx;
-
- if (!(req->flags & REQ_F_CQE_SKIP))
- __io_fill_cqe_req(req, res, cflags);
/*
* If we're the last reference to this request, add to our locked
* free_list cache.
*/
if (req_ref_put_and_test(req)) {
+ struct io_ring_ctx *ctx = req->ctx;
+
if (req->flags & IO_REQ_LINK_FLAGS) {
if (req->flags & IO_DISARM_MASK)
io_disarm_next(req);
@@ -2281,8 +2278,23 @@ static void __io_req_complete_post(struct io_kiocb *req, s32 res,
}
}
-static void io_req_complete_post(struct io_kiocb *req, s32 res,
- u32 cflags)
+static void __io_req_complete_post(struct io_kiocb *req, s32 res,
+ u32 cflags)
+{
+ if (!(req->flags & REQ_F_CQE_SKIP))
+ __io_fill_cqe_req(req, res, cflags);
+ __io_req_complete_put(req);
+}
+
+static void __io_req_complete_post32(struct io_kiocb *req, s32 res,
+ u32 cflags, u64 extra1, u64 extra2)
+{
+ if (!(req->flags & REQ_F_CQE_SKIP))
+ __io_fill_cqe32_req(req, res, cflags, extra1, extra2);
+ __io_req_complete_put(req);
+}
+
+static void io_req_complete_post(struct io_kiocb *req, s32 res, u32 cflags)
{
struct io_ring_ctx *ctx = req->ctx;
@@ -2293,6 +2305,18 @@ static void io_req_complete_post(struct io_kiocb *req, s32 res,
io_cqring_ev_posted(ctx);
}
+static void io_req_complete_post32(struct io_kiocb *req, s32 res,
+ u32 cflags, u64 extra1, u64 extra2)
+{
+ struct io_ring_ctx *ctx = req->ctx;
+
+ spin_lock(&ctx->completion_lock);
+ __io_req_complete_post32(req, res, cflags, extra1, extra2);
+ io_commit_cqring(ctx);
+ spin_unlock(&ctx->completion_lock);
+ io_cqring_ev_posted(ctx);
+}
+
static inline void io_req_complete_state(struct io_kiocb *req, s32 res,
u32 cflags)
{
@@ -2310,6 +2334,19 @@ static inline void __io_req_complete(struct io_kiocb *req, unsigned issue_flags,
io_req_complete_post(req, res, cflags);
}
+static inline void __io_req_complete32(struct io_kiocb *req,
+ unsigned int issue_flags, s32 res,
+ u32 cflags, u64 extra1, u64 extra2)
+{
+ if (issue_flags & IO_URING_F_COMPLETE_DEFER) {
+ io_req_complete_state(req, res, cflags);
+ req->extra1 = extra1;
+ req->extra2 = extra2;
+ } else {
+ io_req_complete_post32(req, res, cflags, extra1, extra2);
+ }
+}
+
static inline void io_req_complete(struct io_kiocb *req, s32 res)
{
__io_req_complete(req, 0, res, 0);
--
2.30.2
* [PATCH v4 06/12] io_uring: modify io_get_cqe for CQE32
2022-04-26 18:21 [PATCH v4 00/12] add large CQE support for io-uring Stefan Roesch
` (4 preceding siblings ...)
2022-04-26 18:21 ` [PATCH v4 05/12] io_uring: add CQE32 completion processing Stefan Roesch
@ 2022-04-26 18:21 ` Stefan Roesch
2022-04-26 18:21 ` [PATCH v4 07/12] io_uring: flush completions " Stefan Roesch
` (6 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: Stefan Roesch @ 2022-04-26 18:21 UTC
To: io-uring, linux-nvme, kernel-team; +Cc: shr, joshi.k
Modify accesses to the CQE array to take large CQEs into account. The
index needs to be shifted left by one bit (doubled) for large CQEs.
Signed-off-by: Stefan Roesch <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Reviewed-by: Kanchan Joshi <[email protected]>
---
fs/io_uring.c | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index f300130fd9f0..726238dc65dc 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1909,8 +1909,12 @@ static noinline struct io_uring_cqe *__io_get_cqe(struct io_ring_ctx *ctx)
{
struct io_rings *rings = ctx->rings;
unsigned int off = ctx->cached_cq_tail & (ctx->cq_entries - 1);
+ unsigned int shift = 0;
unsigned int free, queued, len;
+ if (ctx->flags & IORING_SETUP_CQE32)
+ shift = 1;
+
/* userspace may cheat modifying the tail, be safe and do min */
queued = min(__io_cqring_events(ctx), ctx->cq_entries);
free = ctx->cq_entries - queued;
@@ -1922,15 +1926,26 @@ static noinline struct io_uring_cqe *__io_get_cqe(struct io_ring_ctx *ctx)
ctx->cached_cq_tail++;
ctx->cqe_cached = &rings->cqes[off];
ctx->cqe_sentinel = ctx->cqe_cached + len;
- return ctx->cqe_cached++;
+ ctx->cqe_cached++;
+ return &rings->cqes[off << shift];
}
static inline struct io_uring_cqe *io_get_cqe(struct io_ring_ctx *ctx)
{
if (likely(ctx->cqe_cached < ctx->cqe_sentinel)) {
+ struct io_uring_cqe *cqe = ctx->cqe_cached;
+
+ if (ctx->flags & IORING_SETUP_CQE32) {
+ unsigned int off = ctx->cqe_cached - ctx->rings->cqes;
+
+ cqe += off;
+ }
+
ctx->cached_cq_tail++;
- return ctx->cqe_cached++;
+ ctx->cqe_cached++;
+ return cqe;
}
+
return __io_get_cqe(ctx);
}
--
2.30.2
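The two index computations in this patch can be compared in a small userspace sketch. The function names are hypothetical and the sketch works with byte offsets instead of kernel pointers: slot_offset_by_index models the slow path (off << shift), and slot_offset_by_cached models the fast path, where the cached pointer still advances in 16-byte steps and its own index is added once more.

```c
#include <assert.h>
#include <stddef.h>

#define CQE_BYTES 16u	/* the CQE array is indexed in 16-byte units */

/* Slow path: mask the tail, then shift the index for large CQEs. */
static size_t slot_offset_by_index(unsigned int tail, unsigned int entries,
				   int cqe32)
{
	unsigned int shift = cqe32 ? 1 : 0;
	size_t off;

	/* Masking with entries - 1 only works for power-of-two ring sizes. */
	assert(entries && !(entries & (entries - 1)));
	off = tail & (entries - 1);
	return (off << shift) * CQE_BYTES;
}

/* Fast path: cached_idx is the cached pointer's distance from the array start. */
static size_t slot_offset_by_cached(size_t cached_idx, int cqe32)
{
	size_t units = cached_idx;

	if (cqe32)
		units += cached_idx;	/* idx + idx == idx << 1 */
	return units * CQE_BYTES;
}
```

Both paths yield the same byte offset for a given slot, which is why the cached pointer and the sentinel can keep counting in regular-CQE units.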
* [PATCH v4 07/12] io_uring: flush completions for CQE32
2022-04-26 18:21 [PATCH v4 00/12] add large CQE support for io-uring Stefan Roesch
` (5 preceding siblings ...)
2022-04-26 18:21 ` [PATCH v4 06/12] io_uring: modify io_get_cqe for CQE32 Stefan Roesch
@ 2022-04-26 18:21 ` Stefan Roesch
2022-04-26 18:21 ` [PATCH v4 08/12] io_uring: overflow processing " Stefan Roesch
` (5 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: Stefan Roesch @ 2022-04-26 18:21 UTC
To: io-uring, linux-nvme, kernel-team; +Cc: shr, joshi.k
This flushes the completions according to their CQE type: the same
processing is done for the default CQE size, but for large CQEs the
extra1 and extra2 fields are filled in.
Signed-off-by: Stefan Roesch <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Reviewed-by: Kanchan Joshi <[email protected]>
---
fs/io_uring.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 726238dc65dc..68b61d2b356d 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2885,8 +2885,12 @@ static void __io_submit_flush_completions(struct io_ring_ctx *ctx)
struct io_kiocb *req = container_of(node, struct io_kiocb,
comp_list);
- if (!(req->flags & REQ_F_CQE_SKIP))
- __io_fill_cqe_req_filled(ctx, req);
+ if (!(req->flags & REQ_F_CQE_SKIP)) {
+ if (!(ctx->flags & IORING_SETUP_CQE32))
+ __io_fill_cqe_req_filled(ctx, req);
+ else
+ __io_fill_cqe32_req_filled(ctx, req);
+ }
}
io_commit_cqring(ctx);
--
2.30.2
* [PATCH v4 08/12] io_uring: overflow processing for CQE32
2022-04-26 18:21 [PATCH v4 00/12] add large CQE support for io-uring Stefan Roesch
` (6 preceding siblings ...)
2022-04-26 18:21 ` [PATCH v4 07/12] io_uring: flush completions " Stefan Roesch
@ 2022-04-26 18:21 ` Stefan Roesch
2022-04-26 18:21 ` [PATCH v4 09/12] io_uring: add tracing for additional CQE32 fields Stefan Roesch
` (4 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: Stefan Roesch @ 2022-04-26 18:21 UTC
To: io-uring, linux-nvme, kernel-team; +Cc: shr, joshi.k, Jens Axboe
This adds the overflow processing for large CQEs.
This adds two parameters to the io_cqring_event_overflow function and
uses them to initialize the large CQE fields.
Allocate enough space for large CQEs in the overflow structure. If no
large CQEs are used, the size of the allocation is unchanged.
The cqe field can have a different size depending on whether it is a
large CQE or not. To be able to allocate different sizes, the two fields
in the structure are re-ordered so that cqe comes last.
Co-developed-by: Jens Axboe <[email protected]>
Signed-off-by: Stefan Roesch <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Reviewed-by: Kanchan Joshi <[email protected]>
---
fs/io_uring.c | 31 ++++++++++++++++++++++---------
1 file changed, 22 insertions(+), 9 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 68b61d2b356d..3630671325ea 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -220,8 +220,8 @@ struct io_mapped_ubuf {
struct io_ring_ctx;
struct io_overflow_cqe {
- struct io_uring_cqe cqe;
struct list_head list;
+ struct io_uring_cqe cqe;
};
struct io_fixed_file {
@@ -2017,10 +2017,14 @@ static void io_cqring_ev_posted_iopoll(struct io_ring_ctx *ctx)
static bool __io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force)
{
bool all_flushed, posted;
+ size_t cqe_size = sizeof(struct io_uring_cqe);
if (!force && __io_cqring_events(ctx) == ctx->cq_entries)
return false;
+ if (ctx->flags & IORING_SETUP_CQE32)
+ cqe_size <<= 1;
+
posted = false;
spin_lock(&ctx->completion_lock);
while (!list_empty(&ctx->cq_overflow_list)) {
@@ -2032,7 +2036,7 @@ static bool __io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force)
ocqe = list_first_entry(&ctx->cq_overflow_list,
struct io_overflow_cqe, list);
if (cqe)
- memcpy(cqe, &ocqe->cqe, sizeof(*cqe));
+ memcpy(cqe, &ocqe->cqe, cqe_size);
else
io_account_cq_overflow(ctx);
@@ -2121,11 +2125,16 @@ static __cold void io_uring_drop_tctx_refs(struct task_struct *task)
}
static bool io_cqring_event_overflow(struct io_ring_ctx *ctx, u64 user_data,
- s32 res, u32 cflags)
+ s32 res, u32 cflags, u64 extra1, u64 extra2)
{
struct io_overflow_cqe *ocqe;
+ size_t ocq_size = sizeof(struct io_overflow_cqe);
+ bool is_cqe32 = (ctx->flags & IORING_SETUP_CQE32);
+
+ if (is_cqe32)
+ ocq_size += sizeof(struct io_uring_cqe);
- ocqe = kmalloc(sizeof(*ocqe), GFP_ATOMIC | __GFP_ACCOUNT);
+ ocqe = kmalloc(ocq_size, GFP_ATOMIC | __GFP_ACCOUNT);
if (!ocqe) {
/*
* If we're in ring overflow flush mode, or in task cancel mode,
@@ -2144,6 +2153,10 @@ static bool io_cqring_event_overflow(struct io_ring_ctx *ctx, u64 user_data,
ocqe->cqe.user_data = user_data;
ocqe->cqe.res = res;
ocqe->cqe.flags = cflags;
+ if (is_cqe32) {
+ ocqe->cqe.big_cqe[0] = extra1;
+ ocqe->cqe.big_cqe[1] = extra2;
+ }
list_add_tail(&ocqe->list, &ctx->cq_overflow_list);
return true;
}
@@ -2165,7 +2178,7 @@ static inline bool __io_fill_cqe(struct io_ring_ctx *ctx, u64 user_data,
WRITE_ONCE(cqe->flags, cflags);
return true;
}
- return io_cqring_event_overflow(ctx, user_data, res, cflags);
+ return io_cqring_event_overflow(ctx, user_data, res, cflags, 0, 0);
}
static inline bool __io_fill_cqe_req_filled(struct io_ring_ctx *ctx,
@@ -2187,7 +2200,7 @@ static inline bool __io_fill_cqe_req_filled(struct io_ring_ctx *ctx,
return true;
}
return io_cqring_event_overflow(ctx, req->cqe.user_data,
- req->cqe.res, req->cqe.flags);
+ req->cqe.res, req->cqe.flags, 0, 0);
}
static inline bool __io_fill_cqe32_req_filled(struct io_ring_ctx *ctx,
@@ -2213,8 +2226,8 @@ static inline bool __io_fill_cqe32_req_filled(struct io_ring_ctx *ctx,
return true;
}
- return io_cqring_event_overflow(ctx, req->cqe.user_data,
- req->cqe.res, req->cqe.flags);
+ return io_cqring_event_overflow(ctx, req->cqe.user_data, req->cqe.res,
+ req->cqe.flags, extra1, extra2);
}
static inline bool __io_fill_cqe_req(struct io_kiocb *req, s32 res, u32 cflags)
@@ -2251,7 +2264,7 @@ static inline void __io_fill_cqe32_req(struct io_kiocb *req, s32 res, u32 cflags
return;
}
- io_cqring_event_overflow(ctx, req->cqe.user_data, res, cflags);
+ io_cqring_event_overflow(ctx, req->cqe.user_data, res, cflags, extra1, extra2);
}
static noinline bool io_fill_cqe_aux(struct io_ring_ctx *ctx, u64 user_data,
--
2.30.2
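The variable-size overflow entry from this patch can be sketched in userspace as follows. The names are hypothetical, malloc() stands in for kmalloc(), and the CQE fields are flattened into the entry so that the flexible array member is the last member of the allocated struct. That is also why the patch moves cqe behind list: the extra 16 bytes have to sit at the end of the allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for the kernel's struct list_head. */
struct list_sketch {
	struct list_sketch *next, *prev;
};

/* Overflow entry with the CQE fields flattened in; big_cqe[] must be last. */
struct overflow_sketch {
	struct list_sketch list;
	uint64_t user_data;
	int32_t  res;
	uint32_t flags;
	uint64_t big_cqe[];
};

/* Regular rings pay nothing extra; CQE32 rings add room for two words. */
static size_t overflow_alloc_size(int cqe32)
{
	size_t size = sizeof(struct overflow_sketch);

	if (cqe32)
		size += 2 * sizeof(uint64_t);
	return size;
}

static struct overflow_sketch *overflow_alloc(int cqe32, uint64_t user_data,
					      int32_t res, uint32_t flags,
					      uint64_t extra1, uint64_t extra2)
{
	struct overflow_sketch *o;

	/* Callers on regular rings pass 0, 0 for the extra values. */
	assert(cqe32 || (extra1 == 0 && extra2 == 0));

	o = malloc(overflow_alloc_size(cqe32));
	if (!o)
		return NULL;
	o->list.next = o->list.prev = &o->list;
	o->user_data = user_data;
	o->res = res;
	o->flags = flags;
	if (cqe32) {
		/* Only touch the trailing words when they were allocated. */
		o->big_cqe[0] = extra1;
		o->big_cqe[1] = extra2;
	}
	return o;
}
```

When such an entry is flushed back into the ring, the copy length has to follow the same rule, which is what the cqe_size variable in __io_cqring_overflow_flush() does.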
* [PATCH v4 09/12] io_uring: add tracing for additional CQE32 fields
2022-04-26 18:21 [PATCH v4 00/12] add large CQE support for io-uring Stefan Roesch
` (7 preceding siblings ...)
2022-04-26 18:21 ` [PATCH v4 08/12] io_uring: overflow processing " Stefan Roesch
@ 2022-04-26 18:21 ` Stefan Roesch
2022-04-26 18:21 ` [PATCH v4 10/12] io_uring: support CQE32 in /proc info Stefan Roesch
` (3 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: Stefan Roesch @ 2022-04-26 18:21 UTC
To: io-uring, linux-nvme, kernel-team; +Cc: shr, joshi.k, Jens Axboe
This adds tracing for the extra1 and extra2 fields.
Co-developed-by: Jens Axboe <[email protected]>
Signed-off-by: Stefan Roesch <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Reviewed-by: Kanchan Joshi <[email protected]>
---
fs/io_uring.c | 11 ++++++-----
include/trace/events/io_uring.h | 18 ++++++++++++++----
2 files changed, 20 insertions(+), 9 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 3630671325ea..9dd075e39850 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2187,7 +2187,7 @@ static inline bool __io_fill_cqe_req_filled(struct io_ring_ctx *ctx,
struct io_uring_cqe *cqe;
trace_io_uring_complete(req->ctx, req, req->cqe.user_data,
- req->cqe.res, req->cqe.flags);
+ req->cqe.res, req->cqe.flags, 0, 0);
/*
* If we can't get a cq entry, userspace overflowed the
@@ -2211,7 +2211,7 @@ static inline bool __io_fill_cqe32_req_filled(struct io_ring_ctx *ctx,
u64 extra2 = req->extra2;
trace_io_uring_complete(req->ctx, req, req->cqe.user_data,
- req->cqe.res, req->cqe.flags);
+ req->cqe.res, req->cqe.flags, extra1, extra2);
/*
* If we can't get a cq entry, userspace overflowed the
@@ -2232,7 +2232,7 @@ static inline bool __io_fill_cqe32_req_filled(struct io_ring_ctx *ctx,
static inline bool __io_fill_cqe_req(struct io_kiocb *req, s32 res, u32 cflags)
{
- trace_io_uring_complete(req->ctx, req, req->cqe.user_data, res, cflags);
+ trace_io_uring_complete(req->ctx, req, req->cqe.user_data, res, cflags, 0, 0);
return __io_fill_cqe(req->ctx, req->cqe.user_data, res, cflags);
}
@@ -2247,7 +2247,8 @@ static inline void __io_fill_cqe32_req(struct io_kiocb *req, s32 res, u32 cflags
if (req->flags & REQ_F_CQE_SKIP)
return;
- trace_io_uring_complete(ctx, req, req->cqe.user_data, res, cflags);
+ trace_io_uring_complete(ctx, req, req->cqe.user_data, res, cflags,
+ extra1, extra2);
/*
* If we can't get a cq entry, userspace overflowed the
@@ -2271,7 +2272,7 @@ static noinline bool io_fill_cqe_aux(struct io_ring_ctx *ctx, u64 user_data,
s32 res, u32 cflags)
{
ctx->cq_extra++;
- trace_io_uring_complete(ctx, NULL, user_data, res, cflags);
+ trace_io_uring_complete(ctx, NULL, user_data, res, cflags, 0, 0);
return __io_fill_cqe(ctx, user_data, res, cflags);
}
diff --git a/include/trace/events/io_uring.h b/include/trace/events/io_uring.h
index 8477414d6d06..2eb4f4e47de4 100644
--- a/include/trace/events/io_uring.h
+++ b/include/trace/events/io_uring.h
@@ -318,13 +318,16 @@ TRACE_EVENT(io_uring_fail_link,
* @user_data: user data associated with the request
* @res: result of the request
* @cflags: completion flags
+ * @extra1: extra 64-bit data for CQE32
+ * @extra2: extra 64-bit data for CQE32
*
*/
TRACE_EVENT(io_uring_complete,
- TP_PROTO(void *ctx, void *req, u64 user_data, int res, unsigned cflags),
+ TP_PROTO(void *ctx, void *req, u64 user_data, int res, unsigned cflags,
+ u64 extra1, u64 extra2),
- TP_ARGS(ctx, req, user_data, res, cflags),
+ TP_ARGS(ctx, req, user_data, res, cflags, extra1, extra2),
TP_STRUCT__entry (
__field( void *, ctx )
@@ -332,6 +335,8 @@ TRACE_EVENT(io_uring_complete,
__field( u64, user_data )
__field( int, res )
__field( unsigned, cflags )
+ __field( u64, extra1 )
+ __field( u64, extra2 )
),
TP_fast_assign(
@@ -340,12 +345,17 @@ TRACE_EVENT(io_uring_complete,
__entry->user_data = user_data;
__entry->res = res;
__entry->cflags = cflags;
+ __entry->extra1 = extra1;
+ __entry->extra2 = extra2;
),
- TP_printk("ring %p, req %p, user_data 0x%llx, result %d, cflags 0x%x",
+ TP_printk("ring %p, req %p, user_data 0x%llx, result %d, cflags 0x%x "
+ "extra1 %llu extra2 %llu ",
__entry->ctx, __entry->req,
__entry->user_data,
- __entry->res, __entry->cflags)
+ __entry->res, __entry->cflags,
+ (unsigned long long) __entry->extra1,
+ (unsigned long long) __entry->extra2)
);
/**
--
2.30.2
* [PATCH v4 10/12] io_uring: support CQE32 in /proc info
2022-04-26 18:21 [PATCH v4 00/12] add large CQE support for io-uring Stefan Roesch
` (8 preceding siblings ...)
2022-04-26 18:21 ` [PATCH v4 09/12] io_uring: add tracing for additional CQE32 fields Stefan Roesch
@ 2022-04-26 18:21 ` Stefan Roesch
2022-04-26 18:21 ` [PATCH v4 11/12] io_uring: enable CQE32 Stefan Roesch
` (2 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: Stefan Roesch @ 2022-04-26 18:21 UTC
To: io-uring, linux-nvme, kernel-team; +Cc: shr, joshi.k
This exposes the extra1 and extra2 fields in the /proc output.
Signed-off-by: Stefan Roesch <[email protected]>
Reviewed-by: Kanchan Joshi <[email protected]>
---
fs/io_uring.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 9dd075e39850..e1b84204b0ab 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -11354,10 +11354,15 @@ static __cold void __io_uring_show_fdinfo(struct io_ring_ctx *ctx,
unsigned int sq_tail = READ_ONCE(r->sq.tail);
unsigned int cq_head = READ_ONCE(r->cq.head);
unsigned int cq_tail = READ_ONCE(r->cq.tail);
+ unsigned int cq_shift = 0;
unsigned int sq_entries, cq_entries;
bool has_lock;
+ bool is_cqe32 = (ctx->flags & IORING_SETUP_CQE32);
unsigned int i;
+ if (is_cqe32)
+ cq_shift = 1;
+
/*
* we may get imprecise sqe and cqe info if uring is actively running
* since we get cached_sq_head and cached_cq_tail without uring_lock
@@ -11390,11 +11395,18 @@ static __cold void __io_uring_show_fdinfo(struct io_ring_ctx *ctx,
cq_entries = min(cq_tail - cq_head, ctx->cq_entries);
for (i = 0; i < cq_entries; i++) {
unsigned int entry = i + cq_head;
- struct io_uring_cqe *cqe = &r->cqes[entry & cq_mask];
+ struct io_uring_cqe *cqe = &r->cqes[(entry & cq_mask) << cq_shift];
- seq_printf(m, "%5u: user_data:%llu, res:%d, flag:%x\n",
+ if (!is_cqe32) {
+ seq_printf(m, "%5u: user_data:%llu, res:%d, flag:%x\n",
entry & cq_mask, cqe->user_data, cqe->res,
cqe->flags);
+ } else {
+ seq_printf(m, "%5u: user_data:%llu, res:%d, flag:%x, "
+ "extra1:%llu, extra2:%llu\n",
+ entry & cq_mask, cqe->user_data, cqe->res,
+ cqe->flags, cqe->big_cqe[0], cqe->big_cqe[1]);
+ }
}
/*
--
2.30.2
* [PATCH v4 11/12] io_uring: enable CQE32
2022-04-26 18:21 [PATCH v4 00/12] add large CQE support for io-uring Stefan Roesch
` (9 preceding siblings ...)
2022-04-26 18:21 ` [PATCH v4 10/12] io_uring: support CQE32 in /proc info Stefan Roesch
@ 2022-04-26 18:21 ` Stefan Roesch
2022-04-26 18:21 ` [PATCH v4 12/12] io_uring: support CQE32 for nop operation Stefan Roesch
2022-04-26 22:58 ` [PATCH v4 00/12] add large CQE support for io-uring Jens Axboe
12 siblings, 0 replies; 14+ messages in thread
From: Stefan Roesch @ 2022-04-26 18:21 UTC
To: io-uring, linux-nvme, kernel-team; +Cc: shr, joshi.k, Jens Axboe
This enables large CQEs in the uring setup.
Co-developed-by: Jens Axboe <[email protected]>
Signed-off-by: Stefan Roesch <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Reviewed-by: Kanchan Joshi <[email protected]>
---
fs/io_uring.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index e1b84204b0ab..caeddcf8a61c 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -11752,7 +11752,7 @@ static long io_uring_setup(u32 entries, struct io_uring_params __user *params)
IORING_SETUP_SQ_AFF | IORING_SETUP_CQSIZE |
IORING_SETUP_CLAMP | IORING_SETUP_ATTACH_WQ |
IORING_SETUP_R_DISABLED | IORING_SETUP_SUBMIT_ALL |
- IORING_SETUP_SQE128))
+ IORING_SETUP_SQE128 | IORING_SETUP_CQE32))
return -EINVAL;
return io_uring_create(entries, &p, params);
--
2.30.2
* [PATCH v4 12/12] io_uring: support CQE32 for nop operation
2022-04-26 18:21 [PATCH v4 00/12] add large CQE support for io-uring Stefan Roesch
` (10 preceding siblings ...)
2022-04-26 18:21 ` [PATCH v4 11/12] io_uring: enable CQE32 Stefan Roesch
@ 2022-04-26 18:21 ` Stefan Roesch
2022-04-26 22:58 ` [PATCH v4 00/12] add large CQE support for io-uring Jens Axboe
12 siblings, 0 replies; 14+ messages in thread
From: Stefan Roesch @ 2022-04-26 18:21 UTC
To: io-uring, linux-nvme, kernel-team; +Cc: shr, joshi.k, Jens Axboe
This adds support for filling the extra1 and extra2 fields for large
CQEs in the nop operation.
Co-developed-by: Jens Axboe <[email protected]>
Signed-off-by: Stefan Roesch <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Reviewed-by: Kanchan Joshi <[email protected]>
---
fs/io_uring.c | 28 ++++++++++++++++++++++++++--
1 file changed, 26 insertions(+), 2 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index caeddcf8a61c..9e1fb8be9687 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -744,6 +744,12 @@ struct io_msg {
u32 len;
};
+struct io_nop {
+ struct file *file;
+ u64 extra1;
+ u64 extra2;
+};
+
struct io_async_connect {
struct sockaddr_storage address;
};
@@ -937,6 +943,7 @@ struct io_kiocb {
struct io_msg msg;
struct io_xattr xattr;
struct io_socket sock;
+ struct io_nop nop;
};
u8 opcode;
@@ -4872,6 +4879,19 @@ static int io_splice(struct io_kiocb *req, unsigned int issue_flags)
return 0;
}
+static int io_nop_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+{
+ /*
+ * If the ring is set up with CQE32, relay back addr/addr2
+ */
+ if (req->ctx->flags & IORING_SETUP_CQE32) {
+ req->nop.extra1 = READ_ONCE(sqe->addr);
+ req->nop.extra2 = READ_ONCE(sqe->addr2);
+ }
+
+ return 0;
+}
+
/*
* IORING_OP_NOP just posts a completion event, nothing else.
*/
@@ -4882,7 +4902,11 @@ static int io_nop(struct io_kiocb *req, unsigned int issue_flags)
if (unlikely(ctx->flags & IORING_SETUP_IOPOLL))
return -EINVAL;
- __io_req_complete(req, issue_flags, 0, 0);
+ if (!(ctx->flags & IORING_SETUP_CQE32))
+ __io_req_complete(req, issue_flags, 0, 0);
+ else
+ __io_req_complete32(req, issue_flags, 0, 0, req->nop.extra1,
+ req->nop.extra2);
return 0;
}
@@ -7354,7 +7378,7 @@ static int io_req_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
{
switch (req->opcode) {
case IORING_OP_NOP:
- return 0;
+ return io_nop_prep(req, sqe);
case IORING_OP_READV:
case IORING_OP_READ_FIXED:
case IORING_OP_READ:
--
2.30.2
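The relay behaviour of the nop operation in this patch can be modelled in userspace as below. This is a simplified sketch: sqe_sketch, cqe32_sketch and nop_roundtrip are hypothetical names, only the fields involved are modelled, and the completion is always a fixed 32-byte struct, whereas on a ring without IORING_SETUP_CQE32 the two trailing words do not exist at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Only the submission fields the nop path reads are modelled. */
struct sqe_sketch {
	uint64_t addr;
	uint64_t addr2;
	uint64_t user_data;
};

/* A 32-byte completion with both extra words spelled out. */
struct cqe32_sketch {
	uint64_t user_data;
	int32_t  res;
	uint32_t flags;
	uint64_t big_cqe[2];
};

/* Prep and completion collapsed into one step: relay addr/addr2 on CQE32. */
static struct cqe32_sketch nop_roundtrip(const struct sqe_sketch *sqe, int cqe32)
{
	struct cqe32_sketch cqe;

	assert(sqe != NULL);
	memset(&cqe, 0, sizeof(cqe));	/* res and flags are 0 for a nop */
	cqe.user_data = sqe->user_data;
	if (cqe32) {
		cqe.big_cqe[0] = sqe->addr;	/* becomes extra1 */
		cqe.big_cqe[1] = sqe->addr2;	/* becomes extra2 */
	}
	return cqe;
}
```

This makes the nop operation a convenient way to check that both extra words survive the trip from submission to completion.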
* Re: [PATCH v4 00/12] add large CQE support for io-uring
2022-04-26 18:21 [PATCH v4 00/12] add large CQE support for io-uring Stefan Roesch
` (11 preceding siblings ...)
2022-04-26 18:21 ` [PATCH v4 12/12] io_uring: support CQE32 for nop operation Stefan Roesch
@ 2022-04-26 22:58 ` Jens Axboe
12 siblings, 0 replies; 14+ messages in thread
From: Jens Axboe @ 2022-04-26 22:58 UTC
To: io-uring, shr, linux-nvme, kernel-team; +Cc: joshi.k
On Tue, 26 Apr 2022 11:21:22 -0700, Stefan Roesch wrote:
> This adds the large CQE support for io-uring. Large CQEs are 16 bytes longer.
> To support the longer CQEs, the allocation and the CQE access paths are
> changed.
>
> Large CQEs are twice as big, so the allocation size is doubled. The ring
> size calculation needs to take this into account.
>
> [...]
Applied, thanks!
[01/12] io_uring: support CQE32 in io_uring_cqe
commit: 5c8bcc8e97123e3e68a6b1aa4c3eb6c5d5b9d174
[02/12] io_uring: store add. return values for CQE32
commit: 04c3f8c8deae29e184d54b2cd815f39fd46c6b2e
[03/12] io_uring: change ring size calculation for CQE32
commit: 9291ac41fda10ba7e80fc2147ca39a3b1d130ef9
[04/12] io_uring: add CQE32 setup processing
commit: bc6bda624e953fcf42c6075fe35a219ce6df4bc4
[05/12] io_uring: add CQE32 completion processing
commit: 22b76e8c5fd312701a1827b970230ee66aa24f69
[06/12] io_uring: modify io_get_cqe for CQE32
commit: 771c7f07faf909b9993fd5e42581c8c82531fb58
[07/12] io_uring: flush completions for CQE32
commit: b8e5029ed965c01066009bcb172c082b60ff436c
[08/12] io_uring: overflow processing for CQE32
commit: 3ee1cd786a668ba2a6e8dfefacb8f29e1d995c12
[09/12] io_uring: add tracing for additional CQE32 fields
commit: 225afd24978b55a771660fb4c6ad90cac75e7da8
[10/12] io_uring: support CQE32 in /proc info
commit: 41a971975a3ae2b498b9f5ecad34c34280f0ffdc
[11/12] io_uring: enable CQE32
commit: bb30aab40bcb6e9b80321615a2847a9491c95bf9
[12/12] io_uring: support CQE32 for nop operation
commit: 0fde61fe729221b43d9c8374cb57e571f4fb2a16
Best regards,
--
Jens Axboe