* [RFC 00/16] Introduce ring flexible placement
@ 2025-11-06 17:01 Pavel Begunkov
2025-11-06 17:01 ` [RFC 01/16] io_uring: add helper calculating region byte size Pavel Begunkov
` (15 more replies)
0 siblings, 16 replies; 17+ messages in thread
From: Pavel Begunkov @ 2025-11-06 17:01 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence
This patchset allows the user to put rings and ring headers
into the memory/parameter region and specify the offsets at which
they should lie.
It solves a problem with the current layout, where the headers are
placed before the ring entries: 1) the header area is usually padded,
wasting memory and using more cache lines than a smarter placement
would need, and 2) it grows the region size to a non power of two.
It's also handy to have a way to put everything into a single region.
It's implemented for SQ and CQ, but it's planned to be supported by
zcrx as well. There is a bunch of cleanups (Patches 1-13), and it'd
make sense to merge some of them separately. The series also adds the
ability to register a memory region during setup rather than via a
separate registration call (Patch 15). The placement handling itself
is in Patch 16.
Pavel Begunkov (16):
io_uring: add helper calculating region byte size
io_uring: pass sq entries in the params struct
io_uring: use mem_is_zero to check ring params
io_uring: move flags check to io_uring_sanitise_params
io_uring: introduce struct io_ctx_config
io_uring: split out config init helper
io_uring: add structure keeping ring offsets
io_uring: pre-calculate scq offsets
io_uring: introduce helper for setting user offset
io_uring: separate cqe array from headers
io_uring/region: introduce io_region_slice
io_uring: convert pointer init to io_region_slice
io_uring: refactor rings_size()
io_uring: extract io_create_mem_region
io_uring: allow creating mem region at setup
io_uring: introduce SCQ placement
include/linux/io_uring_types.h | 17 +-
include/uapi/linux/io_uring.h | 21 +-
io_uring/fdinfo.c | 2 +-
io_uring/io_uring.c | 341 +++++++++++++++++++++++----------
io_uring/io_uring.h | 34 +++-
io_uring/memmap.c | 16 +-
io_uring/memmap.h | 7 +
io_uring/register.c | 75 ++++----
8 files changed, 360 insertions(+), 153 deletions(-)
--
2.49.0
^ permalink raw reply [flat|nested] 17+ messages in thread
* [RFC 01/16] io_uring: add helper calculating region byte size
2025-11-06 17:01 [RFC 00/16] Introduce ring flexible placement Pavel Begunkov
@ 2025-11-06 17:01 ` Pavel Begunkov
2025-11-06 17:01 ` [RFC 02/16] io_uring: pass sq entries in the params struct Pavel Begunkov
` (14 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Pavel Begunkov @ 2025-11-06 17:01 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence
There have been type-related issues with the region size calculation;
add a utility helper function that returns the size and handles the
type conversions correctly.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/memmap.c | 4 ++--
io_uring/memmap.h | 5 +++++
2 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/io_uring/memmap.c b/io_uring/memmap.c
index ce8118434c5a..b329cee8d6e8 100644
--- a/io_uring/memmap.c
+++ b/io_uring/memmap.c
@@ -135,7 +135,7 @@ static int io_region_pin_pages(struct io_ring_ctx *ctx,
struct io_mapped_region *mr,
struct io_uring_region_desc *reg)
{
- unsigned long size = (size_t)mr->nr_pages << PAGE_SHIFT;
+ size_t size = io_region_size(mr);
struct page **pages;
int nr_pages;
@@ -156,7 +156,7 @@ static int io_region_allocate_pages(struct io_ring_ctx *ctx,
unsigned long mmap_offset)
{
gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_NOWARN;
- size_t size = (size_t) mr->nr_pages << PAGE_SHIFT;
+ size_t size = io_region_size(mr);
unsigned long nr_allocated;
struct page **pages;
diff --git a/io_uring/memmap.h b/io_uring/memmap.h
index f9e94458c01f..d4b8b6363a7d 100644
--- a/io_uring/memmap.h
+++ b/io_uring/memmap.h
@@ -43,4 +43,9 @@ static inline void io_region_publish(struct io_ring_ctx *ctx,
*dst_region = *src_region;
}
+static inline size_t io_region_size(struct io_mapped_region *mr)
+{
+ return (size_t)mr->nr_pages << PAGE_SHIFT;
+}
+
#endif
--
2.49.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [RFC 02/16] io_uring: pass sq entries in the params struct
2025-11-06 17:01 [RFC 00/16] Introduce ring flexible placement Pavel Begunkov
2025-11-06 17:01 ` [RFC 01/16] io_uring: add helper calculating region byte size Pavel Begunkov
@ 2025-11-06 17:01 ` Pavel Begunkov
2025-11-06 17:01 ` [RFC 03/16] io_uring: use mem_is_zero to check ring params Pavel Begunkov
` (13 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Pavel Begunkov @ 2025-11-06 17:01 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence
There is no need to pass the user-requested number of SQ entries
separately from the main parameter structure io_uring_params. Initialise
it at the beginning and stop passing it separately in favour of struct
io_uring_params::sq_entries.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/io_uring.c | 11 +++++++----
io_uring/io_uring.h | 2 +-
io_uring/register.c | 2 +-
3 files changed, 9 insertions(+), 6 deletions(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index f9f8ffcdad07..eae1ad3cd02e 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3484,8 +3484,10 @@ static int io_uring_sanitise_params(struct io_uring_params *p)
return 0;
}
-int io_uring_fill_params(unsigned entries, struct io_uring_params *p)
+int io_uring_fill_params(struct io_uring_params *p)
{
+ unsigned entries = p->sq_entries;
+
if (!entries)
return -EINVAL;
if (entries > IORING_MAX_ENTRIES) {
@@ -3547,7 +3549,7 @@ int io_uring_fill_params(unsigned entries, struct io_uring_params *p)
return 0;
}
-static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
+static __cold int io_uring_create(struct io_uring_params *p,
struct io_uring_params __user *params)
{
struct io_ring_ctx *ctx;
@@ -3559,7 +3561,7 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
if (ret)
return ret;
- ret = io_uring_fill_params(entries, p);
+ ret = io_uring_fill_params(p);
if (unlikely(ret))
return ret;
@@ -3698,7 +3700,8 @@ static long io_uring_setup(u32 entries, struct io_uring_params __user *params)
if (p.flags & ~IORING_SETUP_FLAGS)
return -EINVAL;
- return io_uring_create(entries, &p, params);
+ p.sq_entries = entries;
+ return io_uring_create(&p, params);
}
static inline int io_uring_allowed(void)
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 23c268ab1c8f..b2251446497a 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -136,7 +136,7 @@ static inline bool io_should_wake(struct io_wait_queue *iowq)
unsigned long rings_size(unsigned int flags, unsigned int sq_entries,
unsigned int cq_entries, size_t *sq_offset);
-int io_uring_fill_params(unsigned entries, struct io_uring_params *p);
+int io_uring_fill_params(struct io_uring_params *p);
bool io_cqe_cache_refill(struct io_ring_ctx *ctx, bool overflow, bool cqe32);
int io_run_task_work_sig(struct io_ring_ctx *ctx);
int io_run_local_work(struct io_ring_ctx *ctx, int min_events, int max_events);
diff --git a/io_uring/register.c b/io_uring/register.c
index 1a3e05be6e7b..0d70696468f6 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -416,7 +416,7 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
/* properties that are always inherited */
p.flags |= (ctx->flags & COPY_FLAGS);
- ret = io_uring_fill_params(p.sq_entries, &p);
+ ret = io_uring_fill_params(&p);
if (unlikely(ret))
return ret;
--
2.49.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [RFC 03/16] io_uring: use mem_is_zero to check ring params
2025-11-06 17:01 [RFC 00/16] Introduce ring flexible placement Pavel Begunkov
2025-11-06 17:01 ` [RFC 01/16] io_uring: add helper calculating region byte size Pavel Begunkov
2025-11-06 17:01 ` [RFC 02/16] io_uring: pass sq entries in the params struct Pavel Begunkov
@ 2025-11-06 17:01 ` Pavel Begunkov
2025-11-06 17:01 ` [RFC 04/16] io_uring: move flags check to io_uring_sanitise_params Pavel Begunkov
` (12 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Pavel Begunkov @ 2025-11-06 17:01 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence
mem_is_zero() does the job without hand-rolled loops; use it to verify
the reserved fields of the ring params.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/io_uring.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index eae1ad3cd02e..dec37cf7c62c 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3689,14 +3689,12 @@ static __cold int io_uring_create(struct io_uring_params *p,
static long io_uring_setup(u32 entries, struct io_uring_params __user *params)
{
struct io_uring_params p;
- int i;
if (copy_from_user(&p, params, sizeof(p)))
return -EFAULT;
- for (i = 0; i < ARRAY_SIZE(p.resv); i++) {
- if (p.resv[i])
- return -EINVAL;
- }
+
+ if (!mem_is_zero(&p.resv, sizeof(p.resv)))
+ return -EINVAL;
if (p.flags & ~IORING_SETUP_FLAGS)
return -EINVAL;
--
2.49.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [RFC 04/16] io_uring: move flags check to io_uring_sanitise_params
2025-11-06 17:01 [RFC 00/16] Introduce ring flexible placement Pavel Begunkov
` (2 preceding siblings ...)
2025-11-06 17:01 ` [RFC 03/16] io_uring: use mem_is_zero to check ring params Pavel Begunkov
@ 2025-11-06 17:01 ` Pavel Begunkov
2025-11-06 17:01 ` [RFC 05/16] io_uring: introduce struct io_ctx_config Pavel Begunkov
` (11 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Pavel Begunkov @ 2025-11-06 17:01 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence
io_uring_sanitise_params() sanitises most of the setup flag invariants;
move the IORING_SETUP_FLAGS check from io_uring_setup() into it as well.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/io_uring.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index dec37cf7c62c..ef1b75c5a4d2 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3435,6 +3435,9 @@ static int io_uring_sanitise_params(struct io_uring_params *p)
{
unsigned flags = p->flags;
+ if (flags & ~IORING_SETUP_FLAGS)
+ return -EINVAL;
+
/* There is no way to mmap rings without a real fd */
if ((flags & IORING_SETUP_REGISTERED_FD_ONLY) &&
!(flags & IORING_SETUP_NO_MMAP))
@@ -3696,8 +3699,6 @@ static long io_uring_setup(u32 entries, struct io_uring_params __user *params)
if (!mem_is_zero(&p.resv, sizeof(p.resv)))
return -EINVAL;
- if (p.flags & ~IORING_SETUP_FLAGS)
- return -EINVAL;
p.sq_entries = entries;
return io_uring_create(&p, params);
}
--
2.49.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [RFC 05/16] io_uring: introduce struct io_ctx_config
2025-11-06 17:01 [RFC 00/16] Introduce ring flexible placement Pavel Begunkov
` (3 preceding siblings ...)
2025-11-06 17:01 ` [RFC 04/16] io_uring: move flags check to io_uring_sanitise_params Pavel Begunkov
@ 2025-11-06 17:01 ` Pavel Begunkov
2025-11-06 17:01 ` [RFC 06/16] io_uring: split out config init helper Pavel Begunkov
` (10 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Pavel Begunkov @ 2025-11-06 17:01 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence
There will be more information needed during ctx setup, so instead of
passing a handful of pointers around, wrap them all into a new
structure.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/io_uring.c | 19 +++++++++++--------
io_uring/io_uring.h | 5 +++++
2 files changed, 16 insertions(+), 8 deletions(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index ef1b75c5a4d2..142811c7a4f5 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3552,9 +3552,9 @@ int io_uring_fill_params(struct io_uring_params *p)
return 0;
}
-static __cold int io_uring_create(struct io_uring_params *p,
- struct io_uring_params __user *params)
+static __cold int io_uring_create(struct io_ctx_config *config)
{
+ struct io_uring_params *p = &config->p;
struct io_ring_ctx *ctx;
struct io_uring_task *tctx;
struct file *file;
@@ -3638,7 +3638,7 @@ static __cold int io_uring_create(struct io_uring_params *p,
p->features = IORING_FEAT_FLAGS;
- if (copy_to_user(params, p, sizeof(*p))) {
+ if (copy_to_user(config->uptr, p, sizeof(*p))) {
ret = -EFAULT;
goto err;
}
@@ -3691,16 +3691,19 @@ static __cold int io_uring_create(struct io_uring_params *p,
*/
static long io_uring_setup(u32 entries, struct io_uring_params __user *params)
{
- struct io_uring_params p;
+ struct io_ctx_config config;
- if (copy_from_user(&p, params, sizeof(p)))
+ memset(&config, 0, sizeof(config));
+
+ if (copy_from_user(&config.p, params, sizeof(config.p)))
return -EFAULT;
- if (!mem_is_zero(&p.resv, sizeof(p.resv)))
+ if (!mem_is_zero(&config.p.resv, sizeof(config.p.resv)))
return -EINVAL;
- p.sq_entries = entries;
- return io_uring_create(&p, params);
+ config.p.sq_entries = entries;
+ config.uptr = params;
+ return io_uring_create(&config);
}
static inline int io_uring_allowed(void)
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index b2251446497a..c4d47ad7777c 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -17,6 +17,11 @@
#include <trace/events/io_uring.h>
#endif
+struct io_ctx_config {
+ struct io_uring_params p;
+ struct io_uring_params __user *uptr;
+};
+
#define IORING_FEAT_FLAGS (IORING_FEAT_SINGLE_MMAP |\
IORING_FEAT_NODROP |\
IORING_FEAT_SUBMIT_STABLE |\
--
2.49.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [RFC 06/16] io_uring: split out config init helper
2025-11-06 17:01 [RFC 00/16] Introduce ring flexible placement Pavel Begunkov
` (4 preceding siblings ...)
2025-11-06 17:01 ` [RFC 05/16] io_uring: introduce struct io_ctx_config Pavel Begunkov
@ 2025-11-06 17:01 ` Pavel Begunkov
2025-11-06 17:01 ` [RFC 07/16] io_uring: add structure keeping ring offsets Pavel Begunkov
` (9 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Pavel Begunkov @ 2025-11-06 17:01 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence
Separate most of the configuration verification / calculation into a
function that will serve as the first step before doing any actual
allocation.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/io_uring.c | 22 +++++++++++++++++-----
1 file changed, 17 insertions(+), 5 deletions(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 142811c7a4f5..30ba60974f1d 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3552,12 +3552,9 @@ int io_uring_fill_params(struct io_uring_params *p)
return 0;
}
-static __cold int io_uring_create(struct io_ctx_config *config)
+static int io_prepare_config(struct io_ctx_config *config)
{
struct io_uring_params *p = &config->p;
- struct io_ring_ctx *ctx;
- struct io_uring_task *tctx;
- struct file *file;
int ret;
ret = io_uring_sanitise_params(p);
@@ -3565,7 +3562,22 @@ static __cold int io_uring_create(struct io_ctx_config *config)
return ret;
ret = io_uring_fill_params(p);
- if (unlikely(ret))
+ if (ret)
+ return ret;
+
+ return 0;
+}
+
+static __cold int io_uring_create(struct io_ctx_config *config)
+{
+ struct io_uring_params *p = &config->p;
+ struct io_ring_ctx *ctx;
+ struct io_uring_task *tctx;
+ struct file *file;
+ int ret;
+
+ ret = io_prepare_config(config);
+ if (ret)
return ret;
ctx = io_ring_ctx_alloc(p);
--
2.49.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [RFC 07/16] io_uring: add structure keeping ring offsets
2025-11-06 17:01 [RFC 00/16] Introduce ring flexible placement Pavel Begunkov
` (5 preceding siblings ...)
2025-11-06 17:01 ` [RFC 06/16] io_uring: split out config init helper Pavel Begunkov
@ 2025-11-06 17:01 ` Pavel Begunkov
2025-11-06 17:01 ` [RFC 08/16] io_uring: pre-calculate scq offsets Pavel Begunkov
` (8 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Pavel Begunkov @ 2025-11-06 17:01 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence
Add struct io_scq_dim, which keeps all offset / size / dimension
information about the rings, and let rings_size() initialise it. This
improves calculation locality and allows deduplicating some code.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/io_uring.c | 60 ++++++++++++++++++++++++---------------------
io_uring/io_uring.h | 12 +++++++--
io_uring/register.c | 19 ++++++--------
3 files changed, 49 insertions(+), 42 deletions(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 30ba60974f1d..8166ea9140f8 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2757,49 +2757,61 @@ static void io_rings_free(struct io_ring_ctx *ctx)
ctx->sq_sqes = NULL;
}
-unsigned long rings_size(unsigned int flags, unsigned int sq_entries,
- unsigned int cq_entries, size_t *sq_offset)
+int rings_size(unsigned int flags, unsigned int sq_entries,
+ unsigned int cq_entries, struct io_scq_dim *dims)
{
struct io_rings *rings;
size_t off, sq_array_size;
+ size_t sqe_size;
+
+ dims->sq_array_offset = SIZE_MAX;
+
+ sqe_size = sizeof(struct io_uring_sqe);
+ if (flags & IORING_SETUP_SQE128)
+ sqe_size *= 2;
+
+ dims->sq_size = array_size(sqe_size, sq_entries);
+ if (dims->sq_size == SIZE_MAX)
+ return -EOVERFLOW;
off = struct_size(rings, cqes, cq_entries);
if (off == SIZE_MAX)
- return SIZE_MAX;
+ return -EOVERFLOW;
if (flags & IORING_SETUP_CQE32) {
if (check_shl_overflow(off, 1, &off))
- return SIZE_MAX;
+ return -EOVERFLOW;
}
if (flags & IORING_SETUP_CQE_MIXED) {
if (cq_entries < 2)
- return SIZE_MAX;
+ return -EOVERFLOW;
}
if (flags & IORING_SETUP_SQE_MIXED) {
if (sq_entries < 2)
- return SIZE_MAX;
+ return -EOVERFLOW;
}
#ifdef CONFIG_SMP
off = ALIGN(off, SMP_CACHE_BYTES);
if (off == 0)
- return SIZE_MAX;
+ return -EOVERFLOW;
#endif
if (flags & IORING_SETUP_NO_SQARRAY) {
- *sq_offset = SIZE_MAX;
- return off;
+ dims->cq_comp_size = off;
+ return 0;
}
- *sq_offset = off;
+ dims->sq_array_offset = off;
sq_array_size = array_size(sizeof(u32), sq_entries);
if (sq_array_size == SIZE_MAX)
- return SIZE_MAX;
+ return -EOVERFLOW;
if (check_add_overflow(off, sq_array_size, &off))
- return SIZE_MAX;
+ return -EOVERFLOW;
- return off;
+ dims->cq_comp_size = off;
+ return 0;
}
static __cold void __io_req_caches_free(struct io_ring_ctx *ctx)
@@ -3354,27 +3366,19 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx,
{
struct io_uring_region_desc rd;
struct io_rings *rings;
- size_t sq_array_offset;
- size_t sq_size, cq_size, sqe_size;
+ struct io_scq_dim dims;
int ret;
/* make sure these are sane, as we already accounted them */
ctx->sq_entries = p->sq_entries;
ctx->cq_entries = p->cq_entries;
- sqe_size = sizeof(struct io_uring_sqe);
- if (p->flags & IORING_SETUP_SQE128)
- sqe_size *= 2;
- sq_size = array_size(sqe_size, p->sq_entries);
- if (sq_size == SIZE_MAX)
- return -EOVERFLOW;
- cq_size = rings_size(ctx->flags, p->sq_entries, p->cq_entries,
- &sq_array_offset);
- if (cq_size == SIZE_MAX)
- return -EOVERFLOW;
+ ret = rings_size(ctx->flags, p->sq_entries, p->cq_entries, &dims);
+ if (ret)
+ return ret;
memset(&rd, 0, sizeof(rd));
- rd.size = PAGE_ALIGN(cq_size);
+ rd.size = PAGE_ALIGN(dims.cq_comp_size);
if (ctx->flags & IORING_SETUP_NO_MMAP) {
rd.user_addr = p->cq_off.user_addr;
rd.flags |= IORING_MEM_REGION_TYPE_USER;
@@ -3385,10 +3389,10 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx,
ctx->rings = rings = io_region_get_ptr(&ctx->ring_region);
if (!(ctx->flags & IORING_SETUP_NO_SQARRAY))
- ctx->sq_array = (u32 *)((char *)rings + sq_array_offset);
+ ctx->sq_array = (u32 *)((char *)rings + dims.sq_array_offset);
memset(&rd, 0, sizeof(rd));
- rd.size = PAGE_ALIGN(sq_size);
+ rd.size = PAGE_ALIGN(dims.sq_size);
if (ctx->flags & IORING_SETUP_NO_MMAP) {
rd.user_addr = p->sq_off.user_addr;
rd.flags |= IORING_MEM_REGION_TYPE_USER;
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index c4d47ad7777c..29464be9733c 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -17,6 +17,14 @@
#include <trace/events/io_uring.h>
#endif
+struct io_scq_dim {
+ size_t sq_array_offset;
+ size_t sq_size;
+
+ /* Compound array mmap'ed together with CQ. */
+ size_t cq_comp_size;
+};
+
struct io_ctx_config {
struct io_uring_params p;
struct io_uring_params __user *uptr;
@@ -139,8 +147,8 @@ static inline bool io_should_wake(struct io_wait_queue *iowq)
#define IORING_MAX_ENTRIES 32768
#define IORING_MAX_CQ_ENTRIES (2 * IORING_MAX_ENTRIES)
-unsigned long rings_size(unsigned int flags, unsigned int sq_entries,
- unsigned int cq_entries, size_t *sq_offset);
+int rings_size(unsigned int flags, unsigned int sq_entries,
+ unsigned int cq_entries, struct io_scq_dim *dims);
int io_uring_fill_params(struct io_uring_params *p);
bool io_cqe_cache_refill(struct io_ring_ctx *ctx, bool overflow, bool cqe32);
int io_run_task_work_sig(struct io_ring_ctx *ctx);
diff --git a/io_uring/register.c b/io_uring/register.c
index 0d70696468f6..85814f983dde 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -402,6 +402,7 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
struct io_ring_ctx_rings o = { }, n = { }, *to_free = NULL;
size_t size, sq_array_offset;
unsigned i, tail, old_head;
+ struct io_scq_dim dims;
struct io_uring_params p;
int ret;
@@ -419,11 +420,12 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
ret = io_uring_fill_params(&p);
if (unlikely(ret))
return ret;
+ ret = rings_size(p.flags, p.sq_entries, p.cq_entries, &dims);
+ if (ret)
+ return ret;
- size = rings_size(p.flags, p.sq_entries, p.cq_entries,
- &sq_array_offset);
- if (size == SIZE_MAX)
- return -EOVERFLOW;
+ size = dims.cq_comp_size;
+ sq_array_offset = dims.sq_array_offset;
memset(&rd, 0, sizeof(rd));
rd.size = PAGE_ALIGN(size);
@@ -455,14 +457,7 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
return -EFAULT;
}
- if (p.flags & IORING_SETUP_SQE128)
- size = array_size(2 * sizeof(struct io_uring_sqe), p.sq_entries);
- else
- size = array_size(sizeof(struct io_uring_sqe), p.sq_entries);
- if (size == SIZE_MAX) {
- io_register_free_rings(ctx, &n);
- return -EOVERFLOW;
- }
+ size = dims.sq_size;
memset(&rd, 0, sizeof(rd));
rd.size = PAGE_ALIGN(size);
--
2.49.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [RFC 08/16] io_uring: pre-calculate scq offsets
2025-11-06 17:01 [RFC 00/16] Introduce ring flexible placement Pavel Begunkov
` (6 preceding siblings ...)
2025-11-06 17:01 ` [RFC 07/16] io_uring: add structure keeping ring offsets Pavel Begunkov
@ 2025-11-06 17:01 ` Pavel Begunkov
2025-11-06 17:01 ` [RFC 09/16] io_uring: introduce helper for setting user offset Pavel Begunkov
` (7 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Pavel Begunkov @ 2025-11-06 17:01 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence
Move the ring size / offset calculations into io_prepare_config(). This
simplifies misconfiguration handling, keeps it local, and allows moving
the p->sq_off.array calculation closer to the rest of the p->sq_off
initialisation.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/io_uring.c | 28 ++++++++++++++--------------
io_uring/io_uring.h | 1 +
2 files changed, 15 insertions(+), 14 deletions(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 8166ea9140f8..aeb9555bd258 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3362,23 +3362,20 @@ bool io_is_uring_fops(struct file *file)
}
static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx,
- struct io_uring_params *p)
+ struct io_ctx_config *config)
{
+ struct io_uring_params *p = &config->p;
+ struct io_scq_dim *dims = &config->dims;
struct io_uring_region_desc rd;
struct io_rings *rings;
- struct io_scq_dim dims;
int ret;
/* make sure these are sane, as we already accounted them */
ctx->sq_entries = p->sq_entries;
ctx->cq_entries = p->cq_entries;
- ret = rings_size(ctx->flags, p->sq_entries, p->cq_entries, &dims);
- if (ret)
- return ret;
-
memset(&rd, 0, sizeof(rd));
- rd.size = PAGE_ALIGN(dims.cq_comp_size);
+ rd.size = PAGE_ALIGN(dims->cq_comp_size);
if (ctx->flags & IORING_SETUP_NO_MMAP) {
rd.user_addr = p->cq_off.user_addr;
rd.flags |= IORING_MEM_REGION_TYPE_USER;
@@ -3387,12 +3384,11 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx,
if (ret)
return ret;
ctx->rings = rings = io_region_get_ptr(&ctx->ring_region);
-
if (!(ctx->flags & IORING_SETUP_NO_SQARRAY))
- ctx->sq_array = (u32 *)((char *)rings + dims.sq_array_offset);
+ ctx->sq_array = (u32 *)((char *)rings + dims->sq_array_offset);
memset(&rd, 0, sizeof(rd));
- rd.size = PAGE_ALIGN(dims.sq_size);
+ rd.size = PAGE_ALIGN(dims->sq_size);
if (ctx->flags & IORING_SETUP_NO_MMAP) {
rd.user_addr = p->sq_off.user_addr;
rd.flags |= IORING_MEM_REGION_TYPE_USER;
@@ -3569,6 +3565,13 @@ static int io_prepare_config(struct io_ctx_config *config)
if (ret)
return ret;
+ ret = rings_size(p->flags, p->sq_entries, p->cq_entries, &config->dims);
+ if (ret)
+ return ret;
+
+ if (!(p->flags & IORING_SETUP_NO_SQARRAY))
+ p->sq_off.array = config->dims.sq_array_offset;
+
return 0;
}
@@ -3641,13 +3644,10 @@ static __cold int io_uring_create(struct io_ctx_config *config)
mmgrab(current->mm);
ctx->mm_account = current->mm;
- ret = io_allocate_scq_urings(ctx, p);
+ ret = io_allocate_scq_urings(ctx, config);
if (ret)
goto err;
- if (!(p->flags & IORING_SETUP_NO_SQARRAY))
- p->sq_off.array = (char *)ctx->sq_array - (char *)ctx->rings;
-
ret = io_sq_offload_create(ctx, p);
if (ret)
goto err;
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 29464be9733c..d1c2c70720f1 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -27,6 +27,7 @@ struct io_scq_dim {
struct io_ctx_config {
struct io_uring_params p;
+ struct io_scq_dim dims;
struct io_uring_params __user *uptr;
};
--
2.49.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [RFC 09/16] io_uring: introduce helper for setting user offset
2025-11-06 17:01 [RFC 00/16] Introduce ring flexible placement Pavel Begunkov
` (7 preceding siblings ...)
2025-11-06 17:01 ` [RFC 08/16] io_uring: pre-calculate scq offsets Pavel Begunkov
@ 2025-11-06 17:01 ` Pavel Begunkov
2025-11-06 17:01 ` [RFC 10/16] io_uring: separate cqe array from headers Pavel Begunkov
` (6 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Pavel Begunkov @ 2025-11-06 17:01 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence
The struct io_{s,c}qring_offsets fields may require computation from
other steps, as in the case of sq_off.array. Move the initialisation
out of io_uring_fill_params() into a separate function that can be
called later.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/io_uring.c | 52 ++++++++++++++++++++++++---------------------
io_uring/io_uring.h | 1 +
io_uring/register.c | 1 +
3 files changed, 30 insertions(+), 24 deletions(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index aeb9555bd258..be866a8e94bf 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3487,6 +3487,33 @@ static int io_uring_sanitise_params(struct io_uring_params *p)
return 0;
}
+void io_fill_scq_offsets(struct io_uring_params *p, struct io_scq_dim *dims)
+{
+ p->sq_off.head = offsetof(struct io_rings, sq.head);
+ p->sq_off.tail = offsetof(struct io_rings, sq.tail);
+ p->sq_off.ring_mask = offsetof(struct io_rings, sq_ring_mask);
+ p->sq_off.ring_entries = offsetof(struct io_rings, sq_ring_entries);
+ p->sq_off.flags = offsetof(struct io_rings, sq_flags);
+ p->sq_off.dropped = offsetof(struct io_rings, sq_dropped);
+ p->sq_off.resv1 = 0;
+ if (!(p->flags & IORING_SETUP_NO_MMAP))
+ p->sq_off.user_addr = 0;
+
+ p->cq_off.head = offsetof(struct io_rings, cq.head);
+ p->cq_off.tail = offsetof(struct io_rings, cq.tail);
+ p->cq_off.ring_mask = offsetof(struct io_rings, cq_ring_mask);
+ p->cq_off.ring_entries = offsetof(struct io_rings, cq_ring_entries);
+ p->cq_off.overflow = offsetof(struct io_rings, cq_overflow);
+ p->cq_off.cqes = offsetof(struct io_rings, cqes);
+ p->cq_off.flags = offsetof(struct io_rings, cq_flags);
+ p->cq_off.resv1 = 0;
+ if (!(p->flags & IORING_SETUP_NO_MMAP))
+ p->cq_off.user_addr = 0;
+
+ if (!(p->flags & IORING_SETUP_NO_SQARRAY))
+ p->sq_off.array = dims->sq_array_offset;
+}
+
int io_uring_fill_params(struct io_uring_params *p)
{
unsigned entries = p->sq_entries;
@@ -3528,27 +3555,6 @@ int io_uring_fill_params(struct io_uring_params *p)
p->cq_entries = 2 * p->sq_entries;
}
- p->sq_off.head = offsetof(struct io_rings, sq.head);
- p->sq_off.tail = offsetof(struct io_rings, sq.tail);
- p->sq_off.ring_mask = offsetof(struct io_rings, sq_ring_mask);
- p->sq_off.ring_entries = offsetof(struct io_rings, sq_ring_entries);
- p->sq_off.flags = offsetof(struct io_rings, sq_flags);
- p->sq_off.dropped = offsetof(struct io_rings, sq_dropped);
- p->sq_off.resv1 = 0;
- if (!(p->flags & IORING_SETUP_NO_MMAP))
- p->sq_off.user_addr = 0;
-
- p->cq_off.head = offsetof(struct io_rings, cq.head);
- p->cq_off.tail = offsetof(struct io_rings, cq.tail);
- p->cq_off.ring_mask = offsetof(struct io_rings, cq_ring_mask);
- p->cq_off.ring_entries = offsetof(struct io_rings, cq_ring_entries);
- p->cq_off.overflow = offsetof(struct io_rings, cq_overflow);
- p->cq_off.cqes = offsetof(struct io_rings, cqes);
- p->cq_off.flags = offsetof(struct io_rings, cq_flags);
- p->cq_off.resv1 = 0;
- if (!(p->flags & IORING_SETUP_NO_MMAP))
- p->cq_off.user_addr = 0;
-
return 0;
}
@@ -3569,9 +3575,7 @@ static int io_prepare_config(struct io_ctx_config *config)
if (ret)
return ret;
- if (!(p->flags & IORING_SETUP_NO_SQARRAY))
- p->sq_off.array = config->dims.sq_array_offset;
-
+ io_fill_scq_offsets(p, &config->dims);
return 0;
}
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index d1c2c70720f1..f6c4b141a33d 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -151,6 +151,7 @@ static inline bool io_should_wake(struct io_wait_queue *iowq)
int rings_size(unsigned int flags, unsigned int sq_entries,
unsigned int cq_entries, struct io_scq_dim *dims);
int io_uring_fill_params(struct io_uring_params *p);
+void io_fill_scq_offsets(struct io_uring_params *p, struct io_scq_dim *dims);
bool io_cqe_cache_refill(struct io_ring_ctx *ctx, bool overflow, bool cqe32);
int io_run_task_work_sig(struct io_ring_ctx *ctx);
int io_run_local_work(struct io_ring_ctx *ctx, int min_events, int max_events);
diff --git a/io_uring/register.c b/io_uring/register.c
index 85814f983dde..da804f925622 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -423,6 +423,7 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
ret = rings_size(p.flags, p.sq_entries, p.cq_entries, &dims);
if (ret)
return ret;
+ io_fill_scq_offsets(&p, &dims);
size = dims.cq_comp_size;
sq_array_offset = dims.sq_array_offset;
--
2.49.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [RFC 10/16] io_uring: separate cqe array from headers
2025-11-06 17:01 [RFC 00/16] Introduce ring flexible placement Pavel Begunkov
` (8 preceding siblings ...)
2025-11-06 17:01 ` [RFC 09/16] io_uring: inroduce helper for setting user offset Pavel Begunkov
@ 2025-11-06 17:01 ` Pavel Begunkov
2025-11-06 17:01 ` [RFC 11/16] io_uring/region: introduce io_region_slice Pavel Begunkov
` (5 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Pavel Begunkov @ 2025-11-06 17:01 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence
Keep a pointer to the CQE array separate from the SCQ headers; it'll be
used shortly in the following patches. Also, don't overestimate the CQ
size for SETUP_CQE32, which doubles the memory not only for the CQ
entries but for the headers as well.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
include/linux/io_uring_types.h | 17 +++++++++--------
io_uring/fdinfo.c | 2 +-
io_uring/io_uring.c | 35 ++++++++++++++++++++++------------
io_uring/io_uring.h | 1 +
io_uring/register.c | 8 +++++++-
5 files changed, 41 insertions(+), 22 deletions(-)
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 92780764d5fa..91ded559a147 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -209,14 +209,6 @@ struct io_rings {
* ordered with any other data.
*/
u32 cq_overflow;
- /*
- * Ring buffer of completion events.
- *
- * The kernel writes completion events fresh every time they are
- * produced, so the application is allowed to modify pending
- * entries.
- */
- struct io_uring_cqe cqes[] ____cacheline_aligned_in_smp;
};
struct io_restriction {
@@ -274,6 +266,15 @@ struct io_ring_ctx {
struct task_struct *submitter_task;
struct io_rings *rings;
+ /*
+ * Ring buffer of completion events.
+ *
+ * The kernel writes completion events fresh every time they are
+ * produced, so the application is allowed to modify pending
+ * entries.
+ */
+ struct io_uring_cqe *cqes;
+
struct percpu_ref refs;
clockid_t clockid;
diff --git a/io_uring/fdinfo.c b/io_uring/fdinfo.c
index ac6e7edc7027..eae13ac9b1a9 100644
--- a/io_uring/fdinfo.c
+++ b/io_uring/fdinfo.c
@@ -153,7 +153,7 @@ static void __io_uring_show_fdinfo(struct io_ring_ctx *ctx, struct seq_file *m)
struct io_uring_cqe *cqe;
bool cqe32 = false;
- cqe = &r->cqes[(cq_head & cq_mask)];
+ cqe = &ctx->cqes[(cq_head & cq_mask)];
if (cqe->flags & IORING_CQE_F_32 || ctx->flags & IORING_SETUP_CQE32)
cqe32 = true;
seq_printf(m, "%5u: user_data:%llu, res:%d, flag:%x",
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index be866a8e94bf..9aef41f6ce23 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -745,7 +745,7 @@ static struct io_overflow_cqe *io_alloc_ocqe(struct io_ring_ctx *ctx,
static bool io_fill_nop_cqe(struct io_ring_ctx *ctx, unsigned int off)
{
if (__io_cqring_events(ctx) < ctx->cq_entries) {
- struct io_uring_cqe *cqe = &ctx->rings->cqes[off];
+ struct io_uring_cqe *cqe = &ctx->cqes[off];
cqe->user_data = 0;
cqe->res = 0;
@@ -763,7 +763,6 @@ static bool io_fill_nop_cqe(struct io_ring_ctx *ctx, unsigned int off)
*/
bool io_cqe_cache_refill(struct io_ring_ctx *ctx, bool overflow, bool cqe32)
{
- struct io_rings *rings = ctx->rings;
unsigned int off = ctx->cached_cq_tail & (ctx->cq_entries - 1);
unsigned int free, queued, len;
@@ -798,7 +797,7 @@ bool io_cqe_cache_refill(struct io_ring_ctx *ctx, bool overflow, bool cqe32)
len <<= 1;
}
- ctx->cqe_cached = &rings->cqes[off];
+ ctx->cqe_cached = &ctx->cqes[off];
ctx->cqe_sentinel = ctx->cqe_cached + len;
return true;
}
@@ -2760,8 +2759,8 @@ static void io_rings_free(struct io_ring_ctx *ctx)
int rings_size(unsigned int flags, unsigned int sq_entries,
unsigned int cq_entries, struct io_scq_dim *dims)
{
- struct io_rings *rings;
size_t off, sq_array_size;
+ size_t cq_size, cqe_size;
size_t sqe_size;
dims->sq_array_offset = SIZE_MAX;
@@ -2769,18 +2768,26 @@ int rings_size(unsigned int flags, unsigned int sq_entries,
sqe_size = sizeof(struct io_uring_sqe);
if (flags & IORING_SETUP_SQE128)
sqe_size *= 2;
+ cqe_size = sizeof(struct io_uring_cqe);
+ if (flags & IORING_SETUP_CQE32)
+ cqe_size *= 2;
dims->sq_size = array_size(sqe_size, sq_entries);
if (dims->sq_size == SIZE_MAX)
return -EOVERFLOW;
- off = struct_size(rings, cqes, cq_entries);
+ off = sizeof(struct io_rings);
+ off = L1_CACHE_ALIGN(off);
+ dims->cq_offset = off;
+
+ cq_size = array_size(cqe_size, cq_entries);
+ if (cq_size == SIZE_MAX)
+ return -EOVERFLOW;
+
+ off = size_add(off, cq_size);
if (off == SIZE_MAX)
return -EOVERFLOW;
- if (flags & IORING_SETUP_CQE32) {
- if (check_shl_overflow(off, 1, &off))
- return -EOVERFLOW;
- }
+
if (flags & IORING_SETUP_CQE_MIXED) {
if (cq_entries < 2)
return -EOVERFLOW;
@@ -3368,6 +3375,7 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx,
struct io_scq_dim *dims = &config->dims;
struct io_uring_region_desc rd;
struct io_rings *rings;
+ void *ptr;
int ret;
/* make sure these are sane, as we already accounted them */
@@ -3383,9 +3391,12 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx,
ret = io_create_region(ctx, &ctx->ring_region, &rd, IORING_OFF_CQ_RING);
if (ret)
return ret;
- ctx->rings = rings = io_region_get_ptr(&ctx->ring_region);
+ ptr = io_region_get_ptr(&ctx->ring_region);
+ ctx->rings = rings = ptr;
+ ctx->cqes = ptr + config->dims.cq_offset;
+
if (!(ctx->flags & IORING_SETUP_NO_SQARRAY))
- ctx->sq_array = (u32 *)((char *)rings + dims->sq_array_offset);
+ ctx->sq_array = ptr + dims->sq_array_offset;
memset(&rd, 0, sizeof(rd));
rd.size = PAGE_ALIGN(dims->sq_size);
@@ -3504,7 +3515,7 @@ void io_fill_scq_offsets(struct io_uring_params *p, struct io_scq_dim *dims)
p->cq_off.ring_mask = offsetof(struct io_rings, cq_ring_mask);
p->cq_off.ring_entries = offsetof(struct io_rings, cq_ring_entries);
p->cq_off.overflow = offsetof(struct io_rings, cq_overflow);
- p->cq_off.cqes = offsetof(struct io_rings, cqes);
+ p->cq_off.cqes = dims->cq_offset;
p->cq_off.flags = offsetof(struct io_rings, cq_flags);
p->cq_off.resv1 = 0;
if (!(p->flags & IORING_SETUP_NO_MMAP))
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index f6c4b141a33d..80228c5a843c 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -20,6 +20,7 @@
struct io_scq_dim {
size_t sq_array_offset;
size_t sq_size;
+ size_t cq_offset;
/* Compound array mmap'ed together with CQ. */
size_t cq_comp_size;
diff --git a/io_uring/register.c b/io_uring/register.c
index da804f925622..b43a121e2974 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -373,6 +373,7 @@ static int io_register_clock(struct io_ring_ctx *ctx,
struct io_ring_ctx_rings {
struct io_rings *rings;
struct io_uring_sqe *sq_sqes;
+ struct io_uring_cqe *cqes;
struct io_mapped_region sq_region;
struct io_mapped_region ring_region;
@@ -439,6 +440,7 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
return ret;
n.rings = io_region_get_ptr(&n.ring_region);
+ n.cqes = io_region_get_ptr(&n.ring_region) + dims.cq_offset;
/*
* At this point n.rings is shared with userspace, just like o.rings
@@ -497,6 +499,8 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
ctx->rings = NULL;
o.sq_sqes = ctx->sq_sqes;
ctx->sq_sqes = NULL;
+ o.cqes = ctx->cqes;
+ ctx->cqes = NULL;
/*
* Now copy SQ and CQ entries, if any. If either of the destination
@@ -522,6 +526,7 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
/* restore old rings, and return -EOVERFLOW via cleanup path */
ctx->rings = o.rings;
ctx->sq_sqes = o.sq_sqes;
+ ctx->cqes = o.cqes;
to_free = &n;
ret = -EOVERFLOW;
goto out;
@@ -530,7 +535,7 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
unsigned src_head = i & (ctx->cq_entries - 1);
unsigned dst_head = i & (p.cq_entries - 1);
- n.rings->cqes[dst_head] = o.rings->cqes[src_head];
+ n.cqes[dst_head] = o.cqes[src_head];
}
WRITE_ONCE(n.rings->cq.head, old_head);
WRITE_ONCE(n.rings->cq.tail, tail);
@@ -551,6 +556,7 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
ctx->rings = n.rings;
ctx->sq_sqes = n.sq_sqes;
+ ctx->cqes = n.cqes;
swap_old(ctx, o, n, ring_region);
swap_old(ctx, o, n, sq_region);
to_free = &o;
--
2.49.0
* [RFC 11/16] io_uring/region: introduce io_region_slice
2025-11-06 17:01 [RFC 00/16] Introduce ring flexible placement Pavel Begunkov
` (9 preceding siblings ...)
2025-11-06 17:01 ` [RFC 10/16] io_uring: separate cqe array from headers Pavel Begunkov
@ 2025-11-06 17:01 ` Pavel Begunkov
2025-11-06 17:01 ` [RFC 12/16] io_uring: convert pointer init to io_region_slice Pavel Begunkov
` (4 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Pavel Begunkov @ 2025-11-06 17:01 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence
Add a new helper that returns a sub-slice of a region's memory from an
{offset,size} pair and performs extra sanitisation, i.e. bounds and
overflow checks. It'll be used later for (slow path) ring setup.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/memmap.c | 12 ++++++++++++
io_uring/memmap.h | 2 ++
2 files changed, 14 insertions(+)
diff --git a/io_uring/memmap.c b/io_uring/memmap.c
index b329cee8d6e8..83faef350b9d 100644
--- a/io_uring/memmap.c
+++ b/io_uring/memmap.c
@@ -232,6 +232,18 @@ int io_create_region(struct io_ring_ctx *ctx, struct io_mapped_region *mr,
return ret;
}
+void *io_region_slice(struct io_mapped_region *mr, size_t off, size_t size)
+{
+ if (WARN_ON_ONCE(!size) || !io_region_is_set(mr))
+ return NULL;
+
+ size = size_add(off, size);
+ if (size == SIZE_MAX || size > io_region_size(mr))
+ return NULL;
+
+ return io_region_get_ptr(mr) + off;
+}
+
static struct io_mapped_region *io_mmap_get_region(struct io_ring_ctx *ctx,
loff_t pgoff)
{
diff --git a/io_uring/memmap.h b/io_uring/memmap.h
index d4b8b6363a7d..fa7a45cdb6dd 100644
--- a/io_uring/memmap.h
+++ b/io_uring/memmap.h
@@ -21,6 +21,8 @@ int io_create_region(struct io_ring_ctx *ctx, struct io_mapped_region *mr,
struct io_uring_region_desc *reg,
unsigned long mmap_offset);
+void *io_region_slice(struct io_mapped_region *mr, size_t off, size_t size);
+
static inline void *io_region_get_ptr(struct io_mapped_region *mr)
{
return mr->ptr;
--
2.49.0
* [RFC 12/16] io_uring: convert pointer init to io_region_slice
2025-11-06 17:01 [RFC 00/16] Introduce ring flexible placement Pavel Begunkov
` (10 preceding siblings ...)
2025-11-06 17:01 ` [RFC 11/16] io_uring/region: introduce io_region_slice Pavel Begunkov
@ 2025-11-06 17:01 ` Pavel Begunkov
2025-11-06 17:01 ` [RFC 13/16] io_uring: refactor rings_size() Pavel Begunkov
` (3 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Pavel Begunkov @ 2025-11-06 17:01 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence
Use io_region_slice() to initialise ctx ring pointers. The helper
performs bounds checks and other sanitisation, which is safer and will be
especially helpful when ring placement gets more complicated in coming
patches. It also extends struct io_scq_dim with all intermediate offsets
and sizes to fully describe rings.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/io_uring.c | 47 ++++++++++++++++++++++++---------------------
io_uring/io_uring.h | 2 ++
2 files changed, 27 insertions(+), 22 deletions(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 9aef41f6ce23..4f38a0b587fd 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2759,9 +2759,7 @@ static void io_rings_free(struct io_ring_ctx *ctx)
int rings_size(unsigned int flags, unsigned int sq_entries,
unsigned int cq_entries, struct io_scq_dim *dims)
{
- size_t off, sq_array_size;
- size_t cq_size, cqe_size;
- size_t sqe_size;
+ size_t cqe_size, off, sqe_size;
dims->sq_array_offset = SIZE_MAX;
@@ -2773,18 +2771,18 @@ int rings_size(unsigned int flags, unsigned int sq_entries,
cqe_size *= 2;
dims->sq_size = array_size(sqe_size, sq_entries);
- if (dims->sq_size == SIZE_MAX)
+ dims->sq_array_size = array_size(sizeof(u32), sq_entries);
+ dims->cq_size = array_size(cqe_size, cq_entries);
+
+ if (dims->sq_size == SIZE_MAX || dims->cq_size == SIZE_MAX ||
+ dims->sq_array_size == SIZE_MAX)
return -EOVERFLOW;
off = sizeof(struct io_rings);
off = L1_CACHE_ALIGN(off);
dims->cq_offset = off;
- cq_size = array_size(cqe_size, cq_entries);
- if (cq_size == SIZE_MAX)
- return -EOVERFLOW;
-
- off = size_add(off, cq_size);
+ off = size_add(off, dims->cq_size);
if (off == SIZE_MAX)
return -EOVERFLOW;
@@ -2809,12 +2807,7 @@ int rings_size(unsigned int flags, unsigned int sq_entries,
}
dims->sq_array_offset = off;
-
- sq_array_size = array_size(sizeof(u32), sq_entries);
- if (sq_array_size == SIZE_MAX)
- return -EOVERFLOW;
-
- if (check_add_overflow(off, sq_array_size, &off))
+ if (check_add_overflow(off, dims->sq_array_size, &off))
return -EOVERFLOW;
dims->cq_comp_size = off;
@@ -3375,7 +3368,6 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx,
struct io_scq_dim *dims = &config->dims;
struct io_uring_region_desc rd;
struct io_rings *rings;
- void *ptr;
int ret;
/* make sure these are sane, as we already accounted them */
@@ -3391,12 +3383,19 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx,
ret = io_create_region(ctx, &ctx->ring_region, &rd, IORING_OFF_CQ_RING);
if (ret)
return ret;
- ptr = io_region_get_ptr(&ctx->ring_region);
- ctx->rings = rings = ptr;
- ctx->cqes = ptr + config->dims.cq_offset;
- if (!(ctx->flags & IORING_SETUP_NO_SQARRAY))
- ctx->sq_array = ptr + dims->sq_array_offset;
+ ctx->rings = io_region_slice(&ctx->ring_region, 0, sizeof(struct io_rings));
+ ctx->cqes = io_region_slice(&ctx->ring_region, dims->cq_offset, dims->cq_size);
+ if (!ctx->rings || !ctx->cqes)
+ return -EFAULT;
+
+ if (!(ctx->flags & IORING_SETUP_NO_SQARRAY)) {
+ ctx->sq_array = io_region_slice(&ctx->ring_region,
+ dims->sq_array_offset,
+ dims->sq_array_size);
+ if (!ctx->sq_array)
+ return -EFAULT;
+ }
memset(&rd, 0, sizeof(rd));
rd.size = PAGE_ALIGN(dims->sq_size);
@@ -3409,8 +3408,12 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx,
io_rings_free(ctx);
return ret;
}
- ctx->sq_sqes = io_region_get_ptr(&ctx->sq_region);
+ ctx->sq_sqes = io_region_slice(&ctx->sq_region, 0, dims->sq_size);
+ if (!ctx->sq_sqes)
+ return -EFAULT;
+
+ rings = ctx->rings;
memset(rings, 0, sizeof(*rings));
WRITE_ONCE(rings->sq_ring_mask, ctx->sq_entries - 1);
WRITE_ONCE(rings->cq_ring_mask, ctx->cq_entries - 1);
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 80228c5a843c..ed57ab4161db 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -19,8 +19,10 @@
struct io_scq_dim {
size_t sq_array_offset;
+ size_t sq_array_size;
size_t sq_size;
size_t cq_offset;
+ size_t cq_size;
/* Compound array mmap'ed together with CQ. */
size_t cq_comp_size;
--
2.49.0
* [RFC 13/16] io_uring: refactor rings_size()
2025-11-06 17:01 [RFC 00/16] Introduce ring flexible placement Pavel Begunkov
` (11 preceding siblings ...)
2025-11-06 17:01 ` [RFC 12/16] io_uring: convert pointer init to io_region_slice Pavel Begunkov
@ 2025-11-06 17:01 ` Pavel Begunkov
2025-11-06 17:01 ` [RFC 14/16] io_uring: extract io_create_mem_region Pavel Begunkov
` (2 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Pavel Begunkov @ 2025-11-06 17:01 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence
Refactor rings_size() to facilitate further changes. Move the entry count
checks to the beginning, which localises the offset calculation. And
instead of special casing IORING_SETUP_NO_SQARRAY, reverse the check and
continue the offset calculation, so that cq_comp_size is assigned only
once at the end.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/io_uring.c | 31 +++++++++++++++----------------
1 file changed, 15 insertions(+), 16 deletions(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 4f38a0b587fd..ff52d9f110ce 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2761,6 +2761,15 @@ int rings_size(unsigned int flags, unsigned int sq_entries,
{
size_t cqe_size, off, sqe_size;
+ if (flags & IORING_SETUP_CQE_MIXED) {
+ if (cq_entries < 2)
+ return -EOVERFLOW;
+ }
+ if (flags & IORING_SETUP_SQE_MIXED) {
+ if (sq_entries < 2)
+ return -EOVERFLOW;
+ }
+
dims->sq_array_offset = SIZE_MAX;
sqe_size = sizeof(struct io_uring_sqe);
@@ -2786,29 +2795,19 @@ int rings_size(unsigned int flags, unsigned int sq_entries,
if (off == SIZE_MAX)
return -EOVERFLOW;
- if (flags & IORING_SETUP_CQE_MIXED) {
- if (cq_entries < 2)
- return -EOVERFLOW;
- }
- if (flags & IORING_SETUP_SQE_MIXED) {
- if (sq_entries < 2)
- return -EOVERFLOW;
- }
-
#ifdef CONFIG_SMP
off = ALIGN(off, SMP_CACHE_BYTES);
if (off == 0)
return -EOVERFLOW;
#endif
- if (flags & IORING_SETUP_NO_SQARRAY) {
- dims->cq_comp_size = off;
- return 0;
- }
+ if (!(flags & IORING_SETUP_NO_SQARRAY)) {
+ dims->sq_array_offset = off;
- dims->sq_array_offset = off;
- if (check_add_overflow(off, dims->sq_array_size, &off))
- return -EOVERFLOW;
+ off = size_add(off, dims->sq_array_size);
+ if (off == SIZE_MAX)
+ return -EOVERFLOW;
+ }
dims->cq_comp_size = off;
return 0;
--
2.49.0
* [RFC 14/16] io_uring: extract io_create_mem_region
2025-11-06 17:01 [RFC 00/16] Introduce ring flexible placement Pavel Begunkov
` (12 preceding siblings ...)
2025-11-06 17:01 ` [RFC 13/16] io_uring: refactor rings_size() Pavel Begunkov
@ 2025-11-06 17:01 ` Pavel Begunkov
2025-11-06 17:01 ` [RFC 15/16] io_uring: allow creating mem region at setup Pavel Begunkov
2025-11-06 17:01 ` [RFC 16/16] io_uring: introduce SCQ placement Pavel Begunkov
15 siblings, 0 replies; 17+ messages in thread
From: Pavel Begunkov @ 2025-11-06 17:01 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence
Extract a helper for creating a memory region that can be used at setup
time and not only from io_uring_register(). Specifically, it doesn't
check IORING_SETUP_R_DISABLED.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/io_uring.h | 3 +++
io_uring/register.c | 43 +++++++++++++++++++++++++------------------
2 files changed, 28 insertions(+), 18 deletions(-)
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index ed57ab4161db..20f6ca4696c1 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -200,6 +200,9 @@ void io_queue_next(struct io_kiocb *req);
void io_task_refs_refill(struct io_uring_task *tctx);
bool __io_alloc_req_refill(struct io_ring_ctx *ctx);
+int io_create_mem_region(struct io_ring_ctx *ctx,
+ struct io_uring_mem_region_reg *reg);
+
void io_activate_pollwq(struct io_ring_ctx *ctx);
static inline void io_lockdep_assert_cq_locked(struct io_ring_ctx *ctx)
diff --git a/io_uring/register.c b/io_uring/register.c
index b43a121e2974..425529a30dd9 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -572,10 +572,9 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
return ret;
}
-static int io_register_mem_region(struct io_ring_ctx *ctx, void __user *uarg)
+int io_create_mem_region(struct io_ring_ctx *ctx,
+ struct io_uring_mem_region_reg *reg)
{
- struct io_uring_mem_region_reg __user *reg_uptr = uarg;
- struct io_uring_mem_region_reg reg;
struct io_uring_region_desc __user *rd_uptr;
struct io_uring_region_desc rd;
struct io_mapped_region region = {};
@@ -583,23 +582,12 @@ static int io_register_mem_region(struct io_ring_ctx *ctx, void __user *uarg)
if (io_region_is_set(&ctx->param_region))
return -EBUSY;
- if (copy_from_user(®, reg_uptr, sizeof(reg)))
- return -EFAULT;
- rd_uptr = u64_to_user_ptr(reg.region_uptr);
+ rd_uptr = u64_to_user_ptr(reg->region_uptr);
if (copy_from_user(&rd, rd_uptr, sizeof(rd)))
return -EFAULT;
- if (memchr_inv(®.__resv, 0, sizeof(reg.__resv)))
+ if (memchr_inv(®->__resv, 0, sizeof(reg->__resv)))
return -EINVAL;
- if (reg.flags & ~IORING_MEM_REGION_REG_WAIT_ARG)
- return -EINVAL;
-
- /*
- * This ensures there are no waiters. Waiters are unlocked and it's
- * hard to synchronise with them, especially if we need to initialise
- * the region.
- */
- if ((reg.flags & IORING_MEM_REGION_REG_WAIT_ARG) &&
- !(ctx->flags & IORING_SETUP_R_DISABLED))
+ if (reg->flags & ~IORING_MEM_REGION_REG_WAIT_ARG)
return -EINVAL;
ret = io_create_region(ctx, ®ion, &rd, IORING_MAP_OFF_PARAM_REGION);
@@ -610,7 +598,7 @@ static int io_register_mem_region(struct io_ring_ctx *ctx, void __user *uarg)
return -EFAULT;
}
- if (reg.flags & IORING_MEM_REGION_REG_WAIT_ARG) {
+ if (reg->flags & IORING_MEM_REGION_REG_WAIT_ARG) {
ctx->cq_wait_arg = io_region_get_ptr(®ion);
ctx->cq_wait_size = rd.size;
}
@@ -619,6 +607,25 @@ static int io_register_mem_region(struct io_ring_ctx *ctx, void __user *uarg)
return 0;
}
+static int io_register_mem_region(struct io_ring_ctx *ctx, void __user *uarg)
+{
+ struct io_uring_mem_region_reg __user *reg_uptr = uarg;
+ struct io_uring_mem_region_reg reg;
+
+ if (copy_from_user(®, reg_uptr, sizeof(reg)))
+ return -EFAULT;
+ /*
+ * This ensures there are no waiters. Waiters are unlocked and it's
+ * hard to synchronise with them, especially if we need to initialise
+ * the region.
+ */
+ if ((reg.flags & IORING_MEM_REGION_REG_WAIT_ARG) &&
+ !(ctx->flags & IORING_SETUP_R_DISABLED))
+ return -EINVAL;
+
+ return io_create_mem_region(ctx, ®);
+}
+
static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
void __user *arg, unsigned nr_args)
__releases(ctx->uring_lock)
--
2.49.0
* [RFC 15/16] io_uring: allow creating mem region at setup
2025-11-06 17:01 [RFC 00/16] Introduce ring flexible placement Pavel Begunkov
` (13 preceding siblings ...)
2025-11-06 17:01 ` [RFC 14/16] io_uring: extract io_create_mem_region Pavel Begunkov
@ 2025-11-06 17:01 ` Pavel Begunkov
2025-11-06 17:01 ` [RFC 16/16] io_uring: introduce SCQ placement Pavel Begunkov
15 siblings, 0 replies; 17+ messages in thread
From: Pavel Begunkov @ 2025-11-06 17:01 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence
This patch gives users a way to create a memory region at ring setup
instead of requiring a separate registration call. For one, it can be
much more convenient when the region is used for wait parameters and
hence requires IORING_SETUP_R_DISABLED. The next patch will also need it
for placing the SCQ into the region.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
include/uapi/linux/io_uring.h | 7 ++++++-
io_uring/io_uring.c | 26 ++++++++++++++++++++++++++
io_uring/io_uring.h | 1 +
io_uring/register.c | 2 ++
4 files changed, 35 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 04797a9b76bc..2da052bd4138 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -599,11 +599,16 @@ struct io_uring_params {
__u32 sq_thread_idle;
__u32 features;
__u32 wq_fd;
- __u32 resv[3];
+ __u32 resv;
+ __u64 params_ext; /* pointer to struct io_uring_params_ext */
struct io_sqring_offsets sq_off;
struct io_cqring_offsets cq_off;
};
+struct io_uring_params_ext {
+ __u64 mem_region; /* pointer to struct io_uring_mem_region_reg */
+};
+
/*
* io_uring_params->features flags
*/
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index ff52d9f110ce..908c432aaaaa 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3574,8 +3574,13 @@ int io_uring_fill_params(struct io_uring_params *p)
static int io_prepare_config(struct io_ctx_config *config)
{
struct io_uring_params *p = &config->p;
+ struct io_uring_params_ext __user *ext_user;
int ret;
+ ext_user = u64_to_user_ptr(config->p.params_ext);
+ if (ext_user && copy_from_user(&config->ext, ext_user, sizeof(config->ext)))
+ return -EFAULT;
+
ret = io_uring_sanitise_params(p);
if (ret)
return ret;
@@ -3592,6 +3597,22 @@ static int io_prepare_config(struct io_ctx_config *config)
return 0;
}
+static int io_ctx_init_mem_region(struct io_ring_ctx *ctx,
+ struct io_ctx_config *config)
+{
+ struct io_uring_params_ext *e = &config->ext;
+ struct io_uring_mem_region_reg __user *mr_user;
+ struct io_uring_mem_region_reg mr;
+
+ mr_user = u64_to_user_ptr(e->mem_region);
+ if (!mr_user)
+ return 0;
+
+ if (copy_from_user(&mr, mr_user, sizeof(mr)))
+ return -EFAULT;
+ return io_create_mem_region(ctx, &mr);
+}
+
static __cold int io_uring_create(struct io_ctx_config *config)
{
struct io_uring_params *p = &config->p;
@@ -3661,10 +3682,15 @@ static __cold int io_uring_create(struct io_ctx_config *config)
mmgrab(current->mm);
ctx->mm_account = current->mm;
+ ret = io_ctx_init_mem_region(ctx, config);
+ if (ret)
+ goto err;
+
ret = io_allocate_scq_urings(ctx, config);
if (ret)
goto err;
+
ret = io_sq_offload_create(ctx, p);
if (ret)
goto err;
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 20f6ca4696c1..c883017b11d3 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -30,6 +30,7 @@ struct io_scq_dim {
struct io_ctx_config {
struct io_uring_params p;
+ struct io_uring_params_ext ext;
struct io_scq_dim dims;
struct io_uring_params __user *uptr;
};
diff --git a/io_uring/register.c b/io_uring/register.c
index 425529a30dd9..4affabc416aa 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -414,6 +414,8 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
return -EFAULT;
if (p.flags & ~RESIZE_FLAGS)
return -EINVAL;
+ if (p.params_ext)
+ return -EINVAL;
/* properties that are always inherited */
p.flags |= (ctx->flags & COPY_FLAGS);
--
2.49.0
* [RFC 16/16] io_uring: introduce SCQ placement
2025-11-06 17:01 [RFC 00/16] Introduce ring flexible placement Pavel Begunkov
` (14 preceding siblings ...)
2025-11-06 17:01 ` [RFC 15/16] io_uring: allow creating mem region at setup Pavel Begunkov
@ 2025-11-06 17:01 ` Pavel Begunkov
15 siblings, 0 replies; 17+ messages in thread
From: Pavel Begunkov @ 2025-11-06 17:01 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence
There is a recurring problem with how io_uring manages rings.
Specifically, it creates a new memory region for each ring and places
entries together with headers. As the number of entries is always a
power of 2, this usually means allocating an additional page just for
the headers, which is wasteful. The headers structure is also usually
small and under the cache line size, but it's padded out, which can mean
additional cache line bouncing.
Introduce a way for userspace to overlay the SCQ headers and/or rings
onto a pre-registered memory/parameter region. Each of them has a
separate flag and offset, and io_uring will attempt to place it at the
specified offset within the region. If the user doesn't request
placement for the SQ and/or CQ, io_uring creates a new memory region for
them as before.
The second goal is to be able to put all components into a single region
while knowing what's placed where. That's particularly interesting for
the planned BPF work, as it makes program writing much simpler.
Note: zcrx has the same issue, but it's left out of this series.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
include/uapi/linux/io_uring.h | 14 ++++
io_uring/io_uring.c | 143 ++++++++++++++++++++++++----------
io_uring/io_uring.h | 10 ++-
io_uring/register.c | 4 +-
4 files changed, 128 insertions(+), 43 deletions(-)
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 2da052bd4138..6574f0c6fc57 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -605,8 +605,22 @@ struct io_uring_params {
struct io_cqring_offsets cq_off;
};
+enum io_uring_scq_placement_flags {
+ IORING_PLACEMENT_SCQ_HDR = (1U << 0),
+ IORING_PLACEMENT_SQ = (1U << 1),
+ IORING_PLACEMENT_CQ = (1U << 2),
+};
+
+struct io_uring_scq_placement {
+ __u64 flags;
+ __u64 scq_hdr_off;
+ __u64 sq_off;
+ __u64 cq_off;
+};
+
struct io_uring_params_ext {
__u64 mem_region; /* pointer to struct io_uring_mem_region_reg */
+ struct io_uring_scq_placement placement;
};
/*
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 908c432aaaaa..b5179e444db2 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2757,9 +2757,11 @@ static void io_rings_free(struct io_ring_ctx *ctx)
}
int rings_size(unsigned int flags, unsigned int sq_entries,
- unsigned int cq_entries, struct io_scq_dim *dims)
+ unsigned int cq_entries, struct io_scq_dim *dims,
+ unsigned placement_flags)
{
- size_t cqe_size, off, sqe_size;
+ size_t cqe_size, sqe_size;
+ size_t off = 0;
if (flags & IORING_SETUP_CQE_MIXED) {
if (cq_entries < 2)
@@ -2787,19 +2789,25 @@ int rings_size(unsigned int flags, unsigned int sq_entries,
dims->sq_array_size == SIZE_MAX)
return -EOVERFLOW;
- off = sizeof(struct io_rings);
- off = L1_CACHE_ALIGN(off);
- dims->cq_offset = off;
+ if (!(placement_flags & IORING_PLACEMENT_SQ))
+ dims->sq_mr_size = dims->sq_size;
- off = size_add(off, dims->cq_size);
- if (off == SIZE_MAX)
- return -EOVERFLOW;
+ if (!(placement_flags & IORING_PLACEMENT_SCQ_HDR)) {
+ off = sizeof(struct io_rings);
+ off = L1_CACHE_ALIGN(off);
+ }
+ dims->cq_offset = off;
+ if (!(placement_flags & IORING_PLACEMENT_CQ)) {
+ off = size_add(off, dims->cq_size);
+ if (off == SIZE_MAX)
+ return -EOVERFLOW;
#ifdef CONFIG_SMP
- off = ALIGN(off, SMP_CACHE_BYTES);
- if (off == 0)
- return -EOVERFLOW;
+ off = ALIGN(off, SMP_CACHE_BYTES);
+ if (off == 0)
+ return -EOVERFLOW;
#endif
+ }
if (!(flags & IORING_SETUP_NO_SQARRAY)) {
dims->sq_array_offset = off;
@@ -2809,7 +2817,7 @@ int rings_size(unsigned int flags, unsigned int sq_entries,
return -EOVERFLOW;
}
- dims->cq_comp_size = off;
+ dims->rings_mr_size = off;
return 0;
}
@@ -3360,12 +3368,47 @@ bool io_is_uring_fops(struct file *file)
return file->f_op == &io_uring_fops;
}
+static int io_create_scq_regions(struct io_ring_ctx *ctx,
+ struct io_ctx_config *config)
+{
+ struct io_scq_dim *dims = &config->dims;
+ struct io_uring_params *p = &config->p;
+ struct io_uring_region_desc rd;
+ int ret;
+
+ if (dims->rings_mr_size) {
+ memset(&rd, 0, sizeof(rd));
+ rd.size = PAGE_ALIGN(dims->rings_mr_size);
+ if (ctx->flags & IORING_SETUP_NO_MMAP) {
+ rd.user_addr = p->cq_off.user_addr;
+ rd.flags |= IORING_MEM_REGION_TYPE_USER;
+ }
+ ret = io_create_region(ctx, &ctx->ring_region, &rd, IORING_OFF_CQ_RING);
+ if (ret)
+ return ret;
+ }
+
+ if (dims->sq_mr_size) {
+ memset(&rd, 0, sizeof(rd));
+ rd.size = PAGE_ALIGN(dims->sq_mr_size);
+ if (ctx->flags & IORING_SETUP_NO_MMAP) {
+ rd.user_addr = p->sq_off.user_addr;
+ rd.flags |= IORING_MEM_REGION_TYPE_USER;
+ }
+ ret = io_create_region(ctx, &ctx->sq_region, &rd, IORING_OFF_SQES);
+ if (ret)
+ return ret;
+ }
+
+ return 0;
+}
+
static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx,
struct io_ctx_config *config)
{
+ struct io_uring_scq_placement *pl = &config->ext.placement;
struct io_uring_params *p = &config->p;
struct io_scq_dim *dims = &config->dims;
- struct io_uring_region_desc rd;
struct io_rings *rings;
int ret;
@@ -3373,22 +3416,39 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx,
ctx->sq_entries = p->sq_entries;
ctx->cq_entries = p->cq_entries;
- memset(&rd, 0, sizeof(rd));
- rd.size = PAGE_ALIGN(dims->cq_comp_size);
- if (ctx->flags & IORING_SETUP_NO_MMAP) {
- rd.user_addr = p->cq_off.user_addr;
- rd.flags |= IORING_MEM_REGION_TYPE_USER;
- }
- ret = io_create_region(ctx, &ctx->ring_region, &rd, IORING_OFF_CQ_RING);
+ ret = io_create_scq_regions(ctx, config);
if (ret)
return ret;
- ctx->rings = io_region_slice(&ctx->ring_region, 0, sizeof(struct io_rings));
- ctx->cqes = io_region_slice(&ctx->ring_region, dims->cq_offset, dims->cq_size);
- if (!ctx->rings || !ctx->cqes)
- return -EFAULT;
+ if (pl->flags & IORING_PLACEMENT_SQ) {
+ ctx->sq_sqes = io_region_slice(&ctx->param_region,
+ pl->sq_off, dims->sq_size);
+ } else {
+ ctx->sq_sqes = io_region_slice(&ctx->sq_region,
+ 0, dims->sq_size);
+ }
+
+ if (pl->flags & IORING_PLACEMENT_SCQ_HDR) {
+ ctx->rings = io_region_slice(&ctx->param_region,
+ pl->scq_hdr_off,
+ sizeof(struct io_rings));
+ } else {
+ ctx->rings = io_region_slice(&ctx->ring_region,
+ 0, sizeof(struct io_rings));
+ }
+
+ if (pl->flags & IORING_PLACEMENT_CQ) {
+ ctx->cqes = io_region_slice(&ctx->param_region,
+ pl->cq_off, dims->cq_size);
+ } else {
+ ctx->cqes = io_region_slice(&ctx->ring_region,
+ dims->cq_offset, dims->cq_size);
+ }
if (!(ctx->flags & IORING_SETUP_NO_SQARRAY)) {
+ if (WARN_ON_ONCE(pl->flags & IORING_PLACEMENT_CQ))
+ return -EFAULT;
+
ctx->sq_array = io_region_slice(&ctx->ring_region,
dims->sq_array_offset,
dims->sq_array_size);
@@ -3396,20 +3456,7 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx,
return -EFAULT;
}
- memset(&rd, 0, sizeof(rd));
- rd.size = PAGE_ALIGN(dims->sq_size);
- if (ctx->flags & IORING_SETUP_NO_MMAP) {
- rd.user_addr = p->sq_off.user_addr;
- rd.flags |= IORING_MEM_REGION_TYPE_USER;
- }
- ret = io_create_region(ctx, &ctx->sq_region, &rd, IORING_OFF_SQES);
- if (ret) {
- io_rings_free(ctx);
- return ret;
- }
-
- ctx->sq_sqes = io_region_slice(&ctx->sq_region, 0, dims->sq_size);
- if (!ctx->sq_sqes)
+ if (!ctx->sq_sqes || !ctx->cqes || !ctx->rings)
return -EFAULT;
rings = ctx->rings;
@@ -3575,6 +3622,8 @@ static int io_prepare_config(struct io_ctx_config *config)
{
struct io_uring_params *p = &config->p;
struct io_uring_params_ext __user *ext_user;
+ struct io_uring_params_ext *e = &config->ext;
+ struct io_uring_scq_placement *pl = &e->placement;
int ret;
ext_user = u64_to_user_ptr(config->p.params_ext);
@@ -3589,10 +3638,26 @@ static int io_prepare_config(struct io_ctx_config *config)
if (ret)
return ret;
- ret = rings_size(p->flags, p->sq_entries, p->cq_entries, &config->dims);
+ ret = rings_size(p->flags, p->sq_entries, p->cq_entries, &config->dims,
+ pl->flags);
if (ret)
return ret;
+ if (pl->flags) {
+ if (pl->flags & ~IORING_PLACEMENT_MASK)
+ return -EOPNOTSUPP;
+ /* requires a registered memory region */
+ if (!e->mem_region)
+ return -EINVAL;
+ /* SQ arrays are not supported for simplicity */
+ if (!(p->flags & IORING_SETUP_NO_SQARRAY))
+ return -EINVAL;
+ /* don't allow creating a new region just for headers */
+ if ((pl->flags & IORING_PLACEMENT_CQ) &&
+ !(pl->flags & IORING_PLACEMENT_SCQ_HDR))
+ return -EINVAL;
+ }
+
io_fill_scq_offsets(p, &config->dims);
return 0;
}
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index c883017b11d3..307710464cc4 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -25,7 +25,8 @@ struct io_scq_dim {
size_t cq_size;
/* Compound array mmap'ed together with CQ. */
- size_t cq_comp_size;
+ size_t rings_mr_size;
+ size_t sq_mr_size;
};
struct io_ctx_config {
@@ -35,6 +36,10 @@ struct io_ctx_config {
struct io_uring_params __user *uptr;
};
+#define IORING_PLACEMENT_MASK (IORING_PLACEMENT_SCQ_HDR |\
+ IORING_PLACEMENT_SQ |\
+ IORING_PLACEMENT_CQ)
+
#define IORING_FEAT_FLAGS (IORING_FEAT_SINGLE_MMAP |\
IORING_FEAT_NODROP |\
IORING_FEAT_SUBMIT_STABLE |\
@@ -153,7 +158,8 @@ static inline bool io_should_wake(struct io_wait_queue *iowq)
#define IORING_MAX_CQ_ENTRIES (2 * IORING_MAX_ENTRIES)
int rings_size(unsigned int flags, unsigned int sq_entries,
- unsigned int cq_entries, struct io_scq_dim *dims);
+ unsigned int cq_entries, struct io_scq_dim *dims,
+ unsigned placement_flags);
int io_uring_fill_params(struct io_uring_params *p);
void io_fill_scq_offsets(struct io_uring_params *p, struct io_scq_dim *dims);
bool io_cqe_cache_refill(struct io_ring_ctx *ctx, bool overflow, bool cqe32);
diff --git a/io_uring/register.c b/io_uring/register.c
index 4affabc416aa..bbcb5a79a35f 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -423,12 +423,12 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
ret = io_uring_fill_params(&p);
if (unlikely(ret))
return ret;
- ret = rings_size(p.flags, p.sq_entries, p.cq_entries, &dims);
+ ret = rings_size(p.flags, p.sq_entries, p.cq_entries, &dims, 0);
if (ret)
return ret;
io_fill_scq_offsets(&p, &dims);
- size = dims.cq_comp_size;
+ size = dims.rings_mr_size;
sq_array_offset = dims.sq_array_offset;
memset(&rd, 0, sizeof(rd));
--
2.49.0