* [PATCH v3 00/18] kernel allocated regions and convert memmap to regions
From: Pavel Begunkov @ 2024-11-29 13:34 UTC
To: io-uring; +Cc: asml.silence
The first part of the series (Patches 1-11) implements kernel allocated
regions, which is the classical way SQ/CQ are created. It should be
straightforward, with simple preparation patches and cleanups. The main
part is Patch 10, which internally implements kernel allocations, and
Patch 11, which implements the mmap part and exposes it to reg-wait /
parameter region users.
The rest (Patches 12-18) converts SQ, CQ and provided buffer rings
to regions, which carves a common path for all of them and removes
duplication.
v3: fix !NOMMU unused function warning
rebased to avoid conflicts with recent fixes
use more appropriate alloc_pages_bulk_array_node
Pavel Begunkov (18):
io_uring: rename ->resize_lock
io_uring/rsrc: export io_check_coalesce_buffer
io_uring/memmap: flag vmap'ed regions
io_uring/memmap: flag regions with user pages
io_uring/memmap: account memory before pinning
io_uring/memmap: reuse io_free_region for failure path
io_uring/memmap: optimise single folio regions
io_uring/memmap: helper for pinning region pages
io_uring/memmap: add IO_REGION_F_SINGLE_REF
io_uring/memmap: implement kernel allocated regions
io_uring/memmap: implement mmap for regions
io_uring: pass ctx to io_register_free_rings
io_uring: use region api for SQ
io_uring: use region api for CQ
io_uring/kbuf: use mmap_lock to sync with mmap
io_uring/kbuf: remove pbuf ring refcounting
io_uring/kbuf: use region api for pbuf rings
io_uring/memmap: unify io_uring mmap'ing code
include/linux/io_uring_types.h | 23 +-
io_uring/io_uring.c | 72 +++----
io_uring/kbuf.c | 226 ++++++--------------
io_uring/kbuf.h | 20 +-
io_uring/memmap.c | 375 ++++++++++++++++-----------------
io_uring/memmap.h | 23 +-
io_uring/register.c | 91 ++++----
io_uring/rsrc.c | 22 +-
io_uring/rsrc.h | 4 +
9 files changed, 362 insertions(+), 494 deletions(-)
--
2.47.1
* [PATCH v3 01/18] io_uring: rename ->resize_lock
From: Pavel Begunkov @ 2024-11-29 13:34 UTC
To: io-uring; +Cc: asml.silence
->resize_lock is used for resizing rings, but it's a good idea to reuse
it in other cases as well. Rename it to mmap_lock, as it protects
against races with mmap.
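The diffs below use the kernel's scope-based lock guards; for readers
new to them, a minimal sketch of the pattern (kernel-internal API from
<linux/cleanup.h>, shown purely for illustration):

#include <linux/cleanup.h>
#include <linux/mutex.h>

/* guard(mutex) takes the lock immediately and releases it
 * automatically when the enclosing scope is left, on any exit path */
static void touch_under_mmap_lock(struct io_ring_ctx *ctx)
{
	guard(mutex)(&ctx->mmap_lock);	/* mutex_lock() happens here */
	/* ... critical section ... */
}					/* mutex_unlock() on scope exit */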
Signed-off-by: Pavel Begunkov <[email protected]>
---
include/linux/io_uring_types.h | 2 +-
io_uring/io_uring.c | 2 +-
io_uring/memmap.c | 6 +++---
io_uring/register.c | 8 ++++----
4 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 3e934feb3187..adb36e0da40e 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -423,7 +423,7 @@ struct io_ring_ctx {
* side will need to grab this lock, to prevent either side from
* being run concurrently with the other.
*/
- struct mutex resize_lock;
+ struct mutex mmap_lock;
/*
* If IORING_SETUP_NO_MMAP is used, then the below holds
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index ae199e44da57..c713ef35447b 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -351,7 +351,7 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
INIT_WQ_LIST(&ctx->submit_state.compl_reqs);
INIT_HLIST_HEAD(&ctx->cancelable_uring_cmd);
io_napi_init(ctx);
- mutex_init(&ctx->resize_lock);
+ mutex_init(&ctx->mmap_lock);
return ctx;
diff --git a/io_uring/memmap.c b/io_uring/memmap.c
index 57de9bccbf50..a0d4151d11af 100644
--- a/io_uring/memmap.c
+++ b/io_uring/memmap.c
@@ -329,7 +329,7 @@ __cold int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
unsigned int npages;
void *ptr;
- guard(mutex)(&ctx->resize_lock);
+ guard(mutex)(&ctx->mmap_lock);
ptr = io_uring_validate_mmap_request(file, vma->vm_pgoff, sz);
if (IS_ERR(ptr))
@@ -365,7 +365,7 @@ unsigned long io_uring_get_unmapped_area(struct file *filp, unsigned long addr,
if (addr)
return -EINVAL;
- guard(mutex)(&ctx->resize_lock);
+ guard(mutex)(&ctx->mmap_lock);
ptr = io_uring_validate_mmap_request(filp, pgoff, len);
if (IS_ERR(ptr))
@@ -415,7 +415,7 @@ unsigned long io_uring_get_unmapped_area(struct file *file, unsigned long addr,
struct io_ring_ctx *ctx = file->private_data;
void *ptr;
- guard(mutex)(&ctx->resize_lock);
+ guard(mutex)(&ctx->mmap_lock);
ptr = io_uring_validate_mmap_request(file, pgoff, len);
if (IS_ERR(ptr))
diff --git a/io_uring/register.c b/io_uring/register.c
index 1e99c783abdf..ba61697d7a53 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -486,15 +486,15 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
}
/*
- * We'll do the swap. Grab the ctx->resize_lock, which will exclude
+ * We'll do the swap. Grab the ctx->mmap_lock, which will exclude
* any new mmap's on the ring fd. Clear out existing mappings to prevent
* mmap from seeing them, as we'll unmap them. Any attempt to mmap
* existing rings beyond this point will fail. Not that it could proceed
* at this point anyway, as the io_uring mmap side needs go grab the
- * ctx->resize_lock as well. Likewise, hold the completion lock over the
+ * ctx->mmap_lock as well. Likewise, hold the completion lock over the
* duration of the actual swap.
*/
- mutex_lock(&ctx->resize_lock);
+ mutex_lock(&ctx->mmap_lock);
spin_lock(&ctx->completion_lock);
o.rings = ctx->rings;
ctx->rings = NULL;
@@ -561,7 +561,7 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
ret = 0;
out:
spin_unlock(&ctx->completion_lock);
- mutex_unlock(&ctx->resize_lock);
+ mutex_unlock(&ctx->mmap_lock);
io_register_free_rings(&p, to_free);
if (ctx->sq_data)
--
2.47.1
* [PATCH v3 02/18] io_uring/rsrc: export io_check_coalesce_buffer
From: Pavel Begunkov @ 2024-11-29 13:34 UTC
To: io-uring; +Cc: asml.silence
io_try_coalesce_buffer() is a useful helper that collects information
about a set of pages, and I want to reuse it for analysing ring/etc.
mappings. I don't need the entire thing and am only interested in
whether the pages can be folded into a single page, but reusing it is
better than duplicating the parsing.
Signed-off-by: Pavel Begunkov <[email protected]>
---
io_uring/rsrc.c | 22 ++++++++++++----------
io_uring/rsrc.h | 4 ++++
2 files changed, 16 insertions(+), 10 deletions(-)
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index adaae8630932..e51e5ddae728 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -626,11 +626,12 @@ static int io_buffer_account_pin(struct io_ring_ctx *ctx, struct page **pages,
return ret;
}
-static bool io_do_coalesce_buffer(struct page ***pages, int *nr_pages,
- struct io_imu_folio_data *data, int nr_folios)
+static bool io_coalesce_buffer(struct page ***pages, int *nr_pages,
+ struct io_imu_folio_data *data)
{
struct page **page_array = *pages, **new_array = NULL;
int nr_pages_left = *nr_pages, i, j;
+ int nr_folios = data->nr_folios;
/* Store head pages only*/
new_array = kvmalloc_array(nr_folios, sizeof(struct page *),
@@ -667,15 +668,14 @@ static bool io_do_coalesce_buffer(struct page ***pages, int *nr_pages,
return true;
}
-static bool io_try_coalesce_buffer(struct page ***pages, int *nr_pages,
- struct io_imu_folio_data *data)
+bool io_check_coalesce_buffer(struct page **page_array, int nr_pages,
+ struct io_imu_folio_data *data)
{
- struct page **page_array = *pages;
struct folio *folio = page_folio(page_array[0]);
unsigned int count = 1, nr_folios = 1;
int i;
- if (*nr_pages <= 1)
+ if (nr_pages <= 1)
return false;
data->nr_pages_mid = folio_nr_pages(folio);
@@ -687,7 +687,7 @@ static bool io_try_coalesce_buffer(struct page ***pages, int *nr_pages,
* Check if pages are contiguous inside a folio, and all folios have
* the same page count except for the head and tail.
*/
- for (i = 1; i < *nr_pages; i++) {
+ for (i = 1; i < nr_pages; i++) {
if (page_folio(page_array[i]) == folio &&
page_array[i] == page_array[i-1] + 1) {
count++;
@@ -715,7 +715,8 @@ static bool io_try_coalesce_buffer(struct page ***pages, int *nr_pages,
if (nr_folios == 1)
data->nr_pages_head = count;
- return io_do_coalesce_buffer(pages, nr_pages, data, nr_folios);
+ data->nr_folios = nr_folios;
+ return true;
}
static struct io_rsrc_node *io_sqe_buffer_register(struct io_ring_ctx *ctx,
@@ -729,7 +730,7 @@ static struct io_rsrc_node *io_sqe_buffer_register(struct io_ring_ctx *ctx,
size_t size;
int ret, nr_pages, i;
struct io_imu_folio_data data;
- bool coalesced;
+ bool coalesced = false;
if (!iov->iov_base)
return NULL;
@@ -749,7 +750,8 @@ static struct io_rsrc_node *io_sqe_buffer_register(struct io_ring_ctx *ctx,
}
/* If it's huge page(s), try to coalesce them into fewer bvec entries */
- coalesced = io_try_coalesce_buffer(&pages, &nr_pages, &data);
+ if (io_check_coalesce_buffer(pages, nr_pages, &data))
+ coalesced = io_coalesce_buffer(&pages, &nr_pages, &data);
imu = kvmalloc(struct_size(imu, bvec, nr_pages), GFP_KERNEL);
if (!imu)
diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
index 7a4668deaa1a..c8b093584461 100644
--- a/io_uring/rsrc.h
+++ b/io_uring/rsrc.h
@@ -40,6 +40,7 @@ struct io_imu_folio_data {
/* For non-head/tail folios, has to be fully included */
unsigned int nr_pages_mid;
unsigned int folio_shift;
+ unsigned int nr_folios;
};
struct io_rsrc_node *io_rsrc_node_alloc(struct io_ring_ctx *ctx, int type);
@@ -66,6 +67,9 @@ int io_register_rsrc_update(struct io_ring_ctx *ctx, void __user *arg,
int io_register_rsrc(struct io_ring_ctx *ctx, void __user *arg,
unsigned int size, unsigned int type);
+bool io_check_coalesce_buffer(struct page **page_array, int nr_pages,
+ struct io_imu_folio_data *data);
+
static inline struct io_rsrc_node *io_rsrc_node_lookup(struct io_rsrc_data *data,
int index)
{
--
2.47.1
* [PATCH v3 03/18] io_uring/memmap: flag vmap'ed regions
From: Pavel Begunkov @ 2024-11-29 13:34 UTC
To: io-uring; +Cc: asml.silence
Add internal flags for struct io_mapped_region. The first flag we need
is IO_REGION_F_VMAP, which indicates that the pointer has to be
unmapped on region destruction. For now all regions are vmap'ed, so
it's set unconditionally.
Signed-off-by: Pavel Begunkov <[email protected]>
---
include/linux/io_uring_types.h | 5 +++--
io_uring/memmap.c | 14 ++++++++++----
io_uring/memmap.h | 2 +-
3 files changed, 14 insertions(+), 7 deletions(-)
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index adb36e0da40e..4cee414080fd 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -77,8 +77,9 @@ struct io_hash_table {
struct io_mapped_region {
struct page **pages;
- void *vmap_ptr;
- size_t nr_pages;
+ void *ptr;
+ unsigned nr_pages;
+ unsigned flags;
};
/*
diff --git a/io_uring/memmap.c b/io_uring/memmap.c
index a0d4151d11af..31fb8c8ffe4e 100644
--- a/io_uring/memmap.c
+++ b/io_uring/memmap.c
@@ -202,14 +202,19 @@ void *__io_uaddr_map(struct page ***pages, unsigned short *npages,
return ERR_PTR(-ENOMEM);
}
+enum {
+ /* memory was vmap'ed for the kernel, freeing the region vunmap's it */
+ IO_REGION_F_VMAP = 1,
+};
+
void io_free_region(struct io_ring_ctx *ctx, struct io_mapped_region *mr)
{
if (mr->pages) {
unpin_user_pages(mr->pages, mr->nr_pages);
kvfree(mr->pages);
}
- if (mr->vmap_ptr)
- vunmap(mr->vmap_ptr);
+ if ((mr->flags & IO_REGION_F_VMAP) && mr->ptr)
+ vunmap(mr->ptr);
if (mr->nr_pages && ctx->user)
__io_unaccount_mem(ctx->user, mr->nr_pages);
@@ -225,7 +230,7 @@ int io_create_region(struct io_ring_ctx *ctx, struct io_mapped_region *mr,
void *vptr;
u64 end;
- if (WARN_ON_ONCE(mr->pages || mr->vmap_ptr || mr->nr_pages))
+ if (WARN_ON_ONCE(mr->pages || mr->ptr || mr->nr_pages))
return -EFAULT;
if (memchr_inv(&reg->__resv, 0, sizeof(reg->__resv)))
return -EINVAL;
@@ -260,8 +265,9 @@ int io_create_region(struct io_ring_ctx *ctx, struct io_mapped_region *mr,
}
mr->pages = pages;
- mr->vmap_ptr = vptr;
+ mr->ptr = vptr;
mr->nr_pages = nr_pages;
+ mr->flags |= IO_REGION_F_VMAP;
return 0;
out_free:
if (pages_accounted)
diff --git a/io_uring/memmap.h b/io_uring/memmap.h
index f361a635b6c7..2096a8427277 100644
--- a/io_uring/memmap.h
+++ b/io_uring/memmap.h
@@ -28,7 +28,7 @@ int io_create_region(struct io_ring_ctx *ctx, struct io_mapped_region *mr,
static inline void *io_region_get_ptr(struct io_mapped_region *mr)
{
- return mr->vmap_ptr;
+ return mr->ptr;
}
static inline bool io_region_is_set(struct io_mapped_region *mr)
--
2.47.1
* [PATCH v3 04/18] io_uring/memmap: flag regions with user pages
From: Pavel Begunkov @ 2024-11-29 13:34 UTC
To: io-uring; +Cc: asml.silence
In preparation for kernel allocated regions, add a flag telling
whether the region contains user pinned pages or not.
Signed-off-by: Pavel Begunkov <[email protected]>
---
io_uring/memmap.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/io_uring/memmap.c b/io_uring/memmap.c
index 31fb8c8ffe4e..a0416733e921 100644
--- a/io_uring/memmap.c
+++ b/io_uring/memmap.c
@@ -205,12 +205,17 @@ void *__io_uaddr_map(struct page ***pages, unsigned short *npages,
enum {
/* memory was vmap'ed for the kernel, freeing the region vunmap's it */
IO_REGION_F_VMAP = 1,
+ /* memory is provided by user and pinned by the kernel */
+ IO_REGION_F_USER_PROVIDED = 2,
};
void io_free_region(struct io_ring_ctx *ctx, struct io_mapped_region *mr)
{
if (mr->pages) {
- unpin_user_pages(mr->pages, mr->nr_pages);
+ if (mr->flags & IO_REGION_F_USER_PROVIDED)
+ unpin_user_pages(mr->pages, mr->nr_pages);
+ else
+ release_pages(mr->pages, mr->nr_pages);
kvfree(mr->pages);
}
if ((mr->flags & IO_REGION_F_VMAP) && mr->ptr)
@@ -267,7 +272,7 @@ int io_create_region(struct io_ring_ctx *ctx, struct io_mapped_region *mr,
mr->pages = pages;
mr->ptr = vptr;
mr->nr_pages = nr_pages;
- mr->flags |= IO_REGION_F_VMAP;
+ mr->flags |= IO_REGION_F_VMAP | IO_REGION_F_USER_PROVIDED;
return 0;
out_free:
if (pages_accounted)
--
2.47.1
* [PATCH v3 05/18] io_uring/memmap: account memory before pinning
From: Pavel Begunkov @ 2024-11-29 13:34 UTC
To: io-uring; +Cc: asml.silence
Move memory accounting before page pinning. We shouldn't even try to
pin pages if it's not allowed, and accounting is also relatively
inexpensive. It also gives a better code structure, as we do generic
accounting first and can then branch for different mapping types.
Signed-off-by: Pavel Begunkov <[email protected]>
---
io_uring/memmap.c | 17 +++++++++++------
1 file changed, 11 insertions(+), 6 deletions(-)
diff --git a/io_uring/memmap.c b/io_uring/memmap.c
index a0416733e921..fca93bc4c6f1 100644
--- a/io_uring/memmap.c
+++ b/io_uring/memmap.c
@@ -252,17 +252,21 @@ int io_create_region(struct io_ring_ctx *ctx, struct io_mapped_region *mr,
if (check_add_overflow(reg->user_addr, reg->size, &end))
return -EOVERFLOW;
- pages = io_pin_pages(reg->user_addr, reg->size, &nr_pages);
- if (IS_ERR(pages))
- return PTR_ERR(pages);
-
+ nr_pages = reg->size >> PAGE_SHIFT;
if (ctx->user) {
ret = __io_account_mem(ctx->user, nr_pages);
if (ret)
- goto out_free;
+ return ret;
pages_accounted = nr_pages;
}
+ pages = io_pin_pages(reg->user_addr, reg->size, &nr_pages);
+ if (IS_ERR(pages)) {
+ ret = PTR_ERR(pages);
+ pages = NULL;
+ goto out_free;
+ }
+
vptr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
if (!vptr) {
ret = -ENOMEM;
@@ -277,7 +281,8 @@ int io_create_region(struct io_ring_ctx *ctx, struct io_mapped_region *mr,
out_free:
if (pages_accounted)
__io_unaccount_mem(ctx->user, pages_accounted);
- io_pages_free(&pages, nr_pages);
+ if (pages)
+ io_pages_free(&pages, nr_pages);
return ret;
}
--
2.47.1
* [PATCH v3 06/18] io_uring/memmap: reuse io_free_region for failure path
From: Pavel Begunkov @ 2024-11-29 13:34 UTC
To: io-uring; +Cc: asml.silence
Regions are going to become more complex with allocation options and
optimisations, so I want to split initialisation into steps, and for
that it needs a sane failure path. Reuse io_free_region(): it's smart
enough to undo only what's needed and leaves the structure in a
consistent state.
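The resulting shape, sketched with hypothetical step names (the real
steps are introduced over the following patches):

/* illustrative only: each step publishes its result into *mr before
 * the next one runs, so a single teardown undoes exactly what was set
 * up; step_account/step_pin/step_map are made-up names */
static int region_init_sketch(struct io_ring_ctx *ctx,
			      struct io_mapped_region *mr)
{
	int ret;

	ret = step_account(ctx, mr);		/* sets mr->nr_pages */
	if (!ret)
		ret = step_pin(ctx, mr);	/* sets mr->pages + flags */
	if (!ret)
		ret = step_map(mr);		/* sets mr->ptr */
	if (ret)
		io_free_region(ctx, mr);	/* undoes only what's set */
	return ret;
}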
Signed-off-by: Pavel Begunkov <[email protected]>
---
io_uring/memmap.c | 16 +++++-----------
1 file changed, 5 insertions(+), 11 deletions(-)
diff --git a/io_uring/memmap.c b/io_uring/memmap.c
index fca93bc4c6f1..96c4f6b61171 100644
--- a/io_uring/memmap.c
+++ b/io_uring/memmap.c
@@ -229,7 +229,6 @@ void io_free_region(struct io_ring_ctx *ctx, struct io_mapped_region *mr)
int io_create_region(struct io_ring_ctx *ctx, struct io_mapped_region *mr,
struct io_uring_region_desc *reg)
{
- int pages_accounted = 0;
struct page **pages;
int nr_pages, ret;
void *vptr;
@@ -257,32 +256,27 @@ int io_create_region(struct io_ring_ctx *ctx, struct io_mapped_region *mr,
ret = __io_account_mem(ctx->user, nr_pages);
if (ret)
return ret;
- pages_accounted = nr_pages;
}
+ mr->nr_pages = nr_pages;
pages = io_pin_pages(reg->user_addr, reg->size, &nr_pages);
if (IS_ERR(pages)) {
ret = PTR_ERR(pages);
- pages = NULL;
goto out_free;
}
+ mr->pages = pages;
+ mr->flags |= IO_REGION_F_USER_PROVIDED;
vptr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
if (!vptr) {
ret = -ENOMEM;
goto out_free;
}
-
- mr->pages = pages;
mr->ptr = vptr;
- mr->nr_pages = nr_pages;
- mr->flags |= IO_REGION_F_VMAP | IO_REGION_F_USER_PROVIDED;
+ mr->flags |= IO_REGION_F_VMAP;
return 0;
out_free:
- if (pages_accounted)
- __io_unaccount_mem(ctx->user, pages_accounted);
- if (pages)
- io_pages_free(&pages, nr_pages);
+ io_free_region(ctx, mr);
return ret;
}
--
2.47.1
* [PATCH v3 07/18] io_uring/memmap: optimise single folio regions
From: Pavel Begunkov @ 2024-11-29 13:34 UTC
To: io-uring; +Cc: asml.silence
We don't need to vmap the memory if it is already physically
contiguous. There are two important cases this covers: PAGE_SIZE
regions and huge pages. Use io_check_coalesce_buffer() to get the
number of contiguous folios; when there is just one, e.g. a region
backed by a single 2MiB huge page, the kernel pointer can be taken
straight from the direct map via page_address().
Signed-off-by: Pavel Begunkov <[email protected]>
---
io_uring/memmap.c | 29 ++++++++++++++++++++++-------
1 file changed, 22 insertions(+), 7 deletions(-)
diff --git a/io_uring/memmap.c b/io_uring/memmap.c
index 96c4f6b61171..fd348c98f64f 100644
--- a/io_uring/memmap.c
+++ b/io_uring/memmap.c
@@ -226,12 +226,31 @@ void io_free_region(struct io_ring_ctx *ctx, struct io_mapped_region *mr)
memset(mr, 0, sizeof(*mr));
}
+static int io_region_init_ptr(struct io_mapped_region *mr)
+{
+ struct io_imu_folio_data ifd;
+ void *ptr;
+
+ if (io_check_coalesce_buffer(mr->pages, mr->nr_pages, &ifd)) {
+ if (ifd.nr_folios == 1) {
+ mr->ptr = page_address(mr->pages[0]);
+ return 0;
+ }
+ }
+ ptr = vmap(mr->pages, mr->nr_pages, VM_MAP, PAGE_KERNEL);
+ if (!ptr)
+ return -ENOMEM;
+
+ mr->ptr = ptr;
+ mr->flags |= IO_REGION_F_VMAP;
+ return 0;
+}
+
int io_create_region(struct io_ring_ctx *ctx, struct io_mapped_region *mr,
struct io_uring_region_desc *reg)
{
struct page **pages;
int nr_pages, ret;
- void *vptr;
u64 end;
if (WARN_ON_ONCE(mr->pages || mr->ptr || mr->nr_pages))
@@ -267,13 +286,9 @@ int io_create_region(struct io_ring_ctx *ctx, struct io_mapped_region *mr,
mr->pages = pages;
mr->flags |= IO_REGION_F_USER_PROVIDED;
- vptr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
- if (!vptr) {
- ret = -ENOMEM;
+ ret = io_region_init_ptr(mr);
+ if (ret)
goto out_free;
- }
- mr->ptr = vptr;
- mr->flags |= IO_REGION_F_VMAP;
return 0;
out_free:
io_free_region(ctx, mr);
--
2.47.1
* [PATCH v3 08/18] io_uring/memmap: helper for pinning region pages
From: Pavel Begunkov @ 2024-11-29 13:34 UTC
To: io-uring; +Cc: asml.silence
In preparation for adding kernel allocated regions, extract a new
helper that pins user pages.
Signed-off-by: Pavel Begunkov <[email protected]>
---
io_uring/memmap.c | 28 +++++++++++++++++++++-------
1 file changed, 21 insertions(+), 7 deletions(-)
diff --git a/io_uring/memmap.c b/io_uring/memmap.c
index fd348c98f64f..5d261e07c2e3 100644
--- a/io_uring/memmap.c
+++ b/io_uring/memmap.c
@@ -246,10 +246,28 @@ static int io_region_init_ptr(struct io_mapped_region *mr)
return 0;
}
+static int io_region_pin_pages(struct io_ring_ctx *ctx,
+ struct io_mapped_region *mr,
+ struct io_uring_region_desc *reg)
+{
+ unsigned long size = mr->nr_pages << PAGE_SHIFT;
+ struct page **pages;
+ int nr_pages;
+
+ pages = io_pin_pages(reg->user_addr, size, &nr_pages);
+ if (IS_ERR(pages))
+ return PTR_ERR(pages);
+ if (WARN_ON_ONCE(nr_pages != mr->nr_pages))
+ return -EFAULT;
+
+ mr->pages = pages;
+ mr->flags |= IO_REGION_F_USER_PROVIDED;
+ return 0;
+}
+
int io_create_region(struct io_ring_ctx *ctx, struct io_mapped_region *mr,
struct io_uring_region_desc *reg)
{
- struct page **pages;
int nr_pages, ret;
u64 end;
@@ -278,13 +296,9 @@ int io_create_region(struct io_ring_ctx *ctx, struct io_mapped_region *mr,
}
mr->nr_pages = nr_pages;
- pages = io_pin_pages(reg->user_addr, reg->size, &nr_pages);
- if (IS_ERR(pages)) {
- ret = PTR_ERR(pages);
+ ret = io_region_pin_pages(ctx, mr, reg);
+ if (ret)
goto out_free;
- }
- mr->pages = pages;
- mr->flags |= IO_REGION_F_USER_PROVIDED;
ret = io_region_init_ptr(mr);
if (ret)
--
2.47.1
* [PATCH v3 09/18] io_uring/memmap: add IO_REGION_F_SINGLE_REF
From: Pavel Begunkov @ 2024-11-29 13:34 UTC
To: io-uring; +Cc: asml.silence
Kernel allocated compound pages will have just one reference for the
entire page array, so add a flag telling io_free_region() about that.
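The reason is how the compound allocation path (added by the next
patch) fills the page array; a sketch of the assumed shape of
io_mem_alloc_compound(), based on how it is called there:

/* assumed shape, for illustration: one high-order (compound)
 * allocation backs the whole region, and the page array merely points
 * into it, so only the head page holds a reference */
static void *alloc_compound_sketch(struct page **pages, int nr_pages,
				   size_t size, gfp_t gfp)
{
	struct page *page;
	int i;

	page = alloc_pages(gfp | __GFP_COMP, get_order(size));
	if (!page)
		return ERR_PTR(-ENOMEM);
	for (i = 0; i < nr_pages; i++)
		pages[i] = page + i;	/* no per-page refs taken */
	return page_address(page);
}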
Signed-off-by: Pavel Begunkov <[email protected]>
---
io_uring/memmap.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/io_uring/memmap.c b/io_uring/memmap.c
index 5d261e07c2e3..a37ccb167258 100644
--- a/io_uring/memmap.c
+++ b/io_uring/memmap.c
@@ -207,15 +207,23 @@ enum {
IO_REGION_F_VMAP = 1,
/* memory is provided by user and pinned by the kernel */
IO_REGION_F_USER_PROVIDED = 2,
+ /* only the first page in the array is ref'ed */
+ IO_REGION_F_SINGLE_REF = 4,
};
void io_free_region(struct io_ring_ctx *ctx, struct io_mapped_region *mr)
{
if (mr->pages) {
+ long nr_refs = mr->nr_pages;
+
+ if (mr->flags & IO_REGION_F_SINGLE_REF)
+ nr_refs = 1;
+
if (mr->flags & IO_REGION_F_USER_PROVIDED)
- unpin_user_pages(mr->pages, mr->nr_pages);
+ unpin_user_pages(mr->pages, nr_refs);
else
- release_pages(mr->pages, mr->nr_pages);
+ release_pages(mr->pages, nr_refs);
+
kvfree(mr->pages);
}
if ((mr->flags & IO_REGION_F_VMAP) && mr->ptr)
--
2.47.1
* [PATCH v3 10/18] io_uring/memmap: implement kernel allocated regions
From: Pavel Begunkov @ 2024-11-29 13:34 UTC
To: io-uring; +Cc: asml.silence
Allow the kernel to allocate memory for a region. That's the classical
way SQ/CQ are allocated. It's not yet useful to user space as there
is no way to mmap it, which is why it's explicitly disabled in
io_register_mem_region().
Signed-off-by: Pavel Begunkov <[email protected]>
---
io_uring/memmap.c | 43 ++++++++++++++++++++++++++++++++++++++++---
io_uring/register.c | 2 ++
2 files changed, 42 insertions(+), 3 deletions(-)
diff --git a/io_uring/memmap.c b/io_uring/memmap.c
index a37ccb167258..0908a71bf57e 100644
--- a/io_uring/memmap.c
+++ b/io_uring/memmap.c
@@ -273,6 +273,39 @@ static int io_region_pin_pages(struct io_ring_ctx *ctx,
return 0;
}
+static int io_region_allocate_pages(struct io_ring_ctx *ctx,
+ struct io_mapped_region *mr,
+ struct io_uring_region_desc *reg)
+{
+ gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_NOWARN;
+ unsigned long size = mr->nr_pages << PAGE_SHIFT;
+ unsigned long nr_allocated;
+ struct page **pages;
+ void *p;
+
+ pages = kvmalloc_array(mr->nr_pages, sizeof(*pages), gfp);
+ if (!pages)
+ return -ENOMEM;
+
+ p = io_mem_alloc_compound(pages, mr->nr_pages, size, gfp);
+ if (!IS_ERR(p)) {
+ mr->flags |= IO_REGION_F_SINGLE_REF;
+ mr->pages = pages;
+ return 0;
+ }
+
+ nr_allocated = alloc_pages_bulk_array_node(gfp, NUMA_NO_NODE,
+ mr->nr_pages, pages);
+ if (nr_allocated != mr->nr_pages) {
+ if (nr_allocated)
+ release_pages(pages, nr_allocated);
+ kvfree(pages);
+ return -ENOMEM;
+ }
+ mr->pages = pages;
+ return 0;
+}
+
int io_create_region(struct io_ring_ctx *ctx, struct io_mapped_region *mr,
struct io_uring_region_desc *reg)
{
@@ -283,9 +316,10 @@ int io_create_region(struct io_ring_ctx *ctx, struct io_mapped_region *mr,
return -EFAULT;
if (memchr_inv(&reg->__resv, 0, sizeof(reg->__resv)))
return -EINVAL;
- if (reg->flags != IORING_MEM_REGION_TYPE_USER)
+ if (reg->flags & ~IORING_MEM_REGION_TYPE_USER)
return -EINVAL;
- if (!reg->user_addr)
+ /* user_addr should be set IFF it's a user memory backed region */
+ if ((reg->flags & IORING_MEM_REGION_TYPE_USER) != !!reg->user_addr)
return -EFAULT;
if (!reg->size || reg->mmap_offset || reg->id)
return -EINVAL;
@@ -304,7 +338,10 @@ int io_create_region(struct io_ring_ctx *ctx, struct io_mapped_region *mr,
}
mr->nr_pages = nr_pages;
- ret = io_region_pin_pages(ctx, mr, reg);
+ if (reg->flags & IORING_MEM_REGION_TYPE_USER)
+ ret = io_region_pin_pages(ctx, mr, reg);
+ else
+ ret = io_region_allocate_pages(ctx, mr, reg);
if (ret)
goto out_free;
diff --git a/io_uring/register.c b/io_uring/register.c
index ba61697d7a53..f043d3f6b026 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -586,6 +586,8 @@ static int io_register_mem_region(struct io_ring_ctx *ctx, void __user *uarg)
if (copy_from_user(&rd, rd_uptr, sizeof(rd)))
return -EFAULT;
+ if (!(rd.flags & IORING_MEM_REGION_TYPE_USER))
+ return -EINVAL;
if (memchr_inv(&reg.__resv, 0, sizeof(reg.__resv)))
return -EINVAL;
if (reg.flags & ~IORING_MEM_REGION_REG_WAIT_ARG)
--
2.47.1
* [PATCH v3 11/18] io_uring/memmap: implement mmap for regions
From: Pavel Begunkov @ 2024-11-29 13:34 UTC
To: io-uring; +Cc: asml.silence
The patch implements mmap for the param region and enables the kernel
allocation mode. Internally it uses a fixed mmap offset; however, the
user has to use the offset returned in
struct io_uring_region_desc::mmap_offset.
Note that mmap doesn't and can't take ->uring_lock; the region / ring
lookup is protected by ->mmap_lock instead, and mmap directly peeks at
ctx->param_region. We can't protect io_create_region() with the
mmap_lock as it'd deadlock, which is why io_create_region_mmap_safe()
initialises the region for us in a temporary variable and then
publishes it with the lock taken. It's intentionally decoupled from the
main region helpers, and in the future we might want a list of active
regions, which could then be protected by ->mmap_lock.
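From the user side the flow looks roughly like this (a sketch assuming
the uapi of a kernel with this series applied; error handling elided):

#include <linux/io_uring.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* register a kernel-allocated parameter region and mmap it */
static void *map_param_region(int ring_fd, size_t size)
{
	struct io_uring_region_desc rd = { .size = size }; /* no user_addr */
	struct io_uring_mem_region_reg mr = {
		.region_uptr = (__u64)(unsigned long)&rd,
		.flags = IORING_MEM_REGION_REG_WAIT_ARG,
	};

	if (syscall(__NR_io_uring_register, ring_fd,
		    IORING_REGISTER_MEM_REGION, &mr, 1))
		return NULL;
	/* the kernel filled in rd.mmap_offset; always use the returned
	 * value rather than assuming the fixed internal offset */
	return mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
		    ring_fd, rd.mmap_offset);
}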
Signed-off-by: Pavel Begunkov <[email protected]>
---
io_uring/memmap.c | 61 +++++++++++++++++++++++++++++++++++++++++----
io_uring/memmap.h | 10 +++++++-
io_uring/register.c | 6 ++---
3 files changed, 67 insertions(+), 10 deletions(-)
diff --git a/io_uring/memmap.c b/io_uring/memmap.c
index 0908a71bf57e..9a182c8a4be1 100644
--- a/io_uring/memmap.c
+++ b/io_uring/memmap.c
@@ -275,7 +275,8 @@ static int io_region_pin_pages(struct io_ring_ctx *ctx,
static int io_region_allocate_pages(struct io_ring_ctx *ctx,
struct io_mapped_region *mr,
- struct io_uring_region_desc *reg)
+ struct io_uring_region_desc *reg,
+ unsigned long mmap_offset)
{
gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_NOWARN;
unsigned long size = mr->nr_pages << PAGE_SHIFT;
@@ -290,8 +291,7 @@ static int io_region_allocate_pages(struct io_ring_ctx *ctx,
p = io_mem_alloc_compound(pages, mr->nr_pages, size, gfp);
if (!IS_ERR(p)) {
mr->flags |= IO_REGION_F_SINGLE_REF;
- mr->pages = pages;
- return 0;
+ goto done;
}
nr_allocated = alloc_pages_bulk_array_node(gfp, NUMA_NO_NODE,
@@ -302,12 +302,15 @@ static int io_region_allocate_pages(struct io_ring_ctx *ctx,
kvfree(pages);
return -ENOMEM;
}
+done:
+ reg->mmap_offset = mmap_offset;
mr->pages = pages;
return 0;
}
int io_create_region(struct io_ring_ctx *ctx, struct io_mapped_region *mr,
- struct io_uring_region_desc *reg)
+ struct io_uring_region_desc *reg,
+ unsigned long mmap_offset)
{
int nr_pages, ret;
u64 end;
@@ -341,7 +344,7 @@ int io_create_region(struct io_ring_ctx *ctx, struct io_mapped_region *mr,
if (reg->flags & IORING_MEM_REGION_TYPE_USER)
ret = io_region_pin_pages(ctx, mr, reg);
else
- ret = io_region_allocate_pages(ctx, mr, reg);
+ ret = io_region_allocate_pages(ctx, mr, reg, mmap_offset);
if (ret)
goto out_free;
@@ -354,6 +357,40 @@ int io_create_region(struct io_ring_ctx *ctx, struct io_mapped_region *mr,
return ret;
}
+int io_create_region_mmap_safe(struct io_ring_ctx *ctx, struct io_mapped_region *mr,
+ struct io_uring_region_desc *reg,
+ unsigned long mmap_offset)
+{
+ struct io_mapped_region tmp_mr;
+ int ret;
+
+ memcpy(&tmp_mr, mr, sizeof(tmp_mr));
+ ret = io_create_region(ctx, &tmp_mr, reg, mmap_offset);
+ if (ret)
+ return ret;
+
+ /*
+ * Once published mmap can find it without holding only the ->mmap_lock
+ * and not ->uring_lock.
+ */
+ guard(mutex)(&ctx->mmap_lock);
+ memcpy(mr, &tmp_mr, sizeof(tmp_mr));
+ return 0;
+}
+
+static void *io_region_validate_mmap(struct io_ring_ctx *ctx,
+ struct io_mapped_region *mr)
+{
+ lockdep_assert_held(&ctx->mmap_lock);
+
+ if (!io_region_is_set(mr))
+ return ERR_PTR(-EINVAL);
+ if (mr->flags & IO_REGION_F_USER_PROVIDED)
+ return ERR_PTR(-EINVAL);
+
+ return io_region_get_ptr(mr);
+}
+
static void *io_uring_validate_mmap_request(struct file *file, loff_t pgoff,
size_t sz)
{
@@ -389,6 +426,8 @@ static void *io_uring_validate_mmap_request(struct file *file, loff_t pgoff,
io_put_bl(ctx, bl);
return ptr;
}
+ case IORING_MAP_OFF_PARAM_REGION:
+ return io_region_validate_mmap(ctx, &ctx->param_region);
}
return ERR_PTR(-EINVAL);
@@ -405,6 +444,16 @@ int io_uring_mmap_pages(struct io_ring_ctx *ctx, struct vm_area_struct *vma,
#ifdef CONFIG_MMU
+static int io_region_mmap(struct io_ring_ctx *ctx,
+ struct io_mapped_region *mr,
+ struct vm_area_struct *vma)
+{
+ unsigned long nr_pages = mr->nr_pages;
+
+ vm_flags_set(vma, VM_DONTEXPAND);
+ return vm_insert_pages(vma, vma->vm_start, mr->pages, &nr_pages);
+}
+
__cold int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
{
struct io_ring_ctx *ctx = file->private_data;
@@ -429,6 +478,8 @@ __cold int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
ctx->n_sqe_pages);
case IORING_OFF_PBUF_RING:
return io_pbuf_mmap(file, vma);
+ case IORING_MAP_OFF_PARAM_REGION:
+ return io_region_mmap(ctx, &ctx->param_region, vma);
}
return -EINVAL;
diff --git a/io_uring/memmap.h b/io_uring/memmap.h
index 2096a8427277..2402bca3d700 100644
--- a/io_uring/memmap.h
+++ b/io_uring/memmap.h
@@ -1,6 +1,8 @@
#ifndef IO_URING_MEMMAP_H
#define IO_URING_MEMMAP_H
+#define IORING_MAP_OFF_PARAM_REGION 0x20000000ULL
+
struct page **io_pin_pages(unsigned long ubuf, unsigned long len, int *npages);
void io_pages_free(struct page ***pages, int npages);
int io_uring_mmap_pages(struct io_ring_ctx *ctx, struct vm_area_struct *vma,
@@ -24,7 +26,13 @@ int io_uring_mmap(struct file *file, struct vm_area_struct *vma);
void io_free_region(struct io_ring_ctx *ctx, struct io_mapped_region *mr);
int io_create_region(struct io_ring_ctx *ctx, struct io_mapped_region *mr,
- struct io_uring_region_desc *reg);
+ struct io_uring_region_desc *reg,
+ unsigned long mmap_offset);
+
+int io_create_region_mmap_safe(struct io_ring_ctx *ctx,
+ struct io_mapped_region *mr,
+ struct io_uring_region_desc *reg,
+ unsigned long mmap_offset);
static inline void *io_region_get_ptr(struct io_mapped_region *mr)
{
diff --git a/io_uring/register.c b/io_uring/register.c
index f043d3f6b026..5b099ec36d00 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -585,9 +585,6 @@ static int io_register_mem_region(struct io_ring_ctx *ctx, void __user *uarg)
rd_uptr = u64_to_user_ptr(reg.region_uptr);
if (copy_from_user(&rd, rd_uptr, sizeof(rd)))
return -EFAULT;
-
- if (!(rd.flags & IORING_MEM_REGION_TYPE_USER))
- return -EINVAL;
if (memchr_inv(&reg.__resv, 0, sizeof(reg.__resv)))
return -EINVAL;
if (reg.flags & ~IORING_MEM_REGION_REG_WAIT_ARG)
@@ -602,7 +599,8 @@ static int io_register_mem_region(struct io_ring_ctx *ctx, void __user *uarg)
!(ctx->flags & IORING_SETUP_R_DISABLED))
return -EINVAL;
- ret = io_create_region(ctx, &ctx->param_region, &rd);
+ ret = io_create_region_mmap_safe(ctx, &ctx->param_region, &rd,
+ IORING_MAP_OFF_PARAM_REGION);
if (ret)
return ret;
if (copy_to_user(rd_uptr, &rd, sizeof(rd))) {
--
2.47.1
* [PATCH v3 12/18] io_uring: pass ctx to io_register_free_rings
From: Pavel Begunkov @ 2024-11-29 13:34 UTC
To: io-uring; +Cc: asml.silence
A preparation patch: pass the context to io_register_free_rings().
Signed-off-by: Pavel Begunkov <[email protected]>
---
io_uring/register.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/io_uring/register.c b/io_uring/register.c
index 5b099ec36d00..5e07205fb071 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -375,7 +375,8 @@ struct io_ring_ctx_rings {
struct io_rings *rings;
};
-static void io_register_free_rings(struct io_uring_params *p,
+static void io_register_free_rings(struct io_ring_ctx *ctx,
+ struct io_uring_params *p,
struct io_ring_ctx_rings *r)
{
if (!(p->flags & IORING_SETUP_NO_MMAP)) {
@@ -452,7 +453,7 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
n.rings->cq_ring_entries = p.cq_entries;
if (copy_to_user(arg, &p, sizeof(p))) {
- io_register_free_rings(&p, &n);
+ io_register_free_rings(ctx, &p, &n);
return -EFAULT;
}
@@ -461,7 +462,7 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
else
size = array_size(sizeof(struct io_uring_sqe), p.sq_entries);
if (size == SIZE_MAX) {
- io_register_free_rings(&p, &n);
+ io_register_free_rings(ctx, &p, &n);
return -EOVERFLOW;
}
@@ -472,7 +473,7 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
p.sq_off.user_addr,
size);
if (IS_ERR(ptr)) {
- io_register_free_rings(&p, &n);
+ io_register_free_rings(ctx, &p, &n);
return PTR_ERR(ptr);
}
@@ -562,7 +563,7 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
out:
spin_unlock(&ctx->completion_lock);
mutex_unlock(&ctx->mmap_lock);
- io_register_free_rings(&p, to_free);
+ io_register_free_rings(ctx, &p, to_free);
if (ctx->sq_data)
io_sq_thread_unpark(ctx->sq_data);
--
2.47.1
* [PATCH v3 13/18] io_uring: use region api for SQ
From: Pavel Begunkov @ 2024-11-29 13:34 UTC
To: io-uring; +Cc: asml.silence
Convert internal parts of the SQ management to the region API.
Signed-off-by: Pavel Begunkov <[email protected]>
---
include/linux/io_uring_types.h | 3 +--
io_uring/io_uring.c | 36 +++++++++++++---------------------
io_uring/memmap.c | 3 +--
io_uring/register.c | 35 +++++++++++++++------------------
4 files changed, 32 insertions(+), 45 deletions(-)
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 4cee414080fd..3f353f269c6e 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -431,10 +431,9 @@ struct io_ring_ctx {
* the gup'ed pages for the two rings, and the sqes.
*/
unsigned short n_ring_pages;
- unsigned short n_sqe_pages;
struct page **ring_pages;
- struct page **sqe_pages;
+ struct io_mapped_region sq_region;
/* used for optimised request parameter and wait argument passing */
struct io_mapped_region param_region;
};
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index c713ef35447b..2ac80b4d4016 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2637,29 +2637,19 @@ static void *io_rings_map(struct io_ring_ctx *ctx, unsigned long uaddr,
size);
}
-static void *io_sqes_map(struct io_ring_ctx *ctx, unsigned long uaddr,
- size_t size)
-{
- return __io_uaddr_map(&ctx->sqe_pages, &ctx->n_sqe_pages, uaddr,
- size);
-}
-
static void io_rings_free(struct io_ring_ctx *ctx)
{
if (!(ctx->flags & IORING_SETUP_NO_MMAP)) {
io_pages_unmap(ctx->rings, &ctx->ring_pages, &ctx->n_ring_pages,
true);
- io_pages_unmap(ctx->sq_sqes, &ctx->sqe_pages, &ctx->n_sqe_pages,
- true);
} else {
io_pages_free(&ctx->ring_pages, ctx->n_ring_pages);
ctx->n_ring_pages = 0;
- io_pages_free(&ctx->sqe_pages, ctx->n_sqe_pages);
- ctx->n_sqe_pages = 0;
vunmap(ctx->rings);
- vunmap(ctx->sq_sqes);
}
+ io_free_region(ctx, &ctx->sq_region);
+
ctx->rings = NULL;
ctx->sq_sqes = NULL;
}
@@ -3476,9 +3466,10 @@ bool io_is_uring_fops(struct file *file)
static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx,
struct io_uring_params *p)
{
+ struct io_uring_region_desc rd;
struct io_rings *rings;
size_t size, sq_array_offset;
- void *ptr;
+ int ret;
/* make sure these are sane, as we already accounted them */
ctx->sq_entries = p->sq_entries;
@@ -3514,17 +3505,18 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx,
return -EOVERFLOW;
}
- if (!(ctx->flags & IORING_SETUP_NO_MMAP))
- ptr = io_pages_map(&ctx->sqe_pages, &ctx->n_sqe_pages, size);
- else
- ptr = io_sqes_map(ctx, p->sq_off.user_addr, size);
-
- if (IS_ERR(ptr)) {
+ memset(&rd, 0, sizeof(rd));
+ rd.size = PAGE_ALIGN(size);
+ if (ctx->flags & IORING_SETUP_NO_MMAP) {
+ rd.user_addr = p->sq_off.user_addr;
+ rd.flags |= IORING_MEM_REGION_TYPE_USER;
+ }
+ ret = io_create_region(ctx, &ctx->sq_region, &rd, IORING_OFF_SQES);
+ if (ret) {
io_rings_free(ctx);
- return PTR_ERR(ptr);
+ return ret;
}
-
- ctx->sq_sqes = ptr;
+ ctx->sq_sqes = io_region_get_ptr(&ctx->sq_region);
return 0;
}
diff --git a/io_uring/memmap.c b/io_uring/memmap.c
index 9a182c8a4be1..b9aaa25182a5 100644
--- a/io_uring/memmap.c
+++ b/io_uring/memmap.c
@@ -474,8 +474,7 @@ __cold int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
npages = min(ctx->n_ring_pages, (sz + PAGE_SIZE - 1) >> PAGE_SHIFT);
return io_uring_mmap_pages(ctx, vma, ctx->ring_pages, npages);
case IORING_OFF_SQES:
- return io_uring_mmap_pages(ctx, vma, ctx->sqe_pages,
- ctx->n_sqe_pages);
+ return io_region_mmap(ctx, &ctx->sq_region, vma);
case IORING_OFF_PBUF_RING:
return io_pbuf_mmap(file, vma);
case IORING_MAP_OFF_PARAM_REGION:
diff --git a/io_uring/register.c b/io_uring/register.c
index 5e07205fb071..44cd64923d31 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -368,11 +368,11 @@ static int io_register_clock(struct io_ring_ctx *ctx,
*/
struct io_ring_ctx_rings {
unsigned short n_ring_pages;
- unsigned short n_sqe_pages;
struct page **ring_pages;
- struct page **sqe_pages;
- struct io_uring_sqe *sq_sqes;
struct io_rings *rings;
+
+ struct io_uring_sqe *sq_sqes;
+ struct io_mapped_region sq_region;
};
static void io_register_free_rings(struct io_ring_ctx *ctx,
@@ -382,14 +382,11 @@ static void io_register_free_rings(struct io_ring_ctx *ctx,
if (!(p->flags & IORING_SETUP_NO_MMAP)) {
io_pages_unmap(r->rings, &r->ring_pages, &r->n_ring_pages,
true);
- io_pages_unmap(r->sq_sqes, &r->sqe_pages, &r->n_sqe_pages,
- true);
} else {
io_pages_free(&r->ring_pages, r->n_ring_pages);
- io_pages_free(&r->sqe_pages, r->n_sqe_pages);
vunmap(r->rings);
- vunmap(r->sq_sqes);
}
+ io_free_region(ctx, &r->sq_region);
}
#define swap_old(ctx, o, n, field) \
@@ -404,11 +401,11 @@ static void io_register_free_rings(struct io_ring_ctx *ctx,
static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
{
+ struct io_uring_region_desc rd;
struct io_ring_ctx_rings o = { }, n = { }, *to_free = NULL;
size_t size, sq_array_offset;
struct io_uring_params p;
unsigned i, tail;
- void *ptr;
int ret;
/* for single issuer, must be owner resizing */
@@ -466,16 +463,18 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
return -EOVERFLOW;
}
- if (!(p.flags & IORING_SETUP_NO_MMAP))
- ptr = io_pages_map(&n.sqe_pages, &n.n_sqe_pages, size);
- else
- ptr = __io_uaddr_map(&n.sqe_pages, &n.n_sqe_pages,
- p.sq_off.user_addr,
- size);
- if (IS_ERR(ptr)) {
+ memset(&rd, 0, sizeof(rd));
+ rd.size = PAGE_ALIGN(size);
+ if (p.flags & IORING_SETUP_NO_MMAP) {
+ rd.user_addr = p.sq_off.user_addr;
+ rd.flags |= IORING_MEM_REGION_TYPE_USER;
+ }
+ ret = io_create_region_mmap_safe(ctx, &n.sq_region, &rd, IORING_OFF_SQES);
+ if (ret) {
io_register_free_rings(ctx, &p, &n);
- return PTR_ERR(ptr);
+ return ret;
}
+ n.sq_sqes = io_region_get_ptr(&n.sq_region);
/*
* If using SQPOLL, park the thread
@@ -506,7 +505,6 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
* Now copy SQ and CQ entries, if any. If either of the destination
* rings can't hold what is already there, then fail the operation.
*/
- n.sq_sqes = ptr;
tail = o.rings->sq.tail;
if (tail - o.rings->sq.head > p.sq_entries)
goto overflow;
@@ -555,9 +553,8 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
ctx->rings = n.rings;
ctx->sq_sqes = n.sq_sqes;
swap_old(ctx, o, n, n_ring_pages);
- swap_old(ctx, o, n, n_sqe_pages);
swap_old(ctx, o, n, ring_pages);
- swap_old(ctx, o, n, sqe_pages);
+ swap_old(ctx, o, n, sq_region);
to_free = &o;
ret = 0;
out:
--
2.47.1
* [PATCH v3 14/18] io_uring: use region api for CQ
From: Pavel Begunkov @ 2024-11-29 13:34 UTC
To: io-uring; +Cc: asml.silence
Convert internal parts of the CQ/SQ array management to the region API.
Signed-off-by: Pavel Begunkov <[email protected]>
---
include/linux/io_uring_types.h | 8 +----
io_uring/io_uring.c | 36 +++++++---------------
io_uring/memmap.c | 55 +++++-----------------------------
io_uring/memmap.h | 4 ---
io_uring/register.c | 35 ++++++++++------------
5 files changed, 36 insertions(+), 102 deletions(-)
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 3f353f269c6e..2db252841509 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -426,14 +426,8 @@ struct io_ring_ctx {
*/
struct mutex mmap_lock;
- /*
- * If IORING_SETUP_NO_MMAP is used, then the below holds
- * the gup'ed pages for the two rings, and the sqes.
- */
- unsigned short n_ring_pages;
- struct page **ring_pages;
-
struct io_mapped_region sq_region;
+ struct io_mapped_region ring_region;
/* used for optimised request parameter and wait argument passing */
struct io_mapped_region param_region;
};
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 2ac80b4d4016..bc0ab2bb7ae2 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2630,26 +2630,10 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events, u32 flags,
return READ_ONCE(rings->cq.head) == READ_ONCE(rings->cq.tail) ? ret : 0;
}
-static void *io_rings_map(struct io_ring_ctx *ctx, unsigned long uaddr,
- size_t size)
-{
- return __io_uaddr_map(&ctx->ring_pages, &ctx->n_ring_pages, uaddr,
- size);
-}
-
static void io_rings_free(struct io_ring_ctx *ctx)
{
- if (!(ctx->flags & IORING_SETUP_NO_MMAP)) {
- io_pages_unmap(ctx->rings, &ctx->ring_pages, &ctx->n_ring_pages,
- true);
- } else {
- io_pages_free(&ctx->ring_pages, ctx->n_ring_pages);
- ctx->n_ring_pages = 0;
- vunmap(ctx->rings);
- }
-
io_free_region(ctx, &ctx->sq_region);
-
+ io_free_region(ctx, &ctx->ring_region);
ctx->rings = NULL;
ctx->sq_sqes = NULL;
}
@@ -3480,15 +3464,17 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx,
if (size == SIZE_MAX)
return -EOVERFLOW;
- if (!(ctx->flags & IORING_SETUP_NO_MMAP))
- rings = io_pages_map(&ctx->ring_pages, &ctx->n_ring_pages, size);
- else
- rings = io_rings_map(ctx, p->cq_off.user_addr, size);
-
- if (IS_ERR(rings))
- return PTR_ERR(rings);
+ memset(&rd, 0, sizeof(rd));
+ rd.size = PAGE_ALIGN(size);
+ if (ctx->flags & IORING_SETUP_NO_MMAP) {
+ rd.user_addr = p->cq_off.user_addr;
+ rd.flags |= IORING_MEM_REGION_TYPE_USER;
+ }
+ ret = io_create_region(ctx, &ctx->ring_region, &rd, IORING_OFF_CQ_RING);
+ if (ret)
+ return ret;
+ ctx->rings = rings = io_region_get_ptr(&ctx->ring_region);
- ctx->rings = rings;
if (!(ctx->flags & IORING_SETUP_NO_SQARRAY))
ctx->sq_array = (u32 *)((char *)rings + sq_array_offset);
rings->sq_ring_mask = p->sq_entries - 1;
diff --git a/io_uring/memmap.c b/io_uring/memmap.c
index b9aaa25182a5..668b1c3579a2 100644
--- a/io_uring/memmap.c
+++ b/io_uring/memmap.c
@@ -120,18 +120,6 @@ void io_pages_unmap(void *ptr, struct page ***pages, unsigned short *npages,
*npages = 0;
}
-void io_pages_free(struct page ***pages, int npages)
-{
- struct page **page_array = *pages;
-
- if (!page_array)
- return;
-
- unpin_user_pages(page_array, npages);
- kvfree(page_array);
- *pages = NULL;
-}
-
struct page **io_pin_pages(unsigned long uaddr, unsigned long len, int *npages)
{
unsigned long start, end, nr_pages;
@@ -174,34 +162,6 @@ struct page **io_pin_pages(unsigned long uaddr, unsigned long len, int *npages)
return ERR_PTR(ret);
}
-void *__io_uaddr_map(struct page ***pages, unsigned short *npages,
- unsigned long uaddr, size_t size)
-{
- struct page **page_array;
- unsigned int nr_pages;
- void *page_addr;
-
- *npages = 0;
-
- if (uaddr & (PAGE_SIZE - 1) || !size)
- return ERR_PTR(-EINVAL);
-
- nr_pages = 0;
- page_array = io_pin_pages(uaddr, size, &nr_pages);
- if (IS_ERR(page_array))
- return page_array;
-
- page_addr = vmap(page_array, nr_pages, VM_MAP, PAGE_KERNEL);
- if (page_addr) {
- *pages = page_array;
- *npages = nr_pages;
- return page_addr;
- }
-
- io_pages_free(&page_array, nr_pages);
- return ERR_PTR(-ENOMEM);
-}
-
enum {
/* memory was vmap'ed for the kernel, freeing the region vunmap's it */
IO_REGION_F_VMAP = 1,
@@ -446,9 +406,10 @@ int io_uring_mmap_pages(struct io_ring_ctx *ctx, struct vm_area_struct *vma,
static int io_region_mmap(struct io_ring_ctx *ctx,
struct io_mapped_region *mr,
- struct vm_area_struct *vma)
+ struct vm_area_struct *vma,
+ unsigned max_pages)
{
- unsigned long nr_pages = mr->nr_pages;
+ unsigned long nr_pages = min(mr->nr_pages, max_pages);
vm_flags_set(vma, VM_DONTEXPAND);
return vm_insert_pages(vma, vma->vm_start, mr->pages, &nr_pages);
@@ -459,7 +420,7 @@ __cold int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
struct io_ring_ctx *ctx = file->private_data;
size_t sz = vma->vm_end - vma->vm_start;
long offset = vma->vm_pgoff << PAGE_SHIFT;
- unsigned int npages;
+ unsigned int page_limit;
void *ptr;
guard(mutex)(&ctx->mmap_lock);
@@ -471,14 +432,14 @@ __cold int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
switch (offset & IORING_OFF_MMAP_MASK) {
case IORING_OFF_SQ_RING:
case IORING_OFF_CQ_RING:
- npages = min(ctx->n_ring_pages, (sz + PAGE_SIZE - 1) >> PAGE_SHIFT);
- return io_uring_mmap_pages(ctx, vma, ctx->ring_pages, npages);
+ page_limit = (sz + PAGE_SIZE - 1) >> PAGE_SHIFT;
+ return io_region_mmap(ctx, &ctx->ring_region, vma, page_limit);
case IORING_OFF_SQES:
- return io_region_mmap(ctx, &ctx->sq_region, vma);
+ return io_region_mmap(ctx, &ctx->sq_region, vma, UINT_MAX);
case IORING_OFF_PBUF_RING:
return io_pbuf_mmap(file, vma);
case IORING_MAP_OFF_PARAM_REGION:
- return io_region_mmap(ctx, &ctx->param_region, vma);
+ return io_region_mmap(ctx, &ctx->param_region, vma, UINT_MAX);
}
return -EINVAL;
diff --git a/io_uring/memmap.h b/io_uring/memmap.h
index 2402bca3d700..7395996eb353 100644
--- a/io_uring/memmap.h
+++ b/io_uring/memmap.h
@@ -4,7 +4,6 @@
#define IORING_MAP_OFF_PARAM_REGION 0x20000000ULL
struct page **io_pin_pages(unsigned long ubuf, unsigned long len, int *npages);
-void io_pages_free(struct page ***pages, int npages);
int io_uring_mmap_pages(struct io_ring_ctx *ctx, struct vm_area_struct *vma,
struct page **pages, int npages);
@@ -13,9 +12,6 @@ void *io_pages_map(struct page ***out_pages, unsigned short *npages,
void io_pages_unmap(void *ptr, struct page ***pages, unsigned short *npages,
bool put_pages);
-void *__io_uaddr_map(struct page ***pages, unsigned short *npages,
- unsigned long uaddr, size_t size);
-
#ifndef CONFIG_MMU
unsigned int io_uring_nommu_mmap_capabilities(struct file *file);
#endif
diff --git a/io_uring/register.c b/io_uring/register.c
index 44cd64923d31..f1698c18c7cb 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -367,26 +367,19 @@ static int io_register_clock(struct io_ring_ctx *ctx,
* either mapping or freeing.
*/
struct io_ring_ctx_rings {
- unsigned short n_ring_pages;
- struct page **ring_pages;
struct io_rings *rings;
-
struct io_uring_sqe *sq_sqes;
+
struct io_mapped_region sq_region;
+ struct io_mapped_region ring_region;
};
static void io_register_free_rings(struct io_ring_ctx *ctx,
struct io_uring_params *p,
struct io_ring_ctx_rings *r)
{
- if (!(p->flags & IORING_SETUP_NO_MMAP)) {
- io_pages_unmap(r->rings, &r->ring_pages, &r->n_ring_pages,
- true);
- } else {
- io_pages_free(&r->ring_pages, r->n_ring_pages);
- vunmap(r->rings);
- }
io_free_region(ctx, &r->sq_region);
+ io_free_region(ctx, &r->ring_region);
}
#define swap_old(ctx, o, n, field) \
@@ -436,13 +429,18 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
if (size == SIZE_MAX)
return -EOVERFLOW;
- if (!(p.flags & IORING_SETUP_NO_MMAP))
- n.rings = io_pages_map(&n.ring_pages, &n.n_ring_pages, size);
- else
- n.rings = __io_uaddr_map(&n.ring_pages, &n.n_ring_pages,
- p.cq_off.user_addr, size);
- if (IS_ERR(n.rings))
- return PTR_ERR(n.rings);
+ memset(&rd, 0, sizeof(rd));
+ rd.size = PAGE_ALIGN(size);
+ if (p.flags & IORING_SETUP_NO_MMAP) {
+ rd.user_addr = p.cq_off.user_addr;
+ rd.flags |= IORING_MEM_REGION_TYPE_USER;
+ }
+ ret = io_create_region_mmap_safe(ctx, &n.ring_region, &rd, IORING_OFF_CQ_RING);
+ if (ret) {
+ io_register_free_rings(ctx, &p, &n);
+ return ret;
+ }
+ n.rings = io_region_get_ptr(&n.ring_region);
n.rings->sq_ring_mask = p.sq_entries - 1;
n.rings->cq_ring_mask = p.cq_entries - 1;
@@ -552,8 +550,7 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
ctx->rings = n.rings;
ctx->sq_sqes = n.sq_sqes;
- swap_old(ctx, o, n, n_ring_pages);
- swap_old(ctx, o, n, ring_pages);
+ swap_old(ctx, o, n, ring_region);
swap_old(ctx, o, n, sq_region);
to_free = &o;
ret = 0;
--
2.47.1
* [PATCH v3 15/18] io_uring/kbuf: use mmap_lock to sync with mmap
From: Pavel Begunkov @ 2024-11-29 13:34 UTC
To: io-uring; +Cc: asml.silence
A preparation / cleanup patch simplifying the buf ring <-> mmap
synchronisation. Instead of relying on RCU, which is trickier, do it by
grabbing the mmap_lock whenever anyone tries to publish or remove a
registered buffer to / from ->io_bl_xa.
Modifications of the xarray should always be protected by both
->uring_lock and ->mmap_lock, while lookups should hold either of them.
While a struct io_buffer_list is in the xarray, the mmap-related fields
like ->flags and ->buf_pages should stay stable.
Signed-off-by: Pavel Begunkov <[email protected]>
---
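As a minimal sketch of the locking rules above (hypothetical helper
name; the real code is io_buffer_add_list() in the hunks below), a
store into the xarray looks like:

    static int publish_bl(struct io_ring_ctx *ctx,
                          struct io_buffer_list *bl, unsigned int bgid)
    {
            /* modification: hold both ->uring_lock and ->mmap_lock */
            lockdep_assert_held(&ctx->uring_lock);
            guard(mutex)(&ctx->mmap_lock);
            return xa_err(xa_store(&ctx->io_bl_xa, bgid, bl, GFP_KERNEL));
    }

Lookups need only one of the two locks, e.g. the mmap path can do a
plain xa_load() under ->mmap_lock alone.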
include/linux/io_uring_types.h | 5 +++
io_uring/kbuf.c | 56 +++++++++++++++-------------------
io_uring/kbuf.h | 1 -
3 files changed, 29 insertions(+), 33 deletions(-)
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 2db252841509..091d1eaf5ba0 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -293,6 +293,11 @@ struct io_ring_ctx {
struct io_submit_state submit_state;
+ /*
+ * Modifications are protected by ->uring_lock and ->mmap_lock.
+ * The flags, buf_pages and buf_nr_pages fields should be stable
+ * once published.
+ */
struct xarray io_bl_xa;
struct io_hash_table cancel_table;
diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c
index d407576ddfb7..662e928cc3b0 100644
--- a/io_uring/kbuf.c
+++ b/io_uring/kbuf.c
@@ -45,10 +45,11 @@ static int io_buffer_add_list(struct io_ring_ctx *ctx,
/*
* Store buffer group ID and finally mark the list as visible.
* The normal lookup doesn't care about the visibility as we're
- * always under the ->uring_lock, but the RCU lookup from mmap does.
+ * always under the ->uring_lock, but lookups from mmap do.
*/
bl->bgid = bgid;
atomic_set(&bl->refs, 1);
+ guard(mutex)(&ctx->mmap_lock);
return xa_err(xa_store(&ctx->io_bl_xa, bgid, bl, GFP_KERNEL));
}
@@ -388,7 +389,7 @@ void io_put_bl(struct io_ring_ctx *ctx, struct io_buffer_list *bl)
{
if (atomic_dec_and_test(&bl->refs)) {
__io_remove_buffers(ctx, bl, -1U);
- kfree_rcu(bl, rcu);
+ kfree(bl);
}
}
@@ -397,10 +398,17 @@ void io_destroy_buffers(struct io_ring_ctx *ctx)
struct io_buffer_list *bl;
struct list_head *item, *tmp;
struct io_buffer *buf;
- unsigned long index;
- xa_for_each(&ctx->io_bl_xa, index, bl) {
- xa_erase(&ctx->io_bl_xa, bl->bgid);
+ while (1) {
+ unsigned long index = 0;
+
+ scoped_guard(mutex, &ctx->mmap_lock) {
+ bl = xa_find(&ctx->io_bl_xa, &index, ULONG_MAX, XA_PRESENT);
+ if (bl)
+ xa_erase(&ctx->io_bl_xa, bl->bgid);
+ }
+ if (!bl)
+ break;
io_put_bl(ctx, bl);
}
@@ -589,11 +597,7 @@ int io_provide_buffers(struct io_kiocb *req, unsigned int issue_flags)
INIT_LIST_HEAD(&bl->buf_list);
ret = io_buffer_add_list(ctx, bl, p->bgid);
if (ret) {
- /*
- * Doesn't need rcu free as it was never visible, but
- * let's keep it consistent throughout.
- */
- kfree_rcu(bl, rcu);
+ kfree(bl);
goto err;
}
}
@@ -736,7 +740,7 @@ int io_register_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg)
return 0;
}
- kfree_rcu(free_bl, rcu);
+ kfree(free_bl);
return ret;
}
@@ -760,7 +764,9 @@ int io_unregister_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg)
if (!(bl->flags & IOBL_BUF_RING))
return -EINVAL;
- xa_erase(&ctx->io_bl_xa, bl->bgid);
+ scoped_guard(mutex, &ctx->mmap_lock)
+ xa_erase(&ctx->io_bl_xa, bl->bgid);
+
io_put_bl(ctx, bl);
return 0;
}
@@ -795,29 +801,13 @@ struct io_buffer_list *io_pbuf_get_bl(struct io_ring_ctx *ctx,
unsigned long bgid)
{
struct io_buffer_list *bl;
- bool ret;
- /*
- * We have to be a bit careful here - we're inside mmap and cannot grab
- * the uring_lock. This means the buffer_list could be simultaneously
- * going away, if someone is trying to be sneaky. Look it up under rcu
- * so we know it's not going away, and attempt to grab a reference to
- * it. If the ref is already zero, then fail the mapping. If successful,
- * the caller will call io_put_bl() to drop the the reference at at the
- * end. This may then safely free the buffer_list (and drop the pages)
- * at that point, vm_insert_pages() would've already grabbed the
- * necessary vma references.
- */
- rcu_read_lock();
bl = xa_load(&ctx->io_bl_xa, bgid);
/* must be a mmap'able buffer ring and have pages */
- ret = false;
- if (bl && bl->flags & IOBL_MMAP)
- ret = atomic_inc_not_zero(&bl->refs);
- rcu_read_unlock();
-
- if (ret)
- return bl;
+ if (bl && bl->flags & IOBL_MMAP) {
+ if (atomic_inc_not_zero(&bl->refs))
+ return bl;
+ }
return ERR_PTR(-EINVAL);
}
@@ -829,6 +819,8 @@ int io_pbuf_mmap(struct file *file, struct vm_area_struct *vma)
struct io_buffer_list *bl;
int bgid, ret;
+ lockdep_assert_held(&ctx->mmap_lock);
+
bgid = (pgoff & ~IORING_OFF_MMAP_MASK) >> IORING_OFF_PBUF_SHIFT;
bl = io_pbuf_get_bl(ctx, bgid);
if (IS_ERR(bl))
diff --git a/io_uring/kbuf.h b/io_uring/kbuf.h
index 36aadfe5ac00..d5e4afcbfbb3 100644
--- a/io_uring/kbuf.h
+++ b/io_uring/kbuf.h
@@ -25,7 +25,6 @@ struct io_buffer_list {
struct page **buf_pages;
struct io_uring_buf_ring *buf_ring;
};
- struct rcu_head rcu;
};
__u16 bgid;
--
2.47.1
* [PATCH v3 16/18] io_uring/kbuf: remove pbuf ring refcounting
2024-11-29 13:34 [PATCH v3 00/18] kernel allocated regions and convert memmap to regions Pavel Begunkov
` (14 preceding siblings ...)
2024-11-29 13:34 ` [PATCH v3 15/18] io_uring/kbuf: use mmap_lock to sync with mmap Pavel Begunkov
@ 2024-11-29 13:34 ` Pavel Begunkov
2024-11-29 13:34 ` [PATCH v3 17/18] io_uring/kbuf: use region api for pbuf rings Pavel Begunkov
` (3 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Pavel Begunkov @ 2024-11-29 13:34 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence
struct io_buffer_list refcounting was needed for RCU-based sync with
mmap. Now that the mmap path serialises against removal via ->mmap_lock,
the list can't go away from under it, and we can kill the refcount.
Signed-off-by: Pavel Begunkov <[email protected]>
---
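With ->mmap_lock held for the whole mmap operation, the lookup side
reduces to roughly the following (sketch mirroring the hunks below; no
reference is taken because the lock pins the entry):

    lockdep_assert_held(&ctx->mmap_lock);
    bl = xa_load(&ctx->io_bl_xa, bgid);
    if (bl && bl->flags & IOBL_MMAP)
            return bl;
    return ERR_PTR(-EINVAL);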
io_uring/kbuf.c | 21 +++++++--------------
io_uring/kbuf.h | 3 ---
io_uring/memmap.c | 1 -
3 files changed, 7 insertions(+), 18 deletions(-)
diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c
index 662e928cc3b0..644f61445ec9 100644
--- a/io_uring/kbuf.c
+++ b/io_uring/kbuf.c
@@ -48,7 +48,6 @@ static int io_buffer_add_list(struct io_ring_ctx *ctx,
* always under the ->uring_lock, but lookups from mmap do.
*/
bl->bgid = bgid;
- atomic_set(&bl->refs, 1);
guard(mutex)(&ctx->mmap_lock);
return xa_err(xa_store(&ctx->io_bl_xa, bgid, bl, GFP_KERNEL));
}
@@ -385,12 +384,10 @@ static int __io_remove_buffers(struct io_ring_ctx *ctx,
return i;
}
-void io_put_bl(struct io_ring_ctx *ctx, struct io_buffer_list *bl)
+static void io_put_bl(struct io_ring_ctx *ctx, struct io_buffer_list *bl)
{
- if (atomic_dec_and_test(&bl->refs)) {
- __io_remove_buffers(ctx, bl, -1U);
- kfree(bl);
- }
+ __io_remove_buffers(ctx, bl, -1U);
+ kfree(bl);
}
void io_destroy_buffers(struct io_ring_ctx *ctx)
@@ -804,10 +801,8 @@ struct io_buffer_list *io_pbuf_get_bl(struct io_ring_ctx *ctx,
bl = xa_load(&ctx->io_bl_xa, bgid);
/* must be a mmap'able buffer ring and have pages */
- if (bl && bl->flags & IOBL_MMAP) {
- if (atomic_inc_not_zero(&bl->refs))
- return bl;
- }
+ if (bl && bl->flags & IOBL_MMAP)
+ return bl;
return ERR_PTR(-EINVAL);
}
@@ -817,7 +812,7 @@ int io_pbuf_mmap(struct file *file, struct vm_area_struct *vma)
struct io_ring_ctx *ctx = file->private_data;
loff_t pgoff = vma->vm_pgoff << PAGE_SHIFT;
struct io_buffer_list *bl;
- int bgid, ret;
+ int bgid;
lockdep_assert_held(&ctx->mmap_lock);
@@ -826,7 +821,5 @@ int io_pbuf_mmap(struct file *file, struct vm_area_struct *vma)
if (IS_ERR(bl))
return PTR_ERR(bl);
- ret = io_uring_mmap_pages(ctx, vma, bl->buf_pages, bl->buf_nr_pages);
- io_put_bl(ctx, bl);
- return ret;
+ return io_uring_mmap_pages(ctx, vma, bl->buf_pages, bl->buf_nr_pages);
}
diff --git a/io_uring/kbuf.h b/io_uring/kbuf.h
index d5e4afcbfbb3..dff7444026a6 100644
--- a/io_uring/kbuf.h
+++ b/io_uring/kbuf.h
@@ -35,8 +35,6 @@ struct io_buffer_list {
__u16 mask;
__u16 flags;
-
- atomic_t refs;
};
struct io_buffer {
@@ -83,7 +81,6 @@ void __io_put_kbuf(struct io_kiocb *req, int len, unsigned issue_flags);
bool io_kbuf_recycle_legacy(struct io_kiocb *req, unsigned issue_flags);
-void io_put_bl(struct io_ring_ctx *ctx, struct io_buffer_list *bl);
struct io_buffer_list *io_pbuf_get_bl(struct io_ring_ctx *ctx,
unsigned long bgid);
int io_pbuf_mmap(struct file *file, struct vm_area_struct *vma);
diff --git a/io_uring/memmap.c b/io_uring/memmap.c
index 668b1c3579a2..73b73f4ea1bd 100644
--- a/io_uring/memmap.c
+++ b/io_uring/memmap.c
@@ -383,7 +383,6 @@ static void *io_uring_validate_mmap_request(struct file *file, loff_t pgoff,
if (IS_ERR(bl))
return bl;
ptr = bl->buf_ring;
- io_put_bl(ctx, bl);
return ptr;
}
case IORING_MAP_OFF_PARAM_REGION:
--
2.47.1
* [PATCH v3 17/18] io_uring/kbuf: use region api for pbuf rings
2024-11-29 13:34 [PATCH v3 00/18] kernel allocated regions and convert memmap to regions Pavel Begunkov
` (15 preceding siblings ...)
2024-11-29 13:34 ` [PATCH v3 16/18] io_uring/kbuf: remove pbuf ring refcounting Pavel Begunkov
@ 2024-11-29 13:34 ` Pavel Begunkov
2024-11-29 13:34 ` [PATCH v3 18/18] io_uring/memmap: unify io_uring mmap'ing code Pavel Begunkov
` (2 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Pavel Begunkov @ 2024-11-29 13:34 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence
Convert the internal parts of the provided buffer ring management to
the region API. It's the last non-region mapped ring we have, so the
conversion also kills a bunch of now unused memmap.c helpers.
Signed-off-by: Pavel Begunkov <[email protected]>
---
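For reference, the user-visible ABI is unchanged by the conversion. A
rough sketch of registering a kernel-allocated ring and mapping it from
userspace (raw syscalls assumed instead of liburing, error handling and
the ring_size computation elided):

    struct io_uring_buf_reg reg = {
            .ring_entries   = 8,                    /* must be a power of 2 */
            .bgid           = 1,
            .flags          = IOU_PBUF_RING_MMAP,   /* kernel allocates memory */
    };
    syscall(__NR_io_uring_register, ring_fd,
            IORING_REGISTER_PBUF_RING, &reg, 1);

    off_t off = IORING_OFF_PBUF_RING |
                ((off_t)reg.bgid << IORING_OFF_PBUF_SHIFT);
    struct io_uring_buf_ring *br =
            mmap(NULL, ring_size, PROT_READ | PROT_WRITE,
                 MAP_SHARED | MAP_POPULATE, ring_fd, off);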
io_uring/kbuf.c | 170 ++++++++++++++--------------------------------
io_uring/kbuf.h | 18 ++---
io_uring/memmap.c | 118 +++++---------------------------
io_uring/memmap.h | 7 --
4 files changed, 73 insertions(+), 240 deletions(-)
diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c
index 644f61445ec9..2dfb9f9419a0 100644
--- a/io_uring/kbuf.c
+++ b/io_uring/kbuf.c
@@ -351,17 +351,7 @@ static int __io_remove_buffers(struct io_ring_ctx *ctx,
if (bl->flags & IOBL_BUF_RING) {
i = bl->buf_ring->tail - bl->head;
- if (bl->buf_nr_pages) {
- int j;
-
- if (!(bl->flags & IOBL_MMAP)) {
- for (j = 0; j < bl->buf_nr_pages; j++)
- unpin_user_page(bl->buf_pages[j]);
- }
- io_pages_unmap(bl->buf_ring, &bl->buf_pages,
- &bl->buf_nr_pages, bl->flags & IOBL_MMAP);
- bl->flags &= ~IOBL_MMAP;
- }
+ io_free_region(ctx, &bl->region);
/* make sure it's seen as empty */
INIT_LIST_HEAD(&bl->buf_list);
bl->flags &= ~IOBL_BUF_RING;
@@ -614,75 +604,14 @@ int io_provide_buffers(struct io_kiocb *req, unsigned int issue_flags)
return IOU_OK;
}
-static int io_pin_pbuf_ring(struct io_uring_buf_reg *reg,
- struct io_buffer_list *bl)
-{
- struct io_uring_buf_ring *br = NULL;
- struct page **pages;
- int nr_pages, ret;
-
- pages = io_pin_pages(reg->ring_addr,
- flex_array_size(br, bufs, reg->ring_entries),
- &nr_pages);
- if (IS_ERR(pages))
- return PTR_ERR(pages);
-
- br = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
- if (!br) {
- ret = -ENOMEM;
- goto error_unpin;
- }
-
-#ifdef SHM_COLOUR
- /*
- * On platforms that have specific aliasing requirements, SHM_COLOUR
- * is set and we must guarantee that the kernel and user side align
- * nicely. We cannot do that if IOU_PBUF_RING_MMAP isn't set and
- * the application mmap's the provided ring buffer. Fail the request
- * if we, by chance, don't end up with aligned addresses. The app
- * should use IOU_PBUF_RING_MMAP instead, and liburing will handle
- * this transparently.
- */
- if ((reg->ring_addr | (unsigned long) br) & (SHM_COLOUR - 1)) {
- ret = -EINVAL;
- goto error_unpin;
- }
-#endif
- bl->buf_pages = pages;
- bl->buf_nr_pages = nr_pages;
- bl->buf_ring = br;
- bl->flags |= IOBL_BUF_RING;
- bl->flags &= ~IOBL_MMAP;
- return 0;
-error_unpin:
- unpin_user_pages(pages, nr_pages);
- kvfree(pages);
- vunmap(br);
- return ret;
-}
-
-static int io_alloc_pbuf_ring(struct io_ring_ctx *ctx,
- struct io_uring_buf_reg *reg,
- struct io_buffer_list *bl)
-{
- size_t ring_size;
-
- ring_size = reg->ring_entries * sizeof(struct io_uring_buf_ring);
-
- bl->buf_ring = io_pages_map(&bl->buf_pages, &bl->buf_nr_pages, ring_size);
- if (IS_ERR(bl->buf_ring)) {
- bl->buf_ring = NULL;
- return -ENOMEM;
- }
-
- bl->flags |= (IOBL_BUF_RING | IOBL_MMAP);
- return 0;
-}
-
int io_register_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg)
{
struct io_uring_buf_reg reg;
struct io_buffer_list *bl, *free_bl = NULL;
+ struct io_uring_region_desc rd;
+ struct io_uring_buf_ring *br;
+ unsigned long mmap_offset;
+ unsigned long ring_size;
int ret;
lockdep_assert_held(&ctx->uring_lock);
@@ -694,19 +623,8 @@ int io_register_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg)
return -EINVAL;
if (reg.flags & ~(IOU_PBUF_RING_MMAP | IOU_PBUF_RING_INC))
return -EINVAL;
- if (!(reg.flags & IOU_PBUF_RING_MMAP)) {
- if (!reg.ring_addr)
- return -EFAULT;
- if (reg.ring_addr & ~PAGE_MASK)
- return -EINVAL;
- } else {
- if (reg.ring_addr)
- return -EINVAL;
- }
-
if (!is_power_of_2(reg.ring_entries))
return -EINVAL;
-
/* cannot disambiguate full vs empty due to head/tail size */
if (reg.ring_entries >= 65536)
return -EINVAL;
@@ -722,21 +640,47 @@ int io_register_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg)
return -ENOMEM;
}
- if (!(reg.flags & IOU_PBUF_RING_MMAP))
- ret = io_pin_pbuf_ring(&reg, bl);
- else
- ret = io_alloc_pbuf_ring(ctx, &reg, bl);
+ mmap_offset = reg.bgid << IORING_OFF_PBUF_SHIFT;
+ ring_size = flex_array_size(br, bufs, reg.ring_entries);
- if (!ret) {
- bl->nr_entries = reg.ring_entries;
- bl->mask = reg.ring_entries - 1;
- if (reg.flags & IOU_PBUF_RING_INC)
- bl->flags |= IOBL_INC;
+ memset(&rd, 0, sizeof(rd));
+ rd.size = PAGE_ALIGN(ring_size);
+ if (!(reg.flags & IOU_PBUF_RING_MMAP)) {
+ rd.user_addr = reg.ring_addr;
+ rd.flags |= IORING_MEM_REGION_TYPE_USER;
+ }
+ ret = io_create_region_mmap_safe(ctx, &bl->region, &rd, mmap_offset);
+ if (ret)
+ goto fail;
+ br = io_region_get_ptr(&bl->region);
- io_buffer_add_list(ctx, bl, reg.bgid);
- return 0;
+#ifdef SHM_COLOUR
+ /*
+ * On platforms that have specific aliasing requirements, SHM_COLOUR
+ * is set and we must guarantee that the kernel and user side align
+ * nicely. We cannot do that if IOU_PBUF_RING_MMAP isn't set and
+ * the application mmap's the provided ring buffer. Fail the request
+ * if we, by chance, don't end up with aligned addresses. The app
+ * should use IOU_PBUF_RING_MMAP instead, and liburing will handle
+ * this transparently.
+ */
+ if (!(reg.flags & IOU_PBUF_RING_MMAP) &&
+ ((reg.ring_addr | (unsigned long)br) & (SHM_COLOUR - 1))) {
+ ret = -EINVAL;
+ goto fail;
}
+#endif
+ bl->nr_entries = reg.ring_entries;
+ bl->mask = reg.ring_entries - 1;
+ bl->flags |= IOBL_BUF_RING;
+ bl->buf_ring = br;
+ if (reg.flags & IOU_PBUF_RING_INC)
+ bl->flags |= IOBL_INC;
+ io_buffer_add_list(ctx, bl, reg.bgid);
+ return 0;
+fail:
+ io_free_region(ctx, &bl->region);
kfree(free_bl);
return ret;
}
@@ -794,32 +738,18 @@ int io_register_pbuf_status(struct io_ring_ctx *ctx, void __user *arg)
return 0;
}
-struct io_buffer_list *io_pbuf_get_bl(struct io_ring_ctx *ctx,
- unsigned long bgid)
-{
- struct io_buffer_list *bl;
-
- bl = xa_load(&ctx->io_bl_xa, bgid);
- /* must be a mmap'able buffer ring and have pages */
- if (bl && bl->flags & IOBL_MMAP)
- return bl;
-
- return ERR_PTR(-EINVAL);
-}
-
-int io_pbuf_mmap(struct file *file, struct vm_area_struct *vma)
+struct io_mapped_region *io_pbuf_get_region(struct io_ring_ctx *ctx,
+ unsigned int bgid)
{
- struct io_ring_ctx *ctx = file->private_data;
- loff_t pgoff = vma->vm_pgoff << PAGE_SHIFT;
struct io_buffer_list *bl;
- int bgid;
lockdep_assert_held(&ctx->mmap_lock);
- bgid = (pgoff & ~IORING_OFF_MMAP_MASK) >> IORING_OFF_PBUF_SHIFT;
- bl = io_pbuf_get_bl(ctx, bgid);
- if (IS_ERR(bl))
- return PTR_ERR(bl);
+ bl = xa_load(&ctx->io_bl_xa, bgid);
+ if (!bl || !(bl->flags & IOBL_BUF_RING))
+ return NULL;
+ if (WARN_ON_ONCE(!io_region_is_set(&bl->region)))
+ return NULL;
- return io_uring_mmap_pages(ctx, vma, bl->buf_pages, bl->buf_nr_pages);
+ return &bl->region;
}
diff --git a/io_uring/kbuf.h b/io_uring/kbuf.h
index dff7444026a6..bd80c44c5af1 100644
--- a/io_uring/kbuf.h
+++ b/io_uring/kbuf.h
@@ -3,15 +3,13 @@
#define IOU_KBUF_H
#include <uapi/linux/io_uring.h>
+#include <linux/io_uring_types.h>
enum {
/* ring mapped provided buffers */
IOBL_BUF_RING = 1,
- /* ring mapped provided buffers, but mmap'ed by application */
- IOBL_MMAP = 2,
/* buffers are consumed incrementally rather than always fully */
- IOBL_INC = 4,
-
+ IOBL_INC = 2,
};
struct io_buffer_list {
@@ -21,10 +19,7 @@ struct io_buffer_list {
*/
union {
struct list_head buf_list;
- struct {
- struct page **buf_pages;
- struct io_uring_buf_ring *buf_ring;
- };
+ struct io_uring_buf_ring *buf_ring;
};
__u16 bgid;
@@ -35,6 +30,8 @@ struct io_buffer_list {
__u16 mask;
__u16 flags;
+
+ struct io_mapped_region region;
};
struct io_buffer {
@@ -81,9 +78,8 @@ void __io_put_kbuf(struct io_kiocb *req, int len, unsigned issue_flags);
bool io_kbuf_recycle_legacy(struct io_kiocb *req, unsigned issue_flags);
-struct io_buffer_list *io_pbuf_get_bl(struct io_ring_ctx *ctx,
- unsigned long bgid);
-int io_pbuf_mmap(struct file *file, struct vm_area_struct *vma);
+struct io_mapped_region *io_pbuf_get_region(struct io_ring_ctx *ctx,
+ unsigned int bgid);
static inline bool io_kbuf_recycle_ring(struct io_kiocb *req)
{
diff --git a/io_uring/memmap.c b/io_uring/memmap.c
index 73b73f4ea1bd..6d8a98bd9cac 100644
--- a/io_uring/memmap.c
+++ b/io_uring/memmap.c
@@ -36,90 +36,6 @@ static void *io_mem_alloc_compound(struct page **pages, int nr_pages,
return page_address(page);
}
-static void *io_mem_alloc_single(struct page **pages, int nr_pages, size_t size,
- gfp_t gfp)
-{
- void *ret;
- int i;
-
- for (i = 0; i < nr_pages; i++) {
- pages[i] = alloc_page(gfp);
- if (!pages[i])
- goto err;
- }
-
- ret = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
- if (ret)
- return ret;
-err:
- while (i--)
- put_page(pages[i]);
- return ERR_PTR(-ENOMEM);
-}
-
-void *io_pages_map(struct page ***out_pages, unsigned short *npages,
- size_t size)
-{
- gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_NOWARN;
- struct page **pages;
- int nr_pages;
- void *ret;
-
- nr_pages = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
- pages = kvmalloc_array(nr_pages, sizeof(struct page *), gfp);
- if (!pages)
- return ERR_PTR(-ENOMEM);
-
- ret = io_mem_alloc_compound(pages, nr_pages, size, gfp);
- if (!IS_ERR(ret))
- goto done;
- if (nr_pages == 1)
- goto fail;
-
- ret = io_mem_alloc_single(pages, nr_pages, size, gfp);
- if (!IS_ERR(ret)) {
-done:
- *out_pages = pages;
- *npages = nr_pages;
- return ret;
- }
-fail:
- kvfree(pages);
- *out_pages = NULL;
- *npages = 0;
- return ret;
-}
-
-void io_pages_unmap(void *ptr, struct page ***pages, unsigned short *npages,
- bool put_pages)
-{
- bool do_vunmap = false;
-
- if (!ptr)
- return;
-
- if (put_pages && *npages) {
- struct page **to_free = *pages;
- int i;
-
- /*
- * Only did vmap for the non-compound multiple page case.
- * For the compound page, we just need to put the head.
- */
- if (PageCompound(to_free[0]))
- *npages = 1;
- else if (*npages > 1)
- do_vunmap = true;
- for (i = 0; i < *npages; i++)
- put_page(to_free[i]);
- }
- if (do_vunmap)
- vunmap(ptr);
- kvfree(*pages);
- *pages = NULL;
- *npages = 0;
-}
-
struct page **io_pin_pages(unsigned long uaddr, unsigned long len, int *npages)
{
unsigned long start, end, nr_pages;
@@ -374,16 +290,14 @@ static void *io_uring_validate_mmap_request(struct file *file, loff_t pgoff,
return ERR_PTR(-EFAULT);
return ctx->sq_sqes;
case IORING_OFF_PBUF_RING: {
- struct io_buffer_list *bl;
+ struct io_mapped_region *region;
unsigned int bgid;
- void *ptr;
bgid = (offset & ~IORING_OFF_MMAP_MASK) >> IORING_OFF_PBUF_SHIFT;
- bl = io_pbuf_get_bl(ctx, bgid);
- if (IS_ERR(bl))
- return bl;
- ptr = bl->buf_ring;
- return ptr;
+ region = io_pbuf_get_region(ctx, bgid);
+ if (!region)
+ return ERR_PTR(-EINVAL);
+ return io_region_validate_mmap(ctx, region);
}
case IORING_MAP_OFF_PARAM_REGION:
return io_region_validate_mmap(ctx, &ctx->param_region);
@@ -392,15 +306,6 @@ static void *io_uring_validate_mmap_request(struct file *file, loff_t pgoff,
return ERR_PTR(-EINVAL);
}
-int io_uring_mmap_pages(struct io_ring_ctx *ctx, struct vm_area_struct *vma,
- struct page **pages, int npages)
-{
- unsigned long nr_pages = npages;
-
- vm_flags_set(vma, VM_DONTEXPAND);
- return vm_insert_pages(vma, vma->vm_start, pages, &nr_pages);
-}
-
#ifdef CONFIG_MMU
static int io_region_mmap(struct io_ring_ctx *ctx,
@@ -435,8 +340,17 @@ __cold int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
return io_region_mmap(ctx, &ctx->ring_region, vma, page_limit);
case IORING_OFF_SQES:
return io_region_mmap(ctx, &ctx->sq_region, vma, UINT_MAX);
- case IORING_OFF_PBUF_RING:
- return io_pbuf_mmap(file, vma);
+ case IORING_OFF_PBUF_RING: {
+ struct io_mapped_region *region;
+ unsigned int bgid;
+
+ bgid = (offset & ~IORING_OFF_MMAP_MASK) >> IORING_OFF_PBUF_SHIFT;
+ region = io_pbuf_get_region(ctx, bgid);
+ if (!region)
+ return -EINVAL;
+
+ return io_region_mmap(ctx, region, vma, UINT_MAX);
+ }
case IORING_MAP_OFF_PARAM_REGION:
return io_region_mmap(ctx, &ctx->param_region, vma, UINT_MAX);
}
diff --git a/io_uring/memmap.h b/io_uring/memmap.h
index 7395996eb353..c898dcba2b4e 100644
--- a/io_uring/memmap.h
+++ b/io_uring/memmap.h
@@ -4,13 +4,6 @@
#define IORING_MAP_OFF_PARAM_REGION 0x20000000ULL
struct page **io_pin_pages(unsigned long ubuf, unsigned long len, int *npages);
-int io_uring_mmap_pages(struct io_ring_ctx *ctx, struct vm_area_struct *vma,
- struct page **pages, int npages);
-
-void *io_pages_map(struct page ***out_pages, unsigned short *npages,
- size_t size);
-void io_pages_unmap(void *ptr, struct page ***pages, unsigned short *npages,
- bool put_pages);
#ifndef CONFIG_MMU
unsigned int io_uring_nommu_mmap_capabilities(struct file *file);
--
2.47.1
* [PATCH v3 18/18] io_uring/memmap: unify io_uring mmap'ing code
2024-11-29 13:34 [PATCH v3 00/18] kernel allocated regions and convert memmap to regions Pavel Begunkov
` (16 preceding siblings ...)
2024-11-29 13:34 ` [PATCH v3 17/18] io_uring/kbuf: use region api for pbuf rings Pavel Begunkov
@ 2024-11-29 13:34 ` Pavel Begunkov
2024-11-29 16:04 ` [PATCH v3 00/18] kernel allocated regions and convert memmap to regions Jens Axboe
2024-11-29 16:06 ` Jens Axboe
19 siblings, 0 replies; 21+ messages in thread
From: Pavel Begunkov @ 2024-11-29 13:34 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence
All mapped memory is now backed by regions, so we can unify and clean
up io_region_validate_mmap() and io_uring_mmap(). Extract a function
that looks up a region; the rest of the handling is generic and only
needs the region.
There is one more piece of ring type specific code, i.e. the mmap'ing
size truncation quirk for IORING_OFF_[S,C]Q_RING, which is left as is.
Signed-off-by: Pavel Begunkov <[email protected]>
---
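For orientation, these are the offsets io_mmap_get_region() decodes
after masking with IORING_OFF_MMAP_MASK (values taken from the existing
uapi and memmap.h, listed here only as a reminder):

    IORING_OFF_SQ_RING              0x00000000ULL
    IORING_OFF_CQ_RING              0x08000000ULL
    IORING_OFF_SQES                 0x10000000ULL
    IORING_MAP_OFF_PARAM_REGION     0x20000000ULL
    IORING_OFF_PBUF_RING            0x80000000ULL (bgid in low bits,
                                    shifted by IORING_OFF_PBUF_SHIFT)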
io_uring/kbuf.c | 3 --
io_uring/memmap.c | 81 ++++++++++++++++++-----------------------------
2 files changed, 31 insertions(+), 53 deletions(-)
diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c
index 2dfb9f9419a0..e91260a6156b 100644
--- a/io_uring/kbuf.c
+++ b/io_uring/kbuf.c
@@ -748,8 +748,5 @@ struct io_mapped_region *io_pbuf_get_region(struct io_ring_ctx *ctx,
bl = xa_load(&ctx->io_bl_xa, bgid);
if (!bl || !(bl->flags & IOBL_BUF_RING))
return NULL;
- if (WARN_ON_ONCE(!io_region_is_set(&bl->region)))
- return NULL;
-
return &bl->region;
}
diff --git a/io_uring/memmap.c b/io_uring/memmap.c
index 6d8a98bd9cac..dda846190fbd 100644
--- a/io_uring/memmap.c
+++ b/io_uring/memmap.c
@@ -254,6 +254,27 @@ int io_create_region_mmap_safe(struct io_ring_ctx *ctx, struct io_mapped_region
return 0;
}
+static struct io_mapped_region *io_mmap_get_region(struct io_ring_ctx *ctx,
+ loff_t pgoff)
+{
+ loff_t offset = pgoff << PAGE_SHIFT;
+ unsigned int bgid;
+
+ switch (offset & IORING_OFF_MMAP_MASK) {
+ case IORING_OFF_SQ_RING:
+ case IORING_OFF_CQ_RING:
+ return &ctx->ring_region;
+ case IORING_OFF_SQES:
+ return &ctx->sq_region;
+ case IORING_OFF_PBUF_RING:
+ bgid = (offset & ~IORING_OFF_MMAP_MASK) >> IORING_OFF_PBUF_SHIFT;
+ return io_pbuf_get_region(ctx, bgid);
+ case IORING_MAP_OFF_PARAM_REGION:
+ return &ctx->param_region;
+ }
+ return NULL;
+}
+
static void *io_region_validate_mmap(struct io_ring_ctx *ctx,
struct io_mapped_region *mr)
{
@@ -271,39 +292,12 @@ static void *io_uring_validate_mmap_request(struct file *file, loff_t pgoff,
size_t sz)
{
struct io_ring_ctx *ctx = file->private_data;
- loff_t offset = pgoff << PAGE_SHIFT;
+ struct io_mapped_region *region;
- switch ((pgoff << PAGE_SHIFT) & IORING_OFF_MMAP_MASK) {
- case IORING_OFF_SQ_RING:
- case IORING_OFF_CQ_RING:
- /* Don't allow mmap if the ring was setup without it */
- if (ctx->flags & IORING_SETUP_NO_MMAP)
- return ERR_PTR(-EINVAL);
- if (!ctx->rings)
- return ERR_PTR(-EFAULT);
- return ctx->rings;
- case IORING_OFF_SQES:
- /* Don't allow mmap if the ring was setup without it */
- if (ctx->flags & IORING_SETUP_NO_MMAP)
- return ERR_PTR(-EINVAL);
- if (!ctx->sq_sqes)
- return ERR_PTR(-EFAULT);
- return ctx->sq_sqes;
- case IORING_OFF_PBUF_RING: {
- struct io_mapped_region *region;
- unsigned int bgid;
-
- bgid = (offset & ~IORING_OFF_MMAP_MASK) >> IORING_OFF_PBUF_SHIFT;
- region = io_pbuf_get_region(ctx, bgid);
- if (!region)
- return ERR_PTR(-EINVAL);
- return io_region_validate_mmap(ctx, region);
- }
- case IORING_MAP_OFF_PARAM_REGION:
- return io_region_validate_mmap(ctx, &ctx->param_region);
- }
-
- return ERR_PTR(-EINVAL);
+ region = io_mmap_get_region(ctx, pgoff);
+ if (!region)
+ return ERR_PTR(-EINVAL);
+ return io_region_validate_mmap(ctx, region);
}
#ifdef CONFIG_MMU
@@ -324,7 +318,8 @@ __cold int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
struct io_ring_ctx *ctx = file->private_data;
size_t sz = vma->vm_end - vma->vm_start;
long offset = vma->vm_pgoff << PAGE_SHIFT;
- unsigned int page_limit;
+ unsigned int page_limit = UINT_MAX;
+ struct io_mapped_region *region;
void *ptr;
guard(mutex)(&ctx->mmap_lock);
@@ -337,25 +332,11 @@ __cold int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
case IORING_OFF_SQ_RING:
case IORING_OFF_CQ_RING:
page_limit = (sz + PAGE_SIZE - 1) >> PAGE_SHIFT;
- return io_region_mmap(ctx, &ctx->ring_region, vma, page_limit);
- case IORING_OFF_SQES:
- return io_region_mmap(ctx, &ctx->sq_region, vma, UINT_MAX);
- case IORING_OFF_PBUF_RING: {
- struct io_mapped_region *region;
- unsigned int bgid;
-
- bgid = (offset & ~IORING_OFF_MMAP_MASK) >> IORING_OFF_PBUF_SHIFT;
- region = io_pbuf_get_region(ctx, bgid);
- if (!region)
- return -EINVAL;
-
- return io_region_mmap(ctx, region, vma, UINT_MAX);
- }
- case IORING_MAP_OFF_PARAM_REGION:
- return io_region_mmap(ctx, &ctx->param_region, vma, UINT_MAX);
+ break;
}
- return -EINVAL;
+ region = io_mmap_get_region(ctx, vma->vm_pgoff);
+ return io_region_mmap(ctx, region, vma, page_limit);
}
unsigned long io_uring_get_unmapped_area(struct file *filp, unsigned long addr,
--
2.47.1
* Re: [PATCH v3 00/18] kernel allocated regions and convert memmap to regions
2024-11-29 13:34 [PATCH v3 00/18] kernel allocated regions and convert memmap to regions Pavel Begunkov
` (17 preceding siblings ...)
2024-11-29 13:34 ` [PATCH v3 18/18] io_uring/memmap: unify io_uring mmap'ing code Pavel Begunkov
@ 2024-11-29 16:04 ` Jens Axboe
2024-11-29 16:06 ` Jens Axboe
19 siblings, 0 replies; 21+ messages in thread
From: Jens Axboe @ 2024-11-29 16:04 UTC (permalink / raw)
To: io-uring, Pavel Begunkov
On Fri, 29 Nov 2024 13:34:21 +0000, Pavel Begunkov wrote:
> The first part of the series (Patches 1-11) implement kernel allocated
> regions, which is the classical way SQ/CQ are created. It should be
> straightforward with simple preparations patches and cleanups. The main
> part is Patch 10, which internally implements kernel allocations, and
> Patch 11 that implementing the mmap part and exposes it to reg-wait /
> parameter region users.
>
> [...]
Applied, thanks!
[01/18] io_uring: rename ->resize_lock
commit: e4e0f7d04627a3a8380bda82c4690f598b095b66
[02/18] io_uring/rsrc: export io_check_coalesce_buffer
commit: b5c715ee796dee285f902276c38c808f6a7799cf
[03/18] io_uring/memmap: flag vmap'ed regions
commit: ea57c4c88ffb3f7247200275435bf4aa4894f965
[04/18] io_uring/memmap: flag regions with user pages
commit: 67b855ba258319abe9fac15e6ddf07e57c1589c5
[05/18] io_uring/memmap: account memory before pinning
commit: 85652c20eda52bdf2ecb059da0e5d9c50f2824b7
[06/18] io_uring/memmap: reuse io_free_region for failure path
commit: 3e0b1575a596cded61eee4ef75870a741a40fcc4
[07/18] io_uring/memmap: optimise single folio regions
commit: 1e80236d16da642240292194c9e34fb37664f606
[08/18] io_uring/memmap: helper for pinning region pages
commit: 5e015f23f7d382ed1a301d015284bc8cca87335b
[09/18] io_uring/memmap: add IO_REGION_F_SINGLE_REF
commit: 8acfcf152fef8566a19fe9cdbacdb6a6bdec5520
[10/18] io_uring/memmap: implement kernel allocated regions
commit: 9407cfd8c016024e23ef9c37e422b204dfaf435c
[11/18] io_uring/memmap: implement mmap for regions
commit: efd160a19fdb27db0436a21194972a4ce49bab2d
[12/18] io_uring: pass ctx to io_register_free_rings
commit: 458b0ea4de8d5045e446035a1cdb49f1e6f01789
[13/18] io_uring: use region api for SQ
commit: 5f58f826fcbff03f392fda796445992d59d34a80
[14/18] io_uring: use region api for CQ
commit: 9c0966c93e771eb17da6a41721c4f6613f616212
[15/18] io_uring/kbuf: use mmap_lock to sync with mmap
commit: 8dec4fa7082c0f8dd9692ac110777a994258a798
[16/18] io_uring/kbuf: remove pbuf ring refcounting
commit: 6a2036aec3830a293a5ca2d6059b5e4a450a4e0e
[17/18] io_uring/kbuf: use region api for pbuf rings
commit: d67839c6abfe5dd505390710502e2f9944a51126
[18/18] io_uring/memmap: unify io_uring mmap'ing code
commit: 17f5a7960c70c9a1ec4cb9a63be0898a47af804a
Best regards,
--
Jens Axboe
* Re: [PATCH v3 00/18] kernel allocated regions and convert memmap to regions
2024-11-29 13:34 [PATCH v3 00/18] kernel allocated regions and convert memmap to regions Pavel Begunkov
` (18 preceding siblings ...)
2024-11-29 16:04 ` [PATCH v3 00/18] kernel allocated regions and convert memmap to regions Jens Axboe
@ 2024-11-29 16:06 ` Jens Axboe
19 siblings, 0 replies; 21+ messages in thread
From: Jens Axboe @ 2024-11-29 16:06 UTC (permalink / raw)
To: Pavel Begunkov, io-uring
On 11/29/24 6:34 AM, Pavel Begunkov wrote:
> The first part of the series (Patches 1-11) implement kernel allocated
> regions, which is the classical way SQ/CQ are created. It should be
> straightforward with simple preparations patches and cleanups. The main
> part is Patch 10, which internally implements kernel allocations, and
> Patch 11 that implementing the mmap part and exposes it to reg-wait /
> parameter region users.
>
> The rest (Patches 12-18) converts SQ, CQ and provided buffers rings
> to regions, which carves a common path for all of them and removes
> duplication.
This is really nice, great unification of it all. And the diffstat
tells that story too.
--
Jens Axboe