* [PATCH v3 01/18] io_uring: rename ->resize_lock
From: Pavel Begunkov @ 2024-11-29 13:34 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence
->resize_lock is used for resizing rings, but it's a good idea to reuse
it in other cases as well. Rename it to mmap_lock as it protects against
races with mmap.
Signed-off-by: Pavel Begunkov <[email protected]>
---
include/linux/io_uring_types.h | 2 +-
io_uring/io_uring.c | 2 +-
io_uring/memmap.c | 6 +++---
io_uring/register.c | 8 ++++----
4 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 3e934feb3187..adb36e0da40e 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -423,7 +423,7 @@ struct io_ring_ctx {
* side will need to grab this lock, to prevent either side from
* being run concurrently with the other.
*/
- struct mutex resize_lock;
+ struct mutex mmap_lock;
/*
* If IORING_SETUP_NO_MMAP is used, then the below holds
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index ae199e44da57..c713ef35447b 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -351,7 +351,7 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
INIT_WQ_LIST(&ctx->submit_state.compl_reqs);
INIT_HLIST_HEAD(&ctx->cancelable_uring_cmd);
io_napi_init(ctx);
- mutex_init(&ctx->resize_lock);
+ mutex_init(&ctx->mmap_lock);
return ctx;
diff --git a/io_uring/memmap.c b/io_uring/memmap.c
index 57de9bccbf50..a0d4151d11af 100644
--- a/io_uring/memmap.c
+++ b/io_uring/memmap.c
@@ -329,7 +329,7 @@ __cold int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
unsigned int npages;
void *ptr;
- guard(mutex)(&ctx->resize_lock);
+ guard(mutex)(&ctx->mmap_lock);
ptr = io_uring_validate_mmap_request(file, vma->vm_pgoff, sz);
if (IS_ERR(ptr))
@@ -365,7 +365,7 @@ unsigned long io_uring_get_unmapped_area(struct file *filp, unsigned long addr,
if (addr)
return -EINVAL;
- guard(mutex)(&ctx->resize_lock);
+ guard(mutex)(&ctx->mmap_lock);
ptr = io_uring_validate_mmap_request(filp, pgoff, len);
if (IS_ERR(ptr))
@@ -415,7 +415,7 @@ unsigned long io_uring_get_unmapped_area(struct file *file, unsigned long addr,
struct io_ring_ctx *ctx = file->private_data;
void *ptr;
- guard(mutex)(&ctx->resize_lock);
+ guard(mutex)(&ctx->mmap_lock);
ptr = io_uring_validate_mmap_request(file, pgoff, len);
if (IS_ERR(ptr))
diff --git a/io_uring/register.c b/io_uring/register.c
index 1e99c783abdf..ba61697d7a53 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -486,15 +486,15 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
}
/*
- * We'll do the swap. Grab the ctx->resize_lock, which will exclude
+ * We'll do the swap. Grab the ctx->mmap_lock, which will exclude
* any new mmap's on the ring fd. Clear out existing mappings to prevent
* mmap from seeing them, as we'll unmap them. Any attempt to mmap
* existing rings beyond this point will fail. Not that it could proceed
* at this point anyway, as the io_uring mmap side needs go grab the
- * ctx->resize_lock as well. Likewise, hold the completion lock over the
+ * ctx->mmap_lock as well. Likewise, hold the completion lock over the
* duration of the actual swap.
*/
- mutex_lock(&ctx->resize_lock);
+ mutex_lock(&ctx->mmap_lock);
spin_lock(&ctx->completion_lock);
o.rings = ctx->rings;
ctx->rings = NULL;
@@ -561,7 +561,7 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
ret = 0;
out:
spin_unlock(&ctx->completion_lock);
- mutex_unlock(&ctx->resize_lock);
+ mutex_unlock(&ctx->mmap_lock);
io_register_free_rings(&p, to_free);
if (ctx->sq_data)
--
2.47.1
* [PATCH v3 02/18] io_uring/rsrc: export io_check_coalesce_buffer
From: Pavel Begunkov @ 2024-11-29 13:34 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence
io_try_coalesce_buffer() is a useful helper collecting info about a set
of pages, and I want to reuse it for analysing ring/etc. mappings. I
don't need the entire thing and am only interested in whether it can be
coalesced into a single page, but that's still better than duplicating
the parsing.
Signed-off-by: Pavel Begunkov <[email protected]>
---
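A minimal usage sketch of the kind of caller the export is meant for,
mirroring the region code added later in this series; pages, nr_pages
and ptr are illustrative:

	struct io_imu_folio_data ifd;
	void *ptr;

	/* only interested in "is this physically contiguous?" */
	if (io_check_coalesce_buffer(pages, nr_pages, &ifd) &&
	    ifd.nr_folios == 1)
		ptr = page_address(pages[0]);	/* single folio, no vmap needed */
	else
		ptr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);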
io_uring/rsrc.c | 22 ++++++++++++----------
io_uring/rsrc.h | 4 ++++
2 files changed, 16 insertions(+), 10 deletions(-)
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index adaae8630932..e51e5ddae728 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -626,11 +626,12 @@ static int io_buffer_account_pin(struct io_ring_ctx *ctx, struct page **pages,
return ret;
}
-static bool io_do_coalesce_buffer(struct page ***pages, int *nr_pages,
- struct io_imu_folio_data *data, int nr_folios)
+static bool io_coalesce_buffer(struct page ***pages, int *nr_pages,
+ struct io_imu_folio_data *data)
{
struct page **page_array = *pages, **new_array = NULL;
int nr_pages_left = *nr_pages, i, j;
+ int nr_folios = data->nr_folios;
/* Store head pages only*/
new_array = kvmalloc_array(nr_folios, sizeof(struct page *),
@@ -667,15 +668,14 @@ static bool io_do_coalesce_buffer(struct page ***pages, int *nr_pages,
return true;
}
-static bool io_try_coalesce_buffer(struct page ***pages, int *nr_pages,
- struct io_imu_folio_data *data)
+bool io_check_coalesce_buffer(struct page **page_array, int nr_pages,
+ struct io_imu_folio_data *data)
{
- struct page **page_array = *pages;
struct folio *folio = page_folio(page_array[0]);
unsigned int count = 1, nr_folios = 1;
int i;
- if (*nr_pages <= 1)
+ if (nr_pages <= 1)
return false;
data->nr_pages_mid = folio_nr_pages(folio);
@@ -687,7 +687,7 @@ static bool io_try_coalesce_buffer(struct page ***pages, int *nr_pages,
* Check if pages are contiguous inside a folio, and all folios have
* the same page count except for the head and tail.
*/
- for (i = 1; i < *nr_pages; i++) {
+ for (i = 1; i < nr_pages; i++) {
if (page_folio(page_array[i]) == folio &&
page_array[i] == page_array[i-1] + 1) {
count++;
@@ -715,7 +715,8 @@ static bool io_try_coalesce_buffer(struct page ***pages, int *nr_pages,
if (nr_folios == 1)
data->nr_pages_head = count;
- return io_do_coalesce_buffer(pages, nr_pages, data, nr_folios);
+ data->nr_folios = nr_folios;
+ return true;
}
static struct io_rsrc_node *io_sqe_buffer_register(struct io_ring_ctx *ctx,
@@ -729,7 +730,7 @@ static struct io_rsrc_node *io_sqe_buffer_register(struct io_ring_ctx *ctx,
size_t size;
int ret, nr_pages, i;
struct io_imu_folio_data data;
- bool coalesced;
+ bool coalesced = false;
if (!iov->iov_base)
return NULL;
@@ -749,7 +750,8 @@ static struct io_rsrc_node *io_sqe_buffer_register(struct io_ring_ctx *ctx,
}
/* If it's huge page(s), try to coalesce them into fewer bvec entries */
- coalesced = io_try_coalesce_buffer(&pages, &nr_pages, &data);
+ if (io_check_coalesce_buffer(pages, nr_pages, &data))
+ coalesced = io_coalesce_buffer(&pages, &nr_pages, &data);
imu = kvmalloc(struct_size(imu, bvec, nr_pages), GFP_KERNEL);
if (!imu)
diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
index 7a4668deaa1a..c8b093584461 100644
--- a/io_uring/rsrc.h
+++ b/io_uring/rsrc.h
@@ -40,6 +40,7 @@ struct io_imu_folio_data {
/* For non-head/tail folios, has to be fully included */
unsigned int nr_pages_mid;
unsigned int folio_shift;
+ unsigned int nr_folios;
};
struct io_rsrc_node *io_rsrc_node_alloc(struct io_ring_ctx *ctx, int type);
@@ -66,6 +67,9 @@ int io_register_rsrc_update(struct io_ring_ctx *ctx, void __user *arg,
int io_register_rsrc(struct io_ring_ctx *ctx, void __user *arg,
unsigned int size, unsigned int type);
+bool io_check_coalesce_buffer(struct page **page_array, int nr_pages,
+ struct io_imu_folio_data *data);
+
static inline struct io_rsrc_node *io_rsrc_node_lookup(struct io_rsrc_data *data,
int index)
{
--
2.47.1
* [PATCH v3 03/18] io_uring/memmap: flag vmap'ed regions
From: Pavel Begunkov @ 2024-11-29 13:34 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence
Add internal flags for struct io_mapped_region. The first flag we need
is IO_REGION_F_VMAP, which indicates that the pointer has to be
vunmap'ed on region destruction. For now all regions are vmap'ed, so
it's set unconditionally.
Signed-off-by: Pavel Begunkov <[email protected]>
---
include/linux/io_uring_types.h | 5 +++--
io_uring/memmap.c | 14 ++++++++++----
io_uring/memmap.h | 2 +-
3 files changed, 14 insertions(+), 7 deletions(-)
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index adb36e0da40e..4cee414080fd 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -77,8 +77,9 @@ struct io_hash_table {
struct io_mapped_region {
struct page **pages;
- void *vmap_ptr;
- size_t nr_pages;
+ void *ptr;
+ unsigned nr_pages;
+ unsigned flags;
};
/*
diff --git a/io_uring/memmap.c b/io_uring/memmap.c
index a0d4151d11af..31fb8c8ffe4e 100644
--- a/io_uring/memmap.c
+++ b/io_uring/memmap.c
@@ -202,14 +202,19 @@ void *__io_uaddr_map(struct page ***pages, unsigned short *npages,
return ERR_PTR(-ENOMEM);
}
+enum {
+ /* memory was vmap'ed for the kernel, freeing the region vunmap's it */
+ IO_REGION_F_VMAP = 1,
+};
+
void io_free_region(struct io_ring_ctx *ctx, struct io_mapped_region *mr)
{
if (mr->pages) {
unpin_user_pages(mr->pages, mr->nr_pages);
kvfree(mr->pages);
}
- if (mr->vmap_ptr)
- vunmap(mr->vmap_ptr);
+ if ((mr->flags & IO_REGION_F_VMAP) && mr->ptr)
+ vunmap(mr->ptr);
if (mr->nr_pages && ctx->user)
__io_unaccount_mem(ctx->user, mr->nr_pages);
@@ -225,7 +230,7 @@ int io_create_region(struct io_ring_ctx *ctx, struct io_mapped_region *mr,
void *vptr;
u64 end;
- if (WARN_ON_ONCE(mr->pages || mr->vmap_ptr || mr->nr_pages))
+ if (WARN_ON_ONCE(mr->pages || mr->ptr || mr->nr_pages))
return -EFAULT;
if (memchr_inv(&reg->__resv, 0, sizeof(reg->__resv)))
return -EINVAL;
@@ -260,8 +265,9 @@ int io_create_region(struct io_ring_ctx *ctx, struct io_mapped_region *mr,
}
mr->pages = pages;
- mr->vmap_ptr = vptr;
+ mr->ptr = vptr;
mr->nr_pages = nr_pages;
+ mr->flags |= IO_REGION_F_VMAP;
return 0;
out_free:
if (pages_accounted)
diff --git a/io_uring/memmap.h b/io_uring/memmap.h
index f361a635b6c7..2096a8427277 100644
--- a/io_uring/memmap.h
+++ b/io_uring/memmap.h
@@ -28,7 +28,7 @@ int io_create_region(struct io_ring_ctx *ctx, struct io_mapped_region *mr,
static inline void *io_region_get_ptr(struct io_mapped_region *mr)
{
- return mr->vmap_ptr;
+ return mr->ptr;
}
static inline bool io_region_is_set(struct io_mapped_region *mr)
--
2.47.1
* [PATCH v3 04/18] io_uring/memmap: flag regions with user pages
From: Pavel Begunkov @ 2024-11-29 13:34 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence
In preparation for kernel allocated regions, add a flag telling whether
the region contains user pinned pages or not.
Signed-off-by: Pavel Begunkov <[email protected]>
---
io_uring/memmap.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/io_uring/memmap.c b/io_uring/memmap.c
index 31fb8c8ffe4e..a0416733e921 100644
--- a/io_uring/memmap.c
+++ b/io_uring/memmap.c
@@ -205,12 +205,17 @@ void *__io_uaddr_map(struct page ***pages, unsigned short *npages,
enum {
/* memory was vmap'ed for the kernel, freeing the region vunmap's it */
IO_REGION_F_VMAP = 1,
+ /* memory is provided by user and pinned by the kernel */
+ IO_REGION_F_USER_PROVIDED = 2,
};
void io_free_region(struct io_ring_ctx *ctx, struct io_mapped_region *mr)
{
if (mr->pages) {
- unpin_user_pages(mr->pages, mr->nr_pages);
+ if (mr->flags & IO_REGION_F_USER_PROVIDED)
+ unpin_user_pages(mr->pages, mr->nr_pages);
+ else
+ release_pages(mr->pages, mr->nr_pages);
kvfree(mr->pages);
}
if ((mr->flags & IO_REGION_F_VMAP) && mr->ptr)
@@ -267,7 +272,7 @@ int io_create_region(struct io_ring_ctx *ctx, struct io_mapped_region *mr,
mr->pages = pages;
mr->ptr = vptr;
mr->nr_pages = nr_pages;
- mr->flags |= IO_REGION_F_VMAP;
+ mr->flags |= IO_REGION_F_VMAP | IO_REGION_F_USER_PROVIDED;
return 0;
out_free:
if (pages_accounted)
--
2.47.1
* [PATCH v3 05/18] io_uring/memmap: account memory before pinning
From: Pavel Begunkov @ 2024-11-29 13:34 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence
Move memory accounting before page pinning. We shouldn't even try to
pin pages if it's not allowed, and accounting is also relatively
inexpensive. It also gives a better code structure, as we do the generic
accounting first and can then branch for different mapping types.
Signed-off-by: Pavel Begunkov <[email protected]>
---
io_uring/memmap.c | 17 +++++++++++------
1 file changed, 11 insertions(+), 6 deletions(-)
diff --git a/io_uring/memmap.c b/io_uring/memmap.c
index a0416733e921..fca93bc4c6f1 100644
--- a/io_uring/memmap.c
+++ b/io_uring/memmap.c
@@ -252,17 +252,21 @@ int io_create_region(struct io_ring_ctx *ctx, struct io_mapped_region *mr,
if (check_add_overflow(reg->user_addr, reg->size, &end))
return -EOVERFLOW;
- pages = io_pin_pages(reg->user_addr, reg->size, &nr_pages);
- if (IS_ERR(pages))
- return PTR_ERR(pages);
-
+ nr_pages = reg->size >> PAGE_SHIFT;
if (ctx->user) {
ret = __io_account_mem(ctx->user, nr_pages);
if (ret)
- goto out_free;
+ return ret;
pages_accounted = nr_pages;
}
+ pages = io_pin_pages(reg->user_addr, reg->size, &nr_pages);
+ if (IS_ERR(pages)) {
+ ret = PTR_ERR(pages);
+ pages = NULL;
+ goto out_free;
+ }
+
vptr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
if (!vptr) {
ret = -ENOMEM;
@@ -277,7 +281,8 @@ int io_create_region(struct io_ring_ctx *ctx, struct io_mapped_region *mr,
out_free:
if (pages_accounted)
__io_unaccount_mem(ctx->user, pages_accounted);
- io_pages_free(&pages, nr_pages);
+ if (pages)
+ io_pages_free(&pages, nr_pages);
return ret;
}
--
2.47.1
* [PATCH v3 06/18] io_uring/memmap: reuse io_free_region for failure path
From: Pavel Begunkov @ 2024-11-29 13:34 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence
Regions are going to become more complex with allocation options and
optimisations, so I want to split initialisation into steps, and for
that it needs a sane failure path. Reuse io_free_region(): it's smart
enough to undo only what's needed and leaves the structure in a
consistent state.
Signed-off-by: Pavel Begunkov <[email protected]>
---
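Below the fold, a condensed sketch of the resulting shape: ->nr_pages is
set as soon as accounting succeeds, so the single io_free_region() exit
can undo accounting, pinning and mapping exactly as far as they got.
pin_or_alloc_pages() and map_pages() are placeholders for the steps
introduced in this and the following patches:

	if (ctx->user) {
		ret = __io_account_mem(ctx->user, nr_pages);
		if (ret)
			return ret;		/* nothing to undo yet */
	}
	mr->nr_pages = nr_pages;		/* the unwind now knows the size */

	ret = pin_or_alloc_pages(ctx, mr, reg);	/* sets mr->pages */
	if (ret)
		goto out_free;
	ret = map_pages(mr);			/* sets mr->ptr and IO_REGION_F_VMAP */
	if (ret)
		goto out_free;
	return 0;
out_free:
	io_free_region(ctx, mr);		/* undoes only what was set up */
	return ret;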
io_uring/memmap.c | 16 +++++-----------
1 file changed, 5 insertions(+), 11 deletions(-)
diff --git a/io_uring/memmap.c b/io_uring/memmap.c
index fca93bc4c6f1..96c4f6b61171 100644
--- a/io_uring/memmap.c
+++ b/io_uring/memmap.c
@@ -229,7 +229,6 @@ void io_free_region(struct io_ring_ctx *ctx, struct io_mapped_region *mr)
int io_create_region(struct io_ring_ctx *ctx, struct io_mapped_region *mr,
struct io_uring_region_desc *reg)
{
- int pages_accounted = 0;
struct page **pages;
int nr_pages, ret;
void *vptr;
@@ -257,32 +256,27 @@ int io_create_region(struct io_ring_ctx *ctx, struct io_mapped_region *mr,
ret = __io_account_mem(ctx->user, nr_pages);
if (ret)
return ret;
- pages_accounted = nr_pages;
}
+ mr->nr_pages = nr_pages;
pages = io_pin_pages(reg->user_addr, reg->size, &nr_pages);
if (IS_ERR(pages)) {
ret = PTR_ERR(pages);
- pages = NULL;
goto out_free;
}
+ mr->pages = pages;
+ mr->flags |= IO_REGION_F_USER_PROVIDED;
vptr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
if (!vptr) {
ret = -ENOMEM;
goto out_free;
}
-
- mr->pages = pages;
mr->ptr = vptr;
- mr->nr_pages = nr_pages;
- mr->flags |= IO_REGION_F_VMAP | IO_REGION_F_USER_PROVIDED;
+ mr->flags |= IO_REGION_F_VMAP;
return 0;
out_free:
- if (pages_accounted)
- __io_unaccount_mem(ctx->user, pages_accounted);
- if (pages)
- io_pages_free(&pages, nr_pages);
+ io_free_region(ctx, mr);
return ret;
}
--
2.47.1
* [PATCH v3 07/18] io_uring/memmap: optimise single folio regions
From: Pavel Begunkov @ 2024-11-29 13:34 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence
We don't need to vmap if the memory is already physically contiguous.
There are two important cases this covers: PAGE_SIZE regions and huge
pages. Use io_check_coalesce_buffer() to get the number of contiguous
folios.
Signed-off-by: Pavel Begunkov <[email protected]>
---
io_uring/memmap.c | 29 ++++++++++++++++++++++-------
1 file changed, 22 insertions(+), 7 deletions(-)
diff --git a/io_uring/memmap.c b/io_uring/memmap.c
index 96c4f6b61171..fd348c98f64f 100644
--- a/io_uring/memmap.c
+++ b/io_uring/memmap.c
@@ -226,12 +226,31 @@ void io_free_region(struct io_ring_ctx *ctx, struct io_mapped_region *mr)
memset(mr, 0, sizeof(*mr));
}
+static int io_region_init_ptr(struct io_mapped_region *mr)
+{
+ struct io_imu_folio_data ifd;
+ void *ptr;
+
+ if (io_check_coalesce_buffer(mr->pages, mr->nr_pages, &ifd)) {
+ if (ifd.nr_folios == 1) {
+ mr->ptr = page_address(mr->pages[0]);
+ return 0;
+ }
+ }
+ ptr = vmap(mr->pages, mr->nr_pages, VM_MAP, PAGE_KERNEL);
+ if (!ptr)
+ return -ENOMEM;
+
+ mr->ptr = ptr;
+ mr->flags |= IO_REGION_F_VMAP;
+ return 0;
+}
+
int io_create_region(struct io_ring_ctx *ctx, struct io_mapped_region *mr,
struct io_uring_region_desc *reg)
{
struct page **pages;
int nr_pages, ret;
- void *vptr;
u64 end;
if (WARN_ON_ONCE(mr->pages || mr->ptr || mr->nr_pages))
@@ -267,13 +286,9 @@ int io_create_region(struct io_ring_ctx *ctx, struct io_mapped_region *mr,
mr->pages = pages;
mr->flags |= IO_REGION_F_USER_PROVIDED;
- vptr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
- if (!vptr) {
- ret = -ENOMEM;
+ ret = io_region_init_ptr(mr);
+ if (ret)
goto out_free;
- }
- mr->ptr = vptr;
- mr->flags |= IO_REGION_F_VMAP;
return 0;
out_free:
io_free_region(ctx, mr);
--
2.47.1
* [PATCH v3 08/18] io_uring/memmap: helper for pinning region pages
From: Pavel Begunkov @ 2024-11-29 13:34 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence
In preparation for adding kernel allocated regions, extract a new helper
that pins user pages.
Signed-off-by: Pavel Begunkov <[email protected]>
---
io_uring/memmap.c | 28 +++++++++++++++++++++-------
1 file changed, 21 insertions(+), 7 deletions(-)
diff --git a/io_uring/memmap.c b/io_uring/memmap.c
index fd348c98f64f..5d261e07c2e3 100644
--- a/io_uring/memmap.c
+++ b/io_uring/memmap.c
@@ -246,10 +246,28 @@ static int io_region_init_ptr(struct io_mapped_region *mr)
return 0;
}
+static int io_region_pin_pages(struct io_ring_ctx *ctx,
+ struct io_mapped_region *mr,
+ struct io_uring_region_desc *reg)
+{
+ unsigned long size = mr->nr_pages << PAGE_SHIFT;
+ struct page **pages;
+ int nr_pages;
+
+ pages = io_pin_pages(reg->user_addr, size, &nr_pages);
+ if (IS_ERR(pages))
+ return PTR_ERR(pages);
+ if (WARN_ON_ONCE(nr_pages != mr->nr_pages))
+ return -EFAULT;
+
+ mr->pages = pages;
+ mr->flags |= IO_REGION_F_USER_PROVIDED;
+ return 0;
+}
+
int io_create_region(struct io_ring_ctx *ctx, struct io_mapped_region *mr,
struct io_uring_region_desc *reg)
{
- struct page **pages;
int nr_pages, ret;
u64 end;
@@ -278,13 +296,9 @@ int io_create_region(struct io_ring_ctx *ctx, struct io_mapped_region *mr,
}
mr->nr_pages = nr_pages;
- pages = io_pin_pages(reg->user_addr, reg->size, &nr_pages);
- if (IS_ERR(pages)) {
- ret = PTR_ERR(pages);
+ ret = io_region_pin_pages(ctx, mr, reg);
+ if (ret)
goto out_free;
- }
- mr->pages = pages;
- mr->flags |= IO_REGION_F_USER_PROVIDED;
ret = io_region_init_ptr(mr);
if (ret)
--
2.47.1
* [PATCH v3 09/18] io_uring/memmap: add IO_REGION_F_SINGLE_REF
From: Pavel Begunkov @ 2024-11-29 13:34 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence
Kernel allocated compound pages will have just one reference for the
entire page array, so add a flag telling io_free_region() about that.
Signed-off-by: Pavel Begunkov <[email protected]>
---
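For illustration, roughly why there is only a single reference to drop
when the compound path is taken (a sketch of what io_mem_alloc_compound()
ends up producing, not its exact code):

	/* one higher-order allocation backs the whole array... */
	struct page *page = alloc_pages(gfp | __GFP_COMP, get_order(size));

	for (i = 0; i < nr_pages; i++)
		pages[i] = page + i;
	/* ...but only the head page holds a reference, so the free side has
	 * to drop one ref rather than nr_pages of them */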
io_uring/memmap.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/io_uring/memmap.c b/io_uring/memmap.c
index 5d261e07c2e3..a37ccb167258 100644
--- a/io_uring/memmap.c
+++ b/io_uring/memmap.c
@@ -207,15 +207,23 @@ enum {
IO_REGION_F_VMAP = 1,
/* memory is provided by user and pinned by the kernel */
IO_REGION_F_USER_PROVIDED = 2,
+ /* only the first page in the array is ref'ed */
+ IO_REGION_F_SINGLE_REF = 4,
};
void io_free_region(struct io_ring_ctx *ctx, struct io_mapped_region *mr)
{
if (mr->pages) {
+ long nr_refs = mr->nr_pages;
+
+ if (mr->flags & IO_REGION_F_SINGLE_REF)
+ nr_refs = 1;
+
if (mr->flags & IO_REGION_F_USER_PROVIDED)
- unpin_user_pages(mr->pages, mr->nr_pages);
+ unpin_user_pages(mr->pages, nr_refs);
else
- release_pages(mr->pages, mr->nr_pages);
+ release_pages(mr->pages, nr_refs);
+
kvfree(mr->pages);
}
if ((mr->flags & IO_REGION_F_VMAP) && mr->ptr)
--
2.47.1
* [PATCH v3 10/18] io_uring/memmap: implement kernel allocated regions
From: Pavel Begunkov @ 2024-11-29 13:34 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence
Allow the kernel to allocate memory for a region. That's the classical
way SQ/CQ are allocated. It's not yet useful to user space as there
is no way to mmap it, which is why it's explicitly disabled in
io_register_mem_region().
Signed-off-by: Pavel Begunkov <[email protected]>
---
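Although user space can't make use of it yet, here is a sketch of the
in-kernel usage this enables; it matches what the SQ conversion later in
the series ends up looking like (the extra mmap offset argument is added
by a following patch):

	struct io_uring_region_desc rd = { .size = PAGE_ALIGN(size) };

	/* no IORING_MEM_REGION_TYPE_USER and no user_addr: kernel allocated */
	ret = io_create_region(ctx, &ctx->sq_region, &rd, IORING_OFF_SQES);
	if (ret)
		return ret;
	ctx->sq_sqes = io_region_get_ptr(&ctx->sq_region);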
io_uring/memmap.c | 43 ++++++++++++++++++++++++++++++++++++++++---
io_uring/register.c | 2 ++
2 files changed, 42 insertions(+), 3 deletions(-)
diff --git a/io_uring/memmap.c b/io_uring/memmap.c
index a37ccb167258..0908a71bf57e 100644
--- a/io_uring/memmap.c
+++ b/io_uring/memmap.c
@@ -273,6 +273,39 @@ static int io_region_pin_pages(struct io_ring_ctx *ctx,
return 0;
}
+static int io_region_allocate_pages(struct io_ring_ctx *ctx,
+ struct io_mapped_region *mr,
+ struct io_uring_region_desc *reg)
+{
+ gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_NOWARN;
+ unsigned long size = mr->nr_pages << PAGE_SHIFT;
+ unsigned long nr_allocated;
+ struct page **pages;
+ void *p;
+
+ pages = kvmalloc_array(mr->nr_pages, sizeof(*pages), gfp);
+ if (!pages)
+ return -ENOMEM;
+
+ p = io_mem_alloc_compound(pages, mr->nr_pages, size, gfp);
+ if (!IS_ERR(p)) {
+ mr->flags |= IO_REGION_F_SINGLE_REF;
+ mr->pages = pages;
+ return 0;
+ }
+
+ nr_allocated = alloc_pages_bulk_array_node(gfp, NUMA_NO_NODE,
+ mr->nr_pages, pages);
+ if (nr_allocated != mr->nr_pages) {
+ if (nr_allocated)
+ release_pages(pages, nr_allocated);
+ kvfree(pages);
+ return -ENOMEM;
+ }
+ mr->pages = pages;
+ return 0;
+}
+
int io_create_region(struct io_ring_ctx *ctx, struct io_mapped_region *mr,
struct io_uring_region_desc *reg)
{
@@ -283,9 +316,10 @@ int io_create_region(struct io_ring_ctx *ctx, struct io_mapped_region *mr,
return -EFAULT;
if (memchr_inv(&reg->__resv, 0, sizeof(reg->__resv)))
return -EINVAL;
- if (reg->flags != IORING_MEM_REGION_TYPE_USER)
+ if (reg->flags & ~IORING_MEM_REGION_TYPE_USER)
return -EINVAL;
- if (!reg->user_addr)
+ /* user_addr should be set IFF it's a user memory backed region */
+ if ((reg->flags & IORING_MEM_REGION_TYPE_USER) != !!reg->user_addr)
return -EFAULT;
if (!reg->size || reg->mmap_offset || reg->id)
return -EINVAL;
@@ -304,7 +338,10 @@ int io_create_region(struct io_ring_ctx *ctx, struct io_mapped_region *mr,
}
mr->nr_pages = nr_pages;
- ret = io_region_pin_pages(ctx, mr, reg);
+ if (reg->flags & IORING_MEM_REGION_TYPE_USER)
+ ret = io_region_pin_pages(ctx, mr, reg);
+ else
+ ret = io_region_allocate_pages(ctx, mr, reg);
if (ret)
goto out_free;
diff --git a/io_uring/register.c b/io_uring/register.c
index ba61697d7a53..f043d3f6b026 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -586,6 +586,8 @@ static int io_register_mem_region(struct io_ring_ctx *ctx, void __user *uarg)
if (copy_from_user(&rd, rd_uptr, sizeof(rd)))
return -EFAULT;
+ if (!(rd.flags & IORING_MEM_REGION_TYPE_USER))
+ return -EINVAL;
if (memchr_inv(&reg.__resv, 0, sizeof(reg.__resv)))
return -EINVAL;
if (reg.flags & ~IORING_MEM_REGION_REG_WAIT_ARG)
--
2.47.1
* [PATCH v3 11/18] io_uring/memmap: implement mmap for regions
From: Pavel Begunkov @ 2024-11-29 13:34 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence
Implement mmap for the param region and enable the kernel allocation
mode for it. Internally it uses a fixed mmap offset; however, the user
has to use the offset returned in
struct io_uring_region_desc::mmap_offset.
Note that mmap doesn't and can't take ->uring_lock; the region / ring
lookup is protected by ->mmap_lock instead, and it directly peeks at
ctx->param_region. We can't protect io_create_region() with the
mmap_lock as it'd deadlock, which is why io_create_region_mmap_safe()
initialises the region in a temporary variable and then publishes it
with the lock taken. It's intentionally decoupled from the main region
helpers, and in the future we might want to have a list of active
regions, which could then be protected by the ->mmap_lock.
Signed-off-by: Pavel Begunkov <[email protected]>
---
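From the application's point of view the flow is roughly as follows (a
sketch; rd and ring_fd are illustrative, and only the mmap_offset
returned by the kernel matters, not the internal
IORING_MAP_OFF_PARAM_REGION constant):

	/* rd was handed to region registration without
	 * IORING_MEM_REGION_TYPE_USER, so the kernel allocated the memory
	 * and filled in rd.mmap_offset on return */
	void *p = mmap(NULL, rd.size, PROT_READ | PROT_WRITE,
		       MAP_SHARED | MAP_POPULATE, ring_fd, rd.mmap_offset);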
io_uring/memmap.c | 61 +++++++++++++++++++++++++++++++++++++++++----
io_uring/memmap.h | 10 +++++++-
io_uring/register.c | 6 ++---
3 files changed, 67 insertions(+), 10 deletions(-)
diff --git a/io_uring/memmap.c b/io_uring/memmap.c
index 0908a71bf57e..9a182c8a4be1 100644
--- a/io_uring/memmap.c
+++ b/io_uring/memmap.c
@@ -275,7 +275,8 @@ static int io_region_pin_pages(struct io_ring_ctx *ctx,
static int io_region_allocate_pages(struct io_ring_ctx *ctx,
struct io_mapped_region *mr,
- struct io_uring_region_desc *reg)
+ struct io_uring_region_desc *reg,
+ unsigned long mmap_offset)
{
gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_NOWARN;
unsigned long size = mr->nr_pages << PAGE_SHIFT;
@@ -290,8 +291,7 @@ static int io_region_allocate_pages(struct io_ring_ctx *ctx,
p = io_mem_alloc_compound(pages, mr->nr_pages, size, gfp);
if (!IS_ERR(p)) {
mr->flags |= IO_REGION_F_SINGLE_REF;
- mr->pages = pages;
- return 0;
+ goto done;
}
nr_allocated = alloc_pages_bulk_array_node(gfp, NUMA_NO_NODE,
@@ -302,12 +302,15 @@ static int io_region_allocate_pages(struct io_ring_ctx *ctx,
kvfree(pages);
return -ENOMEM;
}
+done:
+ reg->mmap_offset = mmap_offset;
mr->pages = pages;
return 0;
}
int io_create_region(struct io_ring_ctx *ctx, struct io_mapped_region *mr,
- struct io_uring_region_desc *reg)
+ struct io_uring_region_desc *reg,
+ unsigned long mmap_offset)
{
int nr_pages, ret;
u64 end;
@@ -341,7 +344,7 @@ int io_create_region(struct io_ring_ctx *ctx, struct io_mapped_region *mr,
if (reg->flags & IORING_MEM_REGION_TYPE_USER)
ret = io_region_pin_pages(ctx, mr, reg);
else
- ret = io_region_allocate_pages(ctx, mr, reg);
+ ret = io_region_allocate_pages(ctx, mr, reg, mmap_offset);
if (ret)
goto out_free;
@@ -354,6 +357,40 @@ int io_create_region(struct io_ring_ctx *ctx, struct io_mapped_region *mr,
return ret;
}
+int io_create_region_mmap_safe(struct io_ring_ctx *ctx, struct io_mapped_region *mr,
+ struct io_uring_region_desc *reg,
+ unsigned long mmap_offset)
+{
+ struct io_mapped_region tmp_mr;
+ int ret;
+
+ memcpy(&tmp_mr, mr, sizeof(tmp_mr));
+ ret = io_create_region(ctx, &tmp_mr, reg, mmap_offset);
+ if (ret)
+ return ret;
+
+ /*
+ * Once published mmap can find it without holding only the ->mmap_lock
+ * and not ->uring_lock.
+ */
+ guard(mutex)(&ctx->mmap_lock);
+ memcpy(mr, &tmp_mr, sizeof(tmp_mr));
+ return 0;
+}
+
+static void *io_region_validate_mmap(struct io_ring_ctx *ctx,
+ struct io_mapped_region *mr)
+{
+ lockdep_assert_held(&ctx->mmap_lock);
+
+ if (!io_region_is_set(mr))
+ return ERR_PTR(-EINVAL);
+ if (mr->flags & IO_REGION_F_USER_PROVIDED)
+ return ERR_PTR(-EINVAL);
+
+ return io_region_get_ptr(mr);
+}
+
static void *io_uring_validate_mmap_request(struct file *file, loff_t pgoff,
size_t sz)
{
@@ -389,6 +426,8 @@ static void *io_uring_validate_mmap_request(struct file *file, loff_t pgoff,
io_put_bl(ctx, bl);
return ptr;
}
+ case IORING_MAP_OFF_PARAM_REGION:
+ return io_region_validate_mmap(ctx, &ctx->param_region);
}
return ERR_PTR(-EINVAL);
@@ -405,6 +444,16 @@ int io_uring_mmap_pages(struct io_ring_ctx *ctx, struct vm_area_struct *vma,
#ifdef CONFIG_MMU
+static int io_region_mmap(struct io_ring_ctx *ctx,
+ struct io_mapped_region *mr,
+ struct vm_area_struct *vma)
+{
+ unsigned long nr_pages = mr->nr_pages;
+
+ vm_flags_set(vma, VM_DONTEXPAND);
+ return vm_insert_pages(vma, vma->vm_start, mr->pages, &nr_pages);
+}
+
__cold int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
{
struct io_ring_ctx *ctx = file->private_data;
@@ -429,6 +478,8 @@ __cold int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
ctx->n_sqe_pages);
case IORING_OFF_PBUF_RING:
return io_pbuf_mmap(file, vma);
+ case IORING_MAP_OFF_PARAM_REGION:
+ return io_region_mmap(ctx, &ctx->param_region, vma);
}
return -EINVAL;
diff --git a/io_uring/memmap.h b/io_uring/memmap.h
index 2096a8427277..2402bca3d700 100644
--- a/io_uring/memmap.h
+++ b/io_uring/memmap.h
@@ -1,6 +1,8 @@
#ifndef IO_URING_MEMMAP_H
#define IO_URING_MEMMAP_H
+#define IORING_MAP_OFF_PARAM_REGION 0x20000000ULL
+
struct page **io_pin_pages(unsigned long ubuf, unsigned long len, int *npages);
void io_pages_free(struct page ***pages, int npages);
int io_uring_mmap_pages(struct io_ring_ctx *ctx, struct vm_area_struct *vma,
@@ -24,7 +26,13 @@ int io_uring_mmap(struct file *file, struct vm_area_struct *vma);
void io_free_region(struct io_ring_ctx *ctx, struct io_mapped_region *mr);
int io_create_region(struct io_ring_ctx *ctx, struct io_mapped_region *mr,
- struct io_uring_region_desc *reg);
+ struct io_uring_region_desc *reg,
+ unsigned long mmap_offset);
+
+int io_create_region_mmap_safe(struct io_ring_ctx *ctx,
+ struct io_mapped_region *mr,
+ struct io_uring_region_desc *reg,
+ unsigned long mmap_offset);
static inline void *io_region_get_ptr(struct io_mapped_region *mr)
{
diff --git a/io_uring/register.c b/io_uring/register.c
index f043d3f6b026..5b099ec36d00 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -585,9 +585,6 @@ static int io_register_mem_region(struct io_ring_ctx *ctx, void __user *uarg)
rd_uptr = u64_to_user_ptr(reg.region_uptr);
if (copy_from_user(&rd, rd_uptr, sizeof(rd)))
return -EFAULT;
-
- if (!(rd.flags & IORING_MEM_REGION_TYPE_USER))
- return -EINVAL;
if (memchr_inv(&reg.__resv, 0, sizeof(reg.__resv)))
return -EINVAL;
if (reg.flags & ~IORING_MEM_REGION_REG_WAIT_ARG)
@@ -602,7 +599,8 @@ static int io_register_mem_region(struct io_ring_ctx *ctx, void __user *uarg)
!(ctx->flags & IORING_SETUP_R_DISABLED))
return -EINVAL;
- ret = io_create_region(ctx, &ctx->param_region, &rd);
+ ret = io_create_region_mmap_safe(ctx, &ctx->param_region, &rd,
+ IORING_MAP_OFF_PARAM_REGION);
if (ret)
return ret;
if (copy_to_user(rd_uptr, &rd, sizeof(rd))) {
--
2.47.1
* [PATCH v3 12/18] io_uring: pass ctx to io_register_free_rings
From: Pavel Begunkov @ 2024-11-29 13:34 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence
A preparation patch: pass the context to io_register_free_rings().
Signed-off-by: Pavel Begunkov <[email protected]>
---
io_uring/register.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/io_uring/register.c b/io_uring/register.c
index 5b099ec36d00..5e07205fb071 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -375,7 +375,8 @@ struct io_ring_ctx_rings {
struct io_rings *rings;
};
-static void io_register_free_rings(struct io_uring_params *p,
+static void io_register_free_rings(struct io_ring_ctx *ctx,
+ struct io_uring_params *p,
struct io_ring_ctx_rings *r)
{
if (!(p->flags & IORING_SETUP_NO_MMAP)) {
@@ -452,7 +453,7 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
n.rings->cq_ring_entries = p.cq_entries;
if (copy_to_user(arg, &p, sizeof(p))) {
- io_register_free_rings(&p, &n);
+ io_register_free_rings(ctx, &p, &n);
return -EFAULT;
}
@@ -461,7 +462,7 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
else
size = array_size(sizeof(struct io_uring_sqe), p.sq_entries);
if (size == SIZE_MAX) {
- io_register_free_rings(&p, &n);
+ io_register_free_rings(ctx, &p, &n);
return -EOVERFLOW;
}
@@ -472,7 +473,7 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
p.sq_off.user_addr,
size);
if (IS_ERR(ptr)) {
- io_register_free_rings(&p, &n);
+ io_register_free_rings(ctx, &p, &n);
return PTR_ERR(ptr);
}
@@ -562,7 +563,7 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
out:
spin_unlock(&ctx->completion_lock);
mutex_unlock(&ctx->mmap_lock);
- io_register_free_rings(&p, to_free);
+ io_register_free_rings(ctx, &p, to_free);
if (ctx->sq_data)
io_sq_thread_unpark(ctx->sq_data);
--
2.47.1
* [PATCH v3 13/18] io_uring: use region api for SQ
From: Pavel Begunkov @ 2024-11-29 13:34 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence
Convert internal parts of the SQ management to the region API.
Signed-off-by: Pavel Begunkov <[email protected]>
---
include/linux/io_uring_types.h | 3 +--
io_uring/io_uring.c | 36 +++++++++++++---------------------
io_uring/memmap.c | 3 +--
io_uring/register.c | 35 +++++++++++++++------------------
4 files changed, 32 insertions(+), 45 deletions(-)
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 4cee414080fd..3f353f269c6e 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -431,10 +431,9 @@ struct io_ring_ctx {
* the gup'ed pages for the two rings, and the sqes.
*/
unsigned short n_ring_pages;
- unsigned short n_sqe_pages;
struct page **ring_pages;
- struct page **sqe_pages;
+ struct io_mapped_region sq_region;
/* used for optimised request parameter and wait argument passing */
struct io_mapped_region param_region;
};
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index c713ef35447b..2ac80b4d4016 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2637,29 +2637,19 @@ static void *io_rings_map(struct io_ring_ctx *ctx, unsigned long uaddr,
size);
}
-static void *io_sqes_map(struct io_ring_ctx *ctx, unsigned long uaddr,
- size_t size)
-{
- return __io_uaddr_map(&ctx->sqe_pages, &ctx->n_sqe_pages, uaddr,
- size);
-}
-
static void io_rings_free(struct io_ring_ctx *ctx)
{
if (!(ctx->flags & IORING_SETUP_NO_MMAP)) {
io_pages_unmap(ctx->rings, &ctx->ring_pages, &ctx->n_ring_pages,
true);
- io_pages_unmap(ctx->sq_sqes, &ctx->sqe_pages, &ctx->n_sqe_pages,
- true);
} else {
io_pages_free(&ctx->ring_pages, ctx->n_ring_pages);
ctx->n_ring_pages = 0;
- io_pages_free(&ctx->sqe_pages, ctx->n_sqe_pages);
- ctx->n_sqe_pages = 0;
vunmap(ctx->rings);
- vunmap(ctx->sq_sqes);
}
+ io_free_region(ctx, &ctx->sq_region);
+
ctx->rings = NULL;
ctx->sq_sqes = NULL;
}
@@ -3476,9 +3466,10 @@ bool io_is_uring_fops(struct file *file)
static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx,
struct io_uring_params *p)
{
+ struct io_uring_region_desc rd;
struct io_rings *rings;
size_t size, sq_array_offset;
- void *ptr;
+ int ret;
/* make sure these are sane, as we already accounted them */
ctx->sq_entries = p->sq_entries;
@@ -3514,17 +3505,18 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx,
return -EOVERFLOW;
}
- if (!(ctx->flags & IORING_SETUP_NO_MMAP))
- ptr = io_pages_map(&ctx->sqe_pages, &ctx->n_sqe_pages, size);
- else
- ptr = io_sqes_map(ctx, p->sq_off.user_addr, size);
-
- if (IS_ERR(ptr)) {
+ memset(&rd, 0, sizeof(rd));
+ rd.size = PAGE_ALIGN(size);
+ if (ctx->flags & IORING_SETUP_NO_MMAP) {
+ rd.user_addr = p->sq_off.user_addr;
+ rd.flags |= IORING_MEM_REGION_TYPE_USER;
+ }
+ ret = io_create_region(ctx, &ctx->sq_region, &rd, IORING_OFF_SQES);
+ if (ret) {
io_rings_free(ctx);
- return PTR_ERR(ptr);
+ return ret;
}
-
- ctx->sq_sqes = ptr;
+ ctx->sq_sqes = io_region_get_ptr(&ctx->sq_region);
return 0;
}
diff --git a/io_uring/memmap.c b/io_uring/memmap.c
index 9a182c8a4be1..b9aaa25182a5 100644
--- a/io_uring/memmap.c
+++ b/io_uring/memmap.c
@@ -474,8 +474,7 @@ __cold int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
npages = min(ctx->n_ring_pages, (sz + PAGE_SIZE - 1) >> PAGE_SHIFT);
return io_uring_mmap_pages(ctx, vma, ctx->ring_pages, npages);
case IORING_OFF_SQES:
- return io_uring_mmap_pages(ctx, vma, ctx->sqe_pages,
- ctx->n_sqe_pages);
+ return io_region_mmap(ctx, &ctx->sq_region, vma);
case IORING_OFF_PBUF_RING:
return io_pbuf_mmap(file, vma);
case IORING_MAP_OFF_PARAM_REGION:
diff --git a/io_uring/register.c b/io_uring/register.c
index 5e07205fb071..44cd64923d31 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -368,11 +368,11 @@ static int io_register_clock(struct io_ring_ctx *ctx,
*/
struct io_ring_ctx_rings {
unsigned short n_ring_pages;
- unsigned short n_sqe_pages;
struct page **ring_pages;
- struct page **sqe_pages;
- struct io_uring_sqe *sq_sqes;
struct io_rings *rings;
+
+ struct io_uring_sqe *sq_sqes;
+ struct io_mapped_region sq_region;
};
static void io_register_free_rings(struct io_ring_ctx *ctx,
@@ -382,14 +382,11 @@ static void io_register_free_rings(struct io_ring_ctx *ctx,
if (!(p->flags & IORING_SETUP_NO_MMAP)) {
io_pages_unmap(r->rings, &r->ring_pages, &r->n_ring_pages,
true);
- io_pages_unmap(r->sq_sqes, &r->sqe_pages, &r->n_sqe_pages,
- true);
} else {
io_pages_free(&r->ring_pages, r->n_ring_pages);
- io_pages_free(&r->sqe_pages, r->n_sqe_pages);
vunmap(r->rings);
- vunmap(r->sq_sqes);
}
+ io_free_region(ctx, &r->sq_region);
}
#define swap_old(ctx, o, n, field) \
@@ -404,11 +401,11 @@ static void io_register_free_rings(struct io_ring_ctx *ctx,
static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
{
+ struct io_uring_region_desc rd;
struct io_ring_ctx_rings o = { }, n = { }, *to_free = NULL;
size_t size, sq_array_offset;
struct io_uring_params p;
unsigned i, tail;
- void *ptr;
int ret;
/* for single issuer, must be owner resizing */
@@ -466,16 +463,18 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
return -EOVERFLOW;
}
- if (!(p.flags & IORING_SETUP_NO_MMAP))
- ptr = io_pages_map(&n.sqe_pages, &n.n_sqe_pages, size);
- else
- ptr = __io_uaddr_map(&n.sqe_pages, &n.n_sqe_pages,
- p.sq_off.user_addr,
- size);
- if (IS_ERR(ptr)) {
+ memset(&rd, 0, sizeof(rd));
+ rd.size = PAGE_ALIGN(size);
+ if (p.flags & IORING_SETUP_NO_MMAP) {
+ rd.user_addr = p.sq_off.user_addr;
+ rd.flags |= IORING_MEM_REGION_TYPE_USER;
+ }
+ ret = io_create_region_mmap_safe(ctx, &n.sq_region, &rd, IORING_OFF_SQES);
+ if (ret) {
io_register_free_rings(ctx, &p, &n);
- return PTR_ERR(ptr);
+ return ret;
}
+ n.sq_sqes = io_region_get_ptr(&n.sq_region);
/*
* If using SQPOLL, park the thread
@@ -506,7 +505,6 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
* Now copy SQ and CQ entries, if any. If either of the destination
* rings can't hold what is already there, then fail the operation.
*/
- n.sq_sqes = ptr;
tail = o.rings->sq.tail;
if (tail - o.rings->sq.head > p.sq_entries)
goto overflow;
@@ -555,9 +553,8 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
ctx->rings = n.rings;
ctx->sq_sqes = n.sq_sqes;
swap_old(ctx, o, n, n_ring_pages);
- swap_old(ctx, o, n, n_sqe_pages);
swap_old(ctx, o, n, ring_pages);
- swap_old(ctx, o, n, sqe_pages);
+ swap_old(ctx, o, n, sq_region);
to_free = &o;
ret = 0;
out:
--
2.47.1
* [PATCH v3 14/18] io_uring: use region api for CQ
From: Pavel Begunkov @ 2024-11-29 13:34 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence
Convert internal parts of the CQ/SQ array management to the region API.
Signed-off-by: Pavel Begunkov <[email protected]>
---
include/linux/io_uring_types.h | 8 +----
io_uring/io_uring.c | 36 +++++++---------------
io_uring/memmap.c | 55 +++++-----------------------------
io_uring/memmap.h | 4 ---
io_uring/register.c | 35 ++++++++++------------
5 files changed, 36 insertions(+), 102 deletions(-)
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 3f353f269c6e..2db252841509 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -426,14 +426,8 @@ struct io_ring_ctx {
*/
struct mutex mmap_lock;
- /*
- * If IORING_SETUP_NO_MMAP is used, then the below holds
- * the gup'ed pages for the two rings, and the sqes.
- */
- unsigned short n_ring_pages;
- struct page **ring_pages;
-
struct io_mapped_region sq_region;
+ struct io_mapped_region ring_region;
/* used for optimised request parameter and wait argument passing */
struct io_mapped_region param_region;
};
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 2ac80b4d4016..bc0ab2bb7ae2 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2630,26 +2630,10 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events, u32 flags,
return READ_ONCE(rings->cq.head) == READ_ONCE(rings->cq.tail) ? ret : 0;
}
-static void *io_rings_map(struct io_ring_ctx *ctx, unsigned long uaddr,
- size_t size)
-{
- return __io_uaddr_map(&ctx->ring_pages, &ctx->n_ring_pages, uaddr,
- size);
-}
-
static void io_rings_free(struct io_ring_ctx *ctx)
{
- if (!(ctx->flags & IORING_SETUP_NO_MMAP)) {
- io_pages_unmap(ctx->rings, &ctx->ring_pages, &ctx->n_ring_pages,
- true);
- } else {
- io_pages_free(&ctx->ring_pages, ctx->n_ring_pages);
- ctx->n_ring_pages = 0;
- vunmap(ctx->rings);
- }
-
io_free_region(ctx, &ctx->sq_region);
-
+ io_free_region(ctx, &ctx->ring_region);
ctx->rings = NULL;
ctx->sq_sqes = NULL;
}
@@ -3480,15 +3464,17 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx,
if (size == SIZE_MAX)
return -EOVERFLOW;
- if (!(ctx->flags & IORING_SETUP_NO_MMAP))
- rings = io_pages_map(&ctx->ring_pages, &ctx->n_ring_pages, size);
- else
- rings = io_rings_map(ctx, p->cq_off.user_addr, size);
-
- if (IS_ERR(rings))
- return PTR_ERR(rings);
+ memset(&rd, 0, sizeof(rd));
+ rd.size = PAGE_ALIGN(size);
+ if (ctx->flags & IORING_SETUP_NO_MMAP) {
+ rd.user_addr = p->cq_off.user_addr;
+ rd.flags |= IORING_MEM_REGION_TYPE_USER;
+ }
+ ret = io_create_region(ctx, &ctx->ring_region, &rd, IORING_OFF_CQ_RING);
+ if (ret)
+ return ret;
+ ctx->rings = rings = io_region_get_ptr(&ctx->ring_region);
- ctx->rings = rings;
if (!(ctx->flags & IORING_SETUP_NO_SQARRAY))
ctx->sq_array = (u32 *)((char *)rings + sq_array_offset);
rings->sq_ring_mask = p->sq_entries - 1;
diff --git a/io_uring/memmap.c b/io_uring/memmap.c
index b9aaa25182a5..668b1c3579a2 100644
--- a/io_uring/memmap.c
+++ b/io_uring/memmap.c
@@ -120,18 +120,6 @@ void io_pages_unmap(void *ptr, struct page ***pages, unsigned short *npages,
*npages = 0;
}
-void io_pages_free(struct page ***pages, int npages)
-{
- struct page **page_array = *pages;
-
- if (!page_array)
- return;
-
- unpin_user_pages(page_array, npages);
- kvfree(page_array);
- *pages = NULL;
-}
-
struct page **io_pin_pages(unsigned long uaddr, unsigned long len, int *npages)
{
unsigned long start, end, nr_pages;
@@ -174,34 +162,6 @@ struct page **io_pin_pages(unsigned long uaddr, unsigned long len, int *npages)
return ERR_PTR(ret);
}
-void *__io_uaddr_map(struct page ***pages, unsigned short *npages,
- unsigned long uaddr, size_t size)
-{
- struct page **page_array;
- unsigned int nr_pages;
- void *page_addr;
-
- *npages = 0;
-
- if (uaddr & (PAGE_SIZE - 1) || !size)
- return ERR_PTR(-EINVAL);
-
- nr_pages = 0;
- page_array = io_pin_pages(uaddr, size, &nr_pages);
- if (IS_ERR(page_array))
- return page_array;
-
- page_addr = vmap(page_array, nr_pages, VM_MAP, PAGE_KERNEL);
- if (page_addr) {
- *pages = page_array;
- *npages = nr_pages;
- return page_addr;
- }
-
- io_pages_free(&page_array, nr_pages);
- return ERR_PTR(-ENOMEM);
-}
-
enum {
/* memory was vmap'ed for the kernel, freeing the region vunmap's it */
IO_REGION_F_VMAP = 1,
@@ -446,9 +406,10 @@ int io_uring_mmap_pages(struct io_ring_ctx *ctx, struct vm_area_struct *vma,
static int io_region_mmap(struct io_ring_ctx *ctx,
struct io_mapped_region *mr,
- struct vm_area_struct *vma)
+ struct vm_area_struct *vma,
+ unsigned max_pages)
{
- unsigned long nr_pages = mr->nr_pages;
+ unsigned long nr_pages = min(mr->nr_pages, max_pages);
vm_flags_set(vma, VM_DONTEXPAND);
return vm_insert_pages(vma, vma->vm_start, mr->pages, &nr_pages);
@@ -459,7 +420,7 @@ __cold int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
struct io_ring_ctx *ctx = file->private_data;
size_t sz = vma->vm_end - vma->vm_start;
long offset = vma->vm_pgoff << PAGE_SHIFT;
- unsigned int npages;
+ unsigned int page_limit;
void *ptr;
guard(mutex)(&ctx->mmap_lock);
@@ -471,14 +432,14 @@ __cold int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
switch (offset & IORING_OFF_MMAP_MASK) {
case IORING_OFF_SQ_RING:
case IORING_OFF_CQ_RING:
- npages = min(ctx->n_ring_pages, (sz + PAGE_SIZE - 1) >> PAGE_SHIFT);
- return io_uring_mmap_pages(ctx, vma, ctx->ring_pages, npages);
+ page_limit = (sz + PAGE_SIZE - 1) >> PAGE_SHIFT;
+ return io_region_mmap(ctx, &ctx->ring_region, vma, page_limit);
case IORING_OFF_SQES:
- return io_region_mmap(ctx, &ctx->sq_region, vma);
+ return io_region_mmap(ctx, &ctx->sq_region, vma, UINT_MAX);
case IORING_OFF_PBUF_RING:
return io_pbuf_mmap(file, vma);
case IORING_MAP_OFF_PARAM_REGION:
- return io_region_mmap(ctx, &ctx->param_region, vma);
+ return io_region_mmap(ctx, &ctx->param_region, vma, UINT_MAX);
}
return -EINVAL;
diff --git a/io_uring/memmap.h b/io_uring/memmap.h
index 2402bca3d700..7395996eb353 100644
--- a/io_uring/memmap.h
+++ b/io_uring/memmap.h
@@ -4,7 +4,6 @@
#define IORING_MAP_OFF_PARAM_REGION 0x20000000ULL
struct page **io_pin_pages(unsigned long ubuf, unsigned long len, int *npages);
-void io_pages_free(struct page ***pages, int npages);
int io_uring_mmap_pages(struct io_ring_ctx *ctx, struct vm_area_struct *vma,
struct page **pages, int npages);
@@ -13,9 +12,6 @@ void *io_pages_map(struct page ***out_pages, unsigned short *npages,
void io_pages_unmap(void *ptr, struct page ***pages, unsigned short *npages,
bool put_pages);
-void *__io_uaddr_map(struct page ***pages, unsigned short *npages,
- unsigned long uaddr, size_t size);
-
#ifndef CONFIG_MMU
unsigned int io_uring_nommu_mmap_capabilities(struct file *file);
#endif
diff --git a/io_uring/register.c b/io_uring/register.c
index 44cd64923d31..f1698c18c7cb 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -367,26 +367,19 @@ static int io_register_clock(struct io_ring_ctx *ctx,
* either mapping or freeing.
*/
struct io_ring_ctx_rings {
- unsigned short n_ring_pages;
- struct page **ring_pages;
struct io_rings *rings;
-
struct io_uring_sqe *sq_sqes;
+
struct io_mapped_region sq_region;
+ struct io_mapped_region ring_region;
};
static void io_register_free_rings(struct io_ring_ctx *ctx,
struct io_uring_params *p,
struct io_ring_ctx_rings *r)
{
- if (!(p->flags & IORING_SETUP_NO_MMAP)) {
- io_pages_unmap(r->rings, &r->ring_pages, &r->n_ring_pages,
- true);
- } else {
- io_pages_free(&r->ring_pages, r->n_ring_pages);
- vunmap(r->rings);
- }
io_free_region(ctx, &r->sq_region);
+ io_free_region(ctx, &r->ring_region);
}
#define swap_old(ctx, o, n, field) \
@@ -436,13 +429,18 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
if (size == SIZE_MAX)
return -EOVERFLOW;
- if (!(p.flags & IORING_SETUP_NO_MMAP))
- n.rings = io_pages_map(&n.ring_pages, &n.n_ring_pages, size);
- else
- n.rings = __io_uaddr_map(&n.ring_pages, &n.n_ring_pages,
- p.cq_off.user_addr, size);
- if (IS_ERR(n.rings))
- return PTR_ERR(n.rings);
+ memset(&rd, 0, sizeof(rd));
+ rd.size = PAGE_ALIGN(size);
+ if (p.flags & IORING_SETUP_NO_MMAP) {
+ rd.user_addr = p.cq_off.user_addr;
+ rd.flags |= IORING_MEM_REGION_TYPE_USER;
+ }
+ ret = io_create_region_mmap_safe(ctx, &n.ring_region, &rd, IORING_OFF_CQ_RING);
+ if (ret) {
+ io_register_free_rings(ctx, &p, &n);
+ return ret;
+ }
+ n.rings = io_region_get_ptr(&n.ring_region);
n.rings->sq_ring_mask = p.sq_entries - 1;
n.rings->cq_ring_mask = p.cq_entries - 1;
@@ -552,8 +550,7 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
ctx->rings = n.rings;
ctx->sq_sqes = n.sq_sqes;
- swap_old(ctx, o, n, n_ring_pages);
- swap_old(ctx, o, n, ring_pages);
+ swap_old(ctx, o, n, ring_region);
swap_old(ctx, o, n, sq_region);
to_free = &o;
ret = 0;
--
2.47.1
* [PATCH v3 15/18] io_uring/kbuf: use mmap_lock to sync with mmap
From: Pavel Begunkov @ 2024-11-29 13:34 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence
A preparation / cleanup patch simplifying the buf ring <-> mmap
synchronisation. Instead of relying on RCU, which is trickier, do it by
grabbing the mmap_lock whenever anyone tries to publish or remove a
registered buffer to / from ->io_bl_xa.
Modifications of the xarray should always be protected by both
->uring_lock and ->mmap_lock, while lookups should hold either of them.
While a struct io_buffer_list is in the xarray, the mmap related fields
like ->flags and ->buf_pages should stay stable.
Signed-off-by: Pavel Begunkov <[email protected]>
---
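The rule in condensed form (a sketch; both snippets appear in the diff
below):

	/* publish / remove: ->uring_lock is already held by the caller,
	 * additionally grab ->mmap_lock around the xarray update */
	scoped_guard(mutex, &ctx->mmap_lock)
		xa_erase(&ctx->io_bl_xa, bl->bgid);

	/* mmap side lookup: ->mmap_lock alone is enough */
	lockdep_assert_held(&ctx->mmap_lock);
	bl = xa_load(&ctx->io_bl_xa, bgid);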
include/linux/io_uring_types.h | 5 +++
io_uring/kbuf.c | 56 +++++++++++++++-------------------
io_uring/kbuf.h | 1 -
3 files changed, 29 insertions(+), 33 deletions(-)
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 2db252841509..091d1eaf5ba0 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -293,6 +293,11 @@ struct io_ring_ctx {
struct io_submit_state submit_state;
+ /*
+ * Modifications are protected by ->uring_lock and ->mmap_lock.
+ * The flags, buf_pages and buf_nr_pages fields should be stable
+ * once published.
+ */
struct xarray io_bl_xa;
struct io_hash_table cancel_table;
diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c
index d407576ddfb7..662e928cc3b0 100644
--- a/io_uring/kbuf.c
+++ b/io_uring/kbuf.c
@@ -45,10 +45,11 @@ static int io_buffer_add_list(struct io_ring_ctx *ctx,
/*
* Store buffer group ID and finally mark the list as visible.
* The normal lookup doesn't care about the visibility as we're
- * always under the ->uring_lock, but the RCU lookup from mmap does.
+ * always under the ->uring_lock, but lookups from mmap do.
*/
bl->bgid = bgid;
atomic_set(&bl->refs, 1);
+ guard(mutex)(&ctx->mmap_lock);
return xa_err(xa_store(&ctx->io_bl_xa, bgid, bl, GFP_KERNEL));
}
@@ -388,7 +389,7 @@ void io_put_bl(struct io_ring_ctx *ctx, struct io_buffer_list *bl)
{
if (atomic_dec_and_test(&bl->refs)) {
__io_remove_buffers(ctx, bl, -1U);
- kfree_rcu(bl, rcu);
+ kfree(bl);
}
}
@@ -397,10 +398,17 @@ void io_destroy_buffers(struct io_ring_ctx *ctx)
struct io_buffer_list *bl;
struct list_head *item, *tmp;
struct io_buffer *buf;
- unsigned long index;
- xa_for_each(&ctx->io_bl_xa, index, bl) {
- xa_erase(&ctx->io_bl_xa, bl->bgid);
+ while (1) {
+ unsigned long index = 0;
+
+ scoped_guard(mutex, &ctx->mmap_lock) {
+ bl = xa_find(&ctx->io_bl_xa, &index, ULONG_MAX, XA_PRESENT);
+ if (bl)
+ xa_erase(&ctx->io_bl_xa, bl->bgid);
+ }
+ if (!bl)
+ break;
io_put_bl(ctx, bl);
}
@@ -589,11 +597,7 @@ int io_provide_buffers(struct io_kiocb *req, unsigned int issue_flags)
INIT_LIST_HEAD(&bl->buf_list);
ret = io_buffer_add_list(ctx, bl, p->bgid);
if (ret) {
- /*
- * Doesn't need rcu free as it was never visible, but
- * let's keep it consistent throughout.
- */
- kfree_rcu(bl, rcu);
+ kfree(bl);
goto err;
}
}
@@ -736,7 +740,7 @@ int io_register_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg)
return 0;
}
- kfree_rcu(free_bl, rcu);
+ kfree(free_bl);
return ret;
}
@@ -760,7 +764,9 @@ int io_unregister_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg)
if (!(bl->flags & IOBL_BUF_RING))
return -EINVAL;
- xa_erase(&ctx->io_bl_xa, bl->bgid);
+ scoped_guard(mutex, &ctx->mmap_lock)
+ xa_erase(&ctx->io_bl_xa, bl->bgid);
+
io_put_bl(ctx, bl);
return 0;
}
@@ -795,29 +801,13 @@ struct io_buffer_list *io_pbuf_get_bl(struct io_ring_ctx *ctx,
unsigned long bgid)
{
struct io_buffer_list *bl;
- bool ret;
- /*
- * We have to be a bit careful here - we're inside mmap and cannot grab
- * the uring_lock. This means the buffer_list could be simultaneously
- * going away, if someone is trying to be sneaky. Look it up under rcu
- * so we know it's not going away, and attempt to grab a reference to
- * it. If the ref is already zero, then fail the mapping. If successful,
- * the caller will call io_put_bl() to drop the the reference at at the
- * end. This may then safely free the buffer_list (and drop the pages)
- * at that point, vm_insert_pages() would've already grabbed the
- * necessary vma references.
- */
- rcu_read_lock();
bl = xa_load(&ctx->io_bl_xa, bgid);
/* must be a mmap'able buffer ring and have pages */
- ret = false;
- if (bl && bl->flags & IOBL_MMAP)
- ret = atomic_inc_not_zero(&bl->refs);
- rcu_read_unlock();
-
- if (ret)
- return bl;
+ if (bl && bl->flags & IOBL_MMAP) {
+ if (atomic_inc_not_zero(&bl->refs))
+ return bl;
+ }
return ERR_PTR(-EINVAL);
}
@@ -829,6 +819,8 @@ int io_pbuf_mmap(struct file *file, struct vm_area_struct *vma)
struct io_buffer_list *bl;
int bgid, ret;
+ lockdep_assert_held(&ctx->mmap_lock);
+
bgid = (pgoff & ~IORING_OFF_MMAP_MASK) >> IORING_OFF_PBUF_SHIFT;
bl = io_pbuf_get_bl(ctx, bgid);
if (IS_ERR(bl))
diff --git a/io_uring/kbuf.h b/io_uring/kbuf.h
index 36aadfe5ac00..d5e4afcbfbb3 100644
--- a/io_uring/kbuf.h
+++ b/io_uring/kbuf.h
@@ -25,7 +25,6 @@ struct io_buffer_list {
struct page **buf_pages;
struct io_uring_buf_ring *buf_ring;
};
- struct rcu_head rcu;
};
__u16 bgid;
--
2.47.1
* [PATCH v3 16/18] io_uring/kbuf: remove pbuf ring refcounting
2024-11-29 13:34 [PATCH v3 00/18] kernel allocated regions and convert memmap to regions Pavel Begunkov
` (14 preceding siblings ...)
2024-11-29 13:34 ` [PATCH v3 15/18] io_uring/kbuf: use mmap_lock to sync with mmap Pavel Begunkov
@ 2024-11-29 13:34 ` Pavel Begunkov
2024-11-29 13:34 ` [PATCH v3 17/18] io_uring/kbuf: use region api for pbuf rings Pavel Begunkov
` (3 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Pavel Begunkov @ 2024-11-29 13:34 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence
struct io_buffer_list refcounting was only needed for the RCU based
sync with mmap; now that mmap is serialised by ->mmap_lock, we can kill
it.
Signed-off-by: Pavel Begunkov <[email protected]>
---
io_uring/kbuf.c | 21 +++++++--------------
io_uring/kbuf.h | 3 ---
io_uring/memmap.c | 1 -
3 files changed, 7 insertions(+), 18 deletions(-)
diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c
index 662e928cc3b0..644f61445ec9 100644
--- a/io_uring/kbuf.c
+++ b/io_uring/kbuf.c
@@ -48,7 +48,6 @@ static int io_buffer_add_list(struct io_ring_ctx *ctx,
* always under the ->uring_lock, but lookups from mmap do.
*/
bl->bgid = bgid;
- atomic_set(&bl->refs, 1);
guard(mutex)(&ctx->mmap_lock);
return xa_err(xa_store(&ctx->io_bl_xa, bgid, bl, GFP_KERNEL));
}
@@ -385,12 +384,10 @@ static int __io_remove_buffers(struct io_ring_ctx *ctx,
return i;
}
-void io_put_bl(struct io_ring_ctx *ctx, struct io_buffer_list *bl)
+static void io_put_bl(struct io_ring_ctx *ctx, struct io_buffer_list *bl)
{
- if (atomic_dec_and_test(&bl->refs)) {
- __io_remove_buffers(ctx, bl, -1U);
- kfree(bl);
- }
+ __io_remove_buffers(ctx, bl, -1U);
+ kfree(bl);
}
void io_destroy_buffers(struct io_ring_ctx *ctx)
@@ -804,10 +801,8 @@ struct io_buffer_list *io_pbuf_get_bl(struct io_ring_ctx *ctx,
bl = xa_load(&ctx->io_bl_xa, bgid);
/* must be a mmap'able buffer ring and have pages */
- if (bl && bl->flags & IOBL_MMAP) {
- if (atomic_inc_not_zero(&bl->refs))
- return bl;
- }
+ if (bl && bl->flags & IOBL_MMAP)
+ return bl;
return ERR_PTR(-EINVAL);
}
@@ -817,7 +812,7 @@ int io_pbuf_mmap(struct file *file, struct vm_area_struct *vma)
struct io_ring_ctx *ctx = file->private_data;
loff_t pgoff = vma->vm_pgoff << PAGE_SHIFT;
struct io_buffer_list *bl;
- int bgid, ret;
+ int bgid;
lockdep_assert_held(&ctx->mmap_lock);
@@ -826,7 +821,5 @@ int io_pbuf_mmap(struct file *file, struct vm_area_struct *vma)
if (IS_ERR(bl))
return PTR_ERR(bl);
- ret = io_uring_mmap_pages(ctx, vma, bl->buf_pages, bl->buf_nr_pages);
- io_put_bl(ctx, bl);
- return ret;
+ return io_uring_mmap_pages(ctx, vma, bl->buf_pages, bl->buf_nr_pages);
}
diff --git a/io_uring/kbuf.h b/io_uring/kbuf.h
index d5e4afcbfbb3..dff7444026a6 100644
--- a/io_uring/kbuf.h
+++ b/io_uring/kbuf.h
@@ -35,8 +35,6 @@ struct io_buffer_list {
__u16 mask;
__u16 flags;
-
- atomic_t refs;
};
struct io_buffer {
@@ -83,7 +81,6 @@ void __io_put_kbuf(struct io_kiocb *req, int len, unsigned issue_flags);
bool io_kbuf_recycle_legacy(struct io_kiocb *req, unsigned issue_flags);
-void io_put_bl(struct io_ring_ctx *ctx, struct io_buffer_list *bl);
struct io_buffer_list *io_pbuf_get_bl(struct io_ring_ctx *ctx,
unsigned long bgid);
int io_pbuf_mmap(struct file *file, struct vm_area_struct *vma);
diff --git a/io_uring/memmap.c b/io_uring/memmap.c
index 668b1c3579a2..73b73f4ea1bd 100644
--- a/io_uring/memmap.c
+++ b/io_uring/memmap.c
@@ -383,7 +383,6 @@ static void *io_uring_validate_mmap_request(struct file *file, loff_t pgoff,
if (IS_ERR(bl))
return bl;
ptr = bl->buf_ring;
- io_put_bl(ctx, bl);
return ptr;
}
case IORING_MAP_OFF_PARAM_REGION:
--
2.47.1
* [PATCH v3 17/18] io_uring/kbuf: use region api for pbuf rings
2024-11-29 13:34 [PATCH v3 00/18] kernel allocated regions and convert memmap to regions Pavel Begunkov
` (15 preceding siblings ...)
2024-11-29 13:34 ` [PATCH v3 16/18] io_uring/kbuf: remove pbuf ring refcounting Pavel Begunkov
@ 2024-11-29 13:34 ` Pavel Begunkov
2024-11-29 13:34 ` [PATCH v3 18/18] io_uring/memmap: unify io_uring mmap'ing code Pavel Begunkov
` (2 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Pavel Begunkov @ 2024-11-29 13:34 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence
Convert internal parts of the provided buffer ring management to the
region API. It's the last non-region mapped ring we have, so it also
kills a bunch of now unused memmap.c helpers.
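The userspace side of a kernel-allocated (IOU_PBUF_RING_MMAP) ring is
unchanged by this; for context, a minimal raw-syscall sketch of it
(map_pbuf_ring() is a hypothetical helper, error handling is trimmed, and
liburing normally wraps all of this):

#include <linux/io_uring.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

static struct io_uring_buf_ring *map_pbuf_ring(int ring_fd, unsigned entries,
					       unsigned short bgid)
{
	struct io_uring_buf_reg reg;
	size_t size = entries * sizeof(struct io_uring_buf);
	void *p;

	memset(&reg, 0, sizeof(reg));
	reg.ring_entries = entries;		/* must be a power of 2 */
	reg.bgid = bgid;
	reg.flags = IOU_PBUF_RING_MMAP;		/* kernel allocates the memory */

	if (syscall(__NR_io_uring_register, ring_fd, IORING_REGISTER_PBUF_RING,
		    &reg, 1))
		return NULL;

	/* the bgid is encoded in the offset, decoded by io_pbuf_get_region() */
	p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE,
		 ring_fd, IORING_OFF_PBUF_RING |
			  ((__u64)bgid << IORING_OFF_PBUF_SHIFT));
	return p == MAP_FAILED ? NULL : p;
}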
Signed-off-by: Pavel Begunkov <[email protected]>
---
io_uring/kbuf.c | 170 ++++++++++++++--------------------------------
io_uring/kbuf.h | 18 ++---
io_uring/memmap.c | 118 +++++---------------------------
io_uring/memmap.h | 7 --
4 files changed, 73 insertions(+), 240 deletions(-)
diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c
index 644f61445ec9..2dfb9f9419a0 100644
--- a/io_uring/kbuf.c
+++ b/io_uring/kbuf.c
@@ -351,17 +351,7 @@ static int __io_remove_buffers(struct io_ring_ctx *ctx,
if (bl->flags & IOBL_BUF_RING) {
i = bl->buf_ring->tail - bl->head;
- if (bl->buf_nr_pages) {
- int j;
-
- if (!(bl->flags & IOBL_MMAP)) {
- for (j = 0; j < bl->buf_nr_pages; j++)
- unpin_user_page(bl->buf_pages[j]);
- }
- io_pages_unmap(bl->buf_ring, &bl->buf_pages,
- &bl->buf_nr_pages, bl->flags & IOBL_MMAP);
- bl->flags &= ~IOBL_MMAP;
- }
+ io_free_region(ctx, &bl->region);
/* make sure it's seen as empty */
INIT_LIST_HEAD(&bl->buf_list);
bl->flags &= ~IOBL_BUF_RING;
@@ -614,75 +604,14 @@ int io_provide_buffers(struct io_kiocb *req, unsigned int issue_flags)
return IOU_OK;
}
-static int io_pin_pbuf_ring(struct io_uring_buf_reg *reg,
- struct io_buffer_list *bl)
-{
- struct io_uring_buf_ring *br = NULL;
- struct page **pages;
- int nr_pages, ret;
-
- pages = io_pin_pages(reg->ring_addr,
- flex_array_size(br, bufs, reg->ring_entries),
- &nr_pages);
- if (IS_ERR(pages))
- return PTR_ERR(pages);
-
- br = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
- if (!br) {
- ret = -ENOMEM;
- goto error_unpin;
- }
-
-#ifdef SHM_COLOUR
- /*
- * On platforms that have specific aliasing requirements, SHM_COLOUR
- * is set and we must guarantee that the kernel and user side align
- * nicely. We cannot do that if IOU_PBUF_RING_MMAP isn't set and
- * the application mmap's the provided ring buffer. Fail the request
- * if we, by chance, don't end up with aligned addresses. The app
- * should use IOU_PBUF_RING_MMAP instead, and liburing will handle
- * this transparently.
- */
- if ((reg->ring_addr | (unsigned long) br) & (SHM_COLOUR - 1)) {
- ret = -EINVAL;
- goto error_unpin;
- }
-#endif
- bl->buf_pages = pages;
- bl->buf_nr_pages = nr_pages;
- bl->buf_ring = br;
- bl->flags |= IOBL_BUF_RING;
- bl->flags &= ~IOBL_MMAP;
- return 0;
-error_unpin:
- unpin_user_pages(pages, nr_pages);
- kvfree(pages);
- vunmap(br);
- return ret;
-}
-
-static int io_alloc_pbuf_ring(struct io_ring_ctx *ctx,
- struct io_uring_buf_reg *reg,
- struct io_buffer_list *bl)
-{
- size_t ring_size;
-
- ring_size = reg->ring_entries * sizeof(struct io_uring_buf_ring);
-
- bl->buf_ring = io_pages_map(&bl->buf_pages, &bl->buf_nr_pages, ring_size);
- if (IS_ERR(bl->buf_ring)) {
- bl->buf_ring = NULL;
- return -ENOMEM;
- }
-
- bl->flags |= (IOBL_BUF_RING | IOBL_MMAP);
- return 0;
-}
-
int io_register_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg)
{
struct io_uring_buf_reg reg;
struct io_buffer_list *bl, *free_bl = NULL;
+ struct io_uring_region_desc rd;
+ struct io_uring_buf_ring *br;
+ unsigned long mmap_offset;
+ unsigned long ring_size;
int ret;
lockdep_assert_held(&ctx->uring_lock);
@@ -694,19 +623,8 @@ int io_register_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg)
return -EINVAL;
if (reg.flags & ~(IOU_PBUF_RING_MMAP | IOU_PBUF_RING_INC))
return -EINVAL;
- if (!(reg.flags & IOU_PBUF_RING_MMAP)) {
- if (!reg.ring_addr)
- return -EFAULT;
- if (reg.ring_addr & ~PAGE_MASK)
- return -EINVAL;
- } else {
- if (reg.ring_addr)
- return -EINVAL;
- }
-
if (!is_power_of_2(reg.ring_entries))
return -EINVAL;
-
/* cannot disambiguate full vs empty due to head/tail size */
if (reg.ring_entries >= 65536)
return -EINVAL;
@@ -722,21 +640,47 @@ int io_register_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg)
return -ENOMEM;
}
- if (!(reg.flags & IOU_PBUF_RING_MMAP))
- ret = io_pin_pbuf_ring(®, bl);
- else
- ret = io_alloc_pbuf_ring(ctx, ®, bl);
+ mmap_offset = reg.bgid << IORING_OFF_PBUF_SHIFT;
+ ring_size = flex_array_size(br, bufs, reg.ring_entries);
- if (!ret) {
- bl->nr_entries = reg.ring_entries;
- bl->mask = reg.ring_entries - 1;
- if (reg.flags & IOU_PBUF_RING_INC)
- bl->flags |= IOBL_INC;
+ memset(&rd, 0, sizeof(rd));
+ rd.size = PAGE_ALIGN(ring_size);
+ if (!(reg.flags & IOU_PBUF_RING_MMAP)) {
+ rd.user_addr = reg.ring_addr;
+ rd.flags |= IORING_MEM_REGION_TYPE_USER;
+ }
+ ret = io_create_region_mmap_safe(ctx, &bl->region, &rd, mmap_offset);
+ if (ret)
+ goto fail;
+ br = io_region_get_ptr(&bl->region);
- io_buffer_add_list(ctx, bl, reg.bgid);
- return 0;
+#ifdef SHM_COLOUR
+ /*
+ * On platforms that have specific aliasing requirements, SHM_COLOUR
+ * is set and we must guarantee that the kernel and user side align
+ * nicely. We cannot do that if IOU_PBUF_RING_MMAP isn't set and
+ * the application mmap's the provided ring buffer. Fail the request
+ * if we, by chance, don't end up with aligned addresses. The app
+ * should use IOU_PBUF_RING_MMAP instead, and liburing will handle
+ * this transparently.
+ */
+ if (!(reg.flags & IOU_PBUF_RING_MMAP) &&
+ ((reg.ring_addr | (unsigned long)br) & (SHM_COLOUR - 1))) {
+ ret = -EINVAL;
+ goto fail;
}
+#endif
+ bl->nr_entries = reg.ring_entries;
+ bl->mask = reg.ring_entries - 1;
+ bl->flags |= IOBL_BUF_RING;
+ bl->buf_ring = br;
+ if (reg.flags & IOU_PBUF_RING_INC)
+ bl->flags |= IOBL_INC;
+ io_buffer_add_list(ctx, bl, reg.bgid);
+ return 0;
+fail:
+ io_free_region(ctx, &bl->region);
kfree(free_bl);
return ret;
}
@@ -794,32 +738,18 @@ int io_register_pbuf_status(struct io_ring_ctx *ctx, void __user *arg)
return 0;
}
-struct io_buffer_list *io_pbuf_get_bl(struct io_ring_ctx *ctx,
- unsigned long bgid)
-{
- struct io_buffer_list *bl;
-
- bl = xa_load(&ctx->io_bl_xa, bgid);
- /* must be a mmap'able buffer ring and have pages */
- if (bl && bl->flags & IOBL_MMAP)
- return bl;
-
- return ERR_PTR(-EINVAL);
-}
-
-int io_pbuf_mmap(struct file *file, struct vm_area_struct *vma)
+struct io_mapped_region *io_pbuf_get_region(struct io_ring_ctx *ctx,
+ unsigned int bgid)
{
- struct io_ring_ctx *ctx = file->private_data;
- loff_t pgoff = vma->vm_pgoff << PAGE_SHIFT;
struct io_buffer_list *bl;
- int bgid;
lockdep_assert_held(&ctx->mmap_lock);
- bgid = (pgoff & ~IORING_OFF_MMAP_MASK) >> IORING_OFF_PBUF_SHIFT;
- bl = io_pbuf_get_bl(ctx, bgid);
- if (IS_ERR(bl))
- return PTR_ERR(bl);
+ bl = xa_load(&ctx->io_bl_xa, bgid);
+ if (!bl || !(bl->flags & IOBL_BUF_RING))
+ return NULL;
+ if (WARN_ON_ONCE(!io_region_is_set(&bl->region)))
+ return NULL;
- return io_uring_mmap_pages(ctx, vma, bl->buf_pages, bl->buf_nr_pages);
+ return &bl->region;
}
diff --git a/io_uring/kbuf.h b/io_uring/kbuf.h
index dff7444026a6..bd80c44c5af1 100644
--- a/io_uring/kbuf.h
+++ b/io_uring/kbuf.h
@@ -3,15 +3,13 @@
#define IOU_KBUF_H
#include <uapi/linux/io_uring.h>
+#include <linux/io_uring_types.h>
enum {
/* ring mapped provided buffers */
IOBL_BUF_RING = 1,
- /* ring mapped provided buffers, but mmap'ed by application */
- IOBL_MMAP = 2,
/* buffers are consumed incrementally rather than always fully */
- IOBL_INC = 4,
-
+ IOBL_INC = 2,
};
struct io_buffer_list {
@@ -21,10 +19,7 @@ struct io_buffer_list {
*/
union {
struct list_head buf_list;
- struct {
- struct page **buf_pages;
- struct io_uring_buf_ring *buf_ring;
- };
+ struct io_uring_buf_ring *buf_ring;
};
__u16 bgid;
@@ -35,6 +30,8 @@ struct io_buffer_list {
__u16 mask;
__u16 flags;
+
+ struct io_mapped_region region;
};
struct io_buffer {
@@ -81,9 +78,8 @@ void __io_put_kbuf(struct io_kiocb *req, int len, unsigned issue_flags);
bool io_kbuf_recycle_legacy(struct io_kiocb *req, unsigned issue_flags);
-struct io_buffer_list *io_pbuf_get_bl(struct io_ring_ctx *ctx,
- unsigned long bgid);
-int io_pbuf_mmap(struct file *file, struct vm_area_struct *vma);
+struct io_mapped_region *io_pbuf_get_region(struct io_ring_ctx *ctx,
+ unsigned int bgid);
static inline bool io_kbuf_recycle_ring(struct io_kiocb *req)
{
diff --git a/io_uring/memmap.c b/io_uring/memmap.c
index 73b73f4ea1bd..6d8a98bd9cac 100644
--- a/io_uring/memmap.c
+++ b/io_uring/memmap.c
@@ -36,90 +36,6 @@ static void *io_mem_alloc_compound(struct page **pages, int nr_pages,
return page_address(page);
}
-static void *io_mem_alloc_single(struct page **pages, int nr_pages, size_t size,
- gfp_t gfp)
-{
- void *ret;
- int i;
-
- for (i = 0; i < nr_pages; i++) {
- pages[i] = alloc_page(gfp);
- if (!pages[i])
- goto err;
- }
-
- ret = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
- if (ret)
- return ret;
-err:
- while (i--)
- put_page(pages[i]);
- return ERR_PTR(-ENOMEM);
-}
-
-void *io_pages_map(struct page ***out_pages, unsigned short *npages,
- size_t size)
-{
- gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_NOWARN;
- struct page **pages;
- int nr_pages;
- void *ret;
-
- nr_pages = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
- pages = kvmalloc_array(nr_pages, sizeof(struct page *), gfp);
- if (!pages)
- return ERR_PTR(-ENOMEM);
-
- ret = io_mem_alloc_compound(pages, nr_pages, size, gfp);
- if (!IS_ERR(ret))
- goto done;
- if (nr_pages == 1)
- goto fail;
-
- ret = io_mem_alloc_single(pages, nr_pages, size, gfp);
- if (!IS_ERR(ret)) {
-done:
- *out_pages = pages;
- *npages = nr_pages;
- return ret;
- }
-fail:
- kvfree(pages);
- *out_pages = NULL;
- *npages = 0;
- return ret;
-}
-
-void io_pages_unmap(void *ptr, struct page ***pages, unsigned short *npages,
- bool put_pages)
-{
- bool do_vunmap = false;
-
- if (!ptr)
- return;
-
- if (put_pages && *npages) {
- struct page **to_free = *pages;
- int i;
-
- /*
- * Only did vmap for the non-compound multiple page case.
- * For the compound page, we just need to put the head.
- */
- if (PageCompound(to_free[0]))
- *npages = 1;
- else if (*npages > 1)
- do_vunmap = true;
- for (i = 0; i < *npages; i++)
- put_page(to_free[i]);
- }
- if (do_vunmap)
- vunmap(ptr);
- kvfree(*pages);
- *pages = NULL;
- *npages = 0;
-}
-
struct page **io_pin_pages(unsigned long uaddr, unsigned long len, int *npages)
{
unsigned long start, end, nr_pages;
@@ -374,16 +290,14 @@ static void *io_uring_validate_mmap_request(struct file *file, loff_t pgoff,
return ERR_PTR(-EFAULT);
return ctx->sq_sqes;
case IORING_OFF_PBUF_RING: {
- struct io_buffer_list *bl;
+ struct io_mapped_region *region;
unsigned int bgid;
- void *ptr;
bgid = (offset & ~IORING_OFF_MMAP_MASK) >> IORING_OFF_PBUF_SHIFT;
- bl = io_pbuf_get_bl(ctx, bgid);
- if (IS_ERR(bl))
- return bl;
- ptr = bl->buf_ring;
- return ptr;
+ region = io_pbuf_get_region(ctx, bgid);
+ if (!region)
+ return ERR_PTR(-EINVAL);
+ return io_region_validate_mmap(ctx, region);
}
case IORING_MAP_OFF_PARAM_REGION:
return io_region_validate_mmap(ctx, &ctx->param_region);
@@ -392,15 +306,6 @@ static void *io_uring_validate_mmap_request(struct file *file, loff_t pgoff,
return ERR_PTR(-EINVAL);
}
-int io_uring_mmap_pages(struct io_ring_ctx *ctx, struct vm_area_struct *vma,
- struct page **pages, int npages)
-{
- unsigned long nr_pages = npages;
-
- vm_flags_set(vma, VM_DONTEXPAND);
- return vm_insert_pages(vma, vma->vm_start, pages, &nr_pages);
-}
-
#ifdef CONFIG_MMU
static int io_region_mmap(struct io_ring_ctx *ctx,
@@ -435,8 +340,17 @@ __cold int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
return io_region_mmap(ctx, &ctx->ring_region, vma, page_limit);
case IORING_OFF_SQES:
return io_region_mmap(ctx, &ctx->sq_region, vma, UINT_MAX);
- case IORING_OFF_PBUF_RING:
- return io_pbuf_mmap(file, vma);
+ case IORING_OFF_PBUF_RING: {
+ struct io_mapped_region *region;
+ unsigned int bgid;
+
+ bgid = (offset & ~IORING_OFF_MMAP_MASK) >> IORING_OFF_PBUF_SHIFT;
+ region = io_pbuf_get_region(ctx, bgid);
+ if (!region)
+ return -EINVAL;
+
+ return io_region_mmap(ctx, region, vma, UINT_MAX);
+ }
case IORING_MAP_OFF_PARAM_REGION:
return io_region_mmap(ctx, &ctx->param_region, vma, UINT_MAX);
}
diff --git a/io_uring/memmap.h b/io_uring/memmap.h
index 7395996eb353..c898dcba2b4e 100644
--- a/io_uring/memmap.h
+++ b/io_uring/memmap.h
@@ -4,13 +4,6 @@
#define IORING_MAP_OFF_PARAM_REGION 0x20000000ULL
struct page **io_pin_pages(unsigned long ubuf, unsigned long len, int *npages);
-int io_uring_mmap_pages(struct io_ring_ctx *ctx, struct vm_area_struct *vma,
- struct page **pages, int npages);
-
-void *io_pages_map(struct page ***out_pages, unsigned short *npages,
- size_t size);
-void io_pages_unmap(void *ptr, struct page ***pages, unsigned short *npages,
- bool put_pages);
#ifndef CONFIG_MMU
unsigned int io_uring_nommu_mmap_capabilities(struct file *file);
--
2.47.1
* [PATCH v3 18/18] io_uring/memmap: unify io_uring mmap'ing code
2024-11-29 13:34 [PATCH v3 00/18] kernel allocated regions and convert memmap to regions Pavel Begunkov
` (16 preceding siblings ...)
2024-11-29 13:34 ` [PATCH v3 17/18] io_uring/kbuf: use region api for pbuf rings Pavel Begunkov
@ 2024-11-29 13:34 ` Pavel Begunkov
2024-11-29 16:04 ` [PATCH v3 00/18] kernel allocated regions and convert memmap to regions Jens Axboe
2024-11-29 16:06 ` Jens Axboe
19 siblings, 0 replies; 21+ messages in thread
From: Pavel Begunkov @ 2024-11-29 13:34 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence
All mapped memory is now backed by regions and we can unify and clean
up io_region_validate_mmap() and io_uring_mmap(). Extract a function
looking up a region; the rest of the handling is generic and only
needs the region.
One piece of ring type specific code remains, namely the mmap size
truncation quirk for IORING_OFF_[S,C]Q_RING, which is left as is.
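For reference, the mappings that now funnel through this single lookup are
the long-standing ones from io_uring_setup(2); a hedged userspace sketch of
them (map_rings() is illustrative only, IORING_FEAT_SINGLE_MMAP is assumed,
and liburing normally does this for you):

#include <linux/io_uring.h>
#include <sys/mman.h>

static int map_rings(int ring_fd, struct io_uring_params *p,
		     void **rings, struct io_uring_sqe **sqes)
{
	size_t sq_sz = p->sq_off.array + p->sq_entries * sizeof(unsigned);
	size_t cq_sz = p->cq_off.cqes + p->cq_entries * sizeof(struct io_uring_cqe);
	size_t ring_sz = sq_sz > cq_sz ? sq_sz : cq_sz;

	/* SQ and CQ ring headers share one mapping at IORING_OFF_SQ_RING */
	*rings = mmap(NULL, ring_sz, PROT_READ | PROT_WRITE,
		      MAP_SHARED | MAP_POPULATE, ring_fd, IORING_OFF_SQ_RING);
	if (*rings == MAP_FAILED)
		return -1;

	/* the SQE array is a separate mapping at IORING_OFF_SQES */
	*sqes = mmap(NULL, p->sq_entries * sizeof(struct io_uring_sqe),
		     PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE,
		     ring_fd, IORING_OFF_SQES);
	return *sqes == MAP_FAILED ? -1 : 0;
}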
Signed-off-by: Pavel Begunkov <[email protected]>
---
io_uring/kbuf.c | 3 --
io_uring/memmap.c | 81 ++++++++++++++++++-----------------------------
2 files changed, 31 insertions(+), 53 deletions(-)
diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c
index 2dfb9f9419a0..e91260a6156b 100644
--- a/io_uring/kbuf.c
+++ b/io_uring/kbuf.c
@@ -748,8 +748,5 @@ struct io_mapped_region *io_pbuf_get_region(struct io_ring_ctx *ctx,
bl = xa_load(&ctx->io_bl_xa, bgid);
if (!bl || !(bl->flags & IOBL_BUF_RING))
return NULL;
- if (WARN_ON_ONCE(!io_region_is_set(&bl->region)))
- return NULL;
-
return &bl->region;
}
diff --git a/io_uring/memmap.c b/io_uring/memmap.c
index 6d8a98bd9cac..dda846190fbd 100644
--- a/io_uring/memmap.c
+++ b/io_uring/memmap.c
@@ -254,6 +254,27 @@ int io_create_region_mmap_safe(struct io_ring_ctx *ctx, struct io_mapped_region
return 0;
}
+static struct io_mapped_region *io_mmap_get_region(struct io_ring_ctx *ctx,
+ loff_t pgoff)
+{
+ loff_t offset = pgoff << PAGE_SHIFT;
+ unsigned int bgid;
+
+ switch (offset & IORING_OFF_MMAP_MASK) {
+ case IORING_OFF_SQ_RING:
+ case IORING_OFF_CQ_RING:
+ return &ctx->ring_region;
+ case IORING_OFF_SQES:
+ return &ctx->sq_region;
+ case IORING_OFF_PBUF_RING:
+ bgid = (offset & ~IORING_OFF_MMAP_MASK) >> IORING_OFF_PBUF_SHIFT;
+ return io_pbuf_get_region(ctx, bgid);
+ case IORING_MAP_OFF_PARAM_REGION:
+ return &ctx->param_region;
+ }
+ return NULL;
+}
+
static void *io_region_validate_mmap(struct io_ring_ctx *ctx,
struct io_mapped_region *mr)
{
@@ -271,39 +292,12 @@ static void *io_uring_validate_mmap_request(struct file *file, loff_t pgoff,
size_t sz)
{
struct io_ring_ctx *ctx = file->private_data;
- loff_t offset = pgoff << PAGE_SHIFT;
+ struct io_mapped_region *region;
- switch ((pgoff << PAGE_SHIFT) & IORING_OFF_MMAP_MASK) {
- case IORING_OFF_SQ_RING:
- case IORING_OFF_CQ_RING:
- /* Don't allow mmap if the ring was setup without it */
- if (ctx->flags & IORING_SETUP_NO_MMAP)
- return ERR_PTR(-EINVAL);
- if (!ctx->rings)
- return ERR_PTR(-EFAULT);
- return ctx->rings;
- case IORING_OFF_SQES:
- /* Don't allow mmap if the ring was setup without it */
- if (ctx->flags & IORING_SETUP_NO_MMAP)
- return ERR_PTR(-EINVAL);
- if (!ctx->sq_sqes)
- return ERR_PTR(-EFAULT);
- return ctx->sq_sqes;
- case IORING_OFF_PBUF_RING: {
- struct io_mapped_region *region;
- unsigned int bgid;
-
- bgid = (offset & ~IORING_OFF_MMAP_MASK) >> IORING_OFF_PBUF_SHIFT;
- region = io_pbuf_get_region(ctx, bgid);
- if (!region)
- return ERR_PTR(-EINVAL);
- return io_region_validate_mmap(ctx, region);
- }
- case IORING_MAP_OFF_PARAM_REGION:
- return io_region_validate_mmap(ctx, &ctx->param_region);
- }
-
- return ERR_PTR(-EINVAL);
+ region = io_mmap_get_region(ctx, pgoff);
+ if (!region)
+ return ERR_PTR(-EINVAL);
+ return io_region_validate_mmap(ctx, region);
}
#ifdef CONFIG_MMU
@@ -324,7 +318,8 @@ __cold int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
struct io_ring_ctx *ctx = file->private_data;
size_t sz = vma->vm_end - vma->vm_start;
long offset = vma->vm_pgoff << PAGE_SHIFT;
- unsigned int page_limit;
+ unsigned int page_limit = UINT_MAX;
+ struct io_mapped_region *region;
void *ptr;
guard(mutex)(&ctx->mmap_lock);
@@ -337,25 +332,11 @@ __cold int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
case IORING_OFF_SQ_RING:
case IORING_OFF_CQ_RING:
page_limit = (sz + PAGE_SIZE - 1) >> PAGE_SHIFT;
- return io_region_mmap(ctx, &ctx->ring_region, vma, page_limit);
- case IORING_OFF_SQES:
- return io_region_mmap(ctx, &ctx->sq_region, vma, UINT_MAX);
- case IORING_OFF_PBUF_RING: {
- struct io_mapped_region *region;
- unsigned int bgid;
-
- bgid = (offset & ~IORING_OFF_MMAP_MASK) >> IORING_OFF_PBUF_SHIFT;
- region = io_pbuf_get_region(ctx, bgid);
- if (!region)
- return -EINVAL;
-
- return io_region_mmap(ctx, region, vma, UINT_MAX);
- }
- case IORING_MAP_OFF_PARAM_REGION:
- return io_region_mmap(ctx, &ctx->param_region, vma, UINT_MAX);
+ break;
}
- return -EINVAL;
+ region = io_mmap_get_region(ctx, vma->vm_pgoff);
+ return io_region_mmap(ctx, region, vma, page_limit);
}
unsigned long io_uring_get_unmapped_area(struct file *filp, unsigned long addr,
--
2.47.1
* Re: [PATCH v3 00/18] kernel allocated regions and convert memmap to regions
2024-11-29 13:34 [PATCH v3 00/18] kernel allocated regions and convert memmap to regions Pavel Begunkov
` (17 preceding siblings ...)
2024-11-29 13:34 ` [PATCH v3 18/18] io_uring/memmap: unify io_uring mmap'ing code Pavel Begunkov
@ 2024-11-29 16:04 ` Jens Axboe
2024-11-29 16:06 ` Jens Axboe
19 siblings, 0 replies; 21+ messages in thread
From: Jens Axboe @ 2024-11-29 16:04 UTC (permalink / raw)
To: io-uring, Pavel Begunkov
On Fri, 29 Nov 2024 13:34:21 +0000, Pavel Begunkov wrote:
> The first part of the series (Patches 1-11) implements kernel allocated
> regions, which is the classical way SQ/CQ are created. It should be
> straightforward with simple preparation patches and cleanups. The main
> part is Patch 10, which internally implements kernel allocations, and
> Patch 11, which implements the mmap part and exposes it to reg-wait /
> parameter region users.
>
> [...]
Applied, thanks!
[01/18] io_uring: rename ->resize_lock
commit: e4e0f7d04627a3a8380bda82c4690f598b095b66
[02/18] io_uring/rsrc: export io_check_coalesce_buffer
commit: b5c715ee796dee285f902276c38c808f6a7799cf
[03/18] io_uring/memmap: flag vmap'ed regions
commit: ea57c4c88ffb3f7247200275435bf4aa4894f965
[04/18] io_uring/memmap: flag regions with user pages
commit: 67b855ba258319abe9fac15e6ddf07e57c1589c5
[05/18] io_uring/memmap: account memory before pinning
commit: 85652c20eda52bdf2ecb059da0e5d9c50f2824b7
[06/18] io_uring/memmap: reuse io_free_region for failure path
commit: 3e0b1575a596cded61eee4ef75870a741a40fcc4
[07/18] io_uring/memmap: optimise single folio regions
commit: 1e80236d16da642240292194c9e34fb37664f606
[08/18] io_uring/memmap: helper for pinning region pages
commit: 5e015f23f7d382ed1a301d015284bc8cca87335b
[09/18] io_uring/memmap: add IO_REGION_F_SINGLE_REF
commit: 8acfcf152fef8566a19fe9cdbacdb6a6bdec5520
[10/18] io_uring/memmap: implement kernel allocated regions
commit: 9407cfd8c016024e23ef9c37e422b204dfaf435c
[11/18] io_uring/memmap: implement mmap for regions
commit: efd160a19fdb27db0436a21194972a4ce49bab2d
[12/18] io_uring: pass ctx to io_register_free_rings
commit: 458b0ea4de8d5045e446035a1cdb49f1e6f01789
[13/18] io_uring: use region api for SQ
commit: 5f58f826fcbff03f392fda796445992d59d34a80
[14/18] io_uring: use region api for CQ
commit: 9c0966c93e771eb17da6a41721c4f6613f616212
[15/18] io_uring/kbuf: use mmap_lock to sync with mmap
commit: 8dec4fa7082c0f8dd9692ac110777a994258a798
[16/18] io_uring/kbuf: remove pbuf ring refcounting
commit: 6a2036aec3830a293a5ca2d6059b5e4a450a4e0e
[17/18] io_uring/kbuf: use region api for pbuf rings
commit: d67839c6abfe5dd505390710502e2f9944a51126
[18/18] io_uring/memmap: unify io_uring mmap'ing code
commit: 17f5a7960c70c9a1ec4cb9a63be0898a47af804a
Best regards,
--
Jens Axboe
* Re: [PATCH v3 00/18] kernel allocated regions and convert memmap to regions
2024-11-29 13:34 [PATCH v3 00/18] kernel allocated regions and convert memmap to regions Pavel Begunkov
` (18 preceding siblings ...)
2024-11-29 16:04 ` [PATCH v3 00/18] kernel allocated regions and convert memmap to regions Jens Axboe
@ 2024-11-29 16:06 ` Jens Axboe
19 siblings, 0 replies; 21+ messages in thread
From: Jens Axboe @ 2024-11-29 16:06 UTC (permalink / raw)
To: Pavel Begunkov, io-uring
On 11/29/24 6:34 AM, Pavel Begunkov wrote:
> The first part of the series (Patches 1-11) implements kernel allocated
> regions, which is the classical way SQ/CQ are created. It should be
> straightforward with simple preparation patches and cleanups. The main
> part is Patch 10, which internally implements kernel allocations, and
> Patch 11, which implements the mmap part and exposes it to reg-wait /
> parameter region users.
>
> The rest (Patches 12-18) converts SQ, CQ and provided buffer rings
> to regions, which carves a common path for all of them and removes
> duplication.
This is really nice, great unification of it all. And the diffstat
tells that story too.
--
Jens Axboe