* [PATCHSET 0/4 v2] Support for mapping SQ/CQ rings into huge page @ 2023-05-13 14:16 Jens Axboe 2023-05-13 14:16 ` [PATCH 1/4] io_uring: remove sq/cq_off memset Jens Axboe ` (3 more replies) 0 siblings, 4 replies; 8+ messages in thread From: Jens Axboe @ 2023-05-13 14:16 UTC (permalink / raw) To: io-uring Hi, io_uring SQ/CQ rings are allocated by the kernel from contiguous, normal pages, and then the application mmap()'s the rings into userspace. This works fine, but does require contiguous pages to be available for the given SQ and CQ ring sizes. As uptime increases on a given system, so does memory fragmentation. Entropy is inevitable. This patchset adds support for the application passing in a pre-allocated huge page, and then placing the rings in that. This reduces the need for contiguous pages, and also reduces the TLB pressure for larger rings. The liburing huge.2 branch has support for using this trivially. Applications may use the normal ring init helpers and set IORING_SETUP_NO_MMAP, in which case a huge page will get allocated for them and used. Or they may use io_uring_queue_init_mem() and pass in a pre-allocated huge page, getting the amount of it used returned. This allows placing multiple rings into a single huge page. Changes since v1: - Mandate that we're using a single page. May be a normal page if we don't need a lot of memory, or a huge page if the ring itself takes up more space than a single normal page. -- Jens Axboe ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 1/4] io_uring: remove sq/cq_off memset 2023-05-13 14:16 [PATCHSET 0/4 v2] Support for mapping SQ/CQ rings into huge page Jens Axboe @ 2023-05-13 14:16 ` Jens Axboe 2023-05-13 14:16 ` [PATCH 2/4] io_uring: return error pointer from io_mem_alloc() Jens Axboe ` (2 subsequent siblings) 3 siblings, 0 replies; 8+ messages in thread From: Jens Axboe @ 2023-05-13 14:16 UTC (permalink / raw) To: io-uring; +Cc: Jens Axboe We only have two reserved members we're not clearing, do so manually instead. This is in preparation for using one of these members for a new feature. Signed-off-by: Jens Axboe <[email protected]> --- io_uring/io_uring.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index 3bca7a79efda..3695c5e6fbf0 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -3887,7 +3887,6 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p, if (ret) goto err; - memset(&p->sq_off, 0, sizeof(p->sq_off)); p->sq_off.head = offsetof(struct io_rings, sq.head); p->sq_off.tail = offsetof(struct io_rings, sq.tail); p->sq_off.ring_mask = offsetof(struct io_rings, sq_ring_mask); @@ -3895,8 +3894,9 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p, p->sq_off.flags = offsetof(struct io_rings, sq_flags); p->sq_off.dropped = offsetof(struct io_rings, sq_dropped); p->sq_off.array = (char *)ctx->sq_array - (char *)ctx->rings; + p->sq_off.resv1 = 0; + p->sq_off.resv2 = 0; - memset(&p->cq_off, 0, sizeof(p->cq_off)); p->cq_off.head = offsetof(struct io_rings, cq.head); p->cq_off.tail = offsetof(struct io_rings, cq.tail); p->cq_off.ring_mask = offsetof(struct io_rings, cq_ring_mask); @@ -3904,6 +3904,8 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p, p->cq_off.overflow = offsetof(struct io_rings, cq_overflow); p->cq_off.cqes = offsetof(struct io_rings, cqes); p->cq_off.flags = offsetof(struct io_rings, cq_flags); + 
p->cq_off.resv1 = 0; + p->cq_off.resv2 = 0; p->features = IORING_FEAT_SINGLE_MMAP | IORING_FEAT_NODROP | IORING_FEAT_SUBMIT_STABLE | IORING_FEAT_RW_CUR_POS | -- 2.39.2 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 2/4] io_uring: return error pointer from io_mem_alloc() 2023-05-13 14:16 [PATCHSET 0/4 v2] Support for mapping SQ/CQ rings into huge page Jens Axboe 2023-05-13 14:16 ` [PATCH 1/4] io_uring: remove sq/cq_off memset Jens Axboe @ 2023-05-13 14:16 ` Jens Axboe 2023-05-14 2:54 ` Dmitry Kadashev 2023-05-13 14:16 ` [PATCH 3/4] io_uring: add ring freeing helper Jens Axboe 2023-05-13 14:16 ` [PATCH 4/4] io_uring: support for user allocated memory for rings/sqes Jens Axboe 3 siblings, 1 reply; 8+ messages in thread From: Jens Axboe @ 2023-05-13 14:16 UTC (permalink / raw) To: io-uring; +Cc: Jens Axboe In preparation for having more than one type of ring allocator, make the existing one return valid/error-pointer rather than just NULL. Signed-off-by: Jens Axboe <[email protected]> --- io_uring/io_uring.c | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-) diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index 3695c5e6fbf0..6266a870c89f 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -2712,8 +2712,12 @@ static void io_mem_free(void *ptr) static void *io_mem_alloc(size_t size) { gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_NOWARN | __GFP_COMP; + void *ret; - return (void *) __get_free_pages(gfp, get_order(size)); + ret = (void *) __get_free_pages(gfp, get_order(size)); + if (ret) + return ret; + return ERR_PTR(-ENOMEM); } static unsigned long rings_size(struct io_ring_ctx *ctx, unsigned int sq_entries, @@ -3673,6 +3677,7 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx, { struct io_rings *rings; size_t size, sq_array_offset; + void *ptr; /* make sure these are sane, as we already accounted them */ ctx->sq_entries = p->sq_entries; @@ -3683,8 +3688,8 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx, return -EOVERFLOW; rings = io_mem_alloc(size); - if (!rings) - return -ENOMEM; + if (IS_ERR(rings)) + return PTR_ERR(rings); ctx->rings = rings; ctx->sq_array = (u32 *)((char *)rings + 
sq_array_offset); @@ -3703,13 +3708,14 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx, return -EOVERFLOW; } - ctx->sq_sqes = io_mem_alloc(size); - if (!ctx->sq_sqes) { + ptr = io_mem_alloc(size); + if (IS_ERR(ptr)) { io_mem_free(ctx->rings); ctx->rings = NULL; - return -ENOMEM; + return PTR_ERR(ptr); } + ctx->sq_sqes = io_mem_alloc(size); return 0; } -- 2.39.2 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 2/4] io_uring: return error pointer from io_mem_alloc() 2023-05-13 14:16 ` [PATCH 2/4] io_uring: return error pointer from io_mem_alloc() Jens Axboe @ 2023-05-14 2:54 ` Dmitry Kadashev 2023-05-14 2:59 ` Jens Axboe 0 siblings, 1 reply; 8+ messages in thread From: Dmitry Kadashev @ 2023-05-14 2:54 UTC (permalink / raw) To: Jens Axboe; +Cc: io-uring Hi Jens, On Sat, May 13, 2023 at 9:19 PM Jens Axboe <[email protected]> wrote: > > In preparation for having more than one type of ring allocator, make the > existing one return valid/error-pointer rather than just NULL. > > Signed-off-by: Jens Axboe <[email protected]> > --- > io_uring/io_uring.c | 18 ++++++++++++------ > 1 file changed, 12 insertions(+), 6 deletions(-) > > diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c > index 3695c5e6fbf0..6266a870c89f 100644 > --- a/io_uring/io_uring.c > +++ b/io_uring/io_uring.c > @@ -2712,8 +2712,12 @@ static void io_mem_free(void *ptr) > static void *io_mem_alloc(size_t size) > { > gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_NOWARN | __GFP_COMP; > + void *ret; > > - return (void *) __get_free_pages(gfp, get_order(size)); > + ret = (void *) __get_free_pages(gfp, get_order(size)); > + if (ret) > + return ret; > + return ERR_PTR(-ENOMEM); > } > > static unsigned long rings_size(struct io_ring_ctx *ctx, unsigned int sq_entries, > @@ -3673,6 +3677,7 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx, > { > struct io_rings *rings; > size_t size, sq_array_offset; > + void *ptr; > > /* make sure these are sane, as we already accounted them */ > ctx->sq_entries = p->sq_entries; > @@ -3683,8 +3688,8 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx, > return -EOVERFLOW; > > rings = io_mem_alloc(size); > - if (!rings) > - return -ENOMEM; > + if (IS_ERR(rings)) > + return PTR_ERR(rings); > > ctx->rings = rings; > ctx->sq_array = (u32 *)((char *)rings + sq_array_offset); > @@ -3703,13 +3708,14 @@ static __cold int 
io_allocate_scq_urings(struct io_ring_ctx *ctx, > return -EOVERFLOW; > } > > - ctx->sq_sqes = io_mem_alloc(size); > - if (!ctx->sq_sqes) { > + ptr = io_mem_alloc(size); > + if (IS_ERR(ptr)) { > io_mem_free(ctx->rings); > ctx->rings = NULL; > - return -ENOMEM; > + return PTR_ERR(ptr); > } > > + ctx->sq_sqes = io_mem_alloc(size); Should be 'ptr' rather than 'io_mem_alloc(size)' here. -- Dmitry Kadashev ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/4] io_uring: return error pointer from io_mem_alloc() 2023-05-14 2:54 ` Dmitry Kadashev @ 2023-05-14 2:59 ` Jens Axboe 0 siblings, 0 replies; 8+ messages in thread From: Jens Axboe @ 2023-05-14 2:59 UTC (permalink / raw) To: Dmitry Kadashev; +Cc: io-uring On 5/13/23 8:54 PM, Dmitry Kadashev wrote: > Hi Jens, > > On Sat, May 13, 2023 at 9:19 PM Jens Axboe <[email protected]> wrote: >> >> In preparation for having more than one type of ring allocator, make the >> existing one return valid/error-pointer rather than just NULL. >> >> Signed-off-by: Jens Axboe <[email protected]> >> --- >> io_uring/io_uring.c | 18 ++++++++++++------ >> 1 file changed, 12 insertions(+), 6 deletions(-) >> >> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c >> index 3695c5e6fbf0..6266a870c89f 100644 >> --- a/io_uring/io_uring.c >> +++ b/io_uring/io_uring.c >> @@ -2712,8 +2712,12 @@ static void io_mem_free(void *ptr) >> static void *io_mem_alloc(size_t size) >> { >> gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_NOWARN | __GFP_COMP; >> + void *ret; >> >> - return (void *) __get_free_pages(gfp, get_order(size)); >> + ret = (void *) __get_free_pages(gfp, get_order(size)); >> + if (ret) >> + return ret; >> + return ERR_PTR(-ENOMEM); >> } >> >> static unsigned long rings_size(struct io_ring_ctx *ctx, unsigned int sq_entries, >> @@ -3673,6 +3677,7 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx, >> { >> struct io_rings *rings; >> size_t size, sq_array_offset; >> + void *ptr; >> >> /* make sure these are sane, as we already accounted them */ >> ctx->sq_entries = p->sq_entries; >> @@ -3683,8 +3688,8 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx, >> return -EOVERFLOW; >> >> rings = io_mem_alloc(size); >> - if (!rings) >> - return -ENOMEM; >> + if (IS_ERR(rings)) >> + return PTR_ERR(rings); >> >> ctx->rings = rings; >> ctx->sq_array = (u32 *)((char *)rings + sq_array_offset); >> @@ -3703,13 +3708,14 @@ static __cold int 
io_allocate_scq_urings(struct io_ring_ctx *ctx, >> return -EOVERFLOW; >> } >> >> - ctx->sq_sqes = io_mem_alloc(size); >> - if (!ctx->sq_sqes) { >> + ptr = io_mem_alloc(size); >> + if (IS_ERR(ptr)) { >> io_mem_free(ctx->rings); >> ctx->rings = NULL; >> - return -ENOMEM; >> + return PTR_ERR(ptr); >> } >> >> + ctx->sq_sqes = io_mem_alloc(size); > > Should be 'ptr' rather than 'io_mem_alloc(size)' here. Indeed, good catch. Patch 4 does correct that so the final result is correct, must've happened during a split rebase a while back. I'll fix up patch 2 and 4 so that it's correct after patch 2 as well. -- Jens Axboe ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 3/4] io_uring: add ring freeing helper 2023-05-13 14:16 [PATCHSET 0/4 v2] Support for mapping SQ/CQ rings into huge page Jens Axboe 2023-05-13 14:16 ` [PATCH 1/4] io_uring: remove sq/cq_off memset Jens Axboe 2023-05-13 14:16 ` [PATCH 2/4] io_uring: return error pointer from io_mem_alloc() Jens Axboe @ 2023-05-13 14:16 ` Jens Axboe 2023-05-13 14:16 ` [PATCH 4/4] io_uring: support for user allocated memory for rings/sqes Jens Axboe 3 siblings, 0 replies; 8+ messages in thread From: Jens Axboe @ 2023-05-13 14:16 UTC (permalink / raw) To: io-uring; +Cc: Jens Axboe We do rings and sqes separately, move them into a helper that does both the freeing and clearing of the memory. Signed-off-by: Jens Axboe <[email protected]> --- io_uring/io_uring.c | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index 6266a870c89f..5433e8d6c481 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -2709,6 +2709,14 @@ static void io_mem_free(void *ptr) free_compound_page(page); } +static void io_rings_free(struct io_ring_ctx *ctx) +{ + io_mem_free(ctx->rings); + io_mem_free(ctx->sq_sqes); + ctx->rings = NULL; + ctx->sq_sqes = NULL; +} + static void *io_mem_alloc(size_t size) { gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_NOWARN | __GFP_COMP; @@ -2873,8 +2881,7 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx) mmdrop(ctx->mm_account); ctx->mm_account = NULL; } - io_mem_free(ctx->rings); - io_mem_free(ctx->sq_sqes); + io_rings_free(ctx); percpu_ref_exit(&ctx->refs); free_uid(ctx->user); @@ -3703,15 +3710,13 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx, else size = array_size(sizeof(struct io_uring_sqe), p->sq_entries); if (size == SIZE_MAX) { - io_mem_free(ctx->rings); - ctx->rings = NULL; + io_rings_free(ctx); return -EOVERFLOW; } ptr = io_mem_alloc(size); if (IS_ERR(ptr)) { - io_mem_free(ctx->rings); - ctx->rings = NULL; + io_rings_free(ctx); return 
PTR_ERR(ptr); } -- 2.39.2 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 4/4] io_uring: support for user allocated memory for rings/sqes 2023-05-13 14:16 [PATCHSET 0/4 v2] Support for mapping SQ/CQ rings into huge page Jens Axboe ` (2 preceding siblings ...) 2023-05-13 14:16 ` [PATCH 3/4] io_uring: add ring freeing helper Jens Axboe @ 2023-05-13 14:16 ` Jens Axboe 3 siblings, 0 replies; 8+ messages in thread From: Jens Axboe @ 2023-05-13 14:16 UTC (permalink / raw) To: io-uring; +Cc: Jens Axboe Currently io_uring applications must call mmap(2) twice to map the rings themselves, and the sqes array. This works fine, but it does not support using huge pages to back the rings/sqes. Provide a way for the application to pass in pre-allocated memory for the rings/sqes, which can then suitably be allocated from shmfs or via mmap to get huge page support. Particularly for larger rings, this reduces the TLB entries needed. If an application wishes to take advantage of that, it must pre-allocate the memory needed for the sq/cq ring, and the sqes. The former must be passed in via the io_uring_params->cq_off.user_addr field, while the latter is passed in via the io_uring_params->sq_off.user_addr field. Then it must set IORING_SETUP_NO_MMAP in the io_uring_params->flags field, and io_uring will then map the existing memory into the kernel for shared use. The application must not call mmap(2) to map rings as it otherwise would have, that will now fail with -EINVAL if this setup flag was used. The pages used for the rings and sqes must be contiguous. The intent here is clearly that huge pages should be used, otherwise the normal setup procedure works fine as-is. The application may use one huge page for both the rings and sqes. Outside of those initialization changes, everything works like it did before. 
Signed-off-by: Jens Axboe <[email protected]> --- include/linux/io_uring_types.h | 10 +++ include/uapi/linux/io_uring.h | 9 ++- io_uring/io_uring.c | 108 ++++++++++++++++++++++++++++++--- 3 files changed, 115 insertions(+), 12 deletions(-) diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index 1b2a20a42413..f04ce513fadb 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -211,6 +211,16 @@ struct io_ring_ctx { unsigned int compat: 1; enum task_work_notify_mode notify_method; + + /* + * If IORING_SETUP_NO_MMAP is used, then the below holds + * the gup'ed pages for the two rings, and the sqes. + */ + unsigned short n_ring_pages; + unsigned short n_sqe_pages; + struct page **ring_pages; + struct page **sqe_pages; + struct io_rings *rings; struct task_struct *submitter_task; struct percpu_ref refs; diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h index 0716cb17e436..2edba9a274de 100644 --- a/include/uapi/linux/io_uring.h +++ b/include/uapi/linux/io_uring.h @@ -173,6 +173,11 @@ enum { */ #define IORING_SETUP_DEFER_TASKRUN (1U << 13) +/* + * Application provides the memory for the rings + */ +#define IORING_SETUP_NO_MMAP (1U << 14) + enum io_uring_op { IORING_OP_NOP, IORING_OP_READV, @@ -406,7 +411,7 @@ struct io_sqring_offsets { __u32 dropped; __u32 array; __u32 resv1; - __u64 resv2; + __u64 user_addr; }; /* @@ -425,7 +430,7 @@ struct io_cqring_offsets { __u32 cqes; __u32 flags; __u32 resv1; - __u64 resv2; + __u64 user_addr; }; /* diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index 5433e8d6c481..fccc80c201fb 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -2709,12 +2709,85 @@ static void io_mem_free(void *ptr) free_compound_page(page); } +static void io_pages_free(struct page ***pages, int npages) +{ + struct page **page_array; + int i; + + if (!pages) + return; + page_array = *pages; + for (i = 0; i < npages; i++) + unpin_user_page(page_array[i]); + 
kvfree(page_array); + *pages = NULL; +} + +static void *__io_uaddr_map(struct page ***pages, unsigned short *npages, + unsigned long uaddr, size_t size) +{ + struct page **page_array; + unsigned int nr_pages; + int ret; + + *npages = 0; + + if (uaddr & (PAGE_SIZE - 1) || !size) + return ERR_PTR(-EINVAL); + + nr_pages = (size + PAGE_SIZE - 1) >> PAGE_SHIFT; + if (nr_pages > USHRT_MAX) + return ERR_PTR(-EINVAL); + page_array = kvmalloc_array(nr_pages, sizeof(struct page *), GFP_KERNEL); + if (!page_array) + return ERR_PTR(-ENOMEM); + + ret = pin_user_pages_fast(uaddr, nr_pages, FOLL_WRITE | FOLL_LONGTERM, + page_array); + if (ret != nr_pages) { +err: + io_pages_free(&page_array, ret > 0 ? ret : 0); + return ret < 0 ? ERR_PTR(ret) : ERR_PTR(-EFAULT); + } + /* + * Should be a single page. If the ring is small enough that we can + * use a normal page, that is fine. If we need multiple pages, then + * userspace should use a huge page. That's the only way to guarantee + * that we get contiguous memory, outside of just being lucky or + * (currently) having low memory fragmentation. 
+ */ + if (page_array[0] != page_array[ret - 1]) + goto err; + *pages = page_array; + *npages = nr_pages; + return page_to_virt(page_array[0]); +} + +static void *io_rings_map(struct io_ring_ctx *ctx, unsigned long uaddr, + size_t size) +{ + return __io_uaddr_map(&ctx->ring_pages, &ctx->n_ring_pages, uaddr, + size); +} + +static void *io_sqes_map(struct io_ring_ctx *ctx, unsigned long uaddr, + size_t size) +{ + return __io_uaddr_map(&ctx->sqe_pages, &ctx->n_sqe_pages, uaddr, + size); +} + static void io_rings_free(struct io_ring_ctx *ctx) { - io_mem_free(ctx->rings); - io_mem_free(ctx->sq_sqes); - ctx->rings = NULL; - ctx->sq_sqes = NULL; + if (!(ctx->flags & IORING_SETUP_NO_MMAP)) { + io_mem_free(ctx->rings); + io_mem_free(ctx->sq_sqes); + ctx->rings = NULL; + ctx->sq_sqes = NULL; + } else { + io_pages_free(&ctx->ring_pages, ctx->n_ring_pages); + io_pages_free(&ctx->sqe_pages, ctx->n_sqe_pages); + } } static void *io_mem_alloc(size_t size) @@ -3359,6 +3432,10 @@ static void *io_uring_validate_mmap_request(struct file *file, struct page *page; void *ptr; + /* Don't allow mmap if the ring was setup without it */ + if (ctx->flags & IORING_SETUP_NO_MMAP) + return ERR_PTR(-EINVAL); + switch (offset & IORING_OFF_MMAP_MASK) { case IORING_OFF_SQ_RING: case IORING_OFF_CQ_RING: @@ -3694,7 +3771,11 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx, if (size == SIZE_MAX) return -EOVERFLOW; - rings = io_mem_alloc(size); + if (!(ctx->flags & IORING_SETUP_NO_MMAP)) + rings = io_mem_alloc(size); + else + rings = io_rings_map(ctx, p->cq_off.user_addr, size); + if (IS_ERR(rings)) return PTR_ERR(rings); @@ -3714,13 +3795,17 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx, return -EOVERFLOW; } - ptr = io_mem_alloc(size); + if (!(ctx->flags & IORING_SETUP_NO_MMAP)) + ptr = io_mem_alloc(size); + else + ptr = io_sqes_map(ctx, p->sq_off.user_addr, size); + if (IS_ERR(ptr)) { io_rings_free(ctx); return PTR_ERR(ptr); } - ctx->sq_sqes = 
io_mem_alloc(size); + ctx->sq_sqes = ptr; return 0; } @@ -3906,7 +3991,8 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p, p->sq_off.dropped = offsetof(struct io_rings, sq_dropped); p->sq_off.array = (char *)ctx->sq_array - (char *)ctx->rings; p->sq_off.resv1 = 0; - p->sq_off.resv2 = 0; + if (!(ctx->flags & IORING_SETUP_NO_MMAP)) + p->sq_off.user_addr = 0; p->cq_off.head = offsetof(struct io_rings, cq.head); p->cq_off.tail = offsetof(struct io_rings, cq.tail); @@ -3916,7 +4002,8 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p, p->cq_off.cqes = offsetof(struct io_rings, cqes); p->cq_off.flags = offsetof(struct io_rings, cq_flags); p->cq_off.resv1 = 0; - p->cq_off.resv2 = 0; + if (!(ctx->flags & IORING_SETUP_NO_MMAP)) + p->cq_off.user_addr = 0; p->features = IORING_FEAT_SINGLE_MMAP | IORING_FEAT_NODROP | IORING_FEAT_SUBMIT_STABLE | IORING_FEAT_RW_CUR_POS | @@ -3982,7 +4069,8 @@ static long io_uring_setup(u32 entries, struct io_uring_params __user *params) IORING_SETUP_R_DISABLED | IORING_SETUP_SUBMIT_ALL | IORING_SETUP_COOP_TASKRUN | IORING_SETUP_TASKRUN_FLAG | IORING_SETUP_SQE128 | IORING_SETUP_CQE32 | - IORING_SETUP_SINGLE_ISSUER | IORING_SETUP_DEFER_TASKRUN)) + IORING_SETUP_SINGLE_ISSUER | IORING_SETUP_DEFER_TASKRUN | + IORING_SETUP_NO_MMAP)) return -EINVAL; return io_uring_create(entries, &p, params); -- 2.39.2 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCHSET RFC 0/4] Support for mapping SQ/CQ rings into huge page @ 2023-04-19 22:48 Jens Axboe 2023-04-19 22:48 ` [PATCH 2/4] io_uring: return error pointer from io_mem_alloc() Jens Axboe 0 siblings, 1 reply; 8+ messages in thread From: Jens Axboe @ 2023-04-19 22:48 UTC (permalink / raw) To: io-uring Hi, io_uring SQ/CQ rings are allocated by the kernel from contiguous, normal pages, and then the application mmap()'s the rings into userspace. This works fine, but does require contiguous pages to be available for the given SQ and CQ ring sizes. As uptime increases on a given system, so does memory fragmentation. Entropy is inevitable. This patchset adds support for the application passing in a pre-allocated huge page, and then placing the rings in that. This reduces the need for contiguous pages, and also reduces the TLB pressure for larger rings. The liburing huge.2 branch has support for using this trivially. Applications may use the normal ring init helpers and set IORING_SETUP_NO_MMAP, in which case a huge page will get allocated for them and used. Or they may use io_uring_queue_init_mem() and pass in a pre-allocated huge page, getting the amount of it used returned. This allows placing multiple rings into a single huge page. -- Jens Axboe ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 2/4] io_uring: return error pointer from io_mem_alloc() 2023-04-19 22:48 [PATCHSET RFC 0/4] Support for mapping SQ/CQ rings into huge page Jens Axboe @ 2023-04-19 22:48 ` Jens Axboe 0 siblings, 0 replies; 8+ messages in thread From: Jens Axboe @ 2023-04-19 22:48 UTC (permalink / raw) To: io-uring; +Cc: Jens Axboe In preparation for having more than one type of ring allocator, make the existing one return valid/error-pointer rather than just NULL. Signed-off-by: Jens Axboe <[email protected]> --- io_uring/io_uring.c | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-) diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index 7b4f3eb16a73..13faa3115eb5 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -2719,8 +2719,12 @@ static void io_mem_free(void *ptr) static void *io_mem_alloc(size_t size) { gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_NOWARN | __GFP_COMP; + void *ret; - return (void *) __get_free_pages(gfp, get_order(size)); + ret = (void *) __get_free_pages(gfp, get_order(size)); + if (ret) + return ret; + return ERR_PTR(-ENOMEM); } static unsigned long rings_size(struct io_ring_ctx *ctx, unsigned int sq_entries, @@ -3686,6 +3690,7 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx, { struct io_rings *rings; size_t size, sq_array_offset; + void *ptr; /* make sure these are sane, as we already accounted them */ ctx->sq_entries = p->sq_entries; @@ -3696,8 +3701,8 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx, return -EOVERFLOW; rings = io_mem_alloc(size); - if (!rings) - return -ENOMEM; + if (IS_ERR(rings)) + return PTR_ERR(rings); ctx->rings = rings; ctx->sq_array = (u32 *)((char *)rings + sq_array_offset); @@ -3716,13 +3721,14 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx, return -EOVERFLOW; } - ctx->sq_sqes = io_mem_alloc(size); - if (!ctx->sq_sqes) { + ptr = io_mem_alloc(size); + if (IS_ERR(ptr)) { io_mem_free(ctx->rings); ctx->rings = NULL; - 
return -ENOMEM; + return PTR_ERR(ptr); } + ctx->sq_sqes = io_mem_alloc(size); return 0; } -- 2.39.2 ^ permalink raw reply related [flat|nested] 8+ messages in thread
end of thread, other threads:[~2023-05-14 2:59 UTC | newest] Thread overview: 8+ messages -- 2023-05-13 14:16 [PATCHSET 0/4 v2] Support for mapping SQ/CQ rings into huge page Jens Axboe 2023-05-13 14:16 ` [PATCH 1/4] io_uring: remove sq/cq_off memset Jens Axboe 2023-05-13 14:16 ` [PATCH 2/4] io_uring: return error pointer from io_mem_alloc() Jens Axboe 2023-05-14 2:54 ` Dmitry Kadashev 2023-05-14 2:59 ` Jens Axboe 2023-05-13 14:16 ` [PATCH 3/4] io_uring: add ring freeing helper Jens Axboe 2023-05-13 14:16 ` [PATCH 4/4] io_uring: support for user allocated memory for rings/sqes Jens Axboe -- strict thread matches above, loose matches on Subject: below -- 2023-04-19 22:48 [PATCHSET RFC 0/4] Support for mapping SQ/CQ rings into huge page Jens Axboe 2023-04-19 22:48 ` [PATCH 2/4] io_uring: return error pointer from io_mem_alloc() Jens Axboe