public inbox for [email protected]
* [PATCH 0/9] Clean up alloc_cache allocations
@ 2024-11-19  1:22 Gabriel Krisman Bertazi
  2024-11-19  1:22 ` [PATCH 1/9] io_uring: Fold allocation into alloc_cache helper Gabriel Krisman Bertazi
                   ` (9 more replies)
  0 siblings, 10 replies; 15+ messages in thread
From: Gabriel Krisman Bertazi @ 2024-11-19  1:22 UTC (permalink / raw)
  To: axboe, asml.silence; +Cc: io-uring, Gabriel Krisman Bertazi

Jens, Pavel,

The allocation paths that use alloc_cache duplicate the same code
pattern, sometimes in a quite convoluted way.  This series cleans up
that code by folding the allocation into the cache code itself, making
it just an allocator function, and keeping the cache policy invisible to
callers.  A bigger justification for doing this, beyond code simplicity,
is that it makes it trivial to test the impact of disabling the cache
and using slab directly, which I've used for slab improvement
experiments.  I think this is one step forward in the direction of
eventually lifting the alloc_cache out of io_uring into a proper
magazine layer in slab.

It survived liburing testsuite, and when microbenchmarking the
read-write path with mmtests and fio, I didn't observe any significant
performance variation (there was actually a 2% gain, but that was
within the variance of the test runs, making it not significant and
surely test noise).

If there are specific benchmarks you'd like me to run this against,
I'd be happy to do so.

Thanks,

Gabriel Krisman Bertazi (9):
  io_uring: Fold allocation into alloc_cache helper
  io_uring: Add generic helper to allocate async data
  io_uring/futex: Allocate ifd with generic alloc_cache helper
  io_uring/poll: Allocate apoll with generic alloc_cache helper
  io_uring/uring_cmd: Allocate async data through generic helper
  io_uring/net: Allocate msghdr async data through helper
  io_uring/rw: Allocate async data through helper
  io_uring: Move old async data allocation helper to header
  io_uring/msg_ring: Drop custom destructor

 io_uring/alloc_cache.h |  7 +++++++
 io_uring/futex.c       | 13 +------------
 io_uring/io_uring.c    | 17 ++---------------
 io_uring/io_uring.h    | 22 ++++++++++++++++++++++
 io_uring/msg_ring.c    |  7 -------
 io_uring/msg_ring.h    |  1 -
 io_uring/net.c         | 28 ++++++++++------------------
 io_uring/poll.c        | 13 +++++--------
 io_uring/rw.c          | 28 ++++++++--------------------
 io_uring/timeout.c     |  5 ++---
 io_uring/uring_cmd.c   | 20 ++------------------
 io_uring/waitid.c      |  4 ++--
 12 files changed, 61 insertions(+), 104 deletions(-)

-- 
2.47.0


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/9] io_uring: Fold allocation into alloc_cache helper
  2024-11-19  1:22 [PATCH 0/9] Clean up alloc_cache allocations Gabriel Krisman Bertazi
@ 2024-11-19  1:22 ` Gabriel Krisman Bertazi
  2024-11-19  2:02   ` Jens Axboe
  2024-11-19  1:22 ` [PATCH 2/9] io_uring: Add generic helper to allocate async data Gabriel Krisman Bertazi
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 15+ messages in thread
From: Gabriel Krisman Bertazi @ 2024-11-19  1:22 UTC (permalink / raw)
  To: axboe, asml.silence; +Cc: io-uring, Gabriel Krisman Bertazi

The allocation paths that use alloc_cache duplicate the same code
pattern, sometimes in a quite convoluted way.  Fold the allocation into
the cache code itself, making it just an allocator function, and keeping
the cache policy invisible to callers.  Another justification for doing
this, beyond code simplicity, is that it makes it trivial to test the
impact of disabling the cache and using slab directly, which I've used
for slab improvement experiments.

One relevant detail is that this allocates zeroed memory.  The rationale is
that it simplifies the handling of the embedded free_iov in some of the
cached objects, and the performance impact shouldn't be meaningful,
since we are supposed to be hitting the cache most of the time and the
allocation is already the slow path.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
 io_uring/alloc_cache.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/io_uring/alloc_cache.h b/io_uring/alloc_cache.h
index b7a38a2069cf..6b34e491a30a 100644
--- a/io_uring/alloc_cache.h
+++ b/io_uring/alloc_cache.h
@@ -30,6 +30,13 @@ static inline void *io_alloc_cache_get(struct io_alloc_cache *cache)
 	return NULL;
 }
 
+static inline void *io_alloc_cache_alloc(struct io_alloc_cache *cache, gfp_t gfp)
+{
+	if (!cache->nr_cached)
+		return kzalloc(cache->elem_size, gfp);
+	return io_alloc_cache_get(cache);
+}
+
 /* returns false if the cache was initialized properly */
 static inline bool io_alloc_cache_init(struct io_alloc_cache *cache,
 				       unsigned max_nr, size_t size)
-- 
2.47.0



* [PATCH 2/9] io_uring: Add generic helper to allocate async data
  2024-11-19  1:22 [PATCH 0/9] Clean up alloc_cache allocations Gabriel Krisman Bertazi
  2024-11-19  1:22 ` [PATCH 1/9] io_uring: Fold allocation into alloc_cache helper Gabriel Krisman Bertazi
@ 2024-11-19  1:22 ` Gabriel Krisman Bertazi
  2024-11-19  1:22 ` [PATCH 3/9] io_uring/futex: Allocate ifd with generic alloc_cache helper Gabriel Krisman Bertazi
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Gabriel Krisman Bertazi @ 2024-11-19  1:22 UTC (permalink / raw)
  To: axboe, asml.silence; +Cc: io-uring, Gabriel Krisman Bertazi

This helper replaces io_alloc_async_data by using the folded allocation.
Define it in a header to allow the compiler to decide whether to inline it.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
 io_uring/io_uring.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 4070d4c8ef97..9c158e401ecb 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -222,6 +222,15 @@ static inline void io_req_set_res(struct io_kiocb *req, s32 res, u32 cflags)
 	req->cqe.flags = cflags;
 }
 
+static inline void *io_uring_alloc_async_data(struct io_alloc_cache *cache,
+					      struct io_kiocb *req)
+{
+	req->async_data = io_alloc_cache_alloc(cache, GFP_KERNEL);
+	if (req->async_data)
+		req->flags |= REQ_F_ASYNC_DATA;
+	return req->async_data;
+}
+
 static inline bool req_has_async_data(struct io_kiocb *req)
 {
 	return req->flags & REQ_F_ASYNC_DATA;
-- 
2.47.0



* [PATCH 3/9] io_uring/futex: Allocate ifd with generic alloc_cache helper
  2024-11-19  1:22 [PATCH 0/9] Clean up alloc_cache allocations Gabriel Krisman Bertazi
  2024-11-19  1:22 ` [PATCH 1/9] io_uring: Fold allocation into alloc_cache helper Gabriel Krisman Bertazi
  2024-11-19  1:22 ` [PATCH 2/9] io_uring: Add generic helper to allocate async data Gabriel Krisman Bertazi
@ 2024-11-19  1:22 ` Gabriel Krisman Bertazi
  2024-11-19  1:22 ` [PATCH 4/9] io_uring/poll: Allocate apoll " Gabriel Krisman Bertazi
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Gabriel Krisman Bertazi @ 2024-11-19  1:22 UTC (permalink / raw)
  To: axboe, asml.silence; +Cc: io-uring, Gabriel Krisman Bertazi

Instead of open-coding the allocation, use the generic alloc_cache
helper.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
 io_uring/futex.c | 13 +------------
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/io_uring/futex.c b/io_uring/futex.c
index e29662f039e1..dd9dddea2a9b 100644
--- a/io_uring/futex.c
+++ b/io_uring/futex.c
@@ -251,17 +251,6 @@ static void io_futex_wake_fn(struct wake_q_head *wake_q, struct futex_q *q)
 	io_req_task_work_add(req);
 }
 
-static struct io_futex_data *io_alloc_ifd(struct io_ring_ctx *ctx)
-{
-	struct io_futex_data *ifd;
-
-	ifd = io_alloc_cache_get(&ctx->futex_cache);
-	if (ifd)
-		return ifd;
-
-	return kmalloc(sizeof(struct io_futex_data), GFP_NOWAIT);
-}
-
 int io_futexv_wait(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct io_futex *iof = io_kiocb_to_cmd(req, struct io_futex);
@@ -331,7 +320,7 @@ int io_futex_wait(struct io_kiocb *req, unsigned int issue_flags)
 	}
 
 	io_ring_submit_lock(ctx, issue_flags);
-	ifd = io_alloc_ifd(ctx);
+	ifd = io_alloc_cache_alloc(&ctx->futex_cache, GFP_NOWAIT);
 	if (!ifd) {
 		ret = -ENOMEM;
 		goto done_unlock;
-- 
2.47.0



* [PATCH 4/9] io_uring/poll: Allocate apoll with generic alloc_cache helper
  2024-11-19  1:22 [PATCH 0/9] Clean up alloc_cache allocations Gabriel Krisman Bertazi
                   ` (2 preceding siblings ...)
  2024-11-19  1:22 ` [PATCH 3/9] io_uring/futex: Allocate ifd with generic alloc_cache helper Gabriel Krisman Bertazi
@ 2024-11-19  1:22 ` Gabriel Krisman Bertazi
  2024-11-19  1:22 ` [PATCH 5/9] io_uring/uring_cmd: Allocate async data through generic helper Gabriel Krisman Bertazi
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Gabriel Krisman Bertazi @ 2024-11-19  1:22 UTC (permalink / raw)
  To: axboe, asml.silence; +Cc: io-uring, Gabriel Krisman Bertazi

This abstracts away the cache details to simplify the code.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
 io_uring/poll.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/io_uring/poll.c b/io_uring/poll.c
index bced9edd5233..837790ed6816 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -648,15 +648,12 @@ static struct async_poll *io_req_alloc_apoll(struct io_kiocb *req,
 	if (req->flags & REQ_F_POLLED) {
 		apoll = req->apoll;
 		kfree(apoll->double_poll);
-	} else if (!(issue_flags & IO_URING_F_UNLOCKED)) {
-		apoll = io_alloc_cache_get(&ctx->apoll_cache);
-		if (!apoll)
-			goto alloc_apoll;
-		apoll->poll.retries = APOLL_MAX_RETRY;
 	} else {
-alloc_apoll:
-		apoll = kmalloc(sizeof(*apoll), GFP_ATOMIC);
-		if (unlikely(!apoll))
+		if (!(issue_flags & IO_URING_F_UNLOCKED))
+			apoll = io_alloc_cache_alloc(&ctx->apoll_cache, GFP_ATOMIC);
+		else
+			apoll = kmalloc(sizeof(*apoll), GFP_ATOMIC);
+		if (!apoll)
 			return NULL;
 		apoll->poll.retries = APOLL_MAX_RETRY;
 	}
-- 
2.47.0



* [PATCH 5/9] io_uring/uring_cmd: Allocate async data through generic helper
  2024-11-19  1:22 [PATCH 0/9] Clean up alloc_cache allocations Gabriel Krisman Bertazi
                   ` (3 preceding siblings ...)
  2024-11-19  1:22 ` [PATCH 4/9] io_uring/poll: Allocate apoll " Gabriel Krisman Bertazi
@ 2024-11-19  1:22 ` Gabriel Krisman Bertazi
  2024-11-19  1:22 ` [PATCH 6/9] io_uring/net: Allocate msghdr async data through helper Gabriel Krisman Bertazi
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Gabriel Krisman Bertazi @ 2024-11-19  1:22 UTC (permalink / raw)
  To: axboe, asml.silence; +Cc: io-uring, Gabriel Krisman Bertazi

This abstracts away the cache details and simplifies the code.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
 io_uring/io_uring.h  |  1 +
 io_uring/uring_cmd.c | 20 ++------------------
 2 files changed, 3 insertions(+), 18 deletions(-)

diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 9c158e401ecb..f35bf7ef68f3 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -8,6 +8,7 @@
 #include <linux/poll.h>
 #include <linux/io_uring_types.h>
 #include <uapi/linux/eventpoll.h>
+#include "alloc_cache.h"
 #include "io-wq.h"
 #include "slist.h"
 #include "filetable.h"
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index e9d99d3ecc34..b029672f3ec3 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -16,22 +16,6 @@
 #include "rsrc.h"
 #include "uring_cmd.h"
 
-static struct uring_cache *io_uring_async_get(struct io_kiocb *req)
-{
-	struct io_ring_ctx *ctx = req->ctx;
-	struct uring_cache *cache;
-
-	cache = io_alloc_cache_get(&ctx->uring_cache);
-	if (cache) {
-		req->flags |= REQ_F_ASYNC_DATA;
-		req->async_data = cache;
-		return cache;
-	}
-	if (!io_alloc_async_data(req))
-		return req->async_data;
-	return NULL;
-}
-
 static void io_req_uring_cleanup(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
@@ -181,8 +165,8 @@ static int io_uring_cmd_prep_setup(struct io_kiocb *req,
 	struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
 	struct uring_cache *cache;
 
-	cache = io_uring_async_get(req);
-	if (unlikely(!cache))
+	cache = io_uring_alloc_async_data(&req->ctx->uring_cache, req);
+	if (!cache)
 		return -ENOMEM;
 
 	if (!(req->flags & REQ_F_FORCE_ASYNC)) {
-- 
2.47.0



* [PATCH 6/9] io_uring/net: Allocate msghdr async data through helper
  2024-11-19  1:22 [PATCH 0/9] Clean up alloc_cache allocations Gabriel Krisman Bertazi
                   ` (4 preceding siblings ...)
  2024-11-19  1:22 ` [PATCH 5/9] io_uring/uring_cmd: Allocate async data through generic helper Gabriel Krisman Bertazi
@ 2024-11-19  1:22 ` Gabriel Krisman Bertazi
  2024-11-19  1:22 ` [PATCH 7/9] io_uring/rw: Allocate " Gabriel Krisman Bertazi
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Gabriel Krisman Bertazi @ 2024-11-19  1:22 UTC (permalink / raw)
  To: axboe, asml.silence; +Cc: io-uring, Gabriel Krisman Bertazi

This abstracts away the cache details.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
 io_uring/net.c | 28 ++++++++++------------------
 1 file changed, 10 insertions(+), 18 deletions(-)

diff --git a/io_uring/net.c b/io_uring/net.c
index df1f7dc6f1c8..104863101980 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -160,25 +160,17 @@ static struct io_async_msghdr *io_msg_alloc_async(struct io_kiocb *req)
 	struct io_ring_ctx *ctx = req->ctx;
 	struct io_async_msghdr *hdr;
 
-	hdr = io_alloc_cache_get(&ctx->netmsg_cache);
-	if (hdr) {
-		if (hdr->free_iov) {
-			kasan_mempool_unpoison_object(hdr->free_iov,
-				hdr->free_iov_nr * sizeof(struct iovec));
-			req->flags |= REQ_F_NEED_CLEANUP;
-		}
-		req->flags |= REQ_F_ASYNC_DATA;
-		req->async_data = hdr;
-		return hdr;
-	}
-
-	if (!io_alloc_async_data(req)) {
-		hdr = req->async_data;
-		hdr->free_iov_nr = 0;
-		hdr->free_iov = NULL;
-		return hdr;
+	hdr = io_uring_alloc_async_data(&ctx->netmsg_cache, req);
+	if (!hdr)
+		return NULL;
+
+	/* If the async data was cached, we might have an iov cached inside. */
+	if (hdr->free_iov) {
+		kasan_mempool_unpoison_object(hdr->free_iov,
+					      hdr->free_iov_nr * sizeof(struct iovec));
+		req->flags |= REQ_F_NEED_CLEANUP;
 	}
-	return NULL;
+	return hdr;
 }
 
 /* assign new iovec to kmsg, if we need to */
-- 
2.47.0



* [PATCH 7/9] io_uring/rw: Allocate async data through helper
  2024-11-19  1:22 [PATCH 0/9] Clean up alloc_cache allocations Gabriel Krisman Bertazi
                   ` (5 preceding siblings ...)
  2024-11-19  1:22 ` [PATCH 6/9] io_uring/net: Allocate msghdr async data through helper Gabriel Krisman Bertazi
@ 2024-11-19  1:22 ` Gabriel Krisman Bertazi
  2024-11-19  1:22 ` [PATCH 8/9] io_uring: Move old async data allocation helper to header Gabriel Krisman Bertazi
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Gabriel Krisman Bertazi @ 2024-11-19  1:22 UTC (permalink / raw)
  To: axboe, asml.silence; +Cc: io-uring, Gabriel Krisman Bertazi

This abstracts away the cache details.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
 io_uring/rw.c | 28 ++++++++--------------------
 1 file changed, 8 insertions(+), 20 deletions(-)

diff --git a/io_uring/rw.c b/io_uring/rw.c
index b62cdb5fc936..7f1890f23312 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -213,28 +213,16 @@ static int io_rw_alloc_async(struct io_kiocb *req)
 	struct io_ring_ctx *ctx = req->ctx;
 	struct io_async_rw *rw;
 
-	rw = io_alloc_cache_get(&ctx->rw_cache);
-	if (rw) {
-		if (rw->free_iovec) {
-			kasan_mempool_unpoison_object(rw->free_iovec,
-				rw->free_iov_nr * sizeof(struct iovec));
-			req->flags |= REQ_F_NEED_CLEANUP;
-		}
-		req->flags |= REQ_F_ASYNC_DATA;
-		req->async_data = rw;
-		goto done;
-	}
-
-	if (!io_alloc_async_data(req)) {
-		rw = req->async_data;
-		rw->free_iovec = NULL;
-		rw->free_iov_nr = 0;
-done:
+	rw = io_uring_alloc_async_data(&ctx->rw_cache, req);
+	if (!rw)
+		return -ENOMEM;
+	if (rw->free_iovec) {
+		kasan_mempool_unpoison_object(rw->free_iovec,
+					      rw->free_iov_nr * sizeof(struct iovec));
+		req->flags |= REQ_F_NEED_CLEANUP;
 		rw->bytes_done = 0;
-		return 0;
 	}
-
-	return -ENOMEM;
+	return 0;
 }
 
 static int io_prep_rw_setup(struct io_kiocb *req, int ddir, bool do_import)
-- 
2.47.0



* [PATCH 8/9] io_uring: Move old async data allocation helper to header
  2024-11-19  1:22 [PATCH 0/9] Clean up alloc_cache allocations Gabriel Krisman Bertazi
                   ` (6 preceding siblings ...)
  2024-11-19  1:22 ` [PATCH 7/9] io_uring/rw: Allocate " Gabriel Krisman Bertazi
@ 2024-11-19  1:22 ` Gabriel Krisman Bertazi
  2024-11-19  1:22 ` [PATCH 9/9] io_uring/msg_ring: Drop custom destructor Gabriel Krisman Bertazi
  2024-11-19  2:05 ` [PATCH 0/9] Clean up alloc_cache allocations Jens Axboe
  9 siblings, 0 replies; 15+ messages in thread
From: Gabriel Krisman Bertazi @ 2024-11-19  1:22 UTC (permalink / raw)
  To: axboe, asml.silence; +Cc: io-uring, Gabriel Krisman Bertazi

There are two remaining uses of the old async data allocator that do not
rely on the alloc cache.  I don't want to make them use the new
allocator helper because that would require an if (cache) check, which
would result in dead code for the cached case (for callers passing a
cache, gcc can't prove the cache isn't NULL, and will therefore preserve
the check).  Since this is an inline function and just a few lines long,
keep a second helper to deal with cases where we don't have an async
data cache.

No functional change intended here.  This is just moving the helper
around and making it inline.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
 io_uring/io_uring.c | 13 -------------
 io_uring/io_uring.h | 12 ++++++++++++
 io_uring/timeout.c  |  5 ++---
 io_uring/waitid.c   |  4 ++--
 4 files changed, 16 insertions(+), 18 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index bd71782057de..08c42a0e3e74 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1616,19 +1616,6 @@ io_req_flags_t io_file_get_flags(struct file *file)
 	return res;
 }
 
-bool io_alloc_async_data(struct io_kiocb *req)
-{
-	const struct io_issue_def *def = &io_issue_defs[req->opcode];
-
-	WARN_ON_ONCE(!def->async_size);
-	req->async_data = kmalloc(def->async_size, GFP_KERNEL);
-	if (req->async_data) {
-		req->flags |= REQ_F_ASYNC_DATA;
-		return false;
-	}
-	return true;
-}
-
 static u32 io_get_sequence(struct io_kiocb *req)
 {
 	u32 seq = req->ctx->cached_sq_head;
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index f35bf7ef68f3..13e07e65c0a1 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -12,6 +12,7 @@
 #include "io-wq.h"
 #include "slist.h"
 #include "filetable.h"
+#include "opdef.h"
 
 #ifndef CREATE_TRACE_POINTS
 #include <trace/events/io_uring.h>
@@ -232,6 +233,17 @@ static inline void *io_uring_alloc_async_data(struct io_alloc_cache *cache,
 	return req->async_data;
 }
 
+static inline void *io_uring_alloc_async_data_nocache(struct io_kiocb *req)
+{
+	const struct io_issue_def *def = &io_issue_defs[req->opcode];
+
+	WARN_ON_ONCE(!def->async_size);
+	req->async_data = kmalloc(def->async_size, GFP_KERNEL);
+	if (req->async_data)
+		req->flags |= REQ_F_ASYNC_DATA;
+	return req->async_data;
+}
+
 static inline bool req_has_async_data(struct io_kiocb *req)
 {
 	return req->flags & REQ_F_ASYNC_DATA;
diff --git a/io_uring/timeout.c b/io_uring/timeout.c
index 5b12bd6a804c..078dfd6fcf71 100644
--- a/io_uring/timeout.c
+++ b/io_uring/timeout.c
@@ -526,10 +526,9 @@ static int __io_timeout_prep(struct io_kiocb *req,
 
 	if (WARN_ON_ONCE(req_has_async_data(req)))
 		return -EFAULT;
-	if (io_alloc_async_data(req))
+	data = io_uring_alloc_async_data_nocache(req);
+	if (!data)
 		return -ENOMEM;
-
-	data = req->async_data;
 	data->req = req;
 	data->flags = flags;
 
diff --git a/io_uring/waitid.c b/io_uring/waitid.c
index daef5dd644f0..6778c0ee76c4 100644
--- a/io_uring/waitid.c
+++ b/io_uring/waitid.c
@@ -303,10 +303,10 @@ int io_waitid(struct io_kiocb *req, unsigned int issue_flags)
 	struct io_waitid_async *iwa;
 	int ret;
 
-	if (io_alloc_async_data(req))
+	iwa = io_uring_alloc_async_data_nocache(req);
+	if (!iwa)
 		return -ENOMEM;
 
-	iwa = req->async_data;
 	iwa->req = req;
 
 	ret = kernel_waitid_prepare(&iwa->wo, iw->which, iw->upid, &iw->info,
-- 
2.47.0



* [PATCH 9/9] io_uring/msg_ring: Drop custom destructor
  2024-11-19  1:22 [PATCH 0/9] Clean up alloc_cache allocations Gabriel Krisman Bertazi
                   ` (7 preceding siblings ...)
  2024-11-19  1:22 ` [PATCH 8/9] io_uring: Move old async data allocation helper to header Gabriel Krisman Bertazi
@ 2024-11-19  1:22 ` Gabriel Krisman Bertazi
  2024-11-19  2:05 ` [PATCH 0/9] Clean up alloc_cache allocations Jens Axboe
  9 siblings, 0 replies; 15+ messages in thread
From: Gabriel Krisman Bertazi @ 2024-11-19  1:22 UTC (permalink / raw)
  To: axboe, asml.silence; +Cc: io-uring, Gabriel Krisman Bertazi

kfree can handle slab objects nowadays. Drop the extra callback and just
use kfree.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
 io_uring/io_uring.c | 4 ++--
 io_uring/msg_ring.c | 7 -------
 io_uring/msg_ring.h | 1 -
 3 files changed, 2 insertions(+), 10 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 08c42a0e3e74..1e591234f32b 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -361,7 +361,7 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 	io_alloc_cache_free(&ctx->netmsg_cache, io_netmsg_cache_free);
 	io_alloc_cache_free(&ctx->rw_cache, io_rw_cache_free);
 	io_alloc_cache_free(&ctx->uring_cache, kfree);
-	io_alloc_cache_free(&ctx->msg_cache, io_msg_cache_free);
+	io_alloc_cache_free(&ctx->msg_cache, kfree);
 	io_futex_cache_free(ctx);
 	kvfree(ctx->cancel_table.hbs);
 	xa_destroy(&ctx->io_bl_xa);
@@ -2693,7 +2693,7 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
 	io_alloc_cache_free(&ctx->netmsg_cache, io_netmsg_cache_free);
 	io_alloc_cache_free(&ctx->rw_cache, io_rw_cache_free);
 	io_alloc_cache_free(&ctx->uring_cache, kfree);
-	io_alloc_cache_free(&ctx->msg_cache, io_msg_cache_free);
+	io_alloc_cache_free(&ctx->msg_cache, kfree);
 	io_futex_cache_free(ctx);
 	io_destroy_buffers(ctx);
 	io_unregister_cqwait_reg(ctx);
diff --git a/io_uring/msg_ring.c b/io_uring/msg_ring.c
index e63af34004b7..d41ea29802a6 100644
--- a/io_uring/msg_ring.c
+++ b/io_uring/msg_ring.c
@@ -358,10 +358,3 @@ int io_uring_sync_msg_ring(struct io_uring_sqe *sqe)
 	}
 	return ret;
 }
-
-void io_msg_cache_free(const void *entry)
-{
-	struct io_kiocb *req = (struct io_kiocb *) entry;
-
-	kmem_cache_free(req_cachep, req);
-}
diff --git a/io_uring/msg_ring.h b/io_uring/msg_ring.h
index 38e7f8f0c944..32236d2fb778 100644
--- a/io_uring/msg_ring.h
+++ b/io_uring/msg_ring.h
@@ -4,4 +4,3 @@ int io_uring_sync_msg_ring(struct io_uring_sqe *sqe);
 int io_msg_ring_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
 int io_msg_ring(struct io_kiocb *req, unsigned int issue_flags);
 void io_msg_ring_cleanup(struct io_kiocb *req);
-void io_msg_cache_free(const void *entry);
-- 
2.47.0



* Re: [PATCH 1/9] io_uring: Fold allocation into alloc_cache helper
  2024-11-19  1:22 ` [PATCH 1/9] io_uring: Fold allocation into alloc_cache helper Gabriel Krisman Bertazi
@ 2024-11-19  2:02   ` Jens Axboe
  2024-11-19 15:30     ` Gabriel Krisman Bertazi
  0 siblings, 1 reply; 15+ messages in thread
From: Jens Axboe @ 2024-11-19  2:02 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi, asml.silence; +Cc: io-uring

On 11/18/24 6:22 PM, Gabriel Krisman Bertazi wrote:
> diff --git a/io_uring/alloc_cache.h b/io_uring/alloc_cache.h
> index b7a38a2069cf..6b34e491a30a 100644
> --- a/io_uring/alloc_cache.h
> +++ b/io_uring/alloc_cache.h
> @@ -30,6 +30,13 @@ static inline void *io_alloc_cache_get(struct io_alloc_cache *cache)
>  	return NULL;
>  }
>  
> +static inline void *io_alloc_cache_alloc(struct io_alloc_cache *cache, gfp_t gfp)
> +{
> +	if (!cache->nr_cached)
> +		return kzalloc(cache->elem_size, gfp);
> +	return io_alloc_cache_get(cache);
> +}

I don't think you want to use kzalloc here. The caller will need to
clear what its needs for the cached path anyway, so has no other option
than to clear/set things twice for that case.

-- 
Jens Axboe


* Re: [PATCH 0/9] Clean up alloc_cache allocations
  2024-11-19  1:22 [PATCH 0/9] Clean up alloc_cache allocations Gabriel Krisman Bertazi
                   ` (8 preceding siblings ...)
  2024-11-19  1:22 ` [PATCH 9/9] io_uring/msg_ring: Drop custom destructor Gabriel Krisman Bertazi
@ 2024-11-19  2:05 ` Jens Axboe
  2024-11-19 15:31   ` Gabriel Krisman Bertazi
  9 siblings, 1 reply; 15+ messages in thread
From: Jens Axboe @ 2024-11-19  2:05 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi, asml.silence; +Cc: io-uring

On 11/18/24 6:22 PM, Gabriel Krisman Bertazi wrote:
> Jens, Pavel,
> 
> The allocation paths that use alloc_cache duplicate the same code
> pattern, sometimes in a quite convoluted way.  This series cleans up
> that code by folding the allocation into the cache code itself, making
> it just an allocator function, and keeping the cache policy invisible to
> callers.  A bigger justification for doing this, beyond code simplicity,
> is that it makes it trivial to test the impact of disabling the cache
> and using slab directly, which I've used for slab improvement
> experiments.  I think this is one step forward in the direction of
> eventually lifting the alloc_cache out of io_uring into a proper
> magazine layer in slab.

Nice!

Patchset looks good, from a quick look, even from just a cleanup
perspective. We're obviously inside the merge window right now, so it's
a 6.14 target at this point. I'll take some time to review it a bit
closer later this week.

> It survived liburing testsuite, and when microbenchmarking the
> read-write path with mmtests and fio, I didn't observe any significant
> performance variation (there was actually a 2% gain, but that was
> within the variance of the test runs, making it not significant and
> surely test noise).
> 
> If there are specific benchmarks you'd like me to run this against,
> I'd be happy to do so.

In general, running the liburing test suite with various types of files
and devices and with KASAN and LOCKDEP turned on is a good test of not
having messed something up, in a big way at least. But maybe you already
ran that too?

-- 
Jens Axboe


* Re: [PATCH 1/9] io_uring: Fold allocation into alloc_cache helper
  2024-11-19  2:02   ` Jens Axboe
@ 2024-11-19 15:30     ` Gabriel Krisman Bertazi
  2024-11-19 16:18       ` Jens Axboe
  0 siblings, 1 reply; 15+ messages in thread
From: Gabriel Krisman Bertazi @ 2024-11-19 15:30 UTC (permalink / raw)
  To: Jens Axboe; +Cc: asml.silence, io-uring

Jens Axboe <[email protected]> writes:

> On 11/18/24 6:22 PM, Gabriel Krisman Bertazi wrote:
>> diff --git a/io_uring/alloc_cache.h b/io_uring/alloc_cache.h
>> index b7a38a2069cf..6b34e491a30a 100644
>> --- a/io_uring/alloc_cache.h
>> +++ b/io_uring/alloc_cache.h
>> @@ -30,6 +30,13 @@ static inline void *io_alloc_cache_get(struct io_alloc_cache *cache)
>>  	return NULL;
>>  }
>>  
>> +static inline void *io_alloc_cache_alloc(struct io_alloc_cache *cache, gfp_t gfp)
>> +{
>> +	if (!cache->nr_cached)
>> +		return kzalloc(cache->elem_size, gfp);
>> +	return io_alloc_cache_get(cache);
>> +}
>
> I don't think you want to use kzalloc here. The caller will need to
> clear what its needs for the cached path anyway, so has no other option
> than to clear/set things twice for that case.

Hi Jens,

The reason I do kzalloc here is to be able to trust the value of
rw->free_iov (io_rw_alloc_async) and hdr->free_iov (io_msg_alloc_async)
regardless of where the allocated memory came from, cache or slab.  In
the callers (patch 6 and 7), we do:

+	hdr = io_uring_alloc_async_data(&ctx->netmsg_cache, req);
+	if (!hdr)
+		return NULL;
+
+	/* If the async data was cached, we might have an iov cached inside. */
+	if (hdr->free_iov) {

An alternative would be to return a flag indicating whether the
allocated memory came from the cache or not, but it didn't seem elegant.
Do you see a better way?

I also considered that zeroing memory here shouldn't harm performance,
because it'll hit the cache most of the time.

-- 
Gabriel Krisman Bertazi


* Re: [PATCH 0/9] Clean up alloc_cache allocations
  2024-11-19  2:05 ` [PATCH 0/9] Clean up alloc_cache allocations Jens Axboe
@ 2024-11-19 15:31   ` Gabriel Krisman Bertazi
  0 siblings, 0 replies; 15+ messages in thread
From: Gabriel Krisman Bertazi @ 2024-11-19 15:31 UTC (permalink / raw)
  To: Jens Axboe; +Cc: asml.silence, io-uring

Jens Axboe <[email protected]> writes:

> On 11/18/24 6:22 PM, Gabriel Krisman Bertazi wrote:
>> Jens, Pavel,
>> 
>> The allocation paths that use alloc_cache duplicate the same code
>> pattern, sometimes in a quite convoluted way.  This series cleans up
>> that code by folding the allocation into the cache code itself, making
>> it just an allocator function, and keeping the cache policy invisible to
>> callers.  A bigger justification for doing this, beyond code simplicity,
>> is that it makes it trivial to test the impact of disabling the cache
>> and using slab directly, which I've used for slab improvement
>> experiments.  I think this is one step forward in the direction of
>> eventually lifting the alloc_cache out of io_uring into a proper
>> magazine layer in slab.
>
> Nice!
>
> Patchset looks good, from a quick look, even from just a cleanup
> perspective. We're obviously inside the merge window right now, so it's
> a 6.14 target at this point. I'll take some time to review it a bit
> closer later this week.
>
>> It survived liburing testsuite, and when microbenchmarking the
>> read-write path with mmtests and fio, I didn't observe any significant
>> performance variation (there was actually a 2% gain, but that was
>> within the variance of the test runs, making it not significant and
>> surely test noise).
>> 
>> If there are specific benchmarks you'd like me to run this against,
>> I'd be happy to do so.
>
> In general, running the liburing test suite with various types of files
> and devices and with KASAN and LOCKDEP turned on is a good test of not
> having messed something up, in a big way at least. But maybe you already
> ran that too?

I was thinking more of extra performance benchmarks, but I always forget to
enable KASAN.  I'll rerun to confirm it is fine. :)

thanks,



-- 
Gabriel Krisman Bertazi


* Re: [PATCH 1/9] io_uring: Fold allocation into alloc_cache helper
  2024-11-19 15:30     ` Gabriel Krisman Bertazi
@ 2024-11-19 16:18       ` Jens Axboe
  0 siblings, 0 replies; 15+ messages in thread
From: Jens Axboe @ 2024-11-19 16:18 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi; +Cc: asml.silence, io-uring

On 11/19/24 8:30 AM, Gabriel Krisman Bertazi wrote:
> Jens Axboe <[email protected]> writes:
> 
>> On 11/18/24 6:22 PM, Gabriel Krisman Bertazi wrote:
>>> diff --git a/io_uring/alloc_cache.h b/io_uring/alloc_cache.h
>>> index b7a38a2069cf..6b34e491a30a 100644
>>> --- a/io_uring/alloc_cache.h
>>> +++ b/io_uring/alloc_cache.h
>>> @@ -30,6 +30,13 @@ static inline void *io_alloc_cache_get(struct io_alloc_cache *cache)
>>>  	return NULL;
>>>  }
>>>  
>>> +static inline void *io_alloc_cache_alloc(struct io_alloc_cache *cache, gfp_t gfp)
>>> +{
>>> +	if (!cache->nr_cached)
>>> +		return kzalloc(cache->elem_size, gfp);
>>> +	return io_alloc_cache_get(cache);
>>> +}
>>
>> I don't think you want to use kzalloc here. The caller will need to
>> clear what its needs for the cached path anyway, so has no other option
>> than to clear/set things twice for that case.
> 
> Hi Jens,
> 
> The reason I do kzalloc here is to be able to trust the value of
> rw->free_iov (io_rw_alloc_async) and hdr->free_iov (io_msg_alloc_async)
> regardless of where the allocated memory came from, cache or slab.  In
> the callers (patch 6 and 7), we do:

I see, I guess that makes sense as some things are persistent in cache
and need clearing upfront if freshly allocated.

> +	hdr = io_uring_alloc_async_data(&ctx->netmsg_cache, req);
> +	if (!hdr)
> +		return NULL;
> +
> +	/* If the async data was cached, we might have an iov cached inside. */
> +	if (hdr->free_iov) {
> 
> An alternative would be to return a flag indicating whether the
> allocated memory came from the cache or not, but it didn't seem elegant.
> Do you see a better way?
> 
> I also considered that zeroing memory here shouldn't harm performance,
> because it'll hit the cache most of the time.

It should hit cache most of the time, but if we exceed the cache size,
then you will see allocations happen and churn. I don't like the idea of
the flag, as then we still need to complicate the caller. We can do
something like slab where you have a hook for freshly allocated data
only? That can either be a property of the cache, or passed in via
io_alloc_cache_alloc()?

BTW, I'd probably change the name of that to io_cache_get() or
io_cache_alloc() or something like that, I don't think we need two
allocs in there.

-- 
Jens Axboe


end of thread, newest message: 2024-11-19 16:18 UTC

Thread overview: 15+ messages
2024-11-19  1:22 [PATCH 0/9] Clean up alloc_cache allocations Gabriel Krisman Bertazi
2024-11-19  1:22 ` [PATCH 1/9] io_uring: Fold allocation into alloc_cache helper Gabriel Krisman Bertazi
2024-11-19  2:02   ` Jens Axboe
2024-11-19 15:30     ` Gabriel Krisman Bertazi
2024-11-19 16:18       ` Jens Axboe
2024-11-19  1:22 ` [PATCH 2/9] io_uring: Add generic helper to allocate async data Gabriel Krisman Bertazi
2024-11-19  1:22 ` [PATCH 3/9] io_uring/futex: Allocate ifd with generic alloc_cache helper Gabriel Krisman Bertazi
2024-11-19  1:22 ` [PATCH 4/9] io_uring/poll: Allocate apoll " Gabriel Krisman Bertazi
2024-11-19  1:22 ` [PATCH 5/9] io_uring/uring_cmd: Allocate async data through generic helper Gabriel Krisman Bertazi
2024-11-19  1:22 ` [PATCH 6/9] io_uring/net: Allocate msghdr async data through helper Gabriel Krisman Bertazi
2024-11-19  1:22 ` [PATCH 7/9] io_uring/rw: Allocate " Gabriel Krisman Bertazi
2024-11-19  1:22 ` [PATCH 8/9] io_uring: Move old async data allocation helper to header Gabriel Krisman Bertazi
2024-11-19  1:22 ` [PATCH 9/9] io_uring/msg_ring: Drop custom destructor Gabriel Krisman Bertazi
2024-11-19  2:05 ` [PATCH 0/9] Clean up alloc_cache allocations Jens Axboe
2024-11-19 15:31   ` Gabriel Krisman Bertazi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox