public inbox for [email protected]
* [PATCHSET RFC 0/3] Add support for registered waits
@ 2024-10-22 20:39 Jens Axboe
  2024-10-22 20:39 ` [PATCH 1/3] io_uring: switch struct ext_arg from __kernel_timespec to timespec64 Jens Axboe
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Jens Axboe @ 2024-10-22 20:39 UTC (permalink / raw)
  To: io-uring

Hi,

While doing testing on zero-copy tx/rx with io_uring, I noticed that a
LOT of time is being spent copying in data structures related to waiting
on events. While the structures aren't big (24b and 16b), it's a user
copy, and a dependent user copy on top of that - first copy in the main
struct io_uring_getevents_arg, and then copy in the timeout it points
to. Patch 3 has details on the numbers.
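
For reference, the current flow on the kernel side is roughly the below,
simplified from io_get_ext_arg() and io_cqring_wait():

	struct io_uring_getevents_arg arg;
	struct timespec64 ts;

	/* copy 1: the 24b io_uring_getevents_arg passed to enter */
	if (copy_from_user(&arg, argp, sizeof(arg)))
		return -EFAULT;
	/* copy 2: the 16b timespec, dependent on the pointer from copy 1 */
	if (arg.ts && get_timespec64(&ts, u64_to_user_ptr(arg.ts)))
		return -EFAULT;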

I then toyed around with the idea of having registered regions of wait
data. This can be set by the application, and then directly seen by the
kernel. On top of that, embed the timeout itself in this region, rather
than it being supplied as an out-of-band pointer.

Patch 1 is just a generic cleanup, and patch 2 improves the copying a
bit. Both of these can stand alone. Patch 3 implements the meat of this,
which adds IORING_REGISTER_CQWAIT_REG, allowing an application to
register a number of struct io_uring_reg_wait regions that it can index
for wait scenarios. The kernel always registers a full page, so on 4k
page size archs, 64 regions are available by default. Then rather than
pass in a pointer to a struct io_uring_getevents_arg, an index is passed
instead, telling the kernel which registered wait region should be used
for this wait.
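
As a rough sketch of the enter side with the raw syscall (ring_fd,
to_submit, and min_complete being the usual arguments), waiting on
registered region index 0 would look something like:

	ret = syscall(__NR_io_uring_enter, ring_fd, to_submit, min_complete,
		      IORING_ENTER_GETEVENTS | IORING_ENTER_EXT_ARG |
		      IORING_ENTER_EXT_ARG_REG,
		      (void *) 0UL /* wait region index, not a pointer */,
		      sizeof(struct io_uring_reg_wait));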

This basically removes all of the copying seen, which accounted for
anywhere from 3.5-4.5% of the time when doing high frequency operations
where the number of wait invocations can be quite high.

Patches sit on top of the io_uring-ring-resize branch, as both end up
adding register opcodes.

Kernel branch here:

https://git.kernel.dk/cgit/linux/log/?h=io_uring-reg-wait

and liburing (pretty basic right now) branch here:

https://git.kernel.dk/cgit/liburing/log/?h=reg-wait
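
Usage through the liburing branch ends up roughly like the below, with
the helper names taken from the branch and still subject to change:

	struct io_uring_reg_wait *reg;

	/* register a page worth of wait regions, fill in region 0 */
	posix_memalign((void **) &reg, page_size, page_size);
	memset(reg, 0, page_size);
	reg->ts.tv_nsec = 100000;
	reg->flags = IORING_REG_WAIT_TS;
	io_uring_register_cqwait_reg(ring, reg, nr_regions);

	/* and then wait, referencing region 0 by index */
	io_uring_submit_and_wait_reg(ring, &cqe, nr, 0);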

 include/linux/io_uring_types.h |   7 +++
 include/uapi/linux/io_uring.h  |  18 ++++++
 io_uring/io_uring.c            | 102 ++++++++++++++++++++++++++-------
 io_uring/register.c            |  48 ++++++++++++++++
 4 files changed, 153 insertions(+), 22 deletions(-)

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/3] io_uring: switch struct ext_arg from __kernel_timespec to timespec64
  2024-10-22 20:39 [PATCHSET RFC 0/3] Add support for registered waits Jens Axboe
@ 2024-10-22 20:39 ` Jens Axboe
  2024-10-22 20:39 ` [PATCH 2/3] io_uring: change io_get_ext_arg() to uaccess begin + end Jens Axboe
  2024-10-22 20:39 ` [PATCH 3/3] io_uring: add support for fixed wait regions Jens Axboe
  2 siblings, 0 replies; 6+ messages in thread
From: Jens Axboe @ 2024-10-22 20:39 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe

This avoids intermediate storage for turning a __kernel_timespec
user pointer into an on-stack struct timespec64, only then to turn it
into a ktime_t.

Signed-off-by: Jens Axboe <[email protected]>
---
 io_uring/io_uring.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index b5974bdad48b..8952453ea807 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2494,9 +2494,10 @@ static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
 
 struct ext_arg {
 	size_t argsz;
-	struct __kernel_timespec __user *ts;
+	struct timespec64 ts;
 	const sigset_t __user *sig;
 	ktime_t min_time;
+	bool ts_set;
 };
 
 /*
@@ -2534,13 +2535,8 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events, u32 flags,
 	iowq.timeout = KTIME_MAX;
 	start_time = io_get_time(ctx);
 
-	if (ext_arg->ts) {
-		struct timespec64 ts;
-
-		if (get_timespec64(&ts, ext_arg->ts))
-			return -EFAULT;
-
-		iowq.timeout = timespec64_to_ktime(ts);
+	if (ext_arg->ts_set) {
+		iowq.timeout = timespec64_to_ktime(ext_arg->ts);
 		if (!(flags & IORING_ENTER_ABS_TIMER))
 			iowq.timeout = ktime_add(iowq.timeout, start_time);
 	}
@@ -3251,7 +3247,6 @@ static int io_get_ext_arg(unsigned flags, const void __user *argp,
 	 */
 	if (!(flags & IORING_ENTER_EXT_ARG)) {
 		ext_arg->sig = (const sigset_t __user *) argp;
-		ext_arg->ts = NULL;
 		return 0;
 	}
 
@@ -3266,7 +3261,11 @@ static int io_get_ext_arg(unsigned flags, const void __user *argp,
 	ext_arg->min_time = arg.min_wait_usec * NSEC_PER_USEC;
 	ext_arg->sig = u64_to_user_ptr(arg.sigmask);
 	ext_arg->argsz = arg.sigmask_sz;
-	ext_arg->ts = u64_to_user_ptr(arg.ts);
+	if (arg.ts) {
+		if (get_timespec64(&ext_arg->ts, u64_to_user_ptr(arg.ts)))
+			return -EFAULT;
+		ext_arg->ts_set = true;
+	}
 	return 0;
 }
 
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/3] io_uring: change io_get_ext_arg() to uaccess begin + end
  2024-10-22 20:39 [PATCHSET RFC 0/3] Add support for registered waits Jens Axboe
  2024-10-22 20:39 ` [PATCH 1/3] io_uring: switch struct ext_arg from __kernel_timespec to timespec64 Jens Axboe
@ 2024-10-22 20:39 ` Jens Axboe
  2024-10-22 22:40   ` Keith Busch
  2024-10-22 20:39 ` [PATCH 3/3] io_uring: add support for fixed wait regions Jens Axboe
  2 siblings, 1 reply; 6+ messages in thread
From: Jens Axboe @ 2024-10-22 20:39 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe

In scenarios where a high frequency of wait events is seen, the copy
of the struct io_uring_getevents_arg is quite noticeable in the
profiles in terms of time spent. It can be seen as up to 3.5-4.5%.
Rewrite the copy-in logic, saving about 0.5% of the time.

Signed-off-by: Jens Axboe <[email protected]>
---
 io_uring/io_uring.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 8952453ea807..612e7d66f845 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3239,6 +3239,7 @@ static int io_validate_ext_arg(unsigned flags, const void __user *argp, size_t a
 static int io_get_ext_arg(unsigned flags, const void __user *argp,
 			  struct ext_arg *ext_arg)
 {
+	const struct io_uring_getevents_arg __user *uarg = argp;
 	struct io_uring_getevents_arg arg;
 
 	/*
@@ -3256,8 +3257,13 @@ static int io_get_ext_arg(unsigned flags, const void __user *argp,
 	 */
 	if (ext_arg->argsz != sizeof(arg))
 		return -EINVAL;
-	if (copy_from_user(&arg, argp, sizeof(arg)))
+	if (!user_access_begin(uarg, sizeof(*uarg)))
 		return -EFAULT;
+	unsafe_get_user(arg.sigmask, &uarg->sigmask, uaccess_end);
+	unsafe_get_user(arg.min_wait_usec, &uarg->min_wait_usec, uaccess_end);
+	unsafe_get_user(arg.ts, &uarg->ts, uaccess_end);
+	unsafe_get_user(arg.sigmask_sz, &uarg->sigmask_sz, uaccess_end);
+	user_access_end();
 	ext_arg->min_time = arg.min_wait_usec * NSEC_PER_USEC;
 	ext_arg->sig = u64_to_user_ptr(arg.sigmask);
 	ext_arg->argsz = arg.sigmask_sz;
@@ -3267,6 +3273,9 @@ static int io_get_ext_arg(unsigned flags, const void __user *argp,
 		ext_arg->ts_set = true;
 	}
 	return 0;
+uaccess_end:
+	user_access_end();
+	return -EFAULT;
 }
 
 SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 3/3] io_uring: add support for fixed wait regions
  2024-10-22 20:39 [PATCHSET RFC 0/3] Add support for registered waits Jens Axboe
  2024-10-22 20:39 ` [PATCH 1/3] io_uring: switch struct ext_arg from __kernel_timespec to timespec64 Jens Axboe
  2024-10-22 20:39 ` [PATCH 2/3] io_uring: change io_get_ext_arg() to uaccess begin + end Jens Axboe
@ 2024-10-22 20:39 ` Jens Axboe
  2 siblings, 0 replies; 6+ messages in thread
From: Jens Axboe @ 2024-10-22 20:39 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe

Generally applications have 1 or a few ways of waiting, yet they pass
in a struct io_uring_getevents_arg every time. That struct needs to get
copied and, in turn, the timeout value it points to needs to get copied.

Rather than do this for every invocation, allow the application to
register a fixed set of wait regions that can simply be indexed when
asking the kernel to wait on events.

At ring setup time, the application can register a number of these wait
regions, like so:

	struct io_uring_reg_wait *reg;

	posix_memalign((void **) &reg, page_size, page_size);
	memset(reg, 0, page_size);

	/* set timeout and mark it as set, sigmask/sigmask_sz as needed */
	reg->ts.tv_sec = 0;
	reg->ts.tv_nsec = 100000;
	reg->flags = IORING_REG_WAIT_TS;

	io_uring_register_cqwait_reg(ring, reg, nr_regions);

and instead of doing:

	struct __kernel_timespec timeout = { .tv_nsec = 100000, };

	io_uring_submit_and_wait_timeout(ring, &cqe, nr, &timeout, NULL);

for each submit_and_wait, or just wait, operation, it can just reference
the above region at offset 0 and do:

	io_uring_submit_and_wait_reg(ring, &cqe, nr, 0);

to achieve the same goal of waiting 100usec without needing to copy
both struct io_uring_getevents_arg (24b) and struct __kernel_timespec
(16b) for each invocation. Struct io_uring_reg_wait looks as follows:

struct io_uring_reg_wait {
	struct __kernel_timespec	ts;
	__u32				min_wait_usec;
	__u32				flags;
	__u64				sigmask;
	__u32				sigmask_sz;
	__u32				pad[3];
	__u64				pad2[2];
};

embedding the timeout itself in the region, rather than passing it as
a pointer as well. Note that the signal mask is still passed as a
pointer, both for compatibility reasons and because there don't seem
to be a lot of high frequency wait scenarios that involve setting and
resetting the signal mask for each wait.
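
If a mask is needed, it can be attached to a region up front. A sketch,
assuming a sigset_t the application keeps stable across waits and the
usual _NSIG / 8 kernel sigset size:

	sigset_t mask;

	sigemptyset(&mask);
	sigaddset(&mask, SIGINT);
	reg->sigmask = (unsigned long) &mask;
	reg->sigmask_sz = _NSIG / 8;	/* kernel sigset size */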

The application is free to modify any region before a wait call, or it
can keep multiple regions with different settings to avoid needing to
modify the same one between wait calls. A full page of regions is mapped
by default, allowing PAGE_SIZE / 64 regions to be used.
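
For example, extending the earlier sketch, two regions with different
timeouts could be set up once and then picked per-wait by index:

	/* region 0: 100 usec timeout, region 1: 1 msec timeout */
	reg[0].ts.tv_nsec = 100000;
	reg[0].flags = IORING_REG_WAIT_TS;
	reg[1].ts.tv_nsec = 1000000;
	reg[1].flags = IORING_REG_WAIT_TS;
	io_uring_register_cqwait_reg(ring, reg, 2);

	/* short wait */
	io_uring_submit_and_wait_reg(ring, &cqe, nr, 0);
	/* longer wait */
	io_uring_submit_and_wait_reg(ring, &cqe, nr, 1);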

In network performance testing with zero-copy, this reduced the time
spent waiting on the TX side from 3.12% to 0.3% and the RX side from 4.4%
to 0.3%.

Wait regions are fixed for the lifetime of the ring - once registered,
they are persistent until the ring is torn down.

Signed-off-by: Jens Axboe <[email protected]>
---
 include/linux/io_uring_types.h |  7 ++++
 include/uapi/linux/io_uring.h  | 18 +++++++++
 io_uring/io_uring.c            | 72 ++++++++++++++++++++++++++++------
 io_uring/register.c            | 48 +++++++++++++++++++++++
 4 files changed, 134 insertions(+), 11 deletions(-)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 6d3ee71bd832..40dc1ec37a42 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -327,6 +327,13 @@ struct io_ring_ctx {
 		atomic_t		cq_wait_nr;
 		atomic_t		cq_timeouts;
 		struct wait_queue_head	cq_wait;
+
+		/*
+		 * If registered with IORING_REGISTER_CQWAIT_REG, a single
+		 * page holds N entries, mapped in cq_wait_arg.
+		 */
+		struct page			**cq_wait_page;
+		struct io_uring_reg_wait	*cq_wait_arg;
 	} ____cacheline_aligned_in_smp;
 
 	/* timeouts */
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index c4737892c7cd..4ead32fa9275 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -518,6 +518,7 @@ struct io_cqring_offsets {
 #define IORING_ENTER_EXT_ARG		(1U << 3)
 #define IORING_ENTER_REGISTERED_RING	(1U << 4)
 #define IORING_ENTER_ABS_TIMER		(1U << 5)
+#define IORING_ENTER_EXT_ARG_REG	(1U << 6)
 
 /*
  * Passed in for io_uring_setup(2). Copied back with updated info on success
@@ -618,6 +619,9 @@ enum io_uring_register_op {
 	/* resize CQ ring */
 	IORING_REGISTER_RESIZE_RINGS		= 33,
 
+	/* register fixed io_uring_reg_wait arguments */
+	IORING_REGISTER_CQWAIT_REG		= 34,
+
 	/* this goes last */
 	IORING_REGISTER_LAST,
 
@@ -801,6 +805,20 @@ enum io_uring_register_restriction_op {
 	IORING_RESTRICTION_LAST
 };
 
+enum {
+	IORING_REG_WAIT_TS		= (1U << 0),
+};
+
+struct io_uring_reg_wait {
+	struct __kernel_timespec	ts;
+	__u32				min_wait_usec;
+	__u32				flags;
+	__u64				sigmask;
+	__u32				sigmask_sz;
+	__u32				pad[3];
+	__u64				pad2[2];
+};
+
 struct io_uring_getevents_arg {
 	__u64	sigmask;
 	__u32	sigmask_sz;
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 612e7d66f845..0b76b4becda9 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2735,6 +2735,10 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
 	io_alloc_cache_free(&ctx->msg_cache, io_msg_cache_free);
 	io_futex_cache_free(ctx);
 	io_destroy_buffers(ctx);
+	if (ctx->cq_wait_page) {
+		unsigned short npages = 1;
+		io_pages_unmap(ctx->cq_wait_arg, &ctx->cq_wait_page, &npages, true);
+	}
 	mutex_unlock(&ctx->uring_lock);
 	if (ctx->sq_creds)
 		put_cred(ctx->sq_creds);
@@ -3223,21 +3227,44 @@ void __io_uring_cancel(bool cancel_all)
 	io_uring_cancel_generic(cancel_all, NULL);
 }
 
-static int io_validate_ext_arg(unsigned flags, const void __user *argp, size_t argsz)
+static struct io_uring_reg_wait *io_get_ext_arg_fixed(struct io_ring_ctx *ctx,
+			const struct io_uring_getevents_arg __user *uarg)
+{
+	struct io_uring_reg_wait *arg = READ_ONCE(ctx->cq_wait_arg);
+
+	if (arg) {
+		unsigned int index = (unsigned int) (uintptr_t) uarg;
+
+		if (index >= PAGE_SIZE / sizeof(struct io_uring_reg_wait))
+			return ERR_PTR(-EINVAL);
+		return arg + index;
+	}
+
+	return ERR_PTR(-EFAULT);
+}
+
+static int io_validate_ext_arg(struct io_ring_ctx *ctx, unsigned flags,
+			       const void __user *argp, size_t argsz)
 {
-	if (flags & IORING_ENTER_EXT_ARG) {
-		struct io_uring_getevents_arg arg;
+	struct io_uring_getevents_arg arg;
 
-		if (argsz != sizeof(arg))
+	if (!(flags & IORING_ENTER_EXT_ARG))
+		return 0;
+
+	if (flags & IORING_ENTER_EXT_ARG_REG) {
+		if (argsz != sizeof(struct io_uring_reg_wait))
 			return -EINVAL;
-		if (copy_from_user(&arg, argp, sizeof(arg)))
-			return -EFAULT;
+		return PTR_ERR(io_get_ext_arg_fixed(ctx, argp));
 	}
+	if (argsz != sizeof(arg))
+		return -EINVAL;
+	if (copy_from_user(&arg, argp, sizeof(arg)))
+		return -EFAULT;
 	return 0;
 }
 
-static int io_get_ext_arg(unsigned flags, const void __user *argp,
-			  struct ext_arg *ext_arg)
+static int io_get_ext_arg(struct io_ring_ctx *ctx, unsigned flags,
+			  const void __user *argp, struct ext_arg *ext_arg)
 {
 	const struct io_uring_getevents_arg __user *uarg = argp;
 	struct io_uring_getevents_arg arg;
@@ -3251,6 +3278,28 @@ static int io_get_ext_arg(unsigned flags, const void __user *argp,
 		return 0;
 	}
 
+	if (flags & IORING_ENTER_EXT_ARG_REG) {
+		struct io_uring_reg_wait *w;
+
+		if (ext_arg->argsz != sizeof(struct io_uring_reg_wait))
+			return -EINVAL;
+		w = io_get_ext_arg_fixed(ctx, argp);
+		if (IS_ERR(w))
+			return PTR_ERR(w);
+
+		if (w->flags & ~IORING_REG_WAIT_TS)
+			return -EINVAL;
+		ext_arg->min_time = READ_ONCE(w->min_wait_usec) * NSEC_PER_USEC;
+		ext_arg->sig = u64_to_user_ptr(READ_ONCE(w->sigmask));
+		ext_arg->argsz = READ_ONCE(w->sigmask_sz);
+		if (w->flags & IORING_REG_WAIT_TS) {
+			ext_arg->ts.tv_sec = READ_ONCE(w->ts.tv_sec);
+			ext_arg->ts.tv_nsec = READ_ONCE(w->ts.tv_nsec);
+			ext_arg->ts_set = true;
+		}
+		return 0;
+	}
+
 	/*
 	 * EXT_ARG is set - ensure we agree on the size of it and copy in our
 	 * timespec and sigset_t pointers if good.
@@ -3289,7 +3338,8 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 	if (unlikely(flags & ~(IORING_ENTER_GETEVENTS | IORING_ENTER_SQ_WAKEUP |
 			       IORING_ENTER_SQ_WAIT | IORING_ENTER_EXT_ARG |
 			       IORING_ENTER_REGISTERED_RING |
-			       IORING_ENTER_ABS_TIMER)))
+			       IORING_ENTER_ABS_TIMER |
+			       IORING_ENTER_EXT_ARG_REG)))
 		return -EINVAL;
 
 	/*
@@ -3372,7 +3422,7 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 			 */
 			mutex_lock(&ctx->uring_lock);
 iopoll_locked:
-			ret2 = io_validate_ext_arg(flags, argp, argsz);
+			ret2 = io_validate_ext_arg(ctx, flags, argp, argsz);
 			if (likely(!ret2)) {
 				min_complete = min(min_complete,
 						   ctx->cq_entries);
@@ -3382,7 +3432,7 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 		} else {
 			struct ext_arg ext_arg = { .argsz = argsz };
 
-			ret2 = io_get_ext_arg(flags, argp, &ext_arg);
+			ret2 = io_get_ext_arg(ctx, flags, argp, &ext_arg);
 			if (likely(!ret2)) {
 				min_complete = min(min_complete,
 						   ctx->cq_entries);
diff --git a/io_uring/register.c b/io_uring/register.c
index 8fbce6f268b6..edf6c218b228 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -520,6 +520,48 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
 	return 0;
 }
 
+/*
+ * Register a page holding N entries of struct io_uring_reg_wait, which can
+ * be used via io_uring_enter(2) if IORING_GETEVENTS_EXT_ARG_REG is set.
+ * If that is set with IORING_GETEVENTS_EXT_ARG, then instead of passing
+ * in a pointer for a struct io_uring_getevents_arg, an index into this
+ * registered array is passed, avoiding two (arg + timeout) copies per
+ * invocation.
+ */
+static int io_register_cqwait_reg(struct io_ring_ctx *ctx, void __user *uarg,
+				  unsigned nr_args)
+{
+	struct io_uring_reg_wait *arg;
+	struct page **pages;
+	unsigned long len;
+	int nr_pages;
+
+	if (ctx->cq_wait_page || ctx->cq_wait_arg)
+		return -EBUSY;
+	if (check_mul_overflow(sizeof(*arg), nr_args, &len))
+		return -EOVERFLOW;
+	if (len > PAGE_SIZE)
+		return -EINVAL;
+
+	pages = io_pin_pages((unsigned long) uarg, len, &nr_pages);
+	if (IS_ERR(pages))
+		return PTR_ERR(pages);
+	if (nr_pages != 1) {
+		io_pages_free(&pages, nr_pages);
+		return -EINVAL;
+	}
+
+	arg = vmap(pages, 1, VM_MAP, PAGE_KERNEL);
+	if (arg) {
+		WRITE_ONCE(ctx->cq_wait_page, pages);
+		WRITE_ONCE(ctx->cq_wait_arg, arg);
+		return 0;
+	}
+
+	io_pages_free(&pages, 1);
+	return -ENOMEM;
+}
+
 static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
 			       void __user *arg, unsigned nr_args)
 	__releases(ctx->uring_lock)
@@ -714,6 +756,12 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
 			break;
 		ret = io_register_resize_rings(ctx, arg);
 		break;
+	case IORING_REGISTER_CQWAIT_REG:
+		ret = -EINVAL;
+		if (!arg)
+			break;
+		ret = io_register_cqwait_reg(ctx, arg, nr_args);
+		break;
 	default:
 		ret = -EINVAL;
 		break;
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/3] io_uring: change io_get_ext_arg() to uaccess begin + end
  2024-10-22 20:39 ` [PATCH 2/3] io_uring: change io_get_ext_arg() to uaccess begin + end Jens Axboe
@ 2024-10-22 22:40   ` Keith Busch
  2024-10-23  1:11     ` Jens Axboe
  0 siblings, 1 reply; 6+ messages in thread
From: Keith Busch @ 2024-10-22 22:40 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring

On Tue, Oct 22, 2024 at 02:39:03PM -0600, Jens Axboe wrote:
> In scenarios where a high frequency of wait events is seen, the copy
> of the struct io_uring_getevents_arg is quite noticeable in the
> profiles in terms of time spent. It can be seen as up to 3.5-4.5%.
> Rewrite the copy-in logic, saving about 0.5% of the time.

I'm surprised it's faster to retrieve field by field instead of a
straight copy. But you can't argue with the results!
 
> Signed-off-by: Jens Axboe <[email protected]>
> ---
>  io_uring/io_uring.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index 8952453ea807..612e7d66f845 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -3239,6 +3239,7 @@ static int io_validate_ext_arg(unsigned flags, const void __user *argp, size_t a
>  static int io_get_ext_arg(unsigned flags, const void __user *argp,
>  			  struct ext_arg *ext_arg)
>  {
> +	const struct io_uring_getevents_arg __user *uarg = argp;
>  	struct io_uring_getevents_arg arg;
>  
>  	/*
> @@ -3256,8 +3257,13 @@ static int io_get_ext_arg(unsigned flags, const void __user *argp,
>  	 */
>  	if (ext_arg->argsz != sizeof(arg))
>  		return -EINVAL;
> -	if (copy_from_user(&arg, argp, sizeof(arg)))
> +	if (!user_access_begin(uarg, sizeof(*uarg)))
>  		return -EFAULT;
> +	unsafe_get_user(arg.sigmask, &uarg->sigmask, uaccess_end);
> +	unsafe_get_user(arg.min_wait_usec, &uarg->min_wait_usec, uaccess_end);
> +	unsafe_get_user(arg.ts, &uarg->ts, uaccess_end);
> +	unsafe_get_user(arg.sigmask_sz, &uarg->sigmask_sz, uaccess_end);
> +	user_access_end();
>  	ext_arg->min_time = arg.min_wait_usec * NSEC_PER_USEC;
>  	ext_arg->sig = u64_to_user_ptr(arg.sigmask);
>  	ext_arg->argsz = arg.sigmask_sz;
> @@ -3267,6 +3273,9 @@ static int io_get_ext_arg(unsigned flags, const void __user *argp,
>  		ext_arg->ts_set = true;
>  	}
>  	return 0;
> +uaccess_end:
> +	user_access_end();
> +	return -EFAULT;
>  }

I was looking for the 'goto' for this label and discovered it's in the
macro. Neat.

Reviewed-by: Keith Busch <[email protected]>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/3] io_uring: change io_get_ext_arg() to uaccess begin + end
  2024-10-22 22:40   ` Keith Busch
@ 2024-10-23  1:11     ` Jens Axboe
  0 siblings, 0 replies; 6+ messages in thread
From: Jens Axboe @ 2024-10-23  1:11 UTC (permalink / raw)
  To: Keith Busch; +Cc: io-uring

On 10/22/24 4:40 PM, Keith Busch wrote:
> On Tue, Oct 22, 2024 at 02:39:03PM -0600, Jens Axboe wrote:
>> In scenarios where a high frequency of wait events is seen, the copy
>> of the struct io_uring_getevents_arg is quite noticeable in the
>> profiles in terms of time spent. It can be seen as up to 3.5-4.5%.
>> Rewrite the copy-in logic, saving about 0.5% of the time.
> 
> I'm surprised it's faster to retrieve field by field instead of a
> straight copy. But you can't argue with the results!

It's a pretty common setup - mostly when copying separate entities. But
works here too!
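
For reference, the general shape of it looks something like the below,
with struct foo and its fields just made up for illustration:

static int copy_foo_in(const struct foo __user *ufoo, struct foo *dst)
{
	if (!user_access_begin(ufoo, sizeof(*ufoo)))
		return -EFAULT;
	/* each unsafe_get_user() jumps to the label on a fault */
	unsafe_get_user(dst->a, &ufoo->a, uaccess_end);
	unsafe_get_user(dst->b, &ufoo->b, uaccess_end);
	user_access_end();
	return 0;
uaccess_end:
	user_access_end();
	return -EFAULT;
}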

> I was looking for the 'goto' for this label and discovered it's in the
> macro. Neat.
> 
> Reviewed-by: Keith Busch <[email protected]>

Thanks, will add.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 6+ messages in thread
