public inbox for io-uring@vger.kernel.org
* [PATCH v2 0/2] timeout immediate arg
@ 2026-02-25 10:35 Pavel Begunkov
  2026-02-25 10:35 ` [PATCH v2 1/2] io_uring/timeout: READ_ONCE sqe->addr Pavel Begunkov
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Pavel Begunkov @ 2026-02-25 10:35 UTC (permalink / raw)
  To: io-uring; +Cc: asml.silence, axboe, Keith Busch

Allow the user to pass the timeout value inside the SQE instead of
pointing to a timespec. People asked for it, as it makes user space
simpler. A more detailed description is in Patch 2.

v2: ditto for timeout updates

Pavel Begunkov (2):
  io_uring/timeout: READ_ONCE sqe->addr
  io_uring/timeout: immediate timeout arg

 include/uapi/linux/io_uring.h |  5 +++++
 io_uring/timeout.c            | 28 +++++++++++++++++++++++-----
 2 files changed, 28 insertions(+), 5 deletions(-)

-- 
2.53.0



* [PATCH v2 1/2] io_uring/timeout: READ_ONCE sqe->addr
  2026-02-25 10:35 [PATCH v2 0/2] timeout immediate arg Pavel Begunkov
@ 2026-02-25 10:35 ` Pavel Begunkov
  2026-02-25 10:35 ` [PATCH v2 2/2] io_uring/timeout: immediate timeout arg Pavel Begunkov
  2026-02-25 15:36 ` (subset) [PATCH v2 0/2] timeout immediate arg Jens Axboe
  2 siblings, 0 replies; 17+ messages in thread
From: Pavel Begunkov @ 2026-02-25 10:35 UTC (permalink / raw)
  To: io-uring; +Cc: asml.silence, axboe, Keith Busch

We should use READ_ONCE when reading from an SQE, so make sure the
timeout code gets a stable timespec address.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/timeout.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/io_uring/timeout.c b/io_uring/timeout.c
index 84dda24f3eb2..cb61d4862fc6 100644
--- a/io_uring/timeout.c
+++ b/io_uring/timeout.c
@@ -462,7 +462,7 @@ int io_timeout_remove_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 			tr->ltimeout = true;
 		if (tr->flags & ~(IORING_TIMEOUT_UPDATE_MASK|IORING_TIMEOUT_ABS))
 			return -EINVAL;
-		if (get_timespec64(&tr->ts, u64_to_user_ptr(sqe->addr2)))
+		if (get_timespec64(&tr->ts, u64_to_user_ptr(READ_ONCE(sqe->addr2))))
 			return -EFAULT;
 		if (tr->ts.tv_sec < 0 || tr->ts.tv_nsec < 0)
 			return -EINVAL;
@@ -557,7 +557,7 @@ static int __io_timeout_prep(struct io_kiocb *req,
 	data->req = req;
 	data->flags = flags;
 
-	if (get_timespec64(&data->ts, u64_to_user_ptr(sqe->addr)))
+	if (get_timespec64(&data->ts, u64_to_user_ptr(READ_ONCE(sqe->addr))))
 		return -EFAULT;
 
 	if (data->ts.tv_sec < 0 || data->ts.tv_nsec < 0)
-- 
2.53.0



* [PATCH v2 2/2] io_uring/timeout: immediate timeout arg
  2026-02-25 10:35 [PATCH v2 0/2] timeout immediate arg Pavel Begunkov
  2026-02-25 10:35 ` [PATCH v2 1/2] io_uring/timeout: READ_ONCE sqe->addr Pavel Begunkov
@ 2026-02-25 10:35 ` Pavel Begunkov
  2026-02-27 14:08   ` Stefan Metzmacher
  2026-02-25 15:36 ` (subset) [PATCH v2 0/2] timeout immediate arg Jens Axboe
  2 siblings, 1 reply; 17+ messages in thread
From: Pavel Begunkov @ 2026-02-25 10:35 UTC (permalink / raw)
  To: io-uring; +Cc: asml.silence, axboe, Keith Busch

One of the things the user always has to keep in mind is that any user
pointers they put into an SQE are not going to be read by the kernel
until submission happens, and the user has to ensure the pointee
stays alive until then. For example, this snippet:

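void prep_timeout(struct io_uring_sqe *sqe) {
	struct __kernel_timespec ts = {...};
	/* only the address of 'ts' is stored in the SQE */
	io_uring_prep_timeout(sqe, &ts, 0, 0);
}

void submit() {
	sqe = get_sqe();
	prep_timeout(sqe);
	/* 'ts' is out of scope by the time the kernel reads it */
	io_uring_submit();
}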

would lead to UAF for the on-stack variable 'ts'. Instead of passing
the timeout value as a pointer, allow storing it directly in the SQE.
The user has to set a new flag called IORING_TIMEOUT_IMMEDIATE_ARG,
in which case sqe->addr (sqe->addr2 for timeout updates) will be
interpreted as the timeout value in ns. It only works with relative
timeouts and is rejected if set together with IORING_TIMEOUT_ABS, out
of concern that u64 might not have enough range to make a good
long-term API.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 include/uapi/linux/io_uring.h |  5 +++++
 io_uring/timeout.c            | 28 +++++++++++++++++++++++-----
 2 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 6750c383a2ab..8f4de786e6e9 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -340,6 +340,10 @@ enum io_uring_op {
 
 /*
  * sqe->timeout_flags
+ *
+ * IORING_TIMEOUT_IMMEDIATE_ARG:	If set, sqe->addr stores the timeout
+ *					value in nanoseconds instead of
+ *					pointing to a timespec.
  */
 #define IORING_TIMEOUT_ABS		(1U << 0)
 #define IORING_TIMEOUT_UPDATE		(1U << 1)
@@ -348,6 +352,7 @@ enum io_uring_op {
 #define IORING_LINK_TIMEOUT_UPDATE	(1U << 4)
 #define IORING_TIMEOUT_ETIME_SUCCESS	(1U << 5)
 #define IORING_TIMEOUT_MULTISHOT	(1U << 6)
+#define IORING_TIMEOUT_IMMEDIATE_ARG	(1U << 7)
 #define IORING_TIMEOUT_CLOCK_MASK	(IORING_TIMEOUT_BOOTTIME | IORING_TIMEOUT_REALTIME)
 #define IORING_TIMEOUT_UPDATE_MASK	(IORING_TIMEOUT_UPDATE | IORING_LINK_TIMEOUT_UPDATE)
 /*
diff --git a/io_uring/timeout.c b/io_uring/timeout.c
index cb61d4862fc6..a0d1db98d1fc 100644
--- a/io_uring/timeout.c
+++ b/io_uring/timeout.c
@@ -446,6 +446,7 @@ static int io_timeout_update(struct io_ring_ctx *ctx, __u64 user_data,
 int io_timeout_remove_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
 	struct io_timeout_rem *tr = io_kiocb_to_cmd(req, struct io_timeout_rem);
+	__u64 arg;
 
 	if (unlikely(req->flags & (REQ_F_FIXED_FILE | REQ_F_BUFFER_SELECT)))
 		return -EINVAL;
@@ -460,10 +461,20 @@ int io_timeout_remove_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 			return -EINVAL;
 		if (tr->flags & IORING_LINK_TIMEOUT_UPDATE)
 			tr->ltimeout = true;
-		if (tr->flags & ~(IORING_TIMEOUT_UPDATE_MASK|IORING_TIMEOUT_ABS))
+		if (tr->flags & ~(IORING_TIMEOUT_UPDATE_MASK |
+				  IORING_TIMEOUT_ABS |
+				  IORING_TIMEOUT_IMMEDIATE_ARG))
 			return -EINVAL;
-		if (get_timespec64(&tr->ts, u64_to_user_ptr(READ_ONCE(sqe->addr2))))
+
+		arg = READ_ONCE(sqe->addr2);
+		if (tr->flags & IORING_TIMEOUT_IMMEDIATE_ARG) {
+			if (tr->flags & IORING_TIMEOUT_ABS)
+				return -EINVAL;
+			tr->ts = ns_to_timespec64(arg);
+		} else if (get_timespec64(&tr->ts, u64_to_user_ptr(arg))) {
 			return -EFAULT;
+		}
+
 		if (tr->ts.tv_sec < 0 || tr->ts.tv_nsec < 0)
 			return -EINVAL;
 	} else if (tr->flags) {
@@ -518,8 +529,8 @@ static int __io_timeout_prep(struct io_kiocb *req,
 {
 	struct io_timeout *timeout = io_kiocb_to_cmd(req, struct io_timeout);
 	struct io_timeout_data *data;
-	unsigned flags;
 	u32 off = READ_ONCE(sqe->off);
+	unsigned flags;
 
 	if (sqe->buf_index || sqe->len != 1 || sqe->splice_fd_in)
 		return -EINVAL;
@@ -528,7 +539,8 @@ static int __io_timeout_prep(struct io_kiocb *req,
 	flags = READ_ONCE(sqe->timeout_flags);
 	if (flags & ~(IORING_TIMEOUT_ABS | IORING_TIMEOUT_CLOCK_MASK |
 		      IORING_TIMEOUT_ETIME_SUCCESS |
-		      IORING_TIMEOUT_MULTISHOT))
+		      IORING_TIMEOUT_MULTISHOT |
+		      IORING_TIMEOUT_IMMEDIATE_ARG))
 		return -EINVAL;
 	/* more than one clock specified is invalid, obviously */
 	if (hweight32(flags & IORING_TIMEOUT_CLOCK_MASK) > 1)
@@ -557,8 +569,14 @@ static int __io_timeout_prep(struct io_kiocb *req,
 	data->req = req;
 	data->flags = flags;
 
-	if (get_timespec64(&data->ts, u64_to_user_ptr(READ_ONCE(sqe->addr))))
+	if (flags & IORING_TIMEOUT_IMMEDIATE_ARG) {
+		if (flags & IORING_TIMEOUT_ABS)
+			return -EINVAL;
+		data->ts = ns_to_timespec64(READ_ONCE(sqe->addr));
+	} else if (get_timespec64(&data->ts,
+				  u64_to_user_ptr(READ_ONCE(sqe->addr)))) {
 		return -EFAULT;
+	}
 
 	if (data->ts.tv_sec < 0 || data->ts.tv_nsec < 0)
 		return -EINVAL;
-- 
2.53.0
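
The immediate branch added above can be modelled in user space roughly as follows. This is only a sketch: the two flag values are copied from the uapi hunk, while `imm_timespec64` and `imm_parse()` are hypothetical names, and the rejection of values above INT64_MAX assumes that ns_to_timespec64() takes a signed s64 so that such values come out negative and hit the existing tv_sec < 0 check.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values copied from the uapi hunk in this patch. */
#define IORING_TIMEOUT_ABS		(1U << 0)
#define IORING_TIMEOUT_IMMEDIATE_ARG	(1U << 7)

#define IMM_NSEC_PER_SEC	1000000000ULL

/* Hypothetical stand-in for the kernel's struct timespec64. */
struct imm_timespec64 {
	int64_t tv_sec;
	long tv_nsec;
};

/*
 * Model of the immediate-argument path: ABS is rejected, otherwise the
 * nanosecond count is split into seconds and a sub-second remainder.
 * Returns 0 on success or -EINVAL.
 */
static int imm_parse(unsigned int flags, uint64_t arg,
		     struct imm_timespec64 *ts)
{
	assert(ts != NULL);

	if (flags & IORING_TIMEOUT_ABS)
		return -EINVAL;
	/* Assumption: a value that is negative as s64 ends up rejected. */
	if (arg > (uint64_t)INT64_MAX)
		return -EINVAL;

	ts->tv_sec = (int64_t)(arg / IMM_NSEC_PER_SEC);
	ts->tv_nsec = (long)(arg % IMM_NSEC_PER_SEC);
	return 0;
}
```

With this model, an argument of 1500000000 yields tv_sec = 1 and tv_nsec = 500000000, and combining the flag with IORING_TIMEOUT_ABS fails with -EINVAL.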



* Re: (subset) [PATCH v2 0/2] timeout immediate arg
  2026-02-25 10:35 [PATCH v2 0/2] timeout immediate arg Pavel Begunkov
  2026-02-25 10:35 ` [PATCH v2 1/2] io_uring/timeout: READ_ONCE sqe->addr Pavel Begunkov
  2026-02-25 10:35 ` [PATCH v2 2/2] io_uring/timeout: immediate timeout arg Pavel Begunkov
@ 2026-02-25 15:36 ` Jens Axboe
  2 siblings, 0 replies; 17+ messages in thread
From: Jens Axboe @ 2026-02-25 15:36 UTC (permalink / raw)
  To: io-uring, Pavel Begunkov; +Cc: Keith Busch


On Wed, 25 Feb 2026 10:35:56 +0000, Pavel Begunkov wrote:
> Allow the user to pass the timeout value inside the SQE instead of
> pointing to a timespec, people asked for it as it makes user space
> simpler. A more detailed description is in Patch 2.
> 
> v2: ditto for timeout updates
> 
> Pavel Begunkov (2):
>   io_uring/timeout: READ_ONCE sqe->addr
>   io_uring/timeout: immediate timeout arg
> 
> [...]

Applied, thanks!

[1/2] io_uring/timeout: READ_ONCE sqe->addr
      commit: 85f6c439a69afe4fa8a688512e586971e97e273a

Best regards,
-- 
Jens Axboe





* Re: [PATCH v2 2/2] io_uring/timeout: immediate timeout arg
  2026-02-25 10:35 ` [PATCH v2 2/2] io_uring/timeout: immediate timeout arg Pavel Begunkov
@ 2026-02-27 14:08   ` Stefan Metzmacher
  2026-02-27 15:05     ` Jens Axboe
  2026-02-27 19:08     ` Pavel Begunkov
  0 siblings, 2 replies; 17+ messages in thread
From: Stefan Metzmacher @ 2026-02-27 14:08 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring; +Cc: axboe, Keith Busch

Hi Pavel,

>   	if (unlikely(req->flags & (REQ_F_FIXED_FILE | REQ_F_BUFFER_SELECT)))
>   		return -EINVAL;
> @@ -460,10 +461,20 @@ int io_timeout_remove_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>   			return -EINVAL;
>   		if (tr->flags & IORING_LINK_TIMEOUT_UPDATE)
>   			tr->ltimeout = true;
> -		if (tr->flags & ~(IORING_TIMEOUT_UPDATE_MASK|IORING_TIMEOUT_ABS))
> +		if (tr->flags & ~(IORING_TIMEOUT_UPDATE_MASK |
> +				  IORING_TIMEOUT_ABS |
> +				  IORING_TIMEOUT_IMMEDIATE_ARG))
>   			return -EINVAL;
> -		if (get_timespec64(&tr->ts, u64_to_user_ptr(READ_ONCE(sqe->addr2))))
> +
> +		arg = READ_ONCE(sqe->addr2);
> +		if (tr->flags & IORING_TIMEOUT_IMMEDIATE_ARG) {
> +			if (tr->flags & IORING_TIMEOUT_ABS)
> +				return -EINVAL;
> +			tr->ts = ns_to_timespec64(arg);

I'm wondering if there is enough free space in a small sqe to hold a full timespec?
So that there is no restriction for IORING_TIMEOUT_ABS...

metze



* Re: [PATCH v2 2/2] io_uring/timeout: immediate timeout arg
  2026-02-27 14:08   ` Stefan Metzmacher
@ 2026-02-27 15:05     ` Jens Axboe
  2026-02-27 16:17       ` Stefan Metzmacher
  2026-02-27 19:08     ` Pavel Begunkov
  1 sibling, 1 reply; 17+ messages in thread
From: Jens Axboe @ 2026-02-27 15:05 UTC (permalink / raw)
  To: Stefan Metzmacher, Pavel Begunkov, io-uring; +Cc: Keith Busch

On 2/27/26 7:08 AM, Stefan Metzmacher wrote:
> Hi Pavel,
> 
>>       if (unlikely(req->flags & (REQ_F_FIXED_FILE | REQ_F_BUFFER_SELECT)))
>>           return -EINVAL;
>> @@ -460,10 +461,20 @@ int io_timeout_remove_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>               return -EINVAL;
>>           if (tr->flags & IORING_LINK_TIMEOUT_UPDATE)
>>               tr->ltimeout = true;
>> -        if (tr->flags & ~(IORING_TIMEOUT_UPDATE_MASK|IORING_TIMEOUT_ABS))
>> +        if (tr->flags & ~(IORING_TIMEOUT_UPDATE_MASK |
>> +                  IORING_TIMEOUT_ABS |
>> +                  IORING_TIMEOUT_IMMEDIATE_ARG))
>>               return -EINVAL;
>> -        if (get_timespec64(&tr->ts, u64_to_user_ptr(READ_ONCE(sqe->addr2))))
>> +
>> +        arg = READ_ONCE(sqe->addr2);
>> +        if (tr->flags & IORING_TIMEOUT_IMMEDIATE_ARG) {
>> +            if (tr->flags & IORING_TIMEOUT_ABS)
>> +                return -EINVAL;
>> +            tr->ts = ns_to_timespec64(arg);
> 
> I'm wondering if there is enough free space in a small sqe to hold a full timespec?
> So that there is no restriction for IORING_TIMEOUT_ABS...

There's ->addr3 for another 8b value, so yes it should very much be
possible. I quite like that idea, it'll then be the same as the regular
timeout options, except the values are passed directly in the sqe rather
than needing the copy.

-- 
Jens Axboe


* Re: [PATCH v2 2/2] io_uring/timeout: immediate timeout arg
  2026-02-27 15:05     ` Jens Axboe
@ 2026-02-27 16:17       ` Stefan Metzmacher
  2026-02-27 16:21         ` Jens Axboe
  0 siblings, 1 reply; 17+ messages in thread
From: Stefan Metzmacher @ 2026-02-27 16:17 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov, io-uring; +Cc: Keith Busch

Am 27.02.26 um 16:05 schrieb Jens Axboe:
> On 2/27/26 7:08 AM, Stefan Metzmacher wrote:
>> Hi Pavel,
>>
>>>        if (unlikely(req->flags & (REQ_F_FIXED_FILE | REQ_F_BUFFER_SELECT)))
>>>            return -EINVAL;
>>> @@ -460,10 +461,20 @@ int io_timeout_remove_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>>                return -EINVAL;
>>>            if (tr->flags & IORING_LINK_TIMEOUT_UPDATE)
>>>                tr->ltimeout = true;
>>> -        if (tr->flags & ~(IORING_TIMEOUT_UPDATE_MASK|IORING_TIMEOUT_ABS))
>>> +        if (tr->flags & ~(IORING_TIMEOUT_UPDATE_MASK |
>>> +                  IORING_TIMEOUT_ABS |
>>> +                  IORING_TIMEOUT_IMMEDIATE_ARG))
>>>                return -EINVAL;
>>> -        if (get_timespec64(&tr->ts, u64_to_user_ptr(READ_ONCE(sqe->addr2))))
>>> +
>>> +        arg = READ_ONCE(sqe->addr2);
>>> +        if (tr->flags & IORING_TIMEOUT_IMMEDIATE_ARG) {
>>> +            if (tr->flags & IORING_TIMEOUT_ABS)
>>> +                return -EINVAL;
>>> +            tr->ts = ns_to_timespec64(arg);
>>
>> I'm wondering if there is enough free space in a small sqe to hold a full timespec?
>> So that there is no restriction for IORING_TIMEOUT_ABS...
> 
> There's ->addr3 for another 8b value, so yes it should very much be
> possible. I quite like that idea, it'll then be the same as the regular
> timeout options, except the values are passed directly in the sqe rather
> than needing the copy.

Yes, attr_ptr and attr_type_mask would even be two 8b values together,
basically the same as 'struct __kernel_timespec', correct?

So we could even add 'struct __kernel_timespec ts;' to the last
union of sqe...

metze


* Re: [PATCH v2 2/2] io_uring/timeout: immediate timeout arg
  2026-02-27 16:17       ` Stefan Metzmacher
@ 2026-02-27 16:21         ` Jens Axboe
  0 siblings, 0 replies; 17+ messages in thread
From: Jens Axboe @ 2026-02-27 16:21 UTC (permalink / raw)
  To: Stefan Metzmacher, Pavel Begunkov, io-uring; +Cc: Keith Busch

On 2/27/26 9:17 AM, Stefan Metzmacher wrote:
> Am 27.02.26 um 16:05 schrieb Jens Axboe:
>> On 2/27/26 7:08 AM, Stefan Metzmacher wrote:
>>> Hi Pavel,
>>>
>>>>        if (unlikely(req->flags & (REQ_F_FIXED_FILE | REQ_F_BUFFER_SELECT)))
>>>>            return -EINVAL;
>>>> @@ -460,10 +461,20 @@ int io_timeout_remove_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>>>                return -EINVAL;
>>>>            if (tr->flags & IORING_LINK_TIMEOUT_UPDATE)
>>>>                tr->ltimeout = true;
>>>> -        if (tr->flags & ~(IORING_TIMEOUT_UPDATE_MASK|IORING_TIMEOUT_ABS))
>>>> +        if (tr->flags & ~(IORING_TIMEOUT_UPDATE_MASK |
>>>> +                  IORING_TIMEOUT_ABS |
>>>> +                  IORING_TIMEOUT_IMMEDIATE_ARG))
>>>>                return -EINVAL;
>>>> -        if (get_timespec64(&tr->ts, u64_to_user_ptr(READ_ONCE(sqe->addr2))))
>>>> +
>>>> +        arg = READ_ONCE(sqe->addr2);
>>>> +        if (tr->flags & IORING_TIMEOUT_IMMEDIATE_ARG) {
>>>> +            if (tr->flags & IORING_TIMEOUT_ABS)
>>>> +                return -EINVAL;
>>>> +            tr->ts = ns_to_timespec64(arg);
>>>
>>> I'm wondering if there is enough free space in a small sqe to hold a full timespec?
>>> So that there is no restriction for IORING_TIMEOUT_ABS...
>>
>> There's ->addr3 for another 8b value, so yes it should very much be
>> possible. I quite like that idea, it'll then be the same as the regular
>> timeout options, except the values are passed directly in the sqe rather
>> than needing the copy.
> 
> Yes, attr_ptr and attr_type_mask would even be two 8b values together,
> basically the same as 'struct __kernel_timespec', correct?
> 
> So we could even add 'struct __kernel_timespec ts;' to the last
> union of sqe...

Let's just encode it in addr and addr3, I'd prefer not to further
obfuscate the sqe definition! And on the liburing side, you just add an
immediate helper for setting it up like that, done.

-- 
Jens Axboe
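
One way the addr/addr3 encoding suggested above might look is sketched below. The thread does not say which field would carry which half, so the mapping (seconds in addr, nanoseconds in addr3) is an assumption, and `enc_sqe_fields`, `enc_kernel_timespec`, `enc_encode_ts()` and `enc_decode_ts()` are hypothetical names rather than kernel or liburing API.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the two 64-bit SQE fields named in the thread. */
struct enc_sqe_fields {
	uint64_t addr;
	uint64_t addr3;
};

/* Same shape as struct __kernel_timespec: two 64-bit signed values. */
struct enc_kernel_timespec {
	int64_t tv_sec;
	long long tv_nsec;
};

/* Assumed mapping: seconds go into addr, nanoseconds into addr3. */
static void enc_encode_ts(struct enc_sqe_fields *f,
			  const struct enc_kernel_timespec *ts)
{
	f->addr = (uint64_t)ts->tv_sec;
	f->addr3 = (uint64_t)ts->tv_nsec;
}

/* Decode and apply the same negative-value check the patch already uses. */
static int enc_decode_ts(const struct enc_sqe_fields *f,
			 struct enc_kernel_timespec *ts)
{
	ts->tv_sec = (int64_t)f->addr;
	ts->tv_nsec = (long long)f->addr3;
	if (ts->tv_sec < 0 || ts->tv_nsec < 0)
		return -EINVAL;
	return 0;
}
```

Since both halves travel unchanged, the receiving side needs no division, and the same encoding would serve relative and absolute timeouts alike.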


* Re: [PATCH v2 2/2] io_uring/timeout: immediate timeout arg
  2026-02-27 14:08   ` Stefan Metzmacher
  2026-02-27 15:05     ` Jens Axboe
@ 2026-02-27 19:08     ` Pavel Begunkov
  2026-02-27 19:39       ` Jens Axboe
  1 sibling, 1 reply; 17+ messages in thread
From: Pavel Begunkov @ 2026-02-27 19:08 UTC (permalink / raw)
  To: Stefan Metzmacher, io-uring; +Cc: axboe, Keith Busch

On 2/27/26 14:08, Stefan Metzmacher wrote:
> Hi Pavel,
> 
>>       if (unlikely(req->flags & (REQ_F_FIXED_FILE | REQ_F_BUFFER_SELECT)))
>>           return -EINVAL;
>> @@ -460,10 +461,20 @@ int io_timeout_remove_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>               return -EINVAL;
>>           if (tr->flags & IORING_LINK_TIMEOUT_UPDATE)
>>               tr->ltimeout = true;
>> -        if (tr->flags & ~(IORING_TIMEOUT_UPDATE_MASK|IORING_TIMEOUT_ABS))
>> +        if (tr->flags & ~(IORING_TIMEOUT_UPDATE_MASK |
>> +                  IORING_TIMEOUT_ABS |
>> +                  IORING_TIMEOUT_IMMEDIATE_ARG))
>>               return -EINVAL;
>> -        if (get_timespec64(&tr->ts, u64_to_user_ptr(READ_ONCE(sqe->addr2))))
>> +
>> +        arg = READ_ONCE(sqe->addr2);
>> +        if (tr->flags & IORING_TIMEOUT_IMMEDIATE_ARG) {
>> +            if (tr->flags & IORING_TIMEOUT_ABS)
>> +                return -EINVAL;
>> +            tr->ts = ns_to_timespec64(arg);
> 
> I'm wondering if there is enough free space in a small sqe to hold a full timespec?
> So that there is no restriction for IORING_TIMEOUT_ABS...

Well, u64 gives ~500 years in ns, it should be fine to just
allow the abs mode. We just need to make sure to zero check
the unused fields in case it'd need to be extended.

-- 
Pavel Begunkov
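
As a rough check of the range figure above, the arithmetic can be written out as below. `rng_years()` is a hypothetical helper, and a year is taken as 365.25 days, which is an averaging assumption rather than anything from the thread. Under that assumption an unsigned 64-bit nanosecond counter covers 584 whole years and a signed one 292.

```c
#include <assert.h>
#include <stdint.h>

#define RNG_NSEC_PER_SEC	1000000000ULL
/* 365.25 days of 86400 seconds each; an averaging assumption. */
#define RNG_SEC_PER_YEAR	31557600ULL

/* Whole years representable by a nanosecond counter with this maximum. */
static uint64_t rng_years(uint64_t max_ns)
{
	return max_ns / RNG_NSEC_PER_SEC / RNG_SEC_PER_YEAR;
}
```

For an absolute CLOCK_REALTIME value the count starts at 1970, so the usable horizon is the same number of years counted from the epoch.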



* Re: [PATCH v2 2/2] io_uring/timeout: immediate timeout arg
  2026-02-27 19:08     ` Pavel Begunkov
@ 2026-02-27 19:39       ` Jens Axboe
  2026-02-27 20:03         ` Pavel Begunkov
  0 siblings, 1 reply; 17+ messages in thread
From: Jens Axboe @ 2026-02-27 19:39 UTC (permalink / raw)
  To: Pavel Begunkov, Stefan Metzmacher, io-uring; +Cc: Keith Busch

On 2/27/26 12:08 PM, Pavel Begunkov wrote:
> On 2/27/26 14:08, Stefan Metzmacher wrote:
>> Hi Pavel,
>>
>>>       if (unlikely(req->flags & (REQ_F_FIXED_FILE | REQ_F_BUFFER_SELECT)))
>>>           return -EINVAL;
>>> @@ -460,10 +461,20 @@ int io_timeout_remove_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>>               return -EINVAL;
>>>           if (tr->flags & IORING_LINK_TIMEOUT_UPDATE)
>>>               tr->ltimeout = true;
>>> -        if (tr->flags & ~(IORING_TIMEOUT_UPDATE_MASK|IORING_TIMEOUT_ABS))
>>> +        if (tr->flags & ~(IORING_TIMEOUT_UPDATE_MASK |
>>> +                  IORING_TIMEOUT_ABS |
>>> +                  IORING_TIMEOUT_IMMEDIATE_ARG))
>>>               return -EINVAL;
>>> -        if (get_timespec64(&tr->ts, u64_to_user_ptr(READ_ONCE(sqe->addr2))))
>>> +
>>> +        arg = READ_ONCE(sqe->addr2);
>>> +        if (tr->flags & IORING_TIMEOUT_IMMEDIATE_ARG) {
>>> +            if (tr->flags & IORING_TIMEOUT_ABS)
>>> +                return -EINVAL;
>>> +            tr->ts = ns_to_timespec64(arg);
>>
>> I'm wondering if there is enough free space in a small sqe to hold a full timespec?
>> So that there is no restriction for IORING_TIMEOUT_ABS...
> 
> Well, u64 gives ~500 years in ns, it should be fine to just
> allow the abs mode. We just need to make sure to zero check
> the unused fields in case it'd need to be extended.

I don't think it's about length of it - if you can avoid the div done
by ns_to_timespec64(), that might be very useful? Would make
userspace simpler too potentially, and basically make the immediate mode
_exactly_ the same as the non-immediate mode, it just delivers the
__kernel_timespec in a different way.

-- 
Jens Axboe


* Re: [PATCH v2 2/2] io_uring/timeout: immediate timeout arg
  2026-02-27 19:39       ` Jens Axboe
@ 2026-02-27 20:03         ` Pavel Begunkov
  2026-02-27 20:19           ` Jens Axboe
  0 siblings, 1 reply; 17+ messages in thread
From: Pavel Begunkov @ 2026-02-27 20:03 UTC (permalink / raw)
  To: Jens Axboe, Stefan Metzmacher, io-uring; +Cc: Keith Busch

On 2/27/26 19:39, Jens Axboe wrote:
> On 2/27/26 12:08 PM, Pavel Begunkov wrote:
>> On 2/27/26 14:08, Stefan Metzmacher wrote:
>>> Hi Pavel,
>>>
>>>>        if (unlikely(req->flags & (REQ_F_FIXED_FILE | REQ_F_BUFFER_SELECT)))
>>>>            return -EINVAL;
>>>> @@ -460,10 +461,20 @@ int io_timeout_remove_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>>>                return -EINVAL;
>>>>            if (tr->flags & IORING_LINK_TIMEOUT_UPDATE)
>>>>                tr->ltimeout = true;
>>>> -        if (tr->flags & ~(IORING_TIMEOUT_UPDATE_MASK|IORING_TIMEOUT_ABS))
>>>> +        if (tr->flags & ~(IORING_TIMEOUT_UPDATE_MASK |
>>>> +                  IORING_TIMEOUT_ABS |
>>>> +                  IORING_TIMEOUT_IMMEDIATE_ARG))
>>>>                return -EINVAL;
>>>> -        if (get_timespec64(&tr->ts, u64_to_user_ptr(READ_ONCE(sqe->addr2))))
>>>> +
>>>> +        arg = READ_ONCE(sqe->addr2);
>>>> +        if (tr->flags & IORING_TIMEOUT_IMMEDIATE_ARG) {
>>>> +            if (tr->flags & IORING_TIMEOUT_ABS)
>>>> +                return -EINVAL;
>>>> +            tr->ts = ns_to_timespec64(arg);
>>>
>>> I'm wondering if there is enough free space in a small sqe to hold a full timespec?
>>> So that there is no restriction for IORING_TIMEOUT_ABS...
>>
>> Well, u64 gives ~500 years in ns, it should be fine to just
>> allow the abs mode. We just need to make sure to zero check
>> the unused fields in case it'd need to be extended.
> 
> I don't think it's about length of it - if you can avoid the div by
> doing ns_to_timespec64(), that might be very useful? Would make

hrtimer_start(&data->timer, timespec64_to_ktime(data->ts), mode);
                                    ^^^

io_uring just needs to flip it and use ktime, but I left it for later.

> userspace simpler too potentially, and basically make the immediate mode
> _exactly_ the same as the non-immediate mode, it just delivers the
> __kernel_timespec in a different way.

I very much want to believe that everything about kernel_timespec has
some deep meaning, but I fail to see why they split it as sec/ns and
left invalid ranges for ns, why ns is signed, and why even after a
large revamp one of the fields doesn't use a fixed width type.
I'm not sure making it exactly like that is actually a good idea.

-- 
Pavel Begunkov
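
The round trip pointed at above (nanoseconds split into a timespec, then recombined for the timer) can be sketched in user space as follows. `rt_timespec64`, `rt_ns_to_ts()` and `rt_ts_to_ns()` are hypothetical stand-ins for the kernel helpers; the sketch handles non-negative values only and leaves out any overflow handling the kernel versions may do.

```c
#include <assert.h>
#include <stdint.h>

#define RT_NSEC_PER_SEC	1000000000LL

/* Hypothetical stand-in for the kernel's struct timespec64. */
struct rt_timespec64 {
	int64_t tv_sec;
	long tv_nsec;
};

/* Split a non-negative nanosecond count: one division, one remainder. */
static struct rt_timespec64 rt_ns_to_ts(int64_t ns)
{
	struct rt_timespec64 ts;

	assert(ns >= 0);
	ts.tv_sec = ns / RT_NSEC_PER_SEC;
	ts.tv_nsec = (long)(ns % RT_NSEC_PER_SEC);
	return ts;
}

/* Recombine into nanoseconds: one multiply and one add, no overflow check. */
static int64_t rt_ts_to_ns(struct rt_timespec64 ts)
{
	return ts.tv_sec * RT_NSEC_PER_SEC + ts.tv_nsec;
}
```

For in-range input the second step returns exactly the value the first step started from, which is why keeping the value as nanoseconds end to end would skip both conversions.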



* Re: [PATCH v2 2/2] io_uring/timeout: immediate timeout arg
  2026-02-27 20:03         ` Pavel Begunkov
@ 2026-02-27 20:19           ` Jens Axboe
  2026-02-27 21:09             ` Pavel Begunkov
  0 siblings, 1 reply; 17+ messages in thread
From: Jens Axboe @ 2026-02-27 20:19 UTC (permalink / raw)
  To: Pavel Begunkov, Stefan Metzmacher, io-uring; +Cc: Keith Busch

On 2/27/26 1:03 PM, Pavel Begunkov wrote:
> On 2/27/26 19:39, Jens Axboe wrote:
>> On 2/27/26 12:08 PM, Pavel Begunkov wrote:
>>> On 2/27/26 14:08, Stefan Metzmacher wrote:
>>>> Hi Pavel,
>>>>
>>>>>        if (unlikely(req->flags & (REQ_F_FIXED_FILE | REQ_F_BUFFER_SELECT)))
>>>>>            return -EINVAL;
>>>>> @@ -460,10 +461,20 @@ int io_timeout_remove_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>>>>                return -EINVAL;
>>>>>            if (tr->flags & IORING_LINK_TIMEOUT_UPDATE)
>>>>>                tr->ltimeout = true;
>>>>> -        if (tr->flags & ~(IORING_TIMEOUT_UPDATE_MASK|IORING_TIMEOUT_ABS))
>>>>> +        if (tr->flags & ~(IORING_TIMEOUT_UPDATE_MASK |
>>>>> +                  IORING_TIMEOUT_ABS |
>>>>> +                  IORING_TIMEOUT_IMMEDIATE_ARG))
>>>>>                return -EINVAL;
>>>>> -        if (get_timespec64(&tr->ts, u64_to_user_ptr(READ_ONCE(sqe->addr2))))
>>>>> +
>>>>> +        arg = READ_ONCE(sqe->addr2);
>>>>> +        if (tr->flags & IORING_TIMEOUT_IMMEDIATE_ARG) {
>>>>> +            if (tr->flags & IORING_TIMEOUT_ABS)
>>>>> +                return -EINVAL;
>>>>> +            tr->ts = ns_to_timespec64(arg);
>>>>
>>>> I'm wondering if there is enough free space in a small sqe to hold a full timespec?
>>>> So that there is no restriction for IORING_TIMEOUT_ABS...
>>>
>>> Well, u64 gives ~500 years in ns, it should be fine to just
>>> allow the abs mode. We just need to make sure to zero check
>>> the unused fields in case it'd need to be extended.
>>
>> I don't think it's about length of it - if you can avoid the div by
>> doing ns_to_timespec64(), that might be very useful? Would make
> 
> hrtimer_start(&data->timer, timespec64_to_ktime(data->ts), mode);
>                                    ^^^
> 
> io_uring just needs to flip it and use ktime, but I left it for later.

I think we should go all the way with this if we're doing the immediate
mode. And I do think that is a good idea. Doing a half-way thing doesn't
make much sense to me.

>> userspace simpler too potentially, and basically make the immediate mode
>> _exactly_ the same as the non-immediate mode, it just delivers the
>> __kernel_timespec in a different way.
> 
> I very much want to believe that everything about kernel_timespec has
> some deep meaning, but I fail to see why they split it as sec/ns and
> left invalid ranges for ns, why ns is signed, and why even after a
> large revamp one of the fields doesn't use a fixed width type.
> I'm not sure exactly like it is actually a good idea.

But that's the API for anything timing related, whether it be timeval,
timespec, or __kernel_timespec, the latter obviously only existing
because everybody else could not be bothered to do a proper 32 vs 64-bit
agnostic type before. Hence that's the API that people know and use,
there's no deeper meaning other than that. And I agree, it's kind of
crap in how you can have an invalid range and it gets masked.

It's not like I'm a huge __kernel_timespec fan, but for consistency's
sake, I do like it. With a clock source, then it does start to make
sense. Not that I think there's a lot of ABS use cases, I'd expect
relative to be what people generally use here. But at least then the
IMMED API addition will just work regardless of what you do in the app.
That's better than someone a few revisions later saying "hey that's
cool, can we do ABS too which I use because of X and Y" and then having
to hack that on top.

-- 
Jens Axboe


* Re: [PATCH v2 2/2] io_uring/timeout: immediate timeout arg
  2026-02-27 20:19           ` Jens Axboe
@ 2026-02-27 21:09             ` Pavel Begunkov
  2026-02-27 21:17               ` Jens Axboe
  0 siblings, 1 reply; 17+ messages in thread
From: Pavel Begunkov @ 2026-02-27 21:09 UTC (permalink / raw)
  To: Jens Axboe, Stefan Metzmacher, io-uring; +Cc: Keith Busch

On 2/27/26 20:19, Jens Axboe wrote:
> On 2/27/26 1:03 PM, Pavel Begunkov wrote:
>> On 2/27/26 19:39, Jens Axboe wrote:
...
>>> I don't think it's about length of it - if you can avoid the div by
>>> doing ns_to_timespec64(), that might be very useful? Would make
>>
>> hrtimer_start(&data->timer, timespec64_to_ktime(data->ts), mode);
>>                                     ^^^
>>
>> io_uring just needs to flip it and use ktime, but I left it for later.
> 
> I think we should go all the way with this if we're doing the immediate
> mode. And I do think that is a good idea. Doing a half-way thing doesn't
> make much sense to me.

One time people will tell you don't do premature optimisations,
merge the main thing first, another time it's the other way around,
you can never guess.

In either case, it's useful by itself. Div is not expensive enough
to be a deal breaker, and div by a constant will usually get
replaced with a mul by modern compilers.

>>> userspace simpler too potentially, and basically make the immediate mode
>>> _exactly_ the same as the non-immediate mode, it just delivers the
>>> __kernel_timespec in a different way.
>>
>> I very much want to believe that everything about kernel_timespec has
>> some deep meaning, but I fail to see why they split it as sec/ns and
>> left invalid ranges for ns, why ns is signed, and why even after a
>> large revamp one of the fields doesn't use a fixed width type.
>> I'm not sure exactly like it is actually a good idea.
> 
> But that's the API for anything timing related, whether it be timeval,
> timespec, or __kernel_timespec the latter obviously only existing
> because everybody else could not be bothered to do a proper 32 vs 64-bit
> agnostic type before. Hence that's the API that people know and use,
> there's no deeper meaning other than that. And I agree, it's kind of
> crap in how you can have an invalid range and it gets masked.
> 
> It's not like I'm a huge __kernel_timespec fan, but for consistency's
> sake, I do like it. With a clock source, then it does start to make
> sense. 

What about the clock source? I'm curious.

> Not that I think there's a lot of ABS use cases, I'd expect
> relative to be what people generally use here. But at least then the

It's probably common enough. Not the "wake me on Monday at 5am" kind,
but rather getting the current time and then recalculating intervals
from it. It would be nice to return the wake-up time in a CQE as well,
if only it were free.

> IMMED API addition will just work regardless of what you do in the app.
> That's better than someone a few revisions later saying "hey that's
> cool, can we do ABS too which I use because of X and Y" and then having
> to hack that on top.

Hack it up in the user program? I didn't get it.

-- 
Pavel Begunkov
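
The absolute-timeout pattern described above (read the clock once, then derive every wake-up from that base) might look like the sketch below. `abs_ts` and `abs_nth_deadline()` are hypothetical names, and the sketch assumes that period_ns * n plus the base nanoseconds fits in 64 bits. Computing each deadline from the base, instead of chaining relative sleeps, is commonly done so that per-wake-up latency does not accumulate.

```c
#include <assert.h>
#include <stdint.h>

#define ABS_NSEC_PER_SEC	1000000000ULL

/* Hypothetical stand-in for an absolute sec/nsec timestamp. */
struct abs_ts {
	int64_t tv_sec;
	long long tv_nsec;
};

/*
 * Absolute time of the n-th tick of a periodic timer started at 'base'.
 * base.tv_nsec must be in [0, 1e9); overflow of period_ns * n is not
 * checked in this sketch.
 */
static struct abs_ts abs_nth_deadline(struct abs_ts base,
				      uint64_t period_ns, uint64_t n)
{
	uint64_t total = (uint64_t)base.tv_nsec + period_ns * n;
	struct abs_ts out;

	out.tv_sec = base.tv_sec + (int64_t)(total / ABS_NSEC_PER_SEC);
	out.tv_nsec = (long long)(total % ABS_NSEC_PER_SEC);
	return out;
}
```

A caller would fill `base` from the clock it intends to arm the timeout against and pass each computed deadline as an absolute timeout.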



* Re: [PATCH v2 2/2] io_uring/timeout: immediate timeout arg
  2026-02-27 21:09             ` Pavel Begunkov
@ 2026-02-27 21:17               ` Jens Axboe
  2026-02-27 22:10                 ` Pavel Begunkov
  0 siblings, 1 reply; 17+ messages in thread
From: Jens Axboe @ 2026-02-27 21:17 UTC (permalink / raw)
  To: Pavel Begunkov, Stefan Metzmacher, io-uring; +Cc: Keith Busch

On 2/27/26 2:09 PM, Pavel Begunkov wrote:
> On 2/27/26 20:19, Jens Axboe wrote:
>> On 2/27/26 1:03 PM, Pavel Begunkov wrote:
>>> On 2/27/26 19:39, Jens Axboe wrote:
> ...
>>>> I don't think it's about length of it - if you can avoid the div by
>>>> doing ns_to_timespec64(), that might be very useful? Would make
>>>
>>> hrtimer_start(&data->timer, timespec64_to_ktime(data->ts), mode);
>>>                                     ^^^
>>>
>>> io_uring just needs to flip it and use ktime, but I left it for later.
>>
>> I think we should go all the way with this if we're doing the immediate
>> mode. And I do think that is a good idea. Doing a half-way thing doesn't
>> make much sense to me.
> 
> One time people will tell you don't do premature optimisations,
> merge the main thing first, another time it's the other way around,
> you can never guess.

IMHO this isn't premature optimization, it's more about just doing it
right in the first place.

> In either case, it's useful by itself. Div is not expensive enough
> to be a deal breaker, and div by a constant will usually get
> replaced with a mul by modern compilers.

Agree, and that's likely what it does here. It's just why not avoid it
in the first place if possible. That's not premature optimization.
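
For readers following the div discussion, here is a minimal userspace
sketch of the two conversion directions. It is not the kernel's
ns_to_timespec64() or timespec64_to_ktime(); struct ts_pair and both
helper names are hypothetical and exist only for illustration. The point
is that splitting nanoseconds into sec/nsec needs a divide (by a
constant), while going the other way is just a multiply and an add.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical userspace stand-in for a sec/nsec pair; not a kernel type. */
struct ts_pair {
	int64_t sec;
	int64_t nsec;
};

/*
 * Split a nanosecond count into seconds and nanoseconds.
 * The divisor is a compile-time constant, so compilers will usually
 * emit a multiply-and-shift sequence instead of a hardware divide.
 */
static struct ts_pair ns_to_pair(uint64_t ns)
{
	struct ts_pair p;

	p.sec = (int64_t)(ns / NSEC_PER_SEC);
	p.nsec = (int64_t)(ns % NSEC_PER_SEC);
	assert(p.nsec >= 0 && p.nsec < (int64_t)NSEC_PER_SEC);
	return p;
}

/* Inverse direction: one multiply and one add, no division at all. */
static uint64_t pair_to_ns(struct ts_pair p)
{
	assert(p.sec >= 0 && p.nsec >= 0 && p.nsec < (int64_t)NSEC_PER_SEC);
	return (uint64_t)p.sec * NSEC_PER_SEC + (uint64_t)p.nsec;
}
```

If the timer is ultimately armed with a nanosecond value, as the
hrtimer_start() line quoted above suggests, a nanosecond argument would
only need the cheap direction, or no conversion at all.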

>>>> userspace simpler too potentially, and basically make the immediate mode
>>>> _exactly_ the same as the non-immediate mode, it just delivers the
>>>> __kernel_timespec in a different way.
>>>
>>> I very much want to believe that everything about kernel_timespec has
>>> some deep meaning, but I fail to see why they split it as sec/ns and
>>> left invalid ranges for ns, why ns is signed, and why even after a
>>> large revamp one of the fields doesn't use a fixed width type.
>>> I'm not sure exactly like it is actually a good idea.
>>
>> But that's the API for anything timing related, whether it be timeval,
>> timespec, or __kernel_timespec the latter obviously only existing
>> because everybody else could not be bothered to do a proper 32 vs 64-bit
>> agnostic type before. Hence that's the API that people know and use,
>> there's no deeper meaning other than that. And I agree, it's kind of
>> crap in how you can have an invalid range and it gets masked.
>>
>> It's not like I'm a huge __kernel_timespec fan, but for consistency's
>> sake, I do like it. With a clock source, then it does start to make
>> sense. 
> 
> What's about clock source? I'm curious

Just that timespec/timeval/whatever usually go with a clocksource, vs
some isolated nsec value.

>> Not that I think there's a lot of ABS use cases, I'd expect
>> relative to be what people generally use here. But at least then the
> 
> It's probably common enough. Not the "wake me on Monday at 5am" kind,
> but rather get the current time and then recalculating intervals
> from it. Would be nice to return the wake up time as well in a CQE,
> if only it was free.

Yeah could be, it's always hard to know. Which is why it's nice to err
on the side of "let's just support both upfront" if it's feasible.

>> IMMED API addition will just work regardless of what you do in the app.
>> That's better than someone a few revisions later than saying "hey that's
>> cool, can we do ABS too which I use because of X and Y" and then having
>> to hack that on top.
> 
> Hack it up in the user program? I didn't get it.

No, I mean needing to retrofit ABS support on the kernel side for IMMED,
rather than supporting it up front.

-- 
Jens Axboe


* Re: [PATCH v2 2/2] io_uring/timeout: immediate timeout arg
  2026-02-27 21:17               ` Jens Axboe
@ 2026-02-27 22:10                 ` Pavel Begunkov
  2026-02-27 22:19                   ` Jens Axboe
  0 siblings, 1 reply; 17+ messages in thread
From: Pavel Begunkov @ 2026-02-27 22:10 UTC (permalink / raw)
  To: Jens Axboe, Stefan Metzmacher, io-uring; +Cc: Keith Busch

On 2/27/26 21:17, Jens Axboe wrote:
> On 2/27/26 2:09 PM, Pavel Begunkov wrote:
>> On 2/27/26 20:19, Jens Axboe wrote:
>>> On 2/27/26 1:03 PM, Pavel Begunkov wrote:
>>>> On 2/27/26 19:39, Jens Axboe wrote:
>> ...
>>>>> I don't think it's about length of it - if you can avoid the div by
>>>>> doing ns_to_timespec64(), that might be very useful? Would make
>>>>
>>>> hrtimer_start(&data->timer, timespec64_to_ktime(data->ts), mode);
>>>>                                      ^^^
>>>>
>>>> io_uring just needs to flip it and use ktime, but I left it for later.
>>>
>>> I think we should go all the way with this if we're doing the immediate
>>> mode. And I do think that is a good idea. Doing a half-way thing doesn't
>>> make much sense to me.
>>
>> One time people will tell you don't do premature optimisations,
>> merge the main thing first, another time it's the other way around,
>> you can never guess.
> 
> IMHO this isn't premature optimization, it's more about just doing it
> right in the first place.
> 
>> In either case, it's useful by itself. Div is not that expensive
>> to be a deal breaker, and div by constant will usually get
>> replaced with a mul by modern compilers.
> 
> Agree, and that's likely what it does here. It's just why not avoid it
> in the first place if possible. That's not premature optimization.

I think uapi should come first before we waste time on making
internals one way or another.

>>>>> userspace simpler too potentially, and basically make the immediate mode
>>>>> _exactly_ the same as the non-immediate mode, it just delivers the
>>>>> __kernel_timespec in a different way.
>>>>
>>>> I very much want to believe that everything about kernel_timespec has
>>>> some deep meaning, but I fail to see why they split it as sec/ns and
>>>> left invalid ranges for ns, why ns is signed, and why even after a
>>>> large revamp one of the fields doesn't use a fixed width type.
>>>> I'm not sure exactly like it is actually a good idea.
>>>
>>> But that's the API for anything timing related, whether it be timeval,
>>> timespec, or __kernel_timespec the latter obviously only existing
>>> because everybody else could not be bothered to do a proper 32 vs 64-bit
>>> agnostic type before. Hence that's the API that people know and use,
>>> there's no deeper meaning other than that. And I agree, it's kind of
>>> crap in how you can have an invalid range and it gets masked.
>>>
>>> It's not like I'm a huge __kernel_timespec fan, but for consistency's
>>> sake, I do like it. With a clock source, then it does start to make
>>> sense.
>>
>> What's about clock source? I'm curious
> 
> Just that timespec/timeval/whatever usually go with a clocksource, vs
> some isolated nsec value.

They're orthogonal though, timeout requests also support clock
sources.

>>> Not that I think there's a lot of ABS use cases, I'd expect
>>> relative to be what people generally use here. But at least then the
>>
>> It's probably common enough. Not the "wake me on Monday at 5am" kind,
>> but rather get the current time and then recalculating intervals
>> from it. Would be nice to return the wake up time as well in a CQE,
>> if only it was free.
> 
> Yeah could be, it's always hard to know. Which is why it's nice to err
> on the side of "let's just support both upfront" if it's feasible.
> 
>>> IMMED API addition will just work regardless of what you do in the app.
>>> That's better than someone a few revisions later than saying "hey that's
>>> cool, can we do ABS too which I use because of X and Y" and then having
>>> to hack that on top.
>>
>> Hack it up in the user program? I didn't get it.
> 
> No, I mean needing to retrofit ABS support on the kernel side for IMMED
> up front.

They should be enabled in the same release, but what we've been
discussing is the way to do that. I was saying that u64 is enough to
pass the abs timeout value, and we can extend it to another u64 if
needed several centuries from now. And it's not a bad option
because plain u64 ns makes much, much more sense for the relative
mode. And even for the abs scenario above, I'd prefer that rather
than doing seconds adjustments every single time.

-- 
Pavel Begunkov



* Re: [PATCH v2 2/2] io_uring/timeout: immediate timeout arg
  2026-02-27 22:10                 ` Pavel Begunkov
@ 2026-02-27 22:19                   ` Jens Axboe
  2026-02-27 22:47                     ` Pavel Begunkov
  0 siblings, 1 reply; 17+ messages in thread
From: Jens Axboe @ 2026-02-27 22:19 UTC (permalink / raw)
  To: Pavel Begunkov, Stefan Metzmacher, io-uring; +Cc: Keith Busch

On 2/27/26 3:10 PM, Pavel Begunkov wrote:
> On 2/27/26 21:17, Jens Axboe wrote:
>> On 2/27/26 2:09 PM, Pavel Begunkov wrote:
>>> On 2/27/26 20:19, Jens Axboe wrote:
>>>> On 2/27/26 1:03 PM, Pavel Begunkov wrote:
>>>>> On 2/27/26 19:39, Jens Axboe wrote:
>>> ...
>>>>>> I don't think it's about length of it - if you can avoid the div by
>>>>>> doing ns_to_timespec64(), that might be very useful? Would make
>>>>>
>>>>> hrtimer_start(&data->timer, timespec64_to_ktime(data->ts), mode);
>>>>>                                      ^^^
>>>>>
>>>>> io_uring just needs to flip it and use ktime, but I left it for later.
>>>>
>>>> I think we should go all the way with this if we're doing the immediate
>>>> mode. And I do think that is a good idea. Doing a half-way thing doesn't
>>>> make much sense to me.
>>>
>>> One time people will tell you don't do premature optimisations,
>>> merge the main thing first, another time it's the other way around,
>>> you can never guess.
>>
>> IMHO this isn't premature optimization, it's more about just doing it
>> right in the first place.
>>
>>> In either case, it's useful by itself. Div is not that expensive
>>> to be a deal breaker, and div by constant will usually get
>>> replaced with a mul by modern compilers.
>>
>> Agree, and that's likely what it does here. It's just why not avoid it
>> in the first place if possible. That's not premature optimization.
> 
> I think uapi should come first before we waste time on making
> internals one way or another.

Yep agree, which is really what this reply thread is mostly about - the
user API.

>>>>>> userspace simpler too potentially, and basically make the immediate mode
>>>>>> _exactly_ the same as the non-immediate mode, it just delivers the
>>>>>> __kernel_timespec in a different way.
>>>>>
>>>>> I very much want to believe that everything about kernel_timespec has
>>>>> some deep meaning, but I fail to see why they split it as sec/ns and
>>>>> left invalid ranges for ns, why ns is signed, and why even after a
>>>>> large revamp one of the fields doesn't use a fixed width type.
>>>>> I'm not sure exactly like it is actually a good idea.
>>>>
>>>> But that's the API for anything timing related, whether it be timeval,
>>>> timespec, or __kernel_timespec the latter obviously only existing
>>>> because everybody else could not be bothered to do a proper 32 vs 64-bit
>>>> agnostic type before. Hence that's the API that people know and use,
>>>> there's no deeper meaning other than that. And I agree, it's kind of
>>>> crap in how you can have an invalid range and it gets masked.
>>>>
>>>> It's not like I'm a huge __kernel_timespec fan, but for consistency's
>>>> sake, I do like it. With a clock source, then it does start to make
>>>> sense.
>>>
>>> What's about clock source? I'm curious
>>
>> Just that timespec/timeval/whatever usually go with a clocksource, vs
>> some isolated nsec value.
> 
> They're orthogonal though, timeout requests also support clock
> sources.

Sure, it's just less natural that way. At least imho.

>>>> Not that I think there's a lot of ABS use cases, I'd expect
>>>> relative to be what people generally use here. But at least then the
>>>
>>> It's probably common enough. Not the "wake me on Monday at 5am" kind,
>>> but rather get the current time and then recalculating intervals
>>> from it. Would be nice to return the wake up time as well in a CQE,
>>> if only it was free.
>>
>> Yeah could be, it's always hard to know. Which is why it's nice to err
>> on the side of "let's just support both upfront" if it's feasible.
>>
>>>> IMMED API addition will just work regardless of what you do in the app.
>>>> That's better than someone a few revisions later than saying "hey that's
>>>> cool, can we do ABS too which I use because of X and Y" and then having
>>>> to hack that on top.
>>>
>>> Hack it up in the user program? I didn't get it.
>>
>> No, I mean needing to retrofit ABS support on the kernel side for IMMED
>> up front.
> 
> They should be enabled in the same release, but we've been rather
> discussing the way to do that. I was saying that u64 is enough to
> pass the abs timeout value, and we can extend it to another u64 if
> needed in several centuries from now. And it's not a bad option
> because plain u64 ns makes much much more sense for the relative
> mode. And even for the abs scenario above, I'd prefer that rather
> than doing second adjustments every single time.

ABS makes very little sense as nanoseconds, that's pretty confusing on
the userspace side. That's the main issue.

I'm not sure why it's such a big deal to just encode the sec/nsec so
that userspace can use it directly from a timespec or timeval which is
most likely what they are querying time from anyway? If you do absolute,
surely you'd do

get_time(&t);
t.tv_sec += 1;

now issue timeout for that. That's a hell of a lot more natural to use
than converting to and from nsecs.

For relative it's obviously not a huge deal, but it'd be nice to keep
them consistent.

-- 
Jens Axboe


* Re: [PATCH v2 2/2] io_uring/timeout: immediate timeout arg
  2026-02-27 22:19                   ` Jens Axboe
@ 2026-02-27 22:47                     ` Pavel Begunkov
  0 siblings, 0 replies; 17+ messages in thread
From: Pavel Begunkov @ 2026-02-27 22:47 UTC (permalink / raw)
  To: Jens Axboe, Stefan Metzmacher, io-uring; +Cc: Keith Busch

On 2/27/26 22:19, Jens Axboe wrote:
...
>> They should be enabled in the same release, but we've been rather
>> discussing the way to do that. I was saying that u64 is enough to
>> pass the abs timeout value, and we can extend it to another u64 if
>> needed in several centuries from now. And it's not a bad option
>> because plain u64 ns makes much much more sense for the relative
>> mode. And even for the abs scenario above, I'd prefer that rather
>> than doing second adjustments every single time.
> 
> ABS makes very little sense as nanoseconds, that's pretty confusing on
> the userspace side. That's the main issue.
> 
> I'm not sure why it's such a big deal to just encode the sec/nsec so
> that userspace can use it directly from a timespec or timeval which is
> most likely what they are querying time from anyway? If you do absolute,
> surely you'd do
> 
> get_time(&t);
> t.tv_sec += 1;

More like +N ms, which would be

t.tv_sec += N / 1000;
t.tv_nsec += (N % 1000) * NS_IN_MS;
if (t.tv_nsec >= NS_IN_SEC) {
	t.tv_nsec -= NS_IN_SEC;
	t.tv_sec++;
}

And then you want to compare them and calculate differences. io_uring
works with __kernel_timespec, but just take a look at liburing
tests/examples: lots of them open-code some version of
get_time_[m,u,n]s unless they hard-code a specific relative timeout.
It's a self-propelling misery.
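
To make the bookkeeping concrete, below is a rough sketch of the helpers
a program ends up carrying when it stays in sec/nsec form: the add with
carry from the snippet above, plus a difference and a comparison. It uses
the libc struct timespec rather than __kernel_timespec, and the names
ts_add_ms, ts_diff_ns and ts_cmp are made up for this example; it also
assumes non-negative, already normalized inputs.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NS_IN_MS  1000000L
#define NS_IN_SEC 1000000000L

/* Add ms milliseconds to a normalized timespec, carrying into tv_sec. */
static void ts_add_ms(struct timespec *t, long ms)
{
	assert(ms >= 0 && t->tv_nsec >= 0 && t->tv_nsec < NS_IN_SEC);
	t->tv_sec += ms / 1000;
	t->tv_nsec += (ms % 1000) * NS_IN_MS;
	if (t->tv_nsec >= NS_IN_SEC) {
		t->tv_nsec -= NS_IN_SEC;
		t->tv_sec++;
	}
}

/* Return a - b in nanoseconds; both inputs must be normalized. */
static int64_t ts_diff_ns(const struct timespec *a, const struct timespec *b)
{
	return (int64_t)(a->tv_sec - b->tv_sec) * NS_IN_SEC +
	       (a->tv_nsec - b->tv_nsec);
}

/* Three-way compare: negative, zero or positive, like strcmp(). */
static int ts_cmp(const struct timespec *a, const struct timespec *b)
{
	if (a->tv_sec != b->tv_sec)
		return a->tv_sec < b->tv_sec ? -1 : 1;
	if (a->tv_nsec != b->tv_nsec)
		return a->tv_nsec < b->tv_nsec ? -1 : 1;
	return 0;
}
```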

> now issue timeout for that. That's a hell of a lot more natural to use
> than converting to and from nsecs.

I'd rather convert it to ns once and use that afterwards. And I bet
it'll be nicer with other non-Linux-specific libraries, e.g. you can
get ns from standard C++.
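
A minimal sketch of the "convert once" approach, assuming
CLOCK_MONOTONIC and non-negative times. The helper names (ts_to_ns,
now_ns, deadline_after_ms) are hypothetical, and nothing here is an
io_uring or liburing interface; it only shows how deadlines, comparisons
and differences reduce to plain integer arithmetic once the time is held
as a u64 nanosecond count.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NS_IN_MS  1000000ULL
#define NS_IN_SEC 1000000000ULL

/* Convert a normalized, non-negative timespec to nanoseconds. */
static uint64_t ts_to_ns(const struct timespec *t)
{
	assert(t->tv_sec >= 0 && t->tv_nsec >= 0 &&
	       t->tv_nsec < (long)NS_IN_SEC);
	return (uint64_t)t->tv_sec * NS_IN_SEC + (uint64_t)t->tv_nsec;
}

/* Read CLOCK_MONOTONIC once and keep the result as plain nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec t;
	int ret = clock_gettime(CLOCK_MONOTONIC, &t);

	assert(ret == 0);
	(void)ret;	/* keep builds with NDEBUG free of unused warnings */
	return ts_to_ns(&t);
}

/* Absolute deadline "ms after base": one multiply, one add, no carry. */
static uint64_t deadline_after_ms(uint64_t base_ns, uint64_t ms)
{
	return base_ns + ms * NS_IN_MS;
}
```

With this representation, "has the deadline passed" is now_ns() >=
deadline and "how long is left" is a single subtraction.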

> For relative it's obviously not a huge deal, but it'd be nice to keep
> them consistent.
> 

-- 
Pavel Begunkov



end of thread, other threads:[~2026-02-27 22:47 UTC | newest]

Thread overview: 17+ messages
2026-02-25 10:35 [PATCH v2 0/2] timeout immediate arg Pavel Begunkov
2026-02-25 10:35 ` [PATCH v2 1/2] io_uring/timeout: READ_ONCE sqe->addr Pavel Begunkov
2026-02-25 10:35 ` [PATCH v2 2/2] io_uring/timeout: immediate timeout arg Pavel Begunkov
2026-02-27 14:08   ` Stefan Metzmacher
2026-02-27 15:05     ` Jens Axboe
2026-02-27 16:17       ` Stefan Metzmacher
2026-02-27 16:21         ` Jens Axboe
2026-02-27 19:08     ` Pavel Begunkov
2026-02-27 19:39       ` Jens Axboe
2026-02-27 20:03         ` Pavel Begunkov
2026-02-27 20:19           ` Jens Axboe
2026-02-27 21:09             ` Pavel Begunkov
2026-02-27 21:17               ` Jens Axboe
2026-02-27 22:10                 ` Pavel Begunkov
2026-02-27 22:19                   ` Jens Axboe
2026-02-27 22:47                     ` Pavel Begunkov
2026-02-25 15:36 ` (subset) [PATCH v2 0/2] timeout immediate arg Jens Axboe
