* [PATCH 1/1] io_uring/zctx: separate notification user_data
@ 2026-02-16 11:48 Pavel Begunkov
2026-02-16 15:10 ` Jens Axboe
0 siblings, 1 reply; 8+ messages in thread
From: Pavel Begunkov @ 2026-02-16 11:48 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence, axboe, Dylan Yudaken
People previously asked for the notification CQE to have a different
user_data value from the main request completion. It's useful to
separate buffer and request handling logic and avoid separately
refcounting the request.
Let the user pass the notification user_data in sqe->addr3. If zero,
it'll inherit sqe->user_data as before. It doesn't change the rules for
when the user can expect a notification CQE, and it should still check
the IORING_CQE_F_MORE flag.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/net.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/io_uring/net.c b/io_uring/net.c
index 7ebfd51b84de..5673f088fc1d 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -1327,11 +1327,12 @@ int io_send_zc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
struct io_ring_ctx *ctx = req->ctx;
struct io_async_msghdr *iomsg;
struct io_kiocb *notif;
+ u64 user_data;
int ret;
zc->done_io = 0;
- if (unlikely(READ_ONCE(sqe->__pad2[0]) || READ_ONCE(sqe->addr3)))
+ if (unlikely(READ_ONCE(sqe->__pad2[0])))
return -EINVAL;
/* we don't support IOSQE_CQE_SKIP_SUCCESS just yet */
if (req->flags & REQ_F_CQE_SKIP)
@@ -1340,7 +1341,11 @@ int io_send_zc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
notif = zc->notif = io_alloc_notif(ctx);
if (!notif)
return -ENOMEM;
- notif->cqe.user_data = req->cqe.user_data;
+ user_data = READ_ONCE(sqe->addr3);
+ if (!user_data)
+ user_data = req->cqe.user_data;
+
+ notif->cqe.user_data = user_data;
notif->cqe.res = 0;
notif->cqe.flags = IORING_CQE_F_NOTIF;
req->flags |= REQ_F_NEED_CLEANUP | REQ_F_POLL_NO_LAZY;
--
2.52.0
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 1/1] io_uring/zctx: separate notification user_data
2026-02-16 11:48 [PATCH 1/1] io_uring/zctx: separate notification user_data Pavel Begunkov
@ 2026-02-16 15:10 ` Jens Axboe
2026-02-16 15:48 ` Pavel Begunkov
0 siblings, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2026-02-16 15:10 UTC (permalink / raw)
To: Pavel Begunkov, io-uring; +Cc: Dylan Yudaken
On 2/16/26 4:48 AM, Pavel Begunkov wrote:
> People previously asked for the notification CQE to have a different
> user_data value from the main request completion. It's useful to
> separate buffer and request handling logic and avoid separately
> refcounting the request.
>
> Let the user pass the notification user_data in sqe->addr3. If zero,
> it'll inherit sqe->user_data as before. It doesn't change the rules for
> when the user can expect a notification CQE, and it should still check
> the IORING_CQE_F_MORE flag.
This should use and sqe->ioprio flag to manage it, otherwise you're
excluding 0. Which may not be important in and of itself, but the
flag approach is expected way to do this.
Also, please use io_uring/net: for this in the subject and just
have the title reflect zerocopy send.
--
Jens Axboe
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/1] io_uring/zctx: separate notification user_data
2026-02-16 15:10 ` Jens Axboe
@ 2026-02-16 15:48 ` Pavel Begunkov
2026-02-16 15:52 ` Jens Axboe
0 siblings, 1 reply; 8+ messages in thread
From: Pavel Begunkov @ 2026-02-16 15:48 UTC (permalink / raw)
To: Jens Axboe, io-uring; +Cc: Dylan Yudaken
On 2/16/26 15:10, Jens Axboe wrote:
> On 2/16/26 4:48 AM, Pavel Begunkov wrote:
>> People previously asked for the notification CQE to have a different
>> user_data value from the main request completion. It's useful to
>> separate buffer and request handling logic and avoid separately
>> refcounting the request.
>>
>> Let the user pass the notification user_data in sqe->addr3. If zero,
>> it'll inherit sqe->user_data as before. It doesn't change the rules for
>> when the user can expect a notification CQE, and it should still check
>> the IORING_CQE_F_MORE flag.
>
> This should use and sqe->ioprio flag to manage it, otherwise you're
> excluding 0. Which may not be important in and of itself, but the
> flag approach is expected way to do this.
What's the benefit? It's not unreasonable to exclude zero, it won't
limit any use cases, and it's not new either (i.e. buffer tags).
On the other hand, the user will now have to modify two fields
instead of one, which is cleaner. And you're taking one extra bit
out of 16bit ->ioprio, which is not critical if it's all going to
be flags, but it wouldn't be an outrageous idea to take 8 bits
out of it for some index, for example.
--
Pavel Begunkov
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/1] io_uring/zctx: separate notification user_data
2026-02-16 15:48 ` Pavel Begunkov
@ 2026-02-16 15:52 ` Jens Axboe
2026-02-16 15:53 ` Pavel Begunkov
0 siblings, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2026-02-16 15:52 UTC (permalink / raw)
To: Pavel Begunkov, io-uring; +Cc: Dylan Yudaken
On 2/16/26 8:48 AM, Pavel Begunkov wrote:
> On 2/16/26 15:10, Jens Axboe wrote:
>> On 2/16/26 4:48 AM, Pavel Begunkov wrote:
>>> People previously asked for the notification CQE to have a different
>>> user_data value from the main request completion. It's useful to
>>> separate buffer and request handling logic and avoid separately
>>> refcounting the request.
>>>
>>> Let the user pass the notification user_data in sqe->addr3. If zero,
>>> it'll inherit sqe->user_data as before. It doesn't change the rules for
>>> when the user can expect a notification CQE, and it should still check
>>> the IORING_CQE_F_MORE flag.
>>
>> This should use and sqe->ioprio flag to manage it, otherwise you're
>> excluding 0. Which may not be important in and of itself, but the
>> flag approach is expected way to do this.
>
> What's the benefit? It's not unreasonable to exclude zero, it won't
> limit any use cases, and it's not new either (i.e. buffer tags).
> On the other hand, the user will now have to modify two fields
> instead of one, which is cleaner. And you're taking one extra bit
> out of 16bit ->ioprio, which is not critical if it's all going to
> be flags, but it wouldn't be an outrageous idea to take 8 bits
> out of it for some index, for example.
The benefit is that it's weird to exclude a given user_data value, just
so it can get used as both a key and a flag. IMHO much cleaner to have a
flag for it which explicitly says "use the user_data I provide". Also
easier to explain in docs, set this flag and then the value in X will be
the user_data for the completion.
--
Jens Axboe
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/1] io_uring/zctx: separate notification user_data
2026-02-16 15:52 ` Jens Axboe
@ 2026-02-16 15:53 ` Pavel Begunkov
2026-02-16 15:55 ` Jens Axboe
0 siblings, 1 reply; 8+ messages in thread
From: Pavel Begunkov @ 2026-02-16 15:53 UTC (permalink / raw)
To: Jens Axboe, io-uring; +Cc: Dylan Yudaken
On 2/16/26 15:52, Jens Axboe wrote:
> On 2/16/26 8:48 AM, Pavel Begunkov wrote:
>> On 2/16/26 15:10, Jens Axboe wrote:
>>> On 2/16/26 4:48 AM, Pavel Begunkov wrote:
>>>> People previously asked for the notification CQE to have a different
>>>> user_data value from the main request completion. It's useful to
>>>> separate buffer and request handling logic and avoid separately
>>>> refcounting the request.
>>>>
>>>> Let the user pass the notification user_data in sqe->addr3. If zero,
>>>> it'll inherit sqe->user_data as before. It doesn't change the rules for
>>>> when the user can expect a notification CQE, and it should still check
>>>> the IORING_CQE_F_MORE flag.
>>>
>>> This should use and sqe->ioprio flag to manage it, otherwise you're
>>> excluding 0. Which may not be important in and of itself, but the
>>> flag approach is expected way to do this.
>>
>> What's the benefit? It's not unreasonable to exclude zero, it won't
>> limit any use cases, and it's not new either (i.e. buffer tags).
>> On the other hand, the user will now have to modify two fields
>> instead of one, which is cleaner. And you're taking one extra bit
>> out of 16bit ->ioprio, which is not critical if it's all going to
>> be flags, but it wouldn't be an outrageous idea to take 8 bits
>> out of it for some index, for example.
>
> The benefit is that it's weird to exclude a given user_data value, just
> so it can get used as both a key and a flag. IMHO much cleaner to have a
> flag for it which explicitly says "use the user_data I provide". Also
> easier to explain in docs, set this flag and then the value in X will be
> the user_data for the completion.
Ok, I'll respin, let's go with wasting bits for nothing.
--
Pavel Begunkov
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/1] io_uring/zctx: separate notification user_data
2026-02-16 15:53 ` Pavel Begunkov
@ 2026-02-16 15:55 ` Jens Axboe
2026-02-16 17:20 ` Pavel Begunkov
0 siblings, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2026-02-16 15:55 UTC (permalink / raw)
To: Pavel Begunkov, io-uring; +Cc: Dylan Yudaken
On 2/16/26 8:53 AM, Pavel Begunkov wrote:
> On 2/16/26 15:52, Jens Axboe wrote:
>> On 2/16/26 8:48 AM, Pavel Begunkov wrote:
>>> On 2/16/26 15:10, Jens Axboe wrote:
>>>> On 2/16/26 4:48 AM, Pavel Begunkov wrote:
>>>>> People previously asked for the notification CQE to have a different
>>>>> user_data value from the main request completion. It's useful to
>>>>> separate buffer and request handling logic and avoid separately
>>>>> refcounting the request.
>>>>>
>>>>> Let the user pass the notification user_data in sqe->addr3. If zero,
>>>>> it'll inherit sqe->user_data as before. It doesn't change the rules for
>>>>> when the user can expect a notification CQE, and it should still check
>>>>> the IORING_CQE_F_MORE flag.
>>>>
>>>> This should use and sqe->ioprio flag to manage it, otherwise you're
>>>> excluding 0. Which may not be important in and of itself, but the
>>>> flag approach is expected way to do this.
>>>
>>> What's the benefit? It's not unreasonable to exclude zero, it won't
>>> limit any use cases, and it's not new either (i.e. buffer tags).
>>> On the other hand, the user will now have to modify two fields
>>> instead of one, which is cleaner. And you're taking one extra bit
>>> out of 16bit ->ioprio, which is not critical if it's all going to
>>> be flags, but it wouldn't be an outrageous idea to take 8 bits
>>> out of it for some index, for example.
>>
>> The benefit is that it's weird to exclude a given user_data value, just
>> so it can get used as both a key and a flag. IMHO much cleaner to have a
>> flag for it which explicitly says "use the user_data I provide". Also
>> easier to explain in docs, set this flag and then the value in X will be
>> the user_data for the completion.
>
> Ok, I'll respin, let's go with wasting bits for nothing.
It's not like they are a scarce resource, and if we need more than 16
bits to modify send/recv behavior, then arguably we have bigger
problems.
--
Jens Axboe
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/1] io_uring/zctx: separate notification user_data
2026-02-16 15:55 ` Jens Axboe
@ 2026-02-16 17:20 ` Pavel Begunkov
2026-02-16 17:27 ` Jens Axboe
0 siblings, 1 reply; 8+ messages in thread
From: Pavel Begunkov @ 2026-02-16 17:20 UTC (permalink / raw)
To: Jens Axboe, io-uring; +Cc: Dylan Yudaken
On 2/16/26 15:55, Jens Axboe wrote:
> On 2/16/26 8:53 AM, Pavel Begunkov wrote:
>> On 2/16/26 15:52, Jens Axboe wrote:
>>> On 2/16/26 8:48 AM, Pavel Begunkov wrote:
>>>> On 2/16/26 15:10, Jens Axboe wrote:
>>>>> On 2/16/26 4:48 AM, Pavel Begunkov wrote:
>>>>>> People previously asked for the notification CQE to have a different
>>>>>> user_data value from the main request completion. It's useful to
>>>>>> separate buffer and request handling logic and avoid separately
>>>>>> refcounting the request.
>>>>>>
>>>>>> Let the user pass the notification user_data in sqe->addr3. If zero,
>>>>>> it'll inherit sqe->user_data as before. It doesn't change the rules for
>>>>>> when the user can expect a notification CQE, and it should still check
>>>>>> the IORING_CQE_F_MORE flag.
>>>>>
>>>>> This should use and sqe->ioprio flag to manage it, otherwise you're
>>>>> excluding 0. Which may not be important in and of itself, but the
>>>>> flag approach is expected way to do this.
>>>>
>>>> What's the benefit? It's not unreasonable to exclude zero, it won't
>>>> limit any use cases, and it's not new either (i.e. buffer tags).
>>>> On the other hand, the user will now have to modify two fields
>>>> instead of one, which is cleaner. And you're taking one extra bit
>>>> out of 16bit ->ioprio, which is not critical if it's all going to
>>>> be flags, but it wouldn't be an outrageous idea to take 8 bits
>>>> out of it for some index, for example.
>>>
>>> The benefit is that it's weird to exclude a given user_data value, just
>>> so it can get used as both a key and a flag. IMHO much cleaner to have a
>>> flag for it which explicitly says "use the user_data I provide". Also
>>> easier to explain in docs, set this flag and then the value in X will be
>>> the user_data for the completion.
>>
>> Ok, I'll respin, let's go with wasting bits for nothing.
>
> It's not like they are a scarce resource, and if we need more than 16
> bits to modify send/recv behavior, then arguably we have bigger
> problems.
There are already 6, it'll be 7th. I also have one or two more in mind,
that's already over the half. The same was probably thought about
sqe->flags, and even though it's twice as many bits for net, those
are taken faster as potential cost of redesign is lower.
Fwiw, the code is nastier as well, more branchy and away from
other notification init because of dependency on reading the
flags.
@@ -1331,7 +1333,7 @@ int io_send_zc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
zc->done_io = 0;
- if (unlikely(READ_ONCE(sqe->__pad2[0]) || READ_ONCE(sqe->addr3)))
+ if (unlikely(READ_ONCE(sqe->__pad2[0])))
return -EINVAL;
/* we don't support IOSQE_CQE_SKIP_SUCCESS just yet */
if (req->flags & REQ_F_CQE_SKIP)
@@ -1358,6 +1360,13 @@ int io_send_zc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
}
}
+ if (zc->flags & IORING_SEND_ZC_NOTIF_USER_DATA) {
+ notif->cqe.user_data = READ_ONCE(sqe->addr3);
+ } else {
+ if (READ_ONCE(sqe->addr3))
+ return -EINVAL;
+ }
+
zc->len = READ_ONCE(sqe->len);
zc->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL | MSG_ZEROCOPY;
req->buf_index = READ_ONCE(sqe->buf_index);
--
Pavel Begunkov
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/1] io_uring/zctx: separate notification user_data
2026-02-16 17:20 ` Pavel Begunkov
@ 2026-02-16 17:27 ` Jens Axboe
0 siblings, 0 replies; 8+ messages in thread
From: Jens Axboe @ 2026-02-16 17:27 UTC (permalink / raw)
To: Pavel Begunkov, io-uring; +Cc: Dylan Yudaken
On 2/16/26 10:20 AM, Pavel Begunkov wrote:
> On 2/16/26 15:55, Jens Axboe wrote:
>> On 2/16/26 8:53 AM, Pavel Begunkov wrote:
>>> On 2/16/26 15:52, Jens Axboe wrote:
>>>> On 2/16/26 8:48 AM, Pavel Begunkov wrote:
>>>>> On 2/16/26 15:10, Jens Axboe wrote:
>>>>>> On 2/16/26 4:48 AM, Pavel Begunkov wrote:
>>>>>>> People previously asked for the notification CQE to have a different
>>>>>>> user_data value from the main request completion. It's useful to
>>>>>>> separate buffer and request handling logic and avoid separately
>>>>>>> refcounting the request.
>>>>>>>
>>>>>>> Let the user pass the notification user_data in sqe->addr3. If zero,
>>>>>>> it'll inherit sqe->user_data as before. It doesn't change the rules for
>>>>>>> when the user can expect a notification CQE, and it should still check
>>>>>>> the IORING_CQE_F_MORE flag.
>>>>>>
>>>>>> This should use and sqe->ioprio flag to manage it, otherwise you're
>>>>>> excluding 0. Which may not be important in and of itself, but the
>>>>>> flag approach is expected way to do this.
>>>>>
>>>>> What's the benefit? It's not unreasonable to exclude zero, it won't
>>>>> limit any use cases, and it's not new either (i.e. buffer tags).
>>>>> On the other hand, the user will now have to modify two fields
>>>>> instead of one, which is cleaner. And you're taking one extra bit
>>>>> out of 16bit ->ioprio, which is not critical if it's all going to
>>>>> be flags, but it wouldn't be an outrageous idea to take 8 bits
>>>>> out of it for some index, for example.
>>>>
>>>> The benefit is that it's weird to exclude a given user_data value, just
>>>> so it can get used as both a key and a flag. IMHO much cleaner to have a
>>>> flag for it which explicitly says "use the user_data I provide". Also
>>>> easier to explain in docs, set this flag and then the value in X will be
>>>> the user_data for the completion.
>>>
>>> Ok, I'll respin, let's go with wasting bits for nothing.
>>
>> It's not like they are a scarce resource, and if we need more than 16
>> bits to modify send/recv behavior, then arguably we have bigger
>> problems.
>
> There are already 6, it'll be 7th. I also have one or two more in mind,
> that's already over the half. The same was probably thought about
> sqe->flags, and even though it's twice as many bits for net, those
> are taken faster as potential cost of redesign is lower.
>
> Fwiw, the code is nastier as well, more branchy and away from
> other notification init because of dependency on reading the
> flags.
>
> @@ -1331,7 +1333,7 @@ int io_send_zc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>
> zc->done_io = 0;
>
> - if (unlikely(READ_ONCE(sqe->__pad2[0]) || READ_ONCE(sqe->addr3)))
> + if (unlikely(READ_ONCE(sqe->__pad2[0])))
> return -EINVAL;
> /* we don't support IOSQE_CQE_SKIP_SUCCESS just yet */
> if (req->flags & REQ_F_CQE_SKIP)
> @@ -1358,6 +1360,13 @@ int io_send_zc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
> }
> }
>
> + if (zc->flags & IORING_SEND_ZC_NOTIF_USER_DATA) {
> + notif->cqe.user_data = READ_ONCE(sqe->addr3);
> + } else {
> + if (READ_ONCE(sqe->addr3))
> + return -EINVAL;
> + }
> +
I think just remove the else part here - addr3 is valid now that
IORING_SEND_ZC_NOTIF_USER_DATA is supported, and if you mess it up in
your applications, you'll find this via development anyway. Since addr3
== 0 is a valid value, it doesn't make much sense to check for it being
non-zero. It's not like a flags field where any value set would be an
-EINVAL case. Doesn't even exclude having another flag for using addr3
for something else anyway.
--
Jens Axboe
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2026-02-16 17:27 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-02-16 11:48 [PATCH 1/1] io_uring/zctx: separate notification user_data Pavel Begunkov
2026-02-16 15:10 ` Jens Axboe
2026-02-16 15:48 ` Pavel Begunkov
2026-02-16 15:52 ` Jens Axboe
2026-02-16 15:53 ` Pavel Begunkov
2026-02-16 15:55 ` Jens Axboe
2026-02-16 17:20 ` Pavel Begunkov
2026-02-16 17:27 ` Jens Axboe
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox