* [PATCH 1/1] io_uring/zctx: separate notification user_data @ 2026-02-16 11:48 Pavel Begunkov 2026-02-16 15:10 ` Jens Axboe 0 siblings, 1 reply; 12+ messages in thread From: Pavel Begunkov @ 2026-02-16 11:48 UTC (permalink / raw) To: io-uring; +Cc: asml.silence, axboe, Dylan Yudaken People previously asked for the notification CQE to have a different user_data value from the main request completion. It's useful to separate buffer and request handling logic and avoid separately refcounting the request. Let the user pass the notification user_data in sqe->addr3. If zero, it'll inherit sqe->user_data as before. It doesn't change the rules for when the user can expect a notification CQE, and it should still check the IORING_CQE_F_MORE flag. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> --- io_uring/net.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/io_uring/net.c b/io_uring/net.c index 7ebfd51b84de..5673f088fc1d 100644 --- a/io_uring/net.c +++ b/io_uring/net.c @@ -1327,11 +1327,12 @@ int io_send_zc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) struct io_ring_ctx *ctx = req->ctx; struct io_async_msghdr *iomsg; struct io_kiocb *notif; + u64 user_data; int ret; zc->done_io = 0; - if (unlikely(READ_ONCE(sqe->__pad2[0]) || READ_ONCE(sqe->addr3))) + if (unlikely(READ_ONCE(sqe->__pad2[0]))) return -EINVAL; /* we don't support IOSQE_CQE_SKIP_SUCCESS just yet */ if (req->flags & REQ_F_CQE_SKIP) @@ -1340,7 +1341,11 @@ int io_send_zc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) notif = zc->notif = io_alloc_notif(ctx); if (!notif) return -ENOMEM; - notif->cqe.user_data = req->cqe.user_data; + user_data = READ_ONCE(sqe->addr3); + if (!user_data) + user_data = req->cqe.user_data; + + notif->cqe.user_data = user_data; notif->cqe.res = 0; notif->cqe.flags = IORING_CQE_F_NOTIF; req->flags |= REQ_F_NEED_CLEANUP | REQ_F_POLL_NO_LAZY; -- 2.52.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH 1/1] io_uring/zctx: separate notification user_data 2026-02-16 11:48 [PATCH 1/1] io_uring/zctx: separate notification user_data Pavel Begunkov @ 2026-02-16 15:10 ` Jens Axboe 2026-02-16 15:48 ` Pavel Begunkov 0 siblings, 1 reply; 12+ messages in thread From: Jens Axboe @ 2026-02-16 15:10 UTC (permalink / raw) To: Pavel Begunkov, io-uring; +Cc: Dylan Yudaken On 2/16/26 4:48 AM, Pavel Begunkov wrote: > People previously asked for the notification CQE to have a different > user_data value from the main request completion. It's useful to > separate buffer and request handling logic and avoid separately > refcounting the request. > > Let the user pass the notification user_data in sqe->addr3. If zero, > it'll inherit sqe->user_data as before. It doesn't change the rules for > when the user can expect a notification CQE, and it should still check > the IORING_CQE_F_MORE flag. This should use and sqe->ioprio flag to manage it, otherwise you're excluding 0. Which may not be important in and of itself, but the flag approach is expected way to do this. Also, please use io_uring/net: for this in the subject and just have the title reflect zerocopy send. -- Jens Axboe ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/1] io_uring/zctx: separate notification user_data 2026-02-16 15:10 ` Jens Axboe @ 2026-02-16 15:48 ` Pavel Begunkov 2026-02-16 15:52 ` Jens Axboe 0 siblings, 1 reply; 12+ messages in thread From: Pavel Begunkov @ 2026-02-16 15:48 UTC (permalink / raw) To: Jens Axboe, io-uring; +Cc: Dylan Yudaken On 2/16/26 15:10, Jens Axboe wrote: > On 2/16/26 4:48 AM, Pavel Begunkov wrote: >> People previously asked for the notification CQE to have a different >> user_data value from the main request completion. It's useful to >> separate buffer and request handling logic and avoid separately >> refcounting the request. >> >> Let the user pass the notification user_data in sqe->addr3. If zero, >> it'll inherit sqe->user_data as before. It doesn't change the rules for >> when the user can expect a notification CQE, and it should still check >> the IORING_CQE_F_MORE flag. > > This should use and sqe->ioprio flag to manage it, otherwise you're > excluding 0. Which may not be important in and of itself, but the > flag approach is expected way to do this. What's the benefit? It's not unreasonable to exclude zero, it won't limit any use cases, and it's not new either (i.e. buffer tags). On the other hand, the user will now have to modify two fields instead of one, which is cleaner. And you're taking one extra bit out of 16bit ->ioprio, which is not critical if it's all going to be flags, but it wouldn't be an outrageous idea to take 8 bits out of it for some index, for example. -- Pavel Begunkov ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/1] io_uring/zctx: separate notification user_data 2026-02-16 15:48 ` Pavel Begunkov @ 2026-02-16 15:52 ` Jens Axboe 2026-02-16 15:53 ` Pavel Begunkov 0 siblings, 1 reply; 12+ messages in thread From: Jens Axboe @ 2026-02-16 15:52 UTC (permalink / raw) To: Pavel Begunkov, io-uring; +Cc: Dylan Yudaken On 2/16/26 8:48 AM, Pavel Begunkov wrote: > On 2/16/26 15:10, Jens Axboe wrote: >> On 2/16/26 4:48 AM, Pavel Begunkov wrote: >>> People previously asked for the notification CQE to have a different >>> user_data value from the main request completion. It's useful to >>> separate buffer and request handling logic and avoid separately >>> refcounting the request. >>> >>> Let the user pass the notification user_data in sqe->addr3. If zero, >>> it'll inherit sqe->user_data as before. It doesn't change the rules for >>> when the user can expect a notification CQE, and it should still check >>> the IORING_CQE_F_MORE flag. >> >> This should use and sqe->ioprio flag to manage it, otherwise you're >> excluding 0. Which may not be important in and of itself, but the >> flag approach is expected way to do this. > > What's the benefit? It's not unreasonable to exclude zero, it won't > limit any use cases, and it's not new either (i.e. buffer tags). > On the other hand, the user will now have to modify two fields > instead of one, which is cleaner. And you're taking one extra bit > out of 16bit ->ioprio, which is not critical if it's all going to > be flags, but it wouldn't be an outrageous idea to take 8 bits > out of it for some index, for example. The benefit is that it's weird to exclude a given user_data value, just so it can get used as both a key and a flag. IMHO much cleaner to have a flag for it which explicitly says "use the user_data I provide". Also easier to explain in docs, set this flag and then the value in X will be the user_data for the completion. -- Jens Axboe ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/1] io_uring/zctx: separate notification user_data 2026-02-16 15:52 ` Jens Axboe @ 2026-02-16 15:53 ` Pavel Begunkov 2026-02-16 15:55 ` Jens Axboe 0 siblings, 1 reply; 12+ messages in thread From: Pavel Begunkov @ 2026-02-16 15:53 UTC (permalink / raw) To: Jens Axboe, io-uring; +Cc: Dylan Yudaken On 2/16/26 15:52, Jens Axboe wrote: > On 2/16/26 8:48 AM, Pavel Begunkov wrote: >> On 2/16/26 15:10, Jens Axboe wrote: >>> On 2/16/26 4:48 AM, Pavel Begunkov wrote: >>>> People previously asked for the notification CQE to have a different >>>> user_data value from the main request completion. It's useful to >>>> separate buffer and request handling logic and avoid separately >>>> refcounting the request. >>>> >>>> Let the user pass the notification user_data in sqe->addr3. If zero, >>>> it'll inherit sqe->user_data as before. It doesn't change the rules for >>>> when the user can expect a notification CQE, and it should still check >>>> the IORING_CQE_F_MORE flag. >>> >>> This should use and sqe->ioprio flag to manage it, otherwise you're >>> excluding 0. Which may not be important in and of itself, but the >>> flag approach is expected way to do this. >> >> What's the benefit? It's not unreasonable to exclude zero, it won't >> limit any use cases, and it's not new either (i.e. buffer tags). >> On the other hand, the user will now have to modify two fields >> instead of one, which is cleaner. And you're taking one extra bit >> out of 16bit ->ioprio, which is not critical if it's all going to >> be flags, but it wouldn't be an outrageous idea to take 8 bits >> out of it for some index, for example. > > The benefit is that it's weird to exclude a given user_data value, just > so it can get used as both a key and a flag. IMHO much cleaner to have a > flag for it which explicitly says "use the user_data I provide". Also > easier to explain in docs, set this flag and then the value in X will be > the user_data for the completion. Ok, I'll respin, let's go with wasting bits for nothing. -- Pavel Begunkov ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/1] io_uring/zctx: separate notification user_data 2026-02-16 15:53 ` Pavel Begunkov @ 2026-02-16 15:55 ` Jens Axboe 2026-02-16 17:20 ` Pavel Begunkov 0 siblings, 1 reply; 12+ messages in thread From: Jens Axboe @ 2026-02-16 15:55 UTC (permalink / raw) To: Pavel Begunkov, io-uring; +Cc: Dylan Yudaken On 2/16/26 8:53 AM, Pavel Begunkov wrote: > On 2/16/26 15:52, Jens Axboe wrote: >> On 2/16/26 8:48 AM, Pavel Begunkov wrote: >>> On 2/16/26 15:10, Jens Axboe wrote: >>>> On 2/16/26 4:48 AM, Pavel Begunkov wrote: >>>>> People previously asked for the notification CQE to have a different >>>>> user_data value from the main request completion. It's useful to >>>>> separate buffer and request handling logic and avoid separately >>>>> refcounting the request. >>>>> >>>>> Let the user pass the notification user_data in sqe->addr3. If zero, >>>>> it'll inherit sqe->user_data as before. It doesn't change the rules for >>>>> when the user can expect a notification CQE, and it should still check >>>>> the IORING_CQE_F_MORE flag. >>>> >>>> This should use and sqe->ioprio flag to manage it, otherwise you're >>>> excluding 0. Which may not be important in and of itself, but the >>>> flag approach is expected way to do this. >>> >>> What's the benefit? It's not unreasonable to exclude zero, it won't >>> limit any use cases, and it's not new either (i.e. buffer tags). >>> On the other hand, the user will now have to modify two fields >>> instead of one, which is cleaner. And you're taking one extra bit >>> out of 16bit ->ioprio, which is not critical if it's all going to >>> be flags, but it wouldn't be an outrageous idea to take 8 bits >>> out of it for some index, for example. >> >> The benefit is that it's weird to exclude a given user_data value, just >> so it can get used as both a key and a flag. IMHO much cleaner to have a >> flag for it which explicitly says "use the user_data I provide". Also >> easier to explain in docs, set this flag and then the value in X will be >> the user_data for the completion. > > Ok, I'll respin, let's go with wasting bits for nothing. It's not like they are a scarce resource, and if we need more than 16 bits to modify send/recv behavior, then arguably we have bigger problems. -- Jens Axboe ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/1] io_uring/zctx: separate notification user_data 2026-02-16 15:55 ` Jens Axboe @ 2026-02-16 17:20 ` Pavel Begunkov 2026-02-16 17:27 ` Jens Axboe 0 siblings, 1 reply; 12+ messages in thread From: Pavel Begunkov @ 2026-02-16 17:20 UTC (permalink / raw) To: Jens Axboe, io-uring; +Cc: Dylan Yudaken On 2/16/26 15:55, Jens Axboe wrote: > On 2/16/26 8:53 AM, Pavel Begunkov wrote: >> On 2/16/26 15:52, Jens Axboe wrote: >>> On 2/16/26 8:48 AM, Pavel Begunkov wrote: >>>> On 2/16/26 15:10, Jens Axboe wrote: >>>>> On 2/16/26 4:48 AM, Pavel Begunkov wrote: >>>>>> People previously asked for the notification CQE to have a different >>>>>> user_data value from the main request completion. It's useful to >>>>>> separate buffer and request handling logic and avoid separately >>>>>> refcounting the request. >>>>>> >>>>>> Let the user pass the notification user_data in sqe->addr3. If zero, >>>>>> it'll inherit sqe->user_data as before. It doesn't change the rules for >>>>>> when the user can expect a notification CQE, and it should still check >>>>>> the IORING_CQE_F_MORE flag. >>>>> >>>>> This should use and sqe->ioprio flag to manage it, otherwise you're >>>>> excluding 0. Which may not be important in and of itself, but the >>>>> flag approach is expected way to do this. >>>> >>>> What's the benefit? It's not unreasonable to exclude zero, it won't >>>> limit any use cases, and it's not new either (i.e. buffer tags). >>>> On the other hand, the user will now have to modify two fields >>>> instead of one, which is cleaner. And you're taking one extra bit >>>> out of 16bit ->ioprio, which is not critical if it's all going to >>>> be flags, but it wouldn't be an outrageous idea to take 8 bits >>>> out of it for some index, for example. >>> >>> The benefit is that it's weird to exclude a given user_data value, just >>> so it can get used as both a key and a flag. IMHO much cleaner to have a >>> flag for it which explicitly says "use the user_data I provide". Also >>> easier to explain in docs, set this flag and then the value in X will be >>> the user_data for the completion. >> >> Ok, I'll respin, let's go with wasting bits for nothing. > > It's not like they are a scarce resource, and if we need more than 16 > bits to modify send/recv behavior, then arguably we have bigger > problems. There are already 6, it'll be 7th. I also have one or two more in mind, that's already over the half. The same was probably thought about sqe->flags, and even though it's twice as many bits for net, those are taken faster as potential cost of redesign is lower. Fwiw, the code is nastier as well, more branchy and away from other notification init because of dependency on reading the flags. @@ -1331,7 +1333,7 @@ int io_send_zc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) zc->done_io = 0; - if (unlikely(READ_ONCE(sqe->__pad2[0]) || READ_ONCE(sqe->addr3))) + if (unlikely(READ_ONCE(sqe->__pad2[0]))) return -EINVAL; /* we don't support IOSQE_CQE_SKIP_SUCCESS just yet */ if (req->flags & REQ_F_CQE_SKIP) @@ -1358,6 +1360,13 @@ int io_send_zc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) } } + if (zc->flags & IORING_SEND_ZC_NOTIF_USER_DATA) { + notif->cqe.user_data = READ_ONCE(sqe->addr3); + } else { + if (READ_ONCE(sqe->addr3)) + return -EINVAL; + } + zc->len = READ_ONCE(sqe->len); zc->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL | MSG_ZEROCOPY; req->buf_index = READ_ONCE(sqe->buf_index); -- Pavel Begunkov ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/1] io_uring/zctx: separate notification user_data 2026-02-16 17:20 ` Pavel Begunkov @ 2026-02-16 17:27 ` Jens Axboe 2026-02-17 11:15 ` Pavel Begunkov 0 siblings, 1 reply; 12+ messages in thread From: Jens Axboe @ 2026-02-16 17:27 UTC (permalink / raw) To: Pavel Begunkov, io-uring; +Cc: Dylan Yudaken On 2/16/26 10:20 AM, Pavel Begunkov wrote: > On 2/16/26 15:55, Jens Axboe wrote: >> On 2/16/26 8:53 AM, Pavel Begunkov wrote: >>> On 2/16/26 15:52, Jens Axboe wrote: >>>> On 2/16/26 8:48 AM, Pavel Begunkov wrote: >>>>> On 2/16/26 15:10, Jens Axboe wrote: >>>>>> On 2/16/26 4:48 AM, Pavel Begunkov wrote: >>>>>>> People previously asked for the notification CQE to have a different >>>>>>> user_data value from the main request completion. It's useful to >>>>>>> separate buffer and request handling logic and avoid separately >>>>>>> refcounting the request. >>>>>>> >>>>>>> Let the user pass the notification user_data in sqe->addr3. If zero, >>>>>>> it'll inherit sqe->user_data as before. It doesn't change the rules for >>>>>>> when the user can expect a notification CQE, and it should still check >>>>>>> the IORING_CQE_F_MORE flag. >>>>>> >>>>>> This should use and sqe->ioprio flag to manage it, otherwise you're >>>>>> excluding 0. Which may not be important in and of itself, but the >>>>>> flag approach is expected way to do this. >>>>> >>>>> What's the benefit? It's not unreasonable to exclude zero, it won't >>>>> limit any use cases, and it's not new either (i.e. buffer tags). >>>>> On the other hand, the user will now have to modify two fields >>>>> instead of one, which is cleaner. And you're taking one extra bit >>>>> out of 16bit ->ioprio, which is not critical if it's all going to >>>>> be flags, but it wouldn't be an outrageous idea to take 8 bits >>>>> out of it for some index, for example. >>>> >>>> The benefit is that it's weird to exclude a given user_data value, just >>>> so it can get used as both a key and a flag. IMHO much cleaner to have a >>>> flag for it which explicitly says "use the user_data I provide". Also >>>> easier to explain in docs, set this flag and then the value in X will be >>>> the user_data for the completion. >>> >>> Ok, I'll respin, let's go with wasting bits for nothing. >> >> It's not like they are a scarce resource, and if we need more than 16 >> bits to modify send/recv behavior, then arguably we have bigger >> problems. > > There are already 6, it'll be 7th. I also have one or two more in mind, > that's already over the half. The same was probably thought about > sqe->flags, and even though it's twice as many bits for net, those > are taken faster as potential cost of redesign is lower. > > Fwiw, the code is nastier as well, more branchy and away from > other notification init because of dependency on reading the > flags. > > @@ -1331,7 +1333,7 @@ int io_send_zc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) > > zc->done_io = 0; > > - if (unlikely(READ_ONCE(sqe->__pad2[0]) || READ_ONCE(sqe->addr3))) > + if (unlikely(READ_ONCE(sqe->__pad2[0]))) > return -EINVAL; > /* we don't support IOSQE_CQE_SKIP_SUCCESS just yet */ > if (req->flags & REQ_F_CQE_SKIP) > @@ -1358,6 +1360,13 @@ int io_send_zc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) > } > } > > + if (zc->flags & IORING_SEND_ZC_NOTIF_USER_DATA) { > + notif->cqe.user_data = READ_ONCE(sqe->addr3); > + } else { > + if (READ_ONCE(sqe->addr3)) > + return -EINVAL; > + } > + I think just remove the else part here - addr3 is valid now that IORING_SEND_ZC_NOTIF_USER_DATA is supported, and if you mess it up in your applications, you'll find this via development anyway. Since addr3 == 0 is a valid value, it doesn't make much sense to check for it being non-zero. It's not like a flags field where any value set would be an -EINVAL case. Doesn't even exclude having another flag for using addr3 for something else anyway. -- Jens Axboe ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/1] io_uring/zctx: separate notification user_data 2026-02-16 17:27 ` Jens Axboe @ 2026-02-17 11:15 ` Pavel Begunkov 2026-02-17 13:12 ` Jens Axboe 0 siblings, 1 reply; 12+ messages in thread From: Pavel Begunkov @ 2026-02-17 11:15 UTC (permalink / raw) To: Jens Axboe, io-uring; +Cc: Dylan Yudaken On 2/16/26 17:27, Jens Axboe wrote: ... >> There are already 6, it'll be 7th. I also have one or two more in mind, >> that's already over the half. The same was probably thought about >> sqe->flags, and even though it's twice as many bits for net, those >> are taken faster as potential cost of redesign is lower. >> >> Fwiw, the code is nastier as well, more branchy and away from >> other notification init because of dependency on reading the >> flags. >> >> @@ -1331,7 +1333,7 @@ int io_send_zc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) >> >> zc->done_io = 0; >> >> - if (unlikely(READ_ONCE(sqe->__pad2[0]) || READ_ONCE(sqe->addr3))) >> + if (unlikely(READ_ONCE(sqe->__pad2[0]))) >> return -EINVAL; >> /* we don't support IOSQE_CQE_SKIP_SUCCESS just yet */ >> if (req->flags & REQ_F_CQE_SKIP) >> @@ -1358,6 +1360,13 @@ int io_send_zc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) >> } >> } >> >> + if (zc->flags & IORING_SEND_ZC_NOTIF_USER_DATA) { >> + notif->cqe.user_data = READ_ONCE(sqe->addr3); >> + } else { >> + if (READ_ONCE(sqe->addr3)) >> + return -EINVAL; >> + } >> + > > I think just remove the else part here - addr3 is valid now that > IORING_SEND_ZC_NOTIF_USER_DATA is supported, and if you mess it up in > your applications, you'll find this via development anyway. Since addr3 > == 0 is a valid value, it doesn't make much sense to check for it being > non-zero. Gating it on a separate flag but not checking when not set makes it only more confusing in terms of why would you do a flag in the first place. It's not like a flags field where any value set would be an > -EINVAL case. Doesn't even exclude having another flag for using addr3 > for something else anyway. You can override the behaviour with another flag in either case, but realistically it's better to avoid as it's always messy, unless the features are clearly exclusive. I know there is no way to convince you, but v2 already degraded the uapi as per requested, can we have that one? The "else" branch doesn't make the api worse, on the opposite. -- Pavel Begunkov ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/1] io_uring/zctx: separate notification user_data 2026-02-17 11:15 ` Pavel Begunkov @ 2026-02-17 13:12 ` Jens Axboe 2026-02-17 15:03 ` Pavel Begunkov 0 siblings, 1 reply; 12+ messages in thread From: Jens Axboe @ 2026-02-17 13:12 UTC (permalink / raw) To: Pavel Begunkov, io-uring; +Cc: Dylan Yudaken On 2/17/26 4:15 AM, Pavel Begunkov wrote: > On 2/16/26 17:27, Jens Axboe wrote: > ... >>> There are already 6, it'll be 7th. I also have one or two more in mind, >>> that's already over the half. The same was probably thought about >>> sqe->flags, and even though it's twice as many bits for net, those >>> are taken faster as potential cost of redesign is lower. >>> >>> Fwiw, the code is nastier as well, more branchy and away from >>> other notification init because of dependency on reading the >>> flags. >>> >>> @@ -1331,7 +1333,7 @@ int io_send_zc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) >>> zc->done_io = 0; >>> - if (unlikely(READ_ONCE(sqe->__pad2[0]) || READ_ONCE(sqe->addr3))) >>> + if (unlikely(READ_ONCE(sqe->__pad2[0]))) >>> return -EINVAL; >>> /* we don't support IOSQE_CQE_SKIP_SUCCESS just yet */ >>> if (req->flags & REQ_F_CQE_SKIP) >>> @@ -1358,6 +1360,13 @@ int io_send_zc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) >>> } >>> } >>> + if (zc->flags & IORING_SEND_ZC_NOTIF_USER_DATA) { >>> + notif->cqe.user_data = READ_ONCE(sqe->addr3); >>> + } else { >>> + if (READ_ONCE(sqe->addr3)) >>> + return -EINVAL; >>> + } >>> + >> >> I think just remove the else part here - addr3 is valid now that >> IORING_SEND_ZC_NOTIF_USER_DATA is supported, and if you mess it up in >> your applications, you'll find this via development anyway. Since addr3 >> == 0 is a valid value, it doesn't make much sense to check for it being >> non-zero. > > Gating it on a separate flag but not checking when not set makes > it only more confusing in terms of why would you do a flag in > the first place. > > It's not like a flags field where any value set would be an >> -EINVAL case. Doesn't even exclude having another flag for using addr3 >> for something else anyway. > > You can override the behaviour with another flag in either case, > but realistically it's better to avoid as it's always messy, > unless the features are clearly exclusive. > > I know there is no way to convince you, but v2 already degraded > the uapi as per requested, can we have that one? The "else" branch > doesn't make the api worse, on the opposite. The else ties into all of it though, as it perpetuates the "user_data is zero is not valid" part. The reason we have the addr3 check in the first place is to have a way of saying "this field isn't used for this opcode, may be used in the future". Now it is used/supported, and I don't think we should be checking it. If we end up with future flags that also need addr3 and 0 is valid, then it'll end up with more checking for that, based on which flags are set and which are not. The patch should just be removing that addr3 -EINVAL case, and adding the two lines that check IORING_SEND_ZC_NOTIF_USER_DATA, and if set, assign notif->cqe.user_data from addr3. But I object to saying this is a "degraded" uapi, to me it's very much a better one as it allows all values of user_data, rather than have some magic 0 value that's not valid for no other reason than force policy. -- Jens Axboe ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/1] io_uring/zctx: separate notification user_data 2026-02-17 13:12 ` Jens Axboe @ 2026-02-17 15:03 ` Pavel Begunkov 2026-02-17 15:15 ` Jens Axboe 0 siblings, 1 reply; 12+ messages in thread From: Pavel Begunkov @ 2026-02-17 15:03 UTC (permalink / raw) To: Jens Axboe, io-uring; +Cc: Dylan Yudaken On 2/17/26 13:12, Jens Axboe wrote: > On 2/17/26 4:15 AM, Pavel Begunkov wrote: >> On 2/16/26 17:27, Jens Axboe wrote: >> ... >>>> There are already 6, it'll be 7th. I also have one or two more in mind, >>>> that's already over the half. The same was probably thought about >>>> sqe->flags, and even though it's twice as many bits for net, those >>>> are taken faster as potential cost of redesign is lower. >>>> >>>> Fwiw, the code is nastier as well, more branchy and away from >>>> other notification init because of dependency on reading the >>>> flags. >>>> >>>> @@ -1331,7 +1333,7 @@ int io_send_zc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) >>>> zc->done_io = 0; >>>> - if (unlikely(READ_ONCE(sqe->__pad2[0]) || READ_ONCE(sqe->addr3))) >>>> + if (unlikely(READ_ONCE(sqe->__pad2[0]))) >>>> return -EINVAL; >>>> /* we don't support IOSQE_CQE_SKIP_SUCCESS just yet */ >>>> if (req->flags & REQ_F_CQE_SKIP) >>>> @@ -1358,6 +1360,13 @@ int io_send_zc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) >>>> } >>>> } >>>> + if (zc->flags & IORING_SEND_ZC_NOTIF_USER_DATA) { >>>> + notif->cqe.user_data = READ_ONCE(sqe->addr3); >>>> + } else { >>>> + if (READ_ONCE(sqe->addr3)) >>>> + return -EINVAL; >>>> + } >>>> + >>> >>> I think just remove the else part here - addr3 is valid now that >>> IORING_SEND_ZC_NOTIF_USER_DATA is supported, and if you mess it up in >>> your applications, you'll find this via development anyway. Since addr3 >>> == 0 is a valid value, it doesn't make much sense to check for it being >>> non-zero. >> >> Gating it on a separate flag but not checking when not set makes >> it only more confusing in terms of why would you do a flag in >> the first place. >> >> It's not like a flags field where any value set would be an >>> -EINVAL case. Doesn't even exclude having another flag for using addr3 >>> for something else anyway. >> >> You can override the behaviour with another flag in either case, >> but realistically it's better to avoid as it's always messy, >> unless the features are clearly exclusive. >> >> I know there is no way to convince you, but v2 already degraded >> the uapi as per requested, can we have that one? The "else" branch >> doesn't make the api worse, on the opposite. > > The else ties into all of it though, as it perpetuates the "user_data is > zero is not valid" part. The reason we have the addr3 check in the first How come? If the flag is not set, it shouldn't be user_data and the kernel shouldn't care whether the user assumes otherwise. It's a general unused value check, and it is unused exactly because the field doesn't have any meaning put into it when the flag is not set. Same way nobody argues that sqe->__pad2[0] shouldn't be zero checked because a user might mess up the ABI by accident and put some user_data there. It'd be nice to be able to check if it's "set", but the best I can do is the typical "0 == not set". > place is to have a way of saying "this field isn't used for this opcode, > may be used in the future". Now it is used/supported, and I don't think addr3 is not used nor supported in the else branch. You'd allow users to set addr3 without the flag, but what behaviour does it supposed to achieve? Just ignoring doing nothing? > we should be checking it. If we end up with future flags that also need > addr3 and 0 is valid, then it'll end up with more checking for that, > based on which flags are set and which are not. Or it can be used without any new flags when the user_data flag is not set, assuming it can tolerate the default 0. > The patch should just be removing that addr3 -EINVAL case, and adding > the two lines that check IORING_SEND_ZC_NOTIF_USER_DATA, and if set, assign > notif->cqe.user_data from addr3. > > But I object to saying this is a "degraded" uapi, to me it's very much a > better one as it allows all values of user_data, rather than have some > magic 0 value that's not valid for no other reason than force policy. Well, we clearly disagree on that one. -- Pavel Begunkov ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/1] io_uring/zctx: separate notification user_data 2026-02-17 15:03 ` Pavel Begunkov @ 2026-02-17 15:15 ` Jens Axboe 0 siblings, 0 replies; 12+ messages in thread From: Jens Axboe @ 2026-02-17 15:15 UTC (permalink / raw) To: Pavel Begunkov, io-uring; +Cc: Dylan Yudaken On 2/17/26 8:03 AM, Pavel Begunkov wrote: > On 2/17/26 13:12, Jens Axboe wrote: >> On 2/17/26 4:15 AM, Pavel Begunkov wrote: >>> On 2/16/26 17:27, Jens Axboe wrote: >>> ... >>>>> There are already 6, it'll be 7th. I also have one or two more in mind, >>>>> that's already over the half. The same was probably thought about >>>>> sqe->flags, and even though it's twice as many bits for net, those >>>>> are taken faster as potential cost of redesign is lower. >>>>> >>>>> Fwiw, the code is nastier as well, more branchy and away from >>>>> other notification init because of dependency on reading the >>>>> flags. >>>>> >>>>> @@ -1331,7 +1333,7 @@ int io_send_zc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) >>>>> zc->done_io = 0; >>>>> - if (unlikely(READ_ONCE(sqe->__pad2[0]) || READ_ONCE(sqe->addr3))) >>>>> + if (unlikely(READ_ONCE(sqe->__pad2[0]))) >>>>> return -EINVAL; >>>>> /* we don't support IOSQE_CQE_SKIP_SUCCESS just yet */ >>>>> if (req->flags & REQ_F_CQE_SKIP) >>>>> @@ -1358,6 +1360,13 @@ int io_send_zc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) >>>>> } >>>>> } >>>>> + if (zc->flags & IORING_SEND_ZC_NOTIF_USER_DATA) { >>>>> + notif->cqe.user_data = READ_ONCE(sqe->addr3); >>>>> + } else { >>>>> + if (READ_ONCE(sqe->addr3)) >>>>> + return -EINVAL; >>>>> + } >>>>> + >>>> >>>> I think just remove the else part here - addr3 is valid now that >>>> IORING_SEND_ZC_NOTIF_USER_DATA is supported, and if you mess it up in >>>> your applications, you'll find this via development anyway. Since addr3 >>>> == 0 is a valid value, it doesn't make much sense to check for it being >>>> non-zero. >>> >>> Gating it on a separate flag but not checking when not set makes >>> it only more confusing in terms of why would you do a flag in >>> the first place. >>> >>> It's not like a flags field where any value set would be an >>>> -EINVAL case. Doesn't even exclude having another flag for using addr3 >>>> for something else anyway. >>> >>> You can override the behaviour with another flag in either case, >>> but realistically it's better to avoid as it's always messy, >>> unless the features are clearly exclusive. >>> >>> I know there is no way to convince you, but v2 already degraded >>> the uapi as per requested, can we have that one? The "else" branch >>> doesn't make the api worse, on the opposite. >> >> The else ties into all of it though, as it perpetuates the "user_data is >> zero is not valid" part. The reason we have the addr3 check in the first > > How come? If the flag is not set, it shouldn't be user_data and > the kernel shouldn't care whether the user assumes otherwise. It's > a general unused value check, and it is unused exactly because the > field doesn't have any meaning put into it when the flag is not > set. Same way nobody argues that sqe->__pad2[0] shouldn't be zero > checked because a user might mess up the ABI by accident and put > some user_data there. It'd be nice to be able to check if it's > "set", but the best I can do is the typical "0 == not set". Check if it's set == IORING_SEND_ZC_NOTIF_USER_DATA is set. That's why the flag is there. >> place is to have a way of saying "this field isn't used for this opcode, >> may be used in the future". Now it is used/supported, and I don't think > > addr3 is not used nor supported in the else branch. You'd allow > users to set addr3 without the flag, but what behaviour does it > supposed to achieve? Just ignoring doing nothing? It's not supposed to achieve anything, that is the point. The only downside I can see is that you now always need a flag to say 'use addr3' for this. But I think that's better than if we had used it for something else first, and then you'd need a flag later on anyway to say "no it's not X, it's Y if this flag is set". >> we should be checking it. If we end up with future flags that also need >> addr3 and 0 is valid, then it'll end up with more checking for that, >> based on which flags are set and which are not. > > Or it can be used without any new flags when the user_data flag > is not set, assuming it can tolerate the default 0. So you'd have addr3 be user_data is IORING_SEND_ZC_NOTIF_USER_DATA is set, and potentially something else if NO flag is set? That seems pretty confusing. >> The patch should just be removing that addr3 -EINVAL case, and adding >> the two lines that check IORING_SEND_ZC_NOTIF_USER_DATA, and if set, assign >> notif->cqe.user_data from addr3. >> >> But I object to saying this is a "degraded" uapi, to me it's very much a >> better one as it allows all values of user_data, rather than have some >> magic 0 value that's not valid for no other reason than force policy. > > Well, we clearly disagree on that one. In the spirit of making progress and not wasting anymore time on this fairly fruitless discussion, I'm fine with adding the else branch, and hence v2 as-is. -- Jens Axboe ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2026-02-17 15:15 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-02-16 11:48 [PATCH 1/1] io_uring/zctx: separate notification user_data Pavel Begunkov 2026-02-16 15:10 ` Jens Axboe 2026-02-16 15:48 ` Pavel Begunkov 2026-02-16 15:52 ` Jens Axboe 2026-02-16 15:53 ` Pavel Begunkov 2026-02-16 15:55 ` Jens Axboe 2026-02-16 17:20 ` Pavel Begunkov 2026-02-16 17:27 ` Jens Axboe 2026-02-17 11:15 ` Pavel Begunkov 2026-02-17 13:12 ` Jens Axboe 2026-02-17 15:03 ` Pavel Begunkov 2026-02-17 15:15 ` Jens Axboe
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox