* [PATCH v4 0/5] io_uring cmd for tx timestamps
@ 2025-06-13 18:32 Pavel Begunkov
2025-06-13 18:32 ` [PATCH v4 1/5] net: timestamp: add helper returning skb's tx tstamp Pavel Begunkov
` (4 more replies)
0 siblings, 5 replies; 11+ messages in thread
From: Pavel Begunkov @ 2025-06-13 18:32 UTC
To: io-uring, Vadim Fedorenko
Cc: asml.silence, netdev, Eric Dumazet, Kuniyuki Iwashima,
Paolo Abeni, Willem de Bruijn, David S . Miller, Jakub Kicinski,
Richard Cochran, Stanislav Fomichev, Jason Xing
Vadim Fedorenko suggested adding an alternative API for receiving
tx timestamps through io_uring. The series introduces an io_uring socket
cmd for fetching tx timestamps, which is a polled multishot request,
i.e. it internally polls the socket for POLLERR and posts timestamps
when they arrive. For the API description see Patch 5.
It reuses existing timestamp infra and takes them from the socket's
error queue. For networking people the important parts are Patch 1,
and io_uring_cmd_timestamp() from Patch 5 walking the error queue.
It should be reasonable to take it through the io_uring tree once
we have consensus, but let me know if there are any concerns.
v4: rename uapi flags, etc.
v3: Add a flag to distinguish sw vs hw timestamp. skb_get_tx_timestamp()
from Patch 1 now returns the indication of that, and in Patch 5
it's converted into an io_uring CQE bit flag.
v2: remove (rx) false timestamp handling
fix skipping already queued events on request submission
constantize socket in a helper
Pavel Begunkov (5):
net: timestamp: add helper returning skb's tx tstamp
io_uring/poll: introduce io_arm_apoll()
io_uring/cmd: allow multishot polled commands
io_uring: add mshot helper for posting CQE32
io_uring/netcmd: add tx timestamping cmd support
include/net/sock.h | 9 ++++
include/uapi/linux/io_uring.h | 16 +++++++
io_uring/cmd_net.c | 82 +++++++++++++++++++++++++++++++++++
io_uring/io_uring.c | 40 +++++++++++++++++
io_uring/io_uring.h | 1 +
io_uring/poll.c | 44 +++++++++++--------
io_uring/poll.h | 1 +
io_uring/uring_cmd.c | 34 +++++++++++++++
io_uring/uring_cmd.h | 7 +++
net/socket.c | 46 ++++++++++++++++++++
10 files changed, 263 insertions(+), 17 deletions(-)
--
2.49.0
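
The error queue walk mentioned above (io_uring_cmd_timestamp() in Patch 5) can be modelled in userspace as three steps: unlink payload-free timestamp entries into a private list, post them until the CQ runs out of space, then put any leftovers back at the head of the error queue. The following is a minimal sketch under stated assumptions, not the kernel code: `model_skb`, `model_queue` and the `model_*` helpers are hypothetical array-based stand-ins for `sk_buff`, `sk_buff_head` and the skb queue helpers, and locking is omitted entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_QCAP 8

/* Hypothetical stand-in for an error queue skb. */
struct model_skb {
	int id;
	bool has_tx_tstamp;	/* what skb_has_tx_timestamp() would report */
	unsigned int len;	/* payload length; non-zero entries are skipped */
};

/* Hypothetical stand-in for struct sk_buff_head. */
struct model_queue {
	struct model_skb items[MODEL_QCAP];
	size_t n;
};

/* Step 1: move timestamp-only entries from q to list, preserving order. */
static void model_take_eligible(struct model_queue *q, struct model_queue *list)
{
	size_t kept = 0;

	assert(list->n == 0);
	for (size_t i = 0; i < q->n; i++) {
		const struct model_skb skb = q->items[i];

		if (!skb.has_tx_tstamp || skb.len)
			q->items[kept++] = skb;		/* stays on the error queue */
		else
			list->items[list->n++] = skb;	/* unlinked for posting */
	}
	q->n = kept;
}

/* Step 2: consume entries from the front while CQ space lasts. */
static size_t model_post(struct model_queue *list, size_t cq_space)
{
	size_t done = list->n < cq_space ? list->n : cq_space;

	memmove(list->items, list->items + done,
		(list->n - done) * sizeof(list->items[0]));
	list->n -= done;
	return done;
}

/* Step 3: splice leftovers back onto the head of the error queue. */
static void model_splice_head(struct model_queue *list, struct model_queue *q)
{
	assert(list->n + q->n <= MODEL_QCAP);
	memmove(q->items + list->n, q->items, q->n * sizeof(q->items[0]));
	memcpy(q->items, list->items, list->n * sizeof(list->items[0]));
	q->n += list->n;
	list->n = 0;
}
```

Working on a private list keeps the time spent under the queue lock short, since posting CQEs happens after the lock is dropped. Returning leftovers to the head is this sketch's reading of the intent; it means unposted timestamps are retried first on the next poll wakeup.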
* [PATCH v4 1/5] net: timestamp: add helper returning skb's tx tstamp
2025-06-13 18:32 [PATCH v4 0/5] io_uring cmd for tx timestamps Pavel Begunkov
@ 2025-06-13 18:32 ` Pavel Begunkov
2025-06-16 2:31 ` Willem de Bruijn
2025-06-13 18:32 ` [PATCH v4 2/5] io_uring/poll: introduce io_arm_apoll() Pavel Begunkov
` (3 subsequent siblings)
4 siblings, 1 reply; 11+ messages in thread
From: Pavel Begunkov @ 2025-06-13 18:32 UTC
To: io-uring, Vadim Fedorenko
Cc: asml.silence, netdev, Eric Dumazet, Kuniyuki Iwashima,
Paolo Abeni, Willem de Bruijn, David S . Miller, Jakub Kicinski,
Richard Cochran, Stanislav Fomichev, Jason Xing
Add a helper function skb_get_tx_timestamp() that returns a tx timestamp
associated with an error queue skb.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
include/net/sock.h | 9 +++++++++
net/socket.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 55 insertions(+)
diff --git a/include/net/sock.h b/include/net/sock.h
index 92e7c1aae3cc..0b96196d8a34 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2677,6 +2677,15 @@ void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk,
void __sock_recv_wifi_status(struct msghdr *msg, struct sock *sk,
struct sk_buff *skb);
+enum {
+ NET_TIMESTAMP_ORIGIN_SW = 0,
+ NET_TIMESTAMP_ORIGIN_HW = 1,
+};
+
+bool skb_has_tx_timestamp(struct sk_buff *skb, const struct sock *sk);
+int skb_get_tx_timestamp(struct sk_buff *skb, struct sock *sk,
+ struct timespec64 *ts);
+
static inline void
sock_recv_timestamp(struct msghdr *msg, struct sock *sk, struct sk_buff *skb)
{
diff --git a/net/socket.c b/net/socket.c
index 9a0e720f0859..eefbd730a9a2 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -843,6 +843,52 @@ static void put_ts_pktinfo(struct msghdr *msg, struct sk_buff *skb,
sizeof(ts_pktinfo), &ts_pktinfo);
}
+bool skb_has_tx_timestamp(struct sk_buff *skb, const struct sock *sk)
+{
+ const struct sock_exterr_skb *serr = SKB_EXT_ERR(skb);
+ u32 tsflags = READ_ONCE(sk->sk_tsflags);
+
+ if (serr->ee.ee_errno != ENOMSG ||
+ serr->ee.ee_origin != SO_EE_ORIGIN_TIMESTAMPING)
+ return false;
+
+ /* software time stamp available and wanted */
+ if ((tsflags & SOF_TIMESTAMPING_SOFTWARE) && skb->tstamp)
+ return true;
+ /* hardware time stamps available and wanted */
+ return (tsflags & SOF_TIMESTAMPING_RAW_HARDWARE) &&
+ skb_hwtstamps(skb)->hwtstamp;
+}
+
+int skb_get_tx_timestamp(struct sk_buff *skb, struct sock *sk,
+ struct timespec64 *ts)
+{
+ u32 tsflags = READ_ONCE(sk->sk_tsflags);
+ ktime_t hwtstamp;
+ int if_index = 0;
+
+ if ((tsflags & SOF_TIMESTAMPING_SOFTWARE) &&
+ ktime_to_timespec64_cond(skb->tstamp, ts))
+ return NET_TIMESTAMP_ORIGIN_SW;
+
+ if (!(tsflags & SOF_TIMESTAMPING_RAW_HARDWARE) ||
+ skb_is_swtx_tstamp(skb, false))
+ return -ENOENT;
+
+ if (skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP_NETDEV)
+ hwtstamp = get_timestamp(sk, skb, &if_index);
+ else
+ hwtstamp = skb_hwtstamps(skb)->hwtstamp;
+
+ if (tsflags & SOF_TIMESTAMPING_BIND_PHC)
+ hwtstamp = ptp_convert_timestamp(&hwtstamp,
+ READ_ONCE(sk->sk_bind_phc));
+ if (!ktime_to_timespec64_cond(hwtstamp, ts))
+ return -ENOENT;
+
+ return NET_TIMESTAMP_ORIGIN_HW;
+}
+
/*
* called from sock_recv_timestamp() if sock_flag(sk, SOCK_RCVTSTAMP)
*/
--
2.49.0
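
The selection order in skb_get_tx_timestamp() can be illustrated with a small userspace model: a software timestamp wins when it is both requested and present, otherwise a requested hardware timestamp is used, otherwise -ENOENT. This is a simplified sketch, not the kernel helper. The `model_*` names are hypothetical, the `MODEL_SOF_*` macros are local mirrors of the uapi bit values, and the skb_is_swtx_tstamp(), SKBTX_HW_TSTAMP_NETDEV and PHC conversion steps are deliberately left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Local mirrors of SOF_TIMESTAMPING_SOFTWARE / SOF_TIMESTAMPING_RAW_HARDWARE. */
#define MODEL_SOF_SOFTWARE	(1u << 4)
#define MODEL_SOF_RAW_HARDWARE	(1u << 6)

enum {
	MODEL_ORIGIN_SW = 0,
	MODEL_ORIGIN_HW = 1,
};

/* Hypothetical stand-in for the two timestamps an skb may carry (0 = unset). */
struct model_skb {
	int64_t sw_ns;
	int64_t hw_ns;
};

struct model_ts {
	int64_t tv_sec;
	long tv_nsec;
};

/* Mirrors the idea of ktime_to_timespec64_cond(): fail on a zero time. */
static int model_to_ts(int64_t ns, struct model_ts *ts)
{
	if (!ns)
		return 0;
	ts->tv_sec = ns / 1000000000;
	ts->tv_nsec = (long)(ns % 1000000000);
	return 1;
}

static int model_get_tx_timestamp(const struct model_skb *skb,
				  uint32_t tsflags, struct model_ts *ts)
{
	assert(skb && ts);

	/* Software timestamp requested and available: prefer it. */
	if ((tsflags & MODEL_SOF_SOFTWARE) && model_to_ts(skb->sw_ns, ts))
		return MODEL_ORIGIN_SW;

	/* Otherwise fall back to a requested hardware timestamp. */
	if (!(tsflags & MODEL_SOF_RAW_HARDWARE))
		return -ENOENT;
	if (!model_to_ts(skb->hw_ns, ts))
		return -ENOENT;
	return MODEL_ORIGIN_HW;
}
```

Returning the origin as a non-negative value and failures as a negative errno lets the caller distinguish all three outcomes from a single int.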
* [PATCH v4 2/5] io_uring/poll: introduce io_arm_apoll()
2025-06-13 18:32 [PATCH v4 0/5] io_uring cmd for tx timestamps Pavel Begunkov
2025-06-13 18:32 ` [PATCH v4 1/5] net: timestamp: add helper returning skb's tx tstamp Pavel Begunkov
@ 2025-06-13 18:32 ` Pavel Begunkov
2025-06-13 18:32 ` [PATCH v4 3/5] io_uring/cmd: allow multishot polled commands Pavel Begunkov
` (2 subsequent siblings)
4 siblings, 0 replies; 11+ messages in thread
From: Pavel Begunkov @ 2025-06-13 18:32 UTC
To: io-uring, Vadim Fedorenko
Cc: asml.silence, netdev, Eric Dumazet, Kuniyuki Iwashima,
Paolo Abeni, Willem de Bruijn, David S . Miller, Jakub Kicinski,
Richard Cochran, Stanislav Fomichev, Jason Xing
In preparation for allowing commands to do file polling, add a helper
that takes the desired poll event mask and arms it for polling. We won't
be able to use io_arm_poll_handler() with IORING_OP_URING_CMD as it
tries to infer the mask from the opcode data, and we can't unify it
across all commands.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/poll.c | 44 +++++++++++++++++++++++++++-----------------
io_uring/poll.h | 1 +
2 files changed, 28 insertions(+), 17 deletions(-)
diff --git a/io_uring/poll.c b/io_uring/poll.c
index 0526062e2f81..c7e9fb34563d 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -669,33 +669,18 @@ static struct async_poll *io_req_alloc_apoll(struct io_kiocb *req,
return apoll;
}
-int io_arm_poll_handler(struct io_kiocb *req, unsigned issue_flags)
+int io_arm_apoll(struct io_kiocb *req, unsigned issue_flags, __poll_t mask)
{
- const struct io_issue_def *def = &io_issue_defs[req->opcode];
struct async_poll *apoll;
struct io_poll_table ipt;
- __poll_t mask = POLLPRI | POLLERR | EPOLLET;
int ret;
- if (!def->pollin && !def->pollout)
- return IO_APOLL_ABORTED;
+ mask |= EPOLLET;
if (!io_file_can_poll(req))
return IO_APOLL_ABORTED;
if (!(req->flags & REQ_F_APOLL_MULTISHOT))
mask |= EPOLLONESHOT;
- if (def->pollin) {
- mask |= EPOLLIN | EPOLLRDNORM;
-
- /* If reading from MSG_ERRQUEUE using recvmsg, ignore POLLIN */
- if (req->flags & REQ_F_CLEAR_POLLIN)
- mask &= ~EPOLLIN;
- } else {
- mask |= EPOLLOUT | EPOLLWRNORM;
- }
- if (def->poll_exclusive)
- mask |= EPOLLEXCLUSIVE;
-
apoll = io_req_alloc_apoll(req, issue_flags);
if (!apoll)
return IO_APOLL_ABORTED;
@@ -712,6 +697,31 @@ int io_arm_poll_handler(struct io_kiocb *req, unsigned issue_flags)
return IO_APOLL_OK;
}
+int io_arm_poll_handler(struct io_kiocb *req, unsigned issue_flags)
+{
+ const struct io_issue_def *def = &io_issue_defs[req->opcode];
+ __poll_t mask = POLLPRI | POLLERR;
+
+ if (!def->pollin && !def->pollout)
+ return IO_APOLL_ABORTED;
+ if (!io_file_can_poll(req))
+ return IO_APOLL_ABORTED;
+
+ if (def->pollin) {
+ mask |= EPOLLIN | EPOLLRDNORM;
+
+ /* If reading from MSG_ERRQUEUE using recvmsg, ignore POLLIN */
+ if (req->flags & REQ_F_CLEAR_POLLIN)
+ mask &= ~EPOLLIN;
+ } else {
+ mask |= EPOLLOUT | EPOLLWRNORM;
+ }
+ if (def->poll_exclusive)
+ mask |= EPOLLEXCLUSIVE;
+
+ return io_arm_apoll(req, issue_flags, mask);
+}
+
/*
* Returns true if we found and killed one or more poll requests
*/
diff --git a/io_uring/poll.h b/io_uring/poll.h
index 27e2db2ed4ae..c8438286dfa0 100644
--- a/io_uring/poll.h
+++ b/io_uring/poll.h
@@ -41,6 +41,7 @@ int io_poll_remove(struct io_kiocb *req, unsigned int issue_flags);
struct io_cancel_data;
int io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd,
unsigned issue_flags);
+int io_arm_apoll(struct io_kiocb *req, unsigned issue_flags, __poll_t mask);
int io_arm_poll_handler(struct io_kiocb *req, unsigned issue_flags);
bool io_poll_remove_all(struct io_ring_ctx *ctx, struct io_uring_task *tctx,
bool cancel_all);
--
2.49.0
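
The split introduced by this patch can be sketched as two mask-building stages: one that derives events from per-opcode properties (what io_arm_poll_handler() keeps doing), and one that applies the flags common to every armed request (what io_arm_apoll() does for any caller-supplied mask). The code below is an illustrative userspace model using the `EPOLL*` constants from `<sys/epoll.h>`; `model_issue_def` and both `model_*` functions are hypothetical names, and nothing is actually armed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/epoll.h>

/* Hypothetical subset of the per-opcode definition used for polling. */
struct model_issue_def {
	bool pollin;
	bool pollout;
	bool poll_exclusive;
};

/* Stage shared by all callers: edge-triggered, oneshot unless multishot. */
static uint32_t model_arm_apoll_mask(uint32_t mask, bool multishot)
{
	mask |= EPOLLET;
	if (!multishot)
		mask |= EPOLLONESHOT;
	return mask;
}

/* Opcode-specific stage: infer the events from the opcode's properties. */
static uint32_t model_opcode_mask(const struct model_issue_def *def,
				  bool clear_pollin)
{
	uint32_t mask = EPOLLPRI | EPOLLERR;

	assert(def->pollin || def->pollout);
	if (def->pollin) {
		mask |= EPOLLIN | EPOLLRDNORM;
		/* e.g. recvmsg reading only from MSG_ERRQUEUE */
		if (clear_pollin)
			mask &= ~(uint32_t)EPOLLIN;
	} else {
		mask |= EPOLLOUT | EPOLLWRNORM;
	}
	if (def->poll_exclusive)
		mask |= EPOLLEXCLUSIVE;
	return mask;
}
```

A command that only cares about the error queue can skip the opcode stage and pass EPOLLERR straight to the shared stage, which is the case the commit message says could not be expressed through io_arm_poll_handler().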
* [PATCH v4 3/5] io_uring/cmd: allow multishot polled commands
2025-06-13 18:32 [PATCH v4 0/5] io_uring cmd for tx timestamps Pavel Begunkov
2025-06-13 18:32 ` [PATCH v4 1/5] net: timestamp: add helper returning skb's tx tstamp Pavel Begunkov
2025-06-13 18:32 ` [PATCH v4 2/5] io_uring/poll: introduce io_arm_apoll() Pavel Begunkov
@ 2025-06-13 18:32 ` Pavel Begunkov
2025-06-13 18:32 ` [PATCH v4 4/5] io_uring: add mshot helper for posting CQE32 Pavel Begunkov
2025-06-13 18:32 ` [PATCH v4 5/5] io_uring/netcmd: add tx timestamping cmd support Pavel Begunkov
4 siblings, 0 replies; 11+ messages in thread
From: Pavel Begunkov @ 2025-06-13 18:32 UTC
To: io-uring, Vadim Fedorenko
Cc: asml.silence, netdev, Eric Dumazet, Kuniyuki Iwashima,
Paolo Abeni, Willem de Bruijn, David S . Miller, Jakub Kicinski,
Richard Cochran, Stanislav Fomichev, Jason Xing
Some commands like timestamping in the next patch can make use of
multishot polling, i.e. REQ_F_APOLL_MULTISHOT. Add support for that,
which is condensed in a single helper called io_cmd_poll_multishot().
The user who wants to continue with a request in a multishot mode must
call the function, and only if it returns 0 the user is free to proceed.
Apart from normal terminal errors, it can also end up with -EIOCBQUEUED,
in which case the user must forward it to the core io_uring. It's
forbidden to use task work while the request is executing in a multishot
mode.
The API is not foolproof, hence it's not exported to modules nor exposed
in public headers.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/uring_cmd.c | 23 +++++++++++++++++++++++
io_uring/uring_cmd.h | 3 +++
2 files changed, 26 insertions(+)
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 9ad0ea5398c2..02cec6231831 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -12,6 +12,7 @@
#include "alloc_cache.h"
#include "rsrc.h"
#include "uring_cmd.h"
+#include "poll.h"
void io_cmd_cache_free(const void *entry)
{
@@ -136,6 +137,9 @@ void __io_uring_cmd_do_in_task(struct io_uring_cmd *ioucmd,
{
struct io_kiocb *req = cmd_to_io_kiocb(ioucmd);
+ if (WARN_ON_ONCE(req->flags & REQ_F_APOLL_MULTISHOT))
+ return;
+
ioucmd->task_work_cb = task_work_cb;
req->io_task_work.func = io_uring_cmd_work;
__io_req_task_work_add(req, flags);
@@ -158,6 +162,9 @@ void io_uring_cmd_done(struct io_uring_cmd *ioucmd, ssize_t ret, u64 res2,
{
struct io_kiocb *req = cmd_to_io_kiocb(ioucmd);
+ if (WARN_ON_ONCE(req->flags & REQ_F_APOLL_MULTISHOT))
+ return;
+
io_uring_cmd_del_cancelable(ioucmd, issue_flags);
if (ret < 0)
@@ -305,3 +312,19 @@ void io_uring_cmd_issue_blocking(struct io_uring_cmd *ioucmd)
io_req_queue_iowq(req);
}
+
+int io_cmd_poll_multishot(struct io_uring_cmd *cmd,
+ unsigned int issue_flags, __poll_t mask)
+{
+ struct io_kiocb *req = cmd_to_io_kiocb(cmd);
+ int ret;
+
+ if (likely(req->flags & REQ_F_APOLL_MULTISHOT))
+ return 0;
+
+ req->flags |= REQ_F_APOLL_MULTISHOT;
+ mask &= ~EPOLLONESHOT;
+
+ ret = io_arm_apoll(req, issue_flags, mask);
+ return ret == IO_APOLL_OK ? -EIOCBQUEUED : -ECANCELED;
+}
diff --git a/io_uring/uring_cmd.h b/io_uring/uring_cmd.h
index a6dad47afc6b..50a6ccb831df 100644
--- a/io_uring/uring_cmd.h
+++ b/io_uring/uring_cmd.h
@@ -18,3 +18,6 @@ bool io_uring_try_cancel_uring_cmd(struct io_ring_ctx *ctx,
struct io_uring_task *tctx, bool cancel_all);
void io_cmd_cache_free(const void *entry);
+
+int io_cmd_poll_multishot(struct io_uring_cmd *cmd,
+ unsigned int issue_flags, __poll_t mask);
--
2.49.0
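
The calling convention described in the commit message can be modelled as a two-state machine: the first invocation marks the request multishot, arms polling and returns -EIOCBQUEUED (or -ECANCELED if arming fails), and every later invocation returns 0 so the handler may proceed. The sketch below is a userspace illustration under those assumptions; `model_req`, `model_cmd_poll_multishot()` and `model_cmd_handler()` are hypothetical, and `MODEL_EIOCBQUEUED` is a local copy of the kernel-internal errno value, which userspace headers do not provide.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Kernel-internal errno; defined locally for the model. */
#define MODEL_EIOCBQUEUED 529

/* Hypothetical stand-in for the request state the helper looks at. */
struct model_req {
	bool multishot;		/* models REQ_F_APOLL_MULTISHOT */
	bool file_can_poll;	/* whether arming the poll would succeed */
};

static int model_cmd_poll_multishot(struct model_req *req)
{
	assert(req);

	/* Already armed: we were woken by poll, the caller may proceed. */
	if (req->multishot)
		return 0;

	req->multishot = true;
	return req->file_can_poll ? -MODEL_EIOCBQUEUED : -ECANCELED;
}

/* Shape of a command handler built on top of the helper. */
static int model_cmd_handler(struct model_req *req, int *events_handled)
{
	int ret = model_cmd_poll_multishot(req);

	if (ret)
		return ret;	/* forwarded to the io_uring core as is */

	(*events_handled)++;	/* do the per-wakeup work here */
	return -EAGAIN;		/* ask to be called again on the next event */
}
```

The handler never completes the request itself in this model, which matches the WARN_ON_ONCE() guards the patch adds to io_uring_cmd_done() and the task work path.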
* [PATCH v4 4/5] io_uring: add mshot helper for posting CQE32
2025-06-13 18:32 [PATCH v4 0/5] io_uring cmd for tx timestamps Pavel Begunkov
` (2 preceding siblings ...)
2025-06-13 18:32 ` [PATCH v4 3/5] io_uring/cmd: allow multishot polled commands Pavel Begunkov
@ 2025-06-13 18:32 ` Pavel Begunkov
2025-06-13 18:32 ` [PATCH v4 5/5] io_uring/netcmd: add tx timestamping cmd support Pavel Begunkov
4 siblings, 0 replies; 11+ messages in thread
From: Pavel Begunkov @ 2025-06-13 18:32 UTC
To: io-uring, Vadim Fedorenko
Cc: asml.silence, netdev, Eric Dumazet, Kuniyuki Iwashima,
Paolo Abeni, Willem de Bruijn, David S . Miller, Jakub Kicinski,
Richard Cochran, Stanislav Fomichev, Jason Xing
Add a helper for posting 32 byte CQEs in a multishot mode and add a cmd
helper on top. As it specifically works with requests, the helper ignores
the passed in cqe->user_data and sets it to the one stored in the
request.
The command helper is only valid with multishot requests.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/io_uring.c | 40 ++++++++++++++++++++++++++++++++++++++++
io_uring/io_uring.h | 1 +
io_uring/uring_cmd.c | 11 +++++++++++
io_uring/uring_cmd.h | 4 ++++
4 files changed, 56 insertions(+)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 98a701fc56cc..4352cf209450 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -793,6 +793,21 @@ bool io_cqe_cache_refill(struct io_ring_ctx *ctx, bool overflow)
return true;
}
+static bool io_fill_cqe_aux32(struct io_ring_ctx *ctx,
+ struct io_uring_cqe src_cqe[2])
+{
+ struct io_uring_cqe *cqe;
+
+ if (WARN_ON_ONCE(!(ctx->flags & IORING_SETUP_CQE32)))
+ return false;
+ if (unlikely(!io_get_cqe(ctx, &cqe)))
+ return false;
+
+ memcpy(cqe, src_cqe, 2 * sizeof(*cqe));
+ trace_io_uring_complete(ctx, NULL, cqe);
+ return true;
+}
+
static bool io_fill_cqe_aux(struct io_ring_ctx *ctx, u64 user_data, s32 res,
u32 cflags)
{
@@ -904,6 +919,31 @@ bool io_req_post_cqe(struct io_kiocb *req, s32 res, u32 cflags)
return posted;
}
+/*
+ * A helper for multishot requests posting additional CQEs.
+ * Should only be used from a task_work including IO_URING_F_MULTISHOT.
+ */
+bool io_req_post_cqe32(struct io_kiocb *req, struct io_uring_cqe cqe[2])
+{
+ struct io_ring_ctx *ctx = req->ctx;
+ bool posted;
+
+ lockdep_assert(!io_wq_current_is_worker());
+ lockdep_assert_held(&ctx->uring_lock);
+
+ cqe[0].user_data = req->cqe.user_data;
+ if (!ctx->lockless_cq) {
+ spin_lock(&ctx->completion_lock);
+ posted = io_fill_cqe_aux32(ctx, cqe);
+ spin_unlock(&ctx->completion_lock);
+ } else {
+ posted = io_fill_cqe_aux32(ctx, cqe);
+ }
+
+ ctx->submit_state.cq_flush = true;
+ return posted;
+}
+
static void io_req_complete_post(struct io_kiocb *req, unsigned issue_flags)
{
struct io_ring_ctx *ctx = req->ctx;
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index d59c12277d58..1263af818c47 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -81,6 +81,7 @@ void io_req_defer_failed(struct io_kiocb *req, s32 res);
bool io_post_aux_cqe(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags);
void io_add_aux_cqe(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags);
bool io_req_post_cqe(struct io_kiocb *req, s32 res, u32 cflags);
+bool io_req_post_cqe32(struct io_kiocb *req, struct io_uring_cqe src_cqe[2]);
void __io_commit_cqring_flush(struct io_ring_ctx *ctx);
void io_req_track_inflight(struct io_kiocb *req);
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 02cec6231831..b228b84a510f 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -328,3 +328,14 @@ int io_cmd_poll_multishot(struct io_uring_cmd *cmd,
ret = io_arm_apoll(req, issue_flags, mask);
return ret == IO_APOLL_OK ? -EIOCBQUEUED : -ECANCELED;
}
+
+bool io_uring_cmd_post_mshot_cqe32(struct io_uring_cmd *cmd,
+ unsigned int issue_flags,
+ struct io_uring_cqe cqe[2])
+{
+ struct io_kiocb *req = cmd_to_io_kiocb(cmd);
+
+ if (WARN_ON_ONCE(!(issue_flags & IO_URING_F_MULTISHOT)))
+ return false;
+ return io_req_post_cqe32(req, cqe);
+}
diff --git a/io_uring/uring_cmd.h b/io_uring/uring_cmd.h
index 50a6ccb831df..9e11da10ecab 100644
--- a/io_uring/uring_cmd.h
+++ b/io_uring/uring_cmd.h
@@ -17,6 +17,10 @@ void io_uring_cmd_cleanup(struct io_kiocb *req);
bool io_uring_try_cancel_uring_cmd(struct io_ring_ctx *ctx,
struct io_uring_task *tctx, bool cancel_all);
+bool io_uring_cmd_post_mshot_cqe32(struct io_uring_cmd *cmd,
+ unsigned int issue_flags,
+ struct io_uring_cqe cqe[2]);
+
void io_cmd_cache_free(const void *entry);
int io_cmd_poll_multishot(struct io_uring_cmd *cmd,
--
2.49.0
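
What the CQE32 helper does can be shown with a toy completion queue: the caller passes two 16-byte halves, the helper overwrites user_data in the first half with the value stored in the request, and both halves are copied into one 32-byte slot if space is available. This is a sketch only; `model_cqe` mirrors the field layout of struct io_uring_cqe without its trailing array, and `model_cq`, `MODEL_CQ_ENTRIES` and `model_req_post_cqe32()` are hypothetical. Locking and the cq_flush bookkeeping are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MODEL_CQ_ENTRIES 2

/* Same field layout as one 16-byte half of a CQE. */
struct model_cqe {
	uint64_t user_data;
	int32_t res;
	uint32_t flags;
};
_Static_assert(sizeof(struct model_cqe) == 16, "one CQE half is 16 bytes");

/* Toy CQ where every slot holds a 32-byte (double) CQE. */
struct model_cq {
	struct model_cqe slots[MODEL_CQ_ENTRIES][2];
	unsigned int tail;
};

static bool model_req_post_cqe32(struct model_cq *cq, uint64_t req_user_data,
				 struct model_cqe cqe[2])
{
	assert(cq && cqe);

	/* The caller's user_data is ignored; the request's value is used. */
	cqe[0].user_data = req_user_data;

	if (cq->tail == MODEL_CQ_ENTRIES)
		return false;	/* no space left, nothing is posted */

	memcpy(cq->slots[cq->tail++], cqe, 2 * sizeof(*cqe));
	return true;
}
```

Only the first half has its user_data rewritten, so the second half is free to carry an arbitrary 16-byte payload.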
* [PATCH v4 5/5] io_uring/netcmd: add tx timestamping cmd support
2025-06-13 18:32 [PATCH v4 0/5] io_uring cmd for tx timestamps Pavel Begunkov
` (3 preceding siblings ...)
2025-06-13 18:32 ` [PATCH v4 4/5] io_uring: add mshot helper for posting CQE32 Pavel Begunkov
@ 2025-06-13 18:32 ` Pavel Begunkov
2025-06-16 2:33 ` Willem de Bruijn
4 siblings, 1 reply; 11+ messages in thread
From: Pavel Begunkov @ 2025-06-13 18:32 UTC
To: io-uring, Vadim Fedorenko
Cc: asml.silence, netdev, Eric Dumazet, Kuniyuki Iwashima,
Paolo Abeni, Willem de Bruijn, David S . Miller, Jakub Kicinski,
Richard Cochran, Stanislav Fomichev, Jason Xing
Add a new socket command which returns tx time stamps to the user. It
provides an alternative to the existing error queue recvmsg interface.
The command works in a polled multishot mode, which means io_uring will
poll the socket and keep posting timestamps until the request is
cancelled or fails in any other way (e.g. with no space in the CQ). It
reuses the net infra and grabs timestamps from the socket's error queue.
The command requires IORING_SETUP_CQE32. All non-final CQEs (marked with
IORING_CQE_F_MORE) have cqe->res set to the tskey, and the upper bits
of cqe->flags keep tstype (i.e. offset by IORING_TIMESTAMP_TYPE_SHIFT),
with IORING_CQE_F_TSTAMP_HW set for hardware timestamps. The
time value is stored in the upper part of the extended CQE. The final
completion won't have IORING_CQE_F_MORE and will have cqe->res storing
0/error.
Suggested-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
include/uapi/linux/io_uring.h | 16 +++++++
io_uring/cmd_net.c | 82 +++++++++++++++++++++++++++++++++++
2 files changed, 98 insertions(+)
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index cfd17e382082..dcadf709bfc4 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -968,6 +968,22 @@ enum io_uring_socket_op {
SOCKET_URING_OP_SIOCOUTQ,
SOCKET_URING_OP_GETSOCKOPT,
SOCKET_URING_OP_SETSOCKOPT,
+ SOCKET_URING_OP_TX_TIMESTAMP,
+};
+
+/*
+ * SOCKET_URING_OP_TX_TIMESTAMP definitions
+ */
+
+#define IORING_TIMESTAMP_HW_SHIFT 16
+/* The cqe->flags bit from which the timestamp type is stored */
+#define IORING_TIMESTAMP_TYPE_SHIFT (IORING_TIMESTAMP_HW_SHIFT + 1)
+/* The cqe->flags flag signifying whether it's a hardware timestamp */
+#define IORING_CQE_F_TSTAMP_HW ((__u32)1 << IORING_TIMESTAMP_HW_SHIFT)
+
+struct io_timespec {
+ __u64 tv_sec;
+ __u64 tv_nsec;
};
/* Zero copy receive refill queue entry */
diff --git a/io_uring/cmd_net.c b/io_uring/cmd_net.c
index e99170c7d41a..39726283b951 100644
--- a/io_uring/cmd_net.c
+++ b/io_uring/cmd_net.c
@@ -1,5 +1,6 @@
#include <asm/ioctls.h>
#include <linux/io_uring/net.h>
+#include <linux/errqueue.h>
#include <net/sock.h>
#include "uring_cmd.h"
@@ -51,6 +52,85 @@ static inline int io_uring_cmd_setsockopt(struct socket *sock,
optlen);
}
+static bool io_process_timestamp_skb(struct io_uring_cmd *cmd, struct sock *sk,
+ struct sk_buff *skb, unsigned issue_flags)
+{
+ struct sock_exterr_skb *serr = SKB_EXT_ERR(skb);
+ struct io_uring_cqe cqe[2];
+ struct io_timespec *iots;
+ struct timespec64 ts;
+ u32 tstype, tskey;
+ int ret;
+
+ BUILD_BUG_ON(sizeof(struct io_uring_cqe) != sizeof(struct io_timespec));
+
+ ret = skb_get_tx_timestamp(skb, sk, &ts);
+ if (ret < 0)
+ return false;
+
+ tskey = serr->ee.ee_data;
+ tstype = serr->ee.ee_info;
+
+ cqe->user_data = 0;
+ cqe->res = tskey;
+ cqe->flags = IORING_CQE_F_MORE;
+ cqe->flags |= tstype << IORING_TIMESTAMP_TYPE_SHIFT;
+ if (ret == NET_TIMESTAMP_ORIGIN_HW)
+ cqe->flags |= IORING_CQE_F_TSTAMP_HW;
+
+ iots = (struct io_timespec *)&cqe[1];
+ iots->tv_sec = ts.tv_sec;
+ iots->tv_nsec = ts.tv_nsec;
+ return io_uring_cmd_post_mshot_cqe32(cmd, issue_flags, cqe);
+}
+
+static int io_uring_cmd_timestamp(struct socket *sock,
+ struct io_uring_cmd *cmd,
+ unsigned int issue_flags)
+{
+ struct sock *sk = sock->sk;
+ struct sk_buff_head *q = &sk->sk_error_queue;
+ struct sk_buff *skb, *tmp;
+ struct sk_buff_head list;
+ int ret;
+
+ if (!(issue_flags & IO_URING_F_CQE32))
+ return -EINVAL;
+ ret = io_cmd_poll_multishot(cmd, issue_flags, EPOLLERR);
+ if (unlikely(ret))
+ return ret;
+
+ if (skb_queue_empty_lockless(q))
+ return -EAGAIN;
+ __skb_queue_head_init(&list);
+
+ scoped_guard(spinlock_irq, &q->lock) {
+ skb_queue_walk_safe(q, skb, tmp) {
+ /* don't support skbs with payload */
+ if (!skb_has_tx_timestamp(skb, sk) || skb->len)
+ continue;
+ __skb_unlink(skb, q);
+ __skb_queue_tail(&list, skb);
+ }
+ }
+
+ while (1) {
+ skb = skb_peek(&list);
+ if (!skb)
+ break;
+ if (!io_process_timestamp_skb(cmd, sk, skb, issue_flags))
+ break;
+ __skb_dequeue(&list);
+ consume_skb(skb);
+ }
+
+ if (unlikely(!skb_queue_empty(&list))) {
+ scoped_guard(spinlock_irqsave, &q->lock)
+ skb_queue_splice(&list, q);
+ }
+ return -EAGAIN;
+}
+
int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
{
struct socket *sock = cmd->file->private_data;
@@ -76,6 +156,8 @@ int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
return io_uring_cmd_getsockopt(sock, cmd, issue_flags);
case SOCKET_URING_OP_SETSOCKOPT:
return io_uring_cmd_setsockopt(sock, cmd, issue_flags);
+ case SOCKET_URING_OP_TX_TIMESTAMP:
+ return io_uring_cmd_timestamp(sock, cmd, issue_flags);
default:
return -EOPNOTSUPP;
}
--
2.49.0
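
On the consumer side, a timestamp CQE produced by this command could be decoded roughly as follows. The sketch follows the uapi definitions in the diff above (hardware flag at bit 16, tstype starting at bit 17) and models the 32-byte CQE as one flat struct. All `model_*` and `MODEL_*` names are hypothetical local mirrors so that the example builds without kernel headers carrying this patch; `model_encode()` only exists to imitate what io_process_timestamp_skb() fills in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local mirrors of IORING_CQE_F_MORE and the definitions from this patch. */
#define MODEL_CQE_F_MORE		(1u << 1)
#define MODEL_TIMESTAMP_HW_SHIFT	16
#define MODEL_TIMESTAMP_TYPE_SHIFT	(MODEL_TIMESTAMP_HW_SHIFT + 1)
#define MODEL_CQE_F_TSTAMP_HW		((uint32_t)1 << MODEL_TIMESTAMP_HW_SHIFT)

/* A 32-byte CQE: the regular half followed by struct io_timespec. */
struct model_cqe32 {
	uint64_t user_data;
	int32_t res;
	uint32_t flags;
	uint64_t tv_sec;
	uint64_t tv_nsec;
};
_Static_assert(sizeof(struct model_cqe32) == 32, "CQE32 is 32 bytes");

struct model_tx_tstamp {
	uint32_t tskey;
	uint32_t tstype;
	bool hw;
	uint64_t tv_sec;
	uint64_t tv_nsec;
};

/* Imitates the kernel side for the purpose of the example. */
static void model_encode(struct model_cqe32 *cqe, uint32_t tskey,
			 uint32_t tstype, bool hw, uint64_t sec, uint64_t nsec)
{
	cqe->user_data = 0;
	cqe->res = (int32_t)tskey;
	cqe->flags = MODEL_CQE_F_MORE | (tstype << MODEL_TIMESTAMP_TYPE_SHIFT);
	if (hw)
		cqe->flags |= MODEL_CQE_F_TSTAMP_HW;
	cqe->tv_sec = sec;
	cqe->tv_nsec = nsec;
}

/* Returns false for the final CQE, where res holds 0 or a negative error. */
static bool model_decode(const struct model_cqe32 *cqe,
			 struct model_tx_tstamp *out)
{
	assert(cqe && out);

	if (!(cqe->flags & MODEL_CQE_F_MORE))
		return false;

	out->tskey = (uint32_t)cqe->res;
	out->tstype = cqe->flags >> MODEL_TIMESTAMP_TYPE_SHIFT;
	out->hw = (cqe->flags & MODEL_CQE_F_TSTAMP_HW) != 0;
	out->tv_sec = cqe->tv_sec;
	out->tv_nsec = cqe->tv_nsec;
	return true;
}
```

Checking IORING_CQE_F_MORE first matters because cqe->res changes meaning between timestamp CQEs (tskey) and the terminating CQE (0 or error).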
* Re: [PATCH v4 1/5] net: timestamp: add helper returning skb's tx tstamp
2025-06-13 18:32 ` [PATCH v4 1/5] net: timestamp: add helper returning skb's tx tstamp Pavel Begunkov
@ 2025-06-16 2:31 ` Willem de Bruijn
2025-06-16 9:44 ` Pavel Begunkov
0 siblings, 1 reply; 11+ messages in thread
From: Willem de Bruijn @ 2025-06-16 2:31 UTC
To: Pavel Begunkov, io-uring, Vadim Fedorenko
Cc: asml.silence, netdev, Eric Dumazet, Kuniyuki Iwashima,
Paolo Abeni, Willem de Bruijn, David S . Miller, Jakub Kicinski,
Richard Cochran, Stanislav Fomichev, Jason Xing
Pavel Begunkov wrote:
> Add a helper function skb_get_tx_timestamp() that returns a tx timestamp
> associated with an error queue skb.
>
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
> include/net/sock.h | 9 +++++++++
> net/socket.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 55 insertions(+)
>
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 92e7c1aae3cc..0b96196d8a34 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -2677,6 +2677,15 @@ void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk,
> void __sock_recv_wifi_status(struct msghdr *msg, struct sock *sk,
> struct sk_buff *skb);
>
> +enum {
> + NET_TIMESTAMP_ORIGIN_SW = 0,
> + NET_TIMESTAMP_ORIGIN_HW = 1,
> +};
Can you avoid introducing a new enum, and instead just return
SOF_TIMESTAMPING_TX_HARDWARE (1) or SOF_TIMESTAMPING_TX_SOFTWARE (2)?
> +
> +bool skb_has_tx_timestamp(struct sk_buff *skb, const struct sock *sk);
> +int skb_get_tx_timestamp(struct sk_buff *skb, struct sock *sk,
> + struct timespec64 *ts);
> +
> static inline void
> sock_recv_timestamp(struct msghdr *msg, struct sock *sk, struct sk_buff *skb)
> {
> diff --git a/net/socket.c b/net/socket.c
> index 9a0e720f0859..eefbd730a9a2 100644
> --- a/net/socket.c
> +++ b/net/socket.c
> @@ -843,6 +843,52 @@ static void put_ts_pktinfo(struct msghdr *msg, struct sk_buff *skb,
> sizeof(ts_pktinfo), &ts_pktinfo);
> }
>
> +bool skb_has_tx_timestamp(struct sk_buff *skb, const struct sock *sk)
> +{
> + const struct sock_exterr_skb *serr = SKB_EXT_ERR(skb);
> + u32 tsflags = READ_ONCE(sk->sk_tsflags);
> +
> + if (serr->ee.ee_errno != ENOMSG ||
> + serr->ee.ee_origin != SO_EE_ORIGIN_TIMESTAMPING)
> + return false;
> +
> + /* software time stamp available and wanted */
> + if ((tsflags & SOF_TIMESTAMPING_SOFTWARE) && skb->tstamp)
> + return true;
> + /* hardware time stamps available and wanted */
> + return (tsflags & SOF_TIMESTAMPING_RAW_HARDWARE) &&
> + skb_hwtstamps(skb)->hwtstamp;
> +}
> +
> +int skb_get_tx_timestamp(struct sk_buff *skb, struct sock *sk,
> + struct timespec64 *ts)
> +{
> + u32 tsflags = READ_ONCE(sk->sk_tsflags);
> + ktime_t hwtstamp;
> + int if_index = 0;
> +
> + if ((tsflags & SOF_TIMESTAMPING_SOFTWARE) &&
> + ktime_to_timespec64_cond(skb->tstamp, ts))
> + return NET_TIMESTAMP_ORIGIN_SW;
> +
> + if (!(tsflags & SOF_TIMESTAMPING_RAW_HARDWARE) ||
> + skb_is_swtx_tstamp(skb, false))
> + return -ENOENT;
> +
> + if (skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP_NETDEV)
> + hwtstamp = get_timestamp(sk, skb, &if_index);
> + else
> + hwtstamp = skb_hwtstamps(skb)->hwtstamp;
> +
> + if (tsflags & SOF_TIMESTAMPING_BIND_PHC)
> + hwtstamp = ptp_convert_timestamp(&hwtstamp,
> + READ_ONCE(sk->sk_bind_phc));
> + if (!ktime_to_timespec64_cond(hwtstamp, ts))
> + return -ENOENT;
> +
> + return NET_TIMESTAMP_ORIGIN_HW;
> +}
> +
> /*
> * called from sock_recv_timestamp() if sock_flag(sk, SOCK_RCVTSTAMP)
> */
> --
> 2.49.0
>
* Re: [PATCH v4 5/5] io_uring/netcmd: add tx timestamping cmd support
2025-06-13 18:32 ` [PATCH v4 5/5] io_uring/netcmd: add tx timestamping cmd support Pavel Begunkov
@ 2025-06-16 2:33 ` Willem de Bruijn
0 siblings, 0 replies; 11+ messages in thread
From: Willem de Bruijn @ 2025-06-16 2:33 UTC
To: Pavel Begunkov, io-uring, Vadim Fedorenko
Cc: asml.silence, netdev, Eric Dumazet, Kuniyuki Iwashima,
Paolo Abeni, Willem de Bruijn, David S . Miller, Jakub Kicinski,
Richard Cochran, Stanislav Fomichev, Jason Xing
Pavel Begunkov wrote:
> Add a new socket command which returns tx time stamps to the user. It
> provides an alternative to the existing error queue recvmsg interface.
> The command works in a polled multishot mode, which means io_uring will
> poll the socket and keep posting timestamps until the request is
> cancelled or fails in any other way (e.g. with no space in the CQ). It
> reuses the net infra and grabs timestamps from the socket's error queue.
>
> The command requires IORING_SETUP_CQE32. All non-final CQEs (marked with
> IORING_CQE_F_MORE) have cqe->res set to the tskey, and the upper bits
> of cqe->flags keep tstype (i.e. offset by IORING_TIMESTAMP_TYPE_SHIFT),
> with IORING_CQE_F_TSTAMP_HW set for hardware timestamps. The
> time value is stored in the upper part of the extended CQE. The final
> completion won't have IORING_CQE_F_MORE and will have cqe->res storing
> 0/error.
>
> Suggested-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
* Re: [PATCH v4 1/5] net: timestamp: add helper returning skb's tx tstamp
2025-06-16 2:31 ` Willem de Bruijn
@ 2025-06-16 9:44 ` Pavel Begunkov
2025-06-16 14:58 ` Willem de Bruijn
0 siblings, 1 reply; 11+ messages in thread
From: Pavel Begunkov @ 2025-06-16 9:44 UTC
To: Willem de Bruijn, io-uring, Vadim Fedorenko
Cc: netdev, Eric Dumazet, Kuniyuki Iwashima, Paolo Abeni,
Willem de Bruijn, David S . Miller, Jakub Kicinski,
Richard Cochran, Stanislav Fomichev, Jason Xing
On 6/16/25 03:31, Willem de Bruijn wrote:
> Pavel Begunkov wrote:
>> Add a helper function skb_get_tx_timestamp() that returns a tx timestamp
>> associated with an error queue skb.
>>
>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>> ---
>> include/net/sock.h | 9 +++++++++
>> net/socket.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
>> 2 files changed, 55 insertions(+)
>>
>> diff --git a/include/net/sock.h b/include/net/sock.h
>> index 92e7c1aae3cc..0b96196d8a34 100644
>> --- a/include/net/sock.h
>> +++ b/include/net/sock.h
>> @@ -2677,6 +2677,15 @@ void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk,
>> void __sock_recv_wifi_status(struct msghdr *msg, struct sock *sk,
>> struct sk_buff *skb);
>>
>> +enum {
>> + NET_TIMESTAMP_ORIGIN_SW = 0,
>> + NET_TIMESTAMP_ORIGIN_HW = 1,
>> +};
>
> Can you avoid introducing a new enum, and instead just return
> SOF_TIMESTAMPING_TX_HARDWARE (1) or SOF_TIMESTAMPING_TX_SOFTWARE (2)?
I can't say I like it more because TX_{SW,HW} is just a small
subset of SOF_TIMESTAMPING_* flags and the caller by default
could assume that there might be other values as well, but let
me send v5 and we'll see which is better.
--
Pavel Begunkov
* Re: [PATCH v4 1/5] net: timestamp: add helper returning skb's tx tstamp
2025-06-16 9:44 ` Pavel Begunkov
@ 2025-06-16 14:58 ` Willem de Bruijn
2025-06-16 16:44 ` Pavel Begunkov
0 siblings, 1 reply; 11+ messages in thread
From: Willem de Bruijn @ 2025-06-16 14:58 UTC
To: Pavel Begunkov, Willem de Bruijn, io-uring, Vadim Fedorenko
Cc: netdev, Eric Dumazet, Kuniyuki Iwashima, Paolo Abeni,
Willem de Bruijn, David S . Miller, Jakub Kicinski,
Richard Cochran, Stanislav Fomichev, Jason Xing
Pavel Begunkov wrote:
> On 6/16/25 03:31, Willem de Bruijn wrote:
> > Pavel Begunkov wrote:
> >> Add a helper function skb_get_tx_timestamp() that returns a tx timestamp
> >> associated with an error queue skb.
> >>
> >> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> >> ---
> >> include/net/sock.h | 9 +++++++++
> >> net/socket.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
> >> 2 files changed, 55 insertions(+)
> >>
> >> diff --git a/include/net/sock.h b/include/net/sock.h
> >> index 92e7c1aae3cc..0b96196d8a34 100644
> >> --- a/include/net/sock.h
> >> +++ b/include/net/sock.h
> >> @@ -2677,6 +2677,15 @@ void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk,
> >> void __sock_recv_wifi_status(struct msghdr *msg, struct sock *sk,
> >> struct sk_buff *skb);
> >>
> >> +enum {
> >> + NET_TIMESTAMP_ORIGIN_SW = 0,
> >> + NET_TIMESTAMP_ORIGIN_HW = 1,
> >> +};
> >
> > Can you avoid introducing a new enum, and instead just return
> > SOF_TIMESTAMPING_TX_HARDWARE (1) or SOF_TIMESTAMPING_TX_SOFTWARE (2)?
>
> I can't say I like it more because TX_{SW,HW} is just a small
> subset of SOF_TIMESTAMPING_* flags and the caller by default
> could assume that there might be other values as well, but let
> me send v5 and we'll see which is better.
This is quite a lot of new timestamping logic for only io_uring as
user, and I don't see any other user of it coming soon. I also see no
easy way to make it more concise, so it's fine. But this at least
avoids one extra new enum.
* Re: [PATCH v4 1/5] net: timestamp: add helper returning skb's tx tstamp
2025-06-16 14:58 ` Willem de Bruijn
@ 2025-06-16 16:44 ` Pavel Begunkov
0 siblings, 0 replies; 11+ messages in thread
From: Pavel Begunkov @ 2025-06-16 16:44 UTC
To: Willem de Bruijn, io-uring, Vadim Fedorenko
Cc: netdev, Eric Dumazet, Kuniyuki Iwashima, Paolo Abeni,
Willem de Bruijn, David S . Miller, Jakub Kicinski,
Richard Cochran, Stanislav Fomichev, Jason Xing
On 6/16/25 15:58, Willem de Bruijn wrote:
> Pavel Begunkov wrote:
>> On 6/16/25 03:31, Willem de Bruijn wrote:
>>> Pavel Begunkov wrote:
>>>> Add a helper function skb_get_tx_timestamp() that returns a tx timestamp
>>>> associated with an error queue skb.
>>>>
>>>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>>>> ---
>>>> include/net/sock.h | 9 +++++++++
>>>> net/socket.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
>>>> 2 files changed, 55 insertions(+)
>>>>
>>>> diff --git a/include/net/sock.h b/include/net/sock.h
>>>> index 92e7c1aae3cc..0b96196d8a34 100644
>>>> --- a/include/net/sock.h
>>>> +++ b/include/net/sock.h
>>>> @@ -2677,6 +2677,15 @@ void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk,
>>>> void __sock_recv_wifi_status(struct msghdr *msg, struct sock *sk,
>>>> struct sk_buff *skb);
>>>>
>>>> +enum {
>>>> + NET_TIMESTAMP_ORIGIN_SW = 0,
>>>> + NET_TIMESTAMP_ORIGIN_HW = 1,
>>>> +};
>>>
>>> Can you avoid introducing a new enum, and instead just return
>>> SOF_TIMESTAMPING_TX_HARDWARE (1) or SOF_TIMESTAMPING_TX_SOFTWARE (2)?
>>
>> I can't say I like it more because TX_{SW,HW} is just a small
>> subset of SOF_TIMESTAMPING_* flags and the caller by default
>> could assume that there might be other values as well, but let
>> me send v5 and we'll see which is better.
>
> This is quite a lot of new timestamping logic for only io_uring as
> user, and I don't see any other user of it coming soon. I also see no
> easy way to make it more concise, so it's fine. But this at least
> avoids one extra new enum.
enums are free :) Anyway, I don't have plans for further changes,
so I agree, SOF_TIMESTAMPING_* shouldn't be a problem.
--
Pavel Begunkov
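
For illustration, the return convention agreed on in this exchange might look like the sketch below: the helper reports the origin with the existing SOF_TIMESTAMPING_TX_* bit values instead of a new enum, and the io_uring side maps that onto its CQE flag. This is a guess at the shape of a follow-up version, not code from the thread. The `MODEL_*` macros are local mirrors of the values quoted above (1 for hardware, 2 for software) and of the hardware CQE flag, and both functions are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Local mirrors of the values quoted in the discussion. */
#define MODEL_SOF_TIMESTAMPING_TX_HARDWARE	(1 << 0)
#define MODEL_SOF_TIMESTAMPING_TX_SOFTWARE	(1 << 1)
#define MODEL_CQE_F_TSTAMP_HW			((uint32_t)1 << 16)

/* Hypothetical helper: which kind of timestamp was found, or -ENOENT. */
static int model_get_tx_timestamp(bool have_sw, bool have_hw)
{
	if (have_sw)
		return MODEL_SOF_TIMESTAMPING_TX_SOFTWARE;
	if (have_hw)
		return MODEL_SOF_TIMESTAMPING_TX_HARDWARE;
	return -ENOENT;
}

/* Hypothetical caller: translate the origin into the CQE flag bit. */
static bool model_origin_to_cqe_flags(int ret, uint32_t *flags)
{
	assert(flags);

	if (ret < 0)
		return false;	/* no timestamp, nothing to post */

	*flags = (ret == MODEL_SOF_TIMESTAMPING_TX_HARDWARE) ?
			MODEL_CQE_F_TSTAMP_HW : 0;
	return true;
}
```

Since neither origin value is 0 or negative, the caller can still tell success from failure with a plain `ret < 0` check, as with the enum-based version.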