public inbox for io-uring@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/5] io_uring cmd for tx timestamps
@ 2025-06-04  8:42 Pavel Begunkov
  2025-06-04  8:42 ` [PATCH v2 1/5] net: timestamp: add helper returning skb's tx tstamp Pavel Begunkov
                   ` (6 more replies)
  0 siblings, 7 replies; 23+ messages in thread
From: Pavel Begunkov @ 2025-06-04  8:42 UTC (permalink / raw)
  To: io-uring, Vadim Fedorenko
  Cc: asml.silence, netdev, Eric Dumazet, Kuniyuki Iwashima,
	Paolo Abeni, Willem de Bruijn, David S . Miller, Jakub Kicinski,
	Richard Cochran

Vadim Fedorenko suggested to add an alternative API for receiving
tx timestamps through io_uring. The series introduces io_uring socket
cmd for fetching tx timestamps, which is a polled multishot request,
i.e. internally polling the socket for POLLERR and posts timestamps
when they're arrives. For the API description see Patch 5.

It reuses existing timestamp infra and takes them from the socket's
error queue. For networking people the important parts are Patch 1,
and io_uring_cmd_timestamp() from Patch 5 walking the error queue.

It should be reasonable to take it through the io_uring tree once
we have consensus, but let me know if there are any concerns.

v2: remove (rx) false timestamp handling
    fix skipping already queued events on request submission
    constantize socket in a helper

Pavel Begunkov (5):
  net: timestamp: add helper returning skb's tx tstamp
  io_uring/poll: introduce io_arm_apoll()
  io_uring/cmd: allow multishot polled commands
  io_uring: add mshot helper for posting CQE32
  io_uring/netcmd: add tx timestamping cmd support

 include/net/sock.h            |  4 ++
 include/uapi/linux/io_uring.h |  6 +++
 io_uring/cmd_net.c            | 77 +++++++++++++++++++++++++++++++++++
 io_uring/io_uring.c           | 40 ++++++++++++++++++
 io_uring/io_uring.h           |  1 +
 io_uring/poll.c               | 44 ++++++++++++--------
 io_uring/poll.h               |  1 +
 io_uring/uring_cmd.c          | 34 ++++++++++++++++
 io_uring/uring_cmd.h          |  7 ++++
 net/socket.c                  | 43 +++++++++++++++++++
 10 files changed, 240 insertions(+), 17 deletions(-)

-- 
2.49.0


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v2 1/5] net: timestamp: add helper returning skb's tx tstamp
  2025-06-04  8:42 [PATCH v2 0/5] io_uring cmd for tx timestamps Pavel Begunkov
@ 2025-06-04  8:42 ` Pavel Begunkov
  2025-06-04 15:37   ` Stanislav Fomichev
                     ` (2 more replies)
  2025-06-04  8:42 ` [PATCH v2 2/5] io_uring/poll: introduce io_arm_apoll() Pavel Begunkov
                   ` (5 subsequent siblings)
  6 siblings, 3 replies; 23+ messages in thread
From: Pavel Begunkov @ 2025-06-04  8:42 UTC (permalink / raw)
  To: io-uring, Vadim Fedorenko
  Cc: asml.silence, netdev, Eric Dumazet, Kuniyuki Iwashima,
	Paolo Abeni, Willem de Bruijn, David S . Miller, Jakub Kicinski,
	Richard Cochran

Add a helper function skb_get_tx_timestamp() that returns a tx timestamp
associated with an skb from an queue queue.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 include/net/sock.h |  4 ++++
 net/socket.c       | 43 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 47 insertions(+)

diff --git a/include/net/sock.h b/include/net/sock.h
index 92e7c1aae3cc..1cd288880ab3 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2677,6 +2677,10 @@ void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk,
 void __sock_recv_wifi_status(struct msghdr *msg, struct sock *sk,
 			     struct sk_buff *skb);
 
+bool skb_has_tx_timestamp(struct sk_buff *skb, const struct sock *sk);
+bool skb_get_tx_timestamp(struct sk_buff *skb, struct sock *sk,
+			  struct timespec64 *ts);
+
 static inline void
 sock_recv_timestamp(struct msghdr *msg, struct sock *sk, struct sk_buff *skb)
 {
diff --git a/net/socket.c b/net/socket.c
index 9a0e720f0859..e9c8f3074fe1 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -843,6 +843,49 @@ static void put_ts_pktinfo(struct msghdr *msg, struct sk_buff *skb,
 		 sizeof(ts_pktinfo), &ts_pktinfo);
 }
 
+bool skb_has_tx_timestamp(struct sk_buff *skb, const struct sock *sk)
+{
+	const struct sock_exterr_skb *serr = SKB_EXT_ERR(skb);
+	u32 tsflags = READ_ONCE(sk->sk_tsflags);
+
+	if (serr->ee.ee_errno != ENOMSG ||
+	   serr->ee.ee_origin != SO_EE_ORIGIN_TIMESTAMPING)
+		return false;
+
+	/* software time stamp available and wanted */
+	if ((tsflags & SOF_TIMESTAMPING_SOFTWARE) && skb->tstamp)
+		return true;
+	/* hardware time stamps available and wanted */
+	return (tsflags & SOF_TIMESTAMPING_RAW_HARDWARE) &&
+		skb_hwtstamps(skb)->hwtstamp;
+}
+
+bool skb_get_tx_timestamp(struct sk_buff *skb, struct sock *sk,
+			  struct timespec64 *ts)
+{
+	u32 tsflags = READ_ONCE(sk->sk_tsflags);
+	ktime_t hwtstamp;
+	int if_index = 0;
+
+	if ((tsflags & SOF_TIMESTAMPING_SOFTWARE) &&
+	    ktime_to_timespec64_cond(skb->tstamp, ts))
+		return true;
+
+	if (!(tsflags & SOF_TIMESTAMPING_RAW_HARDWARE) ||
+	    skb_is_swtx_tstamp(skb, false))
+		return false;
+
+	if (skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP_NETDEV)
+		hwtstamp = get_timestamp(sk, skb, &if_index);
+	else
+		hwtstamp = skb_hwtstamps(skb)->hwtstamp;
+
+	if (tsflags & SOF_TIMESTAMPING_BIND_PHC)
+		hwtstamp = ptp_convert_timestamp(&hwtstamp,
+						READ_ONCE(sk->sk_bind_phc));
+	return ktime_to_timespec64_cond(hwtstamp, ts);
+}
+
 /*
  * called from sock_recv_timestamp() if sock_flag(sk, SOCK_RCVTSTAMP)
  */
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 2/5] io_uring/poll: introduce io_arm_apoll()
  2025-06-04  8:42 [PATCH v2 0/5] io_uring cmd for tx timestamps Pavel Begunkov
  2025-06-04  8:42 ` [PATCH v2 1/5] net: timestamp: add helper returning skb's tx tstamp Pavel Begunkov
@ 2025-06-04  8:42 ` Pavel Begunkov
  2025-06-04  8:42 ` [PATCH v2 3/5] io_uring/cmd: allow multishot polled commands Pavel Begunkov
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: Pavel Begunkov @ 2025-06-04  8:42 UTC (permalink / raw)
  To: io-uring, Vadim Fedorenko
  Cc: asml.silence, netdev, Eric Dumazet, Kuniyuki Iwashima,
	Paolo Abeni, Willem de Bruijn, David S . Miller, Jakub Kicinski,
	Richard Cochran

In preparation to allowing commands to do file polling, add a helper
that takes the desired poll event mask and arms it for polling. We won't
be able to use io_arm_poll_handler() with IORING_OP_URING_CMD as it
tries to infer the mask from the opcode data, and we can't unify it
across all commands.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/poll.c | 44 +++++++++++++++++++++++++++-----------------
 io_uring/poll.h |  1 +
 2 files changed, 28 insertions(+), 17 deletions(-)

diff --git a/io_uring/poll.c b/io_uring/poll.c
index 0526062e2f81..c7e9fb34563d 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -669,33 +669,18 @@ static struct async_poll *io_req_alloc_apoll(struct io_kiocb *req,
 	return apoll;
 }
 
-int io_arm_poll_handler(struct io_kiocb *req, unsigned issue_flags)
+int io_arm_apoll(struct io_kiocb *req, unsigned issue_flags, __poll_t mask)
 {
-	const struct io_issue_def *def = &io_issue_defs[req->opcode];
 	struct async_poll *apoll;
 	struct io_poll_table ipt;
-	__poll_t mask = POLLPRI | POLLERR | EPOLLET;
 	int ret;
 
-	if (!def->pollin && !def->pollout)
-		return IO_APOLL_ABORTED;
+	mask |= EPOLLET;
 	if (!io_file_can_poll(req))
 		return IO_APOLL_ABORTED;
 	if (!(req->flags & REQ_F_APOLL_MULTISHOT))
 		mask |= EPOLLONESHOT;
 
-	if (def->pollin) {
-		mask |= EPOLLIN | EPOLLRDNORM;
-
-		/* If reading from MSG_ERRQUEUE using recvmsg, ignore POLLIN */
-		if (req->flags & REQ_F_CLEAR_POLLIN)
-			mask &= ~EPOLLIN;
-	} else {
-		mask |= EPOLLOUT | EPOLLWRNORM;
-	}
-	if (def->poll_exclusive)
-		mask |= EPOLLEXCLUSIVE;
-
 	apoll = io_req_alloc_apoll(req, issue_flags);
 	if (!apoll)
 		return IO_APOLL_ABORTED;
@@ -712,6 +697,31 @@ int io_arm_poll_handler(struct io_kiocb *req, unsigned issue_flags)
 	return IO_APOLL_OK;
 }
 
+int io_arm_poll_handler(struct io_kiocb *req, unsigned issue_flags)
+{
+	const struct io_issue_def *def = &io_issue_defs[req->opcode];
+	__poll_t mask = POLLPRI | POLLERR;
+
+	if (!def->pollin && !def->pollout)
+		return IO_APOLL_ABORTED;
+	if (!io_file_can_poll(req))
+		return IO_APOLL_ABORTED;
+
+	if (def->pollin) {
+		mask |= EPOLLIN | EPOLLRDNORM;
+
+		/* If reading from MSG_ERRQUEUE using recvmsg, ignore POLLIN */
+		if (req->flags & REQ_F_CLEAR_POLLIN)
+			mask &= ~EPOLLIN;
+	} else {
+		mask |= EPOLLOUT | EPOLLWRNORM;
+	}
+	if (def->poll_exclusive)
+		mask |= EPOLLEXCLUSIVE;
+
+	return io_arm_apoll(req, issue_flags, mask);
+}
+
 /*
  * Returns true if we found and killed one or more poll requests
  */
diff --git a/io_uring/poll.h b/io_uring/poll.h
index 27e2db2ed4ae..c8438286dfa0 100644
--- a/io_uring/poll.h
+++ b/io_uring/poll.h
@@ -41,6 +41,7 @@ int io_poll_remove(struct io_kiocb *req, unsigned int issue_flags);
 struct io_cancel_data;
 int io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd,
 		   unsigned issue_flags);
+int io_arm_apoll(struct io_kiocb *req, unsigned issue_flags, __poll_t mask);
 int io_arm_poll_handler(struct io_kiocb *req, unsigned issue_flags);
 bool io_poll_remove_all(struct io_ring_ctx *ctx, struct io_uring_task *tctx,
 			bool cancel_all);
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 3/5] io_uring/cmd: allow multishot polled commands
  2025-06-04  8:42 [PATCH v2 0/5] io_uring cmd for tx timestamps Pavel Begunkov
  2025-06-04  8:42 ` [PATCH v2 1/5] net: timestamp: add helper returning skb's tx tstamp Pavel Begunkov
  2025-06-04  8:42 ` [PATCH v2 2/5] io_uring/poll: introduce io_arm_apoll() Pavel Begunkov
@ 2025-06-04  8:42 ` Pavel Begunkov
  2025-06-04  8:42 ` [PATCH v2 4/5] io_uring: add mshot helper for posting CQE32 Pavel Begunkov
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: Pavel Begunkov @ 2025-06-04  8:42 UTC (permalink / raw)
  To: io-uring, Vadim Fedorenko
  Cc: asml.silence, netdev, Eric Dumazet, Kuniyuki Iwashima,
	Paolo Abeni, Willem de Bruijn, David S . Miller, Jakub Kicinski,
	Richard Cochran

Some commands like timestamping in the next patch can make use of
multishot polling, i.e. REQ_F_APOLL_MULTISHOT. Add support for that,
which is condensed in a single helper called io_cmd_poll_multishot().

The user who wants to continue with a request in a multishot mode must
call the function, and only if it returns 0 the user is free to proceed.
Apart from normal terminal errors, it can also end up with -EIOCBQUEUED,
in which case the user must forward it to the core io_uring. It's
forbidden to use task work while the request is executing in a multishot
mode.

The API is not foolproof, hence it's not exported to modules nor exposed
in public headers.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/uring_cmd.c | 23 +++++++++++++++++++++++
 io_uring/uring_cmd.h |  3 +++
 2 files changed, 26 insertions(+)

diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 929cad6ee326..2710521eec62 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -12,6 +12,7 @@
 #include "alloc_cache.h"
 #include "rsrc.h"
 #include "uring_cmd.h"
+#include "poll.h"
 
 void io_cmd_cache_free(const void *entry)
 {
@@ -136,6 +137,9 @@ void __io_uring_cmd_do_in_task(struct io_uring_cmd *ioucmd,
 {
 	struct io_kiocb *req = cmd_to_io_kiocb(ioucmd);
 
+	if (WARN_ON_ONCE(req->flags & REQ_F_APOLL_MULTISHOT))
+		return;
+
 	ioucmd->task_work_cb = task_work_cb;
 	req->io_task_work.func = io_uring_cmd_work;
 	__io_req_task_work_add(req, flags);
@@ -158,6 +162,9 @@ void io_uring_cmd_done(struct io_uring_cmd *ioucmd, ssize_t ret, u64 res2,
 {
 	struct io_kiocb *req = cmd_to_io_kiocb(ioucmd);
 
+	if (WARN_ON_ONCE(req->flags & REQ_F_APOLL_MULTISHOT))
+		return;
+
 	io_uring_cmd_del_cancelable(ioucmd, issue_flags);
 
 	if (ret < 0)
@@ -310,3 +317,19 @@ void io_uring_cmd_issue_blocking(struct io_uring_cmd *ioucmd)
 
 	io_req_queue_iowq(req);
 }
+
+int io_cmd_poll_multishot(struct io_uring_cmd *cmd,
+			  unsigned int issue_flags, __poll_t mask)
+{
+	struct io_kiocb *req = cmd_to_io_kiocb(cmd);
+	int ret;
+
+	if (likely(req->flags & REQ_F_APOLL_MULTISHOT))
+		return 0;
+
+	req->flags |= REQ_F_APOLL_MULTISHOT;
+	mask &= ~EPOLLONESHOT;
+
+	ret = io_arm_apoll(req, issue_flags, mask);
+	return ret == IO_APOLL_OK ? -EIOCBQUEUED : -ECANCELED;
+}
diff --git a/io_uring/uring_cmd.h b/io_uring/uring_cmd.h
index e6a5142c890e..9565ca5d5cf2 100644
--- a/io_uring/uring_cmd.h
+++ b/io_uring/uring_cmd.h
@@ -17,3 +17,6 @@ bool io_uring_try_cancel_uring_cmd(struct io_ring_ctx *ctx,
 				   struct io_uring_task *tctx, bool cancel_all);
 
 void io_cmd_cache_free(const void *entry);
+
+int io_cmd_poll_multishot(struct io_uring_cmd *cmd,
+			  unsigned int issue_flags, __poll_t mask);
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 4/5] io_uring: add mshot helper for posting CQE32
  2025-06-04  8:42 [PATCH v2 0/5] io_uring cmd for tx timestamps Pavel Begunkov
                   ` (2 preceding siblings ...)
  2025-06-04  8:42 ` [PATCH v2 3/5] io_uring/cmd: allow multishot polled commands Pavel Begunkov
@ 2025-06-04  8:42 ` Pavel Begunkov
  2025-06-04  8:42 ` [PATCH v2 5/5] io_uring/netcmd: add tx timestamping cmd support Pavel Begunkov
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: Pavel Begunkov @ 2025-06-04  8:42 UTC (permalink / raw)
  To: io-uring, Vadim Fedorenko
  Cc: asml.silence, netdev, Eric Dumazet, Kuniyuki Iwashima,
	Paolo Abeni, Willem de Bruijn, David S . Miller, Jakub Kicinski,
	Richard Cochran

Add a helper for posting 32 byte CQEs in a multishot mode and add a cmd
helper on top. As it specifically works with requests, the helper ignore
the passed in cqe->user_data and sets it to the one stored in the
request.

The command helper is only valid with multishot requests.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/io_uring.c  | 40 ++++++++++++++++++++++++++++++++++++++++
 io_uring/io_uring.h  |  1 +
 io_uring/uring_cmd.c | 11 +++++++++++
 io_uring/uring_cmd.h |  4 ++++
 4 files changed, 56 insertions(+)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index c7a9cecf528e..4ca357057384 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -788,6 +788,21 @@ bool io_cqe_cache_refill(struct io_ring_ctx *ctx, bool overflow)
 	return true;
 }
 
+static bool io_fill_cqe_aux32(struct io_ring_ctx *ctx,
+			      struct io_uring_cqe src_cqe[2])
+{
+	struct io_uring_cqe *cqe;
+
+	if (WARN_ON_ONCE(!(ctx->flags & IORING_SETUP_CQE32)))
+		return false;
+	if (unlikely(!io_get_cqe(ctx, &cqe)))
+		return false;
+
+	memcpy(cqe, src_cqe, 2 * sizeof(*cqe));
+	trace_io_uring_complete(ctx, NULL, cqe);
+	return true;
+}
+
 static bool io_fill_cqe_aux(struct io_ring_ctx *ctx, u64 user_data, s32 res,
 			      u32 cflags)
 {
@@ -899,6 +914,31 @@ bool io_req_post_cqe(struct io_kiocb *req, s32 res, u32 cflags)
 	return posted;
 }
 
+/*
+ * A helper for multishot requests posting additional CQEs.
+ * Should only be used from a task_work including IO_URING_F_MULTISHOT.
+ */
+bool io_req_post_cqe32(struct io_kiocb *req, struct io_uring_cqe cqe[2])
+{
+	struct io_ring_ctx *ctx = req->ctx;
+	bool posted;
+
+	lockdep_assert(!io_wq_current_is_worker());
+	lockdep_assert_held(&ctx->uring_lock);
+
+	cqe[0].user_data = req->cqe.user_data;
+	if (!ctx->lockless_cq) {
+		spin_lock(&ctx->completion_lock);
+		posted = io_fill_cqe_aux32(ctx, cqe);
+		spin_unlock(&ctx->completion_lock);
+	} else {
+		posted = io_fill_cqe_aux32(ctx, cqe);
+	}
+
+	ctx->submit_state.cq_flush = true;
+	return posted;
+}
+
 static void io_req_complete_post(struct io_kiocb *req, unsigned issue_flags)
 {
 	struct io_ring_ctx *ctx = req->ctx;
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 0ea7a435d1de..bff5580507ba 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -81,6 +81,7 @@ void io_req_defer_failed(struct io_kiocb *req, s32 res);
 bool io_post_aux_cqe(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags);
 void io_add_aux_cqe(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags);
 bool io_req_post_cqe(struct io_kiocb *req, s32 res, u32 cflags);
+bool io_req_post_cqe32(struct io_kiocb *req, struct io_uring_cqe src_cqe[2]);
 void __io_commit_cqring_flush(struct io_ring_ctx *ctx);
 
 struct file *io_file_get_normal(struct io_kiocb *req, int fd);
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 2710521eec62..429a3e4a6a02 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -333,3 +333,14 @@ int io_cmd_poll_multishot(struct io_uring_cmd *cmd,
 	ret = io_arm_apoll(req, issue_flags, mask);
 	return ret == IO_APOLL_OK ? -EIOCBQUEUED : -ECANCELED;
 }
+
+bool io_uring_cmd_post_mshot_cqe32(struct io_uring_cmd *cmd,
+				   unsigned int issue_flags,
+				   struct io_uring_cqe cqe[2])
+{
+	struct io_kiocb *req = cmd_to_io_kiocb(cmd);
+
+	if (WARN_ON_ONCE(!(issue_flags & IO_URING_F_MULTISHOT)))
+		return false;
+	return io_req_post_cqe32(req, cqe);
+}
diff --git a/io_uring/uring_cmd.h b/io_uring/uring_cmd.h
index 9565ca5d5cf2..be97407e4019 100644
--- a/io_uring/uring_cmd.h
+++ b/io_uring/uring_cmd.h
@@ -16,6 +16,10 @@ void io_uring_cmd_cleanup(struct io_kiocb *req);
 bool io_uring_try_cancel_uring_cmd(struct io_ring_ctx *ctx,
 				   struct io_uring_task *tctx, bool cancel_all);
 
+bool io_uring_cmd_post_mshot_cqe32(struct io_uring_cmd *cmd,
+				   unsigned int issue_flags,
+				   struct io_uring_cqe cqe[2]);
+
 void io_cmd_cache_free(const void *entry);
 
 int io_cmd_poll_multishot(struct io_uring_cmd *cmd,
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 5/5] io_uring/netcmd: add tx timestamping cmd support
  2025-06-04  8:42 [PATCH v2 0/5] io_uring cmd for tx timestamps Pavel Begunkov
                   ` (3 preceding siblings ...)
  2025-06-04  8:42 ` [PATCH v2 4/5] io_uring: add mshot helper for posting CQE32 Pavel Begunkov
@ 2025-06-04  8:42 ` Pavel Begunkov
  2025-06-04 12:04   ` Jens Axboe
  2025-06-05  0:59   ` Willem de Bruijn
  2025-06-04  8:53 ` [PATCH v2 0/5] io_uring cmd for tx timestamps Pavel Begunkov
  2025-06-04 12:06 ` Jens Axboe
  6 siblings, 2 replies; 23+ messages in thread
From: Pavel Begunkov @ 2025-06-04  8:42 UTC (permalink / raw)
  To: io-uring, Vadim Fedorenko
  Cc: asml.silence, netdev, Eric Dumazet, Kuniyuki Iwashima,
	Paolo Abeni, Willem de Bruijn, David S . Miller, Jakub Kicinski,
	Richard Cochran

Add a new socket command which returns tx time stamps to the user. It
provide an alternative to the existing error queue recvmsg interface.
The command works in a polled multishot mode, which means io_uring will
poll the socket and keep posting timestamps until the request is
cancelled or fails in any other way (e.g. with no space in the CQ). It
reuses the net infra and grabs timestamps from the socket's error queue.

The command requires IORING_SETUP_CQE32. All non-final CQEs (marked with
IORING_CQE_F_MORE) have cqe->res set to the tskey, and the upper 16 bits
of cqe->flags keep tstype (i.e. offset by IORING_CQE_BUFFER_SHIFT). The
timevalue is store in the upper part of the extended CQE. The final
completion won't have IORING_CQR_F_MORE and will have cqe->res storing
0/error.

Suggested-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 include/uapi/linux/io_uring.h |  6 +++
 io_uring/cmd_net.c            | 77 +++++++++++++++++++++++++++++++++++
 2 files changed, 83 insertions(+)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index cfd17e382082..0bc156eb96d4 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -960,6 +960,11 @@ struct io_uring_recvmsg_out {
 	__u32 flags;
 };
 
+struct io_timespec {
+	__u64		tv_sec;
+	__u64		tv_nsec;
+};
+
 /*
  * Argument for IORING_OP_URING_CMD when file is a socket
  */
@@ -968,6 +973,7 @@ enum io_uring_socket_op {
 	SOCKET_URING_OP_SIOCOUTQ,
 	SOCKET_URING_OP_GETSOCKOPT,
 	SOCKET_URING_OP_SETSOCKOPT,
+	SOCKET_URING_OP_TX_TIMESTAMP,
 };
 
 /* Zero copy receive refill queue entry */
diff --git a/io_uring/cmd_net.c b/io_uring/cmd_net.c
index e99170c7d41a..dae59aea5847 100644
--- a/io_uring/cmd_net.c
+++ b/io_uring/cmd_net.c
@@ -1,5 +1,6 @@
 #include <asm/ioctls.h>
 #include <linux/io_uring/net.h>
+#include <linux/errqueue.h>
 #include <net/sock.h>
 
 #include "uring_cmd.h"
@@ -51,6 +52,80 @@ static inline int io_uring_cmd_setsockopt(struct socket *sock,
 				  optlen);
 }
 
+static bool io_process_timestamp_skb(struct io_uring_cmd *cmd, struct sock *sk,
+				     struct sk_buff *skb, unsigned issue_flags)
+{
+	struct sock_exterr_skb *serr = SKB_EXT_ERR(skb);
+	struct io_uring_cqe cqe[2];
+	struct io_timespec *iots;
+	struct timespec64 ts;
+	u32 tskey;
+
+	BUILD_BUG_ON(sizeof(struct io_uring_cqe) != sizeof(struct io_timespec));
+
+	if (!skb_get_tx_timestamp(skb, sk, &ts))
+		return false;
+
+	tskey = serr->ee.ee_data;
+
+	cqe->user_data = 0;
+	cqe->res = tskey;
+	cqe->flags = IORING_CQE_F_MORE;
+	cqe->flags |= (u32)serr->ee.ee_info << IORING_CQE_BUFFER_SHIFT;
+
+	iots = (struct io_timespec *)&cqe[1];
+	iots->tv_sec = ts.tv_sec;
+	iots->tv_nsec = ts.tv_nsec;
+	return io_uring_cmd_post_mshot_cqe32(cmd, issue_flags, cqe);
+}
+
+static int io_uring_cmd_timestamp(struct socket *sock,
+				  struct io_uring_cmd *cmd,
+				  unsigned int issue_flags)
+{
+	struct sock *sk = sock->sk;
+	struct sk_buff_head *q = &sk->sk_error_queue;
+	struct sk_buff *skb, *tmp;
+	struct sk_buff_head list;
+	int ret;
+
+	if (!(issue_flags & IO_URING_F_CQE32))
+		return -EINVAL;
+	ret = io_cmd_poll_multishot(cmd, issue_flags, EPOLLERR);
+	if (unlikely(ret))
+		return ret;
+
+	if (skb_queue_empty_lockless(q))
+		return -EAGAIN;
+	__skb_queue_head_init(&list);
+
+	scoped_guard(spinlock_irq, &q->lock) {
+		skb_queue_walk_safe(q, skb, tmp) {
+			/* don't support skbs with payload */
+			if (!skb_has_tx_timestamp(skb, sk) || skb->len)
+				continue;
+			__skb_unlink(skb, q);
+			__skb_queue_tail(&list, skb);
+		}
+	}
+
+	while (1) {
+		skb = skb_peek(&list);
+		if (!skb)
+			break;
+		if (!io_process_timestamp_skb(cmd, sk, skb, issue_flags))
+			break;
+		__skb_dequeue(&list);
+		consume_skb(skb);
+	}
+
+	if (!unlikely(skb_queue_empty(&list))) {
+		scoped_guard(spinlock_irqsave, &q->lock)
+			skb_queue_splice(q, &list);
+	}
+	return -EAGAIN;
+}
+
 int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
 {
 	struct socket *sock = cmd->file->private_data;
@@ -76,6 +151,8 @@ int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
 		return io_uring_cmd_getsockopt(sock, cmd, issue_flags);
 	case SOCKET_URING_OP_SETSOCKOPT:
 		return io_uring_cmd_setsockopt(sock, cmd, issue_flags);
+	case SOCKET_URING_OP_TX_TIMESTAMP:
+		return io_uring_cmd_timestamp(sock, cmd, issue_flags);
 	default:
 		return -EOPNOTSUPP;
 	}
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 0/5] io_uring cmd for tx timestamps
  2025-06-04  8:42 [PATCH v2 0/5] io_uring cmd for tx timestamps Pavel Begunkov
                   ` (4 preceding siblings ...)
  2025-06-04  8:42 ` [PATCH v2 5/5] io_uring/netcmd: add tx timestamping cmd support Pavel Begunkov
@ 2025-06-04  8:53 ` Pavel Begunkov
  2025-06-04 12:06 ` Jens Axboe
  6 siblings, 0 replies; 23+ messages in thread
From: Pavel Begunkov @ 2025-06-04  8:53 UTC (permalink / raw)
  To: io-uring, Vadim Fedorenko
  Cc: netdev, Eric Dumazet, Kuniyuki Iwashima, Paolo Abeni,
	Willem de Bruijn, David S . Miller, Jakub Kicinski,
	Richard Cochran, Stanislav Fomichev

On 6/4/25 09:42, Pavel Begunkov wrote:
> Vadim Fedorenko suggested to add an alternative API for receiving
> tx timestamps through io_uring. The series introduces io_uring socket
> cmd for fetching tx timestamps, which is a polled multishot request,
> i.e. internally polling the socket for POLLERR and posts timestamps
> when they're arrives. For the API description see Patch 5.
> 
> It reuses existing timestamp infra and takes them from the socket's
> error queue. For networking people the important parts are Patch 1,
> and io_uring_cmd_timestamp() from Patch 5 walking the error queue.
> 
> It should be reasonable to take it through the io_uring tree once
> we have consensus, but let me know if there are any concerns.

CC'ing Stanislav as he took a look at v1

-- 
Pavel Begunkov


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 5/5] io_uring/netcmd: add tx timestamping cmd support
  2025-06-04  8:42 ` [PATCH v2 5/5] io_uring/netcmd: add tx timestamping cmd support Pavel Begunkov
@ 2025-06-04 12:04   ` Jens Axboe
  2025-06-04 12:33     ` Pavel Begunkov
  2025-06-05  0:59   ` Willem de Bruijn
  1 sibling, 1 reply; 23+ messages in thread
From: Jens Axboe @ 2025-06-04 12:04 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring, Vadim Fedorenko
  Cc: netdev, Eric Dumazet, Kuniyuki Iwashima, Paolo Abeni,
	Willem de Bruijn, David S . Miller, Jakub Kicinski,
	Richard Cochran

On 6/4/25 2:42 AM, Pavel Begunkov wrote:
> Add a new socket command which returns tx time stamps to the user. It
> provide an alternative to the existing error queue recvmsg interface.
> The command works in a polled multishot mode, which means io_uring will
> poll the socket and keep posting timestamps until the request is
> cancelled or fails in any other way (e.g. with no space in the CQ). It
> reuses the net infra and grabs timestamps from the socket's error queue.
> 
> The command requires IORING_SETUP_CQE32. All non-final CQEs (marked with
> IORING_CQE_F_MORE) have cqe->res set to the tskey, and the upper 16 bits
> of cqe->flags keep tstype (i.e. offset by IORING_CQE_BUFFER_SHIFT). The
> timevalue is store in the upper part of the extended CQE. The final
> completion won't have IORING_CQR_F_MORE and will have cqe->res storing
                         ^^

CQE_F_MORE

Minor nit below.

> +static bool io_process_timestamp_skb(struct io_uring_cmd *cmd, struct sock *sk,
> +				     struct sk_buff *skb, unsigned issue_flags)
> +{
> +	struct sock_exterr_skb *serr = SKB_EXT_ERR(skb);
> +	struct io_uring_cqe cqe[2];
> +	struct io_timespec *iots;
> +	struct timespec64 ts;
> +	u32 tskey;
> +
> +	BUILD_BUG_ON(sizeof(struct io_uring_cqe) != sizeof(struct io_timespec));
> +
> +	if (!skb_get_tx_timestamp(skb, sk, &ts))
> +		return false;
> +
> +	tskey = serr->ee.ee_data;
> +
> +	cqe->user_data = 0;
> +	cqe->res = tskey;
> +	cqe->flags = IORING_CQE_F_MORE;
> +	cqe->flags |= (u32)serr->ee.ee_info << IORING_CQE_BUFFER_SHIFT;

Get rid of the tskey variable? And I think this would be more easily
readable if it used:

	cqe[0].user_data = 0;

etc.

> +	iots = (struct io_timespec *)&cqe[1];
> +	iots->tv_sec = ts.tv_sec;
> +	iots->tv_nsec = ts.tv_nsec;
> +	return io_uring_cmd_post_mshot_cqe32(cmd, issue_flags, cqe);
> +}

A bit of a shame we can't just get the double CQE and fill it in, rather
than fill it on stack and copy it. But probably doesn't matter much.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 0/5] io_uring cmd for tx timestamps
  2025-06-04  8:42 [PATCH v2 0/5] io_uring cmd for tx timestamps Pavel Begunkov
                   ` (5 preceding siblings ...)
  2025-06-04  8:53 ` [PATCH v2 0/5] io_uring cmd for tx timestamps Pavel Begunkov
@ 2025-06-04 12:06 ` Jens Axboe
  2025-06-04 12:38   ` Pavel Begunkov
  6 siblings, 1 reply; 23+ messages in thread
From: Jens Axboe @ 2025-06-04 12:06 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring, Vadim Fedorenko
  Cc: netdev, Eric Dumazet, Kuniyuki Iwashima, Paolo Abeni,
	Willem de Bruijn, David S . Miller, Jakub Kicinski,
	Richard Cochran

On 6/4/25 2:42 AM, Pavel Begunkov wrote:
> Vadim Fedorenko suggested to add an alternative API for receiving
> tx timestamps through io_uring. The series introduces io_uring socket
> cmd for fetching tx timestamps, which is a polled multishot request,
> i.e. internally polling the socket for POLLERR and posts timestamps
> when they're arrives. For the API description see Patch 5.
> 
> It reuses existing timestamp infra and takes them from the socket's
> error queue. For networking people the important parts are Patch 1,
> and io_uring_cmd_timestamp() from Patch 5 walking the error queue.
> 
> It should be reasonable to take it through the io_uring tree once
> we have consensus, but let me know if there are any concerns.

Still looks fine to me - is there a liburing test case as well?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 5/5] io_uring/netcmd: add tx timestamping cmd support
  2025-06-04 12:04   ` Jens Axboe
@ 2025-06-04 12:33     ` Pavel Begunkov
  0 siblings, 0 replies; 23+ messages in thread
From: Pavel Begunkov @ 2025-06-04 12:33 UTC (permalink / raw)
  To: Jens Axboe, io-uring, Vadim Fedorenko
  Cc: netdev, Eric Dumazet, Kuniyuki Iwashima, Paolo Abeni,
	Willem de Bruijn, David S . Miller, Jakub Kicinski,
	Richard Cochran

On 6/4/25 13:04, Jens Axboe wrote:
> On 6/4/25 2:42 AM, Pavel Begunkov wrote:
>> Add a new socket command which returns tx time stamps to the user. It
>> provide an alternative to the existing error queue recvmsg interface.
>> The command works in a polled multishot mode, which means io_uring will
>> poll the socket and keep posting timestamps until the request is
>> cancelled or fails in any other way (e.g. with no space in the CQ). It
>> reuses the net infra and grabs timestamps from the socket's error queue.
>>
>> The command requires IORING_SETUP_CQE32. All non-final CQEs (marked with
>> IORING_CQE_F_MORE) have cqe->res set to the tskey, and the upper 16 bits
>> of cqe->flags keep tstype (i.e. offset by IORING_CQE_BUFFER_SHIFT). The
>> timevalue is store in the upper part of the extended CQE. The final
>> completion won't have IORING_CQR_F_MORE and will have cqe->res storing
>                           ^^
> 
> CQE_F_MORE
> 
> Minor nit below.
> 
>> +static bool io_process_timestamp_skb(struct io_uring_cmd *cmd, struct sock *sk,
>> +				     struct sk_buff *skb, unsigned issue_flags)
>> +{
>> +	struct sock_exterr_skb *serr = SKB_EXT_ERR(skb);
>> +	struct io_uring_cqe cqe[2];
>> +	struct io_timespec *iots;
>> +	struct timespec64 ts;
>> +	u32 tskey;
>> +
>> +	BUILD_BUG_ON(sizeof(struct io_uring_cqe) != sizeof(struct io_timespec));
>> +
>> +	if (!skb_get_tx_timestamp(skb, sk, &ts))
>> +		return false;
>> +
>> +	tskey = serr->ee.ee_data;
>> +
>> +	cqe->user_data = 0;
>> +	cqe->res = tskey;
>> +	cqe->flags = IORING_CQE_F_MORE;
>> +	cqe->flags |= (u32)serr->ee.ee_info << IORING_CQE_BUFFER_SHIFT;
> 
> Get rid of the tskey variable?

Why? I named it specifically so that it's obvious what the field
means, and "cqe->res = serr->ee.ee_data" hardly tells the meaning
of the fields without extra digging.

And I think this would be more easily
> readable if it used:
> 
> 	cqe[0].user_data = 0;

I don't see how it'd be easier to read by cluttering the code
with "[0]" in multiple statements. Combined with that it's
one extended cqe and not some array of them.
  > etc.
> 
>> +	iots = (struct io_timespec *)&cqe[1];
>> +	iots->tv_sec = ts.tv_sec;
>> +	iots->tv_nsec = ts.tv_nsec;
>> +	return io_uring_cmd_post_mshot_cqe32(cmd, issue_flags, cqe);
>> +}
> 
> A bit of a shame we can't just get the double CQE and fill it in, rather
> than fill it on stack and copy it. But probably doesn't matter much.
> 

-- 
Pavel Begunkov


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 0/5] io_uring cmd for tx timestamps
  2025-06-04 12:06 ` Jens Axboe
@ 2025-06-04 12:38   ` Pavel Begunkov
  0 siblings, 0 replies; 23+ messages in thread
From: Pavel Begunkov @ 2025-06-04 12:38 UTC (permalink / raw)
  To: Jens Axboe, io-uring, Vadim Fedorenko
  Cc: netdev, Eric Dumazet, Kuniyuki Iwashima, Paolo Abeni,
	Willem de Bruijn, David S . Miller, Jakub Kicinski,
	Richard Cochran

On 6/4/25 13:06, Jens Axboe wrote:
> On 6/4/25 2:42 AM, Pavel Begunkov wrote:
>> Vadim Fedorenko suggested to add an alternative API for receiving
>> tx timestamps through io_uring. The series introduces io_uring socket
>> cmd for fetching tx timestamps, which is a polled multishot request,
>> i.e. internally polling the socket for POLLERR and posts timestamps
>> when they're arrives. For the API description see Patch 5.
>>
>> It reuses existing timestamp infra and takes them from the socket's
>> error queue. For networking people the important parts are Patch 1,
>> and io_uring_cmd_timestamp() from Patch 5 walking the error queue.
>>
>> It should be reasonable to take it through the io_uring tree once
>> we have consensus, but let me know if there are any concerns.
> 
> Still looks fine to me - is there a liburing test case as well?

Just a hacky adaptation of a selftest for now and requires tc
incantations either way.

-- 
Pavel Begunkov


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 1/5] net: timestamp: add helper returning skb's tx tstamp
  2025-06-04  8:42 ` [PATCH v2 1/5] net: timestamp: add helper returning skb's tx tstamp Pavel Begunkov
@ 2025-06-04 15:37   ` Stanislav Fomichev
  2025-06-05  0:52   ` Willem de Bruijn
  2025-06-05  3:51   ` Jason Xing
  2 siblings, 0 replies; 23+ messages in thread
From: Stanislav Fomichev @ 2025-06-04 15:37 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: io-uring, Vadim Fedorenko, netdev, Eric Dumazet,
	Kuniyuki Iwashima, Paolo Abeni, Willem de Bruijn,
	David S . Miller, Jakub Kicinski, Richard Cochran

On 06/04, Pavel Begunkov wrote:
> Add a helper function skb_get_tx_timestamp() that returns a tx timestamp
> associated with an skb from an queue queue.
> 
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>

Acked-by: Stanislav Fomichev <sdf@fomichev.me>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 1/5] net: timestamp: add helper returning skb's tx tstamp
  2025-06-04  8:42 ` [PATCH v2 1/5] net: timestamp: add helper returning skb's tx tstamp Pavel Begunkov
  2025-06-04 15:37   ` Stanislav Fomichev
@ 2025-06-05  0:52   ` Willem de Bruijn
  2025-06-05  3:51   ` Jason Xing
  2 siblings, 0 replies; 23+ messages in thread
From: Willem de Bruijn @ 2025-06-05  0:52 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring, Vadim Fedorenko
  Cc: asml.silence, netdev, Eric Dumazet, Kuniyuki Iwashima,
	Paolo Abeni, Willem de Bruijn, David S . Miller, Jakub Kicinski,
	Richard Cochran

Pavel Begunkov wrote:
> Add a helper function skb_get_tx_timestamp() that returns a tx timestamp
> associated with an skb from an queue queue.
> 
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>

Acked-by: Willem de Bruijn <willemb@google.com>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 5/5] io_uring/netcmd: add tx timestamping cmd support
  2025-06-04  8:42 ` [PATCH v2 5/5] io_uring/netcmd: add tx timestamping cmd support Pavel Begunkov
  2025-06-04 12:04   ` Jens Axboe
@ 2025-06-05  0:59   ` Willem de Bruijn
  2025-06-05 10:25     ` Vadim Fedorenko
  1 sibling, 1 reply; 23+ messages in thread
From: Willem de Bruijn @ 2025-06-05  0:59 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring, Vadim Fedorenko
  Cc: asml.silence, netdev, Eric Dumazet, Kuniyuki Iwashima,
	Paolo Abeni, Willem de Bruijn, David S . Miller, Jakub Kicinski,
	Richard Cochran

Pavel Begunkov wrote:
> Add a new socket command which returns tx time stamps to the user. It
> provide an alternative to the existing error queue recvmsg interface.
> The command works in a polled multishot mode, which means io_uring will
> poll the socket and keep posting timestamps until the request is
> cancelled or fails in any other way (e.g. with no space in the CQ). It
> reuses the net infra and grabs timestamps from the socket's error queue.
> 
> The command requires IORING_SETUP_CQE32. All non-final CQEs (marked with
> IORING_CQE_F_MORE) have cqe->res set to the tskey, and the upper 16 bits
> of cqe->flags keep tstype (i.e. offset by IORING_CQE_BUFFER_SHIFT). The
> timevalue is store in the upper part of the extended CQE. The final
> completion won't have IORING_CQR_F_MORE and will have cqe->res storing
> 0/error.
> 
> Suggested-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
>  include/uapi/linux/io_uring.h |  6 +++
>  io_uring/cmd_net.c            | 77 +++++++++++++++++++++++++++++++++++
>  2 files changed, 83 insertions(+)
> 
> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
> index cfd17e382082..0bc156eb96d4 100644
> --- a/include/uapi/linux/io_uring.h
> +++ b/include/uapi/linux/io_uring.h
> @@ -960,6 +960,11 @@ struct io_uring_recvmsg_out {
>  	__u32 flags;
>  };
>  
> +struct io_timespec {
> +	__u64		tv_sec;
> +	__u64		tv_nsec;
> +};
> +
>  /*
>   * Argument for IORING_OP_URING_CMD when file is a socket
>   */
> @@ -968,6 +973,7 @@ enum io_uring_socket_op {
>  	SOCKET_URING_OP_SIOCOUTQ,
>  	SOCKET_URING_OP_GETSOCKOPT,
>  	SOCKET_URING_OP_SETSOCKOPT,
> +	SOCKET_URING_OP_TX_TIMESTAMP,
>  };
>  
>  /* Zero copy receive refill queue entry */
> diff --git a/io_uring/cmd_net.c b/io_uring/cmd_net.c
> index e99170c7d41a..dae59aea5847 100644
> --- a/io_uring/cmd_net.c
> +++ b/io_uring/cmd_net.c
> @@ -1,5 +1,6 @@
>  #include <asm/ioctls.h>
>  #include <linux/io_uring/net.h>
> +#include <linux/errqueue.h>
>  #include <net/sock.h>
>  
>  #include "uring_cmd.h"
> @@ -51,6 +52,80 @@ static inline int io_uring_cmd_setsockopt(struct socket *sock,
>  				  optlen);
>  }
>  
> +static bool io_process_timestamp_skb(struct io_uring_cmd *cmd, struct sock *sk,
> +				     struct sk_buff *skb, unsigned issue_flags)
> +{
> +	struct sock_exterr_skb *serr = SKB_EXT_ERR(skb);
> +	struct io_uring_cqe cqe[2];
> +	struct io_timespec *iots;
> +	struct timespec64 ts;
> +	u32 tskey;
> +
> +	BUILD_BUG_ON(sizeof(struct io_uring_cqe) != sizeof(struct io_timespec));
> +
> +	if (!skb_get_tx_timestamp(skb, sk, &ts))
> +		return false;
> +
> +	tskey = serr->ee.ee_data;
> +
> +	cqe->user_data = 0;
> +	cqe->res = tskey;
> +	cqe->flags = IORING_CQE_F_MORE;
> +	cqe->flags |= (u32)serr->ee.ee_info << IORING_CQE_BUFFER_SHIFT;
> +
> +	iots = (struct io_timespec *)&cqe[1];
> +	iots->tv_sec = ts.tv_sec;
> +	iots->tv_nsec = ts.tv_nsec;

skb_get_tx_timestamp loses the information whether this is a
software or a hardware timestamp. Is that loss problematic?

If a process only requests one type of timestamp, it will not be.

But when requesting both (SOF_TIMESTAMPING_OPT_TX_SWHW) this per cqe
annotation may be necessary.

> +	return io_uring_cmd_post_mshot_cqe32(cmd, issue_flags, cqe);
> +}
> +
> +static int io_uring_cmd_timestamp(struct socket *sock,
> +				  struct io_uring_cmd *cmd,
> +				  unsigned int issue_flags)
> +{
> +	struct sock *sk = sock->sk;
> +	struct sk_buff_head *q = &sk->sk_error_queue;
> +	struct sk_buff *skb, *tmp;
> +	struct sk_buff_head list;
> +	int ret;
> +
> +	if (!(issue_flags & IO_URING_F_CQE32))
> +		return -EINVAL;
> +	ret = io_cmd_poll_multishot(cmd, issue_flags, EPOLLERR);
> +	if (unlikely(ret))
> +		return ret;
> +
> +	if (skb_queue_empty_lockless(q))
> +		return -EAGAIN;
> +	__skb_queue_head_init(&list);
> +
> +	scoped_guard(spinlock_irq, &q->lock) {
> +		skb_queue_walk_safe(q, skb, tmp) {
> +			/* don't support skbs with payload */
> +			if (!skb_has_tx_timestamp(skb, sk) || skb->len)
> +				continue;
> +			__skb_unlink(skb, q);
> +			__skb_queue_tail(&list, skb);
> +		}
> +	}
> +
> +	while (1) {
> +		skb = skb_peek(&list);
> +		if (!skb)
> +			break;
> +		if (!io_process_timestamp_skb(cmd, sk, skb, issue_flags))
> +			break;
> +		__skb_dequeue(&list);
> +		consume_skb(skb);
> +	}
> +
> +	if (!unlikely(skb_queue_empty(&list))) {
> +		scoped_guard(spinlock_irqsave, &q->lock)
> +			skb_queue_splice(q, &list);
> +	}
> +	return -EAGAIN;
> +}
> +
>  int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
>  {
>  	struct socket *sock = cmd->file->private_data;
> @@ -76,6 +151,8 @@ int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
>  		return io_uring_cmd_getsockopt(sock, cmd, issue_flags);
>  	case SOCKET_URING_OP_SETSOCKOPT:
>  		return io_uring_cmd_setsockopt(sock, cmd, issue_flags);
> +	case SOCKET_URING_OP_TX_TIMESTAMP:
> +		return io_uring_cmd_timestamp(sock, cmd, issue_flags);
>  	default:
>  		return -EOPNOTSUPP;
>  	}
> -- 
> 2.49.0
> 



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 1/5] net: timestamp: add helper returning skb's tx tstamp
  2025-06-04  8:42 ` [PATCH v2 1/5] net: timestamp: add helper returning skb's tx tstamp Pavel Begunkov
  2025-06-04 15:37   ` Stanislav Fomichev
  2025-06-05  0:52   ` Willem de Bruijn
@ 2025-06-05  3:51   ` Jason Xing
  2 siblings, 0 replies; 23+ messages in thread
From: Jason Xing @ 2025-06-05  3:51 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: io-uring, Vadim Fedorenko, netdev, Eric Dumazet,
	Kuniyuki Iwashima, Paolo Abeni, Willem de Bruijn,
	David S . Miller, Jakub Kicinski, Richard Cochran

On Wed, Jun 4, 2025 at 4:41 PM Pavel Begunkov <asml.silence@gmail.com> wrote:
>
> Add a helper function skb_get_tx_timestamp() that returns a tx timestamp
> associated with an skb from an queue queue.
>
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>

Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>

Thanks,
Jason

> ---
>  include/net/sock.h |  4 ++++
>  net/socket.c       | 43 +++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 47 insertions(+)
>
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 92e7c1aae3cc..1cd288880ab3 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -2677,6 +2677,10 @@ void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk,
>  void __sock_recv_wifi_status(struct msghdr *msg, struct sock *sk,
>                              struct sk_buff *skb);
>
> +bool skb_has_tx_timestamp(struct sk_buff *skb, const struct sock *sk);
> +bool skb_get_tx_timestamp(struct sk_buff *skb, struct sock *sk,
> +                         struct timespec64 *ts);
> +
>  static inline void
>  sock_recv_timestamp(struct msghdr *msg, struct sock *sk, struct sk_buff *skb)
>  {
> diff --git a/net/socket.c b/net/socket.c
> index 9a0e720f0859..e9c8f3074fe1 100644
> --- a/net/socket.c
> +++ b/net/socket.c
> @@ -843,6 +843,49 @@ static void put_ts_pktinfo(struct msghdr *msg, struct sk_buff *skb,
>                  sizeof(ts_pktinfo), &ts_pktinfo);
>  }
>
> +bool skb_has_tx_timestamp(struct sk_buff *skb, const struct sock *sk)
> +{
> +       const struct sock_exterr_skb *serr = SKB_EXT_ERR(skb);
> +       u32 tsflags = READ_ONCE(sk->sk_tsflags);
> +
> +       if (serr->ee.ee_errno != ENOMSG ||
> +          serr->ee.ee_origin != SO_EE_ORIGIN_TIMESTAMPING)
> +               return false;
> +
> +       /* software time stamp available and wanted */
> +       if ((tsflags & SOF_TIMESTAMPING_SOFTWARE) && skb->tstamp)
> +               return true;
> +       /* hardware time stamps available and wanted */
> +       return (tsflags & SOF_TIMESTAMPING_RAW_HARDWARE) &&
> +               skb_hwtstamps(skb)->hwtstamp;
> +}
> +
> +bool skb_get_tx_timestamp(struct sk_buff *skb, struct sock *sk,
> +                         struct timespec64 *ts)
> +{
> +       u32 tsflags = READ_ONCE(sk->sk_tsflags);
> +       ktime_t hwtstamp;
> +       int if_index = 0;
> +
> +       if ((tsflags & SOF_TIMESTAMPING_SOFTWARE) &&
> +           ktime_to_timespec64_cond(skb->tstamp, ts))
> +               return true;
> +
> +       if (!(tsflags & SOF_TIMESTAMPING_RAW_HARDWARE) ||
> +           skb_is_swtx_tstamp(skb, false))
> +               return false;
> +
> +       if (skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP_NETDEV)
> +               hwtstamp = get_timestamp(sk, skb, &if_index);
> +       else
> +               hwtstamp = skb_hwtstamps(skb)->hwtstamp;
> +
> +       if (tsflags & SOF_TIMESTAMPING_BIND_PHC)
> +               hwtstamp = ptp_convert_timestamp(&hwtstamp,
> +                                               READ_ONCE(sk->sk_bind_phc));
> +       return ktime_to_timespec64_cond(hwtstamp, ts);
> +}
> +
>  /*
>   * called from sock_recv_timestamp() if sock_flag(sk, SOCK_RCVTSTAMP)
>   */
> --
> 2.49.0
>
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 5/5] io_uring/netcmd: add tx timestamping cmd support
  2025-06-05  0:59   ` Willem de Bruijn
@ 2025-06-05 10:25     ` Vadim Fedorenko
  2025-06-05 11:01       ` Pavel Begunkov
                         ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Vadim Fedorenko @ 2025-06-05 10:25 UTC (permalink / raw)
  To: Willem de Bruijn, Pavel Begunkov, io-uring
  Cc: netdev, Eric Dumazet, Kuniyuki Iwashima, Paolo Abeni,
	Willem de Bruijn, David S . Miller, Jakub Kicinski,
	Richard Cochran

On 05/06/2025 01:59, Willem de Bruijn wrote:
> Pavel Begunkov wrote:
>> Add a new socket command which returns tx time stamps to the user. It
>> provide an alternative to the existing error queue recvmsg interface.
>> The command works in a polled multishot mode, which means io_uring will
>> poll the socket and keep posting timestamps until the request is
>> cancelled or fails in any other way (e.g. with no space in the CQ). It
>> reuses the net infra and grabs timestamps from the socket's error queue.
>>
>> The command requires IORING_SETUP_CQE32. All non-final CQEs (marked with
>> IORING_CQE_F_MORE) have cqe->res set to the tskey, and the upper 16 bits
>> of cqe->flags keep tstype (i.e. offset by IORING_CQE_BUFFER_SHIFT). The
>> timevalue is store in the upper part of the extended CQE. The final
>> completion won't have IORING_CQR_F_MORE and will have cqe->res storing
>> 0/error.
>>
>> Suggested-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>> ---
>>   include/uapi/linux/io_uring.h |  6 +++
>>   io_uring/cmd_net.c            | 77 +++++++++++++++++++++++++++++++++++
>>   2 files changed, 83 insertions(+)
>>
>> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
>> index cfd17e382082..0bc156eb96d4 100644
>> --- a/include/uapi/linux/io_uring.h
>> +++ b/include/uapi/linux/io_uring.h
>> @@ -960,6 +960,11 @@ struct io_uring_recvmsg_out {
>>   	__u32 flags;
>>   };
>>   
>> +struct io_timespec {
>> +	__u64		tv_sec;
>> +	__u64		tv_nsec;
>> +};
>> +
>>   /*
>>    * Argument for IORING_OP_URING_CMD when file is a socket
>>    */
>> @@ -968,6 +973,7 @@ enum io_uring_socket_op {
>>   	SOCKET_URING_OP_SIOCOUTQ,
>>   	SOCKET_URING_OP_GETSOCKOPT,
>>   	SOCKET_URING_OP_SETSOCKOPT,
>> +	SOCKET_URING_OP_TX_TIMESTAMP,
>>   };
>>   
>>   /* Zero copy receive refill queue entry */
>> diff --git a/io_uring/cmd_net.c b/io_uring/cmd_net.c
>> index e99170c7d41a..dae59aea5847 100644
>> --- a/io_uring/cmd_net.c
>> +++ b/io_uring/cmd_net.c
>> @@ -1,5 +1,6 @@
>>   #include <asm/ioctls.h>
>>   #include <linux/io_uring/net.h>
>> +#include <linux/errqueue.h>
>>   #include <net/sock.h>
>>   
>>   #include "uring_cmd.h"
>> @@ -51,6 +52,80 @@ static inline int io_uring_cmd_setsockopt(struct socket *sock,
>>   				  optlen);
>>   }
>>   
>> +static bool io_process_timestamp_skb(struct io_uring_cmd *cmd, struct sock *sk,
>> +				     struct sk_buff *skb, unsigned issue_flags)
>> +{
>> +	struct sock_exterr_skb *serr = SKB_EXT_ERR(skb);
>> +	struct io_uring_cqe cqe[2];
>> +	struct io_timespec *iots;
>> +	struct timespec64 ts;
>> +	u32 tskey;
>> +
>> +	BUILD_BUG_ON(sizeof(struct io_uring_cqe) != sizeof(struct io_timespec));
>> +
>> +	if (!skb_get_tx_timestamp(skb, sk, &ts))
>> +		return false;
>> +
>> +	tskey = serr->ee.ee_data;
>> +
>> +	cqe->user_data = 0;
>> +	cqe->res = tskey;
>> +	cqe->flags = IORING_CQE_F_MORE;
>> +	cqe->flags |= (u32)serr->ee.ee_info << IORING_CQE_BUFFER_SHIFT;
>> +
>> +	iots = (struct io_timespec *)&cqe[1];
>> +	iots->tv_sec = ts.tv_sec;
>> +	iots->tv_nsec = ts.tv_nsec;
> 
> skb_get_tx_timestamp loses the information whether this is a
> software or a hardware timestamp. Is that loss problematic?
> 
> If a process only requests one type of timestamp, it will not be.
> 
> But when requesting both (SOF_TIMESTAMPING_OPT_TX_SWHW) this per cqe
> annotation may be necessary.

skb_has_tx_timestamp() helper has clear priority of software timestamp,
if enabled for the socket. Looks like SOF_TIMESTAMPING_OPT_TX_SWHW case
won't produce both timestamps with the current implementation. Am I
missing something?

> 
>> +	return io_uring_cmd_post_mshot_cqe32(cmd, issue_flags, cqe);
>> +}
>> +
>> +static int io_uring_cmd_timestamp(struct socket *sock,
>> +				  struct io_uring_cmd *cmd,
>> +				  unsigned int issue_flags)
>> +{
>> +	struct sock *sk = sock->sk;
>> +	struct sk_buff_head *q = &sk->sk_error_queue;
>> +	struct sk_buff *skb, *tmp;
>> +	struct sk_buff_head list;
>> +	int ret;
>> +
>> +	if (!(issue_flags & IO_URING_F_CQE32))
>> +		return -EINVAL;
>> +	ret = io_cmd_poll_multishot(cmd, issue_flags, EPOLLERR);
>> +	if (unlikely(ret))
>> +		return ret;
>> +
>> +	if (skb_queue_empty_lockless(q))
>> +		return -EAGAIN;
>> +	__skb_queue_head_init(&list);
>> +
>> +	scoped_guard(spinlock_irq, &q->lock) {
>> +		skb_queue_walk_safe(q, skb, tmp) {
>> +			/* don't support skbs with payload */
>> +			if (!skb_has_tx_timestamp(skb, sk) || skb->len)
>> +				continue;
>> +			__skb_unlink(skb, q);
>> +			__skb_queue_tail(&list, skb);
>> +		}
>> +	}
>> +
>> +	while (1) {
>> +		skb = skb_peek(&list);
>> +		if (!skb)
>> +			break;
>> +		if (!io_process_timestamp_skb(cmd, sk, skb, issue_flags))
>> +			break;
>> +		__skb_dequeue(&list);
>> +		consume_skb(skb);
>> +	}
>> +
>> +	if (!unlikely(skb_queue_empty(&list))) {
>> +		scoped_guard(spinlock_irqsave, &q->lock)
>> +			skb_queue_splice(q, &list);
>> +	}
>> +	return -EAGAIN;
>> +}
>> +
>>   int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
>>   {
>>   	struct socket *sock = cmd->file->private_data;
>> @@ -76,6 +151,8 @@ int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
>>   		return io_uring_cmd_getsockopt(sock, cmd, issue_flags);
>>   	case SOCKET_URING_OP_SETSOCKOPT:
>>   		return io_uring_cmd_setsockopt(sock, cmd, issue_flags);
>> +	case SOCKET_URING_OP_TX_TIMESTAMP:
>> +		return io_uring_cmd_timestamp(sock, cmd, issue_flags);
>>   	default:
>>   		return -EOPNOTSUPP;
>>   	}
>> -- 
>> 2.49.0
>>
> 
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 5/5] io_uring/netcmd: add tx timestamping cmd support
  2025-06-05 10:25     ` Vadim Fedorenko
@ 2025-06-05 11:01       ` Pavel Begunkov
  2025-06-05 23:54       ` Willem de Bruijn
  2025-06-06  0:02       ` Jason Xing
  2 siblings, 0 replies; 23+ messages in thread
From: Pavel Begunkov @ 2025-06-05 11:01 UTC (permalink / raw)
  To: Vadim Fedorenko, Willem de Bruijn, io-uring
  Cc: netdev, Eric Dumazet, Kuniyuki Iwashima, Paolo Abeni,
	Willem de Bruijn, David S . Miller, Jakub Kicinski,
	Richard Cochran

On 6/5/25 11:25, Vadim Fedorenko wrote:
> On 05/06/2025 01:59, Willem de Bruijn wrote:
>> Pavel Begunkov wrote:
...>>> +
>>> +    tskey = serr->ee.ee_data;
>>> +
>>> +    cqe->user_data = 0;
>>> +    cqe->res = tskey;
>>> +    cqe->flags = IORING_CQE_F_MORE;
>>> +    cqe->flags |= (u32)serr->ee.ee_info << IORING_CQE_BUFFER_SHIFT;
>>> +
>>> +    iots = (struct io_timespec *)&cqe[1];
>>> +    iots->tv_sec = ts.tv_sec;
>>> +    iots->tv_nsec = ts.tv_nsec;
>>
>> skb_get_tx_timestamp loses the information whether this is a
>> software or a hardware timestamp. Is that loss problematic?
>>
>> If a process only requests one type of timestamp, it will not be.
>>
>> But when requesting both (SOF_TIMESTAMPING_OPT_TX_SWHW) this per cqe
>> annotation may be necessary.
> 
> skb_has_tx_timestamp() helper has clear priority of software timestamp,
> if enabled for the socket. Looks like SOF_TIMESTAMPING_OPT_TX_SWHW case
> won't produce both timestamps with the current implementation. Am I
> missing something?

I'll let Vadim handle the question as I hardly know anything about
the timestamping needs, but just wanted to add that it wouldn't be
a problem to pass a flag to the user for distinguishing sw vs hw if
needed.

-- 
Pavel Begunkov


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 5/5] io_uring/netcmd: add tx timestamping cmd support
  2025-06-05 10:25     ` Vadim Fedorenko
  2025-06-05 11:01       ` Pavel Begunkov
@ 2025-06-05 23:54       ` Willem de Bruijn
  2025-06-06  0:17         ` Jason Xing
  2025-06-06  0:02       ` Jason Xing
  2 siblings, 1 reply; 23+ messages in thread
From: Willem de Bruijn @ 2025-06-05 23:54 UTC (permalink / raw)
  To: Vadim Fedorenko, Willem de Bruijn, Pavel Begunkov, io-uring
  Cc: netdev, Eric Dumazet, Kuniyuki Iwashima, Paolo Abeni,
	Willem de Bruijn, David S . Miller, Jakub Kicinski,
	Richard Cochran

Vadim Fedorenko wrote:
> On 05/06/2025 01:59, Willem de Bruijn wrote:
> > Pavel Begunkov wrote:
> >> Add a new socket command which returns tx time stamps to the user. It
> >> provide an alternative to the existing error queue recvmsg interface.
> >> The command works in a polled multishot mode, which means io_uring will
> >> poll the socket and keep posting timestamps until the request is
> >> cancelled or fails in any other way (e.g. with no space in the CQ). It
> >> reuses the net infra and grabs timestamps from the socket's error queue.
> >>
> >> The command requires IORING_SETUP_CQE32. All non-final CQEs (marked with
> >> IORING_CQE_F_MORE) have cqe->res set to the tskey, and the upper 16 bits
> >> of cqe->flags keep tstype (i.e. offset by IORING_CQE_BUFFER_SHIFT). The
> >> timevalue is store in the upper part of the extended CQE. The final
> >> completion won't have IORING_CQR_F_MORE and will have cqe->res storing
> >> 0/error.
> >>
> >> Suggested-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
> >> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> >> ---
> >>   include/uapi/linux/io_uring.h |  6 +++
> >>   io_uring/cmd_net.c            | 77 +++++++++++++++++++++++++++++++++++
> >>   2 files changed, 83 insertions(+)
> >>
> >> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
> >> index cfd17e382082..0bc156eb96d4 100644
> >> --- a/include/uapi/linux/io_uring.h
> >> +++ b/include/uapi/linux/io_uring.h
> >> @@ -960,6 +960,11 @@ struct io_uring_recvmsg_out {
> >>   	__u32 flags;
> >>   };
> >>   
> >> +struct io_timespec {
> >> +	__u64		tv_sec;
> >> +	__u64		tv_nsec;
> >> +};
> >> +
> >>   /*
> >>    * Argument for IORING_OP_URING_CMD when file is a socket
> >>    */
> >> @@ -968,6 +973,7 @@ enum io_uring_socket_op {
> >>   	SOCKET_URING_OP_SIOCOUTQ,
> >>   	SOCKET_URING_OP_GETSOCKOPT,
> >>   	SOCKET_URING_OP_SETSOCKOPT,
> >> +	SOCKET_URING_OP_TX_TIMESTAMP,
> >>   };
> >>   
> >>   /* Zero copy receive refill queue entry */
> >> diff --git a/io_uring/cmd_net.c b/io_uring/cmd_net.c
> >> index e99170c7d41a..dae59aea5847 100644
> >> --- a/io_uring/cmd_net.c
> >> +++ b/io_uring/cmd_net.c
> >> @@ -1,5 +1,6 @@
> >>   #include <asm/ioctls.h>
> >>   #include <linux/io_uring/net.h>
> >> +#include <linux/errqueue.h>
> >>   #include <net/sock.h>
> >>   
> >>   #include "uring_cmd.h"
> >> @@ -51,6 +52,80 @@ static inline int io_uring_cmd_setsockopt(struct socket *sock,
> >>   				  optlen);
> >>   }
> >>   
> >> +static bool io_process_timestamp_skb(struct io_uring_cmd *cmd, struct sock *sk,
> >> +				     struct sk_buff *skb, unsigned issue_flags)
> >> +{
> >> +	struct sock_exterr_skb *serr = SKB_EXT_ERR(skb);
> >> +	struct io_uring_cqe cqe[2];
> >> +	struct io_timespec *iots;
> >> +	struct timespec64 ts;
> >> +	u32 tskey;
> >> +
> >> +	BUILD_BUG_ON(sizeof(struct io_uring_cqe) != sizeof(struct io_timespec));
> >> +
> >> +	if (!skb_get_tx_timestamp(skb, sk, &ts))
> >> +		return false;
> >> +
> >> +	tskey = serr->ee.ee_data;
> >> +
> >> +	cqe->user_data = 0;
> >> +	cqe->res = tskey;
> >> +	cqe->flags = IORING_CQE_F_MORE;
> >> +	cqe->flags |= (u32)serr->ee.ee_info << IORING_CQE_BUFFER_SHIFT;
> >> +
> >> +	iots = (struct io_timespec *)&cqe[1];
> >> +	iots->tv_sec = ts.tv_sec;
> >> +	iots->tv_nsec = ts.tv_nsec;
> > 
> > skb_get_tx_timestamp loses the information whether this is a
> > software or a hardware timestamp. Is that loss problematic?
> > 
> > If a process only requests one type of timestamp, it will not be.
> > 
> > But when requesting both (SOF_TIMESTAMPING_OPT_TX_SWHW) this per cqe
> > annotation may be necessary.
> 
> skb_has_tx_timestamp() helper has clear priority of software timestamp,
> if enabled for the socket. Looks like SOF_TIMESTAMPING_OPT_TX_SWHW case
> won't produce both timestamps with the current implementation. Am I
> missing something?

The point of that option is to request both SW and HW tx
timestamps. Before that option, the SW timestamp was suppressed if a
HW timestamp was pending.

The io_uring API as currently written will again return only one
timestamp, but this time the SW one.

If the user explicitly sets SOF_TIMESTAMPING_OPT_TX_SWHW, they will
want both, and want to be able to disambiguate them.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 5/5] io_uring/netcmd: add tx timestamping cmd support
  2025-06-05 10:25     ` Vadim Fedorenko
  2025-06-05 11:01       ` Pavel Begunkov
  2025-06-05 23:54       ` Willem de Bruijn
@ 2025-06-06  0:02       ` Jason Xing
  2025-06-06  8:12         ` Pavel Begunkov
  2 siblings, 1 reply; 23+ messages in thread
From: Jason Xing @ 2025-06-06  0:02 UTC (permalink / raw)
  To: Vadim Fedorenko
  Cc: Willem de Bruijn, Pavel Begunkov, io-uring, netdev, Eric Dumazet,
	Kuniyuki Iwashima, Paolo Abeni, Willem de Bruijn,
	David S . Miller, Jakub Kicinski, Richard Cochran

Hi Vadim,

On Thu, Jun 5, 2025 at 6:26 PM Vadim Fedorenko
<vadim.fedorenko@linux.dev> wrote:
>
> On 05/06/2025 01:59, Willem de Bruijn wrote:
> > Pavel Begunkov wrote:
> >> Add a new socket command which returns tx time stamps to the user. It
> >> provide an alternative to the existing error queue recvmsg interface.
> >> The command works in a polled multishot mode, which means io_uring will
> >> poll the socket and keep posting timestamps until the request is
> >> cancelled or fails in any other way (e.g. with no space in the CQ). It
> >> reuses the net infra and grabs timestamps from the socket's error queue.
> >>
> >> The command requires IORING_SETUP_CQE32. All non-final CQEs (marked with
> >> IORING_CQE_F_MORE) have cqe->res set to the tskey, and the upper 16 bits
> >> of cqe->flags keep tstype (i.e. offset by IORING_CQE_BUFFER_SHIFT). The
> >> timevalue is store in the upper part of the extended CQE. The final
> >> completion won't have IORING_CQR_F_MORE and will have cqe->res storing
> >> 0/error.
> >>
> >> Suggested-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
> >> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> >> ---
> >>   include/uapi/linux/io_uring.h |  6 +++
> >>   io_uring/cmd_net.c            | 77 +++++++++++++++++++++++++++++++++++
> >>   2 files changed, 83 insertions(+)
> >>
> >> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
> >> index cfd17e382082..0bc156eb96d4 100644
> >> --- a/include/uapi/linux/io_uring.h
> >> +++ b/include/uapi/linux/io_uring.h
> >> @@ -960,6 +960,11 @@ struct io_uring_recvmsg_out {
> >>      __u32 flags;
> >>   };
> >>
> >> +struct io_timespec {
> >> +    __u64           tv_sec;
> >> +    __u64           tv_nsec;
> >> +};
> >> +
> >>   /*
> >>    * Argument for IORING_OP_URING_CMD when file is a socket
> >>    */
> >> @@ -968,6 +973,7 @@ enum io_uring_socket_op {
> >>      SOCKET_URING_OP_SIOCOUTQ,
> >>      SOCKET_URING_OP_GETSOCKOPT,
> >>      SOCKET_URING_OP_SETSOCKOPT,
> >> +    SOCKET_URING_OP_TX_TIMESTAMP,
> >>   };
> >>
> >>   /* Zero copy receive refill queue entry */
> >> diff --git a/io_uring/cmd_net.c b/io_uring/cmd_net.c
> >> index e99170c7d41a..dae59aea5847 100644
> >> --- a/io_uring/cmd_net.c
> >> +++ b/io_uring/cmd_net.c
> >> @@ -1,5 +1,6 @@
> >>   #include <asm/ioctls.h>
> >>   #include <linux/io_uring/net.h>
> >> +#include <linux/errqueue.h>
> >>   #include <net/sock.h>
> >>
> >>   #include "uring_cmd.h"
> >> @@ -51,6 +52,80 @@ static inline int io_uring_cmd_setsockopt(struct socket *sock,
> >>                                optlen);
> >>   }
> >>
> >> +static bool io_process_timestamp_skb(struct io_uring_cmd *cmd, struct sock *sk,
> >> +                                 struct sk_buff *skb, unsigned issue_flags)
> >> +{
> >> +    struct sock_exterr_skb *serr = SKB_EXT_ERR(skb);
> >> +    struct io_uring_cqe cqe[2];
> >> +    struct io_timespec *iots;
> >> +    struct timespec64 ts;
> >> +    u32 tskey;
> >> +
> >> +    BUILD_BUG_ON(sizeof(struct io_uring_cqe) != sizeof(struct io_timespec));
> >> +
> >> +    if (!skb_get_tx_timestamp(skb, sk, &ts))
> >> +            return false;
> >> +
> >> +    tskey = serr->ee.ee_data;
> >> +
> >> +    cqe->user_data = 0;
> >> +    cqe->res = tskey;
> >> +    cqe->flags = IORING_CQE_F_MORE;
> >> +    cqe->flags |= (u32)serr->ee.ee_info << IORING_CQE_BUFFER_SHIFT;
> >> +
> >> +    iots = (struct io_timespec *)&cqe[1];
> >> +    iots->tv_sec = ts.tv_sec;
> >> +    iots->tv_nsec = ts.tv_nsec;
> >
> > skb_get_tx_timestamp loses the information whether this is a
> > software or a hardware timestamp. Is that loss problematic?
> >
> > If a process only requests one type of timestamp, it will not be.
> >
> > But when requesting both (SOF_TIMESTAMPING_OPT_TX_SWHW) this per cqe
> > annotation may be necessary.
>
> skb_has_tx_timestamp() helper has clear priority of software timestamp,
> if enabled for the socket. Looks like SOF_TIMESTAMPING_OPT_TX_SWHW case
> won't produce both timestamps with the current implementation. Am I
> missing something?

Sorry that I don't know how iouring works at a high level, so my
question could be naive and unrelated to what Willem said.

Is it possible that applications set various tx sw timestamp flags
(like SOF_TIMESTAMPING_TX_SCHED, SOF_TIMESTAMPING_TX_SOFTWARE)? If it
might happen, then applications can get lost on which type of tx
timestamp it acquires without explicitly specifying in cqe.

Thanks,
Jason

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 5/5] io_uring/netcmd: add tx timestamping cmd support
  2025-06-05 23:54       ` Willem de Bruijn
@ 2025-06-06  0:17         ` Jason Xing
  0 siblings, 0 replies; 23+ messages in thread
From: Jason Xing @ 2025-06-06  0:17 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Vadim Fedorenko, Pavel Begunkov, io-uring, netdev, Eric Dumazet,
	Kuniyuki Iwashima, Paolo Abeni, Willem de Bruijn,
	David S . Miller, Jakub Kicinski, Richard Cochran

On Fri, Jun 6, 2025 at 7:55 AM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> Vadim Fedorenko wrote:
> > On 05/06/2025 01:59, Willem de Bruijn wrote:
> > > Pavel Begunkov wrote:
> > >> Add a new socket command which returns tx time stamps to the user. It
> > >> provide an alternative to the existing error queue recvmsg interface.
> > >> The command works in a polled multishot mode, which means io_uring will
> > >> poll the socket and keep posting timestamps until the request is
> > >> cancelled or fails in any other way (e.g. with no space in the CQ). It
> > >> reuses the net infra and grabs timestamps from the socket's error queue.
> > >>
> > >> The command requires IORING_SETUP_CQE32. All non-final CQEs (marked with
> > >> IORING_CQE_F_MORE) have cqe->res set to the tskey, and the upper 16 bits
> > >> of cqe->flags keep tstype (i.e. offset by IORING_CQE_BUFFER_SHIFT). The
> > >> timevalue is store in the upper part of the extended CQE. The final
> > >> completion won't have IORING_CQR_F_MORE and will have cqe->res storing
> > >> 0/error.
> > >>
> > >> Suggested-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
> > >> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> > >> ---
> > >>   include/uapi/linux/io_uring.h |  6 +++
> > >>   io_uring/cmd_net.c            | 77 +++++++++++++++++++++++++++++++++++
> > >>   2 files changed, 83 insertions(+)
> > >>
> > >> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
> > >> index cfd17e382082..0bc156eb96d4 100644
> > >> --- a/include/uapi/linux/io_uring.h
> > >> +++ b/include/uapi/linux/io_uring.h
> > >> @@ -960,6 +960,11 @@ struct io_uring_recvmsg_out {
> > >>    __u32 flags;
> > >>   };
> > >>
> > >> +struct io_timespec {
> > >> +  __u64           tv_sec;
> > >> +  __u64           tv_nsec;
> > >> +};
> > >> +
> > >>   /*
> > >>    * Argument for IORING_OP_URING_CMD when file is a socket
> > >>    */
> > >> @@ -968,6 +973,7 @@ enum io_uring_socket_op {
> > >>    SOCKET_URING_OP_SIOCOUTQ,
> > >>    SOCKET_URING_OP_GETSOCKOPT,
> > >>    SOCKET_URING_OP_SETSOCKOPT,
> > >> +  SOCKET_URING_OP_TX_TIMESTAMP,
> > >>   };
> > >>
> > >>   /* Zero copy receive refill queue entry */
> > >> diff --git a/io_uring/cmd_net.c b/io_uring/cmd_net.c
> > >> index e99170c7d41a..dae59aea5847 100644
> > >> --- a/io_uring/cmd_net.c
> > >> +++ b/io_uring/cmd_net.c
> > >> @@ -1,5 +1,6 @@
> > >>   #include <asm/ioctls.h>
> > >>   #include <linux/io_uring/net.h>
> > >> +#include <linux/errqueue.h>
> > >>   #include <net/sock.h>
> > >>
> > >>   #include "uring_cmd.h"
> > >> @@ -51,6 +52,80 @@ static inline int io_uring_cmd_setsockopt(struct socket *sock,
> > >>                              optlen);
> > >>   }
> > >>
> > >> +static bool io_process_timestamp_skb(struct io_uring_cmd *cmd, struct sock *sk,
> > >> +                               struct sk_buff *skb, unsigned issue_flags)
> > >> +{
> > >> +  struct sock_exterr_skb *serr = SKB_EXT_ERR(skb);
> > >> +  struct io_uring_cqe cqe[2];
> > >> +  struct io_timespec *iots;
> > >> +  struct timespec64 ts;
> > >> +  u32 tskey;
> > >> +
> > >> +  BUILD_BUG_ON(sizeof(struct io_uring_cqe) != sizeof(struct io_timespec));
> > >> +
> > >> +  if (!skb_get_tx_timestamp(skb, sk, &ts))
> > >> +          return false;
> > >> +
> > >> +  tskey = serr->ee.ee_data;
> > >> +
> > >> +  cqe->user_data = 0;
> > >> +  cqe->res = tskey;
> > >> +  cqe->flags = IORING_CQE_F_MORE;
> > >> +  cqe->flags |= (u32)serr->ee.ee_info << IORING_CQE_BUFFER_SHIFT;
> > >> +
> > >> +  iots = (struct io_timespec *)&cqe[1];
> > >> +  iots->tv_sec = ts.tv_sec;
> > >> +  iots->tv_nsec = ts.tv_nsec;
> > >
> > > skb_get_tx_timestamp loses the information whether this is a
> > > software or a hardware timestamp. Is that loss problematic?
> > >
> > > If a process only requests one type of timestamp, it will not be.
> > >
> > > But when requesting both (SOF_TIMESTAMPING_OPT_TX_SWHW) this per cqe
> > > annotation may be necessary.
> >
> > skb_has_tx_timestamp() helper has clear priority of software timestamp,
> > if enabled for the socket. Looks like SOF_TIMESTAMPING_OPT_TX_SWHW case
> > won't produce both timestamps with the current implementation. Am I
> > missing something?
>
> The point of that option is to request both SW and HW tx
> timestamps. Before that option, the SW timestamp was suppressed if a
> HW timestamp was pending.

Agree, the regular driver does such an implementation in such an order:
XXX_xmit()
{
    if (unlikely((skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP)...
        skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;

    skb_tx_timestamp(skb); // here, it would not generate timestamp
with SKBTX_IN_PROGRESS being set in advance
}

If both sw and hw flags are set, the software one will not take any effect.

Thanks,
Jason

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 5/5] io_uring/netcmd: add tx timestamping cmd support
  2025-06-06  0:02       ` Jason Xing
@ 2025-06-06  8:12         ` Pavel Begunkov
  2025-06-06  8:33           ` Jason Xing
  0 siblings, 1 reply; 23+ messages in thread
From: Pavel Begunkov @ 2025-06-06  8:12 UTC (permalink / raw)
  To: Jason Xing, Vadim Fedorenko
  Cc: Willem de Bruijn, io-uring, netdev, Eric Dumazet,
	Kuniyuki Iwashima, Paolo Abeni, Willem de Bruijn,
	David S . Miller, Jakub Kicinski, Richard Cochran

On 6/6/25 01:02, Jason Xing wrote:
...>>>>                                 optlen);
>>>>    }
>>>>
>>>> +static bool io_process_timestamp_skb(struct io_uring_cmd *cmd, struct sock *sk,
>>>> +                                 struct sk_buff *skb, unsigned issue_flags)
>>>> +{
>>>> +    struct sock_exterr_skb *serr = SKB_EXT_ERR(skb);
>>>> +    struct io_uring_cqe cqe[2];
>>>> +    struct io_timespec *iots;
>>>> +    struct timespec64 ts;
>>>> +    u32 tskey;
>>>> +
>>>> +    BUILD_BUG_ON(sizeof(struct io_uring_cqe) != sizeof(struct io_timespec));
>>>> +
>>>> +    if (!skb_get_tx_timestamp(skb, sk, &ts))
>>>> +            return false;
>>>> +
>>>> +    tskey = serr->ee.ee_data;
>>>> +
>>>> +    cqe->user_data = 0;
>>>> +    cqe->res = tskey;
>>>> +    cqe->flags = IORING_CQE_F_MORE;
>>>> +    cqe->flags |= (u32)serr->ee.ee_info << IORING_CQE_BUFFER_SHIFT;
>>>> +
>>>> +    iots = (struct io_timespec *)&cqe[1];
>>>> +    iots->tv_sec = ts.tv_sec;
>>>> +    iots->tv_nsec = ts.tv_nsec;
>>>
>>> skb_get_tx_timestamp loses the information whether this is a
>>> software or a hardware timestamp. Is that loss problematic?
>>>
>>> If a process only requests one type of timestamp, it will not be.
>>>
>>> But when requesting both (SOF_TIMESTAMPING_OPT_TX_SWHW) this per cqe
>>> annotation may be necessary.
>>
>> skb_has_tx_timestamp() helper has clear priority of software timestamp,
>> if enabled for the socket. Looks like SOF_TIMESTAMPING_OPT_TX_SWHW case
>> won't produce both timestamps with the current implementation. Am I
>> missing something?
> 
> Sorry that I don't know how iouring works at a high level, so my
> question could be naive and unrelated to what Willem said.
> 
> Is it possible that applications set various tx sw timestamp flags
> (like SOF_TIMESTAMPING_TX_SCHED, SOF_TIMESTAMPING_TX_SOFTWARE)? If it

io_uring takes timestamps from the error queue, just like the socket
api does it. There should be different skbs in the queue for different
SCM_TSTAMP_{SND,SCHED,ACK,*} timestamps, io_uring only passes the
type it got in an skb's serr->ee.ee_info to user without changes.
Hope it answers it

> might happen, then applications can get lost on which type of tx
> timestamp it acquires without explicitly specifying in cqe.

-- 
Pavel Begunkov


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 5/5] io_uring/netcmd: add tx timestamping cmd support
  2025-06-06  8:12         ` Pavel Begunkov
@ 2025-06-06  8:33           ` Jason Xing
  2025-06-06  9:08             ` Pavel Begunkov
  0 siblings, 1 reply; 23+ messages in thread
From: Jason Xing @ 2025-06-06  8:33 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: Vadim Fedorenko, Willem de Bruijn, io-uring, netdev, Eric Dumazet,
	Kuniyuki Iwashima, Paolo Abeni, Willem de Bruijn,
	David S . Miller, Jakub Kicinski, Richard Cochran

On Fri, Jun 6, 2025 at 4:11 PM Pavel Begunkov <asml.silence@gmail.com> wrote:
>
> On 6/6/25 01:02, Jason Xing wrote:
> ...>>>>                                 optlen);
> >>>>    }
> >>>>
> >>>> +static bool io_process_timestamp_skb(struct io_uring_cmd *cmd, struct sock *sk,
> >>>> +                                 struct sk_buff *skb, unsigned issue_flags)
> >>>> +{
> >>>> +    struct sock_exterr_skb *serr = SKB_EXT_ERR(skb);
> >>>> +    struct io_uring_cqe cqe[2];
> >>>> +    struct io_timespec *iots;
> >>>> +    struct timespec64 ts;
> >>>> +    u32 tskey;
> >>>> +
> >>>> +    BUILD_BUG_ON(sizeof(struct io_uring_cqe) != sizeof(struct io_timespec));
> >>>> +
> >>>> +    if (!skb_get_tx_timestamp(skb, sk, &ts))
> >>>> +            return false;
> >>>> +
> >>>> +    tskey = serr->ee.ee_data;
> >>>> +
> >>>> +    cqe->user_data = 0;
> >>>> +    cqe->res = tskey;
> >>>> +    cqe->flags = IORING_CQE_F_MORE;
> >>>> +    cqe->flags |= (u32)serr->ee.ee_info << IORING_CQE_BUFFER_SHIFT;
> >>>> +
> >>>> +    iots = (struct io_timespec *)&cqe[1];
> >>>> +    iots->tv_sec = ts.tv_sec;
> >>>> +    iots->tv_nsec = ts.tv_nsec;
> >>>
> >>> skb_get_tx_timestamp loses the information whether this is a
> >>> software or a hardware timestamp. Is that loss problematic?
> >>>
> >>> If a process only requests one type of timestamp, it will not be.
> >>>
> >>> But when requesting both (SOF_TIMESTAMPING_OPT_TX_SWHW) this per cqe
> >>> annotation may be necessary.
> >>
> >> skb_has_tx_timestamp() helper has clear priority of software timestamp,
> >> if enabled for the socket. Looks like SOF_TIMESTAMPING_OPT_TX_SWHW case
> >> won't produce both timestamps with the current implementation. Am I
> >> missing something?
> >
> > Sorry that I don't know how iouring works at a high level, so my
> > question could be naive and unrelated to what Willem said.
> >
> > Is it possible that applications set various tx sw timestamp flags
> > (like SOF_TIMESTAMPING_TX_SCHED, SOF_TIMESTAMPING_TX_SOFTWARE)? If it
>
> io_uring takes timestamps from the error queue, just like the socket
> api does it. There should be different skbs in the queue for different
> SCM_TSTAMP_{SND,SCHED,ACK,*} timestamps, io_uring only passes the
> type it got in an skb's serr->ee.ee_info to user without changes.
> Hope it answers it

Sure, thanks, io_uring has no difference from other regular
applications in this case. Then the question that Willem proposed
remains because in other applications struct scm_timestamping_internal
can be used to distinguish sw and hw timestamps (please see
__sock_recv_timestamp() as an example).

Thanks,
Jason

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 5/5] io_uring/netcmd: add tx timestamping cmd support
  2025-06-06  8:33           ` Jason Xing
@ 2025-06-06  9:08             ` Pavel Begunkov
  0 siblings, 0 replies; 23+ messages in thread
From: Pavel Begunkov @ 2025-06-06  9:08 UTC (permalink / raw)
  To: Jason Xing
  Cc: Vadim Fedorenko, Willem de Bruijn, io-uring, netdev, Eric Dumazet,
	Kuniyuki Iwashima, Paolo Abeni, Willem de Bruijn,
	David S . Miller, Jakub Kicinski, Richard Cochran

On 6/6/25 09:33, Jason Xing wrote:
...>>> Sorry that I don't know how iouring works at a high level, so my
>>> question could be naive and unrelated to what Willem said.
>>>
>>> Is it possible that applications set various tx sw timestamp flags
>>> (like SOF_TIMESTAMPING_TX_SCHED, SOF_TIMESTAMPING_TX_SOFTWARE)? If it
>>
>> io_uring takes timestamps from the error queue, just like the socket
>> api does it. There should be different skbs in the queue for different
>> SCM_TSTAMP_{SND,SCHED,ACK,*} timestamps, io_uring only passes the
>> type it got in an skb's serr->ee.ee_info to user without changes.
>> Hope it answers it
> 
> Sure, thanks, io_uring has no difference from other regular
> applications in this case. Then the question that Willem proposed
> remains because in other applications struct scm_timestamping_internal
> can be used to distinguish sw and hw timestamps (please see
> __sock_recv_timestamp() as an example).

Right, that's exactly where I copied it from ;) I can certainly add
a flag if there is a need to distinguish software vs hardware
timestamps.

-- 
Pavel Begunkov


^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2025-06-06  9:07 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-04  8:42 [PATCH v2 0/5] io_uring cmd for tx timestamps Pavel Begunkov
2025-06-04  8:42 ` [PATCH v2 1/5] net: timestamp: add helper returning skb's tx tstamp Pavel Begunkov
2025-06-04 15:37   ` Stanislav Fomichev
2025-06-05  0:52   ` Willem de Bruijn
2025-06-05  3:51   ` Jason Xing
2025-06-04  8:42 ` [PATCH v2 2/5] io_uring/poll: introduce io_arm_apoll() Pavel Begunkov
2025-06-04  8:42 ` [PATCH v2 3/5] io_uring/cmd: allow multishot polled commands Pavel Begunkov
2025-06-04  8:42 ` [PATCH v2 4/5] io_uring: add mshot helper for posting CQE32 Pavel Begunkov
2025-06-04  8:42 ` [PATCH v2 5/5] io_uring/netcmd: add tx timestamping cmd support Pavel Begunkov
2025-06-04 12:04   ` Jens Axboe
2025-06-04 12:33     ` Pavel Begunkov
2025-06-05  0:59   ` Willem de Bruijn
2025-06-05 10:25     ` Vadim Fedorenko
2025-06-05 11:01       ` Pavel Begunkov
2025-06-05 23:54       ` Willem de Bruijn
2025-06-06  0:17         ` Jason Xing
2025-06-06  0:02       ` Jason Xing
2025-06-06  8:12         ` Pavel Begunkov
2025-06-06  8:33           ` Jason Xing
2025-06-06  9:08             ` Pavel Begunkov
2025-06-04  8:53 ` [PATCH v2 0/5] io_uring cmd for tx timestamps Pavel Begunkov
2025-06-04 12:06 ` Jens Axboe
2025-06-04 12:38   ` Pavel Begunkov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox