* [PATCH RFC 0/7] tx timestamp io_uring commands
From: Pavel Begunkov @ 2025-04-28 12:52 UTC
To: io-uring; +Cc: asml.silence, Vadim Fedorenko
Vadim expressed interest in having an io_uring API for tx timestamping,
and the series implements a rough prototype to support that. It
introduces a new socket command, which works in a multishot polling
mode, i.e. it polls the socket and posts CQEs when a timestamp arrives.
It reuses most of the bits on the networking side by grabbing timestamp
skbs from the socket's error queue.
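A minimal userspace sketch of the error-queue pattern the command in patch 7 relies on, under stated assumptions: timestamp entries are moved off the queue into a private list, posted one by one until posting fails (e.g. no CQ space), and any leftovers are returned to the head of the queue. The demo_* types and functions are hypothetical stand-ins for sk_buff / sk_buff_head, and locking is left out (the kernel holds q->lock while moving entries).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an skb sitting on sk->sk_error_queue. */
struct demo_item {
	int key;		/* e.g. the timestamp key */
	bool is_tstamp;		/* models io_skb_has_tx_tstamp() */
	struct demo_item *next;
};

/* Hypothetical FIFO standing in for struct sk_buff_head. */
struct demo_queue {
	struct demo_item *head;
	struct demo_item *tail;
};

static void demo_push_tail(struct demo_queue *q, struct demo_item *it)
{
	it->next = NULL;
	if (q->tail)
		q->tail->next = it;
	else
		q->head = it;
	q->tail = it;
}

/* Put all of @list back in front of @q, keeping its order (cf. skb_queue_splice(&list, q)). */
static void demo_splice_front(struct demo_queue *list, struct demo_queue *q)
{
	if (!list->head)
		return;
	list->tail->next = q->head;
	if (!q->tail)
		q->tail = list->tail;
	q->head = list->head;
	list->head = list->tail = NULL;
}

/*
 * Move timestamp items to a private list (the kernel does this under
 * q->lock), "post" them while CQ space lasts, then return the rest to
 * the head of @q. Returns the number of items posted.
 */
static int demo_drain(struct demo_queue *q, int cq_space)
{
	struct demo_queue list = { NULL, NULL };
	struct demo_queue keep = { NULL, NULL };
	struct demo_item *it, *next;
	int posted = 0;

	for (it = q->head; it; it = next) {
		next = it->next;
		demo_push_tail(it->is_tstamp ? &list : &keep, it);
	}
	*q = keep;

	while (list.head && posted < cq_space) {
		it = list.head;
		list.head = it->next;
		if (!list.head)
			list.tail = NULL;
		posted++;	/* the kernel posts a CQE, then consume_skb() */
	}

	demo_splice_front(&list, q);
	return posted;
}
```

Returning unposted entries to the head, rather than the tail, keeps them ahead of anything queued later, so ordering is preserved across wakeups.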
The ABI and net bits, like skb parsing, will need to be discussed and
ironed out before posting a non-RFC version.
Pavel Begunkov (7):
io_uring: delete misleading comment in io_fill_cqe_aux()
io_uring/cmd: move net cmd into a separate file
net: timestamp: add helper returning skb's tx tstamp
io_uring/poll: introduce io_arm_apoll()
io_uring/cmd: allow multishot polled commands
io_uring: add mshot helper for posting CQE32
io_uring/cmd: add tx timestamping cmd support
include/net/sock.h | 3 +
include/uapi/linux/io_uring.h | 6 ++
io_uring/Makefile | 1 +
io_uring/cmd_net.c | 177 ++++++++++++++++++++++++++++++++++
io_uring/io_uring.c | 46 ++++++++-
io_uring/io_uring.h | 1 +
io_uring/poll.c | 43 +++++----
io_uring/poll.h | 1 +
io_uring/uring_cmd.c | 97 +++++--------------
io_uring/uring_cmd.h | 7 ++
net/socket.c | 32 ++++++
11 files changed, 319 insertions(+), 95 deletions(-)
create mode 100644 io_uring/cmd_net.c
--
2.48.1
* [PATCH RFC 1/7] io_uring: delete misleading comment in io_fill_cqe_aux()
From: Pavel Begunkov @ 2025-04-28 12:52 UTC
To: io-uring; +Cc: asml.silence, Vadim Fedorenko
io_fill_cqe_aux() doesn't overflow completions; however, it might fail
to post them and lets the caller handle that. Remove the comment, which doesn't
make any sense.
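As a rough illustration of that contract, here is a userspace model of a bounded CQ ring: the fill helper reports failure when the ring is full instead of overflowing, and the caller decides what to do. All demo_* names are hypothetical, and counting the dropped completion is just one possible caller policy.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_CQ_ENTRIES 4u	/* hypothetical ring size, power of two */

struct demo_cqe {
	uint64_t user_data;
	int32_t res;
	uint32_t flags;
};

struct demo_cq {
	struct demo_cqe cqes[DEMO_CQ_ENTRIES];
	uint32_t head;	/* advanced by the consumer (userspace) */
	uint32_t tail;	/* advanced by the producer (kernel) */
};

/* Like io_fill_cqe_aux(): try to fill one entry, report failure, never overflow. */
static bool demo_fill_cqe(struct demo_cq *cq, uint64_t user_data, int32_t res,
			  uint32_t flags)
{
	struct demo_cqe *cqe;

	if (cq->tail - cq->head == DEMO_CQ_ENTRIES)
		return false;	/* ring full: leave it to the caller */
	cqe = &cq->cqes[cq->tail & (DEMO_CQ_ENTRIES - 1)];
	cqe->user_data = user_data;
	cqe->res = res;
	cqe->flags = flags;
	cq->tail++;
	return true;
}

/* A caller that handles the failure, here by counting a dropped completion. */
static bool demo_post_or_count(struct demo_cq *cq, unsigned *dropped,
			       uint64_t user_data, int32_t res)
{
	if (demo_fill_cqe(cq, user_data, res, 0))
		return true;
	(*dropped)++;	/* a real caller could instead queue it on an overflow list */
	return false;
}
```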
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/io_uring.c | 5 -----
1 file changed, 5 deletions(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 7099b488c5e1..dc6dac544fe0 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -814,11 +814,6 @@ static bool io_fill_cqe_aux(struct io_ring_ctx *ctx, u64 user_data, s32 res,
ctx->cq_extra++;
- /*
- * If we can't get a cq entry, userspace overflowed the
- * submission (by quite a lot). Increment the overflow count in
- * the ring.
- */
if (likely(io_get_cqe(ctx, &cqe))) {
WRITE_ONCE(cqe->user_data, user_data);
WRITE_ONCE(cqe->res, res);
--
2.48.1
* [PATCH RFC 2/7] io_uring/cmd: move net cmd into a separate file
From: Pavel Begunkov @ 2025-04-28 12:52 UTC
To: io-uring; +Cc: asml.silence, Vadim Fedorenko
The socket io_uring command implementation lives in io_uring/uring_cmd.c
alongside the generic command code. Move it into a separate file,
io_uring/cmd_net.c.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/Makefile | 1 +
io_uring/cmd_net.c | 83 ++++++++++++++++++++++++++++++++++++++++++++
io_uring/uring_cmd.c | 83 --------------------------------------------
3 files changed, 84 insertions(+), 83 deletions(-)
create mode 100644 io_uring/cmd_net.c
diff --git a/io_uring/Makefile b/io_uring/Makefile
index 3e28a741ca15..75e0ca795685 100644
--- a/io_uring/Makefile
+++ b/io_uring/Makefile
@@ -19,3 +19,4 @@ obj-$(CONFIG_IO_WQ) += io-wq.o
obj-$(CONFIG_FUTEX) += futex.o
obj-$(CONFIG_EPOLL) += epoll.o
obj-$(CONFIG_NET_RX_BUSY_POLL) += napi.o
+obj-$(CONFIG_NET) += cmd_net.o
diff --git a/io_uring/cmd_net.c b/io_uring/cmd_net.c
new file mode 100644
index 000000000000..e99170c7d41a
--- /dev/null
+++ b/io_uring/cmd_net.c
@@ -0,0 +1,83 @@
+#include <asm/ioctls.h>
+#include <linux/io_uring/net.h>
+#include <net/sock.h>
+
+#include "uring_cmd.h"
+
+static inline int io_uring_cmd_getsockopt(struct socket *sock,
+ struct io_uring_cmd *cmd,
+ unsigned int issue_flags)
+{
+ const struct io_uring_sqe *sqe = cmd->sqe;
+ bool compat = !!(issue_flags & IO_URING_F_COMPAT);
+ int optlen, optname, level, err;
+ void __user *optval;
+
+ level = READ_ONCE(sqe->level);
+ if (level != SOL_SOCKET)
+ return -EOPNOTSUPP;
+
+ optval = u64_to_user_ptr(READ_ONCE(sqe->optval));
+ optname = READ_ONCE(sqe->optname);
+ optlen = READ_ONCE(sqe->optlen);
+
+ err = do_sock_getsockopt(sock, compat, level, optname,
+ USER_SOCKPTR(optval),
+ KERNEL_SOCKPTR(&optlen));
+ if (err)
+ return err;
+
+ /* On success, return optlen */
+ return optlen;
+}
+
+static inline int io_uring_cmd_setsockopt(struct socket *sock,
+ struct io_uring_cmd *cmd,
+ unsigned int issue_flags)
+{
+ const struct io_uring_sqe *sqe = cmd->sqe;
+ bool compat = !!(issue_flags & IO_URING_F_COMPAT);
+ int optname, optlen, level;
+ void __user *optval;
+ sockptr_t optval_s;
+
+ optval = u64_to_user_ptr(READ_ONCE(sqe->optval));
+ optname = READ_ONCE(sqe->optname);
+ optlen = READ_ONCE(sqe->optlen);
+ level = READ_ONCE(sqe->level);
+ optval_s = USER_SOCKPTR(optval);
+
+ return do_sock_setsockopt(sock, compat, level, optname, optval_s,
+ optlen);
+}
+
+int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
+{
+ struct socket *sock = cmd->file->private_data;
+ struct sock *sk = sock->sk;
+ struct proto *prot = READ_ONCE(sk->sk_prot);
+ int ret, arg = 0;
+
+ if (!prot || !prot->ioctl)
+ return -EOPNOTSUPP;
+
+ switch (cmd->cmd_op) {
+ case SOCKET_URING_OP_SIOCINQ:
+ ret = prot->ioctl(sk, SIOCINQ, &arg);
+ if (ret)
+ return ret;
+ return arg;
+ case SOCKET_URING_OP_SIOCOUTQ:
+ ret = prot->ioctl(sk, SIOCOUTQ, &arg);
+ if (ret)
+ return ret;
+ return arg;
+ case SOCKET_URING_OP_GETSOCKOPT:
+ return io_uring_cmd_getsockopt(sock, cmd, issue_flags);
+ case SOCKET_URING_OP_SETSOCKOPT:
+ return io_uring_cmd_setsockopt(sock, cmd, issue_flags);
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+EXPORT_SYMBOL_GPL(io_uring_cmd_sock);
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index a9ea7d29cdd9..34b450c78e2b 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -3,13 +3,10 @@
#include <linux/errno.h>
#include <linux/file.h>
#include <linux/io_uring/cmd.h>
-#include <linux/io_uring/net.h>
#include <linux/security.h>
#include <linux/nospec.h>
-#include <net/sock.h>
#include <uapi/linux/io_uring.h>
-#include <asm/ioctls.h>
#include "io_uring.h"
#include "alloc_cache.h"
@@ -302,83 +299,3 @@ void io_uring_cmd_issue_blocking(struct io_uring_cmd *ioucmd)
io_req_queue_iowq(req);
}
-
-static inline int io_uring_cmd_getsockopt(struct socket *sock,
- struct io_uring_cmd *cmd,
- unsigned int issue_flags)
-{
- const struct io_uring_sqe *sqe = cmd->sqe;
- bool compat = !!(issue_flags & IO_URING_F_COMPAT);
- int optlen, optname, level, err;
- void __user *optval;
-
- level = READ_ONCE(sqe->level);
- if (level != SOL_SOCKET)
- return -EOPNOTSUPP;
-
- optval = u64_to_user_ptr(READ_ONCE(sqe->optval));
- optname = READ_ONCE(sqe->optname);
- optlen = READ_ONCE(sqe->optlen);
-
- err = do_sock_getsockopt(sock, compat, level, optname,
- USER_SOCKPTR(optval),
- KERNEL_SOCKPTR(&optlen));
- if (err)
- return err;
-
- /* On success, return optlen */
- return optlen;
-}
-
-static inline int io_uring_cmd_setsockopt(struct socket *sock,
- struct io_uring_cmd *cmd,
- unsigned int issue_flags)
-{
- const struct io_uring_sqe *sqe = cmd->sqe;
- bool compat = !!(issue_flags & IO_URING_F_COMPAT);
- int optname, optlen, level;
- void __user *optval;
- sockptr_t optval_s;
-
- optval = u64_to_user_ptr(READ_ONCE(sqe->optval));
- optname = READ_ONCE(sqe->optname);
- optlen = READ_ONCE(sqe->optlen);
- level = READ_ONCE(sqe->level);
- optval_s = USER_SOCKPTR(optval);
-
- return do_sock_setsockopt(sock, compat, level, optname, optval_s,
- optlen);
-}
-
-#if defined(CONFIG_NET)
-int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
-{
- struct socket *sock = cmd->file->private_data;
- struct sock *sk = sock->sk;
- struct proto *prot = READ_ONCE(sk->sk_prot);
- int ret, arg = 0;
-
- if (!prot || !prot->ioctl)
- return -EOPNOTSUPP;
-
- switch (cmd->cmd_op) {
- case SOCKET_URING_OP_SIOCINQ:
- ret = prot->ioctl(sk, SIOCINQ, &arg);
- if (ret)
- return ret;
- return arg;
- case SOCKET_URING_OP_SIOCOUTQ:
- ret = prot->ioctl(sk, SIOCOUTQ, &arg);
- if (ret)
- return ret;
- return arg;
- case SOCKET_URING_OP_GETSOCKOPT:
- return io_uring_cmd_getsockopt(sock, cmd, issue_flags);
- case SOCKET_URING_OP_SETSOCKOPT:
- return io_uring_cmd_setsockopt(sock, cmd, issue_flags);
- default:
- return -EOPNOTSUPP;
- }
-}
-EXPORT_SYMBOL_GPL(io_uring_cmd_sock);
-#endif
--
2.48.1
* [PATCH RFC 3/7] net: timestamp: add helper returning skb's tx tstamp
From: Pavel Begunkov @ 2025-04-28 12:52 UTC
To: io-uring; +Cc: asml.silence, Vadim Fedorenko
Add a helper function skb_get_tx_timestamp() that returns the tx timestamp
associated with an skb taken from a socket's error queue.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
include/net/sock.h | 3 +++
net/socket.c | 32 ++++++++++++++++++++++++++++++++
2 files changed, 35 insertions(+)
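For illustration only, a userspace model of the selection order in skb_get_tx_timestamp(): a software stamp wins if SOF_TIMESTAMPING_SOFTWARE is set and one was recorded, otherwise a raw hardware stamp is used if SOF_TIMESTAMPING_RAW_HARDWARE is set. The DEMO_TS_* flags, struct demo_timespec and the demo_* helpers are hypothetical; the skb_is_swtx_tstamp() check, the netdev-provided hardware stamp path and PHC conversion are left out, and times are assumed non-negative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for SOF_TIMESTAMPING_SOFTWARE / _RAW_HARDWARE. */
#define DEMO_TS_SOFTWARE	(1u << 0)
#define DEMO_TS_RAW_HARDWARE	(1u << 1)

struct demo_timespec {
	int64_t tv_sec;
	int64_t tv_nsec;
};

/* Mirrors ktime_to_timespec64_cond(): only convert non-zero times. */
static bool demo_ns_to_timespec_cond(int64_t ns, struct demo_timespec *ts)
{
	if (!ns)
		return false;
	ts->tv_sec = ns / 1000000000;
	ts->tv_nsec = ns % 1000000000;
	return true;
}

/* Prefer a recorded software stamp, else fall back to a raw hardware one. */
static bool demo_pick_tx_timestamp(uint32_t tsflags, int64_t sw_ns,
				   int64_t hw_ns, struct demo_timespec *ts)
{
	if ((tsflags & DEMO_TS_SOFTWARE) && demo_ns_to_timespec_cond(sw_ns, ts))
		return true;
	if (!(tsflags & DEMO_TS_RAW_HARDWARE))
		return false;
	return demo_ns_to_timespec_cond(hw_ns, ts);
}
```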
diff --git a/include/net/sock.h b/include/net/sock.h
index 694f954258d4..37fb15a04799 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2661,6 +2661,9 @@ void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk,
void __sock_recv_wifi_status(struct msghdr *msg, struct sock *sk,
struct sk_buff *skb);
+bool skb_get_tx_timestamp(struct sock *sk, struct sk_buff *skb,
+ struct timespec64 *ts);
+
static inline void
sock_recv_timestamp(struct msghdr *msg, struct sock *sk, struct sk_buff *skb)
{
diff --git a/net/socket.c b/net/socket.c
index 9a0e720f0859..2ae776011ca1 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -843,6 +843,38 @@ static void put_ts_pktinfo(struct msghdr *msg, struct sk_buff *skb,
sizeof(ts_pktinfo), &ts_pktinfo);
}
+bool skb_get_tx_timestamp(struct sock *sk, struct sk_buff *skb,
+ struct timespec64 *ts)
+{
+ u32 tsflags = READ_ONCE(sk->sk_tsflags);
+ bool false_tstamp = false;
+ ktime_t hwtstamp;
+ int if_index = 0;
+
+ if (sock_flag(sk, SOCK_RCVTSTAMP) && skb->tstamp == 0) {
+ __net_timestamp(skb);
+ false_tstamp = true;
+ }
+
+ if ((tsflags & SOF_TIMESTAMPING_SOFTWARE) &&
+ ktime_to_timespec64_cond(skb->tstamp, ts))
+ return true;
+
+ if (!(tsflags & SOF_TIMESTAMPING_RAW_HARDWARE) ||
+ skb_is_swtx_tstamp(skb, false_tstamp))
+ return false;
+
+ if (skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP_NETDEV)
+ hwtstamp = get_timestamp(sk, skb, &if_index);
+ else
+ hwtstamp = skb_hwtstamps(skb)->hwtstamp;
+
+ if (tsflags & SOF_TIMESTAMPING_BIND_PHC)
+ hwtstamp = ptp_convert_timestamp(&hwtstamp,
+ READ_ONCE(sk->sk_bind_phc));
+ return ktime_to_timespec64_cond(hwtstamp, ts);
+}
+
/*
* called from sock_recv_timestamp() if sock_flag(sk, SOCK_RCVTSTAMP)
*/
--
2.48.1
* [PATCH RFC 4/7] io_uring/poll: introduce io_arm_apoll()
From: Pavel Begunkov @ 2025-04-28 12:52 UTC
To: io-uring; +Cc: asml.silence, Vadim Fedorenko
In preparation for allowing commands to poll files, add a helper
that takes the desired poll event mask and arms it for polling. We won't
be able to use io_arm_poll_handler() with IORING_OP_URING_CMD as it
tries to infer the mask from the opcode data, and we can't unify it
across all commands.
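A userspace sketch of how the mask computation splits after this change, using the epoll constants from <sys/epoll.h> as stand-ins for the kernel's __poll_t bits: the opcode-derived part stays in io_arm_poll_handler(), while mask-wise io_arm_apoll() only adds EPOLLONESHOT for non-multishot requests, so a command can pass in its own mask. struct demo_def and the demo_* functions are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/epoll.h>

#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1u << 28)	/* older libc headers may lack it */
#endif

/* Hypothetical subset of struct io_issue_def. */
struct demo_def {
	bool pollin, pollout, poll_exclusive;
};

/* Opcode-derived mask, as io_arm_poll_handler() builds it (0: can't poll). */
static uint32_t demo_opcode_mask(const struct demo_def *def, bool clear_pollin)
{
	uint32_t mask = EPOLLPRI | EPOLLERR | EPOLLET;

	if (!def->pollin && !def->pollout)
		return 0;
	if (def->pollin) {
		mask |= EPOLLIN | EPOLLRDNORM;
		if (clear_pollin)	/* recvmsg from MSG_ERRQUEUE */
			mask &= ~EPOLLIN;
	} else {
		mask |= EPOLLOUT | EPOLLWRNORM;
	}
	if (def->poll_exclusive)
		mask |= EPOLLEXCLUSIVE;
	return mask;
}

/* The mask handling left in io_arm_apoll(): one-shot unless multishot. */
static uint32_t demo_apoll_mask(uint32_t mask, bool multishot)
{
	if (!multishot)
		mask |= EPOLLONESHOT;
	return mask;
}
```

A uring_cmd would skip demo_opcode_mask() entirely and hand its own mask (e.g. POLLERR for tx timestamps) straight to the io_arm_apoll() step.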
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/poll.c | 43 ++++++++++++++++++++++++++-----------------
io_uring/poll.h | 1 +
2 files changed, 27 insertions(+), 17 deletions(-)
diff --git a/io_uring/poll.c b/io_uring/poll.c
index 8eb744eb9f4c..9e6d9b889733 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -669,33 +669,17 @@ static struct async_poll *io_req_alloc_apoll(struct io_kiocb *req,
return apoll;
}
-int io_arm_poll_handler(struct io_kiocb *req, unsigned issue_flags)
+int io_arm_apoll(struct io_kiocb *req, unsigned issue_flags, __poll_t mask)
{
- const struct io_issue_def *def = &io_issue_defs[req->opcode];
struct async_poll *apoll;
struct io_poll_table ipt;
- __poll_t mask = POLLPRI | POLLERR | EPOLLET;
int ret;
- if (!def->pollin && !def->pollout)
- return IO_APOLL_ABORTED;
if (!io_file_can_poll(req))
return IO_APOLL_ABORTED;
if (!(req->flags & REQ_F_APOLL_MULTISHOT))
mask |= EPOLLONESHOT;
- if (def->pollin) {
- mask |= EPOLLIN | EPOLLRDNORM;
-
- /* If reading from MSG_ERRQUEUE using recvmsg, ignore POLLIN */
- if (req->flags & REQ_F_CLEAR_POLLIN)
- mask &= ~EPOLLIN;
- } else {
- mask |= EPOLLOUT | EPOLLWRNORM;
- }
- if (def->poll_exclusive)
- mask |= EPOLLEXCLUSIVE;
-
apoll = io_req_alloc_apoll(req, issue_flags);
if (!apoll)
return IO_APOLL_ABORTED;
@@ -712,6 +696,31 @@ int io_arm_poll_handler(struct io_kiocb *req, unsigned issue_flags)
return IO_APOLL_OK;
}
+int io_arm_poll_handler(struct io_kiocb *req, unsigned issue_flags)
+{
+ const struct io_issue_def *def = &io_issue_defs[req->opcode];
+ __poll_t mask = POLLPRI | POLLERR | EPOLLET;
+
+ if (!def->pollin && !def->pollout)
+ return IO_APOLL_ABORTED;
+ if (!io_file_can_poll(req))
+ return IO_APOLL_ABORTED;
+
+ if (def->pollin) {
+ mask |= EPOLLIN | EPOLLRDNORM;
+
+ /* If reading from MSG_ERRQUEUE using recvmsg, ignore POLLIN */
+ if (req->flags & REQ_F_CLEAR_POLLIN)
+ mask &= ~EPOLLIN;
+ } else {
+ mask |= EPOLLOUT | EPOLLWRNORM;
+ }
+ if (def->poll_exclusive)
+ mask |= EPOLLEXCLUSIVE;
+
+ return io_arm_apoll(req, issue_flags, mask);
+}
+
/*
* Returns true if we found and killed one or more poll requests
*/
diff --git a/io_uring/poll.h b/io_uring/poll.h
index 27e2db2ed4ae..c8438286dfa0 100644
--- a/io_uring/poll.h
+++ b/io_uring/poll.h
@@ -41,6 +41,7 @@ int io_poll_remove(struct io_kiocb *req, unsigned int issue_flags);
struct io_cancel_data;
int io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd,
unsigned issue_flags);
+int io_arm_apoll(struct io_kiocb *req, unsigned issue_flags, __poll_t mask);
int io_arm_poll_handler(struct io_kiocb *req, unsigned issue_flags);
bool io_poll_remove_all(struct io_ring_ctx *ctx, struct io_uring_task *tctx,
bool cancel_all);
--
2.48.1
* [PATCH RFC 5/7] io_uring/cmd: allow multishot polled commands
From: Pavel Begunkov @ 2025-04-28 12:52 UTC
To: io-uring; +Cc: asml.silence, Vadim Fedorenko
Some commands, like timestamping in the next patch, can make use of
multishot polling, i.e. REQ_F_APOLL_MULTISHOT. Add support for that,
which is condensed in a single helper called io_cmd_poll_multishot().
A user who wants to continue with a request in multishot mode must
call the function, and only if it returns 0 is the user free to proceed.
Apart from normal terminal errors, it can also return -EIOCBQUEUED,
in which case the user must forward it to the core io_uring. It's
forbidden to use task work while the request is executing in multishot
mode.
The API is not foolproof, hence it's neither exported to modules nor exposed
in public headers.
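A userspace model of the calling convention described above, under assumptions: DEMO_EIOCBQUEUED is a local stand-in for the kernel-internal EIOCBQUEUED, arming is reduced to a boolean, and demo_cmd_issue() is a hypothetical command handler. The first call arms polling and returns -EIOCBQUEUED (or -ECANCELED if arming fails); later calls return 0 and the handler does its work. The task-work restriction is not modelled.

```c
#include <errno.h>
#include <stdbool.h>

/* EIOCBQUEUED is kernel-internal; this value is only a local stand-in. */
#define DEMO_EIOCBQUEUED 529

struct demo_req {
	bool multishot;		/* models REQ_F_APOLL_MULTISHOT */
	bool arm_succeeds;	/* models io_arm_apoll() returning IO_APOLL_OK */
};

/* Models io_cmd_poll_multishot(): arm once, then let the caller proceed. */
static int demo_poll_multishot(struct demo_req *req)
{
	if (req->multishot)
		return 0;
	req->multishot = true;
	return req->arm_succeeds ? -DEMO_EIOCBQUEUED : -ECANCELED;
}

/* How a hypothetical command handler is expected to use it. */
static int demo_cmd_issue(struct demo_req *req, int *events_posted)
{
	int ret = demo_poll_multishot(req);

	if (ret)
		return ret;	/* -EIOCBQUEUED or a terminal error: hand it back */
	(*events_posted)++;	/* already armed: do the work, e.g. post a CQE */
	return -EAGAIN;		/* mirrors patch 7, which returns -EAGAIN after posting */
}
```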
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/uring_cmd.c | 23 +++++++++++++++++++++++
io_uring/uring_cmd.h | 3 +++
2 files changed, 26 insertions(+)
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 34b450c78e2b..94246ba90e13 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -12,6 +12,7 @@
#include "alloc_cache.h"
#include "rsrc.h"
#include "uring_cmd.h"
+#include "poll.h"
void io_cmd_cache_free(const void *entry)
{
@@ -136,6 +137,9 @@ void __io_uring_cmd_do_in_task(struct io_uring_cmd *ioucmd,
{
struct io_kiocb *req = cmd_to_io_kiocb(ioucmd);
+ if (WARN_ON_ONCE(req->flags & REQ_F_APOLL_MULTISHOT))
+ return;
+
ioucmd->task_work_cb = task_work_cb;
req->io_task_work.func = io_uring_cmd_work;
__io_req_task_work_add(req, flags);
@@ -158,6 +162,9 @@ void io_uring_cmd_done(struct io_uring_cmd *ioucmd, ssize_t ret, u64 res2,
{
struct io_kiocb *req = cmd_to_io_kiocb(ioucmd);
+ if (WARN_ON_ONCE(req->flags & REQ_F_APOLL_MULTISHOT))
+ return;
+
io_uring_cmd_del_cancelable(ioucmd, issue_flags);
if (ret < 0)
@@ -299,3 +306,19 @@ void io_uring_cmd_issue_blocking(struct io_uring_cmd *ioucmd)
io_req_queue_iowq(req);
}
+
+int io_cmd_poll_multishot(struct io_uring_cmd *cmd,
+ unsigned int issue_flags, __poll_t mask)
+{
+ struct io_kiocb *req = cmd_to_io_kiocb(cmd);
+ int ret;
+
+ if (likely(req->flags & REQ_F_APOLL_MULTISHOT))
+ return 0;
+
+ req->flags |= REQ_F_APOLL_MULTISHOT;
+ mask &= ~EPOLLONESHOT;
+
+ ret = io_arm_apoll(req, issue_flags, mask);
+ return ret == IO_APOLL_OK ? -EIOCBQUEUED : -ECANCELED;
+}
diff --git a/io_uring/uring_cmd.h b/io_uring/uring_cmd.h
index b04686b6b5d2..40305a7de038 100644
--- a/io_uring/uring_cmd.h
+++ b/io_uring/uring_cmd.h
@@ -23,3 +23,6 @@ int io_uring_cmd_import_fixed_vec(struct io_uring_cmd *ioucmd,
size_t uvec_segs,
int ddir, struct iov_iter *iter,
unsigned issue_flags);
+
+int io_cmd_poll_multishot(struct io_uring_cmd *cmd,
+ unsigned int issue_flags, __poll_t mask);
--
2.48.1
* [PATCH RFC 6/7] io_uring: add mshot helper for posting CQE32
From: Pavel Begunkov @ 2025-04-28 12:52 UTC
To: io-uring; +Cc: asml.silence, Vadim Fedorenko
Add a helper for posting 32-byte CQEs in multishot mode and add a cmd
helper on top. As it specifically works with requests, the helper ignores
the passed-in cqe->user_data and sets it to the one stored in the
request.
The command helper is only valid with multishot requests.
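A minimal userspace sketch of the two behaviours above, assuming the 16-byte struct io_uring_cqe layout: a 32-byte CQE occupies two consecutive CQE-sized slots, the caller's user_data in the first slot is replaced with the request's, and posting is refused for non-CQE32 rings or non-multishot requests. The demo_* names are hypothetical and ring indexing and locking are omitted.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the 16-byte struct io_uring_cqe layout. */
struct demo_cqe {
	uint64_t user_data;
	int32_t res;
	uint32_t flags;
};

struct demo_req {
	uint64_t user_data;	/* user_data from the request's SQE */
	bool multishot;
};

/* Copy a two-slot (32-byte) CQE into @dst, stamping the request's user_data. */
static bool demo_post_cqe32(const struct demo_req *req, bool ring_is_cqe32,
			    struct demo_cqe dst[2], struct demo_cqe src[2])
{
	if (!ring_is_cqe32 || !req->multishot)
		return false;
	src[0].user_data = req->user_data;	/* the caller's value is ignored */
	memcpy(dst, src, 2 * sizeof(*src));
	return true;
}
```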
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/io_uring.c | 41 +++++++++++++++++++++++++++++++++++++++++
io_uring/io_uring.h | 1 +
io_uring/uring_cmd.c | 11 +++++++++++
io_uring/uring_cmd.h | 4 ++++
4 files changed, 57 insertions(+)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index dc6dac544fe0..ca341f9d7b42 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -807,6 +807,22 @@ bool io_cqe_cache_refill(struct io_ring_ctx *ctx, bool overflow)
return true;
}
+static bool io_fill_cqe_aux32(struct io_ring_ctx *ctx,
+ struct io_uring_cqe src_cqe[2])
+{
+ struct io_uring_cqe *cqe;
+
+ if (WARN_ON_ONCE(!(ctx->flags & IORING_SETUP_CQE32)))
+ return false;
+ if (unlikely(!io_get_cqe(ctx, &cqe)))
+ return false;
+
+ ctx->cq_extra++;
+ memcpy(cqe, src_cqe, 2 * sizeof(*cqe));
+ trace_io_uring_complete(ctx, NULL, cqe);
+ return true;
+}
+
static bool io_fill_cqe_aux(struct io_ring_ctx *ctx, u64 user_data, s32 res,
u32 cflags)
{
@@ -880,6 +896,31 @@ bool io_req_post_cqe(struct io_kiocb *req, s32 res, u32 cflags)
return posted;
}
+/*
+ * A helper for multishot requests posting additional CQEs.
+ * Should only be used from a task_work including IO_URING_F_MULTISHOT.
+ */
+bool io_req_post_cqe32(struct io_kiocb *req, struct io_uring_cqe cqe[2])
+{
+ struct io_ring_ctx *ctx = req->ctx;
+ bool posted;
+
+ lockdep_assert(!io_wq_current_is_worker());
+ lockdep_assert_held(&ctx->uring_lock);
+
+ cqe[0].user_data = req->cqe.user_data;
+ if (!ctx->lockless_cq) {
+ spin_lock(&ctx->completion_lock);
+ posted = io_fill_cqe_aux32(ctx, cqe);
+ spin_unlock(&ctx->completion_lock);
+ } else {
+ posted = io_fill_cqe_aux32(ctx, cqe);
+ }
+
+ ctx->submit_state.cq_flush = true;
+ return posted;
+}
+
static void io_req_complete_post(struct io_kiocb *req, unsigned issue_flags)
{
struct io_ring_ctx *ctx = req->ctx;
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index e4050b2d0821..6a8e3c79805d 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -82,6 +82,7 @@ void io_req_defer_failed(struct io_kiocb *req, s32 res);
bool io_post_aux_cqe(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags);
void io_add_aux_cqe(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags);
bool io_req_post_cqe(struct io_kiocb *req, s32 res, u32 cflags);
+bool io_req_post_cqe32(struct io_kiocb *req, struct io_uring_cqe src_cqe[2]);
void __io_commit_cqring_flush(struct io_ring_ctx *ctx);
struct file *io_file_get_normal(struct io_kiocb *req, int fd);
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 94246ba90e13..6bc84877d205 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -322,3 +322,14 @@ int io_cmd_poll_multishot(struct io_uring_cmd *cmd,
ret = io_arm_apoll(req, issue_flags, mask);
return ret == IO_APOLL_OK ? -EIOCBQUEUED : -ECANCELED;
}
+
+bool io_uring_cmd_post_mshot_cqe32(struct io_uring_cmd *cmd,
+ unsigned int issue_flags,
+ struct io_uring_cqe cqe[2])
+{
+ struct io_kiocb *req = cmd_to_io_kiocb(cmd);
+
+ if (WARN_ON_ONCE(!(issue_flags & IO_URING_F_MULTISHOT)))
+ return false;
+ return io_req_post_cqe32(req, cqe);
+}
diff --git a/io_uring/uring_cmd.h b/io_uring/uring_cmd.h
index 40305a7de038..d504b6b08a56 100644
--- a/io_uring/uring_cmd.h
+++ b/io_uring/uring_cmd.h
@@ -16,6 +16,10 @@ void io_uring_cmd_cleanup(struct io_kiocb *req);
bool io_uring_try_cancel_uring_cmd(struct io_ring_ctx *ctx,
struct io_uring_task *tctx, bool cancel_all);
+bool io_uring_cmd_post_mshot_cqe32(struct io_uring_cmd *cmd,
+ unsigned int issue_flags,
+ struct io_uring_cqe cqe[2]);
+
void io_cmd_cache_free(const void *entry);
int io_uring_cmd_import_fixed_vec(struct io_uring_cmd *ioucmd,
--
2.48.1
* [PATCH RFC 7/7] io_uring/cmd: add tx timestamping cmd support
From: Pavel Begunkov @ 2025-04-28 12:52 UTC
To: io-uring; +Cc: asml.silence, Vadim Fedorenko
Add a new socket command which returns tx timestamps to the user. It
provides an alternative to the existing error queue recvmsg interface.
The command works in a polled multishot mode, which means io_uring will
poll the socket and keep posting timestamps until the request is
cancelled or fails in any other way (e.g. with no space in the CQ).
The command requires CQE32 as it posts the timespec value in the upper
half, and the lower cqe holds the tstamp key/id and type.
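On the userspace side, a completion could be decoded roughly as follows. This is a sketch against this RFC's layout, which may still change: res carries the key, the type sits in the flag bits above IORING_CQE_BUFFER_SHIFT (16), IORING_CQE_F_MORE signals the request stays armed, and the second 16 bytes hold struct io_timespec. The struct and macro names are local mirrors, not the uapi ones.

```c
#include <stdint.h>
#include <string.h>

/* Local mirrors of the uapi layouts used by the command. */
struct demo_cqe {
	uint64_t user_data;
	int32_t res;
	uint32_t flags;
};

struct demo_io_timespec {
	uint64_t tv_sec;
	uint64_t tv_nsec;
};

#define DEMO_CQE_F_MORE		(1u << 1)	/* IORING_CQE_F_MORE */
#define DEMO_CQE_BUFFER_SHIFT	16		/* IORING_CQE_BUFFER_SHIFT */

struct demo_tx_tstamp {
	uint32_t key;	/* serr->ee.ee_data: the timestamp key / id */
	uint32_t type;	/* serr->ee.ee_info, e.g. SCM_TSTAMP_SND */
	int more;	/* request still armed, more CQEs will follow */
	struct demo_io_timespec ts;
};

/* Decode a 32-byte CQE (two 16-byte halves) posted by the command. */
static struct demo_tx_tstamp demo_decode(const struct demo_cqe cqe[2])
{
	struct demo_tx_tstamp out;

	out.key = (uint32_t)cqe[0].res;
	out.type = cqe[0].flags >> DEMO_CQE_BUFFER_SHIFT;
	out.more = !!(cqe[0].flags & DEMO_CQE_F_MORE);
	memcpy(&out.ts, &cqe[1], sizeof(out.ts));
	return out;
}
```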
Suggested-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
include/uapi/linux/io_uring.h | 6 +++
io_uring/cmd_net.c | 94 +++++++++++++++++++++++++++++++++++
2 files changed, 100 insertions(+)
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 130f3bc71a69..3a477dbd2627 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -956,6 +956,11 @@ struct io_uring_recvmsg_out {
__u32 flags;
};
+struct io_timespec {
+ __u64 tv_sec;
+ __u64 tv_nsec;
+};
+
/*
* Argument for IORING_OP_URING_CMD when file is a socket
*/
@@ -964,6 +969,7 @@ enum io_uring_socket_op {
SOCKET_URING_OP_SIOCOUTQ,
SOCKET_URING_OP_GETSOCKOPT,
SOCKET_URING_OP_SETSOCKOPT,
+ SOCKET_URING_OP_TX_TIMESTAMP,
};
/* Zero copy receive refill queue entry */
diff --git a/io_uring/cmd_net.c b/io_uring/cmd_net.c
index e99170c7d41a..9695a9f78d76 100644
--- a/io_uring/cmd_net.c
+++ b/io_uring/cmd_net.c
@@ -1,5 +1,6 @@
#include <asm/ioctls.h>
#include <linux/io_uring/net.h>
+#include <linux/errqueue.h>
#include <net/sock.h>
#include "uring_cmd.h"
@@ -51,6 +52,97 @@ static inline int io_uring_cmd_setsockopt(struct socket *sock,
optlen);
}
+static bool io_skb_has_tx_tstamp(struct sk_buff *skb, struct sock *sk)
+{
+ u32 tsflags = READ_ONCE(sk->sk_tsflags);
+ struct sock_exterr_skb *serr = SKB_EXT_ERR(skb);
+
+ if (serr->ee.ee_errno != ENOMSG ||
+ serr->ee.ee_origin != SO_EE_ORIGIN_TIMESTAMPING ||
+ skb->len)
+ return false;
+
+ /* software time stamp available and wanted */
+ if ((tsflags & SOF_TIMESTAMPING_SOFTWARE) && skb->tstamp)
+ return true;
+ /* hardware time stamps available and wanted */
+ return (tsflags & SOF_TIMESTAMPING_RAW_HARDWARE) &&
+ skb_hwtstamps(skb)->hwtstamp;
+}
+
+static bool io_process_timestamp_skb(struct io_uring_cmd *cmd, struct sock *sk,
+ struct sk_buff *skb, unsigned issue_flags)
+{
+ struct sock_exterr_skb *serr = SKB_EXT_ERR(skb);
+ struct io_uring_cqe cqe[2];
+ struct io_timespec *iots;
+ struct timespec64 ts;
+ u32 tskey;
+
+ BUILD_BUG_ON(sizeof(struct io_uring_cqe) != sizeof(struct io_timespec));
+
+ if (!skb_get_tx_timestamp(sk, skb, &ts))
+ return false;
+
+ tskey = serr->ee.ee_data;
+
+ cqe->user_data = 0;
+ cqe->res = tskey;
+ cqe->flags = IORING_CQE_F_MORE;
+ cqe->flags |= (u32)serr->ee.ee_info << IORING_CQE_BUFFER_SHIFT;
+
+ iots = (struct io_timespec *)&cqe[1];
+ iots->tv_sec = ts.tv_sec;
+ iots->tv_nsec = ts.tv_nsec;
+ return io_uring_cmd_post_mshot_cqe32(cmd, issue_flags, cqe);
+}
+
+static int io_uring_cmd_timestamp(struct socket *sock,
+ struct io_uring_cmd *cmd,
+ unsigned int issue_flags)
+{
+ struct sock *sk = sock->sk;
+ struct sk_buff_head *q = &sk->sk_error_queue;
+ struct sk_buff *skb, *tmp;
+ struct sk_buff_head list;
+ int ret;
+
+ if (!(issue_flags & IO_URING_F_CQE32))
+ return -EINVAL;
+ ret = io_cmd_poll_multishot(cmd, issue_flags, POLLERR);
+ if (unlikely(ret))
+ return ret;
+
+ if (skb_queue_empty_lockless(q))
+ return -EAGAIN;
+ __skb_queue_head_init(&list);
+
+ scoped_guard(spinlock_irq, &q->lock) {
+ skb_queue_walk_safe(q, skb, tmp) {
+ if (!io_skb_has_tx_tstamp(skb, sk))
+ continue;
+ __skb_unlink(skb, q);
+ __skb_queue_tail(&list, skb);
+ }
+ }
+
+ while (1) {
+ skb = skb_peek(&list);
+ if (!skb)
+ break;
+ if (!io_process_timestamp_skb(cmd, sk, skb, issue_flags))
+ break;
+ __skb_dequeue(&list);
+ consume_skb(skb);
+ }
+
+ if (unlikely(!skb_queue_empty(&list))) {
+ scoped_guard(spinlock_irqsave, &q->lock)
+ skb_queue_splice(&list, q);
+ }
+ return -EAGAIN;
+}
+
int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
{
struct socket *sock = cmd->file->private_data;
@@ -76,6 +168,8 @@ int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
return io_uring_cmd_getsockopt(sock, cmd, issue_flags);
case SOCKET_URING_OP_SETSOCKOPT:
return io_uring_cmd_setsockopt(sock, cmd, issue_flags);
+ case SOCKET_URING_OP_TX_TIMESTAMP:
+ return io_uring_cmd_timestamp(sock, cmd, issue_flags);
default:
return -EOPNOTSUPP;
}
--
2.48.1
* Re: [PATCH RFC 0/7] tx timestamp io_uring commands
From: Pavel Begunkov @ 2025-04-28 13:08 UTC
To: io-uring; +Cc: Vadim Fedorenko
On 4/28/25 13:52, Pavel Begunkov wrote:
> Vadim expressed interest in having an io_uring API for tx timestamping,
> and the series implements a rough prototype to support that. It
> introduces a new socket command, which works in a multishot polling
> mode, i.e. it polls the socket and posts CQEs when a timestamp arrives.
> It reuses most of the bits on the networking side by grabbing timestamp
> skbs from the socket's error queue.
A branch for convenience:
https://github.com/isilence/linux.git tx-tstamp
> The ABI and net bits like skb parsing will need to be discussed and
> ironed out before posting a non-RFC version.
FWIW, I'm not spamming the net list just yet, not before figuring out
the io_uring bits and other basic requirements.
Also, Jens, please consider taking the first two patches if you're
happy with them.
--
Pavel Begunkov
* Re: [PATCH RFC 0/7] tx timestamp io_uring commands
From: Jens Axboe @ 2025-04-28 17:51 UTC
To: Pavel Begunkov, io-uring; +Cc: Vadim Fedorenko
On 4/28/25 6:52 AM, Pavel Begunkov wrote:
> Vadim expressed interest in having an io_uring API for tx timestamping,
> and the series implements a rough prototype to support that. It
> introduces a new socket command, which works in a multishot polling
> mode, i.e. it polls the socket and posts CQEs when a timestamp arrives.
> It reuses most of the bits on the networking side by grabbing timestamp
> skbs from the socket's error queue.
>
> The ABI and net bits like skb parsing will need to be discussed and
> ironed out before posting a non-RFC version.
The implementation looks nice, clean and straightforward; I don't see why
this can't be a non-RFC posting. At least in my opinion!
I'll queue up the first 2 patches.
--
Jens Axboe
* Re: (subset) [PATCH RFC 0/7] tx timestamp io_uring commands
From: Jens Axboe @ 2025-04-28 17:51 UTC
To: io-uring, Pavel Begunkov; +Cc: Vadim Fedorenko
On Mon, 28 Apr 2025 13:52:31 +0100, Pavel Begunkov wrote:
> Vadim expressed interest in having an io_uring API for tx timestamping,
> and the series implements a rough prototype to support that. It
> introduces a new socket command, which works in a multishot polling
> mode, i.e. it polls the socket and posts CQEs when a timestamp arrives.
> It reuses most of the bits on the networking side by grabbing timestamp
> skbs from the socket's error queue.
>
> [...]
Applied, thanks!
[1/7] io_uring: delete misleading comment in io_fill_cqe_aux()
commit: 27d2fed790ce6407e321e89aac3c8c0e28986fff
[2/7] io_uring/cmd: move net cmd into a separate file
commit: 91db6edc573bf238c277602b2ea4b4f4688fdedc
Best regards,
--
Jens Axboe