public inbox for [email protected]
 help / color / mirror / Atom feed
* [PATCH io_uring-next/net-next v2 0/4] implement io_uring notification (ubuf_info) stacking
@ 2024-04-19 11:08 Pavel Begunkov
  2024-04-19 11:08 ` [PATCH io_uring-next/net-next v2 1/4] net: extend ubuf_info callback to ops structure Pavel Begunkov
                   ` (6 more replies)
  0 siblings, 7 replies; 10+ messages in thread
From: Pavel Begunkov @ 2024-04-19 11:08 UTC (permalink / raw)
  To: io-uring, netdev
  Cc: Jens Axboe, asml.silence, David S . Miller, Jakub Kicinski,
	David Ahern, Eric Dumazet, Willem de Bruijn, Jason Wang, Wei Liu,
	Paul Durrant, xen-devel, Michael S . Tsirkin, virtualization, kvm

Please, don't take directly, conflicts with io_uring.

To have per-request buffer notifications, each zerocopy io_uring send
request allocates a new ubuf_info. However, as an skb can carry only
one uarg, it may force the stack to create many small skbs, hurting
performance in many ways.

The patchset implements notification (i.e. io_uring's ubuf_info
extension) stacking: it links ubuf_infos into a list, allowing an skb
to carry multiple of them.

liburing/examples/send-zerocopy shows up to a 6x performance
improvement for TCP at 4KB per send, levelling it with MSG_ZEROCOPY.
Without the patchset, much larger sends are needed to utilise the full
potential.

bytes  | before | after (Kqps)
1200   | 195    | 1023
4000   | 193    | 1386
8000   | 154    | 1058
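
For reference, below is a minimal liburing zerocopy send sketch (not
part of the series; error handling omitted). Each request still gets
its own completion and notification CQE; the stacking only changes how
the kernel packs the buffers into skbs underneath:

#include <liburing.h>

/* Sketch only: queue `sends` back-to-back 4KB zerocopy sends on a
 * connected TCP socket and wait for their notifications. */
static void send_zc_burst(int sockfd, const char *buf, unsigned sends)
{
        struct io_uring ring;
        struct io_uring_cqe *cqe;
        unsigned done = 0;

        io_uring_queue_init(2 * sends, &ring, 0);
        for (unsigned i = 0; i < sends; i++) {
                struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);

                io_uring_prep_send_zc(sqe, sockfd, buf, 4096, 0, 0);
                sqe->user_data = i;
        }
        io_uring_submit(&ring);

        /* each send posts a completion CQE with IORING_CQE_F_MORE set,
         * followed later by a notification CQE (IORING_CQE_F_NOTIF)
         * once the kernel no longer needs the buffer */
        while (done < sends && !io_uring_wait_cqe(&ring, &cqe)) {
                /* on NOTIF, the buffer for cqe->user_data is reusable */
                if (cqe->flags & IORING_CQE_F_NOTIF)
                        done++;
                io_uring_cqe_seen(&ring, cqe);
        }
        io_uring_queue_exit(&ring);
}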

The patches are on top of net-next + io_uring-next:

https://github.com/isilence/linux.git iou-sendzc/notif-stacking-v2

First two patches based on net-next:

https://github.com/isilence/linux.git iou-sendzc/notif-stacking-v2-netonly

v2: convert xen-netback to ubuf_info_ops (patch 1)
    drop two separately merged io_uring patches

Pavel Begunkov (4):
  net: extend ubuf_info callback to ops structure
  net: add callback for setting a ubuf_info to skb
  io_uring/notif: simplify io_notif_flush()
  io_uring/notif: implement notification stacking

 drivers/net/tap.c                   |  2 +-
 drivers/net/tun.c                   |  2 +-
 drivers/net/xen-netback/common.h    |  5 +-
 drivers/net/xen-netback/interface.c |  2 +-
 drivers/net/xen-netback/netback.c   | 11 ++--
 drivers/vhost/net.c                 |  8 ++-
 include/linux/skbuff.h              | 21 +++++---
 io_uring/notif.c                    | 83 +++++++++++++++++++++++++----
 io_uring/notif.h                    | 12 ++---
 net/core/skbuff.c                   | 36 ++++++++-----
 10 files changed, 134 insertions(+), 48 deletions(-)

-- 
2.44.0



* [PATCH io_uring-next/net-next v2 1/4] net: extend ubuf_info callback to ops structure
  2024-04-19 11:08 [PATCH io_uring-next/net-next v2 0/4] implement io_uring notification (ubuf_info) stacking Pavel Begunkov
@ 2024-04-19 11:08 ` Pavel Begunkov
  2024-04-22 16:59   ` Willem de Bruijn
  2024-04-19 11:08 ` [PATCH io_uring-next/net-next v2 2/4] net: add callback for setting a ubuf_info to skb Pavel Begunkov
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 10+ messages in thread
From: Pavel Begunkov @ 2024-04-19 11:08 UTC (permalink / raw)
  To: io-uring, netdev
  Cc: Jens Axboe, asml.silence, David S . Miller, Jakub Kicinski,
	David Ahern, Eric Dumazet, Willem de Bruijn, Jason Wang, Wei Liu,
	Paul Durrant, xen-devel, Michael S . Tsirkin, virtualization, kvm

We'll need to associate additional callbacks with ubuf_info; introduce
a structure holding the ubuf_info callbacks. Apart from the smarter
io_uring notification management introduced in the following patches,
it can be used to generalise msg_zerocopy_put_abort() and also to store
->sg_from_iter, which is currently passed in struct msghdr.
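
As an illustration (hypothetical driver "foo", not part of this
series), a zerocopy provider now declares a single const ops table and
points every ubuf_info at it, instead of storing a function pointer in
each ubuf_info:

static void foo_zerocopy_complete(struct sk_buff *skb,
                                  struct ubuf_info *uarg,
                                  bool zerocopy_success)
{
        /* release driver-private state tied to uarg here */
}

static const struct ubuf_info_ops foo_ubuf_ops = {
        .complete = foo_zerocopy_complete,
};

static void foo_init_uarg(struct ubuf_info *uarg)
{
        /* was: uarg->callback = foo_zerocopy_complete; */
        uarg->ops = &foo_ubuf_ops;
        uarg->flags = SKBFL_ZEROCOPY_FRAG;
        refcount_set(&uarg->refcnt, 1);
}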

Reviewed-by: Jens Axboe <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
---
 drivers/net/tap.c                   |  2 +-
 drivers/net/tun.c                   |  2 +-
 drivers/net/xen-netback/common.h    |  5 ++---
 drivers/net/xen-netback/interface.c |  2 +-
 drivers/net/xen-netback/netback.c   | 11 ++++++++---
 drivers/vhost/net.c                 |  8 ++++++--
 include/linux/skbuff.h              | 19 +++++++++++--------
 io_uring/notif.c                    |  8 ++++++--
 net/core/skbuff.c                   | 16 ++++++++++------
 9 files changed, 46 insertions(+), 27 deletions(-)

diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index 9f0495e8df4d..bfdd3875fe86 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -754,7 +754,7 @@ static ssize_t tap_get_user(struct tap_queue *q, void *msg_control,
 		skb_zcopy_init(skb, msg_control);
 	} else if (msg_control) {
 		struct ubuf_info *uarg = msg_control;
-		uarg->callback(NULL, uarg, false);
+		uarg->ops->complete(NULL, uarg, false);
 	}
 
 	dev_queue_xmit(skb);
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 0b3f21cba552..b7401d990680 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1906,7 +1906,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 		skb_zcopy_init(skb, msg_control);
 	} else if (msg_control) {
 		struct ubuf_info *uarg = msg_control;
-		uarg->callback(NULL, uarg, false);
+		uarg->ops->complete(NULL, uarg, false);
 	}
 
 	skb_reset_network_header(skb);
diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 1fcbd83f7ff2..17421da139f2 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -390,9 +390,8 @@ bool xenvif_rx_queue_tail(struct xenvif_queue *queue, struct sk_buff *skb);
 
 void xenvif_carrier_on(struct xenvif *vif);
 
-/* Callback from stack when TX packet can be released */
-void xenvif_zerocopy_callback(struct sk_buff *skb, struct ubuf_info *ubuf,
-			      bool zerocopy_success);
+/* Callbacks from stack when TX packet can be released */
+extern const struct ubuf_info_ops xenvif_ubuf_ops;
 
 static inline pending_ring_idx_t nr_pending_reqs(struct xenvif_queue *queue)
 {
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index 7cff90aa8d24..65db5f14465f 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -593,7 +593,7 @@ int xenvif_init_queue(struct xenvif_queue *queue)
 
 	for (i = 0; i < MAX_PENDING_REQS; i++) {
 		queue->pending_tx_info[i].callback_struct = (struct ubuf_info_msgzc)
-			{ { .callback = xenvif_zerocopy_callback },
+			{ { .ops = &xenvif_ubuf_ops },
 			  { { .ctx = NULL,
 			      .desc = i } } };
 		queue->grant_tx_handle[i] = NETBACK_INVALID_HANDLE;
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 48254fc07d64..5836995d6774 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -1157,7 +1157,7 @@ static int xenvif_handle_frag_list(struct xenvif_queue *queue, struct sk_buff *s
 	uarg = skb_shinfo(skb)->destructor_arg;
 	/* increase inflight counter to offset decrement in callback */
 	atomic_inc(&queue->inflight_packets);
-	uarg->callback(NULL, uarg, true);
+	uarg->ops->complete(NULL, uarg, true);
 	skb_shinfo(skb)->destructor_arg = NULL;
 
 	/* Fill the skb with the new (local) frags. */
@@ -1279,8 +1279,9 @@ static int xenvif_tx_submit(struct xenvif_queue *queue)
 	return work_done;
 }
 
-void xenvif_zerocopy_callback(struct sk_buff *skb, struct ubuf_info *ubuf_base,
-			      bool zerocopy_success)
+static void xenvif_zerocopy_callback(struct sk_buff *skb,
+				     struct ubuf_info *ubuf_base,
+				     bool zerocopy_success)
 {
 	unsigned long flags;
 	pending_ring_idx_t index;
@@ -1313,6 +1314,10 @@ void xenvif_zerocopy_callback(struct sk_buff *skb, struct ubuf_info *ubuf_base,
 	xenvif_skb_zerocopy_complete(queue);
 }
 
+const struct ubuf_info_ops xenvif_ubuf_ops = {
+	.complete = xenvif_zerocopy_callback,
+};
+
 static inline void xenvif_tx_dealloc_action(struct xenvif_queue *queue)
 {
 	struct gnttab_unmap_grant_ref *gop;
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index c64ded183f8d..f16279351db5 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -380,7 +380,7 @@ static void vhost_zerocopy_signal_used(struct vhost_net *net,
 	}
 }
 
-static void vhost_zerocopy_callback(struct sk_buff *skb,
+static void vhost_zerocopy_complete(struct sk_buff *skb,
 				    struct ubuf_info *ubuf_base, bool success)
 {
 	struct ubuf_info_msgzc *ubuf = uarg_to_msgzc(ubuf_base);
@@ -408,6 +408,10 @@ static void vhost_zerocopy_callback(struct sk_buff *skb,
 	rcu_read_unlock_bh();
 }
 
+static const struct ubuf_info_ops vhost_ubuf_ops = {
+	.complete = vhost_zerocopy_complete,
+};
+
 static inline unsigned long busy_clock(void)
 {
 	return local_clock() >> 10;
@@ -879,7 +883,7 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
 			vq->heads[nvq->upend_idx].len = VHOST_DMA_IN_PROGRESS;
 			ubuf->ctx = nvq->ubufs;
 			ubuf->desc = nvq->upend_idx;
-			ubuf->ubuf.callback = vhost_zerocopy_callback;
+			ubuf->ubuf.ops = &vhost_ubuf_ops;
 			ubuf->ubuf.flags = SKBFL_ZEROCOPY_FRAG;
 			refcount_set(&ubuf->ubuf.refcnt, 1);
 			msg.msg_control = &ctl;
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 4072a7ee3859..a44954264746 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -527,6 +527,11 @@ enum {
 #define SKBFL_ALL_ZEROCOPY	(SKBFL_ZEROCOPY_FRAG | SKBFL_PURE_ZEROCOPY | \
 				 SKBFL_DONT_ORPHAN | SKBFL_MANAGED_FRAG_REFS)
 
+struct ubuf_info_ops {
+	void (*complete)(struct sk_buff *, struct ubuf_info *,
+			 bool zerocopy_success);
+};
+
 /*
  * The callback notifies userspace to release buffers when skb DMA is done in
  * lower device, the skb last reference should be 0 when calling this.
@@ -536,8 +541,7 @@ enum {
  * The desc field is used to track userspace buffer index.
  */
 struct ubuf_info {
-	void (*callback)(struct sk_buff *, struct ubuf_info *,
-			 bool zerocopy_success);
+	const struct ubuf_info_ops *ops;
 	refcount_t refcnt;
 	u8 flags;
 };
@@ -1671,14 +1675,13 @@ static inline void skb_set_end_offset(struct sk_buff *skb, unsigned int offset)
 }
 #endif
 
+extern const struct ubuf_info_ops msg_zerocopy_ubuf_ops;
+
 struct ubuf_info *msg_zerocopy_realloc(struct sock *sk, size_t size,
 				       struct ubuf_info *uarg);
 
 void msg_zerocopy_put_abort(struct ubuf_info *uarg, bool have_uref);
 
-void msg_zerocopy_callback(struct sk_buff *skb, struct ubuf_info *uarg,
-			   bool success);
-
 int __zerocopy_sg_from_iter(struct msghdr *msg, struct sock *sk,
 			    struct sk_buff *skb, struct iov_iter *from,
 			    size_t length);
@@ -1766,13 +1769,13 @@ static inline void *skb_zcopy_get_nouarg(struct sk_buff *skb)
 static inline void net_zcopy_put(struct ubuf_info *uarg)
 {
 	if (uarg)
-		uarg->callback(NULL, uarg, true);
+		uarg->ops->complete(NULL, uarg, true);
 }
 
 static inline void net_zcopy_put_abort(struct ubuf_info *uarg, bool have_uref)
 {
 	if (uarg) {
-		if (uarg->callback == msg_zerocopy_callback)
+		if (uarg->ops == &msg_zerocopy_ubuf_ops)
 			msg_zerocopy_put_abort(uarg, have_uref);
 		else if (have_uref)
 			net_zcopy_put(uarg);
@@ -1786,7 +1789,7 @@ static inline void skb_zcopy_clear(struct sk_buff *skb, bool zerocopy_success)
 
 	if (uarg) {
 		if (!skb_zcopy_is_nouarg(skb))
-			uarg->callback(skb, uarg, zerocopy_success);
+			uarg->ops->complete(skb, uarg, zerocopy_success);
 
 		skb_shinfo(skb)->flags &= ~SKBFL_ALL_ZEROCOPY;
 	}
diff --git a/io_uring/notif.c b/io_uring/notif.c
index 3485437b207d..53532d78a947 100644
--- a/io_uring/notif.c
+++ b/io_uring/notif.c
@@ -23,7 +23,7 @@ void io_notif_tw_complete(struct io_kiocb *notif, struct io_tw_state *ts)
 	io_req_task_complete(notif, ts);
 }
 
-static void io_tx_ubuf_callback(struct sk_buff *skb, struct ubuf_info *uarg,
+static void io_tx_ubuf_complete(struct sk_buff *skb, struct ubuf_info *uarg,
 				bool success)
 {
 	struct io_notif_data *nd = container_of(uarg, struct io_notif_data, uarg);
@@ -43,6 +43,10 @@ static void io_tx_ubuf_callback(struct sk_buff *skb, struct ubuf_info *uarg,
 	__io_req_task_work_add(notif, IOU_F_TWQ_LAZY_WAKE);
 }
 
+static const struct ubuf_info_ops io_ubuf_ops = {
+	.complete = io_tx_ubuf_complete,
+};
+
 struct io_kiocb *io_alloc_notif(struct io_ring_ctx *ctx)
 	__must_hold(&ctx->uring_lock)
 {
@@ -62,7 +66,7 @@ struct io_kiocb *io_alloc_notif(struct io_ring_ctx *ctx)
 	nd->zc_report = false;
 	nd->account_pages = 0;
 	nd->uarg.flags = IO_NOTIF_UBUF_FLAGS;
-	nd->uarg.callback = io_tx_ubuf_callback;
+	nd->uarg.ops = &io_ubuf_ops;
 	refcount_set(&nd->uarg.refcnt, 1);
 	return notif;
 }
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 37c858dc11a6..0f4cc759824b 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1652,7 +1652,7 @@ static struct ubuf_info *msg_zerocopy_alloc(struct sock *sk, size_t size)
 		return NULL;
 	}
 
-	uarg->ubuf.callback = msg_zerocopy_callback;
+	uarg->ubuf.ops = &msg_zerocopy_ubuf_ops;
 	uarg->id = ((u32)atomic_inc_return(&sk->sk_zckey)) - 1;
 	uarg->len = 1;
 	uarg->bytelen = size;
@@ -1678,7 +1678,7 @@ struct ubuf_info *msg_zerocopy_realloc(struct sock *sk, size_t size,
 		u32 bytelen, next;
 
 		/* there might be non MSG_ZEROCOPY users */
-		if (uarg->callback != msg_zerocopy_callback)
+		if (uarg->ops != &msg_zerocopy_ubuf_ops)
 			return NULL;
 
 		/* realloc only when socket is locked (TCP, UDP cork),
@@ -1789,8 +1789,8 @@ static void __msg_zerocopy_callback(struct ubuf_info_msgzc *uarg)
 	sock_put(sk);
 }
 
-void msg_zerocopy_callback(struct sk_buff *skb, struct ubuf_info *uarg,
-			   bool success)
+static void msg_zerocopy_complete(struct sk_buff *skb, struct ubuf_info *uarg,
+				  bool success)
 {
 	struct ubuf_info_msgzc *uarg_zc = uarg_to_msgzc(uarg);
 
@@ -1799,7 +1799,6 @@ void msg_zerocopy_callback(struct sk_buff *skb, struct ubuf_info *uarg,
 	if (refcount_dec_and_test(&uarg->refcnt))
 		__msg_zerocopy_callback(uarg_zc);
 }
-EXPORT_SYMBOL_GPL(msg_zerocopy_callback);
 
 void msg_zerocopy_put_abort(struct ubuf_info *uarg, bool have_uref)
 {
@@ -1809,10 +1808,15 @@ void msg_zerocopy_put_abort(struct ubuf_info *uarg, bool have_uref)
 	uarg_to_msgzc(uarg)->len--;
 
 	if (have_uref)
-		msg_zerocopy_callback(NULL, uarg, true);
+		msg_zerocopy_complete(NULL, uarg, true);
 }
 EXPORT_SYMBOL_GPL(msg_zerocopy_put_abort);
 
+const struct ubuf_info_ops msg_zerocopy_ubuf_ops = {
+	.complete = msg_zerocopy_complete,
+};
+EXPORT_SYMBOL_GPL(msg_zerocopy_ubuf_ops);
+
 int skb_zerocopy_iter_stream(struct sock *sk, struct sk_buff *skb,
 			     struct msghdr *msg, int len,
 			     struct ubuf_info *uarg)
-- 
2.44.0



* [PATCH io_uring-next/net-next v2 2/4] net: add callback for setting a ubuf_info to skb
  2024-04-19 11:08 [PATCH io_uring-next/net-next v2 0/4] implement io_uring notification (ubuf_info) stacking Pavel Begunkov
  2024-04-19 11:08 ` [PATCH io_uring-next/net-next v2 1/4] net: extend ubuf_info callback to ops structure Pavel Begunkov
@ 2024-04-19 11:08 ` Pavel Begunkov
  2024-04-22 17:01   ` Willem de Bruijn
  2024-04-19 11:08 ` [PATCH io_uring-next/net-next v2 3/4] io_uring/notif: simplify io_notif_flush() Pavel Begunkov
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 10+ messages in thread
From: Pavel Begunkov @ 2024-04-19 11:08 UTC (permalink / raw)
  To: io-uring, netdev
  Cc: Jens Axboe, asml.silence, David S . Miller, Jakub Kicinski,
	David Ahern, Eric Dumazet, Willem de Bruijn, Jason Wang, Wei Liu,
	Paul Durrant, xen-devel, Michael S . Tsirkin, virtualization, kvm

At the moment an skb can only have one ubuf_info associated with it,
which might be a performance problem for zerocopy sends in cases like
TCP via io_uring. Add a callback for assigning a ubuf_info to an skb;
this way we can later implement smarter assignment, such as linking
ubuf_infos together.

Note that it's an optional callback, which should be compatible with
skb_zcopy_set(), because the net stack might potentially decide to
clone an skb and take another reference to the ubuf_info whenever it
wishes. Also, a correct implementation should always be able to bind
to an skb without a prior ubuf_info, otherwise we could end up in a
situation where the send would not be able to progress.
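
As an illustration, a minimal conforming callback (hypothetical
provider "foo"; the real io_uring implementation comes in patch 4)
honours the rule that binding to an skb with no prior ubuf_info always
succeeds, while anything it can't handle falls back to -EEXIST so the
caller grabs a fresh skb:

static int foo_link_skb(struct sk_buff *skb, struct ubuf_info *uarg)
{
        struct ubuf_info *prev = skb_zcopy(skb);

        if (!prev) {
                /* same effect as skb_zcopy_set(): take a ref and attach */
                net_zcopy_get(uarg);
                skb_zcopy_init(skb, uarg);
                return 0;
        }
        if (prev == uarg)
                return 0;       /* already bound to this skb */
        /* can't (or don't want to) stack more: request a new skb */
        return -EEXIST;
}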

Reviewed-by: Jens Axboe <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
---
 include/linux/skbuff.h |  2 ++
 net/core/skbuff.c      | 20 ++++++++++++++------
 2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index a44954264746..f76825e5b92a 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -530,6 +530,8 @@ enum {
 struct ubuf_info_ops {
 	void (*complete)(struct sk_buff *, struct ubuf_info *,
 			 bool zerocopy_success);
+	/* has to be compatible with skb_zcopy_set() */
+	int (*link_skb)(struct sk_buff *skb, struct ubuf_info *uarg);
 };
 
 /*
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 0f4cc759824b..0c8b82750000 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1824,11 +1824,18 @@ int skb_zerocopy_iter_stream(struct sock *sk, struct sk_buff *skb,
 	struct ubuf_info *orig_uarg = skb_zcopy(skb);
 	int err, orig_len = skb->len;
 
-	/* An skb can only point to one uarg. This edge case happens when
-	 * TCP appends to an skb, but zerocopy_realloc triggered a new alloc.
-	 */
-	if (orig_uarg && uarg != orig_uarg)
-		return -EEXIST;
+	if (uarg->ops->link_skb) {
+		err = uarg->ops->link_skb(skb, uarg);
+		if (err)
+			return err;
+	} else {
+		/* An skb can only point to one uarg. This edge case happens
+		 * when TCP appends to an skb, but zerocopy_realloc triggered
+		 * a new alloc.
+		 */
+		if (orig_uarg && uarg != orig_uarg)
+			return -EEXIST;
+	}
 
 	err = __zerocopy_sg_from_iter(msg, sk, skb, &msg->msg_iter, len);
 	if (err == -EFAULT || (err == -EMSGSIZE && skb->len == orig_len)) {
@@ -1842,7 +1849,8 @@ int skb_zerocopy_iter_stream(struct sock *sk, struct sk_buff *skb,
 		return err;
 	}
 
-	skb_zcopy_set(skb, uarg, NULL);
+	if (!uarg->ops->link_skb)
+		skb_zcopy_set(skb, uarg, NULL);
 	return skb->len - orig_len;
 }
 EXPORT_SYMBOL_GPL(skb_zerocopy_iter_stream);
-- 
2.44.0



* [PATCH io_uring-next/net-next v2 3/4] io_uring/notif: simplify io_notif_flush()
  2024-04-19 11:08 [PATCH io_uring-next/net-next v2 0/4] implement io_uring notification (ubuf_info) stacking Pavel Begunkov
  2024-04-19 11:08 ` [PATCH io_uring-next/net-next v2 1/4] net: extend ubuf_info callback to ops structure Pavel Begunkov
  2024-04-19 11:08 ` [PATCH io_uring-next/net-next v2 2/4] net: add callback for setting a ubuf_info to skb Pavel Begunkov
@ 2024-04-19 11:08 ` Pavel Begunkov
  2024-04-19 11:08 ` [PATCH io_uring-next/net-next v2 4/4] io_uring/notif: implement notification stacking Pavel Begunkov
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Pavel Begunkov @ 2024-04-19 11:08 UTC (permalink / raw)
  To: io-uring, netdev
  Cc: Jens Axboe, asml.silence, David S . Miller, Jakub Kicinski,
	David Ahern, Eric Dumazet, Willem de Bruijn, Jason Wang, Wei Liu,
	Paul Durrant, xen-devel, Michael S . Tsirkin, virtualization, kvm

io_notif_flush() partially duplicates io_tx_ubuf_complete(), so instead
of duplicating it, make the flush path call io_tx_ubuf_complete().

Reviewed-by: Jens Axboe <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
---
 io_uring/notif.c | 6 +++---
 io_uring/notif.h | 9 +++------
 2 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/io_uring/notif.c b/io_uring/notif.c
index 53532d78a947..26680176335f 100644
--- a/io_uring/notif.c
+++ b/io_uring/notif.c
@@ -9,7 +9,7 @@
 #include "notif.h"
 #include "rsrc.h"
 
-void io_notif_tw_complete(struct io_kiocb *notif, struct io_tw_state *ts)
+static void io_notif_tw_complete(struct io_kiocb *notif, struct io_tw_state *ts)
 {
 	struct io_notif_data *nd = io_notif_to_data(notif);
 
@@ -23,8 +23,8 @@ void io_notif_tw_complete(struct io_kiocb *notif, struct io_tw_state *ts)
 	io_req_task_complete(notif, ts);
 }
 
-static void io_tx_ubuf_complete(struct sk_buff *skb, struct ubuf_info *uarg,
-				bool success)
+void io_tx_ubuf_complete(struct sk_buff *skb, struct ubuf_info *uarg,
+			 bool success)
 {
 	struct io_notif_data *nd = container_of(uarg, struct io_notif_data, uarg);
 	struct io_kiocb *notif = cmd_to_io_kiocb(nd);
diff --git a/io_uring/notif.h b/io_uring/notif.h
index 2e25a2fc77d1..2cf9ff6abd7a 100644
--- a/io_uring/notif.h
+++ b/io_uring/notif.h
@@ -21,7 +21,8 @@ struct io_notif_data {
 };
 
 struct io_kiocb *io_alloc_notif(struct io_ring_ctx *ctx);
-void io_notif_tw_complete(struct io_kiocb *notif, struct io_tw_state *ts);
+void io_tx_ubuf_complete(struct sk_buff *skb, struct ubuf_info *uarg,
+			 bool success);
 
 static inline struct io_notif_data *io_notif_to_data(struct io_kiocb *notif)
 {
@@ -33,11 +34,7 @@ static inline void io_notif_flush(struct io_kiocb *notif)
 {
 	struct io_notif_data *nd = io_notif_to_data(notif);
 
-	/* drop slot's master ref */
-	if (refcount_dec_and_test(&nd->uarg.refcnt)) {
-		notif->io_task_work.func = io_notif_tw_complete;
-		__io_req_task_work_add(notif, IOU_F_TWQ_LAZY_WAKE);
-	}
+	io_tx_ubuf_complete(NULL, &nd->uarg, true);
 }
 
 static inline int io_notif_account_mem(struct io_kiocb *notif, unsigned len)
-- 
2.44.0



* [PATCH io_uring-next/net-next v2 4/4] io_uring/notif: implement notification stacking
  2024-04-19 11:08 [PATCH io_uring-next/net-next v2 0/4] implement io_uring notification (ubuf_info) stacking Pavel Begunkov
                   ` (2 preceding siblings ...)
  2024-04-19 11:08 ` [PATCH io_uring-next/net-next v2 3/4] io_uring/notif: simplify io_notif_flush() Pavel Begunkov
@ 2024-04-19 11:08 ` Pavel Begunkov
  2024-04-19 11:21 ` [PATCH io_uring-next/net-next v2 0/4] implement io_uring notification (ubuf_info) stacking Pavel Begunkov
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Pavel Begunkov @ 2024-04-19 11:08 UTC (permalink / raw)
  To: io-uring, netdev
  Cc: Jens Axboe, asml.silence, David S . Miller, Jakub Kicinski,
	David Ahern, Eric Dumazet, Willem de Bruijn, Jason Wang, Wei Liu,
	Paul Durrant, xen-devel, Michael S . Tsirkin, virtualization, kvm

The network stack allows only one ubuf_info per skb, and unlike
MSG_ZEROCOPY, each io_uring zerocopy send will carry a separate
ubuf_info. That means that send requests can't reuse a previously
allocated skb and need to get one or more new ones. That's fine for
large sends, but otherwise it would spam the stack with lots of skbs
carrying just a little data each.

To help with that, implement linking notifications (i.e. io_uring
wrappers around ubuf_info) into a list. Each is refcounted by skbs and
the stack as usual; additionally, all non-head entries keep a
reference to the head, which they put down when their refcount hits
zero. When the head has no more users, it'll efficiently put all
notifications in a batch.
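
To make the reference flow concrete, here is a tiny illustrative
userspace model of the scheme (not kernel code, all names made up):
non-head entries forward their final put to the head, and the head
then completes the whole chain in one go.

#include <stdio.h>

struct notif {
        int refs;                       /* models uarg.refcnt */
        struct notif *head;
        struct notif *next;
        int id;
};

static void complete_chain(struct notif *head)
{
        for (struct notif *n = head; n; n = n->next)
                printf("complete notification %d\n", n->id);
}

static void notif_put(struct notif *n)
{
        if (--n->refs)
                return;
        if (n->head != n) {             /* forward the final put to the head */
                notif_put(n->head);
                return;
        }
        complete_chain(n);              /* head drained: flush the whole chain */
}

static void link_after(struct notif *prev, struct notif *n)
{
        n->head = prev->head;
        n->next = prev->next;
        prev->next = n;
        n->head->refs++;                /* non-head entries pin the head */
}

int main(void)
{
        struct notif a = { .refs = 1, .head = &a, .id = 0 };
        struct notif b = { .refs = 1, .head = &b, .id = 1 };

        link_after(&a, &b);             /* a is the head, b stacks on it */
        notif_put(&b);                  /* b's last ref forwards a put to a */
        notif_put(&a);                  /* a drained: both complete in a batch */
        return 0;
}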

As mentioned previously about ->io_link_skb, the callback
implementation always allows binding to an skb without a ubuf_info.

Reviewed-by: Jens Axboe <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
---
 io_uring/notif.c | 71 +++++++++++++++++++++++++++++++++++++++++++-----
 io_uring/notif.h |  3 ++
 2 files changed, 67 insertions(+), 7 deletions(-)

diff --git a/io_uring/notif.c b/io_uring/notif.c
index 26680176335f..d58cdc01e691 100644
--- a/io_uring/notif.c
+++ b/io_uring/notif.c
@@ -9,18 +9,28 @@
 #include "notif.h"
 #include "rsrc.h"
 
+static const struct ubuf_info_ops io_ubuf_ops;
+
 static void io_notif_tw_complete(struct io_kiocb *notif, struct io_tw_state *ts)
 {
 	struct io_notif_data *nd = io_notif_to_data(notif);
 
-	if (unlikely(nd->zc_report) && (nd->zc_copied || !nd->zc_used))
-		notif->cqe.res |= IORING_NOTIF_USAGE_ZC_COPIED;
+	do {
+		notif = cmd_to_io_kiocb(nd);
 
-	if (nd->account_pages && notif->ctx->user) {
-		__io_unaccount_mem(notif->ctx->user, nd->account_pages);
-		nd->account_pages = 0;
-	}
-	io_req_task_complete(notif, ts);
+		lockdep_assert(refcount_read(&nd->uarg.refcnt) == 0);
+
+		if (unlikely(nd->zc_report) && (nd->zc_copied || !nd->zc_used))
+			notif->cqe.res |= IORING_NOTIF_USAGE_ZC_COPIED;
+
+		if (nd->account_pages && notif->ctx->user) {
+			__io_unaccount_mem(notif->ctx->user, nd->account_pages);
+			nd->account_pages = 0;
+		}
+
+		nd = nd->next;
+		io_req_task_complete(notif, ts);
+	} while (nd);
 }
 
 void io_tx_ubuf_complete(struct sk_buff *skb, struct ubuf_info *uarg,
@@ -39,12 +49,56 @@ void io_tx_ubuf_complete(struct sk_buff *skb, struct ubuf_info *uarg,
 	if (!refcount_dec_and_test(&uarg->refcnt))
 		return;
 
+	if (nd->head != nd) {
+		io_tx_ubuf_complete(skb, &nd->head->uarg, success);
+		return;
+	}
 	notif->io_task_work.func = io_notif_tw_complete;
 	__io_req_task_work_add(notif, IOU_F_TWQ_LAZY_WAKE);
 }
 
+static int io_link_skb(struct sk_buff *skb, struct ubuf_info *uarg)
+{
+	struct io_notif_data *nd, *prev_nd;
+	struct io_kiocb *prev_notif, *notif;
+	struct ubuf_info *prev_uarg = skb_zcopy(skb);
+
+	nd = container_of(uarg, struct io_notif_data, uarg);
+	notif = cmd_to_io_kiocb(nd);
+
+	if (!prev_uarg) {
+		net_zcopy_get(&nd->uarg);
+		skb_zcopy_init(skb, &nd->uarg);
+		return 0;
+	}
+	/* handle it separately as we can't link a notif to itself */
+	if (unlikely(prev_uarg == &nd->uarg))
+		return 0;
+	/* we can't join two links together, just request a fresh skb */
+	if (unlikely(nd->head != nd || nd->next))
+		return -EEXIST;
+	/* don't mix zc providers */
+	if (unlikely(prev_uarg->ops != &io_ubuf_ops))
+		return -EEXIST;
+
+	prev_nd = container_of(prev_uarg, struct io_notif_data, uarg);
+	prev_notif = cmd_to_io_kiocb(prev_nd);
+
+	/* make sure all notifications can be finished in the same task_work */
+	if (unlikely(notif->ctx != prev_notif->ctx ||
+		     notif->task != prev_notif->task))
+		return -EEXIST;
+
+	nd->head = prev_nd->head;
+	nd->next = prev_nd->next;
+	prev_nd->next = nd;
+	net_zcopy_get(&nd->head->uarg);
+	return 0;
+}
+
 static const struct ubuf_info_ops io_ubuf_ops = {
 	.complete = io_tx_ubuf_complete,
+	.link_skb = io_link_skb,
 };
 
 struct io_kiocb *io_alloc_notif(struct io_ring_ctx *ctx)
@@ -65,6 +119,9 @@ struct io_kiocb *io_alloc_notif(struct io_ring_ctx *ctx)
 	nd = io_notif_to_data(notif);
 	nd->zc_report = false;
 	nd->account_pages = 0;
+	nd->next = NULL;
+	nd->head = nd;
+
 	nd->uarg.flags = IO_NOTIF_UBUF_FLAGS;
 	nd->uarg.ops = &io_ubuf_ops;
 	refcount_set(&nd->uarg.refcnt, 1);
diff --git a/io_uring/notif.h b/io_uring/notif.h
index 2cf9ff6abd7a..f3589cfef4a9 100644
--- a/io_uring/notif.h
+++ b/io_uring/notif.h
@@ -14,6 +14,9 @@ struct io_notif_data {
 	struct file		*file;
 	struct ubuf_info	uarg;
 
+	struct io_notif_data	*next;
+	struct io_notif_data	*head;
+
 	unsigned		account_pages;
 	bool			zc_report;
 	bool			zc_used;
-- 
2.44.0



* Re: [PATCH io_uring-next/net-next v2 0/4] implement io_uring notification (ubuf_info) stacking
  2024-04-19 11:08 [PATCH io_uring-next/net-next v2 0/4] implement io_uring notification (ubuf_info) stacking Pavel Begunkov
                   ` (3 preceding siblings ...)
  2024-04-19 11:08 ` [PATCH io_uring-next/net-next v2 4/4] io_uring/notif: implement notification stacking Pavel Begunkov
@ 2024-04-19 11:21 ` Pavel Begunkov
  2024-04-23  0:20 ` patchwork-bot+netdevbpf
  2024-04-23  1:36 ` (subset) " Jens Axboe
  6 siblings, 0 replies; 10+ messages in thread
From: Pavel Begunkov @ 2024-04-19 11:21 UTC (permalink / raw)
  To: io-uring, netdev
  Cc: Jens Axboe, David S . Miller, Jakub Kicinski, David Ahern,
	Eric Dumazet, Willem de Bruijn, Jason Wang, Wei Liu, Paul Durrant,
	xen-devel, Michael S . Tsirkin, virtualization, kvm

On 4/19/24 12:08, Pavel Begunkov wrote:
> Please, don't take directly, conflicts with io_uring.

When everyone is happy with the patches, Jens and Jakub will hopefully
help to merge them, e.g. by first staging the net/-specific changes [1]
and then handling all conflicts on the io_uring side.

[1] https://github.com/isilence/linux.git iou-sendzc/notif-stacking-v2-netonly


> To have per-request buffer notifications, each zerocopy io_uring send
> request allocates a new ubuf_info. However, as an skb can carry only
> one uarg, it may force the stack to create many small skbs, hurting
> performance in many ways.
> 
> The patchset implements notification (i.e. io_uring's ubuf_info
> extension) stacking: it links ubuf_infos into a list, allowing an skb
> to carry multiple of them.
> 
> liburing/examples/send-zerocopy shows up to a 6x performance
> improvement for TCP at 4KB per send, levelling it with MSG_ZEROCOPY.
> Without the patchset, much larger sends are needed to utilise the full
> potential.
> 
> bytes  | before | after (Kqps)
> 1200   | 195    | 1023
> 4000   | 193    | 1386
> 8000   | 154    | 1058
> 
> The patches are on top of net-next + io_uring-next:
> 
> https://github.com/isilence/linux.git iou-sendzc/notif-stacking-v2
> 
> First two patches based on net-next:
> 
> https://github.com/isilence/linux.git iou-sendzc/notif-stacking-v2-netonly
> 
> v2: convert xen-netback to ubuf_info_ops (patch 1)
>      drop two separately merged io_uring patches
> 
> Pavel Begunkov (4):
>    net: extend ubuf_info callback to ops structure
>    net: add callback for setting a ubuf_info to skb
>    io_uring/notif: simplify io_notif_flush()
>    io_uring/notif: implement notification stacking
> 
>   drivers/net/tap.c                   |  2 +-
>   drivers/net/tun.c                   |  2 +-
>   drivers/net/xen-netback/common.h    |  5 +-
>   drivers/net/xen-netback/interface.c |  2 +-
>   drivers/net/xen-netback/netback.c   | 11 ++--
>   drivers/vhost/net.c                 |  8 ++-
>   include/linux/skbuff.h              | 21 +++++---
>   io_uring/notif.c                    | 83 +++++++++++++++++++++++++----
>   io_uring/notif.h                    | 12 ++---
>   net/core/skbuff.c                   | 36 ++++++++-----
>   10 files changed, 134 insertions(+), 48 deletions(-)
> 

-- 
Pavel Begunkov


* Re: [PATCH io_uring-next/net-next v2 1/4] net: extend ubuf_info callback to ops structure
  2024-04-19 11:08 ` [PATCH io_uring-next/net-next v2 1/4] net: extend ubuf_info callback to ops structure Pavel Begunkov
@ 2024-04-22 16:59   ` Willem de Bruijn
  0 siblings, 0 replies; 10+ messages in thread
From: Willem de Bruijn @ 2024-04-22 16:59 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring, netdev
  Cc: Jens Axboe, asml.silence, David S . Miller, Jakub Kicinski,
	David Ahern, Eric Dumazet, Willem de Bruijn, Jason Wang, Wei Liu,
	Paul Durrant, xen-devel, Michael S . Tsirkin, virtualization, kvm

Pavel Begunkov wrote:
> We'll need to associate additional callbacks with ubuf_info; introduce
> a structure holding the ubuf_info callbacks. Apart from the smarter
> io_uring notification management introduced in the following patches,
> it can be used to generalise msg_zerocopy_put_abort() and also to store
> ->sg_from_iter, which is currently passed in struct msghdr.
> 
> Reviewed-by: Jens Axboe <[email protected]>
> Reviewed-by: David Ahern <[email protected]>
> Signed-off-by: Pavel Begunkov <[email protected]>

Reviewed-by: Willem de Bruijn <[email protected]>


* Re: [PATCH io_uring-next/net-next v2 2/4] net: add callback for setting a ubuf_info to skb
  2024-04-19 11:08 ` [PATCH io_uring-next/net-next v2 2/4] net: add callback for setting a ubuf_info to skb Pavel Begunkov
@ 2024-04-22 17:01   ` Willem de Bruijn
  0 siblings, 0 replies; 10+ messages in thread
From: Willem de Bruijn @ 2024-04-22 17:01 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring, netdev
  Cc: Jens Axboe, asml.silence, David S . Miller, Jakub Kicinski,
	David Ahern, Eric Dumazet, Willem de Bruijn, Jason Wang, Wei Liu,
	Paul Durrant, xen-devel, Michael S . Tsirkin, virtualization, kvm

Pavel Begunkov wrote:
> At the moment an skb can only have one ubuf_info associated with it,
> which might be a performance problem for zerocopy sends in cases like
> TCP via io_uring. Add a callback for assigning a ubuf_info to an skb;
> this way we can later implement smarter assignment, such as linking
> ubuf_infos together.
> 
> Note that it's an optional callback, which should be compatible with
> skb_zcopy_set(), because the net stack might potentially decide to
> clone an skb and take another reference to the ubuf_info whenever it
> wishes. Also, a correct implementation should always be able to bind
> to an skb without a prior ubuf_info, otherwise we could end up in a
> situation where the send would not be able to progress.
> 
> Reviewed-by: Jens Axboe <[email protected]>
> Reviewed-by: David Ahern <[email protected]>
> Signed-off-by: Pavel Begunkov <[email protected]>

Reviewed-by: Willem de Bruijn <[email protected]>


* Re: [PATCH io_uring-next/net-next v2 0/4] implement io_uring notification (ubuf_info) stacking
  2024-04-19 11:08 [PATCH io_uring-next/net-next v2 0/4] implement io_uring notification (ubuf_info) stacking Pavel Begunkov
                   ` (4 preceding siblings ...)
  2024-04-19 11:21 ` [PATCH io_uring-next/net-next v2 0/4] implement io_uring notification (ubuf_info) stacking Pavel Begunkov
@ 2024-04-23  0:20 ` patchwork-bot+netdevbpf
  2024-04-23  1:36 ` (subset) " Jens Axboe
  6 siblings, 0 replies; 10+ messages in thread
From: patchwork-bot+netdevbpf @ 2024-04-23  0:20 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: io-uring, netdev, axboe, davem, kuba, dsahern, edumazet,
	willemdebruijn.kernel, jasowang, wei.liu, paul, xen-devel, mst,
	virtualization, kvm

Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <[email protected]>:

On Fri, 19 Apr 2024 12:08:38 +0100 you wrote:
> Please, don't take directly, conflicts with io_uring.
> 
> To have per-request buffer notifications, each zerocopy io_uring send
> request allocates a new ubuf_info. However, as an skb can carry only
> one uarg, it may force the stack to create many small skbs, hurting
> performance in many ways.
> 
> [...]

Here is the summary with links:
  - [io_uring-next/net-next,v2,1/4] net: extend ubuf_info callback to ops structure
    https://git.kernel.org/netdev/net-next/c/7ab4f16f9e24
  - [io_uring-next/net-next,v2,2/4] net: add callback for setting a ubuf_info to skb
    https://git.kernel.org/netdev/net-next/c/65bada80dec1
  - [io_uring-next/net-next,v2,3/4] io_uring/notif: simplify io_notif_flush()
    (no matching commit)
  - [io_uring-next/net-next,v2,4/4] io_uring/notif: implement notification stacking
    (no matching commit)

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* Re: (subset) [PATCH io_uring-next/net-next v2 0/4] implement io_uring notification (ubuf_info) stacking
  2024-04-19 11:08 [PATCH io_uring-next/net-next v2 0/4] implement io_uring notification (ubuf_info) stacking Pavel Begunkov
                   ` (5 preceding siblings ...)
  2024-04-23  0:20 ` patchwork-bot+netdevbpf
@ 2024-04-23  1:36 ` Jens Axboe
  6 siblings, 0 replies; 10+ messages in thread
From: Jens Axboe @ 2024-04-23  1:36 UTC (permalink / raw)
  To: io-uring, netdev, Pavel Begunkov
  Cc: David S . Miller, Jakub Kicinski, David Ahern, Eric Dumazet,
	Willem de Bruijn, Jason Wang, Wei Liu, Paul Durrant, xen-devel,
	Michael S . Tsirkin, virtualization, kvm


On Fri, 19 Apr 2024 12:08:38 +0100, Pavel Begunkov wrote:
> Please, don't take directly, conflicts with io_uring.
> 
> To have per-request buffer notifications, each zerocopy io_uring send
> request allocates a new ubuf_info. However, as an skb can carry only
> one uarg, it may force the stack to create many small skbs, hurting
> performance in many ways.
> 
> [...]

Applied, thanks!

[3/4] io_uring/notif: simplify io_notif_flush()
      commit: 5a569469b973cb7a6c58192a37dfb8418686e518
[4/4] io_uring/notif: implement notification stacking
      commit: 6fe4220912d19152a26ce19713ab232f4263018d

Best regards,
-- 
Jens Axboe





