public inbox for [email protected]
 help / color / mirror / Atom feed
* [PATCH net-next 0/5] zerocopy tx cleanups
@ 2024-06-27 12:59 Pavel Begunkov
  2024-06-27 12:59 ` [PATCH net-next 1/5] net: always try to set ubuf in skb_zerocopy_iter_stream Pavel Begunkov
                   ` (6 more replies)
  0 siblings, 7 replies; 15+ messages in thread
From: Pavel Begunkov @ 2024-06-27 12:59 UTC (permalink / raw)
  To: io-uring, netdev
  Cc: Jens Axboe, asml.silence, David S . Miller, Jakub Kicinski,
	David Ahern, Eric Dumazet, Willem de Bruijn

Assorted zerocopy send path cleanups, the main part of which is
moving some net stack specific accounting out of io_uring back
to net/ in Patch 4.

Pavel Begunkov (5):
  net: always try to set ubuf in skb_zerocopy_iter_stream
  net: split __zerocopy_sg_from_iter()
  net: batch zerocopy_fill_skb_from_iter accounting
  io_uring/net: move charging socket out of zc io_uring
  net: limit scope of a skb_zerocopy_iter_stream var

 include/linux/skbuff.h |  3 +++
 include/linux/socket.h |  2 +-
 io_uring/net.c         | 16 ++++----------
 net/core/datagram.c    | 47 +++++++++++++++++++++++++-----------------
 net/core/skbuff.c      |  6 +++---
 5 files changed, 39 insertions(+), 35 deletions(-)

-- 
2.44.0


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH net-next 1/5] net: always try to set ubuf in skb_zerocopy_iter_stream
  2024-06-27 12:59 [PATCH net-next 0/5] zerocopy tx cleanups Pavel Begunkov
@ 2024-06-27 12:59 ` Pavel Begunkov
  2024-06-28 17:03   ` Willem de Bruijn
  2024-06-27 12:59 ` [PATCH net-next 2/5] net: split __zerocopy_sg_from_iter() Pavel Begunkov
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Pavel Begunkov @ 2024-06-27 12:59 UTC (permalink / raw)
  To: io-uring, netdev
  Cc: Jens Axboe, asml.silence, David S . Miller, Jakub Kicinski,
	David Ahern, Eric Dumazet, Willem de Bruijn

skb_zcopy_set() does nothing if there is already a ubuf_info associated
with an skb, and since ->link_skb should have set it several lines above
the check here essentially does nothing and can be removed. It's also
safer this way, because even if the callback is faulty we'll
have it set.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 net/core/skbuff.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 2315c088e91d..9f28822dde6f 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1884,8 +1884,7 @@ int skb_zerocopy_iter_stream(struct sock *sk, struct sk_buff *skb,
 		return err;
 	}
 
-	if (!uarg->ops->link_skb)
-		skb_zcopy_set(skb, uarg, NULL);
+	skb_zcopy_set(skb, uarg, NULL);
 	return skb->len - orig_len;
 }
 EXPORT_SYMBOL_GPL(skb_zerocopy_iter_stream);
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net-next 2/5] net: split __zerocopy_sg_from_iter()
  2024-06-27 12:59 [PATCH net-next 0/5] zerocopy tx cleanups Pavel Begunkov
  2024-06-27 12:59 ` [PATCH net-next 1/5] net: always try to set ubuf in skb_zerocopy_iter_stream Pavel Begunkov
@ 2024-06-27 12:59 ` Pavel Begunkov
  2024-06-28 17:03   ` Willem de Bruijn
  2024-06-27 12:59 ` [PATCH net-next 3/5] net: batch zerocopy_fill_skb_from_iter accounting Pavel Begunkov
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Pavel Begunkov @ 2024-06-27 12:59 UTC (permalink / raw)
  To: io-uring, netdev
  Cc: Jens Axboe, asml.silence, David S . Miller, Jakub Kicinski,
	David Ahern, Eric Dumazet, Willem de Bruijn

Split a function out of __zerocopy_sg_from_iter() that only cares about
the traditional path with refcounted pages and doesn't need to know
about ->sg_from_iter. A preparation patch, we'll improve on the function
later.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 net/core/datagram.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/net/core/datagram.c b/net/core/datagram.c
index 95f242591fd2..7f7d5da2e406 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -610,16 +610,10 @@ int skb_copy_datagram_from_iter(struct sk_buff *skb, int offset,
 }
 EXPORT_SYMBOL(skb_copy_datagram_from_iter);
 
-int __zerocopy_sg_from_iter(struct msghdr *msg, struct sock *sk,
-			    struct sk_buff *skb, struct iov_iter *from,
-			    size_t length)
+static int zerocopy_fill_skb_from_iter(struct sock *sk, struct sk_buff *skb,
+					struct iov_iter *from, size_t length)
 {
-	int frag;
-
-	if (msg && msg->msg_ubuf && msg->sg_from_iter)
-		return msg->sg_from_iter(sk, skb, from, length);
-
-	frag = skb_shinfo(skb)->nr_frags;
+	int frag = skb_shinfo(skb)->nr_frags;
 
 	while (length && iov_iter_count(from)) {
 		struct page *head, *last_head = NULL;
@@ -692,6 +686,16 @@ int __zerocopy_sg_from_iter(struct msghdr *msg, struct sock *sk,
 	}
 	return 0;
 }
+
+int __zerocopy_sg_from_iter(struct msghdr *msg, struct sock *sk,
+			    struct sk_buff *skb, struct iov_iter *from,
+			    size_t length)
+{
+	if (msg && msg->msg_ubuf && msg->sg_from_iter)
+		return msg->sg_from_iter(sk, skb, from, length);
+	else
+		return zerocopy_fill_skb_from_iter(sk, skb, from, length);
+}
 EXPORT_SYMBOL(__zerocopy_sg_from_iter);
 
 /**
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net-next 3/5] net: batch zerocopy_fill_skb_from_iter accounting
  2024-06-27 12:59 [PATCH net-next 0/5] zerocopy tx cleanups Pavel Begunkov
  2024-06-27 12:59 ` [PATCH net-next 1/5] net: always try to set ubuf in skb_zerocopy_iter_stream Pavel Begunkov
  2024-06-27 12:59 ` [PATCH net-next 2/5] net: split __zerocopy_sg_from_iter() Pavel Begunkov
@ 2024-06-27 12:59 ` Pavel Begunkov
  2024-06-28 17:06   ` Willem de Bruijn
  2024-06-27 12:59 ` [PATCH net-next 4/5] io_uring/net: move charging socket out of zc io_uring Pavel Begunkov
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Pavel Begunkov @ 2024-06-27 12:59 UTC (permalink / raw)
  To: io-uring, netdev
  Cc: Jens Axboe, asml.silence, David S . Miller, Jakub Kicinski,
	David Ahern, Eric Dumazet, Willem de Bruijn

Instead of accounting every page range against the socket separately, do
it in batch based on the change in skb->truesize. It's also moved into
__zerocopy_sg_from_iter(), so that zerocopy_fill_skb_from_iter() is
simpler and responsible for setting frags but not the accounting.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 net/core/datagram.c | 31 ++++++++++++++++++-------------
 1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/net/core/datagram.c b/net/core/datagram.c
index 7f7d5da2e406..2b24d69b1e94 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -610,7 +610,7 @@ int skb_copy_datagram_from_iter(struct sk_buff *skb, int offset,
 }
 EXPORT_SYMBOL(skb_copy_datagram_from_iter);
 
-static int zerocopy_fill_skb_from_iter(struct sock *sk, struct sk_buff *skb,
+static int zerocopy_fill_skb_from_iter(struct sk_buff *skb,
 					struct iov_iter *from, size_t length)
 {
 	int frag = skb_shinfo(skb)->nr_frags;
@@ -621,7 +621,6 @@ static int zerocopy_fill_skb_from_iter(struct sock *sk, struct sk_buff *skb,
 		int refs, order, n = 0;
 		size_t start;
 		ssize_t copied;
-		unsigned long truesize;
 
 		if (frag == MAX_SKB_FRAGS)
 			return -EMSGSIZE;
@@ -633,17 +632,9 @@ static int zerocopy_fill_skb_from_iter(struct sock *sk, struct sk_buff *skb,
 
 		length -= copied;
 
-		truesize = PAGE_ALIGN(copied + start);
 		skb->data_len += copied;
 		skb->len += copied;
-		skb->truesize += truesize;
-		if (sk && sk->sk_type == SOCK_STREAM) {
-			sk_wmem_queued_add(sk, truesize);
-			if (!skb_zcopy_pure(skb))
-				sk_mem_charge(sk, truesize);
-		} else {
-			refcount_add(truesize, &skb->sk->sk_wmem_alloc);
-		}
+		skb->truesize += PAGE_ALIGN(copied + start);
 
 		head = compound_head(pages[n]);
 		order = compound_order(head);
@@ -691,10 +682,24 @@ int __zerocopy_sg_from_iter(struct msghdr *msg, struct sock *sk,
 			    struct sk_buff *skb, struct iov_iter *from,
 			    size_t length)
 {
+	unsigned long orig_size = skb->truesize;
+	unsigned long truesize;
+	int ret;
+
 	if (msg && msg->msg_ubuf && msg->sg_from_iter)
 		return msg->sg_from_iter(sk, skb, from, length);
-	else
-		return zerocopy_fill_skb_from_iter(sk, skb, from, length);
+
+	ret = zerocopy_fill_skb_from_iter(skb, from, length);
+	truesize = skb->truesize - orig_size;
+
+	if (sk && sk->sk_type == SOCK_STREAM) {
+		sk_wmem_queued_add(sk, truesize);
+		if (!skb_zcopy_pure(skb))
+			sk_mem_charge(sk, truesize);
+	} else {
+		refcount_add(truesize, &skb->sk->sk_wmem_alloc);
+	}
+	return ret;
 }
 EXPORT_SYMBOL(__zerocopy_sg_from_iter);
 
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net-next 4/5] io_uring/net: move charging socket out of zc io_uring
  2024-06-27 12:59 [PATCH net-next 0/5] zerocopy tx cleanups Pavel Begunkov
                   ` (2 preceding siblings ...)
  2024-06-27 12:59 ` [PATCH net-next 3/5] net: batch zerocopy_fill_skb_from_iter accounting Pavel Begunkov
@ 2024-06-27 12:59 ` Pavel Begunkov
  2024-06-28 17:06   ` Willem de Bruijn
  2024-06-27 12:59 ` [PATCH net-next 5/5] net: limit scope of a skb_zerocopy_iter_stream var Pavel Begunkov
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Pavel Begunkov @ 2024-06-27 12:59 UTC (permalink / raw)
  To: io-uring, netdev
  Cc: Jens Axboe, asml.silence, David S . Miller, Jakub Kicinski,
	David Ahern, Eric Dumazet, Willem de Bruijn

Currently, io_uring's io_sg_from_iter() duplicates the part of
__zerocopy_sg_from_iter() charging pages to the socket. It'd be too easy
to miss while changing it in net/, the chunk is not the most
straightforward for outside users and full of internal implementation
details. io_uring is not a good place to keep it, deduplicate it by
moving out of the callback into __zerocopy_sg_from_iter().

Signed-off-by: Pavel Begunkov <[email protected]>
---
 include/linux/skbuff.h |  3 +++
 include/linux/socket.h |  2 +-
 io_uring/net.c         | 16 ++++------------
 net/core/datagram.c    | 10 +++++-----
 4 files changed, 13 insertions(+), 18 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index f4cda3fbdb75..9c29bdd5596d 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1703,6 +1703,9 @@ int __zerocopy_sg_from_iter(struct msghdr *msg, struct sock *sk,
 			    struct sk_buff *skb, struct iov_iter *from,
 			    size_t length);
 
+int zerocopy_fill_skb_from_iter(struct sk_buff *skb,
+				struct iov_iter *from, size_t length);
+
 static inline int skb_zerocopy_iter_dgram(struct sk_buff *skb,
 					  struct msghdr *msg, int len)
 {
diff --git a/include/linux/socket.h b/include/linux/socket.h
index 89d16b90370b..2a1ff91d1914 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -76,7 +76,7 @@ struct msghdr {
 	__kernel_size_t	msg_controllen;	/* ancillary data buffer length */
 	struct kiocb	*msg_iocb;	/* ptr to iocb for async requests */
 	struct ubuf_info *msg_ubuf;
-	int (*sg_from_iter)(struct sock *sk, struct sk_buff *skb,
+	int (*sg_from_iter)(struct sk_buff *skb,
 			    struct iov_iter *from, size_t length);
 };
 
diff --git a/io_uring/net.c b/io_uring/net.c
index 7c98c4d50946..84a7602bcef1 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -1265,14 +1265,14 @@ int io_send_zc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	return io_sendmsg_prep_setup(req, req->opcode == IORING_OP_SENDMSG_ZC);
 }
 
-static int io_sg_from_iter_iovec(struct sock *sk, struct sk_buff *skb,
+static int io_sg_from_iter_iovec(struct sk_buff *skb,
 				 struct iov_iter *from, size_t length)
 {
 	skb_zcopy_downgrade_managed(skb);
-	return __zerocopy_sg_from_iter(NULL, sk, skb, from, length);
+	return zerocopy_fill_skb_from_iter(skb, from, length);
 }
 
-static int io_sg_from_iter(struct sock *sk, struct sk_buff *skb,
+static int io_sg_from_iter(struct sk_buff *skb,
 			   struct iov_iter *from, size_t length)
 {
 	struct skb_shared_info *shinfo = skb_shinfo(skb);
@@ -1285,7 +1285,7 @@ static int io_sg_from_iter(struct sock *sk, struct sk_buff *skb,
 	if (!frag)
 		shinfo->flags |= SKBFL_MANAGED_FRAG_REFS;
 	else if (unlikely(!skb_zcopy_managed(skb)))
-		return __zerocopy_sg_from_iter(NULL, sk, skb, from, length);
+		return zerocopy_fill_skb_from_iter(skb, from, length);
 
 	bi.bi_size = min(from->count, length);
 	bi.bi_bvec_done = from->iov_offset;
@@ -1312,14 +1312,6 @@ static int io_sg_from_iter(struct sock *sk, struct sk_buff *skb,
 	skb->data_len += copied;
 	skb->len += copied;
 	skb->truesize += truesize;
-
-	if (sk && sk->sk_type == SOCK_STREAM) {
-		sk_wmem_queued_add(sk, truesize);
-		if (!skb_zcopy_pure(skb))
-			sk_mem_charge(sk, truesize);
-	} else {
-		refcount_add(truesize, &skb->sk->sk_wmem_alloc);
-	}
 	return ret;
 }
 
diff --git a/net/core/datagram.c b/net/core/datagram.c
index 2b24d69b1e94..554316c40883 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -610,8 +610,8 @@ int skb_copy_datagram_from_iter(struct sk_buff *skb, int offset,
 }
 EXPORT_SYMBOL(skb_copy_datagram_from_iter);
 
-static int zerocopy_fill_skb_from_iter(struct sk_buff *skb,
-					struct iov_iter *from, size_t length)
+int zerocopy_fill_skb_from_iter(struct sk_buff *skb,
+				struct iov_iter *from, size_t length)
 {
 	int frag = skb_shinfo(skb)->nr_frags;
 
@@ -687,11 +687,11 @@ int __zerocopy_sg_from_iter(struct msghdr *msg, struct sock *sk,
 	int ret;
 
 	if (msg && msg->msg_ubuf && msg->sg_from_iter)
-		return msg->sg_from_iter(sk, skb, from, length);
+		ret = msg->sg_from_iter(skb, from, length);
+	else
+		ret = zerocopy_fill_skb_from_iter(skb, from, length);
 
-	ret = zerocopy_fill_skb_from_iter(skb, from, length);
 	truesize = skb->truesize - orig_size;
-
 	if (sk && sk->sk_type == SOCK_STREAM) {
 		sk_wmem_queued_add(sk, truesize);
 		if (!skb_zcopy_pure(skb))
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net-next 5/5] net: limit scope of a skb_zerocopy_iter_stream var
  2024-06-27 12:59 [PATCH net-next 0/5] zerocopy tx cleanups Pavel Begunkov
                   ` (3 preceding siblings ...)
  2024-06-27 12:59 ` [PATCH net-next 4/5] io_uring/net: move charging socket out of zc io_uring Pavel Begunkov
@ 2024-06-27 12:59 ` Pavel Begunkov
  2024-06-28 17:07   ` Willem de Bruijn
  2024-06-28 17:09 ` [PATCH net-next 0/5] zerocopy tx cleanups Jens Axboe
  2024-07-02 10:30 ` patchwork-bot+netdevbpf
  6 siblings, 1 reply; 15+ messages in thread
From: Pavel Begunkov @ 2024-06-27 12:59 UTC (permalink / raw)
  To: io-uring, netdev
  Cc: Jens Axboe, asml.silence, David S . Miller, Jakub Kicinski,
	David Ahern, Eric Dumazet, Willem de Bruijn

skb_zerocopy_iter_stream() only uses @orig_uarg in the !link_skb path,
and we can move the local variable in the appropriate block.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 net/core/skbuff.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 9f28822dde6f..9b71e4b8796a 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1856,7 +1856,6 @@ int skb_zerocopy_iter_stream(struct sock *sk, struct sk_buff *skb,
 			     struct msghdr *msg, int len,
 			     struct ubuf_info *uarg)
 {
-	struct ubuf_info *orig_uarg = skb_zcopy(skb);
 	int err, orig_len = skb->len;
 
 	if (uarg->ops->link_skb) {
@@ -1864,6 +1863,8 @@ int skb_zerocopy_iter_stream(struct sock *sk, struct sk_buff *skb,
 		if (err)
 			return err;
 	} else {
+		struct ubuf_info *orig_uarg = skb_zcopy(skb);
+
 		/* An skb can only point to one uarg. This edge case happens
 		 * when TCP appends to an skb, but zerocopy_realloc triggered
 		 * a new alloc.
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next 1/5] net: always try to set ubuf in skb_zerocopy_iter_stream
  2024-06-27 12:59 ` [PATCH net-next 1/5] net: always try to set ubuf in skb_zerocopy_iter_stream Pavel Begunkov
@ 2024-06-28 17:03   ` Willem de Bruijn
  0 siblings, 0 replies; 15+ messages in thread
From: Willem de Bruijn @ 2024-06-28 17:03 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring, netdev
  Cc: Jens Axboe, asml.silence, David S . Miller, Jakub Kicinski,
	David Ahern, Eric Dumazet, Willem de Bruijn

Pavel Begunkov wrote:
> skb_zcopy_set() does nothing if there is already a ubuf_info associated
> with an skb, and since ->link_skb should have set it several lines above
> the check here essentially does nothing and can be removed. It's also
> safer this way, because even if the callback is faulty we'll
> have it set.
> 
> Signed-off-by: Pavel Begunkov <[email protected]>

Reviewed-by: Willem de Bruijn <[email protected]>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next 2/5] net: split __zerocopy_sg_from_iter()
  2024-06-27 12:59 ` [PATCH net-next 2/5] net: split __zerocopy_sg_from_iter() Pavel Begunkov
@ 2024-06-28 17:03   ` Willem de Bruijn
  0 siblings, 0 replies; 15+ messages in thread
From: Willem de Bruijn @ 2024-06-28 17:03 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring, netdev
  Cc: Jens Axboe, asml.silence, David S . Miller, Jakub Kicinski,
	David Ahern, Eric Dumazet, Willem de Bruijn

Pavel Begunkov wrote:
> Split a function out of __zerocopy_sg_from_iter() that only cares about
> the traditional path with refcounted pages and doesn't need to know
> about ->sg_from_iter. A preparation patch, we'll improve on the function
> later.
> 
> Signed-off-by: Pavel Begunkov <[email protected]>

Reviewed-by: Willem de Bruijn <[email protected]>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next 3/5] net: batch zerocopy_fill_skb_from_iter accounting
  2024-06-27 12:59 ` [PATCH net-next 3/5] net: batch zerocopy_fill_skb_from_iter accounting Pavel Begunkov
@ 2024-06-28 17:06   ` Willem de Bruijn
  2024-07-01 11:11     ` Pavel Begunkov
  0 siblings, 1 reply; 15+ messages in thread
From: Willem de Bruijn @ 2024-06-28 17:06 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring, netdev
  Cc: Jens Axboe, asml.silence, David S . Miller, Jakub Kicinski,
	David Ahern, Eric Dumazet, Willem de Bruijn

Pavel Begunkov wrote:
> Instead of accounting every page range against the socket separately, do
> it in batch based on the change in skb->truesize. It's also moved into
> __zerocopy_sg_from_iter(), so that zerocopy_fill_skb_from_iter() is
> simpler and responsible for setting frags but not the accounting.
> 
> Signed-off-by: Pavel Begunkov <[email protected]>

Reviewed-by: Willem de Bruijn <[email protected]>

> ---
>  net/core/datagram.c | 31 ++++++++++++++++++-------------
>  1 file changed, 18 insertions(+), 13 deletions(-)
> 
> diff --git a/net/core/datagram.c b/net/core/datagram.c
> index 7f7d5da2e406..2b24d69b1e94 100644
> --- a/net/core/datagram.c
> +++ b/net/core/datagram.c
> @@ -610,7 +610,7 @@ int skb_copy_datagram_from_iter(struct sk_buff *skb, int offset,
>  }
>  EXPORT_SYMBOL(skb_copy_datagram_from_iter);
>  
> -static int zerocopy_fill_skb_from_iter(struct sock *sk, struct sk_buff *skb,
> +static int zerocopy_fill_skb_from_iter(struct sk_buff *skb,
>  					struct iov_iter *from, size_t length)
>  {
>  	int frag = skb_shinfo(skb)->nr_frags;
> @@ -621,7 +621,6 @@ static int zerocopy_fill_skb_from_iter(struct sock *sk, struct sk_buff *skb,
>  		int refs, order, n = 0;
>  		size_t start;
>  		ssize_t copied;
> -		unsigned long truesize;
>  
>  		if (frag == MAX_SKB_FRAGS)
>  			return -EMSGSIZE;

Does the existing code then incorrectly not unwind sk_wmem_queued_add
and sk_mem_charge if returning with error from the second or later
loop..

> @@ -633,17 +632,9 @@ static int zerocopy_fill_skb_from_iter(struct sock *sk, struct sk_buff *skb,
>  
>  		length -= copied;
>  
> -		truesize = PAGE_ALIGN(copied + start);
>  		skb->data_len += copied;
>  		skb->len += copied;
> -		skb->truesize += truesize;
> -		if (sk && sk->sk_type == SOCK_STREAM) {
> -			sk_wmem_queued_add(sk, truesize);
> -			if (!skb_zcopy_pure(skb))
> -				sk_mem_charge(sk, truesize);
> -		} else {
> -			refcount_add(truesize, &skb->sk->sk_wmem_alloc);
> -		}
> +		skb->truesize += PAGE_ALIGN(copied + start);
>  
>  		head = compound_head(pages[n]);
>  		order = compound_order(head);
> @@ -691,10 +682,24 @@ int __zerocopy_sg_from_iter(struct msghdr *msg, struct sock *sk,
>  			    struct sk_buff *skb, struct iov_iter *from,
>  			    size_t length)
>  {
> +	unsigned long orig_size = skb->truesize;
> +	unsigned long truesize;
> +	int ret;
> +
>  	if (msg && msg->msg_ubuf && msg->sg_from_iter)
>  		return msg->sg_from_iter(sk, skb, from, length);
> -	else
> -		return zerocopy_fill_skb_from_iter(sk, skb, from, length);
> +
> +	ret = zerocopy_fill_skb_from_iter(skb, from, length);
> +	truesize = skb->truesize - orig_size;
> +
> +	if (sk && sk->sk_type == SOCK_STREAM) {
> +		sk_wmem_queued_add(sk, truesize);
> +		if (!skb_zcopy_pure(skb))
> +			sk_mem_charge(sk, truesize);
> +	} else {
> +		refcount_add(truesize, &skb->sk->sk_wmem_alloc);
> +	}
> +	return ret;
>  }
>  EXPORT_SYMBOL(__zerocopy_sg_from_iter);
>  
> -- 
> 2.44.0
> 



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next 4/5] io_uring/net: move charging socket out of zc io_uring
  2024-06-27 12:59 ` [PATCH net-next 4/5] io_uring/net: move charging socket out of zc io_uring Pavel Begunkov
@ 2024-06-28 17:06   ` Willem de Bruijn
  0 siblings, 0 replies; 15+ messages in thread
From: Willem de Bruijn @ 2024-06-28 17:06 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring, netdev
  Cc: Jens Axboe, asml.silence, David S . Miller, Jakub Kicinski,
	David Ahern, Eric Dumazet, Willem de Bruijn

Pavel Begunkov wrote:
> Currently, io_uring's io_sg_from_iter() duplicates the part of
> __zerocopy_sg_from_iter() charging pages to the socket. It'd be too easy
> to miss while changing it in net/, the chunk is not the most
> straightforward for outside users and full of internal implementation
> details. io_uring is not a good place to keep it, deduplicate it by
> moving out of the callback into __zerocopy_sg_from_iter().
> 
> Signed-off-by: Pavel Begunkov <[email protected]>

Reviewed-by: Willem de Bruijn <[email protected]>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next 5/5] net: limit scope of a skb_zerocopy_iter_stream var
  2024-06-27 12:59 ` [PATCH net-next 5/5] net: limit scope of a skb_zerocopy_iter_stream var Pavel Begunkov
@ 2024-06-28 17:07   ` Willem de Bruijn
  0 siblings, 0 replies; 15+ messages in thread
From: Willem de Bruijn @ 2024-06-28 17:07 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring, netdev
  Cc: Jens Axboe, asml.silence, David S . Miller, Jakub Kicinski,
	David Ahern, Eric Dumazet, Willem de Bruijn

Pavel Begunkov wrote:
> skb_zerocopy_iter_stream() only uses @orig_uarg in the !link_skb path,
> and we can move the local variable in the appropriate block.
> 
> Signed-off-by: Pavel Begunkov <[email protected]>

Reviewed-by: Willem de Bruijn <[email protected]>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next 0/5] zerocopy tx cleanups
  2024-06-27 12:59 [PATCH net-next 0/5] zerocopy tx cleanups Pavel Begunkov
                   ` (4 preceding siblings ...)
  2024-06-27 12:59 ` [PATCH net-next 5/5] net: limit scope of a skb_zerocopy_iter_stream var Pavel Begunkov
@ 2024-06-28 17:09 ` Jens Axboe
  2024-07-02 10:30 ` patchwork-bot+netdevbpf
  6 siblings, 0 replies; 15+ messages in thread
From: Jens Axboe @ 2024-06-28 17:09 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring, netdev
  Cc: David S . Miller, Jakub Kicinski, David Ahern, Eric Dumazet,
	Willem de Bruijn

On 6/27/24 6:59 AM, Pavel Begunkov wrote:
> Assorted zerocopy send path cleanups, the main part of which is
> moving some net stack specific accounting out of io_uring back
> to net/ in Patch 4.

Looks good to me:

Reviewed-by: Jens Axboe <[email protected]>

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next 3/5] net: batch zerocopy_fill_skb_from_iter accounting
  2024-06-28 17:06   ` Willem de Bruijn
@ 2024-07-01 11:11     ` Pavel Begunkov
  2024-07-01 18:27       ` Willem de Bruijn
  0 siblings, 1 reply; 15+ messages in thread
From: Pavel Begunkov @ 2024-07-01 11:11 UTC (permalink / raw)
  To: Willem de Bruijn, io-uring, netdev
  Cc: Jens Axboe, David S . Miller, Jakub Kicinski, David Ahern,
	Eric Dumazet

On 6/28/24 18:06, Willem de Bruijn wrote:
> Pavel Begunkov wrote:
>> Instead of accounting every page range against the socket separately, do
>> it in batch based on the change in skb->truesize. It's also moved into
>> __zerocopy_sg_from_iter(), so that zerocopy_fill_skb_from_iter() is
>> simpler and responsible for setting frags but not the accounting.
>>
>> Signed-off-by: Pavel Begunkov <[email protected]>
> 
> Reviewed-by: Willem de Bruijn <[email protected]>

Thanks for reviews!

>> ---
>>   net/core/datagram.c | 31 ++++++++++++++++++-------------
>>   1 file changed, 18 insertions(+), 13 deletions(-)
>>
>> diff --git a/net/core/datagram.c b/net/core/datagram.c
>> index 7f7d5da2e406..2b24d69b1e94 100644
>> --- a/net/core/datagram.c
>> +++ b/net/core/datagram.c
>> @@ -610,7 +610,7 @@ int skb_copy_datagram_from_iter(struct sk_buff *skb, int offset,
>>   }
>>   EXPORT_SYMBOL(skb_copy_datagram_from_iter);
>>   
>> -static int zerocopy_fill_skb_from_iter(struct sock *sk, struct sk_buff *skb,
>> +static int zerocopy_fill_skb_from_iter(struct sk_buff *skb,
>>   					struct iov_iter *from, size_t length)
>>   {
>>   	int frag = skb_shinfo(skb)->nr_frags;
>> @@ -621,7 +621,6 @@ static int zerocopy_fill_skb_from_iter(struct sock *sk, struct sk_buff *skb,
>>   		int refs, order, n = 0;
>>   		size_t start;
>>   		ssize_t copied;
>> -		unsigned long truesize;
>>   
>>   		if (frag == MAX_SKB_FRAGS)
>>   			return -EMSGSIZE;
> 
> Does the existing code then incorrectly not unwind sk_wmem_queued_add
> and sk_mem_charge if returning with error from the second or later
> loop..

As long as ->truesize matches what's accounted to the socket,
kfree_skb() -> sock_wfree()/->destructor() should take care of it.
With sk_mem_charge() I assume __zerocopy_sg_from_iter -> ___pskb_trim()
should do it, need to look it up, but if not, it sounds like a temporary
over estimation until the skb is put down. I don't see anything
concerning. Is that the scenario you're worried about?

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next 3/5] net: batch zerocopy_fill_skb_from_iter accounting
  2024-07-01 11:11     ` Pavel Begunkov
@ 2024-07-01 18:27       ` Willem de Bruijn
  0 siblings, 0 replies; 15+ messages in thread
From: Willem de Bruijn @ 2024-07-01 18:27 UTC (permalink / raw)
  To: Pavel Begunkov, Willem de Bruijn, io-uring, netdev
  Cc: Jens Axboe, David S . Miller, Jakub Kicinski, David Ahern,
	Eric Dumazet

Pavel Begunkov wrote:
> On 6/28/24 18:06, Willem de Bruijn wrote:
> > Pavel Begunkov wrote:
> >> Instead of accounting every page range against the socket separately, do
> >> it in batch based on the change in skb->truesize. It's also moved into
> >> __zerocopy_sg_from_iter(), so that zerocopy_fill_skb_from_iter() is
> >> simpler and responsible for setting frags but not the accounting.
> >>
> >> Signed-off-by: Pavel Begunkov <[email protected]>
> > 
> > Reviewed-by: Willem de Bruijn <[email protected]>
> 
> Thanks for reviews!
> 
> >> ---
> >>   net/core/datagram.c | 31 ++++++++++++++++++-------------
> >>   1 file changed, 18 insertions(+), 13 deletions(-)
> >>
> >> diff --git a/net/core/datagram.c b/net/core/datagram.c
> >> index 7f7d5da2e406..2b24d69b1e94 100644
> >> --- a/net/core/datagram.c
> >> +++ b/net/core/datagram.c
> >> @@ -610,7 +610,7 @@ int skb_copy_datagram_from_iter(struct sk_buff *skb, int offset,
> >>   }
> >>   EXPORT_SYMBOL(skb_copy_datagram_from_iter);
> >>   
> >> -static int zerocopy_fill_skb_from_iter(struct sock *sk, struct sk_buff *skb,
> >> +static int zerocopy_fill_skb_from_iter(struct sk_buff *skb,
> >>   					struct iov_iter *from, size_t length)
> >>   {
> >>   	int frag = skb_shinfo(skb)->nr_frags;
> >> @@ -621,7 +621,6 @@ static int zerocopy_fill_skb_from_iter(struct sock *sk, struct sk_buff *skb,
> >>   		int refs, order, n = 0;
> >>   		size_t start;
> >>   		ssize_t copied;
> >> -		unsigned long truesize;
> >>   
> >>   		if (frag == MAX_SKB_FRAGS)
> >>   			return -EMSGSIZE;
> > 
> > Does the existing code then incorrectly not unwind sk_wmem_queued_add
> > and sk_mem_charge if returning with error from the second or later
> > loop..
> 
> As long as ->truesize matches what's accounted to the socket,
> kfree_skb() -> sock_wfree()/->destructor() should take care of it.
> With sk_mem_charge() I assume __zerocopy_sg_from_iter -> ___pskb_trim()
> should do it, need to look it up, but if not, it sounds like a temporary
> over estimation until the skb is put down. I don't see anything
> concerning. Is that the scenario you're worried about?

Oh indeed. Thanks. I don't see ___pskb_trim adjusting except for the
cases where it calls skb_condese, but neither does it adjust truesize.
So indeed a temporary over estimation until e.g., tcp_wmem_free_skb.
Sounds fine.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next 0/5] zerocopy tx cleanups
  2024-06-27 12:59 [PATCH net-next 0/5] zerocopy tx cleanups Pavel Begunkov
                   ` (5 preceding siblings ...)
  2024-06-28 17:09 ` [PATCH net-next 0/5] zerocopy tx cleanups Jens Axboe
@ 2024-07-02 10:30 ` patchwork-bot+netdevbpf
  6 siblings, 0 replies; 15+ messages in thread
From: patchwork-bot+netdevbpf @ 2024-07-02 10:30 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: io-uring, netdev, axboe, davem, kuba, dsahern, edumazet,
	willemdebruijn.kernel

Hello:

This series was applied to netdev/net-next.git (main)
by Paolo Abeni <[email protected]>:

On Thu, 27 Jun 2024 13:59:40 +0100 you wrote:
> Assorted zerocopy send path cleanups, the main part of which is
> moving some net stack specific accounting out of io_uring back
> to net/ in Patch 4.
> 
> Pavel Begunkov (5):
>   net: always try to set ubuf in skb_zerocopy_iter_stream
>   net: split __zerocopy_sg_from_iter()
>   net: batch zerocopy_fill_skb_from_iter accounting
>   io_uring/net: move charging socket out of zc io_uring
>   net: limit scope of a skb_zerocopy_iter_stream var
> 
> [...]

Here is the summary with links:
  - [net-next,1/5] net: always try to set ubuf in skb_zerocopy_iter_stream
    https://git.kernel.org/netdev/net-next/c/9e2db9d3993e
  - [net-next,2/5] net: split __zerocopy_sg_from_iter()
    https://git.kernel.org/netdev/net-next/c/7fb05423fed4
  - [net-next,3/5] net: batch zerocopy_fill_skb_from_iter accounting
    https://git.kernel.org/netdev/net-next/c/aeb320fc05c7
  - [net-next,4/5] io_uring/net: move charging socket out of zc io_uring
    https://git.kernel.org/netdev/net-next/c/060f4ba6e403
  - [net-next,5/5] net: limit scope of a skb_zerocopy_iter_stream var
    https://git.kernel.org/netdev/net-next/c/2ca58ed21cef

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2024-07-02 10:30 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-27 12:59 [PATCH net-next 0/5] zerocopy tx cleanups Pavel Begunkov
2024-06-27 12:59 ` [PATCH net-next 1/5] net: always try to set ubuf in skb_zerocopy_iter_stream Pavel Begunkov
2024-06-28 17:03   ` Willem de Bruijn
2024-06-27 12:59 ` [PATCH net-next 2/5] net: split __zerocopy_sg_from_iter() Pavel Begunkov
2024-06-28 17:03   ` Willem de Bruijn
2024-06-27 12:59 ` [PATCH net-next 3/5] net: batch zerocopy_fill_skb_from_iter accounting Pavel Begunkov
2024-06-28 17:06   ` Willem de Bruijn
2024-07-01 11:11     ` Pavel Begunkov
2024-07-01 18:27       ` Willem de Bruijn
2024-06-27 12:59 ` [PATCH net-next 4/5] io_uring/net: move charging socket out of zc io_uring Pavel Begunkov
2024-06-28 17:06   ` Willem de Bruijn
2024-06-27 12:59 ` [PATCH net-next 5/5] net: limit scope of a skb_zerocopy_iter_stream var Pavel Begunkov
2024-06-28 17:07   ` Willem de Bruijn
2024-06-28 17:09 ` [PATCH net-next 0/5] zerocopy tx cleanups Jens Axboe
2024-07-02 10:30 ` patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox