public inbox for [email protected]
 help / color / mirror / Atom feed
From: Pavel Begunkov <[email protected]>
To: Willem de Bruijn <[email protected]>, Jakub Kicinski <[email protected]>
Cc: [email protected], [email protected],
	[email protected],
	"David S . Miller" <[email protected]>,
	Jonathan Lemon <[email protected]>,
	Jens Axboe <[email protected]>, David Ahern <[email protected]>,
	[email protected]
Subject: Re: [PATCH net-next v5 01/27] ipv4: avoid partial copy for zc
Date: Thu, 21 Jul 2022 11:03:47 +0100	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <CA+FuTSf0+cJ9_N_xrHmCGX_KoVCWcE0YQBdtgEkzGvcLMSv7Qw@mail.gmail.com>

On 7/19/22 10:35, Willem de Bruijn wrote:
> On Tue, Jul 19, 2022 at 3:54 AM Jakub Kicinski <[email protected]> wrote:
>>
>> On Tue, 12 Jul 2022 21:52:25 +0100 Pavel Begunkov wrote:
>>> Even when zerocopy transmission is requested and possible,
>>> __ip_append_data() will still copy a small chunk of data just because it
>>> allocated some extra linear space (e.g. 148 bytes). It wastes CPU cycles
>>> on copy and iter manipulations and also misalignes potentially aligned
>>> data. Avoid such coies. And as a bonus we can allocate smaller skb.
>>
>> s/coies/copies/ can fix when applying
>>
>>>
>>> Signed-off-by: Pavel Begunkov <[email protected]>
>>> ---
>>>   net/ipv4/ip_output.c | 8 ++++++--
>>>   1 file changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
>>> index 00b4bf26fd93..581d1e233260 100644
>>> --- a/net/ipv4/ip_output.c
>>> +++ b/net/ipv4/ip_output.c
>>> @@ -969,7 +969,6 @@ static int __ip_append_data(struct sock *sk,
>>>        struct inet_sock *inet = inet_sk(sk);
>>>        struct ubuf_info *uarg = NULL;
>>>        struct sk_buff *skb;
>>> -
>>>        struct ip_options *opt = cork->opt;
>>>        int hh_len;
>>>        int exthdrlen;
>>> @@ -977,6 +976,7 @@ static int __ip_append_data(struct sock *sk,
>>>        int copy;
>>>        int err;
>>>        int offset = 0;
>>> +     bool zc = false;
>>>        unsigned int maxfraglen, fragheaderlen, maxnonfragsize;
>>>        int csummode = CHECKSUM_NONE;
>>>        struct rtable *rt = (struct rtable *)cork->dst;
>>> @@ -1025,6 +1025,7 @@ static int __ip_append_data(struct sock *sk,
>>>                if (rt->dst.dev->features & NETIF_F_SG &&
>>>                    csummode == CHECKSUM_PARTIAL) {
>>>                        paged = true;
>>> +                     zc = true;
>>>                } else {
>>>                        uarg->zerocopy = 0;
>>>                        skb_zcopy_set(skb, uarg, &extra_uref);
>>> @@ -1091,9 +1092,12 @@ static int __ip_append_data(struct sock *sk,
>>>                                 (fraglen + alloc_extra < SKB_MAX_ALLOC ||
>>>                                  !(rt->dst.dev->features & NETIF_F_SG)))
>>>                                alloclen = fraglen;
>>> -                     else {
>>> +                     else if (!zc) {
>>>                                alloclen = min_t(int, fraglen, MAX_HEADER);
>>
>> Willem, I think this came in with your GSO work, is there a reason we
>> use MAX_HEADER here? I thought MAX_HEADER is for headers (i.e. more or
>> less to be reserved) not for the min amount of data to be included.
>>
>> I wanna make sure we're not missing something about GSO here.
>>
>> Otherwise I don't think we need the extra branch but that can
>> be a follow up.

I brought it up before but left it for later as I don't know workloads
and there might be perf implications. I'll send a follow up.

> The change was introduced for UDP GSO, to avoid copying most payload
> on software segmentation:
> 
> "
> commit 15e36f5b8e982debe43e425d2e12d34e022d51e9
> Author: Willem de Bruijn <[email protected]>
> Date:   Thu Apr 26 13:42:19 2018 -0400
> 
>      udp: paged allocation with gso
> 
>      When sending large datagrams that are later segmented, store data in
>      page frags to avoid copying from linear in skb_segment.
> "
> 
> and in code
> 
> -                       else
> -                               alloclen = datalen + fragheaderlen;
> +                       else if (!paged)
> +                               alloclen = fraglen;
> +                       else {
> +                               alloclen = min_t(int, fraglen, MAX_HEADER);
> +                               pagedlen = fraglen - alloclen;
> +                       }
> 
> 
> MAX_HEADER was a short-hand for the exact header length. "alloclen =
> fragheaderlen + transhdrlen;" is probably a better choice indeed.

Great, thanks for taking a look!

> 
> Whether with branch or without, the same change needs to be made to
> __ip6_append_data, just as in the referenced commit. Let's keep the
> stacks in sync.

__ip6_append_data() is changed as well but in the following patch.
I had doubts whether it's preferable to keep ipv4 and ipv6 changes
separately.

> This is tricky code. If in doubt, run the msg_zerocopy and udp_gso
> tests from tools/testing/selftests/net, ideally with KASAN.

-- 
Pavel Begunkov

  reply	other threads:[~2022-07-21 10:04 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-12 20:52 [PATCH net-next v5 00/27] io_uring zerocopy send Pavel Begunkov
2022-07-12 20:52 ` [PATCH net-next v5 01/27] ipv4: avoid partial copy for zc Pavel Begunkov
2022-07-19  1:54   ` Jakub Kicinski
2022-07-19  9:35     ` Willem de Bruijn
2022-07-21 10:03       ` Pavel Begunkov [this message]
2022-07-12 20:52 ` [PATCH net-next v5 02/27] ipv6: " Pavel Begunkov
2022-07-12 20:52 ` [PATCH net-next v5 03/27] skbuff: don't mix ubuf_info from different sources Pavel Begunkov
2022-07-12 20:52 ` [PATCH net-next v5 04/27] skbuff: add SKBFL_DONT_ORPHAN flag Pavel Begunkov
2022-07-12 20:52 ` [PATCH net-next v5 05/27] skbuff: carry external ubuf_info in msghdr Pavel Begunkov
2022-07-12 20:52 ` [PATCH net-next v5 06/27] net: Allow custom iter handler " Pavel Begunkov
2022-07-12 20:52 ` [PATCH net-next v5 07/27] net: introduce managed frags infrastructure Pavel Begunkov
2022-07-12 20:52 ` [PATCH net-next v5 08/27] net: introduce __skb_fill_page_desc_noacc Pavel Begunkov
2022-07-12 20:52 ` [PATCH net-next v5 09/27] ipv4/udp: support externally provided ubufs Pavel Begunkov
2022-07-12 20:52 ` [PATCH net-next v5 10/27] ipv6/udp: " Pavel Begunkov
2022-07-12 20:52 ` [PATCH net-next v5 11/27] tcp: " Pavel Begunkov
2022-07-12 20:52 ` [PATCH net-next v5 12/27] io_uring: initialise msghdr::msg_ubuf Pavel Begunkov
2022-07-12 20:52 ` [PATCH net-next v5 13/27] io_uring: export io_put_task() Pavel Begunkov
2022-07-12 20:52 ` [PATCH net-next v5 14/27] io_uring: add zc notification infrastructure Pavel Begunkov
2022-07-12 20:52 ` [PATCH net-next v5 15/27] io_uring: cache struct io_notif Pavel Begunkov
2022-07-12 20:52 ` [PATCH net-next v5 16/27] io_uring: complete notifiers in tw Pavel Begunkov
2022-07-12 20:52 ` [PATCH net-next v5 17/27] io_uring: add rsrc referencing for notifiers Pavel Begunkov
2022-07-12 20:52 ` [PATCH net-next v5 18/27] io_uring: add notification slot registration Pavel Begunkov
2022-07-12 20:52 ` [PATCH net-next v5 19/27] io_uring: wire send zc request type Pavel Begunkov
2022-07-12 20:52 ` [PATCH net-next v5 20/27] io_uring: account locked pages for non-fixed zc Pavel Begunkov
2022-07-12 20:52 ` [PATCH net-next v5 21/27] io_uring: allow to pass addr into sendzc Pavel Begunkov
2022-07-12 20:52 ` [PATCH net-next v5 22/27] io_uring: sendzc with fixed buffers Pavel Begunkov
2022-07-12 20:52 ` [PATCH net-next v5 23/27] io_uring: flush notifiers after sendzc Pavel Begunkov
2022-07-12 20:52 ` [PATCH net-next v5 24/27] io_uring: rename IORING_OP_FILES_UPDATE Pavel Begunkov
2022-07-12 20:52 ` [PATCH net-next v5 25/27] io_uring: add zc notification flush requests Pavel Begunkov
2022-07-12 20:52 ` [PATCH net-next v5 26/27] io_uring: enable managed frags with register buffers Pavel Begunkov
2022-07-12 20:52 ` [PATCH net-next v5 27/27] selftests/io_uring: test zerocopy send Pavel Begunkov
2022-07-27  8:01   ` dust.li
2022-07-27  9:18     ` Pavel Begunkov
2022-07-20 12:46 ` (subset) [PATCH net-next v5 00/27] io_uring " Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox