public inbox for io-uring@vger.kernel.org
 help / color / mirror / Atom feed
From: Mina Almasry <almasrymina@google.com>
To: Pavel Begunkov <asml.silence@gmail.com>
Cc: Jakub Kicinski <kuba@kernel.org>,
	netdev@vger.kernel.org, io-uring@vger.kernel.org,
	 Eric Dumazet <edumazet@google.com>,
	Willem de Bruijn <willemb@google.com>,
	 Paolo Abeni <pabeni@redhat.com>,
	andrew+netdev@lunn.ch, horms@kernel.org,  davem@davemloft.net,
	sdf@fomichev.me, dw@davidwei.uk,  michael.chan@broadcom.com,
	dtatulea@nvidia.com, ap420073@gmail.com
Subject: Re: [RFC v1 00/22] Large rx buffer support for zcrx
Date: Mon, 28 Jul 2025 11:54:12 -0700	[thread overview]
Message-ID: <CAHS8izMyhMFA5DwBmHNJpEfPLE6xUmA453V+tF4pdWAenbrV3w@mail.gmail.com> (raw)
In-Reply-To: <cover.1753694913.git.asml.silence@gmail.com>

On Mon, Jul 28, 2025 at 4:03 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>
> This series implements large rx buffer support for io_uring/zcrx on
> top of Jakub's queue configuration changes, but it can also be used
> by other memory providers. Large rx buffers can be drastically
> beneficial with high-end hw-gro enabled cards that can coalesce traffic
> into larger pages, reducing the number of frags traversing the network
> stack and resuling in larger contiguous chunks of data for the
> userspace. Benchamrks showed up to ~30% improvement in CPU util.
>

Very exciting.

I have not yet had a chance to thoroughly look, but even still I have
a few high level questions/concerns. Maybe you already have answers to
them that can make my life a bit easier as I try to take a thorough
look.

- I'm a bit confused that you're not making changes to the core net
stack to support non-PAGE_SIZE netmems. From a quick glance, it seems
that there are potentially a ton of places in the net stack that
assume PAGE_SIZE:

cd net
ackc "PAGE_SIZE|PAGE_SHIFT" | wc -l
468

Are we sure none of these places assuming PAGE_SIZE or PAGE_SHIFT are
concerning?

- You're not adding a field in the net_iov that tells us how big the
net_iov is. It seems to me you're configuring the driver to set the rx
buffer size, then assuming all the pp allocations are of that size,
then assuming in the rxzc code that all the net_iov are of that size.
I think a few problems may happen?

(a) what happens if the rx buffer size is re-configured? Does the
io_uring rxrc instance get recreated as well?
(b) what happens with skb coalescing? skb coalescing is already a bit
of a mess. We don't allow coalescing unreadable and readable skbs, but
we do allow coalescing devmem and iozcrx skbs which could lead to some
bugs I'm guessing already. AFAICT as of this patch series we may allow
coalescing of skbs with netmems inside of them of different sizes, but
AFAICT so far, the iozcrx assume the size is constant across all the
netmems it gets, which I'm not sure is always true?

For all these reasons I had assumed that we'd need space in the
net_iov that tells us its size: net_iov->size.

And then netmem_size(netmem) would replace all the PAGE_SIZE
assumptions in the net stack, and then we'd disallow coalescing of
skbs with different-sized netmems (else we need to handle them
correctly per the netmem_size).

-- 
Thanks,
Mina

  parent reply	other threads:[~2025-07-28 18:54 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-28 11:04 [RFC v1 00/22] Large rx buffer support for zcrx Pavel Begunkov
2025-07-28 11:04 ` [RFC v1 01/22] docs: ethtool: document that rx_buf_len must control payload lengths Pavel Begunkov
2025-07-28 18:11   ` Mina Almasry
2025-07-28 21:36   ` Mina Almasry
2025-08-01 23:13     ` Jakub Kicinski
2025-07-28 11:04 ` [RFC v1 02/22] net: ethtool: report max value for rx-buf-len Pavel Begunkov
2025-07-29  5:00   ` Subbaraya Sundeep
2025-07-28 11:04 ` [RFC v1 03/22] net: use zero value to restore rx_buf_len to default Pavel Begunkov
2025-07-29  5:03   ` Subbaraya Sundeep
2025-07-28 11:04 ` [RFC v1 04/22] net: clarify the meaning of netdev_config members Pavel Begunkov
2025-07-28 21:44   ` Mina Almasry
2025-08-01 23:14     ` Jakub Kicinski
2025-07-28 11:04 ` [RFC v1 05/22] net: add rx_buf_len to netdev config Pavel Begunkov
2025-07-28 21:50   ` Mina Almasry
2025-08-01 23:18     ` Jakub Kicinski
2025-07-28 11:04 ` [RFC v1 06/22] eth: bnxt: read the page size from the adapter struct Pavel Begunkov
2025-07-28 11:04 ` [RFC v1 07/22] eth: bnxt: set page pool page order based on rx_page_size Pavel Begunkov
2025-07-28 11:04 ` [RFC v1 08/22] eth: bnxt: support setting size of agg buffers via ethtool Pavel Begunkov
2025-07-28 11:04 ` [RFC v1 09/22] net: move netdev_config manipulation to dedicated helpers Pavel Begunkov
2025-07-28 11:04 ` [RFC v1 10/22] net: reduce indent of struct netdev_queue_mgmt_ops members Pavel Begunkov
2025-07-28 11:04 ` [RFC v1 11/22] net: allocate per-queue config structs and pass them thru the queue API Pavel Begunkov
2025-07-28 11:04 ` [RFC v1 12/22] net: pass extack to netdev_rx_queue_restart() Pavel Begunkov
2025-07-28 11:04 ` [RFC v1 13/22] net: add queue config validation callback Pavel Begunkov
2025-07-28 22:26   ` Mina Almasry
2025-07-28 11:04 ` [RFC v1 14/22] eth: bnxt: always set the queue mgmt ops Pavel Begunkov
2025-07-28 11:04 ` [RFC v1 15/22] eth: bnxt: store the rx buf size per queue Pavel Begunkov
2025-07-28 22:33   ` Mina Almasry
2025-08-01 23:20     ` Jakub Kicinski
2025-07-28 11:04 ` [RFC v1 16/22] eth: bnxt: adjust the fill level of agg queues with larger buffers Pavel Begunkov
2025-07-28 11:04 ` [RFC v1 17/22] netdev: add support for setting rx-buf-len per queue Pavel Begunkov
2025-07-28 23:10   ` Mina Almasry
2025-08-01 23:37     ` Jakub Kicinski
2025-07-28 11:04 ` [RFC v1 18/22] net: wipe the setting of deactived queues Pavel Begunkov
2025-07-28 11:04 ` [RFC v1 19/22] eth: bnxt: use queue op config validate Pavel Begunkov
2025-07-28 11:04 ` [RFC v1 20/22] eth: bnxt: support per queue configuration of rx-buf-len Pavel Begunkov
2025-07-28 11:04 ` [RFC v1 21/22] net: parametrise mp open with a queue config Pavel Begunkov
2025-08-02  0:10   ` Jakub Kicinski
2025-08-04 12:50     ` Pavel Begunkov
2025-08-05 22:43       ` Jakub Kicinski
2025-08-06  0:05       ` Jakub Kicinski
2025-08-06 16:48       ` Mina Almasry
2025-08-06 18:11         ` Jakub Kicinski
2025-08-06 18:30           ` Mina Almasry
2025-08-06 22:05             ` Jakub Kicinski
2025-07-28 11:04 ` [RFC v1 22/22] io_uring/zcrx: implement large rx buffer support Pavel Begunkov
2025-07-28 17:13 ` [RFC v1 00/22] Large rx buffer support for zcrx Stanislav Fomichev
2025-07-28 18:18   ` Pavel Begunkov
2025-07-28 20:21     ` Stanislav Fomichev
2025-07-28 21:28       ` Pavel Begunkov
2025-07-28 22:06         ` Stanislav Fomichev
2025-07-28 22:44           ` Pavel Begunkov
2025-07-29 16:33             ` Stanislav Fomichev
2025-07-30 14:16               ` Pavel Begunkov
2025-07-30 15:50                 ` Stanislav Fomichev
2025-07-31 19:34                   ` Mina Almasry
2025-07-31 19:57                     ` Pavel Begunkov
2025-07-31 20:05                       ` Mina Almasry
2025-08-01  9:48                         ` Pavel Begunkov
2025-08-01  9:58                     ` Pavel Begunkov
2025-07-28 23:22           ` Mina Almasry
2025-07-29 16:41             ` Stanislav Fomichev
2025-07-29 17:01               ` Mina Almasry
2025-07-28 18:54 ` Mina Almasry [this message]
2025-07-28 19:42   ` Pavel Begunkov
2025-07-28 20:23     ` Mina Almasry
2025-07-28 20:57       ` Pavel Begunkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAHS8izMyhMFA5DwBmHNJpEfPLE6xUmA453V+tF4pdWAenbrV3w@mail.gmail.com \
    --to=almasrymina@google.com \
    --cc=andrew+netdev@lunn.ch \
    --cc=ap420073@gmail.com \
    --cc=asml.silence@gmail.com \
    --cc=davem@davemloft.net \
    --cc=dtatulea@nvidia.com \
    --cc=dw@davidwei.uk \
    --cc=edumazet@google.com \
    --cc=horms@kernel.org \
    --cc=io-uring@vger.kernel.org \
    --cc=kuba@kernel.org \
    --cc=michael.chan@broadcom.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=sdf@fomichev.me \
    --cc=willemb@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox