From: Pavel Begunkov <asml.silence@gmail.com>
To: netdev@vger.kernel.org, io-uring@vger.kernel.org
Cc: Michael Chan <michael.chan@broadcom.com>,
	Pavan Chebbi <pavan.chebbi@broadcom.com>,
	Andrew Lunn <andrew+netdev@lunn.ch>,
	"David S . Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Jesper Dangaard Brouer <hawk@kernel.org>,
	John Fastabend <john.fastabend@gmail.com>,
	Stanislav Fomichev <sdf@fomichev.me>,
	Simon Horman <horms@kernel.org>,
	Ilias Apalodimas <ilias.apalodimas@linaro.org>,
	Mina Almasry <almasrymina@google.com>,
	Pavel Begunkov <asml.silence@gmail.com>,
	Willem de Bruijn <willemb@google.com>,
	Dragos Tatulea <dtatulea@nvidia.com>,
	Saeed Mahameed <saeedm@nvidia.com>,
	Tariq Toukan <tariqt@nvidia.com>, Mark Bloch <mbloch@nvidia.com>,
	David Wei <dw@davidwei.uk>,
	linux-kernel@vger.kernel.org
Subject: [PATCH net-next v5 00/24][pull request] Add support for providers with large rx buffer
Date: Tue, 14 Oct 2025 14:01:20 +0100
Message-ID: <cover.1760440268.git.asml.silence@gmail.com>

Many modern network cards support configurable rx buffer lengths larger
than the typically used PAGE_SIZE. When paired with hw-gro, larger rx
buffer sizes can drastically reduce the number of buffers traversing
the stack and save a lot of processing time. Another benefit for memory
providers like zcrx is that userspace will be getting larger contiguous
chunks as well.

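To illustrate the buffer-count reduction, here is a small userspace
sketch (not kernel code) of how many rx buffers one hw-gro aggregated
packet consumes at a given buffer length; the function name and the
64K aggregate size are illustrative assumptions:

```c
#include <assert.h>

/* Number of rx buffers needed to hold one hw-gro aggregated packet
 * of aggr_bytes, given a per-buffer length of rx_buf_len. A partially
 * filled tail buffer still costs a whole buffer, hence the round-up. */
static unsigned int rx_bufs_per_aggregate(unsigned int aggr_bytes,
					  unsigned int rx_buf_len)
{
	return (aggr_bytes + rx_buf_len - 1) / rx_buf_len;
}
```

For a 64K aggregate, 4K buffers cost 16 fragments while 32K buffers
cost only 2, which is where the stack-traversal savings come from.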
This series adds the net infrastructure for memory providers to
configure the size and implements it for bnxt. It'll be used by
io_uring/zcrx, which is intentionally kept separate to simplify
merging. You can find a branch that includes the zcrx changes at [1]
and an example liburing program at [3].

It's an opt-in feature for drivers: they should advertise support for
the parameter in the qops and must check whether the hardware supports
the given size. It's limited to memory providers, as that drastically
simplifies the series compared with previous revisions and detangles
it from the ethtool API discussions.

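The opt-in contract described above can be sketched roughly as below.
This is a hedged userspace model, not the real driver API: the caps
struct, the qops flag, and the validation helper are all hypothetical
names standing in for whatever the driver actually implements.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical hardware limits a driver would know about. */
struct hw_caps {
	unsigned int rx_buf_len_min;
	unsigned int rx_buf_len_max;
};

/* Hypothetical queue ops: the driver opts in by setting the flag. */
struct queue_ops {
	bool supports_rx_buf_len;
};

/* Accept a provider-requested rx buffer length only if the driver
 * opted in, the length is a power of two, and it fits the hw range. */
static bool rx_buf_len_ok(const struct queue_ops *qops,
			  const struct hw_caps *caps,
			  unsigned int len)
{
	if (!qops->supports_rx_buf_len)
		return false;
	if (len == 0 || (len & (len - 1)))
		return false;
	return len >= caps->rx_buf_len_min && len <= caps->rx_buf_len_max;
}
```

A driver that does not set the flag rejects any provider-configured
size, which keeps the feature strictly opt-in.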
The idea was first floated by Saeed during netdev conf 2024. The
series also borrows some patches from [2].

Benchmarks with zcrx [3] show up to ~30% improvement in CPU util.
E.g., a comparison of 4K vs 32K buffers using a 200Gbit NIC, with napi
and userspace pinned to the same CPU:

4K buffers:
packets=23987040 (MB=2745098), rps=199559 (MB/s=22837)
CPU    %usr   %nice    %sys %iowait    %irq   %soft   %idle
  0    1.53    0.00   27.78    2.72    1.31   66.45    0.22

32K buffers:
packets=24078368 (MB=2755550), rps=200319 (MB/s=22924)
CPU    %usr   %nice    %sys %iowait    %irq   %soft   %idle
  0    0.69    0.00    8.26   31.65    1.83   57.00    0.57

netdev + zcrx changes:
[1] https://github.com/isilence/linux.git zcrx/large-buffers-v5
Per queue configuration series:
[2] https://lore.kernel.org/all/20250421222827.283737-1-kuba@kernel.org/
Liburing example:
[3] https://github.com/isilence/liburing.git zcrx/rx-buf-len
---
The following changes since commit 3a8660878839faadb4f1a6dd72c3179c1df56787:
  Linux 6.18-rc1 (2025-10-12 13:42:36 -0700)
are available in the Git repository at:
  https://github.com/isilence/linux.git tags/net-for-6.19-queue-rx-buf-len-v5
for you to fetch changes up to f389276330412ec4305fb423944261e78490f06a:
  eth: bnxt: allow providers to set rx buf size (2025-10-14 00:11:59 +0100)
v5: Remove all unnecessary bits like configuration via netlink, and
    multi-stage queue configuration.
v4: https://lore.kernel.org/all/cover.1760364551.git.asml.silence@gmail.com/
    - Update fbnic qops
    - Propagate max buf len for hns3
    - Use configured buf size in __bnxt_alloc_rx_netmem
    - Minor stylistic changes
v3: https://lore.kernel.org/all/cover.1755499375.git.asml.silence@gmail.com/
    - Rebased, excluded zcrx specific patches
    - Set agg_size_fac to 1 on warning
v2: https://lore.kernel.org/all/cover.1754657711.git.asml.silence@gmail.com/
    - Add MAX_PAGE_ORDER check on pp init
    - Applied comments rewording
    - Adjust pp.max_len based on order
    - Patch up mlx5 queue callbacks after rebase
    - Minor ->queue_mgmt_ops refactoring
    - Rebased to account for both fill level and agg_size_fac
    - Pass the provider's buf length in struct pp_memory_provider_params
      and apply it in __netdev_queue_confi().
    - Use ->supported_ring_params to validate driver support of the set
      qcfg parameters.
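The MAX_PAGE_ORDER check added in v2 can be modelled in userspace as
below. The PAGE_SHIFT and MAX_PAGE_ORDER values are typical x86-64
defaults assumed for the example, and the helper names are
illustrative rather than the kernel's:

```c
#include <assert.h>

#define EXAMPLE_PAGE_SHIFT	12	/* 4K base pages, assumed */
#define EXAMPLE_MAX_PAGE_ORDER	10	/* common kernel default, assumed */

/* Smallest allocation order whose size covers len bytes,
 * mirroring what the kernel's get_order() computes. */
static int buf_len_to_order(unsigned int len)
{
	int order = 0;

	while ((1u << (order + EXAMPLE_PAGE_SHIFT)) < len)
		order++;
	return order;
}

/* Reject buffer lengths that would need a higher-order
 * allocation than the page allocator supports. */
static int pp_order_valid(unsigned int len)
{
	return buf_len_to_order(len) <= EXAMPLE_MAX_PAGE_ORDER;
}
```

So a 32K buffer maps to an order-3 allocation and passes, while a
length beyond the maximum order would be rejected at page pool init.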
Jakub Kicinski (1):
  eth: bnxt: adjust the fill level of agg queues with larger buffers
Pavel Begunkov (5):
  net: page_pool: sanitise allocation order
  net: memzero mp params when closing a queue
  net: let pp memory provider to specify rx buf len
  eth: bnxt: store rx buffer size per queue
  eth: bnxt: allow providers to set rx buf size
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     | 118 ++++++++++++++----
 drivers/net/ethernet/broadcom/bnxt/bnxt.h     |   2 +
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c |   6 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h |   2 +-
 include/net/netdev_queues.h                   |   9 ++
 include/net/page_pool/types.h                 |   1 +
 net/core/netdev_rx_queue.c                    |  14 ++-
 net/core/page_pool.c                          |   3 +
 8 files changed, 118 insertions(+), 37 deletions(-)
-- 
2.49.0