From: Pavel Begunkov <asml.silence@gmail.com>
To: Mina Almasry <almasrymina@google.com>
Cc: Stanislav Fomichev <stfomichev@gmail.com>,
Jakub Kicinski <kuba@kernel.org>,
netdev@vger.kernel.org, io-uring@vger.kernel.org,
Eric Dumazet <edumazet@google.com>,
Willem de Bruijn <willemb@google.com>,
Paolo Abeni <pabeni@redhat.com>,
andrew+netdev@lunn.ch, horms@kernel.org, davem@davemloft.net,
sdf@fomichev.me, dw@davidwei.uk, michael.chan@broadcom.com,
dtatulea@nvidia.com, ap420073@gmail.com
Subject: Re: [RFC v1 00/22] Large rx buffer support for zcrx
Date: Fri, 1 Aug 2025 10:48:57 +0100
Message-ID: <58a592bf-3e88-4ad4-8a6e-37dd9319da99@gmail.com>
In-Reply-To: <CAHS8izM9238zKuFy1ifyigXmG8sbB8h=A=XqtLz-i0U2WM3vqw@mail.gmail.com>
On 7/31/25 21:05, Mina Almasry wrote:
> On Thu, Jul 31, 2025 at 12:56 PM Pavel Begunkov <asml.silence@gmail.com> wrote:
>>
...
>>>>>> If the setup is done outside, you can also setup rx-buf-len outside, no?
>>>>>
>>>>> You can't do it without assuming the memory layout, and that's
>>>>> the application's role to allocate buffers. Not to mention that
>>>>> often the app won't know about all specifics either and it'd be
>>>>> resolved on zcrx registration.
>>>>
>>>> I think, fundamentally, we need to distinguish:
>>>>
>>>> 1. chunk size of the memory pool (page pool order, niov size)
>>>> 2. chunk size of the rx queue entries (this is what this series calls
>>>> rx-buf-len), mostly influenced by MTU?
>>>>
>>>> For devmem (and same for iou?), we want an option to derive (2) from (1):
>>>> page pools with larger chunks need to generate larger rx entries.
>>>
>>> To be honest I'm not following. #1 and #2 seem the same to me.
>>> rx-buf-len is just the size of each rx buffer posted to the NIC.
>>>
>>> With pp_params.order = 0 (most common configuration today), rx-buf-len
>>> == 4K. Regardless of MTU. With pp_params.order=1, I'm guessing 8K
>>> then, again regardless of MTU.
>>
>> There are drivers that fragment the buffer they get from a page
>> pool and give smaller chunks to the hw. It's surely a good idea to
>> be more explicit on what's what, but from the whole setup and uapi
>> perspective I'm not too concerned.
>>
>> The parameter the user passes to zcrx must control 1. As for 2,
>> I'd expect the driver to use the passed size directly or fail
>> validation, but even if that's not the case, zcrx / devmem would
>> just continue to work without any change in uapi, so we have
>> the freedom to patch up the nuances later on if anything sticks
>> out.
>>
>
> I indeed forgot about driver-fragmenting. That does complicate things
> quite a bit.
>
> So AFAIU the intended behavior is that rx-buf-len refers to the memory
> size allocated by the driver (and thus the memory provider), but not
> necessarily the one posted by the driver if it's fragmenting that
> piece of memory? If so, that sounds good to me. Although I wonder if
Yep
> that could cause some unexpected behavior... Someone may configure
> rx-buf-len to 8K on one driver and get actual 8K packets, but then
> configure rx-buf-len on another driver and get 4K packets because the
> driver fragmented each buffer into 2...
That can already happen: the user can hope to get full buffers
but shouldn't assume they will. HW GRO can't be 100% reliable in
this respect in all circumstances. And I don't think it's sane for
driver implementations to do that. Fragmenting PAGE_SIZE because the
NIC needs smaller chunks or for some other compatibility reasons?
Sure, but then I don't see a reason for validating even larger buffers.
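
To make the two sizes discussed above concrete, here is a minimal
sketch of the arithmetic in Python. It is not kernel code: the 4K
PAGE_SIZE is taken from the quoted example, and both helper names are
hypothetical. It assumes a driver that, if it fragments at all, splits
each pool chunk evenly into hw-visible buffers.

```python
PAGE_SIZE = 4096  # assumed 4K base page, as in the quoted pp_params.order example


def pool_chunk_size(order: int, page_size: int = PAGE_SIZE) -> int:
    """Size of one memory pool chunk for a given page order (size 1 above)."""
    return page_size << order


def hw_buffers_per_chunk(chunk_size: int, hw_buf_size: int) -> int:
    """Number of hw-visible rx buffers carved from one chunk (size 2 above).

    Hypothetical model: the driver splits each chunk into equal pieces.
    """
    if hw_buf_size <= 0 or chunk_size % hw_buf_size:
        raise ValueError("hw buffer size must evenly divide the chunk size")
    return chunk_size // hw_buf_size


if __name__ == "__main__":
    chunk = pool_chunk_size(order=1)  # 8K chunk from the pool
    for hw_buf in (8192, 4096):       # non-fragmenting vs. fragmenting driver
        n = hw_buffers_per_chunk(chunk, hw_buf)
        print(f"chunk={chunk} hw_buf={hw_buf} buffers_per_chunk={n}")
```

With an 8K chunk, a driver that posts 4K buffers ends up with two hw
buffers per chunk, which is the case where the user asks for 8K and
sees 4K payloads.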
> I guess in the future there may be a knob that controls how much
> fragmentation the driver does?
Probably, but hopefully it won't be needed.
--
Pavel Begunkov