From: Pavel Begunkov <[email protected]>
To: Hao Xu <[email protected]>,
[email protected], [email protected],
[email protected]
Cc: Jakub Kicinski <[email protected]>,
Jonathan Lemon <[email protected]>,
"David S . Miller" <[email protected]>,
Willem de Bruijn <[email protected]>,
Eric Dumazet <[email protected]>,
David Ahern <[email protected]>, Jens Axboe <[email protected]>
Subject: Re: [RFC v2 02/19] skbuff: pass a struct ubuf_info in msghdr
Date: Wed, 12 Jan 2022 16:53:06 +0000 [thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
On 1/12/22 03:39, Hao Xu wrote:
> 在 2022/1/11 下午11:50, Pavel Begunkov 写道:
>> On 1/11/22 13:51, Hao Xu wrote:
>>> 在 2021/12/21 下午11:35, Pavel Begunkov 写道:
>>>> Instead of the net stack managing ubuf_info, allow to pass it in from
>>>> outside in a struct msghdr (in-kernel structure), so io_uring can make
>>>> use of it.
>>>>
>>>> Signed-off-by: Pavel Begunkov <[email protected]>
>>>> ---
>>> Hi Pavel,
>>> I've some confusions here since I have a lack of
>>> network knowledge.
>>> The first one is why do we make ubuf_info visible
>>> for io_uring. Why not just follow the old MSG_ZEROCOPY
>>> logic?
>>
>> I assume you mean leaving allocation up and so in socket awhile the
>> patchset let's io_uring to manage and control ubufs. In short,
>> performance and out convenience
>>
>> TL;DR;
>> First, we want a nice and uniform API with io_uring, i.e. posting
>> an CQE instead of polling an err queue/etc., and for that the network
>> will need to know about io_uring ctx in some way. As an alternative it
>> may theoretically be registered in socket, but it'll quickly turn into
>> a huge mess, consider that it's a many to many relation b/w io_uring and
>> sockets. The fact that io_uring holds refs to files will only complicate
>> it.
> Make sense to me, thanks.
>>
>> It will also limit API. For instance, we won't be able to use a single
>> ubuf with several different sockets.
> Is there any use cases for this multiple sockets with single
> notification?
Don't know, scatter send maybe? It's just one of those moments when
a design that feels right (to me) yields more flexibility than was
initially planned, which is definitely a good thing
>> Another problem is performance, registration or some other tricks
>> would some additional sync. It'd also need sync on use, say it's
>> just one rcu_read, but the problem that it only adds up to complexity
>> and prevents some other optimisations. E.g. we amortise to ~0 atomics
>> getting refs on skb setups based on guarantees io_uring provides, and
>> not only. SKBFL_MANAGED_FRAGS can only work with pages being controlled
>> by the issuer, and so it needs some context as currently provided by
>> ubuf. io_uring also caches ubufs, which relies on io_uring locking, so
>> it removes kmalloc/free for almost zero overhead.
>>
>>
>>> The second one, my understanding about the buffer
>>> lifecycle is that the kernel side informs
>>> the userspace by a cqe generated by the ubuf_info
>>> callback that all the buffers attaching to the
>>> same notifier is now free to use when all the data
>>> is sent, then why is the flush in 13/19 needed as
>>> it is at the submission period?
>>
>> Probably I wasn't clear enough. A user has to flush a notifier, only
>> then it's expected to post an CQE after all buffers attached to it
>> are freed. io_uring holds one ubuf ref, which will be release on flush.
> I see, I saw another ref inc in skb_zcopy_set() which I previously
> misunderstood and thus thought there was only one refcount. Thanks!
>> I also need to add a way to flush without send.
>>
>> Will spend some time documenting for next iteration.
--
Pavel Begunkov
next prev parent reply other threads:[~2022-01-12 16:53 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-12-21 15:35 [RFC v2 00/19] io_uring zerocopy tx Pavel Begunkov
2021-12-21 15:35 ` [RFC v2 01/19] skbuff: add SKBFL_DONT_ORPHAN flag Pavel Begunkov
2021-12-21 15:35 ` [RFC v2 02/19] skbuff: pass a struct ubuf_info in msghdr Pavel Begunkov
2022-01-11 13:51 ` Hao Xu
2022-01-11 15:50 ` Pavel Begunkov
2022-01-12 3:39 ` Hao Xu
2022-01-12 16:53 ` Pavel Begunkov [this message]
2021-12-21 15:35 ` [RFC v2 03/19] net: add zerocopy_sg_from_iter for bvec Pavel Begunkov
2021-12-21 15:35 ` [RFC v2 04/19] net: optimise page get/free for bvec zc Pavel Begunkov
2021-12-21 15:35 ` [RFC v2 05/19] net: don't track pfmemalloc for zc registered mem Pavel Begunkov
2021-12-21 15:35 ` [RFC v2 06/19] ipv4/udp: add support msgdr::msg_ubuf Pavel Begunkov
2021-12-21 15:35 ` [RFC v2 07/19] ipv6/udp: " Pavel Begunkov
2021-12-21 15:35 ` [RFC v2 08/19] ipv4: avoid partial copy for zc Pavel Begunkov
2021-12-21 15:35 ` [RFC v2 09/19] ipv6: " Pavel Begunkov
2021-12-21 15:35 ` [RFC v2 10/19] io_uring: add send notifiers registration Pavel Begunkov
2021-12-21 15:35 ` [RFC v2 11/19] io_uring: infrastructure for send zc notifications Pavel Begunkov
2021-12-21 15:35 ` [RFC v2 12/19] io_uring: wire send zc request type Pavel Begunkov
2021-12-21 15:35 ` [RFC v2 13/19] io_uring: add an option to flush zc notifications Pavel Begunkov
2021-12-21 15:35 ` [RFC v2 14/19] io_uring: opcode independent fixed buf import Pavel Begunkov
2021-12-21 15:35 ` [RFC v2 15/19] io_uring: sendzc with fixed buffers Pavel Begunkov
2021-12-21 15:35 ` [RFC v2 16/19] io_uring: cache struct ubuf_info Pavel Begunkov
2021-12-21 15:35 ` [RFC v2 17/19] io_uring: unclog ctx refs waiting with zc notifiers Pavel Begunkov
2021-12-21 15:35 ` [RFC v2 18/19] io_uring: task_work for notification delivery Pavel Begunkov
2021-12-21 15:35 ` [RFC v2 19/19] io_uring: optimise task referencing by notifiers Pavel Begunkov
2021-12-21 15:43 ` [RFC v2 00/19] io_uring zerocopy tx Pavel Begunkov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox