From: Mina Almasry <almasrymina@google.com>
To: Paolo Abeni <pabeni@redhat.com>
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-doc@vger.kernel.org, io-uring@vger.kernel.org,
virtualization@lists.linux.dev, kvm@vger.kernel.org,
linux-kselftest@vger.kernel.org,
"David S. Miller" <davem@davemloft.net>,
"Eric Dumazet" <edumazet@google.com>,
"Jakub Kicinski" <kuba@kernel.org>,
"Simon Horman" <horms@kernel.org>,
"Donald Hunter" <donald.hunter@gmail.com>,
"Jonathan Corbet" <corbet@lwn.net>,
"Andrew Lunn" <andrew+netdev@lunn.ch>,
"Jeroen de Borst" <jeroendb@google.com>,
"Harshitha Ramamurthy" <hramamurthy@google.com>,
"Kuniyuki Iwashima" <kuniyu@amazon.com>,
"Willem de Bruijn" <willemb@google.com>,
"Jens Axboe" <axboe@kernel.dk>,
"Pavel Begunkov" <asml.silence@gmail.com>,
"David Ahern" <dsahern@kernel.org>,
"Neal Cardwell" <ncardwell@google.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
"Jason Wang" <jasowang@redhat.com>,
"Xuan Zhuo" <xuanzhuo@linux.alibaba.com>,
"Eugenio Pérez" <eperezma@redhat.com>,
"Stefan Hajnoczi" <stefanha@redhat.com>,
"Stefano Garzarella" <sgarzare@redhat.com>,
"Shuah Khan" <shuah@kernel.org>,
sdf@fomichev.me, dw@davidwei.uk,
"Jamal Hadi Salim" <jhs@mojatatu.com>,
"Victor Nogueira" <victor@mojatatu.com>,
"Pedro Tammela" <pctammela@mojatatu.com>,
"Samiullah Khawaja" <skhawaja@google.com>,
"Kaiyuan Zhang" <kaiyuanz@google.com>
Subject: Re: [PATCH net-next v13 4/9] net: devmem: Implement TX path
Date: Mon, 5 May 2025 12:21:51 -0700 [thread overview]
Message-ID: <CAHS8izOVV-NviR-Ty=hDdg29OCpJCQwW_K7B+mg1X=r3N7Lr7Q@mail.gmail.com> (raw)
In-Reply-To: <8cdf120d-a0f0-4dcc-a8f9-cea967ce6e7b@redhat.com>
On Mon, May 5, 2025 at 12:42 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On 5/2/25 9:20 PM, Mina Almasry wrote:
> > On Fri, May 2, 2025 at 4:47 AM Paolo Abeni <pabeni@redhat.com> wrote:
> >>> @@ -1078,7 +1092,8 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
> >>> zc = MSG_ZEROCOPY;
> >>> } else if (sock_flag(sk, SOCK_ZEROCOPY)) {
> >>> skb = tcp_write_queue_tail(sk);
> >>> - uarg = msg_zerocopy_realloc(sk, size, skb_zcopy(skb));
> >>> + uarg = msg_zerocopy_realloc(sk, size, skb_zcopy(skb),
> >>> + sockc_valid && !!sockc.dmabuf_id);
> >>
> >> If sock_cmsg_send() failed and the user did not provide a dmabuf_id,
> >> memory accounting will be incorrect.
> >
> > Forgive me but I don't see it. sockc_valid will be false, so
> > msg_zerocopy_realloc will do the normal MSG_ZEROCOPY accounting. Then
> > below whech check sockc_valid in place of where we did the
> > sock_cmsg_send before, and goto err. I assume the goto err should undo
> > the memory accounting in the new code as in the old code. Can you
> > elaborate on the bug you see?
>
> Uhm, I think I misread the condition used for msg_zerocopy_realloc()
> last argument.
>
> Re-reading it now it the problem I see is that if sock_cmsg_send() fails
> after correctly setting 'dmabuf_id', msg_zerocopy_realloc() will account
> the dmabuf memory, which looks unexpected.
>
This is my intention with the code, let me know if you think it's
actually wrong. In this scenario, sockc_valid will be false, so
msg_zerocopy_realloc() will account the dma-buf memory, then later if
!sockc_valid, we goto out_err which will net_zcopy_put and finally
unaccount the dmabuf memory. It is a bit weird indeed to account and
unaccount the dmabuf memory in this edge case but AFAICT it's
harmless? It also matches the scenario where the user uses
MSG_ZEROCOPY with an invalid cmsg. In that case the zerocopy memory
will be accounted in msg_zerocopy_realloc and unaccounted in
net_zcopy_put in the error path as well.
Improving this edge case is possible but maybe complicates the code.
Either the dmabuf id needs to be split up into its own parsing like
you suggested earlier, or we need to record that the user is
attempting to set a dmabuf id, but since the whole sock_cmsg_send
failed we may not know what the user intended to do at all.
> Somewhat related, I don't see any change to the msg_zerocopy/ubuf
> complete/cleanup path(s): what will happen to the devmem ubuf memory at
> uarg->complete() time? It looks like it will go unexpectedly through
> mm_unaccount_pinned_pages()???
>
Right, this works without a change in the cleanup path needed. When
the dmabuf id is provided, we skip calling mm_account_pinned_pages in
msg_zerocopy_alloc and msg_zerocopy_realloc, so we skip setting
uarg->mmp->user.
mm_unaccount_pinned_pages does nothing if uarg->mmp->user is not set:
void mm_unaccount_pinned_pages(struct mmpin *mmp)
{
if (mmp->user) {
atomic_long_sub(mmp->num_pg, &mmp->user->locked_vm);
free_uid(mmp->user);
}
}
Although maybe a comment would explain why it works to improve clarity.
--
Thanks,
Mina
next prev parent reply other threads:[~2025-05-05 19:22 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-29 3:26 [PATCH net-next v13 0/9] Device memory TCP TX Mina Almasry
2025-04-29 3:26 ` [PATCH net-next v13 1/9] netmem: add niov->type attribute to distinguish different net_iov types Mina Almasry
2025-04-29 3:26 ` [PATCH net-next v13 2/9] net: add get_netmem/put_netmem support Mina Almasry
2025-04-29 3:26 ` [PATCH net-next v13 3/9] net: devmem: TCP tx netlink api Mina Almasry
2025-04-29 3:26 ` [PATCH net-next v13 4/9] net: devmem: Implement TX path Mina Almasry
2025-05-02 11:47 ` Paolo Abeni
2025-05-02 11:51 ` Paolo Abeni
2025-05-02 19:25 ` Mina Almasry
2025-05-05 7:37 ` Paolo Abeni
2025-05-02 19:20 ` Mina Almasry
2025-05-05 7:36 ` Paolo Abeni
2025-05-05 19:21 ` Mina Almasry [this message]
2025-05-06 1:37 ` Jakub Kicinski
2025-04-29 3:26 ` [PATCH net-next v13 5/9] net: add devmem TCP TX documentation Mina Almasry
2025-04-29 3:26 ` [PATCH net-next v13 6/9] net: enable driver support for netmem TX Mina Almasry
2025-04-29 3:26 ` [PATCH net-next v13 7/9] gve: add netmem TX support to GVE DQO-RDA mode Mina Almasry
2025-04-29 3:26 ` [PATCH net-next v13 8/9] net: check for driver support in netmem TX Mina Almasry
2025-04-29 3:26 ` [PATCH net-next v13 9/9] selftests: ncdevmem: Implement devmem TCP TX Mina Almasry
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAHS8izOVV-NviR-Ty=hDdg29OCpJCQwW_K7B+mg1X=r3N7Lr7Q@mail.gmail.com' \
--to=almasrymina@google.com \
--cc=andrew+netdev@lunn.ch \
--cc=asml.silence@gmail.com \
--cc=axboe@kernel.dk \
--cc=corbet@lwn.net \
--cc=davem@davemloft.net \
--cc=donald.hunter@gmail.com \
--cc=dsahern@kernel.org \
--cc=dw@davidwei.uk \
--cc=edumazet@google.com \
--cc=eperezma@redhat.com \
--cc=horms@kernel.org \
--cc=hramamurthy@google.com \
--cc=io-uring@vger.kernel.org \
--cc=jasowang@redhat.com \
--cc=jeroendb@google.com \
--cc=jhs@mojatatu.com \
--cc=kaiyuanz@google.com \
--cc=kuba@kernel.org \
--cc=kuniyu@amazon.com \
--cc=kvm@vger.kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=mst@redhat.com \
--cc=ncardwell@google.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=pctammela@mojatatu.com \
--cc=sdf@fomichev.me \
--cc=sgarzare@redhat.com \
--cc=shuah@kernel.org \
--cc=skhawaja@google.com \
--cc=stefanha@redhat.com \
--cc=victor@mojatatu.com \
--cc=virtualization@lists.linux.dev \
--cc=willemb@google.com \
--cc=xuanzhuo@linux.alibaba.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox