public inbox for [email protected]
 help / color / mirror / Atom feed
From: Mina Almasry <[email protected]>
To: David Wei <[email protected]>
Cc: [email protected], [email protected],
	 Jens Axboe <[email protected]>,
	Pavel Begunkov <[email protected]>,
	 Jakub Kicinski <[email protected]>,
	Paolo Abeni <[email protected]>,
	 "David S. Miller" <[email protected]>,
	Eric Dumazet <[email protected]>,
	 Jesper Dangaard Brouer <[email protected]>,
	David Ahern <[email protected]>
Subject: Re: [RFC PATCH v3 03/20] net: page pool: rework ppiov life cycle
Date: Tue, 19 Dec 2023 15:35:47 -0800	[thread overview]
Message-ID: <CAHS8izPqKg73ub5WUg=EBdd8ifCcAuh69LB0pBUSw6t+5NGjjQ@mail.gmail.com> (raw)
In-Reply-To: <[email protected]>

On Tue, Dec 19, 2023 at 1:04 PM David Wei <[email protected]> wrote:
>
> From: Pavel Begunkov <[email protected]>
>
> NOT FOR UPSTREAM
> The final version will depend on how the ppiov infra looks like
>
> Page pool is tracking how many pages were allocated and returned, which
> serves for refcounting the pool, and so every page/frag allocated should
> eventually come back to the page pool via appropriate ways, e.g. by
> calling page_pool_put_page().
>
> When it comes to normal page pools (i.e. without memory providers
> attached), it's fine to return a page when it's still refcounted by
> somewhat in the stack, in which case we'll "detach" the page from the
> pool and rely on page refcount for it to return back to the kernel.
>
> Memory providers are different, at least ppiov based ones, they need
> all their buffers to eventually return back, so apart from custom pp
> ->release handlers, we'll catch when someone puts down a ppiov and call
> its memory provider to handle it, i.e. __page_pool_iov_free().
>
> The first problem is that __page_pool_iov_free() hard coded devmem
> handling, and other providers need a flexible way to specify their own
> callbacks.
>
> The second problem is that it doesn't go through the generic page pool
> paths and so can't do the mentioned pp accounting right. And we can't
> even safely rely on page_pool_put_page() to be called somewhere before
> to do the pp refcounting, because then the page pool might get destroyed
> and ppiov->pp would point to garbage.
>
> The solution is to make the pp ->release callback to be responsible for
> properly recycling its buffers, e.g. calling what was
> __page_pool_iov_free() before in case of devmem.
> page_pool_iov_put_many() will be returning buffers to the page pool.
>

Hmm this patch is working on top of slightly outdated code. I think
the correct solution here is to transition to using pp_ref_count for
refcounting the ppiovs/niovs. Once we do that, we no longer need
special refcounting for ppiovs, they're refcounted identically to
pages, makes the pp more maintainable, gives us some unified handling
of page pool refcounting, it becomes trivial to support fragmented
pages which require a pp_ref_count, and all the code in this patch can
go away.

I'm unsure if this patch is just because you haven't rebased to my
latest RFC (which is completely fine by me), or if you actually think
using pp_ref_count for refcounting is wrong and want us to go back to
the older model which required some custom handling for ppiov and
disabled frag support. I'm guessing it's the former, but please
correct if I'm wrong.

[1] https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/

> Signed-off-by: Pavel Begunkov <[email protected]>
> Signed-off-by: David Wei <[email protected]>
> ---
>  include/net/page_pool/helpers.h | 15 ++++++++---
>  net/core/page_pool.c            | 46 +++++++++++++++++----------------
>  2 files changed, 35 insertions(+), 26 deletions(-)
>
> diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
> index 92804c499833..ef380ee8f205 100644
> --- a/include/net/page_pool/helpers.h
> +++ b/include/net/page_pool/helpers.h
> @@ -137,15 +137,22 @@ static inline void page_pool_iov_get_many(struct page_pool_iov *ppiov,
>         refcount_add(count, &ppiov->refcount);
>  }
>
> -void __page_pool_iov_free(struct page_pool_iov *ppiov);
> +static inline bool page_pool_iov_sub_and_test(struct page_pool_iov *ppiov,
> +                                             unsigned int count)
> +{
> +       return refcount_sub_and_test(count, &ppiov->refcount);
> +}
>
>  static inline void page_pool_iov_put_many(struct page_pool_iov *ppiov,
>                                           unsigned int count)
>  {
> -       if (!refcount_sub_and_test(count, &ppiov->refcount))
> -               return;
> +       if (count > 1)
> +               WARN_ON_ONCE(page_pool_iov_sub_and_test(ppiov, count - 1));
>
> -       __page_pool_iov_free(ppiov);
> +#ifdef CONFIG_PAGE_POOL
> +       page_pool_put_defragged_page(ppiov->pp, page_pool_mangle_ppiov(ppiov),
> +                                    -1, false);
> +#endif
>  }
>
>  /* page pool mm helpers */
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 38eff947f679..ecf90a1ccabe 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -599,6 +599,16 @@ void __page_pool_release_page_dma(struct page_pool *pool, struct page *page)
>         page_pool_set_dma_addr(page, 0);
>  }
>
> +static void page_pool_return_provider(struct page_pool *pool, struct page *page)
> +{
> +       int count;
> +
> +       if (pool->mp_ops->release_page(pool, page)) {
> +               count = atomic_inc_return_relaxed(&pool->pages_state_release_cnt);
> +               trace_page_pool_state_release(pool, page, count);
> +       }
> +}
> +
>  /* Disconnects a page (from a page_pool).  API users can have a need
>   * to disconnect a page (from a page_pool), to allow it to be used as
>   * a regular page (that will eventually be returned to the normal
> @@ -607,13 +617,13 @@ void __page_pool_release_page_dma(struct page_pool *pool, struct page *page)
>  void page_pool_return_page(struct page_pool *pool, struct page *page)
>  {
>         int count;
> -       bool put;
>
> -       put = true;
> -       if (static_branch_unlikely(&page_pool_mem_providers) && pool->mp_ops)
> -               put = pool->mp_ops->release_page(pool, page);
> -       else
> -               __page_pool_release_page_dma(pool, page);
> +       if (static_branch_unlikely(&page_pool_mem_providers) && pool->mp_ops) {
> +               page_pool_return_provider(pool, page);
> +               return;
> +       }
> +
> +       __page_pool_release_page_dma(pool, page);
>
>         /* This may be the last page returned, releasing the pool, so
>          * it is not safe to reference pool afterwards.
> @@ -621,10 +631,8 @@ void page_pool_return_page(struct page_pool *pool, struct page *page)
>         count = atomic_inc_return_relaxed(&pool->pages_state_release_cnt);
>         trace_page_pool_state_release(pool, page, count);
>
> -       if (put) {
> -               page_pool_clear_pp_info(page);
> -               put_page(page);
> -       }
> +       page_pool_clear_pp_info(page);
> +       put_page(page);
>         /* An optimization would be to call __free_pages(page, pool->p.order)
>          * knowing page is not part of page-cache (thus avoiding a
>          * __page_cache_release() call).
> @@ -1034,15 +1042,6 @@ void page_pool_update_nid(struct page_pool *pool, int new_nid)
>  }
>  EXPORT_SYMBOL(page_pool_update_nid);
>
> -void __page_pool_iov_free(struct page_pool_iov *ppiov)
> -{
> -       if (ppiov->pp->mp_ops != &dmabuf_devmem_ops)
> -               return;
> -
> -       netdev_free_devmem(ppiov);
> -}
> -EXPORT_SYMBOL_GPL(__page_pool_iov_free);
> -
>  /*** "Dmabuf devmem memory provider" ***/
>
>  static int mp_dmabuf_devmem_init(struct page_pool *pool)
> @@ -1093,9 +1092,12 @@ static bool mp_dmabuf_devmem_release_page(struct page_pool *pool,
>                 return false;
>
>         ppiov = page_to_page_pool_iov(page);
> -       page_pool_iov_put_many(ppiov, 1);
> -       /* We don't want the page pool put_page()ing our page_pool_iovs. */
> -       return false;
> +
> +       if (!page_pool_iov_sub_and_test(ppiov, 1))
> +               return false;
> +       netdev_free_devmem(ppiov);
> +       /* tell page_pool that the ppiov is released */
> +       return true;
>  }
>
>  const struct pp_memory_provider_ops dmabuf_devmem_ops = {
> --
> 2.39.3
>


-- 
Thanks,
Mina

  reply	other threads:[~2023-12-19 23:36 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-19 21:03 [RFC PATCH v3 00/20] Zero copy Rx using io_uring David Wei
2023-12-19 21:03 ` [RFC PATCH v3 01/20] net: page_pool: add ppiov mangling helper David Wei
2023-12-19 23:22   ` Mina Almasry
2023-12-19 23:59     ` Pavel Begunkov
2023-12-19 21:03 ` [RFC PATCH v3 02/20] tcp: don't allow non-devmem originated ppiov David Wei
2023-12-19 23:24   ` Mina Almasry
2023-12-20  1:29     ` Pavel Begunkov
2024-01-02 16:11       ` Mina Almasry
2023-12-19 21:03 ` [RFC PATCH v3 03/20] net: page pool: rework ppiov life cycle David Wei
2023-12-19 23:35   ` Mina Almasry [this message]
2023-12-20  0:49     ` Pavel Begunkov
2023-12-19 21:03 ` [RFC PATCH v3 04/20] net: enable napi_pp_put_page for ppiov David Wei
2023-12-19 21:03 ` [RFC PATCH v3 05/20] net: page_pool: add ->scrub mem provider callback David Wei
2023-12-19 21:03 ` [RFC PATCH v3 06/20] io_uring: separate header for exported net bits David Wei
2023-12-20 16:01   ` Jens Axboe
2023-12-19 21:03 ` [RFC PATCH v3 07/20] io_uring: add interface queue David Wei
2023-12-20 16:13   ` Jens Axboe
2023-12-20 16:23     ` Pavel Begunkov
2023-12-21  1:44     ` David Wei
2023-12-21 17:57   ` Willem de Bruijn
2023-12-30 16:25     ` Pavel Begunkov
2023-12-31 22:25       ` Willem de Bruijn
2023-12-19 21:03 ` [RFC PATCH v3 08/20] io_uring: add mmap support for shared ifq ringbuffers David Wei
2023-12-20 16:13   ` Jens Axboe
2023-12-19 21:03 ` [RFC PATCH v3 09/20] netdev: add XDP_SETUP_ZC_RX command David Wei
2023-12-19 21:03 ` [RFC PATCH v3 10/20] io_uring: setup ZC for an Rx queue when registering an ifq David Wei
2023-12-20 16:06   ` Jens Axboe
2023-12-20 16:24     ` Pavel Begunkov
2023-12-19 21:03 ` [RFC PATCH v3 11/20] io_uring/zcrx: implement socket registration David Wei
2023-12-19 21:03 ` [RFC PATCH v3 12/20] io_uring: add ZC buf and pool David Wei
2023-12-19 21:03 ` [RFC PATCH v3 13/20] io_uring: implement pp memory provider for zc rx David Wei
2023-12-19 23:44   ` Mina Almasry
2023-12-20  0:39     ` Pavel Begunkov
2023-12-21 19:36   ` Pavel Begunkov
2023-12-19 21:03 ` [RFC PATCH v3 14/20] net: page pool: add io_uring memory provider David Wei
2023-12-19 23:39   ` Mina Almasry
2023-12-20  0:04     ` Pavel Begunkov
2023-12-19 21:03 ` [RFC PATCH v3 15/20] io_uring: add io_recvzc request David Wei
2023-12-20 16:27   ` Jens Axboe
2023-12-20 17:04     ` Pavel Begunkov
2023-12-20 18:09       ` Jens Axboe
2023-12-21 18:59         ` Pavel Begunkov
2023-12-21 21:32           ` Jens Axboe
2023-12-30 21:15             ` Pavel Begunkov
2023-12-19 21:03 ` [RFC PATCH v3 16/20] net: execute custom callback from napi David Wei
2023-12-19 21:03 ` [RFC PATCH v3 17/20] io_uring/zcrx: add copy fallback David Wei
2023-12-19 21:03 ` [RFC PATCH v3 18/20] veth: add support for io_uring zc rx David Wei
2023-12-19 21:03 ` [RFC PATCH v3 19/20] net: page pool: generalise ppiov dma address get David Wei
2023-12-21 19:51   ` Mina Almasry
2023-12-19 21:03 ` [RFC PATCH v3 20/20] bnxt: enable io_uring zc page pool David Wei

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHS8izPqKg73ub5WUg=EBdd8ifCcAuh69LB0pBUSw6t+5NGjjQ@mail.gmail.com' \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox