public inbox for [email protected]
From: Pavel Begunkov <[email protected]>
To: David Wei <[email protected]>,
	[email protected], [email protected]
Cc: Jens Axboe <[email protected]>, Jakub Kicinski <[email protected]>,
	Paolo Abeni <[email protected]>,
	"David S. Miller" <[email protected]>,
	Eric Dumazet <[email protected]>,
	Jesper Dangaard Brouer <[email protected]>,
	David Ahern <[email protected]>,
	Mina Almasry <[email protected]>
Subject: Re: [RFC PATCH v3 13/20] io_uring: implement pp memory provider for zc rx
Date: Thu, 21 Dec 2023 19:36:27 +0000	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>

On 12/19/23 21:03, David Wei wrote:
> From: Pavel Begunkov <[email protected]>
> 
> We're adding a new pp memory provider to implement io_uring zerocopy
> receive. It'll be "registered" in pp and used in later patches.
> 
> The typical life cycle of a buffer goes as follows: first it's allocated
> to a driver with the initial refcount set to 1. The driver fills it
> with data, puts it into an skb and passes it down the stack, where it
> gets queued up to a socket. Later, a zc io_uring request will be
> receiving data from the socket from a task context. At that point
> io_uring will tell the userspace that this buffer has some data by
> posting an appropriate completion. It'll also elevate the refcount by
> IO_ZC_RX_UREF, so the buffer is not recycled while userspace is reading
> the data. When the userspace is done with the buffer it should return it
> to io_uring by adding an entry to the buffer refill ring. When
> necessary, io_uring will poll the refill ring, compare references
> including IO_ZC_RX_UREF and reuse the buffer.
> 
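For illustration, the userspace side of returning a buffer boils down to
filling an rqe and bumping the shared ring tail, roughly like below. The
struct and field names other than rqe->off are made up for the example,
not the actual uAPI added by this series:

struct zc_rx_refill {
	struct io_uring_rbuf_rqe *rqes;	/* mmap'ed rqe array */
	unsigned int *ktail;		/* shared rq tail, read by the kernel */
	unsigned int rq_entries;	/* power of two */
	unsigned int tail;		/* local producer tail */
};

static void zc_rx_recycle_buf(struct zc_rx_refill *r, __u64 buf_off)
{
	unsigned int mask = r->rq_entries - 1;

	/* tell the kernel which buffer userspace is done with */
	r->rqes[r->tail & mask].off = buf_off;
	/* release store, so the rqe write is visible before the tail bump */
	__atomic_store_n(r->ktail, r->tail + 1, __ATOMIC_RELEASE);
	r->tail++;
}
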
> Initially, all buffers are placed in a spinlock-protected ->freelist.
> It's a slow path stash, where buffers are considered to be unallocated
> and not exposed to the core page pool. On allocation, pp will first try
> all its caches, and then the ->alloc_pages callback if everything else
> failed.
> 
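Side note for reviewers: the allocation order in the core, as I read it,
is roughly the pseudo-code below, with the provider callback (i.e.
io_pp_zc_alloc_pages() here) only reached once the pp caches are dry.
Helper names are simplified, not the exact upstream functions:

static struct page *pp_alloc_order_sketch(struct page_pool *pool, gfp_t gfp)
{
	struct page *page;

	/* 1. per-pool alloc.cache array, lockless, napi context only */
	page = pp_cache_pop(pool);
	if (!page)
		/* 2. refill the cache from the pool's ptr_ring */
		page = pp_ptr_ring_refill(pool);
	if (!page)
		/* 3. everything is empty, ask the memory provider */
		page = pool->mp_ops->alloc_pages(pool, gfp);
	return page;
}
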
> The hot path for io_pp_zc_alloc_pages() is to grab pages from the refill
> ring. The consumption from the ring is always done in the attached napi
> context, so no additional synchronisation is required. If that fails,
> we'll get buffers from the ->freelist.
> 
> Note: only buffers in ->freelist are considered unallocated for the page
> pool, so we only bump pages_state_hold_cnt when allocating from there.
> Subsequently, as page_pool_return_page() and others bump the
> ->pages_state_release_cnt counter, io_pp_zc_release_page() can only use
> ->freelist, which is not a problem as it's not a hot path.
> 
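To spell out why the accounting has to be this way: the core tracks
in-flight pages as the difference between the two counters, along the
lines of (quoting page_pool_inflight() from memory, not verbatim):

static s32 page_pool_inflight(struct page_pool *pool)
{
	u32 release_cnt = atomic_read(&pool->pages_state_release_cnt);
	u32 hold_cnt = READ_ONCE(pool->pages_state_hold_cnt);

	/* has to drop to zero before the pool can be destroyed */
	return (s32)(hold_cnt - release_cnt);
}

Buffers recycled via the refill ring never go through
page_pool_return_page(), so no release is counted for them, and
allocating them back out of the ring must not count a hold either,
otherwise the counters would never balance on destruction.
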
> Signed-off-by: Pavel Begunkov <[email protected]>
> Signed-off-by: David Wei <[email protected]>
> ---
...
> +static void io_zc_rx_ring_refill(struct page_pool *pp,
> +				 struct io_zc_rx_ifq *ifq)
> +{
> +	unsigned int entries = io_zc_rx_rqring_entries(ifq);
> +	unsigned int mask = ifq->rq_entries - 1;
> +	struct io_zc_rx_pool *pool = ifq->pool;
> +
> +	if (unlikely(!entries))
> +		return;
> +
> +	while (entries--) {
> +		unsigned int rq_idx = ifq->cached_rq_head++ & mask;
> +		struct io_uring_rbuf_rqe *rqe = &ifq->rqes[rq_idx];
> +		u32 pgid = rqe->off / PAGE_SIZE;
> +		struct io_zc_rx_buf *buf = &pool->bufs[pgid];
> +
> +		if (!io_zc_rx_put_buf_uref(buf))
> +			continue;

It's worth noting that here we have to add a DMA sync, as per
discussions with the page pool folks.
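
Something along the lines of what page_pool_dma_sync_for_device() does
for normal pages, e.g. (untested sketch, io_zc_buf_dma_addr() is a
placeholder for however we end up getting the buffer's dma_addr_t):

	if (pp->p.flags & PP_FLAG_DMA_SYNC_DEV)
		dma_sync_single_range_for_device(pp->p.dev,
						 io_zc_buf_dma_addr(buf),
						 pp->p.offset, pp->p.max_len,
						 pp->p.dma_dir);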

> +		io_zc_add_pp_cache(pp, buf);
> +		if (pp->alloc.count >= PP_ALLOC_CACHE_REFILL)
> +			break;
> +	}
> +	smp_store_release(&ifq->ring->rq.head, ifq->cached_rq_head);
> +}
> +
> +static void io_zc_rx_refill_slow(struct page_pool *pp, struct io_zc_rx_ifq *ifq)
> +{
> +	struct io_zc_rx_pool *pool = ifq->pool;
> +
> +	spin_lock_bh(&pool->freelist_lock);
> +	while (pool->free_count && pp->alloc.count < PP_ALLOC_CACHE_REFILL) {
> +		struct io_zc_rx_buf *buf;
> +		u32 pgid;
> +
> +		pgid = pool->freelist[--pool->free_count];
> +		buf = &pool->bufs[pgid];
> +
> +		io_zc_add_pp_cache(pp, buf);
> +		pp->pages_state_hold_cnt++;
> +		trace_page_pool_state_hold(pp, io_zc_buf_to_pp_page(buf),
> +					   pp->pages_state_hold_cnt);
> +	}
> +	spin_unlock_bh(&pool->freelist_lock);
> +}
...

-- 
Pavel Begunkov


Thread overview: 50+ messages
2023-12-19 21:03 [RFC PATCH v3 00/20] Zero copy Rx using io_uring David Wei
2023-12-19 21:03 ` [RFC PATCH v3 01/20] net: page_pool: add ppiov mangling helper David Wei
2023-12-19 23:22   ` Mina Almasry
2023-12-19 23:59     ` Pavel Begunkov
2023-12-19 21:03 ` [RFC PATCH v3 02/20] tcp: don't allow non-devmem originated ppiov David Wei
2023-12-19 23:24   ` Mina Almasry
2023-12-20  1:29     ` Pavel Begunkov
2024-01-02 16:11       ` Mina Almasry
2023-12-19 21:03 ` [RFC PATCH v3 03/20] net: page pool: rework ppiov life cycle David Wei
2023-12-19 23:35   ` Mina Almasry
2023-12-20  0:49     ` Pavel Begunkov
2023-12-19 21:03 ` [RFC PATCH v3 04/20] net: enable napi_pp_put_page for ppiov David Wei
2023-12-19 21:03 ` [RFC PATCH v3 05/20] net: page_pool: add ->scrub mem provider callback David Wei
2023-12-19 21:03 ` [RFC PATCH v3 06/20] io_uring: separate header for exported net bits David Wei
2023-12-20 16:01   ` Jens Axboe
2023-12-19 21:03 ` [RFC PATCH v3 07/20] io_uring: add interface queue David Wei
2023-12-20 16:13   ` Jens Axboe
2023-12-20 16:23     ` Pavel Begunkov
2023-12-21  1:44     ` David Wei
2023-12-21 17:57   ` Willem de Bruijn
2023-12-30 16:25     ` Pavel Begunkov
2023-12-31 22:25       ` Willem de Bruijn
2023-12-19 21:03 ` [RFC PATCH v3 08/20] io_uring: add mmap support for shared ifq ringbuffers David Wei
2023-12-20 16:13   ` Jens Axboe
2023-12-19 21:03 ` [RFC PATCH v3 09/20] netdev: add XDP_SETUP_ZC_RX command David Wei
2023-12-19 21:03 ` [RFC PATCH v3 10/20] io_uring: setup ZC for an Rx queue when registering an ifq David Wei
2023-12-20 16:06   ` Jens Axboe
2023-12-20 16:24     ` Pavel Begunkov
2023-12-19 21:03 ` [RFC PATCH v3 11/20] io_uring/zcrx: implement socket registration David Wei
2023-12-19 21:03 ` [RFC PATCH v3 12/20] io_uring: add ZC buf and pool David Wei
2023-12-19 21:03 ` [RFC PATCH v3 13/20] io_uring: implement pp memory provider for zc rx David Wei
2023-12-19 23:44   ` Mina Almasry
2023-12-20  0:39     ` Pavel Begunkov
2023-12-21 19:36   ` Pavel Begunkov [this message]
2023-12-19 21:03 ` [RFC PATCH v3 14/20] net: page pool: add io_uring memory provider David Wei
2023-12-19 23:39   ` Mina Almasry
2023-12-20  0:04     ` Pavel Begunkov
2023-12-19 21:03 ` [RFC PATCH v3 15/20] io_uring: add io_recvzc request David Wei
2023-12-20 16:27   ` Jens Axboe
2023-12-20 17:04     ` Pavel Begunkov
2023-12-20 18:09       ` Jens Axboe
2023-12-21 18:59         ` Pavel Begunkov
2023-12-21 21:32           ` Jens Axboe
2023-12-30 21:15             ` Pavel Begunkov
2023-12-19 21:03 ` [RFC PATCH v3 16/20] net: execute custom callback from napi David Wei
2023-12-19 21:03 ` [RFC PATCH v3 17/20] io_uring/zcrx: add copy fallback David Wei
2023-12-19 21:03 ` [RFC PATCH v3 18/20] veth: add support for io_uring zc rx David Wei
2023-12-19 21:03 ` [RFC PATCH v3 19/20] net: page pool: generalise ppiov dma address get David Wei
2023-12-21 19:51   ` Mina Almasry
2023-12-19 21:03 ` [RFC PATCH v3 20/20] bnxt: enable io_uring zc page pool David Wei
