From: David Wei <[email protected]>
To: [email protected], [email protected]
Cc: Jens Axboe <[email protected]>,
	Pavel Begunkov <[email protected]>,
	Jakub Kicinski <[email protected]>, Paolo Abeni <[email protected]>,
	"David S. Miller" <[email protected]>,
	Eric Dumazet <[email protected]>,
	Jesper Dangaard Brouer <[email protected]>,
	David Ahern <[email protected]>,
	Mina Almasry <[email protected]>,
	Willem de Bruijn <[email protected]>,
	Dragos Tatulea <[email protected]>
Subject: [RFC PATCH v2 00/20] Zero copy Rx using io_uring
Date: Tue,  7 Nov 2023 13:40:25 -0800
Message-ID: <[email protected]>

Changes in RFC v2:
------------------

* Added copy fallback support if userspace memory allocated for ZC Rx
  runs out, or if header splitting or flow steering fails.
* Added veth support for ZC Rx, for testing and demonstration. We will
  need to figure out what driver would be best for such testing
  functionality in the future. Perhaps netdevsim?
* Added socket registration API to io_uring to associate specific
  sockets with ifqs/Rx queues for ZC.
* Added multi-socket support, such that multiple connections can be
  steered into the same hardware Rx queue.
* Added Netbench server/client support.

Known deficiencies that we will address in a future patchset:

* Rebase on top of Kuba's page pool memory provider RFC.
* Proper test driver + selftests, maybe netdevsim.
* Further optimisation work.
* ...and more.

We are looking for feedback on our approach. Here are some example
points we would like to specifically discuss:

* Use of bpf_netdev_command to set up a hardware Rx queue for ZC?
* Tagging page private fields with a magic cookie to distinguish special
  userspace pages used for ZC Rx. This is used when reading skbs from a
  socket in io_uring to decide what to do.

This patchset proposes adding zero copy network Rx to io_uring. With
it, userspace can register a region of host memory into which a NIC
writes received data directly using DMA, with no kernel-to-user copy.

Full kernel tree, including some out-of-tree BNXT changes:

https://github.com/spikeh/linux/tree/zcrx_sil

On the userspace side, support is added to both liburing and Netbench:

https://github.com/spikeh/liburing/tree/zcrx2
https://github.com/spikeh/netbench/tree/zcrx

If you would like to try out this patchset, build and boot the kernel
tree, then build Netbench against liburing, all from the forks above.

Run setup.sh first:

https://gist.github.com/isilence/e6a28ce41a545a261566672104afa461

Then run the following commands:

sudo ip netns exec nsserv ./netbench --server_only 1 --v6 false \
    --rx "io_uring --provide_buffers 0 --use_zc 1 \
    --zc_pool_pages 16384 --zc_ifname ptp-serv" --use_port 9999

sudo ip netns exec nscl ./netbench --client_only 1 --v6 false \
    --tx "epoll --threads 1 --per_thread 1 --size 2800" \
    --host 10.10.10.20 --use_port 9999

Hardware support is added to the Broadcom BNXT driver. This patchset
and the userspace code were tested on an Intel Xeon Platinum 8321HC CPU
with a Broadcom BCM57504 NIC.

Early benchmarks using this prototype, with iperf3 as the load
generator, showed a ~50% reduction in overall system memory bandwidth,
as measured using perf counters. Note that DDIO must be disabled on
Intel systems.

This patchset is based on the work by Jonathan Lemon
<[email protected]>:
https://lore.kernel.org/io-uring/[email protected]/

David Wei (13):
  io_uring: add interface queue
  io_uring: add mmap support for shared ifq ringbuffers
  netdev: add XDP_SETUP_ZC_RX command
  io_uring: setup ZC for an Rx queue when registering an ifq
  io_uring: add ZC buf and pool
  io_uring: add ZC pool API
  skbuff: add SKBFL_FIXED_FRAG and skb_fixed()
  io_uring: allocate a uarg for freeing zero copy skbs
  io_uring: delay ZC pool destruction
  net: add data pool
  io_uring: add io_recvzc request
  bnxt: use data pool
  io_uring/zcrx: add multi socket support per Rx queue

Pavel Begunkov (7):
  io_uring/zcrx: implement socket registration
  io_uring/zcrx: propagate ifq down the stack
  io_uring/zcrx: introduce io_zc_get_rbuf_cqe
  io_uring/zcrx: add copy fallback
  net: execute custom callback from napi
  io_uring/zcrx: copy fallback to ring buffers
  veth: add support for io_uring zc rx

 drivers/net/ethernet/broadcom/bnxt/bnxt.c     |  61 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.h     |   5 +
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c |   3 +
 drivers/net/veth.c                            | 179 +++-
 include/linux/io_uring.h                      |  33 +
 include/linux/io_uring_types.h                |   6 +
 include/linux/net.h                           |   2 +
 include/linux/netdevice.h                     |   7 +
 include/linux/skbuff.h                        |  10 +-
 include/net/busy_poll.h                       |   2 +
 include/net/data_pool.h                       |  74 ++
 include/net/netdev_rx_queue.h                 |   2 +
 include/uapi/linux/io_uring.h                 |  61 ++
 io_uring/Makefile                             |   3 +-
 io_uring/io_uring.c                           |  19 +
 io_uring/kbuf.c                               |  27 +
 io_uring/kbuf.h                               |   5 +
 io_uring/net.c                                | 136 ++-
 io_uring/opdef.c                              |  16 +
 io_uring/zc_rx.c                              | 967 ++++++++++++++++++
 io_uring/zc_rx.h                              |  69 ++
 net/core/dev.c                                |  51 +
 net/socket.c                                  |   1 +
 23 files changed, 1721 insertions(+), 18 deletions(-)
 create mode 100644 include/net/data_pool.h
 create mode 100644 io_uring/zc_rx.c
 create mode 100644 io_uring/zc_rx.h

-- 
2.39.3

