From: Jens Axboe <axboe@kernel.dk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: io-uring <io-uring@vger.kernel.org>, netdev <netdev@vger.kernel.org>
Subject: [GIT PULL] io_uring network zero-copy receive support
Date: Thu, 27 Mar 2025 05:46:21 -0600
Message-ID: <12e0af8c-8417-41d5-9d47-408556b50322@kernel.dk>

Hi Linus,

This pull request adds support for zero-copy receive with io_uring,
enabling fast bulk receive of data directly into application memory
rather than needing to copy it out of kernel memory. While this version
only supports host memory, as that was the initial target, other memory
types are planned as well, with GPU memory coming next.
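
To give an idea of what this looks like from the application side, the
setup step is roughly as follows. This is a sketch only - the interface
and queue ids and the sizes are placeholders, error handling is
skipped, and the structs are the ones added by this series (see
include/uapi/linux/io_uring.h); Documentation/networking/iou-zcrx.rst
from this pull has the complete example.

  #include <sys/mman.h>
  #include <sys/syscall.h>
  #include <unistd.h>
  #include <linux/io_uring.h>

  static int setup_zcrx(int ring_fd, unsigned int ifindex, unsigned int rxq)
  {
  	size_t area_size = 64 * 1024 * 1024;	/* packet data lands here */
  	unsigned int rq_entries = 4096;		/* refill queue entries */
  	/* refill ring: a page for the head/tail header plus the rqe array */
  	size_t ring_size = 4096 + rq_entries * sizeof(struct io_uring_zcrx_rqe);

  	void *area = mmap(NULL, area_size, PROT_READ | PROT_WRITE,
  			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  	void *ring = mmap(NULL, ring_size, PROT_READ | PROT_WRITE,
  			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

  	/* user memory backing the shared refill ring */
  	struct io_uring_region_desc region = {
  		.user_addr	= (__u64)(unsigned long)ring,
  		.size		= ring_size,
  		.flags		= IORING_MEM_REGION_TYPE_USER,
  	};
  	/* host memory area the hw rx queue will be filled from */
  	struct io_uring_zcrx_area_reg area_reg = {
  		.addr		= (__u64)(unsigned long)area,
  		.len		= area_size,
  	};
  	struct io_uring_zcrx_ifq_reg reg = {
  		.if_idx		= ifindex,	/* e.g. if_nametoindex("eth0") */
  		.if_rxq		= rxq,		/* hw rx queue flows are steered to */
  		.rq_entries	= rq_entries,
  		.area_ptr	= (__u64)(unsigned long)&area_reg,
  		.region_ptr	= (__u64)(unsigned long)&region,
  	};

  	/*
  	 * On success the kernel fills in reg.offsets (layout of the
  	 * refill ring within the region) and area_reg.rq_area_token,
  	 * which is needed when returning buffers.
  	 */
  	return syscall(__NR_io_uring_register, ring_fd,
  		       IORING_REGISTER_ZCRX_IFQ, &reg, 1);
  }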

This work depends on some networking components which were queued up on
the networking side, but have now landed in your tree.

This is the work of Pavel Begunkov and David Wei. From the v14 posting:

"We configure a page pool that a driver uses to fill a hw rx queue to
 hand out user pages instead of kernel pages. Any data that ends up
 hitting this hw rx queue will thus be dma'd into userspace memory
 directly, without needing to be bounced through kernel memory. 'Reading'
 data out of a socket instead becomes a _notification_ mechanism, where
 the kernel tells userspace where the data is. The overall approach is
 similar to the devmem TCP proposal.

 This relies on hw header/data split, flow steering and RSS to ensure
 packet headers remain in kernel memory and only desired flows hit a hw
 rx queue configured for zero copy. Configuring this is outside of the
 scope of this patchset.

 We share netdev core infra with devmem TCP. The main difference is that
 io_uring is used for the uAPI and the lifetimes of all objects are
 bound to an io_uring instance. Data is 'read' using a new io_uring
 request type. When done, data is returned via a new shared refill
 queue. A zero copy page pool refills a hw rx queue from this refill
 queue directly. Of course, the lifetimes of these data buffers are
 managed by io_uring rather than the networking stack, with different
 refcounting rules.

 This patchset is the first step adding basic zero copy support. We will
 extend this iteratively with new features e.g. dynamically allocated
 zero copy areas, THP support, dmabuf support, improved copy fallback,
 general optimisations and more."
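
To make that flow concrete, the receive side then looks roughly like
the below. Again just a sketch pieced together from the description
above - Documentation/networking/iou-zcrx.rst and the selftest in this
pull are the authoritative references. It assumes a ring created with
IORING_SETUP_CQE32 | IORING_SETUP_DEFER_TASKRUN (recvzc completions
carry a 16 byte trailer holding the buffer offset), and it elides the
memory barriers real code needs when publishing the refill ring tail.

  #include <liburing.h>

  /* arm a multishot zero copy receive on a connected socket */
  static void arm_recvzc(struct io_uring *ring, int sockfd)
  {
  	struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

  	/* len 0 means no cap; a non-zero len limits the receive,
  	 * per the read limit patch in the series above */
  	io_uring_prep_rw(IORING_OP_RECV_ZC, sqe, sockfd, NULL, 0, 0);
  	sqe->ioprio |= IORING_RECV_MULTISHOT;
  	sqe->zcrx_ifq_idx = 0;	/* single ifq in this initial version */
  }

  /* stand-in for whatever the application does with received data */
  static void consume(const void *data, unsigned int len)
  {
  	(void)data; (void)len;
  }

  /*
   * Each completion points at data the NIC dma'd straight into the
   * registered area. Once consumed, the buffer goes back to the kernel
   * through the refill ring ('rq_ring' and 'rq_tail' were mapped at
   * registration time, 'area_token' comes from area registration).
   */
  static void handle_cqe(struct io_uring_cqe *cqe, unsigned char *area,
  		       __u64 area_token, struct io_uring_zcrx_rqe *rq_ring,
  		       __u32 *rq_tail, unsigned int rq_entries)
  {
  	struct io_uring_zcrx_cqe *rcqe = (struct io_uring_zcrx_cqe *)(cqe + 1);
  	__u64 mask = (1ULL << IORING_ZCRX_AREA_SHIFT) - 1;
  	struct io_uring_zcrx_rqe *rqe;

  	/* cqe->res == 0 without IORING_CQE_F_MORE means the stream ended */
  	consume(area + (rcqe->off & mask), cqe->res);

  	/* done with the data: hand the buffer back via the refill ring */
  	rqe = &rq_ring[*rq_tail & (rq_entries - 1)];
  	rqe->off = (rcqe->off & mask) | area_token;
  	rqe->len = cqe->res;
  	(*rq_tail)++;	/* real code publishes this with a release store */
  }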

In a local setup, I was able to saturate a 200G link with a single CPU
core, and at netdev conf 0x19 earlier this month, Jamal reported 188Gbit
of bandwidth using a single core (no HT, including soft-irq). Safe to
say the efficiency is there; bigger links would be needed to find the
per-core limit, and it's considerably more efficient and faster than
the existing devmem solution.

Please pull!


The following changes since commit 5c496ff11df179c32db960cf10af90a624a035eb:

  Merge commit '71f0dd5a3293d75d26d405ffbaedfdda4836af32' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next into for-6.15/io_uring-rx-zc (2025-02-17 05:38:28 -0700)

are available in the Git repository at:

  git://git.kernel.dk/linux.git for-6.15/io_uring-rx-zc-20250325

for you to fetch changes up to 89baa22d75278b69d3a30f86c3f47ac3a3a659e9:

  io_uring/zcrx: add selftest case for recvzc with read limit (2025-02-24 12:56:13 -0700)

----------------------------------------------------------------
Bui Quang Minh (1):
      io_uring: add missing IORING_MAP_OFF_ZCRX_REGION in io_uring_mmap

David Wei (8):
      io_uring/zcrx: add interface queue and refill queue
      io_uring/zcrx: add io_zcrx_area
      io_uring/zcrx: add io_recvzc request
      io_uring/zcrx: set pp memory provider for an rx queue
      net: add documentation for io_uring zcrx
      io_uring/zcrx: add selftest
      io_uring/zcrx: add a read limit to recvzc requests
      io_uring/zcrx: add selftest case for recvzc with read limit

Geert Uytterhoeven (1):
      io_uring: Rename KConfig to Kconfig

Jens Axboe (1):
      Merge commit '71f0dd5a3293d75d26d405ffbaedfdda4836af32' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next into for-6.15/io_uring-rx-zc

Pavel Begunkov (7):
      io_uring/zcrx: grab a net device
      io_uring/zcrx: implement zerocopy receive pp memory provider
      io_uring/zcrx: dma-map area for the device
      io_uring/zcrx: throttle receive requests
      io_uring/zcrx: add copy fallback
      io_uring/zcrx: recheck ifq on shutdown
      io_uring/zcrx: fix leaks on failed registration

 Documentation/networking/index.rst                 |   1 +
 Documentation/networking/iou-zcrx.rst              | 202 +++++
 Kconfig                                            |   2 +
 include/linux/io_uring_types.h                     |   6 +
 include/uapi/linux/io_uring.h                      |  54 +-
 io_uring/Kconfig                                   |  10 +
 io_uring/Makefile                                  |   1 +
 io_uring/io_uring.c                                |   7 +
 io_uring/io_uring.h                                |  10 +
 io_uring/memmap.c                                  |   2 +
 io_uring/memmap.h                                  |   1 +
 io_uring/net.c                                     |  84 ++
 io_uring/opdef.c                                   |  16 +
 io_uring/register.c                                |   7 +
 io_uring/rsrc.c                                    |   2 +-
 io_uring/rsrc.h                                    |   1 +
 io_uring/zcrx.c                                    | 960 +++++++++++++++++++++
 io_uring/zcrx.h                                    |  73 ++
 tools/testing/selftests/drivers/net/hw/.gitignore  |   2 +
 tools/testing/selftests/drivers/net/hw/Makefile    |   5 +
 tools/testing/selftests/drivers/net/hw/iou-zcrx.c  | 457 ++++++++++
 tools/testing/selftests/drivers/net/hw/iou-zcrx.py |  87 ++
 22 files changed, 1988 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/networking/iou-zcrx.rst
 create mode 100644 io_uring/Kconfig
 create mode 100644 io_uring/zcrx.c
 create mode 100644 io_uring/zcrx.h
 create mode 100644 tools/testing/selftests/drivers/net/hw/iou-zcrx.c
 create mode 100755 tools/testing/selftests/drivers/net/hw/iou-zcrx.py

-- 
Jens Axboe

