From: Jens Axboe <[email protected]>
To: Linus Torvalds <[email protected]>
Cc: io-uring <[email protected]>
Subject: [GIT PULL] io_uring updates for 6.10-rc1
Date: Sat, 11 May 2024 08:02:55 -0600
Message-ID: <[email protected]>
Hi Linus,
Here are the io_uring updates and fixes for the 6.10 kernel merge
window. This pull request contains:
- Greatly improve send zerocopy performance, by enabling coalescing of
sent buffers. MSG_ZEROCOPY already does this with send(2) and
sendmsg(2), but the io_uring side did not. In local testing, the
crossover point for send zerocopy being faster is now around 3000 byte
packets, and it performs better than the sync syscall variants as
well. This feature relies on a shared branch with net-next, which was
pulled into both branches.
- Unification of how async preparation is done across opcodes.
Previously, opcodes that required extra memory for async retry would
allocate that as needed, using on-stack state until that was the case.
If async retry was needed, the on-stack state was adjusted
appropriately for a retry and then copied to the allocated memory.
This led to some fragile and ugly code, particularly for read/write
handling, and made storage retries more difficult than they needed to
be. Allocate the memory upfront, as it's cheap from our pools, and use
that state consistently both initially and also from the retry side.
- Move away from using remap_pfn_range() for mapping the rings. This is
really not the right interface to use and can cause lifetime issues or
leaks. Additionally, it means the ring sq/cq arrays need to be
physically contiguous, which can cause problems in production with
larger rings when services are restarted, as memory can be very
fragmented at that point. Move to using vm_insert_page(s) for the ring
sq/cq arrays, and apply the same treatment to mapped ring provided
buffers. This also helps unify the code we have dealing with
allocating and mapping memory. Hard to see in the diffstat as we're
adding a few features as well, but this kills about 400 lines of code
from the codebase.
- Add support for bundles for send/recv. When used with provided
buffers, bundles support sending or receiving more than one buffer at
a time, improving the efficiency by only needing to call into the
networking stack once for multiple sends or receives.
- Tweaks for our accept operations, supporting both a DONTWAIT flag for
skipping poll arm and retry if we can, and a POLLFIRST flag that the
application can use to skip the initial accept attempt and rely purely
on poll for triggering the operation. Both of these have identical
flags on the receive side already.
- Make the task_work ctx locking unconditional. We had various code
paths here that would do a mix of lock/trylock and set the task_work
state to whether or not it was locked. All of that goes away, we lock
it unconditionally and get rid of the state flag indicating whether
it's locked or not. The state struct still exists as an empty type,
and it can go away in the future.
- Add support for specifying NOP completion values, allowing it to be
used for error handling testing.
- Use set/test bit for io-wq worker flags. Not strictly needed, but also
doesn't hurt and helps silence a KCSAN warning.
- Cleanups for io-wq locking and work assignments, closing a tiny race
where cancelations would not be able to find the work item reliably.
- Misc fixes, cleanups, and improvements.
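For illustration only, and not part of this pull: with bundles, a single
completion can cover several provided buffers, so the application has to
work out which buffers the transferred bytes landed in. A minimal
user-space sketch of that accounting follows. It assumes fixed-size
buffers whose buffer IDs equal their ring slot, consumed in ring order
starting at the first buffer ID reported for the bundle. The names
bundle_span() and struct bundle_info are hypothetical and are not kernel
or liburing API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of the buffers covered by one bundle completion. */
struct bundle_info {
	unsigned first_bid;	/* first buffer ID used by the bundle */
	unsigned nr_bufs;	/* number of buffers the bundle touched */
	size_t last_len;	/* valid bytes in the final buffer */
};

/*
 * Assumption: the ring holds nr_ring buffers of buf_len bytes each, buffer
 * IDs equal ring slots, and 'total' bytes were transferred starting at
 * start_bid. Buffer i of the bundle is then (first_bid + i) % nr_ring.
 */
static struct bundle_info bundle_span(unsigned start_bid, size_t total,
				      size_t buf_len, unsigned nr_ring)
{
	struct bundle_info bi = { 0, 0, 0 };

	assert(buf_len > 0 && nr_ring > 0);
	bi.first_bid = start_bid % nr_ring;
	if (!total)
		return bi;

	/* Round up: a partially filled last buffer still counts as used. */
	bi.nr_bufs = (unsigned)((total + buf_len - 1) / buf_len);
	bi.last_len = total - (size_t)(bi.nr_bufs - 1) * buf_len;
	return bi;
}
```

With eight 4096-byte buffers, a 10000-byte transfer starting at buffer 6
would span buffers 6, 7 and 0 under these assumptions, with 1808 valid
bytes in the last one.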
Please pull!
The following changes since commit 0bbac3facb5d6cc0171c45c9873a2dc96bea9680:
Linux 6.9-rc4 (2024-04-14 13:38:39 -0700)
are available in the Git repository at:
git://git.kernel.dk/linux.git tags/for-6.10/io_uring-20240511
for you to fetch changes up to deb1e496a83557896fe0cca0b8af01c2a97c0dc6:
io_uring: support to inject result for NOP (2024-05-10 06:09:45 -0600)
----------------------------------------------------------------
for-6.10/io_uring-20240511
----------------------------------------------------------------
Breno Leitao (1):
io_uring/io-wq: Use set_bit() and test_bit() at worker->flags
Gabriel Krisman Bertazi (4):
io_uring: Avoid anonymous enums in io_uring uapi
io-wq: write next_work before dropping acct_lock
io-wq: Drop intermediate step between pending list and active work
io_uring: Require zeroed sqe->len on provided-buffers send
Jens Axboe (52):
nvme/io_uring: use helper for polled completions
io_uring: flush delayed fallback task_work in cancelation
io_uring: remove timeout/poll specific cancelations
io_uring/alloc_cache: shrink default max entries from 512 to 128
io_uring/net: switch io_send() and io_send_zc() to using io_async_msghdr
io_uring/net: switch io_recv() to using io_async_msghdr
io_uring/net: unify cleanup handling
io_uring/net: always setup an io_async_msghdr
io_uring/net: always set kmsg->msg.msg_control_user before issue
io_uring/net: get rid of ->prep_async() for receive side
io_uring/net: get rid of ->prep_async() for send side
io_uring: kill io_msg_alloc_async_prep()
io_uring/net: remove (now) dead code in io_netmsg_recycle()
io_uring/net: add iovec recycling
io_uring/net: drop 'kmsg' parameter from io_req_msg_cleanup()
io_uring/rw: always setup io_async_rw for read/write requests
io_uring: get rid of struct io_rw_state
io_uring/rw: cleanup retry path
io_uring/rw: add iovec recycling
io_uring/net: move connect to always using async data
io_uring/uring_cmd: switch to always allocating async data
io_uring/uring_cmd: defer SQE copying until it's needed
io_uring: drop ->prep_async()
io_uring/alloc_cache: switch to array based caching
io_uring/poll: shrink alloc cache size to 32
io_uring: refill request cache in memory order
io_uring: re-arrange Makefile order
io_uring: use the right type for work_llist empty check
mm: add nommu variant of vm_insert_pages()
io_uring: get rid of remap_pfn_range() for mapping rings/sqes
io_uring: use vmap() for ring mapping
io_uring: unify io_pin_pages()
io_uring/kbuf: vmap pinned buffer ring
io_uring/kbuf: use vm_insert_pages() for mmap'ed pbuf ring
io_uring: use unpin_user_pages() where appropriate
io_uring: move mapping/allocation helpers to a separate file
io_uring: fix warnings on shadow variables
io_uring/kbuf: remove dead define
io_uring: ensure overflow entries are dropped when ring is exiting
io_uring/sqpoll: work around a potential audit memory leak
io_uring/rw: ensure retry condition isn't lost
io_uring/net: add generic multishot retry helper
io_uring/net: add provided buffer support for IORING_OP_SEND
io_uring/kbuf: add helpers for getting/peeking multiple buffers
io_uring/net: support bundles for send
io_uring/net: support bundles for recv
Merge branch 'for-uring-ubufops' of git://git.kernel.org/pub/scm/linux/kernel/git/kuba/linux into for-6.10/io_uring
io_uring/rw: reinstate thread check for retries
io_uring/msg_ring: cleanup posting to IOPOLL vs !IOPOLL ring
io_uring/filetable: don't unnecessarily clear/reset bitmap
io_uring/net: add IORING_ACCEPT_DONTWAIT flag
io_uring/net: add IORING_ACCEPT_POLL_FIRST flag
Jiapeng Chong (1):
io_uring: Remove unused function
Joel Granados (1):
io_uring: Remove the now superfluous sentinel elements from ctl_table array
Ming Lei (4):
io_uring: kill dead code in io_req_complete_post
io_uring: return void from io_put_kbuf_comp()
io_uring: fail NOP if non-zero op flags is passed in
io_uring: support to inject result for NOP
Pavel Begunkov (33):
io_uring/cmd: move io_uring_try_cancel_uring_cmd()
io_uring/cmd: kill one issue_flags to tw conversion
io_uring/cmd: fix tw <-> issue_flags conversion
io_uring/cmd: document some uring_cmd related helpers
io_uring/rw: avoid punting to io-wq directly
io_uring: force tw ctx locking
io_uring: remove struct io_tw_state::locked
io_uring: refactor io_fill_cqe_req_aux
io_uring: get rid of intermediate aux cqe caches
io_uring: remove current check from complete_post
io_uring: refactor io_req_complete_post()
io_uring: clean up io_lockdep_assert_cq_locked
io_uring: turn implicit assumptions into a warning
io_uring: remove async request cache
io_uring: remove io_req_put_rsrc_locked()
io_uring/net: merge ubuf sendzc callbacks
io_uring/net: get rid of io_notif_complete_tw_ext
io_uring/net: set MSG_ZEROCOPY for sendzc in advance
io_uring: separate header for exported net bits
io_uring: unexport io_req_cqe_overflow()
io_uring: remove extra SQPOLL overflow flush
io_uring: open code io_cqring_overflow_flush()
io_uring: always lock __io_cqring_overflow_flush
io_uring: consolidate overflow flushing
io_uring/notif: refactor io_tx_ubuf_complete()
io_uring/notif: remove ctx var from io_notif_tw_complete
io_uring/notif: shrink account_pages to u32
net: extend ubuf_info callback to ops structure
net: add callback for setting a ubuf_info to skb
io_uring/notif: simplify io_notif_flush()
io_uring/notif: implement notification stacking
io_uring/net: fix sendzc lazy wake polling
io_uring/notif: disable LAZY_WAKE for linked notifs
Ruyi Zhang (1):
io_uring/timeout: remove duplicate initialization of the io_timeout list.
linke li (1):
io_uring/msg_ring: reuse ctx->submitter_task read using READ_ONCE instead of re-reading it
drivers/net/tap.c | 2 +-
drivers/net/tun.c | 2 +-
drivers/net/xen-netback/common.h | 5 +-
drivers/net/xen-netback/interface.c | 2 +-
drivers/net/xen-netback/netback.c | 11 +-
drivers/nvme/host/ioctl.c | 15 +-
drivers/vhost/net.c | 8 +-
include/linux/io_uring.h | 6 -
include/linux/io_uring/cmd.h | 24 +
include/linux/io_uring/net.h | 18 +
include/linux/io_uring_types.h | 19 +-
include/linux/skbuff.h | 21 +-
include/uapi/linux/io_uring.h | 38 +-
io_uring/Makefile | 15 +-
io_uring/alloc_cache.h | 59 ++-
io_uring/cancel.c | 4 +-
io_uring/fdinfo.c | 4 +-
io_uring/filetable.c | 4 +-
io_uring/futex.c | 30 +-
io_uring/futex.h | 5 +-
io_uring/io-wq.c | 67 +--
io_uring/io_uring.c | 665 +++++-----------------------
io_uring/io_uring.h | 33 +-
io_uring/kbuf.c | 318 ++++++++------
io_uring/kbuf.h | 64 ++-
io_uring/memmap.c | 336 ++++++++++++++
io_uring/memmap.h | 25 ++
io_uring/msg_ring.c | 12 +-
io_uring/net.c | 852 +++++++++++++++++++++---------------
io_uring/net.h | 29 +-
io_uring/nop.c | 26 +-
io_uring/notif.c | 108 +++--
io_uring/notif.h | 13 +-
io_uring/opdef.c | 65 ++-
io_uring/opdef.h | 9 +-
io_uring/poll.c | 15 +-
io_uring/poll.h | 9 +-
io_uring/refs.h | 7 +
io_uring/register.c | 3 +-
io_uring/rsrc.c | 47 +-
io_uring/rsrc.h | 13 +-
io_uring/rw.c | 585 ++++++++++++-------------
io_uring/rw.h | 25 +-
io_uring/sqpoll.c | 8 +
io_uring/timeout.c | 9 +-
io_uring/uring_cmd.c | 122 +++++-
io_uring/uring_cmd.h | 8 +-
io_uring/waitid.c | 2 +-
mm/nommu.c | 7 +
net/core/skbuff.c | 36 +-
net/socket.c | 2 +-
51 files changed, 2050 insertions(+), 1762 deletions(-)
create mode 100644 include/linux/io_uring/net.h
create mode 100644 io_uring/memmap.c
create mode 100644 io_uring/memmap.h
--
Jens Axboe