From: Olivier Langlois <[email protected]>
To: Jakub Kicinski <[email protected]>, Jens Axboe <[email protected]>
Cc: Linus Torvalds <[email protected]>,
	io-uring <[email protected]>
Subject: Re: [GIT PULL] io_uring updates for 5.18-rc1
Date: Wed, 01 Jun 2022 02:59:12 -0400
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>

On Sat, 2022-03-26 at 14:30 -0700, Jakub Kicinski wrote:
> On Sat, 26 Mar 2022 15:06:40 -0600 Jens Axboe wrote:
> > On 3/26/22 2:57 PM, Jens Axboe wrote:
> > > > I'd also like to have a conversation about continuing to use
> > > > the socket as a proxy for NAPI_ID, NAPI_ID is exposed to user
> > > > space now. io_uring being a new interface I wonder if it's not 
> > > > better to let the user specify the request parameters
> > > > directly.  
> > > 
> > > Definitely open to something that makes more sense, given we
> > > don't
> > > have to shoehorn things through the regular API for NAPI with
> > > io_uring.  
> > 
> > The most appropriate is probably to add a way to get/set NAPI
> > settings
> > on a per-io_uring basis, eg through io_uring_register(2). It's a
> > bit
> > more difficult if they have to be per-socket, as the polling
> > happens off
> > what would normally be the event wait path.
> > 
> > What did you have in mind?
> 
> Not sure I fully comprehend what the current code does. IIUC it uses
> the socket and the caches its napi_id, presumably because it doesn't
> want to hold a reference on the socket?

Again, the io_uring NAPI busy poll integration is strongly inspired by
the epoll implementation, which caches a single napi_id.

I guess that I reverse engineered the rationale behind the epoll
design decisions.

If you were to busy poll the receive queues for a socket set containing
hundreds of thousands of sockets, would you rather scan the whole
socket set to find which queues to poll, or simply iterate through a
list containing a dozen or so ids?
> 
> This may give the user a false impression that the polling follows 
> the socket. NAPIs may get reshuffled underneath on pretty random
> reconfiguration / recovery events (random == driver dependent).

There is nothing random. When a socket is added to the poll set, its
receive queue is added to the short list of queues to poll.

A very common usage pattern among networking applications is to
reinsert the socket into the polling set after each polling event. In
recognition of this pattern, and to avoid allocating and freeing memory
every time the napi_id list changes, each napi_id is kept in the list
until it has been inactive for a very long period, at which point it is
finally removed and busy polling of its receive queue stops.
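
To make the idea above concrete, here is a minimal user-space sketch of a
short napi_id list with inactivity-based expiry. It is not the kernel
code: the names (napi_list, napi_list_add, napi_list_expire), the fixed
capacity, and the abstract time unit are all assumptions made for
illustration only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capacity: the list is expected to hold "a dozen or so" ids. */
#define NAPI_LIST_MAX 16

struct napi_entry {
	unsigned int napi_id;	/* id of the receive queue to busy poll */
	uint64_t last_seen;	/* last time a socket using this id was added */
	bool in_use;
};

struct napi_list {
	struct napi_entry entries[NAPI_LIST_MAX];
	uint64_t timeout;	/* inactivity period before an id is dropped */
};

static void napi_list_init(struct napi_list *l, uint64_t timeout)
{
	for (size_t i = 0; i < NAPI_LIST_MAX; i++)
		l->entries[i].in_use = false;
	l->timeout = timeout;
}

/*
 * Called each time a socket is (re)inserted into the poll set.
 * A known id only gets its timestamp refreshed, so the common
 * "re-arm after every event" pattern never allocates or frees.
 * Returns false only when the fixed-size list is full.
 */
static bool napi_list_add(struct napi_list *l, unsigned int napi_id,
			  uint64_t now)
{
	struct napi_entry *free_slot = NULL;

	for (size_t i = 0; i < NAPI_LIST_MAX; i++) {
		struct napi_entry *e = &l->entries[i];

		if (e->in_use && e->napi_id == napi_id) {
			e->last_seen = now;
			return true;
		}
		if (!e->in_use && !free_slot)
			free_slot = e;
	}
	if (!free_slot)
		return false;

	free_slot->napi_id = napi_id;
	free_slot->last_seen = now;
	free_slot->in_use = true;
	return true;
}

/*
 * Drop every id that has been idle for longer than the timeout.
 * Returns the number of ids that are still tracked.
 */
static size_t napi_list_expire(struct napi_list *l, uint64_t now)
{
	size_t live = 0;

	for (size_t i = 0; i < NAPI_LIST_MAX; i++) {
		struct napi_entry *e = &l->entries[i];

		if (!e->in_use)
			continue;
		if (now - e->last_seen > l->timeout)
			e->in_use = false;
		else
			live++;
	}
	return live;
}
```

The cost of a busy poll pass in this sketch depends on the number of
distinct receive queues, not on the number of sockets in the poll set.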
> 
> I'm not entirely clear how the thing is supposed to be used with TCP
> socket, as from a quick grep it appears that listening sockets don't
> get napi_id marked at all.
> 
> The commit mentions a UDP benchmark, Olivier can you point me to more
> info on the use case? I'm mostly familiar with NAPI busy poll with
> XDP
> sockets, where it's pretty obvious.

https://github.com/lano1106/io_uring_udp_ping

IDK what else I can tell you. I chose to unit test the new feature
with a UDP app because it was the simplest setup for testing. AFAIK,
the ultimate goal of busy polling is to minimize packet reception
latency, and the NAPI busy polling code should not treat packets
differently whether they are UDP, TCP, or whatever other type of frame
the NIC receives...
> 
> My immediate reaction is that we should either explicitly call out
> NAPI
> instances by id in uAPI, or make sure we follow the socket in every
> case. Also we can probably figure out an easy way of avoiding the
> hash
> table lookups and cache a pointer to the NAPI struct.
> 
That is an interesting idea. If this is something that the NAPI API
would offer, I would gladly use it to avoid the hash lookup. IMHO it
would be a very interesting improvement, but hopefully it should not
block my patch...


