public inbox for [email protected]
 help / color / mirror / Atom feed
From: Olivier Langlois <[email protected]>
To: Hao Xu <[email protected]>, Jens Axboe <[email protected]>,
	[email protected]
Subject: Re: napi_busy_poll
Date: Thu, 17 Feb 2022 15:28:13 -0500	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>

On Wed, 2022-02-16 at 20:14 +0800, Hao Xu wrote:
> > 
> Hi Olivier,
> I've write something to express my idea, it would be great if you can
> try it.
> It's totally untested and only does polling in sqthread, won't be
> hard
> to expand it to cqring_wait. My original idea is to poll all the napi
> device but seems that may be not efficient. so for a request, just
> do napi polling for one napi.
> There is still one problem: when to delete the polled NAPIs.
> 
> Regards,
> Hao
> 
Hi Hao,

I am going to give your patch a try hopefully later today.

On my side, I have made a small change to my code and it started to
work.

I did remove the call to io_run_task_work() from io_busy_loop_end().
While inside napi_busy_loop(), preemption is disabled, local_bh too and
the function acquires a napi_poll lock. I haven't been able to put my
finger on exactly why but it appears that in that state, the net
subsystem is not reentrant. Therefore, I did replace the call to
io_run_task_work() with a call to signal_pending() just to know if
there are pending task works so that they can be handled outside the
napi_busy_loop.

I am not sure how a socket is assigned to a napi device but I got a few
with my 50 sockets program (8 to be exact):

[2022-02-17 09:59:10] INFO WSBASE/client_established 706
LWS_CALLBACK_CLIENT_ESTABLISHED client 50(17), napi_id: 10
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 3(63), napi_id: 12
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 49(16), napi_id: 15
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 15(51), napi_id: 12
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 31(35), napi_id: 16
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 14(52), napi_id: 14
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 11(55), napi_id: 12
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 16(50), napi_id: 9
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 40(26), napi_id: 9
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 39(27), napi_id: 14
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 8(58), napi_id: 10
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 20(46), napi_id: 13
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 7(59), napi_id: 16
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 6(60), napi_id: 16
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 22(44), napi_id: 16
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 13(53), napi_id: 9
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 38(28), napi_id: 9
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 21(45), napi_id: 12
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 4(62), napi_id: 15
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 35(31), napi_id: 13
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 25(41), napi_id: 12
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 18(48), napi_id: 16
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 12(54), napi_id: 13
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 5(61), napi_id: 9
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 23(43), napi_id: 13
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 46(20), napi_id: 11
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 43(23), napi_id: 9
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 9(57), napi_id: 9
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 29(37), napi_id: 14
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 28(38), napi_id: 15
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 33(33), napi_id: 13
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 27(39), napi_id: 10
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 32(34), napi_id: 12
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 1(65), napi_id: 10
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 26(40), napi_id: 13
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 37(29), napi_id: 12
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 30(36), napi_id: 9
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 47(19), napi_id: 14
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 10(56), napi_id: 11
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 44(22), napi_id: 16
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 17(49), napi_id: 11
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 41(25), napi_id: 13
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 48(18), napi_id: 10
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 2(64), napi_id: 15
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 45(21), napi_id: 14
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 24(42), napi_id: 9
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 34(32), napi_id: 11
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 42(24), napi_id: 16
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 36(30), napi_id: 16
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 19(47), napi_id: 16
[2022-02-17 09:59:11] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 38(66), napi_id: 11

First number is the thread id (706 and 696). I have 2 threads. 1
io_uring context per thread.

Next, you have the client id and the number in parenthesis is the
socket fd.

Based on the result, it appears that having a single napi_id per
context won't do it... I wish I could pick the brains of the people
having done things the way they have done it with the epoll
implementation.

I guess that I could create 8 distinct io_uring context with
IORING_SETUP_ATTACH_WQ but it seems like a big burden placed on the
shoulders of users when a simple linked list with ref-counted elements
could do it...

I think that if I merge your patch with what I have done so far, we
could get something really cool!

Concerning the remaining problem about when to remove the napi_id, I
would say that a good place to do it would be when a request is
completed and discarded if there was a refcount added to your
napi_entry struct.

The only thing that I hate about this idea is that in my scenario, the
sockets are going to be pretty much the same for the whole io_uring
context existance. Therefore, the whole ref counting overhead is
useless and unneeded.

Here is the latest version of my effort:
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 77b9c7e4793b..ea2a3661c16f 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -63,6 +63,7 @@
 #include <net/sock.h>
 #include <net/af_unix.h>
 #include <net/scm.h>
+#include <net/busy_poll.h>
 #include <linux/anon_inodes.h>
 #include <linux/sched/mm.h>
 #include <linux/uaccess.h>
@@ -395,6 +396,10 @@ struct io_ring_ctx {
 	struct list_head	sqd_list;
 
 	unsigned long		check_cq_overflow;
+#ifdef CONFIG_NET_RX_BUSY_POLL
+	/* used to track busy poll napi_id */
+	unsigned int napi_id;
+#endif
 
 	struct {
 		unsigned		cached_cq_tail;
@@ -6976,7 +6981,40 @@ static inline struct file
*io_file_get_fixed(struct io_ring_ctx *ctx,
 	io_req_set_rsrc_node(req, ctx);
 	return file;
 }
+#ifdef CONFIG_NET_RX_BUSY_POLL
+/*
+ * Set epoll busy poll NAPI ID from sk.
+ */
+static inline void io_set_busy_poll_napi_id(struct io_ring_ctx *ctx,
struct file *file)
+{
+	unsigned int napi_id;
+	struct socket *sock;
+	struct sock *sk;
+
+	if (!net_busy_loop_on())
+		return;
 
+	sock = sock_from_file(file);
+	if (!sock)
+		return;
+
+	sk = sock->sk;
+	if (!sk)
+		return;
+
+	napi_id = READ_ONCE(sk->sk_napi_id);
+
+	/* Non-NAPI IDs can be rejected
+	 *	or
+	 * Nothing to do if we already have this ID
+	 */
+	if (napi_id < MIN_NAPI_ID || napi_id == ctx->napi_id)
+		return;
+
+	/* record NAPI ID for use in next busy poll */
+	ctx->napi_id = napi_id;
+}
+#endif
 static struct file *io_file_get_normal(struct io_ring_ctx *ctx,
 				       struct io_kiocb *req, int fd)
 {
@@ -6985,8 +7023,14 @@ static struct file *io_file_get_normal(struct
io_ring_ctx *ctx,
 	trace_io_uring_file_get(ctx, fd);
 
 	/* we don't allow fixed io_uring files */
-	if (file && unlikely(file->f_op == &io_uring_fops))
-		io_req_track_inflight(req);
+	if (file) {
+		if (unlikely(file->f_op == &io_uring_fops))
+			io_req_track_inflight(req);
+#ifdef CONFIG_NET_RX_BUSY_POLL
+		else
+			io_set_busy_poll_napi_id(ctx, file);
+#endif
+	}
 	return file;
 }
 
@@ -7489,7 +7533,22 @@ static inline void
io_ring_clear_wakeup_flag(struct io_ring_ctx *ctx)
 		   ctx->rings->sq_flags & ~IORING_SQ_NEED_WAKEUP);
 	spin_unlock(&ctx->completion_lock);
 }
+#ifdef CONFIG_NET_RX_BUSY_POLL
+/*
+ * Busy poll if globally on and supporting sockets found
+ */
+static inline bool io_napi_busy_loop(struct io_ring_ctx *ctx)
+{
+	unsigned int napi_id = ctx->napi_id;
 
+	if ((napi_id >= MIN_NAPI_ID) && net_busy_loop_on()) {
+		napi_busy_loop(napi_id, NULL, NULL, true,
+			       BUSY_POLL_BUDGET);
+		return true;
+	}
+	return false;
+}
+#endif
 static int __io_sq_thread(struct io_ring_ctx *ctx, bool cap_entries)
 {
 	unsigned int to_submit;
@@ -7518,7 +7577,10 @@ static int __io_sq_thread(struct io_ring_ctx
*ctx, bool cap_entries)
 		    !(ctx->flags & IORING_SETUP_R_DISABLED))
 			ret = io_submit_sqes(ctx, to_submit);
 		mutex_unlock(&ctx->uring_lock);
-
+#ifdef CONFIG_NET_RX_BUSY_POLL
+		if (io_napi_busy_loop(ctx))
+			++ret;
+#endif
 		if (to_submit && wq_has_sleeper(&ctx->sqo_sq_wait))
 			wake_up(&ctx->sqo_sq_wait);
 		if (creds)
@@ -7649,6 +7711,9 @@ struct io_wait_queue {
 	struct io_ring_ctx *ctx;
 	unsigned cq_tail;
 	unsigned nr_timeouts;
+#ifdef CONFIG_NET_RX_BUSY_POLL
+	unsigned busy_poll_to;
+#endif
 };
 
 static inline bool io_should_wake(struct io_wait_queue *iowq)
@@ -7709,6 +7774,29 @@ static inline int io_cqring_wait_schedule(struct
io_ring_ctx *ctx,
 	return !*timeout ? -ETIME : 1;
 }
 
+#ifdef CONFIG_NET_RX_BUSY_POLL
+static inline bool io_busy_loop_timeout(unsigned long start_time,
+					unsigned long bp_usec)
+{
+	if (bp_usec) {
+		unsigned long end_time = start_time + bp_usec;
+		unsigned long now = busy_loop_current_time();
+
+		return time_after(now, end_time);
+	}
+	return true;
+}
+
+static bool io_busy_loop_end(void *p, unsigned long start_time)
+{
+	struct io_wait_queue *iowq = p;
+
+	return io_busy_loop_timeout(start_time, iowq->busy_poll_to) ||
+	       signal_pending(current) ||
+	       io_should_wake(iowq);
+}
+#endif
+
 /*
  * Wait until events become available, if we don't already have some.
The
  * application must reap them itself, as they reside on the shared cq
ring.
@@ -7729,12 +7817,33 @@ static int io_cqring_wait(struct io_ring_ctx
*ctx, int min_events,
 		if (!io_run_task_work())
 			break;
 	} while (1);
-
+#ifdef CONFIG_NET_RX_BUSY_POLL
+	iowq.busy_poll_to = 0;
+#endif
 	if (uts) {
 		struct timespec64 ts;
 
 		if (get_timespec64(&ts, uts))
 			return -EFAULT;
+#ifdef CONFIG_NET_RX_BUSY_POLL
+		if (!(ctx->flags & IORING_SETUP_SQPOLL) &&
+		    (ctx->napi_id >= MIN_NAPI_ID) &&
net_busy_loop_on()) {
+			unsigned busy_poll_to =
+				READ_ONCE(sysctl_net_busy_poll);
+			struct timespec64 pollto =
+				ns_to_timespec64(1000*busy_poll_to);
+
+			if (timespec64_compare(&ts, &pollto) > 0) {
+				ts = timespec64_sub(ts, pollto);
+				iowq.busy_poll_to = busy_poll_to;
+			}
+			else {
+				iowq.busy_poll_to =
timespec64_to_ns(&ts)/1000;
+				ts.tv_sec = 0;
+				ts.tv_nsec = 0;
+			}
+		}
+#endif
 		timeout = timespec64_to_jiffies(&ts);
 	}
 
@@ -7759,6 +7868,11 @@ static int io_cqring_wait(struct io_ring_ctx
*ctx, int min_events,
 	iowq.cq_tail = READ_ONCE(ctx->rings->cq.head) + min_events;
 
 	trace_io_uring_cqring_wait(ctx, min_events);
+#ifdef CONFIG_NET_RX_BUSY_POLL
+	if (iowq.busy_poll_to)
+		napi_busy_loop(ctx->napi_id, io_busy_loop_end, &iowq,
true,
+			       BUSY_POLL_BUDGET);
+#endif
 	do {
 		/* if we can't even flush overflow, don't wait for
more */
 		if (!io_cqring_overflow_flush(ctx)) {


  reply	other threads:[~2022-02-17 20:28 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-08 14:58 napi_busy_poll Olivier Langlois
2022-02-08 17:05 ` napi_busy_poll Jens Axboe
2022-02-09  3:34   ` napi_busy_poll Hao Xu
2022-02-12 19:51     ` napi_busy_poll Olivier Langlois
2022-02-13 18:47       ` napi_busy_poll Jens Axboe
2022-02-14 17:13       ` napi_busy_poll Hao Xu
2022-02-15  8:37         ` napi_busy_poll Olivier Langlois
2022-02-15 18:05           ` napi_busy_poll Olivier Langlois
2022-02-16  3:12             ` napi_busy_poll Hao Xu
2022-02-16 19:19               ` napi_busy_poll Olivier Langlois
2022-02-16 12:14             ` napi_busy_poll Hao Xu
2022-02-17 20:28               ` Olivier Langlois [this message]
2022-02-18  8:06                 ` napi_busy_poll Hao Xu
2022-02-19  7:14                   ` napi_busy_poll Olivier Langlois
2022-02-21  4:52                     ` napi_busy_poll Hao Xu
2022-02-17 23:18               ` napi_busy_poll Olivier Langlois
2022-02-17 23:25                 ` napi_busy_poll Jens Axboe
2022-02-18  7:21                 ` napi_busy_poll Hao Xu
2022-02-18  5:05               ` napi_busy_poll Olivier Langlois
2022-02-18  7:41                 ` napi_busy_poll Hao Xu
2022-02-19  7:02                   ` napi_busy_poll Olivier Langlois
2022-02-21  5:03                     ` napi_busy_poll Hao Xu
2022-02-25  4:42                       ` napi_busy_poll Olivier Langlois

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3dcb591407be5180d7b14c05eceff30a8f990b58.camel@trillion01.com \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox