From: Olivier Langlois <[email protected]>
To: Hao Xu <[email protected]>, Jens Axboe <[email protected]>,
[email protected]
Subject: Re: napi_busy_poll
Date: Thu, 17 Feb 2022 15:28:13 -0500
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>
On Wed, 2022-02-16 at 20:14 +0800, Hao Xu wrote:
> >
> Hi Olivier,
> I've written something to express my idea; it would be great if you
> could try it.
> It's totally untested and only does polling in the sqthread; it won't
> be hard to expand it to cqring_wait. My original idea was to poll all
> the napi devices, but that may not be efficient, so for a request we
> just do napi polling for one napi.
> There is still one problem: when to delete the polled NAPIs.
>
> Regards,
> Hao
>
Hi Hao,
I am going to give your patch a try, hopefully later today.
On my side, I made a small change to my code and it started to work:
I removed the call to io_run_task_work() from io_busy_loop_end().
While inside napi_busy_loop(), preemption and local_bh are disabled,
and the function acquires the napi_poll lock. I haven't been able to
put my finger on exactly why, but it appears that in that state the net
subsystem is not reentrant. Therefore, I replaced the call to
io_run_task_work() with a call to signal_pending(), just to know
whether there are pending task works so that they can be handled
outside of napi_busy_loop().
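To illustrate what I mean (just a sketch, the real code is in the diff
at the end of this mail): the loop-end callback only reports that work
is pending, and the caller runs it once napi_busy_loop() has returned
and bottom halves are enabled again:

static bool io_busy_loop_end(void *p, unsigned long start_time)
{
	struct io_wait_queue *iowq = p;

	/* only report pending work, never run it under the napi lock */
	return signal_pending(current) || io_should_wake(iowq);
}

/* caller side, after napi_busy_loop() has returned: */
if (signal_pending(current))
	io_run_task_work();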
I am not sure how a socket is assigned to a napi device, but I got a
few with my 50-socket program (8 distinct napi_ids, to be exact):
[2022-02-17 09:59:10] INFO WSBASE/client_established 706
LWS_CALLBACK_CLIENT_ESTABLISHED client 50(17), napi_id: 10
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 3(63), napi_id: 12
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 49(16), napi_id: 15
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 15(51), napi_id: 12
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 31(35), napi_id: 16
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 14(52), napi_id: 14
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 11(55), napi_id: 12
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 16(50), napi_id: 9
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 40(26), napi_id: 9
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 39(27), napi_id: 14
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 8(58), napi_id: 10
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 20(46), napi_id: 13
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 7(59), napi_id: 16
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 6(60), napi_id: 16
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 22(44), napi_id: 16
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 13(53), napi_id: 9
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 38(28), napi_id: 9
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 21(45), napi_id: 12
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 4(62), napi_id: 15
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 35(31), napi_id: 13
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 25(41), napi_id: 12
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 18(48), napi_id: 16
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 12(54), napi_id: 13
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 5(61), napi_id: 9
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 23(43), napi_id: 13
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 46(20), napi_id: 11
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 43(23), napi_id: 9
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 9(57), napi_id: 9
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 29(37), napi_id: 14
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 28(38), napi_id: 15
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 33(33), napi_id: 13
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 27(39), napi_id: 10
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 32(34), napi_id: 12
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 1(65), napi_id: 10
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 26(40), napi_id: 13
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 37(29), napi_id: 12
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 30(36), napi_id: 9
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 47(19), napi_id: 14
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 10(56), napi_id: 11
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 44(22), napi_id: 16
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 17(49), napi_id: 11
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 41(25), napi_id: 13
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 48(18), napi_id: 10
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 2(64), napi_id: 15
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 45(21), napi_id: 14
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 24(42), napi_id: 9
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 34(32), napi_id: 11
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 42(24), napi_id: 16
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 36(30), napi_id: 16
[2022-02-17 09:59:10] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 19(47), napi_id: 16
[2022-02-17 09:59:11] INFO WSBASE/client_established 696
LWS_CALLBACK_CLIENT_ESTABLISHED client 38(66), napi_id: 11
The first number is the thread id (706 and 696). I have 2 threads, with
1 io_uring context per thread.
Next, you have the client id, and the number in parentheses is the
socket fd.
Based on the result, it appears that having a single napi_id per
context won't do it... I wish I could pick the brains of the people who
did things the way they did in the epoll implementation.
I guess that I could create 8 distinct io_uring contexts with
IORING_SETUP_ATTACH_WQ, but that seems like a big burden to place on
the shoulders of users when a simple linked list with ref-counted
elements could do it...
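To be clearer about what I have in mind, here is roughly the kind of
thing I picture (purely illustrative; none of these names exist in the
patch below):

/* one entry per napi_id seen on this ring (illustrative only) */
struct io_napi_entry {
	unsigned int		napi_id;
	struct list_head	list;	/* linked into a per-ctx napi list */
	refcount_t		refs;	/* one ref per in-flight request using this id */
};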
I think that if I merge your patch with what I have done so far, we
could get something really cool!
Concerning the remaining problem of when to remove the napi_id, I would
say that a good place to do it would be when a request is completed and
discarded, if a refcount were added to your napi_entry struct.
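Something along these lines at completion time (a hypothetical helper,
not part of either patch):

static void io_put_napi_entry(struct io_ring_ctx *ctx, struct io_napi_entry *ne)
{
	/* locking of the per-ctx napi list elided; drop the ref taken
	 * when the request picked up this napi_id */
	if (refcount_dec_and_test(&ne->refs)) {
		list_del(&ne->list);
		kfree(ne);
	}
}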
The only thing that I hate about this idea is that in my scenario the
sockets are going to stay pretty much the same for the whole existence
of the io_uring context. Therefore, the whole refcounting overhead
would be useless for me.
Here is the latest version of my effort:
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 77b9c7e4793b..ea2a3661c16f 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -63,6 +63,7 @@
#include <net/sock.h>
#include <net/af_unix.h>
#include <net/scm.h>
+#include <net/busy_poll.h>
#include <linux/anon_inodes.h>
#include <linux/sched/mm.h>
#include <linux/uaccess.h>
@@ -395,6 +396,10 @@ struct io_ring_ctx {
struct list_head sqd_list;
unsigned long check_cq_overflow;
+#ifdef CONFIG_NET_RX_BUSY_POLL
+ /* used to track busy poll napi_id */
+ unsigned int napi_id;
+#endif
struct {
unsigned cached_cq_tail;
@@ -6976,7 +6981,40 @@ static inline struct file *io_file_get_fixed(struct io_ring_ctx *ctx,
io_req_set_rsrc_node(req, ctx);
return file;
}
+#ifdef CONFIG_NET_RX_BUSY_POLL
+/*
+ * Set epoll busy poll NAPI ID from sk.
+ */
+static inline void io_set_busy_poll_napi_id(struct io_ring_ctx *ctx,
+					    struct file *file)
+{
+ unsigned int napi_id;
+ struct socket *sock;
+ struct sock *sk;
+
+ if (!net_busy_loop_on())
+ return;
+ sock = sock_from_file(file);
+ if (!sock)
+ return;
+
+ sk = sock->sk;
+ if (!sk)
+ return;
+
+ napi_id = READ_ONCE(sk->sk_napi_id);
+
+ /* Non-NAPI IDs can be rejected
+ * or
+ * Nothing to do if we already have this ID
+ */
+ if (napi_id < MIN_NAPI_ID || napi_id == ctx->napi_id)
+ return;
+
+ /* record NAPI ID for use in next busy poll */
+ ctx->napi_id = napi_id;
+}
+#endif
static struct file *io_file_get_normal(struct io_ring_ctx *ctx,
struct io_kiocb *req, int fd)
{
@@ -6985,8 +7023,14 @@ static struct file *io_file_get_normal(struct io_ring_ctx *ctx,
trace_io_uring_file_get(ctx, fd);
/* we don't allow fixed io_uring files */
- if (file && unlikely(file->f_op == &io_uring_fops))
- io_req_track_inflight(req);
+ if (file) {
+ if (unlikely(file->f_op == &io_uring_fops))
+ io_req_track_inflight(req);
+#ifdef CONFIG_NET_RX_BUSY_POLL
+ else
+ io_set_busy_poll_napi_id(ctx, file);
+#endif
+ }
return file;
}
@@ -7489,7 +7533,22 @@ static inline void io_ring_clear_wakeup_flag(struct io_ring_ctx *ctx)
ctx->rings->sq_flags & ~IORING_SQ_NEED_WAKEUP);
spin_unlock(&ctx->completion_lock);
}
+#ifdef CONFIG_NET_RX_BUSY_POLL
+/*
+ * Busy poll if globally on and supporting sockets found
+ */
+static inline bool io_napi_busy_loop(struct io_ring_ctx *ctx)
+{
+ unsigned int napi_id = ctx->napi_id;
+ if ((napi_id >= MIN_NAPI_ID) && net_busy_loop_on()) {
+ napi_busy_loop(napi_id, NULL, NULL, true,
+ BUSY_POLL_BUDGET);
+ return true;
+ }
+ return false;
+}
+#endif
static int __io_sq_thread(struct io_ring_ctx *ctx, bool cap_entries)
{
unsigned int to_submit;
@@ -7518,7 +7577,10 @@ static int __io_sq_thread(struct io_ring_ctx *ctx, bool cap_entries)
!(ctx->flags & IORING_SETUP_R_DISABLED))
ret = io_submit_sqes(ctx, to_submit);
mutex_unlock(&ctx->uring_lock);
-
+#ifdef CONFIG_NET_RX_BUSY_POLL
+ if (io_napi_busy_loop(ctx))
+ ++ret;
+#endif
if (to_submit && wq_has_sleeper(&ctx->sqo_sq_wait))
wake_up(&ctx->sqo_sq_wait);
if (creds)
@@ -7649,6 +7711,9 @@ struct io_wait_queue {
struct io_ring_ctx *ctx;
unsigned cq_tail;
unsigned nr_timeouts;
+#ifdef CONFIG_NET_RX_BUSY_POLL
+ unsigned busy_poll_to;
+#endif
};
static inline bool io_should_wake(struct io_wait_queue *iowq)
@@ -7709,6 +7774,29 @@ static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
return !*timeout ? -ETIME : 1;
}
+#ifdef CONFIG_NET_RX_BUSY_POLL
+static inline bool io_busy_loop_timeout(unsigned long start_time,
+ unsigned long bp_usec)
+{
+ if (bp_usec) {
+ unsigned long end_time = start_time + bp_usec;
+ unsigned long now = busy_loop_current_time();
+
+ return time_after(now, end_time);
+ }
+ return true;
+}
+
+static bool io_busy_loop_end(void *p, unsigned long start_time)
+{
+ struct io_wait_queue *iowq = p;
+
+ return io_busy_loop_timeout(start_time, iowq->busy_poll_to) ||
+ signal_pending(current) ||
+ io_should_wake(iowq);
+}
+#endif
+
/*
 * Wait until events become available, if we don't already have some. The
 * application must reap them itself, as they reside on the shared cq ring.
@@ -7729,12 +7817,33 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
if (!io_run_task_work())
break;
} while (1);
-
+#ifdef CONFIG_NET_RX_BUSY_POLL
+ iowq.busy_poll_to = 0;
+#endif
if (uts) {
struct timespec64 ts;
if (get_timespec64(&ts, uts))
return -EFAULT;
+#ifdef CONFIG_NET_RX_BUSY_POLL
+ if (!(ctx->flags & IORING_SETUP_SQPOLL) &&
+ (ctx->napi_id >= MIN_NAPI_ID) && net_busy_loop_on()) {
+ unsigned busy_poll_to =
+ READ_ONCE(sysctl_net_busy_poll);
+ struct timespec64 pollto =
+ ns_to_timespec64(1000*busy_poll_to);
+
+ if (timespec64_compare(&ts, &pollto) > 0) {
+ ts = timespec64_sub(ts, pollto);
+ iowq.busy_poll_to = busy_poll_to;
+ }
+ else {
+ iowq.busy_poll_to = timespec64_to_ns(&ts)/1000;
+ ts.tv_sec = 0;
+ ts.tv_nsec = 0;
+ }
+ }
+#endif
timeout = timespec64_to_jiffies(&ts);
}
@@ -7759,6 +7868,11 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
iowq.cq_tail = READ_ONCE(ctx->rings->cq.head) + min_events;
trace_io_uring_cqring_wait(ctx, min_events);
+#ifdef CONFIG_NET_RX_BUSY_POLL
+ if (iowq.busy_poll_to)
+ napi_busy_loop(ctx->napi_id, io_busy_loop_end, &iowq, true,
+ BUSY_POLL_BUDGET);
+#endif
do {
/* if we can't even flush overflow, don't wait for more */
if (!io_cqring_overflow_flush(ctx)) {