* [PATCH v2 0/5] fast poll multishot mode
@ 2022-05-06 7:00 Hao Xu
2022-05-06 7:00 ` [PATCH 1/5] io_uring: add IORING_ACCEPT_MULTISHOT for accept Hao Xu
` (6 more replies)
0 siblings, 7 replies; 36+ messages in thread
From: Hao Xu @ 2022-05-06 7:00 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe, Pavel Begunkov, linux-kernel
Let fast poll support multishot mode; currently only accept is added as
its first consumer.
theoretical analysis:
1) when connections come in fast
- singleshot:
      add accept sqe (userspace) --> accept inline
                    ^                      |
                    |----------------------|
- multishot:
      add accept sqe (userspace) --> accept inline
                                       ^      |
                                       |--*---|

we do accept repeatedly at the * place until we get EAGAIN
2) when connections come in at low pressure
similar to 1): we avoid a lot of userspace-kernel context
switches and useless vfs_poll() calls
tests:
Did some tests, which go like this:

    server                  client (multiple)
    accept                  connect
    read                    write
    write                   read
    close                   close
Basically, raise a number of clients (on the same machine as the server)
to connect to the server, then write some data to it; the server writes
the data back to the client after receiving it, and closes the connection
once the write returns. The client then reads the data and closes the
connection. Here I test 10000 clients connecting to one server, with a
data size of 128 bytes. Each client has its own goroutine, so they hit
the server within a short time.
Tested 20 times before/after this patchset; time spent (unit: cycles,
i.e. the return value of clock()):
before:
(1930136+1940725+1907981+1947601+1923812+1928226+1911087+1905897+1941075
+1934374+1906614+1912504+1949110+1908790+1909951+1941672+1969525+1934984
+1934226+1914385)/20.0 = 1927633.75
after:
(1858905+1917104+1895455+1963963+1892706+1889208+1874175+1904753+1874112
+1874985+1882706+1884642+1864694+1906508+1916150+1924250+1869060+1889506
+1871324+1940803)/20.0 = 1894750.45
(1927633.75 - 1894750.45) / 1927633.75 = ~1.71%
A liburing test is here:
https://github.com/HowHsu/liburing/blob/multishot_accept/test/accept.c
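
For reference, a rough sketch of the userspace side (simplified from the
test above; io_uring_prep_accept() and the CQE loop are standard liburing,
handle_conn() is a hypothetical handler, and the multishot flag is passed
in sqe->ioprio as patch 1/5 defines it):

        struct io_uring_cqe *cqe;
        struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);

        /* one SQE produces one CQE per accepted connection */
        io_uring_prep_accept(sqe, listen_fd, NULL, NULL, 0);
        sqe->ioprio |= IORING_ACCEPT_MULTISHOT;
        io_uring_submit(&ring);

        while (io_uring_wait_cqe(&ring, &cqe) == 0) {
                int res = cqe->res;
                unsigned int flags = cqe->flags;

                io_uring_cqe_seen(&ring, cqe);
                if (res >= 0)
                        handle_conn(res);  /* hypothetical: res is the new fd */
                /* no CQE_F_MORE: the multishot terminated, re-arm if needed */
                if (!(flags & IORING_CQE_F_MORE))
                        break;
        }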
v1->v2:
- re-implement it against the reworked poll code
Hao Xu (5):
io_uring: add IORING_ACCEPT_MULTISHOT for accept
io_uring: add REQ_F_APOLL_MULTISHOT for requests
io_uring: let fast poll support multishot
io_uring: add a helper for poll clean
io_uring: implement multishot mode for accept
fs/io_uring.c | 121 ++++++++++++++++++++++++++--------
include/uapi/linux/io_uring.h | 5 ++
2 files changed, 100 insertions(+), 26 deletions(-)
base-commit: f2e030dd7aaea5a937a2547dc980fab418fbc5e7
--
2.36.0
* [PATCH 1/5] io_uring: add IORING_ACCEPT_MULTISHOT for accept
2022-05-06 7:00 [PATCH v2 0/5] fast poll multishot mode Hao Xu
@ 2022-05-06 7:00 ` Hao Xu
2022-05-06 14:32 ` Jens Axboe
2022-05-06 7:00 ` [PATCH 2/5] io_uring: add REQ_F_APOLL_MULTISHOT for requests Hao Xu
` (5 subsequent siblings)
6 siblings, 1 reply; 36+ messages in thread
From: Hao Xu @ 2022-05-06 7:00 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe, Pavel Begunkov, linux-kernel
From: Hao Xu <[email protected]>
Add an accept flag, IORING_ACCEPT_MULTISHOT, for accept, which is used to
support multishot mode.
Signed-off-by: Hao Xu <[email protected]>
---
include/uapi/linux/io_uring.h | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index fad63564678a..73bc7e54ac18 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -213,6 +213,11 @@ enum {
#define IORING_ASYNC_CANCEL_FD (1U << 1)
#define IORING_ASYNC_CANCEL_ANY (1U << 2)
+/*
+ * accept flags stored in accept_flags
+ */
+#define IORING_ACCEPT_MULTISHOT (1U << 15)
+
/*
* IO completion data structure (Completion Queue Entry)
*/
--
2.36.0
* [PATCH 2/5] io_uring: add REQ_F_APOLL_MULTISHOT for requests
2022-05-06 7:00 [PATCH v2 0/5] fast poll multishot mode Hao Xu
2022-05-06 7:00 ` [PATCH 1/5] io_uring: add IORING_ACCEPT_MULTISHOT for accept Hao Xu
@ 2022-05-06 7:00 ` Hao Xu
2022-05-06 7:01 ` [PATCH 3/5] io_uring: let fast poll support multishot Hao Xu
` (4 subsequent siblings)
6 siblings, 0 replies; 36+ messages in thread
From: Hao Xu @ 2022-05-06 7:00 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe, Pavel Begunkov, linux-kernel
From: Hao Xu <[email protected]>
Add a flag to indicate multishot mode for fast poll. Currently only
accept uses it, but more operations may leverage it in the
future.
Signed-off-by: Hao Xu <[email protected]>
---
fs/io_uring.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 1e7466079af7..8ebb1a794e36 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -808,6 +808,7 @@ enum {
REQ_F_SINGLE_POLL_BIT,
REQ_F_DOUBLE_POLL_BIT,
REQ_F_PARTIAL_IO_BIT,
+ REQ_F_APOLL_MULTISHOT_BIT,
/* keep async read/write and isreg together and in order */
REQ_F_SUPPORT_NOWAIT_BIT,
REQ_F_ISREG_BIT,
@@ -872,6 +873,8 @@ enum {
REQ_F_DOUBLE_POLL = BIT(REQ_F_DOUBLE_POLL_BIT),
/* request has already done partial IO */
REQ_F_PARTIAL_IO = BIT(REQ_F_PARTIAL_IO_BIT),
+ /* fast poll multishot mode */
+ REQ_F_APOLL_MULTISHOT = BIT(REQ_F_APOLL_MULTISHOT_BIT),
};
struct async_poll {
--
2.36.0
* [PATCH 3/5] io_uring: let fast poll support multishot
2022-05-06 7:00 [PATCH v2 0/5] fast poll multishot mode Hao Xu
2022-05-06 7:00 ` [PATCH 1/5] io_uring: add IORING_ACCEPT_MULTISHOT for accept Hao Xu
2022-05-06 7:00 ` [PATCH 2/5] io_uring: add REQ_F_APOLL_MULTISHOT for requests Hao Xu
@ 2022-05-06 7:01 ` Hao Xu
2022-05-06 17:19 ` Pavel Begunkov
2022-05-06 18:02 ` kernel test robot
2022-05-06 7:01 ` [PATCH 4/5] io_uring: add a helper for poll clean Hao Xu
` (3 subsequent siblings)
6 siblings, 2 replies; 36+ messages in thread
From: Hao Xu @ 2022-05-06 7:01 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe, Pavel Begunkov, linux-kernel
From: Hao Xu <[email protected]>
For operations like accept, multishot is a useful feature, since we can
reduce the number of accept SQEs. Let's integrate it into fast poll; it
may be good for other operations in the future.
Signed-off-by: Hao Xu <[email protected]>
---
fs/io_uring.c | 41 ++++++++++++++++++++++++++---------------
1 file changed, 26 insertions(+), 15 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 8ebb1a794e36..d33777575faf 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5952,7 +5952,7 @@ static void io_poll_remove_entries(struct io_kiocb *req)
* either spurious wakeup or multishot CQE is served. 0 when it's done with
* the request, then the mask is stored in req->cqe.res.
*/
-static int io_poll_check_events(struct io_kiocb *req, bool locked)
+static int io_poll_check_events(struct io_kiocb *req, bool *locked)
{
struct io_ring_ctx *ctx = req->ctx;
int v;
@@ -5981,17 +5981,26 @@ static int io_poll_check_events(struct io_kiocb *req, bool locked)
/* multishot, just fill an CQE and proceed */
if (req->cqe.res && !(req->apoll_events & EPOLLONESHOT)) {
- __poll_t mask = mangle_poll(req->cqe.res & req->apoll_events);
- bool filled;
-
- spin_lock(&ctx->completion_lock);
- filled = io_fill_cqe_aux(ctx, req->cqe.user_data, mask,
- IORING_CQE_F_MORE);
- io_commit_cqring(ctx);
- spin_unlock(&ctx->completion_lock);
- if (unlikely(!filled))
- return -ECANCELED;
- io_cqring_ev_posted(ctx);
+ if (req->flags & REQ_F_APOLL_MULTISHOT) {
+ io_tw_lock(req->ctx, locked);
+ if (likely(!(req->task->flags & PF_EXITING)))
+ io_queue_sqe(req);
+ else
+ return -EFAULT;
+ } else {
+ __poll_t mask = mangle_poll(req->cqe.res &
+ req->apoll_events);
+ bool filled;
+
+ spin_lock(&ctx->completion_lock);
+ filled = io_fill_cqe_aux(ctx, req->cqe.user_data,
+ mask, IORING_CQE_F_MORE);
+ io_commit_cqring(ctx);
+ spin_unlock(&ctx->completion_lock);
+ if (unlikely(!filled))
+ return -ECANCELED;
+ io_cqring_ev_posted(ctx);
+ }
} else if (req->cqe.res) {
return 0;
}
@@ -6010,7 +6019,7 @@ static void io_poll_task_func(struct io_kiocb *req, bool *locked)
struct io_ring_ctx *ctx = req->ctx;
int ret;
- ret = io_poll_check_events(req, *locked);
+ ret = io_poll_check_events(req, locked);
if (ret > 0)
return;
@@ -6035,7 +6044,7 @@ static void io_apoll_task_func(struct io_kiocb *req, bool *locked)
struct io_ring_ctx *ctx = req->ctx;
int ret;
- ret = io_poll_check_events(req, *locked);
+ ret = io_poll_check_events(req, locked);
if (ret > 0)
return;
@@ -6275,7 +6284,7 @@ static int io_arm_poll_handler(struct io_kiocb *req, unsigned issue_flags)
struct io_ring_ctx *ctx = req->ctx;
struct async_poll *apoll;
struct io_poll_table ipt;
- __poll_t mask = EPOLLONESHOT | POLLERR | POLLPRI;
+ __poll_t mask = POLLERR | POLLPRI;
int ret;
if (!def->pollin && !def->pollout)
@@ -6284,6 +6293,8 @@ static int io_arm_poll_handler(struct io_kiocb *req, unsigned issue_flags)
return IO_APOLL_ABORTED;
if ((req->flags & (REQ_F_POLLED|REQ_F_PARTIAL_IO)) == REQ_F_POLLED)
return IO_APOLL_ABORTED;
+ if (!(req->flags & REQ_F_APOLL_MULTISHOT))
+ mask |= EPOLLONESHOT;
if (def->pollin) {
mask |= POLLIN | POLLRDNORM;
--
2.36.0
* [PATCH 4/5] io_uring: add a helper for poll clean
2022-05-06 7:00 [PATCH v2 0/5] fast poll multishot mode Hao Xu
` (2 preceding siblings ...)
2022-05-06 7:01 ` [PATCH 3/5] io_uring: let fast poll support multishot Hao Xu
@ 2022-05-06 7:01 ` Hao Xu
2022-05-06 11:04 ` kernel test robot
` (3 more replies)
2022-05-06 7:01 ` [PATCH 5/5] io_uring: implement multishot mode for accept Hao Xu
` (2 subsequent siblings)
6 siblings, 4 replies; 36+ messages in thread
From: Hao Xu @ 2022-05-06 7:01 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe, Pavel Begunkov, linux-kernel
From: Hao Xu <[email protected]>
Add a helper for poll cleanup; it will be used by multishot accept in
later patches.
Signed-off-by: Hao Xu <[email protected]>
---
fs/io_uring.c | 23 ++++++++++++++++++-----
1 file changed, 18 insertions(+), 5 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index d33777575faf..0a83ecc457d1 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5711,6 +5711,23 @@ static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
return 0;
}
+static inline void __io_poll_clean(struct io_kiocb *req)
+{
+ struct io_ring_ctx *ctx = req->ctx;
+
+ io_poll_remove_entries(req);
+ spin_lock(&ctx->completion_lock);
+ hash_del(&req->hash_node);
+ spin_unlock(&ctx->completion_lock);
+}
+
+#define REQ_F_APOLL_MULTI_POLLED (REQ_F_APOLL_MULTISHOT | REQ_F_POLLED)
+static inline void io_poll_clean(struct io_kiocb *req)
+{
+ if ((req->flags & REQ_F_APOLL_MULTI_POLLED) == REQ_F_APOLL_MULTI_POLLED)
+ __io_poll_clean(req);
+}
+
static int io_accept(struct io_kiocb *req, unsigned int issue_flags)
{
struct io_accept *accept = &req->accept;
@@ -6041,17 +6058,13 @@ static void io_poll_task_func(struct io_kiocb *req, bool *locked)
static void io_apoll_task_func(struct io_kiocb *req, bool *locked)
{
- struct io_ring_ctx *ctx = req->ctx;
int ret;
ret = io_poll_check_events(req, locked);
if (ret > 0)
return;
- io_poll_remove_entries(req);
- spin_lock(&ctx->completion_lock);
- hash_del(&req->hash_node);
- spin_unlock(&ctx->completion_lock);
+ __io_poll_clean(req);
if (!ret)
io_req_task_submit(req, locked);
--
2.36.0
* [PATCH 5/5] io_uring: implement multishot mode for accept
2022-05-06 7:00 [PATCH v2 0/5] fast poll multishot mode Hao Xu
` (3 preceding siblings ...)
2022-05-06 7:01 ` [PATCH 4/5] io_uring: add a helper for poll clean Hao Xu
@ 2022-05-06 7:01 ` Hao Xu
2022-05-06 14:42 ` Jens Axboe
2022-05-06 20:50 ` Jens Axboe
2022-05-06 7:36 ` [PATCH v2 0/5] fast poll multishot mode Hao Xu
2022-05-06 22:23 ` Jens Axboe
6 siblings, 2 replies; 36+ messages in thread
From: Hao Xu @ 2022-05-06 7:01 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe, Pavel Begunkov, linux-kernel
From: Hao Xu <[email protected]>
Refactor io_accept() to support multishot mode.
theoretical analysis:
1) when connections come in fast
- singleshot:
      add accept sqe (userspace) --> accept inline
                    ^                      |
                    |----------------------|
- multishot:
      add accept sqe (userspace) --> accept inline
                                       ^      |
                                       |--*---|

we do accept repeatedly at the * place until we get EAGAIN
2) when connections come in at low pressure
similar to 1): we avoid a lot of userspace-kernel context
switches and useless vfs_poll() calls
tests:
Did some tests, which go like this:

    server                  client (multiple)
    accept                  connect
    read                    write
    write                   read
    close                   close
Basically, raise a number of clients (on the same machine as the server)
to connect to the server, then write some data to it; the server writes
the data back to the client after receiving it, and closes the connection
once the write returns. The client then reads the data and closes the
connection. Here I test 10000 clients connecting to one server, with a
data size of 128 bytes. Each client has its own goroutine, so they hit
the server within a short time.
Tested 20 times before/after this patchset; time spent (unit: cycles,
i.e. the return value of clock()):
before:
(1930136+1940725+1907981+1947601+1923812+1928226+1911087+1905897+1941075
+1934374+1906614+1912504+1949110+1908790+1909951+1941672+1969525+1934984
+1934226+1914385)/20.0 = 1927633.75
after:
(1858905+1917104+1895455+1963963+1892706+1889208+1874175+1904753+1874112
+1874985+1882706+1884642+1864694+1906508+1916150+1924250+1869060+1889506
+1871324+1940803)/20.0 = 1894750.45
(1927633.75 - 1894750.45) / 1927633.75 = ~1.71%
Signed-off-by: Hao Xu <[email protected]>
---
fs/io_uring.c | 54 +++++++++++++++++++++++++++++++++++++++++++++------
1 file changed, 48 insertions(+), 6 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 0a83ecc457d1..9febe7774dc3 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1254,6 +1254,7 @@ static int io_close_fixed(struct io_kiocb *req, unsigned int issue_flags);
static enum hrtimer_restart io_link_timeout_fn(struct hrtimer *timer);
static void io_eventfd_signal(struct io_ring_ctx *ctx);
static void io_req_tw_post_queue(struct io_kiocb *req, s32 res, u32 cflags);
+static void io_poll_remove_entries(struct io_kiocb *req);
static struct kmem_cache *req_cachep;
@@ -5690,24 +5691,29 @@ static int io_recv(struct io_kiocb *req, unsigned int issue_flags)
static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
{
struct io_accept *accept = &req->accept;
+ bool multishot;
if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
return -EINVAL;
- if (sqe->ioprio || sqe->len || sqe->buf_index)
+ if (sqe->len || sqe->buf_index)
return -EINVAL;
accept->addr = u64_to_user_ptr(READ_ONCE(sqe->addr));
accept->addr_len = u64_to_user_ptr(READ_ONCE(sqe->addr2));
accept->flags = READ_ONCE(sqe->accept_flags);
accept->nofile = rlimit(RLIMIT_NOFILE);
+ multishot = !!(READ_ONCE(sqe->ioprio) & IORING_ACCEPT_MULTISHOT);
accept->file_slot = READ_ONCE(sqe->file_index);
- if (accept->file_slot && (accept->flags & SOCK_CLOEXEC))
+ if (accept->file_slot && ((accept->flags & SOCK_CLOEXEC) || multishot))
return -EINVAL;
if (accept->flags & ~(SOCK_CLOEXEC | SOCK_NONBLOCK))
return -EINVAL;
if (SOCK_NONBLOCK != O_NONBLOCK && (accept->flags & SOCK_NONBLOCK))
accept->flags = (accept->flags & ~SOCK_NONBLOCK) | O_NONBLOCK;
+ if (multishot)
+ req->flags |= REQ_F_APOLL_MULTISHOT;
+
return 0;
}
@@ -5730,6 +5736,7 @@ static inline void io_poll_clean(struct io_kiocb *req)
static int io_accept(struct io_kiocb *req, unsigned int issue_flags)
{
+ struct io_ring_ctx *ctx = req->ctx;
struct io_accept *accept = &req->accept;
bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
unsigned int file_flags = force_nonblock ? O_NONBLOCK : 0;
@@ -5737,10 +5744,13 @@ static int io_accept(struct io_kiocb *req, unsigned int issue_flags)
struct file *file;
int ret, fd;
+retry:
if (!fixed) {
fd = __get_unused_fd_flags(accept->flags, accept->nofile);
- if (unlikely(fd < 0))
+ if (unlikely(fd < 0)) {
+ io_poll_clean(req);
return fd;
+ }
}
file = do_accept(req->file, file_flags, accept->addr, accept->addr_len,
accept->flags);
@@ -5748,8 +5758,12 @@ static int io_accept(struct io_kiocb *req, unsigned int issue_flags)
if (!fixed)
put_unused_fd(fd);
ret = PTR_ERR(file);
- if (ret == -EAGAIN && force_nonblock)
- return -EAGAIN;
+ if (ret == -EAGAIN && force_nonblock) {
+ if ((req->flags & REQ_F_APOLL_MULTI_POLLED) ==
+ REQ_F_APOLL_MULTI_POLLED)
+ ret = 0;
+ return ret;
+ }
if (ret == -ERESTARTSYS)
ret = -EINTR;
req_set_fail(req);
@@ -5760,7 +5774,35 @@ static int io_accept(struct io_kiocb *req, unsigned int issue_flags)
ret = io_install_fixed_file(req, file, issue_flags,
accept->file_slot - 1);
}
- __io_req_complete(req, issue_flags, ret, 0);
+
+ if (req->flags & REQ_F_APOLL_MULTISHOT) {
+ if (ret >= 0) {
+ bool filled;
+
+ spin_lock(&ctx->completion_lock);
+ filled = io_fill_cqe_aux(ctx, req->cqe.user_data, ret,
+ IORING_CQE_F_MORE);
+ io_commit_cqring(ctx);
+ spin_unlock(&ctx->completion_lock);
+ if (unlikely(!filled)) {
+ io_poll_clean(req);
+ return -ECANCELED;
+ }
+ io_cqring_ev_posted(ctx);
+ goto retry;
+ } else {
+ /*
+ * the apoll multishot req should handle poll
+ * cancellation by itself since the upper layer
+ * who called io_queue_sqe() cannot get errors
+ * happened here.
+ */
+ io_poll_clean(req);
+ return ret;
+ }
+ } else {
+ __io_req_complete(req, issue_flags, ret, 0);
+ }
return 0;
}
--
2.36.0
* Re: [PATCH v2 0/5] fast poll multishot mode
2022-05-06 7:00 [PATCH v2 0/5] fast poll multishot mode Hao Xu
` (4 preceding siblings ...)
2022-05-06 7:01 ` [PATCH 5/5] io_uring: implement multishot mode for accept Hao Xu
@ 2022-05-06 7:36 ` Hao Xu
2022-05-06 14:18 ` Jens Axboe
2022-05-06 22:23 ` Jens Axboe
6 siblings, 1 reply; 36+ messages in thread
From: Hao Xu @ 2022-05-06 7:36 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe, Pavel Begunkov, linux-kernel
Hi All,
I actually have a question about the current poll code: from the code, it
seems that when we cancel a poll-like request, it will ignore the existing
events and just raise a -ECANCELED cqe, though I haven't tested it. Is
this by design, or am I missing something?
Regards,
Hao
* Re: [PATCH 4/5] io_uring: add a helper for poll clean
2022-05-06 7:01 ` [PATCH 4/5] io_uring: add a helper for poll clean Hao Xu
@ 2022-05-06 11:04 ` kernel test robot
2022-05-06 12:47 ` kernel test robot
` (2 subsequent siblings)
3 siblings, 0 replies; 36+ messages in thread
From: kernel test robot @ 2022-05-06 11:04 UTC (permalink / raw)
To: Hao Xu, io-uring; +Cc: kbuild-all, Jens Axboe, Pavel Begunkov, linux-kernel
Hi Hao,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on f2e030dd7aaea5a937a2547dc980fab418fbc5e7]
url: https://github.com/intel-lab-lkp/linux/commits/Hao-Xu/fast-poll-multishot-mode/20220506-150750
base: f2e030dd7aaea5a937a2547dc980fab418fbc5e7
config: m68k-randconfig-r025-20220506 (https://download.01.org/0day-ci/archive/20220506/[email protected]/config)
compiler: m68k-linux-gcc (GCC) 11.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/acb232e81643bd097278ebdc17038e6f280e7212
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Hao-Xu/fast-poll-multishot-mode/20220506-150750
git checkout acb232e81643bd097278ebdc17038e6f280e7212
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.3.0 make.cross W=1 O=build_dir ARCH=m68k SHELL=/bin/bash
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>
All errors (new ones prefixed by >>):
fs/io_uring.c: In function '__io_submit_flush_completions':
fs/io_uring.c:2785:40: warning: variable 'prev' set but not used [-Wunused-but-set-variable]
2785 | struct io_wq_work_node *node, *prev;
| ^~~~
fs/io_uring.c: In function 'io_apoll_task_func':
>> fs/io_uring.c:6067:9: error: implicit declaration of function '__io_poll_clean'; did you mean '__io_fill_cqe'? [-Werror=implicit-function-declaration]
6067 | __io_poll_clean(req);
| ^~~~~~~~~~~~~~~
| __io_fill_cqe
cc1: some warnings being treated as errors
vim +6067 fs/io_uring.c
6058
6059 static void io_apoll_task_func(struct io_kiocb *req, bool *locked)
6060 {
6061 int ret;
6062
6063 ret = io_poll_check_events(req, locked);
6064 if (ret > 0)
6065 return;
6066
> 6067 __io_poll_clean(req);
6068
6069 if (!ret)
6070 io_req_task_submit(req, locked);
6071 else
6072 io_req_complete_failed(req, ret);
6073 }
6074
--
0-DAY CI Kernel Test Service
https://01.org/lkp
* Re: [PATCH 4/5] io_uring: add a helper for poll clean
2022-05-06 7:01 ` [PATCH 4/5] io_uring: add a helper for poll clean Hao Xu
2022-05-06 11:04 ` kernel test robot
@ 2022-05-06 12:47 ` kernel test robot
2022-05-06 14:36 ` Jens Axboe
2022-05-06 16:22 ` Pavel Begunkov
3 siblings, 0 replies; 36+ messages in thread
From: kernel test robot @ 2022-05-06 12:47 UTC (permalink / raw)
To: Hao Xu, io-uring
Cc: llvm, kbuild-all, Jens Axboe, Pavel Begunkov, linux-kernel
Hi Hao,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on f2e030dd7aaea5a937a2547dc980fab418fbc5e7]
url: https://github.com/intel-lab-lkp/linux/commits/Hao-Xu/fast-poll-multishot-mode/20220506-150750
base: f2e030dd7aaea5a937a2547dc980fab418fbc5e7
config: mips-pic32mzda_defconfig (https://download.01.org/0day-ci/archive/20220506/[email protected]/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 5e004fb787698440a387750db7f8028e7cb14cfc)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install mips cross compiling tool for clang build
# apt-get install binutils-mips-linux-gnu
# https://github.com/intel-lab-lkp/linux/commit/acb232e81643bd097278ebdc17038e6f280e7212
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Hao-Xu/fast-poll-multishot-mode/20220506-150750
git checkout acb232e81643bd097278ebdc17038e6f280e7212
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=mips SHELL=/bin/bash
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>
All errors (new ones prefixed by >>):
>> fs/io_uring.c:6067:2: error: call to undeclared function '__io_poll_clean'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
__io_poll_clean(req);
^
fs/io_uring.c:6067:2: note: did you mean '__io_fill_cqe'?
fs/io_uring.c:2141:20: note: '__io_fill_cqe' declared here
static inline bool __io_fill_cqe(struct io_ring_ctx *ctx, u64 user_data,
^
1 error generated.
vim +/__io_poll_clean +6067 fs/io_uring.c
6058
6059 static void io_apoll_task_func(struct io_kiocb *req, bool *locked)
6060 {
6061 int ret;
6062
6063 ret = io_poll_check_events(req, locked);
6064 if (ret > 0)
6065 return;
6066
> 6067 __io_poll_clean(req);
6068
6069 if (!ret)
6070 io_req_task_submit(req, locked);
6071 else
6072 io_req_complete_failed(req, ret);
6073 }
6074
--
0-DAY CI Kernel Test Service
https://01.org/lkp
* Re: [PATCH v2 0/5] fast poll multishot mode
2022-05-06 7:36 ` [PATCH v2 0/5] fast poll multishot mode Hao Xu
@ 2022-05-06 14:18 ` Jens Axboe
2022-05-06 16:01 ` Pavel Begunkov
0 siblings, 1 reply; 36+ messages in thread
From: Jens Axboe @ 2022-05-06 14:18 UTC (permalink / raw)
To: Hao Xu, io-uring; +Cc: Pavel Begunkov, linux-kernel
On 5/6/22 1:36 AM, Hao Xu wrote:
> Hi All,
> I actually have a question about the current poll code: from the code, it
> seems that when we cancel a poll-like request, it will ignore the existing
> events and just raise a -ECANCELED cqe, though I haven't tested it. Is
> this by design, or am I missing something?
That's by design, but honestly I don't think anyone considered the case
where it's being canceled but has events already. For that case, I think
we should follow the usual logic of only returning an error (canceled)
if we don't have events; if we have events, just return them. For
multi-shot, obviously also terminate, but same logic there.
Care to do a separate patch for that?
--
Jens Axboe
* Re: [PATCH 1/5] io_uring: add IORING_ACCEPT_MULTISHOT for accept
2022-05-06 7:00 ` [PATCH 1/5] io_uring: add IORING_ACCEPT_MULTISHOT for accept Hao Xu
@ 2022-05-06 14:32 ` Jens Axboe
2022-05-07 4:05 ` Hao Xu
0 siblings, 1 reply; 36+ messages in thread
From: Jens Axboe @ 2022-05-06 14:32 UTC (permalink / raw)
To: Hao Xu, io-uring; +Cc: Pavel Begunkov, linux-kernel
On 5/6/22 1:00 AM, Hao Xu wrote:
> From: Hao Xu <[email protected]>
>
> Add an accept flag, IORING_ACCEPT_MULTISHOT, for accept, which is used to
> support multishot mode.
>
> Signed-off-by: Hao Xu <[email protected]>
> ---
> include/uapi/linux/io_uring.h | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
> index fad63564678a..73bc7e54ac18 100644
> --- a/include/uapi/linux/io_uring.h
> +++ b/include/uapi/linux/io_uring.h
> @@ -213,6 +213,11 @@ enum {
> #define IORING_ASYNC_CANCEL_FD (1U << 1)
> #define IORING_ASYNC_CANCEL_ANY (1U << 2)
>
> +/*
> + * accept flags stored in accept_flags
> + */
> +#define IORING_ACCEPT_MULTISHOT (1U << 15)
It isn't stored in accept_flags, is it? This is an io_uring private
flag, and it's in ioprio. Which is honestly a good place for per-op
private flags, since nobody really uses ioprio outside of read/write
style requests. But the comment is wrong :-)
--
Jens Axboe
* Re: [PATCH 4/5] io_uring: add a helper for poll clean
2022-05-06 7:01 ` [PATCH 4/5] io_uring: add a helper for poll clean Hao Xu
2022-05-06 11:04 ` kernel test robot
2022-05-06 12:47 ` kernel test robot
@ 2022-05-06 14:36 ` Jens Axboe
2022-05-07 6:37 ` Hao Xu
2022-05-06 16:22 ` Pavel Begunkov
3 siblings, 1 reply; 36+ messages in thread
From: Jens Axboe @ 2022-05-06 14:36 UTC (permalink / raw)
To: Hao Xu, io-uring; +Cc: Pavel Begunkov, linux-kernel
On 5/6/22 1:01 AM, Hao Xu wrote:
> From: Hao Xu <[email protected]>
>
> Add a helper for poll cleanup; it will be used by multishot accept in
> later patches.
Should this just go into io_clean_op()? Didn't look at it thoroughly,
but it'd remove some cases from the next patch if it could.
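
Something like this, perhaps (rough sketch, untested; just moving the flag
check and cleanup from the new helper into io_clean_op()):

        static void io_clean_op(struct io_kiocb *req)
        {
                ...
                if ((req->flags & REQ_F_APOLL_MULTI_POLLED) ==
                     REQ_F_APOLL_MULTI_POLLED) {
                        io_poll_remove_entries(req);
                        spin_lock(&req->ctx->completion_lock);
                        hash_del(&req->hash_node);
                        spin_unlock(&req->ctx->completion_lock);
                }
                ...
        }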
--
Jens Axboe
* Re: [PATCH 5/5] io_uring: implement multishot mode for accept
2022-05-06 7:01 ` [PATCH 5/5] io_uring: implement multishot mode for accept Hao Xu
@ 2022-05-06 14:42 ` Jens Axboe
2022-05-07 9:13 ` Hao Xu
2022-05-06 20:50 ` Jens Axboe
1 sibling, 1 reply; 36+ messages in thread
From: Jens Axboe @ 2022-05-06 14:42 UTC (permalink / raw)
To: Hao Xu, io-uring; +Cc: Pavel Begunkov, linux-kernel
On 5/6/22 1:01 AM, Hao Xu wrote:
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 0a83ecc457d1..9febe7774dc3 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -1254,6 +1254,7 @@ static int io_close_fixed(struct io_kiocb *req, unsigned int issue_flags);
> static enum hrtimer_restart io_link_timeout_fn(struct hrtimer *timer);
> static void io_eventfd_signal(struct io_ring_ctx *ctx);
> static void io_req_tw_post_queue(struct io_kiocb *req, s32 res, u32 cflags);
> +static void io_poll_remove_entries(struct io_kiocb *req);
>
> static struct kmem_cache *req_cachep;
>
> @@ -5690,24 +5691,29 @@ static int io_recv(struct io_kiocb *req, unsigned int issue_flags)
> static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
> {
> struct io_accept *accept = &req->accept;
> + bool multishot;
>
> if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
> return -EINVAL;
> - if (sqe->ioprio || sqe->len || sqe->buf_index)
> + if (sqe->len || sqe->buf_index)
> return -EINVAL;
>
> accept->addr = u64_to_user_ptr(READ_ONCE(sqe->addr));
> accept->addr_len = u64_to_user_ptr(READ_ONCE(sqe->addr2));
> accept->flags = READ_ONCE(sqe->accept_flags);
> accept->nofile = rlimit(RLIMIT_NOFILE);
> + multishot = !!(READ_ONCE(sqe->ioprio) & IORING_ACCEPT_MULTISHOT);
I tend to like:
	multishot = (READ_ONCE(sqe->ioprio) & IORING_ACCEPT_MULTISHOT) != 0;

as I think it's more readable. But I think we really want it ala:

	u16 poll_flags;

	poll_flags = READ_ONCE(sqe->ioprio);
	if (poll_flags & ~IORING_ACCEPT_MULTISHOT)
		return -EINVAL;
	...
to ensure that we can add more flags later, hence only accepting this
single flag right now.
Do we need REQ_F_APOLL_MULTI_POLLED, or can we just store whether this
is a multishot request in struct io_accept?
> @@ -5760,7 +5774,35 @@ static int io_accept(struct io_kiocb *req, unsigned int issue_flags)
> ret = io_install_fixed_file(req, file, issue_flags,
> accept->file_slot - 1);
> }
> - __io_req_complete(req, issue_flags, ret, 0);
> +
> + if (req->flags & REQ_F_APOLL_MULTISHOT) {
> + if (ret >= 0) {
> + bool filled;
> +
> + spin_lock(&ctx->completion_lock);
> + filled = io_fill_cqe_aux(ctx, req->cqe.user_data, ret,
> + IORING_CQE_F_MORE);
> + io_commit_cqring(ctx);
> + spin_unlock(&ctx->completion_lock);
> + if (unlikely(!filled)) {
> + io_poll_clean(req);
> + return -ECANCELED;
> + }
> + io_cqring_ev_posted(ctx);
> + goto retry;
> + } else {
> + /*
> + * the apoll multishot req should handle poll
> + * cancellation by itself since the upper layer
> + * who called io_queue_sqe() cannot get errors
> + * happened here.
> + */
> + io_poll_clean(req);
> + return ret;
> + }
> + } else {
> + __io_req_complete(req, issue_flags, ret, 0);
> + }
> return 0;
> }
I'd probably just make that:
	if (!(req->flags & REQ_F_APOLL_MULTISHOT)) {
		__io_req_complete(req, issue_flags, ret, 0);
		return 0;
	}
	if (ret >= 0) {
		bool filled;

		spin_lock(&ctx->completion_lock);
		filled = io_fill_cqe_aux(ctx, req->cqe.user_data, ret,
					 IORING_CQE_F_MORE);
		io_commit_cqring(ctx);
		spin_unlock(&ctx->completion_lock);
		if (filled) {
			io_cqring_ev_posted(ctx);
			goto retry;
		}
		/* fall through to error case */
		ret = -ECANCELED;
	}

	/*
	 * the apoll multishot req should handle poll
	 * cancellation by itself since the upper layer
	 * who called io_queue_sqe() cannot get errors
	 * happened here.
	 */
	io_poll_clean(req);
	return ret;
which I think is a lot easier to read and keeps the indentation at a
manageable level and reduces duplicate code.
--
Jens Axboe
* Re: [PATCH v2 0/5] fast poll multishot mode
2022-05-06 14:18 ` Jens Axboe
@ 2022-05-06 16:01 ` Pavel Begunkov
2022-05-06 16:03 ` Jens Axboe
0 siblings, 1 reply; 36+ messages in thread
From: Pavel Begunkov @ 2022-05-06 16:01 UTC (permalink / raw)
To: Jens Axboe, Hao Xu, io-uring; +Cc: linux-kernel
On 5/6/22 15:18, Jens Axboe wrote:
> On 5/6/22 1:36 AM, Hao Xu wrote:
>> Hi All,
>> I actually have a question about the current poll code: from the code, it
>> seems that when we cancel a poll-like request, it will ignore the existing
>> events and just raise a -ECANCELED cqe, though I haven't tested it. Is
>> this by design, or am I missing something?
>
> That's by design, but honestly I don't think anyone considered the case
> where it's being canceled but has events already. For that case, I think
> we should follow the usual logic of only returning an error (canceled)
> if we don't have events; if we have events, just return them. For
> multi-shot, obviously also terminate, but same logic there.
Why would we care? It's inherently racy in any case and any
user not handling this is already screwed.
--
Pavel Begunkov
* Re: [PATCH v2 0/5] fast poll multishot mode
2022-05-06 16:01 ` Pavel Begunkov
@ 2022-05-06 16:03 ` Jens Axboe
0 siblings, 0 replies; 36+ messages in thread
From: Jens Axboe @ 2022-05-06 16:03 UTC (permalink / raw)
To: Pavel Begunkov, Hao Xu, io-uring; +Cc: linux-kernel
On 5/6/22 10:01 AM, Pavel Begunkov wrote:
> On 5/6/22 15:18, Jens Axboe wrote:
>> On 5/6/22 1:36 AM, Hao Xu wrote:
>>> Hi All,
>>> I actually have a question about the current poll code: from the code, it
>>> seems that when we cancel a poll-like request, it will ignore the existing
>>> events and just raise a -ECANCELED cqe, though I haven't tested it. Is
>>> this by design, or am I missing something?
>>
>> That's by design, but honestly I don't think anyone considered the case
>> where it's being canceled but has events already. For that case, I think
>> we should follow the usual logic of only returning an error (canceled)
>> if we don't have events; if we have events, just return them. For
>> multi-shot, obviously also terminate, but same logic there.
>
> Why would we care? It's inherently racy in any case and any
> user not handling this is already screwed.
We don't really need to care, but the normal logic is "return error if
we have no result, return result otherwise". No reason this should be
any different imho; it's not really a correctness thing.
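
Something along these lines in the poll completion path, just to
illustrate the semantics (untested):

        /* on cancelation, prefer delivering the events we already have */
        if (req->cqe.res)
                io_req_complete(req, mangle_poll(req->cqe.res));
        else
                io_req_complete_failed(req, -ECANCELED);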
--
Jens Axboe
* Re: [PATCH 4/5] io_uring: add a helper for poll clean
2022-05-06 7:01 ` [PATCH 4/5] io_uring: add a helper for poll clean Hao Xu
` (2 preceding siblings ...)
2022-05-06 14:36 ` Jens Axboe
@ 2022-05-06 16:22 ` Pavel Begunkov
2022-05-07 6:43 ` Hao Xu
3 siblings, 1 reply; 36+ messages in thread
From: Pavel Begunkov @ 2022-05-06 16:22 UTC (permalink / raw)
To: Hao Xu, io-uring; +Cc: Jens Axboe, linux-kernel
On 5/6/22 08:01, Hao Xu wrote:
> From: Hao Xu <[email protected]>
>
> Add a helper for poll cleanup; it will be used by multishot accept in
> later patches.
>
> Signed-off-by: Hao Xu <[email protected]>
> ---
> fs/io_uring.c | 23 ++++++++++++++++++-----
> 1 file changed, 18 insertions(+), 5 deletions(-)
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index d33777575faf..0a83ecc457d1 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -5711,6 +5711,23 @@ static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
> return 0;
> }
>
> +static inline void __io_poll_clean(struct io_kiocb *req)
> +{
> + struct io_ring_ctx *ctx = req->ctx;
> +
> + io_poll_remove_entries(req);
> + spin_lock(&ctx->completion_lock);
> + hash_del(&req->hash_node);
> + spin_unlock(&ctx->completion_lock);
> +}
> +
> +#define REQ_F_APOLL_MULTI_POLLED (REQ_F_APOLL_MULTISHOT | REQ_F_POLLED)
> +static inline void io_poll_clean(struct io_kiocb *req)
> +{
> + if ((req->flags & REQ_F_APOLL_MULTI_POLLED) == REQ_F_APOLL_MULTI_POLLED)
So it triggers for apoll multishot only when REQ_F_POLLED is _not_ set,
but if it's not set it never went through arm_poll etc., and there is
nothing to clean up. What is the catch?
Btw, I don't see the function used in this patch; better to add it later,
or at least mark it with the unused attribute, or some builds may fail.
> + __io_poll_clean(req);
> +}
> +
> static int io_accept(struct io_kiocb *req, unsigned int issue_flags)
> {
> struct io_accept *accept = &req->accept;
> @@ -6041,17 +6058,13 @@ static void io_poll_task_func(struct io_kiocb *req, bool *locked)
>
> static void io_apoll_task_func(struct io_kiocb *req, bool *locked)
> {
> - struct io_ring_ctx *ctx = req->ctx;
> int ret;
>
> ret = io_poll_check_events(req, locked);
> if (ret > 0)
> return;
>
> - io_poll_remove_entries(req);
> - spin_lock(&ctx->completion_lock);
> - hash_del(&req->hash_node);
> - spin_unlock(&ctx->completion_lock);
> + __io_poll_clean(req);
>
> if (!ret)
> io_req_task_submit(req, locked);
--
Pavel Begunkov
* Re: [PATCH 3/5] io_uring: let fast poll support multishot
2022-05-06 7:01 ` [PATCH 3/5] io_uring: let fast poll support multishot Hao Xu
@ 2022-05-06 17:19 ` Pavel Begunkov
2022-05-06 22:02 ` Jens Axboe
2022-05-07 7:08 ` Hao Xu
2022-05-06 18:02 ` kernel test robot
1 sibling, 2 replies; 36+ messages in thread
From: Pavel Begunkov @ 2022-05-06 17:19 UTC (permalink / raw)
To: Hao Xu, io-uring; +Cc: Jens Axboe, linux-kernel
On 5/6/22 08:01, Hao Xu wrote:
> From: Hao Xu <[email protected]>
>
> For operations like accept, multishot is a useful feature, since we can
> reduce the number of accept SQEs. Let's integrate it into fast poll; it
> may be good for other operations in the future.
>
> Signed-off-by: Hao Xu <[email protected]>
> ---
> fs/io_uring.c | 41 ++++++++++++++++++++++++++---------------
> 1 file changed, 26 insertions(+), 15 deletions(-)
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 8ebb1a794e36..d33777575faf 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -5952,7 +5952,7 @@ static void io_poll_remove_entries(struct io_kiocb *req)
> * either spurious wakeup or multishot CQE is served. 0 when it's done with
> * the request, then the mask is stored in req->cqe.res.
> */
> -static int io_poll_check_events(struct io_kiocb *req, bool locked)
> +static int io_poll_check_events(struct io_kiocb *req, bool *locked)
> {
> struct io_ring_ctx *ctx = req->ctx;
> int v;
> @@ -5981,17 +5981,26 @@ static int io_poll_check_events(struct io_kiocb *req, bool locked)
>
> /* multishot, just fill an CQE and proceed */
> if (req->cqe.res && !(req->apoll_events & EPOLLONESHOT)) {
> - __poll_t mask = mangle_poll(req->cqe.res & req->apoll_events);
> - bool filled;
> -
> - spin_lock(&ctx->completion_lock);
> - filled = io_fill_cqe_aux(ctx, req->cqe.user_data, mask,
> - IORING_CQE_F_MORE);
> - io_commit_cqring(ctx);
> - spin_unlock(&ctx->completion_lock);
> - if (unlikely(!filled))
> - return -ECANCELED;
> - io_cqring_ev_posted(ctx);
> + if (req->flags & REQ_F_APOLL_MULTISHOT) {
> + io_tw_lock(req->ctx, locked);
> + if (likely(!(req->task->flags & PF_EXITING)))
> + io_queue_sqe(req);
That looks dangerous: io_queue_sqe() usually takes request ownership
and doesn't expect that someone, i.e. io_poll_check_events(), may still be
actively using it.
E.g. io_accept() fails on fd < 0 and returns an error;
io_queue_sqe() -> io_queue_async() -> io_req_complete_failed()
kills it. Then io_poll_check_events() and polling in general
carry on using the freed request => UAF. I didn't look at it
too carefully, but there might be other similar cases.
> + else
> + return -EFAULT;
> + } else {
> + __poll_t mask = mangle_poll(req->cqe.res &
> + req->apoll_events);
> + bool filled;
> +
> + spin_lock(&ctx->completion_lock);
> + filled = io_fill_cqe_aux(ctx, req->cqe.user_data,
> + mask, IORING_CQE_F_MORE);
> + io_commit_cqring(ctx);
> + spin_unlock(&ctx->completion_lock);
> + if (unlikely(!filled))
> + return -ECANCELED;
> + io_cqring_ev_posted(ctx);
> + }
> } else if (req->cqe.res) {
> return 0;
> }
> @@ -6010,7 +6019,7 @@ static void io_poll_task_func(struct io_kiocb *req, bool *locked)
> struct io_ring_ctx *ctx = req->ctx;
> int ret;
>
> - ret = io_poll_check_events(req, *locked);
> + ret = io_poll_check_events(req, locked);
> if (ret > 0)
> return;
>
> @@ -6035,7 +6044,7 @@ static void io_apoll_task_func(struct io_kiocb *req, bool *locked)
> struct io_ring_ctx *ctx = req->ctx;
> int ret;
>
> - ret = io_poll_check_events(req, *locked);
> + ret = io_poll_check_events(req, locked);
> if (ret > 0)
> return;
>
> @@ -6275,7 +6284,7 @@ static int io_arm_poll_handler(struct io_kiocb *req, unsigned issue_flags)
> struct io_ring_ctx *ctx = req->ctx;
> struct async_poll *apoll;
> struct io_poll_table ipt;
> - __poll_t mask = EPOLLONESHOT | POLLERR | POLLPRI;
> + __poll_t mask = POLLERR | POLLPRI;
> int ret;
>
> if (!def->pollin && !def->pollout)
> @@ -6284,6 +6293,8 @@ static int io_arm_poll_handler(struct io_kiocb *req, unsigned issue_flags)
> return IO_APOLL_ABORTED;
> if ((req->flags & (REQ_F_POLLED|REQ_F_PARTIAL_IO)) == REQ_F_POLLED)
> return IO_APOLL_ABORTED;
> + if (!(req->flags & REQ_F_APOLL_MULTISHOT))
> + mask |= EPOLLONESHOT;
>
> if (def->pollin) {
> mask |= POLLIN | POLLRDNORM;
--
Pavel Begunkov
* Re: [PATCH 3/5] io_uring: let fast poll support multishot
2022-05-06 7:01 ` [PATCH 3/5] io_uring: let fast poll support multishot Hao Xu
2022-05-06 17:19 ` Pavel Begunkov
@ 2022-05-06 18:02 ` kernel test robot
1 sibling, 0 replies; 36+ messages in thread
From: kernel test robot @ 2022-05-06 18:02 UTC (permalink / raw)
To: Hao Xu, io-uring; +Cc: kbuild-all, Jens Axboe, Pavel Begunkov, linux-kernel
Hi Hao,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on f2e030dd7aaea5a937a2547dc980fab418fbc5e7]
url: https://github.com/intel-lab-lkp/linux/commits/Hao-Xu/fast-poll-multishot-mode/20220506-150750
base: f2e030dd7aaea5a937a2547dc980fab418fbc5e7
config: x86_64-randconfig-s021 (https://download.01.org/0day-ci/archive/20220507/[email protected]/config)
compiler: gcc-11 (Debian 11.2.0-20) 11.2.0
reproduce:
# apt-get install sparse
# sparse version: v0.6.4-dirty
# https://github.com/intel-lab-lkp/linux/commit/6001c3e95550875d4328aa2ca8b342c42b0e644e
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Hao-Xu/fast-poll-multishot-mode/20220506-150750
git checkout 6001c3e95550875d4328aa2ca8b342c42b0e644e
# save the config file
mkdir build_dir && cp config build_dir/.config
make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=x86_64 SHELL=/bin/bash
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>
sparse warnings: (new ones prefixed by >>)
fs/io_uring.c: note: in included file (through include/trace/trace_events.h, include/trace/define_trace.h, include/trace/events/io_uring.h):
include/trace/events/io_uring.h:488:1: sparse: sparse: incorrect type in assignment (different base types) @@ expected unsigned int [usertype] op_flags @@ got restricted __kernel_rwf_t const [usertype] rw_flags @@
include/trace/events/io_uring.h:488:1: sparse: expected unsigned int [usertype] op_flags
include/trace/events/io_uring.h:488:1: sparse: got restricted __kernel_rwf_t const [usertype] rw_flags
fs/io_uring.c: note: in included file (through include/trace/perf.h, include/trace/define_trace.h, include/trace/events/io_uring.h):
include/trace/events/io_uring.h:488:1: sparse: sparse: incorrect type in assignment (different base types) @@ expected unsigned int [usertype] op_flags @@ got restricted __kernel_rwf_t const [usertype] rw_flags @@
include/trace/events/io_uring.h:488:1: sparse: expected unsigned int [usertype] op_flags
include/trace/events/io_uring.h:488:1: sparse: got restricted __kernel_rwf_t const [usertype] rw_flags
fs/io_uring.c:3280:23: sparse: sparse: incorrect type in assignment (different base types) @@ expected unsigned int [usertype] flags @@ got restricted __kernel_rwf_t @@
fs/io_uring.c:3280:23: sparse: expected unsigned int [usertype] flags
fs/io_uring.c:3280:23: sparse: got restricted __kernel_rwf_t
fs/io_uring.c:3477:24: sparse: sparse: incorrect type in return expression (different address spaces) @@ expected void [noderef] __user * @@ got struct io_buffer *[assigned] kbuf @@
fs/io_uring.c:3477:24: sparse: expected void [noderef] __user *
fs/io_uring.c:3477:24: sparse: got struct io_buffer *[assigned] kbuf
fs/io_uring.c:3864:48: sparse: sparse: incorrect type in argument 2 (different base types) @@ expected restricted __kernel_rwf_t [usertype] flags @@ got unsigned int [usertype] flags @@
fs/io_uring.c:3864:48: sparse: expected restricted __kernel_rwf_t [usertype] flags
fs/io_uring.c:3864:48: sparse: got unsigned int [usertype] flags
fs/io_uring.c:5187:14: sparse: sparse: incorrect type in assignment (different address spaces) @@ expected struct file *file @@ got struct file [noderef] __rcu * @@
fs/io_uring.c:5187:14: sparse: expected struct file *file
fs/io_uring.c:5187:14: sparse: got struct file [noderef] __rcu *
fs/io_uring.c:5974:68: sparse: sparse: incorrect type in initializer (different base types) @@ expected restricted __poll_t [usertype] _key @@ got int apoll_events @@
fs/io_uring.c:5974:68: sparse: expected restricted __poll_t [usertype] _key
fs/io_uring.c:5974:68: sparse: got int apoll_events
fs/io_uring.c:5979:48: sparse: sparse: restricted __poll_t degrades to integer
fs/io_uring.c:5983:59: sparse: sparse: restricted __poll_t degrades to integer
fs/io_uring.c:5991:74: sparse: sparse: incorrect type in argument 1 (different base types) @@ expected restricted __poll_t [usertype] val @@ got int @@
fs/io_uring.c:5991:74: sparse: expected restricted __poll_t [usertype] val
fs/io_uring.c:5991:74: sparse: got int
fs/io_uring.c:5991:60: sparse: sparse: incorrect type in initializer (different base types) @@ expected restricted __poll_t [usertype] mask @@ got unsigned short @@
fs/io_uring.c:5991:60: sparse: expected restricted __poll_t [usertype] mask
fs/io_uring.c:5991:60: sparse: got unsigned short
fs/io_uring.c:5997:58: sparse: sparse: incorrect type in argument 3 (different base types) @@ expected signed int [usertype] res @@ got restricted __poll_t [usertype] mask @@
fs/io_uring.c:5997:58: sparse: expected signed int [usertype] res
fs/io_uring.c:5997:58: sparse: got restricted __poll_t [usertype] mask
fs/io_uring.c:6027:68: sparse: sparse: restricted __poll_t degrades to integer
fs/io_uring.c:6027:57: sparse: sparse: incorrect type in argument 1 (different base types) @@ expected restricted __poll_t [usertype] val @@ got unsigned int @@
fs/io_uring.c:6027:57: sparse: expected restricted __poll_t [usertype] val
fs/io_uring.c:6027:57: sparse: got unsigned int
fs/io_uring.c:6108:45: sparse: sparse: incorrect type in argument 3 (different base types) @@ expected int events @@ got restricted __poll_t [usertype] events @@
fs/io_uring.c:6108:45: sparse: expected int events
fs/io_uring.c:6108:45: sparse: got restricted __poll_t [usertype] events
fs/io_uring.c:6143:40: sparse: sparse: incorrect type in argument 2 (different base types) @@ expected int mask @@ got restricted __poll_t [usertype] mask @@
fs/io_uring.c:6143:40: sparse: expected int mask
fs/io_uring.c:6143:40: sparse: got restricted __poll_t [usertype] mask
fs/io_uring.c:6143:50: sparse: sparse: incorrect type in argument 3 (different base types) @@ expected int events @@ got restricted __poll_t [usertype] events @@
fs/io_uring.c:6143:50: sparse: expected int events
fs/io_uring.c:6143:50: sparse: got restricted __poll_t [usertype] events
fs/io_uring.c:6235:24: sparse: sparse: incorrect type in return expression (different base types) @@ expected int @@ got restricted __poll_t [assigned] [usertype] mask @@
fs/io_uring.c:6235:24: sparse: expected int
fs/io_uring.c:6235:24: sparse: got restricted __poll_t [assigned] [usertype] mask
fs/io_uring.c:6252:40: sparse: sparse: incorrect type in argument 2 (different base types) @@ expected int mask @@ got restricted __poll_t [assigned] [usertype] mask @@
fs/io_uring.c:6252:40: sparse: expected int mask
fs/io_uring.c:6252:40: sparse: got restricted __poll_t [assigned] [usertype] mask
fs/io_uring.c:6252:50: sparse: sparse: incorrect type in argument 3 (different base types) @@ expected int events @@ got restricted __poll_t [usertype] events @@
fs/io_uring.c:6252:50: sparse: expected int events
fs/io_uring.c:6252:50: sparse: got restricted __poll_t [usertype] events
fs/io_uring.c:6262:47: sparse: sparse: incorrect type in argument 3 (different base types) @@ expected int events @@ got restricted __poll_t [usertype] events @@
fs/io_uring.c:6262:47: sparse: expected int events
fs/io_uring.c:6262:47: sparse: got restricted __poll_t [usertype] events
>> fs/io_uring.c:6287:33: sparse: sparse: incorrect type in initializer (different base types) @@ expected restricted __poll_t [usertype] mask @@ got int @@
fs/io_uring.c:6287:33: sparse: expected restricted __poll_t [usertype] mask
fs/io_uring.c:6287:33: sparse: got int
fs/io_uring.c:6300:22: sparse: sparse: invalid assignment: |=
fs/io_uring.c:6300:22: sparse: left side has type restricted __poll_t
fs/io_uring.c:6300:22: sparse: right side has type int
fs/io_uring.c:6305:30: sparse: sparse: invalid assignment: &=
fs/io_uring.c:6305:30: sparse: left side has type restricted __poll_t
fs/io_uring.c:6305:30: sparse: right side has type int
fs/io_uring.c:6307:22: sparse: sparse: invalid assignment: |=
fs/io_uring.c:6307:22: sparse: left side has type restricted __poll_t
fs/io_uring.c:6307:22: sparse: right side has type int
fs/io_uring.c:6335:33: sparse: sparse: incorrect type in argument 5 (different base types) @@ expected int mask @@ got restricted __poll_t [assigned] [usertype] mask @@
fs/io_uring.c:6335:33: sparse: expected int mask
fs/io_uring.c:6335:33: sparse: got restricted __poll_t [assigned] [usertype] mask
fs/io_uring.c:6335:50: sparse: sparse: incorrect type in argument 6 (different base types) @@ expected int events @@ got restricted __poll_t [usertype] events @@
fs/io_uring.c:6335:50: sparse: expected int events
fs/io_uring.c:6335:50: sparse: got restricted __poll_t [usertype] events
fs/io_uring.c:6449:24: sparse: sparse: invalid assignment: |=
fs/io_uring.c:6449:24: sparse: left side has type unsigned int
fs/io_uring.c:6449:24: sparse: right side has type restricted __poll_t
fs/io_uring.c:6450:65: sparse: sparse: restricted __poll_t degrades to integer
fs/io_uring.c:6450:29: sparse: sparse: restricted __poll_t degrades to integer
fs/io_uring.c:6450:38: sparse: sparse: incorrect type in return expression (different base types) @@ expected restricted __poll_t @@ got unsigned int @@
fs/io_uring.c:6450:38: sparse: expected restricted __poll_t
fs/io_uring.c:6450:38: sparse: got unsigned int
fs/io_uring.c:6502:27: sparse: sparse: incorrect type in assignment (different base types) @@ expected int apoll_events @@ got restricted __poll_t [usertype] events @@
fs/io_uring.c:6502:27: sparse: expected int apoll_events
fs/io_uring.c:6502:27: sparse: got restricted __poll_t [usertype] events
fs/io_uring.c:6541:43: sparse: sparse: invalid assignment: &=
fs/io_uring.c:6541:43: sparse: left side has type restricted __poll_t
fs/io_uring.c:6541:43: sparse: right side has type int
fs/io_uring.c:6542:62: sparse: sparse: restricted __poll_t degrades to integer
fs/io_uring.c:6542:43: sparse: sparse: invalid assignment: |=
fs/io_uring.c:6542:43: sparse: left side has type restricted __poll_t
fs/io_uring.c:6542:43: sparse: right side has type unsigned int
fs/io_uring.c:2536:17: sparse: sparse: context imbalance in 'handle_prev_tw_list' - different lock contexts for basic block
fs/io_uring.c:7610:39: sparse: sparse: marked inline, but without a definition
fs/io_uring.c:7610:39: sparse: sparse: marked inline, but without a definition
fs/io_uring.c:7610:39: sparse: sparse: marked inline, but without a definition
fs/io_uring.c:7610:39: sparse: sparse: marked inline, but without a definition
vim +6287 fs/io_uring.c
6280
6281 static int io_arm_poll_handler(struct io_kiocb *req, unsigned issue_flags)
6282 {
6283 const struct io_op_def *def = &io_op_defs[req->opcode];
6284 struct io_ring_ctx *ctx = req->ctx;
6285 struct async_poll *apoll;
6286 struct io_poll_table ipt;
> 6287 __poll_t mask = POLLERR | POLLPRI;
6288 int ret;
6289
6290 if (!def->pollin && !def->pollout)
6291 return IO_APOLL_ABORTED;
6292 if (!file_can_poll(req->file))
6293 return IO_APOLL_ABORTED;
6294 if ((req->flags & (REQ_F_POLLED|REQ_F_PARTIAL_IO)) == REQ_F_POLLED)
6295 return IO_APOLL_ABORTED;
6296 if (!(req->flags & REQ_F_APOLL_MULTISHOT))
6297 mask |= EPOLLONESHOT;
6298
6299 if (def->pollin) {
6300 mask |= POLLIN | POLLRDNORM;
6301
6302 /* If reading from MSG_ERRQUEUE using recvmsg, ignore POLLIN */
6303 if ((req->opcode == IORING_OP_RECVMSG) &&
6304 (req->sr_msg.msg_flags & MSG_ERRQUEUE))
6305 mask &= ~POLLIN;
6306 } else {
6307 mask |= POLLOUT | POLLWRNORM;
6308 }
6309 if (def->poll_exclusive)
6310 mask |= EPOLLEXCLUSIVE;
6311 if (req->flags & REQ_F_POLLED) {
6312 apoll = req->apoll;
6313 } else if (!(issue_flags & IO_URING_F_UNLOCKED) &&
6314 !list_empty(&ctx->apoll_cache)) {
6315 apoll = list_first_entry(&ctx->apoll_cache, struct async_poll,
6316 poll.wait.entry);
6317 list_del_init(&apoll->poll.wait.entry);
6318 } else {
6319 apoll = kmalloc(sizeof(*apoll), GFP_ATOMIC);
6320 if (unlikely(!apoll))
6321 return IO_APOLL_ABORTED;
6322 }
6323 apoll->double_poll = NULL;
6324 req->apoll = apoll;
6325 req->flags |= REQ_F_POLLED;
6326 ipt.pt._qproc = io_async_queue_proc;
6327
6328 io_kbuf_recycle(req, issue_flags);
6329
6330 ret = __io_arm_poll_handler(req, &apoll->poll, &ipt, mask);
6331 if (ret || ipt.error)
6332 return ret ? IO_APOLL_READY : IO_APOLL_ABORTED;
6333
6334 trace_io_uring_poll_arm(ctx, req, req->cqe.user_data, req->opcode,
6335 mask, apoll->poll.events);
6336 return IO_APOLL_OK;
6337 }
6338
--
0-DAY CI Kernel Test Service
https://01.org/lkp
* Re: [PATCH 5/5] io_uring: implement multishot mode for accept
2022-05-06 7:01 ` [PATCH 5/5] io_uring: implement multishot mode for accept Hao Xu
2022-05-06 14:42 ` Jens Axboe
@ 2022-05-06 20:50 ` Jens Axboe
2022-05-06 21:29 ` Jens Axboe
1 sibling, 1 reply; 36+ messages in thread
From: Jens Axboe @ 2022-05-06 20:50 UTC (permalink / raw)
To: Hao Xu, io-uring; +Cc: Pavel Begunkov, linux-kernel
On 5/6/22 1:01 AM, Hao Xu wrote:
> @@ -5748,8 +5758,12 @@ static int io_accept(struct io_kiocb *req, unsigned int issue_flags)
> if (!fixed)
> put_unused_fd(fd);
> ret = PTR_ERR(file);
> - if (ret == -EAGAIN && force_nonblock)
> - return -EAGAIN;
> + if (ret == -EAGAIN && force_nonblock) {
> + if ((req->flags & REQ_F_APOLL_MULTI_POLLED) ==
> + REQ_F_APOLL_MULTI_POLLED)
> + ret = 0;
> + return ret;
FWIW, this
if ((req->flags & REQ_F_APOLL_MULTI_POLLED) == REQ_F_APOLL_MULTI_POLLED)
is identical to
if (req->flags & REQ_F_APOLL_MULTI_POLLED)
but I suspect this used to check more flags (??), because as it stands
it seems a bit nonsensical.
--
Jens Axboe
* Re: [PATCH 5/5] io_uring: implement multishot mode for accept
2022-05-06 20:50 ` Jens Axboe
@ 2022-05-06 21:29 ` Jens Axboe
0 siblings, 0 replies; 36+ messages in thread
From: Jens Axboe @ 2022-05-06 21:29 UTC (permalink / raw)
To: Hao Xu, io-uring; +Cc: Pavel Begunkov, linux-kernel
On 5/6/22 2:50 PM, Jens Axboe wrote:
> On 5/6/22 1:01 AM, Hao Xu wrote:
>> @@ -5748,8 +5758,12 @@ static int io_accept(struct io_kiocb *req, unsigned int issue_flags)
>> if (!fixed)
>> put_unused_fd(fd);
>> ret = PTR_ERR(file);
>> - if (ret == -EAGAIN && force_nonblock)
>> - return -EAGAIN;
>> + if (ret == -EAGAIN && force_nonblock) {
>> + if ((req->flags & REQ_F_APOLL_MULTI_POLLED) ==
>> + REQ_F_APOLL_MULTI_POLLED)
>> + ret = 0;
>> + return ret;
>
> FWIW, this
>
> if ((req->flags & REQ_F_APOLL_MULTI_POLLED) == REQ_F_APOLL_MULTI_POLLED)
>
> is identical to
>
> if (req->flags & REQ_F_APOLL_MULTI_POLLED)
>
> but I suspect this used to check more flags (??), because as it stands
> it seems a bit nonsensical.
Looking deeper, it is indeed a mask and not a single flag! So the check
looks fine.
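
To spell it out, with a two-bit mask the two forms differ:

        req->flags & REQ_F_APOLL_MULTI_POLLED
                /* true if either REQ_F_APOLL_MULTISHOT or REQ_F_POLLED is set */

        (req->flags & REQ_F_APOLL_MULTI_POLLED) == REQ_F_APOLL_MULTI_POLLED
                /* true only if both bits are set */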
--
Jens Axboe
* Re: [PATCH 3/5] io_uring: let fast poll support multishot
2022-05-06 17:19 ` Pavel Begunkov
@ 2022-05-06 22:02 ` Jens Axboe
2022-05-07 6:32 ` Hao Xu
2022-05-07 9:26 ` Pavel Begunkov
2022-05-07 7:08 ` Hao Xu
1 sibling, 2 replies; 36+ messages in thread
From: Jens Axboe @ 2022-05-06 22:02 UTC (permalink / raw)
To: Pavel Begunkov, Hao Xu, io-uring; +Cc: linux-kernel
On 5/6/22 11:19 AM, Pavel Begunkov wrote:
> On 5/6/22 08:01, Hao Xu wrote:
>> From: Hao Xu <[email protected]>
>>
>> For operations like accept, multishot is a useful feature, since we can
>> reduce the number of accept SQEs. Let's integrate it into fast poll; it
>> may be good for other operations in the future.
>>
>> Signed-off-by: Hao Xu <[email protected]>
>> ---
>> fs/io_uring.c | 41 ++++++++++++++++++++++++++---------------
>> 1 file changed, 26 insertions(+), 15 deletions(-)
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index 8ebb1a794e36..d33777575faf 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -5952,7 +5952,7 @@ static void io_poll_remove_entries(struct io_kiocb *req)
>> * either spurious wakeup or multishot CQE is served. 0 when it's done with
>> * the request, then the mask is stored in req->cqe.res.
>> */
>> -static int io_poll_check_events(struct io_kiocb *req, bool locked)
>> +static int io_poll_check_events(struct io_kiocb *req, bool *locked)
>> {
>> struct io_ring_ctx *ctx = req->ctx;
>> int v;
>> @@ -5981,17 +5981,26 @@ static int io_poll_check_events(struct io_kiocb *req, bool locked)
>> /* multishot, just fill an CQE and proceed */
>> if (req->cqe.res && !(req->apoll_events & EPOLLONESHOT)) {
>> - __poll_t mask = mangle_poll(req->cqe.res & req->apoll_events);
>> - bool filled;
>> -
>> - spin_lock(&ctx->completion_lock);
>> - filled = io_fill_cqe_aux(ctx, req->cqe.user_data, mask,
>> - IORING_CQE_F_MORE);
>> - io_commit_cqring(ctx);
>> - spin_unlock(&ctx->completion_lock);
>> - if (unlikely(!filled))
>> - return -ECANCELED;
>> - io_cqring_ev_posted(ctx);
>> + if (req->flags & REQ_F_APOLL_MULTISHOT) {
>> + io_tw_lock(req->ctx, locked);
>> + if (likely(!(req->task->flags & PF_EXITING)))
>> + io_queue_sqe(req);
>
> That looks dangerous, io_queue_sqe() usually takes the request
> ownership and doesn't expect that someone, i.e.
> io_poll_check_events(), may still be actively using it.
I took a look at this, too. We do own the request at this point, but
it's still on the poll list. If io_accept() fails, then we do run the
poll_clean.
> E.g. io_accept() fails on fd < 0, return an error, io_queue_sqe() ->
> io_queue_async() -> io_req_complete_failed() kills it. Then
> io_poll_check_events() and polling in general carry on using the freed
> request => UAF. Didn't look at it too carefully, but there might other
> similar cases.
But we better have done poll_clean() before returning the error. What am
I missing here?
--
Jens Axboe
* Re: [PATCH v2 0/5] fast poll multishot mode
2022-05-06 7:00 [PATCH v2 0/5] fast poll multishot mode Hao Xu
` (5 preceding siblings ...)
2022-05-06 7:36 ` [PATCH v2 0/5] fast poll multishot mode Hao Xu
@ 2022-05-06 22:23 ` Jens Axboe
2022-05-06 23:26 ` Jens Axboe
6 siblings, 1 reply; 36+ messages in thread
From: Jens Axboe @ 2022-05-06 22:23 UTC (permalink / raw)
To: Hao Xu, io-uring; +Cc: Pavel Begunkov, linux-kernel
On 5/6/22 1:00 AM, Hao Xu wrote:
> [...]
> A liburing test is here:
> https://github.com/HowHsu/liburing/blob/multishot_accept/test/accept.c
Wish I had seen that, I wrote my own! But maybe that's good, you tend to
find other issues through that.
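For what it's worth, the core of such a test is tiny - a rough liburing
sketch (ring setup and error handling omitted; listen_fd and
handle_conn() are stand-ins, IORING_ACCEPT_MULTISHOT is the flag from
patch 1):

	struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
	struct io_uring_cqe *cqe;

	io_uring_prep_accept(sqe, listen_fd, NULL, NULL, 0);
	sqe->ioprio |= IORING_ACCEPT_MULTISHOT;	/* per-op flag, carried in ioprio */
	io_uring_submit(&ring);

	while (!io_uring_wait_cqe(&ring, &cqe)) {
		/* each accepted connection arrives as its own CQE,
		 * cqe->res is the new fd (or -errno) */
		if (cqe->res >= 0)
			handle_conn(cqe->res);
		/* IORING_CQE_F_MORE clear: this accept stopped posting
		 * and would need to be re-armed */
		if (!(cqe->flags & IORING_CQE_F_MORE))
			break;
		io_uring_cqe_seen(&ring, cqe);
	}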
Anyway, works for me in testing, and I can see this being a nice win for
accept intensive workloads. I pushed a bunch of cleanup patches that
should just get folded in. Can you fold them into your patches and
address the other feedback, and post a v3? I pushed the test branch
here:
https://git.kernel.dk/cgit/linux-block/log/?h=fastpoll-mshot
--
Jens Axboe
* Re: [PATCH v2 0/5] fast poll multishot mode
2022-05-06 22:23 ` Jens Axboe
@ 2022-05-06 23:26 ` Jens Axboe
2022-05-07 2:33 ` Jens Axboe
0 siblings, 1 reply; 36+ messages in thread
From: Jens Axboe @ 2022-05-06 23:26 UTC (permalink / raw)
To: Hao Xu, io-uring; +Cc: Pavel Begunkov, linux-kernel
On 5/6/22 4:23 PM, Jens Axboe wrote:
> On 5/6/22 1:00 AM, Hao Xu wrote:
>> [...]
>> A liburing test is here:
>> https://github.com/HowHsu/liburing/blob/multishot_accept/test/accept.c
>
> Wish I had seen that, I wrote my own! But maybe that's good, you tend to
> find other issues through that.
>
> Anyway, works for me in testing, and I can see this being a nice win for
> accept intensive workloads. I pushed a bunch of cleanup patches that
> should just get folded in. Can you fold them into your patches and
> address the other feedback, and post a v3? I pushed the test branch
> here:
>
> https://git.kernel.dk/cgit/linux-block/log/?h=fastpoll-mshot
Quick benchmark here, accepting 10k connections:
Stock kernel
real 0m0.728s
user 0m0.009s
sys 0m0.192s
Patched
real 0m0.684s
user 0m0.018s
sys 0m0.102s
Looks like a nice win for a highly synthetic benchmark. Nothing
scientific, was just curious.
--
Jens Axboe
* Re: [PATCH v2 0/5] fast poll multishot mode
2022-05-06 23:26 ` Jens Axboe
@ 2022-05-07 2:33 ` Jens Axboe
2022-05-07 3:08 ` Jens Axboe
0 siblings, 1 reply; 36+ messages in thread
From: Jens Axboe @ 2022-05-07 2:33 UTC (permalink / raw)
To: Hao Xu, io-uring; +Cc: Pavel Begunkov, linux-kernel
On 5/6/22 5:26 PM, Jens Axboe wrote:
> On 5/6/22 4:23 PM, Jens Axboe wrote:
>> On 5/6/22 1:00 AM, Hao Xu wrote:
>>> [...]
>>> A liburing test is here:
>>> https://github.com/HowHsu/liburing/blob/multishot_accept/test/accept.c
>>
>> Wish I had seen that, I wrote my own! But maybe that's good, you tend to
>> find other issues through that.
>>
>> Anyway, works for me in testing, and I can see this being a nice win for
>> accept intensive workloads. I pushed a bunch of cleanup patches that
>> should just get folded in. Can you fold them into your patches and
>> address the other feedback, and post a v3? I pushed the test branch
>> here:
>>
>> https://git.kernel.dk/cgit/linux-block/log/?h=fastpoll-mshot
>
> Quick benchmark here, accepting 10k connections:
>
> Stock kernel
> real 0m0.728s
> user 0m0.009s
> sys 0m0.192s
>
> Patched
> real 0m0.684s
> user 0m0.018s
> sys 0m0.102s
>
> Looks like a nice win for a highly synthetic benchmark. Nothing
> scientific, was just curious.
One more thought on this - how is it supposed to work with
accept-direct? One idea would be to make the file slot incrementally
increasing. But we need a good story for that: if multishot is
exclusive to non-direct files, then it's a lot less interesting, as the
latter is a really nice win for lots of files. If we can combine the
two, even better.
--
Jens Axboe
* Re: [PATCH v2 0/5] fast poll multishot mode
2022-05-07 2:33 ` Jens Axboe
@ 2022-05-07 3:08 ` Jens Axboe
2022-05-07 16:01 ` Hao Xu
0 siblings, 1 reply; 36+ messages in thread
From: Jens Axboe @ 2022-05-07 3:08 UTC (permalink / raw)
To: Hao Xu, io-uring; +Cc: Pavel Begunkov, linux-kernel
On 5/6/22 8:33 PM, Jens Axboe wrote:
> On 5/6/22 5:26 PM, Jens Axboe wrote:
>> On 5/6/22 4:23 PM, Jens Axboe wrote:
>>> On 5/6/22 1:00 AM, Hao Xu wrote:
>>>> [...]
>>>> A liburing test is here:
>>>> https://github.com/HowHsu/liburing/blob/multishot_accept/test/accept.c
>>>
>>> Wish I had seen that, I wrote my own! But maybe that's good, you tend to
>>> find other issues through that.
>>>
>>> Anyway, works for me in testing, and I can see this being a nice win for
>>> accept intensive workloads. I pushed a bunch of cleanup patches that
>>> should just get folded in. Can you fold them into your patches and
>>> address the other feedback, and post a v3? I pushed the test branch
>>> here:
>>>
>>> https://git.kernel.dk/cgit/linux-block/log/?h=fastpoll-mshot
>>
>> Quick benchmark here, accepting 10k connections:
>>
>> Stock kernel
>> real 0m0.728s
>> user 0m0.009s
>> sys 0m0.192s
>>
>> Patched
>> real 0m0.684s
>> user 0m0.018s
>> sys 0m0.102s
>>
>> Looks like a nice win for a highly synthetic benchmark. Nothing
>> scientific, was just curious.
>
> One more thought on this - how is it supposed to work with
> accept-direct? One idea would be to make the file slot incrementally
> increasing. But we need a good story for that: if multishot is
> exclusive to non-direct files, then it's a lot less interesting, as the
> latter is a really nice win for lots of files. If we can combine the
> two, even better.
Running some quick testing, on an actual test box (previous numbers were
from a vm on my laptop):
Testing singleshot, normal files
Did 10000 accepts
________________________________________________________
Executed in  216.10 millis    fish           external
   usr time    9.32 millis  150.00 micros    9.17 millis
   sys time  110.06 millis   67.00 micros  109.99 millis
Testing multishot, fixed files
Did 10000 accepts
________________________________________________________
Executed in  189.04 millis    fish           external
   usr time   11.86 millis  159.00 micros   11.71 millis
   sys time   93.71 millis   70.00 micros   93.64 millis
That's about ~19 usec to accept a connection, pretty decent. Using
singleshot and with fixed files, it shaves about ~8% off, ends at around
200msec.
I think we can get away with using fixed files and multishot, attaching
the quick patch I did below to test it. We need something better than
this, otherwise once the space fills up, we'll likely end up with a
sparse space and the naive approach of just incrementing the next slot
won't work at all.
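Something bitmap-based might do it - a rough sketch of the idea, with
the io_file_table fields (bitmap, alloc_hint, nr_slots) made up for
illustration:

static int io_file_slot_get(struct io_ring_ctx *ctx)
{
	struct io_file_table *table = &ctx->file_table;
	unsigned long nr = table->nr_slots;
	int ret;

	/* start at the hint so the common case stays cheap */
	ret = find_next_zero_bit(table->bitmap, nr, table->alloc_hint);
	if (ret != nr)
		goto found;

	/* wrap around and retry the range below the hint */
	ret = find_first_zero_bit(table->bitmap, table->alloc_hint);
	if (ret == table->alloc_hint)
		return -ENFILE;		/* table genuinely full */
found:
	table->alloc_hint = ret + 1;
	return ret;
}

That keeps slot allocation working even once the table goes sparse,
instead of just bumping the previous slot number.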
--
Jens Axboe
* Re: [PATCH 1/5] io_uring: add IORING_ACCEPT_MULTISHOT for accept
2022-05-06 14:32 ` Jens Axboe
@ 2022-05-07 4:05 ` Hao Xu
0 siblings, 0 replies; 36+ messages in thread
From: Hao Xu @ 2022-05-07 4:05 UTC (permalink / raw)
To: Jens Axboe, io-uring; +Cc: Pavel Begunkov, linux-kernel
On 5/6/22 10:32 PM, Jens Axboe wrote:
> On 5/6/22 1:00 AM, Hao Xu wrote:
>> From: Hao Xu <[email protected]>
>>
>> add an accept_flag IORING_ACCEPT_MULTISHOT for accept, which is to
>> support multishot.
>>
>> Signed-off-by: Hao Xu <[email protected]>
>> ---
>> include/uapi/linux/io_uring.h | 5 +++++
>> 1 file changed, 5 insertions(+)
>>
>> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
>> index fad63564678a..73bc7e54ac18 100644
>> --- a/include/uapi/linux/io_uring.h
>> +++ b/include/uapi/linux/io_uring.h
>> @@ -213,6 +213,11 @@ enum {
>> #define IORING_ASYNC_CANCEL_FD (1U << 1)
>> #define IORING_ASYNC_CANCEL_ANY (1U << 2)
>>
>> +/*
>> + * accept flags stored in accept_flags
>> + */
>> +#define IORING_ACCEPT_MULTISHOT (1U << 15)
>
> It isn't stored in accept_flags, is it? This is an io_uring private
> flag, and it's in ioprio. Which is honestly a good place for per-op
> private flags, since nobody really uses ioprio outside of read/write
> style requests. But the comment is wrong :-)
Ah, yes, thanks for pointing it out. I forgot to update the comment.
>
* Re: [PATCH 3/5] io_uring: let fast poll support multishot
2022-05-06 22:02 ` Jens Axboe
@ 2022-05-07 6:32 ` Hao Xu
2022-05-07 9:26 ` Pavel Begunkov
1 sibling, 0 replies; 36+ messages in thread
From: Hao Xu @ 2022-05-07 6:32 UTC (permalink / raw)
To: Jens Axboe, Pavel Begunkov, io-uring; +Cc: linux-kernel
On 2022/5/7 6:02 AM, Jens Axboe wrote:
> On 5/6/22 11:19 AM, Pavel Begunkov wrote:
>> On 5/6/22 08:01, Hao Xu wrote:
>>> [...]
>>> + if (req->flags & REQ_F_APOLL_MULTISHOT) {
>>> + io_tw_lock(req->ctx, locked);
>>> + if (likely(!(req->task->flags & PF_EXITING)))
>>> + io_queue_sqe(req);
>>
>> That looks dangerous, io_queue_sqe() usually takes the request
>> ownership and doesn't expect that someone, i.e.
>> io_poll_check_events(), may still be actively using it.
>
> I took a look at this, too. We do own the request at this point, but
> it's still on the poll list. If io_accept() fails, then we do run the
> poll_clean.
>
>> E.g. io_accept() fails on fd < 0, return an error, io_queue_sqe() ->
>> io_queue_async() -> io_req_complete_failed() kills it. Then
>> io_poll_check_events() and polling in general carry on using the freed
>> request => UAF. Didn't look at it too carefully, but there might other
>> similar cases.
>
> But we better have done poll_clean() before returning the error. What am
> I missing here?
Sorry, I don't get it. I've done the poll_clean() before returning the
error in io_accept().
>
* Re: [PATCH 4/5] io_uring: add a helper for poll clean
2022-05-06 14:36 ` Jens Axboe
@ 2022-05-07 6:37 ` Hao Xu
0 siblings, 0 replies; 36+ messages in thread
From: Hao Xu @ 2022-05-07 6:37 UTC (permalink / raw)
To: Jens Axboe, io-uring; +Cc: Pavel Begunkov, linux-kernel
On 2022/5/6 10:36 PM, Jens Axboe wrote:
> On 5/6/22 1:01 AM, Hao Xu wrote:
>> From: Hao Xu <[email protected]>
>>
>> Add a helper for poll clean, it will be used in the multishot accept in
>> the later patches.
>
> Should this just go into io_clean_op()? Didn't look at it thoroughly,
> but it'd remove some cases from the next patch if it could.
>
Actually this was my first version; here I put it in io_accept() to
make it happen as early as possible. I rethought this, and it seems ok
to put it into io_clean_op().
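Roughly, that shape would be (a sketch only, reusing the
__io_poll_clean()/REQ_F_APOLL_MULTI_POLLED pieces from patch 4):

static void io_clean_op(struct io_kiocb *req)
{
	/* existing per-opcode cleanup elided */

	/* late poll teardown for multishot apoll requests */
	if ((req->flags & REQ_F_APOLL_MULTI_POLLED) ==
	    REQ_F_APOLL_MULTI_POLLED)
		__io_poll_clean(req);
}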
* Re: [PATCH 4/5] io_uring: add a helper for poll clean
2022-05-06 16:22 ` Pavel Begunkov
@ 2022-05-07 6:43 ` Hao Xu
2022-05-07 9:29 ` Pavel Begunkov
0 siblings, 1 reply; 36+ messages in thread
From: Hao Xu @ 2022-05-07 6:43 UTC (permalink / raw)
To: Pavel Begunkov, io-uring; +Cc: Jens Axboe, linux-kernel
On 2022/5/7 12:22 AM, Pavel Begunkov wrote:
> On 5/6/22 08:01, Hao Xu wrote:
>> From: Hao Xu <[email protected]>
>>
>> Add a helper for poll clean, it will be used in the multishot accept in
>> the later patches.
>>
>> Signed-off-by: Hao Xu <[email protected]>
>> ---
>> fs/io_uring.c | 23 ++++++++++++++++++-----
>> 1 file changed, 18 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index d33777575faf..0a83ecc457d1 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -5711,6 +5711,23 @@ static int io_accept_prep(struct io_kiocb *req,
>> const struct io_uring_sqe *sqe)
>> return 0;
>> }
>> +static inline void __io_poll_clean(struct io_kiocb *req)
>> +{
>> + struct io_ring_ctx *ctx = req->ctx;
>> +
>> + io_poll_remove_entries(req);
>> + spin_lock(&ctx->completion_lock);
>> + hash_del(&req->hash_node);
>> + spin_unlock(&ctx->completion_lock);
>> +}
>> +
>> +#define REQ_F_APOLL_MULTI_POLLED (REQ_F_APOLL_MULTISHOT | REQ_F_POLLED)
>> +static inline void io_poll_clean(struct io_kiocb *req)
>> +{
>> + if ((req->flags & REQ_F_APOLL_MULTI_POLLED) ==
>> REQ_F_APOLL_MULTI_POLLED)
>
> So it triggers for apoll multishot only when REQ_F_POLLED is _not_ set,
> but if it's not set it did never go through arm_poll / etc. and there is
> nothing to clean up. What is the catch?
No, it is triggered for apoll multishot only when REQ_F_POLLED is set.
>
> btw, don't see the function used in this patch, better to add it later
> or at least mark with attribute unused, or some may get build failures.
Gotcha.
>
>
>> + __io_poll_clean(req);
>> +}
>> +
>> static int io_accept(struct io_kiocb *req, unsigned int issue_flags)
>> {
>> struct io_accept *accept = &req->accept;
>> @@ -6041,17 +6058,13 @@ static void io_poll_task_func(struct io_kiocb
>> *req, bool *locked)
>> static void io_apoll_task_func(struct io_kiocb *req, bool *locked)
>> {
>> - struct io_ring_ctx *ctx = req->ctx;
>> int ret;
>> ret = io_poll_check_events(req, locked);
>> if (ret > 0)
>> return;
>> - io_poll_remove_entries(req);
>> - spin_lock(&ctx->completion_lock);
>> - hash_del(&req->hash_node);
>> - spin_unlock(&ctx->completion_lock);
>> + __io_poll_clean(req);
>> if (!ret)
>> io_req_task_submit(req, locked);
>
* Re: [PATCH 3/5] io_uring: let fast poll support multishot
2022-05-06 17:19 ` Pavel Begunkov
2022-05-06 22:02 ` Jens Axboe
@ 2022-05-07 7:08 ` Hao Xu
2022-05-07 9:47 ` Pavel Begunkov
1 sibling, 1 reply; 36+ messages in thread
From: Hao Xu @ 2022-05-07 7:08 UTC (permalink / raw)
To: Pavel Begunkov, io-uring; +Cc: Jens Axboe, linux-kernel
On 2022/5/7 1:19 AM, Pavel Begunkov wrote:
> On 5/6/22 08:01, Hao Xu wrote:
>> [...]
>> + if (req->flags & REQ_F_APOLL_MULTISHOT) {
>> + io_tw_lock(req->ctx, locked);
>> + if (likely(!(req->task->flags & PF_EXITING)))
>> + io_queue_sqe(req);
>
> That looks dangerous, io_queue_sqe() usually takes the request ownership
> and doesn't expect that someone, i.e. io_poll_check_events(), may still be
> actively using it.
>
> E.g. io_accept() fails on fd < 0, return an error,
> io_queue_sqe() -> io_queue_async() -> io_req_complete_failed()
> kills it. Then io_poll_check_events() and polling in general
> carry on using the freed request => UAF. Didn't look at it
> too carefully, but there might other similar cases.
>
I checked this when I did the coding; it seems the only case is

while (atomic_sub_return(v & IO_POLL_REF_MASK, &req->poll_refs));

which uses req again after the req is recycled in the io_queue_sqe()
path like you pointed out above, but that case should be ok since we
haven't reused the struct req{} at that point.
In my first version, I skipped the do-while {} in io_poll_check_events()
for multishot apoll and did the reaping via io_req_task_submit():
static void io_apoll_task_func(struct io_kiocb *req, bool *locked)
{
int ret;
ret = io_poll_check_events(req, locked);
if (ret > 0)
return;
__io_poll_clean(req);
if (!ret)
io_req_task_submit(req, locked); <------here
else
io_req_complete_failed(req, ret);
}
But the disadvantage is that in high-frequency workload cases it may
loop in io_poll_check_events() for a long time and only generate CQEs
at the end, in the above io_req_task_submit(), which is not good in
terms of latency.
>
* Re: [PATCH 5/5] io_uring: implement multishot mode for accept
2022-05-06 14:42 ` Jens Axboe
@ 2022-05-07 9:13 ` Hao Xu
0 siblings, 0 replies; 36+ messages in thread
From: Hao Xu @ 2022-05-07 9:13 UTC (permalink / raw)
To: Jens Axboe, io-uring; +Cc: Pavel Begunkov, linux-kernel
On 2022/5/6 10:42 PM, Jens Axboe wrote:
> On 5/6/22 1:01 AM, Hao Xu wrote:
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index 0a83ecc457d1..9febe7774dc3 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -1254,6 +1254,7 @@ static int io_close_fixed(struct io_kiocb *req, unsigned int issue_flags);
>> static enum hrtimer_restart io_link_timeout_fn(struct hrtimer *timer);
>> static void io_eventfd_signal(struct io_ring_ctx *ctx);
>> static void io_req_tw_post_queue(struct io_kiocb *req, s32 res, u32 cflags);
>> +static void io_poll_remove_entries(struct io_kiocb *req);
>>
>> static struct kmem_cache *req_cachep;
>>
>> @@ -5690,24 +5691,29 @@ static int io_recv(struct io_kiocb *req, unsigned int issue_flags)
>> static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>> {
>> struct io_accept *accept = &req->accept;
>> + bool multishot;
>>
>> if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
>> return -EINVAL;
>> - if (sqe->ioprio || sqe->len || sqe->buf_index)
>> + if (sqe->len || sqe->buf_index)
>> return -EINVAL;
>>
>> accept->addr = u64_to_user_ptr(READ_ONCE(sqe->addr));
>> accept->addr_len = u64_to_user_ptr(READ_ONCE(sqe->addr2));
>> accept->flags = READ_ONCE(sqe->accept_flags);
>> accept->nofile = rlimit(RLIMIT_NOFILE);
>> + multishot = !!(READ_ONCE(sqe->ioprio) & IORING_ACCEPT_MULTISHOT);
>
> I tend to like:
>
> multishot = (READ_ONCE(sqe->ioprio) & IORING_ACCEPT_MULTISHOT) != 0;
>
> as I think it's more readable. But I think we really want it ala:
>
> u16 poll_flags;
>
> poll_flags = READ_ONCE(sqe->ioprio);
> if (poll_flags & ~IORING_ACCEPT_MULTISHOT)
> return -EINVAL;
>
> ...
>
> to ensure that we can add more flags later, hence only accepting this
> single flag right now.
>
> Do we need REQ_F_APOLL_MULTI_POLLED, or can we just store whether this
> is a multishot request in struct io_accept?
I think we could do it that way, but it may be a bit inconvenient if we
add other multishot opcodes later. With a generic request flag we can
just check req->flags in the poll arming path, which keeps it
op-agnostic.
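For instance, the arming side can stay as simple as this (a sketch of
the shape, not the exact patch hunk):

	/* in io_arm_poll_handler(): any opcode that set
	 * REQ_F_APOLL_MULTISHOT at prep time keeps the poll armed */
	if (!(req->flags & REQ_F_APOLL_MULTISHOT))
		mask |= EPOLLONESHOT;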
>
>> @@ -5760,7 +5774,35 @@ static int io_accept(struct io_kiocb *req, unsigned int issue_flags)
>> ret = io_install_fixed_file(req, file, issue_flags,
>> accept->file_slot - 1);
>> }
>> - __io_req_complete(req, issue_flags, ret, 0);
>> +
>> + if (req->flags & REQ_F_APOLL_MULTISHOT) {
>> + if (ret >= 0) {
>> + bool filled;
>> +
>> + spin_lock(&ctx->completion_lock);
>> + filled = io_fill_cqe_aux(ctx, req->cqe.user_data, ret,
>> + IORING_CQE_F_MORE);
>> + io_commit_cqring(ctx);
>> + spin_unlock(&ctx->completion_lock);
>> + if (unlikely(!filled)) {
>> + io_poll_clean(req);
>> + return -ECANCELED;
>> + }
>> + io_cqring_ev_posted(ctx);
>> + goto retry;
>> + } else {
>> + /*
>> + * the apoll multishot req should handle poll
>> + * cancellation by itself since the upper layer
>> + * who called io_queue_sqe() cannot get errors
>> + * happened here.
>> + */
>> + io_poll_clean(req);
>> + return ret;
>> + }
>> + } else {
>> + __io_req_complete(req, issue_flags, ret, 0);
>> + }
>> return 0;
>> }
>
> I'd probably just make that:
>
> if (!(req->flags & REQ_F_APOLL_MULTISHOT)) {
> __io_req_complete(req, issue_flags, ret, 0);
> return 0;
> }
> if (ret >= 0) {
> bool filled;
>
> spin_lock(&ctx->completion_lock);
> filled = io_fill_cqe_aux(ctx, req->cqe.user_data, ret,
> IORING_CQE_F_MORE);
> io_commit_cqring(ctx);
> spin_unlock(&ctx->completion_lock);
> if (filled) {
> io_cqring_ev_posted(ctx);
> goto retry;
> }
> /* fall through to error case */
> ret = -ECANCELED;
> }
>
> /*
> * the apoll multishot req should handle poll
> * cancellation by itself since the upper layer
> * who called io_queue_sqe() cannot get errors
> * happened here.
> */
> io_poll_clean(req);
> return ret;
>
> which I think is a lot easier to read and keeps the indentation at a
> manageable level and reduces duplicate code.
Great, thanks, it's better.
>
* Re: [PATCH 3/5] io_uring: let fast poll support multishot
2022-05-06 22:02 ` Jens Axboe
2022-05-07 6:32 ` Hao Xu
@ 2022-05-07 9:26 ` Pavel Begunkov
1 sibling, 0 replies; 36+ messages in thread
From: Pavel Begunkov @ 2022-05-07 9:26 UTC (permalink / raw)
To: Jens Axboe, Hao Xu, io-uring; +Cc: linux-kernel
On 5/6/22 23:02, Jens Axboe wrote:
> On 5/6/22 11:19 AM, Pavel Begunkov wrote:
>> On 5/6/22 08:01, Hao Xu wrote:
>>> [...]
>>> + if (req->flags & REQ_F_APOLL_MULTISHOT) {
>>> + io_tw_lock(req->ctx, locked);
>>> + if (likely(!(req->task->flags & PF_EXITING)))
>>> + io_queue_sqe(req);
>>
>> That looks dangerous, io_queue_sqe() usually takes the request
>> ownership and doesn't expect that someone, i.e.
>> io_poll_check_events(), may still be actively using it.
>
> I took a look at this, too. We do own the request at this point, but
Right, but we don't pass the ownership into io_queue_sqe(). IOW, it can
potentially free it / use tw / etc. inside, and then we return to
io_poll_check_events() with a broken req.
> it's still on the poll list. If io_accept() fails, then we do run the
> poll_clean.
>
>> E.g. io_accept() fails on fd < 0, return an error, io_queue_sqe() ->
>> io_queue_async() -> io_req_complete_failed() kills it. Then
>> io_poll_check_events() and polling in general carry on using the freed
>> request => UAF. Didn't look at it too carefully, but there might other
>> similar cases.
>
> But we better have done poll_clean() before returning the error. What am
> I missing here?
One scenario I'd be worried about is sth like:

io_apoll_task_func()                           |
 -> io_poll_check_events()                     |
    // 1st iteration                           |
    -> io_queue_sqe()                          |
                                               | poll cancel()
                                               |  -> set IO_POLL_CANCEL_FLAG
       -> io_accept() fails                    |
       -> io_poll_clean()                      |
       -> io_req_complete_failed()             |
    // 2nd iteration finds IO_POLL_CANCEL_FLAG |
    return -ECANCELED                          |
 -> io_req_complete_failed(req, ret)           |

The problem in this example is the double io_req_complete_failed().
--
Pavel Begunkov
* Re: [PATCH 4/5] io_uring: add a helper for poll clean
2022-05-07 6:43 ` Hao Xu
@ 2022-05-07 9:29 ` Pavel Begunkov
0 siblings, 0 replies; 36+ messages in thread
From: Pavel Begunkov @ 2022-05-07 9:29 UTC (permalink / raw)
To: Hao Xu, io-uring; +Cc: Jens Axboe, linux-kernel
On 5/7/22 07:43, Hao Xu wrote:
> On 2022/5/7 12:22 AM, Pavel Begunkov wrote:
>> On 5/6/22 08:01, Hao Xu wrote:
>>> [...]
>>> +
>>> +#define REQ_F_APOLL_MULTI_POLLED (REQ_F_APOLL_MULTISHOT | REQ_F_POLLED)
>>> +static inline void io_poll_clean(struct io_kiocb *req)
>>> +{
>>> + if ((req->flags & REQ_F_APOLL_MULTI_POLLED) == REQ_F_APOLL_MULTI_POLLED)
>>
>> So it triggers for apoll multishot only when REQ_F_POLLED is _not_ set,
>> but if it's not set it did never go through arm_poll / etc. and there is
>> nothing to clean up. What is the catch?
>
> No, it is triggered for apoll multishot only when REQ_F_POLLED is set..
Ok, I see it now; I probably confused the REQ_F_APOLL_MULTI_POLLED on
the right-hand side with something else.
>> btw, don't see the function used in this patch, better to add it later
>> or at least mark with attribute unused, or some may get build failures.
> Gotcha.
--
Pavel Begunkov
* Re: [PATCH 3/5] io_uring: let fast poll support multishot
2022-05-07 7:08 ` Hao Xu
@ 2022-05-07 9:47 ` Pavel Begunkov
2022-05-07 11:06 ` Hao Xu
0 siblings, 1 reply; 36+ messages in thread
From: Pavel Begunkov @ 2022-05-07 9:47 UTC (permalink / raw)
To: Hao Xu, io-uring; +Cc: Jens Axboe, linux-kernel
On 5/7/22 08:08, Hao Xu wrote:
> On 2022/5/7 1:19 AM, Pavel Begunkov wrote:
>> On 5/6/22 08:01, Hao Xu wrote:
[...]
>> That looks dangerous, io_queue_sqe() usually takes the request ownership
>> and doesn't expect that someone, i.e. io_poll_check_events(), may still be
>> actively using it.
>>
>> E.g. io_accept() fails on fd < 0, return an error,
>> io_queue_sqe() -> io_queue_async() -> io_req_complete_failed()
>> kills it. Then io_poll_check_events() and polling in general
>> carry on using the freed request => UAF. Didn't look at it
>> too carefully, but there might other similar cases.
>>
> I checked this when I did the coding; it seems the only case is
> while (atomic_sub_return(v & IO_POLL_REF_MASK, &req->poll_refs));
> which uses req again after the req is recycled in the io_queue_sqe()
> path like you pointed out above, but that case should be ok since we
> haven't reused the struct req{} at that point.
Replied to another message with an example that I think might
be broken, please take a look.
The issue is that io_queue_sqe() was always consuming / freeing /
redirecting / etc. requests, i.e. call it and forget about the req.
With io_accept() now it may or may not free it, without even returning
a code to indicate which. This implicit knowledge is quite tricky to
maintain.
might make more sense to "duplicate" io_queue_sqe()
ret = io_issue_sqe(req, IO_URING_F_NONBLOCK|IO_URING_F_COMPLETE_DEFER);
// REQ_F_COMPLETE_INLINE should never happen, no check for that
// don't care about io_arm_ltimeout(), should already be armed
// ret handling here
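Fleshed out a little, that could look like the below (the helper name
io_poll_issue() is made up here, details to be settled):

static int io_poll_issue(struct io_kiocb *req, bool *locked)
{
	io_tw_lock(req->ctx, locked);
	/* REQ_F_COMPLETE_INLINE should never happen here, and the
	 * linked timeout is already armed, so both io_queue_sqe()
	 * checks can be skipped */
	return io_issue_sqe(req, IO_URING_F_NONBLOCK |
				 IO_URING_F_COMPLETE_DEFER);
}

Then io_poll_check_events() keeps ownership and can act on the return
code itself rather than guessing what io_queue_sqe() did with the
request.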
> [...]
--
Pavel Begunkov
* Re: [PATCH 3/5] io_uring: let fast poll support multishot
2022-05-07 9:47 ` Pavel Begunkov
@ 2022-05-07 11:06 ` Hao Xu
0 siblings, 0 replies; 36+ messages in thread
From: Hao Xu @ 2022-05-07 11:06 UTC (permalink / raw)
To: Pavel Begunkov, io-uring; +Cc: Jens Axboe, linux-kernel
On 2022/5/7 5:47 PM, Pavel Begunkov wrote:
> On 5/7/22 08:08, Hao Xu wrote:
>> On 2022/5/7 1:19 AM, Pavel Begunkov wrote:
>>> On 5/6/22 08:01, Hao Xu wrote:
> [...]
>>> That looks dangerous, io_queue_sqe() usually takes the request ownership
>>> and doesn't expect that someone, i.e. io_poll_check_events(), may
>>> still be
>>> actively using it.
>>>
>>> E.g. io_accept() fails on fd < 0, return an error,
>>> io_queue_sqe() -> io_queue_async() -> io_req_complete_failed()
>>> kills it. Then io_poll_check_events() and polling in general
>>> carry on using the freed request => UAF. Didn't look at it
>>> too carefully, but there might other similar cases.
>>>
>> I checked this when I did the coding; it seems the only case is
>> while (atomic_sub_return(v & IO_POLL_REF_MASK, &req->poll_refs));
>> which uses req again after the req is recycled in the io_queue_sqe()
>> path like you pointed out above, but that case should be ok since we
>> haven't reused the struct req{} at that point.
>
> Replied to another message with an example that I think might
> be broken, please take a look.
I saw it just now; it looks like a valid case to me. Thanks.
>
> The issue is that io_queue_sqe() was always consuming / freeing /
> redirecting / etc. requests, i.e. call it and forget about the req.
> With io_accept now it may or may not free it and not even returning
> any return code about that. This implicit knowledge is quite tricky
> to maintain.
>
> might make more sense to "duplicate" io_queue_sqe()
>
> ret = io_issue_sqe(req, IO_URING_F_NONBLOCK|IO_URING_F_COMPLETE_DEFER);
> // REQ_F_COMPLETE_INLINE should never happen, no check for that
> // don't care about io_arm_ltimeout(), should already be armed
> // ret handling here
This is what I'm doing for v3; it indeed makes more sense.
* Re: [PATCH v2 0/5] fast poll multishot mode
2022-05-07 3:08 ` Jens Axboe
@ 2022-05-07 16:01 ` Hao Xu
0 siblings, 0 replies; 36+ messages in thread
From: Hao Xu @ 2022-05-07 16:01 UTC (permalink / raw)
To: Jens Axboe, io-uring; +Cc: Pavel Begunkov, linux-kernel
On 2022/5/7 11:08 AM, Jens Axboe wrote:
> On 5/6/22 8:33 PM, Jens Axboe wrote:
>> On 5/6/22 5:26 PM, Jens Axboe wrote:
>>> On 5/6/22 4:23 PM, Jens Axboe wrote:
>>>> On 5/6/22 1:00 AM, Hao Xu wrote:
>>>>> [...]
>>>>> A liburing test is here:
>>>>> https://github.com/HowHsu/liburing/blob/multishot_accept/test/accept.c
>>>>
>>>> Wish I had seen that, I wrote my own! But maybe that's good, you tend to
>>>> find other issues through that.
>>>>
>>>> Anyway, works for me in testing, and I can see this being a nice win for
>>>> accept intensive workloads. I pushed a bunch of cleanup patches that
>>>> should just get folded in. Can you fold them into your patches and
>>>> address the other feedback, and post a v3? I pushed the test branch
>>>> here:
>>>>
>>>> https://git.kernel.dk/cgit/linux-block/log/?h=fastpoll-mshot
>>>
>>> Quick benchmark here, accepting 10k connections:
>>>
>>> Stock kernel
>>> real 0m0.728s
>>> user 0m0.009s
>>> sys 0m0.192s
>>>
>>> Patched
>>> real 0m0.684s
>>> user 0m0.018s
>>> sys 0m0.102s
>>>
>>> Looks like a nice win for a highly synthetic benchmark. Nothing
>>> scientific, was just curious.
>>
>> One more thought on this - how is it supposed to work with
>> accept-direct? One idea would be to make the file slot incrementally
>> increasing. But we need a good story for that: if multishot is
>> exclusive to non-direct files, then it's a lot less interesting, as the
>> latter is a really nice win for lots of files. If we can combine the
>> two, even better.
>
> Running some quick testing, on an actual test box (previous numbers were
> from a vm on my laptop):
>
> [...]
>
> That's about ~19 usec to accept a connection, pretty decent. Using
> singleshot and with fixed files, it shaves about ~8% off, ends at around
> 200msec.
>
> I think we can get away with using fixed files and multishot, attaching
> the quick patch I did below to test it.

I'm not following - do you mean we shouldn't do multishot + fixed files,
or that we should use multishot + fixed files to make the result better?
Also, sorry Jens, I didn't see the quick patch attached; did I
misunderstand something?

> We need something better than this, otherwise once the space fills up,
> we'll likely end up with a sparse space and the naive approach of just
> incrementing the next slot won't work at all.
>
Thread overview: 36+ messages
2022-05-06 7:00 [PATCH v2 0/5] fast poll multishot mode Hao Xu
2022-05-06 7:00 ` [PATCH 1/5] io_uring: add IORING_ACCEPT_MULTISHOT for accept Hao Xu
2022-05-06 14:32 ` Jens Axboe
2022-05-07 4:05 ` Hao Xu
2022-05-06 7:00 ` [PATCH 2/5] io_uring: add REQ_F_APOLL_MULTISHOT for requests Hao Xu
2022-05-06 7:01 ` [PATCH 3/5] io_uring: let fast poll support multishot Hao Xu
2022-05-06 17:19 ` Pavel Begunkov
2022-05-06 22:02 ` Jens Axboe
2022-05-07 6:32 ` Hao Xu
2022-05-07 9:26 ` Pavel Begunkov
2022-05-07 7:08 ` Hao Xu
2022-05-07 9:47 ` Pavel Begunkov
2022-05-07 11:06 ` Hao Xu
2022-05-06 18:02 ` kernel test robot
2022-05-06 7:01 ` [PATCH 4/5] io_uring: add a helper for poll clean Hao Xu
2022-05-06 11:04 ` kernel test robot
2022-05-06 12:47 ` kernel test robot
2022-05-06 14:36 ` Jens Axboe
2022-05-07 6:37 ` Hao Xu
2022-05-06 16:22 ` Pavel Begunkov
2022-05-07 6:43 ` Hao Xu
2022-05-07 9:29 ` Pavel Begunkov
2022-05-06 7:01 ` [PATCH 5/5] io_uring: implement multishot mode for accept Hao Xu
2022-05-06 14:42 ` Jens Axboe
2022-05-07 9:13 ` Hao Xu
2022-05-06 20:50 ` Jens Axboe
2022-05-06 21:29 ` Jens Axboe
2022-05-06 7:36 ` [PATCH v2 0/5] fast poll multishot mode Hao Xu
2022-05-06 14:18 ` Jens Axboe
2022-05-06 16:01 ` Pavel Begunkov
2022-05-06 16:03 ` Jens Axboe
2022-05-06 22:23 ` Jens Axboe
2022-05-06 23:26 ` Jens Axboe
2022-05-07 2:33 ` Jens Axboe
2022-05-07 3:08 ` Jens Axboe
2022-05-07 16:01 ` Hao Xu