From: Jens Axboe <[email protected]>
To: [email protected]
Cc: [email protected], [email protected],
[email protected], Jens Axboe <[email protected]>
Subject: [PATCH 5/5] io_uring/epoll: add support for IORING_OP_EPOLL_WAIT
Date: Wed, 19 Feb 2025 10:22:28 -0700
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>

For existing epoll event loops that can't fully convert to io_uring,
the usual approach is to add the io_uring fd to the epoll instance and
use epoll_wait() to wait on both "legacy" and io_uring events. While
this works, it isn't optimal, because:
1) epoll_wait() is pretty limited in what it can do. It does not support
partial reaping of events, or waiting on a batch of events.
2) When an io_uring ring is added to an epoll instance, it activates the
io_uring "I'm being polled" logic which slows things down.
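
For reference, the legacy pattern described above looks roughly like
the sketch below. It is a minimal illustration, not code from this
series: an eventfd stands in for the io_uring fd (both report EPOLLIN
when there is something to consume), and RING_TAG, add_ring_to_epoll(),
fake_cqe_posted() and legacy_loop_once() are hypothetical names.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical tag marking the "ring" fd in epoll results. */
#define RING_TAG 0xC0FFEEu

/*
 * Register the ring fd with the epoll instance, as a legacy loop does.
 * In this sketch ring_fd is an eventfd standing in for the io_uring fd.
 */
static int add_ring_to_epoll(int epfd, int ring_fd)
{
	struct epoll_event ev = { .events = EPOLLIN, .data.u64 = RING_TAG };

	return epoll_ctl(epfd, EPOLL_CTL_ADD, ring_fd, &ev);
}

/* Stand-in only: make the fake ring fd readable, as posting a CQE would. */
static int fake_cqe_posted(int ring_fd)
{
	uint64_t one = 1;

	return write(ring_fd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * One iteration of the legacy loop: block in epoll_wait() and report
 * whether the ring fd was among the ready descriptors. Returns 1 if the
 * ring should now be reaped, 0 if not, -1 on error. Other ready fds
 * would be dispatched to their "legacy" handlers here.
 */
static int legacy_loop_once(int epfd, int timeout_ms)
{
	struct epoll_event evs[8];
	int n = epoll_wait(epfd, evs, 8, timeout_ms);
	int ring_ready = 0;

	if (n < 0)
		return -1;
	for (int i = 0; i < n; i++)
		if (evs[i].data.u64 == RING_TAG)
			ring_ready = 1;
	return ring_ready;
}
```

Every wakeup for ring completions goes through epoll first, and the
ring fd being in an epoll set is what triggers the "I'm being polled"
path mentioned in 2).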

With EPOLL_WAIT support added to io_uring, event loops can instead use
the normal io_uring wait logic for everything, as long as an epoll wait
request has been armed with io_uring.
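
Going by io_epoll_wait_prep() in the diff, arming such a request means
filling an SQE with the epoll fd in ->fd, the user event array in
->addr and maxevents in ->len, leaving off, rw_flags, buf_index and
splice_fd_in zero. The sketch below assumes a hypothetical mirror
struct (demo_sqe) rather than the real UAPI layout, and
DEMO_OP_EPOLL_WAIT is a placeholder value, not the real opcode number.

```c
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>

/*
 * Hypothetical mirror of the io_uring_sqe fields that
 * IORING_OP_EPOLL_WAIT reads; NOT the real UAPI struct layout.
 */
struct demo_sqe {
	uint8_t  opcode;
	int32_t  fd;           /* epoll instance */
	uint64_t addr;         /* user pointer to struct epoll_event array */
	uint32_t len;          /* maxevents */
	uint64_t off;          /* must be 0 */
	uint32_t rw_flags;     /* must be 0 */
	uint16_t buf_index;    /* must be 0 */
	int32_t  splice_fd_in; /* must be 0 */
	uint64_t user_data;    /* echoed back in the CQE */
};

/* Placeholder; use IORING_OP_EPOLL_WAIT from a header that defines it. */
#define DEMO_OP_EPOLL_WAIT 0xEE

/* Arm an epoll wait: events/maxevents play the role of epoll_wait()'s args. */
static void demo_prep_epoll_wait(struct demo_sqe *sqe, int epfd,
				 struct epoll_event *events,
				 unsigned int maxevents, uint64_t user_data)
{
	memset(sqe, 0, sizeof(*sqe)); /* unused fields must be zero */
	sqe->opcode = DEMO_OP_EPOLL_WAIT;
	sqe->fd = epfd;
	sqe->addr = (uint64_t)(uintptr_t)events;
	sqe->len = maxevents;
	sqe->user_data = user_data;
}
```

The events array must stay valid until the CQE arrives, since the
kernel copies ready events into it at completion time.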

Note that IORING_OP_EPOLL_WAIT does NOT take a timeout value, as this
is an async request. Waiting on io_uring events in general has various
timeout parameters, and those are the ones that should be used when
waiting on any kind of request. If events are immediately available for
reaping, this opcode will return them immediately. If none are
available, it will post an async completion when they become
available.
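
In the diff, io_epoll_wait() returns -EAGAIN when epoll_sendevents()
finds nothing, and the opdef entry sets .pollin = 1, so the request is
retried once the epoll fd itself becomes readable. The sketch below is
only a userspace analogy of that flow, not the kernel code: a
zero-timeout epoll_wait() plays the non-blocking reap, and poll() on
the epoll fd plays the armed POLLIN wait. try_reap(), reap_or_wait()
and make_epoll_on_pipe() are hypothetical names.

```c
#include <errno.h>
#include <poll.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Non-blocking reap; -EAGAIN when nothing is ready (like io_epoll_wait()). */
static int try_reap(int epfd, struct epoll_event *evs, int maxevents)
{
	int n = epoll_wait(epfd, evs, maxevents, 0); /* timeout 0: never block */

	if (n < 0)
		return -errno;
	return n > 0 ? n : -EAGAIN;
}

/*
 * On -EAGAIN, wait for the epoll fd to become readable, then retry once.
 * The timeout belongs to the waiter, not to the reap operation itself.
 */
static int reap_or_wait(int epfd, struct epoll_event *evs, int maxevents,
			int timeout_ms)
{
	struct pollfd pfd = { .fd = epfd, .events = POLLIN };
	int ret = try_reap(epfd, evs, maxevents);
	int pr;

	if (ret != -EAGAIN)
		return ret;
	pr = poll(&pfd, 1, timeout_ms);
	if (pr < 0)
		return -errno;
	if (pr == 0)
		return -EAGAIN; /* timed out, still nothing to reap */
	return try_reap(epfd, evs, maxevents);
}

/* Fixture: an epoll instance watching the read end of a new pipe. */
static int make_epoll_on_pipe(int pipefd[2])
{
	struct epoll_event ev = { .events = EPOLLIN };
	int epfd;

	if (pipe(pipefd) < 0)
		return -1;
	epfd = epoll_create1(0);
	ev.data.fd = pipefd[0];
	if (epfd < 0 || epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) < 0) {
		if (epfd >= 0)
			close(epfd);
		close(pipefd[0]);
		close(pipefd[1]);
		return -1;
	}
	return epfd;
}
```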

cqe->res will contain either a negative error code for a malformed
request, an invalid epoll instance, etc., or a positive value
indicating how many events were reaped.
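
A completion handler therefore treats cqe->res like an epoll_wait()
return value. The sketch below is a hypothetical handler
(handle_epoll_wait_cqe() is not part of any API); it assumes, as is
usual for io_uring, that a canceled request completes with -ECANCELED,
and it simply counts EPOLLIN entries to show which part of the event
array is valid.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/epoll.h>

/*
 * res is cqe->res; events is the array whose address went into sqe->addr.
 * Returns the number of EPOLLIN events seen, or res itself on error.
 */
static int handle_epoll_wait_cqe(int res, const struct epoll_event *events)
{
	int readable = 0;

	if (res == -ECANCELED)
		return res; /* canceled on purpose: nothing to report */
	if (res < 0) {
		/* Malformed request, invalid epoll instance, etc. */
		fprintf(stderr, "epoll wait failed: %s\n", strerror(-res));
		return res;
	}
	/* res > 0: only events[0..res-1] were filled in. */
	for (int i = 0; i < res; i++)
		if (events[i].events & EPOLLIN)
			readable++;
	return readable;
}
```

After handling a positive result, the loop would re-arm a new
EPOLL_WAIT request to keep watching the epoll instance.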
IORING_OP_EPOLL_WAIT requests may be canceled using the normal io_uring
cancelation infrastructure.
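
As an illustration of the "normal cancelation infrastructure", the
sketch below fills an IORING_OP_ASYNC_CANCEL SQE that targets a pending
epoll wait by its user_data. It is modeled on what liburing's
io_uring_prep_cancel64() sets up, but prep_cancel_by_user_data() is a
hypothetical helper, ring setup and submission are omitted, and it
assumes a <linux/io_uring.h> recent enough to have IORING_OP_ASYNC_CANCEL.

```c
#include <stdint.h>
#include <string.h>
#include <linux/io_uring.h>

/*
 * Fill an SQE that cancels the in-flight request whose user_data is
 * target (e.g. the tag given to the EPOLL_WAIT SQE). The canceled
 * request is then expected to complete with -ECANCELED.
 */
static void prep_cancel_by_user_data(struct io_uring_sqe *sqe,
				     uint64_t target, uint64_t own_tag)
{
	memset(sqe, 0, sizeof(*sqe));
	sqe->opcode = IORING_OP_ASYNC_CANCEL;
	sqe->fd = -1;             /* unused when matching by user_data */
	sqe->addr = target;       /* user_data of the request to cancel */
	sqe->user_data = own_tag; /* tag for the cancel request's own CQE */
}
```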
Signed-off-by: Jens Axboe <[email protected]>
---
include/uapi/linux/io_uring.h | 1 +
io_uring/epoll.c | 33 +++++++++++++++++++++++++++++++++
io_uring/epoll.h | 2 ++
io_uring/opdef.c | 14 ++++++++++++++
4 files changed, 50 insertions(+)
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 05d6255b0f6a..135eb9296296 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -280,6 +280,7 @@ enum io_uring_op {
IORING_OP_BIND,
IORING_OP_LISTEN,
IORING_OP_RECV_ZC,
+ IORING_OP_EPOLL_WAIT,
/* this goes last, obviously */
IORING_OP_LAST,
diff --git a/io_uring/epoll.c b/io_uring/epoll.c
index 7848d9cc073d..6d2c48ba1923 100644
--- a/io_uring/epoll.c
+++ b/io_uring/epoll.c
@@ -20,6 +20,12 @@ struct io_epoll {
struct epoll_event event;
};
+struct io_epoll_wait {
+ struct file *file;
+ int maxevents;
+ struct epoll_event __user *events;
+};
+
int io_epoll_ctl_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
{
struct io_epoll *epoll = io_kiocb_to_cmd(req, struct io_epoll);
@@ -57,3 +63,30 @@ int io_epoll_ctl(struct io_kiocb *req, unsigned int issue_flags)
io_req_set_res(req, ret, 0);
return IOU_OK;
}
+
+int io_epoll_wait_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+{
+ struct io_epoll_wait *iew = io_kiocb_to_cmd(req, struct io_epoll_wait);
+
+ if (sqe->off || sqe->rw_flags || sqe->buf_index || sqe->splice_fd_in)
+ return -EINVAL;
+
+ iew->maxevents = READ_ONCE(sqe->len);
+ iew->events = u64_to_user_ptr(READ_ONCE(sqe->addr));
+ return 0;
+}
+
+int io_epoll_wait(struct io_kiocb *req, unsigned int issue_flags)
+{
+ struct io_epoll_wait *iew = io_kiocb_to_cmd(req, struct io_epoll_wait);
+ int ret;
+
+ ret = epoll_sendevents(req->file, iew->events, iew->maxevents);
+ if (ret == 0)
+ return -EAGAIN;
+ if (ret < 0)
+ req_set_fail(req);
+
+ io_req_set_res(req, ret, 0);
+ return IOU_OK;
+}
diff --git a/io_uring/epoll.h b/io_uring/epoll.h
index 870cce11ba98..4111997c360b 100644
--- a/io_uring/epoll.h
+++ b/io_uring/epoll.h
@@ -3,4 +3,6 @@
#if defined(CONFIG_EPOLL)
int io_epoll_ctl_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
int io_epoll_ctl(struct io_kiocb *req, unsigned int issue_flags);
+int io_epoll_wait_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
+int io_epoll_wait(struct io_kiocb *req, unsigned int issue_flags);
#endif
diff --git a/io_uring/opdef.c b/io_uring/opdef.c
index 89f50ecadeaf..9344534780a0 100644
--- a/io_uring/opdef.c
+++ b/io_uring/opdef.c
@@ -527,6 +527,17 @@ const struct io_issue_def io_issue_defs[] = {
.issue = io_recvzc,
#else
.prep = io_eopnotsupp_prep,
+#endif
+ },
+ [IORING_OP_EPOLL_WAIT] = {
+ .needs_file = 1,
+ .audit_skip = 1,
+ .pollin = 1,
+#if defined(CONFIG_EPOLL)
+ .prep = io_epoll_wait_prep,
+ .issue = io_epoll_wait,
+#else
+ .prep = io_eopnotsupp_prep,
#endif
},
};
@@ -761,6 +772,9 @@ const struct io_cold_def io_cold_defs[] = {
[IORING_OP_RECV_ZC] = {
.name = "RECV_ZC",
},
+ [IORING_OP_EPOLL_WAIT] = {
+ .name = "EPOLL_WAIT",
+ },
};
const char *io_uring_get_opcode(u8 opcode)
--
2.47.2
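
The prep step in the diff only rejects non-zero unused SQE fields and
stashes len/addr; it does not range-check maxevents or the events
pointer. Presumably those checks happen at issue time in
epoll_sendevents() (patch 1/5 factors out the epoll parameter checks),
but that is an inference from the series, not something this patch
shows. A userspace mirror of the prep logic, using hypothetical struct
and function names:

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical mirror of the SQE fields io_epoll_wait_prep() inspects. */
struct demo_prep_input {
	uint64_t off;
	uint32_t rw_flags;
	uint16_t buf_index;
	int32_t  splice_fd_in;
	uint32_t len;  /* maxevents */
	uint64_t addr; /* user pointer to the event array */
};

/* Mirror of struct io_epoll_wait's prepared state (file omitted). */
struct demo_epoll_wait {
	int maxevents;
	uint64_t events; /* user pointer, kept as an integer here */
};

/* Same checks as io_epoll_wait_prep(): unused fields must be zero. */
static int demo_epoll_wait_prep(struct demo_epoll_wait *iew,
				const struct demo_prep_input *sqe)
{
	if (sqe->off || sqe->rw_flags || sqe->buf_index || sqe->splice_fd_in)
		return -EINVAL;
	iew->maxevents = (int)sqe->len;
	iew->events = sqe->addr;
	return 0;
}
```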

Thread overview: 9+ messages
2025-02-19 17:22 [PATCHSET v4 0/7] io_uring epoll wait support Jens Axboe
2025-02-19 17:22 ` [PATCH 1/5] eventpoll: abstract out parameter sanity checking Jens Axboe
2025-02-19 17:22 ` [PATCH 2/5] eventpoll: abstract out ep_try_send_events() helper Jens Axboe
2025-02-19 17:22 ` [PATCH 3/5] eventpoll: add epoll_sendevents() helper Jens Axboe
2025-02-19 17:22 ` [PATCH 4/5] io_uring/epoll: remove CONFIG_EPOLL guards Jens Axboe
2025-02-19 17:22 ` Jens Axboe [this message]
2025-02-24 14:17 ` [PATCH 5/5] io_uring/epoll: add support for IORING_OP_EPOLL_WAIT Pavel Begunkov
2025-02-20 9:21 ` (subset) [PATCHSET v4 0/7] io_uring epoll wait support Christian Brauner
2025-02-20 15:15 ` Jens Axboe