From: Pavel Begunkov <[email protected]>
To: Jens Axboe <[email protected]>, [email protected]
Cc: [email protected], [email protected]
Subject: Re: [PATCH 09/11] io_uring/epoll: add support for IORING_OP_EPOLL_WAIT
Date: Fri, 7 Feb 2025 12:38:45 +0000 [thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
On 2/4/25 19:46, Jens Axboe wrote:
> For existing epoll event loops that can't fully convert to io_uring,
> the used approach is usually to add the io_uring fd to the epoll
> instance and use epoll_wait() to wait on both "legacy" and io_uring
> events. While this work, it isn't optimal as:
>
> 1) epoll_wait() is pretty limited in what it can do. It does not support
> partial reaping of events, or waiting on a batch of events.
>
> 2) When an io_uring ring is added to an epoll instance, it activates the
> io_uring "I'm being polled" logic which slows things down.
>
> Rather than use this approach, with EPOLL_WAIT support added to io_uring,
> event loops can use the normal io_uring wait logic for everything, as
> long as an epoll wait request has been armed with io_uring.
>
> Note that IORING_OP_EPOLL_WAIT does NOT take a timeout value, as this
> is an async request. Waiting on io_uring events in general has various
> timeout parameters, and those are the ones that should be used when
> waiting on any kind of request. If events are immediately available for
> reaping, then This opcode will return those immediately. If none are
> available, then it will post an async completion when they become
> available.
>
> cqe->res will contain either an error code (< 0 value) for a malformed
> request, invalid epoll instance, etc. It will return a positive result
> indicating how many events were reaped.
>
> IORING_OP_EPOLL_WAIT requests may be canceled using the normal io_uring
> cancelation infrastructure. The poll logic for managing ownership is
> adopted to guard the epoll side too.
...
> diff --git a/io_uring/epoll.c b/io_uring/epoll.c
> index 7848d9cc073d..5a47f0cce647 100644
> --- a/io_uring/epoll.c
> +++ b/io_uring/epoll.c
...
> +static void io_epoll_retry(struct io_kiocb *req, struct io_tw_state *ts)
> +{
> + int v;
> +
> + do {
> + v = atomic_read(&req->poll_refs);
> + if (unlikely(v != 1)) {
> + if (WARN_ON_ONCE(!(v & IO_POLL_REF_MASK)))
> + return;
> + if (v & IO_POLL_CANCEL_FLAG) {
> + __io_epoll_cancel(req);
> + return;
> + }
> + if (v & IO_POLL_FINISH_FLAG)
> + return;
> + }
> + v &= IO_POLL_REF_MASK;
> + } while (atomic_sub_return(v, &req->poll_refs) & IO_POLL_REF_MASK);
I haven't looked deep into the set, but this loop looks very
suspicious. The entire purpose of the twin loop in poll.c is
not to lose events while doing processing, which is why the
processing happens before the decrement...
> + io_req_task_submit(req, ts);
Maybe the issue is supposed to handle that, but this one is
not allowed unless you fully unhash all the polling. Once you
dropped refs the poll wait entry feels free to claim the request,
and, for example, queue a task work, and io_req_task_submit()
would decide to queue it as well. It's likely not the only
race that can happen.
> +}
> +
> +static int io_epoll_execute(struct io_kiocb *req)
> +{
> + struct io_epoll_wait *iew = io_kiocb_to_cmd(req, struct io_epoll_wait);
> +
> + list_del_init_careful(&iew->wait.entry);
> + if (io_poll_get_ownership(req)) {
> + req->io_task_work.func = io_epoll_retry;
> + io_req_task_work_add(req);
> + }
> +
> + return 1;
> +}
...
> +int io_epoll_wait(struct io_kiocb *req, unsigned int issue_flags)
> +{
> + struct io_epoll_wait *iew = io_kiocb_to_cmd(req, struct io_epoll_wait);
> + struct io_ring_ctx *ctx = req->ctx;
> + int ret;
> +
> + io_ring_submit_lock(ctx, issue_flags);
> +
> + ret = epoll_wait(req->file, iew->events, iew->maxevents, NULL, &iew->wait);
> + if (ret == -EIOCBQUEUED) {
> + if (hlist_unhashed(&req->hash_node))
> + hlist_add_head(&req->hash_node, &ctx->epoll_list);
> + io_ring_submit_unlock(ctx, issue_flags);
> + return IOU_ISSUE_SKIP_COMPLETE;
> + } else if (ret < 0) {
> + req_set_fail(req);
> + }
> + hlist_del_init(&req->hash_node);
> + io_ring_submit_unlock(ctx, issue_flags);
> + io_req_set_res(req, ret, 0);
> + return IOU_OK;
> +}
...
--
Pavel Begunkov
next prev parent reply other threads:[~2025-02-07 12:38 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-04 19:46 [PATCHSET v2 0/11] io_uring epoll wait support Jens Axboe
2025-02-04 19:46 ` [PATCH 01/11] eventpoll: abstract out main epoll reaper into a function Jens Axboe
2025-02-04 19:46 ` [PATCH 02/11] eventpoll: add helper to remove wait entry from wait queue head Jens Axboe
2025-02-04 19:46 ` [PATCH 03/11] eventpoll: abstract out ep_try_send_events() helper Jens Axboe
2025-02-04 19:46 ` [PATCH 04/11] eventpoll: add struct wait_queue_entry argument to epoll_wait() Jens Axboe
2025-02-04 19:46 ` [PATCH 05/11] eventpoll: add ep_poll_queue() loop Jens Axboe
2025-02-07 12:28 ` Pavel Begunkov
2025-02-07 14:29 ` Jens Axboe
2025-02-04 19:46 ` [PATCH 06/11] io_uring/epoll: remove CONFIG_EPOLL guards Jens Axboe
2025-02-04 19:46 ` [PATCH 07/11] io_uring/poll: pull ownership handling into poll.h Jens Axboe
2025-02-04 19:46 ` [PATCH 08/11] io_uring/poll: add IO_POLL_FINISH_FLAG Jens Axboe
2025-02-07 12:15 ` Pavel Begunkov
2025-02-07 14:29 ` Jens Axboe
2025-02-04 19:46 ` [PATCH 09/11] io_uring/epoll: add support for IORING_OP_EPOLL_WAIT Jens Axboe
2025-02-07 12:38 ` Pavel Begunkov [this message]
2025-02-07 17:13 ` Jens Axboe
2025-02-04 19:46 ` [PATCH 10/11] io_uring/epoll: add support for provided buffers Jens Axboe
2025-02-04 19:46 ` [PATCH 11/11] io_uring/epoll: add multishot support for IORING_OP_EPOLL_WAIT Jens Axboe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox