From: Pavel Begunkov <asml.silence@gmail.com>
To: Max Kellermann <max.kellermann@ionos.com>,
axboe@kernel.dk, io-uring@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 8/8] io_uring: skip redundant poll wakeups
Date: Fri, 31 Jan 2025 13:54:25 +0000
Message-ID: <794043b6-4008-448e-b241-1390aa91d2ab@gmail.com>
In-Reply-To: <20250128133927.3989681-9-max.kellermann@ionos.com>
On 1/28/25 13:39, Max Kellermann wrote:
> Using io_uring with epoll is very expensive because every completion
> leads to a __wake_up() call, most of which are unnecessary because the
> polling process has already been woken up but has not had a chance to
> process the completions. During this time, wq_has_sleeper() still
> returns true, therefore this check is not enough.
A poller is not required to call vfs_poll() / io_uring_poll()
more than once, and if it doesn't, all subsequent events will be
dropped on the floor. E.g. the poller queues a poll entry in the
first io_uring_poll() call, and then every time there is an event
it does vfs_read() or whatever without removing the entry.

I don't think we can make such assumptions about the poller, at
least not without some help from it or special-casing particular
callbacks.
...
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index 137c2066c5a3..b65efd07e9f0 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
...
> @@ -2793,6 +2794,9 @@ static __poll_t io_uring_poll(struct file *file, poll_table *wait)
>
>  	if (unlikely(!ctx->poll_activated))
>  		io_activate_pollwq(ctx);
> +
> +	atomic_set(&ctx->poll_wq_waiting, 1);
io_uring_poll()                 |
poll_wq_waiting = 1             |
                                | io_poll_wq_wake()
                                | poll_wq_waiting = 0
                                | // no waiters yet => no wake ups
                                | <return to user space>
                                | <consume all cqes>
poll_wait()                     |
return; // no events            |
                                | produce_cqes()
                                | io_poll_wq_wake()
                                | if (poll_wq_waiting) wake();
                                | // it's still 0, wake up is lost
>  	/*
>  	 * provides mb() which pairs with barrier from wq_has_sleeper
>  	 * call in io_commit_cqring
> diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
> index f65e3f3ede51..186cee066f9f 100644
> --- a/io_uring/io_uring.h
> +++ b/io_uring/io_uring.h
> @@ -287,7 +287,7 @@ static inline void io_commit_cqring(struct io_ring_ctx *ctx)
>
> static inline void io_poll_wq_wake(struct io_ring_ctx *ctx)
> {
> -	if (wq_has_sleeper(&ctx->poll_wq))
> +	if (wq_has_sleeper(&ctx->poll_wq) && atomic_xchg_release(&ctx->poll_wq_waiting, 0) > 0)
>  		__wake_up(&ctx->poll_wq, TASK_NORMAL, 0,
>  				poll_to_key(EPOLL_URING_WAKE | EPOLLIN));
> }
--
Pavel Begunkov