* [PATCH 1/5] poll_wait: add mb() to fix theoretical race between waitqueue_active() and .poll()
2025-01-07 16:26 [PATCH 0/5] poll_wait: add mb() to fix theoretical race between waitqueue_active() and .poll() Oleg Nesterov
@ 2025-01-07 16:27 ` Oleg Nesterov
2025-01-07 16:27 ` [PATCH 2/5] poll_wait: kill the obsolete wait_address check Oleg Nesterov
` (5 subsequent siblings)
6 siblings, 0 replies; 10+ messages in thread
From: Oleg Nesterov @ 2025-01-07 16:27 UTC (permalink / raw)
To: Christian Brauner, Linus Torvalds, Manfred Spraul
Cc: David S. Miller, Eric Dumazet, Jens Axboe, Pavel Begunkov,
WangYuli, linux-kernel, io-uring, netdev
As the comment above waitqueue_active() explains, it can only be used
if both waker and waiter have mb()'s that pair with each other. However
__pollwait() is broken in this respect.
This is not pipe-specific, but let's look at pipe_poll() for example:
poll_wait(...); // -> __pollwait() -> add_wait_queue()
LOAD(pipe->head);
LOAD(pipe->head);
In theory these LOAD()'s can leak into the critical section inside
add_wait_queue() and can happen before list_add(entry, wq_head), in this
case pipe_poll() can race with wakeup_pipe_readers/writers which do
smp_mb();
if (waitqueue_active(wq_head))
wake_up_interruptible(wq_head);
There are more __pollwait()-like functions (grep init_poll_funcptr), and
it seems that at least ep_ptable_queue_proc() has the same problem, so the
patch adds smp_mb() into poll_wait().
Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Oleg Nesterov <[email protected]>
---
include/linux/poll.h | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/include/linux/poll.h b/include/linux/poll.h
index d1ea4f3714a8..fc641b50f129 100644
--- a/include/linux/poll.h
+++ b/include/linux/poll.h
@@ -41,8 +41,16 @@ typedef struct poll_table_struct {
static inline void poll_wait(struct file * filp, wait_queue_head_t * wait_address, poll_table *p)
{
- if (p && p->_qproc && wait_address)
+ if (p && p->_qproc && wait_address) {
p->_qproc(filp, wait_address, p);
+ /*
+ * This memory barrier is paired in the wq_has_sleeper().
+ * See the comment above prepare_to_wait(), we need to
+ * ensure that subsequent tests in this thread can't be
+ * reordered with __add_wait_queue() in _qproc() paths.
+ */
+ smp_mb();
+ }
}
/*
--
2.25.1.362.g51ebf55
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 2/5] poll_wait: kill the obsolete wait_address check
2025-01-07 16:26 [PATCH 0/5] poll_wait: add mb() to fix theoretical race between waitqueue_active() and .poll() Oleg Nesterov
2025-01-07 16:27 ` [PATCH 1/5] " Oleg Nesterov
@ 2025-01-07 16:27 ` Oleg Nesterov
2025-01-07 16:27 ` [PATCH 3/5] io_uring_poll: kill the no longer necessary barrier after poll_wait() Oleg Nesterov
` (4 subsequent siblings)
6 siblings, 0 replies; 10+ messages in thread
From: Oleg Nesterov @ 2025-01-07 16:27 UTC (permalink / raw)
To: Christian Brauner, Linus Torvalds, Manfred Spraul
Cc: David S. Miller, Eric Dumazet, Jens Axboe, Pavel Begunkov,
WangYuli, linux-kernel, io-uring, netdev
This check is historical and no longer needed, wait_address is never NULL.
These days we rely on the poll_table->_qproc check. NULL if select/poll
is not going to sleep, or it already has a data to report, or all waiters
have already been registered after the 1st iteration.
However, poll_table *p can be NULL, see p9_fd_poll() for example, so we
can't remove the "p != NULL" check.
Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Oleg Nesterov <[email protected]>
---
include/linux/poll.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/poll.h b/include/linux/poll.h
index fc641b50f129..57b6d1ccd8bf 100644
--- a/include/linux/poll.h
+++ b/include/linux/poll.h
@@ -41,7 +41,7 @@ typedef struct poll_table_struct {
static inline void poll_wait(struct file * filp, wait_queue_head_t * wait_address, poll_table *p)
{
- if (p && p->_qproc && wait_address) {
+ if (p && p->_qproc) {
p->_qproc(filp, wait_address, p);
/*
* This memory barrier is paired in the wq_has_sleeper().
--
2.25.1.362.g51ebf55
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 3/5] io_uring_poll: kill the no longer necessary barrier after poll_wait()
2025-01-07 16:26 [PATCH 0/5] poll_wait: add mb() to fix theoretical race between waitqueue_active() and .poll() Oleg Nesterov
2025-01-07 16:27 ` [PATCH 1/5] " Oleg Nesterov
2025-01-07 16:27 ` [PATCH 2/5] poll_wait: kill the obsolete wait_address check Oleg Nesterov
@ 2025-01-07 16:27 ` Oleg Nesterov
2025-01-07 16:27 ` [PATCH 4/5] sock_poll_wait: " Oleg Nesterov
` (3 subsequent siblings)
6 siblings, 0 replies; 10+ messages in thread
From: Oleg Nesterov @ 2025-01-07 16:27 UTC (permalink / raw)
To: Christian Brauner, Linus Torvalds, Manfred Spraul
Cc: David S. Miller, Eric Dumazet, Jens Axboe, Pavel Begunkov,
WangYuli, linux-kernel, io-uring, netdev
Now that poll_wait() provides a full barrier we can remove smp_rmb() from
io_uring_poll().
In fact I don't think smp_rmb() was correct, it can't serialize LOADs and
STOREs.
Signed-off-by: Oleg Nesterov <[email protected]>
---
io_uring/io_uring.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 06ff41484e29..a64a82b93b86 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2809,13 +2809,12 @@ static __poll_t io_uring_poll(struct file *file, poll_table *wait)
if (unlikely(!ctx->poll_activated))
io_activate_pollwq(ctx);
-
- poll_wait(file, &ctx->poll_wq, wait);
/*
- * synchronizes with barrier from wq_has_sleeper call in
- * io_commit_cqring
+ * provides mb() which pairs with barrier from wq_has_sleeper
+ * call in io_commit_cqring
*/
- smp_rmb();
+ poll_wait(file, &ctx->poll_wq, wait);
+
if (!io_sqring_full(ctx))
mask |= EPOLLOUT | EPOLLWRNORM;
--
2.25.1.362.g51ebf55
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 4/5] sock_poll_wait: kill the no longer necessary barrier after poll_wait()
2025-01-07 16:26 [PATCH 0/5] poll_wait: add mb() to fix theoretical race between waitqueue_active() and .poll() Oleg Nesterov
` (2 preceding siblings ...)
2025-01-07 16:27 ` [PATCH 3/5] io_uring_poll: kill the no longer necessary barrier after poll_wait() Oleg Nesterov
@ 2025-01-07 16:27 ` Oleg Nesterov
2025-01-07 16:27 ` [PATCH 5/5] poll: kill poll_does_not_wait() Oleg Nesterov
` (2 subsequent siblings)
6 siblings, 0 replies; 10+ messages in thread
From: Oleg Nesterov @ 2025-01-07 16:27 UTC (permalink / raw)
To: Christian Brauner, Linus Torvalds, Manfred Spraul
Cc: David S. Miller, Eric Dumazet, Jens Axboe, Pavel Begunkov,
WangYuli, linux-kernel, io-uring, netdev
Now that poll_wait() provides a full barrier we can remove smp_mb() from
sock_poll_wait().
Also, the poll_does_not_wait() check before poll_wait() just adds the
unnecessary confusion, kill it. poll_wait() does the same "p && p->_qproc"
check.
Signed-off-by: Oleg Nesterov <[email protected]>
---
include/net/sock.h | 17 +++++++----------
1 file changed, 7 insertions(+), 10 deletions(-)
diff --git a/include/net/sock.h b/include/net/sock.h
index 7464e9f9f47c..305f3ae5edc2 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2291,7 +2291,7 @@ static inline bool skwq_has_sleeper(struct socket_wq *wq)
}
/**
- * sock_poll_wait - place memory barrier behind the poll_wait call.
+ * sock_poll_wait - wrapper for the poll_wait call.
* @filp: file
* @sock: socket to wait on
* @p: poll_table
@@ -2301,15 +2301,12 @@ static inline bool skwq_has_sleeper(struct socket_wq *wq)
static inline void sock_poll_wait(struct file *filp, struct socket *sock,
poll_table *p)
{
- if (!poll_does_not_wait(p)) {
- poll_wait(filp, &sock->wq.wait, p);
- /* We need to be sure we are in sync with the
- * socket flags modification.
- *
- * This memory barrier is paired in the wq_has_sleeper.
- */
- smp_mb();
- }
+ /* Provides a barrier we need to be sure we are in sync
+ * with the socket flags modification.
+ *
+ * This memory barrier is paired in the wq_has_sleeper.
+ */
+ poll_wait(filp, &sock->wq.wait, p);
}
static inline void skb_set_hash_from_sk(struct sk_buff *skb, struct sock *sk)
--
2.25.1.362.g51ebf55
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 5/5] poll: kill poll_does_not_wait()
2025-01-07 16:26 [PATCH 0/5] poll_wait: add mb() to fix theoretical race between waitqueue_active() and .poll() Oleg Nesterov
` (3 preceding siblings ...)
2025-01-07 16:27 ` [PATCH 4/5] sock_poll_wait: " Oleg Nesterov
@ 2025-01-07 16:27 ` Oleg Nesterov
2025-01-07 17:38 ` [PATCH 0/5] poll_wait: add mb() to fix theoretical race between waitqueue_active() and .poll() Linus Torvalds
2025-01-10 11:00 ` Christian Brauner
6 siblings, 0 replies; 10+ messages in thread
From: Oleg Nesterov @ 2025-01-07 16:27 UTC (permalink / raw)
To: Christian Brauner, Linus Torvalds, Manfred Spraul
Cc: David S. Miller, Eric Dumazet, Jens Axboe, Pavel Begunkov,
WangYuli, linux-kernel, io-uring, netdev
It no longer has users.
Signed-off-by: Oleg Nesterov <[email protected]>
---
include/linux/poll.h | 16 +++-------------
1 file changed, 3 insertions(+), 13 deletions(-)
diff --git a/include/linux/poll.h b/include/linux/poll.h
index 57b6d1ccd8bf..12bb18e8b978 100644
--- a/include/linux/poll.h
+++ b/include/linux/poll.h
@@ -25,14 +25,14 @@
struct poll_table_struct;
-/*
+/*
* structures and helpers for f_op->poll implementations
*/
typedef void (*poll_queue_proc)(struct file *, wait_queue_head_t *, struct poll_table_struct *);
/*
- * Do not touch the structure directly, use the access functions
- * poll_does_not_wait() and poll_requested_events() instead.
+ * Do not touch the structure directly, use the access function
+ * poll_requested_events() instead.
*/
typedef struct poll_table_struct {
poll_queue_proc _qproc;
@@ -53,16 +53,6 @@ static inline void poll_wait(struct file * filp, wait_queue_head_t * wait_addres
}
}
-/*
- * Return true if it is guaranteed that poll will not wait. This is the case
- * if the poll() of another file descriptor in the set got an event, so there
- * is no need for waiting.
- */
-static inline bool poll_does_not_wait(const poll_table *p)
-{
- return p == NULL || p->_qproc == NULL;
-}
-
/*
* Return the set of events that the application wants to poll for.
* This is useful for drivers that need to know whether a DMA transfer has
--
2.25.1.362.g51ebf55
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 0/5] poll_wait: add mb() to fix theoretical race between waitqueue_active() and .poll()
2025-01-07 16:26 [PATCH 0/5] poll_wait: add mb() to fix theoretical race between waitqueue_active() and .poll() Oleg Nesterov
` (4 preceding siblings ...)
2025-01-07 16:27 ` [PATCH 5/5] poll: kill poll_does_not_wait() Oleg Nesterov
@ 2025-01-07 17:38 ` Linus Torvalds
2025-01-07 22:55 ` Jens Axboe
2025-01-10 10:56 ` Christian Brauner
2025-01-10 11:00 ` Christian Brauner
6 siblings, 2 replies; 10+ messages in thread
From: Linus Torvalds @ 2025-01-07 17:38 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Christian Brauner, Manfred Spraul, David S. Miller, Eric Dumazet,
Jens Axboe, Pavel Begunkov, WangYuli, linux-kernel, io-uring,
netdev
On Tue, 7 Jan 2025 at 08:27, Oleg Nesterov <[email protected]> wrote:
>
> I misread fs/eventpoll.c, it has the same problem. And more __pollwait()-like
> functions, for example p9_pollwait(). So 1/5 adds mb() into poll_wait(), not
> into __pollwait().
Ack on all five patches, looks sane to me.
Christian, I'm assuming this goes through your tree? If not, holler,
and I can take it directly.
Linus
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/5] poll_wait: add mb() to fix theoretical race between waitqueue_active() and .poll()
2025-01-07 17:38 ` [PATCH 0/5] poll_wait: add mb() to fix theoretical race between waitqueue_active() and .poll() Linus Torvalds
@ 2025-01-07 22:55 ` Jens Axboe
2025-01-10 10:56 ` Christian Brauner
1 sibling, 0 replies; 10+ messages in thread
From: Jens Axboe @ 2025-01-07 22:55 UTC (permalink / raw)
To: Linus Torvalds, Oleg Nesterov
Cc: Christian Brauner, Manfred Spraul, David S. Miller, Eric Dumazet,
Pavel Begunkov, WangYuli, linux-kernel, io-uring, netdev
On 1/7/25 10:38 AM, Linus Torvalds wrote:
> On Tue, 7 Jan 2025 at 08:27, Oleg Nesterov <[email protected]> wrote:
>>
>> I misread fs/eventpoll.c, it has the same problem. And more __pollwait()-like
>> functions, for example p9_pollwait(). So 1/5 adds mb() into poll_wait(), not
>> into __pollwait().
>
> Ack on all five patches, looks sane to me.
>
> Christian, I'm assuming this goes through your tree? If not, holler,
> and I can take it directly.
Same, series looks good.
--
Jens Axboe
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/5] poll_wait: add mb() to fix theoretical race between waitqueue_active() and .poll()
2025-01-07 17:38 ` [PATCH 0/5] poll_wait: add mb() to fix theoretical race between waitqueue_active() and .poll() Linus Torvalds
2025-01-07 22:55 ` Jens Axboe
@ 2025-01-10 10:56 ` Christian Brauner
1 sibling, 0 replies; 10+ messages in thread
From: Christian Brauner @ 2025-01-10 10:56 UTC (permalink / raw)
To: Linus Torvalds
Cc: Oleg Nesterov, Manfred Spraul, David S. Miller, Eric Dumazet,
Jens Axboe, Pavel Begunkov, WangYuli, linux-kernel, io-uring,
netdev
On Tue, Jan 07, 2025 at 09:38:36AM -0800, Linus Torvalds wrote:
> On Tue, 7 Jan 2025 at 08:27, Oleg Nesterov <[email protected]> wrote:
> >
> > I misread fs/eventpoll.c, it has the same problem. And more __pollwait()-like
> > functions, for example p9_pollwait(). So 1/5 adds mb() into poll_wait(), not
> > into __pollwait().
>
> Ack on all five patches, looks sane to me.
>
> Christian, I'm assuming this goes through your tree? If not, holler,
> and I can take it directly.
Yes, on it. Sorry, this took a bit. I'm trying to work downwards through
all the mail.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/5] poll_wait: add mb() to fix theoretical race between waitqueue_active() and .poll()
2025-01-07 16:26 [PATCH 0/5] poll_wait: add mb() to fix theoretical race between waitqueue_active() and .poll() Oleg Nesterov
` (5 preceding siblings ...)
2025-01-07 17:38 ` [PATCH 0/5] poll_wait: add mb() to fix theoretical race between waitqueue_active() and .poll() Linus Torvalds
@ 2025-01-10 11:00 ` Christian Brauner
6 siblings, 0 replies; 10+ messages in thread
From: Christian Brauner @ 2025-01-10 11:00 UTC (permalink / raw)
To: Linus Torvalds, Manfred Spraul, Oleg Nesterov
Cc: Christian Brauner, David S. Miller, Eric Dumazet, Jens Axboe,
Pavel Begunkov, WangYuli, linux-kernel, io-uring, netdev
On Tue, 07 Jan 2025 17:26:49 +0100, Oleg Nesterov wrote:
> Linus,
>
> I misread fs/eventpoll.c, it has the same problem. And more __pollwait()-like
> functions, for example p9_pollwait(). So 1/5 adds mb() into poll_wait(), not
> into __pollwait().
>
> WangYuli, after 1/5 we can reconsider your patch.
>
> [...]
Applied to the vfs-6.14.poll branch of the vfs/vfs.git tree.
Patches in the vfs-6.14.poll branch should appear in linux-next soon.
Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.
It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.
Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.
tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs-6.14.poll
[1/5] poll_wait: add mb() to fix theoretical race between waitqueue_active() and .poll()
https://git.kernel.org/vfs/vfs/c/cacd9ae4bf80
[2/5] poll_wait: kill the obsolete wait_address check
https://git.kernel.org/vfs/vfs/c/10b02a2cfec2
[3/5] io_uring_poll: kill the no longer necessary barrier after poll_wait()
https://git.kernel.org/vfs/vfs/c/4e15fa8305de
[4/5] sock_poll_wait: kill the no longer necessary barrier after poll_wait()
https://git.kernel.org/vfs/vfs/c/b2849867b3a7
[5/5] poll: kill poll_does_not_wait()
https://git.kernel.org/vfs/vfs/c/f005bf18a57a
^ permalink raw reply [flat|nested] 10+ messages in thread