public inbox for [email protected]
* [PATCHSET 0/4] Terminate multishot early on dependencies
From: Jens Axboe @ 2022-11-20 17:28 UTC
  To: io-uring

Hi,

We had a fix for -rc5 that resolved multishot dependencies for polling
on the io_uring fd itself, but we can also have dependencies with
eventfd (if it's registered as the ring eventfd) or with epoll (if an
io_uring fd is added to an epoll context and io_uring arms a multishot
poll for that epoll context).

This series covers the problem generically rather than needing to
special-case file types. An illustrative reproducer of the simplest
case follows below.
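
For illustration, here's a minimal userspace sketch of that simplest
case (a multishot poll armed on the ring's own fd), using liburing.
Illustrative only, not part of this series:

#include <liburing.h>
#include <poll.h>

int main(void)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;

	io_uring_queue_init(8, &ring, 0);

	/*
	 * Multishot poll on the ring fd itself: every CQE posted makes
	 * the ring fd readable, which fires this poll again and posts
	 * another CQE, and so on until the CQ ring overflows.
	 */
	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_poll_multishot(sqe, ring.ring_fd, POLLIN);
	io_uring_submit(&ring);

	io_uring_queue_exit(&ring);
	return 0;
}

The eventfd and epoll cases handled in this series follow the same
pattern, with one more hop in the wakeup chain.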

-- 
Jens Axboe




* [PATCH 1/4] eventpoll: add EPOLL_URING wakeup flag
From: Jens Axboe @ 2022-11-20 17:28 UTC
  To: io-uring; +Cc: Jens Axboe, stable

We can have dependencies between epoll and io_uring. Consider an epoll
context, identified by the epfd file descriptor, and an io_uring file
descriptor identified by iofd. If we add iofd to the epfd context, and
arm a multishot poll request for epfd with iofd, then the multishot
poll request will repeatedly trigger and generate events until terminated
by CQ ring overflow. This is not the desired behavior.

Add an EPOLL_URING flag that io_uring can pass in as part of the poll
wakeup key, and that io_uring can then check for in its own wakeup
handlers to detect a potential recursive invocation.
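
For reference, a minimal liburing-based userspace sketch of that
scenario (illustrative only, not part of this patch):

#include <liburing.h>
#include <sys/epoll.h>

int main(void)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct epoll_event ev = { .events = EPOLLIN };
	int epfd;

	io_uring_queue_init(8, &ring, 0);
	epfd = epoll_create1(0);

	/* Add iofd (the ring fd) to the epfd context */
	ev.data.fd = ring.ring_fd;
	epoll_ctl(epfd, EPOLL_CTL_ADD, ring.ring_fd, &ev);

	/*
	 * Arm a multishot poll for epfd: each CQE posted makes iofd
	 * readable, which makes epfd readable, which triggers this
	 * poll and posts another CQE.
	 */
	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_poll_multishot(sqe, epfd, EPOLLIN);
	io_uring_submit(&ring);

	io_uring_queue_exit(&ring);
	return 0;
}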

Cc: [email protected] # 6.0
Signed-off-by: Jens Axboe <[email protected]>
---
 fs/eventpoll.c                 | 18 ++++++++++--------
 include/uapi/linux/eventpoll.h |  6 ++++++
 2 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 52954d4637b5..e864256001e8 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -491,7 +491,8 @@ static inline void ep_set_busy_poll_napi_id(struct epitem *epi)
  */
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 
-static void ep_poll_safewake(struct eventpoll *ep, struct epitem *epi)
+static void ep_poll_safewake(struct eventpoll *ep, struct epitem *epi,
+			     unsigned pollflags)
 {
 	struct eventpoll *ep_src;
 	unsigned long flags;
@@ -522,16 +523,17 @@ static void ep_poll_safewake(struct eventpoll *ep, struct epitem *epi)
 	}
 	spin_lock_irqsave_nested(&ep->poll_wait.lock, flags, nests);
 	ep->nests = nests + 1;
-	wake_up_locked_poll(&ep->poll_wait, EPOLLIN);
+	wake_up_locked_poll(&ep->poll_wait, EPOLLIN | pollflags);
 	ep->nests = 0;
 	spin_unlock_irqrestore(&ep->poll_wait.lock, flags);
 }
 
 #else
 
-static void ep_poll_safewake(struct eventpoll *ep, struct epitem *epi)
+static void ep_poll_safewake(struct eventpoll *ep, struct epitem *epi,
+			     unsigned pollflags)
 {
-	wake_up_poll(&ep->poll_wait, EPOLLIN);
+	wake_up_poll(&ep->poll_wait, EPOLLIN | pollflags);
 }
 
 #endif
@@ -742,7 +744,7 @@ static void ep_free(struct eventpoll *ep)
 
 	/* We need to release all tasks waiting for these file */
 	if (waitqueue_active(&ep->poll_wait))
-		ep_poll_safewake(ep, NULL);
+		ep_poll_safewake(ep, NULL, 0);
 
 	/*
 	 * We need to lock this because we could be hit by
@@ -1208,7 +1210,7 @@ static int ep_poll_callback(wait_queue_entry_t *wait, unsigned mode, int sync, v
 
 	/* We have to call this outside the lock */
 	if (pwake)
-		ep_poll_safewake(ep, epi);
+		ep_poll_safewake(ep, epi, pollflags & EPOLL_URING);
 
 	if (!(epi->event.events & EPOLLEXCLUSIVE))
 		ewake = 1;
@@ -1553,7 +1555,7 @@ static int ep_insert(struct eventpoll *ep, const struct epoll_event *event,
 
 	/* We have to call this outside the lock */
 	if (pwake)
-		ep_poll_safewake(ep, NULL);
+		ep_poll_safewake(ep, NULL, 0);
 
 	return 0;
 }
@@ -1629,7 +1631,7 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi,
 
 	/* We have to call this outside the lock */
 	if (pwake)
-		ep_poll_safewake(ep, NULL);
+		ep_poll_safewake(ep, NULL, 0);
 
 	return 0;
 }
diff --git a/include/uapi/linux/eventpoll.h b/include/uapi/linux/eventpoll.h
index 8a3432d0f0dc..532cc7fa75c0 100644
--- a/include/uapi/linux/eventpoll.h
+++ b/include/uapi/linux/eventpoll.h
@@ -41,6 +41,12 @@
 #define EPOLLMSG	(__force __poll_t)0x00000400
 #define EPOLLRDHUP	(__force __poll_t)0x00002000
 
+/*
+ * Internal flag - wakeup generated by io_uring, used to detect recursion back
+ * into the io_uring poll handler.
+ */
+#define EPOLL_URING	((__force __poll_t)(1U << 27))
+
 /* Set exclusive wakeup mode for the target file descriptor */
 #define EPOLLEXCLUSIVE	((__force __poll_t)(1U << 28))
 
-- 
2.35.1



* [PATCH 2/4] eventfd: provide an eventfd_signal_mask() helper
From: Jens Axboe @ 2022-11-20 17:28 UTC
  To: io-uring; +Cc: Jens Axboe, stable

This is identical to eventfd_signal(), but it allows the caller to pass
in a mask to be used for the poll wakeup key. The use case is avoiding
repeated multishot triggers if we have a dependency between eventfd and
io_uring.

If we set up an eventfd context and register that as the io_uring eventfd,
and at the same time queue a multishot poll request for the eventfd
context, then any CQE posted will repeatedly trigger the multishot request
until it terminates when the CQ ring overflows.

In preparation for io_uring detecting this circular dependency, add the
mentioned helper so that io_uring can pass in EPOLL_URING as part of the
poll wakeup key.
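
For reference, a minimal liburing-based userspace sketch of that setup
(illustrative only, not part of this patch):

#include <liburing.h>
#include <sys/eventfd.h>
#include <poll.h>

int main(void)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	int efd;

	io_uring_queue_init(8, &ring, 0);
	efd = eventfd(0, 0);

	/* Register efd as the ring eventfd: every posted CQE signals it */
	io_uring_register_eventfd(&ring, efd);

	/*
	 * Multishot poll on efd: the CQE this poll posts signals efd,
	 * which wakes the poll again and posts another CQE.
	 */
	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_poll_multishot(sqe, efd, POLLIN);
	io_uring_submit(&ring);

	io_uring_queue_exit(&ring);
	return 0;
}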

Cc: [email protected] # 6.0
Signed-off-by: Jens Axboe <[email protected]>
---
 fs/eventfd.c            | 37 +++++++++++++++++++++----------------
 include/linux/eventfd.h |  1 +
 2 files changed, 22 insertions(+), 16 deletions(-)

diff --git a/fs/eventfd.c b/fs/eventfd.c
index c0ffee99ad23..249ca6c0b784 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -43,21 +43,7 @@ struct eventfd_ctx {
 	int id;
 };
 
-/**
- * eventfd_signal - Adds @n to the eventfd counter.
- * @ctx: [in] Pointer to the eventfd context.
- * @n: [in] Value of the counter to be added to the eventfd internal counter.
- *          The value cannot be negative.
- *
- * This function is supposed to be called by the kernel in paths that do not
- * allow sleeping. In this function we allow the counter to reach the ULLONG_MAX
- * value, and we signal this as overflow condition by returning a EPOLLERR
- * to poll(2).
- *
- * Returns the amount by which the counter was incremented.  This will be less
- * than @n if the counter has overflowed.
- */
-__u64 eventfd_signal(struct eventfd_ctx *ctx, __u64 n)
+__u64 eventfd_signal_mask(struct eventfd_ctx *ctx, __u64 n, unsigned mask)
 {
 	unsigned long flags;
 
@@ -78,12 +64,31 @@ __u64 eventfd_signal(struct eventfd_ctx *ctx, __u64 n)
 		n = ULLONG_MAX - ctx->count;
 	ctx->count += n;
 	if (waitqueue_active(&ctx->wqh))
-		wake_up_locked_poll(&ctx->wqh, EPOLLIN);
+		wake_up_locked_poll(&ctx->wqh, EPOLLIN | mask);
 	current->in_eventfd = 0;
 	spin_unlock_irqrestore(&ctx->wqh.lock, flags);
 
 	return n;
 }
+
+/**
+ * eventfd_signal - Adds @n to the eventfd counter.
+ * @ctx: [in] Pointer to the eventfd context.
+ * @n: [in] Value of the counter to be added to the eventfd internal counter.
+ *          The value cannot be negative.
+ *
+ * This function is supposed to be called by the kernel in paths that do not
+ * allow sleeping. In this function we allow the counter to reach the ULLONG_MAX
+ * value, and we signal this as overflow condition by returning a EPOLLERR
+ * to poll(2).
+ *
+ * Returns the amount by which the counter was incremented.  This will be less
+ * than @n if the counter has overflowed.
+ */
+__u64 eventfd_signal(struct eventfd_ctx *ctx, __u64 n)
+{
+	return eventfd_signal_mask(ctx, n, 0);
+}
 EXPORT_SYMBOL_GPL(eventfd_signal);
 
 static void eventfd_free_ctx(struct eventfd_ctx *ctx)
diff --git a/include/linux/eventfd.h b/include/linux/eventfd.h
index 30eb30d6909b..e849329ce1a8 100644
--- a/include/linux/eventfd.h
+++ b/include/linux/eventfd.h
@@ -40,6 +40,7 @@ struct file *eventfd_fget(int fd);
 struct eventfd_ctx *eventfd_ctx_fdget(int fd);
 struct eventfd_ctx *eventfd_ctx_fileget(struct file *file);
 __u64 eventfd_signal(struct eventfd_ctx *ctx, __u64 n);
+__u64 eventfd_signal_mask(struct eventfd_ctx *ctx, __u64 n, unsigned mask);
 int eventfd_ctx_remove_wait_queue(struct eventfd_ctx *ctx, wait_queue_entry_t *wait,
 				  __u64 *cnt);
 void eventfd_ctx_do_read(struct eventfd_ctx *ctx, __u64 *cnt);
-- 
2.35.1



* [PATCH 3/4] io_uring: pass in EPOLL_URING as part of eventfd signaling and wakeups
From: Jens Axboe @ 2022-11-20 17:28 UTC
  To: io-uring; +Cc: Jens Axboe, stable

Pass in EPOLL_URING when signaling eventfd or doing poll-related wakeups,
so that we can check for a circular event dependency between eventfd
and epoll. If this flag is set when our wakeup handlers are called, then
we know we have a dependency that needs to terminate multishot requests.

eventfd and epoll are the only such possible dependencies.
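
From the application side, a multishot poll terminated this way
completes like a oneshot request: its final CQE should be posted
without IORING_CQE_F_MORE set, and the request must be re-armed if more
events are wanted. A minimal liburing-based consumer sketch
(illustrative only, not part of this patch):

#include <liburing.h>

/*
 * Reap one poll CQE. Returns 1 if the multishot poll is still armed
 * (IORING_CQE_F_MORE set), 0 if it terminated and must be re-armed,
 * -1 on error.
 */
static int reap_poll_cqe(struct io_uring *ring)
{
	struct io_uring_cqe *cqe;
	int more;

	if (io_uring_wait_cqe(ring, &cqe) < 0)
		return -1;
	more = (cqe->flags & IORING_CQE_F_MORE) != 0;
	io_uring_cqe_seen(ring, cqe);
	return more;
}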

Cc: [email protected] # 6.0
Signed-off-by: Jens Axboe <[email protected]>
---
 io_uring/io_uring.c |  4 ++--
 io_uring/io_uring.h | 15 +++++++++++----
 io_uring/poll.c     |  8 ++++++++
 3 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 8840cf3e20f2..53d0043b77a5 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -495,7 +495,7 @@ static void io_eventfd_ops(struct rcu_head *rcu)
 	int ops = atomic_xchg(&ev_fd->ops, 0);
 
 	if (ops & BIT(IO_EVENTFD_OP_SIGNAL_BIT))
-		eventfd_signal(ev_fd->cq_ev_fd, 1);
+		eventfd_signal_mask(ev_fd->cq_ev_fd, 1, EPOLL_URING);
 
 	/* IO_EVENTFD_OP_FREE_BIT may not be set here depending on callback
 	 * ordering in a race but if references are 0 we know we have to free
@@ -531,7 +531,7 @@ static void io_eventfd_signal(struct io_ring_ctx *ctx)
 		goto out;
 
 	if (likely(eventfd_signal_allowed())) {
-		eventfd_signal(ev_fd->cq_ev_fd, 1);
+		eventfd_signal_mask(ev_fd->cq_ev_fd, 1, EPOLL_URING);
 	} else {
 		atomic_inc(&ev_fd->refs);
 		if (!atomic_fetch_or(BIT(IO_EVENTFD_OP_SIGNAL_BIT), &ev_fd->ops))
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index cef5ff924e63..f6cf74cd692b 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -4,6 +4,7 @@
 #include <linux/errno.h>
 #include <linux/lockdep.h>
 #include <linux/io_uring_types.h>
+#include <uapi/linux/eventpoll.h>
 #include "io-wq.h"
 #include "slist.h"
 #include "filetable.h"
@@ -207,12 +208,18 @@ static inline void io_commit_cqring(struct io_ring_ctx *ctx)
 static inline void __io_cqring_wake(struct io_ring_ctx *ctx)
 {
 	/*
-	 * wake_up_all() may seem excessive, but io_wake_function() and
-	 * io_should_wake() handle the termination of the loop and only
-	 * wake as many waiters as we need to.
+	 * Trigger waitqueue handler on all waiters on our waitqueue. This
+	 * won't necessarily wake up all the tasks, io_should_wake() will make
+	 * that decision.
+	 *
+	 * Pass in EPOLLIN|EPOLL_URING as the poll wakeup key. The latter set
+	 * in the mask so that if we recurse back into our own poll waitqueue
+	 * handlers, we know we have a dependency between eventfd or epoll and
+	 * should terminate multishot poll at that point.
 	 */
 	if (waitqueue_active(&ctx->cq_wait))
-		wake_up_all(&ctx->cq_wait);
+		__wake_up(&ctx->cq_wait, TASK_NORMAL, 0,
+				poll_to_key(EPOLL_URING | EPOLLIN));
 }
 
 static inline void io_cqring_wake(struct io_ring_ctx *ctx)
diff --git a/io_uring/poll.c b/io_uring/poll.c
index 055632e9092a..b5d9426c60f6 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -394,6 +394,14 @@ static int io_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
 		return 0;
 
 	if (io_poll_get_ownership(req)) {
+		/*
+		 * If we trigger a multishot poll off our own wakeup path,
+		 * disable multishot as there is a circular dependency between
+		 * CQ posting and triggering the event.
+		 */
+		if (mask & EPOLL_URING)
+			poll->events |= EPOLLONESHOT;
+
 		/* optional, saves extra locking for removal in tw handler */
 		if (mask && poll->events & EPOLLONESHOT) {
 			list_del_init(&poll->wait.entry);
-- 
2.35.1



* [PATCH 4/4] Revert "io_uring: disallow self-propelled ring polling"
From: Jens Axboe @ 2022-11-20 17:28 UTC
  To: io-uring; +Cc: Jens Axboe

This reverts commit 7fdbc5f014c3f71bc44673a2d6c5bb2d12d45f25.

This patch dealt with a subset of the real problem, which is a potential
circular dependency on the wakeup path for io_uring itself. Outside of
io_uring, eventfd can also trigger this (see details in 8c881e87feae)
and so can epoll (see details in 426930308abf). Now that we have a
generic solution to this problem, get rid of the io_uring specific
work-around.

Signed-off-by: Jens Axboe <[email protected]>
---
 io_uring/poll.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/io_uring/poll.c b/io_uring/poll.c
index b5d9426c60f6..5733ed334738 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -246,8 +246,6 @@ static int io_poll_check_events(struct io_kiocb *req, bool *locked)
 			continue;
 		if (req->apoll_events & EPOLLONESHOT)
 			return IOU_POLL_DONE;
-		if (io_is_uring_fops(req->file))
-			return IOU_POLL_DONE;
 
 		/* multishot, just fill a CQE and proceed */
 		if (!(req->flags & REQ_F_APOLL_MULTISHOT)) {
-- 
2.35.1



