* [PATCH] io_uring/poll: use IOU_F_TWQ_LAZY_WAKE for wakeups
@ 2023-10-19 13:01 Jens Axboe
  2023-11-07 17:47 ` Pavel Begunkov
  0 siblings, 1 reply; 5+ messages in thread
From: Jens Axboe @ 2023-10-19 13:01 UTC (permalink / raw)
  To: io-uring

With poll triggered retries, each event trigger will cause a task_work
item to be added for processing. If the ring is set up with
IORING_SETUP_DEFER_TASKRUN and a task is waiting on multiple events to
complete, any task_work addition will wake the task for processing these
items. This can cause more context switches than we would like, if the
application is deliberately waiting on multiple items to increase
efficiency.

For example, if an application has receive multishot armed for sockets
and wants to wait for N completions within M usec, we should not be
waking up and processing these items until we have all the events we
asked for. By switching the poll trigger to lazy wake, we'll process
them when they are all ready, in one swoop, rather than wake multiple
times only to process one and then go back to sleep.

At some point we probably want to look at just making the lazy wake
the default, but for now, let's just selectively enable it where it
makes sense.

Signed-off-by: Jens Axboe <[email protected]>

---

diff --git a/io_uring/poll.c b/io_uring/poll.c
index 4c360ba8793a..d38d05edb4fa 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -370,7 +370,7 @@ static void __io_poll_execute(struct io_kiocb *req, int mask)
 	req->io_task_work.func = io_poll_task_func;
 
 	trace_io_uring_task_add(req, mask);
-	io_req_task_work_add(req);
+	__io_req_task_work_add(req, IOU_F_TWQ_LAZY_WAKE);
 }
 
 static inline void io_poll_execute(struct io_kiocb *req, int res)

-- 
Jens Axboe

^ permalink raw reply related	[flat|nested] 5+ messages in thread
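To make the motivation concrete, here is a minimal liburing-based sketch
of the batched-wait pattern the commit message describes. It is
illustrative only, not part of the original posting; it assumes
liburing >= 2.4, a ring created with IORING_SETUP_SINGLE_ISSUER |
IORING_SETUP_DEFER_TASKRUN, and a provided-buffer ring already
registered under group BGID. The wait_for_batch() helper is
hypothetical.

#include <liburing.h>

#define BGID	0

/* arm one multishot receive on sockfd, then sleep until at least nr
 * completions are pending or the timeout fires */
static int wait_for_batch(struct io_uring *ring, int sockfd, unsigned nr,
			  struct __kernel_timespec *ts)
{
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;

	sqe = io_uring_get_sqe(ring);
	/* completions pick their buffers from the registered group BGID */
	io_uring_prep_recv_multishot(sqe, sockfd, NULL, 0, 0);
	sqe->flags |= IOSQE_BUFFER_SELECT;
	sqe->buf_group = BGID;
	io_uring_submit(ring);

	/* with lazy wake, the task should only be woken once nr task_work
	 * items have been queued, rather than once per arriving packet */
	return io_uring_wait_cqes(ring, &cqe, nr, ts, NULL);
}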
* Re: [PATCH] io_uring/poll: use IOU_F_TWQ_LAZY_WAKE for wakeups
  2023-10-19 13:01 [PATCH] io_uring/poll: use IOU_F_TWQ_LAZY_WAKE for wakeups Jens Axboe
@ 2023-11-07 17:47 ` Pavel Begunkov
  2023-11-07 19:57   ` Jens Axboe
  0 siblings, 1 reply; 5+ messages in thread
From: Pavel Begunkov @ 2023-11-07 17:47 UTC (permalink / raw)
  To: Jens Axboe, io-uring

On 10/19/23 14:01, Jens Axboe wrote:
> With poll triggered retries, each event trigger will cause a task_work
> item to be added for processing. If the ring is set up with
> IORING_SETUP_DEFER_TASKRUN and a task is waiting on multiple events to
> complete, any task_work addition will wake the task for processing these
> items. This can cause more context switches than we would like, if the
> application is deliberately waiting on multiple items to increase
> efficiency.

I'm a bit late here. The reason why I didn't enable it for polling is
because it changes the behaviour. Let's think of a situation where we
want to accept 2 sockets, so we send a multishot accept and do
cq_wait(nr=2). It was perfectly fine before, but now it'll hang, as
there's only 1 request and so only 1 tw queued. The same would happen
with multishot recv, even though that's more relevant to packet-based
protocols like UDP.

It might not be specific to multishots:
listen(backlog=1), queue N oneshot accepts and cq_wait(N).
Now we get the first connection in the queue to accept.

[IORING_OP_ACCEPT] = {
	.poll_exclusive = 1,
}

Due to poll_exclusive (I assume) it wakes only one accept. That
will try to queue up a tw for it, but it'll not be executed
because it's just one item. No other connection can be queued
up because of the backlog limit => presumably no other request
will be woken up => that first tw never runs. It's more subtle
and timing specific than the previous example, but nevertheless
it's concerning we might step on something like that.

> For example, if an application has receive multishot armed for sockets
> and wants to wait for N completions within M usec, we should not be
> waking up and processing these items until we have all the events we
> asked for. By switching the poll trigger to lazy wake, we'll process
> them when they are all ready, in one swoop, rather than wake multiple
> times only to process one and then go back to sleep.
> 
> At some point we probably want to look at just making the lazy wake
> the default, but for now, let's just selectively enable it where it
> makes sense.
> 
> Signed-off-by: Jens Axboe <[email protected]>
> 
> ---
> 
> diff --git a/io_uring/poll.c b/io_uring/poll.c
> index 4c360ba8793a..d38d05edb4fa 100644
> --- a/io_uring/poll.c
> +++ b/io_uring/poll.c
> @@ -370,7 +370,7 @@ static void __io_poll_execute(struct io_kiocb *req, int mask)
>  	req->io_task_work.func = io_poll_task_func;
>  
>  	trace_io_uring_task_add(req, mask);
> -	io_req_task_work_add(req);
> +	__io_req_task_work_add(req, IOU_F_TWQ_LAZY_WAKE);
>  }
>  
>  static inline void io_poll_execute(struct io_kiocb *req, int res)

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 5+ messages in thread
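For reference, the first scenario above in code form: a hypothetical
liburing sketch, not from the original mail. accept_two() is an
illustrative helper name.

#include <liburing.h>

/* one multishot accept, but a wait count of 2: with an unconditional
 * lazy wake there is only one request and hence only one task_work
 * item queued, so the wake threshold of 2 is never hit and the wait
 * hangs even though connections are ready to be accepted */
static int accept_two(struct io_uring *ring, int listenfd)
{
	struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

	io_uring_prep_multishot_accept(sqe, listenfd, NULL, NULL, 0);
	return io_uring_submit_and_wait(ring, 2);
}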
* Re: [PATCH] io_uring/poll: use IOU_F_TWQ_LAZY_WAKE for wakeups
  2023-11-07 17:47 ` Pavel Begunkov
@ 2023-11-07 19:57   ` Jens Axboe
  2023-11-08 16:14     ` Jens Axboe
  0 siblings, 1 reply; 5+ messages in thread
From: Jens Axboe @ 2023-11-07 19:57 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring

On 11/7/23 10:47 AM, Pavel Begunkov wrote:
> On 10/19/23 14:01, Jens Axboe wrote:
>> With poll triggered retries, each event trigger will cause a task_work
>> item to be added for processing. If the ring is set up with
>> IORING_SETUP_DEFER_TASKRUN and a task is waiting on multiple events to
>> complete, any task_work addition will wake the task for processing these
>> items. This can cause more context switches than we would like, if the
>> application is deliberately waiting on multiple items to increase
>> efficiency.
> 
> I'm a bit late here. The reason why I didn't enable it for polling is
> because it changes the behaviour. Let's think of a situation where we
> want to accept 2 sockets, so we send a multishot accept and do
> cq_wait(nr=2). It was perfectly fine before, but now it'll hang, as
> there's only 1 request and so only 1 tw queued. The same would happen
> with multishot recv, even though that's more relevant to packet-based
> protocols like UDP.
> 
> It might not be specific to multishots:
> listen(backlog=1), queue N oneshot accepts and cq_wait(N).

I don't think it's unreasonable to assume that you need a backlog of N
if you batch wait for N to come in, with defer taskrun. I'm more
curious about non-accept cases that would potentially break.

> Now we get the first connection in the queue to accept.
> 
> [IORING_OP_ACCEPT] = {
> 	.poll_exclusive = 1,
> }
> 
> Due to poll_exclusive (I assume) it wakes only one accept.

Right, if poll_exclusive is set, then we use exclusive waits. IFF the
caller also groks that and asks to wake just 1, then we will indeed
just wake one. I can certainly see this one being a problem, but at
least that could be handled by not doing this for exclusive poll waits.

> That will try to queue up a tw for it, but it'll not be executed
> because it's just one item. No other connection can be queued
> up because of the backlog limit => presumably no other request
> will be woken up => that first tw never runs. It's more subtle
> and timing specific than the previous example, but nevertheless
> it's concerning we might step on something like that.

Backlog aside, as per above, what other cases would we need to worry
about here? It's really anything where something in poll would need
processing to trigger more events. IOW, if we can allow some known
cases at least that'd certainly help, or conversely exclude ones (like
poll_exclusive).

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 5+ messages in thread
* Re: [PATCH] io_uring/poll: use IOU_F_TWQ_LAZY_WAKE for wakeups
  2023-11-07 19:57   ` Jens Axboe
@ 2023-11-08 16:14     ` Jens Axboe
  2023-11-09 13:07       ` Pavel Begunkov
  0 siblings, 1 reply; 5+ messages in thread
From: Jens Axboe @ 2023-11-08 16:14 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring

In the spirit of getting some closure/progress on this, how about this
for starters? It disables lazy wake for poll_exclusive, and provides a
flag that can otherwise be set to disable it as well.

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index d3009d56af0b..03401c6ce5bb 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -431,6 +431,7 @@ enum {
 	/* keep async read/write and isreg together and in order */
 	REQ_F_SUPPORT_NOWAIT_BIT,
 	REQ_F_ISREG_BIT,
+	REQ_F_POLL_NO_LAZY_BIT,
 
 	/* not a real bit, just to check we're not overflowing the space */
 	__REQ_F_LAST_BIT,
@@ -498,6 +499,8 @@ enum {
 	REQ_F_CLEAR_POLLIN	= BIT(REQ_F_CLEAR_POLLIN_BIT),
 	/* hashed into ->cancel_hash_locked, protected by ->uring_lock */
 	REQ_F_HASH_LOCKED	= BIT(REQ_F_HASH_LOCKED_BIT),
+	/* don't use lazy poll wake for this request */
+	REQ_F_POLL_NO_LAZY	= BIT(REQ_F_POLL_NO_LAZY_BIT),
 };
 
 typedef void (*io_req_tw_func_t)(struct io_kiocb *req, struct io_tw_state *ts);
diff --git a/io_uring/poll.c b/io_uring/poll.c
index d38d05edb4fa..4fed5514c379 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -366,11 +366,16 @@ void io_poll_task_func(struct io_kiocb *req, struct io_tw_state *ts)
 
 static void __io_poll_execute(struct io_kiocb *req, int mask)
 {
+	unsigned flags = 0;
+
 	io_req_set_res(req, mask, 0);
 	req->io_task_work.func = io_poll_task_func;
 
 	trace_io_uring_task_add(req, mask);
-	__io_req_task_work_add(req, IOU_F_TWQ_LAZY_WAKE);
+
+	if (!(req->flags & REQ_F_POLL_NO_LAZY))
+		flags = IOU_F_TWQ_LAZY_WAKE;
+	__io_req_task_work_add(req, flags);
 }
 
 static inline void io_poll_execute(struct io_kiocb *req, int res)
@@ -526,10 +531,17 @@ static void __io_queue_proc(struct io_poll *poll, struct io_poll_table *pt,
 	poll->head = head;
 	poll->wait.private = (void *) wqe_private;
 
-	if (poll->events & EPOLLEXCLUSIVE)
+	if (poll->events & EPOLLEXCLUSIVE) {
+		/*
+		 * Exclusive waits may only wake a limited amount of entries
+		 * rather than all of them, this may interfere with lazy
+		 * wake if someone does wait(events > 1).
+		 */
+		req->flags |= REQ_F_POLL_NO_LAZY;
 		add_wait_queue_exclusive(head, &poll->wait);
-	else
+	} else {
 		add_wait_queue(head, &poll->wait);
+	}
 }
 
 static void io_poll_queue_proc(struct file *file, struct wait_queue_head *head,

-- 
Jens Axboe

^ permalink raw reply related	[flat|nested] 5+ messages in thread
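As a usage illustration of the opt-out flag (hypothetical, not part of
the patch): an opcode whose completions gate further progress could
force eager wakeups before poll is armed, for example in its prep
handler. io_foo_prep is a made-up name.

/* hypothetical opcode prep: request eager task_work wakeups, so a
 * single poll trigger is processed even when the waiter asked for
 * more than one completion */
static int io_foo_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
{
	req->flags |= REQ_F_POLL_NO_LAZY;
	return 0;
}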
* Re: [PATCH] io_uring/poll: use IOU_F_TWQ_LAZY_WAKE for wakeups
  2023-11-08 16:14     ` Jens Axboe
@ 2023-11-09 13:07       ` Pavel Begunkov
  0 siblings, 0 replies; 5+ messages in thread
From: Pavel Begunkov @ 2023-11-09 13:07 UTC (permalink / raw)
  To: Jens Axboe, io-uring

On 11/8/23 16:14, Jens Axboe wrote:
> In the spirit of getting some closure/progress on this, how about this
> for starters? It disables lazy wake for poll_exclusive, and provides a
> flag that can otherwise be set to disable it as well.

I'm not that concerned about the non-multishot accept case
specifically; if anything we can let it slide by saying that backlog=1
is unreasonable there. A bigger problem is that, for the purpose of
counting the nr_wait passed into the waiting syscall, users must never
assume that a multishot request can produce more than 1 completion.
For example, this is not allowed:

queue_multishot_{recv,accept}();
cq_wait(2);

So we can just leave the enablement patch alone and say that this is
the only reasonable behaviour and it was the intended way from the
beginning (hoping nobody will complain about it). Or do it via a flag,
perhaps SETUP_*.

For the multishot part I described the rules above. As for the problem
in general, it comes from interdependencies between requests, so the
rule is the vague "there should be no dependencies between requests",
but I'm not sure we can spell out the precise semantics at the moment.

> diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
> index d3009d56af0b..03401c6ce5bb 100644
> --- a/include/linux/io_uring_types.h
> +++ b/include/linux/io_uring_types.h
> @@ -431,6 +431,7 @@ enum {
>  	/* keep async read/write and isreg together and in order */
>  	REQ_F_SUPPORT_NOWAIT_BIT,
>  	REQ_F_ISREG_BIT,
> +	REQ_F_POLL_NO_LAZY_BIT,
>  
>  	/* not a real bit, just to check we're not overflowing the space */
>  	__REQ_F_LAST_BIT,
> @@ -498,6 +499,8 @@ enum {
>  	REQ_F_CLEAR_POLLIN	= BIT(REQ_F_CLEAR_POLLIN_BIT),
>  	/* hashed into ->cancel_hash_locked, protected by ->uring_lock */
>  	REQ_F_HASH_LOCKED	= BIT(REQ_F_HASH_LOCKED_BIT),
> +	/* don't use lazy poll wake for this request */
> +	REQ_F_POLL_NO_LAZY	= BIT(REQ_F_POLL_NO_LAZY_BIT),
> };
> 
> typedef void (*io_req_tw_func_t)(struct io_kiocb *req, struct io_tw_state *ts);
> diff --git a/io_uring/poll.c b/io_uring/poll.c
> index d38d05edb4fa..4fed5514c379 100644
> --- a/io_uring/poll.c
> +++ b/io_uring/poll.c
> @@ -366,11 +366,16 @@ void io_poll_task_func(struct io_kiocb *req, struct io_tw_state *ts)
>  
>  static void __io_poll_execute(struct io_kiocb *req, int mask)
>  {
> +	unsigned flags = 0;
> +

Why flag when you can just check the exclusive flag in the poll entry?

>  	io_req_set_res(req, mask, 0);
>  	req->io_task_work.func = io_poll_task_func;
>  
>  	trace_io_uring_task_add(req, mask);
> -	__io_req_task_work_add(req, IOU_F_TWQ_LAZY_WAKE);
> +
> +	if (!(req->flags & REQ_F_POLL_NO_LAZY))
> +		flags = IOU_F_TWQ_LAZY_WAKE;
> +	__io_req_task_work_add(req, flags);
> }

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 5+ messages in thread
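A sketch of the waiting pattern that stays safe under the rule Pavel
describes (hypothetical liburing usage, not from the original mail):
count the wait threshold by requests rather than by expected
completions, then drain whatever is already posted.

#include <liburing.h>

/* wait for at least one completion, then reap everything already
 * posted; never assume a single multishot request will contribute
 * more than one completion to the wait count */
static unsigned reap_ready(struct io_uring *ring)
{
	struct io_uring_cqe *cqe;
	unsigned head, seen = 0;

	if (io_uring_wait_cqe(ring, &cqe) < 0)
		return 0;

	io_uring_for_each_cqe(ring, head, cqe) {
		/* handle cqe->res / cqe->user_data here */
		seen++;
	}
	io_uring_cq_advance(ring, seen);
	return seen;
}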