* [PATCH next v1 1/2] io_uring: add io_local_work_pending()
2024-11-20 22:14 [PATCH next v1 0/2] limit local tw done David Wei
@ 2024-11-20 22:14 ` David Wei
2024-11-20 23:45 ` Pavel Begunkov
2024-11-20 22:14 ` [PATCH next v1 2/2] io_uring: limit local tw done David Wei
2024-11-21 1:12 ` [PATCH next v1 0/2] " Jens Axboe
2 siblings, 1 reply; 8+ messages in thread
From: David Wei @ 2024-11-20 22:14 UTC
To: io-uring; +Cc: David Wei, Jens Axboe, Pavel Begunkov
In preparation for adding a new llist of tw to retry due to hitting the
tw limit, add a helper io_local_work_pending(). This function returns
true if there is any local tw pending. For now it only checks
ctx->work_llist.
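The next patch extends the same helper to also cover the new retry list,
so all the call sites converted here pick it up for free; it then reads:

        static inline bool io_local_work_pending(struct io_ring_ctx *ctx)
        {
                return !llist_empty(&ctx->work_llist) ||
                       !llist_empty(&ctx->retry_llist);
        }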
Signed-off-by: David Wei <[email protected]>
---
io_uring/io_uring.c | 14 +++++++-------
io_uring/io_uring.h | 9 +++++++--
2 files changed, 14 insertions(+), 9 deletions(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 801293399883..83bf041d2648 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1260,7 +1260,7 @@ static void __cold io_move_task_work_from_local(struct io_ring_ctx *ctx)
static bool io_run_local_work_continue(struct io_ring_ctx *ctx, int events,
int min_events)
{
- if (llist_empty(&ctx->work_llist))
+ if (!io_local_work_pending(ctx))
return false;
if (events < min_events)
return true;
@@ -1313,7 +1313,7 @@ static inline int io_run_local_work_locked(struct io_ring_ctx *ctx,
{
struct io_tw_state ts = {};
- if (llist_empty(&ctx->work_llist))
+ if (!io_local_work_pending(ctx))
return 0;
return __io_run_local_work(ctx, &ts, min_events);
}
@@ -2328,7 +2328,7 @@ static int io_wake_function(struct wait_queue_entry *curr, unsigned int mode,
int io_run_task_work_sig(struct io_ring_ctx *ctx)
{
- if (!llist_empty(&ctx->work_llist)) {
+ if (io_local_work_pending(ctx)) {
__set_current_state(TASK_RUNNING);
if (io_run_local_work(ctx, INT_MAX) > 0)
return 0;
@@ -2459,7 +2459,7 @@ static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
{
if (unlikely(READ_ONCE(ctx->check_cq)))
return 1;
- if (unlikely(!llist_empty(&ctx->work_llist)))
+ if (unlikely(io_local_work_pending(ctx)))
return 1;
if (unlikely(task_work_pending(current)))
return 1;
@@ -2493,7 +2493,7 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events, u32 flags,
if (!io_allowed_run_tw(ctx))
return -EEXIST;
- if (!llist_empty(&ctx->work_llist))
+ if (io_local_work_pending(ctx))
io_run_local_work(ctx, min_events);
io_run_task_work();
@@ -2564,7 +2564,7 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events, u32 flags,
* If we got woken because of task_work being processed, run it
* now rather than let the caller do another wait loop.
*/
- if (!llist_empty(&ctx->work_llist))
+ if (io_local_work_pending(ctx))
io_run_local_work(ctx, nr_wait);
io_run_task_work();
@@ -3158,7 +3158,7 @@ __cold void io_uring_cancel_generic(bool cancel_all, struct io_sq_data *sqd)
io_run_task_work();
io_uring_drop_tctx_refs(current);
xa_for_each(&tctx->xa, index, node) {
- if (!llist_empty(&node->ctx->work_llist)) {
+ if (io_local_work_pending(node->ctx)) {
WARN_ON_ONCE(node->ctx->submitter_task &&
node->ctx->submitter_task != current);
goto end_wait;
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 4070d4c8ef97..69eb3b23a5a0 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -347,9 +347,14 @@ static inline int io_run_task_work(void)
return ret;
}
+static inline bool io_local_work_pending(struct io_ring_ctx *ctx)
+{
+ return !llist_empty(&ctx->work_llist);
+}
+
static inline bool io_task_work_pending(struct io_ring_ctx *ctx)
{
- return task_work_pending(current) || !llist_empty(&ctx->work_llist);
+ return task_work_pending(current) || io_local_work_pending(ctx);
}
static inline void io_tw_lock(struct io_ring_ctx *ctx, struct io_tw_state *ts)
@@ -484,6 +489,6 @@ enum {
static inline bool io_has_work(struct io_ring_ctx *ctx)
{
return test_bit(IO_CHECK_CQ_OVERFLOW_BIT, &ctx->check_cq) ||
- !llist_empty(&ctx->work_llist);
+ io_local_work_pending(ctx);
}
#endif
--
2.43.5
* [PATCH next v1 2/2] io_uring: limit local tw done
2024-11-20 22:14 [PATCH next v1 0/2] limit local tw done David Wei
2024-11-20 22:14 ` [PATCH next v1 1/2] io_uring: add io_local_work_pending() David Wei
@ 2024-11-20 22:14 ` David Wei
2024-11-20 23:56 ` Pavel Begunkov
2024-11-21 1:12 ` [PATCH next v1 0/2] " Jens Axboe
2 siblings, 1 reply; 8+ messages in thread
From: David Wei @ 2024-11-20 22:14 UTC
To: io-uring; +Cc: David Wei, Jens Axboe, Pavel Begunkov
Instead of eagerly running all available local tw, limit the amount of
local tw done to the max of IO_LOCAL_TW_DEFAULT_MAX (20) or wait_nr. The
value of 20 is chosen as a reasonable heuristic to allow enough work
batching but also keep latency down.
Add a retry_llist that maintains a list of local tw that couldn't be
done in time. No synchronisation is needed since it is only modified
within the task context.
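From the application side nothing changes for the common wait_nr = 1
case, and deeper batching is still available by waiting for more
completions, since the cap is max(IO_LOCAL_TW_DEFAULT_MAX, wait_nr). A
minimal liburing sketch of the latter (illustrative only; assumes a ring
set up with IORING_SETUP_DEFER_TASKRUN):

        #include <liburing.h>

        /* Sketch: waiting for 32 completions raises the per-run local tw
         * cap from the default 20 to 32 for this wait. */
        static int wait_for_batch(struct io_uring *ring)
        {
                struct io_uring_cqe *cqe;
                unsigned head, seen = 0;
                int ret;

                ret = io_uring_submit_and_wait(ring, 32);
                if (ret < 0)
                        return ret;

                io_uring_for_each_cqe(ring, head, cqe) {
                        /* process cqe->res / cqe->user_data here */
                        seen++;
                }
                io_uring_cq_advance(ring, seen);
                return 0;
        }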
Signed-off-by: David Wei <[email protected]>
---
include/linux/io_uring_types.h | 1 +
io_uring/io_uring.c | 43 +++++++++++++++++++++++++---------
io_uring/io_uring.h | 2 +-
3 files changed, 34 insertions(+), 12 deletions(-)
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 593c10a02144..011860ade268 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -336,6 +336,7 @@ struct io_ring_ctx {
*/
struct {
struct llist_head work_llist;
+ struct llist_head retry_llist;
unsigned long check_cq;
atomic_t cq_wait_nr;
atomic_t cq_timeouts;
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 83bf041d2648..c3a7d0197636 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -121,6 +121,7 @@
#define IO_COMPL_BATCH 32
#define IO_REQ_ALLOC_BATCH 8
+#define IO_LOCAL_TW_DEFAULT_MAX 20
struct io_defer_entry {
struct list_head list;
@@ -1255,6 +1256,8 @@ static void __cold io_move_task_work_from_local(struct io_ring_ctx *ctx)
struct llist_node *node = llist_del_all(&ctx->work_llist);
__io_fallback_tw(node, false);
+ node = llist_del_all(&ctx->retry_llist);
+ __io_fallback_tw(node, false);
}
static bool io_run_local_work_continue(struct io_ring_ctx *ctx, int events,
@@ -1269,37 +1272,55 @@ static bool io_run_local_work_continue(struct io_ring_ctx *ctx, int events,
return false;
}
+static int __io_run_local_work_loop(struct llist_node **node,
+ struct io_tw_state *ts,
+ int events)
+{
+ while (*node) {
+ struct llist_node *next = (*node)->next;
+ struct io_kiocb *req = container_of(*node, struct io_kiocb,
+ io_task_work.node);
+ INDIRECT_CALL_2(req->io_task_work.func,
+ io_poll_task_func, io_req_rw_complete,
+ req, ts);
+ *node = next;
+ if (--events <= 0)
+ break;
+ }
+
+ return events;
+}
+
static int __io_run_local_work(struct io_ring_ctx *ctx, struct io_tw_state *ts,
int min_events)
{
struct llist_node *node;
unsigned int loops = 0;
- int ret = 0;
+ int ret, limit;
if (WARN_ON_ONCE(ctx->submitter_task != current))
return -EEXIST;
if (ctx->flags & IORING_SETUP_TASKRUN_FLAG)
atomic_andnot(IORING_SQ_TASKRUN, &ctx->rings->sq_flags);
+ limit = max(IO_LOCAL_TW_DEFAULT_MAX, min_events);
again:
+ ret = __io_run_local_work_loop(&ctx->retry_llist.first, ts, limit);
+ if (ctx->retry_llist.first)
+ goto retry_done;
+
/*
* llists are in reverse order, flip it back the right way before
* running the pending items.
*/
node = llist_reverse_order(llist_del_all(&ctx->work_llist));
- while (node) {
- struct llist_node *next = node->next;
- struct io_kiocb *req = container_of(node, struct io_kiocb,
- io_task_work.node);
- INDIRECT_CALL_2(req->io_task_work.func,
- io_poll_task_func, io_req_rw_complete,
- req, ts);
- ret++;
- node = next;
- }
+ ret = __io_run_local_work_loop(&node, ts, ret);
+ ctx->retry_llist.first = node;
loops++;
+ ret = limit - ret;
if (io_run_local_work_continue(ctx, ret, min_events))
goto again;
+retry_done:
io_submit_flush_completions(ctx);
if (io_run_local_work_continue(ctx, ret, min_events))
goto again;
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 69eb3b23a5a0..12abee607e4a 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -349,7 +349,7 @@ static inline int io_run_task_work(void)
static inline bool io_local_work_pending(struct io_ring_ctx *ctx)
{
- return !llist_empty(&ctx->work_llist);
+ return !llist_empty(&ctx->work_llist) || !llist_empty(&ctx->retry_llist);
}
static inline bool io_task_work_pending(struct io_ring_ctx *ctx)
--
2.43.5
* Re: [PATCH next v1 2/2] io_uring: limit local tw done
2024-11-20 22:14 ` [PATCH next v1 2/2] io_uring: limit local tw done David Wei
@ 2024-11-20 23:56 ` Pavel Begunkov
2024-11-21 0:52 ` David Wei
2024-11-21 1:12 ` Jens Axboe
0 siblings, 2 replies; 8+ messages in thread
From: Pavel Begunkov @ 2024-11-20 23:56 UTC
To: David Wei, io-uring; +Cc: Jens Axboe
On 11/20/24 22:14, David Wei wrote:
> Instead of eagerly running all available local tw, limit the amount of
> local tw done to the max of IO_LOCAL_TW_DEFAULT_MAX (20) or wait_nr. The
> value of 20 is chosen as a reasonable heuristic to allow enough work
> batching but also keep latency down.
>
> Add a retry_llist that maintains a list of local tw that couldn't be
> done in time. No synchronisation is needed since it is only modified
> within the task context.
>
> Signed-off-by: David Wei <[email protected]>
> ---
> include/linux/io_uring_types.h | 1 +
> io_uring/io_uring.c | 43 +++++++++++++++++++++++++---------
> io_uring/io_uring.h | 2 +-
> 3 files changed, 34 insertions(+), 12 deletions(-)
>
> diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
> index 593c10a02144..011860ade268 100644
> --- a/include/linux/io_uring_types.h
> +++ b/include/linux/io_uring_types.h
> @@ -336,6 +336,7 @@ struct io_ring_ctx {
> */
> struct {
> struct llist_head work_llist;
> + struct llist_head retry_llist;
Fwiw, probably doesn't matter, but it doesn't even need
to be atomic, it's queued and spliced while holding
->uring_lock, the pending check is also synchronised as
there is only one possible task doing that.
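E.g. a bare pointer field would be enough (sketch only):

        /* only the submitter task touches this, queued/spliced with
         * ->uring_lock held, so no lockless list is needed */
        struct llist_node       *retry_list;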
> unsigned long check_cq;
> atomic_t cq_wait_nr;
> atomic_t cq_timeouts;
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index 83bf041d2648..c3a7d0197636 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -121,6 +121,7 @@
...
> static int __io_run_local_work(struct io_ring_ctx *ctx, struct io_tw_state *ts,
> int min_events)
> {
> struct llist_node *node;
> unsigned int loops = 0;
> - int ret = 0;
> + int ret, limit;
>
> if (WARN_ON_ONCE(ctx->submitter_task != current))
> return -EEXIST;
> if (ctx->flags & IORING_SETUP_TASKRUN_FLAG)
> atomic_andnot(IORING_SQ_TASKRUN, &ctx->rings->sq_flags);
> + limit = max(IO_LOCAL_TW_DEFAULT_MAX, min_events);
> again:
> + ret = __io_run_local_work_loop(&ctx->retry_llist.first, ts, limit);
> + if (ctx->retry_llist.first)
> + goto retry_done;
> +
> /*
> * llists are in reverse order, flip it back the right way before
> * running the pending items.
> */
> node = llist_reverse_order(llist_del_all(&ctx->work_llist));
> - while (node) {
> - struct llist_node *next = node->next;
> - struct io_kiocb *req = container_of(node, struct io_kiocb,
> - io_task_work.node);
> - INDIRECT_CALL_2(req->io_task_work.func,
> - io_poll_task_func, io_req_rw_complete,
> - req, ts);
> - ret++;
> - node = next;
> - }
> + ret = __io_run_local_work_loop(&node, ts, ret);
One thing that is not so nice is that now we have this handling and
checks in the hot path, and __io_run_local_work_loop() most likely
gets uninlined.
I wonder, can we just requeue it via task_work again? We can even
add a variant efficiently adding a list instead of a single entry,
i.e. local_task_work_add(head, tail, ...);
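Roughly along these lines (hypothetical helper, wakeup details elided;
llist_add_batch() splices a pre-linked segment with a single cmpxchg):

        /* sketch: push the leftover segment [first, last] back onto the
         * normal local tw list instead of keeping a second list. Note
         * the segment is in execution order while work_llist is LIFO,
         * so ordering would need some care. */
        static void io_local_work_readd(struct io_ring_ctx *ctx,
                                        struct llist_node *first,
                                        struct llist_node *last)
        {
                if (llist_add_batch(first, last, &ctx->work_llist)) {
                        /* list was empty: the usual wakeup from
                         * io_req_local_work_add() may be needed */
                        wake_up_state(ctx->submitter_task,
                                      TASK_INTERRUPTIBLE);
                }
        }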
I'm also curious what's the use case you've got that is hitting
the problem?
--
Pavel Begunkov
* Re: [PATCH next v1 2/2] io_uring: limit local tw done
2024-11-20 23:56 ` Pavel Begunkov
@ 2024-11-21 0:52 ` David Wei
2024-11-21 1:12 ` Jens Axboe
1 sibling, 0 replies; 8+ messages in thread
From: David Wei @ 2024-11-21 0:52 UTC
To: Pavel Begunkov, io-uring; +Cc: Jens Axboe
On 2024-11-20 15:56, Pavel Begunkov wrote:
> On 11/20/24 22:14, David Wei wrote:
>> Instead of eagerly running all available local tw, limit the amount of
>> local tw done to the max of IO_LOCAL_TW_DEFAULT_MAX (20) or wait_nr. The
>> value of 20 is chosen as a reasonable heuristic to allow enough work
>> batching but also keep latency down.
>>
>> Add a retry_llist that maintains a list of local tw that couldn't be
>> done in time. No synchronisation is needed since it is only modified
>> within the task context.
>>
>> Signed-off-by: David Wei <[email protected]>
>> ---
>> include/linux/io_uring_types.h | 1 +
>> io_uring/io_uring.c | 43 +++++++++++++++++++++++++---------
>> io_uring/io_uring.h | 2 +-
>> 3 files changed, 34 insertions(+), 12 deletions(-)
>>
>> diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
>> index 593c10a02144..011860ade268 100644
>> --- a/include/linux/io_uring_types.h
>> +++ b/include/linux/io_uring_types.h
>> @@ -336,6 +336,7 @@ struct io_ring_ctx {
>> */
>> struct {
>> struct llist_head work_llist;
>> + struct llist_head retry_llist;
>
> Fwiw, probably doesn't matter, but it doesn't even need
> to be atomic, it's queued and spliced while holding
> ->uring_lock, the pending check is also synchronised as
> there is only one possible task doing that.
Yeah, it doesn't. Keeping it as a llist_head means being able to re-use
helpers that take llist_head or llist_node.
>
>> unsigned long check_cq;
>> atomic_t cq_wait_nr;
>> atomic_t cq_timeouts;
>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>> index 83bf041d2648..c3a7d0197636 100644
>> --- a/io_uring/io_uring.c
>> +++ b/io_uring/io_uring.c
>> @@ -121,6 +121,7 @@
> ...
>> static int __io_run_local_work(struct io_ring_ctx *ctx, struct io_tw_state *ts,
>> int min_events)
>> {
>> struct llist_node *node;
>> unsigned int loops = 0;
>> - int ret = 0;
>> + int ret, limit;
>> if (WARN_ON_ONCE(ctx->submitter_task != current))
>> return -EEXIST;
>> if (ctx->flags & IORING_SETUP_TASKRUN_FLAG)
>> atomic_andnot(IORING_SQ_TASKRUN, &ctx->rings->sq_flags);
>> + limit = max(IO_LOCAL_TW_DEFAULT_MAX, min_events);
>> again:
>> + ret = __io_run_local_work_loop(&ctx->retry_llist.first, ts, limit);
>> + if (ctx->retry_llist.first)
>> + goto retry_done;
>> +
>> /*
>> * llists are in reverse order, flip it back the right way before
>> * running the pending items.
>> */
>> node = llist_reverse_order(llist_del_all(&ctx->work_llist));
>> - while (node) {
>> - struct llist_node *next = node->next;
>> - struct io_kiocb *req = container_of(node, struct io_kiocb,
>> - io_task_work.node);
>> - INDIRECT_CALL_2(req->io_task_work.func,
>> - io_poll_task_func, io_req_rw_complete,
>> - req, ts);
>> - ret++;
>> - node = next;
>> - }
>> + ret = __io_run_local_work_loop(&node, ts, ret);
>
> One thing that is not so nice is that now we have this handling and
> checks in the hot path, and __io_run_local_work_loop() most likely
> gets uninlined.
>
> I wonder, can we just requeue it via task_work again? We can even
> add a variant efficiently adding a list instead of a single entry,
> i.e. local_task_work_add(head, tail, ...);
That was an early idea, but it means re-reversing the list and then
atomically adding each node back to work_llist concurrently with e.g.
io_req_local_work_add().
Using a separate retry_llist means we don't need to concurrently add to
either retry_llist or work_llist.
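i.e. the retry path stays plain pointer manipulation in task context
(this is the relevant line from the patch):

        /* leftovers kept in execution order; no atomics needed */
        ctx->retry_llist.first = node;

whereas putting them back on work_llist would mean llist_add() or
llist_add_batch() racing with producers in io_req_local_work_add().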
>
> I'm also curious what's the use case you've got that is hitting
> the problem?
>
There is a Memcache-like workload that does load shedding based on the
time spent doing work. With epoll, the work of reading sockets and
processing a request is done in userspace, which can decide after some
amount of time to drop the remaining work if it is taking too long. With
io_uring, the work of reading sockets is done eagerly inside of task
work. If there is a burst of work, then so much time is spent in task
work reading from sockets that, by the time control returns to
userspace, the timeout has already elapsed.
* Re: [PATCH next v1 2/2] io_uring: limit local tw done
2024-11-20 23:56 ` Pavel Begunkov
2024-11-21 0:52 ` David Wei
@ 2024-11-21 1:12 ` Jens Axboe
1 sibling, 0 replies; 8+ messages in thread
From: Jens Axboe @ 2024-11-21 1:12 UTC
To: Pavel Begunkov, David Wei, io-uring
On 11/20/24 4:56 PM, Pavel Begunkov wrote:
> On 11/20/24 22:14, David Wei wrote:
>> Instead of eagerly running all available local tw, limit the amount of
>> local tw done to the max of IO_LOCAL_TW_DEFAULT_MAX (20) or wait_nr. The
>> value of 20 is chosen as a reasonable heuristic to allow enough work
>> batching but also keep latency down.
>>
>> Add a retry_llist that maintains a list of local tw that couldn't be
>> done in time. No synchronisation is needed since it is only modified
>> within the task context.
>>
>> Signed-off-by: David Wei <[email protected]>
>> ---
>> include/linux/io_uring_types.h | 1 +
>> io_uring/io_uring.c | 43 +++++++++++++++++++++++++---------
>> io_uring/io_uring.h | 2 +-
>> 3 files changed, 34 insertions(+), 12 deletions(-)
>>
>> diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
>> index 593c10a02144..011860ade268 100644
>> --- a/include/linux/io_uring_types.h
>> +++ b/include/linux/io_uring_types.h
>> @@ -336,6 +336,7 @@ struct io_ring_ctx {
>> */
>> struct {
>> struct llist_head work_llist;
>> + struct llist_head retry_llist;
>
> Fwiw, probably doesn't matter, but it doesn't even need
> to be atomic, it's queued and spliced while holding
> ->uring_lock, the pending check is also synchronised as
> there is only one possible task doing that.
>
>> unsigned long check_cq;
>> atomic_t cq_wait_nr;
>> atomic_t cq_timeouts;
>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>> index 83bf041d2648..c3a7d0197636 100644
>> --- a/io_uring/io_uring.c
>> +++ b/io_uring/io_uring.c
>> @@ -121,6 +121,7 @@
> ...
>> static int __io_run_local_work(struct io_ring_ctx *ctx, struct io_tw_state *ts,
>> int min_events)
>> {
>> struct llist_node *node;
>> unsigned int loops = 0;
>> - int ret = 0;
>> + int ret, limit;
>> if (WARN_ON_ONCE(ctx->submitter_task != current))
>> return -EEXIST;
>> if (ctx->flags & IORING_SETUP_TASKRUN_FLAG)
>> atomic_andnot(IORING_SQ_TASKRUN, &ctx->rings->sq_flags);
>> + limit = max(IO_LOCAL_TW_DEFAULT_MAX, min_events);
>> again:
>> + ret = __io_run_local_work_loop(&ctx->retry_llist.first, ts, limit);
>> + if (ctx->retry_llist.first)
>> + goto retry_done;
>> +
>> /*
>> * llists are in reverse order, flip it back the right way before
>> * running the pending items.
>> */
>> node = llist_reverse_order(llist_del_all(&ctx->work_llist));
>> - while (node) {
>> - struct llist_node *next = node->next;
>> - struct io_kiocb *req = container_of(node, struct io_kiocb,
>> - io_task_work.node);
>> - INDIRECT_CALL_2(req->io_task_work.func,
>> - io_poll_task_func, io_req_rw_complete,
>> - req, ts);
>> - ret++;
>> - node = next;
>> - }
>> + ret = __io_run_local_work_loop(&node, ts, ret);
>
> One thing that is not so nice is that now we have this handling and
> checks in the hot path, and __io_run_local_work_loop() most likely
> gets uninlined.
I don't think that really matters, it's pretty light. The main overhead
in this function is not the call, it's reordering requests and touching
cachelines of the requests.
I think it's pretty light as-is and actually looks pretty good. It's
also similar to how sqpoll bites over longer task_work lines, and
arguably a mistake that we allow huge depths of this when we can avoid
it with deferred task_work.
> I wonder, can we just requeue it via task_work again? We can even
> add a variant efficiently adding a list instead of a single entry,
> i.e. local_task_work_add(head, tail, ...);
I think that can only work if we change work_llist to be a regular list
with regular locking. Otherwise it's a bit of a mess with the list being
reordered, and then you're spending extra cycles on potentially
reordering all the entries again.
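E.g. something like (sketch only, names made up):

        /* regular list + lock instead of the lockless llist */
        struct list_head        work_list;
        spinlock_t              work_lock;

with producers doing list_add_tail() under the lock, and leftovers going
back on the front with list_splice() so ordering is kept intact.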
> I'm also curious what's the use case you've got that is hitting
> the problem?
I'll let David answer that one, but some task_work can take a while to
run, e.g. if it's not just posting a completion.
--
Jens Axboe
* Re: [PATCH next v1 0/2] limit local tw done
2024-11-20 22:14 [PATCH next v1 0/2] limit local tw done David Wei
2024-11-20 22:14 ` [PATCH next v1 1/2] io_uring: add io_local_work_pending() David Wei
2024-11-20 22:14 ` [PATCH next v1 2/2] io_uring: limit local tw done David Wei
@ 2024-11-21 1:12 ` Jens Axboe
2 siblings, 0 replies; 8+ messages in thread
From: Jens Axboe @ 2024-11-21 1:12 UTC
To: David Wei, io-uring; +Cc: Pavel Begunkov
On 11/20/24 3:14 PM, David Wei wrote:
> Currently when running local tw it will eagerly try and complete all
> available work. When there is a burst of work coming in, the list of
> work in work_llist may be much larger than the requested batch count
> wait_nr. Doing all of the work eagerly may cause latency issues for some
> applications that do not want to spend so much time in the kernel.
>
> Limit the amount of local tw done to the max of 20 (a heuristic) or
> wait_nr. This then does not require any userspace changes.
>
> Many applications will submit and wait with wait_nr = 1 to mean "wait
> for _at least_ 1 completion, but do more work if available". This used
> to mean "all work" but now that is capped to 20 requests. If users want
> more work batching, then they can set wait_nr to > 20.
Looks good to me!
--
Jens Axboe