From: Pavel Begunkov <[email protected]>
To: Jens Axboe <[email protected]>, [email protected]
Subject: Re: [PATCH 2/6] io_uring: replace defer task_work llist with io_wq_work_list
Date: Fri, 22 Nov 2024 17:07:02 +0000 [thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
On 11/22/24 16:12, Jens Axboe wrote:
...
> static inline void io_req_local_work_add(struct io_kiocb *req,
> struct io_ring_ctx *ctx,
> - unsigned flags)
> + unsigned tw_flags)
> {
> - unsigned nr_wait, nr_tw, nr_tw_prev;
> - struct llist_node *head;
> + unsigned nr_tw, nr_tw_prev, nr_wait;
> + unsigned long flags;
>
> /* See comment above IO_CQ_WAKE_INIT */
> BUILD_BUG_ON(IO_CQ_WAKE_FORCE <= IORING_MAX_CQ_ENTRIES);
>
> /*
> - * We don't know how many reuqests is there in the link and whether
> - * they can even be queued lazily, fall back to non-lazy.
> + * We don't know how many requests are in the link and whether they can
> + * even be queued lazily, fall back to non-lazy.
> */
> if (req->flags & (REQ_F_LINK | REQ_F_HARDLINK))
> - flags &= ~IOU_F_TWQ_LAZY_WAKE;
> + tw_flags &= ~IOU_F_TWQ_LAZY_WAKE;
>
> - guard(rcu)();
The guard(rcu)() here protects against ctx->task deallocation, see
the comment in io_ring_exit_work() -> synchronize_rcu().
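Roughly, the pattern is (a simplified sketch with made-up ctx/field
names, not the actual io_uring code):

	#include <linux/rcupdate.h>
	#include <linux/sched.h>
	#include <linux/sched/task.h>

	struct ring_ctx {
		struct task_struct __rcu *task;
	};

	/* add side: the RCU read section keeps the task from being freed */
	static void wake_ring_task(struct ring_ctx *ctx)
	{
		struct task_struct *task;

		rcu_read_lock();	/* what guard(rcu)() boils down to */
		task = rcu_dereference(ctx->task);
		if (task)
			wake_up_process(task);
		rcu_read_unlock();
	}

	/* exit side: wait out readers before dropping the task reference */
	static void ring_exit_work(struct ring_ctx *ctx)
	{
		struct task_struct *task = rcu_dereference_protected(ctx->task, true);

		rcu_assign_pointer(ctx->task, NULL);
		synchronize_rcu();	/* no reader can still see the old pointer */
		put_task_struct(task);
	}

So if the guard goes away, something else has to keep the task alive
across the wake_up() further down.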
> + spin_lock_irqsave(&ctx->work_lock, flags);
> + wq_list_add_tail(&req->io_task_work.work_node, &ctx->work_list);
> + nr_tw_prev = ctx->work_items++;
Is there a good reason why this changes the semantics of what's
stored across adds? The old code assigned a corrected nr_tw; with a
plain item count this will start heavily spamming wake_up() in some
cases.
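To spell out what I mean, the wake decision further down is roughly
(assuming the checks below the quoted hunk are unchanged from what's
upstream today):

	nr_wait = atomic_read(&ctx->cq_wait_nr);
	/* not enough or no one is waiting */
	if (nr_tw < nr_wait)
		return;
	/* the previous add has already woken the waiting task */
	if (nr_tw_prev >= nr_wait)
		return;
	wake_up_state(ctx->submitter_task, TASK_INTERRUPTIBLE);

With the old scheme a non-lazy add stored IO_CQ_WAKE_FORCE in
req->nr_tw, so the next add saw nr_tw_prev >= nr_wait and skipped the
wake_up(). With a plain item count, a stream of non-lazy adds keeps
nr_tw_prev below nr_wait, so every one of them ends up calling
wake_up_state().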
> + spin_unlock_irqrestore(&ctx->work_lock, flags);
>
> - head = READ_ONCE(ctx->work_llist.first);
> - do {
> - nr_tw_prev = 0;
> - if (head) {
> - struct io_kiocb *first_req = container_of(head,
> - struct io_kiocb,
> - io_task_work.node);
> - /*
> - * Might be executed at any moment, rely on
> - * SLAB_TYPESAFE_BY_RCU to keep it alive.
> - */
> - nr_tw_prev = READ_ONCE(first_req->nr_tw);
> - }
> -
> - /*
> - * Theoretically, it can overflow, but that's fine as one of
> - * previous adds should've tried to wake the task.
> - */
> - nr_tw = nr_tw_prev + 1;
> - if (!(flags & IOU_F_TWQ_LAZY_WAKE))
> - nr_tw = IO_CQ_WAKE_FORCE;
> -
> - req->nr_tw = nr_tw;
> - req->io_task_work.node.next = head;
> - } while (!try_cmpxchg(&ctx->work_llist.first, &head,
> - &req->io_task_work.node));
> -
> - /*
> - * cmpxchg implies a full barrier, which pairs with the barrier
> - * in set_current_state() on the io_cqring_wait() side. It's used
> - * to ensure that either we see updated ->cq_wait_nr, or waiters
> - * going to sleep will observe the work added to the list, which
> - * is similar to the wait/wawke task state sync.
> - */
> + nr_tw = nr_tw_prev + 1;
> + if (!(tw_flags & IOU_F_TWQ_LAZY_WAKE))
> + nr_tw = IO_CQ_WAKE_FORCE;
>
> - if (!head) {
> + if (!nr_tw_prev) {
> if (ctx->flags & IORING_SETUP_TASKRUN_FLAG)
> atomic_or(IORING_SQ_TASKRUN, &ctx->rings->sq_flags);
> if (ctx->has_evfd)
> io_eventfd_signal(ctx);
> }
>
> + /*
> + * We need a barrier after unlock, which pairs with the barrier
> + * in set_current_state() on the io_cqring_wait() side. It's used
> + * to ensure that either we see updated ->cq_wait_nr, or waiters
> + * going to sleep will observe the work added to the list, which
> + * is similar to the wait/wake task state sync.
> + */
> + smp_mb();
> nr_wait = atomic_read(&ctx->cq_wait_nr);
> /* not enough or no one is waiting */
> if (nr_tw < nr_wait)
> @@ -1253,11 +1233,27 @@ void io_req_task_work_add_remote(struct io_kiocb *req, struct io_ring_ctx *ctx,
--
Pavel Begunkov