From: Xiaoguang Wang <[email protected]>
To: Yu Jian Wu <[email protected]>
Cc: [email protected], [email protected],
[email protected], [email protected]
Subject: Re: [PATCH] io_uring: create percpu io sq thread when IORING_SETUP_SQ_AFF is flagged
Date: Mon, 25 May 2020 11:16:32 +0800
Message-ID: <[email protected]>
In-Reply-To: <20200522111732.GA20291@amber>
hi,
> On Wed, May 20, 2020 at 08:11:04PM +0800, Xiaoguang Wang wrote:
>> hi,
>>
>> There's still some work left to do, for example: use SRCU to protect the
>> iteration over multiple ctxs to reduce mutex contention, and make the percpu
>> thread aware of cpu hotplug. But I'm sending it now for some early comments,
>> thanks in advance!
>>
>> Regards,
>> Xiaoguang Wang
>
> Hi,
>
> Thanks for doing this! I'm speaking as someone who tried this and
> really struggled with a few parts.
>
> A few comments below.
>
> W.r.t. Pavel's comments on how to do this fairly, I think the only way
> for this multiple-ctx thread to handle all the ctxs fairly is to queue all
> the work and do it async rather than inline.
It sounds like you mean every io_sq_thread should always use REQ_F_FORCE_ASYNC
to submit reqs. I'm not sure that's efficient. Queuing work to io-wq should be
a fast operation, so io_sq_thread would spend most of its time busy-looping
just to queue work, which wastes the cpu resource the io_sq_thread is bound to.
What I want to express is that io_sq_thread should do some real work, not just
queue it.
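
As a rough illustration of what I mean (a user-space model only, not the
patch code): the shared thread could take at most a fixed budget of sqes
from each ctx per pass, so one busy ring cannot starve the others. All
names below are made up for the sketch:

#include <stddef.h>

/* Illustrative stand-in for a registered ctx, not the kernel struct. */
struct model_ctx {
	unsigned int pending;	/* sqes waiting in this ctx's sq ring */
};

/*
 * One pass of the shared sq thread: submit at most `budget` sqes from
 * each ctx inline. Returns the total consumed; a zero return means the
 * thread may consider going to sleep.
 */
static unsigned int sq_thread_pass(struct model_ctx *ctxs, size_t nr_ctx,
				   unsigned int budget)
{
	unsigned int total = 0;
	size_t i;

	for (i = 0; i < nr_ctx; i++) {
		unsigned int take = ctxs[i].pending;

		if (take > budget)
			take = budget;
		ctxs[i].pending -= take;	/* stands in for inline submit */
		total += take;
	}
	return total;
}

This keeps the fast path inline while still bounding how long any single
ctx can monopolize the thread.
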
>>> + needs_wait = true;
>>> + prepare_to_wait(&t->sqo_percpu_wait, &wait, TASK_INTERRUPTIBLE);
>>> + mutex_lock(&t->lock);
>>> + list_for_each_entry_safe(ctx, tmp, &t->ctx_list, node) {
>>> + if ((ctx->flags & IORING_SETUP_IOPOLL) &&
>>> + !list_empty_careful(&ctx->poll_list)) {
>>> + needs_wait = false;
>>> + break;
>>> + }
>>> + to_submit = io_sqring_entries(ctx);
>
> Unless I'm mistaken, I don't think these are submitted anywhere.
Yes, before io_sq_thread goes to sleep, it checks whether any ctx has new
sqes to handle; if "to_submit" is greater than zero, io_sq_thread skips the
sleep and continues to handle these new sqes.
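
In user-space terms this is the usual "re-check the predicate after
preparing to sleep" idiom, so a wakeup between the check and the sleep is
never lost. A rough pthread analogy (names here are illustrative, not the
kernel structures):

#include <pthread.h>

/* Illustrative stand-in, not the kernel struct. */
struct model_percpu_thread {
	pthread_mutex_t lock;
	pthread_cond_t  wait;
	unsigned int    to_submit;	/* new sqes seen across all ctxs */
};

/*
 * Sleep only while no ctx has produced new sqes. Re-checking
 * to_submit under the lock closes the race where a submitter queues
 * work (and signals) between our check and our sleep -- the same
 * race the prepare_to_wait()/io_sqring_entries() sequence closes
 * in the patch.
 */
static void sq_thread_idle_wait(struct model_percpu_thread *t)
{
	pthread_mutex_lock(&t->lock);
	while (t->to_submit == 0)
		pthread_cond_wait(&t->wait, &t->lock);
	pthread_mutex_unlock(&t->lock);
}
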
>
>>> + if (to_submit && ctx->submit_status != -EBUSY) {
>>> @@ -6841,6 +6990,52 @@ static int io_init_wq_offload(struct io_ring_ctx *ctx,
>>> return ret;
>>> }
>>> +static void create_io_percpu_thread(struct io_ring_ctx *ctx, int cpu)
>>> +{
>>> + struct io_percpu_thread *t;
>>> +
>>> + t = per_cpu_ptr(percpu_threads, cpu);
>>> + mutex_lock(&t->lock);
>>> + if (!t->sqo_thread) {
>>> + t->sqo_thread = kthread_create_on_cpu(io_sq_percpu_thread, t,
>>> + cpu, "io_uring_percpu-sq");
>>> + if (IS_ERR(t->sqo_thread)) {
>>> + ctx->sqo_thread = t->sqo_thread;
>>> + t->sqo_thread = NULL;
>>> + mutex_unlock(&t->lock);
>>> + return;
>>> + }
>>> + }
>>> +
>>> + if (t->sq_thread_idle < ctx->sq_thread_idle)
>>> + t->sq_thread_idle = ctx->sq_thread_idle;
>
> Is max really the best way to do this?
> Or should it be per ctx?
Because these ctxs share the same io_sq_thread, and sq_thread_idle controls
when io_sq_thread may go to sleep, we should choose the max value.
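
For example, if one ctx sets sq_thread_idle to 10ms and another to 500ms,
the shared thread must keep spinning for 500ms after the last sqe,
otherwise the second ctx loses the idle window it asked for. A tiny sketch
of the shared deadline check (plain integers instead of jiffies, names
made up for the sketch):

#include <stdbool.h>

/*
 * Shared idle check for one percpu sq thread; illustrative only.
 * last_activity is when any ctx last had sqes; max_idle is the largest
 * sq_thread_idle among the attached ctxs.
 */
static bool sq_thread_should_sleep(unsigned long now,
				   unsigned long last_activity,
				   unsigned long max_idle)
{
	/* Sleep only after the longest requested idle window expires. */
	return now - last_activity > max_idle;
}

The cost, as you note below, is that a ctx with a small idle setting keeps
being iterated for the whole larger window; that's the trade-off of
sharing one thread.
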
Regards,
Xiaoguang Wang
>
> Suppose the first ctx has a
> sq_thread_idle of something very small, and the second has a
> sq_thread_idle of 500ms, this will cause the loop to iterate over the
> first ctx even though it should have been considered idle a long time
> ago.
>
>>> + ctx->sqo_wait = &t->sqo_percpu_wait;
>>> + ctx->sq_thread_cpu = cpu;
>>> + list_add_tail(&ctx->node, &t->ctx_list);
>>> + ctx->sqo_thread = t->sqo_thread;
>>> + mutex_unlock(&t->lock);
>>> +}
>>> +
>>> +static void destroy_io_percpu_thread(struct io_ring_ctx *ctx, int cpu)
>>> +{
>>> + struct io_percpu_thread *t;
>>> + struct task_struct *sqo_thread = NULL;
>>> +
>>> + t = per_cpu_ptr(percpu_threads, cpu);
>>> + mutex_lock(&t->lock);
>>> + list_del(&ctx->node);
>>> + if (list_empty(&t->ctx_list)) {
>>> + sqo_thread = t->sqo_thread;
>>> + t->sqo_thread = NULL;
>>> + }
>>> + mutex_unlock(&t->lock);
>>> +
>>> + if (sqo_thread) {
>>> + kthread_park(sqo_thread);
>>> + kthread_stop(sqo_thread);
>>> + }
>>> +}
>>> +
>>> static int io_sq_offload_start(struct io_ring_ctx *ctx,
>>> struct io_uring_params *p)
>>> {
>>> @@ -6867,9 +7062,7 @@ static int io_sq_offload_start(struct io_ring_ctx *ctx,
>>> if (!cpu_online(cpu))
>>> goto err;
>>> - ctx->sqo_thread = kthread_create_on_cpu(io_sq_thread,
>>> - ctx, cpu,
>>> - "io_uring-sq");
>>> + create_io_percpu_thread(ctx, cpu);
>>> } else {
>>> ctx->sqo_thread = kthread_create(io_sq_thread, ctx,
>>> "io_uring-sq");
>>> @@ -7516,7 +7709,7 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
>>> if (!list_empty_careful(&ctx->cq_overflow_list))
>>> io_cqring_overflow_flush(ctx, false);
>>> if (flags & IORING_ENTER_SQ_WAKEUP)
>>> - wake_up(&ctx->sqo_wait);
>>> + wake_up(ctx->sqo_wait);
>>> submitted = to_submit;
>>> } else if (to_submit) {
>>> mutex_lock(&ctx->uring_lock);
>>> @@ -8102,6 +8295,8 @@ SYSCALL_DEFINE4(io_uring_register, unsigned int, fd, unsigned int, opcode,
>>> static int __init io_uring_init(void)
>>> {
>>> + int cpu;
>>> +
>>> #define __BUILD_BUG_VERIFY_ELEMENT(stype, eoffset, etype, ename) do { \
>>> BUILD_BUG_ON(offsetof(stype, ename) != eoffset); \
>>> BUILD_BUG_ON(sizeof(etype) != sizeof_field(stype, ename)); \
>>> @@ -8141,6 +8336,18 @@ static int __init io_uring_init(void)
>>> BUILD_BUG_ON(ARRAY_SIZE(io_op_defs) != IORING_OP_LAST);
>>> BUILD_BUG_ON(__REQ_F_LAST_BIT >= 8 * sizeof(int));
>>> req_cachep = KMEM_CACHE(io_kiocb, SLAB_HWCACHE_ALIGN | SLAB_PANIC);
>>> +
>>> + percpu_threads = alloc_percpu(struct io_percpu_thread);
>>> + for_each_possible_cpu(cpu) {
>>> + struct io_percpu_thread *t;
>>> +
>>> + t = per_cpu_ptr(percpu_threads, cpu);
>>> + INIT_LIST_HEAD(&t->ctx_list);
>>> + init_waitqueue_head(&t->sqo_percpu_wait);
>>> + mutex_init(&t->lock);
>>> + t->sqo_thread = NULL;
>>> + t->sq_thread_idle = 0;
>>> + }
>>> return 0;
>>> };
>>> __initcall(io_uring_init);
>>>
> Thanks!
>