From: Pavel Begunkov <[email protected]>
To: Jens Axboe <[email protected]>, io-uring <[email protected]>
Subject: Re: [PATCH] io_uring: use task_work for links if possible
Date: Sat, 27 Jun 2020 00:20:18 +0300 [thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
On 26/06/2020 23:43, Jens Axboe wrote:
> On 6/26/20 2:27 PM, Pavel Begunkov wrote:
>> On 25/06/2020 21:27, Jens Axboe wrote:
>>> Currently links are always done in an async fashion, unless we
>>> catch them inline after we successfully complete a request without
>>> having to resort to blocking. This isn't necessarily the most efficient
>>> approach; it'd be ideal if we could just use the task_work handling
>>> for this.
>>>
>>> Outside of saving an async jump, we can also do less prep work for
>>> these kinds of requests.
>>>
>>> Running dependent links from the task_work handler yields some nice
>>> performance benefits. As an example, examples/link-cp from the liburing
>>> repository uses read+write links to implement a copy operation. Without
>>> this patch, a cache-cold 4G file read from a VM runs in about
>>> 3 seconds:
>>
>> A few comments below
>>
>>
>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>> index 0bba12e4e559..389274a078c8 100644
>>> --- a/fs/io_uring.c
>>> +++ b/fs/io_uring.c
>>> @@ -898,6 +898,7 @@ enum io_mem_account {
>> ...
>>> +static void __io_req_task_submit(struct io_kiocb *req)
>>> +{
>>> + struct io_ring_ctx *ctx = req->ctx;
>>> +
>>> + __set_current_state(TASK_RUNNING);
>>> + if (!io_sq_thread_acquire_mm(ctx, req)) {
>>> + mutex_lock(&ctx->uring_lock);
>>> + __io_queue_sqe(req, NULL, NULL);
>>> + mutex_unlock(&ctx->uring_lock);
>>> + } else {
>>> + __io_req_task_cancel(req, -EFAULT);
>>> + }
>>> +}
>>> +
>>> +static void io_req_task_submit(struct callback_head *cb)
>>> +{
>>> + struct io_kiocb *req = container_of(cb, struct io_kiocb, task_work);
>>> +
>>> + __io_req_task_submit(req);
>>> +}
>>> +
>>> +static void io_req_task_queue(struct io_kiocb *req)
>>> +{
>>> + struct task_struct *tsk = req->task;
>>> + int ret;
>>> +
>>> + init_task_work(&req->task_work, io_req_task_submit);
>>> +
>>> + ret = task_work_add(tsk, &req->task_work, true);
>>> + if (unlikely(ret)) {
>>> + init_task_work(&req->task_work, io_req_task_cancel);
>>
>> Why not kill it off here? It was just nxt, so it shouldn't need
>> anything like NOCANCEL.
>
> We don't necessarily know the context, and we'd at least need to check
> REQ_F_COMP_LOCKED and propagate. I think it gets ugly really quick,
> better to just punt it to clean context.
Makes sense
>
>>> + tsk = io_wq_get_task(req->ctx->io_wq);
>>> + task_work_add(tsk, &req->task_work, true);
>>> + }
>>> + wake_up_process(tsk);
>>> +}
>>> +
>>> static void io_free_req(struct io_kiocb *req)
>>> {
>>> struct io_kiocb *nxt = NULL;
>>> @@ -1671,8 +1758,12 @@ static void io_free_req(struct io_kiocb *req)
>>> io_req_find_next(req, &nxt);
>>> __io_free_req(req);
>>>
>>> - if (nxt)
>>> - io_queue_async_work(nxt);
>>> + if (nxt) {
>>> + if (nxt->flags & REQ_F_WORK_INITIALIZED)
>>> + io_queue_async_work(nxt);
>>
>> Don't think it will work. E.g. io_close_prep() may have set
>> REQ_F_WORK_INITIALIZED but without io_req_work_grab_env().
>
> This really doesn't change the existing path, it just makes sure we
> don't do io_req_task_queue() on something that has already modified
> ->work (and hence, ->task_work). This might miss cases where we have
> only cleared it and done nothing else, but that just means we'll have
> cases that we could potentially improve the efficiency of down the line.
Before the patch it always initialised linked reqs, and that would work
OK if not for this lazy grab_env(). E.g., take req1 -> close_req.

It calls io_req_defer_prep(__close_req__, sqe, __false__), which
doesn't do grab_env() because of for_async=false, but does call
io_close_prep(), which sets REQ_F_WORK_INITIALIZED.

Then, after completion of req1, it follows the added lines:

	if (nxt)
		if (nxt->flags & REQ_F_WORK_INITIALIZED)
			io_queue_async_work(nxt);

ending up in

	io_queue_async_work()
		-> grab_env()

and that runs from who knows which context, e.g. req1 could be an rw
request completed in irq context.
Not sure it's related, but the fallocate test shows the log below, and
some other tests hang the kernel as well.
[ 42.445719] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 42.445723] #PF: supervisor write access in kernel mode
[ 42.445725] #PF: error_code(0x0002) - not-present page
[ 42.445726] PGD 0 P4D 0
[ 42.445729] Oops: 0002 [#1] PREEMPT SMP PTI
[ 42.445733] CPU: 1 PID: 1511 Comm: io_wqe_worker-0 Tainted: G I 5.8.0-rc2-00358-g9d2391dd5359 #489
[ 42.445740] RIP: 0010:override_creds+0x19/0x30
...
[ 42.445754] Call Trace:
[ 42.445758] io_worker_handle_work+0x25c/0x430
[ 42.445760] io_wqe_worker+0x2a0/0x350
[ 42.445764] ? _raw_spin_unlock_irqrestore+0x24/0x40
[ 42.445766] ? io_worker_handle_work+0x430/0x430
[ 42.445769] kthread+0x136/0x180
[ 42.445771] ? kthread_park+0x90/0x90
[ 42.445774] ret_from_fork+0x22/0x30
>
>>> -static inline void req_set_fail_links(struct io_kiocb *req)
>>> -{
>>> - if ((req->flags & (REQ_F_LINK | REQ_F_HARDLINK)) == REQ_F_LINK)
>>> - req->flags |= REQ_F_FAIL_LINK;
>>> -}
>>> -
>>
>> I think it'd be nicer in 2 patches, first moving io_sq_thread_drop_mm, etc. up.
>> And the second one doing actual work.
>
> Yeah I agree...
>
>>> static void io_complete_rw_common(struct kiocb *kiocb, long res,
>>> struct io_comp_state *cs)
>>> {
>>> @@ -2035,35 +2120,6 @@ static void io_complete_rw_common(struct kiocb *kiocb, long res,
>>> __io_req_complete(req, res, cflags, cs);
>>> }
>>>
>> ...
>>> switch (req->opcode) {
>>> case IORING_OP_NOP:
>>> @@ -5347,7 +5382,7 @@ static int io_req_defer(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>> if (!req->io) {
>>> if (io_alloc_async_ctx(req))
>>> return -EAGAIN;
>>> - ret = io_req_defer_prep(req, sqe);
>>> + ret = io_req_defer_prep(req, sqe, true);
>>
>> Why is the head of a link for_async?
>
> True, that could be false instead.
>
> Since these are just minor things, we can do a fix on top. I don't want
> to reshuffle this unless I have to.
Agree, I have a pile on top myself.
--
Pavel Begunkov
Thread overview: 9+ messages
2020-06-25 18:27 [PATCH] io_uring: use task_work for links if possible Jens Axboe
2020-06-25 20:28 ` Pavel Begunkov
2020-06-25 21:37 ` Jens Axboe
2020-06-26 9:41 ` Pavel Begunkov
2020-06-26 20:27 ` Pavel Begunkov
2020-06-26 20:43 ` Jens Axboe
2020-06-26 21:20 ` Pavel Begunkov [this message]
2020-06-27 1:45 ` Jens Axboe
2020-06-27 10:57 ` Pavel Begunkov