From: Bijan Mottahedeh <[email protected]>
To: Pavel Begunkov <[email protected]>,
[email protected], [email protected]
Subject: Re: [PATCH v3 08/13] io_uring: implement fixed buffers registration similar to fixed files
Date: Thu, 7 Jan 2021 14:14:42 -0800
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>

On 1/7/2021 1:37 PM, Pavel Begunkov wrote:
> On 07/01/2021 21:21, Bijan Mottahedeh wrote:
>>
>>>>> Because it does quiesce, fixed reads/writes access buffers from asynchronous
>>>>> contexts without synchronisation. That won't work anymore, so
>>>>>
>>>>> 1. either we save it in advance, that would require extra req_async
>>>>> allocation for linked fixed rw
>>>>>
>>>>> 2. or synchronise whenever async. But that would mean that a request
>>>>> may get and do IO on two different buffers, that's rotten.
>>>>>
>>>>> 3. do mixed -- lazy, but if do IO then alloc.
>>>>>
>>>>> 3.5 also "synchronise" there would mean uring_lock, that's not welcome,
>>>>> but we can probably do rcu.
>>>>
>>>> Are you referring to a case where a fixed buffer request can be submitted from async context while those buffers are being unregistered, or something like that?
>>>>
>>>>> Let me think of a patch...
>>>
>>> The most convenient API would be [1], it selects a buffer during
>>> submission, but allocates if needs to go async or for all linked
>>> requests.
>>>
>>> [2] should be correct from the kernel perspective (no races), it
>>> also solves doing IO on 2 different buffers, that's nasty (BTW,
>>> [1] solves this problem naturally). However, a buffer might be
>>> selected async, but the following can happen, and user should
>>> wait for request completion before removing a buffer.
>>>
>>> 1. register buf id=0
>>> 2. syscall io_uring_enter(submit=RW_FIXED,buf_id=0,IOSQE_ASYNC)
>>> 3. unregister buffers
>>> 4. the request may not find the buffer and fail.
>>>
>>> Not very convenient + can actually add overhead on the userspace
>>> side, can be even some heavy synchronisation.
>>>
>>> uring_lock in [2] is not nice, but I think I can replace it
>>> with rcu, probably can even help with sharing, but I need to
>>> try to implement to be sure.
>>>
>>> So it is an open question which API to have.
>>> Neither of the diffs is tested.
>>>
>>> [1]
>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>> index 7e35283fc0b1..2171836a9ce3 100644
>>> --- a/fs/io_uring.c
>>> +++ b/fs/io_uring.c
>>> @@ -826,6 +826,7 @@ static const struct io_op_def io_op_defs[] = {
>>> .needs_file = 1,
>>> .unbound_nonreg_file = 1,
>>> .pollin = 1,
>>> + .needs_async_data = 1,
>>> .plug = 1,
>>> .async_size = sizeof(struct io_async_rw),
>>> .work_flags = IO_WQ_WORK_BLKCG | IO_WQ_WORK_MM,
>>> @@ -835,6 +836,7 @@ static const struct io_op_def io_op_defs[] = {
>>> .hash_reg_file = 1,
>>> .unbound_nonreg_file = 1,
>>> .pollout = 1,
>>> + .needs_async_data = 1,
>>> .plug = 1,
>>> .async_size = sizeof(struct io_async_rw),
>>> .work_flags = IO_WQ_WORK_BLKCG | IO_WQ_WORK_FSIZE |
>>>
>>>
>>>
>>> [2]
>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>> index 7e35283fc0b1..31560b879fb3 100644
>>> --- a/fs/io_uring.c
>>> +++ b/fs/io_uring.c
>>> @@ -3148,7 +3148,12 @@ static ssize_t io_import_iovec(int rw, struct io_kiocb *req,
>>> opcode = req->opcode;
>>> if (opcode == IORING_OP_READ_FIXED || opcode == IORING_OP_WRITE_FIXED) {
>>> *iovec = NULL;
>>> - return io_import_fixed(req, rw, iter);
>>> +
>>> + io_ring_submit_lock(req->ctx, needs_lock);
>>> + lockdep_assert_held(&req->ctx->uring_lock);
>>> + ret = io_import_fixed(req, rw, iter);
>>> + io_ring_submit_unlock(req->ctx, needs_lock);
>>> + return ret;
>>> }
>>> /* buffer index only valid with fixed read/write, or buffer select */
>>> @@ -3638,7 +3643,7 @@ static int io_write(struct io_kiocb *req, bool force_nonblock,
>>> copy_iov:
>>> /* some cases will consume bytes even on error returns */
>>> iov_iter_revert(iter, io_size - iov_iter_count(iter));
>>> - ret = io_setup_async_rw(req, iovec, inline_vecs, iter, false);
>>> + ret = io_setup_async_rw(req, iovec, inline_vecs, iter, true);
>>> if (!ret)
>>> return -EAGAIN;
>>> }
>>>
>>>
>>
>> For my understanding, is [1] essentially about stashing the iovec for the fixed IO in an io_async_rw struct and referencing it in async context?
>
> Yes, like that. It actually doesn't use iov but employs bvec, which
> it gets from struct io_mapped_ubuf, and stores it inside iter.
>
>> I don't understand how this prevents unregistering the buffer (described by the iovec) while the IO takes place.
>
> The bvec itself is guaranteed to be alive during the whole lifetime
> of the request, that's because of all that percpu_ref in nodes.
> However, the table storing buffers (i.e. ctx->user_bufs) may be
> overwritten.
>
> reg/unreg/update happens with uring_lock held, as well as submission.
> Hence if we always grab a buffer during submission it will be fine.

So because of the uring_lock being held, if we implement [1], then once
we grab a fixed buffer during submission, we are guaranteed that the IO
successfully completes, even if the buffer table is overwritten?

Would the bvec persistence help us with buffer sharing and the deadlock
scenario you brought up as well? If the sharing task wouldn't have to
block for the attached tasks to get rid of their references, it seems
that any outstanding IO would complete successfully.

My concern however is what would happen if the sharing task actually
*frees* its buffers after returning from unregister, since those buffers
would still live in the buf_data, right?

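
To check that I am reading [1] and [2] the same way you are, here is how
I picture the difference, as a small single-threaded userspace model
rather than kernel code. Everything named toy_* is made up for
illustration, the plain refs counter only stands in for the percpu_ref
nodes, and -1 is a placeholder for whatever error a failed lookup would
really return.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_NR_SLOTS 4

/* Hypothetical stand-in for a registered buffer. */
struct toy_buf {
	int refs;	/* one for the table slot, one per request pinning it */
	char data[16];
};

/* Hypothetical stand-in for the buffer table (ctx->user_bufs above). */
struct toy_table {
	struct toy_buf *slot[TOY_NR_SLOTS];
};

/* Hypothetical request; 'pinned' is set only when we grab at submission. */
struct toy_req {
	int index;
	struct toy_buf *pinned;
};

static int toy_bufs_freed;

/* Drop one reference; the buffer goes away with the last one. */
static void toy_put(struct toy_buf *buf)
{
	assert(buf->refs > 0);
	if (--buf->refs == 0) {
		free(buf);
		toy_bufs_freed++;
	}
}

static int toy_register(struct toy_table *t, int index)
{
	struct toy_buf *buf = calloc(1, sizeof(*buf));

	if (!buf)
		return -1;
	buf->refs = 1;		/* reference held by the table slot */
	t->slot[index] = buf;
	return 0;
}

/* Clear the slot and drop only the table's reference. */
static void toy_unregister(struct toy_table *t, int index)
{
	struct toy_buf *buf = t->slot[index];

	t->slot[index] = NULL;
	if (buf)
		toy_put(buf);
}

/* pin_at_submit != 0 models [1]; pin_at_submit == 0 models a late lookup. */
static int toy_submit(struct toy_table *t, struct toy_req *req, int index,
		      int pin_at_submit)
{
	req->index = index;
	req->pinned = NULL;
	if (!pin_at_submit)
		return 0;
	if (!t->slot[index])
		return -1;
	req->pinned = t->slot[index];
	req->pinned->refs++;
	return 0;
}

/* "Async" execution: use the pinned buffer if any, else look it up now. */
static int toy_run_async(struct toy_table *t, struct toy_req *req)
{
	struct toy_buf *buf = req->pinned ? req->pinned : t->slot[req->index];

	if (!buf)
		return -1;	/* late lookup found an empty slot */
	buf->data[0] = 'x';	/* pretend to do the IO */
	if (req->pinned) {
		toy_put(req->pinned);
		req->pinned = NULL;
	}
	return 0;
}

/* Register, submit, unregister, then run the request "async". */
static int toy_scenario(int pin_at_submit)
{
	struct toy_table t = { { NULL } };
	struct toy_req req;

	if (toy_register(&t, 0))
		return -2;
	if (toy_submit(&t, &req, 0, pin_at_submit))
		return -2;
	toy_unregister(&t, 0);
	return toy_run_async(&t, &req);
}
```

In this model toy_scenario(1) returns 0 because the request keeps its own
pinned pointer across the unregister, while toy_scenario(0) returns -1
because the slot is already empty when the request runs, which is step 4
of your list. The model does not answer my question about the sharing
task freeing its memory; it simply assumes the memory stays valid until
the last reference is dropped.
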
>> Taking a step back, what is the cost of keeping the quiesce for buffer registration operations? It should not be a frequent operation, so even a heavy-handed quiesce should not be a big issue?
>
> It waits for __all__ inflight requests to complete and doesn't allow
> submissions in the meantime (basically all io_uring_enter() attempts
> will fail). +grace period.
>
> It's pretty heavy, but worse is that it shuts down everything while
> waiting. However, if an application is prepared for that and it's
> really rare or done once, that should be ok.
>
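
For reference, this is the behaviour I understand from your description
of the quiesce, again as a single-threaded toy model and not kernel
code. The toy_* names are hypothetical, -1 stands in for whatever error
io_uring_enter() would really return, and the grace period is left out
entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ring state: just enough to model a quiesce. */
struct toy_ring {
	int inflight;		/* requests submitted but not yet completed */
	bool quiescing;		/* set while a registration op is waiting */
	int rejected;		/* submissions refused during the quiesce */
};

/* Models a submission attempt; refused while quiescing. */
static int toy_enter(struct toy_ring *r)
{
	if (r->quiescing) {
		r->rejected++;
		return -1;
	}
	r->inflight++;
	return 0;
}

static void toy_complete(struct toy_ring *r)
{
	assert(r->inflight > 0);
	r->inflight--;
}

/*
 * Start with some requests in flight, quiesce, and let one submission
 * attempt race with each completion. Returns how many submissions were
 * refused before the registration op could proceed.
 */
static int toy_quiesce_scenario(int inflight_at_start)
{
	struct toy_ring r = { .inflight = inflight_at_start };

	r.quiescing = true;
	while (r.inflight > 0) {
		(void)toy_enter(&r);	/* refused: ring is quiescing */
		toy_complete(&r);	/* an old request finishes */
	}
	/* All old requests are done; the buffer table would be replaced here. */
	r.quiescing = false;
	if (toy_enter(&r))		/* submissions work again */
		return -1;
	return r.rejected;
}
```

With three requests in flight, toy_quiesce_scenario(3) refuses three
submissions before the update can go ahead, which is the "shuts down
everything while waiting" cost; with nothing in flight it refuses none.
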
Jens, what do you think?