public inbox for [email protected]
 help / color / mirror / Atom feed
From: Pavel Begunkov <[email protected]>
To: Jens Axboe <[email protected]>, Ming Lei <[email protected]>
Cc: [email protected], [email protected],
	Uday Shankar <[email protected]>,
	Akilesh Kailash <[email protected]>
Subject: Re: [PATCH V8 0/8] io_uring: support sqe group and leased group kbuf
Date: Thu, 31 Oct 2024 15:25:55 +0000	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>

On 10/31/24 14:29, Jens Axboe wrote:
> On 10/31/24 7:25 AM, Pavel Begunkov wrote:
>> On 10/30/24 02:43, Jens Axboe wrote:
>>> On 10/29/24 8:03 PM, Ming Lei wrote:
>>>> On Tue, Oct 29, 2024 at 03:26:37PM -0600, Jens Axboe wrote:
>>>>> On 10/29/24 2:06 PM, Jens Axboe wrote:
>>>>>> On 10/29/24 1:18 PM, Jens Axboe wrote:
>> ...
>>>>> +    node->buf = imu;
>>>>> +    node->kbuf_fn = kbuf_fn;
>>>>> +    return node;
>>>>
>>>> Also this function needs to register the buffer to table with one
>>>> pre-defined buf index, then the following request can use it by
>>>> the way of io_prep_rw_fixed().
>>>
>>> It should not register it with the table, the whole point is to keep
>>> this node only per-submission discoverable. If you're grabbing random
>>> request pages, then it very much is a bit finicky
>>
>> Registering it into the table has enough of design and flexibility
>> merits: error handling, allowing any type of dependencies of requests
>> by handling it in the user space, etc.
> 
> Right, but it has to be a special table. See my lengthier reply to Ming.

Mind pointing the specific part? I read through the thread and didn't
see why it _has_ to be a special table.
And by "special" I assume you mean the property of it being cleaned up
/ flushed by the end of submission / syscall, right?

> The initial POC did install it into a table, it's just a one-slot table,

By "table" I actually mean anything that survives beyond the current
syscall / submission and potentially can be used by requests submitted
with another syscall.

> io_submit_state. I think the right approach is to have an actual struct
> io_rsrc_data local_table in the ctx, with refs put at the end of submit.
> Same kind of concept, just allows for more entries (potentially), with
> the same requirement that nodes get put when submit ends. IOW, requests
> need to find it within the same submit.
> 
> Obviously you would not NEED to do that, but if the use case is grabbing
> bvecs out of a request, then it very much should not be discoverable
> past the initial assignments within that submit scope.
> 
>>> and needs to be of
>>> limited scope.
>>
>> And I don't think we can force it, neither with limiting exposure to
>> submission only nor with the Ming's group based approach. The user can
>> always queue a request that will never complete and/or by using
>> DEFER_TASKRUN and just not letting it run. In this sense it might be
>> dangerous to block requests of an average system shared block device,
>> but if it's fine with ublk it sounds like it should be fine for any of
>> the aforementioned approaches.
> 
> As long as the resource remains valid until the last put of the node,
> then it should be OK. Yes the application can mess things up in terms of

It should be fine in terms of buffers staying alive. The "dangerous"
part I mentioned is about abuse of a shared resource, e.g. one
container locking up all requests of a bdev so that another container
can't do any IO, maybe even with an fs on top. Nevertheless, it's ublk,
I don't think we need to concern about that much since io_uring is
on the other side from normal user space.

> latency if it uses one of these bufs for eg a read on a pipe that never
> gets any data, but the data will remain valid regardless. And that's
> very much a "doctor it hurts when I..." case, it should not cause any

Right, I care about malicious abuse when it can affect other users,
break isolation / fairness, etc., I'm saying that there is no
difference between all the approaches in this aspect, and if so
it should also be perfectly ok from the kernel's perspective to allow
to leave a buffer in the table long term. If the user wants to screw
itself and doesn't remove the buffer that's the user's choice to
shoot itself in the leg.

 From this angle, that I look at the auto removal you add not as some
security / etc. concern, but just as a QoL / performance feature so
that the user doesn't need to remove the buffer by hand.

FWIW, instead of having another table, we can just mark a sub range
of the main buffer table to be cleared every time after submission,
just like we separate auto slot allocation with ranges.

> safety issues. It'll just prevent progress for the other requests that
> are using that buffer, if they need the final put to happen before
> making progress.

-- 
Pavel Begunkov

  reply	other threads:[~2024-10-31 15:25 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-25 12:22 [PATCH V8 0/8] io_uring: support sqe group and leased group kbuf Ming Lei
2024-10-25 12:22 ` [PATCH V8 1/7] io_uring: add io_link_req() helper Ming Lei
2024-10-25 12:22 ` [PATCH V8 2/7] io_uring: add io_submit_fail_link() helper Ming Lei
2024-10-25 12:22 ` [PATCH V8 3/7] io_uring: add helper of io_req_commit_cqe() Ming Lei
2024-10-25 12:22 ` [PATCH V8 4/7] io_uring: support SQE group Ming Lei
2024-10-29  0:12   ` Jens Axboe
2024-10-29  1:50     ` Ming Lei
2024-10-29 16:38       ` Pavel Begunkov
2024-10-25 12:22 ` [PATCH V8 5/7] io_uring: support leased group buffer with REQ_F_GROUP_KBUF Ming Lei
2024-10-29 16:47   ` Pavel Begunkov
2024-10-30  0:45     ` Ming Lei
2024-10-30  1:25       ` Pavel Begunkov
2024-10-30  2:04         ` Ming Lei
2024-10-31 13:16           ` Pavel Begunkov
2024-10-25 12:22 ` [PATCH V8 6/7] io_uring/uring_cmd: support leasing device kernel buffer to io_uring Ming Lei
2024-10-25 12:22 ` [PATCH V8 7/7] ublk: support leasing io " Ming Lei
2024-10-29 17:01 ` [PATCH V8 0/8] io_uring: support sqe group and leased group kbuf Pavel Begunkov
2024-10-29 17:04   ` Jens Axboe
2024-10-29 19:18     ` Jens Axboe
2024-10-29 20:06       ` Jens Axboe
2024-10-29 21:26         ` Jens Axboe
2024-10-30  2:03           ` Ming Lei
2024-10-30  2:43             ` Jens Axboe
2024-10-30  3:08               ` Ming Lei
2024-10-30  4:11                 ` Ming Lei
2024-10-30 13:20                   ` Jens Axboe
2024-10-31  2:53                     ` Ming Lei
2024-10-31 13:35                       ` Jens Axboe
2024-10-31 15:07                         ` Jens Axboe
2024-10-31 13:42                       ` Pavel Begunkov
2024-10-30 13:18                 ` Jens Axboe
2024-10-31 13:25               ` Pavel Begunkov
2024-10-31 14:29                 ` Jens Axboe
2024-10-31 15:25                   ` Pavel Begunkov [this message]
2024-10-31 15:42                     ` Jens Axboe
2024-10-31 16:29                       ` Pavel Begunkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox