public inbox for [email protected]
 help / color / mirror / Atom feed
From: Pavel Begunkov <[email protected]>
To: Jens Axboe <[email protected]>, David Wei <[email protected]>,
	[email protected], [email protected]
Cc: Jakub Kicinski <[email protected]>, Paolo Abeni <[email protected]>,
	"David S. Miller" <[email protected]>,
	Eric Dumazet <[email protected]>,
	Jesper Dangaard Brouer <[email protected]>,
	David Ahern <[email protected]>,
	Mina Almasry <[email protected]>
Subject: Re: [RFC PATCH v4 13/16] io_uring: add io_recvzc request
Date: Fri, 15 Mar 2024 23:52:02 +0000	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>

On 3/15/24 18:38, Jens Axboe wrote:
> On 3/15/24 11:34 AM, Pavel Begunkov wrote:
>> On 3/14/24 16:14, Jens Axboe wrote:
>> [...]
>>>>>> @@ -1053,6 +1058,85 @@ struct io_zc_rx_ifq *io_zc_verify_sock(struct io_kiocb *req,
>>>>>>         return ifq;
>>>>>>     }
>>>>>>     +int io_recvzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>>>>> +{
>>>>>> +    struct io_recvzc *zc = io_kiocb_to_cmd(req, struct io_recvzc);
>>>>>> +
>>>>>> +    /* non-iopoll defer_taskrun only */
>>>>>> +    if (!req->ctx->task_complete)
>>>>>> +        return -EINVAL;
>>>>>
>>>>> What's the reasoning behind this?
>>>>
>>>> CQ locking, see the comment a couple lines below
>>>
>>> My question here was more towards "is this something we want to do".
>>> Maybe this is just a temporary work-around and it's nothing to discuss,
>>> but I'm not sure we want to have opcodes only work on certain ring
>>> setups.
>>
>> I don't think it's that unreasonable restricting it. It's hard to
>> care about !DEFER_TASKRUN for net workloads, it makes CQE posting a bit
> 
> I think there's a distinction between "not reasonable to support because
> it's complicated/impossible to do so", and "we prefer not to support
> it". I agree, as a developer it's hard to care about !DEFER_TASKRUN for
> networking workloads, but as a user, they will just setup a default
> queue until they wise up. And maybe this can be a good thing in that

They'd still need to find a supported NIC and do all the other
setup, comparably to that it doesn't add much trouble. And my
usual argument is that io_uring is a low-level api, it's expected
that people interacting with it directly are experienced enough,
expect to spend some time to make it right and likely library
devs.

> they'd be nudged toward DEFER_TASKRUN, but I can also see some head
> scratching when something just returns (the worst of all error codes)
> -EINVAL when they attempt to use it.

Yeah, we should try to find a better error code, and the check
should migrate to ifq registration.

>> cleaner, and who knows where the single task part would become handy.
> 
> But you can still take advantage of single task, since you know if
> that's going to be true or not. It just can't be unconditional.
> 
>> Thinking about ifq termination, which should better cancel and wait
>> for all corresponding zc requests, it's should be easier without
>> parallel threads. E.g. what if another thread is in the enter syscall
>> using ifq, or running task_work and not cancellable. Then apart
>> from (non-atomic) refcounting, we'd need to somehow wait for it,
>> doing wake ups on the zc side, and so on.
> 
> I don't know, not seeing a lot of strong arguments for making it
> DEFER_TASKRUN only. My worry is that once we starting doing that, then
> more will follow. And honestly I think that would be a shame.
> 
> For ifq termination, surely these things are referenced, and termination
> would need to wait for the last reference to drop? And if that isn't an
> expected condition (it should not be), then a percpu ref would suffice.
> Nobody cares if the teardown side is more expensive, as long as the fast
> path is efficient.

You can solve any of that, it's true, the question how much crap
you'd need to add in hot paths and diffstat wise. Just take a look
at what a nice function io_recvmsg() is together with its helpers
like io_recvmsg_multishot().

The biggest concern is optimisations and quirks that we can't
predict at the moment. DEFER_TASKRUN/SINGLE_ISSUER provide a simpler
model, I'd rather keep recvzc simple than having tens of conditional
optimisations with different execution flavours and contexts.
Especially, since it can be implemented later, wouldn't work the
other way around.

> Dunno - anyway, for now let's just leave it as-is, it's just something
> to consider once we get closer to a more finished patchset.
> 
>> The CQ side is easy to support though, put conditional locking
>> around the posting like fill/post_cqe does with the todays
>> patchset.
> 
> Yep, which is one of the reasons why I was hopeful this could go away!
> 

-- 
Pavel Begunkov

  reply	other threads:[~2024-03-15 23:53 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-12 21:44 [RFC PATCH v4 00/16] Zero copy Rx using io_uring David Wei
2024-03-12 21:44 ` [RFC PATCH v4 01/16] net: generalise pp provider params passing David Wei
2024-03-12 21:44 ` [RFC PATCH v4 02/16] io_uring: delayed cqe commit David Wei
2024-03-12 21:44 ` [RFC PATCH v4 03/16] net: page_pool: add ->scrub mem provider callback David Wei
2024-03-12 21:44 ` [RFC PATCH v4 04/16] io_uring: separate header for exported net bits David Wei
2024-03-12 21:44 ` [RFC PATCH v4 05/16] io_uring: introduce interface queue David Wei
2024-03-12 21:44 ` [RFC PATCH v4 06/16] io_uring: add mmap support for shared ifq ringbuffers David Wei
2024-03-12 21:44 ` [RFC PATCH v4 07/16] netdev: add XDP_SETUP_ZC_RX command David Wei
2024-03-12 21:44 ` [RFC PATCH v4 08/16] io_uring: setup ZC for an Rx queue when registering an ifq David Wei
2024-03-12 21:44 ` [RFC PATCH v4 09/16] io_uring/zcrx: implement socket registration David Wei
2024-03-12 21:44 ` [RFC PATCH v4 10/16] io_uring: add zero copy buf representation and pool David Wei
2024-03-12 21:44 ` [RFC PATCH v4 11/16] io_uring: implement pp memory provider for zc rx David Wei
2024-03-12 21:44 ` [RFC PATCH v4 12/16] io_uring/zcrx: implement PP_FLAG_DMA_* handling David Wei
2024-03-12 21:44 ` [RFC PATCH v4 13/16] io_uring: add io_recvzc request David Wei
2024-03-13 20:25   ` Jens Axboe
2024-03-13 20:26     ` Pavel Begunkov
2024-03-13 21:03       ` Jens Axboe
2024-03-14 16:14       ` Jens Axboe
2024-03-15 17:34         ` Pavel Begunkov
2024-03-15 18:38           ` Jens Axboe
2024-03-15 23:52             ` Pavel Begunkov [this message]
2024-03-16 16:59               ` Jens Axboe
2024-03-17 21:22                 ` Pavel Begunkov
2024-03-17 21:30                   ` Jens Axboe
2024-03-12 21:44 ` [RFC PATCH v4 14/16] net: execute custom callback from napi David Wei
2024-03-12 21:44 ` [RFC PATCH v4 15/16] io_uring/zcrx: add copy fallback David Wei
2024-03-12 21:44 ` [RFC PATCH v4 16/16] veth: add support for io_uring zc rx David Wei

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox