Re: [RFC PATCH v3 15/20] io_uring: add io_recvzc request

public inbox for [email protected]
 help / color / mirror / Atom feed

From: Pavel Begunkov <[email protected]>
To: Jens Axboe <[email protected]>, David Wei <[email protected]>,
	[email protected], [email protected]
Cc: Jakub Kicinski <[email protected]>, Paolo Abeni <[email protected]>,
	"David S. Miller" <[email protected]>,
	Eric Dumazet <[email protected]>,
	Jesper Dangaard Brouer <[email protected]>,
	David Ahern <[email protected]>,
	Mina Almasry <[email protected]>
Subject: Re: [RFC PATCH v3 15/20] io_uring: add io_recvzc request
Date: Sat, 30 Dec 2023 21:15:53 +0000	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>

On 12/21/23 21:32, Jens Axboe wrote:
> On 12/21/23 11:59 AM, Pavel Begunkov wrote:
>> On 12/20/23 18:09, Jens Axboe wrote:
>>> On 12/20/23 10:04 AM, Pavel Begunkov wrote:
>>>> On 12/20/23 16:27, Jens Axboe wrote:
>>>>> On 12/19/23 2:03 PM, David Wei wrote:
>>>>>> diff --git a/io_uring/net.c b/io_uring/net.c
>>>>>> index 454ba301ae6b..7a2aadf6962c 100644
>>>>>> --- a/io_uring/net.c
>>>>>> +++ b/io_uring/net.c
[...]
>>>>>> +    if (issue_flags & IO_URING_F_UNLOCKED)
>>>>>> +        return -EAGAIN;
>>>>>
>>>>> This seems odd, why? If we're called with IO_URING_F_UNLOCKED set, then
>>>>
>>>> It's my addition, let me explain.
>>>>
>>>> io_recvzc() -> io_zc_rx_recv() -> ... -> zc_rx_recv_frag()
>>>>
>>>> This chain posts completions to a buffer completion queue, and
>>>> we don't want extra locking to share it with io-wq threads. In
>>>> some sense it's quite similar to the CQ locking, considering
>>>> we restrict zc to DEFER_TASKRUN. And doesn't change anything
>>>> anyway because multishot cannot post completions from io-wq
>>>> and are executed from the poll callback in task work.
>>>>
>>>>> it's from io-wq. And returning -EAGAIN there will not do anything to
>>>>
>>>> It will. It's supposed to just requeue for polling (it's not
>>>> IOPOLL to keep retrying -EAGAIN), just like multishots do.
>>>
>>> It definitely needs a good comment, as it's highly non-obvious when
>>> reading the code!
>>>
>>>> Double checking the code, it can actually terminate the request,
>>>> which doesn't make much difference for us because multishots
>>>> should normally never end up in io-wq anyway, but I guess we
>>>> can improve it a liitle bit.
>>>
>>> Right, assumptions seems to be that -EAGAIN will lead to poll arm, which
>>> seems a bit fragile.
>>
>> The main assumption is that io-wq will eventually leave the
>> request alone and push it somewhere else, either queuing for
>> polling or terminating, which is more than reasonable. I'd
> 
> But surely you don't want it terminated from here? It seems like a very
> odd choice. As it stands, if you end up doing more than one loop, then
> it won't arm poll and it'll get failed.
>> add that it's rather insane for io-wq indefinitely spinning
>> on -EAGAIN, but it has long been fixed (for !IOPOLL).
> 
> There's no other choice for polling, and it doesn't do it for

zc rx is !IOPOLL, that's what I care about.

> non-polling. The current logic makes sense - if you do a blocking
> attempt and you get -EAGAIN, that's really the final result and you
> cannot sanely retry for !IOPOLL in that case. Before we did poll arm for
> io-wq, even the first -EAGAIN would've terminated it. Relying on -EAGAIN
> from a blocking attempt to do anything but fail the request with -EAGAIN
> res is pretty fragile and odd, I think that needs sorting out.
> 
>> As said, can be made a bit better, but it won't change anything
>> for real life execution, multishots would never end up there
>> after they start listening for poll events.
> 
> Right, probably won't ever be a thing for !multishot. As mentioned in my
> original reply, it really just needs a comment explaining exactly what
> it's doing and why it's fine.
> 
>>>> And it should also use IO_URING_F_IOWQ, forgot I split out
>>>> it from *F_UNLOCK.
>>>
>>> Yep, that'd be clearer.
>>
>> Not "clearer", but more correct. Even though it's not
>> a bug because deps between the flags.
> 
> Both clearer and more correct, I would say.
> 
[...]
>>>
>>> Oracle coding?
>>
>> I.e. knowing how later patches (should) look like.
>>
>>> Each patch stands separately, there's no reason not to make this one as
>>
>> They are not standalone, you cannot sanely develop anything not
>> thinking how and where it's used, otherwise you'd get a set of
>> functions full of sleeping but later used in irq context or just
>> unfittable into a desired framework. By extent, code often is
>> written while trying to look a step ahead. For example, first
>> patches don't push everything into io_uring.c just to wholesale
>> move it into zc_rx.c because of exceeding some size threshold.
> 
> Yes, this is how most patch series are - they will compile separately,
> but obviously won't necessarily make sense or be functional until you
> get to the end. But since you very much do have future knowledge in
> these patches, there's no excuse for not making them interact with each
> other better. Each patch should not pretend it doesn't know what comes

Which is exactly the reason why it is how it is.

> next in a series, if you can make a followup patch simpler with a tweak
> to a previous patch, that is definitely a good idea.
> 
> And here, even the end result would be better imho without having
> 
> if (a) {
> 	big blob of stuff
> } else {
> 	other blob of stuff
> }
> 
> when it could just be
> 
> if (a)
> 	return big_blog_of_stuff();
> 
> return other_blog_of_stuff();
> 
> instead.

That sounds like a good general advice, but the "blobs" are
not big nor expected to grow to require splitting, I can't say
it makes it any cleaner or simpler.

>>> clean as it can be. And an error case with the main bits inline is a lot
>>
>> I agree that it should be clean among all, but it _is_ clean
>> and readable, all else is stylistic nit picking. And maybe it's
>> just my opinion, but I also personally appreciate when a patch is
>> easy to review, which includes not restructuring all written before
>> with every patch, which also helps with back porting and other
>> developing aspects.
> 
> But that's basically my point, it even makes followup patches simpler to
> read as well. Is it stylistic? Certainly, I just prefer having the above
> rather than two big indentations. But it also makes the followup patch
> simpler
> and it's basically a one-liner change at that point, and a
> bigger hunk of added code that's the new function that handles the new
> case.
> 
>>> nicer imho than two separate indented parts. For the latter addition
>>> instead of the -EOPNOTSUPP, would probably be nice to have it in a
>>> separate function. Probably ditto for the page pool case here now, would
>>> make the later patch simpler too.
>>
>> If we'd need it in the future, we'll change it then, patches
>> stand separately, at least it's IMHO not needed in the current
>> series.
> 
> It's still an RFC series, please do change it for v4.

It's not my patch, but I don't view it as moving the patches in
any positive direction.

-- 
Pavel Begunkov

next prev parent reply	other threads:[~2023-12-30 21:17 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-19 21:03 [RFC PATCH v3 00/20] Zero copy Rx using io_uring David Wei
2023-12-19 21:03 ` [RFC PATCH v3 01/20] net: page_pool: add ppiov mangling helper David Wei
2023-12-19 23:22   ` Mina Almasry
2023-12-19 23:59     ` Pavel Begunkov
2023-12-19 21:03 ` [RFC PATCH v3 02/20] tcp: don't allow non-devmem originated ppiov David Wei
2023-12-19 23:24   ` Mina Almasry
2023-12-20  1:29     ` Pavel Begunkov
2024-01-02 16:11       ` Mina Almasry
2023-12-19 21:03 ` [RFC PATCH v3 03/20] net: page pool: rework ppiov life cycle David Wei
2023-12-19 23:35   ` Mina Almasry
2023-12-20  0:49     ` Pavel Begunkov
2023-12-19 21:03 ` [RFC PATCH v3 04/20] net: enable napi_pp_put_page for ppiov David Wei
2023-12-19 21:03 ` [RFC PATCH v3 05/20] net: page_pool: add ->scrub mem provider callback David Wei
2023-12-19 21:03 ` [RFC PATCH v3 06/20] io_uring: separate header for exported net bits David Wei
2023-12-20 16:01   ` Jens Axboe
2023-12-19 21:03 ` [RFC PATCH v3 07/20] io_uring: add interface queue David Wei
2023-12-20 16:13   ` Jens Axboe
2023-12-20 16:23     ` Pavel Begunkov
2023-12-21  1:44     ` David Wei
2023-12-21 17:57   ` Willem de Bruijn
2023-12-30 16:25     ` Pavel Begunkov
2023-12-31 22:25       ` Willem de Bruijn
2023-12-19 21:03 ` [RFC PATCH v3 08/20] io_uring: add mmap support for shared ifq ringbuffers David Wei
2023-12-20 16:13   ` Jens Axboe
2023-12-19 21:03 ` [RFC PATCH v3 09/20] netdev: add XDP_SETUP_ZC_RX command David Wei
2023-12-19 21:03 ` [RFC PATCH v3 10/20] io_uring: setup ZC for an Rx queue when registering an ifq David Wei
2023-12-20 16:06   ` Jens Axboe
2023-12-20 16:24     ` Pavel Begunkov
2023-12-19 21:03 ` [RFC PATCH v3 11/20] io_uring/zcrx: implement socket registration David Wei
2023-12-19 21:03 ` [RFC PATCH v3 12/20] io_uring: add ZC buf and pool David Wei
2023-12-19 21:03 ` [RFC PATCH v3 13/20] io_uring: implement pp memory provider for zc rx David Wei
2023-12-19 23:44   ` Mina Almasry
2023-12-20  0:39     ` Pavel Begunkov
2023-12-21 19:36   ` Pavel Begunkov
2023-12-19 21:03 ` [RFC PATCH v3 14/20] net: page pool: add io_uring memory provider David Wei
2023-12-19 23:39   ` Mina Almasry
2023-12-20  0:04     ` Pavel Begunkov
2023-12-19 21:03 ` [RFC PATCH v3 15/20] io_uring: add io_recvzc request David Wei
2023-12-20 16:27   ` Jens Axboe
2023-12-20 17:04     ` Pavel Begunkov
2023-12-20 18:09       ` Jens Axboe
2023-12-21 18:59         ` Pavel Begunkov
2023-12-21 21:32           ` Jens Axboe
2023-12-30 21:15             ` Pavel Begunkov [this message]
2023-12-19 21:03 ` [RFC PATCH v3 16/20] net: execute custom callback from napi David Wei
2023-12-19 21:03 ` [RFC PATCH v3 17/20] io_uring/zcrx: add copy fallback David Wei
2023-12-19 21:03 ` [RFC PATCH v3 18/20] veth: add support for io_uring zc rx David Wei
2023-12-19 21:03 ` [RFC PATCH v3 19/20] net: page pool: generalise ppiov dma address get David Wei
2023-12-21 19:51   ` Mina Almasry
2023-12-19 21:03 ` [RFC PATCH v3 20/20] bnxt: enable io_uring zc page pool David Wei

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox