Re: [PATCH 6/8] io_uring/net: support multishot for send

public inbox for [email protected]
 help / color / mirror / Atom feed

From: Pavel Begunkov <[email protected]>
To: Jens Axboe <[email protected]>, Dylan Yudaken <[email protected]>
Cc: [email protected]
Subject: Re: [PATCH 6/8] io_uring/net: support multishot for send
Date: Wed, 28 Feb 2024 12:39:36 +0000	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>

On 2/26/24 21:27, Jens Axboe wrote:
> On 2/26/24 1:51 PM, Pavel Begunkov wrote:
>> On 2/26/24 20:12, Jens Axboe wrote:
>>> On 2/26/24 12:21 PM, Pavel Begunkov wrote:
>>>> On 2/26/24 19:11, Jens Axboe wrote:
>>>>> On 2/26/24 8:41 AM, Pavel Begunkov wrote:
>>>>>> On 2/26/24 15:16, Jens Axboe wrote:
>>>>>>> On 2/26/24 7:36 AM, Pavel Begunkov wrote:
>>>>>>>> On 2/26/24 14:27, Jens Axboe wrote:
>>>>>>>>> On 2/26/24 7:02 AM, Dylan Yudaken wrote:
>>>>>>>>>> On Mon, Feb 26, 2024 at 1:38?PM Jens Axboe
>> ...
>>>>> I don't think that's true - if you're doing large streaming, you're
>>>>> more likely to keep the socket buffer full, whereas for smallish
>>>>> sends, it's less likely to be full. Testing with the silly proxy
>>>>> confirms that. And
>>>>
>>>> I don't see any contradiction to what I said. With streaming/large
>>>> sends it's more likely to be polled. For small sends and
>>>> send-receive-send-... patterns the sock queue is unlikely to be full,
>>>> in which case the send is processed inline, and so the feature
>>>> doesn't add performance, as you agreed a couple email before.
>>>
>>> Gotcha, I guess I misread you, we agree that the poll side is more
>>> likely on bigger buffers.
>>>
>>>>> outside of cases where pacing just isn't feasible, it's extra
>>>>> overhead for cases where you potentially could or what.
>>>>
>>>> I lost it, what overhead?
>>>
>>> Overhead of needing to serialize the sends in the application, which may
>>> include both extra memory needed and overhead in dealing with it.
>>
>> I think I misread the code. Does it push 1 request for each
>> send buffer / queue_send() in case of provided bufs?
> 
> Right, that's the way it's currently setup. Per send (per loop), if
> you're using provided buffers, it'll do a send per buffer. If using
> multishot on top of that, it'll do one send per loop regardless of the
> number of buffers.

Ok, then I'd say the performance tests are misleading, at least
the proxy one. For selected buffers, sending tons of requests sounds
unnecessarily expensive, but I don't have numbers to back it. The
normal case must employ the natural batching it has with
serialisation.

It's important comparing programs that are optimised and close
to what is achievable in userspace. You wouldn't compare it with

while (i = 0; i < nr_bytes; i++)
	send(data+i, bytes=1);

and claim that it's N times faster. Or surely we can add an
empty "for" loop to one side and say "users are stupid and
will put garbage here anyway, so fair game", but then it
needs a note in small font "valid only for users who can't
write code up to standard"

static void defer_send() {
	ps = malloc(sizeof(*ps));
}

fwiw, maybe malloc is not really expensive there, but
that sounds like a pretty bad practice for a hi perf
benchmark.

>> Anyway, the overhead of serialisation would be negligent.
>> And that's same extra memory you keep for the provided buffer
>> pool, and you can allocate it once. Also consider that provided
>> buffers are fixed size and it'd be hard to resize without waiting,
>> thus the userspace would still need to have another, userspace
>> backlog, it can't just drop requests. Or you make provided queues
>> extra large, but it's per socket and you'd wasting lots of memory.
>>
>> IOW, I don't think this overhead could anyhow close us to
>> the understanding of the 30%+ perf gap.
> 
> The 32-byte case is obviously somewhat pathological, as you're going to
> be much better off having a bunch of these pipelined rather than issued
> serially. As you can see from the 1000 byte packets, at that point it
> doesn't matter that much. It's mostly about making it simpler at that
> point.
> 
>>>>> To me, the main appeal of this is the simplicity.
>>>>
>>>> I'd argue it doesn't seem any simpler than the alternative.
>>>
>>> It's certainly simpler for an application to do "add buffer to queue"
>>> and not need to worry about managing sends, than it is to manage a
>>> backlog of only having a single send active.
>>
>> They still need to manage / re-queue sends. And maybe I
>> misunderstand the point, but it's only one request inflight
>> per socket in either case.
> 
> Sure, but one is a manageable condition, the other one is not. If you
> can keep N inflight at the same time and only abort the chain in case of
> error/short send, that's a corner case. Versus not knowing when things
> get reordered, and hence always needing to serialize.

Manageable or not, you still have to implement all that, whereas
serialisation is not complex and I doubt it's anywhere expensive
enough to overturn the picture. It seems that multishot selected
bufs would also need serialisation, and for oneshots managing
multiple requests when you don't know which one sends what buffer
would be complicated in real scenarios.

>>> What kind of batching? The batching done by the tests are the same,
>>> regardless of whether or not send multishot is used in the sense that we
>>
>> You can say that, but I say that it moves into the kernel
>> batching that can be implemented in userspace.
> 
> And then most people get it wrong or just do the basic stuff, and
> performance isn't very good. Getting the most out of it can be tricky
> and require extensive testing and knowledge building. I'm confident
> you'd be able to write an efficient version, but that's not the same as
> saying "it's trivial to write an efficient version".

It actually _is_ trivial to write an efficient version for anyone
competent enough and having a spare day for that, which is usually
in a form of a library

I'm a firm believer of not putting into the kernel what can
already be well implemented in userspace, because the next step
could be "firefox can be implemented in userspace, but it requires
knowledge building, so let's implement it in the kernel". At
least I recall nginx / HTTP servers not flying well

>>> wait on the same number of completions. As it's a basic proxy kind of
>>> thing, it'll receive a packet and send a packet. Submission batching is
>>> the same too, we'll submit when we have to.
>>
>> "If you actually need to poll tx, you send a request and collect
>> data into iov in userspace in background. When the request
>> completes you send all that in batch..."
>>
>> That's how it's in Thrift for example.
>>
>> In terms of "proxy", the first approximation would be to
>> do sth like defer_send() for normal requests as well, then
>>
>> static void __queue_send(struct io_uring *ring, struct conn *c, int fd,
>>               void *data, int bid, int len)
>> {
>>      ...
>>
>>      defer_send(data);
>>
>>      while (buf = defer_backlog.get()) {
>>          iov[idx++] = buf;
>>      }
>>      msghdr->iovlen = idx;
>>      ...
>> }
> 
> Yep, that's the iovec coalescing, and that could certainly be done. And
> then you need to size the iov[] so that it's always big enough, OR
> submit that send and still deal with managing your own backlog.

Which is as trivial as iov.push_back() or a couple of lines with
realloc, whereas for selected buffers you would likely need to
wait for the previous request[s] to complete, which you cannot
do in place.

The point is, it seems that when you'd try to write something
error proof and real life ready, it wouldn't look simpler or
much simpler than the approach suggested, then the performance
is the question.

> I don't think we disagree that there are other solutions. I'm saying
> that I like this solution. I think it's simple to use for the cases that
> can use it, and that's why the patches exist. It fits with the notion of
> an async API being able to keep multiple things in flight, rather than a
> semi solution where you kind of can, except not for cases X and Y
> because of corner cases.
...

-- 
Pavel Begunkov

next prev parent reply	other threads:[~2024-02-28 13:02 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-25  0:35 [PATCHSET v3 0/8] Support for provided buffers for send Jens Axboe
2024-02-25  0:35 ` [PATCH 1/8] io_uring/net: unify how recvmsg and sendmsg copy in the msghdr Jens Axboe
2024-02-26 14:41   ` Pavel Begunkov
2024-02-26 15:03     ` Jens Axboe
2024-02-25  0:35 ` [PATCH 2/8] net: remove {revc,send}msg_copy_msghdr() from exports Jens Axboe
2024-02-25  0:35 ` [PATCH 3/8] io_uring/net: add provided buffer support for IORING_OP_SEND Jens Axboe
2024-02-25  0:35 ` [PATCH 4/8] io_uring/net: add provided buffer support for IORING_OP_SENDMSG Jens Axboe
2024-02-25  0:35 ` [PATCH 5/8] io_uring/kbuf: flag request if buffer pool is empty after buffer pick Jens Axboe
2024-02-25  0:35 ` [PATCH 6/8] io_uring/net: support multishot for send Jens Axboe
2024-02-26 10:47   ` Dylan Yudaken
2024-02-26 13:38     ` Jens Axboe
2024-02-26 14:02       ` Dylan Yudaken
2024-02-26 14:27         ` Jens Axboe
2024-02-26 14:36           ` Pavel Begunkov
2024-02-26 15:16             ` Jens Axboe
2024-02-26 15:41               ` Pavel Begunkov
2024-02-26 19:11                 ` Jens Axboe
2024-02-26 19:21                   ` Pavel Begunkov
2024-02-26 20:12                     ` Jens Axboe
2024-02-26 20:51                       ` Pavel Begunkov
2024-02-26 21:27                         ` Jens Axboe
2024-02-28 12:39                           ` Pavel Begunkov [this message]
2024-02-28 17:28                             ` Jens Axboe
2024-02-28 23:49                               ` Jens Axboe
2024-02-29  1:46                                 ` Jens Axboe
2024-02-29 15:42                                   ` Jens Axboe
2024-02-26 19:31           ` Dylan Yudaken
2024-02-26 19:49             ` Jens Axboe
2024-02-25  0:35 ` [PATCH 7/8] io_uring/net: support multishot for sendmsg Jens Axboe
2024-02-25  0:35 ` [PATCH 8/8] io_uring/net: set MSG_MORE if we're doing multishot send and have more Jens Axboe
2024-02-26 10:59   ` Dylan Yudaken
2024-02-26 13:42     ` Jens Axboe
2024-02-26 14:24       ` Pavel Begunkov
2024-02-26 14:52         ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox