From: Pavel Begunkov <[email protected]>
To: Jens Axboe <[email protected]>, Dylan Yudaken <[email protected]>
Cc: [email protected]
Subject: Re: [PATCH 6/8] io_uring/net: support multishot for send
Date: Mon, 26 Feb 2024 20:51:15 +0000
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>

On 2/26/24 20:12, Jens Axboe wrote:
> On 2/26/24 12:21 PM, Pavel Begunkov wrote:
>> On 2/26/24 19:11, Jens Axboe wrote:
>>> On 2/26/24 8:41 AM, Pavel Begunkov wrote:
>>>> On 2/26/24 15:16, Jens Axboe wrote:
>>>>> On 2/26/24 7:36 AM, Pavel Begunkov wrote:
>>>>>> On 2/26/24 14:27, Jens Axboe wrote:
>>>>>>> On 2/26/24 7:02 AM, Dylan Yudaken wrote:
>>>>>>>> On Mon, Feb 26, 2024 at 1:38 PM Jens Axboe
...
>>> I don't think that's true - if you're doing large streaming, you're
>>> more likely to keep the socket buffer full, whereas for smallish
>>> sends, it's less likely to be full. Testing with the silly proxy
>>> confirms that. And
>>
>> I don't see any contradiction to what I said. With streaming/large
>> sends it's more likely to be polled. For small sends and
>> send-receive-send-... patterns the sock queue is unlikely to be full,
>> in which case the send is processed inline, and so the feature
>> doesn't add performance, as you agreed a couple of emails before.
> 
> Gotcha, I guess I misread you, we agree that the poll side is more
> likely on bigger buffers.
> 
>>> outside of cases where pacing just isn't feasible, it's extra
>>> overhead for cases where you potentially could or what.
>>
>> I lost it, what overhead?
> 
> Overhead of needing to serialize the sends in the application, which may
> include both extra memory needed and overhead in dealing with it.

I think I misread the code. Does it push 1 request for each
send buffer / queue_send() in case of provided bufs?

Anyway, the overhead of serialisation would be negligible.
And it's the same extra memory you keep for the provided buffer
pool, and you can allocate it once. Also consider that provided
buffers are fixed size and it'd be hard to resize them without
waiting, so the userspace would still need to keep another,
userspace backlog; it can't just drop requests. Or you make the
provided queues extra large, but that's per socket and you'd be
wasting lots of memory.

IOW, I don't think this overhead gets us anywhere close to
explaining the 30%+ perf gap.
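To illustrate the backlog point, a rough sketch (buf_ring_has_space(),
backlog_push() and the conn fields are made up, not from the proxy
code): once the fixed-size provided ring is full, the app has to park
buffers somewhere in userspace anyway:

static void queue_for_send(struct conn *c, void *data, int len)
{
	if (buf_ring_has_space(c->br)) {
		/* room left in the fixed-size provided buffer ring */
		io_uring_buf_ring_add(c->br, data, len,
				      c->next_bid++ & (c->nr_bufs - 1),
				      io_uring_buf_ring_mask(c->nr_bufs), 0);
		io_uring_buf_ring_advance(c->br, 1);
	} else {
		/* ring exhausted and can't be resized without waiting,
		 * so fall back to a userspace backlog */
		backlog_push(&c->backlog, data, len);
	}
}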
  
>>> To me, the main appeal of this is the simplicity.
>>
>> I'd argue it doesn't seem any simpler than the alternative.
> 
> It's certainly simpler for an application to do "add buffer to queue"
> and not need to worry about managing sends, than it is to manage a
> backlog of only having a single send active.

They still need to manage / re-queue sends. And maybe I
misunderstand the point, but it's only one request inflight
per socket in either case.
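For comparison, the userspace-serialised alternative is roughly the
following (only a sketch: the conn fields, struct buf and the backlog
helpers are hypothetical, error handling omitted):

static void send_or_queue(struct io_uring *ring, struct conn *c,
			  void *data, int len)
{
	struct io_uring_sqe *sqe;

	if (c->send_inflight) {
		/* a send is already queued for this socket, back it up */
		backlog_push(&c->backlog, data, len);
		return;
	}
	sqe = io_uring_get_sqe(ring);
	io_uring_prep_send(sqe, c->fd, data, len, 0);
	io_uring_sqe_set_data(sqe, c);
	c->send_inflight = true;
}

static void handle_send_cqe(struct io_uring *ring, struct conn *c)
{
	struct buf *next;

	c->send_inflight = false;
	/* issue the next queued buffer, if any */
	next = backlog_pop(&c->backlog);
	if (next)
		send_or_queue(ring, c, next->data, next->len);
}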
  

>>>>> serialize sends. Using provided buffers makes this very easy,
>>>>> as you don't need to care about it at all, and it eliminates
>>>>> complexity in the application dealing with this.
>>>>
>>>> If I'm correct the example also serialises sends(?). I don't
>>>> think it's that much simpler. You batch, you send. Same with this, but
>>>> batch into a provided buffer and the send is conditional.
>>>
>>> Do you mean the proxy example? Just want to be sure we're talking
>>> about
>>
>> Yes, proxy, the one you referenced in the CV. And FWIW, I don't think
>> it's a fair comparison without batching followed by multi-iov.
> 
> It's not about vectored vs non-vectored IO, though you could of course
> need to allocate an arbitrarily sized iovec that you can append to. And
> now you need to use sendmsg rather than just send, which has further
> overhead on top of send.

That's not nearly enough overhead to explain the difference, I don't
believe so; going through the net stack is quite expensive.

> What kind of batching? The batching done by the tests are the same,
> regardless of whether or not send multishot is used in the sense that we

You can say that, but I'd say that it moves into the kernel
batching that could be implemented in userspace.

> wait on the same number of completions. As it's a basic proxy kind of
> thing, it'll receive a packet and send a packet. Submission batching is
> the same too, we'll submit when we have to.

"If you actually need to poll tx, you send a request and collect
data into iov in userspace in background. When the request
completes you send all that in batch..."

That's how it's done in Thrift, for example.

In terms of "proxy", the first approximation would be to
do sth like defer_send() for normal requests as well, then

static void __queue_send(struct io_uring *ring, struct conn *c, int fd,
			 void *data, int bid, int len)
{
	...

	/* always stash the new buffer on the userspace backlog */
	defer_send(data);

	/* then drain whatever has accumulated into a single iovec */
	while ((buf = defer_backlog.get())) {
		iov[idx++] = buf;
	}
	msghdr->msg_iovlen = idx;
	...
}
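...and the completion side would flush whatever accumulated as one
vectored send, something like this (again only a sketch,
defer_backlog_get(), struct buf and the conn fields are made up):

static void handle_send_completion(struct io_uring *ring, struct conn *c)
{
	struct io_uring_sqe *sqe;
	struct buf *buf;
	int idx = 0;

	/* gather everything that queued up while the send was inflight */
	while ((buf = defer_backlog_get(c))) {
		c->iov[idx].iov_base = buf->data;
		c->iov[idx].iov_len = buf->len;
		idx++;
	}
	if (!idx)
		return;

	c->msghdr.msg_iov = c->iov;
	c->msghdr.msg_iovlen = idx;

	/* one sendmsg covering the whole batch */
	sqe = io_uring_get_sqe(ring);
	io_uring_prep_sendmsg(sqe, c->fd, &c->msghdr, 0);
	io_uring_sqe_set_data(sqe, c);
}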

>>> the same thing. Yes it has to serialize sends, because otherwise we
>>> can run into the condition described in the patch that adds
>>> provided buffer support for send. But I did bench multishot
>>> separately from there, here's some of it:
>>>
>>> 10G network, 3 hosts, 1 acting as a mirror proxy shuffling N-byte
>>> packets. Send ring and send multishot not used:
>>>
>>> Pkt sz | Send ring | mshot |  usec  |  QPS  |  Bw
>>> =====================================================
>>> 1000   |    No     |  No   |   437  | 1.22M | 9598M
>>> 32     |    No     |  No   |  5856  | 2.87M |  734M
>>>
>>> Same test, now turn on send ring:
>>>
>>> Pkt sz | Send ring | mshot |  usec  |  QPS  |  Bw   | Diff
>>> ===========================================================
>>> 1000   |    Yes    |  No   |   436  | 1.23M | 9620M | + 0.2%
>>> 32     |    Yes    |  No   |  3462  | 4.85M | 1237M | +68.5%
>>>
>>> Same test, now turn on send mshot as well:
>>>
>>> Pkt sz | Send ring | mshot |  usec  |  QPS  |  Bw   | Diff
>>> ===========================================================
>>> 1000   |    Yes    |  Yes  |   436  | 1.23M | 9620M | + 0.2%
>>> 32     |    Yes    |  Yes  |  3125  | 5.37M | 1374M | +87.2%
>>>
>>> which does show that there's another win on top for just queueing
>>> these sends and doing a single send to handle them, rather than
>>> needing to prepare a send for each buffer. Part of that may be that
>>> you simply run out of SQEs and then have to submit regardless of
>>> where you are at.
>>
>> How many sockets did you test with? It's 1 SQE per sock max
> 
> The above is just one, but I've run it with a lot more sockets. Nothing
> like thousands, but 64-128.
> 
>> +87% sounds like a huge difference, and I don't understand where it
>> comes from, hence the question
> 
> There are several things:
> 
> 1) Fact is that the app has to serialize sends for the unlikely case
>     of sends being reordered because of the condition outlined in the
>     patch that enables provided buffer support for send. This is the
>     largest win, particularly with smaller packets, as it ruins the
>     send pipeline.

Do those small packets force it to poll?

> 2) We're posting fewer SQEs. That's the multishot win. Obviously not
>     as large, but it does help.
> 
> People have asked in the past on how to serialize sends, and I've had to
> tell them that it isn't really possible. The only option we had was
> using drain or links, which aren't ideal nor very flexible. Using
> provided buffers finally gives the application a way to do that without
> needing to do anything really. Does every application need it? Certainly
> not, but for the ones that do, I do think it provides a great
> alternative that's better performing than doing single sends one at a
> time.

As per the note above on the additional userspace backlog, any real
generic app, and especially libs, would need to do more to support it.

-- 
Pavel Begunkov


Thread overview: 34+ messages
2024-02-25  0:35 [PATCHSET v3 0/8] Support for provided buffers for send Jens Axboe
2024-02-25  0:35 ` [PATCH 1/8] io_uring/net: unify how recvmsg and sendmsg copy in the msghdr Jens Axboe
2024-02-26 14:41   ` Pavel Begunkov
2024-02-26 15:03     ` Jens Axboe
2024-02-25  0:35 ` [PATCH 2/8] net: remove {revc,send}msg_copy_msghdr() from exports Jens Axboe
2024-02-25  0:35 ` [PATCH 3/8] io_uring/net: add provided buffer support for IORING_OP_SEND Jens Axboe
2024-02-25  0:35 ` [PATCH 4/8] io_uring/net: add provided buffer support for IORING_OP_SENDMSG Jens Axboe
2024-02-25  0:35 ` [PATCH 5/8] io_uring/kbuf: flag request if buffer pool is empty after buffer pick Jens Axboe
2024-02-25  0:35 ` [PATCH 6/8] io_uring/net: support multishot for send Jens Axboe
2024-02-26 10:47   ` Dylan Yudaken
2024-02-26 13:38     ` Jens Axboe
2024-02-26 14:02       ` Dylan Yudaken
2024-02-26 14:27         ` Jens Axboe
2024-02-26 14:36           ` Pavel Begunkov
2024-02-26 15:16             ` Jens Axboe
2024-02-26 15:41               ` Pavel Begunkov
2024-02-26 19:11                 ` Jens Axboe
2024-02-26 19:21                   ` Pavel Begunkov
2024-02-26 20:12                     ` Jens Axboe
2024-02-26 20:51                       ` Pavel Begunkov [this message]
2024-02-26 21:27                         ` Jens Axboe
2024-02-28 12:39                           ` Pavel Begunkov
2024-02-28 17:28                             ` Jens Axboe
2024-02-28 23:49                               ` Jens Axboe
2024-02-29  1:46                                 ` Jens Axboe
2024-02-29 15:42                                   ` Jens Axboe
2024-02-26 19:31           ` Dylan Yudaken
2024-02-26 19:49             ` Jens Axboe
2024-02-25  0:35 ` [PATCH 7/8] io_uring/net: support multishot for sendmsg Jens Axboe
2024-02-25  0:35 ` [PATCH 8/8] io_uring/net: set MSG_MORE if we're doing multishot send and have more Jens Axboe
2024-02-26 10:59   ` Dylan Yudaken
2024-02-26 13:42     ` Jens Axboe
2024-02-26 14:24       ` Pavel Begunkov
2024-02-26 14:52         ` Jens Axboe
