public inbox for [email protected]
From: Pavel Begunkov <[email protected]>
To: Jens Axboe <[email protected]>, [email protected]
Subject: Re: [PATCHSET 0/3] Improve MSG_RING SINGLE_ISSUER performance
Date: Tue, 28 May 2024 14:31:39 +0100
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>

On 5/24/24 23:58, Jens Axboe wrote:
> Hi,
> 
> A ring setup with IORING_SETUP_SINGLE_ISSUER, which is required to

IORING_SETUP_SINGLE_ISSUER has nothing to do with it; this is
specifically an IORING_SETUP_DEFER_TASKRUN optimisation.
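
For context: the kernel only accepts IORING_SETUP_DEFER_TASKRUN in
combination with IORING_SETUP_SINGLE_ISSUER, so any DEFER_TASKRUN
ring is necessarily also a SINGLE_ISSUER ring. A minimal liburing
sketch of that setup (error handling trimmed):

	#include <liburing.h>

	static int make_ring(struct io_uring *ring)
	{
		/*
		 * DEFER_TASKRUN requires SINGLE_ISSUER; requesting it
		 * on its own fails with -EINVAL.
		 */
		return io_uring_queue_init(64, ring,
					   IORING_SETUP_SINGLE_ISSUER |
					   IORING_SETUP_DEFER_TASKRUN);
	}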

> use IORING_SETUP_DEFER_TASKRUN, will need two round trips through
> generic task_work. This isn't ideal. This patchset attempts to rectify
> that, taking a new approach rather than trying to use the io_uring
> task_work infrastructure to handle it as in previous postings.

I'm not sure why you'd want to piggyback onto overflows; it's not
a particularly well-made or reliable piece of infrastructure,
whereas the DEFER_TASKRUN part of the task_work approach was fine.

The completion path doesn't usually look at the overflow list; it
relies on cached CQE pointers to tell whether the CQ is full. That
means that after you queue an overflow, someone may post a CQE into
the CQ via the normal path and you get reordering. Not that bad
considering it's from another ring, but it's a bit nasty and will
surely bite us back in the future; it always does.

That's assuming you decide io_msg_need_remote() solely based on
->task_complete and remove

	return current != target_ctx->submitter_task;

Otherwise you can get two linked msg_ring target CQEs reordered.
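
Spelled out, that's the difference between (a sketch, not the
literal patch):

	static bool io_msg_need_remote(struct io_ring_ctx *target_ctx)
	{
		/* always take the remote path for DEFER_TASKRUN
		 * (->task_complete) targets */
		return target_ctx->task_complete;
	}

and the version keeping the submitter check:

	static bool io_msg_need_remote(struct io_ring_ctx *target_ctx)
	{
		/* post directly when we already are the target's
		 * submitter task -- that direct post is what can race
		 * with already-queued overflows and reorder CQEs */
		return target_ctx->task_complete &&
		       current != target_ctx->submitter_task;
	}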

It also duplicates that crappy overflow code nobody cares much
about, and it's still a forced wake-up with no batching,
circumventing the normal wake-up path, i.e. io_uring task_work.

I don't think we should care about request completion latency
(sender latency); people should be more interested in the reaction
speed (receiver latency). But if you do care about it for a reason,
perhaps you could just as well allocate a new request and complete
the previous one right away.
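
Roughly, the shape of that idea, with loudly hypothetical helpers
(io_msg_clone_req() and io_msg_queue_remote() don't exist; this is
only to illustrate the ordering):

	static int io_msg_ring_send(struct io_kiocb *req)
	{
		/* hypothetical: duplicate the request state */
		struct io_kiocb *clone = io_msg_clone_req(req);

		if (!clone)
			return -ENOMEM;

		/* hand the clone to the target ring's task as the
		 * patches already do (hypothetical helper) */
		io_msg_queue_remote(clone);

		/* complete the original right away, so the sender's
		 * CQE doesn't wait for the target-side round trip */
		io_req_set_res(req, 0, 0);
		return IOU_OK;
	}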

> In a sample test app that has one thread send messages to another,
> logging both the time between the sender sending and the receiver
> receiving, and just the time for the sender to post a message and
> get the CQE back, I see the following sender latencies with the
> stock kernel:
> 
> Latencies for: Sender
>      percentiles (nsec):
>       |  1.0000th=[ 4384],  5.0000th=[ 4512], 10.0000th=[ 4576],
>       | 20.0000th=[ 4768], 30.0000th=[ 4896], 40.0000th=[ 5024],
>       | 50.0000th=[ 5088], 60.0000th=[ 5152], 70.0000th=[ 5280],
>       | 80.0000th=[ 5344], 90.0000th=[ 5536], 95.0000th=[ 5728],
>       | 99.0000th=[ 8032], 99.5000th=[18048], 99.9000th=[21376],
>       | 99.9500th=[26496], 99.9900th=[59136]
> 
> and with the patches:
> 
> Latencies for: Sender
>      percentiles (nsec):
>       |  1.0000th=[  756],  5.0000th=[  820], 10.0000th=[  828],
>       | 20.0000th=[  844], 30.0000th=[  852], 40.0000th=[  852],
>       | 50.0000th=[  860], 60.0000th=[  860], 70.0000th=[  868],
>       | 80.0000th=[  884], 90.0000th=[  964], 95.0000th=[  988],
>       | 99.0000th=[ 1128], 99.5000th=[ 1208], 99.9000th=[ 1544],
>       | 99.9500th=[ 1944], 99.9900th=[ 2896]
> 
> For the receiving side the win is smaller, as it only "suffers"
> from a single generic task_work; it's about a 10% win in latencies
> there.
> 
> The idea here is to utilize the CQE overflow infrastructure for this,
> as that allows the right task to post the CQE to the ring.
> 
> Patch 1 is a basic refactoring prep patch, patch 2 adds support for
> normal messages, and patch 3 adopts the same approach for fd
> passing.
> 
>   io_uring/msg_ring.c | 151 ++++++++++++++++++++++++++++++++++++++++----
>   1 file changed, 138 insertions(+), 13 deletions(-)
> 
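
As an aside, the quoted sender-latency measurement can be
approximated with a few lines of liburing. A sketch, under stated
assumptions: the real test app presumably creates the target ring
with SINGLE_ISSUER|DEFER_TASKRUN in a second thread and also times
the receiver side; here both rings live in one task, error handling
is trimmed, and io_uring_prep_msg_ring() needs liburing 2.2+:

	#include <liburing.h>
	#include <stdio.h>
	#include <time.h>

	static long long now_ns(void)
	{
		struct timespec ts;

		clock_gettime(CLOCK_MONOTONIC, &ts);
		return ts.tv_sec * 1000000000LL + ts.tv_nsec;
	}

	int main(void)
	{
		struct io_uring src, dst;
		struct io_uring_sqe *sqe;
		struct io_uring_cqe *cqe;
		long long t0;

		io_uring_queue_init(8, &src, 0);
		io_uring_queue_init(8, &dst, 0);

		sqe = io_uring_get_sqe(&src);
		/* post a CQE with user_data 0x1234 into dst's CQ */
		io_uring_prep_msg_ring(sqe, dst.ring_fd, 0, 0x1234, 0);

		t0 = now_ns();
		io_uring_submit(&src);
		io_uring_wait_cqe(&src, &cqe);	/* sender-side CQE */
		printf("sender latency: %lld ns\n", now_ns() - t0);
		io_uring_cqe_seen(&src, cqe);

		io_uring_queue_exit(&src);
		io_uring_queue_exit(&dst);
		return 0;
	}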

-- 
Pavel Begunkov

Thread overview: 23+ messages
2024-05-24 22:58 [PATCHSET 0/3] Improve MSG_RING SINGLE_ISSUER performance Jens Axboe
2024-05-24 22:58 ` [PATCH 1/3] io_uring/msg_ring: split fd installing into a helper Jens Axboe
2024-05-24 22:58 ` [PATCH 2/3] io_uring/msg_ring: avoid double indirection task_work for data messages Jens Axboe
2024-05-28 13:18   ` Pavel Begunkov
2024-05-28 14:23     ` Jens Axboe
2024-05-28 13:32   ` Pavel Begunkov
2024-05-28 14:23     ` Jens Axboe
2024-05-28 16:23       ` Pavel Begunkov
2024-05-28 17:59         ` Jens Axboe
2024-05-29  2:04           ` Pavel Begunkov
2024-05-29  2:43             ` Jens Axboe
2024-05-24 22:58 ` [PATCH 3/3] io_uring/msg_ring: avoid double indirection task_work for fd passing Jens Axboe
2024-05-28 13:31 ` Pavel Begunkov [this message]
2024-05-28 14:34   ` [PATCHSET 0/3] Improve MSG_RING SINGLE_ISSUER performance Jens Axboe
2024-05-28 14:39     ` Jens Axboe
2024-05-28 15:27     ` Jens Axboe
2024-05-28 16:50     ` Pavel Begunkov
2024-05-28 18:07       ` Jens Axboe
2024-05-28 18:31         ` Jens Axboe
2024-05-28 23:04           ` Jens Axboe
2024-05-29  1:35             ` Jens Axboe
2024-05-29  2:08               ` Pavel Begunkov
2024-05-29  2:42                 ` Jens Axboe
