public inbox for [email protected]
* io_uring networking performance degradation
@ 2021-04-19  9:13 Michael Stoler
  2021-04-19 10:20 ` Pavel Begunkov
  0 siblings, 1 reply; 7+ messages in thread
From: Michael Stoler @ 2021-04-19  9:13 UTC (permalink / raw)
  To: io-uring

We are trying to reproduce the results reported on
https://github.com/frevib/io_uring-echo-server/blob/master/benchmarks/benchmarks.md
in a more realistic environment:
1. Internode networking in an AWS cluster with i3.16xlarge nodes (25
Gigabit network connection between client and server)
2. 128- and 2048-byte packet sizes, to simulate typical payloads
3. 10 clients to get 75-95% server CPU utilization, simulating the
server's normal load
4. 20 clients to get 100% server CPU utilization, simulating the
server's heavy load
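Utilization targets such as "75-95%" depend on how they are measured. A minimal sketch, assuming Linux and the `/proc/stat` field layout documented in proc(5), of computing per-core busy percentages from two snapshots; the helper names are hypothetical and not part of any tool used in this thread:

```python
import time

def parse_proc_stat(text):
    """Return {cpu_name: (busy_jiffies, total_jiffies)} for each 'cpuN' line."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip non-CPU lines and the aggregate "cpu" line; keep cpu0, cpu1, ...
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        # user nice system idle iowait irq softirq steal; guest/guest_nice
        # are already included in user/nice, so they are dropped here
        values = [int(v) for v in fields[1:9]]
        idle = values[3] + values[4]  # idle + iowait
        total = sum(values)
        result[fields[0]] = (total - idle, total)
    return result

def busy_percent(before, after):
    """Per-CPU busy percentage between two parse_proc_stat() snapshots."""
    out = {}
    for cpu, (busy1, total1) in after.items():
        busy0, total0 = before[cpu]
        delta_total = total1 - total0
        out[cpu] = 100.0 * (busy1 - busy0) / delta_total if delta_total else 0.0
    return out

def sample(interval=1.0):
    """Take two /proc/stat snapshots `interval` seconds apart."""
    with open("/proc/stat") as f:
        first = parse_proc_stat(f.read())
    time.sleep(interval)
    with open("/proc/stat") as f:
        second = parse_proc_stat(f.read())
    return busy_percent(first, second)
```

Calling `sample(1.0)` while the benchmark runs shows whether the core the server is bound to is actually saturated, rather than relying on an aggregate figure spread across all CPUs.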

Software:
1. OS: Ubuntu 20.04.2 LTS HWE with the 5.8.0-45-generic kernel and the latest liburing
2. io_uring-echo-server: https://github.com/frevib/io_uring-echo-server
3. epoll-echo-server: https://github.com/frevib/epoll-echo-server
4. benchmark: https://github.com/haraldh/rust_echo_bench
5. all commands run with "hwloc-bind os=eth1"
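`hwloc-bind os=eth1` binds a process to the CPUs that hwloc considers close to the eth1 device. A rough sketch of a similar binding, assuming Linux, a PCI NIC that exposes `/sys/class/net/<ifname>/device/local_cpulist`, and Python's `os.sched_setaffinity`; this approximates, and is not, hwloc's implementation, and the helper names are hypothetical:

```python
import os

def parse_cpulist(text):
    """Parse a kernel cpulist string such as '0-3,8,10-11' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus

def nic_local_cpus(ifname):
    # sysfs lists the CPUs local to the NIC's NUMA node
    with open(f"/sys/class/net/{ifname}/device/local_cpulist") as f:
        return parse_cpulist(f.read())

def pin_to_nic(ifname, single_core=False):
    """Restrict the calling process to NIC-local CPUs (optionally just one)."""
    cpus = sorted(nic_local_cpus(ifname))
    if single_core:
        cpus = cpus[:1]  # one core, similar in spirit to `taskset -c N`
    os.sched_setaffinity(0, cpus)  # pid 0 means the calling process
    return os.sched_getaffinity(0)
```

Binding to a whole NUMA node still lets the scheduler move the server between cores; passing `single_core=True` gives a stricter, per-core comparison between the two servers.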

The results are confusing: epoll_echo_server shows a stable advantage
over io_uring-echo-server, despite the reported advantage of
io_uring-echo-server:

128-byte packets, 10 clients, 75-95% server CPU core utilization:
echo_bench --address '172.22.117.67:7777' -c 10 -t 60 -l 128
epoll_echo_server:      Speed: 80999 request/sec, 80999 response/sec
io_uring_echo_server:   Speed: 74488 request/sec, 74488 response/sec
epoll_echo_server is 8% faster

128-byte packets, 20 clients, 100% server CPU core utilization:
echo_bench --address '172.22.117.67:7777' -c 20 -t 60 -l 128
epoll_echo_server:      Speed: 129063 request/sec, 129063 response/sec
io_uring_echo_server:   Speed: 102681 request/sec, 102681 response/sec
epoll_echo_server is 25% faster

2048-byte packets, 10 clients, 75-95% server CPU core utilization:
echo_bench --address '172.22.117.67:7777' -c 10 -t 60 -l 2048
epoll_echo_server:      Speed: 74421 request/sec, 74421 response/sec
io_uring_echo_server:   Speed: 66510 request/sec, 66510 response/sec
epoll_echo_server is 11% faster

2048-byte packets, 20 clients, 100% server CPU core utilization:
echo_bench --address '172.22.117.67:7777' -c 20 -t 60 -l 2048
epoll_echo_server:      Speed: 108704 request/sec, 108704 response/sec
io_uring_echo_server:   Speed: 85536 request/sec, 85536 response/sec
epoll_echo_server is 27% faster
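The "X% faster" figures can be recomputed from the reported request rates. A small sketch, assuming "faster" means epoll's rate relative to io_uring's (the thread does not say which denominator was used):

```python
# Reported req/sec from the 5.8 runs above: (epoll, io_uring)
RESULTS = {
    ("128B", 10): (80999, 74488),
    ("128B", 20): (129063, 102681),
    ("2048B", 10): (74421, 66510),
    ("2048B", 20): (108704, 85536),
}

def epoll_advantage_pct(epoll_rps, uring_rps):
    """How much faster epoll is than io_uring, in percent of io_uring's rate."""
    return (epoll_rps / uring_rps - 1.0) * 100.0

for (size, clients), (epoll_rps, uring_rps) in RESULTS.items():
    pct = epoll_advantage_pct(epoll_rps, uring_rps)
    print(f"{size:>5}, {clients:2} clients: epoll +{pct:.1f}%")
```

This prints 8.7%, 25.7%, 11.9% and 27.1%, so the figures in the report are these ratios rounded down. Measured against epoll's rate instead, the first case would read as io_uring being about 8.0% slower, which is worth keeping in mind when comparing numbers across posts.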

Why does io_uring show consistent performance degradation? What is going wrong?

Regards
    Michael Stoler

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: io_uring networking performance degradation
  2021-04-19  9:13 io_uring networking performance degradation Michael Stoler
@ 2021-04-19 10:20 ` Pavel Begunkov
  2021-04-19 14:27   ` Michael Stoler
  0 siblings, 1 reply; 7+ messages in thread
From: Pavel Begunkov @ 2021-04-19 10:20 UTC (permalink / raw)
  To: Michael Stoler, io-uring, Jens Axboe

On 4/19/21 10:13 AM, Michael Stoler wrote:
> Why io_uring shows consistent performance degradation? What is going wrong?

5.8 is pretty old, and I'm not sure all the performance problems were
addressed there. Apart from the missing common optimisations, as you may
have seen in the thread, it looks to me like it doesn't have the
sighand(?) lock hammering fix. Jens knows better whether it has been
backported or not.

So, things you can do:
1) try out 5.12
2) attach `perf top` output or some other profiling for your 5.8
3) to have a more complete picture do 2) with 5.12

Let's find what's gone wrong.

-- 
Pavel Begunkov


* Re: io_uring networking performance degradation
  2021-04-19 10:20 ` Pavel Begunkov
@ 2021-04-19 14:27   ` Michael Stoler
  2021-04-20 10:44     ` Michael Stoler
  0 siblings, 1 reply; 7+ messages in thread
From: Michael Stoler @ 2021-04-19 14:27 UTC (permalink / raw)
  To: Pavel Begunkov; +Cc: io-uring, Jens Axboe

1) linux-5.12-rc8 shows generally the same picture:

average load, 70-85% CPU core usage, 128-byte packets
    echo_bench --address '172.22.150.170:7777' --number 10 --duration 60 --length 128
epoll-echo-server:      Speed: 71513 request/sec, 71513 response/sec
io_uring_echo_server:   Speed: 64091 request/sec, 64091 response/sec
    epoll-echo-server is 11% faster

high load, 95-100% CPU core usage, 128-byte packets
    echo_bench --address '172.22.150.170:7777' --number 20 --duration 60 --length 128
epoll-echo-server:      Speed: 130186 request/sec, 130186 response/sec
io_uring_echo_server:   Speed: 109793 request/sec, 109793 response/sec
    epoll-echo-server is 18% faster

average load, 70-85% CPU core usage, 2048-byte packets
    echo_bench --address '172.22.150.170:7777' --number 10 --duration 60 --length 2048
epoll-echo-server:      Speed: 63082 request/sec, 63082 response/sec
io_uring_echo_server:   Speed: 59449 request/sec, 59449 response/sec
    epoll-echo-server is 6% faster

high load, 95-100% CPU core usage, 2048-byte packets
    echo_bench --address '172.22.150.170:7777' --number 20 --duration 60 --length 2048
epoll-echo-server:      Speed: 110402 request/sec, 110402 response/sec
io_uring_echo_server:   Speed: 88718 request/sec, 88718 response/sec
    epoll-echo-server is 24% faster


2-3) "perf top" doesn't work reliably with Ubuntu on AWS. It constantly
shows errors: "Uhhuh. NMI received for unknown reason", "Do you have a
strange power saving mode enabled?", "Dazed and confused, but trying to
continue".

Regards
    Michael Stoler


-- 
Michael Stoler


* Re: io_uring networking performance degradation
  2021-04-19 14:27   ` Michael Stoler
@ 2021-04-20 10:44     ` Michael Stoler
  2021-04-25  9:52       ` Michael Stoler
  0 siblings, 1 reply; 7+ messages in thread
From: Michael Stoler @ 2021-04-20 10:44 UTC (permalink / raw)
  To: Pavel Begunkov; +Cc: io-uring, Jens Axboe

Hi, perf data and "perf top" outputs for linux-5.8 are here:
http://rdxdownloads.rdxdyn.com/michael_stoler_perf_data.tgz

Regards
    Michael Stoler

-- 
Michael Stoler


* Re: io_uring networking performance degradation
  2021-04-20 10:44     ` Michael Stoler
@ 2021-04-25  9:52       ` Michael Stoler
  2021-04-26 11:49         ` Pavel Begunkov
  0 siblings, 1 reply; 7+ messages in thread
From: Michael Stoler @ 2021-04-25  9:52 UTC (permalink / raw)
  To: Pavel Begunkov; +Cc: io-uring, Jens Axboe

Because perf is unstable on the AWS VM, I rechecked the test on a
physical machine: Ubuntu 20.04, 5.8.0-50-generic kernel, AMD EPYC
7272 12-Core Processor 3200MHz, BogoMIPS 5789.39, Mellanox 5 NIC,
speed: 25000Mb/s full duplex.
On the physical machine the performance degradation is much less
pronounced, but it still exists:
io_uring-echo-server:   Speed: 143081 request/sec, 143081 response/sec
epoll-echo-server:      Speed: 150692 request/sec, 150692 response/sec
epoll-echo-server is 5% faster

"perf top" with io_uring-echo-server:
PerfTop:   16481 irqs/sec  kernel:98.5%  exact: 99.8% lost: 0/0 drop: 0/0 [4000Hz cycles],  (all, 24 CPUs)
-------------------------------------------------------------------------------
     8.66%  [kernel]          [k] __x86_indirect_thunk_rax
     8.49%  [kernel]          [k] copy_user_generic_string
     5.57%  [kernel]          [k] memset
     2.81%  [kernel]          [k] tcp_rate_skb_sent
     2.32%  [kernel]          [k] __alloc_skb
     2.16%  [kernel]          [k] __check_object_size
     1.44%  [unknown]         [k] 0xffffffffc100c296
     1.28%  [kernel]          [k] tcp_write_xmit
     1.22%  [kernel]          [k] iommu_dma_map_page
     1.16%  [kernel]          [k] kmem_cache_free
     1.14%  [kernel]          [k] __softirqentry_text_start
     1.06%  [unknown]         [k] 0xffffffffc1008a7e
     1.03%  [kernel]          [k] __skb_datagram_iter
     0.97%  [kernel]          [k] __dev_queue_xmit
     0.86%  [kernel]          [k] ipv4_mtu
     0.85%  [kernel]          [k] tcp_schedule_loss_probe
     0.80%  [kernel]          [k] tcp_release_cb
     0.78%  [unknown]         [k] 0xffffffffc100c290
     0.77%  [unknown]         [k] 0xffffffffc100c295
     0.76%  perf              [.] __symbols__insert

"perf top" with epoll-echo-server:
PerfTop:   24255 irqs/sec  kernel:98.3%  exact: 99.6% lost: 0/0 drop: 0/0 [4000Hz cycles],  (all, 24 CPUs)
-------------------------------------------------------------------------------
     8.77%  [kernel]          [k] __x86_indirect_thunk_rax
     7.50%  [kernel]          [k] copy_user_generic_string
     4.10%  [kernel]          [k] memset
     2.70%  [kernel]          [k] tcp_rate_skb_sent
     2.18%  [kernel]          [k] __check_object_size
     2.09%  [kernel]          [k] __alloc_skb
     1.61%  [unknown]         [k] 0xffffffffc100c296
     1.47%  [kernel]          [k] __virt_addr_valid
     1.40%  [kernel]          [k] iommu_dma_map_page
     1.37%  [unknown]         [k] 0xffffffffc1008a7e
     1.22%  [kernel]          [k] tcp_poll
     1.16%  [kernel]          [k] __softirqentry_text_start
     1.15%  [kernel]          [k] tcp_stream_memory_free
     1.07%  [kernel]          [k] tcp_write_xmit
     1.06%  [kernel]          [k] kmem_cache_free
     1.03%  [kernel]          [k] tcp_release_cb
     0.96%  [kernel]          [k] syscall_return_via_sysret
     0.90%  [kernel]          [k] __lock_text_start
     0.82%  [kernel]          [k] __copy_skb_header
     0.81%  [kernel]          [k] amd_iommu_map

Regards
    Michael Stoler

-- 
Michael Stoler


* Re: io_uring networking performance degradation
  2021-04-25  9:52       ` Michael Stoler
@ 2021-04-26 11:49         ` Pavel Begunkov
  2021-04-26 13:07           ` Michael Stoler
  0 siblings, 1 reply; 7+ messages in thread
From: Pavel Begunkov @ 2021-04-26 11:49 UTC (permalink / raw)
  To: Michael Stoler; +Cc: io-uring, Jens Axboe

On 4/25/21 10:52 AM, Michael Stoler wrote:
> Because of unstable working of perf over AWS VM I recheck test on
> physical machine: Ubuntu 20.04, 5.8.0-50-generic kernel, CPU AMD EPYC
> 7272 12-Core Processor 3200MHz, BogoMIPS 5789.39, NIC melanox 5,
> Speed: 25000Mb/s Full Duplex.
> Over physical machine performance degradation is much less pronounced
> but still exists:
> io_uring-echo-server    Speed: 143081 request/sec, 143081 response/sec
> epoll-echo-server   Speed: 150692 request/sec, 150692 response/sec
> epoll-echo-server is 5% faster

I have to note that I haven't checked the userspace programs, so I'm not
sure it's a fair comparison (it may or may not be). With that said:

1) The last report had a lot of idle time, so for that run it may be a
question of latency rather than throughput.

2) Did you do proper pinning to a CPU/core, with taskset or cset? Also,
did the run in your most recent post saturate the CPU/core you used?

3) Given that __skb_datagram_iter takes 1%, it seems other tasks are
taking a relatively large share of CPU/NIC resources. What is this
datagram traffic? UDP on the same NIC? Is something else using your
NIC/interface?

4) I don't see anything even close to io_uring in the recent run, and it
was only a small fraction in the previous ones. So it's definitely not
the submit/complete overhead. If there is an io_uring problem, it could
be the difference in polling / io-wq punting compared with epoll. That
may be interesting to look into.
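The latency-versus-throughput point can be made concrete with Little's law, L = λW. A sketch assuming each echo_bench connection keeps exactly one request in flight (a closed-loop client; this is an assumption about rust_echo_bench, not something stated in the thread), so the mean round-trip time is W = clients / requests-per-second:

```python
def mean_round_trip_us(clients, requests_per_sec, in_flight_per_client=1):
    """Little's law, L = lambda * W, solved for W and expressed in microseconds."""
    outstanding = clients * in_flight_per_client  # L: requests in the system
    return outstanding / requests_per_sec * 1e6   # W = L / lambda

# 128-byte, 10-client runs on 5.8 from the first message
for name, rps in (("epoll", 80999), ("io_uring", 74488)):
    print(f"{name:8}: ~{mean_round_trip_us(10, rps):.1f} us per round trip")
```

Under that assumption the rates correspond to roughly 123.5 us (epoll) and 134.2 us (io_uring) per round trip, so with a fixed number of clients a few extra microseconds of per-request latency show up directly as lower throughput, even when the server core is not saturated.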

A related thing I'm curious about is how FAST_POLL requests compare
with io_uring multi-shot polling + send/recv.


> 
> "perf top" with io_uring-echo-server:
> PerfTop:   16481 irqs/sec  kernel:98.5%  exact: 99.8% lost: 0/0 drop:
> 0/0 [4000Hz cycles],  (all, 24 CPUs)
> -------------------------------------------------------------------------------
>      8.66%  [kernel]          [k] __x86_indirect_thunk_rax
>      8.49%  [kernel]          [k] copy_user_generic_string
>      5.57%  [kernel]          [k] memset
>      2.81%  [kernel]          [k] tcp_rate_skb_sent
>      2.32%  [kernel]          [k] __alloc_skb
>      2.16%  [kernel]          [k] __check_object_size
>      1.44%  [unknown]         [k] 0xffffffffc100c296
>      1.28%  [kernel]          [k] tcp_write_xmit
>      1.22%  [kernel]          [k] iommu_dma_map_page
>      1.16%  [kernel]          [k] kmem_cache_free
>      1.14%  [kernel]          [k] __softirqentry_text_start
>      1.06%  [unknown]         [k] 0xffffffffc1008a7e
>      1.03%  [kernel]          [k] __skb_datagram_iter
>      0.97%  [kernel]          [k] __dev_queue_xmit
>      0.86%  [kernel]          [k] ipv4_mtu
>      0.85%  [kernel]          [k] tcp_schedule_loss_probe
>      0.80%  [kernel]          [k] tcp_release_cb
>      0.78%  [unknown]         [k] 0xffffffffc100c290
>      0.77%  [unknown]         [k] 0xffffffffc100c295
>      0.76%  perf              [.] __symbols__insert
> 
> "perf top" with epoll-echo-server:
> PerfTop:   24255 irqs/sec  kernel:98.3%  exact: 99.6% lost: 0/0 drop:
> 0/0 [4000Hz cycles],  (all, 24 CPUs)
> -------------------------------------------------------------------------------
>      8.77%  [kernel]          [k] __x86_indirect_thunk_rax
>      7.50%  [kernel]          [k] copy_user_generic_string
>      4.10%  [kernel]          [k] memset
>      2.70%  [kernel]          [k] tcp_rate_skb_sent
>      2.18%  [kernel]          [k] __check_object_size
>      2.09%  [kernel]          [k] __alloc_skb
>      1.61%  [unknown]         [k] 0xffffffffc100c296
>      1.47%  [kernel]          [k] __virt_addr_valid
>      1.40%  [kernel]          [k] iommu_dma_map_page
>      1.37%  [unknown]         [k] 0xffffffffc1008a7e
>      1.22%  [kernel]          [k] tcp_poll
>      1.16%  [kernel]          [k] __softirqentry_text_start
>      1.15%  [kernel]          [k] tcp_stream_memory_free
>      1.07%  [kernel]          [k] tcp_write_xmit
>      1.06%  [kernel]          [k] kmem_cache_free
>      1.03%  [kernel]          [k] tcp_release_cb
>      0.96%  [kernel]          [k] syscall_return_via_sysret
>      0.90%  [kernel]          [k] __lock_text_start
>      0.82%  [kernel]          [k] __copy_skb_header
>      0.81%  [kernel]          [k] amd_iommu_map
> 
> Regards
>     Michael Stoler
> 
> On Tue, Apr 20, 2021 at 1:44 PM Michael Stoler <[email protected]> wrote:
>>
>> Hi, perf data and tops for linux-5.8 are here:
>> http://rdxdownloads.rdxdyn.com/michael_stoler_perf_data.tgz
>>
>> Regards
>>     Michael Stoler
>>
>> On Mon, Apr 19, 2021 at 5:27 PM Michael Stoler <[email protected]> wrote:
>>>
>>> 1)  linux-5.12-rc8 shows generally same picture:
>>>
>>> average load, 70-85% CPU core usage, 128 bytes packets
>>>     echo_bench --address '172.22.150.170:7777' --number 10 --duration
>>> 60 --length 128`
>>> epoll-echo-server:      Speed: 71513 request/sec, 71513 response/sec
>>> io_uring_echo_server:   Speed: 64091 request/sec, 64091 response/sec
>>>     epoll-echo-server is 11% faster
>>>
>>> high load, 95-100% CPU core usage, 128 bytes packets
>>>     echo_bench --address '172.22.150.170:7777' --number 20 --duration
>>> 60 --length 128`
>>> epoll-echo-server:      Speed: 130186 request/sec, 130186 response/sec
>>> io_uring_echo_server:   Speed: 109793 request/sec, 109793 response/sec
>>>     epoll-echo-server is 18% faster
>>>
>>> average load, 70-85% CPU core usage, 2048 bytes packets
>>>     echo_bench --address '172.22.150.170:7777' --number 10 --duration
>>> 60 --length 2048`
>>> epoll-echo-server:      Speed: 63082 request/sec, 63082 response/sec
>>> io_uring_echo_server:   Speed: 59449 request/sec, 59449 response/sec
>>>     epoll-echo-server is 6% faster
>>>
>>> high load, 95-100% CPU core usage, 2048 bytes packets
>>>     echo_bench --address '172.22.150.170:7777' --number 20 --duration
>>> 60 --length 2048`
>>> epoll-echo-server:      Speed: 110402 request/sec, 110402 response/sec
>>> io_uring_echo_server:   Speed: 88718 request/sec, 88718 response/sec
>>>     epoll-echo-server is 24% faster
>>>
>>>
>>> 2-3) "perf top" doesn't work reliably with Ubuntu on AWS. It constantly
>>> shows errors: "Uhhuh. NMI received for unknown reason", "Do
>>> you have a strange power saving mode enabled?", "Dazed and confused,
>>> but trying to continue".
>>>
>>> Regards
>>>     Michael Stoler
>>>
>>> On Mon, Apr 19, 2021 at 1:20 PM Pavel Begunkov <[email protected]> wrote:
>>>>
>>>> On 4/19/21 10:13 AM, Michael Stoler wrote:
>>>>> Why io_uring shows consistent performance degradation? What is going wrong?
>>>>
>>>> 5.8 is pretty old, and I'm not sure all the performance problems were
>>>> addressed there. Apart from the common optimisations it is missing, as
>>>> you may have seen in the thread, it looks to me like it doesn't have the
>>>> sighd(?) lock hammering fix. Jens knows better whether it has been
>>>> backported or not.
>>>>
>>>> So, things you can do:
>>>> 1) try out 5.12
>>>> 2) attach `perf top` output or some other profiling for your 5.8 runs
>>>> 3) for a more complete picture, do 2) with 5.12 as well
>>>>


-- 
Pavel Begunkov


* Re: io_uring networking performance degradation
  2021-04-26 11:49         ` Pavel Begunkov
@ 2021-04-26 13:07           ` Michael Stoler
  0 siblings, 0 replies; 7+ messages in thread
From: Michael Stoler @ 2021-04-26 13:07 UTC (permalink / raw)
  To: Pavel Begunkov; +Cc: io-uring, Jens Axboe

1. I am of the same opinion, but cannot prove it. bpftrace is too
intrusive and too coarse to measure read/write path latency.
2. No, the test wasn't bound to a particular CPU/core. It was bound
only to the NIC's node, by hwloc-bind os=eth_name ...
3. __skb_datagram_iter is very strange. I didn't see any activity in
top during the tests. In any case, all tests were performed over a
dedicated NIC.

Regards
    Michael Stoler

On Mon, Apr 26, 2021 at 2:49 PM Pavel Begunkov <[email protected]> wrote:
>
> On 4/25/21 10:52 AM, Michael Stoler wrote:
> > Because perf works unreliably on the AWS VM, I rechecked the test on a
> > physical machine: Ubuntu 20.04, 5.8.0-50-generic kernel, CPU AMD EPYC
> > 7272 12-Core Processor 3200MHz, BogoMIPS 5789.39, NIC Mellanox 5,
> > Speed: 25000Mb/s Full Duplex.
> > On the physical machine the performance degradation is much less
> > pronounced, but it still exists:
> > io_uring-echo-server    Speed: 143081 request/sec, 143081 response/sec
> > epoll-echo-server   Speed: 150692 request/sec, 150692 response/sec
> > epoll-echo-server is 5% faster
>
> I have to note that I haven't checked the userspace programs, so I'm not
> sure it's a fair comparison (it may or may not be). So, with that said:
>
> 1) The last report had a lot of idle time, so for it this may be a
> question of latency rather than throughput.
>
> 2) Did you do proper pinning to a CPU/core? taskset or cset? Also,
> did it saturate the CPU/core you used in the most recent post?
>
> 3) Looking at __skb_datagram_iter taking 1%, it seems there are other
> tasks taking a relatively large share of CPU/NIC resources. What is
> this datagram? UDP on the same NIC? Is something else using your
> NIC/interface?
>
> 4) I don't see anything even close to io_uring-related in the recent
> run, and it was only a small fraction in the previous ones. So it's
> definitely not the overhead on submit/complete. If there is an
> io_uring problem, it could be the difference in polling / io-wq
> punting compared with epoll. It may be interesting to look into.
>
> A related thing I'm curious about is comparing FAST_POLL requests
> with io_uring multi-shot polling + send/recv.
>
>
> >
> > "perf top" with io_uring-echo-server:
> > PerfTop:   16481 irqs/sec  kernel:98.5%  exact: 99.8% lost: 0/0 drop: 0/0 [4000Hz cycles],  (all, 24 CPUs)
> > -------------------------------------------------------------------------------
> >      8.66%  [kernel]          [k] __x86_indirect_thunk_rax
> >      8.49%  [kernel]          [k] copy_user_generic_string
> >      5.57%  [kernel]          [k] memset
> >      2.81%  [kernel]          [k] tcp_rate_skb_sent
> >      2.32%  [kernel]          [k] __alloc_skb
> >      2.16%  [kernel]          [k] __check_object_size
> >      1.44%  [unknown]         [k] 0xffffffffc100c296
> >      1.28%  [kernel]          [k] tcp_write_xmit
> >      1.22%  [kernel]          [k] iommu_dma_map_page
> >      1.16%  [kernel]          [k] kmem_cache_free
> >      1.14%  [kernel]          [k] __softirqentry_text_start
> >      1.06%  [unknown]         [k] 0xffffffffc1008a7e
> >      1.03%  [kernel]          [k] __skb_datagram_iter
> >      0.97%  [kernel]          [k] __dev_queue_xmit
> >      0.86%  [kernel]          [k] ipv4_mtu
> >      0.85%  [kernel]          [k] tcp_schedule_loss_probe
> >      0.80%  [kernel]          [k] tcp_release_cb
> >      0.78%  [unknown]         [k] 0xffffffffc100c290
> >      0.77%  [unknown]         [k] 0xffffffffc100c295
> >      0.76%  perf              [.] __symbols__insert
> >
> > "perf top" with epoll-echo-server:
> > PerfTop:   24255 irqs/sec  kernel:98.3%  exact: 99.6% lost: 0/0 drop: 0/0 [4000Hz cycles],  (all, 24 CPUs)
> > -------------------------------------------------------------------------------
> >      8.77%  [kernel]          [k] __x86_indirect_thunk_rax
> >      7.50%  [kernel]          [k] copy_user_generic_string
> >      4.10%  [kernel]          [k] memset
> >      2.70%  [kernel]          [k] tcp_rate_skb_sent
> >      2.18%  [kernel]          [k] __check_object_size
> >      2.09%  [kernel]          [k] __alloc_skb
> >      1.61%  [unknown]         [k] 0xffffffffc100c296
> >      1.47%  [kernel]          [k] __virt_addr_valid
> >      1.40%  [kernel]          [k] iommu_dma_map_page
> >      1.37%  [unknown]         [k] 0xffffffffc1008a7e
> >      1.22%  [kernel]          [k] tcp_poll
> >      1.16%  [kernel]          [k] __softirqentry_text_start
> >      1.15%  [kernel]          [k] tcp_stream_memory_free
> >      1.07%  [kernel]          [k] tcp_write_xmit
> >      1.06%  [kernel]          [k] kmem_cache_free
> >      1.03%  [kernel]          [k] tcp_release_cb
> >      0.96%  [kernel]          [k] syscall_return_via_sysret
> >      0.90%  [kernel]          [k] __lock_text_start
> >      0.82%  [kernel]          [k] __copy_skb_header
> >      0.81%  [kernel]          [k] amd_iommu_map
> >
> > Regards
> >     Michael Stoler
> >
>
>
> --
> Pavel Begunkov



-- 
Michael Stoler


Thread overview: 7+ messages
2021-04-19  9:13 io_uring networking performance degradation Michael Stoler
2021-04-19 10:20 ` Pavel Begunkov
2021-04-19 14:27   ` Michael Stoler
2021-04-20 10:44     ` Michael Stoler
2021-04-25  9:52       ` Michael Stoler
2021-04-26 11:49         ` Pavel Begunkov
2021-04-26 13:07           ` Michael Stoler
