Hi again!

More interesting data for you. :-)

We tried iommu=pt.

1)

Xeon E5-2650 v2 + Intel XL710 40Gbit/s

            4096   8192  10000  12000  16384  65435
zc MB/s      721   1191   1281   1665   1752   2255
zc CPU       95%    99%    99%    99%    99%    98%
send MB/s   2229   2555   2704   2642   2756   2993
send CPU     97%    99%    98%    99%    98%    98%

Xeon E5-2650 v2 + Intel XL710 40Gbit/s, iommu=pt

            4096   8192  10000  12000  16384  32768  65435
zc MB/s     1130   1893   2222   2503   2994   3855   3717
zc CPU       99%    99%    99%    89%    94%    71%    48%
send MB/s   2903   3620   3602   3346   3658   3855   3514
send CPU     98%    89%    96%    99%    89%    82%    74%

Much, much better, and it makes zero-copy beneficial for >= 32 KB buffers.
iommu-related things completely go away from the perf profile with iommu=pt.

2)

Xeon Gold 6342 + Mellanox ConnectX-6 Dx

            4096   8192  10000  12000  16384  32768  65435
zc MB/s     2060   2950   2927   2934   2945   2945   2947
zc CPU       99%    62%    59%    29%    22%    23%    11%
send MB/s   2950   2949   2950   2950   2949   2949   2949
send CPU     64%    44%    50%    46%    51%    49%    45%

Xeon Gold 6342 + Mellanox ConnectX-6 Dx + iommu=pt

            4096   8192  10000  12000  16384  32768  65435
zc MB/s     2165   2277   2790   2802   2871   2945   2944
zc CPU       99%    89%    75%    65%    53%    34%    36%
send MB/s   2902   2912   2945   2943   2927   2935   2941
send CPU     80%    63%    55%    64%    78%    68%    65%

Here, iommu=pt actually makes things worse - CPU usage increases in every
test except zc at 4096, which is already at 99% in both modes. The default
mode is optimal.
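
For anyone who wants to re-check the crossover point in 1), here is a minimal Python sketch over the XL710 + iommu=pt numbers. It assumes that "beneficial" means zc is at least as fast as send and delivers more MB/s per CPU percent; that criterion and the helper names `compare` and `crossover` are illustrative, not part of the tool used for the tests.

```python
# Numbers copied from the "XL710 40Gbit/s, iommu=pt" results in 1).
sizes = [4096, 8192, 10000, 12000, 16384, 32768, 65435]
zc = {
    "mbs": [1130, 1893, 2222, 2503, 2994, 3855, 3717],
    "cpu": [99, 99, 99, 89, 94, 71, 48],
}
send = {
    "mbs": [2903, 3620, 3602, 3346, 3658, 3855, 3514],
    "cpu": [98, 89, 96, 99, 89, 82, 74],
}


def compare(sizes, zc, send):
    """Per buffer size: zc/send throughput ratio and MB/s per CPU percent."""
    rows = []
    for i, size in enumerate(sizes):
        rows.append({
            "size": size,
            "ratio": zc["mbs"][i] / send["mbs"][i],
            "zc_eff": zc["mbs"][i] / zc["cpu"][i],
            "send_eff": send["mbs"][i] / send["cpu"][i],
        })
    return rows


def crossover(rows):
    """Smallest size where zc is at least as fast as send and cheaper
    per MB/s (an assumed criterion, see the note above)."""
    for r in rows:
        if r["ratio"] >= 1.0 and r["zc_eff"] > r["send_eff"]:
            return r["size"]
    return None


rows = compare(sizes, zc, send)
for r in rows:
    print(f'{r["size"]:>6}  zc/send={r["ratio"]:.2f}  '
          f'zc={r["zc_eff"]:.1f}  send={r["send_eff"]:.1f} MB/s per CPU%')
print("crossover:", crossover(rows))
```

The same two functions can be fed the other tables by swapping in their rows; CPU figures that sit at 99% are saturated, so the per-CPU efficiency is only a rough guide there.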

3)

AMD EPYC Genoa 9554 + Mellanox CX-5

            4096   8192  10000  12000  16384  65435
zc MB/s      864   1495   1646   1714   1790   2266
zc CPU       99%    93%    81%    86%    75%    57%
send MB/s   1799   2167   2265   2285   2248   2286
send CPU     90%    58%    54%    54%    52%    42%

AMD EPYC Genoa 9554 + Mellanox CX-5 + iommu=pt

            4096   8192  10000  12000  16384  65435
zc MB/s      794   1191   1361   1762   1850   2125
zc CPU       99%    84%    84%    99%    82%    60%
send MB/s   2007   2238   2255   2291   2229   2218
send CPU     86%    65%    55%    55%    50%    40%

AMD EPYC Genoa 9554 + Mellanox CX-5 + iommu=pt + hugepages (-l1)

            4096   8192  10000  12000  16384  65435
zc MB/s      804   1539   1718   1749   1666   2310
zc CPU       99%    95%    89%    87%    65%    33%
send MB/s   1763   2262   2323   2296   2235   2285
send CPU     91%    63%    61%    55%    50%    41%

So here zerocopy is just slightly better in just one test - with huge pages
and the maximum buffer size.

Flamegraph is in the attachment, it really doesn't include any
iommu-related things.

-- 
Vitaliy Filippov