public inbox for io-uring@vger.kernel.org
* [axboe-block:io_uring-defer-tw.4] [io_uring]  61a5e20297: stress-ng.io-uring.ops_per_sec 41.9% regression
@ 2025-07-01  4:47 kernel test robot
From: kernel test robot @ 2025-07-01  4:47 UTC (permalink / raw)
  To: Jens Axboe; +Cc: oe-lkp, lkp, io-uring, oliver.sang



Hello,

kernel test robot noticed a 41.9% regression of stress-ng.io-uring.ops_per_sec on:


commit: 61a5e202971d4a242fc761728e89922edde02d38 ("io_uring: switch defer task_work to using a ring")
https://git.kernel.org/cgit/linux/kernel/git/axboe/linux-block.git io_uring-defer-tw.4

testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 192 threads 2 sockets Intel(R) Xeon(R) 6740E  CPU @ 2.4GHz (Sierra Forest) with 256G memory
parameters:

	nr_threads: 100%
	testtime: 60s
	test: io-uring
	cpufreq_governor: performance


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags:
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202507010550.2d6f83ea-lkp@intel.com


Details are below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250701/202507010550.2d6f83ea-lkp@intel.com

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-srf-2sp2/io-uring/stress-ng/60s
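
The two lines above pair a slash-separated list of parameter names with a slash-separated list of values in the same order. As a reading aid, here is a minimal sketch of that pairing. The helper name `parse_lkp_params` is hypothetical and not part of lkp-tests, and the sketch assumes that no individual value contains a "/" itself.

```python
# Hypothetical helper (not part of lkp-tests): pair the slash-separated
# parameter names from the report with their slash-separated values.
# Assumption: no individual value contains a "/" character.
def parse_lkp_params(keys_line: str, values_line: str) -> dict[str, str]:
    keys = keys_line.strip().rstrip(":").split("/")
    values = values_line.strip().split("/")
    if len(keys) != len(values):
        raise ValueError(f"{len(keys)} names but {len(values)} values")
    return dict(zip(keys, values))


# The two lines exactly as they appear in this report.
KEYS = "compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:"
VALUES = "  gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-srf-2sp2/io-uring/stress-ng/60s"

params = parse_lkp_params(KEYS, VALUES)
for name, value in params.items():
    print(f"{name:>18}: {value}")
```

Read this way, the run used the lkp-srf-2sp2 machine group with nr_threads at 100% and a 60s test time, which matches the "parameters" list earlier in the report.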

commit: 
  8559f3b41f ("io_uring: make task_work pending check dependent on ring type")
  61a5e20297 ("io_uring: switch defer task_work to using a ring")

8559f3b41fdcdd01 61a5e202971d4a242fc761728e8 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
   1022268 ±  2%     -30.4%     711175        meminfo.Mapped
 7.478e+09           +30.1%  9.727e+09        cpuidle..time
  3.03e+08           -20.9%  2.398e+08 ±  3%  cpuidle..usage
    696425 ±171%    +181.2%    1958387 ± 81%  numa-meminfo.node0.Unevictable
    940879 ± 10%     -32.9%     631792 ± 14%  numa-meminfo.node1.Mapped
     43.50 ± 20%     -73.9%      11.33 ± 53%  perf-c2c.DRAM.local
     32749 ± 10%     -86.3%       4475 ± 29%  perf-c2c.HITM.local
     33251 ± 10%     -85.0%       4989 ± 25%  perf-c2c.HITM.total
  14632245 ±  9%     -38.5%    8999074 ±  7%  numa-numastat.node0.local_node
  14749610 ±  9%     -38.6%    9056826 ±  6%  numa-numastat.node0.numa_hit
  21106190 ±  4%     -37.5%   13198942 ±  4%  numa-numastat.node1.local_node
  21186924 ±  4%     -37.0%   13339356 ±  4%  numa-numastat.node1.numa_hit
     43.02 ±  2%     -12.5%      37.66 ±  2%  vmstat.cpu.id
     19.87          +121.8%      44.07 ±  2%  vmstat.cpu.wa
     73.14          +101.7%     147.54 ±  2%  vmstat.procs.b
    112.33 ±  2%     -64.8%      39.60 ±  7%  vmstat.procs.r
  12695197           -38.2%    7849636 ±  4%  vmstat.system.cs
   5179340 ±  2%     -24.4%    3915343 ±  4%  vmstat.system.in
    174059 ±171%    +181.3%     489607 ± 81%  numa-vmstat.node0.nr_unevictable
    174060 ±171%    +181.3%     489607 ± 81%  numa-vmstat.node0.nr_zone_unevictable
  14750003 ±  9%     -38.6%    9057006 ±  6%  numa-vmstat.node0.numa_hit
  14632638 ±  9%     -38.5%    8999253 ±  7%  numa-vmstat.node0.numa_local
    236391 ± 10%     -33.3%     157713 ± 14%  numa-vmstat.node1.nr_mapped
  21186186 ±  4%     -37.0%   13338387 ±  4%  numa-vmstat.node1.numa_hit
  21105453 ±  4%     -37.5%   13197958 ±  4%  numa-vmstat.node1.numa_local
     41.57            -5.8       35.76 ±  2%  mpstat.cpu.all.idle%
     20.32           +25.1       45.43 ±  2%  mpstat.cpu.all.iowait%
      6.25 ±  4%      -2.2        4.09 ±  6%  mpstat.cpu.all.irq%
      0.34 ±  4%      -0.2        0.14 ±  6%  mpstat.cpu.all.soft%
     28.91           -15.5       13.40 ±  6%  mpstat.cpu.all.sys%
      2.62            -1.4        1.17 ±  6%  mpstat.cpu.all.usr%
     18.83 ±  5%     -84.1%       3.00        mpstat.max_utilization.seconds
     61.41           -30.1%      42.94        mpstat.max_utilization_pct
 3.455e+08           -41.9%  2.006e+08 ±  4%  stress-ng.io-uring.ops
   5758736           -41.9%    3343243 ±  4%  stress-ng.io-uring.ops_per_sec
  63485668           -85.7%    9052788 ± 15%  stress-ng.time.involuntary_context_switches
     86971            -2.2%      85030        stress-ng.time.minor_page_faults
      6021           -54.8%       2724 ±  6%  stress-ng.time.percent_of_cpu_this_job_got
      3383           -53.8%       1562 ±  6%  stress-ng.time.system_time
    248.17           -67.3%      81.18 ±  9%  stress-ng.time.user_time
 4.227e+08           -40.1%  2.531e+08 ±  4%  stress-ng.time.voluntary_context_switches
   2888857 ±  2%      -8.1%    2654260        proc-vmstat.nr_active_anon
    302955            -3.1%     293576        proc-vmstat.nr_anon_pages
   3475920 ±  2%      -6.5%    3250878        proc-vmstat.nr_file_pages
     44207            -3.1%      42858        proc-vmstat.nr_kernel_stack
    255933 ±  3%     -30.6%     177546        proc-vmstat.nr_mapped
   2586684 ±  3%      -8.7%    2361525        proc-vmstat.nr_shmem
     43152            -1.5%      42518        proc-vmstat.nr_slab_reclaimable
   2888857 ±  2%      -8.1%    2654260        proc-vmstat.nr_zone_active_anon
  35939101           -37.7%   22399100 ±  3%  proc-vmstat.numa_hit
  35741003           -37.9%   22200912 ±  3%  proc-vmstat.numa_local
    585759 ±  5%     -27.5%     424436 ±  8%  proc-vmstat.numa_pte_updates
  36196152           -37.5%   22624491 ±  3%  proc-vmstat.pgalloc_normal
    700860 ±  3%      -7.0%     651538 ±  4%  proc-vmstat.pgfault
  32134448           -41.1%   18939637 ±  4%  proc-vmstat.pgfree
  16707904           -77.5%    3755057 ± 10%  proc-vmstat.unevictable_pgs_culled
      0.17 ±  4%     +94.3%       0.32 ± 16%  perf-stat.i.MPKI
 2.698e+10           -40.1%  1.616e+10 ±  4%  perf-stat.i.branch-instructions
      0.92            -0.3        0.64        perf-stat.i.branch-miss-rate%
 2.173e+08           -57.1%   93142321 ±  5%  perf-stat.i.branch-misses
      2.25 ±  4%      +6.4        8.67 ± 17%  perf-stat.i.cache-miss-rate%
 1.262e+09           -68.8%   3.94e+08 ±  6%  perf-stat.i.cache-references
  13218006           -37.6%    8252620 ±  4%  perf-stat.i.context-switches
      3.40            -7.5%       3.15 ±  3%  perf-stat.i.cpi
 4.003e+11           -40.4%  2.384e+11 ±  5%  perf-stat.i.cpu-cycles
   5382764           -76.2%    1281759 ± 10%  perf-stat.i.cpu-migrations
     32980 ±  5%     -25.9%      24437 ±  9%  perf-stat.i.cycles-between-cache-misses
 1.327e+11           -39.9%  7.973e+10 ±  4%  perf-stat.i.instructions
      0.33            +9.9%       0.36 ±  3%  perf-stat.i.ipc
     96.88           -48.8%      49.64 ±  4%  perf-stat.i.metric.K/sec
      8872 ±  4%     -11.6%       7844 ±  4%  perf-stat.i.minor-faults
      8872 ±  4%     -11.6%       7844 ±  4%  perf-stat.i.page-faults
      0.18 ±  3%     +61.7%       0.29 ±  8%  perf-stat.overall.MPKI
      0.81            -0.2        0.58        perf-stat.overall.branch-miss-rate%
      1.88 ±  3%      +4.0        5.86 ±  9%  perf-stat.overall.cache-miss-rate%
     16903 ±  3%     -38.3%      10426 ±  9%  perf-stat.overall.cycles-between-cache-misses
 2.655e+10           -40.1%   1.59e+10 ±  4%  perf-stat.ps.branch-instructions
 2.138e+08           -57.2%   91585587 ±  5%  perf-stat.ps.branch-misses
 1.241e+09           -68.8%  3.875e+08 ±  6%  perf-stat.ps.cache-references
  13003285           -37.6%    8120099 ±  4%  perf-stat.ps.context-switches
 3.938e+11           -40.5%  2.345e+11 ±  5%  perf-stat.ps.cpu-cycles
   5295095           -76.2%    1259803 ± 10%  perf-stat.ps.cpu-migrations
 1.306e+11           -39.9%  7.846e+10 ±  4%  perf-stat.ps.instructions
      8714 ±  4%     -11.7%       7694 ±  4%  perf-stat.ps.minor-faults
      8714 ±  4%     -11.7%       7694 ±  4%  perf-stat.ps.page-faults
 8.049e+12           -40.0%  4.829e+12 ±  4%  perf-stat.total.instructions
    879267 ±  3%     -77.4%     198767 ± 46%  sched_debug.cfs_rq:/.avg_vruntime.avg
   2197261 ±  7%     -80.3%     433455 ± 40%  sched_debug.cfs_rq:/.avg_vruntime.max
    702597 ±  3%     -82.0%     126663 ± 48%  sched_debug.cfs_rq:/.avg_vruntime.min
    144651 ±  9%     -75.7%      35081 ± 36%  sched_debug.cfs_rq:/.avg_vruntime.stddev
      0.38 ±  7%     -79.6%       0.08 ± 20%  sched_debug.cfs_rq:/.h_nr_queued.avg
      2.92 ± 20%     -65.7%       1.00        sched_debug.cfs_rq:/.h_nr_queued.max
      0.61 ±  4%     -57.2%       0.26 ±  9%  sched_debug.cfs_rq:/.h_nr_queued.stddev
      0.34 ±  6%     -77.5%       0.08 ± 19%  sched_debug.cfs_rq:/.h_nr_runnable.avg
      2.92 ± 20%     -65.7%       1.00        sched_debug.cfs_rq:/.h_nr_runnable.max
      0.56 ±  5%     -53.3%       0.26 ±  9%  sched_debug.cfs_rq:/.h_nr_runnable.stddev
    115895 ± 14%     -93.3%       7740 ± 69%  sched_debug.cfs_rq:/.left_deadline.avg
   1148129 ± 31%     -77.7%     255916 ± 52%  sched_debug.cfs_rq:/.left_deadline.max
    300169 ±  8%     -87.0%      39025 ± 54%  sched_debug.cfs_rq:/.left_deadline.stddev
    115876 ± 14%     -93.3%       7740 ± 69%  sched_debug.cfs_rq:/.left_vruntime.avg
   1147975 ± 31%     -77.7%     255883 ± 52%  sched_debug.cfs_rq:/.left_vruntime.max
    300120 ±  8%     -87.0%      39021 ± 54%  sched_debug.cfs_rq:/.left_vruntime.stddev
      2.08 ± 16%    -100.0%       0.00        sched_debug.cfs_rq:/.load_avg.min
    879267 ±  3%     -77.4%     198767 ± 46%  sched_debug.cfs_rq:/.min_vruntime.avg
   2197261 ±  7%     -80.3%     433455 ± 40%  sched_debug.cfs_rq:/.min_vruntime.max
    702597 ±  3%     -82.0%     126663 ± 48%  sched_debug.cfs_rq:/.min_vruntime.min
    144651 ±  9%     -75.7%      35081 ± 36%  sched_debug.cfs_rq:/.min_vruntime.stddev
      0.24 ±  5%     -67.9%       0.08 ± 19%  sched_debug.cfs_rq:/.nr_queued.avg
      0.36           -27.4%       0.26 ±  9%  sched_debug.cfs_rq:/.nr_queued.stddev
    115876 ± 14%     -93.3%       7740 ± 69%  sched_debug.cfs_rq:/.right_vruntime.avg
   1147975 ± 31%     -77.7%     255883 ± 52%  sched_debug.cfs_rq:/.right_vruntime.max
    300120 ±  8%     -87.0%      39021 ± 54%  sched_debug.cfs_rq:/.right_vruntime.stddev
    293.31 ±  2%     -61.0%     114.35 ± 10%  sched_debug.cfs_rq:/.runnable_avg.avg
    114.75 ±  6%    -100.0%       0.00        sched_debug.cfs_rq:/.runnable_avg.min
    161.40 ±  3%     +16.8%     188.44 ±  6%  sched_debug.cfs_rq:/.runnable_avg.stddev
    243.06 ±  2%     -53.0%     114.20 ± 10%  sched_debug.cfs_rq:/.util_avg.avg
    111.42 ±  5%    -100.0%       0.00        sched_debug.cfs_rq:/.util_avg.min
    143.53 ±  4%     +31.2%     188.36 ±  6%  sched_debug.cfs_rq:/.util_avg.stddev
     45.14 ±  5%     -53.8%      20.87 ± 29%  sched_debug.cfs_rq:/.util_est.avg
    117.16 ±  9%     -23.3%      89.81 ± 15%  sched_debug.cfs_rq:/.util_est.stddev
    460889           +78.9%     824600 ±  4%  sched_debug.cpu.avg_idle.avg
    545161 ±  4%     +83.4%    1000000        sched_debug.cpu.avg_idle.max
      7815 ±  7%     -47.7%       4084 ± 14%  sched_debug.cpu.avg_idle.min
     96234 ±  8%    +192.4%     281404 ± 13%  sched_debug.cpu.avg_idle.stddev
    754.64 ±  5%     -19.2%     609.61 ±  9%  sched_debug.cpu.clock_task.stddev
      1016 ±  7%     -74.3%     261.72 ± 25%  sched_debug.cpu.curr->pid.avg
      1648           -37.6%       1027 ± 14%  sched_debug.cpu.curr->pid.stddev
      0.00 ± 24%     -27.7%       0.00 ± 10%  sched_debug.cpu.next_balance.stddev
      0.35 ± 10%     -82.5%       0.06 ± 20%  sched_debug.cpu.nr_running.avg
      2.92 ± 20%     -65.7%       1.00        sched_debug.cpu.nr_running.max
      0.60 ±  6%     -61.0%       0.23 ±  8%  sched_debug.cpu.nr_running.stddev
   2060126           -47.9%    1073009 ± 44%  sched_debug.cpu.nr_switches.avg
   2688437           -31.6%    1839609 ± 44%  sched_debug.cpu.nr_switches.max
    650892 ±  9%     -43.0%     370926 ± 54%  sched_debug.cpu.nr_switches.min
    522908 ±  2%     -49.9%     261974 ± 45%  sched_debug.cpu.nr_switches.stddev
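
For readers cross-checking the table, the sketch below reproduces the middle column from the two value columns. It assumes, based only on the rows shown here, that most rows report a relative change in percent, while rows whose metric name ends in "%" (such as mpstat.cpu.all.idle%) report an absolute difference in percentage points. The function names are illustrative and not taken from lkp-tests.

```python
# Illustrative helpers (names are assumptions, not lkp-tests code).

def percent_change(base: float, new: float) -> float:
    """Relative change from the parent commit to the tested commit, in percent."""
    return (new - base) / base * 100.0


def point_change(base: float, new: float) -> float:
    """Absolute difference; appears to be used for metrics already expressed in %."""
    return new - base


# stress-ng.io-uring.ops_per_sec: 5758736 -> 3343243
print(f"{percent_change(5758736, 3343243):+.1f}%")  # -41.9%

# vmstat.cpu.wa: 19.87 -> 44.07 (relative change)
print(f"{percent_change(19.87, 44.07):+.1f}%")      # +121.8%

# mpstat.cpu.all.idle%: 41.57 -> 35.76 (percentage points)
print(f"{point_change(41.57, 35.76):+.1f}")         # -5.8
```

The first result matches the 41.9% regression in the subject line. The percentage-point convention explains why mpstat.cpu.all.iowait% shows +25.1 while the closely related vmstat.cpu.wa shows +121.8%.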




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

