On Fri, 1 Apr 2022 at 17:36, Jens Axboe wrote:

> I take it you're continually reusing those slots?

Yes.

> If you have a test
> case that'd be ideal. Agree that it sounds like we just need an
> appropriate breather to allow fput/task_work to run. Or it could be the
> deferral free of the fixed slot.

Adding a breather could make the worst-case latency large. I think
doing the fput synchronously would be better in general.

I tested this on a VM with 8G of memory, running the following:

./forkbomb 14 &
# wait till 16k processes are forked
for i in `seq 1 100`; do ./procreads u; done

You can compare performance with plain reads (./procreads p); the other
tests don't work on public kernels.

Thanks,
Miklos