On Thu, Dec 09, 2021 at 12:56:36PM -0500, jrun wrote:
> On Thu, Dec 09, 2021 at 03:02:12PM +0000, Pavel Begunkov wrote:
> > 1) Anything in dmesg? Please when it got stuck (or what the symptoms are),
> > don't kill it but wait for 3 minutes and check dmesg again.
> >
>
> nothing in dmesg!
>
> > Or you to reduce the waiting time:
> > "echo 10 > /proc/sys/kernel/hung_task_timeout_secs"
>
> oh, my kernel[mek] is missing that; rebuilding right now with
> `CONFIG_DETECT_HUNG_TASK=y`; will report back after reboot.
>
> btw, enabled CONFIG_WQ_WATCHDOG=y for workqueue.watchdog_thresh; don't know if
> that would help too. let me know.

nothing!

> > 3) Have you tried normal accept (non-direct)?

hum, io_uring_prep_accept() also goes out for lunch.

wait a minute, i see something (BUG?):
all things equal, unix sockets fail but tcp sockets work. i can investigate
further to see if it has to do with _abstract_ unix sockets. let me know.

to test, apply the attached patch to the original repo in this thread.

> no, will try, but accept_direct worked for me before introducing pthread into
> the code. don't know if it matters.
>
> > 4) Can try increase the max number io-wq workers exceeds the max number
> > of inflight requests? Increase RLIMIT_NPROC, E.g. set it to
> > RLIMIT_NPROC = nr_threads + max inflight requests.

i'm maxed out i think, doing this at the top of main() anyway:

```
/* needs <stdio.h>, <stdlib.h>, <sys/resource.h>, <sysexits.h> */
struct rlimit rlim = {0};

getrlimit(RLIMIT_NPROC, &rlim);
if (rlim.rlim_cur != RLIM_INFINITY || rlim.rlim_max != RLIM_INFINITY) {
	fprintf(stderr, "rlim.rlim_cur=%lu rlim.rlim_max=%lu\n",
		(unsigned long)rlim.rlim_cur, (unsigned long)rlim.rlim_max);
	rlim.rlim_cur = RLIM_INFINITY;
	rlim.rlim_max = RLIM_INFINITY;
	/* raising the hard limit needs CAP_SYS_RESOURCE */
	int ret = setrlimit(RLIMIT_NPROC, &rlim);
	if (ret) {
		perror("setrlimit");
		exit(EX_SOFTWARE);
	}
}
```

	- jrun
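
for anyone reproducing the unix-vs-tcp difference above: a minimal sketch of how the two unix address flavors (pathname vs. abstract) can be built, so the same accept path can be tried against each. this assumes Linux; `make_unix_addr()` and `listen_unix()` are names made up for illustration, they are not from the repo or the attached patch, and no io_uring calls are involved here.

```
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Hypothetical helper: fill *addr for an AF_UNIX socket and return the
 * length to pass to bind()/connect(), or 0 if the name does not fit.
 * abstract != 0 selects the Linux abstract namespace: sun_path[0] is a
 * NUL byte and the name follows it, with no terminating NUL counted.
 */
socklen_t make_unix_addr(struct sockaddr_un *addr, const char *name, int abstract)
{
	size_t len = strlen(name);
	size_t off = abstract ? 1 : 0;

	/* one extra byte either way: leading NUL (abstract) or trailing NUL (path) */
	if (len == 0 || len + 1 > sizeof(addr->sun_path))
		return 0;

	memset(addr, 0, sizeof(*addr));
	addr->sun_family = AF_UNIX;
	memcpy(addr->sun_path + off, name, len);

	return (socklen_t)(offsetof(struct sockaddr_un, sun_path) + len + 1);
}

/* Hypothetical helper: returns a listening fd, or -1 on error. */
int listen_unix(const char *name, int abstract)
{
	struct sockaddr_un addr;
	socklen_t alen = make_unix_addr(&addr, name, abstract);
	int fd;

	if (!alen)
		return -1;

	fd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	if (!abstract)
		unlink(name);	/* drop a stale socket file from an earlier run */

	if (bind(fd, (struct sockaddr *)&addr, alen) < 0 || listen(fd, 16) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

calling `listen_unix("/tmp/test.sock", 0)` and `listen_unix("test", 1)` (both names are placeholders) gives one listener of each kind. the detail that tends to bite with abstract sockets is the address length: passing `sizeof(struct sockaddr_un)` makes the trailing zero bytes part of the name, so client and server only match if both sides compute the length the same way.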