Date: Thu, 31 Jul 2025 20:39:36 +0700
From: Ammar Faizi
To: Ahmad Gani
Cc: Alviro Iskandar Setiawan, GNU/Weeb Mailing List
Subject: Re: [PATCH gwproxy v1 0/3] Initial work for DNS lookup implementation
References: <20250731030856.366368-1-reyuki@gnuweeb.org>
In-Reply-To: <20250731030856.366368-1-reyuki@gnuweeb.org>

On Thu, Jul 31, 2025 at 10:07:43AM +0700, Ahmad Gani wrote:
> Is it preferred to use the current model (spawning dedicated DNS threads)
> and make the behavior of the resolver the same as getaddrinfo
> (blocking)?
> So far I've created the addrinfo interfaces with C-ares style and tested
> it; although, it's still blocking.
>
> I would like to know the numbers for comparison between the thread model
> vs. the asynchronous model. Which approach is best for this scenario/case
> (DNS resolution)?

It's better if the DNS resolution can be done non-blocking via epoll
than in a separate thread.

With dedicated DNS worker threads, you need:

  1) Queue.
  2) Mutex.
  3) Condvar.
  4) An eventfd to notify the sleeping epoll.
  5) A reference count to avoid UAF on cancellation.
  6) Open and close a SOCK_DGRAM socket for each query.
  7) eventfd_write() from the producer.
  8) eventfd_read() from the consumer (sleeping epoll).

The steps to perform just a single query are unnecessarily complicated.
The cost of communication between multiple threads could have been
avoided with a non-blocking pollable socket.

With a non-blocking pollable socket, everything is done more
efficiently:

  1) No contention waiting on the queue mutex lock.
  2) Only one SOCK_DGRAM socket is needed (can be reused forever).
  3) No eventfd is needed.
  4) No mutex is needed (maybe only for the cache, but even then the
     cost is minimal with an rwlock instead of full mutex protection).

You can scale up the number of SOCK_DGRAM sockets if you ever want to
use multiple DNS servers. Anyway, since SOCK_DGRAM is connectionless,
you can even use one SOCK_DGRAM socket to send to and receive from
multiple DNS servers (see sendto(2) and recvfrom(2)).

Also, with a very busy proxy server, you'll be much further from
hitting RLIMIT_NOFILE as the number of fds is probably cut in half (no
eventfd and no socket fds created internally by getaddrinfo()).

> I feel like multi-threading is much faster than single-threading, as it's
> executed in true parallel (on a multi-core system) compared to
> concurrently doing things with an event notification mechanism.

Yes, but you should add more threads that call epoll_wait(), not more
DNS threads.
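To make the single-socket idea above concrete, here is a minimal sketch,
assuming Linux, IPv4, and an existing epoll instance. The helper names
(dns_sock_open, dns_sock_send, dns_sock_recv) are hypothetical and not
taken from gwproxy; building and parsing the DNS packets themselves is
left out.

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create ONE non-blocking UDP socket and register
 * it with an existing epoll instance. The socket is never connect()ed,
 * so it can talk to any number of DNS servers.
 */
static int dns_sock_open(int epfd)
{
	struct epoll_event ev;
	int fd;

	fd = socket(AF_INET, SOCK_DGRAM | SOCK_NONBLOCK | SOCK_CLOEXEC, 0);
	if (fd < 0)
		return -1;

	memset(&ev, 0, sizeof(ev));
	ev.events = EPOLLIN;
	ev.data.fd = fd;
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Send one query packet to the given server, see sendto(2). */
static ssize_t dns_sock_send(int fd, const struct sockaddr_in *srv,
			     const void *pkt, size_t len)
{
	assert(srv != NULL && pkt != NULL);
	return sendto(fd, pkt, len, 0, (const struct sockaddr *)srv,
		      sizeof(*srv));
}

/*
 * Wait until the socket is readable, then receive one reply. The
 * sender address is stored in *from so the caller can tell which
 * server answered. Returns -1 on timeout or error (including EAGAIN).
 * In a real proxy the epoll_wait() call would be the main event loop
 * that also serves the client sockets (assumption).
 */
static ssize_t dns_sock_recv(int epfd, int fd, struct sockaddr_in *from,
			     void *buf, size_t len, int timeout_ms)
{
	socklen_t slen = sizeof(*from);
	struct epoll_event ev;
	int n;

	n = epoll_wait(epfd, &ev, 1, timeout_ms);
	if (n <= 0 || ev.data.fd != fd)
		return -1;

	return recvfrom(fd, buf, len, 0, (struct sockaddr *)from, &slen);
}
```

No queue, mutex, condvar or eventfd is involved here: the query is sent
from the same thread that later sees EPOLLIN for the reply. Matching a
reply to its pending query (for example by DNS transaction ID) is still
needed on top of this and is not shown.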

The reason why we have so many DNS threads is that getaddrinfo() can't
be polled, not that we have maxed out a CPU core at 100%.

For now, multithreading makes things faster when performing
getaddrinfo() because multiple queries run concurrently. It's not
because getaddrinfo() calls have fully eaten your CPU core. Right?

Ideally, the same thing could be achieved with epoll_wait() alone,
without the extra mutex, condvar and eventfd, which is cheaper.

Communication between threads is costly; avoid it if possible. If you
can have a multithreaded workload with zero communication between
threads, that's very good.

And that's one of the reasons we want to write our own DNS resolver.
It's an effort to reduce the communication between threads.

> I also noticed that in the C-ares example [3], they recommend using the
> event thread example, and it is similar to the io_uring model in my
> perspective, where the operation is executed internally instead of
> letting the caller poll for readiness with ares_process_fd. Maybe we can
> mimic this aspect?

That's wrong. io_uring arms poll for networking workloads; it does not
create io-wq threads (except for shutdown()).

This has been discussed previously:

  that'll just arm a poll trigger to retry the operation when a
  connection comes in

  if you see io-wq activity for something that does
  connect/accept/recv/send/etc only, then there's either a bug in the
  kernel or you are doing something wrong in your app

  https://discord.com/channels/1241076672589991966/1241076672589991970/1398781840546074624

-- 
Ammar Faizi