From: Alviro Iskandar Setiawan
To: Ammar Faiz
Cc: Alviro Iskandar Setiawan, Ahmad Gani, GNU/Weeb Mailing List
Subject: [PATCH gwproxy v12 0/8] Initial work on integration of DNS parser lib in gwproxy
Date: Fri, 19 Sep 2025 01:47:22 +0700
Message-Id: <20250918184730.598305-1-alviro.iskandar@gnuweeb.org>

Happy Friday everyone,

I hope you all are doing well!

This is the v12 revision of the new DNS resolver feature that does not
rely on getaddrinfo(). This revision is based on the previous v11 series
with significant changes.

This series is also available at:

  https://github.com/alviroiskandar/gwproxy.git #new-dns-resolver

One of the concerns in using getaddrinfo() is that it blocks the entire
thread while waiting for the DNS resolution to complete. The naive
workaround is to have dedicated worker threads that solely handle DNS
resolution. That workaround is inefficient because the threads have to
communicate using mutexes and condition variables, which adds overhead.
Moreover, each getaddrinfo() call creates its own socket and closes it
after the resolution is done, which is not efficient either. Another
concern is that getaddrinfo() does not allow us to specify the DNS
server to use, so it relies on the system's DNS configuration, which may
not be desirable in some scenarios.

There was an attempt to batch DNS resolutions using getaddrinfo_a(), but
it's not available in all libc implementations, and it's still not
pollable. Behind the scenes, it still uses multiple threads to handle
the DNS resolutions, so it has the same concerns as above. Even worse,
it needs to clone() the entire process to handle a single DNS query. So
if we batch 100 DNS queries, it will execute the clone() syscall 100
times, which is neither efficient nor scalable. Cancelling pending DNS
queries is also very complicated.

To address those concerns, this series introduces a new experimental DNS
resolver feature that does not rely on getaddrinfo(). Instead, it uses a
single reusable UDP socket per thread to communicate with the DNS server
directly. It also allows the event loop to poll the UDP socket for
incoming DNS responses, so it does not block the entire thread. The DNS
server to use can be specified via the new --dns-server option.
Currently, only one DNS server can be specified.

There are 8 patches in this series. One of them is not related to the
DNS feature: it removes the unused struct gwp_dns_query declaration.

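To illustrate the "one reusable UDP socket per thread, polled by the
event loop" idea, here is a minimal sketch. It is not code from the
patches: dns_udp_open() and dns_udp_register() are hypothetical names,
and it assumes an IPv4 DNS server and a Linux epoll instance.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from this series): create one non-blocking
 * UDP socket "connected" to an IPv4 DNS server. connect() on a UDP
 * socket sends nothing; it only fixes the peer address so that plain
 * send()/recv() can be used afterwards.
 * Returns the fd on success, -1 on failure.
 */
static int dns_udp_open(const char *ip, uint16_t port)
{
	struct sockaddr_in addr;
	int fd;

	assert(ip != NULL);
	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_port = htons(port);
	if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
		return -1;

	fd = socket(AF_INET, SOCK_DGRAM | SOCK_NONBLOCK | SOCK_CLOEXEC, 0);
	if (fd < 0)
		return -1;

	if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}

	return fd;
}

/*
 * Hypothetical helper: register the UDP socket for readability in an
 * existing epoll instance, so DNS responses wake the event loop like
 * any other fd. Returns 0 on success, -1 on failure.
 */
static int dns_udp_register(int ep_fd, int udp_fd)
{
	struct epoll_event ev;

	memset(&ev, 0, sizeof(ev));
	ev.events = EPOLLIN;
	ev.data.fd = udp_fd;
	return epoll_ctl(ep_fd, EPOLL_CTL_ADD, udp_fd, &ev);
}
```

In this sketch a worker would call dns_udp_open("1.1.1.1", 53) once at
startup, register the fd, and then reuse the same socket for every
query it sends.
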
The other 7 patches are related to the DNS feature:

  * Ahmad Gani: Introduce __unused macro to silence unused variable
    warnings.
  * Ahmad Gani: Add DNS parser code.
  * me: Add DNS resolver code.
  * me: Add DNS resolution interface APIs.
  * me: Introduce --dns-server and --raw-dns options.
  * me: Integrate the raw DNS feature to epoll.
  * me: Introduce --use-new-dns-resolver configure option.

I tried not to be too invasive in this series. I don't touch dns.c at
all. Instead, I added a new file, dns_resolver.c, which is only compiled
if --use-new-dns-resolver is passed at configure time. The new DNS
resolver feature is disabled by default as it's still experimental. I
also added a new option --dns-server to specify the DNS server to use,
and --raw-dns=1|0 to enable the new raw DNS feature.

This series constructs the new DNS resolver feature step by step.
Patches 2-5 are preparatory patches. Patch 6 is where the data
structures are embedded into the gwproxy common data structures and the
initialization happens. The actual integration to epoll happens in
patch 7. The last patch, patch 8, adds the configure option to enable
the new DNS resolver feature.

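For readers unfamiliar with what a raw resolver has to put on the wire,
here is a minimal sketch of building a single-question DNS query in the
RFC 1035 format. It is not the parser or resolver code from the patches:
dns_build_query() is a hypothetical name, and the sketch only handles
plain ASCII host names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not from this series): build a DNS query for
 * `name` with the given transaction ID and QTYPE (1 = A, 28 = AAAA).
 * Returns the packet length on success, -1 on invalid input or if
 * `buf` is too small.
 */
static int dns_build_query(uint8_t *buf, size_t cap, uint16_t txid,
			   const char *name, uint16_t qtype)
{
	const char *p, *dot;
	size_t off = 12, nlen;

	assert(buf != NULL && name != NULL);
	nlen = strlen(name);

	/* 12-byte header + encoded name (at most nlen + 2) + QTYPE + QCLASS */
	if (nlen == 0 || nlen > 253 || cap < 12 + nlen + 2 + 4)
		return -1;

	memset(buf, 0, 12);
	buf[0] = (uint8_t)(txid >> 8);
	buf[1] = (uint8_t)(txid & 0xff);
	buf[2] = 0x01;	/* flags: recursion desired (RD) */
	buf[5] = 1;	/* QDCOUNT = 1 */

	/* Encode each dot-separated label as <length><bytes> */
	for (p = name; *p; p = dot + 1) {
		size_t len;

		dot = strchr(p, '.');
		len = dot ? (size_t)(dot - p) : strlen(p);
		if (len == 0 || len > 63)
			return -1;

		buf[off++] = (uint8_t)len;
		memcpy(buf + off, p, len);
		off += len;
		if (!dot)
			break;
	}

	buf[off++] = 0;				/* root label ends the name */
	buf[off++] = (uint8_t)(qtype >> 8);
	buf[off++] = (uint8_t)(qtype & 0xff);
	buf[off++] = 0;				/* QCLASS = IN (1) */
	buf[off++] = 1;
	return (int)off;
}
```

The resulting buffer can be sent as-is with send() on a connected UDP
socket; the response carries the same transaction ID in its first two
bytes.
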
How to test this feature:

  ./configure --cc=clang --use-new-dns-resolver;
  make -j$(nproc);
  ./gwproxy --as-socks5=1 --raw-dns=1 --nr-workers=1 --log-level=4 --bind=[::]:1080 --dns-server=1.1.1.1:53;

Then in another terminal, you can use curl to test it:

  curl --proxy socks5h://[::1]:1080 http://example.com;

You will see something like this (this run used --nr-workers=4):

  $ ./gwproxy --as-socks5=1 --raw-dns=1 --nr-workers=4 --log-level=4 --bind=[::]:1080 --dns-server=1.1.1.1:53
  [2025-09-19 01:17:55][debug ][00940294]: Using event loop: epoll
  [2025-09-19 01:17:55][debug ][00940294]: Initializing SOCKS5 context
  [2025-09-19 01:17:55][debug ][00940294]: SOCKS5 context initialized without auth file
  [2025-09-19 01:17:55][info ][00940294]: Worker 0 is listening on [::]:1080 (fd=3)
  [2025-09-19 01:17:55][debug ][00940294]: Worker 0 initialized raw DNS resolver: 1.1.1.1:53 (fd=4)
  [2025-09-19 01:17:55][debug ][00940294]: Worker 0 registered raw DNS UDP socket to epoll (fd=4)
  [2025-09-19 01:17:55][debug ][00940294]: Worker 0 epoll (ep_fd=5, ev_fd=6)
  [2025-09-19 01:17:55][info ][00940294]: Worker 1 is listening on [::]:1080 (fd=7)
  [2025-09-19 01:17:55][debug ][00940294]: Worker 1 initialized raw DNS resolver: 1.1.1.1:53 (fd=8)
  [2025-09-19 01:17:55][debug ][00940294]: Worker 1 registered raw DNS UDP socket to epoll (fd=8)
  [2025-09-19 01:17:55][debug ][00940294]: Worker 1 epoll (ep_fd=9, ev_fd=10)
  [2025-09-19 01:17:55][info ][00940294]: Worker 2 is listening on [::]:1080 (fd=11)
  [2025-09-19 01:17:55][debug ][00940294]: Worker 2 initialized raw DNS resolver: 1.1.1.1:53 (fd=12)
  [2025-09-19 01:17:55][debug ][00940294]: Worker 2 registered raw DNS UDP socket to epoll (fd=12)
  [2025-09-19 01:17:55][debug ][00940294]: Worker 2 epoll (ep_fd=13, ev_fd=14)
  [2025-09-19 01:17:55][info ][00940294]: Worker 3 is listening on [::]:1080 (fd=15)
  [2025-09-19 01:17:55][debug ][00940294]: Worker 3 initialized raw DNS resolver: 1.1.1.1:53 (fd=16)
  [2025-09-19 01:17:55][debug ][00940294]: Worker 3 registered raw DNS UDP socket to epoll (fd=16)
  [2025-09-19 01:17:55][debug ][00940294]: Worker 3 epoll (ep_fd=17, ev_fd=18)
  [2025-09-19 01:17:55][info ][00940295]: Worker 1 started (epoll)
  [2025-09-19 01:17:55][info ][00940296]: Worker 2 started (epoll)
  [2025-09-19 01:17:55][info ][00940297]: Worker 3 started (epoll)
  [2025-09-19 01:17:55][info ][00940294]: Worker 0 started (epoll)
  [2025-09-19 01:17:55][debug ][00940296]: Increased connection slot capacity to 16
  [2025-09-19 01:17:55][debug ][00940296]: New connection from [::1]:45444 (fd=19)
  [2025-09-19 01:17:55][debug ][00940296]: Resolved DNS query for example.com to 23.215.0.138:80 (gcp_idx=0)
  [2025-09-19 01:17:55][info ][00940296]: New connection pair created (idx=0, cfd=19, tfd=20, ca=[::1]:45444, ta=23.215.0.138:80)
  [2025-09-19 01:17:56][info ][00940296]: Target socket connected (fd=20, idx=0, ca=[::1]:45444, ta=23.215.0.138:80)
  [2025-09-19 01:17:56][info ][00940296]: Closing connection pair (idx=0, cfd=19, tfd=20, ca=[::1]:45444, ta=23.215.0.138:80)
  [2025-09-19 01:17:56][debug ][00940296]: Connection slot capacity shrunk to 0

Interesting points from that log:

  * The DNS server 1.1.1.1:53 is used.
  * Each thread has its own DNS UDP socket registered to epoll.
  * The DNS query for example.com is resolved properly.
  * gcp_idx=0 indicates that the DNS query uses txid=0, which is also
    the index in the sess_map array.

Future work:

  * Support multiple DNS servers.
  * Support /etc/hosts file parsing.
  * Add timeout handling for DNS queries.
  * Integrate io_uring support for the new DNS feature.
  * Split DNS parser unit tests into a separate test suite.
  * Integrate GitHub Actions to build and test the new DNS feature.

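The txid-as-index idea noted in the log discussion above (gcp_idx=0
meaning txid=0 and slot 0 of sess_map) can be sketched as follows. This
is only an illustration under assumptions: the struct layout, MAP_CAP
and the sess_*() helpers are hypothetical and do not reflect the actual
data structures in the patches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAP_CAP 8	/* hypothetical capacity, chosen for illustration */

/* Hypothetical per-query session slot */
struct sess {
	int in_use;
	void *udata;	/* e.g. the connection pair waiting on this query */
};

/* Hypothetical layout; only the name sess_map comes from the series */
struct sess_map {
	struct sess slots[MAP_CAP];
};

/* Claim the first free slot; its index doubles as the DNS txid. */
static int sess_alloc(struct sess_map *m, void *udata)
{
	int i;

	assert(m != NULL);
	for (i = 0; i < MAP_CAP; i++) {
		if (!m->slots[i].in_use) {
			m->slots[i].in_use = 1;
			m->slots[i].udata = udata;
			return i;
		}
	}

	return -1;	/* map is full */
}

/*
 * Map the txid of an incoming DNS response back to its session in O(1).
 * Returns NULL for an out-of-range or no-longer-active txid.
 */
static struct sess *sess_lookup(struct sess_map *m, uint16_t txid)
{
	if (txid >= MAP_CAP || !m->slots[txid].in_use)
		return NULL;

	return &m->slots[txid];
}

/* Release the slot so its txid can be reused by a later query. */
static void sess_free(struct sess_map *m, uint16_t txid)
{
	if (txid < MAP_CAP) {
		m->slots[txid].in_use = 0;
		m->slots[txid].udata = NULL;
	}
}
```

With such a scheme, no search is needed when a response arrives on the
UDP socket: the txid read from the first two bytes of the packet is
directly the array index.
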
Link to v11: https://lore.gnuweeb.org/gwml/20250914050943.184934-1-reyuki@gnuweeb.org

 Makefile                   |   6 +
 configure                  |   8 +
 src/gwproxy/common.h       |   4 +
 src/gwproxy/dns_parser.c   | 583 +++++++++++++++++++++++++++++++++++++
 src/gwproxy/dns_parser.h   | 192 ++++++++++++
 src/gwproxy/dns_resolver.c | 377 ++++++++++++++++++++++++
 src/gwproxy/dns_resolver.h |  49 ++++
 src/gwproxy/ev/epoll.c     | 143 ++++++++-
 src/gwproxy/gwproxy.c      | 199 ++++++++++++-
 src/gwproxy/gwproxy.h      |  28 +-
 10 files changed, 1570 insertions(+), 19 deletions(-)
 create mode 100644 src/gwproxy/dns_parser.c
 create mode 100644 src/gwproxy/dns_parser.h
 create mode 100644 src/gwproxy/dns_resolver.c
 create mode 100644 src/gwproxy/dns_resolver.h

base-commit: 60c6c822cf8ab14d80800776435417238ea371b0
-- 
Alviro Iskandar Setiawan