From: Alviro Iskandar Setiawan
To: Ammar Faiz
Cc: Alviro Iskandar Setiawan, Ahmad Gani, GNU/Weeb Mailing List
Subject: [PATCH gwproxy v13 0/8] Initial work on integration of DNS parser lib in gwproxy
Date: Tue, 30 Sep 2025 15:54:20 +0700
Message-Id: <20250930085428.717195-1-alviro.iskandar@gnuweeb.org>

Hi,

I hope you are all doing well! I was busy with business in Jakarta, so
this revision is a bit delayed.

This is the v13 revision of the new DNS resolver feature that does not
rely on getaddrinfo().

This series is also available at:

  https://github.com/alviroiskandar/gwproxy.git #new-dns-resolver

Changes from v11 to v12:

  * Significant changes from Ahmad Gani's work.

Link to v11: https://lore.gnuweeb.org/gwml/20250914050943.184934-1-reyuki@gnuweeb.org

Changes from v12 to v13:

  * Forbid using the raw DNS feature when the event loop is io_uring.
  * Fix missed gwp_dns_res_drop_query() call in
    gwp_dns_res_complete_query().

Link to v12: https://lore.gnuweeb.org/gwml/20250918184730.598305-1-alviro.iskandar@gnuweeb.org

One concern with getaddrinfo() is that it blocks the entire thread
while waiting for the DNS resolution to complete. The naive workaround
is to have dedicated worker threads that solely handle DNS resolution.
Such a workaround is inefficient because it has to communicate across
threads using mutexes and condition variables, which adds overhead.
Moreover, each getaddrinfo() call creates its own socket and closes it
after the resolution is done, which is not efficient either.

Another concern is that getaddrinfo() does not allow us to specify the
DNS server to use, so it relies on the system's DNS configuration,
which may not be desirable in some scenarios.

There was an attempt to batch DNS resolutions using getaddrinfo_a(),
but it is not available in all libc implementations, and it is still
not pollable. Behind the scenes, it still uses multiple threads to
handle the DNS resolutions, so it has the same problems as above. Even
worse, it needs a clone() call (to spawn a helper thread) to handle a
single DNS query. So if we batch 100 DNS queries, it executes the
clone() syscall 100 times, which is neither efficient nor scalable.
Cancelling pending DNS queries is also very complicated.

To address these concerns, this series introduces a new experimental
DNS resolver feature that does not rely on getaddrinfo(). Instead, it
uses a single reusable UDP socket per thread to communicate with the
DNS server directly. It also lets the event loop poll the UDP socket
for incoming DNS responses, so it does not block the entire thread.
The DNS server to use can be specified via the new --dns-server
option. Currently, only one DNS server can be specified.

There are 8 patches in this series. One patch is not related to the
DNS feature: it removes the unused struct gwp_dns_query declaration.
The other 7 patches are related to the DNS feature:

  * Ahmad Gani: Introduce __unused macro to silence unused variable
    warnings.
  * Ahmad Gani: Add DNS parser code.
  * me: Add DNS resolver code.
  * me: Add DNS resolution interface APIs.
  * me: Introduce --dns-server and --raw-dns options.
  * me: Integrate the raw DNS feature to epoll.
  * me: Introduce --use-new-dns-resolver configure option.

I tried not to be too invasive in this series. I don't touch dns.c at
all. Instead, I added a new file, dns_resolver.c, which is only
compiled if --use-new-dns-resolver is passed at configure time. The
new DNS resolver feature is disabled by default as it's still
experimental. I also added a new option, --dns-server, to specify the
DNS server to use, and --raw-dns=1|0 to enable or disable the new raw
DNS feature.

This series constructs the new DNS resolver feature step by step.
Patches 2-5 are preparatory patches. Patch 6 is where the data
structures are embedded into the gwproxy common data structures and
where the initialization happens. The actual integration to epoll
happens in patch 7. The last patch, patch 8, adds the configure option
to enable the new DNS resolver feature.

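To make the "one pollable UDP socket per thread" idea above concrete,
here is a minimal sketch, not the code from this series. The names
build_a_query() and dns_sock_register() are hypothetical and exist
only for illustration. It assumes Linux, an IPv4 DNS server, and an
A-record query; the real resolver may differ in all of these.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not gwproxy's API): encode a DNS query for an
 * A record into buf. Returns the packet length, or 0 on error.
 */
size_t build_a_query(uint16_t txid, const char *name, uint8_t *buf,
		     size_t cap)
{
	size_t nlen, off = 12;

	assert(name && buf);
	nlen = strlen(name);

	/* 12-byte header + encoded name (at most nlen + 2) + QTYPE + QCLASS. */
	if (nlen == 0 || nlen > 253 || cap < 12 + nlen + 2 + 4)
		return 0;

	memset(buf, 0, 12);
	buf[0] = (uint8_t)(txid >> 8);	/* transaction ID, big endian */
	buf[1] = (uint8_t)(txid & 0xff);
	buf[2] = 0x01;			/* flags: RD (recursion desired) */
	buf[5] = 1;			/* QDCOUNT = 1 */

	/* "example.com" becomes: 7 "example" 3 "com" 0 */
	while (*name) {
		const char *dot = strchr(name, '.');
		size_t l = dot ? (size_t)(dot - name) : strlen(name);

		if (l == 0 || l > 63)
			return 0;	/* empty or oversized label */
		buf[off++] = (uint8_t)l;
		memcpy(&buf[off], name, l);
		off += l;
		name += l + (dot ? 1 : 0);
	}
	buf[off++] = 0;			/* root label */
	buf[off++] = 0; buf[off++] = 1;	/* QTYPE = A */
	buf[off++] = 0; buf[off++] = 1;	/* QCLASS = IN */
	return off;
}

/*
 * Hypothetical helper: create one non-blocking UDP socket connected to
 * the DNS server and register it to an existing epoll instance.
 * Returns the socket fd, or -1 on error.
 */
int dns_sock_register(int ep_fd, const char *ip, uint16_t port)
{
	struct sockaddr_in sa = { .sin_family = AF_INET };
	struct epoll_event ev = { .events = EPOLLIN };
	int fd;

	sa.sin_port = htons(port);
	if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1)
		return -1;

	fd = socket(AF_INET, SOCK_DGRAM | SOCK_NONBLOCK | SOCK_CLOEXEC, 0);
	if (fd < 0)
		return -1;

	ev.data.fd = fd;
	if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0 ||
	    epoll_ctl(ep_fd, EPOLL_CTL_ADD, fd, &ev) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

With something like this, a worker would send() the packet on the
shared socket and return to its event loop. When epoll reports EPOLLIN
on that fd, it would recv() the response and match it to the pending
query by transaction ID, so no thread ever blocks on the resolution.
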
How to test this feature:

  ./configure --cc=clang --use-new-dns-resolver;
  make -j$(nproc);
  ./gwproxy --as-socks5=1 --raw-dns=1 --nr-workers=1 --log-level=4 --bind=[::]:1080 --dns-server=1.1.1.1:53;

Then, in another terminal, use curl to test it:

  curl --proxy socks5h://[::1]:1080 http://example.com;

You will see something like this (this sample run used
--nr-workers=4):

  $ ./gwproxy --as-socks5=1 --raw-dns=1 --nr-workers=4 --log-level=4 --bind=[::]:1080 --dns-server=1.1.1.1:53
  [2025-09-19 01:17:55][debug ][00940294]: Using event loop: epoll
  [2025-09-19 01:17:55][debug ][00940294]: Initializing SOCKS5 context
  [2025-09-19 01:17:55][debug ][00940294]: SOCKS5 context initialized without auth file
  [2025-09-19 01:17:55][info ][00940294]: Worker 0 is listening on [::]:1080 (fd=3)
  [2025-09-19 01:17:55][debug ][00940294]: Worker 0 initialized raw DNS resolver: 1.1.1.1:53 (fd=4)
  [2025-09-19 01:17:55][debug ][00940294]: Worker 0 registered raw DNS UDP socket to epoll (fd=4)
  [2025-09-19 01:17:55][debug ][00940294]: Worker 0 epoll (ep_fd=5, ev_fd=6)
  [2025-09-19 01:17:55][info ][00940294]: Worker 1 is listening on [::]:1080 (fd=7)
  [2025-09-19 01:17:55][debug ][00940294]: Worker 1 initialized raw DNS resolver: 1.1.1.1:53 (fd=8)
  [2025-09-19 01:17:55][debug ][00940294]: Worker 1 registered raw DNS UDP socket to epoll (fd=8)
  [2025-09-19 01:17:55][debug ][00940294]: Worker 1 epoll (ep_fd=9, ev_fd=10)
  [2025-09-19 01:17:55][info ][00940294]: Worker 2 is listening on [::]:1080 (fd=11)
  [2025-09-19 01:17:55][debug ][00940294]: Worker 2 initialized raw DNS resolver: 1.1.1.1:53 (fd=12)
  [2025-09-19 01:17:55][debug ][00940294]: Worker 2 registered raw DNS UDP socket to epoll (fd=12)
  [2025-09-19 01:17:55][debug ][00940294]: Worker 2 epoll (ep_fd=13, ev_fd=14)
  [2025-09-19 01:17:55][info ][00940294]: Worker 3 is listening on [::]:1080 (fd=15)
  [2025-09-19 01:17:55][debug ][00940294]: Worker 3 initialized raw DNS resolver: 1.1.1.1:53 (fd=16)
  [2025-09-19 01:17:55][debug ][00940294]: Worker 3 registered raw DNS UDP socket to epoll (fd=16)
  [2025-09-19 01:17:55][debug ][00940294]: Worker 3 epoll (ep_fd=17, ev_fd=18)
  [2025-09-19 01:17:55][info ][00940295]: Worker 1 started (epoll)
  [2025-09-19 01:17:55][info ][00940296]: Worker 2 started (epoll)
  [2025-09-19 01:17:55][info ][00940297]: Worker 3 started (epoll)
  [2025-09-19 01:17:55][info ][00940294]: Worker 0 started (epoll)
  [2025-09-19 01:17:55][debug ][00940296]: Increased connection slot capacity to 16
  [2025-09-19 01:17:55][debug ][00940296]: New connection from [::1]:45444 (fd=19)
  [2025-09-19 01:17:55][debug ][00940296]: Resolved DNS query for example.com to 23.215.0.138:80 (gcp_idx=0)
  [2025-09-19 01:17:55][info ][00940296]: New connection pair created (idx=0, cfd=19, tfd=20, ca=[::1]:45444, ta=23.215.0.138:80)
  [2025-09-19 01:17:56][info ][00940296]: Target socket connected (fd=20, idx=0, ca=[::1]:45444, ta=23.215.0.138:80)
  [2025-09-19 01:17:56][info ][00940296]: Closing connection pair (idx=0, cfd=19, tfd=20, ca=[::1]:45444, ta=23.215.0.138:80)
  [2025-09-19 01:17:56][debug ][00940296]: Connection slot capacity shrunk to 0

Interesting points from that log:

  * The DNS server 1.1.1.1:53 is used.
  * Each thread has its own DNS UDP socket registered to epoll.
  * The DNS query for example.com is resolved properly.
  * gcp_idx=0 indicates that the DNS query uses txid=0, which is also
    the index in the sess_map array.

Future work:

  * Support multiple DNS servers.
  * Support /etc/hosts file parsing.
  * Add timeout handling for DNS queries.
  * Integrate io_uring support for the new DNS feature.
  * Split DNS parser unit tests into a separate test suite.
  * Integrate GitHub Actions to build and test the new DNS feature.

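As an illustration of the log note about gcp_idx and txid, the sketch
below shows how a transaction ID can double as an array index, so a
DNS response is matched to its pending query without a search. This is
not the actual sess_map code from the series. The struct and function
names (txid_map, txid_map_get(), and so on) are hypothetical, and the
fixed capacity is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAP_CAP 8	/* tiny on purpose; a real map would likely grow */

struct query_sess {
	int   in_use;
	void *udata;	/* e.g. the connection waiting for this answer */
};

struct txid_map {
	struct query_sess slots[MAP_CAP];
};

/* Take the lowest free slot; its index doubles as the DNS txid. */
int txid_map_get(struct txid_map *m, void *udata)
{
	for (int i = 0; i < MAP_CAP; i++) {
		if (!m->slots[i].in_use) {
			m->slots[i].in_use = 1;
			m->slots[i].udata = udata;
			return i;
		}
	}
	return -1;	/* every txid is currently in flight */
}

/* O(1) lookup when a response arrives; NULL for unknown or stale txids. */
void *txid_map_lookup(const struct txid_map *m, uint16_t txid)
{
	if (txid >= MAP_CAP || !m->slots[txid].in_use)
		return NULL;
	return m->slots[txid].udata;
}

/* Release the slot once the query is completed or dropped. */
void txid_map_put(struct txid_map *m, uint16_t txid)
{
	assert(txid < MAP_CAP);
	m->slots[txid].in_use = 0;
	m->slots[txid].udata = NULL;
}
```

Under this scheme, the first query issued by an idle worker gets slot
0, which would match the txid=0 seen in the log. A released slot is
handed out again by the next query.
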
Ahmad Gani (2):
  gwproxy: Introduce __unused macro
  Add DNS parser code

Alviro Iskandar Setiawan (6):
  gwproxy: Remove 'struct gwp_dns_query' declaration
  Add DNS resolver code
  dns_resolver: Add DNS resolution interface APIs
  gwproxy: Introduce --dns-server and --raw-dns
  epoll: Intregrate the raw DNS feature to epoll
  Makefile: Introduce --use-new-dns-resolver configure option

 Makefile                   |   6 +
 configure                  |   8 +
 src/gwproxy/common.h       |   4 +
 src/gwproxy/dns_parser.c   | 583 +++++++++++++++++++++++++++++++++++++
 src/gwproxy/dns_parser.h   | 192 ++++++++++++
 src/gwproxy/dns_resolver.c | 377 ++++++++++++++++++++++++
 src/gwproxy/dns_resolver.h |  49 ++++
 src/gwproxy/ev/epoll.c     | 143 ++++++++-
 src/gwproxy/gwproxy.c      | 205 ++++++++++++-
 src/gwproxy/gwproxy.h      |  28 +-
 10 files changed, 1576 insertions(+), 19 deletions(-)
 create mode 100644 src/gwproxy/dns_parser.c
 create mode 100644 src/gwproxy/dns_parser.h
 create mode 100644 src/gwproxy/dns_resolver.c
 create mode 100644 src/gwproxy/dns_resolver.h

base-commit: 60c6c822cf8ab14d80800776435417238ea371b0
-- 
Alviro Iskandar Setiawan