public inbox for gwml@vger.gnuweeb.org
* [PATCH gwproxy v12 0/8] Initial work on integration of DNS parser lib in gwproxy
@ 2025-09-18 18:47 Alviro Iskandar Setiawan
From: Alviro Iskandar Setiawan @ 2025-09-18 18:47 UTC
  To: Ammar Faizi; +Cc: Alviro Iskandar Setiawan, Ahmad Gani, GNU/Weeb Mailing List

Happy Friday everyone,

I hope you all are doing well!

This is the v12 revision of the new DNS resolver feature that does not
rely on getaddrinfo(). This revision is based on the previous v11
series with significant changes. This series is also available at:
	https://github.com/alviroiskandar/gwproxy.git #new-dns-resolver

	One of the concerns in using getaddrinfo() is that it blocks
the entire thread while waiting for the DNS resolution to complete. The
naive workaround is to have dedicated worker threads that solely handle
DNS resolution. Such a workaround is inefficient because the threads
must communicate using mutexes and condition variables, which adds
overhead. Moreover, each getaddrinfo() call creates its own socket and
closes it after the resolution is done, which is not efficient either.
Another concern is that getaddrinfo() does not allow us to specify the
DNS server to use, so it relies on the system's DNS configuration,
which may not be desirable in some scenarios.

	There was an attempt to batch DNS resolutions using
getaddrinfo_a(), but it is not available in all libc implementations,
and it is still not pollable. Behind the scenes, it still uses multiple
threads to handle the DNS resolutions, so it has the same concerns as
above. Even worse, it needs to clone() the entire process to handle a
single DNS query. So if we batch 100 DNS queries, it will execute the
clone() syscall 100 times, which is neither efficient nor scalable. The
cancellation of pending DNS queries is also very complicated.

	To address those concerns, this series introduces a new
experimental DNS resolver feature that does not rely on getaddrinfo().
Instead, it uses a single reusable UDP socket per thread to communicate
with the DNS server directly. The event loop polls the UDP socket for
incoming DNS responses, so resolution does not block the entire thread.
The DNS server to use can also be specified via the new --dns-server
option. Currently, only one DNS server can be specified.
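
	As a rough illustration of the per-thread socket setup described
above, here is a minimal sketch. It is not the code from this series:
the helper name open_dns_udp_sock() is hypothetical, and it assumes
Linux and an IPv4 DNS server address only.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from the series): open one
 * non-blocking UDP socket whose peer is the given IPv4 DNS server and
 * register it for EPOLLIN on an existing epoll instance.
 *
 * Returns the socket fd on success, or -1 on error.
 */
static int open_dns_udp_sock(int ep_fd, const char *ip, uint16_t port)
{
	struct sockaddr_in addr;
	struct epoll_event ev;
	int fd;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_port = htons(port);
	if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
		return -1;

	fd = socket(AF_INET, SOCK_DGRAM | SOCK_NONBLOCK | SOCK_CLOEXEC, 0);
	if (fd < 0)
		return -1;

	/* connect() on a UDP socket only fixes the peer; nothing is sent. */
	if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		goto err;

	/* Let the event loop wake up when a DNS response arrives. */
	memset(&ev, 0, sizeof(ev));
	ev.events = EPOLLIN;
	ev.data.fd = fd;
	if (epoll_ctl(ep_fd, EPOLL_CTL_ADD, fd, &ev) < 0)
		goto err;

	return fd;

err:
	close(fd);
	return -1;
}
```

Because the socket is created once and reused for every query, no
socket is opened or closed per resolution, which is the property the
paragraph above is after.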

	There are 8 patches in this series. One of them, the patch
that removes the unused struct gwp_dns_query declaration, is not
related to the DNS feature. The other 7 patches are related to the DNS
feature:
	* Ahmad Gani: Introduce __unused macro to silence unused variable warnings.
	* Ahmad Gani: Add DNS parser code.
	* me: Add DNS resolver code.
	* me: Add DNS resolution interface APIs.
	* me: Introduce --dns-server and --raw-dns options.
	* me: Integrate the raw DNS feature to epoll.
	* me: Introduce --use-new-dns-resolver configure option.

	I tried not to be too invasive in this series. I don't touch
dns.c at all. Instead, I added a new file, dns_resolver.c, which is
only compiled if --use-new-dns-resolver is passed at configure time.
The new DNS resolver feature is disabled by default as it's still
experimental. I also added a new option, --dns-server, to specify the
DNS server to use, and --raw-dns=1|0 to enable or disable the new raw
DNS feature.
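
	For readers who have not dealt with DNS at the wire level,
talking to the DNS server directly means building the query packet
ourselves. The following is a minimal sketch of that step following
the RFC 1035 message layout. It is not the parser code from this
series: build_dns_query() is a hypothetical name, and it only sets the
"recursion desired" flag and a single question.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not taken from the series): build a DNS query
 * with one question of class IN into @buf.
 *
 * Layout: 12-byte header, then QNAME as length-prefixed labels ended
 * by a zero byte, then QTYPE and QCLASS (2 bytes each, big endian).
 *
 * Returns the packet length, or -1 if a label is empty or longer than
 * 63 bytes, or if @buf is too small.
 */
static int build_dns_query(uint8_t *buf, size_t cap, uint16_t txid,
			   const char *name, uint16_t qtype)
{
	const char *p = name;
	size_t off = 12;

	if (cap < 12)
		return -1;

	memset(buf, 0, 12);
	buf[0] = (uint8_t)(txid >> 8);
	buf[1] = (uint8_t)(txid & 0xff);
	buf[2] = 0x01;	/* flags: RD (recursion desired) */
	buf[5] = 1;	/* QDCOUNT = 1 */

	while (*p) {
		const char *dot = strchr(p, '.');
		size_t len = dot ? (size_t)(dot - p) : strlen(p);

		if (len == 0 || len > 63 || off + 1 + len > cap)
			return -1;

		buf[off++] = (uint8_t)len;
		memcpy(&buf[off], p, len);
		off += len;
		p += len;
		if (*p == '.')
			p++;	/* skip the separator (or a trailing dot) */
	}

	/* Root label (1 byte) + QTYPE (2 bytes) + QCLASS (2 bytes). */
	if (off + 5 > cap)
		return -1;

	buf[off++] = 0;
	buf[off++] = (uint8_t)(qtype >> 8);
	buf[off++] = (uint8_t)(qtype & 0xff);
	buf[off++] = 0;
	buf[off++] = 1;	/* QCLASS = IN */
	return (int)off;
}
```

For example, a type A (qtype=1) query for example.com built this way is
29 bytes long: 12 for the header, 13 for the encoded name, and 4 for
QTYPE and QCLASS.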

	This series constructs the new DNS resolver feature step by
step. Patches 2-5 are preparatory patches. Patch 6 is where the data
structures are embedded into gwproxy common data structures and the
initialization happens. The actual integration to epoll happens in
patch 7. The last patch, patch 8, adds the configure option to enable
the new DNS resolver feature.
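
	To give an idea of what the epoll side has to do once the UDP
socket becomes readable, here is a minimal sketch. It is not the code
from patch 7: drain_dns_sock(), dns_pkt_cb, and count_bytes() are
hypothetical names, and the 1500-byte buffer is an arbitrary choice.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical callback type: invoked once per received datagram. */
typedef void (*dns_pkt_cb)(const uint8_t *pkt, size_t len, void *arg);

/*
 * Hypothetical EPOLLIN handler (not taken from the series): read
 * datagrams from the DNS UDP socket until it would block, and pass
 * each one to @cb.
 *
 * Returns the number of datagrams handled, or -1 on a recv() error
 * other than EINTR/EAGAIN.
 */
static int drain_dns_sock(int fd, dns_pkt_cb cb, void *arg)
{
	uint8_t buf[1500];
	int n = 0;

	for (;;) {
		ssize_t ret = recv(fd, buf, sizeof(buf), MSG_DONTWAIT);

		if (ret < 0) {
			if (errno == EINTR)
				continue;
			if (errno == EAGAIN || errno == EWOULDBLOCK)
				return n;	/* nothing left to read */
			return -1;
		}

		cb(buf, (size_t)ret, arg);
		n++;
	}
}

/* Example callback: add the datagram length to a size_t counter. */
static void count_bytes(const uint8_t *pkt, size_t len, void *arg)
{
	(void)pkt;
	*(size_t *)arg += len;
}
```

Since the read never blocks, the worker thread returns to epoll_wait()
as soon as the socket is empty and keeps serving other connections.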

How to test this feature:
	./configure --cc=clang --use-new-dns-resolver;
	make -j$(nproc);
	./gwproxy --as-socks5=1 --raw-dns=1 --nr-workers=4 --log-level=4 --bind=[::]:1080 --dns-server=1.1.1.1:53;

Then in another terminal, you can use curl to test it:
	curl --proxy socks5h://[::1]:1080 http://example.com;

You will see something like this:
$ ./gwproxy --as-socks5=1 --raw-dns=1 --nr-workers=4 --log-level=4 --bind=[::]:1080 --dns-server=1.1.1.1:53
[2025-09-19 01:17:55][debug ][00940294]: Using event loop: epoll
[2025-09-19 01:17:55][debug ][00940294]: Initializing SOCKS5 context
[2025-09-19 01:17:55][debug ][00940294]: SOCKS5 context initialized without auth file
[2025-09-19 01:17:55][info  ][00940294]: Worker 0 is listening on [::]:1080 (fd=3)
[2025-09-19 01:17:55][debug ][00940294]: Worker 0 initialized raw DNS resolver: 1.1.1.1:53 (fd=4)
[2025-09-19 01:17:55][debug ][00940294]: Worker 0 registered raw DNS UDP socket to epoll (fd=4)
[2025-09-19 01:17:55][debug ][00940294]: Worker 0 epoll (ep_fd=5, ev_fd=6)
[2025-09-19 01:17:55][info  ][00940294]: Worker 1 is listening on [::]:1080 (fd=7)
[2025-09-19 01:17:55][debug ][00940294]: Worker 1 initialized raw DNS resolver: 1.1.1.1:53 (fd=8)
[2025-09-19 01:17:55][debug ][00940294]: Worker 1 registered raw DNS UDP socket to epoll (fd=8)
[2025-09-19 01:17:55][debug ][00940294]: Worker 1 epoll (ep_fd=9, ev_fd=10)
[2025-09-19 01:17:55][info  ][00940294]: Worker 2 is listening on [::]:1080 (fd=11)
[2025-09-19 01:17:55][debug ][00940294]: Worker 2 initialized raw DNS resolver: 1.1.1.1:53 (fd=12)
[2025-09-19 01:17:55][debug ][00940294]: Worker 2 registered raw DNS UDP socket to epoll (fd=12)
[2025-09-19 01:17:55][debug ][00940294]: Worker 2 epoll (ep_fd=13, ev_fd=14)
[2025-09-19 01:17:55][info  ][00940294]: Worker 3 is listening on [::]:1080 (fd=15)
[2025-09-19 01:17:55][debug ][00940294]: Worker 3 initialized raw DNS resolver: 1.1.1.1:53 (fd=16)
[2025-09-19 01:17:55][debug ][00940294]: Worker 3 registered raw DNS UDP socket to epoll (fd=16)
[2025-09-19 01:17:55][debug ][00940294]: Worker 3 epoll (ep_fd=17, ev_fd=18)
[2025-09-19 01:17:55][info  ][00940295]: Worker 1 started (epoll)
[2025-09-19 01:17:55][info  ][00940296]: Worker 2 started (epoll)
[2025-09-19 01:17:55][info  ][00940297]: Worker 3 started (epoll)
[2025-09-19 01:17:55][info  ][00940294]: Worker 0 started (epoll)
[2025-09-19 01:17:55][debug ][00940296]: Increased connection slot capacity to 16
[2025-09-19 01:17:55][debug ][00940296]: New connection from [::1]:45444 (fd=19)
[2025-09-19 01:17:55][debug ][00940296]: Resolved DNS query for example.com to 23.215.0.138:80 (gcp_idx=0)
[2025-09-19 01:17:55][info  ][00940296]: New connection pair created (idx=0, cfd=19, tfd=20, ca=[::1]:45444, ta=23.215.0.138:80)
[2025-09-19 01:17:56][info  ][00940296]: Target socket connected (fd=20, idx=0, ca=[::1]:45444, ta=23.215.0.138:80)
[2025-09-19 01:17:56][info  ][00940296]: Closing connection pair (idx=0, cfd=19, tfd=20, ca=[::1]:45444, ta=23.215.0.138:80)
[2025-09-19 01:17:56][debug ][00940296]: Connection slot capacity shrunk to 0

Interesting points from that log:
	* The DNS server 1.1.1.1:53 is used.
	* Each thread has its own DNS UDP socket registered to epoll.
	* The DNS query for example.com is resolved properly.
	* gcp_idx=0 indicates that the DNS query uses txid=0, which is
also the index in the sess_map array.
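
	The idea of using the txid as an array index can be sketched as
follows. This is only an illustration under my own assumptions, not
the sess_map implementation from the series: struct sess_map as laid
out here, SESS_MAP_SIZE, sess_map_put(), and sess_map_take() are all
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define SESS_MAP_SIZE 16	/* illustrative size only */

/* Hypothetical map: slot index == DNS transaction ID (txid). */
struct sess_map {
	void *slot[SESS_MAP_SIZE];	/* NULL means the txid is free */
};

/*
 * Store @sess in the first free slot. The returned index is the txid
 * to put in the outgoing query. Returns -1 if the map is full.
 */
static int sess_map_put(struct sess_map *m, void *sess)
{
	int i;

	for (i = 0; i < SESS_MAP_SIZE; i++) {
		if (!m->slot[i]) {
			m->slot[i] = sess;
			return i;
		}
	}
	return -1;
}

/*
 * Find the session a response belongs to by reading the txid from the
 * first two bytes of the packet (big endian), then free the slot.
 * Returns NULL for short packets or unknown txids.
 */
static void *sess_map_take(struct sess_map *m, const uint8_t *pkt,
			   size_t len)
{
	uint16_t txid;
	void *sess;

	if (len < 2)
		return NULL;

	txid = (uint16_t)((pkt[0] << 8) | pkt[1]);
	if (txid >= SESS_MAP_SIZE)
		return NULL;

	sess = m->slot[txid];
	m->slot[txid] = NULL;
	return sess;
}
```

With this scheme, matching a response to its pending query is a single
array lookup, and the first query on an empty map gets txid=0, which is
consistent with the gcp_idx=0 seen in the log.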

Future work:
	* Support multiple DNS servers.
	* Support /etc/hosts file parsing.
	* Add timeout handling for DNS queries.
	* Integrate io_uring support for the new DNS feature.
	* Split DNS parser unit tests into a separate test suite.
	* Integrate GitHub Actions to build and test the new DNS feature.

Link to v11: https://lore.gnuweeb.org/gwml/20250914050943.184934-1-reyuki@gnuweeb.org

 Makefile                   |   6 +
 configure                  |   8 +
 src/gwproxy/common.h       |   4 +
 src/gwproxy/dns_parser.c   | 583 +++++++++++++++++++++++++++++++++++++
 src/gwproxy/dns_parser.h   | 192 ++++++++++++
 src/gwproxy/dns_resolver.c | 377 ++++++++++++++++++++++++
 src/gwproxy/dns_resolver.h |  49 ++++
 src/gwproxy/ev/epoll.c     | 143 ++++++++-
 src/gwproxy/gwproxy.c      | 199 ++++++++++++-
 src/gwproxy/gwproxy.h      |  28 +-
 10 files changed, 1570 insertions(+), 19 deletions(-)
 create mode 100644 src/gwproxy/dns_parser.c
 create mode 100644 src/gwproxy/dns_parser.h
 create mode 100644 src/gwproxy/dns_resolver.c
 create mode 100644 src/gwproxy/dns_resolver.h


base-commit: 60c6c822cf8ab14d80800776435417238ea371b0
-- 
Alviro Iskandar Setiawan


Thread overview: 13+ messages
2025-09-18 18:47 [PATCH gwproxy v12 0/8] Initial work on integration of DNS parser lib in gwproxy Alviro Iskandar Setiawan
2025-09-18 18:47 ` [PATCH gwproxy v12 1/8] gwproxy: Remove 'struct gwp_dns_query' declaration Alviro Iskandar Setiawan
2025-09-18 18:47 ` [PATCH gwproxy v12 2/8] gwproxy: Introduce __unused macro Alviro Iskandar Setiawan
2025-09-18 18:47 ` [PATCH gwproxy v12 3/8] Add DNS parser code Alviro Iskandar Setiawan
2025-09-18 18:47 ` [PATCH gwproxy v12 4/8] Add DNS resolver code Alviro Iskandar Setiawan
2025-09-18 18:47 ` [PATCH gwproxy v12 5/8] dns_resolver: Add DNS resolution interface APIs Alviro Iskandar Setiawan
2025-09-18 23:16   ` Ammar Faizi
2025-09-18 18:47 ` [PATCH gwproxy v12 6/8] gwproxy: Introduce --dns-server and --raw-dns Alviro Iskandar Setiawan
2025-09-18 22:54   ` Ammar Faizi
2025-09-18 23:07     ` Alviro Iskandar Setiawan
2025-09-18 18:47 ` [PATCH gwproxy v12 7/8] epoll: Intregrate the raw DNS feature to epoll Alviro Iskandar Setiawan
2025-09-18 18:47 ` [PATCH gwproxy v12 8/8] Makefile: Introduce --use-new-dns-resolver configure option Alviro Iskandar Setiawan
2025-09-18 18:55 ` [PATCH gwproxy v12 0/8] Initial work on integration of DNS parser lib in gwproxy Alviro Iskandar Setiawan
