Hi,

Facebook Onion CDN is slow...

Replace Facebook onion asset endpoint with non-onion for faster response
time. Avoid HTTP requests to Facebook onion if possible.

When using Facebook onion, the CDN asset URL looks like this:

  https://scontent.xx.facebookcooa4ldbat4g7iacswl3p2zrf5nuylvnhxn6kqolvojixwid.onion/something

We can simply replace the domain with scontent.xx.fbcdn.net to get the
same asset:

  https://scontent.xx.fbcdn.net/something

Side note:
We don't fully understand how Facebook actually manages their CDN. We
may introduce a subtle issue by doing it this way. But we hope we don't.

There are 3 patches in this series:

1. Don't use proxy if the host isn't an onion domain.

Speed up the HTTP request by not using the Tor proxy if the destination
host is not an onion domain. This is also a preparation to handle
Facebook assets (photos, video, files) better and faster.

2. Introduce `build_url()` function.

Introduce build_url() function to construct a URL based on the return
value of parse_url(). Currently, the only purpose of this function is to
easily change the hostname without inventing our own URL parser. This
function is taken from an answer on the stackoverflow site. I put the
stackoverflow link in the commit message.

3. Replace Facebook onion asset endpoint with non-onion.

Patch #3 is where the URL gets rewritten: the host of a Facebook onion
CDN URL is replaced with scontent.xx.fbcdn.net and the URL is rebuilt
with build_url().

Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
---

The following changes since commit 68e95a61956e75ad08ad0bb68f10172fd2883816:

  Merge branch 'dev.cache' (Facebook scraper cache) (2023-05-09 17:59:03 +0700)

are available in the Git repository at:

  https://gitlab.torproject.org/ammarfaizi2/Facebook.git dev.fast_asset

for you to fetch changes up to 8a74bcd85a0ea781ed23c77527af288abcfff900:

  fb: web: Replace Facebook onion asset endpoint with non-onion (2023-05-13 01:35:01 +0700)

----------------------------------------------------------------
Ammar Faizi (3):
      fb: web: Don't use proxy if the host isn't an onion domain
      fb: helper: Introduce `build_url()` function
      fb: web: Replace Facebook onion asset endpoint with non-onion

 src/Facebook/helpers.php | 18 ++++++++++++++++++
 web/public/api.php       | 18 ++++++++++++++++++
 2 files changed, 36 insertions(+)

base-commit: 68e95a61956e75ad08ad0bb68f10172fd2883816
-- 
Ammar Faizi

Speed up the HTTP request by not using the Tor proxy if the destination
host is not an onion domain. This is also a preparation to handle
Facebook assets (photos, video, files) better and faster.

Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
---
 web/public/api.php | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/web/public/api.php b/web/public/api.php
index a427191b21639279..1b9ef97a2fb9c868 100644
--- a/web/public/api.php
+++ b/web/public/api.php
@@ -219,6 +219,16 @@ function handle_url_proxy(Facebook $fb, string $url)
 		return 0;
 	}
 
+	if (filter_var($data, FILTER_VALIDATE_URL)) {
+		/**
+		 * Don't use proxy for non onion URL.
+		 */
+		$u = parse_url($data);
+		if (!preg_match("/\\.onion$/i", $u["host"])) {
+			$fb->setProxy(NULL);
+		}
+	}
+
 	if (!fb_http_get($fb, $data))
 		exit(0);
 }
-- 
Ammar Faizi

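A note on the host check in this patch: FILTER_VALIDATE_URL also accepts
URLs that have no host component (for example mailto:), in which case
$u["host"] would be undefined. The sketch below shows one way the check
could be guarded. The helper name can_skip_proxy() is hypothetical and
not part of the patch; it only returns the decision and leaves the
$fb->setProxy(NULL) call to the caller.

```php
<?php
/**
 * Hypothetical helper, not part of the patch: decide whether a request
 * may skip the Tor proxy. Anything that is not clearly a non-onion
 * host keeps the proxy.
 */
function can_skip_proxy(string $url): bool
{
	if (!filter_var($url, FILTER_VALIDATE_URL))
		return false;

	// With a component flag, parse_url() returns null when that
	// component is absent, e.g. for "mailto:" URLs.
	$host = parse_url($url, PHP_URL_HOST);
	if (!is_string($host) || $host === "")
		return false;

	// Skip the proxy only when the host does not end in ".onion".
	return preg_match('/\.onion$/i', $host) !== 1;
}

var_dump(can_skip_proxy("https://scontent.xx.fbcdn.net/something"));
var_dump(can_skip_proxy("mailto:user@example.com"));
```

With a helper like this, the caller would only call $fb->setProxy(NULL)
when can_skip_proxy($data) returns true, so a URL without a host keeps
going through Tor.
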
Introduce build_url() function to construct a URL based on the return
value of parse_url(). Currently, the only purpose of this function is to
easily change the hostname without inventing our own URL parser. This
function is taken from an answer on the stackoverflow site.

Link: https://stackoverflow.com/a/35207936/7275114
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
---
 src/Facebook/helpers.php | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/src/Facebook/helpers.php b/src/Facebook/helpers.php
index 224ec45d70755407..ae50775908ceb71f 100644
--- a/src/Facebook/helpers.php
+++ b/src/Facebook/helpers.php
@@ -66,3 +66,21 @@ function full_html_clean(string $m): string
 
 	return trim(implode("\n", $m));
 }
+
+/**
+ * @param array $parts
+ * @return string
+ */
+function build_url(array $parts): string
+{
+	return (isset($parts['scheme']) ? "{$parts['scheme']}:" : '') .
+		((isset($parts['user']) || isset($parts['host'])) ? '//' : '') .
+		(isset($parts['user']) ? "{$parts['user']}" : '') .
+		(isset($parts['pass']) ? ":{$parts['pass']}" : '') .
+		(isset($parts['user']) ? '@' : '') .
+		(isset($parts['host']) ? "{$parts['host']}" : '') .
+		(isset($parts['port']) ? ":{$parts['port']}" : '') .
+		(isset($parts['path']) ? "{$parts['path']}" : '') .
+		(isset($parts['query']) ? "?{$parts['query']}" : '') .
+		(isset($parts['fragment']) ? "#{$parts['fragment']}" : '');
+}
-- 
Ammar Faizi

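For illustration, the hostname swap that build_url() is meant for could
look like the sketch below. The function body is copied from the patch
so the example runs on its own; the example.com URL and the replacement
host are made up for the demonstration.

```php
<?php
// Copied from the patch so that this example is self-contained.
function build_url(array $parts): string
{
	return (isset($parts['scheme']) ? "{$parts['scheme']}:" : '') .
		((isset($parts['user']) || isset($parts['host'])) ? '//' : '') .
		(isset($parts['user']) ? "{$parts['user']}" : '') .
		(isset($parts['pass']) ? ":{$parts['pass']}" : '') .
		(isset($parts['user']) ? '@' : '') .
		(isset($parts['host']) ? "{$parts['host']}" : '') .
		(isset($parts['port']) ? ":{$parts['port']}" : '') .
		(isset($parts['path']) ? "{$parts['path']}" : '') .
		(isset($parts['query']) ? "?{$parts['query']}" : '') .
		(isset($parts['fragment']) ? "#{$parts['fragment']}" : '');
}

// Split the URL into components, change only the host, and rebuild.
$p = parse_url("https://user:secret@example.com:8443/a/b?x=1#top");
$p["host"] = "example.org";
echo build_url($p), "\n";
// prints "https://user:secret@example.org:8443/a/b?x=1#top"
```

Every other component (credentials, port, path, query, fragment) passes
through unchanged, which is the point of going through parse_url()
instead of editing the string by hand.
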
... for faster response time. Avoid HTTP requests to Facebook onion if
possible because using the Tor network is slow.

When using Facebook onion, the CDN asset URL looks like this:

  https://scontent.xx.facebookcooa4ldbat4g7iacswl3p2zrf5nuylvnhxn6kqolvojixwid.onion/something

We can simply replace the domain with scontent.xx.fbcdn.net to get the
same asset:

  https://scontent.xx.fbcdn.net/something

Side note:
We don't fully understand how Facebook actually manages their CDN. We
may introduce a subtle issue by doing it this way. But we hope we don't.

Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
---
 web/public/api.php | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/web/public/api.php b/web/public/api.php
index 1b9ef97a2fb9c868..b2e19789dbca21c6 100644
--- a/web/public/api.php
+++ b/web/public/api.php
@@ -165,6 +165,14 @@ function rewriteOnionURL(?string $str): ?string
 		return $str;
 	}
 
+	/**
+	 * Don't use Facebook onion CDN for performance reasons.
+	 */
+	if (preg_match("/^scontent.xx.face.+?\.onion$/", $p["host"])) {
+		$p["host"] = "scontent.xx.fbcdn.net";
+		return build_url($p);
+	}
+
 	$signature = md5($str.API_SECRET, true);
 
 	/**
-- 
Ammar Faizi

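The pattern in this patch leaves the dots in "scontent.xx" unescaped, so
they match any character. A stricter variant might look like the sketch
below. The helper name map_cdn_host() is hypothetical and not part of
the patch, and the assumption that the onion label is "facebook"
followed by base32 characters comes only from the single example URL in
the commit message, not from any knowledge of how Facebook names its CDN
hosts.

```php
<?php
/**
 * Hypothetical helper, not part of the patch: map a Facebook onion CDN
 * host to its non-onion counterpart, or return the host unchanged.
 */
function map_cdn_host(string $host): string
{
	// Dots are escaped so they match literally. The onion label is
	// assumed to be "facebook" followed by base32 characters (a-z, 2-7).
	if (preg_match('/^scontent\.xx\.facebook[a-z2-7]+\.onion$/i', $host))
		return "scontent.xx.fbcdn.net";

	return $host;
}

$onion = "scontent.xx.facebookcooa4ldbat4g7iacswl3p2zrf5nuylvnhxn6kqolvojixwid.onion";
var_dump(map_cdn_host($onion));

// The pattern from the patch would also accept this host, because its
// unescaped dots match "A" and "B"; the stricter pattern leaves it alone.
var_dump(map_cdn_host("scontentAxxBfaceX.onion"));
```

Whether the tighter pattern is actually preferable depends on which
hosts the onion CDN hands out in practice, which the side note above
already flags as not fully understood.
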
The pull request you sent on Sat, 13 May 2023 01:44:08 +0700:

> https://gitlab.torproject.org/ammarfaizi2/Facebook.git dev.fast_asset

has been merged into ammarfaizi2/Facebook.git:
https://github.com/ammarfaizi2/Facebook/commit/dbc3dfcc32af4b934c0ace3dd7efbfe9878f133f

Thank you!

[1/3] fb: web: Don't use proxy if the host isn't an onion domain
      commit: 2b387f821f5b538dd6f566f74c0013cb13e2bd3b
[2/3] fb: helper: Introduce `build_url()` function
      commit: 4561609dccb7237ceeeaba2a4f594130b4c0b982
[3/3] fb: web: Replace Facebook onion asset endpoint with non-onion
      commit: 8a74bcd85a0ea781ed23c77527af288abcfff900

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html