Hi,

This series introduces a cache mechanism to speed up the web API. It
reduces the pain of developing an app that uses the API, and it greatly
reduces the number of requests sent to the same endpoint within a short
period of time.

There are 6 patches in this series:

Patch #1: Introduce `getCache()` and `setCache()`.
A preparation patch to implement a better caching mechanism. All methods
that need a cache will call these functions.

Patch #2: Replace old cache mechanism in `getTimelineYears()`.
Simplify the caching mechanism and make the `getTimelineYears()` cache
private to itself. This also means that the endpoint
"action=getTimelineYears" will utilize the cache.

Patch #3, #4: Implement cache in `getPost()` and `getTimelinePosts()`.
Make short repeated calls fast.

Patch #5: Introduce `clearExpiredCaches()`.
When a cache is expired, it won't be deleted unless getCache() is invoked
with the corresponding key. Introduce a new function that scans for
expired caches and deletes them.

Patch #6: Create cron.php to clear cache.
Allow the server to clear expired caches via a small PHP script,
cron.php. Periodically calling clearExpiredCaches() deletes old expired
caches, which saves storage space.

Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
---

The following changes since commit 0d5e59e00359e165778a81f80122bb522f8edb0f:

  Merge branch 'rewrite_url' (Facebook Onion rewrite support) (2023-05-03 18:46:47 +0700)

are available in the Git repository at:

  https://gitlab.torproject.org/ammarfaizi2/Facebook.git dev.cache

for you to fetch changes up to d30f2dad8ca761b5a9c8de32ea48adbbdd201d03:

  fb: web: Create cron.php to clear cache (2023-05-09 17:33:12 +0700)

----------------------------------------------------------------
Ammar Faizi (6):
      fb: Introduce `getCache()` and `setCache()` functions
      fb: Post: Replace old cache mechanism in `getTimelineYears()`
      fb: Post: Implement cache in `getPost()`
      fb: Post: Implement cache in `getTimelinePosts()`
      fb: cache: Introduce `clearExpiredCaches()`
      fb: web: Create cron.php to clear cache

 src/Facebook/Facebook.php     | 99 ++++++++++++++++++++++++++++++----------
 src/Facebook/Methods/Post.php | 74 ++++++++++--------------------
 web/cron.php                  |  9 ++++
 3 files changed, 108 insertions(+), 74 deletions(-)
 create mode 100644 web/cron.php

base-commit: 0d5e59e00359e165778a81f80122bb522f8edb0f
-- 
Ammar Faizi
A preparation patch to implement a better caching mechanism. All methods
that need a cache will call these functions.

Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
---
 src/Facebook/Facebook.php | 45 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/src/Facebook/Facebook.php b/src/Facebook/Facebook.php
index 2fe33f3b7cb6e9ff..6411c26709c24307 100644
--- a/src/Facebook/Facebook.php
+++ b/src/Facebook/Facebook.php
@@ -287,4 +287,49 @@ class Facebook
 		return $url;
 	}
+
+	/**
+	 * @param string $key
+	 * @param mixed  $data
+	 * @param int    $expire
+	 * @return void
+	 */
+	private function setCache(string $key, $data, int $expire = 600): void
+	{
+		$key = str_replace(["/", "\\"], "_", $key);
+		$data = [
+			"exp"  => time() + $expire,
+			"data" => $data
+		];
+		$data = json_encode($data, JSON_INTERNAL_FLAGS);
+		if (!is_dir($this->cache_dir)) {
+			mkdir($this->cache_dir, 0777, true);
+			if (!is_dir($this->cache_dir)) {
+				throw new \Exception("Unable to create cache directory: {$this->cache_dir}");
+			}
+		}
+		file_put_contents("{$this->cache_dir}/{$key}.json", $data);
+	}
+
+	/**
+	 * @param string $key
+	 * @return mixed
+	 */
+	private function getCache(string $key)
+	{
+		$key = str_replace(["/", "\\"], "_", $key);
+		$file = "{$this->cache_dir}/{$key}.json";
+
+		if (!file_exists($file)) {
+			return NULL;
+		}
+
+		$data = json_decode(file_get_contents($file), true);
+		if (!isset($data["exp"]) || !isset($data["data"])) {
+			unlink($file);
+			return NULL;
+		}
+
+		return $data["data"];
+	}
 }
-- 
Ammar Faizi
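
For reference, a minimal standalone sketch of the same write/read pattern, with one assumption made explicit: the read path compares the stored "exp" against the current time and drops stale entries. The getCache() hunk as posted does not appear to contain that comparison, so treat this as an illustration and not as the merged code. The names cacheWrite() and cacheRead(), the $now parameter, and the temporary directory are hypothetical, and plain json_encode() flags stand in for the project's JSON_INTERNAL_FLAGS.

```php
<?php
// Hypothetical standalone helpers for illustration; not part of the patch.

function cacheWrite(string $dir, string $key, $data, int $expire = 600, ?int $now = null): void
{
	$now = $now ?? time();
	// Same key sanitization as the patch: no path separators in file names.
	$key = str_replace(["/", "\\"], "_", $key);
	if (!is_dir($dir) && !mkdir($dir, 0777, true) && !is_dir($dir)) {
		throw new \Exception("Unable to create cache directory: {$dir}");
	}
	$json = json_encode(["exp" => $now + $expire, "data" => $data]);
	file_put_contents("{$dir}/{$key}.json", $json);
}

function cacheRead(string $dir, string $key, ?int $now = null)
{
	$now = $now ?? time();
	$key = str_replace(["/", "\\"], "_", $key);
	$file = "{$dir}/{$key}.json";
	if (!file_exists($file)) {
		return null;
	}
	$data = json_decode(file_get_contents($file), true);
	// Assumption: malformed or expired entries are deleted on read.
	if (!is_array($data) || !isset($data["exp"], $data["data"]) || $data["exp"] < $now) {
		unlink($file);
		return null;
	}
	return $data["data"];
}

$dir = sys_get_temp_dir()."/fb_cache_sketch_".getmypid();
cacheWrite($dir, "getPost/123", ["content" => "hello"], 600, 1000);
$fresh = cacheRead($dir, "getPost/123", 1500); // inside the 600 second window
$stale = cacheRead($dir, "getPost/123", 2000); // past "exp", file is removed
```

Passing $now explicitly only makes the expiry behaviour easy to exercise; in real use both helpers fall back to time().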
`getTimelineYears()` always fetches the endpoint online, then it sets the
cache based on the fetch result. On the other hand, the
`getTimelinePosts()` method always tries to read the cache that
`getTimelineYears()` sets.

Now, simplify the caching mechanism and make the `getTimelineYears()`
cache private to itself. This also means that the endpoint
"action=getTimelineYears" will utilize the cache.

Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
---
 src/Facebook/Facebook.php     | 24 ---------------
 src/Facebook/Methods/Post.php | 58 +++++------------------------------
 2 files changed, 8 insertions(+), 74 deletions(-)

diff --git a/src/Facebook/Facebook.php b/src/Facebook/Facebook.php
index 6411c26709c24307..16970714ddce7bdc 100644
--- a/src/Facebook/Facebook.php
+++ b/src/Facebook/Facebook.php
@@ -115,30 +115,6 @@ class Facebook
 		}
 	}
 
-	/**
-	 * @param string $username
-	 * @return string
-	 */
-	public function getUserCacheDir(string $username): string
-	{
-		$ret = $this->cache_dir."/".$username;
-		if (!is_dir($ret)) {
-			if (!mkdir($ret, 0755, true)) {
-				throw new \Exception("Cannot create user cache directory: {$ret}");
-			}
-		}
-
-		if (!is_writable($ret)) {
-			throw new \Exception("User cache directory is not writable: {$ret}");
-		}
-
-		if (!is_readable($ret)) {
-			throw new \Exception("User cache directory is not readable: {$ret}");
-		}
-
-		return $ret;
-	}
-
 	/**
 	 * @param string $user_agent
 	 * @return void
diff --git a/src/Facebook/Methods/Post.php b/src/Facebook/Methods/Post.php
index 988739568dddb9cb..81017c9122e6c341 100644
--- a/src/Facebook/Methods/Post.php
+++ b/src/Facebook/Methods/Post.php
@@ -38,66 +38,28 @@ trait Post
 		return $years;
 	}
 
-	/**
-	 * Cache timeline year links.
-	 *
-	 * @param string $username
-	 * @param array  $years
-	 * @return void
-	 */
-	private function setCacheTimelineYears(string $username, array $years)
-	{
-		$years = json_encode($years, JSON_INTERNAL_FLAGS);
-		$dir = $this->getUserCacheDir($username);
-		file_put_contents("{$dir}/timeline_years.json", $years);
-	}
-
-	/**
-	 * @param string $username
-	 * @return array|null
-	 */
-	private function getCacheTimelineYears(string $username): ?array
-	{
-		$dir = $this->getUserCacheDir($username);
-		$file = "{$dir}/timeline_years.json";
-
-		if (!file_exists($file)) {
-			return NULL;
-		}
-
-		/*
-		 * Max cache time: 10 minutes.
-		 */
-		if (time() - filemtime($file) > 600) {
-			unlink($file);
-			return NULL;
-		}
-
-		$years = json_decode(file_get_contents($file), true);
-		if (!is_array($years)) {
-			return NULL;
-		}
-
-		return $years;
-	}
-
 	/**
 	 * @param string $username
 	 * @return array
 	 */
 	public function getTimelineYears(string $username): array
 	{
+		$cacheKey = __METHOD__.$username;
 		$username = trim($username);
 		if ($username === "") {
 			throw new \Exception("Username cannot be empty!");
 		}
 
+		$years = $this->getCache($cacheKey);
+		if (is_array($years))
+			return $years;
+
 		$username = urlencode($username);
 		$o = $this->http("/profile.php?id={$username}", "GET");
 		try {
 			$ret = $this->parseTimelineYears($o["out"]);
 			if (count($ret) > 0) {
-				$this->setCacheTimelineYears($username, $ret);
+				$this->setCache($cacheKey, $ret);
 				return $ret;
 			}
 		} catch (\Exception $e) {
@@ -118,7 +80,7 @@ trait Post
 		$ret = $this->parseTimelineYears($o);
 		if (count($ret) > 0) {
-			$this->setCacheTimelineYears($username, $ret);
+			$this->setCache($cacheKey, $ret);
 		}
 
 		return $ret;
@@ -134,11 +96,7 @@ trait Post
 	 */
 	public function getTimelinePosts(string $username, int $year = -1, bool $take_content = false, int $limit = -1): array
 	{
-		$years = $this->getCacheTimelineYears($username);
-		if (!is_array($years)) {
-			$years = $this->getTimelineYears($username);
-		}
-
+		$years = $this->getTimelineYears($username);
 		if ($year === -1) {
 			$year = max(array_keys($years));
 		}
-- 
Ammar Faizi
Make short repeated calls fast.

Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
---
 src/Facebook/Methods/Post.php | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/src/Facebook/Methods/Post.php b/src/Facebook/Methods/Post.php
index 81017c9122e6c341..3cf5d7e9896e74ce 100644
--- a/src/Facebook/Methods/Post.php
+++ b/src/Facebook/Methods/Post.php
@@ -353,6 +353,13 @@ trait Post
 	 */
 	public function getPost(string $post_id): array
 	{
+		$cacheKey = __METHOD__.$post_id;
+
+		$ret = $this->getCache($cacheKey);
+		if ($ret) {
+			return $ret;
+		}
+
 		/**
 		 * $post_id must be numeric or a string starts with "pfbid".
 		 */
@@ -372,9 +379,11 @@ trait Post
 		$content = $this->parsePostContent($o);
 		$content["embedded_link"] = $this->parseEmbeddedLink($orig);
 
-		return [
+		$ret = [
 			"content" => $content,
 			"info" => $info
 		];
+		$this->setCache($cacheKey, $ret);
+		return $ret;
 	}
 }
-- 
Ammar Faizi
Make short repeated calls fast.

Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
---
 src/Facebook/Methods/Post.php | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/src/Facebook/Methods/Post.php b/src/Facebook/Methods/Post.php
index 3cf5d7e9896e74ce..7fe38c5c1b982c72 100644
--- a/src/Facebook/Methods/Post.php
+++ b/src/Facebook/Methods/Post.php
@@ -96,6 +96,12 @@ trait Post
 	 */
 	public function getTimelinePosts(string $username, int $year = -1, bool $take_content = false, int $limit = -1): array
 	{
+		$cacheKey = __METHOD__.$username.$year.($take_content ? 1 : 0).sprintf("%010d", $limit);
+
+		$posts = $this->getCache($cacheKey);
+		if (is_array($posts))
+			return $posts;
+
 		$years = $this->getTimelineYears($username);
 		if ($year === -1) {
 			$year = max(array_keys($years));
@@ -155,6 +161,7 @@ trait Post
 			];
 		}
 
+		$this->setCache($cacheKey, $posts);
 		return $posts;
 	}
-- 
Ammar Faizi
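
To illustrate how the cache key in this patch separates different argument combinations, here is a small standalone sketch. The helper name timelinePostsCacheKey(), the sample username, and the literal used in place of __METHOD__ are hypothetical; the key expression and the separator replacement are copied from the hunks in this series.

```php
<?php
// Hypothetical helper mirroring the key expression used in getTimelinePosts().
function timelinePostsCacheKey(string $method, string $username, int $year, bool $take_content, int $limit): string
{
	$key = $method.$username.$year.($take_content ? 1 : 0).sprintf("%010d", $limit);
	// Same sanitization that setCache()/getCache() apply before building the file name.
	return str_replace(["/", "\\"], "_", $key);
}

// Assumed stand-in for whatever __METHOD__ evaluates to inside the trait.
$m = "Facebook\\Methods\\Post::getTimelinePosts";

$a = timelinePostsCacheKey($m, "someuser", 2023, false, -1);
$b = timelinePostsCacheKey($m, "someuser", 2023, true, -1);  // differs in $take_content
$c = timelinePostsCacheKey($m, "someuser", 2023, false, 10); // differs in $limit
```

One thing a reviewer might note: the fields are concatenated without a separator, so in principle two different (username, year) pairs could produce the same string, for example "a1" with year 2 and "a" with year 12. A delimiter between fields would rule that out, though with four-digit years this is unlikely to matter in practice.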
When a cache is expired, it won't be deleted unless getCache() with the
corresponding key is invoked. Introduce a new function to scan for
expired caches and delete them.

Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
---
 src/Facebook/Facebook.php | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/src/Facebook/Facebook.php b/src/Facebook/Facebook.php
index 16970714ddce7bdc..00aed812693248a8 100644
--- a/src/Facebook/Facebook.php
+++ b/src/Facebook/Facebook.php
@@ -308,4 +308,34 @@ class Facebook
 		return $data["data"];
 	}
+
+	/**
+	 * @return void
+	 */
+	public function clearExpiredCaches(): void
+	{
+		$scan = scandir($this->cache_dir);
+		foreach ($scan as $file) {
+			$file = "{$this->cache_dir}/{$file}";
+			if (!is_file($file)) {
+				continue;
+			}
+
+			$data = @file_get_contents($file);
+			if (!$data) {
+				unlink($file);
+				continue;
+			}
+
+			$data = @json_decode($data, true);
+			if (!isset($data["exp"]) || !isset($data["data"])) {
+				unlink($file);
+				continue;
+			}
+
+			if ($data["exp"] < time()) {
+				unlink($file);
+			}
+		}
+	}
 }
-- 
Ammar Faizi
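
The sweep described in this patch can be tried outside the class with a sketch like the one below. The function name sweepExpired(), the $now parameter, the returned count, and the is_dir() guard are assumptions added for illustration; the patch itself calls scandir() on the cache directory directly and returns nothing.

```php
<?php
// Hypothetical standalone sweep for illustration; not the patch's method.
function sweepExpired(string $dir, ?int $now = null): int
{
	$now = $now ?? time();
	if (!is_dir($dir)) {
		return 0; // nothing to scan yet; scandir() would fail on a missing directory
	}
	$removed = 0;
	foreach (scandir($dir) as $name) {
		$file = "{$dir}/{$name}";
		if (!is_file($file)) {
			continue; // skips ".", ".." and subdirectories
		}
		$data = json_decode((string)@file_get_contents($file), true);
		$valid = is_array($data) && isset($data["exp"], $data["data"]);
		// Unreadable, malformed and expired entries are all deleted.
		if (!$valid || $data["exp"] < $now) {
			unlink($file);
			$removed++;
		}
	}
	return $removed;
}

$dir = sys_get_temp_dir()."/fb_sweep_sketch_".getmypid();
if (!is_dir($dir)) {
	mkdir($dir, 0777, true);
}
file_put_contents("{$dir}/fresh.json", json_encode(["exp" => 2000, "data" => [1]]));
file_put_contents("{$dir}/stale.json", json_encode(["exp" => 500, "data" => [1]]));
file_put_contents("{$dir}/broken.json", "not json");
$removed = sweepExpired($dir, 1000);
```

As in the patch, any regular file in the directory that does not parse as a cache entry is deleted, so the cache directory should hold cache files only.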
Allow the server to clear expired caches via a small PHP script,
cron.php. Periodically calling clearExpiredCaches() deletes old expired
caches, which saves storage space.

Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
---
 web/cron.php | 9 +++++++++
 1 file changed, 9 insertions(+)
 create mode 100644 web/cron.php

diff --git a/web/cron.php b/web/cron.php
new file mode 100644
index 0000000000000000..bc183bedea9f4062
--- /dev/null
+++ b/web/cron.php
@@ -0,0 +1,9 @@
+<?php
+
+require __DIR__."/../vendor/autoload.php";
+require __DIR__."/auth.php";
+
+use Facebook\Facebook;
+
+$fb = new Facebook($session_dir);
+$fb->clearExpiredCaches();
-- 
Ammar Faizi
On Tue, 9 May 2023 17:46:52 +0700, Ammar Faizi wrote:

The pull request you sent on Sun, 07 May 2023 18:12:10 +0000:

> https://gitlab.torproject.org/ammarfaizi2/Facebook.git dev.cache

has been merged into ammarfaizi2/Facebook.git:
https://github.com/ammarfaizi2/Facebook/commit/68e95a61956e75ad08ad0bb68f10172fd2883816

Thank you!

[1/6] fb: Introduce `getCache()` and `setCache()` functions
      commit: 88952f396b1b4831eab3b8ed5d71959e42686a88
[2/6] fb: Post: Replace old cache mechanism in `getTimelineYears()`
      commit: 8bc6986c8b802b4b22ba69b86ca0892ff70546e7
[3/6] fb: Post: Implement cache in `getPost()`
      commit: eb5b43a9ac232b4e0d7e973e5feddf1922ca8415
[4/6] fb: Post: Implement cache in `getTimelinePosts()`
      commit: bf957bbe7e23d48360d6d3bb7b96364a25b0148a
[5/6] fb: cache: Introduce `clearExpiredCaches()`
      commit: 38622b9d3c33a44ee6c05cc75bb781a6c6f52cd5
[6/6] fb: web: Create cron.php to clear cache
      commit: cbed859c4a77521dcac840b18d4cf30ef493d747

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html