* [PATCH fb v1 1/6] fb: Introduce `getCache()` and `setCache()` functions
2023-05-09 10:46 [PATCH fb v1 0/6] Introducing cache for the Facebook scraper Ammar Faizi
@ 2023-05-09 10:46 ` Ammar Faizi
2023-05-09 10:46 ` [PATCH fb v1 2/6] fb: Post: Replace old cache mechanism in `getTimelineYears()` Ammar Faizi
` (5 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Ammar Faizi @ 2023-05-09 10:46 UTC (permalink / raw)
To: GNU/Weeb FB Team
Cc: Ammar Faizi, GNU/Weeb Mailing List, Michael William Jonathan
A preparation patch to implement a better caching mechanism. All methods
that need caching will call these functions.
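For readers skimming the diff, the on-disk layout these two functions agree on can be sketched roughly as follows. This is only an illustration: `cache_path()` and `cache_envelope()` are hypothetical free functions (the patch uses private methods on `Facebook`), the `$now` parameter is added here for testability, and the project-specific `JSON_INTERNAL_FLAGS` constant is left out.

```php
<?php
// Hypothetical helper: map a cache key to a file path the way the patch does.
function cache_path(string $dir, string $key): string
{
    // Path separators in the key are replaced so the key cannot escape $dir.
    $key = str_replace(["/", "\\"], "_", $key);
    return "{$dir}/{$key}.json";
}

// Hypothetical helper: wrap a payload with its absolute expiry timestamp.
function cache_envelope($data, int $expire = 600, ?int $now = null): string
{
    $now = $now ?? time();
    // "exp" is an absolute Unix timestamp; "data" is the cached payload.
    return json_encode(["exp" => $now + $expire, "data" => $data]);
}
```

Storing the expiry inside the file, rather than relying on the file's mtime, lets each entry carry its own lifetime.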
Signed-off-by: Ammar Faizi <[email protected]>
---
src/Facebook/Facebook.php | 45 +++++++++++++++++++++++++++++++++++++++
1 file changed, 45 insertions(+)
diff --git a/src/Facebook/Facebook.php b/src/Facebook/Facebook.php
index 2fe33f3b7cb6e9ff..6411c26709c24307 100644
--- a/src/Facebook/Facebook.php
+++ b/src/Facebook/Facebook.php
@@ -287,4 +287,49 @@ class Facebook
return $url;
}
+
+ /**
+ * @param string $key
+ * @param mixed $data
+ * @param int $expire
+ * @return void
+ */
+ private function setCache(string $key, $data, int $expire = 600): void
+ {
+ $key = str_replace(["/", "\\"], "_", $key);
+ $data = [
+ "exp" => time() + $expire,
+ "data" => $data
+ ];
+ $data = json_encode($data, JSON_INTERNAL_FLAGS);
+ if (!is_dir($this->cache_dir)) {
+ mkdir($this->cache_dir, 0777, true);
+ if (!is_dir($this->cache_dir)) {
+ throw new \Exception("Unable to create cache directory: {$this->cache_dir}");
+ }
+ }
+ file_put_contents("{$this->cache_dir}/{$key}.json", $data);
+ }
+
+ /**
+ * @param string $key
+ * @return mixed
+ */
+ private function getCache(string $key)
+ {
+ $key = str_replace(["/", "\\"], "_", $key);
+ $file = "{$this->cache_dir}/{$key}.json";
+
+ if (!file_exists($file)) {
+ return NULL;
+ }
+
+ $data = json_decode(file_get_contents($file), true);
+ if (!isset($data["exp"]) || !isset($data["data"])) {
+ unlink($file);
+ return NULL;
+ }
+
+ return $data["data"];
+ }
}
--
Ammar Faizi
* [PATCH fb v1 2/6] fb: Post: Replace old cache mechanism in `getTimelineYears()`
2023-05-09 10:46 [PATCH fb v1 0/6] Introducing cache for the Facebook scraper Ammar Faizi
2023-05-09 10:46 ` [PATCH fb v1 1/6] fb: Introduce `getCache()` and `setCache()` functions Ammar Faizi
@ 2023-05-09 10:46 ` Ammar Faizi
2023-05-09 10:46 ` [PATCH fb v1 3/6] fb: Post: Implement cache in `getPost()` Ammar Faizi
` (4 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Ammar Faizi @ 2023-05-09 10:46 UTC (permalink / raw)
To: GNU/Weeb FB Team
Cc: Ammar Faizi, GNU/Weeb Mailing List, Michael William Jonathan
`getTimelineYears()` always fetches the endpoint online, then it sets
the cache based on the fetch result. On the other hand, the
`getTimelinePosts()` method always tries to read the cache that
`getTimelineYears()` sets.
Now, simplify the caching mechanism and make the `getTimelineYears()`
cache private to itself. This also means that the endpoint
"action=getTimelineYears" will utilize the cache.
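The resulting flow is a plain read-through cache: look up first, fetch on a miss, store only a usable result. A minimal sketch of that pattern, assuming an in-memory array in place of the file cache and a hypothetical `$fetch` callback in place of the HTTP request and parser:

```php
<?php
// Hypothetical read-through helper; $cache stands in for the file cache.
function read_through(array &$cache, string $key, callable $fetch): array
{
    if (isset($cache[$key]) && is_array($cache[$key])) {
        return $cache[$key];       // cache hit: no fetch at all
    }

    $value = $fetch();             // cache miss: go to the source
    if (count($value) > 0) {
        $cache[$key] = $value;     // as in the patch, empty results are not cached
    }
    return $value;
}
```

With this shape, callers such as `getTimelinePosts()` no longer need to know that a cache exists.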
Signed-off-by: Ammar Faizi <[email protected]>
---
src/Facebook/Facebook.php | 24 ---------------
src/Facebook/Methods/Post.php | 58 +++++------------------------------
2 files changed, 8 insertions(+), 74 deletions(-)
diff --git a/src/Facebook/Facebook.php b/src/Facebook/Facebook.php
index 6411c26709c24307..16970714ddce7bdc 100644
--- a/src/Facebook/Facebook.php
+++ b/src/Facebook/Facebook.php
@@ -115,30 +115,6 @@ class Facebook
}
}
- /**
- * @param string $username
- * @return string
- */
- public function getUserCacheDir(string $username): string
- {
- $ret = $this->cache_dir."/".$username;
- if (!is_dir($ret)) {
- if (!mkdir($ret, 0755, true)) {
- throw new \Exception("Cannot create user cache directory: {$ret}");
- }
- }
-
- if (!is_writable($ret)) {
- throw new \Exception("User cache directory is not writable: {$ret}");
- }
-
- if (!is_readable($ret)) {
- throw new \Exception("User cache directory is not readable: {$ret}");
- }
-
- return $ret;
- }
-
/**
* @param string $user_agent
* @return void
diff --git a/src/Facebook/Methods/Post.php b/src/Facebook/Methods/Post.php
index 988739568dddb9cb..81017c9122e6c341 100644
--- a/src/Facebook/Methods/Post.php
+++ b/src/Facebook/Methods/Post.php
@@ -38,66 +38,28 @@ trait Post
return $years;
}
- /**
- * Cache timeline year links.
- *
- * @param string $username
- * @param array $years
- * @return void
- */
- private function setCacheTimelineYears(string $username, array $years)
- {
- $years = json_encode($years, JSON_INTERNAL_FLAGS);
- $dir = $this->getUserCacheDir($username);
- file_put_contents("{$dir}/timeline_years.json", $years);
- }
-
- /**
- * @param string $username
- * @return array|null
- */
- private function getCacheTimelineYears(string $username): ?array
- {
- $dir = $this->getUserCacheDir($username);
- $file = "{$dir}/timeline_years.json";
-
- if (!file_exists($file)) {
- return NULL;
- }
-
- /*
- * Max cache time: 10 minutes.
- */
- if (time() - filemtime($file) > 600) {
- unlink($file);
- return NULL;
- }
-
- $years = json_decode(file_get_contents($file), true);
- if (!is_array($years)) {
- return NULL;
- }
-
- return $years;
- }
-
/**
* @param string $username
* @return array
*/
public function getTimelineYears(string $username): array
{
+ $cacheKey = __METHOD__.$username;
$username = trim($username);
if ($username === "") {
throw new \Exception("Username cannot be empty!");
}
+ $years = $this->getCache($cacheKey);
+ if (is_array($years))
+ return $years;
+
$username = urlencode($username);
$o = $this->http("/profile.php?id={$username}", "GET");
try {
$ret = $this->parseTimelineYears($o["out"]);
if (count($ret) > 0) {
- $this->setCacheTimelineYears($username, $ret);
+ $this->setCache($cacheKey, $ret);
return $ret;
}
} catch (\Exception $e) {
@@ -118,7 +80,7 @@ trait Post
$ret = $this->parseTimelineYears($o);
if (count($ret) > 0) {
- $this->setCacheTimelineYears($username, $ret);
+ $this->setCache($cacheKey, $ret);
}
return $ret;
@@ -134,11 +96,7 @@ trait Post
*/
public function getTimelinePosts(string $username, int $year = -1, bool $take_content = false, int $limit = -1): array
{
- $years = $this->getCacheTimelineYears($username);
- if (!is_array($years)) {
- $years = $this->getTimelineYears($username);
- }
-
+ $years = $this->getTimelineYears($username);
if ($year === -1) {
$year = max(array_keys($years));
}
--
Ammar Faizi
* [PATCH fb v1 3/6] fb: Post: Implement cache in `getPost()`
2023-05-09 10:46 [PATCH fb v1 0/6] Introducing cache for the Facebook scraper Ammar Faizi
2023-05-09 10:46 ` [PATCH fb v1 1/6] fb: Introduce `getCache()` and `setCache()` functions Ammar Faizi
2023-05-09 10:46 ` [PATCH fb v1 2/6] fb: Post: Replace old cache mechanism in `getTimelineYears()` Ammar Faizi
@ 2023-05-09 10:46 ` Ammar Faizi
2023-05-09 10:46 ` [PATCH fb v1 4/6] fb: Post: Implement cache in `getTimelinePosts()` Ammar Faizi
` (3 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Ammar Faizi @ 2023-05-09 10:46 UTC (permalink / raw)
To: GNU/Weeb FB Team
Cc: Ammar Faizi, GNU/Weeb Mailing List, Michael William Jonathan
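Cache the result of `getPost()` so that repeated calls with the same
post ID in a short period are served from the cache instead of
refetching the post.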
Signed-off-by: Ammar Faizi <[email protected]>
---
src/Facebook/Methods/Post.php | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/src/Facebook/Methods/Post.php b/src/Facebook/Methods/Post.php
index 81017c9122e6c341..3cf5d7e9896e74ce 100644
--- a/src/Facebook/Methods/Post.php
+++ b/src/Facebook/Methods/Post.php
@@ -353,6 +353,13 @@ trait Post
*/
public function getPost(string $post_id): array
{
+ $cacheKey = __METHOD__.$post_id;
+
+ $ret = $this->getCache($cacheKey);
+ if ($ret) {
+ return $ret;
+ }
+
/**
* $post_id must be numeric or a string starts with "pfbid".
*/
@@ -372,9 +379,11 @@ trait Post
$content = $this->parsePostContent($o);
$content["embedded_link"] = $this->parseEmbeddedLink($orig);
- return [
+ $ret = [
"content" => $content,
"info" => $info
];
+ $this->setCache($cacheKey, $ret);
+ return $ret;
}
}
--
Ammar Faizi
* [PATCH fb v1 4/6] fb: Post: Implement cache in `getTimelinePosts()`
2023-05-09 10:46 [PATCH fb v1 0/6] Introducing cache for the Facebook scraper Ammar Faizi
` (2 preceding siblings ...)
2023-05-09 10:46 ` [PATCH fb v1 3/6] fb: Post: Implement cache in `getPost()` Ammar Faizi
@ 2023-05-09 10:46 ` Ammar Faizi
2023-05-09 10:46 ` [PATCH fb v1 5/6] fb: cache: Introduce `clearExpiredCaches()` Ammar Faizi
` (2 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Ammar Faizi @ 2023-05-09 10:46 UTC (permalink / raw)
To: GNU/Weeb FB Team
Cc: Ammar Faizi, GNU/Weeb Mailing List, Michael William Jonathan
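Cache the result of `getTimelinePosts()` so that repeated calls with the
same arguments in a short period are served from the cache instead of
refetching the timeline.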
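The cache key has to cover every argument that changes the result. A sketch of the key expression used in the diff below, pulled out into a hypothetical free function (`timeline_posts_key()` is not part of the patch, and `$method` stands in for `__METHOD__`):

```php
<?php
// Hypothetical helper mirroring the key expression in getTimelinePosts().
function timeline_posts_key(
    string $method,
    string $username,
    int $year,
    bool $take_content,
    int $limit
): string {
    // Every argument is folded into the key, so a change in any of them
    // selects a different cache file.
    return $method.$username.$year.($take_content ? 1 : 0).sprintf("%010d", $limit);
}
```

Note that the fields are concatenated without a delimiter, so two different argument sets could in principle produce the same key (for example a username ending in a digit followed by a shorter year). That looks unlikely with realistic years, but a separator between fields would rule it out.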
Signed-off-by: Ammar Faizi <[email protected]>
---
src/Facebook/Methods/Post.php | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/src/Facebook/Methods/Post.php b/src/Facebook/Methods/Post.php
index 3cf5d7e9896e74ce..7fe38c5c1b982c72 100644
--- a/src/Facebook/Methods/Post.php
+++ b/src/Facebook/Methods/Post.php
@@ -96,6 +96,12 @@ trait Post
*/
public function getTimelinePosts(string $username, int $year = -1, bool $take_content = false, int $limit = -1): array
{
+ $cacheKey = __METHOD__.$username.$year.($take_content ? 1 : 0).sprintf("%010d", $limit);
+
+ $posts = $this->getCache($cacheKey);
+ if (is_array($posts))
+ return $posts;
+
$years = $this->getTimelineYears($username);
if ($year === -1) {
$year = max(array_keys($years));
@@ -155,6 +161,7 @@ trait Post
];
}
+ $this->setCache($cacheKey, $posts);
return $posts;
}
--
Ammar Faizi
* [PATCH fb v1 5/6] fb: cache: Introduce `clearExpiredCaches()`
2023-05-09 10:46 [PATCH fb v1 0/6] Introducing cache for the Facebook scraper Ammar Faizi
` (3 preceding siblings ...)
2023-05-09 10:46 ` [PATCH fb v1 4/6] fb: Post: Implement cache in `getTimelinePosts()` Ammar Faizi
@ 2023-05-09 10:46 ` Ammar Faizi
2023-05-09 10:46 ` [PATCH fb v1 6/6] fb: web: Create cron.php to clear cache Ammar Faizi
2023-05-09 11:06 ` [PATCH fb v1 0/6] Introducing cache for the Facebook scraper GNU/Weeb Facebook Team
6 siblings, 0 replies; 8+ messages in thread
From: Ammar Faizi @ 2023-05-09 10:46 UTC (permalink / raw)
To: GNU/Weeb FB Team
Cc: Ammar Faizi, GNU/Weeb Mailing List, Michael William Jonathan
When a cache is expired, it won't be deleted unless getCache() with the
corresponding key is invoked. Introduce a new function to scan for
expired caches and delete them.
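One observation from the hunks in this thread: `getCache()` as posted in 1/6 checks that `exp` and `data` are present, but it does not appear to compare `exp` against the current time. A read path that also honours the expiry could look roughly like the sketch below. `read_cache_file()` is a hypothetical standalone function, not code from the series, and `$now` is injectable only to make it testable.

```php
<?php
// Hypothetical expiry-aware reader for one cache file.
function read_cache_file(string $file, ?int $now = null)
{
    $now = $now ?? time();

    if (!is_file($file)) {
        return null;
    }

    $data = json_decode((string) file_get_contents($file), true);

    // Treat malformed and expired entries the same way: delete and miss.
    // The comparison matches the one used by clearExpiredCaches().
    if (!is_array($data) || !isset($data["exp"], $data["data"]) || $data["exp"] < $now) {
        unlink($file);
        return null;
    }

    return $data["data"];
}
```

With a check like this on the read path, the periodic sweep is only needed to reclaim space for keys that are never requested again.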
Signed-off-by: Ammar Faizi <[email protected]>
---
src/Facebook/Facebook.php | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/src/Facebook/Facebook.php b/src/Facebook/Facebook.php
index 16970714ddce7bdc..00aed812693248a8 100644
--- a/src/Facebook/Facebook.php
+++ b/src/Facebook/Facebook.php
@@ -308,4 +308,34 @@ class Facebook
return $data["data"];
}
+
+ /**
+ * @return void
+ */
+ public function clearExpiredCaches(): void
+ {
+ $scan = scandir($this->cache_dir);
+ foreach ($scan as $file) {
+ $file = "{$this->cache_dir}/{$file}";
+ if (!is_file($file)) {
+ continue;
+ }
+
+ $data = @file_get_contents($file);
+ if (!$data) {
+ unlink($file);
+ continue;
+ }
+
+ $data = @json_decode($data, true);
+ if (!isset($data["exp"]) || !isset($data["data"])) {
+ unlink($file);
+ continue;
+ }
+
+ if ($data["exp"] < time()) {
+ unlink($file);
+ }
+ }
+ }
}
--
Ammar Faizi
* [PATCH fb v1 6/6] fb: web: Create cron.php to clear cache
2023-05-09 10:46 [PATCH fb v1 0/6] Introducing cache for the Facebook scraper Ammar Faizi
` (4 preceding siblings ...)
2023-05-09 10:46 ` [PATCH fb v1 5/6] fb: cache: Introduce `clearExpiredCaches()` Ammar Faizi
@ 2023-05-09 10:46 ` Ammar Faizi
2023-05-09 11:06 ` [PATCH fb v1 0/6] Introducing cache for the Facebook scraper GNU/Weeb Facebook Team
6 siblings, 0 replies; 8+ messages in thread
From: Ammar Faizi @ 2023-05-09 10:46 UTC (permalink / raw)
To: GNU/Weeb FB Team
Cc: Ammar Faizi, GNU/Weeb Mailing List, Michael William Jonathan
Allow the server to clear expired caches via a small PHP script,
cron.php. Periodically calling clearExpiredCaches() deletes expired
caches, which saves storage space.
Signed-off-by: Ammar Faizi <[email protected]>
---
web/cron.php | 9 +++++++++
1 file changed, 9 insertions(+)
create mode 100644 web/cron.php
diff --git a/web/cron.php b/web/cron.php
new file mode 100644
index 0000000000000000..bc183bedea9f4062
--- /dev/null
+++ b/web/cron.php
@@ -0,0 +1,9 @@
+<?php
+
+require __DIR__."/../vendor/autoload.php";
+require __DIR__."/auth.php";
+
+use Facebook\Facebook;
+
+$fb = new Facebook($session_dir);
+$fb->clearExpiredCaches();
--
Ammar Faizi
* Re: [PATCH fb v1 0/6] Introducing cache for the Facebook scraper
2023-05-09 10:46 [PATCH fb v1 0/6] Introducing cache for the Facebook scraper Ammar Faizi
` (5 preceding siblings ...)
2023-05-09 10:46 ` [PATCH fb v1 6/6] fb: web: Create cron.php to clear cache Ammar Faizi
@ 2023-05-09 11:06 ` GNU/Weeb Facebook Team
6 siblings, 0 replies; 8+ messages in thread
From: GNU/Weeb Facebook Team @ 2023-05-09 11:06 UTC (permalink / raw)
To: Ammar Faizi
Cc: GNU/Weeb Facebook Team, GNU/Weeb Mailing List,
Michael William Jonathan
On Tue, 9 May 2023 17:46:52 +0700, Ammar Faizi wrote:
The pull request you sent on Sun, 07 May 2023 18:12:10 +0000:
> https://gitlab.torproject.org/ammarfaizi2/Facebook.git dev.cache
has been merged into ammarfaizi2/Facebook.git:
https://github.com/ammarfaizi2/Facebook/commit/68e95a61956e75ad08ad0bb68f10172fd2883816
Thank you!
[1/6] fb: Introduce `getCache()` and `setCache()` functions
commit: 88952f396b1b4831eab3b8ed5d71959e42686a88
[2/6] fb: Post: Replace old cache mechanism in `getTimelineYears()`
commit: 8bc6986c8b802b4b22ba69b86ca0892ff70546e7
[3/6] fb: Post: Implement cache in `getPost()`
commit: eb5b43a9ac232b4e0d7e973e5feddf1922ca8415
[4/6] fb: Post: Implement cache in `getTimelinePosts()`
commit: bf957bbe7e23d48360d6d3bb7b96364a25b0148a
[5/6] fb: cache: Introduce `clearExpiredCaches()`
commit: 38622b9d3c33a44ee6c05cc75bb781a6c6f52cd5
[6/6] fb: web: Create cron.php to clear cache
commit: cbed859c4a77521dcac840b18d4cf30ef493d747
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html