From: Ammar Faizi <[email protected]>
To: GNU/Weeb Mailing List <[email protected]>
Cc: GNU/Weeb FB Team <[email protected]>,
Michael William Jonathan <[email protected]>
Subject: Introducing Facebook Scraper API (with Tor network support)
Date: Wed, 3 May 2023 19:37:07 +0700
Message-ID: <[email protected]>

We are open-sourcing a new project: Facebook Scraper API, with Tor
network support. It is written entirely in PHP (yeah, PHP; argue that
PHP is dead if you like, but it's not!). Currently, it can only scrape
text and photo posts. I'll add video support in the near future, and
more features are coming soon.

This project is licensed under GPLv2, which is my default choice of
open-source license.

Comments and patches are welcome.

Patches and inquiries about this project should be directed to:
To: Ammar Faizi <[email protected]>
Cc: Michael William Jonathan <[email protected]>
Cc: GNU/Weeb Mailing List <[email protected]>
Cc: GNU/Weeb FB Team <[email protected]>

The following changes since commit 75065abdba76e40f102041b70c2edaf4bf902259:

  fb: Initial commit (2023-05-01 00:48:36 +0700)

are available in the Git repository at:

  https://gitlab.torproject.org/ammarfaizi2/Facebook.git master

for you to fetch changes up to 0d5e59e00359e165778a81f80122bb522f8edb0f:

  Merge branch 'rewrite_url' (Facebook Onion rewrite support) (2023-05-03 18:46:47 +0700)

----------------------------------------------------------------
Ammar Faizi (33):
fb: Create the initial 'Post' trait (getTimelineYears)
fb: Create user cache mechanism
fb: Post: Handle a 'get timeline years' edge case
fb: Post: Create getTimelinePosts method
fb: Post: Make getTimelineYears() more reliable
fb: Post: Implement getPost() function
fb: web: Create initial web API
fb: Post: Handle not found in getTimelineYears()
fb: Post: Fix stupid indentation
fb: web: Add getPost() endpoint
fb: helpers: Replace '</p>' with double lines instead of single line
fb: web: Create 'logs' directory for web server logs
fb: composer.json: Remove phpunit from require-dev
fb: Use CURLPROXY_SOCKS5_HOSTNAME as proxy type
fb: helpers: Trim the end result of full_html_clean()
fb: Post: Split parsing logic in getPost()
fb: Post: Split info parser
fb: Post: Implement tryParsePhotoPost()
fb: Post: Introduce `$take_content` argument in `getTimelinePosts()`
fb: Post: Introduce `$limit` argument in `getTimelinePosts()`
fb: web: Invert the getTimelinePosts() condition
fb: web: Integrate `$take_content` and `$limit` args
fb: Post: Switch `content` and `info` key position
fb: Post: Parse the embedded link in a post
fb: web: Create `httpGet()` API for visiting FB onion endpoints
fb: Post: Introduce rewrite URL callback
fb: web: Provide a proxy to access onion endpoints
fb: Post: Call cleanURL() on the img_preview URL
fb: web: Fix `is_compressed` value
fb: web: Supress gzinflate error
fb: web: Don't rewrite non Facebook onion URL
Merge branch 'post' (initial FB post scraper API)
Merge branch 'rewrite_url' (Facebook Onion rewrite support)

 auth.example.php              |   2 +
 composer.json                 |   3 -
 main.php                      |  45 +++++
 src/Facebook/Facebook.php     |  93 ++++++++-
 src/Facebook/Methods/Post.php | 422 ++++++++++++++++++++++++++++++++++++++++
 src/Facebook/helpers.php      |  54 +++++
 web/.gitignore                |   1 +
 web/logs/.gitignore           |   2 +
 web/public/api.php            | 268 +++++++++++++++++++++++++
 9 files changed, 885 insertions(+), 5 deletions(-)
 create mode 100644 src/Facebook/Methods/Post.php
 create mode 100644 web/.gitignore
 create mode 100644 web/logs/.gitignore
 create mode 100644 web/public/api.php

--
Ammar Faizi