From: Muhammad Rizki <[email protected]>
To: Ammar Faizi <[email protected]>
Cc: Muhammad Rizki <[email protected]>,
GNU/Weeb Mailing List <[email protected]>,
Alviro Iskandar Setiawan <[email protected]>
Subject: [RFC PATCH v1 5/5] Refactor many files
Date: Tue, 6 Sep 2022 18:19:29 +0700 [thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
I want to refactor atom scraper file and utility file and create a
directory for both of it to make it reuseable in the future use. This
commit contains:
- Rename some functions in utils
- Rename file name in telegram such as scraper => mailer,
bot.py => listener
- Move class Mutexes to utility file
- Rename the Mutexes attribute send_to_tg => lock
- Changes affected codes during this refactor
Signed-off-by: Muhammad Rizki <[email protected]>
---
.gitignore | 1 +
daemon/atom/__init__.py | 7 ++
daemon/{telegram/scraper => atom}/scraper.py | 12 +--
daemon/{telegram/scraper => atom}/utils.py | 87 ++++++++++++++-----
daemon/{telegram => }/db.sql | 0
.../.env.example => telegram.env.example} | 0
.../telegram/{scraper => mailer}/__init__.py | 4 +-
.../{scraper/bot.py => mailer/listener.py} | 23 ++---
daemon/telegram/packages/client.py | 10 ++-
.../packages/plugins/callbacks/del_atom.py | 6 +-
.../packages/plugins/callbacks/del_chat.py | 6 +-
.../packages/plugins/commands/debugger.py | 2 +-
.../packages/plugins/commands/manage_atom.py | 6 +-
.../plugins/commands/manage_broadcast.py | 6 +-
.../packages/plugins/commands/scrape.py | 10 +--
daemon/{telegram/run.py => tg.py} | 24 +++--
16 files changed, 122 insertions(+), 82 deletions(-)
create mode 100644 daemon/atom/__init__.py
rename daemon/{telegram/scraper => atom}/scraper.py (79%)
rename daemon/{telegram/scraper => atom}/utils.py (72%)
rename daemon/{telegram => }/db.sql (100%)
rename daemon/{telegram/.env.example => telegram.env.example} (100%)
rename daemon/telegram/{scraper => mailer}/__init__.py (68%)
rename daemon/telegram/{scraper/bot.py => mailer/listener.py} (88%)
rename daemon/{telegram/run.py => tg.py} (68%)
diff --git a/.gitignore b/.gitignore
index 4201a17..53027d9 100644
--- a/.gitignore
+++ b/.gitignore
@@ -140,5 +140,6 @@ data.json
*.patch
# configuration file
+daemon/*.env
daemon/telegram/config.py
daemon/discord/config.py
diff --git a/daemon/atom/__init__.py b/daemon/atom/__init__.py
new file mode 100644
index 0000000..2fe4e31
--- /dev/null
+++ b/daemon/atom/__init__.py
@@ -0,0 +1,7 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Copyright (C) 2022 Muhammad Rizki <[email protected]>
+# Copyright (C) 2022 Ammar Faizi <[email protected]>
+#
+
+from .scraper import Scraper
diff --git a/daemon/telegram/scraper/scraper.py b/daemon/atom/scraper.py
similarity index 79%
rename from daemon/telegram/scraper/scraper.py
rename to daemon/atom/scraper.py
index 2d5942b..8508ae9 100644
--- a/daemon/telegram/scraper/scraper.py
+++ b/daemon/atom/scraper.py
@@ -1,6 +1,6 @@
# SPDX-License-Identifier: GPL-2.0-only
#
-# Copyright (C) 2022 Muhammad Rizki <[email protected]>
+# Copyright (C) 2022 Muhammad Rizki <[email protected]>
# Copyright (C) 2022 Ammar Faizi <[email protected]>
#
@@ -11,7 +11,7 @@ import httpx
import email
-class Scraper():
+class Scraper:
async def get_new_threads_urls(self, atom_url):
ret = await self.__get_atom_content(atom_url)
return await self.__get_new_threads_from_atom(ret)
@@ -19,10 +19,10 @@ class Scraper():
async def __get_atom_content(self, atom_url):
async with httpx.AsyncClient() as client:
- res = await client.get(atom_url)
+ res = await client.get(atom_url, timeout=20)
if res.status_code == 200:
return res.text
- raise Exception(f"[get_atom_content]: Returned {res.status_code} HTTP code")
+ raise Exception(f"[__get_atom_content]: Returned {res.status_code} HTTP code")
async def __get_new_threads_from_atom(self, atom):
@@ -54,10 +54,10 @@ class Scraper():
async def get_email_from_url(self, url):
async with httpx.AsyncClient() as client:
- res = await client.get(url)
+ res = await client.get(url, timeout=20)
if res.status_code == 200:
return email.message_from_string(
res.text,
policy=email.policy.default
)
- raise Exception(f"[get_atom_content]: Returned {res.status_code} HTTP code")
+ raise Exception(f"[get_email_from_url]: Returned {res.status_code} HTTP code")
diff --git a/daemon/telegram/scraper/utils.py b/daemon/atom/utils.py
similarity index 72%
rename from daemon/telegram/scraper/utils.py
rename to daemon/atom/utils.py
index c428a33..d73d6bd 100644
--- a/daemon/telegram/scraper/utils.py
+++ b/daemon/atom/utils.py
@@ -1,6 +1,6 @@
# SPDX-License-Identifier: GPL-2.0-only
#
-# Copyright (C) 2022 Muhammad Rizki <[email protected]>
+# Copyright (C) 2022 Muhammad Rizki <[email protected]>
# Copyright (C) 2022 Ammar Faizi <[email protected]>
#
@@ -8,13 +8,19 @@ from pyrogram.types import Chat, InlineKeyboardMarkup, InlineKeyboardButton
from email.message import Message
from typing import Dict
from slugify import slugify
+import html
import hashlib
import uuid
import os
import re
import shutil
import httpx
-import html
+import asyncio
+
+
+class Mutexes:
+ def __init__(self):
+ self.lock = asyncio.Lock()
def get_email_msg_id(mail):
@@ -113,25 +119,37 @@ def consruct_to_n_cc(to: list, cc: list):
return ret
-def gen_temp(name: str):
+def gen_temp(name: str, platform: str):
+ platform = platform.lower()
+ plt_ls = ["telegram", "discord"]
+
+ if platform not in plt_ls:
+ t = f"Platform {platform} is not found, "
+ t += f"only {', '.join(plt_ls)} is available"
+ raise ValueError(f"Platform {platform} is not found")
+
md5 = hashlib.md5(name.encode()).hexdigest()
- ret = os.getenv("STORAGE_DIR", "storage") + "/" + md5
+ store_dir = os.getenv("STORAGE_DIR", "storage")
+ platform = platform.replace("discord", "dscord")
+ path = f"{platform}/{store_dir}/{md5}"
try:
- os.mkdir(ret)
+ os.mkdir(path)
except FileExistsError:
pass
- return ret
+ return path
-def extract_body(thread: Message):
+def extract_body(thread: Message, platform: str):
if not thread.is_multipart():
- p = thread.get_payload(decode=True)
- return f"{p.decode(errors='replace')}\n".lstrip(), []
+ p = thread.get_payload(decode=True).decode(errors='replace')
+ if platform == "discord":
+ p = quote_reply(p)
+ return f"{p}\n".lstrip(), []
ret = ""
files = []
- temp = gen_temp(str(uuid.uuid4()))
+ temp = gen_temp(str(uuid.uuid4()), platform)
for p in thread.get_payload():
fname = p.get_filename()
payload = p.get_payload(decode=True)
@@ -164,35 +182,42 @@ def __is_patch(subject, content):
return True
-def create_template(thread: Message, to=None, cc=None):
+def create_template(thread: Message, platform: str, to=None, cc=None):
if not to:
to = extract_list("to", thread)
if not cc:
cc = extract_list("cc", thread)
+ if platform == "telegram":
+ substr = 4000
+ border = f"\n<code>{'-'*72}</code>"
+ else:
+ substr = 1900
+ border = f"\n{'-'*80}"
subject = thread.get('subject')
ret = f"From: {thread.get('from')}\n"
ret += consruct_to_n_cc(to, cc)
ret += f"Date: {thread.get('date')}\n"
ret += f"Subject: {subject}\n\n"
- content, files = extract_body(thread)
+ content, files = extract_body(thread, platform)
is_patch = __is_patch(subject, content)
if is_patch:
ret += content
else:
ret += content.strip().replace("\t", " ")
- if len(ret) >= 4000:
- ret = ret[:4000] + "..."
- ret = fix_utf8_char(ret)
- ret += f"\n<code>{'-'*72}</code>"
+ if len(ret) >= substr:
+ ret = ret[:substr] + "..."
+
+ ret = fix_utf8_char(ret, platform == "telegram")
+ ret += border
return ret, files, is_patch
-def prepare_send_patch(mail, text, url):
- tmp = gen_temp(url)
+def prepare_patch(mail: "Message", text: str, url: str, platform: str):
+ tmp = gen_temp(url, platform)
fnm = str(mail.get("subject"))
sch = re.search(PATCH_PATTERN, fnm, re.IGNORECASE)
@@ -210,17 +235,31 @@ def prepare_send_patch(mail, text, url):
with open(file, "wb") as f:
f.write(bytes(text, encoding="utf8"))
- caption = "#patch #ml\n" + fix_utf8_char(cap)
+ caption = "#patch #ml"
+ if platform == "telegram":
+ caption += fix_utf8_char("\n" + cap, True)
return tmp, file, caption, url
-def clean_up_after_send_patch(tmp):
+def remove_patch(tmp):
shutil.rmtree(tmp)
-def fix_utf8_char(text: str):
- text = text.rstrip().replace("�"," ")
- return html.escape(html.escape(text))
+def fix_utf8_char(text: str, html_escape: bool = True):
+ t = text.rstrip().replace("�"," ")
+ if html_escape:
+ t = html.escape(html.escape(text))
+ return t
+
+
+def quote_reply(text: str):
+ a = ""
+ for b in text.split("\n"):
+ b = b.replace(">\n", "> ")
+ if b.startswith(">"):
+ a += "> "
+ a += f"{b}\n"
+ return a
EMAIL_MSG_ID_PATTERN = r"<([^\<\>]+)>"
@@ -240,6 +279,8 @@ async def is_atom_url(text: str):
return mime == "application/atom+xml"
except: return False
+
+
def remove_command(text: str):
txt = text.split(" ")
txt = text.replace(txt[0] + " ","")
diff --git a/daemon/telegram/db.sql b/daemon/db.sql
similarity index 100%
rename from daemon/telegram/db.sql
rename to daemon/db.sql
diff --git a/daemon/telegram/.env.example b/daemon/telegram.env.example
similarity index 100%
rename from daemon/telegram/.env.example
rename to daemon/telegram.env.example
diff --git a/daemon/telegram/scraper/__init__.py b/daemon/telegram/mailer/__init__.py
similarity index 68%
rename from daemon/telegram/scraper/__init__.py
rename to daemon/telegram/mailer/__init__.py
index 4294302..20f9034 100644
--- a/daemon/telegram/scraper/__init__.py
+++ b/daemon/telegram/mailer/__init__.py
@@ -4,6 +4,4 @@
# Copyright (C) 2022 Ammar Faizi <[email protected]>
#
-from .scraper import Scraper
-from .bot import BotMutexes
-from .bot import Bot
+from .listener import Listener
diff --git a/daemon/telegram/scraper/bot.py b/daemon/telegram/mailer/listener.py
similarity index 88%
rename from daemon/telegram/scraper/bot.py
rename to daemon/telegram/mailer/listener.py
index a7087ad..5e9acd2 100644
--- a/daemon/telegram/scraper/bot.py
+++ b/daemon/telegram/mailer/listener.py
@@ -6,26 +6,21 @@
from pyrogram.types import Message
from apscheduler.schedulers.asyncio import AsyncIOScheduler
-from packages import DaemonClient
-from scraper import Scraper
-from . import utils
+from telegram.packages import DaemonClient
+from atom import Scraper
+from atom import utils
import asyncio
import shutil
import re
import traceback
-class BotMutexes():
- def __init__(self):
- self.send_to_tg = asyncio.Lock()
-
-
-class Bot():
+class Listener:
def __init__(self, client: DaemonClient, sched: AsyncIOScheduler,
- scraper: Scraper, mutexes: BotMutexes):
+ mutexes: utils.Mutexes):
self.client = client
self.sched = sched
- self.scraper = scraper
+ self.scraper = Scraper()
self.mutexes = mutexes
self.db = client.db
self.isRunnerFixed = False
@@ -72,7 +67,7 @@ class Bot():
async def __handle_mail(self, url, mail):
chats = self.db.get_broadcast_chats()
for chat in chats:
- async with self.mutexes.send_to_tg:
+ async with self.mutexes.lock:
should_wait = await self.__send_mail(url, mail,
chat[1])
@@ -80,7 +75,7 @@ class Bot():
await asyncio.sleep(1)
- # @__must_hold(self.mutexes.send_to_tg)
+ # @__must_hold(self.mutexes.lock)
async def __send_mail(self, url, mail, tg_chat_id):
email_msg_id = utils.get_email_msg_id(mail)
if not email_msg_id:
@@ -99,7 +94,7 @@ class Bot():
#
return False
- text, files, is_patch = utils.create_template(mail)
+ text, files, is_patch = utils.create_template(mail, "telegram")
reply_to = self.get_reply(mail, tg_chat_id)
url = str(re.sub(r"/raw$", "", url))
diff --git a/daemon/telegram/packages/client.py b/daemon/telegram/packages/client.py
index 820c3e2..686e5ef 100644
--- a/daemon/telegram/packages/client.py
+++ b/daemon/telegram/packages/client.py
@@ -8,8 +8,8 @@ from pyrogram.enums import ParseMode
from pyrogram.types import Message, InlineKeyboardMarkup, InlineKeyboardButton
from typing import Union
from email.message import Message
-from scraper import utils
-from database import DB
+from atom import utils
+from telegram.database import DB
from .decorator import handle_flood
@@ -56,7 +56,9 @@ class DaemonClient(Client):
parse_mode: ParseMode = ParseMode.HTML
) -> Message:
print("[send_patch_email]")
- tmp, doc, caption, url = utils.prepare_send_patch(mail, text, url)
+ tmp, doc, caption, url = utils.prepare_patch(
+ mail, text, url, "telegram"
+ )
m = await self.send_document(
chat_id=chat_id,
document=doc,
@@ -71,5 +73,5 @@ class DaemonClient(Client):
])
)
- utils.clean_up_after_send_patch(tmp)
+ utils.remove_patch(tmp)
return m
diff --git a/daemon/telegram/packages/plugins/callbacks/del_atom.py b/daemon/telegram/packages/plugins/callbacks/del_atom.py
index 1510d60..b750e1c 100644
--- a/daemon/telegram/packages/plugins/callbacks/del_atom.py
+++ b/daemon/telegram/packages/plugins/callbacks/del_atom.py
@@ -3,10 +3,10 @@
# Copyright (C) 2022 Muhammad Rizki <[email protected]>
#
-from packages import DaemonClient
-from scraper import utils
+from telegram.packages import DaemonClient
+from atom import utils
from pyrogram.types import CallbackQuery
-import config
+from telegram import config
@DaemonClient.on_callback_query(config.admin_only, group=1)
diff --git a/daemon/telegram/packages/plugins/callbacks/del_chat.py b/daemon/telegram/packages/plugins/callbacks/del_chat.py
index 26c6dd8..90b557e 100644
--- a/daemon/telegram/packages/plugins/callbacks/del_chat.py
+++ b/daemon/telegram/packages/plugins/callbacks/del_chat.py
@@ -3,10 +3,10 @@
# Copyright (C) 2022 Muhammad Rizki <[email protected]>
#
-from packages import DaemonClient
-from scraper import utils
+from telegram.packages import DaemonClient
+from atom import utils
from pyrogram.types import CallbackQuery
-import config
+from telegram import config
@DaemonClient.on_callback_query(config.admin_only, group=2)
diff --git a/daemon/telegram/packages/plugins/commands/debugger.py b/daemon/telegram/packages/plugins/commands/debugger.py
index ae2d31d..7f6f367 100644
--- a/daemon/telegram/packages/plugins/commands/debugger.py
+++ b/daemon/telegram/packages/plugins/commands/debugger.py
@@ -7,7 +7,7 @@ from pyrogram import Client, filters, enums
from pyrogram.types import Message
from textwrap import indent
import io, import_expression, contextlib, traceback
-import config
+from telegram import config
@Client.on_message(
diff --git a/daemon/telegram/packages/plugins/commands/manage_atom.py b/daemon/telegram/packages/plugins/commands/manage_atom.py
index 4ba422a..99df7f7 100644
--- a/daemon/telegram/packages/plugins/commands/manage_atom.py
+++ b/daemon/telegram/packages/plugins/commands/manage_atom.py
@@ -5,9 +5,9 @@
from pyrogram.types import Message
from pyrogram import filters
-from packages import DaemonClient
-from scraper import utils
-import config
+from telegram.packages import DaemonClient
+from atom import utils
+from telegram import config
@DaemonClient.on_message(
diff --git a/daemon/telegram/packages/plugins/commands/manage_broadcast.py b/daemon/telegram/packages/plugins/commands/manage_broadcast.py
index 6d75c36..0aa70de 100644
--- a/daemon/telegram/packages/plugins/commands/manage_broadcast.py
+++ b/daemon/telegram/packages/plugins/commands/manage_broadcast.py
@@ -5,9 +5,9 @@
from pyrogram.types import Message
from pyrogram import filters, enums
-from packages import DaemonClient
-from scraper import utils
-import config
+from telegram.packages import DaemonClient
+from atom import utils
+from telegram import config
@DaemonClient.on_message(
diff --git a/daemon/telegram/packages/plugins/commands/scrape.py b/daemon/telegram/packages/plugins/commands/scrape.py
index 45b1581..4cdbf1c 100644
--- a/daemon/telegram/packages/plugins/commands/scrape.py
+++ b/daemon/telegram/packages/plugins/commands/scrape.py
@@ -6,10 +6,10 @@
from pyrogram.types import Message
from pyrogram import filters
-from packages import DaemonClient
-from scraper import Scraper
-from scraper import utils
-import config
+from telegram.packages import DaemonClient
+from atom import Scraper
+from atom import utils
+from telegram import config
import shutil
import re
import asyncio
@@ -37,7 +37,7 @@ async def scrap_email(c: DaemonClient, m: Message):
s = Scraper()
mail = await s.get_email_from_url(url)
- text, files, is_patch = utils.create_template(mail)
+ text, files, is_patch = utils.create_template(mail, "telegram")
if is_patch:
m = await c.send_patch_email(
diff --git a/daemon/telegram/run.py b/daemon/tg.py
similarity index 68%
rename from daemon/telegram/run.py
rename to daemon/tg.py
index 5360395..c3e85ab 100644
--- a/daemon/telegram/run.py
+++ b/daemon/tg.py
@@ -1,24 +1,23 @@
# SPDX-License-Identifier: GPL-2.0-only
#
-# Copyright (C) 2022 Muhammad Rizki <[email protected]>
+# Copyright (C) 2022 Muhammad Rizki <[email protected]>
# Copyright (C) 2022 Ammar Faizi <[email protected]>
#
from apscheduler.schedulers.asyncio import AsyncIOScheduler
-from scraper import BotMutexes
+from atom.utils import Mutexes
from dotenv import load_dotenv
from mysql import connector
-from packages import DaemonClient
-from scraper import Scraper
-from scraper import Bot
+from telegram.packages import DaemonClient
+from telegram.mailer import Listener
import os
def main():
- load_dotenv()
+ load_dotenv("telegram.env")
client = DaemonClient(
- "storage/EmailScraper",
+ "telegram/storage/EmailScraper",
api_id=int(os.getenv("API_ID")),
api_hash=os.getenv("API_HASH"),
bot_token=os.getenv("BOT_TOKEN"),
@@ -28,9 +27,7 @@ def main():
password=os.getenv("DB_PASS"),
database=os.getenv("DB_NAME")
),
- plugins=dict(
- root="packages.plugins"
- ),
+ plugins=dict(root="telegram.packages.plugins")
)
sched = AsyncIOScheduler(
@@ -40,14 +37,13 @@ def main():
}
)
- bot = Bot(
+ mailer = Listener(
client=client,
sched=sched,
- scraper=Scraper(),
- mutexes=BotMutexes()
+ mutexes=Mutexes()
)
sched.start()
- bot.run()
+ mailer.run()
client.run()
--
Muhammad Rizki
next prev parent reply other threads:[~2022-09-06 11:19 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-09-06 11:19 [RFC PATCH v1 0/5] Refactor some Telegram bot source code Muhammad Rizki
2022-09-06 11:19 ` [RFC PATCH v1 1/5] [telegram] Move the " Muhammad Rizki
2022-09-06 11:19 ` [RFC PATCH v1 2/5] [telegram] Refactor Telegram bot database method Muhammad Rizki
2022-09-06 11:19 ` [RFC PATCH v1 3/5] [telegram] Renaming some functions in scraper/bot.py Muhammad Rizki
2022-09-06 16:11 ` Ammar Faizi
2022-09-06 11:19 ` [RFC PATCH v1 4/5] [telegram] Remove unecessary files Muhammad Rizki
2022-09-06 16:02 ` Ammar Faizi
2022-09-06 16:46 ` Muhammad Rizki
2022-09-06 16:53 ` Ammar Faizi
2022-09-06 11:19 ` Muhammad Rizki [this message]
2022-09-06 16:04 ` [RFC PATCH v1 5/5] Refactor many files Ammar Faizi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox