public inbox for [email protected]
* [PATCH v1 0/7] Fix some bugs and add some features
@ 2022-10-18  8:16 Muhammad Rizki
  2022-10-18  8:16 ` [PATCH v1 1/7] discord: Add send_text_mail_interaction() Muhammad Rizki
                   ` (6 more replies)
  0 siblings, 7 replies; 42+ messages in thread
From: Muhammad Rizki @ 2022-10-18  8:16 UTC (permalink / raw)
  To: Ammar Faizi
  Cc: Muhammad Rizki, Alviro Iskandar Setiawan, GNU/Weeb Mailing List

Hi sir,

This series fixes some bugs, improves some code, and adds some new
features. The bugs listed below should now be fixed, and the email file
attachments should now be removed after all of them have been sent.

Known bugs:
1. The email payload extraction result is garbled if the email payload
   contains non-ASCII characters such as Chinese, Japanese, and the
   like.
2. remove_patch() doesn't remove all file attachments properly.

Improvements:
1. Improve fix_utf8_char() to make it stable for both the Discord and
   Telegram bots.
2. Improve remove_patch() so that all file attachments are removed
   after sending them.
3. Fix `/lore {raw atom url}` by passing "telegram" as the
   create_template() platform parameter.

New features:
1. Add send_text_mail_interaction() for the `/lore {raw atom url}` slash
   command.
2. Add send_patch_mail_interaction() for the `/lore {raw atom url}`
   slash command.
3. Add `/lore {raw atom url}` slash command.

There are 7 patches in this series:
- Patch 1 is to add send_text_mail_interaction()
- Patch 2 is to add send_patch_mail_interaction()
- Patch 3 is to add `/lore` slash command
- Patch 4 is to improve fix_utf8_char() code
- Patch 5 is to improve remove_patch() code to make it more stable
- Patch 6 is to add manage_payload() to manage email payload extraction
- Patch 7 is to fix the Telegram `/lore` command

How to use:
1. Execute the db.sql file in the daemon directory,
2. Set up the .env file. An example file with the .example suffix is
   provided; remove the .example suffix from its file name,
3. Set up config.py in each bot directory, i.e. dscord and telegram.
   An example file with the .example suffix is provided; remove the
   .example suffix from its file name,
4. Run `pip3 install -r requirements.txt` in each bot directory,
5. The STORAGE_DIR env value must be `storage` for it to work properly,
6. Run the bot with `python3 dc.py` or `python3 tg.py`.

Both are tested. But I want to make sure that it's really fixed and
stable, so please don't forget to test it too, thanks.

Muhammad Rizki (7):
  discord: Add send_text_mail_interaction()
  discord: Add send_patch_mail_interaction()
  discord: Add get lore mail slash command
  atom: Improve fix_utf8_char()
  atom: Improve remove_patch()
  atom: add manage_payload()
  telegram: Fix get lore command

 daemon/atom/utils.py                          | 31 +++++++++++----
 daemon/dscord/gnuweeb/client.py               | 23 +++++++++++
 .../plugins/slash_commands/__init__.py        |  2 +
 .../plugins/slash_commands/get_lore_mail.py   | 39 +++++++++++++++++++
 daemon/dscord/mailer/listener.py              |  4 +-
 daemon/telegram/mailer/listener.py            |  3 +-
 .../packages/plugins/commands/scrape.py       |  5 +--
 7 files changed, 92 insertions(+), 15 deletions(-)
 create mode 100644 daemon/dscord/gnuweeb/plugins/slash_commands/get_lore_mail.py


base-commit: d9b20dab81202b93f48d5365ad680796e5839d80
--
Muhammad Rizki

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v1 1/7] discord: Add send_text_mail_interaction()
  2022-10-18  8:16 [PATCH v1 0/7] Fix some bugs and add some features Muhammad Rizki
@ 2022-10-18  8:16 ` Muhammad Rizki
  2022-10-18  8:16 ` [PATCH v1 2/7] discord: Add send_patch_mail_interaction() Muhammad Rizki
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 42+ messages in thread
From: Muhammad Rizki @ 2022-10-18  8:16 UTC (permalink / raw)
  To: Ammar Faizi
  Cc: Muhammad Rizki, Alviro Iskandar Setiawan, GNU/Weeb Mailing List

Add send_text_mail_interaction() for the `/atom get {url}` command to
get a specific lore email message from a URL, for future use.

Signed-off-by: Muhammad Rizki <[email protected]>
---
 daemon/dscord/gnuweeb/client.py | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/daemon/dscord/gnuweeb/client.py b/daemon/dscord/gnuweeb/client.py
index 03a1b8c..15a6eee 100644
--- a/daemon/dscord/gnuweeb/client.py
+++ b/daemon/dscord/gnuweeb/client.py
@@ -4,6 +4,7 @@
 #
 
 import discord
+from discord import Interaction
 from discord.ext import commands
 from discord import Intents
 from dscord.config import ACTIVITY_NAME
@@ -76,3 +77,11 @@ class GWClient(commands.Bot):
 
 		utils.remove_patch(tmp)
 		return m
+
+
+	async def send_text_mail_interaction(self, i: "Interaction",
+					text: str, url: str = None):
+		return await i.response.send_message(
+			content=text,
+			view=models.FullMessageBtn(url)
+		)
-- 
Muhammad Rizki



* [PATCH v1 2/7] discord: Add send_patch_mail_interaction()
  2022-10-18  8:16 [PATCH v1 0/7] Fix some bugs and add some features Muhammad Rizki
  2022-10-18  8:16 ` [PATCH v1 1/7] discord: Add send_text_mail_interaction() Muhammad Rizki
@ 2022-10-18  8:16 ` Muhammad Rizki
  2022-10-18  8:16 ` [PATCH v1 3/7] discord: Add get lore mail slash command Muhammad Rizki
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 42+ messages in thread
From: Muhammad Rizki @ 2022-10-18  8:16 UTC (permalink / raw)
  To: Ammar Faizi
  Cc: Muhammad Rizki, Alviro Iskandar Setiawan, GNU/Weeb Mailing List

This function is the same as send_text_mail_interaction(); the
difference between them is that send_patch_mail_interaction() sends a
lore email message with a patch file attachment.

Signed-off-by: Muhammad Rizki <[email protected]>
---
 daemon/dscord/gnuweeb/client.py | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/daemon/dscord/gnuweeb/client.py b/daemon/dscord/gnuweeb/client.py
index 15a6eee..b921f7d 100644
--- a/daemon/dscord/gnuweeb/client.py
+++ b/daemon/dscord/gnuweeb/client.py
@@ -85,3 +85,17 @@ class GWClient(commands.Bot):
 			content=text,
 			view=models.FullMessageBtn(url)
 		)
+
+
+	async def send_patch_mail_interaction(self, mail, i: "Interaction",
+						text: str, url: str = None):
+		tmp, doc, caption, url = utils.prepare_patch(
+			mail, text, url, "discord"
+		)
+		m = await i.response.send_message(
+			content=caption,
+			file=discord.File(doc),
+			view=models.FullMessageBtn(url)
+		)
+		utils.remove_patch(tmp)
+		return m
-- 
Muhammad Rizki



* [PATCH v1 3/7] discord: Add get lore mail slash command
  2022-10-18  8:16 [PATCH v1 0/7] Fix some bugs and add some features Muhammad Rizki
  2022-10-18  8:16 ` [PATCH v1 1/7] discord: Add send_text_mail_interaction() Muhammad Rizki
  2022-10-18  8:16 ` [PATCH v1 2/7] discord: Add send_patch_mail_interaction() Muhammad Rizki
@ 2022-10-18  8:16 ` Muhammad Rizki
  2022-10-18  8:16 ` [PATCH v1 4/7] atom: Improve fix_utf8_char() Muhammad Rizki
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 42+ messages in thread
From: Muhammad Rizki @ 2022-10-18  8:16 UTC (permalink / raw)
  To: Ammar Faizi
  Cc: Muhammad Rizki, Alviro Iskandar Setiawan, GNU/Weeb Mailing List

Add a `/lore {url}` slash command to get the specific lore email message
from the raw atom URL.

Signed-off-by: Muhammad Rizki <[email protected]>
---
 .../plugins/slash_commands/__init__.py        |  2 +
 .../plugins/slash_commands/get_lore_mail.py   | 39 +++++++++++++++++++
 2 files changed, 41 insertions(+)
 create mode 100644 daemon/dscord/gnuweeb/plugins/slash_commands/get_lore_mail.py

diff --git a/daemon/dscord/gnuweeb/plugins/slash_commands/__init__.py b/daemon/dscord/gnuweeb/plugins/slash_commands/__init__.py
index a6d913c..126af45 100644
--- a/daemon/dscord/gnuweeb/plugins/slash_commands/__init__.py
+++ b/daemon/dscord/gnuweeb/plugins/slash_commands/__init__.py
@@ -5,9 +5,11 @@
 
 from .manage_atom import ManageAtomSC
 from .manage_broadcast import ManageBroadcastSC
+from .get_lore_mail import GetLoreSC
 
 
 class SlashCommands(
+	GetLoreSC,
 	ManageAtomSC,
 	ManageBroadcastSC
 ): pass
diff --git a/daemon/dscord/gnuweeb/plugins/slash_commands/get_lore_mail.py b/daemon/dscord/gnuweeb/plugins/slash_commands/get_lore_mail.py
new file mode 100644
index 0000000..2d55d16
--- /dev/null
+++ b/daemon/dscord/gnuweeb/plugins/slash_commands/get_lore_mail.py
@@ -0,0 +1,39 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Copyright (C) 2022  Muhammad Rizki <[email protected]>
+#
+
+import asyncio
+import discord
+from discord.ext import commands
+from discord import Interaction
+from discord import app_commands
+
+from atom import utils
+from atom import Scraper
+
+
+class GetLoreSC(commands.Cog):
+	def __init__(self, bot) -> None:
+		self.bot = bot
+
+
+	@app_commands.command(
+		name="lore",
+		description="Get lore email from raw email URL."
+	)
+	@app_commands.describe(url="Raw lore email URL")
+	async def get_lore(self, i: "Interaction", url: str):
+		s = Scraper()
+		mail = await s.get_email_from_url(url)
+		text, _, is_patch = utils.create_template(mail, "discord")
+
+		if is_patch:
+			m = await self.bot.send_patch_mail_interaction(
+				mail=mail, i=i, text=text, url=url
+			)
+		else:
+			text = "#ml\n" + text
+			m = await self.bot.send_text_mail_interaction(
+				i=i, text=text, url=url
+			)
-- 
Muhammad Rizki



* [PATCH v1 4/7] atom: Improve fix_utf8_char()
  2022-10-18  8:16 [PATCH v1 0/7] Fix some bugs and add some features Muhammad Rizki
                   ` (2 preceding siblings ...)
  2022-10-18  8:16 ` [PATCH v1 3/7] discord: Add get lore mail slash command Muhammad Rizki
@ 2022-10-18  8:16 ` Muhammad Rizki
  2022-10-19 16:59   ` Ammar Faizi
  2022-10-19 22:44   ` Alviro Iskandar Setiawan
  2022-10-18  8:16 ` [PATCH v1 5/7] atom: Improve remove_patch() Muhammad Rizki
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 42+ messages in thread
From: Muhammad Rizki @ 2022-10-18  8:16 UTC (permalink / raw)
  To: Ammar Faizi
  Cc: Muhammad Rizki, Alviro Iskandar Setiawan, GNU/Weeb Mailing List

Improve fix_utf8_char() and change the logic in create_template().
Use `platform == "discord"` to clean up HTML escapes.

Signed-off-by: Muhammad Rizki <[email protected]>
---
 daemon/atom/utils.py | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/daemon/atom/utils.py b/daemon/atom/utils.py
index a30d5cb..48857a7 100644
--- a/daemon/atom/utils.py
+++ b/daemon/atom/utils.py
@@ -206,7 +206,7 @@ def create_template(thread: Message, platform: str, to=None, cc=None):
 		if len(ret) >= substr:
 			ret = ret[:substr] + "..."
 
-		ret = fix_utf8_char(ret, platform == "telegram")
+		ret = fix_utf8_char(ret, platform == "discord")
 		ret += border
 
 	return ret, files, is_patch
@@ -242,10 +242,12 @@ def remove_patch(tmp):
 	shutil.rmtree(tmp)
 
 
-def fix_utf8_char(text: str, html_escape: bool = True):
+def fix_utf8_char(text: str, unescape: bool = True):
 	t = text.rstrip().replace("�"," ")
-	if html_escape:
-		t = html.escape(html.escape(text))
+	if unescape:
+		t = html.unescape(html.unescape(text))
+		reg = re.compile('<.*?>|&([a-z0-9]+|#[0-9]{1,6}|#x[0-9a-f]{1,6});')
+		t = reg.sub('', t)
 	return t
 
 
-- 
Muhammad Rizki



* [PATCH v1 5/7] atom: Improve remove_patch()
  2022-10-18  8:16 [PATCH v1 0/7] Fix some bugs and add some features Muhammad Rizki
                   ` (3 preceding siblings ...)
  2022-10-18  8:16 ` [PATCH v1 4/7] atom: Improve fix_utf8_char() Muhammad Rizki
@ 2022-10-18  8:16 ` Muhammad Rizki
  2022-10-18  8:16 ` [PATCH v1 6/7] atom: add manage_payload() Muhammad Rizki
  2022-10-18  8:16 ` [PATCH v1 7/7] telegram: Fix get lore command Muhammad Rizki
  6 siblings, 0 replies; 42+ messages in thread
From: Muhammad Rizki @ 2022-10-18  8:16 UTC (permalink / raw)
  To: Ammar Faizi
  Cc: Muhammad Rizki, Alviro Iskandar Setiawan, GNU/Weeb Mailing List

Improve remove_patch() so that the email attachments are removed after
all of them have been sent. Previously, this function couldn't remove
them properly.

Signed-off-by: Muhammad Rizki <[email protected]>
---
 daemon/atom/utils.py                                | 10 +++++++---
 daemon/dscord/mailer/listener.py                    |  4 ++--
 daemon/telegram/mailer/listener.py                  |  3 +--
 daemon/telegram/packages/plugins/commands/scrape.py |  3 +--
 4 files changed, 11 insertions(+), 9 deletions(-)
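
A possible pitfall, sketched under an assumption: if several attachments
of one email share the same temporary directory (the code removed below
only deleted `files[0][0]`, which hints at this but does not confirm
it), calling shutil.rmtree() once per `(d, f)` pair raises
FileNotFoundError on the second pair. The hypothetical helper
`remove_patch_dirs()` below is not part of this patch; it removes each
distinct directory exactly once.

```python
import os
import shutil
import tempfile
from typing import Iterable, Tuple, Union


def remove_patch_dirs(tmp: Union[str, Iterable[Tuple[str, str]]]) -> None:
    """Remove a single directory, or every distinct directory in (dir, file) pairs."""
    if isinstance(tmp, str):
        dirs = [tmp]
    else:
        # dict.fromkeys() drops duplicate directories and keeps their order.
        dirs = list(dict.fromkeys(d for d, _ in tmp))

    for d in dirs:
        # ignore_errors=True tolerates a directory that is already gone.
        shutil.rmtree(d, ignore_errors=True)


# Usage: two attachments stored in the same temporary directory.
demo_dir = tempfile.mkdtemp()
for name in ("0001.patch", "0002.patch"):
    with open(os.path.join(demo_dir, name), "w") as fp:
        fp.write("dummy\n")

remove_patch_dirs([(demo_dir, "0001.patch"), (demo_dir, "0002.patch")])
print(os.path.exists(demo_dir))
```

An empty list is a no-op here, so callers would not need an `if files:`
guard.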

diff --git a/daemon/atom/utils.py b/daemon/atom/utils.py
index 48857a7..c129925 100644
--- a/daemon/atom/utils.py
+++ b/daemon/atom/utils.py
@@ -6,7 +6,7 @@
 
 from pyrogram.types import Chat, InlineKeyboardMarkup, InlineKeyboardButton
 from email.message import Message
-from typing import Dict
+from typing import Dict, Union
 from slugify import slugify
 import hashlib
 import uuid
@@ -238,8 +238,12 @@ def prepare_patch(mail, text, url, platform: str):
 	return tmp, file, caption, url
 
 
-def remove_patch(tmp):
-	shutil.rmtree(tmp)
+def remove_patch(tmp: Union[str, list]):
+	if isinstance(tmp, str):
+		return shutil.rmtree(tmp)
+
+	for d,_ in tmp:
+		shutil.rmtree(d)
 
 
 def fix_utf8_char(text: str, unescape: bool = True):
diff --git a/daemon/dscord/mailer/listener.py b/daemon/dscord/mailer/listener.py
index a280a58..cc0a9f7 100644
--- a/daemon/dscord/mailer/listener.py
+++ b/daemon/dscord/mailer/listener.py
@@ -125,10 +125,10 @@ class Listener:
 
 		for d, f in files:
 			await m.reply(file=File(f"{d}/{f}"))
-			if files.index((d,f)) == len(files)-1:
-				utils.remove_patch(d)
 			await asyncio.sleep(1)
 
+		utils.remove_patch(files)
+
 		return True
 
 
diff --git a/daemon/telegram/mailer/listener.py b/daemon/telegram/mailer/listener.py
index 208aed0..08feddd 100644
--- a/daemon/telegram/mailer/listener.py
+++ b/daemon/telegram/mailer/listener.py
@@ -118,8 +118,7 @@ class Bot():
 			await m.reply_document(f"{d}/{f}", file_name=f)
 			await asyncio.sleep(1)
 
-		if files:
-			shutil.rmtree(str(files[0][0]))
+		utils.remove_patch(files)
 
 		return True
 
diff --git a/daemon/telegram/packages/plugins/commands/scrape.py b/daemon/telegram/packages/plugins/commands/scrape.py
index d4d10a9..52ddb0b 100644
--- a/daemon/telegram/packages/plugins/commands/scrape.py
+++ b/daemon/telegram/packages/plugins/commands/scrape.py
@@ -53,5 +53,4 @@ async def scrap_email(c: DaemonClient, m: Message):
 		await m.reply_document(f"{d}/{f}", file_name=f)
 		await asyncio.sleep(1)
 
-	if files:
-		shutil.rmtree(str(files[0][0]))
+	utils.remove_patch(files)
-- 
Muhammad Rizki



* [PATCH v1 6/7] atom: add manage_payload()
  2022-10-18  8:16 [PATCH v1 0/7] Fix some bugs and add some features Muhammad Rizki
                   ` (4 preceding siblings ...)
  2022-10-18  8:16 ` [PATCH v1 5/7] atom: Improve remove_patch() Muhammad Rizki
@ 2022-10-18  8:16 ` Muhammad Rizki
  2022-10-19 17:04   ` Ammar Faizi
  2022-10-18  8:16 ` [PATCH v1 7/7] telegram: Fix get lore command Muhammad Rizki
  6 siblings, 1 reply; 42+ messages in thread
From: Muhammad Rizki @ 2022-10-18  8:16 UTC (permalink / raw)
  To: Ammar Faizi
  Cc: Muhammad Rizki, Alviro Iskandar Setiawan, GNU/Weeb Mailing List

Add manage_payload() to handle decoding the email payload to UTF-8. This
includes non-UTF8 characters and base64 decoding.

Signed-off-by: Muhammad Rizki <[email protected]>
---
 daemon/atom/utils.py | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/daemon/atom/utils.py b/daemon/atom/utils.py
index c129925..e24c5df 100644
--- a/daemon/atom/utils.py
+++ b/daemon/atom/utils.py
@@ -8,6 +8,7 @@ from pyrogram.types import Chat, InlineKeyboardMarkup, InlineKeyboardButton
 from email.message import Message
 from typing import Dict, Union
 from slugify import slugify
+from base64 import b64decode
 import hashlib
 import uuid
 import os
@@ -136,7 +137,7 @@ def gen_temp(name: str, platform: str):
 
 def extract_body(thread: Message, platform: str):
 	if not thread.is_multipart():
-		p = thread.get_payload(decode=True).decode(errors='replace')
+		p = manage_payload(thread)
 
 		if platform == "discord":
 			p = quote_reply(p)
@@ -255,6 +256,14 @@ def fix_utf8_char(text: str, unescape: bool = True):
 	return t
 
 
+def manage_payload(payload: Message):
+	p = str(payload.get_payload())
+	tf_encode = payload.get("Content-Transfer-Encoding")
+	if tf_encode != "base64":
+		return p.encode().decode("utf-8", errors="replace")
+	return b64decode(p).decode("utf-8")
+
+
 EMAIL_MSG_ID_PATTERN = r"<([^\<\>]+)>"
 def extract_email_msg_id(msg_id):
 	ret = re.search(EMAIL_MSG_ID_PATTERN, msg_id)
-- 
Muhammad Rizki



* [PATCH v1 7/7] telegram: Fix get lore command
  2022-10-18  8:16 [PATCH v1 0/7] Fix some bugs and add some features Muhammad Rizki
                   ` (5 preceding siblings ...)
  2022-10-18  8:16 ` [PATCH v1 6/7] atom: add manage_payload() Muhammad Rizki
@ 2022-10-18  8:16 ` Muhammad Rizki
  6 siblings, 0 replies; 42+ messages in thread
From: Muhammad Rizki @ 2022-10-18  8:16 UTC (permalink / raw)
  To: Ammar Faizi
  Cc: Muhammad Rizki, Alviro Iskandar Setiawan, GNU/Weeb Mailing List

Fix `/lore {raw lore url}` by passing "telegram" as the
create_template() platform parameter.

Signed-off-by: Muhammad Rizki <[email protected]>
---
 daemon/telegram/packages/plugins/commands/scrape.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/daemon/telegram/packages/plugins/commands/scrape.py b/daemon/telegram/packages/plugins/commands/scrape.py
index 52ddb0b..860e993 100644
--- a/daemon/telegram/packages/plugins/commands/scrape.py
+++ b/daemon/telegram/packages/plugins/commands/scrape.py
@@ -37,7 +37,7 @@ async def scrap_email(c: DaemonClient, m: Message):
 
 	s = Scraper()
 	mail = await s.get_email_from_url(url)
-	text, files, is_patch = utils.create_template(mail)
+	text, files, is_patch = utils.create_template(mail, "telegram")
 
 	if is_patch:
 		m = await c.send_patch_email(
-- 
Muhammad Rizki



* Re: [PATCH v1 4/7] atom: Improve fix_utf8_char()
  2022-10-18  8:16 ` [PATCH v1 4/7] atom: Improve fix_utf8_char() Muhammad Rizki
@ 2022-10-19 16:59   ` Ammar Faizi
  2022-10-19 17:23     ` Muhammad Rizki
  2022-10-19 22:44   ` Alviro Iskandar Setiawan
  1 sibling, 1 reply; 42+ messages in thread
From: Ammar Faizi @ 2022-10-19 16:59 UTC (permalink / raw)
  To: Muhammad Rizki; +Cc: Alviro Iskandar Setiawan, GNU/Weeb Mailing List

On 10/18/22 3:16 PM, Muhammad Rizki wrote:
> -def fix_utf8_char(text: str, html_escape: bool = True):
> +def fix_utf8_char(text: str, unescape: bool = True):
>   	t = text.rstrip().replace("�"," ")
> -	if html_escape:
> -		t = html.escape(html.escape(text))
> +	if unescape:
> +		t = html.unescape(html.unescape(text))
> +		reg = re.compile('<.*?>|&([a-z0-9]+|#[0-9]{1,6}|#x[0-9a-f]{1,6});')
> +		t = reg.sub('', t)
>   	return t

You do html.unescape() twice, then remove all HTML special chars and
tags. I don't understand why we should do that. Can you explain a bit
on what is going on here?
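
To make the question concrete, here is a small reproduction.
`fix_utf8_char_v1` is a hypothetical name for a copy of the patched
function, with the "�" literal written as "\ufffd" on the assumption
that it is U+FFFD. The sample strings are made up for illustration.

```python
import html
import re


def fix_utf8_char_v1(text: str, unescape: bool = True) -> str:
    # Copy of the function from this patch, kept as-is for the demo.
    t = text.rstrip().replace("\ufffd", " ")
    if unescape:
        # Note: this reads `text`, not `t`, so the rstrip()/replace()
        # result above is discarded on this branch.
        t = html.unescape(html.unescape(text))
        reg = re.compile('<.*?>|&([a-z0-9]+|#[0-9]{1,6}|#x[0-9a-f]{1,6});')
        t = reg.sub('', t)
    return t


samples = [
    "&gt;",                       # a literal entity written in the email
    "if (a < b || c > d)",        # code with comparison operators
    "From: A <a@example.com>",    # an address in angle brackets
]

for s in samples:
    print(repr(s), "->", repr(fix_utf8_char_v1(s)))
# '&gt;' -> '>'
# 'if (a < b || c > d)' -> 'if (a  d)'
# 'From: A <a@example.com>' -> 'From: A '
```

In each case the output differs from the input, so text that merely
looks like HTML is altered or dropped.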

-- 
Ammar Faizi



* Re: [PATCH v1 6/7] atom: add manage_payload()
  2022-10-18  8:16 ` [PATCH v1 6/7] atom: add manage_payload() Muhammad Rizki
@ 2022-10-19 17:04   ` Ammar Faizi
  2022-10-19 17:23     ` Muhammad Rizki
  0 siblings, 1 reply; 42+ messages in thread
From: Ammar Faizi @ 2022-10-19 17:04 UTC (permalink / raw)
  To: Muhammad Rizki; +Cc: Alviro Iskandar Setiawan, GNU/Weeb Mailing List

On 10/18/22 3:16 PM, Muhammad Rizki wrote:
> +def manage_payload(payload: Message):
> +	p = str(payload.get_payload())
> +	tf_encode = payload.get("Content-Transfer-Encoding")
> +	if tf_encode != "base64":
> +		return p.encode().decode("utf-8", errors="replace")
> +	return b64decode(p).decode("utf-8")

What happens if we have "Content-Transfer-Encoding: quoted-printable"?
Does this decode it properly?

Example: https://lore.kernel.org/io-uring/[email protected]/raw
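
A minimal way to check this with the standard library only. The
`manage_payload()` below is copied from this patch; `decode_body()` is a
hypothetical alternative that relies on `get_payload(decode=True)`,
which decodes both base64 and quoted-printable according to the
Content-Transfer-Encoding header. The two tiny messages are made up.

```python
from base64 import b64decode
from email import message_from_string
from email.message import Message


def manage_payload(payload: Message) -> str:
    # Copied from the patch under review.
    p = str(payload.get_payload())
    tf_encode = payload.get("Content-Transfer-Encoding")
    if tf_encode != "base64":
        return p.encode().decode("utf-8", errors="replace")
    return b64decode(p).decode("utf-8")


def decode_body(msg: Message) -> str:
    # Hypothetical alternative: let the email package undo the transfer
    # encoding, then decode the bytes with the declared charset.
    raw = msg.get_payload(decode=True)
    charset = msg.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")


HEADERS = "Content-Type: text/plain; charset=utf-8\n"

qp_mail = message_from_string(
    HEADERS + "Content-Transfer-Encoding: quoted-printable\n\ncaf=C3=A9\n"
)
b64_mail = message_from_string(
    HEADERS + "Content-Transfer-Encoding: base64\n\nY2Fmw6k=\n"
)

print(repr(manage_payload(qp_mail)))   # 'caf=C3=A9\n' (still encoded)
print(repr(decode_body(qp_mail)))      # 'caf\xe9\n'
print(repr(decode_body(b64_mail)))     # 'caf\xe9'
```

With this input, the patched function returns the quoted-printable
text undecoded, while the `decode=True` path handles both encodings.
A charset name that Python does not know would still raise LookupError
in `decode_body()`; that case is left out of the sketch.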

-- 
Ammar Faizi



* Re: [PATCH v1 4/7] atom: Improve fix_utf8_char()
  2022-10-19 16:59   ` Ammar Faizi
@ 2022-10-19 17:23     ` Muhammad Rizki
  2022-10-19 17:27       ` Ammar Faizi
  0 siblings, 1 reply; 42+ messages in thread
From: Muhammad Rizki @ 2022-10-19 17:23 UTC (permalink / raw)
  To: Ammar Faizi; +Cc: Alviro Iskandar Setiawan, GNU/Weeb Mailing List

On 19/10/2022 23.59, Ammar Faizi wrote:
> On 10/18/22 3:16 PM, Muhammad Rizki wrote:
>> -def fix_utf8_char(text: str, html_escape: bool = True):
>> +def fix_utf8_char(text: str, unescape: bool = True):
>>       t = text.rstrip().replace("�"," ")
>> -    if html_escape:
>> -        t = html.escape(html.escape(text))
>> +    if unescape:
>> +        t = html.unescape(html.unescape(text))
>> +        reg = 
>> re.compile('<.*?>|&([a-z0-9]+|#[0-9]{1,6}|#x[0-9a-f]{1,6});')
>> +        t = reg.sub('', t)
>>       return t
> 
> You do html.unescape() twice, then remove all HTML special chars and
> tags. I don't understand why we should do that. Can you explain a bit
> on what is going on here?
> 

You said an HTML tag in the email payload should be empty or removed, so 
I created the re.sub() to remove the HTML tag. I forgot where you said that.


* Re: [PATCH v1 6/7] atom: add manage_payload()
  2022-10-19 17:04   ` Ammar Faizi
@ 2022-10-19 17:23     ` Muhammad Rizki
  2022-10-19 17:28       ` Ammar Faizi
  2022-10-21  7:04       ` Ammar Faizi
  0 siblings, 2 replies; 42+ messages in thread
From: Muhammad Rizki @ 2022-10-19 17:23 UTC (permalink / raw)
  To: Ammar Faizi; +Cc: Alviro Iskandar Setiawan, GNU/Weeb Mailing List

On 20/10/2022 00.04, Ammar Faizi wrote:
> On 10/18/22 3:16 PM, Muhammad Rizki wrote:
>> +def manage_payload(payload: Message):
>> +    p = str(payload.get_payload())
>> +    tf_encode = payload.get("Content-Transfer-Encoding")
>> +    if tf_encode != "base64":
>> +        return p.encode().decode("utf-8", errors="replace")
>> +    return b64decode(p).decode("utf-8")
> 
> What happen if we have "Content-Transfer-Encoding: quoted-printable"?
> Does this decode it properly?
> 
> Example: 
> https://lore.kernel.org/io-uring/[email protected]/raw
> 

Yes, it does normally.


* Re: [PATCH v1 4/7] atom: Improve fix_utf8_char()
  2022-10-19 17:23     ` Muhammad Rizki
@ 2022-10-19 17:27       ` Ammar Faizi
  2022-10-19 17:35         ` Muhammad Rizki
  0 siblings, 1 reply; 42+ messages in thread
From: Ammar Faizi @ 2022-10-19 17:27 UTC (permalink / raw)
  To: Muhammad Rizki; +Cc: Alviro Iskandar Setiawan, GNU/Weeb Mailing List

On 10/20/22 12:23 AM, Muhammad Rizki wrote:
> On 19/10/2022 23.59, Ammar Faizi wrote:
>> On 10/18/22 3:16 PM, Muhammad Rizki wrote:
>>> -def fix_utf8_char(text: str, html_escape: bool = True):
>>> +def fix_utf8_char(text: str, unescape: bool = True):
>>>       t = text.rstrip().replace("�"," ")
>>> -    if html_escape:
>>> -        t = html.escape(html.escape(text))
>>> +    if unescape:
>>> +        t = html.unescape(html.unescape(text))
>>> +        reg = re.compile('<.*?>|&([a-z0-9]+|#[0-9]{1,6}|#x[0-9a-f]{1,6});')
>>> +        t = reg.sub('', t)
>>>       return t
>>
>> You do html.unescape() twice, then remove all HTML special chars and
>> tags. I don't understand why we should do that. Can you explain a bit
>> on what is going on here?
>>
> 
> You said an HTML tag in the email payload should be empty or removed, so I created the re.sub() to remove the HTML tag. I forgot where you said that.

How so?

I don't think I said that. Won't this patch corrupt the email
if it contains HTML special chars?

-- 
Ammar Faizi



* Re: [PATCH v1 6/7] atom: add manage_payload()
  2022-10-19 17:23     ` Muhammad Rizki
@ 2022-10-19 17:28       ` Ammar Faizi
  2022-10-21  7:04       ` Ammar Faizi
  1 sibling, 0 replies; 42+ messages in thread
From: Ammar Faizi @ 2022-10-19 17:28 UTC (permalink / raw)
  To: Muhammad Rizki; +Cc: Alviro Iskandar Setiawan, GNU/Weeb Mailing List

On 10/20/22 12:23 AM, Muhammad Rizki wrote:
> Yes, it does normally.

OK, I'll give it a test.

-- 
Ammar Faizi



* Re: [PATCH v1 4/7] atom: Improve fix_utf8_char()
  2022-10-19 17:27       ` Ammar Faizi
@ 2022-10-19 17:35         ` Muhammad Rizki
  2022-10-19 17:42           ` Ammar Faizi
  0 siblings, 1 reply; 42+ messages in thread
From: Muhammad Rizki @ 2022-10-19 17:35 UTC (permalink / raw)
  To: Ammar Faizi; +Cc: Alviro Iskandar Setiawan, GNU/Weeb Mailing List

On 20/10/2022 00.27, Ammar Faizi wrote:
> On 10/20/22 12:23 AM, Muhammad Rizki wrote:
>> On 19/10/2022 23.59, Ammar Faizi wrote:
>>> On 10/18/22 3:16 PM, Muhammad Rizki wrote:
>>>> -def fix_utf8_char(text: str, html_escape: bool = True):
>>>> +def fix_utf8_char(text: str, unescape: bool = True):
>>>>       t = text.rstrip().replace("�"," ")
>>>> -    if html_escape:
>>>> -        t = html.escape(html.escape(text))
>>>> +    if unescape:
>>>> +        t = html.unescape(html.unescape(text))
>>>> +        reg = 
>>>> re.compile('<.*?>|&([a-z0-9]+|#[0-9]{1,6}|#x[0-9a-f]{1,6});')
>>>> +        t = reg.sub('', t)
>>>>       return t
>>>
>>> You do html.unescape() twice, then remove all HTML special chars and
>>> tags. I don't understand why we should do that. Can you explain a bit
>>> on what is going on here?
>>>
>>
>> You said an HTML tag in the email payload should be empty or removed, 
>> so I created the re.sub() to remove the HTML tag. I forgot where you 
>> said that.
> 
> How so?
> 
> I don't think I said that. Won't this patch corrupt the email
> if it contains HTML special chars?
> 

Ugh, I hate it when I have to dig up the chat to give proof. So, do you
want the re.sub() to be removed or not?


* Re: [PATCH v1 4/7] atom: Improve fix_utf8_char()
  2022-10-19 17:35         ` Muhammad Rizki
@ 2022-10-19 17:42           ` Ammar Faizi
  2022-10-19 17:46             ` Ammar Faizi
  2022-10-19 17:51             ` Muhammad Rizki
  0 siblings, 2 replies; 42+ messages in thread
From: Ammar Faizi @ 2022-10-19 17:42 UTC (permalink / raw)
  To: Muhammad Rizki; +Cc: Alviro Iskandar Setiawan, GNU/Weeb Mailing List

On 10/20/22 12:35 AM, Muhammad Rizki wrote:
> Ugh, hate when I should digging up the chat to give a prove. So, you want the re.sub() to be remove or no?

What I want is: decode the email *properly*, then send it
to Telegram intact.

That being said, if you have a string "&gt;" in the decoded
email, it should still be "&gt;" when it is sent to Telegram.
If you have a string ">" in the decoded email, it should still
be ">" when it is sent to Telegram. And so on and so forth...

But what you do here is remove all HTML special chars
after calling unescape() on it twice. I also don't understand
why unescape() should be called twice and nested like that.

Making me understand why doing that is necessary is
your job as the submitter.
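
A sketch of the "intact" requirement, assuming the bot sends messages
in an HTML parse mode: escaping the decoded text exactly once makes the
rendered result equal to the original text. `escape_once()` and
`render_html()` are hypothetical names; `render_html()` only stands in
for what a client shows after resolving entities.

```python
import html


def escape_once(text: str) -> str:
    # Escape &, < and > a single time before sending as HTML.
    return html.escape(text, quote=False)


def render_html(markup: str) -> str:
    # Stand-in for the client side: resolve entities once.
    return html.unescape(markup)


for original in ["&gt;", ">", "a < b && c > d"]:
    shown = render_html(escape_once(original))
    print(repr(original), "->", repr(shown), shown == original)

# Escaping twice leaks an entity into what the reader sees.
print(repr(render_html(escape_once(escape_once("&gt;")))))  # '&amp;gt;'
```

Under that assumption neither unescaping nor stripping with a regex is
needed on the Telegram path.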

-- 
Ammar Faizi



* Re: [PATCH v1 4/7] atom: Improve fix_utf8_char()
  2022-10-19 17:42           ` Ammar Faizi
@ 2022-10-19 17:46             ` Ammar Faizi
  2022-10-19 17:51             ` Muhammad Rizki
  1 sibling, 0 replies; 42+ messages in thread
From: Ammar Faizi @ 2022-10-19 17:46 UTC (permalink / raw)
  To: Muhammad Rizki; +Cc: Alviro Iskandar Setiawan, GNU/Weeb Mailing List

On 10/20/22 12:42 AM, Ammar Faizi wrote:
> On 10/20/22 12:35 AM, Muhammad Rizki wrote:
>> Ugh, hate when I should digging up the chat to give a prove. So, you want the re.sub() to be remove or no?
> 
> What I want is: decode the email *properly*, then send it
> to Telegram intact.
> 
> That being said, if you have a string "&gt;" in the email
> decoded email, it should be still "&gt;" when it is sent
> to Telegram. If you have a string ">" in the decode email,
> it should be still ">" when it is sent to Telegram. And
> so on so forth...
> 
> But what you do here is removing all HTML special chars
> after unescape() it twice. I also don't understand why
> unescape() should be called twice and nested like that.
> 
> Make me understand why it is necessary doing that is
> your job as a submitter.

As you can see on Telegram just now

  "&gt" becomes ">"

that is the part where you did it wrong. I want to keep
the email intact when it is sent to Telegram.

Side note:
If you read this email from Telegram, it is shown wrong,
so please read it from the lore or your Thunderbird.

-- 
Ammar Faizi



* Re: [PATCH v1 4/7] atom: Improve fix_utf8_char()
  2022-10-19 17:42           ` Ammar Faizi
  2022-10-19 17:46             ` Ammar Faizi
@ 2022-10-19 17:51             ` Muhammad Rizki
  2022-10-19 17:53               ` Ammar Faizi
  1 sibling, 1 reply; 42+ messages in thread
From: Muhammad Rizki @ 2022-10-19 17:51 UTC (permalink / raw)
  To: Ammar Faizi; +Cc: Alviro Iskandar Setiawan, GNU/Weeb Mailing List

On 20/10/2022 00.42, Ammar Faizi wrote:
> On 10/20/22 12:35 AM, Muhammad Rizki wrote:
>> Ugh, hate when I should digging up the chat to give a prove. So, you 
>> want the re.sub() to be remove or no?
> 
> What I want is: decode the email *properly*, then send it
> to Telegram intact.
> 
> That being said, if you have a string "&gt;" in the email
> decoded email, it should be still "&gt;" when it is sent
> to Telegram. If you have a string ">" in the decode email,
> it should be still ">" when it is sent to Telegram. And
> so on so forth...
> 
> But what you do here is removing all HTML special chars
> after unescape() it twice. I also don't understand why
> unescape() should be called twice and nested like that.
> 
> Make me understand why it is necessary doing that is
> your job as a submitter.
> 

I know. I forgot where the conversation started, but you said this 
https://discord.com/channels/845302963739033611/845302963739033613/1028563014418444348

"Just send an empty email that would be fine."

Sorry, I didn't understand that statement. I thought you were asking
me to empty or remove the HTML tags in the email payload. Please explain.


* Re: [PATCH v1 4/7] atom: Improve fix_utf8_char()
  2022-10-19 17:51             ` Muhammad Rizki
@ 2022-10-19 17:53               ` Ammar Faizi
  2022-10-19 17:55                 ` Muhammad Rizki
  2022-10-19 18:04                 ` Muhammad Rizki
  0 siblings, 2 replies; 42+ messages in thread
From: Ammar Faizi @ 2022-10-19 17:53 UTC (permalink / raw)
  To: Muhammad Rizki; +Cc: Alviro Iskandar Setiawan, GNU/Weeb Mailing List

On 10/20/22 12:51 AM, Muhammad Rizki wrote:
> On 20/10/2022 00.42, Ammar Faizi wrote:
>> On 10/20/22 12:35 AM, Muhammad Rizki wrote:
>>> Ugh, hate when I should digging up the chat to give a prove. So, you want the re.sub() to be remove or no?
>>
>> What I want is: decode the email *properly*, then send it
>> to Telegram intact.
>>
>> That being said, if you have a string "&gt;" in the email
>> decoded email, it should be still "&gt;" when it is sent
>> to Telegram. If you have a string ">" in the decode email,
>> it should be still ">" when it is sent to Telegram. And
>> so on so forth...
>>
>> But what you do here is removing all HTML special chars
>> after unescape() it twice. I also don't understand why
>> unescape() should be called twice and nested like that.
>>
>> Make me understand why it is necessary doing that is
>> your job as a submitter.
>>
> 
> I know. I forgot where the conversation started, but you said this https://discord.com/channels/845302963739033611/845302963739033613/1028563014418444348
> 
> "Just send an empty email that would be fine."
> 
> Sorry, I don't understand your statement about it. I thought you were asked me that the HTML contain tags should be empty or what. Please explain.
That is because the email is literally an empty email.

If someone sends an empty email, then just sending an empty
email would be fine.

I didn't say we should remove all HTML special chars.

-- 
Ammar Faizi


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v1 4/7] atom: Improve fix_utf8_char()
  2022-10-19 17:53               ` Ammar Faizi
@ 2022-10-19 17:55                 ` Muhammad Rizki
  2022-10-19 18:11                   ` Ammar Faizi
  2022-10-19 18:04                 ` Muhammad Rizki
  1 sibling, 1 reply; 42+ messages in thread
From: Muhammad Rizki @ 2022-10-19 17:55 UTC (permalink / raw)
  To: Ammar Faizi; +Cc: Alviro Iskandar Setiawan, GNU/Weeb Mailing List

On 20/10/2022 00.53, Ammar Faizi wrote:
> On 10/20/22 12:51 AM, Muhammad Rizki wrote:
>> On 20/10/2022 00.42, Ammar Faizi wrote:
>>> On 10/20/22 12:35 AM, Muhammad Rizki wrote:
>>>> Ugh, hate when I should digging up the chat to give a prove. So, you 
>>>> want the re.sub() to be remove or no?
>>>
>>> What I want is: decode the email *properly*, then send it
>>> to Telegram intact.
>>>
>>> That being said, if you have a string "&gt;" in the email
>>> decoded email, it should be still "&gt;" when it is sent
>>> to Telegram. If you have a string ">" in the decode email,
>>> it should be still ">" when it is sent to Telegram. And
>>> so on so forth...
>>>
>>> But what you do here is removing all HTML special chars
>>> after unescape() it twice. I also don't understand why
>>> unescape() should be called twice and nested like that.
>>>
>>> Make me understand why it is necessary doing that is
>>> your job as a submitter.
>>>
>>
>> I know. I forgot where the conversation started, but you said this 
>> https://discord.com/channels/845302963739033611/845302963739033613/1028563014418444348
>>
>> "Just send an empty email that would be fine."
>>
>> Sorry, I don't understand your statement about it. I thought you were 
>> asked me that the HTML contain tags should be empty or what. Please 
>> explain.
> That is because the email is literally an empty email.
> 
> If someone sends an empty email, then just send an empty
> email would be fine.
> 
> I didn't say we should remove all HTML special chars.
> 

So, I just misunderstood here. But the extracted email payload contains
HTML tags when it is sent to Discord; it is not an empty email.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v1 4/7] atom: Improve fix_utf8_char()
  2022-10-19 17:53               ` Ammar Faizi
  2022-10-19 17:55                 ` Muhammad Rizki
@ 2022-10-19 18:04                 ` Muhammad Rizki
  2022-10-19 18:14                   ` Ammar Faizi
  1 sibling, 1 reply; 42+ messages in thread
From: Muhammad Rizki @ 2022-10-19 18:04 UTC (permalink / raw)
  To: Ammar Faizi; +Cc: Alviro Iskandar Setiawan, GNU/Weeb Mailing List

On 20/10/2022 00.53, Ammar Faizi wrote:
> On 10/20/22 12:51 AM, Muhammad Rizki wrote:
>> On 20/10/2022 00.42, Ammar Faizi wrote:
>>> On 10/20/22 12:35 AM, Muhammad Rizki wrote:
>>>> Ugh, hate when I should digging up the chat to give a prove. So, you 
>>>> want the re.sub() to be remove or no?
>>>
>>> What I want is: decode the email *properly*, then send it
>>> to Telegram intact.
>>>
>>> That being said, if you have a string "&gt;" in the email
>>> decoded email, it should be still "&gt;" when it is sent
>>> to Telegram. If you have a string ">" in the decode email,
>>> it should be still ">" when it is sent to Telegram. And
>>> so on so forth...
>>>
>>> But what you do here is removing all HTML special chars
>>> after unescape() it twice. I also don't understand why
>>> unescape() should be called twice and nested like that.
>>>
>>> Make me understand why it is necessary doing that is
>>> your job as a submitter.
>>>
>>
>> I know. I forgot where the conversation started, but you said this 
>> https://discord.com/channels/845302963739033611/845302963739033613/1028563014418444348
>>
>> "Just send an empty email that would be fine."
>>
>> Sorry, I don't understand your statement about it. I thought you were 
>> asked me that the HTML contain tags should be empty or what. Please 
>> explain.
> That is because the email is literally an empty email.
> 
> If someone sends an empty email, then just send an empty
> email would be fine.
> 
> I didn't say we should remove all HTML special chars.
> 

When that raw lore email URL is sent to the Telegram bot, it sends an
empty email, but in the Discord bot the HTML tags appear. They come from
the default email payload, which is why I call re.sub() when the
`unescape` parameter is `True`; I set `unescape` when
`platform == "discord"`.

So, should I ignore it when an email payload contains HTML tags, even
though you say that email is an empty email, and remove the re.sub()?

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v1 4/7] atom: Improve fix_utf8_char()
  2022-10-19 17:55                 ` Muhammad Rizki
@ 2022-10-19 18:11                   ` Ammar Faizi
  2022-10-19 22:34                     ` Alviro Iskandar Setiawan
  0 siblings, 1 reply; 42+ messages in thread
From: Ammar Faizi @ 2022-10-19 18:11 UTC (permalink / raw)
  To: Muhammad Rizki; +Cc: Alviro Iskandar Setiawan, GNU/Weeb Mailing List

On 10/20/22 12:55 AM, Muhammad Rizki wrote:
> So, I just misunderstood here. But, the email payload extracting an HTML tags when send to Discord, not an empty email.

See:

https://lore.kernel.org/all/CAOnxWcsdyTkY+nDGz0ca-SP7cxjx8z1YSXxEwEwYsf-2FeAouQ@mail.gmail.com/

That email is literally empty.

I know what happened. Let me explain.

When you send an *HTML* email from Gmail, which is what Gmail does by
default unless you explicitly ask for a plain text email, Gmail
generates two versions of the email body.

The first one with "text/plain", and the second one with "text/html".
Both are sent in one email with:

    "Content-Type: multipart/alternative;"

Those two are sent as separate MIME parts. Both parts have the same
content, but the "text/html" one is formatted as HTML.

When that happens, take only the "text/plain" part.
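
A minimal sketch of taking only the "text/plain" part, using Python's
standard email package. The sample message and the helper name
pick_plain_text() are assumptions for illustration, not code from the bot:

```python
from email import message_from_string
from email.message import Message

# Made-up multipart/alternative message with a plain and an HTML version.
RAW = """\
Subject: test
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="b1"

--b1
Content-Type: text/plain; charset="UTF-8"

hello

--b1
Content-Type: text/html; charset="UTF-8"

<div dir="ltr">hello</div>

--b1--
"""


def pick_plain_text(msg: Message) -> str:
    """Return the first text/plain body and ignore any text/html part."""
    for part in msg.walk():
        # walk() also yields the multipart container itself; skip it
        # and every part that is not plain text.
        if part.get_content_type() != "text/plain":
            continue
        raw = part.get_payload(decode=True)
        charset = part.get_content_charset() or "utf-8"
        return raw.decode(charset, errors="replace")
    return ""


msg = message_from_string(RAW)
print(repr(pick_plain_text(msg)))
```

An email that really is empty would simply produce an empty string here,
so nothing has to be stripped afterwards.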

-- 
Ammar Faizi


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v1 4/7] atom: Improve fix_utf8_char()
  2022-10-19 18:04                 ` Muhammad Rizki
@ 2022-10-19 18:14                   ` Ammar Faizi
  0 siblings, 0 replies; 42+ messages in thread
From: Ammar Faizi @ 2022-10-19 18:14 UTC (permalink / raw)
  To: Muhammad Rizki; +Cc: Alviro Iskandar Setiawan, GNU/Weeb Mailing List

On 10/20/22 1:04 AM, Muhammad Rizki wrote:
> So, should I ignore it when an email payload contains HTML tags even you say that email is an empty email and remove the re.sub()?

See my other email about Gmail.

-- 
Ammar Faizi


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v1 4/7] atom: Improve fix_utf8_char()
  2022-10-19 18:11                   ` Ammar Faizi
@ 2022-10-19 22:34                     ` Alviro Iskandar Setiawan
  2022-10-20  4:26                       ` Muhammad Rizki
  0 siblings, 1 reply; 42+ messages in thread
From: Alviro Iskandar Setiawan @ 2022-10-19 22:34 UTC (permalink / raw)
  To: Ammar Faizi; +Cc: Muhammad Rizki, GNU/Weeb Mailing List

On Thu, Oct 20, 2022 at 1:11 AM Ammar Faizi wrote:
> On 10/20/22 12:55 AM, Muhammad Rizki wrote:
> > So, I just misunderstood here. But, the email payload extracting an HTML tags when send to Discord, not an empty email.
>
> See:
>
> https://lore.kernel.org/all/CAOnxWcsdyTkY+nDGz0ca-SP7cxjx8z1YSXxEwEwYsf-2FeAouQ@mail.gmail.com/
>
> That email is literally empty.
>
> I know what happened. Let me explain.
>
> When you send an *HTML* email from gmail, gmail generates two types of
> email, which it does by default when you don't explicitly ask to send
> a plain text email.
>
> The first one with "text/plain", and the second one with "text/html".
> Both are sent in one email with:
>
>     "Content-Type: multipart/alternative;"
>
> Those two are sent as an attachment. Both attachments have the same
> content, but the "text/html" one is formatted as HTML.
>
> When that happens, you only take the "text/plain".

weird that lore accepts an HTML email, but that's already set in stone

-- Viro

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v1 4/7] atom: Improve fix_utf8_char()
  2022-10-18  8:16 ` [PATCH v1 4/7] atom: Improve fix_utf8_char() Muhammad Rizki
  2022-10-19 16:59   ` Ammar Faizi
@ 2022-10-19 22:44   ` Alviro Iskandar Setiawan
  2022-10-20  4:24     ` Muhammad Rizki
  1 sibling, 1 reply; 42+ messages in thread
From: Alviro Iskandar Setiawan @ 2022-10-19 22:44 UTC (permalink / raw)
  To: Muhammad Rizki; +Cc: Ammar Faizi, GNU/Weeb Mailing List

On Tue, Oct 18, 2022 at 3:16 PM Muhammad Rizki wrote:
> Improvement on the fix_utf8_char() and change the logic at the
> create_template(). Use the `platform == "discord"` to clean html escape.
>
> Signed-off-by: Muhammad Rizki <[email protected]>

tq for the patch

Instead of having platform == "discord" (string compare), can you make
them integer constants?
e.g.:

enum {
   PL_TELEGRAM = 1,
   PL_DISCORD = 2
};

which are much cheaper to compare than strings. IDK about enum in
Python, maybe just a global constant var or class property that does
the same thing?

-- Viro

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v1 4/7] atom: Improve fix_utf8_char()
  2022-10-19 22:44   ` Alviro Iskandar Setiawan
@ 2022-10-20  4:24     ` Muhammad Rizki
  2022-10-21 11:31       ` Ammar Faizi
  0 siblings, 1 reply; 42+ messages in thread
From: Muhammad Rizki @ 2022-10-20  4:24 UTC (permalink / raw)
  To: Alviro Iskandar Setiawan; +Cc: Ammar Faizi, GNU/Weeb Mailing List

On 20/10/2022 05.44, Alviro Iskandar Setiawan wrote:
> On Tue, Oct 18, 2022 at 3:16 PM Muhammad Rizki wrote:
>> Improvement on the fix_utf8_char() and change the logic at the
>> create_template(). Use the `platform == "discord"` to clean html escape.
>>
>> Signed-off-by: Muhammad Rizki <[email protected]>
> 
> tq for the patch
> 
> Instead of having platform == "discord" (string compare), can you make
> them integer constant?
> e.g.:
> 
> enum {
>     PL_TELEGRAM = 1
>     PL_DISCORD = 2
> };
> 
> which is much cheaper to compare than string. IDK about enum in
> Python, maybe just a global constant var or class property that does
> the same thing?
> 
> -- Viro

Actually, I can do it. Just use the `enum` library, make a Platform
class, and use it like Platform.DISCORD or Platform.TELEGRAM.
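
A minimal sketch of that idea with Python's standard enum module. The
numeric values follow the C example quoted above; needs_html_unescape()
is a hypothetical helper, not something in the patch:

```python
from enum import IntEnum


class Platform(IntEnum):
    # Values mirror the PL_TELEGRAM / PL_DISCORD constants suggested above.
    TELEGRAM = 1
    DISCORD = 2


def needs_html_unescape(platform: Platform) -> bool:
    """Hypothetical helper: only the Discord path cleans HTML escapes."""
    return platform is Platform.DISCORD


print(needs_html_unescape(Platform.DISCORD))
print(needs_html_unescape(Platform.TELEGRAM))
```

Besides avoiding string compares, a typo such as Platform.DISCROD fails
loudly with AttributeError instead of silently comparing unequal.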

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v1 4/7] atom: Improve fix_utf8_char()
  2022-10-19 22:34                     ` Alviro Iskandar Setiawan
@ 2022-10-20  4:26                       ` Muhammad Rizki
  2022-10-20  5:02                         ` Ammar Faizi
  0 siblings, 1 reply; 42+ messages in thread
From: Muhammad Rizki @ 2022-10-20  4:26 UTC (permalink / raw)
  To: Alviro Iskandar Setiawan, Ammar Faizi; +Cc: GNU/Weeb Mailing List

On 20/10/2022 05.34, Alviro Iskandar Setiawan wrote:
> On Thu, Oct 20, 2022 at 1:11 AM Ammar Faizi wrote:
>> On 10/20/22 12:55 AM, Muhammad Rizki wrote:
>>> So, I just misunderstood here. But, the email payload extracting an HTML tags when send to Discord, not an empty email.
>>
>> See:
>>
>> https://lore.kernel.org/all/CAOnxWcsdyTkY+nDGz0ca-SP7cxjx8z1YSXxEwEwYsf-2FeAouQ@mail.gmail.com/
>>
>> That email is literally empty.
>>
>> I know what happened. Let me explain.
>>
>> When you send an *HTML* email from gmail, gmail generates two types of
>> email, which it does by default when you don't explicitly ask to send
>> a plain text email.
>>
>> The first one with "text/plain", and the second one with "text/html".
>> Both are sent in one email with:
>>
>>      "Content-Type: multipart/alternative;"
>>
>> Those two are sent as an attachment. Both attachments have the same
>> content, but the "text/html" one is formatted as HTML.
>>
>> When that happens, you only take the "text/plain".
> 
> weird that lore accepts an HTML email, but that's already set in stone
> 
> -- Viro

Wait, so we shouldn't have received an HTML email? I could just use an
if statement here.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v1 4/7] atom: Improve fix_utf8_char()
  2022-10-20  4:26                       ` Muhammad Rizki
@ 2022-10-20  5:02                         ` Ammar Faizi
  2022-10-20  5:06                           ` Muhammad Rizki
  0 siblings, 1 reply; 42+ messages in thread
From: Ammar Faizi @ 2022-10-20  5:02 UTC (permalink / raw)
  To: Muhammad Rizki, Alviro Iskandar Setiawan; +Cc: GNU/Weeb Mailing List

On 10/20/22 11:26 AM, Muhammad Rizki wrote:
> Wait, so we shouldn't have receive an HTML email? I could just use if statement here.

The vger end rejects any HTML email by default. But that email was sent
to linux-mm (kvack).

There is:

    X-Delivered-To: [email protected]

in the email header that implies so.

-- 
Ammar Faizi


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v1 4/7] atom: Improve fix_utf8_char()
  2022-10-20  5:02                         ` Ammar Faizi
@ 2022-10-20  5:06                           ` Muhammad Rizki
  2022-10-20  5:10                             ` Ammar Faizi
  0 siblings, 1 reply; 42+ messages in thread
From: Muhammad Rizki @ 2022-10-20  5:06 UTC (permalink / raw)
  To: Ammar Faizi, Alviro Iskandar Setiawan; +Cc: GNU/Weeb Mailing List

On 20/10/2022 12.02, Ammar Faizi wrote:
> On 10/20/22 11:26 AM, Muhammad Rizki wrote:
>> Wait, so we shouldn't have receive an HTML email? I could just use if 
>> statement here.
> 
> The vger end rejects any HTML email by default. But that email was sent
> to linux-mm (kvack).
> 
> There is:
> 
>     X-Delivered-To: [email protected]
> 
> in the email header that implies so.
> 

So, could I just add an if statement here and ignore the email whenever
it contains a text/html header?

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v1 4/7] atom: Improve fix_utf8_char()
  2022-10-20  5:06                           ` Muhammad Rizki
@ 2022-10-20  5:10                             ` Ammar Faizi
  2022-10-20  5:10                               ` Muhammad Rizki
  0 siblings, 1 reply; 42+ messages in thread
From: Ammar Faizi @ 2022-10-20  5:10 UTC (permalink / raw)
  To: Muhammad Rizki, Alviro Iskandar Setiawan; +Cc: GNU/Weeb Mailing List

On 10/20/22 12:06 PM, Muhammad Rizki wrote:
> On 20/10/2022 12.02, Ammar Faizi wrote:
>> On 10/20/22 11:26 AM, Muhammad Rizki wrote:
>>> Wait, so we shouldn't have receive an HTML email? I could just use if statement here.
>>
>> The vger end rejects any HTML email by default. But that email was sent
>> to linux-mm (kvack).
>>
>> There is:
>>
>>     X-Delivered-To: [email protected]
>>
>> in the email header that implies so.
>>
> 
> So, I could just do if statement here and ignore whenever the email contain text/html header?

If you can ignore the HTML part and take only the text/plain
part, that would be great. It's not a mandatory requirement
though.

That's to say, the ideal way to interpret multipart/alternative
in this context is to take the text/plain part and discard the HTML.

Can you do that?

-- 
Ammar Faizi


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v1 4/7] atom: Improve fix_utf8_char()
  2022-10-20  5:10                             ` Ammar Faizi
@ 2022-10-20  5:10                               ` Muhammad Rizki
  2022-10-20  5:16                                 ` Ammar Faizi
  0 siblings, 1 reply; 42+ messages in thread
From: Muhammad Rizki @ 2022-10-20  5:10 UTC (permalink / raw)
  To: Ammar Faizi, Alviro Iskandar Setiawan; +Cc: GNU/Weeb Mailing List

On 20/10/2022 12.10, Ammar Faizi wrote:
> On 10/20/22 12:06 PM, Muhammad Rizki wrote:
>> On 20/10/2022 12.02, Ammar Faizi wrote:
>>> On 10/20/22 11:26 AM, Muhammad Rizki wrote:
>>>> Wait, so we shouldn't have receive an HTML email? I could just use 
>>>> if statement here.
>>>
>>> The vger end rejects any HTML email by default. But that email was sent
>>> to linux-mm (kvack).
>>>
>>> There is:
>>>
>>>     X-Delivered-To: [email protected]
>>>
>>> in the email header that implies so.
>>>
>>
>> So, I could just do if statement here and ignore whenever the email 
>> contain text/html header?
> 
> If you can ignore the HTML part and take only the text/plain
> part, that would be great. It's not a mandatory requirement
> though.
> 
> That's to say, the ideal way to interpret multipart/alternative
> in this context is taking the text/plain and discard the HTML.
> 
> Can you do that?
> 

I would like to debug it first.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v1 4/7] atom: Improve fix_utf8_char()
  2022-10-20  5:10                               ` Muhammad Rizki
@ 2022-10-20  5:16                                 ` Ammar Faizi
  0 siblings, 0 replies; 42+ messages in thread
From: Ammar Faizi @ 2022-10-20  5:16 UTC (permalink / raw)
  To: Muhammad Rizki, Alviro Iskandar Setiawan; +Cc: GNU/Weeb Mailing List

On 10/20/22 12:10 PM, Muhammad Rizki wrote:
> I would like to debug it first.

Looking forward to the update.

Thanks.

-- 
Ammar Faizi


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v1 6/7] atom: add manage_payload()
  2022-10-19 17:23     ` Muhammad Rizki
  2022-10-19 17:28       ` Ammar Faizi
@ 2022-10-21  7:04       ` Ammar Faizi
  2022-10-21  7:37         ` Muhammad Rizki
  1 sibling, 1 reply; 42+ messages in thread
From: Ammar Faizi @ 2022-10-21  7:04 UTC (permalink / raw)
  To: Muhammad Rizki; +Cc: Alviro Iskandar Setiawan, GNU/Weeb Mailing List

On 10/20/22 12:23 AM, Muhammad Rizki wrote:
> On 20/10/2022 00.04, Ammar Faizi wrote:
>> On 10/18/22 3:16 PM, Muhammad Rizki wrote:
>>> +def manage_payload(payload: Message):
>>> +    p = str(payload.get_payload())
>>> +    tf_encode = payload.get("Content-Transfer-Encoding")
>>> +    if tf_encode != "base64":
>>> +        return p.encode().decode("utf-8", errors="replace")
>>> +    return b64decode(p).decode("utf-8")
>>
>> What happen if we have "Content-Transfer-Encoding: quoted-printable"?
>> Does this decode it properly?
>>
>> Example: https://lore.kernel.org/io-uring/[email protected]/raw
>>
> 
> Yes, it does normally.

It really *does not*, please try and see the output yourself.
The email sent to Telegram is not decoded.

-- 
Ammar Faizi


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v1 6/7] atom: add manage_payload()
  2022-10-21  7:04       ` Ammar Faizi
@ 2022-10-21  7:37         ` Muhammad Rizki
  2022-10-21  7:40           ` Ammar Faizi
  0 siblings, 1 reply; 42+ messages in thread
From: Muhammad Rizki @ 2022-10-21  7:37 UTC (permalink / raw)
  To: Ammar Faizi; +Cc: Alviro Iskandar Setiawan, GNU/Weeb Mailing List

On 21/10/2022 14.04, Ammar Faizi wrote:
> On 10/20/22 12:23 AM, Muhammad Rizki wrote:
>> On 20/10/2022 00.04, Ammar Faizi wrote:
>>> On 10/18/22 3:16 PM, Muhammad Rizki wrote:
>>>> +def manage_payload(payload: Message):
>>>> +    p = str(payload.get_payload())
>>>> +    tf_encode = payload.get("Content-Transfer-Encoding")
>>>> +    if tf_encode != "base64":
>>>> +        return p.encode().decode("utf-8", errors="replace")
>>>> +    return b64decode(p).decode("utf-8")
>>>
>>> What happen if we have "Content-Transfer-Encoding: quoted-printable"?
>>> Does this decode it properly?
>>>
>>> Example: 
>>> https://lore.kernel.org/io-uring/[email protected]/raw
>>>
>>
>> Yes, it does normally.
> 
> It really *does not*, please try and see the output yourself.
> The email sent to Telegram is not decoded.
> 

IDK why you said it *does not*:
https://i.ibb.co/84LHCbt/image.png

Can you explain?

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v1 6/7] atom: add manage_payload()
  2022-10-21  7:37         ` Muhammad Rizki
@ 2022-10-21  7:40           ` Ammar Faizi
  2022-10-21  8:22             ` Muhammad Rizki
  0 siblings, 1 reply; 42+ messages in thread
From: Ammar Faizi @ 2022-10-21  7:40 UTC (permalink / raw)
  To: Muhammad Rizki; +Cc: Alviro Iskandar Setiawan, GNU/Weeb Mailing List

On 10/21/22 2:37 PM, Muhammad Rizki wrote:
> IDk why you said it *does not*
> https://i.ibb.co/84LHCbt/image.png
> 
> Can you explain?

It's called quoted-printable encoding. Please see the decoded version:

https://lore.kernel.org/io-uring/[email protected]/

and carefully examine the differences.

-- 
Ammar Faizi


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v1 6/7] atom: add manage_payload()
  2022-10-21  7:40           ` Ammar Faizi
@ 2022-10-21  8:22             ` Muhammad Rizki
  2022-10-21  8:33               ` Ammar Faizi
  0 siblings, 1 reply; 42+ messages in thread
From: Muhammad Rizki @ 2022-10-21  8:22 UTC (permalink / raw)
  To: Ammar Faizi; +Cc: Alviro Iskandar Setiawan, GNU/Weeb Mailing List

On 21/10/2022 14.40, Ammar Faizi wrote:
> On 10/21/22 2:37 PM, Muhammad Rizki wrote:
>> IDk why you said it *does not*
>> https://i.ibb.co/84LHCbt/image.png
>>
>> Can you explain?
> 
> It's called quote-printable encode. Please see the decoded version:
> 
> https://lore.kernel.org/io-uring/[email protected]/
> 
> and carefully examine the differences.
> 

Ohhh, I see. Let's use the `quopri` library here and use decodestring().
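
For reference, a small sketch of what quopri.decodestring() does to a
quoted-printable body. The sample bytes are made up for illustration:

```python
import quopri

# "=C3=A9" is the UTF-8 encoding of "é"; a trailing "=" before the
# newline is a soft line break that the decoder removes.
encoded = b"caf=C3=A9 costs 5=\n euros"

# decodestring() takes bytes and returns bytes, so the charset decoding
# is a separate step.
decoded = quopri.decodestring(encoded).decode("utf-8", errors="replace")
print(decoded)
```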

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v1 6/7] atom: add manage_payload()
  2022-10-21  8:22             ` Muhammad Rizki
@ 2022-10-21  8:33               ` Ammar Faizi
  2022-10-21  9:58                 ` Muhammad Rizki
  2022-10-21 10:47                 ` Muhammad Rizki
  0 siblings, 2 replies; 42+ messages in thread
From: Ammar Faizi @ 2022-10-21  8:33 UTC (permalink / raw)
  To: Muhammad Rizki; +Cc: Alviro Iskandar Setiawan, GNU/Weeb Mailing List

On 10/21/22 3:22 PM, Muhammad Rizki wrote:
> Ohhh, I see. Let's use `quopri` libary here and use decodestring().

There are other possible values for Content-Transfer-Encoding. But
7bit and 8bit bodies are not encoded, so they need no decoding.
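
One way to cover base64, quoted-printable, 7bit, and 8bit in one place
might be to let the standard email package do the decoding through
get_payload(decode=True). This is only a sketch; decode_payload() and
the sample messages are assumptions, not the actual manage_payload()
from the series:

```python
from email import message_from_string
from email.message import Message


def decode_payload(part: Message) -> str:
    """Decode one non-multipart part according to its transfer encoding."""
    # decode=True undoes base64 and quoted-printable; 7bit and 8bit
    # bodies are returned unchanged. Multipart containers yield None.
    raw = part.get_payload(decode=True)
    if raw is None:
        return ""
    charset = part.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")


def sample(encoding: str, body: str) -> Message:
    """Build a tiny made-up message with the given transfer encoding."""
    return message_from_string(
        "Content-Type: text/plain; charset=utf-8\n"
        f"Content-Transfer-Encoding: {encoding}\n"
        "\n"
        f"{body}\n"
    )


print(decode_payload(sample("quoted-printable", "a =3D b")).strip())
print(decode_payload(sample("base64", "aGVsbG8=")).strip())
print(decode_payload(sample("7bit", "a =3D b")).strip())
```

The 7bit case shows why the header has to be checked: the same "=3D"
text must be left alone when the body is not quoted-printable.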

https://help.perforce.com/sourcepro/current/HTML/index.html#page/SourcePro_Net/protocolsug-MIMEAdvanced.26.04.html

-- 
Ammar Faizi


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v1 6/7] atom: add manage_payload()
  2022-10-21  8:33               ` Ammar Faizi
@ 2022-10-21  9:58                 ` Muhammad Rizki
  2022-10-21 10:47                 ` Muhammad Rizki
  1 sibling, 0 replies; 42+ messages in thread
From: Muhammad Rizki @ 2022-10-21  9:58 UTC (permalink / raw)
  To: Ammar Faizi; +Cc: Alviro Iskandar Setiawan, GNU/Weeb Mailing List

On 21/10/2022 15.33, Ammar Faizi wrote:
> On 10/21/22 3:22 PM, Muhammad Rizki wrote:
>> Ohhh, I see. Let's use `quopri` libary here and use decodestring().
> 
> There are other possible values for this. But 7-bit and 8-bit
> are not encoded.
> 
> https://help.perforce.com/sourcepro/current/HTML/index.html#page/SourcePro_Net/protocolsug-MIMEAdvanced.26.04.html
> 

I'll look into that, thanks.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v1 6/7] atom: add manage_payload()
  2022-10-21  8:33               ` Ammar Faizi
  2022-10-21  9:58                 ` Muhammad Rizki
@ 2022-10-21 10:47                 ` Muhammad Rizki
  2022-10-21 10:53                   ` Ammar Faizi
  1 sibling, 1 reply; 42+ messages in thread
From: Muhammad Rizki @ 2022-10-21 10:47 UTC (permalink / raw)
  To: Ammar Faizi; +Cc: Alviro Iskandar Setiawan, GNU/Weeb Mailing List

On 21/10/2022 15.33, Ammar Faizi wrote:
> On 10/21/22 3:22 PM, Muhammad Rizki wrote:
>> Ohhh, I see. Let's use `quopri` libary here and use decodestring().
> 
> There are other possible values for this. But 7-bit and 8-bit
> are not encoded.
> 
> https://help.perforce.com/sourcepro/current/HTML/index.html#page/SourcePro_Net/protocolsug-MIMEAdvanced.26.04.html
> 

Do you have a sample email message that uses binary as its
Content-Transfer-Encoding? I want to test it.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v1 6/7] atom: add manage_payload()
  2022-10-21 10:47                 ` Muhammad Rizki
@ 2022-10-21 10:53                   ` Ammar Faizi
  2022-10-21 10:54                     ` Muhammad Rizki
  0 siblings, 1 reply; 42+ messages in thread
From: Ammar Faizi @ 2022-10-21 10:53 UTC (permalink / raw)
  To: Muhammad Rizki; +Cc: Alviro Iskandar Setiawan, GNU/Weeb Mailing List

On 10/21/22 5:47 PM, Muhammad Rizki wrote:
> Do you have the sample email message that use binary as Content-Transfer-Encoding? I want to test it.

Unfortunately, I don't have one. I have never seen it. It seems to be
a rarely used transfer encoding. Let's ignore this for now.

At least please take care of the quoted-printable format
which has a real example in front of us.

-- 
Ammar Faizi


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v1 6/7] atom: add manage_payload()
  2022-10-21 10:53                   ` Ammar Faizi
@ 2022-10-21 10:54                     ` Muhammad Rizki
  0 siblings, 0 replies; 42+ messages in thread
From: Muhammad Rizki @ 2022-10-21 10:54 UTC (permalink / raw)
  To: Ammar Faizi; +Cc: Alviro Iskandar Setiawan, GNU/Weeb Mailing List

On 21/10/2022 17.53, Ammar Faizi wrote:
> 
> At least please take care of the quoted-printable format
> which has a real example in front of us.
> 
Yes, I tested it and it works using quopri from the Python 3 standard library.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v1 4/7] atom: Improve fix_utf8_char()
  2022-10-20  4:24     ` Muhammad Rizki
@ 2022-10-21 11:31       ` Ammar Faizi
  0 siblings, 0 replies; 42+ messages in thread
From: Ammar Faizi @ 2022-10-21 11:31 UTC (permalink / raw)
  To: Muhammad Rizki, Alviro Iskandar Setiawan; +Cc: GNU/Weeb Mailing List

On 10/20/22 11:24 AM, Muhammad Rizki wrote:
> Actually, I can do it. Just use an enums library and make a class Platform and use it like Platform.DISCORD or Platform.TELEGRAM

I agree with that. Please wire this up in v3.

-- 
Ammar Faizi


^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2022-10-21 11:31 UTC | newest]

Thread overview: 42+ messages
2022-10-18  8:16 [PATCH v1 0/7] Fix some bugs and add some features Muhammad Rizki
2022-10-18  8:16 ` [PATCH v1 1/7] discord: Add send_text_mail_interaction() Muhammad Rizki
2022-10-18  8:16 ` [PATCH v1 2/7] discord: Add send_patch_mail_interaction() Muhammad Rizki
2022-10-18  8:16 ` [PATCH v1 3/7] discord: Add get lore mail slash command Muhammad Rizki
2022-10-18  8:16 ` [PATCH v1 4/7] atom: Improve fix_utf8_char() Muhammad Rizki
2022-10-19 16:59   ` Ammar Faizi
2022-10-19 17:23     ` Muhammad Rizki
2022-10-19 17:27       ` Ammar Faizi
2022-10-19 17:35         ` Muhammad Rizki
2022-10-19 17:42           ` Ammar Faizi
2022-10-19 17:46             ` Ammar Faizi
2022-10-19 17:51             ` Muhammad Rizki
2022-10-19 17:53               ` Ammar Faizi
2022-10-19 17:55                 ` Muhammad Rizki
2022-10-19 18:11                   ` Ammar Faizi
2022-10-19 22:34                     ` Alviro Iskandar Setiawan
2022-10-20  4:26                       ` Muhammad Rizki
2022-10-20  5:02                         ` Ammar Faizi
2022-10-20  5:06                           ` Muhammad Rizki
2022-10-20  5:10                             ` Ammar Faizi
2022-10-20  5:10                               ` Muhammad Rizki
2022-10-20  5:16                                 ` Ammar Faizi
2022-10-19 18:04                 ` Muhammad Rizki
2022-10-19 18:14                   ` Ammar Faizi
2022-10-19 22:44   ` Alviro Iskandar Setiawan
2022-10-20  4:24     ` Muhammad Rizki
2022-10-21 11:31       ` Ammar Faizi
2022-10-18  8:16 ` [PATCH v1 5/7] atom: Improve remove_patch() Muhammad Rizki
2022-10-18  8:16 ` [PATCH v1 6/7] atom: add manage_payload() Muhammad Rizki
2022-10-19 17:04   ` Ammar Faizi
2022-10-19 17:23     ` Muhammad Rizki
2022-10-19 17:28       ` Ammar Faizi
2022-10-21  7:04       ` Ammar Faizi
2022-10-21  7:37         ` Muhammad Rizki
2022-10-21  7:40           ` Ammar Faizi
2022-10-21  8:22             ` Muhammad Rizki
2022-10-21  8:33               ` Ammar Faizi
2022-10-21  9:58                 ` Muhammad Rizki
2022-10-21 10:47                 ` Muhammad Rizki
2022-10-21 10:53                   ` Ammar Faizi
2022-10-21 10:54                     ` Muhammad Rizki
2022-10-18  8:16 ` [PATCH v1 7/7] telegram: Fix get lore command Muhammad Rizki
