From mboxrd@z Thu Jan 1 00:00:00 1970
From: Muhammad Rizki
To: Ammar Faizi
Cc: Muhammad Rizki, GNU/Weeb Mailing List, Alviro Iskandar Setiawan
Subject: [PATCH v1 2/2] Use html.escape() for fix_utf8_chars()
Date: Wed, 3 Aug 2022 17:27:18 +0700
Message-Id: <20220803102718.1084-3-kiizuha@gnuweeb.org>
X-Mailer: git-send-email 2.34.1.windows.1
In-Reply-To: <20220803102718.1084-1-kiizuha@gnuweeb.org>
References: <20220803102718.1084-1-kiizuha@gnuweeb.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

We found a bug when the received email payload contains text such as
``, `®`, and more. Calling html.escape() twice fixes the bug for now.

Signed-off-by: Muhammad Rizki
Reported-by: Alviro Iskandar Setiawan
---
 daemon/scraper/utils.py | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/daemon/scraper/utils.py b/daemon/scraper/utils.py
index 765468c..c428a33 100644
--- a/daemon/scraper/utils.py
+++ b/daemon/scraper/utils.py
@@ -14,6 +14,7 @@ import os
 import re
 import shutil
 import httpx
+import html
 
 
 def get_email_msg_id(mail):
@@ -218,12 +219,8 @@ def clean_up_after_send_patch(tmp):
 
 
 def fix_utf8_char(text: str):
-	return (
-		text.rstrip()
-		.replace("<", "&lt;")
-		.replace(">","&gt;")
-		.replace("�"," ")
-	)
+	text = text.rstrip().replace("�"," ")
+	return html.escape(html.escape(text))
 
 
 EMAIL_MSG_ID_PATTERN = r"<([^\<\>]+)>"
-- 
Muhammad Rizki
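
For reference, here is a minimal standalone sketch of what the patched fix_utf8_char() appears to do, using only Python's standard html module. It assumes the odd character in the patch is U+FFFD and that it is replaced with a plain space; the sample payload is hypothetical and not taken from the bug report.

```python
import html


def fix_utf8_char(text: str) -> str:
    # Mirrors the patched helper: strip trailing whitespace, swap the
    # replacement character (assumed to be U+FFFD) for a plain space,
    # then HTML-escape the result twice.
    text = text.rstrip().replace("\ufffd", " ")
    return html.escape(html.escape(text))


sample = "a <b> & c\n"  # hypothetical payload, for illustration only

once = html.escape(sample.rstrip())
twice = fix_utf8_char(sample)

print(once)   # a &lt;b&gt; &amp; c
print(twice)  # a &amp;lt;b&amp;gt; &amp;amp; c

# One unescape pass on the double-escaped text yields the
# single-escaped form; a second pass yields the original text.
print(html.unescape(twice) == once)
```

The double-escaped output only reads correctly if something downstream decodes the text once more than a single escape would need. Whether that holds for every consumer of fix_utf8_char() is not something this sketch can confirm, which matches the "for now" wording in the commit message.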