From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on gnuweeb.org X-Spam-Level: X-Spam-Status: No, score=-1.8 required=5.0 tests=ALL_TRUSTED,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,NICE_REPLY_A,NO_DNS_FOR_FROM, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 Received: from [192.168.1.3] (unknown [101.128.125.55]) by gnuweeb.org (Postfix) with ESMTPSA id 7DBF88160D; Wed, 23 Nov 2022 09:50:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=gnuweeb.org; s=default; t=1669197010; bh=RXqNnbNX5V/rwrL/O2Kg3N8aR/m+9rBZSoqAtxdD1FE=; h=Date:Subject:To:Cc:References:From:In-Reply-To:From; b=UA4ANf7JsurpmLX7HfPYxnLGZYHEt8aEU7KeYD4t5PEChn5Zr6C6rz/FA/+HrnNeK Y36kd1yUoLhl6+5Zm5yuEYrk7/7WnBCj1aqQGc3Futj01snpZcNrRxPL1EZwK3n2Lb jrZiK7Y4393cUDgqSIxxhLhc63qFsC9eq6LQAMAKRBqfuhVW9xmLoNZ8wCN5HHP8/a J20Svm0KpoH0aNeZAJj+lwkS/kso1sJ/jJ9vcOGTDWeMXOFjj72Og0mHPERkmmflGC Rz7Nn0QUY6TInRYZn0ol1j8l5/B0qlVghKllg7QjTYwC0Su4qx4iulVnYJ/iM9A4Z1 0RieibQFvpJaw== Message-ID: <18a3f58f-d6cf-54c0-ced2-59eafec29fae@gnuweeb.org> Date: Wed, 23 Nov 2022 16:50:05 +0700 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.5.0 Subject: Re: [PATCH v2 06/17] utils: Improve fix_utf8_char() Content-Language: en-US To: Ammar Faizi Cc: Alviro Iskandar Setiawan , GNU/Weeb Mailing List References: <20221109025002.258-1-kiizuha@gnuweeb.org> <20221109025002.258-7-kiizuha@gnuweeb.org> <6fd38326-a7b1-38ea-d9f1-1da90ed6ff19@gnuweeb.org> From: Muhammad Rizki In-Reply-To: <6fd38326-a7b1-38ea-d9f1-1da90ed6ff19@gnuweeb.org> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit List-Id: On 20/11/2022 12.23, Ammar Faizi wrote: > On 11/9/22 9:49 AM, Muhammad Rizki wrote: >> Improvement for the fix_utf8_char() to ensure the `>` will be >> unescaped, because if not use the html.unescape(), the email payload >> will contain `>` for the Discord bot. >> >> Also, change on the html.escape() to use it only once. From the past >> issue bb8855bf, some email message doesn't escaped correctly, so I use >> the html.escape() twice. Within the current version, this issue should >> be fixed and can call the html.escape() just once. >> >> Fixes: bb8855bf ("Fix the storage management after the refactor was >> happened") >> Signed-off-by: Muhammad Rizki >> --- >>   daemon/atom/utils.py | 4 ++-- >>   1 file changed, 2 insertions(+), 2 deletions(-) >> >> diff --git a/daemon/atom/utils.py b/daemon/atom/utils.py >> index dd9e1a6..c21a4b5 100644 >> --- a/daemon/atom/utils.py >> +++ b/daemon/atom/utils.py >> @@ -258,8 +258,8 @@ def remove_patch(tmp: Union[str, list]): >>   def fix_utf8_char(text: str, html_escape: bool = True): >>       t = text.rstrip().replace("�"," ") >>       if html_escape: >> -        t = html.escape(html.escape(text)) >> -    return t >> +        return html.escape(text) >> +    return html.unescape(t) > > Please stop trying random things to make your output looks good > without understanding what went wrong. This stupid path has been > turning on and off forever since the beginning. What is exactly > the underlying issue behind this? > > I want to get a real understanding of why such an issue happens. I > will start rejecting fixes that can't be well-understood start from > now on. For this one patch, I want you: > >   1. Understand went wrong from the past. > >   2. Explain how did it go wrong. > >   3. Explain how does this patch act as a real fix. > > Double escape was just your random attempt and it didn't actually > fix the issue well enough. Why? Because your fix is not based on an > understanding, your fix is only respecting particular output and > you hacked it to make it looks good, but throw away generic cases. > > You can't explain the technical reason of why you did double escape. > Just like this patch does. I don't want we work this way forever. > Oh, my bad. Sometimes I just being dumb when I really getting tired. adding `html.unescape()` at the end it was because I didn't focuse to see the change from "t = html.escape(html.escape(text))" to "return html.escape(text)", that's why I unescape it again from the old `t` variable. I change them to: def fix_utf8_char(text: str, html_escape: bool = True): t = text.rstrip().replace("�"," ") if html_escape: return html.escape(t) return t We only escape the text if the platform is Telegram and we don't need to escape nor unescape for the Discord platform, UNLESS you want me to force unescape if the email payload contain ">". But, yeah, forcing unescape is not a practice way, it will break the email payload. This should a FINAL change for html escape issue. That's my explanation, sorry if sometimes having a bad code, I usually don't work too seriously like this, please understand.