From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <ammarfaizi2@gnuweeb.org>
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on gnuweeb.org
X-Spam-Level: 
X-Spam-Status: No, score=-1.8 required=5.0 tests=ALL_TRUSTED,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,NICE_REPLY_A,NO_DNS_FOR_FROM,
	URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6
Received: from [10.7.7.5] (unknown [182.253.183.240])
	by gnuweeb.org (Postfix) with ESMTPSA id 0B0DC81663;
	Sun, 20 Nov 2022 05:23:48 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=gnuweeb.org;
	s=default; t=1668921830;
	bh=ZQ7gwgAKC/Inh3HsGQd0gDCR+IO75f1Cavp1KN7v8tU=;
	h=Date:To:Cc:References:From:Subject:In-Reply-To:From;
	b=jlifs+ppp71Z+xFcn5VkEf0Ji3h3/ThuRGlm/8O+zC/gbLWqTfmvOuAfe68pSX4Ub
	 mzaRu8MTbbMNML5DdLFPQeaFfmH0Qy2LrSprxdfMK2e9Xo1sxXvMLuxiYOPhxxVteL
	 OoCVri5psqnxstJyk4kNW4c4f2Z7WnkIqN8EgJxuP5qVUR9mv4RXLDRSvR0on95Kh8
	 ddpK7+iOZ33UJ6aRoI6lKip2fS/ofJEKIW+ds5nlIYFC1UxwzeR4hDfcFbid9IkEih
	 rWTRHANjXr7FPVaFUBTZXUve1MjL4S6MrOxikD/4unTS/Dlx0qPGDfnYFtgFWCJt4X
	 ej2cVXCeU3T0g==
Message-ID: <6fd38326-a7b1-38ea-d9f1-1da90ed6ff19@gnuweeb.org>
Date: Sun, 20 Nov 2022 12:23:46 +0700
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.4.2
Content-Language: en-US
To: Muhammad Rizki <kiizuha@gnuweeb.org>
Cc: Alviro Iskandar Setiawan <alviro.iskandar@gnuweeb.org>,
 GNU/Weeb Mailing List <gwml@vger.gnuweeb.org>
References: <20221109025002.258-1-kiizuha@gnuweeb.org>
 <20221109025002.258-7-kiizuha@gnuweeb.org>
From: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Subject: Re: [PATCH v2 06/17] utils: Improve fix_utf8_char()
In-Reply-To: <20221109025002.258-7-kiizuha@gnuweeb.org>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
List-Id: <gwml.vger.gnuweeb.org>

On 11/9/22 9:49 AM, Muhammad Rizki wrote:
> Improvement for the fix_utf8_char() to ensure the `&gt;` will be
> unescaped, because if not use the html.unescape(), the email payload
> will contain `&gt;` for the Discord bot.
> 
> Also, change on the html.escape() to use it only once. From the past
> issue bb8855bf, some email message doesn't escaped correctly, so I use
> the html.escape() twice. Within the current version, this issue should
> be fixed and can call the html.escape() just once.
> 
> Fixes: bb8855bf ("Fix the storage management after the refactor was happened")
> Signed-off-by: Muhammad Rizki <kiizuha@gnuweeb.org>
> ---
>   daemon/atom/utils.py | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/daemon/atom/utils.py b/daemon/atom/utils.py
> index dd9e1a6..c21a4b5 100644
> --- a/daemon/atom/utils.py
> +++ b/daemon/atom/utils.py
> @@ -258,8 +258,8 @@ def remove_patch(tmp: Union[str, list]):
>   def fix_utf8_char(text: str, html_escape: bool = True):
>   	t = text.rstrip().replace("�"," ")
>   	if html_escape:
> -		t = html.escape(html.escape(text))
> -	return t
> +		return html.escape(text)
> +	return html.unescape(t)

Please stop trying random things to make your output look good
without understanding what went wrong. This stupid path has been
toggled on and off forever since the beginning. What exactly is
the underlying issue behind this?

I want to get a real understanding of why such an issue happens.
From now on, I will start rejecting fixes that are not well
understood. For this patch, I want you to:

   1. Understand what went wrong in the past.

   2. Explain how it went wrong.

   3. Explain how this patch acts as a real fix.

Double escaping was just a random attempt, and it didn't actually
fix the issue well enough. Why? Because your fix was not based on
an understanding of the problem. It only targeted one particular
output, hacked to make it look good, and threw away the generic cases.
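
To make the point concrete, here is a minimal sketch of what the
standard html module does when escaping is applied once versus twice.
The sample string is made up for illustration and is not taken from
any real email in the archive:

```python
import html

# Hypothetical sample line, not taken from a real email.
raw = "> if (a < b && c > d)"

once = html.escape(raw)    # each special character is escaped one time
twice = html.escape(once)  # the "&" of every entity is escaped again

print(once)   # &gt; if (a &lt; b &amp;&amp; c &gt; d)
print(twice)  # &amp;gt; if (a &amp;lt; b &amp;amp;&amp;amp; c &amp;gt; d)

# One unescape undoes exactly one level of escaping, no more.
assert html.unescape(once) == raw
assert html.unescape(twice) == once
```

Whether one, two, or zero levels are right depends on how many times
the consumer of the string unescapes it, and that is what needs to be
identified and explained.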

You can't explain the technical reason why you did the double
escape, and the same goes for this patch. I don't want us to work
this way forever.

-- 
Ammar Faizi