Return-Path: <kiizuha@gnuweeb.org>
Received: from [192.168.1.2] (unknown [101.128.126.198])
	by gnuweeb.org (Postfix) with ESMTPSA id 169FF804D1;
	Mon,  7 Nov 2022 01:11:30 +0000 (UTC)
Message-ID: <942fe0b7-9b12-0445-6c75-ad90addb0f21@gnuweeb.org>
Date: Mon, 7 Nov 2022 08:11:28 +0700
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
 Thunderbird/102.4.0
Subject: Re: [PATCH v1 05/16] utils: Improve fix_utf8_char()
Content-Language: en-US
To: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Cc: Alviro Iskandar Setiawan <alviro.iskandar@gnuweeb.org>,
 GNU/Weeb Mailing List <gwml@vger.gnuweeb.org>
References: <20221104180931.3852-1-kiizuha@gnuweeb.org>
 <20221104180931.3852-6-kiizuha@gnuweeb.org>
 <73b917bc-6d20-b2d1-f956-11d6e2ff1662@gnuweeb.org>
From: Muhammad Rizki <kiizuha@gnuweeb.org>
In-Reply-To: <73b917bc-6d20-b2d1-f956-11d6e2ff1662@gnuweeb.org>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
List-Id: <gwml.vger.gnuweeb.org>

On 07/11/2022 08.01, Ammar Faizi wrote:
> On 11/5/22 1:09 AM, Muhammad Rizki wrote:
>> --- a/daemon/atom/utils.py
>> +++ b/daemon/atom/utils.py
>> @@ -263,8 +263,8 @@ def remove_patch(tmp: Union[str, list]):
>>   def fix_utf8_char(text: str, html_escape: bool = True):
>>       t = text.rstrip().replace("�"," ")
>>       if html_escape:
>> -        t = html.escape(html.escape(text))
>> -    return t
>> +        return html.escape(html.escape(text))
>> +    return html.unescape(t)
> 
> Can you explain why we need to do the following:
> 
>     html.escape(html.escape(text))
> 
> Why does it have to be escaped twice? I still don't understand the
> reason behind this mess since the beginning.
> 

In a previous patch, some emails were not escaped correctly. For example,
"&gt; &gt;" was supposed to be displayed as "> >", but the result stayed
"> &gt;". That problem dates from before this series, and I don't know
whether the current patch fixes it if html.escape() is only applied once.
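To make the single vs. double escaping behaviour concrete, here is a small
sketch using only Python's standard `html` module. It assumes the text is
finally shown by something that decodes exactly one level of HTML entities,
modelled here with html.unescape(). `render_once` and `escape_normalized`
are hypothetical helpers for illustration, not functions from
daemon/atom/utils.py, and this is not a claim about what caused the old bug.

```python
import html


def render_once(s: str) -> str:
    """Hypothetical stand-in for a renderer that decodes one level of entities."""
    return html.unescape(s)


# Plain text taken from an email body.
plain = "> >"
once = html.escape(plain)    # "&gt; &gt;"
twice = html.escape(once)    # "&amp;gt; &amp;gt;"

print(render_once(once))     # > >
print(render_once(twice))    # &gt; &gt;  (double escaping leaks the entities)

# Text that already contains an entity, e.g. a line escaped somewhere upstream.
mixed = "> &gt;"
print(render_once(html.escape(mixed)))  # > &gt;  (same symptom as described above)


def escape_normalized(s: str) -> str:
    """Hypothetical alternative: decode existing entities, then escape exactly once."""
    return html.escape(html.unescape(s))


print(render_once(escape_normalized(mixed)))  # > >
```

So escaping twice turns plain "> >" into visible "&gt; &gt;", while escaping
once only shows "> &gt;" when the input already contained "&gt;". The
`escape_normalized` variant is just one possible trade-off: it also converts
an entity the sender typed literally (such as "&gt;" in prose) into ">".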