From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from [192.168.1.2] (unknown [101.128.126.198]) by gnuweeb.org (Postfix) with ESMTPSA id 169FF804D1; Mon, 7 Nov 2022 01:11:30 +0000 (UTC)
Message-ID: <942fe0b7-9b12-0445-6c75-ad90addb0f21@gnuweeb.org>
Date: Mon, 7 Nov 2022 08:11:28 +0700
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.4.0
Subject: Re: [PATCH v1 05/16] utils: Improve fix_utf8_char()
Content-Language: en-US
To: Ammar Faizi
Cc: Alviro Iskandar Setiawan , GNU/Weeb Mailing List
References: <20221104180931.3852-1-kiizuha@gnuweeb.org> <20221104180931.3852-6-kiizuha@gnuweeb.org> <73b917bc-6d20-b2d1-f956-11d6e2ff1662@gnuweeb.org>
From: Muhammad Rizki
In-Reply-To: <73b917bc-6d20-b2d1-f956-11d6e2ff1662@gnuweeb.org>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

On 07/11/2022 08.01, Ammar Faizi wrote:
> On 11/5/22 1:09 AM, Muhammad Rizki wrote:
>> --- a/daemon/atom/utils.py
>> +++ b/daemon/atom/utils.py
>> @@ -263,8 +263,8 @@ def remove_patch(tmp: Union[str, list]):
>>  def fix_utf8_char(text: str, html_escape: bool = True):
>>      t = text.rstrip().replace("�"," ")
>>      if html_escape:
>> -        t = html.escape(html.escape(text))
>> -    return t
>> +        return html.escape(html.escape(text))
>> +    return html.unescape(t)
>
> Can you explain why we need to do the following:
>
>    html.escape(html.escape(text))
>
> Why does it have to be escaped twice? I still don't understand the
> reason behind this mess since the beginning.
>

This comes from an earlier patch: some emails were not escaped correctly. For example, "> >" was supposed to be escaped to "&gt; &gt;", but the result stayed "> >". That problem dates from the earlier patch, and I don't know whether the current patch would still handle that escaping correctly if html.escape() were called only once.
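To make the question concrete, here is a small sketch using only Python's standard `html` module. `fix_utf8_char_as_quoted` copies the logic of the quoted v1 patch; `fix_utf8_char_single_escape`, `REPLACEMENT_CHAR` and `sample` are hypothetical names for illustration only, not code from the series. It shows what one pass versus two passes of `html.escape()` produce, and that the quoted escape branch passes the raw `text` rather than the cleaned-up `t`, so the `rstrip()` and U+FFFD replacement are skipped whenever `html_escape` is true.

```python
import html

# U+FFFD REPLACEMENT CHARACTER, which the quoted patch swaps for a space.
REPLACEMENT_CHAR = "\ufffd"


def fix_utf8_char_as_quoted(text: str, html_escape: bool = True) -> str:
    """Mirror of the quoted v1 patch (escape branch uses `text`, not `t`)."""
    t = text.rstrip().replace(REPLACEMENT_CHAR, " ")
    if html_escape:
        # Two passes: ">" -> "&gt;" -> "&amp;gt;"
        return html.escape(html.escape(text))
    return html.unescape(t)


def fix_utf8_char_single_escape(text: str, html_escape: bool = True) -> str:
    """Hypothetical variant: clean up first, then escape exactly once."""
    t = text.rstrip().replace(REPLACEMENT_CHAR, " ")
    if html_escape:
        return html.escape(t)
    return html.unescape(t)


sample = "> > quoted" + REPLACEMENT_CHAR + "line  \n"

print(fix_utf8_char_as_quoted("> >"))                       # &amp;gt; &amp;gt;
print(REPLACEMENT_CHAR in fix_utf8_char_as_quoted(sample))  # True: cleanup is skipped
print(fix_utf8_char_single_escape(sample))                  # &gt; &gt; quoted line
```

The two-pass output only renders as a single level of escaping if something downstream unescapes the string once more before display. Whether that happens is the part of the pipeline this sketch cannot show, and it is what decides if a single html.escape() is enough for the "> >" case.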