From: Ammar Faizi
To: Alviro Iskandar Setiawan
Cc: Ammar Faizi, Muhammad Rizki, Kanna Scarlet, GNU/Weeb Mailing List
Subject: [RFC PATCH v1 0/2] Fixed number of chromium workers
Date: Mon, 29 Aug 2022 08:11:25 +0700
Message-Id: <20220829011127.3150320-1-ammarfaizi2@gnuweeb.org>

Hi Alviro,

Currently, a single chnet instance uses a single dedicated chromium
thread worker to perform an HTTP request. This doesn't scale well
because we need to spawn a new thread for each HTTP request. Performing
4096 HTTP requests in parallel spawns 4096 chromium thread workers,
which is too expensive and consumes too much memory.

A single chromium thread worker can handle multiple HTTP requests. This
series creates a fixed number of chromium workers with a ref count to
spread the jobs fairly across the chromium thread workers. This greatly
reduces context switches and improves performance.
It also greatly reduces memory usage.

Implementation:

1) At initialization, when chnet_global_init() is called, create an
   array of pointers to `struct ch_thpool` and initialize those
   pointers to null. The number of elements in the array is taken
   from `std::thread::hardware_concurrency() - 1`.

2) When a new CHNet instance is created, its constructor calls the
   `get_thread()` function, which initializes the array of pointers
   to `struct ch_thpool` if needed, increments the ref count, and
   returns a pointer to the base::Thread class in `struct ch_thpool`.

3) When a CHNet instance is destroyed, it calls the `put_thread()`
   function, which decrements the ref count of the `struct ch_thpool`
   and deletes the object if the ref count reaches zero.

[
  The implementation detail that spreads the jobs fairly across the
  chromium thread workers is in the get_thread() function, please
  have a look.
]

For the ring test case, the wall-clock time drops by about 33% (real
time goes from 28.184s to 18.657s).

Without this series:

  ammarfaizi2@integral2:~/work/ncns$ time taskset -c 0-7 make -j8 test -s
  Running /home/ammarfaizi2/work/ncns/tests/cpp/ring.t

  real    0m28.184s
  user    0m52.368s
  sys     0m27.582s

With this series:

  ammarfaizi2@integral2:~/work/ncns$ time taskset -c 0-7 make -j8 test -s
  Running /home/ammarfaizi2/work/ncns/tests/cpp/ring.t

  real    0m18.657s
  user    0m35.452s
  sys     0m2.146s

Please review and comment!

Signed-off-by: Ammar Faizi
---

Ammar Faizi (2):
  chnet: Prepare global struct ch_thpool array
  chnet: Implement `get_thread()` and `put_thread()` function

 chnet/chnet.cc | 128 ++++++++++++++++++++++++++++++++++++++++++++++++-
 chnet/chnet.h  |   2 +-
 2 files changed, 128 insertions(+), 2 deletions(-)

base-commit: e1123a1e7b9526e4b12356bfed222386d4b00a80
-- 
Ammar Faizi
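
For readers who want a picture of the ref-counted pool before opening
the patches, the three implementation steps could be sketched roughly
as below. This is not the code from the series. `Worker` is a stand-in
for Chromium's base::Thread so the sketch builds on its own, and
`pool_init()`, `default_nr_workers()`, the mutex, and the fields of
`struct ch_thpool` are hypothetical. The "pick the slot with the
fewest users" policy is only an assumption about how get_thread()
spreads the jobs.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Stand-in for base::Thread (hypothetical, keeps the sketch standalone).
struct Worker {
    int id;
};

// Hypothetical slot layout: one worker plus the number of users.
struct ch_thpool {
    Worker thread;
    unsigned ref_count = 0;
};

static std::mutex g_lock;
static std::vector<std::unique_ptr<ch_thpool>> g_pool;

// hardware_concurrency() may return 0, so clamp to at least one worker.
std::size_t default_nr_workers()
{
    unsigned n = std::thread::hardware_concurrency();
    return n > 1 ? n - 1 : 1;
}

// Step 1: size the slot array; every slot starts out null.
void pool_init(std::size_t nr_workers)
{
    std::lock_guard<std::mutex> guard(g_lock);
    g_pool.clear();
    g_pool.resize(nr_workers > 0 ? nr_workers : 1);
}

// Step 2: choose the least-used slot (assumed policy), create it
// lazily, take a reference, and hand back its worker.
Worker *get_thread()
{
    std::lock_guard<std::mutex> guard(g_lock);
    if (g_pool.empty())
        g_pool.resize(1);

    std::size_t best = 0;
    unsigned best_ref = ~0u;
    for (std::size_t i = 0; i < g_pool.size(); i++) {
        // A null slot has no users yet.
        unsigned ref = g_pool[i] ? g_pool[i]->ref_count : 0;
        if (ref < best_ref) {
            best = i;
            best_ref = ref;
        }
    }

    if (!g_pool[best]) {
        g_pool[best] = std::make_unique<ch_thpool>();
        g_pool[best]->thread.id = static_cast<int>(best);
    }
    g_pool[best]->ref_count++;
    return &g_pool[best]->thread;
}

// Step 3: drop a reference; free the slot when the last user leaves.
void put_thread(Worker *w)
{
    std::lock_guard<std::mutex> guard(g_lock);
    for (auto &slot : g_pool) {
        if (slot && &slot->thread == w) {
            assert(slot->ref_count > 0);
            if (--slot->ref_count == 0)
                slot.reset();
            return;
        }
    }
    assert(false && "put_thread() called with an unknown worker");
}
```

In this sketch, a caller would run `pool_init(default_nr_workers())`
once at startup, call get_thread() from each instance's constructor,
and call put_thread() from its destructor. The linear scan is fine for
an array whose length is bounded by the CPU count.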