From: Bui Quang Minh <minhquangbui99@gmail.com>
To: io-uring@vger.kernel.org
Cc: Jens Axboe, Pavel Begunkov, linux-kernel@vger.kernel.org,
	Bui Quang Minh <minhquangbui99@gmail.com>
Subject: [RFC PATCH 2/2] io_uring/io-wq: try to batch multiple free work
Date: Fri, 21 Feb 2025 11:19:26 +0700
Message-ID: <20250221041927.8470-3-minhquangbui99@gmail.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20250221041927.8470-1-minhquangbui99@gmail.com>
References: <20250221041927.8470-1-minhquangbui99@gmail.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Currently, when IORING_SETUP_DEFER_TASKRUN is not used, an io worker
must queue a task work item every time it frees a piece of work. This
creates contention on tctx->task_list. With this commit, the io worker
instead collects freed work on a local list and submits multiple frees
in a single call once the number of entries on the local list reaches
IO_REQ_ALLOC_BATCH.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
---
 io_uring/io-wq.c    | 62 +++++++++++++++++++++++++++++++++++++++++++--
 io_uring/io-wq.h    |  4 ++-
 io_uring/io_uring.c | 23 ++++++++++++++---
 io_uring/io_uring.h |  6 ++++-
 4 files changed, 87 insertions(+), 8 deletions(-)

diff --git a/io_uring/io-wq.c b/io_uring/io-wq.c
index 5d0928f37471..096711707db9 100644
--- a/io_uring/io-wq.c
+++ b/io_uring/io-wq.c
@@ -544,6 +544,20 @@ static void io_assign_current_work(struct io_worker *worker,
 	raw_spin_unlock(&worker->lock);
 }
 
+static void flush_req_free_list(struct llist_head *free_list,
+				struct llist_node *tail)
+{
+	struct io_kiocb *first_req, *last_req;
+
+	first_req = container_of(free_list->first, struct io_kiocb,
+				 io_task_work.node);
+	last_req = container_of(tail, struct io_kiocb,
+				io_task_work.node);
+
+	io_req_normal_work_add(first_req, last_req);
+	init_llist_head(free_list);
+}
+
 /*
  * Called with acct->lock held, drops it before returning
  */
@@ -553,6 +567,10 @@ static void io_worker_handle_work(struct io_wq_acct *acct,
 {
 	struct io_wq *wq = worker->wq;
 	bool do_kill = test_bit(IO_WQ_BIT_EXIT, &wq->state);
+	LLIST_HEAD(free_list);
+	int free_req = 0;
+	struct llist_node *tail = NULL;
+	struct io_ring_ctx *last_added_ctx = NULL;
 
 	do {
 		struct io_wq_work *work;
@@ -592,6 +610,9 @@ static void io_worker_handle_work(struct io_wq_acct *acct,
 		do {
 			struct io_wq_work *next_hashed, *linked;
 			unsigned int hash = io_get_work_hash(work);
+			struct io_kiocb *req = container_of(work,
+						struct io_kiocb, work);
+			bool did_free = false;
 
 			next_hashed = wq_next_work(work);
 
@@ -601,7 +622,41 @@ static void io_worker_handle_work(struct io_wq_acct *acct,
 			wq->do_work(work);
 			io_assign_current_work(worker, NULL);
 
-			linked = wq->free_work(work);
+			/*
+			 * All requests in free list must have the same
+			 * io_ring_ctx.
+			 */
+			if (last_added_ctx && last_added_ctx != req->ctx) {
+				flush_req_free_list(&free_list, tail);
+				tail = NULL;
+				last_added_ctx = NULL;
+				free_req = 0;
+			}
+
+			/*
+			 * Try to batch free work when
+			 * !IORING_SETUP_DEFER_TASKRUN to reduce contention
+			 * on tctx->task_list.
+			 */
+			if (req->ctx->flags & IORING_SETUP_DEFER_TASKRUN)
+				linked = wq->free_work(work, NULL, NULL);
+			else
+				linked = wq->free_work(work, &free_list, &did_free);
+
+			if (did_free) {
+				if (!tail)
+					tail = free_list.first;
+
+				last_added_ctx = req->ctx;
+				free_req++;
+				if (free_req == IO_REQ_ALLOC_BATCH) {
+					flush_req_free_list(&free_list, tail);
+					tail = NULL;
+					last_added_ctx = NULL;
+					free_req = 0;
+				}
+			}
+
 			work = next_hashed;
 			if (!work && linked && !io_wq_is_hashed(linked)) {
 				work = linked;
@@ -626,6 +681,9 @@ static void io_worker_handle_work(struct io_wq_acct *acct,
 			break;
 		raw_spin_lock(&acct->lock);
 	} while (1);
+
+	if (free_list.first)
+		flush_req_free_list(&free_list, tail);
 }
 
 static int io_wq_worker(void *data)
@@ -899,7 +957,7 @@ static void io_run_cancel(struct io_wq_work *work, struct io_wq *wq)
 	do {
 		atomic_or(IO_WQ_WORK_CANCEL, &work->flags);
 		wq->do_work(work);
-		work = wq->free_work(work);
+		work = wq->free_work(work, NULL, NULL);
 	} while (work);
 }
 
diff --git a/io_uring/io-wq.h b/io_uring/io-wq.h
index b3b004a7b625..4f1674d3d68e 100644
--- a/io_uring/io-wq.h
+++ b/io_uring/io-wq.h
@@ -21,7 +21,9 @@ enum io_wq_cancel {
 	IO_WQ_CANCEL_NOTFOUND,	/* work not found */
 };
 
-typedef struct io_wq_work *(free_work_fn)(struct io_wq_work *);
+typedef struct io_wq_work *(free_work_fn)(struct io_wq_work *,
+					   struct llist_head *,
+					   bool *did_free);
 typedef void (io_wq_work_fn)(struct io_wq_work *);
 
 struct io_wq_hash {
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 0c111f7d7832..0343c9ec7271 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -120,7 +120,6 @@
 #define IO_TCTX_REFS_CACHE_NR		(1U << 10)
 
 #define IO_COMPL_BATCH			32
-#define IO_REQ_ALLOC_BATCH		8
 #define IO_LOCAL_TW_DEFAULT_MAX	20
 
 struct io_defer_entry {
@@ -985,13 +984,18 @@ __cold bool __io_alloc_req_refill(struct io_ring_ctx *ctx)
 	return true;
 }
 
-__cold void io_free_req(struct io_kiocb *req)
+static __cold void io_set_req_free(struct io_kiocb *req)
 {
 	/* refs were already put, restore them for io_req_task_complete() */
 	req->flags &= ~REQ_F_REFCOUNT;
 	/* we only want to free it, don't post CQEs */
 	req->flags |= REQ_F_CQE_SKIP;
 	req->io_task_work.func = io_req_task_complete;
+}
+
+__cold void io_free_req(struct io_kiocb *req)
+{
+	io_set_req_free(req);
 	io_req_task_work_add(req);
 }
 
@@ -1772,16 +1776,27 @@ int io_poll_issue(struct io_kiocb *req, struct io_tw_state *ts)
 				IO_URING_F_COMPLETE_DEFER);
 }
 
-struct io_wq_work *io_wq_free_work(struct io_wq_work *work)
+struct io_wq_work *io_wq_free_work(struct io_wq_work *work,
+				   struct llist_head *free_list,
+				   bool *did_free)
 {
 	struct io_kiocb *req = container_of(work, struct io_kiocb, work);
 	struct io_kiocb *nxt = NULL;
+	bool free_req = false;
 
 	if (req_ref_put_and_test(req)) {
 		if (req->flags & IO_REQ_LINK_FLAGS)
 			nxt = io_req_find_next(req);
-		io_free_req(req);
+		io_set_req_free(req);
+		if (free_list)
+			__llist_add(&req->io_task_work.node, free_list);
+		else
+			io_req_task_work_add(req);
+		free_req = true;
 	}
+	if (did_free)
+		*did_free = free_req;
+
 	return nxt ? &nxt->work : NULL;
 }
 
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index bdd6407c14d0..dc050bc44b65 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -54,6 +54,8 @@ struct io_wait_queue {
 #endif
 };
 
+#define IO_REQ_ALLOC_BATCH		8
+
 static inline bool io_should_wake(struct io_wait_queue *iowq)
 {
 	struct io_ring_ctx *ctx = iowq->ctx;
@@ -111,7 +113,9 @@ int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr);
 int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin);
 void __io_submit_flush_completions(struct io_ring_ctx *ctx);
 
-struct io_wq_work *io_wq_free_work(struct io_wq_work *work);
+struct io_wq_work *io_wq_free_work(struct io_wq_work *work,
+				   struct llist_head *free_req,
+				   bool *did_free);
 void io_wq_submit_work(struct io_wq_work *work);
 void io_free_req(struct io_kiocb *req);
 
-- 
2.43.0
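
The contention win above comes from the llist batching pattern: each io
worker stages freed requests on a private list using plain pointer
writes (no atomics), then publishes the whole chain to the shared
tctx->task_list via io_req_normal_work_add(first_req, last_req), paying
one contended atomic operation per batch rather than per request. As a
rough illustration only, the standalone userspace sketch below
reproduces that pattern with C11 atomics; struct node, batch_add() and
batch_flush() are hypothetical names standing in for io_task_work.node,
the worker's local free_list and flush_req_free_list(), and the
compare-and-swap loop approximates what the kernel's llist_add_batch()
does when splicing a chain onto a shared llist.

	#include <stdatomic.h>
	#include <stdio.h>

	struct node {
		struct node *next;
		int id;
	};

	/* Shared list head; stands in for the contended tctx->task_list. */
	static _Atomic(struct node *) shared_head;

	/* Stage a node on a thread-local batch: plain stores, no atomics. */
	static void batch_add(struct node **local, struct node *n)
	{
		n->next = *local;
		*local = n;
	}

	/*
	 * Publish the whole local chain (first..tail) with a single
	 * compare-and-swap, analogous to llist_add_batch().
	 */
	static void batch_flush(struct node **local, struct node *tail)
	{
		struct node *first = *local;

		if (!first)
			return;
		tail->next = atomic_load(&shared_head);
		/* On failure, the CAS reloads the current head into tail->next. */
		while (!atomic_compare_exchange_weak(&shared_head, &tail->next, first))
			;
		*local = NULL;
	}

	int main(void)
	{
		struct node reqs[8];
		struct node *local = NULL, *tail = NULL;

		/* Worker loop: free 8 "requests", batching instead of publishing each. */
		for (int i = 0; i < 8; i++) {
			reqs[i].id = i;
			if (!tail)
				tail = &reqs[i];	/* first staged node is the batch tail */
			batch_add(&local, &reqs[i]);
		}
		batch_flush(&local, tail);		/* one contended atomic for all 8 */

		for (struct node *n = atomic_load(&shared_head); n; n = n->next)
			printf("batched free: req %d\n", n->id);
		return 0;
	}

The point the sketch makes concrete is that the flush cost does not
grow with batch size, which is why flushing only every
IO_REQ_ALLOC_BATCH (8) freed requests, or when req->ctx changes,
reduces pressure on tctx->task_list.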