From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-il1-f180.google.com (mail-il1-f180.google.com [209.85.166.180]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3A87A1ACEC8 for ; Wed, 7 May 2025 23:11:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.166.180 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1746659515; cv=none; b=JKpbZW9CpVWY3mm+as7RVoYq0QdD1dkSEwCF67LsyfdJFRsuerpsEJClKTjU9vv0kaLu4cpITJYynkv/ELJHaFRDaK7YtunzuCJGPARsfgxQ0QjE0b8VZUBGQAGa+dwdwUe1dpEwZLEy7Xk0PD1dR5TU5FKEfxRaEqc9iUlQd3A= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1746659515; c=relaxed/simple; bh=tBTdQPXRWExatOhyyjqYAT1RNWeoQuo3hLZB/ndQyC8=; h=Message-ID:Date:MIME-Version:Subject:To:References:From: In-Reply-To:Content-Type; b=cviFxw7E2Ajohjg4+1mbA31FTIZYwi1lpLthkIftaDBNDxED3r9GgybR9RkIwIDYZdDhg1KkCvvwcELomhCF6x34bL/e4Foj/m2506X+PFchzw+1Y+iGu2kQuh6g6eblcuvUrablCv8rjhaDUtad0aGWa6RlwULJgtvVn9iLsyI= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.dk; spf=pass smtp.mailfrom=kernel.dk; dkim=pass (2048-bit key) header.d=kernel-dk.20230601.gappssmtp.com header.i=@kernel-dk.20230601.gappssmtp.com header.b=NbIGvU6G; arc=none smtp.client-ip=209.85.166.180 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.dk Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=kernel.dk Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel-dk.20230601.gappssmtp.com header.i=@kernel-dk.20230601.gappssmtp.com header.b="NbIGvU6G" Received: by mail-il1-f180.google.com with SMTP id e9e14a558f8ab-3d81768268dso2413925ab.3 for ; Wed, 07 May 2025 16:11:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20230601.gappssmtp.com; s=20230601; t=1746659510; x=1747264310; darn=vger.kernel.org; h=content-transfer-encoding:in-reply-to:content-language:from :references:to:subject:user-agent:mime-version:date:message-id:from :to:cc:subject:date:message-id:reply-to; bh=MTSY7PSc8f3g/I/A4eqTacoY2ArUlG1NaNX/chZl2g0=; b=NbIGvU6G5RnZwBa4zI/LYAupsHQFQfRF9nCadzLXuMVv/3HecjbEtAVU2Ew1MqHzfb XlfL9FIfKOgxQPWo8bvQR+tAq/jhJUNmao1SVshF6DLIKWcnt5F81oVmS5PS2XNZSLrC dJm+DjA6H6QFLHGf8IPHidg1NV78YKICJDJQR9zVu5LTweUzUVtpQE4FY6ySeEbF+NQg 8pKAWEtdbQ4IJ9O4G/LU08dibcL91VOgbAr9t+IRzc7YOYlsGKcOk5EwtJ+6W4dXHexe rcylbtpd7Zb9a3Gtku9UxP+7XXK1TkA4X1K2aaim5riN9acNLZPXzNXvzLMfvVqMccvG 14Uw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1746659510; x=1747264310; h=content-transfer-encoding:in-reply-to:content-language:from :references:to:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=MTSY7PSc8f3g/I/A4eqTacoY2ArUlG1NaNX/chZl2g0=; b=pyvp/gOjcruCSEgfum0s03O4JM3ONIFCE08l2FwZe68sAy/Hpey5dHLpFvtSaSEyvh miG9ZcUWREZvJFxJjoCugEeALsVaBTJmcb0nCH+QY0o9xFXYb1BP8IxHiQEfTETIkfKb RQ4uzlr3k5mLSNGDCCcOOaJriwQ2bDKCgJYiJVVPOFIos+g7kyok1nT+iRtWZ9b1JyIh 8hXlQL6FJ2izb9+EDL/k/H1aN/zs+VSeipIn/Rej28+V6jwPpN5/BX+7usisfodi2yKm xASnLqAHUU9wKgsyV8N+KEXO5O1l2IwGfUW7nwlvMANoCTLBz+BqBxQLqjkCJ8yvMGtX 5STA== X-Forwarded-Encrypted: i=1; AJvYcCVZaNm/UmU/803qpyBNpNQpiAlBC34D2tGL4q8PJ5x7+sGqvHvNtMNqdMfZ296fNR2CJdyllcZtUA==@vger.kernel.org X-Gm-Message-State: AOJu0Yw58iJSKh49PAGsU0pfJw+rHFxN8KjsK+zEek+dQ4yTHNRKNQ/o C2ssiK6lKCJJIxsNDWWQm7ZUaRdBbQVHVO3vu5PRUPv5bZMD6M6V05DwTMUpqN4= X-Gm-Gg: ASbGnctnwImC003cYLxSbp8Ja47WKooOTM7TaOBa8n3jQaCinyDy5bFUGPHdNf8L9Ss 7bAUohq4RfDz4oS1sFfulm/k1ysU09vHdq3PVBduqJ5/XloPQKLonfU/1QUbeuxSHn5e0FTTmI/ HMYV6wV+E2NtZahZx/4DM47DhzK0NutMZEXxDdYK/HCLlIb0+BXkSCPXhMYr5Tweke1JwFYwSG+ 1VvRVNzz0N8QUxmi3tWh7l++1/EtKq6WApyio9+R36FnCxYzIoKsZN+2fjZ5rbc4KknKe9NRRWx bgU2tHNGBLIsyG4Bm6ygLd6NAuGw0aw96RA93A== X-Google-Smtp-Source: AGHT+IHgQfUTYmYSQGkNI0F+1Siw2hVCAwbDp4AlEW03RxkVetSd0bHGu7zqMdyUx2/8MSsxZav/Jg== X-Received: by 2002:a05:6e02:220b:b0:3d9:66ba:1ad2 with SMTP id e9e14a558f8ab-3da7388855cmr64360245ab.0.1746659510118; Wed, 07 May 2025 16:11:50 -0700 (PDT) Received: from [192.168.1.150] ([198.8.77.157]) by smtp.gmail.com with ESMTPSA id 8926c6da1cb9f-4f88a8d1998sm2894380173.4.2025.05.07.16.11.49 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 07 May 2025 16:11:49 -0700 (PDT) Message-ID: Date: Wed, 7 May 2025 17:11:48 -0600 Precedence: bulk X-Mailing-List: io-uring@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH] io_uring/poll: use regular CQE posting for multishot termination To: Pavel Begunkov , io-uring References: <1711744d-1dd1-4efc-87e2-6ddc1124a95e@gmail.com> From: Jens Axboe Content-Language: en-US In-Reply-To: <1711744d-1dd1-4efc-87e2-6ddc1124a95e@gmail.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit On 5/7/25 2:53 PM, Pavel Begunkov wrote: > On 5/7/25 19:08, Jens Axboe wrote: >> A previous patch avoided reordering of multiple multishot requests >> getting their CQEs potentiall reordered when one of them terminates, as >> that last termination CQE is posted as a deferred completion rather than >> directly as a CQE. This can reduce the efficiency of the batched >> posting, hence was not ideal. >> >> Provide a basic helper that poll can use for this kind of termination, >> which does a normal CQE posting rather than a deferred one. With that, >> the work-around where io_req_post_cqe() needs to flush deferred >> completions can be removed. >> Signed-off-by: Jens Axboe >> >> --- >> >> This removes the io_req_post_cqe() flushing, and instead puts the honus >> on the poll side to provide the ordering. I've verified that this also >> fixes the reported issue. The previous patch can be easily backported to >> stable, so makes sense to keep that one. > > It still gives a bad feeling tbh, it's not a polling problem, > we're working around shortcomings of the incremental / bundled > uapi and/or design. Patching it in semi unrelated places will > defitely bite back. I don't think that's fair, we should always strive to have as close to ordered completions as we can. The fact that multishot ends up using a mix of both methods to fill CQEs is problematic. > Can it be fixed in relevant opcodes? So it stays close to > those who actually use it. And let me ask since I'm lost in > new features, can the uapi be fixed so that it doesn't > depend on request ordering? The API absolutely relies on ordering within a buffer group ID. It can certainly be fixed at the opcode sites, but there'd be 3 spots in net and one in rw.c, and for each spot it'd be more involved to fix it. >> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c >> index 541e65a1eebf..505959fc2de0 100644 >> --- a/io_uring/io_uring.c >> +++ b/io_uring/io_uring.c >> @@ -848,14 +848,6 @@ bool io_req_post_cqe(struct io_kiocb *req, s32 res, u32 cflags) >> struct io_ring_ctx *ctx = req->ctx; >> bool posted; >> - /* >> - * If multishot has already posted deferred completions, ensure that >> - * those are flushed first before posting this one. If not, CQEs >> - * could get reordered. >> - */ >> - if (!wq_list_empty(&ctx->submit_state.compl_reqs)) >> - __io_submit_flush_completions(ctx); >> - >> lockdep_assert(!io_wq_current_is_worker()); >> lockdep_assert_held(&ctx->uring_lock); >> @@ -871,6 +863,23 @@ bool io_req_post_cqe(struct io_kiocb *req, s32 res, u32 cflags) >> return posted; >> } >> +bool io_req_post_cqe_overflow(struct io_kiocb *req) > > "overflow" here is rather confusing, it could mean lots of things. > Maybe some *_post_poll_complete for now? Yeah it's not a great name, just didn't have any better ideas at the time. I'll ponder a bit, __complete() isn't terrible. >> +{ >> + bool filled; >> + >> + filled = io_req_post_cqe(req, req->cqe.res, req->cqe.flags); > > posting and overflow must be under the same CQ critical section, > like io_cq_lock(). Just copy io_post_aux_cqe() and add > ctx->cq_extra--? Hopefully we'll remove the cq_extra ugliness > later and combine them after. Would be great to combine those, as it stands there's a mix of them, and io_add_aux_cqe() for example does split locking. I'll update the locking. >> diff --git a/io_uring/poll.c b/io_uring/poll.c >> index 8eb744eb9f4c..af8e3d4f6f1f 100644 >> --- a/io_uring/poll.c >> +++ b/io_uring/poll.c >> @@ -312,6 +312,13 @@ static int io_poll_check_events(struct io_kiocb *req, io_tw_token_t tw) >> return IOU_POLL_NO_ACTION; >> } >> +static void io_poll_req_complete(struct io_kiocb *req, io_tw_token_t tw) >> +{ >> + if (io_req_post_cqe_overflow(req)) >> + req->flags |= REQ_F_CQE_SKIP; > > Unconditional would be better. It'd still end up in attempting > to post, likely failing and reattemptng allocation just one > extra time, not like it gives any reliability. And if I'd be > choosing b/w dropping a completion or potentially getting a > botched completion as per the problem you tried, I say the > former is better. Not sure I follow, unconditional what? SKIP? Yes that probably makes sense, if we don't overflow post, it'll get logged as such anyway. -- Jens Axboe