Subject: Re: io_uring NAPI busy poll RCU is causing 50 context switches/second to my sqpoll thread
From: Olivier Langlois <olivier@trillion01.com>
To: Jens Axboe, Pavel Begunkov, io-uring@vger.kernel.org
Date: Sat, 03 Aug 2024 12:50:57 -0400

On Sat, 2024-08-03 at 08:36 -0600, Jens Axboe wrote:
> You can check the mappings in /sys/kernel/debug/block/<dev>/
>
> in there you'll find a number of hctxN folders, each of these is a
> hardware queue. hctx0/type tells you what kind of queue it is, and
> inside the directory, you'll find which CPUs this queue is mapped to.
> Example:
>
> root@r7625 /s/k/d/b/nvme0n1# cat hctx1/type
> default
>
> "default" means it's a read/write queue, so it'll handle both reads
> and writes.
>
> root@r7625 /s/k/d/b/nvme0n1# ls hctx1/
> active   cpu11/   dispatch       sched_tags         tags
> busy     cpu266/  dispatch_busy  sched_tags_bitmap  tags_bitmap
> cpu10/   ctx_map  flags          state              type
>
> and we can see this hardware queue is mapped to cpu 10/11/266.
>
> That ties into how these are mapped. It's pretty simple - if a task
> is running on cpu 10/11/266 when it's queueing IO, then it'll use hw
> queue 1. This maps to the interrupts you found, but note that the
> admin queue (which is not listed in these directories, as it's not an
> IO queue) is the first one there. hctx0 is nvme0q1 in your
> /proc/interrupts list.
>
> If IO is queued on hctx1, then it should complete on the interrupt
> vector associated with nvme0q2.
>

Jens,

I knew there were nvme experts here!
thx for your help.

# ls nvme0n1/hctx0/
active  busy  cpu0  cpu1  ctx_map  dispatch  dispatch_busy  flags
sched_tags  sched_tags_bitmap  state  tags  tags_bitmap  type

it means that some I/O that I am unaware of is initiated from either
cpu0 or cpu1...

It seems like the number of nvme queues is configurable... I'll try to
find out how to reduce it to 1...

but my real problem is not really which I/O queue is assigned to a
request. It is the irq affinity assigned to the queues...

I have found the function nvme_setup_irqs(), where the assignments
happen.

Considering that I have the boot param irqaffinity=3, I do not
understand how the admin queue and hctx0 irqs can be assigned to
cpu 0 and cpu 1. It is as if the irqaffinity param had no effect on
the MSI-X interrupt affinity masks...
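
For the record, this is the quick check I plan to run to see which
cpus each nvme vector is actually allowed on (just a sketch: it
assumes the vectors show up as nvme0qN in /proc/interrupts as above,
and /proc/irq/<N>/effective_affinity_list is only present on kernels
with effective-affinity reporting):

# for every nvme0 vector, print the configured and effective affinity lists
for irq in $(awk -F: '/nvme0q/ {gsub(/ /,"",$1); print $1}' /proc/interrupts); do
    printf 'irq %s: affinity=%s effective=%s\n' "$irq" \
        "$(cat /proc/irq/$irq/smp_affinity_list)" \
        "$(cat /proc/irq/$irq/effective_affinity_list 2>/dev/null)"
done

if irqaffinity were applied to those vectors, I would not expect to
see cpu 0 or cpu 1 in any of those lists...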