From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <3c6e4801-61d2-d6b9-0e2b-869053f8adbc@fb.com>
Date: Thu, 18 Nov 2021 11:13:02 -0800
Subject: Re: [PATCH bpf-next v1 1/8] io_uring: Implement eBPF iterator for registered buffers
Content-Language: en-US
To: Kumar Kartikeya Dwivedi
CC: Jens Axboe, Pavel Begunkov, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Pavel Emelyanov, Alexander Mihalicyn, Andrei Vagin
References: <20211116054237.100814-1-memxor@gmail.com>
 <20211116054237.100814-2-memxor@gmail.com>
 <8d95bd01-7f1a-9350-cede-c6abd56a7927@fb.com>
 <20211118182845.b4b7qaj7ip3lmkcj@apollo.localdomain>
From: Yonghong Song
In-Reply-To: <20211118182845.b4b7qaj7ip3lmkcj@apollo.localdomain>
Content-Type: text/plain; charset="UTF-8"; format=flowed
Content-Transfer-Encoding: 7bit
MIME-Version: 1.0
Precedence: bulk
X-Mailing-List: io-uring@vger.kernel.org

On 11/18/21 10:28 AM, Kumar Kartikeya Dwivedi wrote:
> On Thu, Nov 18, 2021 at 10:51:59PM IST, Yonghong Song wrote:
>>
>>
>> On 11/15/21 9:42 PM, Kumar Kartikeya Dwivedi wrote:
>>> This change adds eBPF iterator for buffers registered in io_uring ctx.
>>> It gives access to the ctx, the index of the registered buffer, and a
>>> pointer to the io_uring_ubuf itself. This allows the iterator to save
>>> info related to buffers added to an io_uring instance, that isn't easy
>>> to export using the fdinfo interface (like exact struct page composing
>>> the registered buffer).
>>>
>>> The primary usecase this is enabling is checkpoint/restore support.
>>>
>>> Note that we need to use mutex_trylock when the file is read from, in
>>> seq_start functions, as the order of lock taken is opposite of what it
>>> would be when io_uring operation reads the same file. We take
>>> seq_file->lock, then ctx->uring_lock, while io_uring would first take
>>> ctx->uring_lock and then seq_file->lock for the same ctx.
>>>
>>> This can lead to a deadlock scenario described below:
>>>
>>> CPU 0                              CPU 1
>>>
>>> vfs_read
>>> mutex_lock(&seq_file->lock)        io_read
>>>                                    mutex_lock(&ctx->uring_lock)
>>> mutex_lock(&ctx->uring_lock) # switched to mutex_trylock
>>>                                    mutex_lock(&seq_file->lock)
>>
>> It is not clear which mutex_lock switched to mutex_trylock.
>
> The one in vfs_read.
>
>> From below example, it looks like &ctx->uring_lock. But if this is
>> the case, we could have deadlock, right?
>>
>
> Sorry, I will make the commit message clearer in the next version.
>
> The sequence on CPU 0 is for normal read(2) on iterator.
> For CPU 1, it is an io_uring instance trying to do same on iterator attached to
> itself.
>
> So CPU 0 does
>
> sys_read
>   vfs_read
>     bpf_seq_read
>       mutex_lock(&seq_file->lock)          # A
>         io_uring_buf_seq_start
>           mutex_lock(&ctx->uring_lock)     # B
>
> and CPU 1 does
>
> io_uring_enter
>   mutex_lock(&ctx->uring_lock)             # B
>     io_read
>       bpf_seq_read
>         mutex_lock(&seq_file->lock)        # A
>         ...
>
> Since the order of locks is opposite, it can deadlock. So I switched the
> mutex_lock in io_uring_buf_seq_start to trylock, so it can return an error for
> this case, then it will release seq_file->lock and CPU 1 will make progress.
>
> The second problem without use of trylock is described below (for same case of
> io_uring reading from iterator attached to itself).
>
> Let me know if I missed something.

Thanks for the explanation. The above diagram is much clearer.
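
For readers following the locking argument, here is a minimal userspace
sketch of the recursive case and the trylock fix. It is not the kernel
code: pthread mutexes stand in for seq_file->lock (A) and ctx->uring_lock
(B), and the names iter_read, uring_read_self and demo_trylock are
hypothetical. It only models an io_uring instance reading an iterator
attached to its own ctx, where the trylock fails and -EDEADLK is
returned instead of blocking forever.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-ins for seq_file->lock (A) and ctx->uring_lock (B). */
static pthread_mutex_t seq_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t uring_lock = PTHREAD_MUTEX_INITIALIZER;

/* Models the read path: take A, then only *try* to take B. */
static int iter_read(void)
{
	int ret = 0;

	pthread_mutex_lock(&seq_lock);			/* A */
	if (pthread_mutex_trylock(&uring_lock) != 0)	/* B is held: back off */
		ret = -EDEADLK;
	else
		pthread_mutex_unlock(&uring_lock);	/* iteration would happen here */
	pthread_mutex_unlock(&seq_lock);		/* A is always released */
	return ret;
}

/* Models io_uring reading the iterator attached to itself: B, then the read path. */
static int uring_read_self(void)
{
	int ret;

	pthread_mutex_lock(&uring_lock);		/* B */
	ret = iter_read();				/* A, then trylock(B) fails */
	pthread_mutex_unlock(&uring_lock);
	return ret;
}

static void demo_trylock(void)
{
	assert(iter_read() == 0);		/* plain read: both locks free */
	assert(uring_read_self() == -EDEADLK);	/* recursive read: error, no hang */
	assert(iter_read() == 0);		/* no lock was left held */
}
```

With a plain pthread_mutex_lock on B in iter_read, the recursive call
would never return. The same back-off is what lets the other CPU make
progress in the two-CPU case, although this sketch does not spawn a
second thread to show that. Build with -pthread if your libc needs it.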
>
>>>
>>> The trylock also protects the case where io_uring tries to read from
>>> iterator attached to itself (same ctx), where the order of locks would
>>> be:
>>>  io_uring_enter
>>>   mutex_lock(&ctx->uring_lock) <-----------.
>>>   io_read                                   \
>>>    seq_read                                  \
>>>     mutex_lock(&seq_file->lock)              /
>>>     mutex_lock(&ctx->uring_lock) # deadlock-`
>>>
>>> In both these cases (recursive read and contended uring_lock), -EDEADLK
>>> is returned to userspace.
>>>
>>> In the future, this iterator will be extended to directly support
>>> iteration of bvec Flexible Array Member, so that when there is no
>>> corresponding VMA that maps to the registered buffer (e.g. if VMA is
>>> destroyed after pinning pages), we are able to reconstruct the
>>> registration on restore by dumping the page contents and then replaying
>>> them into a temporary mapping used for registration later. All this is
>>> out of scope for the current series however, but builds upon this
>>> iterator.
>>>
>>> Cc: Jens Axboe
>>> Cc: Pavel Begunkov
>>> Cc: io-uring@vger.kernel.org
>>> Signed-off-by: Kumar Kartikeya Dwivedi
>>> ---
>>>  fs/io_uring.c                  | 179 +++++++++++++++++++++++++++++++++
>>>  include/linux/bpf.h            |   2 +
>>>  include/uapi/linux/bpf.h       |   3 +
>>>  tools/include/uapi/linux/bpf.h |   3 +
>>>  4 files changed, 187 insertions(+)
>>> [...]
>
>>> [...]
>>> +static struct bpf_iter_reg io_uring_buf_reg_info = {
>>> +	.target = "io_uring_buf",
>>> +	.feature = BPF_ITER_RESCHED,
>>> +	.attach_target = bpf_io_uring_iter_attach,
>>> +	.detach_target = bpf_io_uring_iter_detach,
>>
>> Since you have this extra `io_uring_fd` for the iterator, you may want
>> to implement show_fdinfo and fill_link_info callback functions here.
>>
>
> Ack, but some questions:
>
> What should it have? e.g. it easy to go from map_id to map fd if one wants
> access to the map attached to the iterator, but not sure how one can obtain more
> information about target fd from io_uring or epoll iterators.

Just to be clear, I am talking about the uapi struct bpf_link_info.
I agree that the fd is not really useful, so it is up to you
whether to show the fd to the user or not. We can always
add it later if needed.

>
> Should I delegate to their show_fdinfo and dump using that?
>
> The type/target is already available in link_info, not sure what other useful
> information can be added there, which allows obtaining the io_uring/epoll fd.
>
>>> +	.ctx_arg_info_size = 2,
>>> +	.ctx_arg_info = {
>>> +		{ offsetof(struct bpf_iter__io_uring_buf, ctx),
>>> +		  PTR_TO_BTF_ID },
>>> +		{ offsetof(struct bpf_iter__io_uring_buf, ubuf),
>>> +		  PTR_TO_BTF_ID_OR_NULL },
>>> +	},
>>> +	.seq_info = &bpf_io_uring_buf_seq_info,
>>> +};
>>> +
>>> +static int __init io_uring_iter_init(void)
>>> +{
>>> +	io_uring_buf_reg_info.ctx_arg_info[0].btf_id = btf_io_uring_ids[0];
>>> +	io_uring_buf_reg_info.ctx_arg_info[1].btf_id = btf_io_uring_ids[1];
>>> +	return bpf_iter_reg_target(&io_uring_buf_reg_info);
>>> +}
>>> +late_initcall(io_uring_iter_init);
>>> +
>>> +#endif
>>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>>> index 56098c866704..ddb9d4520a3f 100644
>>> --- a/include/linux/bpf.h
>>> +++ b/include/linux/bpf.h
>>> @@ -1509,8 +1509,10 @@ int bpf_obj_get_user(const char __user *pathname, int flags);
>>>  	extern int bpf_iter_ ## target(args);			\
>>>  	int __init bpf_iter_ ## target(args) { return 0; }
>>> +struct io_ring_ctx;
>>>  struct bpf_iter_aux_info {
>>>  	struct bpf_map *map;
>>> +	struct io_ring_ctx *ctx;
>>>  };
>>
>> Can we use union here? Note that below bpf_iter_link_info in
>> uapi/linux/bpf.h, map_fd and io_uring_fd is also an union.
>>
>
> So the reason I didn't use a union was the link->aux.map check in
> bpf_iter.c::__get_seq_info. Even if we switch to union bpf_iter_aux_info, it
> needs some way to determine whether link is for map type, so maybe a string
> comparison there? Leaving it out of union felt cleaner, also I move both
> io_ring_ctx and eventpoll file into a union in later patch.

I see. The seq_ops for the map element iterator is different from the
others: it comes from map_ops, not from the main iter registration.

I think your change is okay. But maybe add a comment explaining why
map is different from the others in struct bpf_iter_aux_info.

>
>>>  typedef int (*bpf_iter_attach_target_t)(struct bpf_prog *prog,
>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>>> index 6297eafdc40f..3323defa99a1 100644
>>> --- a/include/uapi/linux/bpf.h
>>> +++ b/include/uapi/linux/bpf.h
>>> @@ -91,6 +91,9 @@ union bpf_iter_link_info {
>>>  	struct {
>>>  		__u32 map_fd;
>>>  	} map;
>>> +	struct {
>>> +		__u32 io_uring_fd;
>>> +	} io_uring;
>>>  };
>>>  /* BPF syscall commands, see bpf(2) man-page for more details. */
>>> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
>>> index 6297eafdc40f..3323defa99a1 100644
>>> --- a/tools/include/uapi/linux/bpf.h
>>> +++ b/tools/include/uapi/linux/bpf.h
>>> @@ -91,6 +91,9 @@ union bpf_iter_link_info {
>>>  	struct {
>>>  		__u32 map_fd;
>>>  	} map;
>>> +	struct {
>>> +		__u32 io_uring_fd;
>>> +	} io_uring;
>>>  };
>>>  /* BPF syscall commands, see bpf(2) man-page for more details. */
>>>
>
> --
> Kartikeya
>
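
On the union question for struct bpf_iter_aux_info discussed above, the
concern about the link->aux.map check can be illustrated with a small
userspace sketch. Everything here is an assumption made for
illustration: the two structs are placeholders rather than the kernel
definitions, and aux_separate, aux_union, looks_like_map_iter and
demo_aux are hypothetical names. It shows that a "map pointer is
non-NULL" test stops being a reliable "this is a map iterator" test once
map and ctx share storage.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder definitions; the real kernel structs are far larger. */
struct bpf_map { int placeholder; };
struct io_ring_ctx { int placeholder; };

/* Layout as in the patch: map keeps its own slot next to ctx. */
struct aux_separate {
	struct bpf_map *map;
	struct io_ring_ctx *ctx;
};

/* Hypothetical alternative: both pointers share the same storage. */
union aux_union {
	struct bpf_map *map;
	struct io_ring_ctx *ctx;
};

/* Models a check of the form "is aux.map set?". */
static bool looks_like_map_iter(const struct bpf_map *map)
{
	return map != NULL;
}

static void demo_aux(void)
{
	struct io_ring_ctx ring = { 0 };
	struct aux_separate sep = { .map = NULL, .ctx = &ring };
	union aux_union uni = { .ctx = &ring };

	/* Separate slots: an io_uring link leaves map NULL. */
	assert(!looks_like_map_iter(sep.map));
	/* Union: the ctx pointer read back through map is non-NULL. */
	assert(looks_like_map_iter(uni.map));
}
```

With the union, some other discriminator (for example the target type)
would be needed before the map member could be trusted, which is the
extra step mentioned in the thread.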