From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mx0b-00069f02.pphosted.com (mx0b-00069f02.pphosted.com [205.220.177.32]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A2EE7249F5; Mon, 26 Feb 2024 09:46:44 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=fail smtp.client-ip=205.220.177.32 ARC-Seal:i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708940806; cv=fail; b=YERLjUw/Wi8/kWR+9Xhg3oASao+4fS9gMeZI1FNk1iyjveLVUn5DeDc1/INfcFKC0Kp9G0Rbjm+aJ+kKKDTNwRq8s5cx1Zo26kGYqrS7tGEux1BurpxOC1iy/muHRRw09j9cTc5rw7N+aZ//4JovqD0GHQM3AQWFkM0bt1QaJag= ARC-Message-Signature:i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708940806; c=relaxed/simple; bh=bTvk9vXPqUIVk4RDJC9GU9pfa04/a9z70sP4Jn+2zcU=; h=Message-ID:Date:Subject:To:Cc:References:From:In-Reply-To: Content-Type:MIME-Version; b=Fo4nY/+QunLXt7ap703Co3TP1fTDGmYi/zMHF9X1G0aaWJC23Lf5Tlz56UnuULRr7Pc9zAWJ20lo/DJYfHkUWbvz91dhCDfha6IpnZu79BHB8l6mm3dRL8cc3bKLM8Ke5zCBvh6EqX+/+hTzz19vingOzEqASycbrNZURAYa5x4= ARC-Authentication-Results:i=2; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=oracle.com; spf=pass smtp.mailfrom=oracle.com; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b=SF1nBQBP; dkim=pass (1024-bit key) header.d=oracle.onmicrosoft.com header.i=@oracle.onmicrosoft.com header.b=aTMr4IXC; arc=fail smtp.client-ip=205.220.177.32 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=oracle.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=oracle.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="SF1nBQBP"; dkim=pass (1024-bit key) header.d=oracle.onmicrosoft.com header.i=@oracle.onmicrosoft.com header.b="aTMr4IXC" Received: from pps.filterd (m0333520.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 41Q18SsC020498; Mon, 26 Feb 2024 09:46:11 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=message-id : date : subject : to : cc : references : from : in-reply-to : content-type : content-transfer-encoding : mime-version; s=corp-2023-11-20; bh=BnwTHdPjRCfpYLu4wdGlxtr413Sa79LBHWc61E1r8m4=; b=SF1nBQBPmHydrgUc6SAuQepZe1ZZuzeSkvtEdzFzSEN9nhyFDxMMKY0E815XlpjsYpo0 rJUbehzExXwtI/N2SqjA5iWWTY5TnRtit0uI81WSdlZkoQzp80Q1oYDxiXoUFQyNszD7 RESUuh0KnRJxwOh4gcfcOAplIRNnIgZidJWKsg6nd2hjoTXFAyX7w6z1xIk/04dqRiWY YVW4zTdiziLQhcnaOGXZVY87n+sswCaLd0KQG/RGiBx+Q0Y1Nmn/nbAmCppyAAkpTpln msfe0quYlCHuxHRUv0L9loOQv4nC/DuHKJqup9plZfiNsPO0iNQ3XM8X2wonmq10WYXp 0A== Received: from iadpaimrmta03.imrmtpd1.prodappiadaev1.oraclevcn.com (iadpaimrmta03.appoci.oracle.com [130.35.103.27]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 3wf8gdc905-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 26 Feb 2024 09:46:11 +0000 Received: from pps.filterd (iadpaimrmta03.imrmtpd1.prodappiadaev1.oraclevcn.com [127.0.0.1]) by iadpaimrmta03.imrmtpd1.prodappiadaev1.oraclevcn.com (8.17.1.19/8.17.1.19) with ESMTP id 41Q7uv1Q013946; Mon, 26 Feb 2024 09:46:10 GMT Received: from nam10-mw2-obe.outbound.protection.outlook.com (mail-mw2nam10lp2100.outbound.protection.outlook.com [104.47.55.100]) by iadpaimrmta03.imrmtpd1.prodappiadaev1.oraclevcn.com (PPS) with ESMTPS id 3wgbdh59t3-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 26 Feb 2024 09:46:10 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=Xy8KUXY9c8kKult3FdWicjd7XkwEWzwVw3UxW1JR8xpmUzwVCbzCAZqjxLamFrevDKMwYn7GMyHQW0P5WGxLZMzFSkklhVazyXbWA2KxBJEn9rV0IpC3pVTy6N/6kVHt/0zj0tOOAZuTPtO3xSj+J/781e2OQqOkszbXkpe1NcNf4JC5Pk3PYd6xUHOW/j3/RBz+00MY3bLEWVW/PsC1bi6tlpj4Zfnaj/wZy0J8OG+IfSjETaqZmN9uQ/GUULe/DaQT5KOAsSgE1fIQjuassG8WLBlPNfCw+5gIX4O1q6PF9RrWo016eWXN9+Fb/GWLpEipw5CGO+dAjgvh1a7CjA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=BnwTHdPjRCfpYLu4wdGlxtr413Sa79LBHWc61E1r8m4=; b=XD2Zdo3CWMy7mALIYUrsAxIJemimQf21pSe8hwlMAULvyXz+kw2oAP8j2cpujEbD1FfJ3P8JgiTNc23JYQH7f3NEZ1WEMVQUFwUFwq6+GJyTDwOTD56bJHKNaH8tJGU7ytViiWqqh4K51DVoz3aq28MNrYLnk/KlnKA6PUxWYqrpcc4l4B4I0hlp1nDiPvJw5v876ssbEnnqtwKiiUtxvts+vdKEMoVWIHz5rUll8d0uxzRaYD8PEm+8fksLIuGVg9+pS9Wkce3/FQ4PWRSGQG9LqAuwnIPVnP06dPxS8DXHSG6Mm3FMSpm+xKh/pdO6N719x4wMZxpMycxuTjgumw== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=oracle.com; dmarc=pass action=none header.from=oracle.com; dkim=pass header.d=oracle.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.onmicrosoft.com; s=selector2-oracle-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=BnwTHdPjRCfpYLu4wdGlxtr413Sa79LBHWc61E1r8m4=; b=aTMr4IXCwvqzjMxRxBkDi0OGzU6mAqWRxqyipTAOJftr5+y/96dQCX6LAKK07+9sDdys9y37EDIzF8KRuD37X6JE0OmrNpOMkahfNHfDC4tzJav0N2OV6WnYnKU9nhLPGi+89+0Kkz1RGet67cn/ifbuW9B7qJRnjapbNvNVYP0= Received: from DM6PR10MB4313.namprd10.prod.outlook.com (2603:10b6:5:212::20) by DS7PR10MB5999.namprd10.prod.outlook.com (2603:10b6:8:9d::20) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7316.34; Mon, 26 Feb 2024 09:46:08 +0000 Received: from DM6PR10MB4313.namprd10.prod.outlook.com ([fe80::97a0:a2a2:315e:7aff]) by DM6PR10MB4313.namprd10.prod.outlook.com ([fe80::97a0:a2a2:315e:7aff%3]) with mapi id 15.20.7316.034; Mon, 26 Feb 2024 09:46:08 +0000 Message-ID: Date: Mon, 26 Feb 2024 09:46:02 +0000 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v4 07/11] block: Add fops atomic write support Content-Language: en-US To: "Ritesh Harjani (IBM)" , axboe@kernel.dk, kbusch@kernel.org, hch@lst.de, sagi@grimberg.me, jejb@linux.ibm.com, martin.petersen@oracle.com, djwong@kernel.org, viro@zeniv.linux.org.uk, brauner@kernel.org, dchinner@redhat.com, jack@suse.cz Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-nvme@lists.infradead.org, linux-fsdevel@vger.kernel.org, tytso@mit.edu, jbongio@google.com, linux-scsi@vger.kernel.org, ojaswin@linux.ibm.com, linux-aio@kvack.org, linux-btrfs@vger.kernel.org, io-uring@vger.kernel.org, nilay@linux.ibm.com References: <87cysk1u14.fsf@doe.com> From: John Garry Organization: Oracle Corporation In-Reply-To: <87cysk1u14.fsf@doe.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-ClientProxiedBy: LO4P265CA0125.GBRP265.PROD.OUTLOOK.COM (2603:10a6:600:2c6::11) To DM6PR10MB4313.namprd10.prod.outlook.com (2603:10b6:5:212::20) Precedence: bulk X-Mailing-List: io-uring@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: DM6PR10MB4313:EE_|DS7PR10MB5999:EE_ X-MS-Office365-Filtering-Correlation-Id: 47e313c7-4ad4-4c95-6800-08dc36afc1cf X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: 3QcyuUVHPnn0P8NadljAZfTJPzAZ+Pgi52GtlHy23vjYXAvSWtLfv2h4XD9Lsj1jiVFs2zARsbaO1zwwraIg78li/b3INNi3H3n2LoTYwRUbQdY/OEftunaZWMn+6MFPlmCqe9b6L4+s1vqMFcf60k6MjWtSCqVxtt66+R43DSqC4vw4585ZEFu/UgKzk+1p45RxI/0ysZBJCwXjL8Y1Z7n9ahAp6APvb4T1Srk38YkPJOEJK42nGtL1ucjXf7pp2OgmJJr71TtyCXoqrsHbENR78qVFNjtKboYu2os935rCexo96mRzel6gPva+FxebgNYVK/JrfPMnIxT1vXNpwQ1jMwVIvaO3zdsB2LqiHstg3hcte2M1n22Tvyy1U7jLqRg6OZVAh3Kd2kKZEqyZXlTKOeDsQ6QSmX0fX/KyYvG7i2G8moL+VHz0az7IkuxgADWuVplWQhlRbmZsG3rxFcTbF6QPJpFqjaUgcff25lCBWXIbbjKoIl2mq74Wh4o/Df5ki0Pfq4VJ0h/yYnxP1NIc43JJsWNr3M7NzAnAnYsdbhsis7BnUCaIn69JOdg7WwrtHDAArGXvvKcD+nZpcxYkziZPgkxwh7BkNsiGp5B/sDhIaZzf+OLBojxgR21HKd1mXKN/g3l7maLFYmYIeGYMcyzyEDDHGDQdt6raKFibspF7t+BIbxDUsm+ivWMYbzlPaCjOMe6oKgFm18cjeg== X-Forefront-Antispam-Report: CIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:DM6PR10MB4313.namprd10.prod.outlook.com;PTR:;CAT:NONE;SFS:(13230031)(921011);DIR:OUT;SFP:1101; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: =?utf-8?B?WWJOa3ZrNEFDem4waGlQYUNXQ0hGdEtzcEQ5dzhxUGFIVjgrRkJuZ0l3S1gv?= =?utf-8?B?aHJubWJJdkV3Y3ZuOFZYUzh6S3FSR1UzcmJTYUNlU1lIaDZRQnZLTEIxdmhl?= =?utf-8?B?anAvNHc1aW1Mc0x3cENNaTJVMnFFaG05Z09XOUsyL3Z3NXNxWXZjWWdIMmFv?= =?utf-8?B?K3lFTWFWekp0R1gwQ2pOU01od3p3azhpYzlwd0ZOUE16WWdEWVAwejlxcGxl?= =?utf-8?B?RmFSZGRSRnliL20vNVdXVmJhczU4SWRwajh3Y2ozZ1FLelhHNk9QN2VYczJi?= =?utf-8?B?R2lybnZHVy93TGFlYS8xNnoyRVNMMnZ0QWxEMlVaRjNTWHNUcTFaZTZTMzhR?= =?utf-8?B?Z2Y0VUV4ZTFkQmdpNFNmSHBJQk1iWmZ4TEhsQTJ2UlFkRHBnc2t4UjdYYUJQ?= =?utf-8?B?Q1dLM3ZmUnRJVUwzYVVrMDJZMXRrQVhQZTdKRHdSM3ZrejUzYklKeGxXdDVr?= =?utf-8?B?UXRSTXdoYXcrRlRIV0VLSVFUaWVXWklMcWlIQ2pvbjIwdGxIRXNsN0lDZE5T?= =?utf-8?B?ai9XWEltRHcrYVNqYnhPcS9UdzZiazdnSDFCMGVEYlUwa2dOKzhiZFdiZ25h?= =?utf-8?B?T3YyWlhxMnN4azlMZUpXeTFra01LL3FrV25kY0NHaWdZNGFiYkxsN2V5RmdL?= =?utf-8?B?ZVR2cmlnS25ETU5hNytpOTJRcjZlNzdsTDUzV0JwcXBJYXNXT3RSSnVMdzZz?= =?utf-8?B?VU1qdFNUdGptbzFVZkxVZXd4T3Ryb0JBTWZ5aDhIaHlWbTU1dS8yUUdPWlRM?= =?utf-8?B?cmppcWdlaVpjSStkRDFqMzhoR1QvZ0FTeFVEQXlUeWpTeWNLRHp3djRxblk4?= =?utf-8?B?WnYvYTFVQkE2QUxIWWVjYkxRSFR2emo0ZHFSeVBFUVBzbUZEYzdxbmIvdDVV?= =?utf-8?B?UXRaYWlVekx2KzR1YlZiSWFQYm80VlpHWURZNFZNREpXUW1FUGpkS1U3SE9Y?= =?utf-8?B?OVU2QW1FL2NZTEFXc2lvT0NBTThUQjMvVUZBbmVveEFjQ3E0cTZLcW94b1VS?= =?utf-8?B?TU4vTHZSQVducmJuM1E4YTZ1UWNRZW9YanhjZGJYaE1LWUMrRHQxTHJuUy96?= =?utf-8?B?dmNIbjczZFhWVEJQQklIalNIdER5TklPSWcrVG9YaGcrcHdkU0tnM0RjMjdN?= =?utf-8?B?SFAxSW5tSU45TFE1d3B2Smo5cGlZM2lpNkJFTDZ4ZUdnUkkxRDJSMVZyTFVP?= =?utf-8?B?TzlUWm1yNWdNTGhwRVZwZEs0aVhDajZja1BJZEFTUVQzTTVPYjlUbVBvUTl2?= =?utf-8?B?a0JxTGl1RmZIbTlLRXA3NDdkYVkyc0ltSlQwUlJDZWtYeVpHNUxTU2YzWkgv?= =?utf-8?B?TFQ1aGhONEd5NXlUOHFXdlZxRUI4OFkrS25kckVXdEs0Tm1oaUdBU3EwOFBK?= =?utf-8?B?a3FxQXZPb3hFK0lROGx4aFNjQklaajdRTjY3VHhZNWpNOXpJMDA0bml3ZG5B?= =?utf-8?B?RXMvbFFmL0phU3Z3blZsTkdNbG9MZU1UckZRRE5tVUQwNEpyTE1zOFdOMXlS?= =?utf-8?B?WGZtRDZCQUQ3eGsxRlpPRU1tS3BmeWJ6bGlrSTRnRFdmdmlITVRRNVU0U2Q3?= =?utf-8?B?N1NMa0J5TWhhUUpOd0lqY2ZFVXdqaG9XM1pKS1h4c040OEp4WnBEQWFCL2xB?= =?utf-8?B?aDhLU2I5WUx3VjFFblBVYXZZTGcwdjFqWkE0MDJBNW9lQU1WRE9QQXJzZ3di?= =?utf-8?B?NTBnZklPb3RTNWdmNkJCc09oV0loNjVLMUVxWENnU0l5ZHkrc1ZxTURqdUta?= =?utf-8?B?c3VtdEhDUGJoRVplRTcvaDFJcExvbDgzK2JlMFhoMC9vZDBFbklWNG1vcXZQ?= =?utf-8?B?dFpkaXgwK2lMekg2V0V6NHEyQXppTytGa0Q5S1pneHAweEFHTkRXS2ZqOW9y?= =?utf-8?B?dzZ0VDROenZLS1Q3L2ZXUWJUeFpBeUpOeTdmYjNFL0Jtb3hOcGNpelMwRlF2?= =?utf-8?B?akRja1ZaZm92YjB4QXBZV2ljcjNyTTlHRjNrNlQ2NW5KUkJCZmVaUWE4bFZR?= =?utf-8?B?U1ptWC8zRnBYeGRVaDQycVVyVVdlUUR2a2tQUlhDOXRmWUF1QjFZNGNRVEx2?= =?utf-8?B?a1hrYlMxYXpHb2E1V0VqNWpudXZHaEFlamcrZ0VpMWtnZVJ5cytpV2pxeEpo?= =?utf-8?B?bFdCb0orVHJQNDQzWVdrZ3RKakU3SFZrR0ZORnJONXBUUXpVbXgrZ0pxTGl0?= =?utf-8?B?U3c9PQ==?= X-MS-Exchange-AntiSpam-ExternalHop-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-ExternalHop-MessageData-0: 8NWtDyC4uPw8F/kCOmcKt1Rh3TEkej+N7dP6ui4b5MWo0ROmePkJBarrkoE2SZESZH1xA4KaJQNqsbaLZzAqvxTrkjSb/S83jNppyWsEW9/Q6TcrJx9E27c54LCjxawgQCkpt017a7pKDmGnBe2+YFDCVoJll8FBGYaIy6al88WNo6QSZWXvCk6jTev3i4mU/mCZv7POeJxX5OdaIUv3ySIqvSRK3KC99AlZ3y+CCyIBEosbDzcHRwMvDqLR0qjQWsQf3Uh1IeHatg457+Uj0glohAuhDuXe8EO7o5y7voUugcwn/vJ5eFVO3G7V4AZxR8dbKcLxNzSihfiU++PPFzYUscZZW1ft/SRQN/YSaoy+BwklkjXDQ3NfrdUL6AJ4eGGqc6nQst6/D/T4hgWpW2HXdcLwkb6iqOJjLZHFkSbrn3gE6anMNZGti2HIyjRvolb/Z2GaiCLqoamhHebTddgizcfNIgvgUdflzqHjs/jXfIb58s3rUTA9oeJpKWcLSKFhEC2HhSSC4OEJ/TfR9gXARJj6w4tQcGhLUatu8xD7YSz5VqnbYbZnjj9+zDP9gSpuwj1/9Yjs4S9L5T4yCNUvKlQF7AUAhrFGlsirIF4= X-OriginatorOrg: oracle.com X-MS-Exchange-CrossTenant-Network-Message-Id: 47e313c7-4ad4-4c95-6800-08dc36afc1cf X-MS-Exchange-CrossTenant-AuthSource: DM6PR10MB4313.namprd10.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-OriginalArrivalTime: 26 Feb 2024 09:46:08.0329 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-CrossTenant-Id: 4e2c6054-71cb-48f1-bd6c-3a9705aca71b X-MS-Exchange-CrossTenant-MailboxType: HOSTED X-MS-Exchange-CrossTenant-UserPrincipalName: vnP8xXVjG3qNgPw/1pR6t4byF0xaMgpJ0X7xDoQtUR6xVX1HJ/6Jc5eFMbEkG8Od8c+LcrmJ3tyVKnbNLEAYnA== X-MS-Exchange-Transport-CrossTenantHeadersStamped: DS7PR10MB5999 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.272,Aquarius:18.0.1011,Hydra:6.0.619,FMLib:17.11.176.26 definitions=2024-02-26_07,2024-02-23_01,2023-05-22_02 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 adultscore=0 bulkscore=0 mlxlogscore=999 malwarescore=0 phishscore=0 mlxscore=0 suspectscore=0 spamscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2311290000 definitions=main-2402260073 X-Proofpoint-ORIG-GUID: cPtHu2MNmApFdzGtXY2rIL5OcxrhQNYA X-Proofpoint-GUID: cPtHu2MNmApFdzGtXY2rIL5OcxrhQNYA On 25/02/2024 14:46, Ritesh Harjani (IBM) wrote: > John Garry writes: > >> Support atomic writes by submitting a single BIO with the REQ_ATOMIC set. >> >> It must be ensured that the atomic write adheres to its rules, like >> naturally aligned offset, so call blkdev_dio_invalid() -> >> blkdev_atomic_write_valid() [with renaming blkdev_dio_unaligned() to >> blkdev_dio_invalid()] for this purpose. >> >> In blkdev_direct_IO(), if the nr_pages exceeds BIO_MAX_VECS, then we cannot >> produce a single BIO, so error in this case. > > BIO_MAX_VECS is 256. So around 1MB limit with 4k pagesize. > Any mention of why this limit for now? Is it due to code complexity that > we only support a single bio? The reason is that lifting this limit adds extra complexity and I don't see any HW out there which supports a larger atomic write unit yet. And even if there was HW (which supports this larger size), is there a usecase for a larger atomic write unit? Nilay reports awupf = 63 for his controller: # lspci 0040:01:00.0 Non-Volatile memory controller: KIOXIA Corporation Device 0025 (rev 01) # nvme id-ctrl /dev/nvme0 -H NVME Identify Controller: vid : 0x1e0f ssvid : 0x1014 sn : Z130A00LTGZ8 mn : 800GB NVMe Gen4 U.2 SSD fr : REV.C9S2 [...] awun : 65535 awupf : 63 [...] And SCSI device I know which supports atomic writes can only handle 32KB max. > As I see it, you have still enabled req merging in block layer for > atomic requests. So it can essentially submit bio chains to the device > driver? So why not support this case for user to submit a req. larger > than 1 MB? Indeed, we could try to lift this limit and submit larger bios or chains of bios for a single atomic write from userspace, but do we need it now? Please also remember that we are always limited by the request queue DMA capabilities also. > >> >> Finally set FMODE_CAN_ATOMIC_WRITE when the bdev can support atomic writes >> and the associated file flag is for O_DIRECT. >> >> Signed-off-by: John Garry >> --- >> block/fops.c | 31 ++++++++++++++++++++++++++++--- >> 1 file changed, 28 insertions(+), 3 deletions(-) >> >> diff --git a/block/fops.c b/block/fops.c >> index 28382b4d097a..563189c2fc5a 100644 >> --- a/block/fops.c >> +++ b/block/fops.c >> @@ -34,13 +34,27 @@ static blk_opf_t dio_bio_write_op(struct kiocb *iocb) >> return opf; >> } >> >> -static bool blkdev_dio_unaligned(struct block_device *bdev, loff_t pos, >> - struct iov_iter *iter) >> +static bool blkdev_atomic_write_valid(struct block_device *bdev, loff_t pos, >> + struct iov_iter *iter) >> { >> + struct request_queue *q = bdev_get_queue(bdev); >> + unsigned int min_bytes = queue_atomic_write_unit_min_bytes(q); >> + unsigned int max_bytes = queue_atomic_write_unit_max_bytes(q); >> + >> + return atomic_write_valid(pos, iter, min_bytes, max_bytes); > > generic_atomic_write_valid() would be better for this function. However, > I have any commented about this in some previous ok > >> +} >> + >> +static bool blkdev_dio_invalid(struct block_device *bdev, loff_t pos, >> + struct iov_iter *iter, bool atomic_write) > > bool "is_atomic" or "is_atomic_write" perhaps? > we anyway know that we only support atomic writes and RWF_ATOMIC > operation is made -EOPNOTSUPP for reads in kiocb_set_rw_flags(). > So we may as well make it "is_atomic" for bools. ok > >> +{ >> + if (atomic_write && !blkdev_atomic_write_valid(bdev, pos, iter)) >> + return true; >> + >> return pos & (bdev_logical_block_size(bdev) - 1) || >> !bdev_iter_is_aligned(bdev, iter); >> } >> >> + >> #define DIO_INLINE_BIO_VECS 4 >> >> static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb, >> @@ -71,6 +85,8 @@ static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb, >> } >> bio.bi_iter.bi_sector = pos >> SECTOR_SHIFT; >> bio.bi_ioprio = iocb->ki_ioprio; >> + if (iocb->ki_flags & IOCB_ATOMIC) >> + bio.bi_opf |= REQ_ATOMIC; >> >> ret = bio_iov_iter_get_pages(&bio, iter); >> if (unlikely(ret)) >> @@ -341,6 +357,9 @@ static ssize_t __blkdev_direct_IO_async(struct kiocb *iocb, >> task_io_account_write(bio->bi_iter.bi_size); >> } >> >> + if (iocb->ki_flags & IOCB_ATOMIC) >> + bio->bi_opf |= REQ_ATOMIC; >> + >> if (iocb->ki_flags & IOCB_NOWAIT) >> bio->bi_opf |= REQ_NOWAIT; >> >> @@ -357,13 +376,14 @@ static ssize_t __blkdev_direct_IO_async(struct kiocb *iocb, >> static ssize_t blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter) >> { >> struct block_device *bdev = I_BDEV(iocb->ki_filp->f_mapping->host); >> + bool atomic_write = iocb->ki_flags & IOCB_ATOMIC; > > ditto, bool is_atomic perhaps? ok > >> loff_t pos = iocb->ki_pos; >> unsigned int nr_pages; >> >> if (!iov_iter_count(iter)) >> return 0; >> >> - if (blkdev_dio_unaligned(bdev, pos, iter)) >> + if (blkdev_dio_invalid(bdev, pos, iter, atomic_write)) >> return -EINVAL; >> >> nr_pages = bio_iov_vecs_to_alloc(iter, BIO_MAX_VECS + 1); >> @@ -371,6 +391,8 @@ static ssize_t blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter) >> if (is_sync_kiocb(iocb)) >> return __blkdev_direct_IO_simple(iocb, iter, nr_pages); >> return __blkdev_direct_IO_async(iocb, iter, nr_pages); >> + } else if (atomic_write) { >> + return -EINVAL; >> } >> return __blkdev_direct_IO(iocb, iter, bio_max_segs(nr_pages)); >> } >> @@ -616,6 +638,9 @@ static int blkdev_open(struct inode *inode, struct file *filp) >> if (bdev_nowait(handle->bdev)) >> filp->f_mode |= FMODE_NOWAIT; >> >> + if (bdev_can_atomic_write(handle->bdev) && filp->f_flags & O_DIRECT) >> + filp->f_mode |= FMODE_CAN_ATOMIC_WRITE; >> + >> filp->f_mapping = handle->bdev->bd_inode->i_mapping; >> filp->f_wb_err = filemap_sample_wb_err(filp->f_mapping); >> filp->private_data = handle; >> -- >> 2.31.1 Thanks, John