Subject: Re: [PATCH v2 4/4] io_uring: pre-increment f_pos on rw
Date: Mon, 21 Feb 2022 18:00:17 +0000
From: Pavel Begunkov
To: Dylan Yudaken, Jens Axboe, io-uring@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, kernel-team@fb.com
References: <20220221141649.624233-1-dylany@fb.com> <20220221141649.624233-5-dylany@fb.com>
In-Reply-To: <20220221141649.624233-5-dylany@fb.com>

On 2/21/22 14:16, Dylan Yudaken wrote:
> In read/write ops, pre-increment f_pos when no offset is specified, and
> then attempt to fix up the position after IO completes if it completed
> less than expected. This fixes the problem where multiple queued-up IOs
> will all obtain the same f_pos, and so perform the same read/write.
>
> This is still not as consistent as sync r/w, as it is able to advance
> the file offset past the end of the file. It seems it would be quite a
> performance hit to work around this limitation - such as by keeping
> track of concurrent operations - and the downside does not seem to be
> too problematic.
>
> The attempt to fix up the f_pos afterwards will at least mean that in
> situations where a single operation is run, the position will be
> consistent.
>
> Co-developed-by: Jens Axboe
> Signed-off-by: Jens Axboe
> Signed-off-by: Dylan Yudaken
> ---
>  fs/io_uring.c | 81 ++++++++++++++++++++++++++++++++++++++++++---------
>  1 file changed, 68 insertions(+), 13 deletions(-)
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index abd8c739988e..a951d0754899 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -3066,21 +3066,71 @@ static inline void io_rw_done(struct kiocb *kiocb, ssize_t ret)
[...]
> +			return false;
>  		}
>  	}
> -	return is_stream ? NULL : &kiocb->ki_pos;
> +	*ppos = is_stream ? NULL : &kiocb->ki_pos;
> +	return false;
> +}
> +
> +static inline void
> +io_kiocb_done_pos(struct io_kiocb *req, struct kiocb *kiocb, u64 actual)

That's a lot of inlining; I wouldn't be surprised if the compiler even
refuses to do it. Better to keep only the flag check inline:

__io_kiocb_done_pos()
{
	// rest of it
}

inline io_kiocb_done_pos()
{
	if (!(flags & CUR_POS))
		return;
	__io_kiocb_done_pos();
}
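With the signature this patch uses, roughly the following - untested, just
to show the split I mean, and __io_kiocb_done_pos() is only a placeholder
name for the out-of-line part:

static void __io_kiocb_done_pos(struct io_kiocb *req, struct kiocb *kiocb,
				u64 actual)
{
	/* the trylock / compare / roll-back body from this patch goes here */
}

static inline void io_kiocb_done_pos(struct io_kiocb *req, struct kiocb *kiocb,
				     u64 actual)
{
	/* common case, REQ_F_CUR_POS not set: nothing to fix up */
	if (likely(!(req->flags & REQ_F_CUR_POS)))
		return;
	__io_kiocb_done_pos(req, kiocb, actual);
}

That keeps the hot path to a single flag test and leaves the rest to the
compiler's judgement.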
io_kiocb_update_pos() is huge as well.

> +{
> +	u64 expected;
> +
> +	if (likely(!(req->flags & REQ_F_CUR_POS)))
> +		return;
> +
> +	expected = req->rw.len;
> +	if (actual >= expected)
> +		return;
> +
> +	/*
> +	 * It's not definitely safe to lock here, and the assumption is
> +	 * that if we cannot lock the position then it will be changing,
> +	 * and if it will be changing - then we can't update it anyway
> +	 */
> +	if (req->file->f_mode & FMODE_ATOMIC_POS
> +	    && !mutex_trylock(&req->file->f_pos_lock))
> +		return;
> +
> +	/*
> +	 * now we want to move the pointer, but only if everything is
> +	 * consistent with how we left it originally
> +	 */
> +	if (req->file->f_pos == kiocb->ki_pos + (expected - actual))
> +		req->file->f_pos = kiocb->ki_pos;

I wonder, is it good enough / safe to just assign it, considering that
the request was executed outside of the lock? What about vfs_seek()?

> +
> +	/* else something else messed with f_pos and we can't do anything */
> +
> +	if (req->file->f_mode & FMODE_ATOMIC_POS)
> +		mutex_unlock(&req->file->f_pos_lock);
> }

Do we even care about races while reading it? E.g.

	pos = READ_ONCE();

>
> -	ppos = io_kiocb_update_pos(req, kiocb);
> -
>  	ret = rw_verify_area(READ, req->file, ppos, req->result);
>  	if (unlikely(ret)) {
>  		kfree(iovec);
> +		io_kiocb_done_pos(req, kiocb, 0);

Why do we update it on failure?

[...]

> -	ppos = io_kiocb_update_pos(req, kiocb);
> -
>  	ret = rw_verify_area(WRITE, req->file, ppos, req->result);
>  	if (unlikely(ret))
>  		goto out_free;
> @@ -3858,6 +3912,7 @@ static int io_write(struct io_kiocb *req, unsigned int issue_flags)
>  		return ret ?: -EAGAIN;
>  	}
>  out_free:
> +	io_kiocb_done_pos(req, kiocb, 0);

Looks weird. It appears we don't need it on failure, and successes are
covered by kiocb_done() / ->ki_complete.

>  	/* it's reportedly faster than delegating the null check to kfree() */
>  	if (iovec)
>  		kfree(iovec);

-- 
Pavel Begunkov