Re: [PATCH v2 4/4] io_uring: pre-increment f_pos on rw

public inbox for [email protected]

From: Pavel Begunkov <[email protected]>
To: Dylan Yudaken <[email protected]>, Jens Axboe <[email protected]>,
	[email protected]
Cc: [email protected], [email protected]
Subject: Re: [PATCH v2 4/4] io_uring: pre-increment f_pos on rw
Date: Mon, 21 Feb 2022 18:00:17 +0000
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>

On 2/21/22 14:16, Dylan Yudaken wrote:
> In read/write ops, pre-increment f_pos when no offset is specified, and
> then attempt to fix up the position after IO completes if it completed less
> than expected. This fixes the problem where multiple queued up IO will all
> obtain the same f_pos, and so perform the same read/write.
> 
> This is still not as consistent as sync r/w, as it is able to advance the
> file offset past the end of the file. It seems it would be quite a
> performance hit to work around this limitation - such as by keeping track
> of concurrent operations - and the downside does not seem to be too
> problematic.
> 
> Attempting to fix up f_pos afterwards at least means that, in situations
> where a single operation is run, the position will be consistent.
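To make the scheme concrete, here is a minimal userspace model (plain C, not kernel code). The types model_file/model_kiocb and the functions model_issue()/model_complete() are hypothetical stand-ins for struct file, struct kiocb and the patch's update/done helpers; locking is left out. It assumes, as the fix-up check in the patch implies, that a completed I/O has advanced ki_pos by the number of bytes actually transferred.

```c
/*
 * Hypothetical userspace model of the pre-increment/fix-up scheme.
 * Not kernel code: no locking, no real I/O.
 */
struct model_file {
	long long f_pos;	/* shared file position (stands in for f_pos) */
};

struct model_kiocb {
	long long ki_pos;	/* this request's position (stands in for ki_pos) */
	long long len;		/* requested length (stands in for req->rw.len) */
};

/* Issue: reserve [f_pos, f_pos + len) for this request and pre-increment. */
static void model_issue(struct model_file *f, struct model_kiocb *k,
			long long len)
{
	k->ki_pos = f->f_pos;
	k->len = len;
	f->f_pos += len;
}

/*
 * Complete: the I/O advances ki_pos by the bytes actually transferred.
 * On a short transfer, pull f_pos back, but only if it still equals the
 * value this request set at issue time.
 */
static void model_complete(struct model_file *f, struct model_kiocb *k,
			   long long actual)
{
	k->ki_pos += actual;
	if (actual >= k->len)
		return;
	if (f->f_pos == k->ki_pos + (k->len - actual))
		f->f_pos = k->ki_pos;
}
```

With two 4096-byte reads queued from offset 0, they get offsets 0 and 4096 and f_pos becomes 8192. If the read issued last returns only 100 bytes, f_pos is pulled back to 4196. If instead the first read comes up short while the second is still outstanding, f_pos no longer matches and stays at 8192, which is the "advance past the end of the file" limitation described above.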
> 
> Co-developed-by: Jens Axboe <[email protected]>
> Signed-off-by: Jens Axboe <[email protected]>
> Signed-off-by: Dylan Yudaken <[email protected]>
> ---
>   fs/io_uring.c | 81 ++++++++++++++++++++++++++++++++++++++++++---------
>   1 file changed, 68 insertions(+), 13 deletions(-)
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index abd8c739988e..a951d0754899 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -3066,21 +3066,71 @@ static inline void io_rw_done(struct kiocb *kiocb, ssize_t ret)

[...]

> +			return false;
>   		}
>   	}
> -	return is_stream ? NULL : &kiocb->ki_pos;
> +	*ppos = is_stream ? NULL : &kiocb->ki_pos;
> +	return false;
> +}
> +
> +static inline void
> +io_kiocb_done_pos(struct io_kiocb *req, struct kiocb *kiocb, u64 actual)

That's a lot of inlining; I wouldn't be surprised if the compiler refused
to do it. Something like:

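static void __io_kiocb_done_pos(struct io_kiocb *req, struct kiocb *kiocb,
				u64 actual)
{
	/* rest of it: length check, f_pos locking and fix-up */
}

static inline void io_kiocb_done_pos(struct io_kiocb *req,
				     struct kiocb *kiocb, u64 actual)
{
	if (likely(!(req->flags & REQ_F_CUR_POS)))
		return;
	__io_kiocb_done_pos(req, kiocb, actual);
}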

io_kiocb_update_pos() is huge as well and could be split the same way.
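The fast-path/slow-path split can be tried out in userspace. A minimal sketch, assuming GCC or Clang (__builtin_expect and __attribute__((noinline)) are compiler extensions); model_req, MODEL_F_CUR_POS and both functions are hypothetical, and the slow path only counts how often it is entered instead of doing the real fix-up. model_done_pos_slow() plays the role of the __-prefixed helper, since __ names are reserved in userspace C.

```c
/*
 * Hypothetical userspace model of an inline fast path with an
 * out-of-line slow path. Requires GCC or Clang.
 */
#ifndef likely
#define likely(x) __builtin_expect(!!(x), 1)
#endif

#define MODEL_F_CUR_POS (1u << 0)	/* stands in for REQ_F_CUR_POS */

struct model_req {
	unsigned int flags;
	unsigned int slow_calls;	/* how often the slow path ran */
};

/* Slow path: kept out of line so call sites stay small. */
static __attribute__((noinline)) void model_done_pos_slow(struct model_req *req)
{
	/* the length check, f_pos locking and fix-up would go here */
	req->slow_calls++;
}

/* Fast path: only the flag test is inlined into each caller. */
static inline void model_done_pos(struct model_req *req)
{
	if (likely(!(req->flags & MODEL_F_CUR_POS)))
		return;
	model_done_pos_slow(req);
}
```

Call sites then carry only the flag test; requests without the flag never leave the fast path.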

> +{
> +	u64 expected;
> +
> +	if (likely(!(req->flags & REQ_F_CUR_POS)))
> +		return;
> +
> +	expected = req->rw.len;
> +	if (actual >= expected)
> +		return;
> +
> +	/*
> +	 * It's not definitely safe to lock here, and the assumption is,
> +	 * that if we cannot lock the position that it will be changing,
> +	 * and if it will be changing - then we can't update it anyway
> +	 */
> +	if (req->file->f_mode & FMODE_ATOMIC_POS
> +		&& !mutex_trylock(&req->file->f_pos_lock))
> +		return;
> +
> +	/*
> +	 * now we want to move the pointer, but only if everything is consistent
> +	 * with how we left it originally
> +	 */
> +	if (req->file->f_pos == kiocb->ki_pos + (expected - actual))
> +		req->file->f_pos = kiocb->ki_pos;

I wonder whether it is good enough, or even safe, to just assign it,
considering that the request was executed outside of the lock. Should
this go through something like vfs_llseek()?
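A userspace sketch of the lock-or-skip logic in the quoted hunk, assuming pthreads. model_file, its atomic_pos flag (standing in for FMODE_ATOMIC_POS), pos_lock (standing in for f_pos_lock) and model_fixup_pos() are hypothetical; ki_pos is taken to be the position after the I/O advanced it by actual. It only mirrors the patch as posted and does not settle the question above.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of the file-side state touched by the fix-up. */
struct model_file {
	long long f_pos;
	bool atomic_pos;		/* stands in for FMODE_ATOMIC_POS */
	pthread_mutex_t pos_lock;	/* stands in for f_pos_lock */
};

/* Returns true if f_pos was rolled back to ki_pos. */
static bool model_fixup_pos(struct model_file *f, long long ki_pos,
			    long long expected, long long actual)
{
	bool fixed = false;

	if (actual >= expected)
		return false;
	/* Lock contended: assume f_pos is about to change and give up. */
	if (f->atomic_pos && pthread_mutex_trylock(&f->pos_lock) != 0)
		return false;
	/* Only roll back if f_pos is still where this request left it. */
	if (f->f_pos == ki_pos + (expected - actual)) {
		f->f_pos = ki_pos;
		fixed = true;
	}
	if (f->atomic_pos)
		pthread_mutex_unlock(&f->pos_lock);
	return fixed;
}
```

With the mutex held elsewhere, pthread_mutex_trylock() fails with EBUSY and the fix-up is skipped, matching the patch comment's assumption that a position we cannot lock is about to change anyway.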

> +
> +	/* else something else messed with f_pos and we can't do anything */
> +
> +	if (req->file->f_mode & FMODE_ATOMIC_POS)
> +		mutex_unlock(&req->file->f_pos_lock);
>   }

Do we even care about races while reading it? If we do, it should at
least be a marked read, e.g. READ_ONCE(req->file->f_pos).
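For files without FMODE_ATOMIC_POS the quoted code reads f_pos with no lock held. In C11 terms a plain concurrent read and write would be a data race; a relaxed atomic load is the closest userspace analogue of READ_ONCE(). A minimal sketch under that assumption; model_file and model_fixup_pos_lockless() are hypothetical:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: f_pos is _Atomic so unlocked access is not a race. */
struct model_file {
	_Atomic long long f_pos;
};

/*
 * Lockless variant of the fix-up check. The load is marked (relaxed),
 * which is roughly what READ_ONCE() gives in the kernel. The check and
 * the store are still two separate steps.
 */
static bool model_fixup_pos_lockless(struct model_file *f, long long ki_pos,
				     long long expected, long long actual)
{
	long long pos;

	if (actual >= expected)
		return false;
	pos = atomic_load_explicit(&f->f_pos, memory_order_relaxed);
	if (pos != ki_pos + (expected - actual))
		return false;
	atomic_store_explicit(&f->f_pos, ki_pos, memory_order_relaxed);
	return true;
}
```

Marking the load only makes the read well-defined; a concurrent writer can still change f_pos between the check and the store. Closing that window would need a compare-and-swap or the lock.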

>   
> -	ppos = io_kiocb_update_pos(req, kiocb);
> -
>   	ret = rw_verify_area(READ, req->file, ppos, req->result);
>   	if (unlikely(ret)) {
>   		kfree(iovec);
> +		io_kiocb_done_pos(req, kiocb, 0);

Why do we update it on failure?
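What the added call does on this path, modeled in userspace (hypothetical names, no locking): rw_verify_area() fails before any I/O is issued, so ki_pos is still the start offset and actual is 0. The condition then reduces to f_pos == ki_pos + len, i.e. the call undoes the whole pre-increment, but only if no other request advanced f_pos in the meantime.

```c
/* Hypothetical userspace model; not kernel code. */
struct model_file {
	long long f_pos;
};

struct model_kiocb {
	long long ki_pos;
	long long len;
};

/* Issue-time pre-increment, as in the patch. */
static void model_issue(struct model_file *f, struct model_kiocb *k,
			long long len)
{
	k->ki_pos = f->f_pos;
	k->len = len;
	f->f_pos += len;
}

/*
 * Mirrors io_kiocb_done_pos(): k->ki_pos must already reflect any bytes
 * transferred (none on the failure path).
 */
static void model_done_pos(struct model_file *f,
			   const struct model_kiocb *k, long long actual)
{
	if (actual >= k->len)
		return;
	if (f->f_pos == k->ki_pos + (k->len - actual))
		f->f_pos = k->ki_pos;
}
```

A lone failed 4096-byte request from offset 0 restores f_pos to 0. If a second request was queued behind it, f_pos stays at 8192, advanced past a range that was never read.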

[...]

> -	ppos = io_kiocb_update_pos(req, kiocb);
> -
>   	ret = rw_verify_area(WRITE, req->file, ppos, req->result);
>   	if (unlikely(ret))
>   		goto out_free;
> @@ -3858,6 +3912,7 @@ static int io_write(struct io_kiocb *req, unsigned int issue_flags)
>   		return ret ?: -EAGAIN;
>   	}
>   out_free:
> +	io_kiocb_done_pos(req, kiocb, 0);

This looks odd. It appears we don't need it on failure, and successes
are already covered by kiocb_done() / ->ki_complete.

>   	/* it's reportedly faster than delegating the null check to kfree() */
>   	if (iovec)
>   		kfree(iovec);

-- 
Pavel Begunkov

Thread overview: 13+ messages
2022-02-21 14:16 [PATCH v2 0/4] io_uring: consistent behaviour with linked read/write Dylan Yudaken
2022-02-21 14:16 ` [PATCH v2 1/4] io_uring: remove duplicated calls to io_kiocb_ppos Dylan Yudaken
2022-02-21 14:16 ` [PATCH v2 2/4] io_uring: update kiocb->ki_pos at execution time Dylan Yudaken
2022-02-21 16:32   ` Jens Axboe
2022-02-21 14:16 ` [PATCH v2 3/4] io_uring: do not recalculate ppos unnecessarily Dylan Yudaken
2022-02-21 14:16 ` [PATCH v2 4/4] io_uring: pre-increment f_pos on rw Dylan Yudaken
2022-02-21 18:00   ` Pavel Begunkov [this message]
2022-02-22  7:20     ` Hao Xu
2022-02-22  8:26     ` Dylan Yudaken
2022-02-22  7:34   ` Hao Xu
2022-02-22 10:52     ` Dylan Yudaken
2022-02-21 16:33 ` [PATCH v2 0/4] io_uring: consistent behaviour with linked read/write Jens Axboe
2022-02-21 17:48   ` Dylan Yudaken