public inbox for [email protected]
From: "Darrick J. Wong" <[email protected]>
To: Stefan Roesch <[email protected]>
Cc: [email protected], [email protected], [email protected],
	[email protected], [email protected],
	[email protected], [email protected], [email protected]
Subject: Re: [PATCH v6 05/16] iomap: Add async buffered write support
Date: Thu, 26 May 2022 11:42:24 -0700
Message-ID: <Yo/KEFfqcl3H4+q/@magnolia>
In-Reply-To: <[email protected]>

On Thu, May 26, 2022 at 10:38:29AM -0700, Stefan Roesch wrote:
> This adds async buffered write support to iomap.
> 
> This replaces the call to balance_dirty_pages_ratelimited() with a
> call to balance_dirty_pages_ratelimited_flags(). This allows the caller
> to specify whether the write request is async or not.
> 
> In addition this also moves the above function call to the beginning of
> the function. If the function call is at the end of the function and the
> decision is made to throttle writes, then there is no request that
> io-uring can wait on. By moving it to the beginning of the function, the
> write request is not issued, but returns -EAGAIN instead. io-uring will
> punt the request and process it in the io-worker.
> 
> By moving the function call to the beginning of the function, the write
> throttling will happen one page later.

It does?  I would have thought that moving it before the
iomap_write_begin() call would make the throttling happen one page
sooner, not later.  Sorry if I'm being dense here...
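
The "sooner" versus "later" question can be made concrete with a small
userspace model. This is only a sketch under simplifying assumptions: it
replaces the real rate-limited dirty accounting with a plain counter and
a fixed limit, and every name in it (model_write_loop, check_pos,
throttle_result) is hypothetical and not part of the kernel.

```c
#include <assert.h>

/* Where the (modelled) throttle check sits inside the write loop. */
enum check_pos { CHECK_AT_END, CHECK_AT_START };

struct throttle_result {
	int iteration;     /* 1-based iteration in which the throttle fired, 0 if never */
	int pages_dirtied; /* pages this writer dirtied before the throttle fired */
};

/*
 * Model of a per-page write loop. "already_dirty" is the number of dirty
 * pages before the write starts; the throttle fires once the dirty count
 * reaches "limit". This is an assumption-laden stand-in for the real
 * dirty page balancing, which is rate limited and far more involved.
 */
struct throttle_result model_write_loop(enum check_pos where, int already_dirty,
					int limit, int nr_pages)
{
	struct throttle_result res = { 0, 0 };
	int dirty = already_dirty;

	assert(limit > 0 && nr_pages >= 0 && already_dirty >= 0);

	for (int it = 1; it <= nr_pages; it++) {
		/* New placement: check before touching the page. */
		if (where == CHECK_AT_START && dirty >= limit) {
			res.iteration = it;
			return res;
		}

		dirty++;             /* the page is written and becomes dirty */
		res.pages_dirtied++;

		/* Old placement: check after the page has been dirtied. */
		if (where == CHECK_AT_END && dirty >= limit) {
			res.iteration = it;
			return res;
		}
	}
	return res;
}
```

In this model, a writer that starts below the limit dirties the same
number of pages either way, but the check at the start fires one loop
iteration later. A writer that starts at or above the limit is stopped
before it dirties anything when the check is at the start, and only
after one page when the check is at the end. Both readings of the commit
message therefore show up, depending on the starting state.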

> Signed-off-by: Stefan Roesch <[email protected]>
> Reviewed-by: Jan Kara <[email protected]>
> ---
>  fs/iomap/buffered-io.c | 31 +++++++++++++++++++++++++++----
>  1 file changed, 27 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index d6ddc54e190e..2281667646d2 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -559,6 +559,7 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
>  	loff_t block_size = i_blocksize(iter->inode);
>  	loff_t block_start = round_down(pos, block_size);
>  	loff_t block_end = round_up(pos + len, block_size);
> +	unsigned int nr_blocks = i_blocks_per_folio(iter->inode, folio);
>  	size_t from = offset_in_folio(folio, pos), to = from + len;
>  	size_t poff, plen;
>  
> @@ -567,6 +568,8 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
>  	folio_clear_error(folio);
>  
>  	iop = iomap_page_create(iter->inode, folio, iter->flags);
> +	if ((iter->flags & IOMAP_NOWAIT) && !iop && nr_blocks > 1)
> +		return -EAGAIN;
>  
>  	do {
>  		iomap_adjust_read_range(iter->inode, folio, &block_start,
> @@ -584,7 +587,12 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
>  				return -EIO;
>  			folio_zero_segments(folio, poff, from, to, poff + plen);
>  		} else {
> -			int status = iomap_read_folio_sync(block_start, folio,
> +			int status;
> +
> +			if (iter->flags & IOMAP_NOWAIT)
> +				return -EAGAIN;
> +
> +			status = iomap_read_folio_sync(block_start, folio,
>  					poff, plen, srcmap);
>  			if (status)
>  				return status;
> @@ -613,6 +621,9 @@ static int iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
>  	unsigned fgp = FGP_LOCK | FGP_WRITE | FGP_CREAT | FGP_STABLE | FGP_NOFS;
>  	int status = 0;
>  
> +	if (iter->flags & IOMAP_NOWAIT)
> +		fgp |= FGP_NOWAIT;

FGP_NOWAIT can cause __filemap_get_folio to return a NULL folio, which
makes iomap_write_begin return -ENOMEM.  If nothing has been written
yet, won't that cause the ENOMEM to escape to userspace?  Why do we want
that instead of EAGAIN?
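
To illustrate the concern, here is a userspace sketch of the error
mapping in question. It is not kernel code and not necessarily how the
patch should be fixed: MODEL_NOWAIT, struct model_folio and both
functions are hypothetical stand-ins. The only thing taken from the
patch is the "written ? written : status" return at the end of
iomap_write_iter().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_NOWAIT 0x1u /* hypothetical stand-in for IOMAP_NOWAIT */

struct model_folio { int unused; }; /* placeholder for a page cache folio */

/*
 * Model of the status chosen after the folio lookup. A NULL folio means
 * the lookup could not produce one. For a non-blocking request this
 * sketch reports -EAGAIN, so the caller can retry from a context that is
 * allowed to block; otherwise it keeps -ENOMEM.
 */
int model_write_begin_status(const struct model_folio *folio, unsigned int flags)
{
	if (folio)
		return 0;
	return (flags & MODEL_NOWAIT) ? -EAGAIN : -ENOMEM;
}

/*
 * Same shape as the final return in iomap_write_iter(): an error status
 * only reaches the caller if nothing has been written yet.
 */
long model_write_iter_result(long written, int status)
{
	return written ? written : status;
}
```

With zero bytes written, whatever status the lookup produced is what the
caller sees, which is why the choice between ENOMEM and EAGAIN matters
for the first page of a non-blocking write.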

> +
>  	BUG_ON(pos + len > iter->iomap.offset + iter->iomap.length);
>  	if (srcmap != &iter->iomap)
>  		BUG_ON(pos + len > srcmap->offset + srcmap->length);
> @@ -750,6 +761,8 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
>  	loff_t pos = iter->pos;
>  	ssize_t written = 0;
>  	long status = 0;
> +	struct address_space *mapping = iter->inode->i_mapping;
> +	unsigned int bdp_flags = (iter->flags & IOMAP_NOWAIT) ? BDP_ASYNC : 0;
>  
>  	do {
>  		struct folio *folio;
> @@ -762,6 +775,11 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
>  		bytes = min_t(unsigned long, PAGE_SIZE - offset,
>  						iov_iter_count(i));
>  again:
> +		status = balance_dirty_pages_ratelimited_flags(mapping,
> +							       bdp_flags);
> +		if (unlikely(status))
> +			break;
> +
>  		if (bytes > length)
>  			bytes = length;
>  
> @@ -770,6 +788,10 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
>  		 * Otherwise there's a nasty deadlock on copying from the
>  		 * same page as we're writing to, without it being marked
>  		 * up-to-date.
> +		 *
> +		 * For async buffered writes the assumption is that the user
> +		 * page has already been faulted in. This can be optimized by
> +		 * faulting the user page in the prepare phase of io-uring.

I don't think this pattern is unique to async writes with io_uring --
gfs2 also wanted this "try as many pages as you can until you hit a page
fault and then return a short write to caller so it can fault in the
rest" behavior.
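
The "short write, fault in, retry" pattern can be sketched in userspace
as follows. This is a toy model under stated assumptions: residency of
the source pages is tracked in a plain bool array, "faulting in" is just
flipping that flag, and all names (model_copy_until_fault,
model_write_all, MODEL_PAGE_SIZE) are hypothetical. It is not the gfs2
or iomap implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u /* assumed page size for the model */

/*
 * Copy whole pages from src to dst for as long as the source page is
 * resident. Stop at the first non-resident page and return the number
 * of bytes copied so far, i.e. a short write.
 */
size_t model_copy_until_fault(char *dst, const char *src,
			      const bool *resident, size_t nr_pages)
{
	size_t copied = 0;

	for (size_t i = 0; i < nr_pages; i++) {
		if (!resident[i])
			break; /* would fault: report what we have */
		memcpy(dst + copied, src + copied, MODEL_PAGE_SIZE);
		copied += MODEL_PAGE_SIZE;
	}
	return copied;
}

/*
 * Caller side: on a short write, "fault in" the page that stopped the
 * copy (here: mark it resident) and retry from where the copy stopped.
 * The number of retries is reported through *retries.
 */
size_t model_write_all(char *dst, const char *src, bool *resident,
		       size_t nr_pages, unsigned int *retries)
{
	const size_t total = nr_pages * MODEL_PAGE_SIZE;
	size_t done = 0;

	*retries = 0;
	while (done < total) {
		size_t page = done / MODEL_PAGE_SIZE;

		done += model_copy_until_fault(dst + done, src + done,
					       resident + page, nr_pages - page);
		if (done < total) {
			resident[done / MODEL_PAGE_SIZE] = true; /* modelled fault-in */
			(*retries)++;
		}
	}
	return done;
}
```

The inner function never blocks; all the waiting is pushed out to the
caller, which is the property both the io_uring and the gfs2 use cases
appear to want.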

--D

>  		 */
>  		if (unlikely(fault_in_iov_iter_readable(i, bytes) == bytes)) {
>  			status = -EFAULT;
> @@ -781,7 +803,7 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
>  			break;
>  
>  		page = folio_file_page(folio, pos >> PAGE_SHIFT);
> -		if (mapping_writably_mapped(iter->inode->i_mapping))
> +		if (mapping_writably_mapped(mapping))
>  			flush_dcache_page(page);
>  
>  		copied = copy_page_from_iter_atomic(page, offset, bytes, i);
> @@ -806,8 +828,6 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
>  		pos += status;
>  		written += status;
>  		length -= status;
> -
> -		balance_dirty_pages_ratelimited(iter->inode->i_mapping);
>  	} while (iov_iter_count(i) && length);
>  
>  	return written ? written : status;
> @@ -825,6 +845,9 @@ iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *i,
>  	};
>  	int ret;
>  
> +	if (iocb->ki_flags & IOCB_NOWAIT)
> +		iter.flags |= IOMAP_NOWAIT;
> +
>  	while ((ret = iomap_iter(&iter, ops)) > 0)
>  		iter.processed = iomap_write_iter(&iter, i);
>  	if (iter.pos == iocb->ki_pos)
> -- 
> 2.30.2
> 
