From: "Darrick J. Wong" <[email protected]>
To: Stefan Roesch <[email protected]>
Cc: [email protected], [email protected], [email protected],
	[email protected], [email protected],
	[email protected], [email protected], [email protected]
Subject: Re: [PATCH v6 05/16] iomap: Add async buffered write support
Date: Thu, 26 May 2022 11:42:24 -0700
Message-ID: <Yo/KEFfqcl3H4+q/@magnolia>
In-Reply-To: <[email protected]>

On Thu, May 26, 2022 at 10:38:29AM -0700, Stefan Roesch wrote:
> This adds async buffered write support to iomap.
>
> This replaces the call to balance_dirty_pages_ratelimited() with a
> call to balance_dirty_pages_ratelimited_flags(), which makes it possible
> to specify whether the write request is async.
>
> In addition this also moves the above function call to the beginning of
> the function. If the function call is at the end of the function and the
> decision is made to throttle writes, then there is no request that
> io-uring can wait on. By moving it to the beginning of the function, the
> write request is not issued, but returns -EAGAIN instead. io-uring will
> punt the request and process it in the io-worker.
>
> By moving the function call to the beginning of the function, the write
> throttling will happen one page later.

It does?  I would have thought that moving it before the
iomap_write_begin() call would make the throttling happen one page
sooner.  Sorry if I'm being dense here...

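A toy trace may help frame the question. The sketch below is not kernel code; trace_write() and enum check_pos are hypothetical names. It records a 'T' for each throttle check and a 'W' for each page copied, so the two placements of the check can be compared side by side.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: where the throttle check sits inside the write loop. */
enum check_pos { CHECK_AT_END, CHECK_AT_START };

/*
 * Append 'T' (throttle check) and 'W' (page copied) to out in the order
 * the loop would perform them, then NUL-terminate. Returns the number of
 * events recorded. Events that do not fit in cap are dropped.
 */
static size_t trace_write(enum check_pos where, int nr_pages,
			  char *out, size_t cap)
{
	size_t n = 0;

	assert(out != NULL && cap > 0);

	for (int i = 0; i < nr_pages; i++) {
		if (where == CHECK_AT_START && n + 1 < cap)
			out[n++] = 'T';
		if (n + 1 < cap)
			out[n++] = 'W';
		if (where == CHECK_AT_END && n + 1 < cap)
			out[n++] = 'T';
	}
	out[n] = '\0';
	return n;
}
```

For three pages this model gives WTWTWT with the check at the end and TWTWTW with the check at the start. The first check moves ahead of the first copy, while the check that used to follow the last copy moves into the next call. Whether that reads as "sooner" or "later" depends on which of those two checks is meant.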
> Signed-off-by: Stefan Roesch <[email protected]>
> Reviewed-by: Jan Kara <[email protected]>
> ---
> fs/iomap/buffered-io.c | 31 +++++++++++++++++++++++++++----
> 1 file changed, 27 insertions(+), 4 deletions(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index d6ddc54e190e..2281667646d2 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -559,6 +559,7 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
>  	loff_t block_size = i_blocksize(iter->inode);
>  	loff_t block_start = round_down(pos, block_size);
>  	loff_t block_end = round_up(pos + len, block_size);
> +	unsigned int nr_blocks = i_blocks_per_folio(iter->inode, folio);
>  	size_t from = offset_in_folio(folio, pos), to = from + len;
>  	size_t poff, plen;
>
> @@ -567,6 +568,8 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
>  	folio_clear_error(folio);
>
>  	iop = iomap_page_create(iter->inode, folio, iter->flags);
> +	if ((iter->flags & IOMAP_NOWAIT) && !iop && nr_blocks > 1)
> +		return -EAGAIN;
>
>  	do {
>  		iomap_adjust_read_range(iter->inode, folio, &block_start,
> @@ -584,7 +587,12 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
>  				return -EIO;
>  			folio_zero_segments(folio, poff, from, to, poff + plen);
>  		} else {
> -			int status = iomap_read_folio_sync(block_start, folio,
> +			int status;
> +
> +			if (iter->flags & IOMAP_NOWAIT)
> +				return -EAGAIN;
> +
> +			status = iomap_read_folio_sync(block_start, folio,
>  					poff, plen, srcmap);
>  			if (status)
>  				return status;
> @@ -613,6 +621,9 @@ static int iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
>  	unsigned fgp = FGP_LOCK | FGP_WRITE | FGP_CREAT | FGP_STABLE | FGP_NOFS;
>  	int status = 0;
>
> +	if (iter->flags & IOMAP_NOWAIT)
> +		fgp |= FGP_NOWAIT;

FGP_NOWAIT can cause __filemap_get_folio() to return a NULL folio, which
makes iomap_write_begin() return -ENOMEM.  If nothing has been written
yet, won't that cause the -ENOMEM to escape to userspace?  Why do we want
that instead of -EAGAIN?

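One possible shape for the error mapping asked about here, as a user-space sketch. SKETCH_NOWAIT, sketch_no_folio_errno() and sketch_write_begin() are hypothetical stand-ins, not the iomap API. The only point illustrated is that a NULL folio under a nowait flag could map to -EAGAIN rather than -ENOMEM.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_NOWAIT 0x1u	/* hypothetical stand-in for IOMAP_NOWAIT */

/* Hypothetical helper: pick the errno for a failed folio lookup. */
static int sketch_no_folio_errno(unsigned int flags)
{
	/*
	 * With nowait set, a NULL folio most likely means "would block",
	 * so ask the caller to retry from a context that may sleep.
	 */
	return (flags & SKETCH_NOWAIT) ? -EAGAIN : -ENOMEM;
}

/* Hypothetical stand-in for the start of the write_begin path. */
static int sketch_write_begin(unsigned int flags, const void *folio)
{
	if (!folio)
		return sketch_no_folio_errno(flags);
	return 0;
}
```

With this mapping a nowait caller sees -EAGAIN and can punt the request, while a blocking caller still sees -ENOMEM.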
> +
>  	BUG_ON(pos + len > iter->iomap.offset + iter->iomap.length);
>  	if (srcmap != &iter->iomap)
>  		BUG_ON(pos + len > srcmap->offset + srcmap->length);
> @@ -750,6 +761,8 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
>  	loff_t pos = iter->pos;
>  	ssize_t written = 0;
>  	long status = 0;
> +	struct address_space *mapping = iter->inode->i_mapping;
> +	unsigned int bdp_flags = (iter->flags & IOMAP_NOWAIT) ? BDP_ASYNC : 0;
>
>  	do {
>  		struct folio *folio;
> @@ -762,6 +775,11 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
>  		bytes = min_t(unsigned long, PAGE_SIZE - offset,
>  						iov_iter_count(i));
>  again:
> +		status = balance_dirty_pages_ratelimited_flags(mapping,
> +							       bdp_flags);
> +		if (unlikely(status))
> +			break;
> +
>  		if (bytes > length)
>  			bytes = length;
>
> @@ -770,6 +788,10 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
>  		 * Otherwise there's a nasty deadlock on copying from the
>  		 * same page as we're writing to, without it being marked
>  		 * up-to-date.
> +		 *
> +		 * For async buffered writes the assumption is that the user
> +		 * page has already been faulted in. This can be optimized by
> +		 * faulting the user page in the prepare phase of io-uring.

I don't think this pattern is unique to async writes with io_uring --
gfs2 also wanted this "try as many pages as you can until you hit a page
fault and then return a short write to caller so it can fault in the
rest" behavior.

--D

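The "write until the first fault, then return short" pattern described here can be modelled in user space. This is only a sketch under stated assumptions: sketch_copy_until_fault(), SKETCH_PAGE and the resident[] array are hypothetical, and the residency flags stand in for whether a real copy from user memory would fault. The source buffer is assumed to start on a page boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE 4096u	/* hypothetical page size for the model */

/*
 * Copy up to len bytes from src to dst one page at a time.
 * resident[i] is nonzero if source page i is "faulted in".
 * Stop at the first non-resident page and return the bytes copied so
 * far (a short write); the caller faults in the rest and retries.
 */
static size_t sketch_copy_until_fault(char *dst, const char *src,
				      size_t len, const int *resident)
{
	size_t done = 0;

	while (done < len) {
		size_t chunk = len - done;

		if (chunk > SKETCH_PAGE)
			chunk = SKETCH_PAGE;
		if (!resident[done / SKETCH_PAGE])
			break;		/* would fault: return short */
		memcpy(dst + done, src + done, chunk);
		done += chunk;
	}
	return done;
}
```

A caller would loop: on a short return it makes the next source page resident (the analogue of faulting it in) and calls again from the offset where the previous call stopped.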
>  		 */
>  		if (unlikely(fault_in_iov_iter_readable(i, bytes) == bytes)) {
>  			status = -EFAULT;
> @@ -781,7 +803,7 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
>  			break;
>
>  		page = folio_file_page(folio, pos >> PAGE_SHIFT);
> -		if (mapping_writably_mapped(iter->inode->i_mapping))
> +		if (mapping_writably_mapped(mapping))
>  			flush_dcache_page(page);
>
>  		copied = copy_page_from_iter_atomic(page, offset, bytes, i);
> @@ -806,8 +828,6 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
>  		pos += status;
>  		written += status;
>  		length -= status;
> -
> -		balance_dirty_pages_ratelimited(iter->inode->i_mapping);
>  	} while (iov_iter_count(i) && length);
>
>  	return written ? written : status;
> @@ -825,6 +845,9 @@ iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *i,
>  	};
>  	int ret;
>
> +	if (iocb->ki_flags & IOCB_NOWAIT)
> +		iter.flags |= IOMAP_NOWAIT;
> +
>  	while ((ret = iomap_iter(&iter, ops)) > 0)
>  		iter.processed = iomap_write_iter(&iter, i);
>  	if (iter.pos == iocb->ki_pos)
> --
> 2.30.2
>