public inbox for [email protected]
 help / color / mirror / Atom feed
From: Jens Axboe <[email protected]>
To: Stefan Roesch <[email protected]>,
	[email protected], [email protected], [email protected],
	[email protected], [email protected]
Cc: [email protected], [email protected], [email protected],
	[email protected]
Subject: Re: [PATCH v9 00/14] io-uring/xfs: support async buffered writes
Date: Wed, 22 Jun 2022 11:41:14 -0600	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>

Top posting - are people fine with queueing this up at this point? Will
need a bit of massaging for io_uring as certain things moved to another
file, but it's really minor. I'd do a separate topic branch for this.


On 6/16/22 3:22 PM, Stefan Roesch wrote:
> This patch series adds support for async buffered writes when using both
> xfs and io-uring. Currently io-uring only supports buffered writes in the
> slow path, by processing them in the io workers. With this patch series it is
> now possible to support buffered writes in the fast path. To be able to use
> the fast path the required pages must be in the page cache, the required locks
> in xfs can be granted immediately and no additional blocks need to be read
> form disk.
> 
> Updating the inode can take time. An optimization has been implemented for
> the time update. Time updates will be processed in the slow path. While there
> is already a time update in process, other write requests for the same file,
> can skip the update of the modification time.
>   
> 
> Performance results:
>   For fio the following results have been obtained with a queue depth of
>   1 and 4k block size (runtime 600 secs):
> 
>                  sequential writes:
>                  without patch           with patch      libaio     psync
>   iops:              77k                    209k          195K       233K
>   bw:               314MB/s                 854MB/s       790MB/s    953MB/s
>   clat:            9600ns                   120ns         540ns     3000ns
> 
> 
> For an io depth of 1, the new patch improves throughput by over three times
> (compared to the exiting behavior, where buffered writes are processed by an
> io-worker process) and also the latency is considerably reduced. To achieve the
> same or better performance with the exisiting code an io depth of 4 is required.
> Increasing the iodepth further does not lead to improvements.
> 
> In addition the latency of buffered write operations is reduced considerably.
> 
> 
> 
> Support for async buffered writes:
> 
>   To support async buffered writes the flag FMODE_BUF_WASYNC is introduced. In
>   addition the check in generic_write_checks is modified to allow for async
>   buffered writes that have this flag set.
> 
>   Changes to the iomap page create function to allow the caller to specify
>   the gfp flags. Sets the IOMAP_NOWAIT flag in iomap if IOCB_NOWAIT has been set
>   and specifies the requested gfp flags.
> 
>   Adds the iomap async buffered write support to the xfs iomap layer.
>   Adds async buffered write support to the xfs iomap layer.
> 
> Support for async buffered write support and inode time modification
> 
>   Splits the functions for checking if the file privileges need to be removed in
>   two functions: check function and a function for the removal of file privileges.
>   The same split is also done for the function to update the file modification time.
> 
>   Implement an optimization that while a file modification time is pending other
>   requests for the same file don't need to wait for the file modification update. 
>   This avoids that a considerable number of buffered async write requests get
>   punted.
> 
>   Take the ilock in nowait mode if async buffered writes are enabled and enable
>   the async buffered writes optimization in io_uring.
> 
> Support for write throttling of async buffered writes:
> 
>   Add a no_wait parameter to the exisiting balance_dirty_pages() function. The
>   function will return -EAGAIN if the parameter is true and write throttling is
>   required.
> 
>   Add a new function called balance_dirty_pages_ratelimited_async() that will be
>   invoked from iomap_write_iter() if an async buffered write is requested.
>   
> Enable async buffered write support in xfs
>    This enables async buffered writes for xfs.
> 
> 
> Testing:
>   This patch has been tested with xfstests, fsx, fio and individual test programs.
> 
> 
> Changes:
>   V9:
>   - Added comment for function balance_dirty_pages_ratelimited_flags()
>   - checking return code for iop allocation in iomap_page_create()
>   
>   V8:
>   - Reverted back changes to iomap_write_iter and used Mathew Wilcox code review
>     recommendation with an additional change to revert the iterator.
>   - Removed patch "fs: Optimization for concurrent file time updates" 
>   - Setting flag value in file_modified_flags()
>   - Removed additional spaces in comment in file_update_time()
>   - Run fsx with 1 billion ops against the changes (Run passed)
> 
>   V7:
>   - Change definition and if clause in " iomap: Add flags parameter to
>     iomap_page_create()"
>   - Added patch "iomap: Return error code from iomap_write_iter()" to address
>     the problem Dave Chinner brought up: retrying memory allocation a second
>     time when we are under memory pressure. 
>   - Removed patch "xfs: Change function signature of xfs_ilock_iocb()"
>   - Merged patch "xfs: Enable async buffered write support" with previous
>     patch
> 
>   V6:
>   - Pass in iter->flags to calls in iomap_page_create()
>   
>   V5:
>   - Refreshed to 5.19-rc1
>   - Merged patch 3 and patch 4
>     "mm: Prepare balance_dirty_pages() for async buffered writes" and
>     "mm: Add balance_dirty_pages_ratelimited_flags() function"
>   - Reformatting long file in iomap_page_create()
>   - Replacing gfp parameter with flags parameter in iomap_page_create()
>     This makes sure that the gfp setting is done in one location.
>   - Moved variable definition outside of loop in iomap_write_iter()
>   - Merged patch 7 with patch 6.
>   - Introduced __file_remove_privs() that get the iocb_flags passed in
>     as an additional parameter
>   - Removed file_needs_remove_privs() function
>   - Renamed file_needs_update_time() inode_needs_update_time()
>   - inode_needs_update_time() no longer passes the file pointer
>   - Renamed file_modified_async() to file_modified_flags()
>   - Made file_modified_flags() an internal function
>   - Removed extern keyword in file_modified_async definition
>   - Added kiocb_modified function.
>   - Separate patch for changes to xfs_ilock_for_iomap()
>   - Separate patch for changes to xfs_ilock_inode()
>   - Renamed xfs_ilock_xfs_inode()n back to xfs_ilock_iocb()
>   - Renamed flags parameter to iocb_flags in function xfs_ilock_iocb()
>   - Used inode_set_flags() to manipulate inode flags in the function
>     file_modified_flags()
> 
>   V4:
>   - Reformat new code in generic_write_checks_count().
>   - Removed patch that introduced new function iomap_page_create_gfp().
>   - Add gfp parameter to iomap_page_create() and change all callers
>     All users will enforce the number of blocks check
>   - Removed confusing statement in iomap async buffer support patch
>   - Replace no_wait variable in __iomap_write_begin with check of
>     IOMAP_NOWAIT for easier readability.
>   - Moved else if clause in __iomap_write_begin into else clause for
>     easier readability
>   - Removed the balance_dirty_pages_ratelimited_async() function and
>     reverted back to the earlier version that used the function
>     balance_dirty_pages_ratelimited_flags()
>   - Introduced the flag BDP_ASYNC.
>   - Renamed variable in iomap_write_iter from i_mapping to mapping.
>   - Directly call balance_dirty_pages_ratelimited_flags() in the function
>     iomap_write_iter().
>   - Re-ordered the patches.
>   
>   V3:
>   - Reformat new code in generic_write_checks_count() to line lengthof 80.
>   - Remove if condition in __iomap_write_begin to maintain current behavior.
>   - use GFP_NOWAIT flag in __iomap_write_begin
>   - rename need_file_remove_privs() function to file_needs_remove_privs()
>   - rename do_file_remove_privs to __file_remove_privs()
>   - add kernel documentation to file_remove_privs() function
>   - rework else if branch in file_remove_privs() function
>   - add kernel documentation to file_modified() function
>   - add kernel documentation to file_modified_async() function
>   - rename err variable in file_update_time to ret
>   - rename function need_file_update_time() to file_needs_update_time()
>   - rename function do_file_update_time() to __file_update_time()
>   - don't move check for FMODE_NOCMTIME in generic helper
>   - reformat __file_update_time for easier reading
>   - add kernel documentation to file_update_time() function
>   - fix if in file_update_time from < to <=
>   - move modification of inode flags from do_file_update_time to file_modified()
>     When this function is called, the caller must hold the inode lock.
>   - 3 new patches from Jan to add new no_wait flag to balance_dirty_pages(),
>     remove patch 12 from previous series
>   - Make balance_dirty_pages_ratelimited_flags() a static function
>   - Add new balance_dirty_pages_ratelimited_async() function
>   
>   V2:
>   - Remove atomic allocation
>   - Use direct write in xfs_buffered_write_iomap_begin()
>   - Use xfs_ilock_for_iomap() in xfs_buffered_write_iomap_begin()
>   - Remove no_wait check at the end of xfs_buffered_write_iomap_begin() for
>     the COW path.
>   - Pass xfs_inode pointer to xfs_ilock_iocb and rename function to
>     xfs_lock_xfs_inode
>   - Replace existing uses of xfs_ilock_iocb with xfs_ilock_xfs_inode
>   - Use xfs_ilock_xfs_inode in xfs_file_buffered_write()
>   - Callers of xfs_ilock_for_iomap need to initialize lock mode. This is
>     required so writes use an exclusive lock
>   - Split of _balance_dirty_pages() from balance_dirty_pages() and return
>     sleep time
>   - Call _balance_dirty_pages() in balance_dirty_pages_ratelimited_flags()
>   - Move call to balance_dirty_pages_ratelimited_flags() in iomap_write_iter()
>     to the beginning of the loop
> 
> 
> 
> Jan Kara (3):
>   mm: Move starting of background writeback into the main balancing loop
>   mm: Move updates of dirty_exceeded into one place
>   mm: Add balance_dirty_pages_ratelimited_flags() function
> 
> Stefan Roesch (11):
>   iomap: Add flags parameter to iomap_page_create()
>   iomap: Add async buffered write support
>   iomap: Return -EAGAIN from iomap_write_iter()
>   fs: Add check for async buffered writes to generic_write_checks
>   fs: add __remove_file_privs() with flags parameter
>   fs: Split off inode_needs_update_time and __file_update_time
>   fs: Add async write file modification handling.
>   io_uring: Add support for async buffered writes
>   io_uring: Add tracepoint for short writes
>   xfs: Specify lockmode when calling xfs_ilock_for_iomap()
>   xfs: Add async buffered write support
> 
>  fs/inode.c                      | 168 +++++++++++++++++++++++---------
>  fs/io_uring.c                   |  32 +++++-
>  fs/iomap/buffered-io.c          |  71 +++++++++++---
>  fs/read_write.c                 |   4 +-
>  fs/xfs/xfs_file.c               |  11 +--
>  fs/xfs/xfs_iomap.c              |  11 ++-
>  include/linux/fs.h              |   4 +
>  include/linux/writeback.h       |   7 ++
>  include/trace/events/io_uring.h |  25 +++++
>  mm/page-writeback.c             |  89 +++++++++++------
>  10 files changed, 314 insertions(+), 108 deletions(-)
> 
> 
> base-commit: b13baccc3850ca8b8cccbf8ed9912dbaa0fdf7f3


-- 
Jens Axboe


  parent reply	other threads:[~2022-06-22 17:41 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-16 21:22 [PATCH v9 00/14] io-uring/xfs: support async buffered writes Stefan Roesch
2022-06-16 21:22 ` [PATCH v9 01/14] mm: Move starting of background writeback into the main balancing loop Stefan Roesch
2022-06-16 21:22 ` [PATCH v9 02/14] mm: Move updates of dirty_exceeded into one place Stefan Roesch
2022-06-16 21:22 ` [PATCH v9 03/14] mm: Add balance_dirty_pages_ratelimited_flags() function Stefan Roesch
2022-06-20  0:53   ` kernel test robot
2022-06-16 21:22 ` [PATCH v9 05/14] iomap: Add async buffered write support Stefan Roesch
2022-06-16 21:22 ` [PATCH v9 06/14] iomap: Return -EAGAIN from iomap_write_iter() Stefan Roesch
2022-06-16 21:22 ` [PATCH v9 09/14] fs: Split off inode_needs_update_time and __file_update_time Stefan Roesch
2022-06-16 21:22 ` [PATCH v9 11/14] io_uring: Add support for async buffered writes Stefan Roesch
2022-06-16 21:22 ` [PATCH v9 12/14] io_uring: Add tracepoint for short writes Stefan Roesch
2022-06-16 21:22 ` [PATCH v9 14/14] xfs: Add async buffered write support Stefan Roesch
2022-06-22 17:41 ` Jens Axboe [this message]
2022-06-22 19:35   ` [PATCH v9 00/14] io-uring/xfs: support async buffered writes Matthew Wilcox
2022-06-22 19:37     ` Jens Axboe
2022-06-22 21:24   ` Jan Kara
2022-06-22 22:27 ` Jens Axboe
2022-06-23  0:29   ` Darrick J. Wong
2022-06-23  0:50     ` Jens Axboe
2022-06-23  6:29       ` Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox