public inbox for [email protected]
* [PATCH v3 00/12] io-uring/btrfs: support async buffered writes
@ 2022-09-12 19:27 Stefan Roesch
  2022-09-12 19:27 ` [PATCH v3 01/12] mm: export balance_dirty_pages_ratelimited_flags() Stefan Roesch
                   ` (13 more replies)
  0 siblings, 14 replies; 19+ messages in thread
From: Stefan Roesch @ 2022-09-12 19:27 UTC (permalink / raw)
  To: kernel-team, io-uring, linux-btrfs, linux-mm; +Cc: shr, axboe, josef, fdmanana

This patch series adds support for async buffered writes when using both
btrfs and io-uring. Currently io-uring only supports buffered writes (for btrfs)
in the slow path, by processing them in the io workers. With this patch series
it is now possible to support buffered writes in the fast path. To be able to use
the fast path, the required pages must be in the page cache, the required locks
in btrfs can be granted immediately and no additional blocks need to be read
from disk.
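
As an illustration of what the fast path covers (not part of the series),
a plain buffered write submitted through io_uring, as in the sketch below,
can now complete inline instead of being punted to an io worker. The file
path is a placeholder and error handling is abbreviated:

  #include <fcntl.h>
  #include <unistd.h>
  #include <liburing.h>

  int buffered_write_example(void)
  {
          struct io_uring ring;
          struct io_uring_sqe *sqe;
          struct io_uring_cqe *cqe;
          char buf[4096] = { 0 };
          int fd, ret;

          fd = open("/mnt/btrfs/testfile", O_WRONLY | O_CREAT, 0644);
          if (fd < 0)
                  return -1;
          ret = io_uring_queue_init(8, &ring, 0);
          if (ret < 0)
                  return ret;

          sqe = io_uring_get_sqe(&ring);
          /* No O_DIRECT: this is a buffered write. */
          io_uring_prep_write(sqe, fd, buf, sizeof(buf), 0);
          io_uring_submit(&ring);

          ret = io_uring_wait_cqe(&ring, &cqe);
          if (!ret)
                  io_uring_cqe_seen(&ring, cqe); /* cqe->res holds the result */
          io_uring_queue_exit(&ring);
          close(fd);
          return ret;
  }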

This patch series makes use of the changes that have been introduced by a
previous patch series: "io-uring/xfs: support async buffered writes"

Performance results:

The new patch improves throughput by over two times (compared to the existing
behavior, where buffered writes are processed by an io-worker process) and also
the latency is considerably reduced. Detailed results are part of the changelog
of the first commit.


BTRFS changes:
 -Add an option for NOWAIT IOCB's so that tree searches do not wait on locks. This
  adds the nowait option to btrfs_path.
 -For NOWAIT buffered writes on PREALLOC or NOCOW extents tell can_nocow_extent()
  that we don't want to wait on any locks or metadata IO.
 -Support no_flush reservations for nowait buffered writes.
 -Add btrfs_try_lock_ordered_range() function.
 -Add nowait flag to btrfs_check_nocow_lock() to use it in write code path.
 -Add nowait parameter to prepare_pages() function.
 -Plumb nowait through the write code path.
 -Enable nowait buffered writes (a user-visible usage sketch follows below).
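
A minimal user-visible sketch of the nowait semantics, assuming a kernel
with this series applied: RWF_NOWAIT maps to IOCB_NOWAIT, the same flag
io_uring sets for its inline attempt, and before this series btrfs rejected
nowait buffered writes with -EOPNOTSUPP. The file path is a placeholder:

  #define _GNU_SOURCE
  #include <errno.h>
  #include <fcntl.h>
  #include <stdio.h>
  #include <string.h>
  #include <sys/uio.h>
  #include <unistd.h>

  int main(void)
  {
          char buf[4096];
          struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
          int fd = open("/mnt/btrfs/testfile", O_WRONLY | O_CREAT, 0644);
          ssize_t ret;

          if (fd < 0)
                  return 1;
          memset(buf, 'a', sizeof(buf));

          /* Buffered write that must not block. */
          ret = pwritev2(fd, &iov, 1, 0, RWF_NOWAIT);
          if (ret < 0 && errno == EAGAIN)
                  printf("fast path unavailable, retry from a blocking context\n");
          else if (ret >= 0)
                  printf("wrote %zd bytes without blocking\n", ret);
          close(fd);
          return 0;
  }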


Testing:
  This patch has been tested with xfstests, fsx, fio. xfstests shows no new
  diffs compared to running without the patch series.


Changes:

V3:
 - Updated changelog of "btrfs: implement a nowait option for tree searches"
   to say -EAGAIN.
 - Use bool return value in signature of btrfs_try_lock_ordered_range
 - Renamed variable tmp to can_nocow in btrfs_buffered_write
 - Fixed coding style in get_prepare_fgp_flags
 - Set pages[i] to NULL in error code path of lock_and_cleanup_extent_if_need()
 - Added const in definition of "bool nowait"
 - Removed unlikely from btrfs_buffered_write
 - Rephrased changelog for "btrfs: add assert to search functions" and used
   asserts instead of warnings
 - Explained why encoded writes are not supported in the changelog
 - Moved performance results to changelog of first commit
 
V2:
 - Replace EWOULDBLOCK with EAGAIN. In Version 1 it was not used consistently
 - Export function balance_dirty_pages_ratelimited_flags()
 - Add asserts/warnings for search functions when nowait is set, but we don't
   expect that they are invoked with nowait set.



Josef Bacik (5):
  btrfs: implement a nowait option for tree searches
  btrfs: make can_nocow_extent nowait compatible
  btrfs: add the ability to use NO_FLUSH for data reservations
  btrfs: add btrfs_try_lock_ordered_range
  btrfs: make btrfs_check_nocow_lock nowait compatible

Stefan Roesch (7):
  mm: export balance_dirty_pages_ratelimited_flags()
  btrfs: make prepare_pages nowait compatible
  btrfs: make lock_and_cleanup_extent_if_need nowait compatible
  btrfs: plumb NOWAIT through the write path
  btrfs: make balance_dirty_pages nowait compatible
  btrfs: assert nowait mode is not used for some btree search functions
  btrfs: enable nowait async buffered writes

 fs/btrfs/block-group.c    |   2 +-
 fs/btrfs/ctree.c          |  43 ++++++++++++-
 fs/btrfs/ctree.h          |   8 ++-
 fs/btrfs/delalloc-space.c |  13 +++-
 fs/btrfs/delalloc-space.h |   3 +-
 fs/btrfs/extent-tree.c    |   5 ++
 fs/btrfs/file-item.c      |   4 +-
 fs/btrfs/file.c           | 124 ++++++++++++++++++++++++++++----------
 fs/btrfs/inode.c          |  22 ++++---
 fs/btrfs/locking.c        |  23 +++++++
 fs/btrfs/locking.h        |   1 +
 fs/btrfs/ordered-data.c   |  28 +++++++++
 fs/btrfs/ordered-data.h   |   1 +
 fs/btrfs/relocation.c     |   2 +-
 fs/btrfs/scrub.c          |   4 +-
 fs/btrfs/space-info.c     |   3 +-
 fs/btrfs/tree-log.c       |   6 +-
 mm/page-writeback.c       |   1 +
 18 files changed, 234 insertions(+), 59 deletions(-)


base-commit: 80e78fcce86de0288793a0ef0f6acf37656ee4cf
-- 
2.30.2



* [PATCH v3 01/12] mm: export balance_dirty_pages_ratelimited_flags()
  2022-09-12 19:27 [PATCH v3 00/12] io-uring/btrfs: support async buffered writes Stefan Roesch
@ 2022-09-12 19:27 ` Stefan Roesch
  2022-09-12 19:27 ` [PATCH v3 02/12] btrfs: implement a nowait option for tree searches Stefan Roesch
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Stefan Roesch @ 2022-09-12 19:27 UTC (permalink / raw)
  To: kernel-team, io-uring, linux-btrfs, linux-mm
  Cc: shr, axboe, josef, fdmanana, kernel test robot

Export the function balance_dirty_pages_ratelimited_flags(). It is now
also called from btrfs.
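
For context, an illustrative sketch of the kind of nowait-aware call site
this export enables; the btrfs buffered write path adopts this pattern in a
later patch of this series:

  unsigned int bdp_flags = (iocb->ki_flags & IOCB_NOWAIT) ? BDP_ASYNC : 0;
  int ret;

  /* With BDP_ASYNC this returns -EAGAIN instead of sleeping to throttle. */
  ret = balance_dirty_pages_ratelimited_flags(inode->i_mapping, bdp_flags);
  if (ret)
          return ret;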

Signed-off-by: Stefan Roesch <[email protected]>
Reported-by: kernel test robot <[email protected]>
---
 mm/page-writeback.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 032a7bf8d259..7e9d8d857ecc 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1933,6 +1933,7 @@ int balance_dirty_pages_ratelimited_flags(struct address_space *mapping,
 	wb_put(wb);
 	return ret;
 }
+EXPORT_SYMBOL_GPL(balance_dirty_pages_ratelimited_flags);
 
 /**
  * balance_dirty_pages_ratelimited - balance dirty memory state.
-- 
2.30.2



* [PATCH v3 02/12] btrfs: implement a nowait option for tree searches
  2022-09-12 19:27 [PATCH v3 00/12] io-uring/btrfs: support async buffered writes Stefan Roesch
  2022-09-12 19:27 ` [PATCH v3 01/12] mm: export balance_dirty_pages_ratelimited_flags() Stefan Roesch
@ 2022-09-12 19:27 ` Stefan Roesch
  2022-09-26  9:14   ` Filipe Manana
  2022-09-12 19:27 ` [PATCH v3 03/12] btrfs: make can_nocow_extent nowait compatible Stefan Roesch
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 19+ messages in thread
From: Stefan Roesch @ 2022-09-12 19:27 UTC (permalink / raw)
  To: kernel-team, io-uring, linux-btrfs, linux-mm; +Cc: shr, axboe, josef, fdmanana

From: Josef Bacik <[email protected]>

For NOWAIT IOCB's we'll need a way to tell search to not wait on locks
or anything.  Accomplish this by adding a path->nowait flag that will
use trylocks and skip reading of metadata, returning -EAGAIN in either
of these cases.  For now we only need this for reads, so only the read
side is handled.  Add an ASSERT() to catch anybody trying to use this
for writes so they know they'll have to implement the write side.
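
A sketch of how a read only caller is expected to use the flag (illustrative
only, with the root and key set up elsewhere):

  struct btrfs_path *path;
  int ret;

  path = btrfs_alloc_path();
  if (!path)
          return -ENOMEM;
  path->nowait = true;
  ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
  if (ret == -EAGAIN) {
          /*
           * A lock or a metadata read would have blocked; the caller
           * retries from a context that may sleep, e.g. an io worker.
           */
  }
  btrfs_free_path(path);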

Signed-off-by: Josef Bacik <[email protected]>
Signed-off-by: Stefan Roesch <[email protected]>
---
 fs/btrfs/ctree.c   | 39 ++++++++++++++++++++++++++++++++++++---
 fs/btrfs/ctree.h   |  1 +
 fs/btrfs/locking.c | 23 +++++++++++++++++++++++
 fs/btrfs/locking.h |  1 +
 4 files changed, 61 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index ebfa35fe1c38..71b238364939 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1447,6 +1447,11 @@ read_block_for_search(struct btrfs_root *root, struct btrfs_path *p,
 			return 0;
 		}
 
+		if (p->nowait) {
+			free_extent_buffer(tmp);
+			return -EAGAIN;
+		}
+
 		if (unlock_up)
 			btrfs_unlock_up_safe(p, level + 1);
 
@@ -1467,6 +1472,8 @@ read_block_for_search(struct btrfs_root *root, struct btrfs_path *p,
 			ret = -EAGAIN;
 
 		goto out;
+	} else if (p->nowait) {
+		return -EAGAIN;
 	}
 
 	if (unlock_up) {
@@ -1634,7 +1641,13 @@ static struct extent_buffer *btrfs_search_slot_get_root(struct btrfs_root *root,
 		 * We don't know the level of the root node until we actually
 		 * have it read locked
 		 */
-		b = btrfs_read_lock_root_node(root);
+		if (p->nowait) {
+			b = btrfs_try_read_lock_root_node(root);
+			if (IS_ERR(b))
+				return b;
+		} else {
+			b = btrfs_read_lock_root_node(root);
+		}
 		level = btrfs_header_level(b);
 		if (level > write_lock_level)
 			goto out;
@@ -1910,6 +1923,13 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root *root,
 	WARN_ON(p->nodes[0] != NULL);
 	BUG_ON(!cow && ins_len);
 
+	/*
+	 * For now only allow nowait for read only operations.  There's no
+	 * strict reason why we can't, we just only need it for reads so I'm
+	 * only implementing it for reads right now.
+	 */
+	ASSERT(!p->nowait || !cow);
+
 	if (ins_len < 0) {
 		lowest_unlock = 2;
 
@@ -1936,7 +1956,12 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root *root,
 
 	if (p->need_commit_sem) {
 		ASSERT(p->search_commit_root);
-		down_read(&fs_info->commit_root_sem);
+		if (p->nowait) {
+			if (!down_read_trylock(&fs_info->commit_root_sem))
+				return -EAGAIN;
+		} else {
+			down_read(&fs_info->commit_root_sem);
+		}
 	}
 
 again:
@@ -2082,7 +2107,15 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root *root,
 				btrfs_tree_lock(b);
 				p->locks[level] = BTRFS_WRITE_LOCK;
 			} else {
-				btrfs_tree_read_lock(b);
+				if (p->nowait) {
+					if (!btrfs_try_tree_read_lock(b)) {
+						free_extent_buffer(b);
+						ret = -EAGAIN;
+						goto done;
+					}
+				} else {
+					btrfs_tree_read_lock(b);
+				}
 				p->locks[level] = BTRFS_READ_LOCK;
 			}
 			p->nodes[level] = b;
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index df8c99c99df9..ca59ba6421a9 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -443,6 +443,7 @@ struct btrfs_path {
 	 * header (ie. sizeof(struct btrfs_item) is not included).
 	 */
 	unsigned int search_for_extension:1;
+	unsigned int nowait:1;
 };
 #define BTRFS_MAX_EXTENT_ITEM_SIZE(r) ((BTRFS_LEAF_DATA_SIZE(r->fs_info) >> 4) - \
 					sizeof(struct btrfs_item))
diff --git a/fs/btrfs/locking.c b/fs/btrfs/locking.c
index 9063072b399b..d6c88922d3e2 100644
--- a/fs/btrfs/locking.c
+++ b/fs/btrfs/locking.c
@@ -285,6 +285,29 @@ struct extent_buffer *btrfs_read_lock_root_node(struct btrfs_root *root)
 	return eb;
 }
 
+/*
+ * Loop around taking references on and locking the root node of the tree in
+ * nowait mode until we end up with a lock on the root node, or we return
+ * -EAGAIN to avoid blocking.
+ *
+ * Return: root extent buffer with read lock held, or ERR_PTR(-EAGAIN).
+ */
+struct extent_buffer *btrfs_try_read_lock_root_node(struct btrfs_root *root)
+{
+	struct extent_buffer *eb;
+
+	while (1) {
+		eb = btrfs_root_node(root);
+		if (!btrfs_try_tree_read_lock(eb))
+			return ERR_PTR(-EAGAIN);
+		if (eb == root->node)
+			break;
+		btrfs_tree_read_unlock(eb);
+		free_extent_buffer(eb);
+	}
+	return eb;
+}
+
 /*
  * DREW locks
  * ==========
diff --git a/fs/btrfs/locking.h b/fs/btrfs/locking.h
index ab268be09bb5..490c7a79e995 100644
--- a/fs/btrfs/locking.h
+++ b/fs/btrfs/locking.h
@@ -94,6 +94,7 @@ int btrfs_try_tree_read_lock(struct extent_buffer *eb);
 int btrfs_try_tree_write_lock(struct extent_buffer *eb);
 struct extent_buffer *btrfs_lock_root_node(struct btrfs_root *root);
 struct extent_buffer *btrfs_read_lock_root_node(struct btrfs_root *root);
+struct extent_buffer *btrfs_try_read_lock_root_node(struct btrfs_root *root);
 
 #ifdef CONFIG_BTRFS_DEBUG
 static inline void btrfs_assert_tree_write_locked(struct extent_buffer *eb)
-- 
2.30.2



* [PATCH v3 03/12] btrfs: make can_nocow_extent nowait compatible
  2022-09-12 19:27 [PATCH v3 00/12] io-uring/btrfs: support async buffered writes Stefan Roesch
  2022-09-12 19:27 ` [PATCH v3 01/12] mm: export balance_dirty_pages_ratelimited_flags() Stefan Roesch
  2022-09-12 19:27 ` [PATCH v3 02/12] btrfs: implement a nowait option for tree searches Stefan Roesch
@ 2022-09-12 19:27 ` Stefan Roesch
  2022-09-12 19:27 ` [PATCH v3 04/12] btrfs: add the ability to use NO_FLUSH for data reservations Stefan Roesch
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Stefan Roesch @ 2022-09-12 19:27 UTC (permalink / raw)
  To: kernel-team, io-uring, linux-btrfs, linux-mm; +Cc: shr, axboe, josef, fdmanana

From: Josef Bacik <[email protected]>

If we have NOWAIT specified on our IOCB and we're writing into a
PREALLOC or NOCOW extent then we need to be able to tell
can_nocow_extent that we don't want to wait on any locks or metadata IO.
Fix can_nocow_extent to allow for NOWAIT.
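
A sketch of a nowait caller against the updated signature (illustrative
only, the surrounding extent locking is omitted):

  u64 num_bytes = len;
  int ret;

  ret = can_nocow_extent(&inode->vfs_inode, start, &num_bytes, NULL, NULL,
                         NULL, true /* nowait */, false /* strict */);
  if (ret == -EAGAIN)
          return -EAGAIN;       /* would block, retry from a blocking context */
  if (ret > 0)
          only_release_metadata = true;   /* range can be written NOCOW */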

Signed-off-by: Josef Bacik <[email protected]>
Signed-off-by: Stefan Roesch <[email protected]>
---
 fs/btrfs/ctree.h       |  5 +++--
 fs/btrfs/extent-tree.c |  5 +++++
 fs/btrfs/file-item.c   |  4 +++-
 fs/btrfs/file.c        |  2 +-
 fs/btrfs/inode.c       | 16 ++++++++++------
 fs/btrfs/relocation.c  |  2 +-
 fs/btrfs/scrub.c       |  4 ++--
 fs/btrfs/tree-log.c    |  6 +++---
 8 files changed, 28 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index ca59ba6421a9..5cc5394503e0 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3274,7 +3274,8 @@ int btrfs_csum_file_blocks(struct btrfs_trans_handle *trans,
 blk_status_t btrfs_csum_one_bio(struct btrfs_inode *inode, struct bio *bio,
 				u64 offset, bool one_ordered);
 int btrfs_lookup_csums_range(struct btrfs_root *root, u64 start, u64 end,
-			     struct list_head *list, int search_commit);
+			     struct list_head *list, int search_commit,
+			     bool nowait);
 void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode,
 				     const struct btrfs_path *path,
 				     struct btrfs_file_extent_item *fi,
@@ -3304,7 +3305,7 @@ struct extent_map *btrfs_get_extent_fiemap(struct btrfs_inode *inode,
 					   u64 start, u64 len);
 noinline int can_nocow_extent(struct inode *inode, u64 offset, u64 *len,
 			      u64 *orig_start, u64 *orig_block_len,
-			      u64 *ram_bytes, bool strict);
+			      u64 *ram_bytes, bool nowait, bool strict);
 
 void __btrfs_del_delalloc_inode(struct btrfs_root *root,
 				struct btrfs_inode *inode);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 6914cd8024ba..583ddae3c270 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2220,6 +2220,11 @@ static noinline int check_delayed_ref(struct btrfs_root *root,
 	}
 
 	if (!mutex_trylock(&head->mutex)) {
+		if (path->nowait) {
+			spin_unlock(&delayed_refs->lock);
+			return -EAGAIN;
+		}
+
 		refcount_inc(&head->refs);
 		spin_unlock(&delayed_refs->lock);
 
diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index c828f971a346..fcc6ce861409 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -503,7 +503,8 @@ blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio, u8 *dst
 }
 
 int btrfs_lookup_csums_range(struct btrfs_root *root, u64 start, u64 end,
-			     struct list_head *list, int search_commit)
+			     struct list_head *list, int search_commit,
+			     bool nowait)
 {
 	struct btrfs_fs_info *fs_info = root->fs_info;
 	struct btrfs_key key;
@@ -525,6 +526,7 @@ int btrfs_lookup_csums_range(struct btrfs_root *root, u64 start, u64 end,
 	if (!path)
 		return -ENOMEM;
 
+	path->nowait = nowait;
 	if (search_commit) {
 		path->skip_locking = 1;
 		path->reada = READA_FORWARD;
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 5a3f6e0d9688..f4aa198f0f87 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1502,7 +1502,7 @@ int btrfs_check_nocow_lock(struct btrfs_inode *inode, loff_t pos,
 
 	btrfs_lock_and_flush_ordered_range(inode, lockstart, lockend, NULL);
 	ret = can_nocow_extent(&inode->vfs_inode, lockstart, &num_bytes,
-			NULL, NULL, NULL, false);
+			NULL, NULL, NULL, false, false);
 	if (ret <= 0) {
 		ret = 0;
 		btrfs_drew_write_unlock(&root->snapshot_lock);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 1372210869b1..e0fe2c4b721f 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1666,7 +1666,7 @@ static noinline int run_delalloc_zoned(struct btrfs_inode *inode,
 }
 
 static noinline int csum_exist_in_range(struct btrfs_fs_info *fs_info,
-					u64 bytenr, u64 num_bytes)
+					u64 bytenr, u64 num_bytes, bool nowait)
 {
 	struct btrfs_root *csum_root = btrfs_csum_root(fs_info, bytenr);
 	struct btrfs_ordered_sum *sums;
@@ -1674,7 +1674,8 @@ static noinline int csum_exist_in_range(struct btrfs_fs_info *fs_info,
 	LIST_HEAD(list);
 
 	ret = btrfs_lookup_csums_range(csum_root, bytenr,
-				       bytenr + num_bytes - 1, &list, 0);
+				       bytenr + num_bytes - 1, &list, 0,
+				       nowait);
 	if (ret == 0 && list_empty(&list))
 		return 0;
 
@@ -1800,6 +1801,7 @@ static int can_nocow_file_extent(struct btrfs_path *path,
 	u8 extent_type;
 	int can_nocow = 0;
 	int ret = 0;
+	bool nowait = path->nowait;
 
 	fi = btrfs_item_ptr(leaf, path->slots[0], struct btrfs_file_extent_item);
 	extent_type = btrfs_file_extent_type(leaf, fi);
@@ -1876,7 +1878,8 @@ static int can_nocow_file_extent(struct btrfs_path *path,
 	 * Force COW if csums exist in the range. This ensures that csums for a
 	 * given extent are either valid or do not exist.
 	 */
-	ret = csum_exist_in_range(root->fs_info, args->disk_bytenr, args->num_bytes);
+	ret = csum_exist_in_range(root->fs_info, args->disk_bytenr, args->num_bytes,
+				  nowait);
 	WARN_ON_ONCE(ret > 0 && is_freespace_inode);
 	if (ret != 0)
 		goto out;
@@ -7292,7 +7295,7 @@ static bool btrfs_extent_readonly(struct btrfs_fs_info *fs_info, u64 bytenr)
  */
 noinline int can_nocow_extent(struct inode *inode, u64 offset, u64 *len,
 			      u64 *orig_start, u64 *orig_block_len,
-			      u64 *ram_bytes, bool strict)
+			      u64 *ram_bytes, bool nowait, bool strict)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct can_nocow_file_extent_args nocow_args = { 0 };
@@ -7308,6 +7311,7 @@ noinline int can_nocow_extent(struct inode *inode, u64 offset, u64 *len,
 	path = btrfs_alloc_path();
 	if (!path)
 		return -ENOMEM;
+	path->nowait = nowait;
 
 	ret = btrfs_lookup_file_extent(NULL, root, path,
 			btrfs_ino(BTRFS_I(inode)), offset, 0);
@@ -7577,7 +7581,7 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map,
 		block_start = em->block_start + (start - em->start);
 
 		if (can_nocow_extent(inode, start, &len, &orig_start,
-				     &orig_block_len, &ram_bytes, false) == 1) {
+				     &orig_block_len, &ram_bytes, false, false) == 1) {
 			bg = btrfs_inc_nocow_writers(fs_info, block_start);
 			if (bg)
 				can_nocow = true;
@@ -11242,7 +11246,7 @@ static int btrfs_swap_activate(struct swap_info_struct *sis, struct file *file,
 		free_extent_map(em);
 		em = NULL;
 
-		ret = can_nocow_extent(inode, start, &len, NULL, NULL, NULL, true);
+		ret = can_nocow_extent(inode, start, &len, NULL, NULL, NULL, false, true);
 		if (ret < 0) {
 			goto out;
 		} else if (ret) {
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 45c02aba2492..dfc3f6c04b13 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -4339,7 +4339,7 @@ int btrfs_reloc_clone_csums(struct btrfs_inode *inode, u64 file_pos, u64 len)
 	disk_bytenr = file_pos + inode->index_cnt;
 	csum_root = btrfs_csum_root(fs_info, disk_bytenr);
 	ret = btrfs_lookup_csums_range(csum_root, disk_bytenr,
-				       disk_bytenr + len - 1, &list, 0);
+				       disk_bytenr + len - 1, &list, 0, false);
 	if (ret)
 		goto out;
 
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 3afe5fa50a63..1cb3eed8b917 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3077,7 +3077,7 @@ static int scrub_raid56_data_stripe_for_parity(struct scrub_ctx *sctx,
 
 		ret = btrfs_lookup_csums_range(csum_root, extent_start,
 					       extent_start + extent_size - 1,
-					       &sctx->csum_list, 1);
+					       &sctx->csum_list, 1, false);
 		if (ret) {
 			scrub_parity_mark_sectors_error(sparity, extent_start,
 							extent_size);
@@ -3303,7 +3303,7 @@ static int scrub_simple_mirror(struct scrub_ctx *sctx,
 		if (extent_flags & BTRFS_EXTENT_FLAG_DATA) {
 			ret = btrfs_lookup_csums_range(csum_root, cur_logical,
 					cur_logical + scrub_len - 1,
-					&sctx->csum_list, 1);
+					&sctx->csum_list, 1, false);
 			if (ret)
 				break;
 		}
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 9205c4a5ca81..8af30dab2a17 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -801,7 +801,7 @@ static noinline int replay_one_extent(struct btrfs_trans_handle *trans,
 
 			ret = btrfs_lookup_csums_range(root->log_root,
 						csum_start, csum_end - 1,
-						&ordered_sums, 0);
+						&ordered_sums, 0, false);
 			if (ret)
 				goto out;
 			/*
@@ -4513,7 +4513,7 @@ static noinline int copy_items(struct btrfs_trans_handle *trans,
 		disk_bytenr += extent_offset;
 		ret = btrfs_lookup_csums_range(csum_root, disk_bytenr,
 					       disk_bytenr + extent_num_bytes - 1,
-					       &ordered_sums, 0);
+					       &ordered_sums, 0, false);
 		if (ret)
 			goto out;
 
@@ -4709,7 +4709,7 @@ static int log_extent_csums(struct btrfs_trans_handle *trans,
 	ret = btrfs_lookup_csums_range(csum_root,
 				       em->block_start + csum_offset,
 				       em->block_start + csum_offset +
-				       csum_len - 1, &ordered_sums, 0);
+				       csum_len - 1, &ordered_sums, 0, false);
 	if (ret)
 		return ret;
 
-- 
2.30.2



* [PATCH v3 04/12] btrfs: add the ability to use NO_FLUSH for data reservations
  2022-09-12 19:27 [PATCH v3 00/12] io-uring/btrfs: support async buffered writes Stefan Roesch
                   ` (2 preceding siblings ...)
  2022-09-12 19:27 ` [PATCH v3 03/12] btrfs: make can_nocow_extent nowait compatible Stefan Roesch
@ 2022-09-12 19:27 ` Stefan Roesch
  2022-09-12 19:27 ` [PATCH v3 05/12] btrfs: add btrfs_try_lock_ordered_range Stefan Roesch
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Stefan Roesch @ 2022-09-12 19:27 UTC (permalink / raw)
  To: kernel-team, io-uring, linux-btrfs, linux-mm; +Cc: shr, axboe, josef, fdmanana

From: Josef Bacik <[email protected]>

In order to accommodate NOWAIT IOCB's we need to be able to do NO_FLUSH
data reservations, so plumb this through the delalloc reservation
system.
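
The intended call pattern, roughly; a later patch of this series uses it
like this in the buffered write path:

  ret = btrfs_check_data_free_space(BTRFS_I(inode), &data_reserved, pos,
                                    write_bytes, true /* noflush */);
  if (ret == -ENOSPC || ret == -EAGAIN)
          return -EAGAIN;       /* nowait caller: nothing may be flushed */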

Signed-off-by: Josef Bacik <[email protected]>
Signed-off-by: Stefan Roesch <[email protected]>
---
 fs/btrfs/block-group.c    |  2 +-
 fs/btrfs/delalloc-space.c | 13 ++++++++++---
 fs/btrfs/delalloc-space.h |  3 ++-
 fs/btrfs/file.c           |  2 +-
 fs/btrfs/inode.c          |  4 ++--
 fs/btrfs/space-info.c     |  3 ++-
 6 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index e0375ba9d0fe..9df51245ba93 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -2869,7 +2869,7 @@ static int cache_save_setup(struct btrfs_block_group *block_group,
 	cache_size *= fs_info->sectorsize;
 
 	ret = btrfs_check_data_free_space(BTRFS_I(inode), &data_reserved, 0,
-					  cache_size);
+					  cache_size, false);
 	if (ret)
 		goto out_put;
 
diff --git a/fs/btrfs/delalloc-space.c b/fs/btrfs/delalloc-space.c
index 1e8f17ff829e..118b2e20b2e1 100644
--- a/fs/btrfs/delalloc-space.c
+++ b/fs/btrfs/delalloc-space.c
@@ -127,9 +127,11 @@ int btrfs_alloc_data_chunk_ondemand(struct btrfs_inode *inode, u64 bytes)
 }
 
 int btrfs_check_data_free_space(struct btrfs_inode *inode,
-			struct extent_changeset **reserved, u64 start, u64 len)
+				struct extent_changeset **reserved, u64 start,
+				u64 len, bool noflush)
 {
 	struct btrfs_fs_info *fs_info = inode->root->fs_info;
+	enum btrfs_reserve_flush_enum flush = BTRFS_RESERVE_FLUSH_DATA;
 	int ret;
 
 	/* align the range */
@@ -137,7 +139,12 @@ int btrfs_check_data_free_space(struct btrfs_inode *inode,
 	      round_down(start, fs_info->sectorsize);
 	start = round_down(start, fs_info->sectorsize);
 
-	ret = btrfs_alloc_data_chunk_ondemand(inode, len);
+	if (noflush)
+		flush = BTRFS_RESERVE_NO_FLUSH;
+	else if (btrfs_is_free_space_inode(inode))
+		flush = BTRFS_RESERVE_FLUSH_FREE_SPACE_INODE;
+
+	ret = btrfs_reserve_data_bytes(fs_info, len, flush);
 	if (ret < 0)
 		return ret;
 
@@ -454,7 +461,7 @@ int btrfs_delalloc_reserve_space(struct btrfs_inode *inode,
 {
 	int ret;
 
-	ret = btrfs_check_data_free_space(inode, reserved, start, len);
+	ret = btrfs_check_data_free_space(inode, reserved, start, len, false);
 	if (ret < 0)
 		return ret;
 	ret = btrfs_delalloc_reserve_metadata(inode, len, len, false);
diff --git a/fs/btrfs/delalloc-space.h b/fs/btrfs/delalloc-space.h
index 28bf5c3ef430..e07d46043455 100644
--- a/fs/btrfs/delalloc-space.h
+++ b/fs/btrfs/delalloc-space.h
@@ -7,7 +7,8 @@ struct extent_changeset;
 
 int btrfs_alloc_data_chunk_ondemand(struct btrfs_inode *inode, u64 bytes);
 int btrfs_check_data_free_space(struct btrfs_inode *inode,
-			struct extent_changeset **reserved, u64 start, u64 len);
+			struct extent_changeset **reserved, u64 start, u64 len,
+			bool noflush);
 void btrfs_free_reserved_data_space(struct btrfs_inode *inode,
 			struct extent_changeset *reserved, u64 start, u64 len);
 void btrfs_delalloc_release_space(struct btrfs_inode *inode,
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index f4aa198f0f87..0f257205c63d 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1664,7 +1664,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 		extent_changeset_release(data_reserved);
 		ret = btrfs_check_data_free_space(BTRFS_I(inode),
 						  &data_reserved, pos,
-						  write_bytes);
+						  write_bytes, false);
 		if (ret < 0) {
 			/*
 			 * If we don't have to COW at the offset, reserve
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e0fe2c4b721f..52b3abb4c57c 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4881,7 +4881,7 @@ int btrfs_truncate_block(struct btrfs_inode *inode, loff_t from, loff_t len,
 	block_end = block_start + blocksize - 1;
 
 	ret = btrfs_check_data_free_space(inode, &data_reserved, block_start,
-					  blocksize);
+					  blocksize, false);
 	if (ret < 0) {
 		if (btrfs_check_nocow_lock(inode, block_start, &write_bytes) > 0) {
 			/* For nocow case, no need to reserve data space */
@@ -7766,7 +7766,7 @@ static int btrfs_dio_iomap_begin(struct inode *inode, loff_t start,
 	if (write && !(flags & IOMAP_NOWAIT)) {
 		ret = btrfs_check_data_free_space(BTRFS_I(inode),
 						  &dio_data->data_reserved,
-						  start, data_alloc_len);
+						  start, data_alloc_len, false);
 		if (!ret)
 			dio_data->data_space_reserved = true;
 		else if (ret && !(BTRFS_I(inode)->flags &
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 435559ba94fa..a9d4bd374462 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -1737,7 +1737,8 @@ int btrfs_reserve_data_bytes(struct btrfs_fs_info *fs_info, u64 bytes,
 	int ret;
 
 	ASSERT(flush == BTRFS_RESERVE_FLUSH_DATA ||
-	       flush == BTRFS_RESERVE_FLUSH_FREE_SPACE_INODE);
+	       flush == BTRFS_RESERVE_FLUSH_FREE_SPACE_INODE ||
+	       flush == BTRFS_RESERVE_NO_FLUSH);
 	ASSERT(!current->journal_info || flush != BTRFS_RESERVE_FLUSH_DATA);
 
 	ret = __reserve_bytes(fs_info, data_sinfo, bytes, flush);
-- 
2.30.2



* [PATCH v3 05/12] btrfs: add btrfs_try_lock_ordered_range
  2022-09-12 19:27 [PATCH v3 00/12] io-uring/btrfs: support async buffered writes Stefan Roesch
                   ` (3 preceding siblings ...)
  2022-09-12 19:27 ` [PATCH v3 04/12] btrfs: add the ability to use NO_FLUSH for data reservations Stefan Roesch
@ 2022-09-12 19:27 ` Stefan Roesch
  2022-09-12 19:27 ` [PATCH v3 06/12] btrfs: make btrfs_check_nocow_lock nowait compatible Stefan Roesch
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Stefan Roesch @ 2022-09-12 19:27 UTC (permalink / raw)
  To: kernel-team, io-uring, linux-btrfs, linux-mm; +Cc: shr, axboe, josef, fdmanana

From: Josef Bacik <[email protected]>

For IOCB_NOWAIT we're going to want to use try lock on the extent lock,
and simply bail if there's an ordered extent in the range because the
only choice there is to wait for the ordered extent to complete.
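
The intended usage pattern, roughly; this is how btrfs_check_nocow_lock()
uses it in a later patch of this series:

  if (nowait) {
          if (!btrfs_try_lock_ordered_range(inode, start, end))
                  return -EAGAIN; /* lock contended or ordered extent pending */
  } else {
          btrfs_lock_and_flush_ordered_range(inode, start, end, NULL);
  }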

Signed-off-by: Josef Bacik <[email protected]>
Signed-off-by: Stefan Roesch <[email protected]>
---
 fs/btrfs/ordered-data.c | 28 ++++++++++++++++++++++++++++
 fs/btrfs/ordered-data.h |  1 +
 2 files changed, 29 insertions(+)

diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 1952ac85222c..29c570f9a0ff 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -1041,6 +1041,34 @@ void btrfs_lock_and_flush_ordered_range(struct btrfs_inode *inode, u64 start,
 	}
 }
 
+/*
+ * btrfs_try_lock_ordered_range - lock the passed range without blocking and
+ * check that no ordered extents are pending in it
+ *
+ * @inode:        Inode whose ordered tree is to be searched
+ * @start:        Beginning of range to lock
+ * @end:          Last byte of range to lock
+ *
+ * This function returns true if the extent lock was acquired and no ordered
+ * extents exist in the range, otherwise false (and no lock is held on return).
+ */
+bool btrfs_try_lock_ordered_range(struct btrfs_inode *inode, u64 start, u64 end)
+{
+	struct btrfs_ordered_extent *ordered;
+
+	if (!try_lock_extent(&inode->io_tree, start, end))
+		return false;
+
+	ordered = btrfs_lookup_ordered_range(inode, start, end - start + 1);
+	if (!ordered)
+		return true;
+
+	btrfs_put_ordered_extent(ordered);
+	unlock_extent(&inode->io_tree, start, end);
+	return false;
+}
+
+
 static int clone_ordered_extent(struct btrfs_ordered_extent *ordered, u64 pos,
 				u64 len)
 {
diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h
index 87792f85e2c4..8edd4b2d3952 100644
--- a/fs/btrfs/ordered-data.h
+++ b/fs/btrfs/ordered-data.h
@@ -218,6 +218,7 @@ void btrfs_wait_ordered_roots(struct btrfs_fs_info *fs_info, u64 nr,
 void btrfs_lock_and_flush_ordered_range(struct btrfs_inode *inode, u64 start,
 					u64 end,
 					struct extent_state **cached_state);
+bool btrfs_try_lock_ordered_range(struct btrfs_inode *inode, u64 start, u64 end);
 int btrfs_split_ordered_extent(struct btrfs_ordered_extent *ordered, u64 pre,
 			       u64 post);
 int __init ordered_data_init(void);
-- 
2.30.2



* [PATCH v3 06/12] btrfs: make btrfs_check_nocow_lock nowait compatible
  2022-09-12 19:27 [PATCH v3 00/12] io-uring/btrfs: support async buffered writes Stefan Roesch
                   ` (4 preceding siblings ...)
  2022-09-12 19:27 ` [PATCH v3 05/12] btrfs: add btrfs_try_lock_ordered_range Stefan Roesch
@ 2022-09-12 19:27 ` Stefan Roesch
  2022-09-12 19:27 ` [PATCH v3 07/12] btrfs: make prepare_pages " Stefan Roesch
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Stefan Roesch @ 2022-09-12 19:27 UTC (permalink / raw)
  To: kernel-team, io-uring, linux-btrfs, linux-mm; +Cc: shr, axboe, josef, fdmanana

From: Josef Bacik <[email protected]>

Now that all the helpers btrfs_check_nocow_lock() uses handle nowait, add
a nowait flag to btrfs_check_nocow_lock() so it can be used by the write
path.
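
For reference, the return convention a nowait caller has to handle, as a
condensed sketch of what the buffered write path does with it:

  can_nocow = btrfs_check_nocow_lock(BTRFS_I(inode), pos, &write_bytes,
                                     nowait);
  if (can_nocow < 0)              /* includes -EAGAIN in nowait mode */
          ret = can_nocow;
  else if (can_nocow > 0)         /* NOCOW possible, reserve metadata only */
          only_release_metadata = true;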

Signed-off-by: Josef Bacik <[email protected]>
Signed-off-by: Stefan Roesch <[email protected]>
---
 fs/btrfs/ctree.h |  2 +-
 fs/btrfs/file.c  | 33 ++++++++++++++++++++++-----------
 fs/btrfs/inode.c |  2 +-
 3 files changed, 24 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 5cc5394503e0..eb25d007e651 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3480,7 +3480,7 @@ int btrfs_dirty_pages(struct btrfs_inode *inode, struct page **pages,
 		      struct extent_state **cached, bool noreserve);
 int btrfs_fdatawrite_range(struct inode *inode, loff_t start, loff_t end);
 int btrfs_check_nocow_lock(struct btrfs_inode *inode, loff_t pos,
-			   size_t *write_bytes);
+			   size_t *write_bytes, bool nowait);
 void btrfs_check_nocow_unlock(struct btrfs_inode *inode);
 
 /* tree-defrag.c */
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 0f257205c63d..f18efd9f2bc3 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1481,7 +1481,7 @@ lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages,
  * NOTE: Callers need to call btrfs_check_nocow_unlock() if we return > 0.
  */
 int btrfs_check_nocow_lock(struct btrfs_inode *inode, loff_t pos,
-			   size_t *write_bytes)
+			   size_t *write_bytes, bool nowait)
 {
 	struct btrfs_fs_info *fs_info = inode->root->fs_info;
 	struct btrfs_root *root = inode->root;
@@ -1500,16 +1500,21 @@ int btrfs_check_nocow_lock(struct btrfs_inode *inode, loff_t pos,
 			   fs_info->sectorsize) - 1;
 	num_bytes = lockend - lockstart + 1;
 
-	btrfs_lock_and_flush_ordered_range(inode, lockstart, lockend, NULL);
+	if (nowait) {
+		if (!btrfs_try_lock_ordered_range(inode, lockstart, lockend)) {
+			btrfs_drew_write_unlock(&root->snapshot_lock);
+			return -EAGAIN;
+		}
+	} else {
+		btrfs_lock_and_flush_ordered_range(inode, lockstart, lockend, NULL);
+	}
 	ret = can_nocow_extent(&inode->vfs_inode, lockstart, &num_bytes,
-			NULL, NULL, NULL, false, false);
-	if (ret <= 0) {
-		ret = 0;
+			NULL, NULL, NULL, nowait, false);
+	if (ret <= 0)
 		btrfs_drew_write_unlock(&root->snapshot_lock);
-	} else {
+	else
 		*write_bytes = min_t(size_t, *write_bytes ,
 				     num_bytes - pos + lockstart);
-	}
 	unlock_extent(&inode->io_tree, lockstart, lockend);
 
 	return ret;
@@ -1666,16 +1671,22 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 						  &data_reserved, pos,
 						  write_bytes, false);
 		if (ret < 0) {
+			int can_nocow;
+
 			/*
 			 * If we don't have to COW at the offset, reserve
 			 * metadata only. write_bytes may get smaller than
 			 * requested here.
 			 */
-			if (btrfs_check_nocow_lock(BTRFS_I(inode), pos,
-						   &write_bytes) > 0)
-				only_release_metadata = true;
-			else
+			can_nocow = btrfs_check_nocow_lock(BTRFS_I(inode), pos,
+							   &write_bytes, false);
+			if (can_nocow < 0)
+				ret = can_nocow;
+			if (can_nocow > 0)
+				ret = 0;
+			if (ret)
 				break;
+			only_release_metadata = true;
 		}
 
 		num_pages = DIV_ROUND_UP(write_bytes + offset, PAGE_SIZE);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 52b3abb4c57c..4ff07b47df82 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4883,7 +4883,7 @@ int btrfs_truncate_block(struct btrfs_inode *inode, loff_t from, loff_t len,
 	ret = btrfs_check_data_free_space(inode, &data_reserved, block_start,
 					  blocksize, false);
 	if (ret < 0) {
-		if (btrfs_check_nocow_lock(inode, block_start, &write_bytes) > 0) {
+		if (btrfs_check_nocow_lock(inode, block_start, &write_bytes, false) > 0) {
 			/* For nocow case, no need to reserve data space */
 			only_release_metadata = true;
 		} else {
-- 
2.30.2



* [PATCH v3 07/12] btrfs: make prepare_pages nowait compatible
  2022-09-12 19:27 [PATCH v3 00/12] io-uring/btrfs: support async buffered writes Stefan Roesch
                   ` (5 preceding siblings ...)
  2022-09-12 19:27 ` [PATCH v3 06/12] btrfs: make btrfs_check_nocow_lock nowait compatible Stefan Roesch
@ 2022-09-12 19:27 ` Stefan Roesch
  2022-09-12 19:27 ` [PATCH v3 08/12] btrfs: make lock_and_cleanup_extent_if_need " Stefan Roesch
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Stefan Roesch @ 2022-09-12 19:27 UTC (permalink / raw)
  To: kernel-team, io-uring, linux-btrfs, linux-mm; +Cc: shr, axboe, josef, fdmanana

Add nowait parameter to the prepare_pages function. In case nowait is
specified for an async buffered write request, do a nowait allocation or
return -EAGAIN.
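
The core of the change is the page cache lookup; condensed, it looks
roughly like this:

  int fgp_flags = FGP_LOCK | FGP_ACCESSED | FGP_CREAT;
  gfp_t mask = btrfs_alloc_write_mask(inode->i_mapping);

  if (nowait) {
          fgp_flags |= FGP_NOWAIT;                /* only trylock the page */
          mask &= ~__GFP_DIRECT_RECLAIM;          /* no blocking reclaim */
          mask |= GFP_NOWAIT;
  }
  pages[i] = pagecache_get_page(inode->i_mapping, index + i, fgp_flags,
                                mask | __GFP_WRITE);
  if (!pages[i])
          return nowait ? -EAGAIN : -ENOMEM;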

Signed-off-by: Stefan Roesch <[email protected]>
---
 fs/btrfs/file.c | 42 +++++++++++++++++++++++++++++++++++-------
 1 file changed, 35 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index f18efd9f2bc3..d4362ee7e09e 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1339,26 +1339,54 @@ static int prepare_uptodate_page(struct inode *inode,
 	return 0;
 }
 
+static int get_prepare_fgp_flags(bool nowait)
+{
+	int fgp_flags = FGP_LOCK | FGP_ACCESSED | FGP_CREAT;
+
+	if (nowait)
+		fgp_flags |= FGP_NOWAIT;
+
+	return fgp_flags;
+}
+
+static gfp_t get_prepare_gfp_flags(struct inode *inode, bool nowait)
+{
+	gfp_t gfp;
+
+	gfp = btrfs_alloc_write_mask(inode->i_mapping);
+	if (nowait) {
+		gfp &= ~__GFP_DIRECT_RECLAIM;
+		gfp |= GFP_NOWAIT;
+	}
+
+	return gfp;
+}
+
 /*
  * this just gets pages into the page cache and locks them down.
  */
 static noinline int prepare_pages(struct inode *inode, struct page **pages,
 				  size_t num_pages, loff_t pos,
-				  size_t write_bytes, bool force_uptodate)
+				  size_t write_bytes, bool force_uptodate,
+				  bool nowait)
 {
 	int i;
 	unsigned long index = pos >> PAGE_SHIFT;
-	gfp_t mask = btrfs_alloc_write_mask(inode->i_mapping);
+	gfp_t mask = get_prepare_gfp_flags(inode, nowait);
+	int fgp_flags = get_prepare_fgp_flags(nowait);
 	int err = 0;
 	int faili;
 
 	for (i = 0; i < num_pages; i++) {
 again:
-		pages[i] = find_or_create_page(inode->i_mapping, index + i,
-					       mask | __GFP_WRITE);
+		pages[i] = pagecache_get_page(inode->i_mapping, index + i,
+					fgp_flags, mask | __GFP_WRITE);
 		if (!pages[i]) {
 			faili = i - 1;
-			err = -ENOMEM;
+			if (nowait)
+				err = -EAGAIN;
+			else
+				err = -ENOMEM;
 			goto fail;
 		}
 
@@ -1376,7 +1404,7 @@ static noinline int prepare_pages(struct inode *inode, struct page **pages,
 						    pos + write_bytes, false);
 		if (err) {
 			put_page(pages[i]);
-			if (err == -EAGAIN) {
+			if (!nowait && err == -EAGAIN) {
 				err = 0;
 				goto again;
 			}
@@ -1716,7 +1744,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 		 */
 		ret = prepare_pages(inode, pages, num_pages,
 				    pos, write_bytes,
-				    force_page_uptodate);
+				    force_page_uptodate, false);
 		if (ret) {
 			btrfs_delalloc_release_extents(BTRFS_I(inode),
 						       reserve_bytes);
-- 
2.30.2



* [PATCH v3 08/12] btrfs: make lock_and_cleanup_extent_if_need nowait compatible
  2022-09-12 19:27 [PATCH v3 00/12] io-uring/btrfs: support async buffered writes Stefan Roesch
                   ` (6 preceding siblings ...)
  2022-09-12 19:27 ` [PATCH v3 07/12] btrfs: make prepare_pages " Stefan Roesch
@ 2022-09-12 19:27 ` Stefan Roesch
  2022-09-12 19:27 ` [PATCH v3 09/12] btrfs: plumb NOWAIT through the write path Stefan Roesch
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Stefan Roesch @ 2022-09-12 19:27 UTC (permalink / raw)
  To: kernel-team, io-uring, linux-btrfs, linux-mm; +Cc: shr, axboe, josef, fdmanana

This adds the nowait parameter to lock_and_cleanup_extent_if_need(). If
the nowait parameter is specified, we try to lock the extent in nowait
mode.

Signed-off-by: Stefan Roesch <[email protected]>
---
 fs/btrfs/file.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index d4362ee7e09e..cb54ba631691 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1439,7 +1439,7 @@ static noinline int
 lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages,
 				size_t num_pages, loff_t pos,
 				size_t write_bytes,
-				u64 *lockstart, u64 *lockend,
+				u64 *lockstart, u64 *lockend, bool nowait,
 				struct extent_state **cached_state)
 {
 	struct btrfs_fs_info *fs_info = inode->root->fs_info;
@@ -1454,8 +1454,21 @@ lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages,
 	if (start_pos < inode->vfs_inode.i_size) {
 		struct btrfs_ordered_extent *ordered;
 
-		lock_extent_bits(&inode->io_tree, start_pos, last_pos,
+		if (nowait) {
+			if (!try_lock_extent(&inode->io_tree, start_pos, last_pos)) {
+				for (i = 0; i < num_pages; i++) {
+					unlock_page(pages[i]);
+					put_page(pages[i]);
+					pages[i] = NULL;
+				}
+
+				return -EAGAIN;
+			}
+		} else {
+			lock_extent_bits(&inode->io_tree, start_pos, last_pos,
 				cached_state);
+		}
+
 		ordered = btrfs_lookup_ordered_range(inode, start_pos,
 						     last_pos - start_pos + 1);
 		if (ordered &&
@@ -1754,7 +1767,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 		extents_locked = lock_and_cleanup_extent_if_need(
 				BTRFS_I(inode), pages,
 				num_pages, pos, write_bytes, &lockstart,
-				&lockend, &cached_state);
+				&lockend, false, &cached_state);
 		if (extents_locked < 0) {
 			if (extents_locked == -EAGAIN)
 				goto again;
-- 
2.30.2



* [PATCH v3 09/12] btrfs: plumb NOWAIT through the write path
  2022-09-12 19:27 [PATCH v3 00/12] io-uring/btrfs: support async buffered writes Stefan Roesch
                   ` (7 preceding siblings ...)
  2022-09-12 19:27 ` [PATCH v3 08/12] btrfs: make lock_and_cleanup_extent_if_need " Stefan Roesch
@ 2022-09-12 19:27 ` Stefan Roesch
  2022-09-12 19:27 ` [PATCH v3 10/12] btrfs: make balance_dirty_pages nowait compatible Stefan Roesch
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Stefan Roesch @ 2022-09-12 19:27 UTC (permalink / raw)
  To: kernel-team, io-uring, linux-btrfs, linux-mm; +Cc: shr, axboe, josef, fdmanana

We now have everything set up for nowait; plumb NOWAIT through the write path.

Signed-off-by: Josef Bacik <[email protected]>
Signed-off-by: Stefan Roesch <[email protected]>
---
 fs/btrfs/file.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index cb54ba631691..3d038e1257f9 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1653,8 +1653,9 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 	bool force_page_uptodate = false;
 	loff_t old_isize = i_size_read(inode);
 	unsigned int ilock_flags = 0;
+	const bool nowait = iocb->ki_flags & IOCB_NOWAIT;
 
-	if (iocb->ki_flags & IOCB_NOWAIT)
+	if (nowait)
 		ilock_flags |= BTRFS_ILOCK_TRY;
 
 	ret = btrfs_inode_lock(inode, ilock_flags);
@@ -1710,17 +1711,22 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 		extent_changeset_release(data_reserved);
 		ret = btrfs_check_data_free_space(BTRFS_I(inode),
 						  &data_reserved, pos,
-						  write_bytes, false);
+						  write_bytes, nowait);
 		if (ret < 0) {
 			int can_nocow;
 
+			if (nowait && (ret == -ENOSPC || ret == -EAGAIN)) {
+				ret = -EAGAIN;
+				break;
+			}
+
 			/*
 			 * If we don't have to COW at the offset, reserve
 			 * metadata only. write_bytes may get smaller than
 			 * requested here.
 			 */
 			can_nocow = btrfs_check_nocow_lock(BTRFS_I(inode), pos,
-							   &write_bytes, false);
+							   &write_bytes, nowait);
 			if (can_nocow < 0)
 				ret = can_nocow;
 			if (can_nocow > 0)
@@ -1737,7 +1743,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 		WARN_ON(reserve_bytes == 0);
 		ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode),
 						      reserve_bytes,
-						      reserve_bytes, false);
+						      reserve_bytes, nowait);
 		if (ret) {
 			if (!only_release_metadata)
 				btrfs_free_reserved_data_space(BTRFS_I(inode),
@@ -1767,10 +1773,11 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 		extents_locked = lock_and_cleanup_extent_if_need(
 				BTRFS_I(inode), pages,
 				num_pages, pos, write_bytes, &lockstart,
-				&lockend, false, &cached_state);
+				&lockend, nowait, &cached_state);
 		if (extents_locked < 0) {
-			if (extents_locked == -EAGAIN)
+			if (!nowait && extents_locked == -EAGAIN)
 				goto again;
+
 			btrfs_delalloc_release_extents(BTRFS_I(inode),
 						       reserve_bytes);
 			ret = extents_locked;
-- 
2.30.2



* [PATCH v3 10/12] btrfs: make balance_dirty_pages nowait compatible
  2022-09-12 19:27 [PATCH v3 00/12] io-uring/btrfs: support async buffered writes Stefan Roesch
                   ` (8 preceding siblings ...)
  2022-09-12 19:27 ` [PATCH v3 09/12] btrfs: plumb NOWAIT through the write path Stefan Roesch
@ 2022-09-12 19:27 ` Stefan Roesch
  2022-09-12 19:27 ` [PATCH v3 11/12] btrfs: assert nowait mode is not used for some btree search functions Stefan Roesch
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Stefan Roesch @ 2022-09-12 19:27 UTC (permalink / raw)
  To: kernel-team, io-uring, linux-btrfs, linux-mm; +Cc: shr, axboe, josef, fdmanana

This replaces the call to balance_dirty_pages_ratelimited() in
btrfs_buffered_write() with a call to
balance_dirty_pages_ratelimited_flags().

It also moves the call after the again label. This means the function can
be called a bit later, but this should have no impact in the real world.

Signed-off-by: Stefan Roesch <[email protected]>
---
 fs/btrfs/file.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 3d038e1257f9..4dc6484ff229 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1654,6 +1654,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 	loff_t old_isize = i_size_read(inode);
 	unsigned int ilock_flags = 0;
 	const bool nowait = iocb->ki_flags & IOCB_NOWAIT;
+	unsigned int bdp_flags = nowait ? BDP_ASYNC : 0;
 
 	if (nowait)
 		ilock_flags |= BTRFS_ILOCK_TRY;
@@ -1756,6 +1757,10 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 
 		release_bytes = reserve_bytes;
 again:
+		ret = balance_dirty_pages_ratelimited_flags(inode->i_mapping, bdp_flags);
+		if (ret)
+			break;
+
 		/*
 		 * This is going to setup the pages array with the number of
 		 * pages we want, so we don't really need to worry about the
@@ -1860,8 +1865,6 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 
 		cond_resched();
 
-		balance_dirty_pages_ratelimited(inode->i_mapping);
-
 		pos += copied;
 		num_written += copied;
 	}
-- 
2.30.2



* [PATCH v3 11/12] btrfs: assert nowait mode is not used for some btree search functions
  2022-09-12 19:27 [PATCH v3 00/12] io-uring/btrfs: support async buffered writes Stefan Roesch
                   ` (9 preceding siblings ...)
  2022-09-12 19:27 ` [PATCH v3 10/12] btrfs: make balance_dirty_pages nowait compatible Stefan Roesch
@ 2022-09-12 19:27 ` Stefan Roesch
  2022-09-12 19:27 ` [PATCH v3 12/12] btrfs: enable nowait async buffered writes Stefan Roesch
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Stefan Roesch @ 2022-09-12 19:27 UTC (permalink / raw)
  To: kernel-team, io-uring, linux-btrfs, linux-mm; +Cc: shr, axboe, josef, fdmanana

This adds nowait asserts to btree search functions which are not used
by buffered IO and direct IO paths.

Signed-off-by: Stefan Roesch <[email protected]>
---
 fs/btrfs/ctree.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 71b238364939..d973d6702cd0 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -2164,6 +2164,7 @@ int btrfs_search_old_slot(struct btrfs_root *root, const struct btrfs_key *key,
 
 	lowest_level = p->lowest_level;
 	WARN_ON(p->nodes[0] != NULL);
+	ASSERT(!p->nowait);
 
 	if (p->search_commit_root) {
 		BUG_ON(time_seq);
@@ -4465,6 +4466,7 @@ int btrfs_search_forward(struct btrfs_root *root, struct btrfs_key *min_key,
 	int ret = 1;
 	int keep_locks = path->keep_locks;
 
+	ASSERT(!path->nowait);
 	path->keep_locks = 1;
 again:
 	cur = btrfs_read_lock_root_node(root);
@@ -4645,6 +4647,8 @@ int btrfs_next_old_leaf(struct btrfs_root *root, struct btrfs_path *path,
 	int ret;
 	int i;
 
+	ASSERT(!path->nowait);
+
 	nritems = btrfs_header_nritems(path->nodes[0]);
 	if (nritems == 0)
 		return 1;
-- 
2.30.2



* [PATCH v3 12/12] btrfs: enable nowait async buffered writes
  2022-09-12 19:27 [PATCH v3 00/12] io-uring/btrfs: support async buffered writes Stefan Roesch
                   ` (10 preceding siblings ...)
  2022-09-12 19:27 ` [PATCH v3 11/12] btrfs: assert nowait mode is not used for some btree search functions Stefan Roesch
@ 2022-09-12 19:27 ` Stefan Roesch
  2022-09-13  9:47 ` [PATCH v3 00/12] io-uring/btrfs: support " Filipe Manana
  2022-09-20 12:25 ` David Sterba
  13 siblings, 0 replies; 19+ messages in thread
From: Stefan Roesch @ 2022-09-12 19:27 UTC (permalink / raw)
  To: kernel-team, io-uring, linux-btrfs, linux-mm; +Cc: shr, axboe, josef, fdmanana

Enable nowait async buffered writes in btrfs_do_write_iter() and
btrfs_file_open().

In this version the optimization is not enabled for encoded buffered
writes. Encoded writes are issued via an ioctl, and io-uring currently
does not support ioctls. This might be enabled in the future.

Performance results:
  For fio the following results have been obtained with a queue depth of
  1 and 4k block size (runtime 600 secs):

                 sequential writes:
                 without patch           with patch      libaio     psync
  iops:              55k                    134k          117K       148K
  bw:               221MB/s                 538MB/s       469MB/s    592MB/s
  clat:           15286ns                    82ns         994ns     6340ns

For an io depth of 1, the new patch improves throughput by over two times
(compared to the existing behavior, where buffered writes are processed by an
io-worker process) and also reduces the latency considerably. To achieve the
same or better performance with the existing code an io depth of 4 is
required. Increasing the iodepth further does not lead to improvements.

The tests have been run like this:
./fio --name=seq-writers --ioengine=psync --iodepth=1 --rw=write \
  --bs=4k --direct=0 --size=100000m --time_based --runtime=600   \
  --numjobs=1 --filename=...
./fio --name=seq-writers --ioengine=io_uring --iodepth=1 --rw=write \
  --bs=4k --direct=0 --size=100000m --time_based --runtime=600   \
  --numjobs=1 --filename=...
./fio --name=seq-writers --ioengine=libaio --iodepth=1 --rw=write \
  --bs=4k --direct=0 --size=100000m --time_based --runtime=600   \
  --numjobs=1 --filename=...

Testing:
  This patch has been tested with xfstests, fsx, fio. xfstests shows no new
  diffs compared to running without the patch series.

Signed-off-by: Stefan Roesch <[email protected]>
---
 fs/btrfs/file.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 4dc6484ff229..16052903fa82 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2107,13 +2107,13 @@ ssize_t btrfs_do_write_iter(struct kiocb *iocb, struct iov_iter *from,
 	if (BTRFS_FS_ERROR(inode->root->fs_info))
 		return -EROFS;
 
-	if ((iocb->ki_flags & IOCB_NOWAIT) && !(iocb->ki_flags & IOCB_DIRECT))
-		return -EOPNOTSUPP;
-
 	if (sync)
 		atomic_inc(&inode->sync_writers);
 
 	if (encoded) {
+		if (iocb->ki_flags & IOCB_NOWAIT)
+			return -EOPNOTSUPP;
+
 		num_written = btrfs_encoded_write(iocb, from, encoded);
 		num_sync = encoded->len;
 	} else if (iocb->ki_flags & IOCB_DIRECT) {
@@ -3755,7 +3755,7 @@ static int btrfs_file_open(struct inode *inode, struct file *filp)
 {
 	int ret;
 
-	filp->f_mode |= FMODE_NOWAIT | FMODE_BUF_RASYNC;
+	filp->f_mode |= FMODE_NOWAIT | FMODE_BUF_RASYNC | FMODE_BUF_WASYNC;
 
 	ret = fsverity_file_open(inode, filp);
 	if (ret)
-- 
2.30.2



* Re: [PATCH v3 00/12] io-uring/btrfs: support async buffered writes
  2022-09-12 19:27 [PATCH v3 00/12] io-uring/btrfs: support async buffered writes Stefan Roesch
                   ` (11 preceding siblings ...)
  2022-09-12 19:27 ` [PATCH v3 12/12] btrfs: enable nowait async buffered writes Stefan Roesch
@ 2022-09-13  9:47 ` Filipe Manana
  2022-09-20 12:25 ` David Sterba
  13 siblings, 0 replies; 19+ messages in thread
From: Filipe Manana @ 2022-09-13  9:47 UTC (permalink / raw)
  To: Stefan Roesch; +Cc: kernel-team, io-uring, linux-btrfs, linux-mm, axboe, josef

On Mon, Sep 12, 2022 at 8:28 PM Stefan Roesch <[email protected]> wrote:
>
> This patch series adds support for async buffered writes when using both
> btrfs and io-uring. Currently io-uring only supports buffered writes (for btrfs)
> in the slow path, by processing them in the io workers. With this patch series
> it is now possible to support buffered writes in the fast path. To be able to use
> the fast path, the required pages must be in the page cache, the required locks
> in btrfs can be granted immediately and no additional blocks need to be read
> form disk.
>
> This patch series makes use of the changes that have been introduced by a
> previous patch series: "io-uring/xfs: support async buffered writes"
>
> Performance results:
>
> The new patch improves throughput by over two times (compared to the existing
> behavior, where buffered writes are processed by an io-worker process) and also
> the latency is considerably reduced. Detailed results are part of the changelog
> of the first commit.

Not in the first, but in the last commit instead.

Looks fine to me now, thanks.

Reviewed-by: Filipe Manana <[email protected]>


>
>
> BTRFS changes:
> -Add an option for NOWAIT IOCB's so that tree searches do not wait on locks. This
>   adds the nowait option to btrfs_path.
>  -For NOWAIT buffered writes on PREALLOC or NOCOW extents tell can_nocow_extent()
>   that we don't want to wait on any locks or metadata IO.
>  -Support no_flush reservations for nowait buffered writes.
>  -Add btrfs_try_lock_ordered_range() function.
>  -Add nowait flag to btrfs_check_nocow_lock() to use it in write code path.
>  -Add nowait parameter to prepare_pages() function.
>  -Plumb nowait through the write code path.
>  -Enable nowait buffered writes.
>
>
> Testing:
>   This patch has been tested with xfstests, fsx, fio. xfstests shows no new
>   diffs compared to running without the patch series.
>
>
> Changes:
>
> V3:
>  - Updated changelog of "btrfs: implement a nowait option for tree searches"
>    to say -EAGAIN.
>  - Use bool return value in signature of btrfs_try_lock_ordered_range
>  - Renamed variable tmp to can_nocow in btrfs_buffered_write
>  - Fixed coding style in get_prepare_fgp_flags
>  - Set pages[i] to NULL in error code path of lock_and_cleanup_extent_if_need()
>  - Added const in definition of "bool nowait"
>  - Removed unlikely from btrfs_buffered_write
>  - Rephrased changelog for "btrfs: add assert to search functions" and used
>    asserts instead of warnings
>  - Explained why encoded writes are not supported in the changelog
>  - Moved performance results to changelog of first commit
>
> V2:
>  - Replace EWOULDBLOCK with EAGAIN. In Version 1 it was not used consistently
>  - Export function balance_dirty_pages_ratelimited_flags()
>  - Add asserts/warnings for search functions when nowait is set, but we don't
>    expect that they are invoked with nowait set.
>
>
>
> Josef Bacik (5):
>   btrfs: implement a nowait option for tree searches
>   btrfs: make can_nocow_extent nowait compatible
>   btrfs: add the ability to use NO_FLUSH for data reservations
>   btrfs: add btrfs_try_lock_ordered_range
>   btrfs: make btrfs_check_nocow_lock nowait compatible
>
> Stefan Roesch (7):
>   mm: export balance_dirty_pages_ratelimited_flags()
>   btrfs: make prepare_pages nowait compatible
>   btrfs: make lock_and_cleanup_extent_if_need nowait compatible
>   btrfs: plumb NOWAIT through the write path
>   btrfs: make balance_dirty_pages nowait compatible
>   btrfs: assert nowait mode is not used for some btree search functions
>   btrfs: enable nowait async buffered writes
>
>  fs/btrfs/block-group.c    |   2 +-
>  fs/btrfs/ctree.c          |  43 ++++++++++++-
>  fs/btrfs/ctree.h          |   8 ++-
>  fs/btrfs/delalloc-space.c |  13 +++-
>  fs/btrfs/delalloc-space.h |   3 +-
>  fs/btrfs/extent-tree.c    |   5 ++
>  fs/btrfs/file-item.c      |   4 +-
>  fs/btrfs/file.c           | 124 ++++++++++++++++++++++++++++----------
>  fs/btrfs/inode.c          |  22 ++++---
>  fs/btrfs/locking.c        |  23 +++++++
>  fs/btrfs/locking.h        |   1 +
>  fs/btrfs/ordered-data.c   |  28 +++++++++
>  fs/btrfs/ordered-data.h   |   1 +
>  fs/btrfs/relocation.c     |   2 +-
>  fs/btrfs/scrub.c          |   4 +-
>  fs/btrfs/space-info.c     |   3 +-
>  fs/btrfs/tree-log.c       |   6 +-
>  mm/page-writeback.c       |   1 +
>  18 files changed, 234 insertions(+), 59 deletions(-)
>
>
> base-commit: 80e78fcce86de0288793a0ef0f6acf37656ee4cf
> --
> 2.30.2
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 00/12] io-uring/btrfs: support async buffered writes
  2022-09-12 19:27 [PATCH v3 00/12] io-uring/btrfs: support async buffered writes Stefan Roesch
                   ` (12 preceding siblings ...)
  2022-09-13  9:47 ` [PATCH v3 00/12] io-uring/btrfs: support " Filipe Manana
@ 2022-09-20 12:25 ` David Sterba
  2022-09-21 10:19   ` David Sterba
  13 siblings, 1 reply; 19+ messages in thread
From: David Sterba @ 2022-09-20 12:25 UTC (permalink / raw)
  To: Stefan Roesch
  Cc: kernel-team, io-uring, linux-btrfs, linux-mm, axboe, josef,
	fdmanana

On Mon, Sep 12, 2022 at 12:27:40PM -0700, Stefan Roesch wrote:
> This patch series adds support for async buffered writes when using both
> btrfs and io-uring. Currently io-uring only supports buffered writes (for btrfs)
> in the slow path, by processing them in the io workers. With this patch series
> it is now possible to support buffered writes in the fast path. To be able to use
> the fast path, the required pages must be in the page cache, the required locks
> in btrfs can be granted immediately and no additional blocks need to be read
> from disk.
> 
> This patch series makes use of the changes that have been introduced by a
> previous patch series: "io-uring/xfs: support async buffered writes"
> 
> Performance results:
> 
> The new patch series more than doubles throughput (compared to the existing
> behavior, where buffered writes are processed by an io-worker process), and
> latency is also considerably reduced. Detailed results are part of the
> changelog of the first commit.

Thanks. It's late to include these patches in the 6.1 queue, but the series
is now in for-next and will be added to misc-next after rc1, targeting a
merge in 6.2. I did some minor fixups, so please don't resend the full
series unless there's a significant change. Incremental changes are fine if
needed.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 00/12] io-uring/btrfs: support async buffered writes
  2022-09-20 12:25 ` David Sterba
@ 2022-09-21 10:19   ` David Sterba
  2022-09-21 14:33     ` Jens Axboe
  0 siblings, 1 reply; 19+ messages in thread
From: David Sterba @ 2022-09-21 10:19 UTC (permalink / raw)
  To: David Sterba
  Cc: Stefan Roesch, kernel-team, io-uring, linux-btrfs, linux-mm,
	axboe, josef, fdmanana

On Tue, Sep 20, 2022 at 02:25:40PM +0200, David Sterba wrote:
> On Mon, Sep 12, 2022 at 12:27:40PM -0700, Stefan Roesch wrote:
> > This patch series adds support for async buffered writes when using both
> > btrfs and io-uring. Currently io-uring only supports buffered writes (for btrfs)
> > in the slow path, by processing them in the io workers. With this patch series
> > it is now possible to support buffered writes in the fast path. To be able to use
> > the fast path, the required pages must be in the page cache, the required locks
> > in btrfs can be granted immediately and no additional blocks need to be read
> > from disk.
> > 
> > This patch series makes use of the changes that have been introduced by a
> > previous patch series: "io-uring/xfs: support async buffered writes"
> > 
> > Performance results:
> > 
> > The new patch series more than doubles throughput (compared to the existing
> > behavior, where buffered writes are processed by an io-worker process), and
> > latency is also considerably reduced. Detailed results are part of the
> > changelog of the first commit.
> 
> Thanks. It's late to include these patches in the 6.1 queue, but the series
> is now in for-next and will be added to misc-next after rc1, targeting a
> merge in 6.2. I did some minor fixups, so please don't resend the full
> series unless there's a significant change. Incremental changes are fine if
> needed.

I'm revisiting the merge target. The potential risk seems low here
(straightforward changes for a separate feature), so the series is now in
the 6.1 queue. We still don't have an ack for the function export, so it
would be good to have one.
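
For reference, the export in question is the one added by patch 01/12.
Judging by the diffstat (mm/page-writeback.c | 1 +), it amounts to a single
line after the function body; a sketch, assuming the conventional
EXPORT_SYMBOL_GPL form:

/* mm/page-writeback.c, following the definition of the function */
EXPORT_SYMBOL_GPL(balance_dirty_pages_ratelimited_flags);

Without the export, a modular btrfs build would fail to link once fs/btrfs
starts calling the function.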

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 00/12] io-uring/btrfs: support async buffered writes
  2022-09-21 10:19   ` David Sterba
@ 2022-09-21 14:33     ` Jens Axboe
  0 siblings, 0 replies; 19+ messages in thread
From: Jens Axboe @ 2022-09-21 14:33 UTC (permalink / raw)
  To: dsterba
  Cc: Stefan Roesch, kernel-team, io-uring, linux-btrfs, linux-mm,
	josef, fdmanana

On 9/21/22 4:19 AM, David Sterba wrote:
> On Tue, Sep 20, 2022 at 02:25:40PM +0200, David Sterba wrote:
>> On Mon, Sep 12, 2022 at 12:27:40PM -0700, Stefan Roesch wrote:
>>> This patch series adds support for async buffered writes when using both
>>> btrfs and io-uring. Currently io-uring only supports buffered writes (for btrfs)
>>> in the slow path, by processing them in the io workers. With this patch series
>>> it is now possible to support buffered writes in the fast path. To be able to use
>>> the fast path, the required pages must be in the page cache, the required locks
>>> in btrfs can be granted immediately and no additional blocks need to be read
>>> from disk.
>>>
>>> This patch series makes use of the changes that have been introduced by a
>>> previous patch series: "io-uring/xfs: support async buffered writes"
>>>
>>> Performance results:
>>>
>>> The new patch series more than doubles throughput (compared to the existing
>>> behavior, where buffered writes are processed by an io-worker process), and
>>> latency is also considerably reduced. Detailed results are part of the
>>> changelog of the first commit.
>>
>> Thanks. It's late to include these patches in the 6.1 queue, but the series
>> is now in for-next and will be added to misc-next after rc1, targeting a
>> merge in 6.2. I did some minor fixups, so please don't resend the full
>> series unless there's a significant change. Incremental changes are fine if
>> needed.
> 
> I'm revisiting the merge target. The potential risk seems low here
> (straightforward changes for a separate feature), so the series is now in
> the 6.1 queue. We still don't have an ack for the function export, so it
> would be good to have one.

I don't think the ack there is a big deal: iomap already uses the function,
but iomap can't be built as a module. So this is just adding a modular user
of the same thing, really.

6.1 sounds good to me!

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 02/12] btrfs: implement a nowait option for tree searches
  2022-09-12 19:27 ` [PATCH v3 02/12] btrfs: implement a nowait option for tree searches Stefan Roesch
@ 2022-09-26  9:14   ` Filipe Manana
  2022-09-26 10:55     ` David Sterba
  0 siblings, 1 reply; 19+ messages in thread
From: Filipe Manana @ 2022-09-26  9:14 UTC (permalink / raw)
  To: Stefan Roesch
  Cc: kernel-team, io-uring, linux-btrfs, linux-mm, axboe, josef,
	fdmanana

On Mon, Sep 12, 2022 at 12:27:42PM -0700, Stefan Roesch wrote:
> From: Josef Bacik <[email protected]>
> 
> For NOWAIT IOCBs we'll need a way to tell the search to not wait on locks
> or anything.  Accomplish this by adding a path->nowait flag that will
> use trylocks and skip reading of metadata, returning -EAGAIN in either
> of these cases.  For now we only need this for reads, so only the read
> side is handled.  Add an ASSERT() to catch anybody trying to use this
> for writes so they know they'll have to implement the write side.
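
As a usage sketch (not part of the patch), a read-only nowait search would
look roughly like this; `root` and `key` are assumed to be set up by the
caller:

struct btrfs_path *path;
int ret;

path = btrfs_alloc_path();
if (!path)
	return -ENOMEM;
path->nowait = 1;	/* use trylocks; fail with -EAGAIN instead of blocking */

/* read-only search: no transaction handle, ins_len == 0, cow == 0 */
ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
if (ret == -EAGAIN) {
	/* punt the operation to a context that is allowed to block */
}
btrfs_free_path(path);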
> 
> Signed-off-by: Josef Bacik <[email protected]>
> Signed-off-by: Stefan Roesch <[email protected]>
> ---
>  fs/btrfs/ctree.c   | 39 ++++++++++++++++++++++++++++++++++++---
>  fs/btrfs/ctree.h   |  1 +
>  fs/btrfs/locking.c | 23 +++++++++++++++++++++++
>  fs/btrfs/locking.h |  1 +
>  4 files changed, 61 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index ebfa35fe1c38..71b238364939 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -1447,6 +1447,11 @@ read_block_for_search(struct btrfs_root *root, struct btrfs_path *p,
>  			return 0;
>  		}
>  
> +		if (p->nowait) {
> +			free_extent_buffer(tmp);
> +			return -EAGAIN;
> +		}
> +
>  		if (unlock_up)
>  			btrfs_unlock_up_safe(p, level + 1);
>  
> @@ -1467,6 +1472,8 @@ read_block_for_search(struct btrfs_root *root, struct btrfs_path *p,
>  			ret = -EAGAIN;
>  
>  		goto out;
> +	} else if (p->nowait) {
> +		return -EAGAIN;
>  	}
>  
>  	if (unlock_up) {
> @@ -1634,7 +1641,13 @@ static struct extent_buffer *btrfs_search_slot_get_root(struct btrfs_root *root,
>  		 * We don't know the level of the root node until we actually
>  		 * have it read locked
>  		 */
> -		b = btrfs_read_lock_root_node(root);
> +		if (p->nowait) {
> +			b = btrfs_try_read_lock_root_node(root);
> +			if (IS_ERR(b))
> +				return b;
> +		} else {
> +			b = btrfs_read_lock_root_node(root);
> +		}
>  		level = btrfs_header_level(b);
>  		if (level > write_lock_level)
>  			goto out;
> @@ -1910,6 +1923,13 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root *root,
>  	WARN_ON(p->nodes[0] != NULL);
>  	BUG_ON(!cow && ins_len);
>  
> +	/*
> +	 * For now only allow nowait for read-only operations.  There's no
> +	 * strict reason why we can't support writes; we just only need it
> +	 * for reads, so only the read side is implemented for now.
> +	 */
> +	ASSERT(!p->nowait || !cow);
> +
>  	if (ins_len < 0) {
>  		lowest_unlock = 2;
>  
> @@ -1936,7 +1956,12 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root *root,
>  
>  	if (p->need_commit_sem) {
>  		ASSERT(p->search_commit_root);
> -		down_read(&fs_info->commit_root_sem);
> +		if (p->nowait) {
> +			if (!down_read_trylock(&fs_info->commit_root_sem))
> +				return -EAGAIN;
> +		} else {
> +			down_read(&fs_info->commit_root_sem);
> +		}
>  	}
>  
>  again:
> @@ -2082,7 +2107,15 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root *root,
>  				btrfs_tree_lock(b);
>  				p->locks[level] = BTRFS_WRITE_LOCK;
>  			} else {
> -				btrfs_tree_read_lock(b);
> +				if (p->nowait) {
> +					if (!btrfs_try_tree_read_lock(b)) {
> +						free_extent_buffer(b);
> +						ret = -EAGAIN;
> +						goto done;
> +					}
> +				} else {
> +					btrfs_tree_read_lock(b);
> +				}
>  				p->locks[level] = BTRFS_READ_LOCK;
>  			}
>  			p->nodes[level] = b;
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index df8c99c99df9..ca59ba6421a9 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -443,6 +443,7 @@ struct btrfs_path {
>  	 * header (ie. sizeof(struct btrfs_item) is not included).
>  	 */
>  	unsigned int search_for_extension:1;
> +	unsigned int nowait:1;
>  };
>  #define BTRFS_MAX_EXTENT_ITEM_SIZE(r) ((BTRFS_LEAF_DATA_SIZE(r->fs_info) >> 4) - \
>  					sizeof(struct btrfs_item))
> diff --git a/fs/btrfs/locking.c b/fs/btrfs/locking.c
> index 9063072b399b..d6c88922d3e2 100644
> --- a/fs/btrfs/locking.c
> +++ b/fs/btrfs/locking.c
> @@ -285,6 +285,29 @@ struct extent_buffer *btrfs_read_lock_root_node(struct btrfs_root *root)
>  	return eb;
>  }
>  
> +/*
> + * Loop around taking references on and locking the root node of the tree in
> + * nowait mode until we end up with a lock on the root node, or return
> + * -EAGAIN to avoid blocking.
> + *
> + * Return: root extent buffer with read lock held, or an ERR_PTR(-EAGAIN).
> + */
> +struct extent_buffer *btrfs_try_read_lock_root_node(struct btrfs_root *root)
> +{
> +	struct extent_buffer *eb;
> +
> +	while (1) {
> +		eb = btrfs_root_node(root);
> +		if (!btrfs_try_tree_read_lock(eb))
> +			return ERR_PTR(-EAGAIN);

There's a leak of the extent buffer here.
This fixes it up:

diff --git a/fs/btrfs/locking.c b/fs/btrfs/locking.c
index 9d53bcfb6d9b..0eab3cb274a1 100644
--- a/fs/btrfs/locking.c
+++ b/fs/btrfs/locking.c
@@ -298,8 +298,10 @@ struct extent_buffer *btrfs_try_read_lock_root_node(struct btrfs_root *root)
 
        while (1) {
                eb = btrfs_root_node(root);
-               if (!btrfs_try_tree_read_lock(eb))
+               if (!btrfs_try_tree_read_lock(eb)) {
+                       free_extent_buffer(eb);
                        return ERR_PTR(-EAGAIN);
+               }
                if (eb == root->node)
                        break;
                btrfs_tree_read_unlock(eb);


Thanks.
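
For clarity, with that fixup folded in, the complete helper reads as follows
(reconstructed from the hunks in this thread):

struct extent_buffer *btrfs_try_read_lock_root_node(struct btrfs_root *root)
{
	struct extent_buffer *eb;

	while (1) {
		eb = btrfs_root_node(root);	/* takes a reference on the node */
		if (!btrfs_try_tree_read_lock(eb)) {
			free_extent_buffer(eb);	/* drop the reference on failure */
			return ERR_PTR(-EAGAIN);
		}
		/* the root may have changed while we did not hold the lock */
		if (eb == root->node)
			break;
		btrfs_tree_read_unlock(eb);
		free_extent_buffer(eb);
	}
	return eb;
}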

> +		if (eb == root->node)
> +			break;
> +		btrfs_tree_read_unlock(eb);
> +		free_extent_buffer(eb);
> +	}
> +	return eb;
> +}
> +
>  /*
>   * DREW locks
>   * ==========
> diff --git a/fs/btrfs/locking.h b/fs/btrfs/locking.h
> index ab268be09bb5..490c7a79e995 100644
> --- a/fs/btrfs/locking.h
> +++ b/fs/btrfs/locking.h
> @@ -94,6 +94,7 @@ int btrfs_try_tree_read_lock(struct extent_buffer *eb);
>  int btrfs_try_tree_write_lock(struct extent_buffer *eb);
>  struct extent_buffer *btrfs_lock_root_node(struct btrfs_root *root);
>  struct extent_buffer *btrfs_read_lock_root_node(struct btrfs_root *root);
> +struct extent_buffer *btrfs_try_read_lock_root_node(struct btrfs_root *root);
>  
>  #ifdef CONFIG_BTRFS_DEBUG
>  static inline void btrfs_assert_tree_write_locked(struct extent_buffer *eb)
> -- 
> 2.30.2
> 

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 02/12] btrfs: implement a nowait option for tree searches
  2022-09-26  9:14   ` Filipe Manana
@ 2022-09-26 10:55     ` David Sterba
  0 siblings, 0 replies; 19+ messages in thread
From: David Sterba @ 2022-09-26 10:55 UTC (permalink / raw)
  To: Filipe Manana
  Cc: Stefan Roesch, kernel-team, io-uring, linux-btrfs, linux-mm,
	axboe, josef, fdmanana

On Mon, Sep 26, 2022 at 10:14:40AM +0100, Filipe Manana wrote:
> On Mon, Sep 12, 2022 at 12:27:42PM -0700, Stefan Roesch wrote:
> > From: Josef Bacik <[email protected]>
> > +{
> > +	struct extent_buffer *eb;
> > +
> > +	while (1) {
> > +		eb = btrfs_root_node(root);
> > +		if (!btrfs_try_tree_read_lock(eb))
> > +			return ERR_PTR(-EAGAIN);
> 
> There's a leak of the extent buffer here.
> This fixes it up:
> 
> diff --git a/fs/btrfs/locking.c b/fs/btrfs/locking.c
> index 9d53bcfb6d9b..0eab3cb274a1 100644
> --- a/fs/btrfs/locking.c
> +++ b/fs/btrfs/locking.c
> @@ -298,8 +298,10 @@ struct extent_buffer *btrfs_try_read_lock_root_node(struct btrfs_root *root)
>  
>         while (1) {
>                 eb = btrfs_root_node(root);
> -               if (!btrfs_try_tree_read_lock(eb))
> +               if (!btrfs_try_tree_read_lock(eb)) {
> +                       free_extent_buffer(eb);
>                         return ERR_PTR(-EAGAIN);
> +               }
>                 if (eb == root->node)
>                         break;
>                 btrfs_tree_read_unlock(eb);

Folded into the commit, thanks.

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2022-09-26 12:28 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-12 19:27 [PATCH v3 00/12] io-uring/btrfs: support async buffered writes Stefan Roesch
2022-09-12 19:27 ` [PATCH v3 01/12] mm: export balance_dirty_pages_ratelimited_flags() Stefan Roesch
2022-09-12 19:27 ` [PATCH v3 02/12] btrfs: implement a nowait option for tree searches Stefan Roesch
2022-09-26  9:14   ` Filipe Manana
2022-09-26 10:55     ` David Sterba
2022-09-12 19:27 ` [PATCH v3 03/12] btrfs: make can_nocow_extent nowait compatible Stefan Roesch
2022-09-12 19:27 ` [PATCH v3 04/12] btrfs: add the ability to use NO_FLUSH for data reservations Stefan Roesch
2022-09-12 19:27 ` [PATCH v3 05/12] btrfs: add btrfs_try_lock_ordered_range Stefan Roesch
2022-09-12 19:27 ` [PATCH v3 06/12] btrfs: make btrfs_check_nocow_lock nowait compatible Stefan Roesch
2022-09-12 19:27 ` [PATCH v3 07/12] btrfs: make prepare_pages " Stefan Roesch
2022-09-12 19:27 ` [PATCH v3 08/12] btrfs: make lock_and_cleanup_extent_if_need " Stefan Roesch
2022-09-12 19:27 ` [PATCH v3 09/12] btrfs: plumb NOWAIT through the write path Stefan Roesch
2022-09-12 19:27 ` [PATCH v3 10/12] btrfs: make balance_dirty_pages nowait compatible Stefan Roesch
2022-09-12 19:27 ` [PATCH v3 11/12] btrfs: assert nowait mode is not used for some btree search functions Stefan Roesch
2022-09-12 19:27 ` [PATCH v3 12/12] btrfs: enable nowait async buffered writes Stefan Roesch
2022-09-13  9:47 ` [PATCH v3 00/12] io-uring/btrfs: support " Filipe Manana
2022-09-20 12:25 ` David Sterba
2022-09-21 10:19   ` David Sterba
2022-09-21 14:33     ` Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox