public inbox for [email protected]
* [PATCH v2 00/12] io-uring/btrfs: support async buffered writes
@ 2022-09-08  0:26 Stefan Roesch
  2022-09-08  0:26 ` [PATCH v2 01/12] mm: export balance_dirty_pages_ratelimited_flags() Stefan Roesch
                   ` (12 more replies)
  0 siblings, 13 replies; 33+ messages in thread
From: Stefan Roesch @ 2022-09-08  0:26 UTC (permalink / raw)
  To: kernel-team, io-uring, linux-btrfs; +Cc: shr, axboe, josef, fdmanana

This patch series adds support for async buffered writes when using both
btrfs and io-uring. Currently io-uring supports buffered writes (for btrfs)
only in the slow path, by processing them in the io workers. With this patch
series it is now possible to support buffered writes in the fast path. To be
able to use the fast path, the required pages must be in the page cache, the
required locks in btrfs must be grantable immediately and no additional
blocks need to be read from disk.
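
For illustration, the consumer side looks roughly like the following liburing
sketch (illustrative only, not part of the series; error handling trimmed).
A plain buffered write is submitted and, with this series, can complete
inline instead of being punted to an io-worker:

  #include <fcntl.h>
  #include <string.h>
  #include <unistd.h>
  #include <liburing.h>

  static int buffered_write_example(const char *path)
  {
          struct io_uring ring;
          struct io_uring_sqe *sqe;
          struct io_uring_cqe *cqe;
          char buf[4096];
          int fd, ret;

          memset(buf, 'a', sizeof(buf));
          /* No O_DIRECT: this is a buffered write. */
          fd = open(path, O_WRONLY | O_CREAT, 0644);
          if (fd < 0)
                  return -1;

          ret = io_uring_queue_init(1, &ring, 0);
          if (ret < 0) {
                  close(fd);
                  return ret;
          }

          sqe = io_uring_get_sqe(&ring);
          io_uring_prep_write(sqe, fd, buf, sizeof(buf), 0);
          io_uring_submit(&ring);

          /* When the fast path is available (pages cached, btrfs locks
           * uncontended, no metadata reads needed) the completion is
           * posted without involving an io-worker. */
          ret = io_uring_wait_cqe(&ring, &cqe);
          if (ret == 0)
                  io_uring_cqe_seen(&ring, cqe);

          io_uring_queue_exit(&ring);
          close(fd);
          return ret;
  }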

This patch series makes use of the changes that have been introduced by a
previous patch series: "io-uring/xfs: support async buffered writes"

Performance results:
  For fio the following results have been obtained with a queue depth of
  1 and 4k block size (runtime 600 secs):

                 sequential writes:
                 without patch           with patch      libaio     psync
  iops:              55k                    134k          117k       148k
  bw:               221MB/s                 538MB/s       469MB/s    592MB/s
  clat:           15286ns                    82ns         994ns     6340ns


For an io depth of 1, the patch series improves throughput by more than two
times (compared to the existing behavior, where buffered writes are processed
by an io-worker process) and the latency is also considerably reduced. To
achieve the same or better performance with the existing code, an io depth of
4 is required. Increasing the io depth further does not lead to improvements.


BTRFS changes:
 -Add an option for NOWAIT IOCBs so that tree searches do not wait on locks.
  This adds the nowait option to btrfs_path.

 -For NOWAIT buffered writes on PREALLOC or NOCOW extents, tell can_nocow_extent()
  that we don't want to wait on any locks or metadata IO.

 -Support no_flush reservations for nowait buffered writes.

 -Add btrfs_try_lock_ordered_range() function.

 -Add nowait flag to btrfs_check_nocow_lock() to use it in write code path.

 -Add nowait parameter to prepare_pages() function.

 -Plumb nowait through the write code path.

 -Enable nowait buffered writes.


Testing:
  This patch series has been tested with xfstests, fsx and fio. xfstests shows
  no new diffs compared to running without the patch series.


Changes:

V2:
 - Replace EWOULDBLOCK with EAGAIN; in version 1 it was not used consistently.
 - Export function balance_dirty_pages_ratelimited_flags()
 - Add asserts/warnings to search functions that we do not expect to be
   invoked with nowait set.


Josef Bacik (5):
  btrfs: implement a nowait option for tree searches
  btrfs: make can_nocow_extent nowait compatible
  btrfs: add the ability to use NO_FLUSH for data reservations
  btrfs: add btrfs_try_lock_ordered_range
  btrfs: make btrfs_check_nocow_lock nowait compatible

Stefan Roesch (7):
  mm: export balance_dirty_pages_ratelimited_flags()
  btrfs: make prepare_pages nowait compatible
  btrfs: make lock_and_cleanup_extent_if_need nowait compatible
  btrfs: plumb NOWAIT through the write path
  btrfs: make balance_dirty_pages nowait compatible
  btrfs: add assert to search functions
  btrfs: enable nowait async buffered writes

 fs/btrfs/block-group.c    |   2 +-
 fs/btrfs/ctree.c          |  48 ++++++++++++++-
 fs/btrfs/ctree.h          |   8 ++-
 fs/btrfs/delalloc-space.c |  13 +++-
 fs/btrfs/delalloc-space.h |   3 +-
 fs/btrfs/extent-tree.c    |   5 ++
 fs/btrfs/file-item.c      |   4 +-
 fs/btrfs/file.c           | 124 ++++++++++++++++++++++++++++----------
 fs/btrfs/inode.c          |  22 ++++---
 fs/btrfs/locking.c        |  23 +++++++
 fs/btrfs/locking.h        |   1 +
 fs/btrfs/ordered-data.c   |  28 +++++++++
 fs/btrfs/ordered-data.h   |   1 +
 fs/btrfs/relocation.c     |   2 +-
 fs/btrfs/scrub.c          |   4 +-
 fs/btrfs/space-info.c     |   3 +-
 fs/btrfs/tree-log.c       |   6 +-
 mm/page-writeback.c       |   1 +
 18 files changed, 239 insertions(+), 59 deletions(-)


base-commit: 53e99dcff61e1523ec1c3628b2d564ba15d32eb7
-- 
2.30.2



* [PATCH v2 01/12] mm: export balance_dirty_pages_ratelimited_flags()
  2022-09-08  0:26 [PATCH v2 00/12] io-uring/btrfs: support async buffered writes Stefan Roesch
@ 2022-09-08  0:26 ` Stefan Roesch
  2022-09-08 10:18   ` Filipe Manana
  2022-09-08  0:26 ` [PATCH v2 02/12] btrfs: implement a nowait option for tree searches Stefan Roesch
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 33+ messages in thread
From: Stefan Roesch @ 2022-09-08  0:26 UTC (permalink / raw)
  To: kernel-team, io-uring, linux-btrfs
  Cc: shr, axboe, josef, fdmanana, kernel test robot

Export the function balance_dirty_pages_ratelimited_flags(). It is now
also called from btrfs.

Signed-off-by: Stefan Roesch <[email protected]>
Reported-by: kernel test robot <[email protected]>
---
 mm/page-writeback.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 032a7bf8d259..7e9d8d857ecc 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1933,6 +1933,7 @@ int balance_dirty_pages_ratelimited_flags(struct address_space *mapping,
 	wb_put(wb);
 	return ret;
 }
+EXPORT_SYMBOL_GPL(balance_dirty_pages_ratelimited_flags);
 
 /**
  * balance_dirty_pages_ratelimited - balance dirty memory state.
-- 
2.30.2



* [PATCH v2 02/12] btrfs: implement a nowait option for tree searches
  2022-09-08  0:26 [PATCH v2 00/12] io-uring/btrfs: support async buffered writes Stefan Roesch
  2022-09-08  0:26 ` [PATCH v2 01/12] mm: export balance_dirty_pages_ratelimited_flags() Stefan Roesch
@ 2022-09-08  0:26 ` Stefan Roesch
  2022-09-08 13:19   ` Josef Bacik
  2022-09-08  0:26 ` [PATCH v2 03/12] btrfs: make can_nocow_extent nowait compatible Stefan Roesch
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 33+ messages in thread
From: Stefan Roesch @ 2022-09-08  0:26 UTC (permalink / raw)
  To: kernel-team, io-uring, linux-btrfs; +Cc: shr, axboe, josef, fdmanana

From: Josef Bacik <[email protected]>

For NOWAIT IOCBs we'll need a way to tell the search code to not wait on
locks or anything.  Accomplish this by adding a path->nowait flag that will
use trylocks and skip reading of metadata, returning -EAGAIN in either of
these cases.  For now we only need this for reads, so only the read side is
handled.  Add an ASSERT() to catch anybody trying to use this for writes so
they know they'll have to implement the write side.
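
As a rough sketch of the intended caller pattern (illustrative fragment only;
root and key are assumed to be set up by the caller):

          struct btrfs_path *path;
          int ret;

          path = btrfs_alloc_path();
          if (!path)
                  return -ENOMEM;
          path->nowait = 1;
          /* Read-only search: trans == NULL, ins_len == 0, cow == 0. */
          ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
          if (ret == -EAGAIN) {
                  /* A trylock failed or metadata would have to be read;
                   * retry from a context that is allowed to block. */
          }
          btrfs_free_path(path);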

Signed-off-by: Josef Bacik <[email protected]>
Signed-off-by: Stefan Roesch <[email protected]>
---
 fs/btrfs/ctree.c   | 39 ++++++++++++++++++++++++++++++++++++---
 fs/btrfs/ctree.h   |  1 +
 fs/btrfs/locking.c | 23 +++++++++++++++++++++++
 fs/btrfs/locking.h |  1 +
 4 files changed, 61 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index ebfa35fe1c38..71b238364939 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1447,6 +1447,11 @@ read_block_for_search(struct btrfs_root *root, struct btrfs_path *p,
 			return 0;
 		}
 
+		if (p->nowait) {
+			free_extent_buffer(tmp);
+			return -EAGAIN;
+		}
+
 		if (unlock_up)
 			btrfs_unlock_up_safe(p, level + 1);
 
@@ -1467,6 +1472,8 @@ read_block_for_search(struct btrfs_root *root, struct btrfs_path *p,
 			ret = -EAGAIN;
 
 		goto out;
+	} else if (p->nowait) {
+		return -EAGAIN;
 	}
 
 	if (unlock_up) {
@@ -1634,7 +1641,13 @@ static struct extent_buffer *btrfs_search_slot_get_root(struct btrfs_root *root,
 		 * We don't know the level of the root node until we actually
 		 * have it read locked
 		 */
-		b = btrfs_read_lock_root_node(root);
+		if (p->nowait) {
+			b = btrfs_try_read_lock_root_node(root);
+			if (IS_ERR(b))
+				return b;
+		} else {
+			b = btrfs_read_lock_root_node(root);
+		}
 		level = btrfs_header_level(b);
 		if (level > write_lock_level)
 			goto out;
@@ -1910,6 +1923,13 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root *root,
 	WARN_ON(p->nodes[0] != NULL);
 	BUG_ON(!cow && ins_len);
 
+	/*
+	 * For now only allow nowait for read only operations.  There's no
+	 * strict reason why we can't, we just only need it for reads so I'm
+	 * only implementing it for reads right now.
+	 */
+	ASSERT(!p->nowait || !cow);
+
 	if (ins_len < 0) {
 		lowest_unlock = 2;
 
@@ -1936,7 +1956,12 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root *root,
 
 	if (p->need_commit_sem) {
 		ASSERT(p->search_commit_root);
-		down_read(&fs_info->commit_root_sem);
+		if (p->nowait) {
+			if (!down_read_trylock(&fs_info->commit_root_sem))
+				return -EAGAIN;
+		} else {
+			down_read(&fs_info->commit_root_sem);
+		}
 	}
 
 again:
@@ -2082,7 +2107,15 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root *root,
 				btrfs_tree_lock(b);
 				p->locks[level] = BTRFS_WRITE_LOCK;
 			} else {
-				btrfs_tree_read_lock(b);
+				if (p->nowait) {
+					if (!btrfs_try_tree_read_lock(b)) {
+						free_extent_buffer(b);
+						ret = -EAGAIN;
+						goto done;
+					}
+				} else {
+					btrfs_tree_read_lock(b);
+				}
 				p->locks[level] = BTRFS_READ_LOCK;
 			}
 			p->nodes[level] = b;
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 9ef162dbd4bc..d6d05450198d 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -443,6 +443,7 @@ struct btrfs_path {
 	 * header (ie. sizeof(struct btrfs_item) is not included).
 	 */
 	unsigned int search_for_extension:1;
+	unsigned int nowait:1;
 };
 #define BTRFS_MAX_EXTENT_ITEM_SIZE(r) ((BTRFS_LEAF_DATA_SIZE(r->fs_info) >> 4) - \
 					sizeof(struct btrfs_item))
diff --git a/fs/btrfs/locking.c b/fs/btrfs/locking.c
index 9063072b399b..d6c88922d3e2 100644
--- a/fs/btrfs/locking.c
+++ b/fs/btrfs/locking.c
@@ -285,6 +285,29 @@ struct extent_buffer *btrfs_read_lock_root_node(struct btrfs_root *root)
 	return eb;
 }
 
+/*
+ * Loop around taking references on and locking the root node of the tree
+ * until we either end up with a read lock on the root node, or we fail a
+ * trylock and return to avoid blocking.
+ *
+ * Return: root extent buffer with read lock held, or ERR_PTR(-EAGAIN).
+ */
+struct extent_buffer *btrfs_try_read_lock_root_node(struct btrfs_root *root)
+{
+	struct extent_buffer *eb;
+
+	while (1) {
+		eb = btrfs_root_node(root);
+		if (!btrfs_try_tree_read_lock(eb))
+			return ERR_PTR(-EAGAIN);
+		if (eb == root->node)
+			break;
+		btrfs_tree_read_unlock(eb);
+		free_extent_buffer(eb);
+	}
+	return eb;
+}
+
 /*
  * DREW locks
  * ==========
diff --git a/fs/btrfs/locking.h b/fs/btrfs/locking.h
index ab268be09bb5..490c7a79e995 100644
--- a/fs/btrfs/locking.h
+++ b/fs/btrfs/locking.h
@@ -94,6 +94,7 @@ int btrfs_try_tree_read_lock(struct extent_buffer *eb);
 int btrfs_try_tree_write_lock(struct extent_buffer *eb);
 struct extent_buffer *btrfs_lock_root_node(struct btrfs_root *root);
 struct extent_buffer *btrfs_read_lock_root_node(struct btrfs_root *root);
+struct extent_buffer *btrfs_try_read_lock_root_node(struct btrfs_root *root);
 
 #ifdef CONFIG_BTRFS_DEBUG
 static inline void btrfs_assert_tree_write_locked(struct extent_buffer *eb)
-- 
2.30.2



* [PATCH v2 03/12] btrfs: make can_nocow_extent nowait compatible
  2022-09-08  0:26 [PATCH v2 00/12] io-uring/btrfs: support async buffered writes Stefan Roesch
  2022-09-08  0:26 ` [PATCH v2 01/12] mm: export balance_dirty_pages_ratelimited_flags() Stefan Roesch
  2022-09-08  0:26 ` [PATCH v2 02/12] btrfs: implement a nowait option for tree searches Stefan Roesch
@ 2022-09-08  0:26 ` Stefan Roesch
  2022-09-08  0:26 ` [PATCH v2 04/12] btrfs: add the ability to use NO_FLUSH for data reservations Stefan Roesch
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 33+ messages in thread
From: Stefan Roesch @ 2022-09-08  0:26 UTC (permalink / raw)
  To: kernel-team, io-uring, linux-btrfs; +Cc: shr, axboe, josef, fdmanana

From: Josef Bacik <[email protected]>

If we have NOWAIT specified on our IOCB and we're writing into a
PREALLOC or NOCOW extent then we need to be able to tell
can_nocow_extent that we don't want to wait on any locks or metadata IO.
Fix can_nocow_extent to allow for NOWAIT.
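
The resulting nowait call pattern looks roughly like this (sketch only;
pos and len are placeholders):

          u64 num_bytes = len;
          int ret;

          /* nowait == true: fail with -EAGAIN instead of blocking on
           * locks or metadata IO while checking extents and csums. */
          ret = can_nocow_extent(inode, pos, &num_bytes, NULL, NULL, NULL,
                                 true, false);
          /* ret > 0: a NOCOW write of up to num_bytes is possible,
           * ret == 0: the range must be COWed,
           * ret < 0: error, including -EAGAIN when we would block. */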

Signed-off-by: Josef Bacik <[email protected]>
Signed-off-by: Stefan Roesch <[email protected]>
---
 fs/btrfs/ctree.h       |  5 +++--
 fs/btrfs/extent-tree.c |  5 +++++
 fs/btrfs/file-item.c   |  4 +++-
 fs/btrfs/file.c        |  2 +-
 fs/btrfs/inode.c       | 16 ++++++++++------
 fs/btrfs/relocation.c  |  2 +-
 fs/btrfs/scrub.c       |  4 ++--
 fs/btrfs/tree-log.c    |  6 +++---
 8 files changed, 28 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index d6d05450198d..536bbc8551fc 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3276,7 +3276,8 @@ int btrfs_csum_file_blocks(struct btrfs_trans_handle *trans,
 blk_status_t btrfs_csum_one_bio(struct btrfs_inode *inode, struct bio *bio,
 				u64 offset, bool one_ordered);
 int btrfs_lookup_csums_range(struct btrfs_root *root, u64 start, u64 end,
-			     struct list_head *list, int search_commit);
+			     struct list_head *list, int search_commit,
+			     bool nowait);
 void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode,
 				     const struct btrfs_path *path,
 				     struct btrfs_file_extent_item *fi,
@@ -3306,7 +3307,7 @@ struct extent_map *btrfs_get_extent_fiemap(struct btrfs_inode *inode,
 					   u64 start, u64 len);
 noinline int can_nocow_extent(struct inode *inode, u64 offset, u64 *len,
 			      u64 *orig_start, u64 *orig_block_len,
-			      u64 *ram_bytes, bool strict);
+			      u64 *ram_bytes, bool nowait, bool strict);
 
 void __btrfs_del_delalloc_inode(struct btrfs_root *root,
 				struct btrfs_inode *inode);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 6914cd8024ba..583ddae3c270 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2220,6 +2220,11 @@ static noinline int check_delayed_ref(struct btrfs_root *root,
 	}
 
 	if (!mutex_trylock(&head->mutex)) {
+		if (path->nowait) {
+			spin_unlock(&delayed_refs->lock);
+			return -EAGAIN;
+		}
+
 		refcount_inc(&head->refs);
 		spin_unlock(&delayed_refs->lock);
 
diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index c828f971a346..fcc6ce861409 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -503,7 +503,8 @@ blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio, u8 *dst
 }
 
 int btrfs_lookup_csums_range(struct btrfs_root *root, u64 start, u64 end,
-			     struct list_head *list, int search_commit)
+			     struct list_head *list, int search_commit,
+			     bool nowait)
 {
 	struct btrfs_fs_info *fs_info = root->fs_info;
 	struct btrfs_key key;
@@ -525,6 +526,7 @@ int btrfs_lookup_csums_range(struct btrfs_root *root, u64 start, u64 end,
 	if (!path)
 		return -ENOMEM;
 
+	path->nowait = nowait;
 	if (search_commit) {
 		path->skip_locking = 1;
 		path->reada = READA_FORWARD;
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 5a3f6e0d9688..f4aa198f0f87 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1502,7 +1502,7 @@ int btrfs_check_nocow_lock(struct btrfs_inode *inode, loff_t pos,
 
 	btrfs_lock_and_flush_ordered_range(inode, lockstart, lockend, NULL);
 	ret = can_nocow_extent(&inode->vfs_inode, lockstart, &num_bytes,
-			NULL, NULL, NULL, false);
+			NULL, NULL, NULL, false, false);
 	if (ret <= 0) {
 		ret = 0;
 		btrfs_drew_write_unlock(&root->snapshot_lock);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index ad250892028d..8ad3bea26652 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1667,7 +1667,7 @@ static noinline int run_delalloc_zoned(struct btrfs_inode *inode,
 }
 
 static noinline int csum_exist_in_range(struct btrfs_fs_info *fs_info,
-					u64 bytenr, u64 num_bytes)
+					u64 bytenr, u64 num_bytes, bool nowait)
 {
 	struct btrfs_root *csum_root = btrfs_csum_root(fs_info, bytenr);
 	struct btrfs_ordered_sum *sums;
@@ -1675,7 +1675,8 @@ static noinline int csum_exist_in_range(struct btrfs_fs_info *fs_info,
 	LIST_HEAD(list);
 
 	ret = btrfs_lookup_csums_range(csum_root, bytenr,
-				       bytenr + num_bytes - 1, &list, 0);
+				       bytenr + num_bytes - 1, &list, 0,
+				       nowait);
 	if (ret == 0 && list_empty(&list))
 		return 0;
 
@@ -1801,6 +1802,7 @@ static int can_nocow_file_extent(struct btrfs_path *path,
 	u8 extent_type;
 	int can_nocow = 0;
 	int ret = 0;
+	bool nowait = path->nowait;
 
 	fi = btrfs_item_ptr(leaf, path->slots[0], struct btrfs_file_extent_item);
 	extent_type = btrfs_file_extent_type(leaf, fi);
@@ -1877,7 +1879,8 @@ static int can_nocow_file_extent(struct btrfs_path *path,
 	 * Force COW if csums exist in the range. This ensures that csums for a
 	 * given extent are either valid or do not exist.
 	 */
-	ret = csum_exist_in_range(root->fs_info, args->disk_bytenr, args->num_bytes);
+	ret = csum_exist_in_range(root->fs_info, args->disk_bytenr, args->num_bytes,
+				  nowait);
 	WARN_ON_ONCE(ret > 0 && is_freespace_inode);
 	if (ret != 0)
 		goto out;
@@ -7293,7 +7296,7 @@ static bool btrfs_extent_readonly(struct btrfs_fs_info *fs_info, u64 bytenr)
  */
 noinline int can_nocow_extent(struct inode *inode, u64 offset, u64 *len,
 			      u64 *orig_start, u64 *orig_block_len,
-			      u64 *ram_bytes, bool strict)
+			      u64 *ram_bytes, bool nowait, bool strict)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct can_nocow_file_extent_args nocow_args = { 0 };
@@ -7309,6 +7312,7 @@ noinline int can_nocow_extent(struct inode *inode, u64 offset, u64 *len,
 	path = btrfs_alloc_path();
 	if (!path)
 		return -ENOMEM;
+	path->nowait = nowait;
 
 	ret = btrfs_lookup_file_extent(NULL, root, path,
 			btrfs_ino(BTRFS_I(inode)), offset, 0);
@@ -7578,7 +7582,7 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map,
 		block_start = em->block_start + (start - em->start);
 
 		if (can_nocow_extent(inode, start, &len, &orig_start,
-				     &orig_block_len, &ram_bytes, false) == 1) {
+				     &orig_block_len, &ram_bytes, false, false) == 1) {
 			bg = btrfs_inc_nocow_writers(fs_info, block_start);
 			if (bg)
 				can_nocow = true;
@@ -11243,7 +11247,7 @@ static int btrfs_swap_activate(struct swap_info_struct *sis, struct file *file,
 		free_extent_map(em);
 		em = NULL;
 
-		ret = can_nocow_extent(inode, start, &len, NULL, NULL, NULL, true);
+		ret = can_nocow_extent(inode, start, &len, NULL, NULL, NULL, false, true);
 		if (ret < 0) {
 			goto out;
 		} else if (ret) {
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 45c02aba2492..dfc3f6c04b13 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -4339,7 +4339,7 @@ int btrfs_reloc_clone_csums(struct btrfs_inode *inode, u64 file_pos, u64 len)
 	disk_bytenr = file_pos + inode->index_cnt;
 	csum_root = btrfs_csum_root(fs_info, disk_bytenr);
 	ret = btrfs_lookup_csums_range(csum_root, disk_bytenr,
-				       disk_bytenr + len - 1, &list, 0);
+				       disk_bytenr + len - 1, &list, 0, false);
 	if (ret)
 		goto out;
 
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 3afe5fa50a63..1cb3eed8b917 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3077,7 +3077,7 @@ static int scrub_raid56_data_stripe_for_parity(struct scrub_ctx *sctx,
 
 		ret = btrfs_lookup_csums_range(csum_root, extent_start,
 					       extent_start + extent_size - 1,
-					       &sctx->csum_list, 1);
+					       &sctx->csum_list, 1, false);
 		if (ret) {
 			scrub_parity_mark_sectors_error(sparity, extent_start,
 							extent_size);
@@ -3303,7 +3303,7 @@ static int scrub_simple_mirror(struct scrub_ctx *sctx,
 		if (extent_flags & BTRFS_EXTENT_FLAG_DATA) {
 			ret = btrfs_lookup_csums_range(csum_root, cur_logical,
 					cur_logical + scrub_len - 1,
-					&sctx->csum_list, 1);
+					&sctx->csum_list, 1, false);
 			if (ret)
 				break;
 		}
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 9205c4a5ca81..8af30dab2a17 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -801,7 +801,7 @@ static noinline int replay_one_extent(struct btrfs_trans_handle *trans,
 
 			ret = btrfs_lookup_csums_range(root->log_root,
 						csum_start, csum_end - 1,
-						&ordered_sums, 0);
+						&ordered_sums, 0, false);
 			if (ret)
 				goto out;
 			/*
@@ -4513,7 +4513,7 @@ static noinline int copy_items(struct btrfs_trans_handle *trans,
 		disk_bytenr += extent_offset;
 		ret = btrfs_lookup_csums_range(csum_root, disk_bytenr,
 					       disk_bytenr + extent_num_bytes - 1,
-					       &ordered_sums, 0);
+					       &ordered_sums, 0, false);
 		if (ret)
 			goto out;
 
@@ -4709,7 +4709,7 @@ static int log_extent_csums(struct btrfs_trans_handle *trans,
 	ret = btrfs_lookup_csums_range(csum_root,
 				       em->block_start + csum_offset,
 				       em->block_start + csum_offset +
-				       csum_len - 1, &ordered_sums, 0);
+				       csum_len - 1, &ordered_sums, 0, false);
 	if (ret)
 		return ret;
 
-- 
2.30.2



* [PATCH v2 04/12] btrfs: add the ability to use NO_FLUSH for data reservations
  2022-09-08  0:26 [PATCH v2 00/12] io-uring/btrfs: support async buffered writes Stefan Roesch
                   ` (2 preceding siblings ...)
  2022-09-08  0:26 ` [PATCH v2 03/12] btrfs: make can_nocow_extent nowait compatible Stefan Roesch
@ 2022-09-08  0:26 ` Stefan Roesch
  2022-09-08  0:26 ` [PATCH v2 05/12] btrfs: add btrfs_try_lock_ordered_range Stefan Roesch
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 33+ messages in thread
From: Stefan Roesch @ 2022-09-08  0:26 UTC (permalink / raw)
  To: kernel-team, io-uring, linux-btrfs; +Cc: shr, axboe, josef, fdmanana

From: Josef Bacik <[email protected]>

In order to accommodate NOWAIT IOCB's we need to be able to do NO_FLUSH
data reservations, so plumb this through the delalloc reservation
system.
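
A hedged sketch of how the write path uses this later in the series (see
patch 9; nowait is derived from IOCB_NOWAIT):

          /* noflush == true selects BTRFS_RESERVE_NO_FLUSH, so the
           * reservation fails fast instead of flushing delalloc or
           * waiting on a transaction commit. */
          ret = btrfs_check_data_free_space(BTRFS_I(inode), &data_reserved,
                                            pos, write_bytes, nowait);
          if (nowait && (ret == -ENOSPC || ret == -EAGAIN))
                  return -EAGAIN; /* retry from a blocking context */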

Signed-off-by: Josef Bacik <[email protected]>
Signed-off-by: Stefan Roesch <[email protected]>
---
 fs/btrfs/block-group.c    |  2 +-
 fs/btrfs/delalloc-space.c | 13 ++++++++++---
 fs/btrfs/delalloc-space.h |  3 ++-
 fs/btrfs/file.c           |  2 +-
 fs/btrfs/inode.c          |  4 ++--
 fs/btrfs/space-info.c     |  3 ++-
 6 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index e0375ba9d0fe..9df51245ba93 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -2869,7 +2869,7 @@ static int cache_save_setup(struct btrfs_block_group *block_group,
 	cache_size *= fs_info->sectorsize;
 
 	ret = btrfs_check_data_free_space(BTRFS_I(inode), &data_reserved, 0,
-					  cache_size);
+					  cache_size, false);
 	if (ret)
 		goto out_put;
 
diff --git a/fs/btrfs/delalloc-space.c b/fs/btrfs/delalloc-space.c
index 1e8f17ff829e..118b2e20b2e1 100644
--- a/fs/btrfs/delalloc-space.c
+++ b/fs/btrfs/delalloc-space.c
@@ -127,9 +127,11 @@ int btrfs_alloc_data_chunk_ondemand(struct btrfs_inode *inode, u64 bytes)
 }
 
 int btrfs_check_data_free_space(struct btrfs_inode *inode,
-			struct extent_changeset **reserved, u64 start, u64 len)
+				struct extent_changeset **reserved, u64 start,
+				u64 len, bool noflush)
 {
 	struct btrfs_fs_info *fs_info = inode->root->fs_info;
+	enum btrfs_reserve_flush_enum flush = BTRFS_RESERVE_FLUSH_DATA;
 	int ret;
 
 	/* align the range */
@@ -137,7 +139,12 @@ int btrfs_check_data_free_space(struct btrfs_inode *inode,
 	      round_down(start, fs_info->sectorsize);
 	start = round_down(start, fs_info->sectorsize);
 
-	ret = btrfs_alloc_data_chunk_ondemand(inode, len);
+	if (noflush)
+		flush = BTRFS_RESERVE_NO_FLUSH;
+	else if (btrfs_is_free_space_inode(inode))
+		flush = BTRFS_RESERVE_FLUSH_FREE_SPACE_INODE;
+
+	ret = btrfs_reserve_data_bytes(fs_info, len, flush);
 	if (ret < 0)
 		return ret;
 
@@ -454,7 +461,7 @@ int btrfs_delalloc_reserve_space(struct btrfs_inode *inode,
 {
 	int ret;
 
-	ret = btrfs_check_data_free_space(inode, reserved, start, len);
+	ret = btrfs_check_data_free_space(inode, reserved, start, len, false);
 	if (ret < 0)
 		return ret;
 	ret = btrfs_delalloc_reserve_metadata(inode, len, len, false);
diff --git a/fs/btrfs/delalloc-space.h b/fs/btrfs/delalloc-space.h
index 28bf5c3ef430..e07d46043455 100644
--- a/fs/btrfs/delalloc-space.h
+++ b/fs/btrfs/delalloc-space.h
@@ -7,7 +7,8 @@ struct extent_changeset;
 
 int btrfs_alloc_data_chunk_ondemand(struct btrfs_inode *inode, u64 bytes);
 int btrfs_check_data_free_space(struct btrfs_inode *inode,
-			struct extent_changeset **reserved, u64 start, u64 len);
+			struct extent_changeset **reserved, u64 start, u64 len,
+			bool noflush);
 void btrfs_free_reserved_data_space(struct btrfs_inode *inode,
 			struct extent_changeset *reserved, u64 start, u64 len);
 void btrfs_delalloc_release_space(struct btrfs_inode *inode,
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index f4aa198f0f87..0f257205c63d 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1664,7 +1664,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 		extent_changeset_release(data_reserved);
 		ret = btrfs_check_data_free_space(BTRFS_I(inode),
 						  &data_reserved, pos,
-						  write_bytes);
+						  write_bytes, false);
 		if (ret < 0) {
 			/*
 			 * If we don't have to COW at the offset, reserve
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 8ad3bea26652..36e755f73764 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4882,7 +4882,7 @@ int btrfs_truncate_block(struct btrfs_inode *inode, loff_t from, loff_t len,
 	block_end = block_start + blocksize - 1;
 
 	ret = btrfs_check_data_free_space(inode, &data_reserved, block_start,
-					  blocksize);
+					  blocksize, false);
 	if (ret < 0) {
 		if (btrfs_check_nocow_lock(inode, block_start, &write_bytes) > 0) {
 			/* For nocow case, no need to reserve data space */
@@ -7767,7 +7767,7 @@ static int btrfs_dio_iomap_begin(struct inode *inode, loff_t start,
 	if (write && !(flags & IOMAP_NOWAIT)) {
 		ret = btrfs_check_data_free_space(BTRFS_I(inode),
 						  &dio_data->data_reserved,
-						  start, data_alloc_len);
+						  start, data_alloc_len, false);
 		if (!ret)
 			dio_data->data_space_reserved = true;
 		else if (ret && !(BTRFS_I(inode)->flags &
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index d0cbeb7ae81c..0738e1efed79 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -1737,7 +1737,8 @@ int btrfs_reserve_data_bytes(struct btrfs_fs_info *fs_info, u64 bytes,
 	int ret;
 
 	ASSERT(flush == BTRFS_RESERVE_FLUSH_DATA ||
-	       flush == BTRFS_RESERVE_FLUSH_FREE_SPACE_INODE);
+	       flush == BTRFS_RESERVE_FLUSH_FREE_SPACE_INODE ||
+	       flush == BTRFS_RESERVE_NO_FLUSH);
 	ASSERT(!current->journal_info || flush != BTRFS_RESERVE_FLUSH_DATA);
 
 	ret = __reserve_bytes(fs_info, data_sinfo, bytes, flush);
-- 
2.30.2



* [PATCH v2 05/12] btrfs: add btrfs_try_lock_ordered_range
  2022-09-08  0:26 [PATCH v2 00/12] io-uring/btrfs: support async buffered writes Stefan Roesch
                   ` (3 preceding siblings ...)
  2022-09-08  0:26 ` [PATCH v2 04/12] btrfs: add the ability to use NO_FLUSH for data reservations Stefan Roesch
@ 2022-09-08  0:26 ` Stefan Roesch
  2022-09-08 10:18   ` Filipe Manana
  2022-09-08  0:26 ` [PATCH v2 06/12] btrfs: make btrfs_check_nocow_lock nowait compatible Stefan Roesch
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 33+ messages in thread
From: Stefan Roesch @ 2022-09-08  0:26 UTC (permalink / raw)
  To: kernel-team, io-uring, linux-btrfs; +Cc: shr, axboe, josef, fdmanana

From: Josef Bacik <[email protected]>

For IOCB_NOWAIT we're going to want to use try lock on the extent lock,
and simply bail if there's an ordered extent in the range because the
only choice there is to wait for the ordered extent to complete.
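
For context, the caller pattern this enables in the write path (introduced
by a later patch in the series) is roughly:

          if (nowait) {
                  /* Returns 0 if the extent lock is contended or an
                   * ordered extent is pending in the range. */
                  if (!btrfs_try_lock_ordered_range(inode, lockstart, lockend))
                          return -EAGAIN;
          } else {
                  btrfs_lock_and_flush_ordered_range(inode, lockstart,
                                                     lockend, NULL);
          }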

Signed-off-by: Josef Bacik <[email protected]>
Signed-off-by: Stefan Roesch <[email protected]>
---
 fs/btrfs/ordered-data.c | 28 ++++++++++++++++++++++++++++
 fs/btrfs/ordered-data.h |  1 +
 2 files changed, 29 insertions(+)

diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 1952ac85222c..3cdfdcedb088 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -1041,6 +1041,34 @@ void btrfs_lock_and_flush_ordered_range(struct btrfs_inode *inode, u64 start,
 	}
 }
 
+/*
+ * btrfs_try_lock_ordered_range - lock the passed range without blocking and
+ * ensure no ordered extents are pending in it.
+ *
+ * @inode:        Inode whose ordered tree is to be searched
+ * @start:        Beginning of range to lock
+ * @end:          Last byte of range to lock
+ *
+ * This function returns 1 if the range was locked and no ordered extents
+ * were found in it, and 0 otherwise.
+ */
+int btrfs_try_lock_ordered_range(struct btrfs_inode *inode, u64 start, u64 end)
+{
+	struct btrfs_ordered_extent *ordered;
+
+	if (!try_lock_extent(&inode->io_tree, start, end))
+		return 0;
+
+	ordered = btrfs_lookup_ordered_range(inode, start, end - start + 1);
+	if (!ordered)
+		return 1;
+
+	btrfs_put_ordered_extent(ordered);
+	unlock_extent(&inode->io_tree, start, end);
+	return 0;
+}
+
+
 static int clone_ordered_extent(struct btrfs_ordered_extent *ordered, u64 pos,
 				u64 len)
 {
diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h
index 87792f85e2c4..ec27ebf0af4b 100644
--- a/fs/btrfs/ordered-data.h
+++ b/fs/btrfs/ordered-data.h
@@ -218,6 +218,7 @@ void btrfs_wait_ordered_roots(struct btrfs_fs_info *fs_info, u64 nr,
 void btrfs_lock_and_flush_ordered_range(struct btrfs_inode *inode, u64 start,
 					u64 end,
 					struct extent_state **cached_state);
+int btrfs_try_lock_ordered_range(struct btrfs_inode *inode, u64 start, u64 end);
 int btrfs_split_ordered_extent(struct btrfs_ordered_extent *ordered, u64 pre,
 			       u64 post);
 int __init ordered_data_init(void);
-- 
2.30.2



* [PATCH v2 06/12] btrfs: make btrfs_check_nocow_lock nowait compatible
  2022-09-08  0:26 [PATCH v2 00/12] io-uring/btrfs: support async buffered writes Stefan Roesch
                   ` (4 preceding siblings ...)
  2022-09-08  0:26 ` [PATCH v2 05/12] btrfs: add btrfs_try_lock_ordered_range Stefan Roesch
@ 2022-09-08  0:26 ` Stefan Roesch
  2022-09-08 10:18   ` Filipe Manana
  2022-09-08  0:26 ` [PATCH v2 07/12] btrfs: make prepare_pages " Stefan Roesch
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 33+ messages in thread
From: Stefan Roesch @ 2022-09-08  0:26 UTC (permalink / raw)
  To: kernel-team, io-uring, linux-btrfs; +Cc: shr, axboe, josef, fdmanana

From: Josef Bacik <[email protected]>

Now that all the helpers that btrfs_check_nocow_lock() uses handle nowait,
add a nowait flag to btrfs_check_nocow_lock() so it can be used by the
write path.

Signed-off-by: Josef Bacik <[email protected]>
Signed-off-by: Stefan Roesch <[email protected]>
---
 fs/btrfs/ctree.h |  2 +-
 fs/btrfs/file.c  | 33 ++++++++++++++++++++++-----------
 fs/btrfs/inode.c |  2 +-
 3 files changed, 24 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 536bbc8551fc..06cb25f2d3bd 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3482,7 +3482,7 @@ int btrfs_dirty_pages(struct btrfs_inode *inode, struct page **pages,
 		      struct extent_state **cached, bool noreserve);
 int btrfs_fdatawrite_range(struct inode *inode, loff_t start, loff_t end);
 int btrfs_check_nocow_lock(struct btrfs_inode *inode, loff_t pos,
-			   size_t *write_bytes);
+			   size_t *write_bytes, bool nowait);
 void btrfs_check_nocow_unlock(struct btrfs_inode *inode);
 
 /* tree-defrag.c */
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 0f257205c63d..cf19d381ead6 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1481,7 +1481,7 @@ lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages,
  * NOTE: Callers need to call btrfs_check_nocow_unlock() if we return > 0.
  */
 int btrfs_check_nocow_lock(struct btrfs_inode *inode, loff_t pos,
-			   size_t *write_bytes)
+			   size_t *write_bytes, bool nowait)
 {
 	struct btrfs_fs_info *fs_info = inode->root->fs_info;
 	struct btrfs_root *root = inode->root;
@@ -1500,16 +1500,21 @@ int btrfs_check_nocow_lock(struct btrfs_inode *inode, loff_t pos,
 			   fs_info->sectorsize) - 1;
 	num_bytes = lockend - lockstart + 1;
 
-	btrfs_lock_and_flush_ordered_range(inode, lockstart, lockend, NULL);
+	if (nowait) {
+		if (!btrfs_try_lock_ordered_range(inode, lockstart, lockend)) {
+			btrfs_drew_write_unlock(&root->snapshot_lock);
+			return -EAGAIN;
+		}
+	} else {
+		btrfs_lock_and_flush_ordered_range(inode, lockstart, lockend, NULL);
+	}
 	ret = can_nocow_extent(&inode->vfs_inode, lockstart, &num_bytes,
-			NULL, NULL, NULL, false, false);
-	if (ret <= 0) {
-		ret = 0;
+			NULL, NULL, NULL, nowait, false);
+	if (ret <= 0)
 		btrfs_drew_write_unlock(&root->snapshot_lock);
-	} else {
+	else
+		*write_bytes = min_t(size_t, *write_bytes,
 				     num_bytes - pos + lockstart);
-	}
 	unlock_extent(&inode->io_tree, lockstart, lockend);
 
 	return ret;
@@ -1666,16 +1671,22 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 						  &data_reserved, pos,
 						  write_bytes, false);
 		if (ret < 0) {
+			int tmp;
+
 			/*
 			 * If we don't have to COW at the offset, reserve
 			 * metadata only. write_bytes may get smaller than
 			 * requested here.
 			 */
-			if (btrfs_check_nocow_lock(BTRFS_I(inode), pos,
-						   &write_bytes) > 0)
-				only_release_metadata = true;
-			else
+			tmp = btrfs_check_nocow_lock(BTRFS_I(inode), pos,
+						     &write_bytes, false);
+			if (tmp < 0)
+				ret = tmp;
+			if (tmp > 0)
+				ret = 0;
+			if (ret)
 				break;
+			only_release_metadata = true;
 		}
 
 		num_pages = DIV_ROUND_UP(write_bytes + offset, PAGE_SIZE);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 36e755f73764..5426d4f4ac23 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4884,7 +4884,7 @@ int btrfs_truncate_block(struct btrfs_inode *inode, loff_t from, loff_t len,
 	ret = btrfs_check_data_free_space(inode, &data_reserved, block_start,
 					  blocksize, false);
 	if (ret < 0) {
-		if (btrfs_check_nocow_lock(inode, block_start, &write_bytes) > 0) {
+		if (btrfs_check_nocow_lock(inode, block_start, &write_bytes, false) > 0) {
 			/* For nocow case, no need to reserve data space */
 			only_release_metadata = true;
 		} else {
-- 
2.30.2



* [PATCH v2 07/12] btrfs: make prepare_pages nowait compatible
  2022-09-08  0:26 [PATCH v2 00/12] io-uring/btrfs: support async buffered writes Stefan Roesch
                   ` (5 preceding siblings ...)
  2022-09-08  0:26 ` [PATCH v2 06/12] btrfs: make btrfs_check_nocow_lock nowait compatible Stefan Roesch
@ 2022-09-08  0:26 ` Stefan Roesch
  2022-09-08 10:17   ` Filipe Manana
  2022-09-08  0:26 ` [PATCH v2 08/12] btrfs: make lock_and_cleanup_extent_if_need " Stefan Roesch
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 33+ messages in thread
From: Stefan Roesch @ 2022-09-08  0:26 UTC (permalink / raw)
  To: kernel-team, io-uring, linux-btrfs; +Cc: shr, axboe, josef, fdmanana

Add a nowait parameter to the prepare_pages() function. If nowait is
specified for an async buffered write request, page lookups and allocations
are done in nowait mode, and -EAGAIN is returned if they would block.
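
A short sketch of how the two flag helpers combine for a single nowait page
lookup (fragment only, mirrors the loop in prepare_pages()):

          gfp_t mask = get_prepare_gfp_flags(inode, true);
          int fgp_flags = get_prepare_fgp_flags(true);
          struct page *page;

          /* FGP_NOWAIT makes the lookup fail instead of sleeping on the
           * page lock, and the gfp mask excludes direct reclaim. */
          page = pagecache_get_page(inode->i_mapping, index, fgp_flags,
                                    mask | __GFP_WRITE);
          if (!page)
                  return -EAGAIN;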

Signed-off-by: Stefan Roesch <[email protected]>
---
 fs/btrfs/file.c | 43 ++++++++++++++++++++++++++++++++++++-------
 1 file changed, 36 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index cf19d381ead6..a154a3cec44b 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1339,26 +1339,55 @@ static int prepare_uptodate_page(struct inode *inode,
 	return 0;
 }
 
+static int get_prepare_fgp_flags(bool nowait)
+{
+	int fgp_flags;
+
+	fgp_flags = FGP_LOCK|FGP_ACCESSED|FGP_CREAT;
+	if (nowait)
+		fgp_flags |= FGP_NOWAIT;
+
+	return fgp_flags;
+}
+
+static gfp_t get_prepare_gfp_flags(struct inode *inode, bool nowait)
+{
+	gfp_t gfp;
+
+	gfp = btrfs_alloc_write_mask(inode->i_mapping);
+	if (nowait) {
+		gfp &= ~__GFP_DIRECT_RECLAIM;
+		gfp |= GFP_NOWAIT;
+	}
+
+	return gfp;
+}
+
 /*
  * this just gets pages into the page cache and locks them down.
  */
 static noinline int prepare_pages(struct inode *inode, struct page **pages,
 				  size_t num_pages, loff_t pos,
-				  size_t write_bytes, bool force_uptodate)
+				  size_t write_bytes, bool force_uptodate,
+				  bool nowait)
 {
 	int i;
 	unsigned long index = pos >> PAGE_SHIFT;
-	gfp_t mask = btrfs_alloc_write_mask(inode->i_mapping);
+	gfp_t mask = get_prepare_gfp_flags(inode, nowait);
+	int fgp_flags = get_prepare_fgp_flags(nowait);
 	int err = 0;
 	int faili;
 
 	for (i = 0; i < num_pages; i++) {
 again:
-		pages[i] = find_or_create_page(inode->i_mapping, index + i,
-					       mask | __GFP_WRITE);
+		pages[i] = pagecache_get_page(inode->i_mapping, index + i,
+					fgp_flags, mask | __GFP_WRITE);
 		if (!pages[i]) {
 			faili = i - 1;
-			err = -ENOMEM;
+			if (nowait)
+				err = -EAGAIN;
+			else
+				err = -ENOMEM;
 			goto fail;
 		}
 
@@ -1376,7 +1405,7 @@ static noinline int prepare_pages(struct inode *inode, struct page **pages,
 						    pos + write_bytes, false);
 		if (err) {
 			put_page(pages[i]);
-			if (err == -EAGAIN) {
+			if (!nowait && err == -EAGAIN) {
 				err = 0;
 				goto again;
 			}
@@ -1716,7 +1745,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 		 */
 		ret = prepare_pages(inode, pages, num_pages,
 				    pos, write_bytes,
-				    force_page_uptodate);
+				    force_page_uptodate, false);
 		if (ret) {
 			btrfs_delalloc_release_extents(BTRFS_I(inode),
 						       reserve_bytes);
-- 
2.30.2



* [PATCH v2 08/12] btrfs: make lock_and_cleanup_extent_if_need nowait compatible
  2022-09-08  0:26 [PATCH v2 00/12] io-uring/btrfs: support async buffered writes Stefan Roesch
                   ` (6 preceding siblings ...)
  2022-09-08  0:26 ` [PATCH v2 07/12] btrfs: make prepare_pages " Stefan Roesch
@ 2022-09-08  0:26 ` Stefan Roesch
  2022-09-08 10:17   ` Filipe Manana
  2022-09-08  0:26 ` [PATCH v2 09/12] btrfs: plumb NOWAIT through the write path Stefan Roesch
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 33+ messages in thread
From: Stefan Roesch @ 2022-09-08  0:26 UTC (permalink / raw)
  To: kernel-team, io-uring, linux-btrfs; +Cc: shr, axboe, josef, fdmanana

Add the nowait parameter to lock_and_cleanup_extent_if_need(). If nowait
is specified, we try to lock the extent range without blocking and return
-EAGAIN if the lock cannot be acquired immediately.

Signed-off-by: Stefan Roesch <[email protected]>
---
 fs/btrfs/file.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index a154a3cec44b..4e1745e585cb 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1440,7 +1440,7 @@ static noinline int
 lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages,
 				size_t num_pages, loff_t pos,
 				size_t write_bytes,
-				u64 *lockstart, u64 *lockend,
+				u64 *lockstart, u64 *lockend, bool nowait,
 				struct extent_state **cached_state)
 {
 	struct btrfs_fs_info *fs_info = inode->root->fs_info;
@@ -1455,8 +1455,20 @@ lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages,
 	if (start_pos < inode->vfs_inode.i_size) {
 		struct btrfs_ordered_extent *ordered;
 
-		lock_extent_bits(&inode->io_tree, start_pos, last_pos,
+		if (nowait) {
+			if (!try_lock_extent(&inode->io_tree, start_pos, last_pos)) {
+				for (i = 0; i < num_pages; i++) {
+					unlock_page(pages[i]);
+					put_page(pages[i]);
+				}
+
+				return -EAGAIN;
+			}
+		} else {
+			lock_extent_bits(&inode->io_tree, start_pos, last_pos,
 				cached_state);
+		}
+
 		ordered = btrfs_lookup_ordered_range(inode, start_pos,
 						     last_pos - start_pos + 1);
 		if (ordered &&
@@ -1755,7 +1767,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 		extents_locked = lock_and_cleanup_extent_if_need(
 				BTRFS_I(inode), pages,
 				num_pages, pos, write_bytes, &lockstart,
-				&lockend, &cached_state);
+				&lockend, false, &cached_state);
 		if (extents_locked < 0) {
 			if (extents_locked == -EAGAIN)
 				goto again;
-- 
2.30.2



* [PATCH v2 09/12] btrfs: plumb NOWAIT through the write path
  2022-09-08  0:26 [PATCH v2 00/12] io-uring/btrfs: support async buffered writes Stefan Roesch
                   ` (7 preceding siblings ...)
  2022-09-08  0:26 ` [PATCH v2 08/12] btrfs: make lock_and_cleanup_extent_if_need " Stefan Roesch
@ 2022-09-08  0:26 ` Stefan Roesch
  2022-09-08 10:16   ` Filipe Manana
  2022-09-08  0:26 ` [PATCH v2 10/12] btrfs: make balance_dirty_pages nowait compatible Stefan Roesch
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 33+ messages in thread
From: Stefan Roesch @ 2022-09-08  0:26 UTC (permalink / raw)
  To: kernel-team, io-uring, linux-btrfs; +Cc: shr, axboe, josef, fdmanana

Now that everything is set up for nowait, plumb NOWAIT through the buffered
write path.

Signed-off-by: Josef Bacik <[email protected]>
Signed-off-by: Stefan Roesch <[email protected]>
---
 fs/btrfs/file.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 4e1745e585cb..6e191e353b22 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1653,8 +1653,9 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 	bool force_page_uptodate = false;
 	loff_t old_isize = i_size_read(inode);
 	unsigned int ilock_flags = 0;
+	bool nowait = iocb->ki_flags & IOCB_NOWAIT;
 
-	if (iocb->ki_flags & IOCB_NOWAIT)
+	if (nowait)
 		ilock_flags |= BTRFS_ILOCK_TRY;
 
 	ret = btrfs_inode_lock(inode, ilock_flags);
@@ -1710,17 +1711,22 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 		extent_changeset_release(data_reserved);
 		ret = btrfs_check_data_free_space(BTRFS_I(inode),
 						  &data_reserved, pos,
-						  write_bytes, false);
+						  write_bytes, nowait);
 		if (ret < 0) {
 			int tmp;
 
+			if (nowait && (ret == -ENOSPC || ret == -EAGAIN)) {
+				ret = -EAGAIN;
+				break;
+			}
+
 			/*
 			 * If we don't have to COW at the offset, reserve
 			 * metadata only. write_bytes may get smaller than
 			 * requested here.
 			 */
 			tmp = btrfs_check_nocow_lock(BTRFS_I(inode), pos,
-						     &write_bytes, false);
+						     &write_bytes, nowait);
 			if (tmp < 0)
 				ret = tmp;
 			if (tmp > 0)
@@ -1737,7 +1743,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 		WARN_ON(reserve_bytes == 0);
 		ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode),
 						      reserve_bytes,
-						      reserve_bytes, false);
+						      reserve_bytes, nowait);
 		if (ret) {
 			if (!only_release_metadata)
 				btrfs_free_reserved_data_space(BTRFS_I(inode),
@@ -1767,10 +1773,11 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 		extents_locked = lock_and_cleanup_extent_if_need(
 				BTRFS_I(inode), pages,
 				num_pages, pos, write_bytes, &lockstart,
-				&lockend, false, &cached_state);
+				&lockend, nowait, &cached_state);
 		if (extents_locked < 0) {
-			if (extents_locked == -EAGAIN)
+			if (!nowait && extents_locked == -EAGAIN)
 				goto again;
+
 			btrfs_delalloc_release_extents(BTRFS_I(inode),
 						       reserve_bytes);
 			ret = extents_locked;
-- 
2.30.2



* [PATCH v2 10/12] btrfs: make balance_dirty_pages nowait compatible
  2022-09-08  0:26 [PATCH v2 00/12] io-uring/btrfs: support async buffered writes Stefan Roesch
                   ` (8 preceding siblings ...)
  2022-09-08  0:26 ` [PATCH v2 09/12] btrfs: plumb NOWAIT through the write path Stefan Roesch
@ 2022-09-08  0:26 ` Stefan Roesch
  2022-09-08 10:16   ` Filipe Manana
  2022-09-08  0:26 ` [PATCH v2 11/12] btrfs: add assert to search functions Stefan Roesch
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 33+ messages in thread
From: Stefan Roesch @ 2022-09-08  0:26 UTC (permalink / raw)
  To: kernel-team, io-uring, linux-btrfs; +Cc: shr, axboe, josef, fdmanana

This replaces the call to balance_dirty_pages_ratelimited() in
btrfs_buffered_write() with a call to
balance_dirty_pages_ratelimited_flags().

It also moves the call to just after the 'again' label, so throttling now
happens at the start of each loop iteration instead of at the end. This can
cause it to run a bit later relative to the pages just dirtied, but this
should have no impact in the real world.
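
The resulting call in btrfs_buffered_write() is roughly (fragment, as added
by the diff below):

          unsigned int bdp_flags = (iocb->ki_flags & IOCB_NOWAIT) ? BDP_ASYNC : 0;

          /* With BDP_ASYNC set, the function returns -EAGAIN when the
           * caller would have to sleep for dirty page throttling,
           * instead of blocking. */
          ret = balance_dirty_pages_ratelimited_flags(inode->i_mapping,
                                                      bdp_flags);
          if (unlikely(ret))
                  break;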

Signed-off-by: Stefan Roesch <[email protected]>
---
 fs/btrfs/file.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 6e191e353b22..fd42ba9de7a7 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1654,6 +1654,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 	loff_t old_isize = i_size_read(inode);
 	unsigned int ilock_flags = 0;
 	bool nowait = iocb->ki_flags & IOCB_NOWAIT;
+	unsigned int bdp_flags = nowait ? BDP_ASYNC : 0;
 
 	if (nowait)
 		ilock_flags |= BTRFS_ILOCK_TRY;
@@ -1756,6 +1757,10 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 
 		release_bytes = reserve_bytes;
 again:
+		ret = balance_dirty_pages_ratelimited_flags(inode->i_mapping, bdp_flags);
+		if (unlikely(ret))
+			break;
+
 		/*
 		 * This is going to setup the pages array with the number of
 		 * pages we want, so we don't really need to worry about the
@@ -1860,8 +1865,6 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 
 		cond_resched();
 
-		balance_dirty_pages_ratelimited(inode->i_mapping);
-
 		pos += copied;
 		num_written += copied;
 	}
-- 
2.30.2



* [PATCH v2 11/12] btrfs: add assert to search functions
  2022-09-08  0:26 [PATCH v2 00/12] io-uring/btrfs: support async buffered writes Stefan Roesch
                   ` (9 preceding siblings ...)
  2022-09-08  0:26 ` [PATCH v2 10/12] btrfs: make balance_dirty_pages nowait compatible Stefan Roesch
@ 2022-09-08  0:26 ` Stefan Roesch
  2022-09-08 10:15   ` Filipe Manana
  2022-09-08  0:26 ` [PATCH v2 12/12] btrfs: enable nowait async buffered writes Stefan Roesch
  2022-09-08 10:14 ` [PATCH v2 00/12] io-uring/btrfs: support " Filipe Manana
  12 siblings, 1 reply; 33+ messages in thread
From: Stefan Roesch @ 2022-09-08  0:26 UTC (permalink / raw)
  To: kernel-team, io-uring, linux-btrfs; +Cc: shr, axboe, josef, fdmanana

This adds warnings to search functions, which should not have the nowait
flag set when called.

Signed-off-by: Stefan Roesch <[email protected]>
---
 fs/btrfs/ctree.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 71b238364939..9caf0f87cbcb 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -2165,6 +2165,9 @@ int btrfs_search_old_slot(struct btrfs_root *root, const struct btrfs_key *key,
 	lowest_level = p->lowest_level;
 	WARN_ON(p->nodes[0] != NULL);
 
+	if (WARN_ON_ONCE(p->nowait == 1))
+		return -EINVAL;
+
 	if (p->search_commit_root) {
 		BUG_ON(time_seq);
 		return btrfs_search_slot(NULL, root, key, p, 0, 0);
@@ -4465,6 +4468,9 @@ int btrfs_search_forward(struct btrfs_root *root, struct btrfs_key *min_key,
 	int ret = 1;
 	int keep_locks = path->keep_locks;
 
+	if (WARN_ON_ONCE(path->nowait == 1))
+		return -EINVAL;
+
 	path->keep_locks = 1;
 again:
 	cur = btrfs_read_lock_root_node(root);
@@ -4645,6 +4651,9 @@ int btrfs_next_old_leaf(struct btrfs_root *root, struct btrfs_path *path,
 	int ret;
 	int i;
 
+	if (WARN_ON_ONCE(path->nowait == 1))
+		return -EINVAL;
+
 	nritems = btrfs_header_nritems(path->nodes[0]);
 	if (nritems == 0)
 		return 1;
-- 
2.30.2



* [PATCH v2 12/12] btrfs: enable nowait async buffered writes
  2022-09-08  0:26 [PATCH v2 00/12] io-uring/btrfs: support async buffered writes Stefan Roesch
                   ` (10 preceding siblings ...)
  2022-09-08  0:26 ` [PATCH v2 11/12] btrfs: add assert to search functions Stefan Roesch
@ 2022-09-08  0:26 ` Stefan Roesch
  2022-09-08 10:14   ` Filipe Manana
  2022-09-08 10:14 ` [PATCH v2 00/12] io-uring/btrfs: support " Filipe Manana
  12 siblings, 1 reply; 33+ messages in thread
From: Stefan Roesch @ 2022-09-08  0:26 UTC (permalink / raw)
  To: kernel-team, io-uring, linux-btrfs; +Cc: shr, axboe, josef, fdmanana

Enable nowait async buffered writes in btrfs_do_write_iter() and
btrfs_file_open().
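
As a userspace-visible consequence (hedged sketch; fd, buf, len and offset
are placeholders), a buffered RWF_NOWAIT write no longer fails with
-EOPNOTSUPP on btrfs:

          #define _GNU_SOURCE
          #include <sys/uio.h>
          #include <errno.h>

          struct iovec iov = { .iov_base = buf, .iov_len = len };
          ssize_t n = pwritev2(fd, &iov, 1, offset, RWF_NOWAIT);
          if (n < 0 && errno == EAGAIN) {
                  /* The fast path was unavailable (pages not cached,
                   * locks contended or metadata IO needed); retry
                   * without RWF_NOWAIT. */
          }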

Signed-off-by: Stefan Roesch <[email protected]>
---
 fs/btrfs/file.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index fd42ba9de7a7..887497fd524f 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2107,13 +2107,13 @@ ssize_t btrfs_do_write_iter(struct kiocb *iocb, struct iov_iter *from,
 	if (BTRFS_FS_ERROR(inode->root->fs_info))
 		return -EROFS;
 
-	if ((iocb->ki_flags & IOCB_NOWAIT) && !(iocb->ki_flags & IOCB_DIRECT))
-		return -EOPNOTSUPP;
-
 	if (sync)
 		atomic_inc(&inode->sync_writers);
 
 	if (encoded) {
+		if (iocb->ki_flags & IOCB_NOWAIT)
+			return -EOPNOTSUPP;
+
 		num_written = btrfs_encoded_write(iocb, from, encoded);
 		num_sync = encoded->len;
 	} else if (iocb->ki_flags & IOCB_DIRECT) {
@@ -3755,7 +3755,7 @@ static int btrfs_file_open(struct inode *inode, struct file *filp)
 {
 	int ret;
 
-	filp->f_mode |= FMODE_NOWAIT | FMODE_BUF_RASYNC;
+	filp->f_mode |= FMODE_NOWAIT | FMODE_BUF_RASYNC | FMODE_BUF_WASYNC;
 
 	ret = fsverity_file_open(inode, filp);
 	if (ret)
-- 
2.30.2



* Re: [PATCH v2 00/12] io-uring/btrfs: support async buffered writes
  2022-09-08  0:26 [PATCH v2 00/12] io-uring/btrfs: support async buffered writes Stefan Roesch
                   ` (11 preceding siblings ...)
  2022-09-08  0:26 ` [PATCH v2 12/12] btrfs: enable nowait async buffered writes Stefan Roesch
@ 2022-09-08 10:14 ` Filipe Manana
  12 siblings, 0 replies; 33+ messages in thread
From: Filipe Manana @ 2022-09-08 10:14 UTC (permalink / raw)
  To: Stefan Roesch; +Cc: kernel-team, io-uring, linux-btrfs, axboe, josef

On Thu, Sep 8, 2022 at 1:26 AM Stefan Roesch <[email protected]> wrote:
>
> This patch series adds support for async buffered writes when using both
> btrfs and io-uring. Currently io-uring supports buffered writes (for btrfs)
> only in the slow path, by processing them in the io workers. With this patch
> series it is now possible to support buffered writes in the fast path. To be
> able to use the fast path, the required pages must be in the page cache, the
> required locks in btrfs must be grantable immediately and no additional
> blocks need to be read from disk.
>
> This patch series makes use of the changes that have been introduced by a
> previous patch series: "io-uring/xfs: support async buffered writes"
>
> Performance results:
>   For fio the following results have been obtained with a queue depth of
>   1 and 4k block size (runtime 600 secs):
>
>                  sequential writes:
>                  without patch           with patch      libaio     psync
>   iops:              55k                    134k          117k       148k
>   bw:               221MB/s                 538MB/s       469MB/s    592MB/s
>   clat:           15286ns                    82ns         994ns     6340ns

This is very interesting information not to have in the git history,
especially because it's exactly this that motivates all the patches.

I suggest adding this to the changelog of the last patch, mentioning that
the results come from comparing a branch with the patchset against one
without it, and listing the subjects of all the patches that belong to the
patchset.

It should also mention how to run the test, i.e. the fio command line or
config file, so that all the details are clear and everyone can reproduce
the results.

Overall the patchset looks good to me; I just left a few comments on most
of the individual patches, all minor things.

Thanks.

>
>
> For an io depth of 1, the patch series improves throughput by more than two
> times (compared to the existing behavior, where buffered writes are processed
> by an io-worker process) and the latency is also considerably reduced. To
> achieve the same or better performance with the existing code, an io depth of
> 4 is required. Increasing the io depth further does not lead to improvements.
>
>
> BTRFS changes:
>  -Add option for NOWAIT IOCB's to tell that searches do not wait on locks. This
>   adds the nowait option to btrfs_path.
>
>  -For NOWAIT buffered writes on PREALLOC or NOCOW extents tell can_nocow_extent()
>   that we don't want to wait on any locks or metadata IO.
>
>  -Support no_flush reservations for nowait buffered writes.
>
>  -Add btrfs_try_lock_ordered_range() function.
>
>  -Add nowait flag to btrfs_check_nocow_lock() to use it in write code path.
>
>  -Add nowait parameter to prepare_pages() function.
>
>  -Plumb nowait through the write code path.
>
>  -Enable nowait buffered writes.
>
>
> Testing:
>   This patch has been tested with xfstests, fsx, fio. xfstests shows no new
>   diffs compared to running without the patch series.
>
>
> Changes:
>
> V2:
>  - Replace EWOULDBLOCK with EAGAIN. In Version 1 it was not used consistently
>  - Export function balance_dirty_pages_ratelimited_flags()
>  - Add asserts/warnings for search functions when nowait is set, but we don't
>    expect that they are invoked with nowait set.
>
>
> Josef Bacik (5):
>   btrfs: implement a nowait option for tree searches
>   btrfs: make can_nocow_extent nowait compatible
>   btrfs: add the ability to use NO_FLUSH for data reservations
>   btrfs: add btrfs_try_lock_ordered_range
>   btrfs: make btrfs_check_nocow_lock nowait compatible
>
> Stefan Roesch (7):
>   mm: export balance_dirty_pages_ratelimited_flags()
>   btrfs: make prepare_pages nowait compatible
>   btrfs: make lock_and_cleanup_extent_if_need nowait compatible
>   btrfs: plumb NOWAIT through the write path
>   btrfs: make balance_dirty_pages nowait compatible
>   btrfs: add assert to search functions
>   btrfs: enable nowait async buffered writes
>
>  fs/btrfs/block-group.c    |   2 +-
>  fs/btrfs/ctree.c          |  48 ++++++++++++++-
>  fs/btrfs/ctree.h          |   8 ++-
>  fs/btrfs/delalloc-space.c |  13 +++-
>  fs/btrfs/delalloc-space.h |   3 +-
>  fs/btrfs/extent-tree.c    |   5 ++
>  fs/btrfs/file-item.c      |   4 +-
>  fs/btrfs/file.c           | 124 ++++++++++++++++++++++++++++----------
>  fs/btrfs/inode.c          |  22 ++++---
>  fs/btrfs/locking.c        |  23 +++++++
>  fs/btrfs/locking.h        |   1 +
>  fs/btrfs/ordered-data.c   |  28 +++++++++
>  fs/btrfs/ordered-data.h   |   1 +
>  fs/btrfs/relocation.c     |   2 +-
>  fs/btrfs/scrub.c          |   4 +-
>  fs/btrfs/space-info.c     |   3 +-
>  fs/btrfs/tree-log.c       |   6 +-
>  mm/page-writeback.c       |   1 +
>  18 files changed, 239 insertions(+), 59 deletions(-)
>
>
> base-commit: 53e99dcff61e1523ec1c3628b2d564ba15d32eb7
> --
> 2.30.2
>


* Re: [PATCH v2 12/12] btrfs: enable nowait async buffered writes
  2022-09-08  0:26 ` [PATCH v2 12/12] btrfs: enable nowait async buffered writes Stefan Roesch
@ 2022-09-08 10:14   ` Filipe Manana
  2022-09-08 19:14     ` Stefan Roesch
  0 siblings, 1 reply; 33+ messages in thread
From: Filipe Manana @ 2022-09-08 10:14 UTC (permalink / raw)
  To: Stefan Roesch; +Cc: kernel-team, io-uring, linux-btrfs, axboe, josef

On Thu, Sep 8, 2022 at 1:29 AM Stefan Roesch <[email protected]> wrote:
>
> Enable nowait async buffered writes in btrfs_do_write_iter() and
> btrfs_file_open().

This is too terse, see below.

>
> Signed-off-by: Stefan Roesch <[email protected]>
> ---
>  fs/btrfs/file.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index fd42ba9de7a7..887497fd524f 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -2107,13 +2107,13 @@ ssize_t btrfs_do_write_iter(struct kiocb *iocb, struct iov_iter *from,
>         if (BTRFS_FS_ERROR(inode->root->fs_info))
>                 return -EROFS;
>
> -       if ((iocb->ki_flags & IOCB_NOWAIT) && !(iocb->ki_flags & IOCB_DIRECT))
> -               return -EOPNOTSUPP;
> -
>         if (sync)
>                 atomic_inc(&inode->sync_writers);
>
>         if (encoded) {
> +               if (iocb->ki_flags & IOCB_NOWAIT)
> +                       return -EOPNOTSUPP;

The changelog should provide some rationale about why encoded writes
are not supported.

Thanks.

> +
>                 num_written = btrfs_encoded_write(iocb, from, encoded);
>                 num_sync = encoded->len;
>         } else if (iocb->ki_flags & IOCB_DIRECT) {
> @@ -3755,7 +3755,7 @@ static int btrfs_file_open(struct inode *inode, struct file *filp)
>  {
>         int ret;
>
> -       filp->f_mode |= FMODE_NOWAIT | FMODE_BUF_RASYNC;
> +       filp->f_mode |= FMODE_NOWAIT | FMODE_BUF_RASYNC | FMODE_BUF_WASYNC;
>
>         ret = fsverity_file_open(inode, filp);
>         if (ret)
> --
> 2.30.2
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 33+ messages in thread
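
For background on the btrfs_file_open() hunk: the generic kiocb setup
refuses per-request nowait on files whose ->open() did not set
FMODE_NOWAIT, which is why the flag has to be advertised there at all.
A simplified sketch of that gate (a paraphrase of the VFS check with a
made-up helper name, not the exact kernel source):

    /* Paraphrased sketch, not the exact kernel code. */
    static int kiocb_check_nowait(struct kiocb *ki, rwf_t flags)
    {
            /* RWF_NOWAIT is refused unless the filesystem opted in
             * at open time by setting FMODE_NOWAIT. */
            if ((flags & RWF_NOWAIT) &&
                !(ki->ki_filp->f_mode & FMODE_NOWAIT))
                    return -EOPNOTSUPP;
            return 0;
    }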

* Re: [PATCH v2 11/12] btrfs: add assert to search functions
  2022-09-08  0:26 ` [PATCH v2 11/12] btrfs: add assert to search functions Stefan Roesch
@ 2022-09-08 10:15   ` Filipe Manana
  2022-09-08 19:10     ` Stefan Roesch
  0 siblings, 1 reply; 33+ messages in thread
From: Filipe Manana @ 2022-09-08 10:15 UTC (permalink / raw)
  To: Stefan Roesch; +Cc: kernel-team, io-uring, linux-btrfs, axboe, josef

On Thu, Sep 8, 2022 at 1:26 AM Stefan Roesch <[email protected]> wrote:
>
> This adds warnings to search functions, which should not have the nowait
> flag set when called.

This could be clearer, e.g. by saying these are btree search functions
which are not used in the buffered IO and direct IO paths, the only
users of nowait btree searches.

Also, the subject ("btrfs: add assert to search functions") mentions
assert, but the code adds warnings, which are not the same.
It could also be clearer, like: "btrfs: assert nowait mode is not
used for some btree search functions"


>
> Signed-off-by: Stefan Roesch <[email protected]>
> ---
>  fs/btrfs/ctree.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index 71b238364939..9caf0f87cbcb 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -2165,6 +2165,9 @@ int btrfs_search_old_slot(struct btrfs_root *root, const struct btrfs_key *key,
>         lowest_level = p->lowest_level;
>         WARN_ON(p->nodes[0] != NULL);
>
> +       if (WARN_ON_ONCE(p->nowait == 1))

This doesn't follow the existing code style, which is to treat path
members as booleans, and just do:

WARN_ON_ONCE(p->nowait)

I.e., no explicit " == 1"

As this is a developer thing, I would use ASSERT() instead.

For release builds that typically have CONFIG_BTRFS_ASSERT not set
(like Ubuntu and Debian), it would
still allow the search to continue, which is fine from a functional
perspective, since not respecting nowait
semantics is just a performance thing.

Thanks.


> +               return -EINVAL;
> +
>         if (p->search_commit_root) {
>                 BUG_ON(time_seq);
>                 return btrfs_search_slot(NULL, root, key, p, 0, 0);
> @@ -4465,6 +4468,9 @@ int btrfs_search_forward(struct btrfs_root *root, struct btrfs_key *min_key,
>         int ret = 1;
>         int keep_locks = path->keep_locks;
>
> +       if (WARN_ON_ONCE(path->nowait == 1))
> +               return -EINVAL;
> +
>         path->keep_locks = 1;
>  again:
>         cur = btrfs_read_lock_root_node(root);
> @@ -4645,6 +4651,9 @@ int btrfs_next_old_leaf(struct btrfs_root *root, struct btrfs_path *path,
>         int ret;
>         int i;
>
> +       if (WARN_ON_ONCE(path->nowait == 1))
> +               return -EINVAL;
> +
>         nritems = btrfs_header_nritems(path->nodes[0]);
>         if (nritems == 0)
>                 return 1;
> --
> 2.30.2
>

^ permalink raw reply	[flat|nested] 33+ messages in thread
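
For reference, the difference Filipe points at, in rough shape (a
paraphrase, not the exact definitions): btrfs' ASSERT() compiles away
entirely on kernels built without CONFIG_BTRFS_ASSERT, while
WARN_ON_ONCE() is always built in, evaluates its condition, and yields
it, which is what lets it gate an error return.

    /* Rough shape only; the real definitions live in fs/btrfs/ctree.h
     * and include/asm-generic/bug.h. */
    #ifdef CONFIG_BTRFS_ASSERT
    #define ASSERT(expr)    do { if (!(expr)) BUG(); } while (0)
    #else
    #define ASSERT(expr)    ((void)0)       /* release builds: no check */
    #endif

    /* WARN_ON_ONCE(cond) prints one backtrace the first time cond is
     * true and evaluates to cond, hence the pattern in the patch:
     *
     *         if (WARN_ON_ONCE(p->nowait))
     *                 return -EINVAL;
     */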

* Re: [PATCH v2 10/12] btrfs: make balance_dirty_pages nowait compatible
  2022-09-08  0:26 ` [PATCH v2 10/12] btrfs: make balance_dirty_pages nowait compatible Stefan Roesch
@ 2022-09-08 10:16   ` Filipe Manana
  2022-09-08 18:48     ` Stefan Roesch
  0 siblings, 1 reply; 33+ messages in thread
From: Filipe Manana @ 2022-09-08 10:16 UTC (permalink / raw)
  To: Stefan Roesch; +Cc: kernel-team, io-uring, linux-btrfs, axboe, josef

On Thu, Sep 8, 2022 at 1:26 AM Stefan Roesch <[email protected]> wrote:
>
> This replaces the call to function balance_dirty_pages_ratelimited() in
> the function btrfs_buffered_write() with a call to
> balance_dirty_pages_ratelimited_flags().
>
> It also moves the call to after the again label. This can cause the
> function to be called a bit later, but this should have no impact in the
> real world.
>
> Signed-off-by: Stefan Roesch <[email protected]>
> ---
>  fs/btrfs/file.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 6e191e353b22..fd42ba9de7a7 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -1654,6 +1654,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>         loff_t old_isize = i_size_read(inode);
>         unsigned int ilock_flags = 0;
>         bool nowait = iocb->ki_flags & IOCB_NOWAIT;
> +       unsigned int bdp_flags = nowait ? BDP_ASYNC : 0;
>
>         if (nowait)
>                 ilock_flags |= BTRFS_ILOCK_TRY;
> @@ -1756,6 +1757,10 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>
>                 release_bytes = reserve_bytes;
>  again:
> +               ret = balance_dirty_pages_ratelimited_flags(inode->i_mapping, bdp_flags);
> +               if (unlikely(ret))

We normally only use likely or unlikely in contexts where we observe
that it makes a significant difference.
What's the motivation here, have you verified that in this case it has
a significant impact?

Thanks.

> +                       break;
> +
>                 /*
>                  * This is going to setup the pages array with the number of
>                  * pages we want, so we don't really need to worry about the
> @@ -1860,8 +1865,6 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>
>                 cond_resched();
>
> -               balance_dirty_pages_ratelimited(inode->i_mapping);
> -
>                 pos += copied;
>                 num_written += copied;
>         }
> --
> 2.30.2
>

^ permalink raw reply	[flat|nested] 33+ messages in thread
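
What BDP_ASYNC buys here: balance_dirty_pages_ratelimited_flags()
returns -EAGAIN instead of sleeping when dirty throttling would be
needed, so a nowait write can bail out and be retried from a context
that is allowed to block. A sketch of the call site with the
unlikely() dropped, as the review asks:

    again:
            /* With BDP_ASYNC, throttling yields -EAGAIN rather than a
             * sleep; io_uring then retries the write from a worker. */
            ret = balance_dirty_pages_ratelimited_flags(inode->i_mapping,
                                                        bdp_flags);
            if (ret)
                    break;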

* Re: [PATCH v2 09/12] btrfs: btrfs: plumb NOWAIT through the write path
  2022-09-08  0:26 ` [PATCH v2 09/12] btrfs: btrfs: plumb NOWAIT through the write path Stefan Roesch
@ 2022-09-08 10:16   ` Filipe Manana
  2022-09-08 18:44     ` Stefan Roesch
  0 siblings, 1 reply; 33+ messages in thread
From: Filipe Manana @ 2022-09-08 10:16 UTC (permalink / raw)
  To: Stefan Roesch; +Cc: kernel-team, io-uring, linux-btrfs, axboe, josef

On Thu, Sep 8, 2022 at 1:26 AM Stefan Roesch <[email protected]> wrote:
>
> Everything is now set up for nowait, so plumb NOWAIT through the write path.

Note, there's a double "btrfs: " prefix in the subject line.

>
> Signed-off-by: Josef Bacik <[email protected]>
> Signed-off-by: Stefan Roesch <[email protected]>
> ---
>  fs/btrfs/file.c | 19 +++++++++++++------
>  1 file changed, 13 insertions(+), 6 deletions(-)
>
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 4e1745e585cb..6e191e353b22 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -1653,8 +1653,9 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>         bool force_page_uptodate = false;
>         loff_t old_isize = i_size_read(inode);
>         unsigned int ilock_flags = 0;
> +       bool nowait = iocb->ki_flags & IOCB_NOWAIT;

Can be made const.

Thanks.

>
> -       if (iocb->ki_flags & IOCB_NOWAIT)
> +       if (nowait)
>                 ilock_flags |= BTRFS_ILOCK_TRY;
>
>         ret = btrfs_inode_lock(inode, ilock_flags);
> @@ -1710,17 +1711,22 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>                 extent_changeset_release(data_reserved);
>                 ret = btrfs_check_data_free_space(BTRFS_I(inode),
>                                                   &data_reserved, pos,
> -                                                 write_bytes, false);
> +                                                 write_bytes, nowait);
>                 if (ret < 0) {
>                         int tmp;
>
> +                       if (nowait && (ret == -ENOSPC || ret == -EAGAIN)) {
> +                               ret = -EAGAIN;
> +                               break;
> +                       }
> +
>                         /*
>                          * If we don't have to COW at the offset, reserve
>                          * metadata only. write_bytes may get smaller than
>                          * requested here.
>                          */
>                         tmp = btrfs_check_nocow_lock(BTRFS_I(inode), pos,
> -                                                    &write_bytes, false);
> +                                                    &write_bytes, nowait);
>                         if (tmp < 0)
>                                 ret = tmp;
>                         if (tmp > 0)
> @@ -1737,7 +1743,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>                 WARN_ON(reserve_bytes == 0);
>                 ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode),
>                                                       reserve_bytes,
> -                                                     reserve_bytes, false);
> +                                                     reserve_bytes, nowait);
>                 if (ret) {
>                         if (!only_release_metadata)
>                                 btrfs_free_reserved_data_space(BTRFS_I(inode),
> @@ -1767,10 +1773,11 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>                 extents_locked = lock_and_cleanup_extent_if_need(
>                                 BTRFS_I(inode), pages,
>                                 num_pages, pos, write_bytes, &lockstart,
> -                               &lockend, false, &cached_state);
> +                               &lockend, nowait, &cached_state);
>                 if (extents_locked < 0) {
> -                       if (extents_locked == -EAGAIN)
> +                       if (!nowait && extents_locked == -EAGAIN)
>                                 goto again;
> +
>                         btrfs_delalloc_release_extents(BTRFS_I(inode),
>                                                        reserve_bytes);
>                         ret = extents_locked;
> --
> 2.30.2
>

^ permalink raw reply	[flat|nested] 33+ messages in thread
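
Besides the const nit, one detail of this hunk worth spelling out:
translating -ENOSPC to -EAGAIN under nowait hands the request back to
io_uring, which retries it from a worker thread where the flushing a
data reservation may need is allowed to block. The const version the
review asks for is a one-line change:

    /* The flag is computed once and never modified afterwards. */
    const bool nowait = iocb->ki_flags & IOCB_NOWAIT;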

* Re: [PATCH v2 08/12] btrfs: make lock_and_cleanup_extent_if_need nowait compatible
  2022-09-08  0:26 ` [PATCH v2 08/12] btrfs: make lock_and_cleanup_extent_if_need " Stefan Roesch
@ 2022-09-08 10:17   ` Filipe Manana
  2022-09-08 18:38     ` Stefan Roesch
  0 siblings, 1 reply; 33+ messages in thread
From: Filipe Manana @ 2022-09-08 10:17 UTC (permalink / raw)
  To: Stefan Roesch; +Cc: kernel-team, io-uring, linux-btrfs, axboe, josef

On Thu, Sep 8, 2022 at 1:26 AM Stefan Roesch <[email protected]> wrote:
>
> This adds the nowait parameter to lock_and_cleanup_extent_if_need(). If
> the nowait parameter is specified we try to lock the extent in nowait
> mode.
>
> Signed-off-by: Stefan Roesch <[email protected]>
> ---
>  fs/btrfs/file.c | 18 +++++++++++++++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index a154a3cec44b..4e1745e585cb 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -1440,7 +1440,7 @@ static noinline int
>  lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages,
>                                 size_t num_pages, loff_t pos,
>                                 size_t write_bytes,
> -                               u64 *lockstart, u64 *lockend,
> +                               u64 *lockstart, u64 *lockend, bool nowait,
>                                 struct extent_state **cached_state)
>  {
>         struct btrfs_fs_info *fs_info = inode->root->fs_info;
> @@ -1455,8 +1455,20 @@ lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages,
>         if (start_pos < inode->vfs_inode.i_size) {
>                 struct btrfs_ordered_extent *ordered;
>
> -               lock_extent_bits(&inode->io_tree, start_pos, last_pos,
> +               if (nowait) {
> +                       if (!try_lock_extent(&inode->io_tree, start_pos, last_pos)) {
> +                               for (i = 0; i < num_pages; i++) {
> +                                       unlock_page(pages[i]);
> +                                       put_page(pages[i]);

Since this is a non-local array, I'd prefer if we also set pages[i] to NULL.
That may help prevent hard-to-debug bugs in the future.

Thanks.


> +                               }
> +
> +                               return -EAGAIN;
> +                       }
> +               } else {
> +                       lock_extent_bits(&inode->io_tree, start_pos, last_pos,
>                                 cached_state);
> +               }
> +
>                 ordered = btrfs_lookup_ordered_range(inode, start_pos,
>                                                      last_pos - start_pos + 1);
>                 if (ordered &&
> @@ -1755,7 +1767,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>                 extents_locked = lock_and_cleanup_extent_if_need(
>                                 BTRFS_I(inode), pages,
>                                 num_pages, pos, write_bytes, &lockstart,
> -                               &lockend, &cached_state);
> +                               &lockend, false, &cached_state);
>                 if (extents_locked < 0) {
>                         if (extents_locked == -EAGAIN)
>                                 goto again;
> --
> 2.30.2
>

^ permalink raw reply	[flat|nested] 33+ messages in thread
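
A sketch of the nowait branch with the suggestion applied, clearing
each slot of the caller-owned array so a stale page pointer cannot be
used by mistake later (Stefan confirms this change for the next
version further down in the thread):

    if (nowait) {
            if (!try_lock_extent(&inode->io_tree, start_pos, last_pos)) {
                    for (i = 0; i < num_pages; i++) {
                            unlock_page(pages[i]);
                            put_page(pages[i]);
                            /* Don't leave stale pointers behind. */
                            pages[i] = NULL;
                    }
                    return -EAGAIN;
            }
    } else {
            lock_extent_bits(&inode->io_tree, start_pos, last_pos,
                             cached_state);
    }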

* Re: [PATCH v2 07/12] btrfs: make prepare_pages nowait compatible
  2022-09-08  0:26 ` [PATCH v2 07/12] btrfs: make prepare_pages " Stefan Roesch
@ 2022-09-08 10:17   ` Filipe Manana
  2022-09-08 18:33     ` Stefan Roesch
  0 siblings, 1 reply; 33+ messages in thread
From: Filipe Manana @ 2022-09-08 10:17 UTC (permalink / raw)
  To: Stefan Roesch; +Cc: kernel-team, io-uring, linux-btrfs, axboe, josef

On Thu, Sep 8, 2022 at 1:26 AM Stefan Roesch <[email protected]> wrote:
>
> Add nowait parameter to the prepare_pages function. In case nowait is
> specified for an async buffered write request, do a nowait allocation or
> return -EAGAIN.
>
> Signed-off-by: Stefan Roesch <[email protected]>
> ---
>  fs/btrfs/file.c | 43 ++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 36 insertions(+), 7 deletions(-)
>
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index cf19d381ead6..a154a3cec44b 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -1339,26 +1339,55 @@ static int prepare_uptodate_page(struct inode *inode,
>         return 0;
>  }
>
> +static int get_prepare_fgp_flags(bool nowait)
> +{
> +       int fgp_flags;
> +
> +       fgp_flags = FGP_LOCK|FGP_ACCESSED|FGP_CREAT;

Please follow the existing code style and add a space before and after
each bitwise or operator.
Not only does it conform to the btrfs style, it's also easier to read.

The assignment could also be done when declaring the variable, since
it's short and simple.

Thanks.

> +       if (nowait)
> +               fgp_flags |= FGP_NOWAIT;
> +
> +       return fgp_flags;
> +}
> +
> +static gfp_t get_prepare_gfp_flags(struct inode *inode, bool nowait)
> +{
> +       gfp_t gfp;
> +
> +       gfp = btrfs_alloc_write_mask(inode->i_mapping);
> +       if (nowait) {
> +               gfp &= ~__GFP_DIRECT_RECLAIM;
> +               gfp |= GFP_NOWAIT;
> +       }
> +
> +       return gfp;
> +}
> +
>  /*
>   * this just gets pages into the page cache and locks them down.
>   */
>  static noinline int prepare_pages(struct inode *inode, struct page **pages,
>                                   size_t num_pages, loff_t pos,
> -                                 size_t write_bytes, bool force_uptodate)
> +                                 size_t write_bytes, bool force_uptodate,
> +                                 bool nowait)
>  {
>         int i;
>         unsigned long index = pos >> PAGE_SHIFT;
> -       gfp_t mask = btrfs_alloc_write_mask(inode->i_mapping);
> +       gfp_t mask = get_prepare_gfp_flags(inode, nowait);
> +       int fgp_flags = get_prepare_fgp_flags(nowait);
>         int err = 0;
>         int faili;
>
>         for (i = 0; i < num_pages; i++) {
>  again:
> -               pages[i] = find_or_create_page(inode->i_mapping, index + i,
> -                                              mask | __GFP_WRITE);
> +               pages[i] = pagecache_get_page(inode->i_mapping, index + i,
> +                                       fgp_flags, mask | __GFP_WRITE);
>                 if (!pages[i]) {
>                         faili = i - 1;
> -                       err = -ENOMEM;
> +                       if (nowait)
> +                               err = -EAGAIN;
> +                       else
> +                               err = -ENOMEM;
>                         goto fail;
>                 }
>
> @@ -1376,7 +1405,7 @@ static noinline int prepare_pages(struct inode *inode, struct page **pages,
>                                                     pos + write_bytes, false);
>                 if (err) {
>                         put_page(pages[i]);
> -                       if (err == -EAGAIN) {
> +                       if (!nowait && err == -EAGAIN) {
>                                 err = 0;
>                                 goto again;
>                         }
> @@ -1716,7 +1745,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>                  */
>                 ret = prepare_pages(inode, pages, num_pages,
>                                     pos, write_bytes,
> -                                   force_page_uptodate);
> +                                   force_page_uptodate, false);
>                 if (ret) {
>                         btrfs_delalloc_release_extents(BTRFS_I(inode),
>                                                        reserve_bytes);
> --
> 2.30.2
>

^ permalink raw reply	[flat|nested] 33+ messages in thread
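
With both style comments applied, the helper would look as below (this
is the reviewer's suggestion, which Stefan confirms for the next
version later in the thread):

    static int get_prepare_fgp_flags(bool nowait)
    {
            /* Assignment at declaration, spaces around the ORs. */
            int fgp_flags = FGP_LOCK | FGP_ACCESSED | FGP_CREAT;

            if (nowait)
                    fgp_flags |= FGP_NOWAIT;

            return fgp_flags;
    }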

* Re: [PATCH v2 06/12] btrfs: make btrfs_check_nocow_lock nowait compatible
  2022-09-08  0:26 ` [PATCH v2 06/12] btrfs: make btrfs_check_nocow_lock nowait compatible Stefan Roesch
@ 2022-09-08 10:18   ` Filipe Manana
  2022-09-08 18:23     ` Stefan Roesch
  0 siblings, 1 reply; 33+ messages in thread
From: Filipe Manana @ 2022-09-08 10:18 UTC (permalink / raw)
  To: Stefan Roesch; +Cc: kernel-team, io-uring, linux-btrfs, axboe, josef

On Thu, Sep 8, 2022 at 1:26 AM Stefan Roesch <[email protected]> wrote:
>
> From: Josef Bacik <[email protected]>
>
> Now that all the helpers btrfs_check_nocow_lock uses handle nowait, add
> a nowait flag to btrfs_check_nocow_lock so it can be used by the write
> path.
>
> Signed-off-by: Josef Bacik <[email protected]>
> Signed-off-by: Stefan Roesch <[email protected]>
> ---
>  fs/btrfs/ctree.h |  2 +-
>  fs/btrfs/file.c  | 33 ++++++++++++++++++++++-----------
>  fs/btrfs/inode.c |  2 +-
>  3 files changed, 24 insertions(+), 13 deletions(-)
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 536bbc8551fc..06cb25f2d3bd 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -3482,7 +3482,7 @@ int btrfs_dirty_pages(struct btrfs_inode *inode, struct page **pages,
>                       struct extent_state **cached, bool noreserve);
>  int btrfs_fdatawrite_range(struct inode *inode, loff_t start, loff_t end);
>  int btrfs_check_nocow_lock(struct btrfs_inode *inode, loff_t pos,
> -                          size_t *write_bytes);
> +                          size_t *write_bytes, bool nowait);
>  void btrfs_check_nocow_unlock(struct btrfs_inode *inode);
>
>  /* tree-defrag.c */
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 0f257205c63d..cf19d381ead6 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -1481,7 +1481,7 @@ lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages,
>   * NOTE: Callers need to call btrfs_check_nocow_unlock() if we return > 0.
>   */
>  int btrfs_check_nocow_lock(struct btrfs_inode *inode, loff_t pos,
> -                          size_t *write_bytes)
> +                          size_t *write_bytes, bool nowait)
>  {
>         struct btrfs_fs_info *fs_info = inode->root->fs_info;
>         struct btrfs_root *root = inode->root;
> @@ -1500,16 +1500,21 @@ int btrfs_check_nocow_lock(struct btrfs_inode *inode, loff_t pos,
>                            fs_info->sectorsize) - 1;
>         num_bytes = lockend - lockstart + 1;
>
> -       btrfs_lock_and_flush_ordered_range(inode, lockstart, lockend, NULL);
> +       if (nowait) {
> +               if (!btrfs_try_lock_ordered_range(inode, lockstart, lockend)) {
> +                       btrfs_drew_write_unlock(&root->snapshot_lock);
> +                       return -EAGAIN;
> +               }
> +       } else {
> +               btrfs_lock_and_flush_ordered_range(inode, lockstart, lockend, NULL);
> +       }
>         ret = can_nocow_extent(&inode->vfs_inode, lockstart, &num_bytes,
> -                       NULL, NULL, NULL, false, false);
> -       if (ret <= 0) {
> -               ret = 0;
> +                       NULL, NULL, NULL, nowait, false);
> +       if (ret <= 0)
>                 btrfs_drew_write_unlock(&root->snapshot_lock);
> -       } else {
> +       else
>                 *write_bytes = min_t(size_t, *write_bytes ,
>                                      num_bytes - pos + lockstart);
> -       }
>         unlock_extent(&inode->io_tree, lockstart, lockend);
>
>         return ret;
> @@ -1666,16 +1671,22 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>                                                   &data_reserved, pos,
>                                                   write_bytes, false);
>                 if (ret < 0) {
> +                       int tmp;
> +
>                         /*
>                          * If we don't have to COW at the offset, reserve
>                          * metadata only. write_bytes may get smaller than
>                          * requested here.
>                          */
> -                       if (btrfs_check_nocow_lock(BTRFS_I(inode), pos,
> -                                                  &write_bytes) > 0)
> -                               only_release_metadata = true;
> -                       else
> +                       tmp = btrfs_check_nocow_lock(BTRFS_I(inode), pos,
> +                                                    &write_bytes, false);
> +                       if (tmp < 0)
> +                               ret = tmp;
> +                       if (tmp > 0)
> +                               ret = 0;
> +                       if (ret)

A variable named tmp is not a great name, something like "can_nocow"
would be a lot clearer.

Thanks.


>                                 break;
> +                       only_release_metadata = true;
>                 }
>
>                 num_pages = DIV_ROUND_UP(write_bytes + offset, PAGE_SIZE);
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 36e755f73764..5426d4f4ac23 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -4884,7 +4884,7 @@ int btrfs_truncate_block(struct btrfs_inode *inode, loff_t from, loff_t len,
>         ret = btrfs_check_data_free_space(inode, &data_reserved, block_start,
>                                           blocksize, false);
>         if (ret < 0) {
> -               if (btrfs_check_nocow_lock(inode, block_start, &write_bytes) > 0) {
> +               if (btrfs_check_nocow_lock(inode, block_start, &write_bytes, false) > 0) {
>                         /* For nocow case, no need to reserve data space */
>                         only_release_metadata = true;
>                 } else {
> --
> 2.30.2
>

^ permalink raw reply	[flat|nested] 33+ messages in thread
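
A sketch of the call site with the suggested rename applied (Stefan
confirms the rename further down in the thread):

    if (ret < 0) {
            int can_nocow;

            /*
             * If we don't have to COW at the offset, reserve
             * metadata only. write_bytes may get smaller than
             * requested here.
             */
            can_nocow = btrfs_check_nocow_lock(BTRFS_I(inode), pos,
                                               &write_bytes, false);
            if (can_nocow < 0)
                    ret = can_nocow;
            if (can_nocow > 0)
                    ret = 0;
            if (ret)
                    break;
            only_release_metadata = true;
    }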

* Re: [PATCH v2 05/12] btrfs: add btrfs_try_lock_ordered_range
  2022-09-08  0:26 ` [PATCH v2 05/12] btrfs: add btrfs_try_lock_ordered_range Stefan Roesch
@ 2022-09-08 10:18   ` Filipe Manana
  2022-09-08 18:12     ` Stefan Roesch
  0 siblings, 1 reply; 33+ messages in thread
From: Filipe Manana @ 2022-09-08 10:18 UTC (permalink / raw)
  To: Stefan Roesch; +Cc: kernel-team, io-uring, linux-btrfs, axboe, josef

On Thu, Sep 8, 2022 at 1:26 AM Stefan Roesch <[email protected]> wrote:
>
> From: Josef Bacik <[email protected]>
>
> For IOCB_NOWAIT we're going to want to use try lock on the extent lock,
> and simply bail if there's an ordered extent in the range because the
> only choice there is to wait for the ordered extent to complete.
>
> Signed-off-by: Josef Bacik <[email protected]>
> Signed-off-by: Stefan Roesch <[email protected]>
> ---
>  fs/btrfs/ordered-data.c | 28 ++++++++++++++++++++++++++++
>  fs/btrfs/ordered-data.h |  1 +
>  2 files changed, 29 insertions(+)
>
> diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
> index 1952ac85222c..3cdfdcedb088 100644
> --- a/fs/btrfs/ordered-data.c
> +++ b/fs/btrfs/ordered-data.c
> @@ -1041,6 +1041,34 @@ void btrfs_lock_and_flush_ordered_range(struct btrfs_inode *inode, u64 start,
>         }
>  }
>
> +/*
> + * btrfs_try_lock_ordered_range - lock the passed range and ensure all pending
> + * ordered extents in it are run to completion in nowait mode.
> + *
> + * @inode:        Inode whose ordered tree is to be searched
> + * @start:        Beginning of range to flush
> + * @end:          Last byte of range to lock
> + *
> + * This function returns 1 if btrfs_lock_ordered_range does not return any
> + * extents, otherwise 0.

Why not a bool, true/false? That's all that is needed, and it's clear.

Thanks.

> + */
> +int btrfs_try_lock_ordered_range(struct btrfs_inode *inode, u64 start, u64 end)
> +{
> +       struct btrfs_ordered_extent *ordered;
> +
> +       if (!try_lock_extent(&inode->io_tree, start, end))
> +               return 0;
> +
> +       ordered = btrfs_lookup_ordered_range(inode, start, end - start + 1);
> +       if (!ordered)
> +               return 1;
> +
> +       btrfs_put_ordered_extent(ordered);
> +       unlock_extent(&inode->io_tree, start, end);
> +       return 0;
> +}
> +
> +
>  static int clone_ordered_extent(struct btrfs_ordered_extent *ordered, u64 pos,
>                                 u64 len)
>  {
> diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h
> index 87792f85e2c4..ec27ebf0af4b 100644
> --- a/fs/btrfs/ordered-data.h
> +++ b/fs/btrfs/ordered-data.h
> @@ -218,6 +218,7 @@ void btrfs_wait_ordered_roots(struct btrfs_fs_info *fs_info, u64 nr,
>  void btrfs_lock_and_flush_ordered_range(struct btrfs_inode *inode, u64 start,
>                                         u64 end,
>                                         struct extent_state **cached_state);
> +int btrfs_try_lock_ordered_range(struct btrfs_inode *inode, u64 start, u64 end);
>  int btrfs_split_ordered_extent(struct btrfs_ordered_extent *ordered, u64 pre,
>                                u64 post);
>  int __init ordered_data_init(void);
> --
> 2.30.2
>

^ permalink raw reply	[flat|nested] 33+ messages in thread
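
The helper reworked to return bool, as suggested (and as Stefan
confirms for the next version below): true means the range is locked
with no pending ordered extents, false means the caller must fall back
to the blocking path.

    bool btrfs_try_lock_ordered_range(struct btrfs_inode *inode, u64 start,
                                      u64 end)
    {
            struct btrfs_ordered_extent *ordered;

            if (!try_lock_extent(&inode->io_tree, start, end))
                    return false;

            ordered = btrfs_lookup_ordered_range(inode, start,
                                                 end - start + 1);
            if (!ordered)
                    return true;

            /* An ordered extent is pending; drop the lock and give up. */
            btrfs_put_ordered_extent(ordered);
            unlock_extent(&inode->io_tree, start, end);
            return false;
    }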

* Re: [PATCH v2 01/12] mm: export balance_dirty_pages_ratelimited_flags()
  2022-09-08  0:26 ` [PATCH v2 01/12] mm: export balance_dirty_pages_ratelimited_flags() Stefan Roesch
@ 2022-09-08 10:18   ` Filipe Manana
  0 siblings, 0 replies; 33+ messages in thread
From: Filipe Manana @ 2022-09-08 10:18 UTC (permalink / raw)
  To: Stefan Roesch
  Cc: kernel-team, io-uring, linux-btrfs, axboe, josef,
	kernel test robot

On Thu, Sep 8, 2022 at 1:26 AM Stefan Roesch <[email protected]> wrote:
>
> Export the function balance_dirty_pages_ratelimited_flags(). It is now
> also called from btrfs.
>
> Signed-off-by: Stefan Roesch <[email protected]>
> Reported-by: kernel test robot <[email protected]>
> ---
>  mm/page-writeback.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 032a7bf8d259..7e9d8d857ecc 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -1933,6 +1933,7 @@ int balance_dirty_pages_ratelimited_flags(struct address_space *mapping,
>         wb_put(wb);
>         return ret;
>  }
> +EXPORT_SYMBOL_GPL(balance_dirty_pages_ratelimited_flags);

Even though it's a trivial change, the linux-mm list should be CC'ed.

Thanks.

>
>  /**
>   * balance_dirty_pages_ratelimited - balance dirty memory state.
> --
> 2.30.2
>

^ permalink raw reply	[flat|nested] 33+ messages in thread
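
For context: the export is needed because btrfs can be built as a
loadable module (CONFIG_BTRFS_FS=m), and a module can only resolve
symbols the core kernel explicitly exports; EXPORT_SYMBOL_GPL further
limits the symbol to GPL-compatible modules.

    /* Without this line (added by the patch after the function body),
     * loading btrfs.ko would fail with an unresolved symbol on the
     * new balance_dirty_pages_ratelimited_flags() call. */
    EXPORT_SYMBOL_GPL(balance_dirty_pages_ratelimited_flags);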

* Re: [PATCH v2 02/12] btrfs: implement a nowait option for tree searches
  2022-09-08  0:26 ` [PATCH v2 02/12] btrfs: implement a nowait option for tree searches Stefan Roesch
@ 2022-09-08 13:19   ` Josef Bacik
  2022-09-08 18:05     ` Stefan Roesch
  0 siblings, 1 reply; 33+ messages in thread
From: Josef Bacik @ 2022-09-08 13:19 UTC (permalink / raw)
  To: Stefan Roesch; +Cc: kernel-team, io-uring, linux-btrfs, axboe, fdmanana

On Wed, Sep 07, 2022 at 05:26:06PM -0700, Stefan Roesch wrote:
> From: Josef Bacik <[email protected]>
> 
> For NOWAIT IOCB's we'll need a way to tell search to not wait on locks
> or anything.  Accomplish this by adding a path->nowait flag that will
> use trylocks and skip reading of metadata, returning -EWOULDBLOCK in
> either of these cases.  For now we only need this for reads, so only the
> read side is handled.  Add an ASSERT() to catch anybody trying to use
> this for writes so they know they'll have to implement the write side.
> 
> Signed-off-by: Josef Bacik <[email protected]>
> Signed-off-by: Stefan Roesch <[email protected]>

Should update my changelog to say use -EAGAIN instead of -EWOULDBLOCK.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 33+ messages in thread
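
Worth noting for readers: on Linux the two error codes share a value,
so the V2 change is purely about spelling the name consistently in the
source. From the uapi headers:

    /* include/uapi/asm-generic/errno.h */
    #define EWOULDBLOCK     EAGAIN  /* Operation would block */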

* Re: [PATCH v2 02/12] btrfs: implement a nowait option for tree searches
  2022-09-08 13:19   ` Josef Bacik
@ 2022-09-08 18:05     ` Stefan Roesch
  0 siblings, 0 replies; 33+ messages in thread
From: Stefan Roesch @ 2022-09-08 18:05 UTC (permalink / raw)
  To: Josef Bacik; +Cc: kernel-team, io-uring, linux-btrfs, axboe, fdmanana



On 9/8/22 6:19 AM, Josef Bacik wrote:
> On Wed, Sep 07, 2022 at 05:26:06PM -0700, Stefan Roesch wrote:
>> From: Josef Bacik <[email protected]>
>>
>> For NOWAIT IOCB's we'll need a way to tell search to not wait on locks
>> or anything.  Accomplish this by adding a path->nowait flag that will
>> use trylocks and skip reading of metadata, returning -EWOULDBLOCK in
>> either of these cases.  For now we only need this for reads, so only the
>> read side is handled.  Add an ASSERT() to catch anybody trying to use
>> this for writes so they know they'll have to implement the write side.
>>
>> Signed-off-by: Josef Bacik <[email protected]>
>> Signed-off-by: Stefan Roesch <[email protected]>
> 
> Should update my changelog to say use -EAGAIN instead of -EWOULDBLOCK.  Thanks,
> 

The next version will have that change.

> Josef

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 05/12] btrfs: add btrfs_try_lock_ordered_range
  2022-09-08 10:18   ` Filipe Manana
@ 2022-09-08 18:12     ` Stefan Roesch
  0 siblings, 0 replies; 33+ messages in thread
From: Stefan Roesch @ 2022-09-08 18:12 UTC (permalink / raw)
  To: Filipe Manana; +Cc: kernel-team, io-uring, linux-btrfs, axboe, josef



On 9/8/22 3:18 AM, Filipe Manana wrote:
> On Thu, Sep 8, 2022 at 1:26 AM Stefan Roesch <[email protected]> wrote:
>>
>> From: Josef Bacik <[email protected]>
>>
>> For IOCB_NOWAIT we're going to want to use try lock on the extent lock,
>> and simply bail if there's an ordered extent in the range because the
>> only choice there is to wait for the ordered extent to complete.
>>
>> Signed-off-by: Josef Bacik <[email protected]>
>> Signed-off-by: Stefan Roesch <[email protected]>
>> ---
>>  fs/btrfs/ordered-data.c | 28 ++++++++++++++++++++++++++++
>>  fs/btrfs/ordered-data.h |  1 +
>>  2 files changed, 29 insertions(+)
>>
>> diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
>> index 1952ac85222c..3cdfdcedb088 100644
>> --- a/fs/btrfs/ordered-data.c
>> +++ b/fs/btrfs/ordered-data.c
>> @@ -1041,6 +1041,34 @@ void btrfs_lock_and_flush_ordered_range(struct btrfs_inode *inode, u64 start,
>>         }
>>  }
>>
>> +/*
>> + * btrfs_try_lock_ordered_range - lock the passed range and ensure all pending
>> + * ordered extents in it are run to completion in nowait mode.
>> + *
>> + * @inode:        Inode whose ordered tree is to be searched
>> + * @start:        Beginning of range to flush
>> + * @end:          Last byte of range to lock
>> + *
>> + * This function returns 1 if btrfs_lock_ordered_range does not return any
>> + * extents, otherwise 0.
> 
> Why not a bool, true/false? That's all that is needed, and it's clear.
> 
> Thanks.
>

The next version of the patch series will return bool instead of int.
 
>> + */
>> +int btrfs_try_lock_ordered_range(struct btrfs_inode *inode, u64 start, u64 end)
>> +{
>> +       struct btrfs_ordered_extent *ordered;
>> +
>> +       if (!try_lock_extent(&inode->io_tree, start, end))
>> +               return 0;
>> +
>> +       ordered = btrfs_lookup_ordered_range(inode, start, end - start + 1);
>> +       if (!ordered)
>> +               return 1;
>> +
>> +       btrfs_put_ordered_extent(ordered);
>> +       unlock_extent(&inode->io_tree, start, end);
>> +       return 0;
>> +}
>> +
>> +
>>  static int clone_ordered_extent(struct btrfs_ordered_extent *ordered, u64 pos,
>>                                 u64 len)
>>  {
>> diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h
>> index 87792f85e2c4..ec27ebf0af4b 100644
>> --- a/fs/btrfs/ordered-data.h
>> +++ b/fs/btrfs/ordered-data.h
>> @@ -218,6 +218,7 @@ void btrfs_wait_ordered_roots(struct btrfs_fs_info *fs_info, u64 nr,
>>  void btrfs_lock_and_flush_ordered_range(struct btrfs_inode *inode, u64 start,
>>                                         u64 end,
>>                                         struct extent_state **cached_state);
>> +int btrfs_try_lock_ordered_range(struct btrfs_inode *inode, u64 start, u64 end);
>>  int btrfs_split_ordered_extent(struct btrfs_ordered_extent *ordered, u64 pre,
>>                                u64 post);
>>  int __init ordered_data_init(void);
>> --
>> 2.30.2
>>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 06/12] btrfs: make btrfs_check_nocow_lock nowait compatible
  2022-09-08 10:18   ` Filipe Manana
@ 2022-09-08 18:23     ` Stefan Roesch
  0 siblings, 0 replies; 33+ messages in thread
From: Stefan Roesch @ 2022-09-08 18:23 UTC (permalink / raw)
  To: Filipe Manana; +Cc: kernel-team, io-uring, linux-btrfs, axboe, josef



On 9/8/22 3:18 AM, Filipe Manana wrote:
> On Thu, Sep 8, 2022 at 1:26 AM Stefan Roesch <[email protected]> wrote:
>>
>> From: Josef Bacik <[email protected]>
>>
>> Now that all the helpers btrfs_check_nocow_lock uses handle nowait, add
>> a nowait flag to btrfs_check_nocow_lock so it can be used by the write
>> path.
>>
>> Signed-off-by: Josef Bacik <[email protected]>
>> Signed-off-by: Stefan Roesch <[email protected]>
>> ---
>>  fs/btrfs/ctree.h |  2 +-
>>  fs/btrfs/file.c  | 33 ++++++++++++++++++++++-----------
>>  fs/btrfs/inode.c |  2 +-
>>  3 files changed, 24 insertions(+), 13 deletions(-)
>>
>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>> index 536bbc8551fc..06cb25f2d3bd 100644
>> --- a/fs/btrfs/ctree.h
>> +++ b/fs/btrfs/ctree.h
>> @@ -3482,7 +3482,7 @@ int btrfs_dirty_pages(struct btrfs_inode *inode, struct page **pages,
>>                       struct extent_state **cached, bool noreserve);
>>  int btrfs_fdatawrite_range(struct inode *inode, loff_t start, loff_t end);
>>  int btrfs_check_nocow_lock(struct btrfs_inode *inode, loff_t pos,
>> -                          size_t *write_bytes);
>> +                          size_t *write_bytes, bool nowait);
>>  void btrfs_check_nocow_unlock(struct btrfs_inode *inode);
>>
>>  /* tree-defrag.c */
>> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
>> index 0f257205c63d..cf19d381ead6 100644
>> --- a/fs/btrfs/file.c
>> +++ b/fs/btrfs/file.c
>> @@ -1481,7 +1481,7 @@ lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages,
>>   * NOTE: Callers need to call btrfs_check_nocow_unlock() if we return > 0.
>>   */
>>  int btrfs_check_nocow_lock(struct btrfs_inode *inode, loff_t pos,
>> -                          size_t *write_bytes)
>> +                          size_t *write_bytes, bool nowait)
>>  {
>>         struct btrfs_fs_info *fs_info = inode->root->fs_info;
>>         struct btrfs_root *root = inode->root;
>> @@ -1500,16 +1500,21 @@ int btrfs_check_nocow_lock(struct btrfs_inode *inode, loff_t pos,
>>                            fs_info->sectorsize) - 1;
>>         num_bytes = lockend - lockstart + 1;
>>
>> -       btrfs_lock_and_flush_ordered_range(inode, lockstart, lockend, NULL);
>> +       if (nowait) {
>> +               if (!btrfs_try_lock_ordered_range(inode, lockstart, lockend)) {
>> +                       btrfs_drew_write_unlock(&root->snapshot_lock);
>> +                       return -EAGAIN;
>> +               }
>> +       } else {
>> +               btrfs_lock_and_flush_ordered_range(inode, lockstart, lockend, NULL);
>> +       }
>>         ret = can_nocow_extent(&inode->vfs_inode, lockstart, &num_bytes,
>> -                       NULL, NULL, NULL, false, false);
>> -       if (ret <= 0) {
>> -               ret = 0;
>> +                       NULL, NULL, NULL, nowait, false);
>> +       if (ret <= 0)
>>                 btrfs_drew_write_unlock(&root->snapshot_lock);
>> -       } else {
>> +       else
>>                 *write_bytes = min_t(size_t, *write_bytes ,
>>                                      num_bytes - pos + lockstart);
>> -       }
>>         unlock_extent(&inode->io_tree, lockstart, lockend);
>>
>>         return ret;
>> @@ -1666,16 +1671,22 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>>                                                   &data_reserved, pos,
>>                                                   write_bytes, false);
>>                 if (ret < 0) {
>> +                       int tmp;
>> +
>>                         /*
>>                          * If we don't have to COW at the offset, reserve
>>                          * metadata only. write_bytes may get smaller than
>>                          * requested here.
>>                          */
>> -                       if (btrfs_check_nocow_lock(BTRFS_I(inode), pos,
>> -                                                  &write_bytes) > 0)
>> -                               only_release_metadata = true;
>> -                       else
>> +                       tmp = btrfs_check_nocow_lock(BTRFS_I(inode), pos,
>> +                                                    &write_bytes, false);
>> +                       if (tmp < 0)
>> +                               ret = tmp;
>> +                       if (tmp > 0)
>> +                               ret = 0;
>> +                       if (ret)
> 
> A variable named tmp is not a great name, something like "can_nocow"
> would be a lot clearer.
> 

I renamed the variable tmp to can_nocow.

> Thanks.
> 
> 
>>                                 break;
>> +                       only_release_metadata = true;
>>                 }
>>
>>                 num_pages = DIV_ROUND_UP(write_bytes + offset, PAGE_SIZE);
>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>> index 36e755f73764..5426d4f4ac23 100644
>> --- a/fs/btrfs/inode.c
>> +++ b/fs/btrfs/inode.c
>> @@ -4884,7 +4884,7 @@ int btrfs_truncate_block(struct btrfs_inode *inode, loff_t from, loff_t len,
>>         ret = btrfs_check_data_free_space(inode, &data_reserved, block_start,
>>                                           blocksize, false);
>>         if (ret < 0) {
>> -               if (btrfs_check_nocow_lock(inode, block_start, &write_bytes) > 0) {
>> +               if (btrfs_check_nocow_lock(inode, block_start, &write_bytes, false) > 0) {
>>                         /* For nocow case, no need to reserve data space */
>>                         only_release_metadata = true;
>>                 } else {
>> --
>> 2.30.2
>>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 07/12] btrfs: make prepare_pages nowait compatible
  2022-09-08 10:17   ` Filipe Manana
@ 2022-09-08 18:33     ` Stefan Roesch
  0 siblings, 0 replies; 33+ messages in thread
From: Stefan Roesch @ 2022-09-08 18:33 UTC (permalink / raw)
  To: Filipe Manana; +Cc: kernel-team, io-uring, linux-btrfs, axboe, josef



On 9/8/22 3:17 AM, Filipe Manana wrote:
> On Thu, Sep 8, 2022 at 1:26 AM Stefan Roesch <[email protected]> wrote:
>>
>> Add nowait parameter to the prepare_pages function. In case nowait is
>> specified for an async buffered write request, do a nowait allocation or
>> return -EAGAIN.
>>
>> Signed-off-by: Stefan Roesch <[email protected]>
>> ---
>>  fs/btrfs/file.c | 43 ++++++++++++++++++++++++++++++++++++-------
>>  1 file changed, 36 insertions(+), 7 deletions(-)
>>
>> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
>> index cf19d381ead6..a154a3cec44b 100644
>> --- a/fs/btrfs/file.c
>> +++ b/fs/btrfs/file.c
>> @@ -1339,26 +1339,55 @@ static int prepare_uptodate_page(struct inode *inode,
>>         return 0;
>>  }
>>
>> +static int get_prepare_fgp_flags(bool nowait)
>> +{
>> +       int fgp_flags;
>> +
>> +       fgp_flags = FGP_LOCK|FGP_ACCESSED|FGP_CREAT;
> 
> Please follow the existing code style and add a space before and after
> each bitwise or operator.
> Not only does it conform to the btrfs style, it's also easier to read.
> 
> The assignment could also be done when declaring the variable, since
> it's short and simple.
> 
> Thanks.

I added the spaces and moved the assignment to the declaration.
> 
>> +       if (nowait)
>> +               fgp_flags |= FGP_NOWAIT;
>> +
>> +       return fgp_flags;
>> +}
>> +
>> +static gfp_t get_prepare_gfp_flags(struct inode *inode, bool nowait)
>> +{
>> +       gfp_t gfp;
>> +
>> +       gfp = btrfs_alloc_write_mask(inode->i_mapping);
>> +       if (nowait) {
>> +               gfp &= ~__GFP_DIRECT_RECLAIM;
>> +               gfp |= GFP_NOWAIT;
>> +       }
>> +
>> +       return gfp;
>> +}
>> +
>>  /*
>>   * this just gets pages into the page cache and locks them down.
>>   */
>>  static noinline int prepare_pages(struct inode *inode, struct page **pages,
>>                                   size_t num_pages, loff_t pos,
>> -                                 size_t write_bytes, bool force_uptodate)
>> +                                 size_t write_bytes, bool force_uptodate,
>> +                                 bool nowait)
>>  {
>>         int i;
>>         unsigned long index = pos >> PAGE_SHIFT;
>> -       gfp_t mask = btrfs_alloc_write_mask(inode->i_mapping);
>> +       gfp_t mask = get_prepare_gfp_flags(inode, nowait);
>> +       int fgp_flags = get_prepare_fgp_flags(nowait);
>>         int err = 0;
>>         int faili;
>>
>>         for (i = 0; i < num_pages; i++) {
>>  again:
>> -               pages[i] = find_or_create_page(inode->i_mapping, index + i,
>> -                                              mask | __GFP_WRITE);
>> +               pages[i] = pagecache_get_page(inode->i_mapping, index + i,
>> +                                       fgp_flags, mask | __GFP_WRITE);
>>                 if (!pages[i]) {
>>                         faili = i - 1;
>> -                       err = -ENOMEM;
>> +                       if (nowait)
>> +                               err = -EAGAIN;
>> +                       else
>> +                               err = -ENOMEM;
>>                         goto fail;
>>                 }
>>
>> @@ -1376,7 +1405,7 @@ static noinline int prepare_pages(struct inode *inode, struct page **pages,
>>                                                     pos + write_bytes, false);
>>                 if (err) {
>>                         put_page(pages[i]);
>> -                       if (err == -EAGAIN) {
>> +                       if (!nowait && err == -EAGAIN) {
>>                                 err = 0;
>>                                 goto again;
>>                         }
>> @@ -1716,7 +1745,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>>                  */
>>                 ret = prepare_pages(inode, pages, num_pages,
>>                                     pos, write_bytes,
>> -                                   force_page_uptodate);
>> +                                   force_page_uptodate, false);
>>                 if (ret) {
>>                         btrfs_delalloc_release_extents(BTRFS_I(inode),
>>                                                        reserve_bytes);
>> --
>> 2.30.2
>>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 08/12] btrfs: make lock_and_cleanup_extent_if_need nowait compatible
  2022-09-08 10:17   ` Filipe Manana
@ 2022-09-08 18:38     ` Stefan Roesch
  0 siblings, 0 replies; 33+ messages in thread
From: Stefan Roesch @ 2022-09-08 18:38 UTC (permalink / raw)
  To: Filipe Manana; +Cc: kernel-team, io-uring, linux-btrfs, axboe, josef



On 9/8/22 3:17 AM, Filipe Manana wrote:
> On Thu, Sep 8, 2022 at 1:26 AM Stefan Roesch <[email protected]> wrote:
>>
>> This adds the nowait parameter to lock_and_cleanup_extent_if_need(). If
>> the nowait parameter is specified we try to lock the extent in nowait
>> mode.
>>
>> Signed-off-by: Stefan Roesch <[email protected]>
>> ---
>>  fs/btrfs/file.c | 18 +++++++++++++++---
>>  1 file changed, 15 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
>> index a154a3cec44b..4e1745e585cb 100644
>> --- a/fs/btrfs/file.c
>> +++ b/fs/btrfs/file.c
>> @@ -1440,7 +1440,7 @@ static noinline int
>>  lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages,
>>                                 size_t num_pages, loff_t pos,
>>                                 size_t write_bytes,
>> -                               u64 *lockstart, u64 *lockend,
>> +                               u64 *lockstart, u64 *lockend, bool nowait,
>>                                 struct extent_state **cached_state)
>>  {
>>         struct btrfs_fs_info *fs_info = inode->root->fs_info;
>> @@ -1455,8 +1455,20 @@ lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages,
>>         if (start_pos < inode->vfs_inode.i_size) {
>>                 struct btrfs_ordered_extent *ordered;
>>
>> -               lock_extent_bits(&inode->io_tree, start_pos, last_pos,
>> +               if (nowait) {
>> +                       if (!try_lock_extent(&inode->io_tree, start_pos, last_pos)) {
>> +                               for (i = 0; i < num_pages; i++) {
>> +                                       unlock_page(pages[i]);
>> +                                       put_page(pages[i]);
> 
> Since this is a non-local array, I'd prefer if we also set pages[i] to NULL.
> That may help prevent hard-to-debug bugs in the future.
> 
> Thanks.
> 
> 

I set pages[i] to NULL in the next version of the patch series.

>> +                               }
>> +
>> +                               return -EAGAIN;
>> +                       }
>> +               } else {
>> +                       lock_extent_bits(&inode->io_tree, start_pos, last_pos,
>>                                 cached_state);
>> +               }
>> +
>>                 ordered = btrfs_lookup_ordered_range(inode, start_pos,
>>                                                      last_pos - start_pos + 1);
>>                 if (ordered &&
>> @@ -1755,7 +1767,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>>                 extents_locked = lock_and_cleanup_extent_if_need(
>>                                 BTRFS_I(inode), pages,
>>                                 num_pages, pos, write_bytes, &lockstart,
>> -                               &lockend, &cached_state);
>> +                               &lockend, false, &cached_state);
>>                 if (extents_locked < 0) {
>>                         if (extents_locked == -EAGAIN)
>>                                 goto again;
>> --
>> 2.30.2
>>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 09/12] btrfs: btrfs: plumb NOWAIT through the write path
  2022-09-08 10:16   ` Filipe Manana
@ 2022-09-08 18:44     ` Stefan Roesch
  0 siblings, 0 replies; 33+ messages in thread
From: Stefan Roesch @ 2022-09-08 18:44 UTC (permalink / raw)
  To: Filipe Manana; +Cc: kernel-team, io-uring, linux-btrfs, axboe, josef



On 9/8/22 3:16 AM, Filipe Manana wrote:
> On Thu, Sep 8, 2022 at 1:26 AM Stefan Roesch <[email protected]> wrote:
>>
>> Everything is now set up for nowait, so plumb NOWAIT through the write path.
> 
> Note, there's a double "btrfs: " prefix in the subject line.
> 

Fixed, thanks for catching it.

>>
>> Signed-off-by: Josef Bacik <[email protected]>
>> Signed-off-by: Stefan Roesch <[email protected]>
>> ---
>>  fs/btrfs/file.c | 19 +++++++++++++------
>>  1 file changed, 13 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
>> index 4e1745e585cb..6e191e353b22 100644
>> --- a/fs/btrfs/file.c
>> +++ b/fs/btrfs/file.c
>> @@ -1653,8 +1653,9 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>>         bool force_page_uptodate = false;
>>         loff_t old_isize = i_size_read(inode);
>>         unsigned int ilock_flags = 0;
>> +       bool nowait = iocb->ki_flags & IOCB_NOWAIT;
> 
> Can be made const.
> 

In the next version of the patch series it will be const.

> Thanks.
> 
>>
>> -       if (iocb->ki_flags & IOCB_NOWAIT)
>> +       if (nowait)
>>                 ilock_flags |= BTRFS_ILOCK_TRY;
>>
>>         ret = btrfs_inode_lock(inode, ilock_flags);
>> @@ -1710,17 +1711,22 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>>                 extent_changeset_release(data_reserved);
>>                 ret = btrfs_check_data_free_space(BTRFS_I(inode),
>>                                                   &data_reserved, pos,
>> -                                                 write_bytes, false);
>> +                                                 write_bytes, nowait);
>>                 if (ret < 0) {
>>                         int tmp;
>>
>> +                       if (nowait && (ret == -ENOSPC || ret == -EAGAIN)) {
>> +                               ret = -EAGAIN;
>> +                               break;
>> +                       }
>> +
>>                         /*
>>                          * If we don't have to COW at the offset, reserve
>>                          * metadata only. write_bytes may get smaller than
>>                          * requested here.
>>                          */
>>                         tmp = btrfs_check_nocow_lock(BTRFS_I(inode), pos,
>> -                                                    &write_bytes, false);
>> +                                                    &write_bytes, nowait);
>>                         if (tmp < 0)
>>                                 ret = tmp;
>>                         if (tmp > 0)
>> @@ -1737,7 +1743,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>>                 WARN_ON(reserve_bytes == 0);
>>                 ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode),
>>                                                       reserve_bytes,
>> -                                                     reserve_bytes, false);
>> +                                                     reserve_bytes, nowait);
>>                 if (ret) {
>>                         if (!only_release_metadata)
>>                                 btrfs_free_reserved_data_space(BTRFS_I(inode),
>> @@ -1767,10 +1773,11 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>>                 extents_locked = lock_and_cleanup_extent_if_need(
>>                                 BTRFS_I(inode), pages,
>>                                 num_pages, pos, write_bytes, &lockstart,
>> -                               &lockend, false, &cached_state);
>> +                               &lockend, nowait, &cached_state);
>>                 if (extents_locked < 0) {
>> -                       if (extents_locked == -EAGAIN)
>> +                       if (!nowait && extents_locked == -EAGAIN)
>>                                 goto again;
>> +
>>                         btrfs_delalloc_release_extents(BTRFS_I(inode),
>>                                                        reserve_bytes);
>>                         ret = extents_locked;
>> --
>> 2.30.2
>>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 10/12] btrfs: make balance_dirty_pages nowait compatible
  2022-09-08 10:16   ` Filipe Manana
@ 2022-09-08 18:48     ` Stefan Roesch
  0 siblings, 0 replies; 33+ messages in thread
From: Stefan Roesch @ 2022-09-08 18:48 UTC (permalink / raw)
  To: Filipe Manana; +Cc: kernel-team, io-uring, linux-btrfs, axboe, josef



On 9/8/22 3:16 AM, Filipe Manana wrote:
> On Thu, Sep 8, 2022 at 1:26 AM Stefan Roesch <[email protected]> wrote:
>>
>> This replaces the call to function balance_dirty_pages_ratelimited() in
>> the function btrfs_buffered_write() with a call to
>> balance_dirty_pages_ratelimited_flags().
>>
>> It also moves the call to after the again label. This can cause the
>> function to be called a bit later, but this should have no impact in the
>> real world.
>>
>> Signed-off-by: Stefan Roesch <[email protected]>
>> ---
>>  fs/btrfs/file.c | 7 +++++--
>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
>> index 6e191e353b22..fd42ba9de7a7 100644
>> --- a/fs/btrfs/file.c
>> +++ b/fs/btrfs/file.c
>> @@ -1654,6 +1654,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>>         loff_t old_isize = i_size_read(inode);
>>         unsigned int ilock_flags = 0;
>>         bool nowait = iocb->ki_flags & IOCB_NOWAIT;
>> +       unsigned int bdp_flags = nowait ? BDP_ASYNC : 0;
>>
>>         if (nowait)
>>                 ilock_flags |= BTRFS_ILOCK_TRY;
>> @@ -1756,6 +1757,10 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>>
>>                 release_bytes = reserve_bytes;
>>  again:
>> +               ret = balance_dirty_pages_ratelimited_flags(inode->i_mapping, bdp_flags);
>> +               if (unlikely(ret))
> 
> We normally only use likely or unlikely in contexts where we observe
> that it makes a significant difference.
> What's the motivation here, have you verified that in this case it has
> a significant impact?
> 

I removed it.

> Thanks.
> 
>> +                       break;
>> +
>>                 /*
>>                  * This is going to setup the pages array with the number of
>>                  * pages we want, so we don't really need to worry about the
>> @@ -1860,8 +1865,6 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>>
>>                 cond_resched();
>>
>> -               balance_dirty_pages_ratelimited(inode->i_mapping);
>> -
>>                 pos += copied;
>>                 num_written += copied;
>>         }
>> --
>> 2.30.2
>>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 11/12] btrfs: add assert to search functions
  2022-09-08 10:15   ` Filipe Manana
@ 2022-09-08 19:10     ` Stefan Roesch
  0 siblings, 0 replies; 33+ messages in thread
From: Stefan Roesch @ 2022-09-08 19:10 UTC (permalink / raw)
  To: Filipe Manana; +Cc: kernel-team, io-uring, linux-btrfs, axboe, josef



On 9/8/22 3:15 AM, Filipe Manana wrote:
> On Thu, Sep 8, 2022 at 1:26 AM Stefan Roesch <[email protected]> wrote:
>>
>> This adds warnings to search functions, which should not have the nowait
>> flag set when called.
> 
> This could be clearer, e.g. by saying these are btree search functions
> which are not used in the buffered IO and direct IO paths, the only
> users of nowait btree searches.
> 
> Also, the subject ("btrfs: add assert to search functions") mentions
> assert, but the code adds warnings, which are not the same.
> It could also be clearer, like: "btrfs: assert nowait mode is not
> used for some btree search functions"
> 

I rephrased the commit message.

> 
>>
>> Signed-off-by: Stefan Roesch <[email protected]>
>> ---
>>  fs/btrfs/ctree.c | 9 +++++++++
>>  1 file changed, 9 insertions(+)
>>
>> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
>> index 71b238364939..9caf0f87cbcb 100644
>> --- a/fs/btrfs/ctree.c
>> +++ b/fs/btrfs/ctree.c
>> @@ -2165,6 +2165,9 @@ int btrfs_search_old_slot(struct btrfs_root *root, const struct btrfs_key *key,
>>         lowest_level = p->lowest_level;
>>         WARN_ON(p->nodes[0] != NULL);
>>
>> +       if (WARN_ON_ONCE(p->nowait == 1))
> 
> This doesn't follow the existing code style, which is to treat path
> members as booleans, and just do:
> 
> WARN_ON_ONCE(p->nowait)
> 
> I.e., no explicit " == 1"
> 
> As this is a developer thing, I would use ASSERT() instead.
> 
> For release builds that typically have CONFIG_BTRFS_ASSERT not set
> (like Ubuntu and Debian), it would still allow the search to continue,
> which is fine from a functional perspective, since not respecting
> nowait semantics is just a performance thing.
> 

The next version of the patch series will use ASSERT.

> Thanks.
> 
> 
>> +               return -EINVAL;
>> +
>>         if (p->search_commit_root) {
>>                 BUG_ON(time_seq);
>>                 return btrfs_search_slot(NULL, root, key, p, 0, 0);
>> @@ -4465,6 +4468,9 @@ int btrfs_search_forward(struct btrfs_root *root, struct btrfs_key *min_key,
>>         int ret = 1;
>>         int keep_locks = path->keep_locks;
>>
>> +       if (WARN_ON_ONCE(path->nowait == 1))
>> +               return -EINVAL;
>> +
>>         path->keep_locks = 1;
>>  again:
>>         cur = btrfs_read_lock_root_node(root);
>> @@ -4645,6 +4651,9 @@ int btrfs_next_old_leaf(struct btrfs_root *root, struct btrfs_path *path,
>>         int ret;
>>         int i;
>>
>> +       if (WARN_ON_ONCE(path->nowait == 1))
>> +               return -EINVAL;
>> +
>>         nritems = btrfs_header_nritems(path->nodes[0]);
>>         if (nritems == 0)
>>                 return 1;
>> --
>> 2.30.2
>>

^ permalink raw reply	[flat|nested] 33+ messages in thread
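
For context, WARN_ON_ONCE() always evaluates its condition and returns it,
so the v2 hunks both warn and fail the call with -EINVAL on every kernel.
The btrfs ASSERT() macro compiles away when CONFIG_BTRFS_ASSERT is not
set, so on release builds the search would simply continue. A side-by-side
sketch of the two variants being compared:

  /* v2: warns once and fails the search on all kernels. */
  if (WARN_ON_ONCE(path->nowait))
          return -EINVAL;

  /*
   * Suggested alternative: a no-op on release builds, where letting
   * the search continue is functionally fine, since nowait is only a
   * performance hint for these functions.
   */
  ASSERT(!path->nowait);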

* Re: [PATCH v2 12/12] btrfs: enable nowait async buffered writes
  2022-09-08 10:14   ` Filipe Manana
@ 2022-09-08 19:14     ` Stefan Roesch
  0 siblings, 0 replies; 33+ messages in thread
From: Stefan Roesch @ 2022-09-08 19:14 UTC (permalink / raw)
  To: fdmanana; +Cc: kernel-team, io-uring, linux-btrfs, axboe, josef



On 9/8/22 3:14 AM, Filipe Manana wrote:
> On Thu, Sep 8, 2022 at 1:29 AM Stefan Roesch <[email protected]> wrote:
>>
>> Enable nowait async buffered writes in btrfs_do_write_iter() and
>> btrfs_file_open().
> 
> This is too terse, see below.
> 
>>
>> Signed-off-by: Stefan Roesch <[email protected]>
>> ---
>>  fs/btrfs/file.c | 8 ++++----
>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
>> index fd42ba9de7a7..887497fd524f 100644
>> --- a/fs/btrfs/file.c
>> +++ b/fs/btrfs/file.c
>> @@ -2107,13 +2107,13 @@ ssize_t btrfs_do_write_iter(struct kiocb *iocb, struct iov_iter *from,
>>         if (BTRFS_FS_ERROR(inode->root->fs_info))
>>                 return -EROFS;
>>
>> -       if ((iocb->ki_flags & IOCB_NOWAIT) && !(iocb->ki_flags & IOCB_DIRECT))
>> -               return -EOPNOTSUPP;
>> -
>>         if (sync)
>>                 atomic_inc(&inode->sync_writers);
>>
>>         if (encoded) {
>> +               if (iocb->ki_flags & IOCB_NOWAIT)
>> +                       return -EOPNOTSUPP;
> 
> The changelog should provide some rationale for why encoded writes
> are not supported.
> 
> Thanks.

I added an explanation why encoded writes are not yet supported.

> 
>> +
>>                 num_written = btrfs_encoded_write(iocb, from, encoded);
>>                 num_sync = encoded->len;
>>         } else if (iocb->ki_flags & IOCB_DIRECT) {
>> @@ -3755,7 +3755,7 @@ static int btrfs_file_open(struct inode *inode, struct file *filp)
>>  {
>>         int ret;
>>
>> -       filp->f_mode |= FMODE_NOWAIT | FMODE_BUF_RASYNC;
>> +       filp->f_mode |= FMODE_NOWAIT | FMODE_BUF_RASYNC | FMODE_BUF_WASYNC;
>>
>>         ret = fsverity_file_open(inode, filp);
>>         if (ret)
>> --
>> 2.30.2
>>
> 
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread
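
With FMODE_BUF_WASYNC set, io-uring can attempt a buffered write inline
with IOCB_NOWAIT and only punt to the io workers when the fast path would
block. A userspace sketch of what the series enables, assuming a liburing
version that provides io_uring_prep_write() (error handling trimmed):

  #include <fcntl.h>
  #include <liburing.h>
  #include <string.h>
  #include <unistd.h>

  int buffered_write_example(const char *path)
  {
          struct io_uring ring;
          struct io_uring_sqe *sqe;
          struct io_uring_cqe *cqe;
          char buf[4096];
          int fd, ret;

          memset(buf, 'a', sizeof(buf));
          /* No O_DIRECT: this is a plain buffered write. */
          fd = open(path, O_WRONLY | O_CREAT, 0644);
          if (fd < 0)
                  return -1;

          io_uring_queue_init(8, &ring, 0);
          sqe = io_uring_get_sqe(&ring);
          io_uring_prep_write(sqe, fd, buf, sizeof(buf), 0);
          io_uring_submit(&ring);

          /*
           * With this series applied, the write usually completes inline
           * instead of being handed to an io-wq worker.
           */
          io_uring_wait_cqe(&ring, &cqe);
          ret = cqe->res;
          io_uring_cqe_seen(&ring, cqe);
          io_uring_queue_exit(&ring);
          close(fd);
          return ret;
  }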

end of thread, other threads:[~2022-09-08 19:15 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-08  0:26 [PATCH v2 00/12] io-uring/btrfs: support async buffered writes Stefan Roesch
2022-09-08  0:26 ` [PATCH v2 01/12] mm: export balance_dirty_pages_ratelimited_flags() Stefan Roesch
2022-09-08 10:18   ` Filipe Manana
2022-09-08  0:26 ` [PATCH v2 02/12] btrfs: implement a nowait option for tree searches Stefan Roesch
2022-09-08 13:19   ` Josef Bacik
2022-09-08 18:05     ` Stefan Roesch
2022-09-08  0:26 ` [PATCH v2 03/12] btrfs: make can_nocow_extent nowait compatible Stefan Roesch
2022-09-08  0:26 ` [PATCH v2 04/12] btrfs: add the ability to use NO_FLUSH for data reservations Stefan Roesch
2022-09-08  0:26 ` [PATCH v2 05/12] btrfs: add btrfs_try_lock_ordered_range Stefan Roesch
2022-09-08 10:18   ` Filipe Manana
2022-09-08 18:12     ` Stefan Roesch
2022-09-08  0:26 ` [PATCH v2 06/12] btrfs: make btrfs_check_nocow_lock nowait compatible Stefan Roesch
2022-09-08 10:18   ` Filipe Manana
2022-09-08 18:23     ` Stefan Roesch
2022-09-08  0:26 ` [PATCH v2 07/12] btrfs: make prepare_pages " Stefan Roesch
2022-09-08 10:17   ` Filipe Manana
2022-09-08 18:33     ` Stefan Roesch
2022-09-08  0:26 ` [PATCH v2 08/12] btrfs: make lock_and_cleanup_extent_if_need " Stefan Roesch
2022-09-08 10:17   ` Filipe Manana
2022-09-08 18:38     ` Stefan Roesch
2022-09-08  0:26 ` [PATCH v2 09/12] btrfs: btrfs: plumb NOWAIT through the write path Stefan Roesch
2022-09-08 10:16   ` Filipe Manana
2022-09-08 18:44     ` Stefan Roesch
2022-09-08  0:26 ` [PATCH v2 10/12] btrfs: make balance_dirty_pages nowait compatible Stefan Roesch
2022-09-08 10:16   ` Filipe Manana
2022-09-08 18:48     ` Stefan Roesch
2022-09-08  0:26 ` [PATCH v2 11/12] btrfs: add assert to search functions Stefan Roesch
2022-09-08 10:15   ` Filipe Manana
2022-09-08 19:10     ` Stefan Roesch
2022-09-08  0:26 ` [PATCH v2 12/12] btrfs: enable nowait async buffered writes Stefan Roesch
2022-09-08 10:14   ` Filipe Manana
2022-09-08 19:14     ` Stefan Roesch
2022-09-08 10:14 ` [PATCH v2 00/12] io-uring/btrfs: support " Filipe Manana

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox