* [PATCH RFC v5 00/29] io_uring getdents
@ 2023-08-25 13:54 Hao Xu
  2023-08-25 13:54 ` [PATCH 01/29] fs: split off vfs_getdents function of getdents64 syscall Hao Xu
                   ` (30 more replies)
  0 siblings, 31 replies; 39+ messages in thread
From: Hao Xu @ 2023-08-25 13:54 UTC (permalink / raw)
  To: io-uring, Jens Axboe
  Cc: Dominique Martinet, Pavel Begunkov, Christian Brauner,
	Alexander Viro, Stefan Roesch, Clay Harris, Dave Chinner,
	Darrick J . Wong, linux-fsdevel, linux-xfs, linux-ext4,
	linux-cachefs, ecryptfs, linux-nfs, linux-unionfs, bpf, netdev,
	linux-s390, linux-kernel, linux-block, linux-btrfs, codalist,
	linux-f2fs-devel, cluster-devel, linux-mm, linux-nilfs, devel,
	linux-cifs, samba-technical, linux-mtd, Wanpeng Li

From: Hao Xu <[email protected]>

This series introduces getdents64 to io_uring. The code logic is similar
to that of the synchronous version: it first tries a nowait issue, and
offloads the request to io-wq threads if the first try fails.
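
As a rough illustration of the issue path described above, here is a
minimal userspace sketch, not the kernel code. The names issue() and
fake_getdents(), and the thread pool standing in for io-wq, are
hypothetical and exist only for this example.

```python
import errno
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the io-wq worker pool.
_workers = ThreadPoolExecutor(max_workers=2)


def issue(op, *args):
    """Try op in nowait mode first; on -EAGAIN, rerun it blocking on a worker.

    op(*args, nowait=bool) returns a non-negative result or a negative errno.
    """
    ret = op(*args, nowait=True)
    if ret != -errno.EAGAIN:
        return ret  # completed (or failed for real) inline
    # The fast path would have blocked: hand the request to a worker thread.
    return _workers.submit(op, *args, nowait=False).result()


def fake_getdents(cached, nowait):
    """Toy operation: it only succeeds without blocking if data is cached."""
    if nowait and not cached:
        return -errno.EAGAIN
    return 128  # pretend 128 bytes of dirents were written


print(issue(fake_getdents, True))   # served inline
print(issue(fake_getdents, False))  # served by a worker after -EAGAIN
```

The point of the split is that the submitter never blocks: only the
worker thread is allowed to take the slow path.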

Patch1 and Patch2 are preparation.
Patch3 supports nowait for the xfs getdents code.
Patch4-11 are vfs changes, including adding helpers and trylock for locks.
Patch12-29 support nowait for the involved xfs journal code.
Note: Patch24 and Patch27 are really two open questions and might be
removed later. An xfs test may come later.

Tests I've done:
A liburing test case for functional testing:
https://github.com/HowHsu/liburing/commit/39dc9a8e19c06a8cebf8c2301b85320eb45c061e?diff=unified

xfstests:
    test/generic: 1 fails and 171 not run
    test/xfs: 72 fails and 156 not run
Running the same tests without this patchset gives the same result.
I'll try to set up the environment properly so that more tests can run.


Tested it with a liburing performance test:
https://github.com/HowHsu/liburing/blob/getdents/test/getdents2.c

The test is controlled by the script below [2], which runs getdents2.t 100
times and calculates the average.
The results show that the io_uring version is about 2.6% faster.

Notes:
[1] The number of getdents calls/requests in the io_uring and normal sync
versions is made sure to be the same beforehand.

[2] run_getdents.py

```python3

import subprocess

N = 100  # number of runs per mode
TEST_BIN = "./liburing/test/getdents2.t"
TEST_DIR = "/data/home/howeyxu/tmpdir"


def average(mode):
    """Run the test N times in the given mode; return the mean of its output."""
    total = 0.0  # avoid shadowing the built-in sum()
    for _ in range(N):
        output = subprocess.check_output([TEST_BIN, TEST_DIR, mode])
        total += float(output)
    return total / N


print("Average of sync:", average("sync"))
print("Average of iouring:", average("iouring"))

```

v4->v5:
 - move atime update to the beginning of getdents operation
 - trylock for i_rwsem
 - nowait semantics for involved xfs journal stuff

v3->v4:
 - add Dave's xfs nowait code and fix a deadlock problem, with some code
   style tweaks.
 - disable fixed file to avoid a race problem for now
 - add a test program.

v2->v3:
 - removed the kernfs patches
 - add f_pos_lock logic
 - remove the "reduce last EOF getdents try" optimization since
   Dominique reports that it makes no difference
 - remove the rewind logic; I think the right way is to introduce lseek
   to io_uring rather than to patch this logic into getdents.
 - add Signed-off-by of Stefan Roesch for patch 1 since checkpatch
   complained that a Co-developed-by tag should be accompanied by a
   Signed-off-by from the same person. I can remove them if Stefan thinks
   that's not proper.


Dominique Martinet (1):
  fs: split off vfs_getdents function of getdents64 syscall

Hao Xu (28):
  xfs: rename XBF_TRYLOCK to XBF_NOWAIT
  xfs: add NOWAIT semantics for readdir
  vfs: add nowait flag for struct dir_context
  vfs: add a vfs helper for io_uring file pos lock
  vfs: add file_pos_unlock() for io_uring usage
  vfs: add a nowait parameter for touch_atime()
  vfs: add nowait parameter for file_accessed()
  vfs: move file_accessed() to the beginning of iterate_dir()
  vfs: add S_NOWAIT for nowait time update
  vfs: trylock inode->i_rwsem in iterate_dir() to support nowait
  xfs: enforce GFP_NOIO implicitly during nowait time update
  xfs: make xfs_trans_alloc() support nowait semantics
  xfs: support nowait for xfs_log_reserve()
  xfs: don't wait for free space in xlog_grant_head_check() in nowait
    case
  xfs: add nowait parameter for xfs_inode_item_init()
  xfs: make xfs_trans_ijoin() error out -EAGAIN
  xfs: set XBF_NOWAIT for xfs_buf_read_map if necessary
  xfs: support nowait memory allocation in _xfs_buf_alloc()
  xfs: distinguish error type of memory allocation failure for nowait
    case
  xfs: return -EAGAIN when bulk memory allocation fails in nowait case
  xfs: comment page allocation for nowait case in xfs_buf_find_insert()
  xfs: don't print warn info for -EAGAIN error in  xfs_buf_get_map()
  xfs: support nowait for xfs_buf_read_map()
  xfs: support nowait for xfs_buf_item_init()
  xfs: return -EAGAIN when nowait meets sync in transaction commit
  xfs: add a comment for xlog_kvmalloc()
  xfs: support nowait semantics for xc_ctx_lock in xlog_cil_commit()
  io_uring: add support for getdents

 arch/s390/hypfs/inode.c         |  2 +-
 block/fops.c                    |  2 +-
 fs/btrfs/file.c                 |  2 +-
 fs/btrfs/inode.c                |  2 +-
 fs/cachefiles/namei.c           |  2 +-
 fs/coda/dir.c                   |  4 +--
 fs/ecryptfs/file.c              |  4 +--
 fs/ext2/file.c                  |  4 +--
 fs/ext4/file.c                  |  6 ++--
 fs/f2fs/file.c                  |  4 +--
 fs/file.c                       | 13 +++++++
 fs/fuse/dax.c                   |  2 +-
 fs/fuse/file.c                  |  4 +--
 fs/gfs2/file.c                  |  2 +-
 fs/hugetlbfs/inode.c            |  2 +-
 fs/inode.c                      | 10 +++---
 fs/internal.h                   |  8 +++++
 fs/namei.c                      |  4 +--
 fs/nfsd/vfs.c                   |  2 +-
 fs/nilfs2/file.c                |  2 +-
 fs/orangefs/file.c              |  2 +-
 fs/orangefs/inode.c             |  2 +-
 fs/overlayfs/file.c             |  2 +-
 fs/overlayfs/inode.c            |  2 +-
 fs/pipe.c                       |  2 +-
 fs/ramfs/file-nommu.c           |  2 +-
 fs/readdir.c                    | 61 +++++++++++++++++++++++++--------
 fs/smb/client/cifsfs.c          |  2 +-
 fs/splice.c                     |  2 +-
 fs/stat.c                       |  2 +-
 fs/ubifs/file.c                 |  2 +-
 fs/udf/file.c                   |  2 +-
 fs/xfs/libxfs/xfs_alloc.c       |  2 +-
 fs/xfs/libxfs/xfs_attr_remote.c |  2 +-
 fs/xfs/libxfs/xfs_btree.c       |  2 +-
 fs/xfs/libxfs/xfs_da_btree.c    | 16 +++++++++
 fs/xfs/libxfs/xfs_da_btree.h    |  1 +
 fs/xfs/libxfs/xfs_dir2_block.c  |  7 ++--
 fs/xfs/libxfs/xfs_dir2_priv.h   |  2 +-
 fs/xfs/libxfs/xfs_shared.h      |  2 ++
 fs/xfs/libxfs/xfs_trans_inode.c | 12 +++++--
 fs/xfs/scrub/dir.c              |  2 +-
 fs/xfs/scrub/readdir.c          |  2 +-
 fs/xfs/scrub/repair.c           |  2 +-
 fs/xfs/xfs_buf.c                | 43 +++++++++++++++++------
 fs/xfs/xfs_buf.h                |  4 +--
 fs/xfs/xfs_buf_item.c           |  9 +++--
 fs/xfs/xfs_buf_item.h           |  2 +-
 fs/xfs/xfs_buf_item_recover.c   |  2 +-
 fs/xfs/xfs_dir2_readdir.c       | 49 ++++++++++++++++++++------
 fs/xfs/xfs_dquot.c              |  2 +-
 fs/xfs/xfs_file.c               |  6 ++--
 fs/xfs/xfs_inode.c              | 27 +++++++++++++++
 fs/xfs/xfs_inode.h              | 17 +++++----
 fs/xfs/xfs_inode_item.c         | 12 ++++---
 fs/xfs/xfs_inode_item.h         |  3 +-
 fs/xfs/xfs_iops.c               | 31 ++++++++++++++---
 fs/xfs/xfs_log.c                | 33 ++++++++++++------
 fs/xfs/xfs_log.h                |  5 +--
 fs/xfs/xfs_log_cil.c            | 17 +++++++--
 fs/xfs/xfs_log_priv.h           |  4 +--
 fs/xfs/xfs_trans.c              | 44 ++++++++++++++++++++----
 fs/xfs/xfs_trans.h              |  2 +-
 fs/xfs/xfs_trans_buf.c          | 18 ++++++++--
 fs/zonefs/file.c                |  4 +--
 include/linux/file.h            |  7 ++++
 include/linux/fs.h              | 16 +++++++--
 include/uapi/linux/io_uring.h   |  1 +
 io_uring/fs.c                   | 53 ++++++++++++++++++++++++++++
 io_uring/fs.h                   |  3 ++
 io_uring/opdef.c                |  8 +++++
 kernel/bpf/inode.c              |  4 +--
 mm/filemap.c                    |  8 ++---
 mm/shmem.c                      |  6 ++--
 net/unix/af_unix.c              |  4 +--
 75 files changed, 499 insertions(+), 161 deletions(-)

-- 
2.25.1



* [PATCH 01/29] fs: split off vfs_getdents function of getdents64 syscall
  2023-08-25 13:54 [PATCH RFC v5 00/29] io_uring getdents Hao Xu
@ 2023-08-25 13:54 ` Hao Xu
  2023-08-25 13:54 ` [PATCH 02/29] xfs: rename XBF_TRYLOCK to XBF_NOWAIT Hao Xu
                   ` (29 subsequent siblings)
  30 siblings, 0 replies; 39+ messages in thread
From: Hao Xu @ 2023-08-25 13:54 UTC (permalink / raw)
  To: io-uring, Jens Axboe

From: Dominique Martinet <[email protected]>

This splits off the vfs_getdents function from the getdents64 system
call.
This will allow io_uring to call the vfs_getdents function.
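
The shape of this refactoring can be sketched in userspace as follows.
This is an illustrative model only, not the kernel code: fd_table and
the list-based "file" are hypothetical stand-ins, and only the split
between an fd-resolving wrapper and a file-based core mirrors the patch.

```python
import errno

# Hypothetical fd table: fd number -> open "file" (here a list of names).
fd_table = {3: ["a", "b", "c"]}


def vfs_getdents(file, count):
    """Core helper: works on an already-resolved file, no fd lookup."""
    return file[:count]


def sys_getdents64(fd, count):
    """Syscall-style wrapper: resolve the fd, then delegate to the core."""
    file = fd_table.get(fd)
    if file is None:
        return -errno.EBADF
    return vfs_getdents(file, count)


print(sys_getdents64(3, 2))          # goes through the fd lookup
print(vfs_getdents(fd_table[3], 2))  # a caller that already holds the file
```

A caller that already holds a file reference, as io_uring does, can use
the core helper directly and skip the fd lookup.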

Co-developed-by: Stefan Roesch <[email protected]>
Signed-off-by: Stefan Roesch <[email protected]>
Signed-off-by: Dominique Martinet <[email protected]>
Signed-off-by: Hao Xu <[email protected]>
---
 fs/internal.h |  8 ++++++++
 fs/readdir.c  | 34 ++++++++++++++++++++++++++--------
 2 files changed, 34 insertions(+), 8 deletions(-)

diff --git a/fs/internal.h b/fs/internal.h
index f7a3dc111026..b1f66e52d61b 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -304,3 +304,11 @@ ssize_t __kernel_write_iter(struct file *file, struct iov_iter *from, loff_t *po
 struct mnt_idmap *alloc_mnt_idmap(struct user_namespace *mnt_userns);
 struct mnt_idmap *mnt_idmap_get(struct mnt_idmap *idmap);
 void mnt_idmap_put(struct mnt_idmap *idmap);
+
+/*
+ * fs/readdir.c
+ */
+struct linux_dirent64;
+
+int vfs_getdents(struct file *file, struct linux_dirent64 __user *dirent,
+		 unsigned int count);
diff --git a/fs/readdir.c b/fs/readdir.c
index b264ce60114d..9592259b7e7f 100644
--- a/fs/readdir.c
+++ b/fs/readdir.c
@@ -21,6 +21,7 @@
 #include <linux/unistd.h>
 #include <linux/compat.h>
 #include <linux/uaccess.h>
+#include "internal.h"
 
 #include <asm/unaligned.h>
 
@@ -351,10 +352,16 @@ static bool filldir64(struct dir_context *ctx, const char *name, int namlen,
 	return false;
 }
 
-SYSCALL_DEFINE3(getdents64, unsigned int, fd,
-		struct linux_dirent64 __user *, dirent, unsigned int, count)
+
+/**
+ * vfs_getdents - getdents without fdget
+ * @file    : pointer to file struct of directory
+ * @dirent  : pointer to user directory structure
+ * @count   : size of buffer
+ */
+int vfs_getdents(struct file *file, struct linux_dirent64 __user *dirent,
+		 unsigned int count)
 {
-	struct fd f;
 	struct getdents_callback64 buf = {
 		.ctx.actor = filldir64,
 		.count = count,
@@ -362,11 +369,7 @@ SYSCALL_DEFINE3(getdents64, unsigned int, fd,
 	};
 	int error;
 
-	f = fdget_pos(fd);
-	if (!f.file)
-		return -EBADF;
-
-	error = iterate_dir(f.file, &buf.ctx);
+	error = iterate_dir(file, &buf.ctx);
 	if (error >= 0)
 		error = buf.error;
 	if (buf.prev_reclen) {
@@ -379,6 +382,21 @@ SYSCALL_DEFINE3(getdents64, unsigned int, fd,
 		else
 			error = count - buf.count;
 	}
+	return error;
+}
+
+SYSCALL_DEFINE3(getdents64, unsigned int, fd,
+		struct linux_dirent64 __user *, dirent, unsigned int, count)
+{
+	struct fd f;
+	int error;
+
+	f = fdget_pos(fd);
+	if (!f.file)
+		return -EBADF;
+
+	error = vfs_getdents(f.file, dirent, count);
+
 	fdput_pos(f);
 	return error;
 }
-- 
2.25.1



* [PATCH 02/29] xfs: rename XBF_TRYLOCK to XBF_NOWAIT
  2023-08-25 13:54 [PATCH RFC v5 00/29] io_uring getdents Hao Xu
  2023-08-25 13:54 ` [PATCH 01/29] fs: split off vfs_getdents function of getdents64 syscall Hao Xu
@ 2023-08-25 13:54 ` Hao Xu
  2023-08-25 21:39   ` Dave Chinner
  2023-08-25 13:54 ` [PATCH 03/29] xfs: add NOWAIT semantics for readdir Hao Xu
                   ` (28 subsequent siblings)
  30 siblings, 1 reply; 39+ messages in thread
From: Hao Xu @ 2023-08-25 13:54 UTC (permalink / raw)
  To: io-uring, Jens Axboe

From: Hao Xu <[email protected]>

XBF_TRYLOCK means we need a lock but must not block on it. The same flag
can also stand for not waiting for memory allocation, so rename
XBF_TRYLOCK to XBF_NOWAIT, which is more generic.
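
For readers less familiar with the trylock idiom, the behaviour the flag
selects in xfs_buf_find_lock() can be modelled in userspace like this.
It is a minimal sketch: the bit positions are the ones in xfs_buf.h in
this patch, while buf_find_lock() and the threading.Lock are
hypothetical stand-ins for the kernel function and the buffer lock.

```python
import errno
import threading

# Bit positions as defined in fs/xfs/xfs_buf.h in this patch.
XBF_INCORE = 1 << 29
XBF_NOWAIT = 1 << 30
XBF_UNMAPPED = 1 << 31


def buf_find_lock(lock, flags):
    """Take the lock; with XBF_NOWAIT, fail with -EAGAIN instead of waiting."""
    if flags & XBF_NOWAIT:
        if not lock.acquire(blocking=False):
            return -errno.EAGAIN
    else:
        lock.acquire()
    return 0


lock = threading.Lock()
print(buf_find_lock(lock, XBF_NOWAIT))  # lock was free: acquired
print(buf_find_lock(lock, XBF_NOWAIT))  # lock is held: -EAGAIN, no waiting
```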

Signed-off-by: Hao Xu <[email protected]>
---
 fs/xfs/libxfs/xfs_alloc.c       | 2 +-
 fs/xfs/libxfs/xfs_attr_remote.c | 2 +-
 fs/xfs/libxfs/xfs_btree.c       | 2 +-
 fs/xfs/scrub/repair.c           | 2 +-
 fs/xfs/xfs_buf.c                | 6 +++---
 fs/xfs/xfs_buf.h                | 4 ++--
 fs/xfs/xfs_dquot.c              | 2 +-
 7 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 3069194527dd..a75b9298faa8 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -3183,7 +3183,7 @@ xfs_alloc_read_agf(
 	ASSERT((flags & (XFS_ALLOC_FLAG_FREEING | XFS_ALLOC_FLAG_TRYLOCK)) !=
 			(XFS_ALLOC_FLAG_FREEING | XFS_ALLOC_FLAG_TRYLOCK));
 	error = xfs_read_agf(pag, tp,
-			(flags & XFS_ALLOC_FLAG_TRYLOCK) ? XBF_TRYLOCK : 0,
+			(flags & XFS_ALLOC_FLAG_TRYLOCK) ? XBF_NOWAIT : 0,
 			&agfbp);
 	if (error)
 		return error;
diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
index d440393b40eb..2ccb0867824c 100644
--- a/fs/xfs/libxfs/xfs_attr_remote.c
+++ b/fs/xfs/libxfs/xfs_attr_remote.c
@@ -661,7 +661,7 @@ xfs_attr_rmtval_invalidate(
 			return error;
 		if (XFS_IS_CORRUPT(args->dp->i_mount, nmap != 1))
 			return -EFSCORRUPTED;
-		error = xfs_attr_rmtval_stale(args->dp, &map, XBF_TRYLOCK);
+		error = xfs_attr_rmtval_stale(args->dp, &map, XBF_NOWAIT);
 		if (error)
 			return error;
 
diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index 6a6503ab0cd7..77c4f1d83475 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -1343,7 +1343,7 @@ xfs_btree_read_buf_block(
 	int			error;
 
 	/* need to sort out how callers deal with failures first */
-	ASSERT(!(flags & XBF_TRYLOCK));
+	ASSERT(!(flags & XBF_NOWAIT));
 
 	error = xfs_btree_ptr_to_daddr(cur, ptr, &d);
 	if (error)
diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
index ac6d8803e660..9312cf3b20e2 100644
--- a/fs/xfs/scrub/repair.c
+++ b/fs/xfs/scrub/repair.c
@@ -460,7 +460,7 @@ xrep_invalidate_block(
 
 	error = xfs_buf_incore(sc->mp->m_ddev_targp,
 			XFS_FSB_TO_DADDR(sc->mp, fsbno),
-			XFS_FSB_TO_BB(sc->mp, 1), XBF_TRYLOCK, &bp);
+			XFS_FSB_TO_BB(sc->mp, 1), XBF_NOWAIT, &bp);
 	if (error)
 		return 0;
 
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 15d1e5a7c2d3..9f84bc3b802c 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -228,7 +228,7 @@ _xfs_buf_alloc(
 	 * We don't want certain flags to appear in b_flags unless they are
 	 * specifically set by later operations on the buffer.
 	 */
-	flags &= ~(XBF_UNMAPPED | XBF_TRYLOCK | XBF_ASYNC | XBF_READ_AHEAD);
+	flags &= ~(XBF_UNMAPPED | XBF_NOWAIT | XBF_ASYNC | XBF_READ_AHEAD);
 
 	atomic_set(&bp->b_hold, 1);
 	atomic_set(&bp->b_lru_ref, 1);
@@ -543,7 +543,7 @@ xfs_buf_find_lock(
 	struct xfs_buf          *bp,
 	xfs_buf_flags_t		flags)
 {
-	if (flags & XBF_TRYLOCK) {
+	if (flags & XBF_NOWAIT) {
 		if (!xfs_buf_trylock(bp)) {
 			XFS_STATS_INC(bp->b_mount, xb_busy_locked);
 			return -EAGAIN;
@@ -886,7 +886,7 @@ xfs_buf_readahead_map(
 	struct xfs_buf		*bp;
 
 	xfs_buf_read_map(target, map, nmaps,
-		     XBF_TRYLOCK | XBF_ASYNC | XBF_READ_AHEAD, &bp, ops,
+		     XBF_NOWAIT | XBF_ASYNC | XBF_READ_AHEAD, &bp, ops,
 		     __this_address);
 }
 
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index 549c60942208..8cd307626939 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -45,7 +45,7 @@ struct xfs_buf;
 
 /* flags used only as arguments to access routines */
 #define XBF_INCORE	 (1u << 29)/* lookup only, return if found in cache */
-#define XBF_TRYLOCK	 (1u << 30)/* lock requested, but do not wait */
+#define XBF_NOWAIT	 (1u << 30)/* mem/lock requested, but do not wait */
 #define XBF_UNMAPPED	 (1u << 31)/* do not map the buffer */
 
 
@@ -68,7 +68,7 @@ typedef unsigned int xfs_buf_flags_t;
 	{ _XBF_DELWRI_Q,	"DELWRI_Q" }, \
 	/* The following interface flags should never be set */ \
 	{ XBF_INCORE,		"INCORE" }, \
-	{ XBF_TRYLOCK,		"TRYLOCK" }, \
+	{ XBF_NOWAIT,		"NOWAIT" }, \
 	{ XBF_UNMAPPED,		"UNMAPPED" }
 
 /*
diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index 7f071757f278..5bc01ed4b2d7 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -1233,7 +1233,7 @@ xfs_qm_dqflush(
 	 * Get the buffer containing the on-disk dquot
 	 */
 	error = xfs_trans_read_buf(mp, NULL, mp->m_ddev_targp, dqp->q_blkno,
-				   mp->m_quotainfo->qi_dqchunklen, XBF_TRYLOCK,
+				   mp->m_quotainfo->qi_dqchunklen, XBF_NOWAIT,
 				   &bp, &xfs_dquot_buf_ops);
 	if (error == -EAGAIN)
 		goto out_unlock;
-- 
2.25.1



* [PATCH 03/29] xfs: add NOWAIT semantics for readdir
  2023-08-25 13:54 [PATCH RFC v5 00/29] io_uring getdents Hao Xu
  2023-08-25 13:54 ` [PATCH 01/29] fs: split off vfs_getdents function of getdents64 syscall Hao Xu
  2023-08-25 13:54 ` [PATCH 02/29] xfs: rename XBF_TRYLOCK to XBF_NOWAIT Hao Xu
@ 2023-08-25 13:54 ` Hao Xu
  2023-08-25 13:54 ` [PATCH 04/29] vfs: add nowait flag for struct dir_context Hao Xu
                   ` (27 subsequent siblings)
  30 siblings, 0 replies; 39+ messages in thread
From: Hao Xu @ 2023-08-25 13:54 UTC (permalink / raw)
  To: io-uring, Jens Axboe

From: Hao Xu <[email protected]>

Implement NOWAIT semantics for readdir. Return an EAGAIN error to the
caller if the operation would block, e.g. when it fails to get a lock
or would have to do IO.
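
One detail worth spelling out is how partial progress is handled in the
leaf path: if some entries were already emitted before the operation
would block, the -EAGAIN is dropped and the partial result is returned.
The following userspace sketch models that rule only. read_dir_blocks()
and the cached set are hypothetical and simplify the real buffer cache.

```python
import errno


def read_dir_blocks(blocks, cached, nowait):
    """Return (error, entries) for a toy directory made of blocks.

    In nowait mode a block that is not in `cached` would need IO, so the
    walk stops there with -EAGAIN.
    """
    entries = []
    error = 0
    for index, block in enumerate(blocks):
        if nowait and index not in cached:
            error = -errno.EAGAIN
            break
        entries.extend(block)
    # Partial progress wins: report what was read instead of -EAGAIN.
    if error == -errno.EAGAIN and entries:
        error = 0
    return error, entries


blocks = [["a", "b"], ["c"]]
print(read_dir_blocks(blocks, {0}, nowait=True))     # partial result, no error
print(read_dir_blocks(blocks, set(), nowait=True))   # nothing read: -EAGAIN
print(read_dir_blocks(blocks, set(), nowait=False))  # blocking: everything
```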

Co-developed-by: Dave Chinner <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
Signed-off-by: Hao Xu <[email protected]>
[fixes deadlock issue, tweak code style]
---
 fs/xfs/libxfs/xfs_da_btree.c   | 16 +++++++++++
 fs/xfs/libxfs/xfs_da_btree.h   |  1 +
 fs/xfs/libxfs/xfs_dir2_block.c |  7 ++---
 fs/xfs/libxfs/xfs_dir2_priv.h  |  2 +-
 fs/xfs/scrub/dir.c             |  2 +-
 fs/xfs/scrub/readdir.c         |  2 +-
 fs/xfs/xfs_dir2_readdir.c      | 49 ++++++++++++++++++++++++++--------
 fs/xfs/xfs_inode.c             | 27 +++++++++++++++++++
 fs/xfs/xfs_inode.h             | 17 +++++++-----
 9 files changed, 99 insertions(+), 24 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_da_btree.c b/fs/xfs/libxfs/xfs_da_btree.c
index e576560b46e9..2638eb37bc77 100644
--- a/fs/xfs/libxfs/xfs_da_btree.c
+++ b/fs/xfs/libxfs/xfs_da_btree.c
@@ -2643,16 +2643,32 @@ xfs_da_read_buf(
 	struct xfs_buf_map	map, *mapp = &map;
 	int			nmap = 1;
 	int			error;
+	int			buf_flags = 0;
 
 	*bpp = NULL;
 	error = xfs_dabuf_map(dp, bno, flags, whichfork, &mapp, &nmap);
 	if (error || !nmap)
 		goto out_free;
 
+	/*
+	 * NOWAIT semantics mean we don't wait on the buffer lock nor do we
+	 * issue IO for this buffer if it is not already in memory. Caller will
+	 * retry. This will return -EAGAIN if the buffer is in memory and cannot
+	 * be locked, and no buffer and no error if it isn't in memory.  We
+	 * translate both of those into a return state of -EAGAIN and *bpp =
+	 * NULL.
+	 */
+	if (flags & XFS_DABUF_NOWAIT)
+		buf_flags |= XBF_NOWAIT | XBF_INCORE;
 	error = xfs_trans_read_buf_map(mp, tp, mp->m_ddev_targp, mapp, nmap, 0,
 			&bp, ops);
 	if (error)
 		goto out_free;
+	if (!bp) {
+		ASSERT(flags & XFS_DABUF_NOWAIT);
+		error = -EAGAIN;
+		goto out_free;
+	}
 
 	if (whichfork == XFS_ATTR_FORK)
 		xfs_buf_set_ref(bp, XFS_ATTR_BTREE_REF);
diff --git a/fs/xfs/libxfs/xfs_da_btree.h b/fs/xfs/libxfs/xfs_da_btree.h
index ffa3df5b2893..32e7b1cca402 100644
--- a/fs/xfs/libxfs/xfs_da_btree.h
+++ b/fs/xfs/libxfs/xfs_da_btree.h
@@ -205,6 +205,7 @@ int	xfs_da3_node_read_mapped(struct xfs_trans *tp, struct xfs_inode *dp,
  */
 
 #define XFS_DABUF_MAP_HOLE_OK	(1u << 0)
+#define XFS_DABUF_NOWAIT	(1u << 1)
 
 int	xfs_da_grow_inode(xfs_da_args_t *args, xfs_dablk_t *new_blkno);
 int	xfs_da_grow_inode_int(struct xfs_da_args *args, xfs_fileoff_t *bno,
diff --git a/fs/xfs/libxfs/xfs_dir2_block.c b/fs/xfs/libxfs/xfs_dir2_block.c
index 00f960a703b2..59b24a594add 100644
--- a/fs/xfs/libxfs/xfs_dir2_block.c
+++ b/fs/xfs/libxfs/xfs_dir2_block.c
@@ -135,13 +135,14 @@ int
 xfs_dir3_block_read(
 	struct xfs_trans	*tp,
 	struct xfs_inode	*dp,
+	unsigned int		flags,
 	struct xfs_buf		**bpp)
 {
 	struct xfs_mount	*mp = dp->i_mount;
 	xfs_failaddr_t		fa;
 	int			err;
 
-	err = xfs_da_read_buf(tp, dp, mp->m_dir_geo->datablk, 0, bpp,
+	err = xfs_da_read_buf(tp, dp, mp->m_dir_geo->datablk, flags, bpp,
 				XFS_DATA_FORK, &xfs_dir3_block_buf_ops);
 	if (err || !*bpp)
 		return err;
@@ -380,7 +381,7 @@ xfs_dir2_block_addname(
 	tp = args->trans;
 
 	/* Read the (one and only) directory block into bp. */
-	error = xfs_dir3_block_read(tp, dp, &bp);
+	error = xfs_dir3_block_read(tp, dp, 0, &bp);
 	if (error)
 		return error;
 
@@ -695,7 +696,7 @@ xfs_dir2_block_lookup_int(
 	dp = args->dp;
 	tp = args->trans;
 
-	error = xfs_dir3_block_read(tp, dp, &bp);
+	error = xfs_dir3_block_read(tp, dp, 0, &bp);
 	if (error)
 		return error;
 
diff --git a/fs/xfs/libxfs/xfs_dir2_priv.h b/fs/xfs/libxfs/xfs_dir2_priv.h
index 7404a9ff1a92..7d4cf8a0f15b 100644
--- a/fs/xfs/libxfs/xfs_dir2_priv.h
+++ b/fs/xfs/libxfs/xfs_dir2_priv.h
@@ -51,7 +51,7 @@ extern int xfs_dir_cilookup_result(struct xfs_da_args *args,
 
 /* xfs_dir2_block.c */
 extern int xfs_dir3_block_read(struct xfs_trans *tp, struct xfs_inode *dp,
-			       struct xfs_buf **bpp);
+			       unsigned int flags, struct xfs_buf **bpp);
 extern int xfs_dir2_block_addname(struct xfs_da_args *args);
 extern int xfs_dir2_block_lookup(struct xfs_da_args *args);
 extern int xfs_dir2_block_removename(struct xfs_da_args *args);
diff --git a/fs/xfs/scrub/dir.c b/fs/xfs/scrub/dir.c
index 0b491784b759..5cc51f201bd7 100644
--- a/fs/xfs/scrub/dir.c
+++ b/fs/xfs/scrub/dir.c
@@ -313,7 +313,7 @@ xchk_directory_data_bestfree(
 		/* dir block format */
 		if (lblk != XFS_B_TO_FSBT(mp, XFS_DIR2_DATA_OFFSET))
 			xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, lblk);
-		error = xfs_dir3_block_read(sc->tp, sc->ip, &bp);
+		error = xfs_dir3_block_read(sc->tp, sc->ip, 0, &bp);
 	} else {
 		/* dir data format */
 		error = xfs_dir3_data_read(sc->tp, sc->ip, lblk, 0, &bp);
diff --git a/fs/xfs/scrub/readdir.c b/fs/xfs/scrub/readdir.c
index e51c1544be63..f0a727311632 100644
--- a/fs/xfs/scrub/readdir.c
+++ b/fs/xfs/scrub/readdir.c
@@ -101,7 +101,7 @@ xchk_dir_walk_block(
 	unsigned int		off, next_off, end;
 	int			error;
 
-	error = xfs_dir3_block_read(sc->tp, dp, &bp);
+	error = xfs_dir3_block_read(sc->tp, dp, 0, &bp);
 	if (error)
 		return error;
 
diff --git a/fs/xfs/xfs_dir2_readdir.c b/fs/xfs/xfs_dir2_readdir.c
index 9f3ceb461515..dcdbd26e0402 100644
--- a/fs/xfs/xfs_dir2_readdir.c
+++ b/fs/xfs/xfs_dir2_readdir.c
@@ -149,6 +149,7 @@ xfs_dir2_block_getdents(
 	struct xfs_da_geometry	*geo = args->geo;
 	unsigned int		offset, next_offset;
 	unsigned int		end;
+	unsigned int		flags = 0;
 
 	/*
 	 * If the block number in the offset is out of range, we're done.
@@ -156,7 +157,9 @@ xfs_dir2_block_getdents(
 	if (xfs_dir2_dataptr_to_db(geo, ctx->pos) > geo->datablk)
 		return 0;
 
-	error = xfs_dir3_block_read(args->trans, dp, &bp);
+	if (ctx->flags & DIR_CONTEXT_F_NOWAIT)
+		flags |= XFS_DABUF_NOWAIT;
+	error = xfs_dir3_block_read(args->trans, dp, flags, &bp);
 	if (error)
 		return error;
 
@@ -240,6 +243,7 @@ xfs_dir2_block_getdents(
 STATIC int
 xfs_dir2_leaf_readbuf(
 	struct xfs_da_args	*args,
+	struct dir_context	*ctx,
 	size_t			bufsize,
 	xfs_dir2_off_t		*cur_off,
 	xfs_dablk_t		*ra_blk,
@@ -258,10 +262,15 @@ xfs_dir2_leaf_readbuf(
 	struct xfs_iext_cursor	icur;
 	int			ra_want;
 	int			error = 0;
-
-	error = xfs_iread_extents(args->trans, dp, XFS_DATA_FORK);
-	if (error)
-		goto out;
+	unsigned int		flags = 0;
+
+	if (ctx->flags & DIR_CONTEXT_F_NOWAIT) {
+		flags |= XFS_DABUF_NOWAIT;
+	} else {
+		error = xfs_iread_extents(args->trans, dp, XFS_DATA_FORK);
+		if (error)
+			goto out;
+	}
 
 	/*
 	 * Look for mapped directory blocks at or above the current offset.
@@ -280,7 +289,7 @@ xfs_dir2_leaf_readbuf(
 	new_off = xfs_dir2_da_to_byte(geo, map.br_startoff);
 	if (new_off > *cur_off)
 		*cur_off = new_off;
-	error = xfs_dir3_data_read(args->trans, dp, map.br_startoff, 0, &bp);
+	error = xfs_dir3_data_read(args->trans, dp, map.br_startoff, flags, &bp);
 	if (error)
 		goto out;
 
@@ -360,6 +369,7 @@ xfs_dir2_leaf_getdents(
 	int			byteoff;	/* offset in current block */
 	unsigned int		offset = 0;
 	int			error = 0;	/* error return value */
+	int			written = 0;
 
 	/*
 	 * If the offset is at or past the largest allowed value,
@@ -391,10 +401,17 @@ xfs_dir2_leaf_getdents(
 				bp = NULL;
 			}
 
-			if (*lock_mode == 0)
-				*lock_mode = xfs_ilock_data_map_shared(dp);
-			error = xfs_dir2_leaf_readbuf(args, bufsize, &curoff,
-					&rablk, &bp);
+			if (*lock_mode == 0) {
+				*lock_mode =
+					xfs_ilock_data_map_shared_generic(dp,
+					ctx->flags & DIR_CONTEXT_F_NOWAIT);
+				if (!*lock_mode) {
+					error = -EAGAIN;
+					break;
+				}
+			}
+			error = xfs_dir2_leaf_readbuf(args, ctx, bufsize,
+					&curoff, &rablk, &bp);
 			if (error || !bp)
 				break;
 
@@ -479,6 +496,7 @@ xfs_dir2_leaf_getdents(
 		 */
 		offset += length;
 		curoff += length;
+		written += length;
 		/* bufsize may have just been a guess; don't go negative */
 		bufsize = bufsize > length ? bufsize - length : 0;
 	}
@@ -492,6 +510,8 @@ xfs_dir2_leaf_getdents(
 		ctx->pos = xfs_dir2_byte_to_dataptr(curoff) & 0x7fffffff;
 	if (bp)
 		xfs_trans_brelse(args->trans, bp);
+	if (error == -EAGAIN && written > 0)
+		error = 0;
 	return error;
 }
 
@@ -514,6 +534,7 @@ xfs_readdir(
 	unsigned int		lock_mode;
 	bool			isblock;
 	int			error;
+	bool			nowait;
 
 	trace_xfs_readdir(dp);
 
@@ -531,7 +552,11 @@ xfs_readdir(
 	if (dp->i_df.if_format == XFS_DINODE_FMT_LOCAL)
 		return xfs_dir2_sf_getdents(&args, ctx);
 
-	lock_mode = xfs_ilock_data_map_shared(dp);
+	nowait = ctx->flags & DIR_CONTEXT_F_NOWAIT;
+	lock_mode = xfs_ilock_data_map_shared_generic(dp, nowait);
+	if (!lock_mode)
+		return -EAGAIN;
+
 	error = xfs_dir2_isblock(&args, &isblock);
 	if (error)
 		goto out_unlock;
@@ -546,5 +571,7 @@ xfs_readdir(
 out_unlock:
 	if (lock_mode)
 		xfs_iunlock(dp, lock_mode);
+	if (error == -EAGAIN)
+		ASSERT(nowait);
 	return error;
 }
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 9e62cc500140..d088f7d0c23a 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -120,6 +120,33 @@ xfs_ilock_data_map_shared(
 	return lock_mode;
 }
 
+/*
+ * Similar to xfs_ilock_data_map_shared(), except that it will only try to lock
+ * the inode in shared mode if the extents are already in memory. If it fails to
+ * get the lock or has to do IO to read the extent list, fail the operation by
+ * returning 0 as the lock mode.
+ */
+uint
+xfs_ilock_data_map_shared_nowait(
+	struct xfs_inode	*ip)
+{
+	if (xfs_need_iread_extents(&ip->i_df))
+		return 0;
+	if (!xfs_ilock_nowait(ip, XFS_ILOCK_SHARED))
+		return 0;
+	return XFS_ILOCK_SHARED;
+}
+
+int
+xfs_ilock_data_map_shared_generic(
+	struct xfs_inode	*dp,
+	bool			nowait)
+{
+	if (nowait)
+		return xfs_ilock_data_map_shared_nowait(dp);
+	return xfs_ilock_data_map_shared(dp);
+}
+
 uint
 xfs_ilock_attr_map_shared(
 	struct xfs_inode	*ip)
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 7547caf2f2ab..ea206a5a27df 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -490,13 +490,16 @@ int		xfs_rename(struct mnt_idmap *idmap,
 			   struct xfs_name *target_name,
 			   struct xfs_inode *target_ip, unsigned int flags);
 
-void		xfs_ilock(xfs_inode_t *, uint);
-int		xfs_ilock_nowait(xfs_inode_t *, uint);
-void		xfs_iunlock(xfs_inode_t *, uint);
-void		xfs_ilock_demote(xfs_inode_t *, uint);
-bool		xfs_isilocked(struct xfs_inode *, uint);
-uint		xfs_ilock_data_map_shared(struct xfs_inode *);
-uint		xfs_ilock_attr_map_shared(struct xfs_inode *);
+void		xfs_ilock(struct xfs_inode *ip, uint lockmode);
+int		xfs_ilock_nowait(struct xfs_inode *ip, uint lockmode);
+void		xfs_iunlock(struct xfs_inode *ip, uint lockmode);
+void		xfs_ilock_demote(struct xfs_inode *ip, uint lockmode);
+bool		xfs_isilocked(struct xfs_inode *ip, uint lockmode);
+uint		xfs_ilock_data_map_shared(struct xfs_inode *ip);
+uint		xfs_ilock_data_map_shared_nowait(struct xfs_inode *ip);
+int		xfs_ilock_data_map_shared_generic(struct xfs_inode *ip,
+						  bool nowait);
+uint		xfs_ilock_attr_map_shared(struct xfs_inode *ip);
 
 uint		xfs_ip2xflags(struct xfs_inode *);
 int		xfs_ifree(struct xfs_trans *, struct xfs_inode *);
-- 
2.25.1



* [PATCH 04/29] vfs: add nowait flag for struct dir_context
  2023-08-25 13:54 [PATCH RFC v5 00/29] io_uring getdents Hao Xu
                   ` (2 preceding siblings ...)
  2023-08-25 13:54 ` [PATCH 03/29] xfs: add NOWAIT semantics for readdir Hao Xu
@ 2023-08-25 13:54 ` Hao Xu
  2023-08-25 13:54 ` [PATCH 05/29] vfs: add a vfs helper for io_uring file pos lock Hao Xu
                   ` (26 subsequent siblings)
  30 siblings, 0 replies; 39+ messages in thread
From: Hao Xu @ 2023-08-25 13:54 UTC (permalink / raw)
  To: io-uring, Jens Axboe

From: Hao Xu <[email protected]>

The flags field will allow passing DIR_CONTEXT_F_NOWAIT to iterate()
implementations that support it (as signaled through FMODE_NOWAIT
in file->f_mode).

Notes:
- considered using IOCB_NOWAIT, but if we add more flags later it
would be confusing to keep track of which values are valid, so use
dedicated flags
- might want to check in iterate_dir() that ctx.flags &
DIR_CONTEXT_F_NOWAIT is only set when file->f_mode & FMODE_NOWAIT,
e.g. with a WARN_ONCE?
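
The check floated in the second note could look roughly like the
following userspace model. It is a sketch under stated assumptions:
DIR_CONTEXT_F_NOWAIT is the value from this patch, but the FMODE_NOWAIT
value, DirContext, and nowait_allowed() are illustrative names and
numbers, and warnings.warn() merely stands in for WARN_ONCE.

```python
import warnings
from dataclasses import dataclass

DIR_CONTEXT_F_NOWAIT = 1 << 0  # value from this patch
FMODE_NOWAIT = 1 << 27         # illustrative value for this sketch only


@dataclass
class DirContext:
    """Toy model of struct dir_context with the new flags field."""
    pos: int = 0
    flags: int = 0


def nowait_allowed(f_mode, ctx):
    """Return False (and warn) if NOWAIT is requested without FMODE_NOWAIT."""
    wants_nowait = bool(ctx.flags & DIR_CONTEXT_F_NOWAIT)
    supports_nowait = bool(f_mode & FMODE_NOWAIT)
    if wants_nowait and not supports_nowait:
        warnings.warn("DIR_CONTEXT_F_NOWAIT set without FMODE_NOWAIT")
        return False
    return True


ctx = DirContext(flags=DIR_CONTEXT_F_NOWAIT)
print(nowait_allowed(FMODE_NOWAIT, ctx))  # file supports nowait
print(nowait_allowed(0, ctx))             # file does not: warn and refuse
```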

Co-developed-by: Dominique Martinet <[email protected]>
Signed-off-by: Dominique Martinet <[email protected]>
Signed-off-by: Hao Xu <[email protected]>
---
 fs/internal.h      | 2 +-
 fs/readdir.c       | 6 ++++--
 include/linux/fs.h | 8 ++++++++
 3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/fs/internal.h b/fs/internal.h
index b1f66e52d61b..7508d485c655 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -311,4 +311,4 @@ void mnt_idmap_put(struct mnt_idmap *idmap);
 struct linux_dirent64;
 
 int vfs_getdents(struct file *file, struct linux_dirent64 __user *dirent,
-		 unsigned int count);
+		 unsigned int count, unsigned long flags);
diff --git a/fs/readdir.c b/fs/readdir.c
index 9592259b7e7f..b80caf4c9321 100644
--- a/fs/readdir.c
+++ b/fs/readdir.c
@@ -358,12 +358,14 @@ static bool filldir64(struct dir_context *ctx, const char *name, int namlen,
  * @file    : pointer to file struct of directory
  * @dirent  : pointer to user directory structure
  * @count   : size of buffer
+ * @flags   : additional dir_context flags
  */
 int vfs_getdents(struct file *file, struct linux_dirent64 __user *dirent,
-		 unsigned int count)
+		 unsigned int count, unsigned long flags)
 {
 	struct getdents_callback64 buf = {
 		.ctx.actor = filldir64,
+		.ctx.flags = flags,
 		.count = count,
 		.current_dir = dirent
 	};
@@ -395,7 +397,7 @@ SYSCALL_DEFINE3(getdents64, unsigned int, fd,
 	if (!f.file)
 		return -EBADF;
 
-	error = vfs_getdents(f.file, dirent, count);
+	error = vfs_getdents(f.file, dirent, count, 0);
 
 	fdput_pos(f);
 	return error;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 6867512907d6..f3e315e8efdd 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1719,8 +1719,16 @@ typedef bool (*filldir_t)(struct dir_context *, const char *, int, loff_t, u64,
 struct dir_context {
 	filldir_t actor;
 	loff_t pos;
+	unsigned long flags;
 };
 
+/*
+ * Flags for dir_context.flags
+ * DIR_CONTEXT_F_NOWAIT: Request non-blocking iterate
+ *                       (requires file->f_mode & FMODE_NOWAIT)
+ */
+#define DIR_CONTEXT_F_NOWAIT	(1 << 0)
+
 /*
  * These flags let !MMU mmap() govern direct device mapping vs immediate
  * copying more easily for MAP_PRIVATE, especially for ROM filesystems.
-- 
2.25.1


* [PATCH 05/29] vfs: add a vfs helper for io_uring file pos lock
From: Hao Xu @ 2023-08-25 13:54 UTC
  To: io-uring, Jens Axboe
  Cc: Dominique Martinet, Pavel Begunkov, Christian Brauner,
	Alexander Viro, Stefan Roesch, Clay Harris, Dave Chinner,
	Darrick J . Wong, linux-fsdevel, linux-xfs, linux-ext4,
	linux-cachefs, ecryptfs, linux-nfs, linux-unionfs, bpf, netdev,
	linux-s390, linux-kernel, linux-block, linux-btrfs, codalist,
	linux-f2fs-devel, cluster-devel, linux-mm, linux-nilfs, devel,
	linux-cifs, samba-technical, linux-mtd, Wanpeng Li

From: Hao Xu <[email protected]>

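Add a vfs helper file_pos_lock_nowait() for io_uring usage. The function
has conditional nowait logic: if nowait is requested, it returns -EAGAIN
when the trylock on f_pos_lock fails. It returns 0 if the file does not
need f_pos locking (FMODE_ATOMIC_POS is not set) and 1 if the lock was
taken.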

Signed-off-by: Hao Xu <[email protected]>
---
 fs/file.c            | 13 +++++++++++++
 include/linux/file.h |  2 ++
 2 files changed, 15 insertions(+)
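
The three possible return values matter to the caller, because the lock
must only be released when it was actually taken. A minimal userspace
sketch of that contract is shown here. It is not the kernel
implementation: the model_* names are hypothetical, and a C11
atomic_flag spin lock stands in for the f_pos_lock mutex.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct file. */
struct model_file {
	bool atomic_pos;       /* models f_mode & FMODE_ATOMIC_POS */
	atomic_flag pos_lock;  /* models f_pos_lock (spin lock here) */
};

/*
 * Mirrors the three outcomes of file_pos_lock_nowait():
 *   0        no position lock is needed, nothing was taken
 *   1        the lock was taken and must be released by the caller
 *   -EAGAIN  nowait was requested and the lock is contended
 */
static int model_pos_lock_nowait(struct model_file *f, bool nowait)
{
	assert(f != NULL);

	if (!f->atomic_pos)
		return 0;

	if (!nowait) {
		/* Blocking path: wait until the lock is released. */
		while (atomic_flag_test_and_set(&f->pos_lock))
			;
	} else if (atomic_flag_test_and_set(&f->pos_lock)) {
		return -EAGAIN;
	}

	return 1;
}

/* Release only when model_pos_lock_nowait() returned 1. */
static void model_pos_unlock(struct model_file *f, int lock_ret)
{
	assert(f != NULL);

	if (lock_ret == 1)
		atomic_flag_clear(&f->pos_lock);
}
```

A caller would keep the return value of the lock call around and pass
it to the unlock side, so that files which never took the lock are left
alone.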

diff --git a/fs/file.c b/fs/file.c
index 35c62b54c9d6..8e5c38f5db52 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -1053,6 +1053,19 @@ void __f_unlock_pos(struct file *f)
 	mutex_unlock(&f->f_pos_lock);
 }
 
+int file_pos_lock_nowait(struct file *file, bool nowait)
+{
+	if (!(file->f_mode & FMODE_ATOMIC_POS))
+		return 0;
+
+	if (!nowait)
+		mutex_lock(&file->f_pos_lock);
+	else if (!mutex_trylock(&file->f_pos_lock))
+		return -EAGAIN;
+
+	return 1;
+}
+
 /*
  * We only lock f_pos if we have threads or if the file might be
  * shared with another process. In both cases we'll have an elevated
diff --git a/include/linux/file.h b/include/linux/file.h
index 6e9099d29343..bcc6ba0aec50 100644
--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -81,6 +81,8 @@ static inline void fdput_pos(struct fd f)
 	fdput(f);
 }
 
+extern int file_pos_lock_nowait(struct file *file, bool nowait);
+
 DEFINE_CLASS(fd, struct fd, fdput(_T), fdget(fd), int fd)
 
 extern int f_dupfd(unsigned int from, struct file *file, unsigned flags);
-- 
2.25.1


* [PATCH 06/29] vfs: add file_pos_unlock() for io_uring usage
From: Hao Xu @ 2023-08-25 13:54 UTC
  To: io-uring, Jens Axboe
  Cc: Dominique Martinet, Pavel Begunkov, Christian Brauner,
	Alexander Viro, Stefan Roesch, Clay Harris, Dave Chinner,
	Darrick J . Wong, linux-fsdevel, linux-xfs, linux-ext4,
	linux-cachefs, ecryptfs, linux-nfs, linux-unionfs, bpf, netdev,
	linux-s390, linux-kernel, linux-block, linux-btrfs, codalist,
	linux-f2fs-devel, cluster-devel, linux-mm, linux-nilfs, devel,
	linux-cifs, samba-technical, linux-mtd, Wanpeng Li

From: Hao Xu <[email protected]>

Add a helper that unlocks f_pos_lock unconditionally. It is needed
because io_uring does not handle f_pos_lock through a struct fd, so
FDPUT_POS_UNLOCK isn't used.

Signed-off-by: Hao Xu <[email protected]>
---
 include/linux/file.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/linux/file.h b/include/linux/file.h
index bcc6ba0aec50..a179f4794341 100644
--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -81,6 +81,11 @@ static inline void fdput_pos(struct fd f)
 	fdput(f);
 }
 
+static inline void file_pos_unlock(struct file *file)
+{
+	__f_unlock_pos(file);
+}
+
 extern int file_pos_lock_nowait(struct file *file, bool nowait);
 
 DEFINE_CLASS(fd, struct fd, fdput(_T), fdget(fd), int fd)
-- 
2.25.1


* [PATCH 07/29] vfs: add a nowait parameter for touch_atime()
From: Hao Xu @ 2023-08-25 13:54 UTC
  To: io-uring, Jens Axboe
  Cc: Dominique Martinet, Pavel Begunkov, Christian Brauner,
	Alexander Viro, Stefan Roesch, Clay Harris, Dave Chinner,
	Darrick J . Wong, linux-fsdevel, linux-xfs, linux-ext4,
	linux-cachefs, ecryptfs, linux-nfs, linux-unionfs, bpf, netdev,
	linux-s390, linux-kernel, linux-block, linux-btrfs, codalist,
	linux-f2fs-devel, cluster-devel, linux-mm, linux-nilfs, devel,
	linux-cifs, samba-technical, linux-mtd, Wanpeng Li

From: Hao Xu <[email protected]>

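Add a boolean nowait parameter to touch_atime() to support nowait
semantics, and make it return int so that a failure can be reported.
It is true only when io_uring is the initial caller; all callers
converted here pass false, and the function still always returns 0.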

Signed-off-by: Hao Xu <[email protected]>
---
 fs/cachefiles/namei.c | 2 +-
 fs/ecryptfs/file.c    | 4 ++--
 fs/inode.c            | 7 ++++---
 fs/namei.c            | 4 ++--
 fs/nfsd/vfs.c         | 2 +-
 fs/overlayfs/file.c   | 2 +-
 fs/overlayfs/inode.c  | 2 +-
 fs/stat.c             | 2 +-
 include/linux/fs.h    | 4 ++--
 kernel/bpf/inode.c    | 4 ++--
 net/unix/af_unix.c    | 4 ++--
 11 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index d9d22d0ec38a..7a21bf0e36b8 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -591,7 +591,7 @@ static bool cachefiles_open_file(struct cachefiles_object *object,
 	 * used to keep track of culling, and atimes are only updated by read,
 	 * write and readdir but not lookup or open).
 	 */
-	touch_atime(&file->f_path);
+	touch_atime(&file->f_path, false);
 	dput(dentry);
 	return true;
 
diff --git a/fs/ecryptfs/file.c b/fs/ecryptfs/file.c
index ce0a3c5ed0ca..3db7006cc440 100644
--- a/fs/ecryptfs/file.c
+++ b/fs/ecryptfs/file.c
@@ -39,7 +39,7 @@ static ssize_t ecryptfs_read_update_atime(struct kiocb *iocb,
 	rc = generic_file_read_iter(iocb, to);
 	if (rc >= 0) {
 		path = ecryptfs_dentry_to_lower_path(file->f_path.dentry);
-		touch_atime(path);
+		touch_atime(path, false);
 	}
 	return rc;
 }
@@ -64,7 +64,7 @@ static ssize_t ecryptfs_splice_read_update_atime(struct file *in, loff_t *ppos,
 	rc = filemap_splice_read(in, ppos, pipe, len, flags);
 	if (rc >= 0) {
 		path = ecryptfs_dentry_to_lower_path(in->f_path.dentry);
-		touch_atime(path);
+		touch_atime(path, false);
 	}
 	return rc;
 }
diff --git a/fs/inode.c b/fs/inode.c
index 8fefb69e1f84..e83b836f2d09 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1961,17 +1961,17 @@ bool atime_needs_update(const struct path *path, struct inode *inode)
 	return true;
 }
 
-void touch_atime(const struct path *path)
+int touch_atime(const struct path *path, bool nowait)
 {
 	struct vfsmount *mnt = path->mnt;
 	struct inode *inode = d_inode(path->dentry);
 	struct timespec64 now;
 
 	if (!atime_needs_update(path, inode))
-		return;
+		return 0;
 
 	if (!sb_start_write_trylock(inode->i_sb))
-		return;
+		return 0;
 
 	if (__mnt_want_write(mnt) != 0)
 		goto skip_update;
@@ -1989,6 +1989,7 @@ void touch_atime(const struct path *path)
 	__mnt_drop_write(mnt);
 skip_update:
 	sb_end_write(inode->i_sb);
+	return 0;
 }
 EXPORT_SYMBOL(touch_atime);
 
diff --git a/fs/namei.c b/fs/namei.c
index e56ff39a79bc..35731d405730 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1776,12 +1776,12 @@ static const char *pick_link(struct nameidata *nd, struct path *link,
 		return ERR_PTR(-ELOOP);
 
 	if (!(nd->flags & LOOKUP_RCU)) {
-		touch_atime(&last->link);
+		touch_atime(&last->link, false);
 		cond_resched();
 	} else if (atime_needs_update(&last->link, inode)) {
 		if (!try_to_unlazy(nd))
 			return ERR_PTR(-ECHILD);
-		touch_atime(&last->link);
+		touch_atime(&last->link, false);
 	}
 
 	error = security_inode_follow_link(link->dentry, inode,
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 8a2321d19194..3179e7b5d209 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1569,7 +1569,7 @@ nfsd_readlink(struct svc_rqst *rqstp, struct svc_fh *fhp, char *buf, int *lenp)
 	if (unlikely(!d_is_symlink(path.dentry)))
 		return nfserr_inval;
 
-	touch_atime(&path);
+	touch_atime(&path, false);
 
 	link = vfs_get_link(path.dentry, &done);
 	if (IS_ERR(link))
diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index 21245b00722a..6ff466ef98ea 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -255,7 +255,7 @@ static void ovl_file_accessed(struct file *file)
 		inode->i_ctime = upperinode->i_ctime;
 	}
 
-	touch_atime(&file->f_path);
+	touch_atime(&file->f_path, false);
 }
 
 static rwf_t ovl_iocb_to_rwf(int ifl)
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index a63e57447be9..66e03025e748 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -703,7 +703,7 @@ int ovl_update_time(struct inode *inode, struct timespec64 *ts, int flags)
 		};
 
 		if (upperpath.dentry) {
-			touch_atime(&upperpath);
+			touch_atime(&upperpath, false);
 			inode->i_atime = d_inode(upperpath.dentry)->i_atime;
 		}
 	}
diff --git a/fs/stat.c b/fs/stat.c
index 7c238da22ef0..713773e61110 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -485,7 +485,7 @@ static int do_readlinkat(int dfd, const char __user *pathname,
 		if (d_is_symlink(path.dentry) || inode->i_op->readlink) {
 			error = security_inode_readlink(path.dentry);
 			if (!error) {
-				touch_atime(&path);
+				touch_atime(&path, false);
 				error = vfs_readlink(path.dentry, buf, bufsiz);
 			}
 		}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index f3e315e8efdd..ba54879089ac 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2201,13 +2201,13 @@ enum file_time_flags {
 };
 
 extern bool atime_needs_update(const struct path *, struct inode *);
-extern void touch_atime(const struct path *);
+extern int touch_atime(const struct path *path, bool nowait);
 int inode_update_time(struct inode *inode, struct timespec64 *time, int flags);
 
 static inline void file_accessed(struct file *file)
 {
 	if (!(file->f_flags & O_NOATIME))
-		touch_atime(&file->f_path);
+		touch_atime(&file->f_path, false);
 }
 
 extern int file_modified(struct file *file);
diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index 4174f76133df..bc020b45d5c8 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -517,7 +517,7 @@ static void *bpf_obj_do_get(int path_fd, const char __user *pathname,
 
 	raw = bpf_any_get(inode->i_private, *type);
 	if (!IS_ERR(raw))
-		touch_atime(&path);
+		touch_atime(&path, false);
 
 	path_put(&path);
 	return raw;
@@ -591,7 +591,7 @@ struct bpf_prog *bpf_prog_get_type_path(const char *name, enum bpf_prog_type typ
 		return ERR_PTR(ret);
 	prog = __get_prog_inode(d_backing_inode(path.dentry), type);
 	if (!IS_ERR(prog))
-		touch_atime(&path);
+		touch_atime(&path, false);
 	path_put(&path);
 	return prog;
 }
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 123b35ddfd71..5868e4e47320 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1084,7 +1084,7 @@ static struct sock *unix_find_bsd(struct sockaddr_un *sunaddr, int addr_len,
 
 	err = -EPROTOTYPE;
 	if (sk->sk_type == type)
-		touch_atime(&path);
+		touch_atime(&path, false);
 	else
 		goto sock_put;
 
@@ -1114,7 +1114,7 @@ static struct sock *unix_find_abstract(struct net *net,
 
 	dentry = unix_sk(sk)->path.dentry;
 	if (dentry)
-		touch_atime(&unix_sk(sk)->path);
+		touch_atime(&unix_sk(sk)->path, false);
 
 	return sk;
 }
-- 
2.25.1


* [PATCH 08/29] vfs: add nowait parameter for file_accessed()
From: Hao Xu @ 2023-08-25 13:54 UTC
  To: io-uring, Jens Axboe
  Cc: Dominique Martinet, Pavel Begunkov, Christian Brauner,
	Alexander Viro, Stefan Roesch, Clay Harris, Dave Chinner,
	Darrick J . Wong, linux-fsdevel, linux-xfs, linux-ext4,
	linux-cachefs, ecryptfs, linux-nfs, linux-unionfs, bpf, netdev,
	linux-s390, linux-kernel, linux-block, linux-btrfs, codalist,
	linux-f2fs-devel, cluster-devel, linux-mm, linux-nilfs, devel,
	linux-cifs, samba-technical, linux-mtd, Wanpeng Li

From: Hao Xu <[email protected]>

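Add a boolean nowait parameter to file_accessed() to support nowait
semantics, and return the result of touch_atime(). Currently it is true
only when io_uring is the initial caller: iterate_dir() passes
ctx->flags & DIR_CONTEXT_F_NOWAIT, every other caller passes false.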

Signed-off-by: Hao Xu <[email protected]>
---
 arch/s390/hypfs/inode.c | 2 +-
 block/fops.c            | 2 +-
 fs/btrfs/file.c         | 2 +-
 fs/btrfs/inode.c        | 2 +-
 fs/coda/dir.c           | 4 ++--
 fs/ext2/file.c          | 4 ++--
 fs/ext4/file.c          | 6 +++---
 fs/f2fs/file.c          | 4 ++--
 fs/fuse/dax.c           | 2 +-
 fs/fuse/file.c          | 4 ++--
 fs/gfs2/file.c          | 2 +-
 fs/hugetlbfs/inode.c    | 2 +-
 fs/nilfs2/file.c        | 2 +-
 fs/orangefs/file.c      | 2 +-
 fs/orangefs/inode.c     | 2 +-
 fs/pipe.c               | 2 +-
 fs/ramfs/file-nommu.c   | 2 +-
 fs/readdir.c            | 2 +-
 fs/smb/client/cifsfs.c  | 2 +-
 fs/splice.c             | 2 +-
 fs/ubifs/file.c         | 2 +-
 fs/udf/file.c           | 2 +-
 fs/xfs/xfs_file.c       | 6 +++---
 fs/zonefs/file.c        | 4 ++--
 include/linux/fs.h      | 5 +++--
 mm/filemap.c            | 8 ++++----
 mm/shmem.c              | 6 +++---
 27 files changed, 43 insertions(+), 42 deletions(-)
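
How the nowait argument and the return value are meant to travel
through file_accessed() can be sketched in userspace as below. This is
an assumption-laden model, not kernel code: the model_* names are
hypothetical, and the would_block field stands in for whatever
condition makes the atime update sleep, which this patch does not
implement yet.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct file. */
struct model_file {
	bool noatime;       /* models f_flags & O_NOATIME */
	bool would_block;   /* assumption: the atime update would sleep */
	int atime_updates;  /* counts completed atime updates */
};

/* Refuses to sleep when nowait is set, otherwise does the update. */
static int model_touch_atime(struct model_file *f, bool nowait)
{
	assert(f != NULL);

	if (nowait && f->would_block)
		return -EAGAIN;
	f->atime_updates++;
	return 0;
}

/* Forwards nowait and hands the result back to the caller. */
static int model_file_accessed(struct model_file *f, bool nowait)
{
	assert(f != NULL);

	if (!f->noatime)
		return model_touch_atime(f, nowait);
	return 0;
}
```

Callers that pass false keep today's behaviour, since the blocking path
never reports -EAGAIN in this model.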

diff --git a/arch/s390/hypfs/inode.c b/arch/s390/hypfs/inode.c
index ee919bfc8186..55f562027c4f 100644
--- a/arch/s390/hypfs/inode.c
+++ b/arch/s390/hypfs/inode.c
@@ -157,7 +157,7 @@ static ssize_t hypfs_read_iter(struct kiocb *iocb, struct iov_iter *to)
 	if (!count)
 		return -EFAULT;
 	iocb->ki_pos = pos + count;
-	file_accessed(file);
+	file_accessed(file, false);
 	return count;
 }
 
diff --git a/block/fops.c b/block/fops.c
index a286bf3325c5..546ecd3c8084 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -601,7 +601,7 @@ static ssize_t blkdev_read_iter(struct kiocb *iocb, struct iov_iter *to)
 		ret = kiocb_write_and_wait(iocb, count);
 		if (ret < 0)
 			goto reexpand;
-		file_accessed(iocb->ki_filp);
+		file_accessed(iocb->ki_filp, false);
 
 		ret = blkdev_direct_IO(iocb, to);
 		if (ret >= 0) {
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index fd03e689a6be..24c0bf3818a6 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2013,7 +2013,7 @@ static int btrfs_file_mmap(struct file	*filp, struct vm_area_struct *vma)
 	if (!mapping->a_ops->read_folio)
 		return -ENOEXEC;
 
-	file_accessed(filp);
+	file_accessed(filp, false);
 	vma->vm_ops = &btrfs_file_vm_ops;
 
 	return 0;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index dbbb67293e34..50e9ae8c388c 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -10153,7 +10153,7 @@ ssize_t btrfs_encoded_read(struct kiocb *iocb, struct iov_iter *iter,
 	struct extent_map *em;
 	bool unlocked = false;
 
-	file_accessed(iocb->ki_filp);
+	file_accessed(iocb->ki_filp, false);
 
 	btrfs_inode_lock(inode, BTRFS_ILOCK_SHARED);
 
diff --git a/fs/coda/dir.c b/fs/coda/dir.c
index 8450b1bd354b..1d94c013ac88 100644
--- a/fs/coda/dir.c
+++ b/fs/coda/dir.c
@@ -436,12 +436,12 @@ static int coda_readdir(struct file *coda_file, struct dir_context *ctx)
 			if (host_file->f_op->iterate_shared) {
 				inode_lock_shared(host_inode);
 				ret = host_file->f_op->iterate_shared(host_file, ctx);
-				file_accessed(host_file);
+				file_accessed(host_file, false);
 				inode_unlock_shared(host_inode);
 			} else {
 				inode_lock(host_inode);
 				ret = host_file->f_op->iterate(host_file, ctx);
-				file_accessed(host_file);
+				file_accessed(host_file, false);
 				inode_unlock(host_inode);
 			}
 		}
diff --git a/fs/ext2/file.c b/fs/ext2/file.c
index 0b4c91c62e1f..dc059cae50a4 100644
--- a/fs/ext2/file.c
+++ b/fs/ext2/file.c
@@ -44,7 +44,7 @@ static ssize_t ext2_dax_read_iter(struct kiocb *iocb, struct iov_iter *to)
 	ret = dax_iomap_rw(iocb, to, &ext2_iomap_ops);
 	inode_unlock_shared(inode);
 
-	file_accessed(iocb->ki_filp);
+	file_accessed(iocb->ki_filp, false);
 	return ret;
 }
 
@@ -127,7 +127,7 @@ static int ext2_file_mmap(struct file *file, struct vm_area_struct *vma)
 	if (!IS_DAX(file_inode(file)))
 		return generic_file_mmap(file, vma);
 
-	file_accessed(file);
+	file_accessed(file, false);
 	vma->vm_ops = &ext2_dax_vm_ops;
 	return 0;
 }
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index c457c8517f0f..2ab790a668a8 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -94,7 +94,7 @@ static ssize_t ext4_dio_read_iter(struct kiocb *iocb, struct iov_iter *to)
 	ret = iomap_dio_rw(iocb, to, &ext4_iomap_ops, NULL, 0, NULL, 0);
 	inode_unlock_shared(inode);
 
-	file_accessed(iocb->ki_filp);
+	file_accessed(iocb->ki_filp, false);
 	return ret;
 }
 
@@ -122,7 +122,7 @@ static ssize_t ext4_dax_read_iter(struct kiocb *iocb, struct iov_iter *to)
 	ret = dax_iomap_rw(iocb, to, &ext4_iomap_ops);
 	inode_unlock_shared(inode);
 
-	file_accessed(iocb->ki_filp);
+	file_accessed(iocb->ki_filp, false);
 	return ret;
 }
 #endif
@@ -820,7 +820,7 @@ static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
 	if (!daxdev_mapping_supported(vma, dax_dev))
 		return -EOPNOTSUPP;
 
-	file_accessed(file);
+	file_accessed(file, false);
 	if (IS_DAX(file_inode(file))) {
 		vma->vm_ops = &ext4_dax_vm_ops;
 		vm_flags_set(vma, VM_HUGEPAGE);
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 093039dee992..246e61d78f92 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -524,7 +524,7 @@ static int f2fs_file_mmap(struct file *file, struct vm_area_struct *vma)
 	if (!f2fs_is_compress_backend_ready(inode))
 		return -EOPNOTSUPP;
 
-	file_accessed(file);
+	file_accessed(file, false);
 	vma->vm_ops = &f2fs_file_vm_ops;
 	set_inode_flag(inode, FI_MMAP_FILE);
 	return 0;
@@ -4380,7 +4380,7 @@ static ssize_t f2fs_dio_read_iter(struct kiocb *iocb, struct iov_iter *to)
 
 	f2fs_up_read(&fi->i_gc_rwsem[READ]);
 
-	file_accessed(file);
+	file_accessed(file, false);
 out:
 	trace_f2fs_direct_IO_exit(inode, pos, count, READ, ret);
 	return ret;
diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
index 8e74f278a3f6..8a43c37195dd 100644
--- a/fs/fuse/dax.c
+++ b/fs/fuse/dax.c
@@ -858,7 +858,7 @@ static const struct vm_operations_struct fuse_dax_vm_ops = {
 
 int fuse_dax_mmap(struct file *file, struct vm_area_struct *vma)
 {
-	file_accessed(file);
+	file_accessed(file, false);
 	vma->vm_ops = &fuse_dax_vm_ops;
 	vm_flags_set(vma, VM_MIXEDMAP | VM_HUGEPAGE);
 	return 0;
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index bc4115288eec..3c4cbc5e2de6 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2496,7 +2496,7 @@ static int fuse_file_mmap(struct file *file, struct vm_area_struct *vma)
 	if ((vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_MAYWRITE))
 		fuse_link_write_file(file);
 
-	file_accessed(file);
+	file_accessed(file, false);
 	vma->vm_ops = &fuse_file_vm_ops;
 	return 0;
 }
@@ -3193,7 +3193,7 @@ static ssize_t __fuse_copy_file_range(struct file *file_in, loff_t pos_in,
 		clear_bit(FUSE_I_SIZE_UNSTABLE, &fi_out->state);
 
 	inode_unlock(inode_out);
-	file_accessed(file_in);
+	file_accessed(file_in, false);
 
 	fuse_flush_time_update(inode_out);
 
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 1bf3c4453516..3003be5b8266 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -601,7 +601,7 @@ static int gfs2_mmap(struct file *file, struct vm_area_struct *vma)
 			return error;
 		/* grab lock to update inode */
 		gfs2_glock_dq_uninit(&i_gh);
-		file_accessed(file);
+		file_accessed(file, false);
 	}
 	vma->vm_ops = &gfs2_vm_ops;
 
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 7b17ccfa039d..729f66346c3c 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -161,7 +161,7 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
 		return -EINVAL;
 
 	inode_lock(inode);
-	file_accessed(file);
+	file_accessed(file, false);
 
 	ret = -ENOMEM;
 	if (!hugetlb_reserve_pages(inode,
diff --git a/fs/nilfs2/file.c b/fs/nilfs2/file.c
index a9eb3487efb2..a857ebcf099c 100644
--- a/fs/nilfs2/file.c
+++ b/fs/nilfs2/file.c
@@ -119,7 +119,7 @@ static const struct vm_operations_struct nilfs_file_vm_ops = {
 
 static int nilfs_file_mmap(struct file *file, struct vm_area_struct *vma)
 {
-	file_accessed(file);
+	file_accessed(file, false);
 	vma->vm_ops = &nilfs_file_vm_ops;
 	return 0;
 }
diff --git a/fs/orangefs/file.c b/fs/orangefs/file.c
index d68372241b30..5c7a17995fe1 100644
--- a/fs/orangefs/file.c
+++ b/fs/orangefs/file.c
@@ -412,7 +412,7 @@ static int orangefs_file_mmap(struct file *file, struct vm_area_struct *vma)
 	/* set the sequential readahead hint */
 	vm_flags_mod(vma, VM_SEQ_READ, VM_RAND_READ);
 
-	file_accessed(file);
+	file_accessed(file, false);
 	vma->vm_ops = &orangefs_file_vm_ops;
 	return 0;
 }
diff --git a/fs/orangefs/inode.c b/fs/orangefs/inode.c
index 9014bbcc8031..77d56703bb09 100644
--- a/fs/orangefs/inode.c
+++ b/fs/orangefs/inode.c
@@ -597,7 +597,7 @@ static ssize_t orangefs_direct_IO(struct kiocb *iocb,
 		ret = total_count;
 	if (ret > 0) {
 		if (type == ORANGEFS_IO_READ) {
-			file_accessed(file);
+			file_accessed(file, false);
 		} else {
 			file_update_time(file);
 			if (*offset > i_size_read(inode))
diff --git a/fs/pipe.c b/fs/pipe.c
index 2d88f73f585a..ce1038d3de4b 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -393,7 +393,7 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to)
 		wake_up_interruptible_sync_poll(&pipe->rd_wait, EPOLLIN | EPOLLRDNORM);
 	kill_fasync(&pipe->fasync_writers, SIGIO, POLL_OUT);
 	if (ret > 0)
-		file_accessed(filp);
+		file_accessed(filp, false);
 	return ret;
 }
 
diff --git a/fs/ramfs/file-nommu.c b/fs/ramfs/file-nommu.c
index efb1b4c1a0a4..ad69f828f6ad 100644
--- a/fs/ramfs/file-nommu.c
+++ b/fs/ramfs/file-nommu.c
@@ -267,7 +267,7 @@ static int ramfs_nommu_mmap(struct file *file, struct vm_area_struct *vma)
 	if (!is_nommu_shared_mapping(vma->vm_flags))
 		return -ENOSYS;
 
-	file_accessed(file);
+	file_accessed(file, false);
 	vma->vm_ops = &generic_file_vm_ops;
 	return 0;
 }
diff --git a/fs/readdir.c b/fs/readdir.c
index b80caf4c9321..2f4c9c663a39 100644
--- a/fs/readdir.c
+++ b/fs/readdir.c
@@ -68,7 +68,7 @@ int iterate_dir(struct file *file, struct dir_context *ctx)
 			res = file->f_op->iterate(file, ctx);
 		file->f_pos = ctx->pos;
 		fsnotify_access(file);
-		file_accessed(file);
+		file_accessed(file, ctx->flags & DIR_CONTEXT_F_NOWAIT);
 	}
 	if (shared)
 		inode_unlock_shared(inode);
diff --git a/fs/smb/client/cifsfs.c b/fs/smb/client/cifsfs.c
index a4d8b0ea1c8c..20156c5e83e6 100644
--- a/fs/smb/client/cifsfs.c
+++ b/fs/smb/client/cifsfs.c
@@ -1307,7 +1307,7 @@ ssize_t cifs_file_copychunk_range(unsigned int xid,
 		rc = target_tcon->ses->server->ops->copychunk_range(xid,
 			smb_file_src, smb_file_target, off, len, destoff);
 
-	file_accessed(src_file);
+	file_accessed(src_file, false);
 
 	/* force revalidate of size and timestamps of target file now
 	 * that target is updated on the server
diff --git a/fs/splice.c b/fs/splice.c
index 004eb1c4ce31..e4dcfa1c0fef 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1104,7 +1104,7 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd,
 
 done:
 	pipe->tail = pipe->head = 0;
-	file_accessed(in);
+	file_accessed(in, false);
 	return bytes;
 
 read_failure:
diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index 6738fe43040b..a27c73848571 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -1603,7 +1603,7 @@ static int ubifs_file_mmap(struct file *file, struct vm_area_struct *vma)
 	vma->vm_ops = &ubifs_file_vm_ops;
 
 	if (IS_ENABLED(CONFIG_UBIFS_ATIME_SUPPORT))
-		file_accessed(file);
+		file_accessed(file, false);
 
 	return 0;
 }
diff --git a/fs/udf/file.c b/fs/udf/file.c
index 243840dc83ad..46edf6e64632 100644
--- a/fs/udf/file.c
+++ b/fs/udf/file.c
@@ -191,7 +191,7 @@ static int udf_release_file(struct inode *inode, struct file *filp)
 
 static int udf_file_mmap(struct file *file, struct vm_area_struct *vma)
 {
-	file_accessed(file);
+	file_accessed(file, false);
 	vma->vm_ops = &udf_file_vm_ops;
 
 	return 0;
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 4f502219ae4f..c72efdb9e43e 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -227,7 +227,7 @@ xfs_file_dio_read(
 	if (!iov_iter_count(to))
 		return 0; /* skip atime */
 
-	file_accessed(iocb->ki_filp);
+	file_accessed(iocb->ki_filp, false);
 
 	ret = xfs_ilock_iocb(iocb, XFS_IOLOCK_SHARED);
 	if (ret)
@@ -257,7 +257,7 @@ xfs_file_dax_read(
 	ret = dax_iomap_rw(iocb, to, &xfs_read_iomap_ops);
 	xfs_iunlock(ip, XFS_IOLOCK_SHARED);
 
-	file_accessed(iocb->ki_filp);
+	file_accessed(iocb->ki_filp, false);
 	return ret;
 }
 
@@ -1434,7 +1434,7 @@ xfs_file_mmap(
 	if (!daxdev_mapping_supported(vma, target->bt_daxdev))
 		return -EOPNOTSUPP;
 
-	file_accessed(file);
+	file_accessed(file, false);
 	vma->vm_ops = &xfs_file_vm_ops;
 	if (IS_DAX(inode))
 		vm_flags_set(vma, VM_HUGEPAGE);
diff --git a/fs/zonefs/file.c b/fs/zonefs/file.c
index 92c9aaae3663..664ebae181bd 100644
--- a/fs/zonefs/file.c
+++ b/fs/zonefs/file.c
@@ -323,7 +323,7 @@ static int zonefs_file_mmap(struct file *file, struct vm_area_struct *vma)
 	    (vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_MAYWRITE))
 		return -EINVAL;
 
-	file_accessed(file);
+	file_accessed(file, false);
 	vma->vm_ops = &zonefs_file_vm_ops;
 
 	return 0;
@@ -736,7 +736,7 @@ static ssize_t zonefs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 			ret = -EINVAL;
 			goto inode_unlock;
 		}
-		file_accessed(iocb->ki_filp);
+		file_accessed(iocb->ki_filp, false);
 		ret = iomap_dio_rw(iocb, to, &zonefs_read_iomap_ops,
 				   &zonefs_read_dio_ops, 0, NULL, 0);
 	} else {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index ba54879089ac..ed60b3d70d1e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2204,10 +2204,11 @@ extern bool atime_needs_update(const struct path *, struct inode *);
 extern int touch_atime(const struct path *path, bool nowait);
 int inode_update_time(struct inode *inode, struct timespec64 *time, int flags);
 
-static inline void file_accessed(struct file *file)
+static inline int file_accessed(struct file *file, bool nowait)
 {
 	if (!(file->f_flags & O_NOATIME))
-		touch_atime(&file->f_path, false);
+		return touch_atime(&file->f_path, nowait);
+	return 0;
 }
 
 extern int file_modified(struct file *file);
diff --git a/mm/filemap.c b/mm/filemap.c
index 9e44a49bbd74..1f2032f4fd10 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2723,7 +2723,7 @@ ssize_t filemap_read(struct kiocb *iocb, struct iov_iter *iter,
 		folio_batch_init(&fbatch);
 	} while (iov_iter_count(iter) && iocb->ki_pos < isize && !error);
 
-	file_accessed(filp);
+	file_accessed(filp, false);
 
 	return already_read ? already_read : error;
 }
@@ -2809,7 +2809,7 @@ generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 		retval = kiocb_write_and_wait(iocb, count);
 		if (retval < 0)
 			return retval;
-		file_accessed(file);
+		file_accessed(file, false);
 
 		retval = mapping->a_ops->direct_IO(iocb, iter);
 		if (retval >= 0) {
@@ -2978,7 +2978,7 @@ ssize_t filemap_splice_read(struct file *in, loff_t *ppos,
 
 out:
 	folio_batch_release(&fbatch);
-	file_accessed(in);
+	file_accessed(in, false);
 
 	return total_spliced ? total_spliced : error;
 }
@@ -3613,7 +3613,7 @@ int generic_file_mmap(struct file *file, struct vm_area_struct *vma)
 
 	if (!mapping->a_ops->read_folio)
 		return -ENOEXEC;
-	file_accessed(file);
+	file_accessed(file, false);
 	vma->vm_ops = &generic_file_vm_ops;
 	return 0;
 }
diff --git a/mm/shmem.c b/mm/shmem.c
index 2f2e0e618072..440b23e2d9e1 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2317,7 +2317,7 @@ static int shmem_mmap(struct file *file, struct vm_area_struct *vma)
 	/* arm64 - allow memory tagging on RAM-based files */
 	vm_flags_set(vma, VM_MTE_ALLOWED);
 
-	file_accessed(file);
+	file_accessed(file, false);
 	/* This is anonymous shared memory if it is unlinked at the time of mmap */
 	if (inode->i_nlink)
 		vma->vm_ops = &shmem_vm_ops;
@@ -2727,7 +2727,7 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 	}
 
 	*ppos = ((loff_t) index << PAGE_SHIFT) + offset;
-	file_accessed(file);
+	file_accessed(file, false);
 	return retval ? retval : error;
 }
 
@@ -2859,7 +2859,7 @@ static ssize_t shmem_file_splice_read(struct file *in, loff_t *ppos,
 	if (folio)
 		folio_put(folio);
 
-	file_accessed(in);
+	file_accessed(in, false);
 	return total_spliced ? total_spliced : error;
 }
 
-- 
2.25.1


* [PATCH 09/29] vfs: move file_accessed() to the beginning of iterate_dir()
From: Hao Xu @ 2023-08-25 13:54 UTC
  To: io-uring, Jens Axboe
  Cc: Dominique Martinet, Pavel Begunkov, Christian Brauner,
	Alexander Viro, Stefan Roesch, Clay Harris, Dave Chinner,
	Darrick J . Wong, linux-fsdevel, linux-xfs, linux-ext4,
	linux-cachefs, ecryptfs, linux-nfs, linux-unionfs, bpf, netdev,
	linux-s390, linux-kernel, linux-block, linux-btrfs, codalist,
	linux-f2fs-devel, cluster-devel, linux-mm, linux-nilfs, devel,
	linux-cifs, samba-technical, linux-mtd, Wanpeng Li

From: Hao Xu <[email protected]>

Move file_accessed() to the beginning of iterate_dir() so that we don't
need to roll back all the work already done when file_accessed() returns
-EAGAIN at the end of getdents.

Signed-off-by: Hao Xu <[email protected]>
---
 fs/readdir.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/readdir.c b/fs/readdir.c
index 2f4c9c663a39..6469f076ba6e 100644
--- a/fs/readdir.c
+++ b/fs/readdir.c
@@ -61,6 +61,10 @@ int iterate_dir(struct file *file, struct dir_context *ctx)
 
 	res = -ENOENT;
 	if (!IS_DEADDIR(inode)) {
+		res = file_accessed(file, ctx->flags & DIR_CONTEXT_F_NOWAIT);
+		if (res == -EAGAIN)
+			goto out_unlock;
+
 		ctx->pos = file->f_pos;
 		if (shared)
 			res = file->f_op->iterate_shared(file, ctx);
@@ -68,8 +72,9 @@ int iterate_dir(struct file *file, struct dir_context *ctx)
 			res = file->f_op->iterate(file, ctx);
 		file->f_pos = ctx->pos;
 		fsnotify_access(file);
-		file_accessed(file, ctx->flags & DIR_CONTEXT_F_NOWAIT);
 	}
+
+out_unlock:
 	if (shared)
 		inode_unlock_shared(inode);
 	else
-- 
2.25.1
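
The ordering argument in this patch can be illustrated outside the kernel.
The following is only a userspace sketch under stated assumptions: struct
dir_cursor, touch_time() and iterate() are hypothetical names, not kernel
APIs. It shows that when the step that may return -EAGAIN runs first, a
failed attempt leaves the position untouched and the caller can retry
without undoing anything.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names come from the kernel. */
struct dir_cursor {
	size_t pos;        /* position committed back to the "file" */
	bool   atime_busy; /* simulates a timestamp update that would block */
};

/* Pretend timestamp update: fails with -EAGAIN instead of blocking. */
static int touch_time(struct dir_cursor *c, bool nowait)
{
	if (nowait && c->atime_busy)
		return -EAGAIN;
	return 0;
}

/*
 * Emit up to n entries. The fallible step runs first, so an -EAGAIN
 * return leaves c->pos untouched and the caller can simply retry.
 */
static int iterate(struct dir_cursor *c, size_t n, bool nowait)
{
	int res;

	assert(c != NULL);
	res = touch_time(c, nowait);
	if (res == -EAGAIN)
		return res;

	c->pos += n; /* side effect happens only after the check passed */
	return 0;
}
```

If touch_time() were called after the position update instead, the
-EAGAIN case would have to restore c->pos before returning.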



* [PATCH 10/29] vfs: add S_NOWAIT for nowait time update
  2023-08-25 13:54 [PATCH RFC v5 00/29] io_uring getdents Hao Xu
                   ` (8 preceding siblings ...)
  2023-08-25 13:54 ` [PATCH 09/29] vfs: move file_accessed() to the beginning of iterate_dir() Hao Xu
@ 2023-08-25 13:54 ` Hao Xu
  2023-08-25 13:54 ` [PATCH 11/29] vfs: trylock inode->i_rwsem in iterate_dir() to support nowait Hao Xu
                   ` (20 subsequent siblings)
  30 siblings, 0 replies; 39+ messages in thread
From: Hao Xu @ 2023-08-25 13:54 UTC (permalink / raw)
  To: io-uring, Jens Axboe
  Cc: Dominique Martinet, Pavel Begunkov, Christian Brauner,
	Alexander Viro, Stefan Roesch, Clay Harris, Dave Chinner,
	Darrick J . Wong, linux-fsdevel, linux-xfs, linux-ext4,
	linux-cachefs, ecryptfs, linux-nfs, linux-unionfs, bpf, netdev,
	linux-s390, linux-kernel, linux-block, linux-btrfs, codalist,
	linux-f2fs-devel, cluster-devel, linux-mm, linux-nilfs, devel,
	linux-cifs, samba-technical, linux-mtd, Wanpeng Li

From: Hao Xu <[email protected]>

Add a new time flag, S_NOWAIT, to support nowait time updates. Pass it
down to the filesystem and return -EAGAIN when the update would block.

Signed-off-by: Hao Xu <[email protected]>
---
 fs/inode.c         | 9 +++++----
 fs/xfs/xfs_iops.c  | 8 +++++++-
 include/linux/fs.h | 1 +
 3 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index e83b836f2d09..eb3db34a3e6e 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1966,12 +1966,13 @@ int touch_atime(const struct path *path, bool nowait)
 	struct vfsmount *mnt = path->mnt;
 	struct inode *inode = d_inode(path->dentry);
 	struct timespec64 now;
+	int ret = 0;
 
 	if (!atime_needs_update(path, inode))
-		return 0;
+		return ret;
 
 	if (!sb_start_write_trylock(inode->i_sb))
-		return 0;
+		return ret;
 
 	if (__mnt_want_write(mnt) != 0)
 		goto skip_update;
@@ -1985,11 +1986,11 @@ int touch_atime(const struct path *path, bool nowait)
 	 * of the fs read only, e.g. subvolumes in Btrfs.
 	 */
 	now = current_time(inode);
-	inode_update_time(inode, &now, S_ATIME);
+	ret = inode_update_time(inode, &now, S_ATIME | (nowait ? S_NOWAIT : 0));
 	__mnt_drop_write(mnt);
 skip_update:
 	sb_end_write(inode->i_sb);
-	return 0;
+	return ret;
 }
 EXPORT_SYMBOL(touch_atime);
 
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 24718adb3c16..bf1d4c31f009 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1053,7 +1053,13 @@ xfs_vn_update_time(
 	if (error)
 		return error;
 
-	xfs_ilock(ip, XFS_ILOCK_EXCL);
+	if (flags & S_NOWAIT) {
+		if (!xfs_ilock_nowait(ip, XFS_ILOCK_EXCL))
+			return -EAGAIN;
+	} else {
+		xfs_ilock(ip, XFS_ILOCK_EXCL);
+	}
+
 	if (flags & S_CTIME)
 		inode->i_ctime = *now;
 	if (flags & S_MTIME)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index ed60b3d70d1e..f8c267ee5cb7 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2198,6 +2198,7 @@ enum file_time_flags {
 	S_MTIME = 2,
 	S_CTIME = 4,
 	S_VERSION = 8,
+	S_NOWAIT = 16,
 };
 
 extern bool atime_needs_update(const struct path *, struct inode *);
-- 
2.25.1



* [PATCH 11/29] vfs: trylock inode->i_rwsem in iterate_dir() to support nowait
  2023-08-25 13:54 [PATCH RFC v5 00/29] io_uring getdents Hao Xu
                   ` (9 preceding siblings ...)
  2023-08-25 13:54 ` [PATCH 10/29] vfs: add S_NOWAIT for nowait time update Hao Xu
@ 2023-08-25 13:54 ` Hao Xu
  2023-08-25 13:54 ` [PATCH 12/29] xfs: enforce GFP_NOIO implicitly during nowait time update Hao Xu
                   ` (19 subsequent siblings)
  30 siblings, 0 replies; 39+ messages in thread
From: Hao Xu @ 2023-08-25 13:54 UTC (permalink / raw)
  To: io-uring, Jens Axboe
  Cc: Dominique Martinet, Pavel Begunkov, Christian Brauner,
	Alexander Viro, Stefan Roesch, Clay Harris, Dave Chinner,
	Darrick J . Wong, linux-fsdevel, linux-xfs, linux-ext4,
	linux-cachefs, ecryptfs, linux-nfs, linux-unionfs, bpf, netdev,
	linux-s390, linux-kernel, linux-block, linux-btrfs, codalist,
	linux-f2fs-devel, cluster-devel, linux-mm, linux-nilfs, devel,
	linux-cifs, samba-technical, linux-mtd, Wanpeng Li

From: Hao Xu <[email protected]>

Trylock inode->i_rwsem in iterate_dir() to support nowait semantics, and
return -EAGAIN when there is contention.

Signed-off-by: Hao Xu <[email protected]>
---
 fs/readdir.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/fs/readdir.c b/fs/readdir.c
index 6469f076ba6e..664ecd9665a1 100644
--- a/fs/readdir.c
+++ b/fs/readdir.c
@@ -43,6 +43,8 @@ int iterate_dir(struct file *file, struct dir_context *ctx)
 	struct inode *inode = file_inode(file);
 	bool shared = false;
 	int res = -ENOTDIR;
+	bool nowait;
+
 	if (file->f_op->iterate_shared)
 		shared = true;
 	else if (!file->f_op->iterate)
@@ -52,16 +54,22 @@ int iterate_dir(struct file *file, struct dir_context *ctx)
 	if (res)
 		goto out;
 
-	if (shared)
-		res = down_read_killable(&inode->i_rwsem);
-	else
-		res = down_write_killable(&inode->i_rwsem);
-	if (res)
+	nowait = ctx->flags & DIR_CONTEXT_F_NOWAIT;
+	if (nowait) {
+		res = shared ? down_read_trylock(&inode->i_rwsem) :
+			       down_write_trylock(&inode->i_rwsem);
+		if (!res)
+			res = -EAGAIN;
+	} else {
+		res = shared ? down_read_killable(&inode->i_rwsem) :
+			       down_write_killable(&inode->i_rwsem);
+	}
+	if (res < 0)
 		goto out;
 
 	res = -ENOENT;
 	if (!IS_DEADDIR(inode)) {
-		res = file_accessed(file, ctx->flags & DIR_CONTEXT_F_NOWAIT);
+		res = file_accessed(file, nowait);
 		if (res == -EAGAIN)
 			goto out_unlock;
 
-- 
2.25.1



* [PATCH 12/29] xfs: enforce GFP_NOIO implicitly during nowait time update
  2023-08-25 13:54 [PATCH RFC v5 00/29] io_uring getdents Hao Xu
                   ` (10 preceding siblings ...)
  2023-08-25 13:54 ` [PATCH 11/29] vfs: trylock inode->i_rwsem in iterate_dir() to support nowait Hao Xu
@ 2023-08-25 13:54 ` Hao Xu
  2023-08-25 14:20   ` Matthew Wilcox
  2023-08-25 13:54 ` [PATCH 13/29] xfs: make xfs_trans_alloc() support nowait semantics Hao Xu
                   ` (18 subsequent siblings)
  30 siblings, 1 reply; 39+ messages in thread
From: Hao Xu @ 2023-08-25 13:54 UTC (permalink / raw)
  To: io-uring, Jens Axboe
  Cc: Dominique Martinet, Pavel Begunkov, Christian Brauner,
	Alexander Viro, Stefan Roesch, Clay Harris, Dave Chinner,
	Darrick J . Wong, linux-fsdevel, linux-xfs, linux-ext4,
	linux-cachefs, ecryptfs, linux-nfs, linux-unionfs, bpf, netdev,
	linux-s390, linux-kernel, linux-block, linux-btrfs, codalist,
	linux-f2fs-devel, cluster-devel, linux-mm, linux-nilfs, devel,
	linux-cifs, samba-technical, linux-mtd, Wanpeng Li

From: Hao Xu <[email protected]>

Enforce GFP_NOIO implicitly by setting the task's pflags while we are in
the nowait time update path. Nowait semantics mean no waiting for IO,
so GFP_NOIO is needed.

Signed-off-by: Hao Xu <[email protected]>
---
 fs/xfs/xfs_iops.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index bf1d4c31f009..5fa391083de9 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1037,6 +1037,8 @@ xfs_vn_update_time(
 	int			log_flags = XFS_ILOG_TIMESTAMP;
 	struct xfs_trans	*tp;
 	int			error;
+	int			old_pflags;
+	bool			nowait = flags & S_NOWAIT;
 
 	trace_xfs_update_time(ip);
 
@@ -1049,13 +1051,18 @@ xfs_vn_update_time(
 		log_flags |= XFS_ILOG_CORE;
 	}
 
+	if (nowait)
+		old_pflags = memalloc_noio_save();
+
 	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_fsyncts, 0, 0, 0, &tp);
 	if (error)
-		return error;
+		goto out;
 
-	if (flags & S_NOWAIT) {
-		if (!xfs_ilock_nowait(ip, XFS_ILOCK_EXCL))
-			return -EAGAIN;
+	if (nowait) {
+		if (!xfs_ilock_nowait(ip, XFS_ILOCK_EXCL)) {
+			error = -EAGAIN;
+			goto out;
+		}
 	} else {
 		xfs_ilock(ip, XFS_ILOCK_EXCL);
 	}
@@ -1069,7 +1076,12 @@ xfs_vn_update_time(
 
 	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
 	xfs_trans_log_inode(tp, ip, log_flags);
-	return xfs_trans_commit(tp);
+	error = xfs_trans_commit(tp);
+
+out:
+	if (nowait)
+		memalloc_noio_restore(old_pflags);
+	return error;
 }
 
 STATIC int
-- 
2.25.1
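
The save/restore pairing with a single exit label can be sketched in
userspace as follows. This is only an illustration under assumptions: the
task_flags variable stands in for the per-task flags word, and CTX_NOIO,
noio_save(), noio_restore() and update() are hypothetical names loosely
modelled on the memalloc_noio_save()/memalloc_noio_restore() pair used
above. The point is that every failure path after the save goes through
the label that restores the previous state.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define CTX_NOIO 0x1u /* hypothetical per-task flag bit */

static unsigned int task_flags; /* stand-in for the task's flags word */

/* Set CTX_NOIO and return the previous state of that bit. */
static unsigned int noio_save(void)
{
	unsigned int old = task_flags & CTX_NOIO;

	task_flags |= CTX_NOIO;
	return old;
}

/* Restore the bit to the state returned by noio_save(). */
static void noio_restore(unsigned int old)
{
	assert((old & ~CTX_NOIO) == 0);
	task_flags = (task_flags & ~CTX_NOIO) | old;
}

/* step_fails simulates a trylock or allocation failure part-way through. */
static int update(bool nowait, bool step_fails)
{
	unsigned int old = 0;
	int error = 0;

	if (nowait)
		old = noio_save();

	if (step_fails) {
		error = -EAGAIN;
		goto out; /* never return directly once the flag is set */
	}
	/* ... the real work would go here ... */
out:
	if (nowait)
		noio_restore(old);
	return error;
}
```

Because the old bit is restored rather than cleared, the pattern also
nests correctly when the caller was already running with the flag set.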



* [PATCH 13/29] xfs: make xfs_trans_alloc() support nowait semantics
  2023-08-25 13:54 [PATCH RFC v5 00/29] io_uring getdents Hao Xu
                   ` (11 preceding siblings ...)
  2023-08-25 13:54 ` [PATCH 12/29] xfs: enforce GFP_NOIO implicitly during nowait time update Hao Xu
@ 2023-08-25 13:54 ` Hao Xu
  2023-08-25 13:54 ` [PATCH 14/29] xfs: support nowait for xfs_log_reserve() Hao Xu
                   ` (17 subsequent siblings)
  30 siblings, 0 replies; 39+ messages in thread
From: Hao Xu @ 2023-08-25 13:54 UTC (permalink / raw)
  To: io-uring, Jens Axboe
  Cc: Dominique Martinet, Pavel Begunkov, Christian Brauner,
	Alexander Viro, Stefan Roesch, Clay Harris, Dave Chinner,
	Darrick J . Wong, linux-fsdevel, linux-xfs, linux-ext4,
	linux-cachefs, ecryptfs, linux-nfs, linux-unionfs, bpf, netdev,
	linux-s390, linux-kernel, linux-block, linux-btrfs, codalist,
	linux-f2fs-devel, cluster-devel, linux-mm, linux-nilfs, devel,
	linux-cifs, samba-technical, linux-mtd, Wanpeng Li

From: Hao Xu <[email protected]>

There are locks in xfs_trans_alloc(); find them and apply trylock logic
so that they return -EAGAIN when they would block. To achieve this, add a
nowait parameter to the functions on that path. In addition, add a generic
transaction flag, XFS_TRANS_NOWAIT, to carry the nowait information.

Signed-off-by: Hao Xu <[email protected]>
---
 fs/xfs/libxfs/xfs_shared.h |  2 ++
 fs/xfs/xfs_iops.c          |  3 ++-
 fs/xfs/xfs_trans.c         | 21 ++++++++++++++++++---
 3 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_shared.h b/fs/xfs/libxfs/xfs_shared.h
index c4381388c0c1..0ba3d6f53405 100644
--- a/fs/xfs/libxfs/xfs_shared.h
+++ b/fs/xfs/libxfs/xfs_shared.h
@@ -83,6 +83,8 @@ void	xfs_log_get_max_trans_res(struct xfs_mount *mp,
  * made then this algorithm will eventually find all the space it needs.
  */
 #define XFS_TRANS_LOWMODE	0x100	/* allocate in low space mode */
+/* Transaction should follow nowait semantics */
+#define XFS_TRANS_NOWAIT		(1u << 9)
 
 /*
  * Field values for xfs_trans_mod_sb.
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 5fa391083de9..47b4fd5f8f5c 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1054,7 +1054,8 @@ xfs_vn_update_time(
 	if (nowait)
 		old_pflags = memalloc_noio_save();
 
-	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_fsyncts, 0, 0, 0, &tp);
+	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_fsyncts, 0, 0,
+				nowait ? XFS_TRANS_NOWAIT : 0, &tp);
 	if (error)
 		goto out;
 
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 8c0bfc9a33b1..dbec685f4f4a 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -251,6 +251,9 @@ xfs_trans_alloc(
 	struct xfs_trans	*tp;
 	bool			want_retry = true;
 	int			error;
+	bool			nowait = flags & XFS_TRANS_NOWAIT;
+	gfp_t			gfp_flags = GFP_KERNEL |
+					    (nowait ? 0 : __GFP_NOFAIL);
 
 	/*
 	 * Allocate the handle before we do our freeze accounting and setting up
@@ -258,9 +261,21 @@ xfs_trans_alloc(
 	 * by doing GFP_KERNEL allocations inside sb_start_intwrite().
 	 */
 retry:
-	tp = kmem_cache_zalloc(xfs_trans_cache, GFP_KERNEL | __GFP_NOFAIL);
-	if (!(flags & XFS_TRANS_NO_WRITECOUNT))
-		sb_start_intwrite(mp->m_super);
+	tp = kmem_cache_zalloc(xfs_trans_cache, gfp_flags);
+	if (!tp)
+		return -EAGAIN;
+	if (!(flags & XFS_TRANS_NO_WRITECOUNT)) {
+		if (nowait) {
+			bool locked = sb_start_intwrite_trylock(mp->m_super);
+
+			if (!locked) {
+				xfs_trans_cancel(tp);
+				return -EAGAIN;
+			}
+		} else {
+			sb_start_intwrite(mp->m_super);
+		}
+	}
 	xfs_trans_set_context(tp);
 
 	/*
-- 
2.25.1



* [PATCH 14/29] xfs: support nowait for xfs_log_reserve()
  2023-08-25 13:54 [PATCH RFC v5 00/29] io_uring getdents Hao Xu
                   ` (12 preceding siblings ...)
  2023-08-25 13:54 ` [PATCH 13/29] xfs: make xfs_trans_alloc() support nowait semantics Hao Xu
@ 2023-08-25 13:54 ` Hao Xu
  2023-08-25 13:54 ` [PATCH 15/29] xfs: don't wait for free space in xlog_grant_head_check() in nowait case Hao Xu
                   ` (16 subsequent siblings)
  30 siblings, 0 replies; 39+ messages in thread
From: Hao Xu @ 2023-08-25 13:54 UTC (permalink / raw)
  To: io-uring, Jens Axboe
  Cc: Dominique Martinet, Pavel Begunkov, Christian Brauner,
	Alexander Viro, Stefan Roesch, Clay Harris, Dave Chinner,
	Darrick J . Wong, linux-fsdevel, linux-xfs, linux-ext4,
	linux-cachefs, ecryptfs, linux-nfs, linux-unionfs, bpf, netdev,
	linux-s390, linux-kernel, linux-block, linux-btrfs, codalist,
	linux-f2fs-devel, cluster-devel, linux-mm, linux-nilfs, devel,
	linux-cifs, samba-technical, linux-mtd, Wanpeng Li

From: Hao Xu <[email protected]>

Support nowait logic in xfs_log_reserve(): add a nowait boolean
parameter and return -EAGAIN when ticket allocation fails.

Signed-off-by: Hao Xu <[email protected]>
---
 fs/xfs/xfs_log.c      | 18 +++++++++++++-----
 fs/xfs/xfs_log.h      |  5 +++--
 fs/xfs/xfs_log_cil.c  |  2 +-
 fs/xfs/xfs_log_priv.h |  2 +-
 fs/xfs/xfs_trans.c    |  5 +++--
 5 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 79004d193e54..90fbb1c0eca2 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -462,7 +462,8 @@ xfs_log_reserve(
 	int			unit_bytes,
 	int			cnt,
 	struct xlog_ticket	**ticp,
-	bool			permanent)
+	bool			permanent,
+	bool			nowait)
 {
 	struct xlog		*log = mp->m_log;
 	struct xlog_ticket	*tic;
@@ -475,7 +476,9 @@ xfs_log_reserve(
 	XFS_STATS_INC(mp, xs_try_logspace);
 
 	ASSERT(*ticp == NULL);
-	tic = xlog_ticket_alloc(log, unit_bytes, cnt, permanent);
+	tic = xlog_ticket_alloc(log, unit_bytes, cnt, permanent, nowait);
+	if (!tic)
+		return -EAGAIN;
 	*ticp = tic;
 
 	xlog_grant_push_ail(log, tic->t_cnt ? tic->t_unit_res * tic->t_cnt
@@ -974,7 +977,7 @@ xlog_unmount_write(
 	struct xlog_ticket	*tic = NULL;
 	int			error;
 
-	error = xfs_log_reserve(mp, 600, 1, &tic, 0);
+	error = xfs_log_reserve(mp, 600, 1, &tic, 0, false);
 	if (error)
 		goto out_err;
 
@@ -3527,12 +3530,17 @@ xlog_ticket_alloc(
 	struct xlog		*log,
 	int			unit_bytes,
 	int			cnt,
-	bool			permanent)
+	bool			permanent,
+	bool			nowait)
 {
 	struct xlog_ticket	*tic;
 	int			unit_res;
 
-	tic = kmem_cache_zalloc(xfs_log_ticket_cache, GFP_NOFS | __GFP_NOFAIL);
+	gfp_t			gfp_flags = GFP_NOFS |
+					    (nowait ? 0 : __GFP_NOFAIL);
+	tic = kmem_cache_zalloc(xfs_log_ticket_cache, gfp_flags);
+	if (!tic)
+		return NULL;
 
 	unit_res = xlog_calc_unit_res(log, unit_bytes, &tic->t_iclog_hdrs);
 
diff --git a/fs/xfs/xfs_log.h b/fs/xfs/xfs_log.h
index 2728886c2963..ba515df443c3 100644
--- a/fs/xfs/xfs_log.h
+++ b/fs/xfs/xfs_log.h
@@ -139,8 +139,9 @@ void	xfs_log_mount_cancel(struct xfs_mount *);
 xfs_lsn_t xlog_assign_tail_lsn(struct xfs_mount *mp);
 xfs_lsn_t xlog_assign_tail_lsn_locked(struct xfs_mount *mp);
 void	xfs_log_space_wake(struct xfs_mount *mp);
-int	xfs_log_reserve(struct xfs_mount *mp, int length, int count,
-			struct xlog_ticket **ticket, bool permanent);
+int	xfs_log_reserve(struct xfs_mount *mp, int length,
+			int count, struct xlog_ticket **ticket,
+			bool permanent, bool nowait);
 int	xfs_log_regrant(struct xfs_mount *mp, struct xlog_ticket *tic);
 void	xfs_log_unmount(struct xfs_mount *mp);
 bool	xfs_log_writable(struct xfs_mount *mp);
diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index eccbfb99e894..f17c1799b3c4 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -37,7 +37,7 @@ xlog_cil_ticket_alloc(
 {
 	struct xlog_ticket *tic;
 
-	tic = xlog_ticket_alloc(log, 0, 1, 0);
+	tic = xlog_ticket_alloc(log, 0, 1, 0, false);
 
 	/*
 	 * set the current reservation to zero so we know to steal the basic
diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
index 1bd2963e8fbd..41edaa0ae869 100644
--- a/fs/xfs/xfs_log_priv.h
+++ b/fs/xfs/xfs_log_priv.h
@@ -503,7 +503,7 @@ extern __le32	 xlog_cksum(struct xlog *log, struct xlog_rec_header *rhead,
 
 extern struct kmem_cache *xfs_log_ticket_cache;
 struct xlog_ticket *xlog_ticket_alloc(struct xlog *log, int unit_bytes,
-		int count, bool permanent);
+		int count, bool permanent, bool nowait);
 
 void	xlog_print_tic_res(struct xfs_mount *mp, struct xlog_ticket *ticket);
 void	xlog_print_trans(struct xfs_trans *);
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index dbec685f4f4a..7988b4c7f36e 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -155,6 +155,7 @@ xfs_trans_reserve(
 	struct xfs_mount	*mp = tp->t_mountp;
 	int			error = 0;
 	bool			rsvd = (tp->t_flags & XFS_TRANS_RESERVE) != 0;
+	bool			nowait = tp->t_flags & XFS_TRANS_NOWAIT;
 
 	/*
 	 * Attempt to reserve the needed disk blocks by decrementing
@@ -192,8 +193,8 @@ xfs_trans_reserve(
 			error = xfs_log_regrant(mp, tp->t_ticket);
 		} else {
 			error = xfs_log_reserve(mp, resp->tr_logres,
-						resp->tr_logcount,
-						&tp->t_ticket, permanent);
+						resp->tr_logcount, &tp->t_ticket,
+						permanent, nowait);
 		}
 
 		if (error)
-- 
2.25.1
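
The allocation side of this change follows a pattern that recurs in the
series: drop the "cannot fail" guarantee in the nowait case, let the
allocator return NULL, and translate that into -EAGAIN one level up. A
userspace sketch, assuming calloc() in place of the slab allocator and a
hypothetical failure-injection counter (fail_next_allocs); struct ticket,
try_alloc(), ticket_alloc() and log_reserve() are illustrative names only:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct ticket {
	int unit_bytes;
	int cnt;
};

/* Test hook (hypothetical): number of upcoming allocation attempts that fail. */
static int fail_next_allocs;

static void *try_alloc(size_t size)
{
	if (fail_next_allocs > 0) {
		fail_next_allocs--;
		return NULL;
	}
	return calloc(1, size);
}

/*
 * nowait:  one attempt, NULL on failure.
 * !nowait: retry until it succeeds, loosely mimicking __GFP_NOFAIL.
 */
static struct ticket *ticket_alloc(int unit_bytes, int cnt, bool nowait)
{
	struct ticket *tic;

	do {
		tic = try_alloc(sizeof(*tic));
	} while (!tic && !nowait);
	if (!tic)
		return NULL;

	tic->unit_bytes = unit_bytes;
	tic->cnt = cnt;
	return tic;
}

/* The caller turns "no memory right now" into -EAGAIN on the nowait path. */
static int log_reserve(int unit_bytes, int cnt, struct ticket **ticp,
		       bool nowait)
{
	struct ticket *tic;

	assert(*ticp == NULL);
	tic = ticket_alloc(unit_bytes, cnt, nowait);
	if (!tic)
		return -EAGAIN;
	*ticp = tic;
	return 0;
}
```

Callers that pass nowait == false keep the old contract and never observe
a NULL ticket.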



* [PATCH 15/29] xfs: don't wait for free space in xlog_grant_head_check() in nowait case
  2023-08-25 13:54 [PATCH RFC v5 00/29] io_uring getdents Hao Xu
                   ` (13 preceding siblings ...)
  2023-08-25 13:54 ` [PATCH 14/29] xfs: support nowait for xfs_log_reserve() Hao Xu
@ 2023-08-25 13:54 ` Hao Xu
  2023-08-25 13:54 ` [PATCH 16/29] xfs: add nowait parameter for xfs_inode_item_init() Hao Xu
                   ` (15 subsequent siblings)
  30 siblings, 0 replies; 39+ messages in thread
From: Hao Xu @ 2023-08-25 13:54 UTC (permalink / raw)
  To: io-uring, Jens Axboe
  Cc: Dominique Martinet, Pavel Begunkov, Christian Brauner,
	Alexander Viro, Stefan Roesch, Clay Harris, Dave Chinner,
	Darrick J . Wong, linux-fsdevel, linux-xfs, linux-ext4,
	linux-cachefs, ecryptfs, linux-nfs, linux-unionfs, bpf, netdev,
	linux-s390, linux-kernel, linux-block, linux-btrfs, codalist,
	linux-f2fs-devel, cluster-devel, linux-mm, linux-nilfs, devel,
	linux-cifs, samba-technical, linux-mtd, Wanpeng Li

From: Hao Xu <[email protected]>

Don't sleep waiting for more log space for a ticket in
xlog_grant_head_check() in the nowait case; return -EAGAIN instead.

Signed-off-by: Hao Xu <[email protected]>
---
 fs/xfs/xfs_log.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 90fbb1c0eca2..a2aabdd42a29 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -341,7 +341,8 @@ xlog_grant_head_check(
 	struct xlog		*log,
 	struct xlog_grant_head	*head,
 	struct xlog_ticket	*tic,
-	int			*need_bytes)
+	int			*need_bytes,
+	bool			nowait)
 {
 	int			free_bytes;
 	int			error = 0;
@@ -360,13 +361,15 @@ xlog_grant_head_check(
 		spin_lock(&head->lock);
 		if (!xlog_grant_head_wake(log, head, &free_bytes) ||
 		    free_bytes < *need_bytes) {
-			error = xlog_grant_head_wait(log, head, tic,
-						     *need_bytes);
+			error = nowait ?
+				-EAGAIN : xlog_grant_head_wait(log, head, tic,
+							       *need_bytes);
 		}
 		spin_unlock(&head->lock);
 	} else if (free_bytes < *need_bytes) {
 		spin_lock(&head->lock);
-		error = xlog_grant_head_wait(log, head, tic, *need_bytes);
+		error = nowait ? -EAGAIN : xlog_grant_head_wait(log, head, tic,
+								*need_bytes);
 		spin_unlock(&head->lock);
 	}
 
@@ -428,7 +431,7 @@ xfs_log_regrant(
 	trace_xfs_log_regrant(log, tic);
 
 	error = xlog_grant_head_check(log, &log->l_write_head, tic,
-				      &need_bytes);
+				      &need_bytes, false);
 	if (error)
 		goto out_error;
 
@@ -487,7 +490,7 @@ xfs_log_reserve(
 	trace_xfs_log_reserve(log, tic);
 
 	error = xlog_grant_head_check(log, &log->l_reserve_head, tic,
-				      &need_bytes);
+				      &need_bytes, nowait);
 	if (error)
 		goto out_error;
 
-- 
2.25.1



* [PATCH 16/29] xfs: add nowait parameter for xfs_inode_item_init()
  2023-08-25 13:54 [PATCH RFC v5 00/29] io_uring getdents Hao Xu
                   ` (14 preceding siblings ...)
  2023-08-25 13:54 ` [PATCH 15/29] xfs: don't wait for free space in xlog_grant_head_check() in nowait case Hao Xu
@ 2023-08-25 13:54 ` Hao Xu
  2023-08-25 13:54 ` [PATCH 17/29] xfs: make xfs_trans_ijoin() error out -EAGAIN Hao Xu
                   ` (14 subsequent siblings)
  30 siblings, 0 replies; 39+ messages in thread
From: Hao Xu @ 2023-08-25 13:54 UTC (permalink / raw)
  To: io-uring, Jens Axboe
  Cc: Dominique Martinet, Pavel Begunkov, Christian Brauner,
	Alexander Viro, Stefan Roesch, Clay Harris, Dave Chinner,
	Darrick J . Wong, linux-fsdevel, linux-xfs, linux-ext4,
	linux-cachefs, ecryptfs, linux-nfs, linux-unionfs, bpf, netdev,
	linux-s390, linux-kernel, linux-block, linux-btrfs, codalist,
	linux-f2fs-devel, cluster-devel, linux-mm, linux-nilfs, devel,
	linux-cifs, samba-technical, linux-mtd, Wanpeng Li

From: Hao Xu <[email protected]>

Add a nowait parameter to xfs_inode_item_init() to support nowait
semantics.

Signed-off-by: Hao Xu <[email protected]>
---
 fs/xfs/libxfs/xfs_trans_inode.c |  3 ++-
 fs/xfs/xfs_inode_item.c         | 12 ++++++++----
 fs/xfs/xfs_inode_item.h         |  3 ++-
 3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_trans_inode.c b/fs/xfs/libxfs/xfs_trans_inode.c
index cb4796b6e693..e7a8f63c8975 100644
--- a/fs/xfs/libxfs/xfs_trans_inode.c
+++ b/fs/xfs/libxfs/xfs_trans_inode.c
@@ -33,7 +33,8 @@ xfs_trans_ijoin(
 
 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
 	if (ip->i_itemp == NULL)
-		xfs_inode_item_init(ip, ip->i_mount);
+		xfs_inode_item_init(ip, ip->i_mount,
+				    tp->t_flags & XFS_TRANS_NOWAIT);
 	iip = ip->i_itemp;
 
 	ASSERT(iip->ili_lock_flags == 0);
diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index 91c847a84e10..1742920bb4ce 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -825,21 +825,25 @@ static const struct xfs_item_ops xfs_inode_item_ops = {
 /*
  * Initialize the inode log item for a newly allocated (in-core) inode.
  */
-void
+int
 xfs_inode_item_init(
 	struct xfs_inode	*ip,
-	struct xfs_mount	*mp)
+	struct xfs_mount	*mp,
+	bool			nowait)
 {
 	struct xfs_inode_log_item *iip;
+	gfp_t gfp_flags = GFP_KERNEL | (nowait ? 0 : __GFP_NOFAIL);
 
 	ASSERT(ip->i_itemp == NULL);
-	iip = ip->i_itemp = kmem_cache_zalloc(xfs_ili_cache,
-					      GFP_KERNEL | __GFP_NOFAIL);
+	iip = ip->i_itemp = kmem_cache_zalloc(xfs_ili_cache, gfp_flags);
+	if (!iip)
+		return -EAGAIN;
 
 	iip->ili_inode = ip;
 	spin_lock_init(&iip->ili_lock);
 	xfs_log_item_init(mp, &iip->ili_item, XFS_LI_INODE,
 						&xfs_inode_item_ops);
+	return 0;
 }
 
 /*
diff --git a/fs/xfs/xfs_inode_item.h b/fs/xfs/xfs_inode_item.h
index 377e06007804..7ba6f8a6b243 100644
--- a/fs/xfs/xfs_inode_item.h
+++ b/fs/xfs/xfs_inode_item.h
@@ -42,7 +42,8 @@ static inline int xfs_inode_clean(struct xfs_inode *ip)
 	return !ip->i_itemp || !(ip->i_itemp->ili_fields & XFS_ILOG_ALL);
 }
 
-extern void xfs_inode_item_init(struct xfs_inode *, struct xfs_mount *);
+extern int xfs_inode_item_init(struct xfs_inode *ip, struct xfs_mount *mp,
+			       bool nowait);
 extern void xfs_inode_item_destroy(struct xfs_inode *);
 extern void xfs_iflush_abort(struct xfs_inode *);
 extern void xfs_iflush_shutdown_abort(struct xfs_inode *);
-- 
2.25.1



* [PATCH 17/29] xfs: make xfs_trans_ijoin() error out -EAGAIN
  2023-08-25 13:54 [PATCH RFC v5 00/29] io_uring getdents Hao Xu
                   ` (15 preceding siblings ...)
  2023-08-25 13:54 ` [PATCH 16/29] xfs: add nowait parameter for xfs_inode_item_init() Hao Xu
@ 2023-08-25 13:54 ` Hao Xu
  2023-08-25 13:54 ` [PATCH 18/29] xfs: set XBF_NOWAIT for xfs_buf_read_map if necessary Hao Xu
                   ` (13 subsequent siblings)
  30 siblings, 0 replies; 39+ messages in thread
From: Hao Xu @ 2023-08-25 13:54 UTC (permalink / raw)
  To: io-uring, Jens Axboe
  Cc: Dominique Martinet, Pavel Begunkov, Christian Brauner,
	Alexander Viro, Stefan Roesch, Clay Harris, Dave Chinner,
	Darrick J . Wong, linux-fsdevel, linux-xfs, linux-ext4,
	linux-cachefs, ecryptfs, linux-nfs, linux-unionfs, bpf, netdev,
	linux-s390, linux-kernel, linux-block, linux-btrfs, codalist,
	linux-f2fs-devel, cluster-devel, linux-mm, linux-nilfs, devel,
	linux-cifs, samba-technical, linux-mtd, Wanpeng Li

From: Hao Xu <[email protected]>

Change the return type of xfs_trans_ijoin() from void to int so that it
can return -EAGAIN.

Signed-off-by: Hao Xu <[email protected]>
---
 fs/xfs/libxfs/xfs_trans_inode.c | 13 +++++++++----
 fs/xfs/xfs_iops.c               |  4 +++-
 fs/xfs/xfs_trans.h              |  2 +-
 3 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_trans_inode.c b/fs/xfs/libxfs/xfs_trans_inode.c
index e7a8f63c8975..7bda62bad90a 100644
--- a/fs/xfs/libxfs/xfs_trans_inode.c
+++ b/fs/xfs/libxfs/xfs_trans_inode.c
@@ -23,7 +23,7 @@
  * The inode must be locked, and it cannot be associated with any transaction.
  * If lock_flags is non-zero the inode will be unlocked on transaction commit.
  */
-void
+int
 xfs_trans_ijoin(
 	struct xfs_trans	*tp,
 	struct xfs_inode	*ip,
@@ -32,9 +32,12 @@ xfs_trans_ijoin(
 	struct xfs_inode_log_item *iip;
 
 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
-	if (ip->i_itemp == NULL)
-		xfs_inode_item_init(ip, ip->i_mount,
-				    tp->t_flags & XFS_TRANS_NOWAIT);
+	if (ip->i_itemp == NULL) {
+		int ret = xfs_inode_item_init(ip, ip->i_mount,
+					      tp->t_flags & XFS_TRANS_NOWAIT);
+		if (ret == -EAGAIN)
+			return ret;
+	}
 	iip = ip->i_itemp;
 
 	ASSERT(iip->ili_lock_flags == 0);
@@ -44,6 +47,8 @@ xfs_trans_ijoin(
 	/* Reset the per-tx dirty context and add the item to the tx. */
 	iip->ili_dirty_flags = 0;
 	xfs_trans_add_item(tp, &iip->ili_item);
+
+	return 0;
 }
 
 /*
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 47b4fd5f8f5c..034a8fea1f8e 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1075,7 +1075,9 @@ xfs_vn_update_time(
 	if (flags & S_ATIME)
 		inode->i_atime = *now;
 
-	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
+	error = xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
+	if (error)
+		goto out;
 	xfs_trans_log_inode(tp, ip, log_flags);
 	error = xfs_trans_commit(tp);
 
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index 6e3646d524ce..f2c05884c4b6 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -226,7 +226,7 @@ bool		xfs_trans_ordered_buf(xfs_trans_t *, struct xfs_buf *);
 void		xfs_trans_dquot_buf(xfs_trans_t *, struct xfs_buf *, uint);
 void		xfs_trans_inode_alloc_buf(xfs_trans_t *, struct xfs_buf *);
 void		xfs_trans_ichgtime(struct xfs_trans *, struct xfs_inode *, int);
-void		xfs_trans_ijoin(struct xfs_trans *, struct xfs_inode *, uint);
+int		xfs_trans_ijoin(struct xfs_trans *, struct xfs_inode *, uint);
 void		xfs_trans_log_buf(struct xfs_trans *, struct xfs_buf *, uint,
 				  uint);
 void		xfs_trans_dirty_buf(struct xfs_trans *, struct xfs_buf *);
-- 
2.25.1



* [PATCH 18/29] xfs: set XBF_NOWAIT for xfs_buf_read_map if necessary
  2023-08-25 13:54 [PATCH RFC v5 00/29] io_uring getdents Hao Xu
                   ` (16 preceding siblings ...)
  2023-08-25 13:54 ` [PATCH 17/29] xfs: make xfs_trans_ijoin() error out -EAGAIN Hao Xu
@ 2023-08-25 13:54 ` Hao Xu
  2023-08-25 13:54 ` [PATCH 19/29] xfs: support nowait memory allocation in _xfs_buf_alloc() Hao Xu
                   ` (12 subsequent siblings)
  30 siblings, 0 replies; 39+ messages in thread
From: Hao Xu @ 2023-08-25 13:54 UTC (permalink / raw)
  To: io-uring, Jens Axboe
  Cc: Dominique Martinet, Pavel Begunkov, Christian Brauner,
	Alexander Viro, Stefan Roesch, Clay Harris, Dave Chinner,
	Darrick J . Wong, linux-fsdevel, linux-xfs, linux-ext4,
	linux-cachefs, ecryptfs, linux-nfs, linux-unionfs, bpf, netdev,
	linux-s390, linux-kernel, linux-block, linux-btrfs, codalist,
	linux-f2fs-devel, cluster-devel, linux-mm, linux-nilfs, devel,
	linux-cifs, samba-technical, linux-mtd, Wanpeng Li

From: Hao Xu <[email protected]>

Set XBF_NOWAIT for xfs_buf_read_map() when the transaction has
XFS_TRANS_NOWAIT set.

Signed-off-by: Hao Xu <[email protected]>
---
 fs/xfs/xfs_trans_buf.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/xfs/xfs_trans_buf.c b/fs/xfs/xfs_trans_buf.c
index 6549e50d852c..016371f58f26 100644
--- a/fs/xfs/xfs_trans_buf.c
+++ b/fs/xfs/xfs_trans_buf.c
@@ -286,6 +286,8 @@ xfs_trans_read_buf_map(
 		return 0;
 	}
 
+	if (tp && (tp->t_flags & XFS_TRANS_NOWAIT))
+		flags |= XBF_NOWAIT;
 	error = xfs_buf_read_map(target, map, nmaps, flags, &bp, ops,
 			__return_address);
 	switch (error) {
-- 
2.25.1



* [PATCH 19/29] xfs: support nowait memory allocation in _xfs_buf_alloc()
  2023-08-25 13:54 [PATCH RFC v5 00/29] io_uring getdents Hao Xu
                   ` (17 preceding siblings ...)
  2023-08-25 13:54 ` [PATCH 18/29] xfs: set XBF_NOWAIT for xfs_buf_read_map if necessary Hao Xu
@ 2023-08-25 13:54 ` Hao Xu
  2023-08-25 13:54 ` [PATCH 20/29] xfs: distinguish error type of memory allocation failure for nowait case Hao Xu
                   ` (11 subsequent siblings)
  30 siblings, 0 replies; 39+ messages in thread
From: Hao Xu @ 2023-08-25 13:54 UTC (permalink / raw)
  To: io-uring, Jens Axboe
  Cc: Dominique Martinet, Pavel Begunkov, Christian Brauner,
	Alexander Viro, Stefan Roesch, Clay Harris, Dave Chinner,
	Darrick J . Wong, linux-fsdevel, linux-xfs, linux-ext4,
	linux-cachefs, ecryptfs, linux-nfs, linux-unionfs, bpf, netdev,
	linux-s390, linux-kernel, linux-block, linux-btrfs, codalist,
	linux-f2fs-devel, cluster-devel, linux-mm, linux-nilfs, devel,
	linux-cifs, samba-technical, linux-mtd, Wanpeng Li

From: Hao Xu <[email protected]>

Drop __GFP_NOFAIL from the gfp flags when XBF_NOWAIT is set, so that
_xfs_buf_alloc() can fail with -EAGAIN rather than retrying the
allocation indefinitely.

Signed-off-by: Hao Xu <[email protected]>
---
 fs/xfs/xfs_buf.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 9f84bc3b802c..8b800ce28996 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -220,9 +220,14 @@ _xfs_buf_alloc(
 	struct xfs_buf		*bp;
 	int			error;
 	int			i;
+	bool			nowait = flags & XBF_NOWAIT;
+	gfp_t			gfp_flags = GFP_NOFS |
+					    (nowait ? 0 : __GFP_NOFAIL);
 
 	*bpp = NULL;
-	bp = kmem_cache_zalloc(xfs_buf_cache, GFP_NOFS | __GFP_NOFAIL);
+	bp = kmem_cache_zalloc(xfs_buf_cache, gfp_flags);
+	if (!bp)
+		return -EAGAIN;
 
 	/*
 	 * We don't want certain flags to appear in b_flags unless they are
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 20/29] xfs: distinguish error type of memory allocation failure for nowait case
  2023-08-25 13:54 [PATCH RFC v5 00/29] io_uring getdents Hao Xu
                   ` (18 preceding siblings ...)
  2023-08-25 13:54 ` [PATCH 19/29] xfs: support nowait memory allocation in _xfs_buf_alloc() Hao Xu
@ 2023-08-25 13:54 ` Hao Xu
  2023-08-25 13:54 ` [PATCH 21/29] xfs: return -EAGAIN when bulk memory allocation fails in " Hao Xu
                   ` (10 subsequent siblings)
  30 siblings, 0 replies; 39+ messages in thread
From: Hao Xu @ 2023-08-25 13:54 UTC (permalink / raw)
  To: io-uring, Jens Axboe
  Cc: Dominique Martinet, Pavel Begunkov, Christian Brauner,
	Alexander Viro, Stefan Roesch, Clay Harris, Dave Chinner,
	Darrick J . Wong, linux-fsdevel, linux-xfs, linux-ext4,
	linux-cachefs, ecryptfs, linux-nfs, linux-unionfs, bpf, netdev,
	linux-s390, linux-kernel, linux-block, linux-btrfs, codalist,
	linux-f2fs-devel, cluster-devel, linux-mm, linux-nilfs, devel,
	linux-cifs, samba-technical, linux-mtd, Wanpeng Li

From: Hao Xu <[email protected]>

Previously, a failed memory allocation always returned -ENOMEM. Now
that nowait is supported, return -EAGAIN instead when XBF_NOWAIT is
set. The functions involved are _xfs_buf_map_pages(),
xfs_buf_get_maps(), xfs_buf_alloc_kmem() and xfs_buf_alloc_pages().

Signed-off-by: Hao Xu <[email protected]>
---
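For illustration only, the error selection can be modelled in userspace
as below. This is a sketch, not kernel code: SKETCH_NOWAIT,
sketch_alloc_errno() and sketch_alloc() are hypothetical names standing
in for XBF_NOWAIT and the xfs_buf allocation helpers, and malloc()
stands in for the kernel allocators.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for XBF_NOWAIT; the real flag lives in xfs_buf.h. */
#define SKETCH_NOWAIT	(1u << 0)

/*
 * Pick the errno for a failed allocation: nowait callers get a
 * retryable -EAGAIN, everyone else keeps the old -ENOMEM.
 */
static int sketch_alloc_errno(unsigned int flags)
{
	return (flags & SKETCH_NOWAIT) ? -EAGAIN : -ENOMEM;
}

/* Allocate a buffer and report failure with the errno chosen above. */
static int sketch_alloc(void **out, size_t size, unsigned int flags)
{
	*out = malloc(size);
	if (!*out)
		return sketch_alloc_errno(flags);
	return 0;
}
```

A caller that sees -EAGAIN is expected to retry from a context that may
block; -ENOMEM remains a hard failure.
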
 fs/xfs/xfs_buf.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 8b800ce28996..a6e6e64ff940 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -192,7 +192,7 @@ xfs_buf_get_maps(
 	bp->b_maps = kmem_zalloc(map_count * sizeof(struct xfs_buf_map),
 				KM_NOFS);
 	if (!bp->b_maps)
-		return -ENOMEM;
+		return bp->b_flags & XBF_NOWAIT ? -EAGAIN : -ENOMEM;
 	return 0;
 }
 
@@ -339,7 +339,7 @@ xfs_buf_alloc_kmem(
 
 	bp->b_addr = kmem_alloc(size, kmflag_mask);
 	if (!bp->b_addr)
-		return -ENOMEM;
+		return flags & XBF_NOWAIT ? -EAGAIN : -ENOMEM;
 
 	if (((unsigned long)(bp->b_addr + size - 1) & PAGE_MASK) !=
 	    ((unsigned long)bp->b_addr & PAGE_MASK)) {
@@ -363,6 +363,7 @@ xfs_buf_alloc_pages(
 {
 	gfp_t		gfp_mask = __GFP_NOWARN;
 	long		filled = 0;
+	bool		nowait = flags & XBF_NOWAIT;
 
 	if (flags & XBF_READ_AHEAD)
 		gfp_mask |= __GFP_NORETRY;
@@ -377,7 +378,7 @@ xfs_buf_alloc_pages(
 		bp->b_pages = kzalloc(sizeof(struct page *) * bp->b_page_count,
 					gfp_mask);
 		if (!bp->b_pages)
-			return -ENOMEM;
+			return nowait ? -EAGAIN : -ENOMEM;
 	}
 	bp->b_flags |= _XBF_PAGES;
 
@@ -451,7 +452,7 @@ _xfs_buf_map_pages(
 		memalloc_nofs_restore(nofs_flag);
 
 		if (!bp->b_addr)
-			return -ENOMEM;
+			return flags & XBF_NOWAIT ? -EAGAIN : -ENOMEM;
 	}
 
 	return 0;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 21/29] xfs: return -EAGAIN when bulk memory allocation fails in nowait case
  2023-08-25 13:54 [PATCH RFC v5 00/29] io_uring getdents Hao Xu
                   ` (19 preceding siblings ...)
  2023-08-25 13:54 ` [PATCH 20/29] xfs: distinguish error type of memory allocation failure for nowait case Hao Xu
@ 2023-08-25 13:54 ` Hao Xu
  2023-08-25 13:54 ` [PATCH 22/29] xfs: comment page allocation for nowait case in xfs_buf_find_insert() Hao Xu
                   ` (9 subsequent siblings)
  30 siblings, 0 replies; 39+ messages in thread
From: Hao Xu @ 2023-08-25 13:54 UTC (permalink / raw)
  To: io-uring, Jens Axboe
  Cc: Dominique Martinet, Pavel Begunkov, Christian Brauner,
	Alexander Viro, Stefan Roesch, Clay Harris, Dave Chinner,
	Darrick J . Wong, linux-fsdevel, linux-xfs, linux-ext4,
	linux-cachefs, ecryptfs, linux-nfs, linux-unionfs, bpf, netdev,
	linux-s390, linux-kernel, linux-block, linux-btrfs, codalist,
	linux-f2fs-devel, cluster-devel, linux-mm, linux-nilfs, devel,
	linux-cifs, samba-technical, linux-mtd, Wanpeng Li

From: Hao Xu <[email protected]>

Rather than waiting a moment and retrying, return -EAGAIN when the bulk
memory allocation in xfs_buf_alloc_pages() fails in the nowait case.

Signed-off-by: Hao Xu <[email protected]>
---
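The control flow can be sketched in userspace as below. This is only a
model under stated assumptions: sketch_fill(), sketch_bulk_fn and
sketch_stalling_bulk() are hypothetical names, the callback stands in
for the bulk page allocator, and the retry counter stands in for the
sleep that a blocking caller would take before retrying.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical bulk allocator: returns the new total of filled slots. */
typedef long (*sketch_bulk_fn)(long want, long filled, void *ctx);

static int sketch_fill(long want, bool nowait, sketch_bulk_fn bulk,
		       void *ctx, int *retries)
{
	long filled = 0;

	*retries = 0;
	for (;;) {
		long last = filled;

		filled = bulk(want, filled, ctx);
		if (filled == want)
			return 0;	/* every slot is populated */
		if (filled != last)
			continue;	/* partial progress, try again at once */

		/* No progress: a nowait caller gives up instead of sleeping. */
		if (nowait)
			return -EAGAIN;

		/* A blocking caller would sleep here before retrying. */
		(*retries)++;
	}
}

/* Test allocator: stalls *ctx times, then hands out one slot per call. */
static long sketch_stalling_bulk(long want, long filled, void *ctx)
{
	int *stalls = ctx;

	if (*stalls > 0) {
		(*stalls)--;
		return filled;
	}
	return filled < want ? filled + 1 : filled;
}
```

With an allocator that stalls, the nowait caller returns -EAGAIN on the
first stall, while the blocking caller keeps retrying until all slots
are filled.
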
 fs/xfs/xfs_buf.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index a6e6e64ff940..eb3cd7702545 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -404,6 +404,11 @@ xfs_buf_alloc_pages(
 		if (filled != last)
 			continue;
 
+		if (nowait) {
+			xfs_buf_free_pages(bp);
+			return -EAGAIN;
+		}
+
 		if (flags & XBF_READ_AHEAD) {
 			xfs_buf_free_pages(bp);
 			return -ENOMEM;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 22/29] xfs: comment page allocation for nowait case in xfs_buf_find_insert()
  2023-08-25 13:54 [PATCH RFC v5 00/29] io_uring getdents Hao Xu
                   ` (20 preceding siblings ...)
  2023-08-25 13:54 ` [PATCH 21/29] xfs: return -EAGAIN when bulk memory allocation fails in " Hao Xu
@ 2023-08-25 13:54 ` Hao Xu
  2023-08-25 14:09   ` Matthew Wilcox
  2023-08-25 13:54 ` [PATCH 23/29] xfs: don't print warn info for -EAGAIN error in xfs_buf_get_map() Hao Xu
                   ` (8 subsequent siblings)
  30 siblings, 1 reply; 39+ messages in thread
From: Hao Xu @ 2023-08-25 13:54 UTC (permalink / raw)
  To: io-uring, Jens Axboe
  Cc: Dominique Martinet, Pavel Begunkov, Christian Brauner,
	Alexander Viro, Stefan Roesch, Clay Harris, Dave Chinner,
	Darrick J . Wong, linux-fsdevel, linux-xfs, linux-ext4,
	linux-cachefs, ecryptfs, linux-nfs, linux-unionfs, bpf, netdev,
	linux-s390, linux-kernel, linux-block, linux-btrfs, codalist,
	linux-f2fs-devel, cluster-devel, linux-mm, linux-nilfs, devel,
	linux-cifs, samba-technical, linux-mtd, Wanpeng Li

From: Hao Xu <[email protected]>

Add a comment about page allocation in the nowait case in
xfs_buf_find_insert().

Signed-off-by: Hao Xu <[email protected]>
---
 fs/xfs/xfs_buf.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index eb3cd7702545..57bdc4c5dde1 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -633,6 +633,8 @@ xfs_buf_find_insert(
 	 * allocate the memory from the heap to minimise memory usage. If we
 	 * can't get heap memory for these small buffers, we fall back to using
 	 * the page allocator.
+	 * xfs_buf_alloc_kmem may return -EAGAIN, let's not return it but turn
+	 * to page allocator as well.
 	 */
 	if (BBTOB(new_bp->b_length) >= PAGE_SIZE ||
 	    xfs_buf_alloc_kmem(new_bp, flags) < 0) {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 23/29] xfs: don't print warn info for -EAGAIN error in xfs_buf_get_map()
  2023-08-25 13:54 [PATCH RFC v5 00/29] io_uring getdents Hao Xu
                   ` (21 preceding siblings ...)
  2023-08-25 13:54 ` [PATCH 22/29] xfs: comment page allocation for nowait case in xfs_buf_find_insert() Hao Xu
@ 2023-08-25 13:54 ` Hao Xu
  2023-08-25 13:54 ` [PATCH 24/29] xfs: support nowait for xfs_buf_read_map() Hao Xu
                   ` (7 subsequent siblings)
  30 siblings, 0 replies; 39+ messages in thread
From: Hao Xu @ 2023-08-25 13:54 UTC (permalink / raw)
  To: io-uring, Jens Axboe
  Cc: Dominique Martinet, Pavel Begunkov, Christian Brauner,
	Alexander Viro, Stefan Roesch, Clay Harris, Dave Chinner,
	Darrick J . Wong, linux-fsdevel, linux-xfs, linux-ext4,
	linux-cachefs, ecryptfs, linux-nfs, linux-unionfs, bpf, netdev,
	linux-s390, linux-kernel, linux-block, linux-btrfs, codalist,
	linux-f2fs-devel, cluster-devel, linux-mm, linux-nilfs, devel,
	linux-cifs, samba-technical, linux-mtd, Wanpeng Li

From: Hao Xu <[email protected]>

-EAGAIN is an internal error that indicates a retry, so there is no
need to print a warning for it.

Signed-off-by: Hao Xu <[email protected]>
---
 fs/xfs/xfs_buf.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 57bdc4c5dde1..cdad80e1ae25 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -730,9 +730,10 @@ xfs_buf_get_map(
 	if (!bp->b_addr) {
 		error = _xfs_buf_map_pages(bp, flags);
 		if (unlikely(error)) {
-			xfs_warn_ratelimited(btp->bt_mount,
-				"%s: failed to map %u pages", __func__,
-				bp->b_page_count);
+			if (error != -EAGAIN)
+				xfs_warn_ratelimited(btp->bt_mount,
+					"%s: failed to map %u pages", __func__,
+					bp->b_page_count);
 			xfs_buf_relse(bp);
 			return error;
 		}
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 24/29] xfs: support nowait for xfs_buf_read_map()
  2023-08-25 13:54 [PATCH RFC v5 00/29] io_uring getdents Hao Xu
                   ` (22 preceding siblings ...)
  2023-08-25 13:54 ` [PATCH 23/29] xfs: don't print warn info for -EAGAIN error in xfs_buf_get_map() Hao Xu
@ 2023-08-25 13:54 ` Hao Xu
  2023-08-25 21:53   ` Dave Chinner
  2023-08-25 13:54 ` [PATCH 25/29] xfs: support nowait for xfs_buf_item_init() Hao Xu
                   ` (6 subsequent siblings)
  30 siblings, 1 reply; 39+ messages in thread
From: Hao Xu @ 2023-08-25 13:54 UTC (permalink / raw)
  To: io-uring, Jens Axboe
  Cc: Dominique Martinet, Pavel Begunkov, Christian Brauner,
	Alexander Viro, Stefan Roesch, Clay Harris, Dave Chinner,
	Darrick J . Wong, linux-fsdevel, linux-xfs, linux-ext4,
	linux-cachefs, ecryptfs, linux-nfs, linux-unionfs, bpf, netdev,
	linux-s390, linux-kernel, linux-block, linux-btrfs, codalist,
	linux-f2fs-devel, cluster-devel, linux-mm, linux-nilfs, devel,
	linux-cifs, samba-technical, linux-mtd, Wanpeng Li

From: Hao Xu <[email protected]>

This causes xfstests generic/232 to hang in the umount process, waiting
for an AIL push, so I have commented it out for now. I need some hints
from the xfs folks.
Not a real patch.

Signed-off-by: Hao Xu <[email protected]>
---
 fs/xfs/xfs_buf.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index cdad80e1ae25..284962a9f31a 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -828,6 +828,13 @@ xfs_buf_read_map(
 	trace_xfs_buf_read(bp, flags, _RET_IP_);
 
 	if (!(bp->b_flags & XBF_DONE)) {
+//		/*
+//		 * Let's bypass the _xfs_buf_read() for now
+//		 */
+//		if (flags & XBF_NOWAIT) {
+//			xfs_buf_relse(bp);
+//			return -EAGAIN;
+//		}
 		/* Initiate the buffer read and wait. */
 		XFS_STATS_INC(target->bt_mount, xb_get_read);
 		bp->b_ops = ops;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 25/29] xfs: support nowait for xfs_buf_item_init()
  2023-08-25 13:54 [PATCH RFC v5 00/29] io_uring getdents Hao Xu
                   ` (23 preceding siblings ...)
  2023-08-25 13:54 ` [PATCH 24/29] xfs: support nowait for xfs_buf_read_map() Hao Xu
@ 2023-08-25 13:54 ` Hao Xu
  2023-08-25 22:16   ` Dave Chinner
  2023-08-25 13:54 ` [PATCH 26/29] xfs: return -EAGAIN when nowait meets sync in transaction commit Hao Xu
                   ` (5 subsequent siblings)
  30 siblings, 1 reply; 39+ messages in thread
From: Hao Xu @ 2023-08-25 13:54 UTC (permalink / raw)
  To: io-uring, Jens Axboe
  Cc: Dominique Martinet, Pavel Begunkov, Christian Brauner,
	Alexander Viro, Stefan Roesch, Clay Harris, Dave Chinner,
	Darrick J . Wong, linux-fsdevel, linux-xfs, linux-ext4,
	linux-cachefs, ecryptfs, linux-nfs, linux-unionfs, bpf, netdev,
	linux-s390, linux-kernel, linux-block, linux-btrfs, codalist,
	linux-f2fs-devel, cluster-devel, linux-mm, linux-nilfs, devel,
	linux-cifs, samba-technical, linux-mtd, Wanpeng Li

From: Hao Xu <[email protected]>

Support nowait for xfs_buf_item_init() and return -EAGAIN to
_xfs_trans_bjoin() when it would block.

Signed-off-by: Hao Xu <[email protected]>
---
 fs/xfs/xfs_buf_item.c         |  9 +++++++--
 fs/xfs/xfs_buf_item.h         |  2 +-
 fs/xfs/xfs_buf_item_recover.c |  2 +-
 fs/xfs/xfs_trans_buf.c        | 16 +++++++++++++---
 4 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index 023d4e0385dd..b1e63137d65b 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -827,7 +827,8 @@ xfs_buf_item_free_format(
 int
 xfs_buf_item_init(
 	struct xfs_buf	*bp,
-	struct xfs_mount *mp)
+	struct xfs_mount *mp,
+	bool   nowait)
 {
 	struct xfs_buf_log_item	*bip = bp->b_log_item;
 	int			chunks;
@@ -847,7 +848,11 @@ xfs_buf_item_init(
 		return 0;
 	}
 
-	bip = kmem_cache_zalloc(xfs_buf_item_cache, GFP_KERNEL | __GFP_NOFAIL);
+	bip = kmem_cache_zalloc(xfs_buf_item_cache,
+				GFP_KERNEL | (nowait ? 0 : __GFP_NOFAIL));
+	if (!bip)
+		return -EAGAIN;
+
 	xfs_log_item_init(mp, &bip->bli_item, XFS_LI_BUF, &xfs_buf_item_ops);
 	bip->bli_buf = bp;
 
diff --git a/fs/xfs/xfs_buf_item.h b/fs/xfs/xfs_buf_item.h
index 4d8a6aece995..b1daf8988280 100644
--- a/fs/xfs/xfs_buf_item.h
+++ b/fs/xfs/xfs_buf_item.h
@@ -47,7 +47,7 @@ struct xfs_buf_log_item {
 	struct xfs_buf_log_format __bli_format;	/* embedded in-log header */
 };
 
-int	xfs_buf_item_init(struct xfs_buf *, struct xfs_mount *);
+int	xfs_buf_item_init(struct xfs_buf *bp, struct xfs_mount *mp, bool nowait);
 void	xfs_buf_item_done(struct xfs_buf *bp);
 void	xfs_buf_item_relse(struct xfs_buf *);
 bool	xfs_buf_item_put(struct xfs_buf_log_item *);
diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c
index 43167f543afc..aa64d5a499d6 100644
--- a/fs/xfs/xfs_buf_item_recover.c
+++ b/fs/xfs/xfs_buf_item_recover.c
@@ -429,7 +429,7 @@ xlog_recover_validate_buf_type(
 		struct xfs_buf_log_item	*bip;
 
 		bp->b_flags |= _XBF_LOGRECOVERY;
-		xfs_buf_item_init(bp, mp);
+		xfs_buf_item_init(bp, mp, false);
 		bip = bp->b_log_item;
 		bip->bli_item.li_lsn = current_lsn;
 	}
diff --git a/fs/xfs/xfs_trans_buf.c b/fs/xfs/xfs_trans_buf.c
index 016371f58f26..a1e4f2e8629a 100644
--- a/fs/xfs/xfs_trans_buf.c
+++ b/fs/xfs/xfs_trans_buf.c
@@ -57,13 +57,14 @@ xfs_trans_buf_item_match(
  * If the buffer does not yet have a buf log item associated with it,
  * then allocate one for it.  Then add the buf item to the transaction.
  */
-STATIC void
+STATIC int
 _xfs_trans_bjoin(
 	struct xfs_trans	*tp,
 	struct xfs_buf		*bp,
 	int			reset_recur)
 {
 	struct xfs_buf_log_item	*bip;
+	int ret;
 
 	ASSERT(bp->b_transp == NULL);
 
@@ -72,7 +73,11 @@ _xfs_trans_bjoin(
 	 * it doesn't have one yet, then allocate one and initialize it.
 	 * The checks to see if one is there are in xfs_buf_item_init().
 	 */
-	xfs_buf_item_init(bp, tp->t_mountp);
+	ret = xfs_buf_item_init(bp, tp->t_mountp,
+				tp->t_flags & XFS_TRANS_NOWAIT);
+	if (ret < 0)
+		return ret;
+
 	bip = bp->b_log_item;
 	ASSERT(!(bip->bli_flags & XFS_BLI_STALE));
 	ASSERT(!(bip->__bli_format.blf_flags & XFS_BLF_CANCEL));
@@ -92,6 +97,7 @@ _xfs_trans_bjoin(
 	xfs_trans_add_item(tp, &bip->bli_item);
 	bp->b_transp = tp;
 
+	return 0;
 }
 
 void
@@ -309,7 +315,11 @@ xfs_trans_read_buf_map(
 	}
 
 	if (tp) {
-		_xfs_trans_bjoin(tp, bp, 1);
+		error = _xfs_trans_bjoin(tp, bp, 1);
+		if (error) {
+			xfs_buf_relse(bp);
+			return error;
+		}
 		trace_xfs_trans_read_buf(bp->b_log_item);
 	}
 	ASSERT(bp->b_ops != NULL || ops == NULL);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 26/29] xfs: return -EAGAIN when nowait meets sync in transaction commit
  2023-08-25 13:54 [PATCH RFC v5 00/29] io_uring getdents Hao Xu
                   ` (24 preceding siblings ...)
  2023-08-25 13:54 ` [PATCH 25/29] xfs: support nowait for xfs_buf_item_init() Hao Xu
@ 2023-08-25 13:54 ` Hao Xu
  2023-08-25 21:58   ` Dave Chinner
  2023-08-25 13:54 ` [PATCH 27/29] xfs: add a comment for xlog_kvmalloc() Hao Xu
                   ` (4 subsequent siblings)
  30 siblings, 1 reply; 39+ messages in thread
From: Hao Xu @ 2023-08-25 13:54 UTC (permalink / raw)
  To: io-uring, Jens Axboe
  Cc: Dominique Martinet, Pavel Begunkov, Christian Brauner,
	Alexander Viro, Stefan Roesch, Clay Harris, Dave Chinner,
	Darrick J . Wong, linux-fsdevel, linux-xfs, linux-ext4,
	linux-cachefs, ecryptfs, linux-nfs, linux-unionfs, bpf, netdev,
	linux-s390, linux-kernel, linux-block, linux-btrfs, codalist,
	linux-f2fs-devel, cluster-devel, linux-mm, linux-nilfs, devel,
	linux-cifs, samba-technical, linux-mtd, Wanpeng Li

From: Hao Xu <[email protected]>

If the log transaction is a sync one, fail the nowait attempt and
return -EAGAIN directly, since a sync transaction means blocking on IO.

Signed-off-by: Hao Xu <[email protected]>
---
 fs/xfs/xfs_trans.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 7988b4c7f36e..f1f84a3dd456 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -968,12 +968,24 @@ __xfs_trans_commit(
 	xfs_csn_t		commit_seq = 0;
 	int			error = 0;
 	int			sync = tp->t_flags & XFS_TRANS_SYNC;
+	bool			nowait = tp->t_flags & XFS_TRANS_NOWAIT;
+	bool			perm_log = tp->t_flags & XFS_TRANS_PERM_LOG_RES;
 
 	trace_xfs_trans_commit(tp, _RET_IP_);
 
+	if (nowait && sync) {
+		/*
+		 * Currently nowait is only from xfs_vn_update_time()
+		 * so perm_log is always false here, but let's make
+		 * code general.
+		 */
+		if (perm_log)
+			xfs_defer_cancel(tp);
+		goto out_unreserve;
+	}
 	error = xfs_trans_run_precommits(tp);
 	if (error) {
-		if (tp->t_flags & XFS_TRANS_PERM_LOG_RES)
+		if (perm_log)
 			xfs_defer_cancel(tp);
 		goto out_unreserve;
 	}
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 27/29] xfs: add a comment for xlog_kvmalloc()
  2023-08-25 13:54 [PATCH RFC v5 00/29] io_uring getdents Hao Xu
                   ` (25 preceding siblings ...)
  2023-08-25 13:54 ` [PATCH 26/29] xfs: return -EAGAIN when nowait meets sync in transaction commit Hao Xu
@ 2023-08-25 13:54 ` Hao Xu
  2023-08-25 13:54 ` [PATCH 28/29] xfs: support nowait semantics for xc_ctx_lock in xlog_cil_commit() Hao Xu
                   ` (3 subsequent siblings)
  30 siblings, 0 replies; 39+ messages in thread
From: Hao Xu @ 2023-08-25 13:54 UTC (permalink / raw)
  To: io-uring, Jens Axboe
  Cc: Dominique Martinet, Pavel Begunkov, Christian Brauner,
	Alexander Viro, Stefan Roesch, Clay Harris, Dave Chinner,
	Darrick J . Wong, linux-fsdevel, linux-xfs, linux-ext4,
	linux-cachefs, ecryptfs, linux-nfs, linux-unionfs, bpf, netdev,
	linux-s390, linux-kernel, linux-block, linux-btrfs, codalist,
	linux-f2fs-devel, cluster-devel, linux-mm, linux-nilfs, devel,
	linux-cifs, samba-technical, linux-mtd, Wanpeng Li

From: Hao Xu <[email protected]>

Does vmalloc() always succeed on 64-bit systems?
Not a real patch.

Signed-off-by: Hao Xu <[email protected]>
---
 fs/xfs/xfs_log_cil.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index f17c1799b3c4..b31830ee36dd 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -335,6 +335,9 @@ xlog_cil_alloc_shadow_bufs(
 			 * storage.
 			 */
 			kmem_free(lip->li_lv_shadow);
+			/*
+			 * May this be indefinite loop in nowait case?
+			 */
 			lv = xlog_kvmalloc(buf_size);
 
 			memset(lv, 0, xlog_cil_iovec_space(niovecs));
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 28/29] xfs: support nowait semantics for xc_ctx_lock in xlog_cil_commit()
  2023-08-25 13:54 [PATCH RFC v5 00/29] io_uring getdents Hao Xu
                   ` (26 preceding siblings ...)
  2023-08-25 13:54 ` [PATCH 27/29] xfs: add a comment for xlog_kvmalloc() Hao Xu
@ 2023-08-25 13:54 ` Hao Xu
  2023-08-25 21:59   ` Dave Chinner
  2023-08-25 13:54 ` [PATCH 29/29] io_uring: add support for getdents Hao Xu
                   ` (2 subsequent siblings)
  30 siblings, 1 reply; 39+ messages in thread
From: Hao Xu @ 2023-08-25 13:54 UTC (permalink / raw)
  To: io-uring, Jens Axboe
  Cc: Dominique Martinet, Pavel Begunkov, Christian Brauner,
	Alexander Viro, Stefan Roesch, Clay Harris, Dave Chinner,
	Darrick J . Wong, linux-fsdevel, linux-xfs, linux-ext4,
	linux-cachefs, ecryptfs, linux-nfs, linux-unionfs, bpf, netdev,
	linux-s390, linux-kernel, linux-block, linux-btrfs, codalist,
	linux-f2fs-devel, cluster-devel, linux-mm, linux-nilfs, devel,
	linux-cifs, samba-technical, linux-mtd, Wanpeng Li

From: Hao Xu <[email protected]>

Use trylock for xc_ctx_lock in xlog_cil_commit() in the nowait case,
and return -EAGAIN from xlog_cil_commit() if the lock cannot be taken.

Signed-off-by: Hao Xu <[email protected]>
---
 fs/xfs/xfs_log_cil.c  | 12 ++++++++++--
 fs/xfs/xfs_log_priv.h |  2 +-
 fs/xfs/xfs_trans.c    |  4 +++-
 3 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index b31830ee36dd..6d054359bbb5 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -1613,7 +1613,7 @@ xlog_cil_process_intents(
  * background commit, returns without it held once background commits are
  * allowed again.
  */
-void
+int
 xlog_cil_commit(
 	struct xlog		*log,
 	struct xfs_trans	*tp,
@@ -1623,6 +1623,7 @@ xlog_cil_commit(
 	struct xfs_cil		*cil = log->l_cilp;
 	struct xfs_log_item	*lip, *next;
 	uint32_t		released_space = 0;
+	bool			nowait = tp->t_flags & XFS_TRANS_NOWAIT;
 
 	/*
 	 * Do all necessary memory allocation before we lock the CIL.
@@ -1632,7 +1633,12 @@ xlog_cil_commit(
 	xlog_cil_alloc_shadow_bufs(log, tp);
 
 	/* lock out background commit */
-	down_read(&cil->xc_ctx_lock);
+	if (nowait) {
+		if (!down_read_trylock(&cil->xc_ctx_lock))
+			return -EAGAIN;
+	} else {
+		down_read(&cil->xc_ctx_lock);
+	}
 
 	if (tp->t_flags & XFS_TRANS_HAS_INTENT_DONE)
 		released_space = xlog_cil_process_intents(cil, tp);
@@ -1668,6 +1674,8 @@ xlog_cil_commit(
 
 	/* xlog_cil_push_background() releases cil->xc_ctx_lock */
 	xlog_cil_push_background(log);
+
+	return 0;
 }
 
 /*
diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
index 41edaa0ae869..eb7a1241deab 100644
--- a/fs/xfs/xfs_log_priv.h
+++ b/fs/xfs/xfs_log_priv.h
@@ -580,7 +580,7 @@ int	xlog_cil_init(struct xlog *log);
 void	xlog_cil_init_post_recovery(struct xlog *log);
 void	xlog_cil_destroy(struct xlog *log);
 bool	xlog_cil_empty(struct xlog *log);
-void	xlog_cil_commit(struct xlog *log, struct xfs_trans *tp,
+int	xlog_cil_commit(struct xlog *log, struct xfs_trans *tp,
 			xfs_csn_t *commit_seq, bool regrant);
 void	xlog_cil_set_ctx_write_state(struct xfs_cil_ctx *ctx,
 			struct xlog_in_core *iclog);
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index f1f84a3dd456..e5beda636a37 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -1037,7 +1037,9 @@ __xfs_trans_commit(
 		xfs_trans_apply_sb_deltas(tp);
 	xfs_trans_apply_dquot_deltas(tp);
 
-	xlog_cil_commit(log, tp, &commit_seq, regrant);
+	error = xlog_cil_commit(log, tp, &commit_seq, regrant);
+	if (error)
+		goto out_unreserve;
 
 	xfs_trans_free(tp);
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 29/29] io_uring: add support for getdents
  2023-08-25 13:54 [PATCH RFC v5 00/29] io_uring getdents Hao Xu
                   ` (27 preceding siblings ...)
  2023-08-25 13:54 ` [PATCH 28/29] xfs: support nowait semantics for xc_ctx_lock in xlog_cil_commit() Hao Xu
@ 2023-08-25 13:54 ` Hao Xu
  2023-08-25 15:11 ` [PATCH RFC v5 00/29] io_uring getdents Darrick J. Wong
  2023-08-25 22:53 ` Dave Chinner
  30 siblings, 0 replies; 39+ messages in thread
From: Hao Xu @ 2023-08-25 13:54 UTC (permalink / raw)
  To: io-uring, Jens Axboe
  Cc: Dominique Martinet, Pavel Begunkov, Christian Brauner,
	Alexander Viro, Stefan Roesch, Clay Harris, Dave Chinner,
	Darrick J . Wong, linux-fsdevel, linux-xfs, linux-ext4,
	linux-cachefs, ecryptfs, linux-nfs, linux-unionfs, bpf, netdev,
	linux-s390, linux-kernel, linux-block, linux-btrfs, codalist,
	linux-f2fs-devel, cluster-devel, linux-mm, linux-nilfs, devel,
	linux-cifs, samba-technical, linux-mtd, Wanpeng Li

From: Hao Xu <[email protected]>

This adds support for getdents64 to io_uring, acting exactly like the
syscall: the directory is iterated from its current position as stored
in the file struct, and the file's position is updated exactly as if
getdents64 had been called.

For filesystems that support NOWAIT in iterate_shared(), try to use it
first; if a user already knows that the filesystem they use does not
support nowait, they can force async through IOSQE_ASYNC in the sqe
flags, avoiding the need to bounce back through a useless EAGAIN
return.

Co-developed-by: Dominique Martinet <[email protected]>
Signed-off-by: Dominique Martinet <[email protected]>
Signed-off-by: Hao Xu <[email protected]>
---
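The issue policy described above can be modelled in userspace roughly
as follows. This is a sketch under stated assumptions, not the io_uring
implementation: struct sketch_file, sketch_issue() and sketch_submit()
are hypothetical names, supports_nowait models FMODE_NOWAIT,
force_async models IOSQE_ASYNC, and the second call models the offload
to an io-wq thread.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical description of a directory file; not a kernel structure. */
struct sketch_file {
	bool supports_nowait;	/* models FMODE_NOWAIT */
	bool would_block;	/* models a contended lock or uncached data */
	int  entries;		/* result of a successful read */
};

/* One issue attempt: a non-blocking attempt may bail out with -EAGAIN. */
static int sketch_issue(const struct sketch_file *f, bool nonblock)
{
	if (nonblock && (!f->supports_nowait || f->would_block))
		return -EAGAIN;
	return f->entries;
}

/*
 * Try non-blocking first unless the caller forces async, then fall
 * back to a blocking attempt. *attempts counts the issue calls made.
 */
static int sketch_submit(const struct sketch_file *f, bool force_async,
			 int *attempts)
{
	int ret;

	*attempts = 0;
	if (!force_async) {
		(*attempts)++;
		ret = sketch_issue(f, true);
		if (ret != -EAGAIN)
			return ret;
	}
	(*attempts)++;
	return sketch_issue(f, false);
}
```

For a file without nowait support, forcing async saves the first,
pointless attempt: one issue call instead of two.
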
 include/uapi/linux/io_uring.h |  1 +
 io_uring/fs.c                 | 53 +++++++++++++++++++++++++++++++++++
 io_uring/fs.h                 |  3 ++
 io_uring/opdef.c              |  8 ++++++
 4 files changed, 65 insertions(+)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 8e61f8b7c2ce..3896397a1998 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -240,6 +240,7 @@ enum io_uring_op {
 	IORING_OP_URING_CMD,
 	IORING_OP_SEND_ZC,
 	IORING_OP_SENDMSG_ZC,
+	IORING_OP_GETDENTS,
 
 	/* this goes last, obviously */
 	IORING_OP_LAST,
diff --git a/io_uring/fs.c b/io_uring/fs.c
index f6a69a549fd4..04711feac4e6 100644
--- a/io_uring/fs.c
+++ b/io_uring/fs.c
@@ -47,6 +47,12 @@ struct io_link {
 	int				flags;
 };
 
+struct io_getdents {
+	struct file			*file;
+	struct linux_dirent64 __user	*dirent;
+	unsigned int			count;
+};
+
 int io_renameat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
 	struct io_rename *ren = io_kiocb_to_cmd(req, struct io_rename);
@@ -291,3 +297,50 @@ void io_link_cleanup(struct io_kiocb *req)
 	putname(sl->oldpath);
 	putname(sl->newpath);
 }
+
+int io_getdents_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+{
+	struct io_getdents *gd = io_kiocb_to_cmd(req, struct io_getdents);
+
+	if (READ_ONCE(sqe->off))
+		return -EINVAL;
+
+	gd->dirent = u64_to_user_ptr(READ_ONCE(sqe->addr));
+	gd->count = READ_ONCE(sqe->len);
+
+	return 0;
+}
+
+int io_getdents(struct io_kiocb *req, unsigned int issue_flags)
+{
+	struct io_getdents *gd = io_kiocb_to_cmd(req, struct io_getdents);
+	struct file *file = req->file;
+	unsigned long getdents_flags = 0;
+	bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
+	bool locked;
+	int ret;
+
+	if (force_nonblock) {
+		if (!(file->f_flags & O_NONBLOCK) &&
+		    !(file->f_mode & FMODE_NOWAIT))
+			return -EAGAIN;
+
+		getdents_flags = DIR_CONTEXT_F_NOWAIT;
+	}
+
+	ret = file_pos_lock_nowait(file, force_nonblock);
+	if (ret == -EAGAIN)
+		return ret;
+	locked = ret;
+
+	ret = vfs_getdents(file, gd->dirent, gd->count, getdents_flags);
+	if (locked)
+		file_pos_unlock(file);
+
+	if (ret == -EAGAIN && force_nonblock)
+		return -EAGAIN;
+
+	io_req_set_res(req, ret, 0);
+	return 0;
+}
+
diff --git a/io_uring/fs.h b/io_uring/fs.h
index 0bb5efe3d6bb..f83a6f3a678d 100644
--- a/io_uring/fs.h
+++ b/io_uring/fs.h
@@ -18,3 +18,6 @@ int io_symlinkat(struct io_kiocb *req, unsigned int issue_flags);
 int io_linkat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
 int io_linkat(struct io_kiocb *req, unsigned int issue_flags);
 void io_link_cleanup(struct io_kiocb *req);
+
+int io_getdents_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
+int io_getdents(struct io_kiocb *req, unsigned int issue_flags);
diff --git a/io_uring/opdef.c b/io_uring/opdef.c
index 3b9c6489b8b6..1bae6b2a8d0b 100644
--- a/io_uring/opdef.c
+++ b/io_uring/opdef.c
@@ -428,6 +428,11 @@ const struct io_issue_def io_issue_defs[] = {
 		.prep			= io_eopnotsupp_prep,
 #endif
 	},
+	[IORING_OP_GETDENTS] = {
+		.needs_file		= 1,
+		.prep			= io_getdents_prep,
+		.issue			= io_getdents,
+	},
 };
 
 
@@ -648,6 +653,9 @@ const struct io_cold_def io_cold_defs[] = {
 		.fail			= io_sendrecv_fail,
 #endif
 	},
+	[IORING_OP_GETDENTS] = {
+		.name			= "GETDENTS",
+	},
 };
 
 const char *io_uring_get_opcode(u8 opcode)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH 22/29] xfs: comment page allocation for nowait case in xfs_buf_find_insert()
  2023-08-25 13:54 ` [PATCH 22/29] xfs: comment page allocation for nowait case in xfs_buf_find_insert() Hao Xu
@ 2023-08-25 14:09   ` Matthew Wilcox
  0 siblings, 0 replies; 39+ messages in thread
From: Matthew Wilcox @ 2023-08-25 14:09 UTC (permalink / raw)
  To: Hao Xu
  Cc: io-uring, Jens Axboe, Dominique Martinet, Pavel Begunkov,
	Christian Brauner, Alexander Viro, Stefan Roesch, Clay Harris,
	Dave Chinner, Darrick J . Wong, linux-fsdevel, linux-xfs,
	linux-ext4, linux-cachefs, ecryptfs, linux-nfs, linux-unionfs,
	bpf, netdev, linux-s390, linux-kernel, linux-block, linux-btrfs,
	codalist, linux-f2fs-devel, cluster-devel, linux-mm, linux-nilfs,
	devel, linux-cifs, samba-technical, linux-mtd, Wanpeng Li

On Fri, Aug 25, 2023 at 09:54:24PM +0800, Hao Xu wrote:
> @@ -633,6 +633,8 @@ xfs_buf_find_insert(
>  	 * allocate the memory from the heap to minimise memory usage. If we
>  	 * can't get heap memory for these small buffers, we fall back to using
>  	 * the page allocator.
> +	 * xfs_buf_alloc_kmem may return -EAGAIN, let's not return it but turn
> +	 * to page allocator as well.

This new sentence seems like it says exactly the same thing as the
previous sentence.  What am I missing?


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 12/29] xfs: enforce GFP_NOIO implicitly during nowait time update
  2023-08-25 13:54 ` [PATCH 12/29] xfs: enforce GFP_NOIO implicitly during nowait time update Hao Xu
@ 2023-08-25 14:20   ` Matthew Wilcox
  0 siblings, 0 replies; 39+ messages in thread
From: Matthew Wilcox @ 2023-08-25 14:20 UTC (permalink / raw)
  To: Hao Xu
  Cc: io-uring, Jens Axboe, Dominique Martinet, Pavel Begunkov,
	Christian Brauner, Alexander Viro, Stefan Roesch, Clay Harris,
	Dave Chinner, Darrick J . Wong, linux-fsdevel, linux-xfs,
	linux-ext4, linux-cachefs, ecryptfs, linux-nfs, linux-unionfs,
	bpf, netdev, linux-s390, linux-kernel, linux-block, linux-btrfs,
	codalist, linux-f2fs-devel, cluster-devel, linux-mm, linux-nilfs,
	devel, linux-cifs, samba-technical, linux-mtd, Wanpeng Li

On Fri, Aug 25, 2023 at 09:54:14PM +0800, Hao Xu wrote:
> +++ b/fs/xfs/xfs_iops.c
> @@ -1037,6 +1037,8 @@ xfs_vn_update_time(
>  	int			log_flags = XFS_ILOG_TIMESTAMP;
>  	struct xfs_trans	*tp;
>  	int			error;
> +	int			old_pflags;
> +	bool			nowait = flags & S_NOWAIT;
>  
>  	trace_xfs_update_time(ip);
>  
> @@ -1049,13 +1051,18 @@ xfs_vn_update_time(
>  		log_flags |= XFS_ILOG_CORE;
>  	}
>  
> +	if (nowait)
> +		old_pflags = memalloc_noio_save();
> +
>  	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_fsyncts, 0, 0, 0, &tp);

This is an abuse of the memalloc_noio_save() interface.  You shouldn't
be setting it around individual allocations; it's the part of the kernel
which decides "I can't afford to do I/O" that should be setting it.
In this case, it should probably be set by io_uring, way way way up at
the top.

But Jens didn't actually answer my question about that:

https://lore.kernel.org/all/[email protected]/


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH RFC v5 00/29] io_uring getdents
  2023-08-25 13:54 [PATCH RFC v5 00/29] io_uring getdents Hao Xu
                   ` (28 preceding siblings ...)
  2023-08-25 13:54 ` [PATCH 29/29] io_uring: add support for getdents Hao Xu
@ 2023-08-25 15:11 ` Darrick J. Wong
  2023-08-25 22:53 ` Dave Chinner
  30 siblings, 0 replies; 39+ messages in thread
From: Darrick J. Wong @ 2023-08-25 15:11 UTC (permalink / raw)
  To: Hao Xu
  Cc: io-uring, Jens Axboe, Dominique Martinet, Pavel Begunkov,
	Christian Brauner, Alexander Viro, Stefan Roesch, Clay Harris,
	Dave Chinner, linux-fsdevel, linux-xfs, linux-ext4, linux-cachefs,
	ecryptfs, linux-nfs, linux-unionfs, bpf, netdev, linux-s390,
	linux-kernel, linux-block, linux-btrfs, codalist,
	linux-f2fs-devel, cluster-devel, linux-mm, linux-nilfs, devel,
	linux-cifs, samba-technical, linux-mtd, Wanpeng Li

On Fri, Aug 25, 2023 at 09:54:02PM +0800, Hao Xu wrote:
> From: Hao Xu <[email protected]>
> 
> This series introduce getdents64 to io_uring, the code logic is similar
> with the synchronized version's. It first try nowait issue, and offload
> it to io-wq threads if the first try fails.

NAK on the entire series until Jens actually writes down what NOWAIT
does, so that we can check that the *existing* nowait code branches
actually behave how he says it should.

https://lore.kernel.org/all/[email protected]/

--D

> 
> Patch1 and Patch2 are some preparation
> Patch3 supports nowait for xfs getdents code
> Patch4-11 are vfs change, include adding helpers and trylock for locks
> Patch12-29 supports nowait for involved xfs journal stuff
> note, Patch24 and 27 are actually two questions, might be removed later.
> an xfs test may come later.
> 
> Tests I've done:
> a liburing test case for functional test:
> https://github.com/HowHsu/liburing/commit/39dc9a8e19c06a8cebf8c2301b85320eb45c061e?diff=unified
> 
> xfstests:
>     test/generic: 1 fails and 171 not run
>     test/xfs: 72 fails and 156 not run
> run the code before without this patchset, same result.
> I'll try to make the environment more right to run more tests here.
> 
> 
> Tested it with a liburing performance test:
> https://github.com/HowHsu/liburing/blob/getdents/test/getdents2.c
> 
> The test is controlled by the below script[2] which runs getdents2.t 100
> times and calculate the avg.
> The result show that io_uring version is about 2.6% faster:
> 
> note:
> [1] the number of getdents call/request in io_uring and normal sync version
> are made sure to be same beforehand.
> 
> [2] run_getdents.py
> 
> ```python3
> 
> import subprocess
> 
> N = 100
> sum = 0.0
> args = ["/data/home/howeyxu/tmpdir", "sync"]
> 
> for i in range(N):
>     output = subprocess.check_output(["./liburing/test/getdents2.t"] + args)
>     sum += float(output)
> 
> average = sum / N
> print("Average of sync:", average)
> 
> sum = 0.0
> args = ["/data/home/howeyxu/tmpdir", "iouring"]
> 
> for i in range(N):
>     output = subprocess.check_output(["./liburing/test/getdents2.t"] + args)
>     sum += float(output)
> 
> average = sum / N
> print("Average of iouring:", average)
> 
> ```
> 
> v4->v5:
>  - move atime update to the beginning of getdents operation
>  - trylock for i_rwsem
>  - nowait semantics for involved xfs journal stuff
> 
> v3->v4:
>  - add Dave's xfs nowait code and fix a deadlock problem, with some code
>    style tweak.
>  - disable fixed file to avoid a race problem for now
>  - add a test program.
> 
> v2->v3:
>  - removed the kernfs patches
>  - add f_pos_lock logic
>  - remove the "reduce last EOF getdents try" optimization since
>    Dominique reports that doesn't make difference
>  - remove the rewind logic, I think the right way is to introduce lseek
>    to io_uring not to patch this logic to getdents.
>  - add Signed-off-by of Stefan Roesch for patch 1 since checkpatch
>    complained that Co-developed-by someone should be accompanied with
>    Signed-off-by same person, I can remove them if Stefan thinks that's
>    not proper.
> 
> 
> Dominique Martinet (1):
>   fs: split off vfs_getdents function of getdents64 syscall
> 
> Hao Xu (28):
>   xfs: rename XBF_TRYLOCK to XBF_NOWAIT
>   xfs: add NOWAIT semantics for readdir
>   vfs: add nowait flag for struct dir_context
>   vfs: add a vfs helper for io_uring file pos lock
>   vfs: add file_pos_unlock() for io_uring usage
>   vfs: add a nowait parameter for touch_atime()
>   vfs: add nowait parameter for file_accessed()
>   vfs: move file_accessed() to the beginning of iterate_dir()
>   vfs: add S_NOWAIT for nowait time update
>   vfs: trylock inode->i_rwsem in iterate_dir() to support nowait
>   xfs: enforce GFP_NOIO implicitly during nowait time update
>   xfs: make xfs_trans_alloc() support nowait semantics
>   xfs: support nowait for xfs_log_reserve()
>   xfs: don't wait for free space in xlog_grant_head_check() in nowait
>     case
>   xfs: add nowait parameter for xfs_inode_item_init()
>   xfs: make xfs_trans_ijoin() error out -EAGAIN
>   xfs: set XBF_NOWAIT for xfs_buf_read_map if necessary
>   xfs: support nowait memory allocation in _xfs_buf_alloc()
>   xfs: distinguish error type of memory allocation failure for nowait
>     case
>   xfs: return -EAGAIN when bulk memory allocation fails in nowait case
>   xfs: comment page allocation for nowait case in xfs_buf_find_insert()
>   xfs: don't print warn info for -EAGAIN error in  xfs_buf_get_map()
>   xfs: support nowait for xfs_buf_read_map()
>   xfs: support nowait for xfs_buf_item_init()
>   xfs: return -EAGAIN when nowait meets sync in transaction commit
>   xfs: add a comment for xlog_kvmalloc()
>   xfs: support nowait semantics for xc_ctx_lock in xlog_cil_commit()
>   io_uring: add support for getdents
> 
>  arch/s390/hypfs/inode.c         |  2 +-
>  block/fops.c                    |  2 +-
>  fs/btrfs/file.c                 |  2 +-
>  fs/btrfs/inode.c                |  2 +-
>  fs/cachefiles/namei.c           |  2 +-
>  fs/coda/dir.c                   |  4 +--
>  fs/ecryptfs/file.c              |  4 +--
>  fs/ext2/file.c                  |  4 +--
>  fs/ext4/file.c                  |  6 ++--
>  fs/f2fs/file.c                  |  4 +--
>  fs/file.c                       | 13 +++++++
>  fs/fuse/dax.c                   |  2 +-
>  fs/fuse/file.c                  |  4 +--
>  fs/gfs2/file.c                  |  2 +-
>  fs/hugetlbfs/inode.c            |  2 +-
>  fs/inode.c                      | 10 +++---
>  fs/internal.h                   |  8 +++++
>  fs/namei.c                      |  4 +--
>  fs/nfsd/vfs.c                   |  2 +-
>  fs/nilfs2/file.c                |  2 +-
>  fs/orangefs/file.c              |  2 +-
>  fs/orangefs/inode.c             |  2 +-
>  fs/overlayfs/file.c             |  2 +-
>  fs/overlayfs/inode.c            |  2 +-
>  fs/pipe.c                       |  2 +-
>  fs/ramfs/file-nommu.c           |  2 +-
>  fs/readdir.c                    | 61 +++++++++++++++++++++++++--------
>  fs/smb/client/cifsfs.c          |  2 +-
>  fs/splice.c                     |  2 +-
>  fs/stat.c                       |  2 +-
>  fs/ubifs/file.c                 |  2 +-
>  fs/udf/file.c                   |  2 +-
>  fs/xfs/libxfs/xfs_alloc.c       |  2 +-
>  fs/xfs/libxfs/xfs_attr_remote.c |  2 +-
>  fs/xfs/libxfs/xfs_btree.c       |  2 +-
>  fs/xfs/libxfs/xfs_da_btree.c    | 16 +++++++++
>  fs/xfs/libxfs/xfs_da_btree.h    |  1 +
>  fs/xfs/libxfs/xfs_dir2_block.c  |  7 ++--
>  fs/xfs/libxfs/xfs_dir2_priv.h   |  2 +-
>  fs/xfs/libxfs/xfs_shared.h      |  2 ++
>  fs/xfs/libxfs/xfs_trans_inode.c | 12 +++++--
>  fs/xfs/scrub/dir.c              |  2 +-
>  fs/xfs/scrub/readdir.c          |  2 +-
>  fs/xfs/scrub/repair.c           |  2 +-
>  fs/xfs/xfs_buf.c                | 43 +++++++++++++++++------
>  fs/xfs/xfs_buf.h                |  4 +--
>  fs/xfs/xfs_buf_item.c           |  9 +++--
>  fs/xfs/xfs_buf_item.h           |  2 +-
>  fs/xfs/xfs_buf_item_recover.c   |  2 +-
>  fs/xfs/xfs_dir2_readdir.c       | 49 ++++++++++++++++++++------
>  fs/xfs/xfs_dquot.c              |  2 +-
>  fs/xfs/xfs_file.c               |  6 ++--
>  fs/xfs/xfs_inode.c              | 27 +++++++++++++++
>  fs/xfs/xfs_inode.h              | 17 +++++----
>  fs/xfs/xfs_inode_item.c         | 12 ++++---
>  fs/xfs/xfs_inode_item.h         |  3 +-
>  fs/xfs/xfs_iops.c               | 31 ++++++++++++++---
>  fs/xfs/xfs_log.c                | 33 ++++++++++++------
>  fs/xfs/xfs_log.h                |  5 +--
>  fs/xfs/xfs_log_cil.c            | 17 +++++++--
>  fs/xfs/xfs_log_priv.h           |  4 +--
>  fs/xfs/xfs_trans.c              | 44 ++++++++++++++++++++----
>  fs/xfs/xfs_trans.h              |  2 +-
>  fs/xfs/xfs_trans_buf.c          | 18 ++++++++--
>  fs/zonefs/file.c                |  4 +--
>  include/linux/file.h            |  7 ++++
>  include/linux/fs.h              | 16 +++++++--
>  include/uapi/linux/io_uring.h   |  1 +
>  io_uring/fs.c                   | 53 ++++++++++++++++++++++++++++
>  io_uring/fs.h                   |  3 ++
>  io_uring/opdef.c                |  8 +++++
>  kernel/bpf/inode.c              |  4 +--
>  mm/filemap.c                    |  8 ++---
>  mm/shmem.c                      |  6 ++--
>  net/unix/af_unix.c              |  4 +--
>  75 files changed, 499 insertions(+), 161 deletions(-)
> 
> -- 
> 2.25.1
> 


* Re: [PATCH 02/29] xfs: rename XBF_TRYLOCK to XBF_NOWAIT
  2023-08-25 13:54 ` [PATCH 02/29] xfs: rename XBF_TRYLOCK to XBF_NOWAIT Hao Xu
@ 2023-08-25 21:39   ` Dave Chinner
  0 siblings, 0 replies; 39+ messages in thread
From: Dave Chinner @ 2023-08-25 21:39 UTC (permalink / raw)
  To: Hao Xu
  Cc: io-uring, Jens Axboe, Dominique Martinet, Pavel Begunkov,
	Christian Brauner, Alexander Viro, Stefan Roesch, Clay Harris,
	Darrick J . Wong, linux-fsdevel, linux-xfs, linux-ext4,
	linux-cachefs, ecryptfs, linux-nfs, linux-unionfs, bpf, netdev,
	linux-s390, linux-kernel, linux-block, linux-btrfs, codalist,
	linux-f2fs-devel, cluster-devel, linux-mm, linux-nilfs, devel,
	linux-cifs, samba-technical, linux-mtd, Wanpeng Li

On Fri, Aug 25, 2023 at 09:54:04PM +0800, Hao Xu wrote:
> From: Hao Xu <[email protected]>
> 
> XBF_TRYLOCK means we need lock but don't block on it,

Yes.


> we can use it to
> stand for not waiting for memory allocation. Rename XBF_TRYLOCK to
> XBF_NOWAIT, which is more generic.

No.

Not only can XBF_TRYLOCK require memory allocation, it can require
IO to be issued. We use TRYLOCK for -readahead- and so we *must* be
able to allocate memory and issue IO under TRYLOCK caller
conditions.

[...]

> diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
> index d440393b40eb..2ccb0867824c 100644
> --- a/fs/xfs/libxfs/xfs_attr_remote.c
> +++ b/fs/xfs/libxfs/xfs_attr_remote.c
> @@ -661,7 +661,7 @@ xfs_attr_rmtval_invalidate(
>  			return error;
>  		if (XFS_IS_CORRUPT(args->dp->i_mount, nmap != 1))
>  			return -EFSCORRUPTED;
> -		error = xfs_attr_rmtval_stale(args->dp, &map, XBF_TRYLOCK);
> +		error = xfs_attr_rmtval_stale(args->dp, &map, XBF_NOWAIT);
>  		if (error)
>  			return error;

XBF_INCORE | XBF_NOWAIT makes no real sense. I mean, XBF_INCORE is
exactly "find a cached buffer or fail" - it's not going to do any
memory allocation or IO so NOWAIT semantics don't make any sense
here. It's the buffer lock that this lookup is explicitly
avoiding, and so TRYLOCK describes exactly the semantics we want
from this incore lookup.

Indeed, this is a deadlock avoidance mechanism as the transaction
may already have the buffer locked and so we don't want the
xfs_buf_incore() lookup to try to lock the buffer again. TRYLOCK
documents this pretty clearly - NOWAIT loses that context....

> diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
> index 6a6503ab0cd7..77c4f1d83475 100644
> --- a/fs/xfs/libxfs/xfs_btree.c
> +++ b/fs/xfs/libxfs/xfs_btree.c
> @@ -1343,7 +1343,7 @@ xfs_btree_read_buf_block(
>  	int			error;
>  
>  	/* need to sort out how callers deal with failures first */
> -	ASSERT(!(flags & XBF_TRYLOCK));
> +	ASSERT(!(flags & XBF_NOWAIT));
>  
>  	error = xfs_btree_ptr_to_daddr(cur, ptr, &d);
>  	if (error)
> diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
> index ac6d8803e660..9312cf3b20e2 100644
> --- a/fs/xfs/scrub/repair.c
> +++ b/fs/xfs/scrub/repair.c
> @@ -460,7 +460,7 @@ xrep_invalidate_block(
>  
>  	error = xfs_buf_incore(sc->mp->m_ddev_targp,
>  			XFS_FSB_TO_DADDR(sc->mp, fsbno),
> -			XFS_FSB_TO_BB(sc->mp, 1), XBF_TRYLOCK, &bp);
> +			XFS_FSB_TO_BB(sc->mp, 1), XBF_NOWAIT, &bp);

My point exactly.

xfs_buf_incore() is simply a lookup with XBF_INCORE set. (XBF_INCORE
| XBF_TRYLOCK) has exactly the semantics of "return the buffer only
if it is cached and we can lock it without blocking".

It will not instantiate a new buffer (i.e. do memory allocation) or
do IO, because if it is under IO the buffer lock will be held.

So, essentially, this "NOWAIT" semantic you want is already supplied
by (XBF_INCORE | XBF_TRYLOCK) buffer lookups.
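
A minimal userspace sketch of those lookup semantics, assuming a toy two-entry cache. `struct buf`, `cache`, `buf_incore_trylock()` and `buf_unlock()` are hypothetical names for illustration and are not the XFS implementation. The return codes follow the behaviour described in this thread: -EAGAIN when the buffer lock is busy, and an error (here -ENOENT) when the buffer is not cached, with no allocation or IO in either case.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy buffer: single-threaded model, a real lock would be a semaphore. */
struct buf {
	long	blkno;
	int	locked;
};

/* Everything in this array counts as "in core". */
static struct buf cache[] = {
	{ .blkno = 10, .locked = 0 },
	{ .blkno = 20, .locked = 0 },
};

/*
 * Return the buffer only if it is cached and can be locked without
 * blocking. Never allocates and never issues IO.
 */
static int buf_incore_trylock(long blkno, struct buf **bpp)
{
	size_t i;

	*bpp = NULL;
	for (i = 0; i < sizeof(cache) / sizeof(cache[0]); i++) {
		struct buf *bp = &cache[i];

		if (bp->blkno != blkno)
			continue;
		if (bp->locked)
			return -EAGAIN;	/* lock busy: do not wait */
		bp->locked = 1;
		*bpp = bp;
		return 0;
	}
	return -ENOENT;			/* not cached: do not instantiate */
}

static void buf_unlock(struct buf *bp)
{
	assert(bp->locked);
	bp->locked = 0;
}
```

A second lookup of a buffer the caller already holds returns -EAGAIN instead of sleeping on its own lock, which is the deadlock-avoidance property the TRYLOCK name documents.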

>  	if (error)
>  		return 0;
>  
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 15d1e5a7c2d3..9f84bc3b802c 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -228,7 +228,7 @@ _xfs_buf_alloc(
>  	 * We don't want certain flags to appear in b_flags unless they are
>  	 * specifically set by later operations on the buffer.
>  	 */
> -	flags &= ~(XBF_UNMAPPED | XBF_TRYLOCK | XBF_ASYNC | XBF_READ_AHEAD);
> +	flags &= ~(XBF_UNMAPPED | XBF_NOWAIT | XBF_ASYNC | XBF_READ_AHEAD);
>  
>  	atomic_set(&bp->b_hold, 1);
>  	atomic_set(&bp->b_lru_ref, 1);
> @@ -543,7 +543,7 @@ xfs_buf_find_lock(
>  	struct xfs_buf          *bp,
>  	xfs_buf_flags_t		flags)
>  {
> -	if (flags & XBF_TRYLOCK) {
> +	if (flags & XBF_NOWAIT) {
>  		if (!xfs_buf_trylock(bp)) {
>  			XFS_STATS_INC(bp->b_mount, xb_busy_locked);
>  			return -EAGAIN;
> @@ -886,7 +886,7 @@ xfs_buf_readahead_map(
>  	struct xfs_buf		*bp;
>  
>  	xfs_buf_read_map(target, map, nmaps,
> -		     XBF_TRYLOCK | XBF_ASYNC | XBF_READ_AHEAD, &bp, ops,
> +		     XBF_NOWAIT | XBF_ASYNC | XBF_READ_AHEAD, &bp, ops,
>  		     __this_address);

That will break readahead (which we use extensively in getdents
operations) if we can't allocate buffers and issue IO under NOWAIT
conditions.

>  }
>  
> diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
> index 549c60942208..8cd307626939 100644
> --- a/fs/xfs/xfs_buf.h
> +++ b/fs/xfs/xfs_buf.h
> @@ -45,7 +45,7 @@ struct xfs_buf;
>  
>  /* flags used only as arguments to access routines */
>  #define XBF_INCORE	 (1u << 29)/* lookup only, return if found in cache */
> -#define XBF_TRYLOCK	 (1u << 30)/* lock requested, but do not wait */
> +#define XBF_NOWAIT	 (1u << 30)/* mem/lock requested, but do not wait */

That's now a really poor comment. It doesn't describe the semantics
or constraints that NOWAIT might imply.

-Dave.
-- 
Dave Chinner
[email protected]


* Re: [PATCH 24/29] xfs: support nowait for xfs_buf_read_map()
  2023-08-25 13:54 ` [PATCH 24/29] xfs: support nowait for xfs_buf_read_map() Hao Xu
@ 2023-08-25 21:53   ` Dave Chinner
  0 siblings, 0 replies; 39+ messages in thread
From: Dave Chinner @ 2023-08-25 21:53 UTC (permalink / raw)
  To: Hao Xu
  Cc: io-uring, Jens Axboe, Dominique Martinet, Pavel Begunkov,
	Christian Brauner, Alexander Viro, Stefan Roesch, Clay Harris,
	Darrick J . Wong, linux-fsdevel, linux-xfs, linux-ext4,
	linux-cachefs, ecryptfs, linux-nfs, linux-unionfs, bpf, netdev,
	linux-s390, linux-kernel, linux-block, linux-btrfs, codalist,
	linux-f2fs-devel, cluster-devel, linux-mm, linux-nilfs, devel,
	linux-cifs, samba-technical, linux-mtd, Wanpeng Li

On Fri, Aug 25, 2023 at 09:54:26PM +0800, Hao Xu wrote:
> From: Hao Xu <[email protected]>
> 
> This causes xfstests generic/232 hung in umount process, waiting for ail
> push, so I comment it for now, need some hints from xfs folks.
> Not a real patch.
> 
> Signed-off-by: Hao Xu <[email protected]>
> ---
>  fs/xfs/xfs_buf.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index cdad80e1ae25..284962a9f31a 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -828,6 +828,13 @@ xfs_buf_read_map(
>  	trace_xfs_buf_read(bp, flags, _RET_IP_);
>  
>  	if (!(bp->b_flags & XBF_DONE)) {
> +//		/*
> +//		 * Let's bypass the _xfs_buf_read() for now
> +//		 */
> +//		if (flags & XBF_NOWAIT) {
> +//			xfs_buf_relse(bp);
> +//			return -EAGAIN;
> +//		}

This is *fundamentally broken*, and apart from anything else breaks
readahead.

If we asked for a read, we cannot instantiate the buffer and then
*not issue any IO on it* and release it. That leaves an
uninitialised buffer in memory, and there's every chance that
something then trips over it and bad things happen.

A buffer like this *must* be errored out and marked stale so that
the next access to it will then re-initialise the buffer state and
trigger any preparatory work that needs to be done for the new
operation.
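
The rule in the paragraph above can be modelled in a few lines of userspace C. This is a sketch of the state rule only, not the XFS code: `struct buf`, its fields, `buf_read_bailout()` and `buf_read()` are hypothetical, with `done` and `stale` standing in for the roles the message assigns to the done and stale buffer states.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct buf {
	bool	done;	/* contents are valid */
	bool	stale;	/* must be re-initialised before reuse */
	int	error;
};

/*
 * Give up on a read without issuing IO. The buffer is errored out and
 * marked stale so that nobody can mistake it for initialised data.
 */
static int buf_read_bailout(struct buf *bp)
{
	assert(!bp->done);
	bp->error = -EAGAIN;
	bp->stale = true;
	return bp->error;
}

/* Next access: a stale buffer is reset first, then read as if new. */
static int buf_read(struct buf *bp)
{
	if (bp->stale) {
		bp->stale = false;
		bp->done = false;
		bp->error = 0;
	}
	if (!bp->done) {
		/* IO would be issued and waited for here */
		bp->done = true;
	}
	return bp->error;
}
```

The point of the model is the ordering: the bailout path leaves an explicit marker behind, and the next reader clears it before trusting anything in the buffer.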

This comes back to my first comments that XBF_TRYLOCK cannot simply
be replaced with XBF_NOWAIT semantics...

-Dave.
-- 
Dave Chinner
[email protected]


* Re: [PATCH 26/29] xfs: return -EAGAIN when nowait meets sync in transaction commit
  2023-08-25 13:54 ` [PATCH 26/29] xfs: return -EAGAIN when nowait meets sync in transaction commit Hao Xu
@ 2023-08-25 21:58   ` Dave Chinner
  0 siblings, 0 replies; 39+ messages in thread
From: Dave Chinner @ 2023-08-25 21:58 UTC (permalink / raw)
  To: Hao Xu
  Cc: io-uring, Jens Axboe, Dominique Martinet, Pavel Begunkov,
	Christian Brauner, Alexander Viro, Stefan Roesch, Clay Harris,
	Darrick J . Wong, linux-fsdevel, linux-xfs, linux-ext4,
	linux-cachefs, ecryptfs, linux-nfs, linux-unionfs, bpf, netdev,
	linux-s390, linux-kernel, linux-block, linux-btrfs, codalist,
	linux-f2fs-devel, cluster-devel, linux-mm, linux-nilfs, devel,
	linux-cifs, samba-technical, linux-mtd, Wanpeng Li

On Fri, Aug 25, 2023 at 09:54:28PM +0800, Hao Xu wrote:
> From: Hao Xu <[email protected]>
> 
> if the log transaction is a sync one, let's fail the nowait try and
> return -EAGAIN directly since sync transaction means blocked by IO.
> 
> Signed-off-by: Hao Xu <[email protected]>
> ---
>  fs/xfs/xfs_trans.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> index 7988b4c7f36e..f1f84a3dd456 100644
> --- a/fs/xfs/xfs_trans.c
> +++ b/fs/xfs/xfs_trans.c
> @@ -968,12 +968,24 @@ __xfs_trans_commit(
>  	xfs_csn_t		commit_seq = 0;
>  	int			error = 0;
>  	int			sync = tp->t_flags & XFS_TRANS_SYNC;
> +	bool			nowait = tp->t_flags & XFS_TRANS_NOWAIT;
> +	bool			perm_log = tp->t_flags & XFS_TRANS_PERM_LOG_RES;
>  
>  	trace_xfs_trans_commit(tp, _RET_IP_);
>  
> +	if (nowait && sync) {
> +		/*
> +		 * Currently nowait is only from xfs_vn_update_time()
> +		 * so perm_log is always false here, but let's make
> +		 * code general.
> +		 */
> +		if (perm_log)
> +			xfs_defer_cancel(tp);
> +		goto out_unreserve;
> +	}

This is fundamentally broken.  We cannot abort a transaction commit
with dirty items at this point without shutting down the filesystem.

This points to XFS_TRANS_NOWAIT being completely broken, too,
because we don't call xfs_trans_set_sync() until just before we
commit the transaction. At this point, it is -too late- for
nowait+sync to be handled gracefully, and it will *always* go bad.

IOWs, the whole transaction "nowait" semantics as designed and
implemented is not a workable solution....

-Dave.
-- 
Dave Chinner
[email protected]


* Re: [PATCH 28/29] xfs: support nowait semantics for xc_ctx_lock in xlog_cil_commit()
  2023-08-25 13:54 ` [PATCH 28/29] xfs: support nowait semantics for xc_ctx_lock in xlog_cil_commit() Hao Xu
@ 2023-08-25 21:59   ` Dave Chinner
  0 siblings, 0 replies; 39+ messages in thread
From: Dave Chinner @ 2023-08-25 21:59 UTC (permalink / raw)
  To: Hao Xu
  Cc: io-uring, Jens Axboe, Dominique Martinet, Pavel Begunkov,
	Christian Brauner, Alexander Viro, Stefan Roesch, Clay Harris,
	Darrick J . Wong, linux-fsdevel, linux-xfs, linux-ext4,
	linux-cachefs, ecryptfs, linux-nfs, linux-unionfs, bpf, netdev,
	linux-s390, linux-kernel, linux-block, linux-btrfs, codalist,
	linux-f2fs-devel, cluster-devel, linux-mm, linux-nilfs, devel,
	linux-cifs, samba-technical, linux-mtd, Wanpeng Li

On Fri, Aug 25, 2023 at 09:54:30PM +0800, Hao Xu wrote:
> From: Hao Xu <[email protected]>
> 
> Apply trylock logic for xc_ctx_lock in xlog_cil_commit() in nowait
> case and error out -EAGAIN for xlog_cil_commit().

Again, fundamentally broken. Any error from xlog_cil_commit() will
result in a filesystem shutdown as we cannot back out from failure
with dirty log items gracefully at this point.

-Dave.

-- 
Dave Chinner
[email protected]


* Re: [PATCH 25/29] xfs: support nowait for xfs_buf_item_init()
  2023-08-25 13:54 ` [PATCH 25/29] xfs: support nowait for xfs_buf_item_init() Hao Xu
@ 2023-08-25 22:16   ` Dave Chinner
  0 siblings, 0 replies; 39+ messages in thread
From: Dave Chinner @ 2023-08-25 22:16 UTC (permalink / raw)
  To: Hao Xu
  Cc: io-uring, Jens Axboe, Dominique Martinet, Pavel Begunkov,
	Christian Brauner, Alexander Viro, Stefan Roesch, Clay Harris,
	Darrick J . Wong, linux-fsdevel, linux-xfs, linux-ext4,
	linux-cachefs, ecryptfs, linux-nfs, linux-unionfs, bpf, netdev,
	linux-s390, linux-kernel, linux-block, linux-btrfs, codalist,
	linux-f2fs-devel, cluster-devel, linux-mm, linux-nilfs, devel,
	linux-cifs, samba-technical, linux-mtd, Wanpeng Li

On Fri, Aug 25, 2023 at 09:54:27PM +0800, Hao Xu wrote:
> From: Hao Xu <[email protected]>
> 
> support nowait for xfs_buf_item_init() and error out -EAGAIN to
> _xfs_trans_bjoin() when it would block.
> 
> Signed-off-by: Hao Xu <[email protected]>
> ---
>  fs/xfs/xfs_buf_item.c         |  9 +++++++--
>  fs/xfs/xfs_buf_item.h         |  2 +-
>  fs/xfs/xfs_buf_item_recover.c |  2 +-
>  fs/xfs/xfs_trans_buf.c        | 16 +++++++++++++---
>  4 files changed, 22 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
> index 023d4e0385dd..b1e63137d65b 100644
> --- a/fs/xfs/xfs_buf_item.c
> +++ b/fs/xfs/xfs_buf_item.c
> @@ -827,7 +827,8 @@ xfs_buf_item_free_format(
>  int
>  xfs_buf_item_init(
>  	struct xfs_buf	*bp,
> -	struct xfs_mount *mp)
> +	struct xfs_mount *mp,
> +	bool   nowait)
>  {
>  	struct xfs_buf_log_item	*bip = bp->b_log_item;
>  	int			chunks;
> @@ -847,7 +848,11 @@ xfs_buf_item_init(
>  		return 0;
>  	}
>  
> -	bip = kmem_cache_zalloc(xfs_buf_item_cache, GFP_KERNEL | __GFP_NOFAIL);
> +	bip = kmem_cache_zalloc(xfs_buf_item_cache,
> +				GFP_KERNEL | (nowait ? 0 : __GFP_NOFAIL));
> +	if (!bip)
> +		return -EAGAIN;
> +
>  	xfs_log_item_init(mp, &bip->bli_item, XFS_LI_BUF, &xfs_buf_item_ops);
>  	bip->bli_buf = bp;

I see filesystem shutdowns....

> diff --git a/fs/xfs/xfs_trans_buf.c b/fs/xfs/xfs_trans_buf.c
> index 016371f58f26..a1e4f2e8629a 100644
> --- a/fs/xfs/xfs_trans_buf.c
> +++ b/fs/xfs/xfs_trans_buf.c
> @@ -57,13 +57,14 @@ xfs_trans_buf_item_match(
>   * If the buffer does not yet have a buf log item associated with it,
>   * then allocate one for it.  Then add the buf item to the transaction.
>   */
> -STATIC void
> +STATIC int
>  _xfs_trans_bjoin(
>  	struct xfs_trans	*tp,
>  	struct xfs_buf		*bp,
>  	int			reset_recur)
>  {
>  	struct xfs_buf_log_item	*bip;
> +	int ret;
>  
>  	ASSERT(bp->b_transp == NULL);
>  
> @@ -72,7 +73,11 @@ _xfs_trans_bjoin(
>  	 * it doesn't have one yet, then allocate one and initialize it.
>  	 * The checks to see if one is there are in xfs_buf_item_init().
>  	 */
> -	xfs_buf_item_init(bp, tp->t_mountp);
> +	ret = xfs_buf_item_init(bp, tp->t_mountp,
> +				tp->t_flags & XFS_TRANS_NOWAIT);
> +	if (ret < 0)
> +		return ret;
> +
>  	bip = bp->b_log_item;
>  	ASSERT(!(bip->bli_flags & XFS_BLI_STALE));
>  	ASSERT(!(bip->__bli_format.blf_flags & XFS_BLF_CANCEL));
> @@ -92,6 +97,7 @@ _xfs_trans_bjoin(
>  	xfs_trans_add_item(tp, &bip->bli_item);
>  	bp->b_transp = tp;
>  
> +	return 0;
>  }
>  
>  void
> @@ -309,7 +315,11 @@ xfs_trans_read_buf_map(
>  	}
>  
>  	if (tp) {
> -		_xfs_trans_bjoin(tp, bp, 1);
> +		error = _xfs_trans_bjoin(tp, bp, 1);
> +		if (error) {
> +			xfs_buf_relse(bp);
> +			return error;
> +		}
>  		trace_xfs_trans_read_buf(bp->b_log_item);

So what happens at the callers when we have a dirty transaction and
joining a buffer fails with -EAGAIN?

Apart from the fact this may well propagate -EAGAIN up to userspace,
cancelling a dirty transaction at this point will result in a
filesystem shutdown....

Indeed, this can happen in the "simple" timestamp update case that
this "nowait" semantic is being aimed at. We log the inode in the
timestamp update, which dirties the log item and registers a
precommit operation to be run. We commit the
transaction, which then runs xfs_inode_item_precommit() and that
may need to attach the inode to the inode cluster buffer. This
results in:

xfs_inode_item_precommit
  xfs_imap_to_bp
    xfs_trans_read_buf_map
      _xfs_trans_bjoin
        xfs_buf_item_init(XFS_TRANS_NOWAIT)
          kmem_cache_zalloc(GFP_NOFS)
          <memory allocation fails>
      gets -EAGAIN error
    propagates -EAGAIN
  fails due to -EAGAIN

And now xfs_trans_commit() fails with a dirty transaction and the
filesystem shuts down.

IOWs, XFS_TRANS_NOWAIT as it stands is fundamentally broken. Once we
dirty an item in a transaction, we *cannot* back out of the
transaction. We *must block* in every place that could fail -
locking, memory allocation and/or IO - until the transaction
completes because we cannot undo the changes we've already made to
the dirty items in the transaction....

It's even worse than that - once we have committed intents, the
whole chain of intent processing must be run to completion. Hence
we can't tolerate backing out of that deferred processing chain half
way through because we might have to block.

Until we can roll back partial dirty transactions and partially
completed deferred intent chains at any random point of completion,
XFS_TRANS_NOWAIT will not work.
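
The difference between failing before and after an item is dirtied can be shown with a toy model. All names here (`struct fs`, `struct trans`, `trans_log_item()`, `trans_cancel()`, `nowait_step()`, `update_time_nowait()`) are hypothetical; the only behaviour taken from this thread is that cancelling a dirty transaction shuts the filesystem down, while cancelling a clean one is harmless.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct fs {
	bool	shutdown;
};

struct trans {
	struct fs	*fs;
	bool		dirty;
};

/* Logging an item makes the transaction dirty; there is no undo. */
static void trans_log_item(struct trans *tp)
{
	assert(tp->fs != NULL);
	tp->dirty = true;
}

/* Cancel: harmless when clean, forces a shutdown when dirty. */
static void trans_cancel(struct trans *tp)
{
	if (tp->dirty)
		tp->fs->shutdown = true;
}

/* Any step that would have to sleep returns -EAGAIN under nowait. */
static int nowait_step(bool would_block)
{
	return would_block ? -EAGAIN : 0;
}

static int update_time_nowait(struct fs *fs, bool block_before_dirty,
			      bool block_after_dirty)
{
	struct trans tp = { .fs = fs, .dirty = false };
	int error;

	error = nowait_step(block_before_dirty);	/* e.g. reservation */
	if (error) {
		trans_cancel(&tp);
		return error;
	}

	trans_log_item(&tp);

	error = nowait_step(block_after_dirty);		/* e.g. commit-time alloc */
	if (error) {
		trans_cancel(&tp);
		return error;
	}
	return 0;
}
```

Both failure cases return the same -EAGAIN to the caller, but only the second one leaves the modelled filesystem shut down, which is why the caller cannot treat them as equivalent retries.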

-Dave.
-- 
Dave Chinner
[email protected]


* Re: [PATCH RFC v5 00/29] io_uring getdents
  2023-08-25 13:54 [PATCH RFC v5 00/29] io_uring getdents Hao Xu
                   ` (29 preceding siblings ...)
  2023-08-25 15:11 ` [PATCH RFC v5 00/29] io_uring getdents Darrick J. Wong
@ 2023-08-25 22:53 ` Dave Chinner
  30 siblings, 0 replies; 39+ messages in thread
From: Dave Chinner @ 2023-08-25 22:53 UTC (permalink / raw)
  To: Hao Xu
  Cc: io-uring, Jens Axboe, Dominique Martinet, Pavel Begunkov,
	Christian Brauner, Alexander Viro, Stefan Roesch, Clay Harris,
	Darrick J . Wong, linux-fsdevel, linux-xfs, linux-ext4,
	linux-cachefs, ecryptfs, linux-nfs, linux-unionfs, bpf, netdev,
	linux-s390, linux-kernel, linux-block, linux-btrfs, codalist,
	linux-f2fs-devel, cluster-devel, linux-mm, linux-nilfs, devel,
	linux-cifs, samba-technical, linux-mtd, Wanpeng Li

On Fri, Aug 25, 2023 at 09:54:02PM +0800, Hao Xu wrote:
> From: Hao Xu <[email protected]>
> 
> This series introduce getdents64 to io_uring, the code logic is similar
> with the synchronized version's. It first try nowait issue, and offload
> it to io-wq threads if the first try fails.
> 
> Patch1 and Patch2 are some preparation
> Patch3 supports nowait for xfs getdents code
> Patch4-11 are vfs change, include adding helpers and trylock for locks
> Patch12-29 supports nowait for involved xfs journal stuff
> note, Patch24 and 27 are actually two questions, might be removed later.
> an xfs test may come later.

You need to drop all the XFS journal stuff. It's fundamentally
broken as it stands, and we cannot support non-blocking
transactional changes without first putting a massive investment in
transaction and intent chain rollback to allow correctly undoing
partially complete modifications.

Regardless, non-blocking transactions are completely unnecessary for
a non-blocking readdir implementation. readdir should only be
touching atime, and with relatime it should only occur once every 24
hours per inode. If that's a problem, then we have noatime mount
options. Hence I just don't see any point in worrying about having a
timestamp update block occasionally...
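
For reference, the relatime rule behind the "once every 24 hours" figure can be written as a simplified userspace model. This is a sketch, assuming whole-second timestamps and no timestamps in the future; the function name and signature are illustrative and are not the kernel helper's. An atime update is needed only if the inode was modified or changed since the last access, or if the recorded atime is at least a day old.

```c
#include <assert.h>
#include <stdbool.h>

#define DAY_SECONDS	(24 * 60 * 60)

/*
 * Simplified relatime decision. All arguments are seconds since an
 * arbitrary epoch; ctime_sec is the inode change time.
 */
static bool relatime_need_update(long long atime, long long mtime,
				 long long ctime_sec, long long now)
{
	assert(atime <= now);	/* model assumes no future timestamps */

	if (mtime >= atime)
		return true;	/* modified since last access */
	if (ctime_sec >= atime)
		return true;	/* changed since last access */
	if (now - atime >= DAY_SECONDS)
		return true;	/* recorded atime is a day old */
	return false;
}
```

For a directory that is read repeatedly but not modified, only the third test can fire, so at most one readdir per day would need a timestamp update in this model.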

I also don't really see why you need to fiddle with xfs buffer
cache semantics - it already has the functionality "nowait" buffer
reads require (i.e.  XBF_INCORE|XBF_TRYLOCK).

However, the readahead IO that the xfs readdir code issues cannot
use your defined NOWAIT semantics - it must be able to allocate
memory and issue IO. Readahead already avoids blocking on memory
allocation and blocking on IO via the XBF_READ_AHEAD flag. This sets
__GFP_NORETRY for buffer allocation and REQ_RAHEAD for IO. Hence
readahead only needs the existing XBF_TRYLOCK flag to be set to be
compatible with the required NOWAIT semantics....
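
One way to picture why readahead is already compatible is to model it as a best-effort operation that gives up quietly instead of waiting. The sketch below is an assumption-laden illustration, not the XFS readahead path: `enum ra_result` and `readahead_one()` are hypothetical, and the three inputs stand for "buffer found in cache", "buffer lock busy" and "allocation succeeded without retrying".

```c
#include <assert.h>
#include <stdbool.h>

enum ra_result {
	RA_ISSUED,	/* buffer allocated, async read submitted */
	RA_CACHED,	/* buffer already in cache, nothing to do */
	RA_SKIPPED,	/* would have had to wait: silently dropped */
};

/*
 * Best-effort readahead of one buffer. It may allocate memory and
 * submit IO, but it never sleeps on a lock, never retries a failed
 * allocation, and never waits for the IO it submits.
 */
static enum ra_result readahead_one(bool cached, bool locked, bool alloc_ok)
{
	assert(cached || !locked);	/* an uncached buffer has no holder */

	if (cached)
		return locked ? RA_SKIPPED : RA_CACHED;	/* trylock only */
	if (!alloc_ok)
		return RA_SKIPPED;			/* no-retry allocation */
	return RA_ISSUED;				/* caller does not wait */
}
```

None of the outcomes is an error the caller has to handle, which matches the idea that a later blocking or nowait read simply finds the buffer in whatever state readahead left it.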

As for the NOIO memory allocation restrictions io_uring requires,
that should be enforced at the io_uring layer before calling into
the VFS using memalloc_noio_save/restore.  At that point no memory
allocation will trigger IO and none of the code running under NOWAIT
conditions even needs to be aware that io_uring has a GFP_NOIO
restriction on memory allocation....

Please go back to the simple "do non-blocking buffer IO"
implementation we started with and don't try to solve every little
blocking problem that might exist in the VFS and filesystems...

-Dave
-- 
Dave Chinner
[email protected]


Thread overview: 39+ messages
2023-08-25 13:54 [PATCH RFC v5 00/29] io_uring getdents Hao Xu
2023-08-25 13:54 ` [PATCH 01/29] fs: split off vfs_getdents function of getdents64 syscall Hao Xu
2023-08-25 13:54 ` [PATCH 02/29] xfs: rename XBF_TRYLOCK to XBF_NOWAIT Hao Xu
2023-08-25 21:39   ` Dave Chinner
2023-08-25 13:54 ` [PATCH 03/29] xfs: add NOWAIT semantics for readdir Hao Xu
2023-08-25 13:54 ` [PATCH 04/29] vfs: add nowait flag for struct dir_context Hao Xu
2023-08-25 13:54 ` [PATCH 05/29] vfs: add a vfs helper for io_uring file pos lock Hao Xu
2023-08-25 13:54 ` [PATCH 06/29] vfs: add file_pos_unlock() for io_uring usage Hao Xu
2023-08-25 13:54 ` [PATCH 07/29] vfs: add a nowait parameter for touch_atime() Hao Xu
2023-08-25 13:54 ` [PATCH 08/29] vfs: add nowait parameter for file_accessed() Hao Xu
2023-08-25 13:54 ` [PATCH 09/29] vfs: move file_accessed() to the beginning of iterate_dir() Hao Xu
2023-08-25 13:54 ` [PATCH 10/29] vfs: add S_NOWAIT for nowait time update Hao Xu
2023-08-25 13:54 ` [PATCH 11/29] vfs: trylock inode->i_rwsem in iterate_dir() to support nowait Hao Xu
2023-08-25 13:54 ` [PATCH 12/29] xfs: enforce GFP_NOIO implicitly during nowait time update Hao Xu
2023-08-25 14:20   ` Matthew Wilcox
2023-08-25 13:54 ` [PATCH 13/29] xfs: make xfs_trans_alloc() support nowait semantics Hao Xu
2023-08-25 13:54 ` [PATCH 14/29] xfs: support nowait for xfs_log_reserve() Hao Xu
2023-08-25 13:54 ` [PATCH 15/29] xfs: don't wait for free space in xlog_grant_head_check() in nowait case Hao Xu
2023-08-25 13:54 ` [PATCH 16/29] xfs: add nowait parameter for xfs_inode_item_init() Hao Xu
2023-08-25 13:54 ` [PATCH 17/29] xfs: make xfs_trans_ijoin() error out -EAGAIN Hao Xu
2023-08-25 13:54 ` [PATCH 18/29] xfs: set XBF_NOWAIT for xfs_buf_read_map if necessary Hao Xu
2023-08-25 13:54 ` [PATCH 19/29] xfs: support nowait memory allocation in _xfs_buf_alloc() Hao Xu
2023-08-25 13:54 ` [PATCH 20/29] xfs: distinguish error type of memory allocation failure for nowait case Hao Xu
2023-08-25 13:54 ` [PATCH 21/29] xfs: return -EAGAIN when bulk memory allocation fails in " Hao Xu
2023-08-25 13:54 ` [PATCH 22/29] xfs: comment page allocation for nowait case in xfs_buf_find_insert() Hao Xu
2023-08-25 14:09   ` Matthew Wilcox
2023-08-25 13:54 ` [PATCH 23/29] xfs: don't print warn info for -EAGAIN error in xfs_buf_get_map() Hao Xu
2023-08-25 13:54 ` [PATCH 24/29] xfs: support nowait for xfs_buf_read_map() Hao Xu
2023-08-25 21:53   ` Dave Chinner
2023-08-25 13:54 ` [PATCH 25/29] xfs: support nowait for xfs_buf_item_init() Hao Xu
2023-08-25 22:16   ` Dave Chinner
2023-08-25 13:54 ` [PATCH 26/29] xfs: return -EAGAIN when nowait meets sync in transaction commit Hao Xu
2023-08-25 21:58   ` Dave Chinner
2023-08-25 13:54 ` [PATCH 27/29] xfs: add a comment for xlog_kvmalloc() Hao Xu
2023-08-25 13:54 ` [PATCH 28/29] xfs: support nowait semantics for xc_ctx_lock in xlog_cil_commit() Hao Xu
2023-08-25 21:59   ` Dave Chinner
2023-08-25 13:54 ` [PATCH 29/29] io_uring: add support for getdents Hao Xu
2023-08-25 15:11 ` [PATCH RFC v5 00/29] io_uring getdents Darrick J. Wong
2023-08-25 22:53 ` Dave Chinner
