* re-enable IOCB_NOWAIT writes to files v3
@ 2025-12-17 6:09 Christoph Hellwig
2025-12-17 6:09 ` [PATCH 01/10] fs: remove inode_update_time Christoph Hellwig
` (9 more replies)
0 siblings, 10 replies; 19+ messages in thread
From: Christoph Hellwig @ 2025-12-17 6:09 UTC (permalink / raw)
To: Christian Brauner
Cc: Al Viro, David Sterba, Jan Kara, Mike Marshall,
Martin Brandenburg, Carlos Maiolino, Stefan Roesch, Jeff Layton,
linux-kernel, linux-btrfs, linux-fsdevel, gfs2, io-uring, devel,
linux-unionfs, linux-mtd, linux-xfs, linux-nfs
Hi all,
commit 66fa3cedf16a ("fs: Add async write file modification handling.")
effectively disabled IOCB_NOWAIT writes as timestamp updates currently
always require blocking, and the modern timestamp resolution means we
always update timestamps. This leads to a lot of context switches from
applications using io_uring to submit file writes, making it often worse
than using the legacy aio code that is not using IOCB_NOWAIT.
This series allows non-blocking updates for lazytime if the file system
supports it, and adds that support for XFS.
Changes since v2:
- drop patches merged upstream
- adjust for the inode state accesors
- keep a check in __writeback_single_inode instead of exercising
potentially undefined behavior
- more spelling fixes
Changes since v1:
- more regular numbering of the S_* flags
- fix XFS to actually not block
- don't ignore the generic_update_time return value in
file_update_time_flags
- fix the sync_lazytime return value
- fix an out of data comment in btrfs
- fix a race that would update i_version before returning -EAGAIN in XFS
Diffstat:
Documentation/filesystems/locking.rst | 2
Documentation/filesystems/vfs.rst | 6 ++
fs/btrfs/inode.c | 11 +++-
fs/fat/misc.c | 3 +
fs/fs-writeback.c | 52 ++++++++++++++++---
fs/gfs2/inode.c | 6 +-
fs/inode.c | 89 +++++++++++++++++++---------------
fs/internal.h | 3 -
fs/nfs/inode.c | 4 -
fs/orangefs/inode.c | 8 ++-
fs/overlayfs/inode.c | 3 +
fs/sync.c | 4 -
fs/ubifs/file.c | 11 ++--
fs/xfs/xfs_iops.c | 35 ++++++++++++-
fs/xfs/xfs_super.c | 29 -----------
include/linux/fs.h | 19 ++++---
include/trace/events/writeback.h | 6 --
17 files changed, 182 insertions(+), 109 deletions(-)
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH 01/10] fs: remove inode_update_time
2025-12-17 6:09 re-enable IOCB_NOWAIT writes to files v3 Christoph Hellwig
@ 2025-12-17 6:09 ` Christoph Hellwig
2025-12-17 6:09 ` [PATCH 02/10] fs: allow error returns from generic_update_time Christoph Hellwig
` (8 subsequent siblings)
9 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2025-12-17 6:09 UTC (permalink / raw)
To: Christian Brauner
Cc: Al Viro, David Sterba, Jan Kara, Mike Marshall,
Martin Brandenburg, Carlos Maiolino, Stefan Roesch, Jeff Layton,
linux-kernel, linux-btrfs, linux-fsdevel, gfs2, io-uring, devel,
linux-unionfs, linux-mtd, linux-xfs, linux-nfs,
Chaitanya Kulkarni
The only external user is gone now, open code it in the two VFS
callers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
---
fs/inode.c | 23 ++++++++---------------
include/linux/fs.h | 1 -
2 files changed, 8 insertions(+), 16 deletions(-)
diff --git a/fs/inode.c b/fs/inode.c
index 521383223d8a..07effa0cb999 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -2157,19 +2157,6 @@ int generic_update_time(struct inode *inode, int flags)
}
EXPORT_SYMBOL(generic_update_time);
-/*
- * This does the actual work of updating an inodes time or version. Must have
- * had called mnt_want_write() before calling this.
- */
-int inode_update_time(struct inode *inode, int flags)
-{
- if (inode->i_op->update_time)
- return inode->i_op->update_time(inode, flags);
- generic_update_time(inode, flags);
- return 0;
-}
-EXPORT_SYMBOL(inode_update_time);
-
/**
* atime_needs_update - update the access time
* @path: the &struct path to update
@@ -2237,7 +2224,10 @@ void touch_atime(const struct path *path)
* We may also fail on filesystems that have the ability to make parts
* of the fs read only, e.g. subvolumes in Btrfs.
*/
- inode_update_time(inode, S_ATIME);
+ if (inode->i_op->update_time)
+ inode->i_op->update_time(inode, S_ATIME);
+ else
+ generic_update_time(inode, S_ATIME);
mnt_put_write_access(mnt);
skip_update:
sb_end_write(inode->i_sb);
@@ -2392,7 +2382,10 @@ static int file_update_time_flags(struct file *file, unsigned int flags)
if (mnt_get_write_access_file(file))
return 0;
- ret = inode_update_time(inode, sync_mode);
+ if (inode->i_op->update_time)
+ ret = inode->i_op->update_time(inode, sync_mode);
+ else
+ ret = generic_update_time(inode, sync_mode);
mnt_put_write_access_file(file);
return ret;
}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index f5c9cf28c4dc..ee623c16d835 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2246,7 +2246,6 @@ enum file_time_flags {
extern bool atime_needs_update(const struct path *, struct inode *);
extern void touch_atime(const struct path *);
-int inode_update_time(struct inode *inode, int flags);
static inline void file_accessed(struct file *file)
{
--
2.47.3
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 02/10] fs: allow error returns from generic_update_time
2025-12-17 6:09 re-enable IOCB_NOWAIT writes to files v3 Christoph Hellwig
2025-12-17 6:09 ` [PATCH 01/10] fs: remove inode_update_time Christoph Hellwig
@ 2025-12-17 6:09 ` Christoph Hellwig
2025-12-17 6:09 ` [PATCH 03/10] fs: exit early in generic_update_time when there is no work Christoph Hellwig
` (7 subsequent siblings)
9 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2025-12-17 6:09 UTC (permalink / raw)
To: Christian Brauner
Cc: Al Viro, David Sterba, Jan Kara, Mike Marshall,
Martin Brandenburg, Carlos Maiolino, Stefan Roesch, Jeff Layton,
linux-kernel, linux-btrfs, linux-fsdevel, gfs2, io-uring, devel,
linux-unionfs, linux-mtd, linux-xfs, linux-nfs,
Chaitanya Kulkarni
Now that no caller looks at the updated flags, switch generic_update_time
to the same calling convention as the ->update_time method and return 0
or a negative errno.
This prepares for adding non-blocking timestamp updates that could return
-EAGAIN.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
---
fs/gfs2/inode.c | 3 +--
fs/inode.c | 4 ++--
fs/ubifs/file.c | 6 ++----
fs/xfs/xfs_iops.c | 6 ++----
include/linux/fs.h | 2 +-
5 files changed, 8 insertions(+), 13 deletions(-)
diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 36618e353199..e08eb419347c 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -2257,8 +2257,7 @@ static int gfs2_update_time(struct inode *inode, int flags)
if (error)
return error;
}
- generic_update_time(inode, flags);
- return 0;
+ return generic_update_time(inode, flags);
}
static const struct inode_operations gfs2_file_iops = {
diff --git a/fs/inode.c b/fs/inode.c
index 07effa0cb999..7eb28dd45a5a 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -2141,7 +2141,7 @@ EXPORT_SYMBOL(inode_update_timestamps);
* or S_VERSION need to be updated we attempt to update all three of them. S_ATIME
* updates can be handled done independently of the rest.
*
- * Returns a S_* mask indicating which fields were updated.
+ * Returns a negative error value on error, else 0.
*/
int generic_update_time(struct inode *inode, int flags)
{
@@ -2153,7 +2153,7 @@ int generic_update_time(struct inode *inode, int flags)
if (updated & S_VERSION)
dirty_flags |= I_DIRTY_SYNC;
__mark_inode_dirty(inode, dirty_flags);
- return updated;
+ return 0;
}
EXPORT_SYMBOL(generic_update_time);
diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index c3265b8804f5..ec1bb9f43acc 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -1379,10 +1379,8 @@ int ubifs_update_time(struct inode *inode, int flags)
.dirtied_ino_d = ALIGN(ui->data_len, 8) };
int err, release;
- if (!IS_ENABLED(CONFIG_UBIFS_ATIME_SUPPORT)) {
- generic_update_time(inode, flags);
- return 0;
- }
+ if (!IS_ENABLED(CONFIG_UBIFS_ATIME_SUPPORT))
+ return generic_update_time(inode, flags);
err = ubifs_budget_space(c, &req);
if (err)
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index ad94fbf55014..9dedb54e3cb0 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1197,10 +1197,8 @@ xfs_vn_update_time(
if (inode->i_sb->s_flags & SB_LAZYTIME) {
if (!((flags & S_VERSION) &&
- inode_maybe_inc_iversion(inode, false))) {
- generic_update_time(inode, flags);
- return 0;
- }
+ inode_maybe_inc_iversion(inode, false)))
+ return generic_update_time(inode, flags);
/* Capture the iversion update that just occurred */
log_flags |= XFS_ILOG_CORE;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index ee623c16d835..fccb0a38cb74 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2399,7 +2399,7 @@ extern void ihold(struct inode * inode);
extern void iput(struct inode *);
void iput_not_last(struct inode *);
int inode_update_timestamps(struct inode *inode, int flags);
-int generic_update_time(struct inode *, int);
+int generic_update_time(struct inode *inode, int flags);
/* /sys/fs */
extern struct kobject *fs_kobj;
--
2.47.3
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 03/10] fs: exit early in generic_update_time when there is no work
2025-12-17 6:09 re-enable IOCB_NOWAIT writes to files v3 Christoph Hellwig
2025-12-17 6:09 ` [PATCH 01/10] fs: remove inode_update_time Christoph Hellwig
2025-12-17 6:09 ` [PATCH 02/10] fs: allow error returns from generic_update_time Christoph Hellwig
@ 2025-12-17 6:09 ` Christoph Hellwig
2025-12-17 6:09 ` [PATCH 04/10] fs: factor out a mark_inode_dirty_time helper Christoph Hellwig
` (6 subsequent siblings)
9 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2025-12-17 6:09 UTC (permalink / raw)
To: Christian Brauner
Cc: Al Viro, David Sterba, Jan Kara, Mike Marshall,
Martin Brandenburg, Carlos Maiolino, Stefan Roesch, Jeff Layton,
linux-kernel, linux-btrfs, linux-fsdevel, gfs2, io-uring, devel,
linux-unionfs, linux-mtd, linux-xfs, linux-nfs
Exit early if no attributes are to be updated, to avoid a spurious call
to __mark_inode_dirty which can turn into a fairly expensive no-op due to
the extra checks and locking.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
---
fs/inode.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fs/inode.c b/fs/inode.c
index 7eb28dd45a5a..876641a6e478 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -2148,6 +2148,9 @@ int generic_update_time(struct inode *inode, int flags)
int updated = inode_update_timestamps(inode, flags);
int dirty_flags = 0;
+ if (!updated)
+ return 0;
+
if (updated & (S_ATIME|S_MTIME|S_CTIME))
dirty_flags = inode->i_sb->s_flags & SB_LAZYTIME ? I_DIRTY_TIME : I_DIRTY_SYNC;
if (updated & S_VERSION)
--
2.47.3
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 04/10] fs: factor out a mark_inode_dirty_time helper
2025-12-17 6:09 re-enable IOCB_NOWAIT writes to files v3 Christoph Hellwig
` (2 preceding siblings ...)
2025-12-17 6:09 ` [PATCH 03/10] fs: exit early in generic_update_time when there is no work Christoph Hellwig
@ 2025-12-17 6:09 ` Christoph Hellwig
2025-12-17 6:09 ` [PATCH 05/10] fs: allow error returns from inode_update_timestamps Christoph Hellwig
` (5 subsequent siblings)
9 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2025-12-17 6:09 UTC (permalink / raw)
To: Christian Brauner
Cc: Al Viro, David Sterba, Jan Kara, Mike Marshall,
Martin Brandenburg, Carlos Maiolino, Stefan Roesch, Jeff Layton,
linux-kernel, linux-btrfs, linux-fsdevel, gfs2, io-uring, devel,
linux-unionfs, linux-mtd, linux-xfs, linux-nfs,
Chaitanya Kulkarni
Factor out the inode dirtying vs lazytime logic from generic_update_time
into a new helper so that it can be reused in file system methods.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
---
fs/fs-writeback.c | 16 ++++++++++++++++
fs/inode.c | 14 +++-----------
include/linux/fs.h | 3 ++-
3 files changed, 21 insertions(+), 12 deletions(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 6800886c4d10..7870c158e4a2 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -2692,6 +2692,22 @@ void __mark_inode_dirty(struct inode *inode, int flags)
}
EXPORT_SYMBOL(__mark_inode_dirty);
+void mark_inode_dirty_time(struct inode *inode, unsigned int flags)
+{
+ if (inode->i_sb->s_flags & SB_LAZYTIME) {
+ int dirty_flags = 0;
+
+ if (flags & (S_ATIME | S_MTIME | S_CTIME))
+ dirty_flags = I_DIRTY_TIME;
+ if (flags & S_VERSION)
+ dirty_flags |= I_DIRTY_SYNC;
+ __mark_inode_dirty(inode, dirty_flags);
+ } else {
+ mark_inode_dirty_sync(inode);
+ }
+}
+EXPORT_SYMBOL_GPL(mark_inode_dirty_time);
+
/*
* The @s_sync_lock is used to serialise concurrent sync operations
* to avoid lock contention problems with concurrent wait_sb_inodes() calls.
diff --git a/fs/inode.c b/fs/inode.c
index 876641a6e478..17ecb7bb5067 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -2145,17 +2145,9 @@ EXPORT_SYMBOL(inode_update_timestamps);
*/
int generic_update_time(struct inode *inode, int flags)
{
- int updated = inode_update_timestamps(inode, flags);
- int dirty_flags = 0;
-
- if (!updated)
- return 0;
-
- if (updated & (S_ATIME|S_MTIME|S_CTIME))
- dirty_flags = inode->i_sb->s_flags & SB_LAZYTIME ? I_DIRTY_TIME : I_DIRTY_SYNC;
- if (updated & S_VERSION)
- dirty_flags |= I_DIRTY_SYNC;
- __mark_inode_dirty(inode, dirty_flags);
+ flags = inode_update_timestamps(inode, flags);
+ if (flags)
+ mark_inode_dirty_time(inode, flags);
return 0;
}
EXPORT_SYMBOL(generic_update_time);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index fccb0a38cb74..66d3d18cf4e3 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2189,7 +2189,8 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
};
}
-extern void __mark_inode_dirty(struct inode *, int);
+void mark_inode_dirty_time(struct inode *inode, unsigned int flags);
+void __mark_inode_dirty(struct inode *inode, int flags);
static inline void mark_inode_dirty(struct inode *inode)
{
__mark_inode_dirty(inode, I_DIRTY);
--
2.47.3
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 05/10] fs: allow error returns from inode_update_timestamps
2025-12-17 6:09 re-enable IOCB_NOWAIT writes to files v3 Christoph Hellwig
` (3 preceding siblings ...)
2025-12-17 6:09 ` [PATCH 04/10] fs: factor out a mark_inode_dirty_time helper Christoph Hellwig
@ 2025-12-17 6:09 ` Christoph Hellwig
2025-12-17 12:23 ` Jan Kara
2025-12-17 6:09 ` [PATCH 06/10] fs: factor out a sync_lazytime helper Christoph Hellwig
` (4 subsequent siblings)
9 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2025-12-17 6:09 UTC (permalink / raw)
To: Christian Brauner
Cc: Al Viro, David Sterba, Jan Kara, Mike Marshall,
Martin Brandenburg, Carlos Maiolino, Stefan Roesch, Jeff Layton,
linux-kernel, linux-btrfs, linux-fsdevel, gfs2, io-uring, devel,
linux-unionfs, linux-mtd, linux-xfs, linux-nfs
Change flags to a by reference argument so that it can be updated so that
the return value can be used for error returns. This will be used to
implement non-blocking timestamp updates.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
---
fs/btrfs/inode.c | 8 +++++---
fs/inode.c | 24 ++++++++++++++++--------
fs/nfs/inode.c | 4 ++--
fs/orangefs/inode.c | 5 ++++-
fs/ubifs/file.c | 2 +-
include/linux/fs.h | 2 +-
6 files changed, 29 insertions(+), 16 deletions(-)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 317db7d10a21..3ca8d294770e 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6349,13 +6349,15 @@ static int btrfs_dirty_inode(struct btrfs_inode *inode)
static int btrfs_update_time(struct inode *inode, int flags)
{
struct btrfs_root *root = BTRFS_I(inode)->root;
- bool dirty;
+ int error;
if (btrfs_root_readonly(root))
return -EROFS;
- dirty = inode_update_timestamps(inode, flags);
- return dirty ? btrfs_dirty_inode(BTRFS_I(inode)) : 0;
+ error = inode_update_timestamps(inode, &flags);
+ if (error || !flags)
+ return error;
+ return btrfs_dirty_inode(BTRFS_I(inode));
}
/*
diff --git a/fs/inode.c b/fs/inode.c
index 17ecb7bb5067..2c0d69f7fd01 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -2095,14 +2095,18 @@ static bool relatime_need_update(struct vfsmount *mnt, struct inode *inode,
* attempt to update all three of them. S_ATIME updates can be handled
* independently of the rest.
*
- * Returns a set of S_* flags indicating which values changed.
+ * Updates @flags to contain the S_* flags which actually need changing. This
+ * can drop flags from the input when they don't need an update, or can add
+ * S_VERSION when the version needs to be bumped.
+ *
+ * Returns 0 or a negative errno.
*/
-int inode_update_timestamps(struct inode *inode, int flags)
+int inode_update_timestamps(struct inode *inode, int *flags)
{
int updated = 0;
struct timespec64 now;
- if (flags & (S_MTIME|S_CTIME|S_VERSION)) {
+ if (*flags & (S_MTIME | S_CTIME | S_VERSION)) {
struct timespec64 ctime = inode_get_ctime(inode);
struct timespec64 mtime = inode_get_mtime(inode);
@@ -2119,7 +2123,7 @@ int inode_update_timestamps(struct inode *inode, int flags)
now = current_time(inode);
}
- if (flags & S_ATIME) {
+ if (*flags & S_ATIME) {
struct timespec64 atime = inode_get_atime(inode);
if (!timespec64_equal(&now, &atime)) {
@@ -2127,7 +2131,9 @@ int inode_update_timestamps(struct inode *inode, int flags)
updated |= S_ATIME;
}
}
- return updated;
+
+ *flags = updated;
+ return 0;
}
EXPORT_SYMBOL(inode_update_timestamps);
@@ -2145,10 +2151,12 @@ EXPORT_SYMBOL(inode_update_timestamps);
*/
int generic_update_time(struct inode *inode, int flags)
{
- flags = inode_update_timestamps(inode, flags);
- if (flags)
+ int error;
+
+ error = inode_update_timestamps(inode, &flags);
+ if (!error && flags)
mark_inode_dirty_time(inode, flags);
- return 0;
+ return error;
}
EXPORT_SYMBOL(generic_update_time);
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 84049f3cd340..221816524c66 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -671,8 +671,8 @@ static void nfs_set_timestamps_to_ts(struct inode *inode, struct iattr *attr)
static void nfs_update_timestamps(struct inode *inode, unsigned int ia_valid)
{
- enum file_time_flags time_flags = 0;
unsigned int cache_flags = 0;
+ int time_flags = 0;
if (ia_valid & ATTR_MTIME) {
time_flags |= S_MTIME | S_CTIME;
@@ -682,7 +682,7 @@ static void nfs_update_timestamps(struct inode *inode, unsigned int ia_valid)
time_flags |= S_ATIME;
cache_flags |= NFS_INO_INVALID_ATIME;
}
- inode_update_timestamps(inode, time_flags);
+ inode_update_timestamps(inode, &time_flags);
NFS_I(inode)->cache_validity &= ~cache_flags;
}
diff --git a/fs/orangefs/inode.c b/fs/orangefs/inode.c
index d7275990ffa4..3b58f31bd54f 100644
--- a/fs/orangefs/inode.c
+++ b/fs/orangefs/inode.c
@@ -875,11 +875,14 @@ int orangefs_permission(struct mnt_idmap *idmap,
int orangefs_update_time(struct inode *inode, int flags)
{
struct iattr iattr;
+ int error;
gossip_debug(GOSSIP_INODE_DEBUG, "orangefs_update_time: %pU\n",
get_khandle_from_ino(inode));
- flags = inode_update_timestamps(inode, flags);
+ error = inode_update_timestamps(inode, &flags);
+ if (error || !flags)
+ return error;
memset(&iattr, 0, sizeof iattr);
if (flags & S_ATIME)
diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index ec1bb9f43acc..71540644a931 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -1387,7 +1387,7 @@ int ubifs_update_time(struct inode *inode, int flags)
return err;
mutex_lock(&ui->ui_mutex);
- inode_update_timestamps(inode, flags);
+ inode_update_timestamps(inode, &flags);
release = ui->dirty;
__mark_inode_dirty(inode, I_DIRTY_SYNC);
mutex_unlock(&ui->ui_mutex);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 66d3d18cf4e3..75d5f38b08c9 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2399,7 +2399,7 @@ static inline void super_set_sysfs_name_generic(struct super_block *sb, const ch
extern void ihold(struct inode * inode);
extern void iput(struct inode *);
void iput_not_last(struct inode *);
-int inode_update_timestamps(struct inode *inode, int flags);
+int inode_update_timestamps(struct inode *inode, int *flags);
int generic_update_time(struct inode *inode, int flags);
/* /sys/fs */
--
2.47.3
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 06/10] fs: factor out a sync_lazytime helper
2025-12-17 6:09 re-enable IOCB_NOWAIT writes to files v3 Christoph Hellwig
` (4 preceding siblings ...)
2025-12-17 6:09 ` [PATCH 05/10] fs: allow error returns from inode_update_timestamps Christoph Hellwig
@ 2025-12-17 6:09 ` Christoph Hellwig
2025-12-17 12:26 ` Jan Kara
2025-12-17 6:09 ` [PATCH 07/10] fs: add a ->sync_lazytime method Christoph Hellwig
` (3 subsequent siblings)
9 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2025-12-17 6:09 UTC (permalink / raw)
To: Christian Brauner
Cc: Al Viro, David Sterba, Jan Kara, Mike Marshall,
Martin Brandenburg, Carlos Maiolino, Stefan Roesch, Jeff Layton,
linux-kernel, linux-btrfs, linux-fsdevel, gfs2, io-uring, devel,
linux-unionfs, linux-mtd, linux-xfs, linux-nfs
Centralize how we synchronize a lazytime update into the actual on-disk
timestamp into a single helper.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
---
fs/fs-writeback.c | 22 +++++++++++++++-------
fs/inode.c | 5 +----
fs/internal.h | 3 ++-
fs/sync.c | 4 ++--
include/trace/events/writeback.h | 6 ------
5 files changed, 20 insertions(+), 20 deletions(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 7870c158e4a2..fa555e10d8b9 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1711,6 +1711,16 @@ static void requeue_inode(struct inode *inode, struct bdi_writeback *wb,
}
}
+bool sync_lazytime(struct inode *inode)
+{
+ if (!(inode_state_read_once(inode) & I_DIRTY_TIME))
+ return false;
+
+ trace_writeback_lazytime(inode);
+ mark_inode_dirty_sync(inode);
+ return true;
+}
+
/*
* Write out an inode and its dirty pages (or some of its dirty pages, depending
* on @wbc->nr_to_write), and clear the relevant dirty flags from i_state.
@@ -1750,17 +1760,15 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
}
/*
- * If the inode has dirty timestamps and we need to write them, call
- * mark_inode_dirty_sync() to notify the filesystem about it and to
- * change I_DIRTY_TIME into I_DIRTY_SYNC.
+ * For data integrity writeback, or when the dirty interval expired,
+ * ask the file system to propagata lazy timestamp updates into real
+ * dirty state.
*/
if ((inode_state_read_once(inode) & I_DIRTY_TIME) &&
(wbc->sync_mode == WB_SYNC_ALL ||
time_after(jiffies, inode->dirtied_time_when +
- dirtytime_expire_interval * HZ))) {
- trace_writeback_lazytime(inode);
- mark_inode_dirty_sync(inode);
- }
+ dirtytime_expire_interval * HZ)))
+ sync_lazytime(inode);
/*
* Get and clear the dirty flags from i_state. This needs to be done
diff --git a/fs/inode.c b/fs/inode.c
index 2c0d69f7fd01..f1c09fc0913d 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1979,11 +1979,8 @@ void iput(struct inode *inode)
if (atomic_add_unless(&inode->i_count, -1, 1))
return;
- if ((inode_state_read_once(inode) & I_DIRTY_TIME) && inode->i_nlink) {
- trace_writeback_lazytime_iput(inode);
- mark_inode_dirty_sync(inode);
+ if (inode->i_nlink && sync_lazytime(inode))
goto retry;
- }
spin_lock(&inode->i_lock);
if (unlikely((inode_state_read(inode) & I_DIRTY_TIME) && inode->i_nlink)) {
diff --git a/fs/internal.h b/fs/internal.h
index ab638d41ab81..18a062c1b5b0 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -214,7 +214,8 @@ bool in_group_or_capable(struct mnt_idmap *idmap,
/*
* fs-writeback.c
*/
-extern long get_nr_dirty_inodes(void);
+long get_nr_dirty_inodes(void);
+bool sync_lazytime(struct inode *inode);
/*
* dcache.c
diff --git a/fs/sync.c b/fs/sync.c
index 431fc5f5be06..4283af7119d1 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -183,8 +183,8 @@ int vfs_fsync_range(struct file *file, loff_t start, loff_t end, int datasync)
if (!file->f_op->fsync)
return -EINVAL;
- if (!datasync && (inode_state_read_once(inode) & I_DIRTY_TIME))
- mark_inode_dirty_sync(inode);
+ if (!datasync)
+ sync_lazytime(inode);
return file->f_op->fsync(file, start, end, datasync);
}
EXPORT_SYMBOL(vfs_fsync_range);
diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
index 311a341e6fe4..7162d03e69a5 100644
--- a/include/trace/events/writeback.h
+++ b/include/trace/events/writeback.h
@@ -856,12 +856,6 @@ DEFINE_EVENT(writeback_inode_template, writeback_lazytime,
TP_ARGS(inode)
);
-DEFINE_EVENT(writeback_inode_template, writeback_lazytime_iput,
- TP_PROTO(struct inode *inode),
-
- TP_ARGS(inode)
-);
-
DEFINE_EVENT(writeback_inode_template, writeback_dirty_inode_enqueue,
TP_PROTO(struct inode *inode),
--
2.47.3
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 07/10] fs: add a ->sync_lazytime method
2025-12-17 6:09 re-enable IOCB_NOWAIT writes to files v3 Christoph Hellwig
` (5 preceding siblings ...)
2025-12-17 6:09 ` [PATCH 06/10] fs: factor out a sync_lazytime helper Christoph Hellwig
@ 2025-12-17 6:09 ` Christoph Hellwig
2025-12-17 12:30 ` Jan Kara
2025-12-17 6:09 ` [PATCH 08/10] fs: add support for non-blocking timestamp updates Christoph Hellwig
` (2 subsequent siblings)
9 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2025-12-17 6:09 UTC (permalink / raw)
To: Christian Brauner
Cc: Al Viro, David Sterba, Jan Kara, Mike Marshall,
Martin Brandenburg, Carlos Maiolino, Stefan Roesch, Jeff Layton,
linux-kernel, linux-btrfs, linux-fsdevel, gfs2, io-uring, devel,
linux-unionfs, linux-mtd, linux-xfs, linux-nfs
Allow the file system to explicitly implement lazytime syncing instead
of pigging back on generic inode dirtying. This allows to simplify
the XFS implementation and prepares for non-blocking lazytime timestamp
updates.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
---
Documentation/filesystems/locking.rst | 2 ++
Documentation/filesystems/vfs.rst | 6 ++++++
fs/fs-writeback.c | 16 +++++++++++++---
include/linux/fs.h | 1 +
4 files changed, 22 insertions(+), 3 deletions(-)
diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index 77704fde9845..9b2f14ada8cd 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -81,6 +81,7 @@ prototypes::
ssize_t (*listxattr) (struct dentry *, char *, size_t);
int (*fiemap)(struct inode *, struct fiemap_extent_info *, u64 start, u64 len);
void (*update_time)(struct inode *, struct timespec *, int);
+ void (*sync_lazytime)(struct inode *inode);
int (*atomic_open)(struct inode *, struct dentry *,
struct file *, unsigned open_flag,
umode_t create_mode);
@@ -117,6 +118,7 @@ getattr: no
listxattr: no
fiemap: no
update_time: no
+sync_lazytime: no
atomic_open: shared (exclusive if O_CREAT is set in open flags)
tmpfile: no
fileattr_get: no or exclusive
diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index 670ba66b60e4..4509655d12c6 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -486,6 +486,7 @@ As of kernel 2.6.22, the following members are defined:
int (*getattr) (struct mnt_idmap *, const struct path *, struct kstat *, u32, unsigned int);
ssize_t (*listxattr) (struct dentry *, char *, size_t);
void (*update_time)(struct inode *, struct timespec *, int);
+ void (*sync_lazytime)(struct inode *inode);
int (*atomic_open)(struct inode *, struct dentry *, struct file *,
unsigned open_flag, umode_t create_mode);
int (*tmpfile) (struct mnt_idmap *, struct inode *, struct file *, umode_t);
@@ -642,6 +643,11 @@ otherwise noted.
an inode. If this is not defined the VFS will update the inode
itself and call mark_inode_dirty_sync.
+``sync_lazytime``:
+ called by the writeback code to update the lazy time stamps to
+ regular time stamp updates that get syncing into the on-disk
+ inode.
+
``atomic_open``
called on the last component of an open. Using this optional
method the filesystem can look up, possibly create and open the
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index fa555e10d8b9..c2e08eaeadc8 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1717,7 +1717,10 @@ bool sync_lazytime(struct inode *inode)
return false;
trace_writeback_lazytime(inode);
- mark_inode_dirty_sync(inode);
+ if (inode->i_op->sync_lazytime)
+ inode->i_op->sync_lazytime(inode);
+ else
+ mark_inode_dirty_sync(inode);
return true;
}
@@ -2569,16 +2572,20 @@ void __mark_inode_dirty(struct inode *inode, int flags)
trace_writeback_mark_inode_dirty(inode, flags);
if (flags & I_DIRTY_INODE) {
+ bool was_dirty_time =
+ inode_state_read_once(inode) & I_DIRTY_TIME;
+
/*
* Inode timestamp update will piggback on this dirtying.
* We tell ->dirty_inode callback that timestamps need to
* be updated by setting I_DIRTY_TIME in flags.
*/
- if (inode_state_read_once(inode) & I_DIRTY_TIME) {
+ if (was_dirty_time) {
spin_lock(&inode->i_lock);
if (inode_state_read(inode) & I_DIRTY_TIME) {
inode_state_clear(inode, I_DIRTY_TIME);
flags |= I_DIRTY_TIME;
+ was_dirty_time = true;
}
spin_unlock(&inode->i_lock);
}
@@ -2591,9 +2598,12 @@ void __mark_inode_dirty(struct inode *inode, int flags)
* for just I_DIRTY_PAGES or I_DIRTY_TIME.
*/
trace_writeback_dirty_inode_start(inode, flags);
- if (sb->s_op->dirty_inode)
+ if (sb->s_op->dirty_inode) {
sb->s_op->dirty_inode(inode,
flags & (I_DIRTY_INODE | I_DIRTY_TIME));
+ } else if (was_dirty_time && inode->i_op->sync_lazytime) {
+ inode->i_op->sync_lazytime(inode);
+ }
trace_writeback_dirty_inode(inode, flags);
/* I_DIRTY_INODE supersedes I_DIRTY_TIME. */
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 75d5f38b08c9..255eb3b42d1d 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2011,6 +2011,7 @@ struct inode_operations {
int (*fiemap)(struct inode *, struct fiemap_extent_info *, u64 start,
u64 len);
int (*update_time)(struct inode *, int);
+ void (*sync_lazytime)(struct inode *inode);
int (*atomic_open)(struct inode *, struct dentry *,
struct file *, unsigned open_flag,
umode_t create_mode);
--
2.47.3
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 08/10] fs: add support for non-blocking timestamp updates
2025-12-17 6:09 re-enable IOCB_NOWAIT writes to files v3 Christoph Hellwig
` (6 preceding siblings ...)
2025-12-17 6:09 ` [PATCH 07/10] fs: add a ->sync_lazytime method Christoph Hellwig
@ 2025-12-17 6:09 ` Christoph Hellwig
2025-12-17 12:42 ` Jan Kara
2025-12-17 6:09 ` [PATCH 09/10] xfs: implement ->sync_lazytime Christoph Hellwig
2025-12-17 6:09 ` [PATCH 10/10] xfs: enable non-blocking timestamp updates Christoph Hellwig
9 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2025-12-17 6:09 UTC (permalink / raw)
To: Christian Brauner
Cc: Al Viro, David Sterba, Jan Kara, Mike Marshall,
Martin Brandenburg, Carlos Maiolino, Stefan Roesch, Jeff Layton,
linux-kernel, linux-btrfs, linux-fsdevel, gfs2, io-uring, devel,
linux-unionfs, linux-mtd, linux-xfs, linux-nfs
Currently file_update_time_flags unconditionally returns -EAGAIN if any
timestamp needs to be updated and IOCB_NOWAIT is passed. This makes
non-blocking direct writes impossible on file systems with granular
enough timestamps.
Add a S_NOWAIT to ask for timestamps to not block, and return -EAGAIN in
all methods for now.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
---
fs/btrfs/inode.c | 3 +++
fs/fat/misc.c | 3 +++
fs/gfs2/inode.c | 3 +++
fs/inode.c | 30 +++++++++++++++++++++++++-----
fs/orangefs/inode.c | 3 +++
fs/overlayfs/inode.c | 3 +++
fs/ubifs/file.c | 3 +++
fs/xfs/xfs_iops.c | 3 +++
include/linux/fs.h | 10 ++++++----
9 files changed, 52 insertions(+), 9 deletions(-)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 3ca8d294770e..7e5553878818 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6354,6 +6354,9 @@ static int btrfs_update_time(struct inode *inode, int flags)
if (btrfs_root_readonly(root))
return -EROFS;
+ if (flags & S_NOWAIT)
+ return -EAGAIN;
+
error = inode_update_timestamps(inode, &flags);
if (error || !flags)
return error;
diff --git a/fs/fat/misc.c b/fs/fat/misc.c
index 950da09f0961..5df3193c35f9 100644
--- a/fs/fat/misc.c
+++ b/fs/fat/misc.c
@@ -346,6 +346,9 @@ int fat_update_time(struct inode *inode, int flags)
if (inode->i_ino == MSDOS_ROOT_INO)
return 0;
+ if (flags & S_NOWAIT)
+ return -EAGAIN;
+
if (flags & (S_ATIME | S_CTIME | S_MTIME)) {
fat_truncate_time(inode, NULL, flags);
if (inode->i_sb->s_flags & SB_LAZYTIME)
diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index e08eb419347c..0dce2af533b2 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -2249,6 +2249,9 @@ static int gfs2_update_time(struct inode *inode, int flags)
struct gfs2_holder *gh;
int error;
+ if (flags & S_NOWAIT)
+ return -EAGAIN;
+
gh = gfs2_glock_is_locked_by_me(gl);
if (gh && gl->gl_state != LM_ST_EXCLUSIVE) {
gfs2_glock_dq(gh);
diff --git a/fs/inode.c b/fs/inode.c
index f1c09fc0913d..0180ad526cf8 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -2110,12 +2110,26 @@ int inode_update_timestamps(struct inode *inode, int *flags)
now = inode_set_ctime_current(inode);
if (!timespec64_equal(&now, &ctime))
updated |= S_CTIME;
- if (!timespec64_equal(&now, &mtime)) {
- inode_set_mtime_to_ts(inode, now);
+ if (!timespec64_equal(&now, &mtime))
updated |= S_MTIME;
+
+ if (IS_I_VERSION(inode)) {
+ if (*flags & S_NOWAIT) {
+ /*
+ * Error out if we'd need timestamp updates, as
+ * the generally requires blocking to dirty the
+ * inode in one form or another.
+ */
+ if (updated && inode_iversion_need_inc(inode))
+ goto bail;
+ } else {
+ if (inode_maybe_inc_iversion(inode, updated))
+ updated |= S_VERSION;
+ }
}
- if (IS_I_VERSION(inode) && inode_maybe_inc_iversion(inode, updated))
- updated |= S_VERSION;
+
+ if (updated & S_MTIME)
+ inode_set_mtime_to_ts(inode, now);
} else {
now = current_time(inode);
}
@@ -2131,6 +2145,9 @@ int inode_update_timestamps(struct inode *inode, int *flags)
*flags = updated;
return 0;
+bail:
+ *flags = 0;
+ return -EAGAIN;
}
EXPORT_SYMBOL(inode_update_timestamps);
@@ -2150,6 +2167,9 @@ int generic_update_time(struct inode *inode, int flags)
{
int error;
+ if (flags & S_NOWAIT)
+ return -EAGAIN;
+
error = inode_update_timestamps(inode, &flags);
if (!error && flags)
mark_inode_dirty_time(inode, flags);
@@ -2378,7 +2398,7 @@ static int file_update_time_flags(struct file *file, unsigned int flags)
return 0;
if (flags & IOCB_NOWAIT)
- return -EAGAIN;
+ sync_mode |= S_NOWAIT;
if (mnt_get_write_access_file(file))
return 0;
diff --git a/fs/orangefs/inode.c b/fs/orangefs/inode.c
index 3b58f31bd54f..a84142f56344 100644
--- a/fs/orangefs/inode.c
+++ b/fs/orangefs/inode.c
@@ -880,6 +880,9 @@ int orangefs_update_time(struct inode *inode, int flags)
gossip_debug(GOSSIP_INODE_DEBUG, "orangefs_update_time: %pU\n",
get_khandle_from_ino(inode));
+ if (flags & S_NOWAIT)
+ return -EAGAIN;
+
error = inode_update_timestamps(inode, &flags);
if (error || !flags)
return error;
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index bdbf86b56a9b..28ec75994cb3 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -564,6 +564,9 @@ int ovl_update_time(struct inode *inode, int flags)
.dentry = ovl_upperdentry_dereference(OVL_I(inode)),
};
+ if (flags & S_NOWAIT)
+ return -EAGAIN;
+
if (upperpath.dentry) {
touch_atime(&upperpath);
inode_set_atime_to_ts(inode,
diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index 71540644a931..fd47d0e972e2 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -1382,6 +1382,9 @@ int ubifs_update_time(struct inode *inode, int flags)
if (!IS_ENABLED(CONFIG_UBIFS_ATIME_SUPPORT))
return generic_update_time(inode, flags);
+ if (flags & S_NOWAIT)
+ return -EAGAIN;
+
err = ubifs_budget_space(c, &req);
if (err)
return err;
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 9dedb54e3cb0..626a541b247b 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1195,6 +1195,9 @@ xfs_vn_update_time(
trace_xfs_update_time(ip);
+ if (flags & S_NOWAIT)
+ return -EAGAIN;
+
if (inode->i_sb->s_flags & SB_LAZYTIME) {
if (!((flags & S_VERSION) &&
inode_maybe_inc_iversion(inode, false)))
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 255eb3b42d1d..34152f687b46 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2240,10 +2240,12 @@ static inline void inode_dec_link_count(struct inode *inode)
}
enum file_time_flags {
- S_ATIME = 1,
- S_MTIME = 2,
- S_CTIME = 4,
- S_VERSION = 8,
+ S_ATIME = 1U << 0,
+ S_MTIME = 1U << 1,
+ S_CTIME = 1U << 2,
+ S_VERSION = 1U << 3,
+
+ S_NOWAIT = 1U << 15,
};
extern bool atime_needs_update(const struct path *, struct inode *);
--
2.47.3
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 09/10] xfs: implement ->sync_lazytime
2025-12-17 6:09 re-enable IOCB_NOWAIT writes to files v3 Christoph Hellwig
` (7 preceding siblings ...)
2025-12-17 6:09 ` [PATCH 08/10] fs: add support for non-blocking timestamp updates Christoph Hellwig
@ 2025-12-17 6:09 ` Christoph Hellwig
2025-12-17 6:09 ` [PATCH 10/10] xfs: enable non-blocking timestamp updates Christoph Hellwig
9 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2025-12-17 6:09 UTC (permalink / raw)
To: Christian Brauner
Cc: Al Viro, David Sterba, Jan Kara, Mike Marshall,
Martin Brandenburg, Carlos Maiolino, Stefan Roesch, Jeff Layton,
linux-kernel, linux-btrfs, linux-fsdevel, gfs2, io-uring, devel,
linux-unionfs, linux-mtd, linux-xfs, linux-nfs
Switch to the new explicit lazytime syncing method instead of trying
to second guess what could be a lazytime update in ->dirty_inode.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
---
fs/xfs/xfs_iops.c | 20 ++++++++++++++++++++
fs/xfs/xfs_super.c | 29 -----------------------------
2 files changed, 20 insertions(+), 29 deletions(-)
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 626a541b247b..e12c6e6d313e 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1227,6 +1227,22 @@ xfs_vn_update_time(
return xfs_trans_commit(tp);
}
+static void
+xfs_vn_sync_lazytime(
+ struct inode *inode)
+{
+ struct xfs_inode *ip = XFS_I(inode);
+ struct xfs_mount *mp = ip->i_mount;
+ struct xfs_trans *tp;
+
+ if (xfs_trans_alloc(mp, &M_RES(mp)->tr_fsyncts, 0, 0, 0, &tp))
+ return;
+ xfs_ilock(ip, XFS_ILOCK_EXCL);
+ xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
+ xfs_trans_log_inode(tp, ip, XFS_ILOG_TIMESTAMP);
+ xfs_trans_commit(tp);
+}
+
STATIC int
xfs_vn_fiemap(
struct inode *inode,
@@ -1270,6 +1286,7 @@ static const struct inode_operations xfs_inode_operations = {
.listxattr = xfs_vn_listxattr,
.fiemap = xfs_vn_fiemap,
.update_time = xfs_vn_update_time,
+ .sync_lazytime = xfs_vn_sync_lazytime,
.fileattr_get = xfs_fileattr_get,
.fileattr_set = xfs_fileattr_set,
};
@@ -1296,6 +1313,7 @@ static const struct inode_operations xfs_dir_inode_operations = {
.setattr = xfs_vn_setattr,
.listxattr = xfs_vn_listxattr,
.update_time = xfs_vn_update_time,
+ .sync_lazytime = xfs_vn_sync_lazytime,
.tmpfile = xfs_vn_tmpfile,
.fileattr_get = xfs_fileattr_get,
.fileattr_set = xfs_fileattr_set,
@@ -1323,6 +1341,7 @@ static const struct inode_operations xfs_dir_ci_inode_operations = {
.setattr = xfs_vn_setattr,
.listxattr = xfs_vn_listxattr,
.update_time = xfs_vn_update_time,
+ .sync_lazytime = xfs_vn_sync_lazytime,
.tmpfile = xfs_vn_tmpfile,
.fileattr_get = xfs_fileattr_get,
.fileattr_set = xfs_fileattr_set,
@@ -1334,6 +1353,7 @@ static const struct inode_operations xfs_symlink_inode_operations = {
.setattr = xfs_vn_setattr,
.listxattr = xfs_vn_listxattr,
.update_time = xfs_vn_update_time,
+ .sync_lazytime = xfs_vn_sync_lazytime,
.fileattr_get = xfs_fileattr_get,
.fileattr_set = xfs_fileattr_set,
};
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index bc71aa9dcee8..094f257eff15 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -712,34 +712,6 @@ xfs_fs_destroy_inode(
xfs_inode_mark_reclaimable(ip);
}
-static void
-xfs_fs_dirty_inode(
- struct inode *inode,
- int flags)
-{
- struct xfs_inode *ip = XFS_I(inode);
- struct xfs_mount *mp = ip->i_mount;
- struct xfs_trans *tp;
-
- if (!(inode->i_sb->s_flags & SB_LAZYTIME))
- return;
-
- /*
- * Only do the timestamp update if the inode is dirty (I_DIRTY_SYNC)
- * and has dirty timestamp (I_DIRTY_TIME). I_DIRTY_TIME can be passed
- * in flags possibly together with I_DIRTY_SYNC.
- */
- if ((flags & ~I_DIRTY_TIME) != I_DIRTY_SYNC || !(flags & I_DIRTY_TIME))
- return;
-
- if (xfs_trans_alloc(mp, &M_RES(mp)->tr_fsyncts, 0, 0, 0, &tp))
- return;
- xfs_ilock(ip, XFS_ILOCK_EXCL);
- xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
- xfs_trans_log_inode(tp, ip, XFS_ILOG_TIMESTAMP);
- xfs_trans_commit(tp);
-}
-
/*
* Slab object creation initialisation for the XFS inode.
* This covers only the idempotent fields in the XFS inode;
@@ -1304,7 +1276,6 @@ xfs_fs_show_stats(
static const struct super_operations xfs_super_operations = {
.alloc_inode = xfs_fs_alloc_inode,
.destroy_inode = xfs_fs_destroy_inode,
- .dirty_inode = xfs_fs_dirty_inode,
.drop_inode = xfs_fs_drop_inode,
.evict_inode = xfs_fs_evict_inode,
.put_super = xfs_fs_put_super,
--
2.47.3
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 10/10] xfs: enable non-blocking timestamp updates
2025-12-17 6:09 re-enable IOCB_NOWAIT writes to files v3 Christoph Hellwig
` (8 preceding siblings ...)
2025-12-17 6:09 ` [PATCH 09/10] xfs: implement ->sync_lazytime Christoph Hellwig
@ 2025-12-17 6:09 ` Christoph Hellwig
9 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2025-12-17 6:09 UTC (permalink / raw)
To: Christian Brauner
Cc: Al Viro, David Sterba, Jan Kara, Mike Marshall,
Martin Brandenburg, Carlos Maiolino, Stefan Roesch, Jeff Layton,
linux-kernel, linux-btrfs, linux-fsdevel, gfs2, io-uring, devel,
linux-unionfs, linux-mtd, linux-xfs, linux-nfs
The lazytime path using the generic helpers can never block in XFS
because there is no ->dirty_inode method that could block. Allow
non-blocking timestamp updates for this case by replacing
generic_update_times with the open coded version without the S_NOWAIT
check.
Fixes: 66fa3cedf16a ("fs: Add async write file modification handling.")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
---
fs/xfs/xfs_iops.c | 20 ++++++++++++++------
1 file changed, 14 insertions(+), 6 deletions(-)
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index e12c6e6d313e..7b6aa438cebc 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1195,16 +1195,24 @@ xfs_vn_update_time(
trace_xfs_update_time(ip);
- if (flags & S_NOWAIT)
- return -EAGAIN;
-
if (inode->i_sb->s_flags & SB_LAZYTIME) {
- if (!((flags & S_VERSION) &&
- inode_maybe_inc_iversion(inode, false)))
- return generic_update_time(inode, flags);
+ int updated = flags;
+
+ error = inode_update_timestamps(inode, &updated);
+ if (error)
+ return error;
+
+ if (!(updated & S_VERSION)) {
+ if (updated)
+ mark_inode_dirty_time(inode, updated);
+ return 0;
+ }
/* Capture the iversion update that just occurred */
log_flags |= XFS_ILOG_CORE;
+ } else {
+ if (flags & S_NOWAIT)
+ return -EAGAIN;
}
error = xfs_trans_alloc(mp, &M_RES(mp)->tr_fsyncts, 0, 0, 0, &tp);
--
2.47.3
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH 05/10] fs: allow error returns from inode_update_timestamps
2025-12-17 6:09 ` [PATCH 05/10] fs: allow error returns from inode_update_timestamps Christoph Hellwig
@ 2025-12-17 12:23 ` Jan Kara
0 siblings, 0 replies; 19+ messages in thread
From: Jan Kara @ 2025-12-17 12:23 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Christian Brauner, Al Viro, David Sterba, Jan Kara, Mike Marshall,
Martin Brandenburg, Carlos Maiolino, Stefan Roesch, Jeff Layton,
linux-kernel, linux-btrfs, linux-fsdevel, gfs2, io-uring, devel,
linux-unionfs, linux-mtd, linux-xfs, linux-nfs
On Wed 17-12-25 07:09:38, Christoph Hellwig wrote:
> Change flags to a by reference argument so that it can be updated so that
> the return value can be used for error returns. This will be used to
> implement non-blocking timestamp updates.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Jeff Layton <jlayton@kernel.org>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> fs/btrfs/inode.c | 8 +++++---
> fs/inode.c | 24 ++++++++++++++++--------
> fs/nfs/inode.c | 4 ++--
> fs/orangefs/inode.c | 5 ++++-
> fs/ubifs/file.c | 2 +-
> include/linux/fs.h | 2 +-
> 6 files changed, 29 insertions(+), 16 deletions(-)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 317db7d10a21..3ca8d294770e 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -6349,13 +6349,15 @@ static int btrfs_dirty_inode(struct btrfs_inode *inode)
> static int btrfs_update_time(struct inode *inode, int flags)
> {
> struct btrfs_root *root = BTRFS_I(inode)->root;
> - bool dirty;
> + int error;
>
> if (btrfs_root_readonly(root))
> return -EROFS;
>
> - dirty = inode_update_timestamps(inode, flags);
> - return dirty ? btrfs_dirty_inode(BTRFS_I(inode)) : 0;
> + error = inode_update_timestamps(inode, &flags);
> + if (error || !flags)
> + return error;
> + return btrfs_dirty_inode(BTRFS_I(inode));
> }
>
> /*
> diff --git a/fs/inode.c b/fs/inode.c
> index 17ecb7bb5067..2c0d69f7fd01 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -2095,14 +2095,18 @@ static bool relatime_need_update(struct vfsmount *mnt, struct inode *inode,
> * attempt to update all three of them. S_ATIME updates can be handled
> * independently of the rest.
> *
> - * Returns a set of S_* flags indicating which values changed.
> + * Updates @flags to contain the S_* flags which actually need changing. This
> + * can drop flags from the input when they don't need an update, or can add
> + * S_VERSION when the version needs to be bumped.
> + *
> + * Returns 0 or a negative errno.
> */
> -int inode_update_timestamps(struct inode *inode, int flags)
> +int inode_update_timestamps(struct inode *inode, int *flags)
> {
> int updated = 0;
> struct timespec64 now;
>
> - if (flags & (S_MTIME|S_CTIME|S_VERSION)) {
> + if (*flags & (S_MTIME | S_CTIME | S_VERSION)) {
> struct timespec64 ctime = inode_get_ctime(inode);
> struct timespec64 mtime = inode_get_mtime(inode);
>
> @@ -2119,7 +2123,7 @@ int inode_update_timestamps(struct inode *inode, int flags)
> now = current_time(inode);
> }
>
> - if (flags & S_ATIME) {
> + if (*flags & S_ATIME) {
> struct timespec64 atime = inode_get_atime(inode);
>
> if (!timespec64_equal(&now, &atime)) {
> @@ -2127,7 +2131,9 @@ int inode_update_timestamps(struct inode *inode, int flags)
> updated |= S_ATIME;
> }
> }
> - return updated;
> +
> + *flags = updated;
> + return 0;
> }
> EXPORT_SYMBOL(inode_update_timestamps);
>
> @@ -2145,10 +2151,12 @@ EXPORT_SYMBOL(inode_update_timestamps);
> */
> int generic_update_time(struct inode *inode, int flags)
> {
> - flags = inode_update_timestamps(inode, flags);
> - if (flags)
> + int error;
> +
> + error = inode_update_timestamps(inode, &flags);
> + if (!error && flags)
> mark_inode_dirty_time(inode, flags);
> - return 0;
> + return error;
> }
> EXPORT_SYMBOL(generic_update_time);
>
> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> index 84049f3cd340..221816524c66 100644
> --- a/fs/nfs/inode.c
> +++ b/fs/nfs/inode.c
> @@ -671,8 +671,8 @@ static void nfs_set_timestamps_to_ts(struct inode *inode, struct iattr *attr)
>
> static void nfs_update_timestamps(struct inode *inode, unsigned int ia_valid)
> {
> - enum file_time_flags time_flags = 0;
> unsigned int cache_flags = 0;
> + int time_flags = 0;
>
> if (ia_valid & ATTR_MTIME) {
> time_flags |= S_MTIME | S_CTIME;
> @@ -682,7 +682,7 @@ static void nfs_update_timestamps(struct inode *inode, unsigned int ia_valid)
> time_flags |= S_ATIME;
> cache_flags |= NFS_INO_INVALID_ATIME;
> }
> - inode_update_timestamps(inode, time_flags);
> + inode_update_timestamps(inode, &time_flags);
> NFS_I(inode)->cache_validity &= ~cache_flags;
> }
>
> diff --git a/fs/orangefs/inode.c b/fs/orangefs/inode.c
> index d7275990ffa4..3b58f31bd54f 100644
> --- a/fs/orangefs/inode.c
> +++ b/fs/orangefs/inode.c
> @@ -875,11 +875,14 @@ int orangefs_permission(struct mnt_idmap *idmap,
> int orangefs_update_time(struct inode *inode, int flags)
> {
> struct iattr iattr;
> + int error;
>
> gossip_debug(GOSSIP_INODE_DEBUG, "orangefs_update_time: %pU\n",
> get_khandle_from_ino(inode));
>
> - flags = inode_update_timestamps(inode, flags);
> + error = inode_update_timestamps(inode, &flags);
> + if (error || !flags)
> + return error;
>
> memset(&iattr, 0, sizeof iattr);
> if (flags & S_ATIME)
> diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
> index ec1bb9f43acc..71540644a931 100644
> --- a/fs/ubifs/file.c
> +++ b/fs/ubifs/file.c
> @@ -1387,7 +1387,7 @@ int ubifs_update_time(struct inode *inode, int flags)
> return err;
>
> mutex_lock(&ui->ui_mutex);
> - inode_update_timestamps(inode, flags);
> + inode_update_timestamps(inode, &flags);
> release = ui->dirty;
> __mark_inode_dirty(inode, I_DIRTY_SYNC);
> mutex_unlock(&ui->ui_mutex);
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 66d3d18cf4e3..75d5f38b08c9 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2399,7 +2399,7 @@ static inline void super_set_sysfs_name_generic(struct super_block *sb, const ch
> extern void ihold(struct inode * inode);
> extern void iput(struct inode *);
> void iput_not_last(struct inode *);
> -int inode_update_timestamps(struct inode *inode, int flags);
> +int inode_update_timestamps(struct inode *inode, int *flags);
> int generic_update_time(struct inode *inode, int flags);
>
> /* /sys/fs */
> --
> 2.47.3
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 06/10] fs: factor out a sync_lazytime helper
2025-12-17 6:09 ` [PATCH 06/10] fs: factor out a sync_lazytime helper Christoph Hellwig
@ 2025-12-17 12:26 ` Jan Kara
0 siblings, 0 replies; 19+ messages in thread
From: Jan Kara @ 2025-12-17 12:26 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Christian Brauner, Al Viro, David Sterba, Jan Kara, Mike Marshall,
Martin Brandenburg, Carlos Maiolino, Stefan Roesch, Jeff Layton,
linux-kernel, linux-btrfs, linux-fsdevel, gfs2, io-uring, devel,
linux-unionfs, linux-mtd, linux-xfs, linux-nfs
On Wed 17-12-25 07:09:39, Christoph Hellwig wrote:
> Centralize how we synchronize a lazytime update into the actual on-disk
> timestamp into a single helper.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Jeff Layton <jlayton@kernel.org>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> fs/fs-writeback.c | 22 +++++++++++++++-------
> fs/inode.c | 5 +----
> fs/internal.h | 3 ++-
> fs/sync.c | 4 ++--
> include/trace/events/writeback.h | 6 ------
> 5 files changed, 20 insertions(+), 20 deletions(-)
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 7870c158e4a2..fa555e10d8b9 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -1711,6 +1711,16 @@ static void requeue_inode(struct inode *inode, struct bdi_writeback *wb,
> }
> }
>
> +bool sync_lazytime(struct inode *inode)
> +{
> + if (!(inode_state_read_once(inode) & I_DIRTY_TIME))
> + return false;
> +
> + trace_writeback_lazytime(inode);
> + mark_inode_dirty_sync(inode);
> + return true;
> +}
> +
> /*
> * Write out an inode and its dirty pages (or some of its dirty pages, depending
> * on @wbc->nr_to_write), and clear the relevant dirty flags from i_state.
> @@ -1750,17 +1760,15 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
> }
>
> /*
> - * If the inode has dirty timestamps and we need to write them, call
> - * mark_inode_dirty_sync() to notify the filesystem about it and to
> - * change I_DIRTY_TIME into I_DIRTY_SYNC.
> + * For data integrity writeback, or when the dirty interval expired,
> + * ask the file system to propagata lazy timestamp updates into real
> + * dirty state.
> */
> if ((inode_state_read_once(inode) & I_DIRTY_TIME) &&
> (wbc->sync_mode == WB_SYNC_ALL ||
> time_after(jiffies, inode->dirtied_time_when +
> - dirtytime_expire_interval * HZ))) {
> - trace_writeback_lazytime(inode);
> - mark_inode_dirty_sync(inode);
> - }
> + dirtytime_expire_interval * HZ)))
> + sync_lazytime(inode);
>
> /*
> * Get and clear the dirty flags from i_state. This needs to be done
> diff --git a/fs/inode.c b/fs/inode.c
> index 2c0d69f7fd01..f1c09fc0913d 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -1979,11 +1979,8 @@ void iput(struct inode *inode)
> if (atomic_add_unless(&inode->i_count, -1, 1))
> return;
>
> - if ((inode_state_read_once(inode) & I_DIRTY_TIME) && inode->i_nlink) {
> - trace_writeback_lazytime_iput(inode);
> - mark_inode_dirty_sync(inode);
> + if (inode->i_nlink && sync_lazytime(inode))
> goto retry;
> - }
>
> spin_lock(&inode->i_lock);
> if (unlikely((inode_state_read(inode) & I_DIRTY_TIME) && inode->i_nlink)) {
> diff --git a/fs/internal.h b/fs/internal.h
> index ab638d41ab81..18a062c1b5b0 100644
> --- a/fs/internal.h
> +++ b/fs/internal.h
> @@ -214,7 +214,8 @@ bool in_group_or_capable(struct mnt_idmap *idmap,
> /*
> * fs-writeback.c
> */
> -extern long get_nr_dirty_inodes(void);
> +long get_nr_dirty_inodes(void);
> +bool sync_lazytime(struct inode *inode);
>
> /*
> * dcache.c
> diff --git a/fs/sync.c b/fs/sync.c
> index 431fc5f5be06..4283af7119d1 100644
> --- a/fs/sync.c
> +++ b/fs/sync.c
> @@ -183,8 +183,8 @@ int vfs_fsync_range(struct file *file, loff_t start, loff_t end, int datasync)
>
> if (!file->f_op->fsync)
> return -EINVAL;
> - if (!datasync && (inode_state_read_once(inode) & I_DIRTY_TIME))
> - mark_inode_dirty_sync(inode);
> + if (!datasync)
> + sync_lazytime(inode);
> return file->f_op->fsync(file, start, end, datasync);
> }
> EXPORT_SYMBOL(vfs_fsync_range);
> diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
> index 311a341e6fe4..7162d03e69a5 100644
> --- a/include/trace/events/writeback.h
> +++ b/include/trace/events/writeback.h
> @@ -856,12 +856,6 @@ DEFINE_EVENT(writeback_inode_template, writeback_lazytime,
> TP_ARGS(inode)
> );
>
> -DEFINE_EVENT(writeback_inode_template, writeback_lazytime_iput,
> - TP_PROTO(struct inode *inode),
> -
> - TP_ARGS(inode)
> -);
> -
> DEFINE_EVENT(writeback_inode_template, writeback_dirty_inode_enqueue,
>
> TP_PROTO(struct inode *inode),
> --
> 2.47.3
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 07/10] fs: add a ->sync_lazytime method
2025-12-17 6:09 ` [PATCH 07/10] fs: add a ->sync_lazytime method Christoph Hellwig
@ 2025-12-17 12:30 ` Jan Kara
2025-12-18 6:13 ` Christoph Hellwig
0 siblings, 1 reply; 19+ messages in thread
From: Jan Kara @ 2025-12-17 12:30 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Christian Brauner, Al Viro, David Sterba, Jan Kara, Mike Marshall,
Martin Brandenburg, Carlos Maiolino, Stefan Roesch, Jeff Layton,
linux-kernel, linux-btrfs, linux-fsdevel, gfs2, io-uring, devel,
linux-unionfs, linux-mtd, linux-xfs, linux-nfs
On Wed 17-12-25 07:09:40, Christoph Hellwig wrote:
> Allow the file system to explicitly implement lazytime syncing instead
> of pigging back on generic inode dirtying. This allows to simplify
> the XFS implementation and prepares for non-blocking lazytime timestamp
> updates.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Jeff Layton <jlayton@kernel.org>
...
> if (flags & I_DIRTY_INODE) {
> + bool was_dirty_time =
> + inode_state_read_once(inode) & I_DIRTY_TIME;
> +
> /*
> * Inode timestamp update will piggback on this dirtying.
> * We tell ->dirty_inode callback that timestamps need to
> * be updated by setting I_DIRTY_TIME in flags.
> */
> - if (inode_state_read_once(inode) & I_DIRTY_TIME) {
> + if (was_dirty_time) {
> spin_lock(&inode->i_lock);
> if (inode_state_read(inode) & I_DIRTY_TIME) {
> inode_state_clear(inode, I_DIRTY_TIME);
> flags |= I_DIRTY_TIME;
> + was_dirty_time = true;
This looks bogus. was_dirty_time is already true here. What I think you
wanted here is to set it to false if locked I_DIRTY_TIME check failed.
Otherwise the patch looks good.
Honza
> }
> spin_unlock(&inode->i_lock);
> }
> @@ -2591,9 +2598,12 @@ void __mark_inode_dirty(struct inode *inode, int flags)
> * for just I_DIRTY_PAGES or I_DIRTY_TIME.
> */
> trace_writeback_dirty_inode_start(inode, flags);
> - if (sb->s_op->dirty_inode)
> + if (sb->s_op->dirty_inode) {
> sb->s_op->dirty_inode(inode,
> flags & (I_DIRTY_INODE | I_DIRTY_TIME));
> + } else if (was_dirty_time && inode->i_op->sync_lazytime) {
> + inode->i_op->sync_lazytime(inode);
> + }
> trace_writeback_dirty_inode(inode, flags);
>
> /* I_DIRTY_INODE supersedes I_DIRTY_TIME. */
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 75d5f38b08c9..255eb3b42d1d 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2011,6 +2011,7 @@ struct inode_operations {
> int (*fiemap)(struct inode *, struct fiemap_extent_info *, u64 start,
> u64 len);
> int (*update_time)(struct inode *, int);
> + void (*sync_lazytime)(struct inode *inode);
> int (*atomic_open)(struct inode *, struct dentry *,
> struct file *, unsigned open_flag,
> umode_t create_mode);
> --
> 2.47.3
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 08/10] fs: add support for non-blocking timestamp updates
2025-12-17 6:09 ` [PATCH 08/10] fs: add support for non-blocking timestamp updates Christoph Hellwig
@ 2025-12-17 12:42 ` Jan Kara
2025-12-18 6:19 ` Christoph Hellwig
0 siblings, 1 reply; 19+ messages in thread
From: Jan Kara @ 2025-12-17 12:42 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Christian Brauner, Al Viro, David Sterba, Jan Kara, Mike Marshall,
Martin Brandenburg, Carlos Maiolino, Stefan Roesch, Jeff Layton,
linux-kernel, linux-btrfs, linux-fsdevel, gfs2, io-uring, devel,
linux-unionfs, linux-mtd, linux-xfs, linux-nfs
On Wed 17-12-25 07:09:41, Christoph Hellwig wrote:
> Currently file_update_time_flags unconditionally returns -EAGAIN if any
> timestamp needs to be updated and IOCB_NOWAIT is passed. This makes
> non-blocking direct writes impossible on file systems with granular
> enough timestamps.
>
> Add a S_NOWAIT to ask for timestamps to not block, and return -EAGAIN in
> all methods for now.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Jeff Layton <jlayton@kernel.org>
...
> @@ -2110,12 +2110,26 @@ int inode_update_timestamps(struct inode *inode, int *flags)
> now = inode_set_ctime_current(inode);
> if (!timespec64_equal(&now, &ctime))
> updated |= S_CTIME;
> - if (!timespec64_equal(&now, &mtime)) {
> - inode_set_mtime_to_ts(inode, now);
> + if (!timespec64_equal(&now, &mtime))
> updated |= S_MTIME;
> +
> + if (IS_I_VERSION(inode)) {
> + if (*flags & S_NOWAIT) {
> + /*
> + * Error out if we'd need timestamp updates, as
> + * the generally requires blocking to dirty the
> + * inode in one form or another.
> + */
> + if (updated && inode_iversion_need_inc(inode))
> + goto bail;
I'm confused here. What the code does is that if S_NOWAIT is set and
i_version needs increment we bail. However the comment as well as the
changelog speaks about timestamps needing update and not about i_version.
And intuitively I agree that if any timestamp is updated, inode needs
dirtying and thus we should bail regardless of whether i_version is updated
as well or not. What am I missing?
Honza
> + } else {
> + if (inode_maybe_inc_iversion(inode, updated))
> + updated |= S_VERSION;
> + }
> }
> - if (IS_I_VERSION(inode) && inode_maybe_inc_iversion(inode, updated))
> - updated |= S_VERSION;
> +
> + if (updated & S_MTIME)
> + inode_set_mtime_to_ts(inode, now);
> } else {
> now = current_time(inode);
> }
> @@ -2131,6 +2145,9 @@ int inode_update_timestamps(struct inode *inode, int *flags)
>
> *flags = updated;
> return 0;
> +bail:
> + *flags = 0;
> + return -EAGAIN;
> }
> EXPORT_SYMBOL(inode_update_timestamps);
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 07/10] fs: add a ->sync_lazytime method
2025-12-17 12:30 ` Jan Kara
@ 2025-12-18 6:13 ` Christoph Hellwig
0 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2025-12-18 6:13 UTC (permalink / raw)
To: Jan Kara
Cc: Christoph Hellwig, Christian Brauner, Al Viro, David Sterba,
Mike Marshall, Martin Brandenburg, Carlos Maiolino, Stefan Roesch,
Jeff Layton, linux-kernel, linux-btrfs, linux-fsdevel, gfs2,
io-uring, devel, linux-unionfs, linux-mtd, linux-xfs, linux-nfs
On Wed, Dec 17, 2025 at 01:30:18PM +0100, Jan Kara wrote:
> > if (flags & I_DIRTY_INODE) {
> > + bool was_dirty_time =
> > + inode_state_read_once(inode) & I_DIRTY_TIME;
> > +
> > /*
> > * Inode timestamp update will piggback on this dirtying.
> > * We tell ->dirty_inode callback that timestamps need to
> > * be updated by setting I_DIRTY_TIME in flags.
> > */
> > - if (inode_state_read_once(inode) & I_DIRTY_TIME) {
> > + if (was_dirty_time) {
> > spin_lock(&inode->i_lock);
> > if (inode_state_read(inode) & I_DIRTY_TIME) {
> > inode_state_clear(inode, I_DIRTY_TIME);
> > flags |= I_DIRTY_TIME;
> > + was_dirty_time = true;
>
> This looks bogus. was_dirty_time is already true here. What I think you
> wanted here is to set it to false if locked I_DIRTY_TIME check failed.
> Otherwise the patch looks good.
Or better set it to false at initialization time and only set it to
true here to simply things a bit. But otherwise: yes.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 08/10] fs: add support for non-blocking timestamp updates
2025-12-17 12:42 ` Jan Kara
@ 2025-12-18 6:19 ` Christoph Hellwig
2025-12-19 15:12 ` Jan Kara
0 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2025-12-18 6:19 UTC (permalink / raw)
To: Jan Kara
Cc: Christoph Hellwig, Christian Brauner, Al Viro, David Sterba,
Mike Marshall, Martin Brandenburg, Carlos Maiolino, Stefan Roesch,
Jeff Layton, linux-kernel, linux-btrfs, linux-fsdevel, gfs2,
io-uring, devel, linux-unionfs, linux-mtd, linux-xfs, linux-nfs
On Wed, Dec 17, 2025 at 01:42:20PM +0100, Jan Kara wrote:
> > @@ -2110,12 +2110,26 @@ int inode_update_timestamps(struct inode *inode, int *flags)
> > now = inode_set_ctime_current(inode);
> > if (!timespec64_equal(&now, &ctime))
> > updated |= S_CTIME;
> > - if (!timespec64_equal(&now, &mtime)) {
> > - inode_set_mtime_to_ts(inode, now);
> > + if (!timespec64_equal(&now, &mtime))
> > updated |= S_MTIME;
> > +
> > + if (IS_I_VERSION(inode)) {
> > + if (*flags & S_NOWAIT) {
> > + /*
> > + * Error out if we'd need timestamp updates, as
> > + * the generally requires blocking to dirty the
> > + * inode in one form or another.
> > + */
> > + if (updated && inode_iversion_need_inc(inode))
> > + goto bail;
>
> I'm confused here. What the code does is that if S_NOWAIT is set and
> i_version needs increment we bail. However the comment as well as the
> changelog speaks about timestamps needing update and not about i_version.
> And intuitively I agree that if any timestamp is updated, inode needs
> dirtying and thus we should bail regardless of whether i_version is updated
> as well or not. What am I missing?
With lazytime timestamp updates that don't require i_version updates
are performed in-memory only, and we'll only reach this with S_NOWAIT
set for those (later in the series, it can't be reached at all as
of this patch).
But yes, this needs to be documented much better.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 08/10] fs: add support for non-blocking timestamp updates
2025-12-18 6:19 ` Christoph Hellwig
@ 2025-12-19 15:12 ` Jan Kara
2025-12-22 23:41 ` Christoph Hellwig
0 siblings, 1 reply; 19+ messages in thread
From: Jan Kara @ 2025-12-19 15:12 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jan Kara, Christian Brauner, Al Viro, David Sterba, Mike Marshall,
Martin Brandenburg, Carlos Maiolino, Stefan Roesch, Jeff Layton,
linux-kernel, linux-btrfs, linux-fsdevel, gfs2, io-uring, devel,
linux-unionfs, linux-mtd, linux-xfs, linux-nfs
On Thu 18-12-25 07:19:00, Christoph Hellwig wrote:
> On Wed, Dec 17, 2025 at 01:42:20PM +0100, Jan Kara wrote:
> > > @@ -2110,12 +2110,26 @@ int inode_update_timestamps(struct inode *inode, int *flags)
> > > now = inode_set_ctime_current(inode);
> > > if (!timespec64_equal(&now, &ctime))
> > > updated |= S_CTIME;
> > > - if (!timespec64_equal(&now, &mtime)) {
> > > - inode_set_mtime_to_ts(inode, now);
> > > + if (!timespec64_equal(&now, &mtime))
> > > updated |= S_MTIME;
> > > +
> > > + if (IS_I_VERSION(inode)) {
> > > + if (*flags & S_NOWAIT) {
> > > + /*
> > > + * Error out if we'd need timestamp updates, as
> > > + * the generally requires blocking to dirty the
> > > + * inode in one form or another.
> > > + */
> > > + if (updated && inode_iversion_need_inc(inode))
> > > + goto bail;
> >
> > I'm confused here. What the code does is that if S_NOWAIT is set and
> > i_version needs increment we bail. However the comment as well as the
> > changelog speaks about timestamps needing update and not about i_version.
> > And intuitively I agree that if any timestamp is updated, inode needs
> > dirtying and thus we should bail regardless of whether i_version is updated
> > as well or not. What am I missing?
>
> With lazytime timestamp updates that don't require i_version updates
> are performed in-memory only, and we'll only reach this with S_NOWAIT
> set for those (later in the series, it can't be reached at all as
> of this patch).
Ah, I see now. Thanks for explanation. This interplay between filesystem's
.update_time() helper and inode_update_timestamps() is rather subtle.
Cannot we move the SB_LAZYTIME checking from .update_time() to
inode_update_timestamps() to have it all in one function? The hunk you're
adding to xfs_vn_update_time() later in the series looks like what the
other filesystems using it will want as well?
BTW, I've noticed that ovl_update_time() and fat_update_time() should be
safe wrt NOWAIT IO so perhaps you don't have to disable it in your patch
(or maybe reenable explicitly?).
And I don't really now what orangefs_update_time() is trying to do with its
__orangefs_setattr() call which just copies the zeroed-out timestamps from
iattr into the inode? Mike?
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 08/10] fs: add support for non-blocking timestamp updates
2025-12-19 15:12 ` Jan Kara
@ 2025-12-22 23:41 ` Christoph Hellwig
0 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2025-12-22 23:41 UTC (permalink / raw)
To: Jan Kara
Cc: Christoph Hellwig, Christian Brauner, Al Viro, David Sterba,
Mike Marshall, Martin Brandenburg, Carlos Maiolino, Stefan Roesch,
Jeff Layton, linux-kernel, linux-btrfs, linux-fsdevel, gfs2,
io-uring, devel, linux-unionfs, linux-mtd, linux-xfs, linux-nfs
On Fri, Dec 19, 2025 at 04:12:01PM +0100, Jan Kara wrote:
> Ah, I see now. Thanks for explanation. This interplay between filesystem's
> .update_time() helper and inode_update_timestamps() is rather subtle.
> Cannot we move the SB_LAZYTIME checking from .update_time() to
> inode_update_timestamps() to have it all in one function? The hunk you're
> adding to xfs_vn_update_time() later in the series looks like what the
> other filesystems using it will want as well?
XFS is a bit special as it requires the ilock for timestamp updates
(I'm actually not sure how they are properly serialized for others,
but let's open that can of worms after this one is dealt with..).
But I came up with a way to make this a bit more obvious, which is
by moving the flags selection from mark_inode_dirty_time into
inode_update_timestamps.
> BTW, I've noticed that ovl_update_time() and fat_update_time() should be
> safe wrt NOWAIT IO so perhaps you don't have to disable it in your patch
> (or maybe reenable explicitly?).
fat is safe. overlayfs is not, touch_atime might sleep in the lower fs.
> And I don't really now what orangefs_update_time() is trying to do with its
> __orangefs_setattr() call which just copies the zeroed-out timestamps from
> iattr into the inode? Mike?
I'll leave that to Mike.
^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2025-12-22 23:42 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-12-17 6:09 re-enable IOCB_NOWAIT writes to files v3 Christoph Hellwig
2025-12-17 6:09 ` [PATCH 01/10] fs: remove inode_update_time Christoph Hellwig
2025-12-17 6:09 ` [PATCH 02/10] fs: allow error returns from generic_update_time Christoph Hellwig
2025-12-17 6:09 ` [PATCH 03/10] fs: exit early in generic_update_time when there is no work Christoph Hellwig
2025-12-17 6:09 ` [PATCH 04/10] fs: factor out a mark_inode_dirty_time helper Christoph Hellwig
2025-12-17 6:09 ` [PATCH 05/10] fs: allow error returns from inode_update_timestamps Christoph Hellwig
2025-12-17 12:23 ` Jan Kara
2025-12-17 6:09 ` [PATCH 06/10] fs: factor out a sync_lazytime helper Christoph Hellwig
2025-12-17 12:26 ` Jan Kara
2025-12-17 6:09 ` [PATCH 07/10] fs: add a ->sync_lazytime method Christoph Hellwig
2025-12-17 12:30 ` Jan Kara
2025-12-18 6:13 ` Christoph Hellwig
2025-12-17 6:09 ` [PATCH 08/10] fs: add support for non-blocking timestamp updates Christoph Hellwig
2025-12-17 12:42 ` Jan Kara
2025-12-18 6:19 ` Christoph Hellwig
2025-12-19 15:12 ` Jan Kara
2025-12-22 23:41 ` Christoph Hellwig
2025-12-17 6:09 ` [PATCH 09/10] xfs: implement ->sync_lazytime Christoph Hellwig
2025-12-17 6:09 ` [PATCH 10/10] xfs: enable non-blocking timestamp updates Christoph Hellwig
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox