* [RESEND PATCH v9 01/14] mm: Move starting of background writeback into the main balancing loop
2022-06-23 17:51 [RESEND PATCH v9 00/14] io-uring/xfs: support async buffered writes Stefan Roesch
@ 2022-06-23 17:51 ` Stefan Roesch
2022-06-23 17:51 ` [RESEND PATCH v9 02/14] mm: Move updates of dirty_exceeded into one place Stefan Roesch
` (11 subsequent siblings)
12 siblings, 0 replies; 28+ messages in thread
From: Stefan Roesch @ 2022-06-23 17:51 UTC (permalink / raw)
To: io-uring, kernel-team, linux-mm, linux-xfs, linux-fsdevel
Cc: shr, david, jack, hch, axboe, willy
From: Jan Kara <[email protected]>
We start background writeback if we are over background threshold after
exiting the main loop in balance_dirty_pages(). This may result in
basing the decision on already stale values (we may have slept for a
significant amount of time) and it is also inconvenient for refactoring
needed for async dirty throttling. Move the check into the main waiting
loop.
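The relocated check reduces to a pure predicate over counters sampled on the current loop iteration. A minimal userspace model (toy harness, not kernel code; only the condition mirrors the patch):

```c
#include <stdbool.h>

/*
 * Model of the check now done on every pass through the balancing loop:
 * outside laptop mode, background writeback is kicked as soon as the
 * number of reclaimable pages exceeds the background threshold, based
 * on counters sampled this iteration rather than values that may have
 * gone stale while the task slept.
 */
static bool should_start_background_wb(bool laptop_mode,
				       unsigned long nr_reclaimable,
				       unsigned long bg_thresh,
				       bool writeback_in_progress)
{
	return !laptop_mode && nr_reclaimable > bg_thresh &&
	       !writeback_in_progress;
}
```

In laptop mode the predicate is always false at this point; writeback is still started later in the loop once throttling is actually required.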
Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Stefan Roesch <[email protected]>
---
mm/page-writeback.c | 31 ++++++++++++++-----------------
1 file changed, 14 insertions(+), 17 deletions(-)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 55c2776ae699..e59c523aed1a 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1627,6 +1627,19 @@ static void balance_dirty_pages(struct bdi_writeback *wb,
}
}
+ /*
+ * In laptop mode, we wait until hitting the higher threshold
+ * before starting background writeout, and then write out all
+ * the way down to the lower threshold. So slow writers cause
+ * minimal disk activity.
+ *
+ * In normal mode, we start background writeout at the lower
+ * background_thresh, to keep the amount of dirty memory low.
+ */
+ if (!laptop_mode && nr_reclaimable > gdtc->bg_thresh &&
+ !writeback_in_progress(wb))
+ wb_start_background_writeback(wb);
+
/*
* Throttle it only when the background writeback cannot
* catch-up. This avoids (excessively) small writeouts
@@ -1657,6 +1670,7 @@ static void balance_dirty_pages(struct bdi_writeback *wb,
break;
}
+ /* Start writeback even when in laptop mode */
if (unlikely(!writeback_in_progress(wb)))
wb_start_background_writeback(wb);
@@ -1823,23 +1837,6 @@ static void balance_dirty_pages(struct bdi_writeback *wb,
if (!dirty_exceeded && wb->dirty_exceeded)
wb->dirty_exceeded = 0;
-
- if (writeback_in_progress(wb))
- return;
-
- /*
- * In laptop mode, we wait until hitting the higher threshold before
- * starting background writeout, and then write out all the way down
- * to the lower threshold. So slow writers cause minimal disk activity.
- *
- * In normal mode, we start background writeout at the lower
- * background_thresh, to keep the amount of dirty memory low.
- */
- if (laptop_mode)
- return;
-
- if (nr_reclaimable > gdtc->bg_thresh)
- wb_start_background_writeback(wb);
}
static DEFINE_PER_CPU(int, bdp_ratelimits);
--
2.30.2
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [RESEND PATCH v9 02/14] mm: Move updates of dirty_exceeded into one place
2022-06-23 17:51 [RESEND PATCH v9 00/14] io-uring/xfs: support async buffered writes Stefan Roesch
2022-06-23 17:51 ` [RESEND PATCH v9 01/14] mm: Move starting of background writeback into the main balancing loop Stefan Roesch
@ 2022-06-23 17:51 ` Stefan Roesch
2022-06-23 17:51 ` [RESEND PATCH v9 03/14] mm: Add balance_dirty_pages_ratelimited_flags() function Stefan Roesch
` (10 subsequent siblings)
12 siblings, 0 replies; 28+ messages in thread
From: Stefan Roesch @ 2022-06-23 17:51 UTC (permalink / raw)
To: io-uring, kernel-team, linux-mm, linux-xfs, linux-fsdevel
Cc: shr, david, jack, hch, axboe, willy
From: Jan Kara <[email protected]>
Transition of wb->dirty_exceeded from 0 to 1 happens before we go to
sleep in balance_dirty_pages() while transition from 1 to 0 happens when
exiting from balance_dirty_pages(), possibly based on old values. This
does not make a lot of sense, since wb->dirty_exceeded should simply
reflect whether wb is over the dirty limit and so we should ratelimit
entry into balance_dirty_pages() less. Move the two updates together.
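The resulting update collapses to one synchronization point. A toy model (hypothetical struct; only the assignment mirrors the patch):

```c
#include <stdbool.h>

/*
 * wb->dirty_exceeded now simply mirrors the freshly computed condition
 * in one place, replacing the separate 0->1 update before sleeping and
 * the 1->0 update on exit (which could be based on old values).
 */
struct wb_state {
	bool dirty_exceeded;
};

static void sync_dirty_exceeded(struct wb_state *wb, bool dirty_exceeded)
{
	if (dirty_exceeded != wb->dirty_exceeded)
		wb->dirty_exceeded = dirty_exceeded;
}
```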
Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Stefan Roesch <[email protected]>
---
mm/page-writeback.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index e59c523aed1a..90b1998c16a1 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1729,8 +1729,8 @@ static void balance_dirty_pages(struct bdi_writeback *wb,
sdtc = mdtc;
}
- if (dirty_exceeded && !wb->dirty_exceeded)
- wb->dirty_exceeded = 1;
+ if (dirty_exceeded != wb->dirty_exceeded)
+ wb->dirty_exceeded = dirty_exceeded;
if (time_is_before_jiffies(READ_ONCE(wb->bw_time_stamp) +
BANDWIDTH_INTERVAL))
@@ -1834,9 +1834,6 @@ static void balance_dirty_pages(struct bdi_writeback *wb,
if (fatal_signal_pending(current))
break;
}
-
- if (!dirty_exceeded && wb->dirty_exceeded)
- wb->dirty_exceeded = 0;
}
static DEFINE_PER_CPU(int, bdp_ratelimits);
--
2.30.2
* [RESEND PATCH v9 03/14] mm: Add balance_dirty_pages_ratelimited_flags() function
2022-06-23 17:51 [RESEND PATCH v9 00/14] io-uring/xfs: support async buffered writes Stefan Roesch
2022-06-23 17:51 ` [RESEND PATCH v9 01/14] mm: Move starting of background writeback into the main balancing loop Stefan Roesch
2022-06-23 17:51 ` [RESEND PATCH v9 02/14] mm: Move updates of dirty_exceeded into one place Stefan Roesch
@ 2022-06-23 17:51 ` Stefan Roesch
2022-06-23 17:51 ` [RESEND PATCH v9 05/14] iomap: Add async buffered write support Stefan Roesch
` (9 subsequent siblings)
12 siblings, 0 replies; 28+ messages in thread
From: Stefan Roesch @ 2022-06-23 17:51 UTC (permalink / raw)
To: io-uring, kernel-team, linux-mm, linux-xfs, linux-fsdevel
Cc: shr, david, jack, hch, axboe, willy, Christoph Hellwig
From: Jan Kara <[email protected]>
This adds the helper function balance_dirty_pages_ratelimited_flags(),
which extends balance_dirty_pages_ratelimited() with a flags parameter.
The flags parameter is passed through to balance_dirty_pages(). For
async buffered writes the flag value will be BDP_ASYNC.
If balance_dirty_pages() gets called for an async buffered write, we don't
want to wait. Instead, we need to indicate to the caller that throttling
is needed so that it can stop writing and offload the rest of the write
to a context that can block.
The new helper function is also used by balance_dirty_pages_ratelimited().
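The contract for callers can be sketched in userspace C (hypothetical model_* names; only the BDP_ASYNC/-EAGAIN behavior mirrors the patch):

```c
#include <errno.h>

#define BDP_ASYNC 0x0001	/* invoke balance_dirty_pages in async mode */

/*
 * Toy model of the throttling decision: a blocking caller would sleep
 * until dirty memory is back in balance and then return 0, while an
 * async caller gets -EAGAIN so it can stop writing and punt the rest
 * of the write to a context that is allowed to block.
 */
static int model_balance_dirty_pages(unsigned int flags, int over_limit)
{
	if (!over_limit)
		return 0;		/* memory already in balance */
	if (flags & BDP_ASYNC)
		return -EAGAIN;		/* caller must stop and offload */
	/* a real blocking caller would io_schedule_timeout() here */
	return 0;
}
```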
Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Stefan Roesch <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
---
include/linux/writeback.h | 7 ++++++
mm/page-writeback.c | 51 +++++++++++++++++++++++++++++++--------
2 files changed, 48 insertions(+), 10 deletions(-)
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index da21d63f70e2..b8c9610c2313 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -364,7 +364,14 @@ void global_dirty_limits(unsigned long *pbackground, unsigned long *pdirty);
unsigned long wb_calc_thresh(struct bdi_writeback *wb, unsigned long thresh);
void wb_update_bandwidth(struct bdi_writeback *wb);
+
+/* Invoke balance dirty pages in async mode. */
+#define BDP_ASYNC 0x0001
+
void balance_dirty_pages_ratelimited(struct address_space *mapping);
+int balance_dirty_pages_ratelimited_flags(struct address_space *mapping,
+ unsigned int flags);
+
bool wb_over_bg_thresh(struct bdi_writeback *wb);
typedef int (*writepage_t)(struct page *page, struct writeback_control *wbc,
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 90b1998c16a1..bfca433640fc 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1554,8 +1554,8 @@ static inline void wb_dirty_limits(struct dirty_throttle_control *dtc)
* If we're over `background_thresh' then the writeback threads are woken to
* perform some writeout.
*/
-static void balance_dirty_pages(struct bdi_writeback *wb,
- unsigned long pages_dirtied)
+static int balance_dirty_pages(struct bdi_writeback *wb,
+ unsigned long pages_dirtied, unsigned int flags)
{
struct dirty_throttle_control gdtc_stor = { GDTC_INIT(wb) };
struct dirty_throttle_control mdtc_stor = { MDTC_INIT(wb, &gdtc_stor) };
@@ -1575,6 +1575,7 @@ static void balance_dirty_pages(struct bdi_writeback *wb,
struct backing_dev_info *bdi = wb->bdi;
bool strictlimit = bdi->capabilities & BDI_CAP_STRICTLIMIT;
unsigned long start_time = jiffies;
+ int ret = 0;
for (;;) {
unsigned long now = jiffies;
@@ -1803,6 +1804,10 @@ static void balance_dirty_pages(struct bdi_writeback *wb,
period,
pause,
start_time);
+ if (flags & BDP_ASYNC) {
+ ret = -EAGAIN;
+ break;
+ }
__set_current_state(TASK_KILLABLE);
wb->dirty_sleep = now;
io_schedule_timeout(pause);
@@ -1834,6 +1839,7 @@ static void balance_dirty_pages(struct bdi_writeback *wb,
if (fatal_signal_pending(current))
break;
}
+ return ret;
}
static DEFINE_PER_CPU(int, bdp_ratelimits);
@@ -1855,27 +1861,34 @@ static DEFINE_PER_CPU(int, bdp_ratelimits);
DEFINE_PER_CPU(int, dirty_throttle_leaks) = 0;
/**
- * balance_dirty_pages_ratelimited - balance dirty memory state
- * @mapping: address_space which was dirtied
+ * balance_dirty_pages_ratelimited_flags - Balance dirty memory state.
+ * @mapping: address_space which was dirtied.
+ * @flags: BDP flags.
*
* Processes which are dirtying memory should call in here once for each page
* which was newly dirtied. The function will periodically check the system's
* dirty state and will initiate writeback if needed.
*
- * Once we're over the dirty memory limit we decrease the ratelimiting
- * by a lot, to prevent individual processes from overshooting the limit
- * by (ratelimit_pages) each.
+ * See balance_dirty_pages_ratelimited() for details.
+ *
+ * Return: If @flags contains BDP_ASYNC, it may return -EAGAIN to
+ * indicate that memory is out of balance and the caller must wait
+ * for I/O to complete. Otherwise, it will return 0 to indicate
+ * that either memory was already in balance, or it was able to sleep
+ * until the amount of dirty memory returned to balance.
*/
-void balance_dirty_pages_ratelimited(struct address_space *mapping)
+int balance_dirty_pages_ratelimited_flags(struct address_space *mapping,
+ unsigned int flags)
{
struct inode *inode = mapping->host;
struct backing_dev_info *bdi = inode_to_bdi(inode);
struct bdi_writeback *wb = NULL;
int ratelimit;
+ int ret = 0;
int *p;
if (!(bdi->capabilities & BDI_CAP_WRITEBACK))
- return;
+ return ret;
if (inode_cgwb_enabled(inode))
wb = wb_get_create_current(bdi, GFP_KERNEL);
@@ -1915,9 +1928,27 @@ void balance_dirty_pages_ratelimited(struct address_space *mapping)
preempt_enable();
if (unlikely(current->nr_dirtied >= ratelimit))
- balance_dirty_pages(wb, current->nr_dirtied);
+ balance_dirty_pages(wb, current->nr_dirtied, flags);
wb_put(wb);
+ return ret;
+}
+
+/**
+ * balance_dirty_pages_ratelimited - balance dirty memory state.
+ * @mapping: address_space which was dirtied.
+ *
+ * Processes which are dirtying memory should call in here once for each page
+ * which was newly dirtied. The function will periodically check the system's
+ * dirty state and will initiate writeback if needed.
+ *
+ * Once we're over the dirty memory limit we decrease the ratelimiting
+ * by a lot, to prevent individual processes from overshooting the limit
+ * by (ratelimit_pages) each.
+ */
+void balance_dirty_pages_ratelimited(struct address_space *mapping)
+{
+ balance_dirty_pages_ratelimited_flags(mapping, 0);
}
EXPORT_SYMBOL(balance_dirty_pages_ratelimited);
--
2.30.2
* [RESEND PATCH v9 05/14] iomap: Add async buffered write support
2022-06-23 17:51 [RESEND PATCH v9 00/14] io-uring/xfs: support async buffered writes Stefan Roesch
` (2 preceding siblings ...)
2022-06-23 17:51 ` [RESEND PATCH v9 03/14] mm: Add balance_dirty_pages_ratelimited_flags() function Stefan Roesch
@ 2022-06-23 17:51 ` Stefan Roesch
2022-06-23 17:51 ` [RESEND PATCH v9 06/14] iomap: Return -EAGAIN from iomap_write_iter() Stefan Roesch
` (8 subsequent siblings)
12 siblings, 0 replies; 28+ messages in thread
From: Stefan Roesch @ 2022-06-23 17:51 UTC (permalink / raw)
To: io-uring, kernel-team, linux-mm, linux-xfs, linux-fsdevel
Cc: shr, david, jack, hch, axboe, willy, Christoph Hellwig
This adds async buffered write support to iomap.
This replaces the call to balance_dirty_pages_ratelimited() with a call
to balance_dirty_pages_ratelimited_flags(), which makes it possible to
specify whether the write request is async or not.
In addition this also moves the above function call to the beginning of
the function. If the function call is at the end of the function and the
decision is made to throttle writes, then there is no request that
io-uring can wait on. By moving it to the beginning of the function, the
write request is not issued, but returns -EAGAIN instead. io-uring will
punt the request and process it in the io-worker.
The trade-off of moving the call to the beginning of the function is
that write throttling now happens one page later.
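The reordering can be illustrated with a toy write loop (hypothetical harness; only the throttle-before-write shape mirrors the patch):

```c
#include <errno.h>
#include <stddef.h>

/*
 * Toy model of the reordered loop: the throttling check runs before a
 * chunk is written, so an async writer backs off with -EAGAIN before
 * dirtying another page instead of afterwards, when io-uring would
 * have no issued request left to wait on.
 */
static long model_write_iter(size_t chunks, size_t throttle_at,
			     size_t *written)
{
	*written = 0;
	for (size_t i = 0; i < chunks; i++) {
		if (i == throttle_at)	/* balance check said -EAGAIN */
			return -EAGAIN;
		(*written)++;		/* write one chunk */
	}
	return 0;
}
```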
Signed-off-by: Stefan Roesch <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
---
fs/iomap/buffered-io.c | 33 ++++++++++++++++++++++++++++-----
1 file changed, 28 insertions(+), 5 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 3c97b713f831..83cf093fcb92 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -559,6 +559,7 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
loff_t block_size = i_blocksize(iter->inode);
loff_t block_start = round_down(pos, block_size);
loff_t block_end = round_up(pos + len, block_size);
+ unsigned int nr_blocks = i_blocks_per_folio(iter->inode, folio);
size_t from = offset_in_folio(folio, pos), to = from + len;
size_t poff, plen;
@@ -567,6 +568,8 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
folio_clear_error(folio);
iop = iomap_page_create(iter->inode, folio, iter->flags);
+ if ((iter->flags & IOMAP_NOWAIT) && !iop && nr_blocks > 1)
+ return -EAGAIN;
do {
iomap_adjust_read_range(iter->inode, folio, &block_start,
@@ -584,7 +587,12 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
return -EIO;
folio_zero_segments(folio, poff, from, to, poff + plen);
} else {
- int status = iomap_read_folio_sync(block_start, folio,
+ int status;
+
+ if (iter->flags & IOMAP_NOWAIT)
+ return -EAGAIN;
+
+ status = iomap_read_folio_sync(block_start, folio,
poff, plen, srcmap);
if (status)
return status;
@@ -613,6 +621,9 @@ static int iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
unsigned fgp = FGP_LOCK | FGP_WRITE | FGP_CREAT | FGP_STABLE | FGP_NOFS;
int status = 0;
+ if (iter->flags & IOMAP_NOWAIT)
+ fgp |= FGP_NOWAIT;
+
BUG_ON(pos + len > iter->iomap.offset + iter->iomap.length);
if (srcmap != &iter->iomap)
BUG_ON(pos + len > srcmap->offset + srcmap->length);
@@ -632,7 +643,7 @@ static int iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
folio = __filemap_get_folio(iter->inode->i_mapping, pos >> PAGE_SHIFT,
fgp, mapping_gfp_mask(iter->inode->i_mapping));
if (!folio) {
- status = -ENOMEM;
+ status = (iter->flags & IOMAP_NOWAIT) ? -EAGAIN : -ENOMEM;
goto out_no_page;
}
if (pos + len > folio_pos(folio) + folio_size(folio))
@@ -750,6 +761,8 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
loff_t pos = iter->pos;
ssize_t written = 0;
long status = 0;
+ struct address_space *mapping = iter->inode->i_mapping;
+ unsigned int bdp_flags = (iter->flags & IOMAP_NOWAIT) ? BDP_ASYNC : 0;
do {
struct folio *folio;
@@ -762,6 +775,11 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
bytes = min_t(unsigned long, PAGE_SIZE - offset,
iov_iter_count(i));
again:
+ status = balance_dirty_pages_ratelimited_flags(mapping,
+ bdp_flags);
+ if (unlikely(status))
+ break;
+
if (bytes > length)
bytes = length;
@@ -770,6 +788,10 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
* Otherwise there's a nasty deadlock on copying from the
* same page as we're writing to, without it being marked
* up-to-date.
+ *
+ * For async buffered writes the assumption is that the user
+ * page has already been faulted in. This can be optimized by
+ * faulting the user page.
*/
if (unlikely(fault_in_iov_iter_readable(i, bytes) == bytes)) {
status = -EFAULT;
@@ -781,7 +803,7 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
break;
page = folio_file_page(folio, pos >> PAGE_SHIFT);
- if (mapping_writably_mapped(iter->inode->i_mapping))
+ if (mapping_writably_mapped(mapping))
flush_dcache_page(page);
copied = copy_page_from_iter_atomic(page, offset, bytes, i);
@@ -806,8 +828,6 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
pos += status;
written += status;
length -= status;
-
- balance_dirty_pages_ratelimited(iter->inode->i_mapping);
} while (iov_iter_count(i) && length);
return written ? written : status;
@@ -825,6 +845,9 @@ iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *i,
};
int ret;
+ if (iocb->ki_flags & IOCB_NOWAIT)
+ iter.flags |= IOMAP_NOWAIT;
+
while ((ret = iomap_iter(&iter, ops)) > 0)
iter.processed = iomap_write_iter(&iter, i);
if (iter.pos == iocb->ki_pos)
--
2.30.2
* [RESEND PATCH v9 06/14] iomap: Return -EAGAIN from iomap_write_iter()
2022-06-23 17:51 [RESEND PATCH v9 00/14] io-uring/xfs: support async buffered writes Stefan Roesch
` (3 preceding siblings ...)
2022-06-23 17:51 ` [RESEND PATCH v9 05/14] iomap: Add async buffered write support Stefan Roesch
@ 2022-06-23 17:51 ` Stefan Roesch
2022-06-23 20:18 ` Darrick J. Wong
2022-06-24 5:19 ` Christoph Hellwig
2022-06-23 17:51 ` [RESEND PATCH v9 09/14] fs: Split off inode_needs_update_time and __file_update_time Stefan Roesch
` (7 subsequent siblings)
12 siblings, 2 replies; 28+ messages in thread
From: Stefan Roesch @ 2022-06-23 17:51 UTC (permalink / raw)
To: io-uring, kernel-team, linux-mm, linux-xfs, linux-fsdevel
Cc: shr, david, jack, hch, axboe, willy
If iomap_write_iter() encounters -EAGAIN, return -EAGAIN to the caller.
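The intent is that on -EAGAIN the iterator is rewound over the bytes already consumed, so the caller sees an untouched request it can resubmit in full from a context that may block. With struct iov_iter reduced to a single offset (toy stand-in; only the revert-then-fail shape mirrors the patch):

```c
#include <errno.h>
#include <stddef.h>

struct toy_iter {		/* stand-in for struct iov_iter */
	size_t off;
};

static void toy_iter_revert(struct toy_iter *it, size_t n)
{
	it->off -= n;
}

/*
 * If the write loop stopped with -EAGAIN after consuming `written`
 * bytes, rewind the iterator before reporting the error; otherwise
 * keep the old "written ? written : status" behavior.
 */
static long finish_write(struct toy_iter *it, size_t written, long status)
{
	if (status == -EAGAIN) {
		toy_iter_revert(it, written);
		return -EAGAIN;
	}
	return written ? (long)written : status;
}
```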
Signed-off-by: Stefan Roesch <[email protected]>
---
fs/iomap/buffered-io.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 83cf093fcb92..f2e36240079f 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -830,7 +830,13 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
length -= status;
} while (iov_iter_count(i) && length);
- return written ? written : status;
+ if (status == -EAGAIN) {
+ iov_iter_revert(i, written);
+ return -EAGAIN;
+ }
+ if (written)
+ return written;
+ return status;
}
ssize_t
--
2.30.2
* Re: [RESEND PATCH v9 06/14] iomap: Return -EAGAIN from iomap_write_iter()
2022-06-23 17:51 ` [RESEND PATCH v9 06/14] iomap: Return -EAGAIN from iomap_write_iter() Stefan Roesch
@ 2022-06-23 20:18 ` Darrick J. Wong
2022-06-23 20:23 ` Stefan Roesch
2022-06-24 5:19 ` Christoph Hellwig
1 sibling, 1 reply; 28+ messages in thread
From: Darrick J. Wong @ 2022-06-23 20:18 UTC (permalink / raw)
To: Stefan Roesch
Cc: io-uring, kernel-team, linux-mm, linux-xfs, linux-fsdevel, david,
jack, hch, axboe, willy
On Thu, Jun 23, 2022 at 10:51:49AM -0700, Stefan Roesch wrote:
> If iomap_write_iter() encounters -EAGAIN, return -EAGAIN to the caller.
>
> Signed-off-by: Stefan Roesch <[email protected]>
> ---
> fs/iomap/buffered-io.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 83cf093fcb92..f2e36240079f 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -830,7 +830,13 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
> length -= status;
> } while (iov_iter_count(i) && length);
>
> - return written ? written : status;
> + if (status == -EAGAIN) {
> + iov_iter_revert(i, written);
> + return -EAGAIN;
> + }
> + if (written)
> + return written;
> + return status;
Any particular reason for decomposing the ternary into this? It still
looks correct, but it doesn't seem totally necessary...
Reviewed-by: Darrick J. Wong <[email protected]>
--D
> }
>
> ssize_t
> --
> 2.30.2
>
* Re: [RESEND PATCH v9 06/14] iomap: Return -EAGAIN from iomap_write_iter()
2022-06-23 20:18 ` Darrick J. Wong
@ 2022-06-23 20:23 ` Stefan Roesch
2022-06-23 20:32 ` Darrick J. Wong
0 siblings, 1 reply; 28+ messages in thread
From: Stefan Roesch @ 2022-06-23 20:23 UTC (permalink / raw)
To: Darrick J. Wong
Cc: io-uring, kernel-team, linux-mm, linux-xfs, linux-fsdevel, david,
jack, hch, axboe, willy
On 6/23/22 1:18 PM, Darrick J. Wong wrote:
> On Thu, Jun 23, 2022 at 10:51:49AM -0700, Stefan Roesch wrote:
>> If iomap_write_iter() encounters -EAGAIN, return -EAGAIN to the caller.
>>
>> Signed-off-by: Stefan Roesch <[email protected]>
>> ---
>> fs/iomap/buffered-io.c | 8 +++++++-
>> 1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
>> index 83cf093fcb92..f2e36240079f 100644
>> --- a/fs/iomap/buffered-io.c
>> +++ b/fs/iomap/buffered-io.c
>> @@ -830,7 +830,13 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
>> length -= status;
>> } while (iov_iter_count(i) && length);
>>
>> - return written ? written : status;
>> + if (status == -EAGAIN) {
>> + iov_iter_revert(i, written);
>> + return -EAGAIN;
>> + }
>> + if (written)
>> + return written;
>> + return status;
>
> Any particular reason for decomposing the ternary into this? It still
> looks correct, but it doesn't seem totally necessary...
>
Do you prefer this version?
+ if (status == -EAGAIN) {
+ iov_iter_revert(i, written);
+ return -EAGAIN;
+ }
return written ? written : status;
> Reviewed-by: Darrick J. Wong <[email protected]>
>
> --D
>
>> }
>>
>> ssize_t
>> --
>> 2.30.2
>>
* Re: [RESEND PATCH v9 06/14] iomap: Return -EAGAIN from iomap_write_iter()
2022-06-23 20:23 ` Stefan Roesch
@ 2022-06-23 20:32 ` Darrick J. Wong
0 siblings, 0 replies; 28+ messages in thread
From: Darrick J. Wong @ 2022-06-23 20:32 UTC (permalink / raw)
To: Stefan Roesch
Cc: io-uring, kernel-team, linux-mm, linux-xfs, linux-fsdevel, david,
jack, hch, axboe, willy
On Thu, Jun 23, 2022 at 01:23:02PM -0700, Stefan Roesch wrote:
>
>
> On 6/23/22 1:18 PM, Darrick J. Wong wrote:
> > On Thu, Jun 23, 2022 at 10:51:49AM -0700, Stefan Roesch wrote:
> >> If iomap_write_iter() encounters -EAGAIN, return -EAGAIN to the caller.
> >>
> >> Signed-off-by: Stefan Roesch <[email protected]>
> >> ---
> >> fs/iomap/buffered-io.c | 8 +++++++-
> >> 1 file changed, 7 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> >> index 83cf093fcb92..f2e36240079f 100644
> >> --- a/fs/iomap/buffered-io.c
> >> +++ b/fs/iomap/buffered-io.c
> >> @@ -830,7 +830,13 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
> >> length -= status;
> >> } while (iov_iter_count(i) && length);
> >>
> >> - return written ? written : status;
> >> + if (status == -EAGAIN) {
> >> + iov_iter_revert(i, written);
> >> + return -EAGAIN;
> >> + }
> >> + if (written)
> >> + return written;
> >> + return status;
> >
> > Any particular reason for decomposing the ternary into this? It still
> > looks correct, but it doesn't seem totally necessary...
> >
>
> Do you prefer this version?
>
> + if (status == -EAGAIN) {
> + iov_iter_revert(i, written);
> + return -EAGAIN;
> + }
> return written ? written : status;
Yes, because it /does/ make it a lot more obvious that the only change
is intercepting EAGAIN to rewind the iov_iter. :)
--D
>
>
> > Reviewed-by: Darrick J. Wong <[email protected]>
> >
> > --D
> >
> >> }
> >>
> >> ssize_t
> >> --
> >> 2.30.2
> >>
* Re: [RESEND PATCH v9 06/14] iomap: Return -EAGAIN from iomap_write_iter()
2022-06-23 17:51 ` [RESEND PATCH v9 06/14] iomap: Return -EAGAIN from iomap_write_iter() Stefan Roesch
2022-06-23 20:18 ` Darrick J. Wong
@ 2022-06-24 5:19 ` Christoph Hellwig
1 sibling, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2022-06-24 5:19 UTC (permalink / raw)
To: Stefan Roesch
Cc: io-uring, kernel-team, linux-mm, linux-xfs, linux-fsdevel, david,
jack, hch, axboe, willy
On Thu, Jun 23, 2022 at 10:51:49AM -0700, Stefan Roesch wrote:
> If iomap_write_iter() encounters -EAGAIN, return -EAGAIN to the caller.
>
> Signed-off-by: Stefan Roesch <[email protected]>
Looks good:
Reviewed-by: Christoph Hellwig <[email protected]>
* [RESEND PATCH v9 09/14] fs: Split off inode_needs_update_time and __file_update_time
2022-06-23 17:51 [RESEND PATCH v9 00/14] io-uring/xfs: support async buffered writes Stefan Roesch
` (4 preceding siblings ...)
2022-06-23 17:51 ` [RESEND PATCH v9 06/14] iomap: Return -EAGAIN from iomap_write_iter() Stefan Roesch
@ 2022-06-23 17:51 ` Stefan Roesch
2022-06-23 17:51 ` [RESEND PATCH v9 11/14] io_uring: Add support for async buffered writes Stefan Roesch
` (6 subsequent siblings)
12 siblings, 0 replies; 28+ messages in thread
From: Stefan Roesch @ 2022-06-23 17:51 UTC (permalink / raw)
To: io-uring, kernel-team, linux-mm, linux-xfs, linux-fsdevel
Cc: shr, david, jack, hch, axboe, willy, Christian Brauner
This splits off the functions inode_needs_update_time() and
__file_update_time() from the function file_update_time().
This is required to support async buffered writes.
No intended functional changes in this patch.
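The resulting shape is a side-effect-free check followed by a separate action, which lets callers such as file_modified() run the cheap check inline and perform the update only when needed. A toy version (hypothetical types and flag names; only the check/act split mirrors the patch):

```c
#define TOY_MTIME 0x1	/* stand-ins for the kernel's S_MTIME/S_CTIME */
#define TOY_CTIME 0x2

struct toy_inode {
	long mtime;
	long ctime;
};

/* Cheap, side-effect-free: which timestamps would need updating? */
static int toy_needs_update_time(const struct toy_inode *inode, long now)
{
	int sync_it = 0;

	if (inode->mtime != now)
		sync_it |= TOY_MTIME;
	if (inode->ctime != now)
		sync_it |= TOY_CTIME;
	return sync_it;
}

/* The action, run only when the check reported something to sync. */
static void toy_update_time(struct toy_inode *inode, long now, int sync_it)
{
	if (sync_it & TOY_MTIME)
		inode->mtime = now;
	if (sync_it & TOY_CTIME)
		inode->ctime = now;
}
```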
Signed-off-by: Stefan Roesch <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: Christian Brauner (Microsoft) <[email protected]>
---
fs/inode.c | 76 +++++++++++++++++++++++++++++++++++-------------------
1 file changed, 50 insertions(+), 26 deletions(-)
diff --git a/fs/inode.c b/fs/inode.c
index a2e18379c8a6..ff726d99ecc7 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -2049,35 +2049,18 @@ int file_remove_privs(struct file *file)
}
EXPORT_SYMBOL(file_remove_privs);
-/**
- * file_update_time - update mtime and ctime time
- * @file: file accessed
- *
- * Update the mtime and ctime members of an inode and mark the inode
- * for writeback. Note that this function is meant exclusively for
- * usage in the file write path of filesystems, and filesystems may
- * choose to explicitly ignore update via this function with the
- * S_NOCMTIME inode flag, e.g. for network filesystem where these
- * timestamps are handled by the server. This can return an error for
- * file systems who need to allocate space in order to update an inode.
- */
-
-int file_update_time(struct file *file)
+static int inode_needs_update_time(struct inode *inode, struct timespec64 *now)
{
- struct inode *inode = file_inode(file);
- struct timespec64 now;
int sync_it = 0;
- int ret;
/* First try to exhaust all avenues to not sync */
if (IS_NOCMTIME(inode))
return 0;
- now = current_time(inode);
- if (!timespec64_equal(&inode->i_mtime, &now))
+ if (!timespec64_equal(&inode->i_mtime, now))
sync_it = S_MTIME;
- if (!timespec64_equal(&inode->i_ctime, &now))
+ if (!timespec64_equal(&inode->i_ctime, now))
sync_it |= S_CTIME;
if (IS_I_VERSION(inode) && inode_iversion_need_inc(inode))
@@ -2086,15 +2069,50 @@ int file_update_time(struct file *file)
if (!sync_it)
return 0;
- /* Finally allowed to write? Takes lock. */
- if (__mnt_want_write_file(file))
- return 0;
+ return sync_it;
+}
+
+static int __file_update_time(struct file *file, struct timespec64 *now,
+ int sync_mode)
+{
+ int ret = 0;
+ struct inode *inode = file_inode(file);
- ret = inode_update_time(inode, &now, sync_it);
- __mnt_drop_write_file(file);
+ /* try to update time settings */
+ if (!__mnt_want_write_file(file)) {
+ ret = inode_update_time(inode, now, sync_mode);
+ __mnt_drop_write_file(file);
+ }
return ret;
}
+
+/**
+ * file_update_time - update mtime and ctime time
+ * @file: file accessed
+ *
+ * Update the mtime and ctime members of an inode and mark the inode for
+ * writeback. Note that this function is meant exclusively for usage in
+ * the file write path of filesystems, and filesystems may choose to
+ * explicitly ignore updates via this function with the S_NOCMTIME inode
+ * flag, e.g. for network filesystems where these timestamps are handled
+ * by the server. This can return an error for file systems that need to
+ * allocate space in order to update an inode.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int file_update_time(struct file *file)
+{
+ int ret;
+ struct inode *inode = file_inode(file);
+ struct timespec64 now = current_time(inode);
+
+ ret = inode_needs_update_time(inode, &now);
+ if (ret <= 0)
+ return ret;
+
+ return __file_update_time(file, &now, ret);
+}
EXPORT_SYMBOL(file_update_time);
/**
@@ -2111,6 +2129,8 @@ EXPORT_SYMBOL(file_update_time);
int file_modified(struct file *file)
{
int ret;
+ struct inode *inode = file_inode(file);
+ struct timespec64 now = current_time(inode);
/*
* Clear the security bits if the process is not being run by root.
@@ -2123,7 +2143,11 @@ int file_modified(struct file *file)
if (unlikely(file->f_mode & FMODE_NOCMTIME))
return 0;
- return file_update_time(file);
+ ret = inode_needs_update_time(inode, &now);
+ if (ret <= 0)
+ return ret;
+
+ return __file_update_time(file, &now, ret);
}
EXPORT_SYMBOL(file_modified);
--
2.30.2
* [RESEND PATCH v9 11/14] io_uring: Add support for async buffered writes
2022-06-23 17:51 [RESEND PATCH v9 00/14] io-uring/xfs: support async buffered writes Stefan Roesch
` (5 preceding siblings ...)
2022-06-23 17:51 ` [RESEND PATCH v9 09/14] fs: Split off inode_needs_update_time and __file_update_time Stefan Roesch
@ 2022-06-23 17:51 ` Stefan Roesch
2022-06-23 17:51 ` [RESEND PATCH v9 12/14] io_uring: Add tracepoint for short writes Stefan Roesch
` (5 subsequent siblings)
12 siblings, 0 replies; 28+ messages in thread
From: Stefan Roesch @ 2022-06-23 17:51 UTC (permalink / raw)
To: io-uring, kernel-team, linux-mm, linux-xfs, linux-fsdevel
Cc: shr, david, jack, hch, axboe, willy
This enables async buffered writes in io-uring for the filesystems that
support them. Buffered writes are enabled for blocks that are already in
the page cache or that can be acquired with noio.
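The partial-write bookkeeping this relies on can be sketched as follows (hypothetical struct name; only the accounting mirrors the io_write() hunk):

```c
#include <errno.h>

struct toy_async_rw {		/* stand-in for struct io_async_rw */
	long bytes_done;
};

/*
 * A short nonblocking write: record how many bytes already landed and
 * report -EAGAIN so an io-worker can complete the remainder from the
 * saved iterator state instead of rewriting from the start. Full
 * writes and errors are passed through unchanged.
 */
static long handle_short_write(struct toy_async_rw *rw, long requested,
			       long ret2)
{
	if (ret2 >= 0 && ret2 != requested) {
		rw->bytes_done += ret2;
		return -EAGAIN;
	}
	return ret2;
}
```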
Signed-off-by: Stefan Roesch <[email protected]>
---
fs/io_uring.c | 29 ++++++++++++++++++++++++-----
1 file changed, 24 insertions(+), 5 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 3aab4182fd89..22a0bb8c5fe5 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -4311,7 +4311,7 @@ static inline int io_iter_do_read(struct io_kiocb *req, struct iov_iter *iter)
return -EINVAL;
}
-static bool need_read_all(struct io_kiocb *req)
+static bool need_complete_io(struct io_kiocb *req)
{
return req->flags & REQ_F_ISREG ||
S_ISBLK(file_inode(req->file)->i_mode);
@@ -4440,7 +4440,7 @@ static int io_read(struct io_kiocb *req, unsigned int issue_flags)
} else if (ret == -EIOCBQUEUED) {
goto out_free;
} else if (ret == req->cqe.res || ret <= 0 || !force_nonblock ||
- (req->flags & REQ_F_NOWAIT) || !need_read_all(req)) {
+ (req->flags & REQ_F_NOWAIT) || !need_complete_io(req)) {
/* read all, failed, already did sync or don't want to retry */
goto done;
}
@@ -4536,9 +4536,10 @@ static int io_write(struct io_kiocb *req, unsigned int issue_flags)
if (unlikely(!io_file_supports_nowait(req)))
goto copy_iov;
- /* file path doesn't support NOWAIT for non-direct_IO */
- if (force_nonblock && !(kiocb->ki_flags & IOCB_DIRECT) &&
- (req->flags & REQ_F_ISREG))
+ /* File path supports NOWAIT for non-direct_IO only for block devices. */
+ if (!(kiocb->ki_flags & IOCB_DIRECT) &&
+ !(kiocb->ki_filp->f_mode & FMODE_BUF_WASYNC) &&
+ (req->flags & REQ_F_ISREG))
goto copy_iov;
kiocb->ki_flags |= IOCB_NOWAIT;
@@ -4592,6 +4593,24 @@ static int io_write(struct io_kiocb *req, unsigned int issue_flags)
/* IOPOLL retry should happen for io-wq threads */
if (ret2 == -EAGAIN && (req->ctx->flags & IORING_SETUP_IOPOLL))
goto copy_iov;
+
+ if (ret2 != req->cqe.res && ret2 >= 0 && need_complete_io(req)) {
+ struct io_async_rw *rw;
+
+ /* This is a partial write. The file pos has already been
+ * updated, setup the async struct to complete the request
+ * in the worker. Also update bytes_done to account for
+ * the bytes already written.
+ */
+ iov_iter_save_state(&s->iter, &s->iter_state);
+ ret = io_setup_async_rw(req, iovec, s, true);
+
+ rw = req->async_data;
+ if (rw)
+ rw->bytes_done += ret2;
+
+ return ret ? ret : -EAGAIN;
+ }
done:
kiocb_done(req, ret2, issue_flags);
} else {
--
2.30.2
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [RESEND PATCH v9 12/14] io_uring: Add tracepoint for short writes
2022-06-23 17:51 [RESEND PATCH v9 00/14] io-uring/xfs: support async buffered writes Stefan Roesch
` (6 preceding siblings ...)
2022-06-23 17:51 ` [RESEND PATCH v9 11/14] io_uring: Add support for async buffered writes Stefan Roesch
@ 2022-06-23 17:51 ` Stefan Roesch
2022-06-23 17:51 ` [RESEND PATCH v9 14/14] xfs: Add async buffered write support Stefan Roesch
` (4 subsequent siblings)
12 siblings, 0 replies; 28+ messages in thread
From: Stefan Roesch @ 2022-06-23 17:51 UTC (permalink / raw)
To: io-uring, kernel-team, linux-mm, linux-xfs, linux-fsdevel
Cc: shr, david, jack, hch, axboe, willy
This adds the io_uring_short_write tracepoint to io_uring. A short write is
issued when not all pages required for a write are in the page cache and the
async buffered write has to return -EAGAIN.
Signed-off-by: Stefan Roesch <[email protected]>
---
fs/io_uring.c | 3 +++
include/trace/events/io_uring.h | 25 +++++++++++++++++++++++++
2 files changed, 28 insertions(+)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 22a0bb8c5fe5..510c09192832 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -4597,6 +4597,9 @@ static int io_write(struct io_kiocb *req, unsigned int issue_flags)
if (ret2 != req->cqe.res && ret2 >= 0 && need_complete_io(req)) {
struct io_async_rw *rw;
+ trace_io_uring_short_write(req->ctx, kiocb->ki_pos - ret2,
+ req->cqe.res, ret2);
+
/* This is a partial write. The file pos has already been
* updated, setup the async struct to complete the request
* in the worker. Also update bytes_done to account for
diff --git a/include/trace/events/io_uring.h b/include/trace/events/io_uring.h
index 66fcc5a1a5b1..25df513660cc 100644
--- a/include/trace/events/io_uring.h
+++ b/include/trace/events/io_uring.h
@@ -600,6 +600,31 @@ TRACE_EVENT(io_uring_cqe_overflow,
__entry->cflags, __entry->ocqe)
);
+TRACE_EVENT(io_uring_short_write,
+
+ TP_PROTO(void *ctx, u64 fpos, u64 wanted, u64 got),
+
+ TP_ARGS(ctx, fpos, wanted, got),
+
+ TP_STRUCT__entry(
+ __field(void *, ctx)
+ __field(u64, fpos)
+ __field(u64, wanted)
+ __field(u64, got)
+ ),
+
+ TP_fast_assign(
+ __entry->ctx = ctx;
+ __entry->fpos = fpos;
+ __entry->wanted = wanted;
+ __entry->got = got;
+ ),
+
+ TP_printk("ring %p, fpos %lld, wanted %lld, got %lld",
+ __entry->ctx, __entry->fpos,
+ __entry->wanted, __entry->got)
+);
+
#endif /* _TRACE_IO_URING_H */
/* This part must be outside protection */
--
2.30.2
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [RESEND PATCH v9 14/14] xfs: Add async buffered write support
2022-06-23 17:51 [RESEND PATCH v9 00/14] io-uring/xfs: support async buffered writes Stefan Roesch
` (7 preceding siblings ...)
2022-06-23 17:51 ` [RESEND PATCH v9 12/14] io_uring: Add tracepoint for short writes Stefan Roesch
@ 2022-06-23 17:51 ` Stefan Roesch
2022-06-23 20:31 ` [RESEND PATCH v9 00/14] io-uring/xfs: support async buffered writes Darrick J. Wong
` (3 subsequent siblings)
12 siblings, 0 replies; 28+ messages in thread
From: Stefan Roesch @ 2022-06-23 17:51 UTC (permalink / raw)
To: io-uring, kernel-team, linux-mm, linux-xfs, linux-fsdevel
Cc: shr, david, jack, hch, axboe, willy, Christoph Hellwig
This adds async buffered write support to XFS. Async buffered write requests
return -EAGAIN if the ilock cannot be obtained immediately.
Signed-off-by: Stefan Roesch <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
---
fs/xfs/xfs_file.c | 11 +++++------
fs/xfs/xfs_iomap.c | 5 ++++-
2 files changed, 9 insertions(+), 7 deletions(-)
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 5a171c0b244b..8d9b14d2b912 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -410,7 +410,7 @@ xfs_file_write_checks(
spin_unlock(&ip->i_flags_lock);
out:
- return file_modified(file);
+ return kiocb_modified(iocb);
}
static int
@@ -700,12 +700,11 @@ xfs_file_buffered_write(
bool cleared_space = false;
unsigned int iolock;
- if (iocb->ki_flags & IOCB_NOWAIT)
- return -EOPNOTSUPP;
-
write_retry:
iolock = XFS_IOLOCK_EXCL;
- xfs_ilock(ip, iolock);
+ ret = xfs_ilock_iocb(iocb, iolock);
+ if (ret)
+ return ret;
ret = xfs_file_write_checks(iocb, from, &iolock);
if (ret)
@@ -1165,7 +1164,7 @@ xfs_file_open(
{
if (xfs_is_shutdown(XFS_M(inode->i_sb)))
return -EIO;
- file->f_mode |= FMODE_NOWAIT | FMODE_BUF_RASYNC;
+ file->f_mode |= FMODE_NOWAIT | FMODE_BUF_RASYNC | FMODE_BUF_WASYNC;
return generic_file_open(inode, file);
}
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index bcf7c3694290..5d50fed291b4 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -886,6 +886,7 @@ xfs_buffered_write_iomap_begin(
bool eof = false, cow_eof = false, shared = false;
int allocfork = XFS_DATA_FORK;
int error = 0;
+ unsigned int lockmode = XFS_ILOCK_EXCL;
if (xfs_is_shutdown(mp))
return -EIO;
@@ -897,7 +898,9 @@ xfs_buffered_write_iomap_begin(
ASSERT(!XFS_IS_REALTIME_INODE(ip));
- xfs_ilock(ip, XFS_ILOCK_EXCL);
+ error = xfs_ilock_for_iomap(ip, flags, &lockmode);
+ if (error)
+ return error;
if (XFS_IS_CORRUPT(mp, !xfs_ifork_has_extents(&ip->i_df)) ||
XFS_TEST_ERROR(false, mp, XFS_ERRTAG_BMAPIFORMAT)) {
--
2.30.2
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [RESEND PATCH v9 00/14] io-uring/xfs: support async buffered writes
2022-06-23 17:51 [RESEND PATCH v9 00/14] io-uring/xfs: support async buffered writes Stefan Roesch
` (8 preceding siblings ...)
2022-06-23 17:51 ` [RESEND PATCH v9 14/14] xfs: Add async buffered write support Stefan Roesch
@ 2022-06-23 20:31 ` Darrick J. Wong
2022-06-23 22:06 ` Jens Axboe
2022-06-24 5:14 ` Christoph Hellwig
[not found] ` <[email protected]>
` (2 subsequent siblings)
12 siblings, 2 replies; 28+ messages in thread
From: Darrick J. Wong @ 2022-06-23 20:31 UTC (permalink / raw)
To: Stefan Roesch
Cc: io-uring, kernel-team, linux-mm, linux-xfs, linux-fsdevel, david,
jack, hch, axboe, willy
On Thu, Jun 23, 2022 at 10:51:43AM -0700, Stefan Roesch wrote:
> This patch series adds support for async buffered writes when using both
> xfs and io-uring. Currently io-uring only supports buffered writes in the
> slow path, by processing them in the io workers. With this patch series it is
> now possible to support buffered writes in the fast path. To be able to use
> the fast path the required pages must be in the page cache, the required locks
> in xfs can be granted immediately and no additional blocks need to be read
> from disk.
>
> Updating the inode can take time, so an optimization has been implemented
> for the time update: time updates are processed in the slow path, and while
> a time update is already in progress, other write requests for the same
> file can skip updating the modification time.
>
>
> Performance results:
> For fio the following results have been obtained with a queue depth of
> 1 and 4k block size (runtime 600 secs):
>
> sequential writes:
> without patch with patch libaio psync
> iops: 77k 209k 195K 233K
> bw: 314MB/s 854MB/s 790MB/s 953MB/s
> clat: 9600ns 120ns 540ns 3000ns
Hey, nice!
>
>
> For an io depth of 1, the new patch improves throughput by over three times
> (compared to the existing behavior, where buffered writes are processed by an
> io-worker process), and latency is considerably reduced. To achieve the
> same or better performance with the existing code an io depth of 4 is required.
> Increasing the iodepth further does not lead to improvements.
>
> In addition the latency of buffered write operations is reduced considerably.
>
>
>
> Support for async buffered writes:
>
> To support async buffered writes the flag FMODE_BUF_WASYNC is introduced. In
> addition the check in generic_write_checks is modified to allow for async
> buffered writes that have this flag set.
>
> Changes the iomap page create function to allow the caller to specify the
> gfp flags. Sets the IOMAP_NOWAIT flag in iomap if IOCB_NOWAIT has been set,
> and specifies the requested gfp flags.
>
> Adds async buffered write support to the iomap layer.
> Adds async buffered write support to the xfs iomap layer.
>
> Support for async buffered writes and inode time modification:
>
> Splits the function for checking if the file privileges need to be removed
> into two functions: a check function and a function for the removal of file
> privileges. The same split is also done for the function that updates the
> file modification time.
>
> Implement an optimization: while a file modification time update is pending,
> other requests for the same file don't need to wait for it. This prevents a
> considerable number of async buffered write requests from being punted.
>
> Take the ilock in nowait mode if async buffered writes are enabled and enable
> the async buffered writes optimization in io_uring.
>
> Support for write throttling of async buffered writes:
>
> Add a no_wait parameter to the existing balance_dirty_pages() function. The
> function will return -EAGAIN if the parameter is true and write throttling is
> required.
>
> Add a new function called balance_dirty_pages_ratelimited_async() that will be
> invoked from iomap_write_iter() if an async buffered write is requested.
>
> Enable async buffered write support in xfs
> This enables async buffered writes for xfs.
>
>
> Testing:
> This patch has been tested with xfstests, fsx, fio and individual test programs.
Good to hear. Will there be some new fstest coming/already merged?
<snip>
Hmm, well, vger and lore are still having stomach problems, so even the
resend didn't result in #5 ending up in my mailbox. :(
For the patches I haven't received, I'll just attach my replies as
comments /after/ each patch subject line. What a way to review code!
> Jan Kara (3):
> mm: Move starting of background writeback into the main balancing loop
> mm: Move updates of dirty_exceeded into one place
> mm: Add balance_dirty_pages_ratelimited_flags() function
(Yeah, I guess these changes make sense...)
> Stefan Roesch (11):
> iomap: Add flags parameter to iomap_page_create()
> iomap: Add async buffered write support
Reviewed-by: Darrick J. Wong <[email protected]>
> iomap: Return -EAGAIN from iomap_write_iter()
> fs: Add check for async buffered writes to generic_write_checks
> fs: add __remove_file_privs() with flags parameter
> fs: Split off inode_needs_update_time and __file_update_time
> fs: Add async write file modification handling.
The commit message references a file_modified_async function, but all I
see is file_modified_flags? Assuming that's just a clerical error,
Reviewed-by: Darrick J. Wong <[email protected]>
> io_uring: Add support for async buffered writes
Hm, ok, so the EAGAINs that we sprinkle everywhere get turned into short
writes at the end of iomap_file_buffered_write, and that's what this
picks up? If so, then...
> io_uring: Add tracepoint for short writes
> xfs: Specify lockmode when calling xfs_ilock_for_iomap()
> xfs: Add async buffered write support
...I guess I'm ok with signing off on the last patch:
Reviewed-by: Darrick J. Wong <[email protected]>
--D
>
> fs/inode.c | 168 +++++++++++++++++++++++---------
> fs/io_uring.c | 32 +++++-
> fs/iomap/buffered-io.c | 71 +++++++++++---
> fs/read_write.c | 4 +-
> fs/xfs/xfs_file.c | 11 +--
> fs/xfs/xfs_iomap.c | 11 ++-
> include/linux/fs.h | 4 +
> include/linux/writeback.h | 7 ++
> include/trace/events/io_uring.h | 25 +++++
> mm/page-writeback.c | 89 +++++++++++------
> 10 files changed, 314 insertions(+), 108 deletions(-)
>
>
> base-commit: b13baccc3850ca8b8cccbf8ed9912dbaa0fdf7f3
> --
> 2.30.2
>
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RESEND PATCH v9 00/14] io-uring/xfs: support async buffered writes
2022-06-23 20:31 ` [RESEND PATCH v9 00/14] io-uring/xfs: support async buffered writes Darrick J. Wong
@ 2022-06-23 22:06 ` Jens Axboe
2022-06-24 5:14 ` Christoph Hellwig
1 sibling, 0 replies; 28+ messages in thread
From: Jens Axboe @ 2022-06-23 22:06 UTC (permalink / raw)
To: Darrick J. Wong, Stefan Roesch
Cc: io-uring, kernel-team, linux-mm, linux-xfs, linux-fsdevel, david,
jack, hch, willy
On 6/23/22 2:31 PM, Darrick J. Wong wrote:
>> Testing:
>> This patch has been tested with xfstests, fsx, fio and individual test programs.
>
> Good to hear. Will there be some new fstest coming/already merged?
It should not really require any new tests, as anything buffered +
io_uring on xfs will now use this code. But Stefan has run a bunch of
things on the side too, some of those synthetic (like ensuring that
various parts of a buffered write range aren't cached, etc) and some more
generic (fsx). There might be some that could be turned into xfstests,
I'll let him answer that one.
> Hmm, well, vger and lore are still having stomach problems, so even the
> resend didn't result in #5 ending up in my mailbox. :(
>
> For the patches I haven't received, I'll just attach my replies as
> comments /after/ each patch subject line. What a way to review code!
Really not sure what's going on with email these days, it's quite a
pain... Thanks for taking a look so quickly!
I've added your reviewed-bys and also made that ternary change you
suggested. Only other change is addressing a kernelbot noticing that one
ret in the mm side was being set to zero only, so we could kill it. End
result:
https://git.kernel.dk/cgit/linux-block/log/?h=for-5.20/io_uring-buffered-writes
--
Jens Axboe
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RESEND PATCH v9 00/14] io-uring/xfs: support async buffered writes
2022-06-23 20:31 ` [RESEND PATCH v9 00/14] io-uring/xfs: support async buffered writes Darrick J. Wong
2022-06-23 22:06 ` Jens Axboe
@ 2022-06-24 5:14 ` Christoph Hellwig
2022-06-24 14:49 ` Jens Axboe
1 sibling, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2022-06-24 5:14 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Stefan Roesch, io-uring, kernel-team, linux-mm, linux-xfs,
linux-fsdevel, david, jack, hch, axboe, willy
On Thu, Jun 23, 2022 at 01:31:14PM -0700, Darrick J. Wong wrote:
> Hmm, well, vger and lore are still having stomach problems, so even the
> resend didn't result in #5 ending up in my mailbox. :(
I can see it all here. Sometimes it helps to just wait a bit.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RESEND PATCH v9 00/14] io-uring/xfs: support async buffered writes
2022-06-24 5:14 ` Christoph Hellwig
@ 2022-06-24 14:49 ` Jens Axboe
2022-06-24 15:27 ` Ammar Faizi
0 siblings, 1 reply; 28+ messages in thread
From: Jens Axboe @ 2022-06-24 14:49 UTC (permalink / raw)
To: Christoph Hellwig, Darrick J. Wong
Cc: Stefan Roesch, io-uring, kernel-team, linux-mm, linux-xfs,
linux-fsdevel, david, jack, willy
On 6/23/22 11:14 PM, Christoph Hellwig wrote:
> On Thu, Jun 23, 2022 at 01:31:14PM -0700, Darrick J. Wong wrote:
>> Hmm, well, vger and lore are still having stomach problems, so even the
>> resend didn't result in #5 ending up in my mailbox. :(
>
> I can see a all here. Sometimes it helps to just wait a bit.
on lore? I'm still seeing some missing. Which is a bit odd, since eg b4
can pull the series down just fine.
--
Jens Axboe
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RESEND PATCH v9 00/14] io-uring/xfs: support async buffered writes
2022-06-24 14:49 ` Jens Axboe
@ 2022-06-24 15:27 ` Ammar Faizi
2022-06-24 15:29 ` Jens Axboe
0 siblings, 1 reply; 28+ messages in thread
From: Ammar Faizi @ 2022-06-24 15:27 UTC (permalink / raw)
To: Jens Axboe, Christoph Hellwig, Darrick J. Wong
Cc: Stefan Roesch, io-uring, kernel-team, linux-mm, linux-xfs,
linux-fsdevel, david, jack, willy
On 6/24/22 9:49 PM, Jens Axboe wrote:
> On 6/23/22 11:14 PM, Christoph Hellwig wrote:
>> On Thu, Jun 23, 2022 at 01:31:14PM -0700, Darrick J. Wong wrote:
>>> Hmm, well, vger and lore are still having stomach problems, so even the
>>> resend didn't result in #5 ending up in my mailbox. :(
>>
>> I can see a all here. Sometimes it helps to just wait a bit.
>
> on lore? I'm still seeing some missing. Which is a bit odd, since eg b4
> can pull the series down just fine.
I'm still seeing some missing on the io-uring lore too:
https://lore.kernel.org/io-uring/[email protected]/ (missing)
But changing the path to `/all/` shows the complete series:
https://lore.kernel.org/all/[email protected]/ (complete)
b4 seems to be fetching from `/all/`.
--
Ammar Faizi
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RESEND PATCH v9 00/14] io-uring/xfs: support async buffered writes
2022-06-24 15:27 ` Ammar Faizi
@ 2022-06-24 15:29 ` Jens Axboe
0 siblings, 0 replies; 28+ messages in thread
From: Jens Axboe @ 2022-06-24 15:29 UTC (permalink / raw)
To: Ammar Faizi, Christoph Hellwig, Darrick J. Wong
Cc: Stefan Roesch, io-uring, kernel-team, linux-mm, linux-xfs,
linux-fsdevel, david, jack, willy, Konstantin Ryabitsev
On 6/24/22 9:27 AM, Ammar Faizi wrote:
> On 6/24/22 9:49 PM, Jens Axboe wrote:
>> On 6/23/22 11:14 PM, Christoph Hellwig wrote:
>>> On Thu, Jun 23, 2022 at 01:31:14PM -0700, Darrick J. Wong wrote:
>>>> Hmm, well, vger and lore are still having stomach problems, so even the
>>>> resend didn't result in #5 ending up in my mailbox. :(
>>>
>>> I can see a all here. Sometimes it helps to just wait a bit.
>>
>> on lore? I'm still seeing some missing. Which is a bit odd, since eg b4
>> can pull the series down just fine.
>
> I'm still seeing some missing on the io-uring lore too:
>
> https://lore.kernel.org/io-uring/[email protected]/ (missing)
>
> But changing the path to `/all/` shows the complete series:
>
> https://lore.kernel.org/all/[email protected]/ (complete)
>
> b4 seems to be fetching from `/all/`.
Ah, I see. Konstantin, do you know what is going on here? tldr - /all/
has the whole series, io-uring list does not.
--
Jens Axboe
^ permalink raw reply [flat|nested] 28+ messages in thread
[parent not found: <[email protected]>]
* Re: [RESEND PATCH v9 07/14] fs: Add check for async buffered writes to generic_write_checks
[not found] ` <[email protected]>
@ 2022-06-24 5:21 ` Christoph Hellwig
2022-06-24 14:48 ` Jens Axboe
2022-06-24 17:06 ` Jens Axboe
0 siblings, 2 replies; 28+ messages in thread
From: Christoph Hellwig @ 2022-06-24 5:21 UTC (permalink / raw)
To: Stefan Roesch
Cc: io-uring, kernel-team, linux-mm, linux-xfs, linux-fsdevel, david,
jack, hch, axboe, willy, Christoph Hellwig, Christian Brauner
FYI, I think a subject like
"fs: add a FMODE_BUF_WASYNC flags for f_mode"
might be more descriptive. As the new flag here really is the
interesting part, not that we check it.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RESEND PATCH v9 07/14] fs: Add check for async buffered writes to generic_write_checks
2022-06-24 5:21 ` [RESEND PATCH v9 07/14] fs: Add check for async buffered writes to generic_write_checks Christoph Hellwig
@ 2022-06-24 14:48 ` Jens Axboe
2022-06-24 17:06 ` Jens Axboe
1 sibling, 0 replies; 28+ messages in thread
From: Jens Axboe @ 2022-06-24 14:48 UTC (permalink / raw)
To: Christoph Hellwig, Stefan Roesch
Cc: io-uring, kernel-team, linux-mm, linux-xfs, linux-fsdevel, david,
jack, willy, Christoph Hellwig, Christian Brauner
On 6/23/22 11:21 PM, Christoph Hellwig wrote:
> FYI, I think a subject like
>
> "fs: add a FMODE_BUF_WASYNC flags for f_mode"
>
> might be more descriptive. As the new flag here really is the
> interesting part, not that we check it.
Agree on that - if others do too, I can just make that edit.
--
Jens Axboe
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RESEND PATCH v9 07/14] fs: Add check for async buffered writes to generic_write_checks
2022-06-24 5:21 ` [RESEND PATCH v9 07/14] fs: Add check for async buffered writes to generic_write_checks Christoph Hellwig
2022-06-24 14:48 ` Jens Axboe
@ 2022-06-24 17:06 ` Jens Axboe
1 sibling, 0 replies; 28+ messages in thread
From: Jens Axboe @ 2022-06-24 17:06 UTC (permalink / raw)
To: Christoph Hellwig, Stefan Roesch
Cc: io-uring, kernel-team, linux-mm, linux-xfs, linux-fsdevel, david,
jack, willy, Christoph Hellwig, Christian Brauner
On 6/23/22 11:21 PM, Christoph Hellwig wrote:
> FYI, I think a subject like
>
> "fs: add a FMODE_BUF_WASYNC flags for f_mode"
>
> might be more descriptive. As the new flag here really is the
> interesting part, not that we check it.
I made that edit.
--
Jens Axboe
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: (subset) [RESEND PATCH v9 00/14] io-uring/xfs: support async buffered writes
2022-06-23 17:51 [RESEND PATCH v9 00/14] io-uring/xfs: support async buffered writes Stefan Roesch
` (10 preceding siblings ...)
[not found] ` <[email protected]>
@ 2022-06-25 12:48 ` Jens Axboe
[not found] ` <[email protected]>
12 siblings, 0 replies; 28+ messages in thread
From: Jens Axboe @ 2022-06-25 12:48 UTC (permalink / raw)
To: linux-fsdevel, io-uring, linux-xfs, linux-mm, kernel-team, shr
Cc: david, jack, willy, hch
On Thu, 23 Jun 2022 10:51:43 -0700, Stefan Roesch wrote:
> This patch series adds support for async buffered writes when using both
> xfs and io-uring. Currently io-uring only supports buffered writes in the
> slow path, by processing them in the io workers. With this patch series it is
> now possible to support buffered writes in the fast path. To be able to use
> the fast path the required pages must be in the page cache, the required locks
> in xfs can be granted immediately and no additional blocks need to be read
> form disk.
>
> [...]
Applied, thanks!
[13/14] xfs: Specify lockmode when calling xfs_ilock_for_iomap()
(no commit info)
[14/14] xfs: Add async buffered write support
(no commit info)
Best regards,
--
Jens Axboe
^ permalink raw reply [flat|nested] 28+ messages in thread
[parent not found: <[email protected]>]
* Re: [RESEND PATCH v9 04/14] iomap: Add flags parameter to iomap_page_create()
[not found] ` <[email protected]>
@ 2023-03-03 4:51 ` Matthew Wilcox
2023-03-03 16:53 ` Darrick J. Wong
0 siblings, 1 reply; 28+ messages in thread
From: Matthew Wilcox @ 2023-03-03 4:51 UTC (permalink / raw)
To: Stefan Roesch
Cc: io-uring, kernel-team, linux-mm, linux-xfs, linux-fsdevel, david,
jack, hch, axboe, Christoph Hellwig, Darrick J. Wong
On Thu, Jun 23, 2022 at 10:51:47AM -0700, Stefan Roesch wrote:
> Add the kiocb flags parameter to the function iomap_page_create().
> Depending on the value of the flags parameter it enables different gfp
> flags.
>
> No intended functional changes in this patch.
[...]
> @@ -226,7 +234,7 @@ static int iomap_read_inline_data(const struct iomap_iter *iter,
> if (WARN_ON_ONCE(size > iomap->length))
> return -EIO;
> if (offset > 0)
> - iop = iomap_page_create(iter->inode, folio);
> + iop = iomap_page_create(iter->inode, folio, iter->flags);
> else
> iop = to_iomap_page(folio);
I really don't like what this change has done to this file. I'm
modifying this function, and I start thinking "Well, hang on, if
flags has IOMAP_NOWAIT set, then GFP_NOWAIT can fail, and iop
will be NULL, so we'll end up marking the entire folio uptodate
when really we should only be marking some blocks uptodate, so
we should really be failing the entire read if the allocation
failed, but maybe it's OK because IOMAP_NOWAIT is never set in
this path".
I don't know how we fix this. Maybe return ERR_PTR(-ENOMEM) or
-EAGAIN if the memory allocation fails (leaving the NULL return
for "we don't need an iop"). Thoughts?
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RESEND PATCH v9 04/14] iomap: Add flags parameter to iomap_page_create()
2023-03-03 4:51 ` [RESEND PATCH v9 04/14] iomap: Add flags parameter to iomap_page_create() Matthew Wilcox
@ 2023-03-03 16:53 ` Darrick J. Wong
2023-03-03 17:29 ` Stefan Roesch
0 siblings, 1 reply; 28+ messages in thread
From: Darrick J. Wong @ 2023-03-03 16:53 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Stefan Roesch, io-uring, kernel-team, linux-mm, linux-xfs,
linux-fsdevel, david, jack, hch, axboe, Christoph Hellwig
On Fri, Mar 03, 2023 at 04:51:10AM +0000, Matthew Wilcox wrote:
> On Thu, Jun 23, 2022 at 10:51:47AM -0700, Stefan Roesch wrote:
> > Add the kiocb flags parameter to the function iomap_page_create().
> > Depending on the value of the flags parameter it enables different gfp
> > flags.
> >
> > No intended functional changes in this patch.
>
> [...]
>
> > @@ -226,7 +234,7 @@ static int iomap_read_inline_data(const struct iomap_iter *iter,
> > if (WARN_ON_ONCE(size > iomap->length))
> > return -EIO;
> > if (offset > 0)
> > - iop = iomap_page_create(iter->inode, folio);
> > + iop = iomap_page_create(iter->inode, folio, iter->flags);
> > else
> > iop = to_iomap_page(folio);
>
> I really don't like what this change has done to this file. I'm
> modifying this function, and I start thinking "Well, hang on, if
> flags has IOMAP_NOWAIT set, then GFP_NOWAIT can fail, and iop
> will be NULL, so we'll end up marking the entire folio uptodate
> when really we should only be marking some blocks uptodate, so
> we should really be failing the entire read if the allocation
> failed, but maybe it's OK because IOMAP_NOWAIT is never set in
> this path".
>
> I don't know how we fix this. Maybe return ERR_PTR(-ENOMEM) or
> -EAGAIN if the memory allocation fails (leaving the NULL return
> for "we don't need an iop"). Thoughts?
I don't see any problem with that, aside from being pre-coffee and on
vacation for the rest of today. ;)
--D
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RESEND PATCH v9 04/14] iomap: Add flags parameter to iomap_page_create()
2023-03-03 16:53 ` Darrick J. Wong
@ 2023-03-03 17:29 ` Stefan Roesch
2023-03-06 13:03 ` Christoph Hellwig
0 siblings, 1 reply; 28+ messages in thread
From: Stefan Roesch @ 2023-03-03 17:29 UTC (permalink / raw)
To: Darrick J. Wong, Matthew Wilcox
Cc: Stefan Roesch, io-uring, kernel-team, linux-mm, linux-xfs,
linux-fsdevel, david, jack, hch, axboe, Christoph Hellwig
On 3/3/23 8:53 AM, Darrick J. Wong wrote:
> >
> On Fri, Mar 03, 2023 at 04:51:10AM +0000, Matthew Wilcox wrote:
>> On Thu, Jun 23, 2022 at 10:51:47AM -0700, Stefan Roesch wrote:
>>> Add the kiocb flags parameter to the function iomap_page_create().
>>> Depending on the value of the flags parameter it enables different gfp
>>> flags.
>>>
>>> No intended functional changes in this patch.
>>
>> [...]
>>
>>> @@ -226,7 +234,7 @@ static int iomap_read_inline_data(const struct iomap_iter *iter,
>>> if (WARN_ON_ONCE(size > iomap->length))
>>> return -EIO;
>>> if (offset > 0)
>>> - iop = iomap_page_create(iter->inode, folio);
>>> + iop = iomap_page_create(iter->inode, folio, iter->flags);
>>> else
>>> iop = to_iomap_page(folio);
>>
>> I really don't like what this change has done to this file. I'm
>> modifying this function, and I start thinking "Well, hang on, if
>> flags has IOMAP_NOWAIT set, then GFP_NOWAIT can fail, and iop
>> will be NULL, so we'll end up marking the entire folio uptodate
>> when really we should only be marking some blocks uptodate, so
>> we should really be failing the entire read if the allocation
>> failed, but maybe it's OK because IOMAP_NOWAIT is never set in
>> this path".
>>
>> I don't know how we fix this. Maybe return ERR_PTR(-ENOMEM) or
>> -EAGAIN if the memory allocation fails (leaving the NULL return
>> for "we don't need an iop"). Thoughts?
>
> I don't see any problem with that, aside from being pre-coffee and on
> vacation for the rest of today. ;)
>
> --D
If IOMAP_NOWAIT is set and the allocation fails, we should return -EAGAIN
so the write request is retried in the slow path.
--Stefan
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RESEND PATCH v9 04/14] iomap: Add flags parameter to iomap_page_create()
2023-03-03 17:29 ` Stefan Roesch
@ 2023-03-06 13:03 ` Christoph Hellwig
0 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2023-03-06 13:03 UTC (permalink / raw)
To: Stefan Roesch
Cc: Darrick J. Wong, Matthew Wilcox, Stefan Roesch, io-uring,
kernel-team, linux-mm, linux-xfs, linux-fsdevel, david, jack, hch,
axboe, Christoph Hellwig
On Fri, Mar 03, 2023 at 09:29:30AM -0800, Stefan Roesch wrote:
> If IOMAP_NOWAIT is set, and the allocation fails, we should return
> -EAGAIN, so the write request is retried in the slow path.
Yes. Another vote for doing the ERR_PTR.
willy, are you going to look into that yourself or are you waiting for
someone to take care of it?
^ permalink raw reply [flat|nested] 28+ messages in thread