* [PATCH v1 00/14] Support async buffered writes for io-uring
From: Stefan Roesch @ 2022-02-14 17:43 UTC
To: io-uring, linux-fsdevel, linux-block, kernel-team; +Cc: shr
This patch series adds support for async buffered writes. Currently
io-uring only supports buffered writes in the slow path, by processing
them in the io workers. With this patch series it is now possible to
support buffered writes in the fast path. To use the fast path, the
required pages must either already be in the page cache or be
allocatable with noio; otherwise the request is still punted to the
slow path.
If a buffered write request requires more than one page, it is possible
that only part of the request can use the fast path; the rest will be
completed by the io workers.
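For illustration, here is a minimal userspace sketch of the kind of
workload this series targets, written against liburing. This example is
not part of the series; the device path is a placeholder and error
handling is omitted.

#include <fcntl.h>
#include <string.h>
#include <liburing.h>

int main(void)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	char buf[4096];
	int fd;

	memset(buf, 0xab, sizeof(buf));
	io_uring_queue_init(8, &ring, 0);

	/* Buffered write: the file is opened without O_DIRECT. */
	fd = open("/dev/nvme0n1", O_WRONLY);

	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_write(sqe, fd, buf, sizeof(buf), 0);
	io_uring_submit(&ring);

	/* cqe->res holds the number of bytes written or a -errno value. */
	io_uring_wait_cqe(&ring, &cqe);
	io_uring_cqe_seen(&ring, cqe);
	io_uring_queue_exit(&ring);
	return 0;
}

With the series applied, the write above can complete inline when the
required pages are in the page cache (or can be allocated with noio),
instead of always being punted to an io worker.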
Support for async buffered writes:
Patch 1: fs: Add flags parameter to __block_write_begin_int
Add a flags parameter to the function __block_write_begin_int()
to allow passing down a nowait flag.
Patch 2: mm: Introduce do_generic_perform_write
Introduce a new do_generic_perform_write() function, which is
split off from the existing generic_perform_write() function. It
takes an additional flags parameter, which is used to pass the
nowait flag.
Patch 3: mm: add noio support in filemap_get_pages
This allows pages to be allocated with noio if a page for async
buffered writes is not yet in the page cache.
Patch 4: mm: Add support for async buffered writes
For async buffered writes allocate pages without blocking on the
allocation.
Patch 5: fs: split off __alloc_page_buffers function
Split off __alloc_page_buffers() function with new gfp_t parameter.
Patch 6: fs: split off __create_empty_buffers function
Split off __create_empty_buffers() function with new gfp_t parameter.
Patch 7: fs: Add aop_flags parameter to create_page_buffers()
Add aop_flags to create_page_buffers() function. Use atomic allocation
for async buffered writes.
Patch 8: fs: add support for async buffered writes
Return -EAGAIN instead of -ENOMEM for async buffered writes. This
will cause the write request to be processed by an io worker.
Patch 9: io_uring: add support for async buffered writes
This enables the async buffered writes for block devices in io_uring.
Buffered writes are enabled for blocks that are already in the page
cache or can be acquired with noio.
Patch 10: io_uring: Add tracepoint for short writes
Support for write throttling of async buffered writes:
Patch 11: sched: add new fields to task_struct
Add two new fields to the task_struct. These fields store the
deadline after which writes are no longer throttled.
Patch 12: mm: support write throttling for async buffered writes
This changes the balance_dirty_pages() function to take an
additional parameter. When nowait is specified, the write
throttling code no longer waits synchronously for the deadline to
expire. Instead it sets the fields in task_struct. Once the
deadline expires, the fields are reset.
Patch 13: io_uring: support write throttling for async buffered writes
Adds support to io_uring for write throttling. When the writes
are throttled, the write requests are added to the pending io list.
Once the write throttling deadline expires, the writes are submitted.
Enable async buffered write support:
Patch 14: block: enable async buffered writes for block devices
This sets the flag that enables async buffered writes for block
devices.
Testing:
This patch series has been tested with xfstests and fio.
Performance results:
For fio the following results have been obtained with a queue depth of
1 and 4k block size (runtime 600 secs):
sequential writes:
                 without patch    with patch
throughput:      329 MiB/s        1032 MiB/s
iops:            82k              264k
slat (nsec):     2332             3340
clat (nsec):     9017             60
CPU util%:       37%              78%

random writes:
                 without patch    with patch
throughput:      307 MiB/s        909 MiB/s
iops:            76k              227k
slat (nsec):     2419             3780
clat (nsec):     9934             59
CPU util%:       57%              88%
For an io depth of 1, the patch improves throughput by close to a
factor of three and also considerably reduces latency. To achieve the
same or better performance with the existing code, an io depth of 4 is
required. Especially for mixed workloads this is a considerable
improvement.
Stefan Roesch (14):
fs: Add flags parameter to __block_write_begin_int
mm: Introduce do_generic_perform_write
mm: add noio support in filemap_get_pages
mm: Add support for async buffered writes
fs: split off __alloc_page_buffers function
fs: split off __create_empty_buffers function
fs: Add aop_flags parameter to create_page_buffers()
fs: add support for async buffered writes
io_uring: add support for async buffered writes
io_uring: Add tracepoint for short writes
sched: add new fields to task_struct
mm: support write throttling for async buffered writes
io_uring: support write throttling for async buffered writes
block: enable async buffered writes for block devices.
block/fops.c | 5 +-
fs/buffer.c | 103 ++++++++++++++++---------
fs/internal.h | 3 +-
fs/io_uring.c | 130 +++++++++++++++++++++++++++++---
fs/iomap/buffered-io.c | 4 +-
fs/read_write.c | 3 +-
include/linux/fs.h | 4 +
include/linux/sched.h | 3 +
include/linux/writeback.h | 1 +
include/trace/events/io_uring.h | 25 ++++++
kernel/fork.c | 1 +
mm/filemap.c | 34 +++++++--
mm/folio-compat.c | 4 +
mm/page-writeback.c | 54 +++++++++----
14 files changed, 298 insertions(+), 76 deletions(-)
base-commit: f1baf68e1383f6ed93eb9cff2866d46562607a43
--
2.30.2
* [PATCH v1 01/14] fs: Add flags parameter to __block_write_begin_int
From: Stefan Roesch @ 2022-02-14 17:43 UTC
To: io-uring, linux-fsdevel, linux-block, kernel-team; +Cc: shr
This adds a flags parameter to the __block_write_begin_int() function.
This allows flags to be passed down the stack.
Signed-off-by: Stefan Roesch <[email protected]>
---
fs/buffer.c | 7 ++++---
fs/internal.h | 3 ++-
fs/iomap/buffered-io.c | 4 ++--
3 files changed, 8 insertions(+), 6 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index 8e112b6bd371..6e6a69a12eed 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1970,7 +1970,8 @@ iomap_to_bh(struct inode *inode, sector_t block, struct buffer_head *bh,
}
int __block_write_begin_int(struct folio *folio, loff_t pos, unsigned len,
- get_block_t *get_block, const struct iomap *iomap)
+ get_block_t *get_block, const struct iomap *iomap,
+ unsigned int flags)
{
unsigned from = pos & (PAGE_SIZE - 1);
unsigned to = from + len;
@@ -2058,7 +2059,7 @@ int __block_write_begin(struct page *page, loff_t pos, unsigned len,
get_block_t *get_block)
{
return __block_write_begin_int(page_folio(page), pos, len, get_block,
- NULL);
+ NULL, 0);
}
EXPORT_SYMBOL(__block_write_begin);
@@ -2118,7 +2119,7 @@ int block_write_begin(struct address_space *mapping, loff_t pos, unsigned len,
if (!page)
return -ENOMEM;
- status = __block_write_begin(page, pos, len, get_block);
+ status = __block_write_begin_int(page_folio(page), pos, len, get_block, NULL, flags);
if (unlikely(status)) {
unlock_page(page);
put_page(page);
diff --git a/fs/internal.h b/fs/internal.h
index 8590c973c2f4..7432df23f3ce 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -38,7 +38,8 @@ static inline int emergency_thaw_bdev(struct super_block *sb)
* buffer.c
*/
int __block_write_begin_int(struct folio *folio, loff_t pos, unsigned len,
- get_block_t *get_block, const struct iomap *iomap);
+ get_block_t *get_block, const struct iomap *iomap,
+ unsigned int flags);
/*
* char_dev.c
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 6c51a75d0be6..47c519952725 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -646,7 +646,7 @@ static int iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
if (srcmap->type == IOMAP_INLINE)
status = iomap_write_begin_inline(iter, folio);
else if (srcmap->flags & IOMAP_F_BUFFER_HEAD)
- status = __block_write_begin_int(folio, pos, len, NULL, srcmap);
+ status = __block_write_begin_int(folio, pos, len, NULL, srcmap, 0);
else
status = __iomap_write_begin(iter, pos, len, folio);
@@ -979,7 +979,7 @@ static loff_t iomap_folio_mkwrite_iter(struct iomap_iter *iter,
if (iter->iomap.flags & IOMAP_F_BUFFER_HEAD) {
ret = __block_write_begin_int(folio, iter->pos, length, NULL,
- &iter->iomap);
+ &iter->iomap, 0);
if (ret)
return ret;
block_commit_write(&folio->page, 0, length);
--
2.30.2
* [PATCH v1 02/14] mm: Introduce do_generic_perform_write
From: Stefan Roesch @ 2022-02-14 17:43 UTC
To: io-uring, linux-fsdevel, linux-block, kernel-team; +Cc: shr
This splits off a new do_generic_perform_write() function, so that an
additional flags parameter can be specified. The new flags parameter is
used to support async buffered writes.
Signed-off-by: Stefan Roesch <[email protected]>
---
include/linux/fs.h | 1 +
mm/filemap.c | 20 +++++++++++++++-----
2 files changed, 16 insertions(+), 5 deletions(-)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e2d892b201b0..e62dba6ed453 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -278,6 +278,7 @@ enum positive_aop_returns {
#define AOP_FLAG_NOFS 0x0002 /* used by filesystem to direct
* helper code (eg buffer layer)
* to clear GFP_FS from alloc */
+#define AOP_FLAGS_NOWAIT 0x0004 /* async nowait buffered writes */
/*
* oh the beauties of C type declarations.
diff --git a/mm/filemap.c b/mm/filemap.c
index ad8c39d90bf9..d2fb817c0845 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3725,14 +3725,13 @@ generic_file_direct_write(struct kiocb *iocb, struct iov_iter *from)
}
EXPORT_SYMBOL(generic_file_direct_write);
-ssize_t generic_perform_write(struct file *file,
- struct iov_iter *i, loff_t pos)
+static ssize_t do_generic_perform_write(struct file *file, struct iov_iter *i,
+ loff_t pos, int flags)
{
struct address_space *mapping = file->f_mapping;
const struct address_space_operations *a_ops = mapping->a_ops;
long status = 0;
ssize_t written = 0;
- unsigned int flags = 0;
do {
struct page *page;
@@ -3801,6 +3800,12 @@ ssize_t generic_perform_write(struct file *file,
return written ? written : status;
}
+
+ssize_t generic_perform_write(struct file *file,
+ struct iov_iter *i, loff_t pos)
+{
+ return do_generic_perform_write(file, i, pos, 0);
+}
EXPORT_SYMBOL(generic_perform_write);
/**
@@ -3832,6 +3837,10 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
ssize_t written = 0;
ssize_t err;
ssize_t status;
+ int flags = 0;
+
+ if (iocb->ki_flags & IOCB_NOWAIT)
+ flags |= AOP_FLAGS_NOWAIT;
/* We can write back this queue in page reclaim */
current->backing_dev_info = inode_to_bdi(inode);
@@ -3857,7 +3866,8 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
if (written < 0 || !iov_iter_count(from) || IS_DAX(inode))
goto out;
- status = generic_perform_write(file, from, pos = iocb->ki_pos);
+ status = do_generic_perform_write(file, from, pos = iocb->ki_pos, flags);
+
/*
* If generic_perform_write() returned a synchronous error
* then we want to return the number of bytes which were
@@ -3889,7 +3899,7 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
*/
}
} else {
- written = generic_perform_write(file, from, iocb->ki_pos);
+ written = do_generic_perform_write(file, from, iocb->ki_pos, flags);
if (likely(written > 0))
iocb->ki_pos += written;
}
--
2.30.2
* [PATCH v1 03/14] mm: add noio support in filemap_get_pages
From: Stefan Roesch @ 2022-02-14 17:43 UTC
To: io-uring, linux-fsdevel, linux-block, kernel-team; +Cc: shr
This adds noio support for async buffered writes in filemap_get_pages.
The idea is to handle the failure gracefully and return -EAGAIN if we
can't get the memory quickly.
Signed-off-by: Stefan Roesch <[email protected]>
---
mm/filemap.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/mm/filemap.c b/mm/filemap.c
index d2fb817c0845..0ff4278c3961 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2591,10 +2591,15 @@ static int filemap_get_pages(struct kiocb *iocb, struct iov_iter *iter,
filemap_get_read_batch(mapping, index, last_index, fbatch);
}
if (!folio_batch_count(fbatch)) {
+ unsigned int pflags;
+
if (iocb->ki_flags & (IOCB_NOWAIT | IOCB_WAITQ))
- return -EAGAIN;
+ pflags = memalloc_noio_save();
err = filemap_create_folio(filp, mapping,
iocb->ki_pos >> PAGE_SHIFT, fbatch);
+ if (iocb->ki_flags & (IOCB_NOWAIT | IOCB_WAITQ))
+ memalloc_noio_restore(pflags);
+
if (err == AOP_TRUNCATED_PAGE)
goto retry;
return err;
--
2.30.2
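For reference, a minimal sketch of the memalloc_noio scoping pattern the
hunk above relies on (illustrative only): while PF_MEMALLOC_NOIO is set
on the task, every allocation inside the region implicitly behaves as if
GFP_NOIO had been passed, so nested allocations cannot recurse into I/O.

	unsigned int pflags;

	pflags = memalloc_noio_save();  /* set PF_MEMALLOC_NOIO, save old state */
	/* any allocation in this region is implicitly GFP_NOIO */
	memalloc_noio_restore(pflags);  /* restore the saved state */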
* [PATCH v1 04/14] mm: Add support for async buffered writes
From: Stefan Roesch @ 2022-02-14 17:43 UTC
To: io-uring, linux-fsdevel, linux-block, kernel-team; +Cc: shr
This adds support for async buffered writes in the mm layer. When the
AOP_FLAGS_NOWAIT flag is set and the page is not already in the page
cache, the page gets created without blocking on the allocation.
Signed-off-by: Stefan Roesch <[email protected]>
---
mm/filemap.c | 5 +++++
mm/folio-compat.c | 4 ++++
2 files changed, 9 insertions(+)
diff --git a/mm/filemap.c b/mm/filemap.c
index 0ff4278c3961..19065ad95a4c 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -42,6 +42,7 @@
#include <linux/ramfs.h>
#include <linux/page_idle.h>
#include <linux/migrate.h>
+#include <linux/sched/mm.h>
#include <asm/pgalloc.h>
#include <asm/tlbflush.h>
#include "internal.h"
@@ -1986,6 +1987,10 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
gfp |= __GFP_WRITE;
if (fgp_flags & FGP_NOFS)
gfp &= ~__GFP_FS;
+ if (fgp_flags & FGP_NOWAIT) {
+ gfp |= GFP_ATOMIC;
+ gfp &= ~__GFP_DIRECT_RECLAIM;
+ }
folio = filemap_alloc_folio(gfp, 0);
if (!folio)
diff --git a/mm/folio-compat.c b/mm/folio-compat.c
index 749555a232a8..a1d05509b29f 100644
--- a/mm/folio-compat.c
+++ b/mm/folio-compat.c
@@ -136,6 +136,10 @@ struct page *grab_cache_page_write_begin(struct address_space *mapping,
if (flags & AOP_FLAG_NOFS)
fgp_flags |= FGP_NOFS;
+
+ if (flags & AOP_FLAGS_NOWAIT)
+ fgp_flags |= FGP_NOWAIT;
+
return pagecache_get_page(mapping, index, fgp_flags,
mapping_gfp_mask(mapping));
}
--
2.30.2
* [PATCH v1 05/14] fs: split off __alloc_page_buffers function
From: Stefan Roesch @ 2022-02-14 17:43 UTC
To: io-uring, linux-fsdevel, linux-block, kernel-team; +Cc: shr
This splits off the __alloc_page_buffers() function from the
alloc_page_buffers() function. In addition it adds a gfp_t parameter, so
the caller can specify the allocation flags.
Signed-off-by: Stefan Roesch <[email protected]>
---
fs/buffer.c | 37 ++++++++++++++++++++++---------------
1 file changed, 22 insertions(+), 15 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index 6e6a69a12eed..a1986f95a39a 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -802,26 +802,13 @@ int remove_inode_buffers(struct inode *inode)
return ret;
}
-/*
- * Create the appropriate buffers when given a page for data area and
- * the size of each buffer.. Use the bh->b_this_page linked list to
- * follow the buffers created. Return NULL if unable to create more
- * buffers.
- *
- * The retry flag is used to differentiate async IO (paging, swapping)
- * which may not fail from ordinary buffer allocations.
- */
-struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size,
- bool retry)
+struct buffer_head *__alloc_page_buffers(struct page *page, unsigned long size,
+ gfp_t gfp)
{
struct buffer_head *bh, *head;
- gfp_t gfp = GFP_NOFS | __GFP_ACCOUNT;
long offset;
struct mem_cgroup *memcg, *old_memcg;
- if (retry)
- gfp |= __GFP_NOFAIL;
-
/* The page lock pins the memcg */
memcg = page_memcg(page);
old_memcg = set_active_memcg(memcg);
@@ -859,6 +846,26 @@ struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size,
goto out;
}
+
+/*
+ * Create the appropriate buffers when given a page for data area and
+ * the size of each buffer.. Use the bh->b_this_page linked list to
+ * follow the buffers created. Return NULL if unable to create more
+ * buffers.
+ *
+ * The retry flag is used to differentiate async IO (paging, swapping)
+ * which may not fail from ordinary buffer allocations.
+ */
+struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size,
+ bool retry)
+{
+ gfp_t gfp = GFP_NOFS | __GFP_ACCOUNT;
+
+ if (retry)
+ gfp |= __GFP_NOFAIL;
+
+ return __alloc_page_buffers(page, size, gfp);
+}
EXPORT_SYMBOL_GPL(alloc_page_buffers);
static inline void
--
2.30.2
* [PATCH v1 06/14] fs: split off __create_empty_buffers function
From: Stefan Roesch @ 2022-02-14 17:43 UTC
To: io-uring, linux-fsdevel, linux-block, kernel-team; +Cc: shr
This splits off the function __create_empty_buffers() from the function
create_empty_buffers(). __create_empty_buffers() takes an additional gfp
parameter, which allows the caller to specify the allocation properties.
Signed-off-by: Stefan Roesch <[email protected]>
---
fs/buffer.c | 22 ++++++++++++++--------
1 file changed, 14 insertions(+), 8 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index a1986f95a39a..948505480b43 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1554,17 +1554,12 @@ void block_invalidatepage(struct page *page, unsigned int offset,
EXPORT_SYMBOL(block_invalidatepage);
-/*
- * We attach and possibly dirty the buffers atomically wrt
- * __set_page_dirty_buffers() via private_lock. try_to_free_buffers
- * is already excluded via the page lock.
- */
-void create_empty_buffers(struct page *page,
- unsigned long blocksize, unsigned long b_state)
+static void __create_empty_buffers(struct page *page, unsigned long blocksize,
+ unsigned long b_state, gfp_t gfp)
{
struct buffer_head *bh, *head, *tail;
- head = alloc_page_buffers(page, blocksize, true);
+ head = __alloc_page_buffers(page, blocksize, gfp);
bh = head;
do {
bh->b_state |= b_state;
@@ -1587,6 +1582,17 @@ void create_empty_buffers(struct page *page,
attach_page_private(page, head);
spin_unlock(&page->mapping->private_lock);
}
+/*
+ * We attach and possibly dirty the buffers atomically wrt
+ * __set_page_dirty_buffers() via private_lock. try_to_free_buffers
+ * is already excluded via the page lock.
+ */
+void create_empty_buffers(struct page *page,
+ unsigned long blocksize, unsigned long b_state)
+{
+ return __create_empty_buffers(page, blocksize, b_state,
+ GFP_NOFS | __GFP_ACCOUNT | __GFP_NOFAIL);
+}
EXPORT_SYMBOL(create_empty_buffers);
/**
--
2.30.2
* [PATCH v1 07/14] fs: Add aop_flags parameter to create_page_buffers()
From: Stefan Roesch @ 2022-02-14 17:43 UTC
To: io-uring, linux-fsdevel, linux-block, kernel-team; +Cc: shr
This adds an aop_flags parameter to the create_page_buffers() function.
When the AOP_FLAGS_NOWAIT flag is set, the atomic allocation flag is
used. The AOP_FLAGS_NOWAIT flag is set when async buffered writes are
enabled.
Signed-off-by: Stefan Roesch <[email protected]>
---
fs/buffer.c | 28 +++++++++++++++++++++-------
1 file changed, 21 insertions(+), 7 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index 948505480b43..5e3067173580 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1682,13 +1682,27 @@ static inline int block_size_bits(unsigned int blocksize)
return ilog2(blocksize);
}
-static struct buffer_head *create_page_buffers(struct page *page, struct inode *inode, unsigned int b_state)
+static struct buffer_head *create_page_buffers(struct page *page,
+ struct inode *inode,
+ unsigned int b_state,
+ unsigned int aop_flags)
{
BUG_ON(!PageLocked(page));
- if (!page_has_buffers(page))
- create_empty_buffers(page, 1 << READ_ONCE(inode->i_blkbits),
- b_state);
+ if (!page_has_buffers(page)) {
+ gfp_t gfp = GFP_NOFS | __GFP_ACCOUNT;
+
+ if (aop_flags & AOP_FLAGS_NOWAIT) {
+ gfp |= GFP_ATOMIC | __GFP_NOWARN;
+ gfp &= ~__GFP_DIRECT_RECLAIM;
+ } else {
+ gfp |= __GFP_NOFAIL;
+ }
+
+ __create_empty_buffers(page, 1 << READ_ONCE(inode->i_blkbits),
+ b_state, gfp);
+ }
+
return page_buffers(page);
}
@@ -1734,7 +1748,7 @@ int __block_write_full_page(struct inode *inode, struct page *page,
int write_flags = wbc_to_write_flags(wbc);
head = create_page_buffers(page, inode,
- (1 << BH_Dirty)|(1 << BH_Uptodate));
+ (1 << BH_Dirty)|(1 << BH_Uptodate), 0);
/*
* Be very careful. We have no exclusion from __set_page_dirty_buffers
@@ -2000,7 +2014,7 @@ int __block_write_begin_int(struct folio *folio, loff_t pos, unsigned len,
BUG_ON(to > PAGE_SIZE);
BUG_ON(from > to);
- head = create_page_buffers(&folio->page, inode, 0);
+ head = create_page_buffers(&folio->page, inode, 0, flags);
blocksize = head->b_size;
bbits = block_size_bits(blocksize);
@@ -2280,7 +2294,7 @@ int block_read_full_page(struct page *page, get_block_t *get_block)
int nr, i;
int fully_mapped = 1;
- head = create_page_buffers(page, inode, 0);
+ head = create_page_buffers(page, inode, 0, 0);
blocksize = head->b_size;
bbits = block_size_bits(blocksize);
--
2.30.2
* [PATCH v1 08/14] fs: add support for async buffered writes
From: Stefan Roesch @ 2022-02-14 17:43 UTC
To: io-uring, linux-fsdevel, linux-block, kernel-team; +Cc: shr
This adds support for the AOP_FLAGS_NOWAIT flag to the fs layer. If
a page that is required for writing is not in the page cache, it returns
-EAGAIN instead of -ENOMEM.
Signed-off-by: Stefan Roesch <[email protected]>
---
fs/buffer.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index 5e3067173580..140f57c1cbdd 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2069,6 +2069,10 @@ int __block_write_begin_int(struct folio *folio, loff_t pos, unsigned len,
*wait_bh++=bh;
}
}
+
+ /* No wait specified, don't wait for reads to complete. */
+ if (!err && wait_bh > wait && (flags & AOP_FLAGS_NOWAIT))
+ return -EAGAIN;
/*
* If we issued read requests - let them complete.
*/
@@ -2143,8 +2147,11 @@ int block_write_begin(struct address_space *mapping, loff_t pos, unsigned len,
int status;
page = grab_cache_page_write_begin(mapping, index, flags);
- if (!page)
+ if (!page) {
+ if (flags & AOP_FLAGS_NOWAIT)
+ return -EAGAIN;
return -ENOMEM;
+ }
status = __block_write_begin_int(page_folio(page), pos, len, get_block, NULL, flags);
if (unlikely(status)) {
--
2.30.2
* [PATCH v1 09/14] io_uring: add support for async buffered writes
From: Stefan Roesch @ 2022-02-14 17:43 UTC
To: io-uring, linux-fsdevel, linux-block, kernel-team; +Cc: shr
This enables the async buffered writes for block devices in io_uring.
Buffered writes are enabled for blocks that are already in the page
cache or can be acquired with noio.
It is possible that a write request cannot be completely fulfilled
(short write). In that case the request is punted and sent to the io
workers to be completed. Before submitting the request to the io
workers, the request is updated with how much has already been written.
Signed-off-by: Stefan Roesch <[email protected]>
---
fs/io_uring.c | 29 ++++++++++++++++++++++++-----
1 file changed, 24 insertions(+), 5 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 2e04f718319d..76b1ff602470 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -3615,7 +3615,7 @@ static inline int io_iter_do_read(struct io_kiocb *req, struct iov_iter *iter)
return -EINVAL;
}
-static bool need_read_all(struct io_kiocb *req)
+static bool need_complete_io(struct io_kiocb *req)
{
return req->flags & REQ_F_ISREG ||
S_ISBLK(file_inode(req->file)->i_mode);
@@ -3679,7 +3679,7 @@ static int io_read(struct io_kiocb *req, unsigned int issue_flags)
} else if (ret == -EIOCBQUEUED) {
goto out_free;
} else if (ret == req->result || ret <= 0 || !force_nonblock ||
- (req->flags & REQ_F_NOWAIT) || !need_read_all(req)) {
+ (req->flags & REQ_F_NOWAIT) || !need_complete_io(req)) {
/* read all, failed, already did sync or don't want to retry */
goto done;
}
@@ -3777,9 +3777,10 @@ static int io_write(struct io_kiocb *req, unsigned int issue_flags)
if (unlikely(!io_file_supports_nowait(req)))
goto copy_iov;
- /* file path doesn't support NOWAIT for non-direct_IO */
- if (force_nonblock && !(kiocb->ki_flags & IOCB_DIRECT) &&
- (req->flags & REQ_F_ISREG))
+ /* File path supports NOWAIT for non-direct_IO only for block devices. */
+ if (!(kiocb->ki_flags & IOCB_DIRECT) &&
+ !(kiocb->ki_filp->f_mode & FMODE_BUF_WASYNC) &&
+ (req->flags & REQ_F_ISREG))
goto copy_iov;
kiocb->ki_flags |= IOCB_NOWAIT;
@@ -3831,6 +3832,24 @@ static int io_write(struct io_kiocb *req, unsigned int issue_flags)
/* IOPOLL retry should happen for io-wq threads */
if (ret2 == -EAGAIN && (req->ctx->flags & IORING_SETUP_IOPOLL))
goto copy_iov;
+
+ if (ret2 != req->result && ret2 >= 0 && need_complete_io(req)) {
+ struct io_async_rw *rw;
+
+ /* This is a partial write. The file pos has already been
+ * updated, setup the async struct to complete the request
+ * in the worker. Also update bytes_done to account for
+ * the bytes already written.
+ */
+ iov_iter_save_state(&s->iter, &s->iter_state);
+ ret = io_setup_async_rw(req, iovec, s, true);
+
+ rw = req->async_data;
+ if (rw)
+ rw->bytes_done += ret2;
+
+ return ret ? ret : -EAGAIN;
+ }
done:
kiocb_done(req, ret2, issue_flags);
} else {
--
2.30.2
* [PATCH v1 10/14] io_uring: Add tracepoint for short writes
From: Stefan Roesch @ 2022-02-14 17:43 UTC
To: io-uring, linux-fsdevel, linux-block, kernel-team; +Cc: shr
This adds the io_uring_short_write tracepoint to io_uring. A short write
occurs when not all pages that are required for a write are in the page
cache and the async buffered write has to return -EAGAIN.
Signed-off-by: Stefan Roesch <[email protected]>
---
fs/io_uring.c | 3 +++
include/trace/events/io_uring.h | 25 +++++++++++++++++++++++++
2 files changed, 28 insertions(+)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 76b1ff602470..507f28b5b2bb 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -3836,6 +3836,9 @@ static int io_write(struct io_kiocb *req, unsigned int issue_flags)
if (ret2 != req->result && ret2 >= 0 && need_complete_io(req)) {
struct io_async_rw *rw;
+ trace_io_uring_short_write(req->ctx, kiocb->ki_pos - ret2,
+ req->result, ret2);
+
/* This is a partial write. The file pos has already been
* updated, setup the async struct to complete the request
* in the worker. Also update bytes_done to account for
diff --git a/include/trace/events/io_uring.h b/include/trace/events/io_uring.h
index 7346f0164cf4..ce1cfdf4b015 100644
--- a/include/trace/events/io_uring.h
+++ b/include/trace/events/io_uring.h
@@ -558,6 +558,31 @@ TRACE_EVENT(io_uring_req_failed,
(unsigned long long) __entry->pad2, __entry->error)
);
+TRACE_EVENT(io_uring_short_write,
+
+ TP_PROTO(void *ctx, u64 fpos, u64 wanted, u64 got),
+
+ TP_ARGS(ctx, fpos, wanted, got),
+
+ TP_STRUCT__entry(
+ __field(void *, ctx)
+ __field(u64, fpos)
+ __field(u64, wanted)
+ __field(u64, got)
+ ),
+
+ TP_fast_assign(
+ __entry->ctx = ctx;
+ __entry->fpos = fpos;
+ __entry->wanted = wanted;
+ __entry->got = got;
+ ),
+
+ TP_printk("ring %p, fpos %lld, wanted %lld, got %lld",
+ __entry->ctx, __entry->fpos,
+ __entry->wanted, __entry->got)
+);
+
#endif /* _TRACE_IO_URING_H */
/* This part must be outside protection */
--
2.30.2
* [PATCH v1 11/14] sched: add new fields to task_struct
From: Stefan Roesch @ 2022-02-14 17:44 UTC
To: io-uring, linux-fsdevel, linux-block, kernel-team; +Cc: shr
Add two new fields to the task_struct to support async
write throttling.
- One field to store the deadline until which writes are throttled:
bdp_pause
- The other field to store the nr_dirtied_pause value to restore once
the deadline has passed: bdp_nr_dirtied_pause
Signed-off-by: Stefan Roesch <[email protected]>
---
include/linux/sched.h | 3 +++
kernel/fork.c | 1 +
2 files changed, 4 insertions(+)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 75ba8aa60248..97146b7539c5 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1324,6 +1324,9 @@ struct task_struct {
/* Start of a write-and-pause period: */
unsigned long dirty_paused_when;
+ unsigned long bdp_pause;
+ int bdp_nr_dirtied_pause;
+
#ifdef CONFIG_LATENCYTOP
int latency_record_count;
struct latency_record latency_record[LT_SAVECOUNT];
diff --git a/kernel/fork.c b/kernel/fork.c
index d75a528f7b21..d34c9c00baea 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2246,6 +2246,7 @@ static __latent_entropy struct task_struct *copy_process(
p->nr_dirtied = 0;
p->nr_dirtied_pause = 128 >> (PAGE_SHIFT - 10);
p->dirty_paused_when = 0;
+ p->bdp_nr_dirtied_pause = -1;
p->pdeath_signal = 0;
INIT_LIST_HEAD(&p->thread_group);
--
2.30.2
* [PATCH v1 12/14] mm: support write throttling for async buffered writes
From: Stefan Roesch @ 2022-02-14 17:44 UTC
To: io-uring, linux-fsdevel, linux-block, kernel-team; +Cc: shr
This change adds support for async write throttling in the function
balance_dirty_pages(). So far, if throttling was required, the code
waited synchronously for as long as the writes were throttled. This
change introduces asynchronous throttling. Instead of waiting in the
function balance_dirty_pages(), the timeout is set in the task_struct
field bdp_pause. Once the timeout has expired, the writes are no
longer throttled.
- Add a new parameter to the balance_dirty_pages() function
- This allows the caller to pass in the nowait flag
- When the nowait flag is specified, the code does not wait in
balance_dirty_pages(), but instead stores the wait expiration in the
new task_struct field bdp_pause.
- The function balance_dirty_pages_ratelimited() resets the new values
in the task_struct, once the timeout has expired
This change is required to support write throttling for the async
buffered writes. While the writes are throttled, io_uring still can make
progress with processing other requests.
Signed-off-by: Stefan Roesch <[email protected]>
---
include/linux/writeback.h | 1 +
mm/filemap.c | 2 +-
mm/page-writeback.c | 54 ++++++++++++++++++++++++++++-----------
3 files changed, 41 insertions(+), 16 deletions(-)
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index fec248ab1fec..48176a8047db 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -373,6 +373,7 @@ unsigned long wb_calc_thresh(struct bdi_writeback *wb, unsigned long thresh);
void wb_update_bandwidth(struct bdi_writeback *wb);
void balance_dirty_pages_ratelimited(struct address_space *mapping);
+void balance_dirty_pages_ratelimited_flags(struct address_space *mapping, bool is_async);
bool wb_over_bg_thresh(struct bdi_writeback *wb);
typedef int (*writepage_t)(struct page *page, struct writeback_control *wbc,
diff --git a/mm/filemap.c b/mm/filemap.c
index 19065ad95a4c..aa51ff1a0e8f 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3805,7 +3805,7 @@ static ssize_t do_generic_perform_write(struct file *file, struct iov_iter *i,
pos += status;
written += status;
- balance_dirty_pages_ratelimited(mapping);
+ balance_dirty_pages_ratelimited_flags(mapping, flags & AOP_FLAGS_NOWAIT);
} while (iov_iter_count(i));
return written ? written : status;
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 91d163f8d36b..767d0b997da5 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1558,7 +1558,7 @@ static inline void wb_dirty_limits(struct dirty_throttle_control *dtc)
* perform some writeout.
*/
static void balance_dirty_pages(struct bdi_writeback *wb,
- unsigned long pages_dirtied)
+ unsigned long pages_dirtied, bool is_async)
{
struct dirty_throttle_control gdtc_stor = { GDTC_INIT(wb) };
struct dirty_throttle_control mdtc_stor = { MDTC_INIT(wb, &gdtc_stor) };
@@ -1792,6 +1792,14 @@ static void balance_dirty_pages(struct bdi_writeback *wb,
period,
pause,
start_time);
+ if (is_async) {
+ if (current->bdp_nr_dirtied_pause == -1) {
+ current->bdp_pause = now + pause;
+ current->bdp_nr_dirtied_pause = nr_dirtied_pause;
+ }
+ break;
+ }
+
__set_current_state(TASK_KILLABLE);
wb->dirty_sleep = now;
io_schedule_timeout(pause);
@@ -1799,6 +1807,8 @@ static void balance_dirty_pages(struct bdi_writeback *wb,
current->dirty_paused_when = now + pause;
current->nr_dirtied = 0;
current->nr_dirtied_pause = nr_dirtied_pause;
+ current->bdp_nr_dirtied_pause = -1;
+ current->bdp_pause = 0;
/*
* This is typically equal to (dirty < thresh) and can also
@@ -1863,19 +1873,7 @@ static DEFINE_PER_CPU(int, bdp_ratelimits);
*/
DEFINE_PER_CPU(int, dirty_throttle_leaks) = 0;
-/**
- * balance_dirty_pages_ratelimited - balance dirty memory state
- * @mapping: address_space which was dirtied
- *
- * Processes which are dirtying memory should call in here once for each page
- * which was newly dirtied. The function will periodically check the system's
- * dirty state and will initiate writeback if needed.
- *
- * Once we're over the dirty memory limit we decrease the ratelimiting
- * by a lot, to prevent individual processes from overshooting the limit
- * by (ratelimit_pages) each.
- */
-void balance_dirty_pages_ratelimited(struct address_space *mapping)
+void balance_dirty_pages_ratelimited_flags(struct address_space *mapping, bool is_async)
{
struct inode *inode = mapping->host;
struct backing_dev_info *bdi = inode_to_bdi(inode);
@@ -1886,6 +1884,15 @@ void balance_dirty_pages_ratelimited(struct address_space *mapping)
if (!(bdi->capabilities & BDI_CAP_WRITEBACK))
return;
+ if (current->bdp_nr_dirtied_pause != -1 && time_after(jiffies, current->bdp_pause)) {
+ current->dirty_paused_when = current->bdp_pause;
+ current->nr_dirtied = 0;
+ current->nr_dirtied_pause = current->bdp_nr_dirtied_pause;
+
+ current->bdp_nr_dirtied_pause = -1;
+ current->bdp_pause = 0;
+ }
+
if (inode_cgwb_enabled(inode))
wb = wb_get_create_current(bdi, GFP_KERNEL);
if (!wb)
@@ -1924,10 +1931,27 @@ void balance_dirty_pages_ratelimited(struct address_space *mapping)
preempt_enable();
if (unlikely(current->nr_dirtied >= ratelimit))
- balance_dirty_pages(wb, current->nr_dirtied);
+ balance_dirty_pages(wb, current->nr_dirtied, is_async);
wb_put(wb);
}
+
+/**
+ * balance_dirty_pages_ratelimited - balance dirty memory state
+ * @mapping: address_space which was dirtied
+ *
+ * Processes which are dirtying memory should call in here once for each page
+ * which was newly dirtied. The function will periodically check the system's
+ * dirty state and will initiate writeback if needed.
+ *
+ * Once we're over the dirty memory limit we decrease the ratelimiting
+ * by a lot, to prevent individual processes from overshooting the limit
+ * by (ratelimit_pages) each.
+ */
+void balance_dirty_pages_ratelimited(struct address_space *mapping)
+{
+ balance_dirty_pages_ratelimited_flags(mapping, false);
+}
EXPORT_SYMBOL(balance_dirty_pages_ratelimited);
/**
--
2.30.2
* [PATCH v1 13/14] io_uring: support write throttling for async buffered writes
From: Stefan Roesch @ 2022-02-14 17:44 UTC
To: io-uring, linux-fsdevel, linux-block, kernel-team; +Cc: shr
This adds process-level throttling in the block layer for async
buffered writes to io_uring. In io_write() the code now checks if the write
needs to be throttled. If this is required, it adds the request to the
list of pending io requests and starts a timer. After the timer expires,
it submits the list of pending writes.
- Add new list called pending_ios for delayed writes (throttled writes)
to struct io_uring_task. The list is protected by the task_lock spin
lock.
- Add new timer to struct io_uring_task.
Signed-off-by: Stefan Roesch <[email protected]>
---
fs/io_uring.c | 98 +++++++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 91 insertions(+), 7 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 507f28b5b2bb..7bb77700ffac 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -461,6 +461,11 @@ struct io_ring_ctx {
};
};
+struct pending_list {
+ struct list_head list;
+ struct io_kiocb *req;
+};
+
struct io_uring_task {
/* submission side */
int cached_refs;
@@ -477,6 +482,9 @@ struct io_uring_task {
struct io_wq_work_list prior_task_list;
struct callback_head task_work;
bool task_running;
+
+ struct pending_list pending_ios;
+ struct timer_list timer;
};
/*
@@ -1134,13 +1142,14 @@ static void io_rsrc_put_work(struct work_struct *work);
static void io_req_task_queue(struct io_kiocb *req);
static void __io_submit_flush_completions(struct io_ring_ctx *ctx);
-static int io_req_prep_async(struct io_kiocb *req);
+static int io_req_prep_async(struct io_kiocb *req, bool force);
static int io_install_fixed_file(struct io_kiocb *req, struct file *file,
unsigned int issue_flags, u32 slot_index);
static int io_close_fixed(struct io_kiocb *req, unsigned int issue_flags);
static enum hrtimer_restart io_link_timeout_fn(struct hrtimer *timer);
+static void delayed_write_fn(struct timer_list *tmr);
static struct kmem_cache *req_cachep;
@@ -2462,6 +2471,31 @@ static void io_req_task_queue_reissue(struct io_kiocb *req)
io_req_task_work_add(req, false);
}
+static int io_req_task_queue_reissue_delayed(struct io_kiocb *req)
+{
+ struct io_uring_task *tctx = req->task->io_uring;
+ struct pending_list *pending = kmalloc(sizeof(struct pending_list), GFP_KERNEL);
+ bool empty;
+
+ if (!pending)
+ return -ENOMEM;
+ pending->req = req;
+
+ spin_lock_irq(&tctx->task_lock);
+ empty = list_empty(&tctx->pending_ios.list);
+ list_add_tail(&pending->list, &tctx->pending_ios.list);
+
+ if (empty) {
+ timer_setup(&tctx->timer, delayed_write_fn, 0);
+
+ tctx->timer.expires = current->bdp_pause;
+ add_timer(&tctx->timer);
+ }
+ spin_unlock_irq(&tctx->task_lock);
+
+ return 0;
+}
+
static inline void io_queue_next(struct io_kiocb *req)
{
struct io_kiocb *nxt = io_req_find_next(req);
@@ -2770,7 +2804,7 @@ static bool io_resubmit_prep(struct io_kiocb *req)
struct io_async_rw *rw = req->async_data;
if (!req_has_async_data(req))
- return !io_req_prep_async(req);
+ return !io_req_prep_async(req, false);
iov_iter_restore(&rw->s.iter, &rw->s.iter_state);
return true;
}
@@ -3751,6 +3785,38 @@ static int io_write_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
return io_prep_rw(req, sqe);
}
+static inline unsigned long write_delay(void)
+{
+ if (likely(current->bdp_nr_dirtied_pause == -1 ||
+ !time_before(jiffies, current->bdp_pause)))
+ return 0;
+
+ return current->bdp_pause;
+}
+
+static void delayed_write_fn(struct timer_list *tmr)
+{
+ struct io_uring_task *tctx = from_timer(tctx, tmr, timer);
+ struct list_head *curr;
+ struct list_head *next;
+ LIST_HEAD(pending_ios);
+
+ /* Move list to temporary list. */
+ spin_lock_irq(&tctx->task_lock);
+ list_splice_init(&tctx->pending_ios.list, &pending_ios);
+ spin_unlock_irq(&tctx->task_lock);
+
+ list_for_each_safe(curr, next, &pending_ios) {
+ struct pending_list *io;
+
+ io = list_entry(curr, struct pending_list, list);
+ io_req_task_queue_reissue(io->req);
+
+ list_del(curr);
+ kfree(io);
+ }
+}
+
static int io_write(struct io_kiocb *req, unsigned int issue_flags)
{
struct io_rw_state __s, *s = &__s;
@@ -3759,6 +3825,18 @@ static int io_write(struct io_kiocb *req, unsigned int issue_flags)
bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
ssize_t ret, ret2;
+ /* Write throttling active? */
+ if (unlikely(write_delay()) && !(kiocb->ki_flags & IOCB_DIRECT)) {
+ int ret = io_req_prep_async(req, true);
+
+ if (unlikely(ret))
+ io_req_complete_failed(req, ret);
+ else
+ ret = io_req_task_queue_reissue_delayed(req);
+
+ return ret;
+ }
+
if (!req_has_async_data(req)) {
ret = io_import_iovec(WRITE, req, &iovec, s, issue_flags);
if (unlikely(ret < 0))
@@ -6597,9 +6675,9 @@ static int io_req_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
return -EINVAL;
}
-static int io_req_prep_async(struct io_kiocb *req)
+static int io_req_prep_async(struct io_kiocb *req, bool force)
{
- if (!io_op_defs[req->opcode].needs_async_setup)
+ if (!force && !io_op_defs[req->opcode].needs_async_setup)
return 0;
if (WARN_ON_ONCE(req_has_async_data(req)))
return -EFAULT;
@@ -6609,6 +6687,10 @@ static int io_req_prep_async(struct io_kiocb *req)
switch (req->opcode) {
case IORING_OP_READV:
return io_rw_prep_async(req, READ);
+ case IORING_OP_WRITE:
+ if (!force)
+ break;
+ fallthrough;
case IORING_OP_WRITEV:
return io_rw_prep_async(req, WRITE);
case IORING_OP_SENDMSG:
@@ -6618,6 +6700,7 @@ static int io_req_prep_async(struct io_kiocb *req)
case IORING_OP_CONNECT:
return io_connect_prep_async(req);
}
+
printk_once(KERN_WARNING "io_uring: prep_async() bad opcode %d\n",
req->opcode);
return -EFAULT;
@@ -6651,7 +6734,7 @@ static __cold void io_drain_req(struct io_kiocb *req)
}
spin_unlock(&ctx->completion_lock);
- ret = io_req_prep_async(req);
+ ret = io_req_prep_async(req, false);
if (ret) {
fail:
io_req_complete_failed(req, ret);
@@ -7146,7 +7229,7 @@ static void io_queue_sqe_fallback(struct io_kiocb *req)
} else if (unlikely(req->ctx->drain_active)) {
io_drain_req(req);
} else {
- int ret = io_req_prep_async(req);
+ int ret = io_req_prep_async(req, false);
if (unlikely(ret))
io_req_complete_failed(req, ret);
@@ -7345,7 +7428,7 @@ static int io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
struct io_kiocb *head = link->head;
if (!(req->flags & REQ_F_FAIL)) {
- ret = io_req_prep_async(req);
+ ret = io_req_prep_async(req, false);
if (unlikely(ret)) {
req_fail_link_node(req, ret);
if (!(head->flags & REQ_F_FAIL))
@@ -8785,6 +8868,7 @@ static __cold int io_uring_alloc_task_context(struct task_struct *task,
INIT_WQ_LIST(&tctx->task_list);
INIT_WQ_LIST(&tctx->prior_task_list);
init_task_work(&tctx->task_work, tctx_task_work);
+ INIT_LIST_HEAD(&tctx->pending_ios.list);
return 0;
}
--
2.30.2
* [PATCH v1 14/14] block: enable async buffered writes for block devices.
From: Stefan Roesch @ 2022-02-14 17:44 UTC
To: io-uring, linux-fsdevel, linux-block, kernel-team; +Cc: shr
This introduces the flag FMODE_BUF_WASYNC. If devices support async
buffered writes, this flag can be set. It also enables async buffered
writes for block devices.
Signed-off-by: Stefan Roesch <[email protected]>
---
block/fops.c | 5 +----
fs/read_write.c | 3 ++-
include/linux/fs.h | 3 +++
3 files changed, 6 insertions(+), 5 deletions(-)
diff --git a/block/fops.c b/block/fops.c
index 4f59e0f5bf30..75b36f8b5e71 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -489,7 +489,7 @@ static int blkdev_open(struct inode *inode, struct file *filp)
* during an unstable branch.
*/
filp->f_flags |= O_LARGEFILE;
- filp->f_mode |= FMODE_NOWAIT | FMODE_BUF_RASYNC;
+ filp->f_mode |= FMODE_NOWAIT | FMODE_BUF_RASYNC | FMODE_BUF_WASYNC;
if (filp->f_flags & O_NDELAY)
filp->f_mode |= FMODE_NDELAY;
@@ -544,9 +544,6 @@ static ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from)
if (iocb->ki_pos >= size)
return -ENOSPC;
- if ((iocb->ki_flags & (IOCB_NOWAIT | IOCB_DIRECT)) == IOCB_NOWAIT)
- return -EOPNOTSUPP;
-
size -= iocb->ki_pos;
if (iov_iter_count(from) > size) {
shorted = iov_iter_count(from) - size;
diff --git a/fs/read_write.c b/fs/read_write.c
index 0074afa7ecb3..58233844a9d8 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1641,7 +1641,8 @@ ssize_t generic_write_checks(struct kiocb *iocb, struct iov_iter *from)
if (iocb->ki_flags & IOCB_APPEND)
iocb->ki_pos = i_size_read(inode);
- if ((iocb->ki_flags & IOCB_NOWAIT) && !(iocb->ki_flags & IOCB_DIRECT))
+ if ((iocb->ki_flags & IOCB_NOWAIT) &&
+ (!(iocb->ki_flags & IOCB_DIRECT) && !(file->f_mode & FMODE_BUF_WASYNC)))
return -EINVAL;
count = iov_iter_count(from);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e62dba6ed453..a19c7903e031 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -176,6 +176,9 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
/* File supports async buffered reads */
#define FMODE_BUF_RASYNC ((__force fmode_t)0x40000000)
+/* File supports async nowait buffered writes */
+#define FMODE_BUF_WASYNC ((__force fmode_t)0x80000000)
+
/*
* Attribute flags. These should be or-ed together to figure out what
* has been changed!
--
2.30.2
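For illustration, a sketch of what this enables from userspace: a
buffered (non-O_DIRECT) write with RWF_NOWAIT is no longer rejected with
-EINVAL on such devices. The helper name and device path below are
placeholders; this example is not part of the series.

#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>
#include <sys/uio.h>

static ssize_t try_nowait_buffered_write(const char *path, void *buf,
					 size_t len)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	int fd = open(path, O_WRONLY);	/* buffered: no O_DIRECT */
	ssize_t ret;

	if (fd < 0)
		return -1;
	/* Fails fast with EAGAIN instead of blocking when the fast path
	 * cannot make progress. */
	ret = pwritev2(fd, &iov, 1, 0, RWF_NOWAIT);
	close(fd);
	return ret;
}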
* Re: [PATCH v1 03/14] mm: add noio support in filemap_get_pages
From: Matthew Wilcox @ 2022-02-14 18:08 UTC
To: Stefan Roesch; +Cc: io-uring, linux-fsdevel, linux-block, kernel-team
On Mon, Feb 14, 2022 at 09:43:52AM -0800, Stefan Roesch wrote:
> This adds noio support for async buffered writes in filemap_get_pages.
> The idea is to handle the failure gracefully and return -EAGAIN if we
> can't get the memory quickly.
But it doesn't return -EAGAIN?
folio = filemap_alloc_folio(mapping_gfp_mask(mapping), 0);
if (!folio)
return -ENOMEM;
> Signed-off-by: Stefan Roesch <[email protected]>
> ---
> mm/filemap.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index d2fb817c0845..0ff4278c3961 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2591,10 +2591,15 @@ static int filemap_get_pages(struct kiocb *iocb, struct iov_iter *iter,
> filemap_get_read_batch(mapping, index, last_index, fbatch);
> }
> if (!folio_batch_count(fbatch)) {
> + unsigned int pflags;
> +
> if (iocb->ki_flags & (IOCB_NOWAIT | IOCB_WAITQ))
> - return -EAGAIN;
> + pflags = memalloc_noio_save();
> err = filemap_create_folio(filp, mapping,
> iocb->ki_pos >> PAGE_SHIFT, fbatch);
> + if (iocb->ki_flags & (IOCB_NOWAIT | IOCB_WAITQ))
> + memalloc_noio_restore(pflags);
> +
> if (err == AOP_TRUNCATED_PAGE)
> goto retry;
> return err;
I would also not expect the memalloc_noio_save/restore calls to be
here. Surely they should be at the top of the call chain where
IOCB_NOWAIT/IOCB_WAITQ are set?
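A hypothetical sketch of the placement suggested above (illustration
only; the wrapper name is invented here): scope the noio region at the
level where IOCB_NOWAIT/IOCB_WAITQ are known, rather than inside
filemap_get_pages().

static ssize_t filemap_read_noio(struct kiocb *iocb, struct iov_iter *iter,
				 ssize_t already_read)
{
	unsigned int pflags = memalloc_noio_save();
	ssize_t ret = filemap_read(iocb, iter, already_read);

	memalloc_noio_restore(pflags);
	return ret;
}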
* Re: [PATCH v1 07/14] fs: Add aop_flags parameter to create_page_buffers()
From: Matthew Wilcox @ 2022-02-14 18:14 UTC
To: Stefan Roesch; +Cc: io-uring, linux-fsdevel, linux-block, kernel-team
On Mon, Feb 14, 2022 at 09:43:56AM -0800, Stefan Roesch wrote:
> This adds the aop_flags parameter to the create_page_buffers function.
> When AOP_FLAGS_NOWAIT parameter is set, the atomic allocation flag is
> set. The AOP_FLAGS_NOWAIT flag is set, when async buffered writes are
> enabled.
Why is this better than passing in gfp flags directly?
> Signed-off-by: Stefan Roesch <[email protected]>
> ---
> fs/buffer.c | 28 +++++++++++++++++++++-------
> 1 file changed, 21 insertions(+), 7 deletions(-)
>
> diff --git a/fs/buffer.c b/fs/buffer.c
> index 948505480b43..5e3067173580 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -1682,13 +1682,27 @@ static inline int block_size_bits(unsigned int blocksize)
> return ilog2(blocksize);
> }
>
> -static struct buffer_head *create_page_buffers(struct page *page, struct inode *inode, unsigned int b_state)
> +static struct buffer_head *create_page_buffers(struct page *page,
> + struct inode *inode,
> + unsigned int b_state,
> + unsigned int aop_flags)
> {
> BUG_ON(!PageLocked(page));
>
> - if (!page_has_buffers(page))
> - create_empty_buffers(page, 1 << READ_ONCE(inode->i_blkbits),
> - b_state);
> + if (!page_has_buffers(page)) {
> + gfp_t gfp = GFP_NOFS | __GFP_ACCOUNT;
> +
> + if (aop_flags & AOP_FLAGS_NOWAIT) {
> + gfp |= GFP_ATOMIC | __GFP_NOWARN;
> + gfp &= ~__GFP_DIRECT_RECLAIM;
> + } else {
> + gfp |= __GFP_NOFAIL;
> + }
> +
> + __create_empty_buffers(page, 1 << READ_ONCE(inode->i_blkbits),
> + b_state, gfp);
> + }
> +
> return page_buffers(page);
> }
>
> @@ -1734,7 +1748,7 @@ int __block_write_full_page(struct inode *inode, struct page *page,
> int write_flags = wbc_to_write_flags(wbc);
>
> head = create_page_buffers(page, inode,
> - (1 << BH_Dirty)|(1 << BH_Uptodate));
> + (1 << BH_Dirty)|(1 << BH_Uptodate), 0);
>
> /*
> * Be very careful. We have no exclusion from __set_page_dirty_buffers
> @@ -2000,7 +2014,7 @@ int __block_write_begin_int(struct folio *folio, loff_t pos, unsigned len,
> BUG_ON(to > PAGE_SIZE);
> BUG_ON(from > to);
>
> - head = create_page_buffers(&folio->page, inode, 0);
> + head = create_page_buffers(&folio->page, inode, 0, flags);
> blocksize = head->b_size;
> bbits = block_size_bits(blocksize);
>
> @@ -2280,7 +2294,7 @@ int block_read_full_page(struct page *page, get_block_t *get_block)
> int nr, i;
> int fully_mapped = 1;
>
> - head = create_page_buffers(page, inode, 0);
> + head = create_page_buffers(page, inode, 0, 0);
> blocksize = head->b_size;
> bbits = block_size_bits(blocksize);
>
> --
> 2.30.2
>
* Re: [PATCH v1 01/14] fs: Add flags parameter to __block_write_begin_int
From: Matthew Wilcox @ 2022-02-14 19:02 UTC
To: Stefan Roesch; +Cc: io-uring, linux-fsdevel, linux-block, kernel-team
On Mon, Feb 14, 2022 at 09:43:50AM -0800, Stefan Roesch wrote:
> This adds a flags parameter to the __begin_write_begin_int() function.
> This allows to pass flags down the stack.
In general, I am not in favour of more AOP_FLAG uses. I'd prefer to
remove the two that we do have (reiserfs is the only user of
AOP_FLAG_CONT_EXPAND and AOP_FLAG_NOFS can usually be replaced by
passing in gfp_t flags).
* Re: [PATCH v1 02/14] mm: Introduce do_generic_perform_write
From: Matthew Wilcox @ 2022-02-14 19:06 UTC
To: Stefan Roesch; +Cc: io-uring, linux-fsdevel, linux-block, kernel-team
On Mon, Feb 14, 2022 at 09:43:51AM -0800, Stefan Roesch wrote:
> This splits off the do generic_perform_write() function, so an
> additional flags parameter can be specified. It uses the new flag
> parameter to support async buffered writes.
It would seem simpler to pass the iocb pointer to generic_perform_write()
(in place of the struct file pointer) instead of inventing a new flag.
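A rough sketch of that shape; the reuse of the patch-2 helper and its
signature below are assumptions, not code from the posted series:

ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
{
	/* the nowait decision can be read straight off the iocb */
	bool nowait = iocb->ki_flags & IOCB_NOWAIT;

	/* hypothetical forwarding to the helper split off in patch 2 */
	return do_generic_perform_write(iocb->ki_filp, i, iocb->ki_pos,
					nowait);
}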
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v1 04/14] mm: Add support for async buffered writes
2022-02-14 17:43 ` [PATCH v1 04/14] mm: Add support for async buffered writes Stefan Roesch
@ 2022-02-14 19:09 ` Matthew Wilcox
0 siblings, 0 replies; 34+ messages in thread
From: Matthew Wilcox @ 2022-02-14 19:09 UTC (permalink / raw)
To: Stefan Roesch; +Cc: io-uring, linux-fsdevel, linux-block, kernel-team
On Mon, Feb 14, 2022 at 09:43:53AM -0800, Stefan Roesch wrote:
> @@ -1986,6 +1987,10 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
> gfp |= __GFP_WRITE;
> if (fgp_flags & FGP_NOFS)
> gfp &= ~__GFP_FS;
> + if (fgp_flags & FGP_NOWAIT) {
> + gfp |= GFP_ATOMIC;
> + gfp &= ~__GFP_DIRECT_RECLAIM;
> + }
>
> folio = filemap_alloc_folio(gfp, 0);
No. FGP_NOWAIT means "Don't block on page lock". You can't redefine it
to mean "Use GFP_ATOMIC" without changing all the existing callers. (Do
not change the existing callers).
__filemap_get_folio() already takes a gfp_t. There's no need for this.
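A sketch of what using that existing gfp_t argument could look like at a
nowait call site; the wrapper below is illustrative, not from the series:

static struct folio *example_get_write_folio(struct kiocb *iocb,
					     struct address_space *mapping,
					     pgoff_t index)
{
	gfp_t gfp = mapping_gfp_mask(mapping);

	if (iocb->ki_flags & IOCB_NOWAIT) {
		/* fail fast: no direct reclaim, no allocation warnings */
		gfp |= GFP_NOWAIT | __GFP_NOWARN;
		gfp &= ~__GFP_DIRECT_RECLAIM;
	}

	/* returns NULL quickly instead of blocking when memory is tight */
	return __filemap_get_folio(mapping, index,
				   FGP_LOCK | FGP_WRITE | FGP_CREAT, gfp);
}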
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v1 03/14] mm: add noio support in filemap_get_pages
2022-02-14 17:43 ` [PATCH v1 03/14] mm: add noio support in filemap_get_pages Stefan Roesch
2022-02-14 18:08 ` Matthew Wilcox
@ 2022-02-14 19:33 ` Matthew Wilcox
2022-02-16 18:26 ` Stefan Roesch
1 sibling, 1 reply; 34+ messages in thread
From: Matthew Wilcox @ 2022-02-14 19:33 UTC (permalink / raw)
To: Stefan Roesch; +Cc: io-uring, linux-fsdevel, linux-block, kernel-team
On Mon, Feb 14, 2022 at 09:43:52AM -0800, Stefan Roesch wrote:
> This adds noio support for async buffered writes in filemap_get_pages.
> The idea is to handle the failure gracefully and return -EAGAIN if we
> can't get the memory quickly.
I don't understand why this helps you. filemap_get_pages() is for
reads, not writes.
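For context, the buffered write path takes its pages from ->write_begin
rather than from filemap_get_pages(), roughly via this helper (slightly
simplified from mm/folio-compat.c):

/* buffered writes get their pages here, not in filemap_get_pages() */
struct page *grab_cache_page_write_begin(struct address_space *mapping,
					 pgoff_t index, unsigned flags)
{
	return pagecache_get_page(mapping, index,
			FGP_LOCK | FGP_WRITE | FGP_CREAT | FGP_STABLE,
			mapping_gfp_mask(mapping));
}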
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v1 05/14] fs: split off __alloc_page_buffers function
2022-02-14 17:43 ` [PATCH v1 05/14] fs: split off __alloc_page_buffers function Stefan Roesch
@ 2022-02-14 22:46 ` kernel test robot
2022-02-14 23:27 ` kernel test robot
` (2 subsequent siblings)
3 siblings, 0 replies; 34+ messages in thread
From: kernel test robot @ 2022-02-14 22:46 UTC (permalink / raw)
To: Stefan Roesch, io-uring, linux-fsdevel, linux-block, kernel-team
Cc: kbuild-all, shr
Hi Stefan,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on f1baf68e1383f6ed93eb9cff2866d46562607a43]
url: https://github.com/0day-ci/linux/commits/Stefan-Roesch/Support-sync-buffered-writes-for-io-uring/20220215-014908
base: f1baf68e1383f6ed93eb9cff2866d46562607a43
config: i386-randconfig-a016-20220214 (https://download.01.org/0day-ci/archive/20220215/[email protected]/config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce (this is a W=1 build):
# https://github.com/0day-ci/linux/commit/e8b24c1ab111c127cbe1daaac3b607c626fb03a8
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Stefan-Roesch/Support-sync-buffered-writes-for-io-uring/20220215-014908
git checkout e8b24c1ab111c127cbe1daaac3b607c626fb03a8
# save the config file to linux build tree
mkdir build_dir
make W=1 O=build_dir ARCH=i386 SHELL=/bin/bash
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>
All warnings (new ones prefixed by >>):
>> fs/buffer.c:805:21: warning: no previous prototype for '__alloc_page_buffers' [-Wmissing-prototypes]
805 | struct buffer_head *__alloc_page_buffers(struct page *page, unsigned long size,
| ^~~~~~~~~~~~~~~~~~~~
vim +/__alloc_page_buffers +805 fs/buffer.c
804
> 805 struct buffer_head *__alloc_page_buffers(struct page *page, unsigned long size,
806 gfp_t gfp)
807 {
808 struct buffer_head *bh, *head;
809 long offset;
810 struct mem_cgroup *memcg, *old_memcg;
811
812 /* The page lock pins the memcg */
813 memcg = page_memcg(page);
814 old_memcg = set_active_memcg(memcg);
815
816 head = NULL;
817 offset = PAGE_SIZE;
818 while ((offset -= size) >= 0) {
819 bh = alloc_buffer_head(gfp);
820 if (!bh)
821 goto no_grow;
822
823 bh->b_this_page = head;
824 bh->b_blocknr = -1;
825 head = bh;
826
827 bh->b_size = size;
828
829 /* Link the buffer to its page */
830 set_bh_page(bh, page, offset);
831 }
832 out:
833 set_active_memcg(old_memcg);
834 return head;
835 /*
836 * In case anything failed, we just free everything we got.
837 */
838 no_grow:
839 if (head) {
840 do {
841 bh = head;
842 head = head->b_this_page;
843 free_buffer_head(bh);
844 } while (head);
845 }
846
847 goto out;
848 }
849
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v1 05/14] fs: split off __alloc_page_buffers function
2022-02-14 17:43 ` [PATCH v1 05/14] fs: split off __alloc_page_buffers function Stefan Roesch
2022-02-14 22:46 ` kernel test robot
@ 2022-02-14 23:27 ` kernel test robot
2022-02-15 2:40 ` [RFC PATCH] fs: __alloc_page_buffers() can be static kernel test robot
2022-02-15 2:41 ` [PATCH v1 05/14] fs: split off __alloc_page_buffers function kernel test robot
3 siblings, 0 replies; 34+ messages in thread
From: kernel test robot @ 2022-02-14 23:27 UTC (permalink / raw)
To: Stefan Roesch, io-uring, linux-fsdevel, linux-block, kernel-team
Cc: llvm, kbuild-all, shr
Hi Stefan,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on f1baf68e1383f6ed93eb9cff2866d46562607a43]
url: https://github.com/0day-ci/linux/commits/Stefan-Roesch/Support-sync-buffered-writes-for-io-uring/20220215-014908
base: f1baf68e1383f6ed93eb9cff2866d46562607a43
config: arm-s5pv210_defconfig (https://download.01.org/0day-ci/archive/20220215/[email protected]/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project ea071884b0cc7210b3cc5fe858f0e892a779a23b)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install arm cross compiling tool for clang build
# apt-get install binutils-arm-linux-gnueabi
# https://github.com/0day-ci/linux/commit/e8b24c1ab111c127cbe1daaac3b607c626fb03a8
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Stefan-Roesch/Support-sync-buffered-writes-for-io-uring/20220215-014908
git checkout e8b24c1ab111c127cbe1daaac3b607c626fb03a8
# save the config file to linux build tree
mkdir build_dir
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=arm SHELL=/bin/bash
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>
All warnings (new ones prefixed by >>):
>> fs/buffer.c:805:21: warning: no previous prototype for function '__alloc_page_buffers' [-Wmissing-prototypes]
struct buffer_head *__alloc_page_buffers(struct page *page, unsigned long size,
^
fs/buffer.c:805:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
struct buffer_head *__alloc_page_buffers(struct page *page, unsigned long size,
^
static
1 warning generated.
vim +/__alloc_page_buffers +805 fs/buffer.c
804
> 805 struct buffer_head *__alloc_page_buffers(struct page *page, unsigned long size,
806 gfp_t gfp)
807 {
808 struct buffer_head *bh, *head;
809 long offset;
810 struct mem_cgroup *memcg, *old_memcg;
811
812 /* The page lock pins the memcg */
813 memcg = page_memcg(page);
814 old_memcg = set_active_memcg(memcg);
815
816 head = NULL;
817 offset = PAGE_SIZE;
818 while ((offset -= size) >= 0) {
819 bh = alloc_buffer_head(gfp);
820 if (!bh)
821 goto no_grow;
822
823 bh->b_this_page = head;
824 bh->b_blocknr = -1;
825 head = bh;
826
827 bh->b_size = size;
828
829 /* Link the buffer to its page */
830 set_bh_page(bh, page, offset);
831 }
832 out:
833 set_active_memcg(old_memcg);
834 return head;
835 /*
836 * In case anything failed, we just free everything we got.
837 */
838 no_grow:
839 if (head) {
840 do {
841 bh = head;
842 head = head->b_this_page;
843 free_buffer_head(bh);
844 } while (head);
845 }
846
847 goto out;
848 }
849
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]
^ permalink raw reply [flat|nested] 34+ messages in thread
* [RFC PATCH] fs: __alloc_page_buffers() can be static
2022-02-14 17:43 ` [PATCH v1 05/14] fs: split off __alloc_page_buffers function Stefan Roesch
2022-02-14 22:46 ` kernel test robot
2022-02-14 23:27 ` kernel test robot
@ 2022-02-15 2:40 ` kernel test robot
2022-02-15 2:41 ` [PATCH v1 05/14] fs: split off __alloc_page_buffers function kernel test robot
3 siblings, 0 replies; 34+ messages in thread
From: kernel test robot @ 2022-02-15 2:40 UTC (permalink / raw)
To: Stefan Roesch, io-uring, linux-fsdevel, linux-block, kernel-team
Cc: kbuild-all, shr
fs/buffer.c:805:20: warning: symbol '__alloc_page_buffers' was not declared. Should it be static?
Reported-by: kernel test robot <[email protected]>
Signed-off-by: kernel test robot <[email protected]>
---
fs/buffer.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index a1986f95a39a0..19a4ab1f61686 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -802,7 +802,7 @@ int remove_inode_buffers(struct inode *inode)
return ret;
}
-struct buffer_head *__alloc_page_buffers(struct page *page, unsigned long size,
+static struct buffer_head *__alloc_page_buffers(struct page *page, unsigned long size,
gfp_t gfp)
{
struct buffer_head *bh, *head;
^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH v1 05/14] fs: split off __alloc_page_buffers function
2022-02-14 17:43 ` [PATCH v1 05/14] fs: split off __alloc_page_buffers function Stefan Roesch
` (2 preceding siblings ...)
2022-02-15 2:40 ` [RFC PATCH] fs: __alloc_page_buffers() can be static kernel test robot
@ 2022-02-15 2:41 ` kernel test robot
3 siblings, 0 replies; 34+ messages in thread
From: kernel test robot @ 2022-02-15 2:41 UTC (permalink / raw)
To: Stefan Roesch, io-uring, linux-fsdevel, linux-block, kernel-team
Cc: kbuild-all, shr
Hi Stefan,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on f1baf68e1383f6ed93eb9cff2866d46562607a43]
url: https://github.com/0day-ci/linux/commits/Stefan-Roesch/Support-sync-buffered-writes-for-io-uring/20220215-014908
base: f1baf68e1383f6ed93eb9cff2866d46562607a43
config: i386-randconfig-s002-20220214 (https://download.01.org/0day-ci/archive/20220215/[email protected]/config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce:
# apt-get install sparse
# sparse version: v0.6.4-dirty
# https://github.com/0day-ci/linux/commit/e8b24c1ab111c127cbe1daaac3b607c626fb03a8
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Stefan-Roesch/Support-sync-buffered-writes-for-io-uring/20220215-014908
git checkout e8b24c1ab111c127cbe1daaac3b607c626fb03a8
# save the config file to linux build tree
mkdir build_dir
make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=i386 SHELL=/bin/bash
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>
sparse warnings: (new ones prefixed by >>)
>> fs/buffer.c:805:20: sparse: sparse: symbol '__alloc_page_buffers' was not declared. Should it be static?
Please review and possibly fold the followup patch.
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v1 00/14] Support sync buffered writes for io-uring
2022-02-14 17:43 [PATCH v1 00/14] Support sync buffered writes for io-uring Stefan Roesch
` (13 preceding siblings ...)
2022-02-14 17:44 ` [PATCH v1 14/14] block: enable async buffered writes for block devices Stefan Roesch
@ 2022-02-15 3:59 ` Hao Xu
2022-02-15 17:38 ` Stefan Roesch
14 siblings, 1 reply; 34+ messages in thread
From: Hao Xu @ 2022-02-15 3:59 UTC (permalink / raw)
To: Stefan Roesch, io-uring, linux-fsdevel, linux-block, kernel-team
On 2022/2/15 1:43 AM, Stefan Roesch wrote:
> This patch series adds support for async buffered writes. Currently
> io-uring only supports buffered writes in the slow path, by processing
> them in the io workers. With this patch series it is now possible to
> support buffered writes in the fast path. To be able to use the fast
> path the required pages must be in the page cache or they can be loaded
> with noio. Otherwise they still get punted to the slow path.
>
> If a buffered write request requires more than one page, it is possible
> that only part of the request can use the fast path; the rest will be
> completed by the io workers.
>
> Support for async buffered writes:
> Patch 1: fs: Add flags parameter to __block_write_begin_int
> Add a flag parameter to the function __block_write_begin_int
> to allow specifying a nowait parameter.
>
> Patch 2: mm: Introduce do_generic_perform_write
> Introduce a new do_generic_perform_write function. The function
> is split off from the existing generic_perform_write() function.
> It allows specifying an additional flag parameter. This parameter
> is used to specify the nowait flag.
>
> Patch 3: mm: add noio support in filemap_get_pages
> This allows allocating pages with noio if a page for async
> buffered writes is not yet loaded in the page cache.
>
> Patch 4: mm: Add support for async buffered writes
> For async buffered writes allocate pages without blocking on the
> allocation.
>
> Patch 5: fs: split off __alloc_page_buffers function
> Split off __alloc_page_buffers() function with new gfp_t parameter.
>
> Patch 6: fs: split off __create_empty_buffers function
> Split off __create_empty_buffers() function with new gfp_t parameter.
>
> Patch 7: fs: Add aop_flags parameter to create_page_buffers()
> Add aop_flags to create_page_buffers() function. Use atomic allocation
> for async buffered writes.
>
> Patch 8: fs: add support for async buffered writes
> Return -EAGAIN instead of -ENOMEM for async buffered writes. This
> will cause the write request to be processed by an io worker.
>
> Patch 9: io_uring: add support for async buffered writes
> This enables the async buffered writes for block devices in io_uring.
> Buffered writes are enabled for blocks that are already in the page
> cache or can be acquired with noio.
>
> Patch 10: io_uring: Add tracepoint for short writes
>
> Support for write throttling of async buffered writes:
> Patch 11: sched: add new fields to task_struct
> Add two new fields to the task_struct. These fields store the
> deadline after which writes are no longer throttled.
>
> Patch 12: mm: support write throttling for async buffered writes
> This changes the balance_dirty_pages function to take an additional
> parameter. When nowait is specified the write throttling code no
> longer waits synchronously for the deadline to expire. Instead
> it sets the fields in task_struct. Once the deadline expires the
> fields are reset.
>
> Patch 13: io_uring: support write throttling for async buffered writes
> Adds support to io_uring for write throttling. When the writes
> are throttled, the write requests are added to the pending io list.
> Once the write throttling deadline expires, the writes are submitted.
>
> Enable async buffered write support
> Patch 14: fs: add flag to support async buffered writes
> This sets the flags that enable async buffered writes for block
> devices.
>
>
> Testing:
> This patch has been tested with xfstests and fio.
>
>
> Performance results:
> For fio the following results have been obtained with a queue depth of
> 1 and 4k block size (runtime 600 secs):
>
> sequential writes:
> without patch with patch
> throughput: 329 MiB/s 1032 MiB/s
> iops: 82k 264k
> slat (nsec) 2332 3340
> clat (nsec) 9017 60
>
> CPU util%: 37% 78%
>
>
>
> random writes:
> without patch with patch
> throughput: 307 MiB/s 909 MiB/s
> iops: 76k 227k
> slat (nsec) 2419 3780
> clat (nsec) 9934 59
>
> CPU util%: 57% 88%
>
> For an io depth of 1, the new patch improves throughput by close to 3
> times, and latency is also considerably reduced. To achieve the same
> or better performance with the existing code, an io depth of 4 is required.
>
> Especially for mixed workloads this is a considerable improvement.
>
>
>
>
> Stefan Roesch (14):
> fs: Add flags parameter to __block_write_begin_int
> mm: Introduce do_generic_perform_write
> mm: add noio support in filemap_get_pages
> mm: Add support for async buffered writes
> fs: split off __alloc_page_buffers function
> fs: split off __create_empty_buffers function
> fs: Add aop_flags parameter to create_page_buffers()
> fs: add support for async buffered writes
> io_uring: add support for async buffered writes
> io_uring: Add tracepoint for short writes
> sched: add new fields to task_struct
> mm: support write throttling for async buffered writes
> io_uring: support write throttling for async buffered writes
> block: enable async buffered writes for block devices.
>
> block/fops.c | 5 +-
> fs/buffer.c | 103 ++++++++++++++++---------
> fs/internal.h | 3 +-
> fs/io_uring.c | 130 +++++++++++++++++++++++++++++---
> fs/iomap/buffered-io.c | 4 +-
> fs/read_write.c | 3 +-
> include/linux/fs.h | 4 +
> include/linux/sched.h | 3 +
> include/linux/writeback.h | 1 +
> include/trace/events/io_uring.h | 25 ++++++
> kernel/fork.c | 1 +
> mm/filemap.c | 34 +++++++--
> mm/folio-compat.c | 4 +
> mm/page-writeback.c | 54 +++++++++----
> 14 files changed, 298 insertions(+), 76 deletions(-)
>
>
> base-commit: f1baf68e1383f6ed93eb9cff2866d46562607a43
>
Buffered reads and buffered writes are a little different: for the
latter, individual filesystems may still block internally due to
journal operations.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v1 00/14] Support sync buffered writes for io-uring
2022-02-15 3:59 ` [PATCH v1 00/14] Support sync buffered writes for io-uring Hao Xu
@ 2022-02-15 17:38 ` Stefan Roesch
0 siblings, 0 replies; 34+ messages in thread
From: Stefan Roesch @ 2022-02-15 17:38 UTC (permalink / raw)
To: Hao Xu, io-uring, linux-fsdevel, linux-block, kernel-team
On 2/14/22 7:59 PM, Hao Xu wrote:
> On 2022/2/15 1:43 AM, Stefan Roesch wrote:
>> This patch series adds support for async buffered writes. Currently
>> io-uring only supports buffered writes in the slow path, by processing
>> them in the io workers. With this patch series it is now possible to
>> support buffered writes in the fast path. To be able to use the fast
>> path the required pages must be in the page cache or they can be loaded
>> with noio. Otherwise they still get punted to the slow path.
>>
>> If a buffered write request requires more than one page, it is possible
>> that only part of the request can use the fast path; the rest will be
>> completed by the io workers.
>>
>> Support for async buffered writes:
>> Patch 1: fs: Add flags parameter to __block_write_begin_int
>> Add a flag parameter to the function __block_write_begin_int
>> to allow specifying a nowait parameter.
>> Patch 2: mm: Introduce do_generic_perform_write
>> Introduce a new do_generic_perform_write function. The function
>> is split off from the existing generic_perform_write() function.
>> It allows specifying an additional flag parameter. This parameter
>> is used to specify the nowait flag.
>> Patch 3: mm: add noio support in filemap_get_pages
>> This allows allocating pages with noio if a page for async
>> buffered writes is not yet loaded in the page cache.
>> Patch 4: mm: Add support for async buffered writes
>> For async buffered writes allocate pages without blocking on the
>> allocation.
>>
>> Patch 5: fs: split off __alloc_page_buffers function
>> Split off __alloc_page_buffers() function with new gfp_t parameter.
>>
>> Patch 6: fs: split off __create_empty_buffers function
>> Split off __create_empty_buffers() function with new gfp_t parameter.
>>
>> Patch 7: fs: Add aop_flags parameter to create_page_buffers()
>> Add aop_flags to create_page_buffers() function. Use atomic allocation
>> for async buffered writes.
>>
>> Patch 8: fs: add support for async buffered writes
>> Return -EAGAIN instead of -ENOMEM for async buffered writes. This
>> will cause the write request to be processed by an io worker.
>>
>> Patch 9: io_uring: add support for async buffered writes
>> This enables the async buffered writes for block devices in io_uring.
>> Buffered writes are enabled for blocks that are already in the page
>> cache or can be acquired with noio.
>>
>> Patch 10: io_uring: Add tracepoint for short writes
>>
>> Support for write throttling of async buffered writes:
>> Patch 11: sched: add new fields to task_struct
>> Add two new fields to the task_struct. These fields store the
>> deadline after which writes are no longer throttled.
>>
>> Patch 12: mm: support write throttling for async buffered writes
>> This changes the balance_dirty_pages function to take an additional
>> parameter. When nowait is specified the write throttling code no
>> longer waits synchronously for the deadline to expire. Instead
>> it sets the fields in task_struct. Once the deadline expires the
>> fields are reset.
>> Patch 13: io_uring: support write throttling for async buffered writes
>> Adds support to io_uring for write throttling. When the writes
>> are throttled, the write requests are added to the pending io list.
>> Once the write throttling deadline expires, the writes are submitted.
>> Enable async buffered write support
>> Patch 14: fs: add flag to support async buffered writes
>> This sets the flags that enable async buffered writes for block
>> devices.
>>
>>
>> Testing:
>> This patch has been tested with xfstests and fio.
>>
>>
>> Performance results:
>> For fio the following results have been obtained with a queue depth of
>> 1 and 4k block size (runtime 600 secs):
>>
>> sequential writes:
>> without patch with patch
>> throughput: 329 MiB/s 1032 MiB/s
>> iops: 82k 264k
>> slat (nsec) 2332 3340
>> clat (nsec) 9017 60
>> CPU util%: 37% 78%
>>
>>
>>
>> random writes:
>> without patch with patch
>> throughput: 307 MiB/s 909 MiB/s
>> iops: 76k 227k
>> slat (nsec) 2419 3780
>> clat (nsec) 9934 59
>>
>> CPU util%: 57% 88%
>>
>> For an io depth of 1, the new patch improves throughput by close to 3
>> times, and latency is also considerably reduced. To achieve the same
>> or better performance with the existing code, an io depth of 4 is required.
>>
>> Especially for mixed workloads this is a considerable improvement.
>>
>>
>>
>>
>> Stefan Roesch (14):
>> fs: Add flags parameter to __block_write_begin_int
>> mm: Introduce do_generic_perform_write
>> mm: add noio support in filemap_get_pages
>> mm: Add support for async buffered writes
>> fs: split off __alloc_page_buffers function
>> fs: split off __create_empty_buffers function
>> fs: Add aop_flags parameter to create_page_buffers()
>> fs: add support for async buffered writes
>> io_uring: add support for async buffered writes
>> io_uring: Add tracepoint for short writes
>> sched: add new fields to task_struct
>> mm: support write throttling for async buffered writes
>> io_uring: support write throttling for async buffered writes
>> block: enable async buffered writes for block devices.
>>
>> block/fops.c | 5 +-
>> fs/buffer.c | 103 ++++++++++++++++---------
>> fs/internal.h | 3 +-
>> fs/io_uring.c | 130 +++++++++++++++++++++++++++++---
>> fs/iomap/buffered-io.c | 4 +-
>> fs/read_write.c | 3 +-
>> include/linux/fs.h | 4 +
>> include/linux/sched.h | 3 +
>> include/linux/writeback.h | 1 +
>> include/trace/events/io_uring.h | 25 ++++++
>> kernel/fork.c | 1 +
>> mm/filemap.c | 34 +++++++--
>> mm/folio-compat.c | 4 +
>> mm/page-writeback.c | 54 +++++++++----
>> 14 files changed, 298 insertions(+), 76 deletions(-)
>>
>>
>> base-commit: f1baf68e1383f6ed93eb9cff2866d46562607a43
>>
> Buffered reads and buffered writes are a little different: for the
> latter, individual filesystems may still block internally due to
> journal operations.
>
This patch series only adds support for async buffered writes for block
devices, not filesystems.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v1 03/14] mm: add noio support in filemap_get_pages
2022-02-14 19:33 ` Matthew Wilcox
@ 2022-02-16 18:26 ` Stefan Roesch
0 siblings, 0 replies; 34+ messages in thread
From: Stefan Roesch @ 2022-02-16 18:26 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: io-uring, linux-fsdevel, linux-block, kernel-team
This patch will be removed from the next version.
On 2/14/22 11:33 AM, Matthew Wilcox wrote:
> On Mon, Feb 14, 2022 at 09:43:52AM -0800, Stefan Roesch wrote:
>> This adds noio support for async buffered writes in filemap_get_pages.
>> The idea is to handle the failure gracefully and return -EAGAIN if we
>> can't get the memory quickly.
>
> I don't understand why this helps you. filemap_get_pages() is for
> reads, not writes.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v1 03/14] mm: add noio support in filemap_get_pages
2022-02-14 18:08 ` Matthew Wilcox
@ 2022-02-16 18:27 ` Stefan Roesch
0 siblings, 0 replies; 34+ messages in thread
From: Stefan Roesch @ 2022-02-16 18:27 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: io-uring, linux-fsdevel, linux-block, kernel-team
On 2/14/22 10:08 AM, Matthew Wilcox wrote:
> On Mon, Feb 14, 2022 at 09:43:52AM -0800, Stefan Roesch wrote:
>> This adds noio support for async buffered writes in filemap_get_pages.
>> The idea is to handle the failure gracefully and return -EAGAIN if we
>> can't get the memory quickly.
>
> But it doesn't return -EAGAIN?
>
> folio = filemap_alloc_folio(mapping_gfp_mask(mapping), 0);
> if (!folio)
> return -ENOMEM;
>
>> Signed-off-by: Stefan Roesch <[email protected]>
>> ---
>> mm/filemap.c | 7 ++++++-
>> 1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/filemap.c b/mm/filemap.c
>> index d2fb817c0845..0ff4278c3961 100644
>> --- a/mm/filemap.c
>> +++ b/mm/filemap.c
>> @@ -2591,10 +2591,15 @@ static int filemap_get_pages(struct kiocb *iocb, struct iov_iter *iter,
>> filemap_get_read_batch(mapping, index, last_index, fbatch);
>> }
>> if (!folio_batch_count(fbatch)) {
>> + unsigned int pflags;
>> +
>> if (iocb->ki_flags & (IOCB_NOWAIT | IOCB_WAITQ))
>> - return -EAGAIN;
>> + pflags = memalloc_noio_save();
>> err = filemap_create_folio(filp, mapping,
>> iocb->ki_pos >> PAGE_SHIFT, fbatch);
>> + if (iocb->ki_flags & (IOCB_NOWAIT | IOCB_WAITQ))
>> + memalloc_noio_restore(pflags);
>> +
>> if (err == AOP_TRUNCATED_PAGE)
>> goto retry;
>> return err;
>
> I would also not expect the memalloc_noio_save/restore calls to be
> here. Surely they should be at the top of the call chain where
> IOCB_NOWAIT/IOCB_WAITQ are set?
This patch will be removed from the next version of the patch series.
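For reference, the placement being suggested would sit at the top of the
call chain, roughly like this (illustrative only, and moot once the patch
is dropped):

static ssize_t example_noio_read(struct file *file, struct kiocb *iocb,
				 struct iov_iter *iter)
{
	ssize_t ret;

	if (iocb->ki_flags & (IOCB_NOWAIT | IOCB_WAITQ)) {
		unsigned int pflags = memalloc_noio_save();

		/* everything below now implicitly allocates with noio */
		ret = call_read_iter(file, iocb, iter);
		memalloc_noio_restore(pflags);
	} else {
		ret = call_read_iter(file, iocb, iter);
	}
	return ret;
}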
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v1 07/14] fs: Add aop_flags parameter to create_page_buffers()
2022-02-14 18:14 ` Matthew Wilcox
@ 2022-02-16 18:30 ` Stefan Roesch
2022-02-16 18:34 ` Matthew Wilcox
0 siblings, 1 reply; 34+ messages in thread
From: Stefan Roesch @ 2022-02-16 18:30 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: io-uring, linux-fsdevel, linux-block, kernel-team
On 2/14/22 10:14 AM, Matthew Wilcox wrote:
> On Mon, Feb 14, 2022 at 09:43:56AM -0800, Stefan Roesch wrote:
>> This adds an aop_flags parameter to the create_page_buffers() function.
>> When the AOP_FLAGS_NOWAIT flag is set, the atomic allocation flag is
>> used. AOP_FLAGS_NOWAIT is set when async buffered writes are
>> enabled.
>
> Why is this better than passing in gfp flags directly?
>
I don't think that gfp flags are a great fit here. We only want to pass in
a nowait flag, and that does not map nicely to a gfp flag.
Instead of passing in a flag parameter, we could pass in a bool parameter;
however, that has its limitations, as it can't be extended in the future.
>> Signed-off-by: Stefan Roesch <[email protected]>
>> ---
>> fs/buffer.c | 28 +++++++++++++++++++++-------
>> 1 file changed, 21 insertions(+), 7 deletions(-)
>>
>> diff --git a/fs/buffer.c b/fs/buffer.c
>> index 948505480b43..5e3067173580 100644
>> --- a/fs/buffer.c
>> +++ b/fs/buffer.c
>> @@ -1682,13 +1682,27 @@ static inline int block_size_bits(unsigned int blocksize)
>> return ilog2(blocksize);
>> }
>>
>> -static struct buffer_head *create_page_buffers(struct page *page, struct inode *inode, unsigned int b_state)
>> +static struct buffer_head *create_page_buffers(struct page *page,
>> + struct inode *inode,
>> + unsigned int b_state,
>> + unsigned int aop_flags)
>> {
>> BUG_ON(!PageLocked(page));
>>
>> - if (!page_has_buffers(page))
>> - create_empty_buffers(page, 1 << READ_ONCE(inode->i_blkbits),
>> - b_state);
>> + if (!page_has_buffers(page)) {
>> + gfp_t gfp = GFP_NOFS | __GFP_ACCOUNT;
>> +
>> + if (aop_flags & AOP_FLAGS_NOWAIT) {
>> + gfp |= GFP_ATOMIC | __GFP_NOWARN;
>> + gfp &= ~__GFP_DIRECT_RECLAIM;
>> + } else {
>> + gfp |= __GFP_NOFAIL;
>> + }
>> +
>> + __create_empty_buffers(page, 1 << READ_ONCE(inode->i_blkbits),
>> + b_state, gfp);
>> + }
>> +
>> return page_buffers(page);
>> }
>>
>> @@ -1734,7 +1748,7 @@ int __block_write_full_page(struct inode *inode, struct page *page,
>> int write_flags = wbc_to_write_flags(wbc);
>>
>> head = create_page_buffers(page, inode,
>> - (1 << BH_Dirty)|(1 << BH_Uptodate));
>> + (1 << BH_Dirty)|(1 << BH_Uptodate), 0);
>>
>> /*
>> * Be very careful. We have no exclusion from __set_page_dirty_buffers
>> @@ -2000,7 +2014,7 @@ int __block_write_begin_int(struct folio *folio, loff_t pos, unsigned len,
>> BUG_ON(to > PAGE_SIZE);
>> BUG_ON(from > to);
>>
>> - head = create_page_buffers(&folio->page, inode, 0);
>> + head = create_page_buffers(&folio->page, inode, 0, flags);
>> blocksize = head->b_size;
>> bbits = block_size_bits(blocksize);
>>
>> @@ -2280,7 +2294,7 @@ int block_read_full_page(struct page *page, get_block_t *get_block)
>> int nr, i;
>> int fully_mapped = 1;
>>
>> - head = create_page_buffers(page, inode, 0);
>> + head = create_page_buffers(page, inode, 0, 0);
>> blocksize = head->b_size;
>> bbits = block_size_bits(blocksize);
>>
>> --
>> 2.30.2
>>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v1 01/14] fs: Add flags parameter to __block_write_begin_int
2022-02-14 19:02 ` Matthew Wilcox
@ 2022-02-16 18:31 ` Stefan Roesch
2022-02-16 18:35 ` Matthew Wilcox
0 siblings, 1 reply; 34+ messages in thread
From: Stefan Roesch @ 2022-02-16 18:31 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: io-uring, linux-fsdevel, linux-block, kernel-team
I don't think that gfp flags are a great fit here. We only want to pass in
a nowait flag, and that does not map nicely to a gfp flag.
Instead of passing in a flag parameter, we could pass in a bool parameter;
however, that has its limitations, as it can't be extended in the future.
On 2/14/22 11:02 AM, Matthew Wilcox wrote:
> On Mon, Feb 14, 2022 at 09:43:50AM -0800, Stefan Roesch wrote:
> >> This adds a flags parameter to the __block_write_begin_int() function.
> >> This allows flags to be passed down the stack.
>
> In general, I am not in favour of more AOP_FLAG uses. I'd prefer to
> remove the two that we do have (reiserfs is the only user of
> AOP_FLAG_CONT_EXPAND and AOP_FLAG_NOFS can usually be replaced by
> passing in gfp_t flags).
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v1 07/14] fs: Add aop_flags parameter to create_page_buffers()
2022-02-16 18:30 ` Stefan Roesch
@ 2022-02-16 18:34 ` Matthew Wilcox
2022-02-16 18:35 ` Stefan Roesch
0 siblings, 1 reply; 34+ messages in thread
From: Matthew Wilcox @ 2022-02-16 18:34 UTC (permalink / raw)
To: Stefan Roesch; +Cc: io-uring, linux-fsdevel, linux-block, kernel-team
On Wed, Feb 16, 2022 at 10:30:33AM -0800, Stefan Roesch wrote:
> On 2/14/22 10:14 AM, Matthew Wilcox wrote:
> > On Mon, Feb 14, 2022 at 09:43:56AM -0800, Stefan Roesch wrote:
> >> This adds an aop_flags parameter to the create_page_buffers() function.
> >> When the AOP_FLAGS_NOWAIT flag is set, the atomic allocation flag is
> >> used. AOP_FLAGS_NOWAIT is set when async buffered writes are
> >> enabled.
> >
> > Why is this better than passing in gfp flags directly?
> >
>
> I don't think that gfp flags are a great fit here. We only want to pass in
> a nowait flag, and that does not map nicely to a gfp flag.
... what? The only thing you do with this flag is use it to choose
some gfp flags. Pass those gfp flags in directly.
> >> + gfp_t gfp = GFP_NOFS | __GFP_ACCOUNT;
> >> +
> >> + if (aop_flags & AOP_FLAGS_NOWAIT) {
> >> + gfp |= GFP_ATOMIC | __GFP_NOWARN;
> >> + gfp &= ~__GFP_DIRECT_RECLAIM;
> >> + } else {
> >> + gfp |= __GFP_NOFAIL;
> >> + }
The flags you've chosen here are also bonkers, but I'm not sure that
it's worth explaining to you why if you're this resistant to making
obvious corrections to your patches.
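One likely reading of the objection, which the thread does not spell
out: GFP_ATOMIC may consume emergency memory reserves and is meant for
callers with no fallback, while a nowait buffered write has one -- it
can be punted to an io worker. Illustratively, such a caller would want:

/*
 * GFP_NOWAIT fails fast: it neither blocks nor touches the emergency
 * reserves that GFP_ATOMIC is allowed to dip into.
 */
gfp_t gfp = GFP_NOWAIT | __GFP_NOWARN | __GFP_ACCOUNT;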
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v1 01/14] fs: Add flags parameter to __block_write_begin_int
2022-02-16 18:31 ` Stefan Roesch
@ 2022-02-16 18:35 ` Matthew Wilcox
0 siblings, 0 replies; 34+ messages in thread
From: Matthew Wilcox @ 2022-02-16 18:35 UTC (permalink / raw)
To: Stefan Roesch; +Cc: io-uring, linux-fsdevel, linux-block, kernel-team
On Wed, Feb 16, 2022 at 10:31:18AM -0800, Stefan Roesch wrote:
> I don't think that gfp flags are a great fit here. We only want to pass in
> a nowait flag, and that does not map nicely to a gfp flag.
>
> Instead of passing in a flag parameter, we could pass in a bool parameter;
> however, that has its limitations, as it can't be extended in the future.
If you're just going to copy and paste this inanity into every reply
to me, let's just say hard NAK to every patch from you from now on?
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v1 07/14] fs: Add aop_flags parameter to create_page_buffers()
2022-02-16 18:34 ` Matthew Wilcox
@ 2022-02-16 18:35 ` Stefan Roesch
0 siblings, 0 replies; 34+ messages in thread
From: Stefan Roesch @ 2022-02-16 18:35 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: io-uring, linux-fsdevel, linux-block, kernel-team
On 2/16/22 10:34 AM, Matthew Wilcox wrote:
> On Wed, Feb 16, 2022 at 10:30:33AM -0800, Stefan Roesch wrote:
>> On 2/14/22 10:14 AM, Matthew Wilcox wrote:
>>> On Mon, Feb 14, 2022 at 09:43:56AM -0800, Stefan Roesch wrote:
> >>>> This adds an aop_flags parameter to the create_page_buffers() function.
> >>>> When the AOP_FLAGS_NOWAIT flag is set, the atomic allocation flag is
> >>>> used. AOP_FLAGS_NOWAIT is set when async buffered writes are
> >>>> enabled.
>>>
>>> Why is this better than passing in gfp flags directly?
>>>
>>
>> I don't think that gfp flags are a great fit here. We only want to pass in
> >> a nowait flag, and that does not map nicely to a gfp flag.
>
> ... what? The only thing you do with this flag is use it to choose
> some gfp flags. Pass those gfp flags in directly.
>
>>>> + gfp_t gfp = GFP_NOFS | __GFP_ACCOUNT;
>>>> +
>>>> + if (aop_flags & AOP_FLAGS_NOWAIT) {
>>>> + gfp |= GFP_ATOMIC | __GFP_NOWARN;
>>>> + gfp &= ~__GFP_DIRECT_RECLAIM;
>>>> + } else {
>>>> + gfp |= __GFP_NOFAIL;
>>>> + }
>
> The flags you've chosen here are also bonkers, but I'm not sure that
> it's worth explaining to you why if you're this resistant to making
> obvious corrections to your patches.
Sorry, my comment was for patch 1, not patch 7.
^ permalink raw reply [flat|nested] 34+ messages in thread
end of thread, newest: 2022-02-16 18:35 UTC
Thread overview: 34+ messages
2022-02-14 17:43 [PATCH v1 00/14] Support sync buffered writes for io-uring Stefan Roesch
2022-02-14 17:43 ` [PATCH v1 01/14] fs: Add flags parameter to __block_write_begin_int Stefan Roesch
2022-02-14 19:02 ` Matthew Wilcox
2022-02-16 18:31 ` Stefan Roesch
2022-02-16 18:35 ` Matthew Wilcox
2022-02-14 17:43 ` [PATCH v1 02/14] mm: Introduce do_generic_perform_write Stefan Roesch
2022-02-14 19:06 ` Matthew Wilcox
2022-02-14 17:43 ` [PATCH v1 03/14] mm: add noio support in filemap_get_pages Stefan Roesch
2022-02-14 18:08 ` Matthew Wilcox
2022-02-16 18:27 ` Stefan Roesch
2022-02-14 19:33 ` Matthew Wilcox
2022-02-16 18:26 ` Stefan Roesch
2022-02-14 17:43 ` [PATCH v1 04/14] mm: Add support for async buffered writes Stefan Roesch
2022-02-14 19:09 ` Matthew Wilcox
2022-02-14 17:43 ` [PATCH v1 05/14] fs: split off __alloc_page_buffers function Stefan Roesch
2022-02-14 22:46 ` kernel test robot
2022-02-14 23:27 ` kernel test robot
2022-02-15 2:40 ` [RFC PATCH] fs: __alloc_page_buffers() can be static kernel test robot
2022-02-15 2:41 ` [PATCH v1 05/14] fs: split off __alloc_page_buffers function kernel test robot
2022-02-14 17:43 ` [PATCH v1 06/14] fs: split off __create_empty_buffers function Stefan Roesch
2022-02-14 17:43 ` [PATCH v1 07/14] fs: Add aop_flags parameter to create_page_buffers() Stefan Roesch
2022-02-14 18:14 ` Matthew Wilcox
2022-02-16 18:30 ` Stefan Roesch
2022-02-16 18:34 ` Matthew Wilcox
2022-02-16 18:35 ` Stefan Roesch
2022-02-14 17:43 ` [PATCH v1 08/14] fs: add support for async buffered writes Stefan Roesch
2022-02-14 17:43 ` [PATCH v1 09/14] io_uring: " Stefan Roesch
2022-02-14 17:43 ` [PATCH v1 10/14] io_uring: Add tracepoint for short writes Stefan Roesch
2022-02-14 17:44 ` [PATCH v1 11/14] sched: add new fields to task_struct Stefan Roesch
2022-02-14 17:44 ` [PATCH v1 12/14] mm: support write throttling for async buffered writes Stefan Roesch
2022-02-14 17:44 ` [PATCH v1 13/14] io_uring: " Stefan Roesch
2022-02-14 17:44 ` [PATCH v1 14/14] block: enable async buffered writes for block devices Stefan Roesch
2022-02-15 3:59 ` [PATCH v1 00/14] Support sync buffered writes for io-uring Hao Xu
2022-02-15 17:38 ` Stefan Roesch