* [PATCH v3 1/7] splice: don't generate zero-len segement bvecs
2021-01-09 16:02 [PATCH v3 0/7] no-copy bvec Pavel Begunkov
@ 2021-01-09 16:02 ` Pavel Begunkov
2021-01-11 2:48 ` Ming Lei
2021-01-09 16:02 ` [PATCH v3 2/7] bvec/iter: disallow zero-length segment bvecs Pavel Begunkov
` (6 subsequent siblings)
7 siblings, 1 reply; 16+ messages in thread
From: Pavel Begunkov @ 2021-01-09 16:02 UTC (permalink / raw)
To: linux-block
Cc: Jens Axboe, Christoph Hellwig, Matthew Wilcox, Ming Lei,
Johannes Weiner, Alexander Viro, Darrick J . Wong,
Martin K . Petersen, Jonathan Corbet, linux-xfs, linux-fsdevel,
io-uring, linux-kernel, target-devel, linux-scsi, linux-doc,
Christoph Hellwig
iter_file_splice_write() may spawn bvec segments with zero-length. In
preparation for prohibiting them, filter out by hand at splice level.
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/splice.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/fs/splice.c b/fs/splice.c
index 866d5c2367b2..474fb8b5562a 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -662,12 +662,14 @@ iter_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
/* build the vector */
left = sd.total_len;
- for (n = 0; !pipe_empty(head, tail) && left && n < nbufs; tail++, n++) {
+ for (n = 0; !pipe_empty(head, tail) && left && n < nbufs; tail++) {
struct pipe_buffer *buf = &pipe->bufs[tail & mask];
size_t this_len = buf->len;
- if (this_len > left)
- this_len = left;
+ /* zero-length bvecs are not supported, skip them */
+ if (!this_len)
+ continue;
+ this_len = min(this_len, left);
ret = pipe_buf_confirm(pipe, buf);
if (unlikely(ret)) {
@@ -680,6 +682,7 @@ iter_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
array[n].bv_len = this_len;
array[n].bv_offset = buf->offset;
left -= this_len;
+ n++;
}
iov_iter_bvec(&from, WRITE, array, n, sd.total_len - left);
--
2.24.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v3 1/7] splice: don't generate zero-len segement bvecs
2021-01-09 16:02 ` [PATCH v3 1/7] splice: don't generate zero-len segement bvecs Pavel Begunkov
@ 2021-01-11 2:48 ` Ming Lei
0 siblings, 0 replies; 16+ messages in thread
From: Ming Lei @ 2021-01-11 2:48 UTC (permalink / raw)
To: Pavel Begunkov
Cc: linux-block, Jens Axboe, Christoph Hellwig, Matthew Wilcox,
Johannes Weiner, Alexander Viro, Darrick J . Wong,
Martin K . Petersen, Jonathan Corbet, linux-xfs, linux-fsdevel,
io-uring, linux-kernel, target-devel, linux-scsi, linux-doc,
Christoph Hellwig
On Sat, Jan 09, 2021 at 04:02:57PM +0000, Pavel Begunkov wrote:
> iter_file_splice_write() may spawn bvec segments with zero-length. In
> preparation for prohibiting them, filter out by hand at splice level.
>
> Reviewed-by: Christoph Hellwig <[email protected]>
> Signed-off-by: Pavel Begunkov <[email protected]>
> ---
> fs/splice.c | 9 ++++++---
> 1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/fs/splice.c b/fs/splice.c
> index 866d5c2367b2..474fb8b5562a 100644
> --- a/fs/splice.c
> +++ b/fs/splice.c
> @@ -662,12 +662,14 @@ iter_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
>
> /* build the vector */
> left = sd.total_len;
> - for (n = 0; !pipe_empty(head, tail) && left && n < nbufs; tail++, n++) {
> + for (n = 0; !pipe_empty(head, tail) && left && n < nbufs; tail++) {
> struct pipe_buffer *buf = &pipe->bufs[tail & mask];
> size_t this_len = buf->len;
>
> - if (this_len > left)
> - this_len = left;
> + /* zero-length bvecs are not supported, skip them */
> + if (!this_len)
> + continue;
> + this_len = min(this_len, left);
>
> ret = pipe_buf_confirm(pipe, buf);
> if (unlikely(ret)) {
> @@ -680,6 +682,7 @@ iter_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
> array[n].bv_len = this_len;
> array[n].bv_offset = buf->offset;
> left -= this_len;
> + n++;
> }
>
> iov_iter_bvec(&from, WRITE, array, n, sd.total_len - left);
> --
> 2.24.0
>
Reviewed-by: Ming Lei <[email protected]>
--
Ming
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v3 2/7] bvec/iter: disallow zero-length segment bvecs
2021-01-09 16:02 [PATCH v3 0/7] no-copy bvec Pavel Begunkov
2021-01-09 16:02 ` [PATCH v3 1/7] splice: don't generate zero-len segement bvecs Pavel Begunkov
@ 2021-01-09 16:02 ` Pavel Begunkov
2021-01-11 2:49 ` Ming Lei
2021-01-09 16:02 ` [PATCH v3 3/7] block/psi: remove PSI annotations from direct IO Pavel Begunkov
` (5 subsequent siblings)
7 siblings, 1 reply; 16+ messages in thread
From: Pavel Begunkov @ 2021-01-09 16:02 UTC (permalink / raw)
To: linux-block
Cc: Jens Axboe, Christoph Hellwig, Matthew Wilcox, Ming Lei,
Johannes Weiner, Alexander Viro, Darrick J . Wong,
Martin K . Petersen, Jonathan Corbet, linux-xfs, linux-fsdevel,
io-uring, linux-kernel, target-devel, linux-scsi, linux-doc,
Christoph Hellwig
zero-length bvec segments are allowed in general, but not handled by bio
and down the block layer so filtered out. This inconsistency may be
confusing and prevent from optimisations. As zero-length segments are
useless and places that were generating them are patched, declare them
not allowed.
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
---
Documentation/block/biovecs.rst | 2 ++
Documentation/filesystems/porting.rst | 7 +++++++
lib/iov_iter.c | 2 --
3 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/Documentation/block/biovecs.rst b/Documentation/block/biovecs.rst
index 36771a131b56..ddb867e0185b 100644
--- a/Documentation/block/biovecs.rst
+++ b/Documentation/block/biovecs.rst
@@ -40,6 +40,8 @@ normal code doesn't have to deal with bi_bvec_done.
There is a lower level advance function - bvec_iter_advance() - which takes
a pointer to a biovec, not a bio; this is used by the bio integrity code.
+As of 5.12 bvec segments with zero bv_len are not supported.
+
What's all this get us?
=======================
diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index 867036aa90b8..c722d94f29ea 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -865,3 +865,10 @@ no matter what. Everything is handled by the caller.
clone_private_mount() returns a longterm mount now, so the proper destructor of
its result is kern_unmount() or kern_unmount_array().
+
+---
+
+**mandatory**
+
+zero-length bvec segments are disallowed, they must be filtered out before
+passed on to an iterator.
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 1635111c5bd2..7de304269641 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -72,8 +72,6 @@
__start.bi_bvec_done = skip; \
__start.bi_idx = 0; \
for_each_bvec(__v, i->bvec, __bi, __start) { \
- if (!__v.bv_len) \
- continue; \
(void)(STEP); \
} \
}
--
2.24.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v3 2/7] bvec/iter: disallow zero-length segment bvecs
2021-01-09 16:02 ` [PATCH v3 2/7] bvec/iter: disallow zero-length segment bvecs Pavel Begunkov
@ 2021-01-11 2:49 ` Ming Lei
0 siblings, 0 replies; 16+ messages in thread
From: Ming Lei @ 2021-01-11 2:49 UTC (permalink / raw)
To: Pavel Begunkov
Cc: linux-block, Jens Axboe, Christoph Hellwig, Matthew Wilcox,
Johannes Weiner, Alexander Viro, Darrick J . Wong,
Martin K . Petersen, Jonathan Corbet, linux-xfs, linux-fsdevel,
io-uring, linux-kernel, target-devel, linux-scsi, linux-doc,
Christoph Hellwig
On Sat, Jan 09, 2021 at 04:02:58PM +0000, Pavel Begunkov wrote:
> zero-length bvec segments are allowed in general, but not handled by bio
> and down the block layer so filtered out. This inconsistency may be
> confusing and prevent from optimisations. As zero-length segments are
> useless and places that were generating them are patched, declare them
> not allowed.
>
> Reviewed-by: Christoph Hellwig <[email protected]>
> Signed-off-by: Pavel Begunkov <[email protected]>
> ---
> Documentation/block/biovecs.rst | 2 ++
> Documentation/filesystems/porting.rst | 7 +++++++
> lib/iov_iter.c | 2 --
> 3 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/block/biovecs.rst b/Documentation/block/biovecs.rst
> index 36771a131b56..ddb867e0185b 100644
> --- a/Documentation/block/biovecs.rst
> +++ b/Documentation/block/biovecs.rst
> @@ -40,6 +40,8 @@ normal code doesn't have to deal with bi_bvec_done.
> There is a lower level advance function - bvec_iter_advance() - which takes
> a pointer to a biovec, not a bio; this is used by the bio integrity code.
>
> +As of 5.12 bvec segments with zero bv_len are not supported.
> +
> What's all this get us?
> =======================
>
> diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
> index 867036aa90b8..c722d94f29ea 100644
> --- a/Documentation/filesystems/porting.rst
> +++ b/Documentation/filesystems/porting.rst
> @@ -865,3 +865,10 @@ no matter what. Everything is handled by the caller.
>
> clone_private_mount() returns a longterm mount now, so the proper destructor of
> its result is kern_unmount() or kern_unmount_array().
> +
> +---
> +
> +**mandatory**
> +
> +zero-length bvec segments are disallowed, they must be filtered out before
> +passed on to an iterator.
> diff --git a/lib/iov_iter.c b/lib/iov_iter.c
> index 1635111c5bd2..7de304269641 100644
> --- a/lib/iov_iter.c
> +++ b/lib/iov_iter.c
> @@ -72,8 +72,6 @@
> __start.bi_bvec_done = skip; \
> __start.bi_idx = 0; \
> for_each_bvec(__v, i->bvec, __bi, __start) { \
> - if (!__v.bv_len) \
> - continue; \
> (void)(STEP); \
> } \
> }
> --
> 2.24.0
>
Reviewed-by: Ming Lei <[email protected]>
--
Ming
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v3 3/7] block/psi: remove PSI annotations from direct IO
2021-01-09 16:02 [PATCH v3 0/7] no-copy bvec Pavel Begunkov
2021-01-09 16:02 ` [PATCH v3 1/7] splice: don't generate zero-len segement bvecs Pavel Begunkov
2021-01-09 16:02 ` [PATCH v3 2/7] bvec/iter: disallow zero-length segment bvecs Pavel Begunkov
@ 2021-01-09 16:02 ` Pavel Begunkov
2021-01-11 2:52 ` Ming Lei
2021-01-09 16:03 ` [PATCH v3 4/7] target/file: allocate the bvec array as part of struct target_core_file_cmd Pavel Begunkov
` (4 subsequent siblings)
7 siblings, 1 reply; 16+ messages in thread
From: Pavel Begunkov @ 2021-01-09 16:02 UTC (permalink / raw)
To: linux-block
Cc: Jens Axboe, Christoph Hellwig, Matthew Wilcox, Ming Lei,
Johannes Weiner, Alexander Viro, Darrick J . Wong,
Martin K . Petersen, Jonathan Corbet, linux-xfs, linux-fsdevel,
io-uring, linux-kernel, target-devel, linux-scsi, linux-doc,
Christoph Hellwig
Direct IO does not operate on the current working set of pages managed
by the kernel, so it should not be accounted as memory stall to PSI
infrastructure.
The block layer and iomap direct IO use bio_iov_iter_get_pages()
to build bios, and they are the only users of it, so to avoid PSI
tracking for them clear out BIO_WORKINGSET flag. Do same for
dio_bio_submit() because fs/direct_io constructs bios by hand directly
calling bio_add_page().
Reported-by: Christoph Hellwig <[email protected]>
Suggested-by: Christoph Hellwig <[email protected]>
Suggested-by: Johannes Weiner <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
---
block/bio.c | 6 ++++++
fs/direct-io.c | 2 ++
2 files changed, 8 insertions(+)
diff --git a/block/bio.c b/block/bio.c
index 1f2cc1fbe283..9f26984af643 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1099,6 +1099,9 @@ static int __bio_iov_append_get_pages(struct bio *bio, struct iov_iter *iter)
* fit into the bio, or are requested in @iter, whatever is smaller. If
* MM encounters an error pinning the requested pages, it stops. Error
* is returned only if 0 pages could be pinned.
+ *
+ * It's intended for direct IO, so doesn't do PSI tracking, the caller is
+ * responsible for setting BIO_WORKINGSET if necessary.
*/
int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
{
@@ -1123,6 +1126,9 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
if (is_bvec)
bio_set_flag(bio, BIO_NO_PAGE_REF);
+
+ /* don't account direct I/O as memory stall */
+ bio_clear_flag(bio, BIO_WORKINGSET);
return bio->bi_vcnt ? 0 : ret;
}
EXPORT_SYMBOL_GPL(bio_iov_iter_get_pages);
diff --git a/fs/direct-io.c b/fs/direct-io.c
index d53fa92a1ab6..0e689233f2c7 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -426,6 +426,8 @@ static inline void dio_bio_submit(struct dio *dio, struct dio_submit *sdio)
unsigned long flags;
bio->bi_private = dio;
+ /* don't account direct I/O as memory stall */
+ bio_clear_flag(bio, BIO_WORKINGSET);
spin_lock_irqsave(&dio->bio_lock, flags);
dio->refcount++;
--
2.24.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v3 3/7] block/psi: remove PSI annotations from direct IO
2021-01-09 16:02 ` [PATCH v3 3/7] block/psi: remove PSI annotations from direct IO Pavel Begunkov
@ 2021-01-11 2:52 ` Ming Lei
0 siblings, 0 replies; 16+ messages in thread
From: Ming Lei @ 2021-01-11 2:52 UTC (permalink / raw)
To: Pavel Begunkov
Cc: linux-block, Jens Axboe, Christoph Hellwig, Matthew Wilcox,
Johannes Weiner, Alexander Viro, Darrick J . Wong,
Martin K . Petersen, Jonathan Corbet, linux-xfs, linux-fsdevel,
io-uring, linux-kernel, target-devel, linux-scsi, linux-doc,
Christoph Hellwig
On Sat, Jan 09, 2021 at 04:02:59PM +0000, Pavel Begunkov wrote:
> Direct IO does not operate on the current working set of pages managed
> by the kernel, so it should not be accounted as memory stall to PSI
> infrastructure.
>
> The block layer and iomap direct IO use bio_iov_iter_get_pages()
> to build bios, and they are the only users of it, so to avoid PSI
> tracking for them clear out BIO_WORKINGSET flag. Do same for
> dio_bio_submit() because fs/direct_io constructs bios by hand directly
> calling bio_add_page().
>
> Reported-by: Christoph Hellwig <[email protected]>
> Suggested-by: Christoph Hellwig <[email protected]>
> Suggested-by: Johannes Weiner <[email protected]>
> Reviewed-by: Christoph Hellwig <[email protected]>
> Signed-off-by: Pavel Begunkov <[email protected]>
> ---
> block/bio.c | 6 ++++++
> fs/direct-io.c | 2 ++
> 2 files changed, 8 insertions(+)
>
> diff --git a/block/bio.c b/block/bio.c
> index 1f2cc1fbe283..9f26984af643 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -1099,6 +1099,9 @@ static int __bio_iov_append_get_pages(struct bio *bio, struct iov_iter *iter)
> * fit into the bio, or are requested in @iter, whatever is smaller. If
> * MM encounters an error pinning the requested pages, it stops. Error
> * is returned only if 0 pages could be pinned.
> + *
> + * It's intended for direct IO, so doesn't do PSI tracking, the caller is
> + * responsible for setting BIO_WORKINGSET if necessary.
> */
> int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
> {
> @@ -1123,6 +1126,9 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
>
> if (is_bvec)
> bio_set_flag(bio, BIO_NO_PAGE_REF);
> +
> + /* don't account direct I/O as memory stall */
> + bio_clear_flag(bio, BIO_WORKINGSET);
> return bio->bi_vcnt ? 0 : ret;
> }
> EXPORT_SYMBOL_GPL(bio_iov_iter_get_pages);
> diff --git a/fs/direct-io.c b/fs/direct-io.c
> index d53fa92a1ab6..0e689233f2c7 100644
> --- a/fs/direct-io.c
> +++ b/fs/direct-io.c
> @@ -426,6 +426,8 @@ static inline void dio_bio_submit(struct dio *dio, struct dio_submit *sdio)
> unsigned long flags;
>
> bio->bi_private = dio;
> + /* don't account direct I/O as memory stall */
> + bio_clear_flag(bio, BIO_WORKINGSET);
>
> spin_lock_irqsave(&dio->bio_lock, flags);
> dio->refcount++;
> --
> 2.24.0
>
Reviewed-by: Ming Lei <[email protected]>
--
Ming
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v3 4/7] target/file: allocate the bvec array as part of struct target_core_file_cmd
2021-01-09 16:02 [PATCH v3 0/7] no-copy bvec Pavel Begunkov
` (2 preceding siblings ...)
2021-01-09 16:02 ` [PATCH v3 3/7] block/psi: remove PSI annotations from direct IO Pavel Begunkov
@ 2021-01-09 16:03 ` Pavel Begunkov
2021-01-11 2:53 ` Ming Lei
2021-01-09 16:03 ` [PATCH v3 5/7] iov_iter: optimise bvec iov_iter_advance() Pavel Begunkov
` (3 subsequent siblings)
7 siblings, 1 reply; 16+ messages in thread
From: Pavel Begunkov @ 2021-01-09 16:03 UTC (permalink / raw)
To: linux-block
Cc: Jens Axboe, Christoph Hellwig, Matthew Wilcox, Ming Lei,
Johannes Weiner, Alexander Viro, Darrick J . Wong,
Martin K . Petersen, Jonathan Corbet, linux-xfs, linux-fsdevel,
io-uring, linux-kernel, target-devel, linux-scsi, linux-doc,
Christoph Hellwig
From: Christoph Hellwig <[email protected]>
This saves one memory allocation, and ensures the bvecs aren't freed
before the AIO completion. This will allow the lower level code to be
optimized so that it can avoid allocating another bvec array.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
---
drivers/target/target_core_file.c | 20 ++++++--------------
1 file changed, 6 insertions(+), 14 deletions(-)
diff --git a/drivers/target/target_core_file.c b/drivers/target/target_core_file.c
index b0cb5b95e892..cce455929778 100644
--- a/drivers/target/target_core_file.c
+++ b/drivers/target/target_core_file.c
@@ -241,6 +241,7 @@ struct target_core_file_cmd {
unsigned long len;
struct se_cmd *cmd;
struct kiocb iocb;
+ struct bio_vec bvecs[];
};
static void cmd_rw_aio_complete(struct kiocb *iocb, long ret, long ret2)
@@ -268,29 +269,22 @@ fd_execute_rw_aio(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
struct target_core_file_cmd *aio_cmd;
struct iov_iter iter = {};
struct scatterlist *sg;
- struct bio_vec *bvec;
ssize_t len = 0;
int ret = 0, i;
- aio_cmd = kmalloc(sizeof(struct target_core_file_cmd), GFP_KERNEL);
+ aio_cmd = kmalloc(struct_size(aio_cmd, bvecs, sgl_nents), GFP_KERNEL);
if (!aio_cmd)
return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
- bvec = kcalloc(sgl_nents, sizeof(struct bio_vec), GFP_KERNEL);
- if (!bvec) {
- kfree(aio_cmd);
- return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
- }
-
for_each_sg(sgl, sg, sgl_nents, i) {
- bvec[i].bv_page = sg_page(sg);
- bvec[i].bv_len = sg->length;
- bvec[i].bv_offset = sg->offset;
+ aio_cmd->bvecs[i].bv_page = sg_page(sg);
+ aio_cmd->bvecs[i].bv_len = sg->length;
+ aio_cmd->bvecs[i].bv_offset = sg->offset;
len += sg->length;
}
- iov_iter_bvec(&iter, is_write, bvec, sgl_nents, len);
+ iov_iter_bvec(&iter, is_write, aio_cmd->bvecs, sgl_nents, len);
aio_cmd->cmd = cmd;
aio_cmd->len = len;
@@ -307,8 +301,6 @@ fd_execute_rw_aio(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
else
ret = call_read_iter(file, &aio_cmd->iocb, &iter);
- kfree(bvec);
-
if (ret != -EIOCBQUEUED)
cmd_rw_aio_complete(&aio_cmd->iocb, ret, 0);
--
2.24.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v3 4/7] target/file: allocate the bvec array as part of struct target_core_file_cmd
2021-01-09 16:03 ` [PATCH v3 4/7] target/file: allocate the bvec array as part of struct target_core_file_cmd Pavel Begunkov
@ 2021-01-11 2:53 ` Ming Lei
0 siblings, 0 replies; 16+ messages in thread
From: Ming Lei @ 2021-01-11 2:53 UTC (permalink / raw)
To: Pavel Begunkov
Cc: linux-block, Jens Axboe, Christoph Hellwig, Matthew Wilcox,
Johannes Weiner, Alexander Viro, Darrick J . Wong,
Martin K . Petersen, Jonathan Corbet, linux-xfs, linux-fsdevel,
io-uring, linux-kernel, target-devel, linux-scsi, linux-doc,
Christoph Hellwig
On Sat, Jan 09, 2021 at 04:03:00PM +0000, Pavel Begunkov wrote:
> From: Christoph Hellwig <[email protected]>
>
> This saves one memory allocation, and ensures the bvecs aren't freed
> before the AIO completion. This will allow the lower level code to be
> optimized so that it can avoid allocating another bvec array.
>
> Signed-off-by: Christoph Hellwig <[email protected]>
> Signed-off-by: Pavel Begunkov <[email protected]>
> ---
> drivers/target/target_core_file.c | 20 ++++++--------------
> 1 file changed, 6 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/target/target_core_file.c b/drivers/target/target_core_file.c
> index b0cb5b95e892..cce455929778 100644
> --- a/drivers/target/target_core_file.c
> +++ b/drivers/target/target_core_file.c
> @@ -241,6 +241,7 @@ struct target_core_file_cmd {
> unsigned long len;
> struct se_cmd *cmd;
> struct kiocb iocb;
> + struct bio_vec bvecs[];
> };
>
> static void cmd_rw_aio_complete(struct kiocb *iocb, long ret, long ret2)
> @@ -268,29 +269,22 @@ fd_execute_rw_aio(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
> struct target_core_file_cmd *aio_cmd;
> struct iov_iter iter = {};
> struct scatterlist *sg;
> - struct bio_vec *bvec;
> ssize_t len = 0;
> int ret = 0, i;
>
> - aio_cmd = kmalloc(sizeof(struct target_core_file_cmd), GFP_KERNEL);
> + aio_cmd = kmalloc(struct_size(aio_cmd, bvecs, sgl_nents), GFP_KERNEL);
> if (!aio_cmd)
> return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
>
> - bvec = kcalloc(sgl_nents, sizeof(struct bio_vec), GFP_KERNEL);
> - if (!bvec) {
> - kfree(aio_cmd);
> - return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
> - }
> -
> for_each_sg(sgl, sg, sgl_nents, i) {
> - bvec[i].bv_page = sg_page(sg);
> - bvec[i].bv_len = sg->length;
> - bvec[i].bv_offset = sg->offset;
> + aio_cmd->bvecs[i].bv_page = sg_page(sg);
> + aio_cmd->bvecs[i].bv_len = sg->length;
> + aio_cmd->bvecs[i].bv_offset = sg->offset;
>
> len += sg->length;
> }
>
> - iov_iter_bvec(&iter, is_write, bvec, sgl_nents, len);
> + iov_iter_bvec(&iter, is_write, aio_cmd->bvecs, sgl_nents, len);
>
> aio_cmd->cmd = cmd;
> aio_cmd->len = len;
> @@ -307,8 +301,6 @@ fd_execute_rw_aio(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
> else
> ret = call_read_iter(file, &aio_cmd->iocb, &iter);
>
> - kfree(bvec);
> -
> if (ret != -EIOCBQUEUED)
> cmd_rw_aio_complete(&aio_cmd->iocb, ret, 0);
>
> --
> 2.24.0
>
Reviewed-by: Ming Lei <[email protected]>
--
Ming
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v3 5/7] iov_iter: optimise bvec iov_iter_advance()
2021-01-09 16:02 [PATCH v3 0/7] no-copy bvec Pavel Begunkov
` (3 preceding siblings ...)
2021-01-09 16:03 ` [PATCH v3 4/7] target/file: allocate the bvec array as part of struct target_core_file_cmd Pavel Begunkov
@ 2021-01-09 16:03 ` Pavel Begunkov
2021-01-11 2:55 ` Ming Lei
2021-01-09 16:03 ` [PATCH v3 6/7] bio: add a helper calculating nr segments to alloc Pavel Begunkov
` (2 subsequent siblings)
7 siblings, 1 reply; 16+ messages in thread
From: Pavel Begunkov @ 2021-01-09 16:03 UTC (permalink / raw)
To: linux-block
Cc: Jens Axboe, Christoph Hellwig, Matthew Wilcox, Ming Lei,
Johannes Weiner, Alexander Viro, Darrick J . Wong,
Martin K . Petersen, Jonathan Corbet, linux-xfs, linux-fsdevel,
io-uring, linux-kernel, target-devel, linux-scsi, linux-doc,
Christoph Hellwig
iov_iter_advance() is heavily used, but implemented through generic
means. For bvecs there is a specifically crafted function for that, so
use bvec_iter_advance() instead, it's faster and slimmer.
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
---
lib/iov_iter.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 7de304269641..9b1c109dc8a9 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1065,6 +1065,21 @@ static void pipe_advance(struct iov_iter *i, size_t size)
pipe_truncate(i);
}
+static void iov_iter_bvec_advance(struct iov_iter *i, size_t size)
+{
+ struct bvec_iter bi;
+
+ bi.bi_size = i->count;
+ bi.bi_bvec_done = i->iov_offset;
+ bi.bi_idx = 0;
+ bvec_iter_advance(i->bvec, &bi, size);
+
+ i->bvec += bi.bi_idx;
+ i->nr_segs -= bi.bi_idx;
+ i->count = bi.bi_size;
+ i->iov_offset = bi.bi_bvec_done;
+}
+
void iov_iter_advance(struct iov_iter *i, size_t size)
{
if (unlikely(iov_iter_is_pipe(i))) {
@@ -1075,6 +1090,10 @@ void iov_iter_advance(struct iov_iter *i, size_t size)
i->count -= size;
return;
}
+ if (iov_iter_is_bvec(i)) {
+ iov_iter_bvec_advance(i, size);
+ return;
+ }
iterate_and_advance(i, size, v, 0, 0, 0)
}
EXPORT_SYMBOL(iov_iter_advance);
--
2.24.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v3 5/7] iov_iter: optimise bvec iov_iter_advance()
2021-01-09 16:03 ` [PATCH v3 5/7] iov_iter: optimise bvec iov_iter_advance() Pavel Begunkov
@ 2021-01-11 2:55 ` Ming Lei
0 siblings, 0 replies; 16+ messages in thread
From: Ming Lei @ 2021-01-11 2:55 UTC (permalink / raw)
To: Pavel Begunkov
Cc: linux-block, Jens Axboe, Christoph Hellwig, Matthew Wilcox,
Johannes Weiner, Alexander Viro, Darrick J . Wong,
Martin K . Petersen, Jonathan Corbet, linux-xfs, linux-fsdevel,
io-uring, linux-kernel, target-devel, linux-scsi, linux-doc,
Christoph Hellwig
On Sat, Jan 09, 2021 at 04:03:01PM +0000, Pavel Begunkov wrote:
> iov_iter_advance() is heavily used, but implemented through generic
> means. For bvecs there is a specifically crafted function for that, so
> use bvec_iter_advance() instead, it's faster and slimmer.
>
> Reviewed-by: Christoph Hellwig <[email protected]>
> Signed-off-by: Pavel Begunkov <[email protected]>
> ---
> lib/iov_iter.c | 19 +++++++++++++++++++
> 1 file changed, 19 insertions(+)
>
> diff --git a/lib/iov_iter.c b/lib/iov_iter.c
> index 7de304269641..9b1c109dc8a9 100644
> --- a/lib/iov_iter.c
> +++ b/lib/iov_iter.c
> @@ -1065,6 +1065,21 @@ static void pipe_advance(struct iov_iter *i, size_t size)
> pipe_truncate(i);
> }
>
> +static void iov_iter_bvec_advance(struct iov_iter *i, size_t size)
> +{
> + struct bvec_iter bi;
> +
> + bi.bi_size = i->count;
> + bi.bi_bvec_done = i->iov_offset;
> + bi.bi_idx = 0;
> + bvec_iter_advance(i->bvec, &bi, size);
> +
> + i->bvec += bi.bi_idx;
> + i->nr_segs -= bi.bi_idx;
> + i->count = bi.bi_size;
> + i->iov_offset = bi.bi_bvec_done;
> +}
> +
> void iov_iter_advance(struct iov_iter *i, size_t size)
> {
> if (unlikely(iov_iter_is_pipe(i))) {
> @@ -1075,6 +1090,10 @@ void iov_iter_advance(struct iov_iter *i, size_t size)
> i->count -= size;
> return;
> }
> + if (iov_iter_is_bvec(i)) {
> + iov_iter_bvec_advance(i, size);
> + return;
> + }
> iterate_and_advance(i, size, v, 0, 0, 0)
> }
> EXPORT_SYMBOL(iov_iter_advance);
> --
> 2.24.0
>
Reviewed-by: Ming Lei <[email protected]>
--
Ming
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v3 6/7] bio: add a helper calculating nr segments to alloc
2021-01-09 16:02 [PATCH v3 0/7] no-copy bvec Pavel Begunkov
` (4 preceding siblings ...)
2021-01-09 16:03 ` [PATCH v3 5/7] iov_iter: optimise bvec iov_iter_advance() Pavel Begunkov
@ 2021-01-09 16:03 ` Pavel Begunkov
2021-01-11 2:57 ` Ming Lei
2021-01-09 16:03 ` [PATCH v3 7/7] bio: don't copy bvec for direct IO Pavel Begunkov
2021-01-25 15:58 ` [PATCH v3 0/7] no-copy bvec Jens Axboe
7 siblings, 1 reply; 16+ messages in thread
From: Pavel Begunkov @ 2021-01-09 16:03 UTC (permalink / raw)
To: linux-block
Cc: Jens Axboe, Christoph Hellwig, Matthew Wilcox, Ming Lei,
Johannes Weiner, Alexander Viro, Darrick J . Wong,
Martin K . Petersen, Jonathan Corbet, linux-xfs, linux-fsdevel,
io-uring, linux-kernel, target-devel, linux-scsi, linux-doc,
Christoph Hellwig
Add a helper function calculating the number of bvec segments we need to
allocate to construct a bio. It doesn't change anything functionally,
but will be used to not duplicate special cases in the future.
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/block_dev.c | 7 ++++---
fs/iomap/direct-io.c | 9 ++++-----
include/linux/bio.h | 10 ++++++++++
3 files changed, 18 insertions(+), 8 deletions(-)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 3b8963e228a1..6f5bd9950baf 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -416,7 +416,7 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_pages)
dio->size += bio->bi_iter.bi_size;
pos += bio->bi_iter.bi_size;
- nr_pages = iov_iter_npages(iter, BIO_MAX_PAGES);
+ nr_pages = bio_iov_vecs_to_alloc(iter, BIO_MAX_PAGES);
if (!nr_pages) {
bool polled = false;
@@ -481,9 +481,10 @@ blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
{
int nr_pages;
- nr_pages = iov_iter_npages(iter, BIO_MAX_PAGES + 1);
- if (!nr_pages)
+ if (!iov_iter_count(iter))
return 0;
+
+ nr_pages = bio_iov_vecs_to_alloc(iter, BIO_MAX_PAGES + 1);
if (is_sync_kiocb(iocb) && nr_pages <= BIO_MAX_PAGES)
return __blkdev_direct_IO_simple(iocb, iter, nr_pages);
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 933f234d5bec..ea1e8f696076 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -250,11 +250,8 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
orig_count = iov_iter_count(dio->submit.iter);
iov_iter_truncate(dio->submit.iter, length);
- nr_pages = iov_iter_npages(dio->submit.iter, BIO_MAX_PAGES);
- if (nr_pages <= 0) {
- ret = nr_pages;
+ if (!iov_iter_count(dio->submit.iter))
goto out;
- }
if (need_zeroout) {
/* zero out from the start of the block to the write offset */
@@ -263,6 +260,7 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
iomap_dio_zero(dio, iomap, pos - pad, pad);
}
+ nr_pages = bio_iov_vecs_to_alloc(dio->submit.iter, BIO_MAX_PAGES);
do {
size_t n;
if (dio->error) {
@@ -308,7 +306,8 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
dio->size += n;
copied += n;
- nr_pages = iov_iter_npages(dio->submit.iter, BIO_MAX_PAGES);
+ nr_pages = bio_iov_vecs_to_alloc(dio->submit.iter,
+ BIO_MAX_PAGES);
iomap_dio_submit_bio(dio, iomap, bio, pos);
pos += n;
} while (nr_pages);
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 1edda614f7ce..d8f9077c43ef 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -10,6 +10,7 @@
#include <linux/ioprio.h>
/* struct bio, bio_vec and BIO_* flags are defined in blk_types.h */
#include <linux/blk_types.h>
+#include <linux/uio.h>
#define BIO_DEBUG
@@ -441,6 +442,15 @@ static inline void bio_wouldblock_error(struct bio *bio)
bio_endio(bio);
}
+/*
+ * Calculate number of bvec segments that should be allocated to fit data
+ * pointed by @iter.
+ */
+static inline int bio_iov_vecs_to_alloc(struct iov_iter *iter, int max_segs)
+{
+ return iov_iter_npages(iter, max_segs);
+}
+
struct request_queue;
extern int submit_bio_wait(struct bio *bio);
--
2.24.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v3 6/7] bio: add a helper calculating nr segments to alloc
2021-01-09 16:03 ` [PATCH v3 6/7] bio: add a helper calculating nr segments to alloc Pavel Begunkov
@ 2021-01-11 2:57 ` Ming Lei
0 siblings, 0 replies; 16+ messages in thread
From: Ming Lei @ 2021-01-11 2:57 UTC (permalink / raw)
To: Pavel Begunkov
Cc: linux-block, Jens Axboe, Christoph Hellwig, Matthew Wilcox,
Johannes Weiner, Alexander Viro, Darrick J . Wong,
Martin K . Petersen, Jonathan Corbet, linux-xfs, linux-fsdevel,
io-uring, linux-kernel, target-devel, linux-scsi, linux-doc,
Christoph Hellwig
On Sat, Jan 09, 2021 at 04:03:02PM +0000, Pavel Begunkov wrote:
> Add a helper function calculating the number of bvec segments we need to
> allocate to construct a bio. It doesn't change anything functionally,
> but will be used to not duplicate special cases in the future.
>
> Reviewed-by: Christoph Hellwig <[email protected]>
> Signed-off-by: Pavel Begunkov <[email protected]>
> ---
> fs/block_dev.c | 7 ++++---
> fs/iomap/direct-io.c | 9 ++++-----
> include/linux/bio.h | 10 ++++++++++
> 3 files changed, 18 insertions(+), 8 deletions(-)
>
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index 3b8963e228a1..6f5bd9950baf 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -416,7 +416,7 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_pages)
> dio->size += bio->bi_iter.bi_size;
> pos += bio->bi_iter.bi_size;
>
> - nr_pages = iov_iter_npages(iter, BIO_MAX_PAGES);
> + nr_pages = bio_iov_vecs_to_alloc(iter, BIO_MAX_PAGES);
> if (!nr_pages) {
> bool polled = false;
>
> @@ -481,9 +481,10 @@ blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
> {
> int nr_pages;
>
> - nr_pages = iov_iter_npages(iter, BIO_MAX_PAGES + 1);
> - if (!nr_pages)
> + if (!iov_iter_count(iter))
> return 0;
> +
> + nr_pages = bio_iov_vecs_to_alloc(iter, BIO_MAX_PAGES + 1);
> if (is_sync_kiocb(iocb) && nr_pages <= BIO_MAX_PAGES)
> return __blkdev_direct_IO_simple(iocb, iter, nr_pages);
>
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index 933f234d5bec..ea1e8f696076 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -250,11 +250,8 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
> orig_count = iov_iter_count(dio->submit.iter);
> iov_iter_truncate(dio->submit.iter, length);
>
> - nr_pages = iov_iter_npages(dio->submit.iter, BIO_MAX_PAGES);
> - if (nr_pages <= 0) {
> - ret = nr_pages;
> + if (!iov_iter_count(dio->submit.iter))
> goto out;
> - }
>
> if (need_zeroout) {
> /* zero out from the start of the block to the write offset */
> @@ -263,6 +260,7 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
> iomap_dio_zero(dio, iomap, pos - pad, pad);
> }
>
> + nr_pages = bio_iov_vecs_to_alloc(dio->submit.iter, BIO_MAX_PAGES);
> do {
> size_t n;
> if (dio->error) {
> @@ -308,7 +306,8 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
> dio->size += n;
> copied += n;
>
> - nr_pages = iov_iter_npages(dio->submit.iter, BIO_MAX_PAGES);
> + nr_pages = bio_iov_vecs_to_alloc(dio->submit.iter,
> + BIO_MAX_PAGES);
> iomap_dio_submit_bio(dio, iomap, bio, pos);
> pos += n;
> } while (nr_pages);
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index 1edda614f7ce..d8f9077c43ef 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -10,6 +10,7 @@
> #include <linux/ioprio.h>
> /* struct bio, bio_vec and BIO_* flags are defined in blk_types.h */
> #include <linux/blk_types.h>
> +#include <linux/uio.h>
>
> #define BIO_DEBUG
>
> @@ -441,6 +442,15 @@ static inline void bio_wouldblock_error(struct bio *bio)
> bio_endio(bio);
> }
>
> +/*
> + * Calculate number of bvec segments that should be allocated to fit data
> + * pointed by @iter.
> + */
> +static inline int bio_iov_vecs_to_alloc(struct iov_iter *iter, int max_segs)
> +{
> + return iov_iter_npages(iter, max_segs);
> +}
> +
> struct request_queue;
>
> extern int submit_bio_wait(struct bio *bio);
> --
> 2.24.0
>
Reviewed-by: Ming Lei <[email protected]>
--
Ming
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v3 7/7] bio: don't copy bvec for direct IO
2021-01-09 16:02 [PATCH v3 0/7] no-copy bvec Pavel Begunkov
` (5 preceding siblings ...)
2021-01-09 16:03 ` [PATCH v3 6/7] bio: add a helper calculating nr segments to alloc Pavel Begunkov
@ 2021-01-09 16:03 ` Pavel Begunkov
2021-01-11 3:00 ` Ming Lei
2021-01-25 15:58 ` [PATCH v3 0/7] no-copy bvec Jens Axboe
7 siblings, 1 reply; 16+ messages in thread
From: Pavel Begunkov @ 2021-01-09 16:03 UTC (permalink / raw)
To: linux-block
Cc: Jens Axboe, Christoph Hellwig, Matthew Wilcox, Ming Lei,
Johannes Weiner, Alexander Viro, Darrick J . Wong,
Martin K . Petersen, Jonathan Corbet, linux-xfs, linux-fsdevel,
io-uring, linux-kernel, target-devel, linux-scsi, linux-doc,
Christoph Hellwig
The block layer spends quite a while in blkdev_direct_IO() to copy and
initialise bio's bvec. However, if we've already got a bvec in the input
iterator it might be reused in some cases, i.e. when new
ITER_BVEC_FLAG_FIXED flag is set. Simple tests show considerable
performance boost, and it also reduces memory footprint.
Suggested-by: Matthew Wilcox <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
---
Documentation/filesystems/porting.rst | 9 ++++
block/bio.c | 67 ++++++++++++---------------
include/linux/bio.h | 5 +-
3 files changed, 42 insertions(+), 39 deletions(-)
diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index c722d94f29ea..1f8cf8e10b34 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -872,3 +872,12 @@ its result is kern_unmount() or kern_unmount_array().
zero-length bvec segments are disallowed, they must be filtered out before
passed on to an iterator.
+
+---
+
+**mandatory**
+
+For bvec based itererators bio_iov_iter_get_pages() now doesn't copy bvecs but
+uses the one provided. Anyone issuing kiocb-I/O should ensure that the bvec and
+page references stay until I/O has completed, i.e. until ->ki_complete() has
+been called or returned with non -EIOCBQUEUED code.
diff --git a/block/bio.c b/block/bio.c
index 9f26984af643..6f031a04b59a 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -960,21 +960,17 @@ void bio_release_pages(struct bio *bio, bool mark_dirty)
}
EXPORT_SYMBOL_GPL(bio_release_pages);
-static int __bio_iov_bvec_add_pages(struct bio *bio, struct iov_iter *iter)
+static int bio_iov_bvec_set(struct bio *bio, struct iov_iter *iter)
{
- const struct bio_vec *bv = iter->bvec;
- unsigned int len;
- size_t size;
-
- if (WARN_ON_ONCE(iter->iov_offset > bv->bv_len))
- return -EINVAL;
-
- len = min_t(size_t, bv->bv_len - iter->iov_offset, iter->count);
- size = bio_add_page(bio, bv->bv_page, len,
- bv->bv_offset + iter->iov_offset);
- if (unlikely(size != len))
- return -EINVAL;
- iov_iter_advance(iter, size);
+ WARN_ON_ONCE(BVEC_POOL_IDX(bio) != 0);
+
+ bio->bi_vcnt = iter->nr_segs;
+ bio->bi_max_vecs = iter->nr_segs;
+ bio->bi_io_vec = (struct bio_vec *)iter->bvec;
+ bio->bi_iter.bi_bvec_done = iter->iov_offset;
+ bio->bi_iter.bi_size = iter->count;
+
+ iov_iter_advance(iter, iter->count);
return 0;
}
@@ -1088,12 +1084,12 @@ static int __bio_iov_append_get_pages(struct bio *bio, struct iov_iter *iter)
* This takes either an iterator pointing to user memory, or one pointing to
* kernel pages (BVEC iterator). If we're adding user pages, we pin them and
* map them into the kernel. On IO completion, the caller should put those
- * pages. If we're adding kernel pages, and the caller told us it's safe to
- * do so, we just have to add the pages to the bio directly. We don't grab an
- * extra reference to those pages (the user should already have that), and we
- * don't put the page on IO completion. The caller needs to check if the bio is
- * flagged BIO_NO_PAGE_REF on IO completion. If it isn't, then pages should be
- * released.
+ * pages. For bvec based iterators bio_iov_iter_get_pages() uses the provided
+ * bvecs rather than copying them. Hence anyone issuing kiocb based IO needs
+ * to ensure the bvecs and pages stay referenced until the submitted I/O is
+ * completed by a call to ->ki_complete() or returns with an error other than
+ * -EIOCBQUEUED. The caller needs to check if the bio is flagged BIO_NO_PAGE_REF
+ * on IO completion. If it isn't, then pages should be released.
*
* The function tries, but does not guarantee, to pin as many pages as
* fit into the bio, or are requested in @iter, whatever is smaller. If
@@ -1105,27 +1101,22 @@ static int __bio_iov_append_get_pages(struct bio *bio, struct iov_iter *iter)
*/
int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
{
- const bool is_bvec = iov_iter_is_bvec(iter);
- int ret;
-
- if (WARN_ON_ONCE(bio->bi_vcnt))
- return -EINVAL;
+ int ret = 0;
- do {
- if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
- if (WARN_ON_ONCE(is_bvec))
- return -EINVAL;
- ret = __bio_iov_append_get_pages(bio, iter);
- } else {
- if (is_bvec)
- ret = __bio_iov_bvec_add_pages(bio, iter);
+ if (iov_iter_is_bvec(iter)) {
+ if (WARN_ON_ONCE(bio_op(bio) == REQ_OP_ZONE_APPEND))
+ return -EINVAL;
+ bio_iov_bvec_set(bio, iter);
+ bio_set_flag(bio, BIO_NO_PAGE_REF);
+ return 0;
+ } else {
+ do {
+ if (bio_op(bio) == REQ_OP_ZONE_APPEND)
+ ret = __bio_iov_append_get_pages(bio, iter);
else
ret = __bio_iov_iter_get_pages(bio, iter);
- }
- } while (!ret && iov_iter_count(iter) && !bio_full(bio, 0));
-
- if (is_bvec)
- bio_set_flag(bio, BIO_NO_PAGE_REF);
+ } while (!ret && iov_iter_count(iter) && !bio_full(bio, 0));
+ }
/* don't account direct I/O as memory stall */
bio_clear_flag(bio, BIO_WORKINGSET);
diff --git a/include/linux/bio.h b/include/linux/bio.h
index d8f9077c43ef..1d30572a8c53 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -444,10 +444,13 @@ static inline void bio_wouldblock_error(struct bio *bio)
/*
* Calculate number of bvec segments that should be allocated to fit data
- * pointed by @iter.
+ * pointed by @iter. If @iter is backed by bvec it's going to be reused
+ * instead of allocating a new one.
*/
static inline int bio_iov_vecs_to_alloc(struct iov_iter *iter, int max_segs)
{
+ if (iov_iter_is_bvec(iter))
+ return 0;
return iov_iter_npages(iter, max_segs);
}
--
2.24.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v3 7/7] bio: don't copy bvec for direct IO
2021-01-09 16:03 ` [PATCH v3 7/7] bio: don't copy bvec for direct IO Pavel Begunkov
@ 2021-01-11 3:00 ` Ming Lei
0 siblings, 0 replies; 16+ messages in thread
From: Ming Lei @ 2021-01-11 3:00 UTC (permalink / raw)
To: Pavel Begunkov
Cc: linux-block, Jens Axboe, Christoph Hellwig, Matthew Wilcox,
Johannes Weiner, Alexander Viro, Darrick J . Wong,
Martin K . Petersen, Jonathan Corbet, linux-xfs, linux-fsdevel,
io-uring, linux-kernel, target-devel, linux-scsi, linux-doc,
Christoph Hellwig
On Sat, Jan 09, 2021 at 04:03:03PM +0000, Pavel Begunkov wrote:
> The block layer spends quite a while in blkdev_direct_IO() to copy and
> initialise bio's bvec. However, if we've already got a bvec in the input
> iterator it might be reused in some cases, i.e. when new
> ITER_BVEC_FLAG_FIXED flag is set. Simple tests show considerable
> performance boost, and it also reduces memory footprint.
>
> Suggested-by: Matthew Wilcox <[email protected]>
> Reviewed-by: Christoph Hellwig <[email protected]>
> Signed-off-by: Pavel Begunkov <[email protected]>
> ---
> Documentation/filesystems/porting.rst | 9 ++++
> block/bio.c | 67 ++++++++++++---------------
> include/linux/bio.h | 5 +-
> 3 files changed, 42 insertions(+), 39 deletions(-)
>
> diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
> index c722d94f29ea..1f8cf8e10b34 100644
> --- a/Documentation/filesystems/porting.rst
> +++ b/Documentation/filesystems/porting.rst
> @@ -872,3 +872,12 @@ its result is kern_unmount() or kern_unmount_array().
>
> zero-length bvec segments are disallowed, they must be filtered out before
> passed on to an iterator.
> +
> +---
> +
> +**mandatory**
> +
> +For bvec based itererators bio_iov_iter_get_pages() now doesn't copy bvecs but
> +uses the one provided. Anyone issuing kiocb-I/O should ensure that the bvec and
> +page references stay until I/O has completed, i.e. until ->ki_complete() has
> +been called or returned with non -EIOCBQUEUED code.
> diff --git a/block/bio.c b/block/bio.c
> index 9f26984af643..6f031a04b59a 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -960,21 +960,17 @@ void bio_release_pages(struct bio *bio, bool mark_dirty)
> }
> EXPORT_SYMBOL_GPL(bio_release_pages);
>
> -static int __bio_iov_bvec_add_pages(struct bio *bio, struct iov_iter *iter)
> +static int bio_iov_bvec_set(struct bio *bio, struct iov_iter *iter)
> {
> - const struct bio_vec *bv = iter->bvec;
> - unsigned int len;
> - size_t size;
> -
> - if (WARN_ON_ONCE(iter->iov_offset > bv->bv_len))
> - return -EINVAL;
> -
> - len = min_t(size_t, bv->bv_len - iter->iov_offset, iter->count);
> - size = bio_add_page(bio, bv->bv_page, len,
> - bv->bv_offset + iter->iov_offset);
> - if (unlikely(size != len))
> - return -EINVAL;
> - iov_iter_advance(iter, size);
> + WARN_ON_ONCE(BVEC_POOL_IDX(bio) != 0);
> +
> + bio->bi_vcnt = iter->nr_segs;
> + bio->bi_max_vecs = iter->nr_segs;
> + bio->bi_io_vec = (struct bio_vec *)iter->bvec;
> + bio->bi_iter.bi_bvec_done = iter->iov_offset;
> + bio->bi_iter.bi_size = iter->count;
> +
> + iov_iter_advance(iter, iter->count);
> return 0;
> }
>
> @@ -1088,12 +1084,12 @@ static int __bio_iov_append_get_pages(struct bio *bio, struct iov_iter *iter)
> * This takes either an iterator pointing to user memory, or one pointing to
> * kernel pages (BVEC iterator). If we're adding user pages, we pin them and
> * map them into the kernel. On IO completion, the caller should put those
> - * pages. If we're adding kernel pages, and the caller told us it's safe to
> - * do so, we just have to add the pages to the bio directly. We don't grab an
> - * extra reference to those pages (the user should already have that), and we
> - * don't put the page on IO completion. The caller needs to check if the bio is
> - * flagged BIO_NO_PAGE_REF on IO completion. If it isn't, then pages should be
> - * released.
> + * pages. For bvec based iterators bio_iov_iter_get_pages() uses the provided
> + * bvecs rather than copying them. Hence anyone issuing kiocb based IO needs
> + * to ensure the bvecs and pages stay referenced until the submitted I/O is
> + * completed by a call to ->ki_complete() or returns with an error other than
> + * -EIOCBQUEUED. The caller needs to check if the bio is flagged BIO_NO_PAGE_REF
> + * on IO completion. If it isn't, then pages should be released.
> *
> * The function tries, but does not guarantee, to pin as many pages as
> * fit into the bio, or are requested in @iter, whatever is smaller. If
> @@ -1105,27 +1101,22 @@ static int __bio_iov_append_get_pages(struct bio *bio, struct iov_iter *iter)
> */
> int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
> {
> - const bool is_bvec = iov_iter_is_bvec(iter);
> - int ret;
> -
> - if (WARN_ON_ONCE(bio->bi_vcnt))
> - return -EINVAL;
> + int ret = 0;
>
> - do {
> - if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
> - if (WARN_ON_ONCE(is_bvec))
> - return -EINVAL;
> - ret = __bio_iov_append_get_pages(bio, iter);
> - } else {
> - if (is_bvec)
> - ret = __bio_iov_bvec_add_pages(bio, iter);
> + if (iov_iter_is_bvec(iter)) {
> + if (WARN_ON_ONCE(bio_op(bio) == REQ_OP_ZONE_APPEND))
> + return -EINVAL;
> + bio_iov_bvec_set(bio, iter);
> + bio_set_flag(bio, BIO_NO_PAGE_REF);
> + return 0;
> + } else {
> + do {
> + if (bio_op(bio) == REQ_OP_ZONE_APPEND)
> + ret = __bio_iov_append_get_pages(bio, iter);
> else
> ret = __bio_iov_iter_get_pages(bio, iter);
> - }
> - } while (!ret && iov_iter_count(iter) && !bio_full(bio, 0));
> -
> - if (is_bvec)
> - bio_set_flag(bio, BIO_NO_PAGE_REF);
> + } while (!ret && iov_iter_count(iter) && !bio_full(bio, 0));
> + }
>
> /* don't account direct I/O as memory stall */
> bio_clear_flag(bio, BIO_WORKINGSET);
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index d8f9077c43ef..1d30572a8c53 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -444,10 +444,13 @@ static inline void bio_wouldblock_error(struct bio *bio)
>
> /*
> * Calculate number of bvec segments that should be allocated to fit data
> - * pointed by @iter.
> + * pointed by @iter. If @iter is backed by bvec it's going to be reused
> + * instead of allocating a new one.
> */
> static inline int bio_iov_vecs_to_alloc(struct iov_iter *iter, int max_segs)
> {
> + if (iov_iter_is_bvec(iter))
> + return 0;
> return iov_iter_npages(iter, max_segs);
> }
>
> --
> 2.24.0
>
Reviewed-by: Ming Lei <[email protected]>
--
Ming
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v3 0/7] no-copy bvec
2021-01-09 16:02 [PATCH v3 0/7] no-copy bvec Pavel Begunkov
` (6 preceding siblings ...)
2021-01-09 16:03 ` [PATCH v3 7/7] bio: don't copy bvec for direct IO Pavel Begunkov
@ 2021-01-25 15:58 ` Jens Axboe
7 siblings, 0 replies; 16+ messages in thread
From: Jens Axboe @ 2021-01-25 15:58 UTC (permalink / raw)
To: Pavel Begunkov, linux-block
Cc: Christoph Hellwig, Matthew Wilcox, Ming Lei, Johannes Weiner,
Alexander Viro, Darrick J . Wong, Martin K . Petersen,
Jonathan Corbet, linux-xfs, linux-fsdevel, io-uring, linux-kernel,
target-devel, linux-scsi, linux-doc
On 1/9/21 9:02 AM, Pavel Begunkov wrote:
> Currently, when iomap and block direct IO gets a bvec based iterator
> the bvec will be copied, with all other accounting that takes much
> CPU time and causes additional allocation for larger bvecs. The
> patchset makes it to reuse the passed in iter bvec.
Applied, thanks.
--
Jens Axboe
^ permalink raw reply [flat|nested] 16+ messages in thread