From: John Garry <[email protected]>
To: "Ritesh Harjani (IBM)" <[email protected]>,
[email protected], [email protected], [email protected], [email protected],
[email protected], [email protected],
[email protected], [email protected], [email protected],
[email protected], [email protected]
Cc: [email protected], [email protected],
[email protected], [email protected],
[email protected], [email protected], [email protected],
[email protected], [email protected],
[email protected], [email protected],
[email protected]
Subject: Re: [PATCH v4 05/11] block: Add core atomic write support
Date: Mon, 26 Feb 2024 09:23:35 +0000
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>
On 25/02/2024 12:09, Ritesh Harjani (IBM) wrote:
> John Garry <[email protected]> writes:
>
>> Add atomic write support as follows:
>> - report request_queue atomic write support limits to sysfs and update Doc
>> - add helper functions to get request_queue atomic write limits
>> - support to safely merge atomic writes
>> - add a per-request atomic write flag
>> - deal with splitting atomic writes
>> - misc helper functions
>>
>> New sysfs files are added to report the following atomic write limits:
>> - atomic_write_boundary_bytes
>> - atomic_write_max_bytes
>> - atomic_write_unit_max_bytes
>> - atomic_write_unit_min_bytes
>>
>> atomic_write_unit_{min,max}_bytes report the min and max atomic write
>> support size, inclusive, and are primarily dictated by HW capability. Both
>> values must be a power-of-2. atomic_write_boundary_bytes, if non-zero,
>> indicates an LBA space boundary; an atomic write which straddles it is no
>> longer atomically executed by the disk. atomic_write_max_bytes is the
>> maximum merged size for an atomic write. Often it will be the same value as
>> atomic_write_unit_max_bytes.
>
> Instead of explaining sysfs outputs which are derivatives of HW
> and request_queue limits (and also defined in Documentation), maybe we
> could explain how those sysfs values are derived instead -
>
> struct queue_limits {
> 	<...>
> 	unsigned int atomic_write_hw_max_sectors;
> 	unsigned int atomic_write_max_sectors;
> 	unsigned int atomic_write_hw_boundary_sectors;
> 	unsigned int atomic_write_hw_unit_min_sectors;
> 	unsigned int atomic_write_unit_min_sectors;
> 	unsigned int atomic_write_hw_unit_max_sectors;
> 	unsigned int atomic_write_unit_max_sectors;
> 	<...>
>
> 1. atomic_write_hw_max_sectors comes directly from hw and it need
> not be a power of 2.
>
> 2. atomic_write_hw_unit_min_sectors and atomic_write_hw_unit_max_sectors
> are again defined/derived from hw limits, but each is rounded down so that
> it is always a power of 2.
>
> 3. atomic_write_hw_boundary_sectors again comes from the HW boundary limit.
> It could either be 0 (which means the device specifies no boundary limit) or a
> multiple of unit_max. It need not be a power of 2; however, the current
> code assumes it to be a power of 2 (check callers of blk_queue_atomic_write_boundary_bytes())
>
> 4. atomic_write_max_sectors, atomic_write_unit_min_sectors
> and atomic_write_unit_max_sectors are all derived out of above hw limits
> inside function blk_atomic_writes_update_limits() based on request_queue
> limits.
> a. atomic_write_max_sectors is derived from atomic_write_hw_unit_max_sectors and
> request_queue's max_hw_sectors limit. It also guarantees max
> sectors that can be fit in a single bio.
> b. atomic_write_unit_[min|max]_sectors are derived from atomic_write_hw_unit_[min|max]_sectors,
> request_queue's max_hw_sectors & blk_queue_max_guaranteed_bio_sectors(). Both of these limits
> are kept as a power of 2.
>
> Now coming to sysfs outputs -
> 1. atomic_write_unit_max_bytes: Same as atomic_write_unit_max_sectors in bytes
> 2. atomic_write_unit_min_bytes: Same as atomic_write_unit_min_sectors in bytes
> 3. atomic_write_boundary_bytes: same as atomic_write_hw_boundary_sectors
> in bytes
> 4. atomic_write_max_bytes: Same as atomic_write_max_sectors in bytes
>
ok, I can look to incorporate the advised formatting changes
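
As a rough illustration of the derivation described above, here is a minimal user-space sketch. It is not the code from the patch: `struct aw_limits`, `aw_update_limits()`, `rounddown_pow2()`, `min_u()` and `aw_sectors_to_bytes()` are hypothetical names, `max_guaranteed_bio_sectors` is a plain field standing in for blk_queue_max_guaranteed_bio_sectors(), and the exact min/round-down order is an assumption based only on the description in this thread.

```c
#include <stdint.h>

/* Hypothetical user-space mirror of the limits discussed above. */
struct aw_limits {
	unsigned int max_hw_sectors;             /* request_queue limit */
	unsigned int max_guaranteed_bio_sectors; /* stand-in for blk_queue_max_guaranteed_bio_sectors() */
	unsigned int atomic_write_hw_max_sectors;
	unsigned int atomic_write_hw_unit_min_sectors; /* assumed power of 2 */
	unsigned int atomic_write_hw_unit_max_sectors; /* assumed power of 2 */
	unsigned int atomic_write_max_sectors;
	unsigned int atomic_write_unit_min_sectors;
	unsigned int atomic_write_unit_max_sectors;
};

/* Largest power of 2 that is <= v (0 for v == 0). */
static unsigned int rounddown_pow2(unsigned int v)
{
	unsigned int p = 1;

	if (!v)
		return 0;
	while (p <= v / 2)
		p *= 2;
	return p;
}

static unsigned int min_u(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Assumed derivation: cap the hw values by the queue and bio limits. */
static void aw_update_limits(struct aw_limits *l)
{
	/* Power-of-2 cap, so the unit limits stay powers of 2. */
	unsigned int unit_limit = rounddown_pow2(
		min_u(l->max_hw_sectors, l->max_guaranteed_bio_sectors));

	l->atomic_write_max_sectors =
		min_u(l->atomic_write_hw_max_sectors, l->max_hw_sectors);
	l->atomic_write_unit_min_sectors =
		min_u(l->atomic_write_hw_unit_min_sectors, unit_limit);
	l->atomic_write_unit_max_sectors =
		min_u(l->atomic_write_hw_unit_max_sectors, unit_limit);
}

/* sysfs reports bytes; a sector is 512 bytes (shift of 9). */
static unsigned long long aw_sectors_to_bytes(unsigned int sectors)
{
	return (unsigned long long)sectors << 9;
}
```

With max_hw_sectors = 2560 and a guaranteed bio size of 1000 sectors, the unit cap in this sketch rounds down to 512 sectors, so a hw unit max of 2048 sectors would be reported as 512 sectors (262144 bytes).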
>>
>> atomic_write_unit_max_bytes is capped at the maximum data size which we are
>> guaranteed to be able to fit in a BIO, as an atomic write must always be
>> submitted as a single BIO. This BIO max size is dictated by the number of
>
> Here it says that the atomic write must always be submitted as a single
> bio. From where to where?
submitted to the block layer/core
> I think you meant from FS to block layer.
sure, or also block device file operations (in fops.c) to block core
> Because otherwise we still allow request/bio merging inside block layer
> based on the request queue limits we defined above. i.e. bio can be
> chained to form
> rq->biotail->bi_next = next_rq->bio
> as long as the merged request is within the queue_limits.
>
> i.e. atomic write requests can be merged as long as -
> - both rqs have REQ_ATOMIC set
> - blk_rq_sectors(final_rq) <= q->limits.atomic_write_max_sectors
> - final rq formed should not straddle limits->atomic_write_hw_boundary_sectors
>
> However, splitting of an atomic write request is not allowed. And if it
> happens, we fail the I/O req & return -EINVAL.
...
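
The three merge conditions listed above can be sketched in user space as follows. This is only an illustration under assumptions: `struct aw_req`, `aw_straddles()` and `aw_can_back_merge()` are hypothetical names, the contiguity test is added here because any back merge needs it, and the boundary test uses division so that it does not depend on the boundary being a power of 2.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a request. */
struct aw_req {
	uint64_t sector;          /* start sector */
	unsigned int nr_sectors;  /* length in sectors */
	bool atomic;              /* models REQ_ATOMIC being set */
};

/*
 * True if [start, start + nr) spans more than one boundary window.
 * boundary == 0 means the device reports no boundary.
 */
static bool aw_straddles(uint64_t start, uint64_t nr, unsigned int boundary)
{
	if (!boundary || !nr)
		return false;
	/* Compare the window of the first and of the last sector. */
	return start / boundary != (start + nr - 1) / boundary;
}

/* Can "next" be merged onto the back of "rq"? */
static bool aw_can_back_merge(const struct aw_req *rq,
			      const struct aw_req *next,
			      unsigned int atomic_write_max_sectors,
			      unsigned int boundary_sectors)
{
	uint64_t total = (uint64_t)rq->nr_sectors + next->nr_sectors;

	/* Both requests must be atomic writes. */
	if (!rq->atomic || !next->atomic)
		return false;
	/* Assumption: only contiguous requests are merge candidates. */
	if (rq->sector + rq->nr_sectors != next->sector)
		return false;
	/* The merged request must fit in the atomic write max. */
	if (total > atomic_write_max_sectors)
		return false;
	/* The merged request must not straddle the boundary. */
	return !aw_straddles(rq->sector, total, boundary_sectors);
}
```

For example, two contiguous 8-sector atomic requests starting at sector 0 would merge under a 16-sector boundary in this sketch, but not under an 8-sector boundary, since the merged request would then cover two windows.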
>
> IMHO, the commit message can definitely use a re-write. I agree that you
> have put in a lot of information, but I think it can be more organized.
ok, fine. I'll look at this. Thanks.
>
>>
>> Contains significant contributions from:
>> Himanshu Madhani <[email protected]>
>
> Maybe it can use a better tag then.
> "Documentation/process/submitting-patches.rst"
ok
>
>>
>> Signed-off-by: John Garry <[email protected]>
>> ---
>> Documentation/ABI/stable/sysfs-block | 52 ++++++++++++++
>> block/blk-merge.c | 91 ++++++++++++++++++++++-
>> block/blk-settings.c | 103 +++++++++++++++++++++++++++
>> block/blk-sysfs.c | 33 +++++++++
>> block/blk.h | 3 +
>> include/linux/blk_types.h | 2 +
>> include/linux/blkdev.h | 60 ++++++++++++++++
>> 7 files changed, 343 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/ABI/stable/sysfs-block b/Documentation/ABI/stable/sysfs-block
>> index 1fe9a553c37b..4c775f4bdefe 100644
>> --- a/Documentation/ABI/stable/sysfs-block
>> +++ b/Documentation/ABI/stable/sysfs-block
>> @@ -21,6 +21,58 @@ Description:
>> device is offset from the internal allocation unit's
>> natural alignment.
...
>>
>
> /* A comment explaining this function and arguments could be helpful */
already addressed according to earlier review
>
>> +static bool rq_straddles_atomic_write_boundary(struct request *rq,
>> + unsigned int front,
>> + unsigned int back)
>
> Perhaps a better naming would be start_adjust, end_adjust?
ok
>
>> +{
>> + unsigned int boundary = queue_atomic_write_boundary_bytes(rq->q);
>> + unsigned int mask, imask;
>> + loff_t start, end;
>
> start_rq_pos, end_rq_pos maybe?
ok
>
>> +
>> + if (!boundary)
>> + return false;
>> +
>> + start = rq->__sector << SECTOR_SHIFT;
>
> blk_rq_pos(rq) perhaps?
ok
>
>> + end = start + rq->__data_len;
>
> blk_rq_bytes(rq) perhaps? It should be..
ok
>> +
>> + start -= front;
>> + end += back;
>> +
>> + /* We're longer than the boundary, so must be crossing it */
>> + if (end - start > boundary)
>> + return true;
>> +
>> + mask = boundary - 1;
>> +
>> + /* start/end are boundary-aligned, so cannot be crossing */
>> + if (!(start & mask) || !(end & mask))
>> + return false;
>> +
>> + imask = ~mask;
>> +
>> + /* Top bits are different, so crossed a boundary */
>> + if ((start & imask) != (end & imask))
>> + return true;
>
> The last condition looks wrong. Shouldn't it be end - 1?
>
>> +
>> + return false;
>> +}
>
> Can we do something like this?
>
> static bool rq_straddles_atomic_write_boundary(struct request *rq,
> 					unsigned int start_adjust,
> 					unsigned int end_adjust)
> {
> 	unsigned int boundary = queue_atomic_write_boundary_bytes(rq->q);
> 	unsigned long boundary_mask;
> 	unsigned long start_rq_pos, end_rq_pos;
>
> 	if (!boundary)
> 		return false;
>
> 	start_rq_pos = blk_rq_pos(rq) << SECTOR_SHIFT;
> 	end_rq_pos = start_rq_pos + blk_rq_bytes(rq);
>
> 	start_rq_pos -= start_adjust;
> 	end_rq_pos += end_adjust;
>
> 	boundary_mask = boundary - 1;
>
> 	if ((start_rq_pos | boundary_mask) != (end_rq_pos | boundary_mask))
> 		return true;
>
> 	return false;
> }
>
> I was thinking this check should cover all cases? Thoughts?
that looks ok (apart from issue already detected later). It is quite
similar to how I coded it in the NVMe driver, apart from the initial >
boundary check.
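
To illustrate the "end - 1" question raised above, here is a small user-space sketch comparing a check against the exclusive end with a check against the last byte. The function names are hypothetical, the boundary is assumed to be a power of 2, and this is not claimed to be what the final patch does.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Compares the boundary window of the first byte with that of the
 * exclusive end (start + len), as in the checks quoted above.
 */
static bool straddles_exclusive_end(uint64_t start, uint64_t len,
				    uint64_t boundary)
{
	uint64_t mask, end;

	if (!boundary)
		return false;
	mask = boundary - 1;
	end = start + len;
	return (start | mask) != (end | mask);
}

/*
 * Compares the boundary window of the first byte with that of the
 * last byte actually written (start + len - 1).
 */
static bool straddles_last_byte(uint64_t start, uint64_t len,
				uint64_t boundary)
{
	uint64_t mask, last;

	if (!boundary || !len)
		return false;
	mask = boundary - 1;
	last = start + len - 1;
	return (start | mask) != (last | mask);
}
```

For a write of exactly one boundary (start 0, length 4096, boundary 4096), the exclusive-end form reports a crossing because byte 4096 lies in the next window, while the last-byte form does not, as the write ends at byte 4095. The original code avoids that case through its early return for boundary-aligned start/end.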
>> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
>> index f288c94374b3..cd7cceb8565d 100644
>> --- a/include/linux/blk_types.h
>> +++ b/include/linux/blk_types.h
>> @@ -422,6 +422,7 @@ enum req_flag_bits {
>> __REQ_DRV, /* for driver use */
>> __REQ_FS_PRIVATE, /* for file system (submitter) use */
>>
>> + __REQ_ATOMIC, /* for atomic write operations */
>> /*
>> * Command specific flags, keep last:
>> */
>> @@ -448,6 +449,7 @@ enum req_flag_bits {
>> #define REQ_RAHEAD (__force blk_opf_t)(1ULL << __REQ_RAHEAD)
>> #define REQ_BACKGROUND (__force blk_opf_t)(1ULL << __REQ_BACKGROUND)
>> #define REQ_NOWAIT (__force blk_opf_t)(1ULL << __REQ_NOWAIT)
>> +#define REQ_ATOMIC (__force blk_opf_t)(1ULL << __REQ_ATOMIC)
>
> Let's add this in the same order as of __REQ_ATOMIC i.e. after
> REQ_FS_PRIVATE macro
ok, fine
>> @@ -299,6 +299,14 @@ struct queue_limits {
>> unsigned int discard_alignment;
>> unsigned int zone_write_granularity;
>>
>> + unsigned int atomic_write_hw_max_sectors;
>> + unsigned int atomic_write_max_sectors;
>> + unsigned int atomic_write_hw_boundary_sectors;
>> + unsigned int atomic_write_hw_unit_min_sectors;
>> + unsigned int atomic_write_unit_min_sectors;
>> + unsigned int atomic_write_hw_unit_max_sectors;
>> + unsigned int atomic_write_unit_max_sectors;
>> +
> 1 liner comment for above members please?
ok
>> +static inline bool bdev_can_atomic_write(struct block_device *bdev)
>> +{
>> + struct request_queue *bd_queue = bdev->bd_queue;
>> + struct queue_limits *limits = &bd_queue->limits;
>> +
>> + if (!limits->atomic_write_unit_min_sectors)
>> + return false;
>> +
>> + if (bdev_is_partition(bdev)) {
>> + sector_t bd_start_sect = bdev->bd_start_sect;
>> + unsigned int granularity = max(
>
> atomic_align perhaps?
or just "align"
>
>> + limits->atomic_write_unit_min_sectors,
>> + limits->atomic_write_hw_boundary_sectors);
>> + if (do_div(bd_start_sect, granularity))
>> + return false;
>> + }
>
> since atomic_align is a power of 2. Why not use IS_ALIGNED()?
> (bitwise operation instead of div)?
already changed as advised
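
For reference, a minimal user-space sketch of the bitwise alignment test suggested above, assuming the alignment is a power of 2. `is_aligned_pow2()` and `partition_start_ok()` are hypothetical names that only mirror the shape of the quoted check; they are not the updated patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same idea as the kernel's IS_ALIGNED(): valid only for power-of-2 align. */
static bool is_aligned_pow2(uint64_t x, uint64_t align)
{
	return (x & (align - 1)) == 0;
}

/*
 * A partition can do atomic writes only if its start sector is aligned
 * to the larger of the unit min and the boundary (both in sectors).
 */
static bool partition_start_ok(uint64_t start_sect,
			       unsigned int unit_min_sectors,
			       unsigned int boundary_sectors)
{
	uint64_t align = unit_min_sectors > boundary_sectors ?
			 unit_min_sectors : boundary_sectors;

	/* No unit min means no atomic write support at all. */
	if (!unit_min_sectors)
		return false;
	return is_aligned_pow2(start_sect, align);
}
```

For a power-of-2 alignment the mask test gives the same answer as a remainder test (x % align == 0), without the division.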
Thanks,
John