public inbox for [email protected]
 help / color / mirror / Atom feed
From: John Garry <[email protected]>
To: "Ritesh Harjani (IBM)" <[email protected]>,
	[email protected], [email protected], [email protected], [email protected],
	[email protected], [email protected],
	[email protected], [email protected], [email protected],
	[email protected], [email protected]
Cc: [email protected], [email protected],
	[email protected], [email protected],
	[email protected], [email protected], [email protected],
	[email protected], [email protected],
	[email protected], [email protected],
	[email protected]
Subject: Re: [PATCH v4 05/11] block: Add core atomic write support
Date: Mon, 26 Feb 2024 09:23:35 +0000	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>

On 25/02/2024 12:09, Ritesh Harjani (IBM) wrote:
> John Garry <[email protected]> writes:
> 
>> Add atomic write support as follows:
>> - report request_queue atomic write support limits to sysfs and udpate Doc
>> - add helper functions to get request_queue atomic write limits
>> - support to safely merge atomic writes
>> - add a per-request atomic write flag
>> - deal with splitting atomic writes
>> - misc helper functions
>>
>> New sysfs files are added to report the following atomic write limits:
>> - atomic_write_boundary_bytes
>> - atomic_write_max_bytes
>> - atomic_write_unit_max_bytes
>> - atomic_write_unit_min_bytes
>>
>> atomic_write_unit_{min,max}_bytes report the min and max atomic write
>> support size, inclusive, and are primarily dictated by HW capability. Both
>> values must be a power-of-2. atomic_write_boundary_bytes, if non-zero,
>> indicates an LBA space boundary at which an atomic write straddles no
>> longer is atomically executed by the disk. atomic_write_max_bytes is the
>> maximum merged size for an atomic write. Often it will be the same value as
>> atomic_write_unit_max_bytes.
> 
> Instead of explaining sysfs outputs which are deriviatives of HW
> and request_queue limits (and also defined in Documentation), maybe we
> could explain how those sysfs values are derived instead -
> 
> struct queue_limits {
> <...>
> 	unsigned int		atomic_write_hw_max_sectors;
> 	unsigned int		atomic_write_max_sectors;
> 	unsigned int		atomic_write_hw_boundary_sectors;
> 	unsigned int		atomic_write_hw_unit_min_sectors;
> 	unsigned int		atomic_write_unit_min_sectors;
> 	unsigned int		atomic_write_hw_unit_max_sectors;
> 	unsigned int		atomic_write_unit_max_sectors;
> <...>
> 
> 1. atomic_write_unit_hw_max_sectors comes directly from hw and it need
> not be a power of 2.
> 
> 2. atomic_write_hw_unit_min_sectors and atomic_write_hw_unit_max_sectors
> is again defined/derived from hw limits, but it is rounded down so that
> it is always a power of 2.
> 
> 3. atomic_write_hw_boundary_sectors again comes from HW boundary limit.
> It could either be 0 (which means the device specify no boundary limit) or a
> multiple of unit_max. It need not be power of 2, however the current
> code assumes it to be a power of 2 (check callers of blk_queue_atomic_write_boundary_bytes())
> 
> 4. atomic_write_max_sectors, atomic_write_unit_min_sectors
> and atomic_write_unit_max_sectors are all derived out of above hw limits
> inside function blk_atomic_writes_update_limits() based on request_queue
> limits.
>      a. atomic_write_max_sectors is derived from atomic_write_hw_unit_max_sectors and
>         request_queue's max_hw_sectors limit. It also guarantees max
>         sectors that can be fit in a single bio.
>      b. atomic_write_unit_[min|max]_sectors are derived from atomic_write_hw_unit_[min|max]_sectors,
>         request_queue's max_hw_sectors & blk_queue_max_guaranteed_bio_sectors(). Both of these limits
>         are kept as a power of 2.
> 
> Now coming to sysfs outputs -
> 1. atomic_write_unit_max_bytes: Same as atomic_write_unix_max_sectors in bytes
> 2. atomic_write_unit_min_bytes: Same as atomic_write_unit_min_sectors in bytes
> 3. atomic_write_boundary_bytes: same as atomic_write_hw_boundary_sectors
> in bytes
> 4. atomic_write_max_bytes: Same as atomic_write_max_sectors in bytes
> 

ok, I can look to incorporate the advised formatting changes

>>
>> atomic_write_unit_max_bytes is capped at the maximum data size which we are
>> guaranteed to be able to fit in a BIO, as an atomic write must always be
>> submitted as a single BIO. This BIO max size is dictated by the number of
> 
> Here it says that the atomic write must always be submitted as a single
> bio. From where to where?

submitted to the block layer/core

> I think you meant from FS to block layer.

sure, or also block device file operations (in fops.c) to block core

> Because otherwise we still allow request/bio merging inside block layer
> based on the request queue limits we defined above. i.e. bio can be
> chained to form
>        rq->biotail->bi_next = next_rq->bio
> as long as the merged requests is within the queue_limits.
> 
> i.e. atomic write requests can be merged as long as -
>      - both rqs have REQ_ATOMIC set
>      - blk_rq_sectors(final_rq) <= q->limits.atomic_write_max_sectors
>      - final rq formed should not straddle limits->atomic_write_hw_boundary_sectors
> 
> However, splitting of an atomic write requests is not allowed. And if it
> happens, we fail the I/O req & return -EINVAL.

...

> 
> IMHO, the commit message can definitely use a re-write. I agree that you
> have put in a lot of information, but I think it can be more organized.#

ok, fine. I'll look at this. Thanks.

> 
>>
>> Contains significant contributions from:
>> Himanshu Madhani <[email protected]>
> 
> Myabe it can use a better tag then.
> "Documentation/process/submitting-patches.rst"

ok

> 
>>
>> Signed-off-by: John Garry <[email protected]>
>> ---
>>   Documentation/ABI/stable/sysfs-block |  52 ++++++++++++++
>>   block/blk-merge.c                    |  91 ++++++++++++++++++++++-
>>   block/blk-settings.c                 | 103 +++++++++++++++++++++++++++
>>   block/blk-sysfs.c                    |  33 +++++++++
>>   block/blk.h                          |   3 +
>>   include/linux/blk_types.h            |   2 +
>>   include/linux/blkdev.h               |  60 ++++++++++++++++
>>   7 files changed, 343 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/ABI/stable/sysfs-block b/Documentation/ABI/stable/sysfs-block
>> index 1fe9a553c37b..4c775f4bdefe 100644
>> --- a/Documentation/ABI/stable/sysfs-block
>> +++ b/Documentation/ABI/stable/sysfs-block
>> @@ -21,6 +21,58 @@ Description:
>>   		device is offset from the internal allocation unit's
>>   		natural alignment.

...

>>   
> 
> /* A comment explaining this function and arguments could be helpful */

already addressed according to earlier review

> 
>> +static bool rq_straddles_atomic_write_boundary(struct request *rq,
>> +					unsigned int front,
>> +					unsigned int back)
> 
> A better naming perhaps be start_adjust, end_adjust?

ok

> 
>> +{
>> +	unsigned int boundary = queue_atomic_write_boundary_bytes(rq->q);
>> +	unsigned int mask, imask;
>> +	loff_t start, end;
> 
> start_rq_pos, end_rq_pos maybe?

ok

> 
>> +
>> +	if (!boundary)
>> +		return false;
>> +
>> +	start = rq->__sector << SECTOR_SHIFT;
> 
> blk_rq_pos(rq) perhaps?

ok

> 
>> +	end = start + rq->__data_len;
> 
> blk_rq_bytes(rq) perhaps? It should be..

ok

>> +
>> +	start -= front;
>> +	end += back;
>> +
>> +	/* We're longer than the boundary, so must be crossing it */
>> +	if (end - start > boundary)
>> +		return true;
>> +
>> +	mask = boundary - 1;
>> +
>> +	/* start/end are boundary-aligned, so cannot be crossing */
>> +	if (!(start & mask) || !(end & mask))
>> +		return false;
>> +
>> +	imask = ~mask;
>> +
>> +	/* Top bits are different, so crossed a boundary */
>> +	if ((start & imask) != (end & imask))
>> +		return true;
> 
> The last condition looks wrong. Shouldn't it be end - 1?
> 
>> +
>> +	return false;
>> +}
> 
> Can we do something like this?
> 
> static bool rq_straddles_atomic_write_boundary(struct request *rq,
> 					       unsigned int start_adjust,
> 					       unsigned int end_adjust)
> {
> 	unsigned int boundary = queue_atomic_write_boundary_bytes(rq->q);
> 	unsigned long boundary_mask;
> 	unsigned long start_rq_pos, end_rq_pos;
> 
> 	if (!boundary)
> 		return false;
> 
> 	start_rq_pos = blk_rq_pos(rq) << SECTOR_SHIFT;
> 	end_rq_pos = start_rq_pos + blk_rq_bytes(rq);
> 
> 	start_rq_pos -= start_adjust;
> 	end_rq_pos += end_adjust;
> 
> 	boundary_mask = boundary - 1;
> 
> 	if ((start_rq_pos | boundary_mask) != (end_rq_pos | boundary_mask))
> 		return true;
> 
> 	return false;
> }
> 
> I was thinking this check should cover all cases? Thoughts?

that looks ok (apart from issue already detected later). It is quite 
similar to how I coded it in the NVMe driver, apart from the initial > 
boundary check.

>> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
>> index f288c94374b3..cd7cceb8565d 100644
>> --- a/include/linux/blk_types.h
>> +++ b/include/linux/blk_types.h
>> @@ -422,6 +422,7 @@ enum req_flag_bits {
>>   	__REQ_DRV,		/* for driver use */
>>   	__REQ_FS_PRIVATE,	/* for file system (submitter) use */
>>
>> +	__REQ_ATOMIC,		/* for atomic write operations */
>>   	/*
>>   	 * Command specific flags, keep last:
>>   	 */
>> @@ -448,6 +449,7 @@ enum req_flag_bits {
>>   #define REQ_RAHEAD	(__force blk_opf_t)(1ULL << __REQ_RAHEAD)
>>   #define REQ_BACKGROUND	(__force blk_opf_t)(1ULL << __REQ_BACKGROUND)
>>   #define REQ_NOWAIT	(__force blk_opf_t)(1ULL << __REQ_NOWAIT)
>> +#define REQ_ATOMIC	(__force blk_opf_t)(1ULL << __REQ_ATOMIC)
> 
> Let's add this in the same order as of __REQ_ATOMIC i.e. after
> REQ_FS_PRIVATE macro

ok, fine

 >> @@ -299,6 +299,14 @@ struct queue_limits {
 >>   	unsigned int		discard_alignment;
 >>   	unsigned int		zone_write_granularity;
 >>
 >> +	unsigned int		atomic_write_hw_max_sectors;
 >> +	unsigned int		atomic_write_max_sectors;
 >> +	unsigned int		atomic_write_hw_boundary_sectors;
 >> +	unsigned int		atomic_write_hw_unit_min_sectors;
 >> +	unsigned int		atomic_write_unit_min_sectors;
 >> +	unsigned int		atomic_write_hw_unit_max_sectors;
 >> +	unsigned int		atomic_write_unit_max_sectors;
 >> +
 > 1 liner comment for above members please?

ok


>> +static inline bool bdev_can_atomic_write(struct block_device *bdev)
>> +{
>> +	struct request_queue *bd_queue = bdev->bd_queue;
>> +	struct queue_limits *limits = &bd_queue->limits;
>> +
>> +	if (!limits->atomic_write_unit_min_sectors)
>> +		return false;
>> +
>> +	if (bdev_is_partition(bdev)) {
>> +		sector_t bd_start_sect = bdev->bd_start_sect;
>> +		unsigned int granularity = max(
> 
> atomic_align perhaps?

or just "align"

> 
>> +				limits->atomic_write_unit_min_sectors,
>> +				limits->atomic_write_hw_boundary_sectors);
>> +		if (do_div(bd_start_sect, granularity))
>> +			return false;
>> +	}
> 
> since atomic_align is a power of 2. Why not use IS_ALIGNED()?
> (bitwise operation instead of div)?

already changed as advised

Thanks,
John


  parent reply	other threads:[~2024-02-26  9:24 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-19 13:00 [PATCH v4 00/11] block atomic writes John Garry
2024-02-19 13:00 ` [PATCH v4 01/11] block: Pass blk_queue_get_max_sectors() a request pointer John Garry
2024-02-19 18:57   ` Keith Busch
2024-02-19 13:01 ` [PATCH v4 02/11] block: Call blkdev_dio_unaligned() from blkdev_direct_IO() John Garry
2024-02-19 18:57   ` Keith Busch
2024-02-20  8:31     ` John Garry
2024-02-20  6:54   ` Christoph Hellwig
2024-02-19 13:01 ` [PATCH v4 03/11] fs: Initial atomic write support John Garry
2024-02-19 19:16   ` David Sterba
2024-02-20  8:13     ` John Garry
2024-02-19 22:44   ` Dave Chinner
2024-02-20  9:52     ` John Garry
2024-02-24 18:16   ` Ritesh Harjani
2024-02-24 18:20     ` Ritesh Harjani
2024-02-26  8:58       ` John Garry
2024-02-26  9:13         ` Ritesh Harjani
2024-02-26  9:46           ` John Garry
2024-02-26  8:51     ` John Garry
2024-02-19 13:01 ` [PATCH v4 04/11] fs: Add initial atomic write support info to statx John Garry
2024-02-19 22:28   ` Dave Chinner
2024-02-20  9:40     ` John Garry
2024-02-20  8:20   ` Christoph Hellwig
2024-02-20  9:01     ` John Garry
2024-02-24 18:46   ` Ritesh Harjani
2024-02-26  9:07     ` John Garry
2024-02-19 13:01 ` [PATCH v4 05/11] block: Add core atomic write support John Garry
2024-02-19 22:58   ` Dave Chinner
2024-02-20  8:22     ` Christoph Hellwig
2024-02-20 10:01     ` John Garry
2024-02-25 12:09   ` Ritesh Harjani
2024-02-25 12:21     ` Ritesh Harjani
2024-02-26  9:23     ` John Garry [this message]
2024-02-19 13:01 ` [PATCH v4 06/11] block: Add atomic write support for statx John Garry
2024-02-20  8:29   ` Christoph Hellwig
2024-02-20  9:35     ` John Garry
2024-02-25 14:20   ` Ritesh Harjani
2024-02-26  9:36     ` John Garry
2024-02-19 13:01 ` [PATCH v4 07/11] block: Add fops atomic write support John Garry
2024-02-25 14:46   ` Ritesh Harjani
2024-02-26  9:46     ` John Garry
2024-02-19 13:01 ` [PATCH v4 08/11] scsi: sd: Atomic " John Garry
2024-02-19 13:01 ` [PATCH v4 09/11] scsi: scsi_debug: " John Garry
2024-02-20  7:12   ` Ojaswin Mujoo
2024-02-20  9:01     ` John Garry
2024-02-19 13:01 ` [PATCH v4 10/11] nvme: " John Garry
2024-02-19 19:21   ` Keith Busch
2024-02-20  6:55     ` Christoph Hellwig
2024-02-20  8:19       ` John Garry
2024-02-20  8:31   ` Christoph Hellwig
2024-02-20  8:50     ` John Garry
2024-02-19 13:01 ` [PATCH v4 11/11] nvme: Ensure atomic writes will be executed atomically John Garry

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox