From: John Garry <[email protected]>
To: [email protected], [email protected], [email protected], [email protected],
[email protected], [email protected],
[email protected], [email protected], [email protected],
[email protected], [email protected]
Cc: [email protected], [email protected],
[email protected], [email protected],
[email protected], [email protected], [email protected],
[email protected], [email protected],
[email protected], [email protected],
[email protected], [email protected], [email protected],
Alan Adamson <[email protected]>,
John Garry <[email protected]>
Subject: [PATCH v7 9/9] nvme: Atomic write support
Date: Sun, 2 Jun 2024 14:09:12 +0000 [thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
From: Alan Adamson <[email protected]>
Add support to set block layer request_queue atomic write limits. The
limits will be derived from either the namespace or controller atomic
parameters.
NVMe atomic-related parameters are grouped into "normal" and "power-fail"
(or PF) class of parameter. For atomic write support, only PF parameters
are of interest. The "normal" parameters are concerned with racing reads
and writes (which also applies to PF). See NVM Command Set Specification
Revision 1.0d section 2.1.4 for reference.
Whether to use per namespace or controller atomic parameters is decided by
NSFEAT bit 1 - see Figure 97: Identify – Identify Namespace Data
Structure, NVM Command Set.
NVMe namespaces may define an atomic boundary, whereby no atomic guarantees
are provided for a write which straddles this per-lba space boundary. The
block layer merging policy is such that no merges may occur in which the
resultant request would straddle such a boundary.
Unlike SCSI, NVMe specifies no granularity or alignment rules, apart from
atomic boundary rule. In addition, again unlike SCSI, there is no
dedicated atomic write command - a write which adheres to the atomic size
limit and boundary is implicitly atomic.
If NSFEAT bit 1 is set, the following parameters are of interest:
- NAWUPF (Namespace Atomic Write Unit Power Fail)
- NABSPF (Namespace Atomic Boundary Size Power Fail)
- NABO (Namespace Atomic Boundary Offset)
and we set request_queue limits as follows:
- atomic_write_unit_max = rounddown_pow_of_two(NAWUPF)
- atomic_write_max_bytes = NAWUPF
- atomic_write_boundary = NABSPF
If in the unlikely scenario that NABO is non-zero, then atomic writes will
not be supported at all as dealing with this adds extra complexity. This
policy may change in future.
In all cases, atomic_write_unit_min is set to the logical block size.
If NSFEAT bit 1 is unset, the following parameter is of interest:
- AWUPF (Atomic Write Unit Power Fail)
and we set request_queue limits as follows:
- atomic_write_unit_max = rounddown_pow_of_two(AWUPF)
- atomic_write_max_bytes = AWUPF
- atomic_write_boundary = 0
A new function, nvme_valid_atomic_write(), is also called from submission
path to verify that a request has been submitted to the driver will
actually be executed atomically. As mentioned, there is no dedicated NVMe
atomic write command (which may error for a command which exceeds the
controller atomic write limits).
Note on NABSPF:
There seems to be some vagueness in the spec as to whether NABSPF applies
for NSFEAT bit 1 being unset. Figure 97 does not explicitly mention NABSPF
and how it is affected by bit 1. However Figure 4 does tell to check Figure
97 for info about per-namespace parameters, which NABSPF is, so it is
implied. However currently nvme_update_disk_info() does check namespace
parameter NABO regardless of this bit.
Signed-off-by: Alan Adamson <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
jpg: total rewrite
Signed-off-by: John Garry <[email protected]>
---
drivers/nvme/host/core.c | 49 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 49 insertions(+)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index f5d150c62955..91001892f60b 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -927,6 +927,30 @@ static inline blk_status_t nvme_setup_write_zeroes(struct nvme_ns *ns,
return BLK_STS_OK;
}
+static bool nvme_valid_atomic_write(struct request *req)
+{
+ struct request_queue *q = req->q;
+ u32 boundary_bytes = queue_atomic_write_boundary_bytes(q);
+
+ if (blk_rq_bytes(req) > queue_atomic_write_unit_max_bytes(q))
+ return false;
+
+ if (boundary_bytes) {
+ u64 mask = boundary_bytes - 1, imask = ~mask;
+ u64 start = blk_rq_pos(req) << SECTOR_SHIFT;
+ u64 end = start + blk_rq_bytes(req) - 1;
+
+ /* If greater then must be crossing a boundary */
+ if (blk_rq_bytes(req) > boundary_bytes)
+ return false;
+
+ if ((start & imask) != (end & imask))
+ return false;
+ }
+
+ return true;
+}
+
static inline blk_status_t nvme_setup_rw(struct nvme_ns *ns,
struct request *req, struct nvme_command *cmnd,
enum nvme_opcode op)
@@ -941,6 +965,12 @@ static inline blk_status_t nvme_setup_rw(struct nvme_ns *ns,
if (req->cmd_flags & REQ_RAHEAD)
dsmgmt |= NVME_RW_DSM_FREQ_PREFETCH;
+ /*
+ * Ensure that nothing has been sent which cannot be executed
+ * atomically.
+ */
+ if (req->cmd_flags & REQ_ATOMIC && !nvme_valid_atomic_write(req))
+ return BLK_STS_INVAL;
cmnd->rw.opcode = op;
cmnd->rw.flags = 0;
@@ -1921,6 +1951,23 @@ static void nvme_configure_metadata(struct nvme_ctrl *ctrl,
}
}
+
+static void nvme_update_atomic_write_disk_info(struct nvme_ns *ns,
+ struct nvme_id_ns *id, struct queue_limits *lim,
+ u32 bs, u32 atomic_bs)
+{
+ unsigned int boundary = 0;
+
+ if (id->nsfeat & NVME_NS_FEAT_ATOMICS && id->nawupf) {
+ if (le16_to_cpu(id->nabspf))
+ boundary = (le16_to_cpu(id->nabspf) + 1) * bs;
+ }
+ lim->atomic_write_hw_max = atomic_bs;
+ lim->atomic_write_hw_boundary = boundary;
+ lim->atomic_write_hw_unit_min = bs;
+ lim->atomic_write_hw_unit_max = rounddown_pow_of_two(atomic_bs);
+}
+
static u32 nvme_max_drv_segments(struct nvme_ctrl *ctrl)
{
return ctrl->max_hw_sectors / (NVME_CTRL_PAGE_SIZE >> SECTOR_SHIFT) + 1;
@@ -1967,6 +2014,8 @@ static bool nvme_update_disk_info(struct nvme_ns *ns, struct nvme_id_ns *id,
atomic_bs = (1 + le16_to_cpu(id->nawupf)) * bs;
else
atomic_bs = (1 + ns->ctrl->subsys->awupf) * bs;
+
+ nvme_update_atomic_write_disk_info(ns, id, lim, bs, atomic_bs);
}
if (id->nsfeat & NVME_NS_FEAT_IO_OPT) {
--
2.31.1
next prev parent reply other threads:[~2024-06-02 14:12 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-02 14:09 [PATCH v7 0/9] block atomic writes John Garry
2024-06-02 14:09 ` [PATCH v7 1/9] block: Pass blk_queue_get_max_sectors() a request pointer John Garry
2024-06-02 14:09 ` [PATCH v7 2/9] fs: Initial atomic write support John Garry
2024-06-05 8:30 ` Christoph Hellwig
2024-06-05 10:48 ` John Garry
2024-06-06 5:41 ` Christoph Hellwig
2024-06-06 6:38 ` John Garry
2024-06-02 14:09 ` [PATCH v7 3/9] fs: Add initial atomic write support info to statx John Garry
2024-06-02 14:09 ` [PATCH v7 4/9] block: Add core atomic write support John Garry
2024-06-03 9:26 ` Hannes Reinecke
2024-06-03 11:38 ` John Garry
2024-06-03 12:31 ` Hannes Reinecke
2024-06-03 13:29 ` John Garry
2024-06-05 8:32 ` Christoph Hellwig
2024-06-05 11:21 ` John Garry
2024-06-06 5:44 ` Christoph Hellwig
2024-06-05 8:31 ` Christoph Hellwig
2024-06-02 14:09 ` [PATCH v7 5/9] block: Add atomic write support for statx John Garry
2024-06-02 14:09 ` [PATCH v7 6/9] block: Add fops atomic write support John Garry
2024-06-02 14:09 ` [PATCH v7 7/9] scsi: sd: Atomic " John Garry
2024-06-02 14:09 ` [PATCH v7 8/9] scsi: scsi_debug: " John Garry
2024-06-02 14:09 ` John Garry [this message]
2024-06-07 6:16 ` [PATCH v7 0/9] block atomic writes John Garry
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox