public inbox for [email protected]
 help / color / mirror / Atom feed
From: Dave Chinner <[email protected]>
To: Matthew Wilcox <[email protected]>
Cc: John Garry <[email protected]>,
	[email protected], [email protected], [email protected], [email protected],
	[email protected], [email protected],
	[email protected], [email protected], [email protected],
	[email protected], [email protected], [email protected],
	[email protected], [email protected],
	[email protected], [email protected], [email protected],
	[email protected], [email protected],
	[email protected], [email protected],
	[email protected], [email protected],
	[email protected]
Subject: Re: [PATCH v6 00/10] block atomic writes
Date: Thu, 28 Mar 2024 07:31:45 +1100	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>

On Wed, Mar 27, 2024 at 03:50:07AM +0000, Matthew Wilcox wrote:
> On Tue, Mar 26, 2024 at 01:38:03PM +0000, John Garry wrote:
> > The goal here is to provide an interface that allows applications use
> > application-specific block sizes larger than logical block size
> > reported by the storage device or larger than filesystem block size as
> > reported by stat().
> > 
> > With this new interface, application blocks will never be torn or
> > fractured when written. For a power fail, for each individual application
> > block, all or none of the data to be written. A racing atomic write and
> > read will mean that the read sees all the old data or all the new data,
> > but never a mix of old and new.
> > 
> > Three new fields are added to struct statx - atomic_write_unit_min,
> > atomic_write_unit_max, and atomic_write_segments_max. For each atomic
> > individual write, the total length of a write must be a between
> > atomic_write_unit_min and atomic_write_unit_max, inclusive, and a
> > power-of-2. The write must also be at a natural offset in the file
> > wrt the write length. For pwritev2, iovcnt is limited by
> > atomic_write_segments_max.
> > 
> > There has been some discussion on supporting buffered IO and whether the
> > API is suitable, like:
> > https://lore.kernel.org/linux-nvme/[email protected]/
> > 
> > Specifically the concern is that supporting a range of sizes of atomic IO
> > in the pagecache is complex to support. For this, my idea is that FSes can
> > fix atomic_write_unit_min and atomic_write_unit_max at the same size, the
> > extent alignment size, which should be easier to support. We may need to
> > implement O_ATOMIC to avoid mixing atomic and non-atomic IOs for this. I
> > have no proposed solution for atomic write buffered IO for bdev file
> > operations, but I know of no requirement for this.
> 
> The thing is that there's no requirement for an interface as complex as
> the one you're proposing here.  I've talked to a few database people
> and all they want is to increase the untorn write boundary from "one
> disc block" to one database block, typically 8kB or 16kB.
> 
> So they would be quite happy with a much simpler interface where they
> set the inode block size at inode creation time, and then all writes to
> that inode were guaranteed to be untorn.  This would also be simpler to
> implement for buffered writes.

You're conflating filesystem functionality that applications will use
with hardware and block-layer enablement that filesystems and
filesystem utilities need to configure the filesystem in ways that
allow users to make use of atomic write capability of the hardware.

The block layer functionality needs to export everything that the
hardware can do and filesystems will make use of. The actual
application usage and setup of atomic writes at the filesystem/page
cache layer is a separate problem.  i.e. The block layer interfaces
need only support direct IO and expose limits for issuing atomic
direct IO, and nothing more. All the more complex stuff to make it
"easy to use" is filesystem level functionality and completely
outside the scope of this patchset....

-Dave.
-- 
Dave Chinner
[email protected]

  parent reply	other threads:[~2024-03-27 20:31 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-26 13:38 [PATCH v6 00/10] block atomic writes John Garry
2024-03-26 13:38 ` [PATCH v6 01/10] block: Pass blk_queue_get_max_sectors() a request pointer John Garry
2024-04-10 22:58   ` Luis Chamberlain
2024-03-26 13:38 ` [PATCH v6 02/10] block: Call blkdev_dio_unaligned() from blkdev_direct_IO() John Garry
2024-04-10 22:53   ` Luis Chamberlain
2024-04-11  8:06     ` John Garry
2024-03-26 13:38 ` [PATCH v6 03/10] fs: Initial atomic write support John Garry
2024-03-26 13:38 ` [PATCH v6 04/10] fs: Add initial atomic write support info to statx John Garry
2024-03-26 13:38 ` [PATCH v6 05/10] block: Add core atomic write support John Garry
2024-03-26 17:11   ` Randy Dunlap
2024-04-10 23:34     ` Luis Chamberlain
2024-04-11  8:15       ` John Garry
2024-03-26 13:38 ` [PATCH v6 06/10] block: Add atomic write support for statx John Garry
2024-03-26 13:38 ` [PATCH v6 07/10] block: Add fops atomic write support John Garry
2024-03-26 13:38 ` [PATCH v6 08/10] scsi: sd: Atomic " John Garry
2024-03-26 13:38 ` [PATCH v6 09/10] scsi: scsi_debug: " John Garry
2024-03-26 13:38 ` [PATCH v6 10/10] nvme: " John Garry
2024-04-11  0:29   ` Luis Chamberlain
2024-04-11  8:59     ` John Garry
2024-04-11 16:22       ` Luis Chamberlain
2024-04-11 23:32         ` Dan Helmick
2024-03-27  3:50 ` [PATCH v6 00/10] block atomic writes Matthew Wilcox
2024-03-27 13:37   ` John Garry
2024-04-04 16:48     ` Matthew Wilcox
2024-04-05 10:06       ` John Garry
2024-04-08 17:50         ` Luis Chamberlain
2024-04-10  4:05           ` Matthew Wilcox
2024-04-10  6:20             ` Hannes Reinecke
2024-04-11  0:38               ` Luis Chamberlain
2024-04-14 20:50             ` Luis Chamberlain
2024-04-15 21:18               ` Matthew Wilcox
2024-04-16 21:11                 ` Luis Chamberlain
2024-04-10  8:34           ` John Garry
2024-04-11 19:07             ` Luis Chamberlain
2024-04-12  8:15               ` John Garry
2024-04-12 18:28                 ` Luis Chamberlain
2024-03-27 20:31   ` Dave Chinner [this message]
2024-04-05 10:20     ` Kent Overstreet
2024-04-05 10:55       ` John Garry
2024-04-05  6:14   ` Kent Overstreet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox