From: John Garry <[email protected]>
To: Matthew Wilcox <[email protected]>
Cc: [email protected], [email protected], [email protected], [email protected],
[email protected], [email protected],
[email protected], [email protected], [email protected],
[email protected], [email protected], [email protected],
[email protected], [email protected],
[email protected], [email protected], [email protected],
[email protected], [email protected],
[email protected], [email protected],
[email protected], [email protected],
[email protected]
Subject: Re: [PATCH v6 00/10] block atomic writes
Date: Fri, 5 Apr 2024 11:06:00 +0100
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>
On 04/04/2024 17:48, Matthew Wilcox wrote:
>>> The thing is that there's no requirement for an interface as complex as
>>> the one you're proposing here. I've talked to a few database people
>>> and all they want is to increase the untorn write boundary from "one
>>> disc block" to one database block, typically 8kB or 16kB.
>>>
>>> So they would be quite happy with a much simpler interface where they
>>> set the inode block size at inode creation time,
>> We want to support untorn writes for bdev file operations - how can we set
>> the inode block size there? Currently it is based on logical block size.
> ioctl(BLKBSZSET), I guess? That currently limits to PAGE_SIZE, but I
> think we can remove that limitation with the bs>PS patches.
We want a consistent interface for bdev and regular files, so that would
need to work for FSes also. FSes (XFS) work based on a homogeneous inode
block size, which is the SB block size.

Furthermore, we would seem to be mixing different concepts here.
Currently in Linux we say that a logical block size (LBS) write is atomic.
In the block layer, we split BIOs on LBS boundaries. iomap creates BIOs
based on LBS boundaries. But writing a FS block is not always guaranteed
to be atomic, as far as I'm concerned. So just increasing the inode
block size / FS block size does not really change anything, in itself.
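
To illustrate the point about LBS boundaries, here is a minimal sketch of
a simplified model. It assumes a write may be split (and so torn) at any
LBS boundary that falls strictly inside it. The function name and this
model are hypothetical and for illustration only; real split points also
depend on queue limits and are not computed this way in the kernel.

```python
def possible_tear_points(offset, length, lbs):
    """Byte offsets strictly inside [offset, offset + length) that sit on a
    logical-block-size boundary, i.e. where this simplified model allows
    a write to be split."""
    if lbs <= 0 or offset < 0 or length < 0:
        raise ValueError("lbs must be positive; offset and length non-negative")
    end = offset + length
    # First LBS boundary strictly after the start of the write.
    first = (offset // lbs + 1) * lbs
    return list(range(first, end, lbs))


# A 16kB FS block written to a device with a 4kB LBS still contains
# three internal boundaries, whatever the FS block size is set to.
print(possible_tear_points(0, 16384, 4096))  # [4096, 8192, 12288]
# A write of exactly one logical block has no internal boundary.
print(possible_tear_points(0, 4096, 4096))   # []
```

Under this model, only the LBS decides where a tear can occur, which is
why a larger FS block size alone gives no untorn guarantee.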
>
>>> and then all writes to
>>> that inode were guaranteed to be untorn. This would also be simpler to
>>> implement for buffered writes.
>> We did consider that. Won't that lead to the possibility of breaking
>> existing applications which want to do regular unaligned writes to these
>> files? We do know that mysql/innodb does have some "compressed" mode of
>> operation, which involves regular writes to the same file which wants untorn
>> writes.
> If you're talking about "regular unaligned buffered writes", then that
> won't break. If you cross a folio boundary, the result may be torn,
> but if you're crossing a block boundary you expect that.
>
>> Furthermore, untorn writes in HW are expensive - for SCSI anyway. Do we
>> always want these for such a file?
> Do untorn writes actually exist in SCSI? I was under the impression
> nobody had actually implemented them in SCSI hardware.
I know that some SCSI targets actually atomically write data in chunks
larger than the LBS. Obviously atomic vs non-atomic performance is a moot
point there, as data is implicitly always atomically written.

We actually have a MySQL/InnoDB port of this API working on such a SCSI
target.

However, I am not sure about atomic write support for other SCSI targets.
>
>> We saw untorn writes as not being a property of the file or even the inode
>> itself, but rather an attribute of the specific IO being issued from the
>> userspace application.
> The problem is that keeping track of that is expensive for buffered
> writes. It's a model that only works for direct IO. Arguably we
> could make it work for O_SYNC buffered IO, but that'll require some
> surgery.
To me, O_ATOMIC would be required for buffered atomic write IO, as we
want a fixed-size IO, so that would mean no mixing of atomic and
non-atomic IO.
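
As a rough illustration of treating atomicity as a property of each IO,
the sketch below checks whether a single write would qualify as an atomic
write. The constraints used (power-of-two length, offset naturally
aligned to the length, length within a device-reported minimum and
maximum unit) are not stated in this message; they are assumptions for
illustration, as are the function and parameter names.

```python
def is_valid_atomic_write(offset, length, unit_min, unit_max):
    """Return True if a write at `offset` of `length` bytes meets the
    assumed per-IO constraints for an untorn write."""

    def is_power_of_two(n):
        return n > 0 and (n & (n - 1)) == 0

    return (
        is_power_of_two(length)               # assumed: power-of-two size
        and unit_min <= length <= unit_max    # assumed: within device limits
        and offset % length == 0              # assumed: naturally aligned
    )


# Hypothetical limits: 4kB minimum unit, 64kB maximum unit.
print(is_valid_atomic_write(0, 16384, 4096, 65536))     # True
print(is_valid_atomic_write(8192, 16384, 4096, 65536))  # False: not aligned
print(is_valid_atomic_write(0, 12288, 4096, 65536))     # False: not power of 2
```

With a per-IO check like this, regular unaligned writes to the same file
simply do not qualify and proceed as non-atomic writes.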
Thanks,
John