public inbox for [email protected]
 help / color / mirror / Atom feed
From: Matthew Wilcox <[email protected]>
To: Luis Chamberlain <[email protected]>
Cc: "John Garry" <[email protected]>,
	"Pankaj Raghav" <[email protected]>,
	"Daniel Gomez" <[email protected]>,
	"Javier González" <[email protected]>,
	[email protected], [email protected], [email protected], [email protected],
	[email protected], [email protected],
	[email protected], [email protected], [email protected],
	[email protected], [email protected], [email protected],
	[email protected], [email protected],
	[email protected], [email protected], [email protected],
	[email protected], [email protected],
	[email protected], [email protected],
	[email protected], [email protected],
	[email protected]
Subject: Re: [PATCH v6 00/10] block atomic writes
Date: Wed, 10 Apr 2024 05:05:20 +0100	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>

On Mon, Apr 08, 2024 at 10:50:47AM -0700, Luis Chamberlain wrote:
> On Fri, Apr 05, 2024 at 11:06:00AM +0100, John Garry wrote:
> > On 04/04/2024 17:48, Matthew Wilcox wrote:
> > > > > The thing is that there's no requirement for an interface as complex as
> > > > > the one you're proposing here.  I've talked to a few database people
> > > > > and all they want is to increase the untorn write boundary from "one
> > > > > disc block" to one database block, typically 8kB or 16kB.
> > > > > 
> > > > > So they would be quite happy with a much simpler interface where they
> > > > > set the inode block size at inode creation time,
> > > > We want to support untorn writes for bdev file operations - how can we set
> > > > the inode block size there? Currently it is based on logical block size.
> > > ioctl(BLKBSZSET), I guess?  That currently limits to PAGE_SIZE, but I
> > > think we can remove that limitation with the bs>PS patches.
> 
> I can say a bit more on this, as I explored that. Essentially Matthew,
> yes, I got that to work but it requires a set of different patches. We have
> what we tried and then based on feedback from Chinner we have a
> direction on what to try next. The last effort on that front was having the
> iomap aops for bdev be used and lifting the PAGE_SIZE limit up to the
> page cache limits. The crux on that front was that we end requiring
> disabling BUFFER_HEAD and that is pretty limitting, so my old
> implementation had dynamic aops so to let us use the buffer-head aops
> only when using filesystems which require it and use iomap aops
> otherwise. But as Chinner noted we learned through the DAX experience
> that's not a route we want to again try, so the real solution is to
> extend iomap bdev aops code with buffer-head compatibility.

Have you tried just using the buffer_head code?  I think you heard bad
advice at last LSFMM.  Since then I've landed a bunch of patches which
remove PAGE_SIZE assumptions throughout the buffer_head code, and while
I haven't tried it, it might work.  And it might be easier to make work
than adding more BH hacks to the iomap code.

A quick audit for problems ...

__getblk_slow:
       if (unlikely(size & (bdev_logical_block_size(bdev)-1) ||
                        (size < 512 || size > PAGE_SIZE))) {

cont_expand_zero (not used by bdev code)
cont_write_begin (ditto)

That's all I spot from a quick grep for PAGE, offset_in_page() and kmap.

You can't do a lot of buffer_heads per folio, because you'll overrun
        struct buffer_head *bh, *head, *arr[MAX_BUF_PER_PAGE];
in block_read_full_folio(), but you can certainly do _one_ buffer_head
per folio, and that's all you need for bs>PS.

> I suspect this is a use case where perhaps the max folio order could be
> set for the bdev in the future, the logical block size the min order,
> and max order the large atomic.

No, that's not what we want to do at all!  Minimum writeback size needs
to be the atomic size, otherwise we have to keep track of which writes
are atomic and which ones aren't.  So, just set the logical block size
to the atomic size, and we're done.


  reply	other threads:[~2024-04-10  4:05 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-26 13:38 [PATCH v6 00/10] block atomic writes John Garry
2024-03-26 13:38 ` [PATCH v6 01/10] block: Pass blk_queue_get_max_sectors() a request pointer John Garry
2024-04-10 22:58   ` Luis Chamberlain
2024-03-26 13:38 ` [PATCH v6 02/10] block: Call blkdev_dio_unaligned() from blkdev_direct_IO() John Garry
2024-04-10 22:53   ` Luis Chamberlain
2024-04-11  8:06     ` John Garry
2024-03-26 13:38 ` [PATCH v6 03/10] fs: Initial atomic write support John Garry
2024-03-26 13:38 ` [PATCH v6 04/10] fs: Add initial atomic write support info to statx John Garry
2024-03-26 13:38 ` [PATCH v6 05/10] block: Add core atomic write support John Garry
2024-03-26 17:11   ` Randy Dunlap
2024-04-10 23:34     ` Luis Chamberlain
2024-04-11  8:15       ` John Garry
2024-03-26 13:38 ` [PATCH v6 06/10] block: Add atomic write support for statx John Garry
2024-03-26 13:38 ` [PATCH v6 07/10] block: Add fops atomic write support John Garry
2024-03-26 13:38 ` [PATCH v6 08/10] scsi: sd: Atomic " John Garry
2024-03-26 13:38 ` [PATCH v6 09/10] scsi: scsi_debug: " John Garry
2024-03-26 13:38 ` [PATCH v6 10/10] nvme: " John Garry
2024-04-11  0:29   ` Luis Chamberlain
2024-04-11  8:59     ` John Garry
2024-04-11 16:22       ` Luis Chamberlain
2024-04-11 23:32         ` Dan Helmick
2024-03-27  3:50 ` [PATCH v6 00/10] block atomic writes Matthew Wilcox
2024-03-27 13:37   ` John Garry
2024-04-04 16:48     ` Matthew Wilcox
2024-04-05 10:06       ` John Garry
2024-04-08 17:50         ` Luis Chamberlain
2024-04-10  4:05           ` Matthew Wilcox [this message]
2024-04-10  6:20             ` Hannes Reinecke
2024-04-11  0:38               ` Luis Chamberlain
2024-04-14 20:50             ` Luis Chamberlain
2024-04-15 21:18               ` Matthew Wilcox
2024-04-16 21:11                 ` Luis Chamberlain
2024-04-10  8:34           ` John Garry
2024-04-11 19:07             ` Luis Chamberlain
2024-04-12  8:15               ` John Garry
2024-04-12 18:28                 ` Luis Chamberlain
2024-03-27 20:31   ` Dave Chinner
2024-04-05 10:20     ` Kent Overstreet
2024-04-05 10:55       ` John Garry
2024-04-05  6:14   ` Kent Overstreet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox