public inbox for [email protected]
 help / color / mirror / Atom feed
From: Ming Lei <[email protected]>
To: Pavel Begunkov <[email protected]>
Cc: Jens Axboe <[email protected]>,
	[email protected], Conrad Meyer <[email protected]>,
	[email protected], [email protected],
	Jan Kara <[email protected]>,
	Shin'ichiro Kawasaki <[email protected]>
Subject: Re: [RFC 5/5] block: implement io_uring discard cmd
Date: Wed, 21 Aug 2024 10:55:06 +0800	[thread overview]
Message-ID: <ZsVXClra11+yLjss@fedora> (raw)
In-Reply-To: <[email protected]>

On Tue, Aug 20, 2024 at 06:19:00PM +0100, Pavel Begunkov wrote:
> On 8/20/24 17:30, Jens Axboe wrote:
> > On 8/19/24 8:36 PM, Ming Lei wrote:
> > > On Mon, Aug 19, 2024 at 02:01:21PM -0600, Jens Axboe wrote:
> > > > On 8/15/24 7:45 PM, Ming Lei wrote:
> ...
> > > > > Meantime the handling has to move to io-wq for avoiding to block current
> > > > > context, the interface becomes same with IORING_OP_FALLOCATE?
> > > > 
> > > > I think the current truncate is overkill, we should be able to get by
> > > > without. And no, I will not entertain an option that's "oh just punt it
> > > > to io-wq".
> > > 
> > > BTW, the truncate is added by 351499a172c0 ("block: Invalidate cache on discard v2"),
> > > and block/009 serves as regression test for covering page cache
> > > coherency and discard.
> > > 
> > > Here the issue is actually related with the exclusive lock of
> > > filemap_invalidate_lock(). IMO, it is reasonable to prevent page read during
> > > discard for not polluting page cache. block/009 may fail too without the lock.
> > > 
> > > It is just that concurrent discards can't be allowed any more by
> > > down_write() of rw_semaphore, and block device is really capable of doing
> > > that. It can be thought as one regression of 7607c44c157d ("block: Hold invalidate_lock in
> > > BLKDISCARD ioctl").
> > > 
> > > Cc Jan Kara and Shin'ichiro Kawasaki.
> > 
> > Honestly I just think that's nonsense. It's like mixing direct and
> > buffered writes. Can you get corruption? Yes you most certainly can.
> > There should be no reason why we can't run discards without providing
> > page cache coherency. The sync interface attempts to do that, but that
> > doesn't mean that an async (or a different sync one, if that made sense)
> > should.
> 
> I don't see it as a problem either, it's a new interface, just need
> to be upfront on what guarantees it provides (one more reason why
> not fallocate), I'll elaborate on it in the commit message and so.

Fair enough.

> 
> I think a reasonable thing to do is to have one rule for all write-like
> operations starting from plain writes, which is currently allowing races
> to happen and shift it to the user. Purely in theory we can get inventive
> with likes of range lock trees, but that's unwarranted for all sorts of
> reasons.
> 
> > If you do discards to the same range as you're doing buffered IO, you
> > get to keep both potentially pieces. Fact is that most folks are doing
> > dio for performant IO exactly because buffered writes tend to be
> > horrible, and you could certainly use that with async discards and have
> > the application manage it just fine.
> > 
> > So I really think any attempts to provide page cache synchronization for
> > this is futile. And the existing sync one looks pretty abysmal, but it
> > doesn't really matter as it's a sync interfce. If one were to do
> 
> It should be a pain for sync as well, you can't even spin another process
> and parallelise this way.

Yes, this way has degraded some sync discard workloads perf a lot.

Thanks,
Ming


  reply	other threads:[~2024-08-21  2:55 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-14 10:45 [RFC 0/5] implement asynchronous BLKDISCARD via io_uring Pavel Begunkov
2024-08-14 10:45 ` [RFC 1/5] io_uring/cmd: expose iowq to cmds Pavel Begunkov
2024-08-14 10:45 ` [RFC 2/5] io_uring/cmd: give inline space in request " Pavel Begunkov
2024-08-14 10:45 ` [RFC 3/5] filemap: introduce filemap_invalidate_pages Pavel Begunkov
2024-08-14 10:45 ` [RFC 4/5] block: introduce blk_validate_discard() Pavel Begunkov
2024-08-14 10:45 ` [RFC 5/5] block: implement io_uring discard cmd Pavel Begunkov
2024-08-15  1:42   ` Ming Lei
2024-08-15 14:33     ` Jens Axboe
2024-08-15 17:11       ` Pavel Begunkov
2024-08-15 23:44         ` Ming Lei
2024-08-16  1:24           ` Jens Axboe
2024-08-16  1:45             ` Ming Lei
2024-08-16  1:59               ` Pavel Begunkov
2024-08-16  2:08                 ` Ming Lei
2024-08-16  2:16                   ` Pavel Begunkov
2024-08-19 20:02                     ` Jens Axboe
2024-08-19 20:01               ` Jens Axboe
2024-08-20  2:36                 ` Ming Lei
2024-08-20 16:30                   ` Jens Axboe
2024-08-20 17:19                     ` Pavel Begunkov
2024-08-21  2:55                       ` Ming Lei [this message]
2024-08-15 14:42   ` Jens Axboe
2024-08-15 15:50 ` [RFC 0/5] implement asynchronous BLKDISCARD via io_uring Jens Axboe
2024-08-15 17:26   ` Pavel Begunkov
2024-08-15 16:15 ` Martin K. Petersen
2024-08-15 17:12   ` Pavel Begunkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZsVXClra11+yLjss@fedora \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox