From: Matthew Wilcox <[email protected]>
To: Dave Chinner <[email protected]>
Cc: Keith Busch <[email protected]>,
	[email protected], [email protected],
	[email protected], [email protected],
	[email protected], [email protected],
	Alexander Viro <[email protected]>,
	Kernel Team <[email protected]>, Keith Busch <[email protected]>
Subject: Re: [PATCHv3 2/7] file: add ops to dma map bvec
Date: Mon, 8 Aug 2022 03:49:09 +0100
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>

On Mon, Aug 08, 2022 at 12:15:01PM +1000, Dave Chinner wrote:
> On Mon, Aug 08, 2022 at 02:13:41AM +0100, Matthew Wilcox wrote:
> > On Mon, Aug 08, 2022 at 10:21:24AM +1000, Dave Chinner wrote:
> > > > +#ifdef CONFIG_HAS_DMA
> > > > +	void *(*dma_map)(struct file *, struct bio_vec *, int);
> > > > +	void (*dma_unmap)(struct file *, void *);
> > > > +#endif
> > > 
> > > This just smells wrong. Using a block layer specific construct as a
> > > primary file operation parameter shouts "layering violation" to me.
> > 
> > A bio_vec is also used for networking; it's in disguise as an skb_frag,
> > but it's there.
> 
> Which is just as awful. Just because it's done somewhere else
> doesn't make it right.
> 
> > > What we really need is a callout that returns the bdevs that the
> > > struct file is mapped to (one, or many), so the caller can then map
> > > the memory addresses to the block devices itself. The caller then
> > > needs to do an {file, offset, len} -> {bdev, sector, count}
> > > translation so the io_uring code can then use the correct bdev and
> > > dma mappings for the file offset that the user is doing IO to/from.
> > 
> > I don't even know if what you're proposing is possible.  Consider a
> > network filesystem which might transparently be moved from one network
> > interface to another.  I don't even know if the filesystem would know
> > which network device is going to be used for the IO at the time of
> > IO submission.
> 
> Sure, but nobody is suggesting we support direct DMA buffer mapping
> and reuse for network devices right now, whereas we have working
> code for block devices in front of us.

But we have working code already (merged) in the networking layer for
reusing pages that are mapped to particular devices.

> What I want to see is broad-based generic block device based
> filesystem support, not niche functionality that can only work on a
> single type of block device. Network filesystems and devices are a
> *long* way from being able to do anything like this, so I don't see
> a need to cater for them at this point in time.
> 
> When someone has a network device abstraction and network filesystem
> that can do direct data placement based on that device abstraction,
> then we can talk about the high level interface we should use to
> drive it....
> 
> > I think a totally different model is needed where we can find out if
> > the bvec contains pages which are already mapped to the device, and map
> > them if they aren't.  That also handles a DM case where extra devices
> > are hot-added to a RAID, for example.
> 
> I cannot form a picture of what you are suggesting from such a brief
> description. Care to explain in more detail?

Let's suppose you have a RAID-5 of NVMe devices.  One fails and now
the RAID-5 is operating in degraded mode.  So you hot-unplug the failed
device, plug in a new NVMe drive and add it to the RAID.  The pages now
need to be DMA mapped to that new PCI device.

What I'm saying is that the set of devices that the pages need to be
mapped to is not static and cannot be known at "setup time", even given
the additional information that you were proposing earlier in this thread.
It has to be dynamically adjusted.
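
To make the "find out if the pages are already mapped to the device, and
map them if they aren't" model a bit more concrete, here is a minimal
user-space sketch in C.  It is an illustration only, not the patch series'
method and not a kernel API: fake_dev, fake_page, fake_dma_map() and
page_dma_addr() are hypothetical names, the "DMA address" is a made-up
number, and a real implementation would also need locking, unmapping and
a less naive lookup structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DEVS_PER_PAGE 4

/* Hypothetical stand-in for a device; not a kernel type. */
struct fake_dev {
	int id;
	unsigned long nr_maps;	/* number of mappings actually created */
};

/* Hypothetical stand-in for a page plus its per-device mappings. */
struct fake_page {
	uintptr_t phys;
	struct {
		const struct fake_dev *dev;
		uint64_t dma_addr;
	} maps[MAX_DEVS_PER_PAGE];
	size_t nr_maps;
};

/* Pretend mapping step: derive a fake bus address and count the call. */
uint64_t fake_dma_map(struct fake_dev *dev, const struct fake_page *page)
{
	dev->nr_maps++;
	return ((uint64_t)dev->id << 48) | (uint64_t)page->phys;
}

/*
 * Look up the DMA address of @page for @dev at I/O time.  The mapping is
 * created only if this page has not been mapped to this device before,
 * so a device added later is handled on first use.
 * Returns 0 on success, -1 if the per-page table is full.
 */
int page_dma_addr(struct fake_page *page, struct fake_dev *dev, uint64_t *out)
{
	size_t i;

	assert(page && dev && out);

	for (i = 0; i < page->nr_maps; i++) {
		if (page->maps[i].dev == dev) {
			*out = page->maps[i].dma_addr;	/* already mapped */
			return 0;
		}
	}

	if (page->nr_maps == MAX_DEVS_PER_PAGE)
		return -1;

	/* First I/O from this page to this device: map it now. */
	page->maps[i].dev = dev;
	page->maps[i].dma_addr = fake_dma_map(dev, page);
	page->nr_maps++;
	*out = page->maps[i].dma_addr;
	return 0;
}
```

In the RAID-5 example, the replacement NVMe drive would just be a new
fake_dev that shows up in page_dma_addr() for the first time after the
hot-add; nothing has to be known about it at setup time.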

Thread overview: 23+ messages
2022-08-05 16:24 [PATCHv3 0/7] dma mapping optimisations Keith Busch
2022-08-05 16:24 ` [PATCHv3 1/7] blk-mq: add ops to dma map bvec Keith Busch
2022-08-05 16:24 ` [PATCHv3 2/7] file: " Keith Busch
2022-08-08  0:21   ` Dave Chinner
2022-08-08  1:13     ` Matthew Wilcox
2022-08-08  2:15       ` Dave Chinner
2022-08-08  2:49         ` Matthew Wilcox [this message]
2022-08-08  7:31           ` Dave Chinner
2022-08-08 15:28             ` Keith Busch
2022-08-08 10:14         ` Pavel Begunkov
2022-08-05 16:24 ` [PATCHv3 3/7] iov_iter: introduce type for preregistered dma tags Keith Busch
2022-08-05 16:24 ` [PATCHv3 4/7] block: add dma tag bio type Keith Busch
2022-08-05 16:24 ` [PATCHv3 5/7] io_uring: introduce file slot release helper Keith Busch
2022-08-05 16:24 ` [PATCHv3 6/7] io_uring: add support for dma pre-mapping Keith Busch
2022-08-05 16:24 ` [PATCHv3 7/7] nvme-pci: implement dma_map support Keith Busch
2022-08-09  6:46 ` [PATCHv3 0/7] dma mapping optimisations Christoph Hellwig
2022-08-09 14:18   ` Keith Busch
2022-08-09 18:39     ` Christoph Hellwig
2022-08-09 16:46   ` Keith Busch
2022-08-09 18:41     ` Christoph Hellwig
2022-08-10 18:05       ` Keith Busch
2022-08-11  7:22         ` Christoph Hellwig
2022-08-31 21:19           ` Keith Busch