From: Jason Gunthorpe <jgg@nvidia.com>
To: Pavel Begunkov <asml.silence@gmail.com>
Cc: linux-block@vger.kernel.org, io-uring <io-uring@vger.kernel.org>,
"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
"Gohad, Tushar" <tushar.gohad@intel.com>,
"Christian König" <christian.koenig@amd.com>,
"Christoph Hellwig" <hch@lst.de>,
"Kanchan Joshi" <joshi.k@samsung.com>,
"Anuj Gupta" <anuj20.g@samsung.com>,
"Nitesh Shetty" <nj.shetty@samsung.com>,
"lsf-pc@lists.linux-foundation.org"
<lsf-pc@lists.linux-foundation.org>
Subject: Re: [LSF/MM/BPF TOPIC] dmabuf backed read/write
Date: Fri, 6 Feb 2026 14:37:56 -0400
Message-ID: <20260206183756.GB1874040@nvidia.com>
In-Reply-To: <df7fe4d7-ca28-408e-bed3-bd1fa23e7588@gmail.com>
On Fri, Feb 06, 2026 at 05:57:14PM +0000, Pavel Begunkov wrote:
> On 2/6/26 15:20, Jason Gunthorpe wrote:
> > On Fri, Feb 06, 2026 at 03:08:25PM +0000, Pavel Begunkov wrote:
> > > On 2/5/26 23:56, Jason Gunthorpe wrote:
> > > > On Thu, Feb 05, 2026 at 07:06:03PM +0000, Pavel Begunkov wrote:
> > > > > On 2/5/26 17:41, Jason Gunthorpe wrote:
> > > > > > On Tue, Feb 03, 2026 at 02:29:55PM +0000, Pavel Begunkov wrote:
> > > > > >
> > > > > > > The proposal consists of two parts. The first is a small in-kernel
> > > > > > > framework that allows a dma-buf to be registered against a given file
> > > > > > > and returns an object representing a DMA mapping.
> > > > > >
> > > > > > What is this about and why would you need something like this?
> > > > > >
> > > > > > The rest makes more sense - pass a DMABUF (or even memfd) to iouring
> > > > > > and pre-setup the DMA mapping to get dma_addr_t, then directly use
> > > > > > dma_addr_t through the entire block stack right into the eventual
> > > > > > driver.
> > > > >
> > > > > That's more or less what I tried to do in v1, but 1) people didn't like
> > > > > the idea of passing raw dma addresses directly, and 2) having it wrapped
> > > > > into a black box gives more flexibility, like potentially supporting
> > > > > multi-device filesystems.
> > > >
> > > > Ok.. but what does that have to do with a user space visible file?
> > >
> > > If you're referring to registration taking a file, it's used to forward
> > > this registration to the right driver, which knows about devices and can
> > > create dma-buf attachment[s]. The abstraction users get is not just a
> > > buffer but rather a buffer registered for a "subsystem" represented by
> > > the passed file. With nvme raw bdev as the only importer in the patch set,
it simply converges to "registered for the file", but the notion will
> > > need to be expanded later, e.g. to accommodate filesystems.
> >
> > Sounds completely goofy to me.
>
> Hmm... the discussion is not going to be productive, is it?
Well, this FD thing is very confounding and, sorry, I don't see much
logic in this design. I understand the problems you are explaining,
but not this solution.
> Or would it be mapping it for each IO?
Mapping for each IO could be possible with a phys_addr_t path.
> dma-buf already exists as well, and I'm ashamed to admit,
> but I don't know how a user program can read into / write from
> memory provided by dma-buf.
You can mmap() them. The mapping can even be used with read()/write()
system calls if the dma-buf exporter is using P2P pages.
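Roughly like this from userspace (untested sketch; map_dmabuf() and
file_fd are just made-up names for illustration):

#include <sys/mman.h>
#include <unistd.h>

static void *map_dmabuf(int dmabuf_fd, size_t len)
{
	/* mmap() on the dma-buf fd goes through the exporter's mmap op */
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
		       dmabuf_fd, 0);

	return p == MAP_FAILED ? NULL : p;
}

/*
 * An ordinary read()/write() can then target that mapping, e.g.
 *	read(file_fd, p, len);
 * Non-coherent exporters may also want CPU access bracketed with
 * DMA_BUF_IOCTL_SYNC.
 */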
> I'm not doing it for any particular driver but rather trying
> to reuse what's already there, i.e. a good coverage of existing
> dma-buf exporters, and infrastructure dma-buf provides, e.g.
> move_notify. And trying to do that efficiently, avoiding GUP
> (what io_uring can already do for normal memory), keeping long
> term mappings (modulo move_notify), and so on. That includes
> optimising the cost of system memory reads/writes with an IOMMU.
I would suggest leading with these reasons to frame why you are trying
to do this. It seems the main motivation is to create a pre-registered,
pre-IOMMU-mapped io_uring pool of MMIO memory, and indeed you cannot do
that with the existing mechanisms at all.
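For contrast, the existing pre-registered buffer path only covers
normal CPU memory; roughly, with liburing (untested sketch,
register_pool() is a made-up helper):

#include <liburing.h>
#include <sys/uio.h>

static int register_pool(struct io_uring *ring, void *buf, size_t len)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };

	/* Pins the pages once; later IORING_OP_READ_FIXED /
	 * IORING_OP_WRITE_FIXED reference the buffer by index. */
	return io_uring_register_buffers(ring, &iov, 1);
}

As I understand it, what you want is essentially a dma-buf fd standing
in for that iovec, with the mapping lifetime handled by move_notify
instead of page pinning.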
As a step forward I could imagine having a DMABUF hand out P2P pages
and allowing io_uring to "register" it, complete with move_notify.
This would get you halfway there and doesn't require major changes to
the block stack, since you can still be pushing unmapped struct page
backed addresses and everything will work fine. It is a good way to
sidestep the FOLL_LONGTERM issue.
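The attach/move_notify machinery for that already exists on the
dma-buf importer side; very roughly, in the kernel (untested, error
unwinding trimmed, the my_* names are made up):

#include <linux/dma-buf.h>
#include <linux/dma-resv.h>

static void my_move_notify(struct dma_buf_attachment *attach)
{
	/* Exporter is about to move the buffer: quiesce IO, drop the
	 * mapping, and remap lazily on the next request. */
}

static const struct dma_buf_attach_ops my_attach_ops = {
	.allow_peer2peer = true,
	.move_notify = my_move_notify,
};

static struct sg_table *my_import(struct device *dev, struct dma_buf *dmabuf)
{
	struct dma_buf_attachment *attach;
	struct sg_table *sgt;

	attach = dma_buf_dynamic_attach(dmabuf, dev, &my_attach_ops, NULL);
	if (IS_ERR(attach))
		return ERR_CAST(attach);

	/* Dynamic importers map under the reservation lock */
	dma_resv_lock(dmabuf->resv, NULL);
	sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
	dma_resv_unlock(dmabuf->resv);

	return sgt;
}

The new work would be io_uring holding an attachment like that for its
registered-buffer table and reacting to move_notify, not new plumbing
through the block stack.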
Pre-IOMMU-mapping the pool seems like an orthogonal project, as it
applies to everything coming from pre-registered io_uring buffers,
even normal CPU memory. You could have a next step of pre-mapping the
P2P pages and CPU pages equally.
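For a plain CPU-memory pool that pre-mapping step amounts to doing
something like this once at registration time (sketch; premap_pool()
is made up, pinning and sg_table setup assumed done elsewhere):

#include <linux/dma-mapping.h>

static int premap_pool(struct device *dev, struct sg_table *sgt)
{
	/* Map once at buffer registration; every subsequent IO reuses
	 * the resulting dma addresses instead of mapping per request. */
	return dma_map_sgtable(dev, sgt, DMA_BIDIRECTIONAL, 0);
}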
Finally, you could try a project to remove the P2P page requirement
for cases that use the pre-IOMMU-mapping flow.
It would probably be helpful not to mix up those three things.
Jason