From: Pavel Begunkov <asml.silence@gmail.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: linux-block@vger.kernel.org, io-uring <io-uring@vger.kernel.org>,
"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
"Gohad, Tushar" <tushar.gohad@intel.com>,
"Christian König" <christian.koenig@amd.com>,
"Christoph Hellwig" <hch@lst.de>,
"Kanchan Joshi" <joshi.k@samsung.com>,
"Anuj Gupta" <anuj20.g@samsung.com>,
"Nitesh Shetty" <nj.shetty@samsung.com>,
"lsf-pc@lists.linux-foundation.org"
<lsf-pc@lists.linux-foundation.org>
Subject: Re: [LSF/MM/BPF TOPIC] dmabuf backed read/write
Date: Fri, 6 Feb 2026 15:08:25 +0000
Message-ID: <3281a845-a1b8-468c-a528-b9f6003cddea@gmail.com>
In-Reply-To: <20260205235647.GA4177530@nvidia.com>
On 2/5/26 23:56, Jason Gunthorpe wrote:
> On Thu, Feb 05, 2026 at 07:06:03PM +0000, Pavel Begunkov wrote:
>> On 2/5/26 17:41, Jason Gunthorpe wrote:
>>> On Tue, Feb 03, 2026 at 02:29:55PM +0000, Pavel Begunkov wrote:
>>>
>>>> The proposal consists of two parts. The first is a small in-kernel
>>>> framework that allows a dma-buf to be registered against a given file
>>>> and returns an object representing a DMA mapping.
>>>
>>> What is this about and why would you need something like this?
>>>
>>> The rest makes more sense - pass a DMABUF (or even memfd) to iouring
>>> and pre-setup the DMA mapping to get dma_addr_t, then directly use
>>> dma_addr_t through the entire block stack right into the eventual
>>> driver.
>>
>> That's more or less what I tried to do in v1, but 1) people didn't like
>> the idea of passing raw dma addresses directly, and wrapping it into a
>> black box gives more flexibility, like potentially supporting
>> multi-device filesystems.
>
> Ok.. but what does that have to do with a user space visible file?

If you're referring to registration taking a file, it's used to forward
the registration to the right driver, which knows about the devices and
can create dma-buf attachment[s]. The abstraction users get is not just
a buffer but a buffer registered for a "subsystem" represented by the
passed file. With the nvme raw bdev as the only importer in the patch
set, it simply converges to "registered for the file", but the notion
will need to be expanded later, e.g. to accommodate filesystems.
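
To make the shape a bit more concrete, roughly (all names below,
including the ->dmabuf_register() hook, are invented for illustration
and don't match the series; error handling dropped):

struct dma_token;	/* opaque object representing the DMA mapping */

/* io_uring side: resolve the fd and forward it to the file's driver */
struct dma_buf *dmabuf = dma_buf_get(dmabuf_fd);
struct dma_token *tok;

tok = file->f_op->dmabuf_register(file, dmabuf);	/* hypothetical hook */

/* the nvme raw bdev importer, the only one so far, then roughly does */
attach = dma_buf_dynamic_attach(dmabuf, ctrl->dev, &importer_ops, tok);

dma_resv_lock(dmabuf->resv, NULL);
sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
dma_resv_unlock(dmabuf->resv);
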
>> 2) dma-buf folks want dynamic attachments,
>> and it makes it quite a bit more complicated when you might be asked to
>> shoot down DMA mappings at any moment, so I'm isolating all that
>> into something that can be reused.
>
> IMHO there is probably nothing really reusable here. The logic to
> fence any usage is entirely unique to whoever is using it, and the
> locking tends to be really hard.
>
> You should review the email threads linked to this patch and all its
> prior versions as the expected importer behavior for pinned dmabufs is
> not well understood.

I'm not pinning it (i.e. no dma_buf_pin()); it should be a proper
dynamic implementation. In short, it adds a fence on move_notify
and signals it once all requests using the mapping are gone. New
requests will try to create a new mapping (and wait for fences).
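
To sketch how that looks (simplified; names like my_reg are made up and
don't match the actual code, and it assumes a fence slot was reserved
earlier with dma_resv_reserve_fences()):

static void reg_move_notify(struct dma_buf_attachment *attach)
{
	struct my_reg *reg = attach->importer_priv;

	/* the exporter calls this with dmabuf->resv held */
	dma_resv_add_fence(attach->dmabuf->resv, &reg->fence,
			   DMA_RESV_USAGE_BOOKKEEP);
	/*
	 * In-flight requests keep using the old mapping; the fence is
	 * signalled with dma_fence_signal() when the last of them
	 * completes, and only then is the exporter free to move the
	 * buffer.
	 */
	WRITE_ONCE(reg->stale, true);
}

/* a later request that sees reg->stale re-creates the mapping */
dma_resv_lock(dmabuf->resv, NULL);
dma_resv_wait_timeout(dmabuf->resv, DMA_RESV_USAGE_KERNEL, true,
		      MAX_SCHEDULE_TIMEOUT);
new_sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
dma_resv_unlock(dmabuf->resv);
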
> https://lore.kernel.org/all/20260131-dmabuf-revoke-v7-0-463d956bd527@nvidia.com/
>
>>>> Tushar was helping and mentioned he got good numbers for P2P transfers
>>>> compared to bouncing it via RAM.
>>>
>>> We can already avoid the bouncing, it seems the main improvements here
>>> are avoiding the DMA map per-io and allowing the use of P2P without
>>> also creating struct page. Meaningful wins for sure.
>>
>> Yes, and it should probably be nicer for frameworks that already
>> expose dma-bufs.
>
> I'm not sure what this means?

I'm saying that when a user app can easily get or already has a
dma-buf fd, it should be easier to just use it rather than finding
its way to FOLL_PCI_P2PDMA. I'm actually curious: is there a way
to create a MEMORY_DEVICE_PCI_P2PDMA mapping out of an arbitrary
dma-buf? From a quick glance, I only see the nvme CMB and some
accelerators being registered with P2PDMA, but maybe I'm missing
something.
--
Pavel Begunkov