* Re: [LSF/MM/BPF TOPIC] dmabuf backed read/write
2026-02-03 14:29 ` [LSF/MM/BPF TOPIC] dmabuf backed read/write Pavel Begunkov
@ 2026-02-03 18:07 ` Keith Busch
2026-02-04 6:07 ` Anuj Gupta
2026-02-04 11:38 ` Pavel Begunkov
2026-02-04 15:26 ` Nitesh Shetty
2026-02-05 3:12 ` Ming Lei
2 siblings, 2 replies; 6+ messages in thread
From: Keith Busch @ 2026-02-03 18:07 UTC
To: Pavel Begunkov
Cc: linux-block, io-uring, linux-nvme@lists.infradead.org,
Gohad, Tushar, Christian König, Christoph Hellwig,
Kanchan Joshi, Anuj Gupta, Nitesh Shetty,
lsf-pc@lists.linux-foundation.org
On Tue, Feb 03, 2026 at 02:29:55PM +0000, Pavel Begunkov wrote:
> Good day everyone,
>
> dma-buf is a powerful abstraction for managing buffers and DMA mappings,
> and there is growing interest in extending it to the read/write path to
> enable device-to-device transfers without bouncing data through system
> memory. I was encouraged to submit it to LSF/MM/BPF as that might be
> useful to mull over details and what capabilities and features people
> may need.
>
> The proposal consists of two parts. The first is a small in-kernel
> framework that allows a dma-buf to be registered against a given file
> and returns an object representing a DMA mapping. The actual mapping
> creation is delegated to the target subsystem (e.g. NVMe). This
> abstraction centralises request accounting, mapping management, dynamic
> recreation, etc. The resulting mapping object is passed through the I/O
> stack via a new iov_iter type.
>
> As for the user API, a dma-buf is installed as an io_uring registered
> buffer for a specific file. Once registered, the buffer can be used by
> read / write io_uring requests as normal. io_uring will enforce that the
> buffer is only used with "compatible files", which is for now restricted
> to the target registration file, but will be expanded in the future.
> Notably, io_uring is a consumer of the framework rather than a
> dependency, and the infrastructure can be reused.
>
> It took a couple of iterations on the list to get it to the current
> design; v2 of the series can be found at [1], which implements the
> infrastructure and initial wiring for NVMe. It slightly diverges from
> the description above, as some of the framework bits are block specific,
> and I'll be working on refining that and simplifying some of the
> interfaces for v3. A good chunk of the block handling is based on prior
> work from Keith on pre-DMA-mapping buffers [2].
>
> Tushar was helping and mentioned he got good numbers for P2P transfers
> compared to bouncing it via RAM. Anuj, Kanchan and Nitesh also
> previously reported encouraging results for system memory backed
> dma-buf for optimising IOMMU overhead, quoting Anuj:
>
> - STRICT: before = 570 KIOPS, after = 5.01 MIOPS
> - LAZY: before = 1.93 MIOPS, after = 5.01 MIOPS
> - PASSTHROUGH: before = 5.01 MIOPS, after = 5.01 MIOPS
Thanks for submitting the topic. The performance wins look great, but
I'm a little surprised passthrough didn't show any difference. We're
still skipping a few transformations with the dmabuf compared to not
having it, so maybe it's just a matter of crafting the right benchmark
to show the benefit.
Anyway, I look forward to the next version of this feature. I promise to
have more cycles to review and test the v3.
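For anyone on the thread who hasn't followed the series: the user-facing
flow quoted above is close to today's registered-buffer path, so a minimal
liburing sketch of that baseline is below. The dma-buf registration from
the proposal would replace the io_uring_register_buffers() step with a
registration tied to a specific file; its exact interface is still being
settled, so it is only marked with a comment here, and the device path is
just an example.

#define _GNU_SOURCE
#include <fcntl.h>
#include <liburing.h>
#include <stdlib.h>
#include <sys/uio.h>

int main(void)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	struct iovec iov = { .iov_len = 4096 };
	int fd = open("/dev/nvme0n1", O_RDONLY | O_DIRECT);

	if (fd < 0 || io_uring_queue_init(8, &ring, 0))
		return 1;

	/* Today: a system memory buffer is registered with the ring. The
	 * proposal would instead register a dma-buf fd against 'fd' here,
	 * so that the buffer index below refers to a premapped dma-buf. */
	if (posix_memalign(&iov.iov_base, 4096, iov.iov_len) ||
	    io_uring_register_buffers(&ring, &iov, 1))
		return 1;

	/* Reads/writes then use the registered buffer index as usual. */
	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_read_fixed(sqe, fd, iov.iov_base, iov.iov_len, 0, 0);
	io_uring_submit(&ring);
	if (io_uring_wait_cqe(&ring, &cqe))
		return 1;
	io_uring_cqe_seen(&ring, cqe);

	io_uring_queue_exit(&ring);
	return 0;
}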
* Re: [LSF/MM/BPF TOPIC] dmabuf backed read/write
2026-02-03 18:07 ` Keith Busch
@ 2026-02-04 6:07 ` Anuj Gupta
2026-02-04 11:38 ` Pavel Begunkov
1 sibling, 0 replies; 6+ messages in thread
From: Anuj Gupta @ 2026-02-04 6:07 UTC
To: Keith Busch, Pavel Begunkov
Cc: linux-block, io-uring, linux-nvme@lists.infradead.org,
Christian König, Christoph Hellwig, Kanchan Joshi,
Nitesh Shetty, lsf-pc@lists.linux-foundation.org
On 2/3/2026 11:37 PM, Keith Busch wrote:
> Thanks for submitting the topic. The performance wins look great, but
> I'm a little surprised passthrough didn't show any difference. We're
> still skipping a few transformations with the dmabuf compared to not
> having it, so maybe it's just a matter of crafting the right benchmark
> to show the benefit.
>
Those numbers were from a drive that saturates at ~5M IOPS, so
passthrough didn’t have much headroom. I did a quick run with two such
drives and saw a small improvement (~2–3%): ~5.97 MIOPS -> ~6.13 MIOPS,
but I’ll try tweaking the kernel config a bit to see if there’s more
headroom.
+1 on the topic - I'm interested in attending the discussion and
reviewing/testing v3 when it lands.
Thanks,
Anuj
* Re: [LSF/MM/BPF TOPIC] dmabuf backed read/write
2026-02-03 18:07 ` Keith Busch
2026-02-04 6:07 ` Anuj Gupta
@ 2026-02-04 11:38 ` Pavel Begunkov
1 sibling, 0 replies; 6+ messages in thread
From: Pavel Begunkov @ 2026-02-04 11:38 UTC
To: Keith Busch
Cc: linux-block, io-uring, linux-nvme@lists.infradead.org,
Gohad, Tushar, Christian König, Christoph Hellwig,
Kanchan Joshi, Anuj Gupta, Nitesh Shetty,
lsf-pc@lists.linux-foundation.org
On 2/3/26 18:07, Keith Busch wrote:
> On Tue, Feb 03, 2026 at 02:29:55PM +0000, Pavel Begunkov wrote:
>> Good day everyone,
>>
...
>> Tushar was helping and mentioned he got good numbers for P2P transfers
>> compared to bouncing it via RAM. Anuj, Kanchan and Nitesh also
>> previously reported encouraging results for system memory backed
>> dma-buf for optimising IOMMU overhead, quoting Anuj:
>>
>> - STRICT: before = 570 KIOPS, after = 5.01 MIOPS
>> - LAZY: before = 1.93 MIOPS, after = 5.01 MIOPS
>> - PASSTHROUGH: before = 5.01 MIOPS, after = 5.01 MIOPS
>
> Thanks for submitting the topic. The performance wins look great, but
> I'm a little surprised passthrough didn't show any difference. We're
> still skipping a few transformations with the dmabuf compared to not
> having it, so maybe it's just a matter of crafting the right benchmark
> to show the benefit.
My first thought was that the hardware couldn't push more and that it
would be great to have idle numbers, but Anuj already demystified it.
> Anyway, I look forward to the next version of this feature. I promise to
> have more cycles to review and test the v3.
Thanks! And in general, IMHO at this point waiting for the next
version would be more time-efficient for reviewers.
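For reviewers who want to orient themselves before v3, here is a purely
illustrative sketch of how the pieces fit together. None of the names
below are from the posted series (the v2 posting at [1] is the
authoritative reference); it only mirrors the shape described above:
registration yields a per-(file, dma-buf) mapping object created by the
target subsystem, and per I/O an iov_iter is set up over it.

/* Illustrative only; every name here is a placeholder, not the series' API. */
struct dma_mapping_token;		/* opaque handle to a premapped buffer */

struct dma_mapping_ops {
	/* the target subsystem (e.g. NVMe) creates/tears down the mapping */
	struct dma_mapping_token *(*map)(struct file *file,
					 struct dma_buf *dmabuf);
	void (*unmap)(struct dma_mapping_token *token);
};

/* registration: a consumer (e.g. io_uring) ties a dma-buf to a file */
struct dma_mapping_token *dma_mapping_register(struct file *file,
					       struct dma_buf *dmabuf);

/* per I/O: an iov_iter is set up over the premapped token and travels
 * down the stack instead of a page/bvec based iterator */
void iov_iter_dma_token(struct iov_iter *iter, unsigned int direction,
			struct dma_mapping_token *token,
			size_t offset, size_t count);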
--
Pavel Begunkov
* Re: [LSF/MM/BPF TOPIC] dmabuf backed read/write
2026-02-03 14:29 ` [LSF/MM/BPF TOPIC] dmabuf backed read/write Pavel Begunkov
2026-02-03 18:07 ` Keith Busch
@ 2026-02-04 15:26 ` Nitesh Shetty
2026-02-05 3:12 ` Ming Lei
2 siblings, 0 replies; 6+ messages in thread
From: Nitesh Shetty @ 2026-02-04 15:26 UTC
To: Pavel Begunkov
Cc: linux-block, io-uring, linux-nvme@lists.infradead.org,
Christian König, Christoph Hellwig, Kanchan Joshi,
Anuj Gupta, lsf-pc@lists.linux-foundation.org
On 03/02/26 02:29PM, Pavel Begunkov wrote:
>Good day everyone,
>
>dma-buf is a powerful abstraction for managing buffers and DMA mappings,
>and there is growing interest in extending it to the read/write path to
>enable device-to-device transfers without bouncing data through system
>memory. I was encouraged to submit it to LSF/MM/BPF as that might be
>useful to mull over details and what capabilities and features people
>may need.
>
>The proposal consists of two parts. The first is a small in-kernel
>framework that allows a dma-buf to be registered against a given file
>and returns an object representing a DMA mapping. The actual mapping
>creation is delegated to the target subsystem (e.g. NVMe). This
>abstraction centralises request accounting, mapping management, dynamic
>recreation, etc. The resulting mapping object is passed through the I/O
>stack via a new iov_iter type.
>
>As for the user API, a dma-buf is installed as an io_uring registered
>buffer for a specific file. Once registered, the buffer can be used by
>read / write io_uring requests as normal. io_uring will enforce that the
>buffer is only used with "compatible files", which is for now restricted
>to the target registration file, but will be expanded in the future.
>Notably, io_uring is a consumer of the framework rather than a
>dependency, and the infrastructure can be reused.
>
We have been following the series; it's interesting from a couple of angles:
- IOPS-wise we see a major improvement, especially with an IOMMU enabled
- The series provides a way to do p2pdma to accelerator memory
Here are a few topics I am looking into specifically:
- Right now the series uses a PRP list. We need a good way to keep the
  sg_table info around and decide on-the-fly whether to expose the buffer
  as a PRP list or an SG list, depending on the I/O size (see the rough
  sketch below).
- Possibility of further optimization of the new iov_iter type to reduce
  the per-I/O cost.
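As a rough illustration of the PRP vs SGL point (not code from the
series): the existing per-request policy in the NVMe PCI driver, i.e.
nvme_pci_use_sgls() together with the sgl_threshold module parameter, is
the natural reference. Assuming the DMA segment layout is kept alongside
the premapped buffer, a per-I/O check could look something like:

/* Illustrative sketch only: pick PRP vs SGL per I/O for a premapped buffer. */
static bool dmabuf_io_prefers_sgl(struct nvme_ctrl *ctrl,
				  unsigned int nr_dma_segs,
				  unsigned int io_bytes)
{
	unsigned int avg_seg_size;

	if (!nvme_ctrl_sgl_supported(ctrl))	/* controller lacks SGL support */
		return false;
	if (nr_dma_segs <= 1)			/* a single segment maps fine to PRPs */
		return false;

	/* same idea as sgl_threshold: prefer SGLs for large multi-segment I/O */
	avg_seg_size = io_bytes / nr_dma_segs;
	return avg_seg_size >= SZ_32K;
}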
Thanks,
Nitesh
* Re: [LSF/MM/BPF TOPIC] dmabuf backed read/write
2026-02-03 14:29 ` [LSF/MM/BPF TOPIC] dmabuf backed read/write Pavel Begunkov
2026-02-03 18:07 ` Keith Busch
2026-02-04 15:26 ` Nitesh Shetty
@ 2026-02-05 3:12 ` Ming Lei
2 siblings, 0 replies; 6+ messages in thread
From: Ming Lei @ 2026-02-05 3:12 UTC
To: Pavel Begunkov
Cc: linux-block, io-uring, linux-nvme@lists.infradead.org,
Gohad, Tushar, Christian König, Christoph Hellwig,
Kanchan Joshi, Anuj Gupta, Nitesh Shetty,
lsf-pc@lists.linux-foundation.org
On Tue, Feb 03, 2026 at 02:29:55PM +0000, Pavel Begunkov wrote:
> Good day everyone,
>
> dma-buf is a powerful abstraction for managing buffers and DMA mappings,
> and there is growing interest in extending it to the read/write path to
> enable device-to-device transfers without bouncing data through system
> memory. I was encouraged to submit it to LSF/MM/BPF as that might be
> useful to mull over details and what capabilities and features people
> may need.
>
> The proposal consists of two parts. The first is a small in-kernel
> framework that allows a dma-buf to be registered against a given file
> and returns an object representing a DMA mapping. The actual mapping
> creation is delegated to the target subsystem (e.g. NVMe). This
> abstraction centralises request accounting, mapping management, dynamic
> recreation, etc. The resulting mapping object is passed through the I/O
> stack via a new iov_iter type.
>
> As for the user API, a dma-buf is installed as an io_uring registered
> buffer for a specific file. Once registered, the buffer can be used by
> read / write io_uring requests as normal. io_uring will enforce that the
> buffer is only used with "compatible files", which is for now restricted
> to the target registration file, but will be expanded in the future.
> Notably, io_uring is a consumer of the framework rather than a
> dependency, and the infrastructure can be reused.
I am interested in this topic.
Given dma-buf is inherently designed for sharing, I hope the io_uring
interface can be generic enough to cover:
- read/write with the same dma-buf can be submitted to multiple devices
- read/write with a dma-buf can cross stackable devices (device mapper,
  raid, ...)
Thanks,
Ming