public inbox for [email protected]
* dma_buf support with io_uring
From: Fang, Wilson @ 2022-06-23  6:17 UTC
  To: [email protected]; +Cc: Jens Axboe

Hi Jens,

We are exploring a kernel-native mechanism to support peer-to-peer data transfer between an NVMe SSD and another dma_buf-capable device connected to the same PCIe root complex.
The NVMe SSD's DMA engine requires physical memory addresses, and there is no easy way to pass non-system-memory addresses through the VFS down to the block device driver.
One idea is to use io_uring together with the dma_buf mechanism, which is supported by the SSD's peer device.

The flow is as below; a rough sketch of the userspace side follows the list:
1. The application passes the dma_buf fd to the kernel through liburing.
2. io_uring adds two new opcodes, IORING_OP_READ_DMA and IORING_OP_WRITE_DMA, to support read/write operations that DMA to/from the peer device's memory.
3. If the dma_buf fd is valid, io_uring attaches the dma_buf and gets an sgl containing the physical memory addresses to be passed down to the block device driver.
4. The NVMe SSD's DMA engine DMAs the data to/from those physical memory addresses.
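
Here is that sketch. The IORING_OP_READ_DMA opcode (and its numeric value) is hypothetical, as is carrying the dma_buf fd in sqe->addr; this only illustrates the proposed flow, not an existing API:

#include <errno.h>
#include <liburing.h>

#define IORING_OP_READ_DMA	48	/* hypothetical opcode value */

static int submit_dma_read(struct io_uring *ring, int file_fd,
			   int dmabuf_fd, __u64 offset, __u32 len)
{
	struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

	if (!sqe)
		return -EAGAIN;

	/* No user pointer here: the kernel side would resolve dmabuf_fd
	 * into an attached sg_table instead. */
	io_uring_prep_rw(IORING_OP_READ_DMA, sqe, file_fd, NULL, len, offset);
	sqe->addr = (__u64)dmabuf_fd;	/* hypothetical field reuse */

	return io_uring_submit(ring);
}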

The roadblock we are facing is that the dma_buf_attach() and dma_buf_map_attachment() APIs expect the caller to provide a struct device *dev pointing to the device that performs the DMA (in this case the block/NVMe device that holds the source data).
But since io_uring operates at the VFS layer, there is no straightforward way to find the block/NVMe device object (struct device *) from the source file descriptor.
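
For reference, a minimal kernel-side sketch of the attach path described above. The dma_buf_* calls are the existing in-kernel API; dev is exactly the parameter that io_uring cannot easily derive from a VFS-level file descriptor:

#include <linux/dma-buf.h>
#include <linux/dma-direction.h>
#include <linux/err.h>

static struct sg_table *map_dmabuf_for_dev(int dmabuf_fd, struct device *dev,
					   struct dma_buf_attachment **attachp)
{
	struct dma_buf *dmabuf = dma_buf_get(dmabuf_fd);
	struct dma_buf_attachment *attach;
	struct sg_table *sgt;

	if (IS_ERR(dmabuf))
		return ERR_CAST(dmabuf);

	/* This is the point that needs the struct device doing the DMA. */
	attach = dma_buf_attach(dmabuf, dev);
	if (IS_ERR(attach)) {
		dma_buf_put(dmabuf);
		return ERR_CAST(attach);
	}

	sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
	if (IS_ERR(sgt)) {
		dma_buf_detach(dmabuf, attach);
		dma_buf_put(dmabuf);
		return ERR_CAST(sgt);
	}

	*attachp = attach;
	return sgt;
}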

Do you have any recommendations? Much appreciated!

Thanks,
Wilson


* Re: dma_buf support with io_uring
From: Pavel Begunkov @ 2022-06-23 10:35 UTC
  To: Fang, Wilson, [email protected]; +Cc: Jens Axboe

On 6/23/22 07:17, Fang, Wilson wrote:
> Hi Jens,
> 
> We are exploring a kernel-native mechanism to support peer-to-peer data transfer between an NVMe SSD and another dma_buf-capable device connected to the same PCIe root complex.
> The NVMe SSD's DMA engine requires physical memory addresses, and there is no easy way to pass non-system-memory addresses through the VFS down to the block device driver.
> One idea is to use io_uring together with the dma_buf mechanism, which is supported by the SSD's peer device.

Interesting, that aligns quite well with what we're doing, which is
a more generic way to do p2p with some non-p2p optimisations along
the way. The approach we tried before is to let userspace register
a dma-buf fd inside io_uring as a registered buffer, prepare
everything in advance (like the dmabuf attach), and then
rw/send/etc. can use that.
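
A very rough sketch of that model; the IORING_REGISTER_DMABUF opcode
and the NULL-buffer fixed read are invented here purely for
illustration, only the plumbing around them is real:

#include <sys/syscall.h>
#include <unistd.h>
#include <liburing.h>

#define IORING_REGISTER_DMABUF	64	/* invented opcode, illustration only */

static int register_dmabuf(struct io_uring *ring, int dmabuf_fd)
{
	/* Raw io_uring_register(2): the kernel would attach and map the
	 * dma-buf once here, so per-IO submissions skip that cost. */
	return syscall(__NR_io_uring_register, ring->ring_fd,
		       IORING_REGISTER_DMABUF, &dmabuf_fd, 1);
}

/* Per-IO: reference the pre-registered buffer by index, as read_fixed
 * does for registered iovecs today. */
static void prep_dmabuf_read(struct io_uring_sqe *sqe, int file_fd,
			     __u64 off, __u32 len, int buf_index)
{
	io_uring_prep_read_fixed(sqe, file_fd, NULL, len, off, buf_index);
}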

> The flow is as below:
> 1. The application passes the dma_buf fd to the kernel through liburing.
> 2. io_uring adds two new opcodes, IORING_OP_READ_DMA and IORING_OP_WRITE_DMA, to support read/write operations that DMA to/from the peer device's memory.
> 3. If the dma_buf fd is valid, io_uring attaches the dma_buf and gets an sgl containing the physical memory addresses to be passed down to the block device driver.
> 4. The NVMe SSD's DMA engine DMAs the data to/from those physical memory addresses.
> 
> The roadblock we are facing is that the dma_buf_attach() and dma_buf_map_attachment() APIs expect the caller to provide a struct device *dev pointing to the device that performs the DMA (in this case the block/NVMe device that holds the source data).
> But since io_uring operates at the VFS layer, there is no straightforward way to find the block/NVMe device object (struct device *) from the source file descriptor.
> 
> Do you have any recommendations? Much appreciated!

For finding a device pointer, we added an optional file operation
callback. I think that's much better than resolving it on the
io_uring side, especially since we need a guarantee that the device
is the only one that will be targeted and won't change (e.g. the
network stack may choose a device dynamically based on the target
address).
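
As an illustration only (the hook does not exist upstream and the
name get_dma_device is invented here), a block driver could
implement such a callback roughly like this:

#include <linux/blkdev.h>

/* Hypothetical file_operations hook: return the struct device that
 * will actually perform DMA for this file, so io_uring can
 * dma_buf_attach() against it up front. */
static struct device *blkdev_get_dma_device(struct file *file)
{
	struct block_device *bdev = I_BDEV(file->f_mapping->host);

	/* disk_to_dev() yields the gendisk's device; a real driver would
	 * return the underlying DMA-capable parent device instead. */
	return disk_to_dev(bdev->bd_disk);
}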

I think we have space to cooperate here :)

-- 
Pavel Begunkov


* RE: dma_buf support with io_uring
From: Fang, Wilson @ 2022-07-13  5:41 UTC
  To: Pavel Begunkov, [email protected]; +Cc: Jens Axboe

Thanks, Pavel, for the recommendation!
We are super interested in collaborating on this - we are working on a prototype of your recommendation, but moving a little slowly due to vacations and limited resources.

Thanks,
Wilson

