* dma_buf support with io_uring
  [not found] <BY5PR11MB399005DAD1BB172B7A42586AEFB59@BY5PR11MB3990.namprd11.prod.outlook.com>
From: Fang, Wilson @ 2022-06-23 6:17 UTC
To: [email protected]; +Cc: Jens Axboe

Hi Jens,

We are exploring a kernel-native mechanism to support peer-to-peer data transfer between an NVMe SSD and another device supporting dma_buf, connected on the same PCIe root complex.
The NVMe SSD DMA engine requires a physical memory address, and there is no easy way to pass a non-system-memory address through the VFS to the block device driver.
One idea is to use io_uring together with the dma_buf mechanism, which is supported by the SSD's peer device.

The flow is as below:
1. The application passes the dma_buf fd to the kernel through liburing.
2. io_uring adds two new opcodes, IORING_OP_READ_DMA and IORING_OP_WRITE_DMA, to support read and write operations that DMA to/from the peer device memory.
3. If the dma_buf fd is valid, io_uring attaches the dma_buf and gets an sgl containing the physical memory addresses to be passed down to the block device driver.
4. The NVMe SSD DMA engine DMAs the data to/from the physical memory address.

The roadblock we are facing is that the dma_buf_attach() and dma_buf_map_attachment() APIs expect the caller to provide a struct device *dev as an input parameter, pointing to the device that does the DMA (in this case the block/NVMe device that holds the source data).
But since io_uring operates at the VFS layer, there is no straightforward way of finding the block/NVMe device object (struct device *) from the source file descriptor.

Do you have any recommendations? Much appreciated!

Thanks,
Wilson
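The roadblock in this message can be illustrated with a small user-space model. This is a sketch only, not kernel code: `struct mock_file`, `mock_dma_buf_attach()` and `submit_read_dma()` are hypothetical stand-ins. The one property carried over from the message is that the attach step needs a pointer to the device performing the DMA, which a file seen at the VFS layer does not obviously provide.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* User-space stand-ins; none of these are the real kernel types. */
struct mock_device { const char *name; };
struct mock_dma_buf { int fd; };
struct mock_attachment {
    struct mock_dma_buf *buf;
    struct mock_device *dev;   /* device that will perform the DMA */
};

/* What io_uring sees at the VFS layer: a file with no device pointer. */
struct mock_file { int fd; };

/*
 * Models the attach step (steps 3 and 4 of the flow): as with the kernel
 * API described in the message, it cannot proceed without the DMA device.
 */
static int mock_dma_buf_attach(struct mock_dma_buf *buf,
                               struct mock_device *dev,
                               struct mock_attachment *out)
{
    assert(out != NULL);
    if (buf == NULL || dev == NULL)
        return -EINVAL;
    out->buf = buf;
    out->dev = dev;
    return 0;
}

/*
 * Hypothetical handler for the proposed IORING_OP_READ_DMA. Given only
 * the source file, it has no device to hand to the attach step, so `dev`
 * is passed in explicitly here to make that dependency visible.
 */
static int submit_read_dma(struct mock_file *src, struct mock_dma_buf *dst,
                           struct mock_device *dev)
{
    struct mock_attachment att;

    (void)src; /* the file alone does not identify the DMA device */
    return mock_dma_buf_attach(dst, dev, &att);
}
```

Calling `submit_read_dma()` with a NULL device fails with `-EINVAL`, which is the gap the question is about: something has to translate the source file into the device that will do the DMA.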
* Re: dma_buf support with io_uring
From: Pavel Begunkov @ 2022-06-23 10:35 UTC
To: Fang, Wilson, [email protected]; +Cc: Jens Axboe

On 6/23/22 07:17, Fang, Wilson wrote:
> Hi Jens,
>
> We are exploring a kernel-native mechanism to support peer-to-peer data transfer between an NVMe SSD and another device supporting dma_buf, connected on the same PCIe root complex.
> The NVMe SSD DMA engine requires a physical memory address, and there is no easy way to pass a non-system-memory address through the VFS to the block device driver.
> One idea is to use io_uring together with the dma_buf mechanism, which is supported by the SSD's peer device.

Interesting, that aligns quite well with what we're doing, which is a more generic way to do p2p, with some non-p2p optimisations along the way.

The approach we tried before is to let userspace register a dma-buf fd inside io_uring as a registered buffer, prepare everything in advance (such as the dmabuf attach), and then rw/send/etc. can use it.

> The flow is as below:
> 1. The application passes the dma_buf fd to the kernel through liburing.
> 2. io_uring adds two new opcodes, IORING_OP_READ_DMA and IORING_OP_WRITE_DMA, to support read and write operations that DMA to/from the peer device memory.
> 3. If the dma_buf fd is valid, io_uring attaches the dma_buf and gets an sgl containing the physical memory addresses to be passed down to the block device driver.
> 4. The NVMe SSD DMA engine DMAs the data to/from the physical memory address.
>
> The roadblock we are facing is that the dma_buf_attach() and dma_buf_map_attachment() APIs expect the caller to provide a struct device *dev as an input parameter, pointing to the device that does the DMA (in this case the block/NVMe device that holds the source data).
> But since io_uring operates at the VFS layer, there is no straightforward way of finding the block/NVMe device object (struct device *) from the source file descriptor.
>
> Do you have any recommendations? Much appreciated!

For finding a device pointer, we added an optional file operation callback. I think that's much better than parsing it on the io_uring side, especially since we need a guarantee that the device is the only one that will be targeted and that it won't change (e.g. the network stack may choose a device dynamically based on the target address).

I think we have space to cooperate here :)

--
Pavel Begunkov
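The two ideas in this reply, an optional file operation that reports the DMA device and a registration step that prepares everything in advance, can be sketched in user space as follows. All names here (`get_dma_device`, `register_dmabuf()`, the `mock_` types) are hypothetical and chosen for illustration; the message does not give the actual callback name or signature.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct mock_device { const char *name; };
struct mock_file;

/* File operations table with one optional, hypothetical callback. */
struct mock_file_ops {
    /* Returns the single device that will do DMA for this file. */
    struct mock_device *(*get_dma_device)(struct mock_file *file);
};

struct mock_file {
    const struct mock_file_ops *ops;
    void *private_data;
};

/* Registered dma-buf slot: prepared once, reused by later requests. */
struct mock_registered_dmabuf {
    int dmabuf_fd;
    struct mock_device *dev;   /* fixed at registration time */
};

static int register_dmabuf(struct mock_file *target, int dmabuf_fd,
                           struct mock_registered_dmabuf *slot)
{
    struct mock_device *dev;

    assert(target != NULL && slot != NULL);
    /* The callback is optional: files without it are refused. */
    if (target->ops == NULL || target->ops->get_dma_device == NULL)
        return -EOPNOTSUPP;
    dev = target->ops->get_dma_device(target);
    if (dev == NULL)
        return -ENODEV;
    /* A real implementation would attach and map the dma-buf here. */
    slot->dmabuf_fd = dmabuf_fd;
    slot->dev = dev;
    return 0;
}

/* Example callback for a file backed by exactly one block device. */
static struct mock_device *blk_get_dma_device(struct mock_file *file)
{
    return file->private_data;
}

static const struct mock_file_ops blk_ops = {
    .get_dma_device = blk_get_dma_device,
};

/* A socket-like file may pick its device per request, so it opts out. */
static const struct mock_file_ops sock_ops = {
    .get_dma_device = NULL,
};
```

Leaving the callback unset is how a file type whose device can change between requests, such as the network case mentioned in the reply, would decline registration in this model.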
* RE: dma_buf support with io_uring
From: Fang, Wilson @ 2022-07-13 5:41 UTC
To: Pavel Begunkov, [email protected]; +Cc: Jens Axboe

Thanks, Pavel, for the recommendation!

We are very interested in collaborating on this. We are working on a prototype of your recommendation, but are moving a little slowly due to vacations and resource constraints.

Thanks,
Wilson