public inbox for [email protected]
From: Pavel Begunkov <[email protected]>
To: "Fang, Wilson" <[email protected]>,
	"[email protected]" <[email protected]>
Cc: Jens Axboe <[email protected]>
Subject: Re: dma_buf support with io_uring
Date: Thu, 23 Jun 2022 11:35:02 +0100	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <BY5PR11MB399055971B9A3902CC3A3121EFB59@BY5PR11MB3990.namprd11.prod.outlook.com>

On 6/23/22 07:17, Fang, Wilson wrote:
> Hi Jens,
> 
> We are exploring a kernel-native mechanism to support peer-to-peer data transfer between an NVMe SSD and another device that supports dma_buf, connected to the same PCIe root complex.
> The NVMe SSD's DMA engine requires physical memory addresses, and there is no easy way to pass non-system-memory addresses through the VFS to the block device driver.
> One idea is to use io_uring together with the dma_buf mechanism, which is supported by the SSD's peer device.

Interesting, that aligns quite well with what we're doing, which is a
more generic approach to p2p with some non-p2p optimisations along the
way. The approach we tried before is to let userspace register a
dma-buf fd inside io_uring as a registered buffer, prepare everything
in advance (e.g. the dmabuf attach), and then rw/send/etc. can use it.
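
For illustration, here's a rough userspace sketch of that flow; note that
IORING_REGISTER_DMABUF and struct io_uring_dmabuf_reg below are made-up
names to show the shape of the idea, not an interface that exists today:

/* Hypothetical sketch: the registration opcode and its argument struct are
 * assumptions; the point is only "register once, reuse afterwards". */
#include <liburing.h>
#include <sys/syscall.h>
#include <unistd.h>

#define IORING_REGISTER_DMABUF  99      /* assumed opcode, not upstream */

struct io_uring_dmabuf_reg {            /* assumed layout */
        __s32   dmabuf_fd;
        __u32   buf_index;
};

static int register_dmabuf(struct io_uring *ring, int dmabuf_fd, unsigned idx)
{
        struct io_uring_dmabuf_reg reg = {
                .dmabuf_fd = dmabuf_fd,
                .buf_index = idx,
        };

        /* the dmabuf attach/map would happen once here, inside the kernel */
        return syscall(__NR_io_uring_register, ring->ring_fd,
                       IORING_REGISTER_DMABUF, &reg, 1);
}

static void queue_fixed_read(struct io_uring *ring, int file_fd,
                             unsigned int len, int buf_index)
{
        struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

        /* an ordinary fixed-buffer read; buf_index now names device memory,
         * so the user-VA "buf" argument is just a placeholder in this sketch */
        io_uring_prep_read_fixed(sqe, file_fd, NULL, len, 0, buf_index);
}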

> The flow is as follows:
> 1. The application passes the dma_buf fd to the kernel through liburing.
> 2. io_uring adds two new opcodes, IORING_OP_READ_DMA and IORING_OP_WRITE_DMA, to support read/write operations that DMA to/from the peer device's memory.
> 3. If the dma_buf fd is valid, io_uring attaches the dma_buf and gets an sgl containing the physical memory addresses to be passed down to the block device driver.
> 4. The NVMe SSD's DMA engine DMAs the data to/from those physical memory addresses.
> 
> The roadblock we are facing is that the dma_buf_attach() and dma_buf_map_attachment() APIs expect the caller to provide a struct device *dev input parameter pointing to the device that performs the DMA (in this case the block/NVMe device that holds the source data).
> But since io_uring operates at the VFS layer, there is no straightforward way to find the block/NVMe device object (struct device *) from the source file descriptor.
> 
> Do you have any recommendations? Much appreciated!
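
To make the roadblock concrete, step 3 boils down to the standard dma-buf
import sequence below (a trimmed sketch using the existing in-kernel API,
error unwinding shortened); the struct device argument is exactly the
piece that's hard to obtain at the VFS layer:

#include <linux/dma-buf.h>
#include <linux/dma-direction.h>
#include <linux/err.h>
#include <linux/scatterlist.h>

/* 'dev' must be the device that will actually perform the DMA (the NVMe
 * device here) -- the pointer that io_uring can't easily derive from a
 * VFS-level struct file. */
static struct sg_table *import_dmabuf(int dmabuf_fd, struct device *dev,
                                      struct dma_buf_attachment **attachp)
{
        struct dma_buf *dmabuf;
        struct dma_buf_attachment *attach;
        struct sg_table *sgt;

        dmabuf = dma_buf_get(dmabuf_fd);        /* takes a reference */
        if (IS_ERR(dmabuf))
                return ERR_CAST(dmabuf);

        attach = dma_buf_attach(dmabuf, dev);   /* needs struct device * */
        if (IS_ERR(attach)) {
                dma_buf_put(dmabuf);
                return ERR_CAST(attach);
        }

        sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
        if (IS_ERR(sgt)) {
                dma_buf_detach(dmabuf, attach);
                dma_buf_put(dmabuf);
                return ERR_CAST(sgt);
        }

        *attachp = attach;
        return sgt;     /* sgt->sgl carries the addresses to pass down */
}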

For finding a device pointer, we added an optional file operation
callback. I think that's much better than trying to dig it out on the
io_uring side, especially since we need a guarantee that this device
is the only one that will be targeted and that it won't change (e.g.
networking may choose a device dynamically based on the target address).
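
Purely as an illustration of that approach (the hook name and signature
below are assumptions, not the actual patch), such a callback might look
roughly like this:

#include <linux/blkdev.h>
#include <linux/device.h>
#include <linux/fs.h>

/* Hypothetical addition to struct file_operations -- assumed, not real:
 *
 *      struct device *(*get_dma_device)(struct file *file);
 *
 * The file's owner reports the one device that will DMA into/out of the
 * buffers, so io_uring can hand it to dma_buf_attach() without guessing
 * at the VFS layer. A raw block device fd could implement it like: */
static struct device *blkdev_get_dma_device(struct file *file)
{
        struct block_device *bdev = I_BDEV(file->f_mapping->host);

        /* the device that issues the DMA; a regular file would instead
         * have to resolve this through its filesystem's backing device */
        return disk_to_dev(bdev->bd_disk);
}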

I think we have space to cooperate here :)

-- 
Pavel Begunkov

  reply	other threads:[~2022-06-23 10:35 UTC|newest]

Thread overview: 3+ messages
     [not found] <BY5PR11MB399005DAD1BB172B7A42586AEFB59@BY5PR11MB3990.namprd11.prod.outlook.com>
2022-06-23  6:17 ` dma_buf support with io_uring Fang, Wilson
2022-06-23 10:35   ` Pavel Begunkov [this message]
2022-07-13  5:41     ` Fang, Wilson
