public inbox for io-uring@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH net] net: Allow non parent devices to be used for ZC DMA
@ 2025-07-09 12:40 Dragos Tatulea
  2025-07-09 13:50 ` Pavel Begunkov
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Dragos Tatulea @ 2025-07-09 12:40 UTC (permalink / raw)
  To: almasrymina, asml.silence, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Jens Axboe, Simona Vetter, Willem de Bruijn, Kaiyuan Zhang
  Cc: Dragos Tatulea, cratiu, parav, Tariq Toukan, netdev, linux-kernel,
	io-uring

For zerocopy (io_uring, devmem), there is an assumption that the
parent device can do DMA. However that is not always the case:
ScalableFunction devices have the DMA device in the grandparent.

This patch adds a helper for getting the DMA device for a netdev from
its parent or grandparent if necessary. The NULL case is handled in the
callers.

devmem and io_uring are updated accordingly to use this helper instead
of directly using the parent.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Fixes: 170aafe35cb9 ("netdev: support binding dma-buf to netdevice")
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
---
Changes in v1:
- Upgraded from RFC status.
- Dropped driver specific bits for generic solution.
- Implemented single patch as a fix as requested in RFC.
- Handling of multi-PF netdevs will be handled in a subsequent patch
  series.

RFC: https://lore.kernel.org/all/20250702172433.1738947-2-dtatulea@nvidia.com/
---
 include/linux/netdevice.h | 14 ++++++++++++++
 io_uring/zcrx.c           |  2 +-
 net/core/devmem.c         | 10 +++++++++-
 3 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 5847c20994d3..1cbde7193c4d 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -5560,4 +5560,18 @@ extern struct net_device *blackhole_netdev;
 		atomic_long_add((VAL), &(DEV)->stats.__##FIELD)
 #define DEV_STATS_READ(DEV, FIELD) atomic_long_read(&(DEV)->stats.__##FIELD)
 
+static inline struct device *netdev_get_dma_dev(const struct net_device *dev)
+{
+	struct device *dma_dev = dev->dev.parent;
+
+	if (!dma_dev)
+		return NULL;
+
+	/* Some devices (e.g. SFs) have the dma device as a grandparent. */
+	if (!dma_dev->dma_mask)
+		dma_dev = dma_dev->parent;
+
+	return (dma_dev && dma_dev->dma_mask) ? dma_dev : NULL;
+}
+
 #endif	/* _LINUX_NETDEVICE_H */
diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index 797247a34cb7..93462e5b2207 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -584,7 +584,7 @@ int io_register_zcrx_ifq(struct io_ring_ctx *ctx,
 		goto err;
 	}
 
-	ifq->dev = ifq->netdev->dev.parent;
+	ifq->dev = netdev_get_dma_dev(ifq->netdev);
 	if (!ifq->dev) {
 		ret = -EOPNOTSUPP;
 		goto err;
diff --git a/net/core/devmem.c b/net/core/devmem.c
index b3a62ca0df65..881044e0ae0e 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -183,6 +183,7 @@ net_devmem_bind_dmabuf(struct net_device *dev,
 {
 	struct net_devmem_dmabuf_binding *binding;
 	static u32 id_alloc_next;
+	struct device *dma_dev;
 	struct scatterlist *sg;
 	struct dma_buf *dmabuf;
 	unsigned int sg_idx, i;
@@ -193,6 +194,13 @@ net_devmem_bind_dmabuf(struct net_device *dev,
 	if (IS_ERR(dmabuf))
 		return ERR_CAST(dmabuf);
 
+	dma_dev = netdev_get_dma_dev(dev);
+	if (!dma_dev) {
+		err = -EOPNOTSUPP;
+		NL_SET_ERR_MSG(extack, "Device doesn't support dma");
+		goto err_put_dmabuf;
+	}
+
 	binding = kzalloc_node(sizeof(*binding), GFP_KERNEL,
 			       dev_to_node(&dev->dev));
 	if (!binding) {
@@ -209,7 +217,7 @@ net_devmem_bind_dmabuf(struct net_device *dev,
 
 	binding->dmabuf = dmabuf;
 
-	binding->attachment = dma_buf_attach(binding->dmabuf, dev->dev.parent);
+	binding->attachment = dma_buf_attach(binding->dmabuf, dma_dev);
 	if (IS_ERR(binding->attachment)) {
 		err = PTR_ERR(binding->attachment);
 		NL_SET_ERR_MSG(extack, "Failed to bind dmabuf to device");
-- 
2.50.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH net] net: Allow non parent devices to be used for ZC DMA
  2025-07-09 12:40 [PATCH net] net: Allow non parent devices to be used for ZC DMA Dragos Tatulea
@ 2025-07-09 13:50 ` Pavel Begunkov
  2025-07-09 19:29 ` Mina Almasry
  2025-07-10 23:49 ` Jakub Kicinski
  2 siblings, 0 replies; 7+ messages in thread
From: Pavel Begunkov @ 2025-07-09 13:50 UTC (permalink / raw)
  To: Dragos Tatulea, almasrymina, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Jens Axboe, Simona Vetter, Willem de Bruijn, Kaiyuan Zhang
  Cc: cratiu, parav, Tariq Toukan, netdev, linux-kernel, io-uring

On 7/9/25 13:40, Dragos Tatulea wrote:
> For zerocopy (io_uring, devmem), there is an assumption that the
> parent device can do DMA. However that is not always the case:
> ScalableFunction devices have the DMA device in the grandparent.
> 
> This patch adds a helper for getting the DMA device for a netdev from
> its parent or grandparent if necessary. The NULL case is handled in the
> callers.
> 
> devmem and io_uring are updated accordingly to use this helper instead
> of directly using the parent.
> 
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> Fixes: 170aafe35cb9 ("netdev: support binding dma-buf to netdevice")
> Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
> ---
> Changes in v1:
> - Upgraded from RFC status.
> - Dropped driver specific bits for generic solution.
> - Implemented single patch as a fix as requested in RFC.
> - Handling of multi-PF netdevs will be handled in a subsequent patch
>    series.
> 
> RFC: https://lore.kernel.org/all/20250702172433.1738947-2-dtatulea@nvidia.com/
I can't say anything about the walking to grand parent part, but
the rest looks good.

Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>

-- 
Pavel Begunkov


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net] net: Allow non parent devices to be used for ZC DMA
  2025-07-09 12:40 [PATCH net] net: Allow non parent devices to be used for ZC DMA Dragos Tatulea
  2025-07-09 13:50 ` Pavel Begunkov
@ 2025-07-09 19:29 ` Mina Almasry
  2025-07-09 19:53   ` Dragos Tatulea
  2025-07-10 23:49 ` Jakub Kicinski
  2 siblings, 1 reply; 7+ messages in thread
From: Mina Almasry @ 2025-07-09 19:29 UTC (permalink / raw)
  To: Dragos Tatulea
  Cc: asml.silence, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Jens Axboe,
	Simona Vetter, Willem de Bruijn, Kaiyuan Zhang, cratiu, parav,
	Tariq Toukan, netdev, linux-kernel, io-uring

On Wed, Jul 9, 2025 at 5:46 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> For zerocopy (io_uring, devmem), there is an assumption that the
> parent device can do DMA. However that is not always the case:
> ScalableFunction devices have the DMA device in the grandparent.
>
> This patch adds a helper for getting the DMA device for a netdev from
> its parent or grandparent if necessary. The NULL case is handled in the
> callers.
>
> devmem and io_uring are updated accordingly to use this helper instead
> of directly using the parent.
>
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> Fixes: 170aafe35cb9 ("netdev: support binding dma-buf to netdevice")

nit: This doesn't seem like a fix? The current code supports all
devices that are not SF well enough, right? And in the case of SF
devices, I expect net_devmem_bind_dmabuf() to fail gracefully as the
dma mapping of a device that doesn't support it, I think, would fail
gracefully. So to me this seems like an improvement rather than a bug
fix.

> Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
> ---
> Changes in v1:
> - Upgraded from RFC status.
> - Dropped driver specific bits for generic solution.
> - Implemented single patch as a fix as requested in RFC.
> - Handling of multi-PF netdevs will be handled in a subsequent patch
>   series.
>
> RFC: https://lore.kernel.org/all/20250702172433.1738947-2-dtatulea@nvidia.com/
> ---
>  include/linux/netdevice.h | 14 ++++++++++++++
>  io_uring/zcrx.c           |  2 +-
>  net/core/devmem.c         | 10 +++++++++-
>  3 files changed, 24 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 5847c20994d3..1cbde7193c4d 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -5560,4 +5560,18 @@ extern struct net_device *blackhole_netdev;
>                 atomic_long_add((VAL), &(DEV)->stats.__##FIELD)
>  #define DEV_STATS_READ(DEV, FIELD) atomic_long_read(&(DEV)->stats.__##FIELD)
>
> +static inline struct device *netdev_get_dma_dev(const struct net_device *dev)
> +{
> +       struct device *dma_dev = dev->dev.parent;
> +
> +       if (!dma_dev)
> +               return NULL;
> +
> +       /* Some devices (e.g. SFs) have the dma device as a grandparent. */
> +       if (!dma_dev->dma_mask)

I was able to confirm that !dev->dma_mask means "this device doesn't
support dma". Multiple existing places in the code seem to use this
check.

> +               dma_dev = dma_dev->parent;
> +
> +       return (dma_dev && dma_dev->dma_mask) ? dma_dev : NULL;

This may be a noob question, but are we sure that !dma_dev->dma_mask
&& dma_dev->parent->dma_mask != NULL means that the parent is the
dma-device that we should use? I understand SF devices work that way
but it's not immediately obvious to me that this is generically true.

For example pavel came up with the case where for veth,
netdev->dev.parent == NULL , I wonder if there are weird devices in
the wild where netdev->dev.parent->dma_mask == NULL but that doesn't
necessarily mean that the grandparent is the dma-device that we should
use.

I guess to keep my long question short: what makes you think this is
generically safe to do? Or is it not, but we think most devices behave
this way and we're going to handle more edge cases in follow up
patches?

> +}
> +
>  #endif /* _LINUX_NETDEVICE_H */
> diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
> index 797247a34cb7..93462e5b2207 100644
> --- a/io_uring/zcrx.c
> +++ b/io_uring/zcrx.c
> @@ -584,7 +584,7 @@ int io_register_zcrx_ifq(struct io_ring_ctx *ctx,
>                 goto err;
>         }
>
> -       ifq->dev = ifq->netdev->dev.parent;
> +       ifq->dev = netdev_get_dma_dev(ifq->netdev);

nit: this hunk will not apply when backporting this to trees that only
have the Fixes commit... which makes it more weird that this is
considered a fix for that, but I'm fine either way.



--
Thanks,
Mina

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net] net: Allow non parent devices to be used for ZC DMA
  2025-07-09 19:29 ` Mina Almasry
@ 2025-07-09 19:53   ` Dragos Tatulea
  2025-07-09 22:56     ` Mina Almasry
  0 siblings, 1 reply; 7+ messages in thread
From: Dragos Tatulea @ 2025-07-09 19:53 UTC (permalink / raw)
  To: Mina Almasry
  Cc: asml.silence, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Jens Axboe,
	Simona Vetter, Willem de Bruijn, Kaiyuan Zhang, cratiu, parav,
	Tariq Toukan, netdev, linux-kernel, io-uring

On Wed, Jul 09, 2025 at 12:29:22PM -0700, Mina Almasry wrote:
> On Wed, Jul 9, 2025 at 5:46 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> >
> > For zerocopy (io_uring, devmem), there is an assumption that the
> > parent device can do DMA. However that is not always the case:
> > ScalableFunction devices have the DMA device in the grandparent.
> >
> > This patch adds a helper for getting the DMA device for a netdev from
> > its parent or grandparent if necessary. The NULL case is handled in the
> > callers.
> >
> > devmem and io_uring are updated accordingly to use this helper instead
> > of directly using the parent.
> >
> > Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> > Fixes: 170aafe35cb9 ("netdev: support binding dma-buf to netdevice")
> 
> nit: This doesn't seem like a fix? The current code supports all
> devices that are not SF well enough, right? And in the case of SF
> devices, I expect net_devmem_bind_dmabuf() to fail gracefully as the
> dma mapping of a device that doesn't support it, I think, would fail
> gracefully. So to me this seems like an improvement rather than a bug
> fix.
>
dma_buf_map_attachment_unlocked() will return a sg_table with 0 nents.
That is graceful. However this will result in page_pools that will
always be returning errors further down the line which is very confusing
regarding the motives that caused it.

I am also fine to not make it a fix btw. Especially since the mlx5
devmem code was just accepted.

> > Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
> > ---
> > Changes in v1:
> > - Upgraded from RFC status.
> > - Dropped driver specific bits for generic solution.
> > - Implemented single patch as a fix as requested in RFC.
> > - Handling of multi-PF netdevs will be handled in a subsequent patch
> >   series.
> >
> > RFC: https://lore.kernel.org/all/20250702172433.1738947-2-dtatulea@nvidia.com/
> > ---
> >  include/linux/netdevice.h | 14 ++++++++++++++
> >  io_uring/zcrx.c           |  2 +-
> >  net/core/devmem.c         | 10 +++++++++-
> >  3 files changed, 24 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index 5847c20994d3..1cbde7193c4d 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -5560,4 +5560,18 @@ extern struct net_device *blackhole_netdev;
> >                 atomic_long_add((VAL), &(DEV)->stats.__##FIELD)
> >  #define DEV_STATS_READ(DEV, FIELD) atomic_long_read(&(DEV)->stats.__##FIELD)
> >
> > +static inline struct device *netdev_get_dma_dev(const struct net_device *dev)
> > +{
> > +       struct device *dma_dev = dev->dev.parent;
> > +
> > +       if (!dma_dev)
> > +               return NULL;
> > +
> > +       /* Some devices (e.g. SFs) have the dma device as a grandparent. */
> > +       if (!dma_dev->dma_mask)
> 
> I was able to confirm that !dev->dma_mask means "this device doesn't
> support dma". Multiple existing places in the code seem to use this
> check.
>
Ack. That was my understanding as well.

> > +               dma_dev = dma_dev->parent;
> > +
> > +       return (dma_dev && dma_dev->dma_mask) ? dma_dev : NULL;
> 
> This may be a noob question, but are we sure that !dma_dev->dma_mask
> && dma_dev->parent->dma_mask != NULL means that the parent is the
> dma-device that we should use? I understand SF devices work that way
> but it's not immediately obvious to me that this is generically true.
>
This is what I gathered from Parav's answer.

> For example pavel came up with the case where for veth,
> netdev->dev.parent == NULL , I wonder if there are weird devices in
> the wild where netdev->dev.parent->dma_mask == NULL but that doesn't
> necessarily mean that the grandparent is the dma-device that we should
> use.
>
Yep.

> I guess to keep my long question short: what makes you think this is
> generically safe to do? Or is it not, but we think most devices behave
> this way and we're going to handle more edge cases in follow up
> patches?
>
It is just what we know so far about SFs. See end of mail.

> > +}
> > +
> >  #endif /* _LINUX_NETDEVICE_H */
> > diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
> > index 797247a34cb7..93462e5b2207 100644
> > --- a/io_uring/zcrx.c
> > +++ b/io_uring/zcrx.c
> > @@ -584,7 +584,7 @@ int io_register_zcrx_ifq(struct io_ring_ctx *ctx,
> >                 goto err;
> >         }
> >
> > -       ifq->dev = ifq->netdev->dev.parent;
> > +       ifq->dev = netdev_get_dma_dev(ifq->netdev);
> 
> nit: this hunk will not apply when backporting this to trees that only
> have the Fixes commit... which makes it more weird that this is
> considered a fix for that, but I'm fine either way.
>
Ouch, indeed. Should have thought of that.

Big picture view:
Maybe after all it is more generic to have queue ops that
can provide this info. For zcrx it is trivial (see below hunk).

For devmem I was thinking of calling netdev_queue_get_dma_dev()
for every bound queue (before the mapping) and return an error
only when we find different . It will make netdev_nl_bind_rx_doit()
a bit icky, but the idea is not complicated. What do you think?

---
diff --git a/include/net/netdev_queues.h b/include/net/netdev_queues.h
index 6e835972abd1..04c69f39558d 100644
--- a/include/net/netdev_queues.h
+++ b/include/net/netdev_queues.h
@@ -127,6 +127,9 @@ void netdev_stat_queue_sum(struct net_device *netdev,
  * @ndo_queue_stop:    Stop the RX queue at the specified index. The stopped
  *                     queue's memory is written at the specified address.
  *
+ * @ndo_queue_get_dma_dev: When set, the driver can provide the DMA device to
+ *                        be used for the given queue.
+ *
  * Note that @ndo_queue_mem_alloc and @ndo_queue_mem_free may be called while
  * the interface is closed. @ndo_queue_start and @ndo_queue_stop will only
  * be called for an interface which is open.
@@ -144,6 +147,8 @@ struct netdev_queue_mgmt_ops {
        int                     (*ndo_queue_stop)(struct net_device *dev,
                                                  void *per_queue_mem,
                                                  int idx);
+       struct device *         (*ndo_queue_get_dma_dev)(const struct net_device *dev,
+                                                        int idx);
 };
 
 /**
@@ -321,4 +326,15 @@ static inline void netif_subqueue_sent(const struct net_device *dev,
                                         get_desc, start_thrs);         \
        })
 
+static inline struct device *netdev_queue_get_dma_dev(const struct net_device *dev,
+                                                     int idx)
+{
+       const struct netdev_queue_mgmt_ops *qops = dev->queue_mgmt_ops;
+
+       if (qops && qops->ndo_queue_get_dma_dev)
+               return qops->ndo_queue_get_dma_dev(dev, idx);
+
+       return netdev_get_dma_dev(dev);
+}
+
 #endif
diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index 93462e5b2207..478693a6d325 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -12,6 +12,7 @@
 #include <net/page_pool/helpers.h>
 #include <net/page_pool/memory_provider.h>
 #include <net/netlink.h>
+#include <net/netdev_queues.h>
 #include <net/netdev_rx_queue.h>
 #include <net/tcp.h>
 #include <net/rps.h>
@@ -584,7 +585,7 @@ int io_register_zcrx_ifq(struct io_ring_ctx *ctx,
                goto err;
        }
 
-       ifq->dev = netdev_get_dma_dev(ifq->netdev);
+       ifq->dev = netdev_queue_get_dma_dev(ifq->netdev, reg.if_rxq);
        if (!ifq->dev) {
                ret = -EOPNOTSUPP;
                goto err;
---

Thanks,
Dragos

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH net] net: Allow non parent devices to be used for ZC DMA
  2025-07-09 19:53   ` Dragos Tatulea
@ 2025-07-09 22:56     ` Mina Almasry
  2025-07-10 23:47       ` Jakub Kicinski
  0 siblings, 1 reply; 7+ messages in thread
From: Mina Almasry @ 2025-07-09 22:56 UTC (permalink / raw)
  To: Dragos Tatulea
  Cc: asml.silence, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Jens Axboe,
	Simona Vetter, Willem de Bruijn, Kaiyuan Zhang, cratiu, parav,
	Tariq Toukan, netdev, linux-kernel, io-uring

On Wed, Jul 9, 2025 at 12:54 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> On Wed, Jul 09, 2025 at 12:29:22PM -0700, Mina Almasry wrote:
> > On Wed, Jul 9, 2025 at 5:46 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> > >
> > > For zerocopy (io_uring, devmem), there is an assumption that the
> > > parent device can do DMA. However that is not always the case:
> > > ScalableFunction devices have the DMA device in the grandparent.
> > >
> > > This patch adds a helper for getting the DMA device for a netdev from
> > > its parent or grandparent if necessary. The NULL case is handled in the
> > > callers.
> > >
> > > devmem and io_uring are updated accordingly to use this helper instead
> > > of directly using the parent.
> > >
> > > Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> > > Fixes: 170aafe35cb9 ("netdev: support binding dma-buf to netdevice")
> >
> > nit: This doesn't seem like a fix? The current code supports all
> > devices that are not SF well enough, right? And in the case of SF
> > devices, I expect net_devmem_bind_dmabuf() to fail gracefully as the
> > dma mapping of a device that doesn't support it, I think, would fail
> > gracefully. So to me this seems like an improvement rather than a bug
> > fix.
> >
> dma_buf_map_attachment_unlocked() will return a sg_table with 0 nents.
> That is graceful. However this will result in page_pools that will
> always be returning errors further down the line which is very confusing
> regarding the motives that caused it.
>
> I am also fine to not make it a fix btw. Especially since the mlx5
> devmem code was just accepted.
>

If you submit another version I'd rather it be a non-fix, especially
since applying the io_uring hunk will be challenging when backporting
this patch, but I assume hunk can be dropped while backporting, so I'm
fine either way.

> > > Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
> > > ---
> > > Changes in v1:
> > > - Upgraded from RFC status.
> > > - Dropped driver specific bits for generic solution.
> > > - Implemented single patch as a fix as requested in RFC.
> > > - Handling of multi-PF netdevs will be handled in a subsequent patch
> > >   series.
> > >
> > > RFC: https://lore.kernel.org/all/20250702172433.1738947-2-dtatulea@nvidia.com/
> > > ---
> > >  include/linux/netdevice.h | 14 ++++++++++++++
> > >  io_uring/zcrx.c           |  2 +-
> > >  net/core/devmem.c         | 10 +++++++++-
> > >  3 files changed, 24 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > > index 5847c20994d3..1cbde7193c4d 100644
> > > --- a/include/linux/netdevice.h
> > > +++ b/include/linux/netdevice.h
> > > @@ -5560,4 +5560,18 @@ extern struct net_device *blackhole_netdev;
> > >                 atomic_long_add((VAL), &(DEV)->stats.__##FIELD)
> > >  #define DEV_STATS_READ(DEV, FIELD) atomic_long_read(&(DEV)->stats.__##FIELD)
> > >
> > > +static inline struct device *netdev_get_dma_dev(const struct net_device *dev)
> > > +{
> > > +       struct device *dma_dev = dev->dev.parent;
> > > +
> > > +       if (!dma_dev)
> > > +               return NULL;
> > > +
> > > +       /* Some devices (e.g. SFs) have the dma device as a grandparent. */
> > > +       if (!dma_dev->dma_mask)
> >
> > I was able to confirm that !dev->dma_mask means "this device doesn't
> > support dma". Multiple existing places in the code seem to use this
> > check.
> >
> Ack. That was my understanding as well.
>
> > > +               dma_dev = dma_dev->parent;
> > > +
> > > +       return (dma_dev && dma_dev->dma_mask) ? dma_dev : NULL;
> >
> > This may be a noob question, but are we sure that !dma_dev->dma_mask
> > && dma_dev->parent->dma_mask != NULL means that the parent is the
> > dma-device that we should use? I understand SF devices work that way
> > but it's not immediately obvious to me that this is generically true.
> >
> This is what I gathered from Parav's answer.
>
> > For example pavel came up with the case where for veth,
> > netdev->dev.parent == NULL , I wonder if there are weird devices in
> > the wild where netdev->dev.parent->dma_mask == NULL but that doesn't
> > necessarily mean that the grandparent is the dma-device that we should
> > use.
> >
> Yep.
>
> > I guess to keep my long question short: what makes you think this is
> > generically safe to do? Or is it not, but we think most devices behave
> > this way and we're going to handle more edge cases in follow up
> > patches?
> >
> It is just what we know so far about SFs. See end of mail.
>

I see. OK, even though this is 'just what we know so far', I'm still
in favor of this simple approach, but I would say it would be good to
communicate in the comments that this is a best-effort dma-device
finding and doesn't handle every case under the sun. Something like
(untested):

static inline struct device *netdev_get_dma_dev(const struct net_device *dev)
{
       struct device *parent = dev->dev.parent;

       if (!parent)
               return NULL;

       /* For most netdevs, the parent supports dma and is the correct
        * dma-device
        */
       if (parent->dma_mask)
               return parent;

       /* For SF devices, the parent doesn't support dma, but the grandparent
        * does, and is the correct dma-device to use (link to docs that explain
        * this if any).
        */
       if (parent->parent && parent->parent->dma_mask)
               return parent->parent;

       /* If neither the parent nor grandparent support dma, then we're not
        * sure what dma-device to use. Error out. Special handling for new
        * netdevs may need to be added in the future.
        */
       return NULL;
}

With some comments explaining the logic a bit, you can add:

Reviewed-by: Mina Almasry <almasrymina@google.com>

And let's see if Jakub likes this. If not, we can always do the future
proof approach with the queue API giving the driver the ability to
tell us what exactly is the dma-device to use (or whatever approach he
prefers).

-- 
Thanks,
Mina

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net] net: Allow non parent devices to be used for ZC DMA
  2025-07-09 22:56     ` Mina Almasry
@ 2025-07-10 23:47       ` Jakub Kicinski
  0 siblings, 0 replies; 7+ messages in thread
From: Jakub Kicinski @ 2025-07-10 23:47 UTC (permalink / raw)
  To: Mina Almasry
  Cc: Dragos Tatulea, asml.silence, Andrew Lunn, David S. Miller,
	Eric Dumazet, Paolo Abeni, Simon Horman, Jens Axboe,
	Simona Vetter, Willem de Bruijn, Kaiyuan Zhang, cratiu, parav,
	Tariq Toukan, netdev, linux-kernel, io-uring

On Wed, 9 Jul 2025 15:56:16 -0700 Mina Almasry wrote:
> > > nit: This doesn't seem like a fix? The current code supports all
> > > devices that are not SF well enough, right? And in the case of SF
> > > devices, I expect net_devmem_bind_dmabuf() to fail gracefully as the
> > > dma mapping of a device that doesn't support it, I think, would fail
> > > gracefully. So to me this seems like an improvement rather than a bug
> > > fix.
> > >  
> > dma_buf_map_attachment_unlocked() will return a sg_table with 0 nents.
> > That is graceful. However this will result in page_pools that will
> > always be returning errors further down the line which is very confusing
> > regarding the motives that caused it.
> >
> > I am also fine to not make it a fix btw. Especially since the mlx5
> > devmem code was just accepted.
> 
> If you submit another version I'd rather it be a non-fix, especially
> since applying the io_uring hunk will be challenging when backporting
> this patch, but I assume hunk can be dropped while backporting, so I'm
> fine either way.

+1 for non-fix. The core is working as expected, if we want a Fixes tag
I'd aim it at mlx5 and make the patch reject queue binding to SFs
_within mlx5_ until the core bits are figured out.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net] net: Allow non parent devices to be used for ZC DMA
  2025-07-09 12:40 [PATCH net] net: Allow non parent devices to be used for ZC DMA Dragos Tatulea
  2025-07-09 13:50 ` Pavel Begunkov
  2025-07-09 19:29 ` Mina Almasry
@ 2025-07-10 23:49 ` Jakub Kicinski
  2 siblings, 0 replies; 7+ messages in thread
From: Jakub Kicinski @ 2025-07-10 23:49 UTC (permalink / raw)
  To: Dragos Tatulea
  Cc: almasrymina, asml.silence, Andrew Lunn, David S. Miller,
	Eric Dumazet, Paolo Abeni, Simon Horman, Jens Axboe,
	Simona Vetter, Willem de Bruijn, Kaiyuan Zhang, cratiu, parav,
	Tariq Toukan, netdev, linux-kernel, io-uring

On Wed, 9 Jul 2025 15:40:57 +0300 Dragos Tatulea wrote:
> For zerocopy (io_uring, devmem), there is an assumption that the
> parent device can do DMA. However that is not always the case:
> ScalableFunction devices have the DMA device in the grandparent.
> 
> This patch adds a helper for getting the DMA device for a netdev from
> its parent or grandparent if necessary. The NULL case is handled in the
> callers.
> 
> devmem and io_uring are updated accordingly to use this helper instead
> of directly using the parent.

Sorry for the silence. I'll reply to Parav on the RFC, I don't think
the question I was asking was answered there.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2025-07-10 23:49 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-09 12:40 [PATCH net] net: Allow non parent devices to be used for ZC DMA Dragos Tatulea
2025-07-09 13:50 ` Pavel Begunkov
2025-07-09 19:29 ` Mina Almasry
2025-07-09 19:53   ` Dragos Tatulea
2025-07-09 22:56     ` Mina Almasry
2025-07-10 23:47       ` Jakub Kicinski
2025-07-10 23:49 ` Jakub Kicinski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox