* How to use iouring zcrx with NIC teaming?
From: Chao Shi @ 2025-09-11 3:46 UTC
To: io-uring

Hello,

I'm running into an issue when using iouring zcrx with NIC teaming, and I'd be glad if anyone could help.

I wrote a program that uses iouring zcrx to receive TCP packets. The program works well when only a single network interface is up (after manually taking the other interface down with `ifconfig down`). The server uses a Broadcom P2100G dual-port 100G NIC and is configured for link aggregation with teaming. Teaming works at L2, i.e. TCP packets (of a single connection or of multiple connections) may arrive on either port. I'm using kernel 6.16.4.

To illustrate the issue, consider the following example:

The server program registers **two** zcrx IFQs (2 data buffers and 2 refill rings), one for each NIC port. It accepts an incoming TCP connection and receives packets from that connection by submitting RECV_ZC SQEs. Here comes the problem. The `zcrx_ifq_idx` field of the SQE specifies which IFQ will be used, but which IFQ to use is not known before packets are received. If `zcrx_ifq_idx` specifies the wrong IFQ, the kernel falls back to copying. In a rare but possible situation, packets of a single TCP connection may even be received on both ports.

I'm looking forward to any help. I'm new here, so correct me if I am wrong.
* Re: How to use iouring zcrx with NIC teaming?
From: David Wei @ 2025-09-15 20:20 UTC
To: Chao Shi, io-uring

On 2025-09-10 20:46, Chao Shi wrote:
> Hello,
>
> I'm running into an issue when using iouring zcrx with NIC teaming,
> and I'd be glad if anyone could help.
>
> I wrote a program that uses iouring zcrx to receive TCP packets. The
> program works well when only a single network interface is up (after
> manually taking the other interface down with `ifconfig down`). The
> server uses a Broadcom P2100G dual-port 100G NIC and is configured for
> link aggregation with teaming. Teaming works at L2, i.e. TCP packets
> (of a single connection or of multiple connections) may arrive on
> either port. I'm using kernel 6.16.4.

Hi Chao. I'm not familiar with NIC bonding. Can it be guaranteed that
packets belonging to a single connection (as defined by its 5-tuple)
always go to the same port?

>
> To illustrate the issue, consider the following example:
>
> The server program registers **two** zcrx IFQs (2 data buffers and 2
> refill rings), one for each NIC port. It accepts an incoming TCP
> connection and receives packets from that connection by submitting
> RECV_ZC SQEs. Here comes the problem. The `zcrx_ifq_idx` field of the
> SQE specifies which IFQ will be used, but which IFQ to use is not known
> before packets are received. If `zcrx_ifq_idx` specifies the wrong
> IFQ, the kernel falls back to copying. In a rare but possible
> situation, packets of a single TCP connection may even be received on
> both ports.

How can this be possible? Can this behaviour be disabled such that the
same 5-tuple is always hashed to the same port, and then hashed to the
same rx queue?

This sounds similar to a single NIC with multiple ifqs, one per rx
queue, in an RSS context.
I use SO_INCOMING_NAPI_ID at connection accept time to determine which
ifq to process the socket on, which avoids the copy fallback.

>
> I'm looking forward to any help. I'm new here, so correct me if I am
> wrong.
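A minimal C sketch of the accept-time approach described above: read SO_INCOMING_NAPI_ID from the accepted socket and map it to the IFQ registered on that rx queue. It assumes the application has already built a table pairing each registered IFQ index with the NAPI ID of its rx queue; how that table is populated is not covered here. `struct napi_ifq_entry`, `socket_napi_id()`, `lookup_ifq()` and `pick_ifq_after_accept()` are hypothetical helpers, not io_uring or liburing API, and the fallback value 56 for SO_INCOMING_NAPI_ID is the asm-generic number (an assumption for architectures that use it).

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>

/* asm-generic/socket.h value; assumed for architectures using the generic
 * socket option numbers, in case older libc headers do not define it. */
#ifndef SO_INCOMING_NAPI_ID
#define SO_INCOMING_NAPI_ID 56
#endif

/* Hypothetical setup-time table: one entry per registered zcrx IFQ. */
struct napi_ifq_entry {
    unsigned int napi_id; /* NAPI ID of the rx queue the IFQ is bound to */
    unsigned int ifq_idx; /* value the app would put in sqe->zcrx_ifq_idx */
};

/* Read the NAPI ID the kernel recorded for this socket (0 if none is known).
 * Returns 0 on success, -1 with errno set on failure. */
static int socket_napi_id(int fd, unsigned int *napi_id)
{
    socklen_t len = sizeof(*napi_id);

    if (getsockopt(fd, SOL_SOCKET, SO_INCOMING_NAPI_ID, napi_id, &len) < 0)
        return -1;
    return 0;
}

/* Map a NAPI ID to an IFQ index. -1 means no matching IFQ is known; the
 * caller must then pick a default, and data arriving on other queues would
 * go through the copy fallback discussed in this thread. */
static int lookup_ifq(unsigned int napi_id,
                      const struct napi_ifq_entry *map, size_t n)
{
    assert(map != NULL || n == 0);
    if (napi_id == 0)
        return -1;
    for (size_t i = 0; i < n; i++)
        if (map[i].napi_id == napi_id)
            return (int)map[i].ifq_idx;
    return -1;
}

/* Intended to be called right after accept(): choose the IFQ for this fd. */
static int pick_ifq_after_accept(int fd, const struct napi_ifq_entry *map,
                                 size_t n)
{
    unsigned int id = 0;

    if (socket_napi_id(fd, &id) < 0)
        return -1;
    return lookup_ifq(id, map, n);
}
```

The returned index would then go into the SQE's `zcrx_ifq_idx` field before the RECV_ZC request is submitted. A linear scan is enough here because the table holds one entry per IFQ (two in the setup described in this thread).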
* Re: How to use iouring zcrx with NIC teaming?
From: 石超 @ 2025-09-17 14:28 UTC
To: David Wei, io-uring
Cc: 席永青(席言)

Hi David,

Thanks for your reply. I tested your approach (reading SO_INCOMING_NAPI_ID after the connection is accepted) and it basically works. One subtle issue remains: the connection may silently migrate from one NIC port to the other. That's fine for short-lived connections. For long-lived connections, I think we have to periodically call getsockopt(SO_INCOMING_NAPI_ID) to detect whether the connection is still on the expected port.

By the way, is there a flag in the SQE to indicate whether zero copy is used?

I'm not an expert on bonding either, so I asked our network ops colleague (cc'ed on this mail). Here are the answers to your questions.

>> Can it be guaranteed that packets belonging to a single connection (as defined by its 5-tuple) always go to the same port?

No. In most cases, packets of a single connection do arrive on the same port, but this is not guaranteed. In rare cases, such as when switches go down, the packets may move to the other port. Consider this example:

          eth0 ---- sw1a --- sw2a --- sw3a --- eth0
  host1                   X        X                host2
          eth1 ---- sw1b --- sw2b --- sw3b --- eth1

(See https://gist.github.com/stepinto/cd23803e21da1d0100e8b3941308ca8f in case the ASCII diagram above is not rendered correctly.)

eth0 and eth1 are the hosts' NIC ports. sw1a, sw1b, sw3a and sw3b are ToR switches; sw2a and sw2b are aggregation switches. If any cable breaks (for example host1's eth0 --- sw1a, or sw1a --- sw2a), the TCP connection stays up, but its packets may switch to the other port on the host2 side.

>> Can this behaviour be disabled such that the same 5-tuple is always hashed to the same port, and then hashed to the same rx queue?

No. This behavior comes from the switches, and it is what makes the network tolerant of switch failures. See the example above.
------------------------------------------------------------------
From: David Wei <dw@davidwei.uk>
Sent: Tuesday, 16 September 2025 04:28
To: "石超" <chao.shi@alibaba-inc.com>; "io-uring" <io-uring@vger.kernel.org>
Subject: Re: How to use iouring zcrx with NIC teaming?