InfiniBand vs Ethernet for AI Fabrics
"Just buy InfiniBand" used to be the safe answer for any non-trivial GPU cluster. That answer is changing. Modern Ethernet — RoCEv2 with PFC, ECN/DCQCN, DLB, and the upcoming GLB and UEC — closes most of the performance gap while opening the multi-vendor, open-hardware door that hyperscalers are walking through. Here's the architect's-view decision guide.
Two Fabrics, Two Operating Models
On one side sits a single-vendor InfiniBand fabric: one IB silicon vendor, one set of switches, one NIC ecosystem. On the other, a multi-vendor open Ethernet fabric: RoCEv2 / UEC NICs from any vendor, switch silicon from Broadcom, OcNOS-DC as the NOS, and the same protocols as the rest of your DC.
The honest comparison
InfiniBand was purpose-built for low-latency, lossless RDMA. For two decades, that gave it a real performance edge for tightly-coupled HPC workloads. Modern Ethernet — built on the DCB stack, RoCEv2, and increasingly DLB and UEC — has spent the last several years closing that gap. The remaining gaps matter for some workloads and don't for others. The right answer is workload-specific, not religious.
| Axis | InfiniBand | Ethernet (RoCEv2 / UEC) |
|---|---|---|
| Latency floor | Very low end-to-end NIC-to-NIC; typically hundreds of nanoseconds per switch hop. | Higher floor than IB by hundreds of nanoseconds, but well below the threshold that affects most distributed-training collectives at scale. |
| Loss tolerance | Lossless by architecture (credit-based flow control). | Lossless via PFC + ECN + DCQCN. Production-grade today; UEC further reduces dependence on PFC pause. |
| Multi-path / load balancing | Adaptive routing built into the spec. | Static ECMP, plus DLB for adaptive single-hop, GLB (OcNOS 7.1) for end-to-end, UEC packet-spray for next-gen. |
| Vendor ecosystem | Effectively single-vendor for both NIC and switch silicon. | Multi-vendor at every layer — ASIC, switch, NIC, NOS, optics. UEC is explicitly designed for vendor-neutral interop. |
| Operational model | Subnet manager (UFM-class). Different from the rest of the DC: separate skills, separate tooling. | The same BGP, EVPN, and gNMI you already run; the same automation tools (Ansible, NETCONF, OpenConfig) as the rest of the DC. |
| Multi-tenancy | Limited; partitioning exists but is not a first-class concept. | First-class via EVPN-VXLAN. GPU-as-a-Service, multi-team clusters, shared infra all natural. |
| Long-haul DCI | Not designed for it; needs IB-over-WAN gateways. | Native via 400G ZR/ZR+ coherent pluggables and EVPN inter-DC. |
| Storage convergence | Storage can run alongside compute, but only with IB-attached storage targets. | NVMe-oF, NFS, S3 all over the same Ethernet fabric. |
| Cost / port (typical 400G+) | Premium; single-vendor pricing. | Open-hardware spine + OcNOS-DC NOS materially undercuts vendor-locked alternatives. |
| Roadmap velocity | Driven by one vendor's release cadence. | UEC consortium (AMD, Arista, Broadcom, Cisco, HPE, Intel, Meta, Microsoft, Oracle …) drives openly published spec evolution. |
Where each one wins
Latency floor is contractual
HPC simulation workloads where every collective counts and the absolute latency floor matters more than total cost of ownership; tight, captive single-tenant clusters where lock-in is acceptable. This is where InfiniBand still earns its premium.
Operational model matters
Multi-tenant GPU-as-a-Service. AI clusters that share infrastructure with the rest of the DC. Anything where the team wants one operational model, one tooling stack, and a multi-vendor supply chain.
Cost-per-GPU-flop is the gate
Open-hardware spines plus OcNOS-DC eliminate the proprietary network tax. On a multi-thousand-GPU cluster, the saved capex frequently buys additional GPU capacity.
The fabric extends across DCs
If a training run will ever span two halls or two regions, Ethernet wins by default — coherent DCI, EVPN inter-DC, and standard multi-vendor optics make this a one-day-of-work problem rather than a quarter-long line-system project.
Where modern Ethernet has closed the gap
Lossless behaviour. RoCEv2 with PFC, DCQCN, and the OcNOS-DC PFC deadlock watchdog is production-grade today. The "Ethernet drops packets" critique stops being relevant once these are configured correctly.
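Under the hood, DCQCN is a rate-control loop on the sending NIC: ECN marks at the switch become CNPs at the receiver, each CNP cuts the sender's rate, and quiet periods let it recover. A simplified, illustrative sketch of that loop (the constants g and R_AI and the merged recovery step are assumptions for readability, not OcNOS-DC or NIC defaults):

```python
# Simplified sketch of DCQCN sender-side rate control (illustrative only).
# ECN marks on the switch become CNPs (congestion notification packets) at the
# receiver; each CNP makes the sender cut its rate, and quiet periods let it
# recover toward the line rate.

class DcqcnRateController:
    def __init__(self, line_rate_gbps, g=1 / 16, r_ai_gbps=5.0):
        self.line_rate = line_rate_gbps
        self.rc = line_rate_gbps   # current sending rate
        self.rt = line_rate_gbps   # target rate to recover toward
        self.alpha = 1.0           # running estimate of congestion severity
        self.g = g                 # EWMA gain for alpha (assumed value)
        self.r_ai = r_ai_gbps      # additive-increase step (assumed value)

    def on_cnp(self):
        """A CNP arrived: remember the current rate as the target, then cut."""
        self.rt = self.rc
        self.rc = self.rc * (1 - self.alpha / 2)
        self.alpha = (1 - self.g) * self.alpha + self.g

    def on_quiet_period(self):
        """No CNPs for a timer/byte-counter window: decay alpha and recover."""
        self.alpha = (1 - self.g) * self.alpha
        self.rt = min(self.rt + self.r_ai, self.line_rate)  # additive increase
        self.rc = (self.rc + self.rt) / 2                    # recover toward target


ctrl = DcqcnRateController(line_rate_gbps=400.0)
ctrl.on_cnp()               # congestion signalled: rate drops to ~200 Gbps
for _ in range(5):
    ctrl.on_quiet_period()  # congestion clears: rate climbs back toward 400 Gbps
print(f"current rate: {ctrl.rc:.1f} Gbps")
```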
Adaptive routing. Static ECMP collisions on AI workloads are real, but DLB reassigns flowlets away from locally congested links in sub-millisecond windows, and GLB in OcNOS 7.1 extends that to full end-to-end path scoring.
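The mechanics are simple to picture. Below is a toy sketch of flowlet switching, the idea behind DLB: a flow keeps its current uplink while packets arrive back-to-back, and only after an idle gap longer than the flowlet timeout may it be rebound to the least-loaded uplink, so packets within the flow never reorder. The timeout and the load metric are illustrative assumptions; real DLB runs in the switch ASIC.

```python
import time

# Toy flowlet load balancer, the idea behind DLB (illustrative only).
# A flow keeps its current uplink while packets arrive back-to-back; after an
# idle gap longer than the flowlet timeout, the next burst can safely be moved
# to the least-loaded uplink without reordering packets inside the flow.

FLOWLET_TIMEOUT_S = 0.0005  # 500 microseconds of idle time (assumed value)

class FlowletBalancer:
    def __init__(self, num_uplinks):
        self.port_load = [0] * num_uplinks  # bytes outstanding per uplink (simplified metric)
        self.flow_state = {}                # flow_hash -> (uplink, time of last packet)

    def pick_uplink(self, flow_hash, pkt_bytes, now):
        uplink, last_seen = self.flow_state.get(flow_hash, (None, 0.0))
        if uplink is None or now - last_seen > FLOWLET_TIMEOUT_S:
            # New flowlet: safe to rebind the flow to the least-loaded uplink.
            uplink = min(range(len(self.port_load)), key=lambda p: self.port_load[p])
        self.flow_state[flow_hash] = (uplink, now)
        self.port_load[uplink] += pkt_bytes
        return uplink

    def drained(self, uplink, nbytes):
        """Model the uplink transmitting and draining its queue."""
        self.port_load[uplink] = max(0, self.port_load[uplink] - nbytes)


lb = FlowletBalancer(num_uplinks=4)
print(lb.pick_uplink(flow_hash=0xA1B2, pkt_bytes=4096, now=time.monotonic()))
```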
Spray-friendly transport. Ultra Ethernet (UEC) brings packet spray, multi-path RDMA, out-of-order delivery, and selective retransmit to standard Ethernet. The architectural advantages that defined InfiniBand are arriving on a multi-vendor open stack.
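Conceptually, the receiver is what makes spraying safe: because it tolerates out-of-order arrival, the sender can put consecutive packets of one message on different paths and later resend only what is actually missing. A toy receive-side sketch (naming and message layout are assumptions for illustration, not the UEC wire format):

```python
# Toy receive-side view of packet spray (illustrative only, not the UEC wire format).
# Packets of one message arrive out of order over many paths; the receiver tracks
# which sequence numbers landed and asks the sender to resend only the gaps.

class SprayReceiver:
    def __init__(self, total_packets):
        self.total = total_packets
        self.received = set()   # sequence numbers seen so far

    def on_packet(self, seq, payload):
        self.received.add(seq)  # out-of-order arrival is expected and fine

    def missing(self):
        """Sequence numbers to request again (selective retransmit, not go-back-N)."""
        return [s for s in range(self.total) if s not in self.received]

    def complete(self):
        return len(self.received) == self.total


rx = SprayReceiver(total_packets=8)
for seq in (3, 0, 7, 1, 2):      # five packets arrive over different paths, out of order
    rx.on_packet(seq, b"...")
print(rx.missing())              # -> [4, 5, 6]; only these three are retransmitted
```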
The TCO conversation
For most production AI fabric decisions in 2026, the network is 5–8% of cluster TCO over five years. The InfiniBand premium typically lands in the +30% to +60% range over open-hardware Ethernet for equivalent capacity. On a $100M cluster, that's a meaningful number — but the more important number is what you can do with the saved capex (more GPUs, larger storage tier, second site for HA). And for clusters where the network is multi-tenant or shared with the rest of DC, the operational simplification of one network model is worth more than its line-item cost difference.
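The back-of-envelope from those numbers (a sketch; it treats the 5–8% share as the Ethernet baseline and applies the 30–60% premium on top):

```python
# Back-of-envelope on the numbers above: a $100M cluster, network at 5-8% of
# five-year TCO, and an InfiniBand premium of 30-60% over open-hardware Ethernet.

cluster_tco = 100_000_000
for net_share in (0.05, 0.08):
    ethernet_network = cluster_tco * net_share
    for premium in (0.30, 0.60):
        freed_capex = ethernet_network * premium
        print(f"network share {net_share:.0%}, IB premium {premium:.0%}: "
              f"~${freed_capex / 1e6:.1f}M freed for GPUs, storage, or a second site")

# Output spans roughly $1.5M (5% share, 30% premium) to $4.8M (8% share, 60% premium).
```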
The IP Infusion view
- Both have a place. We're not going to pretend Ethernet wins every workload. Tight HPC clusters with absolute-floor latency requirements will keep buying InfiniBand for a while.
- Most AI fabrics belong on Ethernet. Production AI training and inference at hyperscale is moving to Ethernet because the operational and economic case is overwhelming once the technical gap closes — and it's closing fast.
- OcNOS-DC is the open path. RoCEv2 today, DLB today, GLB next, UEC as NICs ship. One NOS, one feature roadmap, on validated open hardware from Edgecore, UfiSpace, Wedge, and others.
- The architecture review is free. If you're sizing a fabric and want a workload-specific take rather than a vendor pitch, our network architects will run the maths with you.