RoCEv2 — Lossless Ethernet for AI Fabrics
RDMA over Converged Ethernet v2 is what carries GPU collective traffic across modern AI fabrics. OcNOS implements the full RoCEv2 toolkit — PFC, ECN/DCQCN, adaptive load-balancing, and per-priority telemetry — on validated 400G and 800G open hardware.
AI Fabric Rail Topology
A compact rail slice: two spines and two leaves carrying RoCEv2 between four GPUs. Under congestion, PFC pause frames propagate hop by hop back toward the sender, while switches ECN-mark packets so that DCQCN can slow the source.
Why RoCEv2 matters for AI/ML fabrics
GPU collectives (all-reduce, all-gather, all-to-all) generate elephant flows that can saturate individual fabric paths and need near-zero loss to keep training jobs efficient. RoCE NICs commonly recover from loss with go-back-N retransmission, so a single dropped packet on a 400G link can force the sender to resend everything from that packet onward. Because collectives are synchronous, every GPU in the job waits on the slowest flow, and the loss shows up as GPU idle time. Three mechanisms make a leaf-spine fabric a lossless transport for RoCEv2 workloads: PFC (Priority Flow Control), ECN (Explicit Congestion Notification), and DCQCN (Data Center Quantized Congestion Notification).
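To make the DCQCN pillar concrete, here is a minimal Python sketch of the sender-side rate update described in the published DCQCN algorithm. It is not OcNOS or NIC firmware code. The class name, the default constants, and the simplified single increase timer are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class DcqcnSender:
    """Simplified DCQCN reaction point (illustrative constants, not vendor defaults)."""
    line_rate_gbps: float
    g: float = 1.0 / 256          # gain used to update alpha
    rai_gbps: float = 0.04        # additive-increase step (assumed value)
    fast_recovery_steps: int = 5  # increase steps before additive increase starts

    def __post_init__(self):
        self.current = self.line_rate_gbps  # current sending rate
        self.target = self.line_rate_gbps   # rate to recover toward
        self.alpha = 1.0                    # congestion estimate in [0, 1]
        self.steps_since_cut = 0

    def on_cnp(self):
        """A congestion notification packet arrived: cut the rate."""
        self.target = self.current
        self.current *= 1 - self.alpha / 2
        self.alpha = (1 - self.g) * self.alpha + self.g
        self.steps_since_cut = 0

    def on_alpha_timer(self):
        """No CNP during the alpha interval: decay the congestion estimate."""
        self.alpha *= 1 - self.g

    def on_increase_timer(self):
        """No CNP during the increase interval: recover toward the target."""
        self.steps_since_cut += 1
        if self.steps_since_cut > self.fast_recovery_steps:
            # Past fast recovery: probe for more bandwidth additively.
            self.target = min(self.target + self.rai_gbps, self.line_rate_gbps)
        self.current = (self.target + self.current) / 2


sender = DcqcnSender(line_rate_gbps=400.0)
sender.on_cnp()             # first CNP with alpha = 1 halves the rate
rate_after_cut = sender.current
sender.on_increase_timer()  # fast recovery moves halfway back to the target
rate_after_recovery = sender.current
```

In this sketch a first CNP halves a 400 Gb/s flow and one quiet increase interval brings it halfway back, which is why ECN thresholds are usually tuned so that marking starts well before PFC has to pause the link.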
The OcNOS RoCEv2 implementation
Per-priority pause
802.1Qbb PFC on configurable priority queues, paired with watchdog timers to detect deadlock conditions and auto-recover before they propagate.
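The watchdog idea can be sketched as a small state machine: a queue that stays paused, non-empty, and makes no transmit progress for longer than a detection interval is treated as stalled and put into recovery for a hold time. The Python below is a generic illustration under those assumptions. The class, the field names, and the 200 ms and 400 ms timers are hypothetical and are not OcNOS defaults.

```python
from dataclasses import dataclass


@dataclass
class PfcWatchdog:
    """Per-queue PFC watchdog sketch (timer values are assumptions)."""
    detect_ms: int = 200    # continuous stall time before declaring a storm
    restore_ms: int = 400   # time spent in recovery before resuming normal PFC
    stalled_ms: int = 0
    recovering_ms: int = 0
    state: str = "normal"

    def poll(self, elapsed_ms, paused, queue_depth, tx_packets_delta):
        """Advance the watchdog by one polling interval and return the state."""
        if self.state == "recovering":
            self.recovering_ms += elapsed_ms
            if self.recovering_ms >= self.restore_ms:
                self.state = "normal"
                self.recovering_ms = 0
                self.stalled_ms = 0
            return self.state

        # Stuck means: the peer is pausing us, traffic is waiting, nothing left.
        stuck = paused and queue_depth > 0 and tx_packets_delta == 0
        self.stalled_ms = self.stalled_ms + elapsed_ms if stuck else 0
        if self.stalled_ms >= self.detect_ms:
            self.state = "recovering"
        return self.state


wd = PfcWatchdog()
states = [
    wd.poll(100, paused=True, queue_depth=10, tx_packets_delta=0),
    wd.poll(100, paused=True, queue_depth=10, tx_packets_delta=0),
    wd.poll(200, paused=True, queue_depth=10, tx_packets_delta=0),
    wd.poll(200, paused=True, queue_depth=10, tx_packets_delta=0),
]
```

Any transmit progress resets the stall timer, so short, legitimate pauses never trip the watchdog.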
Adaptive marking
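WRED-based ECN marking per queue, with the switch acting as the DCQCN congestion point: marked packets prompt the receiver to send congestion notifications that slow the sender (the reaction point). Tuned defaults for NCCL-class workloads; parametric override for custom RDMA stacks.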
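As a rough illustration of WRED-style ECN marking, the probability of marking a packet is commonly modeled as zero below a minimum queue threshold, rising linearly to a maximum probability at an upper threshold, and one beyond it. The function name and the example thresholds below are assumptions for the sketch, not OcNOS defaults.

```python
def ecn_mark_probability(queue_kb, k_min_kb, k_max_kb, p_max):
    """Return the ECN marking probability for the current queue depth."""
    if queue_kb <= k_min_kb:
        return 0.0                      # shallow queue: never mark
    if queue_kb >= k_max_kb:
        return 1.0                      # deep queue: mark every packet
    # Between the thresholds: linear ramp from 0 up to p_max.
    return p_max * (queue_kb - k_min_kb) / (k_max_kb - k_min_kb)


# Hypothetical thresholds: start marking at 150 KB, mark everything at 1500 KB.
p_shallow = ecn_mark_probability(100, 150, 1500, 0.1)
p_midway = ecn_mark_probability(825, 150, 1500, 0.1)
p_deep = ecn_mark_probability(2000, 150, 1500, 0.1)
```

With these example values a queue halfway between the thresholds marks about 5% of packets. The lower threshold trades latency for throughput, while the upper threshold should sit below the depth at which PFC would trigger.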
Adaptive flowlet
Dynamic Load Balancing (DLB) re-bins flowlets on link saturation in sub-millisecond intervals. Avoids the static hash collisions that overload some links while others sit idle, even in symmetric topologies.
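The flowlet idea can be sketched in a few lines: a flow stays on its current link while packets arrive back to back, and it may move to the least-loaded link only after an idle gap long enough that earlier packets have drained, which avoids reordering. This Python model is a simplification with hypothetical names. It counts cumulative bytes per link, whereas a switch ASIC would use a decaying measure of link utilization or queue depth.

```python
class FlowletBalancer:
    """Toy flowlet-based load balancer (not an ASIC implementation)."""

    def __init__(self, links, gap_us):
        self.load = {link: 0 for link in links}  # bytes sent per link
        self.gap_us = gap_us                     # idle gap that ends a flowlet
        self.table = {}                          # flow_id -> (link, last_seen_us)

    def route(self, flow_id, now_us, size_bytes):
        """Pick the egress link for one packet and record it."""
        entry = self.table.get(flow_id)
        if entry is None or now_us - entry[1] > self.gap_us:
            # New flow or new flowlet: free to choose the least-loaded link.
            link = min(self.load, key=self.load.get)
        else:
            # Same flowlet: stay on the same link to preserve packet order.
            link = entry[0]
        self.table[flow_id] = (link, now_us)
        self.load[link] += size_bytes
        return link


lb = FlowletBalancer(["spine1", "spine2"], gap_us=50)
path = [
    lb.route("flow-a", 0, 9000),    # first packet: both links idle
    lb.route("flow-a", 10, 9000),   # 10 us gap: same flowlet
    lb.route("flow-b", 20, 1000),   # new flow: takes the lighter link
    lb.route("flow-a", 100, 9000),  # 90 us gap: new flowlet, may move
]
```

In the usage above, flow-a moves from spine1 to spine2 once its idle gap exceeds 50 us, because spine2 has carried less traffic by then.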
Per-priority queue stats
gNMI streaming sensors for queue depth, PFC pause counters, ECN-marked packets, and microburst detection — exported at 1-second granularity.
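On the collector side, streamed counters such as PFC pause frames or ECN-marked packets are usually turned into per-interval rates before alerting. The sketch below assumes samples have already been decoded into (timestamp in seconds, counter value) pairs. That record shape, the function names, and the threshold are hypothetical and do not reflect any specific OcNOS sensor path.

```python
def counter_rates(samples):
    """Convert cumulative counter samples into (timestamp, per-second rate) pairs."""
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0 or c1 < c0:
            # Out-of-order sample or counter reset: skip this interval.
            continue
        rates.append((t1, (c1 - c0) / dt))
    return rates


def pause_storm_intervals(samples, threshold_per_s):
    """Return timestamps whose pause-frame rate exceeds the threshold."""
    return [t for t, rate in counter_rates(samples) if rate > threshold_per_s]


# Hypothetical PFC pause-frame counter sampled once per second.
pfc_samples = [(0, 100), (1, 150), (2, 5150), (3, 20), (4, 30)]
rates = counter_rates(pfc_samples)
storms = pause_storm_intervals(pfc_samples, threshold_per_s=1000)
```

The drop from 5150 to 20 is treated as a counter reset and ignored rather than reported as a negative rate.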
Rail-optimized fabrics
Validated for rail-aligned and scheduled-fabric topologies. Recipes for 256–4,096 GPU clusters using off-the-shelf 400G and 800G open switches.
Lossless verification
CLI diagnostics to verify a known-good lossless config end-to-end: PFC headroom math, ECN threshold sanity, and a synthetic incast test.
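The PFC headroom math can be approximated by hand. After a port decides to send a pause frame, it must still absorb the data already on the wire in both directions, the data sent while the peer reacts, and one maximum-size frame at each end that cannot be interrupted. The Python sketch below encodes that common approximation. The function name, the 5 ns/m propagation figure, and the example response time are assumptions, not values taken from OcNOS.

```python
import math


def pfc_headroom_bytes(link_gbps, cable_m, mtu_bytes, response_ns, ns_per_m=5.0):
    """Estimate per-priority buffer headroom needed after sending a pause frame."""
    bytes_per_ns = link_gbps / 8.0            # 1 Gb/s moves 0.125 bytes per ns
    round_trip_ns = 2 * cable_m * ns_per_m    # pause travels out, data keeps arriving
    in_flight = (round_trip_ns + response_ns) * bytes_per_ns
    # One frame may delay the pause locally; one may already be leaving the peer.
    return math.ceil(in_flight + 2 * mtu_bytes)


# Example: 400G link, 100 m of fiber, 4200-byte frames, 1 us assumed peer response.
headroom = pfc_headroom_bytes(link_gbps=400, cable_m=100, mtu_bytes=4200, response_ns=1000)
```

For the example inputs this comes to 108,400 bytes per lossless priority per port, and it scales linearly with link speed and cable length, so 800G ports and long inter-row runs need proportionally more headroom.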
What you get with OcNOS
- Open hardware choice. Run RoCEv2 on UfiSpace, Edgecore, Wedge, or Celestica platforms with the same NOS image — no vendor lock-in for the fabric layer.
- Day-one feature parity. Adaptive LB, DCQCN tuning, and ASIC-native telemetry are not paid add-ons — they're part of the base OcNOS-DC license.
- Reference designs. Validated configs for popular AI fabric topologies; we publish the configs and the test results.
- Engineering access. Premium support tier includes direct dialog with the OcNOS RoCEv2 team during fabric bring-up.