OcNOS 7.0 for Data Centers: AI Fabric, 800G Platforms, and EVPN-VXLAN at Scale

OcNOS 7.0 for Data Centers is now generally available. This release starts from one premise: in AI infrastructure, the network is a critical determinant of GPU cluster efficiency, job completion time, and return on investment.

Latency, jitter, and packet loss inside an AI training fabric translate directly into lost GPU productivity at scale. OcNOS 7.0 addresses this with purpose-built AI fabric capabilities on open, ONIE-enabled Broadcom Tomahawk 5 platforms, delivering the performance of proprietary hyperscale solutions without vendor lock-in.

1. AI/ML Fabric: Lossless Transport on Broadcom Tomahawk 5

GPU-to-GPU communication in AI training clusters relies on RDMA over Converged Ethernet (RoCEv2). RoCEv2 is highly sensitive to packet loss: a brief congestion event that drops packets triggers retransmissions that can stall the entire training job. OcNOS 7.0 delivers a complete lossless fabric solution for RoCEv2 workloads.

Priority-Based Flow Control (PFC) and Enhanced Transmission Selection (ETS)

PFC (IEEE 802.1Qbb) uses per-priority pause frames: when a queue on the receiving interface congests, the receiver signals the sender to stop transmitting for that priority class only, which prevents packet drops without affecting other traffic classes. ETS (IEEE 802.1Qaz) allocates bandwidth among traffic classes using weighted scheduling. DCBX lets the switch and attached servers negotiate these settings automatically.
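To make the pause mechanism concrete, here is a minimal Python sketch of a single no-drop queue with XOFF and XON thresholds. The function name, thresholds, buffer size, and traffic pattern are all illustrative assumptions, not OcNOS or Tomahawk 5 parameters, and the model is deliberately coarse (fixed ticks, abstract cells, no sender backlog).

```python
def simulate_pfc(offered, drain, xoff, xon, capacity, delay=1):
    """Simulate one no-drop priority queue; all quantities are abstract cells.

    offered: cells the sender would transmit in each tick if not paused.
    delay:   ticks (>= 1) before a pause or resume signal reaches the sender.
    Returns (dropped, paused_ticks, peak_depth).
    """
    depth = dropped = paused_ticks = peak = 0
    signal = False                # pause state the receiver currently advertises
    in_flight = [False] * delay   # signals still on their way to the sender

    for cells in offered:
        sender_paused = in_flight.pop(0)
        if sender_paused:
            paused_ticks += 1
            cells = 0             # sender holds this priority; backlog not modeled
        accepted = min(cells, capacity - depth)
        dropped += cells - accepted
        depth += accepted
        peak = max(peak, depth)
        depth = max(0, depth - drain)
        if depth >= xoff:
            signal = True         # XOFF: ask the sender to pause this priority
        elif depth <= xon:
            signal = False        # XON: queue has drained, sender may resume
        in_flight.append(signal)
    return dropped, paused_ticks, peak


# Hypothetical load: the sender offers 10 cells per tick, the queue drains 4.
with_pfc = simulate_pfc([10] * 20, drain=4, xoff=12, xon=4, capacity=30)
# Setting xoff above the buffer size means the pause is never triggered.
without_pfc = simulate_pfc([10] * 20, drain=4, xoff=31, xon=4, capacity=30)
print("with PFC   :", with_pfc)
print("without PFC:", without_pfc)
```

With these made-up numbers the paused queue never overflows, while the unpaused one starts dropping once the buffer fills. The gap between `xoff` and `capacity` plays the role of headroom: it has to absorb whatever still arrives while the pause signal is in flight.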

[Figure: AI fabric topology. Spine-1 and Spine-2 (TH5, 64x800G, 51.2T each); Leaf-1, Leaf-2, and Leaf-3 (EVPN-VXLAN, PFC, ETS); GPU Pods A through F (RoCEv2, 400G).]
OcNOS 7.0 AI fabric: Broadcom Tomahawk 5 spine and leaf switches running EVPN-VXLAN with PFC, ETS, and DCBX for lossless RoCEv2 GPU-to-GPU communication.

! OcNOS 7.0 -- AI fabric: PFC, ETS, and DCBX configuration
!
! Step 1: Define QoS map for RoCEv2 traffic (priority 3)
qos map dscp-cos ROCE-MAP
  dscp 24 cos 3         ! This example marks RoCEv2 as DSCP 24 (CS3); match the NIC setting
!
! Step 2: Enable PFC for priority 3 (RoCEv2)
interface Ethernet1/1
  qos map dscp-cos ROCE-MAP
  priority-flow-control mode on
  priority-flow-control priority 3 no-drop
!
! Step 3: ETS bandwidth allocation
qos scheduler-group FABRIC-ETS
  strict-priority 7       ! Highest: control plane, served first
  wrr cos 3 weight 70     ! RoCEv2: 70% of bandwidth left after strict priority
  wrr cos 0 weight 30     ! Best-effort: 30% of bandwidth left after strict priority
!
! Step 4: DCBX for auto-negotiation with servers
interface Ethernet1/1
  dcbx enable
  dcbx version ieee
!
! Verification:
show priority-flow-control
show dcbx interface Ethernet1/1
show qos scheduler-group FABRIC-ETS
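
The effect of the strict-priority and 70/30 weighted settings above can be sketched as a fluid approximation in Python. This is an illustration under stated assumptions, not how the scheduler is implemented: the function name, the 400 Gbps link rate, and the per-class demands are hypothetical, and real hardware schedules packets rather than dividing bandwidth continuously.

```python
def ets_allocate(link_gbps, strict_demand, wrr):
    """Fluid model: strict priority first, then work-conserving weighted shares.

    wrr maps class name -> (weight, demand_gbps).
    Returns a dict of allocated Gbps per class, plus a "strict" entry.
    """
    strict = min(strict_demand, link_gbps)
    remaining = link_gbps - strict
    result = {name: 0.0 for name in wrr}
    active = dict(wrr)

    while active and remaining > 1e-9:
        total_weight = sum(w for w, _ in active.values())
        # Classes whose unmet demand fits inside their weighted share.
        satisfied = {
            name: demand - result[name]
            for name, (w, demand) in active.items()
            if demand - result[name] <= remaining * w / total_weight
        }
        if not satisfied:
            # Everyone is backlogged: split what is left by weight and stop.
            for name, (w, _) in active.items():
                result[name] += remaining * w / total_weight
            break
        # Satisfied classes take only what they need; leftovers are re-shared.
        for name, need in satisfied.items():
            result[name] += need
            remaining -= need
            del active[name]

    return {"strict": strict, **result}


# Hypothetical 400G link, 8 Gbps of control traffic, both classes saturated.
print(ets_allocate(400, 8, {"cos3": (70, 400), "cos0": (30, 400)}))
# Same link, but best-effort only needs 50 Gbps; RoCEv2 picks up the slack.
print(ets_allocate(400, 8, {"cos3": (70, 400), "cos0": (30, 50)}))
```

The second call shows why weights are a floor under contention rather than a cap: when one class underuses its share, the unused bandwidth goes to the classes that still have traffic to send.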

2. EVPN-VXLAN Multi-Site Overlay Extension

Modern data centers are no longer confined to a single location. AI clusters, hybrid cloud architectures, and distributed applications require seamless connectivity across multiple sites without sacrificing tenant isolation or operational control. OcNOS 7.0 delivers EVPN Layer 3 overlay extension with VXLAN stitching between sites, eliminating external gateway appliances and reducing infrastructure cost.

! OcNOS 7.0 -- EVPN-VXLAN multi-site L3 overlay extension
! Border Leaf: Site A side
!
vrf TENANT-A
  vni 10001
!
router bgp 65001
  !
  address-family l2vpn evpn
    neighbor 10.0.0.2 activate            ! Site A spine
    neighbor 10.200.0.1 activate          ! Site B border leaf
    neighbor 10.200.0.1 route-map EXPORT-SITE-B out
  !
  vrf TENANT-A
    rd 65001:10001
    route-target import 65002:10001       ! Import Site B prefixes
    route-target export 65001:10001       ! Export Site A prefixes
    redistribute connected
!
! Route-map: control which EVPN routes are advertised to Site B
route-map EXPORT-SITE-B permit 10
  match evpn route-type 5               ! Only IP prefix routes (type 5)
  set extcommunity rt 65001:10001 additive
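
The interplay between the export route-map and the route-target import in the configuration above can be sketched in Python. This is a simplified model, not BGP: the class and function names are hypothetical, the sample routes are invented, it assumes routes not matched by the route-map are denied, and it assumes Site B mirrors the configuration by importing `65001:10001`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvpnRoute:
    route_type: int           # 2 = MAC/IP advertisement, 5 = IP prefix
    prefix: str
    route_targets: frozenset  # extended communities carried by the route


def export_to_site_b(routes, added_rt="65001:10001"):
    """Model EXPORT-SITE-B: pass only type-5 routes and add the export RT."""
    return [
        EvpnRoute(r.route_type, r.prefix, r.route_targets | {added_rt})
        for r in routes
        if r.route_type == 5
    ]


def import_into_vrf(routes, import_rts):
    """A VRF imports a route if it carries at least one matching RT."""
    return [r for r in routes if r.route_targets & import_rts]


# Invented Site A routes for illustration.
site_a_routes = [
    EvpnRoute(2, "aa:bb:cc:00:00:01/10.1.1.10", frozenset({"65001:10001"})),
    EvpnRoute(5, "10.1.1.0/24", frozenset({"65001:10001"})),
    EvpnRoute(5, "10.9.9.0/24", frozenset({"65001:20002"})),  # another tenant
]

advertised = export_to_site_b(site_a_routes)
# Assumed Site B side: TENANT-A imports the RT that Site A exports.
imported = import_into_vrf(advertised, frozenset({"65001:10001"}))
print([r.prefix for r in advertised])
print([r.prefix for r in imported])
```

Note that in this model the second type-5 route also reaches Site B's TENANT-A, because the `additive` set attaches `65001:10001` to every route the route-map permits. Whether that is desirable depends on which prefixes the neighbor session carries.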

3. Enhanced Visibility: Selective Mirroring and Route Target Filtering

At scale, operational visibility is not optional. OcNOS 7.0 introduces selective packet mirroring to CPU with filtering, enabling operators to capture and analyze specific traffic flows directly within the fabric — without deploying dedicated capture infrastructure or impacting forwarding performance.

! OcNOS 7.0 -- Selective mirroring for fabric troubleshooting
!
monitor session 1
  source interface Ethernet1/1 rx
  filter access-group MIRROR-FILTER
  destination cpu
!
! Filter: capture only VXLAN-encapsulated traffic (UDP port 4789, all VNIs)
ip access-list MIRROR-FILTER
  permit udp any any dst-port 4789      ! VXLAN UDP port
!
! Verification:
show monitor session 1
show capture buffer
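
Because the access list above matches on the VXLAN UDP port only, narrowing a capture to a single VNI such as 10001 has to happen after the fact. A minimal Python sketch of that step follows; it parses the 8-byte VXLAN header defined in RFC 7348 from a UDP payload. The function name is hypothetical, and how packets are exported from the capture buffer is not covered here.

```python
import struct

VXLAN_UDP_PORT = 4789
VNI_VALID_FLAG = 0x08  # the "I" bit in the first header byte


def vxlan_vni(udp_payload: bytes):
    """Return the 24-bit VNI from a VXLAN header, or None if it is not valid."""
    if len(udp_payload) < 8:
        return None
    # Layout: 1 byte flags, 3 bytes reserved, 3 bytes VNI, 1 byte reserved.
    flags, vni_and_reserved = struct.unpack("!B3xI", udp_payload[:8])
    if not flags & VNI_VALID_FLAG:
        return None
    return vni_and_reserved >> 8


def keep_vni(payloads, wanted_vni):
    """Filter UDP payloads (already known to use port 4789) by VNI."""
    return [p for p in payloads if vxlan_vni(p) == wanted_vni]


# Build two synthetic headers to exercise the parser.
tenant_a = struct.pack("!B3xI", VNI_VALID_FLAG, 10001 << 8) + b"inner-frame"
tenant_b = struct.pack("!B3xI", VNI_VALID_FLAG, 20002 << 8) + b"inner-frame"
print(vxlan_vni(tenant_a))
print(len(keep_vni([tenant_a, tenant_b], 10001)))
```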

4. New Platform: UfiSpace S9321-64EO (Broadcom Tomahawk 5)

OcNOS 7.0 introduces support for the UfiSpace S9321-64EO — a purpose-built AI/ML fabric switch delivering 51.2 Tbps of switching capacity with 64 high-density OSFP ports at 800G. This platform is engineered for next-generation GPU interconnects and large-scale AI training clusters requiring ultra-low latency and deterministic forwarding.

Platform            | Silicon             | Switching Capacity | Ports        | Use Case
UfiSpace S9321-64EO | Broadcom Tomahawk 5 | 51.2 Tbps          | 64×800G OSFP | AI/ML spine, GPU fabric
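
As a quick sanity check, the switching capacity figure follows directly from the port count and port speed. The helper name below is hypothetical.

```python
def switching_capacity_tbps(ports: int, port_speed_gbps: int) -> float:
    """Aggregate one-direction port bandwidth in Tbps."""
    return ports * port_speed_gbps / 1000


capacity = switching_capacity_tbps(ports=64, port_speed_gbps=800)
print(capacity)  # 51.2
```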

Key Benefits of OcNOS for Data Centers

  • Lossless AI fabric — PFC, ETS, and DCBX deliver RoCEv2 lossless transport without proprietary NICs or switches
  • Open hardware choice — Broadcom Tomahawk 5 silicon on multiple ODM platforms from UfiSpace, Edgecore, and others
  • EVPN-VXLAN multi-tenancy — scalable overlay with Route Target filtering and multi-site L3 extension
  • Real-time visibility — on-change gNMI telemetry and selective mirroring for operational insight at scale
  • All-inclusive licensing — single SKU covers full OcNOS feature set; no per-feature upsell

Alan Huang is Senior Product Manager, Data Center at IP Infusion.
