An open AI fabric — built for what your training job actually feels.
At thousands of accelerators you don't measure switches in Tbps — you measure job completion time, GPU utilization, and tail latency under microbursts. OcNOS-DC moves those numbers on open merchant silicon with a 24/7 carrier-grade SLA: the same technical floor as the closed AI stacks, none of the lock-in.
"Will my training job actually finish faster?"
At scale, traditional network metrics lose their meaning. What matters is Job Completion Time, GPU utilization, and tail latency under microbursts — because every minute a multi-billion-dollar cluster waits on a synchronization step is capital burned.
The lossless, low-latency performance AI needs no longer requires a closed, proprietary stack. On open merchant silicon with a carrier-grade SLA, OcNOS-DC matches the technical floor of closed architectures with no vendor lock-in — congestion management, sub-millisecond dynamic routing, and Ultra Ethernet alignment, tuned for the bursty patterns of collective traffic. GPUs spend their time processing data, not waiting on the network.
Every threshold is exposed, so your team can tune it against real xCCL (NCCL / RCCL / oneCCL) traffic. Below: each workload pattern, the mechanism that handles it, and what the operator gets back.
→ DLB rebinds flowlets sub-ms on live queue depth.
→ GLB (OcNOS 7.1) scores leaf · spine · super-spine.
→ DCQCN (xCCL-tuned ECN + CNP) caps rate before the drop.
→ PFC Watchdog auto-drains stuck queues per-port.
→ UEC 1.0: packet spray + multi-path RDMA + out-of-order delivery.
→ The switch you buy today stays when UEC NICs land.
Field measurement: DLB lifts fabric utilization from ~55% on static ECMP to 90%+ on the same hardware, with no extra uplinks. Each DLB decision is local to one hop; the gain shows up system-wide across the AllReduce.
800G spine-leaf, lossless from rack to rack.
A 3-stage Clos: eBGP unnumbered underlay, ECMP at every tier, PFC/ECN per priority group, and an isolated out-of-band network for ZTP and telemetry.
Full HCL: 40+ validated platforms at ipinfusion.com/hcl
Four layers of losslessness — correct on Day 1.
Most AI fabric failures trace to a single misconfigured PFC priority group, or to an ECN threshold tuned for general cloud traffic rather than RDMA. OcNOS-DC ships RoCEv2 buffer profiles validated per Broadcom ASIC, so your first AllReduce runs lossless without a tuning sprint.
PFC + ECN — priority-group lossless control
PFC pauses traffic per priority before buffers overflow; ECN marks packets early so senders slow down. No drops, no port-wide stall. PFC also runs over L3 for routed, multi-row fabrics.
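Here is a minimal sketch of the pause/resume hysteresis behind PFC. It assumes a single lossless priority queue and two hypothetical thresholds in buffer cells (`xoff_cells`, `xon_cells`), not OcNOS defaults. Real buffer profiles also reserve headroom for bytes already in flight on the cable when XOFF goes out; that is not modelled here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PfcQueue:
    """Toy model of one lossless priority queue on an ingress port.

    xoff_cells / xon_cells are hypothetical thresholds, not OcNOS defaults.
    """
    xoff_cells: int
    xon_cells: int
    depth: int = 0
    paused: bool = False  # True while the upstream peer is being asked to pause

    def step(self, arrivals: int, departures: int) -> Optional[str]:
        """Advance one tick; return 'XOFF' or 'XON' when a PFC frame would be sent."""
        self.depth = max(0, self.depth + arrivals - departures)
        if not self.paused and self.depth >= self.xoff_cells:
            self.paused = True
            return "XOFF"  # buffer nearly full: pause this priority upstream
        if self.paused and self.depth <= self.xon_cells:
            self.paused = False
            return "XON"   # drained below the resume threshold
        return None


q = PfcQueue(xoff_cells=100, xon_cells=60)
events = [q.step(a, d) for a, d in [(50, 10), (70, 10), (0, 40), (0, 40)]]
print(events)  # [None, 'XOFF', 'XON', None]
```

The gap between the two thresholds stops the port from flapping between pause and resume on every packet.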
DLB — flowlet-level adaptive routing
Static-hash ECMP collides when 8 NICs hash to the same spine. DLB watches live queue depth and rebinds flowlets to less-loaded paths sub-ms — the AllReduce stops dragging on the slowest link.
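The toy example below illustrates flowlet rebinding. It assumes per-path queue depth is visible when each packet arrives. It also assumes that any idle gap longer than a hypothetical `gap_us` marks a flowlet boundary. This is not Broadcom's DLB algorithm. It only shows why moving a flow between flowlets avoids reordering while still escaping a congested path.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FlowletBalancer:
    """Flowlet-based path selection in the spirit of DLB (illustrative only)."""
    num_paths: int
    gap_us: float                                              # hypothetical flowlet gap
    last_seen: Dict[str, float] = field(default_factory=dict)  # flow -> last packet time
    binding: Dict[str, int] = field(default_factory=dict)      # flow -> current path

    def route(self, flow_id: str, now_us: float, queue_depth: List[int]) -> int:
        idle = now_us - self.last_seen.get(flow_id, float("-inf"))
        self.last_seen[flow_id] = now_us
        if flow_id not in self.binding or idle > self.gap_us:
            # New flowlet: if the gap exceeds the delay skew between paths,
            # moving to the emptiest queue cannot reorder packets.
            self.binding[flow_id] = min(range(self.num_paths), key=lambda p: queue_depth[p])
        # Same flowlet: stay on the current path even if it is now busier.
        return self.binding[flow_id]


lb = FlowletBalancer(num_paths=4, gap_us=5.0)
paths = [
    lb.route("A", 0.0, [0, 0, 0, 0]),   # first packet -> path 0
    lb.route("A", 2.0, [9, 0, 0, 0]),   # 2 us gap: same flowlet, stays on 0
    lb.route("A", 20.0, [9, 3, 1, 4]),  # 18 us gap: new flowlet, moves to path 2
]
print(paths)  # [0, 0, 2]
```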
DCBX — server config auto-pushed over LLDP
The leaf pushes the correct PFC and ETS config to the GPU server automatically. A re-imaged node therefore cannot silently lose its lossless settings, which is the most common production failure mode.
gNMI on-change telemetry — sub-second visibility
PFC pauses, ECN marking, DCQCN thresholds, and buffer depths as gNMI on-change sensor paths — straight into Prometheus / Grafana / OpenTelemetry. Catch congestion before it stalls a job.
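On the collector side, on-change counter updates are usually turned into rates before alerting. The sketch below assumes each update has already been decoded into three fields: a path, a nanosecond timestamp (as gNMI notifications carry), and a cumulative counter value. The path string and the alert threshold are hypothetical, and no gNMI client library is used.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CounterSample:
    path: str    # hypothetical sensor path for one interface/priority pause counter
    ts_ns: int   # gNMI notification timestamps are nanoseconds since the Unix epoch
    value: int   # cumulative counter value


class PauseRateMonitor:
    """Turns successive on-change counter updates into a per-second rate.

    alert_per_s is a hypothetical operator threshold; counter resets and
    wraps are not handled in this sketch.
    """

    def __init__(self, alert_per_s: float) -> None:
        self.alert_per_s = alert_per_s
        self.prev: Dict[str, CounterSample] = {}

    def ingest(self, s: CounterSample) -> Optional[float]:
        """Return the rate when it exceeds the threshold, otherwise None."""
        prev = self.prev.get(s.path)
        self.prev[s.path] = s
        if prev is None or s.ts_ns <= prev.ts_ns:
            return None  # first sample, or a non-increasing timestamp
        rate = (s.value - prev.value) / ((s.ts_ns - prev.ts_ns) / 1e9)
        return rate if rate > self.alert_per_s else None


mon = PauseRateMonitor(alert_per_s=1000)
p = "eth1/prio3/pfc-rx-pause"  # hypothetical path, not an OcNOS sensor path
results = [
    mon.ingest(CounterSample(p, 0, 0)),
    mon.ingest(CounterSample(p, 500_000_000, 200)),     # 400 pauses/s: below threshold
    mon.ingest(CounterSample(p, 1_000_000_000, 1200)),  # 2000 pauses/s: alert
]
print(results)  # [None, None, 2000.0]
```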
The fabric profile is ready before the NICs are. That's the point.
RoCEv2 is the production transport in 2026; UEC is what comes next. The UEC 1.0 fabric profile adds packet spray, multi-path RDMA, and out-of-order-friendly forwarding. Together these close the single-hash limit that kept earlier RoCE a step behind InfiniBand on multi-rail collectives. OcNOS-DC tracks the UEC 1.0 fabric profile today, as UEC NICs roll out. The point isn't leading the standard; everyone is aligning to it. The point is that the switch you buy this quarter won't need replacing when your UEC NIC arrives.
Packet spray
Single flow uses every parallel path simultaneously instead of being pinned to one ECMP hash. Multi-rail bandwidth is no longer left on the table.
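A few lines of Python show the difference between hash pinning and spraying. The sketch assumes four equal-cost paths. It uses CRC32 as a stand-in ECMP hash and plain round-robin as a stand-in spray policy. Production ASICs and UEC NICs use their own hash functions and may weight or steer the spray by congestion.

```python
import zlib
from collections import Counter
from typing import List


def ecmp_path(flow_key: bytes, num_paths: int) -> int:
    """Classic ECMP: every packet of a flow hashes to the same path."""
    return zlib.crc32(flow_key) % num_paths


def spray_paths(num_packets: int, num_paths: int, start: int = 0) -> List[int]:
    """Packet spray, simplified to round-robin: consecutive packets of one flow
    go out on successive paths."""
    return [(start + i) % num_paths for i in range(num_packets)]


key = b"10.0.0.1>10.0.1.1:udp/4791"  # example flow key (4791 is the RoCEv2 UDP port)
ecmp_load = Counter(ecmp_path(key, 4) for _ in range(8))
spray_load = Counter(spray_paths(8, 4))
print(len(ecmp_load), sorted(spray_load.items()))  # 1 [(0, 2), (1, 2), (2, 2), (3, 2)]
```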
Multi-path RDMA
Reorder buffers on the NIC handle out-of-order delivery in hardware. Modern congestion control replaces NACK-based loss recovery, cutting tail latency.
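With spraying, packets can arrive out of order, so the receiver has to put them back in sequence. Below is a minimal reorder-buffer sketch keyed by packet sequence number. A real UEC NIC also bounds the buffer, times out holes, and requests selective retransmission; none of that is modelled here.

```python
import heapq
from typing import List


class ReorderBuffer:
    """Releases packets strictly in sequence-number order (illustrative sketch)."""

    def __init__(self, first_seq: int = 0) -> None:
        self.next_seq = first_seq
        self.pending: List[int] = []  # min-heap of buffered sequence numbers

    def receive(self, seq: int) -> List[int]:
        """Accept one packet; return the sequence numbers now deliverable in order."""
        if seq < self.next_seq or seq in self.pending:
            return []  # duplicate: already delivered or already buffered
        heapq.heappush(self.pending, seq)
        delivered = []
        while self.pending and self.pending[0] == self.next_seq:
            delivered.append(heapq.heappop(self.pending))
            self.next_seq += 1
        return delivered


rb = ReorderBuffer()
out = [rb.receive(s) for s in [1, 2, 0, 3]]
print(out)  # [[], [], [0, 1, 2], [3]]
```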
Same hardware, forward path
The TH4 and TH5 platforms validated for OcNOS-DC today extend into UEC. No fork. No second SKU line. One fabric, two transport generations.
Where OcNOS-DC sits — honestly, by name.
The race has converged on a shared floor: lossless RoCEv2, DCQCN, adaptive routing, UEC alignment. Everyone ships these. The real differentiator is solution shape: vertical lock-in vs. open NOS, locked vs. open hardware, closed-loop IB vs. standards-based Ethernet. Pick the trade-off you can live with for five years.
Every row ships a real product — including OcNOS-DC. The question is rarely a missing feature; it's the trade-off you'll live with.
What it actually is — and where it stops.
An AI cluster is three layers. The fabric moves bytes between switches; the NIC terminates RDMA; the scheduler decides what runs where. "AI-aware fabric" usually means one vendor bundled all three under one SKU. OcNOS-DC owns the fabric, exposes every threshold, and stays out of the layers above. Here's the boundary, named.
What OcNOS-DC owns.
- Lossless RoCEv2 transport — PFC + ECN + ETS + DCBX
- DCQCN with xCCL-validated default thresholds, every knob YANG-modeled
- DLB sub-ms flowlet rebinding on live ASIC queue depth
- GLB fabric-wide path scoring (OcNOS 7.1)
- PFC deadlock watchdog — per-port, per-priority
- UEC 1.0 fabric-profile alignment — packet-spray-friendly forwarding
- gNMI on-change telemetry, OpenConfig YANG, sub-second cadence
Your NIC vendor's job.
- xCCL collective implementation and tuning
- RDMA verbs, queue pairs, retransmit logic
- UEC packet spray endpoint + reorder buffer (UEC NICs)
- GPU-direct memory access, NVLink coordination
- Per-flow rate limiting and end-host congestion response
Your orchestration platform's job.
- Training-job placement, gang scheduling, gradient-sync windows
- Epoch / training-phase awareness
- Tenant isolation, queue priority, resource quotas
- xCCL ring topology assignment, rail-group affinity
- Cross-job interference detection
Every mechanism on this page has its own deep-dive.
The page above is for picking a fabric. These are for tuning one — packet captures, ASIC behavior, YANG paths, and where each feature ships in the release train.
RoCEv2 + PFC + ECN + DCQCN
The lossless RDMA transport layer for GPU collectives. Buffer profiles pre-tuned per Broadcom ASIC, xCCL-class DCQCN defaults, sub-µs jitter under load.
AI Fabric · Local · Adaptive
Dynamic Load Balancing (DLB)
Sub-millisecond flowlet rebinding using live ASIC queue-depth telemetry. Closes the ECMP hash-collision gap on AllReduce elephant flows.
AI Fabric · Fabric-wide · OcNOS 7.1
Global Load Balancing (GLB)
End-to-end path scoring across leaf · spine · super-spine for clusters up to 16k GPUs. It adds the multi-hop view that per-hop DLB cannot see on its own.
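The sketch below shows why a per-hop view can pick the wrong path. It compares a hypothetical bottleneck-utilization score with a choice that looks only at the first hop. The metric and the link utilizations are illustrative assumptions; this page does not describe GLB's actual scoring function.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str]


def path_score(path: List[str], link_util: Dict[Link, float]) -> float:
    """Hypothetical end-to-end score: utilization of the most loaded link on the path."""
    return max(link_util[(a, b)] for a, b in zip(path, path[1:]))


def best_path(paths: List[List[str]], link_util: Dict[Link, float]) -> List[str]:
    return min(paths, key=lambda p: path_score(p, link_util))


util = {
    ("leaf1", "spine1"): 0.2, ("spine1", "ss1"): 0.95, ("ss1", "spine3"): 0.3, ("spine3", "leaf9"): 0.3,
    ("leaf1", "spine2"): 0.5, ("spine2", "ss2"): 0.6, ("ss2", "spine4"): 0.4, ("spine4", "leaf9"): 0.4,
}
p1 = ["leaf1", "spine1", "ss1", "spine3", "leaf9"]
p2 = ["leaf1", "spine2", "ss2", "spine4", "leaf9"]

local_choice = min([p1, p2], key=lambda p: util[(p[0], p[1])])  # first hop only -> p1
global_choice = best_path([p1, p2], util)                        # bottleneck 0.6 vs 0.95 -> p2
print(local_choice == p1, global_choice == p2)  # True True
```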
AI Fabric · Frontier · UEC 1.0
Ultra Ethernet (UEC)
Packet spray, multi-path RDMA, out-of-order delivery, modern congestion control. The standards-based open answer to InfiniBand.
AI Fabric · Reference Designs
Topologies: 1k / 4k / 16k GPUs
Rail-only and rail-optimized designs map the fabric shape directly onto the xCCL 8-rail multi-NIC pattern. 3-stage Clos for scale-out beyond 1k GPUs. Port counts for TH4 / TH5 platforms.
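Here is a back-of-envelope check on port counts. In folded form, a 3-stage Clos is a two-tier leaf-spine; adding a super-spine gives three tiers. The sketch assumes non-blocking designs, identical switches of radix k, one fabric port per GPU NIC, and no port breakout. Under those assumptions the endpoint ceiling is k²/2 for two tiers and k³/4 for three. k = 64 matches a 64 × 800GbE Tomahawk 5 or a 64 × 400GbE Tomahawk 4.

```python
def max_endpoints(radix: int, tiers: int) -> int:
    """Endpoint ceiling of a non-blocking folded Clos built from identical
    radix-k switches (k/2 ports down, k/2 up at every tier below the top).

    tiers=1: one switch, k endpoints
    tiers=2: leaf-spine (a 3-stage Clos), k^2 / 2
    tiers=3: leaf-spine-super-spine, k^3 / 4
    """
    if radix < 2 or radix % 2:
        raise ValueError("radix must be an even number >= 2")
    if tiers < 1:
        raise ValueError("tiers must be >= 1")
    return radix * (radix // 2) ** (tiers - 1)


for tiers in (2, 3):
    print(tiers, max_endpoints(64, tiers))  # 2 2048, then 3 65536
```

At k = 64 that works out to 2,048 endpoints for leaf-spine and 65,536 with a super-spine tier, before oversubscription or breakout changes the arithmetic.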
AI Fabric · Congestion Control
DCQCN: RDMA Congestion Control
WRED ECN marking, CNP feedback, quantized rate control. xCCL-class defaults out of the box; every threshold YANG-modeled for tuning.
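The DCQCN pieces named above fit in a short sketch that follows the published DCQCN algorithm (Zhu et al., SIGCOMM 2015). The switch applies a RED-style marking curve. The sender cuts its rate on each CNP and then recovers toward its previous rate. Parameter values are illustrative, not OcNOS or NIC defaults, and only the fast-recovery stage of rate increase is modelled.

```python
from dataclasses import dataclass, field


def ecn_mark_probability(q: float, kmin: float, kmax: float, pmax: float) -> float:
    """Switch side (congestion point): RED-style ECN marking curve on queue depth q."""
    if q <= kmin:
        return 0.0
    if q <= kmax:
        return pmax * (q - kmin) / (kmax - kmin)  # linear ramp from 0 to pmax
    return 1.0


@dataclass
class DcqcnSender:
    """NIC side (reaction point); rates in Gb/s, parameters illustrative."""
    rate: float                          # current rate R_C
    alpha: float = 1.0                   # congestion estimate
    g: float = 1 / 256                   # alpha gain
    target: float = field(init=False)    # target rate R_T

    def __post_init__(self) -> None:
        self.target = self.rate

    def on_cnp(self) -> None:
        """CNP received: remember the old rate, cut multiplicatively, raise alpha."""
        self.target = self.rate
        self.rate *= 1 - self.alpha / 2
        self.alpha = (1 - self.g) * self.alpha + self.g

    def on_alpha_timer(self) -> None:
        """No CNP during the alpha update period: decay alpha."""
        self.alpha *= 1 - self.g

    def on_increase_timer(self) -> None:
        """Fast recovery: move halfway back toward the target rate."""
        self.rate = (self.target + self.rate) / 2


print(ecn_mark_probability(250, kmin=100, kmax=400, pmax=0.2))  # 0.1
s = DcqcnSender(rate=400.0)
s.on_cnp()             # 400 -> 200 Gb/s, since alpha starts at 1
s.on_increase_timer()  # halfway back toward 400 -> 300 Gb/s
print(s.rate, s.target)  # 300.0 400.0
```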
AI Fabric · Survival
Watchdog: PFC Deadlock Detection
Per-port, per-priority watchdog detects paused-queue cycles and auto-drains the affected queue before training jobs hang.
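Below is a minimal watchdog sketch for one priority queue. It assumes the switch can poll three facts per interval: whether the peer is pausing the queue, how many packets are waiting, and how many were transmitted since the last poll. The detection and restore timers are hypothetical. "drain" stands for dropping that queue's packets until the pause storm clears.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PfcWatchdog:
    """Per-port, per-priority PFC deadlock/storm detector (illustrative timers)."""
    detect_ms: float                     # how long a queue may stay stuck before action
    restore_ms: float                    # pause-free time required before resuming lossless
    stuck_since: Optional[float] = None
    draining: bool = False
    last_pause_ms: float = 0.0

    def poll(self, now_ms: float, paused: bool, queued: int, tx_delta: int) -> str:
        """Return 'ok', 'suspect' or 'drain' for this queue at time now_ms."""
        if self.draining:
            if paused:
                self.last_pause_ms = now_ms  # storm still active: keep draining
            elif now_ms - self.last_pause_ms >= self.restore_ms:
                self.draining = False
                self.stuck_since = None
                return "ok"
            return "drain"
        # Stuck = peer keeps pausing us, packets are waiting, and nothing went out.
        stuck = paused and queued > 0 and tx_delta == 0
        if not stuck:
            self.stuck_since = None
            return "ok"
        if self.stuck_since is None:
            self.stuck_since = now_ms
        if now_ms - self.stuck_since >= self.detect_ms:
            self.draining = True
            self.last_pause_ms = now_ms
            return "drain"
        return "suspect"


wd = PfcWatchdog(detect_ms=200, restore_ms=200)
polls = [(0, True, 10, 0), (100, True, 10, 0), (200, True, 10, 0), (300, False, 0, 0), (450, False, 0, 0)]
states = [wd.poll(*p) for p in polls]
print(states)  # ['suspect', 'suspect', 'drain', 'drain', 'ok']
```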
AI Fabric · Decision Guide
InfiniBand vs Ethernet for AI
Workload-specific decision guide. Where modern Ethernet (RoCEv2 + DLB + UEC) closes the gap, where IB still wins, and how to pick.
Observability
gNMI Streaming Telemetry
gNMI Subscribe over gRPC, OpenConfig YANG, dial-out collectors. Integrations with Telegraf, Prometheus, and Grafana.
Three cluster shapes. Three fabric stories.
Framed by what the job feels, not by switch features. Pick the shape closest to yours; the deep-dives have the configs.
The multi-week LLM pre-training run.
AllReduce dominates the network. Every GPU must hold >90% utilization during collectives, and the fabric must absorb microbursts without forcing a restart of a nine-day run.
Mechanisms: DCQCN + DLB + PFC Watchdog. Rail-optimized below 1k GPU; 3-stage Clos with GLB above.
Outcome: AllReduce at line rate, zero collective restarts, JCT inside schedule.
The high-throughput inference fleet behind a public API.
Real-time inference where p99 tail latency drives the SLO. Inference must never queue behind batch retraining, and ops needs per-flow visibility the moment latency drifts.
Mechanisms: ETS strict-priority + gNMI on-change telemetry into Prometheus / OpenTelemetry.
Outcome: p99 held inside SLO; regressions caught in milliseconds, not in the support queue.
The neocloud renting H100 / H200 / Blackwell to tenants.
A multi-tenant GPU cloud. Each tenant needs isolated lossless RoCEv2 paths — without a separate fabric segment per customer or a second NOS image.
Mechanisms: EVPN-VXLAN isolation + lossless RoCEv2 on one OcNOS-DC instance.
Outcome: per-tenant isolation, one ops model, one SLA, one image to upgrade.
Take it offline. Read it on a plane.
PDF briefs — architecture detail, SKU tiers, validated platforms — one document to share with the team.
OcNOS 800G Ethernet-Based Lossless AI Fabric
Non-blocking RoCEv2 fabric on Tomahawk 4/5 spines: SKU tiers, validated platforms, deployment architecture.
Solution Brief · PDF
EVPN-VXLAN Data Center Fabric
Carrier-grade leaf-spine data center fabric: symmetric IRB, Type-2/Type-5 routes, distributed anycast gateway.
Case Studies
Production AI & DC Deployments
Real OcNOS data center and AI fabric deployments from operators running carrier-grade workloads in production.
Bring your topology. We'll show you the path.
Every IPI architecture review is led by a network engineer running production OcNOS — no slides, no sales theatre. Bring your GPU count, NIC choice, and target JCT; we'll map it to topology, SKUs, and configs that ship today.
Connect it to everything else.
AI is one segment of the data center. DC Fabric and DCI extend the same OcNOS image into compute, storage, and remote sites — same NOS, same CLI, same SLA.
The honest FAQ.