BCM78900 · TSMC 5 nm · Shipping since March 2023

Broadcom Tomahawk 5

Tomahawk 5 Switches

Four 800G open platforms, validated on OcNOS-DC.

Network engineers picking a Tomahawk 5 switch — start here. Edgecore AIS800-64D and AS9817-64D, UfiSpace S9321-64E and S9321-64EO. Same silicon, same OcNOS-DC image, four procurement paths. Specs, decision rules, and the OcNOS-DC feature surface — without the marketing fluff.

01 The Switches · 02 Inside the Silicon · 03 Generation Jump · 04 OcNOS-DC

Book an Architecture Review · Test Drive OcNOS VM

OcNOS-DC · AIS800-64D · Tomahawk 5 · Live

$ show version
OcNOS-DC 7.0 on Tomahawk 5 (BCM78900)
Platform: Edgecore AIS800-64D 64 × 800G

$ show qos pfc int eth1/1
PRIORITY FLOW CONTROL: RoCEv2 lanes
Pri 3 lossless ✓ enabled
Pri 4 lossless ✓ enabled
Wdog deadlock ✓ armed

$ show ecn dcqcn profile
Profile ai-fabric-ncc1
Kmin 200 KB Kmax 800 KB

$ show dlb status
Mode Reactive Path Rebalance
Rebind 64 µs flowlet
Active 14,832 flows ✓ balanced

51.2 Tbps

Switch Capacity

64 × 800G

Native Port Radix

4 SKUs

OcNOS-Validated

2 ODMs

Edgecore · UfiSpace

5 nm

TSMC N5 Process

The Switches

Open hardware running Tomahawk 5

Four 800G platforms. Two ODMs. One OcNOS-DC image.

Two hardware designs, four SKUs. All four ship ONIE pre-loaded and run the same OcNOS-DC image — the differences are form factor (QSFP-DD vs OSFP), branding (AI-fabric SKU vs general-DC SKU), and which optics ecosystem the deployment is built around. Each card links to the full vendor datasheet (PDF, hosted locally).

Edgecore · DCS560 platform family

AI fabric spine

AIS800-64D

Validated on OcNOS-DC · ONIE pre-loaded

Ports: 64 × QSFP-DD800
Breakout: 2×400 / 4×200 / 8×100 (320 logical ports)
Form: 2RU
Power: 2× 3000 W AC/DC redundant · 30 W per QSFP-DD cage
CPU: Intel Xeon D-1713NTE

▌ Pick this when

GPU-cluster AI fabric. AI-branded SKU of the DCS560 — same hardware as AS9817-64D under different framing.

Edgecore AIS800-64D datasheet PDF

Edgecore · DCS560 platform family

DC fabric · 800G aggregation

AS9817-64D

Validated on OcNOS-DC · ONIE pre-loaded

Ports: 64 × QSFP-DD800
Breakout: 2×400 / 4×200 / 8×100 (320 logical ports)
Form: 2RU
Power: Hot-swap redundant AC/DC · 30 W per QSFP-DD cage
CPU: Intel Xeon D-class

▌ Pick this when

General data-center fabric or DCI duty. Same DCS560 chassis as AIS800-64D, branded for non-AI workloads.

Edgecore AS9817-64D datasheet PDF

UfiSpace · S9321 platform family

AI/ML fabric spine

S9321-64E

Validated on OcNOS-DC · ONIE pre-loaded

Ports: 64 × QSFP-DD (200/400/800G)
Breakout: 2×400 / 4×200 / 8×100
Form: 2RU · 23.72 kg
Power: 913 W typical (no transceivers) · 30 W per QSFP-DD cage
CPU: Intel Ice Lake-D 4-core · 32 GB DDR4

▌ Pick this when

Large, low-entropy AI/ML flows. UfiSpace markets the 64E for AllReduce-dominant traffic where TH5 adaptive routing is the design centre.

UfiSpace S9321-64E datasheet PDF

UfiSpace · S9321 platform family

800G DCI · coherent optics

S9321-64EO

Validated on OcNOS-DC · ONIE pre-loaded

Ports: 64 × OSFP (200/400/800G)
Breakout: 2×400 / 4×200 / 8×100
Form: 2RU · 23.74 kg
Power: 925 W typical · 200–240 V AC · OSFP cages for higher-power optics
CPU: Intel Ice Lake-D · 32 GB DDR4

▌ Pick this when

800G ZR/ZR+ coherent or other higher-power module classes. OSFP form factor of the 64E — pick when the optics drive the cage choice.

UfiSpace S9321-64EO datasheet PDF

· How to choose between the four

AIS800 vs AS9817: Same Edgecore DCS560 hardware. AIS for AI-cluster framing; AS9817 for general DC fabric or DCI.

QSFP-DD vs OSFP: QSFP-DD (S9321-64E + both Edgecore SKUs) for the high-volume optics ecosystem. OSFP (S9321-64EO) for higher-power module classes including 800G ZR/ZR+ coherent.

Edgecore vs UfiSpace: Both are open-hardware ODMs with strong IP Infusion co-design. Pick by your ODM relationship, RMA logistics, or BoM economics.

Single-vendor risk: Two vendors with TH5 platforms means a dual-source BoM is realistic, which matters for hyperscale and NeoCloud procurement.
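The rules above can be condensed into a small lookup. This is a minimal sketch in Python: the function name `pick_sku` and its parameters are hypothetical, and it encodes only the decision rules stated on this page, not any vendor or IP Infusion tooling.

```python
def pick_sku(odm: str, needs_osfp: bool = False, ai_fabric: bool = True) -> str:
    """Return one of the four OcNOS-DC validated TH5 SKUs named on this page.

    odm        -- preferred ODM, "edgecore" or "ufispace" (hypothetical parameter)
    needs_osfp -- True when the optics (e.g. 800G ZR/ZR+) drive the cage choice
    ai_fabric  -- True for AI-cluster framing, False for general DC fabric or DCI
    """
    if needs_osfp:
        # Only the UfiSpace S9321-64EO has OSFP cages; the other three are QSFP-DD,
        # so the OSFP requirement overrides the ODM preference.
        return "S9321-64EO"
    odm = odm.lower()
    if odm == "edgecore":
        # Same DCS560 hardware either way; the SKU differs only in framing.
        return "AIS800-64D" if ai_fabric else "AS9817-64D"
    if odm == "ufispace":
        return "S9321-64E"
    raise ValueError(f"unknown ODM: {odm!r}")


print(pick_sku("Edgecore", ai_fabric=False))
print(pick_sku("UfiSpace", needs_osfp=True))
```

Real procurement also weighs RMA logistics and BoM economics, which this sketch deliberately leaves out.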

Inside the Silicon

What 51.2 Tbps in one die buys you

Tomahawk 5 — Broadcom's flagship merchant switch ASIC.

The BCM78900 is a single 5 nm monolithic die delivering 51.2 Tbps of switching capacity — feeding 64 ports of 800GbE, 128 of 400G, or 256 of 200G natively. It was Broadcom's first 5 nm merchant switch IC and the first product anywhere to support 800GbE at the cage. 512 SerDes lanes running 100G PAM4 — the same lane count as Tomahawk 4, twice the per-lane speed.

Beyond raw capacity, three architectural choices made TH5 the silicon under most production AI fabrics: a shared-buffer architecture that absorbs NCCL micro-bursts, hardware Cognitive Routing (DLB) that rebinds elephant flows in the ASIC, and 5 nm thermal headroom that lets 30 W QSFP-DD800 cages run without per-port active cooling.

Specs verifiable against Broadcom's public BCM78900 product page.

Process: TSMC N5
Series: StrataXGS
Buffer: Shared, RDMA-tuned
Routing: Cognitive · DLB
Shipping: Since Mar 2023

· What 64 × 800G looks like

BCM78900 die · 51.2 Tbps

512 lanes × 100G PAM4 = 51.2 Tbps. Eight lanes per cage → 800G. The arithmetic is the architecture.
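The lane arithmetic is easy to check by hand or in a few lines of Python. The sketch below uses only the figures quoted on this page (512 lanes at a nominal 100 Gbps each); the constant and function names are illustrative.

```python
LANES = 512        # SerDes lanes on the BCM78900, as stated above
LANE_GBPS = 100    # nominal payload rate of one 100G PAM4 lane


def native_ports(port_gbps: int) -> int:
    """Number of ports of a given speed when every lane is in use."""
    lanes_per_port = port_gbps // LANE_GBPS
    return LANES // lanes_per_port


total_tbps = LANES * LANE_GBPS / 1000
print(f"{total_tbps} Tbps total")
for speed in (800, 400, 200):
    print(f"{native_ports(speed)} x {speed}G")
```

Eight lanes per 800G port gives 64 ports, four lanes per 400G port gives 128, and two lanes per 200G port gives 256, matching the native configurations listed earlier.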

Four design choices that matter

Why TH5 ended up in almost every open AI fabric built since 2024.

The headline number gets the press. These four engineering choices are what AI fabric architects actually care about.

PRINCIPLE 01

Same lane count, twice the speed.

TH5 carries the same 512 SerDes lanes as TH4, running them at 100G PAM4 instead of 50G. The doubling of throughput came from speeding up the existing lanes, not adding more of them.

100G PAM4 · 106 Gbps

PRINCIPLE 02

Shared-buffer, not partitioned.

Packet memory pools across all 64 ports — not split per-port. NCCL AllReduce micro-bursts on one port absorb into the fabric-wide pool instead of triggering tail-drop. The single-line reason TH5 wins on RoCEv2.

Shared-buffer · RDMA-tuned
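A toy model makes the difference concrete. The sketch below compares how much of a single-port burst is dropped when packet memory is split evenly per port versus pooled. The pool size is a hypothetical round number, not a BCM78900 figure, and the model ignores drain rate and the dynamic per-queue thresholds that real shared-buffer ASICs apply.

```python
PORTS = 64
TOTAL_BUFFER_KB = 64 * 1024   # hypothetical pool size, NOT a published TH5 value


def dropped_partitioned(burst_kb: int) -> int:
    """KB dropped when each port owns a fixed 1/64 slice of the memory."""
    per_port_kb = TOTAL_BUFFER_KB // PORTS
    return max(0, burst_kb - per_port_kb)


def dropped_shared(burst_kb: int, in_use_elsewhere_kb: int = 0) -> int:
    """KB dropped when the burst can use whatever the pool has free."""
    free_kb = TOTAL_BUFFER_KB - in_use_elsewhere_kb
    return max(0, burst_kb - free_kb)


burst = 4096  # a 4 MB many-to-one burst landing on one egress port
print("partitioned drops:", dropped_partitioned(burst), "KB")
print("shared drops:     ", dropped_shared(burst), "KB")
```

With these assumed numbers the partitioned design drops 3 MB of the burst while the shared pool absorbs all of it, as long as other ports are not also congested.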

PRINCIPLE 03

Hardware adaptive routing.

Broadcom Cognitive Routing detects congested paths and rebinds elephant flows in the ASIC — no controller round-trip, no ECMP rehashing. OcNOS-DC turns it on as DLB Reactive-Path Rebalance.

DLB · 64 µs flowlet
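The general idea behind flowlet rebinding can be sketched in software. This is a simplified illustration of flowlet-based load balancing, not Broadcom's Cognitive Routing implementation: a flow stays on its path while packets arrive back to back, and may move to the least-loaded path only after an idle gap of at least 64 µs (the figure quoted on this page), so packets are not reordered. The class name and load metric are assumptions.

```python
from dataclasses import dataclass, field

FLOWLET_GAP_US = 64  # idle gap after which a flow may be rebound


@dataclass
class FlowletBalancer:
    path_load: list                             # cumulative bytes per uplink (stand-in for live load)
    table: dict = field(default_factory=dict)   # flow id -> (path index, last seen time in µs)

    def route(self, flow: str, now_us: int, size: int) -> int:
        entry = self.table.get(flow)
        if entry is not None and now_us - entry[1] < FLOWLET_GAP_US:
            # Same flowlet: keep the current path to preserve packet order.
            path = entry[0]
        else:
            # New flow, or the gap was long enough: bind to the least-loaded path.
            path = min(range(len(self.path_load)), key=self.path_load.__getitem__)
        self.path_load[path] += size
        self.table[flow] = (path, now_us)
        return path


lb = FlowletBalancer(path_load=[0, 0])
print(lb.route("flow-a", now_us=0, size=1000))    # first packet binds to a path
print(lb.route("flow-a", now_us=10, size=1000))   # 10 µs gap: stays on the same path
print(lb.route("flow-a", now_us=100, size=1000))  # 90 µs gap: free to move
```

In hardware the load signal would come from live queue depth or link utilisation rather than a running byte count.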

PRINCIPLE 04

5 nm thermal headroom.

The first 5 nm merchant switch IC. The process shrink is what made 30 W per QSFP-DD800 cage feasible without active per-port cooling — including high-power 800G optics and 8×100G breakout.

TSMC N5 · 30 W/port

Generation Jump

Tomahawk 4 → Tomahawk 5

Every dimension doubled. Same rack footprint.

Honest framing: TH4 (25.6 Tbps · 32×400G · 7 nm) is still excellent for clusters built around 400G NICs. TH5 earns its rack space when 800G radix and AI-fabric primitives both matter.

Switching capacity

25.6 Tbps → 51.2 Tbps

Doubled at the same rack footprint. Same 2RU, same power envelope class.

Native port radix

32 × 400G → 64 × 800G

Three-tier Clos at 16k GPU instead of four — radix collapses a layer.

Process node

7 nm → 5 nm

First 5 nm merchant switch IC. Thermal headroom for 30 W/port without active cooling.

SerDes per lane

50G PAM4 → 100G PAM4

Same 512 lanes, twice the speed. The doubling of throughput came from running the existing lanes faster.
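The tier-count claim in the port-radix card can be checked with the standard fat-tree formula. The sketch below assumes a non-blocking folded Clos, one switch port per GPU at native port speed, and no breakout. Under those assumptions, n tiers of radix-k switches connect at most 2·(k/2)^n endpoints. Function names are illustrative.

```python
def max_endpoints(radix: int, tiers: int) -> int:
    """Endpoints supported by a non-blocking folded Clos of `tiers` tiers."""
    return 2 * (radix // 2) ** tiers


def tiers_needed(radix: int, endpoints: int) -> int:
    """Smallest tier count whose capacity covers `endpoints`."""
    tiers = 1
    while max_endpoints(radix, tiers) < endpoints:
        tiers += 1
    return tiers


gpus = 16 * 1024
print("TH4, radix 32:", tiers_needed(32, gpus), "tiers")
print("TH5, radix 64:", tiers_needed(64, gpus), "tiers")
```

Radix 32 tops out at 8,192 endpoints in three tiers, so 16k GPUs need a fourth. Radix 64 reaches 65,536 in three. Oversubscription or breakout ports change these numbers.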

Brownfield refresh stays clean. The same OcNOS-DC image runs on TH3, TH4, and TH5 platforms — configurations, automation, and gNMI pipelines carry over. Pick TH5 for the next cluster; keep TH4 where it already works.

What OcNOS-DC Ships

OcNOS-DC on this silicon

Carrier-grade NOS. AI-tuned defaults.

Tomahawk 5 has the hardware. The job of the NOS is to expose it — to operators, to telemetry pipelines, to the cluster scheduler — without forcing them to write CLI gymnastics around it. OcNOS-DC ships these primitives as first-class configurable objects with YANG-modelled state.

Lossless RoCEv2

Shared-buffer architecture, zero-drop east-west.

OcNOS-DC ships PFC + ETS + Dynamic ECN pre-tuned to NCCL collective patterns. Tail latency stays bounded under AllReduce micro-bursts that take community NOS fabrics down. The TH5 shared-buffer pool absorbs synchronised many-to-one traffic that would tail-drop on partitioned-buffer chips.
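For intuition on what the ECN thresholds do, here is a minimal sketch of the classic static marking curve used with DCQCN: no marking below Kmin, a linear ramp up to a maximum probability at Kmax, and every packet marked beyond that. Kmin and Kmax are taken from the sample DCQCN profile shown earlier on this page (200 KB and 800 KB); `P_MAX` is a hypothetical value, and the Dynamic ECN behaviour in OcNOS-DC may differ from this static curve.

```python
K_MIN_KB = 200   # from the sample profile on this page
K_MAX_KB = 800   # from the sample profile on this page
P_MAX = 0.1      # hypothetical maximum marking probability on the ramp


def mark_probability(queue_kb: float) -> float:
    """Probability that a packet is ECN-marked at a given queue depth."""
    if queue_kb <= K_MIN_KB:
        return 0.0                      # shallow queue: never mark
    if queue_kb > K_MAX_KB:
        return 1.0                      # deep queue: mark everything
    # Linear ramp from 0 at Kmin to P_MAX at Kmax.
    return P_MAX * (queue_kb - K_MIN_KB) / (K_MAX_KB - K_MIN_KB)


for depth in (100, 500, 800, 900):
    print(depth, "KB ->", mark_probability(depth))
```

Senders that see marked packets slow down before the queue grows far enough to trigger PFC pause, which is why the two mechanisms are tuned together.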

Adaptive Routing

DLB rebinds flowlets in 64 µs.

ECMP hash-collision under elephant flows is the AI fabric killer. OcNOS-DC turns on TH5 Cognitive Routing's flowlet rebinding so AllReduce traffic spreads across every spine path automatically.

PFC Deadlock Watchdog

Per-port, per-priority. Auto-drain.

Detects paused-queue cycles before they hang training jobs. Auto-recovers without operator intervention.
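The watchdog logic can be pictured as a small state machine per (port, priority) queue. The sketch below is a simplified illustration, not the OcNOS-DC implementation: if a queue stays paused for longer than a detection interval, it is flagged for draining, and it returns to normal once the pause clears. The class name and the 200 ms default are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PfcWatchdog:
    detect_ms: int = 200                    # hypothetical detection interval
    paused_since_ms: Optional[int] = None   # when the current pause began
    draining: bool = False

    def observe(self, now_ms: int, paused: bool) -> str:
        """Feed one poll of the queue's pause state; return the resulting action."""
        if not paused:
            # Pause cleared: reset the timer and leave drain mode.
            self.paused_since_ms = None
            self.draining = False
            return "ok"
        if self.paused_since_ms is None:
            self.paused_since_ms = now_ms
        if now_ms - self.paused_since_ms >= self.detect_ms:
            # Paused continuously for too long: treat as a deadlock and drain.
            self.draining = True
            return "drain"
        return "paused"


# One watchdog per (port, priority); keys here are illustrative.
watchdogs = {("eth1/1", 3): PfcWatchdog(), ("eth1/1", 4): PfcWatchdog()}
wd = watchdogs[("eth1/1", 3)]
for t, paused in [(0, True), (150, True), (200, True), (210, False)]:
    print(t, wd.observe(t, paused))
```

Keeping the state per priority means a stuck lossless class can be drained without touching traffic in the other classes on the same port.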

Streaming Telemetry

gNMI on-change, OpenConfig YANG.

Buffer depth, ECN marks, PFC pause counts — every threshold a knob, every counter a sensor path. Plugs into Prometheus, Grafana, OTel.

Real Network

BGP · OSPF · IS-IS · EVPN-VXLAN.

The TH5 spine is also a real router. Full carrier-grade Layer 3 stack on the same silicon — operate the AI fabric like the rest of your network, not like a black box.

Validated feature surface

215 features across 8 categories — pulled from the live OcNOS Feature Matrix.

Layer 3 routing · L1/L2 · AI/ML fabric primitives · Multicast · QoS · Security · Hardware · Management. Every entry verifiable per-platform on the public matrix.

RoCEv2 / PFC · DCQCN · DLB · EVPN-VXLAN · SR-MPLS · BGP / OSPF / IS-IS · gNMI / NETCONF · ZTP · UEC 1.0 ready

Day-0 to Day-2

ZTP. gNMI on-change. NETCONF + YANG. DCBX.

Bring up a TH5 spine in the rack with zero-touch provisioning. Stream every counter to your observability stack. Tune every threshold via YANG-modelled config. No glue scripts.

ZTP IPv4/IPv6 · gNMI · NETCONF · OpenConfig YANG · DCBX LLDP · Ansible · Terraform provider

Who builds this stack

Three operator profiles. One silicon + NOS combo.

Same TH5 die, same OcNOS-DC image, three different framings of the same architectural question: how do you scale lossless east-west without locking the whole stack to one vendor?

AI Cluster Operator

1k–16k GPU training fabric on open silicon.

"We need 800G to the leaf, lossless RoCEv2, and tail latency that doesn't blow up under AllReduce. Single-vendor lock-in is not on the table."

TH5 64×800G spines, RoCEv2 with NCCL-tuned DCQCN, sub-millisecond DLB rebinding, PFC deadlock watchdog. Three-tier Clos at 16k GPU instead of four — the radix collapses a layer.

DC · AI Fabric SKU

NeoCloud · GPU-as-a-Service

Multi-tenant fabric, BoM under control.

"Our customers pick the GPU. We can't tie our fabric BoM to their NIC choice. We need a switch we can buy from two vendors at minimum."

Four OcNOS-validated TH5 SKUs across two vendors (Edgecore, UfiSpace). VRF-Lite tenant isolation, gNMI per-tenant telemetry, EVPN-VXLAN segmentation. One NOS image, multi-vendor hardware.

DC · Multi-Tenant

Hyperscaler · Brownfield Refresh

TH3/TH4 fabric refresh without forklift.

"We have a TH4 fabric in production. The next training cluster needs 800G NICs. We don't want to redesign the whole NOS layer to upgrade the silicon."

Same OcNOS-DC image runs on TH3, TH4, and TH5 platforms. Brownfield refresh keeps configs, automation, and gNMI pipelines intact. UEC 1.0 fabric profile already aligned for the next NIC generation.

DC · UEC-Ready

Full Feature Matrix · AI Fabric Solutions · Reference Topologies · Hardware Compatibility List

Frequently Asked

The questions architects actually ask.

Which Tomahawk 5 switches run OcNOS-DC?

Four open-hardware platforms across two ODMs: Edgecore AIS800-64D and AS9817-64D (sibling SKUs on the DCS560 chassis), and UfiSpace S9321-64E (QSFP-DD) and S9321-64EO (OSFP). All four ship ONIE pre-loaded and run the same OcNOS-DC image — same configuration, same feature surface, same automation hooks. Two vendors means dual-source BoM is realistic for hyperscale and NeoCloud procurement.

AIS800-64D vs AS9817-64D — what is the actual difference?

Same Edgecore DCS560 hardware, different SKU framing. AIS800-64D is the AI-fabric branding (sold for GPU clusters); AS9817-64D is the general data-center branding (sold for DC fabric or DCI). Mechanically and electrically identical — the choice is procurement framing, not engineering. Pick AIS if AI-cluster framing matters to your deployment; pick AS9817 for general DC fabric or DCI.

QSFP-DD vs OSFP — when do I need the S9321-64EO?

QSFP-DD (S9321-64E and both Edgecore SKUs) is the high-volume optics ecosystem — the right default for short-reach 800G inside the data center. OSFP (S9321-64EO) provides higher-power cages for module classes QSFP-DD cannot host: 800G ZR/ZR+ coherent for DCI, longer-reach DR4/DR8, and pluggable amplifiers. Pick OSFP when the optics drive the cage choice; otherwise QSFP-DD wins on cost and ecosystem breadth.

How does Tomahawk 5 compare to Tomahawk 4 — when do I pick which?

TH4 is 25.6 Tbps · 32×400G · 7 nm · 50G PAM4. TH5 doubles every dimension at the same rack footprint. Decision rule: if the cluster needs 800G ports natively, or GPU count puts pressure on tier count (TH5 collapses one Clos tier at 16k GPU), pick TH5. If the design is built around 400G NICs and the 256–1k GPU envelope, TH4 is still excellent and cheaper per port. OcNOS-DC supports both with the same feature set — brownfield refresh stays clean.

Does Tomahawk 5 support Ultra Ethernet (UEC)?

TH5 has the hardware mechanisms UEC 1.0 fabric profiles need — per-packet ECMP, packet-spray-friendly forwarding, shared-buffer scheduling that tolerates out-of-order delivery. UEC itself lives mostly in the NIC; TH5 fabrics running OcNOS-DC will carry UEC traffic correctly when UEC NICs ship in volume. RoCEv2 and UEC coexist on the same switch — migrate clusters NIC-by-NIC, no fabric replacement.

What does OcNOS-DC light up on TH5 that community SONiC does not?

On TH5, OcNOS-DC ships pre-tuned for AI fabrics: PFC over L3, ETS, Dynamic ECN, DLB Reactive-Path Rebalance, DLB Random-Flow, PFC Deadlock Detection & Recovery, NCCL-aligned buffer profiles, DCBX LLDP. On the same silicon it also runs a full carrier-grade Layer 3 stack — BGP, OSPF, IS-IS, SR-MPLS, EVPN-VXLAN — that AI-only stacks typically don't cover. 215 features validated across 8 categories, every entry verifiable on the public OcNOS Feature Matrix.

Where is Tomahawk 5 the wrong choice?

SP edge, cell-site gateway, sub-1 Tbps aggregation. The 64×800G radix doesn't earn its rack space in those roles. For SP routing OcNOS validates Broadcom Qumran (Q2C, Q2C+) and Jericho (J2C+); for 100G/400G DC leaf where the cluster is below 1k GPUs, Trident (TD3-X7, TD4) is the better economics. Honest framing: TH5 wins when 800G radix and AI-fabric primitives both matter — not when only one does.

Designing a Tomahawk 5 fabric? Let's size it together.

30-minute architecture session with an OcNOS network architect. Bring your GPU count, NIC speed, and tier preference — leave with a sized BoM across all four TH5 SKUs.

Book an Architecture Review · Test Drive OcNOS-DC
