BCM78900 · TSMC 5 nm · Shipping since March 2023

Broadcom Tomahawk 5 Switch: three 800G open platforms, validated on OcNOS-DC.

Network engineers choosing a Tomahawk 5 switch: start here. Edgecore AIS800-64D, UfiSpace S9321-64E and S9321-64EO. Same silicon, same OcNOS-DC image, three purchasing paths. Specifications, selection criteria, and the OcNOS-DC feature scope, without marketing fluff.

01 The Switches · 02 Inside the Silicon · 03 Generation Jump · 04 OcNOS-DC

Book an architecture review · Test Drive OcNOS VM

OcNOS-DC · AIS800-64D · Tomahawk 5 · LIVE

$ show version

OcNOS-DC 7.0 on Tomahawk 5 (BCM78900)

Platform: Edgecore AIS800-64D 64 × 800G

$ show qos pfc int eth1/1

PRIORITY FLOW CONTROL — RoCEv2 lanes

Pri 3 lossless ✓ enabled

Pri 4 lossless ✓ enabled

Wdog deadlock ✓ armed

$ show ecn dcqcn profile

Profile ai-fabric-ncc1

Kmin 200 KB Kmax 800 KB

$ show dlb status

Mode Reactive Path Rebalance

Rebind 64 µs flowlet

Active 14,832 flows ✓ balanced

51.2 Tbps: Switch Capacity

64×800G: Native Port Radix

4 SKUs: OcNOS-Validated

2 ODMs: Edgecore · UfiSpace

5 nm: TSMC N5 Process

The Switches

Open hardware running Tomahawk 5

Three 800G platforms. Two ODMs. One OcNOS-DC image.

Two hardware designs, four SKUs. All four ship ONIE pre-loaded and run the same OcNOS-DC image — the differences are form factor (QSFP-DD vs OSFP), branding (AI-fabric SKU vs general-DC SKU), and which optics ecosystem the deployment is built around. Each card links to the full vendor datasheet (PDF, hosted locally).

Edgecore · DCS560 platform family

AI fabric spine

AIS800-64D

Validated on OcNOS-DC · ONIE pre-loaded

Ports: 64 × QSFP-DD800
Breakout: 2×400 / 4×200 / 8×100 (320 logical ports)
Form: 2RU
Power: 2× 3000 W AC/DC redundant · 30 W per QSFP-DD cage
CPU: Intel Xeon D-1713NTE

▌ Pick this when

GPU-cluster AI fabric. Edgecore DCS560 chassis with the AI-fabric SKU framing.

Edgecore AIS800-64D datasheet PDF

UfiSpace · S9321 platform family

AI/ML fabric spine

S9321-64E

Validated on OcNOS-DC · ONIE pre-loaded

Ports: 64 × QSFP-DD (200/400/800G)
Breakout: 2×400 / 4×200 / 8×100
Form: 2RU · 23.72 kg
Power: 913 W typical (no transceivers) · 30 W per QSFP-DD cage
CPU: Intel Ice Lake-D 4-core · 32 GB DDR4

▌ Pick this when

Large, low-entropy AI/ML flows. UfiSpace markets the 64E for AllReduce-dominant traffic where TH5 adaptive routing is the design centre.

UfiSpace S9321-64E datasheet PDF

UfiSpace · S9321 platform family

800G DCI · coherent optics

S9321-64EO

Validated on OcNOS-DC · ONIE pre-loaded

Ports: 64 × OSFP (200/400/800G)
Breakout: 2×400 / 4×200 / 8×100
Form: 2RU · 23.74 kg
Power: 925 W typical · 200–240 V AC · OSFP cages for higher-power optics
CPU: Intel Ice Lake-D · 32 GB DDR4

▌ Pick this when

800G ZR/ZR+ coherent or other higher-power module classes. This is the OSFP variant of the 64E: pick it when the optics drive the cage choice.

UfiSpace S9321-64EO datasheet PDF

· How to choose between the four

AIS800 vs S9321-64E: Same TH5 silicon, two ODMs. Edgecore DCS560 (AIS800-64D) vs UfiSpace S9321: dual-source BoM for hyperscale and NeoCloud procurement.

QSFP-DD vs OSFP: QSFP-DD (S9321-64E + both Edgecore SKUs) for the high-volume optics ecosystem. OSFP (S9321-64EO) for higher-power module classes, including 800G ZR/ZR+ coherent.

Edgecore vs UfiSpace: Both are open-hardware ODMs with strong IP Infusion co-design. Pick by your ODM relationship, RMA logistics, or BoM economics.

Single-vendor risk: Two vendors with TH5 platforms means dual-source BoM is realistic, which matters for hyperscale and NeoCloud procurement.

Inside the Silicon

What 51.2 Tbps in one die buys you

Tomahawk 5 — Broadcom's flagship merchant switch ASIC.

The BCM78900 is a single 5 nm monolithic die delivering 51.2 Tbps of switching capacity — feeding 64 ports of 800GbE, 128 of 400G, or 256 of 200G natively. It was Broadcom's first 5 nm merchant switch IC and the first product anywhere to support 800GbE at the cage. 512 SerDes lanes running 100G PAM4 — the same lane count as Tomahawk 4, twice the per-lane speed.

Beyond raw capacity, three architectural choices made TH5 the silicon under most production AI fabrics: a shared-buffer architecture that absorbs xCCL collective micro-bursts (NCCL / RCCL / oneCCL) in hardware, Cognitive Routing (DLB) that rebinds elephant flows in the ASIC, and 5 nm thermal headroom that lets 30 W QSFP-DD800 cages run without per-port active cooling.

Specs verifiable against Broadcom's public BCM78900 product page.

Process: TSMC N5
Series: StrataXGS
Buffer: Shared, RDMA-tuned
Routing: Cognitive · DLB
Shipping: Since Mar 2023

· What 64 × 800G looks like

BCM78900 die · 51.2 Tbps

512 lanes × 100G PAM4 = 51.2 Tbps. Eight lanes per cage → 800G. The arithmetic is the architecture.
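The lane arithmetic above can be checked in a few lines. This is a minimal sketch that uses only the figures quoted on this page (512 SerDes lanes at 100G PAM4); the function name `port_count` is illustrative, not part of any Broadcom or OcNOS tooling.

```python
# Figures taken from the text: 512 SerDes lanes, each running 100G PAM4.
SERDES_LANES = 512
LANE_GBPS = 100


def port_count(port_speed_gbps: int) -> int:
    """Number of ports of a given speed that the lane budget can feed."""
    lanes_per_port = port_speed_gbps // LANE_GBPS
    return SERDES_LANES // lanes_per_port


total_gbps = SERDES_LANES * LANE_GBPS
print(f"capacity: {total_gbps / 1000} Tbps")
for speed in (800, 400, 200):
    print(f"{port_count(speed)} x {speed}G")
```

An 800G port consumes eight lanes, so the same lane budget yields 64 ports at 800G, 128 at 400G, or 256 at 200G.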

Four design choices that matter

Why TH5 ended up in almost every open AI fabric built since 2024.

The headline number gets the press. These four engineering choices are what AI fabric architects actually care about.

PRINCIPLE 01

Same lane count, twice the speed.

TH5 carries the same 512 SerDes lanes as TH4, running them at 100G PAM4 instead of 50G. The throughput doubling came from speeding up existing infrastructure, not adding to it.

100G PAM4 · 106 Gbps

PRINCIPLE 02

Shared-buffer, not partitioned.

Packet memory is pooled across all 64 ports, not carved up per port. xCCL AllReduce micro-bursts on one port are absorbed by the fabric-wide pool instead of triggering tail-drop. The one-line reason TH5 wins on RoCEv2.

Shared-buffer · RDMA-tuned
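The difference between a shared and a partitioned buffer can be sketched with a toy static model. The buffer size and burst sizes below are hypothetical numbers in arbitrary cells, chosen only to show the effect; real shared-buffer ASICs apply dynamic per-queue thresholds that this sketch does not model.

```python
def partitioned_drops(burst_per_port, total_buffer):
    """Cells dropped when each port owns a fixed, equal slice of the buffer."""
    share = total_buffer // len(burst_per_port)
    return sum(max(0, burst - share) for burst in burst_per_port)


def shared_drops(burst_per_port, total_buffer):
    """Cells dropped when all ports draw from one common pool."""
    return max(0, sum(burst_per_port) - total_buffer)


# Hypothetical scenario: 64 ports, one port takes a 1000-cell micro-burst
# while the other 63 ports each hold 10 cells.
TOTAL_BUFFER_CELLS = 6400
bursts = [1000] + [10] * 63

print("partitioned drops:", partitioned_drops(bursts, TOTAL_BUFFER_CELLS))
print("shared drops:", shared_drops(bursts, TOTAL_BUFFER_CELLS))
```

With a fixed 100-cell slice per port, the bursting port overflows even though most of the memory sits idle; the pooled model absorbs the same burst because total occupancy stays well under the pool size.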

PRINCIPLE 03

Hardware adaptive routing.

Broadcom Cognitive Routing detects congested paths and rebinds elephant flows in the ASIC — no controller round-trip, no ECMP rehashing. OcNOS-DC turns it on as DLB Reactive-Path Rebalance.

DLB · 64 µs flowlet
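Flowlet-based rebinding can be illustrated with a small software model. This is a sketch of the general flowlet technique, not Broadcom's implementation: it assumes the 64 µs figure is the idle-gap threshold after which a flow may move to another path, and the class `FlowletBalancer` and its cumulative-bytes load metric are hypothetical simplifications.

```python
from dataclasses import dataclass, field

FLOWLET_GAP_US = 64  # assumption: "64 µs flowlet" read as the idle-gap threshold


@dataclass
class FlowletBalancer:
    num_paths: int
    gap_us: float = FLOWLET_GAP_US
    load: list = field(init=False)     # cumulative bytes sent per path
    binding: dict = field(init=False)  # flow id -> (path, last packet time in µs)

    def __post_init__(self):
        self.load = [0] * self.num_paths
        self.binding = {}

    def route(self, flow, now_us, size_bytes):
        entry = self.binding.get(flow)
        if entry is None or now_us - entry[1] > self.gap_us:
            # New flow, or the idle gap was exceeded: a new flowlet starts,
            # so moving to the least-loaded path is unlikely to reorder packets.
            path = min(range(self.num_paths), key=lambda p: self.load[p])
        else:
            # Still inside the current flowlet: stay on the bound path.
            path = entry[0]
        self.binding[flow] = (path, now_us)
        self.load[path] += size_bytes
        return path


lb = FlowletBalancer(num_paths=2)
first = lb.route("flow-A", now_us=0, size_bytes=1000)
second = lb.route("flow-A", now_us=10, size_bytes=1000)
third = lb.route("flow-A", now_us=200, size_bytes=1000)
print(first, second, third)
```

The third packet arrives after a 190 µs pause, so the model treats it as a new flowlet and moves the flow to the less-loaded path, which static ECMP hashing would never do.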

PRINCIPLE 04

5 nm thermal headroom.

The first 5 nm merchant switch IC. The process shrink is what made 30 W per QSFP-DD800 cage feasible without active per-port cooling — including high-power 800G optics and 8×100G breakout.

TSMC N5 · 30 W/port

Generation Jump

Tomahawk 4 → Tomahawk 5

Per-port speed doubled. Capacity doubled. Same 64-port radix.

Honest framing: TH4 (25.6 Tbps · 64×400G · 7 nm) remains excellent for clusters built around 400G NICs. TH5 earns its rack space when both 800G per port and the AI fabric primitives matter.

Switching capacity

25.6 Tbps → 51.2 Tbps

Doubled at the same rack footprint. Same 2RU, same power envelope class.

Per-port speed

64 × 400G → 64 × 800G

Same 64-port radix on the actual IPI platforms (AS9736-64D → AIS800-64D / S9321). Per-port bandwidth doubles, so each Clos tier carries twice the traffic.

Process node

7 nm → 5 nm

First 5 nm merchant switch IC. Thermal headroom for 30 W/port without active cooling.

SerDes per lane

50G PAM4 → 100G PAM4

Same 512 lanes, twice the speed. The throughput doubling came from running the existing lanes faster.

Brownfield refresh stays clean. The same OcNOS-DC image runs on TH3, TH4, and TH5 platforms — configurations, automation, and gNMI pipelines carry over. Pick TH5 for the next cluster; keep TH4 where it already works.

What OcNOS-DC Ships

OcNOS-DC on this silicon

Carrier-grade NOS. AI-tuned defaults.

Tomahawk 5 has the hardware. The job of the NOS is to expose it — to operators, to telemetry pipelines, to the cluster scheduler — without forcing them to write CLI gymnastics around it. OcNOS-DC ships these primitives as first-class configurable objects with YANG-modelled state.

RoCEv2 lossless

Shared-buffer architecture, zero-drop east-west.

OcNOS-DC ships PFC + ETS + Dynamic ECN pre-tuned for xCCL collective patterns. Tail latency stays bounded even under the AllReduce micro-bursts that take down community NOS fabrics. TH5's shared buffer pool absorbs the synchronized many-to-one traffic that would cause tail-drop on chips with partitioned buffers.
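ECN marking for DCQCN is commonly described as a RED-style ramp between two queue thresholds. The sketch below uses the Kmin 200 KB and Kmax 800 KB values from the profile output earlier on this page; the maximum marking probability `PMAX` is a hypothetical value, and the linear ramp is the textbook model rather than a statement of how OcNOS-DC or the ASIC computes it.

```python
KMIN_KB = 200  # from the ai-fabric-ncc1 profile shown on this page
KMAX_KB = 800  # from the ai-fabric-ncc1 profile shown on this page
PMAX = 0.1     # hypothetical: marking probability reached just below Kmax


def ecn_mark_probability(queue_kb, kmin=KMIN_KB, kmax=KMAX_KB, pmax=PMAX):
    """Probability that a packet is ECN-marked at a given queue depth."""
    if queue_kb <= kmin:
        return 0.0  # shallow queue: never mark
    if queue_kb >= kmax:
        return 1.0  # deep queue: mark every packet
    # Between the thresholds the probability rises linearly from 0 to pmax.
    return pmax * (queue_kb - kmin) / (kmax - kmin)


for depth in (100, 350, 500, 900):
    print(depth, "KB ->", round(ecn_mark_probability(depth), 4))
```

A wider gap between Kmin and Kmax gives senders a gentler signal before PFC has to step in; a narrower gap reacts faster at the cost of throughput oscillation.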

Adaptive Routing

DLB rebinds flowlets in 64 µs.

ECMP hash-collision under elephant flows is the AI fabric killer. OcNOS-DC turns on TH5 Cognitive Routing's flowlet rebinding so AllReduce traffic spreads across every spine path automatically.

PFC Deadlock Watchdog

Per-port, per-priority. Auto-drain.

Detects paused-queue cycles before they hang training jobs. Auto-recovers without operator intervention.
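A PFC deadlock is, at its core, a cycle of queues each waiting on the next to unpause. The sketch below illustrates that condition with a depth-first search over a hypothetical wait-for graph; it is a model of the failure mode only, and makes no claim about how the watchdog in the ASIC or in OcNOS-DC detects it (production watchdogs are often timer-based).

```python
def find_pause_cycle(waits_on):
    """Return a list of nodes forming a pause cycle, or None if there is none.

    waits_on maps each queue to the queues whose PFC pause it is blocked by.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current DFS path, finished
    color = {}
    stack = []

    def visit(node):
        color[node] = GREY
        stack.append(node)
        for nxt in waits_on.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GREY:
                # Reached a node already on the current path: that is a cycle.
                return stack[stack.index(nxt):]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for node in waits_on:
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Hypothetical four-switch loop, all on the same lossless priority.
deadlocked = {
    "leaf1": ["spine1"],
    "spine1": ["leaf2"],
    "leaf2": ["spine2"],
    "spine2": ["leaf1"],
}
healthy = {"leaf1": ["spine1"], "spine1": []}

print(find_pause_cycle(deadlocked))
print(find_pause_cycle(healthy))
```

Breaking any one edge of the cycle, for example by draining a single paused queue, is enough to let the rest of the loop make progress.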

Streaming Telemetry

gNMI on-change, OpenConfig YANG.

Buffer depth, ECN marks, PFC pause counts — every threshold a knob, every counter a sensor path. Plugs into Prometheus, Grafana, OTel.

Real Network

BGP · OSPF · IS-IS · EVPN-VXLAN.

The TH5 spine is also a real router. Full carrier-grade Layer 3 stack on the same silicon — operate the AI fabric like the rest of your network, not like a black box.

Validated feature surface

215 features across 8 categories — pulled from the live OcNOS Feature Matrix.

Layer 3 routing · L1/L2 · AI/ML fabric primitives · Multicast · QoS · Security · Hardware · Management. Every entry verifiable per-platform on the public matrix.

RoCEv2 / PFC · DCQCN · DLB · EVPN-VXLAN · SR-MPLS · BGP / OSPF / IS-IS · gNMI / NETCONF · ZTP · UEC 1.0 ready

Day-0 to Day-2

ZTP. gNMI on-change. NETCONF + YANG. DCBX.

Bring up a TH5 spine in the rack with zero-touch provisioning. Stream every counter to your observability stack. Tune every threshold via YANG-modelled config. No glue scripts.

ZTP IPv4/IPv6 · gNMI · NETCONF · OpenConfig YANG · DCBX LLDP · Ansible · Terraform provider

Who builds this stack

Three operator profiles. One silicon + NOS combo.

Same TH5 die, same OcNOS-DC image, three different framings of the same architectural question: how do you scale lossless east-west without locking the whole stack to one vendor?

AI Cluster Operator

Training fabric up to the 16k-GPU ceiling on open silicon.

"We need 800G to the leaf, lossless RoCEv2, and tail latency that doesn't blow up under AllReduce. Single-vendor lock-in is not on the table."

TH5 64×800G spine, RoCEv2 with xCCL-tuned DCQCN, sub-millisecond DLB rebinding, PFC deadlock watchdog. Same 64-port radix as TH4, but each spine port carries 800G: this halves spine-leaf cabling for the same aggregate fabric bandwidth.

DC · AI Fabric SKU
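The cabling claim in this profile follows directly from link speed. As a rough sketch, assuming a target aggregate spine-leaf bandwidth (the 51.2 Tbps used here is simply one switch's worth, picked for illustration), the number of links needed is the aggregate divided by the per-link rate:

```python
def links_needed(aggregate_gbps: int, link_gbps: int) -> int:
    """Links required to carry an aggregate bandwidth, rounded up."""
    return -(-aggregate_gbps // link_gbps)  # integer ceiling division


AGGREGATE_GBPS = 51_200  # illustrative target: 51.2 Tbps of spine-leaf capacity

links_400g = links_needed(AGGREGATE_GBPS, 400)
links_800g = links_needed(AGGREGATE_GBPS, 800)
print(f"400G links: {links_400g}, 800G links: {links_800g}")
```

Half as many cables also means half as many transceivers and patch-panel positions for the same fabric bandwidth, which is where much of the operational saving comes from.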

NeoCloud · GPU-as-a-Service

Multi-tenant fabric, BoM under control.

"Our customers pick the GPU. We can't tie our fabric BoM to their NIC choice. We need a switch we can buy from two vendors at minimum."

Four OcNOS-validated TH5 SKUs across two vendors (Edgecore, UfiSpace). VRF-Lite tenant isolation, gNMI per-tenant telemetry, EVPN-VXLAN segmentation. One NOS image, multi-vendor hardware.

DC · Multi-Tenant

Hyperscaler · Brownfield Refresh

TH3/TH4 fabric refresh without forklift.

"We have a TH4 fabric in production. The next training cluster needs 800G NICs. We don't want to redesign the whole NOS layer to upgrade the silicon."

Same OcNOS-DC image runs on TH3, TH4, and TH5 platforms. Brownfield refresh keeps configs, automation, and gNMI pipelines intact. UEC 1.0 fabric profile already aligned for the next NIC generation.

DC · UEC-Ready

Full feature matrix · AI fabric solution · Reference Topologies · Hardware compatibility list

Frequently Asked

The questions architects actually ask.

Which Tomahawk 5 switches run OcNOS-DC?

Three open-hardware platforms across two ODMs: Edgecore AIS800-64D (DCS560 chassis), plus UfiSpace S9321-64E (QSFP-DD) and S9321-64EO (OSFP). All three ship ONIE pre-loaded and run the same OcNOS-DC image, with the same configuration, feature surface, and automation hooks. Two vendors means dual-source BoM is realistic for hyperscale and NeoCloud procurement.

QSFP-DD vs OSFP — when do I need the S9321-64EO?

QSFP-DD (AIS800-64D and S9321-64E) is the high-volume optics ecosystem — the right default for short-reach 800G inside the data center. OSFP (S9321-64EO) provides higher-power cages for module classes QSFP-DD cannot host: 800G ZR/ZR+ coherent for DCI, longer-reach DR4/DR8, and pluggable amplifiers. Pick OSFP when the optics drive the cage choice; otherwise QSFP-DD wins on cost and ecosystem breadth.

How does Tomahawk 5 compare to Tomahawk 4 — when do I pick which?

TH4 is 25.6 Tbps · 64×400G · 7 nm · 50G PAM4. TH5 doubles per-port speed and total switching capacity at the same 64-port radix (51.2 Tbps · 64×800G · 5 nm · 100G PAM4). Decision rule: if the cluster needs 800G ports natively, or each spine port needs to carry twice the bandwidth (halving the cable plant for the same aggregate fabric throughput), pick TH5. If the design is built around 400G NICs and a single-pod footprint, TH4 is still excellent and cheaper per port. OcNOS-DC supports both with the same feature set — brownfield refresh stays clean.

Does Tomahawk 5 support Ultra Ethernet (UEC)?

TH5 has the hardware mechanisms UEC 1.0 fabric profiles need — per-packet ECMP, packet-spray-friendly forwarding, shared-buffer scheduling that tolerates out-of-order delivery. UEC itself lives mostly in the NIC; TH5 fabrics running OcNOS-DC will carry UEC traffic correctly when UEC NICs ship in volume. RoCEv2 and UEC coexist on the same switch — migrate clusters NIC-by-NIC, no fabric replacement.

What does OcNOS-DC light up on TH5 that community SONiC does not?

On TH5, OcNOS-DC ships pre-tuned for AI fabrics: PFC over L3, ETS, Dynamic ECN, DLB Reactive-Path Rebalance, DLB Random-Flow, PFC Deadlock Detection & Recovery, xCCL-aligned buffer profiles, DCBX LLDP. On the same silicon it also runs a full carrier-grade Layer 3 stack — BGP, OSPF, IS-IS, SR-MPLS, EVPN-VXLAN — that AI-only stacks typically don't cover. 215 features validated across 8 categories, every entry verifiable on the public OcNOS Feature Matrix.

Where is Tomahawk 5 the wrong choice?

SP edge, cell-site gateway, sub-1 Tbps aggregation. The 64×800G radix doesn't earn its rack space in these roles. For SP routing, OcNOS validates Broadcom Qumran (Q2C, Q2C+) and Jericho (J2C+); for 100G/400G DC leaf in single-pod deployments, Trident (TD3-X7, TD4) offers better economics. Honest framing: TH5 wins when both the 800G radix and the AI fabric primitives matter, not when only one of them does.

Designing a Tomahawk 5 fabric? Let's size it together.

30-minute architecture session with an OcNOS network architect. Bring your GPU count, NIC speed, and tier preference — leave with a sized BoM across all four TH5 SKUs.

Book an architecture review · Test Drive OcNOS-DC

References

Tomahawk 5 references & further reading

Vendor doc Broadcom BCM78900 StrataXGS Tomahawk 5 series
Vendor doc Edgecore AIS800-64O 64x800G AI/ML switch (Tomahawk 5)
Vendor doc UfiSpace S9321-64E 800G AI/ML data center switch (Tomahawk 5)
Standard IEEE 802.3df-2024: 800 Gb/s and 400 Gb/s Ethernet amendment
Spec IEEE P802.3df Task Force (800 GbE PHY and MAC)
