BCM78900 · TSMC 5 nm · Shipping since March 2023

Broadcom Tomahawk 5 Switch: three 800G open platforms, validated on OcNOS-DC.

Network engineers choosing a Tomahawk 5 switch: start here. Edgecore AIS800-64D, UfiSpace S9321-64E and S9321-64EO. Same silicon, same OcNOS-DC image, three purchasing paths. Specifications, selection criteria, and the scope of OcNOS-DC features, without the marketing fluff.

51.2 Tbps: switch capacity
64×800G: native port radix
4 SKUs: OcNOS-validated
2 ODMs: Edgecore · UfiSpace
5 nm: TSMC N5 process
01
The Switches
Open hardware running Tomahawk 5

Three 800G platforms. Two ODMs. One OcNOS-DC image.

Two hardware designs, four SKUs. All four ship ONIE pre-loaded and run the same OcNOS-DC image; the differences are form factor (QSFP-DD vs OSFP), branding (AI-fabric SKU vs general-DC SKU), and which optics ecosystem the deployment is built around.

Edgecore · DCS560 platform family
AI fabric spine

AIS800-64D

Validated on OcNOS-DC · ONIE pre-loaded
Ports
64 × QSFP-DD800 · Breakout: 2×400 / 4×200 / 8×100 (320 logical ports)
Form
2RU
Power
2× 3000 W AC/DC redundant · 30 W per QSFP-DD cage
CPU
Intel Xeon D-1713NTE
Pick this when:

GPU-cluster AI fabric: the Edgecore DCS560 chassis in its AI-fabric SKU.

UfiSpace · S9321 platform family
AI/ML fabric spine

S9321-64E

Validated on OcNOS-DC · ONIE pre-loaded
Ports
64 × QSFP-DD (200/400/800G) · Breakout: 2×400 / 4×200 / 8×100
Form
2RU · 23.72 kg
Power
913 W typical (no transceivers) · 30 W per QSFP-DD cage
CPU
Intel Ice Lake-D, 4-core · 32 GB DDR4
Pick this when:

Large, low-entropy AI/ML flows. UfiSpace markets the 64E for AllReduce-dominant traffic where TH5 adaptive routing is the design centre.
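A rough power check for the S9321-64E shows how much the optics allowance weighs once the cages are populated. This is a minimal sketch that adds the typical no-transceiver figure to every cage drawing its full 30 W; mixing a typical base figure with a maximum optics figure is only an approximation, and the vendor datasheet's maximum-power number should govern PSU and rack-power planning.

```python
def optics_budget_w(base_w: float, cages: int, w_per_cage: float) -> float:
    """Base system power plus every cage drawing its full per-cage allowance."""
    return base_w + cages * w_per_cage


if __name__ == "__main__":
    # S9321-64E figures quoted above: 913 W typical without transceivers,
    # 64 QSFP-DD cages at up to 30 W each.
    total = optics_budget_w(913, 64, 30)
    print(f"{total:.0f} W total, of which optics {64 * 30} W")
```

That comes to about 2,833 W, of which 1,920 W is optics: more than double the base system draw.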

UfiSpace · S9321 platform family
800G DCI · coherent optics

S9321-64EO

Validated on OcNOS-DC · ONIE pre-loaded
Ports
64 × OSFP (200/400/800G) · Breakout: 2×400 / 4×200 / 8×100
Form
2RU · 23.74 kg
Power
925 W typical · 200–240 V AC · OSFP cages for higher-power optics
CPU
Intel Ice Lake-D · 32 GB DDR4
Pick this when:

800G ZR/ZR+ coherent or other higher-power module classes. The OSFP counterpart of the 64E: pick it when the optics drive the cage choice.

· How to choose between the four

AIS800 vs S9321-64E: Same TH5 silicon, two ODMs. Edgecore DCS560 (AIS800-64D) vs UfiSpace S9321 gives a dual-source BoM for hyperscale and NeoCloud procurement.
QSFP-DD vs OSFP: QSFP-DD (S9321-64E plus both Edgecore SKUs) for the high-volume optics ecosystem. OSFP (S9321-64EO) for higher-power module classes, including 800G ZR/ZR+ coherent.
Edgecore vs UfiSpace: Both are open-hardware ODMs with strong IP Infusion co-design. Pick by your ODM relationship, RMA logistics, or BoM economics.
Single-vendor risk: Two vendors with TH5 platforms make a dual-source BoM realistic, which matters for hyperscale and NeoCloud procurement.
02
Inside the Silicon
What 51.2 Tbps in one die buys you

Tomahawk 5 — Broadcom's flagship merchant switch ASIC.

The BCM78900 is a single 5 nm monolithic die delivering 51.2 Tbps of switching capacity, enough to feed 64 ports of 800GbE, 128 of 400G, or 256 of 200G natively. It was Broadcom's first 5 nm merchant switch IC and the first product anywhere to support 800GbE at the cage. Its 512 SerDes lanes run 100G PAM4: the same lane count as Tomahawk 4, at twice the per-lane speed.

Beyond raw capacity, three architectural choices made TH5 the silicon under most production AI fabrics: a shared-buffer architecture that absorbs xCCL (NCCL / RCCL / oneCCL) collective micro-bursts in hardware, Cognitive Routing (DLB) that rebinds elephant flows in the ASIC, and 5 nm thermal headroom that lets 30 W QSFP-DD800 cages run without per-port active cooling.

Specs verifiable against Broadcom's public BCM78900 product page.

Process: TSMC N5
Series: StrataXGS
Buffer: Shared, RDMA-tuned
Routing: Cognitive · DLB
Shipping: Since Mar 2023

· What 64 × 800G looks like

BCM78900 die: 51.2 Tbps
512 lanes × 100G PAM4 = 51.2 Tbps. Eight lanes per cage → 800G. The arithmetic is the architecture.
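The lane arithmetic can be checked in a few lines. This is a minimal sketch: it counts each 100G PAM4 lane as 100 Gb/s of payload (ignoring the higher raw line rate) and assumes each port speed maps onto a whole number of lanes, as the 800G, 400G, and 200G modes above do.

```python
# Tomahawk 5 capacity arithmetic from the figures quoted above:
# 512 SerDes lanes, each carrying 100 Gb/s of payload (100G PAM4).
LANES = 512
LANE_GBPS = 100


def capacity_gbps(lanes: int = LANES, lane_gbps: int = LANE_GBPS) -> int:
    """Aggregate switching capacity in Gb/s."""
    return lanes * lane_gbps


def native_ports(port_gbps: int) -> int:
    """Ports of a given speed that the lane budget supports natively."""
    lanes_per_port = port_gbps // LANE_GBPS  # 800G -> 8 lanes, 400G -> 4, 200G -> 2
    return LANES // lanes_per_port


if __name__ == "__main__":
    print(f"capacity: {capacity_gbps() / 1000} Tbps")
    for speed in (800, 400, 200):
        print(f"{speed}G: {native_ports(speed)} ports")
```

Running it prints 51.2 Tbps and 64, 128, and 256 ports; calling `capacity_gbps(lane_gbps=50)` gives TH4's 25.6 Tbps from the same 512 lanes.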
Four design choices that matter

Why TH5 ended up in almost every open AI fabric built since 2024.

The headline number gets the press. These four engineering choices are what AI fabric architects actually care about.

PRINCIPLE 01

Same lane count, twice the speed.

TH5 carries the same 512 SerDes lanes as TH4 but runs them at 100G PAM4 instead of 50G. Throughput doubled because the existing lanes got faster, not because more were added.

100G PAM4 · 106 Gbps
PRINCIPLE 02

Shared-buffer, not partitioned.

Packet memory is a single pool shared across all 64 ports, not carved into per-port slices. xCCL AllReduce micro-bursts on one port are absorbed by the switch-wide pool instead of triggering tail-drop. That is the one-line reason TH5 wins on RoCEv2.

Shared-buffer · RDMA-tuned
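The difference between partitioned and shared buffering can be illustrated with a toy model of one port receiving a burst it cannot drain. The shared case uses the classic dynamic-threshold admission rule (a queue may grow up to alpha times the free memory), a textbook scheme standing in for TH5's buffer policy, which this page does not specify. The pool size and alpha are hypothetical round numbers, not BCM78900 figures.

```python
# Hypothetical pool size for illustration; not the BCM78900's actual buffer.
POOL_BYTES = 64 * 1024 * 1024
PORTS = 64


def partitioned_drop(burst_bytes: int, pool: int = POOL_BYTES, ports: int = PORTS) -> int:
    """Bytes tail-dropped when each port owns a fixed 1/ports slice of memory."""
    per_port = pool // ports
    return max(0, burst_bytes - per_port)


def shared_drop(burst_bytes: int, pool: int = POOL_BYTES, alpha: float = 1.0,
                occupied_elsewhere: int = 0) -> int:
    """Bytes tail-dropped under a dynamic-threshold shared buffer.

    A queue may keep growing while its length stays below
    alpha * (pool - total occupancy). With only this queue congested, that
    caps it at alpha / (1 + alpha) of the memory other queues leave free.
    """
    free = pool - occupied_elsewhere
    cap = int(alpha * free / (1 + alpha))
    return max(0, burst_bytes - cap)


if __name__ == "__main__":
    burst = 8 * 1024 * 1024  # an 8 MiB incast burst toward a single port
    print("partitioned drop:", partitioned_drop(burst))
    print("shared drop:     ", shared_drop(burst))
```

With these numbers the partitioned design drops 7 MiB of the burst while the shared pool absorbs all of it. The alpha term is what stops one congested queue from starving the other 63 ports: if other queues already hold 56 MiB, the same burst can only claim half of the remaining 8 MiB.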
PRINCIPLE 03

Hardware adaptive routing.

Broadcom Cognitive Routing detects congested paths and rebinds elephant flows in the ASIC — no controller round-trip, no ECMP rehashing. OcNOS-DC turns it on as DLB Reactive-Path Rebalance.

DLB · 64 µs flowlet
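Cognitive Routing's internals are Broadcom's own; what can be sketched is the general flowlet-switching idea behind this class of dynamic load balancing. In the minimal model below (all names hypothetical), a flow may move to the least-loaded path only after an idle gap longer than the flowlet threshold (64 µs, per the figure above), so packets already in flight on the old path cannot be overtaken. Cumulative bytes per path stand in for the hardware's path-quality metric.

```python
from dataclasses import dataclass, field

FLOWLET_GAP_US = 64.0  # idle gap that starts a new flowlet


@dataclass
class FlowletBalancer:
    n_paths: int
    gap_us: float = FLOWLET_GAP_US
    path_bytes: list = field(default_factory=list)  # load per path (stand-in metric)
    flows: dict = field(default_factory=dict)       # flow_id -> (last_seen_us, path)

    def __post_init__(self) -> None:
        if not self.path_bytes:
            self.path_bytes = [0] * self.n_paths

    def route(self, flow_id: str, now_us: float, size_bytes: int) -> int:
        prev = self.flows.get(flow_id)
        if prev is not None and now_us - prev[0] < self.gap_us:
            path = prev[1]  # same flowlet: stay on the path to avoid reordering
        else:
            # New flowlet: rebinding is safe, so pick the least-loaded path.
            path = min(range(self.n_paths), key=lambda p: self.path_bytes[p])
        self.flows[flow_id] = (now_us, path)
        self.path_bytes[path] += size_bytes
        return path


if __name__ == "__main__":
    lb = FlowletBalancer(n_paths=4)
    print(lb.route("a", 0, 1000), lb.route("a", 10, 1000),
          lb.route("b", 20, 500), lb.route("a", 200, 1000))
```

Flow "a" stays on path 0 while its packets are 10 µs apart, then moves to an idle path after a 190 µs pause; no controller is involved and no hash is recomputed.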
PRINCIPLE 04

5 nm thermal headroom.

The first 5 nm merchant switch IC. The process shrink is what made 30 W per QSFP-DD800 cage feasible without active per-port cooling — including high-power 800G optics and 8×100G breakout.

TSMC N5 · 30 W/port
03
Generation Jump
Tomahawk 4 → Tomahawk 5

Per-port speed doubled. Capacity doubled. Same 64-port radix.

Honest framing: TH4 (25.6 Tbps · 64×400G · 7 nm) remains excellent for clusters built around 400G NICs. TH5 earns its rack space when both 800G per port and AI fabric primitives matter.

Switching capacity
25.6 Tbps → 51.2 Tbps

Doubled at the same rack footprint. Same 2RU, same power envelope class.

Per-port speed
64 × 400G → 64 × 800G

The same 64-port radix on the actual IPI platforms (AS9736-64D → AIS800-64D / S9321). Per-port bandwidth doubles, so every Clos tier carries twice the traffic.
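The effect of keeping the radix while doubling per-port speed can be worked through with the standard non-blocking two-tier leaf-spine model. This is an idealized sketch (identical switches, no oversubscription, no breakout); the 102.4 Tbps aggregate used below is an arbitrary illustrative figure.

```python
from typing import Tuple


def links_needed(aggregate_gbps: int, port_gbps: int) -> int:
    """Spine-leaf links required to carry an aggregate bandwidth (ceiling division)."""
    return -(-aggregate_gbps // port_gbps)


def two_tier_clos(radix: int, port_gbps: int) -> Tuple[int, int, int, int]:
    """Non-blocking leaf-spine built from identical radix-k switches.

    Each leaf splits its ports k/2 down and k/2 up; each spine port reaches a
    different leaf, so the fabric has k leaves, k/2 spines and k*k/2 host ports.
    Returns (leaves, spines, host_ports, host-facing capacity in Gb/s).
    """
    leaves, spines = radix, radix // 2
    host_ports = leaves * (radix // 2)
    return leaves, spines, host_ports, host_ports * port_gbps


if __name__ == "__main__":
    for name, speed in (("TH4", 400), ("TH5", 800)):
        leaves, spines, hosts, gbps = two_tier_clos(64, speed)
        print(f"{name}: {leaves} leaves, {spines} spines, {hosts} host ports, {gbps / 1000} Tbps")
        print(f"  links for 102.4 Tbps spine-leaf: {links_needed(102_400, speed)}")
```

The same 64-port radix yields the same 2,048 host ports from 96 switches, but the TH5 fabric carries twice the bandwidth; equivalently, a fixed 102.4 Tbps of spine-leaf capacity needs 128 links at 800G instead of 256 at 400G.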

Process node
7 nm → 5 nm

First 5 nm merchant switch IC. Thermal headroom for 30 W/port without active cooling.

SerDes per lane
50G PAM4 → 100G PAM4

Same 512 lanes, twice the speed. The throughput doubling came from faster lanes, not more of them.

Brownfield refresh stays clean. The same OcNOS-DC image runs on TH3, TH4, and TH5 platforms — configurations, automation, and gNMI pipelines carry over. Pick TH5 for the next cluster; keep TH4 where it already works.
04
What OcNOS-DC Ships
OcNOS-DC on this silicon

Carrier-grade NOS. AI-tuned defaults.

Tomahawk 5 has the hardware. The job of the NOS is to expose it — to operators, to telemetry pipelines, to the cluster scheduler — without forcing them to write CLI gymnastics around it. OcNOS-DC ships these primitives as first-class configurable objects with YANG-modelled state.

RoCEv2 lossless

Shared-buffer architecture, zero-drop east-west.

OcNOS-DC ships PFC + ETS + Dynamic ECN pre-tuned for xCCL collective patterns. Tail latency stays bounded even under the AllReduce micro-bursts that take down community-NOS fabrics. TH5's shared buffer pool absorbs the synchronized many-to-one traffic that would cause tail-drop on chips with partitioned buffers.
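DCQCN, the congestion control RoCEv2 NICs commonly run, assumes the switch marks ECN with a RED-style profile: no marking below a queue threshold Kmin, a marking probability that rises linearly to Pmax at Kmax, and every packet marked above Kmax. The sketch below shows that generic profile only; the threshold values are hypothetical, and OcNOS-DC's Dynamic ECN tuning is not modelled.

```python
def ecn_mark_probability(queue_kb: float, kmin_kb: float, kmax_kb: float, pmax: float) -> float:
    """RED-style ECN marking probability for a given egress queue depth."""
    if queue_kb <= kmin_kb:
        return 0.0  # shallow queue: never mark
    if queue_kb > kmax_kb:
        return 1.0  # past Kmax: mark every packet
    # Linear ramp from 0 at Kmin to pmax at Kmax.
    return pmax * (queue_kb - kmin_kb) / (kmax_kb - kmin_kb)


if __name__ == "__main__":
    # Hypothetical thresholds, for illustration only.
    KMIN, KMAX, PMAX = 100, 400, 0.2
    for depth in (50, 100, 250, 400, 500):
        print(depth, "KB ->", round(ecn_mark_probability(depth, KMIN, KMAX, PMAX), 3))
```

Marked packets make the receiving NIC send congestion notifications back to the sender, which cuts its rate before the queue reaches the PFC threshold; the buffer absorbs the burst while ECN trims the senders.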

Adaptive Routing

DLB rebinds flowlets in 64 µs.

ECMP hash-collision under elephant flows is the AI fabric killer. OcNOS-DC turns on TH5 Cognitive Routing's flowlet rebinding so AllReduce traffic spreads across every spine path automatically.
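Why static ECMP struggles with a few large flows is the birthday problem. The sketch below assumes each flow hashes independently and uniformly onto the equal-cost paths, an idealization of real hash functions.

```python
import math


def p_any_collision(flows: int, paths: int) -> float:
    """Chance that static hashing puts two or more of `flows` flows on one path.

    Assumes each flow hashes independently and uniformly onto `paths` paths.
    """
    if flows > paths:
        return 1.0  # pigeonhole: a collision is certain
    return 1.0 - math.perm(paths, flows) / paths ** flows


if __name__ == "__main__":
    for flows, paths in ((4, 8), (8, 8), (8, 64), (32, 64)):
        print(f"{flows} flows on {paths} paths: {p_any_collision(flows, paths):.1%}")
```

With 8 elephant flows hashed across 8 equal-cost uplinks, at least two share a link about 99.8% of the time, and flows sharing a bottleneck link each get roughly half of it while other links sit idle. That is the failure mode flowlet rebinding targets.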

PFC Deadlock Watchdog

Per-port, per-priority. Auto-drain.

Detects paused-queue cycles before they hang training jobs. Auto-recovers without operator intervention.
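A PFC watchdog usually infers a deadlock from its symptom, a queue held in the paused state without draining for longer than a detection timer, rather than by tracing the dependency cycle itself. The sketch below models that common detect, drain, restore pattern for one port and priority; the timer names and defaults are hypothetical, not OcNOS-DC configuration knobs.

```python
from enum import Enum
from typing import Optional


class WdState(Enum):
    OK = "ok"
    STORM = "storm"  # queue treated as deadlocked: drained, incoming PAUSE ignored


class PfcWatchdog:
    """Watchdog for one (port, priority) queue; detect_ms/restore_ms are hypothetical knobs."""

    def __init__(self, detect_ms: int = 200, restore_ms: int = 200) -> None:
        self.detect_ms = detect_ms
        self.restore_ms = restore_ms
        self.state = WdState.OK
        self._paused_since: Optional[int] = None
        self._clear_since: Optional[int] = None

    def tick(self, now_ms: int, paused: bool) -> WdState:
        """Feed one poll. `paused` is True while PFC PAUSE is holding the queue
        (in OK state) or still arriving for it (in STORM state)."""
        if self.state is WdState.OK:
            if not paused:
                self._paused_since = None  # queue moved: reset the detection timer
            else:
                if self._paused_since is None:
                    self._paused_since = now_ms
                if now_ms - self._paused_since >= self.detect_ms:
                    self.state = WdState.STORM  # start draining the stuck queue
                    self._clear_since = None
        else:
            if paused:
                self._clear_since = None  # PAUSE still arriving: stay in recovery
            else:
                if self._clear_since is None:
                    self._clear_since = now_ms
                if now_ms - self._clear_since >= self.restore_ms:
                    self.state = WdState.OK  # resume normal lossless behaviour
                    self._paused_since = None
        return self.state
```

For example, with `detect_ms=100`, a queue polled every 10 ms and paused continuously flips to STORM at the 100 ms poll, then returns to OK once PAUSE frames have been absent for another 100 ms.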

Streaming telemetry

gNMI on-change, OpenConfig YANG.

Buffer depth, ECN marks, PFC pause counts — every threshold a knob, every counter a sensor path. Plugs into Prometheus, Grafana, OTel.
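The benefit of on-change streaming is easiest to see on a counter that rarely moves, such as a PFC pause count. The sketch below models on-change semantics only, with an optional heartbeat that re-sends an unchanged value after a silent period (similar in spirit to gNMI's heartbeat interval). It is not a gNMI client, and the polled data is made up.

```python
from typing import Iterable, Iterator, Optional, Tuple

Sample = Tuple[int, int]  # (timestamp in ms, counter value)
_UNSET = object()  # sentinel: nothing sent yet


def on_change(samples: Iterable[Sample], heartbeat_ms: Optional[int] = None) -> Iterator[Sample]:
    """Yield a sample only when its value changes, or when a heartbeat is due."""
    last_value: object = _UNSET
    last_sent_ms: Optional[int] = None
    for ts, value in samples:
        changed = value != last_value
        heartbeat_due = (
            heartbeat_ms is not None
            and last_sent_ms is not None
            and ts - last_sent_ms >= heartbeat_ms
        )
        if changed or heartbeat_due:
            last_value, last_sent_ms = value, ts
            yield ts, value


if __name__ == "__main__":
    polled = [(0, 5), (10, 5), (20, 7), (30, 7), (40, 7), (50, 5)]
    print(list(on_change(polled)))                   # 3 updates instead of 6
    print(list(on_change(polled, heartbeat_ms=20)))  # adds a heartbeat at 40 ms
```

The collector sees every transition with a fraction of the updates a fixed-interval sample stream would send, and the heartbeat tells it the path is still alive when nothing changes.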

Real Network

BGP · OSPF · IS-IS · EVPN-VXLAN.

The TH5 spine is also a real router. Full carrier-grade Layer 3 stack on the same silicon — operate the AI fabric like the rest of your network, not like a black box.

Validated feature surface

215 features across 8 categories — pulled from the live OcNOS Feature Matrix.

Layer 3 routing · L1/L2 · AI/ML fabric primitives · Multicast · QoS · Security · Hardware · Management. Every entry verifiable per-platform on the public matrix.

RoCEv2 / PFC DCQCN DLB EVPN-VXLAN SR-MPLS BGP / OSPF / IS-IS gNMI / NETCONF ZTP UEC 1.0 ready
Day-0 to Day-2

ZTP. gNMI on-change. NETCONF + YANG. DCBX.

Bring up a TH5 spine in the rack with zero-touch provisioning. Stream every counter to your observability stack. Tune every threshold via YANG-modelled config. No glue scripts.

ZTP IPv4/IPv6 gNMI NETCONF OpenConfig YANG DCBX LLDP Ansible Terraform provider
Who builds this stack

Three operator profiles. One silicon + NOS combo.

Same TH5 die, same OcNOS-DC image, three different framings of the same architectural question: how do you scale lossless east-west without locking the whole stack to one vendor?

AI Cluster Operator

Training fabric up to the 16k-GPU ceiling on open silicon.

"We need 800G to the leaf, lossless RoCEv2, and tail latency that doesn't blow up under AllReduce. Single-vendor lock-in is not on the table."

TH5 64×800G spine, RoCEv2 with DCQCN tuned for xCCL, sub-millisecond DLB rebinding, PFC deadlock watchdog. Same 64-port radix as TH4, but every spine port carries 800G: that halves the spine-leaf cabling for the same aggregate fabric bandwidth.

DC · AI Fabric SKU
NeoCloud · GPU-as-a-Service

Multi-tenant fabric, BoM under control.

"Our customers pick the GPU. We can't tie our fabric BoM to their NIC choice. We need a switch we can buy from two vendors at minimum."

Four OcNOS-validated TH5 SKUs across two vendors (Edgecore, UfiSpace). VRF-Lite tenant isolation, gNMI per-tenant telemetry, EVPN-VXLAN segmentation. One NOS image, multi-vendor hardware.

DC · Multi-Tenant
Hyperscaler · Brownfield Refresh

TH3/TH4 fabric refresh without forklift.

"We have a TH4 fabric in production. The next training cluster needs 800G NICs. We don't want to redesign the whole NOS layer to upgrade the silicon."

Same OcNOS-DC image runs on TH3, TH4, and TH5 platforms. Brownfield refresh keeps configs, automation, and gNMI pipelines intact. UEC 1.0 fabric profile already aligned for the next NIC generation.

DC · UEC-Ready
Frequently Asked

The questions architects actually ask.

Three open-hardware platforms across two ODMs: Edgecore AIS800-64D (DCS560 chassis) and UfiSpace S9321-64E (QSFP-DD) and S9321-64EO (OSFP). All three ship ONIE pre-loaded and run the same OcNOS-DC image — same configuration, same feature surface, same automation hooks. Two vendors means dual-source BoM is realistic for hyperscale and NeoCloud procurement.
QSFP-DD (AIS800-64D and S9321-64E) is the high-volume optics ecosystem — the right default for short-reach 800G inside the data center. OSFP (S9321-64EO) provides higher-power cages for module classes QSFP-DD cannot host: 800G ZR/ZR+ coherent for DCI, longer-reach DR4/DR8, and pluggable amplifiers. Pick OSFP when the optics drive the cage choice; otherwise QSFP-DD wins on cost and ecosystem breadth.
TH4 is 25.6 Tbps · 64×400G · 7 nm · 50G PAM4. TH5 doubles per-port speed and total switching capacity at the same 64-port radix (51.2 Tbps · 64×800G · 5 nm · 100G PAM4). Decision rule: if the cluster needs 800G ports natively, or each spine port needs to carry twice the bandwidth (halving the cable plant for the same aggregate fabric throughput), pick TH5. If the design is built around 400G NICs and a single-pod footprint, TH4 is still excellent and cheaper per port. OcNOS-DC supports both with the same feature set — brownfield refresh stays clean.
TH5 has the hardware mechanisms UEC 1.0 fabric profiles need — per-packet ECMP, packet-spray-friendly forwarding, shared-buffer scheduling that tolerates out-of-order delivery. UEC itself lives mostly in the NIC; TH5 fabrics running OcNOS-DC will carry UEC traffic correctly when UEC NICs ship in volume. RoCEv2 and UEC coexist on the same switch — migrate clusters NIC-by-NIC, no fabric replacement.
On TH5, OcNOS-DC ships pre-tuned for AI fabrics: PFC over L3, ETS, Dynamic ECN, DLB Reactive-Path Rebalance, DLB Random-Flow, PFC Deadlock Detection & Recovery, xCCL-aligned buffer profiles, DCBX LLDP. On the same silicon it also runs a full carrier-grade Layer 3 stack — BGP, OSPF, IS-IS, SR-MPLS, EVPN-VXLAN — that AI-only stacks typically don't cover. 215 features validated across 8 categories, every entry verifiable on the public OcNOS Feature Matrix.
SP edge, cell-site gateway, sub-1 Tbps aggregation. The 64×800G radix does not earn its rack space in these roles. For SP routing, OcNOS validates Broadcom Qumran (Q2C, Q2C+) and Jericho (J2C+); for 100G/400G DC leaf in single-pod deployments, Trident (TD3-X7, TD4) offers better economics. Honest framing: TH5 wins when both the 800G radix and the AI fabric primitives matter, not when only one does.

Designing a Tomahawk 5 fabric? Let's size it together.

30-minute architecture session with an OcNOS network architect. Bring your GPU count, NIC speed, and tier preference — leave with a sized BoM across all four TH5 SKUs.