EVPN Multi-Homing — ESI-LAG Active/Active
A production AI server has two NICs into two leaves — both active, both forwarding, no active/standby waste. EVPN multi-homing (RFC 7432, ESI-LAG) is the standards-based way to get there: no proprietary MLAG cabling, no inter-switch sync link. Just BGP, an Ethernet Segment Identifier, and the protocol does the rest.
Active/Active Server Attachment
A GPU server with two bonded NICs attaches to two leaves. Both leaves share the same Ethernet Segment ID (ESI). Both advertise the server's MAC into EVPN with the same ESI. Remote leaves install both as ECMP next-hops — aliasing across the ESI peers. On link failure, mass-withdraw collapses convergence to the BGP propagation time.
Why ESI-LAG over MLAG
Traditional Multi-Chassis LAG (MLAG) gives you Active/Active server attachment, but at the cost of a proprietary Inter-Chassis Link (ICL), per-vendor synchronization protocols, and forklift compatibility constraints between leaf models. EVPN multi-homing replaces all of that with BGP and a ten-byte Ethernet Segment Identifier.
With EVPN multi-homing, the two leaves don't need to know about each other directly. They both advertise the same ESI on the relevant Ethernet Segment, and the EVPN control plane handles designated forwarder election, aliasing, and mass-withdraw. The leaves can be different vendors, different generations, even different platforms — as long as they speak EVPN and ESI-LAG correctly, multi-homing works.
The four EVPN multi-homing primitives
Auto-Discovery per ESI / per EVI
Each leaf advertises Type-1 (Auto-Discovery) routes for the ESI. Receivers learn which leaves participate in the segment and use this for aliasing and mass-withdraw on failure.
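How aliasing falls out of the Type-1 routes can be sketched in a few lines of Python (illustrative data structures, not an OcNOS API): the Type-1 Auto-Discovery routes tell a remote VTEP which leaves serve an ESI, so a Type-2 MAC route carrying that ESI resolves to all of them, even though only one leaf actually learned the MAC on its access port.

```python
# Type-1 (Auto-Discovery per ES) routes received via BGP EVPN:
# ESI -> set of VTEPs attached to that segment
ad_routes = {
    "00:11:22:33:44:55:66:77:88:99": {"10.0.0.1", "10.0.0.2"},
}

# Type-2 (MAC/IP) route: the server MAC, advertised with its ESI
mac_routes = {"aa:bb:cc:dd:ee:01": "00:11:22:33:44:55:66:77:88:99"}

def next_hops(mac: str) -> set[str]:
    """Resolve a MAC to its ECMP next-hop set via the ESI (aliasing)."""
    esi = mac_routes[mac]
    return ad_routes[esi]

print(next_hops("aa:bb:cc:dd:ee:01"))  # {'10.0.0.1', '10.0.0.2'}
```

Note that the MAC was advertised by only one leaf; the ESI indirection is what gives the remote VTEP both paths.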
Ethernet Segment route
Type-4 routes drive Designated Forwarder election among the leaves attached to the same ESI. The DF is responsible for forwarding BUM (broadcast, unknown unicast, and multicast) traffic toward the segment.
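The default DF election procedure in RFC 7432 (section 8.5, "service carving") is simple enough to sketch directly: the PEs on the segment are ordered by IP address, and the DF for a given VLAN is the PE at index (VLAN mod N). The IP addresses below are placeholders.

```python
import ipaddress

def elect_df(pe_ips: list[str], vlan: int) -> str:
    """RFC 7432 default DF election: order the PEs on the Ethernet
    Segment by IP address; the DF for a VLAN is the PE at index
    (vlan mod N)."""
    ordered = sorted(pe_ips, key=lambda ip: int(ipaddress.ip_address(ip)))
    return ordered[vlan % len(ordered)]

# Two leaves sharing an ESI: even and odd VLANs elect different DFs,
# so BUM-forwarding duty is spread across the pair per service.
print(elect_df(["10.0.0.2", "10.0.0.1"], 100))  # 10.0.0.1
print(elect_df(["10.0.0.2", "10.0.0.1"], 101))  # 10.0.0.2
```

Because every PE runs the same deterministic computation over the same Type-4 route data, no negotiation protocol between the leaves is needed.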
ECMP across the ESI peers
Remote VTEPs install both leaf VTEPs as next-hops for the segment's MACs, and unicast traffic spreads across the two paths via standard flow-hash ECMP: full Active/Active utilisation with no per-flow pinning to a single leaf.
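A toy model of that flow-hash spreading, under the assumption of a 5-tuple hash (real switch ASICs use their own hash functions and field selections): each flow deterministically maps to one of the two VTEP next-hops, so packets within a flow stay in order while the aggregate load spreads across both leaves.

```python
import hashlib

VTEPS = ["10.0.0.1", "10.0.0.2"]  # the two ESI peer leaves

def pick_path(src_ip: str, dst_ip: str, proto: int,
              sport: int, dport: int) -> str:
    """Hash the 5-tuple and select one next-hop from the ECMP set."""
    key = f"{src_ip}|{dst_ip}|{proto}|{sport}|{dport}".encode()
    h = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return VTEPS[h % len(VTEPS)]

flow = ("192.0.2.10", "192.0.2.20", 6, 40000, 443)
assert pick_path(*flow) == pick_path(*flow)  # a flow sticks to one path
```

The determinism is the point: reordering within a TCP flow would hurt far more than any imbalance between the two links.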
Sub-second convergence on failure
When a leaf loses its link to the server, it withdraws its Type-1 ESI route. Remote VTEPs collapse the ESI's next-hop set in a single update. No per-MAC withdrawal storm.
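Why one withdrawal is enough can be seen from the shape of the remote state (a sketch with made-up values): MACs are keyed by ESI, and the ESI resolves to a next-hop set, so shrinking that one set repairs every MAC behind the segment at once.

```python
# Remote VTEP state: MACs resolve through the ESI, not directly to a VTEP.
esi = "00:11:22:33:44:55:66:77:88:99"
esi_next_hops = {esi: {"10.0.0.1", "10.0.0.2"}}
macs_on_esi = {esi: [f"aa:bb:cc:dd:ee:{i:02x}" for i in range(200)]}

def withdraw_type1(esi: str, vtep: str) -> None:
    """Process a Type-1 route withdrawal: one update, not one per MAC."""
    esi_next_hops[esi].discard(vtep)

withdraw_type1(esi, "10.0.0.2")  # leaf 10.0.0.2 lost its server link
# Every MAC behind the segment now resolves to the surviving leaf only.
assert all(esi_next_hops[esi] == {"10.0.0.1"} for _ in macs_on_esi[esi])
```

Contrast this with a flat MAC table keyed directly by VTEP, where the same failure would require withdrawing and re-learning every MAC individually.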
BUM loop prevention
A BUM frame that entered on the segment must never loop back out to the same server. With VXLAN encapsulation, local bias (RFC 8365) handles this: the ingress leaf floods BUM to its own local segment ports, and any peer leaf sharing the ESI drops BUM sourced from that peer's VTEP. With MPLS encapsulation, the ESI label provides the equivalent split-horizon filter. Either way, the check is stateless in the data plane.
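The local-bias decision at an egress leaf reduces to two checks, sketched below with placeholder ESI and VTEP values: drop BUM toward a shared segment if the source VTEP is a peer on that segment (the origin leaf already delivered it locally), and otherwise forward only if this leaf is the DF.

```python
MY_VTEP = "10.0.0.2"
es_peers = {"esi-1": {"10.0.0.1", "10.0.0.2"}}  # VTEPs sharing each ES
df_for = {"esi-1": "10.0.0.2"}                  # DF election result

def forward_bum_to_es(esi: str, src_vtep: str) -> bool:
    """Should this leaf flood a BUM frame onto the given segment?"""
    if MY_VTEP in es_peers[esi] and src_vtep in es_peers[esi]:
        return False               # local bias: origin leaf handled it
    return df_for[esi] == MY_VTEP  # otherwise only the DF floods

assert forward_bum_to_es("esi-1", "10.0.0.1") is False  # peer-sourced
assert forward_bum_to_es("esi-1", "10.9.9.9") is True   # remote; I am DF
```

Both lookups depend only on the frame's source VTEP and static EVPN-derived tables, which is why no per-flow or per-frame state is needed.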
Service interface flexibility
OcNOS supports both VLAN-Based and VLAN-Aware service interfaces, with per-EVI ESI configuration. Mix tenants and physical-segment topologies as the deployment requires.
What this gives you in production
- Standards-based redundancy. RFC 7432 and RFC 8365 — same protocol every modern DC vendor implements. No proprietary tax, no leaf-vendor lock-in.
- 2× bandwidth utilisation. Both NICs forward live traffic; no Active/Standby waste. Critical for AI servers where 2× 200G or 2× 400G into the leaf is the cabling baseline.
- Sub-second link-failure convergence. Mass-withdraw collapses the convergence event to BGP propagation time — typically inside one second on a tuned fabric.
- No ICL cable. The MLAG inter-chassis link goes away. Cabling, port consumption, and the failure-mode complexity of ICL split-brain all disappear.
- Multi-vendor leaf pairs. The two leaves on the same ESI don't need to be the same model or vendor. EVPN handles the protocol; the data plane just forwards.
- Validated in OcNOS-DC. ESI-LAG Active/Active is part of the DC-IPBASE feature set — production-grade on every supported Tomahawk and Trident platform.