Simple Network Management Protocol (SNMP) was designed in 1988 for networks whose links carried kilobits to a few megabits of traffic per second. It works by polling: a management system asks a device for its current state, the device responds, and the cycle repeats, typically every 60–300 seconds. For the networks SNMP was designed for, this was adequate.
For a modern AI data center where a single link can move 800 gigabits per second, or a service provider network carrying real-time 5G traffic, SNMP polling is not just inadequate — it actively masks the network behavior that operators need to observe.
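The masking effect is easy to reproduce. The minimal Python simulation below assumes a queue that stays empty except for one hypothetical 80 ms burst, then compares what a 60-second poller and a 10 ms stream observe over five minutes. The queue model and every number in it are illustrative assumptions, not measurements.

```python
"""Sketch: a slow poller misses a sub-second queue burst.

The queue model and all numbers are illustrative assumptions.
"""

BURST_START_MS = 12_345  # hypothetical burst start time
BURST_LEN_MS = 80        # hypothetical burst duration
BURST_DEPTH_KB = 900     # hypothetical peak queue depth


def queue_depth_at(t_ms: int) -> int:
    """Queue depth (KB) at time t_ms: empty except for a single burst."""
    if BURST_START_MS <= t_ms < BURST_START_MS + BURST_LEN_MS:
        return BURST_DEPTH_KB
    return 0


def observe(interval_ms: int, duration_ms: int) -> list[int]:
    """Depths seen by an observer that samples every interval_ms."""
    return [queue_depth_at(t) for t in range(0, duration_ms, interval_ms)]


if __name__ == "__main__":
    five_minutes = 5 * 60_000
    print("60 s poller peak:", max(observe(60_000, five_minutes)))  # 0
    print("10 ms stream peak:", max(observe(10, five_minutes)))    # 900
```

Moving the burst or changing the poll interval only changes whether a poll happens to land inside it: with polling, catching a transient event is a matter of luck rather than design.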
The Fundamental Problem with Polling
Why SNMP Fails in AI and High-Speed Networks
- Polling granularity: even aggressive 1-minute SNMP polling misses events that last only seconds, and in 800G fabrics a queue can fill and drain in under 100 ms.
- CPU overhead: every SNMP poll request consumes CPU cycles on the network device. At scale, polling hundreds of interfaces creates a predictable CPU spike every cycle.
- No event semantics: SNMP tells you the state at polling time. It does not tell you what changed, when it changed, or what caused it.
- Counter wrap: a 32-bit octet counter wraps in about 86 ms on a 400G interface (2^32 bytes × 8 bits ÷ 400 Gb/s), so a poller still using legacy 32-bit counters cannot tell how many times the counter wrapped between polls. The 64-bit high-capacity counters (ifHCInOctets, ifHCOutOctets) that most implementations now provide do not wrap in practice, but each poll still yields only an average over the whole polling interval.
- AI fabric-specific events: PFC pause-frame storms, ECN marking events, and RoCEv2 retransmission bursts are transient and invisible to SNMP polling.
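The counter-wrap figures follow from simple arithmetic. The minimal Python sketch below computes how long an octet counter of a given width takes to wrap at line rate, and shows the usual modulo-based delta between two polls, which is only correct if at most one wrap happened in between. The function names are illustrative.

```python
"""Sketch: octet-counter wrap time and wrap-aware deltas between polls."""


def wrap_seconds(counter_bits: int, link_bps: float) -> float:
    """Seconds for an octet counter of this width to wrap at line rate.

    Octet counters count bytes, so capacity in bits is 2**counter_bits * 8.
    """
    return (2 ** counter_bits) * 8 / link_bps


def counter_delta(prev: int, curr: int, counter_bits: int) -> int:
    """Bytes counted between two polls, assuming at most ONE wrap occurred.

    With more than one wrap (e.g. a 32-bit counter polled every 60 s on a
    400G link), the true delta cannot be recovered from two readings.
    """
    return (curr - prev) % (2 ** counter_bits)


if __name__ == "__main__":
    for bits in (32, 64):
        print(f"{bits}-bit: wraps every {wrap_seconds(bits, 400e9):.3g} s at 400G")
    # A 32-bit counter read near its maximum, then again just after wrapping.
    print(counter_delta(prev=4_294_967_000, curr=200, counter_bits=32))  # 496
```

At 400 Gb/s the 64-bit counter takes roughly 11.7 years to wrap, which is why high-capacity counters solve the wrap problem but not the averaging problem.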
gNMI Streaming Telemetry: How It Works
gNMI (gRPC Network Management Interface) inverts the monitoring model. Instead of the management system repeatedly asking the device for data, a subscription is set up once and the device streams updates to the collector over a long-lived gRPC session. The two stream subscription modes used in this article are:
- On-Change: data is pushed only when the value changes. A link going down triggers an immediate push. Ideal for state changes (interface status, BGP session state, route table changes).
- Sample: data is pushed at a configured interval. Used for counters and metrics whose values are needed at regular intervals (queue depth, byte counters).
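The difference between the two modes shows up clearly on a short link flap. This minimal Python sketch simulates which updates a device would push for a made-up oper-status trace in each mode. It ignores real-world details such as heartbeats, rate limiting, and transport, and the function names are hypothetical.

```python
"""Sketch: updates a device would push under ON_CHANGE vs SAMPLE.

The device is simulated; the trace below is a made-up example.
"""

Trace = list[tuple[int, str]]  # (timestamp_ms, value), in time order

# Hypothetical oper-status history: a 150 ms flap at t = 4.2 s.
LINK_TRACE: Trace = [(0, "UP"), (1_000, "UP"), (4_200, "DOWN"),
                     (4_350, "UP"), (9_000, "UP")]


def on_change(trace: Trace) -> Trace:
    """Push an update only when the value differs from the last one sent."""
    sent: Trace = []
    for ts, value in trace:
        if not sent or sent[-1][1] != value:
            sent.append((ts, value))
    return sent


def sample(trace: Trace, interval_ms: int, end_ms: int) -> Trace:
    """Push the current value at every interval tick, changed or not."""
    sent: Trace = []
    idx, current = 0, trace[0][1]
    for tick in range(0, end_ms + 1, interval_ms):
        # Advance to the latest value recorded at or before this tick.
        while idx < len(trace) and trace[idx][0] <= tick:
            current = trace[idx][1]
            idx += 1
        sent.append((tick, current))
    return sent


if __name__ == "__main__":
    print(on_change(LINK_TRACE))  # [(0, 'UP'), (4200, 'DOWN'), (4350, 'UP')]
    print(sample(LINK_TRACE, interval_ms=10_000, end_ms=20_000))  # all 'UP'
```

On-change reports both edges of the flap with their timestamps; a 10-second sample never sees it. That trade-off is why the configuration that follows uses on-change for link and BGP state and a fixed interval for queue depth.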
Configuring gNMI On-Change in OcNOS
```
! OcNOS -- gNMI streaming telemetry configuration
!
telemetry
!
! Subscription 1: on-change for interface state (link events)
subscription INTERFACE-EVENTS
protocol gNMI
encoding JSON_IETF
sensor-group INTF-STATE
path /interfaces/interface/state/oper-status
! Counters change continuously: stream them with a sampled
! subscription like QUEUE-DEPTH rather than on-change
!
destination-group COLLECTOR-1
address 10.100.0.10 port 57400
protocol gRPC no-tls
!
sample-interval 0 ! 0 = on-change mode
suppress-redundant true ! Don't push if value unchanged
!
! Subscription 2: sampled for queue depth (10-second interval)
subscription QUEUE-DEPTH
protocol gNMI
encoding JSON_IETF
sensor-group QOS-QUEUES
path /qos/interfaces/interface/output/queues/queue/state
!
destination-group COLLECTOR-1
sample-interval 10000 ! 10 seconds in milliseconds
!
! Subscription 3: on-change for BGP session state
subscription BGP-EVENTS
protocol gNMI
encoding JSON_IETF
sensor-group BGP-STATE
path /network-instances/network-instance/protocols/protocol/bgp/neighbors/neighbor/state/session-state
!
destination-group COLLECTOR-1
sample-interval 0 ! On-change: fires on session up/down
!
```
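For readers who also use dial-in gNMI tools, the minimal sketch below shows how the subscriptions above could be expressed as gNMI `Subscription` entries. It builds plain Python dictionaries whose field names (`mode`, `sample_interval`, `suppress_redundant`, `encoding`) follow the messages in `gnmi.proto`, where `sample_interval` is in nanoseconds rather than the milliseconds used by the CLI. The "0 means on-change" mapping is taken from the config comments; the helper name is hypothetical, and paths are kept as strings for readability even though real requests carry a structured `Path`.

```python
"""Sketch: express the CLI subscriptions as gNMI Subscription entries.

Field names follow the Subscription/SubscriptionList messages in gnmi.proto,
but paths are plain strings here for readability (real requests carry a
structured Path). This builds data only; it does not contact a device.
"""

NS_PER_MS = 1_000_000  # gNMI sample_interval is in nanoseconds


def to_gnmi_subscription(path: str, interval_ms: int,
                         suppress_redundant: bool = False) -> dict:
    """Map a CLI path + sample-interval (ms) to a gNMI Subscription dict.

    Assumes the CLI convention from the config comments: 0 means on-change.
    """
    if interval_ms < 0:
        raise ValueError("sample interval cannot be negative")
    if interval_ms == 0:
        return {"path": path, "mode": "ON_CHANGE"}
    return {
        "path": path,
        "mode": "SAMPLE",
        "sample_interval": interval_ms * NS_PER_MS,
        "suppress_redundant": suppress_redundant,
    }


# One representative path from each subscription in the config above.
SUBSCRIPTION_LIST = {
    "mode": "STREAM",
    "encoding": "JSON_IETF",
    "subscription": [
        to_gnmi_subscription("/interfaces/interface/state/oper-status", 0),
        to_gnmi_subscription(
            "/qos/interfaces/interface/output/queues/queue/state", 10_000),
        to_gnmi_subscription(
            "/network-instances/network-instance/protocols/protocol"
            "/bgp/neighbors/neighbor/state/session-state", 0),
    ],
}
```

The unit conversion is the easy thing to get wrong: passing the CLI's 10000 straight into `sample_interval` would request a 10-microsecond interval, which a target would typically reject as below its supported minimum.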
OpenConfig vs. Native YANG Models
```
! OcNOS -- Using OpenConfig data models for vendor-neutral monitoring
!
! OpenConfig paths are standardized across vendors --
! the same collector config works with any platform that supports
! the same OpenConfig models (OcNOS, Arista, Juniper, etc.)
!
telemetry
subscription OC-INTERFACES
protocol gNMI
encoding JSON_IETF
sensor-group OC-INTF
! OpenConfig interface model (vendor-neutral)
path openconfig:/interfaces/interface[name=*]/state
!
destination-group COLLECTOR-1
sample-interval 0
!
!
! Verify subscriptions are active:
show telemetry subscription
show telemetry sensor-group
!
! Check collector connectivity:
show telemetry destination
```
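The path `openconfig:/interfaces/interface[name=*]/state` combines an origin, a sequence of elements, and a key selector (`name=*`, a wildcard matching every interface). The minimal Python sketch below splits such a string into the structure gNMI uses on the wire: an origin plus a list of elements, each with a name and optional keys, mirroring the `Path` and `PathElem` messages. It is a simplified, hypothetical parser with no escape handling, and reading the `origin:/` prefix as the origin follows a common tooling convention rather than anything OcNOS-specific.

```python
"""Sketch: split a gNMI path string into an origin and PathElem-like dicts.

Simplified parser: assumes key values contain no ']' characters and does
not handle escaping. The "origin:/path" prefix form is a tooling convention.
"""
import re

_KEY = re.compile(r"\[([^=\]]+)=([^\]]*)\]")  # matches [key=value]


def _split_segments(path: str) -> list[str]:
    """Split on '/' except inside [...] so keys like name=eth1/1 survive."""
    segments, buf, depth = [], [], 0
    for ch in path:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "/" and depth == 0:
            if buf:
                segments.append("".join(buf))
                buf = []
        else:
            buf.append(ch)
    if buf:
        segments.append("".join(buf))
    return segments


def parse_path(text: str) -> dict:
    """Return {"origin": str, "elem": [{"name": ..., "key": {...}}, ...]}."""
    origin = ""
    head, sep, rest = text.partition(":/")
    if sep and head and "/" not in head and "[" not in head:
        origin, text = head, "/" + rest
    elems = []
    for segment in _split_segments(text):
        bracket = segment.find("[")
        name = segment if bracket < 0 else segment[:bracket]
        if not name:
            raise ValueError(f"path element without a name: {segment!r}")
        elem: dict = {"name": name}
        keys = dict(_KEY.findall(segment[bracket:])) if bracket >= 0 else {}
        if keys:
            elem["key"] = keys
        elems.append(elem)
    return {"origin": origin, "elem": elems}


if __name__ == "__main__":
    print(parse_path("openconfig:/interfaces/interface[name=*]/state"))
```

Splitting only outside brackets matters in practice, because interface names used as key values often contain slashes.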
Migration Path: SNMP to gNMI
| Step | Action | Notes |
|---|---|---|
| 1 | Deploy gNMI collector (Prometheus + gNMI Exporter, InfluxDB, or commercial) | Run in parallel with SNMP initially |
| 2 | Configure OcNOS gNMI subscriptions for key sensors | Start with interface state and BGP events |
| 3 | Validate data completeness against SNMP baselines | 2–4 week parallel run |
| 4 | Expand to queue, PFC, and optical sensors | These have little or no SNMP visibility |
| 5 | Decommission SNMP polling for covered sensors | Keep SNMP only for legacy devices that lack gNMI |
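Step 3 of the migration can be partly automated. The Python sketch below checks two things over one observation window: that the gNMI stream has no gaps longer than its expected interval allows, and that the average rate derived from the streamed byte counter matches the rate from the SNMP 64-bit counter. The data layout, the 1.5× gap slack, the 2 % tolerance, and all function names are hypothetical choices, and the sketch assumes both sources cover the same window with no counter resets.

```python
"""Sketch: validate a gNMI byte-counter stream against an SNMP baseline.

Hypothetical layout: each source is a list of (timestamp_s, octet_counter)
pairs for the same interface over the same window, with no counter resets.
"""

Sample = tuple[float, int]


def find_gaps(samples: list[Sample], interval_s: float,
              slack: float = 1.5) -> list[tuple[float, float]]:
    """Return (start, end) spans where consecutive samples are too far apart."""
    return [(a[0], b[0]) for a, b in zip(samples, samples[1:])
            if b[0] - a[0] > interval_s * slack]


def average_bps(samples: list[Sample]) -> float:
    """Average bit rate between the first and last counter samples."""
    (t0, c0), (t1, c1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive time window")
    return (c1 - c0) * 8 / (t1 - t0)


def rates_agree(snmp: list[Sample], gnmi: list[Sample],
                tolerance: float = 0.02) -> bool:
    """True if both sources give the same average rate within tolerance."""
    snmp_bps, gnmi_bps = average_bps(snmp), average_bps(gnmi)
    reference = max(snmp_bps, gnmi_bps, 1.0)  # guard against divide-by-zero
    return abs(snmp_bps - gnmi_bps) / reference <= tolerance


# Made-up example: 10 MB/s of traffic; two 10 s gNMI updates were lost.
SNMP_BASELINE: list[Sample] = [(0.0, 0), (300.0, 3_000_000_000)]
GNMI_STREAM: list[Sample] = [(float(t), t * 10_000_000)
                             for t in range(0, 301, 10) if t not in (30, 40)]

if __name__ == "__main__":
    print("gaps:", find_gaps(GNMI_STREAM, interval_s=10.0))  # [(20.0, 50.0)]
    print("rates agree:", rates_agree(SNMP_BASELINE, GNMI_STREAM))  # True
```

The two checks catch different failures: a rate mismatch points at a wrong path or unit, while gaps with matching rates point at dropped updates on the streaming transport.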
IP Infusion Engineering Team