
Stop Polling, Start Streaming: Why SNMP is Failing Modern Networks and How gNMI Fixes It

Simple Network Management Protocol (SNMP) was designed in 1988 for networks that carried kilobits of traffic per second. It works by polling — a management system asks a device for its current state, the device responds, and the cycle repeats every 60–300 seconds. For the networks SNMP was designed for, this was adequate.

For a modern AI data center where a single link can move 800 gigabits per second, or a service provider network carrying real-time 5G traffic, SNMP polling is not just inadequate — it actively masks the network behavior that operators need to observe.

The Fundamental Problem with Polling

[Figure: timeline of a 30-second congestion event under SNMP 5-minute polling (missed between the polls at T=0 and T=5 min) and under gNMI on-change streaming (detected within milliseconds, alert fired immediately).]
SNMP polling at a 5-minute interval completely misses a 30-second congestion event that occurs between polls. gNMI on-change streaming detects the event within milliseconds and pushes data to the collector immediately, which enables automated remediation before the event becomes a service-impacting issue.
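
The effect is easy to reproduce with a few lines of arithmetic. The following is a minimal sketch, assuming polls fire at fixed multiples of the interval and that the value of interest is instantaneous (queue depth, for example) rather than a cumulative counter. The function name and the event timing are illustrative.

```python
def polls_that_see_event(event_start, event_end, poll_interval, horizon):
    """Return poll timestamps (in seconds) that land inside the event window."""
    poll_times = range(0, horizon + 1, poll_interval)
    return [t for t in poll_times if event_start <= t < event_end]

# A 30-second congestion event that begins 90 s into the observation window.
event_start, event_end = 90, 120

# 5-minute polling over 10 minutes: polls at 0, 300, and 600 s.
seen_5min = polls_that_see_event(event_start, event_end, 300, 600)

# 1-minute polling: polls at 0, 60, 120, ... and the event ends just before 120 s.
seen_1min = polls_that_see_event(event_start, event_end, 60, 600)

print(seen_5min)  # []
print(seen_1min)  # []
```

In this scenario neither polling interval lands a single sample inside the event, so the collector has no evidence that the congestion ever happened.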

Why SNMP Fails in AI and High-Speed Networks

  • Polling granularity: even aggressive 1-minute SNMP polling misses events that last seconds. In 800G fabrics, a queue can fill and drain in under 100 ms.
  • CPU overhead: each SNMP poll request consumes CPU cycles on the network device. At scale, polling hundreds of interfaces creates a predictable CPU spike every cycle.
  • No event semantics: SNMP tells you the state at polling time. It does not tell you what changed, when it changed, or what caused it.
  • Counter wrap: a 32-bit SNMP octet counter wraps in roughly 86 ms at 400G line rate, far faster than any polling interval. Most SNMP implementations provide 64-bit counters, but any object still exposed only as a 32-bit counter produces gaps or misleading rates at these speeds.
  • AI-fabric-specific events: PFC pause frame storms, ECN marking events, and RoCEv2 retransmission bursts are transient and invisible to SNMP polling.
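
The counter-wrap figures can be checked directly. This is a minimal sketch, assuming an octet counter that increments at full line rate; the function name is illustrative.

```python
def wrap_seconds(counter_bits, link_bps):
    """Seconds for an octet counter to wrap at full line rate."""
    max_octets = 2 ** counter_bits
    octets_per_second = link_bps / 8
    return max_octets / octets_per_second

# 32-bit counters: wrap time shrinks as link speed grows.
for label, bps in [("10G", 10e9), ("100G", 100e9), ("400G", 400e9), ("800G", 800e9)]:
    print(f"{label}: 32-bit counter wraps in {wrap_seconds(32, bps) * 1000:.1f} ms")

# 64-bit counters: wrap time at 400G, expressed in years.
years_64 = wrap_seconds(64, 400e9) / (365 * 24 * 3600)
print(f"400G: 64-bit counter wraps in about {years_64:.1f} years")
```

At 400G the 32-bit result is about 86 ms, so a poller cannot tell how many times the counter wrapped between two samples. A 64-bit counter takes more than a decade to wrap at the same rate.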

gNMI Streaming Telemetry: How It Works

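gNMI (gRPC Network Management Interface) inverts the monitoring model. Instead of the management system repeatedly asking the device for data, the device streams updates to the collector, either the moment a value changes or at a configured interval. The gNMI specification also defines a target-defined mode, but two streaming subscription modes matter most here: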

  • On-Change — data is pushed only when the value changes. A link going down triggers an immediate push. Ideal for state changes (interface status, BGP session state, route table changes).
  • Sample — data is pushed at a configured interval. Used for counters and metrics where absolute values at regular intervals are needed (queue depth, byte counters).
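
In the gNMI specification, a collector that dials in to a device expresses these modes in a SubscribeRequest message. The sketch below builds plain Python dictionaries shaped like that message to show how the two modes differ. It is an illustration only, not the wire format or any client library's API, and the helper names are hypothetical. One detail worth noting is that gNMI expresses sample_interval in nanoseconds.

```python
NANOS_PER_MS = 1_000_000

def build_subscription(path, mode, sample_interval_ms=0):
    """Build a dict shaped like a gNMI Subscription message (illustrative only)."""
    if mode not in ("ON_CHANGE", "SAMPLE"):
        raise ValueError(f"unsupported mode: {mode}")
    if mode == "SAMPLE" and sample_interval_ms <= 0:
        raise ValueError("SAMPLE subscriptions need a positive interval")
    sub = {"path": path, "mode": mode}
    if mode == "SAMPLE":
        # gNMI carries sample_interval as an integer number of nanoseconds.
        sub["sample_interval"] = sample_interval_ms * NANOS_PER_MS
    return sub

def build_subscribe_request(subscriptions, encoding="JSON_IETF"):
    """Wrap subscriptions in a STREAM-mode, SubscriptionList-like dict."""
    return {
        "subscribe": {
            "mode": "STREAM",
            "encoding": encoding,
            "subscription": list(subscriptions),
        }
    }

request = build_subscribe_request([
    build_subscription("/interfaces/interface/state/oper-status", "ON_CHANGE"),
    build_subscription(
        "/qos/interfaces/interface/output/queues/queue/state",
        "SAMPLE",
        sample_interval_ms=10_000,
    ),
])

print(request["subscribe"]["subscription"][1]["sample_interval"])  # 10000000000
```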

Configuring gNMI On-Change in OcNOS

! OcNOS -- gNMI streaming telemetry configuration
!
telemetry
  !
  ! Subscription 1: on-change for interface state (link events)
  subscription INTERFACE-EVENTS
    protocol gNMI
    encoding JSON_IETF
    sensor-group INTF-STATE
      path /interfaces/interface/state/oper-status
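      ! Note: counters change continuously, so a sampled subscription
      ! usually suits them better than on-change
      path /interfaces/interface/state/counters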
    !
    destination-group COLLECTOR-1
      address 10.100.0.10 port 57400
      protocol gRPC no-tls
    !
    sample-interval 0              ! 0 = on-change mode
    suppress-redundant true        ! Don't push if value unchanged
  !
  ! Subscription 2: sampled for queue depth (10-second interval)
  subscription QUEUE-DEPTH
    protocol gNMI
    encoding JSON_IETF
    sensor-group QOS-QUEUES
      path /qos/interfaces/interface/output/queues/queue/state
    !
    destination-group COLLECTOR-1
    sample-interval 10000          ! 10 seconds in milliseconds
  !
  ! Subscription 3: on-change for BGP session state
  subscription BGP-EVENTS
    protocol gNMI
    encoding JSON_IETF
    sensor-group BGP-STATE
      path /network-instances/network-instance/protocols/protocol/bgp/neighbors/neighbor/state/session-state
    !
    destination-group COLLECTOR-1
    sample-interval 0              ! On-change: fires on session up/down
  !
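
On the collector side, on-change subscriptions like these reduce alerting to a check on each incoming update. This is a minimal sketch, assuming updates have already been decoded into path and value pairs. The Update class and classify function are hypothetical, and the state values follow the OpenConfig enumerations (UP for oper-status, ESTABLISHED for BGP session-state).

```python
from dataclasses import dataclass

@dataclass
class Update:
    path: str          # gNMI path string of the leaf that changed
    value: str         # new value of that leaf
    timestamp_ns: int  # device timestamp in nanoseconds

def classify(update):
    """Return an alert string for state changes that need attention, else None."""
    if update.path.endswith("/state/oper-status") and update.value != "UP":
        return f"interface alert: {update.path} is {update.value}"
    if update.path.endswith("/state/session-state") and update.value != "ESTABLISHED":
        return f"BGP alert: {update.path} is {update.value}"
    return None

updates = [
    Update("/interfaces/interface[name=eth1]/state/oper-status", "DOWN", 1),
    Update("/interfaces/interface[name=eth2]/state/oper-status", "UP", 2),
]
alerts = [a for a in map(classify, updates) if a is not None]
print(alerts)
```

Only the first update produces an alert. The second reports a healthy interface and is ignored.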

OpenConfig vs. Native YANG Models

! OcNOS -- Using OpenConfig data models for vendor-neutral monitoring
!
! OpenConfig paths are standardized across vendors --
! the same collector config works with OcNOS, Arista, Juniper, etc.
!
telemetry
  subscription OC-INTERFACES
    protocol gNMI
    encoding JSON_IETF
    sensor-group OC-INTF
      ! OpenConfig interface model (vendor-neutral)
      path openconfig:/interfaces/interface[name=*]/state
    !
    destination-group COLLECTOR-1
    sample-interval 0
  !
!
! Verify subscriptions are active:
show telemetry subscription
show telemetry sensor-group
!
! Check collector connectivity:
show telemetry destination
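
Because OpenConfig paths share one structure across vendors, a collector can parse them with one routine. The sketch below splits a path string into element names and list keys. It is a simplified parser that assumes key values contain no "/", "[" or "]" characters (so an interface name such as Ethernet1/1 would need fuller handling), and the function name is illustrative.

```python
import re

_ELEM = re.compile(r"^([^\[\]]+)((?:\[[^\[\]=]+=[^\[\]]*\])*)$")
_KEY = re.compile(r"\[([^\[\]=]+)=([^\[\]]*)\]")

def parse_path(path):
    """Split a gNMI path string into a list of (name, keys) elements."""
    # Drop an optional origin prefix such as "openconfig:".
    if ":" in path.split("/", 1)[0]:
        path = path.split(":", 1)[1]
    elems = []
    for part in path.strip("/").split("/"):
        match = _ELEM.match(part)
        if match is None:
            raise ValueError(f"cannot parse path element: {part!r}")
        name, key_text = match.groups()
        elems.append((name, dict(_KEY.findall(key_text))))
    return elems

print(parse_path("openconfig:/interfaces/interface[name=*]/state"))
# [('interfaces', {}), ('interface', {'name': '*'}), ('state', {})]
```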

Migration Path: SNMP to gNMI

Step | Action | Notes
1 | Deploy gNMI collector (Prometheus + gNMI Exporter, InfluxDB, or commercial) | Run in parallel with SNMP initially
2 | Configure OcNOS gNMI subscriptions for key sensors | Start with interface state and BGP events
3 | Validate data completeness against SNMP baselines | 2–4 week parallel run
4 | Expand to queue, PFC, and optical sensors | These are invisible to SNMP anyway
5 | Decommission SNMP polling for covered sensors | Keep SNMP only for legacy devices that lack gNMI
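
The validation in step 3 can be automated during the parallel run. This is a minimal sketch, assuming both systems export per-interface byte deltas for the same time window. The function name, the 1% tolerance, and the sample numbers are illustrative rather than measured data.

```python
def compare_counters(snmp_deltas, gnmi_deltas, tolerance=0.01):
    """Return interfaces whose gNMI and SNMP byte deltas disagree.

    Both arguments map interface name -> bytes counted over the same window.
    """
    mismatches = {}
    for name, snmp_value in snmp_deltas.items():
        gnmi_value = gnmi_deltas.get(name)
        if gnmi_value is None:
            mismatches[name] = "missing from gNMI data"
            continue
        baseline = max(snmp_value, 1)  # avoid dividing by zero on idle links
        drift = abs(gnmi_value - snmp_value) / baseline
        if drift > tolerance:
            mismatches[name] = f"drift {drift:.1%}"
    return mismatches

snmp = {"eth1": 1_000_000, "eth2": 500_000, "eth3": 0}
gnmi = {"eth1": 1_004_000, "eth2": 560_000}
report = compare_counters(snmp, gnmi)
print(report)
```

Interfaces that show up in the report are candidates for a closer look before SNMP polling is switched off for them.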

IP Infusion Engineering Team
