Stop Polling, Start Streaming: Why SNMP is Failing Modern Networks and How gNMI Fixes It

Simple Network Management Protocol (SNMP) was designed in 1988 for networks that carried kilobits of traffic per second. It works by polling — a management system asks a device for its current state, the device responds, and the cycle repeats every 60–300 seconds. For the networks SNMP was designed for, this was adequate.

For a modern AI data center where a single link can move 800 gigabits per second, or a service provider network carrying real-time 5G traffic, SNMP polling is not just inadequate — it actively masks the network behavior that operators need to observe.

The Fundamental Problem with Polling

[Figure: timeline of a 30-second congestion event under SNMP 5-minute polling (missed, because it falls between polls) and under gNMI on-change streaming (detected within milliseconds, alert fired immediately).]
With a 5-minute interval, SNMP polling completely misses a 30-second congestion event that occurs between polls. gNMI on-change streaming detects the event within milliseconds and pushes data to the collector immediately, enabling automated remediation before the event becomes a service-impacting issue.
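The timing argument can be checked with a few lines of Python. This is a minimal sketch, not a model of any real poller: the function names and the event start time (T=120s) are illustrative assumptions.

```python
def polls_that_see_event(poll_interval_s, event_start_s, event_duration_s, horizon_s):
    """Return the poll timestamps that fall inside the event window."""
    event_end_s = event_start_s + event_duration_s
    poll_times = range(0, horizon_s + 1, poll_interval_s)
    return [t for t in poll_times if event_start_s <= t < event_end_s]


def detection_probability(poll_interval_s, event_duration_s):
    """Chance that a randomly timed event overlaps at least one poll."""
    return min(1.0, event_duration_s / poll_interval_s)


# 5-minute polling, 30-second congestion starting at T=120s, 10-minute window.
seen = polls_that_see_event(poll_interval_s=300, event_start_s=120,
                            event_duration_s=30, horizon_s=600)
print(seen)                            # → []
print(detection_probability(300, 30))  # → 0.1
```

Under these assumptions a 30-second event has roughly a one-in-ten chance of coinciding with a 5-minute poll at all, and even then the poll only sees the state at that instant.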

Why SNMP Fails in AI and High-Speed Networks

  • Polling granularity: even aggressive 1-minute SNMP polling misses events that last seconds. Queues can fill and drain in under 100 ms in 800G fabrics.
  • CPU overhead: each SNMP poll request consumes CPU cycles on the network device. At scale, polling hundreds of interfaces creates a predictable CPU spike every cycle.
  • No event semantics: an SNMP poll tells you the state at polling time. It does not tell you what changed, when it changed, or what caused it.
  • Counter wrap: a 32-bit octet counter wraps in roughly 86 ms on a 400G interface running at line rate, far faster than any polling interval. The 64-bit high-capacity counters that most implementations provide avoid the wrap, but any MIB object or tool that still relies on 32-bit counters produces gaps and bogus rates in high-speed monitoring.
  • AI-fabric-specific events: PFC pause frame storms, ECN marking events, and RoCEv2 retransmission bursts are transient and invisible to SNMP polling.
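The counter-wrap figure is plain arithmetic: an octet counter with n bits wraps after 2^n bytes, and a link at R bits per second moves R/8 bytes per second. A short sketch (the function name is illustrative, and sustained line-rate traffic is assumed):

```python
def wrap_time_seconds(counter_bits, link_bps):
    """Seconds for an octet counter to wrap at sustained line rate."""
    max_octets = 2 ** counter_bits
    octets_per_second = link_bps / 8
    return max_octets / octets_per_second


for gbps in (10, 100, 400, 800):
    t32 = wrap_time_seconds(32, gbps * 1e9)
    print(f"{gbps:>4}G: 32-bit counter wraps in {t32:.3f} s")

# A 64-bit counter on the same 400G link lasts on the order of a decade.
years_64 = wrap_time_seconds(64, 400e9) / (365.25 * 86400)
print(f"400G: 64-bit counter wraps in about {years_64:.1f} years")
```

At 400G this gives about 0.086 s for a 32-bit counter, so a poller has no way to tell how many times the counter wrapped between two reads.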

gNMI Streaming Telemetry: How It Works

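gNMI (gRPC Network Management Interface) inverts the monitoring model. Instead of the management system repeatedly asking the device for data, the device streams updates to the collector over a long-lived gRPC session. Two stream subscription modes matter most here (the gNMI specification also defines a target-defined mode, which lets the device choose per path):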

  • On-Change — data is pushed only when the value changes. A link going down triggers an immediate push. Ideal for state changes (interface status, BGP session state, route table changes).
  • Sample — data is pushed at a configured interval. Used for counters and metrics where absolute values at regular intervals are needed (queue depth, byte counters).
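The difference between the two modes can be illustrated with a toy timeline. This is a conceptual sketch only; it does not use gNMI itself, and the function names and the one-sample-per-second timeline are assumptions made for the example.

```python
def on_change_updates(samples):
    """Yield (time, value) only when the value differs from the previous one."""
    previous = object()  # sentinel that never equals a real value
    for t, value in samples:
        if value != previous:
            yield (t, value)
            previous = value


def sampled_updates(samples, interval):
    """Yield (time, value) at every timestamp that is a multiple of interval."""
    for t, value in samples:
        if t % interval == 0:
            yield (t, value)


# Link state observed once per second; the link flaps at T=2 and recovers at T=4.
timeline = [(0, "UP"), (1, "UP"), (2, "DOWN"), (3, "DOWN"), (4, "UP"), (5, "UP")]

print(list(on_change_updates(timeline)))   # → [(0, 'UP'), (2, 'DOWN'), (4, 'UP')]
print(list(sampled_updates(timeline, 5)))  # → [(0, 'UP'), (5, 'UP')]
```

The sampled stream reports UP at both ends and never sees the flap, which is why state leaves are better served by on-change and counters by sampling.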

Configuring gNMI On-Change in OcNOS

! OcNOS -- gNMI streaming telemetry configuration
!
telemetry
  !
  ! Subscription 1: on-change for interface state (link events)
  subscription INTERFACE-EVENTS
    protocol gNMI
    encoding JSON_IETF
    sensor-group INTF-STATE
      path /interfaces/interface/state/oper-status
      path /interfaces/interface/state/counters
    !
    destination-group COLLECTOR-1
      address 10.100.0.10 port 57400
      protocol gRPC no-tls
    !
    sample-interval 0              ! 0 = on-change mode
    suppress-redundant true        ! Don't push if value unchanged
  !
  ! Subscription 2: sampled for queue depth (10-second interval)
  subscription QUEUE-DEPTH
    protocol gNMI
    encoding JSON_IETF
    sensor-group QOS-QUEUES
      path /qos/interfaces/interface/output/queues/queue/state
    !
    destination-group COLLECTOR-1
    sample-interval 10000          ! 10 seconds in milliseconds
  !
  ! Subscription 3: on-change for BGP session state
  subscription BGP-EVENTS
    protocol gNMI
    encoding JSON_IETF
    sensor-group BGP-STATE
      path /network-instances/network-instance/protocols/protocol/bgp/neighbors/neighbor/state/session-state
    !
    destination-group COLLECTOR-1
    sample-interval 0              ! On-change: fires on session up/down
  !

OpenConfig vs. Native YANG Models

! OcNOS -- Using OpenConfig data models for vendor-neutral monitoring
!
! OpenConfig paths are standardized across vendors --
! the same collector config works with OcNOS, Arista, Juniper, etc.
!
telemetry
  subscription OC-INTERFACES
    protocol gNMI
    encoding JSON_IETF
    sensor-group OC-INTF
      ! OpenConfig interface model (vendor-neutral)
      path openconfig:/interfaces/interface[name=*]/state
    !
    destination-group COLLECTOR-1
    sample-interval 0
  !
!
! Verify subscriptions are active:
show telemetry subscription
show telemetry sensor-group
!
! Check collector connectivity:
show telemetry destination
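
On the collector side it often helps to break a path such as the one above into its elements and list keys, for example to label metrics by interface name. The following is a simplified sketch, not the gNMI reference parser: the function names are hypothetical, and escaped brackets inside key values are not handled.

```python
import re

_KEY = re.compile(r"\[([^\[\]=]+)=([^\[\]]*)\]")


def split_elems(path):
    """Split on '/' but keep slashes that appear inside [key=value] brackets."""
    elems, current, depth = [], "", 0
    for ch in path:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "/" and depth == 0:
            if current:
                elems.append(current)
            current = ""
        else:
            current += ch
    if current:
        elems.append(current)
    return elems


def parse_gnmi_path(path):
    """Return (origin, [(element_name, {key: value}), ...])."""
    origin = None
    head = path.split("/", 1)[0]
    if head.endswith(":"):  # e.g. "openconfig:" in front of the first slash
        origin, path = head[:-1], path[len(head):]
    elems = [(e.split("[", 1)[0], dict(_KEY.findall(e))) for e in split_elems(path)]
    return origin, elems


print(parse_gnmi_path("openconfig:/interfaces/interface[name=*]/state"))
# → ('openconfig', [('interfaces', {}), ('interface', {'name': '*'}), ('state', {})])
```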

Migration Path: SNMP to gNMI

| Step | Action | Notes |
|------|--------|-------|
| 1 | Deploy gNMI collector (Prometheus + gNMI Exporter, InfluxDB, or commercial) | Run in parallel with SNMP initially |
| 2 | Configure OcNOS gNMI subscriptions for key sensors | Start with interface state and BGP events |
| 3 | Validate data completeness against SNMP baselines | 2–4 week parallel run |
| 4 | Expand to queue, PFC, and optical sensors | These are invisible to SNMP anyway |
| 5 | Decommission SNMP polling for covered sensors | Keep SNMP only for legacy devices that lack gNMI |
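
The validation in step 3 can be as simple as comparing per-interface counter deltas collected by both systems over the same window. A minimal sketch follows; the function name, the 2% tolerance, and the sample numbers are illustrative assumptions, not measured data.

```python
def compare_counters(snmp_deltas, gnmi_deltas, tolerance=0.02):
    """Classify each interface as 'ok', 'mismatch', or 'missing'.

    Both arguments map interface name -> byte-counter delta over the same window.
    """
    report = {}
    for ifname in sorted(set(snmp_deltas) | set(gnmi_deltas)):
        s, g = snmp_deltas.get(ifname), gnmi_deltas.get(ifname)
        if s is None or g is None:
            report[ifname] = "missing"   # present in only one data source
        elif abs(s - g) <= tolerance * max(s, g, 1):
            report[ifname] = "ok"
        else:
            report[ifname] = "mismatch"
    return report


snmp = {"eth1": 1000, "eth2": 5000, "eth3": 70}
gnmi = {"eth1": 1010, "eth2": 4000}
print(compare_counters(snmp, gnmi))
# → {'eth1': 'ok', 'eth2': 'mismatch', 'eth3': 'missing'}
```

Small differences are expected because the two systems never read the counters at exactly the same instant, which is why the comparison uses a relative tolerance rather than equality.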

IP Infusion Engineering Team

Simplifying OcNOS Configuration Backups with Ansible

Configuration backups are one of those operational tasks that every network team agrees is critical, yet few teams have a clean, automated solution for it. OcNOS does not have built-in scheduled backup functionality, but it integrates cleanly with Ansible, which makes it straightforward to build a robust backup workflow using tools most network teams already have.

This guide covers three approaches from simple to full production-grade:

  1. Basic SSH backup using the Ansible cli_command module
  2. NETCONF-based structured config retrieval
  3. Git-backed version history with drift detection

Approach 1: SSH Backup with Ansible

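# backup-ocnos.yaml -- Simple Ansible playbook for OcNOS config backup
# Run via cron: ansible-playbook backup-ocnos.yaml
# Or schedule with Ansible AWX / AAP
# cli_command needs a network_cli connection; this playbook assumes the
# inventory sets ansible_connection and ansible_network_os for ocnos_nodes.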

- name: Backup OcNOS configurations
  hosts: ocnos_nodes
  gather_facts: false

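  vars:
    backup_dir: "/opt/network-backups/ocnos"

  tasks:
    # Evaluate the timestamp once per host. A lookup placed under `vars` is
    # re-evaluated on every use, so the backup file name and the symlink
    # target could end up with different seconds.
    - name: Capture a single timestamp for this run
      ansible.builtin.set_fact:
        timestamp: "{{ lookup('pipe', 'date +%Y%m%d-%H%M%S') }}"

    - name: Create backup directory per host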
      file:
        path: "{{ backup_dir }}/{{ inventory_hostname }}"
        state: directory
      delegate_to: localhost

    - name: Fetch running configuration
      ansible.netcommon.cli_command:
        command: show running-config
      register: running_config

    - name: Save configuration to file
      copy:
        content: "{{ running_config.stdout }}"
        dest: "{{ backup_dir }}/{{ inventory_hostname }}/running-config-{{ timestamp }}.txt"
      delegate_to: localhost

    - name: Save latest symlink
      file:
        src: "{{ backup_dir }}/{{ inventory_hostname }}/running-config-{{ timestamp }}.txt"
        dest: "{{ backup_dir }}/{{ inventory_hostname }}/running-config-latest.txt"
        state: link
      delegate_to: localhost

Approach 2: NETCONF Structured Backup

# netconf-backup.yaml -- Retrieve config via NETCONF for structured storage
# Produces XML files that can be diff'd and fed back into NETCONF

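- name: NETCONF configuration backup
  hosts: ocnos_nodes
  connection: netconf
  gather_facts: false

  vars:
    backup_dir: "/opt/network-backups/ocnos"
    # Referenced only once below, so a lazily evaluated lookup is safe here.
    timestamp: "{{ lookup('pipe', 'date +%Y%m%d-%H%M%S') }}"

  tasks:
    - name: Ensure backup directory exists
      ansible.builtin.file:
        path: "{{ backup_dir }}/{{ inventory_hostname }}"
        state: directory
      delegate_to: localhost

    # The filter limits retrieval to the listed models; omit `filter`
    # to fetch the entire running datastore.
    - name: Get running configuration subtrees via NETCONF
      ansible.netcommon.netconf_get:
        source: running
        filter: |
          <filter type="subtree">
            <interfaces xmlns="http://openconfig.net/yang/interfaces"/>
            <routing xmlns="urn:ietf:params:xml:ns:yang:ietf-routing"/>
            <network-instances xmlns="http://openconfig.net/yang/network-instance"/>
          </filter>
      register: netconf_config

    # `stdout` holds the raw XML reply; `output` is only populated
    # when a `display` format is requested.
    - name: Save NETCONF XML config
      ansible.builtin.copy:
        content: "{{ netconf_config.stdout }}"
        dest: "{{ backup_dir }}/{{ inventory_hostname }}/netconf-{{ timestamp }}.xml"
      delegate_to: localhost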
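
XML backups only diff cleanly if whitespace differences are normalized first. One possible approach, sketched with the Python standard library (ET.indent requires Python 3.9 or later; the function names and the sample documents are illustrative, not OcNOS output):

```python
import difflib
import xml.etree.ElementTree as ET


def normalized_lines(xml_text):
    """Parse XML and re-serialize it with uniform two-space indentation."""
    root = ET.fromstring(xml_text)
    ET.indent(root, space="  ")
    return ET.tostring(root, encoding="unicode").splitlines()


def xml_diff(old_xml, new_xml):
    """Unified diff of two XML documents, ignoring formatting-only changes."""
    return list(difflib.unified_diff(normalized_lines(old_xml),
                                     normalized_lines(new_xml),
                                     "old", "new", lineterm=""))


old = "<config><interface><name>eth1</name><mtu>1500</mtu></interface></config>"
reformatted = "<config>\n <interface>\n<name>eth1</name>\n   <mtu>1500</mtu>\n </interface>\n</config>"
changed = "<config><interface><name>eth1</name><mtu>9000</mtu></interface></config>"

print(xml_diff(old, reformatted))       # → []
print("\n".join(xml_diff(old, changed)))
```

This normalizes indentation only. Element order is preserved, so a device that returns siblings in a different order between runs would still show up as a change.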

Approach 3: Git-Backed Version Control with Drift Detection

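# git-backup.yaml -- Full production backup with Git version history
# Detects configuration changes and alerts if drift is detected
# Assumes git_repo already points to an initialized Git repository.

- name: Git-backed OcNOS configuration backup
  hosts: ocnos_nodes
  gather_facts: false

  vars:
    git_repo: "/opt/network-configs"

  tasks:
    # Evaluated once and stored, so the commit message uses a stable value.
    - name: Capture a single timestamp for this run
      ansible.builtin.set_fact:
        timestamp: "{{ lookup('pipe', 'date +%Y%m%d-%H%M%S') }}"

    - name: Fetch running configuration
      ansible.netcommon.cli_command:
        command: show running-config
      register: running_config

    - name: Write config to Git working directory
      copy:
        content: "{{ running_config.stdout }}"
        dest: "{{ git_repo }}/{{ inventory_hostname }}.cfg"
      delegate_to: localhost

    # `git status --porcelain` also reports a host's first (untracked) backup,
    # which `git diff` would miss. The check is scoped to this host's file so
    # that a change on another host does not trigger a commit here.
    - name: Check for config changes (Git status)
      command: git -C {{ git_repo }} status --porcelain -- {{ inventory_hostname }}.cfg
      register: git_diff
      delegate_to: localhost
      changed_when: git_diff.stdout != ""

    # throttle: 1 serializes commits so hosts processed in parallel
    # do not contend for the Git index lock.
    - name: Commit changes if config drifted
      shell: |
        cd {{ git_repo }}
        git add {{ inventory_hostname }}.cfg
        git commit -m "Config change detected on {{ inventory_hostname }} at {{ timestamp }}" -- {{ inventory_hostname }}.cfg
      delegate_to: localhost
      throttle: 1
      when: git_diff.stdout != ""

    - name: Alert on configuration drift
      debug:
        msg: "ALERT: Configuration change detected on {{ inventory_hostname }}"
      when: git_diff.stdout != ""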
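
Text configs often carry volatile lines (comment separators, generation timestamps) that would register as drift on every run. A hedged sketch of a pre-filter that could run before the commit step is shown below. It assumes lines beginning with "!" are comments that can be ignored, which should be verified against real `show running-config` output; the function names and sample configs are illustrative.

```python
import difflib


def significant_lines(config_text, ignore_prefixes=("!",)):
    """Drop blank lines and lines starting with an ignored prefix."""
    return [ln.rstrip() for ln in config_text.splitlines()
            if ln.strip() and not ln.lstrip().startswith(ignore_prefixes)]


def config_drift(previous, current):
    """Return only the added/removed lines between two config snapshots."""
    diff = difflib.unified_diff(significant_lines(previous),
                                significant_lines(current),
                                lineterm="", n=0)
    return [ln for ln in diff
            if ln.startswith(("+", "-")) and not ln.startswith(("+++", "---"))]


before = "! Generated 2024-01-01\nhostname leaf1\ninterface eth1\n mtu 1500\n"
after = "! Generated 2024-01-02\nhostname leaf1\ninterface eth1\n mtu 9000\n"

print(config_drift(before, after))  # → ['- mtu 1500', '+ mtu 9000']
```

The changed timestamp comment is ignored, so only the MTU change is reported.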

Scheduling Backups with Cron

# crontab -e -- Schedule daily backups at 2 AM
# (a crontab entry must stay on one line; cron has no line continuation)
0 2 * * * /usr/bin/ansible-playbook /opt/ansible/git-backup.yaml -i /opt/ansible/inventory.yaml >> /var/log/ocnos-backup.log 2>&1

# Weekly retention cleanup (delete timestamped text backups older than 90 days):
0 3 * * 0 find /opt/network-backups/ocnos -type f -name "*.txt" -mtime +90 -delete
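
Before letting a cleanup job delete anything, it can be useful to list what it would remove. The sketch below is a rough Python counterpart to the find command that only reports candidates. The function name is hypothetical, the file-name pattern follows the playbook in Approach 1, and the demo builds its own temporary directory rather than touching real backups.

```python
import os
import tempfile
import time
from pathlib import Path


def stale_backups(backup_dir, max_age_days, now=None):
    """Return timestamped backup files older than max_age_days (symlinks skipped)."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    return sorted(
        p for p in Path(backup_dir).rglob("running-config-*.txt")
        if p.is_file() and not p.is_symlink() and p.stat().st_mtime < cutoff
    )


# Demo on a throwaway directory: one file aged 100 days, one fresh file.
with tempfile.TemporaryDirectory() as tmp:
    host_dir = Path(tmp) / "leaf1"
    host_dir.mkdir()
    old_file = host_dir / "running-config-20240101-020000.txt"
    new_file = host_dir / "running-config-20240601-020000.txt"
    for f in (old_file, new_file):
        f.write_text("hostname leaf1\n")
    hundred_days_ago = time.time() - 100 * 86400
    os.utime(old_file, (hundred_days_ago, hundred_days_ago))
    stale_names = [p.name for p in stale_backups(tmp, max_age_days=90)]

print(stale_names)  # → ['running-config-20240101-020000.txt']
```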

IP Infusion Engineering Team

Automating Network Deployments with OcNOS: Ansible, NETCONF, and gNMI

As networks grow in size and complexity, manual configuration becomes the primary bottleneck for both new deployments and day-2 operations. A service provider managing hundreds of cell site routers cannot afford to configure each one individually. A data center operator managing dozens of leaf switches needs a repeatable, auditable way to push VXLAN fabric configurations. OcNOS supports multiple automation interfaces that integrate with the tools network teams already use.

OcNOS Automation Interfaces

| Interface | Protocol | Best For | Data Model |
|-----------|----------|----------|------------|
| NETCONF | SSH/XML | Ansible, Terraform, custom scripts | Native YANG + OpenConfig |
| gNMI | gRPC | Telemetry, config push, streaming | OpenConfig + native |
| CLI via SSH | SSH | Ansible raw module, Expect scripts | CLI text |
| REST API | HTTPS | IP Maestro integration, custom apps | JSON |

Ansible + NETCONF: Configuration Deployment

# Ansible playbook: deploy IS-IS SR config to multiple OcNOS nodes
# inventory.yaml defines hosts with NETCONF connection

- name: Deploy IS-IS SR configuration
  hosts: ocnos_sp_nodes
  connection: netconf
  gather_facts: false

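  vars:
    # Defined for the full IS-IS SR deployment; the excerpt below does not
    # reference them.
    isis_net_prefix: "49.0001"
    srgb_start: 16000
    srgb_end: 23999

  tasks:
    # ansible_host_loopback is expected to be defined per host in the inventory.
    - name: Configure IS-IS SR on each node
      ansible.netcommon.netconf_config: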
        content: |
          <config>
            <routing xmlns="urn:ietf:params:xml:ns:yang:ietf-routing">
              <control-plane-protocols>
                <control-plane-protocol>
                  <type>isis</type>
                  <name>CORE</name>
                  <isis xmlns="urn:ietf:params:xml:ns:yang:ietf-isis">
                    <interfaces>
                      <interface>
                        <name>{{ ansible_host_loopback }}</name>
                        <passive>true</passive>
                      </interface>
                    </interfaces>
                  </isis>
                </control-plane-protocol>
              </control-plane-protocols>
            </routing>
          </config>

    - name: Verify IS-IS neighbors via NETCONF get
      netconf_get:
        filter: |
          <routing-state xmlns="urn:ietf:params:xml:ns:yang:ietf-routing">
            <routing-instance>
              <routing-protocols>
                <routing-protocol>
                  <isis xmlns="urn:ietf:params:xml:ns:yang:ietf-isis">
                    <adjacencies/>
                  </isis>
                </routing-protocol>
              </routing-protocols>
            </routing-instance>
          </routing-state>
      register: isis_state

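    # Substring check: passes if at least one adjacency reports state "up".
    # It does not verify every neighbor; parse the XML for a per-neighbor check.
    # `stdout` holds the raw XML reply from netconf_get.
    - name: Assert at least one adjacency is UP
      assert:
        that: "'>up<' in (isis_state.stdout | lower)"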
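
A substring test cannot tell whether every neighbor is up. A per-adjacency check needs the XML parsed, for example in a small filter or script like the sketch below. The element names (adjacency, neighbor-sysid, state) and the sample reply are assumptions modeled on the ietf-isis operational tree; the actual OcNOS reply should be inspected before relying on them.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def adjacency_states(xml_text):
    """Map neighbor system ID -> lower-cased adjacency state."""
    states = {}
    for element in ET.fromstring(xml_text).iter():
        if local_name(element.tag) != "adjacency":
            continue
        fields = {local_name(c.tag): (c.text or "").strip() for c in element}
        states[fields.get("neighbor-sysid", "?")] = fields.get("state", "").lower()
    return states


def all_up(states):
    """True only if there is at least one adjacency and all of them are up."""
    return bool(states) and all(s == "up" for s in states.values())


SAMPLE_REPLY = """
<isis xmlns="urn:ietf:params:xml:ns:yang:ietf-isis">
  <adjacencies>
    <adjacency><neighbor-sysid>0000.0000.0002</neighbor-sysid><state>up</state></adjacency>
    <adjacency><neighbor-sysid>0000.0000.0003</neighbor-sysid><state>init</state></adjacency>
  </adjacencies>
</isis>
"""

states = adjacency_states(SAMPLE_REPLY)
print(states)          # → {'0000.0000.0002': 'up', '0000.0000.0003': 'init'}
print(all_up(states))  # → False
```

Here the substring test would pass because one neighbor is up, while the per-neighbor check correctly fails.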

gNMI: Real-Time Config Push and State Retrieval

# gNMI CLI (gnmic tool) -- push configuration to OcNOS
#
# Install gnmic:
bash -c "$(curl -sL https://get-gnmic.openconfig.net)"

# Push interface description via gNMI Set:
gnmic -a 10.0.0.1:57400 -u admin -p admin --insecure set \
  --update-path '/interfaces/interface[name=eth-0-1]/config/description' \
  --update-value '"Uplink to Spine-1"'

# Get current BGP neighbor state:
gnmic -a 10.0.0.1:57400 -u admin -p admin --insecure get \
  --path '/network-instances/network-instance[name=default]/protocols/protocol/bgp/neighbors/neighbor[neighbor-address=10.0.0.2]/state'

# Subscribe to interface state changes (on-change).
# In gnmic, --mode selects stream/once/poll; --stream-mode selects on-change or sample:
gnmic -a 10.0.0.1:57400 -u admin -p admin --insecure subscribe \
  --path '/interfaces/interface/state/oper-status' \
  --mode stream --stream-mode on-change
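
When gnmic is driven from scripts, building the argument list programmatically avoids shell-quoting mistakes with bracketed paths. The helper below is a hypothetical wrapper, not part of gnmic. It only assembles the flags used in the commands above and deliberately leaves credentials out, on the assumption that they are supplied separately (for example through gnmic's configuration file).

```python
import shlex

VALID_STREAM_MODES = ("on-change", "sample", "target-defined")


def gnmic_subscribe_argv(address, paths, stream_mode="on-change", insecure=False):
    """Build an argv list for a streaming 'gnmic subscribe' call."""
    if stream_mode not in VALID_STREAM_MODES:
        raise ValueError(f"unsupported stream mode: {stream_mode}")
    argv = ["gnmic", "-a", address]
    if insecure:
        argv.append("--insecure")
    argv += ["subscribe", "--mode", "stream", "--stream-mode", stream_mode]
    for path in paths:
        argv += ["--path", path]
    return argv


argv = gnmic_subscribe_argv(
    "10.0.0.1:57400",
    ["/interfaces/interface/state/oper-status",
     "/interfaces/interface[name=eth-0-1]/state/counters"],
    insecure=True,
)
# shlex.join quotes the bracketed path so it can be pasted into a shell safely.
print(shlex.join(argv))
```

The list form can be passed directly to subprocess.run without a shell, which sidesteps quoting entirely.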

Zero-Touch Provisioning Pattern

! OcNOS -- ZTP hook: execute script on first boot
!
! OcNOS supports a ZTP boot script that runs when no startup config exists.
! The script can pull configuration from a DHCP/TFTP/HTTP server:
!
! 1. DHCP option 67 points to ZTP script URL
! 2. OcNOS downloads and executes the script on first boot
! 3. Script fetches device-specific config based on MAC/serial number
! 4. Config is applied via CLI or NETCONF
!
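! Illustrative ZTP script sketch (Python-style pseudocode, not runnable as-is:
! 'show' and 'copy' are CLI commands rather than executables, so a real script
! must invoke them through the OcNOS CLI and parse the serial number from the
! 'show version' output):
! import subprocess, urllib.request
! version_output = subprocess.check_output(['show', 'version'], text=True)
! serial = parse_serial(version_output)   # hypothetical helper
! config_url = f"http://ztp-server/configs/{serial}.cfg"
! urllib.request.urlretrieve(config_url, '/tmp/startup.cfg')
! subprocess.run(['copy', '/tmp/startup.cfg', 'running-config'])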
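
Step 3 of the pattern (deriving a device-specific config URL from the serial number) can be sketched independently of how the CLI output is obtained. Everything specific here is an assumption: the "Serial Number : ..." line format, the sample text, and the function names are hypothetical, and the server name is taken from the pseudocode above.

```python
import re
from urllib.parse import quote


def extract_serial(show_version_text):
    """Pull the serial number out of CLI output (line format is assumed)."""
    match = re.search(r"^\s*Serial\s+Number\s*:\s*(\S+)",
                      show_version_text, re.IGNORECASE | re.MULTILINE)
    if match is None:
        raise ValueError("no serial number found in output")
    return match.group(1)


def config_url(base_url, serial):
    """Build the per-device config URL, percent-encoding the serial."""
    return f"{base_url.rstrip('/')}/configs/{quote(serial, safe='')}.cfg"


# Hypothetical output; the real 'show version' layout may differ.
SAMPLE_OUTPUT = """Software Product: OcNOS
 Hardware Model: example-switch
 Serial Number : ABC123456
"""

serial = extract_serial(SAMPLE_OUTPUT)
print(config_url("http://ztp-server", serial))
# → http://ztp-server/configs/ABC123456.cfg
```

Failing loudly when no serial is found matters in ZTP: falling back to a guessed file name could apply another device's configuration.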

IP Infusion Engineering Team