ISIS Micro-loop Avoidance in OcNOS 6.3

In IP networking, Layer 3 (L3) is preferred over Layer 2 (L2) because it prevents persistent traffic looping problems and finds the best path to reach the destination. At L3, Interior Gateway Protocols (IGPs) run their own algorithms to compute the shortest path between source and destination and ensure loop-free reachability.

Even so, short-lived micro-loops can occur in L3 networks too. These micro-loops can consume the entire bandwidth of a link and cause traffic loss. This article explains how micro-loops arise and how to prevent them.

What is a Micro-loop?

A micro-loop is a short-lived packet forwarding loop that may occur between two or more routers when these routers do not update their Forwarding Information Base (FIB) for a given prefix at the same time.

Consider the topology below:

In the above topology, packets from R1 to R5 travel via R2 and R3 because that is the shortest path.

If the link between R3 and R4 goes down, R3 runs SPF first and R2 runs it afterwards. In the interval between the two, a loop can form between R2 and R3 until R2 installs the updated routing information in its FIB. The same scenario can then occur between R2 and R1: once R2 has installed the result in its FIB but R1 has not yet updated its own, a loop can form between them. At t3, traffic is forwarded on the correct path. Until t3, traffic loss is possible.
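The sequence described above can be replayed with a small simulation. This is only a sketch under stated assumptions: the chain R1-R2-R3-R4-R5 follows the article, while router R6 (giving R1 an alternate path to R5) and the next-hop tables OLD and NEW are hypothetical, chosen to make the example self-contained.

```python
# Hypothetical topology: R1-R2-R3-R4-R5 is the primary path; R6 is an
# assumed alternate router that gives R1 a second path to R5.

def trace(fib, src, dst):
    """Follow next hops from src toward dst; report delivery or a loop."""
    path = [src]
    node = src
    while node != dst:
        node = fib[node]
        if node in path:
            # The packet came back to a router it already visited.
            return path + [node], "loop"
        path.append(node)
    return path, "delivered"

# Next hop toward R5 before the R3-R4 failure (OLD) and after each
# router has converged on the new topology (NEW). Assumed values.
OLD = {"R1": "R2", "R2": "R3", "R3": "R4", "R4": "R5", "R6": "R5"}
NEW = {"R1": "R6", "R2": "R1", "R3": "R2", "R4": "R5", "R6": "R5"}

def fib_after(updated):
    """Network-wide view when only the routers in `updated` have new FIBs."""
    return {r: (NEW[r] if r in updated else OLD[r]) for r in OLD}

# R3 converges first, then R2, then R1, as in the text.
for updated in (["R3"], ["R3", "R2"], ["R3", "R2", "R1"]):
    path, status = trace(fib_after(updated), "R1", "R5")
    print(updated, " -> ".join(path), status)
```

In this model the first two steps report a loop (R2/R3, then R1/R2), and only the last step, with all three routers updated, delivers the packet.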

Micro-loops appear because the nodes in a network converge asynchronously when an event occurs. An event could be as simple as a link going down, or it could be a more complex metric change.

Several factors can increase the probability of a micro-loop:

  • Delay of failure notification
  • SPF delay
  • SPF computation time
  • RIB and FIB prefix insertion speed and ordering

Solution

Micro-loops can be avoided by using an ordered FIB (oFIB) approach: each node calculates its rank and delays the installation of routes in its FIB accordingly. The OcNOS ISIS implementation of oFIB complies with RFC 6976.

In the previous diagram, the traffic loop would have been avoided if the prefix had been installed (after SPF calculation) first in R1, then in R2, and finally in R3. Until its turn in this sequence comes, a router keeps forwarding packets along the stale path. This is acceptable provided a backup path (e.g., FRR) is configured; in the diagram, for example, R3 should have a backup path to reach R4/R5.
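To see why the installation order matters, the following sketch checks every intermediate state of a given order for loops. It rests on the same assumptions as the scenario in the text plus two hypothetical ones: an alternate router R6 between R1 and R5, and a stale entry over the failed R3-R4 link being covered by a backup path (reported here as "backup").

```python
from itertools import permutations

# Assumed next hops toward R5 before (OLD) and after (NEW) convergence.
OLD = {"R1": "R2", "R2": "R3", "R3": "R4", "R4": "R5", "R6": "R5"}
NEW = {"R1": "R6", "R2": "R1", "R3": "R2", "R4": "R5", "R6": "R5"}
FAILED_LINK = ("R3", "R4")

def trace(fib, src, dst):
    """Return 'delivered', 'loop', or 'backup' for a packet from src to dst."""
    path = [src]
    node = src
    while node != dst:
        nxt = fib[node]
        if (node, nxt) == FAILED_LINK:
            # Stale entry over the failed link; assumed protected by FRR.
            return "backup"
        if nxt in path:
            return "loop"
        path.append(nxt)
        node = nxt
    return "delivered"

def loop_free(order):
    """True if no intermediate state loops when FIBs update in `order`."""
    updated = set()
    for router in order:
        updated.add(router)
        fib = {r: (NEW[r] if r in updated else OLD[r]) for r in OLD}
        if any(trace(fib, src, "R5") == "loop" for src in ("R1", "R2", "R3")):
            return False
    return True

print(loop_free(["R1", "R2", "R3"]))  # farthest from the failure first
print(loop_free(["R3", "R2", "R1"]))  # nearest to the failure first

safe_orders = [o for o in permutations(["R1", "R2", "R3"]) if loop_free(o)]
print(safe_orders)
```

In this assumed topology, only the order R1, R2, R3 passes the check, which matches the sequence given in the text.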

Ordered FIB ensures that a prefix is installed in sequence across routers. This sequencing is achieved by delaying the prefix installation on each router. The delay is calculated as follows.

In the above formula, the rank of a node varies with the event and the topology. The next sections describe the rank calculation for the different event types.
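As a rough illustration of how a rank turns into a per-router delay, the sketch below assumes a delay of the form hold-timer + rank × max-fib, modeled on the timer description in RFC 6976. The names hold-timer and max-fib come from the notes in this article; the exact arithmetic, the function name, and the millisecond values are assumptions, not OcNOS defaults.

```python
def fib_install_delay_ms(rank, hold_timer_ms, max_fib_ms):
    """Assumed delay before a router of the given rank installs the prefix.

    hold_timer_ms: wait after the event so that all routers have learned it.
    max_fib_ms:    upper bound on the time one router needs to update its FIB.
    """
    if rank < 0:
        raise ValueError("rank must be non-negative")
    return hold_timer_ms + rank * max_fib_ms

# Illustrative ranks for a failure scenario (lower rank installs first).
for router, rank in (("R1", 0), ("R2", 1), ("R3", 2)):
    delay = fib_install_delay_ms(rank, hold_timer_ms=1000, max_fib_ms=500)
    print(router, "installs after", delay, "ms")
```

Because every router derives its delay from the same two parameters, each rank gets its own installation slot only if those parameters match across the network.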

Link Down, Router Down, or Metric Increase

In case of a failure event, routers near the failure should install last (higher rank) and routers far from the failure should install first (lower rank). Here, the rank of a node is defined as the depth (in number of hops) of the branch below that node in the Reverse Shortest Path Tree (rSPT) computed before the failure (rSPT_old) and rooted at the failed node.

For the given topology failure scenario, rSPT_old with R4 as root is shown below, along with the calculated ranks (depth in hops).

This calculation ensures that nodes closer to the failure install the prefix later, which avoids traffic loops.
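The rank calculation for failure events can be sketched as a Dijkstra run from the root followed by a depth computation. The link metrics and router R6 are hypothetical, and the sketch assumes symmetric metrics so that a shortest path tree computed from the root equals the reverse tree toward it.

```python
import heapq

# Hypothetical pre-failure topology with metrics; R6 is an assumed router.
LINKS = {("R1", "R2"): 1, ("R2", "R3"): 1, ("R3", "R4"): 1,
         ("R4", "R5"): 1, ("R1", "R6"): 10, ("R6", "R5"): 1}

def adjacency(links):
    """Build an undirected adjacency map from the link table."""
    adj = {}
    for (a, b), metric in links.items():
        adj.setdefault(a, {})[b] = metric
        adj.setdefault(b, {})[a] = metric
    return adj

def rspt_parents(links, root):
    """Dijkstra from root; returns node -> parent (next hop toward root)."""
    adj = adjacency(links)
    dist = {root: 0}
    parent = {}
    heap = [(0, root)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # outdated heap entry
        for nbr, metric in adj[node].items():
            nd = d + metric
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                parent[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    return parent

def down_ranks(links, root):
    """Rank = depth (in hops) of the branch below each node in the rSPT."""
    parent = rspt_parents(links, root)
    children = {}
    for node, par in parent.items():
        children.setdefault(par, []).append(node)

    def depth(node):
        return max((1 + depth(c) for c in children.get(node, [])), default=0)

    return {node: depth(node) for node in parent}

ranks = down_ranks(LINKS, root="R4")
for router in ("R1", "R2", "R3"):
    print(router, "rank", ranks[router])
```

With these assumed metrics, R1 gets rank 0, R2 rank 1, and R3 rank 2, so R3, the router nearest the failure, installs last. The sketch also assigns depths to R5 and R6 and does not filter out routers that are unaffected by the failure.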

Link Up, Router Up, or Metric Decrease

In the recovery case, rank is calculated from the regular Shortest Path Tree (SPT), with the node itself as root, as the distance (in hops) to the node that generated the event.

For the given topology, router R1 builds the SPT as follows; the rank (in hops) calculation is also shown.

This calculation ensures that nodes closer to the up event install the prefix first, which avoids traffic loops in the up cases.
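For up events, the rank can be sketched as the hop count along the router's own SPT to the event node. The metrics, router R6, and the choice of R4 as the node that generated the event are assumptions made for illustration.

```python
import heapq

# Hypothetical post-recovery topology with metrics; R6 is an assumed router.
LINKS = {("R1", "R2"): 1, ("R2", "R3"): 1, ("R3", "R4"): 1,
         ("R4", "R5"): 1, ("R1", "R6"): 10, ("R6", "R5"): 1}

def up_rank(links, self_node, event_node):
    """Hops from self_node to event_node along self_node's shortest path."""
    adj = {}
    for (a, b), metric in links.items():
        adj.setdefault(a, {})[b] = metric
        adj.setdefault(b, {})[a] = metric

    inf = float("inf")
    best = {self_node: (0, 0)}  # node -> (path metric, hop count)
    heap = [(0, 0, self_node)]
    while heap:
        metric, hops, node = heapq.heappop(heap)
        if (metric, hops) > best[node]:
            continue  # outdated heap entry
        if node == event_node:
            return hops
        for nbr, cost in adj[node].items():
            cand = (metric + cost, hops + 1)
            if cand < best.get(nbr, (inf, inf)):
                best[nbr] = cand
                heapq.heappush(heap, (cand[0], cand[1], nbr))
    return None  # event node unreachable

for router in ("R3", "R2", "R1"):
    print(router, "rank", up_rank(LINKS, router, "R4"))
```

Here R3 gets rank 1, R2 rank 2, and R1 rank 3, so the router nearest the recovered element installs first, the opposite of the failure case.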

 

More details on configuration and verification can be found in the OcNOS Configuration Guides, available at https://www.ipinfusion.com/documentation/configuration-guides/

Notes:

  1. This feature delays prefix installation, so convergence takes longer. It should therefore be considered for networks with FRR configured.
  2. Ensure that the same hold-timer and max-fib values are configured on all nodes in a given topology.

This feature is available in OcNOS from release 6.3.0 for IPv4 routes.

Prasanna Kumara S is a Technical Marketing Engineer for IP Infusion.