ZebOS High Availability Suite Overview
As communications networks move to serve mission-critical applications for Next Generation Networks (NGN), the network devices serving these networks must achieve 99.999% (five nines) or 99.9999% (six nines) availability. These devices need to handle operational, application, system, and component failures reliably. More importantly, they must meet the strict Service Level Agreements (SLAs) of their customers. To satisfy these requirements, the routing/switching software must be able to protect both the system and the network against such failures. IP Infusion’s ZebOS Routing and Switching platform provides the modularity required to support these applications in a high-availability environment.
IP Infusion’s High Availability (HA) solution provides the control plane redundancy necessary to meet this ever-increasing need for higher network service availability.
Key Features
The ZebOS High Availability solution:
- Supports stateful switchover (SSO) for different HA configurations, such as 1+1, 1:1, m:n, or simplex.
- Supports In-Service Software Upgrade (ISSU).
- Uses a Checkpoint Abstraction Layer (CAL) in its HA framework to work with different HA middleware.
- Supports SSO for ZebOS STP and LACP through state replication.
- Supports SSO through replication of the Routing Information Base (RIB), enabling Non-Stop Routing (NSR).
- Scales to support large systems.
High Availability Architecture
IP Infusion’s ZebOS HA Framework provides an abstraction layer, the Checkpoint Abstraction Layer (CAL), to interface with any HA middleware. CAL defines the services required for stateful replication independently of the system configuration. The framework can be extended to external modules and protocol modules. ZebOS currently supports stateful replication of LACP, STP, and the Routing Information Base (RIB) to support Non-Stop Routing (NSR). The framework also provides hooks for integrating with a Fault Manager (FM) to support both graceful switchovers (for example, during ISSU) and ungraceful switchovers (for example, after a software fault).
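The idea of an abstraction layer between protocols and HA middleware can be illustrated with a small sketch. This is not the ZebOS API: the classes `CheckpointMiddleware`, `InMemoryMiddleware`, and `CheckpointAbstractionLayer`, and the methods `register`, `checkpoint`, and `restore`, are hypothetical names chosen for illustration. The point it shows is that protocols talk only to the abstraction layer, so a different middleware backend can be plugged in without changing protocol code.

```python
from abc import ABC, abstractmethod


class CheckpointMiddleware(ABC):
    """Hypothetical backend interface that a concrete HA middleware would implement."""

    @abstractmethod
    def write(self, section: str, key: str, value: bytes) -> None:
        """Replicate one record of a checkpoint section to the standby."""

    @abstractmethod
    def read_all(self, section: str) -> dict:
        """Return every record replicated so far for a section."""


class InMemoryMiddleware(CheckpointMiddleware):
    """Toy backend standing in for real middleware; keeps replicas in a dict."""

    def __init__(self) -> None:
        self._store: dict = {}

    def write(self, section: str, key: str, value: bytes) -> None:
        self._store.setdefault(section, {})[key] = value

    def read_all(self, section: str) -> dict:
        return dict(self._store.get(section, {}))


class CheckpointAbstractionLayer:
    """Protocol-facing API (hypothetical); protocols never call the middleware directly."""

    def __init__(self, middleware: CheckpointMiddleware) -> None:
        self._mw = middleware
        self._sections: set = set()

    def register(self, protocol: str, data_type: str) -> str:
        # A section identifies one kind of data that one protocol replicates.
        section = f"{protocol}/{data_type}"
        self._sections.add(section)
        return section

    def checkpoint(self, section: str, key: str, value: bytes) -> None:
        if section not in self._sections:
            raise KeyError(f"section {section!r} was not registered")
        self._mw.write(section, key, value)

    def restore(self, section: str) -> dict:
        # What a standby protocol would read back after a switchover.
        return self._mw.read_all(section)


cal = CheckpointAbstractionLayer(InMemoryMiddleware())
lacp = cal.register("lacp", "port-state")
cal.checkpoint(lacp, "eth0", b"collecting,distributing")
print(cal.restore(lacp))  # {'eth0': b'collecting,distributing'}
```

Swapping `InMemoryMiddleware` for another `CheckpointMiddleware` subclass leaves the protocol-side calls unchanged, which is the property the abstraction layer is meant to provide.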

Figure: ZebOS High Availability Architecture
Checkpointing in ZebOS
Checkpointing gives control plane protocols the ability to replicate state data to standby nodes. The standby nodes receive replicas of state data from their peer control plane processes. In a 1:1 HA configuration, one processor acts as the Primary (Active) and the other as the Secondary (Standby).
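A 1:1 Active/Standby pair, together with the graceful and ungraceful switchovers that the Fault Manager hooks support, can be modeled as follows. This is a simplified sketch, not ZebOS code: `OneToOnePair`, `ControlPlaneNode`, and the `graceful` flag are hypothetical. It assumes that a graceful switchover (such as ISSU) lets the Active flush its pending updates first, while an ungraceful one (such as a software fault) leaves the Standby with only what it had already received.

```python
from enum import Enum


class Role(Enum):
    ACTIVE = "active"
    STANDBY = "standby"


class ControlPlaneNode:
    def __init__(self, name: str, role: Role) -> None:
        self.name = name
        self.role = role
        self.state: dict = {}    # protocol state (live on Active, replica on Standby)
        self.pending: dict = {}  # Active-side updates not yet checkpointed


class OneToOnePair:
    """Hypothetical 1:1 pair: one Primary (Active) and one Secondary (Standby)."""

    def __init__(self) -> None:
        self.active = ControlPlaneNode("primary", Role.ACTIVE)
        self.standby = ControlPlaneNode("secondary", Role.STANDBY)

    def update(self, key: str, value: str) -> None:
        # State changes on the Active; replication to the Standby is deferred.
        self.active.state[key] = value
        self.active.pending[key] = value

    def checkpoint(self) -> None:
        # Push pending changes into the Standby's replica.
        self.standby.state.update(self.active.pending)
        self.active.pending.clear()

    def switchover(self, graceful: bool) -> None:
        # Assumption: only a graceful switchover can flush pending updates.
        if graceful:
            self.checkpoint()
        self.active, self.standby = self.standby, self.active
        self.active.role, self.standby.role = Role.ACTIVE, Role.STANDBY
        # The demoted node must be resynchronized (e.g. by a full checkpoint)
        # before it can protect the new Active, so its old data is discarded.
        self.standby.state.clear()
        self.standby.pending.clear()


pair = OneToOnePair()
pair.update("route:10.0.0.0/8", "via 192.0.2.1")
pair.checkpoint()
pair.update("route:10.1.0.0/16", "via 192.0.2.2")  # never checkpointed
pair.switchover(graceful=False)
print(pair.active.name, sorted(pair.active.state))  # secondary ['route:10.0.0.0/8']
```

The ungraceful case shows why checkpoint frequency matters: any update made after the last checkpoint is not present on the new Active.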
CAL provides services for defining protocol state and configuration information, operating on it, and managing the dependencies between its elements.
Each protocol performs the following operations for checkpointing:
- Opens a session with the Checkpoint service
- Registers with the service for the data it needs to replicate
- Performs an initial full checkpoint to the Standby protocol
- Subsequently performs full or partial checkpoints of the state and data
- Upon failure of the Active protocol, the Standby protocol resumes operation starting from the last checkpoint.
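The five steps above can be traced in a minimal sketch. The names `CheckpointService`, `CheckpointSession`, `register`, `full_checkpoint`, `partial_checkpoint`, and `last_checkpoint` are hypothetical and do not describe the actual ZebOS or middleware interfaces; the STP port-state data is illustrative. The sketch shows a full checkpoint replacing the Standby's replica, a partial checkpoint sending only changes and deletions, and the Standby reading the result after the Active fails.

```python
class CheckpointService:
    """Hypothetical checkpoint service holding the Standby-side replicas."""

    def __init__(self) -> None:
        # (protocol, data_type) -> replicated records
        self.replicas: dict = {}

    def open_session(self, protocol: str) -> "CheckpointSession":
        # Step 1: a protocol opens a session with the service.
        return CheckpointSession(self, protocol)

    def last_checkpoint(self, protocol: str, data_type: str) -> dict:
        # Step 5: what the Standby protocol resumes from after a failure.
        return dict(self.replicas[(protocol, data_type)])


class CheckpointSession:
    def __init__(self, service: CheckpointService, protocol: str) -> None:
        self.service = service
        self.protocol = protocol

    def register(self, data_type: str) -> None:
        # Step 2: declare a kind of data this protocol will replicate.
        self.service.replicas.setdefault((self.protocol, data_type), {})

    def full_checkpoint(self, data_type: str, records: dict) -> None:
        # Step 3 (or any later full resync): replace the replica wholesale.
        replica = self._replica(data_type)
        replica.clear()
        replica.update(records)

    def partial_checkpoint(self, data_type: str, changed: dict, deleted=()) -> None:
        # Step 4: send only what changed since the previous checkpoint.
        replica = self._replica(data_type)
        replica.update(changed)
        for key in deleted:
            replica.pop(key, None)

    def _replica(self, data_type: str) -> dict:
        key = (self.protocol, data_type)
        if key not in self.service.replicas:
            raise KeyError(f"{data_type!r} not registered for {self.protocol!r}")
        return self.service.replicas[key]


service = CheckpointService()
stp = service.open_session("stp")
stp.register("port-state")
stp.full_checkpoint("port-state", {"ge1": "forwarding", "ge2": "blocking"})
stp.partial_checkpoint("port-state", {"ge2": "forwarding"}, deleted=["ge1"])
# The Active fails here; the Standby resumes from the last checkpoint.
print(service.last_checkpoint("stp", "port-state"))  # {'ge2': 'forwarding'}
```

Partial checkpoints keep replication traffic proportional to the amount of change, while a full checkpoint is still needed to seed a new or resynchronized Standby.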
The checkpointing mechanism can operate per data item, as defined by the middleware, or per transaction.
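The difference between per-data and transaction-based checkpointing is easiest to see when a single logical change touches several records. In the sketch below, which is an illustration rather than any specific middleware's semantics, `PerDataReplica` applies each record as soon as it arrives, while `TransactionalReplica` buffers records and applies them only on `commit`. Both class names are hypothetical, and calling `abort` is a simplifying stand-in for the Active failing before it commits. The example splits a RIB entry for 10.0.0.0/8 into two /9 routes.

```python
class PerDataReplica:
    """Each checkpointed item is applied to the Standby replica immediately."""

    def __init__(self) -> None:
        self.data: dict = {}

    def write(self, key: str, value: str) -> None:
        self.data[key] = value

    def delete(self, key: str) -> None:
        self.data.pop(key, None)


class TransactionalReplica:
    """Items are buffered and applied to the replica only when committed."""

    def __init__(self) -> None:
        self.data: dict = {}
        self._txn = None  # list of pending operations while a transaction is open

    def begin(self) -> None:
        if self._txn is not None:
            raise RuntimeError("transaction already open")
        self._txn = []

    def write(self, key: str, value: str) -> None:
        self._require_txn().append(("write", key, value))

    def delete(self, key: str) -> None:
        self._require_txn().append(("delete", key, None))

    def commit(self) -> None:
        # Apply every buffered operation together.
        for op, key, value in self._require_txn():
            if op == "write":
                self.data[key] = value
            else:
                self.data.pop(key, None)
        self._txn = None

    def abort(self) -> None:
        # Discard the transaction; the replica keeps its previous contents.
        self._require_txn()
        self._txn = None

    def _require_txn(self) -> list:
        if self._txn is None:
            raise RuntimeError("no open transaction")
        return self._txn


per = PerDataReplica()
per.write("10.0.0.0/8", "192.0.2.1")
per.delete("10.0.0.0/8")  # the Active fails right after this item
print(per.data)  # {}

txn = TransactionalReplica()
txn.begin()
txn.write("10.0.0.0/8", "192.0.2.1")
txn.commit()
txn.begin()
txn.delete("10.0.0.0/8")
txn.write("10.0.0.0/9", "192.0.2.1")
txn.abort()  # the Active fails before committing the split
print(txn.data)  # {'10.0.0.0/8': '192.0.2.1'}
```

With per-data checkpointing the Standby can be left holding a half-applied change (here, no route at all for the range), whereas the transactional replica holds either the old or the new state, at the cost of buffering until commit.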