Metro-Area Downtime and Disaster Recovery

Overview

Every successful business continuity strategy needs an effective disaster recovery plan. Metro-area disaster recovery means placing backup servers at another site, where they are ready to pick up the workload if the servers at the main data center go down. Achieving this within the Orbit framework involves four major tasks.

Goals

1. Synchronizing Site State

Evaluate how virtual machine state can be transferred over Metro Area Networks (MANs) and how this applies to the concepts of Orbit.

Work done:

  • A survey of multiple MAN technologies was carried out to understand how existing MAN technologies could be applied to the Orbit framework.
  • An extensive study of the traffic characteristics and application requirements of these technologies was also carried out.
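
As a rough illustration of the kind of application requirement such a study looks at, the time needed to move VM state over a MAN link can be estimated from the state size, the link bandwidth, and the round-trip time. The sketch below is a back-of-the-envelope model, assuming a single sequential copy with no compression and no retransmission of pages dirtied during the copy; the function names and the figures in the example are hypothetical, not measurements from the Orbit work.

```python
# Light in optical fiber covers roughly 200,000 km/s, i.e. about 5 microseconds per km.
FIBER_DELAY_US_PER_KM = 5.0


def rtt_ms(distance_km: float) -> float:
    """Round-trip propagation delay over fiber, ignoring switching and queuing delay."""
    return 2 * distance_km * FIBER_DELAY_US_PER_KM / 1000.0


def transfer_time_s(state_gib: float, link_gbps: float, distance_km: float) -> float:
    """One full copy of VM state: serialization time plus one round trip for the final ack."""
    bits = state_gib * 8 * 1024**3
    serialization_s = bits / (link_gbps * 1e9)
    return serialization_s + rtt_ms(distance_km) / 1000.0


# Hypothetical example: 4 GiB of VM state over a 10 Gbit/s link between sites 50 km apart.
print(f"RTT: {rtt_ms(50):.1f} ms")                      # RTT: 0.5 ms
print(f"Transfer: {transfer_time_s(4, 10, 50):.2f} s")  # Transfer: 3.44 s
```

At metro distances the propagation term stays well under a millisecond, so a bulk copy of VM state is bandwidth-bound, whereas synchronous per-write replication pays the round trip on every write.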

2. Synchronizing Data

Further develop the fault-tolerant capabilities of Orbit to bring downtime as close to zero as possible.

Work done:

  • A volume replication feature for Cinder, the OpenStack block storage service.
  • Mechanisms to synchronize VM state and data.
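
The Cinder replication feature itself is not shown here. As a hedged illustration of the idea behind synchronous replication for near-zero data loss, the sketch below acknowledges a write only after both the primary site and the remote replica have stored it. The `BlockStore` class and `replicated_write` function are hypothetical in-memory stand-ins, not Cinder APIs.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BlockStore:
    """Hypothetical in-memory stand-in for one site's block storage."""
    name: str
    available: bool = True
    blocks: dict[int, bytes] = field(default_factory=dict)

    def write(self, block: int, data: bytes) -> bool:
        # An unreachable site cannot store the write.
        if not self.available:
            return False
        self.blocks[block] = data
        return True


def replicated_write(primary: BlockStore, replica: BlockStore,
                     block: int, data: bytes) -> bool:
    """Synchronous replication: acknowledge the write only when both sites hold it."""
    if not primary.write(block, data):
        return False
    # The write is acknowledged only if the remote copy also succeeds; otherwise the
    # caller sees a failure (retry, rollback, or a degraded mode are left out here).
    return replica.write(block, data)


primary, replica = BlockStore("site-a"), BlockStore("site-b")
print(replicated_write(primary, replica, 0, b"hello"))  # True
replica.available = False
print(replicated_write(primary, replica, 1, b"world"))  # False
```

Waiting for the remote acknowledgment is what keeps the two copies identical at every acknowledged write, at the cost of adding the inter-site round trip to each write's latency.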

3. Handling Fail-overs

1- Failure Detection: Enable accurate and timely detection of when and where a failure has occurred across a geographically distributed deployment.

2- Traffic Redirection: Once a failure is detected, redirect traffic smoothly and transparently.
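
A common baseline for the failure-detection requirement is a heartbeat timeout: each site reports in periodically, and a site that has been silent longer than a threshold is suspected to have failed. The sketch below shows only that basic idea, assuming a fixed timeout and an injectable clock; it is not the Orbit detection logic, and the class name, site names, and timeout value are hypothetical.

```python
from __future__ import annotations

import time
from typing import Callable


class HeartbeatDetector:
    """Suspects a site once no heartbeat has arrived for more than `timeout_s` seconds."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.clock = clock  # injectable so the detector can be tested without sleeping
        self.last_seen: dict[str, float] = {}

    def heartbeat(self, site: str) -> None:
        """Record that `site` was alive at the current time."""
        self.last_seen[site] = self.clock()

    def suspected(self) -> list[str]:
        """Sites whose last heartbeat is older than the timeout, sorted by name."""
        now = self.clock()
        return sorted(s for s, t in self.last_seen.items() if now - t > self.timeout_s)


# Usage with a fake clock; "umea" and "lulea" are illustrative site names.
now = [0.0]
detector = HeartbeatDetector(timeout_s=3.0, clock=lambda: now[0])
detector.heartbeat("umea")
detector.heartbeat("lulea")
now[0] = 2.0
detector.heartbeat("umea")
now[0] = 4.0
print(detector.suspected())  # ['lulea']
```

The timeout is the main tuning knob: a short value detects failures quickly but raises false suspicions when heartbeats are delayed on the MAN link, while a long value does the opposite.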

Work done:

  • Fault Tolerant Islands built on the SERF gossip protocol.
  • Evaluation of candidate solutions: BGP Anycast, SDN-based approaches, and HAProxy.
  • Configuration of BGP Anycast between two testbeds at Umea and Lulea.
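
With BGP Anycast, every site announces the same service prefix and each router forwards traffic toward the announcement it prefers; when a site fails and its announcement is withdrawn, traffic shifts to the remaining sites with no client-side change. The sketch below models only that effect, using shortest AS-path length as a stand-in for BGP's full best-path selection, which also weighs local preference, MED, and other attributes. The prefix, site names, and path lengths are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Announcement:
    """One site's BGP announcement of a prefix, reduced to the AS-path length."""
    site: str
    prefix: str
    as_path_len: int


def best_site(announcements: list[Announcement], prefix: str) -> Optional[str]:
    """Simplified best-path choice: shortest AS path wins, ties broken by site name."""
    candidates = [a for a in announcements if a.prefix == prefix]
    if not candidates:
        return None  # no site announces the prefix, so traffic is unroutable
    return min(candidates, key=lambda a: (a.as_path_len, a.site)).site


def withdraw(announcements: list[Announcement], site: str) -> list[Announcement]:
    """A failed site stops announcing, so its routes disappear."""
    return [a for a in announcements if a.site != site]


# Hypothetical setup: both sites announce the documentation prefix 192.0.2.0/24.
routes = [Announcement("umea", "192.0.2.0/24", 2),
          Announcement("lulea", "192.0.2.0/24", 3)]
print(best_site(routes, "192.0.2.0/24"))  # umea
routes = withdraw(routes, "umea")
print(best_site(routes, "192.0.2.0/24"))  # lulea
```

Because redirection happens in the routing layer, clients keep using the same address; the trade-off is that failover speed depends on how quickly the withdrawal propagates through BGP.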

4. Creating an Orchestration Layer for Cross-Site Disaster Recovery

Extend the OpenStack cloud management platform with appropriate protocols for federated (multi-data-center) operation.

Manage cross-data-center disaster recovery mechanisms for state and I/O synchronization over MANs, as well as failover handling.

Work done:

Design and implementation of the orchestration layer:

  • DR-Orchestrator
  • DR-Logic
  • Orchestrates and automates the protection and recovery actions.
  • Offers easy user interaction via nova protect-vm and nova protect-volume, and exposes a recovery API to the Failure Detection Logic via Nova recover DATACENTER.
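
To make the protect/recover flow concrete, here is a hypothetical in-memory model of what an orchestration layer could track: users register VMs and volumes for protection, and the failure-detection logic calls recover with the failed data center, which yields recovery actions for everything protected there. The class and method names only mirror the nova protect-vm, nova protect-volume, and recover commands; they, the fixed backup site, and the volumes-before-VMs ordering are assumptions, not the DR-Orchestrator's actual API or behavior.

```python
from __future__ import annotations

from collections import defaultdict


class DROrchestratorModel:
    """Hypothetical bookkeeping model of protect/recover; not the real DR-Orchestrator."""

    def __init__(self, backup_site: str) -> None:
        self.backup_site = backup_site
        # data center -> protected (kind, resource id) pairs hosted there
        self.protected: defaultdict[str, list[tuple[str, str]]] = defaultdict(list)

    def protect_vm(self, datacenter: str, vm_id: str) -> None:
        self.protected[datacenter].append(("vm", vm_id))

    def protect_volume(self, datacenter: str, volume_id: str) -> None:
        self.protected[datacenter].append(("volume", volume_id))

    def recover(self, datacenter: str) -> list[str]:
        """Ordered recovery actions for everything protected in the failed data center."""
        items = self.protected.pop(datacenter, [])
        # Assumed ordering: bring volumes up first so recovered VMs can attach them.
        volumes = [f"promote replica of volume {rid} at {self.backup_site}"
                   for kind, rid in items if kind == "volume"]
        vms = [f"boot vm {rid} at {self.backup_site}"
               for kind, rid in items if kind == "vm"]
        return volumes + vms


dr = DROrchestratorModel(backup_site="lulea")
dr.protect_vm("umea", "vm-1")
dr.protect_volume("umea", "vol-1")
print(dr.recover("umea"))
# ['promote replica of volume vol-1 at lulea', 'boot vm vm-1 at lulea']
```

Keeping the protection registry separate from failure detection means the detector only has to name the failed data center; deciding what to restore, and in which order, stays in the orchestration layer.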

Demo

Installation Guides
Disaster Recovery for Openstack Installation Manual
Dragon OpenStack API Reference