
SNIA definitions (Storage Networking Industry Association)

Disaster Recovery (DR): The recovery of data, access to data and associated processing through a comprehensive process of setting up a redundant site with the recovery of operational data to continue business operations after a loss of use of all or part of a data center.

This involves not only an essential set of data but also an essential set of all the hardware and software to continue processing of that data and business. Any disaster recovery may involve some amount of downtime.

Business Continuity (BC): Processes and/or procedures for ensuring continued business operations.

Maximum Tolerable Outage (MTO): The maximum acceptable time period during which recovery must become effective before an outage compromises the ability of a business unit to achieve its business objectives.

Recovery Time Objective (RTO): The maximum acceptable time period required to bring one or more applications and associated data back from an outage to a correct operational state.

Recovery Point Objective (RPO): The maximum acceptable time period prior to a failure or disaster during which changes to data may be lost as a consequence of recovery.

Data changes preceding the failure or disaster by at least this time period are preserved by recovery. Zero is a valid value and is equivalent to a "zero data loss" requirement.
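As a rough illustration of the RPO definition, the sketch below compares the time between the last recoverable copy of the data and the moment of failure against an RPO. The function names and the timestamps are hypothetical and chosen only for the example; they are not part of the SNIA definition.

```python
from datetime import datetime, timedelta


def data_loss_window(last_recoverable: datetime, failure_time: datetime) -> timedelta:
    """Return the span before the failure whose data changes would be lost."""
    if last_recoverable > failure_time:
        raise ValueError("recovery point cannot be later than the failure")
    return failure_time - last_recoverable


def meets_rpo(last_recoverable: datetime, failure_time: datetime, rpo: timedelta) -> bool:
    """True if the changes lost on recovery fit within the RPO."""
    return data_loss_window(last_recoverable, failure_time) <= rpo


# Illustrative timestamps only (assumed, not from any real system).
failure = datetime(2024, 1, 15, 14, 30)
last_backup = datetime(2024, 1, 15, 14, 0)

print(meets_rpo(last_backup, failure, timedelta(hours=1)))     # True
print(meets_rpo(last_backup, failure, timedelta(minutes=15)))  # False
# An RPO of zero is only met when nothing changed after the recovery point.
print(meets_rpo(failure, failure, timedelta(0)))               # True
```

With an RPO of zero, any gap at all between the recovery point and the failure fails the check, which is what a "zero data loss" requirement implies.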

Failover: The automatic substitution of a functionally equivalent system component for a failed one.

High Availability: The ability of a system to perform its function continuously (without interruption) for a significantly longer period of time than the reliabilities of its individual components would suggest. High availability is most often achieved through failure tolerance. High availability is not an easily quantifiable term. Both the bounds of a system that is called highly available and the degree to which its availability is extraordinary must be clearly understood on a case-by-case basis.
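One way to see how a system can outlast its individual components is the textbook calculation for redundant components. The sketch below assumes that component failures are independent and that failover is instantaneous and always succeeds; real systems rarely satisfy either assumption, so the result is an upper bound and not a prediction. The function name and the 99% figure are illustrative.

```python
def parallel_availability(component_availability: float, replicas: int) -> float:
    """Availability of `replicas` redundant components, any one of which suffices.

    Assumes independent failures and perfect failover (a simplification).
    """
    if not 0.0 <= component_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("at least one replica is required")
    # The system is down only when every replica is down at the same time.
    return 1.0 - (1.0 - component_availability) ** replicas


# Illustrative figure: a single component that is available 99% of the time.
for n in (1, 2, 3):
    print(n, f"{parallel_availability(0.99, n):.4%}")
```

Under these assumptions, two such components together would be unavailable only about 0.01% of the time, compared with 1% for a single one.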

DR / BC in the context of AWS

AWS Overall DR Plan: The process steps to recover from a situation where we can’t use our primary AWS account (e.g. recover from an AWS account takeover or self-destruct). The Platform Services group is accountable for the overall DR Plan.

AWS Service DR Plan: The process steps to recover each individual AWS system architecture component for a given service within the primary AWS account, e.g. when a system doesn’t come up after an AWS hardware failure, or when recovering from a bad upgrade, a compromised machine, or a corrupted volume. Also known as a “mini-disaster” plan or service disruption plan. Service owners are accountable for the Service DR Plans for their services. All plans should be tested prior to go-live, and on a periodic basis once live. Service owners may delegate responsibility for Service DR Plans to Support Owners in some circumstances.

Note for Banner to Ancillary Applications Migration (BAAM) to AWS project: We will have one overall Disaster Recovery Plan, as well as a DR Plan for each service/architecture component that will migrate to AWS. We will not address Business Continuity at this time. This will be a follow-on effort.

Guidance for creating an AWS Service DR Plan

Using the System Architecture Diagram for the service as a guide,

We should have the following goals in mind for system recovery design:

Example AWS Service DR Plan

Service: OnBase

Assumptions: backups are available; overall RTO is 4 hours (OnBase must be back up within 4 hours).
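When drafting or testing a plan like this, it can help to check that the estimated (or measured) recovery steps fit within the 4-hour RTO. The sketch below is a minimal example of that check. The step names and durations are hypothetical placeholders, not the actual OnBase recovery steps.

```python
from datetime import timedelta
from typing import Dict, Tuple

RTO = timedelta(hours=4)  # overall RTO stated in the plan's assumptions


def check_rto(step_durations: Dict[str, timedelta], rto: timedelta) -> Tuple[timedelta, bool]:
    """Sum sequential recovery steps and report whether they fit within the RTO."""
    total = sum(step_durations.values(), timedelta())
    return total, total <= rto


# Hypothetical steps and estimates, for illustration only.
steps = {
    "launch replacement instance": timedelta(minutes=45),
    "restore data from backup": timedelta(hours=2),
    "validate application": timedelta(minutes=30),
}

total, within_rto = check_rto(steps, RTO)
print(total)       # 3:15:00
print(within_rto)  # True
```

During a DR test, the estimates could be replaced with measured durations to confirm that the RTO is actually achievable.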

Example AWS Service DR Test