Business Continuity Planning: Disaster Recovery in the Cloud
What can be done to protect against outages from major cloud providers?
Cloud services have become an essential component of enterprise computing. However, as the most recent Amazon Web Services outage illustrates, some IT professionals still have their heads in the clouds when it comes to preparing for provider outages and having a database disaster recovery plan. On Christmas Eve 2012, Amazon Web Services experienced an outage at its Northern Virginia data center related to its Elastic Load Balancing service. It was the fourth such outage of 2012, and poor timing for a marquee customer like Netflix to experience service disruptions.
Families still like to gather ’round the television and watch a holiday film or two before putting Santa’s cookies and milk out. When they couldn’t stream “It’s a Wonderful Life” or “A Christmas Story” using Netflix, you bet it was a customer relations nightmare.
Amazon Web Services says the disruption began at 12:24 p.m. PT, December 24. Service was ultimately restored at 10:30 a.m. PT, Christmas Day, although the company did not announce operations had returned to normal until 12:05 p.m. PT.
Despite the finger-pointing in Amazon’s direction, the company is not the only service provider to experience an outage. Salesforce.com, a cloud and CRM services provider, has had some high-profile interruptions, as have Skype, many of the hosted email services (Gmail, Hotmail, Yahoo! Mail), Intuit hosting services, and Microsoft Windows Azure.
This should reinforce the point that no single provider is failproof. Servers could be taken out by natural disasters (Hurricane Sandy, anyone?), power disruptions, mechanical failures, or any number of other issues, including human error. The challenge is to be aware of the issues that may affect reliability in the cloud and to know how best to prepare for them.
Audrey Rasmussen, a partner and principal analyst at Ptak, Noel & Associates LLC, writing for Enterprise Management 360°, notes that cloud computing does not alleviate any of the challenges found in traditional IT structures. “[I]t depends on software, it requires management, it uses processes, and more,” she writes. “So the fact of the matter is there will be outages.”
Preparation and planning are most important to minimizing and coping with cloud service outages. How well do you know your service provider? Many questions regarding its cloud architecture, processes, and redundancies should have been asked when choosing a cloud services provider.
What is in your Service Level Agreement? The agreement is important for understanding what the provider has in place in the event of disruptions or outages, and therefore how vulnerable your operation is when one occurs. Any remedies for such disruptions or outages would be specified in your Service Level Agreement.
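One way to make a Service Level Agreement concrete is to translate an outage into a monthly availability figure and compare it with the target in the agreement. The sketch below uses the start and restoration times Amazon reported for the December 2012 outage. The 99.95% target and the 31-day measurement window are illustrative assumptions, not the terms of any actual agreement.

```python
from datetime import datetime, timedelta

# Times reported by Amazon Web Services. Both are Pacific Time on
# consecutive days, so naive datetimes are sufficient here.
outage_start = datetime(2012, 12, 24, 12, 24)
service_restored = datetime(2012, 12, 25, 10, 30)

# Assumptions for illustration only: a 31-day window and a 99.95% target.
measurement_window = timedelta(days=31)
sla_target = 0.9995


def availability(downtime: timedelta, window: timedelta) -> float:
    """Return the fraction of the window during which the service was up."""
    return 1 - downtime / window


downtime = service_restored - outage_start
achieved = availability(downtime, measurement_window)

print(f"Downtime: {downtime}")
print(f"Availability for the month: {achieved:.2%}")
print(f"Meets assumed target of {sla_target:.2%}: {achieved >= sla_target}")
```

Under these assumptions, a single outage of 22 hours and 6 minutes leaves availability at roughly 97% for the month, well short of the assumed target. Whether that triggers a remedy depends entirely on how your own agreement defines and measures downtime.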
Perhaps the greatest lesson these types of outages offer is that it is not wise to rely on some other organization to solve business continuity problems for your organization. You are shifting resources to the cloud, not responsibility. You must be proactive and prepared with your database disaster recovery plan.
A good initial step is to evaluate the business risks in the event of a cloud services outage. If the stakes are high, you need to know how to keep the service running. The onus may be on your organization, for example, to ask the provider for business-continuity service options that keep high-risk applications running.
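The risk evaluation can start with back-of-the-envelope arithmetic: what an outage is likely to cost versus what a continuity option costs. The sketch below is one simple way to frame that comparison. Every name and figure in it (applications, revenue per hour, expected outage hours, the price of redundancy) is a hypothetical placeholder to be replaced with your own numbers.

```python
from dataclasses import dataclass


@dataclass
class Application:
    name: str
    revenue_per_hour: float       # hypothetical: revenue lost per hour of downtime
    expected_outage_hours: float  # hypothetical: downtime expected per year
    redundancy_cost: float        # hypothetical: yearly cost of a continuity option


def expected_annual_loss(app: Application) -> float:
    """Estimate yearly revenue lost to outages if nothing is done."""
    return app.revenue_per_hour * app.expected_outage_hours


def redundancy_pays_off(app: Application) -> bool:
    """True when the expected loss exceeds the cost of the continuity option."""
    return expected_annual_loss(app) > app.redundancy_cost


# Made-up figures for two applications with very different stakes.
apps = [
    Application("storefront", 10_000.0, 20.0, 60_000.0),
    Application("internal wiki", 50.0, 20.0, 60_000.0),
]

for app in apps:
    print(app.name, expected_annual_loss(app), redundancy_pays_off(app))
```

A real assessment would also weigh costs that are harder to quantify, such as customer goodwill, but even a rough comparison like this helps separate high-risk applications from those that can tolerate an outage.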
One possible approach is to use a variety of tools rather than relying on vendor-specific solutions, as Brian Adler, a services architect with RightScale, advises. In the December 2012 Amazon outage, for example, only Amazon’s Elastic Load Balancing service was affected. Organizations using a different, instance-based load balancing solution were insulated from the failure.
The use of vendor-specific tools and virtual appliances may make deploying an application easier in the short term, but these services are often integrated with or tied into other services, which can result in cascading outages if one of the underlying services suffers a disruption. Vendor-neutral solutions insulate your application tiers from these service integrations and make the application portable across clouds.
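To illustrate what an instance-based load balancer does, the sketch below rotates requests across backends and skips any that fail a health check. It is a toy model in plain Python, not the software RightScale or any other vendor ships. The class name, the backend addresses, and the health-check callback are all assumptions made for the example.

```python
from typing import Callable, List


class RoundRobinBalancer:
    """Toy round-robin balancer that skips unhealthy backends."""

    def __init__(self, backends: List[str], is_healthy: Callable[[str], bool]):
        if not backends:
            raise ValueError("at least one backend is required")
        self._backends = list(backends)
        self._is_healthy = is_healthy
        self._next = 0

    def pick(self) -> str:
        """Return the next healthy backend, trying each backend at most once."""
        for _ in range(len(self._backends)):
            backend = self._backends[self._next]
            self._next = (self._next + 1) % len(self._backends)
            if self._is_healthy(backend):
                return backend
        raise RuntimeError("no healthy backends available")


# Hypothetical backend addresses. A real health check would probe each
# instance over the network; here a set of "down" hosts stands in for it.
down = {"10.0.0.2"}
balancer = RoundRobinBalancer(
    ["10.0.0.1", "10.0.0.2", "10.0.0.3"],
    is_healthy=lambda backend: backend not in down,
)

picks = [balancer.pick() for _ in range(4)]
print(picks)
```

Because the balancer is ordinary software running on an instance you control, the same approach can be deployed with any provider, which is the portability benefit of vendor-neutral tooling.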
Resilience is ultimately not the provider’s responsibility, say experts. Rafal Los, enterprise and cloud security strategist, Hewlett-Packard Software, writing for Infosec Island, contends:
Consumers of cloud services are still failing to understand that building resiliency into their critical services is their responsibility. If you are pushing a critical service, and I mean really critical, to the public cloud and you’re dependent on a single provider then I would argue you’ve done a terrible job of understanding and mitigating your risks. Period. […] You should absolutely have diverse technology and a multi-vendor strategy… [N]o doubt in my mind.
If your enterprise’s critical services need to be resilient, they need to be redundant. This means adopting recovery strategies that use multiple service providers based in more than one region, which adds complexity to your system. Enabling these types of solutions may cost more, but if your organization deems your service essential to its business and your website absolutely cannot afford to be unreachable, then this ounce of prevention may well be a worthy expense.
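A minimal sketch of the multi-provider idea follows: endpoints hosted with different providers and regions are tried in priority order, and traffic goes to the first one that responds. The provider names, regions, and URLs are hypothetical, and the probe is passed in as a function so the example runs without network access. In practice, failover is often handled at the DNS or traffic-management layer rather than in application code.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Endpoint:
    provider: str  # hypothetical provider label
    region: str    # hypothetical region label
    url: str       # hypothetical address


def choose_endpoint(
    endpoints: Sequence[Endpoint],
    probe: Callable[[Endpoint], bool],
) -> Optional[Endpoint]:
    """Return the first endpoint, in priority order, whose probe succeeds."""
    for endpoint in endpoints:
        try:
            if probe(endpoint):
                return endpoint
        except Exception:
            # A probe that raises (timeout, refused connection) counts as a failure.
            continue
    return None


# Priority order: primary, same provider in another region, then a second provider.
ENDPOINTS = [
    Endpoint("provider-a", "us-east", "https://a-east.example.com"),
    Endpoint("provider-a", "us-west", "https://a-west.example.com"),
    Endpoint("provider-b", "eu-west", "https://b-eu.example.com"),
]


def simulated_probe(down_providers=(), down_regions=()):
    """Build a stand-in probe; a real one would issue a health-check request."""
    def probe(endpoint: Endpoint) -> bool:
        return (endpoint.provider not in down_providers
                and endpoint.region not in down_regions)
    return probe


# A regional outage falls back to the same provider in another region;
# a provider-wide outage falls back to the second provider.
regional = choose_endpoint(ENDPOINTS, simulated_probe(down_regions={"us-east"}))
provider_wide = choose_endpoint(ENDPOINTS, simulated_probe(down_providers={"provider-a"}))
print(regional.url)
print(provider_wide.url)
```

The added complexity is visible even at this scale: every endpoint in the list must be deployed, kept in sync, and paid for, which is the cost side of the redundancy trade-off.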
“The responsibility for balancing risk, services and costs rests with the cloud customer,” notes Rasmussen, “which means that decisions to use or not use high value service options could spell the difference between hundreds of thousands of dollars in revenue lost due to an outage or business as usual during a single data center outage.”
There is no pixie dust in cloud computing that prevents outages, just smart planning that minimizes their impact. The same deep due diligence that internal IT organizations apply to ensure business continuity, security, and resilience must also be applied to evaluating cloud infrastructures.