Disaster Recovery and Business Continuity: Is it Better to be Lucky or Prepared?
Four Simple Tips to Help Prepare for the Next “Oops” Moment
by Kemal Balioglu
Recently I came across a sobering story about how one simple command resulted in the deletion of most of the production files for Pixar’s Toy Story 2 from a studio server. Although the studio had been backing up production files daily, it did not realize until it went to restore the lost files that the backup solution had not been working. By any measure, this would qualify as a disaster. In today’s IT vernacular, it is also known as an unplanned downtime event.
This event happened in 1998, when backup solutions were immensely complex and difficult, if not impossible, to “test.” According to the story, the studio had seemingly been prepared for this unimaginable situation; however, it ultimately had to rely on blind luck to recover. An employee just happened to have a copy of the movie that she had taken home the week before, and that became the de facto backup file.
Today, cloud-based disaster recovery solutions are quickly gaining enterprise-wide adoption as organizations seek to reduce hardware costs and improve flexibility in responding to unplanned downtime events.
Disaster Recovery as a Service (DRaaS) not only allows organizations to quickly and easily recover data but, more importantly, enables them to resume operations seamlessly during a disaster. Advances in cloud-based DR solutions allow IT to set the level of protection server by server. Mission-critical servers can be set to recover instantly, while servers with less critical data might be assigned a longer Recovery Time Objective (RTO).
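The idea of per-server protection levels can be sketched in a few lines of Python. This is only an illustration: the server names, the RTO values, and the `ServerPolicy`, `recovery_order` and `missed_rto` names are assumptions made for the example, not part of any DRaaS product.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class ServerPolicy:
    name: str
    rto: timedelta  # longest tolerable time to bring the server back


# Hypothetical inventory; names and RTO values are illustrative only.
POLICIES = [
    ServerPolicy("billing-db", timedelta(minutes=5)),
    ServerPolicy("file-archive", timedelta(hours=24)),
    ServerPolicy("intranet-wiki", timedelta(hours=8)),
]


def recovery_order(policies):
    """Return the policies sorted so the tightest RTO is recovered first."""
    return sorted(policies, key=lambda p: p.rto)


def missed_rto(policies, measured):
    """Return names of servers whose measured recovery time exceeded the RTO."""
    return [p.name for p in policies if measured[p.name] > p.rto]
```

Feeding the recovery times measured during a DR test into `missed_rto` gives a quick list of servers that did not come back within their target.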
Despite the benefits of cloud-based DR over traditional solutions, a DR program can only be successful if it is consistently tested. Regularly scheduled testing must include communications, data recovery, and application recovery. DR testing in these areas is required to conduct planned maintenance and train staff in disaster recovery procedures.
Traditionally, DR tests have been complex, disruptive and consequently unpopular. Too often, testing focuses on backing up instead of recovering. While this approach ensures you have a copy, it does little to make the data, server, or application easy to reinstate.
To further complicate efforts, many of the systems used in the testing are needed to run day-to-day operations. To have those systems down during testing is unacceptable.
But a hybrid-cloud approach to DR has changed the testing landscape for the better, combining public cloud and SaaS automation software to make continuity planning easier. Companies gain data backup, fail-over of servers and the ability to have a secondary data center at a different site to allow for regional disaster recovery.
Here are four suggestions to make your DRaaS testing more efficient and productive.
- Plan Ahead and Plan Often
The problem with disasters is that they are unplanned and unexpected. If you’re not testing your DR frequently, you might find yourself hung out to dry when lightning strikes. DR tests can be done frequently because DRaaS doesn’t have the physical infrastructure and configuration synchronization associated with traditional disaster recovery.
With an automated DRaaS solution, you don’t need to schedule IT personnel to manually check system configurations. Recent innovations make it easy to create an on-demand recovery node that you can test quickly. Unlike a typical backup-only cloud storage solution, hybrid DRaaS solutions can maintain up-to-date, ready-to-run virtual machine clones of your critical systems that can run on an appliance or in the cloud.
- Test Your DRaaS in a Sandbox
With DRaaS solutions, standby computing capacity is available to recover applications in the event of a disaster. This can be easily tested without impacting your production servers or unsettling the daily business routine. A sandbox copy is created in the cloud and is accessible only to the system administrator. These copies are created on demand, paid for while being used and deleted once the test is complete. The approach makes testing simple and cost-effective, and it does not disrupt business operations. You can test DR and applications every day without missing a beat, assuming you have the right DRaaS provider.
Test cases can be performed against the recovery nodes in as little as 15 minutes, depending on the application, often with no incremental costs. Applications and services are immediately available for other uses, enabling businesses to effectively adopt cloud infrastructure or speed time to production for new applications or initiatives.
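The create-on-demand, test, then delete cycle described above might look like the following Python sketch. Everything here is hypothetical: `FakeSandboxProvider` is an in-memory stand-in rather than a real DRaaS API, and `sandbox` and `run_dr_test` are names invented for the example. The point is only that cleanup happens even when a check fails or raises an error.

```python
from contextlib import contextmanager


class FakeSandboxProvider:
    """In-memory stand-in for a DRaaS provider API; entirely hypothetical."""

    def __init__(self):
        self.active = set()

    def create_clone(self, server):
        clone_id = f"sandbox-{server}"
        self.active.add(clone_id)
        return clone_id

    def delete_clone(self, clone_id):
        self.active.discard(clone_id)


@contextmanager
def sandbox(provider, server):
    """Create a clone on demand and delete it when the test ends."""
    clone_id = provider.create_clone(server)
    try:
        yield clone_id
    finally:
        # Runs even if a check raises, so no clone is left running (and billed).
        provider.delete_clone(clone_id)


def run_dr_test(provider, server, checks):
    """Run each named check against a sandbox clone; return the failed names."""
    with sandbox(provider, server) as clone_id:
        return [name for name, check in checks.items() if not check(clone_id)]
```

A scheduled job could call `run_dr_test` with checks such as "the server boots" or "the database answers a query" and alert someone whenever the returned list is not empty.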
- Take Advantage of a Sliding Scale
There are financial benefits to cloud-based testing. Service providers regularly offer sliding scales for DR testing. Putting your DR solution in the cloud also means there isn’t a redundant in-house infrastructure that is sitting unused most of the time.
The cloud gives small- to medium-sized businesses the same capabilities as larger organizations. With a level playing field, SMBs have greater access to DR solutions and the ability to test frequently.
- Entice Regular Employee Participation
In traditional DR settings, employees may consider testing to be time consuming and a distraction from their already busy schedules. However, according to a survey by market research company Enterprise Strategy Group, respondents using cloud-based DR services were four times more likely to perform weekly DR tests than those self-hosting their BC/DR solution.
People learn by repetition, so just like fire drills, we have to create and practice DR drills, which are critical to a DR plan. Companies that fail to conduct regular drills shouldn’t be shocked when their employees panic during a disaster.
As you consider these steps, you might find yourself among other skeptics who think drills are unnecessary and that the chances of disaster striking are still relatively slim. But according to a May 2014 study by the Aberdeen Group, the average small business experiences 1.7 unplanned downtime events a year, with an average downtime of 6.7 hours per event. The average cost of downtime is estimated at $8,600 per hour, or about $100,000 a year.
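For readers who want to check the arithmetic or plug in their own numbers, the annual figure is simply events per year times hours per event times cost per hour. The short Python sketch below uses the Aberdeen figures quoted above; the function name is an assumption made for the example.

```python
def annual_downtime_cost(events_per_year, hours_per_event, cost_per_hour):
    """Estimate yearly downtime cost as events x hours per event x hourly cost."""
    return events_per_year * hours_per_event * cost_per_hour


# Figures quoted from the Aberdeen Group study cited in the article.
cost = annual_downtime_cost(1.7, 6.7, 8_600)
print(f"${cost:,.0f}")  # prints $97,954, which rounds to about $100,000
```

Substituting your own outage history and hourly cost gives a rough number to weigh against the cost of regular DR testing.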
Unplanned downtime events, whether caused by a natural disaster, human error, or hardware failure, can have immediate and long-term negative impact. Take steps to ensure your business can quickly and easily recover its IT infrastructure and data, and minimize the impact by being prepared rather than relying on luck.
Kemal Balioglu is the vice president of products at Quorum.