Mitigation (eliminating or lowering the impact of an incident) is a crucial part of any Disaster Recovery plan, and so is the planning itself. I’ve always wanted to underscore the need to have this sorted out in advance, and Hurricane Sandy is the ideal example to point to.
Some of you will remember hearing about this when a major hurricane slammed into the Eastern United States with New York City right in the bull’s eye. In its wake it left devastated communities and businesses. In a lot of cases, the businesses impacted most were those that didn’t rely on cloud service providers for server redundancy, but instead remained married to physical servers. Here’s a short list of the problems they encountered:
- Power loss. While many companies had backup generators, few considered the possibility that the roads would be so damaged that keeping the generators fueled would become a challenge, or that restoring basic services would take as long as it did.
- Wet servers. In some cases the storm surge was high enough that servers, and sometimes entire datacenters, were damaged or outright destroyed.
- Key personnel unable to return. Again, the result of impassable roads. And while it’s true that they might have been able to work remotely, the damage to the communications infrastructure kept that from happening.
But some companies moved happily along despite the damage, and one of the big heroes of the day was good old-fashioned planning, and of course virtualization.
To understand how virtualization is a huge asset in a situation like this, you need to remember that no matter what kind of virtual machine we’re talking about, we’re talking about files (and in some cases, some really big files). The fact that they’re files makes them really simple to move. But just because they’re easy to move doesn’t mean you can do it without a plan. To understand why, we need to look at some of the ways companies moved their VMs from New York to other locations.
Method one: Physically move them. One of my students regaled me with tales of how his company copied all the VMs over to external hard drives, and he and another tech loaded them into the back of a Jeep and got out of New York less than an hour ahead of the storm’s landfall. They had to drive to another location, load the machines, and get everything in the company repointed toward their new location.
This works, BUT there are more than a few issues with it. First, I’m glad they got to their destination safely, but given the situation, a bad wreck would have wiped out any hope that company had of getting up and going any time soon. Second, it took a lot of planning and last-minute coordination to pull this off, and at the end of it all, it still took several days to get everything up and running.
Method two: Move the VMs over the Internet to a cloud service provider. More than a few companies chose this option, and while it worked very well for some, for others it didn’t work at all. First, they had to coordinate with a cloud service provider to make this move. Before byte one of data could be moved, the contracts had to be in place.
For those that did get contracts in place, one of the biggest issues was bandwidth. Let’s be honest about it: moving even a small VM (say 60 GB) takes time. That time is determined by how fast your connection is, as well as how many thousands of others are moving big files at the same moment. The ones who were successful had three things going for them:
- They didn’t wait till the last minute, but instead had already figured out the contract piece and had it in place.
- They knew their infrastructure. Rather than making wild guesses about what to move (“move everything!”), they knew what was crucial and what wasn’t, so they could focus on the important things.
- They had bandwidth to burn. Getting a head start on the rest of the herd was a good thing for them. By the time everyone else started, they were already moved, or down to the last few VMs. The big thing was that they could react faster than everyone else.
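To see why bandwidth mattered so much, here’s a back-of-envelope sketch of how long moving a VM image actually takes. The function and the 80% link-efficiency figure are my own illustrative assumptions, not numbers from any particular provider:

```python
# Rough VM transfer time: image size (GB) over an uplink (Mbps).
# The efficiency factor is a hypothetical allowance for protocol
# overhead and contention; real links vary widely.

def transfer_hours(vm_size_gb: float, uplink_mbps: float,
                   efficiency: float = 0.8) -> float:
    """Hours to move a VM image if the link sustains `efficiency`
    of its rated speed."""
    bits = vm_size_gb * 8 * 1000 ** 3                 # decimal GB -> bits
    seconds = bits / (uplink_mbps * 1000 ** 2 * efficiency)
    return seconds / 3600

# That "small" 60 GB VM over a 100 Mbps business line:
print(round(transfer_hours(60, 100), 1))   # prints 1.7
```

Nearly two hours for one small VM on an uncontested line; multiply that by dozens of VMs and thousands of neighbors trying to evacuate data at once, and the head start becomes the whole ballgame.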
Challenges here included the need to redirect traffic, not to mention communicating with users (“go here now to work”), and of course figuring out where users were going to work from!
The real success stories fall into Method three:
The companies that came through best were those that had already built some kind of infrastructure to support this, and of course had thought it all through. What many had done was build a site inland, and again, they knew exactly what had to be duplicated. It was just a matter of pre-seeding the important VMs onto these remote sites, and then having plans in place to redirect traffic. The VMs of course had to be kept up to date, but a huge number of popular backup products, as well as the built-in replication capabilities of both VMware and Hyper-V, make this a very manageable task. If you have some folks ready to play routing games in advance, this works well.
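The “know what has to be duplicated” piece can be as simple as a tiered inventory decided in advance, so replication order is a lookup rather than a panic decision. A minimal sketch of the idea follows; the tier numbers, VM names, and sizes are hypothetical examples, not any real inventory:

```python
# Sketch: tier VMs ahead of time so the pre-seeding order for the
# DR site is already decided. All names and numbers are made up.

from dataclasses import dataclass

@dataclass
class VM:
    name: str
    size_gb: int
    tier: int   # 1 = must live at the DR site, 3 = nice to have

inventory = [
    VM("payroll", 40, 1),
    VM("mail-archive", 900, 3),
    VM("accounting", 60, 1),
    VM("test-lab", 300, 2),
]

def replication_order(vms):
    """Crucial tiers first; within a tier, smaller images seed first."""
    return sorted(vms, key=lambda v: (v.tier, v.size_gb))

for vm in replication_order(inventory):
    print(vm.tier, vm.name, vm.size_gb)
```

The point isn’t the code, it’s the discipline: when the evacuation order is written down before the storm, the only question left on the day is whether the replicas are current.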
Most people used to look at this capability as almost out of reach due to the costs of storage, servers, and assorted other equipment, but with companies that specialize in providing these services on the upswing, what was once expensive is merely another tool in the toolbox.
One thing a lot of people who rode out the storm pointed out was that there might be some services you want in the cloud, period. One, of course, is email, but the biggest pieces were services such as payroll and accounting. While some of this can be achieved through online services, others found it easier just to keep certain functions as VMs (for payroll, whatever tools they needed lived on a virtualized workstation), and users always went there anyway to do that job. So the VM it ran on became one of those machines that was placed at the remote site and simply kept up to date.
Next time, we examine some of the security issues that popped up during recovery situations.