UPS Devices, SOX and Other Disasters

Return the Princess of Darkness. I had been asked by management to give them (meaning the SOX auditors or Demons straight from Hell, depending on how you want to address them) the nickel tour of the datacenters.  There were a lot of questions, many of which will filter their way into future articles.

I remember she leaned down and looked at one of our gleaming, brand-new APC UPS devices and asked sweetly, “How do you know they’ll actually work?”  I met her with shocked silence.  After all, a UPS is one of those things you take on faith, right?  Just like the Sun coming up in the morning, they’re that reliable.  When you need them, they’ll respond.

Unfortunately, it’s not that simple.

Now, Uninterruptible Power Supplies (UPS) are a staple for IT.  And what she was asking, in her roundabout, Devil’s advocate way, was how much thought we had really put into them.  Or had we just gone online, said, “Gee, that one looks nice,” and bought it?

Honestly, I think that’s how we purchased the thing, but there’s a science to selecting them.

So, let’s talk a little about them.  UPSs are designed to keep systems up during a brownout or a power bump, or to give you enough time to shut everything down gracefully.  If you recall the chart from one of the previous articles, electrical outages are high on the list of what causes companies to activate emergency plans.  So our UPS devices become our first and best line of defense against this happening.  Unfortunately, too many times we go out, we buy them (and they may or may not serve our needs even then), and then just forget they exist.  A UPS device is one of those things we expect to work when we need it, but just like a spare tire on a car, unless we check it every now and again, it might not work the way we expect.

First, how do you select a UPS?  Many manufacturers have a site where you can go and start plugging in numbers, such as how many systems you have, usually broken down by maker, model number, etc.  You also plug in how much time you want to stay up.  While the first part isn’t a big deal, the second is.  The longer you want to stay up with your load, the higher the cost.  So someplace in there we just have to bite the bullet and realize that batteries are there to keep us up and going for a certain amount of time and, if that time passes, to allow us to shut the systems down with no losses.
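To make the load-versus-runtime tradeoff a little more concrete, here is a rough Python sketch of the kind of arithmetic those sizing sites do behind the scenes.  The device names, wattages, battery capacity, and the 0.9 efficiency factor are all made-up assumptions for illustration, and real batteries deliver less energy when they are drained quickly, so treat the manufacturer's runtime charts as the real answer.

```python
def total_load_watts(devices):
    """Sum the power draw, in watts, of every device on the UPS."""
    return sum(devices.values())


def estimate_runtime_minutes(battery_watt_hours, load_watts, efficiency=0.9):
    """Rough runtime estimate: usable stored energy divided by the load.

    `efficiency` is an assumed fudge factor for inverter losses, not a
    figure from any particular vendor.
    """
    if load_watts <= 0:
        raise ValueError("load_watts must be positive")
    return battery_watt_hours * efficiency / load_watts * 60


# Hypothetical equipment list; the wattages are invented for illustration.
devices = {"file server": 350, "mail server": 300, "core switch": 150}
load = total_load_watts(devices)

print(f"Load: {load} W")
print(f"Estimated runtime: {estimate_runtime_minutes(1000, load):.1f} minutes")
```

Doubling the runtime at the same load means roughly doubling the stored energy, which is why the second number you plug in drives the price.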

Second, a UPS has an expiration date, so to speak, or more specifically, the batteries in it do.  Over time, they begin to lose the ability to take or retain a charge, and most folks notice this when it’s too late (in short, when the lights go out, and so do the servers).  There are a number of tasks we have to perform as Network and System Admins, and one of them is that we need to, at the very least, conduct monthly checks of our UPS devices.  Most these days have a web interface that allows you to go in and get a report on the batteries.  Among the things it will tell you are how much of a charge they’re taking, a projected time when they’ll zero out under current load should power go out, how the batteries are doing, and which cells are bad.  Also, if certain issues occur, you can be notified about them via email (and you really should set those up).
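If you want the monthly check to be more than eyeballing a web page, something like the following Python sketch could turn the report into a list of warnings.  The dictionary keys and the thresholds are hypothetical, not any vendor's actual report format; you would fill the dictionary in from whatever your own UPS tells you.

```python
def check_battery_report(report, min_charge_percent=90, min_runtime_minutes=10):
    """Return a list of warnings for one UPS battery report.

    `report` is a plain dict with hypothetical keys (charge_percent,
    runtime_minutes, bad_cells); map them to whatever your UPS reports.
    The default thresholds are illustrative assumptions.
    """
    warnings = []
    if report["charge_percent"] < min_charge_percent:
        warnings.append(f"charge is only {report['charge_percent']}%")
    if report["runtime_minutes"] < min_runtime_minutes:
        warnings.append(
            f"projected runtime is only {report['runtime_minutes']} minutes"
        )
    if report["bad_cells"] > 0:
        warnings.append(f"{report['bad_cells']} bad cell(s) reported")
    return warnings


# Made-up readings for one device.
report = {"charge_percent": 72, "runtime_minutes": 6, "bad_cells": 1}
for warning in check_battery_report(report):
    print("UPS-1:", warning)
```

An empty list means the device passed that month's check, which is itself worth writing down.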

There’s a lot more to UPS devices than just plugging them in, plugging in our systems, and forgetting them.  They’re an ultimate form of insurance, and like any kind of insurance, you have to pay some attention to them.  The care and administration of your UPS devices is something that needs to be thought through, written down, and treated just like you would patch management or AV administration.  It’s important, and failure to pay attention to it can cost you in damaged servers and data loss, which translate into lost money.

That leads us to generators.  A generator would be a great thing to have, but whether you can justify one has to be looked at from a couple of different points of view, and they all boil down to this: how often do you really need one?  If you have an extended power outage, say, once a year, and it lasts maybe one or two hours, does that justify the expense of purchasing a generator, integrating it with your datacenter electrical system, and then the assorted monthly costs of maintaining it?

There’s another piece of this, and it’s this simple.  Is there a mandate anywhere that says we have to have one?  If you’re a healthcare facility, then it might apply, but a small company?

It all comes down to this: will having one cost more than the amount of money you’ll lose in a power outage?
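That comparison is simple enough to put into a few lines of Python.  This is only a back-of-the-envelope sketch: every dollar figure, the ten-year lifespan, and the function names are assumptions for illustration, and it ignores things like mandates or outages short enough for the UPS to ride out on its own.

```python
def annual_generator_cost(purchase, installation, monthly_maintenance, lifespan_years):
    """Spread the up-front costs over the lifespan and add a year of upkeep."""
    return (purchase + installation) / lifespan_years + monthly_maintenance * 12


def expected_annual_outage_loss(outages_per_year, hours_per_outage, loss_per_hour):
    """Money lost per year to extended outages if there is no generator."""
    return outages_per_year * hours_per_outage * loss_per_hour


def generator_is_justified(annual_cost, annual_loss):
    """True when the outages cost more per year than the generator would."""
    return annual_loss > annual_cost


# Hypothetical numbers: one two-hour outage a year at $3,000 per hour of downtime.
cost = annual_generator_cost(40000, 10000, 500, lifespan_years=10)
loss = expected_annual_outage_loss(1, 2, 3000)

print(f"Generator: ${cost:,.0f} per year, outages: ${loss:,.0f} per year")
print("Justified" if generator_is_justified(cost, loss) else "Not justified")
```

With these invented numbers the generator loses, but a longer outage or a higher hourly loss flips the answer quickly, so it is worth running with your own figures.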

If you do opt to get a generator, there are rules of the road there.  First, you need to test it, and you need to do it at least quarterly.  Most folks think starting it up is enough.  That might be okay for a monthly test, but every so often you need to see if it will actually do what it’s supposed to do, and that’s keep your datacenter up and going.  That means making sure that if the power goes out, it can power your systems.

It’s not a bad idea to test the batteries under similar conditions.  Yank the plug out of the wall and see if the systems stay up like they should.

I certainly wouldn’t do this while everyone’s using the systems; it’s something that has to be planned out and scheduled to minimize the impact of the outage.  We’ll talk a little more about that when we talk about exercises later.

Oh, and since SOX is hot on documentation, you may be asked to present logs showing when the devices were tested, the results of maintenance, etc.  So keep track of that.  I like to make it all part of a ticket and file it under monthly tasks.
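If your ticketing system doesn't give you an easy way to pull those records back out, even a plain CSV file beats nothing.  Here is a minimal Python sketch of one way to keep such a log; the column layout and the device and test names are my own assumptions, not anything SOX prescribes.

```python
import csv
import datetime
import io


def log_test(stream, device, test, result, notes="", date=None):
    """Append one test record to an open text stream as a CSV row.

    Columns are: date, device, test, result, notes. The date defaults
    to today when none is given.
    """
    date = date or datetime.date.today()
    csv.writer(stream).writerow([date.isoformat(), device, test, result, notes])


# An in-memory buffer keeps the example self-contained; for a real log,
# open a file in append mode, e.g. open(path, "a", newline="").
log = io.StringIO()
log_test(log, "UPS-1", "monthly battery report", "pass")
log_test(log, "Generator", "quarterly load test", "fail", "did not pick up the load")
print(log.getvalue())
```

One row per test, with a date on it, is usually what an auditor is really asking to see.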

Richard is a freelance IT consultant, a blogger, and a teacher for Saisoft, where he teaches VMware Administration, Citrix XenApp, Disaster Planning and Recovery for IT, and CompTIA Server+.