Tips to Avoid Detrimental Changes in VMware

Maintaining the performance and security of your virtual environment on a daily basis is an important — and non-trivial — task. VMware monitoring tools are a critical component of a solid security strategy because they help you quickly detect and remediate unauthorized changes and errors that could lead to downtime, data loss or compliance failures.

Every time I build a new VM, add more CPU or RAM, or add more disk space, there’s an impact on the infrastructure as a whole. Now, granted, ESXi is very good at working with these changes, and there are a lot of built-in tools that help. However, the fact still remains that there’s only so much in terms of resources to go around, and sooner or later, I could end up running out if I’m not careful.

So, before we discuss tracking changes, we need to understand why it matters and how a change affects us.

The Case:

I have 4 Windows servers built on an ESXi 6.0 host, so let’s take a look at how running them and installing applications impacts the host. I go to the “Performance Charts.” VM-W2K3 is turned on, and CPU usage is going higher.

06-10-2016_VM1

After powering the machine on, there’s no big difference in CPU usage. Look at the two areas circled in green.

06-10-2016_VM2

So let’s talk about what’s different between VM-W2K3-3 and the other two machines. I set VM-W2K3-3 up with a reservation. I’m treating it as a mission-critical server, and I wanted it to always have 250 MHz of processor power set aside just for it.

06-10-2016_VM3

As long as that machine is off, the reservation doesn’t mean anything. The minute I turn it on, 250 MHz of my already very limited processor power is guaranteed to this machine. That means the host can no longer promise that capacity to any other machine.
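To make the arithmetic concrete, here is a minimal Python sketch of how a reservation shrinks the capacity the host has left to guarantee to other VMs once the reserved VM is powered on. The 2000 MHz host size and the `unreserved_mhz` helper are assumptions made up for illustration; only the 250 MHz reservation on VM-W2K3-3 comes from the example above.

```python
def unreserved_mhz(host_capacity_mhz, reservations_mhz, powered_on):
    """Return the CPU MHz the host can still guarantee to other VMs.

    Only powered-on VMs count: a reservation on a powered-off VM
    holds nothing back.
    """
    reserved = sum(
        mhz for vm, mhz in reservations_mhz.items() if vm in powered_on
    )
    return host_capacity_mhz - reserved


# Hypothetical host size; the 250 MHz reservation is from the article.
HOST_CAPACITY_MHZ = 2000
reservations = {"VM-W2K3-3": 250}

print(unreserved_mhz(HOST_CAPACITY_MHZ, reservations, powered_on=set()))          # 2000
print(unreserved_mhz(HOST_CAPACITY_MHZ, reservations, powered_on={"VM-W2K3-3"}))  # 1750
```

The smaller the host, the bigger the bite each reservation takes out of what is left for everything else.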

Another big difference here is that VMware Tools is out of date. While VMware Tools helps with things like making the mouse easier to use, its biggest contribution is what goes on behind the scenes. It sits between the VM and the host and helps facilitate better use of resources. So, I’m also going to get it updated and see if that makes a difference.

Now I’ve done exactly the same thing with VM-W2K3-4. Let’s turn it on and see where we’re at.

After the green light appeared, VM-W2K3-4 was powered on. Notice that total usage is higher than before, but it’s behaving rather well.

06-10-2016_VM4

So next, we sock it to the machines. The danger zone, per VMware, is getting these processors above 80-90 percent and keeping them there. If they stay up there, I really need to look at my processors and either add more or get rid of some VMs.

I’ve started a heavy load on each VM to simulate a software/user load on the machines. After several minutes of letting it run, I see that my CPU usage is nearly maxed out. According to the VMware troubleshooting definition, I am officially in deep trouble.

06-10-2016_VM5
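The important word in that rule of thumb is "keeping": a short spike is not the problem, a sustained run is. As a rough sketch, assuming CPU usage is sampled at a fixed interval as percentages, the check could look like this in Python. The `sustained_above` helper, the sample values, and the five-sample window are all illustrative assumptions, not anything VMware ships.

```python
def sustained_above(samples, threshold_pct=80.0, min_consecutive=5):
    """Return True if `samples` contains at least `min_consecutive`
    back-to-back readings at or above `threshold_pct`."""
    run = 0
    for pct in samples:
        # Extend the current run, or reset it when usage drops below the line.
        run = run + 1 if pct >= threshold_pct else 0
        if run >= min_consecutive:
            return True
    return False


# Made-up CPU usage samples, in percent.
brief_spike = [35, 40, 92, 95, 41, 38, 36, 40]
under_load = [40, 85, 91, 96, 97, 98, 97, 99]

print(sustained_above(brief_spike))  # False
print(sustained_above(under_load))   # True
```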

Remember, we’re looking at one very small piece of what’s going on. To get the full picture, you need to look at what’s happening in memory usage, disk usage, network usage and so on. In almost every case, you’ll see something very similar to what’s happening above.

Now, this was intended to demonstrate what growth does to your system. If you don’t have a means to regulate growth, you’ll get into a mess very quickly.

What does monitoring changes have to do with this?

The whole key to getting and keeping yourself out of trouble in the infrastructure is acknowledging that changes have impacts. To make sure those changes aren’t detrimental, we need to implement change management. That starts with two things:

  • A ticketing system: when someone needs a VM, or a change to a VM, they have to create a ticket. This serves a couple of purposes: it cuts down on frivolous requests, and it gives us the record we want for SOX and HIPAA purposes.
  • A process to go with the ticketing system. At a minimum, you’ll need representation from senior management, storage, networking, backup, security, and VM Operations on your Change Management Board (CMB). As your system evolves, more people will come into the fold.

Questions the CMB must ask:

  • Does the machine fit a business need?
  • What impact, if any, will it have on the current environment? In short, if we build it, are we going to be hurting ourselves? This is where knowing your resource utilization is priceless. It also lets you project where you’ll be if you do build it.
  • What’s the expected life span of the machine?
  • Are there sufficient resources outside of VM Operations to support the machine (can we back it up, are there enough security licenses, etc.)?

If you don’t have the resources to run the machine adequately, or if it really serves no business need, then the request should be rejected by the CMB.
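The resource side of that decision can be roughed out in a few lines of Python. This is only a sketch under stated assumptions: the `Resources` type, both helper functions, the capacity figures, and the 80 percent ceiling are hypothetical, and a real review would also cover network, backup, and licensing.

```python
from dataclasses import dataclass


@dataclass
class Resources:
    cpu_mhz: float
    ram_mb: float
    disk_gb: float


def projected_utilization(capacity, in_use, request):
    """Return projected utilization per resource (1.0 = fully used)
    if the requested VM is approved."""
    return {
        name: (getattr(in_use, name) + getattr(request, name)) / getattr(capacity, name)
        for name in ("cpu_mhz", "ram_mb", "disk_gb")
    }


def over_ceiling(projection, ceiling=0.80):
    """List the resources whose projected utilization exceeds the ceiling."""
    return sorted(name for name, ratio in projection.items() if ratio > ceiling)


# Hypothetical numbers for a single host and one VM request.
capacity = Resources(cpu_mhz=8000, ram_mb=32768, disk_gb=1000)
in_use = Resources(cpu_mhz=5000, ram_mb=20000, disk_gb=600)
request = Resources(cpu_mhz=2000, ram_mb=4096, disk_gb=100)

print(over_ceiling(projected_utilization(capacity, in_use, request)))  # ['cpu_mhz']
```

An empty list means the request fits under the ceiling on every resource checked. Anything else gives the board a specific resource to discuss before approving.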

Along with change management, we need a way to monitor the environment and protect ourselves against unauthorized changes. Every change that shows up in monitoring should be matched by a ticket. Monitoring also helps us evaluate the impact of changes and take whatever corrective action is needed to bring things back in line with the change management process.
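Matching detected changes against tickets is simple to picture in code. The Python sketch below assumes each detected change is a small record with an optional ticket reference. The record layout, the `unticketed_changes` helper, and the ticket IDs are invented for illustration and do not reflect how any particular auditing product stores its data.

```python
def unticketed_changes(changes, approved_tickets):
    """Return the detected changes that have no matching approved ticket."""
    return [c for c in changes if c.get("ticket") not in approved_tickets]


# Hypothetical ticket IDs and change records.
approved_tickets = {"CHG-101", "CHG-102"}
changes = [
    {"vm": "VM-W2K3-3", "what": "CPU reservation set to 250 MHz", "ticket": "CHG-101"},
    {"vm": "VM-W2K3-4", "what": "RAM increased", "ticket": None},
    {"vm": "VM-W2K3", "what": "Disk added", "ticket": "CHG-999"},
]

for change in unticketed_changes(changes, approved_tickets):
    print(change["vm"], "-", change["what"])
# VM-W2K3-4 - RAM increased
# VM-W2K3 - Disk added
```

Both a missing ticket and a ticket that was never approved end up on the list, which is the set of changes someone has to explain.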

This is where I’d drag in Netwrix Auditor for VMware to do the job for me. It provides visibility into VMware vSphere, vCenter, and standalone ESXi hosts. Netwrix Auditor also serves as a tripwire for potential issues. As an example, this is one change that popped up in the morning report:

06-10-2016_VM6

What it’s telling me is that VMware Tools is out of date. Ideally, I should have known this or known it was coming, but in case the machine got missed in updates, it serves as notice that we have an issue. At this point, I’d open a ticket and start planning to get that one machine updated. This email would become part of the ticket as well (you always want to give your auditors as much info as you can).

In summary, anytime we make a change in our virtual environment, we should expect a change in the resources available. These changes must be planned for and approved. An effect must have a cause, and monitoring is one way to help make sure process and procedure are adhered to.
