This post continues Brian Svidergol’s series on securing Active Directory administration. The series has three parts, each dedicated to a specific problem in securing Active Directory administration. For deeper insight into the topic, you are welcome to read the first part, which focuses on the principle of least privilege, and the second part, which covers infrastructure considerations. Today, in the final part of the series, you can learn about auditing, monitoring, and alerting.
The topic of monitoring alone could take up an entire book. So could auditing and alerting! So, for this blog post, we will only talk about a few small areas of these systems at a high level. The most common complaint I see and hear about monitoring an environment is that administrators get too many alerts. And too many alerts usually lead to one thing: slow response, or no response, to incidents. Many people are aware of this conundrum, but in a lot of organizations that awareness doesn’t lead to a configuration change. I want to talk about three areas: why you should audit both success and failure, how monitoring plays into securing Active Directory, and how to reduce the number of alerts so that incident response improves.
- Why you should audit success and failure. Look at your auditing configuration. Are you auditing only for failures? If so, you aren’t alone! But it is time to rethink that strategy. Auditing for both success and failure is highly recommended. Let me bring up a couple of examples of the importance of auditing for success. First, say you are investigating a configuration change on a domain controller. Making that change required a successful authentication and some type of remote administration (RDP or MMC). In this case, the entire path is littered with successful actions. If you aren’t capturing them with auditing, how much harder will it be to figure out who did what, and when? Quite a bit. Another example: you see a very large number of failure audits related to a service account authenticating to a domain controller. The good news: you captured the failed attempts with auditing. The bad news: you aren’t sure whether the attempts stopped because the attacker gave up or because he actually got in. Auditing success and failure helps tie events like these together.
- How monitoring plays into securing Active Directory. Monitoring Active Directory covers so many topics that entire white papers have been written about it. For today, I’m not going to dive into health monitoring and thresholds. Instead, let’s first focus on security groups, starting with the Domain Admins group (although the following recommendation applies to all of your critical security groups). You should monitor all changes to the group membership, both additions and removals. If you don’t, you probably won’t ever know whether anybody was temporarily added (for example, to cause havoc on your network) and then removed a couple of hours later. Now let’s change gears to Group Policy, an area that is often overlooked when monitoring Active Directory. Many ugly Group Policy scenarios come to mind. An attacker who wants to launch a phishing attack across an enterprise, but finds that the current Internet Explorer settings block it or reduce its effectiveness, can make a quick change to a GPO and go. An attacker who needs to push malware to all domain-joined computers can create a GPO and go. As you can see, monitoring Group Policy plays a big role in the overall monitoring strategy. You need to know when new GPOs are created, when GPO links change, and when GPO settings change. You also want to know who made each change. Finally, administrative subnets come into play. Centralized administrative subnets allow highly targeted monitoring and open up the potential to automate big swaths of the initial monitoring configuration. Instead of targeting monitoring at specific servers, specific IP addresses, or specific collections of computers (containers, groups, or other collections), you can target it at entire subnets.
- Reduce the number of alerts. If you have ever deployed System Center Operations Manager (OpsMgr), you already know where I’m going. Today, monitoring tools are incredibly sophisticated. They come out of the box with a vast array of built-in monitors customized for server roles, and OpsMgr has management packs for just about everything! That is the good news. The bad news is that the moment you turn everything on and complete the base configuration, administrators are overwhelmed by how many “problems” exist in their environment. At first, it is exhilarating: tons of activity and a bunch of low-hanging fruit. But as days and then weeks pass, the alerts keep coming. The next thing you know, the nifty little Monitoring folder you created in your Inbox is loaded with thousands of unread alert messages. You stop watching the Monitoring folder in real time and just peruse it when you have some downtime. This is common. Of course, it can get worse. I once worked with a company where some administrators had removed their SMS addresses from the notifications because the volume of SMS alerts was outrageous. I’ve talked to many people about reducing the number of alerts, but it usually takes an incident before action is taken. The typical incident begins with an outage. Once everything is back up, the investigation of the outage often looks at monitoring. Surely this is a good time to add or enhance monitoring so that administrators will know about the problem ahead of time next time, right? Yes, except that in this case, the administrators did know ahead of time. They got 15 alerts on Wednesday afternoon, currently messages #550 through #564 in a Monitoring folder that contains 2,600 messages. Nobody noticed them. When an incident like this occurs, people finally begin looking at reducing the number of alerts to make things more manageable.
You can reduce the number of alerts or hire 10 people to stare at monitors and logs all day (or, in organizations with big budgets, you can do both). My strategy for reducing alerts is pretty simple. Every time an issue crops up (an outage, a performance degradation, and so on), I look at monitoring. I find out whether we knew about it and, if we did, whether we knew early enough to take action and avoid the issue. I find out whether the monitoring gave us enough information to act. I find out whether the right people were notified. Often, administrators in one department receive alerts for more than just their department, and this happens on multiple teams. Then, when an alert comes in, even if administrators see it, nobody takes action because each of them assumes somebody else did. In my monitoring strategy, I also ask a key question whenever an alert comes in: does the alert require me or anybody else to take action? If not, the alert’s value is suspect. In a perfect world, every alert would lead to action by the recipient. People would only get alerts when they needed to take action, and each alert would contain everything they needed to know. To summarize my strategy for reducing the number of alerts, follow these steps:
- Ascertain whether any action was taken on each alert that was received. If not, why not? Could the alert be removed without impacting anything? If so, remove it. The data will still be in the monitoring database for reporting, but it won’t clog up your phone or e-mail.
- Are you getting alerts that others are taking action on? If so, are they also getting those alerts? If they are, consider removing yourself. If they aren’t, add them to the alert and then remove yourself. Only the people who will take action should get an alert.
- Are you getting notified on your phone via SMS about issues that are not critical (such as a disk with 20% free space)? If so, remove yourself. SMS notifications should only go out for production issues and outages, or for impending ones. If a disk has 20% free space but won’t fill up for another 8 days, you shouldn’t be getting an SMS about it yet.
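To make the service-account example from the first point concrete, here is a minimal Python sketch showing why success audits matter: it can only tell “the attacker gave up” from “the attacker got in” if both failure and success events are collected. It assumes Security log events have already been exported and parsed into simple records; the `AuthEvent` type, the 20-failure threshold, and the 30-minute window are illustrative assumptions, not recommended values. Event IDs 4625 (failed logon) and 4624 (successful logon) are standard Windows Security log IDs, but depending on how the account authenticates, a domain controller may instead record Kerberos or NTLM credential-validation events (such as 4771 or 4776), so adjust the IDs to your own audit data.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Standard Windows Security log event IDs.
LOGON_SUCCESS = 4624  # An account was successfully logged on
LOGON_FAILURE = 4625  # An account failed to log on


@dataclass
class AuthEvent:
    """One parsed audit record (hypothetical shape; adapt to your export)."""
    time: datetime
    event_id: int
    account: str


def failures_then_success(events, min_failures=20, window=timedelta(minutes=30)):
    """Return (account, failure_count, success_time) for every success that
    follows at least min_failures failures for the same account within window."""
    findings = []
    pending = {}  # account -> failure times seen since that account's last success
    for ev in sorted(events, key=lambda e: e.time):
        if ev.event_id == LOGON_FAILURE:
            pending.setdefault(ev.account, []).append(ev.time)
        elif ev.event_id == LOGON_SUCCESS:
            recent = [t for t in pending.get(ev.account, []) if ev.time - t <= window]
            if len(recent) >= min_failures:
                findings.append((ev.account, len(recent), ev.time))
            pending[ev.account] = []  # a success closes out the failure burst
    return findings
```

If only failures are audited, the same function never sees a success event and returns nothing, which is exactly the ambiguity described in the first point.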
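For the Domain Admins recommendation in the second point, the following sketch flags every membership change to a critical group and pairs each addition with a later removal to surface temporary memberships. It assumes membership-change events from domain controllers are already parsed into records; the `GroupEvent` field names, the example group list, and the 24-hour window are assumptions for illustration. The event IDs are the standard Security log IDs for a member being added to or removed from security-enabled global (4728/4729), domain local (4732/4733), and universal (4756/4757) groups.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Security log event IDs for membership changes in security-enabled groups:
# global (4728 added / 4729 removed), domain local (4732 / 4733),
# universal (4756 / 4757). Domain Admins is a global group.
ADDED = {4728, 4732, 4756}
REMOVED = {4729, 4733, 4757}

# Example list only; extend it with every group you consider critical.
CRITICAL_GROUPS = {"Domain Admins", "Enterprise Admins", "Schema Admins", "Administrators"}


@dataclass
class GroupEvent:
    """One parsed membership-change record (hypothetical shape)."""
    time: datetime
    event_id: int
    group: str
    member: str
    changed_by: str


def membership_changes(events):
    """Every addition to or removal from a critical group; each deserves an alert."""
    return [e for e in events
            if e.group in CRITICAL_GROUPS and e.event_id in (ADDED | REMOVED)]


def temporary_memberships(events, max_duration=timedelta(hours=24)):
    """Pair each addition with the next removal of the same member from the
    same group; report pairs that are no more than max_duration apart."""
    open_adds = {}  # (group, member) -> the pending addition event
    found = []
    for ev in sorted(membership_changes(events), key=lambda e: e.time):
        key = (ev.group, ev.member)
        if ev.event_id in ADDED:
            open_adds[key] = ev
        elif key in open_adds:
            added = open_adds.pop(key)
            if ev.time - added.time <= max_duration:
                found.append((ev.group, ev.member, added.changed_by, added.time, ev.time))
    return found
```

Alerting on `membership_changes` covers the “monitor all changes” advice, while `temporary_memberships` adds a second, higher-priority signal for the add-then-remove pattern that is easy to miss in a plain list of changes.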
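The alert-reduction steps above can be sketched as two small helpers: one lists alert rules that fired but never led to any action (candidates for removal), and one routes a low-disk alert only to the owning team, using SMS only when the disk is projected to fill soon. Everything here is hypothetical: the record shapes, the 25% free-space threshold, and the two-day SMS horizon are illustrative policy choices, not OpsMgr settings. The projection is a simple linear estimate, days until full = free space / daily growth, so a 500 GB disk with 100 GB (20%) free that grows by 12.5 GB per day has 100 / 12.5 = 8 days left and goes to e-mail rather than SMS.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical policy: send SMS only when the disk is projected to fill within this many days.
SMS_HORIZON_DAYS = 2.0


def removal_candidates(history):
    """history: iterable of (rule_name, acted_on) pairs from past alerts.
    Returns rules whose alerts never led to any action, sorted by name."""
    fired, acted = set(), set()
    for rule, acted_on in history:
        fired.add(rule)
        if acted_on:
            acted.add(rule)
    return sorted(fired - acted)


@dataclass
class DiskSample:
    server: str
    owner_team: str           # the team that will actually act on the alert
    size_gb: float
    free_gb: float
    growth_gb_per_day: float  # e.g. averaged from recent samples


def days_until_full(d: DiskSample) -> Optional[float]:
    """Linear projection of days until free space reaches zero; None if not growing."""
    if d.growth_gb_per_day <= 0:
        return None
    return d.free_gb / d.growth_gb_per_day


def route_alert(d: DiskSample, free_pct_threshold: float = 25.0):
    """Return (channel, recipients), or None when no alert is needed."""
    free_pct = 100.0 * d.free_gb / d.size_gb
    days = days_until_full(d)
    imminent = days is not None and days <= SMS_HORIZON_DAYS
    if free_pct >= free_pct_threshold and not imminent:
        return None
    recipients = [d.owner_team]  # only the people who will take action
    return ("sms", recipients) if imminent else ("email", recipients)
```

Keying SMS on the projected time to full, rather than on the free-space percentage alone, is what separates an impending outage from a disk that merely looks low.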
I hope these topics hit home and get you thinking about these areas in your own environment. As Bruce Schneier said back in April 2000, “security is a process, not a product.” Have fun, and safe computing!