Organizations have limited resources to invest in safeguarding their data. Knowing exactly what data needs protection will help you set priorities and develop a sound plan so you can allocate your budget and other resources wisely, minimizing security and compliance costs.
The best place to start is by classifying your data. Data classification provides a solid foundation for a data security strategy because it helps you identify the data at risk in the IT network, both on premises and in the cloud. Moreover, it helps you improve decision-making and get rid of unneeded data to reduce storage costs.
In this article, we define data classification and explore the steps involved in getting started.
Data classification definition
What is data classification? It is the process of organizing both structured and unstructured data into agreed-on categories. Data classification enables more efficient use and protection of critical data across the organization, including facilitating risk management, legal discovery and compliance processes.
For years, it was up to users to classify data they created, sent, modified or otherwise touched. Today, organizations have options for automating classification of new data that users create or the organization collects. They can also discover and classify older data, or choose to leave it to gradually be retired without being classified.
What is data discovery? It is the process of scanning repositories to locate data. Data discovery can serve many purposes, such as enterprise content search, data governance, and data analysis and visualization. When combined with data classification, it helps organizations identify repositories that might contain sensitive information so they can make informed decisions about how to properly protect that data.
Benefits of data classification
Data classification helps you improve both data security and regulatory compliance.
To safeguard sensitive corporate and customer data adequately, you must first know and understand your data. Specifically, you need to be able to answer the following questions:
- What sensitive data — such as intellectual property (IP), protected health information (PHI), personally identifiable information (PII), and credit card numbers — do you store?
- Where does this sensitive data reside?
- Who can access, modify and delete it?
- How will your business be affected if this data is leaked, destroyed or improperly altered?
Having answers to these questions, along with information about the threat landscape, enables you to protect sensitive data by assessing risk levels, prioritizing your efforts, and planning and implementing appropriate data protection and threat detection measures.
Compliance standards require organizations to protect specific data, such as cardholder information (PCI DSS), health records (HIPAA), financial data (SOX) or EU residents’ personal data (GDPR). Data discovery and classification help you determine where these types of data are located so you can make sure that appropriate security controls are in place and that the data is trackable and searchable as required by regulations. By focusing your compliance efforts on data that falls under the regulations you’re subject to, you increase your chances of maintaining day-to-day compliance and passing audits.
Guidelines for data classification
There is no one-size-fits-all approach to data classification. However, the classification process can be broken down into five key steps, which you can tailor to meet your organization’s unique needs as you develop your general data protection strategy.
Step #1. Establish a data classification policy
First, you should define a data classification policy and communicate it to all employees who work with sensitive data. The policy should be short and simple and include the following basic elements:
- Objectives — The reasons data classification has been put into place and the goals the company expects to achieve from it.
- Workflows — How the data classification process will be organized and how it will impact employees who use different categories of sensitive data.
- Data classification scheme — The categories that the data will be classified into (discussed in more detail below).
- Data owners — The roles and responsibilities of the business units, including how they should classify sensitive data and grant access to it.
- Handling instructions — Security standards that specify appropriate handling practices for each category of data, such as how it must be stored, what access rights should be assigned, how it can be shared, when it must be encrypted, and retention terms and processes. Since these guidelines may change, it is best to maintain them as a separate document.
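Handling instructions are easier to enforce and audit when they are also kept in a machine-readable form. The following is a minimal sketch, assuming the three-level scheme (Public, Internal, Restricted) described later in this article. The `HandlingRule` fields, the `rule_for` helper and all values are hypothetical placeholders, not recommendations; your own policy defines the real ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HandlingRule:
    """Handling requirements for one category (illustrative fields only)."""
    encrypt_at_rest: bool
    external_sharing: bool
    retention_days: int


# Hypothetical values; replace them with the terms of your own policy.
HANDLING_RULES = {
    "public": HandlingRule(
        encrypt_at_rest=False, external_sharing=True, retention_days=365
    ),
    "internal": HandlingRule(
        encrypt_at_rest=False, external_sharing=False, retention_days=730
    ),
    "restricted": HandlingRule(
        encrypt_at_rest=True, external_sharing=False, retention_days=2555
    ),
}


def rule_for(category: str) -> HandlingRule:
    """Look up handling rules; unknown categories get the strictest rule."""
    return HANDLING_RULES.get(category.lower(), HANDLING_RULES["restricted"])
```

Falling back to the strictest rule for an unknown category is a design choice in this sketch: a typo or a newly added category then results in too much protection rather than too little.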
Step #2. Discover the sensitive data you already store
Now it’s time to apply your classification policies to your existing data. You could choose to classify only new data, but then business-critical or confidential data you already have might be left insufficiently protected.
Rather than manually trying to identify databases, file shares and other systems that might contain sensitive information, consider investing in a data discovery application that will automate the process. Some tools report both the volume and the potential category of the data.
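To illustrate what automated discovery does at its simplest, here is a rough sketch that scans text for values that look like U.S. Social Security numbers or payment card numbers and reports how many of each it finds. The patterns, the category names and the restriction to `.txt` files are assumptions made for this example; commercial discovery tools use much richer rules and support many more repositories and formats.

```python
import re
from pathlib import Path

# Hypothetical detection patterns; real tools use far richer rules.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,16}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 13:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan_text(text: str) -> dict[str, int]:
    """Count potential sensitive values, by category, in one piece of text."""
    counts = {"ssn": len(SSN_RE.findall(text)), "credit_card": 0}
    for match in CARD_RE.finditer(text):
        # The checksum filters out digit runs that only look like card numbers.
        if luhn_valid(match.group()):
            counts["credit_card"] += 1
    return counts


def scan_directory(root: str) -> dict[str, dict[str, int]]:
    """Scan .txt files under `root`; report only files with findings."""
    report = {}
    for path in Path(root).rglob("*.txt"):
        counts = scan_text(path.read_text(errors="ignore"))
        if any(counts.values()):
            report[str(path)] = counts
    return report
```

The per-file counts correspond to the "volume and potential category" that a discovery report provides. Pattern matches are only candidates, so results like these still need review before they drive access or retention decisions.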
Step #3. Apply labels
Each sensitive data asset needs a label (tag) in accordance with your data classification scheme; this will help you enforce your data classification policy. Labeling can be automated or done manually by data owners.
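Automated labeling can be as simple as mapping the data types found in an asset to a label and keeping the most sensitive one. The sketch below assumes the three-level scheme (Public, Internal, Restricted) described later in this article. The `TYPE_TO_LABEL` mapping, the type names and the `label_for` helper are hypothetical and would come from your own policy.

```python
from enum import IntEnum


class Label(IntEnum):
    """Three-level scheme, ordered from least to most sensitive."""
    PUBLIC = 1
    INTERNAL = 2
    RESTRICTED = 3


# Hypothetical mapping from detected data types to labels.
TYPE_TO_LABEL = {
    "ssn": Label.RESTRICTED,
    "credit_card": Label.RESTRICTED,
    "phi": Label.RESTRICTED,
    "marketing_research": Label.INTERNAL,
    "support_contact": Label.PUBLIC,
}


def label_for(detected_types: set[str], default: Label = Label.INTERNAL) -> Label:
    """Return the most sensitive label among the detected data types.

    Unknown types, and assets with no detected types, fall back to
    `default`, so nothing is labeled Public unless a rule says so.
    """
    labels = [TYPE_TO_LABEL.get(t, default) for t in detected_types]
    return max(labels, default=default)
```

For example, `label_for({"ssn", "support_contact"})` returns `Label.RESTRICTED`, because one Restricted data type is enough to raise the label of the whole asset.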
Step #4. Use the results to improve security and compliance
Once you know what sensitive data you have and its storage locations, you can review your security and privacy policies and procedures to assess whether all data is protected by risk-appropriate measures. By categorizing all your sensitive data, you can prioritize your efforts, control costs and improve data management processes.
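One way to run this review is to compare each labeled asset against the controls its label calls for and list the gaps. The following is a minimal sketch; the `Asset` fields, the lowercase label names and the two checks are assumptions chosen for illustration, and a real review would cover whatever controls your handling instructions require.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    """A labeled data asset with two illustrative control attributes."""
    path: str
    label: str              # "public", "internal" or "restricted"
    encrypted: bool
    open_to_everyone: bool  # readable by all users in the organization


def find_gaps(assets: list[Asset]) -> list[tuple[str, str]]:
    """Return (path, problem) pairs for assets that lack expected controls."""
    gaps = []
    for asset in assets:
        if asset.label == "restricted" and not asset.encrypted:
            gaps.append((asset.path, "restricted data is not encrypted"))
        if asset.label != "public" and asset.open_to_everyone:
            gaps.append((asset.path, "non-public data is open to everyone"))
    return gaps
```

The resulting list gives you a starting point for prioritizing remediation, with gaps on Restricted data addressed first.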
Step #5. Repeat
Data is dynamic: Files are created, copied, moved and deleted every day. Therefore, data classification must be an ongoing process in the organization. Proper administration of the data classification process will help ensure that all sensitive data is protected.
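To keep classification current without rescanning everything, you can compare snapshots of a repository and process only what changed. The sketch below assumes a file share that can be read directly and uses content hashes to detect changes. The function names and the three result categories are hypothetical, and reading every file to hash it may be too slow for very large repositories.

```python
import hashlib
from pathlib import Path


def snapshot(root: str) -> dict[str, str]:
    """Map each file path under `root` to a SHA-256 digest of its content."""
    result = {}
    for path in Path(root).rglob("*"):
        if path.is_file():
            result[str(path)] = hashlib.sha256(path.read_bytes()).hexdigest()
    return result


def diff_snapshots(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Compare two snapshots and list the files that need attention."""
    return {
        # New files have never been classified.
        "to_classify": sorted(p for p in new if p not in old),
        # Changed files may no longer match their existing label.
        "to_reclassify": sorted(p for p in new if p in old and new[p] != old[p]),
        # Deleted files can be dropped from the inventory.
        "deleted": sorted(p for p in old if p not in new),
    }
```

Running `snapshot` on a schedule and passing the previous and current results to `diff_snapshots` yields the files to send back through discovery and labeling.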
Types of data classification categories
There is no one right way to design your data classification model and define your data classification categories. For instance, the U.S. government classifies national security information at three levels: Confidential, Secret and Top Secret. Other organizations, such as NATO, maintain their own multi-level schemes.
One option is to begin with a simple three-level type of data classification:
- Public data — Data that may be freely disclosed to the public, such as customer service contact email addresses and phone numbers.
- Internal data — Data that has low security requirements but is not meant for public disclosure. Examples include business data like marketing research and sales phone scripts.
- Restricted data — Highly sensitive internal data whose disclosure could negatively affect operations and put the organization at financial or legal risk. Restricted data requires the highest level of security protection. Examples include data protected by regulations or confidentiality agreements, such as patient health information, PII of customers or employees (e.g., Social Security numbers), and authentication data (such as user IDs and passwords).
Your organization can use these three types of data classification categories to define an initial data classification model, and later add more granular levels based on your specific data, compliance requirements and other business needs.
As you can see, data classification is not a magic wand that ensures data security or compliance with regulatory requirements by itself. Rather, it helps organizations identify the data most critical to the business so they can focus their limited time and financial resources on ensuring appropriate data protection and ongoing compliance with security policies and regulations.