Data Classification for Compliance: Looking at the Nuances

Data classification is essential for achieving, maintaining and proving compliance with a wide range of regulations and standards. For example, PCI DSS, HIPAA, SOX and GDPR all have different purposes and requirements, but data classification is necessary for compliance with all of them — after all, you need to accurately identify and tag health records, cardholder information, financial documents and other regulated data in order to protect that data appropriately.

This guide offers effective methods for approaching data classification for compliance based on which regulations and standards apply to your organization.

Why do businesses need to classify data according to different compliance regulations?

Data classification requirements stem from various laws and standards specific to industries and regions. These laws and standards aim to ensure that organizations handle data in a way that protects privacy, maintains security and supports ethical use.

Complying with these regulations helps organizations protect sensitive information so they can avoid data breaches, legal repercussions and steep fines. Compliance also builds trust with customers and stakeholders by demonstrating a commitment to maintaining data privacy and security.

How does data compliance impact businesses?

Complying with requirements for data security and privacy offers businesses a wide range of benefits, including the following:

Reduced risk of financial damage — Failure to comply with regulations can result in substantial fines, penalties or legal actions, which can significantly impact a company’s finances and reputation.
Protection of reputation and trust — Adhering to data compliance regulations helps build trust with customers, clients and other stakeholders.
Reduced risk of security breaches — Following compliance regulations reduces the likelihood of data breaches and all their costly consequences, including disruption to business operations and lasting brand damage.
Operational efficiency and transparency — Achieving and maintaining compliance often necessitates a more structured and organized approach to data handling, which can increase operational efficiency, data management and transparency in business practices.
Global market access — Adhering to international data compliance standards like the EU’s GDPR can enable businesses to operate in a broader market and expand their customer base.
Innovation and competitive advantage — Companies that prioritize data compliance are often better positioned to adapt to emerging technologies and changing regulations, giving them a competitive advantage in the market.
Customer loyalty — Compliance demonstrates a commitment to protecting customer data and privacy rights, which is essential for building and maintaining strong customer relationships.

What are the top challenges in data classification for compliance?

The sheer complexity of evolving compliance requirements is a stumbling block for many organizations. Indeed, compliance officers report that one of their top compliance-related challenges is the speed and volume of regulatory change.

Another top issue is a lack of skilled resources; in fact, 60% of cybersecurity professionals report that the global shortage of cybersecurity talent places their organization at risk. Another important challenge is focus: Although 95% of organizations recognize data privacy as a business imperative, only 33% of security professionals list data protection and governance as top job responsibilities.

How can organizations build a good data classification model?

Organizations are free to design their own data classification models and categories to achieve compliance. While this allows you to create systems that meet your business needs, it can be complicated to establish a data classification policy that ensures sensitive data is handled according to risk level across the information lifecycle.

A good starting point is to define an initial data classification model and then add more granular levels based on the types of data you collect and your specific compliance obligations. For instance, you might begin with the three basic categories of Restricted, Private and Public, and then implement additional levels if you deal with multi-jurisdictional challenges or data sprawl across internal and external data repositories.

Your data protection policy should also include provisions for scalability. As your organization grows, your classification scheme must adapt to handle an increasing volume and variety of data. This is particularly important when dealing with complex and diverse content, such as structured, semi-structured and unstructured datasets, each of which might contain multiple data types that require fine-grained classification.

Finally, being compliant and proving compliance are two different issues, so your data classification model should also be designed with audit readiness in mind. Many organizations struggle to pass compliance audits due to the level of documentation and evidence required, even if they have an appropriate classification scheme in place. The tight deadlines associated with compliance audits don’t allow much extra time for gathering evidence and reports, particularly when you’re maintaining day-to-day business operations.

Data Classification for Regulations that Protect Personally Identifiable Information (PII)

Personally identifiable information (PII) is data that could be used to identify, contact or locate a specific individual or distinguish one person from another. Examples of PII can include:

Name
Birth date
Address
Social Security number
State-issued driver’s license number
State-issued identification card number, passport number
Credit card number
Financial account number in combination with a code or password that grants access to the account
Medical or health insurance information

When considered separately, some of these details might not seem terribly sensitive. However, the United States General Accounting Office estimates that the identity of 87% of Americans can be determined using a combination of the person’s gender, date of birth and ZIP code. Accordingly, if a breach of those three elements would likely also compromise the individual’s name, home address, SSN or other personal data, those elements should be considered sensitive.
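
The point about combinations can be illustrated with a minimal sketch that flags a dataset as sensitive when its fields include all three elements together. The field names (`gender`, `birth_date`, `zip_code`) are hypothetical and would need to be mapped to your own schema.

```python
from typing import Iterable

# Hypothetical field names for the three elements named in the text.
QUASI_IDENTIFIERS = frozenset({"gender", "birth_date", "zip_code"})


def contains_identifying_combination(fields: Iterable[str]) -> bool:
    """Return True only if every quasi-identifier is present in the fields."""
    return QUASI_IDENTIFIERS <= set(fields)


# A table holding all three elements is flagged; one holding only two is not.
survey_fields = ["gender", "birth_date", "zip_code", "favorite_color"]
newsletter_fields = ["zip_code", "gender"]
flagged = contains_identifying_combination(survey_fields)
not_flagged = contains_identifying_combination(newsletter_fields)
```

A check like this only looks at field names, so it complements rather than replaces a review of the actual data.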

Federal statutes protecting PII include:

Gramm-Leach-Bliley Act — Financial information
Health Insurance Portability and Accountability Act (HIPAA) — Healthcare information
Family Educational Rights and Privacy Act (FERPA) — Students’ educational records
Children’s Online Privacy Protection Act (COPPA) — PII of children under 13

To effectively achieve PII data classification, it is necessary to determine the following:

The level of confidentiality that the data requires
The potential impact that a personal information breach or data corruption would have on the individuals involved
The importance of data availability

Data Classification for NIST 800-53

The National Institute of Standards and Technology (NIST) provides guidance to help organizations improve data security. NIST Special Publication (SP) 800-53 details security and privacy controls for federal information systems and organizations, including how agencies should maintain their systems, applications and integrations to ensure confidentiality, integrity and availability.

NIST 800-53 is mandatory for all federal agencies and their contractors. It’s also useful for organizations in the private sector.

What does NIST 800-53 request in terms of data classification?

The data classification standard for NIST involves three categories — low impact, moderate impact and high impact. These categories are assigned based on the potential damage to agency operations, agency assets, or individuals that could result from unauthorized disclosure of the data by a malicious internal or external actor.

An impact value is assigned for each security objective (confidentiality, integrity and availability), which is used to assign the overall security impact level. The NIST 800-53 data classification policy employs the concept of a “high watermark,” which means that the final level assigned is the highest across the confidentiality, integrity and availability inputs. Thus, if any of the three areas is categorized as high impact, the overall NIST data classification level is high impact.
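
The high watermark rule is simple enough to express in a few lines. The sketch below is illustrative only: the `Impact` enum and the `overall_impact` function are names invented for this example, not part of any NIST tooling.

```python
from enum import IntEnum


class Impact(IntEnum):
    """Impact levels, ordered so that max() picks the most severe one."""
    LOW = 1
    MODERATE = 2
    HIGH = 3


def overall_impact(confidentiality: Impact, integrity: Impact, availability: Impact) -> Impact:
    """Apply the high watermark: the overall level is the highest of the three inputs."""
    return max(confidentiality, integrity, availability)


# Hypothetical system: low impact for confidentiality and integrity,
# but high impact for availability.
level = overall_impact(Impact.LOW, Impact.LOW, Impact.HIGH)
print(level.name)  # HIGH
```

Using an ordered enum keeps the comparison explicit and avoids string comparisons such as "high" versus "moderate".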

Which types of data are protected under NIST 800-53?

There are no explicit data classification levels for NIST in the same manner as some other standards. However, NIST Special Publication 800-53 Rev. 5 outlines the following categories:

Classified Information — NIST does not specifically define levels of classified information, such as Confidential, Secret and Top Secret; they are generally governed by separate government standards and protocols, such as those outlined by Executive Order 13526.
Controlled Unclassified Information (CUI) — CUI is data that is not Classified Information but that still requires safeguarding. For example, certain data may be considered as CUI because its improper disclosure could pose a risk to national security, requiring protection despite not being formally classified.
Unclassified information — This is general information that isn’t sensitive enough to warrant particular protection measures. It is typically content that is open to public access.
Organizations can define other categories. For example, NIST suggests that the “Planning and Budgeting” category may include elements like budget formulation, capital planning, tax and fiscal policy documents, which in general have a low impact level on confidentiality, integrity and availability. However, organizations are encouraged to review special factors that might affect impact levels, such as the premature public release of a draft budget.

Data Classification for ISO 27001

ISO/IEC 27001 is an international standard for the establishment, implementation, maintenance and continuous improvement of an information security management system (ISMS). This voluntary standard is useful for organizations across all industries. During an ISO 27001 audit, organizations need to show that they have a good understanding of what their assets are, the value of each, data ownership, and scenarios of internal use of data.

What types of data are protected under ISO 27001?

ISO/IEC 27001 doesn’t specify an exact list of regulated information. Instead, each organization should determine the scope of the data environment and perform a review of all in-scope data. The scope must consider the internal and external threats, interested parties’ requirements, and dependencies between the organization’s activities.

What does ISO 27001 request in terms of data classification?

Information classification is critical to ISO 27001 compliance since the objective is to ensure that information receives an appropriate level of protection. The ISO standard requires companies to perform information asset inventory and classification, assign information owners, and define procedures for acceptable data use.

There is no specific ISO 27001 data classification policy that specifies which security controls should be applied to classified data. Rather, section A.8.2 gives the following instructions:

Classify data — Information should be classified according to legal requirements, value and sensitivity to unauthorized disclosure or modification. The framework doesn’t provide exact examples of classification levels, so organizations can develop their own schemes. Often, three or four levels of classification are used, such as Restricted, Confidential and Public.
Label data — The organization should develop procedures to label information according to its classification scheme. The process includes labeling data in both digital and physical formats. The labeling system needs to be clear and easy to manage.
Establish rules for handling data — The organization must establish rules for protecting data based on its classification, such as access restrictions or encryption.
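
One way to connect the three steps above is to keep handling rules in a single lookup keyed by classification label. The following is a minimal sketch: the labels reuse the example levels from the text, while the roles and encryption settings are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HandlingRule:
    encrypt_at_rest: bool
    allowed_roles: frozenset


# Illustrative policy only; the roles and encryption choices are assumptions.
HANDLING_RULES = {
    "Restricted": HandlingRule(True, frozenset({"data_owner"})),
    "Confidential": HandlingRule(True, frozenset({"data_owner", "employee"})),
    "Public": HandlingRule(False, frozenset({"data_owner", "employee", "anyone"})),
}


def can_access(label: str, role: str) -> bool:
    """Check a role against the rule for a label; unknown labels are denied."""
    rule = HANDLING_RULES.get(label)
    return rule is not None and role in rule.allowed_roles
```

Denying access for unknown labels means that unlabeled data fails closed rather than being treated as public.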

Data Classification for GDPR

Data inventory and classification are also critical to compliance with the EU’s General Data Protection Regulation (GDPR). The text of the EU’s GDPR does not use the terms “data inventory” or “mapping,” but these processes are essential to protect personal data and manage a data security program that complies with the data privacy law. For example, data inventory is the first step in complying with the requirement to manage records of processing activities, including establishing the categories of data, the purpose of processing, and a general description of the relevant technical solutions and organizational security measures.

Organizations need to perform a data protection impact assessment (DPIA) that covers all processes involved in the collection, storage, use or deletion of personal data. The DPIA should also assess the value or confidentiality of the information and the potential violation of privacy rights or distress individuals might suffer in the event of a security breach.

Which personal data is protected under the GDPR?

The GDPR defines personal data as any information that can identify a natural person, directly or indirectly, such as:

Name
Identification number
Location data
Online identifier
One or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of the person

What does GDPR request in terms of data classification?

To properly implement GDPR data classification, organizations may need to consider the following facts about the data:

Data type (financial information, health data, etc.)
Basis for data protection
Categories of individuals involved (customers, patients, etc.)
Categories of recipients (especially international third-party vendors)
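
These facts map naturally onto a simple inventory record. The sketch below assumes a hypothetical `ProcessingRecord` structure with one field per fact listed above; the sample values are placeholders, not a statement of what the GDPR requires in any particular case.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessingRecord:
    data_type: str                    # e.g., financial information, health data
    protection_basis: str             # basis for data protection
    individual_categories: list[str]  # e.g., customers, patients
    recipient_categories: list[str] = field(default_factory=list)


def records_for_individuals(records: list[ProcessingRecord], category: str) -> list[ProcessingRecord]:
    """Return the records that involve a given category of individuals."""
    return [r for r in records if category in r.individual_categories]


# Placeholder inventory entries for illustration.
records = [
    ProcessingRecord("health data", "consent", ["patients"], ["international third-party vendor"]),
    ProcessingRecord("financial information", "contract", ["customers"]),
]
patient_records = records_for_individuals(records, "patients")
```

Keeping these attributes in one record makes it easier to answer questions such as which data types are shared with external recipients.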

Record keeping for GDPR and ISO 27001 framework

The record-keeping requirements for GDPR compliance are very similar to those described above for ISO 27001 compliance, so following the approach of ISO 27001 helps companies meet GDPR requirements as well.

Data Classification for PCI DSS

The Payment Card Industry Data Security Standard (PCI DSS) was developed to secure cardholder data globally. Organizations must implement technical and operational measures to mitigate vulnerabilities and secure payment card transactions.

What is payment card information?

Payment card information is defined as a credit card number (also referred to as a primary account number or PAN) in combination with one or more of the following data elements:

Cardholder name
Service code
Expiration date
CVC2, CVV2 or CID value
PIN or PIN block
Contents of a credit card’s magnetic stripe

What does PCI DSS request in terms of data classification?

PCI data classification involves classifying cardholder data elements according to their type, storage permissions and required level of protection. Organizations must document all instances of cardholder data and ensure that no such data exists outside of the defined cardholder data environment (CDE).
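
As a rough sketch, storage permissions can be recorded per data element and checked against what a system actually stores. The mapping below reflects the commonly cited PCI DSS split between cardholder data and sensitive authentication data; the element names are hypothetical, and the mapping should be verified against the current version of the standard.

```python
from enum import Enum


class Storage(Enum):
    PERMITTED = "permitted"
    PERMITTED_UNREADABLE = "permitted if rendered unreadable"
    NOT_PERMITTED = "not permitted after authorization"


# Hypothetical element names; verify the permissions against the standard itself.
ELEMENT_STORAGE = {
    "pan": Storage.PERMITTED_UNREADABLE,
    "cardholder_name": Storage.PERMITTED,
    "service_code": Storage.PERMITTED,
    "expiration_date": Storage.PERMITTED,
    "card_verification_code": Storage.NOT_PERMITTED,
    "pin": Storage.NOT_PERMITTED,
    "magnetic_stripe": Storage.NOT_PERMITTED,
}


def storage_violations(stored_fields):
    """Return, in sorted order, the stored fields that must not be retained."""
    return sorted(
        name for name in stored_fields
        if ELEMENT_STORAGE.get(name) is Storage.NOT_PERMITTED
    )


violations = storage_violations(["pan", "pin", "cardholder_name", "card_verification_code"])
```

Fields that are not in the mapping are ignored here; a stricter variant could report them for manual review instead.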

According to the Netwrix 2020 Data Risk and Security Report, 75% of financial organizations that classify data can detect data misuse in minutes, while those who don’t mostly need days (43%) or months (29%). This highlights the importance of data classification for PCI DSS compliance purposes.


Data Classification for HIPAA

The HIPAA Security Rule establishes baseline administrative, physical and technical safeguards for ensuring the confidentiality, integrity and availability of protected health information (PHI). Electronic PHI (ePHI) is any PHI stored on or transmitted by electronic media. Electronic storage media include computer hard drives as well as removable media like optical disks and memory cards. Transmission media include the internet or private networks.

PHI data classification includes the following details about a patient:

Name
Address
Any date directly related to an individual (such as birth date, date of admission or discharge, or date of death), as well as the exact age of individuals older than 89
Telephone or fax number
Email address
Social Security number
Medical record number
Health plan or health insurance beneficiary number
Vehicle identifier, serial number or license plate number
Web URL or IP address
Biometric identifiers, such as fingerprint, voice print or full-face photo
Any other unique identifying number, characteristic or code
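
Some of the identifiers above follow recognizable formats, so a first-pass scan of free text can be sketched with regular expressions. The patterns below are deliberately simplistic assumptions (a U.S.-style SSN, a basic email shape and an IPv4 address); real discovery tools use far more robust detection and validation.

```python
import re

# Simplified, illustrative patterns; they will miss variants and can produce false positives.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def find_identifiers(text: str) -> dict:
    """Return the matches for each identifier type that appears in the text."""
    found = {}
    for name, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            found[name] = matches
    return found


# Made-up sample text for illustration.
sample = "Patient jane@example.com, SSN 123-45-6789, seen from 10.0.0.5."
hits = find_identifiers(sample)
```

Matches from a scan like this are candidates for review, not proof that a document contains PHI.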

What does HIPAA request in terms of data classification?

HIPAA requires organizations to ensure the integrity of ePHI and protect it from being altered or destroyed in an unauthorized manner. Therefore, each covered entity or business associate must inventory their ePHI and identify the risks to its confidentiality, availability and integrity. The organization must identify where the ePHI is stored, received, maintained or transmitted. Organizations can gather this data by reviewing past projects, performing interviews, and reviewing documentation.

HIPAA data classification guidelines require grouping data according to its level of sensitivity. Classification of data will aid in determining baseline security controls for the protection of data. Organizations can start with a simple three-level data classification:

Restricted/confidential data — Data whose unauthorized disclosure, alteration or destruction could cause significant damage. This data requires the highest level of security and controlled access following the principle of least privilege.
Internal data — Data whose unauthorized disclosure, alteration or destruction could cause low or moderate damage. This data is not for public release and requires reasonable security controls.
Public data — Although public data doesn’t need protection against unauthorized access, it still needs protection against unauthorized modification or destruction.

Data Classification for SOX

While the Sarbanes-Oxley Act (SOX) doesn’t specifically mandate data classification, implementing robust data classification practices is a pivotal step in aligning with its objectives. By categorizing and securing sensitive financial data, companies can fortify internal controls, prevent unauthorized access, and uphold the accuracy and integrity of financial information — indispensable aspects of SOX compliance.

What does SOX request in terms of data classification?

Proper data classification can assist in complying with the following sections of SOX:

Section 302: Corporate Responsibility for Financial Reports — Requires the CEO and CFO to certify the accuracy of financial reports. Data classification can help organizations ensure the accuracy and reliability of their financial information.
Section 404: Assessment of Internal Controls — Requires companies to maintain adequate internal controls for financial reporting. Proper data classification is essential to knowing which data requires what level of protection.
Section 802: Criminal Penalties for Altering Documents — Prohibits the alteration, destruction or concealment of records. Proper data classification aids in identifying crucial records and applying appropriate controls to prevent unauthorized alteration or deletion.

Data Classification for California Consumer Privacy Act (CCPA)

The California Consumer Privacy Act (CCPA) is a robust privacy law that aims to give consumers more control over their personal information collected by businesses. While the CCPA doesn’t explicitly prescribe data classification methodologies, it heavily emphasizes the protection and categorization of consumer data. CCPA data classification is vital for compliance as it enables businesses to effectively manage and safeguard the personal information they collect. By categorizing data based on its sensitivity, businesses can more readily identify, control and protect personal information as required by the CCPA.

What does CCPA request in terms of data classification?

Under the CCPA, organizations are required to implement measures that involve the classification and handling of personal data in a manner that ensures the protection and privacy of consumers’ information. Again, while the CCPA doesn’t explicitly dictate data classification methodologies, it does necessitate that companies undertake several actions:

Identify and categorize personal information — Businesses must identify and categorize the types of personal information they collect, process or store. This includes but is not limited to:
- Names
- Addresses
- Social Security numbers
- Biometric data
- Geolocation information
- Online identifiers

Implement security measures — The law requires organizations to establish robust security measures to safeguard personal data. Encryption, access controls and other security practices are commonly used to protect sensitive data.
Uphold consumer rights — CCPA grants consumers rights regarding their personal information, including the right to know what data is being collected, the right to request deletion of their data, and the right to opt out of the sale of their data. Data classification aids in identifying and managing the data subject to these rights so organizations can respond appropriately to consumer requests.
Ensure transparency and accountability — Businesses need to be transparent about their data practices and accountable for how they handle consumer information. Data classification supports these requirements by enabling organizations to document and demonstrate their data handling processes and compliance efforts.
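
To show how categorized data helps with consumer requests, here is a minimal in-memory sketch. The `PersonalDataInventory` class, its method names and the sample values are all hypothetical; a real implementation would sit on top of your actual data stores.

```python
from collections import defaultdict


class PersonalDataInventory:
    """Toy in-memory inventory of personal information, keyed by consumer."""

    def __init__(self):
        # Maps consumer_id -> {category: value}
        self._records = defaultdict(dict)

    def add(self, consumer_id, category, value):
        self._records[consumer_id][category] = value

    def categories_for(self, consumer_id):
        """Support a request to know: list the categories held for a consumer."""
        return sorted(self._records.get(consumer_id, {}))

    def delete(self, consumer_id):
        """Support a deletion request: remove all entries and return how many were removed."""
        return len(self._records.pop(consumer_id, {}))


# Made-up consumer and values for illustration.
inventory = PersonalDataInventory()
inventory.add("consumer-1", "geolocation", "37.77,-122.42")
inventory.add("consumer-1", "name", "Jane Doe")
held = inventory.categories_for("consumer-1")
removed = inventory.delete("consumer-1")
```

Because every entry carries a category, the same structure can answer both what is held about a consumer and what was removed on request.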

Data Classification for Cybersecurity Maturity Model Certification

The Cybersecurity Maturity Model Certification (CMMC) is a framework developed by the U.S. Department of Defense (DoD) to enhance the cybersecurity posture of defense contractors and subcontractors. One major component of CMMC is the emphasis on protection and categorization of sensitive information. Data classification enables organizations to systematically categorize and safeguard sensitive data, aligning with the security objectives outlined in the CMMC framework.

What does CMMC request in terms of data classification?

CMMC encompasses five levels of cybersecurity maturity, with higher levels mandating more stringent controls for protecting sensitive information:

Level 1: Basic Cyber Hygiene — This level focuses on basic cybersecurity controls and practices, such as maintaining antivirus software, establishing basic password requirements and conducting employee cybersecurity training.
Level 2: Intermediate Cyber Hygiene — Level 2 includes the establishment of documented policies and the implementation of controlled processes to ensure a more structured approach to cybersecurity.
Level 3: Good Cyber Hygiene — Level 3 represents a significant advancement in an organization’s cybersecurity practices. It involves the implementation of a comprehensive and well-documented set of security policies and practices. This level aligns with the protection of CUI, as defined earlier.
Level 4: Proactive — At Level 4, organizations need to show a proactive approach to cybersecurity, including reviewing and adapting cybersecurity practices regularly to address more complex and evolving threats, such as advanced persistent threats (APTs).
Level 5: Advanced/Progressive — Organizations at this level demonstrate an advanced and highly adaptive approach to cybersecurity. This includes continuously monitoring, reviewing and improving cybersecurity processes and practices to swiftly detect and mitigate sophisticated cyber threats.

Data classification is essential to meeting the requirements of each CMMC level. It involves applying labels to data and then implementing access controls, encryption and other security measures to protect that data appropriately. Effectively classifying data also strengthens the cyber resilience of defense contractors in today’s dynamic threat landscape.

Data Classification for Federal Information Security Management Act (FISMA)

The Federal Information Security Management Act (FISMA) provides a set of guidelines and security standards. FISMA compliance is required for U.S. federal agencies, state agencies that administer federal programs like Medicare, and private businesses that have a contractual relationship with the U.S. government.

What does FISMA request in terms of data classification?

FISMA data classification requirements mandate that agencies identify and categorize the various types of information they handle. This involves a detailed assessment to distinguish between different levels of sensitivity for different data, such as PII and intellectual property (IP).

Other key FISMA requirements include the following:

Data-handling policies and procedures — Federal agencies are expected to establish specific policies and procedures for handling different categories of data. This might involve implementing access controls, encryption, data loss prevention mechanisms, and specific security protocols for different categories of information. For instance, policies could dictate that PII be encrypted both in transit and at rest, while classified documents might require restricted access and stringent control measures.
Risk assessment and management — FISMA mandates a comprehensive risk management approach. Agencies are required to conduct detailed risk assessments for various data categories and implement controls commensurate with the level of risk associated with each category.
Auditing and reporting — Compliance with FISMA standards includes regular auditing and reporting. Data classification enables agencies to effectively demonstrate compliance by showcasing that appropriate security controls are applied to protect different categories of data and that they undergo regular assessments and audits.
Incident response plans — Effective data classification assists in incident response. Agencies can develop precise response plans tailored to different categories of data, allowing for swift and appropriate actions in the event of a security breach or incident.
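
The example policy mentioned above, that PII be encrypted both in transit and at rest, can be checked automatically once data stores carry a category label. The sketch below is an assumption-laden illustration: the `DataStore` fields and the sample stores are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataStore:
    name: str
    category: str
    encrypted_at_rest: bool
    encrypted_in_transit: bool


# Categories that the illustrative policy requires to be encrypted everywhere.
REQUIRES_FULL_ENCRYPTION = {"PII"}


def encryption_findings(stores):
    """Return the names of stores that violate the encryption policy."""
    return [
        s.name for s in stores
        if s.category in REQUIRES_FULL_ENCRYPTION
        and not (s.encrypted_at_rest and s.encrypted_in_transit)
    ]


# Invented stores for illustration.
stores = [
    DataStore("hr-database", "PII", True, True),
    DataStore("legacy-file-share", "PII", False, True),
    DataStore("public-website", "Public", False, True),
]
findings = encryption_findings(stores)
```

A report of such findings can also serve as evidence for the auditing and reporting requirement.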

Summary

The major compliance regulations have a lot in common when it comes to data classification. In general, organizations should follow this process:

Define the purpose of data classification, such as:
1. To mitigate the risks associated with unauthorized disclosure and access (e.g., PCI DSS)
2. To comply with industry standards that require information classification (e.g., ISO 27001)
3. To uphold data subject rights and retrieve specific information in a set timeframe (e.g., GDPR)
Define the scope of the data environment, and then perform a review of all in-scope data.
Define levels of data sensitivity and classify the data. Start with a minimum number of levels so as not to overcomplicate the process.
Develop data handling guidelines to ensure the security of each category of data.

Farrah Gamboa

Senior Director of Product Management at Netwrix. Farrah is responsible for building and delivering on the roadmap of Netwrix products and solutions related to Data Security and Audit & Compliance. Farrah has over 10 years of experience working with enterprise scale data security solutions, joining Netwrix from Stealthbits Technologies where she served as the Technical Product Manager and QC Manager. Farrah has a BS in Industrial Engineering from Rutgers University.
