Data classification is a critical part of any information security and compliance program. It involves identifying the types of data that an organization stores and processes, and the sensitivity of that data, based on sets of rules. For example, data classification is often used to identify data regulated by compliance standards like HIPAA or GDPR.
Data classification offers multiple benefits. It is invaluable for effectively prioritizing your security controls and ensuring proper protection of your most critical assets — for example, you might encrypt all documents that are classified as “restricted.” It facilitates risk management by helping organizations assess the value of their data and the impact that would be caused if certain types of data were lost, misused or compromised. Data classification also streamlines legal discovery and drives user productivity by making data easier to find.
Finally, it is essential to ensuring compliance with regulations and passing audits, in both the public and private sectors, because it helps organizations protect the privacy of regulated data, such as cardholder data (PCI DSS), health records (HIPAA) or EU residents’ personal data (GDPR). Unfortunately, according to the Netwrix 2020 Data Risk and Security Report, 66% of CISOs and compliance officers are not sure if they store regulated data only in secure locations, even though most of them work in organizations subject to PCI DSS (51%) and GDPR (45%).
Organizations usually design their own data classification models and categories. For instance, U.S. government agencies often define three data types, Public, Secret and Top Secret, while organizations in the private sector usually start by classifying data as restricted, private or public. The best practice is to define an initial data classification model, and later add more granular levels based on your specific data, compliance requirements and other business needs.
In this article, we will review how to approach data classification based on which regulations and standards your organization is subject to:
- Data Classification for Regulations that Protect Personally Identifiable Information (PII)
- Data Classification for NIST 800-53
- Data Classification for ISO 27001
- Data Classification for GDPR
- Data Classification for PCI DSS
- Data Classification for HIPAA
Personally identifiable information (PII) is data that could be used to identify, contact or locate a specific individual or distinguish one person from another. PII is often defined as a person’s first name or first initial and last name in combination with one or more of the following data elements:
- Social Security number
- State-issued driver’s license number
- State-issued identification card number, passport number
- Credit card number
- Financial account number in combination with a security code, access code or password that would permit access to the account
- Medical or health insurance information
Federal statutes protecting PII include:
- Gramm-Leach-Bliley Act — Financial information
- Health Insurance Portability and Accountability Act (HIPAA) — Healthcare information
- Family Educational Rights and Privacy Act (FERPA) — Students’ educational records
- Children’s Online Privacy Protection Act (COPPA) — PII of children under 13
To classify PII, it is necessary to determine the following:
- The level of confidentiality that the data requires
- The potential impact that a personal information breach or data corruption would cause to the individuals involved
- The importance of data availability
The United States General Accounting Office estimates that 87% of Americans can be identified using a combination of gender, date of birth and ZIP code. Taken separately, these details might not seem particularly sensitive. However, if a breach of those three elements would likely also compromise the individual’s name, home address, SSN or other personal data, those elements should be considered sensitive.
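One way to check whether a combination of seemingly harmless fields is identifying in your own data is to count how many records share each combination. The following is a minimal sketch in Python, not a complete re-identification analysis; the field names (`gender`, `birth_date`, `zip`) and the sample records are hypothetical.

```python
from collections import Counter

# Hypothetical quasi-identifier fields; adjust to your own schema.
QUASI_IDENTIFIERS = ("gender", "birth_date", "zip")


def unique_fraction(records, fields=QUASI_IDENTIFIERS):
    """Return the fraction of records whose combination of `fields` is unique."""
    if not records:
        return 0.0
    keys = [tuple(record[f] for f in fields) for record in records]
    counts = Counter(keys)
    unique = sum(1 for key in keys if counts[key] == 1)
    return unique / len(records)


# Made-up sample data for illustration only.
sample = [
    {"gender": "F", "birth_date": "1980-01-15", "zip": "02139"},
    {"gender": "F", "birth_date": "1980-01-15", "zip": "02139"},
    {"gender": "M", "birth_date": "1975-06-30", "zip": "94105"},
    {"gender": "F", "birth_date": "1991-11-02", "zip": "60614"},
]

print(unique_fraction(sample))  # 0.5
```

A high fraction suggests that the combination should be treated as sensitive even though no single field is.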
To satisfy the information security requirements of the Federal Information Security Management Act (FISMA), the Computer Security Division of the National Institute of Standards and Technology (NIST) developed Special Publication 800-53, Security and Privacy Controls for Information Systems and Organizations (NIST 800-53). NIST 800-53 details security and privacy controls for federal information systems and organizations, including how agencies should maintain their systems, applications and integrations in order to ensure confidentiality, integrity and availability. NIST 800-53 is mandatory for all federal agencies. It’s also useful for organizations in the private sector and those seeking to become contractors for any federal agency.
To pass a NIST compliance audit, organizations must categorize their information and information systems by security category so that they can apply the necessary cybersecurity resources. NIST recommends using three categories (low impact, moderate impact and high impact), which indicate the potential adverse impact on agency operations, agency assets or individuals if the data is disclosed without authorization by a malicious internal or external actor.
The categorization starts with identification of the information types. Each information type gets a provisional impact value (low, moderate or high) for each security objective (confidentiality, integrity and availability). After the provisional values have been reviewed and adjusted for all information types, each information system is assigned a final security impact level. NIST employs the concept of a “high watermark” when categorizing a system, which means that the overall system is categorized at the highest level across confidentiality, integrity and availability requirements. Thus, if at least one information type is categorized as high, the information system gets the highest impact level.
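The high-watermark rule described above can be sketched in a few lines of Python. This is an illustration only: the `Impact` enum, the `categorize_system` function and the example impact values are assumptions made for this sketch, not values taken from NIST guidance.

```python
from enum import IntEnum


class Impact(IntEnum):
    LOW = 1
    MODERATE = 2
    HIGH = 3


OBJECTIVES = ("confidentiality", "integrity", "availability")


def categorize_system(information_types):
    """Apply the high-watermark rule to a system's information types.

    `information_types` maps a type name to a dict of objective -> Impact.
    Returns the highest impact per objective and the overall system level.
    """
    if not information_types:
        raise ValueError("at least one information type is required")
    per_objective = {
        objective: max(levels[objective] for levels in information_types.values())
        for objective in OBJECTIVES
    }
    overall = max(per_objective.values())
    return per_objective, overall


# Hypothetical system; the impact values are illustrative only.
example_system = {
    "planning_and_budgeting": {
        "confidentiality": Impact.LOW,
        "integrity": Impact.LOW,
        "availability": Impact.LOW,
    },
    "personnel_records": {
        "confidentiality": Impact.HIGH,
        "integrity": Impact.MODERATE,
        "availability": Impact.LOW,
    },
}

per_objective, overall = categorize_system(example_system)
print(overall.name)  # HIGH
```

Because one information type is rated high for confidentiality, the whole system is categorized as high impact, even though every other value is low or moderate.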
NIST 800-53 applies to data in systems used to provide services for citizens or administrative and business services. NIST doesn’t give an exact list of information types; rather, it offers recommendations for reviewing information types of interest and considering their classification. Thus, each agency selects its own combination of elements belonging to information types. For example, NIST suggests that the “Planning and Budgeting” information type may include elements like budget formulation, capital planning, tax and fiscal policy, which in general may have a low-impact level on confidentiality, integrity and availability. However, each agency is encouraged to review special factors that might affect impact levels, such as premature public release of a draft budget.
ISO/IEC 27001 is an international standard for the establishment, implementation, maintenance and continuous improvement of an information security management system (ISMS). This voluntary standard is useful for organizations across all industries. During an ISO 27001 audit, organizations need to show that they have a good understanding of what their assets are, the value of each, data ownership, and scenarios of internal use of data.
ISO/IEC 27001 doesn’t specify an exact list of regulated information; it leaves that to each organization. The first step is to determine the scope of the data environment and perform a review of all in-scope data. The scope must consider the internal and external threats, interested parties’ requirements, and dependencies between the organization’s activities.
Information classification is critical to ISO 27001 compliance, since the objective is to ensure that information receives an appropriate level of protection. The ISO standard requires companies to perform an information asset inventory and classification, assign information owners, and define procedures for acceptable data use.
The framework doesn’t define a data classification policy or specify which security controls should be applied to the classified data. Rather, section A.8.2 gives the following three-step instructions:
- Classification of data — Information should be classified according to legal requirements, value and sensitivity to unauthorized disclosure or modification. The framework doesn’t provide exact examples of classification levels, so organizations in the government and private sectors can develop their own schemes. Often, three or four levels of classification are used, such as Restricted, Confidential and Public.
- Labelling of data — The organization should develop procedures to label information according to its classification scheme. The process includes labeling data in both digital and physical formats. The labeling system needs to be clear and easy to manage.
- Handling of data — The organization must establish rules for protecting data based on its classification, such as access restrictions or encryption.
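The three steps above (classify, label, handle) can be tied together in a small data structure. The sketch below is one possible shape, assuming a hypothetical three-level scheme; the level names, the `HANDLING_RULES` table and the `LabeledAsset` class are illustrative and are not defined by ISO 27001.

```python
from dataclasses import dataclass

# Hypothetical scheme; the standard leaves the levels and rules to each organization.
HANDLING_RULES = {
    "Public": {"encrypt_at_rest": False, "access": "anyone"},
    "Confidential": {"encrypt_at_rest": True, "access": "employees"},
    "Restricted": {"encrypt_at_rest": True, "access": "named individuals"},
}


@dataclass(frozen=True)
class LabeledAsset:
    name: str
    owner: str
    classification: str

    def __post_init__(self):
        # Reject levels that are not part of the scheme.
        if self.classification not in HANDLING_RULES:
            raise ValueError(f"unknown classification: {self.classification!r}")

    @property
    def label(self):
        """Human-readable label, usable in a file header or on a printed cover sheet."""
        return f"[{self.classification.upper()}] {self.name}"

    @property
    def handling(self):
        """Handling rules that follow from the classification."""
        return HANDLING_RULES[self.classification]


asset = LabeledAsset(name="payroll.xlsx", owner="HR", classification="Restricted")
print(asset.label)  # [RESTRICTED] payroll.xlsx
```

Keeping the handling rules in one table means that a change to the scheme is made in a single place and applies to every labeled asset.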
The text of the EU’s General Data Protection Regulation (GDPR) does not use the terms “data inventory” or “mapping,” but these processes are essential to protect personal data and manage a data security program that complies with the data privacy law. For example, data inventory is the first step in complying with the requirement to manage records of processing activities, including establishing the categories of data, the purpose of processing, and a general description of the relevant technical solutions and organizational security measures.
Companies need to review all data assets and understand which of them contain an individual’s personal data. Specifically, the Data Protection Impact Assessment (DPIA) requirement mandates an inventory of all processes that involve the collection, storage, use or deletion of personal data, as well as an assessment of the value or confidentiality of the information and the potential violation of privacy rights or distress individuals might suffer in the event of a security breach.
The GDPR defines personal data as any information that can identify a natural person, directly or indirectly, such as:
- An identification number
- Location data
- An online identifier
- One or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of the person
To comply with the GDPR, organizations need to incorporate controls like data discovery, data profiling, taxonomies for data sensitivity, and data asset cataloging. To classify data, companies may need to consider the following:
- Type of data (financial information, health data, etc.)
- Basis for data protection (personal or sensitive information)
- The categories of the individuals involved (customers, patients, etc.)
- The categories of recipients (especially international third-party vendors)
The record-keeping requirements for GDPR compliance are very similar to those described above for ISO 27001 compliance, so following the approach of the ISO 27001 helps companies meet GDPR requirements as well.
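A record of processing activities can be kept as structured data so that gaps are easy to spot. The sketch below is a simplified assumption of what such a record might contain, based on the elements mentioned in this section; the `ProcessingRecord` class and its field names are hypothetical and do not reproduce the full list of entries the regulation requires.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessingRecord:
    activity: str
    purpose: str
    data_categories: list
    data_subject_categories: list
    recipient_categories: list = field(default_factory=list)
    security_measures: list = field(default_factory=list)

    def missing_fields(self):
        """Return the names of entries that are still empty."""
        required = {
            "purpose": self.purpose,
            "data_categories": self.data_categories,
            "data_subject_categories": self.data_subject_categories,
            "security_measures": self.security_measures,
        }
        return [name for name, value in required.items() if not value]


# Illustrative record; all values are made up.
record = ProcessingRecord(
    activity="Newsletter delivery",
    purpose="Marketing communication",
    data_categories=["email address", "name"],
    data_subject_categories=["customers"],
    recipient_categories=["email service provider"],
)

print(record.missing_fields())  # ['security_measures']
```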
The Payment Card Industry Data Security Standard (PCI DSS) was developed to encourage the protection of cardholder data. It facilitates the broad adoption of consistent data security measures globally through a set of requirements administered by the PCI Security Standards Council (PCI SSC). PCI DSS compliance requirements include technical and operational measures designed to address vulnerabilities and secure personal consumer financial information like credit and debit card data used in payment card transactions.
Payment card information is defined as a credit card number (also referred to as a primary account number or PAN) in combination with one or more of the following data elements:
- Cardholder name
- Service code
- Expiration date
- CVC2, CVV2 or CID value
- PIN or PIN block
- Contents of a credit card’s magnetic stripe
Data classification is required as part of the regular risk assessment and security categorization process. Cardholder data elements should be classified according to their type, storage permission and required level of protection. This ensures that security controls apply to all sensitive data, and confirms that all instances of cardholder data in the environment are documented and that no cardholder data exists outside the defined cardholder data environment.
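To look for cardholder data outside the defined environment, a common first pass is to search text for digit sequences that look like a PAN and then filter them with the Luhn checksum. The following Python sketch assumes PANs of 13 to 19 digits, optionally separated by spaces or hyphens; the function names are hypothetical, and a production scanner would need additional rules (for example, issuer prefixes) to reduce false positives.

```python
import re

# Candidate PANs: 13 to 19 digits, optionally separated by single spaces or hyphens.
PAN_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number: str) -> bool:
    """Return True if a string of digits passes the Luhn checksum."""
    total = 0
    # Double every second digit, counting from the rightmost digit.
    for index, char in enumerate(reversed(number)):
        digit = int(char)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_pans(text: str):
    """Return the digit strings in `text` that look like valid card numbers."""
    found = []
    for match in PAN_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            found.append(digits)
    return found


# 4111 1111 1111 1111 is a widely used test number, not a real card.
print(find_pans("card 4111 1111 1111 1111 on file"))  # ['4111111111111111']
```

In practice, a scan like this should report only the location of a match, never log the number itself.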
According to the Netwrix 2020 Data Risk and Security Report, 75% of financial organizations that classify data can detect data misuse in minutes, while those who don’t mostly need days (43%) or months (29%). This highlights the importance of data classification for PCI DSS compliance purposes.
The HIPAA Security Rule establishes baseline administrative, physical and technical safeguards for ensuring the confidentiality, integrity and availability of electronic protected health information (ePHI), a subset of protected health information (PHI). PHI is similar to personally identifiable information, as discussed above. It covers any individually identifiable health information, including:
- Mental health history
- Healthcare services
- Payments for healthcare
- Other identifiable information, such as patient’s name, address or Social Security number
ePHI is defined as any protected health information that is stored in or transmitted by electronic media. Electronic storage media include computer hard drives, as well as removable or transportable digital memory media like optical disks and digital memory cards. Transmission media include the internet or private networks. Common examples of ePHI include:
- Address (including street address, city, county or zip code)
- Any date (except years) directly related to an individual, including birthday, date of admission or discharge, or date of death, as well as the exact age of individuals older than 89
- Telephone or fax number
- Email address
- Social Security number
- Medical record number
- Health plan or health insurance beneficiary number
- Vehicle identifier, serial number or license plate number
- Web URL or IP address
- Biometric identifier, such as fingerprint, voice print or full-face photo
- Any other unique identifying number, characteristic or code
The HIPAA Security Rule requires organizations to ensure the integrity of ePHI and protect it from being altered or destroyed in an unauthorized manner. Therefore, each covered entity or business associate should inventory its ePHI and identify the risks to its confidentiality, availability and integrity. The organization must identify where the ePHI is stored, received, maintained or transmitted. Organizations can gather this data by reviewing past projects, performing interviews, and reviewing documentation.
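Part of this inventory can be automated by scanning document text for some of the identifiers listed above. The sketch below is deliberately simplified: the three regular expressions and the `inventory_identifiers` function are assumptions for illustration, cover only a few identifier types, and will produce both false positives and false negatives on real data.

```python
import re

# Simplified, hypothetical patterns for a few identifier types.
IDENTIFIER_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def inventory_identifiers(documents):
    """Map each document name to the sorted identifier types found in its text.

    Documents with no matches are left out of the result.
    """
    inventory = {}
    for name, text in documents.items():
        hits = sorted(
            kind for kind, pattern in IDENTIFIER_PATTERNS.items() if pattern.search(text)
        )
        if hits:
            inventory[name] = hits
    return inventory


# Made-up documents for illustration only.
documents = {
    "intake.txt": "Patient SSN 123-45-6789, contact jane@example.org",
    "menu.txt": "Lunch is served at noon",
}

print(inventory_identifiers(documents))  # {'intake.txt': ['email', 'ssn']}
```

The result shows where potential ePHI lives, which can then feed the risk assessment and classification steps.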
HIPAA classification guidelines require grouping data according to its level of sensitivity. Classification of data will aid in determining baseline security controls for the protection of data. Organizations can start with a simple three-level data classification:
- Restricted/confidential data: Data whose unauthorized disclosure, alteration or destruction could cause significant damage. This data requires the highest level of security and controlled access in accordance with the principle of least privilege.
- Internal data: Data whose unauthorized disclosure, alteration or destruction could cause low or moderate damage. This data is not for release to the public and requires reasonable security controls.
- Public data: Although public data doesn’t need protection against unauthorized access, it still needs protection against unauthorized modification or destruction.
The major compliance regulations have a lot in common when it comes to data classification. In general, organizations should follow this process:
- Define the purpose of data classification, such as:
  - To mitigate the risks associated with unauthorized disclosure and access (e.g., PCI DSS)
  - To comply with industry standards that require information classification (e.g., ISO 27001)
  - To uphold data subject rights and retrieve specific information in a set timeframe (e.g., GDPR)
- Define the scope of the data environment, and then perform a review of all in-scope data.
- Define levels of data sensitivity and classify the data. Start with a minimum number of levels so as not to overcomplicate the process.
- Develop data handling guidelines to ensure the security of each category of data.