NetApp Data Classification

NetApp Data Classification gives you actionable insights into your data to maintain compliance, optimize storage, accelerate data migrations, and prepare data for GenAI and retrieval augmented generation (RAG).

In today's digital age, data is the lifeblood of any organization. But as data volumes explode and environments become more complex, how can you ensure that your data is not just managed, but harnessed for its full potential? Enter NetApp Data Classification—your partner in transforming data chaos into data clarity.

Data Classification, a core capability within NetApp Data Services, is a robust data governance service that provides comprehensive visibility for managing data across your NetApp footprint more effectively.

Data Classification automatically maps your data, determining how much data exists, where it's located, and what types and categories it contains. This lets you make informed decisions about your data in real time and act to optimize storage, accelerate data migrations, and prepare your data for GenAI and RAG, reducing risk and cost.

Using advanced AI, NetApp Data Classification simplifies data governance, giving you actionable insights to address data privacy, security and compliance requirements.

Features and Benefits

The capabilities that set NetApp Data Classification apart.

Quickly uncover compliance and security risks

Data discovery and classification

Identifying sensitive data is complex and especially important in enterprise environments. Often the sensitivity is specific to the organization (and potentially a specific domain or language). To define sensitive data accurately, AI is a must.

Data Classification goes much further than traditional pattern matching. Data Classification uses AI, machine learning (ML), and natural language processing (NLP) technologies to categorize and classify the data by sensitivity and compliance type, while highlighting potential security and/or compliance risks.

Personally Identifiable Information (PII)

Data Classification automatically identifies specific words, strings, and patterns in the data. It can recognize PII, credit card numbers, social security numbers, bank account numbers, and more.

To ensure accuracy, Data Classification checks its findings with proximity validation, which looks for one or more predefined keywords near the personal data that was found. For example, Data Classification identifies a number as an Australian Tax File Number (TFN) only if it finds a proximity phrase such as "TFN" or "Tax File" next to it.
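
The idea behind proximity validation can be sketched in a few lines of Python. This is an illustrative approximation only, not the product's implementation: the digit pattern, the keyword list, and the 40-character window are assumptions chosen for the example.

```python
import re

# Hypothetical pattern and keywords, for illustration only; the product's
# actual detection rules are not described in this document.
TFN_PATTERN = re.compile(r"\b\d{3}[ -]?\d{3}[ -]?\d{2,3}\b")
PROXIMITY_KEYWORDS = ("tfn", "tax file")
WINDOW = 40  # characters inspected on each side of a candidate (assumed)


def find_validated_tfns(text: str) -> list:
    """Return candidate numbers that have a proximity keyword nearby."""
    validated = []
    for match in TFN_PATTERN.finditer(text):
        # Look at a fixed-size window of text around the candidate number.
        start = max(0, match.start() - WINDOW)
        end = min(len(text), match.end() + WINDOW)
        context = text[start:end].lower()
        # Keep the candidate only if a predefined keyword appears close by.
        if any(keyword in context for keyword in PROXIMITY_KEYWORDS):
            validated.append(match.group())
    return validated
```

In this sketch, a number in "Employee TFN: 123 456 782 on file." is kept because "TFN" appears beside it, while the same digits in an unrelated sentence are ignored.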

Sensitive personal data

Data Classification also automatically identifies special types of sensitive personal information as defined by privacy regulations such as Articles 9 and 10 of the General Data Protection Regulation (GDPR), for example information regarding a person's health, ethnic origin, or sexual orientation. With its NLP abilities, Data Classification can distinguish between "George is Mexican" (indicating sensitive data) and "George is eating Mexican food" (not sensitive).

Key Benefits

Govern all of your NetApp data

  • Map, classify, and categorize your data for visibility and control.
  • Perform data hygiene tasks holistically across your hybrid NetApp data estate.

Optimize storage and reduce costs

  • Archive stale data.
  • Identify and remove duplicate data.
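
To make the stale-data and duplicate-data tasks above concrete, here is a minimal Python sketch of how such checks could be approximated on a local directory tree. It is not how Data Classification works internally: the function names, the SHA-256 content hash, and the three-year staleness threshold are all assumptions made for illustration.

```python
import hashlib
import time
from collections import defaultdict
from pathlib import Path
from typing import Optional

STALE_AFTER_DAYS = 365 * 3  # hypothetical threshold, not a product default


def file_digest(path: Path) -> str:
    """Hash file contents in chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> list:
    """Group files under root whose contents are byte-for-byte identical."""
    by_digest = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    return [group for group in by_digest.values() if len(group) > 1]


def find_stale(root: Path, now: Optional[float] = None) -> list:
    """List files whose last-modified time is older than the threshold."""
    now = time.time() if now is None else now
    cutoff = now - STALE_AFTER_DAYS * 86400
    return [p for p in sorted(root.rglob("*"))
            if p.is_file() and p.stat().st_mtime < cutoff]
```

The sketch only reports candidates; deciding whether to archive or delete them is left to the reader's own policy.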

Accelerate data migration projects

  • Map data for migration.
  • Identify sensitive data before moving to the cloud.

Maintain regulatory compliance

  • Map personally identifiable information (PII).
  • Comply with privacy regulations, including GDPR, CCPA, PCI, and HIPAA.
  • Respond quickly to Data Subject Access Requests (DSARs).

Prepare data for GenAI and RAG

  • Find and remove irrelevant or stale data that can distort results.
  • Identify and delete duplicate data to enhance training efficiency and prevent the model from assigning undue importance to it.
  • Identify PII and sensitive PII to avoid inadvertent use in training sets and results.

Get actionable reports

Actionable compliance reports

Data Classification offers ready-to-use and custom reports for compliance that reduce manual work, cost, and errors. These include:
  • The Privacy Risk Assessment report: Provides an overview of your organization's data privacy risk status to support privacy regulations such as GDPR and the California Consumer Privacy Act (CCPA).
  • The Payment Card Industry Data Security Standard (PCI DSS) report: Helps identify credit card information within your data.
  • The Health Insurance Portability and Accountability Act (HIPAA) report: Helps identify files containing health information.
  • The Service Data Subject Access Requests (DSAR) report: Helps you comply with GDPR and similar data privacy regulations by finding files that contain a specific person's name or identifier.
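
The core of a DSAR lookup, finding every file that mentions a given person, can be illustrated with a short Python sketch. This is a simplified stand-in, not the product's report engine: the function name is hypothetical, it scans only plain-text files, and it uses case-insensitive substring matching rather than any AI-based matching.

```python
from pathlib import Path


def find_subject_files(root: Path, identifiers: list) -> dict:
    """Map each text file under root to the subject identifiers found in it."""
    needles = [item.lower() for item in identifiers]
    hits = {}
    for path in sorted(root.rglob("*.txt")):
        # Unreadable byte sequences are skipped rather than raising an error.
        text = path.read_text(encoding="utf-8", errors="ignore").lower()
        found = [needle for needle in needles if needle in text]
        if found:
            hits[str(path.relative_to(root))] = found
    return hits
```

Calling find_subject_files with a directory and a list such as a name and an email address returns the relative path of each matching file together with the identifiers that matched.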

Expert Guidance

Thrive with expert-led storage guidance

Get tailored advice on how NetApp Data Classification fits your environment — from sizing and deployment to long-term optimization.

Technical Specifications

Technologies, detected data types, reports, and service details summarized from official documentation.

  • Artificial Intelligence (AI)
    Used to categorize and classify data by sensitivity and compliance type
  • Machine Learning (ML)
    Used to categorize and classify data by sensitivity and compliance type
  • Natural Language Processing (NLP)
    Distinguishes context (e.g., "George is Mexican" vs. "George is eating Mexican food")
  • Proximity Validation
    Validates findings by looking for predefined keywords near personal data

  • Personally Identifiable Information (PII)
    Automatic identification
  • Credit card numbers
    Automatic identification
  • Social security numbers
    Automatic identification
  • Bank account numbers
    Automatic identification
  • Australian Tax File Number (TFN)
    Identified via proximity phrases such as "TFN" or "Tax File"
  • Sensitive personal data
    Identifies special types as defined by GDPR articles 9 and 10 (e.g., health, ethnic origin, sexual orientation)

  • Privacy Risk Assessment report
    Overview of organization's data privacy risk status to support GDPR and CCPA
  • PCI DSS report
    Helps identify credit card information within your data
  • HIPAA report
    Helps identify files containing health information
  • Service Data Subject Access Requests (DSAR) report
    Helps comply with GDPR and similar data privacy regulations
  • Supported Regulations
    GDPR, CCPA, PCI, HIPAA

  • Service category
    Core capability within NetApp Data Services
  • Coverage
    NetApp data estate / hybrid NetApp footprint
  • Document ID
    SB-4068-1025

Ready to get started?

Get your data flowing from edge to core to cloud.

Talk to a specialist

Request a custom quote

Build a configuration with a Data Governance specialist.

Request a quote

Download the datasheet

Full specs, performance metrics, and deployment notes.

Get the datasheet

Learn more

Explore resources

Datasheets, whitepapers, case studies, and technical documentation.

View solutions

Tailored storage and data management solutions for your workloads.

FREQUENTLY ASKED QUESTIONS

Common questions about NetApp Data Classification & Governance

Answers to what enterprise IT leaders ask most before deploying NetApp Data Classification & Governance with SANDataWorks.

NetApp Data Classification is an AI-driven tool that uses Natural Language Processing to scan your entire data estate. It automatically maps and categorizes sensitive Personally Identifiable Information (PII) so you can maintain compliance with GDPR, CCPA, and HIPAA.

For AI to be effective and secure, training data must be clean. Data Classification automatically finds and removes duplicate or stale data that distorts AI models, and isolates sensitive PII so it isn’t inadvertently fed into public Generative AI algorithms.

Finding specific consumer data manually across petabytes is nearly impossible. Data Classification generates ready-to-use DSAR reports in seconds by automatically locating every file containing a specific person’s name or identifier.

Data Classification provides integrated data intelligence across your entire hybrid multicloud data estate, discovering and classifying data whether it lives in your on-premises data center or in public cloud storage.

Technology alone doesn’t ensure compliance. SANDataWorks experts use BlueXP and Data Classification to uncover compliance risks, apply automated policy-driven guardrails, and execute data migrations securely without exposing hidden liabilities.

Most secure storage on the planet: FIPS 140-3 · NSA CSfC · DoDIN APL
Validated for top-secret data: only enterprise storage to hold this certification
Authorized NetApp Partner: SANDataWorks · a division of BlueAlly