Data Lake Infrastructure

Modern, powerful, and efficient data lake infrastructure with StorageGRID and Dremio

NetApp has partnered with Dremio, the easy and open data lakehouse, to help enterprises face the challenge of building a future-proof, scalable, and efficient data infrastructure.

StorageGRID and Dremio

Developing a modern data infrastructure isn't an easy task.

Enterprises want to extract as much value as possible from their data, while minimizing complicated data pipelines. Traditional data warehouses can lead to data silos, vendor lock-in, inconsistent sources of truth, and tangled processes.

Faced with head-spinning increases in data volume and variety year after year, enterprises often find that legacy data infrastructures like Hadoop just can't scale to provide the access they need. Data administrators are tasked with the complicated process of building out modern data lakes while maintaining legacy data pipelines that are critical for day-to-day success. NetApp has partnered with Dremio, the easy and open data lakehouse, to help enterprises face the challenge of building a future-proof, scalable, and efficient data infrastructure.

Features and Benefits

The capabilities that set StorageGRID and Dremio apart.

Key benefits

Unmatched scalability and data management

Build your data lake on StorageGRID for unmatched scalability and data management.

Enterprise-wide data access

Empower data users across the enterprise to derive full value from your data lake by querying data in place with Dremio.

Efficient data consumption

Combine StorageGRID with the Dremio open data lakehouse to build an enterprise-wide infrastructure that makes the grid's data available for efficient consumption.

Unrivaled scalability and data management for the data lake

NetApp StorageGRID object storage

NetApp StorageGRID object storage is an enterprise-grade, on-premises solution that supports the native Amazon Simple Storage Service (S3) API. StorageGRID is software defined, which means that you can run it on different platforms—bare metal, VMware-based environments, or NetApp's purpose-built appliances—and mix platforms within a grid.

Massive S3 object storage and dynamic data management

StorageGRID offers massive S3 object storage and dynamic data management, enabling you to run next-generation workflows on premises alongside your public cloud. The solution's industry-leading data management policy engine helps you optimize performance and durability and adhere to data locality requirements.

Extreme scalability

StorageGRID is extremely scalable, supporting low-touch, nondisruptive expansions, and can store billions of objects. In a single namespace, StorageGRID can scale up to 16 data centers worldwide.

Faster access with Dremio

Dremio helps data teams deliver faster access to their data. By deploying the Dremio open data lakehouse on top of StorageGRID object storage, you can maximize the value of your enterprise's data while keeping full control of data placement, lifecycle, and tiering.

Information lifecycle management (ILM)

StorageGRID information lifecycle management (ILM) policies allow complete granular control over how long data is retained, where the data sits, when it's tiered to lower-cost object storage, and more.
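
As a rough illustration of the kind of decision an ILM policy encodes, the sketch below models age-based tiering and retention in plain Python. The rule fields, tier names, and thresholds are hypothetical; this is not StorageGRID's rule format or API, and real ILM rules are configured in StorageGRID itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LifecycleRule:
    """Hypothetical rule: objects at least `min_age_days` old belong on `tier`."""
    min_age_days: int
    tier: str


# Hypothetical policy, ordered from the oldest threshold to the youngest
# so that the first matching rule is the most specific one.
POLICY = [
    LifecycleRule(min_age_days=365, tier="archive"),
    LifecycleRule(min_age_days=30, tier="low-cost"),
    LifecycleRule(min_age_days=0, tier="performance"),
]

RETENTION_DAYS = 730  # hypothetical retention period


def place(age_days: int) -> Optional[str]:
    """Return the tier for an object of the given age, or None once it has expired."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if age_days >= RETENTION_DAYS:
        return None  # past retention: the object is no longer kept
    for rule in POLICY:
        if age_days >= rule.min_age_days:
            return rule.tier
    return None
```

Keeping retention and placement in one function mirrors the idea that a single policy answers both how long data is kept and where it sits at each stage of its life.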

Dremio: The easy and open data lakehouse

SQL query engine

The Dremio open data lakehouse empowers you to make effective and impactful use of your enterprise's data. With Dremio's SQL query engine, data can be queried in place on StorageGRID, so data users across an enterprise have access to the data lake.

Wide range of data sources

Dremio can query data from a wide range of sources in addition to S3 object storage, including block and file storage, Hadoop Distributed File System (HDFS), and relational databases like Amazon Redshift and Postgres.

Semantic layer

With Dremio's semantic layer, it's easy to build and share data products over your current data infrastructure, which gives you direct access to all your data during phased migrations to a modern data lake.

Platform-agnostic data formats

Dremio supports platform-agnostic data formats like Parquet and Apache Iceberg, making it easy to avoid vendor lock-in and to future-proof your data infrastructure.

Maximize the value of your data

Direct connection for BI tools and SQL clients

Enterprises employ a wide variety of data users—from data scientists to business analysts to executives who need high-level BI dashboards—and traditional data warehouse infrastructures make it difficult to get the right data to the right users quickly. With Dremio, users can connect their BI tools and SQL clients directly to the data lake. This approach removes complicated pipelines, so the whole data lake is just a click away for any data user.

Native BI integrations

Dremio natively connects with Tableau and Microsoft Power BI for BI teams, and it supports Apache Arrow Flight to easily connect the data lake to Python, R, and Jupyter Notebook.

Dremio spaces

Dremio spaces provide a shared semantic layer for all users and tools. Spaces allow data analysts and scientists to create consistent dataset definitions, calculated fields, and security rules that downstream users and tools can use. By providing simple access to an organization's entire data infrastructure, Dremio maximizes the value of data stored on your StorageGRID system to enable high-level business intelligence and data analytics.

Query-acceleration technology

The Dremio SQL engine uses query-acceleration technology to achieve interactive-speed response times, opening the data lake to real-time data analysis and BI.

Columnar Cloud Cache (C3)

Dremio supports Columnar Cloud Cache (C3), which uses NVMe SSD technology built into cloud compute instances to achieve NVMe-level I/O performance.
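
The general idea behind a cache that sits between a query engine and object storage can be sketched as a read-through cache with least-recently-used eviction. This is a conceptual illustration only, not Dremio's C3 implementation; the class name, the `fetch` callback, and the capacity figure are all hypothetical.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughCache:
    """Keeps recently read blocks on fast local storage (modeled here as a dict)."""

    def __init__(self, fetch: Callable[[str], bytes], capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._fetch = fetch          # slow path, e.g. a read from object storage
        self._capacity = capacity    # maximum number of cached blocks
        self._blocks: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, key: str) -> bytes:
        if key in self._blocks:
            # Cache hit: mark the block as most recently used.
            self._blocks.move_to_end(key)
            self.hits += 1
            return self._blocks[key]
        # Cache miss: fetch from the slow path and remember the result.
        self.misses += 1
        data = self._fetch(key)
        self._blocks[key] = data
        if len(self._blocks) > self._capacity:
            # Evict the least recently used block.
            self._blocks.popitem(last=False)
        return data
```

Repeated reads of the same block are served locally, so only the first read pays the cost of going to object storage.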

Data Reflections

Dremio also uses Data Reflections to help accelerate BI dashboards and help end users work freely in their semantic layer without ever needing to know about their physical data model. This helps data engineers eliminate redundant data pipelines and physical data copies commonly found in maintaining materialized views and BI extracts.
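
To illustrate why a precomputed aggregate can speed up a dashboard query without changing what the user asks for, the sketch below answers the same question either from raw rows or from a summary built ahead of time. It is a conceptual example under simplified assumptions, not how Dremio implements Data Reflections; the row shape and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Optional, Sequence, Tuple

Row = Tuple[str, float]  # hypothetical row shape: (region, sale amount)


def build_summary(rows: Sequence[Row]) -> Dict[str, float]:
    """Precompute total sales per region once, ahead of query time."""
    totals: Dict[str, float] = defaultdict(float)
    for region, amount in rows:
        totals[region] += amount
    return dict(totals)


def total_for_region(
    region: str,
    rows: Sequence[Row],
    summary: Optional[Dict[str, float]] = None,
) -> float:
    """Answer the same logical query from the summary if one exists."""
    if summary is not None:
        # Fast path: one lookup instead of a scan over every row.
        return summary.get(region, 0.0)
    # Slow path: scan the raw rows.
    return sum(amount for r, amount in rows if r == region)
```

The caller's question is identical in both cases; only the physical path to the answer differs, which is why users do not need to know about the underlying data model.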

Modernize your Hadoop cluster

Migrate Hadoop workloads

Hadoop is a widely used legacy data analytics platform that is still employed by many enterprises across industries. Although Hadoop is a powerful tool for processing big data, it faces several challenges in the modern enterprise. To address these pain points, Hadoop workloads can be migrated to a StorageGRID and Dremio joint solution.

Modernized data analytics platform

This solution provides a modernized data analytics platform with improved performance, scalability, security, and simplicity, and offers several advantages over a legacy Hadoop cluster. These advantages include lightning-fast query performance, independent and low-touch scaling of compute and storage, built-in future-proof security, and ease of administration. Migrating your Hadoop cluster to StorageGRID and Dremio can provide these benefits and more, making it a no-brainer decision.

Simple setup with incredible results

Flexible deployment options

Dremio can be deployed as a software solution on your enterprise's hardware, or as a fully managed software-as-a-service (SaaS) cloud deployment. Dremio Cloud runs on AWS and handles software installation, configuration, and upgrade as well as compute engine management.

Multi-environment support

Dremio software runs on premises and in multiple clouds, and brings data lakehouse functionality wherever your data resides, including your data center.

Easy S3 connection

StorageGRID connects to the Dremio lakehouse as an S3 data source, and after this simple setup, the grid's data is available for democratized access across the enterprise. Deployed together, Dremio and StorageGRID provide an enterprise-wide data infrastructure that makes the grid's data available for effective and efficient consumption. Dremio is the perfect platform to make the most of your StorageGRID data lake.
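
As a hedged sketch of what that setup might involve, the function below assembles a source definition for an S3-compatible endpoint. `fs.s3a.endpoint` and `fs.s3a.path.style.access` are standard Hadoop S3A connector properties; the surrounding field names (`entityType`, `credentialType`, `compatibilityMode`, `propertyList`, and so on) are assumptions about Dremio's source configuration and should be verified against the Dremio documentation for your version. The endpoint and source name in the usage note are hypothetical.

```python
from typing import Any, Dict


def s3_source_definition(
    name: str,
    endpoint: str,
    access_key: str,
    secret_key: str,
    secure: bool = True,
) -> Dict[str, Any]:
    """Build a source definition for an S3-compatible object store.

    Field names are assumptions; check them against the Dremio documentation.
    """
    if not endpoint or "://" in endpoint:
        raise ValueError("endpoint should be host[:port] without a scheme")
    return {
        "entityType": "source",
        "type": "S3",
        "name": name,
        "config": {
            "credentialType": "ACCESS_KEY",
            "accessKey": access_key,
            "accessSecret": secret_key,
            "secure": secure,           # use TLS when talking to the endpoint
            "compatibilityMode": True,  # assumed flag for non-AWS S3 endpoints
            "propertyList": [
                # Point the S3A connector at the grid instead of AWS.
                {"name": "fs.s3a.endpoint", "value": endpoint},
                # Address buckets by path rather than by virtual host name.
                {"name": "fs.s3a.path.style.access", "value": "true"},
            ],
        },
    }
```

For example, `s3_source_definition("grid", "storagegrid.example.com:10443", access_key, secret_key)` returns a dictionary that could be serialized to JSON. Credentials are passed in as arguments so they can come from a secrets store rather than being written into the code.
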
Expert Guidance

Thrive with expert-led storage guidance

Get tailored advice on how StorageGRID and Dremio fit your environment, from sizing and deployment to long-term optimization.

Technical Specifications

Hardware and software specifications for StorageGRID and Dremio, drawn from official documentation.

  • Storage Type
    Enterprise-grade, on-premises object storage
  • API Support
    Native Amazon Simple Storage Service (S3) API
  • Architecture
    Software defined
  • Deployment Platforms
    Bare metal, VMware-based environments, or NetApp's purpose-built appliances; platforms can be mixed within a grid
  • Scalability
    Low-touch, nondisruptive expansions; can store billions of objects
  • Namespace
    Single namespace can scale up to 16 data centers worldwide
  • Data Management
    Industry-leading data management policy engine; Information lifecycle management (ILM) policies

  • Query Engine
    SQL query engine with query-acceleration technology for interactive-speed response times
  • Supported Data Sources
    S3 object storage, block and file storage, Hadoop Distributed File System (HDFS), and relational databases like Amazon Redshift and Postgres
  • Data Formats
    Platform-agnostic data formats like Parquet and Apache Iceberg
  • BI Tool Integrations
    Native connections with Tableau and Microsoft Power BI
  • Data Science Integrations
    Apache Arrow Flight to connect to Python, R, and Jupyter Notebook
  • Caching
    Columnar Cloud Cache (C3) using NVMe SSD technology built into cloud compute instances to achieve NVMe-level I/O performance
  • Acceleration
    Data Reflections to accelerate BI dashboards
  • Semantic Layer
    Dremio spaces provide a shared semantic layer for all users and tools
  • Deployment Options
    Software solution on enterprise's hardware, or fully managed software-as-a-service (SaaS) cloud deployment
  • Dremio Cloud
    Runs on AWS; handles software installation, configuration, and upgrade as well as compute engine management
  • On-Premises and Cloud
    Dremio software runs on premises and in multiple clouds

Authorized NetApp Partner SANDataWorks · a division of BlueAlly