Genomic Data Management

Simplify Genomic Data Management

PetaGene PetaSuite with NetApp Data Fabric

Raw genomic datasets are huge, and scientists and bioinformaticians have long sought ways to reduce the size of the datasets they work with by combining data compression and reduction techniques. When genomic sequencing was in its infancy, raw sequencer output of around 2TB was often stored for extended periods while bioinformaticians carried out the complex tasks of assembling and aligning the sequencing data. With these steps complete, the data could be used in variant calling and interpretation, which are vital steps in understanding gene expression and disease.

Today this process is highly automated and has been greatly accelerated through a combination of parallel processing and the availability of reference genomes. Work that previously took months or years can now be turned around in little more than a day, and a number of compressed genomic file formats are available that reduce the size of an individually stored genome down to a few tens of gigabytes. This reduction has greatly improved the bioinformatician's ability to work with and transfer data to clinicians quickly and efficiently.

Features and Benefits

The capabilities that set PetaGene PetaSuite with NetApp Data Fabric apart.

Key Benefits

Increase collaborative efficiency

Transfer with smaller, more portable files over the NetApp® Data Fabric. Typical full genome file size from PetaGene PetaSuite is 16GiB, versus 65GiB to 85GiB for FASTQ.GZ and BAM formats.
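To make the portability claim concrete, the following is a minimal back-of-the-envelope sketch that estimates idealised transfer times for the file sizes quoted above. The 1 Gbit/s link speed and the function name `transfer_seconds` are assumptions chosen for illustration; they do not come from PetaGene or NetApp, and real transfers add protocol overhead.

```python
GIB = 2**30  # bytes per GiB


def transfer_seconds(size_gib: float, link_gbit_per_s: float) -> float:
    """Idealised transfer time: payload bits divided by link rate, no overhead."""
    bits = size_gib * GIB * 8
    return bits / (link_gbit_per_s * 1e9)


# File sizes quoted in the text above.
sizes_gib = {
    "PetaSuite": 16,
    "FASTQ.GZ or BAM (low end)": 65,
    "FASTQ.GZ or BAM (high end)": 85,
}

LINK_GBIT_PER_S = 1.0  # hypothetical link speed, not from the source

for name, size in sizes_gib.items():
    minutes = transfer_seconds(size, LINK_GBIT_PER_S) / 60
    print(f"{name}: {size} GiB, about {minutes:.1f} min at {LINK_GBIT_PER_S} Gbit/s")
```

Because transfer time scales linearly with size in this model, the relative saving is the same at any link speed; only the absolute minutes change.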

Use less storage capacity and lower costs

Smaller files use less storage and dramatically reduce storage costs.

Leverage the flexibility of the cloud

With the NetApp Data Fabric, files can be seamlessly and securely moved to and from the cloud to support cloud-based workflows. Cold data can be tiered to the cloud by using FabricPool, freeing performance tiers for new sequencing projects.

Maintain interoperability with existing workflows and formats

PetaGene PetaSuite lets researchers and clinicians continue using FASTQ.GZ and BAM file representations in their existing tools and pipelines without needing to decompress first.

Solution Capabilities

The Advent of Precision or Personalised Medicine

Faster sequencing and more compact datasets have increased the number of individual genomes that can be sequenced. In this new world of personalised or precision medicine, individual patients and even their individual diseases, typically cancers, can be sequenced. (Many tumors have their own genetic makeup that can differ from that of the patient.) This sequencing brings great hope and opportunity for new insight, but it maintains the pressure on data capacity and increases the need to support privacy.

Beyond Storage Efficiency

Unlike generic data reduction techniques, PetaSuite understands the internal structure of genomics files. For lossless storage, PetaSuite offers cost reductions of up to 6:1 when compared with BAM or FASTQ.GZ files. This is a 96% data reduction in comparison with raw FASTQ.GZ files.
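Compression ratios and percentage reductions are easy to confuse, so the short sketch below converts between the two. It is plain arithmetic, not a PetaSuite API, and the function names are illustrative. Note that a 6:1 ratio corresponds to roughly an 83% reduction, while a 96% reduction corresponds to 25:1, so the two figures above presumably refer to different baselines; the text does not spell this out.

```python
def reduction_percent(ratio: float) -> float:
    """Percent size reduction implied by an N:1 compression ratio."""
    return (1 - 1 / ratio) * 100


def ratio_for_reduction(percent: float) -> float:
    """N:1 compression ratio implied by a given percent size reduction."""
    return 1 / (1 - percent / 100)


print(f"6:1 ratio      -> {reduction_percent(6):.1f}% reduction")
print(f"96% reduction  -> {ratio_for_reduction(96):.0f}:1 ratio")

# Ratios implied by the typical file sizes quoted elsewhere on this page
# (16 GiB for PetaSuite versus 65 GiB to 85 GiB for FASTQ.GZ and BAM).
for baseline_gib in (65, 85):
    print(f"{baseline_gib} GiB -> 16 GiB is about {baseline_gib / 16:.1f}:1")
```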

ONTAP Cloud and OnCommand Cloud Manager

NetApp ONTAP® Cloud storage software delivers enterprise-quality control and protection for genomic data, with the flexibility to simplify the use of public cloud. ONTAP Cloud is a software-only service that offers a universal storage platform with easy-to-use file services (NFS, CIFS) and block services (iSCSI) common across all cloud and on-premises platforms. The SnapMirror® features of ONTAP offer a bandwidth-efficient data replication and transfer mechanism between clouds and to or from the data center.

To simplify the management experience, NetApp also offers OnCommand® Cloud Manager software, a centralised management environment for ONTAP Cloud software that fully supports hybrid storage environments.

Improve Analysis Speed

The PetaView command-line file access system is lightweight, so the savings from reduced I/O outweigh the cost of decompression. As a result, PetaView's on-the-fly, random-access, client-side decompression can actually speed up analysis, tools, and pipelines, especially in high-performance computing.
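The reasoning here is that reading fewer bytes from storage can save more time than client-side decompression costs. The sketch below is a deliberately simple model of that trade-off, not a PetaView benchmark: the storage and decompression throughput figures are hypothetical, the function name is illustrative, and I/O and decompression are assumed to run one after the other rather than overlapping.

```python
from typing import Optional


def read_time_s(size_gib: float,
                storage_gib_per_s: float,
                decompress_gib_per_s: Optional[float] = None) -> float:
    """Seconds to read a file from storage, plus optional client-side
    decompression time measured against the stored (compressed) bytes."""
    io_time = size_gib / storage_gib_per_s
    cpu_time = 0.0 if decompress_gib_per_s is None else size_gib / decompress_gib_per_s
    return io_time + cpu_time


STORAGE_GIB_PER_S = 0.5      # hypothetical shared-storage read throughput
DECOMPRESS_GIB_PER_S = 2.0   # hypothetical client-side decompression rate

# 65 GiB and 16 GiB are the typical file sizes quoted on this page.
standard = read_time_s(65, STORAGE_GIB_PER_S)
compact = read_time_s(16, STORAGE_GIB_PER_S, DECOMPRESS_GIB_PER_S)

print(f"Standard file: {standard:.0f} s")
print(f"Compact file with client-side decompression: {compact:.0f} s")
```

In this model the compact file wins whenever the I/O time saved exceeds the decompression time added, which becomes more likely as storage bandwidth gets scarcer relative to client CPU.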

Storage Tiering

PetaSuite can exploit tiered storage by identifying and separating out unimportant NGS components to lower-cost tiered storage, while retaining important information in faster storage tiers. This capability can integrate well with a hybrid FAS Solution and NetApp FabricPool.

Technical Specifications

Exhaustive hardware and software metrics extracted directly from official documentation.

  • PetaGene PetaSuite Typical Full Genome File Size
    16GiB
  • FASTQ.GZ and BAM Formats File Size
    65GiB to 85GiB
  • Raw Sequencer Output
    Around 2TB

  • Lossless Cost Reduction vs BAM or FASTQ.GZ
    Up to 6:1
  • Data Reduction vs Raw FASTQ.GZ
    96%

  • File Services
    NFS, CIFS
  • Block Services
    iSCSI
  • Data Replication
    SnapMirror®

  • Storage Tiering Integration
    Hybrid FAS Solution and NetApp FabricPool
  • Management
    OnCommand® Cloud Manager
  • File Access System
    PetaView command line
  • Decompression
    On-the-fly random-access client-side

Compare PetaGene PetaSuite with NetApp Data Fabric Series

Select the right scale for your workload demands.

Typical full genome file size by format.

Format                Typical Full Genome File Size
PetaGene PetaSuite    16GiB
FASTQ.GZ              65GiB to 85GiB
BAM                   65GiB to 85GiB

Authorized NetApp Partner SANDataWorks · a division of BlueAlly