AI in Genomics

AI in genomics: Progress through innovation

Patients benefit from faster technical breakthroughs in genomics. High-performance GPU computing can make secondary analysis of genomic data 30 to 50 times faster than CPU-only approaches.

NetApp ONTAP AI for Genomics

Growing at a double-digit rate, with a global market forecast to exceed US$62 billion by 2026, genomics is one of the fastest-growing industries.

But the real story is about more than just market share. Science and health experts are calling genomics a revolution that's only just begun.

The ability to sequence DNA quickly and easily has opened up an array of applications in personalized medicine, cancer research, drug discovery, and more. COVID-19 is also highlighting the importance of sequencing as scientists work to understand the virus.

Features and Benefits

The capabilities that set NetApp ONTAP AI for Genomics apart.

Challenge

Interpreting massive sequence data

The fundamental challenge of genomics is to take mountains of human sequence data and figure out which differences are important. Which gene variants, or combinations of genes, contribute to various medical conditions, and how can genomic information be used to individualize patient treatment?

AI integrity in clinical diagnostics

AI-based algorithms are extraordinary in their ability to interpret complex data. However, their power and complexity can also result in spurious or biased conclusions when applied to human health data. Without careful consideration of the integrity of a trained AI system, the practical benefit of these systems in clinical diagnostics can be limited.

Speed, accuracy, and cost of WGS

To get the most value from whole genome sequencing (WGS) in a clinical setting, it must be done quickly, accurately, and inexpensively. Largely for technical reasons, clinicians often have only limited genetic data available because of the time and cost to process and store WGS data. Because a single patient can generate 300GB to 1TB of data, processing alone can take several days.

Compute, storage, and data management bottlenecks

The data generated by WGS requires massive amounts of compute power, storage, and data management, any of which can easily become a bottleneck. Although 300GB might seem manageable for a single patient, it scales quickly across thousands of subjects or patients, and working through every record could take years. To keep up with demand, organizations must process more sequencing jobs in less time without sacrificing accuracy or security.
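To make the scaling concrete, here is a back-of-the-envelope sketch. It uses the 300GB to 1TB per-patient range stated above; the cohort of 1,000 patients and the 3-day serial processing time (one reading of "several days") are illustrative assumptions, not measured figures.

```python
"""Back-of-the-envelope sizing for a WGS cohort (illustrative assumptions only)."""

GB_PER_TB = 1000  # decimal units, as storage capacity is usually quoted


def cohort_storage_tb(patients: int, gb_per_patient: float) -> float:
    """Total raw storage for the cohort, in terabytes."""
    return patients * gb_per_patient / GB_PER_TB


def serial_processing_years(patients: int, days_per_patient: float) -> float:
    """Wall-clock years if patients are processed one at a time."""
    return patients * days_per_patient / 365


if __name__ == "__main__":
    n = 1000  # hypothetical cohort size
    low, high = cohort_storage_tb(n, 300), cohort_storage_tb(n, 1000)
    print(f"Storage: {low:.0f} TB to {high:.0f} TB")
    # 3 days per patient is an assumed value for "several days"
    print(f"Serial processing at 3 days each: {serial_processing_years(n, 3):.1f} years")
```

Under these assumptions the cohort needs 300 TB to 1000 TB of raw storage and roughly 8.2 years of one-at-a-time processing, which is why aggregate throughput, not just per-sample speed, becomes the constraint.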

Data management complexity

Data management is a much more significant challenge in genomics than in medical imaging or digital pathology. Because its datasets are larger, WGS creates data management challenges at every stage of the lifecycle. And although genomics file formats are standardized, there is no equivalent of a picture archiving and communication system (PACS) or vendor-neutral archive (VNA) for managing sequence data.

EHR integration limits

Methods to store genomic data in electronic health record (EHR) systems are being investigated, but existing EHR databases can't effectively store such large files. EHR systems will likely need to store pointers to data files held in external systems that are better suited to the task.
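A minimal sketch of the pointer approach, assuming a hypothetical schema: the EHR keeps a small record with a URI and a checksum, while the genomic file itself stays in external storage. The field names, the `make_reference` helper, and the example URI are illustrative, not part of any EHR standard.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class GenomicFileReference:
    """Hypothetical EHR record that points at a genomic file instead of embedding it."""
    patient_id: str
    uri: str          # location in an external store better suited to very large files
    file_format: str  # e.g. "VCF", "BAM", or "FASTQ"
    sha256: str       # lets consumers verify they retrieved the expected bytes
    size_bytes: int


def make_reference(patient_id: str, uri: str, file_format: str,
                   chunks: Iterable[bytes]) -> GenomicFileReference:
    """Stream the file contents chunk by chunk to compute its size and checksum."""
    digest = hashlib.sha256()
    size = 0
    for chunk in chunks:
        digest.update(chunk)
        size += len(chunk)
    return GenomicFileReference(patient_id, uri, file_format, digest.hexdigest(), size)


if __name__ == "__main__":
    ref = make_reference(
        "patient-001",
        "https://genomics-store.example/patient-001.vcf.gz",  # hypothetical location
        "VCF",
        [b"chunk-1", b"chunk-2"],  # stand-in for chunks read from the real file
    )
    print(ref.size_bytes, ref.sha256[:12])
```

Hashing from an iterable of chunks means a file of hundreds of gigabytes never has to be loaded into memory at once; only the small reference record goes into the EHR.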

Persistent value of genomic data

In many big data applications, data loses value over time. Genomic data, by contrast, retains its value. Intermediate data generated during analysis is often reused for reanalysis, enabling new insights; for example, data scientists at pharmaceutical companies frequently reanalyze genomic files to look for new mutations or biomarkers. Because data must be kept and reprocessed, reanalysis adds its own set of scalability requirements.

Solution

AI-driven personalized treatment

After a person's genomic information is collected, AI and machine learning help analyze the data to determine personalized treatment options. The results can impact pharmacology, oncology, infectious diseases, and many other areas of healthcare. In practice, AI applications in genomics tend to target tasks that are difficult or impractical to complete using human intelligence alone. AI is also useful for tasks that are prone to error with standard statistical approaches, including variant calling, genome annotation, variant classification, and phenotype-to-genotype correspondence.

WuXi NextCODE

NetApp customer WuXi NextCODE has created a unique platform to organize, mine, share, and apply genomic data to improve human health. Over the past 20 years, the company has amassed the world's largest database of human genome sequences. WuXi NextCODE uses NetApp Cloud Volumes Service to help researchers process vast amounts of genomic data and quickly generate insights, uncovering new ways to address disease. The company's partners across four continents can correlate genomic and phenotypic data at unprecedented scale and speed.

ICON plc

For ICON plc, headquartered in Dublin, Ireland, everything comes down to data, and mountains of it: data drives clinical trials, accelerates proof of efficacy, and hastens the delivery of lifesaving new medicines and therapeutic devices to market. ICON is a global leader in contract clinical research, and its data-modeling team runs algorithms on a software-as-a-service grid platform deployed on NetApp E-Series systems. This advanced data modeling requires extreme performance.

AstraZeneca

AstraZeneca accelerates pharmaceutical science with the cloud. By partnering with NetApp, AstraZeneca has designed and implemented a data strategy for its hybrid multicloud environment. Moving data dynamically from any cloud to any other cloud enables faster analysis and discovery.

NVIDIA Parabricks

NVIDIA Parabricks performs secondary analysis, from sequencer-generated FASTQ files to variant call files, 30 to 50 times faster than CPU-based approaches. Parabricks produces results equivalent to those of common secondary analysis tools such as GATK4 and DeepVariant while significantly increasing throughput. Using GPU-accelerated computing, one GPU server running Parabricks can deliver throughput comparable to about 40 to 50 CPU servers, reducing IT management overhead and operating costs, including power and cooling.
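The speed-up and server-equivalence figures above translate into simple capacity arithmetic. This sketch assumes a hypothetical 60-hour CPU runtime per sample and a fleet of 200 CPU servers; only the 30x to 50x and 40-to-50 ratios come from the text.

```python
import math

# Ratios quoted above for NVIDIA Parabricks (GPU vs CPU secondary analysis).
SPEEDUP_RANGE = (30, 50)
CPU_SERVERS_PER_GPU_SERVER = (40, 50)


def gpu_runtime_hours(cpu_runtime_hours: float, speedup: float) -> float:
    """Estimated GPU runtime for a sample, given its CPU runtime and a speed-up factor."""
    return cpu_runtime_hours / speedup


def gpu_servers_needed(cpu_servers: int, cpu_servers_per_gpu: int) -> int:
    """GPU servers needed to match a CPU fleet's throughput, rounded up to whole servers."""
    return math.ceil(cpu_servers / cpu_servers_per_gpu)


if __name__ == "__main__":
    baseline_hours = 60  # hypothetical CPU runtime per sample
    cpu_fleet = 200      # hypothetical number of CPU servers
    for s in SPEEDUP_RANGE:
        print(f"{s}x speed-up: {gpu_runtime_hours(baseline_hours, s):.1f} h per sample")
    for ratio in CPU_SERVERS_PER_GPU_SERVER:
        print(f"1 GPU server ~ {ratio} CPU servers: "
              f"{gpu_servers_needed(cpu_fleet, ratio)} GPU servers")
```

With these assumed inputs, per-sample runtime drops from 60 hours to between 2.0 and 1.2 hours, and the 200-server CPU fleet maps to 4 or 5 GPU servers. Rounding up reflects that a fraction of a server still has to be bought as a whole one.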

NetApp ONTAP AI with Parabricks

In combination with NetApp ONTAP AI, Parabricks makes it possible to deploy a fully integrated, end-to-end AI solution tuned for compute and data-intensive genomics workloads. For example, in response to the impact of COVID-19, Core Scientific gave researchers free access to AI infrastructure as a service, powered by NetApp ONTAP AI and NVIDIA Parabricks, for GPU-accelerated coronavirus-related research.

Real-world clinical impact

These solutions are not prototypes. There are many real-life examples of AI improving genomics workloads and helping patients. For example, San Diego researchers recently set a record for time from sequencing to diagnosis in a neonatal/pediatric ICU. The team built a pipelined workflow that combined Illumina sequencers with Diploid Moon, which automatically filters and ranks the possible causative gene variants, compressing the entire workflow to about 19 hours.
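Diploid Moon's internal method is not described here, so the following is only a toy illustration of a "filter and rank" step: it parses VCF data lines, keeps variants whose FILTER column is PASS, and sorts them by a per-variant score. The `PATHO_SCORE` INFO key is hypothetical, standing in for whatever score an upstream annotation step might attach.

```python
from typing import Dict, List, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str
    filter_status: str
    info: Dict[str, str]


def parse_vcf_line(line: str) -> Variant:
    """Parse the eight fixed VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO."""
    chrom, pos, _id, ref, alt, _qual, filter_status, info = line.rstrip("\n").split("\t")[:8]
    fields: Dict[str, str] = {}
    if info != ".":  # "." means no INFO annotations
        for entry in info.split(";"):
            key, sep, value = entry.partition("=")
            fields[key] = value if sep else "true"  # flag entries carry no value
    return Variant(chrom, int(pos), ref, alt, filter_status, fields)


def rank_candidates(lines: List[str], score_key: str = "PATHO_SCORE") -> List[Variant]:
    """Keep PASS variants that carry a score, then sort them highest score first."""
    variants = [parse_vcf_line(line) for line in lines
                if line.strip() and not line.startswith("#")]
    passing = [v for v in variants if v.filter_status == "PASS" and score_key in v.info]
    return sorted(passing, key=lambda v: float(v.info[score_key]), reverse=True)


if __name__ == "__main__":
    example = [
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chr1\t100\t.\tA\tG\t50\tPASS\tPATHO_SCORE=0.2;DB",
        "chr2\t200\t.\tC\tT\t60\tPASS\tPATHO_SCORE=0.9",
        "chr3\t300\t.\tG\tA\t10\tLowQual\tPATHO_SCORE=0.99",
    ]
    for v in rank_candidates(example):
        print(v.chrom, v.pos, v.info["PATHO_SCORE"])
```

The LowQual variant is dropped despite its high score, showing why filtering comes before ranking.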

Hybrid cloud for genomics

Many researchers are turning to the cloud for the storage and compute resources they need. Because of the large datasets and heavy compute requirements, genetics researchers typically take a hybrid cloud approach. In clinical settings, however, secondary and tertiary analysis of patient genomic data has to be done close to the sequencer, at least in urgent cases. Hospitals and laboratories that want to perform sequencing for clinical use will need local compute and high-performance storage, and the results of genetic analysis must be integrated with EHRs so that they are quickly available to clinicians.
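The placement rule described above can be expressed as a small policy function. This is a sketch under an assumed policy: urgent clinical jobs run on local compute near the sequencer, and all other work (including non-urgent clinical reanalysis) may go to the cloud. The `AnalysisJob` fields and `Site` values are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Site(Enum):
    ON_PREM = "on-premises, near the sequencer"
    CLOUD = "cloud"


@dataclass
class AnalysisJob:
    sample_id: str
    clinical: bool  # patient care (True) vs research (False)
    urgent: bool    # e.g. a critically ill patient awaiting diagnosis


def choose_site(job: AnalysisJob) -> Site:
    """Assumed policy: urgent clinical cases stay local; everything else can use the cloud."""
    if job.clinical and job.urgent:
        return Site.ON_PREM
    return Site.CLOUD


if __name__ == "__main__":
    for job in (AnalysisJob("nicu-17", clinical=True, urgent=True),
                AnalysisJob("cohort-0042", clinical=False, urgent=False)):
        print(job.sample_id, "->", choose_site(job).value)
```

Keeping the rule in one function makes it easy to tighten later, for example by routing all clinical work on premises if data-residency requirements demand it.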

Benefits

Accelerate genome sequencing

With innovations like the NetApp and Parabricks solutions, companies can make GPU-accelerated secondary analysis of genomic data up to 50 times faster than CPU-only solutions.

Maximize throughput and minimize turnaround time

Perform more genome analysis operations in less time with industry-leading, cloud-connected all-flash storage from NetApp and NVIDIA DGX servers.

Improve accuracy and security

Protect sensitive genomic data while improving test accuracy for AI-powered precision calculations.

Lower total cost of ownership

Data-efficiency technologies and a 25 times capacity advantage over competitive systems translate into lower TCO.

Simplify design and accelerate return on investment

Accelerate ROI with simplified integration, automation, and orchestration of data in clouds and on premises.

Expert Guidance

Thrive with expert-led storage guidance

Get tailored advice on how NetApp ONTAP AI for Genomics fits your environment — from sizing and deployment to long-term optimization.


Technical Specifications

Key hardware and software metrics drawn from official documentation.

  • Secondary analysis speed-up (GPU vs CPU)
    30 to 50 times faster
  • GPU server throughput equivalence
    Comparable to about 40 to 50 CPU servers with one GPU server
  • Capacity advantage versus competitive systems
    25 times

  • Whole genome sequencing data per patient
    300GB to 1TB
  • Typical processing time per patient (without acceleration)
    Several days
  • Sequencing-to-diagnosis record (San Diego, neonatal/pediatric ICU)
    About 19 hours

  • Global genomics market forecast by 2026
    More than US$62 billion
  • Typical genome sequencing cost
    Generally under $600
  • Typical genome sequencing turnaround
    Less than a week
  • Veritas Genetics WGS offering (late 2018)
    $199 with a 2-day turnaround
  • BGI offering
    $100

  • Compute platform
    NVIDIA DGX servers
  • AI software acceleration
    NVIDIA Parabricks
  • AI infrastructure
    NetApp ONTAP AI
  • Cloud data service
    NetApp Cloud Volumes Service
  • On-premises grid storage
    NetApp E-Series systems
  • Comparable secondary analysis tools
    GATK4 and DeepVariant

  • Document ID
    NA-432-0921
  • Copyright
    © 2021 NetApp, Inc. All Rights Reserved.

Compare NetApp ONTAP AI for Genomics Series

Select the right scale for your workload demands.

Compare NetApp ONTAP AI for Genomics Series: capacity and port configuration by model.

Model | Max capacity | Port configuration
NetApp ONTAP AI + NVIDIA Parabricks (GPU) | Cloud-connected all-flash storage; 25x capacity advantage vs. competitive systems | N/A
Traditional CPU-only secondary analysis | N/A | N/A
GATK4 | N/A | N/A
DeepVariant | N/A | N/A

Ready to get started?

Get your data flowing from edge to core to cloud.

Talk to a specialist

Request a custom quote

Build a configuration with an AI in Genomics specialist.

Request a quote

Download the datasheet

Full specs, performance metrics, and deployment notes.

Get the datasheet

Learn more

Explore resources

Datasheets, whitepapers, case studies, and technical documentation.


View solutions

Tailored storage and data management solutions for your workloads.


Most secure storage on the planet: FIPS 140-3, NSA CSfC, DoDIN APL
Validated for top-secret data: only enterprise storage to hold this certification
Authorized NetApp Partner: SANDataWorks, a division of BlueAlly