Institute Report
Understanding the Global Landscape of Genomic Initiatives
Progress and Promise
May 12, 2020

About the Report

This report uses the IQVIA Genomic Initiatives Database, a new database of genomic data initiatives, to examine and segment the global genomic landscape. In the COVID-19 era, the role genomic databases play is growing. For instance, the UK Biobank and FinnGen, among others, are currently involved in efforts to understand the genetic determinants underlying patient susceptibility to COVID-19 and severe response to infection [1,2]. They are also working with the COVID-19 Host Genetics Initiative [3] to help COVID-19 researchers generate, share, and analyze data globally. The publication of this first landscape of genomic data initiatives provides a baseline on the important use of human genomic data: a point from which to measure progress in building insight into the origins of human disease and in developing better therapeutics to prevent and treat diseases. With it, stakeholders can gain a better picture of the current genomic landscape.

Report Summary

The landscape of initiatives to generate and collect human genomic data is evolving rapidly and is highly diverse, with private and public initiatives across multiple countries. Since 1990, the cost of sequencing a whole genome has dropped from $2.7 million to as low as $300, opening new opportunities to build repositories of genomic data. By the start of 2020, there were 187 genomic initiatives globally, of which 50% originated in the U.S. and 19% in Europe. Thirty-eight million genomes had been analyzed using techniques ranging from genotyping to whole genome sequencing, and this number is expected to grow to 52 million by 2025.
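
As a rough back-of-the-envelope check on the growth figures above, the projected rise from 38 million to 52 million analyzed genomes can be expressed as a total and an annualized rate. The sketch below assumes a five-year window (start of 2020 to 2025) and steady compound growth; neither assumption comes from the report, and the `growth_summary` helper is purely illustrative.

```python
def growth_summary(start: float, end: float, years: int) -> tuple[float, float]:
    """Return (total growth, compound annual growth rate) as fractions."""
    total = end / start - 1
    cagr = (end / start) ** (1 / years) - 1
    return total, cagr


# Figures from the report, in millions of genomes analyzed.
GENOMES_2020 = 38
GENOMES_2025 = 52
YEARS = 5  # assumption: start of 2020 through 2025 treated as five years

total, cagr = growth_summary(GENOMES_2020, GENOMES_2025, YEARS)
print(f"Total growth: {total:.1%}")
print(f"Implied annual growth: {cagr:.1%}")
```

Under these assumptions, the projection implies roughly 37% total growth, or about 6.5% per year.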

Planned national genomic databases are proliferating as countries increasingly appreciate the potential technological and healthcare system benefits of genomic data, with some countries planning to sequence their entire population. The medical utility of data varies across initiatives based on the number of genomes collected, the completeness of genomic data, linkage to other healthcare-relevant data, and the disease or populations it covers. Although only 42% of databases publicly state their genomic data links to patient demographic information or clinical data and only 28% state links to the most valuable EMR/EHR and clinical data, an analysis of initiatives’ data quality versus their target cohort sizes suggests that the next decade will see an increasing number of large genomic databases with strong utility for human data science and medical research.

1 Morelle R. UK Biobank: DNA to unlock coronavirus secrets. BBC News. Apr 14, 2020. Available from: https://www.bbc.com/news/health-52243605
2 FinnGen. FinnGen involved in the Covid19 study. Accessed May 4, 2020. Available from: https://www.finngen.fi/fi/finngen_mukana_covid19_tutkimuksessa
3 The COVID-19 Host Genetics Initiative. Available from: https://www.covid19hg.org/

Key Findings

The cost of whole genome sequencing dropped from $2.7 million in 1990 to $300 in 2020, enabling the construction of genomic data repositories that can now be used to study COVID-19

The Evolving Cost of Genomic Sequencing
  • The next decade will supply researchers with vastly increased genomic data resources to gain insights into the molecular mechanisms of human disease and better understand the epidemiological landscape.
  • Genomic initiatives can help healthcare stakeholders identify genetic variants that increase risk for disease, diagnose patients earlier and prevent disease, develop companion diagnostics to personalize treatment with medicines, and accelerate the discovery, repurposing and clinical development of medicines.

By the start of 2020, 38 million genomes had been analyzed by various genomic initiatives using techniques ranging from genotyping to whole genome sequencing

  • There are 187 genomic initiatives globally, of which 50% originated in the U.S. and 19% in Europe.
  • While only 32% of genomic initiatives in the U.S. are public, versus 50% in Europe, the U.S. still has double the number of public initiatives.
  • Planned national genomic databases are proliferating as countries increasingly appreciate the potential technological and healthcare system benefits of genomic data, with some countries planning to sequence their entire population.
  • Among disease-specific datasets, more than half focus on oncology and 13% on rare diseases.

The number of genomes analyzed is expected to grow to 52 million by 2025, opening new opportunities to treat chronic diseases and understand patient susceptibility to infections like COVID-19

  • Compared with other regions, a higher proportion of the genetic data collected in Europe and Asia will be high-value whole genome data (28%) or biological samples (29%) that could be fully sequenced either now or in the future.
  • Europe is expected to fully sequence 1.5 million whole genomes by 2025, more than the 780,000 in North America, but North America will also sequence an additional 2.2 million whole exomes, which hold high medical value.
  • Genomic data currently collected by existing initiatives do not reflect the genomes of global populations well, especially populations in Asia, Africa and South America. Unless there are dramatic increases in the sequencing of these populations, they will continue to be under-represented.
  • Though Asia accounts for 60% of the world’s population, only six million Asian genomes are planned to be sequenced by 2025, or 12% of the global target.
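
The under-representation described above can be made concrete as a ratio of a region's share of planned genomes to its share of world population. The sketch below uses only figures quoted in this report (six million planned Asian genomes and a 60% population share) and assumes that the 52 million genomes expected by 2025 is the global target the 12% refers to. The `representation_ratio` helper, and the idea of summarizing the gap as a single ratio, are illustrative assumptions rather than a metric the report defines.

```python
def representation_ratio(genome_share: float, population_share: float) -> float:
    """Ratio of a region's share of genomes to its share of population.

    A value of 1.0 means proportional representation; a value below 1.0
    means the region is under-represented.
    """
    return genome_share / population_share


ASIA_PLANNED_GENOMES = 6     # millions, by 2025 (from the report)
GLOBAL_TARGET_GENOMES = 52   # millions, by 2025 (assumed denominator)
ASIA_POPULATION_SHARE = 0.60  # share of world population (from the report)

asia_genome_share = ASIA_PLANNED_GENOMES / GLOBAL_TARGET_GENOMES
ratio = representation_ratio(asia_genome_share, ASIA_POPULATION_SHARE)
print(f"Asia's share of planned genomes: {asia_genome_share:.0%}")
print(f"Representation ratio: {ratio:.2f}")
```

With these inputs, Asia's share comes out at about 12%, consistent with the figure above, and the ratio is roughly 0.19, or about one fifth of proportional representation.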

Large genomic databases will increasingly be available in the next decade to support Human Data Science initiatives

  • The medical utility of data varies across initiatives based on the number of genomes collected, the completeness of genomic data, linkage to other healthcare-relevant data, and the disease or populations it covers.
  • Only 42% of databases publicly state that their genomic data links to patient demographic information or clinical data, and only 28% state links to the most valuable EMR/EHR and clinical data.
  • Challenges remain in oncology, as most tumor samples are collected in a way optimized for histology, not whole genome sequencing. Though cancer has been a driver of initial genomic activity, this issue may impair downstream genomic analyses and may mean genomic data contribute more in other disease areas.
  • The development of interoperability standards and agreements is needed to enable future linkage across databases and accelerate the power of very large, high-quality genomic databases to improve human health in the next decade.
