The More We Learn About the Disease, the Less We Know
A Human Data Science Perspective on COVID-19
Murray Aitken, Executive Director, IQVIA Institute for Human Data Science
May 08, 2020

The global COVID-19 pandemic has unleashed a tsunami of data, evidence, and studies about the coronavirus disease, and we have learned a lot in a remarkably short period of time. Yet, as we learn more, we are also confronted with new questions that highlight gaps in our knowledge, weaknesses of methodologies, uncertainty of modeling, shortcomings of analytics, and the siloed nature of human health research and data science disciplines. More than ever, the COVID-19 pandemic has revealed the urgency and the opportunity for a more integrated, interdisciplinary approach, such as Human Data Science. 

The IQVIA Institute is publishing a series of articles that look at how COVID-19 may prove to be a catalyst for positive change, igniting a sense of urgency around addressing the gaps in knowledge related to COVID-19 (and other diseases) and the opportunity to take a Human Data Science approach. The first article provides an overview of the 10 key areas that need rethinking. Here, we focus on the key dimensions of our understanding (or lack thereof) of COVID-19. Understanding the nature of the pathogen, its origin and transmission, the clinical characteristics of the disease, and comorbidities and ethnic disparities is a critically important prerequisite for the best possible approaches to diagnosis, treatment, and intervention at the clinical, behavioral, and public health levels.

Every day, every hour, we are uncovering new, critically important aspects of the COVID-19 pandemic. Leading scientific journals are posting new studies and observations from researchers and clinicians at the frontline of the battle several times a week. Public health experts and data scientists are releasing daily data tracking the spread of the pandemic in terms of cases, hospitalizations, intubations, and fatalities. Yet every study and every frontline report raises new questions about this complex and puzzling disease, its pathogenic makeup, the nexus between the underlying comorbidities and social determinants, and the best public health strategy combining pharmaceutical and nonpharmaceutical interventions.

The multidimensional, integrated, and interdisciplinary approach in Human Data Science is uniquely suited to address the complexity of infectious diseases and the extraordinary challenges from a global pandemic, such as COVID-19.

When preparing to address and respond more effectively to future disease outbreaks, Human Data Science enables a comprehensive approach that can include contact tracing, identification of community spread, and the management of clusters of outbreaks. This is accomplished by:

  • Activating human science to understand the origin of the pathogen, the etiology of the infectious disease/virus, and the pharmaceutical (antivirals and vaccines) as well as nonpharmaceutical public health interventions (containment, mitigation, and physical distancing)
  • Applying analysis of human health and behavior relating to factors outside of the clinical field, such as animal-human interactions, food and diet practices, urbanization, travel patterns, and communications
  • Combining both using data science, often with AI, machine learning, and advanced analytics

For more details, see the recent report from The IQVIA Institute: Human Data Science – A New Approach to Advancing Human Health Outcomes.1

Here are three key mysteries still hampering our understanding of COVID-19:

The puzzling pathogenesis

Coronavirus is not a new pathogen, but despite ~1,000 new scientific papers being published every week about COVID-19, the disease still remains a mystery.

What we know: Coronavirus is not a new pathogen. Previous outbreaks of coronaviruses (CoVs) include the severe acute respiratory syndrome (SARS)-CoV and the Middle East respiratory syndrome (MERS)-CoV. The new coronavirus disease, COVID-19, was not named by WHO until February 11, 2020, though the first cases were reported in early December 2019 in Wuhan, China. Recent reports indicate that the virus was spreading earlier than first recognized, including a study from France showing that COVID-19 was already present there in late December 2019.2 The genetic sequence of COVID-19 showed more than 80% identity to SARS-CoV and 50% to MERS-CoV. Significantly, both SARS-CoV and MERS-CoV originated in bats.3 This should not have come as a surprise, as a group of scientists at the University of Hong Kong had already linked SARS with exotic food animals in 2007.4 The researchers also warned that “The presence of a large reservoir of SARS-CoV-like viruses in horseshoe bats, together with the culture of eating exotic mammals in southern China, is a timebomb. The possibility of the reemergence of SARS and other novel viruses from animals or laboratories and therefore the need for preparedness should not be ignored.”

The most common symptoms at the onset of COVID-19 appear to be fever, dry cough, shortness of breath, and fatigue. Other symptoms, which can vary widely from patient to patient, include excessive sputum production, headache, muscle pain, new loss of sense of taste and smell, chills, diarrhea, and lymphopenia. Clinical characteristics are demonstrated by chest CT scans presenting as pneumonia, acute respiratory distress syndrome, or acute cardiac injury. Additionally, impacts on other organ systems are features also observed in patients with SARS and MERS. WHO estimated a case fatality ratio of approximately 14 – 15% for severe acute respiratory syndrome (SARS) in 2003 and approximately 35% for Middle East respiratory syndrome (MERS) in 2012, compared to 1.38% for COVID-19.5 However, COVID-19 has spread much more rapidly than SARS and MERS, and thus represents a more challenging global disease outbreak.

What we don’t know: Despite about 1,000 new scientific papers being published every week about COVID-19, the disease still remains a mystery. While clinicians realize that the lungs are ground zero of the disease, its reach can extend to many organs including the heart and blood vessels, kidneys, gut, skin, and brain.6 Most clinicians suspect that the driving force in many gravely ill patients may not be the virus itself, but a fatal overreaction of the immune system to the viral infection, known as a cytokine storm.

Case fatality ratios also need to be viewed with an abundance of caution, as they are crude and based mainly on early observations from China, where the outbreak started. The health system in Wuhan was quickly overwhelmed, and the case fatality ratio is likely strongly influenced by the availability of healthcare facilities.7 It is notable that case fatality rates in China outside of Hubei province were much lower. As health systems learn to treat COVID-19 and intervene earlier based on better testing and nonpharmaceutical interventions (contact tracing, mitigation, and physical distancing), case fatality ratios will likely change. Furthermore, case reporting has suggested that many fatality reports don’t correctly register underlying diseases and therefore are unable to clearly isolate excess death rates (EDRs) caused by COVID-19 from those caused by other conditions, such as the flu, pneumonia, heart failure, and cancer.

While there seems to be scientific consensus that bats and other wildlife are the main origin of coronaviruses, both SARS-CoV and COVID-19, there is still a lot of uncertainty. The first COVID-19 patients were epidemiologically linked to a seafood and wet animal wholesale market in Wuhan, Hubei Province, China.8 However, the link to bats and other wildlife has not been solidly confirmed. Few coronaviruses have been tested in bat cell lines, which suggests that humans or perhaps other select mammals may serve as the gateway species for mammalian coronaviruses. In addition, this suggests that human or other mammalian coronaviruses could potentially circulate back and forth between bats and humans, establishing a zoonotic-reverse zoonotic cycle that may allow the virus to maintain viral populations in multiple hosts, exchange genetic information to alter pathogenesis or transmission characteristics, and potentially evolve new variants that can infect humans and animals.9

Human Data Science approach: More research is needed to determine the pathogenesis and epidemiology of COVID-19 through a broader, multidisciplinary approach that spans genomic analysis, clinical research, epidemiology, animal and human science, and the anthropology of human and animal behaviors. This highlights the urgency of doing more research on animal-to-human transmission of zoonotic disease, which will enable earlier identification of reservoirs and prevention of the spread of disease by changing protocols and practices around the sourcing of exotic food animals, the handling of food animals, production practices, and hygiene at food markets. The sociocultural drivers associated with the transmission of zoonoses demand closer scrutiny and understanding in light of the known risk factors. Engaging anthropologists and sociologists will be essential to addressing these issues of zoonotic transmission. Human Data Science is uniquely positioned to undertake a more holistic approach to solving the puzzles around the clinical profile and origin of COVID-19, and indeed other diseases.

The perplexing nexus of disease, comorbidities, and gender and ethnic disparities 

Researchers are starting to pinpoint underlying comorbidities and the gender, ethnic, and social disparities of COVID-19, but fully understanding how these factors intersect and affect people over the longer term requires a multidisciplinary approach.

What we know: Much attention has been given recently to the underlying comorbidities and the gender, ethnic, and social disparities of COVID-19. CDC’s surveillance data10 of 1,482 COVID-19 patients hospitalized in the U.S. through the month of March showed that among 178 adult patients with underlying conditions, the most common were hypertension (49.7%), obesity (48.3%), chronic lung disease (34.6%), diabetes (28.3%), and cardiovascular disease (27.8%). 

More men than women are dying from COVID-19, according to global data.11 Adverse outcomes of COVID-19 seem to be associated with the common comorbidities above, and these conditions are more prevalent in men.12   

Among patients with race/ethnicity data, the CDC report found that in the COVID-NET catchment population, approximately 59% of residents are white, 18% are black, and 14% are Hispanic; however among hospitalized COVID-19 patients with race/ethnicity data, approximately 45% were white, 33% were black, and 8% were Hispanic, suggesting that the black population might be disproportionately affected by COVID-19. The disproportionate impact is particularly pronounced in the largest cities in the U.S. In New York City, Hispanic coronavirus victims make up 34% of all fatalities while comprising only 29% of the city’s population. In Chicago, 71% of deaths from the virus occur among black people, who make up 29% of the city’s population.13 
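
The comparison above can be made concrete with a simple representation ratio: a group's share among hospitalized patients divided by its share of the catchment population. The following is a minimal sketch in Python that uses only the approximate percentages quoted above. The function and variable names are hypothetical, and the ratio is a crude, unadjusted measure (it does not account for age, comorbidities, or missing race/ethnicity data).

```python
# Illustrative sketch only: representation ratio =
#   share among hospitalized patients / share of catchment population.
# Inputs are the approximate percentages quoted in the text; the function
# and variable names are hypothetical, not from the CDC report.

def representation_ratio(share_among_patients: float, share_of_population: float) -> float:
    """Return >1 if a group is over-represented among patients, <1 if under-represented."""
    return share_among_patients / share_of_population

# group: (% of hospitalized patients, % of COVID-NET catchment population)
covid_net_shares = {
    "white": (45, 59),
    "black": (33, 18),
    "Hispanic": (8, 14),
}

ratios = {
    group: round(representation_ratio(patients, population), 2)
    for group, (patients, population) in covid_net_shares.items()
}

for group, ratio in ratios.items():
    print(f"{group}: {ratio}")
```

On these figures, the ratio is about 1.8 for black residents, compared with about 0.8 for white and 0.6 for Hispanic residents, which is the pattern the CDC report describes as a possible disproportionate effect.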

What we don’t know: The high prevalence of heart disease in hospitalized patients with COVID-19 is puzzling doctors. Case studies have even shown fatal strokes in men aged 20-50.14 The question of whether the emerging problems with cardiovascular disease are caused by the virus itself or are a byproduct of the body’s reaction to it has become one of the critical unknowns.15 There is also uncertainty about whether hypertension as a single entity – not associated with any other cardiovascular disease or diabetes or obesity or smoking or vaping – is an increased risk factor.16

Despite the higher fatality rates in men, it is still unclear whether women or men are more likely to get infected. In South Korea, for example, while men make up 40% of confirmed cases, they account for 53% of deaths. In Ireland, so far men make up 48% of confirmed cases but 69% of deaths.17
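
To see what these shares imply, the male-to-female ratio of case fatality among confirmed cases can be derived from just two numbers: men's share of cases and men's share of deaths. The sketch below is illustrative only. It assumes every confirmed case and death is recorded as male or female and that both shares refer to the same population and time window; the function name is hypothetical.

```python
# Illustrative sketch only. If men are a fraction m of confirmed cases and a
# fraction d of deaths, then total cases and total deaths cancel out and
#   CFR_men / CFR_women = (d / m) / ((1 - d) / (1 - m)).
# Assumes all cases and deaths are classified as male or female.

def male_to_female_cfr_ratio(male_share_of_cases: float, male_share_of_deaths: float) -> float:
    """Ratio of case fatality among men to case fatality among women."""
    male_factor = male_share_of_deaths / male_share_of_cases
    female_factor = (1 - male_share_of_deaths) / (1 - male_share_of_cases)
    return male_factor / female_factor

# Shares quoted in the text above.
south_korea = male_to_female_cfr_ratio(0.40, 0.53)
ireland = male_to_female_cfr_ratio(0.48, 0.69)

print(f"South Korea: {south_korea:.2f}")
print(f"Ireland: {ireland:.2f}")
```

On the quoted shares, confirmed male cases were roughly 1.7 times as likely to die as confirmed female cases in South Korea, and roughly 2.4 times as likely in Ireland. This describes outcomes among confirmed cases only and says nothing about which sex is more likely to become infected, which depends on who gets tested.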

While data on ethnic disparity appears to match the higher prevalence of chronic disease in blacks and Hispanics vs white Americans, for example markedly higher rates of hypertension (32.4% in blacks and 23.3% in whites)18, there is still a lot of uncertainty about the driving factors behind the higher rate of serious COVID-19 infections causing hospitalizations among ethnic minorities. Other factors likely play a role in exposing minorities, such as the higher proportion of ethnic minorities among frontline healthcare workers and workers in so-called essential jobs who are working during the pandemic. Furthermore, social factors likely play a role, as the higher prevalence of COVID-19 cases in lower-income zip-codes with higher urban density in New York City suggests.19 The causal impact and interconnections of these factors – genetics, race, job exposure, socioeconomics and behavior – are not fully understood and will require more research.

In addition to the unresolved challenges around comorbidities and gender and ethnic disparities, there is a growing concern about the potential long-term complications of COVID-19 infection. After any severe case of pneumonia, a combination of underlying chronic diseases and prolonged inflammation seems to increase the risk of future illnesses, including heart attack, stroke, and kidney disease.20 

Human Data Science approach: Fully understanding how all these factors intersect will require a holistic, multidisciplinary Human Data Science methodology that looks at all the factors: human science (clinical disease, genetics and comorbidities), human health (social determinants, behavior and culture) and data science (tracking and predicting disease, transmission and behaviors simultaneously).

Not an ACE in the hole

ACE inhibitors are being used to treat COVID-19 patients with varying degrees of success, but a more comprehensive analysis incorporating real world evidence (RWE) is essential to assessing their use.

What we know: As described above, patients with chronic conditions, such as hypertension and diabetes, appear to be at increased risk for COVID-19 infection. Notably, the most frequent comorbidities reported in studies of COVID-19 patients in China are often treated with angiotensin-converting enzyme (ACE) inhibitors. Human pathogenic coronaviruses, such as SARS and COVID-19, bind to their target cells through angiotensin-converting enzyme 2 (ACE2), which is expressed by epithelial cells of the lung, intestine, kidney and blood vessels. The expression of ACE2 is substantially increased in patients with type 1 and type 2 diabetes, who are treated with ACE inhibitors and angiotensin II type-1 receptor blockers (ARBs). Hypertension is also treated with ACE inhibitors and ARBs, which results in an upregulation of ACE2. 

What we don’t know: Researchers recently hypothesized in a correspondence in the New England Journal of Medicine21 that diabetes and hypertension treatment with ACE2-stimulating drugs increases the risk of developing severe and fatal COVID-19. In other words, COVID-19 patients with comorbidities, such as hypertension and diabetes, may be at elevated risk of a severe outcome not just because of the underlying comorbidity, but because of a potentially deleterious link between the coronavirus’s use of ACE2 to enter its target cells and the ACE2-modulating properties of ACE-inhibitor medications. However, a large population-based, case-control study from Italy with a total of 6,272 patients showed that while the use of ACE-inhibitors and angiotensin-receptor blockers (ARBs) was more frequent among patients with COVID-19 than among controls because of their higher prevalence of cardiovascular disease, there was no evidence that ACE-inhibitors or ARBs affected the risk of COVID-19.22

Human Data Science approach: Further study of this hypothesis would require a comprehensive analysis of clinical data from randomized trials combined with real world evidence of patients with COVID-19 infections and underlying chronic comorbidities, such as hypertension and diabetes.

Human Data Science regarding COVID-19 and beyond

The application of Human Data Science to the COVID-19 pandemic creates opportunities for exploring and better understanding some of the puzzling complexities of this disease. But it also provides insights to some fundamental challenges beyond COVID-19 in medical research and healthcare regarding the intersection of human science (clinical and epidemiological aspects), human health (socioeconomic, demographic, behavioral and ethnic factors), and data science (data and predictive analytics).

I’ll be publishing new articles regularly. To get alerts when they are posted, connect with the IQVIA Institute on LinkedIn and Twitter. You can also share your comments or ask questions there.


