Beyond Classical Statistics: What Machine Learning Offers the Life Sciences Industry
Pierre St-Martin, Senior Director, Data Science and Advanced Analytics (Canada & Latin America)
Jul 29, 2020

Is machine learning a benefit to the life sciences industry?

Artificial intelligence (AI) and machine learning (ML) are rapidly growing disciplines within the life sciences industry. Globally, applications of AI in healthcare alone are expected to grow to more than USD 8 billion by 2022. Almost half of global life sciences professionals are either using AI in some area of their work or are interested in doing so.

The healthcare industry is competitive and dynamic in nature. AI tools are highly attractive to this industry since they have been successful in clinical research, trial management, regulatory and market access, as well as commercial effectiveness applications. AI/ML has been adopted by those needing access to deeper market insights to craft real world data-driven strategies with speed and precision. AI/ML, integrated into a healthcare company’s analytics strategy, replaces gut instinct and rule-based decision making. It provides evidence-based insights that can reveal complex patterns like those found in patient behaviors, health outcomes, HCP prescribing, and sales, that were previously undetected.

Advances in AI/ML, combined with the increasing availability of healthcare data (e.g. from pharmacies, insurers, healthcare professionals (HCPs), labs, electronic medical records (EMRs), marketing campaigns, and social media), offer the life sciences industry a wealth of insights and the promise of a competitive advantage, with the ability to drive healthcare forward.

Is AI/ML just a new term for classical statistics?

Machine learning first appeared in computer science research in the 1950s. So why, after all these decades, is the life sciences industry finally interested in this family of analytics? The simple answer has to do with data storage and data processing capacities. Both have grown tremendously since that time, to the point where it is now affordable for businesses to use machine learning. Consider that a smartphone now has more storage and computing power than a 1980s mainframe, at a fraction of the cost.

Machine learning draws from numerous fields of study: artificial intelligence, data mining, statistics, and optimization. Data (text) mining uses data storage and data manipulation technologies to prepare the data for analysis. Then, as part of the data mining task, statistical or machine learning algorithms can detect patterns in the data and make predictions about new data.

When comparing machine learning to classical statistics, we often look to the assumptions about the data required for the analyses to function reliably. Classical statistical methods typically require the data to have certain characteristics and often use only a few features (called covariates or predictors) to produce results. Machine learning models, by contrast, might use hundreds or even thousands of features and fitted parameters in a computer-based method to find similarities and patterns in the data.

[Figure: Data Mining Venn Diagram]

The similarities and differences between classical statistics and machine learning are a topic that has generated numerous discussion papers and blogs. Here are some key points worth mentioning:

Classical statistics, a subfield of mathematics, almost always starts with a hypothesis, and generally assumes that some structural relationship exists in the data. It uses probability theory and underlying distributions, and is usually applied:

  • To low-dimensional problems: those with a limited number of potential covariates, predictors, or studied populations, or with smaller sample sizes.
  • When you need to know more about data and the properties of predictors to make accurate inferences about the population under study.
  • When you have more structured and complete datasets.
  • When you want to create a scientifically reliable sample dataset from a population in order to conduct valid inferences and draw unbiased conclusions.
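
To make the hypothesis-first workflow concrete, here is a minimal sketch of an exact permutation test for a difference in means between two small groups. The function name `permutation_test` and the sample values are hypothetical and chosen only for illustration; a real analysis would follow a pre-specified statistical analysis plan.

```python
from itertools import combinations
from statistics import mean

def permutation_test(a, b):
    """Exact two-sided permutation test for a difference in group means.

    Null hypothesis: group labels are exchangeable (no difference).
    Returns the share of relabelings whose absolute mean difference
    is at least as large as the one actually observed.
    """
    pooled = list(a) + list(b)
    observed = abs(mean(a) - mean(b))
    count = total = 0
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        left = [pooled[i] for i in range(len(pooled)) if i in chosen]
        right = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Small tolerance guards against floating-point rounding.
        if abs(mean(left) - mean(right)) >= observed - 1e-12:
            count += 1
        total += 1
    return count / total

# Hypothetical outcome measurements for two small groups.
group_a = [12, 14, 15, 16, 18]
group_b = [8, 9, 10, 11, 13]
p_value = permutation_test(group_a, group_b)
print(f"p-value: {p_value:.4f}")
```

Enumerating every relabeling is only practical for small samples like these, which fits the low-dimension, small-sample setting described above.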

In the life sciences industry, classical statistical methods are the foundation for R&D activities and peer-reviewed, real-world publications. Statistical analysis plans in this discipline must adhere to pre-defined industry standards. Examples include randomized clinical trial analyses and patient analytics, such as survival analysis to compare persistence metrics across multiple groups.
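
As an illustration of the survival analysis mentioned above, the following sketch computes a Kaplan-Meier persistence curve for two groups. The function name `kaplan_meier` and the patient data are hypothetical; this is a bare-bones version of the estimator, not a validated implementation.

```python
from collections import Counter

def kaplan_meier(durations, observed):
    """Kaplan-Meier estimate as [(time, survival_probability)].

    durations: time on therapy per patient (e.g. months)
    observed:  True if the patient discontinued (event),
               False if follow-up ended first (censored)
    """
    events = Counter(t for t, e in zip(durations, observed) if e)
    exits = Counter(durations)  # events and censorings both leave the risk set
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve

# Hypothetical persistence data (months on therapy) for two groups.
group_a = kaplan_meier([2, 3, 3, 5, 8, 8],
                       [True, True, False, True, False, False])
group_b = kaplan_meier([1, 1, 2, 4, 6, 7],
                       [True, True, True, True, False, True])

for name, curve in (("A", group_a), ("B", group_b)):
    print(name, [(t, round(s, 3)) for t, s in curve])
```

A formal comparison of the two curves would normally use a test such as the log-rank test, which is not implemented in this sketch.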

Machine learning is more exploratory and less dependent on a priori hypotheses or assumptions. Algorithms are typically far more complex than their statistical counterparts and often require design decisions to be made before an iterative training process begins. This is due to the difficulty of feature engineering caused by the large number of inputs (high dimensional data sets) and the inclusion of unstructured data (e.g. text data).

  • Machine learning is mainly about creating predictive models, using supervised and unsupervised learning, for tasks such as classification. It typically requires few prior assumptions about the underlying relationships between population variables and their distributions.

Despite these differences, there are many instances where classical statistics and ML use similar approaches and, therefore, overlap with each other. For example, logistic regression is one technique ML borrowed from the field of statistics. It is widely used for classification problems such as segmentation and prediction of group assignment.
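
The overlap can be seen in a small sketch of logistic regression fitted the way an ML practitioner might approach it: by gradient descent on the log-loss, then used purely for prediction. The function names, learning rate, epoch count, and data are assumptions made for illustration; in practice one would use an established statistics or ML library rather than hand-written code.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Fit weights [bias, w1, ..., wk] by batch gradient descent on log-loss."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def predict_proba(w, x):
    """Predicted probability of class 1 for one feature vector x."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

# Hypothetical data: one feature per record, binary group label.
X = [[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]]
y = [0, 0, 0, 1, 0, 1, 1, 1]

weights = fit_logistic(X, y)
print("P(class 1 | x=0.5):", round(predict_proba(weights, [0.5]), 3))
print("P(class 1 | x=4.0):", round(predict_proba(weights, [4.0]), 3))
```

A statistician would fit the same model by maximum likelihood and then examine the coefficient and its standard error; the ML view keeps the same model but judges it by how well it predicts new records.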

Here’s a quick summary of the differences between classical statistics and machine learning:

 

 

| | Classical Statistics | Machine Learning |
|---|---|---|
| Approach | Data Generating (stochastic) Process | Algorithmic Model |
| Driver | Math, Probability Theory | Fitting Data |
| Focus | Hypothesis Testing, Interpretability | Predictive Accuracy (Precision and Recall) |
| Data size | Low-Med | Big Data |
| Dimensions | Mostly for Low Dimensions | High Dimensional Data |
| Inference | Parameter Estimation, Predictions, Estimating Errors | Pattern Recognition |
| Model Choice | Parameter Significance (p-values), Goodness of Fit | Cross-validation of Predictive Accuracy on Partitions of Data |
| Popular Tools | R, SAS | Python |
| Interpretability | High | Med |
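
Two entries in the machine learning column, precision and recall, and cross-validation on partitions of the data, can be sketched together. The example below uses a deliberately simple one-feature threshold classifier as a stand-in for a real model; all function names and data values are hypothetical.

```python
from statistics import mean

def precision_recall(y_true, y_pred):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds; yield (train_idx, test_idx)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size

def fit_threshold(xs, ys):
    """Toy classifier: cut-off halfway between the two class means."""
    mean0 = mean([x for x, y in zip(xs, ys) if y == 0])
    mean1 = mean([x for x, y in zip(xs, ys) if y == 1])
    return (mean0 + mean1) / 2

# Hypothetical, already shuffled data: one feature and a binary label.
x = [1.0, 4.2, 1.5, 3.8, 2.1, 4.5, 0.8, 3.1, 2.6, 3.9, 1.2, 2.4]
y = [0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0]

# Every record is predicted by a model that never saw it during fitting.
out_of_fold = [None] * len(x)
for train, test in k_fold_indices(len(x), k=3):
    threshold = fit_threshold([x[i] for i in train], [y[i] for i in train])
    for i in test:
        out_of_fold[i] = 1 if x[i] > threshold else 0

precision, recall = precision_recall(y, out_of_fold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

The point of the design is that model choice rests on held-out predictive performance rather than on p-values or goodness of fit computed on the same data used for fitting.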

What are the implications of using AI/ML for the healthcare industry?

For life sciences companies, understanding the pros and cons of both classical statistics and AI/ML is important when investing in your business. Several key industry-specific conditions can lead decision makers to adopt machine learning solutions. For example:

  • The high-dimensional nature of many healthcare datasets and the large number of features required to generate powerful predictive models.
  • Large data sets (Big Data) with millions of records, made up of both structured and unstructured data.
  • Rare disease population data that create unbalanced sub-groups, which require complex data engineering steps for model fitting.
  • The need for dynamic algorithms based on a properly deployed machine learning platform that can take advantage of frequent data refreshes and changes in the marketplace, improving the performance of the models over time while maintaining relevance.
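
One common data engineering step for the unbalanced sub-groups mentioned above is random oversampling of the minority class before model fitting. The sketch below is one simple option among several, not a recommendation; the function name `oversample_minority` and the 5-in-100 prevalence are hypothetical.

```python
import random

def oversample_minority(rows, labels, seed=0):
    """Resample smaller classes with replacement until all classes match
    the size of the largest one. Intended for training data only."""
    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels

# Hypothetical cohort: 5 rare-disease patients among 100 records.
rows = list(range(100))
labels = [1 if i < 5 else 0 for i in range(100)]

balanced_rows, balanced_labels = oversample_minority(rows, labels)
print("before:", labels.count(1), "positive of", len(labels))
print("after: ", balanced_labels.count(1), "positive of", len(balanced_labels))
```

Oversampling should be applied after the data are split, so that duplicated records never appear in both the training and the evaluation partitions.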

Effectively deploying AI/ML technologies can transform a commercial strategy, giving decision makers an edge in the marketplace. However, it only works when organizations have a machine learning strategy with all the necessary elements:

  • Access to diverse industry data sets and subject-matter expertise. Working with healthcare data is far more complex than one would think, due in part to the diversity of data sources and the variable level of completeness, requiring sophisticated data engineering steps of data imputation and normalization.
  • Deep healthcare industry and regulatory knowledge, including knowledge of data privacy legislation.
  • Advanced AI/ML technologies allowing for efficient delivery of proofs of concept and solutions.
  • Technical expertise to build AI/ML algorithms that are fit for purpose and generate meaningful insights. Data scientists, the most skilled analytic professionals, need a unique blend of computer science, mathematical statistics, and domain expertise. Most data scientists are trained in industries such as retail, financial services, and communication/social media, making experienced healthcare data scientists hard to find.
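
The imputation and normalization steps named in the list above can be sketched in their simplest form: mean imputation followed by z-score standardization. The function names and the lab values are hypothetical, and real healthcare data usually calls for more careful methods than a single column mean.

```python
from statistics import mean, pstdev

def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Hypothetical lab measurements with two missing results.
lab_values = [5.0, None, 7.0, 9.0, None]

imputed = impute_mean(lab_values)
scaled = standardize(imputed)
print("imputed:", imputed)
print("scaled: ", [round(z, 3) for z in scaled])
```

Mean imputation shrinks the variance of the column and ignores why values are missing, which is one reason these steps need domain expertise.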

Given the many challenges, is there a reward?

AI and machine learning can deliver previously inaccessible insights that can positively impact commercial activities and support various functions within healthcare organizations. AI/ML methods have been shown to consistently deliver more accurate outcomes in less time than conventional assessments. Deriving the greatest benefit from the investment entails adopting a long-term strategy and new ways of performing analytics, rather than looking for short-term gains.

Strategies include:

  • Start with smaller projects and scale up over time. You will be less overwhelmed working with Big Data.
  • Define clear business objectives. AI/ML cannot magically guess what you are trying to do!
  • Plan for how you will measure success. AI/ML accumulates knowledge over time. Be patient and prepared for an iterative process that will get better and provide incremental benefits (ROI) as time goes by.
  • Success requires a change in culture around analytics within the organization. You will need to build trust in AI/ML-driven predictive and prescriptive outcomes. The level of comfort in receiving action plans designed by a machine will vary from individual to individual.
  • Finally, remember that when properly designed and implemented, AI/ML-driven insights work! Don’t be afraid to innovate. Many others in the industry are already doing it. At IQVIA, we have conducted hundreds of projects around the globe with proven significant return on investment for healthcare companies.

Conclusion

Classical statistics and machine learning need to co-exist; the use of one versus the other should be based on the analytical problem at hand. In some scenarios, they serve very different purposes. In others, they may overlap. The question is not whether one approach should be adopted at the expense of the other, but rather which is the most appropriate for a given business situation.

Machine learning is moving into the mainstream. Effective use of machine learning in business entails developing an understanding of ML within the broader analytics environment, becoming familiar with proven applications, anticipating the challenges you may face using it in your organization, and learning from leaders in the field. Consider a holistic view of machine learning inside your organization. The volume and variety of data, combined with significant regulatory requirements in the healthcare industry, present a challenge. However, if healthcare companies can successfully navigate this challenge, they face an unprecedented opportunity to answer complex questions about how best to demonstrate the value of their products, craft messaging, and execute sales strategies that deliver commercial success.

In the next few blogs of this AI/ML series, we will demonstrate success stories where AI/ML has been applied to bring competitive advantage to clinical and commercial teams.

If you have questions or comments about this blog or would like to discuss how your business can transform from using traditional statistics to machine learning, contact Pierre.St-Martin or Canadainfo@iqvia.com.
