Data Science, Machine Learning (ML), and Artificial Intelligence (AI) have become hot topics across all industries, including healthcare. Companies large and small are rushing to stock up on data scientists, but are data scientists alone enough to build a successful data science practice in healthcare? Without a doubt, data scientists are needed to build models. But when you are dealing with healthcare and human data, you need an entire ecosystem of support functions to develop and sustain the team.
At IQVIA, that ecosystem is in place, and it’s called Human Data Science.
Previous articles on human data science have introduced some of the requirements for bringing data science to healthcare. In this article, we dive into six specific ingredients that, when carefully orchestrated, provide the necessary foundation for success.
- Data science expertise
- Data access and data expertise
- Business expertise
- Medical expertise
- Technology access and technology expertise
- Information governance expertise
Data Science: First up is data science itself, the most widely discussed ingredient for success. Data science is a relatively new field that sits at the intersection of statistics, computer science and analytics. All three are equally important, and none can be overlooked. The role of statistics is to prevent you from drawing inappropriate conclusions from the data. The role of computer science is to go faster and further with the data. The role of analytics is to play with the data, uncover actionable insights and tell a story about the data. An appreciation for the importance of each of these roles, and an understanding of how they blend together, is essential for a data scientist in any field. However, it is only the starting point for a successful practice of human data science.
Data access and expertise: There is no data science without data. The amount of routinely collected healthcare data has grown consistently over the past decades. However, not all healthcare data is the same, and most certainly none is perfect. Understanding the benefits and caveats of different types of healthcare data, and wisely choosing which type(s) best address your business need, is therefore essential. The list of considerations will vary depending on your application but will certainly include data quality and coverage, ease of data access, and timeliness of data transfer.
Imagine you are building a clinical decision support tool for Alzheimer’s disease to facilitate patient-physician interaction, but you procure data collected by US commercial insurance providers. You will certainly see some patients, but you will not capture your target population completely, because commercial insurance has poor capture of patients over the age of 65, most of whom are covered by Medicare.
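A simple coverage check can surface this kind of gap before any model is built. The sketch below is a minimal illustration, assuming you have patient ages from the candidate data source and an expected age profile for your target population. The function names, the two age bands, the `tolerance` threshold and the `target_profile` numbers are hypothetical, not real Alzheimer’s prevalence figures.

```python
from collections import Counter

BANDS = ("<65", "65+")

def age_band(age: int) -> str:
    # Split at 65, the age at which most people in the US become Medicare-eligible.
    return "65+" if age >= 65 else "<65"

def coverage_by_band(ages: list[int]) -> dict[str, float]:
    # Share of patients in each age band within the candidate data source.
    counts = Counter(age_band(a) for a in ages)
    total = sum(counts.values())
    if total == 0:
        return {band: 0.0 for band in BANDS}
    return {band: counts[band] / total for band in BANDS}

def underrepresented_bands(ages: list[int], expected_share: dict[str, float],
                           tolerance: float = 0.5) -> list[str]:
    # Flag bands whose observed share falls below `tolerance` times the share
    # expected in the target population (hypothetical threshold).
    observed = coverage_by_band(ages)
    return [band for band, expected in expected_share.items()
            if observed.get(band, 0.0) < tolerance * expected]

# Hypothetical example: a commercial-claims extract skewed toward younger patients.
claims_ages = [45, 52, 58, 61, 63, 67]
target_profile = {"<65": 0.2, "65+": 0.8}  # illustrative only
print(underrepresented_bands(claims_ages, target_profile))  # ['65+']
```

A flag like this does not fix the data, but it makes the mismatch between source and target population visible early, when choosing a different or additional data source is still cheap.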
Read more about the need for understanding the minutiae and complexity of healthcare data from Michael Kleinrock, the lead human data scientist at the IQVIA Institute for Human Data Science.
Data access: check. Next up is domain expertise. In the healthcare industry, you need at least two types of domain expertise, business and medical, to ensure that data science projects use data in a manner that is both commercially and clinically appropriate.
Business Expertise: Business expertise is essential for understanding the environment for which you are designing a solution: Who are the key stakeholders and users? What keeps them awake at night? What is the status quo and how will the proposed data science solution improve on it? What regulatory rules need to be considered?
Consider a machine learning model that predicts whether a patient is ready to switch to your brand. An obvious way to build this model would be to train it only on patients who have already switched to your brand, but what about the competitive landscape? Do candidate patients often initiate treatment with a competitor instead? Also training the model on patients who initiate a competitor brand captures the full pool of patients who are ready to switch, which expands the market opportunity for your brand.
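One place this choice shows up is in how the training label is defined. A minimal sketch, assuming each patient record notes which brand (if any) the patient initiated during an outcome window; the `Patient` class, the `switch_label` function and the brand names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Patient:
    patient_id: str
    # Brand initiated during the outcome window; None if no new treatment started.
    new_brand: Optional[str]

def switch_label(patient: Patient, own_brand: str, competitor_brands: set[str],
                 include_competitors: bool = True) -> int:
    """Return 1 if the patient counts as 'ready to switch', else 0."""
    if patient.new_brand is None:
        return 0
    if patient.new_brand == own_brand:
        return 1
    # Optionally treat competitor initiations as positive examples too.
    if include_competitors and patient.new_brand in competitor_brands:
        return 1
    return 0

cohort = [Patient("p1", "BrandA"), Patient("p2", "BrandB"), Patient("p3", None)]
competitors = {"BrandB", "BrandC"}
own_only = [switch_label(p, "BrandA", competitors, include_competitors=False) for p in cohort]
with_competitors = [switch_label(p, "BrandA", competitors) for p in cohort]
print(own_only, with_competitors)  # [1, 0, 0] [1, 1, 0]
```

With competitor initiations included, patient `p2` becomes a positive example, so the model learns from the wider pool of patients who were ready to change therapy rather than only those your brand already won.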
Medical expertise: Medical expertise goes hand in hand with expertise in the complexity and uniqueness of healthcare data. Knowledge of the therapy area is critical to shape study design, sanity-check model outputs and recommendations, and ultimately ensure the final solution has a positive impact on patient care. Which clinical codes should I use for my inclusion/exclusion criteria? Which predictors are expected to be important? How should what the model learns be incorporated into medical practice?
Consider, for example, a machine learning algorithm that predicts whether patients will respond better to treatment A or treatment B but does not account for the fact that one of the treatments is contraindicated for a specific subgroup of patients. The impact of this oversight may be significant, and it may become apparent too late.
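One safeguard a medical expert might ask for is to restrict the A-vs-B comparison to patients who could actually receive either treatment. The sketch below assumes a contraindication table supplied by clinicians; `CONTRAINDICATIONS`, `condition_x`, `condition_y` and the function names are hypothetical placeholders, not real clinical rules.

```python
# Hypothetical contraindication table: treatment -> conditions that rule it out.
CONTRAINDICATIONS: dict[str, set[str]] = {
    "treatment_B": {"condition_x"},
}

def eligible_for(conditions: set[str], treatment: str) -> bool:
    # A patient is eligible unless they have any contraindicated condition.
    return not (conditions & CONTRAINDICATIONS.get(treatment, set()))

def comparison_cohort(patients: dict[str, set[str]],
                      treatments: tuple[str, ...] = ("treatment_A", "treatment_B")) -> list[str]:
    # Keep only patients who could clinically receive every treatment being compared.
    return [pid for pid, conds in patients.items()
            if all(eligible_for(conds, t) for t in treatments)]

patients = {"p1": set(), "p2": {"condition_x"}, "p3": {"condition_y"}}
print(comparison_cohort(patients))  # ['p1', 'p3']
```

Excluded patients such as `p2` still matter clinically; the point is only that the model should not learn from, or recommend, a treatment those patients cannot receive.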
Technology: With petabytes of healthcare data being routinely collected, investment in technology for efficient storage, real-time data processing and machine learning becomes a must. The growth of cloud environments (e.g. AWS, Azure) and open-source languages and tools (e.g. Python, TensorFlow) facilitates on-demand machine learning and allows rapid development of AI solutions, leading to faster time to value. Emerging interoperability standards (e.g. HL7/FHIR) that guide the exchange, integration, sharing and retrieval of healthcare data should not be overlooked. Nor should the domain expertise described above, which is needed to apply fluency in these technologies to healthcare questions.
Deep learning with TensorFlow or PyTorch is an exciting modeling approach that is attracting a lot of attention, but deep learning models can be harder to interpret and tend to be better suited to unstructured data (images, text) than to structured data (medical claims, EHR). Understanding these tradeoffs is key to choosing the right approach for an AI/ML problem.
Read more about the emerging technologies and the necessary fluency to apply data science methodologies to healthcare questions from Yilian Yuan, SVP of Advanced Analytics and human data science disciple.
Information Governance: The last, and often overlooked, piece is information governance, which safeguards patient privacy. The US HIPAA (Health Insurance Portability and Accountability Act) establishes a set of safeguards to keep healthcare information secure. However, on top of HIPAA requirements, there are countless best practices for minimizing the risk of inadvertent patient re-identification and maximizing patient privacy. The trade-off between risk and reward is often poorly understood. Which information can and cannot be shared, and with whom? Can data be transferred across geographical borders? What steps should be taken if patient re-identification is required for the research engagement?
Consider the rare disease space. Algorithms used to find undiagnosed patients do a wonderful job of finding hard-to-find patients, but they also increase the risk of re-identifying patients. When an algorithm identifies a single patient in a small town who visits the only specialist in town, information governance should be in place to prevent breaches of privacy. Good information governance can also enable sales teams to improve their messaging, even with these safeguards in place.
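One common governance check is a k-anonymity style small-cell test: before model outputs leave a controlled environment, count how many patients share each combination of quasi-identifiers (such as town and specialist) and suppress any group smaller than a threshold k. A minimal sketch, assuming records are plain dictionaries; the field names, the threshold `k=5` and the helper functions are hypothetical, and real thresholds come from your governance policy.

```python
from collections import Counter
from typing import Any

Record = dict[str, Any]

def group_key(record: Record, quasi_identifiers: list[str]) -> tuple:
    return tuple(record[q] for q in quasi_identifiers)

def small_cells(records: list[Record], quasi_identifiers: list[str], k: int) -> dict[tuple, int]:
    # Groups sharing the same quasi-identifier values but with fewer than k
    # members carry a higher re-identification risk.
    counts = Counter(group_key(r, quasi_identifiers) for r in records)
    return {key: n for key, n in counts.items() if n < k}

def suppress_small_cells(records: list[Record], quasi_identifiers: list[str], k: int) -> list[Record]:
    # Drop records that fall into a small cell before results are shared.
    risky = small_cells(records, quasi_identifiers, k)
    return [r for r in records if group_key(r, quasi_identifiers) not in risky]

records = [{"town": "Metro", "specialist": "S1"} for _ in range(6)]
records.append({"town": "Smalltown", "specialist": "S2"})
qi = ["town", "specialist"]
print(small_cells(records, qi, k=5))                # {('Smalltown', 'S2'): 1}
print(len(suppress_small_cells(records, qi, k=5)))  # 6
```

Suppression is only one option; generalizing a quasi-identifier (for example, reporting a region instead of a town) is another way to lift small groups above the threshold.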
If you find one person who checks all of these boxes by holding expertise in all six areas described above, consider yourself extremely lucky. Offer them a job and keep them happy. However, in my experience, such a unicorn doesn’t exist. You need a combination of brilliant, motivated people who are very strong in at least one of these areas, plus others who can operate effectively across multiple areas and act as connectors, helping different sides find a common language.
As my colleague Michael Kleinrock so aptly put it, human data science simply doesn’t, nor can it, exist in a vacuum. This new breed of data science experts needs more than just depth. They need a sense of collaboration, creativity and community.
Consider just a few tasks for a human data science team:
- Gather and combine both business and medical background for the research engagement at hand, and understand which data source(s) should be used
- Recommend a solution with an appropriate level of transparency for the business need in question and avoid the common pitfall of choosing the fanciest machine learning algorithm when a simpler algorithm will suffice
- Design the study, considering a myriad of consecutive steps, anticipating and resolving challenges along the way and playing with the data to investigate multiple hypotheses gathered from domain experts
- Validate the results against medical expertise, and ensure they are clinically meaningful
- Recommend a deployment strategy that is timely, able to adapt to a changing environment, and deemed suitable by business experts
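The second task above can be made concrete with a simple selection rule: prefer the simplest model whose validation score is within a chosen tolerance of the best. A minimal sketch; the rule, the model names, the scores and the tolerance values are hypothetical and only illustrate the idea, not a prescribed method.

```python
def choose_model(scores: dict[str, float], complexity: dict[str, int],
                 tolerance: float) -> str:
    """Pick the simplest model whose score is within `tolerance` of the best."""
    best = max(scores.values())
    candidates = [name for name, s in scores.items() if s >= best - tolerance]
    # Lower complexity rank = simpler, more transparent model.
    return min(candidates, key=lambda name: complexity[name])

# Hypothetical validation scores (e.g., AUC) and complexity ranks.
scores = {"logistic_regression": 0.80, "gradient_boosting": 0.805, "deep_net": 0.81}
complexity = {"logistic_regression": 0, "gradient_boosting": 1, "deep_net": 2}
print(choose_model(scores, complexity, tolerance=0.02))   # logistic_regression
print(choose_model(scores, complexity, tolerance=0.001))  # deep_net
```

The tolerance encodes a business judgment: how much predictive performance the team is willing to trade for transparency, which is a conversation for business and medical experts, not data scientists alone.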
Good data science connectors possess a collaborative nature and strong communication skills, inquisitiveness and pragmatism, organized logical thinking and curiosity. Many of these are soft skills (human skills) that haven’t historically appeared in data science job postings but are increasingly part of the rubric for human data scientists.
This blog post only scratches the surface of the important considerations for data science in healthcare. In the upcoming months, my colleagues and I will be sharing a deeper dive into each one of the success factors.
Are you building a data science team in healthcare? What have you learned on your journey so far? What lessons do you have to share? Are there other requirements that should be considered? I would love to hear from you!