Today, there are approximately 350 million people living with 7,000 rare diseases worldwide. In the United States alone, 25 to 30 million people are affected – more than half of whom are children – while treatments are available for just 5% of them. As these diseases are exceedingly rare, initial misdiagnosis is common and underdiagnosis is extensive. On average, patients with a rare disease wait more than seven years for an accurate diagnosis, which significantly delays their access to critical treatment.
We must find a more reliable means of detecting rare diseases to ensure patients get the treatments they need. With advances in machine learning and deep learning technologies, we can now pursue answers to questions about these diseases, regardless of their complexity, in faster and more compelling ways.
Two primary challenges exist in using machine learning to detect rare diseases. First, the low prevalence of these diseases limits the number of positive subjects (i.e., patients with a confirmed diagnosis of the disease) in the training data. As a result, disease patterns are hard to extract.
This is made more difficult by the fact that many rare diseases have not been assigned an ICD-10 code. ICD-10 codes are used by physicians to identify specific diseases or symptoms. In the case of many rare diseases, each physician must use their judgment to select an ICD-10 code that best accounts for a patient’s symptoms. In practice, this means a set of patients who in fact share the same disease could each be assigned a different ICD-10 code.
Second, there are many patients with an uncertain diagnosis due to the long period needed for rare diseases to be correctly diagnosed. Although we do not know who those patients are, their existence can potentially help. In recent years, the availability of massive electronic health records (EHR) data has further enabled the training of deep learning models for accurate predictive health.
Getting past the false positive hurdle
While artificial intelligence holds great promise for detecting undiagnosed patients, the barrier is much higher with rare diseases, because their low prevalence makes it difficult to distinguish between patients with similar conditions. To tackle these challenges, we need to continue to evolve how we apply machine learning. One way is to use pattern augmentation to better preserve and enrich crucial patterns of the target disease.
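To see why low prevalence raises the false positive hurdle, it can help to work through the arithmetic. The sketch below applies Bayes' rule to estimate what share of flagged patients actually have the disease. The prevalence, sensitivity and specificity values are hypothetical numbers chosen for illustration; they do not come from any study or from the work described in this article.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Share of flagged patients who truly have the disease (Bayes' rule)."""
    true_positives = prevalence * sensitivity
    false_positives = (1.0 - prevalence) * (1.0 - specificity)
    return true_positives / (true_positives + false_positives)


# Hypothetical detector: 99% sensitivity and 99% specificity (assumed values).
rare_ppv = positive_predictive_value(
    prevalence=1 / 10_000, sensitivity=0.99, specificity=0.99
)
common_ppv = positive_predictive_value(
    prevalence=1 / 10, sensitivity=0.99, specificity=0.99
)

print(f"Rare disease (1 in 10,000): {rare_ppv:.2%} of flagged patients are true cases")
print(f"Common disease (1 in 10):   {common_ppv:.2%} of flagged patients are true cases")
```

Under these assumed numbers, the same detector that is right about roughly 92% of its flags for the common condition is right about only around 1% of its flags for the rare one. The detector has not changed; the pool of healthy patients who can be wrongly flagged is simply far larger.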
In partnership with Jimeng Sun of the University of Illinois Urbana-Champaign, my team at IQVIA published a paper featured at AAAI 2020 outlining a Complementary Pattern Augmentation (CONAN) framework for rare and low prevalence disease detection.
CONAN uses the idea of adversarial learning. First, a generator learns to create plausible, but fake, patient samples. Then, a disease detector aims to distinguish between negative and positive patient samples. This establishes what is called a minimax game between the generator and disease detector. After this training, the disease detector can be used for detecting positive patients. Experiments on real-world data sets demonstrated strong performance. Read the full paper here.
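For readers who want to see what a minimax game looks like on paper, the standard objective used in generative adversarial learning is shown below. This is the generic textbook form, offered as an illustration only; the exact loss used in CONAN may differ, and the symbols are assumptions for this sketch: G is the generator, D is the detector, x is a real patient sample, z is the generator's input, and D(·) is the detector's estimated probability that a sample is real.

```latex
\min_{G} \max_{D} \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_{z}}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

The detector tries to make V as large as possible by scoring real samples high and generated samples low, while the generator tries to make V as small as possible by producing samples the detector cannot tell apart from real ones.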
Improving the ability to detect rare diseases is a critical step in finding answers to the perplexing questions surrounding rare diseases, and ultimately ensuring patients can be properly diagnosed and treated. To build on this success, deep expertise and creativity are required to challenge the way things have been done previously and to make connections, with accuracy, even when it seems impossible.
At IQVIA, we believe the pursuit of a healthier world begins by solving problems that once seemed unsolvable and enabling treatments for diseases once thought untreatable. In this pursuit, Human Data Science leads the way. It’s a revolutionary way to approach problem solving in healthcare, harnessing advances in technology, data science and human ingenuity to improve human health.
For more information on rare diseases and orphan drugs, read the IQVIA Institute report.