Three Data Matching Challenges in Healthcare
Understanding the challenges specific to matching healthcare professional and healthcare organization data
Herve Fuselier, Engagement Manager, Technology
Gilbert Merariu, Practice Leader, Technology Solutions
Jul 14, 2020

Successful data matching means high match rates with no incorrect matches and minimal manual effort. Having worked on many healthcare data management projects, we want to share three reasons why data matching, specifically when dealing with healthcare professional (HCP) and healthcare organization (HCO) data, is more challenging than in other industries.

Challenges specific to matching healthcare data

  • Data cleanliness and completeness issues. The abundance of systems that collect HCP and HCO data results in data inconsistencies and different levels of data quality.
  • Lack of understanding of the healthcare context. Not understanding or taking into consideration healthcare concepts, definitions, terms, and phrases in the matching process can lead to false positives.
  • Lack of adequate tools causes volumes of manual intervention. Some decisions still need a human. Even with the most sophisticated matching systems, the mechanisms and processes that support this manual review can be inflexible, disjointed, and inadequate to meet healthcare’s data matching needs.

Consequently, a non-optimal matching system can generate false positives and result in wrong information being linked to a person or organization. This can lead to incorrect decisions or regulatory/compliance issues, for example, allocating an HCP payment against the wrong individual. Once incorrect information is linked, for example within a CRM, it is extremely difficult to untangle the linked information and reassign the data. That is why it is important to get it right the first time.

In this blog, let’s dive deeper into each one of the challenges outlined above. Whether matching is a stand-alone requirement (i.e. you have an existing master data/reference) or part of a broader master data management solution, these challenges can still apply.

Data cleanliness and completeness issues

Without common keys (a common ID that links one file to another), matching is highly sensitive to the quality of the data that is being matched.

The largest contributor to a lack of data cleanliness and completeness is human error. Missed entries, entries into incorrect fields, additions of extra non-essential information, spelling mistakes, and misinterpretation of the meaning of a field can all contribute to subpar data. For example, a free text input field to enter the HCP specialty, instead of a predefined selection list, can mean many variations for the same information.
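One common way to tame free-text variation is to normalize each value against a lookup table before matching. The snippet below is a minimal sketch of that idea; the synonym table and the function name `normalize_specialty` are hypothetical and only for illustration, and a real table would be built from the variants actually observed in your sources.

```python
import re

# Hypothetical mapping from observed free-text variants to one canonical label.
SPECIALTY_SYNONYMS = {
    "cardiology": "Cardiology",
    "cardiologist": "Cardiology",
    "cardio": "Cardiology",
    "family medicine": "Family Medicine",
    "family practice": "Family Medicine",
    "gp": "Family Medicine",
}


def normalize_specialty(raw):
    """Return a canonical specialty, or None when the value is not recognized."""
    # Lowercase, turn punctuation into spaces, and trim the result.
    key = re.sub(r"[^a-z0-9]+", " ", (raw or "").lower()).strip()
    return SPECIALTY_SYNONYMS.get(key)
```

Returning `None` for unrecognized values, rather than guessing, keeps unknown entries visible so they can be reviewed and added to the table.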

A second reason for data incompleteness is inconsistent design across input systems. Some of these systems were designed to satisfy one business purpose and did not consider future data integration requirements. For example, a training registration page may not need to collect a physician’s specialty. Although not required for training purposes, that information can be key to avoiding false positives when you need to integrate the training information into a CRM or with business intelligence, compliance, or other systems, because it is not unusual for different HCPs to have similar first and last names.

It is unrealistic to expect all upstream data collection systems to be rebuilt to accommodate your specific data integration needs. Even data entry standardization does not solve all matching problems, because there can still be issues of ambiguity or misuse of the system.

Lack of understanding of the healthcare context leads to data ambiguity

As you can see above, it’s not uncommon for data cleanliness and completeness issues to cause ambiguous matching.

Ambiguity is inexactness or the quality of being open to more than one interpretation. If we have an HCP in a file with the first name ‘Jane’, the last name ‘Smith’, and the workplace as ‘Lakeshore Hospital’, we may find that it matches to two different records in our reference file. This makes the correct match indistinguishable unless another piece of information is available to remove this ambiguity. The use of formal names in official registries at Colleges and Boards (e.g. Dr. Terrance Jones) and the use of more common names (e.g. Dr. Terry Jones) for everyday correspondence can also add to ambiguity.
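The ‘Jane Smith’ and ‘Terry Jones’ situations can be sketched in a few lines. This is an illustration only, not a production matcher: the record IDs, the specialties, the workplace assigned to Dr. Jones, and the tiny nickname table are all made up for the example.

```python
# Hypothetical nickname table; a real one would be far larger.
NICKNAMES = {"terry": "terrance"}

# Made-up reference records for illustration.
REFERENCE = [
    {"id": "R1", "first": "Jane", "last": "Smith",
     "workplace": "Lakeshore Hospital", "specialty": "Cardiology"},
    {"id": "R2", "first": "Jane", "last": "Smith",
     "workplace": "Lakeshore Hospital", "specialty": "Pediatrics"},
    {"id": "R3", "first": "Terrance", "last": "Jones",
     "workplace": "Lakeshore Hospital", "specialty": "Oncology"},
]


def canonical_first_name(name):
    """Map an everyday name to its formal form when the table knows it."""
    cleaned = name.strip().lower()
    return NICKNAMES.get(cleaned, cleaned)


def match_key(row):
    """Build the comparison key from first name, last name, and workplace."""
    return (
        canonical_first_name(row["first"]),
        row["last"].strip().lower(),
        row["workplace"].strip().lower(),
    )


def classify(record, reference):
    """Return ('no_match' | 'unique' | 'ambiguous', matching reference rows)."""
    key = match_key(record)
    candidates = [ref for ref in reference if match_key(ref) == key]
    if not candidates:
        return "no_match", candidates
    if len(candidates) == 1:
        return "unique", candidates
    return "ambiguous", candidates
```

With this data, an incoming ‘Jane Smith’ at ‘Lakeshore Hospital’ is flagged as ambiguous because two reference rows share the same key, while ‘Terry Jones’ resolves to the single ‘Terrance Jones’ row. An extra attribute such as specialty is the kind of information that would break the tie.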

Without a good knowledge of the healthcare landscape and lexicon, matching exercises can lead to all sorts of ambiguous results. Here are a few examples that are unique to healthcare data and need to be considered. HCPs are frequently affiliated with multiple locations; one doctor may have a primary office but also work out of several clinics or hospital sites. For HCOs, one hospital may have multiple sites, each with its own physical address, or an organization may be referred to by different names, some referring to a clinic or department that exists inside the hospital.

There will always be differences between the data that has been input and the reference data. The trick is to decide how to measure the confidence level of a match, decide on a pass/fail rule, and automate this process to make matching effective and fast.
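As a rough sketch of what measuring confidence and deciding pass/fail can look like, the snippet below scores a candidate pair with a weighted average of per-field string similarities, using `difflib.SequenceMatcher` from the Python standard library. The field names, weights, and threshold are assumptions chosen for illustration, not recommended values.

```python
from difflib import SequenceMatcher

# Assumed field weights and pass threshold; tune these on your own data.
WEIGHTS = {"first": 0.3, "last": 0.4, "workplace": 0.3}
PASS_THRESHOLD = 0.9


def similarity(a, b):
    """Case-insensitive similarity between two strings, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def match_confidence(record, ref):
    """Weighted average of the per-field similarities."""
    return sum(w * similarity(record[f], ref[f]) for f, w in WEIGHTS.items())


def passes(record, ref, threshold=PASS_THRESHOLD):
    """Pass/fail decision: accept the pair only at or above the threshold."""
    return match_confidence(record, ref) >= threshold
```

A pair that differs only by a small typo (for example ‘Smyth’ versus ‘Smith’) scores slightly below an exact match, so whether it passes depends entirely on where the threshold is set.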

Lack of adequate tools to match healthcare records causes volumes of manual intervention

Now that we’ve seen how data cleanliness and ambiguity make healthcare matching a challenge, let’s look at how well existing matching tools serve the needs of the data steward. The data steward typically owns the processes and governance to ensure data cleanliness and accuracy.

Increasing the automatic match rate is relatively easy and tempting to do. You can simply loosen the matching parameters to achieve a high match rate, but the downside is that you end up with many false positives. Even the smartest computer and the most sophisticated programs will need a person to decide when the input data lacks enough information for a confident match. Some level of manual intervention is inevitable, so the objective is to find the balance where all automatic matches are correct and the manual workload is minimized.
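The trade-off can be made concrete with a small threshold sweep. The sketch below assumes you have a sample of candidate pairs that already carry a confidence score and a manually verified true/false label; the scores and labels shown are made up for illustration.

```python
# Made-up sample: (confidence score, verified as a true match?)
SAMPLE = [
    (0.98, True), (0.95, True), (0.91, True), (0.88, False),
    (0.84, True), (0.72, False), (0.60, False),
]


def evaluate_threshold(scored_pairs, threshold):
    """Count auto-matches, false positives, and pairs left for manual review."""
    auto = [(score, ok) for score, ok in scored_pairs if score >= threshold]
    return {
        "auto_matched": len(auto),
        "false_positives": sum(1 for _, ok in auto if not ok),
        "manual_review": len(scored_pairs) - len(auto),
    }


def lowest_safe_threshold(scored_pairs, candidates):
    """Smallest candidate threshold with zero false positives, or None."""
    for threshold in sorted(candidates):
        if evaluate_threshold(scored_pairs, threshold)["false_positives"] == 0:
            return threshold
    return None
```

On this sample, a threshold of 0.7 auto-matches six of the seven pairs but lets two false positives through, while 0.9 auto-matches only three pairs, all correct, and sends the remaining four to manual review.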

For the records that could not be matched automatically, we have seen manual review processes involve lengthy Excel files where the same ambiguous records are coming from different sources and require repeated manual lookup. We have seen manual matching processes which are disjointed and inadequate. In some cases, manual matching took so long that reports were delayed, or the reference data became outdated before it could be used. This can result in the whole process starting again.

Matching tools that do not understand the healthcare context lead to an increase in the manual workload. Tools need to provide data stewards an interface that reflects these healthcare nuances, not only in how the data can be visualized but also in the functionalities offered to the user to research the best match, investigate the ambiguities, or work around data quality issues.

Matching can be improved

Although matching healthcare records can be attempted using basic tools like spreadsheets, automated tools are needed to ensure that any manual intervention is kept to a minimum and, when required, can be performed quickly and with confidence. In addition, automated processes have the advantage of quality control, speed, consistency, better reproducibility of results, and centralized tracking.

In the second part of our healthcare data matching blog, Three Themes to Keep in Mind When Looking for Matching Solutions, we will discuss some of the most common things overlooked by clients when they are considering matching solutions.

If you have comments or questions about this blog, please contact Canadainfo@iqvia.com
