
How does pseudonymization work in CTcue?

Apr 21, 2021

Pseudonymization of structured data

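CTcue uses pseudo IDs to protect patient privacy. These IDs replace identifying details, such as the patient's name, with a pseudonymous number that remains fixed once assigned. The pseudo ID is generated by hashing the patient number. Hashing is a one-way operation: the same input always yields the same output, but the output cannot practically be converted back into the input. We use the SHA-512 hashing algorithm.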

The hashes are also ‘salted’: an additional value is combined with the input before hashing. This makes them even safer, because the pseudo ID can never be traced back to the patient without access to the CTcue online database within the hospital.
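
As an illustration only, the Python sketch below derives a salted SHA-512 pseudo ID. It assumes the salt is simply prepended to the patient number; the article does not say how CTcue combines the two, and `SALT` and `pseudo_id` are hypothetical names.

```python
import hashlib

# Hypothetical secret salt. In a real deployment it would stay inside the
# hospital environment and never leave it together with the pseudonymized data.
SALT = b"example-secret-salt"


def pseudo_id(patient_number: str, salt: bytes = SALT) -> str:
    """Return a fixed pseudo ID: the SHA-512 hash of salt + patient number.

    The same patient number always gives the same pseudo ID, but without the
    salt an outsider cannot rebuild the mapping by hashing candidate numbers.
    """
    return hashlib.sha512(salt + patient_number.encode("utf-8")).hexdigest()


print(len(pseudo_id("1234567")))                     # 128 hex characters
print(pseudo_id("1234567") == pseudo_id("1234567"))  # True: the ID is fixed
```

If the salt is kept secret, a keyed construction such as HMAC-SHA-512 (Python's `hmac.new(salt, message, hashlib.sha512)`) is a common alternative to plain concatenation.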

In addition, patients' Electronic Health Records (EHRs) are sometimes merged. For example, Patient 1 (P1) and Patient 2 (P2) are initially registered as two distinct patients. Later, however, it may be established that P2 (who arrived at the emergency department) is actually the same person as P1 (who was already registered in the EHR). In this case, CTcue merges the pseudo IDs of these patients. The merged pseudo ID is the hash of the patient number of the record that was already registered (in the example, a SHA-512 hash of P1).
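
The merge rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions: merges are recorded in a hypothetical `MERGES` table that maps a merged-away patient number to the number it was merged into, the salt is simply prepended before hashing, and all names are invented for the example.

```python
import hashlib

SALT = b"example-secret-salt"  # hypothetical secret salt

# Hypothetical merge table: merged-away patient number -> number it was merged
# into. Here P2 turned out to be the same person as the registered patient P1.
MERGES = {"P2": "P1"}


def surviving_number(patient_number: str, merges: dict[str, str]) -> str:
    """Follow merge links until reaching the record that is still registered."""
    seen = set()
    while patient_number in merges:
        if patient_number in seen:
            raise ValueError(f"merge cycle involving {patient_number}")
        seen.add(patient_number)
        patient_number = merges[patient_number]
    return patient_number


def pseudo_id(patient_number: str, merges: dict[str, str] = MERGES) -> str:
    """Hash the surviving patient number, so all merged records share one ID."""
    target = surviving_number(patient_number, merges)
    return hashlib.sha512(SALT + target.encode("utf-8")).hexdigest()


print(pseudo_id("P2") == pseudo_id("P1"))  # True
print(pseudo_id("P3", {"P3": "P2", "P2": "P1"}) == pseudo_id("P1"))  # True
```

Following the chain of merges means that a record merged twice (P3 into P2, then P2 into P1) still ends up with the pseudo ID of P1.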

Pseudonymization of text

Pseudonymizing text requires particular care. Delete too little personal data and the patient's privacy is compromised. Delete too much and the quality of the medical information decreases.

To pseudonymize texts, we apply two different methods: the first uses all the information that is known about a patient; the second uses pattern recognition to detect personal information that is not known in advance.

The database stores information such as first name, surname and date of birth for each patient. The first method uses the following information as input to analyze the text:

  • Patient number
  • Initials
  • First name
  • Surname
  • Date of birth
  • Date of death
  • National Register Number
  • Street name
  • House number
  • Post code
  • Domicile
  • Telephone number

The full history of this information is included. For example, if a patient has lived at multiple addresses, all of the street names are detected.
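
To make the first method concrete, here is a minimal Python sketch of exact replacement using known values, including their history. The `KnownPatientInfo` class, its fields and the `[REMOVED]` placeholder are hypothetical, and the sketch covers only exact, whole-word matches, not the name, address, date and number handling described below.

```python
import re
from dataclasses import dataclass, field


@dataclass
class KnownPatientInfo:
    """Hypothetical container for a patient's known fields and their history."""

    first_names: list[str] = field(default_factory=list)
    surnames: list[str] = field(default_factory=list)
    street_names: list[str] = field(default_factory=list)  # every past address
    phone_numbers: list[str] = field(default_factory=list)

    def search_terms(self) -> set[str]:
        """Every current and historical value that should be searched for."""
        values = self.first_names + self.surnames + self.street_names + self.phone_numbers
        return {v.strip() for v in values if v.strip()}


def replace_known(text: str, info: KnownPatientInfo, placeholder: str = "[REMOVED]") -> str:
    """Replace whole-word, case-insensitive occurrences of all known values in one pass."""
    terms = info.search_terms()
    if not terms:
        return text  # nothing known about this patient
    # Longest terms first so "Main Street" wins over "Main"; a single combined
    # pattern also prevents a later term from matching inside a placeholder.
    alternatives = "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))
    pattern = re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)", re.IGNORECASE)
    return pattern.sub(placeholder, text)


info = KnownPatientInfo(
    first_names=["Anna"],
    surnames=["Jansen"],
    street_names=["Kerkstraat", "Dorpsweg"],  # current and previous address
)
print(replace_known("Anna Jansen moved from Dorpsweg to Kerkstraat.", info))
# [REMOVED] [REMOVED] moved from [REMOVED] to [REMOVED].
```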

For each type of data, different search methods are used.

Names
First, it is important to check whether the known name is actually a real name: during emergencies, medical information is sometimes entered in the name field. It is also important to account for spelling errors and variants. For example, "ij" may be written as "y", or "ea" may have been accidentally recorded as "ae". These variants are normalized first, and we then carry out a fuzzy match on the texts. This means that words that are not an exact match but are very similar to the name are also replaced. A further check ensures that we do not remove a medical term.
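
The name checks can be sketched in Python with `difflib.SequenceMatcher` from the standard library. This is an illustrative assumption, not CTcue's matcher: the variant rules, the `NON_NAME_ENTRIES` and `MEDICAL_TERMS` sets and the similarity threshold of 0.85 are all hypothetical, and the sketch handles only single-word names.

```python
from difflib import SequenceMatcher

# Hypothetical spelling-variant rules, applied before comparing: "ij" and "y"
# are treated as the same, and so are "ae" and "ea".
VARIANTS = [("ij", "y"), ("ae", "ea")]

# Hypothetical entries showing that the name field holds no real name.
NON_NAME_ENTRIES = {"unknown", "nn", "trauma"}

# Hypothetical stand-in for a medical vocabulary that must never be removed.
MEDICAL_TERMS = {"colon", "tibia", "mamma"}


def normalize(word: str) -> str:
    """Lower-case a word and rewrite known spelling variants to one form."""
    word = word.lower()
    for variant, canonical in VARIANTS:
        word = word.replace(variant, canonical)
    return word


def looks_like_name(name: str) -> bool:
    """Crude check that the name field holds a single-word name."""
    return name.isalpha() and normalize(name) not in NON_NAME_ENTRIES


def pseudonymize_name(text: str, name: str, threshold: float = 0.85,
                      placeholder: str = "[NAME]") -> str:
    """Replace words that fuzzily match the known name, sparing medical terms."""
    if not looks_like_name(name):
        return text  # the name field does not contain a usable name
    target = normalize(name)
    tokens = []
    for token in text.split(" "):
        core = token.strip(".,;:!?()")  # keep surrounding punctuation
        word = normalize(core)
        if (core and word not in MEDICAL_TERMS
                and SequenceMatcher(None, word, target).ratio() >= threshold):
            token = token.replace(core, placeholder)
        tokens.append(token)
    return " ".join(tokens)


print(pseudonymize_name("Mrs. Dykstra was seen; Dijkstr called.", "Dijkstra"))
# Mrs. [NAME] was seen; [NAME] called.
```

Normalizing before matching makes "Dijkstra" and "Dykstra" compare as identical, and the fuzzy threshold then also catches typos such as "Dijkstr". The medical-term check is deliberately conservative: for a patient named Colon, the word would be kept wherever it appears.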

Contact details
Telephone numbers and email addresses are always deleted, because both follow patterns that are easy to detect. We also remove all physical addresses provided as input, and look for many common variants, such as “str” for “street.”
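
A minimal Python sketch of these rules follows. The regular expressions for e-mail addresses and phone numbers are simplified assumptions (deliberately broad, so a 10-digit identifier would also be caught), and `SUFFIX_ABBREVIATIONS` is a hypothetical abbreviation table.

```python
import re

# Broad patterns: e-mail addresses, and phone-like numbers (an optional "+"
# followed by 10 to 13 digits, optionally separated by spaces or dashes).
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d(?:[\s-]?\d){9,12}")

# Hypothetical abbreviation table for street-name suffixes.
SUFFIX_ABBREVIATIONS = {"street": ["str", "st"], "straat": ["str"]}


def street_variants(street: str) -> set[str]:
    """A known street name plus abbreviated forms of its suffix."""
    variants = {street}
    for suffix, abbreviations in SUFFIX_ABBREVIATIONS.items():
        if street.lower().endswith(suffix):
            stem = street[: -len(suffix)]
            for abbreviation in abbreviations:
                variants.add(stem + abbreviation)
                variants.add(stem + abbreviation + ".")
    return variants


def remove_contact_details(text: str, known_streets: list[str]) -> str:
    # E-mail first, so digits inside an address are not read as a phone number.
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    terms = set().union(*(street_variants(s) for s in known_streets))
    if terms:
        # Longest variant first, so "Kerkstr." is preferred over "Kerkstr".
        alternatives = "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))
        text = re.sub(r"(?<!\w)(?:" + alternatives + r")(?!\w)", "[ADDRESS]", text,
                      flags=re.IGNORECASE)
    return text


print(remove_contact_details(
    "Call 06-12345678 or mail j.jansen@example.com; lives at Kerkstr. 12.",
    ["Kerkstraat"],
))
# Call [PHONE] or mail [EMAIL]; lives at [ADDRESS] 12.
```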

Dates
Dates of birth and death are deleted in both numerical form (‘1-3-67’, ‘02/03/1972’) and textual form (‘3 Oct 57’, ‘2 November 1965’). However, the year of birth and age are not removed, because these may be of medical importance.
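
The sketch below generates the numerical and textual spellings of a known date and removes them, while a bare year or age stays in the text. It assumes day-month-year order (as in the examples), English month names and the separators `-`, `/` and `.`; the function names are hypothetical.

```python
import re
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def date_variants(d: date) -> set[str]:
    """Numerical and textual spellings of one date, in day-month-year order."""
    days = {str(d.day), f"{d.day:02d}"}
    months = {str(d.month), f"{d.month:02d}"}
    years = {str(d.year), f"{d.year % 100:02d}"}
    month_names = {MONTHS[d.month - 1], MONTHS[d.month - 1][:3]}
    variants = set()
    for day in days:
        for year in years:
            for sep in "-/.":
                variants.update(f"{day}{sep}{m}{sep}{year}" for m in months)
            variants.update(f"{day} {name} {year}" for name in month_names)
    return variants


def remove_dates(text: str, known_dates: list[date], placeholder: str = "[DATE]") -> str:
    """Remove every spelling of the known dates; a bare year or age is left alone."""
    terms = set().union(*(date_variants(d) for d in known_dates))
    if not terms:
        return text
    alternatives = "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))
    return re.sub(r"(?<!\w)(?:" + alternatives + r")(?!\w)", placeholder, text,
                  flags=re.IGNORECASE)


print(remove_dates("Born 1-3-67 (01/03/1967); born in 1967, now 54.", [date(1967, 3, 1)]))
# Born [DATE] ([DATE]); born in 1967, now 54.
```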

Personal numbers
Personal numbers, such as the National Register Number or Insurance Number, are removed based on the input provided. This includes variants, for example with or without leading zeros. We also take the context of numbers into account. For example, if the text mentions "National Register Number" or otherwise indicates that a number is an Insurance Number, we delete that number, even if it was not provided as input or was registered incorrectly.
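
Both rules (known numbers regardless of leading zeros, and numbers identified by their context) can be sketched in Python as below. The context phrases, the `[ID]` placeholder and the function names are hypothetical, and because known numbers are matched as plain digit runs, formatted numbers containing dots or dashes are only caught by the context rule in this sketch.

```python
import re

# Hypothetical phrases that announce a personal number in free text; group 1
# keeps the phrase so only the number itself is replaced.
CONTEXT = re.compile(
    r"(\b(?:national register number|insurance number)\s*:?\s*)\d[\d .-]*\d",
    re.IGNORECASE,
)


def strip_zeros(number: str) -> str:
    """Normalize a digit string so "00123" and "123" compare as equal."""
    return number.lstrip("0") or "0"


def remove_personal_numbers(text: str, known_numbers: list[str],
                            placeholder: str = "[ID]") -> str:
    known = {strip_zeros(n) for n in known_numbers}
    # 1. Known numbers from the input, with or without leading zeros.
    text = re.sub(
        r"\d+",
        lambda m: placeholder if strip_zeros(m.group()) in known else m.group(),
        text,
    )
    # 2. Numbers announced by a context phrase, even if they were never
    #    provided as input or were registered incorrectly.
    return CONTEXT.sub(lambda m: m.group(1) + placeholder, text)


print(remove_personal_numbers(
    "Insurance number: 987654321. Registered as 00123456789; lab value 42.",
    ["123456789"],
))
# Insurance number: [ID]. Registered as [ID]; lab value 42.
```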

This process ensures that all known information is replaced first, before the methods for unknown information, such as foreign addresses, incorrectly registered information or names of family members, are applied. These methods recognize known patterns and make use of the pseudonymizations already carried out on the basis of the input.
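
One way a second pass could reuse the first pass is sketched below: words that directly precede a placeholder in first-pass output (for example "Mrs.") are learned as cues, and a capitalized word after a cue is then treated as a name. This is a hypothetical heuristic for illustration, not CTcue's method; `learn_cues`, `second_pass` and the `min_count` parameter are invented names.

```python
import re
from collections import Counter

PLACEHOLDER = "[NAME]"


def learn_cues(first_pass_texts: list[str], min_count: int = 1) -> set[str]:
    """Collect words that directly precede a placeholder in first-pass output."""
    counts = Counter()
    for text in first_pass_texts:
        for m in re.finditer(r"(\S+)\s+" + re.escape(PLACEHOLDER), text):
            if m.group(1) != PLACEHOLDER:  # skip "[NAME] [NAME]" (first + last name)
                counts[m.group(1)] += 1
    return {cue for cue, n in counts.items() if n >= min_count}


def second_pass(text: str, cues: set[str]) -> str:
    """Replace a capitalized word that follows a learned cue, e.g. "Mrs. Pietersen"."""
    if not cues:
        return text
    alternatives = "|".join(re.escape(c) for c in sorted(cues, key=len, reverse=True))
    # ASCII-only capitalized word; a real system would need a broader name pattern.
    pattern = r"(?<!\S)(" + alternatives + r")(\s+)[A-Z][a-z]+(?!\w)"
    return re.sub(pattern, lambda m: m.group(1) + m.group(2) + PLACEHOLDER, text)


cues = learn_cues(["Seen with Mrs. [NAME] today."])
print(cues)  # {'Mrs.'}
print(second_pass("Mrs. [NAME] phoned; later Mrs. Pietersen (sister) visited.", cues))
# Mrs. [NAME] phoned; later Mrs. [NAME] (sister) visited.
```

Such cues also produce false positives, which is why a threshold like `min_count` (requiring a cue to be seen several times) would matter in practice.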

Continuous improvement

Patient privacy and the protection of personal data are very important. Pseudonymizing data helps ensure compliance with the rules of the General Data Protection Regulation (GDPR). At CTcue, we continuously improve our methods based on the latest developments, as well as on input from users and clients.


Reference for this article: CTcue. How does pseudonymization work in CTcue?
