Your patients are more than data sets (which probably goes without saying). 

But providing the best patient care starts with ensuring that patient data is correct. 

Think about it. Every day, you rely on accurate patient records to make clinical decisions that affect patients’ daily lives. If crucial medication information sits in a missing medical record or a broken table that you can’t access, the consequences can be real and serious. One missing piece of data can also cause an insurance claim to be denied. 

That’s a lot of pressure. 

Now, add the fact that, as we covered in our previous blog post about healthcare data, healthcare generates over 30% of the world's data, and that number grows by the day. This data has to be trustworthy, secure, and reliable. 

Enough with the pressure! How do you ensure that your data is trustworthy? 

With data lineage. Let’s dive in. 


What is Data Lineage for Healthcare?


Data lineage provides a detailed record of the journey of data from its origin to its final destination, including any transformations, modifications, or updates made to the data along the way. Understanding this journey is crucial in ensuring that patient data is correct, and therefore critical to keeping patients safe. The way this information is handled must also comply with the laws governing data use in healthcare. 

Now, let’s break down the technical process involved in using data lineage for better patient outcomes in healthcare. 

The 5-Step Technical Process of Data Lineage in Healthcare


The technical process of data lineage in healthcare typically involves five steps: data source identification, data extraction (harvesting), data transformation, data mapping, and data lineage tracking. Together, these steps enable healthcare organizations to track the movement of data through various systems and applications, including any modifications, transformations, or updates made to the data.

Let’s take a look at what each of these steps entails. 

  1. Data Source Identification: In this step, you’ll begin by identifying the original source of the data. The original source could be electronic medical records (EMR), laboratory results, imaging reports, etc.

    It’s important to note that during this, and all subsequent steps, the data should not be visible as it may be sensitive or privileged. Instead, the source itself should be identified using a data lineage scanner that does not access anything aside from the location of origin.
  2. Data Extraction / Data Harvesting: Now that you’ve identified where the data originated, you can extract the data from the source system to ensure that it is accurate and complete. Again, it’s critical to note that the data itself shouldn’t be visible in the lineage platform, only where it originated and whether it is complete.

    For example, you can deploy Manta’s Connectivity module to gather metadata and use Manta’s automated scanners to crunch all the SQL, ETL, and BI code. 
  3. Data Transformation: After pinpointing where the data came from and harvesting it to make sure it is complete, you can transform the data into a format that is compatible with the target system or application. This is possible through integrations between third-party applications and your data lineage platform. During this step, you’ll also contextualize your data with semantics. 

    For example, Manta adds semantics to enrich the attribute-level lineage with indirect data dependencies, transformation logic, evolution over time, or external metadata such as profiling information, quality scores, PII labels, and more. The goal of this step is to provide actionable insights.
  4. Data Mapping: Next, you’ll map the data elements to the corresponding data elements in the target system. This step in particular would be nearly impossible with a manual lineage approach. Think of it like creating a map of the U.S. road network today: now that all the roads are built, how do you actually create the map? Do you start with highways so that people can get from Ohio to California, but then get lost on the “last-mile” problem? Or do you start with a detailed map of a state or county, but then miss the country-level big picture? Or do you do it completely differently (e.g. by monitoring traffic and speed using cellphone positions)? 

    Different approaches provide different coverage and different levels of accuracy and detail. The situation with lineage is similar - different approaches are suited for different use cases and address different needs. It is important to understand what you want to use the lineage for, which in turn will give you the level of detail and coverage you need.

    Then, you can choose the approach to create your map. Please note that the approach may be different for the environment that you already have built and for the environment that you are about to build. With the latter, you can design it with lineage in mind. For example, developers can use tagging libraries to generate lineage at both design time and runtime.
  5. Data Lineage Tracking / Data Lineage Documentation: The data lineage map is not a static, one-time viewpoint. Once you’ve created your map, you’ll be able to track the movement of data through various systems and applications, including any modifications, transformations, or updates made to the data. 

    This is important in healthcare because it can help healthcare providers pinpoint where important patient data was transformed, updated, or even lost. For example, imagine your patient has a complicated diagnosis history that requires multiple medications (which would be dependencies of the original data), but their chart is now incomplete. You’ll need to pinpoint where the chart changed and identify all dependencies to ensure that the medication list and any ongoing diagnosis remain accurate. From a medical billing standpoint, if you are suddenly missing contact or insurance information, you’ll need to find out where it went off track. This is achievable only with a living map, not a static, one-time snapshot of lineage. 

As you can imagine, each of these steps is extremely cumbersome when done manually. Automated data lineage creates a visual map in minutes rather than hours, finding dependencies along the way. 
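To make the idea of a living lineage map more concrete, here is a minimal Python sketch of a column-level lineage graph that can be traced upstream (where did this field come from?) and downstream (what depends on it?). It is an illustration only: the `LineageGraph` class and every table and column name below are hypothetical, and this is not Manta's API or a description of how its scanners work. In keeping with the steps above, it stores only metadata (names and transformation notes), never patient values.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Column-level lineage built from metadata only (names, never patient values)."""

    def __init__(self):
        self._down = defaultdict(set)  # source column -> columns derived from it
        self._up = defaultdict(set)    # derived column -> columns it reads from
        self._how = {}                 # (source, target) -> transformation note

    def add_edge(self, source, target, transformation=""):
        """Record that `target` is derived from `source`."""
        self._down[source].add(target)
        self._up[target].add(source)
        self._how[(source, target)] = transformation

    def _walk(self, start, neighbours):
        # Breadth-first traversal that collects every reachable column.
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbours.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def upstream(self, column):
        """All columns that feed into `column`, directly or indirectly."""
        return self._walk(column, self._up)

    def downstream(self, column):
        """All columns that depend on `column`, directly or indirectly."""
        return self._walk(column, self._down)


# Hypothetical example: a medication field flowing from an EMR to a report.
graph = LineageGraph()
graph.add_edge("emr.prescriptions.drug_code", "staging.meds.drug_code", "copy")
graph.add_edge("staging.meds.drug_code",
               "warehouse.patient_chart.active_medications", "join on patient_id")
graph.add_edge("warehouse.patient_chart.active_medications",
               "bi.medication_report.med_list", "aggregate")
graph.add_edge("lab.results.creatinine",
               "warehouse.patient_chart.renal_flag", "threshold rule")

# If the chart's medication list looks incomplete, trace where it came from.
sources = graph.upstream("warehouse.patient_chart.active_medications")
# If the EMR field changes, find everything that depends on it.
dependents = graph.downstream("emr.prescriptions.drug_code")
```

Tracing upstream answers the "where did it go off track?" question for an incomplete chart, while tracing downstream lists the dependencies that need to be rechecked after a change.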

Benefits of Data Lineage in Healthcare


The benefits of data lineage in healthcare are numerous. For starters, it helps ensure that the data used for patient care is accurate and reliable, which is essential for making informed clinical decisions. 

Next, healthcare organizations are subject to various regulations that require them to maintain accurate and complete medical records. Data lineage makes it easier for healthcare organizations to comply with these regulations.

Moreover, data lineage provides a clear understanding of the data being used in healthcare, which improves data governance and helps to ensure that data is being used appropriately. With accurate and reliable medical data and clinical data integrations, healthcare providers can make informed decisions about patient care, which can lead to improved patient outcomes.

Example Use Case of Data Lineage in Healthcare 


One use case for data lineage in healthcare is in the management of electronic health records (EHRs) and healthcare data integration. By tracking the data lineage of patient records as they move through different systems and applications, healthcare organizations can identify and resolve data inconsistencies, reduce errors, and enhance data accuracy. This can result in better patient outcomes, improved care coordination, and increased regulatory compliance.

One such example comes from Manta customer CHRISTUS Health. During required quarterly EHR system upgrades, CHRISTUS Health was experiencing major downstream impacts and system outages. More specifically, the upgrades introduced new columns and deprecated existing ones, which caused data problems downstream that affected end users. 

Without visibility into how required upgrades would affect their data environment, the CHRISTUS team was forced to take a reactive rather than a proactive approach to impacts. Rather than CHRISTUS flagging potential impacts in advance and addressing them before they posed problems, end users would inform them of outages as they occurred. Then, CHRISTUS engineers needed to backtrack and troubleshoot what had already gone wrong.  

By incorporating automated data lineage, CHRISTUS Health could scan key parts of its data environment to identify EHR system changes and prepare for each quarterly update proactively. Insights that once required days of tedious work, leaving the team in a constant reactive mode, took minutes or hours after implementing lineage tools. 

The end result is significantly reduced downtime and a high level of compliance with the service-level agreement (SLA) for system uptime.  
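The kind of pre-upgrade impact check described in this use case can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the actual CHRISTUS Health process or Manta's implementation: the schema snapshots, column names, and report names are all hypothetical, and the `consumers` mapping stands in for what a lineage tool would derive automatically.

```python
def diff_columns(before, after):
    """Return (added, removed) column names between two schema snapshots."""
    before, after = set(before), set(after)
    return sorted(after - before), sorted(before - after)


def impacted_assets(removed, consumers):
    """Map each removed column to the downstream assets that read it."""
    return {col: sorted(consumers[col]) for col in removed if col in consumers}


# Hypothetical schema snapshots taken before and after a quarterly upgrade.
schema_before = ["patient.mrn", "patient.insurance_id", "encounter.dx_code"]
schema_after = ["patient.mrn", "patient.payer_id", "encounter.dx_code"]

# Hypothetical lineage: which downstream assets read which source columns.
consumers = {
    "patient.insurance_id": {"billing_claims_report", "eligibility_dashboard"},
    "encounter.dx_code": {"quality_measures_report"},
}

added, removed = diff_columns(schema_before, schema_after)
impacted = impacted_assets(removed, consumers)

for column, assets in impacted.items():
    print(f"{column} was removed; review before go-live: {', '.join(assets)}")
```

Running a check like this before an upgrade goes live is what turns a reactive workflow (end users reporting outages) into a proactive one (engineers reviewing a list of affected assets in advance).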

Using Manta’s Automated Data Lineage in Healthcare

Data lineage is revolutionizing healthcare data management by providing a detailed record of the journey of data from its origin to its final destination. Ready to unlock the benefits of data lineage for your healthcare organization? Request a demo with one of our data lineage experts today.


P.S. This article was written by a human. 
