Deciphering Enterprise-Wide Data Dependencies
There is no question that data lineage is a key component of data governance in any organization. Despite this, we still often see difficulties tracing data flows inside silos that are built on specific data processing technologies. This is especially true if we wish to reach an atomic level of detail (so-called column-level data lineage) or if we aspire to absolute accuracy of the harvested lineage metadata.
On top of that, organizations face increasing challenges tracing end-to-end dependencies across the whole data landscape. This landscape is complex: it usually consists of technologically heterogeneous environments, multiple data processing systems and, increasingly often, a combination of cloud and on-premise infrastructure.
In other words, while in the past organizations were quite satisfied with having just the information on what happens with their data inside a single layer of their data warehouse, that’s no longer the case. Now, the users or consumers of lineage information want to see that the data element they are investigating originated in a mobile application, then was collected and transformed on a cloud platform, replicated to an on-premise hosted database, and finally delivered to them as part of their monthly dashboard.
So how can an enterprise uncover dependencies in such a complex landscape? The answer lies in enterprise-wide data lineage.
Manta for Enterprise-Wide Data Lineage
To start, let’s highlight what Manta does in the field of enterprise-wide data lineage. First, Manta connects to a variety of data processing technologies and, through reverse engineering, recognizes which data flows and transformation rules are implemented. This means parsing the source code (analyzing it and loading the result into a graph database that captures all data dependencies at the most atomic level) and utilizing internal metadata provided by that specific technology.
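To make the idea of a dependency graph at the column level more concrete, here is a minimal sketch in Python. It is purely illustrative and does not reflect how Manta stores or queries lineage: the column names, the `EDGES` list, and both functions are hypothetical, and a plain in-memory dictionary stands in for a graph database.

```python
from collections import defaultdict

# Hypothetical column-level lineage edges: (source column, target column).
# In practice such edges would be harvested by parsing source code and metadata.
EDGES = [
    ("mobile_app.events.user_id", "cloud.staging.user_id"),
    ("cloud.staging.user_id", "dwh.customers.customer_id"),
    ("cloud.staging.signup_ts", "dwh.customers.signup_date"),
    ("dwh.customers.customer_id", "dashboard.monthly.active_customers"),
]


def build_upstream_index(edges):
    """Map each target column to the set of columns it is directly derived from."""
    upstream = defaultdict(set)
    for source, target in edges:
        upstream[target].add(source)
    return upstream


def trace_upstream(column, upstream):
    """Return every column the given column depends on, directly or indirectly."""
    seen = set()
    stack = [column]
    while stack:
        current = stack.pop()
        for parent in upstream.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


UPSTREAM = build_upstream_index(EDGES)
ORIGINS = trace_upstream("dashboard.monthly.active_customers", UPSTREAM)
```

Here `ORIGINS` contains the three columns on the path from the dashboard measure back to the mobile application, while the unrelated `signup_ts` column is left out. That distinction is exactly what column-level lineage adds over table-level lineage.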
Second, Manta integrates all pieces of metadata harvested from different technologies by linking identical objects. For example, an object X in an “external location”, as seen from the Databricks cloud solution, can at the same time be table Y in your local on-premise Oracle database. For enterprise-wide data lineage, it is crucial that we identify these seemingly different references as identical objects.
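The linking step can be pictured as rewriting technology-specific references to one canonical identifier before the lineage graphs are merged. The sketch below is an assumption-laden illustration, not Manta's matching logic: the identifier scheme, the `ALIASES` table, and the edge lists are all hypothetical, and in a real landscape the alias mapping would have to be derived from connection and storage metadata rather than written by hand.

```python
# Hypothetical alias table: technology-specific reference -> canonical object id.
ALIASES = {
    "databricks:external_location/X": "oracle:SALES.Y",
}


def canonical(ref, aliases):
    """Return the canonical id for a reference, or the reference itself if unknown."""
    return aliases.get(ref, ref)


def stitch(edge_sets, aliases):
    """Merge per-technology edge lists, rewriting every endpoint to its canonical id."""
    merged = set()
    for edges in edge_sets:
        for source, target in edges:
            merged.add((canonical(source, aliases), canonical(target, aliases)))
    return merged


# Hypothetical lineage harvested separately from each technology.
DATABRICKS_EDGES = [("databricks:bronze.orders", "databricks:external_location/X")]
ORACLE_EDGES = [("oracle:SALES.Y", "oracle:REPORTING.MONTHLY")]

STITCHED = stitch([DATABRICKS_EDGES, ORACLE_EDGES], ALIASES)
```

Before stitching, the two edge lists share no node, so the flow appears to stop at the boundary between the two technologies. After stitching, both edges meet at `oracle:SALES.Y` and the path from the Databricks table to the Oracle report becomes continuous.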
Unifying Metadata Models
The number of data processing technologies and cloud platforms is growing constantly. That is why data experts around the world are looking for some degree of unification of metadata models that they could exchange with one another, allowing for more seamless data lineage tracking and analysis.
For example, Manta partners with the OpenLineage project and plans to integrate any OpenLineage Producer (meaning any technology which is connected to the OpenLineage project) using the Manta OpenLineage connector (interface) currently being developed. Manta plans to introduce a table-level lineage version of this connector first and later focus on column-level lineage resolution using some of its source code parsers.
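To illustrate the difference between table-level and column-level resolution, the following sketch reads lineage edges out of a single OpenLineage run event. The field names follow the public OpenLineage specification, including its column lineage dataset facet, but the event is abridged and its values are hypothetical, as are the two helper functions. Nothing here describes how the Manta connector is actually built.

```python
# Abridged, hypothetical OpenLineage run event (required fields such as
# "producer" and "schemaURL" are omitted for brevity).
EVENT = {
    "eventType": "COMPLETE",
    "eventTime": "2024-01-01T00:00:00Z",
    "run": {"runId": "00000000-0000-0000-0000-000000000000"},
    "job": {"namespace": "example", "name": "load_customers"},
    "inputs": [{"namespace": "example_db", "name": "staging.users"}],
    "outputs": [
        {
            "namespace": "example_db",
            "name": "dwh.customers",
            "facets": {
                "columnLineage": {
                    "fields": {
                        "customer_id": {
                            "inputFields": [
                                {
                                    "namespace": "example_db",
                                    "name": "staging.users",
                                    "field": "user_id",
                                }
                            ]
                        }
                    }
                }
            },
        }
    ],
}


def table_edges(event):
    """Table-level lineage: every input dataset feeds every output dataset."""
    return [
        ((i["namespace"], i["name"]), (o["namespace"], o["name"]))
        for i in event.get("inputs", [])
        for o in event.get("outputs", [])
    ]


def column_edges(event):
    """Column-level lineage, available only when the columnLineage facet is present."""
    edges = []
    for out in event.get("outputs", []):
        fields = out.get("facets", {}).get("columnLineage", {}).get("fields", {})
        for column, spec in fields.items():
            for src in spec.get("inputFields", []):
                edges.append(
                    (
                        (src["namespace"], src["name"], src["field"]),
                        (out["namespace"], out["name"], column),
                    )
                )
    return edges


TABLE_EDGES = table_edges(EVENT)
COLUMN_EDGES = column_edges(EVENT)
```

Table-level edges can be derived from any event that lists inputs and outputs, whereas column-level edges depend on the producer emitting the optional facet. When the facet is missing, column-level detail has to come from somewhere else, such as parsing the underlying source code.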
Setting Up the Right Governance Management Functions
It may seem that enterprise-wide data lineage is primarily a matter of technologies and specific tools like Manta. However, it’s important to note that the proper setup of data governance management functions, supported by data lineage, requires the expertise of highly skilled professionals. In the long term, every organization will need to maintain such expertise internally, but in the interim, organizations will also need the help of vendors who specialize in this area.
Getting the job done takes a combination of the right people and technologies. You may need assistance from a partner like Profinit or another one of Manta’s partners. Once you have the right governance framework, processes, people, and lineage in place, your enterprise will see the full benefits.
To learn more about setting your team up for success, get in touch with a member of our team.