Rethinking Data Lineage: A Conversation with Tomas Kratky
You know the basics of data lineage, but how do you put it into practice and what does the future of data management actually look like? Manta’s CEO, Tomas Kratky, recently spoke with Shane Gibson of the AgileData podcast to share valuable insights into data lineage, metadata usage, and the importance of understanding user needs. As businesses become increasingly data-driven, Kratky's perspectives on agile data practices highlight the need to shift our focus from merely collecting data to effectively embedding it into workflows.
“Let’s assume you have a really detailed column level across multiple technologies. You understand the semantics of those transformations… So if you have the most powerful subset of metadata you can get, you can start thinking about a lot of very interesting use cases… You can be proactive and it can be fully automated. You use data in your workflows and processes to fully automate them. That’s what we want. So we are doing the same with metadata. You can think about optimization. You actually understand all possible ways how data can flow in your environment.”
Tomas Kratky, Manta CEO
In this blog post, we'll recap some of the key points from the discussion and explore how organizations can benefit from a user-centric approach to data management.
Understanding Data Lineage Use Cases
Data lineage for enterprises has long been a critical capability, but it's often misunderstood or oversimplified.
“When I started the company, I saw data lineage as a fundamental piece of information that if it’s collected and if it’s used in a proper way, can unlock a lot of power for my customers back then, which is large enterprises,” says Kratky. “Doesn’t matter if it’s finance, retail, healthcare, it’s all the same. They are large and complex. So I saw that as a way to help them, how to give them more visibility, more transparency. And also that was the most important thing for me. How to give them or how to enable agility for them, because that’s one of the secret things about lineage.”
According to Kratky, different people think about lineage in very different ways and put it to very different uses.
For some people, data lineage is all about traceability: how a specific data record moves through the data environment. Others think in terms of runtime, or operational, knowledge. This is basically runtime information about a specific workflow being executed, connecting to tables, or potentially connecting multiple columns together and producing some results. You can think of this as runtime lineage.
That’s very useful, for example, for incident management because you really care only about the workflow executed just a minute ago.
Kratky notes that harder problems such as change management, which he calls the most critical thing for every organization, require the ability to change things quickly and safely. So what do you actually need for change management? Runtime lineage can show you the dependencies that have actually been exercised in your environment, but it can’t analyze something that is not yet running (hence the name). You need to understand how data may flow in your environment and what may happen under specific conditions.
In that instance, what you actually need is what we call design lineage. It answers the question: what are all the possible ways data can flow in your environment? Is it the same as runtime lineage? No. You may have critical, exceptional workflows that are triggered perhaps only once per year. If you miss them when, for example, migrating or changing something in your environment, you can cause huge incidents that surface only months or years later and cost millions to fix.
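To make the difference concrete, here is a minimal Python sketch of an impact analysis run twice: once over design lineage (every flow defined in the environment) and once over runtime lineage (only the flows that recently executed). It is an illustration under stated assumptions, not how Manta or any other lineage tool works. The table names, workflow names, and the set of observed workflows are all hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical design lineage: every flow defined in the environment,
# as (source, target, workflow). All names are made up for illustration.
DESIGN_EDGES = [
    ("crm.customers", "staging.customers", "daily_load"),
    ("staging.customers", "dw.dim_customer", "daily_load"),
    ("dw.dim_customer", "reports.monthly_sales", "monthly_report"),
    ("dw.dim_customer", "reports.annual_audit", "year_end_audit"),
]

# Assumption: workflows seen executing during the observation window.
# The year-end audit has not run yet, so runtime lineage does not know it.
OBSERVED_WORKFLOWS = {"daily_load", "monthly_report"}


def downstream(edges, start):
    """Return every object reachable from `start` (breadth-first search)."""
    graph = defaultdict(list)
    for source, target, _workflow in edges:
        graph[source].append(target)

    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Runtime lineage is the subset of design edges that actually executed.
runtime_edges = [e for e in DESIGN_EDGES if e[2] in OBSERVED_WORKFLOWS]

design_impact = downstream(DESIGN_EDGES, "staging.customers")
runtime_impact = downstream(runtime_edges, "staging.customers")

# Objects a change would affect that runtime lineage alone would miss.
missed = design_impact - runtime_impact
print(sorted(missed))
```

In this toy environment, a change to `staging.customers` also affects the annual audit report, but only the design-lineage pass finds it, because the workflow that feeds that report has not executed during the observation window.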
Rethinking Metadata Collection and Usage
Companies often focus on collecting metadata and data without considering how it's used. Kratky emphasizes the importance of understanding how people actually use data in their workflows and integrating metadata to optimize processes, identify issues, and save costs. By focusing on user needs and delivering data when and how it's required, organizations can fully capitalize on the power of metadata.
Delivering data to users in a way that seamlessly integrates into workflows is essential. Kratky suggests that organizations embed data professionals into software engineering practices to reduce complexity, improve data sharing, and ensure that data is stored for consumption, not just data entry. This approach addresses the challenge of designing data platforms that cater to a wide array of user requirements.
Building a robust data lineage capability is both complex and time-consuming, but Kratky believes that the industry should strive to make metadata and lineage a standard part of product development. By learning from experienced enterprise architects, data architects, and data engineers, organizations can enhance data modeling and lineage to drive value for stakeholders.
These insights into data lineage, metadata, and user-centric data practices emphasize the need for a paradigm shift in how businesses approach data management. By becoming more data-driven, organizations can unlock the true potential of lineage and drive more informed decision-making processes.
Tune into The AgileData podcast to hear the full conversation and discover more about the future of data lineage, metadata, and agile data practices.
P.S. This post was written by a human.