Data is, without question, one of the most critical assets for every organization. Every company consists of a set of processes implemented via manual tasks or software applications. One way or another, everything we do takes data as input and creates data as a result. Imagine a software system automating critical tasks in finance or marketing that produced and used no data at all!
While data gets more and more attention on all levels in every organization, there is something we, unfortunately, do not value enough yet: metadata. It’s also referred to as data about data – or context, if you wish. Metadata is everything. It can help you answer questions like:
And we can continue with more and more examples. Metadata is essentially every piece of information about our data. But why is metadata so important? One obvious reason is search. Imagine a library with thousands or millions of books, an e-shop that sells millions of items, or even the internet! Whenever you need to find something, metadata is the key to doing so quickly and efficiently.
Actually, for anyone who is interested in search and metadata, I highly recommend the book The Enterprise Data Catalog by Ole Olesen-Bagneux. Many great books were written about the importance of metadata, information science, taxonomies, data semantics, and knowledge management, and our goal today is not to compete with them.
Our focus in this article is active metadata, a concept whose definition is still evolving. To help understand it, let's compare and contrast the concept with how we use data. Metadata, after all, is also just "data". Metadata management is a technology market that has existed for decades, going through various phases. The most recent phase started with the rise of data catalogs. There are more than 30 different tools out there, probably many more, and new data catalogs are created almost every month. Yet Gartner, in their Market Guide for Active Metadata, stated that "[t]raditional metadata practices are insufficient." So what is wrong with metadata, and how can activation help?
The core issue is that we focus too much on metadata collection, which has resulted in silos of metadata. Because each catalog has its specific strengths, it is not uncommon to see multiple tools implemented by one company in different business units, which then leads to a catalog of catalogs. This is funny… and useless. As with data, just collecting metadata adds no value to the organization.
Using the data analogy, we typically use data in the following ways.
Obviously, there are more ways that we interact with data (like in data governance in healthcare); the above are the most traditional examples. The first "search" case represents a very "static" experience. Everything is sitting in a silo (e.g., a data warehouse or data lake), and we expect people to come, find what they need, and ask questions they need to ask. Do not get me wrong - it is awesome for some use cases and, when compared to a case with no data available, a huge jump forward.
However, we see that data is put to much better use in the other two examples, which actively support users with limited data engineering skills and dramatically increase their productivity. Compared to the first example, the latter two are more "active" and thus more useful and accessible to a broader audience. And that is what we want to achieve with active metadata too.
That leads us back to the very first question: what is active metadata? Gartner’s definition in their most recent Market Guide for Active Metadata Management is a bit vague but touches on several key aspects.
As mentioned at the beginning of this text, there are various types of metadata. Metadata is almost everything. One obvious question is how to map ALL metadata, and whether there is even a strong business case to do so. We strongly believe that the key to unlocking the true potential of metadata is an intelligent and open standard for metadata exchange and integration. There is a lot to discuss about that topic, and I encourage you to start with this article on OpenLineage.
But we at Manta are experts in data lineage and that is what I would like to talk about. For anyone who wants to learn more about data lineage basics, I recommend The Ultimate Guide to Data Lineage. Now the question stands - how can you activate data lineage, and what does it even mean for users? Let's take a look at several examples.
These are just a few examples among many that show how data lineage can be powerful when it is activated rather than sitting somewhere in your metadata repository.
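One way to picture the difference between lineage that sits in a repository and lineage that is activated is a small impact-analysis sketch. The code below is purely illustrative and is not Manta's implementation or any vendor's API: the `LINEAGE` graph, the asset names, and the `downstream_assets` and `impact_warning` helpers are all hypothetical. It assumes lineage is already available as a simple mapping from each asset to the assets it feeds, and it turns that mapping into a warning that could be shown to an engineer at the moment they are about to change something.

```python
from collections import deque

# Hypothetical lineage graph: each key feeds the assets listed in its value.
LINEAGE = {
    "crm.customers": ["staging.customers"],
    "staging.customers": ["dw.dim_customer"],
    "dw.dim_customer": ["reports.churn_dashboard", "ml.churn_features"],
    "erp.orders": ["dw.fact_orders"],
    "dw.fact_orders": ["reports.churn_dashboard"],
}


def downstream_assets(graph, changed):
    """Return every asset reachable from `changed`, in breadth-first order."""
    seen = set()
    order = []
    queue = deque(graph.get(changed, []))
    while queue:
        asset = queue.popleft()
        if asset in seen:
            continue  # already reported via another path
        seen.add(asset)
        order.append(asset)
        queue.extend(graph.get(asset, []))
    return order


def impact_warning(graph, changed):
    """Build a human-readable message about what a change would touch."""
    affected = downstream_assets(graph, changed)
    if not affected:
        return f"Changing {changed} affects no downstream assets."
    return f"Changing {changed} affects: " + ", ".join(affected)


if __name__ == "__main__":
    print(impact_warning(LINEAGE, "crm.customers"))
```

The traversal itself is trivial. The point of the sketch is where the result ends up: in front of the person making the change, inside the tool they already use, rather than behind a search box in a separate catalog.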
As the examples above show, a huge driver of success when it comes to active metadata is the ability to embed and integrate it into other tools. However, the lack of universal standards for metadata exchange, and of a universal API that vendors can use to embed metadata, makes full integration difficult to achieve. In this way, it is similar to Apple’s CarPlay and Google’s Android Auto.
Think of it like this. Today, many new cars come with CarPlay compatibility. At the same time, CarPlay is not truly integrated with the car itself – rather, it projects whatever your phone displays onto the car’s screen, similar to how browser widgets or iframes work. If there were a universal behind-the-scenes API integration for CarPlay, no matter the car type, it would be capable of so much more.
Similarly, while metadata browser plug-ins and widgets are useful, they can’t compete with the value of true integration.
At Manta, we have always been pioneers in this space. We integrated actionable metadata years before the term active metadata was coined, and we are big proponents of an open ecosystem with standards. We are part of OpenLineage and Egeria, but those efforts are still evolving. Until such standards mature, metadata vendors must negotiate and implement point integrations with every single data solution out there, which will clearly never scale. There is still a lot of work to be done, but thinking about the opportunity, we could not be more excited.
Okay, so what next? We have a lot of data and we do a lot with data, and that is not going to change. Building, maintaining, processing, and using data is, however, getting harder every minute, and metadata can save us. The caveat is that it cannot be metadata simply sitting in a silo somewhere; it must be metadata actively used by people and machines in everything they do.
Let’s put this even more bluntly. For decades, organizations have collected metadata, forcing or begging users to use their enterprise metadata repository, data catalog (or another industry buzzword) – and epically failed with all metadata-related projects. Now, we may finally start to understand that for metadata to succeed, it must be invisible, intelligent, and smoothly integrated into the lives of people and machines that benefit from its power. That is the true promise of active metadata.