
    Return of the Metadata Bubble

    Tomáš Krátký Jul 27, 2017 12:00:00 AM

    The bubble around metadata in BI is back – with all its previous sins and even more just around the corner. [LONG READ]

    In my view, 2016 and 2017 are definitely the years of metadata management, and of data lineage specifically. After the first bubble 15 years ago, people were disappointed with metadata. A lot of money was spent on solutions and projects, but expectations were never met (usually because they were never set realistically, as with any buzzword in its early days). Metadata fell into disgrace for many years.

    But if you look around today, visit a few BI events, and read some blog posts and comments on social networks, you will see metadata everywhere. How is that possible? Simply because metadata has been reborn through the bubble of data governance associated with the big data and analytics hype. Could you imagine any larger enterprise today without a data governance program running (or at least in its planning phase)? No! Everyone is talking about a business glossary to track their Critical Data Elements, end-to-end data lineage is once again the holy grail (but this time including the Big Data environment), and we get several metadata-related RFPs every few weeks.

    Don’t get me wrong, I’m happy about it. I see proper metadata management practice as a critical factor in the success of any initiative around data. With huge investments flowing into big data today, it is even more important to have proper governance in place. Without it, the only outcomes of big (and small) data analytics would be chaos, lost money, and no additional revenue. My point is that even if everything looks promising on the surface, I feel a lot of enterprises have taken the wrong approach. Why?

    A) No Numbers Approach

    I have often heard that you can’t demonstrate with numbers how metadata helps an organisation. I couldn’t disagree more. Always start measuring efficiency before you start a data governance/metadata project. How many days does it take, on average, to do an impact analysis? How long does it take, on average, to do an ad-hoc analysis? How long does it take to get a new person on board – a data analyst, data scientist, developer, architect, etc.? How much time do your senior people spend analysing incidents and errors from testing or production and correcting them? My advice is to focus on one or two important teams and gather data for at least several weeks or, better yet, months. If you aren’t doing it already, you should start immediately.

    You should also collect as many “crisis” stories as you can. For example, a junior employee at a bank mistyped an amount in a source system and a bad $1,000,000 transaction went through. A team of three then spent three weeks tracking it from its source to all its targets and making corrections. Or a finance company refused to give a customer a big loan and he came to complain five months later. What a surprise when they ran simulations and found out that they would have approved his application. A team of two then spent five weeks trying to figure out what exactly had happened, only to discover that the risk algorithm in use had been changed several times over the previous few months. When you factor in the bad publicity related to this incident, your story is more than solid.

    Why all this? Because you will use your numbers to build a business case, and then compare them with the numbers measured after the project to demonstrate efficiency improvements. And those well-known, terrifying stories that caused your organisation so much trouble will be your “never want it to happen again” memento.
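
    As a rough illustration of the kind of bookkeeping described above, here is a minimal Python sketch. It turns the two crisis stories into person-weeks and compares average impact-analysis durations before and after a project. The `Incident` class, the `improvement` function, and the before/after figures are all hypothetical and only show the arithmetic; the team sizes and durations of the two incidents are the ones from the stories.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Incident:
    """One "crisis" story: how many people were tied up, and for how long."""
    label: str
    people: int
    weeks: float

    @property
    def person_weeks(self) -> float:
        return self.people * self.weeks


def improvement(before_days: list[float], after_days: list[float]) -> float:
    """Relative reduction in the average duration (0.25 means 25% faster)."""
    baseline = mean(before_days)
    return (baseline - mean(after_days)) / baseline


# The two stories from the text.
incidents = [
    Incident("mistyped transaction", people=3, weeks=3),
    Incident("refused loan", people=2, weeks=5),
]
total_effort = sum(i.person_weeks for i in incidents)

# Hypothetical impact-analysis durations in days, measured for one team.
before = [10, 8, 12, 10]
after = [6, 5, 7, 6]

print(f"Incident effort: {total_effort} person-weeks")
print(f"Impact analysis is {improvement(before, after):.0%} faster")
```

    Even a spreadsheet would do the job; the point is only that the baseline has to be recorded before the project starts, otherwise there is nothing to compare against.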

    B) Big Bang Approach

    Last year I saw several companies that started too broad and expected too much in a very short time. When it comes to metadata and data governance, your vision must be complex and broad, but your execution should be “sliced” – the best approach is simply to move step by step. Data governance usually needs some time to demonstrate its value in reduced chaos and better understanding between people in a company. It is tempting to spend a budget quickly, implement as much functionality as possible, and hope for great success. In most cases, however, it becomes a huge failure. Many good resources on this topic are available online, so I recommend investing your time to read them and learn from others’ mistakes first.

    I believe that starting with a few of the most frequently used critical data elements is the best strategy. Define their business meaning first, then map your business terms to the real world and use an automated approach to track your data elements at both the business and the technical level. When the first small set of your data elements is mapped, do your best to show its value to others (see the previous section about how to measure efficiency improvements). With that success behind you, your experience with other data sets will be much smoother and easier.

    C) Monolithic Approach

    You collect all your metadata and data governance related requirements from both business and technical teams, include your management and other key stakeholders, prepare a wonderful RFP and share it with all vendors from the top right Gartner Data Governance quadrant (or the Forrester Wave if you like it more). You meet well-dressed sales people and pre-sales consultants, see amazing demonstrations and marketing papers, hear a lot of promises about how all your requirements will be met, pick a solution you like, implement it, and earn your credit. Prrrrr! Wake up! Marketing papers lie most of the time (see my other post on this subject).

    Your environment is probably very complex, with hundreds of different and sometimes very old technologies. Metadata and data governance is primarily an integration initiative. To succeed, business and IT have to be brought together – people, systems, processes, technologies. You can see how hard it is, and you may already know it! To be blunt, there is no single product or vendor covering all your needs. Great tools are out there: tools for business users with a compliance perspective, such as Collibra or Data3Sixty; more big data friendly information catalogs, such as Alation, Cloudera Navigator, or Waterline Data; and technical metadata managers, such as IBM Governance Catalog, Informatica Metadata Manager, Adaptive, or ASG. Each one of them, of course, overlaps with the others. Smaller vendors then focus on specific areas not covered well by other players, such as MANTA, with its unique ability to turn your programming code into both technical and business data lineage and integrate it with other solutions.

    Metadata is not an easy beast to tame. Don’t make it worse by falling into the “one-size-fits-all” trap.

    D) Manual Approach

    I meet a lot of large companies that ignore automation when it comes to metadata and data governance, especially with big data. Almost everyone builds a metadata portal today, but in most cases it is only a very nice information catalog (the same sort you can buy from Collibra, Data3Sixty, or IBM) without proper support for automated metadata harvesting. The “how to get metadata in” problem is solved in a different way. How? Simply by setting up a manual procedure – whoever wants to load a piece of logic into a DWH or data lake has to provide the associated metadata describing meaning, structures, logic, data lineage, etc. Do you see how tricky this is? On the surface, you will have a lot of metadata collected, but none of that information is reality – it is a perception of reality and only as good as the information entered by a person. What is worse, it will cost you a lot of money to keep it synchronised with the real logic through all the updates, upgrades, etc. The history of engineering tells us one fact clearly – any documentation that is created and maintained manually, especially documentation that is not an integral part of your code/logic, is out of date the very moment it is created.

    Sometimes there is a different reason for harvesting metadata manually – typically when you choose a promising DG solution, but it turns out that a lot is missing. For example, your solution of choice cannot extract metadata from programming code, and you end up with an expensive tool without the important pieces of your business and transformation logic inside. Your only option is to analyse everything that remains by hand, and that means a lot of expense and a slow, error-prone process.
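
    To make the idea of harvesting metadata from code a little more concrete, here is a deliberately naive Python sketch that reads table-level lineage out of `INSERT ... SELECT` statements with regular expressions. It is an illustration only, not how MANTA or any other product works: real SQL needs a proper parser, and the `table_lineage` function, the patterns, and all table names are made up for this example.

```python
import re

# Toy patterns: real SQL needs a real parser (CTEs, subqueries, dialects, ...).
TARGET = re.compile(r"\bINSERT\s+INTO\s+([\w.]+)", re.IGNORECASE)
SOURCE = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)


def table_lineage(sql: str) -> list[tuple[str, str]]:
    """Return (source_table, target_table) edges for INSERT ... SELECT statements."""
    edges = []
    for statement in sql.split(";"):
        target = TARGET.search(statement)
        if target is None:
            continue  # not a load statement, nothing to record
        for source in SOURCE.findall(statement):
            edges.append((source.lower(), target.group(1).lower()))
    return edges


# Hypothetical load script; table names are made up for illustration.
script = """
INSERT INTO dwh.loan_risk
SELECT a.customer_id, s.score
FROM staging.applications a
JOIN staging.risk_scores s ON s.customer_id = a.customer_id;

INSERT INTO mart.loan_report
SELECT customer_id, score FROM dwh.loan_risk;
"""

for source, target in table_lineage(script):
    print(f"{source} -> {target}")
```

    The value of even such a crude approach is that the lineage is derived from the code that actually runs, so it cannot drift away from reality the way a manually filled form can.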

    Most of the time I see a combination of A), C) and D), and in rare cases B) as well. Why is that? I do not know. I have plenty of opinions, but none of them has been substantiated. One thing is for sure: we are doing our best to kill metadata yet again. This is something I am not ready to accept. Metadata is about understanding, about context, about meaning. Companies like Google and Apple have known this for a long time, which is why they win. The rest of the world is still behind, with compliance and regulation being the most important reason why large companies implement data governance programs.

    I am asking every single professional out there to fight for metadata: to explain that measuring is necessary and easy to implement, that small steps are much safer and easier to manage than a big bang, that an ecosystem of integrated tools provides greater coverage of requirements than a huge monolith, and that automation is possible.

    Tomas Kratky is the CEO of MANTA, and this article was originally published on his LinkedIn Pulse. Let him know what you think at manta@getmanta.com.
