Unleash the Full Potential of StreamSets with Data Lineage from Manta
StreamSets Data Collector is an open-source execution engine for fast data ingestion and light transformations. The engine is designed to execute smart data pipelines for streaming, change data capture (CDC), and batch data without hand coding. The Manta StreamSets scanner includes – but is not limited to – support for Hadoop, JDBC, and Google BigQuery, both as origin and destination stages, as well as processor stages such as fields, expressions, schemas, and data parsers.
Manta understands pipelines from StreamSets Data Collector and is able to analyze and visualize them. This includes extracting pipelines, resolving individual stages, working with provided runtime values, and processing expression language and database connections. By processing the embedded origin, destination, and processor stages in these pipelines, Manta creates a detailed visualization of the data lineage that can be pushed into any third-party metadata management solution or viewed in Manta’s native visualization.
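To make the idea of resolving stages into lineage more concrete, here is a minimal sketch in Python. It is not Manta's implementation. It assumes a simplified, hypothetical pipeline description in which each stage lists the lanes it reads from and writes to; the keys `instanceName`, `inputLanes`, and `outputLanes`, the stage names, and the helper functions `stage_edges` and `downstream` are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical, simplified pipeline description. The keys and stage names
# are assumptions for this sketch, not a documented export format.
pipeline = {
    "stages": [
        {"instanceName": "JDBCQuery_01", "inputLanes": [], "outputLanes": ["lane_a"]},
        {"instanceName": "ExpressionEvaluator_01", "inputLanes": ["lane_a"], "outputLanes": ["lane_b"]},
        {"instanceName": "BigQuery_01", "inputLanes": ["lane_b"], "outputLanes": []},
    ]
}


def stage_edges(pipeline):
    """Return (producer, consumer) pairs for stages connected by a shared lane."""
    # Index every lane by the stages that write to it.
    producers = defaultdict(list)
    for stage in pipeline["stages"]:
        for lane in stage["outputLanes"]:
            producers[lane].append(stage["instanceName"])

    # A stage that reads a lane is downstream of every stage that writes it.
    edges = []
    for stage in pipeline["stages"]:
        for lane in stage["inputLanes"]:
            for producer in producers.get(lane, []):
                edges.append((producer, stage["instanceName"]))
    return edges


def downstream(edges, start):
    """Return the set of stages reachable from `start` by following edges."""
    graph = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)

    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


edges = stage_edges(pipeline)
print(edges)
print(sorted(downstream(edges, "JDBCQuery_01")))
```

Following the edges forward from an origin stage gives a simple impact analysis; reversing each edge and running the same traversal would answer the upstream question of where a destination's data comes from.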
Manta Currently Scans:
Pipelines and their stages
Frequently Asked Questions
What is a data lineage scanner?
A data lineage scanner connects to database repositories, ETL tools, reporting tools, and other source technologies to document where data originates, how it flows and transforms, and which upstream and downstream assets it affects. This makes it possible to gain full visibility and control over even the most complex data pipelines.
How flexible is Manta when it comes to possible integrations?
Can I integrate Manta with my CI/CD pipeline?
Yes, Manta can be used as a component of a CI/CD pipeline to supplement teams’ development efforts.
How can Manta integrate with my data intelligence?
You can boost your data intelligence efforts with detailed, accurate, and up-to-date data lineage provided by Manta. Manta has a robust API for developing integrations with data intelligence tools.
Can I integrate Manta with my data privacy tool?
Yes, you can leverage Manta’s comprehensive data lineage to build trust in data, ensure data security, and adjust your data privacy policies. Manta has a robust API for developing integrations with data privacy tools.
Can I integrate Manta with my profiling tool?
You can utilize Manta’s detailed lineage and unique features for data profiling and achieving better data quality. Manta has a robust API for developing integrations with data profiling tools.
How can Manta integrate with my metadata management tool?
Manta has out-of-the-box connectors to all the major players in the data governance and cataloging space. Manta can also export its repository to consumable formats for unsupported third-party metadata management applications.
Does Manta work with various ETL orchestrations?
There will always be technologies on the market that don’t have supported scanners provided by Manta. In order for the lineage from unsupported technologies to be represented in Manta visualization diagrams, Manta provides a framework called Open Manta. The Open Manta framework makes it possible to define and manage lineage generated by unsupported technologies.
How can I improve lineage data quality?
When you have a complete overview of all your data flows, sources, transformations, and dependencies, you have control of your data assets. You can speak to the accuracy and quality of your data and have confidence in your data and reports. By giving you a full overview of how your data moves across systems, where it originated, how it transforms along the way, and how it’s interconnected, data lineage can help you ensure the quality of your data, reinforce your overall data management strategy, and increase trust in your data.
What is the purpose of data lineage?
Data lineage helps you tame data complexity and gives you a full overview of how your data moves across systems, including where it originated, how it transforms along the way, and how it’s interconnected. Such an overview will help you boost your data governance efforts, increase overall trust in data, achieve full regulatory compliance, accelerate root cause and impact analyses, roll out frequent bug-free releases, painlessly migrate to the cloud, and more.