Data is already well known as the fundamental asset of large enterprises in the information age. Large datasets now support decision making, optimize operations, power predictive models, and train complex AI models. Over the last decade, data has proliferated at an incredible rate as it became cheaper to store, easier to access, and simpler to process. This momentum is expected to continue throughout the next decade: organizations will be able to store and access ever-increasing amounts of data, which will inevitably be spread across a growing number of interconnected datasets.
As data grows in scale and complexity, tracing its flow across the organization is becoming increasingly difficult. Data is ingested from multiple sources – from automatic sensors to user interactions. It then undergoes thousands of transformations, gets stored in multiple datasets, and is used by many different applications. In addition, many data points are interconnected across datasets, forming a complex web of dependencies that only grows with scale. Things work reasonably well as long as these datasets remain static, but even a minor change to a single field can have a cascade of unforeseen effects on other data points and, in turn, on critical business applications. Now imagine moving entire datasets to the cloud while maintaining the existing wiring. Sounds impossible? It nearly is.
Data lineage aims to solve this problem. By connecting to an organization’s data infrastructure layer and tracking how data moves through it, lineage provides complete visibility into the flow of data, the transformations each data point undergoes, and the complex web of inter-dependencies between datasets. Users describe this technology as magical. It enables large data modernization and transformation projects, while also letting data scientists and engineers change their schemas and investigate data incidents without fear of unforeseen effects. Having a complete map of every data flow in your architecture also greatly increases the overall productivity of data teams.
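To make the idea concrete, here is a minimal sketch of the impact analysis that a lineage map makes possible. It is an illustration only, not Manta's implementation or API: the field names and the edge list are hypothetical, and lineage is modeled as a simple directed graph in which each edge points from a source field to a field derived from it.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges: (source field, derived field).
# All names here are invented for illustration.
EDGES = [
    ("crm.customers.email", "staging.contacts.email"),
    ("staging.contacts.email", "warehouse.dim_customer.email_hash"),
    ("warehouse.dim_customer.email_hash", "reports.churn_dashboard.customer_key"),
    ("sensors.readings.temp_c", "warehouse.fact_readings.temp_f"),
]


def build_graph(edges):
    """Map each field to the set of fields derived directly from it."""
    graph = defaultdict(set)
    for src, dst in edges:
        graph[src].add(dst)
    return graph


def downstream(graph, start):
    """Return every field reachable from `start`, using breadth-first search."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


if __name__ == "__main__":
    graph = build_graph(EDGES)
    # Which fields could be affected by a change to the CRM email field?
    for field in sorted(downstream(graph, "crm.customers.email")):
        print(field)
```

In this toy graph, a change to the CRM email field reaches the staging table, the warehouse dimension, and the dashboard, while the sensor pipeline is untouched. A real lineage tool has to derive such edges from the data infrastructure itself; the sketch assumes they are already known.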
Moreover, lineage plays a major role in data privacy management. Organizations that already identify Personal Information (PI) in their datasets, using PI discovery products like BigID, can leverage Manta’s lineage capabilities to track the flow of sensitive PI, helping them comply with regulations and protect their users’ information. Some forward-thinking organizations are also using this technology to improve their AI models and remove biases, by tracking the actual data points that end up feeding into AI training sets – from ingestion, through all transformations, to the final datasets themselves.
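The same kind of graph can be walked in the opposite direction to answer the privacy and AI questions described above. The sketch below is a hypothetical illustration, not the BigID or Manta API: it assumes a discovery step has already flagged a set of fields as PI, and that lineage edges are available as (source, derived) pairs. It then traces a training-set field back to its origins and reports which PI fields feed it.

```python
# Hypothetical lineage edges: (source field, derived field).
EDGES = [
    ("app.signups.email", "staging.users.email"),
    ("staging.users.email", "ml.training_set.email_domain"),
    ("app.signups.plan", "ml.training_set.plan"),
    ("app.signups.email", "marketing.mailing_list.email"),
]

# Assumed output of a PI discovery step (names invented for illustration).
PI_FIELDS = {"app.signups.email"}


def upstream(edges, target):
    """Return every field that `target` is derived from, directly or indirectly."""
    parents = {}
    for src, dst in edges:
        parents.setdefault(dst, set()).add(src)
    seen, stack = set(), [target]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def pi_sources(edges, pi_fields, target):
    """Return the flagged PI fields that flow into `target`."""
    return upstream(edges, target) & pi_fields


if __name__ == "__main__":
    for field in ("ml.training_set.email_domain", "ml.training_set.plan"):
        print(field, "<-", sorted(pi_sources(EDGES, PI_FIELDS, field)))
```

Here the email-domain feature traces back to a flagged PI field, while the plan feature does not. In practice the hard part is collecting accurate edges across many systems, which is the work a lineage product takes on.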
Bessemer has a longstanding history of investing in the most transformative technology infrastructure companies in the world, and we have spent the last few years learning about the challenges and opportunities presented by the proliferation of data. We have already witnessed successful companies like Snowflake and Collibra build large businesses in the data infrastructure space, and we believe many more will come as this trend accelerates. Data lineage strikes us as the next critical piece in data governance. It is viewed as a must-have by data scientists and engineers, as well as by CIOs and other decision-makers, and with time, every modern organization will depend on lineage capabilities to make sense of its data. The tailwinds from privacy, regulatory, and AI use cases are all contributing to the emergence of this category.
Today, we have the pleasure of announcing our Series A investment in Manta – the emerging leader in data lineage. We are proud to share that Manta has the strongest product in the market, the largest base of happy enterprise customers, and a vast network of partnerships and integrations with all leading data infrastructure tools – including data catalog vendors, privacy providers, and ETL tools. Manta’s simple architecture allows it to tap into these existing solutions to shorten the implementation cycle, enabling customers to go from no visibility to complete visibility within a few hours.
As we look at the decade ahead, we expect data proliferation to accelerate, and we view Manta as an emerging leader in the lineage category, which promises to become the most critical piece in data governance. Many organizations already agree, and the rest will inevitably follow in the coming years. Data is gradually changing the world, and so we have to change the way we manage it.