Bridging the AI-data divide with DataHub
Bessemer Venture Partners leads DataHub's $35 million Series B funding.
AI has raised the stakes for metadata management, turning enterprise data understanding, quality, and control from an important concern into a strategic imperative. Companies racing to deploy AI quickly discover a hard truth: models can’t deliver value without context, and they’re only as accurate and secure as the data that feeds them. Gaps in governance or reliability can cripple an AI initiative or even create legal liability.
This is where DataHub steps in. The project was conceived at LinkedIn during the initial GDPR mandate to help engineers keep moving fast while assuring that all the data they used could be traced and governed from storage to use in production. The DataHub team spun out on their own in 2021 and has since built a metadata management platform on top of the open-source project, used by leading enterprises like Apple, Pinterest, and Slack. The open-source project itself has grown into a community of 13,000+ data builders across thousands of enterprises.
The modern enterprise has data spread across a multitude of stores, systems, and use cases. Unlike other solutions, DataHub works with this complexity wherever data is stored or used, bringing it into the context management platform. Rather than managing the data itself, DataHub creates a comprehensive "data about your data" layer that brings discovery, visibility, lineage, and control to complex data ecosystems.
We couldn't be more excited to lead DataHub’s $35 million Series B and partner with their amazing team on this journey — bringing order to data chaos and unlocking the full potential of AI in the enterprise.
DataHub is needed now more than ever
We first spotlighted Acryl Data (now DataHub) in 2022. Since then, rapidly advancing AI has made proper data cataloging even more essential. For starters, AI systems need context. Consider a table in a database: AI agents need to know what the table holds, who owns it, and what it’s supposed to be used for. Similarly, with a model in a registry, AI systems must understand its training data, performance boundaries, approved use cases, and compliance status. Without this context, AI might make critical errors, such as using inappropriate data, violating privacy policies, or deploying models for unintended purposes.
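To make the table example concrete, here is a minimal sketch of the kind of context record an AI agent might consult before touching a table. Everything in it (the `TableContext` class, its field names, the `is_use_approved` helper, and the sample values) is hypothetical and chosen for illustration; it is not DataHub's data model or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableContext:
    """Hypothetical metadata record describing one table."""
    name: str
    description: str            # what the table holds
    owner: str                  # who is accountable for it
    approved_uses: frozenset    # what it is supposed to be used for


def is_use_approved(ctx: TableContext, purpose: str) -> bool:
    """Return True only if the requested purpose is explicitly approved."""
    return purpose in ctx.approved_uses


# Illustrative values only; a real catalog would supply these.
orders = TableContext(
    name="analytics.orders",
    description="One row per completed customer order",
    owner="commerce-data-team",
    approved_uses=frozenset({"reporting", "demand_forecasting"}),
)

print(is_use_approved(orders, "demand_forecasting"))  # True
print(is_use_approved(orders, "ad_targeting"))        # False
```

The point of the sketch is the default: a purpose that is not explicitly listed is treated as not approved, so an agent without context is stopped before it acts.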
Reliability is also existential. If AI is using broken or outdated data, the problem might go unnoticed until customers flag it. Building this “observability” directly into a data catalog helps catch issues early. And finally, compliance can’t slip. Data privacy laws are strict, making mistakes costly. If personally identifiable information (PII) accidentally ends up in an AI training set, for example because the wrong tables were joined, it could lead to legal trouble.
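The wrong-join scenario can be sketched as a simple pre-flight check: before a training set is assembled, look up the tags on every input column and refuse to proceed if any is marked as PII. The `COLUMN_TAGS` mapping, the `"pii"` tag name, and both functions below are assumptions made for illustration, not a DataHub interface.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical column-level tags, as a catalog might record them.
COLUMN_TAGS: Dict[str, Dict[str, Set[str]]] = {
    "orders": {"order_id": set(), "user_id": set(), "amount": set()},
    "users": {"user_id": set(), "email": {"pii"}, "birth_date": {"pii"}},
}


def pii_columns(tables: List[str]) -> List[Tuple[str, str]]:
    """List (table, column) pairs tagged 'pii' across the tables to be joined."""
    found = []
    for table in tables:
        for column, tags in COLUMN_TAGS[table].items():
            if "pii" in tags:
                found.append((table, column))
    return sorted(found)


def check_training_join(tables: List[str]) -> None:
    """Raise before the join runs if any input column is tagged as PII."""
    flagged = pii_columns(tables)
    if flagged:
        raise ValueError(f"PII columns in training inputs: {flagged}")
```

Under these assumptions, `check_training_join(["orders"])` passes silently, while `check_training_join(["orders", "users"])` raises because `users.email` and `users.birth_date` carry the PII tag.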
DataHub addresses all of this, and does so with an open-source core. That combination of transparency and control is why teams trust DataHub to manage their data safely and effectively for use cases ranging from AI/ML training to running agentic AI systems.
Why we back the architects behind DataHub
Co-founders Shirshanka Das (CTO) and Swaroop Jagadish (CEO) tackled GDPR compliance while preserving rapid, flexible, data-rich product development at LinkedIn (where they crossed paths with our partner Lauri Moore) and Airbnb, leading data infrastructure projects at a scale few have experienced. Their technical depth and creativity have already attracted exceptional talent across engineering, product, and now go-to-market.
DataHub has gone beyond cataloging to empower data teams to move quickly with confidence through streamlined access workflows, clear lineage tracking, governance, and data observability, ensuring reliability across the data ecosystem. We're convinced they’re ideally situated to broaden the market. For large enterprises, open source is no longer optional; it’s a requirement. The shift toward open source creates a significant advantage for Acryl Data as the commercial entity behind DataHub, a mature open-source metadata management platform and the most-starred one.
Looking for reliability solutions for your data infrastructure? Learn more about DataHub.