Meet the founders of dltHub: Matthaus Krzykowski, Marcin Rudolf, Adrian Brudaru, and Anna Hoffmann
Bessemer Venture Partners leads dltHub’s $8 million seed round to develop a Python-native data platform for the future of data engineering.
For nearly every business, getting new, reliable data into production remains a slow, manual bottleneck. Python developers building AI workflows or analytics either wait on overloaded data engineering teams or cobble together fragile notebook code that specialists later rebuild from scratch. Yet these data pipelines are crucial, directly feeding AI agent deployments, real-time dashboards, machine learning operations, and more.
At Bessemer, we believe the future of data engineering will look radically different than it does today. With Python as the universal language of AI and LLMs as everyday coding partners, data pipelines can now be built with a speed, scale, and accessibility that were previously impossible. This not only accelerates development but also unlocks entirely new use cases across organizations, from regulated finance and healthcare teams to individual developers building their first analytics workflow. By combining Python-native simplicity with enterprise-grade governance, dltHub makes data engineering faster, more collaborative, and as frictionless as writing code itself.
This is why we’re proud to lead dltHub’s $8 million seed round. We sat down with dltHub’s founders, Matthaus Krzykowski (CEO), Marcin Rudolf (CTO), Anna Hoffmann (COO), and Adrian Brudaru, to hear, in their own words, their vision for democratizing data engineering through Python-native pipelines.
What problem are you solving in data engineering, and why does it matter?
Whether it's launching a first analytics workflow or powering full ML-driven organizations, we’re on a mission to make data engineering as accessible, collaborative, and frictionless as writing Python itself.
Python is the language of AI, analytics, and data workflows. The global Python developer community has grown from 7 million in 2017 to 22 million in 2024, and we believe it will reach 100 million in the next decade. With dltHub, everyone in this community will be able to build with data more efficiently.
Was there an "aha moment" that led you to start dltHub? When did you know there was a bigger problem to solve?
Marcin and Matthaus first got access to the GPT-3 API in April 2021. We were subcontracting for one of the early AI agent companies and brought Adrian on board to build production data pipelines for Fortune 500 teams.
We quickly saw a recurring pattern: Python developers who wanted to use business data either had to wait for a data engineering team or hack something together in tools such as Jupyter notebooks. Later on, specialists like data engineers and MLOps experts would step in to rebuild everything from scratch.
It became clear that as AI adoption accelerated, this pattern would become widespread. We started prototyping and realized we could meet the needs of both groups, Python developers and data engineers, with a minimal, Python-native library. We drew inspiration from tools such as Hugging Face and came to believe that data loading would become a commodity, with hundreds of thousands of data pipelines created and shared openly.
We had more “aha moments.” The first came about two and a half years ago, when we built an agent that could generate functional dlt pipelines, which led us to redesign dlt and our docs for LLMs. The second came around a year ago, when we saw early teams attempting to do all their data engineering with LLMs.
How does dltHub extend beyond what dlt already does?
dlt is our open-source Python library that handles “EL” — extracting and loading data from messy sources into clean datasets. It’s modular, interoperable, and built from the ground up to work with LLMs.
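For a sense of what that looks like in practice, here is a minimal sketch of a dlt pipeline following the patterns in the public dlt documentation; the records, table name, and pipeline name are illustrative:

```python
import dlt

# Toy records standing in for a messy source (illustrative data)
data = [
    {"id": 1, "name": "alice", "meta": {"plan": "free"}},
    {"id": 2, "name": "bob", "meta": {"plan": "pro"}},
]

# dlt infers a schema, normalizes the nested "meta" field,
# and loads the result into a local DuckDB database
pipeline = dlt.pipeline(
    pipeline_name="demo",
    destination="duckdb",
    dataset_name="raw",
)
load_info = pipeline.run(data, table_name="users")
print(load_info)
```

The result is a queryable dataset in a local DuckDB file, with schema inference and normalization handled by the library rather than by hand-written glue code.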
With dltHub, we go further. We extend into ELT, storage, and runtime — turning what once required full data teams into something any Python developer can do. You can deploy and run pipelines, transformations in Python or SQL, and notebooks with a single command, no infrastructure required.
We’re building dltHub in close conversation with users from highly regulated industries like finance and healthcare. For them, governance, security, and compliance (like BCBS 239 for risk reporting in finance) aren’t optional. dltHub delivers that while maintaining Pythonic simplicity, offering full data lineage, observability, and quality control in a platform that feels as natural as writing code.
What do you think about the importance of community?
For us, community is mission-critical. Some of the most advanced platform engineers and LLM users in the world are active contributors. We see thousands of messages from them every month.
They shape our roadmap, drive adoption, and build trust. In fact, we see our docs and our community as half of our product. dlt’s interoperability makes this even more powerful: it works seamlessly with the modern data stack as well as the Pythonic stack — DuckDB, Airflow, Pandas, and beyond. When developers in those communities release or discover something new, dlt is often part of the conversation. That cross-pollination compounds our reach and impact.
What's the biggest misconception about the AI bubble as it relates to data engineering?
Some people say we’re in an AI bubble and that things are overhyped or unsustainable. Whether that’s true or not, in data engineering the change is real, structural, and visible everywhere.
If something can be expressed in Python, it can now be LLM’d. But LLMs aren’t replacing data engineers; they’re multiplying their output. With dlt, we’re seeing this firsthand: developers created over 50,000 custom connectors in September alone, a 20x increase since January. The legacy connector ecosystem supported only a few hundred. We already support more than 4,700 sources and see a clear path toward hundreds of thousands.
That growth isn’t just quantitative, it’s qualitative: it’s translating into richer data lineage, higher quality, and more automation across the stack. So while some corners of AI may be speculative, in Python-based data engineering the gains are pure productivity. LLMs are everyday coding partners now, and tools like dlt and dltHub are bridging the gap to make that power usable.
What advice would you give to developers who are hesitant to take on data engineering work?
The biggest trend we’re seeing is that data engineering is converging with software engineering. The old, heavyweight tools are being replaced by modular, Python-first libraries that feel like any other part of a modern developer’s toolkit. You don’t have to become a data engineer. Think of data as just another layer of your application. Block off two hours, take one of our courses, follow one of our quick starts or LLM-native workflow guides, and by the end, you’ll probably have a production-ready pipeline. Most developers are surprised by how much they can do without a dedicated data team.
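To make that two-hour claim concrete, here is a hedged sketch of the kind of incremental pipeline those quick starts walk through, modeled on the GitHub issues example in the dlt documentation; the repository, field names, and initial timestamp are illustrative:

```python
import dlt
import requests

@dlt.resource(table_name="issues", write_disposition="merge", primary_key="id")
def github_issues(
    updated_at=dlt.sources.incremental("updated_at", initial_value="2024-01-01T00:00:00Z"),
):
    # Request only issues changed since the last successful run;
    # dlt persists updated_at.last_value between runs
    url = "https://api.github.com/repos/dlt-hub/dlt/issues"
    params = {"since": updated_at.last_value, "per_page": 100, "state": "all"}
    response = requests.get(url, params=params)
    response.raise_for_status()
    yield response.json()

pipeline = dlt.pipeline(
    pipeline_name="github_issues_demo",
    destination="duckdb",
    dataset_name="github",
)
print(pipeline.run(github_issues()))
```

Running the same script twice loads only what changed and merges it on the primary key, which is a large part of what "production-ready" means in practice.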
What should we expect to see next from dltHub?
We plan to launch dltHub for individual developers in February. If you’re curious, join the waitlist — we’d love to have you building with us early.