5.28.26

Researching the frontier of robotics: Three founders on what it takes to succeed in embodied AI

From the data pyramid to ‘dark magic’ in data attribution, here's what three founders at the cutting edge of embodied AI are watching closely—and what they think their field is missing.

Robotics hasn't had its ChatGPT moment just yet. But it's coming faster than most people think, and the researchers building at the frontier are already racing to make that a reality. Getting there, however, will require solving problems that the field is only beginning to crack. "We're really, really early on all the fundamental research in order to get us where we need to go," says Armen Aghajanyan, CEO and co-founder of Perceptron AI. 

Armen was one of three founders who sat down for a research panel at Bessemer's first-ever Robotics Day in San Francisco in March 2026, moderated by Janelle Teng Wade, Partner at Bessemer. Each founder brought a distinct vantage point. Jason Ma, co-founder of Dyna, is building full-stack from hardware all the way to a generalized foundation model. Philipp Wu, co-founder of a stealth robotics company, sits at the data layer, running what he describes as the world's largest robotics teleoperation operation. And Armen brings a vision-language model (VLM) lens to physical AI from the helm of Perceptron.

Eight key insights for founders building embodied AI 

1. Use ‘the data pyramid’ as a mental model to weigh trade-offs of quality versus scale. At the top of the pyramid sits the highest-quality data, which is also the most expensive and hardest to collect. As you move down, data becomes easier and cheaper to gather, but also moves further from the robot’s embodiment. Use the pyramid as a starting point for deciding how to balance those data types against the resources you have.

2. The value of data shifts over time as hardware catches up. Data quality doesn’t remain static over time—a single dataset can become significantly more or less useful depending on what hardware exists to train on it. For founders, this means your data strategy can’t be static.

3. Build in-house where the frontier is unsolved and buy everywhere else. Robotics and foundation model founders agree: build in-house anything that determines whether the product actually works, and buy everything else. The reliability gap between good and great—and the risk of distilling from others’ models rather than owning your own data and training—makes full-stack ownership a competitive necessity.

4. Reinforcement learning (RL) works best once robots can match what human demonstrations can teach. RL already drives simulation-trained results such as humanoid dancing, but it struggles in manipulation, where simulation cannot yet accurately model the physics and visual context involved. Robots need a strong foundation of human demonstration data to build on before RL can push them toward superhuman performance.

5. Pixel-level reconstruction is likely not the best path for world models. Most robotics world model research focuses on predicting video frames pixel by pixel, but this level of fidelity may not be necessary. The semantic approach to world modeling may be the most underinvested and likely to unlock far easier training methods.

6. Physics understanding may be getting less attention than it deserves. While the field chases algorithmic and scaling advances, robotics may be underinvesting in the foundational physics work—sim-to-real gaps, system identification—that actually makes robots reliable. The mass learning approach has merit, but not at the expense of deeply understanding the physical systems robots run on.

7. Data attribution is an underappreciated problem in robotics. When robots fail or succeed unexpectedly, researchers rarely know which training data caused it. This may be one of the field’s most overlooked unsolved problems. Solving data attribution would have compounding benefits: better data collection, better model understanding, and more confident deployment.

8. Know where open source data fits in your training stack—and where it doesn’t. Proprietary robotics data captures signals, such as proprioception, operator actions, tactile and audio, that open source simply can’t replicate at scale. Open source teleoperation data is useful for mid-training, but it may not be reliable for last-mile fine-tuning where policy precision matters most.

Read on for eight of their sharpest insights on data, reinforcement learning, and the unsolved problems defining the next era of physical AI.

1. Use ‘the data pyramid’ as a mental model to weigh trade-offs of quality versus scale

One of the central challenges in robotics is sourcing enough training data, and it's a fundamentally harder problem than the one large language models faced. LLMs took off relatively quickly because there was an entire internet of data to scrape. Robotics data, by contrast, requires physical hardware, human operators, and real-world environments. "The field of robotics is still orders of magnitude smaller than where we see ourselves with language models today," says Philipp Wu, co-founder of a stealth robotics company.

To prioritize resources given these constraints, Philipp and his team think about robotics data as a pyramid. At the top sits the highest-quality data—that is, recordings tightly matched to the specific robot being deployed, capturing not just video but proprioception, operator actions, and sometimes touch and audio. It's the most useful for training a reliable policy, but it's also the hardest and most expensive to collect, which is why there's less of it. As you move down the pyramid, data becomes easier and cheaper to gather, but it also moves further from the robot's embodiment.

Robotics founders need to consider their company’s balance of data types with the resources they have available. "Different people have different strategies in terms of how to scale this up and how they leverage that type of data for training," says Philipp. The data pyramid is a starting point for making those trade-offs deliberately.
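To make the trade-off concrete, here is a minimal sketch of the pyramid as a budgeting exercise. It is an illustration, not Philipp's actual method: the names of the lower tiers, the cost per hour, and the budget shares are all hypothetical placeholders, chosen only to show how cheaper tiers yield far more hours for the same spend.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    cost_per_hour: float  # hypothetical cost of one recorded hour
    budget_share: float   # hypothetical fraction of the total budget


# Ordered from the top of the pyramid to the base.
# Every number and the lower tier names are assumptions for illustration.
PYRAMID = [
    Tier("embodiment-matched teleoperation", 100.0, 0.5),
    Tier("off-embodiment demonstrations", 10.0, 0.3),
    Tier("internet-scale video", 1.0, 0.2),
]


def hours_per_tier(budget: float, tiers=PYRAMID) -> dict[str, float]:
    """Return the hours of data each tier yields for a given total budget."""
    total_share = sum(t.budget_share for t in tiers)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("budget shares must sum to 1")
    return {t.name: budget * t.budget_share / t.cost_per_hour for t in tiers}


if __name__ == "__main__":
    for name, hours in hours_per_tier(100_000).items():
        print(f"{name}: {hours:,.0f} hours")
```

With these placeholder values, half of the budget buys roughly 500 hours at the top tier, while a fifth of it buys roughly 20,000 hours at the base. That is the pyramid shape: the most useful data is also the scarcest.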

2. The value of data shifts over time as hardware catches up

Building full-stack means every layer of the system is your purview. For Jason Ma, co-founder of Dyna, that makes data quality a concern that runs through everything he does. "High-quality data really matters to get the model to be general and also be very reliable in real-world operation," he says. The Dyna team is building full-stack from hardware all the way to a generalized foundation model to solve end-to-end workflows in business settings, with initial deployments focused on the hospitality sector. 

One of Jason’s key observations is that data quality is not static over time. The same dataset can become significantly more or less useful depending on what hardware exists to train on it. For example, Jason points out that egocentric data collection has been around for at least five years, citing a paper called Ego4D, which released over 3,600 hours of egocentric data. "But at the time, no one had a functioning humanoid robot that was remotely close to human in terms of form factor," says Jason.

As humanoids improved and the embodiment gap narrowed, that dataset became considerably more useful. "The same data shifted in terms of how useful it is over time," he says. The implication for founders is that data strategy can't be static. As hardware improves and new form factors emerge, the datasets worth collecting today may not be the same ones worth collecting a year from now. 

3. Build in-house where the frontier is unsolved and buy everywhere else

One question every robotics founder faces is how to allocate resources across a stack that spans hardware, data, models, and infrastructure. For Jason, when good off-the-shelf solutions exist, there's no reason to reinvent them. "Take cameras, for instance. We know what kind of cameras we want, so we're not going to build them in-house," he says. But anything that determines whether a robot actually works in the real world, such as the form factor, the robot hardware, and the model, Dyna builds itself.

The reason comes down to the reliability bar. "We put out demos, but we also care a lot about actual real-world production," says Jason. "The gap between an 80% success rate and 99% success rate—and also the gap from 99% to 100%—is as big of a gap as from zero to 80%. Because of that, we build all the critical tech in-house."
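One way to see why those last percentage points matter is to look at how errors compound. The sketch below is an illustration, not Jason's stated reasoning: it assumes a workflow is a chain of steps that fail independently, and the 10-step length is a hypothetical choice.

```python
def workflow_success(per_step_success: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent failures."""
    if not 0.0 <= per_step_success <= 1.0:
        raise ValueError("per_step_success must be between 0 and 1")
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return per_step_success ** steps


def expected_failures(success_rate: float, attempts: int) -> float:
    """Expected number of failed attempts at a given success rate."""
    return (1.0 - success_rate) * attempts


if __name__ == "__main__":
    STEPS = 10  # hypothetical workflow length
    for rate in (0.80, 0.99):
        print(
            f"{rate:.0%} per step: "
            f"{workflow_success(rate, STEPS):.1%} over {STEPS} steps, "
            f"{expected_failures(rate, 1000):.0f} failures per 1,000 attempts"
        )
```

Under these assumptions, a 10-step workflow finishes about 11% of the time at 80% per step and about 90% of the time at 99% per step. Moving from 80% to 99% also cuts expected failures per 1,000 attempts from 200 to 10.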

Armen takes the same logic further. If the reliability bar is what drives Jason to build in-house, for Armen, the imperative is even more fundamental. "If you're building foundation models, you more or less have to own the full stack," he says. Fragmented teams working on pre-training and mid-training without a unified vision introduce too much risk. He's equally cautious about distillation—training your model on outputs from a larger model rather than original data. "You're essentially always chasing an ambulance," he says. "You're never going to be able to get past whatever you distilled from."

4. Reinforcement learning works best once robots can match what human demonstrations can teach

Contrary to the popular assumption that reinforcement learning (RL) is still on the horizon, Jason asserts that it’s already widely used in robotics today, just not for manipulation. "All the humanoid dancing and martial arts results you've seen lately use simulation-based reinforcement learning techniques," he says. Where RL has struggled is in visual, perception-based manipulation. The reason, Jason explains, is that simulation still can't accurately model the physics and visual context that complex manipulation requires.

For Philipp, though, the more important question is sequencing. "Our largest bottleneck right now is that we're not able to extract enough human capability into robotics yet," he says. "A robot can still be outperformed by a human teleoperating the same task. Until that gap closes, applying RL on top won't unlock much."

Philipp knows this challenge firsthand. In his 2023 DayDreamer paper, he applied RL to learn robot policies directly from scratch in the real world. The result: after more than 12 hours of continuous training, the robot could pick up a ball and move it. "This is a very, very simplistic task, and we still had to spend so much energy and so many hours training it," he says. Before RL can do its best work in manipulation, robots first need a much stronger foundation of human demonstration data to build on.

"Once we get to a place where robots can perform similarly to humans, then we can suddenly apply RL on top of this to optimize and get superhuman performance," says Philipp. This sequencing, he points out, would mirror what's already happened in LLMs: get machines to imitate humans first, then apply RL on top.

5. Pixel-level reconstruction is likely not the best path for world models

World models have attracted significant attention as a potential path to general purpose robotics. The idea is that if a robot can simulate how the physical world works, it can plan and act more reliably within it. But Armen thinks the field may be pursuing this in the wrong way.

Most current approaches rely on pixel-level reconstruction, essentially training a model to predict exactly what the next frame of video will look like, down to the pixel. It's an enormously difficult task, and in Armen's view, that level of fidelity may not actually be necessary. "Pixel-level reconstruction is not the best way to do world modeling," he says. "There's likely a better semantic formulation that's going to unlock much easier ways to train these world models." It's a research direction he believes is underinvested relative to its potential.

6. Physics understanding may be getting less attention than it deserves

Much of the excitement in robotics today is focused on algorithms, foundation models, and scaling data. Philipp thinks this focus is leaving something important on the table: a deeper understanding of the physical systems robots actually run on.

"What's understudied is the physics understanding of robotics," says Philipp. A lot of the impressive humanoid demos that have captured attention recently were enabled not just by algorithmic advances, but by serious work on understanding the gap between simulation and the real world (known as the sim-to-real gap) and on system identification, the process of precisely characterizing how a specific robot actually behaves physically.

"Right now, the field is taking a mass learning approach to try to learn deltas and variances between robot systems, which has its benefits, and we should continue doing so," he says. "But I don't think we're spending enough time on the lower level, actually, and understanding that well and building really great physical systems and characterizing those."

7. Data attribution is an underappreciated problem in robotics 

When a robot fails in the real world, often researchers don’t know exactly why. And when robots succeed in an unanticipated way, researchers often don’t have a full explanation either. This is the data attribution problem, and according to Jason, it is one of the most underappreciated unsolved challenges in robotics today.

"We have all these models trained on tens of thousands of hours of data," says Jason. "But we often don’t understand what data contributed to failures or successes." In computer vision and language, data attribution has made meaningful progress. "But in robotics, it's just very hard to trace from behavior back to what in the training data contributed to it," he says. "It's dark magic."

The practical cost of this is significant. Without understanding which data drives which behaviors, it's harder to collect better data, harder to debug models, and harder to deploy with confidence. Progress on data attribution, Jason argues, would have compounding benefits across the entire robotics development stack. "I haven't heard this issue being talked about as much as it should be," he says. "If we can do that for robotics, it'll help us collect better data, understand our models better, and deploy more confidently."

8. Know where open source data fits in your training stack— and where it doesn't

Every robotics founder eventually has to decide how much to invest in building proprietary datasets versus relying on what the open source community has already produced. There’s risk in over-indexing in either direction: if you over-invest, an open source release may render your data redundant, but if you under-invest, open source data may not be sufficient when you need it most.

Philipp takes a pragmatic approach. For the company he’s building, customer demand largely drives what gets collected—clients want specific data for specific deployments, which makes the build decision easier. But he's also clear that proprietary robotics data has structural advantages that open source is unlikely to replicate. Beyond video, high-quality robot teleoperation data captures proprioception, operator actions, and potentially tactile and audio signals across multiple camera views. "Those are things that are much more difficult to get from pure internet scale," he says.

Armen offers a framework for thinking about where open source fits. "Different qualities of data have their own respective stages where they're useful," he says. Open source teleoperation data—of which there are roughly 50,000 to 60,000 hours available—is well-suited for mid-training. But he's direct about its limits. "I probably would not use open source data for your last mile fine-tuning to get a policy running on the robot," he says. "It's quite rare for data to be completely useless—it more or less depends where you fit it in."
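Armen's framework can be sketched as a simple lookup from data source to the training stages where it is used. This is a minimal illustration under stated assumptions: the source names are hypothetical, and only the open source row reflects what Armen says. Allowing proprietary on-robot data in both mid-training and last-mile fine-tuning is an assumption made for the example.

```python
from enum import Enum


class Stage(Enum):
    MID_TRAINING = "mid-training"
    LAST_MILE_FINE_TUNING = "last-mile fine-tuning"


# Hypothetical source names. Only the open source entry mirrors the panel;
# the proprietary entry is an assumption for illustration.
ALLOWED_STAGES = {
    "open_source_teleop": {Stage.MID_TRAINING},
    "proprietary_on_robot_teleop": {
        Stage.MID_TRAINING,
        Stage.LAST_MILE_FINE_TUNING,
    },
}


def sources_for(stage: Stage) -> list[str]:
    """List the data sources permitted at a given training stage."""
    return sorted(
        name for name, stages in ALLOWED_STAGES.items() if stage in stages
    )


if __name__ == "__main__":
    for stage in Stage:
        print(f"{stage.value}: {', '.join(sources_for(stage))}")
```

The point of the table is that no source is thrown away. Each one is simply restricted to the stages where its quality is sufficient.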
