Inside Shopify’s AI-first engineering playbook
Shopify’s Head of Engineering on the AI infrastructure, culture, and guardrails behind productivity gains.
Shopify has generated a lot of buzz about AI-native development, much of which has been fueled by viral posts from CEO Tobi Lütke and other leaders describing how deeply AI is being embedded into their engineers’ everyday workflows.
In one post, Tobi stated that he had “shipped more code in the last three weeks than the decade before,” attributing the increase to AI tools. The posts quickly circulated among founders and investors, prompting widespread curiosity about whether the hype is real—and how leadership is enabling engineers to operate AI-first.
We sat down with Farhan Thawar, VP & Head of Engineering at Shopify—who estimates that his team is 20% more productive—to demystify the company’s AI playbook. Farhan and his team were early adopters of AI. As soon as GitHub launched its Copilot product, he recalls texting the company’s CEO saying, “I don't care if you have a go-to-market for this. I don't care what the price is. How do I get this inside Shopify for every engineer?” When the CEO said that it wasn’t yet possible, Farhan remembers replying: “Figure out a way.”
This was a year before ChatGPT democratized AI usage. "I just knew this was going to change how we work," says Thawar. Five years later, AI has exploded, and Farhan remains at the vanguard of what’s possible with this emerging technology. In this operating guide, Farhan breaks down his approach to AI—from the infrastructure powering it to how he measures engineering productivity and his predictions for the agentic future.
Key takeaways from Shopify’s AI-first engineering strategy
- Infrastructure standardization unlocks tool experimentation. Rather than forcing teams onto a single AI tool, Shopify standardized the layer underneath — building an LLM proxy that routes all AI requests through one gateway. This approach allows engineers to experiment with Claude Code, GitHub Copilot, and other tools simultaneously while maintaining centralized cost control, usage analytics, and model flexibility. The lesson: in a rapidly evolving AI landscape, standardize infrastructure, not tools.
- The 20% productivity gain is real, but it’s not about code volume. Farhan’s team has achieved roughly 20% productivity improvements (a number Farhan describes as a conservative estimate), but not through traditional metrics like lines of code or pull requests (which are easily gamed). The real gains show up in faster prototyping, exploring 10 approaches instead of two, and higher-fidelity deliverables across all functions. The best measure of progress? Weekly demos that show tangible velocity and unblock teams in real-time.
- Cultural adoption beats top-down mandates every time. Shopify’s “make it look easy” approach — where leaders openly share how they solve problems with AI rather than presenting the results as pure brilliance — drove organic adoption across engineering, sales, finance, and HR. Paired with low-friction enablement such as prompt libraries, setup guides, and MCP server connections, this cultural nudging led to unexpected outcomes: salespeople creating custom dashboards, and finance teams building “n-of-1” software without engineering help. Access + enablement + leadership modeling = effective change management.
- Comprehension debt is the #1 long-term risk. While AI can dramatically accelerate development, Farhan warns about the brain being a muscle that you can’t let atrophy. If engineers stop thinking deeply and learning, they’ll lose understanding of their systems. His guardrail: engineers must understand systems 2-3 layers below where they’re working, using AI to accelerate learning, not replace it. Companies that accumulate comprehension debt won’t be able to maintain or evolve their systems when things break down.
- 2026 belongs to those who master agentic harnesses. The next competitive advantage lies in orchestrating AI agents, either through parallel execution (10 agents working simultaneously with human review and merge) or sequential critique loops (extended 45+ minute thinking sessions with multi-model interrogation). Farhan is direct in his perspective: “If you don’t figure out how to harness agents in 2026, you’ll be behind.” Engineers need to move from writing every line of code to directing intelligent systems and evaluating outputs — a fundamentally different skill set that requires new infrastructure, workflows, and mental models.
Part 1: AI infrastructure
1. Standardize infrastructure, not AI tools
The foundation of Shopify’s AI strategy is its infrastructure. Rather than standardizing on a single AI tool, Shopify focused on building a platform layer that allows many tools and models to coexist. The team built a centralized LLM proxy—an internal gateway that routes all AI requests through a single platform layer. The proxy sits between Shopify’s internal tools and the underlying AI models. Every request from tools like Claude Code or Copilot flows through the proxy before reaching models from providers like OpenAI, Anthropic, or Google.
This architecture has several benefits, starting with centralized cost management. Since AI models charge per token, costs can ramp up quickly across thousands of employees. By purchasing tokens in bulk and routing usage through a shared gateway, Shopify gets discounted rates while also monitoring spending across teams and projects.
Leadership can see where experimentation is happening and which workflows are gaining traction. “I can look at usage by team, by project, by person,” Farhan explains. “We get alerts if someone spends more than $250 in tokens in a day.” Shopify doesn’t impose hard limits on AI usage, which has occasionally led to employees spending tens of thousands of dollars in tokens in a single week. Instead of shutting it down, Farhan investigates what they’re building, often discovering ambitious and worthwhile experiments, like attempts to refactor large parts of Shopify’s mobile codebase.
The proxy also creates model flexibility. Because tools connect to the gateway rather than directly to a provider, Shopify can switch models behind the scenes as capabilities improve or costs change.
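As a rough illustration, the gateway pattern can be sketched in a few lines of Python. Everything here — the model names, per-token prices, and the stub provider — is a hypothetical stand-in; only the $250 daily alert threshold comes from Farhan’s description.

```python
from collections import defaultdict

DAILY_ALERT_USD = 250.0  # alert threshold Farhan mentions; usage is never hard-blocked

# Illustrative per-1K-token prices; real rates vary by provider and model.
PRICE_PER_1K_TOKENS = {"model-a": 0.005, "model-b": 0.003}

class LLMProxy:
    """Minimal sketch of a gateway: tools call the proxy, never a provider
    directly, so models can be swapped and spend tracked centrally."""

    def __init__(self, providers):
        self.providers = providers       # model name -> callable(prompt)
        self.spend = defaultdict(float)  # user -> USD spent today
        self.alerts = []                 # users who crossed the daily threshold

    def route(self, user, model, prompt):
        # Tools only name a model; the proxy decides which backend serves it.
        reply, tokens_used = self.providers[model](prompt)
        cost = tokens_used / 1000 * PRICE_PER_1K_TOKENS[model]
        self.spend[user] += cost
        if self.spend[user] > DAILY_ALERT_USD and user not in self.alerts:
            self.alerts.append(user)     # notify leadership, but don't block
        return reply

# Stub provider standing in for a real model API behind the gateway.
proxy = LLMProxy({"model-a": lambda prompt: ("ok", 60_000_000)})
proxy.route("alice", "model-a", "refactor this module")
```

Because tools depend only on the proxy’s interface, swapping the callable behind a model name is invisible to every downstream tool — which is the flexibility the next paragraph describes.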
2. Connect AI to your internal systems
AI tools become far more useful when they can interact with the systems employees already use to do their jobs. At Shopify, this is done through MCP servers and internal systems like their wiki, product management tool (GSD), and data warehouse. These servers allow AI tools to query and retrieve information across those systems in a structured way.
For example, an employee preparing for a meeting could ask an AI assistant for context on a person or account. The system might pull information from Salesforce, check relevant Slack conversations, and look at calendar events or documents stored in Google Workspace to build a fuller picture. Crucially, access controls remain intact. AI only retrieves information that the user already has permission to see. “Because it’s going through the same auth flow that you have,” Farhan explains, “it’s not going to give me information that I don’t have access to.”
Some of these MCP servers are written internally, while others come from vendors. But regardless of where they originate, they must meet the same reliability and testing standards as any other internal system—because once AI is integrated this deeply into workflows, those connections become part of the company’s core infrastructure.
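The auth-passthrough idea Farhan describes can be sketched as follows. The `Wiki`, `WikiMCPServer`, and token names are hypothetical stand-ins, not Shopify’s actual MCP servers; the point is that the caller’s credentials flow through unchanged, so the backing system enforces the user’s existing permissions.

```python
class Wiki:
    """Stub backing system that enforces its own permissions."""
    def __init__(self, docs):
        self.docs = docs  # title -> set of tokens allowed to read it

    def search(self, query, auth):
        return [title for title, allowed in self.docs.items()
                if query in title and auth in allowed]

class WikiMCPServer:
    """MCP-style tool server: the user's token is forwarded untouched,
    so the AI can only retrieve what that user could already see."""
    def __init__(self, wiki):
        self.wiki = wiki

    def search(self, query, user_token):
        # Same auth flow as the human user -- no privilege escalation.
        return self.wiki.search(query, auth=user_token)

wiki = Wiki({"launch plan": {"alice-token"}, "payroll": {"hr-token"}})
server = WikiMCPServer(wiki)
```

An engineer’s assistant querying through `server` with `"alice-token"` can find the launch plan but gets nothing back for payroll — mirroring Farhan’s point that the AI “is not going to give me information that I don’t have access to.”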
3. Make internal tools easy to build and deploy
Farhan compares Shopify’s internal tooling moment to the early days of the internet. “If you remember GeoCities, you could just create a website on the internet with a URL,” he says. The goal was similar: remove the friction so anyone in the company can quickly create and share simple software.
To enable that, Shopify built an internal tool called Quick. Employees can drag and drop a JavaScript, TypeScript, or HTML file, assign a URL, and instantly deploy a working application that anyone inside the company can access. This dramatically lowers the barrier to building small internal tools. Sales, support, finance, and other teams can now use AI tools to generate simple apps and deploy them without needing help from engineering.
Farhan shared one example from a recent merchant meeting. Before the call, someone sent him a Quick link that compiled everything he needed to know about the merchant—from internal systems and data sources—into a simple dashboard.
Instead of waiting for engineering resources, employees can build these “n-of-1” tools themselves. The result is less operational friction and more experimentation across the company, often resulting in teams solving their own problems and increasing GTM efficiency and revenue.
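A minimal sketch of the Quick-style flow — drop in a file, get a shareable URL. Quick itself is internal to Shopify and its implementation is not public, so the registry class and base URL below are hypothetical stand-ins for the idea.

```python
class QuickLikeRegistry:
    """Illustrative stand-in for a Quick-style internal deploy service:
    upload a file, get back a URL anyone in the company can open."""

    def __init__(self, base="https://quick.internal.example"):
        self.base = base
        self.apps = {}  # app name -> file contents

    def deploy(self, name, source):
        # In a real service this would host the file and route traffic;
        # here we just register it and mint the shareable URL.
        self.apps[name] = source
        return f"{self.base}/{name}"

    def serve(self, name):
        return self.apps[name]

registry = QuickLikeRegistry()
url = registry.deploy("merchant-dashboard",
                      "<html>KPIs for the merchant meeting</html>")
```

The design choice worth noting is the near-zero ceremony: no build pipeline, no review gate, no ticket — which is exactly what makes “n-of-1” tools viable for non-engineers.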
4. Allow engineers to experiment with many tools
Many organizations approach AI adoption the same way they approach software procurement: pick a standard tool, roll it out company-wide, and limit the rest. When it comes to AI, Farhan took the opposite approach. Rather than standardizing on a single AI tool, the company standardized the infrastructure layer underneath.
Today, Shopify engineers use a wide mix of AI coding tools, including Cursor, Claude Code, GitHub Copilot, OpenAI Codex, and experimental tools from Gemini. The company intentionally allows this diversity because the AI ecosystem is evolving too quickly for a single best-in-class tool to emerge. “At Shopify, we always have one tool for one job—except for with AI,” he explains, “since we don’t know yet which company, workflow, or model is going to win.”
Part 2: Adoption and enablement
5. Drive AI adoption through culture paired with tooling
When Farhan first launched Cursor at Shopify, adoption caught fire in ways he wasn't expecting. He had expected engineers to be the main users. However, the tool quickly gained traction with teams across sales, finance, and HR. Farhan recalls Tobi joking that the release almost worked too well. The Cursor team even wanted to know how Farhan was able to get salespeople to adopt the tool so successfully.
Farhan regularly posts examples of work he completed with AI, framing it not as a show of brilliance but as a show of leverage: “I didn’t say look at how much work I did and how smart I am. I said, look how lazy I am.” That combination of company-wide sharing, visible leadership modeling, and low-friction enablement made adoption feel less like a directive and more like an obvious advantage.
Once people outside R&D got their hands on the tools, they started building what Farhan calls “n-of-1 software”—small, highly specific tools for their own workflows. Sales teams were writing queries, building reports, creating decks, and generating Monthly Business Reviews (MBRs) without waiting on engineering. Leadership then helped reinforce that behavior by publicly sharing their own use cases.
The result was deep and widespread adoption, among engineers and across the wider company. “A lot of our tactics were simply nudging, showing people demos, and bragging about how ‘lazy’ we are—working smarter, not harder,” he says. “We’d say, ‘Look what I built in five minutes.’ There's no forcing. I just try to show people what's possible.”
Farhan emphasizes that the push to use AI is cultural first and foremost, paired with deliberate enablement and widespread access to the tools. This included practical onboarding, setup guidance, connections of all systems to MCP servers, and an internal prompt library that let employees reuse and adapt workflows that were already working.
This cultural push is reinforced by incentives from the top: employees are evaluated on how “AI-reflexive” they are in their biannual performance reviews—how quickly they turn to AI when they encounter a problem.
6. Establish clear ownership for AI enablement
At Shopify, a small internal team builds the AI infrastructure that lets engineers experiment safely and cheaply. “We have an ML infrastructure team,” Farhan says. “It’s a small team—pretty sure it’s six engineers.” Their job isn’t to dictate how teams use AI; it’s to remove friction.
The team maintains the LLM proxy, monitors model performance, and ensures engineers can access AI tools without latency or reliability issues. “They’re always looking for opportunities to reduce toil for engineers and create more enablement,” Farhan explains. “They’re constantly asking, ‘How do I reduce latency? How do I reduce friction? How do I make sure people aren’t blocked?’”
Part 3: Tracking AI impact
7. Don’t confuse output with productivity
Measuring engineering productivity has always been notoriously difficult—and AI only amplifies the challenge. When AI tools can generate large amounts of code quickly, traditional metrics like lines of code or pull requests become even less meaningful. More output doesn’t necessarily mean more progress.
“There has never really been a good metric to determine whether an engineer is productive or not,” admits Farhan. He offers an example from his own team: an intern once deleted six lines of code that ended up saving Shopify $600,000/year in infrastructure costs. By any meaningful definition, that was an extremely productive change. But by traditional metrics, it would barely register. “It would have been very hard for me to recognize this impact with an automated tool,” he says.
In Farhan’s view, the goal of good engineering has never been to maximize the volume of code produced. In fact, the opposite is often true. He points to a well-known joke from the pair programming community: when someone asked whether pair programming would cause engineers to write half as much code, the response was that they would actually write even less.
The same principle applies in an AI-assisted world. AI systems can generate large amounts of code quickly, but more code isn’t necessarily better code. “Code is cheap now,” Farhan says. “But I don't want code, I want solutions.” Ideally, he adds, AI could help engineers produce “small, elegant, shorter code,” not simply more of it.
At the same time, in an AI-centric world, code written, read, and modified by AI may have very different properties from code once maintained by humans. New metrics will likely emerge from the structure and properties of that code; only time will tell.
8. Measure impact with real signals, not vanity metrics
In the face of this complexity, Farhan believes the most reliable signal of progress is running weekly demos where teams can showcase what they’re actually building. “What I’ve said in the past—and I'll still stand behind—is that the best way to determine if progress is happening is weekly demos. There are lots of people trying to triangulate engineer productivity based on a variety of metrics,” he says. “But the best way is still very human.”
Farhan adds that these demos let leadership check alignment and ask questions to understand successes and blockers. He does see a slight correlation between AI usage and shipping more code—though he quickly caveats that if this were an established KPI, engineers could easily game it without any meaningful increase in productivity.
Overall, however, Farhan is seeing meaningful gains in productivity with more AI usage. “Our goal is to give every engineer superpowers. We’re not trying to reduce the size of the workforce. We are trying to enable our engineers to do more,” he says. Farhan sees engineers exploring more approaches to problems, testing more ideas, and moving faster through experimentation cycles. While his costs are increasing non-trivially by equipping employees with AI, Farhan estimates that engineer productivity has increased by roughly 20%. The gains have shown up most noticeably in teams shipping features faster and improving overall product quality.
Part 4: Quality and security guardrails
9. Keep an eye on quality using reversion rate and humans in the loop
As AI accelerates development speed, it raises the question: how do you maintain quality and security when more code is being generated than ever before? Farhan stresses the importance of having guardrails in place to protect code quality and system integrity.
One common concern is that if engineers are producing more code using AI, they might also be introducing more bugs into production. At Shopify, Farhan’s team tracks this by looking at reversion rates—how often a pull request has to be rolled back after it’s merged. If AI were generating lower-quality code, you would expect that rate to increase. So far, he reports that hasn’t happened.
Farhan says the company has seen a slight increase in the number of pull requests engineers are shipping each week when they use AI tools, but the reversion rate of those PRs has remained roughly the same. Engineers appear to be shipping more code without a corresponding drop in quality.
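The reversion-rate signal is straightforward to compute from PR history. A minimal sketch, with toy data chosen to mirror the pattern Farhan describes (more AI-assisted PRs shipped, at a similar rollback rate); the field names are illustrative, not Shopify’s schema.

```python
def reversion_rate(prs):
    """Share of merged PRs that later had to be rolled back."""
    if not prs:
        return 0.0
    return sum(1 for p in prs if p["reverted"]) / len(prs)

def compare_by_ai_usage(prs):
    """Split PR history by AI assistance and compare reversion rates."""
    ai = [p for p in prs if p["ai_assisted"]]
    manual = [p for p in prs if not p["ai_assisted"]]
    return reversion_rate(ai), reversion_rate(manual)

# Toy history: twice the AI-assisted volume, comparable rollback rate.
history = (
    [{"ai_assisted": True, "reverted": False}] * 19
    + [{"ai_assisted": True, "reverted": True}]
    + [{"ai_assisted": False, "reverted": False}] * 9
    + [{"ai_assisted": False, "reverted": True}]
)
```

If AI were degrading quality, the first number would climb relative to the second as AI-assisted volume grows — which is the regression Shopify watches for.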
Importantly, the team never merges code without a senior engineer’s review. “Shopify is not yet at the place where we allow AI to check in code automatically into the repos,” says Farhan. “We still require a human PR reviewer, which is now becoming a big bottleneck because if lots of code is being generated by AI, more time is needed to review the code.”
For now, Farhan sees that friction as a necessary safeguard. As AI speeds up development, maintaining careful human review ensures that speed doesn’t come at the expense of reliability.
10. Use AI as a partner in finding security vulnerabilities
Another concern that emerges as AI accelerates the pace of software development is whether security measures can keep up. Some proponents argue that LLMs may even write more secure code than humans, especially when it comes to common vulnerabilities like SQL injection. Farhan is currently skeptical of that claim. In his view, AI often generates more verbose code than a human would, which can actually introduce additional surface area for mistakes.
Instead of assuming AI will automatically produce safer software, Shopify is exploring a different approach: using AI as a security partner. “One thing AI is very good at is looking for vulnerabilities,” Farhan explains. By providing the model with the right context and prompts, engineers can ask it to interrogate code for logical flaws, unsafe patterns, or architectural weaknesses. In that role, the model acts less like a developer and more like a reviewer.
This type of analysis can extend beyond code itself. AI can also probe APIs, explore system boundaries, and perform fuzz testing—sending unexpected or malformed inputs to uncover hidden vulnerabilities. Farhan emphasizes that this doesn’t replace human responsibility for security. “I would not abdicate,” he says. “I would use it as a pairing partner to help you find those holes.” In practice, that means engineers need to actively guide the model. AI systems won’t automatically hunt for vulnerabilities unless they’re prompted to do so.
“You have to direct it with prompts like: Act as a senior security researcher. Analyze the following controller code for Insecure Direct Object Reference (IDOR) vulnerabilities. Specifically, check if the user_id or resource_id in the request parameters is being used to fetch data from the database without verifying that the currently authenticated user (session.user_id) has the explicit permission to access or modify that specific record. Highlight any line where a database lookup occurs without a multi-tenant ownership check,” Farhan explains.
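A prompt like this can be captured as a reusable template so every engineer runs the same review. The sketch below only builds the prompt string from Farhan’s wording; actually sending it to a model (for example through an LLM gateway) is left out, and the function name is our own.

```python
# Template adapted from the security-review prompt quoted above.
IDOR_REVIEW_TEMPLATE = (
    "Act as a senior security researcher. Analyze the following controller "
    "code for Insecure Direct Object Reference (IDOR) vulnerabilities. "
    "Specifically, check if the user_id or resource_id in the request "
    "parameters is being used to fetch data from the database without "
    "verifying that the currently authenticated user (session.user_id) has "
    "the explicit permission to access or modify that specific record. "
    "Highlight any line where a database lookup occurs without a "
    "multi-tenant ownership check.\n\n--- CODE UNDER REVIEW ---\n{code}"
)

def build_idor_review_prompt(controller_code: str) -> str:
    """Wrap controller code in the standard IDOR-review prompt."""
    return IDOR_REVIEW_TEMPLATE.format(code=controller_code)

prompt = build_idor_review_prompt(
    "record = Record.find(params[:resource_id])"
)
```

Keeping prompts like this in a shared library (as Shopify does with its internal prompt library) means the review criteria evolve in one place rather than in each engineer’s chat history.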
Used this way, AI becomes a powerful tool for scaling security analysis. It can’t guarantee that systems are perfectly safe, but it can dramatically expand the amount of testing and review a team has the capacity to perform. “It's a tedious operation for humans that could be a very good use case for LLMs,” says Farhan. “It can't prove that there's no security hole, but it can enable you to do a lot more analysis than you would've done without it.”
11. Beware of comprehension debt
Farhan says there is one AI risk that worries him more than any other: comprehension debt. “The brain is a muscle,” he says, “If you stop going to the gym—or stop using your brain—it will atrophy.” As AI systems generate more code and automate more tasks, engineers may gradually lose their understanding of how the systems they maintain actually work.
While tools built for personal use (like internal dashboards or workflow helpers) don’t require the same level of scrutiny, Farhan insists that anything that touches Shopify’s core commerce infrastructure still demands deep human oversight. “In general, I tell my team that they need to understand things two or three layers below the layer they’re working at,” he says. Farhan compares this mindset to the way elite Formula One drivers approach their craft. The best drivers don’t just know how to drive the car—they understand the engine, the braking systems, and the materials the vehicle is made of. That depth of understanding is what allows them to react when something goes wrong.
The same principle applies to engineering in an AI-native world. “You shouldn’t abdicate the thinking,” Farhan says. “You should abdicate the toil.” AI can help interrogate APIs, test edge cases, and accelerate experimentation. But engineers still need to understand the systems they’re building.
“If you’re trying to connect to an API, have the AI help you learn,” recommends Farhan. “Have it interrogate the API for you, or have it test out the boundary conditions of the API. But do not abdicate the thinking and say, ‘Hey, go build this for me, and then I'll come back after lunch.’”
Farhan stresses that engineers must use AI in a way that complements their learning, instead of delegating the learning. “If your use of AI takes that away from your mental capacity, I think you will lose over the long term,” he states.
Part 5: Agentic workflows
12. Prepare for agentic development
Looking ahead, Farhan believes the next major shift in software development will be the rise of agentic workflows—systems where multiple AI agents collaborate with engineers to write, test, and refine code. In this model, developers spend less time writing individual lines of code and more time directing and evaluating AI systems.
“The move in 2026 is agentic harnesses,” he says. In practice, that means delegating more of the repetitive work of coding to AI while engineers focus on higher-level decisions. The guiding question: “How do I get more of the AI to focus on the toilsome parts of writing code so that I can focus on the strategic parts?”
One emerging pattern involves running multiple agents in parallel. Some of Shopify’s senior engineers now launch several AI agents simultaneously to work on different parts of a codebase. The engineer then reviews the outputs, discards what doesn’t work, and merges the pieces that do—dramatically increasing the pace of development.
Another pattern focuses on deeper reasoning rather than parallelism. Instead of spawning many agents, an engineer might run a single model through extended critique loops, where the AI generates an answer, evaluates it, revises it, and continues refining the work over long reasoning cycles. Both approaches reflect a broader shift in how engineers interact with software systems. Rather than writing every line of code themselves, developers increasingly act as orchestrators, guiding AI systems and evaluating their output.
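Both patterns can be sketched with stub functions standing in for real coding agents; the names and stubs below are illustrative, not Shopify’s tooling. The parallel harness fans one task out and keeps only the drafts that pass review, while the critique loop refines a single draft over repeated rounds.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_harness(task, agents, review):
    """Fan one task out to several agents, then keep only the outputs
    that pass review -- the engineer's discard-and-merge step."""
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        drafts = list(pool.map(lambda agent: agent(task), agents))
    return [draft for draft in drafts if review(draft)]

def critique_loop(task, generate, critique, rounds=3):
    """Sequential pattern: generate a draft, critique it, revise,
    and stop once the critique comes back clean."""
    draft = generate(task, feedback=None)
    for _ in range(rounds):
        feedback = critique(draft)
        if not feedback:
            break
        draft = generate(task, feedback=feedback)
    return draft

# Stub agents standing in for real coding agents behind an LLM gateway.
agents = [
    lambda t: f"{t}: patch-a",
    lambda t: f"{t}: fail",
    lambda t: f"{t}: patch-b",
]
merged = parallel_harness(
    "fix checkout bug", agents, review=lambda d: "fail" not in d
)
```

In a real harness, `review` is the human-in-the-loop step Shopify still requires, and `critique` could itself be a second model interrogating the first — the multi-model pattern Farhan describes.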
“If you don’t figure out how to harness the agents in 2026, you’ll be behind,” Farhan warns. Shopify is already investing in the infrastructure required to support this model—building systems that allow AI agents to operate safely inside large codebases while keeping engineers in control of the final decisions.
The future of AI-first engineering
Shopify’s experience suggests that building an AI-first engineering organization is less about adopting a single breakthrough tool and more about designing the right operating system around it. Infrastructure must make experimentation cheap and safe. Culture must encourage engineers to reach for AI by default. And guardrails must ensure that teams move faster without sacrificing quality or understanding of the systems they build.
As AI capabilities continue to improve, the role of engineers may increasingly shift from writing every line of code to orchestrating intelligent systems that do. The companies that learn how to harness that leverage—while preserving deep technical comprehension—will define the next era of software development. But knowing the principles is different from making the actual decisions your team faces. The key is starting with the right approach for your current scale and stage, while planning for evolution as you grow.



