TuneInTalks
From Lenny's Podcast: Product | Career | Growth

First interview with Scale AI’s CEO: $14B Meta deal, what’s working in enterprise AI, and what frontier labs are building next | Jason Droege

1:24:01
October 9, 2025
Lenny's Podcast: Product | Career | Growth
https://api.substack.com/feed/podcast/10845.rss

Can experts teach machines better than more data ever could?

At first glance the idea sounds backwards: pour more human expertise into artificial intelligence to make it smarter. Yet Jason Droege, now running Scale AI, argues that the shortcut for models isn’t simply more text or compute. It’s guided human judgment—precise, domain-specific corrections that teach models how to behave inside real-world systems.

From labeling images to choreographing agents

Scale’s origin story reads like a footnote to modern AI history: label the data, improve the model, repeat. What struck me listening to Droege was how fast the problem changed. Tasks that once asked crowdworkers to prefer one poem over another now demand PhDs spending hours explaining a single medical nuance.

That shift matters because models are moving from knowing to doing. It’s one thing for a model to answer a trivia question. It’s another for it to navigate a Salesforce instance, book a meeting, or flag an obscure allergy that would interact dangerously with a prescription. The latter requires annotated judgment, not just raw examples.

Why experts, not armies of generalists

Hiring cheaper, generalist labor worked early. But Droege described a present where roughly 80% of his contributor network holds at least a bachelor’s degree and about 15% hold PhDs. Those people aren’t filling gaps with surface corrections. They are writing benchmarks, annotating intent, explaining trade-offs, and building evals—tests that define what “good” looks like.

I found the healthcare example particularly revealing. A specialist, faced with a 300-page dossier, used a purpose-built AI to surface top diagnostic flags—sometimes catching allergies that human teams might miss. That kind of outcome isn’t achieved by feeding everything the model can see; it’s achieved by digitizing how a human expert reasons through messy, inconsistent information.

Environments, agents, and reinforcement learning

Droege framed another useful lens: put the agent in the environment it will operate in. Training an agent to act inside a particular Salesforce configuration is different from training it to parse email. The permutations are vast, so the industry is trying to find data and training approaches that generalize across many contexts without collecting a trillion specific examples.

Call it engineering tradecraft: the value comes from exercises that are generalizable and representative, and from thoughtfully designed evals that expose where an agent should ask for human help.

What this means for enterprises

If you’re running a company, the headline is practical: expect a longer runway. Low-fidelity pilots can get to 60–70% of the way there quickly, but getting to reliable automation that a business trusts takes six to twelve months of careful engineering, labeling, and change management. That is tedious. It is also necessary.

I appreciated Droege’s honesty around timelines. He pushed back on viral narratives that AI will instantly replace complex knowledge work. The better bet is that AI amplifies human judgment—at first by doing the heavy lifting and routing edge cases to experts, and later by shifting what expertise looks like.

Product lessons that read like a founder’s playbook

The conversation detoured into product and startup craft, where Droege’s past—building Uber Eats among other ventures—provided color. He returned to a theme I’ve seen before: understand the underlying incentives of customers. Don’t just ask what they want; figure out what they truly need today and whether they have urgency to adopt a solution.

He also insisted on a simple hiring framework: curiosity, collaborative humility, and leadership potential. That felt refreshingly practical. If teams are composed as organisms—balanced to compensate for each other’s weaknesses—the odds of survival and later scale improve.

Margins, negotiating, and the subtlety of risk

Droege is a believer in using gross margin as a quick filter to test whether a business idea adds real value. He also told stories that reminded me how negotiable business terms can be—and how luck and timing change outcomes. Those anecdotes gave the interview human texture; they showed how strategy and stubbornness sometimes collide to produce a favorable deal.

My takeaways

  • Models will get smarter by learning from experts inside contexts, not just from more generic data.
  • Enterprises should budget for six- to twelve-month projects when they want trustworthy automation.
  • Hiring for curiosity and teamwork beats hunting for a perfect résumé in fast-moving fields.

Honestly, I didn’t expect the AI story to sound so operational. It’s less about magic and more about craftsmanship—observation, measurement, and iteration. There’s a grind to teaching machines what humans already know intuitively.

What if the most valuable AI work over the next three years is quietly biographical—experts translating a lifetime of decisions into the few signals a machine can act on? It’s an unglamorous thought, but one that explains why Scale’s business still matters.

I left the conversation feeling both impatient and reassured. The headline tech will keep dazzling us, but the real progress will hinge on people who are willing to do the slow, exacting work of teaching machines how to decide.

Insights

  • Prioritize building generalizable training tasks so labels transfer across multiple enterprise environments.
  • Budget six to twelve months and partner closely with experts to move from POC to production.
  • Hire for curiosity, cross-team collaboration, and leadership potential instead of perfect resumes.
  • Use gross margin as a fast filter when evaluating new business ideas and defensibility.
  • Design evals early—define what “good” looks like before automating critical decision processes.

Timecodes

00:00 Guest introduction — Jason Droege and Scale background
00:00 Meta transaction and Scale’s independence explained
00:00 Shift from generalist labeling to expert-driven tasks
00:00 Healthcare use case: diagnosing from long patient records
00:00 Product lessons and founding philosophy from Uber Eats
00:01 Lightning round and closing thoughts

More from Lenny's Podcast: Product | Career | Growth

  • Inside the expert network training every frontier AI model | Garrett Lord (Handshake CEO) — How Handshake turned a decade-old student network into a $50M AI training-data powerhouse.
  • How Intercom rose from the ashes by betting everything on AI | Eoghan McCabe (founder and CEO) — How Intercom turned a six-week GPT prototype into a $100M AI agent business.
  • Why ChatGPT will be the next big growth channel (and how to capitalize on it) | Brian Balfour (Reforge) — ChatGPT could become the next dominant distribution platform—are you ready to place your bet?
  • The one question that saves product careers | Matt LeMay — Learn three practical steps product teams use to link work directly to business results.
