First interview with Scale AI’s CEO: $14B Meta deal, what’s working in enterprise AI, and what frontier labs are building next | Jason Droege
Can experts teach machines better than more data ever could?
At first glance the idea sounds backwards: pour more human expertise into artificial intelligence to make it smarter. Yet Jason Droege, now running Scale AI, argues that the shortcut for models isn’t simply more text or compute. It’s guided human judgment—precise, domain-specific corrections that teach models how to behave inside real-world systems.
From labeling images to choreographing agents
Scale’s origin story reads like a footnote to modern AI history: label the data, improve the model, repeat. What struck me listening to Droege was how fast the problem changed. Tasks that once asked crowdworkers to prefer one poem over another now demand PhDs spending hours explaining a single medical nuance.
That shift matters because models are moving from knowing to doing. It’s one thing for a model to answer a trivia question. It’s another for it to navigate a Salesforce instance, book a meeting, or flag an obscure allergy that would interact dangerously with a prescription. The latter requires annotated judgment, not just raw examples.
Why experts, not armies of generalists
Hiring cheaper, generalist labor worked early. But Droege described a present where roughly 80% of his contributor network holds at least a bachelor’s degree and about 15% hold PhDs. Those people aren’t filling gaps with surface corrections. They are writing benchmarks, annotating intent, explaining trade-offs, and building evals—tests that define what “good” looks like.
I found the healthcare example particularly revealing. A specialist, faced with a 300-page dossier, used a purpose-built AI to surface top diagnostic flags—sometimes catching allergies that human teams might miss. That kind of outcome isn’t achieved by feeding everything the model can see; it’s achieved by digitizing how a human expert reasons through messy, inconsistent information.
Environments, agents, and reinforcement learning
Droege framed another useful lens: put the agent in the environment it will operate in. Training an agent to act inside a particular Salesforce configuration is different from training it to parse email. The permutations are vast, so the industry is trying to find data and training approaches that generalize across many contexts without collecting a trillion specific examples.
Call it engineering tradecraft: the value comes from exercises that are generalizable and representative, and from thoughtfully designed evals that expose where an agent should ask for human help.
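That last point—evals that define “good” and expose where an agent should escalate to a human—can be sketched as a small harness. Everything below is an illustrative assumption (toy agent, case format, confidence threshold), not Scale’s actual tooling:

```python
# Minimal eval-harness sketch: score agent outputs against expert-written
# expectations and route low-confidence cases to a human reviewer.
# Names and the 0.8 confidence floor are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class EvalCase:
    prompt: str          # the task given to the agent
    expected: str        # expert-annotated "good" answer
    must_escalate: bool  # should the agent ask for human help here?

def run_eval(agent, cases, confidence_floor=0.8):
    """Return pass rate on auto-handled cases plus cases needing review."""
    passed, needs_review = 0, []
    for case in cases:
        answer, confidence = agent(case.prompt)
        if confidence < confidence_floor or case.must_escalate:
            needs_review.append(case)   # route to an expert, don't auto-act
        elif answer.strip() == case.expected.strip():
            passed += 1
    auto_handled = len(cases) - len(needs_review)
    pass_rate = passed / auto_handled if auto_handled else 0.0
    return pass_rate, needs_review

# Toy agent for demonstration: returns a canned answer with a confidence.
def toy_agent(prompt):
    canned = {"2+2?": ("4", 0.99), "rare allergy?": ("unsure", 0.3)}
    return canned.get(prompt, ("", 0.0))

cases = [
    EvalCase("2+2?", "4", must_escalate=False),
    EvalCase("rare allergy?", "check with specialist", must_escalate=True),
]
rate, review = run_eval(toy_agent, cases)
print(rate, [c.prompt for c in review])  # → 1.0 ['rare allergy?']
```

The point of the sketch is the split: the harness measures accuracy only on cases the agent is allowed to handle alone, and everything uncertain or safety-critical lands in a human queue—exactly the division of labor the interview describes.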
What this means for enterprises
If you’re running a company, the headline is practical: expect a longer runway. Low-fidelity pilots can get to 60–70% of the way there quickly, but getting to reliable automation that a business trusts takes six to twelve months of careful engineering, labeling, and change management. That is tedious. It is also necessary.
I appreciated Droege’s honesty around timelines. He pushed back on viral narratives that AI will instantly replace complex knowledge work. The better bet is that AI amplifies human judgment—at first by doing the heavy lifting and routing edge cases to experts, and later by shifting what expertise looks like.
Product lessons that read like a founder’s playbook
The conversation detoured into product and startup craft, where Droege’s past—building Uber Eats among other ventures—provided color. He returned to a theme I’ve seen before: understand the underlying incentives of customers. Don’t just ask what they want; figure out what they truly need today and whether they have urgency to adopt a solution.
He also insisted on a simple hiring framework: curiosity, collaborative humility, and leadership potential. That felt refreshingly practical. If teams are composed as organisms—balanced to compensate for each other’s weaknesses—the odds of survival and later scale improve.
Margins, negotiating, and the subtlety of risk
Droege is a believer in using gross margin as a quick filter to test whether a business idea adds real value. He also told stories that reminded me how negotiable business terms can be—and how luck and timing change outcomes. Those anecdotes gave the interview human texture; they showed how strategy and stubbornness sometimes collide to produce a favorable deal.
My takeaways
- Models will get smarter by learning from experts inside contexts, not just from more generic data.
- Enterprises should budget for six- to twelve-month projects when they want trustworthy automation.
- Hiring for curiosity and teamwork beats hunting for a perfect résumé in fast-moving fields.
Honestly, I didn’t expect how operational the AI story would sound. It’s less about magic and more about craftsmanship—observation, measurement, and iteration. There’s a grind to teaching machines what humans already know intuitively.
What if the most valuable AI work over the next three years is quietly biographical—experts translating a lifetime of decisions into the few signals a machine can act on? It’s an unglamorous thought, but one that explains why Scale’s business still matters.
I left the conversation feeling both impatient and reassured. The headline tech will keep dazzling us, but the real progress will hinge on people who are willing to do the slow, exacting work of teaching machines how to decide.
Insights
- Prioritize building generalizable training tasks so labels transfer across multiple enterprise environments.
- Budget six to twelve months and partner closely with experts to move from POC to production.
- Hire for curiosity, cross-team collaboration, and leadership potential instead of perfect résumés.
- Use gross margin as a fast filter when evaluating new business ideas and defensibility.
- Design evals early—define what “good” looks like before automating critical decision processes.