TuneInTalks
From Lenny's Podcast: Product | Career | Growth

Inside the expert network training every frontier AI model | Garrett Lord (Handshake CEO)

1:09:50
August 24, 2025
Lenny's Podcast: Product | Career | Growth
https://api.substack.com/feed/podcast/10845.rss

Why Handshake's Student Network Became an AI Training-Data Powerhouse

When product teams and researchers talk about what actually moves the needle for large language models today, the answer increasingly points to high-quality human-created data. Garrett Lord, co-founder and CEO of Handshake, explains how a decade-old campus recruiting platform transformed overnight into one of the fastest-growing suppliers of post-training data for frontier AI labs. The story is as much about product-market fit as it is about recognizing an unfair advantage: access to millions of students, thousands of advanced-degree experts, and a trusted campus brand.

From Careers Platform to Human Data Marketplace

Handshake began as a social careers network for college students and early-career professionals. That long-term accumulation of profiles, academic signals, and university partnerships created a rare asset: a direct, targetable audience of 18+ million users that includes hundreds of thousands of PhDs and master’s students. With labs shifting from pre-training on ubiquitous internet text to post-training that requires specialized, verifiable, and often multimodal data, that audience became a new product.

What Post-Training Data Looks Like

Post-training work covers supervised fine-tuning, reinforcement learning with human feedback, trajectory capture, rubric-based evaluation, and multimodal labeling (audio, video, tool use). Garrett describes how experts—PhD researchers, domain specialists, and professional practitioners—are paid to discover model failure modes, provide ground-truth answers, and record step-by-step tool use. These units of data are often returned as structured JSON and packaged to be directly useful for post-training experiments and evaluation.
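To make "units of data returned as structured JSON" concrete, here is a minimal sketch of what one such record might look like. Every field name below is an illustrative assumption for this summary, not Handshake's actual schema; the episode only states that experts provide ground-truth answers, rubric scores, and tool-use trajectories packaged as JSON.

```python
import json

# Hypothetical post-training data unit. All field names are illustrative
# assumptions, not Handshake's actual schema.
unit = {
    "task_type": "supervised_fine_tuning",  # could also be an RLHF comparison or trajectory
    "domain": "organic_chemistry",          # expert subdomain the contributor was vetted for
    "prompt": "Propose a synthesis route for ...",
    "expert_response": "Step 1: ...",       # ground-truth answer written by the expert
    "rubric_scores": {"correctness": 5, "reasoning_depth": 4},  # rubric-based evaluation
    "tool_trajectory": [],                  # step-by-step tool calls, when captured
    "annotator_credential": "PhD",          # quality-control signal about the contributor
}

# Serialize the record as it might be delivered to a lab's post-training
# pipeline, then parse it back to confirm it round-trips cleanly.
payload = json.dumps(unit)
record = json.loads(payload)
print(record["domain"])
```

The point of the structure is verifiability: each unit carries enough metadata (domain, rubric scores, annotator credentials) for a lab to filter, audit, and route the data into specific fine-tuning or evaluation experiments.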

Why Experts Matter More Than Generalists

As models become more capable, the low-skill, generalist labor that once sufficed for simple labeling is less valuable. What frontier labs now need are experts who can break models in deep subdomains—mathematics, chemistry, physics, law, medicine—and produce constrained, verifiable datasets that improve reasoning, tool use, and domain-specific capabilities. Handshake's approach elevates contributors from anonymous micro-task labor to trained fellows who receive instruction, assessment, and higher compensation for hard work.

Three Priorities for Model Builders

  • Quality: every unit of data must be precise and verifiable to avoid contaminating model behavior.
  • Volume: labs need thousands to millions of units across focused hypothesis-driven experiments.
  • Speed: rapid iteration allows researchers to test hypotheses and expand only the data pipelines that show gains.

Scaling a New Business Inside an Old One

Launching a disruptive product from within a mature company requires structure and separation. Garrett outlines how Handshake spun up a distinct organization with dedicated engineering, product, design, and operations teams focused solely on the AI data business. Early hires were entrepreneurial and comfortable with ambiguity, processes were metrics-driven, and the culture emphasized extreme ownership and rapid customer feedback.

Competitive Moat: Audience Over Ads

Many competitors buy users through expensive ads and recruiter outreach. Handshake’s decade-long relationships with 1,600 universities and high brand affinity mean near-zero acquisition cost and higher conversion and retention for expert contributors. That audience access becomes the primary moat in human-sourced training data.

The Broader Impact On Careers And Research

Rather than displacing graduates, accessible AI tools combined with paid assessment work can accelerate career outcomes. Young people who are AI-native gain outsized productivity advantages, while PhD fellows earn substantial hourly rates doing specialized labeling and evaluation work that feeds back into their own research and teaching. For employers and universities, this model promises better talent matching, improved educational design, and measurable benefits to the labor market.

Types Of Data To Expect Next

Future datasets will grow beyond text: CAD files, scientific instrument telemetry, multimodal video trajectories, annotated tool interactions, and richer audio corpora. Synthetic data has a role in verifiable domains, but domain-specific human data will remain essential for many years as labs chase narrow, high-value capability gains.

Handshake’s pivot illustrates how an established network and deep domain trust can be repurposed into a high-velocity human data engine for AI labs pursuing post-training improvements. The result is faster model progress, new work pathways for experts, and a business that shows how audience access and expert quality can define competitive advantage.

Key points

  • Handshake leveraged 18 million students and alumni, including 500,000 PhDs and 3 million master’s students.
  • Handshake launched a post-training data business that reached $50M ARR within four months.
  • Post-training work focuses on supervised fine-tuning, RLHF, trajectory capture, and rubric-based evals.
  • Model builders prioritize quality, volume, and speed when buying human-created training data.
  • Experts can earn $100–$200 an hour performing high-value labeling and reasoning tasks.
  • Handshake’s competitive advantage is near-zero acquisition cost via university partnerships.
  • Successful internal spinouts require separate teams, metrics cadence, and entrepreneurial hires.
  • Human-in-the-loop labeling remains critical for domain-specific gains for at least the next decade.

Timecodes

00:00 Opening: Why human data matters right now
00:01 Guest introduction and Handshake overview
00:05 What is data labeling and post-training explained
00:09 Handshake's focus: expert network and target audiences
00:13 Concrete examples: GPQA benchmark and expert workflows
00:17 What trajectory and multimodal data mean
00:22 Quality, volume, and speed: what labs need
00:33 Origin story: discovering the AI data opportunity
00:36 Rapid growth: revenue milestones and lab partnerships
00:40 Incubating a new business inside an established company
00:45 Acquisition strategies and audience moat
00:52 Operational choices: teams, cadence, and hiring
01:00 Long-term vision: labor markets and matching
01:00 Future data types and role of synthetic data
01:03 Advice to entrepreneurs and closing thoughts
01:04 Lightning round and personal anecdotes
01:08 Closing remarks and hiring call

More from Lenny's Podcast: Product | Career | Growth

  • Why ChatGPT will be the next big growth channel (and how to capitalize on it) | Brian Balfour (Reforge): ChatGPT could become the next dominant distribution platform—are you ready to place your bet?
  • How Intercom rose from the ashes by betting everything on AI | Eoghan McCabe (founder and CEO): How Intercom turned a six-week GPT prototype into a $100M AI agent business.
  • Brian Chesky's secret mentor who died 9 times, started the Burning Man board, and built the world's first midlife wisdom school | Chip Conley (founder of MEA): How joining Airbnb in his 50s launched a movement for midlife reinvention.
  • Inside ChatGPT: The fastest growing product in history | Nick Turley (Head of ChatGPT at OpenAI): Inside the launch and future vision of GPT-5 from ChatGPT’s product lead.
