TuneInTalks
From Lenny's Podcast: Product | Career | Growth

AI Engineering 101 with Chip Huyen (Nvidia, Stanford, Netflix)

1:22:35
October 23, 2025
Lenny's Podcast: Product | Career | Growth
https://api.substack.com/feed/podcast/10845.rss

What if most AI efforts fail because teams chase shiny tech instead of users?

That question sat with me after listening to Chip Huyen, a rare practitioner who has built models at Nvidia, taught at Stanford, and written the AI Engineering book many teams swear by. Her account is equal parts technical primer and hard-won workplace wisdom. I left feeling both relieved and warned — relieved because the path forward is practical, warned because too many companies are still fiddling with the knobs that don’t move the needle.

Start with people, not model wars

Here's what stood out: teams obsess about which model, which vector store, which agent framework — and often miss the basics of product craft. Chip made me squint at familiar patterns. She kept returning to the same prescription: talk to users, prepare better data, design end-to-end workflows, and write clearer prompts. Those are boring, stubbornly human tasks. Yet they are far more likely to change outcomes than subscribing to every new frontier model.

Pre-training, fine-tuning and why "post-training" matters now

Chip gives a clean map of model stages. Pre-training encodes statistical patterns of language. Fine-tuning nudges those weights for specific tasks. But the real action lately lives in post-training: reinforcement learning, RLHF, reward models and careful evaluation. That struck me because it shifts the center of gravity away from raw model size and toward how companies curate data and shape behavior.

RLHF, reward models and the human-in-the-loop economy

I was surprised by how central human comparison judgments remain. Instead of absolute scores, ML teams often train reward models on pairwise preferences — humans pick A over B — and then refine behavior with reinforcement learning. That human labor has become a scarce, oddly lopsided market: many labeling startups but a small number of buyers. Chip admitted she’s uneasy about that economics — and so should we be.
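The pairwise setup Chip describes is usually formalized with a Bradley-Terry model: a reward model assigns each response a scalar score, and the probability that a labeler prefers A over B depends on the score gap. A minimal sketch in plain Python — the scalar rewards here are hypothetical inputs, standing in for what a real neural reward model would produce:

```python
import math

def preference_probability(reward_a: float, reward_b: float) -> float:
    """Bradley-Terry model: probability a labeler prefers response A
    over response B, given a scalar reward score for each."""
    return 1.0 / (1.0 + math.exp(-(reward_a - reward_b)))

def pairwise_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-likelihood of the human's actual choice -- the loss
    a reward model minimizes over many labeled comparison pairs."""
    return -math.log(preference_probability(reward_chosen, reward_rejected))

# A pair the model already ranks correctly (chosen > rejected) incurs a
# small loss; a misranked pair incurs a large one, pushing the scores apart.
```

Training on thousands of such pairs is what turns scattered human judgments into a reusable scoring function for reinforcement learning.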

RAG isn’t magic — it’s context engineering

Retrieval Augmented Generation (RAG) sounds like a silver bullet until you realize that the quality of answers depends more on how you prepare the retrieval corpus than which vector database you use. Chunk size, contextual metadata, hypothetical Q&A rewrites, and summaries for AI consumption made me rethink indexing as an act of editing. The biggest gains Chip has seen come from better data preparation, not database brand debates.
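Indexing-as-editing can be made concrete. Below is a dependency-free sketch of chunking with contextual metadata; the function name, chunk sizes, and the `[Source: ...]` prefix are illustrative choices for this post, not any particular vector database's API:

```python
def chunk_document(text: str, title: str,
                   chunk_size: int = 200, overlap: int = 50) -> list[dict]:
    """Split text into overlapping character chunks, each carrying
    contextual metadata so the retriever and the model see more than
    a bare, disembodied snippet."""
    chunks = []
    step = chunk_size - overlap
    for i, start in enumerate(range(0, len(text), step)):
        body = text[start:start + chunk_size]
        if not body.strip():
            continue
        chunks.append({
            "id": f"{title}-{i}",
            # Prepending source context is one of the cheap editing moves
            # that tends to beat swapping vector databases.
            "text": f"[Source: {title}] {body}",
            "metadata": {"title": title, "char_start": start},
        })
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap keeps sentences from being severed at chunk boundaries; the metadata lets you trace a bad answer back to the exact slice of the corpus that produced it.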

Evals aren’t just tests — they’re product instruments

Chip framed evals as creative design work. They tell you where the product performs well and where it fails for specific user segments. Her pragmatic take landed: you don’t need a hundred evals; you need targeted ones that guide product decisions. For high-risk, high-scale features, be rigorous. For early-stage experiments, be pragmatic — aim for “good enough” and iterate.
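A targeted eval suite doesn't need heavy tooling to guide product decisions. Here's a minimal sketch; `toy_model`, the segments, and the checks are hypothetical stand-ins for a real model call and real user journeys:

```python
def run_evals(model, cases):
    """Run targeted eval cases against a model function and tally
    pass/fail per user segment, so failures map back to product decisions."""
    results = {}
    for case in cases:
        passed = case["check"](model(case["input"]))
        stats = results.setdefault(case["segment"], {"passed": 0, "failed": 0})
        stats["passed" if passed else "failed"] += 1
    return results

# Hypothetical toy model: refuses anything mentioning refunds.
def toy_model(prompt: str) -> str:
    return "I cannot help with that." if "refund" in prompt else f"Answer: {prompt}"

cases = [
    {"segment": "billing", "input": "how do I get a refund",
     "check": lambda out: "refund" in out.lower()},
    {"segment": "onboarding", "input": "how do I sign up",
     "check": lambda out: out.startswith("Answer")},
]
```

Even a toy harness like this surfaces the point: the billing segment fails while onboarding passes, which is exactly the segment-level signal a product team can act on.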

Measure productivity the old-fashioned way — with experiments

This was a personal favorite. Teams keep reporting fuzzy productivity gains from AI tools. Chip offered a practical remedy: randomized trials. One friend’s A/B test with a coding assistant showed senior engineers sometimes benefit most — counter to the narrative that juniors will be replaced overnight. That trial mindset is the cure for wishful measurement.
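The trial mindset is cheap to apply. A dependency-free sketch of a permutation test on two groups' productivity numbers — the tasks-per-week figures below are invented for illustration, not data from the episode:

```python
import random

def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in group means:
    how often does random reshuffling of the pooled data produce a gap
    at least as large as the one we observed?"""
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a
                   - sum(pooled[n_a:]) / (len(pooled) - n_a))
        if diff >= observed:
            extreme += 1
    return extreme / n_permutations  # approximate p-value

# Hypothetical tasks-per-week for engineers with and without the assistant.
with_tool = [14, 16, 15, 17, 18, 16]
without_tool = [11, 12, 10, 13, 12, 11]
```

A small p-value says the productivity gap is unlikely to be noise; a large one says the anecdote hasn't earned a headline yet.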

Systems thinking beats syntax

Chip kept returning to a theme I want to elevate: system thinking. As tooling automates many discrete tasks, the remaining valuable skill is designing how pieces fit together. Debugging a broken deployment revealed this for me — the problem wasn’t the generated code, it was a tier mismatch in the hosting plan. Knowing how systems interact is the future of engineering education.

Multimodal and voice: hard edges ahead

Text is the easy part. Audio and video introduce latency, interruption logic, naturalness, and regulation questions. Making a voice assistant feel "human" without tricking users is an engineering and ethical tightrope. I found Chip’s emphasis on latency and conversational interruption delightful and sobering — voice will be a major battleground of product design.

Organizational shifts and the micro-tool revolution

Expect teams to reorganize. Chip sees a blurring of product, engineering, and data roles so someone owns the end-to-end user metric. At the same time, she celebrated micro tools — tiny automations that fix daily frustrations. Her advice here felt immediate: look for small, miserably annoying problems you personally encounter — they’re the best seed for an AI micro-tool.

Final thought

Honestly, my biggest reaction was relief. The future of useful AI is not just bigger models or flashier agents. It’s mundane, craft-driven work: better data, sharper evals, clearer product metrics, and system thinking. That’s less glamorous — and a lot more human — but it’s where lasting value lives.

Insights

  • Prioritize user interviews and end-to-end workflows before debating model choice.
  • Design 3–10 focused evals for the product’s core user journey to surface failures.
  • Invest in data preparation for RAG: optimal chunk size, metadata, and QA rewriting.
  • Run randomized A/B tests to measure AI tool productivity rather than rely on anecdotes.
  • Restructure teams so product, engineering, and data jointly own reliability metrics.
  • Teach system thinking over only syntax; it prepares engineers to use AI effectively.
  • Treat RLHF as an investment in behavior shaping — hire domain experts for reward labels.

More from Lenny's Podcast: Product | Career | Growth

Lenny's Podcast: Product | Career | Growth
Inside the expert network training every frontier AI model | Garrett Lord (Handshake CEO)
How Handshake turned a decade-old student network into a $50M AI training-data powerhouse.
Lenny's Podcast: Product | Career | Growth
How Intercom rose from the ashes by betting everything on AI | Eoghan McCabe (founder and CEO)
How Intercom turned a six-week GPT prototype into a $100M AI agent business.
Lenny's Podcast: Product | Career | Growth
Why ChatGPT will be the next big growth channel (and how to capitalize on it) | Brian Balfour (Reforge)
ChatGPT could become the next dominant distribution platform—are you ready to place your bet?
Lenny's Podcast: Product | Career | Growth
The one question that saves product careers | Matt LeMay
Learn three practical steps product teams use to link work directly to business results.
