Dwarkesh Podcast

How GPT-5, Claude, and Gemini are actually trained and served — Reiner Pope

2h 13m · Transcribed via AssemblyAI

Blackboard lecture from Reiner Pope (CEO of MatX, ex-Google TPU architecture), the technical *underpinning* for the claims in the 20VC + All-In episodes. Roofline analysis (memory bandwidth vs compute) explains:

- why the optimal inference batch is ~300 × sparsity (~2-3k tokens), sketched in the code below
- why coding-agent 'fast modes' charge 6x for 2.5x speed
- why Gemini's pricing jumps 50% at 200k context
- why output tokens cost 5x input tokens
- why frontier models effectively run inside a single rack
- why scale-up domain size (NVL72 → Rubin 500+) is the actual GPU-generation unlock, not raw FLOPS
- why **memory bandwidth, not compute, not even just power, is the deepest bottleneck**

Per Dylan Patel, cited mid-episode: ~50% of 2026 hyperscaler CapEx is going to memory. Models are ~100x overtrained vs Chinchilla because inference token volume across a model's ~2-month frontier lifetime exceeds its training tokens.
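A minimal sketch of that roofline arithmetic, assuming H100-class hardware (~989 TFLOP/s dense BF16, ~3.35 TB/s HBM), BF16 weights, and an MoE sparsity of 8; none of these constants are quoted in the episode.

```python
PEAK_FLOPS = 989e12       # dense BF16 throughput, FLOP/s (assumed, H100-class)
MEM_BW = 3.35e12          # HBM bandwidth, bytes/s (assumed)
BYTES_PER_PARAM = 2       # BF16 weights

# Decode does ~2 FLOPs per parameter per token; a batch of B tokens
# shares one streaming read of the weights, so arithmetic intensity
# (FLOPs per byte read from HBM) grows linearly with B.
def decode_intensity(batch: int) -> float:
    return 2 * batch / BYTES_PER_PARAM

# Decode becomes compute-bound once intensity reaches the hardware
# ops:byte ratio; below that the GPU idles, waiting on memory.
hw_ratio = PEAK_FLOPS / MEM_BW                          # ~295 FLOPs/byte
balanced_dense_batch = hw_ratio * BYTES_PER_PARAM / 2   # ~295 tokens

# An MoE model with sparsity s activates only 1/s of its weights per
# token, so it needs an s-times larger batch to amortize each expert's
# weight read: hence "~300 x sparsity".
sparsity = 8                                            # assumed
balanced_moe_batch = balanced_dense_batch * sparsity    # ~2.4k tokens

print(f"hardware ratio:      ~{hw_ratio:.0f} FLOPs/byte")
print(f"dense batch:         ~{balanced_dense_batch:.0f} tokens")
print(f"MoE batch (s={sparsity}):     ~{balanced_moe_batch:.0f} tokens")
```

Under these assumptions the balance point lands near ~295 tokens for a dense model and ~2.4k for the MoE, matching the ~300 × sparsity rule of thumb.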


Notable quotes

If you do not batch many users together, the cost can be a thousand times worse. Batch size is the single biggest lever in inference economics.

Reiner Pope · 4:00
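The thousand-fold claim is the memory-bound decode regime in miniature. A sketch, assuming a 70B-parameter dense model in BF16 and ~3.35 TB/s of HBM bandwidth (both illustrative assumptions):

```python
WEIGHT_BYTES = 70e9 * 2   # 70B params x 2 bytes (BF16), assumed model
MEM_BW = 3.35e12          # bytes/s, assumed hardware

# In the memory-bound regime one decode step streams every weight from
# HBM exactly once, however many sequences are batched: the step time
# is fixed, and each sequence pays a 1/batch share of it.
def seconds_per_token(batch: int) -> float:
    step_time = WEIGHT_BYTES / MEM_BW   # ~42 ms per decode step
    return step_time / batch

solo, batched = seconds_per_token(1), seconds_per_token(1000)
print(f"batch=1:    {solo * 1e3:.1f} ms/token")
print(f"batch=1000: {batched * 1e3:.3f} ms/token ({solo / batched:.0f}x cheaper)")
```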

There are no dark GPUs — but there is a memory wall. Hyperscalers are spending half their CapEx on memory this year.

Reiner Pope (channeling Dylan Patel) · 1:13:20

API pricing actually leaks information about the architecture. The 50% jump at 200k context tells you exactly where memory time crosses compute time.

Reiner Pope · 1:53:20
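One hedged way to make that inference concrete: on a roofline model, per-token decode time is the larger of a fixed compute term and a KV-cache-read term that grows with context, so a published 200k breakpoint pins down their crossing. The active-parameter count and hardware figures below are assumptions, not numbers from the episode.

```python
PEAK_FLOPS = 989e12   # FLOP/s, assumed hardware
MEM_BW = 3.35e12      # bytes/s, assumed hardware
P_ACTIVE = 100e9      # active parameters per token, assumed

# Per-token compute time is fixed; per-token KV-read time grows
# linearly with context. Pricing should jump where the max() flips.
compute_time = 2 * P_ACTIVE / PEAK_FLOPS            # ~200 us/token

def kv_read_time(context: int, kv_bytes_per_token: float) -> float:
    return context * kv_bytes_per_token / MEM_BW

# Invert the public breakpoint: what per-token KV footprint makes the
# two terms cross at exactly 200k context?
BREAKPOINT = 200_000
implied_kv = compute_time * MEM_BW / BREAKPOINT     # ~3.4 kB/token
print(f"compute time: {compute_time * 1e6:.0f} us/token")
print(f"implied KV:   {implied_kv / 1e3:.1f} kB/token at {BREAKPOINT:,} ctx")
assert abs(kv_read_time(BREAKPOINT, implied_kv) - compute_time) < 1e-9
```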

Each model should generate the sum of human knowledge on its output — because cost-equilibrium says inference tokens equal pretrain tokens.

Reiner Pope · 1:30:00
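Back-of-envelope arithmetic for that equilibrium; the parameter count and serving volume are illustrative assumptions chosen to land near the episode's ~100x overtraining figure.

```python
params = 10e9                     # assumed small, heavily-served model
chinchilla_tokens = 20 * params   # Chinchilla rule of thumb: ~20 tok/param

served_per_day = 333e9            # fleet-wide output tokens/day, assumed
lifetime_days = 60                # ~2 months at the frontier
inference_tokens = served_per_day * lifetime_days   # ~20T tokens

# Cost equilibrium: spend as much compute making the model cheaper to
# run as you will spend running it, i.e. train on roughly as many
# tokens as you expect to serve over its lifetime.
train_tokens = inference_tokens
print(f"Chinchilla-optimal: {chinchilla_tokens / 1e12:.1f}T tokens")
print(f"equilibrium:        {train_tokens / 1e12:.0f}T tokens "
      f"({train_tokens / chinchilla_tokens:.0f}x overtrained)")
```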

The reason scale-up size matters isn't memory capacity — it's bandwidth. The bandwidth lets you do longer context, which is what makes models agentic.

Reiner Pope · 1:21:40
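A sketch of the bandwidth-to-context link, assuming the KV cache is sharded across the scale-up domain and that the per-token KV footprint, batch size, and latency budget take the illustrative values below.

```python
HBM_BW_PER_GPU = 3.35e12      # bytes/s per GPU, assumed
KV_BYTES_PER_TOKEN = 100e3    # KV footprint per context token, assumed
BATCH = 256                   # sequences decoded together, assumed
LATENCY_BUDGET = 0.020        # 20 ms/step, i.e. ~50 tokens/s per user

# Sharding the model and KV caches across the scale-up domain
# aggregates HBM bandwidth. Each decode step must stream every
# sequence's KV cache (weight reads, which don't grow with context,
# are ignored here), so the context that fits in the latency budget
# scales linearly with domain size.
def max_context(domain_size: int) -> float:
    aggregate_bw = domain_size * HBM_BW_PER_GPU
    return LATENCY_BUDGET * aggregate_bw / (BATCH * KV_BYTES_PER_TOKEN)

for n in (8, 72, 576):        # one node, NVL72, a Rubin-scale domain
    print(f"{n:3d}-GPU domain -> ~{max_context(n) / 1e3:.0f}k-token context")
```

Under these assumptions an NVL72-sized domain lands near ~190k tokens at interactive latency while a ~576-GPU domain pushes past a million, which is the "GPU-generation unlock" framing above.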

I don't see a good path to solving the memory wall. The empirical result is context lengths haven't moved in two years.

Reiner Pope · 2:08:20
