Baseten CEO Tuhin Srivastava on Custom Models and Building the Inference Cloud
Tuhin Srivastava (Baseten CEO), speaking from inside the inference cloud just as the rest of the industry's narrative turns toward it. **30x revenue growth in 12 months, on track for >$1B in 2026.** 95% of tokens served are *custom* models (post-trained variants of open-source). Operates at mid-90s utilisation across **90 clusters in 18 clouds** and runs a daily 4pm capacity-allocation meeting. **GB200 access now requires 3-5 year contracts with 20-30% TCV prepay**, materially changing the IPO/financing calculus for inference companies. The H100 is still in demand 4.5 years post-launch, with its price still rising. Frontier open-source is now overwhelmingly Chinese (DeepSeek, Moonshot/Kimi, Canopy, Orpheus); 'effectively the Chinese government is subsidising US enterprise.' 400% NDR; none of the top-30 customers has ever churned. Confirms Reiner Pope's framing that disentangling pre-fill and decode is 'the next set of primitives.' On Jevons: 'inference is the last market — even if there's AGI, all that's left is inference.'
Key points
- **30x revenue growth in 12 months, >$1B run-rate trajectory in 2026, 400% NDR, zero churn among top-30 customers.** The inference cloud category is real, and Baseten is a clean read on its growth rate. The AI 'long tail' (customers bringing intelligence in-house) plus post-training is now mainstream enough to be the default pattern.
- **95% of tokens served are custom (post-trained) models.** Almost no one is running vanilla open-source weights at scale. Baseten acquired the Parsed research team to support post-training because 'inference and post-training are two sides of the same problem' — inference begets evals, evals beget reward signals, reward signals beget more post-training, more post-training begets more inference.
- **Open-source frontier is now Chinese.** Customer-favoured open models cited by name: DeepSeek, Moonshot/Kimi, Canopy, Orpheus (text-to-speech). 'It would be a fundamental problem if America never came up with good open-source models.' Sarah Wang's framing accepted: 'effectively the Chinese government is subsidising US enterprise' via these freely-available models. **Direct point of friction with Sacks's 'sovereign-AI' framing on All-In.**
- **Capacity is structurally constrained — and the way it's constrained has changed.** Baseten operates at mid-90s utilisation across **90 clusters in 18 clouds**. Daily 4pm standing capacity-allocation meeting. New GB200 capacity now requires **3-5 year contracts with 20-30% TCV prepay**. 'What becomes important when acquiring capacity is having low cost of capital' — direct push toward earlier IPO for inference plays.
- **H100 is still appreciating in the secondary market 4.5 years post-launch** despite Blackwell + Rubin coming. Useful life now estimated at 9 years. Direct cross-reference to PTJ's leverage / capacity-allocation thesis — these are very long-duration capital commitments.
- **'Probably 12 good clouds, 3-4 in the gold tier.'** A lot of new GPU suppliers are 'grifty' — haven't run data centres before, don't understand SLAs especially for inference. Even when capacity is nominally available, operational diligence kills it. Multi-cloud inference fabric (Baseten's tech) becomes the only way to avoid being held hostage by individual provider failures.
- **Disentangling pre-fill and decode.** Direct echo of Reiner Pope's Dwarkesh episode: pre-fill is compute-bound, decode is memory-bandwidth-bound, and the next set of inference primitives treats them as separate problems. KV-cache-aware routing, speculation techniques, dedicated decode chips — all on Baseten's roadmap.
- **'GPUs as a service is not sticky. Inference + software layer is incredibly sticky.'** None of the top-30 customers have ever churned. The strategic lesson for the labs: in a compute-constrained world, the labs are vertically integrating (owning the inference cloud). 'In a world of constrained compute, the number one thing to own is compute.'
- **Customer pattern: capability first, cost second.** Customers come in for the highest-quality model and then optimise. 'No GPUs pre product-market-fit; no post-training pre product-market-fit.' Once an application has shown user-signal value, post-train a specialised model that's better-faster-cheaper for that specific job (e.g. customer support model that doesn't need to be good at coding).
- **The company ran lean-org until 12-18 months ago** — Sarah Wang told Tuhin he 'just needed leaders.' Hero culture is explicitly banned. The explicit hiring rubric: first-principles, kind, low-ego, can operate without a manager. **Fourth lean-ops case in this issue (with AppLovin, the 20VC framing, Kalshi).** But unlike those, Baseten has accepted that infrastructure scale eventually requires a leadership layer.
- **Pager culture is the operations DNA.** Co-founder Amir's 7-year-old asks 'is that a P0?' when his pager goes off. Senior AWS execs' pagers all went off during a 45-min meeting — 'it's a cultural thing, you just have to get used to it.' Self-selecting filter for who can build infrastructure companies.
- **Jevons Paradox confirmed in customer behaviour.** 'When inference cost drops, agents just run longer or do more work to get to a larger end.' Demand for compute scales on the inference side too. Tuhin: **'inference is the last market — even if there's AGI, all that's left is inference.'** This is the operator-side mirror of Reiner Pope's overtraining-vs-Chinchilla math from the Dwarkesh episode.
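The pre-fill/decode point falls out of simple roofline arithmetic. A minimal sketch (illustrative figures and a hypothetical 70B dense model, not Baseten's internals): pre-fill pushes thousands of tokens through the weights per pass, decode pushes one, so their FLOPs-per-byte land on opposite sides of the hardware's ridge point.

```python
def arithmetic_intensity(tokens_per_pass: int, params: float) -> float:
    """FLOPs per byte for one forward pass over a dense transformer.

    Rough model: ~2 * params FLOPs per token, and every pass must
    stream all weights (2 bytes each in fp16) from HBM at least once.
    KV-cache traffic is ignored for simplicity.
    """
    flops = 2 * params * tokens_per_pass
    bytes_moved = 2 * params  # weight reads dominate in this rough model
    return flops / bytes_moved

PARAMS = 70e9  # assumed 70B-parameter dense model

# Pre-fill: a 4096-token prompt processed in one batched pass.
prefill_ai = arithmetic_intensity(tokens_per_pass=4096, params=PARAMS)

# Decode: one new token per pass (single sequence, no batching).
decode_ai = arithmetic_intensity(tokens_per_pass=1, params=PARAMS)

# H100 SXM: ~989 fp16 TFLOP/s vs ~3.35 TB/s HBM3, a ridge point of
# ~295 FLOPs/byte. Below the ridge, the pass is bandwidth-bound.
RIDGE = 989e12 / 3.35e12

print(f"prefill intensity ~ {prefill_ai:.0f} FLOPs/byte (ridge ~ {RIDGE:.0f})")
print(f"decode  intensity ~ {decode_ai:.0f} FLOPs/byte")
```

Under this toy model the intensity equals the tokens per pass: ~4096 FLOPs/byte for pre-fill (well above the ridge, compute-bound) versus ~1 for unbatched decode (far below it, memory-bandwidth-bound), which is why KV-cache-aware routing, speculation, and dedicated decode hardware treat the two phases as separate problems.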
Notable quotes
30x growth in the last 12 months. None of our top 30 customers have ever churned. We're talking 400% net dollar retention.
GPUs as a service is not sticky. Inference with the software layer included is incredibly sticky. In a world of constrained compute, the number one thing to own is compute.
Effectively the Chinese government is subsidising US enterprise via these open-source models. If we don't have access to that intelligence, we won't be able to innovate as fast.
If you want a B200 right now from a good cloud, you're not getting that less than a three-to-five-year contract with a 20-30% TCV prepay. Cost of capital is everything.
Inference is the last market. Even if there's AGI, all that's left is inference.
Capacity. That's what keeps me up at night. There's no world in which there's enough compute to get the value we want out of LLMs in the next five to ten years.
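The contract terms quoted above imply serious upfront capital, which is the "cost of capital" point. A hedged back-of-envelope (all figures hypothetical, not from the episode):

```python
def prepay_due(gpus: int, price_per_gpu_hour: float, years: float,
               prepay_frac: float) -> tuple[float, float]:
    """Total contract value and upfront prepay for a reserved GPU block."""
    hours = years * 365 * 24
    tcv = gpus * price_per_gpu_hour * hours
    return tcv, tcv * prepay_frac

# Hypothetical reservation: 1,024 GPUs at $5/GPU-hour, 4-year term, 25% prepay.
tcv, upfront = prepay_due(gpus=1024, price_per_gpu_hour=5.0,
                          years=4, prepay_frac=0.25)
print(f"TCV ~ ${tcv/1e6:.0f}M, prepay due up front ~ ${upfront/1e6:.0f}M")
# -> TCV ~ $179M, prepay due up front ~ $45M
```

Even a modest 1,024-GPU block at these assumed rates means tens of millions wired before the first token is served, which is why low cost of capital, and hence an earlier IPO, becomes the binding constraint on capacity acquisition.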
Themes
- Inference cloud category growth at 30x with structural compute constraint
- 95% of served tokens are custom post-trained models
- Cost of capital is now the binding constraint on capacity acquisition
- Frontier open-source has shifted to Chinese labs
- Inference + software stickiness vs commoditised GPU-as-a-service
Mentioned
Ideas
- Baseten 30x growth in 12 months
- >$1B 2026 revenue trajectory
- 95% custom-model token share
- Inference + post-training are two sides of same problem
- Frontier open-source is Chinese-led
- Chinese subsidy of US enterprise via open-source models
- GB200 3-5 year contract + 20-30% TCV prepay
- H100 price still rising 4.5 years post-launch
- 9-year GPU useful life
- 12 good clouds, 3-4 in gold tier
- Multi-cloud inference fabric (90 clusters / 18 clouds)
- Pre-fill / decode disentanglement as next primitive
- Inference + software = stickiness; GPU-only = commodity
- Capability-first then cost optimisation customer pattern
- Hero-culture-banned lean-ops hiring rubric
- Pager-culture as infrastructure DNA
- Jevons paradox in inference (longer agent runs)
- 'Inference is the last market — even if AGI'