Altimeter Capital
Inference Costs Dropped 99% in 2.5 Years
A 3-minute clip from a longer Altimeter conversation: inference costs are down ~90% in 12 months and ~99% in 2.5 years. The drivers are supply chain (TSMC + advanced packaging + lithography), engineering innovation (bigger chips, quantisation formats such as MVFP4), and power per token. The complication: model size (1T → 10T params) and demand are racing ahead of cost-per-token gains, so H100 prices are rising even as cost per unit of intelligence falls.
Key points
- Inference cost trajectory: down ~90% in the last 12 months and ~99% in the last 2.5 years. Net: the price of a unit of intelligence is collapsing.
- Three drivers of the cost decline: (1) supply chain — TSMC + advanced packaging + lithography; (2) engineering innovation — bigger chips, novel layouts, quantisation (MVFP4); (3) power per token.
- Lithography is hitting a limit. Moore's Law alone no longer keeps pace; the workaround is making chips physically bigger (Cerebras-style 'pizza box' wafer-scale dies) plus better packaging.
- Counter-pressure: models are scaling from 1T to 10T parameters, so the fundamental FLOP count per token is rising, and demand is rising too. Even at a 50x cost reduction over five years, 'the models and the demand are growing faster.'
- Net effect: cost per token of intelligence falls, but H100 prices rise. Aggregate compute spend goes up, not down. The same paradox Anj Midha, Jensen, and Dylan Patel have all flagged in different framings.
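The tension in the last two bullets can be sketched with back-of-the-envelope arithmetic. Only the 50x cost reduction and the 1T → 10T parameter jump come from the clip; the 20x demand figure and the linear FLOPs-per-parameter scaling are illustrative assumptions.

```python
# Back-of-the-envelope: cost per token falls, yet total spend rises.
# Only the 50x cost cut and 10x parameter growth are from the clip;
# the rest are illustrative assumptions.

cost_per_flop_drop = 50    # compute gets 50x cheaper over five years (from clip)
param_growth = 10          # 1T -> 10T parameters (from clip)
# For dense models, FLOPs per token scale roughly linearly with parameters.
flops_per_token_growth = param_growth

demand_growth = 20         # assumed 20x more tokens served (illustrative)

cost_per_token_change = flops_per_token_growth / cost_per_flop_drop
total_spend_change = cost_per_token_change * demand_growth

print(f"cost per token: {cost_per_token_change:.2f}x of today's")  # 0.20x: falls 5x
print(f"total spend:    {total_spend_change:.1f}x of today's")     # 4.0x: rises
```

Under these assumptions the per-token price still falls 5x, but aggregate spend quadruples, which is the shape of the H100 paradox described above.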
Notable quotes
If we look at the cost of inference, it's dropped by basically 90% over the course of the last year. It's dropped by closer to 99% over the course of the last two, two and a half years.
Even if we get a 50x cost reduction over five years, the models and the demand are growing faster. That's why H100 prices are going up.
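The two figures in the first quote can be cross-checked with a quick compounding calculation (a sketch assuming a constant annual decline rate, which the clip does not claim):

```python
# Sanity check: are "90% down in one year" and "99% down in 2.5 years"
# mutually consistent? Assume a constant annual rate (an assumption,
# not a claim from the clip).
annual_retention = 0.10              # 90% annual drop leaves 10% of the cost
retained = annual_retention ** 2.5   # compound over 2.5 years

print(f"cost retained after 2.5y: {retained:.4f}")  # ~0.0032
print(f"implied drop: {1 - retained:.1%}")          # ~99.7%
```

So a steady 90%-per-year decline compounds to a ~99.7% drop over 2.5 years; the quoted "closer to 99%" is, if anything, conservative.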
Themes
- Cost of inference collapsing faster than any commodity in tech history
- Demand growing even faster than cost falls