Altimeter Capital

Inference Costs Dropped 99% in 2.5 Years

3m · Transcribed via AssemblyAI · Watch on YouTube

A 3-minute clip from a longer Altimeter conversation: inference costs are down ~90% in 12 months and ~99% in 2.5 years. The drivers are the supply chain (TSMC, packaging, lithography), engineering innovation (chip size, quantisation, NVFP4), and power. The complication: model size (1T → 10T params) and demand are growing faster than cost per token is falling, so H100 prices are rising even as the cost per unit of intelligence falls.
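To put the two headline figures on the same footing, here is a minimal sketch that converts a cumulative percentage drop into a "times cheaper" multiple and into an equivalent constant yearly decline. The function names are illustrative, not from the clip, and the inputs are the clip's rounded "~90%" and "~99%" figures, so the outputs are approximate.

```python
def cost_multiple(percent_drop: float) -> float:
    """Convert a cumulative percentage drop into a 'times cheaper' multiple."""
    return 1.0 / (1.0 - percent_drop / 100.0)


def annualized_drop(percent_drop: float, years: float) -> float:
    """Constant yearly percentage drop that compounds to percent_drop over `years`."""
    remaining = 1.0 - percent_drop / 100.0          # fraction of the original cost left
    yearly_remaining = remaining ** (1.0 / years)   # geometric mean per year
    return (1.0 - yearly_remaining) * 100.0


# Figures quoted in the clip (rounded, so treat results as approximate).
print(f"90% drop  -> {cost_multiple(90):.0f}x cheaper")
print(f"99% drop  -> {cost_multiple(99):.0f}x cheaper")
print(f"99% over 2.5 years -> {annualized_drop(99, 2.5):.1f}% per year on average")
```

Under these assumptions a 99% drop over 2.5 years works out to roughly 84% per year on average, so the ~90% drop quoted for the last 12 months would be somewhat faster than that average.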

Notable quotes

If we look at the cost of inference, it's dropped by basically 90% over the course of the last year. It's dropped by closer to 99% over the course of the last two, two and a half years.

Host · 0:00

Even if we get a 50x cost reduction over five years, the models and the demand are growing faster. That's why H100 prices are going up.

Speaker (chip-industry executive) · 3:20
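The argument in this quote is that total spend depends on both cost per token and the volume demanded. A rough sketch of that arithmetic follows. The 50x-over-five-years figure comes from the quote; the demand-growth values are hypothetical placeholders chosen only for illustration, since the clip gives no number for demand.

```python
def annual_factor(total_factor: float, years: float) -> float:
    """Constant yearly multiple that compounds to `total_factor` over `years`."""
    return total_factor ** (1.0 / years)


def spend_ratio(cost_reduction: float, demand_growth: float) -> float:
    """Total compute spend after vs. before: demand growth divided by cost reduction."""
    return demand_growth / cost_reduction


# From the quote: a 50x cost reduction over five years.
print(f"50x over 5 years is about {annual_factor(50, 5):.2f}x cheaper per year")

# Hypothetical demand-growth scenarios (assumptions, not figures from the clip).
for demand_growth in (10, 50, 100):
    ratio = spend_ratio(cost_reduction=50, demand_growth=demand_growth)
    print(f"demand x{demand_growth}: total spend x{ratio:.1f}")
```

In this sketch, total spend only rises when demand grows by more than the 50x cost reduction, which is the condition the speaker says currently holds.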
