
Cerebras turns fast inference into a capital allocation story
Andrew Feldman's central claim is that AI demand has shifted from "can the model work?" to "can the model respond fast enough to be useful?" When people use AI all day, waiting for answers becomes expensive and frustrating, so customers may pay for speed. Cerebras, which built a very large chip designed for AI, is betting that it can make models answer much faster.

That makes Cerebras more than a chip story. It is a bet that inference speed, cloud access, manufacturing scale, and public-market credibility will become scarce allocation points. It is also a useful counterweight to NVIDIA, because it tests whether the allocation economy can route around the dominant GPU path.

Feldman's story says inference demand is pulling capital toward alternative architectures, but the bottlenecks are still physical and operational: manufacturing, power, buildings, testing, networking, compilers, capital, and commitments from large customers. The allocation question is therefore who controls fast output tokens, not only who trains the next model.


