As inference splits into prefill and decode, Nvidia's Groq deal could enable a "Rubin SRAM" variant optimized for ultra-low latency agentic reasoning workloads (Gavin Baker/@gavinsbaker)

Gavin Baker / @gavinsbaker: As inference splits into prefill and decode, Nvidia's Groq deal could enable a "Rubin SRAM" variant optimized for ultra-low latency agentic reasoning workloads — Nvidia is buying Groq for two reasons imo. 1) Inference is disaggregating into prefill and decode.