CES 2026: AI compute sees a shift from training to inference

LAS VEGAS — Not so long ago — last year, let's say — tech industry spending was all about the big AI companies pouring billions of dollars into training ever-larger frontier AI models. That's rapidly changing, which is why at this year's CES, AI inference was at the heart of the show's keynote speeches and major announcements. (Inference is when a trained AI model is put to work, applying what it has learned to new, previously unseen data.)

Until recently, most AI spending was related to training, according to Lenovo CEO Yuanqing Yang. Approximately 80% went to creating the large language models (LLMs) that underpin generative AI, he said, with the remaining 20% going to inference. That is starting to change. "In the future, those numbers are reversed," he told reporters Wednesday at CES. "Eighty percent will be on the inference and 20% will be on training. That is our forecast."

That's why Lenovo launched three new inference servers on Tuesday, he said. "We definitely want to lead the trend."

According to industry experts, that shift is already under way. In a November report, Deloitte estimated that inference workloads accounted for half of all AI compute in 2025, a share it expects to jump to two-thirds in 2026.

Actual infrastructure spending lags a bit behind, Ashley Gorakhpurwalla, Lenovo's executive vice president and president of its infrastructure solutions group, told Computerworld. "When you train foundational models, you start big, and you put all the capital in up front," he said. But when enterprises deploy AI, such as a chatbot, they start small and scale up gradually. "People deploy, iterate, and move forward," Gorakhpurwalla said. "When you first deploy a chatbot, it's a small expense."

Even so, on the spending side, 2026 will be a big inflection year for inference, according to a December report by the Futurum Group. "We're seeing a clear shift," Futurum analyst Nick Patience said in the report. "Inference workloads are set to overtake training revenue by 2026." Enterprises are moving from experimentation to deployment, boosting demand for AI inference servers, and are also expanding hybrid and edge deployments.

That's the rationale for Lenovo's decision to launch three new inferencing servers at CES this week. They include the Lenovo ThinkSystem SR675i, designed to run full-sized LLMs for applications in areas like manufacturing, healthcare and financial services; the Lenovo ThinkSystem SR650i, designed to be scalable and easy to deploy in existing data centers; and the Lenovo ThinkEdge SE455i, a compact server built for retail, telco and industrial environments.

This isn't Lenovo's first foray into inferencing, or even into small-scale inferencing servers. It released its first entry-level AI inferencing server for edge AI in March 2025. The company has also offered other servers capable of handling AI workloads that weren't specifically marketed as inferencing servers.

There are three main drivers for enterprises looking to buy and deploy their own inference servers, said Arthur Hu, senior vice president, global CIO, and chief delivery and technology officer for Lenovo's solutions and services group, a role that puts him in direct contact with enterprise customers.

First, customers are getting more strategic about how they use cloud computing, he said in an interview at CES. The public cloud is good for early experimentation or when a company needs to deploy across a large geography.
"But if you know what your workload size and predictability is, you don't need to pay the additional premium," he said.

Another adoption driver is the need to use data where it's generated. "You don't want to store all the data all the time," Hu said. With an edge AI server, data can be used immediately when needed, then discarded.

Finally, there are privacy, security and sovereignty concerns. "Everyone is very sensitive that they can control their data and govern it," he said. When a company runs its own AI inferencing, the data never needs to leave its corporate hands.

Lenovo wasn't the only company making bets on AI inferencing this week. AMD announced the AMD Instinct MI440X GPU, designed for on-premises inferencing for enterprise AI.

And while Lenovo rivals Dell and HPE didn't announce similar hardware at CES, both released new inferencing servers last year. Dell's air-cooled PowerEdge XE9780 and XE9785 servers integrate into existing enterprise data centers, while the liquid-cooled Dell PowerEdge XE9780L and XE9785L servers support rack-scale deployment, the company announced last May. For its part, HPE last March released its latest AI servers, including the HPE ProLiant Compute DL380a Gen12, which supports up to 16 GPUs as well as direct liquid cooling and is purpose-built for AI fine-tuning and inference.

Given the rapidly increasing demand for enterprise inferencing, more announcements from the major players are likely this year.