Apple’s Mac Minis have become the darlings of local AI. Apple’s latest MacBook Pros, with their new M5 Pro and M5 Max chips, could bewitch developers even further.

Apple debuted its new MacBook Pros this week, combining a pair of 3nm CPU dies with an interconnect between the two. Macworld’s Jason Cross noted that the dual-die design isn’t new; AMD processors like the Ryzen 9 5950X or the 7950X3D use a pair of chiplets connected with AMD’s Infinity Fabric interconnect.

What’s new about Apple’s M5 chips is the inclusion of “super cores” and “performance cores.” Some of that, like the renamed “super cores,” is branding, as Jason points out. Other bits, like the new performance cores, remain somewhat unknown. But what’s really interesting is that they’re paired with 20- to 40-core GPUs, each core with a neural accelerator inside, as well as up to a whopping 128GB of configurable unified memory. That’s something your Windows PC doesn’t do, and it brings Apple customers several advantages.

Right now, we don’t know some of the details surrounding Apple’s latest chips that a deep dive would reveal, perhaps at the Hot Chips conference this August. But the top-down features alone tell us enough that we can marvel at what Apple is doing.

Memory matters, and the M5 Max MacBooks deliver in spades

Let’s talk about the most impressive aspect right away: the memory. Apple’s M5 Max chip includes 48GB of unified memory standard, configurable up to 128GB. Apple’s configuration page is a little confusing, but it appears that all of the 16-inch MacBook Pro laptops are configurable to that capacity, though at a whopping $4,399.

But that’s a unified memory configuration. For years, Windows laptops powered by AMD and Intel chips set aside a slice of system RAM as dedicated VRAM: half of the available system RAM in an Intel laptop, and a fixed amount within a Ryzen notebook as well.
When AMD announced the Ryzen AI Max processor for local AI, it tweaked its Adrenalin software to allow users to adjust the VRAM allocation on the fly. Last August, Intel debuted the “Shared GPU Memory Override,” which does basically the same thing. Qualcomm, whose Arm processors are the closest to the M5 Pro and Max from an architecture standpoint, doesn’t offer that capability.

Apple’s MacBooks can be used for more than AI, of course, like ray tracing.

Apple’s M5 chips use what it calls MLX, an open-source array framework that appears to take local AI to another level. MLX, as Apple explains, doesn’t require the user to figure out how much memory to allocate. It doesn’t necessarily even make the choice itself. Instead, MLX, which has built-in support for neural network training and inference, including text and image generation, can “run on either the CPU or the GPU without needing to move memory around.” To use an old cliche, it just works.

AI models gobble up the fastest memory available on your PC, which typically is the video memory attached to your GPU. The best AI models are typically the most complex: bigger is better, and AI models use the number of parameters as a general metric of how good they are. But those models also need a lot of memory to run in, just as Windows won’t run on a PC with a measly 2GB of RAM, and Adobe Photoshop requires substantially more.

So, as a subtotal: Apple’s new M5 MacBook Pros include up to 128GB of available memory, the vast majority of which is available to the GPU (in this case, the AI engine). That’s far more than the local VRAM attached to the most powerful PC graphics card, the Nvidia RTX 5090, which has 32GB. And if you think that $4,399 is way too much to pay for an AI box, the $3,099 MacBook Pro includes 48GB of unified memory, standard. You get the idea.

Apple’s MacBooks, then, should be able to both load and run local AI models that would normally be forced to run in the cloud.
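To see why that memory headroom matters, here’s a back-of-the-envelope sketch of how much memory a model’s weights alone consume at different precisions. The function name and the flat per-parameter math are my own simplification; real runtimes also need room for the KV cache and activations, which this ignores.

```python
def model_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory needed just to hold a model's weights, in GB.

    num_params: parameter count (e.g., 70e9 for a 70B model)
    bits_per_param: precision of each weight (16 = fp16, 4 = 4-bit quantized)
    """
    return num_params * bits_per_param / 8 / 1e9

# A 70-billion-parameter model at common precisions:
for bits in (16, 8, 4):
    print(f"70B @ {bits}-bit: ~{model_memory_gb(70e9, bits):.0f} GB of weights")
```

At 16-bit precision, a 70B model’s weights alone outstrip even a 128GB machine, which is exactly why the quantization the article goes on to describe matters: at 4 bits, the same model drops to roughly 35GB and fits comfortably.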
That means no network latency, no subscription costs, and no data leaving the device for applications governed by strict privacy laws. Apple’s own estimates (below) show the memory requirements of various popular models, and they should easily run on the MacBook. Double the memory allocation on the Qwen3 model, and a 70-billion-parameter model should be possible. Even cooler, Apple published a few lines of code to quantize, or compress, a model down to a lower precision. That’s slick.

But Apple’s design has some impressive performance characteristics, too. Processors from AMD, Intel, and Qualcomm each include a dedicated, unified NPU. Apple doesn’t: instead, it puts an NPU (or at least a “neural accelerator”) inside each of its GPU cores. We know that these neural accelerators perform the dedicated matrix-multiplication operations necessary for machine learning, and that Apple also has a separate 16-core neural engine, which is presumably its name for a dedicated NPU. We don’t really know how these pieces interact with one another.

Still, Apple says that the “time to first token” (how quickly the LLM responds to your input) is dramatically faster, both in the table above and in the chart below. Personally, my LLM use hasn’t really been gated by how fast the LLM is. I like a quick response, but the dot-matrix-printer-esque way in which an LLM generates text can make it difficult to read. The sophistication of the response matters more.

Can the Windows world keep up?

To be fair, Microsoft has something similar in concept moving through its pipeline: Windows ML, which takes advantage of the most powerful silicon available to the PC to run local AI applications. It basically says that you don’t need an NPU, just whatever is the most powerful available component in your PC. Which approach is better? I honestly don’t know, though proper testing should reveal the answer.

AMD’s Ryzen AI Max+ is an AI powerhouse, and probably Apple’s closest challenger.
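Apple’s actual quantization snippet uses MLX, and I won’t try to reproduce it here. But the idea mentioned earlier — squeezing weights down to lower precision — can be sketched generically in plain Python. Everything in this block (the function names, the symmetric one-scale-per-tensor scheme) is an illustrative assumption, not Apple’s code.

```python
def quantize_int8(weights):
    """Symmetric 8-bit quantization: map each float weight onto an
    integer in [-127, 127], sharing one scale factor per tensor."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    """Recover approximate float weights from the stored integers."""
    return [q * scale for q in quantized]

weights = [0.82, -1.27, 0.003, 0.5]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Each quantized value fits in one byte instead of four (fp32), a 4x
# memory reduction; the cost is a small rounding error per weight,
# bounded by half the scale factor.
```

Real quantization schemes (4-bit group quantization, per-channel scales) are more elaborate, but the memory trade is the same: fewer bits per weight, at the price of a little precision.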
AMD throws 80MB of cache at the AI problem with the AI Max+ 395, and adding memory to processors is a strategy it has used to excellent effect.

Still, I’ve seen anecdotal reports that Apple Store employees have been shocked by the Mac Minis moving out the door. The compact little boxes have proven quite useful to developers looking to run LLMs or agentic AI locally, without chewing through either tons of power or an AI token subscription. Apple’s new MacBook Pros simply add a screen.

As far as I can tell, however, the Mini wasn’t designed with AI in mind. With Apple’s management now aware that the Mini is a preferred AI device, it will be interesting to see what happens with the expected 2026 Mac Mini with an M5 chip inside. It all certainly sets a high bar for AMD, Intel, and Qualcomm to meet.