Unified Memory Explained: Why It Powers Local AI Mini PCs

If you have spent any time researching local AI hardware in 2026, you have run into the same phrase over and over: unified memory. For large models, it is often the feature that decides whether a device on your desk can run them at all. As a manufacturer that has built mini PCs and OPS (Open Pluggable Specification) modules for years, we want to explain what it actually means for buyers, without the marketing gloss.

The problem unified memory solves

A large language model is, at its core, a very large file of numbers (the "weights"). To run it, the processor has to hold those weights in memory, and the bigger the model, the more memory it needs. A 70-billion-parameter model, once quantised to 4-bit precision, needs roughly 35 GB for the weights alone and, in practice, 40 to 50 GB once runtime overhead is included.
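The arithmetic behind that figure is simple enough to sketch. The Python below is an illustration, not a sizing tool: `weights_gb` and `load_estimate_gb` are hypothetical helper names, and the 20% allowance for context (KV cache), quantisation metadata and runtime buffers is an assumed round number. Real 4-bit formats often store slightly more than 4 bits per weight, which pushes the total up further.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Raw weight size in GB (1 GB = 10**9 bytes): billions of parameters x bits / 8."""
    return params_billion * bits_per_weight / 8


def load_estimate_gb(params_billion: float, bits_per_weight: float,
                     overhead: float = 0.20) -> float:
    """Weights plus an ASSUMED fractional allowance for KV cache and runtime buffers."""
    return weights_gb(params_billion, bits_per_weight) * (1 + overhead)


print(f"70B @ 4-bit, weights only:  {weights_gb(70, 4):.1f} GB")        # 35.0 GB
print(f"70B @ 4-bit, with overhead: {load_estimate_gb(70, 4):.1f} GB")  # 42.0 GB
```

Change `overhead` to match the context length you plan to use; long contexts grow the KV cache and can take a noticeable share of the pool.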

On a traditional computer, the fast memory a graphics card uses (VRAM) is separate from system RAM and usually small: 8 to 16 GB on most consumer cards, 24 to 32 GB at the very top end. If the model does not fit in VRAM, it either fails to load or has to spill into system RAM across the PCIe bus, which slows generation dramatically. You cannot simply add more, because the VRAM is soldered to the graphics card.

Traditional discrete GPU vs. unified memory. In a unified design, CPU and GPU share one large memory pool, so the whole model fits without copying data back and forth.

How unified memory is different

A unified memory design puts the CPU and GPU on the same chip (or in the same package) and gives them one shared pool of memory. Instead of a small dedicated VRAM, you get a single large pool, up to 128 GB on the mini-PC-class chips below, that both can access directly. There is no copying of data from system RAM into VRAM, because there is only one memory.

This is the architecture behind Apple's M-series chips, NVIDIA's DGX Spark (built on the GB10 Grace Blackwell superchip), and AMD's Strix Halo (Ryzen AI Max+ 395). It is why a box the size of a small book can load a model that a $2,000 gaming GPU cannot.

The short version: for running large models locally, memory capacity matters more than raw speed. A model that does not fit will not run, or will run too slowly to be useful. Unified memory trades some bandwidth for a much larger usable pool, and for big models that trade is what makes them runnable at all.
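To make the capacity point concrete, here is a minimal fit check. The `MemoryPool` class, the 42 GB model footprint and the 16 GB kept back for the operating system are all illustrative assumptions. The reserve matters because a unified pool is shared with everything else running on the machine, so not all of it is available to the model.

```python
from dataclasses import dataclass


@dataclass
class MemoryPool:
    name: str
    total_gb: float
    reserved_gb: float = 0.0  # ASSUMED slice kept for the OS and other apps

    def fits(self, model_gb: float) -> bool:
        """True if the model fits in what is left after the reserve."""
        return model_gb <= self.total_gb - self.reserved_gb


MODEL_GB = 42.0  # assumed footprint of a 4-bit 70B model, weights plus overhead

pools = [
    MemoryPool("24 GB discrete VRAM", 24),
    MemoryPool("32 GB discrete VRAM", 32),
    MemoryPool("128 GB unified pool", 128, reserved_gb=16),  # 16 GB reserve is hypothetical
]

for pool in pools:
    verdict = "fits" if pool.fits(MODEL_GB) else "does not fit"
    print(f"{pool.name}: {verdict}")
```

Even with a generous reserve, the unified pool leaves plenty of headroom, while both discrete cards fall short before overhead is even counted.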

The trade-off nobody should hide from you

Unified memory has a real limitation, and any honest supplier should tell you about it: bandwidth. The shared LPDDR5X memory in Strix Halo and DGX Spark systems runs at roughly 256 to 273 GB/s (Apple's Max and Ultra chips are faster, but still short of the top discrete cards). Dedicated VRAM on a high-end discrete card can exceed 1,000 GB/s.

What does that mean in practice? Large models fit and run, but they generate text (measured in tokens per second) more slowly than a discrete GPU would on a model small enough to fit its VRAM, because producing each token requires reading roughly the whole set of weights from memory. For most business uses, such as running an assistant, processing documents or retrieval-augmented generation, this is perfectly acceptable. For latency-critical, high-throughput serving, it is a real constraint.
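A common rule of thumb shows why bandwidth caps generation speed: for a dense model, every token means streaming roughly all the weights through the processor once, so tokens per second cannot exceed bandwidth divided by model size. The sketch below applies that ceiling; `decode_ceiling_tok_s` is a hypothetical helper, the 42 GB and 5 GB model sizes are assumptions, and real throughput lands below these figures. Mixture-of-experts models, which read only a fraction of their weights per token, are not covered by this simple formula.

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float, model_gb: float) -> float:
    """Upper bound on tokens/s for a dense model: each token reads all weights once."""
    return bandwidth_gb_s / model_gb


# (label, memory bandwidth in GB/s, ASSUMED model footprint in GB)
cases = [
    ("70B 4-bit on 256 GB/s unified memory", 256, 42),
    ("8B 4-bit on 256 GB/s unified memory", 256, 5),
    ("8B 4-bit on ~1,000 GB/s discrete VRAM", 1000, 5),
]

for label, bandwidth, size in cases:
    print(f"{label}: at most {decode_ceiling_tok_s(bandwidth, size):.1f} tokens/s")
```

The 70B case appears only on the unified side because it would not fit in a consumer card's VRAM at all; the small model runs faster on the discrete card. That is the trade-off in miniature.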

Why this matters for the next generation of mini PCs

For years, the mini PC and OPS industry optimised for one thing: enough performance for signage, office work, and conferencing, in the smallest possible box. AI changes the target. The same form factor now has to hold enough memory to run a model.

We expect the same pattern that played out with OPS modules to repeat here: the technology starts expensive and proprietary, then standardises and drops in price as more silicon vendors enter. Gartner projects that inference costs will fall by over 90% between 2025 and 2030, with edge devices a key driver. The AI-capable mini host of 2028 will likely cost a fraction of today's, and the buyers who understand the architecture now will be the ones who source well later.

What to ask a supplier

How much unified memory, and is it upgradeable? On most of these chips the memory is soldered — so the capacity you buy is the capacity you keep.
What memory bandwidth? This sets the ceiling on token-generation speed.
What software stack is supported? CUDA, ROCm, or a domestic equivalent — this determines which tools and models run out of the box.
What is the thermal design? These systems draw roughly 65 to 240 W in a small chassis; cooling quality decides whether performance is sustained or throttled.

We build to these questions because we field them every week. If you are evaluating AI-capable mini hosts for your market, we are happy to talk through the trade-offs for your specific use case — no obligation.