Whether a model runs well, or at all, depends on the hardware you put under it. Here is a practical map of which hardware can run which class of model.
An LLM has to fit into memory before it can run. On a discrete GPU, that means VRAM. On Apple Silicon, that means unified memory shared between CPU and GPU. On a CPU-only machine, that means RAM and a lot of patience. Whatever runs is whatever fits. So the buying decision is really a memory-sizing decision.
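The sizing arithmetic behind this is roughly parameters times bytes per parameter, plus some headroom. The sketch below is a back-of-the-envelope estimate, not a benchmark: the function name `estimate_memory_gb` is hypothetical, and the 10% overhead for KV cache and runtime buffers is an assumed placeholder that will vary with context length and runtime. Real quantized formats also carry a little extra metadata, so treat the nominal bit widths as approximate.

```python
def estimate_memory_gb(params_billions: float,
                       bits_per_weight: float,
                       overhead: float = 0.10) -> float:
    """Rough memory footprint in GB: weights plus a fractional overhead.

    `overhead` is an assumed allowance for KV cache and runtime buffers,
    not a measured value; adjust it for your runtime and context length.
    """
    # 1e9 parameters * (bits / 8) bytes each, divided by 1e9 bytes per GB
    weight_gb = params_billions * bits_per_weight / 8
    return weight_gb * (1 + overhead)


# A few model classes; Q4 is treated as 4 bits and Q8 as 8 bits per weight.
for name, params, bits in [("7B at Q4", 7, 4),
                           ("13B at Q8", 13, 8),
                           ("70B at Q4", 70, 4)]:
    print(f"{name}: ~{estimate_memory_gb(params, bits):.1f} GB")
```

Compare the result against the usable memory of a machine, not its nominal memory, since the operating system and other processes take their share.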
| Hardware | Usable memory | Realistic model class | Vibe |
|---|---|---|---|
| 8GB integrated GPU laptop | ~6GB usable | Up to ~7B at Q4 | Toy projects, learning |
| 16GB Apple Silicon Mac | ~10-12GB usable | Up to ~13B at Q4 | Solid daily driver |
| 24GB consumer GPU (e.g. high-end RTX class) | ~22GB usable | Up to ~30B at Q4 or 13B at Q8 | Comfortable workhorse |
| 48GB+ Mac Studio class | ~40GB+ usable | Up to ~70B at Q4 | Power user / small team server |
| 80GB+ datacenter GPU | ~78GB+ usable | 70B at Q8; 405B only when split across several such GPUs, even at low quant | Serious self-host |
Apple's unified memory architecture means a 64GB Mac Studio can hold a 70B-class model at Q4 (roughly 35GB of weights) that a 24GB consumer GPU simply cannot. Throughput is not as high as a top-end discrete GPU, but the ceiling on model size is dramatically higher per dollar. For local inference, M-series Macs punch far above their weight.
The big idea: pick the model first, then size the memory, then pick the hardware. Reversing that order is how teams end up with great GPUs that cannot run the model they actually want.
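That ordering (model first, then memory, then hardware) can be sketched as a lookup against the hardware table in this lesson. The usable-memory figures below are the approximate lower bounds from that table; the `TIERS` list, the function name `smallest_tier_that_fits`, and the 10% overhead are illustrative assumptions rather than measured values.

```python
from typing import Optional

# (label, approximate usable memory in GB), lower bounds from the lesson's table
TIERS = [
    ("8GB integrated GPU laptop", 6),
    ("16GB Apple Silicon Mac", 10),
    ("24GB consumer GPU", 22),
    ("48GB+ Mac Studio class", 40),
    ("80GB+ datacenter GPU", 78),
]


def smallest_tier_that_fits(params_billions: float,
                            bits_per_weight: float,
                            overhead: float = 0.10) -> Optional[str]:
    """Return the first tier whose usable memory covers the estimate.

    Step 1: the model is given (parameter count and quantization).
    Step 2: size the memory (weights plus an assumed overhead).
    Step 3: pick the smallest hardware tier that fits, or None.
    """
    needed_gb = params_billions * bits_per_weight / 8 * (1 + overhead)
    for label, usable_gb in TIERS:  # tiers are ordered smallest first
        if needed_gb <= usable_gb:
            return label
    return None  # does not fit on any single machine in the table


print(smallest_tier_that_fits(13, 4))   # a 13B model at Q4
print(smallest_tier_that_fits(405, 4))  # a 405B model at Q4
```

A `None` result is the useful signal here: it says the chosen model needs multiple devices or a smaller quantization before any single machine in the table is worth buying.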
8 questions · take it digitally for instant feedback at tendril.neural-forge.io/learn/quiz/end-local-hardware-sizing-creators
What is the main idea of "Hardware Sizing for Local Models: VRAM, Unified Memory, and CPU-Only Realities"?
Which concept is most central to "Hardware Sizing for Local Models: VRAM, Unified Memory, and CPU-Only Realities"?
Which use of AI fits this topic best?
What should a careful learner remember about "The math: parameters times bytes"?
You want to use AI after this lesson. What is the safest next step?
How should AI output about VRAM be treated?
Name one way to verify an AI answer about VRAM.
Which action would help you apply "Hardware Sizing for Local Models: VRAM, Unified Memory, and CPU-Only Realities" responsibly?