Whether a model runs well — or at all — depends on the hardware you put under it. Here is the practical map of what hardware can run which class of model.
An LLM has to fit into memory before it can run. On a discrete GPU, that means VRAM. On Apple Silicon, that means unified memory shared between CPU and GPU. On a CPU-only machine, that means RAM and a lot of patience. Whatever runs is whatever fits. So the buying decision is really a memory-sizing decision.
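The fit check above can be sketched as arithmetic: parameters times bits per weight, plus some slack for the KV cache and runtime buffers. This is a rough illustrative estimate, not a precise calculator; the 20% overhead figure is an assumption.

```python
def model_memory_gb(params_billion: float, bits_per_weight: float,
                    overhead: float = 1.2) -> float:
    """Rough memory footprint of a quantized model.

    bits_per_weight: ~4 for Q4, ~8 for Q8, 16 for fp16.
    overhead: assumed ~20% slack for KV cache and runtime buffers.
    """
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

# A 7B model at Q4 comes out around 4.2 GB -- inside the ~6GB
# usable budget of an 8GB integrated-GPU laptop.
print(round(model_memory_gb(7, 4), 1))
```

Run the same estimate for a 70B model at Q4 and you land near 42 GB, which is why that class of model needs Mac Studio or datacenter territory.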
| Hardware | Useful memory | Realistic model class | Vibe |
|---|---|---|---|
| 8GB integrated GPU laptop | ~6GB usable | Up to ~7B at Q4 | Toy projects, learning |
| 16GB Apple Silicon Mac | ~10-12GB usable | Up to ~13B at Q4 | Solid daily driver |
| 24GB consumer GPU (e.g. high-end RTX class) | ~22GB usable | Up to ~30B at Q4 or 13B at Q8 | Comfortable workhorse |
| 48GB+ Mac Studio class | ~40GB+ usable | Up to ~70B at Q4 | Power user / small team server |
| 80GB+ datacenter GPU | ~78GB+ usable | 70B at Q8 or 405B at low quant | Serious self-host |
Apple's unified memory architecture means a 64GB Mac Studio can hold a 70B-class model that a 24GB consumer GPU simply cannot. Throughput is not as high as a top-end discrete GPU, but the ceiling on model size is dramatically higher per dollar. For local inference, M-series Macs punch far above their weight.
The big idea: pick the model first, then size the memory, then pick the hardware. Reversing that order is how teams end up with great GPUs that cannot run the model they actually want.