← All tools

AI PC Planner

Find the right GPU for local LLM inference. Real tok/s estimates, VRAM requirements by quantization, and value-per-GB comparisons.

$300$2,000
5GB VRAM required

Q4_K_M (4-bit, best value)

NVIDIA RTX 3090 (24GB) fits 7B Q4_K_M with 24GB VRAM and delivers ~38 tok/s.

Recommended GPUs

NVIDIA RTX 3090 (24GB)Best valueTop pick
24GB VRAM · 38 tok/s · 350W
GeForce RTX 4060 Ti 16GBBest value
16GB VRAM · 36 tok/s · 165W

FAQ

Which models are covered?

7B through 70B parameter models at Q4, Q5, and Q8 quantization. We focus on models runnable with Ollama and LM Studio.

How accurate are the tok/s numbers?

Community benchmark averages. Results vary with prompt length, context size, and system configuration.

What does quantization mean?

Quantization reduces model precision to save VRAM. Q4 uses ~4 bits per weight (smallest, fastest), Q8 uses ~8 bits (closest to full precision). Q4_K_M is the best quality/size tradeoff for most users.