← All tools
AI PC Planner
Find the right GPU for local LLM inference. Real tok/s estimates, VRAM requirements by quantization, and value-per-GB comparisons.
FAQ
Which models are covered?
7B through 70B parameter models at Q4, Q5, and Q8 quantization. We focus on models runnable with Ollama and LM Studio.
How accurate are the tok/s numbers?
Community benchmark averages. Results vary with prompt length, context size, and system configuration.
What does quantization mean?
Quantization reduces model precision to save VRAM. Q4 uses ~4 bits per weight (smallest, fastest), Q8 uses ~8 bits (closest to full precision). Q4_K_M is the best quality/size tradeoff for most users.