Question 1

Which models are covered?

Accepted Answer

We cover 7B through 70B parameter models at Q4, Q5, and Q8 quantization, focusing on models that can be run with Ollama and LM Studio.

Question 2

How accurate are the tok/s numbers?

Accepted Answer

The tok/s numbers are community benchmark averages, so treat them as estimates rather than guarantees. Actual results vary with prompt length, context size, and system configuration.
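To compare your own machine against the averages, you can compute tok/s yourself. The sketch below assumes you are reading the `eval_count` (generated tokens) and `eval_duration` (nanoseconds) fields that Ollama reports in its generate response. The helper names and the sample runs are hypothetical and only illustrate the arithmetic.

```python
from typing import Iterable, Tuple


def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Generation speed from a token count and a duration in nanoseconds."""
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration_ns must be positive")
    seconds = eval_duration_ns / 1e9  # nanoseconds -> seconds
    return eval_count / seconds


def average_tok_s(runs: Iterable[Tuple[int, int]]) -> float:
    """Mean tok/s over several (eval_count, eval_duration_ns) runs."""
    rates = [tokens_per_second(count, duration) for count, duration in runs]
    if not rates:
        raise ValueError("need at least one run")
    return sum(rates) / len(rates)


if __name__ == "__main__":
    # Made-up example runs, not real benchmark data.
    sample_runs = [(256, 6_400_000_000), (256, 6_100_000_000), (256, 6_700_000_000)]
    print(f"average: {average_tok_s(sample_runs):.1f} tok/s")
```

Averaging a few runs with the same prompt and context size gives a steadier number than a single run, since the first run often includes model load time.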

Question 3

What does quantization mean?

Accepted Answer

Quantization reduces the numeric precision of a model's weights to save VRAM. Q4 uses ~4 bits per weight (smallest, fastest); Q8 uses ~8 bits (closest to full precision). Q4_K_M is the best quality/size tradeoff for most users.
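As a rough illustration of why bits per weight matter, the sketch below estimates the size of the weights alone as parameters × bits ÷ 8. It uses the nominal bit widths from the answer above; real quantized files are typically somewhat larger, and running a model also needs VRAM for the context (KV cache) and runtime overhead, which this estimate ignores. The function name is hypothetical.

```python
def estimate_weight_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (10^9 bytes)."""
    total_bits = params_billions * 1e9 * bits_per_weight
    total_bytes = total_bits / 8  # 8 bits per byte
    return total_bytes / 1e9


if __name__ == "__main__":
    # Nominal bit widths only; actual quantization formats vary.
    for params in (7, 70):
        for name, bits in (("Q4", 4), ("Q5", 5), ("Q8", 8)):
            size = estimate_weight_gb(params, bits)
            print(f"{params}B at {name}: about {size:.1f} GB for weights")
```

Treat the result as a lower bound when checking whether a model fits on a given GPU.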
