Choosing a Model
What the numbers mean
- Parameters (e.g. 3B, 8B): the number of weights in the model, in billions. Roughly, this is the model's size and capability: larger models are generally more capable but slower, and they need more memory.
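- Quantization (e.g. Q4_K_M, Q8_0): how heavily the model's weights are compressed. The number after Q is roughly the bits stored per weight. Q4_K_M is a good balance of quality and size; lower numbers are smaller and faster but slightly less accurate.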
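As a rough rule of thumb, the download size is the parameter count times the bits per weight, divided by 8. The sketch below shows that arithmetic. The bits-per-weight figures (about 4.8 for Q4_K_M, 8.5 for Q8_0) are approximate assumptions, and real files run somewhat larger because of metadata and a few tensors kept at higher precision.

```python
def estimated_download_gb(params_billions: float, bits_per_weight: float) -> float:
    """Rough model file size in GB (1 GB = 1e9 bytes): params x bits per weight / 8.

    Ignores file metadata and any tensors stored at higher precision,
    so actual downloads are usually a little larger than this estimate.
    """
    total_bits = params_billions * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9


# Approximate average bits per weight (assumed values, not exact).
APPROX_BPW = {"Q4_K_M": 4.8, "Q8_0": 8.5}

if __name__ == "__main__":
    for size in (3, 8):
        for quant, bpw in APPROX_BPW.items():
            print(f"{size}B {quant}: ~{estimated_download_gb(size, bpw):.1f} GB")
```

For a 3B model this gives about 1.8 GB at Q4_K_M versus about 3.2 GB at Q8_0, which is why the lower-numbered quantizations are the usual choice on machines with limited memory.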
Picking a tier
| Tier | Example | Download size | RAM needed | Good for |
|---|---|---|---|---|
| Tiny | Gemma-2 2B, Llama-3.2 3B (Q4) | ~1.5–2.5 GB | 4–8 GB | Older laptops, fast replies |
| Default | Llama-3.2 3B Instruct (Q4_K_M) | ~2 GB | 8 GB | Most machines |
| Standard | Qwen2.5 7B, Llama-3.1 8B (Q4) | ~4.5–5 GB | 16 GB | Better reasoning |
When in doubt, start with the default. You can always download a larger model later and switch between them.
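The RAM column of the table can be read as a simple lookup. The sketch below is a hypothetical helper (not part of the app) that returns the largest tier whose RAM figure fits, treating 4 GB as the floor for Tiny. It gives an upper bound, not a reason to skip the default.

```python
from typing import Optional

# Minimum RAM (GB) per tier, taken from the "RAM needed" column of the table.
# Checked from largest to smallest so the first match is the biggest tier that fits.
TIERS = [
    (16, "Standard"),  # Qwen2.5 7B, Llama-3.1 8B (Q4)
    (8, "Default"),    # Llama-3.2 3B Instruct (Q4_K_M)
    (4, "Tiny"),       # Gemma-2 2B, Llama-3.2 3B (Q4)
]


def suggest_tier(ram_gb: float) -> Optional[str]:
    """Return the largest tier whose RAM requirement ram_gb meets, or None if none fit."""
    for min_ram, name in TIERS:
        if ram_gb >= min_ram:
            return name
    return None
```

For example, `suggest_tier(12)` returns `"Default"`: 12 GB clears the 8 GB figure but not the 16 GB one.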
How downloading works
- Open the model picker and choose a model from the catalog (or paste a custom Hugging Face model reference, if your build supports it).
- The app checks that you have enough free disk space, then downloads the model with a progress bar. Interrupted downloads resume from where they stopped instead of starting over.
- Once complete, select the model to load it. The first prompt after loading may take a few seconds while the model initializes.
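The free-space check and resumable download can be sketched with the Python standard library alone. This is an illustration under assumptions, not the app's actual implementation: it assumes the model is served over plain HTTP(S) by a server that supports `Range` requests, and the 10% space margin and function names are hypothetical.

```python
import os
import shutil
import sys
import urllib.request


def has_enough_space(target_dir: str, needed_bytes: int, margin: float = 1.1) -> bool:
    """True if the filesystem holding target_dir has needed_bytes free, plus a safety margin."""
    return shutil.disk_usage(target_dir).free >= needed_bytes * margin


def resume_request(url: str, dest: str) -> urllib.request.Request:
    """Build a GET request that continues from the end of any partial file at dest."""
    already = os.path.getsize(dest) if os.path.exists(dest) else 0
    req = urllib.request.Request(url)
    if already > 0:
        # HTTP Range header: ask for bytes from `already` to the end of the file.
        req.add_header("Range", f"bytes={already}-")
    return req


def download(url: str, dest: str, chunk_size: int = 1 << 20) -> None:
    """Stream url to dest, appending when the server honours the Range request."""
    req = resume_request(url, dest)
    with urllib.request.urlopen(req) as resp:
        # 206 Partial Content means the server is resuming, so append;
        # any other success status means it sent the whole file, so start over.
        mode = "ab" if resp.status == 206 else "wb"
        done = os.path.getsize(dest) if mode == "ab" else 0
        remaining = resp.headers.get("Content-Length")
        total = done + int(remaining) if remaining else None
        with open(dest, mode) as out:
            while chunk := resp.read(chunk_size):
                out.write(chunk)
                done += len(chunk)
                if total:
                    # Minimal text progress indicator on stderr.
                    sys.stderr.write(f"\r{done * 100 // total:3d}%")
        sys.stderr.write("\n")
```

A fuller version would also treat an HTTP 416 (Range Not Satisfiable) response as "already complete" and verify a checksum before loading the file.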
Switching models
Pick a different downloaded model from the model picker at any time. Each conversation records which model produced it.
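One way to model this, assuming the model is recorded when a conversation starts, is for each conversation to copy the current model name at creation time, so switching models later leaves earlier conversations unchanged. The class and model names below are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Conversation:
    model_id: str  # model that was loaded when this conversation was created
    messages: List[str] = field(default_factory=list)


class ChatApp:
    def __init__(self, model_id: str) -> None:
        self.current_model = model_id
        self.conversations: List[Conversation] = []

    def switch_model(self, model_id: str) -> None:
        # Affects only conversations started from now on.
        self.current_model = model_id

    def new_conversation(self) -> Conversation:
        conv = Conversation(model_id=self.current_model)
        self.conversations.append(conv)
        return conv
```

If the app lets you switch models mid-conversation, storing the model name on each reply instead of on the conversation would keep that history accurate too.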
Managing disk space
Models are among the largest files the app stores. Delete models you no longer use from Settings → Models to free space. See Knowledge Base for the Wikipedia archives, which are often much larger.
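To decide what to delete, it helps to see which models take the most space. The sketch below assumes models are stored as `.gguf` files (the llama.cpp format that uses quantization names like Q4_K_M) in a single folder; the app's real storage location and layout are not documented here, so the folder path is hypothetical.

```python
from pathlib import Path
from typing import List, Tuple


def models_by_size(models_dir: str, pattern: str = "*.gguf") -> List[Tuple[str, int]]:
    """Return (file name, size in bytes) for each model file in models_dir, largest first."""
    files = [p for p in Path(models_dir).glob(pattern) if p.is_file()]
    return sorted(((p.name, p.stat().st_size) for p in files),
                  key=lambda item: item[1], reverse=True)


def format_gb(n_bytes: int) -> str:
    """Human-readable size using 1 GB = 1e9 bytes."""
    return f"{n_bytes / 1e9:.1f} GB"


if __name__ == "__main__":
    # Hypothetical folder; substitute wherever your build keeps its models.
    for name, size in models_by_size("models"):
        print(f"{format_gb(size):>8}  {name}")
```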