Choosing a Model

What the numbers mean

Parameters (e.g. 3B, 8B): the number of weights in the model, in billions, and a rough guide to its size and capability. Larger models generally give better answers but run slower and need more memory.
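Quantization (e.g. Q4_K_M, Q8_0): compression of the model's weights. The number after Q is the approximate number of bits stored per weight, so lower numbers give smaller, faster files that are slightly less accurate. Q4_K_M is a good balance of quality and size.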
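To see how the two numbers combine, a rough download size can be estimated from the parameter count and the bits stored per weight. This is a back-of-the-envelope sketch, not how the app computes sizes; the function name estimate_size_gb is hypothetical, and the figure of 4 bits per weight for Q4 is a nominal assumption.

```python
def estimate_size_gb(n_params, bits_per_weight):
    """Approximate model file size in GB (1 GB = 1e9 bytes).

    Assumption: every weight is stored at the same bit width and
    file metadata is ignored, so this is a lower bound.
    """
    return n_params * bits_per_weight / 8 / 1e9


print(estimate_size_gb(3e9, 4))  # 1.5
print(estimate_size_gb(8e9, 4))  # 4.0
```

Real Q4_K_M files tend to come out somewhat larger than this estimate, because some parts of the model are kept at higher precision.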

Picking a tier

Tier	Example	Download	RAM needed	Good for
Tiny	Gemma-2 2B, Llama-3.2 3B (Q4)	~1.5–2.5 GB	4–8 GB	Older laptops, fast replies
Default	Llama-3.2 3B Instruct (Q4_K_M)	~2 GB	8 GB	Most machines
Standard	Qwen2.5 7B, Llama-3.1 8B (Q4)	~4.5–5 GB	16 GB	Better reasoning

When in doubt, start with the default. You can always download a larger model later and switch between them.
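The table above can be read as a simple rule keyed on installed RAM. The sketch below is illustrative only: pick_tier is a hypothetical helper, not part of the app, and it assumes that a machine with exactly 8 GB gets the Default tier rather than Tiny.

```python
def pick_tier(ram_gb):
    """Map installed RAM in GB to a tier name from the table.

    Thresholds are taken from the "RAM needed" column; the choice
    of Default at exactly 8 GB is an assumption.
    """
    if ram_gb >= 16:
        return "Standard"
    if ram_gb >= 8:
        return "Default"
    if ram_gb >= 4:
        return "Tiny"
    return None  # below the smallest tier listed


print(pick_tier(16))  # Standard
print(pick_tier(6))   # Tiny
```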

How downloading works

Open the model picker and choose a model from the catalog (or paste a custom Hugging Face model reference, if your build supports it).
The app checks that you have enough free disk space, then downloads the model and shows a progress bar. Interrupted downloads resume rather than starting over.
Once complete, select the model to load it. The first prompt after loading may take a few seconds while the model initializes.
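For readers curious about what the disk space check and resume steps involve, here is a minimal sketch of one common approach. It is not the app's actual code: the helper names, the 500 MB safety margin, and the use of an HTTP Range header for resuming are all assumptions made for illustration.

```python
import os
import shutil


def has_enough_space(dest_dir, download_bytes, margin_bytes=500 * 1024**2):
    """Return True if dest_dir has room for the download plus a margin.

    The 500 MB default margin is an arbitrary illustrative value.
    """
    free = shutil.disk_usage(dest_dir).free
    return free >= download_bytes + margin_bytes


def resume_headers(partial_path):
    """Build HTTP headers that request only the bytes not yet on disk.

    Assumes the server supports Range requests; with no partial file
    the result is empty and the download starts from the beginning.
    """
    done = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    return {"Range": f"bytes={done}-"} if done else {}
```

A downloader built this way would call has_enough_space before starting, then pass resume_headers(path) with each request and append the response body to the partial file.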

Switching models

Pick a different downloaded model from the model picker at any time. Each conversation records which model produced it.

Managing disk space

Models are the largest files the app stores. Delete models you no longer use from Settings → Models to free space. See Knowledge Base for the (often much larger) Wikipedia archives.