There is no single best local coding model
The best model depends on the job. Inline autocomplete needs speed because it runs while you type. Chat, explanations, and test generation can tolerate a little more latency in exchange for stronger reasoning.
LocalPilot separates inline, chat, and low-RAM model settings so you can tune the experience instead of forcing one model to do every task.
Recommended starting points
For most machines, qwen2.5-coder:1.5b is a good first inline model: it is small enough to feel responsive and still understands common coding patterns. If you have enough memory, qwen2.5-coder:7b is a stronger choice for chat and larger tasks.
- Balanced inline model: qwen2.5-coder:1.5b
- Higher-quality chat model: qwen2.5-coder:7b
- Very small model: smollm2:360m
- Low-end coding model: deepseek-coder:1.3b
- Compact general coding model: codegemma:2b
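Memory footprint is the main trade-off across this list. If you are unsure what your machine can handle, the ollama CLI can show on-disk sizes and live memory use. A quick check, assuming Ollama is already installed:
ollama pull smollm2:360m   # smallest option from the list above
ollama list                # installed models with their on-disk sizes
ollama ps                  # currently loaded models and their memory use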
How to test a model in VS Code
Use one real file from your project and compare the same task across models. Ask for a function explanation, generate a small unit test, then try inline suggestions in a tight loop. The winning model is the one that helps without making the editor feel heavy. Start by pulling both candidates:
ollama pull qwen2.5-coder:1.5b
ollama pull qwen2.5-coder:7b
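Before wiring a model into the editor, you can also run a rough side-by-side comparison from the terminal. A minimal sketch: the file path src/parser.ts and the prompt are placeholders for a real file and task from your own project.
# Ask each model the same question about the same file, then compare
# answer quality against wall-clock time.
time ollama run qwen2.5-coder:1.5b "Explain this function: $(cat src/parser.ts)"
time ollama run qwen2.5-coder:7b "Explain this function: $(cat src/parser.ts)"
If the 7b answer is only marginally better but several times slower, the smaller model is usually the better daily driver.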
Quality is not only model size
Context quality matters. LocalPilot keeps prompts bounded, skips noisy generated folders, and redacts secret-like values before the model sees the request. That keeps local models focused on the code that matters instead of dumping entire projects into a prompt.
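As an illustration of the redaction idea only, not LocalPilot's actual implementation, a simple pattern match can mask key-value secrets before any text reaches a prompt:
# Hypothetical sketch: mask values assigned to names containing "key",
# "token", or "secret" (lowercase match, for brevity) in a .env file.
sed -E 's/((api[_-]?key|token|secret)[[:space:]]*[:=][[:space:]]*)[^[:space:]]+/\1[REDACTED]/g' .env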