Local autocomplete has a tighter latency budget
Inline suggestions arrive while you type, so even a helpful model feels bad when it responds too slowly. The fix is rarely a single setting; it is a combination of model size, context size, output length, debounce timing, and whether you ask for single-line or multiline completions.
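Debounce timing is the easiest of these to picture: the editor waits for a short pause in typing before asking the model, so rapid keystrokes do not queue up requests. A minimal sketch of the idea in TypeScript, with requestCompletion standing in for the real model call (the names are illustrative, not LocalPilot's internals):

function requestCompletion(prefix: string): void {
  // Stand-in for the real call to the local model.
  console.log(`requesting completion for ${prefix.length} chars of context`);
}

let pending: ReturnType<typeof setTimeout> | undefined;

function onKeystroke(prefix: string, debounceMs = 150): void {
  if (pending !== undefined) clearTimeout(pending); // a newer keystroke cancels the old request
  pending = setTimeout(() => {
    pending = undefined;
    requestCompletion(prefix); // fires only after debounceMs of quiet
  }, debounceMs);
}

onKeystroke("const x =");   // typing...
onKeystroke("const x = 1"); // cancels the first pending request

A longer debounce hides model latency but makes suggestions appear later; somewhere in the low hundreds of milliseconds is a common starting point.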
Start with line mode
If suggestions feel slow, switch from full multiline completion to line mode. Line mode asks for a smaller answer and is often enough for finishing the current line, adding a condition, or completing a small expression.
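Conceptually, line mode amounts to stopping the completion at the first newline, so the model generates far fewer tokens per request. A minimal sketch of that behavior, assuming a raw completion string (illustrative, not LocalPilot's implementation):

// Line mode: keep only the text up to the first newline.
// Fewer generated tokens means a faster round trip.
function toLineCompletion(raw: string): string {
  const newline = raw.indexOf("\n");
  return newline === -1 ? raw : raw.slice(0, newline);
}

console.log(toLineCompletion("return a + b;\nconsole.log(b);")); // "return a + b;"

The settings that enable it: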
localpilot.inlineCompletionMode = line
localpilot.mode = lite

Use a smaller inline model
A large model can be great for chat and explanation, but inline autocomplete rewards speed. Keep a smaller model configured for inline suggestions and reserve larger models for tasks where waiting a few seconds is acceptable. Some starting points, with a concrete split sketched after the list:
- Try qwen2.5-coder:1.5b for balanced inline suggestions.
- Try smollm2:360m on very constrained machines.
- Use qwen2.5-coder:7b for chat only if your machine handles it well.
- Avoid asking a large model for every keystroke.
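If your LocalPilot version exposes separate model settings for inline completion and chat, the split might look like the following. The key names here are assumptions for illustration, not confirmed settings; check the extension's documentation for the real ones.

localpilot.inlineModel = qwen2.5-coder:1.5b
localpilot.chatModel = qwen2.5-coder:7b

The point is the split itself: keystrokes go to the small model, and deliberate questions go to the large one.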
Lower prompt and output budgets
More context can improve answers, but it also costs time: the model must process every prompt token before the first suggestion token appears. LocalPilot lets you cap nearby context lines and output tokens. Lower both when responsiveness matters most.
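To see why both caps help, here is a rough sketch of how such budgets might be applied when building a request. The function and request shape are illustrative assumptions, not LocalPilot's code:

// Cap the prompt to the last N lines before the cursor and
// bound how many tokens the model may generate in response.
interface Budget {
  maxContextLines: number; // mirrors localpilot.maxContextLines
  maxOutputTokens: number; // mirrors localpilot.maxOutputTokens
}

function buildRequest(fileLines: string[], cursorLine: number, budget: Budget) {
  const start = Math.max(0, cursorLine - budget.maxContextLines);
  const prompt = fileLines.slice(start, cursorLine + 1).join("\n");
  return { prompt, maxTokens: budget.maxOutputTokens };
}

const req = buildRequest(["const a = 1;", "const b = 2;", "const sum ="], 2, {
  maxContextLines: 60,
  maxOutputTokens: 96,
});
console.log(req.prompt.split("\n").length, req.maxTokens); // 3 96

The corresponding settings: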
localpilot.maxContextLines = 60
localpilot.maxOutputTokens = 96

Check what LocalPilot is skipping
Sometimes the extension is quiet because it is protecting the editor. Large files, generated folders, lock files, dependency folders, and sensitive paths can be skipped intentionally. Run the status and troubleshooting commands before assuming the model is broken.
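The kinds of checks involved usually look something like this sketch. The patterns and size threshold are illustrative assumptions, not LocalPilot's actual rules:

// Typical reasons an autocomplete extension skips a file.
// Thresholds and patterns here are illustrative defaults.
const SKIPPED_PATTERNS = [
  /node_modules\//,              // dependency folders
  /\.lock$|package-lock\.json$/, // lock files
  /\bdist\/|\bbuild\//,          // generated output
  /\.env$|\.pem$/,               // sensitive paths
];
const MAX_FILE_BYTES = 1_000_000; // very large files

function shouldSkip(path: string, sizeBytes: number): boolean {
  if (sizeBytes > MAX_FILE_BYTES) return true;
  return SKIPPED_PATTERNS.some((p) => p.test(path));
}

console.log(shouldSkip("node_modules/react/index.js", 1200)); // true
console.log(shouldSkip("src/app.ts", 1200)); // false

If a quiet file falls into one of these categories, that is worth confirming before blaming the model.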