Why local autocomplete is worth trying

Cloud autocomplete tools are convenient, but not every project can send source code to a remote AI service. Local autocomplete gives you a practical middle path: useful coding help inside VS Code while model calls stay on your machine or on an Ollama host you control.

LocalPilot is built for that workflow. It prepares a small amount of nearby editor context, filters sensitive paths, redacts secret-like strings, and asks Ollama for a completion. The result appears as inline ghost text in the editor.
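To make that concrete, here is a rough TypeScript sketch of the same pipeline: redact secret-looking strings, build a small window of context above the cursor, ask Ollama's /api/generate endpoint for a completion, and return it as inline ghost text through VS Code's InlineCompletionItemProvider API. The redaction pattern, the 30-line window, and the hard-coded model name are assumptions for illustration, not LocalPilot's actual code.

import * as vscode from 'vscode';

// Strip obvious secret-looking values before any text leaves the editor.
// These patterns are illustrative; LocalPilot's real redaction rules are not shown here.
function redactSecrets(text: string): string {
  return text.replace(
    /(api[_-]?key|token|secret|password)(\s*[:=]\s*)["']?[^\s"']+/gi,
    '$1$2<redacted>'
  );
}

// Inline ghost text is surfaced through VS Code's InlineCompletionItemProvider.
class OllamaCompletionProvider implements vscode.InlineCompletionItemProvider {
  async provideInlineCompletionItems(
    doc: vscode.TextDocument,
    pos: vscode.Position
  ): Promise<vscode.InlineCompletionItem[]> {
    // Bounded context: a small window of lines above the cursor, redacted first.
    const start = new vscode.Position(Math.max(0, pos.line - 30), 0);
    const prefix = redactSecrets(doc.getText(new vscode.Range(start, pos)));

    // Ask the local Ollama server for a completion (default port assumed).
    const res = await fetch('http://localhost:11434/api/generate', {
      method: 'POST',
      body: JSON.stringify({ model: 'qwen2.5-coder:1.5b', prompt: prefix, stream: false }),
    });
    const body = (await res.json()) as { response: string };
    return [new vscode.InlineCompletionItem(body.response, new vscode.Range(pos, pos))];
  }
}

// Register for all files; a real extension would scope this and filter sensitive paths.
vscode.languages.registerInlineCompletionItemProvider(
  { pattern: '**' },
  new OllamaCompletionProvider()
);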

Basic setup

Start with Ollama running locally, then pull a coding model that matches your machine. Smaller models are better for quick inline suggestions; larger models can be better for chat, explanation, and code-review-style tasks.

ollama serve
ollama pull qwen2.5-coder:1.5b
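If you want to confirm from code that the server is reachable and the model is actually pulled, Ollama lists local models at its /api/tags endpoint. The snippet below assumes the default localhost:11434 address; adjust the host if you run Ollama elsewhere.

// Quick check that Ollama is up and the coding model is available locally.
async function hasModel(name: string): Promise<boolean> {
  const res = await fetch('http://localhost:11434/api/tags');
  if (!res.ok) return false;
  const body = (await res.json()) as { models: { name: string }[] };
  return body.models.some((m) => m.name.startsWith(name));
}

hasModel('qwen2.5-coder:1.5b').then((ok) =>
  console.log(ok ? 'model ready' : 'model missing - run: ollama pull qwen2.5-coder:1.5b')
);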

Choose a model for the job

A good default is qwen2.5-coder:1.5b for responsive autocomplete. If your machine has more memory, qwen2.5-coder:7b can produce stronger answers for chat and larger edits. On lower-end machines, micro and lite modes keep context and output budgets smaller so the editor stays usable; a rough sketch of those budgets follows the list below.

  • Use a smaller model for inline ghost text.
  • Use a larger model for explanations, tests, and fixes.
  • Switch to line mode if multiline suggestions feel slow.
  • Keep Ollama on localhost unless you intentionally manage a remote private host.
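To make the mode idea concrete, here is one way those budgets could be expressed. The mode names and numbers are illustrative assumptions, not LocalPilot's documented settings; the point is only that a smaller budget means less text sent to the model and a shorter suggestion back.

// Hypothetical per-mode budgets - names and numbers are illustrative,
// not LocalPilot's documented configuration.
interface ModeBudget {
  contextLines: number;    // how much nearby source is sent to the model
  maxOutputTokens: number; // how long a suggestion is allowed to be
  multiline: boolean;      // whether ghost text may span multiple lines
}

const modes: Record<'micro' | 'lite' | 'standard', ModeBudget> = {
  micro:    { contextLines: 20,  maxOutputTokens: 32,  multiline: false },
  lite:     { contextLines: 60,  maxOutputTokens: 64,  multiline: false },
  standard: { contextLines: 150, maxOutputTokens: 128, multiline: true },
};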

What makes LocalPilot different from a generic chat window

LocalPilot is editor-aware. It can read the active selection, build bounded context from nearby source, open diff previews for suggested edits, and expose commands through the VS Code command palette and context menu.
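As a sketch of that editor awareness, the command below reads the active selection, asks for a rewritten version, and opens a diff preview before anything is applied. The command id localpilot.previewFix and the fixSelection helper are hypothetical; the selection, untitled-document, and vscode.diff APIs are standard VS Code.

import * as vscode from 'vscode';

// Command-palette command: read the selection, get a suggested rewrite,
// and show it as a diff so nothing is applied without review.
export function activate(context: vscode.ExtensionContext) {
  context.subscriptions.push(
    vscode.commands.registerCommand('localpilot.previewFix', async () => {
      const editor = vscode.window.activeTextEditor;
      if (!editor || editor.selection.isEmpty) return;

      const original = editor.document.getText(editor.selection);
      const suggested = await fixSelection(original); // e.g. an Ollama round trip

      // Open the original and the suggestion side by side in a diff editor.
      const lang = editor.document.languageId;
      const left = await vscode.workspace.openTextDocument({ content: original, language: lang });
      const right = await vscode.workspace.openTextDocument({ content: suggested, language: lang });
      await vscode.commands.executeCommand('vscode.diff', left.uri, right.uri, 'LocalPilot: suggested fix');
    })
  );
}

// Placeholder for the model call; a real extension would ask Ollama here.
async function fixSelection(code: string): Promise<string> {
  return code; // hypothetical stub
}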

The goal is not to replace review or testing. The goal is to make local AI useful in the exact places where developers already work: autocomplete, explaining code, generating tests, and safely previewing fixes before applying them.