Local LLM
ELI5 — The Vibe Check
Running an AI model on your own computer instead of calling an API in the cloud. No internet needed, no API costs, total privacy. The tradeoff? You need decent hardware (GPU with enough VRAM), and local models are usually smaller/dumber than cloud ones. It's the difference between renting and owning — your house, your rules, your GPU crying in the corner.
Real Talk
Local LLMs are language models run on personal hardware rather than accessed through cloud APIs. Tools like Ollama, LM Studio, and llama.cpp make running them straightforward. Benefits include privacy, zero API costs, and offline use; limitations include hardware requirements and generally lower capability than frontier cloud models.
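To make "run on personal hardware" concrete, here's a minimal sketch of calling a locally running Ollama server over its HTTP API. It assumes Ollama is installed, running on its default port (11434), and that you've already pulled a model — the model name "llama3.2" is just an example.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_request(model: str, prompt: str) -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """Send a prompt to the local Ollama server and return the response text.
    Requires Ollama to be running and the model already pulled."""
    body = json.dumps(build_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Usage (with Ollama running):
#   generate("llama3.2", "Explain quantization in one sentence.")
```

Note there's no API key anywhere in that code — the request never leaves your machine, which is the whole privacy argument in one snippet.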
When You'll Hear This
"I run a local LLM for code completion — zero latency and free." / "For sensitive data, always use a local LLM."
Related Terms
Inference
Inference is when the AI actually runs and generates output — as opposed to training, which is when it's learning.
LLM (Large Language Model)
An LLM is a humongous AI that read basically the entire internet and learned to predict what words come next, really really well.
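A toy sketch of the "predict the next word" idea, and of the training/inference split from above: the counting loop is the "training" phase, and the lookup afterwards is "inference." Real LLMs use neural networks over tokens, not word counts, but the two-phase shape is the same.

```python
from collections import Counter, defaultdict

# "Training": count which word follows which in a tiny made-up corpus.
corpus = "the cat sat on the mat the cat ate the fish".split()
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word: str) -> str:
    """'Inference': return the most frequent follower seen during training."""
    return follows[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat" — it followed "the" more often than any other word
```

Scale the corpus up to most of the internet and swap the counter for a transformer with billions of parameters, and you have the rough shape of an LLM.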
Ollama
Ollama is like Docker for AI models: one command downloads and runs an open-weight model on your computer.
Open Source Model
AI models whose weights are publicly available — anyone can download, run, modify, and fine-tune them, subject to the model's license.
Quantization
Quantization is the art of making AI models smaller and faster by using less precise numbers.
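A minimal sketch of the "less precise numbers" idea: symmetric int8 quantization, where every float weight is mapped onto one of 256 integer levels via a single scale factor. Real schemes (like the block-wise formats in llama.cpp) are more elaborate, but the core tradeoff — 1 byte per weight instead of 4, in exchange for small rounding errors — is the same.

```python
def quantize_int8(values):
    """Map floats onto int8 levels (-127..127) using one shared scale factor."""
    scale = max(abs(v) for v in values) / 127
    return [round(v / scale) for v in values], scale

def dequantize(quantized, scale):
    """Recover approximate floats from the int8 levels."""
    return [q * scale for q in quantized]

weights = [0.03, -1.27, 0.5, 0.999]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

# Each weight now fits in 1 byte instead of 4, at the cost of tiny errors:
print([round(w - r, 4) for w, r in zip(weights, restored)])
```

That 4x size reduction is why a model that won't fit in your VRAM at full precision often runs fine quantized — the errors per weight are small enough that quality usually degrades gracefully rather than collapsing.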