GGUF
ELI5 — The Vibe Check
GGUF is a file format for running AI models on your laptop — it's like the MP3 of AI models. Before GGUF (and its predecessor GGML), running a large language model locally was practically impossible unless you had a data center. Now you can download a GGUF file and run it with llama.cpp. It's what made the 'run your own AI' movement possible.
Real Talk
GGUF (often expanded as GPT-Generated Unified Format) is a binary file format for storing machine learning models, designed for efficient local inference with llama.cpp and compatible tools. A single self-describing file holds the model's tensors, at quantization levels from Q2_K up to Q8_0 (or unquantized F16/F32), alongside metadata such as the architecture, tokenizer, and hyperparameters. It replaced the earlier GGML format, adding versioning, richer metadata, and better extensibility.
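To make "self-describing binary file" concrete, here's a minimal sketch of reading a GGUF file's fixed header. Per the published GGUF spec, a file starts with the 4-byte magic "GGUF", then a little-endian uint32 version, a uint64 tensor count, and a uint64 metadata key/value count; this toy parser checks only those first fields, not the full metadata section.

```python
import struct

def read_gguf_header(data: bytes) -> dict:
    # GGUF layout (little-endian): 4-byte magic b"GGUF", uint32 version,
    # uint64 tensor count, uint64 metadata key/value count.
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", data, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    return {"version": version, "tensors": n_tensors, "metadata_kv": n_kv}

# Synthetic header for demonstration: version 3, 2 tensors, 5 metadata entries.
fake_header = struct.pack("<4sIQQ", b"GGUF", 3, 2, 5)
print(read_gguf_header(fake_header))
```

In a real file, the metadata key/value pairs (model name, context length, tokenizer vocabulary, and so on) follow this header, which is what lets tools like llama.cpp load a model without any sidecar config files.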
When You'll Hear This
"Download the Q4_K_M GGUF — best quality-to-size ratio." / "The GGUF format lets you run Llama on a MacBook Pro."
Related Terms
Inference
Inference is when the AI actually runs and generates output — as opposed to training, which is when it's learning.
Llama
Llama is Meta's family of openly released AI models — it's like if one of the big tech companies just... gave away their homework.
Local AI
Local AI means running AI models on your own computer instead of sending data to the cloud.
Model
A model is the trained AI — the finished product.
Quantization
Quantization is the art of making AI models smaller and faster by using less precise numbers.
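The quantization idea above can be sketched in a few lines. This is an illustrative toy, not llama.cpp's actual kernel: real GGUF quant types like Q8_0 work on fixed-size blocks (32 values) with a per-block scale, but the core trick is the same — store small integers plus one scale factor, and multiply back on the way out.

```python
def quantize_q8(block: list[float]) -> tuple[float, list[int]]:
    # Symmetric 8-bit quantization: pick a scale so the largest value
    # maps to 127, then round each value to the nearest int8.
    scale = max(abs(x) for x in block) / 127 or 1.0  # avoid div-by-zero on all-zero blocks
    return scale, [round(x / scale) for x in block]

def dequantize(scale: float, q: list[int]) -> list[float]:
    # Recover approximate floats: each int8 times the shared scale.
    return [v * scale for v in q]

weights = [0.5, -1.0, 0.25, 0.0]
scale, q = quantize_q8(weights)
print(dequantize(scale, q))  # close to the originals, at a quarter the storage
```

Each float32 weight (4 bytes) becomes a 1-byte integer plus a tiny shared scale, which is roughly why a Q8 GGUF is about a quarter the size of the full-precision model, and lower-bit types like Q4_K_M shrink it further at some cost in accuracy.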