LoRA
Low-Rank Adaptation
ELI5 — The Vibe Check
LoRA is how you fine-tune a massive AI model without needing a massive GPU budget. Instead of updating billions of parameters, LoRA adds tiny 'adapter' layers that learn just the changes needed. It's like tailoring a suit — instead of making a new suit from scratch, you just adjust the sleeves and waist. You get a custom fit without the custom price.
Real Talk
LoRA is a parameter-efficient fine-tuning technique that freezes the pre-trained model weights and injects trainable low-rank decomposition matrices into each targeted layer. This cuts the number of trainable parameters by orders of magnitude (up to ~10,000x for GPT-3 in the original paper) while achieving performance comparable to full fine-tuning. The practical payoff: you can fine-tune large models on consumer hardware.
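The "orders of magnitude" claim is easy to sanity-check with back-of-envelope arithmetic for a single weight matrix (a sketch; d = 4096 is an assumed hidden size for a 7B-class model, not taken from any specific checkpoint):

```python
# Trainable parameters for one projection matrix:
# full fine-tuning vs. LoRA with rank r.
d = 4096  # hidden dimension (illustrative assumption)
r = 16    # LoRA rank

full = d * d          # full fine-tune: update every entry of W
lora = d * r + r * d  # LoRA: train only A (r x d) and B (d x r)

print(f"full: {full:,}")          # full: 16,777,216
print(f"lora: {lora:,}")          # lora: 131,072
print(f"ratio: {full // lora}x")  # ratio: 128x
```

That's a ~128x reduction per matrix at rank 16; the headline 10,000x-scale numbers come from applying this across a very large model while leaving most layers entirely frozen.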
Show Me The Code
# Fine-tuning with LoRA using Hugging Face PEFT
# (assumes base_model is an already-loaded transformer)
from peft import LoraConfig, get_peft_model

config = LoraConfig(
    r=16,                                 # rank of the update matrices
    lora_alpha=32,                        # scaling factor (applied as alpha / r)
    target_modules=["q_proj", "v_proj"],  # attach adapters to attention projections
    lora_dropout=0.05,
)
model = get_peft_model(base_model, config)
model.print_trainable_parameters()  # now only ~0.1% of params are trainable
When You'll Hear This
"We LoRA'd the base model on our domain data — took 2 hours on a single A100." / "LoRA adapters are tiny — you can swap them at runtime."
Related Terms
LLM (Large Language Model)
An LLM is a humongous AI that read basically the entire internet and learned to predict what words come next, really really well.
Quantization
Quantization is the art of making AI models smaller and faster by using less precise numbers.
Training
Training is the long, expensive process where an AI learns from data.
Transfer Learning
Transfer Learning is using knowledge a model already has from one task to help it with a different task.