Reinforcement Learning
ELI5 — The Vibe Check
Reinforcement Learning is how you train an AI by giving it rewards and punishments instead of labeled examples. The AI tries stuff, gets a score, and learns to do more of what got high scores. This is how DeepMind's AlphaGo became the world's best Go player, and it's a key part of how LLMs like ChatGPT get aligned via RLHF (Reinforcement Learning from Human Feedback).
Real Talk
Reinforcement Learning is a learning paradigm where an agent learns to take actions in an environment to maximize cumulative reward. Unlike supervised learning, no labeled dataset is required — feedback comes from the environment. Key algorithms include Q-learning, PPO, and SAC. RLHF is a variant used to align LLMs with human preferences.
When You'll Hear This
"RLHF uses reinforcement learning to align the LLM." / "Reinforcement learning powered the AlphaGo breakthrough."
Related Terms
Agent
An AI agent is an LLM that doesn't just answer questions — it takes actions.
Fine-tuning
Fine-tuning is like taking a smart graduate student who knows everything and then sending them to a specialist bootcamp.
Machine Learning (ML)
Machine Learning is teaching a computer by showing it thousands of examples instead of writing out every rule.
Training
Training is the long, expensive process where an AI learns from data.