[{"data":1,"prerenderedAt":75},["ShallowReactive",2],{"term-r\u002Freinforcement-learning":3,"related-r\u002Freinforcement-learning":59},{"id":4,"title":5,"acronym":6,"body":7,"category":40,"description":41,"difficulty":42,"extension":43,"letter":44,"meta":45,"navigation":46,"path":47,"related":48,"seo":53,"sitemap":54,"stem":57,"subcategory":6,"__hash__":58},"terms\u002Fterms\u002Fr\u002Freinforcement-learning.md","Reinforcement Learning",null,{"type":8,"value":9,"toc":33},"minimark",[10,15,19,23,26,30],[11,12,14],"h2",{"id":13},"eli5-the-vibe-check","ELI5 — The Vibe Check",[16,17,18],"p",{},"Reinforcement Learning is how you train an AI by giving it rewards and punishments instead of labeled examples. The AI tries stuff, gets a score, and learns to do more of what got high scores. This is how DeepMind's AlphaGo became the world's best Go player, and it's a key part of how LLMs like ChatGPT get aligned via RLHF (Reinforcement Learning from Human Feedback).",[11,20,22],{"id":21},"real-talk","Real Talk",[16,24,25],{},"Reinforcement Learning is a learning paradigm where an agent learns to take actions in an environment to maximize cumulative reward. Unlike supervised learning, no labeled dataset is required — feedback comes from the environment. Key algorithms include Q-learning, PPO, and SAC. 
RLHF is an application of RL in which the reward comes from a model trained on human preference ratings; it is used to align LLMs with human preferences.",[11,27,29],{"id":28},"when-youll-hear-this","When You'll Hear This",[16,31,32],{},"\"RLHF uses reinforcement learning to align the LLM.\" \u002F \"Reinforcement learning powered the AlphaGo breakthrough.\"",{"title":34,"searchDepth":35,"depth":35,"links":36},"",2,[37,38,39],{"id":13,"depth":35,"text":14},{"id":21,"depth":35,"text":22},{"id":28,"depth":35,"text":29},"ai","Reinforcement Learning is how you train an AI by giving it rewards and punishments instead of labeled examples.","advanced","md","r",{},true,"\u002Fterms\u002Fr\u002Freinforcement-learning",[49,50,51,52],"Agent","Fine-tuning","Training","Machine Learning",{"title":5,"description":41},{"changefreq":55,"priority":56},"weekly",0.7,"terms\u002Fr\u002Freinforcement-learning","GFIOURPZyMaPp7IcNBKGFoqdLbwEQlxiRlCDye2UJ20",[60,64,67,72],{"title":49,"path":61,"acronym":6,"category":40,"difficulty":62,"description":63},"\u002Fterms\u002Fa\u002Fagent","intermediate","An AI agent is an LLM that doesn't just answer questions — it takes actions.",{"title":50,"path":65,"acronym":6,"category":40,"difficulty":62,"description":66},"\u002Fterms\u002Ff\u002Ffine-tuning","Fine-tuning is like taking a smart graduate student who knows everything and then sending them to a specialist bootcamp.",{"title":52,"path":68,"acronym":69,"category":40,"difficulty":70,"description":71},"\u002Fterms\u002Fm\u002Fmachine-learning","ML","beginner","Machine Learning is teaching a computer by showing it thousands of examples instead of writing out every rule.",{"title":51,"path":73,"acronym":6,"category":40,"difficulty":62,"description":74},"\u002Fterms\u002Ft\u002Ftraining","Training is the long, expensive process where an AI learns from data.",1776518307351]