Lex Fridman's MIT lecture on deep reinforcement learning, from Q-learning and DQN to AlphaGo Zero and the DeepTraffic competition.

Lex Fridman — MIT researcher and lecturer teaching the 6.S094 Deep Learning for Self-Driving Cars course
This is a solo MIT lecture in which Lex Fridman explains deep reinforcement learning as the attempt to teach systems to perceive and act in the world end-to-end from raw sensory data. He walks through the full AI stack, the structure of reinforcement learning (states, actions, rewards, policies, value functions), Q-learning and the Bellman equation, and how neural networks scale these methods to huge state spaces via Deep Q-Networks. He details the tricks that made DQN work (experience replay, fixed target networks, reward clipping) and celebrates AlphaGo and AlphaGo Zero as landmark achievements. He then introduces the class's DeepTraffic competition, a browser-based multi-agent deep RL challenge, and closes by questioning whether RL is yet applicable to real-world robotics and driving.
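
As a rough illustration of the Q-learning and Bellman-equation step mentioned above, here is a minimal tabular sketch. It is not taken from the lecture: the function name `q_learning_update`, the table shape, and the `alpha` (learning rate) and `gamma` (discount) defaults are all assumptions chosen for the example.

```python
import numpy as np

def q_learning_update(Q, state, action, reward, next_state,
                      alpha=0.1, gamma=0.9, done=False):
    """Apply one tabular Q-learning update in place and return the TD error.

    Q is a 2-D array indexed as Q[state, action] (a hypothetical layout).
    """
    # Bellman target: immediate reward plus the discounted value of the
    # best action in the next state (zero if the episode has ended).
    best_next = 0.0 if done else np.max(Q[next_state])
    target = reward + gamma * best_next

    # Move the current estimate a fraction alpha toward the target.
    td_error = target - Q[state, action]
    Q[state, action] += alpha * td_error
    return td_error

# Toy usage: 3 states, 2 actions, one observed transition.
Q = np.zeros((3, 2))
q_learning_update(Q, state=0, action=1, reward=1.0, next_state=1)
```

A Deep Q-Network replaces the table `Q` with a neural network that estimates the same values, which is where experience replay, a fixed target network, and reward clipping come in to stabilize training.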