Lex Fridman · 2017-01-22 · 1h 27m

MIT 6.S094: Deep Reinforcement Learning for Motion Planning

Lex Fridman teaches deep reinforcement learning and Q-learning, then unveils DeepTraffic, a browser-based competition to solve traffic with neural networks.

The guest

Lex Fridman — MIT researcher and lecturer teaching the 6.S094 course on deep learning for self-driving cars

The gist

This is a solo MIT 6.S094 lecture by Lex Fridman covering deep reinforcement learning and its application to motion planning. He builds from the fundamentals of supervised, unsupervised, and reinforcement learning, through perceptrons, neural networks, and Q-learning with the Bellman equation. He explains how DeepMind's deep Q-networks learned to play Atari games from raw pixels and how AlphaGo beat the world Go champion. The lecture culminates in the unveiling of DeepTraffic, a browser-based deep reinforcement learning competition where students design neural networks to drive a car at high speed on a simulated seven-lane highway.
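
The Q-learning update with the Bellman equation can be sketched in tabular form. This is a minimal illustration, not code from the lecture. The five-cell corridor environment, the learning rate alpha, the discount gamma, and the exploration rate epsilon are all assumptions chosen for the example.

```python
import random

# Hypothetical toy environment (not from the lecture): a corridor of cells
# 0..N_STATES-1. The agent starts in cell 0; stepping into the last cell
# gives reward 1.0 and ends the episode.
N_STATES = 5
ACTIONS = (-1, +1)  # index 0 = step left, index 1 = step right


def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning; alpha, gamma and epsilon are assumed values."""
    rng = random.Random(seed)
    # Q-table: one row per state, one column per action, all zeros to start.
    q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                # Explore: pick a random action.
                a = rng.randrange(len(ACTIONS))
            else:
                # Exploit: pick a best action, breaking ties at random.
                best = max(q[state])
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Bellman target: reward plus discounted value of the best next
            # action (zero future value once the episode has ended).
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            # Move the old estimate a fraction alpha toward the target.
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q
```

After training, the greedy action in every non-terminal cell should be "step right", with Q(s, right) close to 0.9^(3 - s). A deep Q-network replaces this table with a neural network, because for raw-pixel inputs the table becomes far too large to store.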

Big reveals

  • Fridman unveils the first course project, DeepTraffic, where students use deep reinforcement learning to solve a traffic simulation and compete on a leaderboard for a special prize.
  • DeepMind's deep Q-network, described in a 2013 paper, learned to play Atari games using only raw pixels as input and outperformed a human expert on some of them.
  • After four hours of training, the Atari Breakout agent discovers the lazy strategy of drilling a hole through the blocks to trap the ball at the top.
  • DeepTraffic challenges students to build a network achieving 65 mph or higher on a seven-lane highway, all running live in the browser.
  • The DeepTraffic agent chooses among five actions (move left, move right, stay, accelerate, or slow down) and learns which to take via deep Q-learning in the browser.
  • AlphaGo beat the Go world champion by first training a policy network on expert games before learning through self-play simulation.
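
Choosing among the five DeepTraffic actions can be sketched with an epsilon-greedy policy, which balances exploring random actions against exploiting the best-known one. This is a hedged illustration, not DeepTraffic's actual code: the action labels, the linear decay schedule, and its start, end, and decay_steps values are assumptions.

```python
import random

# Hypothetical labels for the five DeepTraffic actions; the simulator's own
# action encoding may differ.
ACTIONS = ("move_left", "move_right", "stay", "accelerate", "slow_down")


def epsilon_at(step, start=1.0, end=0.05, decay_steps=10_000):
    """Linearly decay epsilon from start to end, then hold it (assumed schedule)."""
    frac = min(step / decay_steps, 1.0)
    return start + frac * (end - start)


def choose_action(q_values, epsilon, rng=random):
    """Epsilon-greedy choice given one estimated return per action.

    In a deep Q-network, q_values would be the network's five outputs for
    the current state.
    """
    if len(q_values) != len(ACTIONS):
        raise ValueError("expected one Q-value per action")
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)  # explore: uniform random action
    best = max(range(len(ACTIONS)), key=lambda i: q_values[i])
    return ACTIONS[best]  # exploit: highest estimated return
```

For example, `choose_action([0.1, 0.2, 0.3, 0.9, 0.0], epsilon_at(step))` mostly returns a random action early in training and almost always returns "accelerate" once epsilon has decayed. Starting with a high epsilon and lowering it gives the explore-then-exploit pattern.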

Things worth remembering

  • A single neuron (a perceptron) can represent a NAND gate, a universal logic gate from which any digital circuit, and therefore a computer, can be built.
  • A 28x28 pixel digit image is 784 numbers, each from 0 to 255, forming the 784-neuron input layer of the classifier network.
  • Reinforcement learning follows an explore-then-exploit pattern using an epsilon-greedy policy that takes random actions with probability epsilon.
  • The Q-table for four stacked 84x84 grayscale Atari frames would be larger than the number of atoms in the universe.
  • After only ten minutes of training the Breakout agent learns almost nothing, but after two hours on a single GPU it reaches human-level play.
  • DeepTraffic training runs in a separate Web Worker thread at about a thousand frames per second, far faster than real time.
  • DeepTraffic uses ConvNetJS, a JavaScript neural network library written by Andrej Karpathy of Stanford and OpenAI.
  • DeepTraffic was featured on the front page of Hacker News, so Fridman delayed posting the links publicly.
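
The claim that a single neuron can act as a NAND gate can be checked with a step-activation perceptron. The weights (-2, -2) and bias 3 are one working choice, the example popularized by Michael Nielsen's online book; they are an assumption here, not necessarily the values on the lecture slide. The XOR wiring is a standard construction added only to illustrate NAND's universality.

```python
def perceptron(inputs, weights, bias):
    """Step-activation perceptron: output 1 if w.x + b > 0, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def nand(a, b):
    # Only the input (1, 1) drives the weighted sum below zero: -2 - 2 + 3 = -1.
    return perceptron((a, b), weights=(-2, -2), bias=3)


def xor(a, b):
    # Because NAND is universal, other gates can be wired from it alone.
    c = nand(a, b)
    return nand(nand(a, c), nand(b, c))
```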
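
Turning a 28x28 digit image into the 784-value input layer is a row-by-row flatten. This minimal sketch uses plain lists; the function name is hypothetical, and the division by 255 is a common preprocessing choice assumed here, not something the note specifies.

```python
def image_to_input(image):
    """Flatten a 28x28 grid of 0-255 pixel values into 784 input values.

    Scaling to [0, 1] is an assumed preprocessing step; the note only says
    the raw values range from 0 to 255.
    """
    if len(image) != 28 or any(len(row) != 28 for row in image):
        raise ValueError("expected a 28x28 image")
    flat = [pixel for row in image for pixel in row]  # row-major order
    if any(not 0 <= p <= 255 for p in flat):
        raise ValueError("pixel values must be in 0..255")
    return [p / 255.0 for p in flat]
```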
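
The Q-table size claim follows from counting states. Four stacked 84x84 frames give 28,224 pixels per state, and with 256 possible values per pixel (8-bit grayscale, assumed) there are 256^28,224 distinct states. A table would need one row per state and one column per action. The sketch compares base-10 exponents, using the commonly quoted rough estimate of about 10^80 atoms in the observable universe.

```python
import math

# Numbers from the note: four stacked 84x84 grayscale frames per state,
# with 8-bit pixels (256 possible values each; 8-bit depth is assumed).
PIXELS_PER_STATE = 84 * 84 * 4   # 28,224 pixel values
LEVELS = 256

# Distinct states = LEVELS ** PIXELS_PER_STATE. That integer has tens of
# thousands of digits, so compare base-10 exponents instead.
log10_states = PIXELS_PER_STATE * math.log10(LEVELS)

# Rough, commonly quoted estimate: ~10^80 atoms in the observable universe.
LOG10_ATOMS = 80

print(f"states ~ 10^{log10_states:.0f}, atoms ~ 10^{LOG10_ATOMS}")
# prints: states ~ 10^67970, atoms ~ 10^80
```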