Lex Fridman · 2019-04-29 · 1h 46m

Oriol Vinyals: DeepMind AlphaStar, StarCraft, and Language | Lex Fridman Podcast #20

DeepMind's Oriol Vinyals on how AlphaStar beat top StarCraft pros and what it reveals about learning, language, and AGI.

The guest

Oriol Vinyals — Senior research scientist at Google DeepMind, lead researcher of the AlphaStar project, behind seminal deep learning work in sequence-to-sequence learning, image captioning, and neural machine translation.

The gist

Lex Fridman interviews Oriol Vinyals, the DeepMind lead behind AlphaStar, the agent that defeated top professional StarCraft II players. Vinyals traces his path from competitive StarCraft gaming in 1990s Europe to building AlphaStar, explaining the game's core AI challenge of exploration in a vast action space with partial observability. He details the technical approach: imitation learning from human replays, transformers and LSTMs for sequence modeling, and the AlphaStar League of diverse self-play agents. The conversation broadens into generalization as deep learning's central problem, the feasibility of the Turing test, meta-learning, and what AGI might concretely look like.
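The replay-based imitation step can be illustrated with a deliberately tiny, tabular stand-in. This is a sketch, not AlphaStar's method. AlphaStar used deep networks (transformers, LSTMs) over raw game observations, while the names here (`SkillConditionedCloner`, `mmr_bucket`), the 1,000-MMR bucket width, and the string observations and actions are all hypothetical. It shows one idea from the episode: if the imitation policy is conditioned on an MMR value, the same model can reproduce lower- or higher-skill play.

```python
from collections import Counter, defaultdict
from collections.abc import Hashable, Iterable
from typing import Dict, Tuple

# A replay step is (mmr, observation, action). Observations and actions are
# hypothetical hashable placeholders; real StarCraft II replays are far richer.
ReplayStep = Tuple[int, Hashable, Hashable]


def mmr_bucket(mmr: int, width: int = 1000) -> int:
    """Map a raw MMR to a coarse bucket, e.g. 3,400 -> 3,000 (assumed width)."""
    return (mmr // width) * width


class SkillConditionedCloner:
    """Hypothetical tabular behavior cloning, conditioned on a skill bucket."""

    def __init__(self) -> None:
        # counts[(bucket, observation)] -> how often each action was taken
        self.counts: Dict[Tuple[int, Hashable], Counter] = defaultdict(Counter)

    def fit(self, replays: Iterable[ReplayStep]) -> "SkillConditionedCloner":
        for mmr, obs, action in replays:
            self.counts[(mmr_bucket(mmr), obs)][action] += 1
        return self

    def act(self, obs: Hashable, target_mmr: int) -> Hashable:
        """Return the action players near target_mmr most often took in obs."""
        # .get() does not create empty entries in the defaultdict.
        actions = self.counts.get((mmr_bucket(target_mmr), obs))
        if not actions:
            raise KeyError(f"no replays for {obs!r} at MMR {target_mmr}")
        return actions.most_common(1)[0][0]
```

For example, after fitting on made-up replays where lower-rated players answer `"enemy_rush"` with `"pull_workers"` and higher-rated players mostly with `"build_bunker"`, `act("enemy_rush", 3000)` and `act("enemy_rush", 6000)` return different actions from one model. A neural policy would generalize to unseen situations instead of looking them up, which this table cannot do.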

Big reveals

  • AlphaStar bootstraps from human replays via imitation learning, but agents trained only by imitation fall short of the humans they imitate, so self-play is required to go further.
  • DeepMind gives the network the skill level (MMR, matchmaking rating) of the player it is imitating, so the policy can be controlled to play like a 3,000- or 6,000-MMR player.
  • The team imposes actions-per-minute (APM) cutoffs to keep agents human-like, though the cutoff may have been set too high, which sparked debate about fairness.
  • The AlphaStar League deliberately creates different agent 'personalities' (cheesy, greedy, aggressive) rather than relying on pure self-play, in effect building a Battle.net for agents.
  • Vinyals expected AlphaStar to lose 5-0; he sent the team a betting email and treated the first match as a test run, before AlphaStar swept the games.
  • He defines a concrete near-term AGI milestone as meta-learning: a network that solves genuinely new problems at human speed without restarting its weights.
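The episode describes the AlphaStar League only at a high level. As a hedged illustration of how a league can differ from naive self-play, the sketch below keeps a pool of fixed opponents labeled with the 'personalities' mentioned above and samples the learner's next opponent in proportion to how often that opponent still beats it, so training time goes to the learner's weaknesses. The weighting rule `(1 - win_rate) ** power`, the moving-average update, and every function name are assumptions for illustration, not details from the conversation.

```python
import random
from typing import Dict, List


def opponent_weights(win_rate: Dict[str, float], power: float = 2.0) -> Dict[str, float]:
    """Weight each opponent by (1 - learner's win rate against it) ** power.

    Opponents the learner already beats get little weight; opponents that
    still beat it get most of the sampling mass. Hypothetical rule.
    """
    raw = {name: (1.0 - p) ** power for name, p in win_rate.items()}
    total = sum(raw.values())
    if total == 0.0:
        # The learner beats everyone: fall back to uniform sampling.
        return {name: 1.0 / len(raw) for name in raw}
    return {name: w / total for name, w in raw.items()}


def sample_opponent(win_rate: Dict[str, float], rng: random.Random) -> str:
    """Draw one opponent name according to opponent_weights."""
    weights = opponent_weights(win_rate)
    names: List[str] = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def update_win_rate(win_rate: Dict[str, float], opponent: str,
                    learner_won: bool, lr: float = 0.1) -> None:
    """Exponential moving average of the learner's win rate vs. one opponent."""
    target = 1.0 if learner_won else 0.0
    win_rate[opponent] += lr * (target - win_rate[opponent])
```

With made-up win rates of 0.5 against "cheesy", 0.9 against "greedy", and 0.3 against "aggressive", the aggressive agent receives most of the sampling weight and the greedy agent almost none. Squaring the loss rate is an arbitrary choice that sharpens the focus on hard opponents; `power=1` would spread matches more evenly.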

Things worth remembering

  • Vinyals played StarCraft pseudo-professionally in Europe around 1998 and was consistently top-32 in Europe for a couple of years.
  • He preferred to play 'random' race in tournaments to understand all three races and learn what annoys opponents from the other side.
  • Early-game exploration is brutally hard because nearly any random action (such as pulling workers off mining) is bad, so a randomly exploring agent almost never sees a win signal.
  • StarCraft runs at about 22 frames per second, creating an enormous number of time steps for the agent to reason over.
  • Programmatic StarCraft bots can issue 20,000-40,000 actions per minute, while top human pros do roughly 300-800.
  • Cloaked (invisible) units appear to humans as a subtle 'shimmer,' a space-time distortion, which is very hard to faithfully simulate for the AI.
  • Vinyals likened watching pro player MaNa lose without being able to find excuses to watching Kasparov lose to Deep Blue.
  • Image captioning emerged when Vinyals changed one line of code in a machine-translation model and it began producing captions overnight.
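The frame-rate and APM figures above can be related with simple arithmetic. At about 22 frames per second, a minute is roughly 22 × 60 = 1,320 frames. A 300 to 800 APM human therefore averages well under one action per frame (about 0.23 to 0.61), while a 20,000 to 40,000 APM bot would average roughly 15 to 30 actions per frame. The sketch below does that conversion and adds a sliding-window cap of the kind an APM cutoff implies. The 60-second window, the class name `ApmLimiter`, and the choice to drop excess actions are assumptions; the episode does not say how DeepMind enforced its cutoff.

```python
from collections import deque
from typing import Deque

FRAMES_PER_SECOND = 22  # approximate figure quoted in the episode


def actions_per_frame(apm: float, fps: float = FRAMES_PER_SECOND) -> float:
    """Convert actions per minute into average actions per game frame."""
    return apm / (fps * 60)


class ApmLimiter:
    """Hypothetical cap: at most `max_apm` actions in any trailing 60-second window."""

    def __init__(self, max_apm: int, fps: float = FRAMES_PER_SECOND) -> None:
        self.max_actions = max_apm
        self.window_frames = fps * 60  # frames in one minute of game time
        self.history: Deque[float] = deque()  # frame indices of accepted actions

    def try_act(self, frame: float) -> bool:
        """Return True and record the action if it fits under the cap."""
        # Forget accepted actions that have slid out of the trailing window.
        while self.history and frame - self.history[0] >= self.window_frames:
            self.history.popleft()
        if len(self.history) >= self.max_actions:
            return False  # over the cap: this action is dropped
        self.history.append(frame)
        return True
```

Rejected actions are simply dropped here; a real agent might instead delay them, and a real limit might use several shorter windows so that bursts inside one minute are also bounded.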