Lex Fridman · 2022-07-26 · 2h 10m

Oriol Vinyals: Deep Learning and Artificial General Intelligence | Lex Fridman Podcast #306

DeepMind's Oriol Vinyals on Gato, the generalist agent, and what it'll take to reach human-level AGI.

The guest

Oriol Vinyals — Research director and deep learning lead at DeepMind, and one of the most influential AI researchers of his generation, known for work on sequence models behind AlphaStar, AlphaFold, Flamingo, Chinchilla, and Gato.

The gist

Lex Fridman and Oriol Vinyals dig into the frontier of deep learning, centered on Gato, DeepMind's generalist agent that handles text, images, and actions with a single set of weights. They explore how tokenization unifies different modalities, why transformers and the attention mechanism became the dominant architecture, and the tension between training models from scratch and reusing weights through modularity (as Flamingo did with Chinchilla). The conversation covers meta-learning, in-context prompting as an evolution of nearest-neighbor classification, and emergent abilities that appear only past certain scale thresholds. Vinyals reflects on the role of humans, engineering, benchmarks, and data in driving progress, and weighs in on sentience claims, consciousness, and the path to AGI. He closes convinced that human-level AI is achievable in his lifetime, while urging society to prepare for the questions it raises.
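For readers unfamiliar with the nearest-neighbor baseline that in-context prompting is compared to, here is a minimal sketch. It is an illustration only, not anything shown in the episode: the feature vectors, labels, and the `nearest_neighbor` helper are all hypothetical. The handful of labeled examples plays the role of the "prompt", and the query is classified by whichever example it sits closest to.

```python
import math

# Hypothetical few-shot "support set": (feature vector, label) pairs.
# In a real system the features would come from a learned model.
support = [
    ((0.0, 0.0), "cat"),
    ((1.0, 1.0), "dog"),
    ((0.9, 1.2), "dog"),
]


def nearest_neighbor(support, query):
    """Return the label of the support example closest to the query."""
    best_label, best_dist = None, math.inf
    for features, label in support:
        d = math.dist(features, query)  # Euclidean distance
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


prediction = nearest_neighbor(support, (0.1, 0.2))
```

In-context prompting can be read as a learned, much more flexible version of this lookup: the examples are placed in the model's context instead of a table, and the model decides how to compare them with the query.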

Big reveals

  • Vinyals breaks down his viral tweet 'Gato is not the end, it's the beginning. Meow' and explains Gato was named for 'general agent' (and gato is Spanish for cat).
  • Reveals Gato is only ~1 billion parameters, far smaller than the trillion-parameter models of the era, yet handles text, vision, and actions with one shared brain.
  • Explains the core hard problem of the field: it's extraordinarily difficult to grow a network's weights rather than retraining from scratch every time.
  • Describes how Flamingo froze Chinchilla's 70B weights and bolted on ~10B new parameters to give a language model the ability to see.
  • As a human (not a DeepMind spokesperson), says he has never once thought current models are sentient and believes we're quite far from it.
  • Predicts a future civil rights movement for robots as people form deep relationships with AI systems that have names, stories, and memories.
  • States he is definitely convinced human-level intelligence will be achieved in his lifetime, though going beyond it via reward functions is less clear.
  • Recalls the hotel-room rehearsal of Ilya Sutskever's famous sequence-to-sequence talk, and says the original version was even more controversial.
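The Flamingo point above (freeze the pretrained weights, train only the new ones) can be sketched with a toy update step. This is a simplified illustration under stated assumptions, not Flamingo's actual training code: the parameter names, values, gradients, and the `sgd_step` helper are hypothetical. Only the 70B and 10B figures come from the notes.

```python
# Toy parameters; names and values are hypothetical stand-ins.
params = {
    "language_model.w": [0.5, -0.2],  # pretrained weights, kept frozen
    "vision_adapter.w": [0.0, 0.0],   # newly added weights, trained
}
frozen = {"language_model.w"}


def sgd_step(params, grads, frozen, lr=0.1):
    """Apply one gradient step, leaving frozen parameters untouched."""
    updated = {}
    for name, values in params.items():
        if name in frozen:
            updated[name] = list(values)  # copy through unchanged
        else:
            updated[name] = [v - lr * g for v, g in zip(values, grads[name])]
    return updated


# Gradients exist for every parameter, but only the adapter is updated.
grads = {"language_model.w": [1.0, 1.0], "vision_adapter.w": [1.0, -2.0]}
new_params = sgd_step(params, grads, frozen)

# Share of trainable parameters using the figures quoted in the notes.
frozen_count, new_count = 70e9, 10e9
trainable_fraction = new_count / (frozen_count + new_count)
```

With those figures, about one eighth of the combined model is trained, and the language model's existing abilities are left intact because its weights never change.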

Things worth remembering

  • Language models at the time held only about 2,000 words of working memory before they started forgetting earlier context.
  • Text tokenization techniques trace back to n-gram models from the 1950s; a typical English word maps to roughly two to five tokens.
  • In Gato, modalities occupy separate integer ranges (e.g. ~1-10,000 for text, the next block for images, the highest for actions) and only the learning algorithm connects them.
  • Flamingo could understand the subtlety of a joke image (Obama putting his foot on a scale) that Andrej Karpathy once said no computer vision system could grasp.
  • Flamingo showed an emergent ability to do arithmetic from pictures of numbers after being shown just a few examples.
  • The transformer architecture has stayed remarkably stable, changing very little in the roughly five years since 'Attention Is All You Need.'
  • Deep learning's rise partly hinged on luck: GPUs happened to exist for video games at the right time to power neural networks.
  • Emergent abilities show up as 'phase transitions' where performance stays random until a scale threshold, then jumps to non-random.
  • Vinyals notes statistical language modeling progress traces back to Shannon in the 1950s, and over that timescale progress isn't actually that fast.
  • Discusses Rich Sutton's 'bitter lesson' that general methods leveraging computation ultimately win, agreeing on scale but staying skeptical about search.
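The idea of modalities occupying separate integer ranges, noted above for Gato, can be sketched as a shared vocabulary with per-modality offsets. The block sizes, the zero-based layout, and the helper names below are assumptions chosen for illustration and are not Gato's real values.

```python
from typing import Tuple

# Hypothetical block sizes; each modality gets its own slice of one vocabulary.
SIZES = {"text": 10_000, "image": 10_000, "action": 1_000}
OFFSETS = {
    "text": 0,                                  # ids 0 .. 9,999
    "image": SIZES["text"],                     # ids 10,000 .. 19,999
    "action": SIZES["text"] + SIZES["image"],   # ids 20,000 .. 20,999
}


def to_shared_id(modality: str, local_id: int) -> int:
    """Map a modality-local token id into the shared integer range."""
    if not 0 <= local_id < SIZES[modality]:
        raise ValueError(f"{local_id} is out of range for {modality}")
    return OFFSETS[modality] + local_id


def from_shared_id(token: int) -> Tuple[str, int]:
    """Recover (modality, local id) from a shared token id."""
    for modality in ("action", "image", "text"):  # highest offset first
        if token >= OFFSETS[modality]:
            local_id = token - OFFSETS[modality]
            if local_id >= SIZES[modality]:
                raise ValueError(f"{token} is beyond the vocabulary")
            return modality, local_id
    raise ValueError(f"{token} is not a valid token id")


# One flat sequence mixing all three modalities.
sequence = [
    to_shared_id("text", 42),
    to_shared_id("image", 7),
    to_shared_id("action", 3),
]
```

The model itself only ever sees the flat list of integers. Nothing in the numbering says that a text token and an image token are related, which matches the point in the notes that only the learning algorithm connects the modalities.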