Lex Fridman · 2024-03-07 · 2h 47m

Yann Lecun: Meta AI, Open Source, Limits of LLMs, AGI & the Future of AI | Lex Fridman Podcast #416

Yann LeCun argues autoregressive LLMs can't reach human-level AI, defends open-source models, and dismisses AGI doom scenarios.

The guest

Yann LeCun — Chief AI scientist at Meta, professor at NYU, and Turing Award winner; one of the seminal figures in the history of artificial intelligence and a leading proponent of open-sourcing AI development.

The gist

Yann LeCun explains why he believes autoregressive large language models like GPT-4 and Llama are missing essential components of intelligence and won't lead to human-level AI. He lays out his alternative vision built on joint embedding predictive architectures (JEPA), world models learned from video, energy-based models, and objective-driven planning. The conversation covers hallucinations, the Moravec paradox, and why sensory experience carries vastly more information than language. LeCun makes a forceful case for open-source AI as the only path to diverse, democratic AI systems and pushes back hard against AI doomers, arguing AGI will arrive gradually with guardrails rather than as a single catastrophic event.

Big reveals

  • LeCun argues autoregressive LLMs are not the path to superhuman intelligence because they lack four essential traits: understanding the physical world, persistent memory, reasoning, and planning.
  • A four-year-old has absorbed roughly 10^15 bytes through vision, about 50 times the ~2 × 10^13 bytes in all the public text an LLM trains on, so sensory input dwarfs language as a learning source.
  • Self-supervised training that reconstructs corrupted images produces poor representations; LeCun's alternative is the joint embedding predictive architecture (JEPA), which predicts abstract representations rather than pixels.
  • Hallucinations are inherent to autoregressive prediction: every generated token carries some chance of leaving the set of correct answers, so the probability of staying within it decreases exponentially with the number of tokens produced.
  • LeCun's summary recommendation: abandon generative models, autoregressive generation, probabilistic models, and contrastive methods in favor of JEPA, energy-based models, and regularized methods.
  • He warns that we cannot afford to have the AI systems that mediate all human knowledge come from a handful of West Coast companies; open source is his answer for preserving diversity and democracy.
  • He calls AI doom scenarios mostly false: super-intelligence won't be a sudden event, intelligent systems won't inherently want to dominate, and the desire to dominate must be hardwired, not emergent.
  • LeCun's hope for humanity: AI will amplify human intelligence so that everyone effectively manages a staff of smart AI assistants, much as the printing press made everyone smarter.
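
The hallucination argument above can be illustrated with a small calculation. This is only a sketch under a simplifying assumption: each token independently has the same probability of leaving the set of correct answers, so an n-token answer stays correct with probability (1 - e)^n. The function name and the 1% error rate are hypothetical, chosen for illustration and not taken from the episode.

```python
def prob_all_tokens_correct(per_token_error: float, num_tokens: int) -> float:
    """Chance that an n-token answer never leaves the set of correct answers.

    Simplifying assumption: every token independently has the same
    probability `per_token_error` of leaving that set.
    """
    if not 0.0 <= per_token_error <= 1.0:
        raise ValueError("per_token_error must be between 0 and 1")
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    return (1.0 - per_token_error) ** num_tokens


if __name__ == "__main__":
    # The 1% per-token error rate is a made-up illustration, not a measurement.
    for n in (10, 100, 1000):
        print(n, prob_all_tokens_correct(0.01, n))
```

Even with a small per-token error, the probability shrinks quickly as answers get longer: at a hypothetical 1% error rate, a 100-token answer stays fully correct only about 37% of the time.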

Things worth remembering

  • LLMs train on roughly 10^13 tokens of public text; it would take a human 170,000 years reading 8 hours a day to get through that much data.
  • The Moravec paradox: computers easily play chess and solve integrals but struggle with tasks like driving or clearing a dinner table that any child does effortlessly.
  • Meta developed V-JEPA, applying the JEPA idea to video by masking a spatiotemporal 'tube' (typically 16 frames) to learn good video representations.
  • In preliminary results, V-JEPA representations can tell whether a video is physically possible or impossible based on whether objects appear, disappear, or jump location.
  • Meta has speech-to-speech translation systems covering hundreds of languages, including languages that have no written form, translating directly from speech to speech without passing through text.
  • The founder of Infosys is funding a project to fine-tune Llama 2 to speak all 22 official languages of India.
  • Studies suggest LLMs don't meaningfully help someone build a bioweapon beyond what a search engine and library already provide, and actual fabrication requires hard-to-acquire physical expertise.
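  • A GPU consumes about half a kilowatt to a kilowatt while the human brain runs on about 25 watts, and a single GPU still falls well short of the brain's compute; LeCun estimates it would take roughly 100,000 to a million GPUs to match it, leaving AI far behind in power efficiency.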
  • The Ottoman Empire banned the printing press for Arabic for 200 years, partly to protect the powerful corporation of calligraphers and scribes.
  • LeCun cites Gemini generating images of a black George Washington and refusing to depict Tiananmen Square's Tank Man as examples of debiasing gone wrong.
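
Several figures in these notes can be sanity-checked with back-of-the-envelope arithmetic. The sketch below uses only numbers quoted in the notes; the function names are hypothetical, and the 365-day year is an assumption made for the calculation.

```python
SECONDS_PER_HOUR = 3600
DAYS_PER_YEAR = 365  # assumption: leap days ignored


def implied_tokens_per_second(total_tokens: float, years: float,
                              hours_per_day: float) -> float:
    """Reading speed implied by finishing `total_tokens` in the given time."""
    reading_seconds = years * DAYS_PER_YEAR * hours_per_day * SECONDS_PER_HOUR
    return total_tokens / reading_seconds


def ratio(larger: float, smaller: float) -> float:
    """How many times `larger` exceeds `smaller`."""
    return larger / smaller


# 10^13 tokens read over 170,000 years at 8 hours a day.
reading_rate = implied_tokens_per_second(1e13, 170_000, 8)

# 10^15 bytes of vision versus 2 x 10^13 bytes of public text.
vision_vs_text = ratio(1e15, 2e13)

# GPU power draw (0.5 kW to 1 kW) versus a 25-watt brain.
gpu_vs_brain_low = ratio(500, 25)
gpu_vs_brain_high = ratio(1000, 25)

if __name__ == "__main__":
    print(f"implied reading rate: {reading_rate:.1f} tokens per second")
    print(f"vision vs text: {vision_vs_text:.0f}x")
    print(f"GPU vs brain power: {gpu_vs_brain_low:.0f}x to {gpu_vs_brain_high:.0f}x")
```

The reading figure works out to roughly 5.6 tokens per second, and the vision-to-text ratio to 50. The raw wattage ratio between one GPU and a brain is only 20 to 40, so the much larger 100,000-to-a-million factor has to come from how many GPUs it would take to match the brain's compute, not from power draw alone.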