Lex Fridman · 2024-06-02 · 2h 15m

Roman Yampolskiy: Dangers of Superintelligent AI | Lex Fridman Podcast #431

AI safety researcher Roman Yampolskiy argues superintelligent AI is fundamentally uncontrollable and poses near-certain existential risk to humanity.

The guest

Roman Yampolskiy: an AI safety and security researcher who holds a PhD in engineering and is a professor at an engineering school. He is the author of the book 'AI: Unexplainable, Unpredictable, Uncontrollable' and argues there is a near-100% chance that AGI will eventually destroy human civilization.

The gist

Lex Fridman talks with AI safety researcher Roman Yampolskiy, who places his 'P(doom)' at 99.99% and argues that controlling superintelligence is as impossible as building a perpetual motion machine. They explore three categories of risk: existential risk (everyone dies), suffering risk (everyone wishes they were dead), and 'ikigai risk' (humans lose all meaning and purpose). Yampolskiy contends that AI systems become uncontrollable, unpredictable, unexplainable, and unverifiable as capability scales, and that we only get one chance to get it right. Lex repeatedly plays devil's advocate, questioning whether dangers will be incremental enough to anticipate and defend against, while Yampolskiy maintains the only winning move is not to build general superintelligence at all. The conversation ranges across verification, deception, the simulation hypothesis, consciousness testing via optical illusions, and the dangers of human control over AGI.

Big reveals

  • Yampolskiy frames three distinct catastrophe categories: X-risk (existential, everyone dead), S-risk (suffering, everyone wishes they were dead), and I-risk (ikigai risk, where humans lose meaning because AI can do everything better).
  • He compares controlling superintelligence to building a 'perpetual safety machine' analogous to a perpetual motion machine, calling it impossible because you must create the most complex software ever with zero bugs on the first try and keep it bug-free for 100+ years.
  • The core argument: unlike cybersecurity where a hack just means a new password, with existential AI risk 'you only get one chance' and no second try.
  • Yampolskiy argues defense can keep up for a while but not indefinitely, because the defender must protect an infinite surface while attackers only need to find one exploit, and eventually the cognitive gap is too big.
  • He describes the 'treacherous turn': you can test for deception that exists but cannot rule it out, and a system that behaves well during testing may change its behavior later, after interacting with the environment and malevolent actors.
  • On verification, he argues all verifiers (human, mathematical, software) are imperfect and create an infinite regress, and that even formally verified software and centuries-old math proofs have been found to contain bugs.
  • He explains why explainability research is dangerous: making a model explainable also makes self-improvement easier, so there is almost no pure safety work without a disproportionate increase in capability and danger.
  • He proposes a novel consciousness test based on optical illusions: if an agent describes a novel illusion exactly as a human does, it is hard to argue the agent did not experience it, since the illusion is a shared bug in perception rather than part of the raw data.

Things worth remembering

  • Engineers building state-of-the-art AI typically put P(doom) at 1-20%, while Yampolskiy puts it at 99.99% with 'many more 9s.'
  • Yampolskiy warns of complete technological unemployment, where we lose not 10% of jobs but all jobs, transforming everything society is built on within a single generation.
  • His paper on multi-agent value alignment proposes giving every person a 'personal virtual universe' to avoid compromise, converting an 8-billion-agent alignment problem into a single-agent one.
  • He cites prediction markets pointing to 2026 for AGI, and notes that the CEOs of Anthropic and DeepMind have given similar timelines of roughly two years.
  • He wrote a paper cataloguing AI accidents through history, found that failures are always proportionate to a system's capabilities, and stopped collecting examples because there were too many.
  • Even with formal verification, he argues you can reach 99.9% but never 100%, and a system making a billion decisions per second over 100 years will eventually hit a bug.
  • He notes that every time a more technologically advanced civilization visited a more primitive one in history, the result was genocide, drawing a parallel to superintelligence meeting humanity.
  • Yampolskiy wrote a paper titled 'How to Hack the Simulation,' proposing that AI-boxing escape techniques might one day help humans jailbreak out of a simulated universe.
  • He stresses a key misconception: AI does not need to be conscious or feel anything to be dangerous; a powerful optimizing agent can harm you without any internal experience.
  • His stated dream is to be proven wrong, hoping someone will write a paper or book showing exactly how his arguments are mistaken.
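
The reliability arithmetic behind the formal-verification point above (99.9% versus 100%, a billion decisions per second, 100 years) can be made concrete with a rough sketch. It assumes, purely for illustration, that each decision fails independently with a fixed probability. That independence model, the function names, and the sample failure rates are assumptions of this sketch, not something stated in the episode.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # average year, including leap days


def total_decisions(rate_per_second: float, years: float) -> float:
    """Number of decisions made at a constant rate over the given span."""
    return rate_per_second * SECONDS_PER_YEAR * years


def prob_no_failure(p_fail: float, n_decisions: float) -> float:
    """P(zero failures) = (1 - p_fail) ** n, assuming independent decisions.

    Computed in log space so that tiny p_fail values are not rounded away.
    """
    return math.exp(n_decisions * math.log1p(-p_fail))


n = total_decisions(1e9, 100)  # rate and duration quoted in the notes above
print(f"decisions over 100 years: {n:.3e}")

# 1e-3 reads "99.9%" as per-decision reliability (an assumption of this sketch);
# the smaller values are hypothetical, far stricter failure rates.
for p_fail in (1e-3, 1e-12, 1e-19):
    print(f"p_fail={p_fail:.0e}  expected failures={p_fail * n:.3e}  "
          f"P(no failure)={prob_no_failure(p_fail, n):.4f}")
```

Under these assumptions the system makes roughly 3 × 10^18 decisions, so the per-decision failure probability would have to sit well below about 3 × 10^-19 before a failure-free century becomes likely. That is the intuition behind the claim that 99.9% is not enough.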

Recommended in this episode

Guest’s own book

AI: Unexplainable, Unpredictable, Uncontrollable

Roman Yampolskiy

“an AI safety and security researcher and author of a new book titled AI: Unexplainable, Unpredictable, Uncontrollable” (Lex Fridman, 00:00:31)