Lex Fridman · 2025-02-03 · 5h 06m

DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters | Lex Fridman Podcast #459

Dylan Patel and Nathan Lambert break down DeepSeek, AI training economics, export controls, NVIDIA, TSMC, and the global compute race.

The guests

Dylan Patel and Nathan Lambert: Dylan Patel runs SemiAnalysis, a research firm specializing in semiconductors, GPUs, and AI hardware. Nathan Lambert is a research scientist at the Allen Institute for AI (AI2) and author of the AI blog Interconnects.

The gist

Lex Fridman sits down with Dylan Patel and Nathan Lambert to dissect the DeepSeek moment that shook the AI world, explaining how DeepSeek V3 and R1 were trained so cheaply through mixture-of-experts and multi-head latent attention innovations. They walk through pre-training versus post-training, reinforcement learning with verifiable rewards, reasoning models, and why memory and interconnect now matter as much as flops. The conversation covers US-China geopolitics, semiconductor export controls, TSMC's central role in the global supply chain, and the case for and risks of those controls. They also analyze the economics of NVIDIA, the massive AI megacluster buildouts by Elon Musk, Meta, OpenAI, and Google, and the realities behind OpenAI's $500 billion Stargate announcement. The episode closes on open-source AI, agents, the future of software engineering, and broad optimism tempered by concerns about persuasion and concentration of power.

Big reveals

  • SemiAnalysis research believes DeepSeek actually has closer to 50,000 GPUs total, far more than the 2,000 H800s publicly claimed for the V3 pre-training run, with the rest shared across the hedge fund and research.
  • DeepSeek R1 is roughly 27 times cheaper than OpenAI's o1 (about $2 vs $60 per million output tokens), driven by genuine efficiency plus OpenAI's ~75%+ inference gross margins.
  • DeepSeek cannot actually serve its own model at scale — they stopped signups and users get under five tokens per second because they lack the GPUs.
  • Meta upstreamed a PyTorch operator (PowerPlantNoBlowup) that makes GPUs compute fake numbers during weight exchange so power transient spikes don't blow up the power plant.
  • Elon Musk built the world's largest single connected cluster — 200,000 GPUs in a converted Memphis appliance factory — using on-site natural gas generation, Tesla Megapacks, and 90 external water chillers.
  • OpenAI does not currently have the money for Stargate: the $500 billion figure is largely aspirational, the first $100B phase is mostly total-cost-of-ownership, and OpenAI is legally on the line for $19B while only having about $6B raised plus $4B debt.
  • Google actually has the biggest cluster overall, but spreads its TPUs across multiple data centers ~30 miles apart (Iowa/Nebraska) connected by high-bandwidth fiber rather than one building.
  • China only just released a roughly trillion-RMB (~$160 billion) AI subsidy and DeepSeek's CEO only recently met the second-in-command of China — suggesting Xi Jinping has not yet fully prioritized AI.
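
The pricing and Stargate figures above can be sanity-checked with a few lines of arithmetic. This is a minimal sketch that uses only the rounded numbers quoted in these notes, not official pricing. The split of the price gap into a margin factor and an efficiency factor is an illustration that assumes DeepSeek prices at cost, which the notes do not state; all variable names are made up for this example.

```python
# Back-of-the-envelope check of figures quoted in the bullets above.
# All inputs are the rounded numbers from these notes, not official pricing.

R1_PRICE = 2.0           # USD per million output tokens (quoted as "about $2")
O1_PRICE = 60.0          # USD per million output tokens
QUOTED_RATIO = 27.0      # "roughly 27 times cheaper"
O1_GROSS_MARGIN = 0.75   # quoted as "~75%+"

# Ratio from the rounded prices, and the R1 price the quoted ratio implies.
ratio_from_rounded = O1_PRICE / R1_PRICE
implied_r1_price = O1_PRICE / QUOTED_RATIO

# ASSUMPTION (not from the episode notes): DeepSeek prices at cost.
# Under that assumption the price gap splits into a margin factor
# and a residual efficiency factor.
margin_factor = 1.0 / (1.0 - O1_GROSS_MARGIN)
efficiency_factor = QUOTED_RATIO / margin_factor

# Stargate: commitment versus funds on hand, in billions of USD.
stargate_commitment = 19.0
funds_on_hand = 6.0 + 4.0   # equity raised plus debt
funding_gap = stargate_commitment - funds_on_hand

print(f"ratio from rounded prices: {ratio_from_rounded:.1f}x")
print(f"R1 price implied by 27x:   ${implied_r1_price:.2f} per million tokens")
print(f"margin factor:             {margin_factor:.2f}x")
print(f"residual efficiency:       {efficiency_factor:.2f}x")
print(f"Stargate funding gap:      ${funding_gap:.0f}B")
```

The rounded prices give 30x rather than 27x, so the quoted 27x implies an R1 price nearer $2.22 than $2 per million output tokens. The Stargate gap is simply the $19B commitment minus the $10B of equity and debt mentioned above.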

Things worth remembering

  • DeepSeek's model has ~600 billion parameters but only activates around 37 billion per token thanks to its high-sparsity mixture-of-experts design (8 of 256 experts versus the more typical 2 of 8).
  • DeepSeek wrote custom low-level GPU scheduling in PTX, a level below NVIDIA's NCCL ("nickel") communication library, to work around the reduced interconnect bandwidth of the China-restricted H800 chips.
  • An early AI2 model training loss spike was traced to a subreddit called 'microwave gang' where users post extremely long sequences of the letter M.
  • Answering an ARC-AGI question with OpenAI o3 cost roughly $5 to $20 per question using about 1,000 samples — a thousand-to-ten-thousand-fold cost difference versus a normal chat query.
  • The cost to run inference at GPT-3 level intelligence has fallen about 1,200x in roughly three years, from ~$60 per million tokens to a few cents.
  • There are only three places in the world doing leading-edge semiconductor R&D: Hsinchu (Taiwan), Hillsboro (Oregon), and Pyeongtaek (South Korea).
  • Single men in rural China face roughly a 30-to-1 male-to-female ratio, one of several factors raised in the discussion of geopolitical instability.
  • GPUs are being smuggled into China at small scale by people flying first class from San Francisco carrying Supermicro server boxes, since the markup more than pays for the ticket.
  • DeepSeek R1's reasoning on a 'truly novel insight about humans' concluded humans convert selfish desires into cooperative systems by collectively pretending money, laws, and rights are real shared hallucinations.
  • AWS generates over 80% (likely over 90%) of Amazon's profit, and four of Amazon's top five gross-profit products are database-related.
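
The mixture-of-experts bullet at the top of this list (8 of 256 experts active versus the more typical 2 of 8) can be illustrated with a generic top-k router. This is a sketch under stated assumptions, not DeepSeek's implementation: the softmax-over-selected-logits gate is a common textbook choice, the router scores are random stand-ins, and the names `top_k_route` and `active_fraction` are hypothetical.

```python
import math
import random


def top_k_route(logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Generic top-k gating: pick the k largest router logits, then apply a
    softmax over just those k so the mixing weights sum to 1.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def active_fraction(k, num_experts):
    """Fraction of routed experts that run for a single token."""
    return k / num_experts


# Random scores stand in for a learned router's output for one token.
rng = random.Random(0)
router_logits = [rng.gauss(0.0, 1.0) for _ in range(256)]
routes = top_k_route(router_logits, k=8)

high_sparsity = active_fraction(8, 256)  # configuration quoted for DeepSeek
typical = active_fraction(2, 8)          # configuration quoted as more typical

print(f"experts chosen for this token: {sorted(i for i, _ in routes)}")
print(f"8 of 256 -> {high_sparsity:.2%} of experts active per token")
print(f"2 of 8   -> {typical:.2%} of experts active per token")
print(f"37B of ~600B parameters -> {37 / 600:.2%} of parameters active")
```

The expert fraction (about 3%) is lower than the parameter fraction quoted above (37B of ~600B, about 6%). That is to be expected in a mixture-of-experts model, since weights outside the routed experts, such as attention layers, run for every token.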

Recommended in this episode

Books, products and media the guests or host endorsed in this episode.

Claude Sonnet 3.5

Anthropic (inferred)

“For me personally, I find that Claude Sonnet 3.5 is the best model for programming, except for tricky cases where I will use o1 Pro” (Lex Fridman, 00:02:37)