Lex Fridman · 2024-10-06 · 2h 29m

Cursor Team: Future of Programming with AI | Lex Fridman Podcast #447

The Cursor founders break down how AI is reshaping programming, from tab-completion and diffs to scaling laws, bug-finding, and the future of coding.

The guests

Michael Truell, Sualeh Asif, Arvid Lunnemark, and Aman Sanger (Anysphere / Cursor team): the founding members of Anysphere, makers of Cursor, an AI-assisted code editor built as a fork of VS Code. All four were originally Vim users. They started building Cursor after seeing GPT-4's capabilities, convinced that all of programming would flow through these models.

The gist

Lex Fridman talks with the four Cursor founders about the present and future of AI-assisted programming. They explain Cursor's origins as a VS Code fork, the technical guts of features like Cursor Tab (next-action prediction), the 'apply' model, speculative edits, and KV caching, plus how they make everything feel fast. The conversation ranges across which LLMs are best at coding, why benchmarks diverge from real programming, agents, bug-finding, formal verification, scaling laws, synthetic data, test-time compute, and the infrastructure challenges of indexing huge codebases. They close with a vision of a 'human in the driver's seat' future where programmers inject intent at high bandwidth and programming becomes more fun.

Big reveals

  • Cursor's origin traces to the 2020 OpenAI scaling laws papers and a decisive 'step up in capabilities' when the team got early access to GPT-4 at the end of 2022.
  • They build the UX and train the models in-house simultaneously, with the person making the UI and the person training the model often sitting 18 feet away or being the same person.
  • Cursor Tab's core idea is eliminating all 'low-entropy' / 'zero-entropy' keystrokes by predicting the next edit and next location, so you can keep pressing tab to jump forward.
  • Cursor runs an ensemble of custom-trained models alongside frontier models; the 'apply' step (turning a rough sketch into a real diff) requires a custom model because deterministic matching fails over 40% of the time.
  • They make apply fast using speculative edits, a variant of speculative decoding that feeds chunks of the original code back into the model to process many lines in parallel.
  • Even the smartest models (including o1) are 'incredibly poorly calibrated' and bad at bug-finding because real bug-detection examples barely exist in pretraining data.
  • Arvid is excited about homomorphic encryption for LLM inference as an alternative to local models, warning that as models get more useful, the world's information will flow through one or two centralized actors.
  • Their vision keeps the programmer 'in the driver's seat' with control and speed, rejecting the pure text-box-to-engineering-department model because it abdicates too many important decisions.
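The speculative-edits idea from the list above can be illustrated with a toy simulation. This is a minimal sketch, not Cursor's implementation: tokens are whole lines, `make_stub_model` is a hypothetical stand-in for an LLM that always continues toward a known edited file, and `realign` is an assumed heuristic for resuming speculation after an edit. Each loop iteration stands for one forward pass in which a real model would check the whole draft chunk in parallel.

```python
from typing import Callable, List, Optional, Tuple

NextToken = Callable[[List[str]], Optional[str]]


def make_stub_model(target: List[str]) -> NextToken:
    """Hypothetical stand-in for an LLM: always continues toward `target`."""
    def next_token(prefix: List[str]) -> Optional[str]:
        return target[len(prefix)] if len(prefix) < len(target) else None
    return next_token


def realign(original: List[str], pos: int, token: str, window: int = 16) -> int:
    """Assumed heuristic: if the model's token appears soon in the original,
    resume drafting just after it; otherwise keep the draft cursor in place."""
    for j in range(pos, min(len(original), pos + window)):
        if original[j] == token:
            return j + 1
    return pos


def speculative_edit(original: List[str], next_token: NextToken,
                     chunk: int = 4) -> Tuple[List[str], int]:
    """Rewrite `original`, using chunks of it as the speculative draft.
    Returns the model's output and the number of simulated forward passes."""
    out: List[str] = []
    pos = 0       # cursor into the original code (the draft source)
    passes = 0
    while True:
        draft = original[pos:pos + chunk]
        passes += 1  # one pass verifies every draft position at once
        accepted = 0
        pred: Optional[str] = None
        for tok in draft:
            pred = next_token(out + draft[:accepted])
            if pred != tok:
                break            # the model disagrees with the original here
            accepted += 1
        else:
            # Whole draft accepted: the same pass also yields the next token.
            pred = next_token(out + draft)
        out.extend(draft[:accepted])
        pos += accepted
        if pred is None:         # the model signalled end of file
            return out, passes
        out.append(pred)         # always keep the model's own token
        pos = realign(original, pos, pred)


original = ["a", "b", "c", "d", "e", "f", "g", "h"]
edited = ["a", "b", "X", "d", "e", "f", "g", "h"]   # one line substituted
result, passes = speculative_edit(original, make_stub_model(edited))
print(result == edited, passes)
```

Because every emitted token is either verified against the model or produced by it, a wrong guess in `realign` only costs speed, never correctness. In this toy run the unchanged lines are accepted in chunks, so the eight-line file finishes in fewer passes than one-line-at-a-time decoding would need.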

Things worth remembering

  • GitHub Copilot, with a beta out in 2021, is described as the first real LLM consumer product and the first killer app for language models.
  • Aman bet around June 2022 that AI models would win an IMO gold medal by 2024. The others thought he was delusional, but he turned out to be nearly right: DeepMind's 2024 system finished one point short of gold.
  • Code has lower bits-per-byte (character-normalized loss) than natural language, meaning many tokens in code are highly predictable.
  • Cursor Tab uses a sparse mixture-of-experts model because the task is extremely prefill-token-hungry: huge input prompts but small output.
  • A conspiracy theory the guests discuss: Claude's perceived 'degraded performance' may stem from quantized versions on AWS Bedrock having different numerics than Anthropic's own GPUs.
  • Cursor has an internal system called 'Preempt' that uses a React/JSX-style declarative approach (with priorities and Z-index-like ordering) to render prompts that fit dynamic context windows.
  • To keep a local codebase in sync with the server cheaply, Cursor uses a Merkle tree of file/folder hashes. Client and server compare only the root hash and descend into subtrees only when it mismatches, which avoids hammering the database or the user's Wi-Fi.
  • Cursor never stores user code on its servers, only vectors in a vector database, and caches embeddings keyed by chunk hash so the Nth person at a company gets fast indexing without re-embedding.
  • Over 80% of Cursor users are on Windows machines, many not very powerful, which is a major reason local model inference is impractical.
  • The team recently spent about five days migrating their codebase from Node.js async local storage (known to be non-performant) to a context object, a task they hope future AI tools could do in 10 minutes.
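The Merkle-tree sync described in the list above can be sketched as follows. This is an illustrative sketch under stated assumptions, not Cursor's code: the nested-dict tree layout, the `Node` class, and the `build` and `diff` helpers are all hypothetical names, and SHA-256 is an arbitrary choice of hash. The point it shows is that a matching hash lets the comparison skip an entire subtree.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union

# Assumed in-memory layout: a folder is a dict, a file is its bytes.
Tree = Union[bytes, Dict[str, "Tree"]]


@dataclass
class Node:
    digest: str
    children: Dict[str, "Node"] = field(default_factory=dict)


def build(tree: Tree) -> Node:
    """Hash files by content and folders by their children's names and hashes."""
    if isinstance(tree, bytes):
        return Node(hashlib.sha256(b"file:" + tree).hexdigest())
    children = {name: build(sub) for name, sub in sorted(tree.items())}
    h = hashlib.sha256(b"dir:")
    for name, child in children.items():
        h.update(name.encode() + b"\0" + child.digest.encode() + b"\n")
    return Node(h.hexdigest(), children)


def diff(local: Optional[Node], remote: Optional[Node], path: str = "") -> List[str]:
    """Return paths that differ, descending only where hashes mismatch."""
    if local is not None and remote is not None and local.digest == remote.digest:
        return []  # identical subtree: nothing below needs checking
    local_kids = local.children if local else {}
    remote_kids = remote.children if remote else {}
    if not local_kids and not remote_kids:
        return [path]  # a file (or empty folder) added, removed, or changed
    changed: List[str] = []
    for name in sorted(set(local_kids) | set(remote_kids)):
        child_path = f"{path}/{name}" if path else name
        changed += diff(local_kids.get(name), remote_kids.get(name), child_path)
    return changed


server = build({"README.md": b"hi", "src": {"a.py": b"print(1)", "b.py": b"print(2)"}})
client = build({"README.md": b"hi", "src": {"a.py": b"print(1)", "b.py": b"print(3)"}})
print(diff(client, server))
```

When nothing has changed, the check costs a single root-hash comparison. When one file changes, only the hashes along the path from the root to that file are compared, so only that file needs to be re-sent.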

Recommended in this episode

Books, products, and media the guests or host endorsed in this episode.


Guest’s own product

Cursor

Anysphere (inferred)

“first up big ridiculous question what's the point of a code editor so the code editor is largely the place where you build software” — Michael Truell 00:01:03
Recommended product

Claude 3.5 Sonnet

Anthropic

“and kind of coding capabilities the one that I'd say right now is just kind of net best is Sonnet” — Aman Sanger 00:37:26
Recommended product

Amazon Web Services (AWS)

Amazon

“AWS is just really really good it's really good like whenever you use an AWS product you just know that it's going to work” — Arvid Lunnemark 01:28:33