Lex Fridman · 2024-04-17 · 2h 50m

Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs | Lex Fridman Podcast #426

MIT psycholinguist Ted Gibson explains why human languages share deep structural patterns, how language differs from thought, and what LLMs reveal about form versus meaning.

The guest

Edward Gibson — Ted Gibson is a psycholinguistics professor at MIT who heads the MIT Language Lab investigating why human languages look the way they do. He approaches language from a computer-science and mathematical angle, studying syntax, dependency grammar, and the cognitive cost of communication.

The gist

Ted Gibson lays out his theory that human language is an invented communication system optimized to minimize the distance between dependent words, a pattern that holds across roughly 60 analyzed languages. He contrasts his data-driven, dependency-grammar approach with Noam Chomsky's movement-based, innateness-focused theory of grammar. Drawing on fMRI work from Ev Fedorenko's lab, he argues that language comprehension relies on a brain network separate from the networks used for thinking, with implications for how we interpret large language models. He explains why legalese is so hard to understand (center-embedded clauses), how color and number words evolve from communicative need, and what his fieldwork with Amazonian groups such as the Piraha, who have no exact counting words, has revealed. The conversation closes on machine translation, animal communication, and language as cultural identity.

Big reveals

  • The Piraha people of the Amazon have no words for exact numbers at all, not even a word for one, so a request like 'I want two of those' literally cannot be expressed in their language.
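  • Of roughly 1,000 languages with documented word order, about 95% fit a 'harmonic' pattern (heads consistently placed before, or consistently after, their dependents), splitting nearly half-and-half between head-initial, verb-before-object languages (like English) and head-final, verb-after-object languages (like Japanese).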
  • fMRI evidence shows the brain's language network is separate from the networks used for thinking: tasks like chess, math, music, and programming activate other regions entirely, and people with global aphasia, who have lost their language network, can still do all these tasks.
  • Legalese is uniquely hard because about 70-80% of contract and law sentences contain center-embedded clauses (versus ~20% in other texts), and removing that nesting dramatically improves comprehension even for lawyers.
  • Gibson argues that large language models may be the best current theory of human language because they predict which English sentences are grammatical better than any other theory, even though they capture form rather than meaning.
  • LLMs fail a modified Monty Hall problem, insisting you should trade even when you already know the prize location with 100% certainty, because they are locked onto the form they have seen hundreds of times.
  • Languages die primarily for economic reasons: the Moseten language is dying because Spanish has more value for feeding families, showing language survival is driven by money and community function, not aesthetics.
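To make the center-embedding point concrete, here is a minimal sketch that scores a sentence by total dependency length: the sum, over every word, of the distance in words between that word and its head. This is one simple way to count the distance cost Gibson's theory targets, not his exact metric. The test sentence is a classic object-relative example from the psycholinguistics literature; the dependency heads are a simplified hand analysis, the "flattened" rewrite is our own paraphrase (neither comes from the episode), and `total_dependency_length` is a hypothetical helper name.

```python
# Each sentence is a list of (word, head_index) pairs; head_index is None for the root.
# The head assignments are a simplified, hand-made dependency analysis (an assumption).

def total_dependency_length(parse):
    """Sum of |position(word) - position(head)| over all non-root words."""
    return sum(abs(i - head) for i, (_, head) in enumerate(parse) if head is not None)

# Center-embedded: the relative clause "who the senator attacked" sits
# between the subject "reporter" and its verb "admitted".
center_embedded = [
    ("The", 1), ("reporter", 6), ("who", 5), ("the", 4), ("senator", 5),
    ("attacked", 1), ("admitted", None), ("the", 8), ("error", 6),
]

# Hypothetical rewrite with the nesting removed (our paraphrase, same propositions).
flattened = [
    ("The", 1), ("senator", 2), ("attacked", None), ("the", 4), ("reporter", 2),
    ("who", 6), ("admitted", 4), ("the", 8), ("error", 6),
]

print(total_dependency_length(center_embedded))  # 18
print(total_dependency_length(flattened))        # 11
```

The nested version scores higher mainly because "reporter" has to wait five words for its verb "admitted", while the rewrite keeps every word within two positions of its head. The score counts distance only; it is a rough proxy, not a full model of reading difficulty.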

Things worth remembering

  • English has about 11 universally known color words, while the Dani of New Guinea label only two (roughly black and white) and the Tsimane of Bolivia know between three and seven; red is almost always the third color word a culture adds.
  • Noam Chomsky invented phrase structure grammar and formal language theory in the late 1950s, and in 1971 showed that grammars with both phrase structure and movement create learnability problems, which led him to argue grammar must be innate.
  • Ted Gibson is married to MIT neuroscientist Ev Fedorenko and has been scanned by her repeatedly since around 2007; his language network has stayed essentially identical over those years, as stable as his face.
  • Constructed languages like Klingon and the Game of Thrones languages activate the same brain language network as natural languages, because they were built to function like real human languages.
  • Gibson reports he has no inner voice while thinking, even though polls suggest about 70-80% of people do experience one.
  • The visual word form area is a specialized brain region for reading that you only develop if you learn to read, demonstrating that brain modularization does not require innateness.
  • According to Gibson, Claude Shannon, who introduced information theory around 1948, was originally interested in human language but moved to Bell Labs because communication-based theories of language were unpopular at the time.
  • Gibson's former student Richard Futrell showed that, in each of roughly 60 languages with parsed dependency structures, real sentences have shorter dependency lengths than randomly scrambled control versions.
  • Gibson did fieldwork with the Piraha (with Dan Everett) and the Tsimane, both 'isolate' languages with no known relatives, which survive partly because the Amazon has no earthquakes or droughts forcing population movement and language contact.
  • Gibson guesses that Hemingway has the shortest average per-sentence dependency length among famous authors, because his short, simple sentences mostly connect neighboring words.
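The random-scramble comparison behind Futrell's result can be sketched in a few lines: measure a sentence's total dependency length in its real word order, then compare it with the average over random reorderings of the same words and dependencies. This is a simplified sketch under stated assumptions: the parse is a hand-made analysis, the baseline shuffles words completely freely (the published baselines constrain the random orders more carefully, for example keeping trees projective), and `total_dependency_length` and `random_baseline` are hypothetical helper names.

```python
import random
import statistics

def total_dependency_length(heads, order=None):
    """heads[i] is the index of word i's head (None for the root).
    order[i] is the position word i is placed at (defaults to the observed order)."""
    if order is None:
        order = list(range(len(heads)))
    return sum(abs(order[i] - order[h]) for i, h in enumerate(heads) if h is not None)

def random_baseline(heads, trials=10_000, seed=0):
    """Mean total dependency length when the words are shuffled uniformly at random.
    Simplification: real studies use more constrained random orders."""
    rng = random.Random(seed)
    n = len(heads)
    lengths = []
    for _ in range(trials):
        order = list(range(n))
        rng.shuffle(order)  # order[i] = new position of word i; the tree itself is unchanged
        lengths.append(total_dependency_length(heads, order))
    return statistics.mean(lengths)

# "The old man saw the dog in the park" with a simplified, hand-made parse (an assumption).
heads = [
    2,     # The  -> man
    2,     # old  -> man
    3,     # man  -> saw
    None,  # saw  (root)
    5,     # the  -> dog
    3,     # dog  -> saw
    8,     # in   -> park
    8,     # the  -> park
    3,     # park -> saw
]

observed = total_dependency_length(heads)
baseline = random_baseline(heads)
print(observed)            # 15
print(round(baseline, 1))  # about 26.7 (the exact expectation is 8 * (9 + 1) / 3)
```

The real order comes out well below the scrambled average. Averaging the observed score per sentence over a whole corpus would give the kind of per-author number Gibson speculates about for Hemingway.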

Recommended in this episode


Guest’s own book

Syntax: A Cognitive Approach

Edward Gibson

“He should have a book titled Syntax: A Cognitive Approach, published by MIT Press, coming out this fall, so look out for that.” — Lex Fridman 00:00:32