Lex Fridman · 2020-02-15 · 1h 19m

Complete Statistical Theory of Learning (Vladimir Vapnik) | MIT Deep Learning Series

Vladimir Vapnik unveils his complete statistical theory of learning, arguing intelligence lives in smart predicates, not brute-force data.

The guest

Vladimir Vapnik — Co-inventor of support vector machines and VC (Vapnik-Chervonenkis) theory, and author of 'Statistical Learning Theory.' One of the most influential statisticians and computer scientists in machine learning, who began his career in the Soviet Union.

The gist

In this MIT Deep Learning Series lecture, Vladimir Vapnik presents what he calls the 'complete' statistical theory of learning. He reviews classical VC theory and the role of VC-dimension, then introduces a second, intelligence-based principle built on 'weak convergence' and statistical invariants rather than brute-force data. Using the metaphor of the duck test, he argues that 'predicates' (abstract properties like symmetry) plus invariants drawn from data let a learner generalize from far fewer examples. He shows closed-form solutions in reproducing kernel Hilbert space, ties the idea to support vector machines and neural nets, and frames the search for smart predicates as the true essence of intelligence.

Big reveals

  • Vapnik says about five years ago he discovered a second learning principle beyond minimizing training error, an 'intelligent' principle versus the 'brute force' data principle.
  • He claims there are only two ways to generalize, data and invariants, so combining both yields a 'complete' theory with no third option possible.
  • He grounds the whole approach in the duck test: admissible functions are those that classify an animal as a duck if it looks, swims, and quacks like one.
  • Contrarian claim: more predicates DECREASE VC-dimension and reduce overfitting, the opposite of adding more features.
  • Vapnik argues the representer theorem implies a one-layer network may suffice, casting doubt on deep networks: 'I am not fond of neural nets.'
  • On a diabetes dataset, adding a targeted invariant reduced the error from 0.73 to 0.07 by patching the region where the invariants were violated.
  • He invokes Vladimir Propp's finding that just 31 predicates can synthesize all Russian folk tales, suggesting a small predicate set could describe all 2D images.
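The idea of combining data with invariants can be made concrete with a small sketch. A statistical invariant for a predicate ψ asks that the average of ψ(x)·f(x) over the training set match the average of ψ(x)·y. The code below is only an illustration under stated assumptions, not the exact estimator from the lecture: it uses a Gaussian kernel expansion f(x) = Σ αᵢ k(xᵢ, x), drops the bias term and any extra weighting matrix, and enforces the invariants softly through a penalty weight `tau`. The function names, the `reg` and `tau` parameters, and the scaling of the penalty are all hypothetical choices made for this example.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fit_with_invariants(X, y, predicates, reg=1e-2, tau=10.0, gamma=1.0):
    """Closed-form kernel fit with softly enforced invariants (illustrative only).

    Minimizes over alpha:
        (K a - y)^T (I + tau * P) (K a - y) + reg * a^T K a
    where P = Psi Psi^T / n and column s of Psi holds predicate s evaluated
    on the training points. Setting the gradient to zero gives
        ((I + tau * P) K + reg * I) a = (I + tau * P) y.
    """
    n = len(y)
    K = rbf_kernel(X, X, gamma)
    if predicates:
        Psi = np.column_stack([p(X) for p in predicates])  # shape (n, m)
    else:
        Psi = np.zeros((n, 0))                             # no invariants
    P = Psi @ Psi.T / n          # assumption: 1/n scaling of the penalty
    W = np.eye(n) + tau * P
    return np.linalg.solve(W @ K + reg * np.eye(n), W @ y)

def predict(X_train, alpha, X_new, gamma=1.0):
    """Evaluate f(x) = sum_i alpha_i k(x_i, x) at the rows of X_new."""
    return rbf_kernel(X_new, X_train, gamma) @ alpha

def invariant_gap(psi_values, f_values, y):
    """|mean(psi * f) - mean(psi * y)|: how far one invariant is from holding."""
    return abs(np.mean(psi_values * f_values) - np.mean(psi_values * y))
```

With an empty predicate list this reduces to ordinary kernel ridge regression. Passing a predicate such as `lambda Z: np.ones(len(Z))` asks that the average prediction match the observed frequency of the positive class, and raising `tau` pushes the fit toward satisfying that invariant more closely.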

Things worth remembering

  • Vapnik began statistical learning theory with Professor Chervonenkis about 50 years ago.
  • Empirical risk minimization over a set of functions is consistent for every distribution if and only if the set's VC-dimension is finite.
  • VC-dimension can be smaller than the dimensionality of the space, so it can be partly controlled.
  • Vapnik notes that the indicator loss cannot be minimized by gradient methods, because its gradient is zero everywhere except at the points where the loss jumps.
  • Kolmogorov found in 1933 an exact bound justifying replacing the unknown distribution with the empirical distribution.
  • The classical Nadaraya-Watson estimator, Vapnik shows, is actually the solution of a 'corrupted' version of the correct equation.
  • Vapnik says the hardest part of scientific discovery, as in physics, is finding the contradictory situation where invariants fail.
  • He traces his predicate philosophy from Plato's 'vault of ideas' through Hegel to Wigner's 'unreasonable effectiveness of mathematics.'
  • His challenge: match deep networks' 0.5% error rate using just 1% of the 60,000 training examples (that is, 600 examples).
  • He answers a question about overfitting by noting that an infinite number of predicates would leave exactly one admissible function.
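For readers unfamiliar with the Nadaraya-Watson estimator mentioned above, here is a minimal sketch of the classical form: a kernel-weighted average of the training targets around each query point. It shows only the textbook estimator, not the corrected equation Vapnik derives in the lecture. The one-dimensional inputs, the Gaussian kernel, and the `bandwidth` parameter are assumptions made for this example.

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_query, bandwidth=0.2):
    """Classical Nadaraya-Watson estimate at each query point (1-D inputs).

    Each prediction is a weighted average of y_train, with Gaussian weights
    that decay with the distance between the query and the training point.
    """
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_query = np.asarray(x_query, dtype=float)
    diff = (x_query[:, None] - x_train[None, :]) / bandwidth
    weights = np.exp(-0.5 * diff**2)          # shape (n_query, n_train)
    # Note: a query very far from all training points can make the
    # denominator underflow to zero; this sketch does not guard against it.
    return (weights @ y_train) / weights.sum(axis=1)
```

When the targets are 0/1 class labels, the output can be read as an estimate of the conditional probability of class 1 at the query point, and it always lies between the smallest and largest training target.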

Recommended in this episode

Guest’s own book

Statistical Learning Theory

Vladimir Vapnik

“VC theory of statistical learning, and author of "Statistical Learning Theory". He's one of the greatest and most impactful statisticians” — Lex Fridman 00:00:00