Foundations and Challenges of Deep Learning (Yoshua Bengio)

The guest

Yoshua Bengio — Pioneering deep learning researcher, University of Montreal professor, and co-author of the Deep Learning textbook.

The gist

In this lecture-style talk, Yoshua Bengio lays out the high-level foundations of why deep learning succeeds and the major challenges that remain. He argues that neural nets beat the curse of dimensionality through compositionality: distributed representations and depth let models distinguish an exponential number of regions with only a linear number of parameters. He revisits the optimization landscape, explaining that in high dimensions training is dominated by saddle points rather than bad local minima, and that the local minima that do exist are mostly near-optimal. He closes by framing unsupervised learning, disentangling factors of variation, and reconnecting machine learning with neuroscience as the biggest open challenges ahead.
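
To make the "exponential regions, linear parameters" claim concrete, here is a minimal sketch that is not taken from the talk. It assumes a toy setup in which each of n features thresholds one input coordinate, and compares the number of distinct binary codes with a local (one-region-per-unit) representation. The function names `max_regions` and `count_codes` are hypothetical; `max_regions` uses the classical count of regions formed by n hyperplanes in general position in d dimensions.

```python
from itertools import product
from math import comb


def max_regions(n_features: int, input_dim: int) -> int:
    """Maximum number of regions cut out by n hyperplanes in general position in R^d."""
    return sum(comb(n_features, k) for k in range(input_dim + 1))


def count_codes(n_features: int) -> int:
    """Count distinct binary codes from toy features h_i(x) = [x_i > 0].

    Assumption: one threshold feature per input coordinate, evaluated on the
    corners of the cube {-1, +1}^n so that every orthant is visited once.
    """
    corners = product((-1.0, 1.0), repeat=n_features)
    codes = {tuple(x_i > 0 for x_i in x) for x in corners}
    return len(codes)


if __name__ == "__main__":
    print("units  local_regions  distributed_codes  hyperplane_bound")
    for n in (2, 4, 8, 10):
        local = n  # a one-hot / prototype code separates one region per unit
        print(n, local, count_codes(n), max_regions(n, n))
```

The local code grows linearly with the number of units, while the distributed code grows as 2^n in this toy case. When the input dimension is fixed and small the bound is only polynomial in n, so the exponential gain depends on the features being able to vary independently.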

Big reveals

The only way to beat the exponential curse of dimensionality is to use another exponential by making models compositional.
00:02:39
Distributed representations let the number of distinguishable regions grow exponentially while parameters grow only linearly.
00:11:42
In high-dimensional parameter spaces, training mostly encounters saddle points rather than bad local minima.
00:27:13
Most local minima in large neural nets concentrate in a narrow band just above the global minimum.
00:32:03
Bengio names unsupervised learning as the biggest challenge ahead for AI.
00:38:16
Model-based reinforcement learning needs unsupervised learning because dangerous states (like fatal driving mistakes) can never be sampled enough.
00:46:09
Backpropagation, while powerful, has no clear biological implementation in the brain.
00:59:28
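
The saddle-point reveal above can be illustrated with a rough simulation, offered as a sketch under a strong assumption rather than as the analysis from the talk: treat the Hessian at a critical point as a random symmetric matrix with Gaussian entries. A critical point is a local minimum only if every eigenvalue is positive, which the code checks with a plain Cholesky factorization. The helper names and the Gaussian model are illustrative choices, not something the talk specifies.

```python
import random


def is_positive_definite(a):
    """Return True if symmetric matrix a (list of lists) admits a Cholesky factor."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False  # a non-positive pivot means some eigenvalue is <= 0
                low[i][j] = s ** 0.5
            else:
                low[i][j] = s / low[j][j]
    return True


def random_symmetric(n, rng):
    """Toy stand-in for a Hessian: symmetric matrix with N(0, 1) entries."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            a[i][j] = a[j][i] = rng.gauss(0.0, 1.0)
    return a


def fraction_minima(n, trials=2000, seed=0):
    """Fraction of random 'critical points' in n dimensions that are minima."""
    rng = random.Random(seed)
    hits = sum(is_positive_definite(random_symmetric(n, rng)) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 6):
        print(n, fraction_minima(n))
```

Under this toy model the fraction of critical points that are minima drops quickly as the dimension grows, so almost every critical point has at least one descending direction and is a saddle. Real loss surfaces are not random matrices, so this only conveys the intuition.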

Things worth remembering

A 2012 experiment in Antonio Torralba's lab at MIT found hidden units in a scene-recognition net that humans could assign clear semantic interpretations to.
00:16:25
In the 1990s many researchers abandoned neural nets due to theory showing exponentially many local minima in the training objective.
00:25:42
As network size increases, the distribution of achievable training costs concentrates around a single good value.
00:29:22
A two or three year old understands intuitive Newtonian physics without ever being taught equations, purely from unsupervised learning.
00:39:52
Bengio's lab has a paper applying supervised-style credit assignment ideas to sequence prediction in reinforcement learning.
00:51:57
Bengio proposed 'target prop,' a way of generalizing backprop by propagating targets to each layer.
01:00:34
Recent work on gradient estimation in deep recurrent nets produces updates resembling spike-timing-dependent plasticity (STDP) observed by neuroscientists.
01:00:34
A 2009 visualization experiment showed that hundreds of random initializations never converge to the same local minimum.
01:09:06

Recommended in this episode

Guest’s own book

Deep Learning

Ian Goodfellow, Yoshua Bengio, Aaron Courville

“the book that Ian Goodfellow, Aaron Courville and I have written, and it's now in presale by MIT Press. I think you can find it on Amazon” — Yoshua Bengio 00:00:00


Topics

deep learning, curse of dimensionality, distributed representations, optimization, saddle points, unsupervised learning, reinforcement learning, neuroscience