Lex Fridman · 2016-09-27 · 1h 19m

Nuts and Bolts of Applying Deep Learning (Andrew Ng)

Andrew Ng shares practical lessons for organizing deep learning projects, from bias-variance analysis to building a career in machine learning.

The guest

Andrew Ng — Co-founder of Coursera, former leader of Baidu's AI team and Google Brain, and a leading deep learning researcher and educator.

The gist

In this whiteboard-style talk at a deep learning workshop, Andrew Ng distills common patterns he observed leading a large AI team across vision, speech, and NLP applications. He argues that scale of data and compute is the number one driver of deep learning progress, and that end-to-end deep learning is powerful but only works when you have enough labeled data. Much of the talk is a practical workflow for diagnosing models using human-level performance, bias, and variance, including how to handle train/test sets drawn from different distributions. He closes with career advice: read 20-50 papers, replicate results, embrace the dirty work, and study consistently weekend after weekend.
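The diagnostic workflow mentioned above can be sketched as a comparison of error gaps. This is a minimal illustration, not Ng's exact procedure: the function name `diagnose`, the "train-dev" split (a held-out slice drawn from the training distribution), and all the error values are assumptions made for this example.

```python
def diagnose(human_err, train_err, train_dev_err, dev_err):
    """Return the error gaps and the name of the largest one.

    Assumed setup for this sketch (not stated in the notes):
      - train_dev_err is measured on held-out data from the TRAINING distribution
      - dev_err is measured on data from the TARGET distribution
    """
    gaps = {
        # Gap to human-level performance: the model underfits even its training data.
        "avoidable bias": train_err - human_err,
        # Gap between seen and unseen data from the same distribution.
        "variance": train_dev_err - train_err,
        # Gap caused by training and target data coming from different distributions.
        "data mismatch": dev_err - train_dev_err,
    }
    focus = max(gaps, key=gaps.get)
    return gaps, focus


# Illustrative numbers only (fractions, so 0.01 means 1% error).
gaps, focus = diagnose(human_err=0.01, train_err=0.02,
                       train_dev_err=0.03, dev_err=0.10)
print(focus)
```

With these made-up numbers the largest gap is the one between the train-dev and dev errors, which would point toward collecting or synthesizing data closer to the target distribution rather than training a bigger model.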

Big reveals

  • Ng argues the single biggest reason deep learning works now is scale: large neural networks trained on the huge amounts of data we finally have access to.
  • He frames the rise of end-to-end deep learning (learning algorithms that output complex things like sentences, captions, or audio) as the second major trend.
  • Ng recounts publicly arguing phonemes are a fantasy of linguists and being yelled at by a linguist at Stanford, but says they turned out to be right.
  • He uses Baidu's speech-enabled rearview mirror product in China to show why train and test sets often come from different distributions.
  • Best practice revealed: your dev set and test set must come from the same distribution, or months of tuning can be wasted.
  • Rule of thumb: if a typical person can do a task in less than one second of thinking, deep learning can probably automate it.
  • His reliable formula for becoming a machine learning researcher: read 20 to 50 papers and replicate results, and you will start having your own ideas.
  • The Saturday story: real career growth comes from studying and replicating results weekend after weekend for a year, despite the lack of short-term rewards.
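The dev/test best practice in the list above can be illustrated with a small split routine. This is only a sketch under stated assumptions: the function `split_dev_test`, the placeholder example names, and the dataset sizes are hypothetical, standing in for general speech data versus recordings from the target product.

```python
import random


def split_dev_test(target_examples, seed=0):
    """Shuffle target-distribution examples and halve them into dev and test.

    Because both halves come from the same shuffled pool, the dev and test
    sets share one distribution.
    """
    shuffled = list(target_examples)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


# Hypothetical data: plentiful general speech, scarce in-car recordings.
general_speech = [f"general_{i}" for i in range(1000)]
mirror_speech = [f"mirror_{i}" for i in range(20)]

train = general_speech                      # may differ from the target distribution
dev, test = split_dev_test(mirror_speech)   # both drawn from the target distribution
print(len(train), len(dev), len(test))
```

The design choice here is that the scarce target data is reserved for evaluation, so tuning against the dev set moves performance on the test set in the same direction.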

Things worth remembering

  • Ng recommends building a separate computer systems team alongside the AI team because HPC expertise is too specialized for one person to also master ML.
  • Doctors read X-rays of a child's hand to predict the child's age, a task where non-end-to-end pipelines work better due to limited data.
  • Synthetic OCR data can be generated by pasting random English words in random fonts onto random internet images, but requires heavy tuning to match the real distribution.
  • Speech recognition training data can be synthesized by mathematically adding clean speech to recorded background noise like car interior sounds.
  • Using Grand Theft Auto cars as training data fails because a game may show only about 20 distinct cars, an impoverished dataset for a learning algorithm.
  • Ng mandated a single unified company-wide data warehouse at Baidu, treating data as company data with separate discussions only about access rights.
  • Ng argues that the error rate of a team of expert doctors debating an image (around 0.5%) is the most useful definition of human-level performance, because it best estimates the optimal error rate.
  • Predicting whether a user clicks the next ad is described as probably the most lucrative application of deep learning today.
  • A phoneme is the basic unit of sound, for example the shared 'c' sound in 'cat' and 'kick' that linguists hypothesized as fundamental.
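The speech-synthesis idea in the list above (adding clean speech to recorded background noise) can be sketched with plain sample arithmetic. This is a minimal illustration, not a production pipeline: the function `mix`, the `noise_gain` parameter, the looping of short noise clips, and the generated signals are all assumptions of this example.

```python
import math
import random


def mix(clean, noise, noise_gain=0.3):
    """Add scaled background noise to clean speech, sample by sample.

    The noise clip is looped if it is shorter than the clean signal
    (an assumption of this sketch).
    """
    if not noise:
        raise ValueError("noise must contain at least one sample")
    return [c + noise_gain * noise[i % len(noise)] for i, c in enumerate(clean)]


# Stand-ins for real audio: a 440 Hz tone as "clean speech" and
# uniform random samples as "car interior noise", both at 16 kHz.
sample_rate = 16000
clean = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(4000)]

noisy = mix(clean, noise)
print(len(noisy))
```

A real pipeline would load recorded audio and would probably need to guard against clipping after the addition; varying `noise_gain` is one simple way to produce examples at different noise levels.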