Theano Tutorial (Pascal Lamblin, MILA)

The guest

Pascal Lamblin: core Theano developer and researcher at MILA (Montreal Institute for Learning Algorithms), the lab where Theano originated.

The gist

Pascal Lamblin gives a technical introduction to Theano. He describes it as a mathematical symbolic expression compiler that lets users define computation graphs with NumPy-like syntax, perform automatic differentiation, and compile optimized functions that run on CPU or GPU. He walks through defining symbolic and shared variables, building expressions, computing gradients via backpropagation, and compiling functions with updates for training. The talk includes live Jupyter notebook examples: logistic regression on the MNIST digit dataset, a convolutional LeNet architecture, and an LSTM for character-level text generation. He also covers graph optimization, GPU usage, the scan operation for loops, debugging tools, and recent and upcoming features. The session closes with audience Q&A on debugging shape errors and distributing Theano models.

Big reveals

Theano is defined as a mathematical symbolic expression compiler that builds computation graphs and performs symbolic automatic differentiation.
00:01:37
Backpropagation in Theano is achieved by calling theano.grad, which returns symbolic gradient expressions rather than numerical values.
00:09:08
Compiling a Theano function automatically optimizes the graph, removing redundant computations and improving numerical stability.
00:16:03
Theano generates C++ or CUDA code on the fly, including single fused loops for chains of elementwise operations, and compiles that code into Python extension modules at runtime.
00:21:52
The scan operation wraps a step function so that loops can be expressed inside the otherwise acyclic graph, enabling dynamic-length sequence models and backpropagation through time.
00:26:06
The LSTM character model, seeded with 'the meaning of life is', gradually produces more coherent text as training progresses.
00:55:41
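The theano.grad point above, that a gradient comes back as a symbolic expression rather than a number, can be illustrated with a toy sketch in plain Python. This is not Theano's API or implementation: `Expr`, `Var`, `Const`, `Add`, `Mul`, `grad` and `evaluate` are hypothetical names. The sketch also differentiates recursively with the sum and product rules instead of backpropagating through the graph the way Theano does. What it shows is the shape of the idea: differentiating yields another expression graph, and numbers only appear when that graph is evaluated.

```python
from dataclasses import dataclass
from typing import Dict


class Expr:
    """Node of a tiny symbolic expression graph (hypothetical; not Theano's API)."""

    def __add__(self, other):
        return Add(self, as_expr(other))

    def __mul__(self, other):
        return Mul(self, as_expr(other))

    # Addition and multiplication are commutative, so the reflected forms can reuse them.
    __radd__ = __add__
    __rmul__ = __mul__


@dataclass(frozen=True)
class Const(Expr):
    value: float


@dataclass(frozen=True)
class Var(Expr):
    name: str


@dataclass(frozen=True)
class Add(Expr):
    left: Expr
    right: Expr


@dataclass(frozen=True)
class Mul(Expr):
    left: Expr
    right: Expr


def as_expr(x):
    # Wrap plain numbers so they can appear in the graph.
    return x if isinstance(x, Expr) else Const(float(x))


def grad(e: Expr, wrt: Var) -> Expr:
    """Return d(e)/d(wrt) as a new expression graph, not as a number."""
    if isinstance(e, Const):
        return Const(0.0)
    if isinstance(e, Var):
        return Const(1.0 if e == wrt else 0.0)
    if isinstance(e, Add):
        return grad(e.left, wrt) + grad(e.right, wrt)
    if isinstance(e, Mul):  # product rule
        return grad(e.left, wrt) * e.right + e.left * grad(e.right, wrt)
    raise TypeError(f"unknown node {e!r}")


def evaluate(e: Expr, env: Dict[str, float]) -> float:
    """Only here do numbers appear: substitute values and compute."""
    if isinstance(e, Const):
        return e.value
    if isinstance(e, Var):
        return env[e.name]
    if isinstance(e, Add):
        return evaluate(e.left, env) + evaluate(e.right, env)
    if isinstance(e, Mul):
        return evaluate(e.left, env) * evaluate(e.right, env)
    raise TypeError(f"unknown node {e!r}")


x = Var("x")
y = x * x + 3 * x    # y = x^2 + 3x, still symbolic
dy_dx = grad(y, x)   # an (unsimplified) expression equivalent to 2x + 3
print(isinstance(dy_dx, Expr), evaluate(dy_dx, {"x": 2.0}))  # True 7.0
```

The derivative graph is left unsimplified. Cleaning up terms like `1 * x` or `x * 0` is exactly the kind of work a graph optimizer does when a function is compiled.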
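The scan point above describes a loop written as a step function that is applied once per sequence element, with a state carried from one step to the next. A minimal eager sketch of that calling pattern in NumPy follows. `scan` and `rnn_step` are hypothetical helpers, not theano.scan. The argument order (current sequence element, previous state, then fixed extra inputs) loosely mirrors the convention Theano's scan uses for its step function. Unlike Theano, this version runs immediately and builds no symbolic loop node, so it does not provide backpropagation through time.

```python
import numpy as np


def scan(step, sequence, initial_state, non_sequences=()):
    """Toy eager analogue of a scan loop (hypothetical helper, not theano.scan).

    Calls step(x_t, h_prev, *non_sequences) once per element of `sequence`,
    threading the state through and collecting every intermediate state.
    """
    state = initial_state
    outputs = []
    for x_t in sequence:  # the number of steps follows the input length
        state = step(x_t, state, *non_sequences)
        outputs.append(state)
    return np.stack(outputs)


def rnn_step(x_t, h_prev, W, U):
    # One step of a simple recurrent network: h_t = tanh(W h_{t-1} + U x_t).
    return np.tanh(W @ h_prev + U @ x_t)


rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4)) * 0.1
U = rng.standard_normal((4, 3)) * 0.1
h0 = np.zeros(4)

for length in (5, 9):  # sequences of different lengths, same step function
    xs = rng.standard_normal((length, 3))
    hs = scan(rnn_step, xs, h0, non_sequences=(W, U))
    print(length, hs.shape)  # prints "5 (5, 4)", then "9 (9, 4)"
```

Keeping the per-step logic in one function is what lets a symbolic scan wrap it as a single node. The graph stays acyclic while the number of iterations depends on the input.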

Things worth remembering

At the time of the talk, Theano was over eight years old. It started with a small group of contributors in a lab then called LISA, the predecessor of MILA.
00:03:12
Higher-level libraries such as Blocks, Keras and Lasagne use Theano as a backend.
00:03:50
PyMC3 uses Theano not for machine learning but for probabilistic programming.
00:04:20
Explicitly computing the full Jacobian matrix is usually a bad idea; backpropagation only needs the vector-Jacobian product (the L-operator).
00:09:42
Optimizations include replacing x / x with 1 and rewriting the log of a softmax into a more numerically stable operation.
00:17:07
GPUs generally have poor double-precision performance, so float32 (or the experimental float16 support) is recommended for storage.
00:23:26
Sequences in a mini-batch are grouped by similar length and padded only to the longest sequence within that batch for efficiency.
00:53:35
Theano is tightly intertwined with Python because Python handles all memory management during execution, which makes standalone distribution difficult.
01:02:06
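The Jacobian point above can be made concrete with an elementwise function. For such a function the Jacobian is an n by n diagonal matrix, while the vector-Jacobian product that backpropagation actually needs is a single length-n vector. The NumPy sketch below uses tanh as an arbitrary example. `full_jacobian` and `vjp` are hypothetical helper names, not Theano functions.

```python
import numpy as np


def f(x):
    # Elementwise function f(x)_i = tanh(x_i); its Jacobian is diagonal.
    return np.tanh(x)


def full_jacobian(x):
    # Explicit n x n Jacobian: O(n^2) memory, almost all zeros here.
    return np.diag(1.0 - np.tanh(x) ** 2)


def vjp(x, v):
    # Vector-Jacobian product v^T J computed directly in O(n). This is all
    # backpropagation needs to push the gradient v from f's output to its input.
    return v * (1.0 - np.tanh(x) ** 2)


rng = np.random.default_rng(0)
n = 1000
x = rng.standard_normal(n)
v = rng.standard_normal(n)  # stands in for the gradient of a scalar loss w.r.t. f(x)

J = full_jacobian(x)
print(J.nbytes, vjp(x, v).nbytes)     # 8000000 8000
print(np.allclose(v @ J, vjp(x, v)))  # True
```

Both routes give the same gradient. The explicit Jacobian costs a thousand times more memory at n = 1000, and that ratio grows linearly with n.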
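To see why rewriting the log of a softmax matters, compare a literal computation with the standard log-sum-exp form. The sketch below is only an illustration. Theano performs its rewrite on the graph during compilation, and the formula here is the textbook stable form, not necessarily the exact replacement Theano emits. Both function names are hypothetical.

```python
import numpy as np


def naive_log_softmax(x):
    # log(softmax(x)) computed literally: exp overflows, inf/inf gives nan,
    # and log(0) gives -inf.
    e = np.exp(x)
    return np.log(e / e.sum())


def stable_log_softmax(x):
    # Log-sum-exp form: x - max(x) - log(sum(exp(x - max(x)))).
    # The largest shifted exponent is exp(0) = 1, so there is no overflow and no log(0).
    shifted = x - x.max()
    return shifted - np.log(np.exp(shifted).sum())


x = np.array([1000.0, 0.0, -1000.0])
with np.errstate(over="ignore", divide="ignore", invalid="ignore"):
    print(naive_log_softmax(x))  # contains nan and -inf
print(stable_log_softmax(x))     # finite: 0, -1000, -2000
```

On well-scaled inputs the two forms agree. The stable form differs only in keeping every intermediate value representable.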
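The mini-batch point above can be sketched in a few lines of Python: group sequences of similar length, then pad each batch only up to its own longest sequence. `make_padded_batches` is a hypothetical helper. Sorting the whole dataset by length is one simple way to do the grouping, and the talk's notebook may have done it differently. The returned mask is a common companion to padding that marks real tokens versus padding; the summary does not mention it.

```python
from typing import List, Tuple

Batch = Tuple[List[List[int]], List[List[int]]]


def make_padded_batches(sequences: List[List[int]], batch_size: int, pad: int = 0) -> List[Batch]:
    """Group sequences of similar length, then pad each batch to its own max length.

    Returns (padded_batch, mask) pairs; mask[i][t] is 1 for a real token, 0 for padding.
    """
    # Sort indices by sequence length so neighbours have similar lengths.
    order = sorted(range(len(sequences)), key=lambda i: len(sequences[i]))
    batches = []
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        max_len = max(len(sequences[i]) for i in idx)  # longest in THIS batch only
        padded = [sequences[i] + [pad] * (max_len - len(sequences[i])) for i in idx]
        mask = [[1] * len(sequences[i]) + [0] * (max_len - len(sequences[i])) for i in idx]
        batches.append((padded, mask))
    return batches


seqs = [[1] * 2, [2] * 9, [3] * 3, [4] * 10, [5] * 2, [6] * 8]
for padded, mask in make_padded_batches(seqs, batch_size=2):
    print([len(row) for row in padded], sum(map(sum, mask)))
# [2, 2] 4
# [8, 8] 11
# [10, 10] 19
```

Padding every sequence to the global maximum (10) would take 60 cells for these six sequences. Per-batch padding uses 40 cells for 34 real tokens.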

Topics

Theano, deep learning, automatic differentiation, GPU computing, MNIST, convolutional neural networks, LSTM, MILA