Krizhevsky's Bedroom
April 21, 2026 · uneasy.in/3e6f8af
On September 30, 2012, the results of the ImageNet Large Scale Visual Recognition Challenge came back. One team, SuperVision, registered a top-5 error rate of 15.3%. The next-best entry managed 26.2%. Nearly eleven percentage points of separation, in a field where a good year moved the needle by one or two.
The team was three people at the University of Toronto. Alex Krizhevsky, a grad student with an unusual knack for wringing performance out of GPUs. Ilya Sutskever, another grad student, who had spent months convincing Krizhevsky to try a deep convolutional network on ImageNet. Geoffrey Hinton, their advisor, who signed off on the attempt and later joked that "Ilya thought we should do it, Alex made it work, and I got the Nobel Prize."
The network that became AlexNet was trained on two Nvidia GTX 580 cards, consumer GPUs you could buy at a computer shop, sitting in Krizhevsky's bedroom at his parents' house. Sixty million parameters, five convolutional layers, three fully-connected. ReLU activations, dropout, and a very efficient GPU implementation of convolution that Krizhevsky had been writing for years as cuda-convnet. Training took five or six days.
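The sixty-million figure follows directly from the layer shapes in the NeurIPS 2012 paper. A back-of-envelope count in plain Python (shapes from the paper, with the two-GPU channel split on conv2, conv4, and conv5) lands within a million of it:

```python
# Parameter count of AlexNet, layer by layer.
# Filter shapes are from Krizhevsky, Sutskever & Hinton (NeurIPS 2012);
# conv2/4/5 see only half the incoming channels because the net was
# split across two GPUs.
conv = [
    # (num_filters, kernel_h, kernel_w, channels_per_filter)
    (96, 11, 11, 3),      # conv1
    (256, 5, 5, 48),      # conv2 (split: 48 of 96 channels)
    (384, 3, 3, 256),     # conv3 (the one cross-GPU layer)
    (384, 3, 3, 192),     # conv4 (split)
    (256, 3, 3, 192),     # conv5 (split)
]
fc = [
    (6 * 6 * 256, 4096),  # fc6: flattened conv5 output -> 4096
    (4096, 4096),         # fc7
    (4096, 1000),         # fc8: the 1000 ImageNet classes
]

total = sum(n * kh * kw * c + n for n, kh, kw, c in conv)  # weights + biases
total += sum(i * o + o for i, o in fc)
print(f"{total:,}")  # just over 60 million, almost all of it in fc6 and fc7
```

Worth noticing: the three fully-connected layers hold roughly 58 of the 61 million parameters, which is why dropout on fc6 and fc7 mattered so much.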
The paper landed at NeurIPS that December, but the field's reaction came earlier. When Krizhevsky presented the work in October at the European Conference on Computer Vision in Florence, much of the old guard was unimpressed. Yann LeCun, who had been arguing for convolutional nets since the late eighties, was not: he told anyone who would listen it was a turning point, and he turned out to be correct. Before AlexNet, almost no leading computer-vision paper used neural nets. After it, almost all of them did.
The speed of the conversion is the part worth sitting with. Computer vision had spent twenty years perfecting hand-engineered feature pipelines. SIFT, HOG, deformable part models. Whole careers built on getting the descriptors right. A single result made most of that work obsolete inside a year. Research groups that had been refining feature extraction for a decade pivoted to training deep nets, often on the same kind of hardware, often using the code Krizhevsky had open-sourced.
That's the pattern Rich Sutton later formalised in nine paragraphs as "The Bitter Lesson": general methods that scale with computation beat specialised methods that encode human knowledge. AlexNet is the cleanest example in the set. The team didn't out-clever the competition. They out-computed it, on consumer hardware, against opponents with more institutional weight and better-tuned features.
The Computer History Museum released the original AlexNet source code in partnership with Google. Reading it now, it looks almost boring. A handful of CUDA kernels, a training loop, a few regularisation tricks. Nothing that couldn't be reimplemented in a weekend. What it did on September 30, 2012, cannot be reimplemented. That moment only happened once.
Sources:
- AlexNet — Wikipedia
- ImageNet Classification with Deep Convolutional Neural Networks — Krizhevsky, Sutskever, Hinton (NeurIPS 2012)
- How AlexNet Transformed AI and Computer Vision Forever — IEEE Spectrum