How AlexNet Transformed AI and Computer Vision Forever

In partnership with Google, the Computer History Museum has released the source code to AlexNet, the neural network that in 2012 kickstarted today’s prevailing approach to AI. The source code is available as open source on CHM’s GitHub page.

What Is AlexNet?

AlexNet is an artificial neural network created to recognize the contents of photographic images. It was developed in 2012 by then University of Toronto graduate students Alex Krizhevsky and Ilya Sutskever and their faculty advisor, Geoffrey Hinton.

Hinton is regarded as one of the fathers of deep learning, the type of artificial intelligence that uses neural networks and is the foundation of today’s mainstream AI. Simple three-layer neural networks with only one layer of adaptive weights were first built in the late 1950s—most notably by Cornell researcher Frank Rosenblatt—but they were found to have limitations. [This explainer gives more details on how neural networks work.] In particular, researchers needed networks with more than one layer of adaptive weights, but there wasn’t a good way to train them. By the early 1970s, neural networks had been largely rejected by AI researchers.

Black and white 1950s photo of Doctor Frank Rosenblatt and Charles W. Wightman working on a prototype of an electronic neural network using a screwdriver. Frank Rosenblatt [left, shown with Charles W. Wightman] developed the first artificial neural network, the perceptron, in 1957.Division of Rare and Manuscript Collections/Cornell University Library

In the 1980s, neural network research was revived outside the AI community by cognitive scientists at the University of California San Diego, under the new name of “connectionism.” After finishing his Ph.D. at the University of Edinburgh in 1978, Hinton had become a postdoctoral fellow at UCSD, where he collaborated with David Rumelhart and Ronald Williams. The three rediscovered the backpropagation algorithm for training neural networks, and in 1986 they published two papers showing that it enabled neural networks to learn multiple layers of features for language and vision tasks. Backpropagation, which is foundational to deep learning today, uses the difference between the current output and the desired output of the network to adjust the weights in each layer, from the output layer backward to the input layer.

In 1987, Hinton joined the University of Toronto. Away from the centers of traditional AI, Hinton’s work and those of his graduate students made Toronto a center of deep learning research over the coming decades. One postdoctoral student of Hinton’s was Yann LeCun, now chief scientist at Meta. While working in Toronto, LeCun showed that when backpropagation was used in “convolutional” neural networks, they became very good at recognizing handwritten numbers.

ImageNet and GPUs

Despite these advances, neural networks could not consistently outperform other types of machine learning algorithms. They needed two developments from outside of AI to pave the way. The first was the emergence of vastly larger amounts of data for training, made available through the Web. The second was enough computational power to perform this training, in the form of 3D graphics chips, known as GPUs. By 2012, the time was ripe for AlexNet.

Fei Fei Li speaking to Tom Kalil on stage at an event. Both of them are seated in arm chairs. Fei-Fei Li’s ImageNet image dataset, completed in 2009, was pivotal in training AlexNet. Here, Li [right] talks with Tom Kalil at the Computer History Museum.Douglas Fairbairn/Computer History Museum

The data needed to train AlexNet was found in ImageNet, a project started and led by Stanford professor Fei-Fei Li. Beginning in 2006, and against conventional wisdom, Li envisioned a dataset of images covering every noun in the English language. She and her graduate students began collecting images found on the Internet and classifying them using a taxonomy provided by WordNet, a database of words and their relationships to each other. Given the enormity of their task, Li and her collaborators ultimately crowdsourced the task of labeling images to gig workers, using Amazon’s Mechanical Turk platform.

Completed in 2009, ImageNet was larger than any previous image dataset by several orders of magnitude. Li hoped its availability would spur new breakthroughs, and she started a competition in 2010 to encourage research teams to improve their image recognition algorithms. But over the next two years, the best systems only made marginal improvements.

The second condition necessary for the success of neural networks was economical access to vast amounts of computation. Neural network training involves a lot of repeated matrix multiplications, preferably done in parallel, something that GPUs are designed to do. NVIDIA, cofounded by CEO Jensen Huang, had led the way in the 2000s in making GPUs more generalizable and programmable for applications beyond 3D graphics, especially with the CUDA programming system released in 2007.

Both ImageNet and CUDA were, like neural networks themselves, fairly niche developments that were waiting for the right circumstances to shine. In 2012, AlexNet brought together these elements—deep neural networks, big datasets, and GPUs— for the first time, with pathbreaking results. Each of these needed the other.

How AlexNet Was Created

By the late 2000s, Hinton’s grad students at the University of Toronto were beginning to use GPUs to train neural networks for both image and speech recognition. Their first successes came in speech recognition, but success in image recognition would point to deep learning as a possible general-purpose solution to AI. One student, Ilya Sutskever, believed that the performance of neural networks would scale with the amount of data available, and the arrival of ImageNet provided the opportunity.

In 2011, Sutskever convinced fellow grad student Alex Krizhevsky, who had a keen ability to wring maximum performance out of GPUs, to train a convolutional neural network for ImageNet, with Hinton serving as principal investigator.

Jensen Huang speaks behind a podium on an event stage. Behind him is a projector screen showing his name, along with a sentence underneath it that reads, "for visionary leadership in the advancement of devices and systems for computer graphics, accelerated computing and artificial intelligence". AlexNet used NVIDIA GPUs running CUDA code trained on the ImageNet dataset. NVIDIA CEO Jensen Huang was named a 2024 CHM Fellow for his contributions to computer graphics chips and AI.Douglas Fairbairn/Computer History Museum

Krizhevsky had already written CUDA code for a convolutional neural network using NVIDIA GPUs, called cuda-convnet, trained on the much smaller CIFAR-10 image dataset. He extended cuda-convnet with support for multiple GPUs and other features and retrained it on ImageNet. The training was done on a computer with two NVIDIA cards in Krizhevsky’s bedroom at his parents’ house. Over the course of the next year, he constantly tweaked the network’s parameters and retrained it until it achieved performance superior to its competitors. The network would ultimately be named AlexNet, after Krizhevsky. Geoff Hinton summed up the AlexNet project this way: “Ilya thought we should do it, Alex made it work, and I got the Nobel prize.”

Krizhevsky, Sutskever, and Hinton wrote a paper on AlexNet that was published in the fall of 2012 and presented by Krizhevsky at a computer vision conference in Florence, Italy, in October. Veteran computer vision researchers weren’t convinced, but LeCun, who was at the meeting, pronounced it a turning point for AI. He was right. Before AlexNet, almost none of the leading computer vision papers used neural nets. After it, almost all of them would.

AlexNet was just the beginning. In the next decade, neural networks would advance to synthesize believable human voices, beat champion Go players, and generate artwork, culminating with the release of ChatGPT in November 2022 by OpenAI, a company cofounded by Sutskever.

Releasing the AlexNet Source Code

In 2020, I reached out to Krizhevsky to ask about the possibility of allowing CHM to release the AlexNet source code, due to its historical significance. He connected me to Hinton, who was working at Google at the time. Google owned AlexNet, having acquired DNNresearch, the company owned by Hinton, Sutskever, and Krizhevsky. Hinton got the ball rolling by connecting CHM to the right team at Google. CHM worked with the Google team for five years to negotiate the release. The team also helped us identify the specific version of the AlexNet source code to release—there have been many versions of AlexNet over the years. There are other repositories of code called AlexNet on GitHub, but many of these are re-creations based on the famous paper, not the original code.

CHM is proud to present the source code to the 2012 version of AlexNet, which transformed the field of artificial intelligence. You can access the source code on CHM’s GitHub page.

This post originally appeared on the blog of the Computer History Museum.

Acknowledgments

Special thanks to Geoffrey Hinton for providing his quote and reviewing the text, to Cade Metz and Alex Krizhevsky for additional clarifications, and to David Bieber and the rest of the team at Google for their work in securing the source code release.

From Your Site Articles