It seems as though large language models (LLMs) exploded into public awareness almost overnight. Relatively few people had heard of GPT-2, but I would venture to guess relatively few people haven’t heard of ChatGPT.
But like most things, language models have a history. And, in addition to being outrageously interesting, that history can help us reason about the progress in LLMs, as well as their likely future impacts.
Let’s get started!
A Brief History of Artificial Intelligence Development
The human fascination with building artificial beings capable of thought and action goes back a long way. Writing in roughly the 8th century BCE, Homer recounts tales of the god Hephaestus outsourcing repetitive manual tasks to automated bellows and working alongside robot-like “attendants” that were “…golden, and in appearance like living young women.”
No mere adornments, these handmaidens were described as having “intelligence in their hearts” and stirring “nimbly in support of their master” because “from the immortal gods they have learned how to do things.”
Centuries later, mathematicians in Alexandria would produce treatises on creating mechanical servants and various kinds of automata. Heron of Alexandria wrote a technical manual for producing a mechanical shrine and an automated theatre whose figurines could be activated to stage a full tragic play through an intricate system of cords and axles.
Nor is it only ancient Greece that tells such tales. Jewish legends speak of the Golem, a being made of clay and imbued with life and agency through the use of language. The word “abracadabra”, in fact, is often said to come from the Aramaic phrase avra k’davra, which translates to “I create as I speak”, though the etymology is disputed.
Through the ages, these old ideas have found new expression in stories such as “The Sorcerer’s Apprentice”, Mary Shelley’s “Frankenstein”, and Karel Čapek’s “R.U.R.”, a science fiction play that features the first recorded use of the word “robot”.
From Science Fiction to Science Fact
These ideas remained purely fictional until the 1930s and 1940s, when advances in the theory of computation, as well as the development of primitive computers, began to offer a path toward actually building intelligent systems.
Arguably, the field of artificial intelligence began in earnest with the 1950 publication of Alan Turing’s “Computing Machinery and Intelligence” (in which he proposed the famous “Turing test”) and with the 1956 Dartmouth conference on AI, whose organizers included luminaries John McCarthy and Marvin Minsky.
People began taking AI seriously. Over the next ~50 years, there were numerous periods of hype and exuberance in which major advances were made, as well as long stretches, known as “AI winters”, in which funding dried up and little was accomplished.
Neural networks and the deep learning revolution are two advances that are particularly important for understanding how large language models have evolved over time, so it’s to these that we now turn.
Neural Networks And The Deep Learning Revolution
The groundwork for future LLM systems was laid by Walter Pitts and Warren McCulloch in the early 1940s. Inspired by the burgeoning study of the human brain, they wondered if it would be possible to build an artificial neuron that had the same basic properties as a biological one, i.e. it would activate and fire once a certain critical threshold had been crossed.
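The threshold behavior described above can be sketched in a few lines of Python. This is a simplified illustration rather than a reconstruction of the original 1943 formulation: the function names, the use of numeric weights, and the chosen thresholds are all assumptions made for the example.

```python
def threshold_neuron(inputs, weights, threshold):
    """Return 1 ("fire") when the weighted sum of inputs reaches the threshold, else 0."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0


def and_gate(a, b):
    # Hypothetical setup: unit weights and a threshold of 2 mean both inputs must be on.
    return threshold_neuron([a, b], [1, 1], threshold=2)


def or_gate(a, b):
    # Lowering the threshold to 1 means a single active input is enough to fire.
    return threshold_neuron([a, b], [1, 1], threshold=1)


for a in (0, 1):
    for b in (0, 1):
        print(f"inputs=({a}, {b})  AND={and_gate(a, b)}  OR={or_gate(a, b)}")
```

The same unit computes different logical functions depending only on where the threshold sits, which hints at why networks of such units were seen as a plausible basis for computation.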
McCulloch and Pitts were successful, though several other breakthroughs would be required before artificial neurons could be arranged into systems capable of doing useful work. One such breakthrough was backpropagation, the basic algorithm that is still used to train deep learning systems. The ideas behind backpropagation date to the early 1960s; the algorithm uses the errors in a model’s outputs to iteratively adjust its internal parameters.
It wasn’t until 1985, however, that David Rumelhart, Ronald Williams, and Geoffrey Hinton showed how backpropagation could be used to train neural networks. Building on this, in 1989 Yann LeCun trained a convolutional neural network to recognize handwritten digits.
The convolutional network was not the only architectural improvement to come out of this period. Especially noteworthy were the long short-term memory (LSTM) networks introduced in 1997 by Sepp Hochreiter and Jürgen Schmidhuber, which made it possible to learn long-range dependencies in sequential data such as text.
With these advances, it was clear that neural networks could be trained to do useful work. All that was missing was the final piece: data.
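The idea of using output errors to adjust internal parameters can be made concrete with a very small example. The sketch below is an assumption-laden toy, not how production deep learning systems are written: the two-parameter network, the data points, the learning rate, and every function name are invented for illustration.

```python
import math

# Toy network: prediction = w2 * tanh(w1 * x), with two trainable parameters.

def forward(w1, w2, x):
    h = math.tanh(w1 * x)      # hidden activation
    return h, w2 * h           # (hidden value, prediction)


def mean_squared_error(w1, w2, xs, ys):
    return sum((forward(w1, w2, x)[1] - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(xs, ys, w1=0.1, w2=0.1, learning_rate=0.1, steps=500):
    n = len(xs)
    for _ in range(steps):
        grad_w1 = grad_w2 = 0.0
        for x, y in zip(xs, ys):
            h, pred = forward(w1, w2, x)
            d_pred = 2.0 * (pred - y) / n        # dLoss/dPrediction for this example
            grad_w2 += d_pred * h                # chain rule: dPrediction/dw2 = h
            d_h = d_pred * w2                    # error propagated back to the hidden unit
            grad_w1 += d_h * (1.0 - h * h) * x   # tanh'(z) = 1 - tanh(z)^2
        # Step each parameter against its gradient to reduce the error.
        w1 -= learning_rate * grad_w1
        w2 -= learning_rate * grad_w2
    return w1, w2


# Hypothetical training data, chosen only for illustration.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [-1.5, -1.0, 0.0, 1.0, 1.5]

initial_loss = mean_squared_error(0.1, 0.1, xs, ys)
w1, w2 = train(xs, ys)
final_loss = mean_squared_error(w1, w2, xs, ys)
print(f"loss before training: {initial_loss:.4f}")
print(f"loss after training:  {final_loss:.4f}")
```

The key step is the backward pass inside the loop: the error at the output is passed back through each layer via the chain rule, so every parameter learns how much it contributed to the mistake.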
The Big Data Era
Neural networks and deep-learning applications tend to be extremely data-hungry, and access to quality training data has always been a major bottleneck. In 2009 Stanford’s Fei-Fei Li sought to change this by releasing ImageNet, a database of over 14 million labeled images that could be used for free by researchers. The increase in available data, together with substantial improvements in computer hardware like graphics processing units (GPUs), meant that at long last the promise of deep learning could begin to be fulfilled.
And it was. In 2011 IBM’s Watson system beat several Jeopardy! all-stars in a real game, and Apple launched Siri. In 2012 a convolutional neural network called “AlexNet” won the ImageNet image-recognition competition by a wide margin. Amazon’s Alexa followed in 2014, and from 2015 to 2017 DeepMind’s AlphaGo shocked the world by utterly dominating the best human Go players.
Substantial strides were also made in language models. In 2018 Google introduced Bidirectional Encoder Representations from Transformers (BERT), a pre-trained model that could be fine-tuned for a wide array of language tasks, like question answering, text classification, and sentiment analysis.
One Model To Rule Them All
It would be easy to miss the significance of AlexNet’s performance on the ImageNet competition or BERT’s usefulness across multiple tasks. For a long time, it was anyone’s guess as to whether it would be possible to train a single large model on a dataset and use it for a range of purposes, or whether it would be necessary to train a multitude of models for each application.
Since the early 2010s, it has become clear that large, general-purpose models are often the best way to go. This point has only been reinforced by the success of GPT-4 in everything from brainstorming scientific hypotheses to handling customer service tasks.
How Has Large Language Model Performance Improved?
Now that we’ve discussed this history, we’re well-placed to understand why LLMs and generative AI have ignited so much controversy. People have been mulling over the promise (and peril) of thinking machines for literally thousands of years. After all that time it looks like they might be here, at long last.
But what, exactly, has people so excited? What is it that advanced AI tools are doing that has captured the popular imagination? In the following sections, we’ll talk about the astonishing (and astonishingly rapid) improvements that have been seen in language models in just a few short years.
Getting To Human-Level
One of the more surprising things about LLMs such as ChatGPT is just how good they are at so many different things. LLMs are trained with a technique known as “self-supervised learning”. They take random samples of the text data they’re given, and they try to predict what words come next given the words that came before.
Suppose the model sees the famous opening line of Leo Tolstoy’s Anna Karenina: “Happy families are all alike; every unhappy family is unhappy in its own way.” What the model is trying to do is learn a function that will allow it to predict “in its own way” from “Happy families are all alike; every unhappy family is unhappy ___”.
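To make the training setup concrete, here is a minimal sketch of how raw text supplies its own labels. Note the substitution: real LLMs learn the prediction function with a neural network over sub-word tokens, whereas this example uses a simple table of counts. The toy corpus, the two-word context, and all function names are assumptions for illustration.

```python
from collections import Counter, defaultdict


def make_training_pairs(text, context_size=2):
    """Slide over the text, pairing each run of context words with the word that follows."""
    words = text.split()
    return [
        (tuple(words[i - context_size:i]), words[i])
        for i in range(context_size, len(words))
    ]


def train_counts(pairs):
    """Count how often each next word follows each context."""
    table = defaultdict(Counter)
    for context, next_word in pairs:
        table[context][next_word] += 1
    return table


def predict_next(table, context):
    """Return the most frequent continuation seen in training, or None for an unseen context."""
    counts = table.get(tuple(context))
    return counts.most_common(1)[0][0] if counts else None


# Hypothetical toy corpus; no human labeling is needed because the text is its own answer key.
corpus = "the cat sat on the mat . the cat sat on the sofa . the cat sat on the mat ."
pairs = make_training_pairs(corpus)
table = train_counts(pairs)

print(pairs[0])                             # first (context, next word) training pair
print(predict_next(table, ["on", "the"]))   # most common word after "on the"
```

Every position in the text yields one training example for free, which is what lets this approach scale to enormous unlabeled datasets.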
The modern crop of LLMs can do this incredibly well, but what is remarkable is just how far this gets you. People are using generative AI to help them write poems, business plans, and code, create recipes based on the ingredients in their fridges, and answer customer questions.
Emergence in Language Models
Perhaps even more interesting, however, is the phenomenon of “emergence” in language models. When researchers tested LLMs on a wide variety of tasks meant to be especially challenging for them (things like identifying a movie given a string of emojis or finding legal chess moves), they found that in about 5% of tasks, ability increased suddenly and sharply once a model reached a certain size.
At present, it’s not really clear how we should think about emergence. One hypothesis for emergence is that a big enough model is able to learn some general piece of knowledge not attainable by a smaller cousin, while another, more prosaic one is that it’s a relatively straightforward consequence of the model’s internal statistical machinery.
What’s more, it’s difficult to pin down the conditions required for emergence in language models. Though it generally appears to be a function of model size, there are cases in which the same abilities can be achieved with smaller models, or with models trained on very high-quality data, and emergence shows up at different scales for different models and tasks.
Whatever ends up being the case, it’s clear that this is a promising direction for future research. Much more work needs to be done to understand precisely how LLMs accomplish what they accomplish. This will not only bear on the question of emergence, it will also inform the ongoing efforts to make language models safer and less biased.
The GPT Series
The big recent news in AI has, of course, been ChatGPT. ChatGPT has proven useful in an astonishingly wide variety of use cases and is among the first powerful systems to have been made widely available to the public.
ChatGPT is part of a broader series of GPT models built by OpenAI. “GPT” stands for “generative pre-trained transformer”, and the first of its kind was developed back in 2018. New models and major updates have been released at a rapid clip ever since, culminating in the release of GPT-4 in March 2023.
As of this writing, OpenAI’s CEO Sam Altman has said that there are no plans to train a successor GPT-5 model, but there are other companies, like DeepMind, that could plausibly build a competitor.
What’s Next For Large Language Models?
Given their flexibility and power, LLMs are finding use across a wide variety of industries, from software engineering to medicine to customer service.
If your interest has been piqued and you’d like to talk to an expert at Quiq about incorporating LLMs into your business, reach out to us to schedule a demo!