A Deep Dive on Large Language Models—And What They Mean For You

The release of OpenAI’s ChatGPT in late 2022 has utterly transformed the conversation around artificial intelligence. Whether it’s generating functioning web apps with just a few prompts, writing Spanish-language children’s stories about the blockchain in the style of Dr. Seuss, or opining on the virtues and vices of major political figures, its ability to generate long strings of coherent, grammatically correct text is shocking.

Seen in this light, it’s perhaps no surprise that ChatGPT has achieved such a staggering rate of growth. The application garnered a million users less than a week after its launch.

It’s believed that by January of 2023, this figure had climbed to 100 million monthly users, blowing past the adoption rates of TikTok (which needed nine months to get to this many monthly users) and Instagram (which took over two years).

Naturally, many have become curious about the “large language model” (LLM) technology that makes ChatGPT and similar kinds of disruptive generative AI possible.

In this piece, we’re going to do a deep dive on LLMs, exploring how they’re trained, how they work internally, and how they might be deployed in your business. Our hope is that this will arm Quiq’s customers with the context they need to keep up with the ongoing AI revolution.

What Are Large Language Models?

LLMs are pieces of software with the ability to interact with and generate a wide variety of text. In this discussion, “text” is used very broadly to include not just existing natural language but also computer code.

A good way to begin exploring this subject is to analyze each of the terms in “large language model”, so let’s do that now. Here’s our large language models overview:

LLMs Are Models.

In machine learning (ML), you can think of a model as being a function that maps inputs to outputs. Early in their education, for example, machine learning engineers usually figure out how to fit a linear regression model that does something like predict the final price of a house based on its square footage.

They’ll feed their model a bunch of data points that look like this:

House 1: 800 square feet, $120,000
House 2: 1000 square feet, $175,000
House 3: 1500 square feet, $225,000

And the model learns the relationship between square footage and price well enough to roughly predict the price of homes that weren’t in its training data.
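As a rough illustration, the sketch below fits a straight line to the three example houses using ordinary least squares in plain Python. The function names fit_line and predict are our own, and three data points are far too few for a real model; the point is only to show that a trained model is a function from input to output.

```python
# Toy training data from the example above: (square feet, price in dollars).
houses = [(800, 120_000), (1000, 175_000), (1500, 225_000)]

def fit_line(points):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in points)
    variance = sum((x - mean_x) ** 2 for x, _ in points)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(square_feet, slope, intercept):
    """The trained model is just a function from input to output."""
    return slope * square_feet + intercept

slope, intercept = fit_line(houses)
estimate = predict(1200, slope, intercept)  # a house not in the training data
print(f"Estimated price for 1,200 sq ft: ${estimate:,.0f}")
```

The "learning" here is nothing more than choosing the slope and intercept that best match the examples; an LLM does the same kind of thing with vastly more numbers to choose.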

We’ll have a lot more to say about how LLMs are trained in the next section. For now, just be aware that when you get down to it, LLMs are inconceivably vast functions that take the input you feed them and generate a corresponding output.

LLMs Are Large.

Speaking of vastness, LLMs are truly gigantic. As with terms like “big data”, there isn’t an exact, agreed-upon point at which a basic language model becomes a large language model. Still, they’re plenty big enough to deserve the extra “L” at the beginning of their name.

There are a few ways to measure the size of machine learning models, but one of the most common is by looking at their parameters.

In the linear regression model just discussed, there would be only one learned weight, for square footage (plus a constant intercept term, which we’ll set aside to keep the counting simple). We could make our model better by also showing it the home’s zip code and the number of bathrooms it has, and then it would have three such parameters.

It’s hard to say how big most real systems are because that information isn’t usually made public, but a linear regression model might have dozens of parameters, and a basic neural network could range from a few hundred thousand to a few tens of millions of parameters.

GPT-3 has 175 billion parameters, and Google’s Minerva model has 540 billion parameters. It isn’t known how many parameters GPT-4 has, but it’s almost certainly more.

(Note: We say “almost” certainly because better models don’t always have more parameters. They usually do, but it’s not an ironclad rule.)
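To make the idea of counting parameters concrete, here is a minimal sketch that tallies the weights and biases in a simple fully connected network. The layer sizes are hypothetical, chosen only for illustration. Note that this count includes the bias (intercept) terms, so its totals come out slightly higher than a count of weights alone.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists the width of each layer, input first, output last.
    Each layer contributes (inputs * outputs) weights plus one bias per output.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out
    return total

# Linear regression on three features: three weights plus one intercept.
regression_params = count_parameters([3, 1])

# A small, hypothetical classifier: 784 inputs, two hidden layers, 10 outputs.
small_net_params = count_parameters([784, 256, 128, 10])

print(regression_params, small_net_params)
```

Even this modest network lands in the hundreds of thousands of parameters, which gives a sense of how far away 175 billion really is.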

LLMs Focus On Language.

ChatGPT and its cousins take text as input and produce text as output. This makes them distinct from some of the image-generation tools that are on the market today, such as DALL-E and Midjourney.

It’s worth noting, however, that this might be changing in the future. Though most of what people are using GPT-4 to do revolves around text, technically, the underlying model is multimodal. This means it can theoretically interact with image inputs as well. According to OpenAI’s documentation, support for this feature should arrive in the coming months.

How Are Large Language Models Trained?

Like all machine learning models, LLMs must be trained. We don’t actually know exactly how OpenAI trained the latest GPT models, as they’ve kept those details secret, but we can make some broad comments about how systems like these are generally trained.

Before we get into technical details, let’s frame the overall task that LLMs are trying to perform as a guessing game. Imagine that I start a sentence and leave out the last word, asking you to provide a guess as to how it ends.

Some of these would be fairly trivial; everyone knows that “[i]t was the best of times, it was the worst of _____,” ends with the word “times.” Others would be more ambiguous; “I stopped to pick a flower, and then continued walking down the ____,” could plausibly end with words like “road”, “street”, or “trail.”

For still others, there’d be an almost infinite number of possibilities; “He turned to face the ___,” could end with anything from “firehose” to “firing squad.”

But how is it that you’re able to generate these guesses? How do you know what a good ending to a natural-language sentence sounds like?

The answer is that you’ve been “training” for this task your entire life. You’ve been listening to sentences, reading and writing sentences, or thinking in sentences for most of your waking hours, and have therefore developed a sense of how they work.

The process of training an LLM differs in many specifics, but at a high level, it’s learning to do the same thing. A model like GPT-4 is fed gargantuan amounts of textual data from the internet or other sources, and it learns a statistical distribution that allows it to predict which words come next.

At first, it’ll have no idea how to end the sentence “[i]t was the best of times, it was the worst of ____.” But as it sees more and more examples of human-generated textual content, it improves. It discovers that when someone writes “red, orange, yellow, green, blue, indigo, ______”, the next sequence of letters is probably “violet”. It begins to be more sensitive to context, discovering that the words “bat”, “diamond”, and “plate” are probably occurring in a discussion about baseball and not the weirdest Costco you’ve ever been to.

It’s precisely this nuance that makes advanced LLMs suitable for applications such as customer service.

They’re not simply looking up pre-digested answers to questions; they’re learning a function big enough to account for the subtleties of a specific customer’s specific problem. They still don’t do this job perfectly, but they’ve made remarkable progress, which is why so many companies are looking at integrating them.
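The guessing game described in this section can be sketched with a toy "bigram" model that simply counts which word tends to follow which. This is a drastic simplification and not how GPT-style models are built (they look at far more context than one word), and the tiny corpus and function names are our own, but it shows what learning a statistical distribution over next words means.

```python
from collections import Counter, defaultdict

corpus = (
    "it was the best of times it was the worst of times "
    "it was the age of wisdom it was the age of foolishness"
)

def train_bigram_counts(text):
    """Count, for every word, which words follow it and how often."""
    words = text.split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following

def next_word_probabilities(following, word):
    """Turn raw follow-up counts into a probability distribution."""
    counts = following[word]
    total = sum(counts.values())
    return {candidate: n / total for candidate, n in counts.items()}

model = train_bigram_counts(corpus)
probs = next_word_probabilities(model, "of")
best_guess = max(probs, key=probs.get)
print(probs, best_guess)
```

After seeing only four short clauses, the model already considers "times" the most likely word after "of", because that is the continuation it has seen most often.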

Getting into the GPT-weeds

The discussion so far is great for building a basic intuition for how LLMs are trained, but this is a deep dive, so let’s talk technical specifics.

Though we don’t know much about GPT-4, earlier models like GPT and GPT-2 have been studied in great detail. By understanding how they work, we can cultivate a better grasp of cutting-edge models.

When an LLM is trained, it’s fed a great deal of text data. It will grab samples from this data and try to predict the next token in each sample. To make our earlier explanation easier to understand, we implied that a token is a word, but that’s not quite right. A token can be a word, an individual letter, or a “sub-word”, i.e. a small chunk of letters and spaces.
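The three kinds of token can be compared side by side. The sketch below is only illustrative: the vocabulary is made up, and the greedy longest-match splitting is a simplification of the learned schemes (such as byte-pair encoding) that production tokenizers use.

```python
def word_tokens(text):
    """One token per whitespace-separated word."""
    return text.split()

def character_tokens(text):
    """One token per character, spaces included."""
    return list(text)

def subword_tokens(text, vocabulary):
    """Greedy longest-match split of each word into known pieces.

    Falls back to single characters when no vocabulary entry matches.
    """
    tokens = []
    for word in text.split():
        start = 0
        while start < len(word):
            # Try the longest possible piece first, then shrink.
            for end in range(len(word), start, -1):
                piece = word[start:end]
                if piece in vocabulary or end == start + 1:
                    tokens.append(piece)
                    start = end
                    break
    return tokens

# Hypothetical vocabulary; real tokenizers learn theirs from data.
vocabulary = {"token", "ization", "un", "believ", "able", "is"}
sentence = "tokenization is unbelievable"

print(word_tokens(sentence))
print(subword_tokens(sentence, vocabulary))
print(len(character_tokens(sentence)), "character tokens")
```

Sub-word tokens are a middle ground: the vocabulary stays manageable, yet rare words can still be spelled out from familiar pieces.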

This process is known as “self-supervised learning” because the model can assess its own accuracy by checking its predicted next token against the actual next token in the dataset it’s training on.

At first, its accuracy is likely to be very bad. But as it trains, its internal parameters (remember those?) are tuned with an optimizer such as stochastic gradient descent, and it gets better.
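Here is a minimal sketch of that loop, under heavy simplifying assumptions: the "model" is just a table of scores that predicts the next token from the previous one, the corpus is nine words long, and the learning rate and step count are arbitrary choices of ours. What it does share with real training is the shape of the process: the labels come from the text itself, and stochastic gradient descent nudges the parameters after each sampled example.

```python
import math
import random

text = "the cat sat on the mat the cat ate".split()
vocab = sorted(set(text))
index = {token: i for i, token in enumerate(vocab)}

# Self-supervision: each token's "label" is simply the token that follows it.
pairs = [(index[a], index[b]) for a, b in zip(text, text[1:])]

# Parameters: one row of scores (logits) per previous token.
logits = [[0.0] * len(vocab) for _ in vocab]

def softmax(row):
    highest = max(row)
    exps = [math.exp(v - highest) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def average_loss():
    """Mean cross-entropy between predictions and the actual next tokens."""
    return -sum(math.log(softmax(logits[p])[n]) for p, n in pairs) / len(pairs)

def train(steps=500, learning_rate=0.5, seed=0):
    rng = random.Random(seed)
    for _ in range(steps):
        prev, actual_next = rng.choice(pairs)  # "stochastic": one random sample
        probs = softmax(logits[prev])
        for j, p in enumerate(probs):
            # Gradient of cross-entropy with respect to each logit.
            gradient = p - (1.0 if j == actual_next else 0.0)
            logits[prev][j] -= learning_rate * gradient

loss_before = average_loss()
train()
loss_after = average_loss()
print(f"loss before: {loss_before:.3f}, after: {loss_after:.3f}")
```

The loss starts at the level of pure guessing and falls as the parameters are tuned, which is the "very bad at first, then better" pattern in miniature.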

One of the crucial architectural building blocks of LLMs is the transformer.

A full discussion of transformers is well beyond the scope of this piece, but the most important thing to know is that transformers can use “attention” to model more complex relationships in language data.

For example: in a sentence like “the dog didn’t chase the cat because it was too tired”, every human knows that “it” refers to the dog and not the cat. Earlier approaches to building language models struggled with such connections in sentences that were longer than a few words, but using attention, transformers can handle them with ease.

In addition to this obvious advantage, transformers have found widespread use in deep learning applications such as language models because they’re easy to parallelize, meaning that training times can be reduced.
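For readers who want to see the mechanics, below is a small sketch of scaled dot-product attention, the core calculation inside a transformer, for a single query. The two-dimensional vectors are hand-picked by us so that the query for “it” resembles the key for “dog”; in a real model these vectors are learned during training, and there are many attention heads and layers.

```python
import math

def softmax(scores):
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)  # how much the query "attends" to each token
    dims = len(values[0])
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dims)]
    return weights, output

# Hand-picked vectors, purely illustrative; a trained model learns these.
tokens = ["dog", "cat", "tired"]
keys = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query_for_it = [2.0, 0.0]  # constructed to resemble the key for "dog"

weights, output = attention(query_for_it, keys, values)
for token, weight in zip(tokens, weights):
    print(f"{token}: {weight:.2f}")
```

Because every token's weights can be computed independently of the others, this calculation is also what makes transformers straightforward to parallelize.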

Building On Top Of Large Language Models

Out-of-the-box LLMs are pretty powerful, but it’s often necessary to tweak them for specific applications such as enterprise bots. There are a few ways of doing this, and we’re going to confine ourselves to two major approaches: fine-tuning and prompt engineering.

First up, it’s possible to fine-tune some of these models. Fine-tuning an LLM involves providing a training set and letting the model update its internal weights to perform better on a specific task. 
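As a sketch of what "providing a training set" can look like in practice, the snippet below prepares a handful of example exchanges as JSON Lines (one JSON object per line). The "prompt" and "completion" field names and the example texts are assumptions for illustration; every provider defines its own required format, so check the relevant documentation before uploading anything.

```python
import json

# Hypothetical examples; the "prompt"/"completion" field names are an assumption.
examples = [
    {"prompt": "Where is my order?",
     "completion": "Happy to help! Could you share your order number?"},
    {"prompt": "How do I reset my password?",
     "completion": "Select 'Forgot password' on the sign-in page and follow the emailed link."},
]

def validate(records):
    """Keep only records with a non-empty prompt and completion."""
    return [
        r for r in records
        if r.get("prompt", "").strip() and r.get("completion", "").strip()
    ]

def to_jsonl(records):
    """Serialize one JSON object per line."""
    return "\n".join(json.dumps(r) for r in records)

def from_jsonl(text):
    """Parse JSON Lines back into a list of dictionaries."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

# The record with an empty prompt is dropped by validate().
incomplete = {"prompt": "", "completion": "orphaned answer"}
training_file = to_jsonl(validate(examples + [incomplete]))
print(training_file)
```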

Next, the emerging discipline of prompt engineering refers to the practice of systematically crafting the text fed to the model to get it to better approximate the behavior you want.

LLMs can be surprisingly sensitive to small changes in words, phrases, and context; the job of a prompt engineer, therefore, is to develop a feel for these sensitivities and construct prompts in a way that maximizes the performance of the LLM.
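One way to make this systematic is to score several prompt wordings against the same test cases. The harness below is a minimal sketch: stub_model is a stand-in we wrote so the example runs on its own, and it is deliberately rigged to answer tersely only when asked to. In practice you would swap in a call to whichever model you actually use.

```python
def stub_model(prompt):
    """Stand-in for a real LLM call; replace with your provider's client.

    It is rigged to answer in one word only when told to, which mimics
    (very crudely) how small wording changes can shift a model's behavior.
    """
    answer = "positive" if "love" in prompt else "negative"
    if "one word" in prompt:
        return answer
    return f"I would say the sentiment here is {answer}."

def accuracy(template, cases, model):
    """Fraction of cases where the model's reply exactly matches the label."""
    hits = sum(model(template.format(text=text)) == label for text, label in cases)
    return hits / len(cases)

templates = {
    "vague": "What about this review? {text}",
    "specific": "Classify the sentiment of this review in one word, "
                "positive or negative: {text}",
}
cases = [("I love this product", "positive"), ("It broke in a day", "negative")]

scores = {name: accuracy(t, cases, stub_model) for name, t in templates.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

The useful part is the habit rather than the stub: keep a fixed set of test cases, change one thing about the prompt at a time, and compare scores.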

How Can Large Language Models Be Used In Business?

There is a new gold rush in applying AI to business use cases.

For starters, given how good LLMs are at generating text, they’re being deployed to write email copy, blog posts, and social media content, to text or survey customers, and to summarize text.

LLMs are also being used in software development. Tools like Replit’s Ghostwriter are already dramatically improving developer productivity in a variety of domains, from web development to machine learning.

What Are The “LLiMitations” Of LLMs?

For all their power, LLMs have turned out to have certain well-known limitations. To begin with, LLMs are capable of being toxic, harmful, aggressive, and biased.

Though heroic efforts have been made to train this behavior out with techniques such as reinforcement learning from human feedback, it’s possible that it can reemerge under the right conditions.

This is something you should take into account before giving customers access to generative AI offerings.

Another oft-discussed limitation is the tendency of LLMs to “invent” facts. Remember, an LLM is just trying to predict sequences of tokens, and there’s no reason it couldn’t output a sequence of text like “Dr. Micha Sartorius, professor of applied computronics at Santa Grega University”, even though this person, field, and university are fictitious.

This, too, is something you should be cognizant of before letting customers interact with generative AI.

At Quiq, we harness the power of LLMs’ language-generating capabilities, while putting strict guardrails in place to prevent these risks that are inherent to public-facing generative AI.

Should You Be Using Large Language Models?

LLMs are a remarkable engineering achievement, having been trained on vast amounts of human text and able to generate whole conversations, working code, and more.

No doubt, some of the fervor around LLMs will end up being hype. Nevertheless, the technology has been shown to be incredibly powerful, and it is unlikely to go anywhere. If you’re interested in learning about how to integrate generative AI applications like Quiq’s into your business, schedule a demo with us today!

Prompt Engineering: What Is It—And How Can You Use It To Get The Most Out Of AI?

Think back to your school days. You come into class only to discover a timed writing assignment on the agenda. You have to respond to the provided prompt quickly and accurately, and you’ll be graded against criteria like grammar, vocabulary, factual accuracy, and more.

Well, that’s what natural language processing (NLP) software like ChatGPT does daily. Except, when a computer steps into the classroom, it can’t raise its hand to ask questions.

That’s why it’s so important to provide AI with a prompt that’s clear and thorough enough to produce the best possible response.

What is AI prompt engineering?

A prompt can be a question, a phrase, or several paragraphs. The more specific the prompt is, the better the response.

Writing the perfect prompt — prompt engineering — is critical to ensure the NLP response is not only factually correct but crafted exactly as you intended to best deliver information to a specific target audience.

You can’t use low-quality ingredients in the kitchen to produce gourmet cuisine — and you can’t expect AI to, either.

Let’s revisit your old classroom again: did you ever have a teacher provide a prompt where you just weren’t really sure what the question was asking? So, you guessed a response based on the information provided, only to receive a low score.

In the post-exam review, the teacher explained what she was actually looking for and how the question was graded. You sat there thinking, “If I’d only had that information when I was given the prompt!”

Well, AI feels your pain.

The responses that NLP software provides are only as good as the input data. Learning how to communicate with AI to get it to generate desired responses is a science, and you can learn what works best through trial and error to continuously optimize your prompts.

Prompts that fail to deliver, and why.

What’s at the root of prompt engineering gone wrong? It all comes down to incomplete, inconsistent, or incorrect data.

Even the most advanced AI using neural networks and deep learning techniques still needs to be fed the right information in the right way. When there is too little context provided, not enough examples, conflicting information from different sources, or major typos in the prompt, the AI can generate responses that are undesirable or just plain wrong.

How to craft the perfect prompt.

Here are some important factors to take into consideration for successful prompt engineering.

Clear instructions

Provide specific instructions and multiple examples to illustrate precisely what you want the AI to do. Words like “something,” “things,” “kind of,” and “it” (especially when there are multiple subjects within one sentence) can be indicators that your prompt is too vague.

Try to use descriptive nouns that refer to the subject of your sentence and avoid ambiguity.

  • Example (ambiguity): “She put the book on the desk; it was blue.”
  • What does “it” refer to in this sentence? Is the book blue, or is the desk blue?
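A quick automated check can catch some of this vagueness before a prompt is sent. The sketch below flags the vague words listed above; the function name and word list are our own, and a match is only a hint to re-read the prompt, not proof that it is ambiguous.

```python
import re

# The vague terms called out above; extend this list to suit your own prompts.
VAGUE_TERMS = ["something", "things", "kind of", "it"]

def find_vague_terms(prompt):
    """Return the vague words or phrases found in a prompt, in list order."""
    found = []
    for term in VAGUE_TERMS:
        # \b keeps "it" from matching inside words like "write".
        if re.search(rf"\b{re.escape(term)}\b", prompt, flags=re.IGNORECASE):
            found.append(term)
    return found

print(find_vague_terms("She put the book on the desk; it was blue."))
print(find_vague_terms("Write a summary of the attached article."))
```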

Simple language

Use plain language, but avoid shorthand and slang. When in doubt, err on the side of overcommunicating; you can then use trial and error to determine which shorthand approaches work for future, similar prompts. Avoid internal company or industry-specific jargon when possible, and be sure to clearly define any terms you do want to use.

Quality data

Give examples. Providing a single source of truth (for example, an article you want the AI to answer questions about) makes it more likely that the responses will be factually correct and grounded in that article.

On that note, teach the AI how you want it to respond when it doesn’t know the answer, such as “I don’t know,” “not enough information,” or simply “?”.

Otherwise, the AI may get creative and try to come up with an answer that sounds good but has no basis in reality.
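Both ideas, a single source of truth and an explicit fallback answer, can be built directly into the prompt. The template below is one possible wording, not a guaranteed recipe; the function name, the article text, and the exact phrasing are assumptions you would tune through trial and error.

```python
def build_grounded_prompt(article, question, fallback="I don't know"):
    """Assemble a prompt that restricts answers to one source article."""
    return (
        "Answer the question using only the article below.\n"
        f'If the article does not contain the answer, reply exactly: "{fallback}"\n\n'
        f"Article:\n{article}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Hypothetical article text for illustration.
article = "Orders ship within two business days. Returns are accepted for 30 days."
prompt = build_grounded_prompt(article, "Do you ship internationally?")
print(prompt)
```

Here the article says nothing about international shipping, so a well-behaved model should return the fallback rather than invent a policy.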

Persona

Develop a persona for your responses. Should the response sound as though it’s being delivered by a subject matter expert, or would it be better (legally or otherwise) if the response were written by someone who was only referring to subject matter experts (SMEs)?

  • Example (direct from SMEs): “Our team of specialists…”
  • Example (referring to SMEs): “Based on recent research by experts in the field…”

Voice, style, and tone

Decide how you want to represent your brand’s voice, which will largely be determined by your target audience. Would your customer be more likely to trust information that sounds like it was provided by an academic, or would a colloquial voice be more relatable?

Do you want a matter-of-fact, encyclopedia-type response, a friendly or supportive empathetic approach, or is your brand’s style more quick-witted and edgy?

With the right prompt, AI can capture all that and more.
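Persona, tone, and audience can be captured as reusable settings rather than retyped into every prompt. The sketch below is one hypothetical way to do that; the BrandVoice fields and the instruction wording are our own invention, not a standard.

```python
from dataclasses import dataclass

@dataclass
class BrandVoice:
    persona: str   # who is "speaking"
    tone: str      # how the response should sound
    audience: str  # who the response is written for

def build_system_prompt(voice):
    """Combine persona, tone, and audience into one instruction block."""
    return (
        f"You are {voice.persona}.\n"
        f"Write in a {voice.tone} tone.\n"
        f"Your readers are {voice.audience}."
    )

direct = BrandVoice(
    persona="a member of our team of specialists",
    tone="friendly, supportive",
    audience="first-time customers",
)
referring = BrandVoice(
    persona="a writer summarizing recent research by experts in the field",
    tone="matter-of-fact, encyclopedic",
    audience="technical buyers",
)

for voice in (direct, referring):
    print(build_system_prompt(voice), end="\n\n")
```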

Quiq takes prompt engineering out of the equation.

Prompt engineering is no easy task. There are many nuances to language that can trick even the most advanced NLP software.

Not only are incorrect AI responses a pain to identify and troubleshoot, but they can also hurt your business’s reputation if they aren’t caught before your content goes public.

On the other hand, manual tasks that could be automated with NLP waste time and money that could be allocated to higher-priority initiatives.

Quiq uses large language models (LLMs) to continuously optimize AI responses to your company’s unique data. With Quiq’s world-class Conversational AI platform, you can reduce the burden on your support team, lower costs, and boost customer satisfaction.

Contact Quiq today to see how our innovative LLM-built features improve business outcomes.
