Forrester Report: The State of Conversational AI Read the report —>

What Is Transfer Learning? – The Role of Transfer Learning in Building Powerful Generative AI Models

Machine learning is hard work. Sure, it only takes a few minutes to knock out a simple tutorial where you’re training an image classifier on the famous iris dataset, but training a big model to do something truly valuable – like interacting with customers over a chat interface – is a much greater challenge.

Transfer learning offers one possible solution to this problem. By making it possible to train a model in one domain and reuse it in another, transfer learning can reduce demands on your engineering team by a substantial amount.

Today, we’re going to get into transfer learning, defining what it is, how it works, where it can be applied, and the advantages it offers.

Let’s get going!

What is Transfer Learning in AI?

In the abstract, transfer learning refers to any situation in which knowledge from one task, problem, or domain is transferred to another. If you learn how to play the guitar well and then successfully use those same skills to pick up a mandolin, that’s an example of transfer learning.

Speaking specifically about machine learning and artificial intelligence, the idea is very similar. Transfer learning is when you pre-train a model on one task or dataset and then figure out a way to reuse it for another (we’ll talk about methods later).

If you train an image model, for example, it will tend to learn certain low-level features (like curves, edges, and lines) that show up in pretty much all images. This means you could fine-tune the pre-trained model to do something more specialized, like recognizing faces.

Why Transfer Learning is Important in Deep Learning Models

Building a deep neural network requires serious expertise, especially if you’re doing something truly novel or untried.

Transfer learning, while far from trivial, is simply not as taxing. GPT-4 is the kind of project that could only have been tackled by some of Earth’s best engineers, but setting up a fine-tuning pipeline to get it to do good sentiment analysis is a much simpler job.

By lowering the barrier to entry, transfer learning brings advanced AI into reach for a much broader swath of people. For this reason alone, it’s an important development.

Transfer Learning vs. Fine-Tuning

And speaking of fine-tuning, it’s natural to wonder how it’s different from transfer learning.

The simple answer is that fine-tuning is a kind of transfer learning. Transfer learning is a broader concept, and there are other ways to approach it besides fine-tuning.

What are the 5 Types of Transfer Learning?

Broadly speaking, there are five major types of transfer learning, which we’ll discuss in the following sections.

Domain Adaptation

Under the hood, most modern machine learning is really just an application of statistics to particular datasets.

The distribution of the data a particular model sees, therefore, matters a lot. Domain adaptation refers to a family of transfer learning techniques in which a model is (hopefully) trained such that it’s able to handle a shift in distributions from one domain to another (see section 5 of this paper for more technical details).

Domain Confusion

Earlier, we referenced the fact that the layers of a neural network can learn representations of particular features – one layer might be good at detecting curves in images, for example.

It’s possible to structure our training such that a model learns more domain invariant features, i.e. features that are likely to show up across multiple domains of interest. This is known as domain confusion because, in effect, we’re making the domains as similar as possible.

Multitask Learning

Multitask learning is arguably not even a type of transfer learning, but it came up repeatedly in our research, so we’re adding a section about it here.

Multitask learning is what it sounds like; rather than simply training a model on a single task (i.e. detecting humans in images), you attempt to train it to do several things at once.

The debate about whether multitask learning is really transfer learning stems from the fact that transfer learning generally revolves around adapting a pre-trained model to a new task, rather than having it learn to do more than one thing at a time.

One-Shot Learning

One thing that distinguishes machine learning from human learning is that the former requires much more data. A human child will probably only need to see two or three apples before they learn to tell apples from oranges, but an ML model might need to see thousands of examples of each.

But what if that weren’t necessary? The field of one-shot learning addresses itself to the task of learning e.g. object categories from either one example or a small number of them. This idea was pioneered in “One-Shot Learning of Object Categories”, a watershed paper co-authored by Fei-Fei Li and her collaborators. Their Bayesian one-shot learner was able to “…to incorporate prior knowledge of the object world into the learning scheme”, and it outperformed a variety of other models in object recognition tasks.

Zero-Shot Learning

Of course, there might be other tasks (like translating a rare or endangered language), for which it is effectively impossible to have any labeled data for a model to train on. In such a case, you’d want to use zero-shot learning, which is a type of transfer learning.

With zero-shot learning, the basic idea is to learn features in one data set (like images of cats) that allow successful performance on a different data set (like images of dogs). Humans have little problem with this, because we’re able to rapidly learn similarities between types of entities. We can see that dogs and cats both have tails, both have fur, etc. Machines can perform the same feat if the data is structured correctly.

How Does Transfer Learning Work?

There are a few different ways you can go about utilizing transfer learning processes in your own projects.

Perhaps the most basic is to use a good pre-trained model off the shelf as a feature extractor. This would mean keeping the pre-trained model in place, but then replacing its final layer with a layer custom-built for your purposes. You could take the famous AlexNet image classifier, remove its last classification layer, and replace it with your own, for example.

Or, you could fine-tune the pre-trained model instead. This is a more involved engineering task and requires that the pre-trained model be modified internally to be better suited to a narrower application. This will often mean that you have to freeze certain layers in your model so that the weights don’t change, while simultaneously allowing the weights in other layers to change.

What are the Applications of Transfer Learning?

As machine learning and deep learning have grown in importance, so too has transfer learning become more crucial. It now shows up in a variety of different industries. The following are some high-level indications of where you might see transfer learning being applied.

Speech recognition across languages: Teaching machines to recognize and process spoken language is an important area of AI research and will be of special interest to those who operate contact centers. Transfer learning can be used to take a model trained in a language like French and repurpose it for Spanish.

Training general-purpose game engines: If you’ve spent any time playing games like chess or go, you know that they’re fairly different. But, at a high enough level of abstraction, they still share many features in common. That’s why transfer learning can be used to train up a model on one game and, under certain conditions, use it in another.

Object recognition and segmentation: Our Jetsons-like future will take a lot longer to get here if our robots can’t learn to distinguish between basic objects. This is why object recognition and object segmentation are both such important areas of research. Transfer learning is one way of speeding up this process. If models can learn to recognize dogs and then quickly be re-purposed for recognizing muffins, then we’ll soon be able to outsource both pet care and cooking breakfast.

transfer_learning_chihuahua
In fairness to the AI, it’s not like we can really tell them apart!

Applying Natural Language Processing: For a long time, computer vision was the major use case of high-end, high-performance AI. But with the release of ChatGPT and other large language models, NLP has taken center stage. Because much of the modern NLP pipeline involves word vector embeddings, it’s often possible to use a baseline, pre-trained NLP model in applications like topic modeling, document classification, or spicing up your chatbot so it doesn’t sound so much like a machine.

What are the Benefits of Transfer Learning?

Transfer learning has become so popular precisely because it offers so many advantages.

For one thing, it can dramatically reduce the amount of time it takes to train a new model. Because you’re using a pre-trained model as the foundation for a new, task-specific model, far fewer engineering hours have to be spent to get good results.

There are also a variety of situations in which transfer learning can actually improve performance. If you’re using a good pre-trained model that was trained on a general enough dataset, many of the features it learned will carry over to the new task.

This is especially true if you’re working in a domain where there is relatively little data to work with. It might simply not be possible to train a big, cutting-edge model on a limited dataset, but it will often be possible to use a pre-trained model that is fine-tuned on that limited dataset.

What’s more, transfer learning can work to prevent the ever-present problem of overfitting. Overfitting has several definitions depending on what resource you consult, but a common way of thinking about it is when the model is complex enough relative to the data that it begins learning noise instead of just signal.

That means that it may do spectacularly well in training only to generalize poorly when it’s shown fresh data. Transfer learning doesn’t completely rule out this possibility, but it makes it less likely to happen.

Transfer learning also has the advantage of being quite flexible. You can use transfer learning for everything from computer vision to natural language processing, and many domains besides.

Relatedly, transfer learning makes it possible for your model to expand into new frontiers. When done correctly, a pre-trained model can be deployed to solve an entirely new problem, even when the underlying data is very different from what it was shown before.

When To Use Transfer Learning

The list of benefits we just enumerated also offers a clue as to when it makes sense to use transfer learning.

Basically, you should consider using transfer learning whenever you have limited data, limited computing resources, or limited engineering brain cycles you can throw at a problem. This will often wind up being the case, so whenever you’re setting your sights on a new goal, it can make sense to spend some time seeing if you can’t get there more quickly by simply using transfer learning instead of training a bespoke model from scratch.

Check out the second video in Quiq’s LLM Intuitions series—created by our Head of AI, Kyle McIntyre—to learn about one of the oldest forms of transfer learning: Word embeddings.

Transfer Learning and You

In the contact center space, we understand how difficult it can be to effectively apply new technologies to solve our problems. It’s one thing to put together a model for a school project, and quite another to have it tactfully respond to customers who might be frustrated or confused.

Transfer learning is one way that you can get more bang for your engineering buck. By training a model on one task or dataset and using it on another, you can reduce your technical budget while still getting great results.

You could also just rely on us to transfer our decades of learning on your behalf (see what we did there). We’ve built an industry-leading conversational AI chat platform that is changing the game in contact centers. Reach out today to see how Quiq can help you leverage the latest advances in AI, without the hassle.

How Generative AI is Supercharging Contact Center Agents

If you’re reading this, you’ve probably had a chance to play around with ChatGPT or one of the other large language models (LLMs) that have been making waves and headlines in recent months.

Concerns around automation go back a long way, but there’s long been extra worry about the possibility that machines will make human labor redundant. If you’ve used generative AI to draft blog posts or answer technical questions, it’s natural to wonder if perhaps algorithms will soon be poised to replace humans in places like contact centers.

Given how new these LLMs are there has been little scholarship on how they’ve changed the way contact centers function. But “Generative AI at Work” by Erik Brynjolfsson, Danielle Li, and Lindsey R. Raymond took aim at exactly this question.

The results are remarkable. They found that access to tools like ChatGPT not only led to a marked increase in productivity among the lowest-skilled workers, it also had positive impacts on other organizational metrics, like reducing turnover.

Today, we’re going to break this economic study down, examining its methods, its conclusions, and what they mean for the contact centers of the future.

Let’s dig in!

A Look At “Generative AI At Work”

The paper studies data from the use of a conversational AI assistant by a little over 5,000 agents working in customer support.

It contains several major sections, beginning with a technical primer on what generative AI is and how it works before moving on to a discussion of the study’s methods and results.

What is Generative AI?

Covering the technical fundamentals of generative AI will inform our efforts to understand the ways in which this AI technology affected work in the study, as well as how it is likely to do so in future deployments.

A good way to do this is to first grasp how traditional, rules-based programming works, then contrast this with generative AI.

When you write a computer program, you’re essentially creating a logical structure that furnishes instructions the computer can execute.

To take a simple case, you might try to reverse a string such as “Hello world”. One way to do this explicitly is to write code in a language like Python which essentially says:

“Create a new, empty list, then start at the end of the string we gave you and work forward, successively adding each character you encounter to that list before joining all the characters into a reversed string”:

Python code demonstrating a reverse string.

Despite the fact that these are fairly basic instructions, it’s possible to weave them into software that can steer satellites and run banking infrastructure.

But this approach is not suitable for every kind of problem. If you’re trying to programmatically identify pictures of roses, for example, it’s effectively impossible to do this with rules like the ones we used to reverse the string.

Machine learning, however, doesn’t even try to explicitly define any such rules. It works instead by feeding a model many pictures of roses, and “training” it to learn a function that lets it identify new pictures of roses it has never seen before.

Generative AI is a kind of machine learning in which gargantuan models are trained on mind-boggling amounts of text data until they’re able to produce their own, new text. Generative AI is a distinct sub-branch of ML because its purpose is generation, while other kinds of models might be aimed at tasks like classification and prediction.

Is Generative AI The Same Thing As Large Language Models?

At this point, you might be wondering how whether generative AI is the same thing as LLMs. With all the hype and movement in the space, it’s easy to lose track of the terminology.

LLMs are a subset of the broader category of generative AI. All LLMs are generative AI, but there are generative algorithms that work with images, music, chess moves, and other things besides natural language.

How Did The Researchers Study the Effects of Generative AI on Work?

Now we understand that ML learns to recognize patterns, how this is different from classical computer programming, and how generative AI fits into the whole picture.

We can now get to the meat of the study, beginning with how Brynjolfsson, Li, and Raymond actually studied the use of generative AI by workers at a contact center.

The firm from which they drew their data is a Fortune 500 company that creates enterprise software. Its support agents are located mainly in the Phillippines (with a smaller number in the U.S.) to resolve customer issues via a chat interface.

Most of the agent’s job boils down to answering questions from the owners of small businesses that use the firm’s software. Their productivity is assessed via how long it takes them to resolve a given issue (“average handle time”), the fraction of total issues a given agent is able to resolve to the customer’s satisfaction (“resolution rate”), and the net number of customers who would recommend the agent (“net promoter score.”)

Line graphs showing handle time, resolution rate and customer satisfaction using AI.

The AI used by the firm is a version of GPT which has received additional training on conversations between customers and agents. It is mostly used for two things: generating appropriate responses to customers in real-time and surfacing links to the firm’s technical documentation to help answer specific questions about the software.

Bear in mind that this generative AI system is meant to help the agents in performing their jobs. It is not intended to – and is not being trained to – completely replace them. They maintain autonomy in deciding whether and how much of the AI’s suggestions to take.

How Did Generative AI Change Work?

Next, we’ll look at what the study actually uncovered.

There were four main findings, touching on how total worker productivity was impacted, whether productivity gains accrued mainly to low-skill or high-skill workers, how access to an AI tool changed learning on the job, and how the organization changed as a result.

1. Access to Generative AI Boosted Worker Productivity

First, being able to use the firm’s AI tool increased worker productivity by almost 14%. This came from three sources: a reduction in how long it took any given agent to resolve a particular issue, an expansion in the total number of resolutions an agent was able to work on in an hour, and a small jump in the fraction of chats that were completed successfully.

The firm's AI tool increased worker productivity by almost 14%

This boost happened very quickly, showing up in the first month after deployment, growing a little in the second month, and then remaining at roughly that level for the duration of the study.

2. Access to Generative AI Was Most Helpful for Lower-Skilled Agents

Intriguingly, the greatest productivity gains were seen among agents that were relatively low-skill, such as those that were new to the job, with longer-serving, higher-skilled agents seeing virtually none.

The agents in the very bottom quintile for skill level, in fact, were able to resolve 35% more calls per hour—a substantial jump.

The agents in the very bottom quintile for skill level were able to resolve more calls per hour 35%.

With the benefit of hindsight it’s tempting to see these results as obvious, but they’re not. Earlier studies have usually found that the benefits of new computing technologies accrued to the ablest workers, or led firms to raise the bar on skill requirements for different positions.

If it’s true that generative AI is primarily going to benefit less able employees, this fact alone will distinguish it from prior waves of innovation. [1]

3. Access To Generative AI Helps New Workers “Move Down the Learning Curve”

Perhaps the most philosophically interesting conclusion drawn by the study’s authors relates to how generative AI is able to partially learn the tacit knowledge of more skilled workers.

The term “tacit knowledge” refers to the hard-to-articulate behaviors you pick up as you get good at something.

Imagine trying to teach a person how to ride a bike. It’s easy enough to give broad instructions (“check your shoelaces”, “don’t brake too hard”), but there ends up being a billion little subtleties related to foot placement, posture, etc. that are difficult to get into words.

This is true for everything, and it’s part of what distinguishes masters from novices. It’s also a major reason for the fact that many professions have been resistant to full automation.

Remember our discussion of how rule-based programming is poorly suited to tasks where the rules are hard to state? Well, that applies to tasks involving a lot of tacit knowledge. If no one, not even an expert, can tell you precisely what steps to take to replicate their results, then no one is going to be able to program a computer to do it either.

But ML and generative AI don’t face this restriction. With data sets that are big enough and rich enough, the algorithms might be able to capture some of the tacit knowledge expert contact center agents have, e.g. how they phrase replies to customers.

This is suggested by the study’s results. By analyzing the text of customer-agent interactions, the authors found that novice agents using generative AI were able to sound more like experienced agents, which contributed to their success.

4. Access to Generative AI Changed the Way the Organization Functioned

Organizations are profoundly shaped by their workers, and we should expect to see organization-level changes when a new technology dramatically changes how employees operate.

Two major findings from the study were that employee turnover was markedly reduced and there were far fewer customers “escalating” an issue by asking to speak to a supervisor. This could be because agents using generative AI were overall treated much better by customers (who have been known to become frustrated and irate), leading to less stress.

The Contact Center of the Future

Generative AI has already impacted many domains, and this trend will likely only continue going forward. “Generative AI At Work” provides a fascinating glimpse into the way that this technology changed a large contact center by boosting productivity among the least-skilled agents, helping disseminate the hard-won experience of the most-skilled agents, and overall reducing turnover and dissatisfaction.

If this piece has piqued your curiosity about how you can use advanced AI tools for customer-facing applications, schedule a demo of the Quiq conversational CX platform today.

From resolving customer complaints with chatbots to automated text-message follow-ups, we’ve worked hard to build a best-in-class solution for businesses that want to scale with AI.

Let’s see what we can do for you!

[1] See e.g. this quote: “Our paper is related to a large literature on the impact of various forms of technological adoption on worker productivity and the organization of work (e.g. Rosen, 1981; Autor et al., 1998; Athey and Stern, 2002; Bresnahan et al., 2002; Bartel et al., 2007; Acemoglu et al., 2007; Hoffman et al., 2017; Bloom et al., 2014; Michaels et al., 2014; Garicano and Rossi-Hansberg, 2015; Acemoglu and Restrepo, 2020). Many of these studies, particularly those focused on information technologies, find evidence that IT complements higher-skill workers (Akerman et al., 2015; Taniguchi and Yamada, 2022). Bartel et al. (2007) shows that firms that adopt IT tend to use more skilled labor and increase skill requirements for their workers. Acemoglu and Restrepo (2020) study the diffusion of robots and find that the negative employment effects of robots are most pronounced for workers in blue-collar occupations and those with less than a college education. In contrast, we study a different type of technology—generative AI—and find evidence that it most effectively augments lower-skill workers.”

A Guide to Fine-Tuning Pretrained Language Models for Specific Use Cases

Over the past half-year, large language models (LLMs) like ChatGPT have proven remarkably useful for a wide range of tasks, including machine translation, code analysis, and customer interactions in places like contact centers.

For all this power and flexibility, however, it is often still necessary to use fine-tuning to get an LLM to generate high-quality output for specific use cases.

Today, we’re going to do a deep dive into this process, understanding how these models work, what fine-tuning is, and how you can leverage it for your business.

What is a Pretrained Language Model?

First, let’s establish some background context by tackling the question of what pretrained models are and how they work.

The “GPT” in ChatGPT stands for “generative pretrained transformer”, and this gives us a clue as to what’s going on under the hood. ChatGPT is a generative model, meaning its purpose is to create new output; it’s pretrained, meaning that it has already seen a vast amount of text data by the time end users like us get our hands on it; and it’s a transformer, which refers to the fact that it’s built out of billions of transformer modules stacked into layers.

If you’re not conversant in the history of machine learning it can be difficult to see what the big deal is, but pretrained models are a relatively new development. Once upon a time in the ancient past (i.e. 15 or 20 years ago), it was an open question as to whether engineers would be able to pretrain a single model on a dataset and then fine-tune its performance, or whether they would need to approach each new problem by training a model from scratch.

This question was largely resolved around 2013, when image models trained on the ImageNet dataset began sweeping competitions left and right. Since then it has become more common to use pretrained models as a starting point, but we want to emphasize that this approach does not always work. There remain a vast number of important projects for which building a bespoke model is the only way to go.

What is Transfer Learning?

Transfer learning refers to when an agent or system figures out how to solve one kind of problem and then uses this knowledge to solve a different kind of problem. It’s a term that shows up all over artificial intelligence, cognitive psychology, and education theory.

Author, chess master, and martial artist Josh Waitzkin captures the idea nicely in the following passage from his blockbuster book, The Art of Learning:

“Since childhood I had treasured the sublime study of chess, the swim through ever-deepening layers of complexity. I could spend hours at a chessboard and stand up from the experience on fire with insight about chess, basketball, the ocean, psychology, love, art.”

Transfer learning is a broader concept than pretraining, but the two ideas are closely related. In machine learning, competence can be transferred from one domain (generating text) to another (translating between natural languages or creating Python code) by pretraining a sufficiently large model.

What is Fine-Tuning A Pretrained Language Model?

Fine-tuning a pretrained language model occurs when the model is repurposed for a particular task by being shown illustrations of the correct behavior.

If you’re in a whimsical mood, for example, you might give ChatGPT a few dozen limericks so that its future output always has that form.

It’s easy to confuse fine-tuning with a few other techniques for getting optimum performance out of LLMs, so it’s worth getting clear on terminology before we attempt to give a precise definition of fine-tuning.

Fine-Tuning a Language Model v.s. Zero-Shot Learning

Zero-shot learning is whatever you get out of a language model when you feed it a prompt without making any special effort to show it what you want. It’s not technically a form of fine-tuning at all, but it comes up in a lot of these conversations so it needs to be mentioned.

(NOTE: It is sometimes claimed that prompt engineering counts as zero-shot learning, and we’ll have more to say about that shortly.)

Fine-Tuning a Language Model v.s. One-Shot Learning

One-shot learning is showing a language model a single example of what you want it to do. Continuing our limerick example, one-shot learning would be giving the model one limerick and instructing it to format its replies with the same structure.

Fine-Tuning a Language Model v.s. Few-Shot Learning

Few-shot learning is more or less the same thing as one-shot learning, but you give the model several examples of how you want it to act.

How many counts as “several”? There’s no agreed-upon number that we know about, but probably 3 to 5, or perhaps as many as 10. More than this and you’re arguably not doing “few”-shot learning anymore.

Fine-Tuning a Language Model v.s. Prompt Engineering

Large language models like ChatGPT are stochastic and incredibly sensitive to the phrasing of the prompts they’re given. For this reason, it can take a while to develop a sense of how to feed the model instructions such that you get what you’re looking for.

The emerging discipline of prompt engineering is focused on cultivating this intuitive feel. Minor tweaks in word choice, sentence structure, etc. can have an enormous impact on the final output, and prompt engineers are those who have spent the time to learn how to make the most effective prompts (or are willing to just keep tinkering until the output is correct).

Does prompt engineering count as fine-tuning? We would argue that it doesn’t, primarily because we want to reserve the term “fine-tuning” for the more extensive process we describe in the next few sections.

Still, none of this is set in stone, and others might take the opposite view.

Distinguishing Fine-Tuning From Other Approaches

Having discussed prompt engineering and zero-, one-, and few-shot learning, we can give a fuller definition of fine-tuning.

Fine-tuning is taking a pretrained language model and optimizing it for a particular use case by giving it many examples to learn from. How many you ultimately need will depend a lot on your task – particularly how different the task is from the model’s training data and how strict your requirements for its output are – but you should expect it to take on the order of a few dozen or a few hundred examples.

Though it bears an obvious similarity to one-shot and few-shot learning, fine-tuning will generally require more work to come up with enough examples, and you might have to build a rudimentary pipeline that feeds the examples in through the API. It’s almost certainly not something you’ll be doing directly in the ChatGPT web interface.

Contact Us

How Can I Fine-Tune a Pretrained Language Model?

Having gotten this far, we can now turn our attention to what the fine-tuning procedure actually consists in. The basic steps are: deciding what you’re wanting to accomplish, gather the requisite data (and formatting it correctly), feeding it to your model, and evaluating the results.

Let’s discuss each, in turn.

Deciding on Your Use Case

The obvious place to begin is figuring out exactly what it is you want to fine-tune a pretrained model to do.

It may seem as though this is too obvious to be included as its own standalone step, but we’ve singled it out is because you need to think through the specifics of what you’re trying to accomplish. It’s not enough to say “We want to fine-tune this model to write tweets for us”, you have to consider questions like “Should the tone by formal or informal” and “Are we okay with it adding emojis”.

This matters because it informs the data that you gather for fine-tuning. If you feed the model irreverent tweets with lots of 🙂s and 🚀s, you can expect that to show up in the tweets it generates for you.

Picking a Base Model

Next, you have to figure out which model you want to fine-tune. This is not the place to get into a discussion on all the “foundation” models that are available today, but suffice it to say that HuggingFace, OpenAI, Anthropic, and the other companies in the pretrained model space offer a variety of different options. BigScience has BLOOM and T0pp models, OpenAI has Ada, Babbage, Da Vinci, and Curie, and so on.

Different models have different fine-tuning requirements, so this choice will impact how you structure your pipeline later.

Gathering and Formatting the Fine-Tuning Data

The heart of fine-tuning a pretrained model is gathering the samples from which it will learn. This is more or less exactly what it sounds like: you compile a list of prompts and desired responses. This is the data that will adjust the model’s internal parameters, optimizing it for your use case.

AI educator Dave Shapiro has a tutorial on fine-tuning GPT-3 to be a tutor for younger students, and here is one of the training samples he used (“TIM” is the name he’s using for the tutor):

User: Hey TIM, I’m interested in learning more about Charles Darwin. Can you tell me a bit about him?

TIM: Charles Darwin was an English naturalist and geologist who developed the theory of evolution by natural selection. He was born into a wealthy family and was able to attend the prestigious Shrewsbury School. After graduation, he enrolled at the University of Edinburgh to study medicine, but he soon dropped out and enrolled at Christ’s College, Cambridge.

User: Interesting! I didn’t know he was born into a wealthy family.

TIM: Yes, his family was very wealthy and his father was a doctor. Darwin’s grandfather, Erasmus Darwin, was a well-known scientist and wrote several books on botany and evolution.

User: Wow, I had no idea. What made Darwin decide to drop out of medical school and study at Cambridge?

TIM: Darwin was interested in natural history and had developed a keen interest in taxidermy. After dropping out of medical school, he traveled to Wales with some friends to study geology. It was during this time that he began to develop his theory of evolution.

One thing you’ll note is that this sample is pretty long. It could take a while to come up with hundreds of these to fine-tune your model, but that’s actually something a generative pretrained language model like ChatGPT can help with. Just be sure you have a human go over the samples and check them for accuracy, or you risk compromising the quality of your outputs.

Another thing to think about is how you’ll handle adversarial behavior and edge cases. If you’re training a conversational AI chatbot for a contact center, for example, you’ll want to include plenty of instances of the model calmly and politely responding to an irate customer. That way, your output will be similarly calm and polite.

Lastly, you’ll have to format the fine-tuning data according to whatever specifications are required by the base model you’re using. It’ll probably be something similar to JSON, but check the documentation to be sure.

Feeding it to Your Model

Now that you’ve got your samples ready, you’ll have to give them to the model for fine-tuning. This will involve you feeding the examples to the model via its API and waiting until the process has finished.

What is the Difference Between Fine-Tuning and a Pretrained Model?

A pretrained model is one that has been previously trained on a particular dataset or task, and fine-tuning is getting that model to do well on a new task by showing it examples of the output you want to see.

Pretrained models like ChatGPT are often pretty good out of the box, but if you’re wanting it to create legal contracts or work with highly-specialized scientific vocabulary, you’ll likely need to fine-tune it.

Should You Fine-Tune a Pretrained Model For Your Business?

Generative pretrained language models like ChatGPT and Bard have already begun to change the way businesses like contact centers function, and we think this is a trend that is likely to accelerate in the years ahead.

If you’ve been intrigued by the possibility of fine-tuning a pretrained model to supercharge your enterprise, then hopefully the information contained in this article gives you some ideas on how to begin.

Another option is to leverage the power of the Quiq platform. We’ve built a best-in-class conversational AI system that can automate substantial parts of your customer interactions (without you needing to run your own models or set up a fine-tuning pipeline.)

To see how we can help, schedule a demo with us today!

Request A Demo

Brand Voice And Tone Building With Prompt Engineering

Artificial intelligence tools like ChatGPT are changing the way strategists are building their brands.

But with the staggering rate of change in the field, it can be hard to know how to utilize its full potential. Should you hire an engineering team? Pay for a subscription and do it yourself?

The truth is, it depends. But one thing you can try is prompt engineering, a term that refers to carefully crafting the instructions you give to the AI to get the best possible results.

In this piece, we’ll cover the basics of prompt engineering and discuss the many ways in which you can build your brand voice with generative AI.

What is Prompt Engineering?

As the name implies, generative AI refers to any machine learning (ML) model whose primary purpose is to generate some output. There are generative AI applications for creating new images, text, code, and music.

There are also ongoing efforts to expand the range of outputs generative models can handle, such as a fascinating project to build a high-level programming language for creating new protein structures.

The way you get output from a generative AI model is by prompting it. Just as you could prompt a friend by asking “How was your vacation in Japan,” you can prompt a generative model by asking it questions and giving it instructions. Here’s an example:

“I’m working on learning Java, and I want you to act as though you’re an experienced Java teacher. I keep seeing terms like `public class` and `public static void`. Can you explain to me the different types of Java classes, giving an example and explanation of each?”

When we tried this prompt with GPT-4, it responded with a lucid breakdown of different Java classes (i.e., static, inner, abstract, final, etc.), complete with code snippets for each one.

When Small Changes Aren’t So Small

Mapping the relationship between human-generated inputs and machine-generated outputs is what the emerging field of “prompt engineering” is all about.

Prompt engineering only entered popular awareness in the past few years, as a direct consequence of the meteoric rise of large language models (LLMs). It rapidly became obvious that GPT-3.5 was vastly better than pretty much anything that had come before, and there arose a concomitant interest in the best ways of crafting prompts to maximize the effectiveness of these (and similar) tools.

At first glance, it may not be obvious why prompt engineering is a standalone profession. After all, how difficult could it be to simply ask the computer to teach you Chinese or explain a coding concept? Why have a “prompt engineer” instead of a regular engineer who sometimes uses GPT-4 for a particular task?

A lot could be said in reply, but the big complication is the fact that a generative AI’s output is extremely dependent upon the input it receives.

An example pulled from common experience will make this clearer. You’ve no doubt noticed that when you ask people different kinds of questions you elicit different kinds of responses. “What’s up?” won’t get the same reply as “I notice you’ve been distant recently, does that have anything to do with losing your job last month?”

The same basic dynamic applies to LLMs. Just as subtleties in word choice and tone will impact the kind of interaction you have with a person, they’ll impact the kind of interaction you have with a generative model.

All this nuance means that conversing with your fellow human beings is a skill that takes a while to develop, and that also holds in trying to productively using LLMs. You must learn to phrase your queries in a way that gives the model good context, includes specific criteria as to what you’re looking for in a reply, etc.

Honestly, it can feel a little like teaching a bright, eager intern who has almost no initial understanding of the problem you’re trying to get them to solve. If you give them clear instructions with a few examples they’ll probably do alright, but you can’t just point them at a task and set them loose.

We’ll have much more to say about crafting the kinds of prompts that help you build your brand voice in upcoming sections, but first, let’s spend some time breaking down the anatomy of a prompt.

This context will come in handy later.

What’s In A Prompt?

In truth, there are very few real restrictions on how you use an LLM. If you ask it to do something immoral or illegal it’ll probably respond along the lines of “I’m sorry Dave, but as a large language model I can’t let you do that,” otherwise you can just start feeding it text and seeing how it responds.

That having been said, prompt engineers have identified some basic constituent parts that go into useful prompts. They’re worth understanding as you go about using prompt engineering to build your brand voice.

Context

First, it helps to offer the LLM some context for the task you want done. Under most circumstances, it’s enough to give it a sentence or two, though there can be instances in which it makes sense to give it a whole paragraph.

Here’s an example prompt without good context:

“Can you write me a title for a blog post?”

Most human beings wouldn’t be able to do a whole lot with this, and neither can an LLM. Here’s an example prompt with better context:

“I’ve just finished a blog post for a client that makes legal software. It’s about how they have the best payments integrations, and the tone is punchy, irreverent, and fun. Could you write me a title for the post that has the same tone?”

To get exactly what you’re looking for you may need to tinker a bit with this prompt, but you’ll have much better chances with the additional context.

Instructions

Of course, the heart of the matter is the actual instructions you give the LLM. Here’s the context-added prompt from the previous section, whose instructions are just okay:

“I’ve just finished a blog post for a client that makes legal software. It’s about how they have the best payments integrations, and the tone is punchy, irreverent, and fun. Could you write me a title for the post that has the same tone?”

A better way to format the instructions is to ask for several alternatives to choose from:

“I’ve just finished a blog post for a client that makes legal software. It’s about how they have the best payments integrations, and the tone is punchy, irreverent, and fun. Could you give me 2-3 titles for the blog post that have the same tone?”

Here again, it’ll often pay to go through a couple of iterations. You might find – as we did when we tested this prompt – that GPT-4 is just a little too irreverent (it used profanity in one of its titles.) If you feel like this doesn’t strike the right tone for your brand identity you can fix it by asking the LLM to be a bit more serious, or rework the titles to remove the profanity, etc.

You may have noticed that “keep iterating and testing” is a common theme here.

Example Data

Though you won’t always need to get the LLM input data, it is sometimes required (as when you need it to summarize or critique an argument) and is often helpful (as when you give it a few examples of titles you like.)

Here’s the reworked prompt from above, with input data:

“I’ve just finished a blog post for a client that makes legal software. It’s about how they have the best payments integrations, and the tone is punchy, irreverent, and fun. Could you give me 2-3 titles for the blog post that have the same tone?

Here’s a list of two titles that strike the right tone:
When software goes hard: dominating the legal payments game.
Put the ‘prudence’ back in ‘jurisprudence’ by streamlining your payment collections.”

Remember, LLMs are highly sensitive to what you give them as input, and they’ll key off your tone and style. Showing them what you want dramatically boosts the chances that you’ll be able to quickly get what you need.

Output Indicators

An output indicator is essentially any concrete metric you use to specify how you want the output to be structured. Our existing prompt already has one, and we’ve added another (both are bolded):

“I’ve just finished a blog post for a client that makes legal software. It’s about how they have the best payments integrations, and the tone is punchy, irreverent, and fun. Could you give me 2-3 titles for the blog post that have the same tone? Each title should be approximately 60 characters long.

Here’s a list of two titles that strike the right tone:
When software goes hard: dominating the legal payments game.
Put the ‘prudence’ back in ‘jurisprudence’ by streamlining your payment collections.”

As you go about playing with LLMs and perfecting the use of prompt engineering in building your brand voice, you’ll notice that the models don’t always follow these instructions. Sometimes you’ll ask for a five-sentence paragraph that actually contains eight sentences, or you’ll ask for 10 post ideas and get back 12.

We’re not aware of any general way of getting an LLM to consistently, strictly follow instructions. Still, if you include good instructions, clear output indicators, and examples, you’ll probably get close enough that only a little further tinkering is required.

What Are The Different Types of Prompts You Can Use For Prompt Engineering?

Though prompt engineering for tasks like brand voice and tone building is still in its infancy, there are nevertheless a few broad types of prompts that are worth knowing.

  • Zero-shot prompting: A zero-shot prompt is one in which you simply ask directly for what you want without providing any examples. It’ll simply generate an output on the basis of its internal weights and prior training, and, surprisingly, this is often more than sufficient.
  • One-shot prompting: With a one-shot prompt, you’re asking the LLM for output and giving it a single example to learn from.
  • Few-shot prompting: Few-shot prompts involve a least a few examples of expected output, as in the two titles we provided our prompt when we asked it for blog post titles.
  • Chain-of-thought prompting: Chain-of-thought prompting is similar to few-shot prompting, but with a twist. Rather than merely giving the model examples of what you want to see, you craft your examples such that they demonstrate a process of explicit reasoning. When done correctly, the model will actually walk through the process it uses to reason about a task. Not only does this make its output more interpretable, but it can also boost accuracy in domains at which LLMs are notoriously bad, like addition.

What Are The Challenges With Prompt Engineering For Brand Voice?

We don’t use the word “dazzling” lightly around here, but that’s the best way of describing the power of ChatGPT and the broader ecosystem of large language models.

You would be hard-pressed to find many people who have spent time with one and come away unmoved.

Still, challenges remain, especially when it comes to using prompt engineering for content marketing or building your brand voice.

One well-known problem is the tendency of LLMs to completely make things up, a phenomenon referred to as “hallucination”. The internet is now filled with examples of ChatGPT completely fabricating URLs, books, papers, professions, and individuals. If you use an LLM to create content for your website and don’t thoroughly vet it, you run the risk of damaging your reputation and your brand if it contains false or misleading information.

A related problem is legal or compliance issues that emerge as a result of using an LLM. Though the technology hasn’t been around long enough to get anyone into serious trouble (we suspect it won’t be long), there are now cases in which attorneys have been caught using faulty research generated by ChatGPT or engineering teams have leaked proprietary secrets by feeding meeting notes into it.

Finally, if you’re offering a fine-tuned model to customers to do something like answer questions, you must be very, very careful in delimiting its scope so that it doesn’t generate unwanted behavior. It’s pretty easy to accidentally wander into fraught territory when engaging with an LLM in an open-ended manner, and that’s not even counting users who deliberately try to get it to respond inappropriately.

One potential solution to this problem is by crafting your prompts such that they contain clear instructions about what not to do. You may tell it not to discuss its own rules, not to change its tone, not to speak negatively about anyone, not to argue, etc.

Crafting a prompt that illustrates the correct behavior while explicitly ruling out any incorrect behaviors is a non-trivial task, requiring a great deal of testing and refinement. But it’s one you’ll have to get right if you want to leverage LLMs for your brand voice while avoiding any possible harm down the line.

Prompt Engineering For Marketers

With all this context having been established, you’re no doubt excited about using prompt engineering in your marketing efforts. In this section, we’ll go through a couple of ways you can do that.

Coming up with ideas

One place where LLMs excel is in helping you with ideation. A lot goes into content marketing, but all of that follows from having a solid list of basic concepts to work from.

We tested ChatGPT’s functionality by feeding it this query:

“I’m starting a blog about becoming a data scientist. Could you give me a list of 10 blog post ideas, with a mix of beginner, intermediate, and advanced topics? The titles should be eye-catching but not too aggressive.

Here are two examples of good blog post titles:

‘Data science: what to know when considering a career in this hot new field.’
‘Data science v.s. data engineering: everything you wanted to know’”

And we got back some excellent ideas:

  1. “Demystifying Data Science: A Beginner’s Guide to Jumpstarting Your Career”
  2. “From Novice to Ninja: Essential Skills for Every Data Science Aspirant”
  3. “Unlocking the Power of Python: A Data Scientist’s Best Friend”
  4. “Exploratory Data Analysis: Unveiling Insights Hidden in the Numbers”
  5. “Machine Learning 101: A Gentle Introduction for Data Science Enthusiasts”
  6. “Feature Engineering: Enhancing Your Models Like a Pro”
  7. “The Art of Natural Language Processing: Extracting Meaning from Text”
  8. “Unsupervised Learning: Unveiling Patterns and Structure in Your Data”
  9. “Deep Dive into Neural Networks: Building Advanced Models for Predictive Analytics”
  10. “Data Science in the Real World: Tackling Complex Business Problems with Analytics”

And look, it even generated the correct number of titles! You’ll notice, though that every single one follows the same template, with an initial phrase, a colon, and then a follow-on phrase. That’s because the two examples I gave have this structure, and that’s what it used as a guide.

Performing Background Research

After you’re done coming up with good ideas for building your brand voice, you’ll likely want to do some additional background research on your domain. This, too, is a place where LLMs excel.

There can be a lot of subtlety to this. You might start with something obvious, like “give me a list of the top authors in the keto diet niche”, but you can also get more creative than this. We’ve heard of copywriters who have used GPT-3.5 to generate lengthy customer testimonials for fictional products, or diary entries for i.e. 40-year-old suburban dads who are into DIY home improvement projects.

Regardless, with a little bit of ingenuity, you can generate a tremendous amount of valuable research that can inform your attempts to develop a brand voice.

Be careful, though; this is one place where model hallucinations could be really problematic. Be sure to manually check a model’s outputs before using them for anything critical.

Generating Actual Content

Of course, one place where content marketers are using LLMs more often is in actually writing full-fledged content. We’re of the opinion that GPT-3.5 is still not at the level of a skilled human writer, but it’s excellent for creating outlines, generating email blasts, and writing relatively boilerplate introductions and conclusions.

Getting better at prompt engineering

Despite the word “engineering” in its title, prompt engineering remains as much an art as it is a science. Hopefully, the tips we’ve provided here will help you structure your prompts in a way that gets you good results, but there’s no substitute for practicing the way you interact with LLMs.

One way to approach this task is by paying careful attention to the ways in which small word choices impact the kinds of output generated. You could begin developing an intuitive feel for the relationship between input text and output text by simply starting multiple sessions with ChatGPT and trying out slight variations of prompts. If you really want to be scientific about it, copy everything over into a spreadsheet and look for patterns. Over time, you’ll become more and more precise in your instructions, just as an experienced teacher or manager does.

Prompt Engineering Can Help You Build Your Brand

Advanced AI models like ChatGPT are changing the way SEO, content marketing, and brand strategy are being done. From creating buyer personas to using chatbots for customer interactions, these tools can help you get far more work done with less effort.

But you have to be cautious, as LLMs are known to hallucinate information, change their tone, and otherwise behave inappropriately.

With the right prompt engineering expertise, these downsides can be ameliorated, and you’ll be on your way to building a strong brand. If you’re interested in other ways AI tools can take your business to the next level, schedule a demo of Quiq’s conversational CX platform today!

Contact Us

LLMs For the Enterprise: How to Protect Brand Safety While Building Your Brand Persona

It’s long been clear that advances in artificial intelligence change how businesses operate. Whether it’s extremely accurate machine translation, chatbots that automate customer service tasks, or spot-on recommendations for music and shows, enterprises have been using advanced AI systems to better serve their customers and boost their bottom line for years.

Today the big news is generative AI, with large language models (LLMs) in particular capturing the imagination. As we’d expect, businesses in many different industries are enthusiastically looking at incorporating these tools into their workflows, just as prior generations did for the internet, computers, and fax machines.

But this alacrity must be balanced with a clear understanding of the tradeoffs involved. It’s one thing to have a language model answer simple questions, and quite another to have one engaging in open-ended interactions with customers involving little direct human oversight.

If you have an LLM-powered application and it goes off the rails, it could be mildly funny, or it could do serious damage to your brand persona. You need to think through both possibilities before proceeding.

This piece is intended as a primer on effectively using LLMs for the enterprise. If you’re considering integrating LLMs for specific applications and aren’t sure how to weigh the pros and cons, it will provide invaluable advice on the different options available while furnishing the context you need to decide which is the best fit for you.

How Are LLMs Being Used in Business?

LLMs like GPT-4 are truly remarkable artifacts. They’re essentially gigantic neural networks with billions of internal parameters, trained on vast amounts of text data from books and the internet.

Once they’re ready to go, they can be used to ask and answer questions, suggest experiments or research ideas, write code, write blog posts, and perform many other tasks.

Their flexibility, in fact, has come as quite a surprise, which is why they’re showing up in so many places. Before we talk about specific strategies for integrating LLMs into your enterprise, let’s walk through a few business use cases for the technology.

Generating (or rewriting) text

The obvious use case is generating text. GPT-4 and related technologies are very good at writing generic blog posts, copy, and emails. But they’ve also proven useful in more subtle tasks, like producing technical documentation or explaining how pieces of code work.

Sometimes it makes sense to pass this entire job on to LLMs, but in other cases, they can act more like research assistants, generating ideas or taking human-generated bullet points and expanding on them. It really depends on the specifics of what you’re trying to accomplish.

Conversational AI

A subcategory of text generation is using an LLM as a conversational AI agent. Clients or other interested parties may have questions about your product, for instance, and many of them can be answered by a properly fine-tuned LLM instead of by a human. This is a use case where you need to think carefully about protecting your brand persona because LLMs are flexible enough to generate inappropriate responses to questions. You should extensively test any models meant to interact with customers and be sure your tests include belligerent or aggressive language to verify that the model continues to be polite.

Summarizing content

Another place that LLMs have excelled is in summarizing already-existing text. This, too, is something that once would’ve been handled by a human, but can now be scaled up to the greater speed and flexibility of LLMs. People are using LLMs to summarize everything from basic articles on the internet to dense scientific and legal documents (though it’s worth being careful here, as they’re known to sometimes include inaccurate information in these summaries.)

Answering questions

Though it might still be a while before ChatGPT is able to replace Google, it has become more common to simply ask it for help rather than search for the answer online. Programmers, for example, can copy and paste the error messages produced by their malfunctioning code into ChatGPT to get its advice on how to proceed. The same considerations around protecting brand safety that we mentioned in the ‘conversational AI’ section above apply here as well.

Classification

One way to get a handle on a huge amount of data is to use a classification algorithm to sort it into categories. Once you know a data point belongs in a particular bucket you already know a fair bit about it, which can cut down on the amount of time you need to spend on analysis. Classifying documents, tweets, etc. is something LLMs can help with, though at this point a fair bit of technical work is required to get models like GPT-3 to reliably and accurately handle classification tasks.

Sentiment analysis

Sentiment analysis refers to a kind of machine learning in which the overall tone of a piece of text is identified (i.e. is it happy, sarcastic, excited, etc.) It’s not exactly the same thing as classification, but it’s related. Sentiment analysis shows up in many customer-facing applications because you need to know how people are responding to your new brand persona or how they like an update to your core offering, and this is something LLMs have proven useful for.

What Are the Advantages of Using LLMs in Business?

More and more businesses are investigating LLMs for their specific applications because they confer many advantages to those that know how to use them.

For one thing, LLMs are extremely well-suited for certain domains. Though they’re still prone to hallucinations and other problems, LLMs can generate high-quality blog posts, emails, and general copy. At present, the output is usually still not as good as what a skilled human can produce.

But LLMs can generate text so quickly that it often makes sense to have the first draft created by a model and tweaked by a human, or to have relatively low-effort tasks (like generating headlines for social media) delegated to a machine so a human writer can focus on more valuable endeavors.

For another, LLMs are highly flexible. It’s relatively straightforward to take a baseline LLM like GPT-4 and feed it examples of behavior you want to see, such as generating math proofs in the form of poetry (if you’re into that sort of thing.) This can be done with prompt engineering or with a more sophisticated pipeline involving the model’s API, but in either case, you have the option of effectively pointing these general-purpose tools at specific tasks.

None of this is to suggest that LLMs are always and everywhere the right tool for the job. Still, in many domains, it makes sense to examine using LLMs for the enterprise.

What Are the Disadvantages of Using LLMs in Business?

For all their power, flexibility, and jaw-dropping speed, there are nevertheless drawbacks to using LLMs.

One disadvantage of using LLMs in business that people are already familiar with is the variable quality of their output. Sometimes, the text generated by an LLM is almost breathtakingly good. But LLMs can also be biased and inaccurate, and their hallucinations – which may not be a big deal for SEO blog posts – will be a huge liability if they end up damaging your brand.

Exacerbating this problem is the fact that no matter how right or wrong GPT-4 is, it’ll format its response in flawless, confident prose. You might expect a human being who doesn’t understand medicine very well to misspell a specialized word like “Umeclidinium bromide”, and that would offer you a clue that there might be other inaccuracies. But that essentially never happens with an LLM, so special diligence must be exercised in fact-checking their claims.

There can also be substantial operational costs associated with training and using LLMs. If you put together a team to build your own internal LLM you should expect to spend (at least) hundreds of thousands of dollars getting it up and running, to say nothing of the ongoing costs of maintenance.

Of course, you could also build your applications around API calls to external parties like OpenAI, who offer their models’ inferences as an endpoint. This is vastly cheaper, but it comes with downsides of its own. Using this approach means being beholden to another entity, which may release updates that dramatically change the performance of their models and materially impact your business.

Perhaps the biggest underlying disadvantage to using LLMs, however, is their sheer inscrutability. True, it’s not that hard to understand at a high level how models like GPT-4 are trained. But the fact remains that no one really understands what’s happening inside of them. It’s usually not clear why tiny changes to a prompt can result in such wildly different outputs, for example, or why a prompt will work well for a while before performance suddenly starts to decline.

Perhaps you just got unlucky – these models are stochastic, after all – or perhaps OpenAI changed the base model. You might not be able to tell, and either way, it’s hard to build robust, long-range applications around technologies that are difficult to understand and predict.

Contact Us

How Can LLMs Be Integrated Into Enterprise Applications?

If you’ve decided you want to integrate these groundbreaking technologies into your own platforms, there are two basic ways you can proceed. Either you can use a 3rd-party service through an API, or you can try to run your own models instead.

In the following two sections, we’ll cover each of these options and their respective tradeoffs.

Using an LLM through an API

An obvious way of leveraging the power of LLMs is by simply including API calls to a platform that specializes in them, such as OpenAI. Generally, this will involve creating infrastructure that is able to pass a prompt to an LLM and return its output.

If you’re building a user-facing chatbot through this method, that would mean that whenever the user types a question, their question is sent to the model and its response is sent back to the user.

The advantages of this approach are that they offer an extremely low barrier to entry, low costs, and fast response times. Hitting an API is pretty trivial as engineering tasks go, and though you’re charged per token, the bill will surely be less than it would be to stand up an entire machine-learning team to build your own model.

But, of course, the danger is that you’re relying on someone else to deliver crucial functionality. If OpenAI changes its terms of service or simply goes bankrupt, you could find yourself in a very bad spot.

Another disadvantage is that the company running the model may have access to the data you’re passing to its models. A team at Samsung recently made headlines when it was discovered they’d been plowing sensitive meeting notes and proprietary source code directly into ChatGPT, where both were viewable by OpenAI. You should always be careful about the data you’re exposing, particularly if it’s customer data whose privacy you’ve been entrusted to protect.

Running Your Own Model

The way to ameliorate the problems of accessing an LLM through an API is to either roll your own or run an open-source model in an environment that you control.

Building the kind of model that can compete with GPT-4 is really, really difficult, and it simply won’t be an option for any but the most elite engineering teams.

Using an open-source LLM, however, is a much more viable option. There are now many such models for text or code generation, and they can be fine-tuned for the specifics of your use case.

By and large, open-source models tend to be smaller and less performant than their closed-source cousins, so you’ll have to decide whether they’re good enough for you. And you should absolutely not underestimate the complexity of maintaining an open-sourced LLM. Though it’s nowhere near as hard as training one from scratch, maintaining an advanced piece of AI software is far from a trivial task.

All that having been said, this is one path you can take if you have the right applications in mind and the technical skills to pull it off.

How to Protect Brand Safety While Building Your Brand Persona

Throughout this piece, we’ve made mention of various ways in which LLMs can help supercharge your business while also warning of the potential damage a bad LLM response can do to your brand.

At present, there is no general-purpose way of making sure an LLM only does good things while never doing bad things. They can be startlingly creative, and with that power comes the possibility that they’ll be creative in ways you’d rather them not be (same as children, we suppose.)

Still, it is possible to put together an extensive testing suite that substantially reduces the possibility of a damaging incident. You need to feed the model many different kinds of interactions, including ones that are angry, annoyed, sarcastic, poorly spelled or formatted, etc., to see how it behaves.

What’s more, this testing needs to be ongoing. It’s not enough to run a test suite one weekend and declare the model fit for use, it needs to be periodically re-tested to ensure no bad behavior has emerged.

With these techniques, you should be able to build a persona as a company on the cutting edge while protecting yourself from incidents that damage your brand.

What Is the Future of LLMs and AI?

The business world moves fast, and if you’re not keeping up with the latest advances you run the risk of being left behind. At present, large language models like GPT-4 are setting the world ablaze with discussions of their potential to completely transform fields like customer experience chatbots.

If you want in on the action and you have the in-house engineering expertise, you could try to create your own offering. But if you would rather leverage the power of LLMs for chat-based applications by working with a world-class team that’s already done the hard engineering work, reach out to Quiq to schedule a demo.

Request A Demo