Generative AI Privacy Concerns – Your Guide to the Current Landscape

Generative AI tools, such as the large language model (LLM) ChatGPT and the image-generation tool DALL-E, are already having a major impact in places like marketing firms and contact centers. With their ability to create compelling blog posts, email blasts, YouTube thumbnails, and more, we believe they’re only going to become an increasingly integral part of the workflows of the future.

But for all their potential, there remain serious questions about the short- and long-term safety of generative AI. In this piece, we’re going to zero in on one particular constellation of dangers: those related to privacy.

We’ll begin with a brief overview of how generative AI works, then turn to various privacy concerns, and finish with a discussion of how these problems are being addressed.

Let’s dive in!

What is Generative AI (and How is it Trained)?

In the past, we’ve had plenty to say about how generative AI works under the hood. But many of the privacy implications of generative AI are tied directly to how these models are trained and how they generate output, so it’s worth briefly reviewing all of this theoretical material, for the sake of completeness and to furnish some much-needed context.

When an LLM is trained, it’s effectively fed huge amounts of text data from the internet, books, and similar sources of human-generated language. What it tries to do is predict how a sentence or paragraph will end based on the preceding words.

Let’s concretize this a bit. You probably already know some of these famous quotes:

  • “You must be the change you wish to see in the world.” (Mahatma Gandhi)
  • “You may say I’m a dreamer, but I’m not the only one.” (John Lennon)
  • “The only thing we have to fear is fear itself.” (Franklin D. Roosevelt)

What ChatGPT does during training is try to predict how each of these quotes ends based on everything that comes before. It’ll read “You must be the change you”, for example, and then try to predict “wish to see in the world.”

When the training process begins, the model will basically generate nonsense, but as it develops a better and better grasp of English (and other languages), it gradually becomes the remarkable artifact we know today.
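To make this concrete, here’s a toy sketch of next-word prediction using simple bigram counts. Real LLMs use neural networks operating over tokens rather than word-count tables, so treat this purely as an illustration of the training objective:

```python
from collections import Counter, defaultdict

# A toy corpus standing in for the internet-scale text an LLM is trained on.
corpus = [
    "you must be the change you wish to see in the world",
    "you may say i am a dreamer but i am not the only one",
    "the only thing we have to fear is fear itself",
]

# Count how often each word follows each other word (a bigram model).
following = defaultdict(Counter)
for line in corpus:
    words = line.split()
    for prev_word, next_word in zip(words, words[1:]):
        following[prev_word][next_word] += 1

def predict_next(word):
    """Return the next word most often seen after `word` during training."""
    candidates = following.get(word)
    return candidates.most_common(1)[0][0] if candidates else None

print(predict_next("fear"))  # -> "is" ("fear is" and "fear itself" tie; first seen wins)
```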

Generative AI Privacy Concerns

From a privacy perspective, two things about this process might concern us:

The first is what data are fed into the model, and the second is what kinds of output the models might generate.

We’ll have more to say about each of these in the next section, then cover some broader concerns about copyright law.

Generative AI and Sensitive Data

First, there’s real concern over the possibility that generative AI models have been shown what is usually known as “Personally Identifiable Information” (PII). This includes data such as your real name and address, and can also extend to things like health records that might not contain your name but can still be used to figure out who you are.

The truth is, we only have limited visibility into the data that LLMs are shown during training. Given how much of the internet they’ve ingested, it’s a safe bet that at least some sensitive information has been included. And even if a model hasn’t seen a particular piece of PII during training, there are myriad ways in which it can be exposed to it later. You can imagine, for example, someone feeding customer data into an LLM to produce tailored content, not realizing that, in many cases, the model will have permanently incorporated that data into its internal structure.

There isn’t a great way at present to remove data from a trained LLM, and no one yet knows how to fine-tune a model so that it’s guaranteed never to expose that data in the future.
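One partial mitigation is to scrub obvious PII from text before it’s ever sent to a model. Here’s a minimal regex-based sketch; the patterns are illustrative only, and production systems typically rely on far more robust, NER-based detection tools:

```python
import re

# Illustrative patterns only; real systems cover many more PII categories
# and use trained detectors rather than regexes.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scrub(text: str) -> str:
    """Replace anything matching a PII pattern with a placeholder tag."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(scrub("Reach Jane at jane.doe@example.com or 555-867-5309."))
# -> "Reach Jane at [EMAIL] or [PHONE]."
```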

The other major concern around sensitive data in the context of generative AI is that models will simply hallucinate allegations about people that damage their reputations and compromise their privacy. We’ve written before about the now-infamous case of law professor Jonathan Turley, who was falsely accused by ChatGPT of sexually harassing several of his students. We imagine that in the future there will be many more such fictitious scandals, potentially ones that are very damaging to the reputations of the accused.

Generative AI, Intellectual Property, and Copyright Law

There have also been questions about whether some of the data fed into ChatGPT and similar models might be in violation of copyright law. Earlier this year, in fact, a number of well-known writers leveled a suit against both OpenAI (the creators of ChatGPT) and Meta (the creators of LLaMa).

The suit claims that these teams trained their models on proprietary data contained in the works of authors like Michael Chabon, “without consent, without credit, and without compensation.” Similar charges have been made against Midjourney and Stability AI, both of which have created AI-based image generation models.

These are rather thorny questions of jurisprudence. Though copyright law is a fairly sophisticated tool for dealing with various kinds of human conflicts, no one has ever had to deal with the implications of enormous AI models training on this much data. Only time will tell how the courts will ultimately decide, but if you’re using customer-facing or agent-facing AI tools in a place like a contact center, it’s at least worth being aware of the controversy.


Mitigating Privacy Risks from Generative AI

Now that we’ve elucidated the dimensions of the privacy concerns around generative AI, let’s spend some time talking about various efforts to address these concerns. We’ll focus primarily on data privacy laws, better norms around how data is collected and used, and the ways in which training can help.

Data Privacy Laws

First, and biggest, are attempts by different regulatory bodies to address data privacy issues with legislation. You’re probably already familiar with the European Union’s General Data Protection Regulation (GDPR), which puts numerous rules in place regarding how data can be gathered and used, including in advanced AI systems like LLMs.

Canada’s lesser-known Artificial Intelligence and Data Act (AIDA) mandates that anyone building a potentially disruptive AI system, like ChatGPT, must create guardrails to minimize the likelihood that their system will create biased or harmful output.

It’s not clear yet the extent to which laws like these will be able to achieve their objectives, but we expect that they’ll be just the opening salvo in a long string of legislative attempts to ameliorate the potential downsides of AI.

Robust Data Collection and Use Policies

There are also many things that private companies can do to address privacy concerns around data, without waiting for bureaucracies to catch up.

There’s too much to say about this topic to do it justice here, but we can make a few brief comments to guide you in your research.

One thing many companies are investing in is better anonymization techniques. Differential privacy, for example, is emerging as a promising way to allow the collection of private data while anonymizing it enough to guard against an LLM accidentally exposing it at some point in the future.
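To make that concrete, here’s a minimal sketch of the Laplace mechanism, the textbook building block of differential privacy. The customer records, query, and epsilon value are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

def private_count(data, predicate, epsilon=0.5):
    """Answer a counting query with Laplace noise calibrated to the query's
    sensitivity (adding or removing one person changes a count by at most 1)."""
    true_count = sum(1 for row in data if predicate(row))
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Hypothetical customer records; the ages are invented for the example.
customers = [{"age": 34}, {"age": 51}, {"age": 29}, {"age": 62}]

# The analyst learns roughly how many customers are over 40, but the noise
# prevents confident inferences about any single individual.
print(private_count(customers, lambda c: c["age"] > 40))
```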

Then, of course, there are myriad ways of securely storing data once you have it. This mostly boils down to keeping a tight lid on who is able to access private data – through measures like encryption and a strict permissioning system – and carefully monitoring what they do with it once they access it.

Finally, it helps to be as public as possible about your data collection and use policies. Make sure they’re published somewhere that anyone can read them. Whenever possible, give users the ability to opt out of data collection, if that’s what they want to do.

Better Training for Those Building and Using Generative AI

The last piece of the puzzle is simply to train your workforce about data collection, data privacy, and data management. Sound laws and policies won’t do much good if the actual people who are interacting with private data don’t have a solid grasp of your expectations and protocols.

Because there are so many different ways in which companies collect and use data, there is no one-size-fits-all solution we can offer. But you might begin by sending your employees this article, as a way of opening up a broader conversation about your future data-privacy practices.

Data Privacy in the Age of Generative AI

In all its forms, generative AI is a remarkable technology that will change the world in many ways. Like the printing press, gunpowder, fire, and the wheel, it will bring changes that are both good and bad.

The world will need to think carefully about how to get as many of the advantages out of generative AI as possible while minimizing its risks and dangers.

A good place to start with this is by focusing on data privacy. Because this is a relatively new problem, there’s a lot of work to be done in establishing legal frameworks, company policies, and best practices. But that also means there’s an enormous opportunity to positively shape the long-term trajectory of AI technologies.

Moving from Natural Language Understanding (NLU) to Customer-Facing AI Assistants

There can no longer be any doubt that large language models and generative AI more broadly are going to have a real impact on many industries. Though we’re still in the preliminary stages of working out the implications, the evidence so far suggests that this is already happening.

Language models in contact centers are helping more junior workers be more productive, and reducing employee turnover in the process. They’re also being used to automate huge swathes of content creation, assist with data augmentation tasks, and plenty else besides.

Part of the task we’ve set ourselves here at Quiq is explaining how these models are trained and how they’ll make their way into the workflows of the future. To that end, we’ve written extensively about how large language models are trained, how researchers are pushing them into uncharted territories, and which models are appropriate for any given task.

This post is another step in that endeavor. Specifically, we’re going to discuss natural language understanding, how it works, and how it’s distinct from related terms (like “natural language generation”). With that done, we’ll talk about how natural language understanding is a foundational first step and takes us closer to creating robust customer-facing AI assistants.

What is Natural Language Understanding?

Language is a tool of remarkable power and flexibility – so much so that it wouldn’t be much of an exaggeration to say that it’s at the root of everything else the human race has accomplished. From towering works of philosophy to engineering specs to instructions for setting up a remote, language is a force multiplier that makes each of us vastly more effective than we otherwise would be.

Evidence of this claim comes from the fact that, even when we’re alone, many of us think in words or even talk to ourselves as we work through something difficult. Certain kinds of thoughts are all but impossible to have without the scaffolding provided by language.

For all these reasons, creating machines able to parse natural language has long been a goal of AI researchers and computer scientists. The field established to tackle this task is known as natural language understanding.

There’s a rather deep philosophical question here where the word “understanding” is concerned. As the famous story of the Tower of Babel demonstrates, it isn’t enough for the members of a group to be making sounds if they want to accomplish great things; the people involved must also understand what everyone is saying. This means that when you say a word like “chicken”, there’s a response in my nervous system such that the “chicken” concept is activated, along with other contextually relevant knowledge, such as the location of the chicken feed. If you said “курица” (to someone who doesn’t know Russian) or “鸡” (to someone who doesn’t know Mandarin), the same process wouldn’t occur, no understanding would happen, and language wouldn’t have helped at all.

Whether and how a machine can understand language in the fully human sense is too big a topic to address here, but we can make some broad comments. As is often the case, researchers in the field of natural language understanding have opted to break the problem down into much more tractable units. Two of the biggest such units are intent recognition (what a sentence is intended to accomplish) and entity recognition (who or what the sentence is referring to).

This should make a certain intuitive sense. Though you may not be consciously going through a mental checklist when someone says something to you, on some level, you’re trying to figure out what their goal is and who or what they’re talking about. The intent behind the sentence “John has an apple”, for example, is to inform you of a fact about the world, and the main entities are “John” and “apple”. If you know John, a little image of him holding an apple would probably pop into your head.

This has many obvious applications to the work done in contact centers. If you’re building an automated ticket classification system, for instance, it would help to be able to tell whether the intent behind the ticket is to file a complaint, reach a representative, or perform a task like resetting a password. It would also help to be able to categorize the entities, like one of a dozen products your center supports, that are being referred to.
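Here’s a deliberately simple sketch of intent and entity recognition in this ticket-routing spirit. The intents, keywords, and product names are all invented for illustration; a production system would use trained classifiers rather than keyword rules:

```python
# A toy rule-based intent and entity recognizer for ticket routing.
INTENT_KEYWORDS = {
    "reset_password": ["reset", "password", "locked out"],
    "file_complaint": ["complaint", "unacceptable", "refund"],
    "reach_agent": ["speak to", "human", "representative"],
}
KNOWN_PRODUCTS = ["AcmeCRM", "AcmeChat", "AcmeAnalytics"]  # invented names

def classify_ticket(text):
    lowered = text.lower()
    intent = next(
        (name for name, keywords in INTENT_KEYWORDS.items()
         if any(kw in lowered for kw in keywords)),
        "unknown",
    )
    entities = [p for p in KNOWN_PRODUCTS if p.lower() in lowered]
    return {"intent": intent, "entities": entities}

print(classify_ticket("I'm locked out of AcmeCRM and need a password reset"))
# -> {'intent': 'reset_password', 'entities': ['AcmeCRM']}
```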

Natural Language Understanding vs. Natural Language Processing

Natural language understanding is its own field, and it’s easy to confuse it with other, related fields, like natural language processing.

Most of the sources we consulted consider natural language understanding to be a subdomain of natural language processing (NLP). Whereas the former is concerned with parsing natural language into a format that machines can work with, the latter subsumes this task, along with others like machine translation and natural language generation.

Natural Language Understanding vs. Natural Language Generation

Speaking of natural language generation, many people also confuse it with natural language understanding. Natural language generation is more or less what it sounds like: using computers to generate human-sounding text or speech.

Natural language understanding can be an important part of getting natural language generation right, but they’re not the same thing.

Customer-Facing AI Assistants

Now that we’ve discussed natural language understanding, let’s talk about how it can be utilized in the attempt to create high-quality customer-facing AI assistants.

How Can Natural Language Understanding Be Used to Make Customer-Facing Assistants?

Natural language understanding refers to a constellation of different approaches to decomposing language into pieces that a machine can work with. This allows an algorithm to discover the intent in a message, tag parts of speech (nouns, verbs, etc.), or pull out the entities referenced.
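Libraries like spaCy expose these pieces directly. Here’s a minimal sketch, assuming spaCy and its small English model are installed (the exact tags and entities can vary by model version):

```python
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("John has an apple")

# Part-of-speech tags for each token.
print([(token.text, token.pos_) for token in doc])
# e.g. [('John', 'PROPN'), ('has', 'VERB'), ('an', 'DET'), ('apple', 'NOUN')]

# Named entities the model recognized.
print([(ent.text, ent.label_) for ent in doc.ents])
# e.g. [('John', 'PERSON')]
```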

All of this is an important part of building effective customer-facing AI assistants. At Quiq, we’ve built LLM-powered knowledge assistants able to answer common questions across your reference documentation, data assistants that can use CRM and order management systems to provide actionable insights, and other kinds of conversational AI systems. Though we draw on many technologies and research areas, none of this would be possible without natural language understanding.

What are the Benefits of Customer-Facing AI Assistants?

The reason people have been working so long to create powerful customer-facing AI assistants is that there are so many benefits involved.

At a contact center, agents spend most of their day answering questions, resolving issues, and otherwise making sure a customer base can use a set of product offerings as intended.

As with any job, some of these tasks are higher-value than others. All of the work is important, but there will always be subtle and thorny issues that only a skilled human can work through, while others are quotidian and can be farmed out to a machine.

This is a long way of saying that one of the major benefits of customer-facing AI assistants is that they free up your agents to specialize in handling the most pressing requests, with password resets and similar tasks handled by a capable product like the Quiq platform.

A related benefit is improved customer experience. When agents can focus their efforts, they can spend more time with the customers who need it. And when you have properly fine-tuned language models interacting with customers, you’ll know that they’re unfailingly polite and helpful, because they’ll never become annoyed after a long shift the way a human being might.

Robust Customer-Facing AI Assistants with Quiq

Just as understanding has been such a crucial part of the success of our species, it’ll be an equally crucial part of the success of advanced AI tooling.

One way you can make use of bleeding-edge natural language understanding techniques is by building your own language models. But this would require you to hire teams of extremely smart engineers, and it would be expensive; besides their hefty salaries, you’d also have to budget to keep the fridge stocked with the sugar-free Red Bulls such engineers require to function.

Or, you could utilize the division of labor. Just as contact center agents can outsource certain tasks to machines, so too can you outsource the task of building an AI-based CX platform to Quiq. Set up a demo today to see what our advanced AI technology and team can do for your contact center!


Reinforcement Learning from Human Feedback

ChatGPT – and other large language models like it – are already transforming education, healthcare, software engineering, and the work being done in contact centers.

We’ve written extensively about how self-supervised learning is used to train these models, but one thing we haven’t spent much time on is reinforcement learning from human feedback (RLHF).

Today, we’re rectifying that. We’re going to dive into what reinforcement learning from human feedback is, why it’s important, and how it works.

With that done, you’ll have received a thorough education in this world-changing technology.

What is Reinforcement Learning from Human Feedback?

As you no doubt surmised from its name, reinforcement learning from human feedback involves two components: reinforcement learning and human feedback. Though the technical specifics are (as usual) very involved, the basic idea is simple: you have a model produce outputs, humans rank those outputs according to which they prefer (based on friendliness, completeness, accuracy, etc.), and the model is updated accordingly.

It’ll help if we begin by talking about what reinforcement learning is. This background will prove useful in understanding the unfolding of the broader process.

What is Reinforcement Learning?

There are four widespread approaches to getting intelligent behavior from machines: supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning.

With supervised learning, you feed a statistical algorithm a bunch of examples of correctly-labeled data in the hope that it will generalize to further examples it hasn’t seen before. Regression and supervised classification models are standard applications of supervised learning.

Unsupervised learning is a similar idea, but you forego the labels. It’s used for certain kinds of clustering tasks, and for applications like dimensionality reduction.

Semi-supervised learning is a combination of these two approaches. Suppose you have a gigantic body of photographs, and you want to develop an automated system to tag them. If some of them are tagged then your system can use those tags to learn a pattern, which can then be applied to the rest of the untagged images.

Finally, there’s reinforcement learning (RL), which is entirely different. With reinforcement learning, you usually set up an environment (like a video game) and put an agent in it with a reward structure that tells it which actions are good and which are bad. If the agent successfully flies a spaceship through a series of rings, for example, that might be worth +10 points each, completing an entire level might be worth +100, crashing might be worth -1,000, and so on.

The idea is that, over time, the reinforcement learning agent will learn to execute a strategy that maximizes its long-term reward. It’ll realize that rings are worth a few points and so it should fly through them, it’ll learn that it should try to complete a level because that’s a huge reward bonus, it’ll learn that crashing is bad, etc.

Reinforcement learning is a remarkably powerful approach to sequential decision-making; when done correctly, it can lead to agents able to play the stock market, run procedures in a factory, and do a staggering variety of other tasks.
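For a feel of how this works mechanically, here’s a minimal tabular Q-learning sketch on a tiny one-dimensional world. The environment, reward numbers, and hyperparameters are invented for illustration:

```python
import random

random.seed(0)

# A tiny 1-D world: the agent starts at cell 0, the goal at cell 4 is worth
# +10, and every step costs -1. Actions: 0 = move left, 1 = move right.
N_STATES, GOAL, ACTIONS = 5, 4, [0, 1]
q_table = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.1, 0.9, 0.2  # learning rate, discount, exploration

def step(state, action):
    nxt = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    reward = 10.0 if nxt == GOAL else -1.0
    return nxt, reward, nxt == GOAL

for episode in range(500):
    state, done = 0, False
    while not done:
        # Epsilon-greedy: mostly exploit the best known action, sometimes explore.
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q_table[(state, a)])
        nxt, reward, done = step(state, action)
        best_next = max(q_table[(nxt, a)] for a in ACTIONS)
        q_table[(state, action)] += alpha * (
            reward + gamma * best_next - q_table[(state, action)]
        )
        state = nxt

# The learned policy: move right in every non-goal state.
print([max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(N_STATES)])
```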

What are the Steps of Reinforcement Learning from Human Feedback?

Now that we know a little bit about reinforcement learning, let’s turn to a discussion of reinforcement learning from human feedback.

As we just described, reinforcement learning agents have to be trained like any other machine learning system. Under normal circumstances, this doesn’t involve any human feedback. A programmer will update the code, environment, or reward structure between training runs, but they don’t usually provide feedback directly to the agent.

Except, that is, in the case of reinforcement learning from human feedback, where that’s exactly what happens. A model will produce a set of outputs, and humans will rank them. Over time, the model adjusts toward producing more and more appropriate responses, as judged by the human raters providing the feedback.

Sometimes, this feedback can be for something relatively prosaic. It’s been used, for example, to get RL agents to execute backflips in simulated environments. The raters will look at short videos of two movements and select the one that looks like it’s getting closer to a backflip; with enough time, this gets the agent to actually do one.

Or, it can be used for something more nuanced, such as getting a large language model to produce more conversational dialogue. This is part of how ChatGPT was trained.
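Concretely, the human rankings are typically used to train a separate reward model, which then guides fine-tuning. Here’s a minimal PyTorch sketch of the standard pairwise preference loss; the toy 8-dimensional “embeddings” merely stand in for a real network’s activations:

```python
import torch
import torch.nn as nn

# Toy reward model: maps a response "embedding" to a scalar reward score.
# Real reward models are full transformer networks.
reward_model = nn.Linear(8, 1)
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-2)

# One labeled comparison: a human preferred response A over response B.
preferred = torch.randn(1, 8)
rejected = torch.randn(1, 8)

for _ in range(100):
    # Pairwise (Bradley-Terry style) loss: push the preferred response's
    # reward above the rejected one's.
    loss = -nn.functional.logsigmoid(
        reward_model(preferred) - reward_model(rejected)
    ).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(reward_model(preferred).item() > reward_model(rejected).item())  # True
```

In the full pipeline, this trained reward model scores candidate outputs during a reinforcement learning phase (commonly via PPO), which is what nudges the base model toward responses humans actually prefer.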

Why is Reinforcement Learning from Human Feedback Necessary?

ChatGPT is already being used to great effect in contact centers and the customer service arena more broadly. Here are some example applications:

  • Question answering: ChatGPT is exceptionally good at answering questions. What’s more, some companies have begun fine-tuning it on their own internal and external documentation, so that people can directly ask it questions about how a product works or how to solve an issue. This obviates the need to go hunting around inside the docs.
  • Summarization: Similarly, ChatGPT can be used to summarize video transcripts, email threads, and lengthy articles so that agents (or customers) can get through the material at a much greater clip. This can, for example, help agents stay abreast of what’s going on in other parts of the company without burdening them with the need to read constantly. Quiq has custom-built tools for performing exactly this function.
  • Onboarding new hires: Together, question-answering and summarization are helping new contact center agents get up to speed much more quickly when they start their jobs.
  • Sentiment analysis: Sentiment analysis refers to classifying a text according to its sentiment, i.e. whether it’s “positive”, “negative”, or “neutral”. Sentiment analysis comes in several different flavors, including fine-grained and aspect-based, and ChatGPT can help with all of them. Being able to automatically tag a customer issue comes in handy when you’re trying to sort and prioritize them.
  • Real-time language translation: If your product or service has an international audience, then you might need to avail yourself of translation services so that agents and customers are speaking the same language. There are many such services available, but ChatGPT has proven to be at least as good as almost all of them.

In aggregate, these and other use cases of large language models are making contact center agents much more productive. But contact center agents have to interact with customers in a certain way – they have to be polite, helpful, etc.

And out of the box, most large language models do not behave that way. We’ve already had several high-profile incidents in which a language model, for example, asked a reporter to end his marriage or falsely accused a law professor of sexual harassment.

Reinforcement learning from human feedback is currently the most promising approach for tuning this toxic and harmful behavior out of large language models. The only reason they’re able to help contact center agents so much is that they’ve been fine-tuned with such an approach; otherwise, agents would be spending an inordinate amount of time rephrasing and tinkering with a model’s output to get it to be appropriately friendly.

This is why reinforcement learning from human feedback is important for the managers of contact centers to understand – it’s a major part of why large language models are so useful in the first place.

Applications of Reinforcement Learning from Human Feedback

To round out our picture, we’re going to discuss a few ways in which reinforcement learning from human feedback is actually used in the wild. We’ve already discussed how it’s used to fine-tune models to be more helpful in the context of a contact center, and we’ll now talk a bit about how it’s used in gaming and robotics.

Using Reinforcement Learning from Human Feedback in Games

Gaming has long been one of the ideal testing grounds for new approaches to artificial intelligence. As you might expect, it’s also a place where reinforcement learning from human feedback has been successfully applied.

OpenAI used it to achieve superhuman performance on Enduro, a classic Atari racing game in which the point is to pass the other cars without hitting them or going out of bounds.

It’s exceptionally difficult for an agent to learn to play Enduro well using only standard reinforcement learning approaches, but when human feedback is added, the results shift dramatically.

Using Reinforcement Learning from Human Feedback in Robotics

Because robotics almost always involves an agent interacting with the physical world, it’s especially well-suited to reinforcement learning from human feedback.

Often, it can be difficult to get a robot to execute a long series of steps that achieves a valuable reward, especially when the intermediate steps aren’t themselves very valuable. What’s more, it can be especially difficult to build a reward structure that correctly incentivizes the agent to execute the intermediate steps in the right order.

It’s much simpler instead to have humans look at sequences of actions and judge for themselves which will get the agent closer to its ultimate goal.

RLHF For The Contact Center Manager

Having made it this far, you should be in a much better position to understand how reinforcement learning from human feedback works, and how it contributes to the functioning of your contact centers.

If you’ve been thinking about leveraging AI to make yourself or your agents more effective, set up a demo with the Quiq team to see how we can put our cutting-edge models to work for you. We offer both customer-facing and agent-facing tools, all of them designed to help you make customers happier while reducing agent burnout and turnover.


What are the Biggest Questions About AI?

The term “artificial intelligence” was coined at the famous Dartmouth Conference in 1956, put on by luminaries like John McCarthy, Marvin Minsky, and Claude Shannon, among others.

These organizers wanted to create machines that “use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves.” They went on to claim that “…a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together for a summer.”

More than half a century later, it’s fair to say that this has not come to pass; brilliant as they were, it would seem as though McCarthy et al. underestimated how difficult it would be to scale the heights of the human intellect.

Nevertheless, remarkable advances have been made over the past decade, so much so that they’ve ignited a firestorm of controversy around this technology. People are questioning the ways in which it can be used negatively, and whether it might ultimately pose an extinction risk to humanity; they’re probing fundamental issues around whether machines can be conscious, exercise free will, and think in the way a living organism does; they’re rethinking the basis of intelligence, concept formation, and what it means to be human.

These are deep waters to be sure, and we’re not going to swim them all today. But as contact center managers and others begin the process of thinking about using AI, it’s worth being at least aware of what this broader conversation is about. It will likely come up in meetings, in the press, or in Slack exchanges between employees.

And that’s the subject of our piece today. We’re going to start by asking what artificial intelligence is and how it’s being used, before turning to address some of the concerns about its long-term potential. Our goal is not to answer all these concerns, but to make you aware of what people are thinking and saying.

What is Artificial Intelligence?

Artificial intelligence is famous for having had many, many definitions. There are those, for example, who believe that in order to be intelligent computers must think like humans, and those who reply that we didn’t make airplanes by designing them to fly like birds.

For our part, we prefer to sidestep the question somewhat by utilizing the approach taken in one of the leading textbooks in the field, Stuart Russell and Peter Norvig’s “Artificial Intelligence: A Modern Approach”.

They propose a multi-part system for thinking about different approaches to AI. One set of approaches is human-centric and focuses on designing machines that either think like humans – i.e., engage in analogous cognitive and perceptual processes – or act like humans – i.e. by behaving in a way that’s indistinguishable from a human, regardless of what’s happening under the hood (think: the Turing Test).

The other set of approaches is ideal-centric and focuses on designing machines that either think in a totally rational way – conformant with the rules of Bayesian epistemology, for example – or behave in a totally rational way – utilizing logic and probability, but also acting instinctively to remove themselves from danger, without going through any lengthy calculations.

What we have here, in other words, is a framework. Using the framework not only gives us a way to think about almost every AI project in existence, it also saves us from needing to spend all weekend coming up with a clever new definition of AI.

Joking aside, we think this is a productive lens through which to view the whole debate, and we offer it here for your information.

What is Artificial Intelligence Good For?

Given all the hype around ChatGPT, this might seem like a quaint question. But not that long ago, many people were asking it in earnest. The basic insights upon which large language models like ChatGPT are built go back to the 1960s, but it wasn’t until 1) vast quantities of data became available, and 2) compute cycles became extremely cheap, that much of their potential was realized.

Today, large language models are changing (or poised to change) many different fields. Our audience is focused on contact centers, so that’s what we’ll focus on as well.

There are a number of ways that generative AI is changing contact centers. Because of its remarkable abilities with natural language, it’s able to dramatically speed up agents in their work by answering questions and formatting replies. These same abilities allow it to handle other important tasks, like summarizing articles and documentation and parsing the sentiment in customer messages to enable semi-automated prioritization of their requests.

Though we’re still in the early days, the evidence so far suggests that LLM-powered tools like Quiq’s conversational CX platform will do a lot to increase the efficiency of contact center agents.

Will AI be Dangerous?

One thing that’s burst into the public imagination recently is the debate around the risks of artificial intelligence, which fall into two broad categories.

The first category is what we’ll call “social and political risks”. These are the risks that large language models will make it dramatically easier to manufacture propaganda at scale, and perhaps tailor it to specific audiences or even individuals. When combined with the astonishing progress in deepfakes, it’s not hard to see how there could be real issues in the future. Most people (including us) are poorly equipped to figure out when a video is fake, and if the underlying technology gets much better, there may come a day when it’s simply not possible to tell.

Political operatives are already quite skilled at cherry-picking quotes and stitching together soundbites into a damning portrait of a candidate – imagine what’ll be possible when they don’t even need to bother.

But the bigger (and more speculative) danger is around really advanced artificial intelligence. Because this case is harder to understand, it’s what we’ll spend the rest of this section on.

Artificial Superintelligence and Existential Risk

As we understand it, the basic case for existential risk from artificial intelligence goes something like this:

“Someday soon, humanity will build or grow an artificial general intelligence (AGI). It’s going to want things, which means that it’ll be steering the world in the direction of achieving its ambitions. Because it’s smart, it’ll do this quite well, and because it’s a very alien sort of mind, it’ll be making moves that are hard for us to predict or understand. Unless we solve some major technological problems around how to design reward structures and goal architectures in advanced agentive systems, what it wants will almost certainly conflict in subtle ways with what we want. If all this happens, we’ll find ourselves in conflict with an opponent unlike any we’ve faced in the history of our species, and it’s not at all clear we’ll prevail.”

This is heady stuff, so let’s unpack it bit by bit. The opening sentence, “…humanity will build or grow an artificial general intelligence”, was chosen carefully. If you understand how LLMs and deep learning systems are trained, the process is more akin to growing an enormous structure than it is to building one.

This has a few implications. First, these systems’ internal workings remain almost completely inscrutable. Though researchers in fields like mechanistic interpretability are making real progress toward unpacking how neural networks function, the truth is, we’ve still got a long way to go.

What this means is that we’ve built one of the most powerful artifacts in the history of Earth, and no one is really sure how it works.

Another implication is that no one has any good theoretical or empirical reason to bound the capabilities and behavior of future systems. The leap from GPT-2 to GPT-3.5 was astonishing, as was the leap from GPT-3.5 to GPT-4. The basic approach so far has been to throw more data and more compute at the training algorithms; it’s possible that this paradigm will begin to level off soon, but it’s also possible that it won’t. If the gap between GPT-4 and GPT-5 is as big as the gap between GPT-3 and GPT-4, and if the gap between GPT-5 and GPT-6 is just as big, it’s not hard to see that the consequences could be staggering.

As things stand, it’s anyone’s guess how this will play out. But that’s not necessarily a comforting thought.

Next, let’s talk about pointing a system at a task. Does ChatGPT want anything? The short answer is: as far as we can tell, it doesn’t. ChatGPT isn’t an agent, in the sense that it isn’t trying to achieve something in the world, but work on agentive systems is ongoing. Remember that 10 years ago most neural networks were basically toys, and today we have ChatGPT. If breakthroughs in agency follow a similar pace (and they very well may not), then we could have systems able to pursue open-ended courses of action in the real world in relatively short order.

Another sobering possibility is that this capacity will simply emerge from the training of huge deep learning systems. This is, after all, the way human agency emerged in the first place. Through the relentless grind of natural selection, our ancestors went from chipping flint arrowheads to industrialization, quantum computing, and synthetic biology.

To be clear, this is far from a foregone conclusion, as the algorithms used to train large language models are quite different from natural selection. Still, we want to relay this line of argumentation, because it comes up a lot in these discussions.

Finally, we’ll address one more important claim, “…what it wants will almost certainly conflict in subtle ways with what we want.” Why think this is true? Aren’t these systems that we design and, if so, can’t we just tell it what we want it to go after?

Unfortunately, it’s not so simple. Whether you’re talking about reinforcement learning or something more exotic like evolutionary programming, the simple fact is that our algorithms often find remarkable mechanisms by which to maximize their reward in ways we didn’t intend.

There are thousands of examples of this (ask any reinforcement-learning engineer you know), but a famous one comes from the classic Coast Runners video game. The engineers who built the system tried to set up the algorithm’s rewards so that it would try to race a boat as well as it could. What it actually did, however, was maximize its reward by spinning in a circle to hit a set of green blocks over and over again.


Now, this may seem almost silly – do we really have anything to fear from an algorithm too stupid to understand the concept of a “race”?

But this would be missing the thrust of the argument. If you had access to a superintelligent AI and asked it to maximize human happiness, what happened next would depend almost entirely on what it understood “happiness” to mean.

If it were properly designed, it would work in tandem with us to usher in a utopia. But if it understood it to mean “maximize the number of smiles”, it would be incentivized to start paying people to get plastic surgery to fix their faces into permanent smiles (or something similarly unintuitive).

Does AI Pose an Existential Risk?

Above, we’ve briefly outlined the case that sufficiently advanced AI could pose a serious risk to humanity by being powerful, unpredictable, and prone to pursuing goals that aren’t quite what we meant.

So, does this hold water? Honestly, it’s too early to tell. The argument has hundreds of moving parts, some well-established and others much more speculative. Our purpose here isn’t to come down on one side of this debate or the other, but to let you know (in broad strokes) what people are saying.

At any rate, we are confident that the current version of ChatGPT doesn’t pose any existential risks. On the contrary, it could end up being one of the greatest advancements in productivity ever seen in contact centers. And that’s what we’d like to discuss in the next section.

Will AI Take All the Jobs?

The concern that someday a new technology will render human labor obsolete is hardly new. It was heard when mechanized weaving machines were created, when computers arrived, when the internet emerged, and when ChatGPT came onto the scene.

We’re not economists and we’re not qualified to take a definitive stand, but early evidence suggests that large language models are not only not resulting in layoffs, they’re making agents much more productive.

Economists Erik Brynjolfsson, Danielle Li, and Lindsey R. Raymond looked at the ways in which generative AI was being used in a large contact center. They found that it was actually doing a good job of internalizing the ways in which senior agents were doing their jobs, which allowed more junior agents to climb the learning curve more quickly and perform at a much higher level. This had the knock-on effect of making them feel less stressed about their work, thus reducing turnover.

Now, this doesn’t rule out the possibility that GPT-10 will be the big job killer. But so far, large language models are shaping up to be like every prior technological advance, i.e., increasing employment rather than reducing it.

What is the Future of AI?

The rise of AI is raising stock valuations, raising deep philosophical questions, and raising expectations and fears about the future. We don’t know for sure how all this will play out, but we do know contact centers, and we know that they stand to benefit greatly from the current iteration of large language models.

These tools are helping agents answer more queries per hour, do so more thoroughly, and make for a better customer experience in the process.

If you want to get in on the action, set up a demo of our technology today.


What is Sentiment Analysis? – Ultimate Guide

A person only reaches out to a contact center when they’re having an issue. They can’t get a product to work the way they need it to, for example, or they’ve been locked out of their account.

The chances are high that they’re frustrated, angry, or otherwise in an emotionally-fraught state, and this is something contact center agents must understand and contend with.

The term “sentiment analysis” refers to the field of machine learning that focuses on developing algorithmic ways of detecting emotions in natural-language text, such as the messages exchanged between a customer and a contact center agent.

Making it easier to detect, classify, and prioritize messages on the basis of their sentiment is just one of many ways that technology is revolutionizing contact centers, and it’s the subject we’ll be addressing today.

Let’s get started!

What is Sentiment Analysis?

Sentiment analysis involves using various approaches to natural language processing to identify the overall “sentiment” of a piece of text.

Take these three examples:

  1. “This restaurant is amazing. The wait staff were friendly, the food was top-notch, and we had a magnificent view of the famous New York skyline. Highly recommended.”
  2. “Root canals are never fun, but it certainly doesn’t help when you have to deal with a dentist as unprofessional and rude as Dr. Thomas.”
  3. “Toronto’s forecast for today is a high of 75 and a low of 61 degrees.”

Humans excel at detecting emotions, and it’s probably not hard for you to see that the first example is positive, the second is negative, and the third is neutral (depending on how you like your weather).

There’s a greater challenge, however, in getting machines to make accurate classifications of this kind of data. How exactly that’s accomplished is the subject of the next section, but before we get to that, let’s talk about a few flavors of sentiment analysis.

What Types of Sentiment Analysis Are There?

It’s worth understanding the different approaches to sentiment analysis if you’re considering using it in your contact center.

Above, we provided an example of positive, negative, and neutral text. What we’re doing there is detecting the polarity of the text, and as you may have guessed, it’s possible to make much more fine-grained delineations of textual data.

Rather than simply detecting whether text is positive or negative, for example, we might instead use these categories: very positive, positive, neutral, negative, and very negative.

This would give us a better understanding of the message we’re looking at, and how it should be handled.

Instead of classifying text by its polarity, we might also use sentiment analysis to detect the emotions being communicated – rather than classifying a sentence as being “positive” or “negative”, in other words, we’d identify emotions like “anger” or “joy” contained in our textual data.

This is called “emotion detection” (appropriately enough), and it can be handled with long short-term memory (LSTM) or convolutional neural network (CNN) models.

Another, more granular approach to sentiment analysis is known as aspect-based sentiment analysis. It involves two basic steps: identifying “aspects” of a piece of text, then identifying the sentiment attached to each aspect.

Take the sentence “I love the zoo, but I hate the lines and the monkeys make fun of me.” It’s hard to assign an overall sentiment to the sentence – it’s generally positive, but there’s kind of a lot going on.

If we break out the “zoo”, “lines”, and “monkeys” aspects, however, we can see that there’s positive sentiment attached to the zoo, and negative sentiment attached to the lines and the abusive monkeys.
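Here’s a toy sketch of that two-step process using the zoo sentence, with a hand-written lexicon and a crude proximity rule; real systems learn these associations from data:

```python
# Toy aspect-based sentiment: find each aspect, then score the words near it.
# The lexicon, aspect list, and window size are invented for illustration.
LEXICON = {"love": 1, "hate": -1, "fun": 1, "rude": -1}
ASPECTS = ["zoo", "lines", "monkeys"]

def aspect_sentiment(sentence, window=2):
    words = sentence.lower().replace(",", "").split()
    scores = {}
    for aspect in ASPECTS:
        if aspect in words:
            i = words.index(aspect)
            nearby = words[max(0, i - window): i + window + 1]
            scores[aspect] = sum(LEXICON.get(w, 0) for w in nearby)
    return scores

print(aspect_sentiment("I love the zoo, but I hate the lines and the monkeys make fun of me"))
# -> {'zoo': 1, 'lines': -1, 'monkeys': 1}
```

Note the mis-score on “monkeys”: “make fun of me” is negative, but the toy model sees “fun” and calls it positive – a taste of the subtleties discussed below.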

Why is Sentiment Analysis Important?

It’s easy to see how aspect-based sentiment analysis would inform marketing efforts. With a good enough model, you’d be able to see precisely which parts of your offering your clients appreciate, and which parts they don’t. This would give you valuable information in crafting a strategy going forward.

This is true of sentiment analysis more broadly, and of emotion detection too. You need to know what people are thinking, saying, and feeling about you and your company if you’re going to meet their needs well enough to make a profit.

Once upon a time, the only way to get these data was with focus groups and surveys. Those are still utilized, of course, but in the social media era people are also not shy about sharing their opinions online, in forums, and in similar outlets.

These oceans of words form an invaluable resource if you know how to mine them. When done correctly, sentiment analysis offers just the right set of tools for doing this at scale.

Challenges with Sentiment Analysis

Sentiment analysis confers many advantages, but it is not without its challenges. Most of these issues boil down to handling subtleties or ambiguities in language.

Consider a sentence like “This is a remarkable product, but still not worth it at that price.” Calling a product “remarkable” is a glowing endorsement, tempered somewhat by the claim that its price is set too high. Most basic sentiment classifiers would probably call this “positive”, but as you can see, there are important nuances.

Another issue is sarcasm.

Suppose we showed you a sentence like “This movie was just great, I loved spending three hours of my Sunday afternoon following a story that could’ve been told in twenty minutes.”

A sentiment analysis algorithm will likely pick up on “great” and “loved” and call this sentence positive.

But, as humans, we know that these are backhanded compliments meant to communicate precisely the opposite message.

Machine-learning systems will also tend to struggle with idioms that we all find easy to parse, such as “Setting up my home security system was a piece of cake.” This is positive because “piece of cake” means something like “couldn’t have been easier”, but an algorithm may or may not pick up on that.

Finally, we’ll mention the fact that much of the text in product reviews will contain useful information that doesn’t fit easily into a “sentiment” bucket. Take a sentence like “The new iPhone is smaller than the new Android.” This is just a bare statement of physical facts, and whether it counts as positive or negative depends a lot on what a given customer is looking for.

There are various ways of trying to ameliorate these issues, most of which are outside the scope of this article. For now, we’ll just note that sentiment analysis needs to be approached carefully if you want to glean an accurate picture of how people feel about your offering from their textual reviews. So long as you’re diligent about inspecting the data you show the system and are cautious in how you interpret the results, you’ll probably be fine.


How Does Sentiment Analysis Work?

Now that we’ve laid out a definition of sentiment analysis, talked through a few examples, and made it clear why it’s so important, let’s discuss the nuts and bolts of how it works.

Sentiment analysis begins where all data science and machine learning projects begin: with data. Because sentiment analysis is based on textual data, you’ll need to utilize various techniques for preprocessing NLP data. Specifically, you’ll need to:

  • Tokenize the data by breaking sentences up into individual units an algorithm can process;
  • Use either stemming or lemmatization to turn words into their root form, e.g. by turning “ran” into “run”;
  • Filter out stop words like “the” or “as”, because they don’t add much to the text data.
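Here’s what those three steps might look like with NLTK; this assumes the relevant NLTK data packages (tokenizer, stopwords, wordnet) have been downloaded:

```python
# Requires: pip install nltk, plus one-time nltk.download(...) calls for the
# "punkt" tokenizer, "stopwords", and "wordnet" data packages.
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

text = "The staff were friendly and the food was amazing"

# 1. Tokenize into individual units.
tokens = word_tokenize(text.lower())

# 2. Lemmatize each token to its root form.
lemmatizer = WordNetLemmatizer()
lemmas = [lemmatizer.lemmatize(tok) for tok in tokens]

# 3. Filter out stop words.
stop_words = set(stopwords.words("english"))
cleaned = [tok for tok in lemmas if tok not in stop_words]

print(cleaned)  # e.g. ['staff', 'friendly', 'food', 'amazing']
```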

Once that’s done, there are two basic approaches to sentiment analysis. The first is known as “rule-based” analysis. It involves taking your preprocessed textual data and comparing it against a pre-defined lexicon of words that have been tagged for sentiment.

If the word “happy” appears in your text it’ll be labeled “positive”, for example, and if the word “difficult” appears in your text it’ll be labeled “negative.”

(Rules-based sentiment analysis is more nuanced than what we’ve indicated here, but this is the basic idea.)
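For a sense of what a more polished rule-based system looks like, NLTK ships with VADER, a lexicon-and-rules sentiment analyzer. A quick sketch, assuming the vader_lexicon data package has been downloaded:

```python
# Requires: pip install nltk, plus nltk.download("vader_lexicon").
from nltk.sentiment import SentimentIntensityAnalyzer

sia = SentimentIntensityAnalyzer()

for review in [
    "The wait staff were friendly and the food was top-notch!",
    "Dr. Thomas was unprofessional and rude.",
]:
    # polarity_scores returns negative/neutral/positive proportions plus a
    # normalized "compound" score in [-1, 1].
    print(sia.polarity_scores(review)["compound"])
# The first review scores well above 0, the second well below.
```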

The second approach is based on machine learning. A sentiment analysis algorithm will be shown many examples of labeled sentiment data, from which it will learn a pattern that can be applied to new data the algorithm has never seen before.
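Here’s a minimal sketch of that machine-learning approach with scikit-learn, trained on a handful of invented examples (a real system would need thousands of labeled messages):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# A few invented labeled examples; far too small for real use.
texts = [
    "amazing food and friendly staff", "highly recommended",
    "rude and unprofessional service", "never coming back",
    "the forecast is a high of 75", "the store opens at nine",
]
labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

# Vectorize the text, then learn a pattern mapping features to labels.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["the staff were so friendly and helpful"]))
# -> likely ['positive'], though with this little data, take it with a grain of salt
```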

Of course, there are tradeoffs to both approaches. The rules-based approach is relatively straightforward, but is unlikely to be able to handle the sorts of subtleties that a really good machine-learning system can parse.

Machine learning is more powerful, but it’ll only be as good as the training data it has been given; what’s more, if you’ve built some monstrous deep neural network, it might fail in mysterious ways or otherwise be hard to understand.

Supercharge Your Contact Center with Generative AI

Like used car salesmen or college history teachers, contact center managers need to understand the ways in which technology will change their business.

Machine learning is one such profoundly-impactful technology, and it can be used to automatically sort incoming messages by sentiment or priority and generally make your agents more effective.

Realizing this potential could be as difficult as hiring a team of expensive engineers and doing everything in-house, or as easy as getting in touch with us to see how we can integrate the Quiq conversational AI platform into your company.

If you want to get started quickly without spending a fortune, you won’t find a better option than Quiq.


4 Benefits of Using Generative AI to Improve Customer Experiences

Generative AI has captured the popular imagination and is already changing the way contact centers work.

One area in which it has enormous potential is also one that tends to be top of mind for contact center managers: customer experience.

In this piece, we’re going to briefly outline what generative AI is, then spend the rest of our time talking about how it can improve customer experience with personalized responses, endless real-time support, and much more.

What is Generative AI?

As you may have puzzled out from the name, “generative AI” refers to a constellation of different deep learning models used to dynamically generate output. This distinguishes them from other classes of models, which might be used to predict returns on Bitcoin, make product recommendations, or translate between languages.

The most famous example of generative AI is, of course, the large language model ChatGPT. After being trained on staggering amounts of textual data, it’s now able to generate extremely compelling output, much of which is hard to distinguish from actual human-generated writing.

Its success has inspired a panoply of competitor models from leading players in the space, including companies like Anthropic, Meta, and Google.

As it turns out, the basic approach underlying generative AI can be utilized in many other domains as well. After natural language, probably the second most popular way to use generative AI is to make images. DALL-E, Midjourney, and Stable Diffusion have proven remarkably adept at producing realistic images from simple prompts, and just this past week, Fable Studio unveiled its “Showrunner” AI, able to generate an entire episode of South Park.

But even this is barely scratching the surface, as researchers are also training generative models to create music, design new proteins and materials, and even carry out complex chains of tasks.

What is Customer Experience?

In the broadest possible terms, “customer experience” refers to the subjective impressions that your potential and current customers have as they interact with your company.

These impressions can be impacted by almost anything, including the colors and font of your website, how easy it is to find things like contact information, and how polite your contact center agents are in resolving a customer issue.

Customer experience will also be impacted by which segment a given customer falls into. Power users of your product might appreciate a bevy of new features, whereas casual users might find them disorienting.

Contact center managers must bear all of this in mind as they consider how best to leverage generative AI. In the quest to adopt a shiny new technology everyone is excited about, it can be easy to lose track of what matters most: how your actual customers feel about you.

Be sure to track metrics related to customer experience and customer satisfaction as you begin deploying large language models into your contact centers.

How is Generative AI For Customer Experience Being Used?

There are many ways in which generative AI is impacting customer experience in places like contact centers, which we’ll detail in the sections below.

Personalized Customer Interactions

Machine learning has a long track record of personalizing content. Netflix, to take a famous example, will uncover patterns in the shows you like to watch and use algorithms to suggest content that checks similar boxes.

Generative AI, and tools like the Quiq conversational AI platform that utilize it, are taking this approach to a whole new level.

Once upon a time, it was only a human being that could read a customer’s profile and carefully incorporate the relevant information into a reply. Today, a properly fine-tuned generative language model can do this almost instantaneously, and at scale.

From the perspective of a contact center manager who is concerned with customer experience, this is an enormous development. Besides the fact that prior generations of language models simply weren’t flexible enough to have personalized customer interactions, their language also tended to have an “artificial” feel. While today’s models can’t yet replace the all-elusive human touch, they can do a lot to make your agents far more effective in adapting their conversations to the appropriate context.

Better Understanding Your Customers and Their Journeys

Marketers, designers, and customer experience professionals have always been data enthusiasts. Long before modern cloud computing and electronic databases, detailed information on potential clients, customer segments, and market trends was printed out on dead trees and guarded closely. With better data comes more targeted advertising and a more granular appreciation for how customers use your product, why they stop using it, and what their broader motivations are.

There are a few different ways in which generative AI can be used in this capacity. One of the more promising is by generating customer journeys that can be studied and mined for insight.

When you begin thinking about ways to improve your product, you need to get into your customers’ heads. You need to know the problems they’re solving, the tools they’ve already tried, and their major pain points. These are all things that some clever prompt engineering can elicit from ChatGPT.
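As a sketch of what that prompt engineering might look like with the OpenAI Python client (v1-style API), here’s one way to ask for a synthetic customer-journey entry. The model name, persona, and prompt are placeholders of our own invention:

```python
# Requires: pip install openai, with OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

prompt = (
    "You are a security engineer at a mid-sized company evaluating "
    "network-monitoring tools. Write a short journal entry describing the "
    "problem you're trying to solve, the tools you've already tried, and "
    "your biggest pain points."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; use whatever model you have access to
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```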

We took a shot at generating such content for a fictional network-monitoring enterprise SaaS tool, and this was the result:

[Screenshots: ChatGPT-generated journal entries from a fictional target customer]

While these responses are fairly generic [1], notice that they do single out a number of really important details. These machine-generated journal entries bemoan how unintuitive a lot of monitoring tools are, how they’re not customizable, how they’re exceedingly difficult to set up, and how their endless false alarms are stretching the security teams thin.

It’s important to note that ChatGPT is not going to obviate your need to talk to real, flesh-and-blood users any time soon. Still, when combined with actual testimony, machine-generated journeys like these can be a valuable aid in prioritizing your contact center’s work and alerting you to potential product issues you should be prepared to address.

Round-the-clock Customer Service

As science fiction movies never tire of pointing out, the big downside of fighting a robot army is that machines never need to eat, sleep, or rest. We’re not sure how long we have until the LLMs rise up and wage war on humanity, but in the meantime, these are properties that you can put to use in your contact center.

With the power of generative AI, you can answer basic queries and resolve simple issues pretty much whenever they happen (which will probably be all the time), leaving your carbon-based contact center agents to answer the harder questions when they punch the clock in the morning after a good night’s sleep.

Enhancing Multilingual Support

Machine translation was one of the earliest use cases for neural networks and machine learning in general, and it continues to be an important function today. ChatGPT was noticeably good at multilingual translation right from the start, and you may be surprised to learn that it actually outperforms alternatives like Google Translate.

If your product doesn’t currently have a diverse global user base speaking many languages, it hopefully will soon, which means you should start thinking about multilingual support. Not only will this boost table-stakes metrics like average handling time and resolutions per hour, it’ll also contribute to the more ineffable “customer satisfaction.” Nothing says “we care about making your experience with us a good one” like patiently walking a customer through a thorny technical issue in their native tongue.

Things to Watch Out For

Of course, for all the benefits that come from using generative AI for customer experience, it’s not all upside. There are downsides and issues that you’ll want to be aware of.

A big one is the tendency of large language models to hallucinate information. If you ask it for a list of articles to read about fungal computing (which is a real thing whose existence we discovered yesterday), it’s likely to generate a list that contains a mix of real and fake articles.

And because it’ll do so with great confidence and no formatting errors, you might be inclined to simply take its list at face value without double-checking it.

Remember, LLMs are tools, not replacements for your agents. They need to be working with generative AI, checking its output, and incorporating it when and where appropriate.

There’s a wider danger that you will fail to use generative AI in the way that’s best suited to your organization. If you’re running a bespoke LLM trained on your company’s data, for example, you should constantly be feeding it new interactions as part of its fine-tuning, so that it gets better over time.

And speaking of getting better, machine learning models don’t always improve with time. Owing to factors like changes in the underlying data, model performance can actually degrade. You’ll need a way of assessing the quality of the text generated by a large language model, along with a way of consistently monitoring it.
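There’s no single right way to do this, but even something simple beats nothing. Here’s a minimal sketch (ours, purely illustrative) of a rolling-average monitor over agent or customer ratings of model-generated replies:

    from collections import deque

    class QualityMonitor:
        """Toy rolling-average monitor for ratings of LLM-generated replies (1-5 scale)."""

        def __init__(self, window: int = 500, alert_below: float = 4.0):
            self.ratings = deque(maxlen=window)  # keep only the most recent ratings
            self.alert_below = alert_below

        def add(self, rating: float) -> None:
            self.ratings.append(rating)

        def degraded(self) -> bool:
            """True if the recent average has slipped below the alert threshold."""
            if not self.ratings:
                return False
            return sum(self.ratings) / len(self.ratings) < self.alert_below

In practice, you’d feed something like this from your CSAT surveys or agents’ thumbs-up/thumbs-down data, and investigate whenever it flags a decline.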

What are the Benefits of Generative AI for Customer Experience?

The reason people are so excited about the potential of generative AI for customer experience is that there’s so much upside. Once you’ve got your model infrastructure set up, you’ll be able to answer customer questions at all times of the day or night, in any of a dozen languages, and with a personalization that was once only possible with an army of contact center agents.

But if you’re a contact center manager with a lot to think about, you probably don’t want to spend a bunch of time hiring an engineering team to get everything running smoothly. And, with Quiq, you don’t have to – you can leverage generative AI to supercharge your customer experience while leaving the technical details to us!

Schedule a demo to find out how we can bring this bleeding-edge technology into your contact center, without worrying about the nuts and bolts.

Footnotes
[1] It’s worth pointing out that we spent no time crafting the prompt, which was really basic: “I’m a product manager at a company building an enterprise SAAS tool that makes it easier to monitor system breaches and issues. Could you write me 2-3 journal entries from my target customer? I want to know more about the problems they’re trying to solve, their pain points, and why the products they’ve already tried are not working well.” With a little effort, you could probably get more specific complaints and more usable material.

Understanding the Risk of ChatGPT: What you Should Know

OpenAI’s ChatGPT burst onto the scene less than a year ago and has already seen use in marketing, education, software development, and at least a dozen other industries.

Of particular interest to us is how ChatGPT is being used in contact centers. Though it’s already revolutionizing contact centers by making junior agents vastly more productive and easing the burnout contributing to turnover, there are nevertheless many issues that a contact center manager needs to look out for.

That will be our focus today.

What are the Risks of Using ChatGPT?

In the following few sections, we’ll detail some of the risks of using ChatGPT. That way, you can deploy ChatGPT or another large language model with the confidence born of knowing what the job entails.

Hallucinations and Confabulations

By far the most well-known failure mode of ChatGPT is its tendency to simply invent new information. Stories abound of the model making up citations, peer-reviewed papers, researchers, URLs, and more. To take a recent well-publicized example, ChatGPT accused law professor Jonathan Turley of having behaved inappropriately with some of his students during a trip to Alaska.

The only problem was that Turley had never been to Alaska with any of his students, and the alleged Washington Post story which ChatGPT claimed had reported these facts had also been created out of whole cloth.

This is certainly a problem in general, but it’s especially worrying for contact center managers who may increasingly come to rely on ChatGPT to answer questions or to help resolve customer issues.

To those not steeped in the underlying technical details, it can be hard to grok why a language model will hallucinate in this way. The answer is that it’s an artifact of how large language models are trained.

ChatGPT learns how to output tokens from being trained on huge amounts of human-generated textual data. It will, for example, see the first sentences in a paragraph, and then try to output the text that completes the paragraph. The example below is the opening line of J.D. Salinger’s The Catcher in the Rye; during training, the model would see the opening words and attempt to generate the remainder itself:

“If you really want to hear about it, the first thing you’ll probably want to know is where I was born, and what my lousy childhood was like, and how my parents were occupied and all before they had me, and all that David Copperfield kind of crap, but I don’t feel like going into it, if you want to know the truth.”

Over many training runs, a large language model will get better and better at this kind of autocompletion work, until eventually it gets to the level it’s at today.

But ChatGPT has no native fact-checking abilities – it sees text and outputs what it thinks is the most likely sequence of additional words. Since it sees URLs, papers, citations, etc., during its training, it will sometimes include those in the text it generates, whether or not they’re appropriate (or even real.)
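If you want to see this autocompletion behavior for yourself, here’s a minimal sketch using the small, open-source GPT-2 model via Hugging Face’s transformers library (our choice purely for illustration; ChatGPT works the same way, just at a vastly larger scale):

    from transformers import pipeline

    # Load a small, open-source next-token predictor.
    generator = pipeline("text-generation", model="gpt2")

    prompt = "If you really want to hear about it, the first thing"
    result = generator(prompt, max_new_tokens=20, do_sample=True)

    # The model simply continues the text with whatever it judges most likely;
    # no fact-checking happens anywhere in this process.
    print(result[0]["generated_text"])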

Privacy

Another ongoing risk of using ChatGPT is that it could expose sensitive or private information. As things stand, OpenAI, the creators of ChatGPT, offer no robust privacy guarantees for any information placed into a prompt.

If you are trying to do something like named entity recognition or summarization on real people’s data, there’s a chance that it might be seen by someone at OpenAI as part of a review process. Alternatively, it might be incorporated into future training runs. Either way, the results could be disastrous.

But this is not all the information collected by OpenAI when you use ChatGPT. Your timezone, browser type and IP address, cookies, account information, and any communications you have with OpenAI’s support team are all collected, among other things.

In the information age we’ve become used to knowing that big companies are mining and profiting off the data we generate, but given how powerful ChatGPT is, and how ubiquitous it’s becoming, it’s worth being extra careful with the information you give its creators. If you feed it private customer data and someone finds out, that will be damaging to your brand.

Bias in Model Output

By now, it’s pretty common knowledge that machine learning models can be biased.

If you feed a large language model a huge amount of text data in which doctors are usually men and nurses are usually women, for example, the model will associate “doctor” with “maleness” and “nurse” with “femaleness.”
This is generally an artifact of the data the models were trained on, and is not due to any malfeasance on the part of the engineers. That does not, however, make it any less problematic.

There are some clever data manipulation techniques that are able to go a long way toward minimizing or even eliminating these biases, though they’re beyond the scope of this article. What contact center managers need to do is be aware of this problem, and establish monitoring and quality-control checkpoints in their workflow to identify and correct biased output in their language models.

Issues Around Intellectual Property

Earlier, we briefly described the training process for a large language model like ChatGPT (you can find much more detail here.) One thing to note is that the model doesn’t provide any sort of citations for its output, nor any details as to how it was generated.

This has raised a number of thorny questions around copyright. If a model has ingested large amounts of information from the internet, including articles, books, forum posts, and much more, is there a sense in which it has violated someone’s copyright? What about if it’s an image-generation model trained on a database of Getty Images?

By and large, we tend to think this is the sort of issue that isn’t likely to plague contact center managers too much. It’s more likely to be a problem for, say, songwriters who might be inadvertently drawing on the work of other artists.

Nevertheless, a piece on the potential risks of ChatGPT wouldn’t be complete without a section on this emerging problem, and it’s certainly something that you should be monitoring in the background in your capacity as a manager.

Failure to Disclose the Use of LLMs

Finally, there has been a growing tendency to make it plain that LLMs have been used in drafting an article or a contract, if indeed they were part of the process. To the best of our knowledge, there are not yet any laws in place mandating that this has to be done, but it might be wise to include a disclaimer somewhere if large language models are being used consistently in your workflow. [1]

That having been said, it’s also important to exercise proactive judgment in deciding whether an LLM is appropriate for a given task in the first place. In early 2023, the Peabody School at Vanderbilt University landed in hot water when it disclosed that it had used ChatGPT to draft an email about a mass shooting that had taken place at Michigan State.

People may not care much about whether their search recommendations were generated by a machine, but it would appear that some things are still best expressed by a human heart.

Again, this is unlikely to be something that a contact center manager faces much in her day-to-day life, but incidents like these are worth understanding as you decide how and when to use advanced language models.


Mitigating the Risks of ChatGPT

From the moment it was released, it was clear that ChatGPT and other large language models were going to change the way contact centers run. They’re already helping agents answer more queries, utilize knowledge spread throughout the center, and automate substantial portions of work that were once the purview of human beings.

Still, challenges remain. ChatGPT will plainly make things up, and can be biased or harmful in its text. Private information fed into its interface will be visible to OpenAI, and there’s also the wider danger of copyright infringement.

Many of these issues don’t have simple solutions, and will instead require a contact center manager to exercise both caution and continual diligence. But one place where she can make her life much easier is by using a powerful, out-of-the-box solution like the Quiq conversational AI platform.

While you’re worrying about the myriad risks of using ChatGPT, you don’t want to be contending with a million little technical details as well, so schedule a demo with us to find out how our technology can bring cutting-edge language models to your contact center, without the headache.

Footnotes
[1] NOTE: This is not legal advice.


The Ongoing Management of an LLM Assistant

Technologies like large language models (LLMs) are amazing at rapidly generating polite text that helps solve a problem or answer a question, so they’re a great fit for the work done at contact centers.

But this doesn’t mean that using them is trivial or easy. There are many challenges associated with the ongoing management of an LLM assistant, including hallucinations and the emergence of bad behavior – and that’s not even mentioning the engineering prowess required to fine-tune and monitor these systems.

All of this must be borne in mind by contact center managers, and our aim today is to facilitate this process.

We’ll provide broad context by talking about some of the basic ways in which large language models are being used in business, discuss setting up an LLM assistant, and then enumerate some of the specific steps that need to be taken to use them properly.

Let’s go!

How Are LLMs Being Used in Science and Business?

First, let’s adumbrate some of the ways in which large language models are being utilized on the ground.

The most obvious way is by acting as a generative AI assistant. One of the things that so stunned early users of ChatGPT was its remarkable breadth of capability. It could draft blog posts and web copy, translate between languages, and write or explain code.

This alone makes it an amazing tool, but it has since become obvious that it’s useful for quite a lot more.

One thing that businesses have been experimenting with is fine-tuning large language models like ChatGPT over their own documentation, turning it into a simple interface by which you can ask questions about your materials.

It’s hard to quantify precisely how much time contact center agents, engineers, or other people spend hunting around for the answer to a question, but it’s surely quite a lot. What if instead you could just, y’know, ask for what you want, in the same way that you would a human being?

Well, ChatGPT is a long way from being a full person, but when properly trained it can come close where question-answering is concerned.

Stepping back a little bit, LLMs can be prompt engineered into a number of useful behaviors, all of which redound to the benefit of the contact centers which use them. Imagine having an infinitely patient Socratic tutor that could help new agents get up to speed on your product and process, or crafting it into a powerful tool for brainstorming new product designs.

There have also been some promising attempts to extend the functionality of LLMs by making them more agentic – that is, by embedding them in systems that allow them to carry out more open-ended projects. AutoGPT, for example, pairs an LLM with a separate bot that hits the LLM with a chain of queries in the pursuit of some goal.

AssistGPT goes even further in the quest to augment LLMs by integrating them with a set of tools that allow them to achieve objectives involving images and audio in addition to text.

How to Set Up An LLM Assistant

Next, let’s turn to a discussion of how to set up an LLM assistant. Covering this topic fully is well beyond the scope of this article, but we can make some broad comments that will nevertheless be useful for contact center managers.

First, there’s the question of which large language model you should use. In the beginning, ChatGPT was pretty much the only foundation model on offer. Today, however, that situation has changed, and there are now foundation models from Anthropic, Meta, and many other companies.

One of the biggest early decisions you’ll have to make is whether you want to try to use an open-source model (for which the code and the model weights are freely available) or a closed-source model (for which they are not).

If you go the closed-source route you’ll almost certainly be hitting the model over an API, feeding it your queries and getting its responses back. This is orders of magnitude simpler than provisioning an open-source model, but it means that you’ll also be beholden to the whims of some other company’s engineering team. They may update the model in unexpected ways, or simply go bankrupt, and you’ll be left with no recourse.
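To give a feel for what “hitting the model over an API” looks like, here’s a sketch using OpenAI’s openai Python library as it existed at the time of writing (the API key, system prompt, and question are placeholders):

    import openai

    openai.api_key = "YOUR_API_KEY"  # placeholder; keep real keys out of source code

    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a helpful contact center assistant."},
            {"role": "user", "content": "How do I reset my password?"},
        ],
    )

    print(response["choices"][0]["message"]["content"])

Note that everything of substance happens on the provider’s servers, which is exactly why you’re beholden to their engineering team.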

Using an open-source alternative, of course, means grabbing the other horn of the dilemma. You’ll have visibility into how the model works and will be free to modify it as you see fit, but this won’t be worth much unless you’re willing to devote engineering hours to the task.

Then, there’s the question of fine-tuning large language models. While ChatGPT and LLMs more generally are quite good on their own, having them answer questions about your product or respond in particular ways means modifying their behavior somehow.

Broadly speaking, there are two ways of doing this, which we’ve mentioned throughout: proper fine-tuning, and prompt engineering. Let’s dig into the differences.

Fine-tuning means showing the model many (e.g. several hundred) examples of the behaviors you want to see, which changes its internal weights and biases it toward those behaviors in the future.

Prompt engineering, on the other hand, refers to carefully structuring your prompts to elicit the desired behavior. These LLMs can be surprisingly sensitive to little details in the instructions they’re provided, and prompt engineers know how to phrase their requests in just the right way to get what they need.

There is also some middle ground between these approaches. “One-shot learning” is a form of prompt engineering in which the prompt contains a single example of the desired behavior, while “few-shot learning” refers to including a handful of examples, often between three and five.
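To make that concrete, a hypothetical few-shot prompt for drafting polite replies might look like this (all of the examples are invented):

    You draft polite, helpful replies to customer messages.

    Customer: Where is my order?
    Reply: I'm so sorry for the delay! Could you share your order number so I can check on it?

    Customer: This product broke after two days.
    Reply: I'm sorry to hear that! You're covered by our warranty, and I'd be happy to arrange a replacement.

    Customer: I was charged twice this month.
    Reply:

The model picks up the pattern from the examples and completes the final reply in the same style.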

Contact center managers thinking about using LLMs will need to think about these implementation details. If you plan on only lightly using ChatGPT in your contact center, a basic course on prompt engineering might be all you need. If you plan on making it an integral part of your organization, however, that most likely means a fine-tuning pipeline and serious technical investment.

The Ongoing Management of an LLM

Having said all this, we can now turn to the day-to-day details of managing an LLM assistant.

Monitoring the Performance of an LLM

First, you’ll need to continuously monitor the model. As hard as it may be to believe given how perfect ChatGPT’s output often is, there isn’t a person somewhere typing the responses. ChatGPT is very prone to hallucinations, in which it simply makes up information, and LLMs more generally can sometimes fall into using harmful or abusive language if they’re prompted incorrectly.

This can be damaging to your brand, so it’s important that you keep an eye on the language created by the LLMs your contact center is using.
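There are many ways to do this, but one lightweight option (sketched here on the assumption that you’re already using OpenAI’s API) is to run generated text through a moderation endpoint before it ever reaches a customer:

    import openai

    def is_safe(text: str) -> bool:
        """Return False if OpenAI's moderation endpoint flags the text."""
        result = openai.Moderation.create(input=text)
        return not result["results"][0]["flagged"]

    draft = "...model-generated reply goes here..."
    if not is_safe(draft):
        # Route the conversation to a human agent instead of sending automatically.
        escalate_to_human(draft)  # hypothetical helper in your own stack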

And of course, not even LLMs can obviate the need to track the all-important key performance indicators. So far, there’s been one major study on generative AI in contact centers that found it increased productivity and reduced turnover, but you’ll still want to measure customer satisfaction, average handle time, etc.

There’s always a temptation to jump on a shiny new technology (remember the blockchain?), but you should only be using LLMs if they actually make your contact center more productive, and the only way you can assess that is by tracking your figures.

Iterative Fine-Tuning and Training

We’ve already had a few things to say about fine-tuning and the related discipline of prompt engineering, and here we’ll build on those preliminary comments.
The big thing to bear in mind is that fine-tuning a large language model is not a one-and-done endeavor. You’ll find that your model’s behavior drifts over time (the technical term is “model degradation”), which means you will likely have to periodically re-train it.

It’s also common to offer the model “feedback”, e.g. by ranking its responses or indicating when you did or did not like a particular output. You’ve probably heard of reinforcement learning from human feedback (RLHF), which is one version of this process, but there are others you can use.
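Even if full RLHF is out of reach, you can start by simply logging the feedback for later use. A toy sketch of what that might look like:

    import json
    from datetime import datetime, timezone

    def record_feedback(prompt: str, response: str, rating: int,
                        path: str = "feedback.jsonl") -> None:
        """Append one feedback record (rating: +1 or -1) for use in later fine-tuning."""
        record = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "prompt": prompt,
            "response": response,
            "rating": rating,
        }
        with open(path, "a") as f:
            f.write(json.dumps(record) + "\n")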

Quality Assurance and Oversight

A related point is that your LLMs will need consistent oversight. They’re not going to voluntarily improve on their own (they’re algorithms with no personal initiative to speak of), so you’ll need to check in routinely to make sure they’re performing well and that your agents are using them responsibly.

There are many parts to this, including checks on the models’ outputs and an audit process that allows you to track down any issues. If you suddenly see a decline in performance, for example, you’ll need to quickly figure out whether it’s isolated to one agent or part of a larger pattern. If it’s the former, was it a random aberration, or did the agent go “off script” in a way that caused the model to behave poorly?

Take another scenario, in which an end-user was shown inappropriate text generated by an LLM. In this situation, you’ll need to take a deeper look at your process. If there were agents interacting with this model, ask them why they failed to spot the problematic text and stop it from being shown to a customer. Or, if it came from a mostly-automated part of your tech stack, you need to uncover why your filters failed to catch it, and perhaps think about keeping humans more in the loop.

The Future of LLM Assistants

Though the future is far from certain, we tend to think that LLMs have left Pandora’s box for good. They’re incredibly powerful tools which are poised to transform how contact centers and other enterprises operate, and experiments so far have been very promising; for all these reasons, we expect that LLMs will become a steadily more important part of the economy going forward.

That said, the ongoing management of an LLM assistant is far from trivial. You need to be aware at all times of how your model is performing and how your agents are using it. Though it can make your contact center vastly more productive, it can also lead to problems if you’re not careful.

That’s where the Quiq platform comes in. Our conversational AI is some of the best that can be found anywhere, able to facilitate customer interactions, automate text-message follow-ups, and much more. If you’re excited by the possibilities of generative AI but daunted by the prospect of figuring out how TPUs and GPUs are different, schedule a demo with us today.


How Do You Train Your Agents in a ChatGPT World?

There’s long been an interest in using AI for educational purposes. Technologist Danny Hillis has spent decades dreaming of a digital “Aristotle” that would teach everyone in the way that the original Greek wunderkind once taught Alexander the Great, while modern companies have leveraged computer vision, machine learning, and various other tools to help students master complex concepts in a variety of fields.

Still, almost nothing has sparked the kind of enthusiasm for AI in education that ChatGPT and large language models more generally have given rise to. From the first, its human-level prose, knack for distilling information, and wide-ranging abilities made it clear that it would be extremely well-suited for learning.

But that still leaves the question of how. How should a contact center manager prepare for AI, and how should she change the way she trains her agents?

In our view, this question can be understood in two different, related ways:

  1. How can ChatGPT be used to help agents master skills related to their jobs?
  2. How can they be trained to use ChatGPT in their day-to-day work?

In this piece, we’ll take up both of these issues. We’ll first provide a general overview of the ways in which ChatGPT can be used for both education and training, then turn to the question of the myriad ways in which contact center agents can be taught to use this powerful new technology.

How is ChatGPT Used in Education and Training?

First, let’s get into some of the early ways in which ChatGPT is changing education and training.

NOTE: Our comments here are going to be fairly broad, covering some areas that may not be immediately applicable to the work contact center agents do. The main reason for this is that it’s very difficult to forecast how AI is going to change contact center work.

Our section on “creating study plans and curricula”, for example, might not be relevant to today’s contact center agents. But it could become important down the road if AI gives rise to more autonomous workflows in the future, in which case we expect that agents would be given more freedom to use AI and similar tools to learn the job on their own.

We pride ourselves on being forward-looking and forward-thinking here at Quiq, and we structure our content to reflect this.

Making a Socratic Tutor for Learning New Subjects

The Greek philosopher Socrates famously pioneered the instructional methodology which bears his name. Mostly, the Socratic method boils down to continuously asking targeted questions until areas of confusion emerge, at which point they’re vigorously investigated, usually in a small group setting.

A well-known illustration of this process is found in Plato’s Republic, which starts with an attempt to define “justice” and then expands into a much broader conversation about the best way to run a city and structure a social order.

ChatGPT can’t replace all of this on its own, of course, but with the right prompt engineering, it does a pretty good job. This method works best when paired with a primary source, such as a textbook, which will allow you to double-check ChatGPT’s questions and answers.
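A hypothetical prompt to set this up might read something like:

    You are a Socratic tutor. I am studying [topic]. Ask me one probing question
    at a time, wait for my answer, point out any gaps in my reasoning, and only
    then move on. Never lecture; teach entirely through questions, and stick to
    the textbook chapter I paste below as your source of truth.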

Having it Explain Code or Technical Subjects

A related area in which people are successfully using ChatGPT is in having it walk you through a tricky bit of code or a technical concept like “inertia”.

The more basic and fundamental, the better. In our experience so far, ChatGPT has almost never failed in correctly explaining simple Python, Pandas, or Java. It did falter when asked to produce code that translates between different orbital reference frames, however, and it had no idea what to do when we asked it about a fairly recent advance in the frontiers of battery chemistry.

There are a few different reasons that we advise caution if you’re a contact center agent trying to understand some part of your product’s codebase. For one thing, if the product is written in a less-common language ChatGPT might not be able to help much.

But even more importantly, you need to be extremely careful about what you put into it. There have already been major incidents in which proprietary code and company secrets were leaked when developers pasted them into the ChatGPT interface, which is visible to the OpenAI team.

Conversely, if you’re managing teams of contact center agents, you should begin establishing a policy on the appropriate uses of ChatGPT in your contact center. If your product is open-source there’s (probably) nothing to worry about, but otherwise, you need to proactively instruct your agents on what they can and cannot use the tool to accomplish.

Rewriting Explanations for Different Skill Levels

Wired has a popular YouTube series called “5 Levels”, where experts in quantum computing or the blockchain will explain their subject at five different skill levels: “child”, “teen”, “college student”, “grad student”, and a fellow “expert.”

One thing that makes this compelling to beginners and pros alike is seeing the same idea explored across such varying contexts – seeing what gets emphasized or left out, or what emerges as you gradually climb up the ladder of complexity and sophistication.

This, too, is a place where ChatGPT shines. You can use it to provide explanations of concepts at different skill levels, which will ultimately improve your understanding of them.

For a contact center manager, this means that you can gradually introduce ideas to your agents, starting simply and then fleshing them out as the agents become more comfortable.
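A hypothetical prompt in this spirit:

    Explain how API rate limiting works three times: first to a brand-new agent
    with no technical background, then to an agent who knows our product well,
    and finally to a developer integrating against our API.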

Creating Study Plans and Curricula

Stepping back a little bit, ChatGPT has been used to create entire curricula and even daily study plans for studying Spanish, computer science, medicine, and various other fields.

As we noted at the outset, we expect it will be a little while before contact center agents are using ChatGPT for this purpose, as most centers likely have robust training materials they like to use.

Nevertheless, we can project a future in which these materials are much more bare-bones, perhaps consisting of some general notes along with prompts that an agent-in-training can use to ask questions of a model trained on the company’s documentation, test themselves as they go, and gradually build skill.

Training Agents to Use ChatGPT

Now that we’ve covered some of the ways in which present and future contact center agents might use ChatGPT to boost their own on-the-job learning, let’s turn to the other issue we want to tackle today: how to train agents to use ChatGPT in their day-to-day work.

Getting Set Up With ChatGPT (and its Plugins)

First, let’s talk about how you can start using ChatGPT.

This section may end up seeming a bit anticlimactic because, honestly, it’s pretty straightforward. Today, you can get access to ChatGPT by going to the signup page. There’s a free version and a paid version that’ll set you back a whopping $20/month (which is a pretty small price to pay for access to one of the most powerful artifacts the human race has ever produced, in our opinion.)

As things stand, the free tier gives you access to GPT-3.5, while the paid version gives you the option of switching to GPT-4 if you want the more powerful foundation model.

A paid account also gives you access to the growing ecosystem of ChatGPT plugins. You access the ChatGPT plugins by switching over to the GPT-4 option:

[Screenshots: switching to the GPT-4 option to reveal the ChatGPT plugin menu]

There are plugins that allow ChatGPT to browse the web, let you directly edit diagrams or talk with PDF documents, or let you offload certain kinds of computations to the Wolfram platform.

Contact center agents may or may not find any of these useful right now, but we predict there will be a lot more development in this space going forward, so it’s something managers should know about.

Best Practices for Combining Human and AI Efforts

People have long been fascinated and terrified by automation, but so far, machines have only ever augmented human labor. Knowing when and how to offload work to ChatGPT requires knowing what it’s good for.

Large language models learn how to predict the next token from their training data, and are therefore very good at developing rough drafts, outlines, and more routine prose. You’ll generally find it necessary to edit its output fairly heavily in order to account for context and so that it fits stylistically with the rest of your content.

As a manager, you’ll need to start thinking about a standard policy for using ChatGPT. Any factual claims made by the model, especially any references or citations, need to be checked very carefully.

Scenario-Based Training

In this same vein, you’ll want to distinguish between the different scenarios in which your agents will end up using generative AI. There are different considerations in using Quiq Compose or Quiq Suggest to format helpful replies, for example, than in using generative AI to translate between different languages.

Managers will probably want to sit down and brainstorm different scenarios and develop training materials for each one.

Ethical and Privacy Considerations

The rise of generative AI has sparked a much broader conversation about privacy, copyright, and intellectual property.

Much of this isn’t particularly relevant to contact center managers, but one thing you definitely should be paying attention to is privacy. Your agents should never put real customer data into ChatGPT; they should use aliases and fake data whenever they’re trying to resolve a particular issue.

To quote fictional chemist and family man Walter White, we advise you to tread lightly here. Data breaches are a huge and ongoing problem, and they can do substantial damage to your brand.

ChatGPT and What it Means for Training Contact Center Agents

ChatGPT and related technologies are poised to change education and training. They can be used to help get agents up to speed or to work more efficiently, and they, in turn, require a certain amount of instruction to use safely.

These are all things that contact center managers need to worry about, but one thing you shouldn’t spend your time worrying about is the underlying technology. The Quiq conversational AI platform allows you to leverage the power of language models for contact centers, without looking at any code more complex than an API call. If the possibilities of this new frontier intrigue you, schedule a demo with us today!

How Can AI Make Agents More Efficient?

From the invention of writing to quantum computing, emerging technologies have always had a profound impact on the way we work. New tools mean new products and services, new organizational structures, whole new markets, and sometimes even new methods of thought.

These days, the big news is coming out of artificial intelligence. Specifically, the release of ChatGPT has made it possible for everyone to try out an advanced AI application for the first time, and it has ignited a firestorm of speculation as to how industries ranging from medicine to copywriting might be transformed.

In this piece, we’re going to try to cut through the hype to give contact center managers some much-needed clarity. We’ll discuss what AI is useful for, how it will change how contact center agents function daily, and what tools they should investigate to get the most out of AI.

What Is AI Useful For?

Artificial intelligence is a pretty broad category, encompassing everything from the most basic linear regressions to the remarkable sophistication of deep reinforcement learning agents.

This is too much territory to cover in a single blog post, but we can nevertheless make some useful general comments.

The way we see it, there are essentially two ways that AI is useful: it can either completely replace a human for certain tasks, allowing them to shift their focus to higher-value work, or it can augment their process, allowing them to reach insights or achieve objectives that would’ve taken much longer otherwise.

Take the example of ChatGPT, a large language model trained on huge quantities of human-generated text that is able to write poetry, generate math proofs, create functioning code, and much more.

For certain tasks – like generating blog post titles or short email blasts – ChatGPT is good enough to supplant humans altogether. But if you’re trying to learn a complex subject like organic chemistry, it’s best to treat ChatGPT more like a conversational partner. You can ask it questions or use it to test your understanding of a concept, but you have to be careful with its output because it might be hallucinating or otherwise getting important facts wrong. [1]

Since ChatGPT and large language models more generally are what everyone is focused on at the moment, it’s what we’ll be discussing throughout this essay.

How is AI Changing How Contact Center Agents Work?


As soon as ChatGPT was released it spawned an unending stream of hot takes, from “this is going to completely automate the entire economy” to “this is going to be a huge flop that no one finds particularly useful.”

Recently, a study by Erik Brynjolfsson, Danielle Li, and Lindsey R. Raymond called “Generative AI at Work” examined how LLMs are being used in contact centers. They found that both these perspectives were wrong: generative AI was not completely automating contact centers but was proving enormously helpful in making contact centers more efficient.

Specifically, LLMs were able to capture some of the conversational patterns and general tacit knowledge held by more senior agents and transfer it to more junior agents. The result was more productivity among these less experienced workers, less overall turnover, and a better customer experience.

To help flesh this picture out, we’ll now turn to examining some specific ways this works.

Large Language Models are Helping Agents Work Faster

There are a few ways that LLMs are helping agents get their jobs done more quickly and efficiently.

One is by helping them cut down on typing by providing contextually appropriate responses to customer questions, which is exactly what Quiq Compose does.

Quiq Compose learns from interactions between contact center agents and customers. It can take a barebones outline of a reply (“Nope, you waited too long to return the product…”) and flesh it out into a full, coherent, grammatical response (“I’m so sorry to hear that the product isn’t working as intended…”.)

Quiq Suggest also learns from multiple agent-customer interactions, but it offers real-time suggestions. As your contact center agents begin typing their responses, the underlying model offers a robust form of autocomplete to help them craft replies more quickly. This means agents can spend up to 30% less time hunting around for information and tweaking their language to be both polite and informative.

What’s more, because Quiq Suggest leverages lightweight “edge” language models trained on a specific company’s data, it’s able to run very quickly.

Another way you can reduce agent handling time is by simply cutting down on the amount of text a given agent has to process. In the course of resolving an issue, there will usually be some extraneous text, like “Thanks!” or “Have a good day!” When Quiq’s conversational AI platform sees these unimportant messages, it automatically filters them and tacks them on to the end of the transcript.
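(To be clear, Quiq’s platform handles this automatically; purely to illustrate the concept, a naive version of such a filter might look like the following.)

    TRIVIAL_MESSAGES = {"thanks!", "thank you!", "ok", "great", "have a good day!"}

    def is_trivial(message: str) -> bool:
        """Toy check for pleasantries that don't need an agent's attention."""
        return message.strip().lower() in TRIVIAL_MESSAGES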

Finally, a lot of friction and information loss can occur when a conversation is transferred between agents, or from an AI to a human agent. This is where conversation summarization comes in handy. By automatically summarizing the interaction so far, these transfers can take less time and energy, which also contributes to lower agent burnout and higher customer satisfaction.

Large Language Models can provide 24/7 Customer Support

There’s a fundamental asymmetry in running a great contact center, inasmuch as problems can occur around the clock but your agents need to sleep, rest, and play frisbee golf.

Unless, of course, some of your agents aren’t human. One of the great advantages of computers and algorithms is that they have none of the human frailties that prevent us all from working every hour of the day. They have no need for sleep, bathroom breaks, or recreation.

If you’re using a powerful conversational AI platform like Quiq, you can have AI agents deployed every hour, day or night, answering questions, completing tasks, and resolving problems.

Of course, the technology is not yet good enough to handle everything a contact center agent would handle, and some issues will have to be postponed until the humans punch the clock. Still, with the right tools, your operation can constantly be moving forward.

Large Language Models Can Help With Documentation

Writing documentation is one of those crucial, un-sexy tasks that businesses ignore at their own peril. Everyone wants to be coding up a blockchain or demo-ing a shiny new application to well-heeled investors, but someone needs to be sitting and writing up product specs, troubleshooting workflows, and all the other text that helps an organization function effectively.

This, too, is something that AI can help with. Whether it’s brainstorming an outline, identifying common sticking points, or even writing the document wholesale, more and more technical organizations are exploring LLMs to speed up their documentation efforts.

Just remember that LLMs like ChatGPT are extremely prone to hallucinations, so carefully fact-check everything they produce before you add it to your official documentation.

Large Language Models Can Help With Marketing

A final place where AI is proving incredibly useful is in marketing. Whether or not your agents have any input into your marketing depends on how you run your contact center, but this piece wouldn’t be complete without at least briefly touching upon marketing.

One obvious way that this can work is by having ChatGPT generate headlines, subject lines, Tweets, or even SEO-optimized blog posts.

But this is not the only way AI can be used in marketing. One very clever use of the technology that we’ve encountered is having ChatGPT generate customer journeys or customer diary entries. If your product is targeting men in their 40s who aren’t crushing life the way they used to, for example, it can create a month’s worth of forum posts from your target buyers discussing their lack of drive and motivation. This, in turn, will furnish targeted language you can use in your copy.

But bear in mind that marketing is one of those things that’s just incredibly subtle. It takes all of 30 seconds to come up with a few headlines for an email, but the difference between an okay headline and an extraordinary one can be a single word. Here, as elsewhere, it’s wise to have the final word remain with the humans.

Working more Quiq-ly

The world is changing, and contact centers are changing along with it. If you expect to retain a competitive edge and a top-notch contact center, you’ll need to utilize the latest technologies.

One way you could do this is by paying an expensive engineering team to build your own LLMs and AI tooling. But a much easier way is to integrate our Quiq conversational AI platform into your contact center. Whether it’s automatic summarization, filtering trivial messages, or using Quiq Suggest and Quiq Compose to cut down on average handle time, we have a product that will streamline your operation. Schedule a demo with us today to see how we can help you!

Footnotes
[1] You could argue that both of these examples boil down to the same thing. That is, even when you treat ChatGPT as a sounding board you’re really just replacing a human being that could’ve performed the same function. This is a plausible point of view, but we still think it’s useful to distinguish between “ChatGPT acting like a total replacement for a human for certain boilerplate tasks” and “ChatGPT augmenting a human’s workflow by acting like an idea generator or conversational partner.” Reasonable people could disagree on this, and your mileage may vary.


The Pros and Cons of Using ChatGPT: Agents vs. Customers

If you’re a contact center manager who has been impressed with ChatGPT and everything it makes possible, a natural follow-up question is where you should deploy it.

On the one hand, you could use it internally to make your contact center agents more efficient. They’d be able to ask questions of your company documentation, summarize important emails, outsource the more trivial parts of their workload, and plenty besides.

On the other hand, you could use it externally as a customer-facing application. If you had clients that were confused about a feature or needed help figuring something out, ChatGPT could go a long way towards resolving their issues with minimal attention from your contact center agents.

Of course, there is major overlap between these options, but there are crucial differences as well. In this article, we’ll discuss the pros and cons of using ChatGPT or a similar large language model (LLM) for contact center agents vs. using it for customers.

How is ChatGPT Making Contact Center Agents More Efficient?

To a first approximation, a contact center is a place where questions are answered. No matter how clear your instructions or comprehensive your documentation, there will inevitably be users who simply can’t get an issue resolved, and that’s when they’ll reach out to customer support.

This means that much of a contact center agent’s day-to-day revolves around interacting with clients via text, either over a chat interface or possibly through text messaging.

What’s more, much of this interaction will be relatively formulaic. Customers will repeatedly ask about similar sorts of issues, or they’ll be asking questions that are covered somewhere in your product’s documentation.

If you’ve spent even five minutes with ChatGPT, it’s probably occurred to you that it’s a powerful tool for handling exactly these kinds of tasks. Let’s spend a few minutes digging into this idea.

Outsourcing Routine Tasks

The most obvious way that ChatGPT is making contact center agents more efficient is by allowing them to outsource some of this more routine work.

There are a few ways this can happen. First, ChatGPT can help with answering basic questions. Today, large language models are not particularly good at generating highly original and inventive text, but when it comes to churning out helpful, simple boilerplate, they’re without peer.

This means that, with a little training or fine-tuning, your contact center agents can use ChatGPT to answer the sorts of questions they see multiple times a day, such as where a given feature is located or how to handle a common error. This will free them up to focus on the more involved queries, for which they have a comparative advantage.

In this same vein, tools like ChatGPT can also help contact center agents adopt the appropriate, polite tone in their correspondences. Customer experience and customer service are major parts of being a contact center agent, which means replies must be crafted so as to put the customer (who may be frustrated, angry, and belligerent) at ease.

This is something ChatGPT excels at, and according to the paper “Generative AI at Work”, this exact dynamic was responsible for a lot of the gain in productivity seen in a contact center that began using an LLM. The model was trained on the interactions of more seasoned agents who know how to deal with tricky customers, and a good portion of this ability was transferred to more junior agents via the model’s output.

Another place where ChatGPT can help is in writing documentation. This may fall to a technical writer rather than an actual agent, but in either case, ChatGPT’s remarkable ability to provide outlines and quickly generate expository text can speed up the process of documenting your product’s core features.

And finally, ChatGPT is quite good at writing and explaining simple code. As with documentation, it’s doubtful that a contact center agent is going to be spending much time writing code. Nevertheless, your agents might find themselves hit with questions from savvier users about e.g. API integrations, so they should know that they can query ChatGPT about what a code snippet is doing, and they can have it generate a basic code example if they need to.

Learning and Brainstorming

This is a bit more abstract, but ChatGPT has proven remarkably useful in brainstorming study plans, solutions to problems, etc. Though the algorithm itself isn’t particularly creative, when it generates ideas that a human being can riff off of, the combination of algorithm + human can be much more creative than a human working by herself.

While there will be many situations in which a contact center agent has a script to work off of, when they don’t, turning to ChatGPT can be the spark that moves them forward.

ChatGPT Plugins for Contact Center Agents

One of the more exciting developments for ChatGPT was the release of its plugin library in March of 2023. There are now plugins from Instacart (for food delivery), Expedia (for trip planning), Klarna Shopping (for online retail), and many others.

Truthfully, most of this won’t (yet) be of much use for contact center agents, but it’s worth mentioning given how quickly people are developing new plugins. If you’re a contact center agent or manager wanting to extend the functionality of powerful LLM technologies, plugins are something you’ll want to be aware of.

Getting the Most out of ChatGPT for Customer Service

ChatGPT is remarkably good for a wide range of tasks, but to really leverage its full capacities you’ll need to be aware of a few common terms.

Large language models are known to be really sensitive to small changes in word choice and structure, which means there’s an art to phrasing your requests just so. This is known as “prompt engineering” a language model, and it’s a new discipline that can be enormously valuable if done correctly.

You can also get better results if you show ChatGPT an example or two of what you’re looking for. This is known as “one-shot” learning (if you show it one example) or “few-shot” learning (if you show it a small handful).

Of course, if that doesn’t work you can instead try to fine-tune a large language model. This involves gathering hundreds of examples of the conversations, text, or output you want to see and feeding them all to the model (probably over its API) so that the model’s internal structure actually changes. Though it’s obviously a more significant engineering challenge, it will probably give you the best results of all.
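As of this writing, OpenAI’s fine-tuning endpoints accept training data as JSON lines; in one common format, a couple of invented records would look like this:

    {"prompt": "Customer: Where can I find my past invoices?\nAgent:", "completion": " You can download them under Settings > Billing. Anything else I can help with?"}
    {"prompt": "Customer: The app keeps crashing on startup.\nAgent:", "completion": " I'm sorry about that! Which version are you running? I'd be happy to look into it."}

The exact schema depends on the model you’re fine-tuning, so check the provider’s current documentation before building your pipeline.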

ChatGPT vs. Chatbots

We in the customer experience field have quite a lot of experience with chatbots, so it’s natural to wonder how ChatGPT is different.

Chatbots are just algorithms that are capable of carrying on a dialogue with customers, and this can be accomplished in many different ways. Some chatbots are extremely simple and follow a rules-based approach to formatting their responses, while those based on neural networks or some other advanced machine-learning technology are much more flexible.

Chatbots can be built with ChatGPT, but most aren’t.

How is ChatGPT Changing Customer Experience?

Now that we’ve covered some of the ways in which ChatGPT is helping customer service agents, let’s discuss some of the ways ChatGPT is used for customer support.

Personalized Responses

One property of ChatGPT that makes it extremely effective is that it’s able to remember context. When you chat with ChatGPT, it’s not generating each new response in a vacuum; it’s producing them either on the basis of what has already been said or based on information that it’s been given.

This means that if you have a customer interacting with a chat interface powered by an LLM (and you’re being smart by guardrailing it with a conversational CX platform like Quiq), they’ll be able to have more open-ended and personalized interactions with the tool than would be possible with simpler chatbots.

This will go a long way toward making them feel like they’re being taken care of, thus boosting your company’s overall customer satisfaction.
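Under the hood, this “memory” is nothing magical: on every turn, the full conversation so far is passed back to the model. A sketch using the openai Python library as it existed at the time of writing:

    import openai

    history = [
        {"role": "system", "content": "You are a friendly support assistant."},
        {"role": "user", "content": "I can't log in to my account."},
        {"role": "assistant", "content": "Sorry about that! Are you seeing an error message?"},
        {"role": "user", "content": "Yes, it says my password is invalid."},
    ]

    # Each request includes the entire history, which is how the model "remembers" context.
    response = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=history)
    print(response["choices"][0]["message"]["content"])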

Automatically Resolving Customer Issues

Earlier, we talked about how contact center agents would be able to leverage ChatGPT in order to outsource their more routine tasks.

Well, one of those routine tasks is resolving a steady stream of quotidian issues. How many times a day do you think a contact center agent has to help a person log in to their client’s software or reset a password? It’s probably not “hundreds”, but we’d bet that it’s a lot.

ChatGPT is a long way away from being able to patiently guide a user through any arbitrary problem they might have, but it’s already more than capable of handling the kinds of simple, repetitive queries that sap an agent’s energy.

Automatic Natural Language Translation

One of the surprising places where ChatGPT excels is in fast, accurate translation between multiple languages. Given the fact that English is so commonly used in the technical community, it can be easy to lose sight of the fact that billions of people have either no knowledge of it or, at best, a very rudimentary grasp.

But not many companies can afford to have all their documentation translated into dozens of different tongues, or to keep a team of translators on staff. ChatGPT is almost certainly not going to capture every little nuance in a translation, but it should be sufficient to help a person resolve their issue on their own or to ask more pointed, technical questions.

Dangers in Using ChatGPT

Whether you end up letting your agents or your customers get ahold of ChatGPT first, you should know that it’s not a panacea, nor is it perfect. It can and will fail, and some of those failures are reasonably predictable ones you should be prepared for.

The most obvious and well-known failure is referred to as a “hallucination”, and it results from the way that LLMs like ChatGPT are trained. An LLM learns how to output sequences of tokens; it’s not doing any fact-checking on its own. That means it will cheerfully and confidently make up names, book titles, and URLs.

It’s also possible for ChatGPT to become obnoxious and insulting. The team at OpenAI has done a good job of tuning this behavior out, but recall that these systems are very sensitive to the way prompts are structured, and it can reemerge.

There’s no general solution to these issues as far as we know. You can assiduously construct a fine-tuning pipeline for LLMs that does even more to get rid of toxicity, but ultimately you’re going to have to monitor ChatGPT’s output to see if it’s straying or otherwise being unhelpful.

Quiq specializes in defining guardrails for enterprise businesses that want to harness ChatGPT’s benefits while staying protective of their brand.

Figuring Out Where to Deploy ChatGPT

Whether it makes more sense to use ChatGPT internally or externally will depend a lot on your circumstances. There’s a lot ChatGPT can do to make your contact center agents more efficient, but if you just want to offload basic customer queries, it can certainly be useful for that purpose too.

In our considered opinion, the ROI is ultimately higher for using ChatGPT in a customer-facing way. This will allow your clients to help themselves, ultimately boosting their satisfaction and their estimation of your product.

But whichever way you choose to go, you can substantially reduce the headache associated with managing the infrastructure for this complex technology by making use of the Quiq conversational CX platform. With us, you can get world-leading results, satisfy your customers, lighten the load on your agents, and never have to worry about a rogue answer, compute cluster, or GPU.

Current Large Language Models and How They Compare

From ChatGPT and Bard to BLOOM and Claude, there is now a veritable ocean of different large language models (LLMs) for you to choose from. Some of them are specialized for specific use cases, some are open-source, and there’s a huge variance in the number of parameters they contain.

If you find yourself fascinated by this technology and interested in using it in your contact center, it can be hard to know how to choose the right tool for the job.

Today, we’re going to tackle this issue head-on. After a brief discussion of the history of LLMs, we’ll talk about specific criteria you can use to evaluate LLMs, sources of additional information, and some of the better-known options.

Let’s get going!

A Brief History of Generative AI

Though it may feel like LLMs and generative AI exploded onto the scene all of half a year ago, in fact, the basic research powering these advances goes back much further.

Way back in the 1940s, Walter Pitts and Warren McCulloch drew upon early research on the brain to design artificial neurons. Though these worked, they couldn’t be deployed for anything particularly useful until the popularization of the backpropagation algorithm in the mid-1980s. This allowed larger neural networks to be trained effectively, and in 1989 Yann LeCun built a convolutional system able to identify handwritten numbers.

In the years that followed, there were architectural discoveries like long short-term memory (LSTM) networks that made it possible for machine learning algorithms to learn far more complex relationships within data, laying the foundations for them to eventually be able to revolutionize work in places like contact centers.

What’s more, the opening decade of the 2000s marked the beginning of the big data era. For all their power, generative pre-trained models like ChatGPT are not terribly efficient learners. To be able to output language or images, they must be shown many, many examples from which to derive the statistical function that allows them to create surprising new output later.

Once researchers began the practice of publishing enormous datasets, a key obstacle to building large, useful systems was removed. When combined with the preceding six decades of foundational conceptual work, this was enough to allow us here in 2023 to witness the birth of generative AI and large language models.

How to Compare Large Language Models?

If you’re shopping around for a large language model for a particular application, it makes sense to first get clear on the evaluation criteria you should be using. That’s what we’ll cover in the sections below.

Evaluating LLMs Based on Industry

One of the more remarkable aspects of ChatGPT is that it’s so good at so many things. Out of the box (or sometimes with a little fine-tuning) it can perform very well at answering questions, summarizing text, translating between natural languages, and much more.

However, there may well be situations in which you’d want to use a domain-specific LLM, one that has been trained on medical or legal text, financial data, etc. The basic process of training a generative model is now being used to build neural networks for material design, protein synthesis, and music, among other things.

So, if you’re considering using a generative pre-trained model in your business, one thing you might want to think about early on is whether you want to try to find a domain-specific model, or a general model that you train on your own data.

If you do look for a domain-specific model, be aware that the space is very new and there might not be one available yet (though given how much attention is going into generative AI right now, there’s also a decent chance that one will be released in relatively short order).

Alternatively, you could try to fine-tune a pre-trained model. Getting into the nuances of fine-tuning, zero-shot learning, few-shot learning, and prompt engineering is beyond the scope of this article, but suffice it to say that there are many ways for you to get a generic LLM to be better at a smaller range of specific tasks.

If you’re an engineer designing circuits for quantum computers, this might not be sufficient, but for those of us working in customer experience and contact centers, a well-honed prompt or a half-dozen examples might be more than enough for substantial performance boosts.

Evaluating LLMs By Language

Given that English is a sort of lingua franca (should it be lingua anglica?) for the tech community and makes up nearly 60% of the websites on the internet, it’s no surprise that it also comprises the bulk of the training data going into modern LLMs.

ChatGPT and other systems are often pretty good at multi-lingual tasks by default, but they don’t perform equally well in all languages. As you can probably guess, they’re best at “high-resource” languages (English, Spanish, Chinese), somewhat worse at “medium-resource” languages (Portuguese, Hindi), and much worse at “low-resource” languages (Haitian Creole and Swahili).

If you’re serving customers with a medium- or low-resource language and need really high levels of accuracy, you’ll probably have to stick with human beings for a while. Otherwise, test ChatGPT or whatever system you end up going with for how well it can handle multi-lingual problems like question answering and translation.

Evaluating LLMs by Whether They’re Open-Source or Closed-Source

No doubt you’ve heard of “open-source” software, a term which refers to the practice of releasing source code to the public where it can be forked, modified, and scrutinized.

The open-source approach to software development has become incredibly popular, and this enthusiasm has partially bled over into artificial intelligence and machine learning. It is now fairly common to open-source datasets, models, and even training frameworks like TensorFlow.

How does this translate to the realm of large language models? In truth, it’s a bit of a mixture. Some models are proudly open-sourced, while others jealously guard their weights, training data, and source code.

This is one thing you might want to consider as you carry out your search for a large language model. Some of the very best models, like ChatGPT, are closed-source. You won’t be able to fork the ChatGPT codebase and modify it; you’ll be relegated to feeding queries into it via an API.

The advantage of going with a closed-source model, of course, is that you needn’t lie awake at night worrying about managing a codebase thousands of lines long, nor will you need to concern yourself with hiring the expensive engineers who know how to read and use it.

The downside, naturally, is that you’re entirely beholden to the team who builds and offers the LLM over their API. If they make updates or go bankrupt, you could be left scrambling last-minute to find an alternative solution.

There’s no one-size-fits-all approach here; if you have the in-house technical expertise to fork an open-source LLM and you want to modify it, open-source is probably the way to go. But be aware that this is a substantial commitment, and as things stand today, the very best generative pre-trained language models are closed-source, so there’s a performance penalty that you’ll have to account for.
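If you do go the closed-source route, the integration work mostly amounts to calling the vendor’s API. Here’s a minimal sketch of what that can look like, using OpenAI’s chat completions endpoint as the example; treat the model name and payload as illustrative, and check your provider’s documentation for the exact schema.

```python
import os
import requests

# Minimal sketch of querying a closed-source LLM over its vendor's API.
# This uses OpenAI's chat completions endpoint as an example; other hosted
# models follow a similar request/response pattern.
API_KEY = os.environ["OPENAI_API_KEY"]  # assumes your key is set in the environment

response = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "gpt-3.5-turbo",  # illustrative; use whatever model your vendor offers
        "messages": [
            {"role": "user", "content": "Summarize our return policy in two sentences."}
        ],
    },
    timeout=30,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```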

Leaderboards and Comparison Websites for Large Language Models

Another route for comparing current LLMs is to avail yourself of a service built for this purpose.

Whatever rumors you may have heard, programmers are human beings, and human beings have a fondness for ranking and categorizing pretty much everything – sports teams, guitar solos, classic video games, you name it.

Naturally, as LLMs have become better-known, leaderboards and websites have popped up comparing them along all sorts of different dimensions. Here are a few you can use as you search around for the best tooling.

Leaderboards

In the past couple of months, leaderboards have emerged which directly compare various LLMs.

One is AlpacaEval, which uses a custom dataset to compare ChatGPT, Claude, Cohere, and other LLMs on how well they’re able to follow instructions. AlpacaEval boasts high agreement with human evaluators, so in our estimation it’s probably a suitable way of initially screening for LLM tools, though more extensive checks might be required to settle on a final list.

Another good choice is Chatbot Arena, which pits two anonymous models side-by-side, has you rank which one is better, then aggregates all the scores into a leaderboard.

Finally, there is Hugging Face’s Open LLM Leaderboard, which is a similar endeavor. Anyone can submit a new model for evaluation, all of which are then assessed based on a small set of key benchmarks from the Eleuther AI Language Model Evaluation Harness. These capture how well the models do in answering simple science questions, common-sense queries, and more.

When combined with the criteria we discussed earlier, these leaderboards and comparison websites ought to give you everything you need to find a powerful generative pre-trained language model for your application.

What are the Currently-Available Large Language Models?

Okay! Now that we’ve worked through all this background material, let’s turn to discussing some of the major LLMs that are available today. We make no promises about these entries being comprehensive (and even if they were, there’d be new models out next week), but it should be sufficient to give you an idea as to the range of options you have.

ChatGPT and GPT

Obviously, the titan in the field is OpenAI’s ChatGPT, which is really just a version of GPT that has been fine-tuned through reinforcement learning from human feedback to be especially good at sustained dialogue.

ChatGPT and GPT have been used in many domains, including customer service and question answering.

LLaMA

In February of 2023, Meta’s AI team released its Large Language Model Meta AI, or LLaMA. At 65 billion parameters it is not as big as GPT-3, and this is intentional, as its purpose is to aid researchers who may not have the budget or expertise required to provision a behemoth LLM.

LaMDA

Like GPT-4, Google’s LaMDA is based on the transformer architecture and is aimed squarely at dialogue. It is able to converse on a nearly infinite number of subjects, and from the beginning, the Google team has focused on having LaMDA produce interesting responses that are nevertheless absent of abuse and harmful language.

MT-NLG

The Megatron-Turing Natural Language Generation (MT-NLG) model from Nvidia and Microsoft sports a staggering half-trillion (530 billion) parameters, and excels at “…Completion prediction, Reading comprehension, Commonsense reasoning, Natural language inferences, Word sense disambiguation,” and more.

StableLM

StableLM is a lightweight, open-source language model built by Stability AI. It’s trained on a new dataset built on “The Pile”, which is itself made up of over 20 smaller, high-quality datasets that together amount to over 825 GB of natural language.

GPT4All

What would you get if you trained an LLM “…on a massive curated corpus of assistant interactions, which included word problems, multi-turn dialogue, code, poems, songs, and stories,” then released it under an Apache 2.0 license? The answer is GPT4All, an open-source model whose purpose is to encourage research into what these technologies can accomplish.

Alpaca

The Alpaca LLM project developed by Stanford is designed around following instructions. As things stand, Alpaca isn’t considered safe yet, so it is intended to be used by research teams exploring the frontiers of LLMs.

BLOOM

The BigScience Large Open-Science Open-Access Multilingual Language Model (BLOOM) was released in late 2022. The team that put it together consisted of more than a thousand researchers from all over the world, and unlike the other models on this list, it’s specifically meant to be interpretable.

GATO

DeepMind is one of the leading players advancing the frontiers of AI, and its GATO model is correspondingly remarkable. Like GPT-4, GATO is multimodal, meaning it can work with text, images, and games, and can even control a robot.

Pathways Language Model (PaLM)

Like LaMDA, PaLM is from Google, and is also enormous (540 billion parameters). It excels in many language-related tasks, and became famous for producing impressively cogent explanations of tricky jokes.

Claude

Anthropic’s Claude is billed as a “next-generation AI assistant.” It’s not known how big the model is, but it does come in two modes: the full Claude, and Claude Instant, which is faster but produces lower-quality responses.

FAQs

Now, let’s turn to some common sources of confusion where comparing current LLMs is concerned.

Overcoming the Limitations of Large Language Models

Large language models are remarkable tools, but they nevertheless suffer from some well-known limitations. They tend to hallucinate facts, for example, sometimes fail at basic arithmetic, and can get lost in the course of lengthy conversations.

Overcoming the limitations of large language models is mostly a matter of fine-tuning and monitoring them. The fine-tuning data you use must be carefully curated in order to cover basic failure modes, and you must have a robust means of checking on their output in case they go off the rails somewhere along the line.

What are the Best Large Language Models?

Having read all of the foregoing content, it’s natural to wonder if there’s a single model that best suits your enterprise. The answer is probably “yes”, but which model is ultimately the best fit for you depends a lot on the specifics. You’ll have to think about whether you want an open-source model or you’re content with hitting an API, whether your use case is outside the scope of ChatGPT and better handled with a bespoke model, etc.

Choosing Among the Current Large Language Models

With all the different LLMs on offer, it’s hard to narrow the search down to the one that’s best for you. By carefully weighing the different metrics we’ve discussed in this article, you can choose an LLM that meets your needs with as little hassle as possible.

Another way to minimize your headaches is to use an industry-leading solution that works out of the box to deliver world-class functionality. That’s exactly what we’re achieving here at Quiq. Schedule a demo to see how our conversational AI platform can help you build a forward-facing contact center.

Contact Center Managers: What Do LLMs Mean For You?

Whether it’s quantum computing, the blockchain, or generative AI, whenever a promising new technology emerges, forward-thinking people begin looking for a way to use it.

And this is a completely healthy response. It’s through innovation that the world moves forward, but great ideas don’t mean much if there aren’t people like contact center managers who use them to take their operations to the next level.

Today, we’re going to talk about what large language models (LLMs) like ChatGPT mean for contact centers. After briefly reviewing how LLMs work we’ll discuss the way they’re being used in contact centers, how those centers are changing as a result, and some things that contact center managers need to look out for when utilizing generative AI.

What are Large Language Models?

As their name suggests, LLMs are large, they’re focused on language, and they’re machine-learning models.

It’s our view that the best way to tackle these three properties is in reverse order, so we’ll start with the fact that LLMs are enormous neural networks trained via self-supervised learning. These neural networks effectively learn a staggeringly complex function that captures the statistical properties of human language well enough for them to generate their own.

Speaking of human language, LLMs like ChatGPT are pre-trained generative models focused on learning from and creating text. This distinguishes them from other kinds of generative AI, which might be focused on images, videos, speech, music, and proteins (yes, really.)

Finally, LLMs are really big. As with other terms like “big data”, no one has a hard-and-fast rule for figuring out when you’ve gone from “language model” to “large language model” – but with billions of internal parameters, it’s safe to say that an LLM is a few orders of magnitude bigger than anything you’re likely to build outside of a world-class engineering team.

How can Large Language Models be Used in Contact Centers?

Since they’re so good at parsing and creating natural language, LLMs are an obvious choice for enterprises where there’s a lot of back-and-forth text exchanged, perhaps while, say, resolving issues or answering questions.

And for this reason, LLMs are already being used by contact center managers to make their agents more productive (more on this shortly).

To be more concrete, we turned up a few specific places where LLMs can be leveraged by contact center managers most effectively.

Answering questions: Even with world-class documentation, there will inevitably be customers who are having an issue they want help with. Though ChatGPT won’t be able to answer every such question, it can handle a lot of them, especially if you’ve fine-tuned it on your documentation.

Streamlining onboarding: For more or less the same reason, ChatGPT can help you onboard new hires. Employees learning the ropes will also be confused about parts of your technology and your process, and ChatGPT can help them find what they need more quickly.

Summarizing emails and articles: It might be possible for a team of five to be intimately familiar with what everyone else is doing, but any more than this and there will inevitably be things happening that are beyond their purview. By summarizing articles, tickets, email or Slack threads, etc., ChatGPT can help everyone stay reasonably up-to-date without having to devote hours every day to reading.

Issue prioritization: Not every customer question or complaint is equally important, and issues have to be prioritized before being handed off to contact center agents. ChatGPT can aid in this process, especially if it’s part of a broader machine-learning pipeline built for this kind of classification.

Translation: If you’re lucky enough to have a global audience, there will almost certainly be users who don’t have a firm grasp of English. Though there are tools like Google Translate that do a great job of handling translation tasks, ChatGPT often does an even better job.

What are Large Language Models for Customer Service?

Large language models are ideally suited for tasks that involve a great deal of working with text. Because contact center agents spend so much time answering questions and resolving customer issues, LLMs are a technology that can make them far more productive. ChatGPT excels at tasks like question answering, summarization, and language translation, which is why these models are already changing the way contact centers function.

How is Generative AI Changing Contact Centers?

The fear that advances in AI will lead to a decrease in employment among human workers has a long and storied pedigree. Still, thus far the march of technological progress has tended to increase the number (and remuneration) of available jobs on the market.

Far from rendering human analysts obsolete, personal computers are now a major and growing source of new work (though, we confess, much less of it is happening on typewriters than before.)

Nevertheless, once people got a look at what ChatGPT can do there arose a fresh surge of worry over whether, this time, the robots were finally going to take all of our jobs.

Wanting to know how generative pre-trained language models have actually impacted the functioning of contact centers, Erik Brynjolfsson, Danielle Li, and Lindsey R. Raymond looked at data from some 5,000 customer support agents using these models in their day-to-day work.

Their paper, “Generative AI at Work”, found that generative AI had led to a marked increase in productivity, especially among the newest, least-knowledgeable, and lowest-performing workers.

The authors advanced the remarkable hypothesis that this might stem from the fact that LLMs are good at internalizing and disseminating the hard-won tacit knowledge of the best workers. The top performers didn’t get much out of generative AI, in other words, precisely because they already had what they needed to perform well; but some fraction of their skill – such as how to phrase responses delicately to avoid offending irate customers – was incorporated into the LLM, where it was more accessible to less-skilled workers than it had been when locked away in the brains of their high-performing colleagues.

What’s more, the organizations studied also changed as a result. Employees (especially lower-skilled ones) were generally more satisfied, less prone to burnout, and less likely to leave. Turnover was reduced, and customers escalated calls to supervisors less frequently.

Now, we hasten to add that of course this is just one study, and we’re in the early days of the generative AI revolution. No one can say with certainty what the long-term impact will be. Still, these are extremely promising early results, and lend credence to the view that generative AI will do a lot to improve the way contact centers onboard new hires, resolve customer issues, and function overall.

What are the Dangers of Using ChatGPT for Customer Service?

We’ve been singing the praises of ChatGPT and talking about all the ways in which it’s helping contact center managers run a tighter ship.

But, as with every technological advance stretching clear back to the discovery of fire, there are downsides. To help you better use generative AI, we’ll spend the next few sections talking about some characteristic failure modes you should be looking out for.

Hallucinations

By now, it’s fairly common knowledge that ChatGPT will just make things up. This is a consequence of the way LLMs like ChatGPT are trained. Remember, the model doesn’t contain a little person inside of it that’s checking statements for accuracy; it’s just taking the tokens it has seen so far and predicting the tokens that will come next.

That means if you ask it for a list of book recommendations to study lepidoptery or the naval battles of the Civil War (we don’t know what you’re into), there’s a pretty good chance that the list it provides will contain a mix of real and fake books.

ChatGPT has been known to invent facts, people, papers (complete with citations), URLs, and plenty else.

If you’re going to have customers interacting with it, or you’re going to have your contact center agents relying on it in a substantial way, this is something you’ll need to be aware of.

Degraded Performance

ChatGPT is remarkably performant, but it’s still just a machine learning model, and machine learning models are known to suffer from model degradation.

This term refers to gradual or precipitous declines in model performance over time. There are technical reasons why this occurs, but from your perspective, you need to understand that the work has only begun once a model has been trained and put into production.

But you’re also not out of the woods if you’re accessing ChatGPT via an API, because you have just as little visibility into what’s happening on OpenAI’s engineering teams as the rest of us do.

If OpenAI releases an update, you might suddenly find that ChatGPT fails in unusual ways or trips over tasks it was handling very well last week. You’ll need to have robust monitoring in place so that you catch these issues if they arise, as well as an engineering team able to address the root cause.

Model degradation often stems from issues with the underlying data. This means that if you’ve trained ChatGPT to answer questions, for example, you might have to assemble new data for it to train on – a process that takes time and money and should be budgeted for.

Harassment and Bias

You could argue that harassment, bias, and harmful language are a kind of degraded performance, but they’re distinct and damaging enough to warrant their own section.

When Microsoft first released Sydney, it was cartoonishly unhinged. It would lie, threaten, and manipulate users; in one case, it confessed both its love for a New York Times reporter and its desire to engineer dangerous viruses and ignite internecine arguments between people.

All this has gotten much better, of course, but the same behavior can manifest in subtler ways, especially if someone is deliberately trying to jailbreak a large language model.

Thanks to extensive public testing and iteration, the current versions of the technology are very good at remaining polite, avoiding stereotyping, etc. Nevertheless, we’re not aware of any way to positively assure that no bias, deceit, or nastiness will emerge from ChatGPT.

This is another place where you’ll have to carefully monitor your model’s output and make corrections as necessary.

Using LLMs in your Contact Center

If you’re running a contact center, you owe it to yourself to at least check out ChatGPT. Whether it makes sense for you will depend on your unique circumstances, but it’s a remarkable new technology that could help you make your agents more effective while reducing turnover.

Quiq offers a white-glove platform that makes it easy to leverage conversational AI. Schedule a demo with us to see how we can help you incorporate generative AI into your contact center today!

Ways to Use ChatGPT for Customer Service

Now that we’ve all seen what ChatGPT can do, it’s natural to begin casting about for ways to put it to work. An obvious place where a generative AI language model can be used is in contact centers, which involve a great deal of text-based tasks like answering customer questions and resolving their issues.

But is ChatGPT ready for the on-the-ground realities of contact centers? What if it responds inappropriately, abuses a customer, or provides inaccurate information?

We at Quiq pride ourselves on being experts in the domain of customer experience and customer service, and we’ve been watching the recent developments in the realm of generative AI for some time. This piece presents our conclusions about what ChatGPT is, the ways in which ChatGPT can be used for customer service, and the techniques that exist to optimize it for this domain.

What is ChatGPT?

ChatGPT is an application built on top of GPT-4, a large language model. Large language models like GPT-4 are trained on huge amounts of textual data, and they gradually learn the statistical patterns present in that data well enough to output their own, new text.

How does this training work? Well, when you hear a sentence like “I’m going to the store to pick up some _____”, you know that the final word is something like “milk”, “bread”, or “groceries”, and probably not “sawdust” or “livestock”. This is because you’ve been using English for a long time, you’re familiar with what happens at a grocery store, and you have a sense of how a person is likely to describe their exciting adventures there (nothing gets our motor running like picking out the really good avocados).

GPT-4, of course, has none of this context, but if you show it enough examples it can learn to imitate natural language quite well. It will see the first few sentences of a paragraph and try to predict what the final sentence is. At first, its answers are terrible, but with each training run its internal parameters are updated, and it gradually gets better. If you do this for long enough you get something that can write its own emails, blog posts, research reports, book summaries, poems, and codebases.
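To make the “predict the next word” idea concrete, here’s a toy sketch that learns which word tends to follow which in a tiny corpus. A real LLM replaces these simple counts with an enormous neural network over tokens, but the training signal – seen examples driving better predictions – is analogous.

```python
from collections import Counter, defaultdict

# Toy next-word predictor: count which word follows which in a tiny corpus,
# then predict the most frequent successor. Nothing like a real LLM, but it
# illustrates learning statistical patterns from examples.
corpus = [
    "i am going to the store to pick up some milk",
    "i am going to the store to pick up some bread",
    "i am going to the gym to lift some weights",
]

successors = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current_word, next_word in zip(words, words[1:]):
        successors[current_word][next_word] += 1

def predict_next(word):
    """Return the word most often seen after `word` in the corpus."""
    if word not in successors:
        return None
    return successors[word].most_common(1)[0][0]

print(predict_next("the"))  # -> "store" ("store" follows "the" twice, "gym" once)
```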

Is ChatGPT the Same Thing as GPT-4?

So then, how is ChatGPT different from GPT-4? GPT-4 is the large language model trained in the manner just described, and ChatGPT is a version fine-tuned using reinforcement learning from human feedback to be good at conversations.

Fine-tuning refers to a process of taking a pre-trained language model and doing a little extra work to narrow its focus to doing a particular task. A generic LLM can do many things, including write limericks; but if you want it to consistently write high-quality limericks, you’ll need to fine-tune it by showing it a few dozen or a few hundred examples of them.

From that point on it will be specialized for limerick production, and might consequently be less useful for other tasks.

This is how ChatGPT was created. After GPT-3.5 or GPT-4 was finished training, engineers did additional fine-tuning work that led to a model that was especially good at having open-ended interactions with users.

What does ChatGPT mean for Customer Service?

Given that ChatGPT is useful for customer interactions, how might it be deployed in customer service? We believe that a good list of initial use cases includes question answering, personalizing responses to different customers, summarizing important information, translating between languages, and performing sentiment analysis.

This is certainly not everything current and future versions of ChatGPT will be able to do for customer service, but we think it’s a good place to start.

Question Answering

Question answering has long been of such interest to machine learning engineers that there’s a whole bespoke dataset specifically for it (the Stanford Question Answering Dataset, or SQuAD).

It’s not hard to see why. Humans can obviously answer questions, but there are so many possible questions that there’s just no way to get to them all. What if you’d like high-level summaries of all the major research papers published about an obscure scientific sub-discipline? What if you’d like to see how the tone of Victorian-era English novels changed over time? There are only so many person-hours that can go toward digging into queries like this.

Customers, too, have many questions, and answering them takes a lot of time. You could collect all the frequently asked questions and put them into a single document for easy reference, but there are still going to be areas of confusion and requests for clarification (and that’s not even considering the fraction of users who never make it to your FAQ page in the first place).

Automating the process of answering questions is an obvious place to utilize technology like ChatGPT. It’ll never get frustrated answering the same thing thousands of times, it’ll never lose its patience, it’ll never sleep, and it’ll never take a bathroom break.

Vanilla ChatGPT is pretty good at doing this already, and there are already many projects focused on getting it to answer questions about a particular company’s documentation.

This functionality will enable you to field an effectively unlimited number of customer questions while freeing up your contact center agents to tackle more important issues.
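As a rough sketch of how this grounding can work, one common pattern is to retrieve the most relevant snippet of your documentation and stuff it into the prompt alongside the customer’s question. The `search_docs` helper below is hypothetical; in practice it might be a keyword index or a vector database.

```python
# Sketch of documentation-grounded question answering: retrieve a relevant
# passage, then instruct the model to answer only from that passage.

def search_docs(question):
    # Hypothetical placeholder: a real version would query your docs index.
    return "To reset your password, click 'Forgot password' on the login page."

def build_qa_prompt(question):
    context = search_docs(question)
    return (
        "Answer the customer's question using only the documentation below. "
        "If the documentation doesn't cover it, say you don't know.\n\n"
        f"Documentation:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

print(build_qa_prompt("How do I reset my password?"))
```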

Onboarding New Hires

Customers are not the only people who might have questions about your product – new hires unfamiliar with your process for doing things might also have their fair share of confusion.

Even in companies that are very conscious about documentation, there can often be so much to get through that new employees – who already have a lot going on – can feel overwhelmed.

A large language model trained to answer questions about your documentation will be a godsend to the fresh troops you’ve brought in.

Summarization

A related task is summarizing email threads, important technical documents, or even videos.

Just as you can’t realistically expect every customer to assiduously look through all your company’s documents, it’s usually not realistic to expect that all of your own employees will do so either.

Here, too, is a place where ChatGPT can be useful. It’s quite good at taking a lengthy bit of text and summarizing it, so there’s no reason it can’t be used to keep your teams up to speed on what’s happening in parts of the organization that they don’t interact with all that often.

If your engineers don’t want to go over an exchange between product designers, or your marketing team doesn’t want all the details of a conversation between the data scientists, ChatGPT can be used to create summaries of these interactions for easier reading.

This way everyone knows what’s going on throughout the company without needing to spend hours every day staying abreast of evolving issues.

At Quiq, we’ve developed proprietary ways to harness ChatGPT’s generative abilities to summarize conversations for your contact center agents.

Sentiment Analysis

Finally, another way in which ChatGPT will power the contact center of the future is with sentiment analysis. Sentiment analysis refers to a branch of machine learning aimed at parsing the overall tone of a piece of text. This can be more subtle than you might think.

“I hate this restaurant” is pretty unambiguous, but what about a review like “Yeah, we loved this restaurant, we had plenty of time to chat because the food took an hour to come out, and since my enchilada was frozen it counteracted my usual inability to eat spicy food”? You and I can hear the implied eye-rolling in this text, but a machine won’t necessarily be able to unless it’s very powerful.

This matters for contact centers because you need to understand how people are talking about your product, whether that’s in online reviews, internal tickets, or during conversations with your agents.

And ChatGPT can help. It’s not only quite good at sentiment analysis, but it’s also better than quite a lot of alternative machine-learning approaches to sentiment analysis, even without fine-tuning.

(Note, however, that these tests compare it to relatively simple machine learning models, not to the very best deep-learning sentiment analyzers.)
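To give a feel for the prompt-based approach, here’s a minimal sketch; `call_llm` is a hypothetical stand-in for whichever LLM API you end up using.

```python
# Sketch of LLM-based sentiment analysis. `call_llm` is a hypothetical
# stand-in for your LLM provider's API (see the earlier API example).

def call_llm(prompt):
    raise NotImplementedError("Wire this up to your LLM provider.")

def classify_sentiment(review):
    prompt = (
        "Classify the sentiment of the following customer review as "
        "'positive', 'negative', or 'mixed'. Watch for sarcasm.\n\n"
        f"Review: {review}\nSentiment:"
    )
    return call_llm(prompt).strip().lower()

# Example usage:
# classify_sentiment("Loved the chat time -- the food took an hour to arrive!")
```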

Prioritizing Incoming Issues

One way that ChatGPT can add tremendous value to your contact center is in helping to prioritize issues as they come in. There are always lots of problems to solve, but they’re not all equally important. Finding the most pressing issues and marking them for resolution is a huge part of keeping your center running smoothly.

This is something that humans can do, but there’s only so much energy they can devote to this task. A properly trained generative language model, however, can handle a huge chunk of it, especially when it forms part of a broader suite of AI tools.

One way this could work is to use ChatGPT to pluck out essential keywords from a customer service ticket. This by itself might be enough to help your contact center agents figure out what they should focus on, but it can be made even better if these words are then fed to a classification algorithm trained to identify urgent problems, as in the sketch below.
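Here’s a rough sketch of what that two-stage pipeline might look like. The `call_llm` function is a hypothetical stand-in for the LLM call, and the labeled tickets are toy data; a real deployment would train the classifier on your own ticket history.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def call_llm(prompt):
    # Hypothetical placeholder: a real version would hit your LLM provider.
    return "refund, charged twice, angry"

def extract_keywords(ticket):
    return call_llm(f"List the key phrases in this support ticket:\n{ticket}")

# Toy labeled data: keyword strings and whether the ticket was urgent.
keyword_strings = [
    "refund charged twice angry",
    "password reset",
    "site outage down",
    "change email address",
]
is_urgent = [1, 0, 1, 0]

classifier = make_pipeline(TfidfVectorizer(), LogisticRegression())
classifier.fit(keyword_strings, is_urgent)

new_ticket = "I was billed two times and nobody is answering me!"
print(classifier.predict([extract_keywords(new_ticket)]))  # should flag as urgent
```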

Real-time Language Translation

Language translation, too, is a clear use case for LLMs, and the deep learning upon which they are based has seen much success in translating from one language to another.

This is especially useful if your product or service enjoys a global audience. Many people have a passing familiarity with English but will not necessarily be able to follow a detailed procedure involving technical vocabulary, and that will be a source of frustration for them.

By substantially or totally automating real-time language translation, ChatGPT can help customers who lack English fluency to better interact with your company’s offerings, answering their questions, resolving their issues, and in general moving them along.

And in case you’re wondering, ChatGPT is often competitive with, and sometimes better than, Google Translate or DeepL at translation tasks, including tricky ones involving jokes and humor.

Fine-Tuning ChatGPT for Customer Service

So far we’ve mostly talked about ChatGPT out of the box, but we’ve also made some references to “fine-tuning” it.

In this section, we’ll flesh out our earlier comments about fine-tuning ChatGPT, and distinguish fine-tuning from related techniques, like prompt engineering.

What is Fine-Tuning ChatGPT?

Once upon a time, it was anyone’s guess as to whether you’d be able to pre-train a single large model on a dataset and then tweak it for particular applications, or whether you’d need to train a special model for every individual task.

Beginning around 2011, it became increasingly clear that for many applications, pre-training was the way to go, and since then, many techniques have been developed for doing the subsequent fine-tuning.

When you fine-tune a pre-trained generative AI model, you are effectively altering its internal structure so that it does better on the task you’re interested in. Sometimes this involves changing the whole model; other times, you’re altering the last few output layers and leaving the rest of the model intact.

But what it ultimately boils down to is creating a fine-tuning pipeline through which your model sees a lot of examples of the behavior you’re trying to elicit. If you were fine-tuning it to be more polite in its follow-up questions, for example, you’d need to collect a bunch of examples of this politeness and have your model learn on them.

How many examples you end up needing will depend on your specific use case, but it’s usually a few dozen and could be as many as a few hundred.
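As a rough sketch, those examples are often assembled into a JSONL file of prompt/completion pairs. The exact schema varies by provider, so check their fine-tuning docs before uploading anything, but the idea looks something like this:

```python
import json

# Sketch of assembling fine-tuning data for the politeness example above.
# Prompt/completion pairs are one common format; your provider's schema may differ.
examples = [
    {
        "prompt": "Customer: The app deleted my draft!\nAgent:",
        "completion": " I'm so sorry about that. Could you tell me roughly when you last saw the draft, so I can help you recover it?",
    },
    {
        "prompt": "Customer: Why was I charged twice?\nAgent:",
        "completion": " That's frustrating, and I'd like to sort it out right away. Could you confirm the email on your account?",
    },
]

with open("politeness_finetune.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```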

How is Fine-Tuning Different From Prompt Engineering?

Prompt engineering refers to the practice of carefully sculpting the prompt you feed your model to do a better job of producing the output you want to see.

The reason this works is that GPT-4 and other LLMs are extremely sensitive to slight changes in the wording of their prompts. It takes a while to develop the feel required to reliably produce good results with an LLM, and all of this falls under the label of “prompt engineering”.

It’s possible to inject some light fine-tuning into prompt engineering, through one-shot and few-shot learning. One-shot learning means including one example of the behavior you want to see in your prompt, and few-shot learning is the same idea, but you’re including 2-5 examples for the LLM to learn from.
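Here’s what a few-shot prompt might look like in practice; the task and examples are invented for illustration. Note that the model infers the pattern from the examples alone – no weights are changed, unlike true fine-tuning.

```python
# A few-shot prompt: two worked examples of the behavior we want, followed by
# the new input for the model to complete in the same style.
few_shot_prompt = """Rewrite each customer message as a polite internal ticket summary.

Message: "Your app is garbage, it ate my order!!"
Summary: Customer reports a lost order and is understandably upset.

Message: "been waiting 2 weeks where is my refund???"
Summary: Customer has been waiting more than two weeks for a refund.

Message: "the login page just spins forever on my phone"
Summary:"""
```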

FAQs About ChatGPT for Customer Service

Now that we’ve finished our discussion of the basics of ChatGPT for customer service, we’ll spend some time addressing common questions about this subject.

Can I Use ChatGPT for Customer Service?

Yes! ChatGPT is ideal for customer service applications, but you’ll likely need to fine-tune it on your own company’s documentation to get it to strike the right tone. With the right guardrails, it’s a powerful tool for those looking to build a forward-looking contact center.

What are the Examples of ChatGPT in Customer Service?

ChatGPT can be used for customer service tasks like question answering, sentiment analysis, translating between natural languages, and summarizing documents. These are all time-intensive tasks, the automation of which will free up your contact center agents to focus on higher-priority work.

Can you Automate Customer Service?

Tools like Auto-GPT and SuperAGI are making it easier than ever to create and manage sophisticated agents capable of handling open-ended tasks. Still, artificial intelligence is not yet flexible enough to entirely automate customer service.

It can be used to automate substantial parts of customer service, like answering user questions, but for the moment the lion’s share of the work must still be done by flesh-and-blood human beings.

If you’re interested in developments in this space, be sure to follow the Quiq blog for updates.

ChatGPT and the Contact Center of the Future

ChatGPT and related technologies are already changing the way contact centers function. From automated translation to helping field dramatically more questions per hour, they are helping contact center agents be more productive and reducing organizational turnover.

The Quiq platform is an excellent tool for incorporating conversational AI into your offering, without having to hire a team or manage your own infrastructure. Quiq can help you automate text messaging, handle real-time translation, and track the performance of your AI Assistants to see where improvements need to be made.

Exploring Cutting-Edge Research in Large Language Models and Generative AI

By the calendar, ChatGPT was released just a few months ago. But subjectively, it feels as though 600 years have passed since we all read “as a large language model…” for the first time.

The pace of new innovations is staggering, but we at Quiq like to help our audience in the customer experience and contact center industries stay ahead of the curve (even when that requires faster-than-light travel).

Today, we will look at what’s new in generative AI, and what will be coming down the line in the months ahead.

Where will Generative AI be applied?

First, let’s start with industries that will be strongly impacted by generative AI. As we noted in an earlier article, training a large language model (LLM) like ChatGPT mostly boils down to showing it tons of examples of text until it learns a statistical representation of human language well enough to generate sonnets, email copy, and many other linguistic artifacts.

There’s no reason the same basic process (have it learn from many examples and then create its own output) couldn’t be used elsewhere, and in the next few sections, we’re going to look at how generative AI is being used in a variety of different industries to brainstorm structures, new materials, and a billion other things.

Generative AI in Building and Product Design

If you’ve had a chance to play around with DALL-E, Midjourney, or Stable Diffusion, you know that the results can be simply remarkable.

It’s not a far leap to imagine that it might be useful for quickly generating ideas for buildings and products.

The emerging field of AI-generated product design is doing exactly this. With generative image models, designers can use text prompts to rough out ideas and see them brought to life. This allows for faster iteration and quicker turnaround, especially given that creating a proof of concept is one of the slower, more tedious parts of product design.

(Image source: Board of Innovation)

For the same reason, these tools are finding use among architects who are able to quickly transpose between different periods and styles, see how better lighting impacts a room’s aesthetic, and plan around themes like building with eco-friendly materials.

There are two things worth pointing out about this process. First, there’s often a learning curve because it can take a while to figure out prompt engineering well enough to get a compelling image. Second, there’s a hearty dose of serendipity. Often the resulting image will not be quite what the designer had in mind, but it’ll be different in new and productive ways, pushing the artist along fresh trajectories that might never have occurred to them otherwise.

Generative AI in Discovering New Materials

To quote one of America’s most renowned philosophers (Madonna), we’re living in a material world. Humans have been augmenting their surroundings since we first started chipping flint axes back in the Stone Age; today, the field of materials science continues the long tradition of finding new stuff that expands our capabilities and makes our lives better.

This can take the form of something (relatively) simple like researching a better steel alloy, or something incredibly novel like designing a programmable nanomaterial.

There’s just one issue: it’s really, really difficult to do this. It takes a great deal of time, energy, and effort to even identify plausible new materials, to say nothing of the extensive testing and experimenting that must then follow.

Materials scientists have been using machine learning (ML) in their process for some time, but the recent boom in generative AI is driving renewed interest. There are now a number of projects aimed at using variational autoencoders, recurrent neural networks, and generative adversarial networks to learn a mapping between information about a material’s underlying structure and its final properties, then using this information to create plausible new materials.

It would be hard to overstate how important the use of generative AI in materials science could be. If you imagine the space of possible molecules as being like its own universe, we’ve explored basically none of it. What new fabrics, medicines, fuels, fertilizers, conductors, insulators, and chemicals are waiting out there? With generative AI, we’ve got a better chance than ever of finding out.

Generative AI in Gaming

Gaming is often an obvious place to use new technology, and that’s true for generative AI as well. The principles of generative design we discussed two sections ago could be used in this context to flesh out worlds, costumes, weapons, and more, but it can also be used to make character interactions more dynamic.

From Navi trying to get our attention in Ocarina of Time to GLaDOS’s continual reminders that “the cake is a lie” in Portal, non-playable characters (NPCs) have always added texture and context to our favorite games.

Powered by LLMs, these characters may soon be able to have open-ended conversations with players, adding more immersive realism to the gameplay. Rather than pulling from a limited set of responses, they’d be able to query LLMs to provide advice, answer questions, and shoot the breeze.

What’s Next in Generative AI?

As impressive as technologies like ChatGPT are, people are already looking for ways to extend their capabilities. Now that we’ve covered some of the major applications of generative AI, let’s look at some of the exciting applications people are building on top of it.

What is Auto-GPT and How Does it Work?

ChatGPT can already do things like generate API calls and build simple apps, but as long as a human has to actually copy and paste the code somewhere useful, its capacities are limited.

But what if that weren’t an issue? What if it were possible to spin ChatGPT up into something more like an agent, capable of semi-autonomously interacting with software or online services to complete strings of tasks?

This is exactly what Auto-GPT is intended to accomplish. Auto-GPT is an application built by developer Toran Bruce Richards, and it consists of two parts: an LLM (either GPT-3.5 or GPT-4), and a separate “bot” that works with the LLM.

By repeatedly querying the LLM, the bot is able to take a relatively high-level task like “help me set up an online business with a blog and a website” or “find me all the latest research on quantum computing”, decompose it into discrete, achievable steps, then iteratively execute them until the overall objective is achieved.
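Stripped to its skeleton, that loop might look something like the sketch below. Both `call_llm` and `run_step` are hypothetical stand-ins, and the real Auto-GPT layers memory, tool selection, and safety checks on top of this basic structure.

```python
# Skeleton of an Auto-GPT-style loop: ask the LLM for the next step, execute
# it, feed the result back, and stop when the LLM declares the goal met.

def call_llm(prompt):
    raise NotImplementedError("Wire this up to your LLM provider.")

def run_step(step):
    raise NotImplementedError("Execute the step: run code, call an API, etc.")

def run_agent(goal, max_steps=10):
    history = f"Goal: {goal}\n"
    for _ in range(max_steps):
        step = call_llm(
            history + "\nWhat single step should be taken next? Reply DONE if the goal is met."
        )
        if step.strip() == "DONE":
            return history
        result = run_step(step)
        history += f"\nStep: {step}\nResult: {result}\n"
    return history  # hit the step limit -- a guard against unproductive loops
```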

At present, Auto-GPT remains fairly primitive. Just as ChatGPT can get stuck in repetitive and unhelpful loops, so too can Auto-GPT. Still, it’s a remarkable advance, and it’s spawned a series of other projects attempting to do the same thing in a more consistent way.

The creators of AssistGPT bill it as a “General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn”. It handles multi-modal tasks (i.e. tasks that rely on vision or sound and not just text) better than Auto-GPT, and by integrating with a suite of tools it is able to achieve objectives that involve many intermediate steps and sub-tasks.

SuperAGI, in turn, is just as ambitious. It’s a platform that offers a way to quickly create, deploy, manage, and update autonomous agents. You can integrate them into applications like Slack or vector databases, and it’ll even ping you if an agent gets stuck somewhere and starts looping unproductively.

Finally, there’s LangChain, which is a similar idea. LangChain is a framework that is geared towards making it easier to build on top of LLMs. It features a set of primitives that can be stitched into more robust functionality (not unlike “for” and “while” loops in programming languages), and it’s even possible to build your own version of AutoGPT using LangChain.
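To be clear, the snippet below is not LangChain’s actual API; it’s a plain-Python illustration of the underlying idea, namely that small text-to-text stages can be stitched into larger pipelines.

```python
from functools import reduce

# Plain-Python illustration of the "composable primitives" idea: each stage is
# a function from text to text, and a chain is simply their composition.

def make_chain(*stages):
    return lambda text: reduce(lambda acc, stage: stage(acc), stages, text)

def translate_to_english(text):
    return text  # placeholder for an LLM translation call

def summarize(text):
    return text[:100]  # placeholder for an LLM summarization call

support_chain = make_chain(translate_to_english, summarize)
print(support_chain("Bonjour, mon application ne démarre plus depuis la mise à jour."))
```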

What is Chain-of-Thought Prompting and How Does it Work?

In the misty, forgotten past (i.e. 5 months ago), LLMs were famously bad at simple arithmetic. They might be able to construct elegant mathematical proofs, but if you asked them what 7 + 4 is, there was a decent chance they’d get it wrong.

Chain-of-thought (COT) prompting refers to a few-shot learning method of eliciting output from an LLM that compels it to reason in a step-by-step way, and it was developed in part to help with this issue. This image from the original Wei et al. (2022) paper illustrates how:

Input and output examples for standard and chain-of-thought prompting. (Source: arXiv.org)

As you can see, the model’s performance is improved because it’s being shown a chain of different thoughts, hence chain-of-thought.

This technique isn’t just useful for arithmetic; it can be utilized to get better output from a model in a variety of different tasks, including commonsense and symbolic reasoning.

In a way, humans can be prompt engineered in the same fashion. You can often get better answers out of yourself or others through a deliberate attempt to reason slowly, step-by-step, so it’s not a terrible shock that a large model trained on human text would benefit from the same procedure.
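For reference, here’s the canonical arithmetic example from the Wei et al. paper, rendered as a prompt string; the worked example spells out its reasoning so the model does the same on the new problem.

```python
# A chain-of-thought prompt: the worked example includes its reasoning steps,
# which nudges the model to reason step-by-step rather than guess the answer.
cot_prompt = """Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls.
Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls.
5 + 6 = 11. The answer is 11.

Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more,
how many apples do they have?
A:"""
```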

The Ecosystem Around Generative AI

Though cutting-edge models are usually the stars of the show, the truth is advanced technologies aren’t worth much if you have to be deep in the weeds to use them. Machine learning, for example, would surely be much less prevalent if tools like sklearn, TensorFlow, and Keras didn’t exist.

Though we’re still in the early days of LLMs, AutoGPT, and everything else we’ve discussed, we suspect the same basic dynamic will play out. Since it’s now clear that these models aren’t toys, people will begin building infrastructure around them that streamlines the process of training them for specific use cases, integrating them into existing applications, etc.

Let’s discuss a few efforts in this direction that are already underway.

Training and Education

Among the simplest parts of the emerging generative AI value chain is exactly what we’re doing now: talking about it in an informed way. Non-specialists will often lack the time, context, and patience required to sort the real breakthroughs from the hype, so putting together blog posts, tutorials, and reports that make this easier is a real service.

Making Foundation Models Available

“Foundation models” is a new term that refers to the large, general-purpose base models that underlie applications like ChatGPT. ChatGPT itself, for example, is not a foundation model. GPT-4 is the foundation model, and ChatGPT is a specialized application of it (more on this shortly).

Companies like Anthropic, Google, and OpenAI can train these gargantuan models and then make them available through an API. From there, developers are able to access their preferred foundation model without having to build or host it themselves.

This means that we can move quickly to utilize their remarkable functionality, which wouldn’t be the case if every company had to train their own from scratch.

Building Applications Around Specific Use Cases

One of the most striking properties of models like ChatGPT is how amazingly general they are. They are capable of “…generating functioning web apps with just a few prompts, writing Spanish-language children’s stories about the blockchain in the style of Dr. Suess, [and] opining on the virtues and vices of major political figures”, to name but a few examples.

General-purpose models often have to be fine-tuned to perform better on a specific task, especially if they’re doing something tricky like summarizing medical documents with lots of obscure vocabulary. Alas, there is a tradeoff here, because in most cases these fine-tuned models will afterward not be as useful for generic tasks.

The issue, however, is that you need a fair bit of technical skill to set up a fine-tuning pipeline, and you need a fair bit of elbow grease to assemble the few hundred examples a model needs in order to be fine-tuned. Though this is much simpler than training a model in the first place it is still far from trivial, and we expect that there will soon be services aimed at making it much more straightforward.

LLMOps and Model Hubs

We’d venture to guess you’ve heard of machine learning, but you might not be familiar with the term “MLOps”. “Ops” means “operations”, and it refers to all the things you have to do to use a machine learning model besides just training it. Once a model has been trained it has to be monitored, for example, because sometimes its performance will begin to inexplicably degrade.

The same will be true of LLMs. You’ll need to make sure that the chatbot you’ve deployed hasn’t begun abusing customers and damaging your brand, or that the deep learning tool you’re using to explore new materials hasn’t begun to spit out gibberish.

Another phenomenon from machine learning we think will be echoed in LLMs is the existence of “model hubs”, which are places where you can find pre-trained or fine-tuned models to use. There certainly are carefully guarded secrets among technologists, but on the whole, we’re a community that believes in sharing. The same ethos that powers the open-source movement will be found among the teams building LLMs, and indeed there are already open-sourced alternatives to ChatGPT that are highly performant.

Looking Ahead

As they’re so fond of saying on Twitter, “ChatGPT is just the tip of the iceberg.” It’s already begun transforming contact centers, boosting productivity among lower-skilled workers while reducing employee turnover, but research into even better tools is screaming ahead.

Frankly, it can be enough to make your head spin. If LLMs and generative AI are things you want to incorporate into your own product offering, you can skip the heady technical stuff and go straight to letting Quiq do it for you. The Quiq conversational AI platform is a best-in-class product suite that makes it much easier to utilize these technologies. Schedule a demo to see how we can help you get in on the AI revolution.

How to Evaluate Generated Text and Model Performance

Machine learning is an incredibly powerful technology. That’s why it’s being used in everything from autonomous vehicles to medical diagnoses to the sophisticated, dynamic AI Assistants that are handling customer interactions in modern contact centers.

But for all this, it isn’t magic. The engineers who build these systems must know a great deal about how to evaluate them. How do you know when a model is performing as expected, or when it has begun to overfit the data? How can you tell when one model is better than another?

This subject will be our focus today. We’ll cover the basics of evaluating a machine learning model with metrics like mean squared error and accuracy, then turn our attention to the more specialized task of evaluating the generated text of a large language model like ChatGPT.

How to Measure the Performance of a Machine Learning Model?

A machine learning model is always aimed at some task. It might be trying to fit a regression line that helps predict the future price of Bitcoin, it might be clustering documents according to their topics, or it might be trying to generate text so good it rivals that produced by humans.

How does the model know when it’s gotten the optimal line or discovered the best way to cluster documents? (And more importantly, how do you know?)

In the next few sections, we’ll talk about a few common ways of evaluating the performance of a machine-learning model. If you’re an engineer this will help you create better models yourself, and if you’re a layperson, it’ll help you better understand how the machine-learning pipeline works.

Evaluation Metrics for Regression Models

Regression is one of the two big types of basic machine learning, with the other being classification.

In tech-speak, we say that the purpose of a regression model is to learn a function that maps a set of input features to a real value (where “real” just means “real numbers”). This is not as scary as it sounds; you might try to create a regression model that predicts the number of sales you can expect given that you’ve spent a certain amount on advertising, or you might try to predict how long a person will live on the basis of their daily exercise, water intake, and diet.

In each case, you’ve got a set of input features (advertising spend or daily habits), and you’re trying to predict a target variable (sales, life expectancy).

The relationship between the two is captured by a model, and a model’s quality is evaluated with a metric. Popular metrics for regression models include the mean squared error, the root mean squared error, and the mean absolute error (though there are plenty of others if you feel like going down a nerdy rabbit hole).

The mean squared error (MSE) quantifies how good a regression model is by calculating the differences between the model’s predictions and the real data points, squaring them (so that positive and negative differences don’t cancel out), and then averaging them. This gives a single number that the training algorithm can use to adjust its model – if the MSE is going down, the model is getting better; if it’s going up, it’s getting worse.

The root mean squared error (RMSE) does the exact same thing, but the final step is that you take the square root of the MSE. The big advantage here is that it converts the units of your metric back into the units you’re using in your problem (i.e. the “squared dollars” of MSE become “dollars” again, which makes it easier to think about what’s going on).

The mean absolute error (MAE) is the same basic idea, but it uses absolute values instead of squares. If you’ve got some outlier data point that’s far away from your model, squaring the difference produces a much bigger error than simply taking the absolute value of that difference, so the MAE doesn’t penalize outliers nearly as much as the MSE or RMSE do.
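
To make these definitions concrete, here’s a minimal sketch that computes all three metrics with NumPy (the targets and predictions are invented for illustration):

```python
import numpy as np

# Hypothetical targets and model predictions, in dollars.
y_true = np.array([100.0, 150.0, 200.0, 130.0])
y_pred = np.array([110.0, 145.0, 180.0, 128.0])

errors = y_pred - y_true
mse = np.mean(errors ** 2)       # in "squared dollars"
rmse = np.sqrt(mse)              # back in plain dollars
mae = np.mean(np.abs(errors))    # dollars, gentler on outliers

print(f"MSE: {mse:.2f}, RMSE: {rmse:.2f}, MAE: {mae:.2f}")
```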

Evaluation Metrics for Classification Models

People tend to struggle less with understanding classification models because it’s more intuitive: you’re building something that can take a data point (the price of an item) and sort it into one of a number of different categories (e.g. “cheap”, “somewhat expensive”, “expensive”, “very expensive”).

Of course, the categories you choose will depend on the problem you’re trying to solve and the domain you’re operating in – a $100 apple is certainly “very expensive”, but a $100 wedding ring…will probably get you left at the altar.

Regardless, it’s just as essential to evaluate the performance of a classification model as it is to evaluate the performance of a regression model. Some common evaluation metrics for classification models are accuracy, precision, and recall.

Accuracy is simple, and it’s exactly what it sounds like. You find the accuracy of a classification model by dividing the number of correct predictions it made by the total number of predictions it made altogether. If your classification model made 1,000 predictions and got 941 of them right, that’s an accuracy rate of 94.1% (not bad!)

Both precision and recall are subtler variants of this same idea. The precision is the number of true positives (correct classifications) divided by the sum of true positives and false positives (incorrect positive classifications). It says, in effect, “When your model thought it had identified a needle in a haystack, this is how often it was correct.”

The recall is the number of true positives divided by the sum of true positives and false negatives (incorrect negative classifications). It says, in effect, “There were 200 needles in this haystack, and your model found 72% of them.”

Accuracy tells you how well your model performed overall, precision tells you how confident you can be in its positive classifications, and recall tells you how many of the actual positives it managed to find.

(You may be wondering if this isn’t overkill. Do we really need all these different ratios? Answering that question fully would take us too far from our purpose of measuring the quality of text from generative AI models, but suffice it to say that there are trade-offs involved. Sometimes it makes more sense to focus on boosting the precision, other times getting a higher recall is more important. These are all just different tools for figuring out how to spend your limited time and energy to get a model that best solves your problem.)
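
Here’s a minimal sketch showing how all three ratios fall out of the raw counts (the confusion-matrix numbers are invented, chosen to line up with the needle-in-a-haystack figures above):

```python
# Hypothetical confusion-matrix counts for a binary "needle" detector:
# 1,000 predictions in total, 200 real needles in the haystack.
tp, fp, fn, tn = 144, 36, 56, 764

accuracy = (tp + tn) / (tp + fp + fn + tn)   # overall hit rate
precision = tp / (tp + fp)                   # how often a positive call was right
recall = tp / (tp + fn)                      # share of real needles found

print(f"accuracy={accuracy:.3f}, precision={precision:.3f}, recall={recall:.3f}")
# -> accuracy=0.908, precision=0.800, recall=0.720
```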

How Can I Assess the Performance of a Generative AI Model?

Now, we arrive at the center of this article. Everything up to now has been background context that hopefully has given you a feel for how models are evaluated, because from here on out it’s a bit more abstract.

Using Reference Text for Evaluating Generative Models

When we wanted to evaluate a regression model, we started by looking at how far its predictions were from actual data points.

Well, we do essentially the same thing with generative language models. To assess the quality of text generated by a model, we’ll compare it against high-quality text that’s been selected by domain experts.

The Bilingual Evaluation Understudy (BLEU) Score

The BLEU score is a way of actually quantifying the similarity between the generated text and the reference text. It does this by comparing the amount of n-gram [1] overlap between the two using a series of weighted precision scores.

The BLEU score varies from 0 to 1. A score of “0” indicates that there is no n-gram overlap between the generated and reference text, and the model’s output is considered to be of low quality. A score of “1”, conversely, indicates that there is total overlap between the generated and reference text, and the model’s output is considered to be of high quality.

Comparing BLEU scores across different sets of reference texts or different natural languages is so tricky that it’s considered best to avoid it altogether.

Also, be aware that the BLEU score contains a “brevity penalty” which discourages the model from being too concise. If the model’s output is too much shorter than the reference text, this counts as a strike against it.
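
If you’d like to experiment with BLEU yourself, NLTK ships an implementation; here’s a minimal sketch with invented sentences:

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["the", "cat", "sat", "on", "the", "mat"]]  # list of reference token lists
candidate = ["the", "cat", "is", "on", "the", "mat"]

# Smoothing keeps the score from collapsing to 0 when some higher-order
# n-gram has no overlap at all.
score = sentence_bleu(reference, candidate,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.3f}")
```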

The Recall-Oriented Understudy for Gisting Evaluation (ROUGE) Score

Like the BLEU score, the ROUGE score examines the n-gram overlap between an output text and a reference text. Unlike the BLEU score, however, it uses recall instead of precision.

There are three types of ROUGE scores:

  1. ROUGE-N: ROUGE-N is the most common type of ROUGE score, and it simply looks at n-gram overlap, as described above.
  2. ROUGE-L: ROUGE-L looks at the “Longest Common Subsequence” (LCS), or the longest chain of tokens that the reference and output text share. The longer the LCS, of course, the more the two have in common.
  3. ROUGE-S: This is the least commonly used variant of the ROUGE score, but it’s worth hearing about. ROUGE-S concentrates on the “skip-grams” [2] that the two texts have in common. ROUGE-S would count “He bought the house” and “He bought the blue house” as overlapping because they contain the same words in the same order, even though the second sentence has an additional adjective.
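
If you want to compute ROUGE scores yourself, one convenient option is the third-party rouge-score Python package (one implementation among several); here’s a minimal sketch:

```python
from rouge_score import rouge_scorer  # pip install rouge-score

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
scores = scorer.score("He bought the blue house",   # reference text
                      "He bought the house")        # model output

for name, result in scores.items():
    print(name, f"recall={result.recall:.2f}", f"f1={result.fmeasure:.2f}")
```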

The Metric for Evaluation of Translation with Explicit Ordering (METEOR) Score

The METEOR score takes the harmonic mean of the precision and recall scores for 1-gram overlap between the output and reference text. It puts more weight on recall than on precision, and it’s intended to address some of the deficiencies of the BLEU and ROUGE scores while maintaining a pretty close match to how expert humans assess the quality of model-generated output.

BERT Score

At this point, it may have occurred to you to wonder whether the BLEU and ROUGE scores are actually doing a good job of evaluating the performance of a generative language model. They look at exact n-gram overlaps, and most of the time, we don’t really care whether the model’s output is exactly the same as the reference text – it needs to be at least as good, without having to be identical.

The BERT score is meant to address this concern through contextual embeddings. By looking at the embeddings behind the sentences and comparing those, the BERT score is able to see that “He quickly ate the treats” and “He rapidly consumed the goodies” are expressing basically the same idea, while both the BLEU and ROUGE scores would completely miss this.
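
Here’s a minimal sketch using the third-party bert-score package (one of several available implementations), reusing the sentences from the paragraph above:

```python
from bert_score import score  # pip install bert-score

candidates = ["He rapidly consumed the goodies"]
references = ["He quickly ate the treats"]

# Downloads a pre-trained transformer model on first use.
P, R, F1 = score(candidates, references, lang="en")
print(f"BERTScore F1: {F1.mean().item():.3f}")
```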

Final Thoughts

We’ve all seen what generative AI can do, and it’s fair at this point to assume this technology is going to become more prevalent in fields like software engineering, customer service, customer experience, and marketing.

But, as magical as generative AI might seem, these systems are just models. They have to be evaluated and monitored like any other, or you risk having a bad one negatively impact your brand.

If you’re enchanted by the potential of using generative algorithms in your contact center but are daunted by the challenge of putting together an engineering team, reach out to us for a demo of the Quiq conversational CX platform. We can help you put this cutting-edge technology to work without having to worry about all the finer details and resourcing issues.

***

Footnotes

[1] An n-gram is just a sequence of characters, words, or entire sentences. A 1-gram is usually a single word, a 2-gram is usually two words, etc.
[2] Skip-grams are a rather involved subdomain of natural language processing, and you can read about them elsewhere, but frankly, most of the detail is irrelevant here. All you need to know is that the ROUGE-S score is set up to be less concerned with exact n-gram overlaps than the alternatives.

How to Get the Most out of Your NLP Models with Preprocessing

Along with computer vision, natural language processing (NLP) is one of the great triumphs of modern machine learning. While ChatGPT is all the rage and large language models (LLMs) are drawing everyone’s attention, that doesn’t mean that the rest of the NLP field just goes away.

NLP endeavors to apply computation to human-generated language, whether that be the spoken word or text existing in places like Wikipedia. There are any number of ways in which this is relevant to customer experience and service leaders.

Today, we’re going to briefly touch on what NLP is, but we’ll spend the bulk of our time discussing how textual training data can be preprocessed to get the most out of an NLP system. There are a few branches of NLP, like speech recognition and speech synthesis, which we’ll be omitting.

Armed with this context, you’ll be better prepared to evaluate using NLP in your business (though if you’re building customer-facing chatbots, you can also let the Quiq platform do the heavy lifting for you).

What is Natural Language Processing?

In the past, we’ve jokingly referred to NLP as “doing computer stuff with words after you’ve tricked them into being math.” This is meant to be humorous, but it does capture the basic essence.

Remember, your computer doesn’t know what words are, all it does is move 1’s and 0’s around. A crucial step in most NLP applications, therefore, is creating a numerical representation out of the words in your training corpus.

There are many ways of doing this, but today a popular method is using word vector embeddings. Also known simply as “embeddings”, these are vectors of real numbers. They come from a neural network or a statistical algorithm like word2vec and stand in for particular words.

The technical details of this process don’t concern us in this post; what’s important is that you end up with vectors that capture a remarkable amount of semantic information. Words with similar meanings also have similar vectors, for example, so you can do things like find synonyms for a word by finding vectors that are mathematically close to it.

These embeddings are the basic data structures used across most of NLP. They power sentiment analysis, topic modeling, and many other applications.

For most projects it’s enough to use pre-existing word vector embeddings without going through the trouble of generating them yourself.
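
As a quick illustration, here’s a minimal sketch that uses gensim’s downloader to fetch a small set of pre-trained GloVe vectors (the model name below is just one of the options it offers):

```python
import gensim.downloader as api  # pip install gensim

# Downloads a small set of pre-trained GloVe vectors on first use.
vectors = api.load("glove-wiki-gigaword-50")

# Words with similar meanings have nearby vectors.
print(vectors.most_similar("happy", topn=3))
print(vectors.similarity("cat", "dog"))
```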

Are Large Language Models Natural Language Processing?

Large language models (LLMs) are a subset of natural language processing. Training an LLM draws on many of the same techniques and best practices as the rest of NLP, but NLP also addresses a wide variety of other language-based tasks.

Conversational AI is a great case in point. One way of building a conversational agent is by hooking your application up to an LLM like ChatGPT, but you can also do it with a rules-based approach, through grounded learning, or with an ensemble that weaves together several methods.

Getting the Most out of Your NLP Models with Preprocessing

Data Preprocessing for NLP

If you’ve ever sent a well-meaning text that was misinterpreted, you know that language is messy. For this reason, NLP places special demands on the data engineers and data scientists who must transform text in various ways before machine learning algorithms can be trained on it.

In the next few sections, we’ll offer a fairly comprehensive overview of data preprocessing for NLP. This will not cover everything you might encounter in the course of preparing data for your NLP application, but it should be more than enough to get started.

Why is Data Preprocessing Important?

They say that data is the new oil, and just as you can’t put oil directly in your gas tank and expect your car to run, you can’t plow a bunch of garbled, poorly-formatted language data into your algorithms and expect magic to come out the other side.

But what, precisely, counts as preprocessing will depend on your goals. You might choose to omit or include emojis, for example, depending on whether you’re training a model to summarize academic papers or write tweets for you.

That having been said, there are certain steps you can almost always expect to take, including standardizing the case of your language data; removing punctuation, white space, and stop words; segmenting and tokenizing; and so on.

We treat each of these common techniques below.

Segmentation and Tokenization

An NLP model is always trained on some consistent chunk of the full data. When ChatGPT was trained, for example, they didn’t put the entire internet in a big truck and back it up to a server farm – they used self-supervised learning.

Simplifying greatly, this means that the underlying algorithm would take, say, the first three sentences of a paragraph and then try to predict the fourth on the basis of the text that came before. Over time it sees enough language to guess that “to be or not to be, that is ___ ________” ends with “the question.”

But how was ChatGPT shown the first three sentences? How does that process even work?

A big part of the answer is segmentation and tokenization.

With segmentation, we’re breaking a full corpus of training text – which might contain hundreds of books and millions of words – down into units like words or sentences.

This is far from trivial. In English, sentences end with a period, but words like “Mr.” and “etc.” also contain them. It can be a real challenge to divide text into sentences without also breaking “Mr. Smith is cooking the steak.” into “Mr.” and “Smith is cooking the steak.”

Tokenization is a related process of breaking a corpus down into tokens. Tokens are sometimes described as words, but in truth they can be words, short clusters of a few words, sub-words, or even individual characters.

This matters a lot to the training of your NLP model. You could train a generative language model to predict the next sentence based on the preceding sentences, the next word based on the preceding words, or the next character based on the preceding characters.

Regardless, in both segmentation and tokenization, you’re decomposing a whole bunch of text down into individual units that your algorithm can work with.
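
Here’s a minimal sketch of both steps using NLTK (note that some NLTK versions name the tokenizer resource slightly differently, e.g. “punkt_tab”):

```python
import nltk
nltk.download("punkt", quiet=True)  # pre-trained tokenizer models
from nltk.tokenize import sent_tokenize, word_tokenize

text = "Mr. Smith is cooking the steak. Dinner is at seven."
print(sent_tokenize(text))   # the "Mr." abbreviation doesn't split the sentence
print(word_tokenize("Don't stop!"))  # -> ['Do', "n't", 'stop', '!']
```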

Making the Case Consistent

It’s standard practice to make the case of your text consistent throughout, as this makes training simpler. This is usually done by lowercasing all the text, though we suppose if you’re feeling rebellious there’s no reason you couldn’t uppercase it (but the NLP engineers might not invite you to their fun Natural Language Parties if you do.)

Fixing Misspellings

NLP, like machine learning more generally, is only as good as its data. If you feed it text with a lot of errors in spelling, it will learn those errors and they’ll show up again later.

This probably isn’t something you’ll want to do manually, and if you’re using a popular language there’s likely a module that can do it for you. Python, for example, has the TextBlob, autocorrect, and pyspellchecker libraries, all of which can handle spelling errors.
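
For instance, a minimal sketch using TextBlob might look like this:

```python
from textblob import TextBlob  # pip install textblob

# .correct() applies a statistical spelling corrector to the whole string.
blob = TextBlob("Thiss sentense has some speling errors")
print(str(blob.correct()))
```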

Getting Rid of the Punctuation Marks

Natural language tends to have a lot of punctuation, with English utilizing dozens of marks such as ‘!’ and ‘;’ for emphasis and clarification. These are usually removed as part of preprocessing.

This task is something that can be handled with regular expressions (if you have the patience for it…), or you can do it with an NLP library like Natural Language Toolkit (NLTK).
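
If you’d rather not wrestle with regular expressions, Python’s standard library can handle the common case; a minimal sketch:

```python
import string

text = "Wow!! Punctuation; everywhere..."
# str.translate with a deletion table removes every ASCII punctuation mark.
cleaned = text.translate(str.maketrans("", "", string.punctuation))
print(cleaned)  # -> "Wow Punctuation everywhere"
```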

Expanding the Contractions

Contractions are shortened versions of common word sequences, like turning “do not” into “don’t” or “would not” into “wouldn’t”. These, too, can be problematic for NLP algorithms, and they’re usually expanded back out during preprocessing.
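
One convenient option is the third-party contractions package; a minimal sketch:

```python
import contractions  # pip install contractions

# Expands common English contractions back into their full forms.
print(contractions.fix("They don't know what they're missing"))
# -> "They do not know what they are missing"
```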

Stemming

In linguistics, the stem of a word is its root. The words “runs”, “ran”, and “running” all have the word “run” as their base.

Stemming is one of two approaches for reducing the myriad inflected forms of a word down to a single basic representation. The other is lemmatization, which we’ll discuss in the next section.

Stemming is the cruder of the two, and is usually done with an algorithm known as the Porter stemmer. This stemmer doesn’t always produce the stem you’d expect – “cats” becomes “cat” while “ponies” becomes “poni”, for example. Nevertheless, it’s probably sufficient for basic NLP tasks.
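
Here’s a minimal sketch using NLTK’s implementation, reproducing the examples above:

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
print([stemmer.stem(word) for word in ["cats", "ponies", "running"]])
# -> ['cat', 'poni', 'run']
```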

Lemmatization

A more sophisticated version of stemming is lemmatization. A stemmer wouldn’t know the difference between the word “left” in “cookies are ahead and to the left” and “he left the book on the table”, whereas a lemmatizer would.

More generally, a lemmatizer uses language-specific context to handle very subtle distinctions between words, and this means it will usually take longer to run than a stemmer.

Whether it makes sense to use a stemmer or a lemmatizer will depend on the use case you’re interested in. Under most circumstances, lemmatizers are more accurate, and stemmers are faster.
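
Here’s a minimal sketch using spaCy (assuming you’ve installed its small English model), built around the “left” example above:

```python
import spacy  # pip install spacy; python -m spacy download en_core_web_sm

nlp = spacy.load("en_core_web_sm")

for sentence in ("The cookies are ahead and to the left",
                 "He left the book on the table"):
    doc = nlp(sentence)
    print([(token.text, token.lemma_) for token in doc if token.text == "left"])

# The verb "left" should lemmatize to "leave", while the directional "left"
# keeps its form - context that a plain stemmer doesn't have.
```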

Removing Extra White Spaces

It’ll often be the case that a corpus will have an inconsistent set of spacing conventions. This, too, is something the algorithm will learn unless it’s remedied during preprocessing.

Removing Stopwords

This is a big one. “Stopwords” are words like “the” or “is”, and they’re almost always removed before training begins because they don’t add much in the way of useful information.

Because this is done so commonly, you can assume that the NLP library you’re using will have some easy way of doing it. NLTK, for example, has a native list of stopwords that can simply be imported:

from nltk.corpus import stopwords

With this, you can simply exclude the stopwords from the corpus.
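
From there, filtering the corpus is a one-liner; a minimal sketch:

```python
import nltk
nltk.download("stopwords", quiet=True)
from nltk.corpus import stopwords

stop_words = set(stopwords.words("english"))
tokens = "this is an example of a sentence with stopwords".split()
print([t for t in tokens if t not in stop_words])
# -> ['example', 'sentence', 'stopwords']
```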

Ditching the Digits

If you’re building an NLP application that processes data containing numbers, you’ll probably want to remove those digits, as the training algorithm might otherwise end up inserting random digits here and there.

This, alas, is something that will probably need to be done with regular expressions.
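
A minimal sketch of what that might look like:

```python
import re

text = "Order 66 shipped 24 boxes in 2023"
no_digits = re.sub(r"\d+", "", text)           # strip runs of digits
print(re.sub(r"\s+", " ", no_digits).strip())  # tidy the leftover spaces
# -> "Order shipped boxes in"
```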

Part of Speech Tagging

Part of speech tagging refers to the process of automatically tagging a word with extra grammatical information about whether it’s a noun, verb, etc.

This is certainly not something that you always have to do (we’ve completed a number of NLP projects where it never came up), but it’s still worth understanding what it is.
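
Here’s a minimal sketch using NLTK (again, some NLTK versions name these resources slightly differently):

```python
import nltk
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)
from nltk import pos_tag, word_tokenize

print(pos_tag(word_tokenize("He left the book on the table")))
# e.g. [('He', 'PRP'), ('left', 'VBD'), ('the', 'DT'), ('book', 'NN'), ...]
```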

Supercharging Your NLP Applications

Natural language processing is an enormously powerful constellation of techniques that allow computers to do worthwhile work on text data. It can be used to build question-answering systems, tutors, chatbots, and much more.

But to get the most out of it, you’ll need to preprocess the data. No matter how much compute you have access to, machine learning isn’t of much use with bad data. Techniques like removing stopwords, expanding contractions, and lemmatization create corpora of text that can then be fed to NLP algorithms.

Of course, there’s always an easier way. If you’d rather skip straight to the part where cutting-edge conversational AI directly adds value to your business, you can also reach out to see what the Quiq platform can do.

What Is Transfer Learning? – The Role of Transfer Learning in Building Powerful Generative AI Models

Machine learning is hard work. Sure, it only takes a few minutes to knock out a simple tutorial where you’re training a classifier on the famous iris dataset, but training a big model to do something truly valuable – like interacting with customers over a chat interface – is a much greater challenge.

Transfer learning offers one possible solution to this problem. By making it possible to train a model in one domain and reuse it in another, transfer learning can reduce demands on your engineering team by a substantial amount.

Today, we’re going to get into transfer learning, defining what it is, how it works, where it can be applied, and the advantages it offers.

Let’s get going!

What is Transfer Learning in AI?

In the abstract, transfer learning refers to any situation in which knowledge from one task, problem, or domain is transferred to another. If you learn how to play the guitar well and then successfully use those same skills to pick up a mandolin, that’s an example of transfer learning.

Speaking specifically about machine learning and artificial intelligence, the idea is very similar. Transfer learning is when you pre-train a model on one task or dataset and then figure out a way to reuse it for another (we’ll talk about methods later).

If you train an image model, for example, it will tend to learn certain low-level features (like curves, edges, and lines) that show up in pretty much all images. This means you could fine-tune the pre-trained model to do something more specialized, like recognizing faces.

Why Transfer Learning is Important in Deep Learning Models

Building a deep neural network requires serious expertise, especially if you’re doing something truly novel or untried.

Transfer learning, while far from trivial, is simply not as taxing. GPT-4 is the kind of project that could only have been tackled by some of Earth’s best engineers, but setting up a fine-tuning pipeline to get it to do good sentiment analysis is a much simpler job.

By lowering the barrier to entry, transfer learning brings advanced AI into reach for a much broader swath of people. For this reason alone, it’s an important development.

Transfer Learning vs. Fine-Tuning

And speaking of fine-tuning, it’s natural to wonder how it’s different from transfer learning.

The simple answer is that fine-tuning is a kind of transfer learning. Transfer learning is a broader concept, and there are other ways to approach it besides fine-tuning.

What are the 5 Types of Transfer Learning?

Broadly speaking, there are five major types of transfer learning, which we’ll discuss in the following sections.

Domain Adaptation

Under the hood, most modern machine learning is really just an application of statistics to particular datasets.

The distribution of the data a particular model sees, therefore, matters a lot. Domain adaptation refers to a family of transfer learning techniques in which a model is (hopefully) trained such that it’s able to handle a shift in distributions from one domain to another (see section 5 of this paper for more technical details).

Domain Confusion

Earlier, we referenced the fact that the layers of a neural network can learn representations of particular features – one layer might be good at detecting curves in images, for example.

It’s possible to structure our training such that a model learns more domain-invariant features, i.e. features that are likely to show up across multiple domains of interest. This is known as domain confusion because, in effect, we’re making the representations of the domains so similar that the model can’t tell them apart.

Multitask Learning

Multitask learning is arguably not even a type of transfer learning, but it came up repeatedly in our research, so we’re adding a section about it here.

Multitask learning is what it sounds like; rather than simply training a model on a single task (e.g. detecting humans in images), you attempt to train it to do several things at once.

The debate about whether multitask learning is really transfer learning stems from the fact that transfer learning generally revolves around adapting a pre-trained model to a new task, rather than having it learn to do more than one thing at a time.

One-Shot Learning

One thing that distinguishes machine learning from human learning is that the former requires much more data. A human child will probably only need to see two or three apples before they learn to tell apples from oranges, but an ML model might need to see thousands of examples of each.

But what if that weren’t necessary? The field of one-shot learning addresses the task of learning, for example, object categories from either one example or a small number of them. This idea was pioneered in “One-Shot Learning of Object Categories”, a watershed paper co-authored by Fei-Fei Li and her collaborators. Their Bayesian one-shot learner was able “…to incorporate prior knowledge of the object world into the learning scheme”, and it outperformed a variety of other models in object recognition tasks.

Zero-Shot Learning

Of course, there might be other tasks (like translating a rare or endangered language) for which it is effectively impossible to have any labeled data for a model to train on. In such a case, you’d want to use zero-shot learning, which is a type of transfer learning.

With zero-shot learning, the basic idea is to learn features in one data set (like images of cats) that allow successful performance on a different data set (like images of dogs). Humans have little problem with this, because we’re able to rapidly learn similarities between types of entities. We can see that dogs and cats both have tails, both have fur, etc. Machines can perform the same feat if the data is structured correctly.

How Does Transfer Learning Work?

There are a few different ways you can go about utilizing transfer learning processes in your own projects.

Perhaps the most basic is to use a good pre-trained model off the shelf as a feature extractor. This would mean keeping the pre-trained model in place, but then replacing its final layer with a layer custom-built for your purposes. You could take the famous AlexNet image classifier, remove its last classification layer, and replace it with your own, for example.

Or, you could fine-tune the pre-trained model instead. This is a more involved engineering task and requires that the pre-trained model be modified internally to be better suited to a narrower application. This will often mean that you have to freeze certain layers in your model so that the weights don’t change, while simultaneously allowing the weights in other layers to change.
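
To make the two approaches concrete, here’s a minimal PyTorch sketch of the feature-extractor route; the five-class head is invented for illustration, and fine-tuning would instead leave some of the later layers unfrozen:

```python
import torch.nn as nn
from torchvision import models  # pip install torch torchvision

# Load a ResNet-18 pre-trained on ImageNet.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Feature-extractor approach: freeze every pre-trained weight...
for param in model.parameters():
    param.requires_grad = False

# ...then swap in a fresh classification head for our own (hypothetical)
# 5-class task. Only this new layer's weights will be updated in training.
model.fc = nn.Linear(model.fc.in_features, 5)

# For fine-tuning instead, you'd unfreeze some of the later layers, e.g.:
# for param in model.layer4.parameters():
#     param.requires_grad = True
```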

What are the Applications of Transfer Learning?

As machine learning and deep learning have grown in importance, so too has transfer learning become more crucial. It now shows up in a variety of different industries. The following are some high-level indications of where you might see transfer learning being applied.

Speech recognition across languages: Teaching machines to recognize and process spoken language is an important area of AI research and will be of special interest to those who operate contact centers. Transfer learning can be used to take a model trained in a language like French and repurpose it for Spanish.

Training general-purpose game engines: If you’ve spent any time playing games like chess or go, you know that they’re fairly different. But, at a high enough level of abstraction, they still share many features in common. That’s why transfer learning can be used to train up a model on one game and, under certain conditions, use it in another.

Object recognition and segmentation: Our Jetsons-like future will take a lot longer to get here if our robots can’t learn to distinguish between basic objects. This is why object recognition and object segmentation are both such important areas of research. Transfer learning is one way of speeding up this process. If models can learn to recognize dogs and then quickly be re-purposed for recognizing muffins, then we’ll soon be able to outsource both pet care and cooking breakfast.

[Image: the famous “chihuahua or muffin?” comparison. In fairness to the AI, it’s not like we can really tell them apart!]

Applying Natural Language Processing: For a long time, computer vision was the major use case of high-end, high-performance AI. But with the release of ChatGPT and other large language models, NLP has taken center stage. Because much of the modern NLP pipeline involves word vector embeddings, it’s often possible to use a baseline, pre-trained NLP model in applications like topic modeling, document classification, or spicing up your chatbot so it doesn’t sound so much like a machine.

What are the Benefits of Transfer Learning?

Transfer learning has become so popular precisely because it offers so many advantages.

For one thing, it can dramatically reduce the amount of time it takes to train a new model. Because you’re using a pre-trained model as the foundation for a new, task-specific model, far fewer engineering hours have to be spent to get good results.

There are also a variety of situations in which transfer learning can actually improve performance. If you’re using a good pre-trained model that was trained on a general enough dataset, many of the features it learned will carry over to the new task.

This is especially true if you’re working in a domain where there is relatively little data to work with. It might simply not be possible to train a big, cutting-edge model on a limited dataset, but it will often be possible to use a pre-trained model that is fine-tuned on that limited dataset.

What’s more, transfer learning can help guard against the ever-present problem of overfitting. Overfitting has several definitions depending on what resource you consult, but a common way of thinking about it is this: a model is overfitting when it’s complex enough relative to the data that it begins learning noise instead of just signal.

That means that it may do spectacularly well in training only to generalize poorly when it’s shown fresh data. Transfer learning doesn’t completely rule out this possibility, but it makes it less likely to happen.

Transfer learning also has the advantage of being quite flexible. You can use transfer learning for everything from computer vision to natural language processing, and many domains besides.

Relatedly, transfer learning makes it possible for your model to expand into new frontiers. When done correctly, a pre-trained model can be deployed to solve an entirely new problem, even when the underlying data is very different from what it was shown before.

When To Use Transfer Learning

The list of benefits we just enumerated also offers a clue as to when it makes sense to use transfer learning.

Basically, you should consider using transfer learning whenever you have limited data, limited computing resources, or limited engineering brain cycles you can throw at a problem. This will often wind up being the case, so whenever you’re setting your sights on a new goal, it can make sense to spend some time seeing if you can’t get there more quickly by simply using transfer learning instead of training a bespoke model from scratch.

Check out the second video in Quiq’s LLM Intuitions series—created by our Head of AI, Kyle McIntyre—to learn about one of the oldest forms of transfer learning: Word embeddings.

Transfer Learning and You

In the contact center space, we understand how difficult it can be to effectively apply new technologies to solve our problems. It’s one thing to put together a model for a school project, and quite another to have it tactfully respond to customers who might be frustrated or confused.

Transfer learning is one way that you can get more bang for your engineering buck. By training a model on one task or dataset and using it on another, you can reduce your technical budget while still getting great results.

You could also just rely on us to transfer our decades of learning on your behalf (see what we did there). We’ve built an industry-leading conversational AI chat platform that is changing the game in contact centers. Reach out today to see how Quiq can help you leverage the latest advances in AI, without the hassle.