How Can AI Make Agents More Efficient?

From the invention of writing to quantum computing, emerging technologies have always had a profound impact on the way we work. New tools mean new products and services, new organizational structures, whole new markets, and sometimes even new methods of thought.

These days, the big news is coming out of artificial intelligence. Specifically, the release of ChatGPT has made it possible for everyone to try out an advanced AI application for the first time, and it has ignited a firestorm of speculation as to how industries ranging from medicine to copywriting might be transformed.

In this piece, we’re going to try to cut through the hype to give contact center managers some much-needed clarity. We’ll discuss what AI is useful for, how it will change how contact center agents function daily, and what tools they should investigate to get the most out of AI.

What Is AI Useful For?

Artificial intelligence is a pretty broad category, encompassing everything from the most basic linear regressions to the remarkable sophistication of deep reinforcement learning agents.

This is too much territory to cover in a single blog post, but we can nevertheless make some useful general comments.

The way we see it, there are essentially two ways that AI is useful: it can either completely replace a human for certain tasks, allowing them to shift their focus to higher-value work, or it can augment their process, allowing them to reach insights or achieve objectives that would’ve taken much longer otherwise.

Take the example of ChatGPT, a large language model trained on huge quantities of human-generated text that is able to write poetry, generate math proofs, create functioning code, and much more.

For certain tasks – like generating blog post titles or short email blasts – ChatGPT is good enough to supplant humans altogether. But if you’re trying to learn a complex subject like organic chemistry, it’s best to treat ChatGPT more like a conversational partner. You can ask it questions or use it to test your understanding of a concept, but you have to be careful with its output because it might be hallucinating or otherwise getting important facts wrong. [1]

Since ChatGPT and large language models more generally are what everyone is focused on at the moment, it’s what we’ll be discussing throughout this essay.

How is AI Changing How Contact Center Agents Work?

As soon as ChatGPT was released it spawned an unending stream of hot takes, from “this is going to completely automate the entire economy” to “this is going to be a huge flop that no one finds particularly useful.”

Recently, a study by Erik Brynjolfsson, Danielle Li, and Lindsey R. Raymond called “Generative AI at Work” examined how LLMs are being used in contact centers. They found that both these perspectives were wrong: generative AI was not completely automating contact centers but was proving enormously helpful in making contact centers more efficient.

Specifically, LLMs were able to capture some of the conversational patterns and general tacit knowledge held by more senior agents and transfer it to more junior agents. The result was more productivity among these less experienced workers, less overall turnover, and a better customer experience.

To help flesh this picture out, we’ll now turn to examining some specific ways this works.

Large Language Models are Helping Agents Work Faster

There are a few ways that LLMs are helping agents get their jobs done more quickly and efficiently.

One is by helping them cut down on typing by providing contextually appropriate responses to customer questions, which is exactly what Quiq Compose does.

Quiq Compose learns from interactions between contact center agents and customers. It can take a barebones outline of a reply (“Nope, you waited too long to return the product…”) and flesh it out into a full, coherent, grammatical response (“I’m so sorry to hear that the product isn’t working as intended…”.)
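To make this concrete, here's a minimal sketch in Python of how a tool in this vein might prompt an LLM to expand an agent's shorthand. The function name and prompt wording are our own illustration, not Quiq's actual implementation.

```python
def build_expansion_prompt(outline: str, tone: str = "warm and professional") -> str:
    """Turn an agent's shorthand note into an LLM prompt asking for a full reply."""
    return (
        f"Rewrite the following agent note as a complete, {tone} "
        "customer-service reply. Keep the meaning, but soften the tone:\n\n"
        f"Agent note: {outline}"
    )

# The resulting prompt would then be sent to a language model for completion.
prompt = build_expansion_prompt("Nope, you waited too long to return the product...")
```

The agent still controls the substance of the reply; the model only handles the wording.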

Quiq Suggest also learns from multiple agent-customer interactions, but it offers its help in real time. As your contact center agents begin typing their responses, the underlying model offers a robust form of autocomplete to help them craft replies more quickly. With it, agents spend up to 30% less time hunting around for information and tweaking their language to be both polite and informative.

What’s more, because Quiq Suggest leverages lightweight “edge” language models trained on a specific company’s data, it’s able to run very quickly.

Another way you can reduce agent handling time is by simply cutting down on the amount of text a given agent has to process. In the course of resolving an issue, there will usually be some extraneous text, like “Thanks!” or “Have a good day!” When Quiq’s conversational AI platform sees these unimportant messages, it automatically filters them and tacks them on to the end of the transcript.
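A minimal sketch of this kind of filtering might look like the following; the phrase list and function name are illustrative assumptions, not Quiq's actual logic.

```python
# Pleasantries that carry no actionable content; illustrative, not exhaustive.
TRIVIAL = {"thanks!", "thank you!", "ok", "great", "have a good day!"}

def split_messages(messages: list) -> tuple:
    """Separate substantive messages from pleasantries so the trivial ones
    can be appended to the end of the transcript rather than interrupting
    the agent mid-conversation."""
    substantive, trivial = [], []
    for msg in messages:
        (trivial if msg.strip().lower() in TRIVIAL else substantive).append(msg)
    return substantive, trivial
```

A production system would likely use a learned classifier rather than a fixed list, but the principle is the same: keep the agent's attention on messages that need it.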

Finally, a lot of friction and information loss can occur when a conversation is transferred between agents, or from an AI to a human agent. This is where conversation summarization comes in handy. By automatically summarizing the interaction so far, these transfers can take less time and energy, which also contributes to lower agent burnout and higher customer satisfaction.
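As a rough sketch, a handoff summary can be as simple as a well-structured request over the recent transcript; the prompt wording here is our own illustration.

```python
def build_handoff_prompt(transcript: list, max_turns: int = 20) -> str:
    """Ask a model to summarize a conversation for the next agent,
    keeping only the most recent turns to stay within a context limit."""
    recent = "\n".join(transcript[-max_turns:])
    return (
        "Summarize this support conversation in three bullet points: "
        "the customer's issue, what has been tried, and what is still open.\n\n"
        + recent
    )
```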

Large Language Models can provide 24/7 Customer Support

There’s a fundamental asymmetry in running a great contact center, inasmuch as problems can occur around the clock but your agents need to sleep, rest, and play frisbee golf.

Unless, of course, some of your agents aren’t human. One of the great advantages of computers and algorithms is that they have none of the human frailties that prevent us all from working every hour of the day. They have no need for sleep, bathroom breaks, or recreation.

If you’re using a powerful conversational AI platform like Quiq, you can have AI agents deployed every hour, day or night, answering questions, completing tasks, and resolving problems.

Of course, the technology is not yet good enough to handle everything a contact center agent would handle, and some issues will have to be postponed until the humans punch the clock. Still, with the right tools, your operation can constantly be moving forward.

Large Language Models Can Help With Documentation

Writing documentation is one of those crucial, un-sexy tasks that businesses ignore at their own peril. Everyone wants to be coding up a blockchain or demo-ing a shiny new application to well-heeled investors, but someone needs to be sitting and writing up product specs, troubleshooting workflows, and all the other text that helps an organization function effectively.

This, too, is something that AI can help with. Whether it’s brainstorming an outline, identifying common sticking points, or even writing the document wholesale, more and more technical organizations are exploring LLMs to speed up their documentation efforts.

Just remember that LLMs like ChatGPT are extremely prone to hallucinations, so carefully fact-check everything they produce before you add it to your official documentation.

Large Language Models Can Help With Marketing

A final place where AI is proving incredibly useful is in marketing. Whether or not your agents have any input into your marketing depends on how you run your contact center, but this piece wouldn’t be complete without at least briefly touching upon marketing.

One obvious way that this can work is by having ChatGPT generate headlines, subject lines, Tweets, or even SEO-optimized blog posts.

But this is not the only way AI can be used in marketing. One very clever use of the technology that we’ve encountered is having ChatGPT generate customer journeys or customer diary entries. If your product is targeting men in their 40s who aren’t crushing life the way they used to, for example, it can create a month’s worth of forum posts from your target buyers discussing their lack of drive and motivation. This, in turn, will furnish targeted language you can use in your copy.

But bear in mind that marketing is one of those things that’s just incredibly subtle. It takes all of 30 seconds to come up with a few headlines for an email, but the difference between an okay headline and an extraordinary one can be a single word. Here, as elsewhere, it’s wise to have the final word remain with the humans.

Working more Quiq-ly

The world is changing, and contact centers are changing along with it. If you expect to retain a competitive edge and a top-notch contact center, you’ll need to utilize the latest technologies.

One way you could do this is by paying an expensive engineering team to build your own LLMs and AI tooling. But a much easier way is to integrate our Quiq conversational AI platform into your contact center. Whether it’s automatic summarization, filtering trivial messages, or using Quiq Suggest and Quiq Compose to cut down on average handle time, we have a product that will streamline your operation. Schedule a demo with us today to see how we can help you!

[1] You could argue that both of these examples boil down to the same thing. That is, even when you treat ChatGPT as a sounding board you’re really just replacing a human being that could’ve performed the same function. This is a plausible point of view, but we still think it’s useful to distinguish between “ChatGPT acting like a total replacement for a human for certain boilerplate tasks” and “ChatGPT augmenting a human’s workflow by acting like an idea generator or conversational partner.” Reasonable people could disagree on this, and your mileage may vary.

The Pros and Cons of Using ChatGPT: Agents vs. Customers

If you’re a contact center manager who has been impressed with ChatGPT and everything it makes possible, a natural follow-up question is where you should deploy it.

On the one hand, you could use it internally to make your contact center agents more efficient. They’d be able to ask questions of your company documentation, summarize important emails, outsource the more trivial parts of their workload, and plenty besides.

On the other hand, you could use it externally as a customer-facing application. If you had clients that were confused about a feature or needed help figuring something out, ChatGPT could go a long way towards resolving their issues with minimal attention from your contact center agents.

Of course, there is major overlap between these options, but there are crucial differences as well. In this article, we’ll discuss the pros and cons of using ChatGPT or a similar large language model (LLM) for contact center agents vs. using it for customers.

How is ChatGPT Making Contact Center Agents More Efficient?

To a first approximation, a contact center is a place where questions are answered. No matter how clear your instructions or comprehensive your documentation, there will inevitably be users who simply can’t get an issue resolved, and that’s when they’ll reach out to customer support.

This means that much of a contact center agent’s day-to-day revolves around interacting with clients via text, either over a chat interface or possibly through text messaging.

What’s more, much of this interaction will be relatively formulaic. Customers will repeatedly ask about similar sorts of issues, or they’ll be asking questions that are covered somewhere in your product’s documentation.

If you’ve spent even five minutes with ChatGPT, it’s probably occurred to you that it’s a powerful tool for handling exactly these kinds of tasks. Let’s spend a few minutes digging into this idea.

Outsourcing Routine Tasks

The most obvious way that ChatGPT is making contact center agents more efficient is by allowing them to outsource some of this more routine work.

There are a few ways this can happen. First, ChatGPT can help with answering basic questions. Today, large language models are not particularly good at generating highly original and inventive text, but when it comes to churning out helpful, simple boilerplate, they’re without peer.

This means that, with a little training or fine-tuning, your contact center agents can use ChatGPT to answer the sorts of questions they see multiple times a day, such as where a given feature is located or how to handle a common error. This will free them up to focus on the more involved queries, for which they have a comparative advantage.

In this same vein, tools like ChatGPT can also help contact center agents adopt the appropriate, polite tone in their correspondences. Customer experience and customer service are major parts of being a contact center agent, which means replies must be crafted so as to put the customer (who may be frustrated, angry, and belligerent) at ease.

This is something ChatGPT excels at, and according to the paper “Generative AI at Work”, this exact dynamic was responsible for a lot of the gain in productivity seen in a contact center that began using an LLM. The model was trained on the interactions of more seasoned agents who know how to deal with tricky customers, and a good portion of this ability was transferred to more junior agents via the model’s output.

Another place where ChatGPT can help is in writing documentation. This may fall to a technical writer rather than an actual agent, but in either case, ChatGPT’s remarkable ability to provide outlines and quickly generate expository text can speed up the process of documenting your product’s core features.

And finally, ChatGPT is quite good at writing and explaining simple code. As with documentation, it’s doubtful that a contact center agent is going to be spending much time writing code. Nevertheless, your agents might find themselves hit with questions from savvier users about e.g. API integrations, so they should know that they can query ChatGPT about what a code snippet is doing, and they can have it generate a basic code example if they need to.
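For instance, an agent might assemble a request like the following. It's shown as plain data rather than a live API call; the chat-message shape mirrors what most LLM chat APIs accept, though the exact endpoint and client library vary by provider.

```python
# The code snippet a customer asked about:
snippet = 'requests.get(url, headers={"Authorization": f"Bearer {token}"})'

# A chat-style request, expressed as data only:
messages = [
    {"role": "system", "content": "You explain code to non-programmers in plain English."},
    {"role": "user", "content": f"What does this line of Python do?\n\n{snippet}"},
]
```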

Learning and Brainstorming

This is a bit more abstract, but ChatGPT has proven remarkably useful in brainstorming study plans, solutions to problems, etc. Though the algorithm itself isn’t particularly creative, when it generates ideas that a human being can riff off of, the combination of algorithm + human can be much more creative than a human working alone.

While there will be many situations in which a contact center agent has a script to work off of, when they don’t, turning to ChatGPT can be the spark that moves them forward.

ChatGPT Plugins for Contact Center Agents

One of the more exciting developments for ChatGPT was the release of its plugin library in March of 2023. There are now plugins from Instacart (for food delivery), Expedia (for trip planning), Klarna Shopping (for online retail), and many others.

Truthfully, most of this won’t (yet) be of much use for contact center agents, but it’s worth mentioning given how quickly people are developing new plugins. If you’re a contact center agent or manager wanting to extend the functionality of powerful LLM technologies, plugins are something you’ll want to be aware of.

Getting the Most out of ChatGPT for Customer Service

ChatGPT is remarkably good for a wide range of tasks, but to really leverage its full capacities you’ll need to be aware of a few common terms.

Large language models are known to be really sensitive to small changes in word choice and structure, which means there’s an art to phrasing your requests just so. This practice is known as “prompt engineering,” and it’s a new discipline that can be enormously valuable when done well.

You can also get better results if you show ChatGPT an example or two of what you’re looking for. This is known as “one-shot” learning (if you show it one example), and “few-shot” learning (if you show it five or six).
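A few-shot prompt is nothing more exotic than worked examples stitched in front of the real question. Here's a minimal sketch; the Q/A formatting is one common convention, not the only one.

```python
def few_shot_prompt(examples: list, query: str) -> str:
    """Stitch worked (question, answer) examples in front of the real query
    so the model imitates the demonstrated pattern."""
    parts = [f"Q: {q}\nA: {a}" for q, a in examples]
    parts.append(f"Q: {query}\nA:")  # trailing "A:" invites the model to answer
    return "\n\n".join(parts)

examples = [
    ("Where do I find my invoice?", "Invoices live under Billing > History."),
]
prompt = few_shot_prompt(examples, "Where do I change my email address?")
```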

Of course, if that doesn’t work you can instead try to fine-tune a large language model. This involves gathering hundreds of examples of the conversations, text, or output you want to see and feeding them all to the model (probably over its API) so that the model’s internal structure actually changes. Though it’s obviously a more significant engineering challenge, it will probably give you the best results of all.
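As a rough sketch, fine-tuning data is typically packaged as JSON Lines, one example conversation per line. The exact field names vary by provider, so treat this schema as illustrative.

```python
import json

# (question, ideal answer) pairs drawn from past conversations.
conversations = [
    ("How do I reset my password?", "Head to Settings > Security and click 'Reset password'."),
    ("Where is my invoice?", "Invoices are emailed monthly and also live under Billing."),
]

jsonl_lines = []
for question, answer in conversations:
    record = {
        "messages": [
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ]
    }
    jsonl_lines.append(json.dumps(record))

# "\n".join(jsonl_lines) is what would be uploaded to the provider's
# fine-tuning endpoint.
```

In practice you'd want hundreds of such examples, reviewed for quality, before kicking off a fine-tuning run.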

ChatGPT vs. Chatbots

We in the customer experience field have quite a lot of experience with chatbots, so it’s natural to wonder how ChatGPT is different.

Chatbots are just algorithms that are capable of carrying on a dialogue with customers, and this can be accomplished in many different ways. Some chatbots are extremely simple and follow a rules-based approach to formatting their responses, while those based on neural networks or some other advanced machine-learning technology are much more flexible.

Chatbots can be built with ChatGPT, but most aren’t.

How is ChatGPT Changing Customer Experience?

Now that we’ve covered some of the ways in which ChatGPT is helping customer service agents, let’s discuss some of the ways ChatGPT is used for customer support.

Personalized Responses

One property of ChatGPT that makes it extremely effective is that it’s able to remember the context. When you chat with ChatGPT, it’s not generating each new response in a vacuum, it’s producing them either on the basis of what has already been said or based on information that it’s been given.

This means that if you have a customer interacting with a chat interface powered by an LLM (and are being smart by guardrailing it with a conversational CX platform like Quiq), they’ll be able to have more open-ended and personalized interactions with the tool than would be possible with simpler chatbots.
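Under the hood, this "memory" is usually nothing more than replaying the whole message history on every turn. A minimal sketch, with names of our own invention:

```python
class ChatSession:
    """Minimal sketch of conversational context: on every turn, the full
    history (system prompt included) is resent to the model."""

    def __init__(self, system_prompt: str):
        self.history = [{"role": "system", "content": system_prompt}]

    def add_user_turn(self, text: str) -> list:
        self.history.append({"role": "user", "content": text})
        return self.history  # this full list is what the model would see
```

Because the model re-reads everything each turn, earlier details ("my order number is 4412") stay available later in the conversation.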

This will go a long way toward making them feel like they’re being taken care of, thus boosting your company’s overall customer satisfaction.

Automatically Resolving Customer Issues

Earlier, we talked about how contact center agents would be able to leverage ChatGPT in order to outsource their more routine tasks.

Well, one of those routine tasks is resolving a steady stream of quotidian issues. How many times a day do you think a contact center agent has to help a person log in to their client’s software or reset a password? It’s probably not “hundreds”, but we’d bet that it’s a lot.

ChatGPT is a long way away from being able to patiently guide a user through any arbitrary problem they might have, but it’s already more than capable of handling the kinds of simple, repetitive queries that sap an agent’s energy.

Automatic Natural Language Translation

One of the surprising places where ChatGPT excels is in fast, accurate translation between multiple languages. Given the fact that English is so commonly used in the technical community, it can be easy to lose sight of the fact that billions of people have either no knowledge of it or, at best, a very rudimentary grasp.

But not many companies can afford to have all their documentation translated into dozens of different tongues or to keep a team of translators on staff. ChatGPT is almost certainly not going to capture every little nuance in a translation, but it should be sufficient to help a person resolve their issue on their own or to ask more pointed, technical questions.
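A translation request can be as simple as a well-phrased prompt. As an illustrative sketch:

```python
def translation_prompt(text: str, target: str = "English") -> str:
    """Wrap a customer message in a translation request, asking the model
    to leave names and product terms untouched."""
    return (
        f"Translate the customer message below into {target}. "
        f"Preserve names and product terms exactly as written.\n\n{text}"
    )
```

Asking the model to preserve product names matters: naive translation will happily "translate" your brand and feature names into something unrecognizable.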

Dangers in Using ChatGPT

Whether you end up letting your agents or your customers get ahold of ChatGPT first, you should know that it’s not a panacea, nor is it perfect. It can and will fail, and some of those failures are reasonably predictable ones you should be prepared for.

The most obvious and well-known failure is referred to as a “hallucination”, and it results from the way that LLMs like ChatGPT are trained. An LLM learns how to output sequences of tokens; it’s not doing any fact-checking on its own. That means it will cheerfully and confidently make up names, book titles, and URLs.

It’s also possible for ChatGPT to become obnoxious and insulting. The team at OpenAI has done a good job of tuning this behavior out, but recall that these systems are very sensitive to the way prompts are structured, and it can reemerge.

There’s no general solution to these issues as far as we know. You can assiduously construct a fine-tuning pipeline for LLMs that does even more to get rid of toxicity, but ultimately you’re going to have to monitor ChatGPT’s output to see if it’s straying or otherwise being unhelpful.

Quiq specializes in defining guardrails for enterprise businesses that want to harness ChatGPT’s benefits while protecting their brand.

Figuring Out Where to Deploy ChatGPT

Whether it makes more sense to use ChatGPT internally or externally will depend a lot on your circumstances. There’s a lot ChatGPT can do to make your contact center agents more efficient, but if you’re just looking to offload basic customer queries, it can certainly be used for that purpose instead.

In our considered opinion, the ROI is ultimately higher for using ChatGPT in a customer-facing way. This will allow your clients to help themselves, ultimately boosting their satisfaction and their estimation of your product.

But whichever way you choose to go, you can substantially reduce the headache associated with managing the infrastructure for this complex technology by making use of the Quiq conversational CX platform. With us, you can get world-leading results, satisfy your customers, lighten the load on your agents, and never have to worry about a rogue answer, compute cluster, or GPU.

Current Large Language Models and How They Compare

From ChatGPT and Bard to BLOOM and Claude, there is now a veritable ocean of different large language models (LLMs) for you to choose from. Some of them are specialized for specific use cases, some are open-source, and there’s a huge variance in the number of parameters they contain.

If you find yourself fascinated by this technology and interested in using it in your contact center, it can be hard to know how to choose the right tool for the job.

Today, we’re going to tackle this issue head-on. After a brief discussion of the history of LLMs, we’ll talk about specific criteria you can use to evaluate LLMs, sources of additional information, and some of the better-known options.

Let’s get going!

A Brief History of Generative AI

Though it may feel like LLMs and generative AI exploded onto the scene all of half a year ago, in fact, the basic research powering these advances goes back much further.

Way back in the 1940s, Walter Pitts and Warren McCulloch drew upon early research on the brain to design artificial neurons. Though these worked, they couldn’t be deployed for anything particularly useful until the backpropagation algorithm was popularized in the mid-1980s. This allowed larger neural networks to be trained effectively, and in 1989 Yann LeCun built a convolutional system able to identify handwritten numbers.

In the years that followed, architectural discoveries like long short-term memory (LSTM) networks made it possible for machine learning algorithms to learn far more complex relationships within data, laying the foundations for them to eventually revolutionize work in places like contact centers.

What’s more, the opening decade of the 2000s marked the beginning of the big data era. For all their power, generative pre-trained models like ChatGPT are not terribly efficient learners. To be able to output language or images, they must be shown many, many examples from which to derive the statistical function that allows them to create surprising new output later.

Once researchers began the practice of publishing enormous datasets a key obstacle to building large, useful systems was removed. When combined with the preceding six decades of foundational conceptual work, this was enough to allow us here in 2023 to witness the birth of generative AI and large language models.

How to Compare Large Language Models?

If you’re shopping around for a large language model for a particular application, it makes sense to first get clear on the evaluation criteria you should be using. That’s what we’ll cover in the sections below.

Evaluating LLMs Based on Industry

One of the more remarkable aspects of ChatGPT is that it’s so good at so many things. Out of the box (or sometimes with a little fine-tuning) it can perform very well at answering questions, summarizing text, translating between natural languages, and much more.

However, there may well be situations in which you’d want to use a domain-specific LLM, one that has been trained on medical or legal text, financial data, etc. The basic process of training a generative model is now being used to build neural networks for material design, protein synthesis, and music, among other things.

So, if you’re considering using a generative pre-trained model in your business, one thing you might want to think about early on is whether you want to try to find a domain-specific model, or a general model that you train on your own data.

If you do look for a domain-specific model, be aware that the space is very new and there might not be one available yet (though given how much attention is going into generative AI right now, there’s also a decent chance that one will be released in relatively short order).

Alternatively, you could try to fine-tune a pre-trained model. Getting into the nuances of fine-tuning, zero-shot learning, few-shot learning, and prompt engineering is beyond the scope of this article, but suffice it to say that there are many ways for you to get a generic LLM to be better at a smaller range of specific tasks.

If you’re an engineer designing circuits for quantum computers this might not be sufficient, but for those of us working in customer experience and contact centers, a well-honed prompt or a half-dozen examples might be more than enough for substantial performance boosts.

Evaluating LLMs By Language

Given that English is a sort of lingua franca (should it be lingua anglica?) for the tech community and makes up nearly 60% of the websites on the internet, it’s no surprise that it also comprises the bulk of the training data going into modern LLMs.

ChatGPT and other systems are often pretty good at multi-lingual tasks by default, but they don’t perform equally well in all languages. As you can probably guess, they’re best at “high-resource” languages (English, Spanish, Chinese), somewhat worse at “medium-resource” languages (Portuguese, Hindi), and much worse at “low-resource” languages (Haitian Creole and Swahili).

If you’re serving customers with a medium- or low-resource language and need really high levels of accuracy, you’ll probably have to stick with human beings for a while. Otherwise, test ChatGPT or whatever system you end up going with for how well it can handle multi-lingual problems like question answering and translation.

Whether They’re Open-Source or Closed-Source

No doubt you’ve heard of “open-source” software, a term which refers to the practice of releasing source code to the public where it can be forked, modified, and scrutinized.

The open-source approach to software development has become incredibly popular, and this enthusiasm has partially bled over into artificial intelligence and machine learning. It is now fairly common to open-source datasets, models, and even training frameworks like TensorFlow.

How does this translate to the realm of large language models? In truth, it’s a bit of a mixture. Some models are proudly open-sourced, while others jealously guard their model’s weights, training data, and source code.

This is one thing you might want to consider as you carry out your search for a large-language model. Some of the very best models, like ChatGPT, are closed-source. You won’t be able to fork the ChatGPT code base and modify it, you’ll be relegated to feeding queries into it via an API.

The advantage to going with a closed-source model, of course, is that you needn’t lay awake at night worrying about managing a codebase thousands of lines long, nor will you need to concern yourself with hiring the expensive engineers who know how to read and use it.

The downside, naturally, is that you’re entirely beholden to the team who builds and offers the LLM over their API. If they make updates or go bankrupt, you could be left scrambling last-minute to find an alternative solution.

There’s no one-size-fits-all approach here; if you have the in-house technical expertise to fork an open-source LLM and you want to modify it, open-source is probably the way to go. But be aware that this is a substantial commitment, and as things stand today, the very best generative pre-trained language models are closed-source, so there’s a performance penalty that you’ll have to account for.

Leaderboards and Comparison Websites for Large Language Models

Another route you can go in comparing current LLMs is to avail yourself of a service built for this purpose.

Whatever rumors you may have heard, programmers are human beings, and human beings have a fondness for ranking and categorizing pretty much everything – sports teams, guitar solos, classic video games, you name it.

Naturally, as LLMs have become better-known, leaderboards and websites have popped up comparing them along all sorts of different dimensions. Here are a few you can use as you search around for the best tooling.


One is AlpacaEval, which uses a custom dataset to compare ChatGPT, Claude, Cohere, and other LLMs on how well they’re able to follow instructions. AlpacaEval boasts high agreement with human evaluators, so in our estimation it’s probably a suitable way of initially screening for LLM tools, though more extensive checks might be required to settle on a final list.

Another good choice is Chatbot Arena, which pits two anonymous models side-by-side, has you rank which one is better, then aggregates all the scores into a leaderboard.

Finally, there is Hugging Face’s Open LLM Leaderboard, which is a similar endeavor. Anyone can submit a new model for evaluation, all of which are then assessed based on a small set of key benchmarks from the Eleuther AI Language Model Evaluation Harness. These capture how well the models do in answering simple science questions, common-sense queries, and more.

When combined with the criteria we discussed earlier, these leaderboards and comparison websites ought to give you everything you need to find a powerful generative pre-trained language model for your application.

What are the Currently-Available Large Language Models?

Okay! Now that we’ve worked through all this background material, let’s turn to discussing some of the major LLMs that are available today. We make no promises about these entries being comprehensive (and even if they were, there’d be new models out next week), but it should be sufficient to give you an idea as to the range of options you have.

ChatGPT and GPT

Obviously, the titan in the field is OpenAI’s ChatGPT, which is really just a version of GPT that has been fine-tuned through reinforcement learning from human feedback to be especially good at sustained dialogue.

ChatGPT and GPT have been used in many domains, including customer service, question answering, and many others.


LLaMA

In February of 2023, Meta’s AI team released its Large Language Model Meta AI, or LLaMA. At 65 billion parameters it is not quite as big as GPT, and this is intentional, as its purpose is to aid researchers who may not have the budget or expertise required to provision a behemoth LLM.


LaMDA

Like GPT-4, Google’s LaMDA is based on the transformer architecture and is aimed squarely at dialogue. It is able to converse on a nearly infinite number of subjects, and from the beginning, the Google team has focused on having LaMDA produce interesting responses that are nevertheless free of abusive and harmful language.


Megatron-Turing NLG (MT-NLG)

The Megatron-Turing Natural Language Generation (MT-NLG) model from Nvidia sports a staggering half-trillion (530 billion) parameters, and excels at “…Completion prediction, Reading comprehension, Commonsense reasoning, Natural language inferences, Word sense disambiguation,” and more.


StableLM

StableLM is a lightweight, open-source language model built by Stability AI. It’s trained on a new dataset built on “The Pile”, which is itself made up of over 20 smaller, high-quality datasets that together amount to over 825 GB of natural language.


GPT4All

What would you get if you trained an LLM “…on a massive curated corpus of assistant interactions, which included word problems, multi-turn dialogue, code, poems, songs, and stories,” then released it under an Apache 2.0 license? The answer is GPT4All, an open-source model whose purpose is to encourage research into what these technologies can accomplish.


Alpaca

The Alpaca LLM project developed by Stanford is designed around following instructions. As things stand, Alpaca isn’t considered safe yet, so it is intended to be used by research teams exploring the frontiers of LLMs.


BLOOM

The BigScience Large Open-Science Open-Access Multilingual Language Model (BLOOM) was released in late 2022. The team that put it together consisted of more than a thousand researchers from all over the world, and unlike the other models on this list, it’s specifically meant to be interpretable.


GATO

DeepMind is one of the leading players advancing the frontiers of AI, and its GATO model is correspondingly remarkable. Like GPT-4, GATO is multimodal, meaning it can work with text, images, and games, and can even control a robot arm.

Pathways Language Model (PaLM)

Like LaMDA, PaLM is from Google, and it is also enormous (540 billion parameters). It excels at many language-related tasks, and became famous for producing impressive explanations of tricky jokes.


Claude

Anthropic’s Claude is billed as a “next-generation AI assistant.” It’s not known how big the model is, but it does come in two modes: the full Claude, and Claude Instant, which is faster but produces lower-quality responses.


Now, let’s turn to some common sources of confusion that arise when comparing current LLMs.

Overcoming the Limitations of Large Language Models

Large language models are remarkable tools, but they nevertheless suffer from some well-known limitations. They tend to hallucinate facts, for example, sometimes fail at basic arithmetic, and can get lost in the course of lengthy conversations.

Overcoming the limitations of large language models is mostly a matter of fine-tuning and monitoring them. The fine-tuning data you use must be carefully curated in order to cover basic failure modes, and you must have a robust means of checking on their output in case they go off the rails somewhere along the line.

What are the Best Large Language Models?

Having read all of the foregoing content, it’s natural to wonder if there’s a single model that best suits your enterprise. The answer is probably “yes”, but which model is ultimately the best fit for you depends a lot on the specifics. You’ll have to think about whether you want an open-source model or you’re content with hitting an API, whether your use case is outside the scope of ChatGPT and better handled with a bespoke model, etc.

Choosing Among the Current Large Language Models

With all the different LLMs on offer, it’s hard to narrow the search down to the one that’s best for you. By carefully weighing the different metrics we’ve discussed in this article, you can choose an LLM that meets your needs with as little hassle as possible.

Another way to minimize your headaches is to use an industry-leading solution that works out of the box to deliver world-class functionality. That’s exactly what we’re achieving here at Quiq. Schedule a demo to see how our conversational AI platform can help you build a forward-facing contact center.

Contact Center Managers: What Do LLMs Mean For You?

Whether it’s quantum computing, the blockchain, or generative AI, whenever a promising new technology emerges, forward-thinking people begin looking for a way to use it.

And this is a completely healthy response. It’s through innovation that the world moves forward, but great ideas don’t mean much if there aren’t people like contact center managers who use them to take their operations to the next level.

Today, we’re going to talk about what large language models (LLMs) like ChatGPT mean for contact centers. After briefly reviewing how LLMs work we’ll discuss the way they’re being used in contact centers, how those centers are changing as a result, and some things that contact center managers need to look out for when utilizing generative AI.

What are Large Language Models?

As their name suggests, LLMs are large, they’re focused on language, and they’re machine-learning models.

It’s our view that the best way to tackle these three properties is in reverse order, so we’ll start with the fact that LLMs are enormous neural networks trained via self-supervised learning. These neural networks effectively learn a staggeringly complex function that captures the statistical properties of human language well enough for them to generate their own.

Speaking of human language, LLMs like ChatGPT are pre-trained generative models focused on learning from and creating text. This distinguishes them from other kinds of generative AI, which might be focused on images, videos, speech, music, and proteins (yes, really.)

Finally, LLMs are really big. As with other terms like “big data”, no one has a hard-and-fast rule for figuring out when you’ve gone from “language model” to “large language model” – but with billions of internal parameters, it’s safe to say that an LLM is a few orders of magnitude bigger than anything you’re likely to build outside of a world-class engineering team.
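The idea of “capturing the statistical properties of human language well enough to generate it” can be made concrete with a toy sketch. The bigram model below is an illustrative miniature, not how a real LLM is built – but the core move (learn which tokens follow which, then predict the next one) is the same in spirit:

```python
from collections import defaultdict, Counter

# A toy bigram "language model": it tallies which word tends to follow
# which, then generates text from those statistics. Real LLMs do the
# same thing in spirit, but with billions of parameters and far richer
# context than a single preceding word.
corpus = (
    "the customer asked a question and the agent answered the question "
    "then the customer thanked the agent"
).split()

# Count how often each word follows each other word.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word: str) -> str:
    """Return the word most often seen after `word` in training."""
    return follows[word].most_common(1)[0][0]

def generate(start: str, length: int = 6) -> str:
    """Greedily extend a sequence using the learned statistics."""
    out = [start]
    for _ in range(length):
        out.append(predict_next(out[-1]))
    return " ".join(out)

print(predict_next("the"))   # a word that often followed "the"
print(generate("the"))       # a short, statistically plausible sequence
```

An LLM replaces these simple counts with a deep neural network, which is what lets it track meaning across whole paragraphs rather than a single preceding word.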

How can Large Language Models be Used in Contact Centers?

Since they’re so good at parsing and creating natural language, LLMs are an obvious choice for enterprises where there’s a lot of back-and-forth text exchanged, perhaps while, say, resolving issues or answering questions.

And for this reason, LLMs are already being used by contact center managers to make their agents more productive (more on this shortly).

To be more concrete, we turned up a few specific places where LLMs can be leveraged by contact center managers most effectively.

Answering questions: Even with world-class documentation, there will inevitably be customers who are having an issue they want help with. Though ChatGPT won’t be able to answer every such question, it can handle a lot of them, especially if you’ve fine-tuned it on your documentation.

Streamlining onboarding: For more or less the same reason, ChatGPT can help you onboard new hires. Employees learning the ropes will also be confused about parts of your technology and your process, and ChatGPT can help them find what they need more quickly.

Summarizing emails and articles: It might be possible for a team of five to be intimately familiar with what everyone else is doing, but any more than this and there will inevitably be things happening that are beyond their purview. By summarizing articles, tickets, email or Slack threads, etc., ChatGPT can help everyone stay reasonably up-to-date without having to devote hours every day to reading.

Issue prioritization: Not every customer question or complaint is equally important, and issues have to be prioritized before being handed off to contact center agents. ChatGPT can aid in this process, especially if it’s part of a broader machine-learning pipeline built for this kind of classification.

Translation: If you’re lucky enough to have a global audience, there will almost certainly be users who don’t have a firm grasp of English. Though there are tools like Google Translate that do a great job of handling translation tasks, ChatGPT often does an even better job.
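To make the issue-prioritization idea above concrete, here is a minimal, hypothetical triage sketch. The keyword scorer is a crude stand-in for the step where an LLM or a dedicated classifier would actually assign the priority; the term weights and ticket texts are invented for illustration:

```python
# Hypothetical ticket-triage sketch: score incoming messages and route
# the most urgent ones to agents first. The keyword scorer here is a
# crude stand-in for the step where an LLM or a dedicated classifier
# would assign the priority.
URGENT_TERMS = {"outage": 3, "refund": 2, "cancel": 2, "broken": 1, "slow": 1}

def priority_score(message: str) -> int:
    """Sum urgency-keyword weights found in the message."""
    words = [w.strip(".,!?") for w in message.lower().split()]
    return sum(URGENT_TERMS.get(w, 0) for w in words)

def triage(tickets: list[str]) -> list[str]:
    """Return tickets ordered from most to least urgent."""
    return sorted(tickets, key=priority_score, reverse=True)

tickets = [
    "How do I change my avatar?",
    "Site outage, checkout is broken for all users",
    "Please cancel my subscription and issue a refund",
]
for t in triage(tickets):
    print(priority_score(t), t)
```

The value of slotting an LLM into the scoring step is that it can weigh context and tone, not just keywords – but the surrounding pipeline (score, sort, route) looks much the same.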

What are Large Language Models for Customer Service?

Large language models are ideally suited for tasks that involve a great deal of working with text. Because contact center agents spend so much time answering questions and resolving customer issues, LLMs are a technology that can make them far more productive. ChatGPT excels at tasks like question answering, summarization, and language translation, which is why LLMs are already changing the way contact centers function.

How is Generative AI Changing Contact Centers?

The fear that advances in AI will put human workers out of a job has a long and storied pedigree. Still, thus far the march of technological progress has tended to increase the number (and remuneration) of available jobs on the market.

Far from rendering human analysts obsolete, personal computers are now a major and growing source of new work (though, we confess, much less of it is happening on typewriters than before.)

Nevertheless, once people got a look at what ChatGPT can do there arose a fresh surge of worry over whether, this time, the robots were finally going to take all of our jobs.

Wanting to know how generative pre-trained language models have actually impacted the functioning of contact centers, Erik Brynjolfsson, Danielle Li, and Lindsey R. Raymond looked at data from some 5,000 customer support agents using one such model in their day-to-day work.

Their paper, “Generative AI at Work”, found that generative AI had led to a marked increase in productivity, especially among the newest, least-knowledgeable, and lowest-performing workers.

The authors advanced the remarkable hypothesis that this might stem from the fact that LLMs are good at internalizing and disseminating the hard-won tacit knowledge of the best workers. Those top performers didn’t get much out of generative AI, in other words, precisely because they already had what they needed to perform well; but some fraction of their skill – such as how to phrase responses delicately to avoid offending irate customers – was incorporated into the LLM, where it became more accessible to less-skilled workers than it had been when it was locked away in the brains of their high-performing colleagues.

What’s more, the organizations studied also changed as a result. Employees (especially lower-skilled ones) were generally more satisfied, less prone to burnout, and less likely to leave. Turnover was reduced, and customers escalated calls to supervisors less frequently.

Now, we hasten to add that of course this is just one study, and we’re in the early days of the generative AI revolution. No one can say with certainty what the long-term impact will be. Still, these are extremely promising early results, and lend credence to the view that generative AI will do a lot to improve the way contact centers onboard new hires, resolve customer issues, and function overall.

What are the Dangers of Using ChatGPT for Customer Service?

We’ve been singing the praises of ChatGPT and talking about all the ways in which it’s helping contact center managers run a tighter ship.

But, as with every technological advance stretching clear back to the discovery of fire, there are downsides. To help you better use generative AI, we’ll spend the next few sections talking about some characteristic failure modes you should be looking out for.


Hallucination

By now, it’s fairly common knowledge that ChatGPT will just make things up. This is a consequence of the way LLMs like ChatGPT are trained. Remember, the model doesn’t contain a little person inside of it who checks statements for accuracy; it’s just taking the tokens it has seen so far and predicting the tokens that will come next.

That means if you ask it for a list of book recommendations to study lepidoptery or the naval battles of the Civil War (we don’t know what you’re into), there’s a pretty good chance that the list it provides will contain a mix of real and fake books.

ChatGPT has been known to invent facts, people, papers (complete with citations), URLs, and plenty else.

If you’re going to have customers interacting with it, or you’re going to have your contact center agents relying on it in a substantial way, this is something you’ll need to be aware of.
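One lightweight guardrail is to verify anything checkable in a model’s output before it reaches a customer. The sketch below flags cited URLs whose domains aren’t on a known allowlist – a simple, hypothetical example (the domain names and answer text are invented), and it won’t catch every hallucination, but it blocks the common failure of invented links:

```python
import re

# Hypothetical guardrail: before a model-generated answer reaches a
# customer, check that any URLs it cites belong to a known allowlist.
KNOWN_DOMAINS = {"docs.example.com", "help.example.com"}

URL_RE = re.compile(r"https?://([^/\s]+)")

def unverified_urls(answer: str) -> list[str]:
    """Return any cited domains that aren't on the allowlist."""
    return [d for d in URL_RE.findall(answer) if d not in KNOWN_DOMAINS]

answer = (
    "See https://docs.example.com/setup and also "
    "https://totally-real-manual.example.net/ch3 for details."
)
print(unverified_urls(answer))  # domains the model may have invented
```

The same pattern extends to other verifiable claims – order numbers, product names, citation titles – anywhere you can check the model’s output against a source of truth.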

Degraded Performance

ChatGPT is remarkably performant, but it’s still just a machine learning model and machine learning models are known to suffer from model degradation.

This term refers to gradual or precipitous declines in model performance over time. There are technical reasons why this occurs, but from your perspective, you need to understand that the work has only begun once a model has been trained and put into production.

But you’re also not out of the woods if you’re accessing ChatGPT via an API, because you have just as little visibility into what’s happening on OpenAI’s engineering teams as the rest of us do.

If OpenAI releases an update you might suddenly find that ChatGPT fails in unusual ways or trips over tasks it was handling very well last week. You’ll need to have robust monitoring in place so that you catch these issues if they arise, as well as an engineering team able to address the root cause.

Model degradation often stems from issues with the underlying data. This means that if you’ve e.g. trained ChatGPT to answer questions you might have to assemble new data for it to train on, a process that takes time and money and should be budgeted for.
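What does that monitoring look like in practice? At minimum, track a quality metric over time and alert when it slips. The sketch below is a deliberately simple, hypothetical example (the metric and numbers are invented): it compares a recent window of daily scores against the historical baseline:

```python
from statistics import mean

# Hypothetical monitoring sketch: track a daily quality metric for your
# model (e.g. fraction of answers rated helpful) and flag when recent
# performance drops well below the historical baseline.
def degraded(history: list[float], window: int = 3, tolerance: float = 0.1) -> bool:
    """Flag degradation when the mean of the last `window` scores falls
    more than `tolerance` below the mean of the earlier history."""
    baseline, recent = history[:-window], history[-window:]
    return mean(recent) < mean(baseline) - tolerance

daily_helpfulness = [0.92, 0.91, 0.93, 0.90, 0.78, 0.74, 0.71]
print(degraded(daily_helpfulness))  # the recent drop trips the alarm
```

A production setup would use more robust statistics and alerting, but even a check this simple is enough to notice that last week’s model is no longer this week’s model.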

Harassment and Bias

You could argue that harassment, bias, and harmful language are a kind of degraded performance, but they’re distinct and damaging enough to warrant their own section.

When Microsoft first released Sydney (the chatbot persona behind Bing Chat) it was cartoonishly unhinged. It would lie, threaten, and manipulate users; in one case, it confessed both its love for a New York Times reporter and its desire to engineer dangerous viruses and ignite internecine arguments between people.

All this has gotten much better, of course, but the same behavior can manifest in subtler ways, especially if someone is deliberately trying to jailbreak a large language model.

Thanks to extensive public testing and iteration, the current versions of the technology are very good at remaining polite, avoiding stereotyping, etc. Nevertheless, we’re not aware of any way to positively assure that no bias, deceit, or nastiness will emerge from ChatGPT.

This is another place where you’ll have to carefully monitor your model’s output and make corrections as necessary.

Using LLMs in your Contact Center

If you’re running a contact center, you owe it to yourself to at least check out ChatGPT. Whether it makes sense for you will depend on your unique circumstances, but it’s a remarkable new technology that could help you make your agents more effective while reducing turnover.

Quiq offers a white-glove platform that makes it easy to leverage conversational AI. Schedule a demo with us to see how we can help you incorporate generative AI into your contact center today!