Technologies like large language models (LLMs) are amazing at rapidly generating polite text that helps solve a problem or answer a question, so they’re a great fit for the work done at contact centers.
But this doesn’t mean that using them is trivial or easy. There are many challenges associated with the ongoing management of an LLM assistant, including hallucinations and the emergence of bad behavior – and that’s not even mentioning the engineering prowess required to fine-tune and monitor these systems.
All of this must be borne in mind by contact center managers, and our aim today is to facilitate this process.
We’ll provide broad context by talking about some of the basic ways in which large language models are being used in business, discuss, setting up an LLM assistant, and then enumerate some of the specific steps that need to be taken in using them properly.
How Are LLMs Being Used in Science and Business?
First, let’s adumbrate some of the ways in which large language models are being utilized on the ground.
The most obvious way is by acting as a generative AI assistant. One of the things that so stunned early users of ChatGPT was its remarkable breadth in capability. It could be used to draft blog posts, web copy, translate between languages, and write or explain code.
This alone makes it an amazing tool, but it has since become obvious that it’s useful for quite a lot more.
One thing that businesses have been experimenting with is fine-tuning large language models like ChatGPT over their own documentation, turning it into a simple interface by which you can ask questions about your materials.
It’s hard to quantify precisely how much time contact center agents, engineers, or other people spend hunting around for the answer to a question, but it’s surely quite a lot. What if instead you could just, y’know, ask for what you want, in the same way that you do a human being?
Well, ChatGPT is a long way from being a full person, but when properly trained it can come close where question-answering is concerned.
Stepping back a little bit, LLMs can be prompt engineered into a number of useful behaviors, all of which redound to the benefit of the contact centers which use them. Imagine having an infinitely patient Socratic tutor that could help new agents get up to speed on your product and process, or crafting it into a powerful tool for brainstorming new product designs.
There have also been some promising attempts to extend the functionality of LLMs by making them more agentic – that is, by embedding them in systems that allow them to carry out more open-ended projects. AutoGPT, for example, pairs an LLM with a separate bot that hits the LLM with a chain of queries in the pursuit of some goal.
AssistGPT goes even further in the quest to augment LLMs by integrating them with a set of tools that allow them to achieve objectives involving images and audio in addition to text.
How to Set Up An LLM Assistant
Next, let’s turn to a discussion of how to set up an LLM assistant. Covering this topic fully is well beyond the scope of this article, but we can make some broad comments that will nevertheless be useful for contact center managers.
First, there’s the question of which large language model you should use. In the beginning, ChatGPT was pretty much the only foundation model on offer. Today, however, that situation has changed, and there are now foundation models from Anthropic, Meta, and many other companies.
One of the biggest early decisions you’ll have to make is whether you want to try and use an open-source model (for which the code and the model weights are freely available) or a close-source model (for which they are not).
If you go the closed-source route you’ll almost certainly be hitting the model over an API, feeding it your queries and getting its responses back. This is orders of magnitude simpler than provisioning an open-source model, but it means that you’ll also be beholden to the whims of some other company’s engineering team. They may update the model in unexpected ways, or simply go bankrupt, and you’ll be left with no recourse.
Using an open-source alternative, of course, means grabbing the other horn of the dilemma. You’ll have visibility into how the model works and will be free to modify it as you see fit, but this won’t be worth much unless you’re willing to devote engineering hours to the task.
Then, there’s the question of fine-tuning large language models. While ChatGPT and LLMs more generally are quite good on their own, having them answer questions about your product or respond in particular ways means modifying their behavior somehow.
Broadly speaking, there are two ways of doing this, which we’ve mentioned throughout: proper fine-tuning, and prompt engineering. Let’s dig into the differences.
Fine-tuning means showing the model many (i.e. several hundred) examples of the behaviors you want to see, which changes its internal weights and biases it towards those behaviors in the future.
Prompt engineering, on the other hand, refers to carefully structuring your prompts to elicit the desired behavior. These LLMs can be surprisingly sensitive to little details in the instructions they’re provided, and prompt engineers know how to phrase their requests in just the right way to get what they need.
There is also some middle ground between these approaches. “One-shot learning” is a form of prompt engineering in which the prompt contains a singular example of the desired behavior, while “few-shot learning” refers to including between three and five examples.
Contact center managers thinking about using LLMs will need to think about these implementation details. If you plan on only lightly using ChatGPT in your contact center, a basic course on prompt engineering might be all you need. If you plan on making it an integral part of your organization, however, that most likely means a fine-tuning pipeline and serious technical investment.
The Ongoing Management of an LLM
Having said all this, we can now turn to the day-to-day details of managing an LLM assistant.
Monitoring the Performance of an LLM
First, you’ll need to continuously monitor the model. As hard as it may be to believe given how perfect ChatGPT’s output often is, there isn’t a person somewhere typing the responses. ChatGPT is very prone to hallucinations, in which it simply makes up information, and LLMs more generally can sometimes fall into using harmful or abusive language if they’re prompted incorrectly.
This can be damaging to your brand, so it’s important that you keep an eye on the language created by the LLMs your contact center is using.
And of course, not even LLMs can obviate the need to track the all-import key performance indicators. So far, there’s been one major study on generative AI in contact centers that found they increased productivity and reduced turnover, but you’ll still want to measure customer satisfaction, average handle time, etc.
There’s always a temptation to jump on a shiny new technology (remember the blockchain?), but you should only be using LLMs if they actually make your contact center more productive, and the only way you can assess that is by tracking your figures.
Iterative Fine-Tuning and Training
We’ve already had a few things to say about fine-tuning and the related discipline of prompt engineering, and here we’ll build on those preliminary comments.
The big thing to bear in mind is that fine-tuning a large language model is not a one-and-done kind of endeavor. You’ll find that your model’s behavior will drift over time (the technical term is “model degradation”), and this means you will likely to have to periodically re-train it.
It’s also common to offer the model “feedback”, i.e. by ranking it’s responses or indicating when you did or did not like a particular output. You’ve probably heard of reinforcement learning through human feedback, which is one version of this process, but there are also others you can use.
Quality Assurance and Oversight
A related point is that your LLMs will need consistent oversight. They’re not going to voluntarily improve on their own (they’re algorithms with no personal initiative to speak of), so you’ll need to checking in routinely to make sure they’re performing well and that your agents are using them responsibly.
There are many parts to this, including checks on the models outputs and an audit process that allows you to track down any issues. If you suddenly see a decline in performance, for example, you’ll need to quickly figure out whether it’s isolated to one agent or part of a larger pattern. If it’s the former, was it a random aberration, or did the agent go “off script” in a way that caused the model to behave poorly?
Take another scenario, in which an end-user was shown inappropriate text generated by an LLM. In this situation, you’ll need to take a deeper look at your process. If there were agents interacting with this model, ask them why they failed to spot the problematic text and stop it being shown to a customer. Or, if it came from a mostly-automated part of your tech stack, you need to uncover the reasons for which your filters failed to catch it, and perhaps think about keeping humans more in the loop.
The Future of LLM Assistants
Though the future is far from certain, we tend to think that LLMs have left Pandora’s box for good. They’re incredibly powerful tools which are poised to transform how contact centers and other enterprises operate, and experiments so far have been very promising; for all these reasons, we expect that LLMs will become a steadily more important part of the economy going forward.
That said, the ongoing management of an LLM assistant is far from trivial. You need to be aware at all times of how your model is performing and how your agents are using it. Though it can make your contact center vastly more productive, it can also lead to problems if you’re not careful.
That’s where the Quiq platform comes in. Our conversational AI is some of the best that can be found anywhere, able to facilitate customer interactions, automate text-message follow-ups, and much more. If you’re excited by the possibilities of generative AI but daunted by the prospect of figuring out how TPUs and GPUs are different, schedule a demo with us today.