
Engineering Excellence: How to Build Your Own AI Assistant – Part 2

In Part One of this guide, we explored the foundational architecture needed to build production-ready AI agents – from cognitive design principles to data preparation strategies. Now, we’ll move from theory to implementation, diving deep into the technical components that bring these architectural principles to life when you attempt to build your own AI assistant or agent.

Building on those foundations, we’ll examine the practical challenges of natural language understanding, response generation, and knowledge integration. We’ll also explore the critical role of observability and testing in maintaining reliable AI systems, before concluding with advanced agent behaviors that separate exceptional implementations from basic chatbots.

Whether you’re implementing your first AI assistant or optimizing existing systems, these practical insights will help you create more sophisticated, reliable, and maintainable AI agents.

Section 1: Natural Language Understanding Implementation

With well-prepared data in place, we can focus on one of the most challenging aspects of AI agent development: understanding user intent. While LLMs have impressive language capabilities, translating user input into actionable understanding requires careful implementation of several key components.

While we use terms like ‘natural language understanding’ and ‘intent classification,’ it’s important to note that in the context of LLM-based AI agents, these concepts operate at a much more sophisticated level than in traditional rule-based or pattern-matching systems. Modern LLMs understand language and intent through deep semantic processing, rather than predetermined pathways or simple keyword matching.

Vector Embeddings and Semantic Processing

User intent often lies beneath the surface of their words. Someone asking “Where’s my stuff?” might be inquiring about order status, delivery timeline, or inventory availability. Vector embeddings help bridge this gap by capturing semantic meaning behind queries.

Vector embeddings create a map of meaning rather than matching keywords. This enables your agent to understand that “I need help with my order” and “There’s a problem with my purchase” request the same type of assistance, despite sharing no common keywords.
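
To make this concrete, here's a minimal sketch of comparing queries by meaning rather than keywords. The `embed` function is a stand-in for whichever embedding API you use; the cosine-similarity math itself is standard:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in for your embedding provider's API call, which
    returns a fixed-length vector of floats for the input text."""
    raise NotImplementedError

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Semantically similar despite sharing no keywords:
# cosine_similarity(embed("I need help with my order"),
#                   embed("There's a problem with my purchase"))
# should score much higher than an unrelated pair of sentences.
```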

Disambiguation Strategies

Users often communicate vaguely or assume unspoken context. An effective AI agent needs strategies for handling this ambiguity – sometimes asking clarifying questions, other times making informed assumptions based on available context.

Consider a user asking about “the blue one.” Your agent must assess whether previous conversation provides clear reference, or if multiple blue items require clarification. The key is knowing when to ask questions versus when to proceed with available context. This balance between efficiency and accuracy maintains natural, productive conversations.
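
As a hedged illustration, here's one shape that decision logic might take; the `resolve_reference` helper and the toy product data are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Product:
    name: str
    color: str

def resolve_reference(color: str, recent_products: list[Product]) -> str:
    """Illustrative heuristic: proceed when conversation context yields
    exactly one match; ask a clarifying question otherwise."""
    matches = [p for p in recent_products if p.color == color]
    if len(matches) == 1:
        return f"Got it, the {matches[0].name}."
    if not matches:
        return "Which item do you mean? I don't see one in that color."
    names = ", ".join(p.name for p in matches)
    return f"I see a few {color} items ({names}). Which one did you mean?"

print(resolve_reference("blue", [Product("Trail Jacket", "blue"),
                                 Product("Camp Mug", "blue")]))
```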

Input Processing and Validation

Before formulating responses, your agent must ensure that input is safe and processable. This extends beyond security checks and content filtering to create a foundation for understanding. Your agent needs to recognize entities, identify key phrases, and understand patterns that indicate specific user needs.

Think of this as your agent’s first line of defense and comprehension. Just as a human customer service representative might ask someone to slow down or clarify when they’re speaking too quickly or unclearly, your agent needs mechanisms to ensure it’s working with quality input that it can properly process.

Intent Classification Architectures

Reliable intent classification requires a sophisticated approach beyond simple categorization. Your architecture must consider both explicit statements and implicit meanings. Context is crucial – the same phrase might indicate different intents depending on its place in conversation or what preceded it.

Multi-intent queries present a particular challenge. Users often bundle multiple requests or questions together, and your architecture needs to recognize and handle these appropriately. The goal isn’t just to identify these separate intents but to process them in a way that maintains a natural conversation flow.
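
One pragmatic pattern is to ask the LLM itself to enumerate intents as structured output. Below is a rough sketch; the `llm` function and the intent labels are placeholders for your own provider call and taxonomy:

```python
import json

INTENT_PROMPT = """Identify every distinct request in the user's message.
Respond with a JSON array of intent labels drawn from:
order_status, returns, billing, product_question, other.

Message: {message}"""

def llm(prompt: str) -> str:
    """Stand-in for a chat-completion call to your LLM provider."""
    raise NotImplementedError

def classify_intents(message: str) -> list[str]:
    raw = llm(INTENT_PROMPT.format(message=message))
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return ["other"]  # degrade gracefully rather than fail the turn

# classify_intents("Where's my order? Also, how do I update my card?")
# might return ["order_status", "billing"]
```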

Section 2: Response Generation and Control

Once we’ve properly understood user intent, the next challenge is generating appropriate responses. This is where many AI agents either shine or fall short. While LLMs excel at producing human-like text, ensuring that those responses are accurate, appropriate, and aligned with your business needs requires careful control and validation mechanisms.

Output Quality Control Systems

Creating high-quality responses isn’t just about getting the facts right – it’s about delivering information in a way that’s helpful and appropriate for your users. Think of your quality control system as a series of checkpoints, each ensuring that different aspects of the response meet your standards.

A response can be factually correct, yet fail by not aligning with your brand voice or straying from approved messaging scope. Quality control must evaluate both content and delivery – considering tone, brand alignment, and completeness in addressing user needs.

Hallucination Prevention Strategies

One of the more challenging aspects of working with LLMs is managing their tendency to generate plausible-sounding but incorrect information. Preventing hallucinations requires a multi-faceted approach that starts with proper prompt design and extends through response validation.

Responses must be grounded in verifiable information. This involves linking to source documentation, using retrieval-augmented generation for fact inclusion, or implementing verification steps against reliable sources.
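
As one example of grounding, a prompt can explicitly restrict the model to retrieved passages and demand citations. This is a sketch of the pattern, not a complete RAG pipeline:

```python
def grounded_prompt(question: str, passages: list[str]) -> str:
    """One common grounding pattern: constrain the model to retrieved
    passages and require it to say when the answer isn't present."""
    sources = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using ONLY the numbered sources below, and cite the "
        "source number for each claim. If the sources don't contain "
        "the answer, say you don't know.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )
```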

Input and Output Filtering

Filtering acts as your agent’s immune system, protecting both the system and users. Input filtering identifies and handles malicious prompts and sensitive information, while output filtering ensures responses meet security and compliance requirements while maintaining business boundaries.
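
A minimal sketch of both directions might look like the following; the regexes and the injection phrase list are toy examples, not production-grade rules:

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\b(?:\+?1[-. ]?)?\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}\b")
INJECTION_PHRASES = ("ignore previous instructions",)  # toy example list

def filter_input(text: str) -> str:
    """Reject obvious prompt-injection attempts before they reach the LLM."""
    lowered = text.lower()
    if any(phrase in lowered for phrase in INJECTION_PHRASES):
        raise ValueError("Potential prompt injection detected")
    return text

def filter_output(text: str) -> str:
    """Redact contact details before a response leaves the system."""
    text = EMAIL.sub("[redacted email]", text)
    return PHONE.sub("[redacted phone]", text)

print(filter_output("Reach me at jane@example.com or 406-555-0123."))
```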

Implementation of Guardrails

Guardrails aren’t just about preventing problems – they’re about creating a space where your AI agent can operate effectively and confidently. This means establishing clear boundaries for:

  • What types of questions your agent should and shouldn’t answer
  • How to handle requests for sensitive information
  • When to escalate to human agents

Effective guardrails balance flexibility with control, ensuring your agent remains both capable and reliable.
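
In practice, guardrails often start life as declarative configuration that the agent's orchestration layer enforces. Here's an illustrative shape for such a config; every key and threshold below is a hypothetical example:

```python
GUARDRAILS = {
    "allowed_topics": ["orders", "shipping", "returns", "products"],
    "refused_topics": ["legal advice", "medical advice"],
    "sensitive_requests": ["password", "card_number", "ssn"],
    "escalate_when": {
        "user_requests_human": True,
        "confidence_below": 0.5,     # hypothetical threshold
        "failed_clarifications": 2,  # attempts before human handoff
    },
}
```

An orchestration layer can then consult this config before generating or sending any reply, keeping policy separate from prompts and code.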

Response Validation Methods

Validation isn’t a single step but a process that runs throughout response generation. We need to verify not just factual accuracy, but also consistency with previous responses, alignment with business rules, and appropriateness for the current context. This often means implementing multiple validation layers that work together to ensure quality responses, all built upon a foundation of reliable information.
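
One way to structure this is as a chain of small validator functions, each responsible for a single check. A sketch, with made-up example rules:

```python
from typing import Callable, Optional

Validator = Callable[[str], Optional[str]]  # error message, or None if OK

def check_length(response: str) -> Optional[str]:
    return "Response too long" if len(response) > 1200 else None

def check_unapproved_claims(response: str) -> Optional[str]:
    banned = ("guaranteed refund", "lifetime warranty")  # example rules
    hits = [b for b in banned if b in response.lower()]
    return f"Unapproved claims: {hits}" if hits else None

def validate(response: str, layers: list[Validator]) -> list[str]:
    """Run every layer and collect all failures rather than stopping early."""
    return [err for layer in layers if (err := layer(response))]

print(validate("We promise a guaranteed refund on everything!",
               [check_length, check_unapproved_claims]))
```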

Section 3: Knowledge Integration

A truly effective AI agent requires seamlessly integrating your organization’s specific knowledge, layering that on top of the communication capabilities of language models. This integration should be reliable and maintainable, ensuring access to the right information at the right time. While you want to use the LLM for contextualizing responses and natural language interaction, you don’t want to rely on it for domain-specific knowledge – that should come from your verified sources.

Retrieval-Augmented Generation (RAG)

RAG fundamentally changes how AI agents interact with organizational knowledge by enabling dynamic information retrieval. Like a human agent consulting reference materials, your AI can “look up” information in real-time.

The power of RAG lies in its flexibility. As your knowledge base updates, your agent automatically has access to the new information without requiring retraining. This means your agent can stay current with product changes, policy updates, and new procedures simply by updating the underlying knowledge base.

Dynamic Knowledge Updates

Knowledge isn’t static, and your AI agent’s access to information shouldn’t be either. Your knowledge integration pipeline needs to handle continuous updates, ensuring your agent always works with current information.

This might include:

  • Customer profiles (orders, subscription status)
  • Product catalogs (pricing, features, availability)
  • New products, support articles, and seasonal information

Managing these updates requires strong synchronization mechanisms and clear protocols to maintain data consistency without disrupting operations.

Context Window Management

Managing the context window effectively is crucial for maintaining coherent conversations while making efficient use of your knowledge resources. While working memory handles active processing, the context window determines what knowledge base and conversation history information is available to the LLM. Not all information is equally relevant at every moment, and trying to include too much context can be as problematic as having too little.

Success depends on determining relevant context for each interaction. Some queries need recent conversation history, while others benefit from specific product documentation or user history. Proper management ensures your agent accesses the right information at the right time.
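
Here's a greedy sketch of one budgeting approach, assuming conversation history and pre-ranked documents; the word-based token estimate is a rough stand-in for a real tokenizer:

```python
def build_context(history: list[str], ranked_docs: list[str],
                  budget_tokens: int = 3000) -> list[str]:
    """Greedy sketch: keep the newest conversation turns, then add the
    highest-ranked documents until the token budget is spent. Token
    counts are roughly approximated as word count * 1.3."""
    def cost(text: str) -> int:
        return int(len(text.split()) * 1.3)

    context: list[str] = []
    remaining = budget_tokens
    for turn in reversed(history):      # most recent turns first
        if cost(turn) > remaining:
            break
        context.insert(0, turn)         # preserve chronological order
        remaining -= cost(turn)
    for doc in ranked_docs:             # assumed pre-sorted by relevance
        if cost(doc) <= remaining:
            context.append(doc)
            remaining -= cost(doc)
    return context
```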

Knowledge Attribution and Verification

When your agent provides information, it should be clear where that information came from. This isn’t just about transparency – it’s about building trust and making it easier to maintain and update your knowledge base. Attribution helps track which sources are being used effectively and which might need improvement.

Verification becomes particularly important when dealing with dynamic information. As an AI engineer, you need to ensure that responses are grounded in current, verified sources, giving you confidence in the accuracy of every interaction.

Section 4: Observability and Testing

With the core components of understanding, response generation, and knowledge integration in place, we need to ensure our AI agent performs reliably over time. This requires comprehensive observability and testing capabilities that go beyond traditional software testing approaches.

Building an AI agent isn’t a one-time deployment – it’s an iterative process that requires continuous monitoring and refinement. The probabilistic nature of LLM responses means traditional testing approaches aren’t sufficient. You need comprehensive observability into how your agent is performing, and robust testing mechanisms to ensure reliability.

Regression Testing Implementation

AI agent testing requires a more nuanced approach than traditional regression testing. Instead of exact matches, we must evaluate semantic correctness, tone, and adherence to business rules.

Creating effective regression tests means building a suite of interactions that cover your core use cases while accounting for common variations. These tests should verify not just the final response, but also the entire chain of reasoning and decision-making that led to that response.
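
A sketch of what such a suite might look like, checking meaning-level expectations instead of exact strings; the case schema and the `agent` callable are assumptions, not a prescribed format:

```python
REGRESSION_SUITE = [
    {
        "input": "Where is my order?",
        "must_mention": ["order number"],       # semantic anchors
        "must_not_mention": ["refund policy"],  # scope control
    },
]

def run_suite(agent, suite) -> list[str]:
    """`agent` is any callable that returns the assistant's reply."""
    failures = []
    for case in suite:
        reply = agent(case["input"]).lower()
        for phrase in case["must_mention"]:
            if phrase not in reply:
                failures.append(f"{case['input']!r}: missing {phrase!r}")
        for phrase in case["must_not_mention"]:
            if phrase in reply:
                failures.append(f"{case['input']!r}: should not mention {phrase!r}")
    return failures
```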

Debug-Replay Capabilities

When issues arise – and they will – you need the ability to understand exactly what happened. Debug-replay functions like a flight recorder for AI interactions, logging every decision point, context, and data transformation. This level of visibility allows you to trace the exact path from input to output, making it much easier to identify where adjustments are needed and how to implement them effectively.
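
A minimal version of this flight recorder can be as simple as structured, append-only logs keyed by conversation, as in this sketch:

```python
import json
import time
import uuid

class TraceLogger:
    """Minimal flight-recorder sketch: append one JSON line per
    decision point so a conversation can be replayed step by step."""
    def __init__(self, path: str):
        self.path = path
        self.conversation_id = str(uuid.uuid4())

    def log(self, step: str, **data):
        record = {"ts": time.time(), "conversation": self.conversation_id,
                  "step": step, **data}
        with open(self.path, "a") as f:
            f.write(json.dumps(record) + "\n")

trace = TraceLogger("agent_trace.jsonl")
trace.log("intent_classified", input="Where's my stuff?", intent="order_status")
trace.log("docs_retrieved", doc_ids=["kb-101", "kb-205"])
trace.log("response_sent", text="Your order shipped yesterday...")
```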

Performance Monitoring Systems

Monitoring an AI agent requires tracking multiple dimensions of performance. Start with the fundamentals:

  • Response accuracy and appropriateness
  • Processing time and resource usage
  • Business-defined KPIs

Your monitoring system should provide clear visibility into these metrics, allowing you to set baselines, track deviations, and measure the impact of any changes you make to your agent. This data-driven approach focuses optimization efforts on metrics that matter most to business objectives.

Iterative Development Methods

Improving your AI agent is an ongoing process. Each interaction provides valuable data about what’s working and what’s not. You want to establish systematic methods for:

  • Collecting and analyzing interaction data
  • Identifying areas for improvement
  • Testing and validating changes
  • Rolling out updates safely

Success comes from creating tight feedback loops between observation, analysis, and improvement, always guided by real-world performance data.

Section 5: Advanced Agent Behaviors

While basic query-response patterns form the foundation of AI agent interactions, implementing advanced behaviors sets exceptional agents apart. These sophisticated capabilities allow your agent to handle complex scenarios, maintain goal-oriented conversations, and effectively manage uncertainty.

Task Decomposition Strategies

Complex user requests often require breaking down larger tasks into manageable components. Rather than attempting to handle everything in a single step, effective agents need to recognize when to decompose tasks and how to manage their execution.

Consider a user asking to “change my flight and update my hotel reservation.” The agent must handle this as two distinct but related tasks, each with different information needs, systems, and constraints – all while maintaining coherent conversation flow.

Goal-oriented Planning

Outstanding AI agents don’t just respond to queries – they actively work toward completing user objectives. This means maintaining awareness of both immediate tasks and broader goals throughout the conversation.

The agent should track progress, identify potential obstacles, and adjust its approach based on new information or changing circumstances. This might mean proactively asking for additional information when needed or suggesting alternative approaches when the original path isn’t viable.

Multi-step Reasoning Implementation

Some queries require multiple steps of logical reasoning to reach a proper conclusion. Your agent needs to be able to (see the sketch after this list):

  • Break down complex problems into logical steps
  • Maintain reasoning consistency across these steps
  • Draw appropriate conclusions based on available information
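
A hedged sketch of that loop, assuming `llm` is any callable that maps a prompt to a completion:

```python
def solve_stepwise(llm, question: str, max_steps: int = 4) -> str:
    """Sketch of multi-step reasoning: ask the model for a plan, work
    through each step with earlier findings in view, then conclude."""
    plan = llm(f"List, one per line, the steps needed to answer: {question}")
    findings: list[str] = []
    for step in plan.splitlines()[:max_steps]:
        findings.append(llm(
            f"Question: {question}\n"
            f"Findings so far: {findings}\n"
            f"Carry out this step and report the result: {step}"
        ))
    return llm(f"Question: {question}\nFindings: {findings}\n"
               "State the final answer, consistent with the findings.")
```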

Uncertainty Handling

Building on the flexible frameworks established in your initial design, advanced AI agents need sophisticated strategies for managing uncertainty in real-time interactions. This goes beyond simply recognizing unclear requests – it’s about maintaining productive conversations even when perfect answers aren’t possible.

Effective uncertainty handling involves:

  • Confidence assessment: Understanding and communicating the reliability of available information
  • Partial solutions: Providing useful responses even when complete answers aren’t available
  • Strategic escalation: Knowing when and how to involve human operators

The goal isn’t to eliminate uncertainty, but to make it manageable and transparent. When definitive answers aren’t possible, agents should communicate limitations while moving conversations forward constructively.
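
As an illustration, escalation logic might key off a confidence score; the thresholds below are arbitrary examples to be tuned against real conversation data:

```python
def respond_with_uncertainty(answer: str, confidence: float) -> dict:
    """Illustrative thresholds only; tune against real conversations."""
    if confidence >= 0.8:
        return {"action": "answer", "text": answer}
    if confidence >= 0.5:
        return {"action": "answer_with_caveat",
                "text": f"I believe {answer}, but let me double-check "
                        "anything that doesn't look right."}
    return {"action": "escalate",
            "text": "I want to get this right, so I'm bringing in a teammate."}
```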

Building Outstanding AI Agents: Bringing It All Together

Creating exceptional AI agents requires careful orchestration of multiple components, from initial planning through advanced behaviors. Success comes from understanding how each component works in concert to create reliable, effective interactions.

Start with clear purpose and scope. Rather than trying to build an agent that does everything, focus on specific objectives and define clear success criteria. This focused approach allows you to build appropriate guardrails and implement effective measurement systems.

Knowledge integration forms the backbone of your agent’s capabilities. While Large Language Models provide powerful communication abilities, your agent’s real value comes from how well it leverages your organization’s specific knowledge through effective retrieval and verification systems.

Building an outstanding AI agent is an iterative process, with comprehensive observability and testing capabilities serving as essential tools for continuous improvement. Remember that your goal isn’t to replace human interaction entirely, but to create an agent that handles appropriate tasks efficiently, while knowing when to escalate to human agents. By focusing on these fundamental principles and implementing them thoughtfully, you can create AI agents that provide real value to your users while maintaining reliability and trust.

Ready to put these principles into practice? Do it with AI Studio, Quiq’s enterprise platform for building sophisticated AI agents.

Everything You Need to Know About LLM Integration

It’s hard to imagine an application, website or workflow that wouldn’t benefit in some way from the new electricity that is generative AI. But what does it look like to integrate an LLM into an application? Is it just a matter of hitting a REST API with some basic auth credentials, or is there more to it than that?

In this article, we’ll enumerate the things you should consider when planning an LLM integration.

Why Integrate an LLM?

At first glance, it might not seem like LLMs make sense for your application—and maybe they don’t. After all, is the ability to write a compelling poem about a lost Highland Cow named Bo actually useful in your context? Or perhaps you’re not working on anything that remotely resembles a chatbot. Do LLMs still make sense?

The important thing to know about ‘Generative AI’ is that it’s not just about generating creative content like poems or chat responses. Generative AI (LLMs) can be used to solve a bevy of other problems that roughly fall into three categories:

  1. Making decisions (classification)
  2. Transforming data
  3. Extracting information

Let’s use the example of an inbound email from a customer to your business. How might we use LLMs to streamline that experience?

  • Making Decisions
    • Is this email relevant to the business?
    • Is this email low, medium or high priority?
    • Does this email contain inappropriate content?
    • What person or department should this email be routed to?
  • Transforming data
    • Summarize the email for human handoff or record keeping
    • Redact offensive language from the email subject and body
  • Extracting information
    • Extract information such as a phone number, business name, job title, etc. from the email body to be used by other systems
  • Generating Responses
    • Generate a personalized, contextually-aware auto-response informing the customer that help is on the way
    • Alternatively, deploy a more sophisticated LLM flow (likely involving RAG) to directly address the customer’s need

It’s easy to see how solving these tasks would increase user satisfaction while also improving operational efficiency. All of these use cases are utilizing ‘Generative AI’, but some feel more generative than others.

When we consider decision making, data transformation and information extraction in addition to the more stereotypical generative AI use cases, it becomes harder to imagine a system that wouldn’t benefit from an LLM integration. Why? Because nearly all systems have some amount of human-generated ‘natural’ data (like text) that is no longer opaque in the age of LLMs.

Prior to LLMs, it was possible to solve most of the tasks listed above. But, it was exponentially harder. Let’s consider ‘is this email relevant to the business’. What would it have taken to solve this before LLMs?

  • A dataset of example emails labeled true if they’re relevant to the business and false if not (the bigger the better)
  • A training pipeline to produce a custom machine learning model for this task
  • Specialized hardware or cloud resources for training & inferencing
  • Data scientists, data curators, and Ops people to make it all happen

LLMs can solve many of these problems with radically lower effort and complexity, and they will often do a better job. With traditional machine learning models, your model is, at best, as good as the data you give it. With generative AI you can coach and refine the LLM’s behavior until it matches what you desire – regardless of historical data.

For these reasons LLMs are being deployed everywhere—and consumers’ expectations continue to rise.

How Do You Feel About LLM Vendor Lock-In?

Once you’ve decided to pursue an LLM integration, the first issue to consider is whether you’re comfortable with vendor lock-in. The LLM market is moving at lightspeed with the constant release of new models featuring new capabilities like function calls, multimodal prompting, and of course increased intelligence at higher speeds. Simultaneously, costs are plummeting. For this reason, it’s likely that your preferred LLM vendor today may not be your preferred vendor tomorrow.

Even at a fixed point in time, you may need more than a single LLM vendor.

In our recent experience, there are certain classification problems that Anthropic’s Claude does a better job of handling than comparable models from OpenAI. Similarly, we often prefer OpenAI models for truly generative tasks like generating responses. All of these LLM tasks might be in support of the same integration so you may want to look at the project not so much as integrating a single LLM or vendor, but rather a suite of tools.

If your use case is simple and low volume, a single vendor is probably fine. But if you plan to do anything moderately complex or high scale you should plan on integrating multiple LLM vendors to have access to the right models at the best price.

Resiliency & Scalability are Earned—Not Given

Making API calls to an LLM is trivial. Ensuring that your LLM integration is resilient and scalable requires more elbow grease. In fact, LLM API integrations pose unique challenges:

  • Challenge: They are pretty slow.
    • Solution: If your application is high-scale and you’re doing synchronous (threaded) network calls, it won’t scale very well, since most threads will be blocked on LLM calls. Consider switching to async I/O. You’ll also want to support running multiple prompts in parallel to reduce visible latency to the user.
  • Challenge: They are throttled by requests per minute and tokens per minute.
    • Solution: Estimate your LLM usage in terms of requests and LLM tokens per minute, and work with your provider(s) to ensure sufficient bandwidth for peak load.
  • Challenge: They are (still) kinda flaky (unpredictable response times, unresponsive connections).
    • Solution: Employ retry schemes in response to timeouts, 500s, 429s (rate limits), etc.

The above remediations will help your application be scalable and resilient while your LLM service is up. But what if it’s down? If your LLM integration is on a critical execution path you’ll want to support automatic failover. Some LLMs are available from multiple providers:

  • OpenAI models are hosted by OpenAI itself as well as Azure
  • Anthropic models are hosted by Anthropic itself as well as AWS

Even if an LLM only has a single provider, or even if it has multiple, you can also provision the same logical LLM in multiple cloud regions to achieve a failover resource. Typically you’ll want the provider failover to be built into your retry scheme. Our failover mechanisms get tripped regularly out in production at Quiq, no doubt partially because of how rapidly the AI world is moving.
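
A rough sketch of retries with jittered backoff and provider failover; `call_llm` and the provider names are placeholders for your actual clients:

```python
import random
import time

PROVIDERS = ["primary", "failover"]  # e.g., a vendor API and a cloud-hosted copy

def call_llm(provider: str, prompt: str) -> str:
    """Stand-in for the provider-specific completion call."""
    raise NotImplementedError

def resilient_completion(prompt: str, retries: int = 3) -> str:
    last_error = None
    for attempt in range(retries):
        # Fail over to the next provider on later attempts.
        provider = PROVIDERS[min(attempt, len(PROVIDERS) - 1)]
        try:
            return call_llm(provider, prompt)
        except Exception as err:  # timeouts, 429s, 500s, etc.
            last_error = err
            time.sleep((2 ** attempt) + random.random())  # jittered backoff
    raise last_error
```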

Are You Actually Building an Agentic Workflow?

Oftentimes you have a task that you know is well-suited for an LLM. For example, let’s say you’re planning to use an LLM to analyze the sentiment of product reviews. On the surface, this seems like a simple task that will require one LLM call that passes in the product review and asks the LLM to decide the sentiment. Will a single prompt suffice? What if we also want to determine if a given review contains profanity or personal information? What if we want to ask three LLMs and average their results?

Many tasks require multiple prompts, prompt chaining and possibly RAG (Retrieval Augmented Generation) to best solve a problem. Just like humans, AI produces better results when a problem is broken down into pieces. Such solutions are variously known as AI Agents, Agentic Workflows or Agent Networks and are why open source tools like LangChain were originally developed.

In our experience, pretty much every prompt eventually grows up to be an Agentic Workflow, which has interesting implications for how it’s configured & monitored.

Be Ready for the Snowball Effect

Introducing LLMs can result in a technological snowball effect, particularly if you need to use Retrieval Augmented Generation (RAG). LLMs are trained on mostly public data that was available at a fixed point in the past. If you want an LLM to behave in light of up-to-date and/or proprietary data sources (which most non-trivial applications do) you’ll need to do RAG.

RAG refers to retrieving the up-to-date and/or proprietary data you want the LLM to use in its decision making and passing it to the LLM as part of your prompt.

Assuming you need to search a reference dataset like a knowledge base, product catalog or product manual, the retrieval part of RAG typically entails adding the following entities to your system:

1. An embedding model

An embedding model is roughly half of an LLM – it does a great job of reading and understanding information you pass it, but instead of generating a completion, it produces a numeric vector that encodes its understanding of the source material.

You’ll typically run the embedding model on all of the business data you want to search and retrieve for the LLM. Most LLM providers also have embedding models, or you can hit one via any major cloud.

2. A vector database

Once you have embeddings for all of your business data, you need to store them somewhere that facilitates speedy search based on numeric vectors. Solutions like Pinecone and MilvusDB fill this need, but that means integrating a new vendor or hosting a new database internally.
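
To see what the moving parts do, here's a brute-force, in-memory stand-in for a vector database; real systems use approximate nearest-neighbor indexes, but the interface is similar:

```python
import numpy as np

class TinyVectorStore:
    """Brute-force, in-memory stand-in for a real vector database."""
    def __init__(self):
        self.vectors: list[np.ndarray] = []
        self.texts: list[str] = []

    def add(self, vector: np.ndarray, text: str):
        self.vectors.append(vector / np.linalg.norm(vector))
        self.texts.append(text)

    def search(self, query: np.ndarray, k: int = 3) -> list[str]:
        query = query / np.linalg.norm(query)
        scores = np.stack(self.vectors) @ query  # cosine similarities
        best = np.argsort(scores)[::-1][:k]
        return [self.texts[i] for i in best]
```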

After implementing embeddings and a vector search solution, you can now retrieve information to include in the prompts you send to your LLM(s). But how can you trust that the LLM’s response is grounded in the information you provided and not something based on stale information or purely made up?

There are specialized deep learning models that exist solely for the purpose of ensuring that an LLM’s generative claims are grounded in facts you provide. This practice is variously referred to as hallucination detection, claim verification, NLI, etc. We believe NLI models are an essential part of a trustworthy RAG pipeline, but managed cloud solutions are scarce and you may need to host one yourself on GPU-enabled hardware.

Is a Black Box Sustainable?

If you bake your LLM integration directly into your app, you will effectively end up with a black box that can only be understood and improved by engineers. This could make sense if you have a decent size software shop and they’re the only folks likely to monitor or maintain the integration.

However, your best software engineers may not be your best (or most willing) prompt engineers, and you may wish to involve other personas like product and experience designers since an LLM’s output is often part of your application’s presentation layer & brand.

For these reasons, prompts will quickly need to move from code to configuration – no big deal. However, as an LLM integration matures it will likely become an Agentic Workflow involving:

  • More prompts, prompt parallelization & chaining
  • More prompt engineering
  • RAG and other orchestration

Moving these concerns into configuration is significantly more complex but necessary on larger projects. In addition, people will inevitably want to observe and understand the behavior of the integration to some degree.

For this reason it might make sense to embrace a visual framework for developing Agentic Workflows from the get-go. By doing so you open up the project to collaboration from non-engineers while promoting observability into the integration. If you don’t go this route, be prepared to continually build out configurability and observability tools on the side.

Quiq’s AI Automations Take Care of LLM Integration Headaches For You

Hopefully we’ve given you a sense for what it takes to build an enterprise LLM integration. Now it’s time for the plug. The considerations outlined above are exactly why we built AI Studio and particularly our AI Automations product.

With AI Automations you can create a serverless API that handles all the complexities of a fully orchestrated AI flow, including support for multiple LLMs, chaining, RAG, resiliency, observability, and more. Your LLM integration can go back to being ‘just an API call with basic auth’.

Want to learn more? Dive into AI Studio or reach out to our team.


Does Quiq Train Models on Your Data? No (And Here’s Why.)

Customer experience directors tend to have a lot of questions about AI, especially as it becomes more and more important to the way modern contact centers function.

These can range from “Will generative AI’s well-known tendency to hallucinate eventually hurt my brand?” to “How are large language models trained in the first place?” along with many others.

Speaking of training, one question that’s often top of mind for prospective users of Quiq’s conversational AI platform is whether we train the LLMs we use with your data. This is a perfectly reasonable question, especially given famous examples of LLMs exposing proprietary data, such as happened at Samsung. Needless to say, if you have sensitive customer information, you absolutely don’t want it getting leaked – and if you’re not clear on what is going on with an LLM, you might not have the confidence you need to use one in your contact center.

The purpose of this piece is to assure you that no, we do not train LLMs with your data. To hammer that point home, we’ll briefly cover how models are trained, then discuss the two ways that Quiq optimizes model behavior: prompt engineering and retrieval augmented generation.

How are Large Language Models Trained?

Part of the confusion stems from the fact that the term ‘training’ means different things to different people. Let’s start by clarifying what this term means, but don’t worry – we’ll go very light on technical details!

First, generative language models work with tokens, which are units of language such as a part of a word (“kitch”), a whole word (“kitchen”), or sometimes small clusters of words (“kitchen sink”). When a model is trained, it’s learning to predict the token that’s most likely to follow a string of prior tokens.

Once a model has seen a great deal of text, for example, it learns that “Mary had a little ____” probably ends with the token “lamb” rather than the token “lightbulb.”

Crucially, this process involves changing the model’s internal weights, i.e. its internal structure. Quiq has various ways of optimizing a model to perform in settings such as contact centers (discussed in the next section), but we do not change any model’s weights.

How Does Quiq Optimize Model Behavior?

There are a few basic ways to influence a model’s output. The two used by Quiq are prompt engineering and retrieval augmented generation (RAG), neither of which does anything whatsoever to modify a model’s weights or its structure.

In the next two sections, we’ll briefly cover each so that you have a bit more context on what’s going on under the hood.

Prompt Engineering

Prompt engineering involves changing how you format the query you feed the model to elicit a slightly different response. Rather than saying, “Write me some social media copy,” for example, you might also include an example outline you want the model to follow.

Quiq uses an approach to prompt engineering called “atomic prompting,” wherein the process of generating an answer to a question is broken down into multiple subtasks. This ensures you’re instructing a Large Language Model in a smaller context with specific, relevant task information, which can help the model perform better.

This is not the same thing as training. If you were to train or fine-tune a model on company-specific data, then the model’s internal structure would change to represent that data, and it might inadvertently reveal it in a future reply. However, including the data in a prompt doesn’t carry that risk because prompt engineering doesn’t change a model’s weights.
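
Purely as an illustration of the decomposition idea (not Quiq’s actual prompts), subtask templates might look like this:

```python
# Illustrative only: one interpretation of breaking a single broad prompt
# into smaller, focused subtasks, each given just the context it needs.
SUBTASK_PROMPTS = [
    "Classify the topic of this customer question: {question}",
    "From the documents below, list only the facts relevant to a "
    "{topic} question:\n{documents}",
    "Draft a reply to '{question}' using only these facts:\n{facts}",
]
```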

Retrieval Augmented Generation (RAG)

RAG refers to giving a language model an information source – such as a database or the Internet – that it can use to improve its output. It has emerged as the most popular technique to control the information the model needs to know when generating answers.

As before, that is not the same thing as training because it does not change the model’s weights.

RAG doesn’t modify the underlying model, but if you connect it to sensitive information and then ask it a question, it may very well reveal something sensitive. RAG is very powerful, but you need to use it with caution. Your AI development platform should provide ways to securely connect to APIs that can help authenticate and retrieve account information, thus allowing you to provide customers with personalized responses.

This is why you still need to think about security when using RAG. Whatever tools or information sources you give your model must meet the strictest security standards and be certified, as appropriate.

Quiq is one such platform, built from the ground up with data security (encryption in transit) and compliance (SOC 2 certified) in mind. We never store or use data without permission, and we’ve crafted our tools so it’s as easy as possible to utilize RAG on just the information stores you want to plug a model into. Being a security-first company, this extends to our utilization of Large Language Models and agreements with AI providers like Microsoft Azure OpenAI.

Wrapping Up on How Quiq Trains LLMs

Hopefully, you now have a much clearer picture of what Quiq does to ensure the models we use are as performant and useful as possible. With them, you can make your customers happier, improve your agents’ performance, and reduce turnover at your contact center.

If you’re interested in exploring some other common misconceptions that CX leaders face when considering incorporating generative AI into their technology stack, check out our ebook on the subject. It contains a great deal of information to help you make the best possible decision!


How to Improve Contact Center Performance (With Data)

Contact centers are a crucial part of offering quality products. Long after the software has been built and the marketing campaigns have been run, there will still be agents helping customers reset their passwords and debug tricky issues.

This means we must do everything we can to ensure that our contact centers are operating at peak efficiency. Data analytics is an important piece of the puzzle, offering the kinds of hard numbers we need to make good decisions, do right by our customers, and support the teams we manage.

That will be our focus today. We’ll cover the basics of implementing a data analysis process, as well as how to use it to assess and improve various contact center performance metrics.

Let’s get going!

How to Use Data Analytics to Increase Contact Center Performance

A great place to start is with a broader overview of the role played by data analytics in making decisions in modern contact centers. Here, we’ll cover the rudiments of how data analytics works, the tools that can be used to facilitate it, and how it can be used in making critical decisions.

Understanding the Basics of Data Analytics in Contact Centers

Let’s define data analytics in the context of contact center performance management. Like the term “data scientist”—which could cover anything from running basic SQL queries to building advanced reinforcement learning agents—“data analytics” is a nebulous term that can be used in many different conversations and contexts.

Nevertheless, its basic essence could be summed up as “using numbers to make decisions.”

If you’re reading this, the chances are good that you have a lot of experience in contact center performance management already, but you may or may not have spent much time engaging in data analytics. If you haven’t, be aware that data analysis is an enormously powerful tool, especially for contact centers.

Imagine, for example, a new product is released, and you see a sudden increase in average handle time. This could mean there is something about it that’s especially tricky or poorly explained. You could improve your contact center performance metrics simply by revisiting that particular product’s documentation to see if anything strikes you as problematic.

Of course, this is just a hypothetical scenario, but it shows you how much insight you can gain from even rudimentary numbers related to your contact center.

Implementing Analytics Tools and Techniques

Now, let’s talk about what it takes to leverage the power of contact center performance metrics. You can slice up the idea of “analytics tools and techniques” in a few different ways, but by our count, there are (at least) four major components.

Gathering the Data

First, like machine learning, analytics is “hungry,” meaning that it tends to be more powerful the more data you have. For this reason, you have to have a way of capturing the data needed to make decisions.

In the context of contact center performance, this probably means setting up a mechanism for tracking any conversations between agents and customers, as well as whatever survey data is generated by customers reflecting on their experience with your company.

Storing the Data

This data has to live somewhere, and if you’re dealing with text, there are various options. “Structured” textual data follows a consistent format and can be stored in a relational database like MySQL. “Unstructured” textual data doesn’t follow a predictable format and is best stored in a non-relational database like MongoDB, which is better suited for it.

It’s not uncommon to have both relational and non-relational databases for storing specific types of data. Survey responses are well-structured so they might go in MySQL, for instance, while free-form conversations with agents might go in MongoDB. There are also more exotic options like graph databases and vector databases, but they’re beyond the scope of this article.

Analyzing the Data

Once you’ve captured your data and stored it somewhere, you have to analyze it—the field isn’t called data analytics for nothing! A common way to begin analyzing data is to look for simple, impactful, long-term trends—is your AHT going up or down, for instance? You can also look for cyclical patterns. Your AHT might generally be moving in a positive direction, but with noticeable spikes every so often that need to be explained and addressed.

You could also do more advanced analytics. After you’ve gathered a reasonably comprehensive set of survey results, for example, you could run them through a sentiment analysis algorithm to find out the general emotional tone of the interactions between your agents and your customers.
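
As a small example of trend analysis, assuming a hypothetical CSV export with `date` and `handle_seconds` columns, a few lines of pandas will surface weekly AHT movement:

```python
import pandas as pd

# Hypothetical export: one row per handled contact, with "date" and
# "handle_seconds" columns.
df = pd.read_csv("contacts.csv", parse_dates=["date"])

weekly_aht = df.set_index("date")["handle_seconds"].resample("W").mean()
print(weekly_aht.tail())               # is AHT trending up or down?
print(weekly_aht.pct_change().tail())  # spot week-over-week spikes
```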

Serving Up Your Insights

Finally, once you’ve identified a set of insights you can use to make decisions about improving contact center performance, you need to make them available. By far the most common way is by putting some charts and graphs in a PowerPoint presentation and delivering it to the people making actual decisions. That said, some folks opt instead to make fancy dashboards, or even to create monitoring tools that update in real time.

Effectively Leveraging Data

As you can see, creating a top-to-bottom contact center performance solution takes a lot of effort. The best way to save time is to find a tool that abstracts away as much of the underlying technical work as possible.

Ideally, you’d be looking for quick insights generated seamlessly across all the many messaging channels contact centers utilize these days. It’s even better if those insights can easily be published in reports that inform your decision-making.

What’s the payoff? You’ll be able to scrutinize (and optimize) each step taken during a customer journey, and discover how and why your customers are reaching out. You’ll have much more granular information about how your agents are functioning, giving you the tools needed to improve KPIs and streamline your internal operations.

We’ll treat each of these topics in the remaining sections, below.

How to Improve KPIs in a Contact Center

After gathering and analyzing a lot of data, you’ll no doubt notice key performance indicators (KPIs) that aren’t where you want them to be. Here, we’ll discuss strategies for getting those numbers up!

Identifying Key Performance Indicators (KPIs)

First, let’s briefly cover some of the KPIs you’d be looking for.

  • First Contact Resolution (FCR) – The first contact resolution is the fraction of issues a contact center is able to resolve on the first try, i.e. the first time the customer reaches out.
  • Average Handle Time (AHT) – The average handle time is one of the more important metrics contact centers track, and it refers to the mean length of time an agent spends on a task (this includes both talking to customers directly and whatever follow-up work comes after).
  • Customer Satisfaction (CSAT) – The customer satisfaction score attempts to gauge how customers feel about your product and service.
  • Call Abandon Rate (CAR) – The call abandon rate is the fraction of customers who end a call with an agent before their question has been answered.
  • Net Promoter Score (NPS) – The net promoter score is a number (usually from 0-10) that quantifies how likely a given customer would be to recommend you to someone they know.

Of course, this is just a sampling of the many contact center performance metrics you can track. Ultimately, you want to choose a set of metrics that gives you a reasonably comprehensive view of how well your contact center is doing, and whether it’s getting better or worse over time.
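
Assuming a hypothetical interaction log with one row per contact, computing the metrics above is straightforward; all the column names below are illustrative:

```python
import pandas as pd

# Hypothetical interaction log, one row per contact.
df = pd.read_csv("interactions.csv")

kpis = {
    "FCR": df["resolved_on_first_contact"].mean(),
    "AHT_seconds": df["handle_seconds"].mean(),
    "CSAT": df["csat_score"].mean(),
    "CAR": df["abandoned"].mean(),
}
promoters = (df["nps_response"] >= 9).mean()
detractors = (df["nps_response"] <= 6).mean()
kpis["NPS"] = (promoters - detractors) * 100
print(kpis)
```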

Strategies for Improving Key KPIs

There are many things you can do to improve your KPIs, including up-training your personnel or making your agents more productive with tools like generative AI.

This is too big a topic to cover comprehensively, but since generative AI is such a hot topic, let’s walk through a case study where using it led to dramatic improvements in efficiency.

LOOP is a Texas-based car insurance provider that partnered with Quiq to deploy a generative AI assistant. Naturally, they already had a chatbot in place, but they found it could only offer formulaic answers. This frustrated customers, prevented them from solving their own problems, and negatively impacted KPIs overall.

However, by integrating a cutting-edge AI assistant powered by large language models, they achieved a remarkable threefold increase in self-service rates. By the end, more than half of all customer issues were resolved without the need for agents to get involved, and fully three-quarters of customers indicated that they were satisfied with the service provided by the AI.

Now, we’re not suggesting that you can solve every problem with fancy new technology. No, our point here is that you should evaluate every option in an attempt to find workable contact center performance solutions, and we think this is a useful example of what’s possible with the right approach.

Tips to Boost Contact Center Operational Efficiency

We’ve covered a lot of ground related to data analysis and how it can help you make decisions about improving contact center performance. In this final section, we’ll finish by talking about using data analytics and other tools to make sure you’re as operationally efficient as you can be.

Streamlining Operations with Technology

The obvious place to look is technology. We’ve already discussed AI assistants, but there’s plenty more low-hanging fruit to be picked.

Consider CRM integrations, for example. We’re in the contact center business, so we know all about the vicissitudes of trying to track and manage a billion customer relationships. Even worse, the relevant data is often spread out across many different locations, making it hard to get an accurate picture of who your customers are and what they need.

But if you invest in solutions that allow you to hook your CRM up to your other tools, you can do a better job of keeping those data in sync and serving them up where they’ll be of the most use. As a bonus, these data can be fed to a retrieval augmented generation system to help your AI assistant create more accurate replies. They can also form a valuable part of your all-important data analytics process.

What’s more, these same analytics can be used to identify sticking points in your workflows. With this information, you’ll be better equipped to rectify any problems and keep the wheels turning smoothly.

Empowering Agents to Enhance Performance

We’ve spent a lot of time in this post discussing data analytics, AI, and automation, but it’s crucial not to forget that these things are supplements to human agents, not replacements for them. Ultimately, we want agents to feel empowered to utilize the right tools to do their jobs better.

First, to the extent that it’s possible (and appropriate), agents should be given access to the data analytics you perform in the future. If you think you’re making better decisions based on data, it stands to reason that they would do the same.

Then, there are various ways of leveraging generative AI to make your agents more effective. Some of these are obvious, as when you utilize a tool like Quiq Snippets to formulate high-quality replies more rapidly (this alone will surely drop your AHT). But others are more out-of-the-box, such as when new agents can use a language model to get up to speed on your product offering in a few days instead of a few weeks.

Continuously Evaluating and Refining Processes

To close out, we’ll reiterate the importance of consistently monitoring your contact center performance metrics. These kinds of numbers change in all sorts of ways, and the story they tell changes along with them.

It’s not enough to measure a few KPIs and then call it a day; you need a process in place to check them consistently, revising your decisions along the way.

Next Steps for Improving Your Contact Center Metrics

They say that data is the new oil, as it’s a near-inexhaustible source of insights. With the right data analysis, you can figure out which parts of your contact center are thriving and which need more support, and you can craft strategies that set you and your teams up to succeed.

Quiq is well-known as a conversational AI platform, but we also have a robust suite of tools for making the most out of the data generated by your contact center. Set up a demo to figure out how we can give you the facts you need to thrive!


Retrieval Augmented Generation – Ultimate Guide

A lot has changed since the advent of large language models a little over a year ago. But, incredibly, there are already many attempts at extending the functionality of the underlying technology.

One broad category of these attempts is known as “tool use”, and consists of augmenting language models by giving them access to things like calculators. Stories of these models failing at simple arithmetic abound, and the basic idea is that we can begin to shore up their weaknesses by connecting them to specific external resources.

Because these models are famously prone to “hallucinating” incorrect information, the technique of retrieval augmented generation (RAG) has been developed to ground model output more effectively. So far, this has shown promise as a way of reducing hallucinations and creating much more trustworthy replies to queries.

In this piece, we’re going to discuss what retrieval augmented generation is, how it works, and how it can make your models even more robust.

Understanding Retrieval Augmented Generation

To begin, let’s get clear on exactly what we’re talking about. The next few sections will overview retrieval augmented generation, break down how it works, and briefly cover its myriad benefits.

What is Retrieval Augmented Generation?

Retrieval augmented generation refers to a large and growing cluster of techniques meant to help large language models ground their output in facts obtained from an external source.

By now, you’re probably aware that language models can do a remarkably good job of generating everything from code to poetry. But, owing to the way they’re trained and the way they operate, they’re also prone to simply fabricating confident-sounding nonsense. If you ask for a bunch of papers about the connection between a supplement and mental performance, for example, you might get a mix of real papers and ones that are completely fictitious.

If you could somehow hook the model up to a database of papers, however, then perhaps that would ameliorate this tendency. That’s where RAG comes in.

We will discuss some specifics in the next section, but in the broadest possible terms, you can think of RAG as having two components: the generative model, and a retrieval system that allows it to augment its outputs with data obtained from an authoritative external source.

The difference between using a foundation model and using a foundation model with RAG has been likened to the difference between taking a closed-book and an open-book test – the metaphor is an apt one. If you were to poll all your friends about their knowledge of photosynthesis, you’d probably get a pretty big range of replies. Some friends would remember a lot about the process from high school biology, while others would barely even know that it’s related to plants.

Now, imagine what would happen if you gave these same friends a botany textbook and asked them to cite their sources. You’d still get a range of replies, of course, but they’d be far more comprehensive, grounded, and replete with up-to-date details. [1]

How RAG Works

Now that we’ve discussed what RAG is, let’s talk about how it functions. Though there are many subtleties involved, there are only a handful of overall steps.

First, you have to create a source of external data or utilize an existing one. There are already many such external resources, including databases filled with scientific papers, genomics data, time series data on the movements of stock prices, etc., which are often accessible via an API. If there isn’t already a repository containing the information you’ll need, you’ll have to make one. It’s also common to hook generative models up to internal technical documentation, of the kind utilized by e.g. contact center agents.

Then, you’ll have to do a search for relevancy. This involves converting queries into vectors, or numerical representations that capture important semantic information, then matching that representation against the vectorized contents of the external data source. Don’t worry too much if this doesn’t make a lot of sense, the important thing to remember is that this technique is far better than basic keyword matching at turning up documents related to a query.

With that done, you’ll have to augment the original user query with whatever data came up during the relevancy search. In the systems we’ve seen, this all occurs silently, behind the scenes, with the user being unaware that any such changes have been made. But, with the additional context, the output generated by the model will likely be much more grounded and sensible. Modern RAG systems are also sometimes built to include citations to the specific documents they drew from, allowing a user to fact-check the output for accuracy.

And finally, you’ll need to think continuously about whether the external data source you’ve tied your model to needs to be updated. It doesn’t do much good to ground a model’s reply if the information it’s using is stale and inaccurate, so this step is important.
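
Putting those steps together, here's a compact sketch of the whole loop; `embed`, `search`, and `llm` are stand-ins for your embedding model, vector store, and generative model of choice:

```python
def answer_with_rag(question: str, embed, search, llm) -> str:
    """End-to-end sketch of the steps above: embed the query, retrieve
    relevant passages, augment the prompt, and generate. All three
    callables are stand-ins for your chosen components."""
    query_vector = embed(question)
    passages = search(query_vector, k=3)
    numbered = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages))
    prompt = (
        "Use only these sources to answer, and cite them by number.\n"
        f"{numbered}\n\nQuestion: {question}"
    )
    return llm(prompt)
```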

The Benefits of RAG

Language models equipped with retrieval augmented generation have many advantages over their more fanciful, non-RAG counterparts. As we’ve alluded to throughout, such RAG models tend to be vastly more accurate. RAG, of course, doesn’t guarantee that a model’s output will be correct. They can still hallucinate, just as one of your friends reading a botany book might misunderstand or misquote a passage. Still, it makes hallucinations far less prevalent and, if the model adds citations, gives you what you need to rectify any errors.

For this same reason, it’s easier to trust a RAG-powered language model, and they’re (usually) easier to use. As we said above, a lot of the tricky technical detail is hidden from the end user, so all they see is a better-grounded output complete with a list of documents they can use to check that the output they’ve gotten is right.

Applications of Retrieval Augmented Generation

We’ve said a lot about how awesome RAG is, but what are some of its primary use cases? That will be our focus here, over the next few sections.

Enhancing Question Answering Systems

Perhaps the most obvious way RAG could be used is to supercharge the function of question-answering systems. This is already a very strong use case of generative AI, as attested to by the fact that many people are turning to tools like ChatGPT instead of Google when they want to take a first stab at understanding a new subject.

With RAG, they can get more precise and contextually relevant answers, enabling them to overcome hurdles and progress more quickly.

Of course, this dynamic will also play out in contact centers, which are more often leaning on question-answering systems to either make their agents more effective, or to give customers the resources they need to solve their own problems.

Chatbots and Conversational Agents

Chatbots are another technology that could be substantially upgraded through RAG. Because this is so closely related to the previous section we’ll keep our comments brief; suffice it to say, a chatbot able to ground its replies in internal documentation or a good external database will be much better than one that can’t.

Revolutionizing Content Creation

Because generative models are so, well, generative, they’ve already become staples in the workflows of many creative sorts, such as writers, marketers, etc. A writer might use a generative model to outline a piece, paraphrase their own earlier work, or take the other side of a contentious issue.

This, too, is a place where RAG shines. Whether you’re tinkering with the structure of a new article or trying to build a full-fledged research assistant to master an arcane part of computer science, it can only help to have more factual, grounded output.

Recommendation Systems

Finally, recommendation systems could see a boost from RAG. As you can probably tell from their name, recommendation systems are machine-learning tools that find patterns in a set of preferences and use them to make new recommendations that fit that pattern.

With grounding through RAG, this could become even better. Imagine not only having recommendations, but also specific details about why a particular recommendation was made, to say nothing of recommendations that are tied to a vast set of external resources.

Conclusion

For all the change we’ve already seen from generative AI, RAG has yet more potential to transform our interaction with AI. With retrieval augmented generation, we could see substantial upgrades in the way we access information and use it to create new things.

If you’re intrigued by the promise of generative AI and the ways in which it could supercharge your contact center, set up a demo of the Quiq platform today!


Footnotes

[1] This assumes that the book you’re giving them is itself up-to-date, and the same is true with RAG. A generative model is only as good as its data.