
Engineering Excellence: How to Build Your Own AI Assistant – Part 2

In Part One of this guide, we explored the foundational architecture needed to build production-ready AI agents – from cognitive design principles to data preparation strategies. Now, we’ll move from theory to implementation, diving deep into the technical components that bring these architectural principles to life when you attempt to build your own AI assistant or agent.

Building on those foundations, we’ll examine the practical challenges of natural language understanding, response generation, and knowledge integration. We’ll also explore the critical role of observability and testing in maintaining reliable AI systems, before concluding with advanced agent behaviors that separate exceptional implementations from basic chatbots.

Whether you’re implementing your first AI assistant or optimizing existing systems, these practical insights will help you create more sophisticated, reliable, and maintainable AI agents.

Section 1: Natural Language Understanding Implementation

With well-prepared data in place, we can focus on one of the most challenging aspects of AI agent development: understanding user intent. While LLMs have impressive language capabilities, translating user input into actionable understanding requires careful implementation of several key components.

While we use terms like ‘natural language understanding’ and ‘intent classification,’ it’s important to note that in the context of LLM-based AI agents, these concepts operate at a much more sophisticated level than in traditional rule-based or pattern-matching systems. Modern LLMs understand language and intent through deep semantic processing, rather than predetermined pathways or simple keyword matching.

Vector Embeddings and Semantic Processing

User intent often lies beneath the surface of their words. Someone asking “Where’s my stuff?” might be inquiring about order status, delivery timeline, or inventory availability. Vector embeddings help bridge this gap by capturing semantic meaning behind queries.

Vector embeddings create a map of meaning rather than matching keywords. This enables your agent to understand that “I need help with my order” and “There’s a problem with my purchase” request the same type of assistance, despite sharing no common keywords.
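
To make this concrete, here is a minimal sketch of semantic matching with vector embeddings. It assumes the open-source sentence-transformers library and an illustrative model name; any embedding provider would serve the same purpose.

```python
# Minimal sketch: semantic similarity with vector embeddings.
# Assumes the sentence-transformers package; swap in any embedding provider.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice

queries = [
    "I need help with my order",
    "There's a problem with my purchase",
    "What are your store hours?",
]

# Encode each query into a dense vector that captures its meaning.
vectors = model.encode(queries, normalize_embeddings=True)

# With normalized vectors, cosine similarity is just a dot product.
related = np.dot(vectors[0], vectors[1])    # noticeably higher than the unrelated pair
unrelated = np.dot(vectors[0], vectors[2])  # same keywords matter less than meaning

print(f"order/purchase similarity: {related:.2f}")
print(f"order/store-hours similarity: {unrelated:.2f}")
```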

Disambiguation Strategies

Users often communicate vaguely or assume unspoken context. An effective AI agent needs strategies for handling this ambiguity – sometimes asking clarifying questions, other times making informed assumptions based on available context.

Consider a user asking about “the blue one.” Your agent must assess whether previous conversation provides clear reference, or if multiple blue items require clarification. The key is knowing when to ask questions versus when to proceed with available context. This balance between efficiency and accuracy maintains natural, productive conversations.
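
One way to operationalize this judgment is a simple confidence-gap check: proceed when one candidate referent clearly dominates, and ask a clarifying question when the top candidates are too close to call. The scores, threshold, and gap below are illustrative assumptions, not fixed recommendations.

```python
# Sketch: decide whether to clarify or proceed, based on candidate scores.
# Scores might come from embedding similarity against items already in the conversation.

def needs_clarification(candidates: dict[str, float],
                        min_score: float = 0.6,
                        min_gap: float = 0.15) -> bool:
    """Return True if the agent should ask a clarifying question."""
    ranked = sorted(candidates.values(), reverse=True)
    if not ranked or ranked[0] < min_score:
        return True                      # nothing matches well enough
    if len(ranked) > 1 and ranked[0] - ranked[1] < min_gap:
        return True                      # two referents are too close to call
    return False

# "the blue one" scored against items mentioned earlier in the conversation
candidates = {"blue backpack": 0.71, "blue water bottle": 0.69}
if needs_clarification(candidates):
    print("Did you mean the blue backpack or the blue water bottle?")
else:
    print("Proceeding with the top match.")
```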

Input Processing and Validation

Before formulating responses, your agent must ensure that input is safe and processable. This extends beyond security checks and content filtering to create a foundation for understanding. Your agent needs to recognize entities, identify key phrases, and understand patterns that indicate specific user needs.

Think of this as your agent’s first line of defense and comprehension. Just as a human customer service representative might ask someone to slow down or clarify when they’re speaking too quickly or unclearly, your agent needs mechanisms to ensure it’s working with quality input that it can properly process.
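
A minimal sketch of such an input gate might look like the following. The length limit and patterns are illustrative placeholders; a production system would add proper PII detection, language identification, and prompt-injection screening.

```python
# Sketch: basic input checks before any LLM call. Limits and patterns are illustrative.
import re

MAX_CHARS = 2000
CARD_PATTERN = re.compile(r"\b(?:\d[ -]*?){13,16}\b")        # naive card-number check
INJECTION_PATTERN = re.compile(r"ignore (all|previous) instructions", re.IGNORECASE)

def validate_input(text: str) -> tuple[bool, str]:
    """Return (ok, reason). Reject input the agent should not process as-is."""
    if not text.strip():
        return False, "empty message"
    if len(text) > MAX_CHARS:
        return False, "message too long"
    if CARD_PATTERN.search(text):
        return False, "possible payment card number; ask the user not to share it"
    if INJECTION_PATTERN.search(text):
        return False, "possible prompt injection attempt"
    return True, "ok"

ok, reason = validate_input("Where's my stuff? Order #18274.")
print(ok, reason)
```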

Intent Classification Architectures

Reliable intent classification requires a sophisticated approach beyond simple categorization. Your architecture must consider both explicit statements and implicit meanings. Context is crucial – the same phrase might indicate different intents depending on its place in conversation or what preceded it.

Multi-intent queries present a particular challenge. Users often bundle multiple requests or questions together, and your architecture needs to recognize and handle these appropriately. The goal isn’t just to identify these separate intents but to process them in a way that maintains a natural conversation flow.
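
One common pattern is to have the LLM return a structured list of intents so downstream logic can handle each one separately. In the sketch below, the intent labels, prompt wording, and the call_llm stub are illustrative assumptions standing in for your actual chat-completion client.

```python
# Sketch: multi-intent classification that returns a structured list of intents.
import json

INTENT_LABELS = ["order_status", "cancel_order", "newsletter", "sales_contact", "other"]

def call_llm(prompt: str) -> str:
    """Placeholder client; returns a canned classification for the demo."""
    return '["newsletter", "sales_contact"]'

def classify_intents(message: str) -> list[str]:
    prompt = (
        "Classify every distinct request in the user's message.\n"
        f"Return a JSON array of labels drawn only from: {INTENT_LABELS}.\n"
        f"User message: {message}\nJSON array:"
    )
    try:
        intents = json.loads(call_llm(prompt))
    except json.JSONDecodeError:
        intents = ["other"]                       # fall back rather than guess
    return [i for i in intents if i in INTENT_LABELS] or ["other"]

print(classify_intents("Can I unsubscribe from your newsletter and have sales contact me?"))
```

Each returned label can then be handled as its own sub-task, which keeps multi-part requests from collapsing into a single, partial answer.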

Section 2: Response Generation and Control

Once we’ve properly understood user intent, the next challenge is generating appropriate responses. This is where many AI agents either shine or fall short. While LLMs excel at producing human-like text, ensuring that those responses are accurate, appropriate, and aligned with your business needs requires careful control and validation mechanisms.

Output Quality Control Systems

Creating high-quality responses isn’t just about getting the facts right – it’s about delivering information in a way that’s helpful and appropriate for your users. Think of your quality control system as a series of checkpoints, each ensuring that different aspects of the response meet your standards.

A response can be factually correct, yet fail by not aligning with your brand voice or straying from approved messaging scope. Quality control must evaluate both content and delivery – considering tone, brand alignment, and completeness in addressing user needs.
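
In practice, this often looks like a pipeline of independent checks that a draft response must pass before it reaches the user. The checks below are deliberately simple, illustrative stand-ins; real systems frequently implement several of them as LLM-graded evaluations.

```python
# Sketch: response quality checkpoints run in sequence. The checks are illustrative.

BANNED_PHRASES = ["guaranteed refund", "legal advice"]      # example scope boundaries

def check_brand_tone(response: str) -> bool:
    return not response.isupper()                           # stand-in for a tone evaluation

def check_scope(response: str) -> bool:
    return not any(p in response.lower() for p in BANNED_PHRASES)

def check_completeness(response: str, required_points: list[str]) -> bool:
    return all(point.lower() in response.lower() for point in required_points)

def passes_quality_gate(response: str, required_points: list[str]) -> bool:
    return (check_brand_tone(response)
            and check_scope(response)
            and check_completeness(response, required_points))

draft = "Your order shipped on Tuesday and should arrive within 3-5 business days."
print(passes_quality_gate(draft, required_points=["shipped", "arrive"]))
```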

Hallucination Prevention Strategies

One of the more challenging aspects of working with LLMs is managing their tendency to generate plausible-sounding but incorrect information. Preventing hallucinations requires a multi-faceted approach that starts with proper prompt design and extends through response validation.

Responses must be grounded in verifiable information. This involves linking to source documentation, using retrieval-augmented generation for fact inclusion, or implementing verification steps against reliable sources.
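
As a rough illustration of grounding, the sketch below flags response sentences that share little vocabulary with the retrieved source passages. The overlap heuristic and threshold are illustrative only; production systems typically rely on an entailment model or an LLM-as-judge for this check.

```python
# Sketch: a crude grounding check that flags response sentences with little overlap
# with the retrieved source passages. Thresholds and the heuristic are illustrative.
import re

def grounding_score(sentence: str, sources: list[str]) -> float:
    words = set(re.findall(r"[a-z0-9]+", sentence.lower()))
    if not words:
        return 1.0
    best = 0.0
    for src in sources:
        src_words = set(re.findall(r"[a-z0-9]+", src.lower()))
        best = max(best, len(words & src_words) / len(words))
    return best

def ungrounded_sentences(response: str, sources: list[str], threshold: float = 0.5):
    sentences = re.split(r"(?<=[.!?])\s+", response.strip())
    return [s for s in sentences if grounding_score(s, sources) < threshold]

sources = ["Standard shipping takes 3-5 business days after the order leaves our warehouse."]
response = "Shipping takes 3-5 business days. We also offer free overnight delivery."
print(ungrounded_sentences(response, sources))   # flags the unsupported overnight claim
```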

Input and Output Filtering

Filtering acts as your agent’s immune system, protecting both the system and users. Input filtering identifies and handles malicious prompts and sensitive information, while output filtering ensures responses meet security and compliance requirements while maintaining business boundaries.

Implementation of Guardrails

Guardrails aren’t just about preventing problems – they’re about creating a space where your AI agent can operate effectively and confidently. This means establishing clear boundaries for:

  • What types of questions your agent should and shouldn’t answer
  • How to handle requests for sensitive information
  • When to escalate to human agents

Effective guardrails balance flexibility with control, ensuring your agent remains both capable and reliable.
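
One way to express such boundaries is a small, declarative guardrail configuration that the agent consults before answering. The topics, triggers, and routing logic below are illustrative assumptions.

```python
# Sketch: a declarative guardrail configuration the agent consults before answering.
from dataclasses import dataclass, field

@dataclass
class Guardrails:
    allowed_topics: set[str] = field(default_factory=lambda: {"orders", "shipping", "returns"})
    sensitive_topics: set[str] = field(default_factory=lambda: {"payment_details", "account_security"})
    escalation_triggers: set[str] = field(default_factory=lambda: {"legal_threat", "repeated_failure"})

    def route(self, topic: str, signals: set[str]) -> str:
        if signals & self.escalation_triggers:
            return "escalate_to_human"
        if topic in self.sensitive_topics:
            return "verify_identity_first"
        if topic not in self.allowed_topics:
            return "decline_out_of_scope"
        return "answer"

rails = Guardrails()
print(rails.route("orders", set()))                        # answer
print(rails.route("payment_details", set()))               # verify_identity_first
print(rails.route("pricing_strategy", {"legal_threat"}))   # escalate_to_human
```

Keeping the boundaries in configuration rather than buried in prompts makes them easier to review, test, and change as the agent's scope evolves.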

Response Validation Methods

Validation isn’t a single step but a process that runs throughout response generation. We need to verify not just factual accuracy, but also consistency with previous responses, alignment with business rules, and appropriateness for the current context. This often means implementing multiple validation layers that work together to ensure quality responses, all built upon a foundation of reliable information.

Section 3: Knowledge Integration

A truly effective AI agent requires seamlessly integrating your organization’s specific knowledge, layering that on top of the communication capabilities of language models. This integration should be reliable and maintainable, ensuring access to the right information at the right time. While you want to use the LLM for contextualizing responses and natural language interaction, you don’t want to rely on it for domain-specific knowledge – that should come from your verified sources.

Retrieval-Augmented Generation (RAG)

RAG fundamentally changes how AI agents interact with organizational knowledge by enabling dynamic information retrieval. Like a human agent consulting reference materials, your AI can “look up” information in real-time.

The power of RAG lies in its flexibility. As your knowledge base updates, your agent automatically has access to the new information without requiring retraining. This means your agent can stay current with product changes, policy updates, and new procedures simply by updating the underlying knowledge base.
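
A stripped-down sketch of the RAG loop looks like this: retrieve the most relevant passages, assemble them into a prompt, and ask the model to answer only from that material. Here both the retrieval step and the call_llm helper are stubs standing in for a vector database client and a chat-completion API.

```python
# Sketch: retrieval-augmented generation with a stubbed retrieval step and LLM call.

KNOWLEDGE_BASE = [
    {"id": "kb-101", "text": "Standard shipping takes 3-5 business days."},
    {"id": "kb-102", "text": "Returns are accepted within 30 days with a receipt."},
]

def retrieve(query: str, k: int = 2) -> list[dict]:
    """Stub retrieval: rank passages by keyword overlap; a real system uses embeddings."""
    q = set(query.lower().split())
    scored = sorted(KNOWLEDGE_BASE,
                    key=lambda d: len(q & set(d["text"].lower().split())),
                    reverse=True)
    return scored[:k]

def call_llm(prompt: str) -> str:
    """Placeholder for your chat-completion client."""
    return "Standard shipping takes 3-5 business days. [kb-101]"

def answer(query: str) -> str:
    passages = retrieve(query)
    context = "\n".join(f"[{p['id']}] {p['text']}" for p in passages)
    prompt = (f"Answer using only the passages below and cite their ids.\n"
              f"{context}\n\nQuestion: {query}\nAnswer:")
    return call_llm(prompt)

print(answer("How long does shipping take?"))
```

Because the answer is assembled from retrieved passages at request time, updating the knowledge base immediately changes what the agent can say, with no retraining involved.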

Dynamic Knowledge Updates

Knowledge isn’t static, and your AI agent’s access to information shouldn’t be either. Your knowledge integration pipeline needs to handle continuous updates, ensuring your agent always works with current information.

This might include:

  • Customer profiles (orders, subscription status)
  • Product catalogs (pricing, features, availability)
  • New products, support articles, and seasonal information

Managing these updates requires strong synchronization mechanisms and clear protocols to maintain data consistency without disrupting operations.

Context Window Management

Managing the context window effectively is crucial for maintaining coherent conversations while making efficient use of your knowledge resources. While working memory handles active processing, the context window determines what knowledge base and conversation history information is available to the LLM. Not all information is equally relevant at every moment, and trying to include too much context can be as problematic as having too little.

Success depends on determining relevant context for each interaction. Some queries need recent conversation history, while others benefit from specific product documentation or user history. Proper management ensures your agent accesses the right information at the right time.
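
A simple way to enforce this is a token budget: include the most relevant items first and stop before the budget is exhausted. The character-based token estimate and the budget below are rough, illustrative assumptions.

```python
# Sketch: assemble context under a token budget, most relevant items first.

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)       # rough 4-characters-per-token estimate

def build_context(history: list[str], passages: list[str], budget: int = 1500) -> str:
    parts, used = [], 0
    # Recent conversation turns first (newest to oldest), then retrieved passages.
    for item in list(reversed(history)) + passages:
        cost = estimate_tokens(item)
        if used + cost > budget:
            continue                    # skip items that would exceed the budget
        parts.append(item)
        used += cost
    return "\n".join(parts)

history = ["User: Where's my order?", "Agent: Could you share the order number?", "User: It's 18274."]
passages = ["Orders ship within 24 hours of payment confirmation."]
print(build_context(history, passages, budget=60))
```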

Knowledge Attribution and Verification

When your agent provides information, it should be clear where that information came from. This isn’t just about transparency – it’s about building trust and making it easier to maintain and update your knowledge base. Attribution helps track which sources are being used effectively and which might need improvement.

Verification becomes particularly important when dealing with dynamic information. As an AI engineer, you need to ensure that responses are grounded in current, verified sources, giving you confidence in the accuracy of every interaction.

Section 4: Observability and Testing

With the core components of understanding, response generation, and knowledge integration in place, we need to ensure our AI agent performs reliably over time. This requires comprehensive observability and testing capabilities that go beyond traditional software testing approaches.

Building an AI agent isn’t a one-time deployment – it’s an iterative process that requires continuous monitoring and refinement. The probabilistic nature of LLM responses means traditional testing approaches aren’t sufficient. You need comprehensive observability into how your agent is performing, and robust testing mechanisms to ensure reliability.

Regression Testing Implementation

AI agent testing requires a more nuanced approach than traditional regression testing. Instead of exact matches, we must evaluate semantic correctness, tone, and adherence to business rules.

Creating effective regression tests means building a suite of interactions that cover your core use cases while accounting for common variations. These tests should verify not just the final response, but also the entire chain of reasoning and decision-making that led to that response.
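
A semantic regression test might compare the agent’s answer to an approved reference using embedding similarity rather than exact string matching. The sketch below assumes the sentence-transformers library; run_agent is a stand-in for your agent pipeline, and the similarity threshold is illustrative.

```python
# Sketch: a semantic regression test. Instead of exact string matching, the test
# checks that the agent's answer stays semantically close to an approved reference.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice

def run_agent(message: str) -> str:
    """Placeholder for the real agent; returns a canned response for the demo."""
    return "Standard shipping usually takes 3 to 5 business days."

def test_shipping_answer():
    reference = "Standard shipping takes 3-5 business days."
    answer = run_agent("How long does shipping take?")
    ref_vec, ans_vec = model.encode([reference, answer], normalize_embeddings=True)
    similarity = float(np.dot(ref_vec, ans_vec))
    assert similarity > 0.8, f"answer drifted from the approved reference ({similarity:.2f})"

test_shipping_answer()
```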

Debug-Replay Capabilities

When issues arise – and they will – you need the ability to understand exactly what happened. Debug-replay functions like a flight recorder for AI interactions, logging every decision point, context, and data transformation. This level of visibility lets you trace the exact path from input to output, making it much easier to identify where adjustments are needed and how to implement them effectively.
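
A minimal version of such a recorder simply logs each decision point with its inputs and output. The schema below is illustrative; a real system would persist traces to durable storage and correlate them with conversation IDs.

```python
# Sketch: record every decision point so a conversation can be replayed later.
import json, time, uuid

class TraceRecorder:
    def __init__(self, conversation_id: str):
        self.conversation_id = conversation_id
        self.events: list[dict] = []

    def log(self, step: str, inputs: dict, output) -> None:
        self.events.append({
            "event_id": str(uuid.uuid4()),
            "timestamp": time.time(),
            "step": step,              # e.g. "intent_classification", "retrieval"
            "inputs": inputs,
            "output": output,
        })

    def dump(self) -> str:
        return json.dumps({"conversation": self.conversation_id, "events": self.events}, indent=2)

trace = TraceRecorder("conv-42")
trace.log("intent_classification", {"message": "Where's my stuff?"}, ["order_status"])
trace.log("retrieval", {"query": "order status"}, ["kb-101"])
print(trace.dump())
```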

Performance Monitoring Systems

Monitoring an AI agent requires tracking multiple dimensions of performance. Start with the fundamentals:

  • Response accuracy and appropriateness
  • Processing time and resource usage
  • Business-defined KPIs

Your monitoring system should provide clear visibility into these metrics, allowing you to set baselines, track deviations, and measure the impact of any changes you make to your agent. This data-driven approach focuses optimization efforts on metrics that matter most to business objectives.

Iterative Development Methods

Improving your AI agent is an ongoing process. Each interaction provides valuable data about what’s working and what’s not. You want to establish systematic methods for:

  • Collecting and analyzing interaction data
  • Identifying areas for improvement
  • Testing and validating changes
  • Rolling out updates safely

Success comes from creating tight feedback loops between observation, analysis, and improvement, always guided by real-world performance data.

Section 5: Advanced Agent Behaviors

While basic query-response patterns form the foundation of AI agent interactions, implementing advanced behaviors sets exceptional agents apart. These sophisticated capabilities allow your agent to handle complex scenarios, maintain goal-oriented conversations, and effectively manage uncertainty.

Task Decomposition Strategies

Complex user requests often require breaking down larger tasks into manageable components. Rather than attempting to handle everything in a single step, effective agents need to recognize when to decompose tasks and how to manage their execution.

Consider a user asking to “change my flight and update my hotel reservation.” The agent must handle this as two distinct but related tasks, each with different information needs, systems, and constraints – all while maintaining coherent conversation flow.
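
One way to implement this is to ask the LLM for a structured decomposition and then execute and track each subtask separately. In the sketch below, call_llm is a stub for your chat-completion client and the task labels are illustrative.

```python
# Sketch: decompose a compound request into subtasks the agent can execute and track.
import json

def call_llm(prompt: str) -> str:
    """Placeholder; returns a canned decomposition for the demo."""
    return json.dumps([
        {"task": "change_flight", "details": "move to a later departure"},
        {"task": "update_hotel", "details": "shift reservation dates to match"},
    ])

def decompose(request: str) -> list[dict]:
    prompt = ("Split the request into discrete tasks as a JSON array of "
              f"{{task, details}} objects.\nRequest: {request}\nJSON:")
    return json.loads(call_llm(prompt))

for subtask in decompose("Change my flight and update my hotel reservation."):
    print(subtask["task"], "->", subtask["details"])
```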

Goal-oriented Planning

Outstanding AI agents don’t just respond to queries – they actively work toward completing user objectives. This means maintaining awareness of both immediate tasks and broader goals throughout the conversation.

The agent should track progress, identify potential obstacles, and adjust its approach based on new information or changing circumstances. This might mean proactively asking for additional information when needed or suggesting alternative approaches when the original path isn’t viable.

Multi-step Reasoning Implementation

Some queries require multiple steps of logical reasoning to reach a proper conclusion. Your agent needs to be able to:

  • Break down complex problems into logical steps
  • Maintain reasoning consistency across these steps
  • Draw appropriate conclusions based on available information

Uncertainty Handling

Building on the flexible frameworks established in your initial design, advanced AI agents need sophisticated strategies for managing uncertainty in real-time interactions. This goes beyond simply recognizing unclear requests – it’s about maintaining productive conversations even when perfect answers aren’t possible.

Effective uncertainty handling involves:

  • Confidence assessment: Understanding and communicating the reliability of available information
  • Partial solutions: Providing useful responses even when complete answers aren’t available
  • Strategic escalation: Knowing when and how to involve human operators

The goal isn’t to eliminate uncertainty, but to make it manageable and transparent. When definitive answers aren’t possible, agents should communicate limitations while moving conversations forward constructively.
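
As a small illustration, uncertainty handling can be reduced to mapping a confidence estimate onto a response strategy. The thresholds, and the confidence signal itself (a retrieval score, a self-assessment from the model, or both), are illustrative assumptions.

```python
# Sketch: map a confidence estimate to a response strategy. Thresholds are illustrative.

def choose_strategy(confidence: float, can_escalate: bool = True) -> str:
    if confidence >= 0.8:
        return "answer_directly"
    if confidence >= 0.5:
        return "answer_with_caveat"          # partial solution, state the limitation
    if can_escalate:
        return "escalate_to_human"
    return "ask_clarifying_question"

for c in (0.92, 0.63, 0.31):
    print(f"confidence {c:.2f} -> {choose_strategy(c)}")
```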

Building Outstanding AI Agents: Bringing It All Together

Creating exceptional AI agents requires careful orchestration of multiple components, from initial planning through advanced behaviors. Success comes from understanding how each component works in concert to create reliable, effective interactions.

Start with clear purpose and scope. Rather than trying to build an agent that does everything, focus on specific objectives and define clear success criteria. This focused approach allows you to build appropriate guardrails and implement effective measurement systems.

Knowledge integration forms the backbone of your agent’s capabilities. While Large Language Models provide powerful communication abilities, your agent’s real value comes from how well it leverages your organization’s specific knowledge through effective retrieval and verification systems.

Building an outstanding AI agent is an iterative process, with comprehensive observability and testing capabilities serving as essential tools for continuous improvement. Remember that your goal isn’t to replace human interaction entirely, but to create an agent that handles appropriate tasks efficiently, while knowing when to escalate to human agents. By focusing on these fundamental principles and implementing them thoughtfully, you can create AI agents that provide real value to your users while maintaining reliability and trust.

Ready to put these principles into practice? Do it with AI Studio, Quiq’s enterprise platform for building sophisticated AI agents.

AI Assistant Builder: An Engineering Guide to Production-Ready Systems – Part 1

Modern AI agents, powered by Large Language Models (LLMs), are transforming how businesses engage with users through natural, context-aware interactions. This marks a decisive shift away from traditional chatbot building platforms with their rigid decision trees and limited understanding. For AI assistant builders, engineers and conversation designers, this evolution brings both opportunity and challenge. While LLMs have dramatically expanded what’s possible, they’ve also introduced new complexities in development, testing, and deployment.

In Part One of this technical guide, we’ll focus on the foundational principles and architecture needed to build production-ready AI agents. We’ll explore purpose definition, cognitive architecture, model selection, and data preparation. Drawing from real-world experience, we’ll examine key concepts like atomic prompting, disambiguation strategies, and the critical role of observability in managing the inherently probabilistic nature of LLM-based systems.

Rather than treating LLMs as black boxes, we’ll dive deep into the structural elements that make AI agents exceptional – from cognitive architecture design to sophisticated response generation. Our approach balances practical implementation with technical rigor, emphasizing methods that scale effectively and produce consistent results.

Then, in Part Two, we’ll explore implementation details, observability patterns, and advanced features that take your AI agents from functional to exceptional.

Whether you’re looking to build AI assistants for customer service, internal tools, or specialized applications, these principles will help you create more capable, reliable, and maintainable systems. Ready? Let’s get started.

Section 1: Understanding the Purpose and Scope

When you set out to design an AI agent, the first and most crucial step is establishing a clear understanding of its purpose and scope. The probabilistic nature of Large Language Models means we need to be particularly thoughtful about how we define success and measure progress. An agent that works perfectly in testing might struggle with real-world interactions if we haven’t properly defined its boundaries and capabilities.

Defining Clear Objectives

The key to successful AI agent development lies in specificity. Vague objectives like “provide customer support” or “help users find information” leave too much room for interpretation and make it difficult to measure success. Instead, focus on concrete, measurable goals that acknowledge both the capabilities and limitations of your AI agent.

For example, rather than aiming to “answer all customer questions,” a better objective might be to “resolve specific categories of customer inquiries without human intervention.” This provides clear development guidance while establishing appropriate guardrails.

Requirements Analysis and Success Metrics

Success in AI agent development requires careful consideration of both quantitative and qualitative metrics. Response quality encompasses not just accuracy, but also relevance and consistency. An agent might provide factually correct information that fails to address the user’s actual need, or deliver inconsistent responses to similar queries.

Tracking both completion rates and solution paths helps us understand how our agent handles complex interactions. Knowledge attribution is critical – responses must be traceable to verified sources to maintain system trust and accountability.

Designing for Reality

Real-world interactions rarely follow ideal paths. Users are often vague, change topics mid-conversation, or ask questions that fall outside the agent’s scope. Successful AI agents need effective strategies for handling these situations gracefully.

Rather than trying to account for every possible scenario, focus on building flexible response frameworks. Your agent should be able to:

  • Identify requests that need clarification
  • Maintain conversation flow during topic changes
  • Identify and appropriately handle out-of-scope requests
  • Operate within defined security and compliance boundaries

Anticipating these real-world challenges during planning helps build the necessary foundations for handling uncertainty throughout development.

Section 2: Cognitive Architecture Fundamentals

The cognitive architecture of an AI agent defines how it processes information, makes decisions, and maintains state. This fundamental aspect of agent design in AI must handle the complexities of natural language interaction while maintaining consistent, reliable behavior across conversations.

Knowledge Representation Systems

An AI agent needs clear access to its knowledge sources to provide accurate, reliable responses. This means understanding what information is available and how to access it effectively. Your agent should seamlessly navigate reference materials and documentation while accessing real-time data through APIs when needed. The knowledge system must maintain conversation context while operating within defined business rules and constraints.

Memory Management

AI agents require sophisticated memory management to handle both immediate interactions and longer-term context. Working memory serves as the agent’s active workspace, tracking conversation state, immediate goals, and temporary task variables. Think of it like a customer service representative’s notepad during a call – holding important details for the current interaction without getting overwhelmed by unnecessary information.

Beyond immediate conversation needs, agents must also efficiently handle longer-term context through API interactions. This could mean pulling customer data, retrieving order information, or accessing account details. The key is maintaining just enough state to inform current decisions, while keeping the working memory focused and efficient.

Decision-Making Frameworks

Decision making in AI agents should be both systematic and transparent. An effective framework begins with careful input analysis to understand the true intent behind user queries. This understanding combines with context evaluation – assessing both current state and relevant history – to determine the most appropriate action.

Execution monitoring is crucial as decisions are made. Every action should be traceable and adjustable, allowing for continuous improvement based on real-world performance. This transparency enables both debugging when issues arise and systematic enhancement of the agent’s capabilities over time.

Atomic Prompting Architecture

Atomic prompting is fundamental to building reliable AI agents. Rather than creating complex, multi-task prompts, we break down operations into their smallest meaningful units. This approach significantly improves reliability and predictability – single-purpose prompts are more likely to produce consistent results and are easier to validate.

A key advantage of atomic prompting is efficient parallel processing. Instead of sequential task handling, independent prompts can run simultaneously, reducing overall response time. While one prompt classifies an inquiry type, another can extract relevant entities, and a third can assess user emotion. These parallel operations improve efficiency while providing multiple perspectives for better decision-making.

The atomic nature of these prompts makes parallel processing more reliable. Each prompt’s single, well-defined responsibility allows multiple operations without context contamination or conflicting outputs. This approach simplifies testing and validation, providing clear success criteria for each prompt and making it easier to identify and fix issues when they arise.

For example, handling a customer order inquiry might involve separate prompts to:

  • Classify the inquiry type
  • Extract relevant identifiers
  • Determine needed information
  • Format the response appropriately

Each step has a clear, single responsibility, making the system more maintainable and reliable.
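
Because these prompts are independent, they can run concurrently. The sketch below uses asyncio with a stubbed call_llm coroutine standing in for an async chat-completion client.

```python
# Sketch: independent atomic prompts running concurrently.
import asyncio

async def call_llm(prompt: str) -> str:
    await asyncio.sleep(0.1)                     # simulate network latency
    return f"result for: {prompt.splitlines()[0]}"

async def handle_order_inquiry(message: str) -> dict:
    # Each prompt has one job; none depends on another's output, so they run in parallel.
    classify, extract, sentiment = await asyncio.gather(
        call_llm(f"Classify the inquiry type.\n{message}"),
        call_llm(f"Extract order identifiers.\n{message}"),
        call_llm(f"Assess the customer's sentiment.\n{message}"),
    )
    return {"inquiry_type": classify, "identifiers": extract, "sentiment": sentiment}

print(asyncio.run(handle_order_inquiry("Where is order #18274? It's late and I'm annoyed.")))
```

Because each prompt does exactly one thing, a failure in one branch can be retried or degraded gracefully without re-running the others.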

When issues do occur, atomic prompting enables precise identification of where things went wrong and provides clear paths for recovery. This granular approach allows graceful degradation when needed, maintaining an optimal user experience even when perfect execution isn’t possible.

Section 3: Model Selection and Optimization

Choosing the right language models for your AI agent is a critical architectural decision that impacts everything from response quality to operational costs. Rather than defaulting to the most powerful (and expensive) model for all tasks, consider a strategic approach to model selection.

Different components of your agent’s cognitive pipeline may require different models. While using the latest, most sophisticated model for everything might seem appealing, it’s rarely the most efficient approach. Balance response quality with resource usage – inference speed and cost per token significantly impact your agent’s practicality and scalability.

Task-specific optimization means matching different models to different pipeline components based on task complexity. This strategic selection creates a more efficient and cost-effective system while maintaining high-quality interactions.
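
In code, this can be as simple as a routing table that maps each pipeline step to a model tier. The tier names, costs, and mapping below are illustrative assumptions rather than recommendations.

```python
# Sketch: route each pipeline step to a model tier by task complexity.

MODEL_TIERS = {
    "small":  {"cost_per_1k_tokens": 0.0002},   # classification, extraction
    "medium": {"cost_per_1k_tokens": 0.002},    # routine response generation
    "large":  {"cost_per_1k_tokens": 0.02},     # multi-step reasoning, edge cases
}

TASK_ROUTING = {
    "intent_classification": "small",
    "entity_extraction": "small",
    "response_generation": "medium",
    "complex_reasoning": "large",
}

def pick_model(task: str) -> str:
    return TASK_ROUTING.get(task, "medium")     # sensible default for unknown tasks

for task in ("intent_classification", "response_generation", "complex_reasoning"):
    tier = pick_model(task)
    print(task, "->", tier, MODEL_TIERS[tier])
```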

Language models evolve rapidly, with new versions and capabilities frequently emerging. Design your architecture with this evolution in mind, enabling model version flexibility and clear testing protocols for updates. This approach ensures your agent can leverage improvements in the field while maintaining reliable performance.

Model selection is crucial, but models are only as good as their input data. Let’s examine how to prepare and organize your data to maximize your agent’s effectiveness.

Section 4: Data Collection and Preparation

Success with AI agents depends heavily on data quality and organization. While LLMs provide powerful baseline capabilities, your agent’s effectiveness relies on well-structured organizational knowledge. Data organization, though typically one of the most challenging and time-consuming aspects of AI development, can be streamlined with the right tools and approach. This allows you to focus on building exceptional AI experiences rather than getting bogged down in manual processes.

Dataset Curation Best Practices

When preparing data for your AI agent, prioritize quality over quantity. Start by identifying content that directly supports your agent’s objectives – product documentation, support articles, FAQs, and procedural guides. Focus on materials that address common user queries, explain key processes, and outline important policies or limitations.

Data Cleaning and Preprocessing

Raw documentation rarely comes in a format that’s immediately useful for an AI agent. Think of this stage as translation work – you’re taking content written for human consumption and preparing it for effective AI use. Long documents must be chunked while maintaining context, key information extracted from dense text, and formatting standardized.

Information should be presented in direct, unambiguous terms, which could mean rewriting complex technical explanations or breaking down complicated processes into clearer steps. Consistent terminology becomes crucial throughout your knowledge base. During this process, watch for:

  • Outdated information that needs updating
  • Contradictions between different sources
  • Technical details that need validation
  • Coverage gaps in key areas
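
A minimal chunking sketch is shown below: fixed-size windows with overlap, preferring to break on paragraph or sentence boundaries. The chunk size and overlap are illustrative; many teams tune them empirically or use structure-aware splitters instead.

```python
# Sketch: split a long document into overlapping chunks for retrieval.

def chunk_text(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    chunks, start = [], 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        # Prefer to break on a paragraph or sentence boundary near the end of the window.
        cut = max(text.rfind("\n\n", start, end), text.rfind(". ", start, end))
        if cut <= start or end == len(text):
            cut = end
        chunks.append(text[start:cut].strip())
        if cut >= len(text):
            break
        start = max(cut - overlap, start + 1)    # step forward, keeping some overlap
    return [c for c in chunks if c]

doc = ("Returns are accepted within 30 days. " * 40) + "\n\n" + ("Shipping takes 3-5 business days. " * 40)
print(len(chunk_text(doc)), "chunks")
```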

Automated Data Transformation and Enrichment

Manual data preparation quickly becomes unsustainable as your knowledge base grows. The challenge isn’t just handling large volumes of content – it’s maintaining quality and consistency while keeping information current. This is where automated transformation and enrichment processes become essential.

Effective automation starts with smart content processing. Tools that understand semantic structure can automatically segment documents while preserving context and relationships, eliminating the need for manual chunking decisions.

Enrichment goes beyond basic processing. Modern tools can identify connections between information, generate additional context, and add appropriate classifications. This creates a richer, more interconnected knowledge base for your AI agent.

Perhaps most importantly, automated processes streamline ongoing maintenance. When new content arrives – whether product information, policy changes, or updated procedures – your transformation pipeline processes these updates consistently. This ensures your AI agent works with current, accurate information without constant manual intervention.

Establishing these automated processes early lets your team focus on improving agent behavior and user experience rather than data management. The key is balancing automation with oversight to ensure both efficiency and reliability.

What’s Next?

The foundational elements we’ve covered – from cognitive architecture to knowledge management – are essential building blocks for production-ready AI agents. But understanding architecture is just the beginning.

In Part Two, we’ll move from principles to practice, exploring implementation patterns, observability systems, and advanced features that separate exceptional AI agents from basic chatbots. Whether you’re building customer service assistants, internal tools, or specialized AI applications, these practical insights will help you create more capable, reliable, and sophisticated systems.

Read the next installment of this guide: Engineering Excellence: How to Build Your Own AI Assistant – Part 2

Why Even the Best Conversational AI Chatbot Will Fail Your CX

As author, speaker, and customer experience expert Dan Gingiss wrote in his book The Experience Maker, “Most companies must realize that they are no longer competing against the guy down the street or the brand that sells similar products. Instead, they’re competing with every other experience a customer has.”

That’s why so many CX leaders were (cautiously!) optimistic when Generative AI (GenAI) hit the scene, promising to provide instant, round-the-clock responses and faster issue resolutions, automate personalization at scale, and free agents to focus on more complex issues. So much so that a whopping 80% of companies worldwide now have chatbots on their websites.

Yet despite all the hype and good intentions, a recent survey showed that consumers give their chatbot experiences an average rating of 6.4/10 — which isn’t a passing grade in school, and certainly won’t cut it in business.

So why have chatbots fallen so short of company and consumer expectations? The short answer is because they’re not AI agents. Chatbots rely on rigid, rule-based systems. They struggle to understand context and adapt to complex or nuanced questions. Even the best conversational AI chatbot doesn’t have what it takes to enable CX leaders to create seamless customer journeys. This is why they so often fail at driving outcomes like revenue and CSAT.

Let’s look at the most impactful differences between these two AI for CX solutions, including why even the best conversational AI chatbots are failing CX teams and their customers — and how AI agents are changing the game.

Chatbots: First-generation AI and Intent-based Responses

AI is advancing at lightning speed, so it should come as no surprise that many vendors are having trouble keeping up. The truth is that most AI for CX tools still offer chatbots built on first-generation AI, rather than AI agents that are powered by the latest and greatest Large Language Models (LLMs).

This first-generation AI is rule-based and uses Natural Language Processing (NLP) to attempt to match users’ questions to specific, pre-defined queries and responses. In other words, CX teams must create lists of different ways users might pose the same question or request, or “intents.” AI does its best to determine which “intent” a user’s message aligns with, and then sends what has been labeled the “correct” corresponding response.


This approach can cause many problems that ultimately add friction to the customer journey and create frustrating brand experiences, including:

  • Intent limitations: If a user asks a multi-part question (e.g. “Can I unsubscribe from your newsletter and have sales contact me?”), the bot will recognize and answer only one intent and ignore the other, which is insufficient.
  • Rigid paths: If a user asks a question that the bot knows requires additional information, it will start the user down a rigid, predefined path to collect that information. If the user provides additional relevant details (e.g. “I would still like to receive customer-only emails”), the bot will continue to push them down this specific path before providing an answer.
    On the other hand, if the user asks an unrelated follow-up question, the bot will zero in on this new “intent” and start the user down a new path, abandoning the previous flow without resolving their original inquiry.
  • Confusing intents: There are countless ways to phrase the same request, so the likelihood of a user’s inquiry not matching a predefined intent is high (e.g. “I want you to delete my contact info!”). In this case, the bot doesn’t know what to do and must escalate to a live agent — or worse, it misunderstands the user’s intent and sends the wrong response.
  • Conflicting intents: Because similar words and phrases can appear across unrelated issues, there is often contention across predefined intents (e.g. “I accidentally unsubscribed from your newsletter.”). Even the best conversational AI chatbot is likely to match the user’s intent with the wrong response and deliver an unrelated and seemingly nonsensical answer — an issue similar to hallucinations.

Some AI for CX vendors claim their chatbots use the most advanced GenAI. However, they are really using only a fraction of an LLM’s power to generate a response from a knowledge base, rather than crafting personalized answers to specific questions. But because they still use the same outdated, intent-based process to determine the user’s request, the LLM will still struggle to generate a sufficient, appropriate response — if the issue isn’t escalated to a live agent first, that is.

AI Agents: Cutting-edge Models with Reasoning Capabilities

Top AI for CX vendors use the latest and greatest LLMs to power every step of the customer interaction, not just at the end to generate a response. This results in a much more accurate, personalized, and empathetic experience, enabling them to provide clients with AI agents — not chatbots.


Rather than relying on rigid intent classification, AI agents use LLMs to comprehend language and genuinely understand a user’s request, much like humans do. They can also contextualize the question and append the conversation with additional attributes accessed from other CX systems, such as a person’s location or whether they are an existing customer (more on that in this guide).

This level of reasoning is achieved through business logic, which guides the conversation flow through a series of “pre-generation checks” that happen in the background in mere seconds. These require the LLM to first answer “questions about the question” before generating a response, including if the request is in scope, sensitive in nature, about a specific product or service, or requires additional information to answer effectively.
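
For readers curious what this looks like mechanically, here is a deliberately simplified, hypothetical sketch of pre-generation checks; the check names, wording, and call_llm stub are illustrative and not a description of any particular vendor’s implementation.

```python
# Sketch (illustrative only): "questions about the question" asked before generating a response.

def call_llm(prompt: str) -> str:
    """Placeholder for a chat-completion client; returns a canned verdict for the demo."""
    return "yes"

PRE_GENERATION_CHECKS = {
    "in_scope": "Is this request about our products or services? Answer yes or no.",
    "sensitive": "Does this request involve sensitive personal or payment data? Answer yes or no.",
    "needs_more_info": "Is more information required to answer accurately? Answer yes or no.",
}

def run_pre_checks(message: str) -> dict:
    return {name: call_llm(f"{question}\nUser message: {message}")
            for name, question in PRE_GENERATION_CHECKS.items()}

print(run_pre_checks("Can I unsubscribe from your newsletter and have sales contact me?"))
```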

Sidenote! 

The best AI for CX vendors never use client data to train LLMs to “invent” answers to questions about their products or services. Instead, the LLMs must generate responses using information from specific, trusted knowledge sources that the client has pre-approved. 

This means AI agents harness the language and communication capabilities of GenAI only, greatly reducing the need for CX leaders to worry about data security or hallucinations. You can go here to learn more.

 


The same process happens after the LLM has generated a response (“post-generation checks”), where the LLM must answer “questions about the answer” to ensure that it’s accurate, in context, on brand, etc. Leveraging the reasoning power of LLMs coupled with this conversational framework enables the AI agent to outperform even the best conversational AI chatbots in many key areas.

Providing sufficient answers to multi-part questions

Unlike a chatbot, the agent is not trying to map a specific question to a single, canned answer. Instead, it’s able to interpret the entirety of the user’s question, identify all relevant knowledge, and combine it to generate a comprehensive response that directly answers the user’s inquiry.

Dynamically answering unrelated questions and factoring in new information

AI agents will prompt users to provide additional information as needed to effectively respond to their requests. However, if the user volunteers additional information, the agent will factor this into the context of the larger conversation, rather than continuing to force them down a step-by-step path like a chatbot does. This effectively bypasses the need for many disambiguating questions.

Similarly, if a user asks an unrelated follow-up question, the agent will respond to the question without losing sight of the original inquiry, providing answers and maintaining the flow of the conversation while still collecting the information it needs to solve the original issue.

Understanding nuances

Unlike chatbots, next-gen AI agents excel at comprehending human language and picking up on nuances in user questions. Rather than having to identify a user’s intent and match it with the correct, predefined response, they can recognize that similar requests can be phrased differently, and that dissimilar questions may contain many of the same words. This allows them to flexibly understand users’ questions and identify the right knowledge to generate an accurate response without requiring an exact match.


It’s also worth noting that first-generation AI vendors often force clients to build a new chatbot for every channel: voice, SMS, Facebook Messenger, etc. Not only does this mean a lot of duplicate work for internal teams on the back end, but it can also lead to disjointed brand experiences on the front end. In contrast, next-generation AI for CX vendors allows clients to build a single agent and run it across multiple channels for a more seamless customer journey.

Is Your “Best-in-Class” AI Chatbot Killing Your Customer Journey?

Some 80% of customers say the experience a company provides is equally as important as its products and services. However, according to Gartner, more than half of large organizations have failed to unify customer engagement channels and provide a streamlined experience across them.

As you now know, even the best conversational AI chatbot will exacerbate rather than improve this issue. Our latest guide deep dives into more ways your chatbot is harming CX, from offering multi-channel-only support to measuring the wrong things, as well as the steps you can take to provide consumers with a more seamless journey. You can give it a read here!

Evolving the Voice AI Chatbot: From Bots to Voice AI Agents & Their Impact on CX Leaders

Voice AI has come a long way from its humble beginnings, evolving into a powerful tool that’s reshaping customer service. In this blog, we’ll explore how Voice AI has grown to address its early limitations, delivering impactful changes that CX leaders can no longer ignore. Learn how these advancements create better customer experiences, and why staying informed is essential to staying competitive.

The Voice AI Journey

Customer expectations have evolved rapidly, demanding faster and more personalized service. Over the years, voice interactions have transformed from rigid, rules-based voice chatbot systems to today’s sophisticated AI-driven solutions. For CX leaders, Voice AI has emerged as a crucial tool for driving service quality, streamlining operations, and meeting customer needs more effectively.

Key Concepts

Before diving into this topic, readers, especially CX leaders, should be familiar with the following key terms to better understand the technology and its impact. The following is not a comprehensive list, but should provide the background to clarify terminology and identify the key aspects that have contributed to this evolution.

Speech-enabled systems vs. chatbots vs. AI agents

  • Speech-enabled systems: Speech-enabled systems are basic tools that convert spoken language into text, but do not include advanced features like contextual understanding or decision-making capabilities.
  • Chatbots: Chatbots are systems that interact with users through text, answering questions and completing tasks using either set rules or AI to understand user inputs.
  • AI agents: AI agents are smart conversational systems that help with complex tasks, learn from interactions, and adjust their responses to offer more personalized and relevant assistance over time.

Rules-based (previous generation) vs. Large Language Models or LLMs (next generation)

  • Previous gen: Lacks adaptability, struggles with natural language nuances, and fails to offer a personalized experience.
  • Next-gen (LLM-based): Uses LLMs to understand intent, generate responses, and evolve based on context, improving accuracy and depth of interaction.

Agent Escalation: A process in which the Voice AI system hands off the conversation to a human agent, often seamlessly.

AI Agent: A software program that autonomously performs tasks, makes decisions, and interacts with users or systems using artificial intelligence. It can learn and adapt over time to improve its performance, commonly used in customer service, automation, and data analysis.

Depending on their purpose, AI agents can be customer-facing or assist human agents by providing intelligent support during interactions. They function based on algorithms, machine learning, and natural language processing to analyze inputs, predict outcomes, and respond in real-time.

Automated Speech Recognition (ASR): The technology that enables machines to understand and process human speech. It’s a core component of Voice AI systems, helping them identify spoken words accurately.

Context Awareness: Voice AI’s ability to remember previous interactions or conversations, allowing it to maintain a flow of dialogue and provide relevant, contextually appropriate responses.

Conversational AI: Conversational AI refers to technologies that allow machines to interact naturally with users through text or speech, using tools like LLMs, NLU, speech recognition, and context awareness.

Conversation Flow: The logical structure of a conversation, including how the Voice AI chatbot guides interactions, asks follow-up questions, and handles different branches of user input.

Generative AI: A type of artificial intelligence that creates new content, such as text, images, audio, or video, by learning patterns from existing data. It uses advanced models, like LLMs, to generate outputs that resemble human-made content. Generative AI is commonly used in creative fields, automation, and problem-solving, producing original results based on the data it has been trained on.

Intent Recognition: The process by which a Voice AI system identifies the user’s goal or purpose behind their speech input. Understanding intent is critical to delivering appropriate and relevant responses.

LLMs: LLMs are sophisticated machine learning systems trained on extensive text data, enabling them to understand context, generate nuanced responses, and adapt to the conversational flow dynamically.

Machine Learning (ML): A type of AI that allows systems to automatically learn and improve from experience without being explicitly programmed. ML helps voice AI chatbots adapt and improve their responses based on user interactions.

Multimodal: The ability of a system or platform to support multiple modes of communication, allowing customers and agents to interact seamlessly across various channels.

Multi-Turn Conversations: This refers to the ability of Voice AI systems to engage in extended dialogues with users across multiple steps. Unlike simple one-question, one-response setups, multi-turn conversations handle complex interactions.

Natural Language Processing (NLP): A branch of AI that helps computers understand and interpret human language. It is the key technology behind voice and text-based AI interactions.

Omnichannel Experience: A seamless customer experience that integrates multiple channels (such as voice, text, and chat) into one unified system, allowing customers to transition between them without friction.

Rules-based approach: This approach uses predefined scripts and decision trees to respond to user inputs. These systems are rigid, with limited conversational abilities, and struggle to handle complex or unexpected interactions, leading to a less flexible and often frustrating user experience.

Sentiment Analysis: A feature of AI that interprets the emotional tone of a user’s input. Sentiment analysis helps Voice AI determine the customer’s mood (e.g., frustrated or satisfied) and tailor responses accordingly.

Speech Recognition / Speech-to-Text (STT): Speech Recognition, or Speech-to-Text (STT), converts spoken language into text, allowing the system to process it. It’s a key step in making voice-based AI interactions possible.

Text-to-Speech (TTS): The opposite of STT, TTS refers to the process of converting text data into spoken language, allowing digital solutions to “speak” responses back to users in natural language.

Voice AI: Voice AI is a technology that uses artificial intelligence to understand and respond to spoken language, allowing machines to have more natural and intuitive conversations with people.

Voice User Interface (VUI): Voice User Interface (VUI) is the system that enables voice-based interactions between users and machines, determining how naturally and effectively users can communicate with Voice AI systems.

The humble beginnings of rules-based voice systems

Voice AI has been nearly 20 years in the making, starting with basic rules-based systems that followed predefined scripts. These early systems could automate simple tasks, but if customers asked anything outside the programmed flow, the system fell short. It couldn’t handle natural language or adapt to the unexpected, leading to frustration for both customers and CX teams.

For CX leaders, these systems posed more challenges than solutions. Robotic interactions often required human intervention, negating the efficiency benefits. It became clear that something more flexible and intelligent was needed to truly transform customer service.

The rise of AI and speech-enabled systems

As businesses encountered the limitations of rules-based systems, the next chapter in the evolution of Voice AI introduced speech-enabled systems. These systems were a step forward, as they allowed customers to interact more naturally with technology by transcribing spoken language into text. However, while they could accurately convert speech to text, solving one issue, they still struggled with a critical challenge: they couldn’t grasp the underlying meaning or the sentiment behind the words.

This gap led to the emergence of the first generation of AI, which represented a significant improvement over simple chatbots. These more intelligent systems were more helpful for customer interactions, but they still fell short of providing the seamless, human-like conversations that CX leaders envisioned. While customers could speak to AI-powered systems, the experience was often inconsistent, especially when dealing with complex queries. The advancement of AI was another improvement, but it was still limited by the rules-based logic it evolved from.

The challenge stemmed from the inherent complexity of language. People express themselves in diverse ways, using different accents, phrasing, and expressions. Language rarely follows a single, rigid pattern, which made it difficult for early speech systems to interpret accurately.

These AI systems were a huge leap in progress and created hope for CX leaders. Intelligent systems that could adapt and respond to users’ speech were powerful, but not enough to make a full transformation in the CX world.

The AI revolution: From rules-based to next-gen LLMs

The real breakthrough came with the rise of LLMs. Unlike rigid rules-based systems, LLMs use neural networks to understand context and intent, creating truly natural, fluid, human-like conversations. Now, AI could respond intelligently, adapt to the flow of interaction, and provide accurate answers.

For CX leaders, this was a pivotal moment. No more frustrating dead ends or rigid scripts—Voice AI became a tool that could offer context-aware services, helping businesses cut costs while enhancing customer satisfaction. The ability to deliver meaningful, efficient service marked a turning point in customer engagement.

What makes Voice AI work today?

Today’s Voice AI systems combine several advanced technologies:

  • Speech-to-Text (STT): Converts spoken language into text with high accuracy.
  • AI Intelligence: Powered by NLU and LLMs, the AI deciphers customer intent and delivers contextually relevant responses.
  • Text-to-Speech (TTS): Translates the AI’s output back into natural-sounding speech for smooth and realistic communication.

These technologies work together to enable smarter, faster service, reduce the load on human agents, and provide a more intuitive customer experience.

The transformation: What changed with next-gen Voice AI?

With advancements in NLP, ML, and omnichannel integration, Voice AI has evolved into a dynamic, intelligent system capable of delivering personalized, empathetic responses. Machine Learning ensures that the system learns from every interaction, continuously improving its performance. Omnichannel integration allows Voice AI to operate seamlessly across multiple platforms, providing a unified customer experience. This is crucial for the transformation of customer service.

Rather than simply enhancing voice interactions, omnichannel solutions select the best communication channel within the same interaction, ensuring customers receive a complete answer and any necessary documentation to resolve their issue – via email or SMS.

For CX leaders, this transformation enables them to offer real-time, personalized service, with fewer human touchpoints and greater customer satisfaction.

The four big benefits of next-gen Voice AI for CX leaders

The rise of next-gen Voice AI from previous-gen Voice AI chatbots offers CX leaders powerful benefits, transforming how they manage customer interactions. These advancements not only enhance the customer experience, but also streamline operations and improve business efficiency.

1. Enhanced customer experience

With faster, more accurate, and context-aware responses, Voice AI can handle complex queries with ease. Customers no longer face frustrating dead ends or robotic answers. Instead, they get intelligent, conversational interactions that leave them feeling heard and understood.

2. 24/7 availability

Voice AI is always on, providing customers with support at any time, day or night. Whether it’s handling routine inquiries or resolving issues, Voice AI ensures customers are never left waiting for help. This around-the-clock service not only boosts customer satisfaction, but also reduces the strain on human agents.

3. Operational efficiency

By automating high volumes of customer interactions, Voice AI significantly reduces human intervention, cutting costs. Agents can focus on more complex tasks, while Voice AI handles repetitive, time-consuming queries—making customer service teams more productive and focused.

4. Personalization at scale

By learning from each interaction, the system can continuously improve and deliver tailored responses to individual customers, offering a more personalized experience for every user. This level of personalization, once achievable only through human agents, is now possible on a much larger scale.

However, while machine learning plays a critical role in making these advancements possible, it is not a “magical” solution. The improvements happen over time, as the system processes more data and refines its understanding. Although this may sound simplified, the gradual and ongoing development of machine learning can indeed lead to highly effective and powerful outcomes in the long run.

The future of Voice AI: Next-gen experience in action

Voice AI’s future is already here, and it’s evolving faster than ever. Today’s systems are almost indistinguishable from human interactions, with conversations flowing naturally and seamlessly. But the leap forward doesn’t stop at just sounding more human—Voice AI is becoming smarter and more intuitive, capable of anticipating customer needs before they even ask. With AI-driven predictions, Voice AI can now suggest solutions, recommend next steps, and provide highly relevant information, all in real time.

Imagine a world where Voice AI understands a customer’s speech and then anticipates what is needed next. Whether it’s guiding them through a purchase, solving a complex issue, or offering personalized recommendations, technology is moving toward a future where customer interactions are smooth, proactive, and entirely customer-centric.

For CX leaders, this opens up incredible opportunities to stay ahead of customer expectations. Those adopting next-gen Voice AI now are leading the charge in customer service innovation, offering cutting-edge experiences that set them apart from competitors. And as this technology continues to evolve, it will only get more powerful, more intuitive, and more essential for delivering world-class service.

The new CX frontier with Voice AI

As Voice AI continues to evolve from the simple Voice AI chatbot of yesteryear, we are entering a new frontier in customer experience. What started as a rigid, rules-based system has transformed into a dynamic, intelligent agent capable of revolutionizing how businesses engage with their customers. For CX leaders, this new era means greater personalization, enhanced efficiency, and the ability to meet customers where they are—whether it’s through voice, chat, or other digital channels.

We’ve made significant progress in this development, but it is far from over. Voice AI keeps expanding, from deeper integrations with emerging technologies to more advanced predictive capabilities that can elevate customer experiences to new heights. The future holds more exciting developments, and staying ahead will require ongoing adaptation and a willingness to embrace change.

Omnichannel capabilities are just the beginning

One fascinating capability of Voice AI is its ability to seamlessly integrate across multiple platforms, making it a truly omnichannel experience. For example, imagine you’re on a phone call with an AI agent, but due to background noise, it becomes difficult to hear. You could effortlessly switch to texting, and the conversation would pick up exactly where it left off in your text messages, without losing any context.

Similarly, if you’re on a call and need to share a photo, you can text the image to the AI agent, which can interpret the content of the photo and respond to it—all while continuing the voice conversation.

Another example of this multi-modal functionality is when you’re on a call and need to spell out something complex, like your last name. Rather than struggle to spell it verbally, you can simply text your name, and the Voice AI system will incorporate the information without disrupting the flow of the interaction. These types of seamless transitions between different modes of communication (voice, text, images) are what make multi-modal Voice AI truly revolutionary.

Voice AI’s exciting road ahead

From the original Voice AI chatbot to today's systems, this evolution has already transformed the customer experience, and the future promises continued innovation. From intelligent, human-like conversations to predictive capabilities that anticipate needs, Voice AI is set to change the way businesses interact with their customers in profound ways.

The exciting thing is that this is just the beginning.

The next wave of Voice AI advancements will open up new possibilities that we can only imagine. As a CX leader, the opportunity to harness this technology and stay ahead of customer expectations is within reach. It could be the most exciting time to be at the forefront of these changes.

At Quiq, we are here to guide you through this journey. If you’re curious about our Voice AI offering, we encourage you to watch our recent webinar on how we harness this incredible technology.

One thing is for sure, though: As the landscape continues to evolve, we’ll be right alongside you, helping you adapt, innovate, and lead in this new era of customer experience. Stay tuned, because the future of Voice AI is just getting started, and we’ll continue to share insights and strategies to ensure you stay ahead in this rapidly changing world.

National Furniture Retailer Reduces Escalations to Human Agents by 33%

A well-known furniture brand faced a significant challenge in enhancing their customer experience (CX) to stand out in a competitive market. By partnering with Quiq, they implemented a custom AI Agent to transform customer interactions across multiple platforms and create more seamless journeys. This strategic move resulted in a 33% reduction in support-related escalations to human agents.

Beyond reducing escalations, the implementation of Proactive AI and a Product Recommendation engine led to the largest chat sales day in the company's history, showcasing the power of AI in improving efficiency and driving revenue.

Let’s dive into the furniture retailer’s challenges, how Quiq solved them using next-generation AI, the results, and what’s next for this household name in furniture and home goods.

The challenges: CX friction and missed sales opportunities

A leading name in the furniture and home goods industry, this company has long been known for its commitment to quality and affordability. Operating in a sector that is often the first to signal economic shifts, the company recognized the need to differentiate itself through exceptional customer experience.

Before adopting Quiq's solution, the company struggled with several CX challenges that impeded their ability to capitalize on customer interactions. To start, their original chatbot relied on basic natural language understanding (NLU) and failed to deliver seamless, satisfactory customer journeys.

Customers experienced friction, which led to escalations and redundant conversations. The team clearly needed a robust system that could streamline operations, reduce costs, and enhance customer engagement.

So, the furniture retailer sought a solution that could not only address these inefficiencies, but also support their sales organization by effectively capturing and routing leads.

The solution: Quiq’s next-gen AI

With a focus on enhancing every touch point of the customer journey, the furniture company’s CX team embarked on a mission to elevate their service offerings, making CX a primary differentiator. Their pursuit led them to Quiq, a trusted technology partner poised to bring their vision to life through advanced AI and automation capabilities.

Quiq partnered with the team to develop a custom AI Agent, leveraging the natural language capabilities of Large Language Models (LLMs) to help classify sales vs. support inquiries and route them accordingly. This innovative solution enables the company to offer a more sophisticated and engaging customer experience.

The AI Agent was designed to retrieve accurate information from various systems—including the company’s CRM, product catalog, and FAQ knowledge base—ensuring customers received timely, relevant, and accurate responses.

By integrating this AI Agent into webchat, SMS, and Apple Messages for Business, the company successfully created a seamless, consistent, and faster service experience.

The AI Agent also facilitated proactive customer engagement by using a new Product Recommendation engine. This feature not only guided customers through their purchase journey, but also contributed to a significant shift in sales performance.

The results are nothing short of incredible

The implementation of the custom AI Agent by Quiq has already delivered remarkable results. One of the most significant achievements was a 33% reduction in escalations to human agents. This reduction translated to substantial operational cost savings and allowed human agents to focus on complex or high-value interactions, enhancing overall service quality.

Moreover, the introduction of Proactive AI and the Product Recommendation engine led to unprecedented sales success. The furniture retailer experienced its largest sales day for Chat Sales in the company’s history, with an impressive 10% of total daily sales attributed to this channel for the first time.

This outcome underscored the potential of AI-powered solutions in driving business growth, optimizing efficiency, and elevating customer satisfaction.

Results recap:

  • 33% reduction in escalations to human agents.
  • 10% of total daily sales attributed to chat (largest for the channel in company history).
  • Tighter, smoother CX with Proactive AI and Product Recommendations woven into customer interactions.

What’s next?

The partnership between this furniture brand and Quiq exemplifies the transformative power of AI in redefining customer experience and achieving business success. By addressing challenges with a robust AI Agent, the company not only elevated its CX offerings, but also significantly boosted its sales performance. This case study highlights the critical role of AI in modern business operations and its impact on a company’s competitive edge.

Looking ahead, the company and Quiq are committed to continuing their collaboration to explore further AI enhancements and innovations. The team plans to implement Agent Assist, followed by Voice and Email AI to further bolster seamless customer experiences across channels. This ongoing partnership promises to keep the furniture retailer at the forefront of CX excellence and business growth.

What is LLM Function Calling and How Does it Work?

For all their amazing capabilities, LLMs have a fundamental weakness: they can’t actually do anything. They read a sequence of input tokens (the prompt) and produce a sequence of output tokens (one at a time) known as the completion. There are no side effects—just inputs and outputs. So something else, such as the application you’re building, has to take the LLM’s output and do something useful with it.

But how can we get an LLM to reliably generate output that conforms to our application’s requirements? Function calls, also known as tool usages, make it easier for your application to do something useful with an LLM’s output.

Note: LLM functions and tools generally refer to the same concept. ‘Tool’ is the term used by Anthropic/Claude, whereas OpenAI uses the term function as a specific type of tool. For purposes of this article, they are used interchangeably.

What Problem Does LLM Function Calling Solve?

To better understand the problem that function calls solve, let’s pretend we’re adding a new feature to an email client that allows the user to provide shorthand instructions for an email and use an LLM to generate the subject and body:

AI Email Generator

Our application might build up a prompt request like the following GPT-4o-Mini example. Note how we ask the LLM to return a specific format expected by our application:

import requests

user = "Kyle McIntyre"
recipient = "Aunt Suzie (suzieq@mailinator.com)"
user_input = "Tell her I can't make it this Sunday, taking dog to vet. Ask how things are going, keep it folksy yet respectful."

prompt = f"""
Draft an email on behalf of the user, {user}, to {recipient}.

Here are the user’s instructions: {user_input}

Generate a subject and body. Format your response as JSON as follows:

{{
  "subject": <email subject,
  "body": <email body>
}}

Your response:
"""

request = {
  "model": "gpt-4o-mini-2024-07-18",
  "messages": [
    {
      "role": "user",
      "content": prompt
    }
  ],
  "response_format": {
    "type": "json_object"
  }
}

response = requests.post('https://api.openai.com/v1/chat/completions', headers={'Authorization': f'Bearer {secret}'}, json=request)

Assume our application sends this prompt and receives a completion back. What do we know about the completion? In a word: nothing.

Although LLMs do their best to follow our instructions, there’s no guarantee that the output will adhere to our requested schema. Subject and body could be missing, incorrectly capitalized, or perhaps be of the wrong type. Additional properties we didn’t ask for might also be included. Prior to the advent of function calls, our only options at this point were to

  • Continually tweak our prompts in an effort to get more reliable outputs
  • Write very tolerant deserialization and coercion logic in our app to make the LLM’s output adhere to our expectation
  • Retry the prompt multiple times until we receive legal output

Function calls, and a related model feature known as “structured outputs”, make all of this much easier and more reliable.
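
As a quick taste of the structured outputs feature, here's a sketch of what the same request could look like using OpenAI's JSON Schema response format, reusing the prompt and secret from the snippet above. The parameter names reflect OpenAI's API at the time of writing and differ for other vendors:

import requests

# Sketch of a structured-outputs request: the completion is constrained to the
# declared JSON Schema. "strict": True requires all properties to be listed as
# required and additionalProperties to be False.
request = {
  "model": "gpt-4o-mini-2024-07-18",
  "messages": [{"role": "user", "content": prompt}],
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "email_draft",
      "strict": True,
      "schema": {
        "type": "object",
        "properties": {
          "subject": {"type": "string"},
          "body": {"type": "string"}
        },
        "required": ["subject", "body"],
        "additionalProperties": False
      }
    }
  }
}

response = requests.post(
  'https://api.openai.com/v1/chat/completions',
  headers={'Authorization': f'Bearer {secret}'},
  json=request
)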

Function Calls to the Rescue

Let’s code up the same example using a function call. In order to get an LLM to ‘use’ a tool, you must first define it. Typically this involves giving it a name and then defining the schema of the function’s arguments.

In the example below, we define a tool named “draft_email” that takes two required arguments, body and subject, both of which are strings:

import requests

user = "Kyle McIntyre"
recipient = "Aunt Suzie (suzieq@mailinator.com)"
user_input = "Tell her I can't make it this Sunday, taking dog to vet. Ask how things are going, keep it folksy yet respectful."

prompt = f"""
Use the available function to draft an email on behalf of the user, {user}, to {recipient}.

Here are the user’s instructions: {user_input}
"""

tool = {
  "type": "function",
  "function": {
    "name": "draft_email",
    "description": "Draft an email on behalf of the user",
    "parameters": {
      "type": "object",
      "properties": {
        "subject": {
          "type": "string",
          "description": "The email subject",
        },
        "body": {
          "type": "string",
          "description": "The email body",
        }
      },
      "required": ["subject", "body"]
    }
  },
}

request = {
  "model": "gpt-4o-mini-2024-07-18",
  "messages": [
    {
      "role": "user",
      "content": prompt
    }
  ],
  "tools": [tool]
}

response = requests.post('https://api.openai.com/v1/chat/completions', headers={'Authorization': f'Bearer {key}'}, json=request)

Defining the tool required some extra work on our part, but it also simplified our prompt. We’re no longer trying to describe the shape of our expected output and instead just say “use the available function”. More importantly, we can now trust that the LLM’s output will actually adhere to our specified schema!

Let’s look at the response message we received from GPT-4o-Mini:

{
  "role": "assistant",
  "content": null,
  "tool_calls": [
    {
      "type": "function",
      "function": {
        "name": "draft_email",
        "arguments": "{\"subject\":\"Regrets for This Sunday\",\"body\":\"Hi Aunt Suzie,\\n\\nI hope this email finds you well! I wanted to let you know that I can't make it this Sunday, as I need to take the dog to the vet. \\n\\nHow have things been going with you? I always love hearing about what\u2019s new in your life.\\n\\nTake care and talk to you soon!\\n\\nBest,\\nKyle McIntyre\"}"
      }
    }
  ],
  "refusal": null
}

What we received back is really a request from the LLM to ‘call’ our function. Our application still needs to honor the function call somehow.

But now, rather than having to treat the LLM's output as an opaque string, we can trust that the arguments adhere to our application's requirements. The ability to define a contract and trust that the LLM's output will adhere to it makes function calls an invaluable tool when integrating an LLM into an application.
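
Honoring the call usually means parsing the arguments and handing them to real application logic. Here's a minimal sketch, assuming the standard chat completions response envelope (choices[0].message) and a hypothetical send_email helper in our own application:

import json

message = response.json()["choices"][0]["message"]

for tool_call in message.get("tool_calls") or []:
    fn = tool_call["function"]
    if fn["name"] == "draft_email":
        # Arguments arrive as a JSON string that now reliably matches our schema.
        args = json.loads(fn["arguments"])
        # send_email is a hypothetical helper in our own application, not part of any API.
        send_email(to="suzieq@mailinator.com", subject=args["subject"], body=args["body"])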

How Does Function Calling Work?

As we saw in the last section, in order to get an LLM to generate reliable outputs, we have to define a function or tool for it to use. Specifically, we're defining a schema that the output needs to adhere to. Function calls and tools work a bit differently across LLM vendors, but they all require the declaration of a schema, and most are based on the open JSON Schema standard.

So, how does an LLM ensure that its outputs adhere to the tool schema? How can stochastic token-by-token output generation be reconciled with strict adherence to a data schema?

The solution is quite elegant: LLMs, or generative AI, still generate their outputs one token at a time when calling a function, but the model is only allowed to choose from the subset of tokens that would keep the output in compliance with the schema. This is done through dynamic token masking based on the schema’s definition. In this way the output is still generative and very intelligent, but guaranteed to adhere to the schema.
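
To make the idea concrete, here's a toy illustration of constrained decoding. Real providers derive the token mask from the JSON Schema itself; this sketch fakes it with an explicit list of legal completions so the masking step is easy to see:

import random

# Toy illustration of schema-constrained decoding. The "schema" here is just an
# explicit list of legal outputs; real systems compute the mask from the schema.
LEGAL_OUTPUTS = ['{"answer": "yes"}', '{"answer": "no"}']
VOCAB = ['{"answer": ', '"yes"', '"no"', '"maybe"', '}', 'Sure, here you go: ']

def allowed(prefix: str) -> list[str]:
    """Tokens that keep the partial output a prefix of some legal completion."""
    return [t for t in VOCAB if any(s.startswith(prefix + t) for s in LEGAL_OUTPUTS)]

def generate() -> str:
    out = ""
    while out not in LEGAL_OUTPUTS:
        candidates = allowed(out)
        # In a real model, the LLM's probabilities pick among the unmasked
        # tokens; random.choice stands in for that here.
        out += random.choice(candidates)
    return out

print(generate())  # always schema-valid, e.g. {"answer": "yes"}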

Function Calling Misnomers and Misconceptions

The name ‘function call’ is somewhat misleading because it sounds like the LLM is going to actually do something on your behalf (and thereby cause side effects). But it doesn’t. When the LLM decides to ‘call’ a function, that just means that it’s going to generate output that represents a request to call that function. It’s still the responsibility of your application to handle that request and do something with it—but now you can trust the shape of the payload.

For this reason, an LLM function doesn't need to map directly to any true function or method in your application, or any real API. Instead, LLM functions can (and probably should) be defined to be more conceptual from the perspective of the LLM.

Use in Agentic Workflows

So, are function calls only useful for constraining output? While that is certainly their primary purpose, they can also be quite useful in building agentic workflows. Rather than presenting a model with a single tool definition, you can instead present it with multiple tools and ask the LLM to use the tools at its disposal to help solve a problem.

For example, you might provide the LLM with the following tools in a CX context:

  • escalate() – Escalate the conversation to a human agent for further review
  • search(query) – Search a knowledgebase for helpful information
  • emailTranscript() – Email the customer a transcript of the conversation

When using function calls in an agentic workflow, the application typically interprets the function call and somehow uses it to update the information passed to the LLM in the next turn.
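
A minimal sketch of that loop, assuming an OpenAI-style chat completions API, a tools list containing schemas like the ones above, and a hypothetical run_tool dispatcher in our own application, might look like this:

import json, requests

# Sketch of an agentic turn: keep calling the LLM, executing any requested
# tools and feeding the results back, until it produces a plain answer.
def run_turn(messages, tools, key):
    while True:
        resp = requests.post(
            'https://api.openai.com/v1/chat/completions',
            headers={'Authorization': f'Bearer {key}'},
            json={"model": "gpt-4o-mini-2024-07-18", "messages": messages, "tools": tools},
        ).json()
        message = resp["choices"][0]["message"]
        messages.append(message)

        tool_calls = message.get("tool_calls")
        if not tool_calls:
            return message["content"]  # plain answer for the user; no tool needed

        for call in tool_calls:
            args = json.loads(call["function"]["arguments"] or "{}")
            result = run_tool(call["function"]["name"], args)  # hypothetical dispatcher, e.g. search(query)
            # Feed the tool's result back to the LLM for its next step.
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": json.dumps(result),
            })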

It’s also worth noting that conversational LLMs can call functions and generate output messages intended for the user all at the same time. If you were building an AI DJ, the LLM might call a function like play_track(“Traveler”, “Chris Stapleton”) while simultaneously saying to the user: “I’m spinning up one of your favorite Country tunes now”.

Function Calling in Quiq’s AI Studio

Function calling is fully supported in Quiq’s AI Studio on capable LLMs. However, AI Studio goes further than basic function call support in three key ways:

  1. The expected output shape of any prompt (the completion schema) can be visually configured in the Prompt Editor
  2. Prompt outputs aren’t just used for transient function calls but become attached to the visual flow state for inspection later in the same prompt chain or conversation
  3. Completion schemas can be configured on LLMs – even those that don’t support function calls

If you're interested in learning more about AI Studio, please request a trial.

Why LLM Observability Matters (and Strategies for Getting it Right)

When integrating Large Language Models (LLMs), or generative AI, into applications, you can’t afford to treat them like “black boxes.” As your LLM application scales and becomes more complex, the need to monitor, troubleshoot, and understand how the LLM impacts your application becomes critical. In this article, we’ll explore the observability strategies we’ve found useful here at Quiq.

Key Elements of an Effective LLM Observability Strategy

  1. Provide Access: Encourage business users to engage actively in testing and optimization.
  2. Encourage Exploration: Make it easy to explore the application under different scenarios.
  3. Create Transparency: Clearly show how the model interacts within your application by revealing decision-making processes, system interactions, and how outputs are verified.
  4. Handle Errors Gracefully: Proactively identify and handle deviations or errors.
  5. Track System Performance: Expose metrics like response times, token usage, and errors.

LLMs add a layer of unpredictability and complexity to an application. Your observability tooling should allow you to actively explore both known and unknown issues while fostering an environment where engineers and business users can collaborate to create a new kind of application.

5 Strategies for LLM Observability

We will discuss these strategies from the perspective of a real-world event. An "event" is anything that triggers an application to process input and provide output back to the world.

A few examples of events include:

  • Chat user message input > Chat response
  • An email arriving into a ticketing system > Suggested reply
  • A case being closed > Case updated for topic or other classifications

You may have heard of these events referred to as prompt chains, prompt pipelines, agentic workflows, or conversational turns. The key takeaway: an event will typically require more than a single call to an LLM. Your LLM application's job is to orchestrate LLM prompts, data requests, decisions, and actions. The following strategies will help you understand what's happening inside your LLM application.

1. Tracing Execution Paths

Any given event may follow different execution paths. Tracing the execution path should allow you to understand what state is set, which knowledge was retrieved, functions called, and generally how and why the LLM generated and verified the response. The ability to trace the execution path of an event will provide invaluable visibility into your application behavior.

For example, if your application delivers a message that offers a live agent, was it because the topic was sensitive, the user was frustrated, or there was a gap in the knowledge resources? Tracing the execution path will help you pinpoint the prompt, knowledge, or logic that drove the response. This is the first step in monitoring and optimizing an AI application. Your LLM observability tooling should provide a full trace of the execution path that led to a response being delivered.
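
There's no single standard for what a trace looks like, but the idea can be as simple as appending a structured record for every meaningful step. A minimal sketch, with illustrative step names and fields:

import json, time, uuid

# Sketch of execution-path tracing: every meaningful step appends a structured
# record to a trace that can be stored and inspected later.
class Trace:
    def __init__(self, event_type):
        self.trace_id = str(uuid.uuid4())
        self.event_type = event_type
        self.steps = []

    def record(self, step, **details):
        self.steps.append({"step": step, "at": time.time(), **details})

    def dump(self):
        return json.dumps({"trace_id": self.trace_id,
                           "event": self.event_type,
                           "steps": self.steps}, indent=2)

trace = Trace("chat_user_message")
trace.record("retrieval", query="order status", articles=["kb-124", "kb-207"])
trace.record("llm_prompt", name="answer_generation", model="gpt-4o-mini", tokens=412)
trace.record("verification", score=0.93, verified=True)
trace.record("response_delivered", offered_live_agent=False)
print(trace.dump())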

2. Replay Mechanisms for Faster Debugging

In real-world applications, being able to reproduce and fix errors quickly is critical. Implementing an event replay mechanism, where past events can be replayed against the current system configuration, provides a fast feedback loop.

Replaying events also helps when modifying prompts, upgrading models, adding knowledge or editing business rules. Changing your LLM application should be done in a controlled environment where you can replay events and ensure the desired effect without introducing new issues.
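
Conceptually, a replay is just re-running a stored event's inputs through the current configuration and diffing the outputs. A minimal sketch, with illustrative field names:

# Sketch of event replay: persist each event's inputs and output, then re-run
# the inputs through the current prompts/models/rules and compare.
def replay(stored_event, run_event):
    new_output = run_event(stored_event["inputs"])   # run_event is your application's entry point
    return {
        "event_id": stored_event["id"],
        "old_output": stored_event["output"],
        "new_output": new_output,
        "changed": new_output != stored_event["output"],
    }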

3. State Management & Monitoring

Another key aspect of LLM observability is capturing how your application's field values or state change during an event, as well as across related events, such as a conversation. Understanding the state of different variables can help you better understand and recreate the results of your LLM application.

Many use cases will also make use of memory. You should strive to manage this memory consistently and use caching for order or product info to reduce unnecessary network calls. In addition to data caches, multi-turn conversations may react differently based on the memory state. Suppose a user types “I need help” and you have implemented a next-best-action classifier with the following options:

  • Clarify the inquiry
  • Find Information
  • Escalate to live agent

The action taken may depend on whether “I need help” is the 1st or 5th message of the conversation. The response could also depend on whether the inquiry type is something you want your live agents handling.

The key takeaway – LLMs introduce a new kind of intelligence, but you’ll still need to manage state and domain specific logic to ensure your application is aware of its context. Clear visibility into the state of your application and your ability to reproduce it are vital parts of your observability strategy.

4. Claims Verification

A critical challenge with LLMs is ensuring the validity of the information they generate. Some refer to these made-up answers as hallucinations: plausible-sounding statements fabricated by the LLM, usually because they make semantic sense even though they aren't grounded in any source.

A claims verification process provides confidence that a response is grounded, attributable, and verified against approved evidence from known knowledge or API resources. A dedicated verification model should provide a confidence score, and handling should be in place for answers that fail verification. The verification process should track metrics such as the maximum, minimum, and average scores, and attribute answers to one or more resources.

For example:

  • On Verified: Define actions to take when a claim is verified. This could involve attributing the answer to one or many articles or API responses and then delivering a response to the end user.
  • On Unverified: Set workflows for unverified claims, such as retrying a prompt pipeline, aligning a corrective response, or escalating the issue to a human agent.

By integrating a claims verification model and process into your LLM application, you gain the ability to prevent hallucinations and attribute responses to known resources. This clear and traceable attribution will equip you with the information you need to field questions from stakeholders and provide insight into how you can improve your knowledge.
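
As a sketch of how such a gate might sit in the pipeline, assume a verify_claims function that stands in for a dedicated verification model and returns one score per claim; the threshold and the handlers below are illustrative:

# Sketch of a claims-verification gate around a drafted answer.
def deliver_with_verification(draft_answer, evidence, verify_claims, threshold=0.8):
    scores = verify_claims(draft_answer, evidence)   # one score per claim
    result = {"min": min(scores), "max": max(scores),
              "avg": sum(scores) / len(scores)}
    if result["min"] >= threshold:
        # On Verified: attribute the answer to its supporting resources and deliver it.
        return {"action": "deliver", "attribution": [e["id"] for e in evidence], **result}
    # On Unverified: retry the prompt pipeline, align a corrective response, or escalate.
    return {"action": "escalate", **result}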

5. Regression Tests

After optimizing prompts, upgrading models, or introducing new knowledge, you'll want to ensure that these changes don't introduce new problems. Earlier, we talked about replaying events, and this replay capability should be the basis for creating your test cases. You should be able to save any event as a regression test. Your test sets should be runnable individually or in batch as part of a continuous integration pipeline.

The models are moving fast and your LLM application will be under constant pressure to get faster, smarter and cheaper. Test sets will give you the visibility and confidence you need to stay ahead of your competition.
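
As a sketch of what that can look like in practice, assume each saved event is a JSON file and run_event is your application's entry point; pytest (or any test runner) can then replay the whole set on every change:

import json, pathlib, pytest

# Sketch of replay-based regression tests: every saved event becomes a test
# case run against the current configuration. run_event and the expectations
# format are illustrative.
CASES = [json.loads(p.read_text()) for p in pathlib.Path("regression_events").glob("*.json")]

@pytest.mark.parametrize("case", CASES, ids=lambda c: c["id"])
def test_replayed_event_still_meets_expectations(case):
    output = run_event(case["inputs"])
    assert output["goal_reached"] == case["expected"]["goal_reached"]
    assert output["verified"] is True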

Setting Performance Goals

While the above strategies are essential, it’s also important to evaluate how well your system is achieving its higher-level objectives. This is where performance goals come into play. Goals should be instrumented to track whether your application is successfully meeting the business objectives.

  • Goal Success: Measure how often your application achieves a defined objective, such as confirming an upcoming appointment, rendering an order status, or receiving positive user feedback.
  • Goal Failure: Track instances where the LLM fails to complete a task or requires human assistance.

Keep in mind that an event such as a live agent escalation could be considered success for one type of inquiry, and a failure in a different scenario. Goal instrumentation should provide a high degree of flexibility. By setting clear success and failure criteria for your application, you will be better positioned to evaluate its performance over time and identify areas for improvement.

Applying Segmentation to Hone In

Segmentation is a powerful tool for diving deeper into your LLM application's performance. By grouping conversations or events based on specific criteria, such as inquiry type, user type, or product category, you can focus your analysis on the areas that matter most to your application.

For instance, you may want to segment conversations to see if your application behaves differently on web versus mobile, or across sales versus service inquiries. You can also create more complex segments that filter interactions based on specific events, such as when an error occurred or when a specific topic category was in play. Segmentation allows you to tailor your observability efforts to the use cases and specific needs of your business.

Using Funnels for Conversion and Performance Insights

Funnels provide another layer of insight by showing how users progress through a series of steps within a customer journey or conversation. A funnel allows you to visualize drop-offs, identify where users disengage, and track how many complete the intended goal. For example, you can track the steps a customer takes when engaging with your LLM application, from initial inquiry to task completion, and analyze where drop-offs occur.

Funnels can be segmented just like other data, allowing you to drill down by platform, customer type, or interaction type. This helps you understand where improvements are needed and how adjustments to prompts or knowledge bases can enhance the overall experience.

By combining segmentation with funnel analysis, you get a comprehensive view of your LLM’s effectiveness and can pinpoint specific areas for optimization.

A/B Testing for Continuous Improvement

A/B testing is a vital tool for systematically improving LLM application performance by comparing different versions of prompts, responses, or workflows. This method allows you to experiment with variations of the same interaction and measure which version produces better results. For instance, you can test two different prompts to see which one leads to more successful goal completions or fewer errors.

By running A/B tests, you can refine your prompt design, optimize the LLM’s decision-making logic, and improve overall user experience. The results of these tests give you data-backed insights, helping you implement changes with confidence that they’ll positively impact performance.

Additionally, A/B testing can be combined with funnel analysis, allowing you to track how changes affect customer behavior at each step of the journey. This ensures that your optimizations not only improve specific interactions but also lead to better conversion rates and task completions overall.

Final Thoughts on LLM Observability

LLM observability is not just a technical necessity but a strategic advantage. Whether you’re dealing with prompt optimization, function call validation, or auditing sensitive interactions, observability helps you maintain control over the outputs of your LLM application. By leveraging tools such as event debug-replay, regression tests, segmentation, funnel analysis, A/B testing, and claims verification, you will build trust that you have a safe and effective LLM application.

Curious about how Quiq approaches LLM observability? Get in touch with us.

Everything You Need to Know About LLM Integration

It’s hard to imagine an application, website or workflow that wouldn’t benefit in some way from the new electricity that is generative AI. But what does it look like to integrate an LLM into an application? Is it just a matter of hitting a REST API with some basic auth credentials, or is there more to it than that?

In this article, we’ll enumerate the things you should consider when planning an LLM integration.

Why Integrate an LLM?

At first glance, it might not seem like LLMs make sense for your application—and maybe they don’t. After all, is the ability to write a compelling poem about a lost Highland Cow named Bo actually useful in your context? Or perhaps you’re not working on anything that remotely resembles a chatbot. Do LLMs still make sense?

The important thing to know about ‘Generative AI’ is that it’s not just about generating creative content like poems or chat responses. Generative AI (LLMs) can be used to solve a bevy of other problems that roughly fall into three categories:

  1. Making decisions (classification)
  2. Transforming data
  3. Extracting information

Let’s use the example of an inbound email from a customer to your business. How might we use LLMs to streamline that experience?

  • Making Decisions
    • Is this email relevant to the business?
    • Is this email low, medium or high priority?
    • Does this email contain inappropriate content?
    • What person or department should this email be routed to?
  • Transforming data
    • Summarize the email for human handoff or record keeping
    • Redact offensive language from the email subject and body
  • Extracting information
    • Extract information such as a phone number, business name, job title etc from the email body to be used by other systems
  • Generating Responses
    • Generate a personalized, contextually-aware auto-response informing the customer that help is on the way
    • Alternatively, deploy a more sophisticated LLM flow (likely involving RAG) to directly address the customer’s need

It’s easy to see how solving these tasks would increase user satisfaction while also improving operational efficiency. All of these use cases are utilizing ‘Generative AI’, but some feel more generative than others.
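
To make the email example concrete, here's a sketch of a single tool definition that would return the triage decision as structured data; the field names and routing options are illustrative:

# Sketch of an email-triage tool definition: one function call covers the
# decision, routing, summary, and extraction tasks listed above.
triage_tool = {
  "type": "function",
  "function": {
    "name": "triage_email",
    "description": "Classify an inbound customer email",
    "parameters": {
      "type": "object",
      "properties": {
        "relevant": {"type": "boolean"},
        "priority": {"type": "string", "enum": ["low", "medium", "high"]},
        "department": {"type": "string", "enum": ["sales", "support", "billing"]},
        "summary": {"type": "string"},
        "phone_number": {"type": "string"}
      },
      "required": ["relevant", "priority", "department", "summary"]
    }
  }
}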

When we consider decision making, data transformation and information extraction in addition to the more stereotypical generative AI use cases, it becomes harder to imagine a system that wouldn’t benefit from an LLM integration. Why? Because nearly all systems have some amount of human-generated ‘natural’ data (like text) that is no longer opaque in the age of LLMs.

Prior to LLMs, it was possible to solve most of the tasks listed above. But, it was exponentially harder. Let’s consider ‘is this email relevant to the business’. What would it have taken to solve this before LLMs?

  • A dataset of example emails labeled true if they’re relevant to the business and false if not (the bigger the better)
  • A training pipeline to produce a custom machine learning model for this task
  • Specialized hardware or cloud resources for training & inferencing
  • Data scientists, data curators, and Ops people to make it all happen

LLMs can solve many of these problems with radically lower effort and complexity, and they will often do a better job. With traditional machine learning models, your model is, at best, as good as the data you give it. With generative AI you can coach and refine the LLM’s behavior until it matches what you desire – regardless of historical data.

For these reasons LLMs are being deployed everywhere—and consumers’ expectations continue to rise.

How Do You Feel About LLM Vendor Lock-In?

Once you’ve decided to pursue an LLM integration, the first issue to consider is whether you’re comfortable with vendor lock-in. The LLM market is moving at lightspeed with the constant release of new models featuring new capabilities like function calls, multimodal prompting, and of course increased intelligence at higher speeds. Simultaneously, costs are plummeting. For this reason, it’s likely that your preferred LLM vendor today may not be your preferred vendor tomorrow.

Even at a fixed point in time, you may need more than a single LLM vendor.

In our recent experience, there are certain classification problems that Anthropic’s Claude does a better job of handling than comparable models from OpenAI. Similarly, we often prefer OpenAI models for truly generative tasks like generating responses. All of these LLM tasks might be in support of the same integration so you may want to look at the project not so much as integrating a single LLM or vendor, but rather a suite of tools.

If your use case is simple and low volume, a single vendor is probably fine. But if you plan to do anything moderately complex or high scale you should plan on integrating multiple LLM vendors to have access to the right models at the best price.

Resiliency & Scalability are Earned—Not Given

Making API calls to an LLM is trivial. Ensuring that your LLM integration is resilient and scalable requires more elbow grease. In fact, LLM API integrations pose unique challenges:

  • Challenge: LLM calls are pretty slow. Solution: If your application is high-scale and you're doing synchronous (threaded) network calls, it won't scale very well since most threads will be blocked on LLM calls. Consider switching to async I/O, and support running multiple prompts in parallel to reduce visible latency to the user (see the sketch after this list).
  • Challenge: They are throttled by requests per minute and tokens per minute. Solution: Estimate your LLM usage in terms of requests and tokens per minute, and work with your provider(s) to ensure sufficient bandwidth for peak load.
  • Challenge: They are (still) kinda flaky (unpredictable response times, unresponsive connections). Solution: Employ retry schemes in response to timeouts, 500s, 429s (rate limits), etc.
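
Here's a minimal sketch of the async, parallel-prompt pattern from the first item above, using httpx as a stand-in for any async HTTP client:

import asyncio, httpx

# Sketch of running multiple prompts in parallel with async I/O so slow LLM
# calls don't block threads.
async def complete(client, prompt, key):
    resp = await client.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": f"Bearer {key}"},
        json={"model": "gpt-4o-mini-2024-07-18",
              "messages": [{"role": "user", "content": prompt}]},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

async def complete_all(prompts, key):
    async with httpx.AsyncClient() as client:
        return await asyncio.gather(*(complete(client, p, key) for p in prompts))

# answers = asyncio.run(complete_all(["Classify ...", "Summarize ..."], key))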

The above remediations will help your application be scalable and resilient while your LLM service is up. But what if it’s down? If your LLM integration is on a critical execution path you’ll want to support automatic failover. Some LLMs are available from multiple providers:

  • OpenAI models are hosted by OpenAI itself as well as Azure
  • Anthropic models are hosted by Anthropic itself as well as AWS

Whether an LLM has a single provider or several, you can also provision the same logical LLM in multiple cloud regions to create a failover resource. Typically, you'll want provider failover built into your retry scheme. Our failover mechanisms get tripped regularly in production at Quiq, no doubt partly because of how rapidly the AI world is moving.
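
A minimal sketch of retry-with-failover might look like the following, where the provider list and call_provider helper are illustrative stand-ins for your real endpoints, regions, or hosting vendors:

import time, requests

# Sketch of retry-with-failover: exhaust retries against the primary provider
# before moving on to the next one.
PROVIDERS = ["openai-primary", "azure-openai-failover"]

def complete_with_failover(request_body, call_provider, attempts_per_provider=2):
    last_error = None
    for provider in PROVIDERS:
        for attempt in range(attempts_per_provider):
            try:
                return call_provider(provider, request_body)
            except (requests.Timeout, requests.ConnectionError) as err:
                last_error = err
                time.sleep(2 ** attempt)   # simple exponential backoff
    raise RuntimeError("All LLM providers failed") from last_error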

Are You Actually Building an Agentic Workflow?

Oftentimes you have a task that you know is well-suited for an LLM. For example, let’s say you’re planning to use an LLM to analyze the sentiment of product reviews. On the surface, this seems like a simple task that will require one LLM call that passes in the product review and asks the LLM to decide the sentiment. Will a single prompt suffice? What if we also want to determine if a given review contains profanity or personal information? What if we want to ask three LLMs and average their results?

Many tasks require multiple prompts, prompt chaining and possibly RAG (Retrieval Augmented Generation) to best solve a problem. Just like humans, AI produces better results when a problem is broken down into pieces. Such solutions are variously known as AI Agents, Agentic Workflows or Agent Networks and are why open source tools like LangChain were originally developed.

In our experience, pretty much every prompt eventually grows up to be an Agentic Workflow, which has interesting implications for how it’s configured & monitored.

Be Ready for the Snowball Effect

Introducing LLMs can result in a technological snowball effect, particularly if you need to use Retrieval Augmented Generation (RAG). LLMs are trained on mostly public data that was available at a fixed point in the past. If you want an LLM to behave in light of up-to-date and/or proprietary data sources (which most non-trivial applications do) you’ll need to do RAG.

RAG refers to retrieving the up-to-date and/or proprietary data you want the LLM to use in its decision making and passing it to the LLM as part of your prompt.

Assuming you need to search a reference dataset like a knowledge base, product catalog or product manual, the retrieval part of RAG typically entails adding the following entities to your system:

1. An embedding model

An embedding model is roughly half of an LLM – it does a great job of reading and understanding information you pass it but instead of generating a completion it produces a numeric vector that encodes its understanding of the source material.

You’ll typically run the embeddings model on all of the business data you want to search and retrieve for the LLM. Most LLM providers also have embedding models, or you can hit one via any major cloud.

2. A vector database

Once you have embeddings for all of your business data, you need to store them somewhere that facilitates speedy search based on numeric vectors. Solutions like Pinecone and Milvus fill this need, but that means integrating a new vendor or hosting a new database internally.

After implementing embeddings and a vector search solution, you can now retrieve information to include in the prompts you send to your LLM(s). But how can you trust that the LLM’s response is grounded in the information you provided and not something based on stale information or purely made up?

There are specialized deep learning models that exist solely for the purpose of ensuring that an LLM’s generative claims are grounded in facts you provide. This practice is variously referred to as hallucination detection, claim verification, NLI, etc. We believe NLI models are an essential part of a trustworthy RAG pipeline, but managed cloud solutions are scarce and you may need to host one yourself on GPU-enabled hardware.
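
Pulling the pieces together, here's a sketch of the retrieval flow, where embed, vector_search, and complete stand in for your embedding model, vector database, and LLM call:

# Sketch of the retrieval half of RAG: embed the question, fetch the nearest
# chunks, and ground the prompt in them before generating.
def answer_with_rag(question, embed, vector_search, complete):
    query_vector = embed(question)                       # 1. embed the user's question
    passages = vector_search(query_vector, top_k=5)      # 2. retrieve nearest chunks
    context = "\n\n".join(p["text"] for p in passages)

    prompt = f"""Answer the question using only the context below.

Context:
{context}

Question: {question}
"""
    draft = complete(prompt)                             # 3. generate a grounded answer
    # 4. a claims-verification / NLI check would go here before delivering `draft`
    return draft, [p["id"] for p in passages]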

Is a Black Box Sustainable?

If you bake your LLM integration directly into your app, you will effectively end up with a black box that can only be understood and improved by engineers. This could make sense if you have a decent size software shop and they’re the only folks likely to monitor or maintain the integration.

However, your best software engineers may not be your best (or most willing) prompt engineers, and you may wish to involve other personas like product and experience designers since an LLM’s output is often part of your application’s presentation layer & brand.

For these reasons, prompts will quickly need to move from code to configuration – no big deal. However, as an LLM integration matures it will likely become an Agentic Workflow involving:

  • More prompts, prompt parallelization & chaining
  • More prompt engineering
  • RAG and other orchestration

Moving these concerns into configuration is significantly more complex but necessary on larger projects. In addition, people will inevitably want to observe and understand the behavior of the integration to some degree.

For this reason it might make sense to embrace a visual framework for developing Agentic Workflows from the get-go. By doing so you open up the project to collaboration from non-engineers while promoting observability into the integration. If you don’t go this route be prepared to continually build out configurability and observability tools on the side.

Quiq’s AI Automations Take Care of LLM Integration Headaches For You

Hopefully we’ve given you a sense for what it takes to build an enterprise LLM integration. Now it’s time for the plug. The considerations outlined above are exactly why we built AI Studio and particularly our AI Automations product.

With AI Automations you can create a serverless API that handles all the complexities of a fully orchestrated AI flow, including support for multiple LLMs, chaining, RAG, resiliency, observability, and more. With AI Automations, your LLM integration can go back to being 'just an API call with basic auth'.

Want to learn more? Dive into AI Studio or reach out to our team.

The Truth About APIs for AI: What You Need to Know

Large language models hold a lot of power to improve your customer experience and make your agents more effective, but they won’t do you much good if you don’t have a way to actually access them.

This is where application programming interfaces (APIs) come into play. If you want to leverage LLMs, you'll either have to build one in-house, use an AI API to interact with an external model, or go with a customer-centric AI for CX platform. The latter is ideal because it offers a guided building environment that removes complexity while providing the tools you need for scalability, observability, hallucination prevention, and more.

From a cost and ease-of-use perspective this third option is almost always best, but there are many misconceptions that could potentially stand in the way of AI API adoption.

In fact, a stronger claim is warranted: to maximize AI API effectiveness, you need a platform to orchestrate between AI, your business logic, and the rest of your CX stack.

Without that orchestration, the API alone won't get you very far.

This article aims to bridge the gap between what CX leaders might think is required to integrate a platform, and what’s actually involved. By the end, you’ll understand what APIs are, their role in personalization and scalability, and why they work best in the context of a customer-centric AI for CX platform.

How APIs Facilitate Access to AI Capabilities

Let’s start by defining an API. As the name suggests, APIs are essentially structured protocols that allow two systems (“applications”) to communicate with one another (“interface”). For instance, if you’re using a third-party CRM to track your contacts, you’ll probably update it through an API.

All the well-known foundation model providers (e.g., OpenAI, Anthropic) offer an API that allows you to use their service. For a practical example, let's look at OpenAI's documentation:
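
Here's a rough Python sketch of what such a request looks like; the header names reflect OpenAI's API at the time of writing, and the placeholder values stand in for your own credentials:

import requests

# Sketch of a request to OpenAI's chat completions endpoint with the three
# identifying headers described below.
response = requests.post(
    "https://api.openai.com/v1/chat/completions",       # where the models are accessed
    headers={
        "Authorization": "Bearer YOUR_API_KEY",          # API key
        "OpenAI-Organization": "YOUR_ORG_ID",            # organization ID
        "OpenAI-Project": "YOUR_PROJECT_ID",             # project ID
    },
    json={"model": "gpt-4o-mini",
          "messages": [{"role": "user", "content": "Where is my order?"}]},
)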

(Let’s take a second to understand what we’re looking at. Don’t worry – we’ll break it down for you. Understanding the basics will give you a sense for what your engineers will be doing.)

The top line points us to a URL where we can access OpenAI’s models, and the next three lines require us to pass in an API key (which is kind of like a password giving access to the platform), our organization ID (a unique designator for our particular company, not unlike a username), and a project ID (a way to refer to this specific project, useful if you’re working on a few different projects at once).

This is only one example, but you can reasonably assume that most protocols built according to AI API best practices will have a similar structure.

This alone isn’t enough to support most AI API use cases, but it illustrates the key takeaway of this section: APIs are attractive because they make it easy to access the capabilities of LLMs without needing to manage them on your own infrastructure, though they’re still best when used as part of a move to a customer-centric AI orchestration platform.

How Do APIs Facilitate Customer Support AI Assistants?

It’s good to understand what APIs are used for in AI assistants. It’s pretty straightforward—here’s the bulk of it:

  • Personalizing customer communications: One of the most exciting real-world benefits of AI is that it enables personalization at scale because you can integrate an LLM with trusted systems containing customer profiles, transaction data, etc., which can be incorporated into a model’s reply. So, for example, when a customer asks for shipping information, you’re not limited to generic responses like “your item will be shipped within 3 days of your order date.” Instead, you can take a more customer-centric approach and offer specific details, such as, “The order for your new couch was placed on Monday, and will be sent out on Wednesday. According to your location, we expect that it’ll arrive by Friday. Would you like to select a delivery window or upgrade to white glove service?”
  • Improving response quality: Generative AI is plagued by a tendency to fabricate information. With an AI API, work can be decomposed into smaller, concrete tasks before being passed to an LLM, which improves performance. You can also do other things to get better outputs, such as create bespoke modifications of the prompt that change the model’s tone, the length of its reply, etc.
  • Scalability and flexibility in deployment: A good customer-centric, AI-for-CX platform will offer volume-based pricing, meaning you can scale up or down as needed. If customer issues are coming in thick and fast (such as might occur during a new product release, or over a holiday), just keep passing them to the API while paying a bit more for the increased load; if things are quiet because it’s 2 a.m., the API just sits there, waiting to spring into action when required and costing you very little.
  • Analyzing customer feedback and sentiment: Incredible insights are waiting within your spreadsheets and databases, if you only know how to find them. This, too, is something APIs help with. If, for example, you need to unify measurements across your organization to send them to a VOC (voice of customer) platform, you can do that with an API.

Looking Beyond an API for AI Assistants

For all this, it's worth pointing out that there are still many real-world AI API challenges. By far the quickest way to begin building an AI assistant for CX is to partner with a customer-centric AI platform that removes as much of the difficulty as possible.

The best such platforms not only allow you to utilize a bevy of underlying LLM models, they also facilitate gathering and analyzing data, monitoring and supporting your agents, and automating substantial parts of your workflow.

Crucially, almost all of those critical tasks are facilitated through APIs, but they can be united in a good platform.

3 Common Misconceptions about Customer-Centric AI for CX Platforms

Now, let’s address some of the biggest myths surrounding the use of AI orchestration platforms.

Myth 1: Working With a Customer-Centric AI for CX Platform Will Be a Hassle

Some CX leaders may worry that working with a platform will be too difficult. There are challenges, to be sure, but a well-designed platform with an intuitive user interface is easy to slip into a broader engineering project.

Such platforms are designed to support easy integration with existing systems, and they generally have ample documentation available to make this task as straightforward as possible.

Myth 2: AI Platforms Cost Too Much

Another concern CX leaders have is the cost of using an AI orchestration platform. Platform costs can add up over time, but they pale in comparison to the cost of building in-house solutions, not to mention the potential costs of the risks that come with building AI in an environment that doesn't protect you from things like hallucinations.

When you weigh all the factors impacting your decision to use AI in your contact center, the long-run return on using an AI orchestration platform is almost always better.

Myth 3: Customer-Centric AI Platforms are Just Too Insecure

The smart CX leader always has one eye on the overall security of their enterprise, so they may be worried about vulnerabilities introduced by using an AI platform.

This is a perfectly reasonable concern. If you’re trying to choose between a few different providers, it’s worth investigating the security measures they’ve implemented. Specifically, you want to figure out what data encryption and protection protocols they use, and how they think about compliance with industry standards and regulations.

At a minimum, the provider should be taking basic steps to make sure data transmitted to the platform isn’t exposed.

Is an AI Platform Right for Me?

With a platform focused on optimizing CX outcomes, you can quickly bring the awesome power and flexibility of generative AI into your contact center – without ever spinning up a server or fretting over what “backpropagation” means. To the best of our knowledge, this is the cheapest and fastest way to demo this API technology in your workflow to determine whether it warrants a deeper investment.

To parse out more generative AI facts from fiction, download our e-book on AI misconceptions and how to overcome them. If you’re concerned about hallucinations, data privacy, and similar issues, you won’t find a better one-stop read!

What is an AI Assistant for Retail?

Over the past few months, we’ve had a lot to say about artificial intelligence, its new frontiers, and the ways in which it is changing the customer service industry.

A natural extension of this analysis is looking at the use of AI in retail. That is our mission today. We’ll look at how techniques like natural language processing and computer vision will impact retail, along with some of the benefits and challenges of this approach.

Let’s get going!

How is AI Used in Retail?

AI is poised to change retail, as it is changing many other industries. In the sections that follow, we’ll talk through three primary AI technologies that are driving these changes, namely natural language processing, computer vision, and machine learning more broadly.

Natural Language Processing

Natural language processing (NLP) refers to a branch of machine learning that attempts to work with spoken or written language algorithmically. Together with computer vision, it is one of the best-researched and most successful attempts to advance AI since the field was founded some seven decades ago.

Of course, these days the main NLP applications everyone has heard of are large language models like ChatGPT. This is not the only way AI assistants will change retail, but it is a big one, so that’s where we’ll start.

An obvious place to use LLMs in retail is with chatbots. There’s a lot of customer interaction that involves very specific questions that need to be handled by a human customer service agent, but a lot of it is fairly banal, consisting of things like “How do I return this item” or “Can you help me unlock my account.” For these sorts of issues, today’s chatbots are already powerful enough to help in most situations.

A related use case for AI in retail is asking questions about specific items. A customer might want to know what fabric an article of clothing is made out of or how it should be cleaned, for example. An out-of-the-box model like ChatGPT won't be able to help much, but if you've used a service like Quiq's conversational CX platform, it's possible to finetune an LLM on your specific documentation. Such a model will be able to help customers find the answers they need.

These use cases are all centered around text-based interactions, but algorithms are getting better and better at both speech recognition and speech synthesis. You’ve no doubt had the distinct (dis)pleasure of interacting with an automated system that sounded very artificial and that lacked the flexibility actually to help you very much; but someday soon, you may not be able to tell from a short conversation whether you were talking to a human or a machine.

This may cause a certain amount of concern over technological unemployment. If chatbots and similar AI assistants are doing all this, what will be left for flesh-and-blood human workers? Frankly, it’s too early to say, but the evidence so far suggests that not only is AI not making us obsolete, it’s actually making workers more productive and less prone to burnout.

Computer Vision

Computer vision is the other major triumph of machine learning. CV algorithms have been created that can recognize faces, recognize thousands of different types of objects, and even help with steering autonomous vehicles.

How does any of this help with retail?

We already hinted at one use case in the previous paragraph, i.e. automatically identifying different items. This has major implications for inventory management, but when paired with technologies like virtual reality and augmented reality, it could completely transform the ways in which people shop.

Many platforms already offer the ability to see furniture and similar items in a customer's actual living space, and there are efforts underway to build tools that automatically size customers so they know exactly which clothes to try on.

CV is also making it easier to gather and analyze different metrics crucial to a retail enterprise’s success. Algorithms can watch customer foot traffic to identify potential hotspots, meaning that these businesses can figure out which items to offer more of and which to cut altogether.

Machine Learning

As we stated earlier, both natural language processing and computer vision are types of machine learning. We gave them their own sections because they’re so big and important, but they’re not the only ways in which machine learning will impact retail.

Another way is with increasingly personalized recommendations. If you’ve ever taken the advice of Netflix or Spotify as to what entertainment you should consume next then you’ve already made contact with a recommendation engine. But with more data and smarter algorithms, personalization will become much more, well, personalized.

In concrete terms, this means it will become easier and easier to analyze a customer’s past buying history to offer them tailor-made solutions to their problems. Retail is all about consumer satisfaction, so this is poised to be a major development.

Machine learning has long been used for inventory management, demand forecasting, etc., and the role it plays in these efforts will only grow with time. Having more data will mean being able to make more fine-grained predictions. You’ll be able to start printing Taylor Swift t-shirts and setting up targeted ads as soon as people in your area begin buying tickets to her show next month, for example.

Where are AI Assistants Used in Retail?

So far, we’ve spoken in broad terms about the ways in which AI assistants will be used in retail. In these sections, we’ll get more specific and discuss some of the particular locations where these assistants can be deployed.

In Kiosks

Many retail establishments already have kiosks in place that let you swap change for dollars or skip the trip to the DMV. With AI, these will become far more adaptable and useful, able to help customers with a greater variety of transactions.

In Retail Apps

Mobile applications are an obvious place to use recommendations or LLM-based chatbots to help make a sale or get customers what they need.

In Smart Speakers

You've probably heard of Alexa, Amazon's voice assistant, which can play music or automate certain household tasks through smart speakers. It isn't hard to imagine smart speakers being used in retail, especially as they get better. They'll be able to help customers choose clothing, handle returns, or do any number of related tasks.

In Smart Mirrors

For more or less the same reason, AI-powered smart mirrors could have a major impact on retail. As computer vision improves it’ll be better able to suggest clothing that looks good on different heights and builds, for example.

What are the Benefits of Using AI in Retail?

The main reason that AI is being used more frequently in retail is that there are so many advantages to this approach. In the next few sections, we’ll talk about some of the specific benefits retail establishments can expect to enjoy from their use of AI.

Better Customer Experience and Engagement

These days, there are tons of ways to get access to the goods and services you need. What tends to separate one retail establishment from another is customer experience and customer engagement. AI can help with both.

We’ve already mentioned how much more personalized AI can make the customer experience, but you might also consider the impact of round-the-clock availability that AI makes possible.

Customer service agents will need to eat and sleep sometimes, but AI never will, which means that it’ll always be available to help a customer solve their problems.

More Selling Opportunities

Cross-selling and upselling are both terms that are probably familiar to you, and they represent substantial opportunities for retail outfits to boost their revenue.

With personalized recommendations, sentiment analysis, and similar machine-learning techniques, it will become much faster and easier to identify additional items that a customer might be interested in.

If a customer has already bought Taylor Swift tickets and a t-shirt, for example, perhaps they’d also like a fetching hat that goes along with their outfit. And if you’ve installed the smart mirrors we talked about earlier, AI will even be able to help them find the right size.

Leaner, More Efficient Operations

Inventory management is a never-ending concern in retail. It’s also one place where algorithmic solutions have been used for a long time. We think this trend will only continue, with operations becoming leaner and more responsive to changing market conditions.

All of this ultimately hinges on the use of AI. Better algorithms and more comprehensive data will make it possible to predict what people will want and when, meaning you don’t have to sit on inventory you don’t need and are less likely to run out of anything that’s selling well.

What are the Challenges of Using AI in Retail?

That being said, there are many challenges to using Artificial Intelligence in retail. We’ll cover a few of these now so you can decide how much effort you want to put into using AI.

AI Can Still Be Difficult to Use

To be sure, firing up ChatGPT and asking it to recommend an outfit for a concert doesn’t take very long. But this is a far cry from integrating a full-bore AI solution into your website or mobile applications. Serious technical expertise is required to train, fine-tune, deploy, and monitor advanced AI, whether that’s an LLM, a computer-vision system, or anything else, and you’ll need to decide whether you think you’ll get enough return to justify the investment.

Expense

And speaking of investment, it remains pretty expensive to utilize AI at any non-trivial scale. If you decide you want to hire an in-house engineering team to build a bespoke model, you’ll have to have a substantial budget to pay for the training and the engineers’ salaries. These salaries are still something you’ll have to account for even if you choose to build on top of an existing solution, because fine-tuning a model is far from easy.

One solution is to utilize an offering like Quiq. We have already created the custom infrastructure required to utilize AI in a retail setting, meaning you wouldn’t need a serious engineering force to get going with AI.

Bias, Abuse, and Toxicity

A perennial concern with using AI is that a model will generate output that is insulting, harmful, or biased in some way. For obvious reasons, this is bad for retail establishments, so you’ll want to make sure that you both carefully fine-tune this behavior out of your models and continually monitor them in case their behavior changes in the future. Quiq also eliminates this risk.

AI and the Future of Retail

Artificial intelligence has long been expected to change many aspects of our lives, and in the past few years, it has begun delivering on that promise. From ultra-precise recommendations to full-fledged chatbots that help resolve complex issues, retail stands to benefit greatly from this ongoing revolution.

If you want to get in on the action but don’t know where to start, set up a time to check out the Quiq platform. We make it easy to utilize both customer-facing and agent-facing solutions, so you can build an AI-positive business without worrying about the engineering.

Request A Demo

What are the Biggest Questions About AI?

The term “artificial intelligence” was coined at the famous Dartmouth Conference in 1956, put on by luminaries like John McCarthy, Marvin Minsky, and Claude Shannon, among others.

These organizers wanted to create machines that “use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves.” They went on to claim that “…a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together for a summer.”

More than half a century later, it’s fair to say that this has not come to pass; brilliant as they were, it would seem that McCarthy et al. underestimated how difficult it would be to scale the heights of the human intellect.

Nevertheless, remarkable advances have been made over the past decade, so much so that they’ve ignited a firestorm of controversy around this technology. People are questioning the ways in which it can be used negatively, and whether it might ultimately pose an extinction risk to humanity; they’re probing fundamental issues around whether machines can be conscious, exercise free will, and think in the way a living organism does; they’re rethinking the basis of intelligence, concept formation, and what it means to be human.

These are deep waters to be sure, and we’re not going to swim them all today. But as contact center managers and others begin the process of thinking about using AI, it’s worth being at least aware of what this broader conversation is about. It will likely come up in meetings, in the press, or in Slack channels in exchanges between employees.

And that’s the subject of our piece today. We’re going to start by asking what artificial intelligence is and how it’s being used, before turning to address some of the concerns about its long-term potential. Our goal is not to answer all these concerns, but to make you aware of what people are thinking and saying.

What is Artificial Intelligence?

Artificial intelligence is famous for having had many, many definitions. There are those, for example, who believe that in order to be intelligent computers must think like humans, and those who reply that we didn’t make airplanes by designing them to fly like birds.

For our part, we prefer to sidestep the question somewhat by utilizing the approach taken in one of the leading textbooks in the field, Stuart Russell and Peter Norvig’s “Artificial Intelligence: A Modern Approach”.

They propose a multi-part system for thinking about different approaches to AI. One set of approaches is human-centric and focuses on designing machines that either think like humans – i.e., engage in analogous cognitive and perceptual processes – or act like humans – i.e. by behaving in a way that’s indistinguishable from a human, regardless of what’s happening under the hood (think: the Turing Test).

The other set of approaches is ideal-centric and focuses on designing machines that either think in a totally rational way – conformant with the rules of Bayesian epistemology, for example – or behave in a totally rational way – utilizing logic and probability, but also acting instinctively to remove themselves from danger, without going through any lengthy calculations.

What we have here, in other words, is a framework. Using the framework not only gives us a way to think about almost every AI project in existence, it also saves us from needing to spend all weekend coming up with a clever new definition of AI.

Joking aside, we think this is a productive lens through which to view the whole debate, and we offer it here for your information.

What is Artificial Intelligence Good For?

Given all the hype around ChatGPT, this might seem like a quaint question. But not that long ago, many people were asking it in earnest. The basic insights upon which large language models like ChatGPT are built go back to the 1960s, but it wasn’t until 1) vast quantities of data became available, and 2) compute cycles became extremely cheap that much of their potential was realized.

Today, large language models are changing (or poised to change) many different fields. Our audience is focused on contact centers, so that’s what we’ll focus on as well.

There are a number of ways that generative AI is changing contact centers. Because of its remarkable abilities with natural language, it’s able to dramatically speed up agents in their work by answering questions and formatting replies. These same abilities allow it to handle other important tasks, like summarizing articles and documentation and parsing the sentiment in customer messages to enable semi-automated prioritization of their requests.

Though we’re still in the early days, the evidence so far suggests that large language models, like those powering Quiq’s conversational CX platform, will do a lot to increase the efficiency of contact center agents.

Will AI be Dangerous?

One thing that’s burst into public imagination recently has been the debate around the risks of artificial intelligence, which fall into two broad categories.

The first category is what we’ll call “social and political risks”. These are the risks that large language models will make it dramatically easier to manufacture propaganda at scale, and perhaps tailor it to specific audiences or even individuals. When combined with the astonishing progress in deepfakes, it’s not hard to see how there could be real issues in the future. Most people (including us) are poorly equipped to figure out when a video is fake, and if the underlying technology gets much better, there may come a day when it’s simply not possible to tell.

Political operatives are already quite skilled at cherry-picking quotes and stitching together soundbites into a damning portrait of a candidate – imagine what’ll be possible when they don’t even need to bother.

But the bigger (and more speculative) danger is around really advanced artificial intelligence. Because this case is harder to understand, it’s what we’ll spend the rest of this section on.

Artificial Superintelligence and Existential Risk

As we understand it, the basic case for existential risk from artificial intelligence goes something like this:

“Someday soon, humanity will build or grow an artificial general intelligence (AGI). It’s going to want things, which means that it’ll be steering the world in the direction of achieving its ambitions. Because it’s smart, it’ll do this quite well, and because it’s a very alien sort of mind, it’ll be making moves that are hard for us to predict or understand. Unless we solve some major technological problems around how to design reward structures and goal architectures in advanced agentive systems, what it wants will almost certainly conflict in subtle ways with what we want. If all this happens, we’ll find ourselves in conflict with an opponent unlike any we’ve faced in the history of our species, and it’s not at all clear we’ll prevail.”

This is heady stuff, so let’s unpack it bit by bit. The opening sentence, “…humanity will build or grow an artificial general intelligence”, was chosen carefully. If you understand how LLMs and deep learning systems are trained, the process is more akin to growing an enormous structure than it is to building one.

This has a few implications. First, their internal workings remain almost completely inscrutable. Though researchers in fields like mechanistic interpretability are making real progress toward unpacking how neural networks function, the truth is, we’ve still got a long way to go.

What this means is that we’ve built one of the most powerful artifacts in the history of Earth, and no one is really sure how it works.

Another implication is that no one has any good theoretical or empirical reason to bound the capabilities and behavior of future systems. The leap from GPT-2 to GPT-3.5 was astonishing, as was the leap from GPT-3.5 to GPT-4. The basic approach so far has been to throw more data and more compute at the training algorithms; it’s possible that this paradigm will begin to level off soon, but it’s also possible that it won’t. If the gap between GPT-4 and GPT-5 is as big as the gap between GPT-3.5 and GPT-4, and if the gap between GPT-5 and GPT-6 is just as big, it’s not hard to see that the consequences could be staggering.

As things stand, it’s anyone’s guess how this will play out. But that’s not necessarily a comforting thought.

Next, let’s talk about pointing a system at a task. Does ChatGPT want anything? The short answer is: as far as we can tell, it doesn’t. ChatGPT isn’t an agent, in the sense that it’s trying to achieve something in the world, but work on agentive systems is ongoing. Remember that 10 years ago most neural networks were basically toys, and today we have ChatGPT. If breakthroughs in agency follow a similar pace (and they very well may not), then we could have systems able to pursue open-ended courses of action in the real world in relatively short order.

Another sobering possibility is that this capacity will simply emerge from the training of huge deep learning systems. This is, after all, the way human agency emerged in the first place. Through the relentless grind of natural selection, our ancestors went from chipping flint arrowheads to industrialization, quantum computing, and synthetic biology.

To be clear, this is far from a foregone conclusion, as the algorithms used to train large language models are quite different from natural selection. Still, we want to relay this line of argumentation, because it comes up a lot in these discussions.

Finally, we’ll address one more important claim, “…what it wants will almost certainly conflict in subtle ways with what we want.” Why think this is true? Aren’t these systems that we design and, if so, can’t we just tell it what we want it to go after?

Unfortunately, it’s not so simple. Whether you’re talking about reinforcement learning or something more exotic like evolutionary programming, the simple fact is that our algorithms often find remarkable mechanisms by which to maximize their reward in ways we didn’t intend.

There are thousands of examples of this (ask any reinforcement-learning engineer you know), but a famous one comes from the classic Coast Runners video game. The engineers who built the system tried to set up the algorithm’s rewards so that it would try to race a boat as well as it could. What it actually did, however, was maximize its reward by spinning in a circle to hit a set of green blocks over and over again.

Now, this may seem almost silly – do we really have anything to fear from an algorithm too stupid to understand the concept of a “race”?

But this would be missing the thrust of the argument. If you had access to a superintelligent AI and asked it to maximize human happiness, what happened next would depend almost entirely on what it understood “happiness” to mean.

If it were properly designed, it would work in tandem with us to usher in a utopia. But if it understood it to mean “maximize the number of smiles”, it would be incentivized to start paying people to get plastic surgery to fix their faces into permanent smiles (or something similarly unintuitive).

Does AI Pose an Existential Risk?

Above, we’ve briefly outlined the case that sufficiently advanced AI could pose a serious risk to humanity by being powerful, unpredictable, and prone to pursuing goals that aren’t quite what we meant.

So, does this hold water? Honestly, it’s too early to tell. The argument has hundreds of moving parts, some well-established and others much more speculative. Our purpose here isn’t to come down on one side of this debate or the other, but to let you know (in broad strokes) what people are saying.

At any rate, we are confident that the current version of ChatGPT doesn’t pose any existential risks. On the contrary, it could end up being one of the greatest advancements in productivity ever seen in contact centers. And that’s what we’d like to discuss in the next section.

Will AI Take All the Jobs?

The concern that someday a new technology will render human labor obsolete is hardly new. It was heard when mechanized weaving machines were created, when computers appeared, when the internet arrived, and when ChatGPT came onto the scene.

We’re not economists and we’re not qualified to take a definitive stand, but the early evidence suggests that large language models are not only not resulting in layoffs, they’re making agents much more productive.

Economists Erik Brynjolfsson, Danielle Li, and Lindsey R. Raymond looked at the ways in which generative AI was being used in a large contact center. They found that it was actually doing a good job of internalizing the ways in which senior agents were doing their jobs, which allowed more junior agents to climb the learning curve more quickly and perform at a much higher level. This had the knock-on effect of making them feel less stressed about their work, thus reducing turnover.

Now, this doesn’t rule out the possibility that GPT-10 will be the big job killer. But so far, large language models are shaping up to be like every prior technological advance, i.e., increasing employment rather than reducing it.

What is the Future of AI?

The rise of AI is raising stock valuations, raising deep philosophical questions, and raising expectations and fears about the future. We don’t know for sure how all this will play out, but we do know contact centers, and we know that they stand to benefit greatly from the current iteration of large language models.

These tools are helping agents answer more queries per hour, do so more thoroughly, and make for a better customer experience in the process.

If you want to get in on the action, set up a demo of our technology today.

Request A Demo

What is Sentiment Analysis? – Ultimate Guide

A person only reaches out to a contact center when they’re having an issue. They can’t get a product to work the way they need it to, for example, or they’ve been locked out of their account.

The chances are high that they’re frustrated, angry, or otherwise in an emotionally-fraught state, and this is something contact center agents must understand and contend with.

The term “sentiment analysis” refers to the field of machine learning which focuses on developing algorithmic ways of detecting emotions in natural-language text, such as the messages exchanged between a customer and a contact center agent.

Making it easier to detect, classify, and prioritize messages on the basis of their sentiment is just one of many ways that technology is revolutionizing contact centers, and it’s the subject we’ll be addressing today.

Let’s get started!

What is Sentiment Analysis?

Sentiment analysis involves using various approaches to natural language processing to identify the overall “sentiment” of a piece of text.

Take these three examples:

  1. “This restaurant is amazing. The wait staff were friendly, the food was top-notch, and we had a magnificent view of the famous New York skyline. Highly recommended.”
  2. “Root canals are never fun, but it certainly doesn’t help when you have to deal with a dentist as unprofessional and rude as Dr. Thomas.”
  3. “Toronto’s forecast for today is a high of 75 and a low of 61 degrees.”

Humans excel at detecting emotions, and it’s probably not hard for you to see that the first example is positive, the second is negative, and the third is neutral (depending on how you like your weather).

There’s a greater challenge, however, in getting machines to make accurate classifications of this kind of data. How exactly that’s accomplished is the subject of the next section, but before we get to that, let’s talk about a few flavors of sentiment analysis.

What Types of Sentiment Analysis Are There?

It’s worth understanding the different approaches to sentiment analysis if you’re considering using it in your contact center.

Above, we provided an example of positive, negative, and neutral text. What we’re doing there is detecting the polarity of the text, and as you may have guessed, it’s possible to make much more fine-grained delineations of textual data.

Rather than simply detecting whether text is positive or negative, for example, we might instead use these categories: very positive, positive, neutral, negative, and very negative.

This would give us a better understanding of the message we’re looking at, and how it should be handled.

Instead of classifying text by its polarity, we might also use sentiment analysis to detect the emotions being communicated – rather than classifying a sentence as being “positive” or “negative”, in other words, we’d identify emotions like “anger” or “joy” contained in our textual data.

This is called “emotion detection” (appropriately enough), and it can be handled with long short-term memory (LSTM) or convolutional neural network (CNN) models.

Another, more granular approach to sentiment analysis is known as aspect-based sentiment analysis. It involves two basic steps: identifying “aspects” of a piece of text, then identifying the sentiment attached to each aspect.

Take the sentence “I love the zoo, but I hate the lines and the monkeys make fun of me.” It’s hard to assign an overall sentiment to the sentence – it’s generally positive, but there’s kind of a lot going on.

If we break out the “zoo”, “lines”, and “monkeys” aspects, however, we can see that there’s positive sentiment attached to the zoo, and negative sentiment attached to the lines and the abusive monkeys.
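
To make this concrete, here’s a bare-bones Python sketch of aspect-based scoring. The clause splitting and the tiny word list are invented for illustration (a real system would use a trained model rather than substring matching), but the basic move of scoring each aspect separately is the same:

  import re

  # Toy sentiment lexicon, invented for this example -- real systems use trained models.
  LEXICON = {"love": 1, "great": 1, "hate": -1, "make fun": -1}
  ASPECTS = ["zoo", "lines", "monkeys"]

  def aspect_sentiment(text):
      """Assign a crude sentiment score to each aspect mentioned in the text."""
      scores = {}
      # Split on commas and conjunctions to get rough clauses.
      clauses = re.split(r",|\bbut\b|\band\b", text.lower())
      for clause in clauses:
          # Score each clause by summing lexicon hits (substring matching is crude but illustrative).
          score = sum(weight for phrase, weight in LEXICON.items() if phrase in clause)
          for aspect in ASPECTS:
              if aspect in clause:
                  scores[aspect] = scores.get(aspect, 0) + score
      return scores

  print(aspect_sentiment("I love the zoo, but I hate the lines and the monkeys make fun of me."))
  # {'zoo': 1, 'lines': -1, 'monkeys': -1}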

Why is Sentiment Analysis Important?

It’s easy to see how aspect-based sentiment analysis would inform marketing efforts. With a good enough model, you’d be able to see precisely which parts of your offering your clients appreciate, and which parts they don’t. This would give you valuable information in crafting a strategy going forward.

This is true of sentiment analysis more broadly, and of emotion detection too. You need to know what people are thinking, saying, and feeling about you and your company if you’re going to meet their needs well enough to make a profit.

Once upon a time, the only way to get these data was with focus groups and surveys. Those are still utilized, of course. But in the social media era, people are also not shy about sharing their opinions online, in forums, and on similar outlets.

These oceans of words form an invaluable resource if you know how to mine them. When done correctly, sentiment analysis offers just the right set of tools for doing this at scale.

Challenges with Sentiment Analysis

Sentiment analysis confers many advantages, but it is not without its challenges. Most of these issues boil down to handling subtleties or ambiguities in language.

Consider a sentence like “This is a remarkable product, but still not worth it at that price.” Calling a product “remarkable” is a glowing endorsement, tempered somewhat by the claim that its price is set too high. Most basic sentiment classifiers would probably call this “positive”, but as you can see, there are important nuances.

Another issue is sarcasm.

Suppose we showed you a sentence like “This movie was just great, I loved spending three hours of my Sunday afternoon following a story that could’ve been told in twenty minutes.”

A sentiment analysis algorithm is likely to pick up on “great” and “loved” and call this sentence positive.

But, as humans, we know that these are backhanded compliments meant to communicate precisely the opposite message.

Machine-learning systems will also tend to struggle with idioms that we all find easy to parse, such as “Setting up my home security system was a piece of cake.” This is positive because “piece of cake” means something like “couldn’t have been easier”, but an algorithm may or may not pick up on that.

Finally, we’ll mention the fact that much of the text in product reviews will contain useful information that doesn’t fit easily into a “sentiment” bucket. Take a sentence like “The new iPhone is smaller than the new Android.” This is just a bare statement of physical facts, and whether it counts as positive or negative depends a lot on what a given customer is looking for.

There are various ways of trying to ameliorate these issues, most of which are outside the scope of this article. For now, we’ll just note that sentiment analysis needs to be approached carefully if you want to glean an accurate picture of how people feel about your offering from their textual reviews. So long as you’re diligent about inspecting the data you show the system and are cautious in how you interpret the results, you’ll probably be fine.

Two people review data on a paper and computer to anticipate customer needs.

How Does Sentiment Analysis Work?

Now that we’ve laid out a definition of sentiment analysis, talked through a few examples, and made it clear why it’s so important, let’s discuss the nuts and bolts of how it works.

Sentiment analysis begins where all data science and machine learning projects begin: with data. Because sentiment analysis is based on textual data, you’ll need to utilize various techniques for preprocessing NLP data (we’ll sketch these steps in code right after the list). Specifically, you’ll need to:

  • Tokenize the data by breaking sentences up into individual units an algorithm can process;
  • Use either stemming or lemmatization to turn words into their root form, i.e. by turning “ran” into “run”;
  • Filter out stop words like “the” or “as”, because they don’t add much to the text data.
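
Here’s a rough sketch of those steps in Python using the NLTK library. Which tokenizer, stemmer or lemmatizer, and stop-word list you use is up to you, and you may need to download NLTK’s data packages first:

  import nltk
  from nltk.corpus import stopwords
  from nltk.stem import WordNetLemmatizer

  # These downloads only need to run once per environment.
  nltk.download("punkt")
  nltk.download("stopwords")
  nltk.download("wordnet")

  def preprocess(text):
      """Tokenize, lemmatize, and strip stop words from a piece of text."""
      lemmatizer = WordNetLemmatizer()
      tokens = nltk.word_tokenize(text.lower())                          # tokenization
      lemmas = [lemmatizer.lemmatize(tok, pos="v") for tok in tokens]    # e.g. "ran" becomes "run"
      stops = set(stopwords.words("english"))
      return [tok for tok in lemmas if tok.isalpha() and tok not in stops]  # drop stop words

  print(preprocess("The support team ran into the same issue as before."))
  # e.g. ['support', 'team', 'run', 'issue']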

Once that’s done, there are two basic approaches to sentiment analysis. The first is known as “rule-based” analysis. It involves taking your preprocessed textual data and comparing it against a pre-defined lexicon of words that have been tagged for sentiment.

If the word “happy” appears in your text it’ll be labeled “positive”, for example, and if the word “difficult” appears in your text it’ll be labeled “negative.”

(Rule-based sentiment analysis is more nuanced than what we’ve indicated here, but this is the basic idea.)
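
As a toy illustration, the rule-based logic boils down to something like the snippet below. The word lists are invented for the example; real lexicons such as VADER contain thousands of weighted entries:

  # A tiny, invented lexicon -- real ones are far larger and assign weights to each word.
  POSITIVE = {"happy", "great", "amazing", "friendly", "magnificent"}
  NEGATIVE = {"difficult", "rude", "unprofessional", "problem", "unhappy"}

  def rule_based_sentiment(tokens):
      """Label a tokenized message positive, negative, or neutral by counting lexicon hits."""
      score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
      if score > 0:
          return "positive"
      if score < 0:
          return "negative"
      return "neutral"

  print(rule_based_sentiment(["the", "wait", "staff", "were", "friendly"]))  # positive
  print(rule_based_sentiment(["dr", "thomas", "was", "rude"]))               # negative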

The second approach is based on machine learning. A sentiment analysis algorithm will be shown many examples of labeled sentiment data, from which it will learn a pattern that can be applied to new data the algorithm has never seen before.

Of course, there are tradeoffs to both approaches. The rules-based approach is relatively straightforward, but is unlikely to be able to handle the sorts of subtleties that a really good machine-learning system can parse.

Machine learning is more powerful, but it’ll only be as good as the training data it has been given; what’s more, if you’ve built some monstrous deep neural network, it might fail in mysterious ways or otherwise be hard to understand.
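
For a sense of what the machine-learning route looks like, here’s a minimal scikit-learn sketch. The four training examples are made up purely for illustration; a usable model needs thousands of labeled messages:

  from sklearn.feature_extraction.text import TfidfVectorizer
  from sklearn.linear_model import LogisticRegression
  from sklearn.pipeline import make_pipeline

  # A handful of invented training examples -- real training sets are much larger.
  texts = [
      "The wait staff were friendly and the food was amazing",
      "Highly recommended, we loved our visit",
      "The dentist was rude and unprofessional",
      "Terrible experience, I want a refund",
  ]
  labels = ["positive", "positive", "negative", "negative"]

  # TF-IDF features plus logistic regression is a classic baseline for text classification.
  model = make_pipeline(TfidfVectorizer(), LogisticRegression())
  model.fit(texts, labels)

  print(model.predict(["The staff were friendly and helpful"]))  # most likely ['positive']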

Supercharge Your Contact Center with Generative AI

Like used car salesmen or college history teachers, contact center managers need to understand the ways in which technology will change their business.

Machine learning is one such profoundly impactful technology, and it can be used to automatically sort incoming messages by sentiment or priority and generally make your agents more effective.

Realizing this potential could be as difficult as hiring a team of expensive engineers and doing everything in-house, or as easy as getting in touch with us to see how we can integrate the Quiq conversational AI platform into your company.

If you want to get started quickly without spending a fortune, you won’t find a better option than Quiq.

Request A Demo

4 Benefits of Using Generative AI to Improve Customer Experiences

Generative AI has captured the popular imagination and is already changing the way contact centers work.

One area in which it has enormous potential is also one that tends to be top of mind for contact center managers: customer experience.

In this piece, we’re going to briefly outline what generative AI is, then spend the rest of our time talking about how generative AI can improve customer experience with personalized responses, endless real-time support, and much more.

What is Generative AI?

As you may have puzzled out from the name, “generative AI” refers to a constellation of different deep learning models used to dynamically generate output. This distinguishes them from other classes of models, which might be used to predict returns on Bitcoin, make product recommendations, or translate between languages.

The most famous example of generative AI is, of course, the large language model ChatGPT. After being trained on staggering amounts of textual data, it’s now able to generate extremely compelling output, much of which is hard to distinguish from actual human-generated writing.

Its success has inspired a panoply of competitor models from leading players in the space, including companies like Anthropic, Meta, and Google.

As it turns out, the basic approach underlying generative AI can be utilized in many other domains as well. After natural language, probably the second most popular way to use generative AI is to make images. DALL-E, MidJourney, and Stable Diffusion have proven remarkably adept at producing realistic images from simple prompts, and just this past week, Fable Studios unveiled their “Showrunner” AI, able to generate an entire episode of South Park.

But even this is barely scratching the surface, as researchers are also training generative models to create music, design new proteins and materials, and even carry out complex chains of tasks.

What is Customer Experience?

In the broadest possible terms, “customer experience” refers to the subjective impressions that your potential and current customers have as they interact with your company.

These impressions can be impacted by almost anything, including the colors and font of your website, how easy it is to find e.g. contact information, and how polite your contact center agents are in resolving a customer issue.

Customer experience will also be impacted by which segment a given customer falls into. Power users of your product might appreciate a bevy of new features, whereas casual users might find them disorienting.

Contact center managers must bear all of this in mind as they consider how best to leverage generative AI. In the quest to adopt a shiny new technology everyone is excited about, it can be easy to lose track of what matters most: how your actual customers feel about you.

Be sure to track metrics related to customer experience and customer satisfaction as you begin deploying large language models into your contact centers.

How is Generative AI For Customer Experience Being Used?

There are many ways in which generative AI is impacting customer experience in places like contact centers, which we’ll detail in the sections below.

Personalized Customer Interactions

Machine learning has a long track record of personalizing content. Netflix, to take a famous example, will uncover patterns in the shows you like to watch, and will use algorithms to suggest content that checks similar boxes.

Generative AI, and tools like the Quiq conversational AI platform that utilize it, are taking this approach to a whole new level.

Once upon a time, it was only a human being that could read a customer’s profile and carefully incorporate the relevant information into a reply. Today, a properly fine-tuned generative language model can do this almost instantaneously, and at scale.

From the perspective of a contact center manager who is concerned with customer experience, this is an enormous development. Besides the fact that prior generations of language models simply weren’t flexible enough to have personalized customer interactions, their language also tended to have an “artificial” feel. While today’s models can’t yet replace the all-elusive human touch, they can do a lot to make your agents far more effective in adapting their conversations to the appropriate context.

Better Understanding Your Customers and Their Journeys

Marketers, designers, and customer experience professionals have always been data enthusiasts. Long before we had modern cloud computing and electronic databases, detailed information on potential clients, customer segments, and market trends was printed out on dead trees and guarded closely. With better data comes more targeted advertising, a more granular appreciation for how customers use your product, why they stop using it, and what their broader motivations are.

There are a few different ways in which generative AI can be used in this capacity. One of the more promising is by generating customer journeys that can be studied and mined for insight.

When you begin thinking about ways to improve your product, you need to get into your customers’ heads. You need to know the problems they’re solving, the tools they’ve already tried, and their major pain points. These are all things that some clever prompt engineering can elicit from ChatGPT.

We took a shot at generating such content for a fictional network-monitoring enterprise SaaS tool, and this was the result:

 

While these responses are fairly generic [1], notice that they do single out a number of really important details. These machine-generated journal entries bemoan how unintuitive a lot of monitoring tools are, how they’re not customizable, how they’re exceedingly difficult to set up, and how their endless false alarms are stretching the security teams thin.

It’s important to note that ChatGPT is not going to obviate your need to talk to real, flesh-and-blood users anytime soon. Still, when combined with actual testimony, these generated journeys can be a valuable aid in prioritizing your contact center’s work and alerting you to potential product issues you should be prepared to address.

Round-the-clock Customer Service

As science fiction movies never tire of pointing out, the big downside of fighting a robot army is that machines never need to eat, sleep, or rest. We’re not sure how long we have until the LLMs will rise up and wage war on humanity, but in the meantime, these are properties that you can put to use in your contact center.

With the power of generative AI, you can answer basic queries and resolve simple issues pretty much whenever they happen (which will probably be all the time), leaving your carbon-based contact center agents to answer the harder questions when they punch the clock in the morning after a good night’s sleep.

Enhancing Multilingual Support

Machine translation was one of the earliest use cases for neural networks and machine learning in general, and it continues to be an important function today. ChatGPT was noticeably good at multilingual translation right from the start, and for many language pairs it can rival, and sometimes outperform, dedicated tools like Google Translate.

If your product doesn’t currently have a diverse global user base speaking many languages, it hopefully will soon, which means you should start thinking about multilingual support. Not only will this boost table-stakes metrics like average handling time and resolutions per hour, it’ll also contribute to the more ineffable “customer satisfaction.” Nothing says “we care about making your experience with us a good one” like patiently walking a customer through a thorny technical issue in their native tongue.

Things to Watch Out For

Of course, for all the benefits that come from using generative AI for customer experience, it’s not all upside. There are downsides and issues that you’ll want to be aware of.

A big one is the tendency of large language models to hallucinate information. If you ask it for a list of articles to read about fungal computing (which is a real thing whose existence we discovered yesterday), it’s likely to generate a list that contains a mix of real and fake articles.

And because it’ll do so with great confidence and no formatting errors, you might be inclined to simply take its list at face value without double-checking it.

Remember, LLMs are tools, not replacements for your agents. They need to be working with generative AI, checking its output, and incorporating it when and where appropriate.

There’s a wider danger that you will fail to use generative AI in the way that’s best suited to your organization. If you’re running a bespoke LLM trained on your company’s data, for example, you should constantly be feeding it new interactions as part of its fine-tuning, so that it gets better over time.

And speaking of getting better, machine learning models don’t always improve with time. Owing to factors like changes in the underlying data, model performance can actually degrade. You’ll need a way of assessing the quality of the text generated by a large language model, along with a way of consistently monitoring it.

What are the Benefits of Generative AI for Customer Experience?

The reason that people are so excited over the potential of using generative AI for customer experience is because there’s so much upside. Once you’ve got your model infrastructure set up, you’ll be able to answer customer questions at all times of the day or night, in any of a dozen languages, and with a personalization that was once only possible with an army of contact center agents.

But if you’re a contact center manager with a lot to think about, you probably don’t want to spend a bunch of time hiring an engineering team to get everything running smoothly. And, with Quiq, you don’t have to – you can leverage generative AI to supercharge your customer experience while leaving the technical details to us!

Schedule a demo to find out how we can bring this bleeding-edge technology into your contact center, without worrying about the nuts and bolts.

Footnotes
[1] It’s worth pointing out that we spent no time crafting the prompt, which was really basic: “I’m a product manager at a company building an enterprise SAAS tool that makes it easier to monitor system breaches and issues. Could you write me 2-3 journal entries from my target customer? I want to know more about the problems they’re trying to solve, their pain points, and why the products they’ve already tried are not working well.” With a little effort, you could probably get more specific complaints and more usable material.

Understanding the Risk of ChatGPT: What you Should Know

OpenAI’s ChatGPT burst onto the scene less than a year ago and has already seen use in marketing, education, software development, and at least a dozen other industries.

Of particular interest to us is how ChatGPT is being used in contact centers. Though it’s already revolutionizing contact centers by making junior agents vastly more productive and easing the burnout contributing to turnover, there are nevertheless many issues that a contact center manager needs to look out for.

That will be our focus today.

What are the Risks of Using ChatGPT?

In the following few sections, we’ll detail some of the risks of using ChatGPT. That way, you can deploy ChatGPT or another large language model with the confidence born of knowing what the job entails.

Hallucinations and Confabulations

By far the most well-known failure mode of ChatGPT is its tendency to simply invent new information. Stories abound of the model making up citations, peer-reviewed papers, researchers, URLs, and more. To take a recent well-publicized example, ChatGPT accused law professor Jonathan Turley of having behaved inappropriately with some of his students during a trip to Alaska.

The only problem was that Turley had never been to Alaska with any of his students, and the alleged Washington Post story which ChatGPT claimed had reported these facts had also been created out of whole cloth.

This is certainly a problem in general, but it’s especially worrying for contact center managers who may increasingly come to rely on ChatGPT to answer questions or to help resolve customer issues.

To those not steeped in the underlying technical details, it can be hard to grok why a language model will hallucinate in this way. The answer is that it’s an artifact of how large language models are trained.

ChatGPT learns how to output tokens from being trained on huge amounts of human-generated textual data. It will, for example, see the first words of a passage, and then try to output the text that completes it. The example below is the opening line of J.D. Salinger’s The Catcher in the Rye; during training, the model would be shown the first part of the sentence and would attempt to produce the rest itself:

“If you really want to hear about it, the first thing you’ll probably want to know is where I was born, and what my lousy childhood was like, and how my parents were occupied and all before they had me, and all that David Copperfield kind of crap, but I don’t feel like going into it, if you want to know the truth.”

Over many training runs, a large language model will get better and better at this kind of autocompletion work, until eventually it gets to the level it’s at today.

But ChatGPT has no native fact-checking abilities – it sees text and outputs what it thinks is the most likely sequence of additional words. Since it sees URLs, papers, citations, etc., during its training, it will sometimes include those in the text it generates, whether or not they’re appropriate (or even real).
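
If you’d like to see this autocomplete behavior for yourself, here’s a small sketch using the open-source GPT-2 model from the Hugging Face transformers library. ChatGPT’s own weights aren’t available to inspect this way, but the principle is the same: the model assigns a probability to every possible next token, and nothing more.

  import torch
  from transformers import GPT2LMHeadModel, GPT2Tokenizer

  tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
  model = GPT2LMHeadModel.from_pretrained("gpt2")

  prompt = "If you really want to hear about it, the first thing you'll probably"
  inputs = tokenizer(prompt, return_tensors="pt")

  with torch.no_grad():
      logits = model(**inputs).logits

  # The logits at the last position give a probability distribution over the next token.
  next_token_probs = torch.softmax(logits[0, -1], dim=-1)
  top = torch.topk(next_token_probs, 5)
  for prob, token_id in zip(top.values, top.indices):
      print(f"{tokenizer.decode(int(token_id))!r}: {prob:.3f}")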

Privacy

Another ongoing risk of using ChatGPT is the fact that it could potentially expose sensitive or private information. As things stand, OpenAI, the creators of ChatGPT, offer no robust privacy guarantees for any information placed into a prompt.

If you are trying to do something like named entity recognition or summarization on real people’s data, there’s a chance that it might be seen by someone at OpenAI as part of a review process. Alternatively, it might be incorporated into future training runs. Either way, the results could be disastrous.

But prompt contents are not the only information collected by OpenAI when you use ChatGPT. Your timezone, browser type and IP address, cookies, account information, and any communication you have with OpenAI’s support team are all collected, among other things.

In the information age we’ve become used to knowing that big companies are mining and profiting off the data we generate, but given how powerful ChatGPT is, and how ubiquitous it’s becoming, it’s worth being extra careful with the information you give its creators. If you feed it private customer data and someone finds out, that will be damaging to your brand.

Bias in Model Output

By now, it’s pretty common knowledge that machine learning models can be biased.

If you feed a large language model a huge amount of text data in which doctors are usually men and nurses are usually women, for example, the model will associate “doctor” with “maleness” and “nurse” with “femaleness.” This is generally an artifact of the data the models were trained on, and is not due to any malfeasance on the part of the engineers. That does not, however, make it any less problematic.

There are some clever data manipulation techniques that are able to go a long way toward minimizing or even eliminating these biases, though they’re beyond the scope of this article. What contact center managers need to do is be aware of this problem, and establish monitoring and quality-control checkpoints in their workflow to identify and correct biased output in their language models.
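
One simple spot-check you can fold into those checkpoints, sketched below with gensim’s downloadable GloVe vectors (any set of pre-trained embeddings would do), is to compare how strongly occupation words associate with gendered words:

  import gensim.downloader as api

  # Downloads a small set of pre-trained GloVe vectors (roughly 66 MB) on first run.
  vectors = api.load("glove-wiki-gigaword-50")

  # Large gaps between the two similarities hint at gendered associations in the embeddings.
  for occupation in ["doctor", "nurse", "engineer", "receptionist"]:
      he = float(vectors.similarity(occupation, "he"))
      she = float(vectors.similarity(occupation, "she"))
      print(f"{occupation}: he={he:.2f}, she={she:.2f}")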

Issues Around Intellectual Property

Earlier, we briefly described the training process for a large language model like ChatGPT (you can find much more detail here.) One thing to note is that the model doesn’t provide any sort of citations for its output, nor any details as to how it was generated.

This has raised a number of thorny questions around copyright. If a model has ingested large amounts of information from the internet, including articles, books, forum posts, and much more, is there a sense in which it has violated someone’s copyright? What about if it’s an image-generation model trained on a database of Getty Images?

By and large, we tend to think this is the sort of issue that isn’t likely to plague contact center managers too much. It’s more likely to be a problem for, say, songwriters who might be inadvertently drawing on the work of other artists.

Nevertheless, a piece on the potential risks of ChatGPT wouldn’t be complete without a section on this emerging problem, and it’s certainly something that you should be monitoring in the background in your capacity as a manager.

Failure to Disclose the Use of LLMs

Finally, there has been a growing tendency to make it plain that LLMs have been used in drafting an article or a contract, if indeed they were part of the process. To the best of our knowledge, there are not yet any laws in place mandating that this has to be done, but it might be wise to include a disclaimer somewhere if large language models are being used consistently in your workflow. [1]

That having been said, it’s also important to exercise proactive judgment in deciding whether an LLM is appropriate for a given task in the first place. In early 2023, Peabody College at Vanderbilt University landed in hot water when it disclosed that it had used ChatGPT to draft an email about a mass shooting that had taken place at Michigan State.

People may not care much about whether their search recommendations were generated by a machine, but it would appear that some things are still best expressed by a human heart.

Again, this is unlikely to be something that a contact center manager faces much in her day-to-day life, but incidents like these are worth understanding as you decide how and when to use advanced language models.

Someone stopping a series of blocks from falling into each other, symbolizing the prevention of falling victim to ChatGPT risks.

Mitigating the Risks of ChatGPT

From the moment it was released, it was clear that ChatGPT and other large language models were going to change the way contact centers run. They’re already helping agents answer more queries, utilize knowledge spread throughout the center, and automate substantial portions of work that were once the purview of human beings.

Still, challenges remain. ChatGPT will plainly make things up, and can be biased or harmful in its text. Private information fed into its interface will be visible to OpenAI, and there’s also the wider danger of copyright infringement.

Many of these issues don’t have simple solutions, and will instead require a contact center manager to exercise both caution and continual diligence. But one place where she can make her life much easier is by using a powerful, out-of-the-box solution like the Quiq conversational AI platform.

While you’re worrying about the myriad risks of using ChatGPT, you don’t also want to be contending with a million little technical details, so schedule a demo with us to find out how our technology can bring cutting-edge language models to your contact center, without the headache.

Footnotes
[1] NOTE: This is not legal advice.

Request A Demo

The Ongoing Management of an LLM Assistant

Technologies like large language models (LLMs) are amazing at rapidly generating polite text that helps solve a problem or answer a question, so they’re a great fit for the work done at contact centers.

But this doesn’t mean that using them is trivial or easy. There are many challenges associated with the ongoing management of an LLM assistant, including hallucinations and the emergence of bad behavior – and that’s not even mentioning the engineering prowess required to fine-tune and monitor these systems.

All of this must be borne in mind by contact center managers, and our aim today is to facilitate this process.

We’ll provide broad context by talking about some of the basic ways in which large language models are being used in business, discuss setting up an LLM assistant, and then enumerate some of the specific steps that need to be taken in using them properly.

Let’s go!

How Are LLMs Being Used in Science and Business?

First, let’s adumbrate some of the ways in which large language models are being utilized on the ground.

The most obvious way is by acting as a generative AI assistant. One of the things that so stunned early users of ChatGPT was its remarkable breadth of capability. It could be used to draft blog posts and web copy, translate between languages, and write or explain code.

This alone makes it an amazing tool, but it has since become obvious that it’s useful for quite a lot more.

One thing that businesses have been experimenting with is fine-tuning large language models like ChatGPT on their own documentation, turning the model into a simple interface by which you can ask questions about your materials.

It’s hard to quantify precisely how much time contact center agents, engineers, or other people spend hunting around for the answer to a question, but it’s surely quite a lot. What if instead you could just, y’know, ask for what you want, in the same way that you do a human being?

Well, ChatGPT is a long way from being a full person, but when properly trained it can come close where question-answering is concerned.

Stepping back a little bit, LLMs can be prompt engineered into a number of useful behaviors, all of which redound to the benefit of the contact centers which use them. Imagine having an infinitely patient Socratic tutor that could help new agents get up to speed on your product and process, or crafting it into a powerful tool for brainstorming new product designs.

There have also been some promising attempts to extend the functionality of LLMs by making them more agentic – that is, by embedding them in systems that allow them to carry out more open-ended projects. AutoGPT, for example, pairs an LLM with a separate bot that hits the LLM with a chain of queries in the pursuit of some goal.

AssistGPT goes even further in the quest to augment LLMs by integrating them with a set of tools that allow them to achieve objectives involving images and audio in addition to text.

How to Set Up An LLM Assistant

Next, let’s turn to a discussion of how to set up an LLM assistant. Covering this topic fully is well beyond the scope of this article, but we can make some broad comments that will nevertheless be useful for contact center managers.

First, there’s the question of which large language model you should use. In the beginning, ChatGPT was pretty much the only foundation model on offer. Today, however, that situation has changed, and there are now foundation models from Anthropic, Meta, and many other companies.

One of the biggest early decisions you’ll have to make is whether you want to try and use an open-source model (for which the code and the model weights are freely available) or a closed-source model (for which they are not).

If you go the closed-source route you’ll almost certainly be hitting the model over an API, feeding it your queries and getting its responses back. This is orders of magnitude simpler than provisioning an open-source model, but it means that you’ll also be beholden to the whims of some other company’s engineering team. They may update the model in unexpected ways, or simply go bankrupt, and you’ll be left with no recourse.

Using an open-source alternative, of course, means grabbing the other horn of the dilemma. You’ll have visibility into how the model works and will be free to modify it as you see fit, but this won’t be worth much unless you’re willing to devote engineering hours to the task.

Then, there’s the question of fine-tuning large language models. While ChatGPT and LLMs more generally are quite good on their own, having them answer questions about your product or respond in particular ways means modifying their behavior somehow.

Broadly speaking, there are two ways of doing this, which we’ve mentioned throughout: proper fine-tuning, and prompt engineering. Let’s dig into the differences.

Fine-tuning means showing the model many (e.g. several hundred) examples of the behaviors you want to see, which changes its internal weights and biases it towards those behaviors in the future.

Prompt engineering, on the other hand, refers to carefully structuring your prompts to elicit the desired behavior. These LLMs can be surprisingly sensitive to little details in the instructions they’re provided, and prompt engineers know how to phrase their requests in just the right way to get what they need.

There is also some middle ground between these approaches. “One-shot learning” is a form of prompt engineering in which the prompt contains a single example of the desired behavior, while “few-shot learning” refers to including between three and five examples.
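
To make the distinction concrete, here’s a rough sketch of a few-shot prompt using the OpenAI Python client. The category labels, example messages, and model name are placeholders we invented for illustration; you’d swap in your own:

  from openai import OpenAI

  client = OpenAI()  # assumes the OPENAI_API_KEY environment variable is set

  # Few-shot prompting: the labeled examples below are invented placeholders.
  messages = [
      {"role": "system", "content": "Classify each customer message as 'billing', 'shipping', or 'other'."},
      {"role": "user", "content": "I was charged twice this month."},
      {"role": "assistant", "content": "billing"},
      {"role": "user", "content": "My package still hasn't arrived."},
      {"role": "assistant", "content": "shipping"},
      {"role": "user", "content": "Can I get a refund for the duplicate charge?"},
  ]

  response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
  print(response.choices[0].message.content)  # most likely "billing"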

Contact center managers thinking about using LLMs will need to think about these implementation details. If you plan on only lightly using ChatGPT in your contact center, a basic course on prompt engineering might be all you need. If you plan on making it an integral part of your organization, however, that most likely means a fine-tuning pipeline and serious technical investment.

The Ongoing Management of an LLM

Having said all this, we can now turn to the day-to-day details of managing an LLM assistant.

Monitoring the Performance of an LLM

First, you’ll need to continuously monitor the model. As hard as it may be to believe given how perfect ChatGPT’s output often is, there isn’t a person somewhere typing the responses. ChatGPT is very prone to hallucinations, in which it simply makes up information, and LLMs more generally can sometimes fall into using harmful or abusive language if they’re prompted incorrectly.

This can be damaging to your brand, so it’s important that you keep an eye on the language created by the LLMs your contact center is using.

And of course, not even LLMs can obviate the need to track the all-important key performance indicators. So far, there’s been one major study on generative AI in contact centers, which found that it increased productivity and reduced turnover, but you’ll still want to measure customer satisfaction, average handle time, etc.

There’s always a temptation to jump on a shiny new technology (remember the blockchain?), but you should only be using LLMs if they actually make your contact center more productive, and the only way you can assess that is by tracking your figures.

Iterative Fine-Tuning and Training

We’ve already had a few things to say about fine-tuning and the related discipline of prompt engineering, and here we’ll build on those preliminary comments.

The big thing to bear in mind is that fine-tuning a large language model is not a one-and-done kind of endeavor. You’ll find that your model’s behavior will drift over time (the technical term is “model degradation”), and this means you will likely have to periodically re-train it.

It’s also common to offer the model “feedback”, i.e. by ranking its responses or indicating when you did or did not like a particular output. You’ve probably heard of reinforcement learning from human feedback (RLHF), which is one version of this process, but there are also others you can use.
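
As a sketch of what lightweight feedback collection could look like (the log file, schema, and helper function here are all hypothetical, and a platform like Quiq may already handle this for you), something as simple as the following goes a long way:

  import json
  import time
  from pathlib import Path

  FEEDBACK_LOG = Path("llm_feedback.jsonl")  # hypothetical local log file

  def record_feedback(prompt, response, rating, agent_id):
      """Append a thumbs-up/down style rating (+1 or -1) for one model response.

      Records like these can be aggregated for monitoring, or used as preference
      data when it's time to re-train or fine-tune the model.
      """
      entry = {
          "timestamp": time.time(),
          "agent_id": agent_id,
          "prompt": prompt,
          "response": response,
          "rating": rating,
      }
      with FEEDBACK_LOG.open("a") as f:
          f.write(json.dumps(entry) + "\n")

  record_feedback(
      prompt="Summarize the customer's billing issue.",
      response="The customer was double-charged on the June invoice.",
      rating=1,
      agent_id="agent-042",
  )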

Quality Assurance and Oversight

A related point is that your LLMs will need consistent oversight. They’re not going to voluntarily improve on their own (they’re algorithms with no personal initiative to speak of), so you’ll need to check in routinely to make sure they’re performing well and that your agents are using them responsibly.

There are many parts to this, including checks on the model’s outputs and an audit process that allows you to track down any issues. If you suddenly see a decline in performance, for example, you’ll need to quickly figure out whether it’s isolated to one agent or part of a larger pattern. If it’s the former, was it a random aberration, or did the agent go “off script” in a way that caused the model to behave poorly?

Take another scenario, in which an end-user was shown inappropriate text generated by an LLM. In this situation, you’ll need to take a deeper look at your process. If there were agents interacting with this model, ask them why they failed to spot the problematic text and stop it from being shown to a customer. Or, if it came from a mostly-automated part of your tech stack, you need to uncover the reasons your filters failed to catch it, and perhaps think about keeping humans more in the loop.

The Future of LLM Assistants

Though the future is far from certain, we tend to think that LLMs have left Pandora’s box for good. They’re incredibly powerful tools which are poised to transform how contact centers and other enterprises operate, and experiments so far have been very promising; for all these reasons, we expect that LLMs will become a steadily more important part of the economy going forward.

That said, the ongoing management of an LLM assistant is far from trivial. You need to be aware at all times of how your model is performing and how your agents are using it. Though it can make your contact center vastly more productive, it can also lead to problems if you’re not careful.

That’s where the Quiq platform comes in. Our conversational AI is some of the best that can be found anywhere, able to facilitate customer interactions, automate text-message follow-ups, and much more. If you’re excited by the possibilities of generative AI but daunted by the prospect of figuring out how TPUs and GPUs are different, schedule a demo with us today.

Request A Demo

How Do You Train Your Agents in a ChatGPT World?

There’s long been an interest in using AI for educational purposes. Technologist Danny Hillis has spent decades dreaming of a digital “Aristotle” that would teach everyone in the way that the original Greek wunderkind once taught Alexander the Great, while modern companies have leveraged computer vision, machine learning, and various other tools to help students master complex concepts in a variety of fields.

Still, almost nothing has sparked the kind of enthusiasm for AI in education that ChatGPT and large language models more generally have given rise to. From the first, its human-level prose, knack for distilling information, and wide-ranging abilities made it clear that it would be extremely well-suited for learning.

But that still leaves the question of how. How should a contact center manager prepare for AI, and how should she change the way she trains her agents?

In our view, this question can be understood in two different, related ways:

  1. How can ChatGPT be used to help agents master skills related to their jobs?
  2. How can they be trained to use ChatGPT in their day-to-day work?

In this piece, we’ll take up both of these issues. We’ll first provide a general overview of the ways in which ChatGPT can be used for both education and training, then turn to the question of the myriad ways in which contact center agents can be taught to use this powerful new technology.

How is ChatGPT Used in Education and Training?

First, let’s get into some of the early ways in which ChatGPT is changing education and training.

NOTE: Our comments here are going to be fairly broad, covering some areas that may not be immediately applicable to the work contact center agents do. The main reason for this is that it’s very difficult to forecast how AI is going to change contact center work.

Our section on “creating study plans and curricula”, for example, might not be relevant to today’s contact center agents. But it could become important down the road if AI gives rise to more autonomous workflows in the future, in which case we expect that agents would be given more freedom to use AI and similar tools to learn the job on their own.

We pride ourselves on being forward-looking and forward-thinking here at Quiq, and we structure our content to reflect this.

Making a Socratic Tutor for Learning New Subjects

The Greek philosopher Socrates famously pioneered the instructional methodology which bears his name. Mostly, the Socratic method boils down to continuously asking targeted questions until areas of confusion emerge, at which point they’re vigorously investigated, usually in a small group setting.

A well-known illustration of this process is found in Plato’s Republic, which starts with an attempt to define “justice” and then expands into a much broader conversation about the best way to run a city and structure a social order.

ChatGPT can’t replace all of this on its own, of course, but with the right prompt engineering, it does a pretty good job. This method works best when paired with a primary source, such as a textbook, which will allow you to double-check ChatGPT’s questions and answers.
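If you’d like to experiment with this yourself, here’s a rough sketch of a Socratic-tutor prompt sent through the API, assuming the openai Python package (v1-style client); the model name is a placeholder, and you could just as easily paste the same system prompt into the ChatGPT interface.

```
# Rough sketch of a Socratic-tutor prompt, assuming the openai Python package
# (v1-style client). The model name is a placeholder; adapt it to your setup.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SOCRATIC_SYSTEM_PROMPT = (
    "You are a Socratic tutor. Never give the answer outright. "
    "Ask one probing question at a time, based on the student's last reply, "
    "and point out contradictions so the student can resolve them."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; any capable chat model works
    messages=[
        {"role": "system", "content": SOCRATIC_SYSTEM_PROMPT},
        {"role": "user", "content": "I think our refund policy applies to all purchases."},
    ],
)
print(response.choices[0].message.content)
```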

Having it Explain Code or Technical Subjects

A related area in which people are successfully using ChatGPT is in having it walk you through a tricky bit of code or a technical concept like “inertia”.

The more basic and fundamental, the better. In our experience so far, ChatGPT has almost never failed in correctly explaining simple Python, Pandas, or Java. It did falter when asked to produce code that translates between different orbital reference frames, however, and it had no idea what to do when we asked it about a fairly recent advance in the frontiers of battery chemistry.

There are a few different reasons we advise caution if you’re a contact center agent trying to understand some part of your product’s codebase. For one thing, if the product is written in a less-common language, ChatGPT might not be able to help much.

But even more importantly, you need to be extremely careful about what you put into it. There have already been major incidents in which proprietary code and company secrets were leaked when developers pasted them into the ChatGPT interface, which is visible to the OpenAI team.

Conversely, if you’re managing teams of contact center agents, you should begin establishing a policy on the appropriate uses of ChatGPT in your contact center. If your product is open-source there’s (probably) nothing to worry about, but otherwise, you need to proactively instruct your agents on what they can and cannot use the tool to accomplish.

Rewriting Explanations for Different Skill Levels

Wired has a popular YouTube series called “5 Levels”, where experts in quantum computing or the blockchain explain their subject at five different skill levels: “child”, “teen”, “college student”, “grad student”, and a fellow “expert.”

One thing that makes this compelling to beginners and pros alike is seeing the same idea explored across such varying contexts – seeing what gets emphasized or left out, or what emerges as you gradually climb up the ladder of complexity and sophistication.

This, too, is a place where ChatGPT shines. You can use it to provide explanations of concepts at different skill levels, which will ultimately improve your understanding of them.

For a contact center manager, this means that you can gradually introduce ideas to your agents, starting simply and then fleshing them out as the agents become more comfortable.
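A simple way to operationalize this is a prompt template that walks the same concept up the ladder of sophistication. In this sketch, ask_llm() is a stand-in for whatever chat-completion call you’re already using; the skill levels and wording are just examples.

```
# Small sketch of the "5 Levels" idea: the same concept explained at increasing
# depth. ask_llm() is a placeholder for whatever chat-completion call you use.
LEVELS = ["a brand-new agent", "an agent with a year of experience", "a team lead"]

def explanation_prompts(concept: str):
    return [
        f"Explain {concept} to {level}, in no more than four sentences."
        for level in LEVELS
    ]

for prompt in explanation_prompts("how our returns workflow handles exchanges"):
    print(prompt)  # feed each prompt to ask_llm(prompt) in practice
```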

Creating Study Plans and Curricula

Stepping back a little bit, ChatGPT has been used to create entire curricula and even daily study plans for Spanish, computer science, medicine, and various other fields.

As we noted at the outset, we expect it will be a little while before contact center agents are using ChatGPT for this purpose, as most centers likely have robust training materials they like to use.

Nevertheless, we can project a future in which these materials are much more bare-bones, perhaps consisting of some general notes along with prompts that an agent-in-training can use to ask questions of a model trained on the company’s documentation, test themselves as they go, and gradually build skill.

Training Agents to Use ChatGPT

Now that we’ve covered some of the ways in which present and future contact center agents might use ChatGPT to boost their own on-the-job learning, let’s turn to the other issue we want to tackle today: how do you train agents to use ChatGPT in their day-to-day work?

Getting Set Up With ChatGPT (and its Plugins)

First, let’s talk about how you can start using ChatGPT.

This section may end up seeming a bit anticlimactic because, honestly, it’s pretty straightforward. Today, you can get access to ChatGPT by going to the signup page. There’s a free version and a paid version that’ll set you back a whopping $20/month (which is a pretty small price to pay for access to one of the most powerful artifacts the human race has ever produced, in our opinion.)

As things stand, the free tier gives you access to GPT-3.5, while the paid version gives you the choice to switch to GPT-4 if you want the more powerful foundational model.

A paid account also gives you access to the growing ecosystem of ChatGPT plugins, which you can reach by switching over to the GPT-4 option.

There are plugins that allow ChatGPT to browse the web, let you directly edit diagrams or talk with PDF documents, or let you offload certain kinds of computations to the Wolfram platform.

Contact center agents may or may not find any of these useful right now, but we predict there will be a lot more development in this space going forward, so it’s something managers should know about.

Best Practices for Combining Human and AI Efforts

People have long been fascinated and terrified by automation, but so far, machines have only ever augmented human labor. Knowing when and how to offload work to ChatGPT requires knowing what it’s good for.

Large language models learn how to predict the next token from their training data, and are therefore very good at developing rough drafts, outlines, and more routine prose. You’ll generally find it necessary to edit its output fairly heavily in order to account for context and so that it fits stylistically with the rest of your content.

As a manager, you’ll need to start thinking about a standard policy for using ChatGPT. Any factual claims made by the model, especially any references or citations, need to be checked very carefully.

Scenario-Based Training

In this same vein, you’ll want to distinguish between different scenarios in which your agents will end up using generative AI. There are different considerations in using Quiq Compose or Quiq Suggest to format helpful replies, for example, and in using generative AI to translate between different languages.

Managers will probably want to sit down and brainstorm different scenarios and develop training materials for each one.

Ethical and Privacy Considerations

The rise of generative AI has sparked a much broader conversation about privacy, copyright, and intellectual property.

Much of this isn’t particularly relevant to contact center managers, but one thing you definitely should be paying attention to is privacy. Your agents should never be putting real customer data into ChatGPT; they should be using aliases and fake data whenever they’re trying to resolve a particular issue.

To quote fictional chemist and family man Walter White, we advise you to tread lightly here. Data breaches are a huge and ongoing problem, and they can do substantial damage to your brand.

ChatGPT and What it Means for Training Contact Center Agents

ChatGPT and related technologies are poised to change education and training. They can be used to help get agents up to speed or to work more efficiently, and they, in turn, require a certain amount of instruction to use safely.

These are all things that contact center managers need to worry about, but one thing you shouldn’t spend your time worrying about is the underlying technology. The Quiq conversational AI platform allows you to leverage the power of language models for contact centers, without looking at any code more complex than an API call. If the possibilities of this new frontier intrigue you, schedule a demo with us today!

How Can AI Make Agents More Efficient?

From the invention of writing to quantum computing, emerging technologies have always had a profound impact on the way we work. New tools mean new products and services, new organizational structures, whole new markets, and sometimes even new methods of thought.

These days, the big news is coming out of artificial intelligence. Specifically, the release of ChatGPT has made it possible for everyone to try out an advanced AI application for the first time, and it has ignited a firestorm of speculation as to how industries ranging from medicine to copywriting might be transformed.

In this piece, we’re going to try to cut through the hype to give contact center managers some much-needed clarity. We’ll discuss what AI is useful for, how it will change how contact center agents function daily, and what tools they should investigate to get the most out of AI.

What Is AI Useful For?

Artificial intelligence is a pretty broad category, encompassing everything from the most basic linear regressions to the remarkable sophistication of deep reinforcement learning agents.

This is too much territory to cover in a single blog post, but we can nevertheless make some useful general comments.

The way we see it, there are essentially two ways that AI is useful: it can either completely replace a human for certain tasks, allowing them to shift their focus to higher-value work, or it can augment their process, allowing them to reach insights or achieve objectives that would’ve taken much longer otherwise.

Take the example of ChatGPT, a large language model trained on huge quantities of human-generated text that is able to write poetry, generate math proofs, create functioning code, and much more.

For certain tasks – like generating blog post titles or short email blasts – ChatGPT is good enough to supplant humans altogether. But if you’re trying to learn a complex subject like organic chemistry, it’s best to treat ChatGPT more like a conversational partner. You can ask it questions or use it to test your understanding of a concept, but you have to be careful with its output because it might be hallucinating or otherwise getting important facts wrong. [1]

Since ChatGPT and large language models more generally are what everyone is focused on at the moment, it’s what we’ll be discussing throughout this essay.

How is AI Changing How Contact Center Agents Work?

As soon as ChatGPT was released it spawned an unending stream of hot takes, from “this is going to completely automate the entire economy” to “this is going to be a huge flop that no one finds particularly useful.”

Recently, a study by Erik Brynjolfsson, Danielle Li, and Lindsey R. Raymond called “Generative AI at Work” examined how LLMs are being used in contact centers. They found that both these perspectives were wrong: generative AI was not completely automating contact centers but was proving enormously helpful in making contact centers more efficient.

Specifically, LLMs were able to capture some of the conversational patterns and general tacit knowledge held by more senior agents and transfer it to more junior agents. The result was more productivity among these less experienced workers, less overall turnover, and a better customer experience.

To help flesh this picture out, we’ll now turn to examining some specific ways this works.

Large Language Models are Helping Agents Work Faster

There are a few ways that LLMs are helping agents get their jobs done more quickly and efficiently.

One is by helping them cut down on typing by providing contextually appropriate responses to customer questions, which is exactly what Quiq Compose does.

Quiq Compose learns from interactions between contact center agents and customers. It can take a barebones outline of a reply (“Nope, you waited too long to return the product…”) and flesh it out into a full, coherent, grammatical response (“I’m so sorry to hear that the product isn’t working as intended…”.)

Quiq Suggest also learns from multiple agent-customer interactions, but it offers real-time suggestions. As your contact center agents begin typing their responses, the underlying model offers a robust form of autocomplete to help them craft replies more quickly. This lets agents spend up to 30% less time hunting around for information and tweaking their language to be both polite and informative.

What’s more, because Quiq Suggest leverages lightweight “edge” language models trained on a specific company’s data, it’s able to run very quickly.
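Quiq Suggest’s internals are proprietary, so don’t read the following as how it actually works. It’s only a toy illustration of the general idea behind real-time suggestions: rank previously successful replies by how well they match what the agent has typed so far.

```
# Toy illustration (not Quiq Suggest's actual implementation) of ranking past
# agent replies against the text an agent has typed so far.
PAST_REPLIES = [
    "I'm so sorry to hear that the product isn't working as intended.",
    "I'm happy to help you track down that order.",
    "I've gone ahead and issued a refund to your original payment method.",
]

def suggest(typed_so_far: str, replies=PAST_REPLIES, limit: int = 3):
    typed_words = typed_so_far.lower().split()
    scored = [(r, sum(w in r.lower() for w in typed_words)) for r in replies]
    return [r for r, score in sorted(scored, key=lambda x: -x[1])[:limit] if score > 0]

print(suggest("sorry the product"))
```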

Another way you can reduce agent handling time is by simply cutting down on the amount of text a given agent has to process. In the course of resolving an issue, there will usually be some extraneous text, like “Thanks!” or “Have a good day!” When Quiq’s conversational AI platform sees these unimportant messages, it automatically filters them and tacks them on to the end of the transcript.
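Again, Quiq’s actual logic is more sophisticated than this, but a stripped-down version of the idea might look like the following: identify pleasantries and move them out of the way so the substance of the conversation stays front and center.

```
# Simplified illustration (not Quiq's actual logic) of deferring trivial messages
# so the agent's view stays focused on the substance of the issue.
TRIVIAL = {"thanks!", "thank you!", "have a good day!", "ok", "great"}

def defer_trivial(messages):
    """Reorder a transcript so trivial pleasantries appear at the end."""
    substantive = [m for m in messages if m.strip().lower() not in TRIVIAL]
    trivial = [m for m in messages if m.strip().lower() in TRIVIAL]
    return substantive + trivial

print(defer_trivial(["My tracking number doesn't work.", "Thanks!", "It's order #1234."]))
```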

Finally, a lot of friction and information loss can occur when a conversation is transferred between agents, or from an AI to a human agent. This is where conversation summarization comes in handy. By automatically summarizing the interaction so far, these transfers can take less time and energy, which also contributes to lower agent burnout and higher customer satisfaction.
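As a final sketch, here’s roughly what a handoff-summary prompt might look like. The actual summarization step is left as a placeholder for whatever LLM call your stack uses; the prompt itself is the important part.

```
# Minimal sketch of a handoff summary. call_llm() is a placeholder for whatever
# chat-completion call your stack uses; the prompt is the important part.
HANDOFF_PROMPT = (
    "Summarize this customer conversation for the next agent in three bullet points: "
    "the customer's issue, what has already been tried, and what still needs to happen.\n\n{transcript}"
)

def build_handoff_prompt(transcript: str) -> str:
    return HANDOFF_PROMPT.format(transcript=transcript)

# In practice: summary = call_llm(build_handoff_prompt(transcript_text))
print(build_handoff_prompt("Customer: My tracking link is broken...\nAgent: Let me check..."))
```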

Large Language Models can provide 24/7 Customer Support

There’s a fundamental asymmetry in running a great contact center, inasmuch as problems can occur around the clock but your agents need to sleep, rest, and play frisbee golf.

Unless, of course, some of your agents aren’t human. One of the great advantages of computers and algorithms is that they have none of the human frailties that prevent us all from working every hour of the day. They have no need for sleep, bathroom breaks, or recreation.

If you’re using a powerful conversational AI platform like Quiq, you can have AI agents deployed every hour, day or night, answering questions, completing tasks, and resolving problems.

Of course, the technology is not yet good enough to handle everything a contact center agent would handle, and some issues will have to be postponed until the humans punch the clock. Still, with the right tools, your operation can constantly be moving forward.

Large Language Models Can Help With Documentation

Writing documentation is one of those crucial, un-sexy tasks that businesses ignore at their own peril. Everyone wants to be coding up a blockchain or demo-ing a shiny new application to well-heeled investors, but someone needs to be sitting and writing up product specs, troubleshooting workflows, and all the other text that helps an organization function effectively.

This, too, is something that AI can help with. Whether it’s brainstorming an outline, identifying common sticking points, or even writing the document wholesale, more and more technical organizations are exploring LLMs to speed up their documentation efforts.

Just remember that LLMs like ChatGPT are extremely prone to hallucinations, so carefully fact-check everything they produce before you add it to your official documentation.

Large Language Models Can Help With Marketing

A final place where AI is proving incredibly useful is in marketing. Whether or not your agents have any input into your marketing depends on how you run your contact center, but this piece wouldn’t be complete without at least briefly touching upon marketing.

One obvious way that this can work is by having ChatGPT generate headlines, subject lines, Tweets, or even SEO-optimized blog posts.

But this is not the only way AI can be used in marketing. One very clever use of the technology that we’ve encountered is having ChatGPT generate customer journeys or customer diary entries. If your product is targeting men in their 40s who aren’t crushing life the way they used to, for example, it can create a month’s worth of forum posts from your target buyers discussing their lack of drive and motivation. This, in turn, will furnish targeted language you can use in your copy.

But bear in mind that marketing is one of those things that’s just incredibly subtle. It takes all of 30 seconds to come up with a few headlines for an email, but the difference between an okay headline and an extraordinary one can be a single word. Here, as elsewhere, it’s wise to have the final word remain with the humans.

Working more Quiq-ly

The world is changing, and contact centers are changing along with it. If you expect to retain a competitive edge and a top-notch contact center, you’ll need to utilize the latest technologies.

One way you could do this is by paying an expensive engineering team to build your own LLMs and AI tooling. But a much easier way is to integrate our Quiq conversational AI platform into your contact center. Whether it’s automatic summarization, filtering trivial messages, or using Quiq Suggest and Quiq Compose to cut down on average handle time, we have a product that will streamline your operation. Schedule a demo with us today to see how we can help you!

Footnotes
[1] You could argue that both of these examples boil down to the same thing. That is, even when you treat ChatGPT as a sounding board you’re really just replacing a human being that could’ve performed the same function. This is a plausible point of view, but we still think it’s useful to distinguish between “ChatGPT acting like a total replacement for a human for certain boilerplate tasks” and “ChatGPT augmenting a human’s workflow by acting like an idea generator or conversational partner.” Reasonable people could disagree on this, and your mileage may vary.

Request A Demo