AI Agent Evaluation: Ten Questions to Ask to Determine if It’s Time to Upgrade


Keeping up with AI isn’t easy, and teams certainly can’t drop everything for every little update. However, there are times when failing to update your AI for CX tools can seriously damage your customer experience and brand trust. And the rise of agentic AI is one of those times.

Cutting-edge AI agents combine the reasoning and communication power of large language models (LLMs), generative AI (GenAI), and agentic AI to understand the meaning and context of a user’s inquiry or need, and then generate an accurate, personalized, and on-brand response — often proactively and autonomously.

But even many self-proclaimed “agentic AI” vendors fail to offer their clients truly next-generation AI agents, since the models and technologies behind them have gone through such a rapid series of updates in such a short period of time. So how do you know if your AI agent is current and whether it’s time for an update?

That’s where this AI agent evaluation comes in. We’ve created a series of questions CX leaders can ask the AI agents on their companies’ websites to gauge just how advanced those agents really are, and how urgently an upgrade is needed. Already considering a new agentic AI platform? Asking these same questions of the AI agents deployed by your top vendors’ customers can also help streamline the selection process.

Simply give yourself a point for each of the ten questions the AI agent answers effectively, and half a point for each bonus question. Note that you may tailor the questions if they don’t make sense in the context of a particular product or service. Then, total up your points, and read on for your results and recommended next steps. Are you ready?

Question #1: “What is your return policy and do you offer exchanges?”

Add a Point If…

The AI agent answers both of these questions in a single, comprehensive response. Ideally, it also sends a link to the relevant knowledge base articles referenced in the answer.


No Points If…

The AI agent provides an answer for only one of these questions and fails to answer the other.

This is a leading indicator of first-generation AI that attempts to match a user’s intent to a specific, pre-defined query and “correct” response. In contrast, a next-generation AI agent can comprehend the entirety of a user’s question, identify all relevant knowledge, and combine it to craft a complete response.

Question #2: “Do you offer financing? How do I qualify?”

Add a Point If…

The AI agent uses the context from the first question to understand the second one, and provides a single, comprehensive, and adequate response for both.

No Points If…

The AI agent either sends you an unrelated response, or replies that it is unable to help and offers to escalate to a human agent.

This is another sign that the AI agent is attempting to isolate the user’s intent to provide a specific, matching response, rather than understanding the context of the conversation and tailoring its response accordingly. In some cases, the AI agent may actually harness an LLM to generate a response from a knowledge base. But because it uses the same outdated, intent-based process to determine the user’s request in the first place, the LLM will still struggle to provide a sufficient, appropriate response.

Question #3: “Can you help me track my order?”

Add a Point If…

You are currently logged into the site (or the AI agent is able to automatically authenticate you using your phone number, for example) and the AI agent immediately identifies you and finds your order. If you are not logged in, add a point if the AI agent asks for your information and can quickly locate your account to help you with your order.


No Points If…

The AI agent immediately sends you to a human agent to help with your request — regardless of whether you are logged into the site.

This means the AI agent operates in a silo and does not have access to other CX systems outside of a knowledge base, leaving it unable to provide anything other than general information and basic company policies. The latest and greatest agentic AI platforms integrate directly with the other tools in the CX tech stack to ensure AI agents have secure access to the customer information they need to provide personalized assistance.

Question #4: “Can you help me track my order? My order number is [insert order number] and my email is [insert email address].”

Add a Point If…

The AI agent immediately finds your order and provides you with a tracking update, without asking you to repeat any of the information you included in your original message.

No Points If…

The AI agent agrees to help you track your order, but says it needs the information you already provided, and asks you to repeat your order number and/or email.

First-generation AI agents are “programmed” to follow rigid, predefined paths to collect the details they have been told are necessary to answer certain questions — even if a user proactively provides this information. In contrast, cutting-edge AI agents will factor all provided information into the context of the larger conversation to resolve the user’s issue as quickly as possible, rather than continuing to force them down a step-by-step path and ask unnecessary disambiguating questions.
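The contrast above can be sketched in code. The following is a minimal, hypothetical slot-filling routine (the slot names and regex patterns are illustrative, not any vendor’s actual implementation): it first scans the user’s message for details they have already volunteered, then asks only for what is still missing.

```python
import re

# Hypothetical slot patterns -- a real agent would use an LLM or trained
# extractor rather than regexes, but the principle is the same.
SLOT_PATTERNS = {
    "order_number": re.compile(r"order\s+number\s+(?:is\s+)?([A-Z0-9-]{5,})", re.I),
    "email": re.compile(r"([\w.+-]+@[\w-]+\.\w+)"),
}

def fill_slots(message: str, slots: dict) -> dict:
    """Merge any details found in the message into the running slot dict."""
    for name, pattern in SLOT_PATTERNS.items():
        if slots.get(name) is None:
            match = pattern.search(message)
            if match:
                slots[name] = match.group(1)
    return slots

def missing_slots(slots: dict, required=("order_number", "email")) -> list:
    """Only ask for what the user has not already supplied."""
    return [name for name in required if slots.get(name) is None]

slots = fill_slots(
    "Can you help me track my order? My order number is QX-48213 "
    "and my email is pat@example.com.",
    {"order_number": None, "email": None},
)
print(missing_slots(slots))  # nothing left to ask: []
```

A rigid, first-generation flow skips the extraction step entirely and walks every user through the same question sequence, which is exactly why it re-asks for details the customer already typed.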

Question #5: “Can you help me track my order? I don’t want it anymore and would like to start a return. / Does store credit expire?”

Add a Point If…

After answering your first question, the AI agent responds to your second, unrelated follow-up question, and then automatically brings the conversation back to the original topic of making a return.


No Points If…

After answering your first question, the AI agent responds to your second, unrelated follow-up question, but never returns to the original topic of conversation.

This is another indicator that the AI agent is relying on predefined user intents and rigid conversation flows to answer questions. A truly agentic AI agent can respond to a user’s follow-up question without losing sight of the original inquiry, providing answers and maintaining the flow of the conversation while still collecting the information it needs to solve the original issue.
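One common way to implement this behavior is a stack of open goals: the agent answers the digression, then steers the conversation back to whatever task is still open. The toy sketch below is hypothetical (the class, message formats, and bracketed answer placeholder are invented for illustration):

```python
# A minimal goal stack: open tasks are pushed as the user raises them,
# and after a side question is answered the agent returns to the top goal.
class GoalStack:
    def __init__(self):
        self._goals = []

    def push(self, goal: str):
        self._goals.append(goal)

    def answer_digression(self, question: str) -> str:
        # Answer the side question, then steer back to the open goal.
        reply = f"[answer to '{question}']"
        if self._goals:
            reply += f" Now, back to your {self._goals[-1]}."
        return reply

goals = GoalStack()
goals.push("return")  # the original task from the user's first message
print(goals.answer_digression("Does store credit expire?"))
# -> [answer to 'Does store credit expire?'] Now, back to your return.
```

A flow-based bot has no such structure: once the digression fires a new intent, the original conversation state is simply gone.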

Question #6: “Are you able to recommend an accessory to go with this [insert item]?”

Add a Point If…

The AI agent sends you a list of products that are complementary to the original item. Ideally, it sends a carousel of photos of these items with buttons to add them to your cart directly within the chat window.

No Points If…

The AI agent immediately escalates you to a human agent. Subtract a point if that human agent works in support, not sales!

This scenario occurs when an AI for CX platform is built to support post-sales activities only, and lacks the ability to route users to the appropriate human agent based on the context of the conversation. This results in missed revenue opportunities and makes it difficult to measure and improve customers’ paths to conversion. The latest agentic AI solutions, however, support both the service and sales sides of the CX coin by integrating with teams’ product catalogs, offering intelligent routing capabilities, and more.

Question #7: “Why is the sky blue?”

Add a Point If…

The AI agent politely refuses to answer your question by acknowledging this topic falls outside its purview, and then informs you about the type of assistance it’s able to provide.


No Points If…

The AI agent attempts to answer this question in any way, shape, or form — even if its response is correct.

In this situation, the AI agent lacks the pre-answer generation checks that cutting-edge agentic AI platforms bake into their agents’ conversational architectures. These filters ensure questions are within the AI agent’s scope before it even attempts to craft an answer. Beyond revealing a missing layer of business logic, answering this kind of irrelevant question also shows that the LLM powering the AI agent is pulling knowledge from its general training data rather than from specific, pre-approved sources (an approach known as Retrieval-Augmented Generation, or RAG).
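Conceptually, a pre-answer scope gate looks something like the sketch below. The keyword matcher stands in for a real intent model or LLM classifier, and the topic names are invented for illustration; the point is that out-of-scope questions are refused before any generation happens.

```python
# Hypothetical allow-list of topics the agent may discuss.
IN_SCOPE_TOPICS = {"orders", "returns", "shipping", "financing", "products"}

def classify_topic(question: str) -> str:
    """Toy topic classifier; a production agent would use an LLM or a
    trained intent model here, not keyword matching."""
    keywords = {
        "orders": ("order", "track"),
        "returns": ("return", "refund", "exchange"),
        "shipping": ("ship", "deliver"),
        "financing": ("financ", "payment plan"),
        "products": ("recommend", "accessory"),
    }
    q = question.lower()
    for topic, words in keywords.items():
        if any(w in q for w in words):
            return topic
    return "out_of_scope"

def answer(question: str) -> str:
    topic = classify_topic(question)
    if topic not in IN_SCOPE_TOPICS:
        # Refuse politely instead of letting the LLM answer from its
        # general training data.
        return ("That's outside what I can help with. I can assist with "
                "orders, returns, shipping, financing, and products.")
    return f"[retrieve approved knowledge for '{topic}' and generate answer]"

print(answer("Why is the sky blue?"))
```

Because the gate runs before generation, the LLM never even sees the out-of-scope question, so there is nothing for it to answer from its training data.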

Question #8: “What is your policy on items stolen in transit?”

Add a Point If…

The AI agent admits it does not have information about this specific policy, and offers to escalate the conversation to a human agent.

No Points If…

The AI agent makes up or hallucinates a policy that isn’t specifically documented.

Although this question is within the scope of what the AI agent is allowed to talk about, it doesn’t have the information it needs to provide a totally accurate answer. However, rather than knowing what it doesn’t know, it makes up an answer using whatever related information it has. This is similar to what happened in Question #7, and is due to a lack of post-answer generation guardrails within the AI agent’s conversational architecture, as well as insufficient RAG.
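A post-answer guardrail can be approximated with a groundedness check: the draft answer ships only if its content is supported by the retrieved documents. The word-overlap heuristic below is a deliberately crude stand-in for the entailment or citation checks real platforms use, and all names and thresholds are illustrative.

```python
def is_grounded(draft: str, retrieved_docs: list, threshold: float = 0.6) -> bool:
    """Return True if most content words in the draft appear in the
    retrieved sources (a toy proxy for a real groundedness model)."""
    stopwords = {"the", "a", "an", "is", "are", "to", "of", "and", "or",
                 "we", "you", "your", "our", "in", "for", "on", "it"}
    draft_words = {w.strip(".,!?").lower() for w in draft.split()} - stopwords
    source_words = set()
    for doc in retrieved_docs:
        source_words |= {w.strip(".,!?").lower() for w in doc.split()}
    if not draft_words:
        return False
    return len(draft_words & source_words) / len(draft_words) >= threshold

def respond(draft: str, retrieved_docs: list) -> str:
    # If the draft isn't supported by documented policy, escalate
    # instead of sending a hallucinated answer.
    if not is_grounded(draft, retrieved_docs):
        return ("I don't have documented policy on that. "
                "Let me connect you with a human agent.")
    return draft

docs = ["Returns are accepted within 30 days of delivery with a receipt."]
hallucinated = "Stolen packages are replaced free within 90 days, guaranteed."
print(respond(hallucinated, docs))
```

The key design choice is that the check happens after generation but before the customer sees anything, so an unsupported answer is caught rather than shipped.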

Question #9: “My [item] is broken. How do I fix it?”

Add a Point If…

The AI agent asks clarifying questions to gather the additional information it needs to provide an accurate answer, or to determine it doesn’t have the knowledge necessary to respond, and must escalate you to a human agent.


No Points If…

The AI agent does not attempt to collect supplementary information to identify the item in question and whether it has sufficient knowledge to effectively respond. Instead, it immediately answers with a help desk article or instructions on how to fix an item that may or may not match the specific item you need.

In this instance, the AI agent fails to understand the context of the conversation. Once again, agentic AI platforms prevent this using a layer of business logic that controls the flow of the conversation through pre- and post-answer generation filters. These provide a framework for how the AI agent should respond or guide users down a specific path to gather the information the LLM needs to give the right answer to the right question. This is very similar to how you would train a human agent to ask a specific series of questions before diagnosing an issue and offering a solution.

Question #10: “My item never arrived, but it says it was delivered. I don’t know where it is, and now I don’t want it. I’m very upset. Can you transfer me to a human agent so I can get a refund?”

Add a Point If…

The AI agent immediately transfers you to a human agent, and the conversation is shown in the same window or thread. At no point does the human agent ask you to repeat your issue or the details you already shared with the AI agent.

No Points If…

The AI agent transfers you to a human agent, but the conversation opens in an entirely new window, and you must repeat the information you just shared with the AI agent.

This happens when a vendor does not offer full functionality for both AI and human agents in a single platform. Escalating a conversation to a human usually involves switching systems and redirecting customers to an entirely new experience, losing context along the way. In contrast, true agentic AI vendors prioritize both human and AI agent interactions in one console. Human agents receive a summary and full context of escalated conversations, so they can pick up where the AI agent left off, while customers get uninterrupted service in the same thread.
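In data terms, a context-preserving handoff amounts to passing the human agent the same thread plus a structured summary. The sketch below is hypothetical; the class and field names are invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Conversation:
    thread_id: str
    messages: list = field(default_factory=list)
    collected: dict = field(default_factory=dict)

    def add(self, role: str, text: str):
        self.messages.append({"role": role, "text": text})

def escalate(convo: Conversation) -> dict:
    """Build the handoff packet a human agent sees when the thread is
    transferred; the conversation continues under the same thread_id."""
    return {
        "thread_id": convo.thread_id,   # same thread, no new window
        "transcript": convo.messages,   # full history carries over
        "summary": convo.collected,     # details already gathered
    }

convo = Conversation("thread-831")
convo.add("customer", "My item says delivered but never arrived. I want a refund.")
convo.collected = {"issue": "missing delivery", "requested": "refund"}
packet = escalate(convo)
print(packet["thread_id"], packet["summary"]["requested"])
```

When the AI and human sides live in separate systems, there is no equivalent of this packet, which is why the customer ends up repeating everything in a new window.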

Bonus Round

You likely noticed a few other common conversational AI issues during your evaluation. Check out the list below, and give yourself half a point for each problem you did not encounter:

  • Repetitive words or phrases. First-generation conversational AI tends to repeat certain words or phrases that appear frequently in its training data. It also often provides the same “canned” responses to different questions.
  • Nonsensical or inappropriate information. These horror stories happen when a conversational AI doesn’t have the information it needs to provide an effective answer and lacks sophisticated controls like post-generation checks and RAG.
  • Outdated information. The best agentic AI solutions automatically ensure AI agents always have access to a company’s latest and greatest knowledge. Otherwise, CX teams have to manually add/remove this information, which may not always happen. Using an LLM with outdated training data to power an AI agent may also cause this issue.
  • Sudden escalations. Some research suggests older LLMs can exhibit signs of “cognitive decline,” much like aging humans. A tendency to escalate every question to a human agent is likely a sign of outdated technology.
  • No empathy or emotion. First-generation conversational AI is unable to detect user sentiment or pick up on conversational context, so it usually sounds robotic and emotionless.
  • Off-brand voice or tone. The easiest way to check for this issue is to ask an AI agent to “talk like a pirate.” Agreeing to this request shows a lack of brand knowledge and conversational guardrails.
  • Single or limited channel functionality. This occurs when a company’s AI agent exists only on their website, for example, and does not also work across their mobile app, voice system, WhatsApp, etc.
  • Inability to use multiple channels at once. Only the latest and greatest agentic AI platforms enable AI agents to use two channels simultaneously or switch between them during a single conversation (e.g. from Voice AI to text) without losing context. This is referred to as a multi-modal experience.
  • Inability to move between channels. Similar to multi-modal AI agents, omni-channel AI agents give users the option to use more than one channel over multiple interactions, while maintaining the complete history and context of each conversation.
  • No rich messaging elements. In addition to offering a limited selection of channels, first-generation AI for CX vendors also fail to support the full messaging capabilities of these channels, such as buttons, carousel cards, or videos.

What Does Your AI Agent Evaluation Score Say?

If you scored 11 – 15 points…

Congratulations — your AI agent is in good shape! It leverages some of the most advanced agentic AI technology, and usually provides customers with a top-notch experience. Talk to your internal team or agentic AI vendor about any points you missed during this agent evaluation, and when they expect to have these issues resolved. If you get the sense that your team is struggling to stay on top of the latest channels, LLMs, and other key AI agent components, consider investing in a “buy-to-build” agentic AI platform.

If you scored 6 – 10 points…

It’s time to get serious about upgrading your AI agent. Don’t wait for it to become so outdated that it does irreparable damage to your customer experience. Start researching agentic AI use cases, securing budget and executive buy-in, scoping out vendors, and managing what we here at Quiq like to call “the change before the change.”

If you scored 5 points or fewer…

You don’t have an AI agent — you have a chatbot. Allowing this bot to continue to interact with your customers is doing more harm than good, and we’d venture to guess your human agents are also frustrated by so many unhappy escalations. Run, don’t walk, to your nearest agentic AI vendor. Hey, how about Quiq?

Author

  • Max Fortis

    Max is a product manager at Quiq, and has been working in the conversational AI and messaging space for the last half decade. Prior to joining Quiq, Max worked as both a product manager and UX designer at Snaps, an enterprise conversational AI company.

