
National Furniture Retailer Reduces Escalations to Human Agents by 33%

A well-known furniture brand faced a significant challenge in enhancing their customer experience (CX) to stand out in a competitive market. By partnering with Quiq, they implemented a custom AI Agent to transform customer interactions across multiple platforms and create more seamless journeys. This strategic move resulted in a 33% reduction in support-related escalations to human agents.

Meanwhile, the implementation of Proactive AI and a Product Recommendation engine led to the largest chat sales day in the company’s history, showcasing the power of AI in improving efficiency and driving revenue.

Let’s dive into the furniture retailer’s challenges, how Quiq solved them using next-generation AI, the results, and what’s next for this household name in furniture and home goods.

The challenges: CX friction and missed sales opportunities

A leading name in the furniture and home goods industry, this company has long been known for its commitment to quality and affordability. Operating in a sector often the first to signal economic shifts, the company recognized the need to differentiate itself through exceptional customer experience.

Before adopting Quiq’s solution, the company struggled with several CX challenges that impeded their ability to capitalize on customer interactions. To start, their original chatbot used basic natural language understanding (NLU) and failed to deliver seamless and satisfactory customer journeys.

Customers experienced friction, leading to escalations and redundant conversations. The team clearly needed a robust system that could streamline operations, reduce costs, and enhance customer engagement.

So, the furniture retailer sought a solution that could not only address these inefficiencies, but also support their sales organization by effectively capturing and routing leads.

The solution: Quiq’s next-gen AI

With a focus on enhancing every touch point of the customer journey, the furniture company’s CX team embarked on a mission to elevate their service offerings, making CX a primary differentiator. Their pursuit led them to Quiq, a trusted technology partner poised to bring their vision to life through advanced AI and automation capabilities.

Quiq partnered with the team to develop a custom AI Agent, leveraging the natural language capabilities of Large Language Models (LLMs) to help classify sales vs. support inquiries and route them accordingly. This innovative solution enables the company to offer a more sophisticated and engaging customer experience.

The AI Agent was designed to retrieve accurate information from various systems—including the company’s CRM, product catalog, and FAQ knowledge base—ensuring customers received timely, relevant, and accurate responses.

By integrating this AI Agent into webchat, SMS, and Apple Messages for Business, the company successfully created a seamless, consistent, and faster service experience.

The AI Agent also facilitated proactive customer engagement by using a new Product Recommendation engine. This feature not only guided customers through their purchase journey, but also contributed to a significant shift in sales performance.

The results are nothing short of incredible

The implementation of the custom AI Agent by Quiq has already delivered remarkable results. One of the most significant achievements was a 33% reduction in escalations to human agents. This reduction translated to substantial operational cost savings and allowed human agents to focus on complex or high-value interactions, enhancing overall service quality.

Moreover, the introduction of Proactive AI and the Product Recommendation engine led to unprecedented sales success. The furniture retailer experienced its largest sales day for Chat Sales in the company’s history, with an impressive 10% of total daily sales attributed to this channel for the first time.

This outcome underscored the potential of AI-powered solutions in driving business growth, optimizing efficiency, and elevating customer satisfaction.

Results recap:

  • 33% reduction in escalations to human agents.
  • 10% of total daily sales attributed to chat (largest for the channel in company history).
  • Tighter, smoother CX with Proactive AI and Product Recommendations woven into customer interactions.

What’s next?

The partnership between this furniture brand and Quiq exemplifies the transformative power of AI in redefining customer experience and achieving business success. By addressing challenges with a robust AI Agent, the company not only elevated its CX offerings, but also significantly boosted its sales performance. This case study highlights the critical role of AI in modern business operations and its impact on a company’s competitive edge.

Looking ahead, the company and Quiq are committed to continuing their collaboration to explore further AI enhancements and innovations. The team plans to implement Agent Assist, followed by Voice and Email AI to further bolster seamless customer experiences across channels. This ongoing partnership promises to keep the furniture retailer at the forefront of CX excellence and business growth.

What is LLM Function Calling and How Does it Work?

For all their amazing capabilities, LLMs have a fundamental weakness: they can’t actually do anything. They read a sequence of input tokens (the prompt) and produce a sequence of output tokens (one at a time) known as the completion. There are no side effects—just inputs and outputs. So something else, such as the application you’re building, has to take the LLM’s output and do something useful with it.

But how can we get an LLM to reliably generate output that conforms to our application’s requirements? Function calls, also known as tool use, make it easier for your application to do something useful with an LLM’s output.

Note: LLM ‘functions’ and ‘tools’ generally refer to the same concept. ‘Tool’ is the term used by Anthropic/Claude, whereas OpenAI treats a function as a specific type of tool. For the purposes of this article, the terms are used interchangeably.

What Problem Does LLM Function Calling Solve?

To better understand the problem that function calls solve, let’s pretend we’re adding a new feature to an email client that allows the user to provide shorthand instructions for an email and use an LLM to generate the subject and body:

AI Email Generator

Our application might build up a prompt request like the following GPT-4o-Mini example. Note how we ask the LLM to return a specific format expected by our application:

import requests

user = "Kyle McIntyre"
recipient = "Aunt Suzie (suzieq@mailinator.com)"
user_input = "Tell her I can't make it this Sunday, taking dog to vet. Ask how things are going, keep it folksy yet respectful."

prompt = f"""
Draft an email on behalf of the user, {user}, to {recipient}.

Here are the user's instructions: {user_input}

Respond with only a JSON object containing two string fields: "subject" and "body".
"""

request = {
  "model": "gpt-4o-mini-2024-07-18",
  "messages": [
    {
      "role": "user",
      "content": prompt
    }
  ]
}

response = requests.post(
  'https://api.openai.com/v1/chat/completions',
  headers={'Authorization': f'Bearer {key}'},
  json=request
)

Assume our application sends this prompt and receives a completion back. What do we know about the completion? In a word: nothing.

Although LLMs do their best to follow our instructions, there’s no guarantee that the output will adhere to our requested schema. Subject and body could be missing, incorrectly capitalized, or perhaps be of the wrong type. Additional properties we didn’t ask for might also be included. Prior to the advent of function calls, our only options at this point were to

  • Continually tweak our prompts in an effort to get more reliable outputs
  • Write very tolerant deserialization and coercion logic in our app to make the LLM’s output adhere to our expectation
  • Retry the prompt multiple times until we receive legal output
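The second option, tolerant deserialization, might look something like the following sketch (the helper and field names are ours for illustration, not from any library): try to parse the completion as JSON, normalize what we can, and give up so the caller can retry.

```python
import json

REQUIRED_FIELDS = ("subject", "body")

def try_parse_email(completion):
    """Attempt to coerce a raw LLM completion into the schema we asked for.

    Returns a dict with 'subject' and 'body', or None if the output is unusable.
    """
    try:
        data = json.loads(completion)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    # Tolerate wrong capitalization of keys and coerce values to strings.
    normalized = {str(k).lower(): str(v) for k, v in data.items()}
    if all(field in normalized for field in REQUIRED_FIELDS):
        return {field: normalized[field] for field in REQUIRED_FIELDS}
    return None
```

A caller would typically wrap this in a retry loop, re-prompting until it returns a usable dict.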

Function calls, and a related model feature known as “structured outputs”, make all of this much easier and more reliable.

Function Calls to the Rescue

Let’s code up the same example using a function call. In order to get an LLM to ‘use’ a tool, you must first define it. Typically this involves giving it a name and then defining the schema of the function’s arguments.

In the example below, we define a tool named “draft_email” that takes two required arguments, body and subject, both of which are strings:

user = "Kyle McIntyre"
recipient = "Aunt Suzie (suzieq@mailinator.com)"
user_input = "Tell her I can’t make it this Sunday, taking dog to vet. Ask how things are going, keep it folksy yet respectful."

prompt = f"""
Use the available function to draft an email on behalf of the user, {user}, to {recipient}.

Here are the user’s instructions: {user_input}
"""

tool = {
  "type": "function",
  "function": {
    "name": "draft_email",
    "description": "Draft an email on behalf of the user",
    "parameters": {
      "type": "object",
      "properties": {
        "subject": {
          "type": "string",
          "description": "The email subject",
        },
        "body": {
          "type": "string",
          "description": "The email body",
        }
      },
      "required": ["subject", "body"]
    }
  },
}

request = {
  "model": "gpt-4o-mini-2024-07-18",
  "messages": [
    {
      "role": "user",
      "content": prompt
    }
  ],
  "tools": [tool]
}

response = requests.post(
  'https://api.openai.com/v1/chat/completions',
  headers={'Authorization': f'Bearer {key}'},
  json=request
)

Defining the tool required some extra work on our part, but it also simplified our prompt. We’re no longer trying to describe the shape of our expected output and instead just say “use the available function”. More importantly, we can now trust that the LLM’s output will actually adhere to our specified schema!

Let’s look at the response message we received from GPT-4o-Mini:

{
  "role": "assistant",
  "content": null,
  "tool_calls": [
    {
      "type": "function",
      "function": {
        "name": "draft_email",
        "arguments": "{\"subject\":\"Regrets for This Sunday\",\"body\":\"Hi Aunt Suzie,\\n\\nI hope this email finds you well! I wanted to let you know that I can't make it this Sunday, as I need to take the dog to the vet. \\n\\nHow have things been going with you? I always love hearing about what\u2019s new in your life.\\n\\nTake care and talk to you soon!\\n\\nBest,\\nKyle McIntyre\"}"
      }
    }
  ],
  "refusal": null
}

What we received back is really a request from the LLM to ‘call’ our function. Our application still needs to honor the function call somehow.

But now, rather than having to treat the LLM’s output as an opaque string, we can trust that the arguments adhere to our application requirements. The ability to define a contract and trust that the LLM’s outputs will adhere to it makes function calls an invaluable tool when integrating an LLM into an application.
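Honoring the call usually starts with pulling the arguments string out of the response message and parsing it. A minimal sketch against the response shape shown above:

```python
import json

def extract_draft(response_message):
    # The LLM returns its 'call' as a JSON-encoded arguments string;
    # parse it back into a dict our application can use.
    call = response_message["tool_calls"][0]["function"]
    if call["name"] != "draft_email":
        raise ValueError(f"unexpected tool call: {call['name']}")
    args = json.loads(call["arguments"])
    return {"subject": args["subject"], "body": args["body"]}
```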

How Does Function Calling Work?

As we saw in the last section, in order to get an LLM to generate reliable outputs we have to define a function or tool for it to use. Specifically, we’re defining a schema that the output needs to adhere to. Function calls and tools work a bit differently across various LLM vendors, but they all require the declaration of a schema and most are based on the open JsonSchema standard.

So, how does an LLM ensure that its outputs adhere to the tool schema? How can stochastic token-by-token output generation be reconciled with strict adherence to a data schema?

The solution is quite elegant: LLMs still generate their outputs one token at a time when calling a function, but the model is only allowed to choose from the subset of tokens that would keep the output in compliance with the schema. This is done through dynamic token masking based on the schema’s definition. In this way the output is still generative and very intelligent, but guaranteed to adhere to the schema.
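Here is a toy illustration of the masking idea. Real models mask logits over their full vocabulary at every decoding step; for clarity we shrink the ‘vocabulary’ to a few strings and represent the ‘schema’ as a set of legal completions:

```python
# Toy schema-constrained decoding via token masking: a token stays
# "unmasked" only if appending it keeps the partial output a prefix
# of at least one completion the schema permits.
def allowed_tokens(partial, vocabulary, legal_completions):
    return [tok for tok in vocabulary
            if any(c.startswith(partial + tok) for c in legal_completions)]

# Pretend the schema demands a JSON boolean, so "true" and "false"
# are the only legal completions; the vocabulary is tiny for clarity.
VOCAB = ["tr", "ue", "fa", "lse", "maybe", "null"]
LEGAL = ["true", "false"]
```

At each step the model samples only from `allowed_tokens(...)`, so the finished output remains generative while being guaranteed schema-legal.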

Function Calling Misnomers and Misconceptions

The name ‘function call’ is somewhat misleading because it sounds like the LLM is going to actually do something on your behalf (and thereby cause side effects). But it doesn’t. When the LLM decides to ‘call’ a function, that just means that it’s going to generate output that represents a request to call that function. It’s still the responsibility of your application to handle that request and do something with it—but now you can trust the shape of the payload.

For this reason, an LLM function doesn’t need to map directly to any true function or method in your application, or any real API. Instead, LLM functions can (and probably should) be defined more conceptually, from the perspective of the LLM.

Use in Agentic Workflows

So, are function calls only useful for constraining output? While that is certainly their primary purpose, they can also be quite useful in building agentic workflows. Rather than presenting a model with a single tool definition, you can instead present it with multiple tools and ask the LLM to use the tools at its disposal to help solve a problem.

For example, you might provide the LLM with the following tools in a CX context:

  • escalate() – Escalate the conversation to a human agent for further review
  • search(query) – Search a knowledgebase for helpful information
  • emailTranscript() – Email the customer a transcript of the conversation

When using function calls in an agentic workflow, the application typically interprets the function call and somehow uses it to update the information passed to the LLM in the next turn.
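On the application side, that interpreter is often just a dispatch table mapping tool names to handler functions. A sketch using the three CX tools above (the handler bodies are stubs of our own invention):

```python
import json

def escalate():
    return {"status": "escalated to human agent"}

def search(query):
    # Stub: a real handler would query the knowledgebase.
    return {"results": [f"knowledgebase article matching {query!r}"]}

def email_transcript():
    return {"status": "transcript emailed"}

HANDLERS = {"escalate": escalate, "search": search, "emailTranscript": email_transcript}

def run_tool_calls(tool_calls):
    """Execute each requested tool call and collect the results to feed
    back to the LLM on the next turn."""
    results = []
    for call in tool_calls:
        fn = call["function"]
        args = json.loads(fn["arguments"]) if fn.get("arguments") else {}
        results.append({"name": fn["name"], "result": HANDLERS[fn["name"]](**args)})
    return results
```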

It’s also worth noting that conversational LLMs can call functions and generate output messages intended for the user all at the same time. If you were building an AI DJ, the LLM might call a function like play_track(“Traveler”, “Chris Stapleton”) while simultaneously saying to the user: “I’m spinning up one of your favorite Country tunes now”.

Function Calling in Quiq’s AI Studio

Function calling is fully supported in Quiq’s AI Studio on capable LLMs. However, AI Studio goes further than basic function call support in three key ways:

  1. The expected output shape of any prompt (the completion schema) can be visually configured in the Prompt Editor
  2. Prompt outputs aren’t just used for transient function calls but become attached to the visual flow state for inspection later in the same prompt chain or conversation
  3. Completion schemas can be configured on LLMs – even those that don’t support function calls

If you’re interested in learning more about AI Studio, please request a trial.

Why LLM Observability Matters (and Strategies for Getting it Right)

When integrating Large Language Models (LLMs) into applications, you can’t afford to treat them like “black boxes.” As your LLM application scales and becomes more complex, the need to monitor, troubleshoot, and understand how the LLM impacts your application becomes critical. In this article, we’ll explore the observability strategies we’ve found useful here at Quiq.

Key Elements of an Effective LLM Observability Strategy

  1. Provide Access: Encourage business users to engage actively in testing and optimization.
  2. Encourage Exploration: Make it easy to explore the application under different scenarios.
  3. Create Transparency: Clearly show how the model interacts within your application by revealing decision-making processes, system interactions, and how outputs are verified.
  4. Handle Errors Gracefully: Proactively identify and handle deviations or errors.
  5. Track System Performance: Expose metrics like response times, token usage, and errors.

LLMs add a layer of unpredictability and complexity to an application. Your observability tooling should allow you to actively explore both known and unknown issues while fostering an environment where engineers and business users can collaborate to create a new kind of application.

5 Strategies for LLM Observability

We’ll discuss these strategies from the perspective of a real-world event. An “event” triggers an application to process input and provide output back to the world.

A few examples of events include:

  • Chat user message input > Chat response
  • An email arriving into a ticketing system > Suggested reply
  • A case being closed > Case updated for topic or other classifications

You may have heard of these events referred to as prompt chains, prompt pipelines, agentic workflows, or conversational turns. The key takeaway: an event will usually require more than a single call to an LLM. Your LLM application’s job is to orchestrate LLM prompts, data requests, decisions, and actions. The following strategies will help you understand what’s happening inside your LLM application.

1. Tracing Execution Paths

Any given event may follow different execution paths. Tracing the execution path should allow you to understand what state is set, which knowledge was retrieved, functions called, and generally how and why the LLM generated and verified the response. The ability to trace the execution path of an event will provide invaluable visibility into your application behavior.

For example, if your application delivers a message that offers a live agent; was it because the topic was sensitive, the user was frustrated or there was a gap in the knowledge resources? Tracing the execution path will help you pinpoint the prompt, knowledge or logic that drove the response. This is the first step in monitoring and optimizing an AI application. Your LLM observability should provide a full trace of the execution path that led to a response being delivered.
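One lightweight way to capture such a trace is to thread a trace object through the event’s handling and record each step with its inputs and outputs. A minimal sketch (the step names and fields are illustrative, not from any particular framework):

```python
import time

class EventTrace:
    """Collects an ordered record of the steps taken while handling one event."""

    def __init__(self, event_id):
        self.event_id = event_id
        self.steps = []

    def record(self, step_name, **details):
        self.steps.append({"step": step_name, "at": time.time(), **details})

# Example: tracing why a live-agent offer was made
trace = EventTrace("conv-123")
trace.record("classify_topic", topic="billing", sensitive=True)
trace.record("knowledge_search", query="refund policy", hits=0)
trace.record("decision", action="offer_live_agent",
             reason="sensitive topic with no knowledge hits")
```

Persisting these traces alongside the conversation gives you exactly the pinpointing described above: which prompt, knowledge lookup, or rule drove the response.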

2. Replay Mechanisms for Faster Debugging

In real-world applications, being able to reproduce and fix errors quickly is critical. Implementing an event replay mechanism—where past events can be replayed against the current system configuration—will provide a fast feedback loop.

Replaying events also helps when modifying prompts, upgrading models, adding knowledge or editing business rules. Changing your LLM application should be done in a controlled environment where you can replay events and ensure the desired effect without introducing new issues.

3. State Management & Monitoring

Another key aspect of LLM observability is capturing how your application’s field values or state change during an event, as well as across related events such as a conversation. Understanding the state of different variables can help you better understand and recreate the results of your LLM application.

Many use cases will also make use of memory. You should strive to manage this memory consistently and use caching for order or product info to reduce unnecessary network calls. In addition to data caches, multi-turn conversations may react differently based on the memory state. Suppose a user types “I need help” and you have implemented a next-best-action classifier with the following options:

  • Clarify the inquiry
  • Find Information
  • Escalate to live agent

The action taken may depend on whether “I need help” is the 1st or 5th message of the conversation. The response could also depend on whether the inquiry type is something you want your live agents handling.
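A toy version of that next-best-action policy, with the state fields invented for illustration, shows how the same message can route differently depending on conversation state:

```python
def next_best_action(message, state):
    """Toy next-best-action policy that consults conversation state.

    `state` tracks how deep into the conversation we are and what kinds
    of inquiries should go straight to live agents.
    """
    if state["message_count"] >= 5:
        return "Escalate to live agent"  # the user has been at it a while
    if state["inquiry_type"] in state["agent_handled_types"]:
        return "Escalate to live agent"
    if message.strip().lower() == "i need help":
        return "Clarify the inquiry"
    return "Find information"
```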

The key takeaway – LLMs introduce a new kind of intelligence, but you’ll still need to manage state and domain specific logic to ensure your application is aware of its context. Clear visibility into the state of your application and your ability to reproduce it are vital parts of your observability strategy.

4. Claims Verification

A critical challenge with LLMs is ensuring the validity of the information they generate. These made-up answers are commonly called hallucinations: statements the LLM invents, usually because they make semantic sense rather than because they are grounded in fact.

A claims verification process provides confidence that a response is grounded, attributable and verified by approved evidence from known knowledge or API resources. A dedicated verification model should be used to provide a confidence score and handling should be put in place to align answers that fail verification. The verification process should use metrics such as the maximum, minimum, and average scores and attribute answers to one or many resources.

For example:

  • On Verified: Define actions to take when a claim is verified. This could involve attributing the answer to one or many articles or API responses and then delivering a response to the end user.
  • On Unverified: Set workflows for unverified claims, such as retrying a prompt pipeline, aligning a corrective response, or escalating the issue to a human agent.

By integrating a claims verification model and process into your LLM application, you gain the ability to prevent hallucinations and attribute responses to known resources. This clear and traceable attribution will equip you with the information you need to field questions from stakeholders and provide insight into how you can improve your knowledge.
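In outline, the verified/unverified branching might look like the following sketch. The verifier callable here is a stand-in for a dedicated verification model returning a confidence score:

```python
def verify_claims(answer, evidence, verifier, threshold=0.8):
    """Score each claim against approved evidence and branch on the result.

    `verifier` is any callable (claim, evidence) -> score in [0, 1];
    in production this would be a dedicated verification model.
    """
    scores = [verifier(claim, evidence) for claim in answer["claims"]]
    result = {"max": max(scores), "min": min(scores),
              "avg": sum(scores) / len(scores)}
    if result["min"] >= threshold:
        # On Verified: attribute the answer to its sources and deliver it.
        return {"status": "verified", "attribution": evidence["sources"], **result}
    # On Unverified: hand off to a corrective workflow (retry, align, escalate).
    return {"status": "unverified", "action": "retry_or_escalate", **result}
```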

5. Regression Tests

After optimizing prompts, upgrading models, or introducing new knowledge, you’ll want to ensure that these changes don’t introduce new problems. Earlier, we talked about replaying events; this replay capability should be the basis for creating your test cases. You should be able to save any event as a regression test. Your test sets should be runnable individually or in batch as part of a continuous integration pipeline.

The models are moving fast and your LLM application will be under constant pressure to get faster, smarter and cheaper. Test sets will give you the visibility and confidence you need to stay ahead of your competition.
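As a concrete sketch of turning saved events into tests (the saved events and the pipeline under test are trivial stubs of our own invention):

```python
# A saved event pairs recorded inputs with the response we accepted as correct.
SAVED_EVENTS = [
    {"input": "Where is my order?", "expected_intent": "order_status"},
    {"input": "I want to speak to a person", "expected_intent": "escalate"},
]

def classify_intent(text):
    # Stand-in for the real prompt pipeline under test.
    return "escalate" if "person" in text else "order_status"

def run_regression_suite(events, pipeline):
    """Replay each saved event through the current pipeline and tally results."""
    failures = [e for e in events if pipeline(e["input"]) != e["expected_intent"]]
    return {"passed": len(events) - len(failures), "failed": len(failures)}
```

A CI job would run this suite on every prompt tweak or model upgrade and fail the build on regressions.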

Setting Performance Goals

While the above strategies are essential, it’s also important to evaluate how well your system is achieving its higher-level objectives. This is where performance goals come into play. Goals should be instrumented to track whether your application is successfully meeting the business objectives.

  • Goal Success: Measure how often your application achieves a defined objective, such as confirming an upcoming appointment, rendering an order status, or receiving positive user feedback.
  • Goal Failure: Track instances where the LLM fails to complete a task or requires human assistance.

Keep in mind that an event such as a live agent escalation could be considered success for one type of inquiry, and a failure in a different scenario. Goal instrumentation should provide a high degree of flexibility. By setting clear success and failure criteria for your application, you will be better positioned to evaluate its performance over time and identify areas for improvement.

Applying Segmentation to Hone In

Segmentation is a powerful tool for diving deeper into your LLM application’s performance. By grouping conversations or events based on specific criteria (inquiry type, user type, or product category, for example), you can focus your analysis on the areas that matter most to your application.

For instance, you may want to segment conversations to see if your application behaves differently on web versus mobile, or across sales versus service inquiries. You can also create more complex segments that filter interactions based on specific events, such as when an error occurred or when a specific topic category was in play. Segmentation allows you to tailor your observability efforts to the use cases and specific needs of your business.

Using Funnels for Conversion and Performance Insights

Funnels provide another layer of insight by showing how users progress through a series of steps within a customer journey or conversation. A funnel allows you to visualize drop-offs, identify where users disengage, and track how many complete the intended goal. For example, you can track the steps a customer takes when engaging with your LLM application, from initial inquiry to task completion, and analyze where drop-offs occur.

Funnels can be segmented just like other data, allowing you to drill down by platform, customer type, or interaction type. This helps you understand where improvements are needed and how adjustments to prompts or knowledge bases can enhance the overall experience.

By combining segmentation with funnel analysis, you get a comprehensive view of your LLM’s effectiveness and can pinpoint specific areas for optimization.

A/B Testing for Continuous Improvement

A/B testing is a vital tool for systematically improving LLM application performance by comparing different versions of prompts, responses, or workflows. This method allows you to experiment with variations of the same interaction and measure which version produces better results. For instance, you can test two different prompts to see which one leads to more successful goal completions or fewer errors.

By running A/B tests, you can refine your prompt design, optimize the LLM’s decision-making logic, and improve overall user experience. The results of these tests give you data-backed insights, helping you implement changes with confidence that they’ll positively impact performance.

Additionally, A/B testing can be combined with funnel analysis, allowing you to track how changes affect customer behavior at each step of the journey. This ensures that your optimizations not only improve specific interactions but also lead to better conversion rates and task completions overall.

Final Thoughts on LLM Observability

LLM observability is not just a technical necessity but a strategic advantage. Whether you’re dealing with prompt optimization, function call validation, or auditing sensitive interactions, observability helps you maintain control over the outputs of your LLM application. By leveraging tools such as event debug-replay, regression tests, segmentation, funnel analysis, A/B testing, and claims verification, you will build trust that you have a safe and effective LLM application.

Curious about how Quiq approaches LLM observability? Get in touch with us.

Everything You Need to Know About LLM Integration

It’s hard to imagine an application, website or workflow that wouldn’t benefit in some way from the new electricity that is generative AI. But what does it look like to integrate an LLM into an application? Is it just a matter of hitting a REST API with some basic auth credentials, or is there more to it than that?

In this article, we’ll enumerate the things you should consider when planning an LLM integration.

Why Integrate an LLM?

At first glance, it might not seem like LLMs make sense for your application—and maybe they don’t. After all, is the ability to write a compelling poem about a lost Highland Cow named Bo actually useful in your context? Or perhaps you’re not working on anything that remotely resembles a chatbot. Do LLMs still make sense?

The important thing to know about ‘Generative AI’ is that it’s not just about generating creative content like poems or chat responses. Generative AI (LLMs) can be used to solve a bevy of other problems that roughly fall into three categories:

  1. Making decisions (classification)
  2. Transforming data
  3. Extracting information

Let’s use the example of an inbound email from a customer to your business. How might we use LLMs to streamline that experience?

  • Making Decisions
    • Is this email relevant to the business?
    • Is this email low, medium or high priority?
    • Does this email contain inappropriate content?
    • What person or department should this email be routed to?
  • Transforming data
    • Summarize the email for human handoff or record keeping
    • Redact offensive language from the email subject and body
  • Extracting information
    • Extract information such as a phone number, business name, job title, etc., from the email body to be used by other systems
  • Generating Responses
    • Generate a personalized, contextually-aware auto-response informing the customer that help is on the way
    • Alternatively, deploy a more sophisticated LLM flow (likely involving RAG) to directly address the customer’s need

It’s easy to see how solving these tasks would increase user satisfaction while also improving operational efficiency. All of these use cases are utilizing ‘Generative AI’, but some feel more generative than others.

When we consider decision making, data transformation and information extraction in addition to the more stereotypical generative AI use cases, it becomes harder to imagine a system that wouldn’t benefit from an LLM integration. Why? Because nearly all systems have some amount of human-generated ‘natural’ data (like text) that is no longer opaque in the age of LLMs.

Prior to LLMs, it was possible to solve most of the tasks listed above. But, it was exponentially harder. Let’s consider ‘is this email relevant to the business’. What would it have taken to solve this before LLMs?

  • A dataset of example emails labeled true if they’re relevant to the business and false if not (the bigger the better)
  • A training pipeline to produce a custom machine learning model for this task
  • Specialized hardware or cloud resources for training & inferencing
  • Data scientists, data curators, and Ops people to make it all happen

LLMs can solve many of these problems with radically lower effort and complexity, and they will often do a better job. With traditional machine learning models, your model is, at best, as good as the data you give it. With generative AI you can coach and refine the LLM’s behavior until it matches what you desire – regardless of historical data.
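With an LLM, the same relevance check reduces to a single prompt. A sketch of the idea; `call_llm` is a stand-in for whatever LLM client you use, injected here so the logic stays testable:

```python
def build_relevance_prompt(email_body):
    return (
        "You triage inbound email for a business.\n"
        "Answer with exactly one word, 'relevant' or 'irrelevant':\n\n"
        f"{email_body}"
    )

def is_relevant(email_body, call_llm):
    """`call_llm` is any callable prompt -> completion string
    (a stand-in for your actual LLM client)."""
    return call_llm(build_relevance_prompt(email_body)).strip().lower() == "relevant"
```

No labeled dataset, training pipeline, or specialized hardware required; refining the behavior is a matter of editing the prompt.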

For these reasons LLMs are being deployed everywhere—and consumers’ expectations continue to rise.

How Do You Feel About LLM Vendor Lock-In?

Once you’ve decided to pursue an LLM integration, the first issue to consider is whether you’re comfortable with vendor lock-in. The LLM market is moving at lightspeed with the constant release of new models featuring new capabilities like function calls, multimodal prompting, and of course increased intelligence at higher speeds. Simultaneously, costs are plummeting. For this reason, it’s likely that your preferred LLM vendor today may not be your preferred vendor tomorrow.

Even at a fixed point in time, you may need more than a single LLM vendor.

In our recent experience, there are certain classification problems that Anthropic’s Claude does a better job of handling than comparable models from OpenAI. Similarly, we often prefer OpenAI models for truly generative tasks like generating responses. All of these LLM tasks might be in support of the same integration so you may want to look at the project not so much as integrating a single LLM or vendor, but rather a suite of tools.

If your use case is simple and low volume, a single vendor is probably fine. But if you plan to do anything moderately complex or high scale you should plan on integrating multiple LLM vendors to have access to the right models at the best price.

Resiliency & Scalability are Earned—Not Given

Making API calls to an LLM is trivial. Ensuring that your LLM integration is resilient and scalable requires more elbow grease. In fact, LLM API integrations pose unique challenges:

  • Challenge: They are pretty slow. Solution: If your application is high-scale and you’re doing synchronous (threaded) network calls, it won’t scale well since most threads will be blocked on LLM calls. Consider switching to async I/O. You’ll also want to support running multiple prompts in parallel to reduce visible latency to the user.
  • Challenge: They are throttled by requests per minute and tokens per minute. Solution: Estimate your LLM usage in terms of requests and tokens per minute, and work with your provider(s) to ensure sufficient bandwidth for peak load.
  • Challenge: They are (still) somewhat flaky, with unpredictable response times and unresponsive connections. Solution: Employ retry schemes in response to timeouts, 500s, 429s (rate limits), and so on.
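A retry wrapper with exponential backoff and jitter is the standard remedy for the flakiness above. A minimal sketch, assuming your vendor SDK’s errors are mapped to Python exceptions such as `TimeoutError` or `ConnectionError`:

```python
import random
import time


def call_with_retries(call, max_attempts=5, base_delay=0.5):
    """Retry a flaky LLM call (timeouts, 429s, 500s) with exponential
    backoff plus jitter. `call` is any zero-argument callable."""
    for attempt in range(max_attempts):
        try:
            return call()
        except (TimeoutError, ConnectionError):  # map vendor errors here
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the failure
            # Sleep 0.5s, 1s, 2s, ... plus jitter to avoid thundering herds.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```

For 429s specifically, many providers return a `Retry-After` header you can honor instead of the computed backoff.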

The above remediations will help your application be scalable and resilient while your LLM service is up. But what if it’s down? If your LLM integration is on a critical execution path you’ll want to support automatic failover. Some LLMs are available from multiple providers:

  • OpenAI models are hosted by OpenAI itself as well as Azure
  • Anthropic models are hosted by Anthropic itself as well as AWS

Even if an LLM has only a single provider (and even if it has several), you can provision the same logical LLM in multiple cloud regions to create a failover resource. Typically you’ll want provider failover built into your retry scheme. Our failover mechanisms get tripped regularly in production at Quiq, no doubt partly because of how rapidly the AI world is moving.
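Failover itself can be as simple as walking an ordered list of providers that serve the same logical model. A hedged sketch (the provider callables are placeholders for real per-provider clients, e.g. OpenAI-hosted then Azure-hosted):

```python
def complete_with_failover(prompt, providers):
    """Try the same logical model across an ordered list of providers,
    falling back to the next one on failure."""
    last_error = None
    for call in providers:
        try:
            return call(prompt)
        except Exception as exc:
            last_error = exc  # in production: log, then try the next provider
    raise RuntimeError("all providers failed") from last_error
```

In a real system you would combine this with the per-provider retry scheme, so transient errors are retried in place before the request fails over.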

Are You Actually Building an Agentic Workflow?

Oftentimes you have a task that you know is well-suited for an LLM. For example, let’s say you’re planning to use an LLM to analyze the sentiment of product reviews. On the surface, this seems like a simple task that will require one LLM call that passes in the product review and asks the LLM to decide the sentiment. Will a single prompt suffice? What if we also want to determine if a given review contains profanity or personal information? What if we want to ask three LLMs and average their results?

Many tasks require multiple prompts, prompt chaining and possibly RAG (Retrieval Augmented Generation) to best solve a problem. Just like humans, AI produces better results when a problem is broken down into pieces. Such solutions are variously known as AI Agents, Agentic Workflows or Agent Networks and are why open source tools like LangChain were originally developed.

In our experience, pretty much every prompt eventually grows up to be an Agentic Workflow, which has interesting implications for how it’s configured & monitored.
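To make the review example concrete, here is a minimal sketch of that fan-out pattern: the sentiment, profanity, and personal-information questions become separate prompts run in parallel, then combined. The `ask` function is a stub standing in for a real LLM call:

```python
import asyncio


async def ask(question: str, review: str) -> str:
    """Stand-in for a single LLM call; a real version would hit a vendor
    API asynchronously."""
    await asyncio.sleep(0)  # simulate network latency
    if question == "sentiment":
        return "positive" if "love" in review else "negative"
    return "no"  # stubbed profanity / personal-info checks


async def analyze_review(review: str) -> dict:
    # Fan out the independent sub-questions in parallel, then combine.
    sentiment, profanity, pii = await asyncio.gather(
        ask("sentiment", review),
        ask("profanity", review),
        ask("personal_info", review),
    )
    return {"sentiment": sentiment, "profanity": profanity, "personal_info": pii}
```

Each sub-question can later grow its own chain (or move to a different model) without touching the others, which is exactly how a single prompt quietly becomes an Agentic Workflow.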

Be Ready for the Snowball Effect

Introducing LLMs can result in a technological snowball effect, particularly if you need to use Retrieval Augmented Generation (RAG). LLMs are trained on mostly public data that was available at a fixed point in the past. If you want an LLM to behave in light of up-to-date and/or proprietary data sources (which most non-trivial applications do) you’ll need to do RAG.

RAG refers to retrieving the up-to-date and/or proprietary data you want the LLM to use in its decision making and passing it to the LLM as part of your prompt.
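The “passing it to the LLM as part of your prompt” step is just string assembly, but doing it carefully matters for grounding. A minimal sketch (the instruction wording is illustrative, not a prescribed template):

```python
def build_rag_prompt(question: str, retrieved_docs: list) -> str:
    """Assemble a grounded prompt: retrieved context first, then the
    question, with an instruction to answer only from that context."""
    context = "\n\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(retrieved_docs))
    return (
        "Answer using ONLY the context below. If the answer is not "
        "in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

Numbering the retrieved snippets also lets you ask the LLM to cite which snippet supports each claim, which helps with the verification step discussed below.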

Assuming you need to search a reference dataset like a knowledge base, product catalog or product manual, the retrieval part of RAG typically entails adding the following entities to your system:

1. An embedding model

An embedding model is roughly half of an LLM – it does a great job of reading and understanding information you pass it, but instead of generating a completion, it produces a numeric vector that encodes its understanding of the source material.

You’ll typically run the embeddings model on all of the business data you want to search and retrieve for the LLM. Most LLM providers also have embedding models, or you can hit one via any major cloud.
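The shape of the operation is simple: text in, fixed-size normalized vector out. The toy function below illustrates only that shape by hashing words into buckets; a real embedding model (e.g. a provider’s embeddings endpoint) captures semantics, which this deliberately does not:

```python
import hashlib
import math


def embed(text: str, dims: int = 8) -> list:
    """Toy stand-in for an embedding model: hash each word into one of
    `dims` buckets, then L2-normalize. Illustrates the text -> unit-vector
    shape only; it captures word overlap, not meaning."""
    vec = [0.0] * dims
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % dims
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]
```

Real models produce vectors with hundreds or thousands of dimensions, and you would batch-embed your whole corpus once, then embed each incoming query at request time.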

2. A vector database

Once you have embeddings for all of your business data, you need to store them somewhere that facilitates speedy search over numeric vectors. Solutions like Pinecone and Milvus fill this need, but that means integrating a new vendor or hosting a new database internally.
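What a vector database does, at its core, is nearest-neighbor search over those vectors. A brute-force sketch makes the operation concrete (assuming pre-normalized vectors, so the dot product equals cosine similarity); dedicated databases exist because this linear scan doesn’t scale to millions of documents:

```python
def cosine(a, b):
    # Vectors are assumed pre-normalized, so the dot product is the
    # cosine similarity.
    return sum(x * y for x, y in zip(a, b))


def top_k(query_vec, index, k=2):
    """Brute-force nearest-neighbor search over (doc_id, vector) pairs --
    the job a vector database does efficiently at scale."""
    scored = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]
```

The IDs returned here are what you would use to look up the original text snippets and splice them into the RAG prompt.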

After implementing embeddings and a vector search solution, you can now retrieve information to include in the prompts you send to your LLM(s). But how can you trust that the LLM’s response is grounded in the information you provided and not something based on stale information or purely made up?

There are specialized deep learning models that exist solely to verify that an LLM’s generative claims are grounded in the facts you provide. This practice is variously referred to as hallucination detection, claim verification, or natural language inference (NLI). We believe NLI models are an essential part of a trustworthy RAG pipeline, but managed cloud solutions are scarce and you may need to host one yourself on GPU-enabled hardware.

Is a Black Box Sustainable?

If you bake your LLM integration directly into your app, you will effectively end up with a black box that can only be understood and improved by engineers. This could make sense if you have a decent size software shop and they’re the only folks likely to monitor or maintain the integration.

However, your best software engineers may not be your best (or most willing) prompt engineers, and you may wish to involve other personas like product and experience designers since an LLM’s output is often part of your application’s presentation layer & brand.

For these reasons, prompts will quickly need to move from code to configuration – no big deal. However, as an LLM integration matures it will likely become an Agentic Workflow involving:

  • More prompts, prompt parallelization & chaining
  • More prompt engineering
  • RAG and other orchestration

Moving these concerns into configuration is significantly more complex but necessary on larger projects. In addition, people will inevitably want to observe and understand the behavior of the integration to some degree.

For this reason it might make sense to embrace a visual framework for developing Agentic Workflows from the get-go. By doing so you open up the project to collaboration from non-engineers while promoting observability into the integration. If you don’t go this route be prepared to continually build out configurability and observability tools on the side.

Quiq’s AI Automations Take Care of LLM Integration Headaches For You

Hopefully we’ve given you a sense for what it takes to build an enterprise LLM integration. Now it’s time for the plug. The considerations outlined above are exactly why we built AI Studio and particularly our AI Automations product.

With AI Automations you can create a serverless API that handles all the complexities of a fully orchestrated AI flow, including support for multiple LLMs, chaining, RAG, resiliency, observability, and more. With AI Automations, your LLM integration can go back to being ‘just an API call with basic auth’.

Want to learn more? Dive into AI Studio or reach out to our team.

Request A Demo

How a Leading Office Supply Retailer Answered 35% More Store Associate Questions with Generative AI

In an era where artificial intelligence is rapidly transforming various industries, the retail sector is no exception. One leading national office supply retailer has taken a bold step forward, harnessing the power of generative AI to revolutionize their in-store experience and empower their associates.

This innovative approach has not only enhanced customer satisfaction but has also led to remarkable improvements in employee efficiency. In fact, the company has experienced a 35% increase in containment rates (with a 6-month average containment rate of 65%) vs. its legacy solution.

We’re excited to share the details of this groundbreaking initiative. Keep reading as we examine the company’s vision, their strategic approach to implementation, and the key objectives that drove their AI adoption. We’ll also discuss their GenAI assistant’s primary capabilities and how it’s improving both customer experiences and employee satisfaction. By the end, you’ll see how much potential lies in applying this use case to additional employees—not just in-store associates—as well as customers. There’s so much to unlock. Ready? Let’s dive in.

The Vision: Empowering Associates with GenAI

This company is dedicated to helping businesses of all sizes become more productive, connected, and inspired. Their team recognized the immense potential of GenAI early on. The vision? To create a GenAI-powered assistant that could enhance the capabilities of their store associates, leading to improved customer service, increased productivity, and higher job satisfaction.

Key objectives of the GenAI initiative:

  • Simplify store associate experience
  • Streamline access to information for associates
  • Improve customer service efficiency
  • Boost associate confidence and job satisfaction
  • Increase overall store associate productivity

Charting the Course to Building a GenAI-Powered Assistant

By partnering with Quiq, the national office supply retailer launched its employee-facing GenAI assistant in just 6 weeks. Here’s what the launch process looked like in 9 primary steps:

  1. Discover AI enhancement opportunities
  2. Pull content from current systems
  3. Run a proof of concept with the Quiq team
  4. Run testing through all categories of content
  5. Get approval to pilot with the top associate group
  6. Refine content based on associate feedback ahead of the chain rollout
  7. Run additional testing through all categories
  8. Start chain deployment to a larger district of stores
  9. Maintain content accuracy and refine based on updates

Examining the Office Supplier’s Phased Approach to Adoption

Pre-launch, the teams worked together to ensure all content was updated and accurate. Then they launched a phased testing approach, going through several rounds of iterative testing. Next, the retailer shared the GenAI assistant with a top internal associate team to test it and try to break it. Finally, the internal team enlisted a top associate group to build excitement before launch.

At launch, the office supplier created a standalone page dedicated to the assistant and launched a SharePoint site to share updates with the internal team. They also facilitated internal learning sessions and adapted quickly when feedback volume was low. Last but not least, the team made it fun by branding the assistant with a fun, on-brand name and personality.

Post-launch, the retailer includes the AI assistant in all communications to associates, with tips on what to search for in the assistant. They also leverage the assistant’s proactive messaging capabilities to build excitement for new launches, promotions, and best practices.

Primary Capabilities and Focus

Launching the GenAI assistant has been transformative because it is trained on all things related to the office supply retailer, which has simplified and accelerated access to information. That means associates can help customers faster, answering questions accurately the first time and every time, regardless of tenure. Ultimately, AI is empowering associates to do even better work—including enhanced cross-selling and upselling with proactive messages.

Proactive messaging to associates helps keep rotating sales goals top of mind so they can weave additional revenue opportunities into customer interactions. For example, if the design services team has unexpected bandwidth, the AI assistant can send a message letting associates know, inspiring them to highlight design and print services to customers who may be interested. It also provides a fun countdown to important launches, like back-to-school season, and “fun facts” that help build up useful knowledge over time. It’s like bite-size bits of training.

GenAI Transforms the In-Store Experience in 4 Critical Ways

Implementing the GenAI assistant has had a profound impact on in-store operations. By providing associates with instant access to accurate information, it has:

  1. Enhanced Customer Service: Associates can now provide faster, more accurate responses to customer questions.
  2. Increased Efficiency: The time it takes to find information has been significantly reduced, allowing associates to serve more customers.
  3. Boosted Confidence: With a reliable AI assistant at their fingertips, associates feel more empowered in their roles. Plus, new associates can be as effective as experienced ones with the assistant by their side.
  4. Improved Job Satisfaction: The reduced stress of information retrieval has led to higher job satisfaction among associates. Not to mention, the GenAI assistant is there to converse and empathize with employees who experience stressful situations with customers.

Results + What’s Next?

As a result of launching its GenAI assistant with Quiq, our national office supply retailer customer has realized a:

  • 68% self-service resolution rate, allowing associates to get immediate answers to their questions 2 out of 3 times
  • 4.82 out of 5 associate satisfaction rating with the AI assistant

And as for next steps, the team is excited to:

  • Launch an assisted selling path
  • Expand to additional departments within stores
  • Add more devices in store for easier accessibility
  • Integrate with internal systems to be able to answer even more types of questions with real-time access to orders and other information

The Lesson: Humans and AI Can Work Together to Play Their Strongest Roles

The office supply retailer’s successful implementation of GenAI serves as a powerful example of how the technology can transform retail operations by helping human employees work more efficiently. By focusing on empowering associates with AI, the company has not only improved customer service but also enhanced employee satisfaction and productivity.

Interested in Diving Deeper into GenAI?

Download Two Truths and a Lie: Breaking Down the Major GenAI Misconceptions Holding CX Leaders Back. This comprehensive guide illuminates the path through the intricate landscape of generative AI in CX. We cut through the fog of misconceptions, offering crystal-clear, practical advice to empower your decision-making.