It’s long been clear that advances in artificial intelligence change how businesses operate. Whether it’s extremely accurate machine translation, chatbots that automate customer service tasks, or spot-on recommendations for music and shows, enterprises have been using advanced AI systems to better serve their customers and boost their bottom line for years.
Today the big news is generative AI, with large language models (LLMs) in particular capturing the imagination. As we’d expect, businesses in many different industries are enthusiastically looking at incorporating these tools into their workflows, just as prior generations did for the internet, computers, and fax machines.
But this alacrity must be balanced with a clear understanding of the tradeoffs involved. It’s one thing to have a language model answer simple questions, and quite another to have one engaging in open-ended interactions with customers involving little direct human oversight.
If you have an LLM-powered application and it goes off the rails, it could be mildly funny, or it could do serious damage to your brand persona. You need to think through both possibilities before proceeding.
This piece is intended as a primer on effectively using LLMs for the enterprise. If you’re considering integrating LLMs for specific applications and aren’t sure how to weigh the pros and cons, it will provide invaluable advice on the different options available while furnishing the context you need to decide which is the best fit for you.
How Are LLMs Being Used in Business?
LLMs like GPT-4 are truly remarkable artifacts. They’re essentially gigantic neural networks with billions of internal parameters, trained on vast amounts of text data from books and the internet.
Once they’re ready to go, they can be used to ask and answer questions, suggest experiments or research ideas, write code, write blog posts, and perform many other tasks.
Their flexibility, in fact, has come as quite a surprise, which is why they’re showing up in so many places. Before we talk about specific strategies for integrating LLMs into your enterprise, let’s walk through a few business use cases for the technology.
Generating (or rewriting) text
The obvious use case is generating text. GPT-4 and related technologies are very good at writing generic blog posts, copy, and emails. But they’ve also proven useful in more subtle tasks, like producing technical documentation or explaining how pieces of code work.
Sometimes it makes sense to pass this entire job on to LLMs, but in other cases, they can act more like research assistants, generating ideas or taking human-generated bullet points and expanding on them. It really depends on the specifics of what you’re trying to accomplish.
A subcategory of text generation is using an LLM as a conversational AI agent. Clients or other interested parties may have questions about your product, for instance, and many of them can be answered by a properly fine-tuned LLM instead of by a human. This is a use case where you need to think carefully about protecting your brand persona because LLMs are flexible enough to generate inappropriate responses to questions. You should extensively test any models meant to interact with customers and be sure your tests include belligerent or aggressive language to verify that the model continues to be polite.
Another place that LLMs have excelled is in summarizing already-existing text. This, too, is something that once would’ve been handled by a human, but can now be scaled up to the greater speed and flexibility of LLMs. People are using LLMs to summarize everything from basic articles on the internet to dense scientific and legal documents (though it’s worth being careful here, as they’re known to sometimes include inaccurate information in these summaries.)
Though it might still be a while before ChatGPT is able to replace Google, it has become more common to simply ask it for help rather than search for the answer online. Programmers, for example, can copy and paste the error messages produced by their malfunctioning code into ChatGPT to get its advice on how to proceed. The same considerations around protecting brand safety that we mentioned in the ‘conversational AI’ section above apply here as well.
One way to get a handle on a huge amount of data is to use a classification algorithm to sort it into categories. Once you know a data point belongs in a particular bucket you already know a fair bit about it, which can cut down on the amount of time you need to spend on analysis. Classifying documents, tweets, etc. is something LLMs can help with, though at this point a fair bit of technical work is required to get models like GPT-3 to reliably and accurately handle classification tasks.
Sentiment analysis refers to a kind of machine learning in which the overall tone of a piece of text is identified (i.e. is it happy, sarcastic, excited, etc.) It’s not exactly the same thing as classification, but it’s related. Sentiment analysis shows up in many customer-facing applications because you need to know how people are responding to your new brand persona or how they like an update to your core offering, and this is something LLMs have proven useful for.
What Are the Advantages of Using LLMs in Business?
More and more businesses are investigating LLMs for their specific applications because they confer many advantages to those that know how to use them.
For one thing, LLMs are extremely well-suited for certain domains. Though they’re still prone to hallucinations and other problems, LLMs can generate high-quality blog posts, emails, and general copy. At present, the output is usually still not as good as what a skilled human can produce.
But LLMs can generate text so quickly that it often makes sense to have the first draft created by a model and tweaked by a human, or to have relatively low-effort tasks (like generating headlines for social media) delegated to a machine so a human writer can focus on more valuable endeavors.
For another, LLMs are highly flexible. It’s relatively straightforward to take a baseline LLM like GPT-4 and feed it examples of behavior you want to see, such as generating math proofs in the form of poetry (if you’re into that sort of thing.) This can be done with prompt engineering or with a more sophisticated pipeline involving the model’s API, but in either case, you have the option of effectively pointing these general-purpose tools at specific tasks.
None of this is to suggest that LLMs are always and everywhere the right tool for the job. Still, in many domains, it makes sense to examine using LLMs for the enterprise.
What Are the Disadvantages of Using LLMs in Business?
For all their power, flexibility, and jaw-dropping speed, there are nevertheless drawbacks to using LLMs.
One disadvantage of using LLMs in business that people are already familiar with is the variable quality of their output. Sometimes, the text generated by an LLM is almost breathtakingly good. But LLMs can also be biased and inaccurate, and their hallucinations – which may not be a big deal for SEO blog posts – will be a huge liability if they end up damaging your brand.
Exacerbating this problem is the fact that no matter how right or wrong GPT-4 is, it’ll format its response in flawless, confident prose. You might expect a human being who doesn’t understand medicine very well to misspell a specialized word like “Umeclidinium bromide”, and that would offer you a clue that there might be other inaccuracies. But that essentially never happens with an LLM, so special diligence must be exercised in fact-checking their claims.
There can also be substantial operational costs associated with training and using LLMs. If you put together a team to build your own internal LLM you should expect to spend (at least) hundreds of thousands of dollars getting it up and running, to say nothing of the ongoing costs of maintenance.
Of course, you could also build your applications around API calls to external parties like OpenAI, who offer their models’ inferences as an endpoint. This is vastly cheaper, but it comes with downsides of its own. Using this approach means being beholden to another entity, which may release updates that dramatically change the performance of their models and materially impact your business.
Perhaps the biggest underlying disadvantage to using LLMs, however, is their sheer inscrutability. True, it’s not that hard to understand at a high level how models like GPT-4 are trained. But the fact remains that no one really understands what’s happening inside of them. It’s usually not clear why tiny changes to a prompt can result in such wildly different outputs, for example, or why a prompt will work well for a while before performance suddenly starts to decline.
Perhaps you just got unlucky – these models are stochastic, after all – or perhaps OpenAI changed the base model. You might not be able to tell, and either way, it’s hard to build robust, long-range applications around technologies that are difficult to understand and predict.
How Can LLMs Be Integrated Into Enterprise Applications?
If you’ve decided you want to integrate these groundbreaking technologies into your own platforms, there are two basic ways you can proceed. Either you can use a 3rd-party service through an API, or you can try to run your own models instead.
In the following two sections, we’ll cover each of these options and their respective tradeoffs.
Using an LLM through an API
An obvious way of leveraging the power of LLMs is by simply including API calls to a platform that specializes in them, such as OpenAI. Generally, this will involve creating infrastructure that is able to pass a prompt to an LLM and return its output.
If you’re building a user-facing chatbot through this method, that would mean that whenever the user types a question, their question is sent to the model and its response is sent back to the user.
The advantages of this approach are that they offer an extremely low barrier to entry, low costs, and fast response times. Hitting an API is pretty trivial as engineering tasks go, and though you’re charged per token, the bill will surely be less than it would be to stand up an entire machine-learning team to build your own model.
But, of course, the danger is that you’re relying on someone else to deliver crucial functionality. If OpenAI changes its terms of service or simply goes bankrupt, you could find yourself in a very bad spot.
Another disadvantage is that the company running the model may have access to the data you’re passing to its models. A team at Samsung recently made headlines when it was discovered they’d been plowing sensitive meeting notes and proprietary source code directly into ChatGPT, where both were viewable by OpenAI. You should always be careful about the data you’re exposing, particularly if it’s customer data whose privacy you’ve been entrusted to protect.
Running Your Own Model
The way to ameliorate the problems of accessing an LLM through an API is to either roll your own or run an open-source model in an environment that you control.
Building the kind of model that can compete with GPT-4 is really, really difficult, and it simply won’t be an option for any but the most elite engineering teams.
Using an open-source LLM, however, is a much more viable option. There are now many such models for text or code generation, and they can be fine-tuned for the specifics of your use case.
By and large, open-source models tend to be smaller and less performant than their closed-source cousins, so you’ll have to decide whether they’re good enough for you. And you should absolutely not underestimate the complexity of maintaining an open-sourced LLM. Though it’s nowhere near as hard as training one from scratch, maintaining an advanced piece of AI software is far from a trivial task.
All that having been said, this is one path you can take if you have the right applications in mind and the technical skills to pull it off.
How to Protect Brand Safety While Building Your Brand Persona
Throughout this piece, we’ve made mention of various ways in which LLMs can help supercharge your business while also warning of the potential damage a bad LLM response can do to your brand.
At present, there is no general-purpose way of making sure an LLM only does good things while never doing bad things. They can be startlingly creative, and with that power comes the possibility that they’ll be creative in ways you’d rather them not be (same as children, we suppose.)
Still, it is possible to put together an extensive testing suite that substantially reduces the possibility of a damaging incident. You need to feed the model many different kinds of interactions, including ones that are angry, annoyed, sarcastic, poorly spelled or formatted, etc., to see how it behaves.
What’s more, this testing needs to be ongoing. It’s not enough to run a test suite one weekend and declare the model fit for use, it needs to be periodically re-tested to ensure no bad behavior has emerged.
With these techniques, you should be able to build a persona as a company on the cutting edge while protecting yourself from incidents that damage your brand.
What Is the Future of LLMs and AI?
The business world moves fast, and if you’re not keeping up with the latest advances you run the risk of being left behind. At present, large language models like GPT-4 are setting the world ablaze with discussions of their potential to completely transform fields like customer experience chatbots.
If you want in on the action and you have the in-house engineering expertise, you could try to create your own offering. But if you would rather leverage the power of LLMs for chat-based applications by working with a world-class team that’s already done the hard engineering work, reach out to Quiq to schedule a demo.