
What is LLM Function Calling and How Does it Work?


For all their amazing capabilities, LLMs have a fundamental weakness: they can’t actually do anything. They read a sequence of input tokens (the prompt) and produce a sequence of output tokens, one at a time, known as the completion. There are no side effects, just inputs and outputs. So something else, such as the application you’re building, has to take the LLM’s output and do something useful with it.

But how can we get an LLM to reliably generate output that conforms to our application’s requirements? Function calls, also known as tool calls or tool use, make it easier for your application to do something useful with an LLM’s output.

Note: LLM functions and tools generally refer to the same concept. ‘Tool’ is the term used by Anthropic/Claude, whereas OpenAI treats a function as a specific type of tool. For the purposes of this article, the terms are used interchangeably.

What Problem Does LLM Function Calling Solve?

To better understand the problem that function calls solve, let’s pretend we’re adding a new feature to an email client that lets the user provide shorthand instructions and uses an LLM to generate the email’s subject and body:

AI Email Generator

Our application might build up a prompt request like the following GPT-4o-Mini example. Note how we ask the LLM to return a specific format expected by our application:

import os
import requests

key = os.environ["OPENAI_API_KEY"]  # your OpenAI API key

user = "Kyle McIntyre"
recipient = "Aunt Suzie (suzieq@mailinator.com)"
user_input = "Tell her I can't make it this Sunday, taking dog to vet. Ask how things are going, keep it folksy yet respectful."

prompt = f"""
Draft an email on behalf of the user, {user}, to {recipient}.

Here are the user's instructions: {user_input}

Respond with only a JSON object containing two string fields: "subject" and "body".
"""

request = {
  "model": "gpt-4o-mini-2024-07-18",
  "messages": [
    {
      "role": "user",
      "content": prompt
    }
  ]
}

response = requests.post(
  'https://api.openai.com/v1/chat/completions',
  headers={'Authorization': f'Bearer {key}'},
  json=request
)

Assume our application sends this prompt and receives a completion back. What do we know about the completion? In a word: nothing.

Although LLMs do their best to follow our instructions, there’s no guarantee that the output will adhere to our requested format. The subject and body could be missing, incorrectly capitalized, or of the wrong type, and additional properties we didn’t ask for might also be included. Prior to the advent of function calls, our only options at this point were to:

  • Continually tweak our prompts in an effort to get more reliable outputs
  • Write very tolerant deserialization and coercion logic in our app to force the LLM’s output to adhere to our expectations
  • Retry the prompt multiple times until we receive legal output

Function calls, and a related model feature known as “structured outputs”, make all of this much easier and more reliable.

Function Calls to the Rescue

Let’s code up the same example using a function call. In order to get an LLM to ‘use’ a tool, you must first define it. Typically this involves giving it a name and then defining the schema of the function’s arguments.

In the example below, we define a tool named “draft_email” that takes two required arguments, body and subject, both of which are strings:

user = "Kyle McIntyre"
recipient = "Aunt Suzie (suzieq@mailinator.com)"
user_input = "Tell her I can't make it this Sunday, taking dog to vet. Ask how things are going, keep it folksy yet respectful."

prompt = f"""
Use the available function to draft an email on behalf of the user, {user}, to {recipient}.

Here are the user's instructions: {user_input}
"""

tool = {
  "type": "function",
  "function": {
    "name": "draft_email",
    "description": "Draft an email on behalf of the user",
    "parameters": {
      "type": "object",
      "properties": {
        "subject": {
          "type": "string",
          "description": "The email subject"
        },
        "body": {
          "type": "string",
          "description": "The email body"
        }
      },
      "required": ["subject", "body"]
    }
  }
}

request = {
  "model": "gpt-4o-mini-2024-07-18",
  "messages": [
    {
      "role": "user",
      "content": prompt
    }
  ],
  "tools": [tool]
}

response = requests.post(
  'https://api.openai.com/v1/chat/completions',
  headers={'Authorization': f'Bearer {key}'},
  json=request
)

Defining the tool required some extra work on our part, but it also simplified our prompt. We’re no longer trying to describe the shape of our expected output and instead just say “use the available function”. More importantly, we can now trust that the LLM’s output will actually adhere to our specified schema!

Let’s look at the response message we received from GPT-4o-Mini:

{
  "role": "assistant",
  "content": null,
  "tool_calls": [
    {
      "type": "function",
      "function": {
        "name": "draft_email",
        "arguments": "{\"subject\":\"Regrets for This Sunday\",\"body\":\"Hi Aunt Suzie,\\n\\nI hope this email finds you well! I wanted to let you know that I can't make it this Sunday, as I need to take the dog to the vet. \\n\\nHow have things been going with you? I always love hearing about what\u2019s new in your life.\\n\\nTake care and talk to you soon!\\n\\nBest,\\nKyle McIntyre\"}"
      }
    }
  ],
  "refusal": null
}

What we received back is really a request from the LLM to ‘call’ our function. Our application still needs to honor the function call somehow.

But now, rather than having to treat the LLM’s output as an opaque string, we can trust that the arguments adhere to our application’s requirements. The ability to define a contract and trust that the LLM’s outputs will adhere to it makes function calls an invaluable tool when integrating an LLM into an application.
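
For example, honoring the draft_email call in our email client might look something like the sketch below. The populate_compose_window helper is hypothetical, standing in for whatever your application does with a finished draft:

import json

message = response.json()["choices"][0]["message"]

for call in message.get("tool_calls") or []:
    if call["function"]["name"] == "draft_email":
        args = json.loads(call["function"]["arguments"])
        # Both keys are marked as required in the schema, so we can rely on them being present.
        populate_compose_window(subject=args["subject"], body=args["body"])
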
How Does Function Calling Work?

As we saw in the last section, in order to get an LLM to generate reliable outputs we have to define a function or tool for it to use. Specifically, we’re defining a schema that the output needs to adhere to. Function calls and tools work a bit differently across LLM vendors, but they all require the declaration of a schema, and most are based on the open JSON Schema standard.

So, how does an LLM ensure that its outputs adhere to the tool schema? How can stochastic token-by-token output generation be reconciled with strict adherence to a data schema?

The solution is quite elegant: LLMs still generate their outputs one token at a time when calling a function, but the model is only allowed to choose from the subset of tokens that would keep the output in compliance with the schema. This is done through dynamic token masking based on the schema’s definition. In this way the output is still generative and very intelligent, but guaranteed to adhere to the schema.
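
To make the idea concrete, here is a toy sketch of token masking (not any particular vendor’s implementation). The tiny vocabulary and allowed-token set are contrived: after the model has emitted '{' and '"subject"', the schema only permits ':' next.

import math

def mask_logits(logits, allowed_token_ids):
    # Disallow any token that would take the output out of schema compliance.
    return [
        logit if i in allowed_token_ids else -math.inf
        for i, logit in enumerate(logits)
    ]

# Toy vocabulary and raw scores from the model for the next token.
vocab = ['{', '}', '"subject"', '"body"', ':', ',', '" Hello Aunt Suzie"']
logits = [0.1, 2.3, 0.5, 1.8, 0.2, 0.9, 1.1]

allowed = {vocab.index(':')}  # the schema only allows ':' at this point
masked = mask_logits(logits, allowed)
next_token = vocab[masked.index(max(masked))]
print(next_token)  # ':'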

Function Calling Misnomers and Misconceptions

The name ‘function call’ is somewhat misleading because it sounds like the LLM is going to actually do something on your behalf (and thereby cause side effects). But it doesn’t. When the LLM decides to ‘call’ a function, that just means that it’s going to generate output that represents a request to call that function. It’s still the responsibility of your application to handle that request and do something with it—but now you can trust the shape of the payload.

For this reason, an LLM function doesn’t need to map directly to any real function or method in your application, or to any real API. Instead, LLM functions can (and probably should) be defined more conceptually, from the perspective of the LLM.

Use in Agentic Workflows

So, are function calls only useful for constraining output? While that is certainly their primary purpose, they can also be quite useful in building agentic workflows. Rather than presenting a model with a single tool definition, you can instead present it with multiple tools and ask the LLM to use the tools at its disposal to help solve a problem.

For example, you might provide the LLM with the following tools in a CX context:

  • escalate() – Escalate the conversation to a human agent for further review
  • search(query) – Search a knowledgebase for helpful information
  • emailTranscript() – Email the customer a transcript of the conversation

When using function calls in an agentic workflow, the application typically interprets the function call and somehow uses it to update the information passed to the LLM in the next turn.
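
A simplified dispatch loop for these tools, using the OpenAI-style message format from earlier, might look like the sketch below. The handler bodies are placeholders; in a real application each would call into your contact center or knowledge base systems:

import json

def escalate(args):
    return "Conversation escalated to a human agent."

def search(args):
    return f"Top knowledge base results for '{args['query']}' ..."

def email_transcript(args):
    return "Transcript emailed to the customer."

HANDLERS = {
    "escalate": escalate,
    "search": search,
    "emailTranscript": email_transcript,
}

def handle_tool_calls(message, messages):
    # Run each requested tool and append its result so the LLM can see it on the next turn.
    for call in message.get("tool_calls") or []:
        name = call["function"]["name"]
        args = json.loads(call["function"]["arguments"])
        result = HANDLERS[name](args)
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": result,
        })
    return messages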

It’s also worth noting that conversational LLMs can call functions and generate output messages intended for the user at the same time. If you were building an AI DJ, the LLM might call a function like play_track("Traveler", "Chris Stapleton") while simultaneously saying to the user: “I’m spinning up one of your favorite country tunes now.”
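
In OpenAI’s message format, such a turn might look something like this (the parameter names and wording are illustrative):

{
  "role": "assistant",
  "content": "I'm spinning up one of your favorite country tunes now.",
  "tool_calls": [
    {
      "type": "function",
      "function": {
        "name": "play_track",
        "arguments": "{\"title\":\"Traveler\",\"artist\":\"Chris Stapleton\"}"
      }
    }
  ]
}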

Function Calling in Quiq’s AI Studio

Function calling is fully supported in Quiq’s AI Studio on capable LLMs. However, AI Studio goes further than basic function call support in three key ways:

  1. The expected output shape of any prompt (the completion schema) can be visually configured in the Prompt Editor
  2. Prompt outputs aren’t just used for transient function calls but become attached to the visual flow state for inspection later in the same prompt chain or conversation
  3. Completion schemas can be configured on LLMs – even those that don’t support function calls

If you’re interested in learning more about AI Studio, please request a trial.
