Parallel Tool Calls¶

This notebook demonstrates how a single LLM response can contain multiple tool call requests, and how to execute them all before returning the results in one continue_chat_with_tool_results() call.

We use the hailstone_step_func tool from Chapter 3 and ask the LLM to apply it to several numbers at once. The LLM batches the calls in a single response; we execute each one and send all results back together.

Chapter 3 concept: chat() returns an assistant ChatMessage whose tool_calls field can hold any number of requests. Collecting all results before calling continue_chat_with_tool_results() is the correct pattern, and it is exactly what LLMAgent automates in Chapter 4.

In [ ]:

# Uncomment the line below to install `llm-agents-from-scratch` from PyPI
# !pip install llm-agents-from-scratch

Running an Ollama service¶

To execute the code provided in this notebook, you'll need to have Ollama installed on your local machine and have its LLM hosting service running. To download Ollama, follow the instructions found on this page: https://ollama.com/download. After downloading and installing Ollama, you can start a service by opening a terminal and running the command ollama serve.

In [1]:

import os
import shutil
import subprocess
import time
import urllib.error
import urllib.request


def ensure_ollama(host="http://localhost:11434", timeout=15):
    """Start Ollama if not already running and wait until responsive."""

    def _up():
        try:
            urllib.request.urlopen(f"{host}/api/tags", timeout=1)
            return True
        except (urllib.error.URLError, ConnectionError, TimeoutError):
            return False

    if _up():
        print(f"✓ Ollama already running at {host}")
        return

    # Look on PATH first, then fall back to the Lightning persistent path
    # and the standard install locations
    ollama_path = shutil.which("ollama")
    if ollama_path is None:
        for candidate in [
            "/teamspace/studios/this_studio/.local/bin/ollama",
            "/usr/local/bin/ollama",
            "/usr/bin/ollama",
        ]:
            if os.path.exists(candidate):
                ollama_path = candidate
                break
    if ollama_path is None:
        raise RuntimeError(
            "Could not find the ollama binary. Install with: "
            "curl -fsSL https://ollama.com/install.sh | sh",
        )

    print(f"Starting Ollama server ({ollama_path})...")
    subprocess.Popen(
        [ollama_path, "serve"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )

    deadline = time.time() + timeout
    while time.time() < deadline:
        if _up():
            print(f"✓ Ollama up and running at {host}")
            return
        time.sleep(0.5)

    raise RuntimeError(f"Ollama did not start within {timeout}s")


ensure_ollama()

✓ Ollama already running at http://localhost:11434

Defining the Tool¶

hailstone_step_func performs a single step of the Collatz sequence: halve the number if it is even, otherwise apply 3x + 1.

In [2]:

from llm_agents_from_scratch.tools import SimpleFunctionTool

def hailstone_step_func(x: int) -> int:
    """Perform a single step of the Hailstone (Collatz) sequence."""
    if x % 2 == 0:
        return x // 2
    return 3 * x + 1

hailstone_tool = SimpleFunctionTool(func=hailstone_step_func)
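
As a quick sanity check outside the tool-calling machinery, the step function can be called directly on the three inputs used later in this notebook, and chained to produce a full hailstone sequence. The sketch below redefines the function so that it runs on its own. `hailstone_sequence` and its `max_steps` guard are hypothetical helpers introduced here for illustration; they are not part of llm-agents-from-scratch.

```python
def hailstone_step_func(x: int) -> int:
    """Perform a single step of the Hailstone (Collatz) sequence."""
    if x % 2 == 0:
        return x // 2
    return 3 * x + 1


def hailstone_sequence(start: int, max_steps: int = 1000) -> list[int]:
    """Hypothetical helper: apply the step repeatedly until reaching 1."""
    if start < 1:
        raise ValueError("start must be a positive integer")
    sequence = [start]
    # Stop at 1, or after max_steps applications as a safety guard.
    while sequence[-1] != 1 and len(sequence) <= max_steps:
        sequence.append(hailstone_step_func(sequence[-1]))
    return sequence


# One step for each input used in the rest of the notebook.
single_steps = {x: hailstone_step_func(x) for x in (10, 15, 27)}
print(single_steps)            # {10: 5, 15: 46, 27: 82}
print(hailstone_sequence(10))  # [10, 5, 16, 8, 4, 2, 1]
```

These single-step values are the ones the tool should report when the LLM requests it for 10, 15, and 27.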

Step 1 — Eliciting Parallel Tool Calls¶

We ask the LLM to apply the hailstone step to three numbers at once. A well-prompted model that supports tool calling will typically return all three tool call requests in a single assistant message rather than one at a time, although this is not guaranteed.

In [3]:

from llm_agents_from_scratch.llms.ollama import OllamaLLM

llm = OllamaLLM(model="qwen3:14b", think=False)

user_input = (
    "Apply the hailstone_step_func to each of the following numbers: "
    "10, 15, and 27. "
    "Call the tool once for each number."
)

user_msg, assistant_msg = await llm.chat(
    user_input,
    tools=[hailstone_tool],
)

print(f"Tool calls returned: {len(assistant_msg.tool_calls)}")
for tc in assistant_msg.tool_calls:
    print(f"  → {tc.tool_name}({tc.arguments})")

Tool calls returned: 3
  → hailstone_step_func({'x': 10})
  → hailstone_step_func({'x': 15})
  → hailstone_step_func({'x': 27})
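
Because batching depends on the model, it can be worth checking that every expected input was actually requested before moving on. The following is a minimal sketch of such a check. The `ToolCall` and `ChatMessage` dataclasses are simplified stand-ins that mirror only the `tool_name`, `arguments`, and `tool_calls` fields used above, and `missing_inputs` is a hypothetical helper. Whether the real `tool_calls` field is `None` or an empty list when no tool is requested is an assumption the helper sidesteps by handling both.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ToolCall:
    """Stand-in for the library's ToolCall (only the fields used above)."""

    tool_name: str
    arguments: dict[str, Any]


@dataclass
class ChatMessage:
    """Stand-in for the assistant ChatMessage returned by chat()."""

    content: str = ""
    tool_calls: Optional[list[ToolCall]] = None


def missing_inputs(msg: ChatMessage, expected: list[int]) -> list[int]:
    """Hypothetical helper: expected x values with no matching tool call."""
    requested = [tc.arguments.get("x") for tc in (msg.tool_calls or [])]
    return [x for x in expected if x not in requested]


complete = ChatMessage(
    tool_calls=[ToolCall("hailstone_step_func", {"x": x}) for x in (10, 15, 27)],
)
partial = ChatMessage(tool_calls=[ToolCall("hailstone_step_func", {"x": 10})])

print(missing_inputs(complete, [10, 15, 27]))  # []
print(missing_inputs(partial, [10, 15, 27]))   # [15, 27]
```

If the list is non-empty, one option is to send the partial results back and let the model request the remaining calls in a later turn.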

Step 2 — Executing All Tool Calls¶

We iterate over every ToolCall in the assistant message and execute each one, collecting the ToolCallResult objects.

In [4]:

tool_call_results = [hailstone_tool(tc) for tc in assistant_msg.tool_calls]

for tc, result in zip(
    assistant_msg.tool_calls,
    tool_call_results,
    strict=False,
):
    print(
        f"  hailstone_step_func(x={tc.arguments['x']!r}) → {result.content}",
    )

  hailstone_step_func(x=10) → 5
  hailstone_step_func(x=15) → 46
  hailstone_step_func(x=27) → 82
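
The list comprehension above runs the calls one after another. Since the requests in a batch are independent, they could also be executed concurrently, which matters mostly for slow, I/O-bound tools rather than a cheap arithmetic step like this one. Below is a minimal sketch using `asyncio.gather` and `asyncio.to_thread`. It uses a stand-in `ToolCall` dataclass and calls the plain function directly instead of the library's tool wrapper, so `run_tool_call` and `run_all` are hypothetical names, not library APIs.

```python
import asyncio
from dataclasses import dataclass
from typing import Any


@dataclass
class ToolCall:
    """Stand-in for the library's ToolCall (only the fields used above)."""

    tool_name: str
    arguments: dict[str, Any]


def hailstone_step_func(x: int) -> int:
    """Perform a single step of the Hailstone (Collatz) sequence."""
    if x % 2 == 0:
        return x // 2
    return 3 * x + 1


async def run_tool_call(tc: ToolCall) -> str:
    """Run one blocking tool call in a worker thread."""
    result = await asyncio.to_thread(hailstone_step_func, **tc.arguments)
    return str(result)


async def run_all(tool_calls: list[ToolCall]) -> list[str]:
    # gather() returns results in the order of its arguments,
    # so results[i] corresponds to tool_calls[i].
    return await asyncio.gather(*(run_tool_call(tc) for tc in tool_calls))


tool_calls = [ToolCall("hailstone_step_func", {"x": x}) for x in (10, 15, 27)]
results = asyncio.run(run_all(tool_calls))
print(results)  # ['5', '46', '82']
```

Inside a notebook cell, where an event loop is already running, `await run_all(tool_calls)` would replace the `asyncio.run(...)` call.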

Step 3 — Returning All Results in One Call¶

We pass the full list of ToolCallResult objects to continue_chat_with_tool_results() in a single batch. The LLM receives all three results at once and produces a final answer.

In [5]:

new_messages, final_response = await llm.continue_chat_with_tool_results(
    tool_call_results=tool_call_results,
    chat_history=[user_msg, assistant_msg],
    tools=[hailstone_tool],
)

print(final_response.content)

The results of applying the hailstone_step_func to each number are as follows:

- For 10, the result is 5.
- For 15, the result is 46.
- For 27, the result is 82.
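
Each tool message in the transcript printed at the end of this notebook carries a `tool_call_id`, which ties a result back to the request it answers. Before sending a batch, one could confirm that no request was left unanswered. This is a minimal sketch with stand-in dataclasses: the `tool_call_id` and `content` fields follow the transcript, while the `id_` field name on `ToolCall` and the helper `unanswered_call_ids` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    """Stand-in request; the id_ field name is an assumption."""

    id_: str
    tool_name: str


@dataclass
class ToolCallResult:
    """Stand-in result carrying the id of the request it answers."""

    tool_call_id: str
    content: str


def unanswered_call_ids(
    tool_calls: list[ToolCall],
    results: list[ToolCallResult],
) -> list[str]:
    """Hypothetical helper: ids of requests that have no result yet."""
    answered = {r.tool_call_id for r in results}
    return [tc.id_ for tc in tool_calls if tc.id_ not in answered]


tool_calls = [
    ToolCall("call-a", "hailstone_step_func"),
    ToolCall("call-b", "hailstone_step_func"),
    ToolCall("call-c", "hailstone_step_func"),
]
full_results = [
    ToolCallResult("call-a", "5"),
    ToolCallResult("call-b", "46"),
    ToolCallResult("call-c", "82"),
]
partial_results = [full_results[0], full_results[2]]

print(unanswered_call_ids(tool_calls, full_results))     # []
print(unanswered_call_ids(tool_calls, partial_results))  # ['call-b']
```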

Full Conversation at a Glance¶

Printing the complete message sequence shows the structure the LLMAgent manages automatically: user → parallel tool requests → tool results (one per call) → final answer.

In [6]:

all_messages = [user_msg, assistant_msg, *new_messages, final_response]

for msg in all_messages:
    role = msg.role.value
    if msg.tool_calls:
        calls = ", ".join(
            f"{tc.tool_name}({tc.arguments})" for tc in msg.tool_calls
        )
        print(f"[{role:10s}]  <tool calls> {calls}")
    else:
        preview = msg.content[:80].replace("\n", " ")
        print(f"[{role:10s}]  {preview}")

[user      ]  Apply the hailstone_step_func to each of the following numbers: 10, 15, and 27. 
[assistant ]  <tool calls> hailstone_step_func({'x': 10}), hailstone_step_func({'x': 15}), hailstone_step_func({'x': 27})
[tool      ]  {     "tool_call_id": "ff713bbf-fa95-4bd9-a10e-a0a451fc2682",     "content": "5"
[tool      ]  {     "tool_call_id": "9b132fa9-17b7-441f-932c-61f2a252075f",     "content": "46
[tool      ]  {     "tool_call_id": "33ad59da-a1a3-479d-9a81-733d99129fa9",     "content": "82
[assistant ]  The results of applying the hailstone_step_func to each number are as follows:
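
The message sequence above suggests the general shape of the loop an agent runs: ask the model, execute every requested tool call, append one tool message per call, and repeat until the model answers without requesting tools. The following is a rough sketch of that loop under stated assumptions, not LLMAgent's actual implementation. `ScriptedLLM` is a hypothetical stand-in that needs no Ollama server, and `Message`, `ToolCall`, `TOOLS`, and `run_loop` are simplified names introduced here for illustration.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolCall:
    tool_name: str
    arguments: dict[str, Any]


@dataclass
class Message:
    role: str
    content: str = ""
    tool_calls: list[ToolCall] = field(default_factory=list)


def hailstone_step_func(x: int) -> int:
    """Perform a single step of the Hailstone (Collatz) sequence."""
    if x % 2 == 0:
        return x // 2
    return 3 * x + 1


# Tools are looked up by name, so one batch may mix several tools.
TOOLS = {"hailstone_step_func": hailstone_step_func}


class ScriptedLLM:
    """Hypothetical stand-in: requests three tool calls, then answers."""

    def respond(self, history: list[Message]) -> Message:
        tool_msgs = [m for m in history if m.role == "tool"]
        if not tool_msgs:
            calls = [ToolCall("hailstone_step_func", {"x": x}) for x in (10, 15, 27)]
            return Message("assistant", tool_calls=calls)
        summary = ", ".join(m.content for m in tool_msgs)
        return Message("assistant", content=f"Results: {summary}")


def run_loop(llm: ScriptedLLM, user_input: str, max_rounds: int = 5) -> list[Message]:
    history = [Message("user", user_input)]
    for _ in range(max_rounds):
        reply = llm.respond(history)
        history.append(reply)
        if not reply.tool_calls:
            return history  # final answer reached
        # Execute every requested call before asking the model again.
        for tc in reply.tool_calls:
            output = TOOLS[tc.tool_name](**tc.arguments)
            history.append(Message("tool", str(output)))
    raise RuntimeError("No final answer within max_rounds")


history = run_loop(ScriptedLLM(), "Apply the hailstone step to 10, 15, and 27.")
print([m.role for m in history])
# ['user', 'assistant', 'tool', 'tool', 'tool', 'assistant']
print(history[-1].content)  # Results: 5, 46, 82
```

The `max_rounds` cap is a design choice for the sketch: it keeps a model that never stops requesting tools from looping forever.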