Parallel Tool Calls
This notebook demonstrates how a single LLM response can contain
multiple tool call requests, and how to execute them all before
returning the results in one continue_chat_with_tool_results() call.
We use the hailstone_step_func tool from Chapter 3 and ask the LLM
to apply it to several numbers at once. The LLM batches the calls
in a single response; we execute each one and send all results back
together.
Chapter 3 concept:
chat() returns an assistant ChatMessage whose tool_calls field can hold any number of requests. Collecting all results before calling continue_chat_with_tool_results() is the correct pattern, and it is exactly what LLMAgent automates in ch04.
# Uncomment the line below to install `llm-agents-from-scratch` from PyPI
# !pip install llm-agents-from-scratch
Running an Ollama service
To execute the code provided in this notebook, you'll need to have Ollama
installed on your local machine and have its LLM hosting service running.
To download Ollama, follow the instructions found on this page:
https://ollama.com/download. After downloading and installing Ollama, you
can start a service by opening a terminal and running the command
ollama serve.
import os
import shutil
import subprocess
import time
import urllib.error
import urllib.request
def ensure_ollama(host="http://localhost:11434", timeout=15):
    """Start Ollama if not already running and wait until responsive."""

    def _up():
        # Probe the tags endpoint; any connection failure means "not up".
        try:
            with urllib.request.urlopen(f"{host}/api/tags", timeout=1):
                return True
        except (urllib.error.URLError, ConnectionError, TimeoutError):
            return False

    if _up():
        print(f"✓ Ollama already running at {host}")
        return

    # Look on PATH first, then the Lightning persistent path and standard locations
    ollama_path = shutil.which("ollama")
    if ollama_path is None:
        for candidate in [
            "/teamspace/studios/this_studio/.local/bin/ollama",
            "/usr/local/bin/ollama",
            "/usr/bin/ollama",
        ]:
            if os.path.exists(candidate):
                ollama_path = candidate
                break
    if ollama_path is None:
        raise RuntimeError(
            "Could not find the ollama binary. Install with: "
            "curl -fsSL https://ollama.com/install.sh | sh",
        )

    print(f"Starting Ollama server ({ollama_path})...")
    subprocess.Popen(
        [ollama_path, "serve"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )

    # Poll until the server answers or the timeout expires.
    deadline = time.time() + timeout
    while time.time() < deadline:
        if _up():
            print(f"✓ Ollama up and running at {host}")
            return
        time.sleep(0.5)
    raise RuntimeError(f"Ollama did not start within {timeout}s")
ensure_ollama()
✓ Ollama already running at http://localhost:11434
Defining the Tool
hailstone_step_func performs a single step of the
Collatz sequence:
halve the number if it is even, otherwise apply 3x + 1.
from llm_agents_from_scratch.tools import SimpleFunctionTool
def hailstone_step_func(x: int) -> int:
    """Perform a single step of the Hailstone (Collatz) sequence."""
    if x % 2 == 0:
        return x // 2
    return 3 * x + 1
hailstone_tool = SimpleFunctionTool(func=hailstone_step_func)
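To make the rule concrete, it can help to iterate the step by hand. The sketch below is plain Python with no library dependency: it repeats the step function so the snippet runs on its own, and adds a hypothetical helper, `hailstone_trajectory` (not part of the notebook or the library), that lists the values visited from a starting number until 1 is reached.

```python
def hailstone_step_func(x: int) -> int:
    """Same rule as the tool function above, repeated so this runs standalone."""
    if x % 2 == 0:
        return x // 2
    return 3 * x + 1


def hailstone_trajectory(start: int, max_steps: int = 1000) -> list[int]:
    """Hypothetical helper: apply the step repeatedly until the value is 1."""
    if start < 1:
        raise ValueError("start must be a positive integer")
    values = [start]
    # max_steps is a safety cap so the loop always terminates.
    while values[-1] != 1 and len(values) <= max_steps:
        values.append(hailstone_step_func(values[-1]))
    return values


print(hailstone_trajectory(10))  # [10, 5, 16, 8, 4, 2, 1]
print(hailstone_step_func(15), hailstone_step_func(27))  # 46 82
```

The single-step values for 10, 15, and 27 (5, 46, and 82) are the ones the tool is expected to produce in the steps that follow.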
Step 1: Eliciting Parallel Tool Calls
We ask the LLM to apply the hailstone step to three numbers at once. A well-prompted model will typically return all three tool call requests in a single assistant message rather than one at a time.
from llm_agents_from_scratch.llms.ollama import OllamaLLM
llm = OllamaLLM(model="qwen3:14b", think=False)
user_input = (
    "Apply the hailstone_step_func to each of the following numbers: "
    "10, 15, and 27. "
    "Call the tool once for each number."
)
user_msg, assistant_msg = await llm.chat(
    user_input,
    tools=[hailstone_tool],
)
print(f"Tool calls returned: {len(assistant_msg.tool_calls)}")
for tc in assistant_msg.tool_calls:
    print(f" → {tc.tool_name}({tc.arguments})")
Tool calls returned: 3
→ hailstone_step_func({'x': 10})
→ hailstone_step_func({'x': 15})
→ hailstone_step_func({'x': 27})
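A model is not guaranteed to batch every request, so it can be worth checking the returned calls before executing them. The following is a minimal sketch of such a check. `FakeToolCall` is a hypothetical stand-in that mirrors only the `tool_name` and `arguments` attributes printed above, not the library's actual `ToolCall` class, and `uncovered_inputs` is likewise an illustrative helper.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FakeToolCall:
    """Hypothetical stand-in with just the two attributes printed above."""

    tool_name: str
    arguments: dict[str, Any] = field(default_factory=dict)


def uncovered_inputs(
    requested: list[int],
    tool_calls: list[FakeToolCall],
    tool_name: str = "hailstone_step_func",
) -> list[int]:
    """Return the requested numbers that no tool call asked about."""
    seen = {
        tc.arguments.get("x") for tc in tool_calls if tc.tool_name == tool_name
    }
    return [n for n in requested if n not in seen]


# Simulate a response in which the model requested only two of three calls.
calls = [
    FakeToolCall("hailstone_step_func", {"x": 10}),
    FakeToolCall("hailstone_step_func", {"x": 15}),
]
print(uncovered_inputs([10, 15, 27], calls))  # [27]
```

An empty list means every requested number was covered; a non-empty list could trigger a follow-up prompt for the remaining inputs.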
Step 2: Executing All Tool Calls
We iterate over every ToolCall in the assistant message and execute
each one, collecting the ToolCallResult objects.
tool_call_results = [hailstone_tool(tc) for tc in assistant_msg.tool_calls]
for tc, result in zip(
    assistant_msg.tool_calls,
    tool_call_results,
    strict=False,
):
    print(
        f" hailstone_step_func(x={tc.arguments['x']!r}) → {result.content}",
    )
 hailstone_step_func(x=10) → 5
 hailstone_step_func(x=15) → 46
 hailstone_step_func(x=27) → 82
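With a single tool, calling it directly for every request is enough. When several tools are available, each request has to be routed by its tool name, and one failing call should not discard the results of the others. The sketch below shows one way this might look in plain Python. `FakeToolCall`, `FakeToolCallResult`, the `error` flag, and `run_all` are hypothetical stand-ins for illustration, not the library's classes or its dispatch logic.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class FakeToolCall:
    """Hypothetical stand-in for a tool call request."""

    tool_name: str
    arguments: dict[str, Any]


@dataclass
class FakeToolCallResult:
    """Hypothetical stand-in for a tool call result."""

    tool_name: str
    content: str
    error: bool = False


def run_all(
    tool_calls: list[FakeToolCall],
    registry: dict[str, Callable[..., Any]],
) -> list[FakeToolCallResult]:
    """Execute every call in order, producing exactly one result per call."""
    results = []
    for tc in tool_calls:
        func = registry.get(tc.tool_name)
        if func is None:
            msg = f"unknown tool: {tc.tool_name}"
            results.append(FakeToolCallResult(tc.tool_name, msg, error=True))
            continue
        try:
            content = str(func(**tc.arguments))
            results.append(FakeToolCallResult(tc.tool_name, content))
        except Exception as exc:  # broad on purpose: report, don't abort the batch
            msg = f"{type(exc).__name__}: {exc}"
            results.append(FakeToolCallResult(tc.tool_name, msg, error=True))
    return results


registry = {"hailstone_step_func": lambda x: x // 2 if x % 2 == 0 else 3 * x + 1}
demo_calls = [
    FakeToolCall("hailstone_step_func", {"x": 10}),
    FakeToolCall("hailstone_step_func", {"x": 15}),
    FakeToolCall("no_such_tool", {}),
]
demo_results = run_all(demo_calls, registry)
print([(r.content, r.error) for r in demo_results])
```

Returning an error result instead of raising keeps the result list the same length as the request list, so every request still receives an answer when the batch is sent back.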
Step 3: Returning All Results in One Call
We pass the full list of ToolCallResult objects to
continue_chat_with_tool_results() in a single batch.
The LLM receives all three results at once and produces a final answer.
new_messages, final_response = await llm.continue_chat_with_tool_results(
    tool_call_results=tool_call_results,
    chat_history=[user_msg, assistant_msg],
    tools=[hailstone_tool],
)
print(final_response.content)
The results of applying the hailstone_step_func to each number are as follows:
- For 10, the result is 5.
- For 15, the result is 46.
- For 27, the result is 82.
Full Conversation at a Glance
Printing the complete message sequence shows the structure the
LLMAgent manages automatically: user → parallel tool requests →
tool results (one per call) → final answer.
all_messages = [user_msg, assistant_msg, *new_messages, final_response]
for msg in all_messages:
    role = msg.role.value
    if msg.tool_calls:
        calls = ", ".join(
            f"{tc.tool_name}({tc.arguments})" for tc in msg.tool_calls
        )
        print(f"[{role:10s}] <tool calls> {calls}")
    else:
        preview = msg.content[:80].replace("\n", " ")
        print(f"[{role:10s}] {preview}")
[user      ] Apply the hailstone_step_func to each of the following numbers: 10, 15, and 27.
[assistant ] <tool calls> hailstone_step_func({'x': 10}), hailstone_step_func({'x': 15}), hailstone_step_func({'x': 27})
[tool      ] { "tool_call_id": "ff713bbf-fa95-4bd9-a10e-a0a451fc2682", "content": "5"
[tool      ] { "tool_call_id": "9b132fa9-17b7-441f-932c-61f2a252075f", "content": "46
[tool      ] { "tool_call_id": "33ad59da-a1a3-479d-9a81-733d99129fa9", "content": "82
[assistant ] The results of applying the hailstone_step_func to each number are as follows:
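Each tool message in the output carries a `tool_call_id`. Assuming each id refers to one request in the preceding assistant message (the notebook does not print the request ids, so this is an assumption), a simple consistency check is to confirm that every request has exactly one result. The sketch below works on plain lists of id strings. `check_tool_results` and the example ids are hypothetical and not part of the library.

```python
from collections import Counter


def check_tool_results(call_ids: list[str], result_ids: list[str]) -> list[str]:
    """Return a list of problems; an empty list means one result per call."""
    problems = []
    counts = Counter(result_ids)
    for call_id in call_ids:
        if counts[call_id] == 0:
            problems.append(f"missing result for call {call_id}")
        elif counts[call_id] > 1:
            problems.append(f"duplicate results for call {call_id}")
    known = set(call_ids)
    for result_id in counts:
        if result_id not in known:
            problems.append(f"result {result_id} matches no call")
    return problems


# Hypothetical ids standing in for the UUIDs shown in the output above.
print(check_tool_results(["a", "b", "c"], ["a", "b", "c"]))  # []
print(check_tool_results(["a", "b", "c"], ["a", "a", "z"]))
```

A check like this could run just before the results are sent back, catching a dropped or duplicated result before the model sees an inconsistent history.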