Structured Extraction from a PDF
This notebook demonstrates structured_output() applied to a real-world task:
extracting structured metadata from a research paper that has been converted
to markdown.
We fetch the ReAct: Synergizing Reasoning and Acting in Language Models
paper (Yao et al., 2022), parse its first three pages to markdown using
pymupdf4llm, then ask the LLM to fill in a Pydantic model with the paper's
key fields.
Chapter 3 concept:
structured_output() accepts any text prompt and a Pydantic model class, and returns a validated instance of that model. Here we use it to turn unstructured PDF text into a typed Python object in a single call.
# Uncomment the line below to install `llm-agents-from-scratch` from PyPI
# !pip install llm-agents-from-scratch
Running an Ollama service
To execute the code provided in this notebook, you'll need to have Ollama
installed on your local machine and have its LLM hosting service running.
To download Ollama, follow the instructions found on this page:
https://ollama.com/download. After downloading and installing Ollama, you
can start a service by opening a terminal and running the command
ollama serve.
import os
import shutil
import subprocess
import time
import urllib.error
import urllib.request
def ensure_ollama(host="http://localhost:11434", timeout=15):
    """Start Ollama if not already running and wait until responsive."""

    def _up():
        # The server counts as up if /api/tags answers within one second.
        try:
            urllib.request.urlopen(f"{host}/api/tags", timeout=1)
            return True
        except (urllib.error.URLError, ConnectionError, TimeoutError):
            return False

    if _up():
        print(f"✓ Ollama already running at {host}")
        return

    # Look on PATH first, then the Lightning persistent path and standard locations
    ollama_path = shutil.which("ollama")
    if ollama_path is None:
        for candidate in [
            "/teamspace/studios/this_studio/.local/bin/ollama",
            "/usr/local/bin/ollama",
            "/usr/bin/ollama",
        ]:
            if os.path.exists(candidate):
                ollama_path = candidate
                break
    if ollama_path is None:
        raise RuntimeError(
            "Could not find the ollama binary. Install with: "
            "curl -fsSL https://ollama.com/install.sh | sh",
        )

    print(f"Starting Ollama server ({ollama_path})...")
    subprocess.Popen(
        [ollama_path, "serve"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )

    # Poll until the server responds or the timeout expires.
    deadline = time.time() + timeout
    while time.time() < deadline:
        if _up():
            print(f"✓ Ollama up and running at {host}")
            return
        time.sleep(0.5)
    raise RuntimeError(f"Ollama did not start within {timeout}s")


ensure_ollama()
✓ Ollama already running at http://localhost:11434
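A running server is not enough on its own: the model used later in this notebook (qwen3:14b) also has to be available locally. The sketch below is one way to check that. It assumes the `/api/tags` endpoint returns JSON with a `models` list whose entries carry a `name` field. `model_names` and `installed_models` are hypothetical helpers written for this illustration, not part of llm-agents-from-scratch.

```python
import json
import urllib.request


def model_names(tags_payload: dict) -> list[str]:
    """Return the model names listed in an /api/tags-style payload."""
    return [m["name"] for m in tags_payload.get("models", []) if "name" in m]


def installed_models(host: str = "http://localhost:11434") -> list[str]:
    """Ask a running Ollama server which models it has locally (needs the server)."""
    with urllib.request.urlopen(f"{host}/api/tags", timeout=5) as resp:
        return model_names(json.load(resp))


# Offline illustration with a hand-written payload (not real server output).
sample = {"models": [{"name": "qwen3:14b"}, {"name": "llama3.2:3b"}]}
print("qwen3:14b" in model_names(sample))
```

If the model is missing from `installed_models()`, running `ollama pull qwen3:14b` in a terminal should fetch it.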
Installing the PDF Parser
We use pymupdf4llm
to convert PDF pages to markdown. It is not part of the core
llm-agents-from-scratch dependencies, so we install it here.
!uv pip install pymupdf4llm
Audited 1 package in 0.66ms
Fetching the Paper
We download the ReAct paper directly from arXiv and later parse only its first three pages. These pages cover the title, authors, abstract, and opening sections, which is enough context for the LLM to fill in all extraction fields.
import tempfile
from pathlib import Path
import pymupdf4llm
PDF_URL = "https://arxiv.org/pdf/2210.03629"
PDF_PAGES = [0, 1, 2] # title, abstract, intro
req = urllib.request.Request(
PDF_URL,
headers={"User-Agent": "llm-agents-from-scratch/1.0"},
)
with urllib.request.urlopen(req) as resp:
    pdf_bytes = resp.read()
print(f"Downloaded {len(pdf_bytes):,} bytes")
Downloaded 633,805 bytes
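A successful HTTP response does not guarantee that the payload is a PDF; a server can answer with an HTML page instead. As a minimal sketch, the check below looks for the `%PDF-` marker that PDF files begin with. `looks_like_pdf` and `require_pdf` are hypothetical helper names introduced here, and the byte strings are made-up samples.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Return True if the payload starts with the PDF header marker."""
    return data.startswith(b"%PDF-")


def require_pdf(data: bytes) -> bytes:
    """Pass PDF bytes through unchanged, or raise if they look like something else."""
    if not looks_like_pdf(data):
        raise ValueError(f"Expected a PDF, got payload starting with {data[:20]!r}")
    return data


print(looks_like_pdf(b"%PDF-1.5\n..."))    # True
print(looks_like_pdf(b"<!DOCTYPE html>"))  # False
```

In this notebook the check could be applied as `pdf_bytes = require_pdf(pdf_bytes)` before the bytes are handed to the parser.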
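Parsing PDF to Markdown
# Write the downloaded bytes to a temporary file so the parser can open it by path.
with tempfile.NamedTemporaryFile(
    suffix=".pdf",
    delete=False,
) as tmp:
    tmp.write(pdf_bytes)
    tmp_path = Path(tmp.name)

# Convert only the selected (0-based) pages, then remove the temporary file.
try:
    md_text = pymupdf4llm.to_markdown(str(tmp_path), pages=PDF_PAGES)
finally:
    tmp_path.unlink(missing_ok=True)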
print(f"Extracted {len(md_text):,} characters of markdown")
print("--- preview (first 500 chars) ---")
print(md_text[:500])
Extracted 14,957 characters of markdown
--- preview (first 500 chars) ---
Published as a conference paper at ICLR 2023
# REACT: SYNERGIZING REASONING AND ACTING IN LANGUAGE MODELS
Shunyu Yao _[∗]_[*,1] , Jeffrey Zhao[2] , Dian Yu[2] , Nan Du[2] , Izhak Shafran[2] , Karthik Narasimhan[1] , Yuan Cao[2]
1Department of Computer Science, Princeton University
2Google Research, Brain team
1{shunyuy,karthikn}@princeton.edu
2{jeffreyzhao,dianyu,dunan,izhak,yuancao}@google.com
## ABSTRACT
While large language models (LLMs) have demonstrated impressive performanc
Defining the Extraction Model
We define a Pydantic model that captures the key metadata fields we want
to pull from the paper. The LLM will populate every field from the markdown
text in a single structured_output() call.
from pydantic import BaseModel, Field
class PaperSummary(BaseModel):
    """Structured metadata extracted from a research paper."""

    title: str = Field(description="Full title of the paper.")
    authors: list[str] = Field(
        description="List of author names as they appear on the paper.",
    )
    year: int = Field(
        description="Year the paper was published or submitted.",
    )
    abstract: str = Field(
        description="The paper's abstract, faithfully reproduced.",
    )
    key_contributions: list[str] = Field(
        description=(
            "Three to five concise bullet points summarising "
            "the paper's main contributions."
        ),
    )
    primary_topic: str = Field(
        description=(
            "One short phrase describing the paper's primary research topic "
            "(e.g. 'LLM reasoning', 'tool use', 'multi-agent systems')."
        ),
    )
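The field descriptions above guide the LLM but are not enforced: a phrase such as "Three to five" is only text in the schema. The sketch below shows how constraints could be made explicit and what validation then does. It assumes Pydantic v2, and `PaperStub` is a hypothetical, cut-down variant of PaperSummary. How llm-agents-from-scratch passes the schema to Ollama is not shown in this notebook, so the example exercises Pydantic only.

```python
from pydantic import BaseModel, Field, ValidationError


class PaperStub(BaseModel):
    """Hypothetical, reduced version of PaperSummary with enforced constraints."""

    title: str = Field(min_length=1)
    year: int = Field(ge=1900, le=2100)
    key_contributions: list[str] = Field(min_length=3, max_length=5)


# The JSON schema describes the fields and constraints a reply must satisfy.
schema = PaperStub.model_json_schema()
print(sorted(schema["properties"]))

# A well-formed JSON reply validates into a typed instance.
good = '{"title": "ReAct", "year": 2023, "key_contributions": ["a", "b", "c"]}'
print(PaperStub.model_validate_json(good).year)

# A reply that breaks a constraint is rejected instead of silently accepted.
bad = '{"title": "ReAct", "year": 2023, "key_contributions": ["only one"]}'
try:
    PaperStub.model_validate_json(bad)
except ValidationError as exc:
    print(f"rejected with {exc.error_count()} error(s)")
```

Whether to enforce such limits is a design choice: strict constraints catch sloppy replies early, but they also turn borderline outputs into hard failures.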
Extracting Structured Data
We pass the markdown text as the prompt and PaperSummary as the target
model. structured_output() returns a validated PaperSummary instance, so
no manual JSON parsing or post-processing is needed.
from llm_agents_from_scratch.llms.ollama import OllamaLLM
llm = OllamaLLM(model="qwen3:14b", think=False)
prompt = (
    "Extract structured metadata from the following research paper.\n\n"
    f"{md_text}"
)
summary = await llm.structured_output(prompt=prompt, mdl=PaperSummary)
print(type(summary), "\n")
print(summary.model_dump())
<class '__main__.PaperSummary'>
{'title': 'REACT: SYNERGIZING REASONING AND ACTING IN LANGUAGE MODELS', 'authors': ['Shunyu Yao', 'Jeffrey Zhao', 'Dian Yu', 'Nan Du', 'Izhak Shafran', 'Karthik Narasimhan', 'Yuan Cao'], 'year': 2023, 'abstract': 'While large language models (LLMs) have demonstrated impressive performance across tasks in language understanding and interactive decision making, their abilities for reasoning (e.g. chain-of-thought prompting) and acting (e.g. action plan generation) have primarily been studied as separate topics. In this paper, we explore the use of LLMs to generate both verbal reasoning traces and actions pertaining to a task in an interleaved manner, which allows the model to perform dynamic reasoning to create, maintain, and adjust high-level plans for acting while also interacting with external environments to incorporate additional information into reasoning. We conduct empirical evaluations of ReAct and state-of-the-art baselines on four diverse benchmarks: HotPotQA, Fever, ALFWorld, and WebShop. ReAct outperforms existing methods in few-shot learning setups and demonstrates benefits in interpretability, trustworthiness, and diagnosability.', 'key_contributions': ['Introduce ReAct, a novel prompt-based paradigm to synergize reasoning and acting in language models for general task solving.', 'Perform extensive experiments across diverse benchmarks to showcase the advantage of ReAct in a few-shot learning setup over prior approaches that perform either reasoning or action generation in isolation.', 'Present systematic ablations and analysis to understand the importance of acting in reasoning tasks, and reasoning in interactive tasks.', 'Analyze the limitations of ReAct under the prompting setup and perform initial finetuning experiments showing the potential of ReAct to improve with additional training data.'], 'primary_topic': 'Synergizing reasoning and acting in language models for general task solving using the ReAct paradigm.'}
Result
print(f"Title: {summary.title}")
print(f"Year: {summary.year}")
print(f"Authors: {', '.join(summary.authors)}")
print(f"Primary topic: {summary.primary_topic}")
print()
print("Abstract:")
print(summary.abstract)
print()
print("Key contributions:")
for point in summary.key_contributions:
    print(f" • {point}")
Title: REACT: SYNERGIZING REASONING AND ACTING IN LANGUAGE MODELS
Year: 2023
Authors: Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, Yuan Cao
Primary topic: Synergizing reasoning and acting in language models for general task solving using the ReAct paradigm.

Abstract:
While large language models (LLMs) have demonstrated impressive performance across tasks in language understanding and interactive decision making, their abilities for reasoning (e.g. chain-of-thought prompting) and acting (e.g. action plan generation) have primarily been studied as separate topics. In this paper, we explore the use of LLMs to generate both verbal reasoning traces and actions pertaining to a task in an interleaved manner, which allows the model to perform dynamic reasoning to create, maintain, and adjust high-level plans for acting while also interacting with external environments to incorporate additional information into reasoning. We conduct empirical evaluations of ReAct and state-of-the-art baselines on four diverse benchmarks: HotPotQA, Fever, ALFWorld, and WebShop. ReAct outperforms existing methods in few-shot learning setups and demonstrates benefits in interpretability, trustworthiness, and diagnosability.

Key contributions:
 • Introduce ReAct, a novel prompt-based paradigm to synergize reasoning and acting in language models for general task solving.
 • Perform extensive experiments across diverse benchmarks to showcase the advantage of ReAct in a few-shot learning setup over prior approaches that perform either reasoning or action generation in isolation.
 • Present systematic ablations and analysis to understand the importance of acting in reasoning tasks, and reasoning in interactive tasks.
 • Analyze the limitations of ReAct under the prompting setup and perform initial finetuning experiments showing the potential of ReAct to improve with additional training data.
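Schema validation confirms the types of the fields, not their content. The abstract field asks for a faithful reproduction, so one inexpensive follow-up is to check which extracted sentences actually occur in the source text. The sketch below uses a deliberately crude sentence splitter (it would also split after abbreviations such as "e.g.") and hand-written sample strings; `normalize`, `is_verbatim`, and `unsupported_sentences` are hypothetical helpers.

```python
import re


def normalize(text: str) -> str:
    """Lowercase the text and collapse all whitespace runs to single spaces."""
    return re.sub(r"\s+", " ", text).strip().lower()


def is_verbatim(excerpt: str, source: str) -> bool:
    """Return True if excerpt occurs in source, ignoring case and line layout."""
    return normalize(excerpt) in normalize(source)


def unsupported_sentences(excerpt: str, source: str) -> list[str]:
    """List the sentences of excerpt that do not appear verbatim in source."""
    sentences = re.split(r"(?<=[.!?])\s+", excerpt.strip())
    return [s for s in sentences if s and not is_verbatim(s, source)]


# Made-up strings for illustration, not taken from the notebook's output.
source = (
    "## ABSTRACT\nWhile large language models (LLMs) have demonstrated\n"
    "impressive performance. They are studied separately."
)
excerpt = (
    "While large language models (LLMs) have demonstrated impressive "
    "performance. They are widely used."
)
print(unsupported_sentences(excerpt, source))
```

In this notebook the check would be `unsupported_sentences(summary.abstract, md_text)`. Markdown markup or hyphenation in the parsed text can cause false alarms, so a flagged sentence is a prompt for manual review rather than proof of an error.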