Tutorial 03: Type-Based Parsing

Learn how to use Python type hints to automatically parse LLM responses into structured data.

What You'll Learn

  • How type-based parsing works in Kagura
  • Using Pydantic models for complex structures
  • Handling lists, dicts, and nested objects
  • Error handling and validation

Prerequisites

  • Completed Tutorial 01: Basic Agent
  • Basic understanding of Python type hints
  • Familiarity with Pydantic (helpful but not required)

Why Type-Based Parsing?

LLMs return unstructured text, but your application needs structured data. Kagura automatically converts LLM responses to Python types based on your return type annotation.

from kagura import agent

# Without parsing: raw string
@agent
async def get_age(name: str) -> str:
    """What is {{ name }}'s age?"""
    pass

result = await get_age("Alice")
# result = "Alice is 25 years old."  ← Hard to use in code

# With parsing: structured data
@agent
async def get_age(name: str) -> int:
    """What is {{ name }}'s age? Return only the number."""
    pass

result = await get_age("Alice")
# result = 25  ← Easy to use!
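
To make the mechanism concrete, here is a minimal sketch of how a return annotation can drive conversion. It is not Kagura's implementation: `convert`, `typed`, and the stubbed LLM callable are hypothetical names introduced for illustration, and only `str`, `int`, `float`, `bool`, and JSON values are handled.

```python
import asyncio
import json
from typing import Any, Callable, get_type_hints


def convert(raw: str, target: Any) -> Any:
    """Convert raw LLM text to `target` (hypothetical helper, not Kagura's API)."""
    text = raw.strip()
    if target is str:
        return text
    if target is bool:
        lowered = text.lower()
        if lowered in ("true", "yes"):
            return True
        if lowered in ("false", "no"):
            return False
        raise ValueError(f"not a boolean: {text!r}")
    if target in (int, float):
        return target(text)
    # Anything else is assumed to be JSON-encoded.
    return json.loads(text)


def typed(fake_llm: Callable[[str], str]):
    """Decorator factory: parse a stubbed LLM reply using the return annotation."""
    def decorator(func):
        return_type = get_type_hints(func).get("return", str)

        async def wrapper(*args, **kwargs):
            raw = fake_llm(func.__doc__ or "")  # a real agent would call the model here
            return convert(raw, return_type)

        return wrapper
    return decorator


@typed(lambda prompt: " 25 ")  # stub that always replies " 25 "
async def get_age_demo(name: str) -> int:
    """What is {{ name }}'s age? Return only the number."""


age = asyncio.run(get_age_demo("Alice"))
print(age, type(age).__name__)
```

The point of the sketch is that the caller never parses anything: changing `-> int` to `-> float` is enough to change the type of the result.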

Basic Types

Strings

from kagura import agent

@agent
async def summarize(text: str) -> str:
    """Summarize this in one sentence: {{ text }}"""
    pass

result = await summarize("Long article...")
# result: str = "Article summary."

Numbers

@agent
async def count_words(text: str) -> int:
    """Count the words in: {{ text }}. Return only the number."""
    pass

result = await count_words("Hello world")
# result: int = 2

@agent
async def calculate_average(numbers: list[int]) -> float:
    """Calculate the average of {{ numbers }}. Return only the number."""
    pass

result = await calculate_average([1, 2, 3, 4, 5])
# result: float = 3.0

Booleans

@agent
async def is_positive(text: str) -> bool:
    """Is this text positive in sentiment? {{ text }}
    Return only 'true' or 'false'."""
    pass

result = await is_positive("I love this!")
# result: bool = True

Collections

Lists

from typing import List

@agent
async def extract_keywords(text: str) -> List[str]:
    """Extract keywords from: {{ text }}
    Return as JSON array."""
    pass

result = await extract_keywords("Python is great for AI")
# result: List[str] = ["Python", "AI", "programming"]

Supported List Types:

  • List[str]: List of strings
  • List[int]: List of integers
  • List[float]: List of floats
  • List[YourModel]: List of Pydantic models
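
Models often wrap a JSON array in prose or in a markdown code fence, so the JSON has to be located before it can become a list. The sketch below shows one way to do that with the standard library; `extract_json` is a hypothetical helper, not part of Kagura's API.

```python
import json
import re
from typing import Any

FENCE = "`" * 3  # built indirectly so this example can itself sit in a fenced block


def extract_json(reply: str) -> Any:
    """Return the first JSON array or object found in `reply`."""
    text = reply.strip()
    # Drop a surrounding markdown code fence if present.
    fenced = re.search(FENCE + r"(?:json)?\s*(.*?)" + FENCE, text, re.DOTALL)
    if fenced:
        text = fenced.group(1).strip()
    decoder = json.JSONDecoder()
    # Try decoding from each position that could start an array or object.
    for match in re.finditer(r"[\[{]", text):
        try:
            value, _ = decoder.raw_decode(text, match.start())
            return value
        except json.JSONDecodeError:
            continue
    raise ValueError("no JSON array or object found")


keywords = extract_json('Here you go: ["Python", "AI"]')
print(keywords)
```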

Dictionaries

from typing import Dict

@agent
async def extract_metadata(text: str) -> Dict[str, str]:
    """Extract metadata from: {{ text }}
    Return as JSON object."""
    pass

result = await extract_metadata("Title: Hello\nAuthor: Alice")
# result: Dict[str, str] = {"title": "Hello", "author": "Alice"}

Pydantic Models

For complex structures, use Pydantic models:

from pydantic import BaseModel, Field
from typing import List

class Person(BaseModel):
    name: str
    age: int
    email: str

@agent
async def extract_person(text: str) -> Person:
    """Extract person information from: {{ text }}
    Return as JSON object with fields: name, age, email."""
    pass

result = await extract_person("Alice (25) - alice@example.com")
# result: Person = Person(name="Alice", age=25, email="alice@example.com")

Model Validation

Pydantic automatically validates the data:

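# `validator` is the Pydantic v1 decorator; Pydantic v2 deprecates it in favor of `field_validator`
from pydantic import BaseModel, EmailStr, validator

class User(BaseModel):
    name: str
    email: EmailStr  # Validates email format (requires the email-validator package)
    age: int

    @validator('age')
    def age_must_be_non_negative(cls, v):
        # Zero is allowed, so the rule is "non-negative" rather than "positive"
        if v < 0:
            raise ValueError('age must be non-negative')
        return v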

@agent
async def extract_user(text: str) -> User:
    """Extract user info from: {{ text }}
    Return as JSON: {name, email, age}"""
    pass

# Valid input
result = await extract_user("Bob, bob@example.com, 30")
# result: User(name="Bob", email="bob@example.com", age=30)

# Invalid input (bad email)
# Will raise ValidationError
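
The failure path can be tried without calling an LLM by feeding hand-written JSON to a model. This sketch assumes Pydantic is installed; `UserRecord` and `parse_user` are illustrative names, and `Field(ge=0)` stands in for a custom validator.

```python
import json

from pydantic import BaseModel, Field, ValidationError


class UserRecord(BaseModel):
    name: str
    age: int = Field(ge=0)  # built-in constraint: age must be >= 0


def parse_user(raw_json: str):
    """Return (model, None) on success or (None, error list) on failure."""
    try:
        return UserRecord(**json.loads(raw_json)), None
    except ValidationError as exc:
        return None, exc.errors()


ok, _ = parse_user('{"name": "Bob", "age": 30}')
bad, errors = parse_user('{"name": "Bob", "age": -5}')
print(ok)
print(errors[0]["loc"])  # location of the failing field
```

Each entry in `errors` names the failing field in `loc`, which is useful when deciding whether to retry the agent or fall back to a default.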

Field Descriptions

Help the LLM understand fields:

class Article(BaseModel):
    title: str = Field(description="The article title")
    summary: str = Field(description="Brief summary, max 100 words")
    tags: List[str] = Field(description="Relevant tags, 3-5 items")
    published: bool = Field(description="Whether article is published")

@agent
async def analyze_article(content: str) -> Article:
    """Analyze this article: {{ content }}
    Return as JSON with: title, summary, tags, published."""
    pass

Nested Structures

Nested Models

class Address(BaseModel):
    street: str
    city: str
    country: str

class Company(BaseModel):
    name: str
    address: Address
    employees: int

@agent
async def extract_company(text: str) -> Company:
    """Extract company information from: {{ text }}
    Return as JSON with nested address object."""
    pass

result = await extract_company("Acme Corp, 123 Main St, NYC, USA, 500 employees")
# result: Company(
#     name="Acme Corp",
#     address=Address(street="123 Main St", city="NYC", country="USA"),
#     employees=500
# )

Lists of Models

class Task(BaseModel):
    title: str
    priority: str
    estimated_hours: int

@agent
async def extract_tasks(text: str) -> List[Task]:
    """Extract tasks from: {{ text }}
    Return as JSON array of objects."""
    pass

result = await extract_tasks("""
    1. Fix bug - High priority - 3 hours
    2. Write docs - Low priority - 5 hours
""")
# result: List[Task] = [
#     Task(title="Fix bug", priority="High", estimated_hours=3),
#     Task(title="Write docs", priority="Low", estimated_hours=5)
# ]
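
Once the reply is a JSON array, turning it into a list of models is a one-line comprehension. The sketch below assumes Pydantic is installed and uses a hand-written JSON string in place of an LLM reply; `TaskDemo` mirrors the `Task` model above under a different name.

```python
import json

from pydantic import BaseModel


class TaskDemo(BaseModel):
    title: str
    priority: str
    estimated_hours: int


# Hand-written stand-in for what the LLM is asked to return.
raw = """[
  {"title": "Fix bug", "priority": "High", "estimated_hours": 3},
  {"title": "Write docs", "priority": "Low", "estimated_hours": 5}
]"""

tasks = [TaskDemo(**item) for item in json.loads(raw)]
total_hours = sum(task.estimated_hours for task in tasks)
print(len(tasks), total_hours)
```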

Advanced Patterns

Optional Fields

from typing import Optional

class Product(BaseModel):
    name: str
    price: float
    discount: Optional[float] = None
    description: Optional[str] = None

@agent
async def extract_product(text: str) -> Product:
    """Extract product info from: {{ text }}
    Return as JSON. discount and description are optional."""
    pass

result = await extract_product("Laptop $999")
# result: Product(name="Laptop", price=999.0, discount=None, description=None)

Union Types

from typing import Union

@agent
async def parse_value(text: str) -> Union[int, str]:
    """Parse the value from: {{ text }}
    Return as number if numeric, otherwise as string."""
    pass

result1 = await parse_value("42")        # returns int: 42
result2 = await parse_value("hello")     # returns str: "hello"
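
With a union, the order of the members matters: `str` accepts any text, so it only works as a fallback when it comes last. The sketch below tries each member in declaration order. Whether Kagura resolves unions this way is an assumption, and `parse_union` is a hypothetical helper.

```python
from typing import Any, Union, get_args, get_origin


def parse_union(raw: str, annotation: Any) -> Any:
    """Try each member of a typing.Union in declaration order."""
    if get_origin(annotation) is not Union:
        raise TypeError("expected a typing.Union annotation")
    text = raw.strip()
    for member in get_args(annotation):
        try:
            return member(text)  # e.g. int("42") or str("hello")
        except (TypeError, ValueError):
            continue  # this member does not fit; try the next one
    raise ValueError(f"{text!r} matches no member of {annotation}")


number = parse_union("42", Union[int, str])
word = parse_union("hello", Union[int, str])
print(repr(number), repr(word))
```

Reversing the annotation to `Union[str, int]` would return the string `"42"` in this sketch, because `str` succeeds first.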

Enums

from enum import Enum

class Priority(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"

class Issue(BaseModel):
    title: str
    priority: Priority

@agent
async def extract_issue(text: str) -> Issue:
    """Extract issue from: {{ text }}
    Priority must be: low, medium, or high."""
    pass

result = await extract_issue("Fix login bug - high priority")
# result: Issue(title="Fix login bug", priority=Priority.HIGH)
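
Enum lookups by value are case-sensitive, so a reply of "High" does not match the value "high" on its own. A minimal sketch of normalizing the text first, and of building the list of allowed values for the prompt from the enum itself; `to_priority` and `allowed` are illustrative names.

```python
from enum import Enum


class Priority(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


def to_priority(raw: str) -> Priority:
    """Map free-form text such as ' High ' onto a Priority member."""
    return Priority(raw.strip().lower())  # raises ValueError for unknown values


# Derive the prompt wording from the enum so the two cannot drift apart.
allowed = ", ".join(p.value for p in Priority)
print(to_priority(" High "), "| allowed:", allowed)
```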

Best Practices

1. Clear Return Format Instructions

# ✅ Good: Explicit format
@agent
async def extract_data(text: str) -> Person:
    """Extract person from: {{ text }}
    Return as JSON: {"name": str, "age": int, "email": str}"""
    pass

# ❌ Bad: Unclear format
@agent
async def extract_data(text: str) -> Person:
    """Get person from: {{ text }}"""
    pass

2. Use Field Descriptions

# ✅ Good: Descriptive fields
class Report(BaseModel):
    summary: str = Field(description="Executive summary, 2-3 sentences")
    findings: List[str] = Field(description="Key findings, bullet points")
    score: int = Field(description="Overall score 0-100")

# ❌ Bad: No descriptions
class Report(BaseModel):
    summary: str
    findings: List[str]
    score: int

3. Validate Constraints

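from pydantic import BaseModel, Field, validator

class Temperature(BaseModel):
    # ge=-273.15 already rejects values below absolute zero
    celsius: float = Field(ge=-273.15, description="Temperature in Celsius")

    # Redundant with the Field constraint above; kept to show where a custom
    # rule that Field cannot express would go
    @validator('celsius')
    def validate_temp(cls, v):
        if v < -273.15:
            raise ValueError('Temperature below absolute zero')
        return v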

4. Handle Errors Gracefully

from typing import Optional
from pydantic import ValidationError

@agent
async def extract_safe(text: str) -> Optional[Person]:
    """Extract person from: {{ text }}
    Return as JSON or null if not found."""
    pass

try:
    result = await extract_safe("No person here")
    if result is None:
        print("No person found")
except ValidationError as e:
    print(f"Validation failed: {e}")

Error Handling

Validation Errors

from pydantic import ValidationError

@agent
async def parse_age(text: str) -> int:
    """Extract age from: {{ text }}. Return only the number."""
    pass

try:
    result = await parse_age("Alice is twenty-five")
    # LLM returns "twenty-five" instead of 25
except ValidationError as e:
    print(f"Failed to parse: {e}")
    # Handle error: retry, use default, etc.

Missing Fields

class Contact(BaseModel):
    name: str
    email: str
    phone: str  # Required

@agent
async def extract_contact(text: str) -> Contact:
    """Extract contact from: {{ text }}
    Return JSON with: name, email, phone."""
    pass

# If LLM omits phone, ValidationError is raised
try:
    result = await extract_contact("John, john@example.com")
except ValidationError as e:
    print("Missing required field:", e)
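
To act on a failure rather than only print it, the field names can be read from the error's `loc` entries. This sketch assumes Pydantic is installed and validates a plain dict instead of an LLM reply; `ContactDemo` and `failing_fields` are illustrative names.

```python
from pydantic import BaseModel, ValidationError


class ContactDemo(BaseModel):
    name: str
    email: str
    phone: str  # required


def failing_fields(data: dict) -> list:
    """Return the names of fields that failed validation (empty if valid)."""
    try:
        ContactDemo(**data)
    except ValidationError as exc:
        return [str(err["loc"][0]) for err in exc.errors()]
    return []


missing = failing_fields({"name": "John", "email": "john@example.com"})
print(missing)
```

The resulting list could be used to build a follow-up prompt that asks only for the fields that were left out.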

Type Mismatches

@agent
async def get_count(text: str) -> int:
    """Count items in: {{ text }}. Return only the number."""
    pass

# Stricter variant to fall back on when the first reply cannot be parsed
@agent
async def get_count_strict(text: str) -> int:
    """Count items in: {{ text }}.
    Return ONLY a numeric digit, no words."""
    pass

# If LLM returns "five" instead of 5
try:
    result = await get_count("five items")
except ValidationError:
    # Retry with more explicit instructions
    result = await get_count_strict("five items")
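
The retry idea generalizes to any number of attempts. Below is a minimal sketch using only the standard library; `first_success`, `loose`, and `strict` are hypothetical names, and the two coroutines are stand-ins for agent calls. The default `retry_on=(ValueError,)` also matches Pydantic's `ValidationError`, which subclasses `ValueError`; pass whichever exception type your setup actually raises.

```python
import asyncio
from typing import Awaitable, Callable, Sequence, TypeVar

T = TypeVar("T")


async def first_success(
    attempts: Sequence[Callable[[], Awaitable[T]]],
    retry_on: tuple = (ValueError,),
) -> T:
    """Await each attempt in order and return the first result that parses."""
    last_error = None
    for attempt in attempts:
        try:
            return await attempt()
        except retry_on as exc:
            last_error = exc  # remember why this attempt failed
    if last_error is None:
        raise ValueError("no attempts were provided")
    raise last_error


# Stand-ins for two agents; real code would call decorated agent functions.
async def loose() -> int:
    return int("five")  # raises ValueError, like an unparseable reply


async def strict() -> int:
    return int("5")


count = asyncio.run(first_success([loose, strict]))
print(count)
```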

Common Patterns

Progressive Extraction

# Step 1: Extract basic info
@agent
async def extract_basic(text: str) -> Dict[str, str]:
    """Extract key-value pairs from: {{ text }}"""
    pass

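# Step 2: Parse into model
# `text` is the raw input to extract from. Values arrive as strings, so
# Pydantic coerces numeric strings such as "25" to int for Person.age.
basic = await extract_basic(text)
person = Person(**basic)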

Fallback Values

class Config(BaseModel):
    timeout: int = 30  # Default value
    retries: int = 3
    debug: bool = False

@agent
async def parse_config(text: str) -> Config:
    """Parse config from: {{ text }}
    Use defaults for missing values."""
    pass
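
The defaults are applied by Pydantic when the model is built, so any key missing from the JSON falls back to its declared value whether or not the LLM follows the instruction. A short sketch, assuming Pydantic is installed and using a hand-written JSON string in place of an LLM reply; `ConfigDemo` mirrors `Config` above.

```python
import json

from pydantic import BaseModel


class ConfigDemo(BaseModel):
    timeout: int = 30
    retries: int = 3
    debug: bool = False


# Only `timeout` is present; the other fields take their defaults.
partial = json.loads('{"timeout": 10}')
config = ConfigDemo(**partial)
print(config.timeout, config.retries, config.debug)
```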

Multi-Step Validation

class ValidatedData(BaseModel):
    data: str

    @validator('data')
    def clean_data(cls, v):
        # Clean and validate
        return v.strip().lower()

@agent
async def extract_and_validate(text: str) -> ValidatedData:
    """Extract data from: {{ text }}"""
    pass

Practice Exercises

Exercise 1: Contact Extractor

Create a model for contact information:

class Contact(BaseModel):
    # TODO: Add fields for name, email, phone, company
    pass

@agent
async def extract_contact(text: str) -> Contact:
    """# TODO: Write prompt to extract contact info"""
    pass

Exercise 2: Product List Parser

Parse a list of products:

class Product(BaseModel):
    # TODO: Add fields for name, price, stock
    pass

@agent
async def parse_products(text: str) -> List[Product]:
    """# TODO: Write prompt to parse product list"""
    pass

Exercise 3: Nested Organization

Create a nested structure:

class Employee(BaseModel):
    # TODO: name, role, salary
    pass

class Department(BaseModel):
    # TODO: name, employees list, budget
    pass

@agent
async def parse_org(text: str) -> Department:
    """# TODO: Write prompt to parse organization"""
    pass

Troubleshooting

LLM Returns Wrong Format

# Problem: LLM returns "The age is 25" instead of just "25"

# Solution: Be more explicit
@agent
async def get_age(name: str) -> int:
    """What is {{ name }}'s age?
    IMPORTANT: Return ONLY the numeric age, nothing else.
    Example: 25"""
    pass
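
When prompt changes are not enough, a defensive alternative is to declare the agent as returning `str` and pull the number out yourself. The sketch below is one such post-processing step; `first_int` is a hypothetical helper, and it only handles non-negative integers written as digits.

```python
import re


def first_int(reply: str) -> int:
    """Return the first run of digits in `reply` as an int, or raise ValueError."""
    match = re.search(r"\d+", reply)
    if match is None:
        raise ValueError(f"no integer in {reply!r}")
    return int(match.group())


print(first_int("The age is 25"))
```

A reply such as "twenty-five" still raises `ValueError`, so this complements a retry with stricter instructions and does not replace it.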

Validation Fails Repeatedly

# Problem: LLM returns data that fails validation

# Solution: Relax constraints or provide examples
class Person(BaseModel):
    name: str
    age: int = Field(ge=0, le=150, description="Age between 0-150")

@agent
async def extract_person(text: str) -> Person:
    """Extract person from: {{ text }}
    Return JSON: {"name": "string", "age": number between 0-150}
    Example: {"name": "Alice", "age": 25}"""
    pass
