Beyond the Chatbot: Building Type-Safe AI Workflows for Non-Profits with PydanticAI

Sep 25, 2025

3 min read

AI Policy

TL;DR: PydanticAI lets you build AI agents that return structured, validated data instead of raw text. For non-profits integrating AI with existing databases and compliance systems, this solves a real integration headache.


One failure mode I've seen bite non-profit tech teams: an AI assistant that returns dates as prose ("early next month") when the downstream system needs ISO format ("2025-03-07"). The chatbot works great in demos. Then someone has to manually reformat every output before it can go into a CRM or grant management system.

This isn't the only problem with production AI (accuracy, cost, and latency matter too), but it's a common one, and it's addressable at the framework level.

What PydanticAI Does

PydanticAI is a Python framework from the team behind Pydantic, the same validation library that the OpenAI SDK and Anthropic SDK both depend on.

There are several ways to get structured output from LLMs: OpenAI's JSON mode, function calling, libraries like Instructor. PydanticAI is another option in this space, with some specific design choices worth understanding:

from datetime import date
from pydantic import BaseModel
from pydantic_ai import Agent

class GrantSummary(BaseModel):
    title: str
    deadline: date  # a real date, not "early next month"
    max_budget: float
    eligibility_criteria: list[str]

grant_agent = Agent(
    'openai:gpt-4o',
    output_type=GrantSummary,
    system_prompt='Extract grant details from the provided text.'
)

result = grant_agent.run_sync('The Foundation offers up to $50,000 for...')
print(result.output.max_budget)  # 50000.0 - typed as float

If the model returns something that doesn't match the schema, PydanticAI prompts it to retry automatically. It supports multiple providers (OpenAI, Anthropic, Google, Ollama, others), with a consistent interface across them.
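Swapping providers is mostly a matter of changing the model string. A quick sketch (not from the original example; the model name and the retries setting should be checked against the current PydanticAI docs):

claude_agent = Agent(
    'anthropic:claude-3-5-sonnet-latest',  # provider:model string; exact names vary by release
    output_type=GrantSummary,
    system_prompt='Extract grant details from the provided text.',
    retries=2,  # how many times PydanticAI re-prompts after a validation failure
)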

This matters most when AI output needs to flow into other systems (databases, calendars, accounting software) without manual reformatting.

When You Need More Than a Single Agent

A single agent handles many use cases. But some workflows have loops or require human checkpoints. A grant proposal that needs revision cycles. A CRM cleanup that shouldn't merge records without sign-off.

For these, there's pydantic-graph, a state machine library (actually independent of PydanticAI) where transitions between states are defined by Python type hints. This isn't the only way to model workflows. LangGraph, Prefect, and Temporal are alternatives with different tradeoffs. But pydantic-graph's approach is worth examining if you're already in the Pydantic ecosystem.

Grant Writing with Revision Loops

Grant proposals go through drafts. You could model revision cycles with a while loop and some flags. That works, though it gets harder to maintain when you want to pause mid-workflow, persist state across process restarts, or visualize what's happening.
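For comparison, the hand-rolled loop might look like this (a sketch; draft_once and critique_once are hypothetical helpers wrapping the two agents defined below):

draft, feedback, revisions = "", None, 0
while revisions < 3:
    draft = draft_once(feedback)       # call the drafting agent
    critique = critique_once(draft)    # call the critic agent
    revisions += 1
    if critique.approved:
        break
    feedback = critique.feedback
# Fine until you need to pause here for a human, survive a process restart,
# or visualize what the loop is doing.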

Here's the pydantic-graph approach:

Draft → Critique → Draft (if score < 80)
           ↓
         End (if approved)

from __future__ import annotations
from dataclasses import dataclass, field
from pydantic import BaseModel
from pydantic_ai import Agent
from pydantic_ai.messages import ModelMessage
from pydantic_graph import BaseNode, End, Graph, GraphRunContext

class CritiqueResult(BaseModel):
    score: int  # 0-100
    feedback: str
    approved: bool

@dataclass
class GrantState:
    funder_name: str
    program_area: str
    draft: str = ""
    revision_count: int = 0
    drafter_messages: list[ModelMessage] = field(default_factory=list)

drafter_agent = Agent(
    'openai:gpt-4o',
    output_type=str,
    system_prompt='Write compelling grant narratives. Be specific and data-driven.'
)

critic_agent = Agent(
    'openai:gpt-4o',
    output_type=CritiqueResult,
    system_prompt='Evaluate grant proposals. Score 80+ means ready to submit.'
)

@dataclass
class DraftProposal(BaseNode[GrantState, None, str]):
    feedback: str | None = None

    async def run(self, ctx: GraphRunContext[GrantState]) -> CritiqueDraft:
        if self.feedback:
            prompt = f"Revise this draft based on feedback:\n{ctx.state.draft}\n\nFeedback: {self.feedback}"
        else:
            prompt = f"Write a grant proposal for {ctx.state.funder_name} about {ctx.state.program_area}"

        result = await drafter_agent.run(
            prompt,
            message_history=ctx.state.drafter_messages
        )
        ctx.state.drafter_messages = result.all_messages()
        ctx.state.draft = result.output
        ctx.state.revision_count += 1

        return CritiqueDraft()

@dataclass
class CritiqueDraft(BaseNode[GrantState, None, str]):
    async def run(self, ctx: GraphRunContext[GrantState]) -> DraftProposal | End[str]:
        result = await critic_agent.run(
            f"Evaluate this grant proposal:\n{ctx.state.draft}"
        )

        if result.output.approved or ctx.state.revision_count >= 3:
            return End(ctx.state.draft)
        else:
            return DraftProposal(feedback=result.output.feedback)

grant_graph = Graph(nodes=[DraftProposal, CritiqueDraft])

The return type on CritiqueDraft.run(), DraftProposal | End[str], isn't just documentation. pydantic-graph reads that annotation to validate transitions and generate workflow diagrams. Your IDE can catch invalid transitions at write-time.
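For completeness, running the graph looks roughly like this (a sketch; the run_sync return shape and the mermaid_code helper follow recent pydantic-graph releases, so check them against the version you install):

state = GrantState(funder_name='The Example Foundation', program_area='youth literacy')
result = grant_graph.run_sync(DraftProposal(), state=state)

print(result.output)          # the final draft returned by End(...)
print(state.revision_count)   # how many drafting passes it took

# The same type annotations let pydantic-graph render a Mermaid diagram:
print(grant_graph.mermaid_code(start_node=DraftProposal))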

CRM Cleanup with Human Review

Merging duplicate donor records is genuinely risky. "John Smith, $50k lifetime donor" and "John Smith, one-time $20 gift" might be different people. An incorrect merge destroys data that can't easily be recovered.

One architectural pattern for this: don't let the AI execute directly. Have it propose merges, require human approval, then execute.

from uuid import UUID
from pydantic import BaseModel

class MergeProposal(BaseModel):
    primary_id: UUID
    duplicate_id: UUID
    confidence_score: float
    reasoning: str
    fields_to_merge: list[str]

The agent outputs MergeProposal objects only. If it returns anything that doesn't match this schema, validation fails and no action is taken.
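The fragment below also leans on a few definitions the post doesn't show. Roughly, something like this would back it (CRMState and scout_agent are hypothetical names used to make the fragment hang together; ExecuteMerge and RejectMerge would be ordinary nodes defined the same way as in the grant example):

from dataclasses import dataclass
from pydantic_ai import Agent
from pydantic_graph import BaseNode, End, GraphRunContext

@dataclass
class CRMState:
    donor_records: str
    pending_proposal: MergeProposal | None = None

scout_agent = Agent(
    'openai:gpt-4o',
    # Assumes your PydanticAI version accepts a union output that includes None;
    # if not, wrap "no duplicates found" in its own result model.
    output_type=MergeProposal | None,
    system_prompt='Flag likely duplicate donor records. Return nothing if unsure.',
)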

@dataclass
class ScoutDuplicates(BaseNode[CRMState]):
    async def run(self, ctx: GraphRunContext[CRMState]) -> AwaitApproval | End[None]:
        result = await scout_agent.run(
            f"Analyze these records for duplicates:\n{ctx.state.donor_records}"
        )

        if result.output is None:
            return End(None)

        ctx.state.pending_proposal = result.output
        return AwaitApproval()

@dataclass
class AwaitApproval(BaseNode[CRMState]):
    async def run(self, ctx: GraphRunContext[CRMState]) -> ExecuteMerge | RejectMerge:
        # Graph pauses here. State gets serialized.
        # Resume later with ExecuteMerge or RejectMerge.
        raise NotImplementedError("Resume with ExecuteMerge or RejectMerge")

pydantic-graph can persist state to disk or a database. The workflow pauses at AwaitApproval, a staff member reviews, and you resume with the appropriate next node.
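The rough shape of that pause-and-resume, assuming pydantic-graph's file-based persistence (the persistence API has shifted between releases, so treat this as an outline rather than exact usage):

from pathlib import Path
from pydantic_graph.persistence.file import FileStatePersistence

# One snapshot file per workflow run; a database-backed implementation of the
# same persistence interface works just as well.
persistence = FileStatePersistence(Path('merge_review_1042.json'))

# First run: execute nodes until AwaitApproval, persisting state at each step.
# After a staff member signs off, start a second run from the saved snapshot
# with ExecuteMerge() (or RejectMerge()) as the next node.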

This pattern adds latency and requires building a review UI. It's worth the tradeoff when incorrect actions have high cost. For lower-stakes operations, direct execution might be fine.

When to Use What

Not everything needs a graph. The PydanticAI docs are upfront about this: graphs add complexity.

What you're building                               Approach
Q&A, content generation, simple extraction         Single agent
Task that needs to call APIs or query databases    Agent with tools (sketch below)
Multi-step but linear workflow                     Agent delegation (one agent calls another)
Loops, branches, or mid-workflow pauses            pydantic-graph
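The "agent with tools" row is the one pattern this post hasn't shown. A minimal sketch (lookup_agent and get_donor_total are hypothetical stand-ins for a real CRM query):

from pydantic_ai import Agent, RunContext

lookup_agent = Agent(
    'openai:gpt-4o',
    system_prompt='Answer questions about donors using the available tools.',
)

@lookup_agent.tool
async def get_donor_total(ctx: RunContext[None], donor_id: str) -> float:
    """Return lifetime giving for a donor. Stub: swap in a real CRM lookup."""
    return 125.0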

Getting Started

pip install pydantic-ai pydantic-graph

The PydanticAI docs are thorough. The graph documentation is worth reading if you're considering workflow orchestration, even if you end up choosing a different tool.

Where This Fits

Output formatting is one piece of the production AI puzzle. Accuracy, cost, latency, and prompt maintenance are separate challenges that structured output doesn't solve.

But for organizations integrating AI with existing systems (particularly systems that expect specific data formats), having a framework that enforces output structure removes a real source of bugs. For non-profits managing donor relationships and grant compliance, where data quality directly affects operations, that's a meaningful reduction in integration friction.


Written by

Ryan Davis

Systems thinker and accessibility advocate building AI/ML solutions with a focus on agentic workflows. When not coding for non-profits or tinkering with robotics, I geek out over distributed systems design and making technology work for everyone.