Generate Test Data with Local AI Models over MCP
Run a fully offline test-data workflow: a local AI model writes SQL, Seedmancer's MCP server runs and snapshots it locally, and your schema never leaves your machine.
Most "AI test data" workflows send your database schema to a cloud model. For a side project that is fine. For a codebase under an NDA, a healthcare schema, or a financial system, it is a non-starter — the structure of your database is sensitive, and shipping it to a third-party API is not an option.
The good news is that you do not need a cloud model at all. Seedmancer's MCP server runs entirely on your machine, and it is designed so that the AI agent, not Seedmancer, writes the SQL. Point that agent at a local model running through Ollama, and the whole loop becomes air-gapped: the model writes the SQL locally, Seedmancer runs it against your local database, and nothing ever leaves your laptop.
This post explains how the pieces fit together and how to set up a fully local test-data generation workflow.
Why the MCP server is local by design
The Model Context Protocol lets an AI agent call external tools instead of just producing text. Seedmancer ships an MCP server (seedmancer mcp) that exposes its CLI surface as typed tools.
The important design decision is in the server's own instructions to the agent:
There is no cloud AI generation tool here — you are the AI. YOU generate the test data by writing SQL and calling Seedmancer to run, snapshot, and manage it.
In other words, Seedmancer does not call out to an AI service to generate data. It hands the agent a set of tools — inspect the schema, run SQL, snapshot the result — and the agent supplies the intelligence. Whatever model backs that agent is where the generation actually happens.
When that model is a cloud one (Claude, GPT), your schema is described to a remote API. When that model is local (an open-weight model running on your own hardware), nothing leaves the machine. The tools and the workflow are the same; the only thing that changes is where the model runs.
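To make "calling a tool" concrete, here is a rough sketch of what a tool call looks like on the wire. MCP is built on JSON-RPC 2.0, a tool invocation is a `tools/call` request, and the stdio transport sends each message as a single line of JSON. The tool name `get_status` comes from this post; the helper `tool_call` is hypothetical, and passing an empty `arguments` object is an assumption about what that tool accepts.

```python
import json
from itertools import count
from typing import Optional

# JSON-RPC requires an id per request so replies can be matched to calls.
_ids = count(1)


def tool_call(name: str, arguments: Optional[dict] = None) -> str:
    """Serialize an MCP `tools/call` request as one line of JSON-RPC 2.0."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }
    # json.dumps escapes newlines inside strings, so the result is always a
    # single line, which is what a newline-delimited stdio transport expects.
    return json.dumps(message, separators=(",", ":"))


print(tool_call("get_status"))
# prints: {"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"get_status","arguments":{}}}
```

In practice the MCP host builds and sends these messages for you, after an `initialize` handshake with the server. The point of the sketch is only that the exchange is plain JSON over a local pipe, with no network hop in between.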
The fully local stack
A completely offline test-data generation setup has three local pieces:
┌─────────────────────────────────────────────┐
│  Your machine                               │
│                                             │
│  Local model (Ollama)                       │
│      │ writes SQL                           │
│      ▼                                      │
│  MCP host (Continue / Cline / Zed)          │
│      │ calls tools over stdio               │
│      ▼                                      │
│  seedmancer mcp                             │
│      │ runs SQL, snapshots CSVs             │
│      ▼                                      │
│  Local PostgreSQL                           │
└─────────────────────────────────────────────┘
- A local model served by Ollama (or LM Studio). This is the brain that writes the SQL.
- An MCP-capable host that supports local models — for example Continue or Cline, both of which can use an Ollama model and connect to MCP servers.
- Seedmancer's stdio MCP server, spawned locally by the host.
No API token, no internet connection, and no cloud quota are required for any of this. Local generation (generate_dataset_local) runs against your local database and snapshots the result to CSVs on disk.
Step 1: install and initialise Seedmancer
# macOS
brew install KazanKK/tap/seedmancer
# or via Go
go install github.com/KazanKK/seedmancer@latest
From your project root:
seedmancer init
Provide a postgres:// URL when prompted. This writes seedmancer.yaml with a local environment. Then capture a baseline so the agent has a schema to work against:
seedmancer export myapp/baseline
Note that init, export, seed, and local generation all work offline — seedmancer login is only needed for cloud push/pull, which this workflow does not use.
Step 2: run a local model with Ollama
Install Ollama and pull a model that is good at SQL. Larger code-capable models produce better schema-aware SQL, but even mid-size models handle straightforward fixtures well:
ollama pull qwen2.5-coder:14b
# or a smaller option
ollama pull llama3.1:8b
Ollama serves its API on http://localhost:11434 by default. Prompts and completions stay on that loopback address; nothing is sent anywhere else.
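Before wiring up a host, it can help to confirm that Ollama is listening and that the model you pulled is available. The sketch below queries Ollama's `/api/tags` endpoint, which lists locally installed models. The function names are hypothetical, and the URL assumes the default port.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default address; adjust if you changed it


def model_names(tags_payload: dict) -> list:
    """Extract model names from an /api/tags response body."""
    return sorted(m["name"] for m in tags_payload.get("models", []))


def installed_models(base_url: str = OLLAMA_URL) -> list:
    """Ask the local Ollama server which models it has on disk."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return model_names(json.load(resp))
```

With Ollama running, `installed_models()` should include `qwen2.5-coder:14b` if you pulled it above. Running `ollama list` in a terminal gives the same information.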
Step 3: connect Seedmancer's MCP server to a local-model host
Add Seedmancer to your MCP host's config. The server shape is the same across hosts:
{
"mcpServers": {
"seedmancer": {
"command": "seedmancer",
"args": ["mcp", "--log-file", "/tmp/seedmancer-mcp.log"]
}
}
}
In Continue, configure the model provider to point at your local Ollama instance and enable the Seedmancer MCP server. In Cline, select the Ollama model and add the same MCP server entry. The host spawns seedmancer mcp over stdio when it starts.
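If the host fails to start the server, the usual culprits are a missing entry or a binary that is not on the host's PATH. A small, hypothetical pre-flight check along these lines can catch both. It takes the config as text because the file's location differs between hosts, and `check_mcp_entry` is an illustrative name, not something Seedmancer ships.

```python
import json
import shutil


def check_mcp_entry(config_text: str, server: str = "seedmancer",
                    which=shutil.which) -> list:
    """Return a list of problems found in an MCP host config (empty if none)."""
    entry = json.loads(config_text).get("mcpServers", {}).get(server)
    if entry is None:
        return [f"no mcpServers.{server} entry"]

    problems = []
    command = entry.get("command", "")
    # The host spawns this command directly, so it must resolve on PATH.
    if not command or which(command) is None:
        problems.append(f"command {command!r} is not on PATH")
    if "mcp" not in entry.get("args", []):
        problems.append("args should include the 'mcp' subcommand")
    return problems
```

Pass in the contents of your host's config file; an empty list means the entry looks sane and the `seedmancer` binary can be found.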
Once connected, run this once per project so the workflow guidance persists across conversations:
Run the install_agent_rules tool.
This writes a rules file into the project so the agent reliably reaches for Seedmancer's tools when you ask for test data.
Step 4: generate data from a prompt
Now describe the state you need in plain language. You do not write SQL; the local model does:
"Create a scenario called billing/pro with a premium workspace, three members, and an expired trial subscription. Make the data realistic."
Behind the scenes the agent works through Seedmancer's tools:
- get_status and list_schemas confirm the project is set up.
- describe_schema returns the exact tables and columns so the SQL matches your real structure.
- generate_dataset_local runs a full, idempotent SQL script the model wrote (TRUNCATE before INSERT for every populated table), with inherit: "myapp/baseline" so foreign-key dependencies resolve and prompt set to your request so the intent is saved on the scenario.
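The "TRUNCATE before INSERT for every populated table" rule is easy to check mechanically, which is handy when a smaller local model occasionally forgets a table. The following is a rough lint sketch, not part of Seedmancer: it uses regexes instead of a real SQL parser, splits naively on semicolons (so a semicolon inside a string literal would confuse it), and the table names in the example are made up.

```python
import re

_TRUNCATE = re.compile(r'^\s*truncate\s+(?:table\s+)?(.+)$', re.I | re.S)
_INSERT = re.compile(r'^\s*insert\s+into\s+([\w."]+)', re.I)
# Keywords that may appear around the table list in a PostgreSQL TRUNCATE.
_OPTIONS = re.compile(
    r'\b(?:restart\s+identity|continue\s+identity|cascade|restrict|only)\b', re.I)


def _normalize(name: str) -> str:
    return name.strip().replace('"', "").lower()


def inserts_without_truncate(sql: str) -> list:
    """List tables that are INSERTed into before any TRUNCATE names them."""
    truncated = set()
    offenders = []
    for statement in sql.split(";"):
        match = _TRUNCATE.match(statement)
        if match:
            names = _OPTIONS.sub("", match.group(1)).split(",")
            truncated.update(_normalize(n) for n in names)
            continue
        match = _INSERT.match(statement)
        if match:
            table = _normalize(match.group(1))
            if table not in truncated and table not in offenders:
                offenders.append(table)
    return offenders


script = """
TRUNCATE TABLE subscriptions, members, workspaces RESTART IDENTITY CASCADE;
INSERT INTO workspaces (id, name, plan) VALUES (1, 'Acme', 'premium');
INSERT INTO members (id, workspace_id, email) VALUES (1, 1, 'a@example.com');
INSERT INTO audit_log (id, note) VALUES (1, 'seeded');
"""
print(inserts_without_truncate(script))  # prints: ['audit_log']
```

A script that passes this check can be run repeatedly and land in the same state each time, which is what makes it safe to regenerate a scenario.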
Seedmancer runs the SQL inside a transaction against your local database, exports the result to CSVs, and stores it as revision r001 under .seedmancer/scenarios/billing/pro/. Seed it instantly:
seedmancer seed billing/pro --yes
Because the model ran locally and the SQL ran locally, your schema and data never touched a network.
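If you want tests to start from this scenario automatically, the seed command can be wrapped in a small helper. This is a minimal sketch: the `seed` subcommand and the `--yes` and `--revision` flags appear in this post, but combining them in this order is an assumption, and the helper names are hypothetical.

```python
import subprocess
from typing import Optional


def seed_command(scenario: str, revision: Optional[str] = None) -> list:
    """Build the argv for seeding a scenario, optionally pinned to a revision."""
    argv = ["seedmancer", "seed", scenario, "--yes"]
    if revision:
        argv += ["--revision", revision]
    return argv


def seed(scenario: str, revision: Optional[str] = None) -> None:
    """Seed the local database; raises CalledProcessError if seeding fails."""
    subprocess.run(seed_command(scenario, revision), check=True)
```

Calling `seed("billing/pro")` from a test setup hook resets the database to the snapshot before each run, with no model involved.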
Why local generation is still reproducible
A local model is non-deterministic, but the output of this workflow is not a one-off. What gets saved is a revision:
- Immutable revisions: r001, r002, … are never modified. latest advances, but any historical revision can be seeded with --revision r001.
- Saved SQL and purpose: the SQL the model wrote, plus your original prompt, are stored on the scenario. Anyone can retrieve them with the get_dataset_sql tool.
- Schema fingerprint: every revision is bound to the schema it was captured against, so drift is detectable.
The model's creativity happens once. After that, every developer and every CI run seeds the exact same CSVs from the committed .seedmancer/ folder — no model required at seed time.
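One way to convince yourself that two machines are seeding identical data is to hash the committed CSVs. The sketch below computes a single SHA-256 digest over every CSV under a folder. It is independent of Seedmancer and assumes only that the snapshot consists of `.csv` files somewhere under the path you pass in.

```python
import hashlib
from pathlib import Path


def snapshot_digest(scenario_dir: str) -> str:
    """Return one SHA-256 hex digest covering all CSV files under a folder."""
    root = Path(scenario_dir)
    digest = hashlib.sha256()
    # Sort so the digest does not depend on filesystem listing order.
    for csv_path in sorted(root.rglob("*.csv")):
        # Include the relative path so renaming a file changes the digest.
        digest.update(csv_path.relative_to(root).as_posix().encode())
        digest.update(b"\0")
        digest.update(csv_path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()
```

Running `snapshot_digest(".seedmancer/scenarios/billing/pro")` on a laptop and in CI should give the same value as long as both have the same commit checked out.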
Handling schema changes locally
When your schema changes, the saved scenario may no longer fit. The local agent can fix it without any cloud round-trip:
- check_state_schema returns a structured diff between the stored revision and the live database.
- get_dataset_sql retrieves the old SQL and the saved purpose as a reference.
- The model rewrites the full SQL to match the new schema, keeping it true to the original intent.
- generate_dataset_local creates a new revision from the rewritten SQL.
Old revisions are preserved, so you can always roll back.
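To give a feel for what a "structured diff" between a stored revision and the live database might contain, here is an illustrative sketch. It is not Seedmancer's implementation or output format: the input shape (`{table: {column: type}}`) and the keys in the result are assumptions chosen for the example.

```python
def schema_diff(stored: dict, live: dict) -> dict:
    """Compare two {table: {column: type}} maps and report what changed."""
    diff = {
        "added_tables": sorted(live.keys() - stored.keys()),
        "removed_tables": sorted(stored.keys() - live.keys()),
        "changed_columns": {},
    }
    for table in sorted(stored.keys() & live.keys()):
        old, new = stored[table], live[table]
        changes = {
            "added": sorted(new.keys() - old.keys()),
            "removed": sorted(old.keys() - new.keys()),
            "retyped": sorted(c for c in old.keys() & new.keys() if old[c] != new[c]),
        }
        # Only report tables where something actually differs.
        if any(changes.values()):
            diff["changed_columns"][table] = changes
    return diff


stored = {"users": {"id": "int", "email": "text"}, "trials": {"id": "int"}}
live = {"users": {"id": "bigint", "email": "text", "plan": "text"},
        "subscriptions": {"id": "int"}}
print(schema_diff(stored, live))
```

A diff in roughly this shape tells the model exactly which INSERT statements need new columns, which tables disappeared, and which can stay as they were.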
When to prefer local AI generation
| Situation | Local AI generation fits because |
|---|---|
| Sensitive or regulated schema | Schema and data never leave the machine |
| Air-gapped or offline environment | No network or API token required |
| No cloud budget | Local generation consumes no quota |
| Privacy-conscious team | The model runs on hardware you control |
For teams without those constraints, a cloud model through an MCP host works identically and is often faster at writing complex SQL. The workflow and the resulting revisions are the same either way — local AI simply moves the model onto your own hardware.
Summary
Seedmancer's MCP server is local-first by design: the agent writes the SQL, Seedmancer runs and snapshots it, and generate_dataset_local never needs the cloud. Pair that server with a local model through Ollama and an MCP-capable host, and you get a fully air-gapped test-data generation loop — schema-aware, reproducible, and private.
See the CLI documentation for the full MCP configuration and tool reference, or read the companion guide on the general AI + MCP workflow.