Tags: Test data, Seed scripts, Database, PostgreSQL, Best practices

Why Seed Scripts Fail at Scale (and What to Use Instead)

Seed scripts start simple and become unmaintainable. Discover the four ways they break as projects grow and how state-based scenarios solve each one.

Every project starts with a seed script. A handful of INSERT statements, maybe a seeds.sql file or a db/seeds.rb, enough to boot the application and run the first few tests. It works. Nobody thinks about it again.

Six months later, the script takes two minutes to run, breaks every time someone touches the schema, and three different developers maintain three different versions of it in three different branches. Nobody knows which one is correct.

This is the natural lifecycle of a seed script. Understanding why it fails — and what replaces it — is the first step to getting reliable test data back.


Why seed scripts feel right at first

Seed scripts are appealing because they are transparent. You can read a SQL file and understand exactly what data will be in the database. There are no abstractions, no framework magic, just INSERT statements that anyone on the team can edit.

They are also universal. Every database supports them. Every language has a way to run them. They fit naturally into existing workflows — run once after migrations, commit to the repository alongside the code, done.

The problems only emerge when the codebase grows.


The four ways seed scripts break

1. They accumulate without a strategy

Seed scripts grow by addition. Someone needs a new test user with admin privileges, so they add an INSERT. Someone else needs an order in a pending state, so they add another. Nobody removes anything, because removing rows might break a test nobody can trace back to the script.

Within a year, the script inserts hundreds of rows across dozens of tables. Half of it is not used by any test. The other half is referenced in a way nobody documented.

Running the script now takes two minutes. Running it multiple times creates duplicates. Resetting the database between test suites is not practical anymore, so tests start sharing state.
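The duplicate problem is easy to reproduce. The sketch below is a minimal illustration, using Python's built-in sqlite3 module as a stand-in for PostgreSQL; the `users` table and the seed statements are hypothetical, not taken from any real project.

```python
import sqlite3

# Hypothetical seed script: plain INSERTs with no uniqueness guard.
SEED_SQL = """
INSERT INTO users (email, role) VALUES ('admin@example.com', 'admin');
INSERT INTO users (email, role) VALUES ('member@example.com', 'member');
"""


def run_seed(conn: sqlite3.Connection) -> None:
    """Run the whole seed script, as a developer would after migrations."""
    conn.executescript(SEED_SQL)


def count_users(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, role TEXT)")

run_seed(conn)
first_run = count_users(conn)   # rows after the first run

run_seed(conn)
second_run = count_users(conn)  # rows after running the same script again

print(first_run, second_run)  # 2 4
```

Nothing in the script says whether it expects an empty database, so the second run doubles every row without complaint.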

2. Schema changes silently break them

A NOT NULL column is added. A foreign key is tightened. A table is renamed.

If nobody updates the seed script immediately, it breaks the next time it runs. The failure is often cryptic: a constraint violation deep in a chain of foreign key inserts or, on databases that allow it (MySQL in non-strict mode, for example), a silent truncation of a value that no longer fits the updated column type.

In practice, seed script maintenance lags behind schema migrations. The gap between "migration merged" and "seed script updated" is where tests break for no apparent reason.
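A small sketch of that gap, again using Python's sqlite3 as a stand-in for PostgreSQL. The schemas and the seed statement are hypothetical; the "migration" is simulated by creating the table with the newer definition.

```python
import sqlite3

# Hypothetical seed statement written against the original schema.
OLD_SEED = "INSERT INTO users (email) VALUES ('admin@example.com')"

SCHEMA_V1 = "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)"
# After the migration: a new NOT NULL column the seed script knows nothing about.
SCHEMA_V2 = (
    "CREATE TABLE users ("
    "id INTEGER PRIMARY KEY, email TEXT NOT NULL, role TEXT NOT NULL)"
)


def seed_against(schema_sql: str) -> str:
    """Create a fresh database with the given schema and run the old seed."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(schema_sql)
        conn.execute(OLD_SEED)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"failed: {exc}"
    finally:
        conn.close()


before = seed_against(SCHEMA_V1)
after = seed_against(SCHEMA_V2)
print(before)
print(after)
```

The seed itself did not change between the two runs. Only the schema did, which is why the failure shows up far from the commit that caused it.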

3. They represent one global state, not multiple scenarios

A good test suite has multiple starting conditions. An API test needs a user without a confirmed email. An E2E test needs an admin account with specific permissions. A billing test needs a workspace with an expired subscription.

Seed scripts model one state. If you need multiple, you add if conditions, environment variables, or multiple files. The logic accumulates. The implicit dependencies between different pieces of the state become hard to reason about.

Teams often resort to creating new users and objects at the start of each test rather than seeding proper scenarios, which means tests are creating state rather than starting from it.

4. They are not reproducible across machines

A seed script that runs successfully on your laptop may fail on a colleague's machine because the database has different data, different ordering, or a different version of the schema.

CI environments may run with a clean database, making them inconsistent with local environments that have accumulated months of leftover test rows.

"Works on my machine" is a symptom of seed scripts, not a coincidence.
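The divergence can be shown with one seed statement and two starting databases. This is a minimal sketch in Python with sqlite3 standing in for PostgreSQL; the table, the leftover row, and the "CI" and "laptop" labels are all hypothetical.

```python
import sqlite3

SCHEMA = "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE NOT NULL)"
# Hypothetical seed statement that hard-codes a primary key.
SEED = "INSERT INTO users (id, email) VALUES (1, 'admin@example.com')"


def fresh_db(leftover_rows=()):
    """Build a database, optionally pre-populated with leftover test rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO users (id, email) VALUES (?, ?)", leftover_rows)
    conn.commit()
    return conn


def try_seed(conn) -> bool:
    """Return True if the seed applies cleanly, False on a constraint error."""
    try:
        conn.execute(SEED)
        return True
    except sqlite3.IntegrityError:
        return False


# A clean CI database versus a laptop that still holds an old row with id 1.
ci_result = try_seed(fresh_db())
laptop_result = try_seed(fresh_db([(1, "old-test@example.com")]))
print(ci_result, laptop_result)  # True False
```

The same script gives two different outcomes because its result depends on whatever was in the database before it ran.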


What state-based test data looks like

The alternative to a global seed script is a set of named scenarios, each representing a specific application state that tests start from.

Instead of:

psql myapp < seeds.sql

You define states:

| Scenario | State |
|---|---|
| myapp/baseline | Migrations applied, reference data loaded, no user rows |
| billing/free | One workspace on the free plan |
| billing/pro | Premium workspace with three members and an active subscription |
| billing/expired | Premium workspace with a lapsed trial |
| auth/locked | A user account locked after failed login attempts |

Each scenario is an immutable snapshot of your database at the moment it was captured: CSVs aligned to your actual schema, plus a content-addressed fingerprint that detects whether the schema has changed since.

Seeding any of them takes under a second:

seedmancer seed billing/pro --yes
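To make the idea of a named scenario as a set of per-table CSV snapshots concrete, here is a rough sketch in Python. It is not Seedmancer's implementation or file format: the in-memory `SCENARIOS` layout, the tables, and the `seed_scenario` helper are assumptions made for illustration, and sqlite3 stands in for PostgreSQL.

```python
import csv
import io
import sqlite3

SCHEMA = """
CREATE TABLE workspaces (id INTEGER PRIMARY KEY, name TEXT NOT NULL, plan TEXT NOT NULL);
CREATE TABLE members (
    id INTEGER PRIMARY KEY,
    workspace_id INTEGER NOT NULL REFERENCES workspaces(id),
    email TEXT NOT NULL
);
"""

# Hypothetical layout: scenario name -> table name -> CSV text.
SCENARIOS = {
    "billing/free": {
        "workspaces": "id,name,plan\n1,Acme,free\n",
        "members": "id,workspace_id,email\n",
    },
    "billing/pro": {
        "workspaces": "id,name,plan\n1,Acme,pro\n",
        "members": (
            "id,workspace_id,email\n"
            "1,1,a@example.com\n2,1,b@example.com\n3,1,c@example.com\n"
        ),
    },
}


def seed_scenario(name: str) -> sqlite3.Connection:
    """Start from an empty schema and load exactly one named state."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for table, csv_text in SCENARIOS[name].items():
        reader = csv.DictReader(io.StringIO(csv_text))
        columns = reader.fieldnames
        placeholders = ", ".join("?" for _ in columns)
        sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
        conn.executemany(sql, ([row[c] for c in columns] for row in reader))
    conn.commit()
    return conn


pro = seed_scenario("billing/pro")
print(pro.execute("SELECT COUNT(*) FROM members").fetchone()[0])  # 3
```

Each test picks the state it needs by name and always starts from an empty schema, so no scenario can leak rows into another.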

How scenarios survive schema changes

The key difference from a seed script is the schema fingerprint.

When Seedmancer captures a scenario, it records a fingerprint of the schema at that moment. When you later run seedmancer seed billing/pro, it checks whether the current schema matches the fingerprint. If the schema has changed, the seed is blocked and you are told which scenario needs updating.

This is the opposite behavior from a seed script, which silently fails — or silently succeeds while inserting bad data.
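The general idea of a schema fingerprint check can be sketched in a few lines. This is an illustration only and does not reflect how Seedmancer computes or stores its fingerprint: hashing the table definitions with SHA-256, the `schema_fingerprint` and `seed_if_schema_matches` helpers, and the use of sqlite3 in place of PostgreSQL are all assumptions.

```python
import hashlib
import sqlite3


def schema_fingerprint(conn: sqlite3.Connection) -> str:
    """Hash the table definitions in a stable order (illustrative scheme)."""
    rows = conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    canonical = "\n".join(f"{name}:{sql}" for name, sql in rows)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def seed_if_schema_matches(conn, recorded: str, seed_sql: str) -> None:
    """Refuse to seed when the live schema differs from the recorded one."""
    if schema_fingerprint(conn) != recorded:
        raise RuntimeError(
            "schema changed since the scenario was captured; refresh it first"
        )
    conn.executescript(seed_sql)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
recorded = schema_fingerprint(conn)  # taken when the scenario is captured

# Schema unchanged: the seed goes through.
seed_if_schema_matches(conn, recorded, "INSERT INTO users (email) VALUES ('a@example.com');")

# A migration lands, and the next seed attempt is blocked instead of guessing.
conn.execute("ALTER TABLE users ADD COLUMN role TEXT")
try:
    seed_if_schema_matches(conn, recorded, "INSERT INTO users (email) VALUES ('b@example.com');")
    blocked = False
except RuntimeError:
    blocked = True

print(blocked)  # True
```

The useful property is that the check fails loudly before any row is written, rather than halfway through a chain of inserts.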

When your schema changes, you update the affected scenarios explicitly:

seedmancer refresh billing/pro

The refresh command (Pro plan) adapts the existing scenario data to the new schema and creates a new revision. An MCP agent in an AI host can do the same locally via generate_dataset_local. Old revisions are preserved, so you can always roll back to any previous state.


Scenarios are versioned and shareable

Each time you update a scenario, Seedmancer creates a new immutable revision: r001, r002, and so on. The latest pointer tracks the most recent one, but previous revisions are always available:

seedmancer seed billing/pro --revision r002

The scenario folder lives inside .seedmancer/scenarios/, which is safe to commit to the repository. It contains only CSVs and JSON — no secrets, no credentials. Every developer and every CI runner can use it without any cloud account.

For teams that want to share datasets across environments or machines without committing them, Seedmancer's cloud push and pull (Pro) handles that too:

seedmancer push billing/pro
seedmancer pull billing/pro

The migration path from a seed script

You do not need to throw away your seed script. Use it as the starting point for a baseline scenario:

  1. Run your existing seed script against a fresh database.
  2. Export the result:
seedmancer export myapp/baseline

That captures the schema fingerprint and a CSV snapshot of every table. Your existing baseline is now a Seedmancer scenario.

From there, add scenario-specific variants:

# Generate a billing/pro scenario from a prompt in your AI host
# or write the SQL yourself and pipe it via the CLI:
seedmancer generate-local billing/pro --inherit myapp/baseline

Your seed script stays in place for reference if needed, but the scenarios become the authoritative starting states for your tests.


Conclusion

Seed scripts are not wrong. They are a natural early-stage solution to a problem that only becomes visible as the project grows.

The failure mode is predictable: a single global state that accumulates without strategy, breaks silently on schema changes, and diverges across machines and environments.

The solution is not a better seed script — it is a different model: named, versioned, schema-aware scenarios that represent the specific states your tests need to start from.

If your seed script is already causing friction, exporting your current state as a baseline and layering scenarios on top is the quickest path to reliable test data. The CLI documentation covers the full workflow.