Executable Specifications — The Future of Product Engineering

Engineering Strategy • November 27, 2025

How machine-readable PRDs, ProdMoh, and the Model Context Protocol (MCP) let teams turn acceptance criteria into executable tests, shift verification left, and make AI-assisted code generation reliable and auditable.

Introduction — Why “Writing Requirements” No Longer Scales

For decades, the development loop was human-centered: PMs wrote requirements, engineers interpreted them, QA verified behavior. That model functioned when releases were slower and humans were the only coders. Today, with AI agents and copilots writing large portions of production code and teams shipping continuously, requirements must be both human- and machine-readable. Executable specifications are structured, test-generating artifacts that close the gap between intent and implementation.

What are Executable Specifications?

An executable specification is a structured, machine-readable representation of product behavior that can be transformed automatically into tests, mocks, CI checks and validation logic. Instead of merely describing expected behavior, an executable spec verifies it.

Properties of executable specifications:

  • Machine-readable: structured schema (JSON/AST) rather than free-form prose.
  • Deterministic: versioned artifacts produce reproducible tests and scaffolds.
  • Test-generating: maps acceptance predicates to language-specific test templates.
  • CI-verifiable: pipelines validate PRD→tests→implementation alignment.
  • Traceable: every test maps back to PRD version and story ID for auditability.

Why Traditional Requirements Fail in the AI + Agent Era

Traditional PRDs fail for several reasons, and these failures are magnified when AI participates in code generation:

  1. Ambiguity: Prose invites AI to invent defaults that don't match product intent.
  2. Decay: Clarifications in chat or meetings seldom update canonical docs, creating drift.
  3. IDE invisibility: Developers and agents need requirements in the same environment they code.
  4. Non-determinism: AI models will choose plausible but incorrect implementations without explicit constraints.
  5. Late validation: QA discovers mismatches post-integration, causing costly rework.

Core Attributes — What a Robust Executable Spec Looks Like

At minimum, an executable specification should include the following building blocks:

1. Predicates (Acceptance Criteria)

Atomic boolean assertions about system behavior. Example (JSON):

{
  "type": "predicate",
  "expr": "response.status == 200 && response.json.items.length > 0"
}
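To make the idea concrete, here is a minimal sketch (illustrative only, not the ProdMoh runtime) of how a test harness might evaluate such a predicate against a captured response:

```javascript
// Minimal predicate evaluator — a sketch, not a production design.
// A real harness would parse the expression into an AST and whitelist
// operators; `new Function` is used here only to keep the example short.
function evaluatePredicate(expr, response) {
  const fn = new Function('response', `return (${expr});`);
  return Boolean(fn(response));
}

const response = {
  status: 200,
  json: { items: [{ id: 'p-123' }] },
};

console.log(evaluatePredicate(
  "response.status == 200 && response.json.items.length > 0",
  response
)); // true
```

Because the predicate is data rather than prose, the same expression can drive a unit test, a mock validator, and a CI check without re-interpretation.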

2. Examples (Ground Truth)

Concrete I/O cases that anchor behavior and reduce hallucination:

{
  "input": { "query": "red shoes" },
  "output": { "min_results": 1 }
}

3. Invariants

Business invariants that must always hold (e.g., price >= 0), enforced as test assertions and CI checks.
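An invariant such as price >= 0 can be enforced with one assertion reused in unit tests and in a CI data check. A minimal sketch (the item shape here is an assumption for illustration):

```javascript
// Enforce the "price >= 0" invariant over a batch of items.
// Throws on the first violating batch so CI can fail the build.
function assertInvariants(items) {
  const violations = items.filter((item) => !(item.price >= 0));
  if (violations.length > 0) {
    throw new Error(`invariant violated: price >= 0 (${violations.length} items)`);
  }
}

assertInvariants([{ id: 'p-123', price: 199 }]); // passes silently
try {
  assertInvariants([{ id: 'p-999', price: -5 }]);
} catch (e) {
  console.log(e.message); // invariant violated: price >= 0 (1 items)
}
```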

4. Non-functional Requirements (NFRs)

NFRs (latency, throughput, security) must be first-class fields translated into CI smoke tests and policy checks:

{
  "nfr": "latency",
  "max_ms": 300
}

5. Metadata & Versioning

Each PRD and story must carry version metadata (semver/timestamp) for reproducibility and audit trails.

How Executable Specs Flow Through the Development Lifecycle

Here’s an end-to-end pattern that teams can implement now:

  1. Author: PMs author structured PRDs in ProdMoh using canonical schema and validation rules.
  2. Publish: ProdMoh publishes PRD fragments via MCP and issues scoped tokens.
  3. Consume: IDE plugins retrieve relevant story context (predicates, examples, NFRs) for the current file/branch.
  4. Generate: IDE agents propose unit/integration tests and mocks derived from acceptance criteria.
  5. Review: Developers review, run, and commit the tests alongside code; PR metadata references PRD versions and story IDs.
  6. Enforce: CI validates PRD-to-tests mapping, runs NFR checks, and prevents merges if alignment fails.

Concrete Example — Minimal End-to-End

ProdMoh PRD (canonical JSON)

{
  "prdId": "prod-2025-payment-v1",
  "meta": { "version": "2025.11.1", "author": "pm@company.com" },
  "stories": [
    {
      "id": "S-100",
      "title": "Display paid badge on priced items",
      "acceptance": [
        { "type": "predicate", "expr": "response.json.items[0].badges contains 'paid'" }
      ],
      "examples": [
        { "query": "red shoes", "product": { "id": "p-123", "price": 199 } }
      ]
    }
  ]
}

IDE Client Request (pseudo)

GET /mcp/prd/prod-2025-payment-v1/stories?file=src/components/search.ts
Authorization: Bearer <MCP-TOKEN>

Generated Test (Jest)

test('search includes paid badge for priced product', async () => {
  // generated from story S-100
  const res = await search('red shoes');
  expect(res.json.items[0].badges).toContain('paid');
});

Commit message example: refs prd:prod-2025-payment-v1#S-100

Mechanics — From Predicates to Tests

Turning acceptance criteria into tests involves a few hardened steps designed to minimize false positives and preserve human oversight:

1. Canonicalization

Convert free-form acceptance prose into a small set of canonical primitives (predicates, invariants, examples). This can be enforced with a linter in ProdMoh.
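For example, a pre-publish linter might flag ambiguity markers in acceptance prose before it can be canonicalized. A sketch, with an invented word list (ProdMoh's actual rules may differ):

```javascript
// Flag vague words in acceptance criteria so PMs rewrite them as
// predicates. The word list below is illustrative, not canonical.
const AMBIGUOUS = ['should', 'fast', 'user-friendly', 'etc', 'appropriate'];

function lintAcceptance(text) {
  const words = text.toLowerCase().match(/[a-z-]+/g) || [];
  return AMBIGUOUS.filter((w) => words.includes(w));
}

console.log(lintAcceptance('Search should feel fast and appropriate'));
// → [ 'should', 'fast', 'appropriate' ]
```

A real linter would also verify that every story carries at least one predicate and one example before publication.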

2. Mapping Layer

The MCP client contains a mapping layer that converts predicates into language-specific test scaffolds. Example mapping:

{
  "predicate": "response.json.items[0].badges contains 'paid'",
  "template": "test('<title>', async () => { const res = await <call>; expect(res.json.items[0].badges).toContain('paid'); });"
}
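A minimal sketch of how the mapping layer might render such a template, substituting the placeholders (`<title>`, `<call>`) from the mapping example above; the real MCP client is assumed to do far more validation:

```javascript
// Render a language-specific test scaffold from a mapping template by
// substituting <placeholder> fields. Throws if a field is missing so
// incomplete mappings fail loudly instead of emitting broken tests.
function renderTemplate(template, fields) {
  return template.replace(/<(\w+)>/g, (_, key) => {
    if (!(key in fields)) throw new Error(`missing field: ${key}`);
    return fields[key];
  });
}

const mapping = {
  predicate: "response.json.items[0].badges contains 'paid'",
  template:
    "test('<title>', async () => { const res = await <call>; " +
    "expect(res.json.items[0].badges).toContain('paid'); });",
};

const scaffold = renderTemplate(mapping.template, {
  title: 'search includes paid badge for priced product',
  call: "search('red shoes')",
});

console.log(scaffold);
```

The rendered output matches the generated Jest test shown in the end-to-end example earlier.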

3. Confidence Scoring

Assign a confidence score to each generated artifact based on example completeness, predicate ambiguity, and presence of mocks. Low-confidence tests require PM signoff before CI treats them as authoritative.
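One way to combine those signals into a score — the weights and the 0.7 threshold below are invented for illustration, not a prescribed formula:

```javascript
// Hypothetical confidence scoring for a generated test artifact.
// Weights and threshold are illustrative assumptions.
function confidenceScore(artifact) {
  let score = 0;
  if (artifact.exampleCount >= 2) score += 0.4;      // example completeness
  else if (artifact.exampleCount === 1) score += 0.2;
  if (!artifact.predicateAmbiguous) score += 0.4;    // predicate clarity
  if (artifact.hasMocks) score += 0.2;               // mocks available
  return score;
}

const artifact = { exampleCount: 1, predicateAmbiguous: false, hasMocks: true };
console.log(confidenceScore(artifact) >= 0.7 ? 'authoritative' : 'needs PM signoff');
```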

4. Developer-in-the-loop

Generated tests are proposals — developers must review and edit them before committing. This safeguards against blind reliance on generated artifacts.

5. CI & Policy Gating

CI validates that PRD-derived tests exist for changed stories, PRD version metadata matches, and NFR policies pass. Treat PRD version mismatches as a blocking failure in the pipeline.
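A sketch of such a CI gate; the input shape and field names are assumptions for illustration, not a ProdMoh or MCP contract:

```javascript
// Illustrative CI gate: collect blocking errors when a changed story
// lacks PRD-derived tests or the committed PRD version has drifted
// from the published one.
function gate({ publishedVersion, commitVersion, changedStories, testsByStory }) {
  const errors = [];
  if (publishedVersion !== commitVersion) {
    errors.push(`PRD version mismatch: ${commitVersion} != ${publishedVersion}`);
  }
  for (const id of changedStories) {
    if (!testsByStory[id] || testsByStory[id].length === 0) {
      errors.push(`story ${id} has no PRD-derived tests`);
    }
  }
  return errors;
}

const errors = gate({
  publishedVersion: '2025.11.1',
  commitVersion: '2025.11.0',
  changedStories: ['S-100'],
  testsByStory: { 'S-100': [] },
});

console.log(errors.length); // 2
```

Any non-empty error list fails the pipeline, making spec drift visible at merge time instead of post-release.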

Non-functional Requirements & Policy Enforcement

NFRs (performance, security, privacy) require different handling. Represent them as policy-level artifacts in the PRD and translate them into CI smoke tests. Example:

{
  "nfr": { "latency_ms": 300, "p99": 450 }
}

CI executes lightweight performance checks and will block merges if thresholds are exceeded. Security-sensitive flows should carry a security-review flag in the PRD and require additional token scopes for generation and execution.
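A minimal sketch of such a latency smoke check, interpreting latency_ms as a median budget and p99 as a tail budget (that interpretation, and the sampling strategy, are assumptions for this example):

```javascript
// Compare measured request latencies against PRD-declared NFR
// thresholds. A real check would sample a live endpoint; here the
// samples are supplied directly.
function checkLatencyNfr(samplesMs, nfr) {
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const median = sorted[Math.floor(sorted.length / 2)];
  const p99 = sorted[Math.min(sorted.length - 1, Math.ceil(sorted.length * 0.99) - 1)];
  return {
    pass: median <= nfr.latency_ms && p99 <= nfr.p99,
    median,
    p99,
  };
}

const result = checkLatencyNfr([120, 180, 210, 250, 500], { latency_ms: 300, p99: 450 });
console.log(result.pass); // false — the 500 ms tail sample exceeds the 450 ms p99 budget
```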

Security, Tokens & Auditability

MCP tokens are the gateway to product intent. Treat them like sensitive credentials:

  • Scoped tokens: granular scopes (read:stories, annotate, generate-tests).
  • Short lifetimes: ephemeral developer tokens; CI tokens with restricted scopes.
  • Repository binding: bind tokens to repos or branches where possible to reduce blast radius.
  • Audit logs: record which PRD version produced which test and which token/IDE requested it.

Persist test provenance in three places: ProdMoh audit logs, PR/commit metadata, and CI build metadata. This supports compliance and post-incident analysis.
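As an illustration, a provenance record persisted in CI build metadata might look like the following; the field names are assumptions for this sketch, not a ProdMoh or MCP schema:

```json
{
  "prd": "prod-2025-payment-v1",
  "prdVersion": "2025.11.1",
  "story": "S-100",
  "generatedTest": "src/components/search.test.ts",
  "token": { "scope": "generate-tests", "client": "ide-plugin" },
  "commitRef": "refs prd:prod-2025-payment-v1#S-100",
  "generatedAt": "2025-11-27T00:00:00Z"
}
```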

Organizational Adoption — A Pragmatic Rollout Plan

Phase 0 — Foundations

  • Create canonical PRD templates emphasizing machine-parsable acceptance criteria.
  • Train PMs and add pre-publish linting in ProdMoh to reduce ambiguous ACs.
  • Establish token governance and audit pipelines in platform teams.

Phase 1 — Pilot

  • Choose one team, one medium-complexity feature, and validate the flow end-to-end.
  • Collect pilot metrics: PR cycle time, clarification threads, escaped defects.

Phase 2 — Scale

  • Roll out schema & linters org-wide, automate token issuance, and integrate NFR checks into CI.
  • Train additional PMs and platform engineers on best practices.

Key Metrics & Business Impact

Define explicit targets for the pilot and measure against them:

  • PR cycle time: target -20% in pilot
  • Clarification threads per PR: target -50% in pilot
  • Escaped defects: target -30% in pilot

These targets are reached through improved clarity, test generation, and CI enforcement rather than through better AI models alone.

Risks, Limitations & Mitigations

  • Overreliance on generated artifacts: require human review gates and low-confidence thresholds.
  • Mis-specified acceptance criteria: add pre-publish linting and template enforcement in ProdMoh.
  • Tooling fragmentation: offer lightweight SDKs, clear versioning, and incremental rollouts.

Executable Specs vs Traditional PRDs — Quick Comparison

| Dimension | Traditional PRD | Executable Spec |
| --- | --- | --- |
| Format | Freeform prose | Schema-driven JSON/structured fields |
| Audience | Humans | Humans + AI agents |
| Testability | Manual | Auto-generated |
| Traceability | Weak | Strong (versioned) |
| Discoverability | External to IDE | Pullable via MCP into IDE |

Practical Checklist — How to Start

  1. Adopt ProdMoh canonical schema and train PMs on predicate writing.
  2. Provision MCP tokens for a pilot team and configure IDE clients.
  3. Integrate PRD version checks into CI and require PR metadata that references PRD versions.
  4. Measure pilot metrics (30–60 days) and iterate templates and linters.

Conclusion — The Long Arc: From Ambiguity to Executable Intent

Executable specifications are not a theoretical novelty — they are a necessary evolution to support AI-assisted, agentic development. By making PRDs machine-readable, versioned, and test-generating, teams can dramatically reduce ambiguity and rework while enabling agents in IDEs to write code that aligns with product intent. ProdMoh + MCP provide the technical foundation today; disciplined authoring, governance, and CI policy make it safe and measurable in production.