
07 · Future

v1.5 · v2.0 · Roadmap

Three engines on a single enforcement philosophy: VALIDATE ships first, CONSENSUS raises the confidence bar, FORGE closes the fix loop. Every milestone has a concrete exit gate — cited evidence, not a feeling.

v1.0
VALIDATE — GA
v1.5
CONSENSUS — Q3 2026
v2.0
FORGE — Q1 2027
51 skills shipped · 19 commands shipped
Three-engine architecture (diagram):
FORGE builds code  →  VALIDATE proves it works  →  CONSENSUS confirms agreement
 (v2.0)                  (v1.0 — shipped)                (v1.5)
v1.0 · VALIDATE Engine (SHIPPED 2026-04-23)

The evidence-based functional validation engine. It refuses to let a feature ship without cited proof that it works against the real system. No mocks. No test files. No "it compiled" verdicts.

| Inventory | Count | Notes |
|---|---|---|
| Skills | 51 | Platform validators, quality gates, orchestration, analysis |
| Commands | 19 | /validate, /validate-fix, /validate-sweep, /validate-team, forge suite |
| Hooks | 7 | block-test-files, completion-claim-validator, validation-not-compilation … |
| Agents | 7 | platform-detector, evidence-capturer, verdict-writer, sweep-controller … |
| Rules | 9 | Installed to .claude/rules/vf-* via /forge-install-rules |
- Auto-detects iOS, Web, API, CLI, React Native, Flutter, Django/Flask, Rust CLI
- 7-phase pipeline: Research → Plan → Preflight → Execute → Analyze → Verdict → Ship
- Dependency-aware validator waves (DB → API → Web/iOS → Integration)
- 3-strike fix loop with fresh-evidence invariant per attempt
- Benchmark scoring: Coverage 35% + Evidence Quality 30% + Enforcement 25% + Speed 10%
- 7 hooks: test-file blocking, evidence gate, build-vs-validation guard, mock detection
- Self-contained HTML evidence dashboard with screenshot gallery and trend chart
- CI/CD exit codes for GitHub Actions / GitLab / Jenkins integration
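The benchmark score is a plain weighted sum over the four dimensions. A minimal sketch in Python, assuming a 0–100 scale per dimension; the weights come from the roadmap text, but the `benchmark_score` name and input shape are illustrative, not the plugin's API:

```python
# Weighted four-dimension benchmark score. The weights are stated in
# the roadmap; everything else here is an illustrative assumption.
WEIGHTS = {
    "coverage": 0.35,
    "evidence_quality": 0.30,
    "enforcement": 0.25,
    "speed": 0.10,
}

def benchmark_score(dimensions: dict[str, float]) -> float:
    """Weighted sum of the four dimensions, each scored 0-100."""
    return sum(WEIGHTS[name] * dimensions[name] for name in WEIGHTS)

score = benchmark_score(
    {"coverage": 90, "evidence_quality": 80, "enforcement": 100, "speed": 70}
)
# 0.35*90 + 0.30*80 + 0.25*100 + 0.10*70 = 87.5
```

Because the weights sum to 1.0, a project that maxes every dimension scores exactly 100.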
Example: PASS verdict with cited evidence (output)
Journey J2: User can delete account
Verdict: PASS
Evidence:
  step-01-navigate-to-settings.png  (23 KB) — Settings page rendered
  step-02-click-delete-button.png   (21 KB) — Confirmation dialog visible
  step-03-confirm-deletion.json     (1.2 KB) — DELETE /users/42 → 204
  step-04-redirect-to-login.png     (19 KB) — User landed at /login
  evidence-inventory.txt            (414 B)
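The evidence gate behind a verdict like the one above reduces to a simple check: a PASS is only accepted when every cited evidence file exists and is non-empty. A hypothetical sketch; `evidence_gate` and the NO_EVIDENCE downgrade are illustrative, not the shipped hook:

```python
from pathlib import Path

def evidence_gate(verdict: str, cited_files: list[Path]) -> str:
    """Downgrade a PASS to NO_EVIDENCE if any cited file is missing or empty.

    Illustrative sketch only: the real hook name and verdict labels
    may differ in the shipped plugin.
    """
    if verdict == "PASS":
        for f in cited_files:
            if not f.is_file() or f.stat().st_size == 0:
                return "NO_EVIDENCE"
    return verdict
```

FAIL verdicts pass through unchanged; only a claimed PASS has to prove itself.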
v1.5 · CONSENSUS Engine (PLANNED · Q3 2026)

The execution-time agreement gate. Where VALIDATE proves the system works once, CONSENSUS proves it works according to ≥2 independent validators. Single-validator verdicts are biased — CONSENSUS eliminates that bias for high-stakes features.

- /validate-consensus --validators N spawns N blind, independent validators on the same feature
- Evidence isolation: e2e-evidence/consensus/validator-{N}/ — no cross-writes permitted
- 5 synthesis states: UNANIMOUS_PASS, UNANIMOUS_FAIL, MAJORITY_PASS, MAJORITY_FAIL, SPLIT
- 3 confidence tiers: HIGH (unanimous), MEDIUM (≥⅔ after disagreement analysis), LOW (split)
- Disagreement protocol routes diverging criteria through sequential-analysis for root cause
- 4 disagreement types: missing evidence, contradictory evidence, interpretation gap, validator error
- SPLIT escalates to human — synthesizer never fabricates agreement to unblock a ship
- New agents: consensus-validator (N instances) + consensus-synthesizer (1 instance)
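The five synthesis states follow mechanically from the per-validator verdicts. A sketch, under the assumption that the majority cut matches the ⅔ confidence threshold; the `synthesize` function name and input shape are illustrative:

```python
def synthesize(verdicts: list[str]) -> str:
    """Map N blind validator verdicts ("PASS"/"FAIL") to one of the
    5 synthesis states. Assumes the majority threshold is the same
    2/3 cut used by the confidence tiers."""
    n = len(verdicts)
    passes = verdicts.count("PASS")
    fails = n - passes
    if passes == n:
        return "UNANIMOUS_PASS"
    if fails == n:
        return "UNANIMOUS_FAIL"
    if passes / n >= 2 / 3:
        return "MAJORITY_PASS"
    if fails / n >= 2 / 3:
        return "MAJORITY_FAIL"
    return "SPLIT"  # neither side reaches 2/3
```

With 3 validators, a 2-to-1 result lands exactly on the ⅔ boundary and counts as a majority; a 1-to-1 result with 2 validators is a SPLIT.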
Confidence formula:
agreement_ratio = max(pass_count, fail_count) / total_validators

confidence =
  HIGH    if agreement_ratio == 1.0    ← unanimous
  MEDIUM  if agreement_ratio >= 2/3    ← after disagreement analysis resolves
  LOW     if agreement_ratio <  2/3    ← split; unresolved

Confidence degrades monotonically. Evidence quality cannot substitute for agreement.
HIGH requires unanimity regardless of how compelling one validator's evidence is.
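The formula transcribes directly to code (the `confidence` function name is illustrative):

```python
def confidence(verdicts: list[str]) -> str:
    """Confidence tier from N validator verdicts, per the formula:
    HIGH only on unanimity, MEDIUM at >= 2/3 agreement, LOW below."""
    agreement = max(verdicts.count("PASS"), verdicts.count("FAIL")) / len(verdicts)
    if agreement == 1.0:
        return "HIGH"
    if agreement >= 2 / 3:
        return "MEDIUM"
    return "LOW"
```

Note the monotonic degradation: there is no branch by which strong evidence from one validator lifts a 2-of-3 result back to HIGH.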
Payment flow consensus example (output)
Feature: Payment processing — refund flow
Validators: 3

Per-journey synthesis:
  J1 Refund happy path        UNANIMOUS_PASS    HIGH
  J2 Partial refund           UNANIMOUS_PASS    HIGH
  J3 Refund after dispute     MAJORITY_PASS     MEDIUM (1 dissent → interpretation gap)
  J4 Double-refund prevention SPLIT             LOW    → DISAGREEMENT_UNRESOLVED

Overall: DISAGREEMENT_UNRESOLVED (weakest journey governs)
Action: escalate J4 to human; do not ship.
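"Weakest journey governs" can be sketched as a minimum over per-journey confidence, with any LOW journey escalating the whole feature to DISAGREEMENT_UNRESOLVED. The names and ordering below are assumptions drawn from the example, not a confirmed API:

```python
# Ordering assumption: LOW < MEDIUM < HIGH.
ORDER = {"LOW": 0, "MEDIUM": 1, "HIGH": 2}

def overall(journeys: dict[str, str]) -> str:
    """The lowest-confidence journey sets the overall result; a LOW
    (split, unresolved) journey blocks the ship entirely."""
    weakest = min(journeys.values(), key=ORDER.__getitem__)
    return "DISAGREEMENT_UNRESOLVED" if weakest == "LOW" else weakest
```

Applied to the example above, J4's LOW drags the whole feature to DISAGREEMENT_UNRESOLVED even though three of four journeys agreed.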
v2.0 · FORGE Engine (PLANNED · Q1 2027)

The autonomous fix-and-revalidate loop. FORGE closes the gap between "validation found a FAIL" and "validation found a FAIL AND fixed it." Three attempts, fresh evidence every time, different root cause required each time.

- /validate-sweep --autofix: detect FAIL → diagnose root cause → apply minimal fix → re-validate
- 3-strike cap: max 3 fix attempts per journey; after that, mark UNFIXABLE and continue
- Fresh-evidence invariant: each attempt writes to e2e-evidence/forge-attempt-N/ — never reuses
- Different-cause rule: each attempt must target a different root cause — a same-fix retry counts as a failed attempt
- Rollback on failure: a fix whose hypothesis fails is automatically reverted
- .validationforge/forge-state.json: resume from the last incomplete phase on crash or interrupt
- Per-attempt hypothesis log: links each root-cause hypothesis to the fix applied and the outcome evidence
Autofix loop · 7-step discipline (protocol)
1. READ       the FAIL verdict and cited evidence
2. TRACE      to specific source code (file:line)
3. HYPOTHESIZE one root cause (single function/line named)
4. APPLY      minimal fix targeting that cause
5. RE-VALIDATE the failed journey
6. IF FAIL persists → document WHY this hypothesis failed → move to next hypothesis
7. IF 3 attempts exhausted → mark UNFIXABLE → log all attempted causes → continue
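The 7-step discipline reduces to a short loop. A sketch in which `diagnose`, `apply_fix`, and `revalidate` are stand-ins for the real agents; only the attempt cap, the fresh-evidence directory naming, and the different-cause rule come from the roadmap text:

```python
def autofix(journey, diagnose, apply_fix, revalidate, max_attempts=3):
    """3-strike autofix loop: each attempt targets a new root cause and
    writes fresh evidence. The callables are hypothetical stand-ins."""
    tried_causes = []
    for attempt in range(1, max_attempts + 1):
        # Different-cause rule: the diagnoser must exclude causes already tried.
        cause = diagnose(journey, exclude=tried_causes)
        tried_causes.append(cause)
        apply_fix(cause)
        # Fresh-evidence invariant: a new directory per attempt, never reused.
        evidence_dir = f"e2e-evidence/forge-attempt-{attempt}/"
        if revalidate(journey, evidence_dir) == "PASS":
            return {"verdict": "PASS", "attempts": attempt, "causes": tried_causes}
    return {"verdict": "UNFIXABLE", "attempts": max_attempts, "causes": tried_causes}
```

Note that the loop never retries a cause it has already logged; exhausting the cap yields UNFIXABLE with the full list of attempted causes, matching step 7.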
Autofix example: login flow fixed in 2 attempts (output)
Journey J2: Login flow
Attempt 1: Hypothesis — missing null check in auth.ts:45
           Fix — added guard; re-validated
           Result — FAIL (different error surfaced)
Attempt 2: Hypothesis — session cookie not set on redirect
           Fix — added Set-Cookie header in callback handler
           Result — PASS
Final: J2 PASS after 2 attempts
Evidence: forge-attempt-1/, forge-attempt-2/

The items below are researched but not yet scoped to a specific version. Each ships when the preceding engine lands clean.

Cloud-backed evidence retention
A hosted layer at validationforge.dev/team/<slug> that retains evidence past the 30-day local window. Makes validation posture visible across a team. Keeps on-device capture unchanged — the plugin still runs locally, the dashboard aggregates.
Multi-project benchmarking
Compare coverage, evidence quality, and speed across multiple projects in the same org. Answers "which of my three products has the weakest evidence culture?" Builds on the v1.0 four-dimension benchmark model.
AI evidence analysis
Vision models pre-process captured screenshots to suggest verdicts before human review. Flags '500 error page' or 'expected confirmation dialog' so the verdict writer can prioritize. Already scaffolded as Phase 3.5 (ai-evidence-analysis skill).
IDE integration
VS Code extension that surfaces evidence inline next to the code under validation. Click a PASS badge on a function to see the screenshot/log that proved it works. The evidence graph already exists locally — the extension gives it a first-class surface.

Clarity on scope is part of the contract. These items are explicitly out of scope.

- Test-framework integrations (Jest, Mocha, pytest, XCTest) — the no-mock mandate is load-bearing
- CI/CD pipeline orchestration — /validate-ci produces exit codes; VF is not a CI runner
- Auto-generated journeys from requirements docs — /validate-plan discovers, it does not invent
- Validation-as-a-service SaaS — the cloud dashboard complements the local plugin, not replaces it
File against PRD.md — not vague wishlists
If a feature you need is missing, file it against validationforge/PRD.md with a concrete journey description and the evidence format you would expect. The roadmap moves on concrete asks backed by real scenarios. The shipping schedule is a promise backed by evidence.