
07 · Future

v1.5 · v2.0 · Roadmap

Three engines on a single enforcement philosophy: VALIDATE ships first, CONSENSUS raises the confidence bar, FORGE closes the fix loop. Every milestone has a concrete exit gate — cited evidence, not a feeling.

v1.0
VALIDATE — GA
v1.5
CONSENSUS — Q3 2026
v2.0
FORGE — Q1 2027
51 skills shipped · 19 commands shipped
Three-engine architecture (diagram):
FORGE builds code  →  VALIDATE proves it works  →  CONSENSUS confirms agreement
 (v2.0)                  (v1.0 — shipped)                (v1.5)
v1.0 · VALIDATE Engine (SHIPPED 2026-04-23)

The evidence-based functional validation engine. It refuses to let a feature ship without cited proof that it works against the real system. No mocks. No test files. No "it compiled" verdicts.

| Inventory | Count | Notes |
|---|---|---|
| Skills | 51 | Platform validators, quality gates, orchestration, analysis |
| Commands | 19 | /validate, /validate-fix, /validate-sweep, /validate-team, forge suite |
| Hooks | 7 | block-test-files, completion-claim-validator, validation-not-compilation … |
| Agents | 7 | platform-detector, evidence-capturer, verdict-writer, sweep-controller … |
| Rules | 9 | Installed to .claude/rules/vf-* via /forge-install-rules |
- Auto-detects iOS, Web, API, CLI, React Native, Flutter, Django/Flask, Rust CLI
- 7-phase pipeline: Research → Plan → Preflight → Execute → Analyze → Verdict → Ship
- Dependency-aware validator waves (DB → API → Web/iOS → Integration)
- 3-strike fix loop with fresh-evidence invariant per attempt
- Benchmark scoring: Coverage 35% + Evidence Quality 30% + Enforcement 25% + Speed 10%
- 7 hooks: test-file blocking, evidence gate, build-vs-validation guard, mock detection
- Self-contained HTML evidence dashboard with screenshot gallery and trend chart
- CI/CD exit codes for GitHub Actions / GitLab / Jenkins integration
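The benchmark score is a plain weighted sum over the four dimensions. A minimal sketch in Python, assuming a 0–100 scale per dimension; the weights come from the roadmap text, but the `benchmark_score` name and input shape are illustrative, not the plugin's API:

```python
# Weighted four-dimension benchmark score. The weights are stated in
# the roadmap; everything else here is an illustrative assumption.
WEIGHTS = {
    "coverage": 0.35,
    "evidence_quality": 0.30,
    "enforcement": 0.25,
    "speed": 0.10,
}

def benchmark_score(dimensions: dict[str, float]) -> float:
    """Weighted sum of the four dimensions, each scored 0-100."""
    return sum(WEIGHTS[name] * dimensions[name] for name in WEIGHTS)

score = benchmark_score(
    {"coverage": 90, "evidence_quality": 80, "enforcement": 100, "speed": 70}
)
# 0.35*90 + 0.30*80 + 0.25*100 + 0.10*70 = 87.5
```

Because the weights sum to 1.0, a project that maxes every dimension scores exactly 100.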
Example: PASS verdict with cited evidence (output)
Journey J2: User can delete account
Verdict: PASS
Evidence:
  step-01-navigate-to-settings.png  (23 KB) — Settings page rendered
  step-02-click-delete-button.png   (21 KB) — Confirmation dialog visible
  step-03-confirm-deletion.json     (1.2 KB) — DELETE /users/42 → 204
  step-04-redirect-to-login.png     (19 KB) — User landed at /login
  evidence-inventory.txt            (414 B)
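The evidence gate behind a verdict like the one above reduces to a simple check: a PASS is only accepted when every cited evidence file exists and is non-empty. A hypothetical sketch; `evidence_gate` and the NO_EVIDENCE downgrade are illustrative, not the shipped hook:

```python
from pathlib import Path

def evidence_gate(verdict: str, cited_files: list[Path]) -> str:
    """Downgrade a PASS to NO_EVIDENCE if any cited file is missing or empty.

    Illustrative sketch only: the real hook name and verdict labels
    may differ in the shipped plugin.
    """
    if verdict == "PASS":
        for f in cited_files:
            if not f.is_file() or f.stat().st_size == 0:
                return "NO_EVIDENCE"
    return verdict
```

FAIL verdicts pass through unchanged; only a claimed PASS has to prove itself.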
v1.5 · CONSENSUS Engine (PLANNED · Q3 2026)

The execution-time agreement gate. Where VALIDATE proves the system works once, CONSENSUS proves it works according to ≥2 independent validators. Single-validator verdicts are biased — CONSENSUS eliminates that bias for high-stakes features.

- /validate-consensus --validators N spawns N blind, independent validators on the same feature
- Evidence isolation: e2e-evidence/consensus/validator-{N}/ — no cross-writes permitted
- 5 synthesis states: UNANIMOUS_PASS, UNANIMOUS_FAIL, MAJORITY_PASS, MAJORITY_FAIL, SPLIT
- 3 confidence tiers: HIGH (unanimous), MEDIUM (≥⅔ after disagreement analysis), LOW (split)
- Disagreement protocol routes diverging criteria through sequential-analysis for root cause
- 4 disagreement types: missing evidence, contradictory evidence, interpretation gap, validator error
- SPLIT escalates to human — synthesizer never fabricates agreement to unblock a ship
- New agents: consensus-validator (N instances) + consensus-synthesizer (1 instance)
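The five synthesis states follow mechanically from the per-validator verdicts. A sketch, under the assumption that the majority cut matches the ⅔ confidence threshold; the `synthesize` function name and input shape are illustrative:

```python
def synthesize(verdicts: list[str]) -> str:
    """Map N blind validator verdicts ("PASS"/"FAIL") to one of the
    5 synthesis states. Assumes the majority threshold is the same
    2/3 cut used by the confidence tiers."""
    n = len(verdicts)
    passes = verdicts.count("PASS")
    fails = n - passes
    if passes == n:
        return "UNANIMOUS_PASS"
    if fails == n:
        return "UNANIMOUS_FAIL"
    if passes / n >= 2 / 3:
        return "MAJORITY_PASS"
    if fails / n >= 2 / 3:
        return "MAJORITY_FAIL"
    return "SPLIT"  # neither side reaches 2/3
```

With 3 validators, a 2-to-1 result lands exactly on the ⅔ boundary and counts as a majority; a 1-to-1 result with 2 validators is a SPLIT.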
Confidence formula:
agreement_ratio = max(pass_count, fail_count) / total_validators

confidence =
  HIGH    if agreement_ratio == 1.0    ← unanimous
  MEDIUM  if agreement_ratio >= 2/3    ← after disagreement analysis resolves
  LOW     if agreement_ratio <  2/3    ← split; unresolved

Confidence degrades monotonically. Evidence quality cannot substitute for agreement.
HIGH requires unanimity regardless of how compelling one validator's evidence is.
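The formula transcribes directly to code (the `confidence` function name is illustrative):

```python
def confidence(verdicts: list[str]) -> str:
    """Confidence tier from N validator verdicts, per the formula:
    HIGH only on unanimity, MEDIUM at >= 2/3 agreement, LOW below."""
    agreement = max(verdicts.count("PASS"), verdicts.count("FAIL")) / len(verdicts)
    if agreement == 1.0:
        return "HIGH"
    if agreement >= 2 / 3:
        return "MEDIUM"
    return "LOW"
```

Note the monotonic degradation: there is no branch by which strong evidence from one validator lifts a 2-of-3 result back to HIGH.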
Payment flow consensus example (output)
Feature: Payment processing — refund flow
Validators: 3

Per-journey synthesis:
  J1 Refund happy path        UNANIMOUS_PASS    HIGH
  J2 Partial refund           UNANIMOUS_PASS    HIGH
  J3 Refund after dispute     MAJORITY_PASS     MEDIUM (1 dissent → interpretation gap)
  J4 Double-refund prevention SPLIT             LOW    → DISAGREEMENT_UNRESOLVED

Overall: DISAGREEMENT_UNRESOLVED (weakest journey governs)
Action: escalate J4 to human; do not ship.
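"Weakest journey governs" can be sketched as a minimum over per-journey confidence, with any LOW journey escalating the whole feature to DISAGREEMENT_UNRESOLVED. The names and ordering below are assumptions drawn from the example, not a confirmed API:

```python
# Ordering assumption: LOW < MEDIUM < HIGH.
ORDER = {"LOW": 0, "MEDIUM": 1, "HIGH": 2}

def overall(journeys: dict[str, str]) -> str:
    """The lowest-confidence journey sets the overall result; a LOW
    (split, unresolved) journey blocks the ship entirely."""
    weakest = min(journeys.values(), key=ORDER.__getitem__)
    return "DISAGREEMENT_UNRESOLVED" if weakest == "LOW" else weakest
```

Applied to the example above, J4's LOW drags the whole feature to DISAGREEMENT_UNRESOLVED even though three of four journeys agreed.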
v2.0 · FORGE Engine (PLANNED · Q1 2027)

The autonomous fix-and-revalidate loop. FORGE closes the gap between "validation found a FAIL" and "validation found a FAIL AND fixed it." Three attempts, fresh evidence every time, different root cause required each time.

- /validate-sweep --autofix: detect FAIL → diagnose root cause → apply minimal fix → re-validate
- 3-strike cap: max 3 fix attempts per journey; after that, mark UNFIXABLE and continue
- Fresh-evidence invariant: each attempt writes to e2e-evidence/forge-attempt-N/ — never reuses
- Different-cause rule: each attempt must target a different root cause — a same-fix retry counts as a failed attempt
- Rollback on failure: a fix whose hypothesis fails is automatically reverted
- .validationforge/forge-state.json: resume from the last incomplete phase on crash or interrupt
- Per-attempt hypothesis log: links each root-cause hypothesis to the fix applied and the outcome evidence
Autofix loop · 7-step discipline (protocol)
1. READ       the FAIL verdict and cited evidence
2. TRACE      to specific source code (file:line)
3. HYPOTHESIZE one root cause (single function/line named)
4. APPLY      minimal fix targeting that cause
5. RE-VALIDATE the failed journey
6. IF FAIL persists → document WHY this hypothesis failed → move to next hypothesis
7. IF 3 attempts exhausted → mark UNFIXABLE → log all attempted causes → continue
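The 7-step discipline reduces to a short loop. A sketch in which `diagnose`, `apply_fix`, and `revalidate` are stand-ins for the real agents; only the attempt cap, the fresh-evidence directory naming, and the different-cause rule come from the roadmap text:

```python
def autofix(journey, diagnose, apply_fix, revalidate, max_attempts=3):
    """3-strike autofix loop: each attempt targets a new root cause and
    writes fresh evidence. The callables are hypothetical stand-ins."""
    tried_causes = []
    for attempt in range(1, max_attempts + 1):
        # Different-cause rule: the diagnoser must exclude causes already tried.
        cause = diagnose(journey, exclude=tried_causes)
        tried_causes.append(cause)
        apply_fix(cause)
        # Fresh-evidence invariant: a new directory per attempt, never reused.
        evidence_dir = f"e2e-evidence/forge-attempt-{attempt}/"
        if revalidate(journey, evidence_dir) == "PASS":
            return {"verdict": "PASS", "attempts": attempt, "causes": tried_causes}
    return {"verdict": "UNFIXABLE", "attempts": max_attempts, "causes": tried_causes}
```

Note that the loop never retries a cause it has already logged; exhausting the cap yields UNFIXABLE with the full list of attempted causes, matching step 7.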
Autofix example: login flow fixed in 2 attempts (output)
Journey J2: Login flow
Attempt 1: Hypothesis — missing null check in auth.ts:45
           Fix — added guard; re-validated
           Result — FAIL (different error surfaced)
Attempt 2: Hypothesis — session cookie not set on redirect
           Fix — added Set-Cookie header in callback handler
           Result — PASS
Final: J2 PASS after 2 attempts
Evidence: forge-attempt-1/, forge-attempt-2/

The items below are researched but not yet scoped to a specific version. Each ships when the preceding engine lands clean.

Cloud-backed evidence retention
A hosted layer at validationforge.dev/team/<slug> that retains evidence past the 30-day local window. Makes validation posture visible across a team. Keeps on-device capture unchanged — the plugin still runs locally, the dashboard aggregates.
Multi-project benchmarking
Compare coverage, evidence quality, and speed across multiple projects in the same org. Answers "which of my three products has the weakest evidence culture?" Builds on the v1.0 four-dimension benchmark model.
AI evidence analysis
Vision models pre-process captured screenshots to suggest verdicts before human review. Flags '500 error page' or 'expected confirmation dialog' so the verdict writer can prioritize. Already scaffolded as Phase 3.5 (ai-evidence-analysis skill).
IDE integration
VS Code extension that surfaces evidence inline next to the code under validation. Click a PASS badge on a function to see the screenshot/log that proved it works. The evidence graph already exists locally — the extension gives it a first-class surface.

Clarity on scope is part of the contract. These items are explicitly out of scope.

- Test-framework integrations (Jest, Mocha, pytest, XCTest) — the no-mock mandate is load-bearing
- CI/CD pipeline orchestration — /validate-ci produces exit codes; VF is not a CI runner
- Auto-generated journeys from requirements docs — /validate-plan discovers, it does not invent
- Validation-as-a-service SaaS — the cloud dashboard complements the local plugin, not replaces it
File against PRD.md — not vague wishlists
If a feature you need is missing, file it against validationforge/PRD.md with a concrete journey description and the evidence format you would expect. The roadmap moves on concrete asks backed by real scenarios. The shipping schedule is a promise backed by evidence.