ValidationForge · Getting Started · Docs
From zero to cited verdict in five steps.
Install the plugin, detect your platform, plan a journey, sweep the real system, read the evidence. No mocks, no test files, no frameworks — just proof that your code works.
30s setup time · 7 hooks installed · 8+ platforms detected · 0 test files created · A first-run grade
01 · Installation & Setup — 5 Steps
01
Install via Claude Code marketplace
One command adds the plugin to your Claude Code environment. No npm install, no config files yet.
bash · terminal
claude plugin marketplace add krzemienski/validationforge
claude plugin install validationforge@validationforge

# Verify the install succeeded
claude plugin list | grep validationforge
validationforge  v1.0.0  krzemienski/validationforge  active
02
Run /vf-setup in your project
Setup auto-detects your platform (Next.js, iOS, API, CLI, Flutter, etc.), writes the enforcement config, and registers the 7 enforcement hooks. Takes 30–60 seconds.
console · setup output
> /vf-setup
[vf-setup] Scanning project root: /Users/you/my-app
[vf-setup] Detected: Next.js 15 (App Router) ← HIGH confidence
[vf-setup] Detected: Node 20.11.0, pnpm
[vf-setup] No existing .vf/ directory found — will create

Please answer 3 questions:
  1. Enforcement level (strict/standard/permissive)? [standard]: strict
  2. Evidence retention in days (0 = forever)? [30]: 30
  3. Default validator platform? [web]: web

[vf-setup] Writing .vf/config.json
[vf-setup] Initializing .vf/benchmark-history.json = []
[vf-setup] Appending e2e-evidence/ to .gitignore
[vf-setup] Registering hooks: block-test-files, evidence-gate-reminder, validation-not-compilation, completion-claim-validator
[vf-setup] Setup complete. Next: /validate-plan <journey-name>

Setup summary:
  Platform:         web (Next.js 15)
  Enforcement:      strict
  Retention:        30 days
  Hooks installed:  7
  Evidence dir:     /Users/you/my-app/e2e-evidence
03
Plan your first validation journey
A journey is a named user flow — login, signup, checkout, etc. VF scans your routes and generates a YAML contract with measurable PASS criteria and required evidence types.
console · validate-plan output
> /validate-plan user-login
[validate-plan] Reading journey name: user-login
[validate-plan] Scanning routes: /login, /api/auth/*, /dashboard
[validate-plan] Detected: next-auth@5.0.0, Drizzle ORM
[validate-plan] Generating PASS criteria and evidence specs...
Plan written: plans/journeys/user-login.yaml (3.2 KB, 8 steps)

# Generated PASS criteria (excerpt)
pass_criteria:
  - id: pc-01
    description: "Login page renders email + password fields"
    evidence: [step-01-login-rendered.png]
  - id: pc-02
    description: "Invalid credentials shows error message"
    evidence: [step-02-error-message.png]
  - id: pc-03
    description: "Valid credentials redirects to /dashboard"
    evidence: [step-03-dashboard-rendered.png, step-03-session-cookie.txt]
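Before sweeping, you can sanity-check a plan of this shape yourself. A minimal sketch — the field names come from the excerpt above, but the checker function is my own (the real plan file is YAML; here it is mirrored as a plain dict to stay self-contained):

```python
# Mirrors the pass_criteria excerpt from plans/journeys/user-login.yaml.
# In practice you would load the real file with a YAML parser.
plan = {
    "pass_criteria": [
        {"id": "pc-01", "evidence": ["step-01-login-rendered.png"]},
        {"id": "pc-02", "evidence": ["step-02-error-message.png"]},
        {"id": "pc-03", "evidence": ["step-03-dashboard-rendered.png",
                                     "step-03-session-cookie.txt"]},
    ]
}

def uncited_criteria(plan: dict) -> list[str]:
    """Return ids of PASS criteria that cite no evidence files at all."""
    return [c["id"] for c in plan["pass_criteria"] if not c.get("evidence")]

print(uncited_criteria(plan))  # → []
```

An empty list means every criterion cites at least one evidence file, which is the minimum the verdict rules demand.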
04
Execute: /validate-sweep
Sweep runs the full pipeline against the real system. It starts your dev server if needed, launches Playwright (web) or idb (iOS) or curl (API), walks every step, and writes timestamped evidence to disk.
console · sweep output
> /validate-sweep
[sweep] Lock file written: .vf/state/validation-in-progress.lock
[sweep] Preflight PASS (build: 2.1s, server :3000: reachable)
[sweep] Loaded 1 journey: user-login
[sweep] Booting Playwright (chromium, headless=false, 1280x720)
[sweep] Journey: user-login
  [01/08] navigate /login           → step-01-login-rendered.png (78 KB) ✓
  [02/08] fill invalid credentials  → step-02-error-message.png (62 KB) ✓
  [03/08] fill valid credentials    → step-03-credentials.png (61 KB) ✓
  [04/08] wait_for_url /dashboard   → step-04-dashboard.png (91 KB) ✓
  [05/08] capture session cookie    → step-03-session-cookie.txt (1 KB) ✓
  [06-08/08] remaining steps ✓ ✓ ✓
[sweep] Writing verdict: e2e-evidence/user-login/verdict.md
[sweep] Writing report: e2e-evidence/report.md
[sweep] Lock released.

Result: PASS (1 journey, 5/5 PASS criteria, 9 evidence files)
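The lock-file bookkeeping visible in the transcript can be imitated in a few lines. A minimal sketch, assuming a simple single-process guard — the real hook implementation is not published, so treat this as illustrative only:

```python
import os
from contextlib import contextmanager

@contextmanager
def validation_lock(path=".vf/state/validation-in-progress.lock"):
    """Hold a lock file for the duration of a sweep, refusing to start
    if a previous run is still (or crashed while) holding it."""
    if os.path.exists(path):
        raise RuntimeError(f"sweep already in progress: {path}")
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(str(os.getpid()))  # record who holds the lock
    try:
        yield path
    finally:
        os.remove(path)  # "Lock released."

with validation_lock():
    ...  # run journeys here; evidence is written while the lock exists
```

A stale lock after a crash is exactly why the transcript announces both the write and the release: it tells you which file to delete before retrying.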
05
Review evidence and benchmark
Open the self-contained HTML dashboard for a visual summary. Then run /validate-benchmark to score your project's validation posture (Coverage 35% / Evidence Quality 30% / Enforcement 25% / Speed 10%).
console · benchmark output
> /validate-dashboard
[dashboard] Writing: e2e-evidence/dashboard.html (self-contained)
open e2e-evidence/dashboard.html

> /validate-benchmark
ValidationForge Benchmark — 2026-04-23
==================================================
Coverage          35/35   (1/1 journeys validated)
Evidence Quality  28/30   (9 files cited, 0 zero-byte)
Enforcement       25/25   (strict + hooks active)
Speed              9/10   (2m 45s vs median 3m)
--------------------------------------------------
Total Score       97/100   Grade: A
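The total is just the sum of the four component scores against their weights. A quick sketch with the numbers from the output above — note the grade cutoff of 90 for an A is my assumption, not documented behavior:

```python
# Benchmark components as (score, max) pairs, copied from the output above.
components = {
    "Coverage":         (35, 35),
    "Evidence Quality": (28, 30),
    "Enforcement":      (25, 25),
    "Speed":            (9, 10),
}

total = sum(score for score, _ in components.values())
out_of = sum(mx for _, mx in components.values())
grade = "A" if total >= 90 else "B or lower"  # ASSUMPTION: 90+ is an A

print(f"Total Score {total}/{out_of}  Grade: {grade}")  # → Total Score 97/100  Grade: A
```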
02 · Edge Cases & Gotchas
Monorepo
Run /vf-setup inside each package that ships independently. When multiple package.json files exist at the same depth, pass --scope packages/web to pin the root.
strict mode + legacy tests
Strict enforcement installs the block-test-files hook, which refuses any write to *.test.*, *.spec.*, or *_test.* paths. Run /vf-setup in permissive mode first, migrate or archive your tests, then upgrade to strict.
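The blocked patterns are ordinary globs, so the check is easy to reason about. A minimal sketch of what the hook might do — the real hook's logic is not published, so this is illustrative, not authoritative:

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns strict mode refuses, as listed in the docs.
BLOCKED = ["*.test.*", "*.spec.*", "*_test.*"]

def is_blocked(path: str) -> bool:
    """True if the file name matches any pattern strict mode refuses."""
    name = PurePosixPath(path).name  # match on the file name only
    return any(fnmatchcase(name, pat) for pat in BLOCKED)

print(is_blocked("src/auth.test.ts"))  # → True
print(is_blocked("src/auth.ts"))       # → False
print(is_blocked("lib/db_test.py"))    # → True
```

`fnmatchcase` is used instead of `fnmatch` so the match is case-sensitive on every platform.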
Pre-existing e2e-evidence/
Setup does NOT wipe existing evidence. It writes .vf/state/legacy-evidence-detected.txt and prompts you to archive before first run.
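If you prefer to archive rather than delete, a few lines suffice. A minimal sketch — the timestamped archive name is my own convention; VF itself only writes the marker file and prompts:

```python
import shutil
import time
from pathlib import Path

def archive_legacy_evidence(root="."):
    """Move a pre-existing e2e-evidence/ directory aside so the first
    sweep starts clean. Returns the archive path, or None if absent."""
    src = Path(root) / "e2e-evidence"
    if not src.is_dir():
        return None
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = src.with_name(f"e2e-evidence-legacy-{stamp}")
    shutil.move(str(src), str(dest))
    return dest
```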
Missing MCP server
Setup warns about missing prerequisites (e.g. Playwright MCP for web). It does NOT block setup — just warns. Install Playwright MCP before running /validate-sweep on web projects.
03 · Iron Rules — Non-Negotiable
01
Fix the real system. Never adjust the plan to make a verdict PASS.
02
Never create mocks, stubs, test doubles, or test files. The hook will block them.
03
Never mark a journey PASS without specific cited evidence on disk.
04
Never skip preflight. If it fails, STOP — do not enter EXECUTE.
05
Never exceed 3 fix attempts per journey. On the fourth, mark it UNFIXABLE.
06
Never reuse evidence from a previous attempt. Each attempt writes fresh captures.
07
Compilation success ≠ functional validation. Build output is necessary, not sufficient.
08
Empty files are invalid evidence. Zero-byte screenshot / log / response = FAIL.
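Rule 08 is mechanically checkable. A minimal sketch that flags zero-byte evidence files — the directory layout is assumed from the sweep output above:

```python
from pathlib import Path

def invalid_evidence(evidence_dir: str) -> list[Path]:
    """Return evidence files that are empty — zero bytes means FAIL."""
    return sorted(
        p for p in Path(evidence_dir).rglob("*")
        if p.is_file() and p.stat().st_size == 0
    )
```

Run it over e2e-evidence/ before trusting a verdict; a non-empty result means at least one cited capture is worthless.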
04 · What Gets Created
tree · project layout after /vf-setup
my-app/
├── .vf/
│   ├── config.json              ← platform, retention, enforcement level
│   ├── benchmark-history.json   ← initialized as []
│   ├── state/                   ← lock files + active-run markers
│   └── hooks-installed.lock
├── .claude/rules/
│   ├── validation-discipline.md
│   ├── evidence-management.md
│   ├── platform-detection.md
│   ├── execution-workflow.md
│   └── team-validation.md
├── e2e-evidence/
│   └── .gitkeep                 ← directory reserved for evidence
├── .gitignore                   ← e2e-evidence/ appended
└── ... (existing files untouched)
05 · Next Steps