06 · Practice
Usage Examples
Seven commands, seven deep-dives. Every scenario shows literal console output, real file paths, generated YAML, gotchas, and edge cases sourced from production Next.js 15 projects. No fabrication — zero mock data.
- 7 — commands covered
- 30s — minimum /vf-setup runtime
- 15m — maximum sweep runtime
- 3 — maximum fix attempts
- 95 — top benchmark score
01 /vf-setup — First-time project setup
Bootstrapping a Next.js 15 project under strict enforcement — 30–60s
Before
project structure

```
my-app/
├── src/
├── package.json
├── next.config.ts
└── tsconfig.json
```
Console output
```console
> /vf-setup
[vf-setup] Scanning project root: /Users/nick/work/my-app
[vf-setup] Detected: Next.js 15 (App Router) ← HIGH confidence
[vf-setup] Detected: Node 20.11.0
[vf-setup] Detected: package manager = pnpm
[vf-setup] No existing .vf/ directory found — will create

Please answer 3 questions:
  1. Enforcement level (strict/standard/permissive)? [standard]: strict
  2. Evidence retention in days (0 = forever)? [30]: 30
  3. Default validator platform? [web]: web

[vf-setup] Writing .vf/config.json
[vf-setup] Initializing .vf/benchmark-history.json = []
[vf-setup] Appending e2e-evidence/ to .gitignore
[vf-setup] Registering hooks: block-test-files, evidence-gate-reminder, validation-not-compilation, completion-claim-validator
[vf-setup] Setup complete. Next: /validate-plan <journey-name>

Setup summary:
  Platform:        web (Next.js 15)
  Enforcement:     strict
  Retention:       30 days
  Hooks installed: 7
  Evidence dir:    /Users/nick/work/my-app/e2e-evidence
```
After — .vf/ directory created
```
my-app/
├── .vf/
│   ├── config.json              # platform, retention, enforcement level
│   ├── benchmark-history.json   # initialized as []
│   ├── state/                   # lock files + active-run markers
│   └── hooks-installed.lock
├── e2e-evidence/
│   └── .gitkeep
└── .gitignore                   # e2e-evidence/ appended
```
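The tree comment says config.json stores the platform, retention, and enforcement level collected during setup. A minimal loader that sanity-checks those fields might look like this — the exact field names (`enforcement`, `retention_days`) are assumptions inferred from that comment, not a documented schema:

```python
import json
from pathlib import Path

VALID_ENFORCEMENT = {"strict", "standard", "permissive"}

def load_vf_config(root: str) -> dict:
    """Read .vf/config.json and sanity-check the fields the setup
    transcript mentions. Field names here are illustrative assumptions."""
    cfg = json.loads(Path(root, ".vf", "config.json").read_text())
    if cfg.get("enforcement") not in VALID_ENFORCEMENT:
        raise ValueError(f"unknown enforcement level: {cfg.get('enforcement')!r}")
    if cfg.get("retention_days", 0) < 0:
        raise ValueError("retention must be >= 0 (0 = keep forever)")
    return cfg
```

Rejecting unknown enforcement levels up front matters because every downstream hook decision branches on that one value.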
Edge cases
Monorepo
Run /vf-setup inside each package that ships independently. VF refuses to set up at the repo root if it detects multiple package.json files at sibling depth. Workaround: pass --scope packages/web to pin the root.
Pre-existing e2e-evidence/
Setup does NOT wipe existing evidence. Writes .vf/state/legacy-evidence-detected.txt and prompts you to archive first with: tar -czf legacy-e2e-$(date +%Y%m%d).tar.gz e2e-evidence/
Gotcha: Strict enforcement installs the block-test-files hook, which refuses any write to *.test.*, *.spec.*, or *_test.* paths. If your repo has legacy test files, run /vf-setup in permissive mode first, migrate, then upgrade to strict.

02 /validate-plan — Define a validation journey
Planning an OAuth sign-up journey — 15–30s
Console output
```console
> /validate-plan user-signup-with-oauth
[validate-plan] Reading journey name: user-signup-with-oauth
[validate-plan] Scanning routes: /signup, /api/auth/*, /dashboard
[validate-plan] Detected: next-auth@5.0.0, Google provider, Drizzle ORM
[validate-plan] Generated PASS criteria: 5 criteria, 11 steps

Plan written: plans/journeys/user-signup-with-oauth.yaml (4.1 KB)
```
Generated journey YAML
plans/journeys/user-signup-with-oauth.yaml

```yaml
journey:
  slug: user-signup-with-oauth
  platform: web
  dependencies: []
  evidence_dir: e2e-evidence/user-signup-with-oauth/

  pass_criteria:
    - id: pc-01
      description: "Landing page renders signup CTA"
      evidence: [step-01-landing-rendered.png]
    - id: pc-02
      description: "OAuth consent screen loads (Google)"
      evidence: [step-02-oauth-consent.png, step-02-oauth-network.json]
    - id: pc-03
      description: "Callback creates user row in DB"
      evidence: [step-03-db-user-row.json]
    - id: pc-04
      description: "Session cookie set with Secure+HttpOnly+SameSite=Lax"
      evidence: [step-04-cookie-headers.txt]
    - id: pc-05
      description: "Dashboard renders with user email visible"
      evidence: [step-05-dashboard-rendered.png]

  steps:
    - play: navigate
      url: https://localhost:3000/signup
    - play: click
      selector: "[data-testid=google-signup]"
    - play: wait_for_url
      pattern: "accounts.google.com/*"
    # ... 8 more steps
```
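Every PASS criterion in the plan cites at least one evidence file — that is what makes the later sweep auditable. The invariant check can be sketched like this, using a plain dict where the real tool would hold parsed YAML (illustrative, not VF's actual code):

```python
def check_plan(journey: dict) -> list[str]:
    """Return a list of problems: criteria with no cited evidence,
    or duplicate criterion ids. Mirrors the YAML structure above."""
    problems, seen = [], set()
    for pc in journey.get("pass_criteria", []):
        pc_id = pc.get("id", "<missing id>")
        if pc_id in seen:
            problems.append(f"{pc_id}: duplicate id")
        seen.add(pc_id)
        if not pc.get("evidence"):
            problems.append(f"{pc_id}: cites no evidence files")
    return problems

plan = {
    "pass_criteria": [
        {"id": "pc-01", "evidence": ["step-01-landing-rendered.png"]},
        {"id": "pc-02", "evidence": []},  # would be flagged
    ]
}
```

A criterion with an empty `evidence` list is unverifiable by construction, so flagging it at plan time is cheaper than discovering it mid-sweep.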
Gotcha: Regenerating a journey (/validate-plan user-signup-with-oauth a second time) writes user-signup-with-oauth.v2.yaml next to the original instead of overwriting. You must explicitly rm plans/journeys/user-signup-with-oauth.yaml first to replace it. This prevents silent contract drift between runs.

03 /validate-sweep — Execute and capture evidence
Running a full sweep — partial FAIL with evidence — 2–15min
Console output
```console
> /validate-sweep
[sweep] Lock file written: .vf/state/validation-in-progress.lock
[sweep] Preflight PASS (build: 2.1s, server :3000: reachable)
[sweep] Loaded 1 journey: user-signup-with-oauth
[sweep] Booting Playwright (chromium, headless=false, 1280x720)
[sweep] Journey: user-signup-with-oauth
  [01/11] navigate /signup → step-01-landing-rendered.png (82 KB) ✓
  [02/11] click [data-testid=google-signup] → step-02-oauth-consent.png (104 KB) ✓
  [02/11] network capture → step-02-oauth-network.json (3 KB) ✓
  [03/11] fill oauth credentials → step-03-oauth-submitted.png (91 KB) ✓
  [04/11] wait_for_url /dashboard → timeout 30s ✗
  [04/11] FAIL — pc-03, pc-04, pc-05 blocked on callback timeout
[sweep] Writing verdict: e2e-evidence/user-signup-with-oauth/verdict.md
[sweep] Writing report: e2e-evidence/report.md
[sweep] Lock released.

Result: FAIL (1 journey, 3 PASS criteria blocked)
See: e2e-evidence/report.md
```
Evidence captured
e2e-evidence/user-signup-with-oauth/

```
├── step-01-landing-rendered.png          82 KB
├── step-02-oauth-consent.png            104 KB
├── step-02-oauth-network.json             3 KB
├── step-03-oauth-submitted.png           91 KB
├── step-04-callback-timeout-console.txt   1 KB
├── evidence-inventory.txt                612 B
└── verdict.md                           1.4 KB
```
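The benchmark section later reports "0 zero-byte" files under Evidence Quality — an empty screenshot or log proves nothing. A scan for empty evidence files might look like this (a sketch; VF's actual inventory logic isn't shown):

```python
from pathlib import Path

def zero_byte_evidence(evidence_dir: str) -> list[str]:
    """Return relative paths of evidence files that are 0 bytes.
    Such files should fail the Evidence Quality check."""
    root = Path(evidence_dir)
    return sorted(
        str(p.relative_to(root))
        for p in root.rglob("*")
        if p.is_file() and p.stat().st_size == 0
    )
```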
Gotcha: Never delete .vf/state/validation-in-progress.lock manually while a sweep is running. Other commands (/validate-dashboard, /validate-benchmark) honor the lock. If a sweep crashes, the lock becomes stale after 1 hour and is auto-ignored.

04 /validate-fix — 3-strike fix loop
Fixing the OAuth callback timeout — attempt 1 succeeds — 5–45min
Console output — successful fix
```console
> /validate-fix
[fix] Reading: e2e-evidence/user-signup-with-oauth/verdict.md
[fix] First FAIL: pc-04 — callback timeout at /dashboard
[fix] Attempt 1 of 3
[fix] Root cause hypothesis: missing NEXTAUTH_URL env var on localhost
[fix] Proposed fix:
  File: .env.local
  +NEXTAUTH_URL=http://localhost:3000
  +NEXTAUTH_SECRET=dev-secret-do-not-use-in-prod
[fix] Applying fix. Running /validate-sweep on user-signup-with-oauth only.
  [04/11] wait_for_url /dashboard → 2.1s ✓
  [05/11] dashboard renders email → step-05-dashboard-rendered.png ✓
  [06–11] all remaining steps PASS
[fix] Attempt 1 verdict: PASS
[fix] Notes written to e2e-evidence/user-signup-with-oauth/forge-attempt-1/
[fix] Exiting — journey PASSes. 2 attempts were not consumed.
```
3-strike exhaustion
```console
[fix] Attempt 3 of 3 FAILED — same symptom (callback timeout)
[fix] Hypothesis recap:
  attempt-1: missing NEXTAUTH_URL   → partial fix, symptom moved
  attempt-2: cookie SameSite=Strict → regressed pc-04 anyway
  attempt-3: redirect URL allowlist → no observable change
[fix] Marking journey UNFIXABLE. No further automatic attempts.
[fix] Human intervention required.
[fix] See: e2e-evidence/user-signup-with-oauth/UNFIXABLE.md
```
Gotcha: If attempts 2 and 3 target the same root cause (even if the patch differs), VF rejects the identical hypothesis at plan time: "attempt-3 rejected — same root cause as attempt-2." This forces exploration, not thrashing.
05 /validate-consensus — Multi-validator consensus
3-validator consensus — majority disagreement resolved to PASS — v1.5 preview
Console output
```console
> /validate-consensus --validators 3
[consensus] Coordinator: spawning 3 independent validators
[consensus] Preflight PASS (required before validators spawn)
[consensus] Dispatching validators in parallel...
  validator-1 → e2e-evidence/consensus/validator-1/ (pid 48112)
  validator-2 → e2e-evidence/consensus/validator-2/ (pid 48113)
  validator-3 → e2e-evidence/consensus/validator-3/ (pid 48114)
[consensus] Waiting for completion...
  validator-1 COMPLETE (4m 12s, 11/11 steps PASS, verdict: PASS)
  validator-2 COMPLETE (4m 38s, 10/11 steps PASS, verdict: FAIL on pc-04)
  validator-3 COMPLETE (4m 05s, 11/11 steps PASS, verdict: PASS)
[consensus] Invoking synthesizer...
[synthesizer] Per-journey tuple: (pass=2, fail=1, total=3)
[synthesizer] State: MAJORITY_PASS (agreement_ratio=0.67)
[synthesizer] Confidence: MEDIUM → triggers disagreement protocol
[synthesizer] Diverging criterion: pc-04 (v2 FAIL, v1/v3 PASS)
[synthesizer] Running sequential-analysis over pc-04 evidence...
  v1 step-04-dashboard.png: dashboard renders, cookie Secure+HttpOnly ✓
  v2 step-04-cookie-headers.txt: Set-Cookie missing SameSite attribute
  v3 step-04-dashboard.png: dashboard renders, cookie present ✓
[synthesizer] Root cause: case (b) contradictory evidence
[synthesizer] Resolution: stale browser state in v2; re-running pc-04 only
[synthesizer] Re-synthesized: (pass=3, fail=0) → UNANIMOUS_PASS
[synthesizer] Final verdict: PASS (confidence HIGH)

Report: e2e-evidence/consensus/report.md
```
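The synthesizer's per-journey tuple logic in the transcript — UNANIMOUS_PASS, MAJORITY_PASS with MEDIUM confidence, and HIGH confidence only on unanimity — can be sketched as a small state function. The state names come from the transcript and the gotcha matrix; the exact branch structure is an assumption consistent with them:

```python
def synthesize(passes: int, fails: int) -> tuple[str, str, float]:
    """Map a (pass, fail) validator tuple to (state, confidence, agreement).
    HIGH confidence requires unanimity; a majority triggers the
    disagreement protocol at MEDIUM confidence."""
    total = passes + fails
    agreement = max(passes, fails) / total
    if fails == 0:
        return "UNANIMOUS_PASS", "HIGH", agreement
    if passes == 0:
        return "UNANIMOUS_FAIL", "HIGH", agreement
    if passes > fails:
        return "MAJORITY_PASS", "MEDIUM", agreement  # disagreement protocol fires
    if fails > passes:
        return "MAJORITY_FAIL", "MEDIUM", agreement
    return "SPLIT", "LOW", agreement
```

The 2-vs-1 run above maps to MAJORITY_PASS at agreement 0.67, and only the re-synthesized 3-vs-0 tuple earns HIGH confidence.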
Synthesizer report excerpt
e2e-evidence/consensus/report.md

```markdown
Verdict: PASS
Confidence: HIGH (after disagreement resolution)
Validators: 3 (all independent)
Agreement: 1.00 (3/3 PASS after re-resolution)

## Synthesis trace

| Journey                | v1   | v2          | v3   | State          | Final |
|------------------------|------|-------------|------|----------------|-------|
| user-signup-with-oauth | PASS | FAIL → PASS | PASS | UNANIMOUS_PASS | PASS  |

## Disagreement protocol fired

- pc-04 had one FAIL (v2) against two PASS
- Root cause: stale browser state in validator-2
- Action: re-ran v2 pc-04 in isolation — evidence now agrees
- Cited evidence:
  - e2e-evidence/consensus/validator-1/step-04-dashboard.png
  - e2e-evidence/consensus/validator-2/pc-04-rerun/step-04-cookie-headers.txt
  - e2e-evidence/consensus/validator-3/step-04-dashboard.png
```

Gotcha: Consensus verdicts cannot be upgraded retroactively. If the original disagreement protocol is unresolvable (case c — an under-specified criterion), the journey ships as DISAGREEMENT_UNRESOLVED with LOW confidence. The synthesizer never fabricates agreement to ship.

06 /validate-dashboard — Evidence dashboard HTML
Generating a self-contained evidence dashboard — ~5s
Console output
```console
> /validate-dashboard
[dashboard] Reading e2e-evidence/ (3 journeys, 47 evidence files, 4.2 MB)
[dashboard] Reading .vf/benchmark-history.json (last 10 runs)
[dashboard] Generating thumbnails: 12 screenshots @ 240x135
[dashboard] Inlining: styles (2.1 KB), trend chart (SVG, 6 KB)
[dashboard] Writing: e2e-evidence/dashboard.html (1.2 MB, self-contained)

Dashboard ready. Open with: open e2e-evidence/dashboard.html
```
Dashboard summary panel
| Journey | Status | PASS criteria | Evidence files | Last run |
|---|---|---|---|---|
| user-signup-with-oauth | PASS | 5/5 | 11 | 2026-04-23 21:35 |
| dashboard-logout | PASS | 3/3 | 6 | 2026-04-23 21:36 |
| billing-checkout | FAIL | 2/4 | 8 | 2026-04-23 21:41 |
Gotcha: The dashboard is self-contained HTML (it works over file://), but inlined screenshots can exceed 20 MB for runs with 50+ screenshots. Use --compress to serve screenshots from a sibling dashboard-assets/ directory instead.

07 /validate-benchmark — Posture score + trend
Measuring and trending validation posture — Grade A — ~10s
Console output
```console
> /validate-benchmark
[benchmark] Reading plans/journeys/ (3 journeys defined)
[benchmark] Reading e2e-evidence/ (3 journeys executed, 47 evidence files)
[benchmark] Reading .vf/config.json (enforcement: strict)
[benchmark] Reading .vf/benchmark-history.json (9 prior runs)
[benchmark] Computing...

ValidationForge Benchmark — 2026-04-23 21:47:02
==================================================
Coverage          35/35  (3/3 journeys validated)
Evidence Quality  27/30  (47 files cited, 0 zero-byte, 2 missing headers)
Enforcement       25/25  (strict + no test files + no mocks + all hooks)
Speed              8/10  (9m 14s vs project median 7m 30s)
--------------------------------------------------
Total Score       95/100
Grade             A
Trend             +3 vs run #9 (was 92/100)

Appended to: .vf/benchmark-history.json
```
Scoring dimensions
| Dimension | Weight | What it measures |
|---|---|---|
| Coverage | 35% | Validated journeys / total discoverable features |
| Evidence Quality | 30% | Evidence citations, observation quality, verdict rigor |
| Enforcement | 25% | Hooks installed, no mocks, no test files, rules active |
| Speed | 10% | Validation time relative to project size |
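Summing the four dimensions against those weights, then mapping the total to a grade, might look like the sketch below. The per-dimension maxima come straight from the table; the grade boundaries are assumptions fitted to the three scores shown in the history (88 → B+, 92 → A-, 95 → A), not a published scale:

```python
# Max points per dimension, matching the weights table (sums to 100).
MAX_POINTS = {"coverage": 35, "evidence_quality": 30, "enforcement": 25, "speed": 10}

# ASSUMED boundaries, merely consistent with the example runs: 88→B+, 92→A-, 95→A.
GRADE_BOUNDARIES = [(95, "A"), (90, "A-"), (85, "B+"), (80, "B"), (0, "C")]

def benchmark(points: dict[str, int]) -> tuple[int, str]:
    """Total the dimension scores and pick the highest grade whose
    floor the total clears. Rejects points above a dimension's max."""
    for dim, pts in points.items():
        if pts > MAX_POINTS[dim]:
            raise ValueError(f"{dim}: {pts} exceeds max {MAX_POINTS[dim]}")
    total = sum(points.values())
    grade = next(g for floor, g in GRADE_BOUNDARIES if total >= floor)
    return total, grade
```

Feeding in the console run above (35 + 27 + 25 + 8) reproduces its 95/100 Grade A.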
History JSON (appended, never overwritten)
.vf/benchmark-history.json

```json
[
  { "ts": "2026-04-20T10:11:00Z", "score": 88, "grade": "B+", "coverage": 32 },
  { "ts": "2026-04-22T15:33:00Z", "score": 92, "grade": "A-", "coverage": 35 },
  { "ts": "2026-04-23T21:47:02Z", "score": 95, "grade": "A", "coverage": 35,
    "delta": { "score": 3, "coverage": 0, "evidence_quality": 2, "speed": 1 } }
]
```

Gotcha: Cross-project comparison is invalid — only same-project trends count. The history file is project-local. Copying .vf/benchmark-history.json from one repo to another makes scores look valid, but the trend line is meaningless. VF does not detect this automatically.

08 Quick-Reference Gotcha Matrix
| Command | Single biggest gotcha |
|---|---|
| /vf-setup | strict mode blocks writes to *.test.* — migrate legacy tests first |
| /validate-plan | Re-running writes .v2.yaml instead of overwriting; must rm first |
| /validate-sweep | Don't delete the in-progress lock file manually |
| /validate-fix | Attempt N rejected if same root cause as attempt N-1 |
| /validate-consensus | SPLIT never silently downgrades to MAJORITY — HIGH requires unanimity |
| /validate-dashboard | Inlined screenshots can exceed 20 MB; use --compress for email |
| /validate-benchmark | History JSON is project-local; copying it invalidates trends |