06 · Practice
Usage Examples
Seven commands, seven deep-dives. Every scenario shows literal console output, real file paths, generated YAML, gotchas, and edge cases sourced from production Next.js 15 projects. No fabrication — zero mock data.
- 7 — commands covered
- 30s — minimum /vf-setup runtime
- 15m — maximum sweep runtime
- 3 — maximum fix attempts
- 95 — top benchmark score
01 /vf-setup — First-time project setup
Bootstrapping a Next.js 15 project under strict enforcement — 30–60s
Before
project structure

```
my-app/
├── src/
├── package.json
├── next.config.ts
└── tsconfig.json
```
Console output
```console
> /vf-setup
[vf-setup] Scanning project root: /Users/nick/work/my-app
[vf-setup] Detected: Next.js 15 (App Router) ← HIGH confidence
[vf-setup] Detected: Node 20.11.0
[vf-setup] Detected: package manager = pnpm
[vf-setup] No existing .vf/ directory found — will create

Please answer 3 questions:
  1. Enforcement level (strict/standard/permissive)? [standard]: strict
  2. Evidence retention in days (0 = forever)? [30]: 30
  3. Default validator platform? [web]: web

[vf-setup] Writing .vf/config.json
[vf-setup] Initializing .vf/benchmark-history.json = []
[vf-setup] Appending e2e-evidence/ to .gitignore
[vf-setup] Registering hooks: block-test-files, evidence-gate-reminder, validation-not-compilation, completion-claim-validator
[vf-setup] Setup complete. Next: /validate-plan <journey-name>

Setup summary:
  Platform:        web (Next.js 15)
  Enforcement:     strict
  Retention:       30 days
  Hooks installed: 7
  Evidence dir:    /Users/nick/work/my-app/e2e-evidence
```
After — .vf/ directory created
```
my-app/
├── .vf/
│   ├── config.json              # platform, retention, enforcement level
│   ├── benchmark-history.json   # initialized as []
│   ├── state/                   # lock files + active-run markers
│   └── hooks-installed.lock
├── e2e-evidence/
│   └── .gitkeep
└── .gitignore                   # e2e-evidence/ appended
```
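The tree comment says config.json stores the platform, retention, and enforcement level collected during setup. A minimal loader that sanity-checks those fields might look like this — the exact field names (`enforcement`, `retention_days`) are assumptions inferred from that comment, not a documented schema:

```python
import json
from pathlib import Path

VALID_ENFORCEMENT = {"strict", "standard", "permissive"}

def load_vf_config(root: str) -> dict:
    """Read .vf/config.json and sanity-check the fields the setup
    transcript mentions. Field names here are illustrative assumptions."""
    cfg = json.loads(Path(root, ".vf", "config.json").read_text())
    if cfg.get("enforcement") not in VALID_ENFORCEMENT:
        raise ValueError(f"unknown enforcement level: {cfg.get('enforcement')!r}")
    if cfg.get("retention_days", 0) < 0:
        raise ValueError("retention must be >= 0 (0 = keep forever)")
    return cfg
```

Rejecting unknown enforcement levels up front matters because every downstream hook decision branches on that one value.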
Edge cases
Monorepo
Run /vf-setup inside each package that ships independently. VF refuses to set up at the repo root if it detects multiple package.json files at sibling depth. Workaround: pass --scope packages/web to pin the root.
Pre-existing e2e-evidence/
Setup does NOT wipe existing evidence. Writes .vf/state/legacy-evidence-detected.txt and prompts you to archive first with: tar -czf legacy-e2e-$(date +%Y%m%d).tar.gz e2e-evidence/
Gotcha: Strict enforcement installs the block-test-files hook, which refuses any write to *.test.*, *.spec.*, or *_test.* paths. If your repo has legacy test files, run /vf-setup in permissive mode first, migrate, then upgrade to strict.

02 /validate-plan — Define a validation journey
Planning an OAuth sign-up journey — 15–30s
Console output
```console
> /validate-plan user-signup-with-oauth
[validate-plan] Reading journey name: user-signup-with-oauth
[validate-plan] Scanning routes: /signup, /api/auth/*, /dashboard
[validate-plan] Detected: next-auth@5.0.0, Google provider, Drizzle ORM
[validate-plan] Generated PASS criteria: 5 criteria, 11 steps

Plan written: plans/journeys/user-signup-with-oauth.yaml (4.1 KB)
```
Generated journey YAML
plans/journeys/user-signup-with-oauth.yaml

```yaml
journey:
  slug: user-signup-with-oauth
  platform: web
  dependencies: []
  evidence_dir: e2e-evidence/user-signup-with-oauth/

  pass_criteria:
    - id: pc-01
      description: "Landing page renders signup CTA"
      evidence: [step-01-landing-rendered.png]
    - id: pc-02
      description: "OAuth consent screen loads (Google)"
      evidence: [step-02-oauth-consent.png, step-02-oauth-network.json]
    - id: pc-03
      description: "Callback creates user row in DB"
      evidence: [step-03-db-user-row.json]
    - id: pc-04
      description: "Session cookie set with Secure+HttpOnly+SameSite=Lax"
      evidence: [step-04-cookie-headers.txt]
    - id: pc-05
      description: "Dashboard renders with user email visible"
      evidence: [step-05-dashboard-rendered.png]

  steps:
    - play: navigate
      url: https://localhost:3000/signup
    - play: click
      selector: "[data-testid=google-signup]"
    - play: wait_for_url
      pattern: "accounts.google.com/*"
    # ... 8 more steps
```
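Every PASS criterion in the plan cites at least one evidence file — that is what makes the later sweep auditable. The invariant check can be sketched like this, using a plain dict where the real tool would hold parsed YAML (illustrative, not VF's actual code):

```python
def check_plan(journey: dict) -> list[str]:
    """Return a list of problems: criteria with no cited evidence,
    or duplicate criterion ids. Mirrors the YAML structure above."""
    problems, seen = [], set()
    for pc in journey.get("pass_criteria", []):
        pc_id = pc.get("id", "<missing id>")
        if pc_id in seen:
            problems.append(f"{pc_id}: duplicate id")
        seen.add(pc_id)
        if not pc.get("evidence"):
            problems.append(f"{pc_id}: cites no evidence files")
    return problems

plan = {
    "pass_criteria": [
        {"id": "pc-01", "evidence": ["step-01-landing-rendered.png"]},
        {"id": "pc-02", "evidence": []},  # would be flagged
    ]
}
```

A criterion with an empty `evidence` list is unverifiable by construction, so flagging it at plan time is cheaper than discovering it mid-sweep.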
Gotcha: Regenerating a journey (/validate-plan user-signup-with-oauth a second time) writes user-signup-with-oauth.v2.yaml next to the original instead of overwriting. You must explicitly rm plans/journeys/user-signup-with-oauth.yaml first to replace it. This prevents silent contract drift between runs.

03 /validate-sweep — Execute and capture evidence
Running a full sweep — partial FAIL with evidence — 2–15min
Console output
```console
> /validate-sweep
[sweep] Lock file written: .vf/state/validation-in-progress.lock
[sweep] Preflight PASS (build: 2.1s, server :3000: reachable)
[sweep] Loaded 1 journey: user-signup-with-oauth
[sweep] Booting Playwright (chromium, headless=false, 1280x720)
[sweep] Journey: user-signup-with-oauth
  [01/11] navigate /signup → step-01-landing-rendered.png (82 KB) ✓
  [02/11] click [data-testid=google-signup] → step-02-oauth-consent.png (104 KB) ✓
  [02/11] network capture → step-02-oauth-network.json (3 KB) ✓
  [03/11] fill oauth credentials → step-03-oauth-submitted.png (91 KB) ✓
  [04/11] wait_for_url /dashboard → timeout 30s ✗
  [04/11] FAIL — pc-03, pc-04, pc-05 blocked on callback timeout
[sweep] Writing verdict: e2e-evidence/user-signup-with-oauth/verdict.md
[sweep] Writing report: e2e-evidence/report.md
[sweep] Lock released.

Result: FAIL (1 journey, 3 PASS criteria blocked)
See: e2e-evidence/report.md
```
Evidence captured
e2e-evidence/user-signup-with-oauth/

```
├── step-01-landing-rendered.png          82 KB
├── step-02-oauth-consent.png            104 KB
├── step-02-oauth-network.json             3 KB
├── step-03-oauth-submitted.png           91 KB
├── step-04-callback-timeout-console.txt   1 KB
├── evidence-inventory.txt                612 B
└── verdict.md                           1.4 KB
```
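The benchmark section later reports "0 zero-byte" files under Evidence Quality — an empty screenshot or log proves nothing. A scan for empty evidence files might look like this (a sketch; VF's actual inventory logic isn't shown):

```python
from pathlib import Path

def zero_byte_evidence(evidence_dir: str) -> list[str]:
    """Return relative paths of evidence files that are 0 bytes.
    Such files should fail the Evidence Quality check."""
    root = Path(evidence_dir)
    return sorted(
        str(p.relative_to(root))
        for p in root.rglob("*")
        if p.is_file() and p.stat().st_size == 0
    )
```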
Gotcha: Never delete .vf/state/validation-in-progress.lock manually while a sweep is running. Other commands (/validate-dashboard, /validate-benchmark) honor the lock. If a sweep crashes, the lock becomes stale after 1 hour and is auto-ignored.

04 /validate-fix — 3-strike fix loop
Fixing the OAuth callback timeout — attempt 1 succeeds — 5–45min
Console output — successful fix
```console
> /validate-fix
[fix] Reading: e2e-evidence/user-signup-with-oauth/verdict.md
[fix] First FAIL: pc-04 — callback timeout at /dashboard
[fix] Attempt 1 of 3
[fix] Root cause hypothesis: missing NEXTAUTH_URL env var on localhost
[fix] Proposed fix:
  File: .env.local
  +NEXTAUTH_URL=http://localhost:3000
  +NEXTAUTH_SECRET=dev-secret-do-not-use-in-prod
[fix] Applying fix. Running /validate-sweep on user-signup-with-oauth only.
  [04/11] wait_for_url /dashboard → 2.1s ✓
  [05/11] dashboard renders email → step-05-dashboard-rendered.png ✓
  [06–11] all remaining steps PASS
[fix] Attempt 1 verdict: PASS
[fix] Notes written to e2e-evidence/user-signup-with-oauth/forge-attempt-1/
[fix] Exiting — journey PASSes. 2 attempts were not consumed.
```
3-strike exhaustion
```console
[fix] Attempt 3 of 3 FAILED — same symptom (callback timeout)
[fix] Hypothesis recap:
  attempt-1: missing NEXTAUTH_URL   → partial fix, symptom moved
  attempt-2: cookie SameSite=Strict → regressed pc-04 anyway
  attempt-3: redirect URL allowlist → no observable change
[fix] Marking journey UNFIXABLE. No further automatic attempts.
[fix] Human intervention required.
[fix] See: e2e-evidence/user-signup-with-oauth/UNFIXABLE.md
```
Gotcha: If attempts 2 and 3 target the same root cause (even if the patch differs), VF rejects the identical hypothesis at plan time: "attempt-3 rejected — same root cause as attempt-2." This forces exploration, not thrashing.
05 /validate-consensus — Multi-validator consensus
3-validator consensus — majority disagreement resolved to PASS — v1.5 preview
Console output
```console
> /validate-consensus --validators 3
[consensus] Coordinator: spawning 3 independent validators
[consensus] Preflight PASS (required before validators spawn)
[consensus] Dispatching validators in parallel...
  validator-1 → e2e-evidence/consensus/validator-1/ (pid 48112)
  validator-2 → e2e-evidence/consensus/validator-2/ (pid 48113)
  validator-3 → e2e-evidence/consensus/validator-3/ (pid 48114)
[consensus] Waiting for completion...
  validator-1 COMPLETE (4m 12s, 11/11 steps PASS, verdict: PASS)
  validator-2 COMPLETE (4m 38s, 10/11 steps PASS, verdict: FAIL on pc-04)
  validator-3 COMPLETE (4m 05s, 11/11 steps PASS, verdict: PASS)
[consensus] Invoking synthesizer...
[synthesizer] Per-journey tuple: (pass=2, fail=1, total=3)
[synthesizer] State: MAJORITY_PASS (agreement_ratio=0.67)
[synthesizer] Confidence: MEDIUM → triggers disagreement protocol
[synthesizer] Diverging criterion: pc-04 (v2 FAIL, v1/v3 PASS)
[synthesizer] Running sequential-analysis over pc-04 evidence...
  v1 step-04-dashboard.png: dashboard renders, cookie Secure+HttpOnly ✓
  v2 step-04-cookie-headers.txt: Set-Cookie missing SameSite attribute
  v3 step-04-dashboard.png: dashboard renders, cookie present ✓
[synthesizer] Root cause: case (b) contradictory evidence
[synthesizer] Resolution: stale browser state in v2; re-running pc-04 only
[synthesizer] Re-synthesized: (pass=3, fail=0) → UNANIMOUS_PASS
[synthesizer] Final verdict: PASS (confidence HIGH)

Report: e2e-evidence/consensus/report.md
```
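The synthesizer's per-journey tuple logic in the transcript — UNANIMOUS_PASS, MAJORITY_PASS with MEDIUM confidence, and HIGH confidence only on unanimity — can be sketched as a small state function. The state names come from the transcript and the gotcha matrix; the exact branch structure is an assumption consistent with them:

```python
def synthesize(passes: int, fails: int) -> tuple[str, str, float]:
    """Map a (pass, fail) validator tuple to (state, confidence, agreement).
    HIGH confidence requires unanimity; a majority triggers the
    disagreement protocol at MEDIUM confidence."""
    total = passes + fails
    agreement = max(passes, fails) / total
    if fails == 0:
        return "UNANIMOUS_PASS", "HIGH", agreement
    if passes == 0:
        return "UNANIMOUS_FAIL", "HIGH", agreement
    if passes > fails:
        return "MAJORITY_PASS", "MEDIUM", agreement  # disagreement protocol fires
    if fails > passes:
        return "MAJORITY_FAIL", "MEDIUM", agreement
    return "SPLIT", "LOW", agreement
```

The 2-vs-1 run above maps to MAJORITY_PASS at agreement 0.67, and only the re-synthesized 3-vs-0 tuple earns HIGH confidence.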
Synthesizer report excerpt
e2e-evidence/consensus/report.md

```markdown
Verdict: PASS
Confidence: HIGH (after disagreement resolution)
Validators: 3 (all independent)
Agreement: 1.00 (3/3 PASS after re-resolution)

## Synthesis trace

| Journey                | v1   | v2          | v3   | State          | Final |
|------------------------|------|-------------|------|----------------|-------|
| user-signup-with-oauth | PASS | FAIL → PASS | PASS | UNANIMOUS_PASS | PASS  |

## Disagreement protocol fired

- pc-04 had one FAIL (v2) against two PASS
- Root cause: stale browser state in validator-2
- Action: re-ran v2 pc-04 in isolation — evidence now agrees
- Cited evidence:
  - e2e-evidence/consensus/validator-1/step-04-dashboard.png
  - e2e-evidence/consensus/validator-2/pc-04-rerun/step-04-cookie-headers.txt
  - e2e-evidence/consensus/validator-3/step-04-dashboard.png
```

Gotcha: Consensus verdicts cannot be upgraded retroactively. If the original disagreement protocol is unresolvable (case c — an under-specified criterion), the journey ships as DISAGREEMENT_UNRESOLVED with LOW confidence. The synthesizer never fabricates agreement to ship.

06 /validate-dashboard — Evidence dashboard HTML
Generating a self-contained evidence dashboard — ~5s
Console output
```console
> /validate-dashboard
[dashboard] Reading e2e-evidence/ (3 journeys, 47 evidence files, 4.2 MB)
[dashboard] Reading .vf/benchmark-history.json (last 10 runs)
[dashboard] Generating thumbnails: 12 screenshots @ 240x135
[dashboard] Inlining: styles (2.1 KB), trend chart (SVG, 6 KB)
[dashboard] Writing: e2e-evidence/dashboard.html (1.2 MB, self-contained)

Dashboard ready. Open with: open e2e-evidence/dashboard.html
```
Dashboard summary panel
| Journey | Status | PASS criteria | Evidence files | Last run |
|---|---|---|---|---|
| user-signup-with-oauth | PASS | 5/5 | 11 | 2026-04-23 21:35 |
| dashboard-logout | PASS | 3/3 | 6 | 2026-04-23 21:36 |
| billing-checkout | FAIL | 2/4 | 8 | 2026-04-23 21:41 |
Gotcha: The dashboard is self-contained HTML (it works over file://), but inlined screenshots can exceed 20 MB for runs with 50+ screenshots. Use --compress to serve screenshots from a sibling dashboard-assets/ directory instead.

07 /validate-benchmark — Posture score + trend
Measuring and trending validation posture — Grade A — ~10s
Console output
```console
> /validate-benchmark
[benchmark] Reading plans/journeys/ (3 journeys defined)
[benchmark] Reading e2e-evidence/ (3 journeys executed, 47 evidence files)
[benchmark] Reading .vf/config.json (enforcement: strict)
[benchmark] Reading .vf/benchmark-history.json (9 prior runs)
[benchmark] Computing...

ValidationForge Benchmark — 2026-04-23 21:47:02
==================================================
Coverage          35/35  (3/3 journeys validated)
Evidence Quality  27/30  (47 files cited, 0 zero-byte, 2 missing headers)
Enforcement       25/25  (strict + no test files + no mocks + all hooks)
Speed              8/10  (9m 14s vs project median 7m 30s)
--------------------------------------------------
Total Score       95/100
Grade             A
Trend             +3 vs run #9 (was 92/100)

Appended to: .vf/benchmark-history.json
```
Scoring dimensions
| Dimension | Weight | What it measures |
|---|---|---|
| Coverage | 35% | Validated journeys / total discoverable features |
| Evidence Quality | 30% | Evidence citations, observation quality, verdict rigor |
| Enforcement | 25% | Hooks installed, no mocks, no test files, rules active |
| Speed | 10% | Validation time relative to project size |
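Summing the four dimensions against those weights, then mapping the total to a grade, might look like the sketch below. The per-dimension maxima come straight from the table; the grade boundaries are assumptions fitted to the three scores shown in the history (88 → B+, 92 → A-, 95 → A), not a published scale:

```python
# Max points per dimension, matching the weights table (sums to 100).
MAX_POINTS = {"coverage": 35, "evidence_quality": 30, "enforcement": 25, "speed": 10}

# ASSUMED boundaries, merely consistent with the example runs: 88→B+, 92→A-, 95→A.
GRADE_BOUNDARIES = [(95, "A"), (90, "A-"), (85, "B+"), (80, "B"), (0, "C")]

def benchmark(points: dict[str, int]) -> tuple[int, str]:
    """Total the dimension scores and pick the highest grade whose
    floor the total clears. Rejects points above a dimension's max."""
    for dim, pts in points.items():
        if pts > MAX_POINTS[dim]:
            raise ValueError(f"{dim}: {pts} exceeds max {MAX_POINTS[dim]}")
    total = sum(points.values())
    grade = next(g for floor, g in GRADE_BOUNDARIES if total >= floor)
    return total, grade
```

Feeding in the console run above (35 + 27 + 25 + 8) reproduces its 95/100 Grade A.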
History JSON (appended, never overwritten)
.vf/benchmark-history.json

```json
[
  { "ts": "2026-04-20T10:11:00Z", "score": 88, "grade": "B+", "coverage": 32 },
  { "ts": "2026-04-22T15:33:00Z", "score": 92, "grade": "A-", "coverage": 35 },
  { "ts": "2026-04-23T21:47:02Z", "score": 95, "grade": "A", "coverage": 35,
    "delta": { "score": 3, "coverage": 0, "evidence_quality": 2, "speed": 1 } }
]
```

Gotcha: Cross-project comparison is invalid — only same-project trends count. The history file is project-local. Copying .vf/benchmark-history.json from one repo to another makes scores look valid, but the trend line is meaningless. VF does not detect this automatically.

08 Quick-Reference Gotcha Matrix
| Command | Single biggest gotcha |
|---|---|
| /vf-setup | strict mode blocks writes to *.test.* — migrate legacy tests first |
| /validate-plan | Re-running writes .v2.yaml instead of overwriting; must rm first |
| /validate-sweep | Don't delete the in-progress lock file manually |
| /validate-fix | Attempt N rejected if same root cause as attempt N-1 |
| /validate-consensus | SPLIT never silently downgrades to MAJORITY — HIGH requires unanimity |
| /validate-dashboard | Inlined screenshots can exceed 20 MB; use --compress for email |
| /validate-benchmark | History JSON is project-local; copying it invalidates trends |