Your Test Output Is Burning Tokens: Taming Verbose Reporters for AI Agents
Test runners like Jest and Vitest ship with reporters designed for humans watching terminals. Every file gets a line:
```
PASS src/components/Button.test.tsx
PASS src/components/Card.test.tsx
PASS src/components/Dialog.test.tsx
PASS src/components/Dropdown.test.tsx
PASS src/utils/format.test.ts
PASS src/utils/date.test.ts
... (200 more lines)
FAIL src/components/Nav.test.tsx
  ● Nav > renders active state
    Expected: "active"
    Received: "inactive"
```
For a developer watching a terminal, the scrolling green text is reassuring. But tests now run in two other contexts where that output costs more than it helps:
- **CI** — nobody reads the log unless something fails. A red build forces you to scroll past hundreds of "PASS" lines to find the failure.
- **AI agents** read every line of that output. Each "PASS" line consumes tokens, filling the context window with successful test output instead of the actual problem.
We hit this at Buffer. A 215-suite test run produced
~3,500 tokens of output, almost all "PASS" lines. Our AI agent spent more tokens
reading test results than writing code. We tried adding --reporter=dot to our
CLAUDE.md instructions, but the agent didn't always use it. The flag was a
suggestion; we needed a guarantee.
Detect the Environment, Choose the Reporter
The fix: detect the environment in your test config and switch reporters automatically. No agent instructions required, no flags to remember.
Claude Code sets
CLAUDECODE=1 in every
shell it spawns. CI providers — GitHub Actions, GitLab CI, CircleCI, Travis CI,
and Jenkins — all set
CI=true.
Your config reads these variables and picks the right reporter — deterministic
regardless of how the agent invokes the test command.
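The detection itself is two environment checks. A minimal sketch of the precedence (CI is checked first, so an agent invoking tests inside a CI job still gets the CI reporter set):

```ts
// Minimal sketch of the detection logic shared by both configs below.
// CI is checked before CLAUDECODE so CI behavior wins when both are set.
type Mode = "ci" | "claude" | "local";

function detectMode(env: Record<string, string | undefined>): Mode {
  if (env.CI === "true") return "ci";
  if (env.CLAUDECODE === "1") return "claude";
  return "local";
}

console.log(detectMode({ CI: "true", CLAUDECODE: "1" })); // "ci" — CI wins
console.log(detectMode({ CLAUDECODE: "1" }));             // "claude"
console.log(detectMode({}));                              // "local"
```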
Here's what we shipped at Buffer. CI and Claude Code each get their own reporter configuration; local development keeps the default.
For Jest, add the logic to jest.config.ts. The
summary reporter
prints a final count plus full details for any failures, with no per-file
output. Jest's default summaryThreshold is 20, meaning it only prints failure
details when more than 20 tests fail. Set it to 0 so every failure prints in
full. In CI, you can pair it with a custom reporter that collects failures for
GitHub PR comments:
```ts
const isCI = process.env.CI === "true";
const isClaude = process.env.CLAUDECODE === "1";

function getReporters() {
  if (isCI) {
    return [["summary", { summaryThreshold: 0 }], "jest-ci-reporter"];
  }
  if (isClaude) {
    return [["summary", { summaryThreshold: 0 }]];
  }
  return ["default"];
}

export default {
  reporters: getReporters(),
  // ... rest of your config
};
```
For Vitest, add it to vitest.config.ts. The
dot reporter compresses
each file to a single character — a dot for pass, an x for fail:
```ts
import { defineConfig } from "vitest/config";

const isCI = process.env.CI === "true";
const isClaude = process.env.CLAUDECODE === "1";

function getReporters() {
  if (isCI) {
    return ["dot", "ci-reporter"];
  }
  if (isClaude) {
    return ["dot"];
  }
  return ["default"];
}

export default defineConfig({
  test: {
    reporters: getReporters(),
    // ... rest of your config
  },
});
```
Both frameworks also accept reporter flags on the command line
(--reporters for Jest,
--reporter for Vitest). But relying
on an AI agent to pass the right flag is probabilistic — the agent may forget or
run a different test script that omits it. Environment variables make it
deterministic.
The resulting matrix:
| Environment | Jest Reporter | Vitest Reporter |
|---|---|---|
| Local dev | default | default |
| CI | summary + CI reporter | dot + CI reporter |
| Claude Code | summary | dot |
What the Agent Sees
Before (default reporter, ~250 lines):
```
PASS src/components/Button.test.tsx (3 suites, 12 tests)
PASS src/components/Card.test.tsx (2 suites, 8 tests)
... (200+ more PASS lines)
FAIL src/components/Nav.test.tsx
  ● Nav > renders active state
    expect(received).toBe(expected)
    Expected: "active"
    Received: "inactive"

Test Suites: 1 failed, 214 passed, 215 total
Tests:       1 failed, 847 passed, 848 total
```
After (summary reporter, ~10 lines):
```
FAIL src/components/Nav.test.tsx
  ● Nav > renders active state
    expect(received).toBe(expected)
    Expected: "active"
    Received: "inactive"

Test Suites: 1 failed, 214 passed, 215 total
Tests:       1 failed, 847 passed, 848 total
```
Same failure details, 96% less output.
When all tests pass, the gap widens further. The default reporter prints every file name — 215 lines. The summary reporter prints two:
```
Test Suites: 215 passed, 215 total
Tests:       848 passed, 848 total
```
Trade-offs
**You lose progress feedback.** The summary reporter stays silent until the suite finishes. For long-running suites, the agent sees nothing until completion. In practice this has not mattered — AI agents do not need reassurance that the process is running.
**Debugging intermittent failures gets harder.** The default reporter's per-file
timing helps identify slow or flaky tests. Use the verbose reporter when
investigating flakiness.
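One way to do that is an explicit escape hatch in the same config. A Vitest-flavored sketch, where `VERBOSE=1` is our own assumed convention (not a Vitest built-in) that restores per-test output even when `CI` or `CLAUDECODE` is set:

```ts
// Hedged sketch for vitest.config.ts. VERBOSE=1 is an assumed convention;
// "verbose", "dot", and "default" are built-in Vitest reporter names.
function getReporters(env: Record<string, string | undefined>): string[] {
  if (env.VERBOSE === "1") return ["verbose"]; // full per-test detail for flake hunting
  if (env.CI === "true" || env.CLAUDECODE === "1") return ["dot"];
  return ["default"];
}

console.log(getReporters({ VERBOSE: "1", CI: "true" })); // verbose wins, even in CI
```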
The Same Fix Applies to Linters and Build Logs
Test reporters are one interface between your tools and whatever reads the
output. Linters and type checkers have the same problem. Anywhere a tool
produces verbose output that an AI agent consumes, you can detect the
environment and switch to a compact format. Check your test config — if
process.env.CI is set and you're still using the default reporter, you're
paying for output nobody reads.
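Sketching the same idea for a linter — a small helper that picks an ESLint output format by environment. The formatter names are assumptions about your setup: "stylish" is ESLint's human-friendly default, and one-line-per-problem formatters like "unix" ship built-in through ESLint v8 (and as separate packages in v9):

```ts
// Hedged sketch: choose a compact ESLint formatter when no human is watching.
function pickFormatter(env: Record<string, string | undefined>): string {
  const isCI = env.CI === "true";
  const isClaude = env.CLAUDECODE === "1";
  return isCI || isClaude ? "unix" : "stylish"; // one line per problem vs. human default
}

// A lint wrapper script could then run: eslint . --format <result>
console.log(pickFormatter({ CI: "true" })); // "unix"
console.log(pickFormatter({}));             // "stylish"
```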