
TracePact: Open-Source Tool Catches Silent AI Agent Failures Before Production

March 8, 2026

Most AI agent bugs don't look like bad output. They look like this: you tweak a prompt and the agent's final answer still looks fine, but the agent has silently stopped reading config files before deploying, or has swapped "npm test" for "npm run build". The output passes a quick glance. The behavior is broken.

TracePact is a new open-source testing framework built specifically for this problem. It records a known-good agent run as a "cassette" (a snapshot of every tool call, in order, with arguments), then diffs future runs against that baseline.
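A minimal sketch of the recording idea follows. It assumes a cassette is simply an ordered list of tool calls with their arguments, and it wraps each tool so that every invocation is logged before the tool runs. The ToolCall and Cassette shapes and the recordTools helper are hypothetical. They are not TracePact's actual file format or API.

```typescript
// Hypothetical shapes for illustration only; TracePact's real cassette format may differ.
interface ToolCall {
  seq: number;                   // position of the call within the run
  tool: string;                  // tool name, e.g. "read_file"
  args: Record<string, unknown>; // arguments as the agent passed them
}

interface Cassette {
  calls: ToolCall[];
}

type ToolFn = (args: Record<string, unknown>) => unknown;

// Hypothetical helper: wraps each tool so every invocation is appended to the
// cassette (in call order, with a shallow copy of its arguments) before the real tool runs.
function recordTools(
  tools: Record<string, ToolFn>,
  cassette: Cassette,
): Record<string, ToolFn> {
  const wrapped: Record<string, ToolFn> = {};
  for (const [name, fn] of Object.entries(tools)) {
    wrapped[name] = (args) => {
      cassette.calls.push({ seq: cassette.calls.length, tool: name, args: { ...args } });
      return fn(args);
    };
  }
  return wrapped;
}

// Usage: a toy "agent run" that reads a config file and then runs the tests.
const cassette: Cassette = { calls: [] };
const tools = recordTools(
  {
    read_file: (args) => `contents of ${String(args.path)}`,
    bash: (args) => `ran ${String(args.cmd)}`,
  },
  cassette,
);
tools.read_file({ path: "deploy.config.json" });
tools.bash({ cmd: "npm test" });

console.log(JSON.stringify(cassette, null, 2));
```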

The diff output is concrete:

read_file (seq 0) (removed) - the agent stopped reading a file it used to read
bash.cmd: "npm test" -> "npm run build" - it changed which command it runs
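Diffs like these can be produced by aligning the two call sequences before comparing them. The sketch below aligns the calls with a longest-common-subsequence (LCS) pass over tool names. That is one reasonable approach, and TracePact may not work this way. The ToolCall shape and the diffTraces function are hypothetical. Calls that fall outside the alignment are reported as removed or added, and aligned calls have their arguments compared key by key.

```typescript
// Hypothetical trace shape for illustration; TracePact's real cassette format may differ.
interface ToolCall {
  seq: number;
  tool: string;
  args: Record<string, unknown>;
}

// Hypothetical diff: aligns two call sequences by tool name using an LCS table, then
// reports removed calls, added calls, and changed arguments on aligned calls.
function diffTraces(baseline: ToolCall[], current: ToolCall[]): string[] {
  const n = baseline.length;
  const m = current.length;
  // lcs[i][j] = LCS length of baseline[i..] and current[j..], comparing tool names
  const lcs: number[][] = Array.from({ length: n + 1 }, () => new Array<number>(m + 1).fill(0));
  for (let i = n - 1; i >= 0; i--) {
    for (let j = m - 1; j >= 0; j--) {
      lcs[i][j] =
        baseline[i].tool === current[j].tool
          ? lcs[i + 1][j + 1] + 1
          : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }

  const out: string[] = [];
  let i = 0;
  let j = 0;
  while (i < n || j < m) {
    if (i < n && j < m && baseline[i].tool === current[j].tool) {
      // Aligned calls to the same tool: compare each argument by its JSON form
      // (an argument missing on one side prints as undefined).
      const before = baseline[i].args;
      const after = current[j].args;
      const keys = Array.from(new Set([...Object.keys(before), ...Object.keys(after)]));
      for (const key of keys) {
        const a = JSON.stringify(before[key]);
        const b = JSON.stringify(after[key]);
        if (a !== b) out.push(`${baseline[i].tool}.${key}: ${a} -> ${b}`);
      }
      i++;
      j++;
    } else if (j >= m || (i < n && lcs[i + 1][j] >= lcs[i][j + 1])) {
      // Skipping this baseline call keeps the longest alignment, so report it as removed.
      out.push(`${baseline[i].tool} (seq ${baseline[i].seq}) (removed)`);
      i++;
    } else {
      // Otherwise the current call has no counterpart in the baseline.
      out.push(`${current[j].tool} (seq ${current[j].seq}) (added)`);
      j++;
    }
  }
  return out;
}

const baseline: ToolCall[] = [
  { seq: 0, tool: "read_file", args: { path: "deploy.config.json" } },
  { seq: 1, tool: "bash", args: { cmd: "npm test" } },
];
const current: ToolCall[] = [{ seq: 0, tool: "bash", args: { cmd: "npm run build" } }];

console.log(diffTraces(baseline, current).join("\n"));
// read_file (seq 0) (removed)
// bash.cmd: "npm test" -> "npm run build"
```

Aligning first matters. A naive index-by-index comparison would pair the baseline's read_file with the new run's bash and report a confusing tool swap instead of one removed call and one changed argument.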

The tool supports assertions that feel familiar if you've written unit tests: toHaveCalledToolsInOrder(['read_file', 'write_file']), toHaveToolCallCount('read_file', 2), toNotHaveCalledTool('bash'). It also supports MCP (Model Context Protocol) tracing, so you can verify which MCP servers and tools your agent invokes.
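To show what those assertions check, here are plain-function equivalents that operate on a list of tool names. The function names are hypothetical stand-ins, not TracePact's matchers, and the semantics are assumptions. In particular, "in order" is treated as a subsequence, so unrelated calls may appear in between. TracePact's real Vitest matchers may define it differently.

```typescript
// Hypothetical plain-predicate versions of the matchers named above;
// TracePact's real matchers run inside Vitest and may differ in semantics.

// True if `expected` appears in `calls` as a subsequence (other calls may sit in between).
function calledToolsInOrder(calls: string[], expected: string[]): boolean {
  let k = 0;
  for (const tool of calls) {
    if (k < expected.length && tool === expected[k]) k++;
  }
  return k === expected.length;
}

// Number of times `tool` was called during the run.
function toolCallCount(calls: string[], tool: string): number {
  return calls.filter((t) => t === tool).length;
}

// True if `tool` was never called during the run.
function notCalledTool(calls: string[], tool: string): boolean {
  return !calls.includes(tool);
}

const run = ["read_file", "read_file", "write_file"];
console.log(calledToolsInOrder(run, ["read_file", "write_file"])); // true
console.log(toolCallCount(run, "read_file")); // 2
console.log(notCalledTool(run, "bash")); // true
```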

The CI integration is the practical selling point. Record a baseline once with npx tracepact run --live --record, then replay in CI with npx tracepact run --replay ./cassettes; the replay makes no API calls. Set --fail-on warn to break the build when behavioral drift is detected.
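A flag like --fail-on warn implies a severity threshold, where drift findings at or above the chosen level fail the build. The sketch below shows that gate, assuming each finding carries a severity. The Finding shape, the exitCodeFor helper, and the info and error levels are hypothetical, since only warn appears in the command above. The sample severities are also made up for illustration.

```typescript
// Hypothetical severity levels; only "warn" is confirmed by the CLI example.
type Severity = "info" | "warn" | "error";

const SEVERITY_RANK: Record<Severity, number> = { info: 0, warn: 1, error: 2 };

// Hypothetical drift finding, e.g. one line of diff output plus a severity.
interface Finding {
  message: string;
  severity: Severity;
}

// Returns a process-style exit code: 1 if any finding meets or exceeds the threshold, else 0.
function exitCodeFor(findings: Finding[], failOn: Severity): number {
  return findings.some((f) => SEVERITY_RANK[f.severity] >= SEVERITY_RANK[failOn]) ? 1 : 0;
}

// Sample findings with made-up severities.
const findings: Finding[] = [
  { message: "read_file (seq 0) (removed)", severity: "warn" },
  { message: 'bash.cmd: "npm test" -> "npm run build"', severity: "warn" },
];

console.log(exitCodeFor(findings, "warn"));  // 1: warnings break the build
console.log(exitCodeFor(findings, "error")); // 0: warnings alone are tolerated
```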

TracePact is built in TypeScript with packages for Vitest integration, a CLI, and a Promptfoo adapter. It's MIT-licensed and available on npm as @tracepact/core.

This fills a real gap. Existing LLM evaluation tools focus on output quality: does the response sound right? TracePact focuses on behavioral correctness: did the agent do the right things in the right order? For coding agents, ops automation, and workflow tools, where the sequence of actions matters as much as the final output, that distinction is critical.
