SteelSpine for AI Devs — Debug any agent. Replay any run.

The math is brutal

At 99% accuracy per step, a 100-step agent still fails 63% of the time. When it fails, your logs say "completed successfully."
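The arithmetic behind that figure, as a quick sketch. It assumes every step succeeds or fails independently, which real agents only approximate. The function name `task_failure_rate` is illustrative and not part of SteelSpine.

```python
def task_failure_rate(per_step_accuracy: float, steps: int) -> float:
    """Chance that at least one step fails, assuming independent steps."""
    return 1.0 - per_step_accuracy ** steps


print(f"{task_failure_rate(0.99, 100):.1%}")  # 63.4%
```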

63%

of 100-step agent tasks fail at 99% per-step accuracy

46%

of developers don't trust what their AI outputs

32%

cite output quality as the #1 blocker to production

0

known tools that combine replay + proof + memory

See it catch a real bug

A live recording of SteelSpine wrapping a real agent, detecting divergence between two runs, and producing the signed audit report. No theory; actual terminal output.

↑ A real refund-bot run. Watch SteelSpine catch the policy violation in real time.

Three commands, full debug flow

Wrap any agent or shell command. Capture every event, compare runs to find divergence, replay from any state. No SDK install. No framework integration. No code changes.

01 / Capture

Run once

Wrap any agent. Stream every event into a hash-chained log. Plain-English verdict after each run.

steelspine run python3 my_agent.py

02 / Compare

Run twice

Find exactly where two runs diverged. Line-level classification: failure, recovery, success, artifact event.

steelspine compare
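To make "where two runs diverged" concrete, here is a minimal sketch of the idea, not SteelSpine's actual algorithm. It assumes each run is a flat list of event names; the refund-bot event names and `first_divergence` are hypothetical.

```python
from itertools import zip_longest
from typing import Optional, Sequence


def first_divergence(run_a: Sequence[str], run_b: Sequence[str]) -> Optional[int]:
    """Return the index of the first event where two runs differ, or None."""
    # zip_longest pads the shorter run with None, so a run that stops early
    # counts as diverging at the first missing event.
    for index, (a, b) in enumerate(zip_longest(run_a, run_b)):
        if a != b:
            return index
    return None


baseline = ["lookup_order", "check_policy", "deny_refund"]
candidate = ["lookup_order", "check_policy", "issue_refund"]
print(first_divergence(baseline, candidate))  # 2
```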

03 / Verify

Prove integrity

Cryptographic chain. Tamper-evident. Auditor verifies with public key. Independently checkable forever.

steelspine verify-run
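What "hash-chained" and "tamper-evident" mean in practice, as a rough sketch. This is not SteelSpine's log format: the entry layout, the `GENESIS` placeholder, and every function name below are assumptions for illustration. Each entry's hash covers the previous hash, so editing any past event breaks every link after it.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, event: dict) -> str:
    """Hash the previous entry's hash together with this event's JSON."""
    payload = json.dumps(event, sort_keys=True).encode()
    return hashlib.sha256(prev_hash.encode() + payload).hexdigest()


def append_event(chain: list, event: dict) -> None:
    """Add an event, linking it to the hash of the entry before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"event": event, "prev": prev_hash,
                  "hash": entry_hash(prev_hash, event)})


def verify_chain(chain: list) -> bool:
    """Recompute every link; any edited event makes this return False."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append_event(log, {"step": 1, "action": "lookup_order"})
append_event(log, {"step": 2, "action": "issue_refund"})
print(verify_chain(log))  # True

log[0]["event"]["action"] = "deny_refund"  # tamper with history
print(verify_chain(log))  # False
```

A bare hash chain only catches edits if the final hash is itself protected. Signing that final hash is the usual way to let an outside auditor check the log with nothing but a public key; the signing step is left out here to keep the sketch standard-library only.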

What you get beyond capture

Replay any failure deterministically

steelspine replay reconstructs state at any captured event. Branch alternate paths from any decision point with steelspine simulate. Explore what-if scenarios offline without burning API calls.
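One way to picture replay and branching, as a toy sketch rather than how SteelSpine stores or rebuilds state. It assumes every captured event is a simple key/value update; the event fields and the `replay` helper are hypothetical.

```python
def replay(events: list, upto: int) -> dict:
    """Rebuild state by applying the first `upto` events in order."""
    state: dict = {}
    for event in events[:upto]:
        state[event["key"]] = event["value"]
    return state


events = [
    {"key": "order_id", "value": "A-17"},
    {"key": "refund_eligible", "value": False},
    {"key": "decision", "value": "issue_refund"},
]

# State just before the decision was made.
print(replay(events, 2))  # {'order_id': 'A-17', 'refund_eligible': False}

# Branch from that point with a different decision, with no model call involved.
branch = events[:2] + [{"key": "decision", "value": "deny_refund"}]
print(replay(branch, 3)["decision"])  # deny_refund
```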

Persistent memory across sessions

Every LLM call starts blind. SteelSpine's memory proxy injects persistent entity context into every prompt. One env var: OLLAMA_HOST=http://localhost:11435. Works with Ollama, LM Studio, llama.cpp, vLLM, and any OpenAI-compatible endpoint.
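A sketch of what pointing a client at the proxy could look like, using only the standard library. The address comes from the env var above; `/api/generate` is Ollama's own generate endpoint. The model name `llama3`, the prompt, and the helper name are placeholders, and the code assumes the proxy forwards Ollama-style requests unchanged.

```python
import json
import os
import urllib.request

# Use the proxy address from the text unless OLLAMA_HOST is already set.
os.environ.setdefault("OLLAMA_HOST", "http://localhost:11435")


def build_generate_request(prompt: str, model: str = "llama3") -> urllib.request.Request:
    """Build an Ollama-style generate request aimed at OLLAMA_HOST."""
    host = os.environ["OLLAMA_HOST"].rstrip("/")
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        f"{host}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_generate_request("Summarize the last refund decision.")
print(request.full_url)
```

Passing `request` to `urllib.request.urlopen` would send it. The point of the sketch is that the agent code stays the same and only the host changes.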

Native framework integrations

OpenTelemetry receiver for LangChain, LangGraph, LlamaIndex, CrewAI, OpenAI Agents SDK, Haystack, DSPy, and 50+ OTel-instrumented frameworks. One env var (OTEL_EXPORTER_OTLP_ENDPOINT) and traces flow into the signed audit chain.

Claude Code and Cursor MCP support

SteelSpine ships an MCP server that exposes 8 inspection tools to Cursor, Claude Code, and Windsurf. Your AI assistant can query captured runs, verify integrity, compare divergence, and search history directly from inside the IDE. See integrations.

CI eval gating

steelspine compare --strict exits non-zero on regression versus baseline. Drop into GitHub Actions, GitLab CI, or Jenkins. Block agent regressions from merging. See CI/CD recipe.
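In most pipelines the command can be a plain CI step, since a non-zero exit already fails the job. Where a script needs to act on the result, a gate might look like the sketch below. The `gate` helper is hypothetical, and the demo uses stand-in commands so it runs without SteelSpine installed.

```python
import subprocess
import sys


def gate(command: list) -> bool:
    """Run a command and report whether it exited with status 0."""
    return subprocess.run(command).returncode == 0


# Stand-ins for a passing and a regressing comparison. In CI the command
# would be ["steelspine", "compare", "--strict"], as described above.
passing = [sys.executable, "-c", "raise SystemExit(0)"]
failing = [sys.executable, "-c", "raise SystemExit(1)"]
print(gate(passing), gate(failing))  # True False
```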

Pricing

One tier for indie devs and small AI teams. Honest pricing. No paywalls on features.

Free

Free, no account needed

Unlimited captures and runs
Full replay, compare, verify, branch
OpenTelemetry receiver (50+ frameworks)
Persistent memory proxy
CI eval gating (compare --strict)
Cryptographic audit chain (HMAC + Ed25519)
Local-first deployment, no cloud uploads
1 user, 1 machine

Get it free

Multi-seat teams or enterprise compliance deployment? See DevOps tier or compliance tier.

Why did your agent do that?