Debug, replay, and verify any run.
Tamper-evident proof for agents and programs.
Scroll to continue ↓
That's it. One command. Full history. Proof it wasn't touched.
No vendor lock-in. Runs locally. Works with anything.
Your agent failed. You have logs. You still don't know why —
and it won't remember any of it next session.
SteelSpine AI fixes both. Zero code changes.
¹ Vellum / Towards Data Science, 2025 · ² Stack Overflow Developer Survey, 49,000 respondents, 2025 · ³ Gartner, 2025
The Short Version
Other tools give you traces. A trace shows you what happened — it doesn't let you replay it, prove it, or remember it next session. SteelSpine AI does all three.
At 99% accuracy per step, a 100-step agent still fails 63% of the time — the math compounds. When it fails, your logs say "completed successfully." SteelSpine AI shows you the exact event where it went wrong and why, then lets you replay it deterministically from that point.
Zero code changesEvery LLM call starts from zero. No memory of last session, no entity context, no continuity. Gartner found a 20% customer churn increase when agents lose session context³ — and stuffing more context past 100k tokens doubles inference time and quadruples cost. Change one URL and SteelSpine AI injects persistent memory into every request — no framework changes, ever.
One URL changeLangSmith, Galileo, Arize — they all give you traces. Traces show you what happened. They cannot prove nothing was changed. SteelSpine AI's SHA-256 rolling hash chain detects any edit, deletion, or insertion to any event, past or present. Cryptographically.
Patents PendingSee It In Action
# Wrap your agent — nothing else changes
$ steelspine run python my_agent.py
✓ Run captured: run_0047 | 312 events | 4.2s
✓ Verdict: SUCCEEDED — hash chain clean
Divergence detected vs run_0046 — auto-compare running
# Find out exactly where two runs split
$ steelspine compare
↳ Divergence at event 187: param "query" changed
↳ 3 downstream decisions invalidated — root cause isolated
# Cryptographic proof of what your AI decided
$ steelspine verify-run
✓ SHA-256 chain: CLEAN | 312/312 events verified | Audit ready
The Difference
LangSmith, Galileo, Arize, and W&B Weave are all built around the same idea: collect traces, visualize spans. That's useful. It's also where they stop.
Trace-only tools
SteelSpine AI
"Traces — not code — provide the only record of what your agent did and why." — LangSmith
SteelSpine AI agrees. Then goes further: replay it, prove it, and remember it.
The Problem
LLMs are stateless. Every run is a black box. When an agent fails — or worse, silently produces a wrong answer — you have logs, maybe. You don't have a causal record of what it decided, why, and what changed.
A tool call returns bad data at event 47. The agent recovers — but the final answer is wrong. Your logs say "completed successfully."
Two runs of the same agent on identical input produce different results. You have no way to find where they split or what caused it.
Regulated industries need proof of what an AI did and why. "The model decided" is not a compliance answer. You need a signed ledger.
How It Works
Wrap any agent with steelspine run. Every event is captured, hashed, and indexed in real time.
No instrumentation, no SDK required. Full replay, divergence detection, and tamper-evident audit — out of the box.
# Before: blind
python my_agent.py
# After: full causal record
$ steelspine run python my_agent.py
✓ 247 events captured | Chain: CLEAN | 4.1s
✓ Verdict: SUCCEEDED — no failures detected
steelspine run works on Python, Node, shell scripts, Docker containers. No changes to your agent required — ever.
$ steelspine compare run_0041 run_0042
Run A (run_0041): SUCCEEDED
Run B (run_0042): FAILED at event 112
param "temperature" 0.2 → 0.8
↳ 5 downstream decisions changed
↳ Root cause: config drift
Event-by-event comparison of any two runs. Finds the precise divergence point, shows what changed, traces every downstream decision that flowed from it.
$ steelspine verify-run --compliance-html
run_0041: CLEAN (247 events verified)
run_0042: CLEAN (301 events verified)
Hash chain: SHA-256 rolling
EU AI Act Art.12: MAPPED
Report: self-contained HTML — auditor ready
Every run produces a verifiable audit report. The SHA-256 rolling hash chain detects any byte-level change to any event — past, present, or future. No known tool offers this.
--compliance-html$ steelspine memory-agent
✓ Proxy running at http://localhost:11435
# One line in your agent
base_url = "http://localhost:11435"
✓ 3 entities recalled from prior sessions
✓ Session context injected — 14 prior interactions
A transparent LLM proxy sits between your agent and the model. It automatically recalls entities and injects prior session context — no framework changes, no new SDK to learn.
Architecture
SteelSpine AI sits between your agent and the world — capturing, evaluating, and recording every decision into a causal ledger that outlives the session.
A financial ledger records what moved and when. SteelSpine AI does the same for agent decisions — a permanent, verifiable record of every event, from first input to final output.
Command Reference
steelspine run
Wrap and capture any agent. Zero modification. Plain-English verdict after every run.
steelspine compare
Event-level divergence between any two runs. Traces root cause through downstream decisions.
steelspine status
Instant triage dashboard. Red attention banner fires automatically on critical signals.
steelspine what
"What failed in the last 10 runs?" — natural-language query, plain-English answer.
steelspine monitor
Background daemon. Proactive failure alerts in real time.
steelspine memory-agent
Transparent LLM proxy on :11435. Injects persistent entity memory into every request. One URL change. Any model.
steelspine verify-run
SHA-256 hash chain verification. --html for dev report. --compliance-html for EU AI Act Art.12.
steelspine replay
Deterministic replay from any event. Branch at any point to test alternate execution paths.
steelspine simulate
Branch alternative futures. Test different inputs against any captured agent state.
steelspine patterns
Cross-run failure pattern detection across your full run history.
steelspine ui
Browser dashboard — run manager, memory browser, audit viewer, timeline.
steelspine doctor
Health check with auto-fix. Detects config drift, stale state, and storage issues.