Debug, replay, and verify any run.

Tamper-evident proof for agents and programs.


Your AI ran.
Something went wrong.
You have no idea why.

That's it. One command. Full history. Proof it wasn't touched.
No vendor lock-in. Runs locally. Works with anything.

AI Agent Debugging & Continuity Infrastructure

Why did your agent
do that?

Your agent failed. You have logs. You still don't know why —
and it won't remember any of it next session.
SteelSpine AI fixes both. Zero code changes.

63% — of 100-step agent tasks fail at 99% per-step accuracy¹
46% — of developers don't trust what their AI outputs²
32% — cite output quality as the #1 blocker to production²
0 — known tools combine replay + proof + memory

¹ Vellum / Towards Data Science, 2025  ·  ² Stack Overflow Developer Survey, 49,000 respondents, 2025  ·  ³ Gartner, 2025

The Short Version

The problem is real. The gap is wide.
No known tool closes it.

Other tools give you traces. A trace shows you what happened — it doesn't let you replay it, prove it, or remember it next session. SteelSpine AI does all three.

01
Debug

At 99% accuracy per step, a 100-step agent still fails 63% of the time — the math compounds. When it fails, your logs say "completed successfully." SteelSpine AI shows you the exact event where it went wrong and why, then lets you replay it deterministically from that point.
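The compounding claim is plain arithmetic — success probability must multiply across every step:

```python
# At 99% per-step accuracy, success compounds across all 100 steps
per_step = 0.99
steps = 100
task_success = per_step ** steps   # ≈ 0.366
failure_rate = 1 - task_success    # ≈ 0.634 — the "63% fail" figure
```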

Zero code changes
02
Remember

Every LLM call starts from zero. No memory of last session, no entity context, no continuity. Gartner found a 20% customer churn increase when agents lose session context³ — and stuffing more context past 100k tokens doubles inference time and quadruples cost. Change one URL and SteelSpine AI injects persistent memory into every request — no framework changes, ever.

One URL change
03
Prove

LangSmith, Galileo, Arize — they all give you traces. Traces show you what happened. They cannot prove nothing was changed. SteelSpine AI's SHA-256 rolling hash chain detects any edit, deletion, or insertion to any event, past or present. Cryptographically.
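The rolling-hash-chain idea fits in a few lines of Python. This is a conceptual sketch, not SteelSpine AI's actual implementation — the event encoding and seed are illustrative assumptions:

```python
import hashlib
import json

def chain_events(events, seed="genesis"):
    """Hash each event together with the previous hash. Editing, deleting,
    or inserting any event changes every hash after it, so tampering with
    the past is detectable from the chain alone."""
    h = hashlib.sha256(seed.encode()).hexdigest()
    chain = []
    for event in events:
        payload = json.dumps(event, sort_keys=True) + h
        h = hashlib.sha256(payload.encode()).hexdigest()
        chain.append(h)
    return chain

events = [{"step": 1, "tool": "search"}, {"step": 2, "tool": "summarize"}]
original = chain_events(events)
events[0]["tool"] = "fetch"   # tamper with a *past* event
tampered = chain_events(events)
# tampered != original — the edit invalidates every downstream hash
```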

Patents Pending

See It In Action

Add it to any agent in 30 seconds.

steelspine — agent session

# Wrap your agent — nothing else changes

$ steelspine run python my_agent.py

✓ Run captured: run_0047 | 312 events | 4.2s

✓ Verdict: SUCCEEDED — hash chain clean

Divergence detected vs run_0046 — auto-compare running


# Find out exactly where two runs split

$ steelspine compare

↳ Divergence at event 187: param "query" changed

↳ 3 downstream decisions invalidated — root cause isolated


# Cryptographic proof of what your AI decided

$ steelspine verify-run

✓ SHA-256 chain: CLEAN | 312/312 events verified | Audit ready

The Difference

Trace-only tools show you what happened.
SteelSpine AI lets you act on it.

LangSmith, Galileo, Arize, and W&B Weave are all built around the same idea: collect traces, visualize spans. That's useful. It's also where they stop.

Trace-only tools

  • See what happened — read-only logs and spans
  • Requires SDK install or framework-specific wiring
  • No replay — you can read the trace, not re-run it
  • No memory between sessions — every call starts blind
  • No tamper detection — logs can be edited silently

SteelSpine AI

  • Full causal event record — every decision, every tool call
  • Zero instrumentation — wrap any command, any language
  • Deterministic replay from any event — branch at any point
  • Persistent entity memory via proxy — one URL change
  • SHA-256 hash chain — cryptographic tamper detection
  • Any agent, any framework, any LLM — no lock-in
"Traces — not code — provide the only record of what your agent did and why." — LangSmith
SteelSpine AI agrees. Then goes further: replay it, prove it, and remember it.
For the technically curious

The Problem

AI agents run. Things go wrong.
You have no idea why.

LLMs are stateless. Every run is a black box. When an agent fails — or worse, silently produces a wrong answer — you have logs, maybe. You don't have a causal record of what it decided, why, and what changed.

Agents fail silently

A tool call returns bad data at event 47. The agent recovers — but the final answer is wrong. Your logs say "completed successfully."


Runs diverge unexpectedly

Two runs of the same agent on identical input produce different results. You have no way to find where they split or what caused it.


Audit trails don't exist

Regulated industries need proof of what an AI did and why. "The model decided" is not a compliance answer. You need a signed ledger.

How It Works

SteelSpine AI adds a causal execution layer
underneath every run.

Wrap any agent with steelspine run. Every event is captured, hashed, and indexed in real time. No instrumentation, no SDK required. Full replay, divergence detection, and tamper-evident audit — out of the box.

steelspine run — zero instrumentation

# Before: blind

python my_agent.py


# After: full causal record

$ steelspine run python my_agent.py

✓ 247 events captured | Chain: CLEAN | 4.1s

✓ Verdict: SUCCEEDED — no failures detected

Debug: One command wraps anything.

steelspine run works on Python, Node, shell scripts, Docker containers. No changes to your agent required — ever.

  • Every tool call, decision, and state transition captured
  • SHA-256 hash chain written after every event
  • Plain-English verdict on every run
  • Works with LangChain, AutoGen, LangGraph, raw Python
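The zero-instrumentation wrap can be pictured as a subprocess runner that chains hashes over output as it arrives. A deliberately simplified sketch — SteelSpine AI captures far richer events than stdout lines, and every name here is illustrative:

```python
import hashlib
import subprocess
import sys

def run_captured(cmd):
    """Run a command unmodified, treating each stdout line as an 'event'
    and hash-chaining events as they are recorded."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    h = hashlib.sha256(b"genesis").hexdigest()
    events = []
    for line in proc.stdout.splitlines():
        h = hashlib.sha256((line + h).encode()).hexdigest()
        events.append({"line": line, "hash": h})
    return proc.returncode, events

# The wrapped command needs no changes at all
code, events = run_captured(
    [sys.executable, "-c", "print('step 1'); print('step 2')"]
)
```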
steelspine compare — divergence detection

$ steelspine compare run_0041 run_0042


Run A (run_0041): SUCCEEDED

Run B (run_0042): FAILED at event 112


param "temperature" 0.2 → 0.8

↳ 5 downstream decisions changed

↳ Root cause: config drift

Compare: Find the exact split point.

Event-by-event comparison of any two runs. Finds the precise divergence point, shows what changed, traces every downstream decision that flowed from it.

  • Automatic comparison after every run
  • Event-level root cause isolation
  • Branch simulation: "what if" scenario testing
  • Plain-English verdict — no log parsing required
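Conceptually, finding the split point is a first-mismatch scan over two event streams. A simplified sketch — the real comparison and event schema are assumptions:

```python
def first_divergence(run_a, run_b):
    """Return (index, event_a, event_b) at the first mismatch,
    or None if the runs agree up to the shorter length."""
    for i, (a, b) in enumerate(zip(run_a, run_b)):
        if a != b:
            return i, a, b
    return None

run_a = [{"event": 0, "temperature": 0.2}, {"event": 1, "query": "q1"}]
run_b = [{"event": 0, "temperature": 0.8}, {"event": 1, "query": "q1"}]
split = first_divergence(run_a, run_b)
# split[0] == 0: the runs diverge at the very first event (config drift)
```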
steelspine verify-run — tamper-evident audit

$ steelspine verify-run --compliance-html


run_0041: CLEAN (247 events verified)

run_0042: CLEAN (301 events verified)


Hash chain: SHA-256 rolling

EU AI Act Art. 12: MAPPED

Report: self-contained HTML — auditor ready

Prove: Cryptographic audit. Always.

Every run produces a verifiable audit report. The SHA-256 rolling hash chain detects any byte-level change to any event — past, present, or future. No known tool offers this.

  • Detects any edit, deletion, or insertion in any event
  • Self-contained HTML report — submit to any auditor
  • EU AI Act Art. 12 compliance report via --compliance-html
  • Patents Pending
steelspine memory-agent — persistent memory

$ steelspine memory-agent

✓ Proxy running at http://localhost:11435


# One line in your agent

base_url = "http://localhost:11435"


✓ 3 entities recalled from prior sessions

✓ Session context injected — 14 prior interactions

Remember: Persistent memory via proxy.

A transparent LLM proxy sits between your agent and the model. It automatically recalls entities and injects prior session context — no framework changes, no new SDK to learn.

  • Works with any OpenAI-compatible API
  • Tested with Ollama, qwen3:32b, and local models
  • Entities persist across sessions automatically
  • Zero changes to your agent logic
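What the proxy does per request can be sketched as: recall entities, prepend them as context, forward the call unchanged. A conceptual illustration only — the function name and message format are assumptions, not SteelSpine AI's code:

```python
def inject_memory(request, memory_store):
    """Prepend recalled entities as a system message before forwarding
    the otherwise-untouched request to the model."""
    entities = memory_store.get("entities", [])
    if not entities:
        return request
    context = "Known from prior sessions: " + "; ".join(entities)
    return {
        **request,
        "messages": [{"role": "system", "content": context}]
        + request["messages"],
    }

memory = {"entities": ["user prefers SQL answers", "project name: atlas"]}
req = {
    "model": "qwen3:32b",
    "messages": [{"role": "user", "content": "Summarize the project."}],
}
forwarded = inject_memory(req, memory)
# forwarded now carries prior-session context; the agent sent nothing extra
```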

Architecture

AI proposes. SteelSpine AI governs.

SteelSpine AI sits between your agent and the world — capturing, evaluating, and recording every decision into a causal ledger that outlives the session.

🤖 Your Agent — LLM · LangChain · AutoGen · raw Python · any runtime
        ↓ every event
SteelSpine AI — captures · hashes · indexes · replays · compares · remembers
        ↓ causal record
📒 Decision Ledger — tamper-evident · auditable · replayable · provable
A financial ledger records what moved and when. SteelSpine AI does the same for agent decisions — a permanent, verifiable record of every event, from first input to final output.

Command Reference

Every command maps to a promise.

Debug — find out why it failed
steelspine run

Wrap and capture any agent. Zero modification. Plain-English verdict after every run.

steelspine compare

Event-level divergence between any two runs. Traces root cause through downstream decisions.

steelspine status

Instant triage dashboard. Red attention banner fires automatically on critical signals.

steelspine what

"What failed in the last 10 runs?" — natural-language query, plain-English answer.

steelspine monitor

Background daemon. Proactive failure alerts in real time.

Remember
steelspine memory-agent

Transparent LLM proxy on :11435. Injects persistent entity memory into every request. One URL change. Any model.

Prove — deterministic replay + cryptographic proof
steelspine verify-run

SHA-256 hash chain verification. --html for dev report. --compliance-html for EU AI Act Art. 12.

steelspine replay

Deterministic replay from any event. Branch at any point to test alternate execution paths.

steelspine simulate

Branch alternative futures. Test different inputs against any captured agent state.

steelspine patterns

Cross-run failure pattern detection across your full run history.

Setup & Tooling
steelspine ui

Browser dashboard — run manager, memory browser, audit viewer, timeline.

steelspine doctor

Health check with auto-fix. Detects config drift, stale state, and storage issues.