EU AI Act Article 12 · Enforced August 2, 2026
Debug, replay, and verify any run.
Tamper-evident audit trails for any AI agent.
Cryptographically sealed runs, replay on demand, signed compliance reports.
That's it. One command. Full history. Proof it wasn't touched.
No vendor lock-in. Runs locally. Works with anything.
Your agent failed. You have logs. You still don't know why —
and it won't remember any of it next session.
SteelSpine AI fixes both. Zero code changes.
¹ Vellum / Towards Data Science, 2025 · ² Stack Overflow Developer Survey, 49,000 respondents, 2025 · ³ Gartner, 2025
The Short Version
Other tools give you traces. A trace shows you what happened — it doesn't let you replay it, prove it, or remember it next session. SteelSpine AI does all three.
At 99% accuracy per step, a 100-step agent still fails 63% of the time — the math compounds. When it fails, your logs say "completed successfully." SteelSpine AI shows you the exact event where it went wrong and why, then lets you replay it deterministically from that point. Every failure is permanently recorded — find it days, months, or years later.
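The compounding claim is easy to check. If each step succeeds with probability p and steps fail independently, a run of n steps succeeds with probability p**n:

```python
# End-to-end success of an n-step agent with per-step success rate p,
# assuming step failures are independent.
def run_success_rate(p: float, n: int) -> float:
    return p ** n

overall = run_success_rate(0.99, 100)
print(f"P(all 100 steps succeed) = {overall:.3f}")    # ≈ 0.366
print(f"P(at least one failure)  = {1 - overall:.3f}")  # ≈ 0.634
```

At 99.9% per-step accuracy the same 100-step run still fails about 10% of the time, which is why step-level capture matters more as agents get longer.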
Zero code changes
Every LLM call starts from zero. No memory of last session, no entity context, no continuity. Gartner found a 20% customer churn increase when agents lose session context³ — and stuffing more context past 100k tokens doubles inference time and quadruples cost. Change one URL and SteelSpine AI injects persistent memory into every request — no framework changes, ever.
One URL change
LangSmith, Galileo, Arize — they all give you traces. Traces show you what happened. They cannot prove nothing was changed. SteelSpine AI's SHA-256 rolling hash chain detects any edit, deletion, or insertion to any event, past or present. Cryptographically.
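The idea behind a rolling hash chain fits in a few lines. This is an illustrative sketch, not SteelSpine's actual event format: each event's digest folds in the previous digest, so editing any past event changes every digest after it.

```python
import hashlib
import json

def chain(events: list) -> list:
    """Fold each event into a rolling SHA-256 digest; returns one hex digest per event."""
    digest, out = b"\x00" * 32, []
    for ev in events:
        payload = json.dumps(ev, sort_keys=True).encode()
        digest = hashlib.sha256(digest + payload).digest()
        out.append(digest.hex())
    return out

events = [{"step": 1, "tool": "search"}, {"step": 2, "tool": "refund"}]
original = chain(events)

# Any edit to a past event invalidates that digest and every one after it.
events[0]["tool"] = "search_v2"
tampered = chain(events)
assert original[0] != tampered[0] and original[1] != tampered[1]
```

Verification is just recomputing the chain and comparing digests; a mismatch pinpoints the first tampered event.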
Patents Pending
See It In Action
↑ A real refund-bot run. Watch SteelSpine catch the policy violation in real time.
Or read it as a sequence:
# Wrap your agent — nothing else changes
$ steelspine run python my_agent.py
✓ Run captured: run_0047 | 312 events | 4.2s
✓ Verdict: SUCCEEDED — hash chain clean
Divergence detected vs run_0046 — auto-compare running
# Find out exactly where two runs split
$ steelspine compare
↳ Divergence at event 187: param "query" changed
↳ 3 downstream decisions invalidated — root cause isolated
# Cryptographic proof of what your AI decided
$ steelspine verify-run
✓ SHA-256 chain: CLEAN | 312/312 events verified | Audit ready
Beyond Capture
The capture-and-audit demo above is the first 10% of what SteelSpine does. Underneath the CLI is a five-layer infrastructure stack — every piece runs locally, no cloud dependency, no vendor lock-in.
Wrap any agent or command. Stream stdout/stderr to a hash-chained event log. Replay offline against any captured state.
steelspine run · replay-run · branch-create
HMAC-SHA256 + Ed25519 chain. Tamper-evident. Independently verifiable by an auditor with just the public key. EU AI Act Article 12 compliant out of the box. Optional hardening: compliance_mode auto-enables RFC 3161 timestamping via eIDAS-accredited TSA; --pq-sign adds ML-DSA-65 post-quantum signatures (NIST FIPS 204) for long-archive audits.
verify-run · pack-create · pack-verify
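Independent verification is the point of the keyed chain: an auditor who holds the key (or, for the Ed25519 signature, just the public key) can recompute the seal without trusting the operator. A minimal sketch of the HMAC-SHA256 half, assuming a simple sealed-tag format that is illustrative rather than SteelSpine's real one (the Ed25519 and ML-DSA signature layers are omitted here):

```python
import hashlib
import hmac
import json

def seal(events: list, key: bytes) -> str:
    """Roll a keyed HMAC-SHA256 tag over the event log; returns the final tag."""
    tag = b"\x00" * 32
    for ev in events:
        payload = json.dumps(ev, sort_keys=True).encode()
        tag = hmac.new(key, tag + payload, hashlib.sha256).digest()
    return tag.hex()

def verify_chain(events: list, expected_tag_hex: str, key: bytes) -> bool:
    """Recompute the chain and compare in constant time (avoids timing leaks)."""
    return hmac.compare_digest(seal(events, key), expected_tag_hex)

key = b"example-signing-key"  # hypothetical key, for illustration only
tag = seal([{"step": 1, "tool": "refund"}], key)
assert verify_chain([{"step": 1, "tool": "refund"}], tag, key)
assert not verify_chain([{"step": 1, "tool": "refund_v2"}], tag, key)
```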
Transparent proxy in front of any OpenAI-compatible LLM. Auto-injects relevant context into every prompt. Promotes durable facts to long-term entity store. The same agent remembers across sessions.
memory-agent · memory recall · entities
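Context injection on an OpenAI-compatible endpoint amounts to rewriting the request body before forwarding it. A hedged sketch of that one transformation (the proxy plumbing, recall scoring, and entity store are omitted, and `inject_memory` is an illustrative name, not SteelSpine's API):

```python
def inject_memory(request: dict, recalled_facts: list) -> dict:
    """Prepend recalled facts as a system message on an OpenAI-style
    chat-completions body, leaving the caller's own messages untouched."""
    if not recalled_facts:
        return request
    memory_msg = {
        "role": "system",
        "content": "Known from prior sessions:\n" + "\n".join(recalled_facts),
    }
    patched = dict(request)  # shallow copy: never mutate the caller's body
    patched["messages"] = [memory_msg] + list(request.get("messages", []))
    return patched

req = {"model": "gpt-4o",
       "messages": [{"role": "user", "content": "Where did we leave off?"}]}
out = inject_memory(req, ["Customer 4417 prefers email contact"])
```

Because the change happens at the HTTP layer, the agent's own code never sees it, which is what makes the "one URL change" claim possible.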
OpenTelemetry receiver for LangChain & OTel agents. Filesystem-drop, passive-watch, raw-log-capture. Pull events from anywhere they already are — no instrumentation needed.
otel-receiver · adapters/* · capture-pipe
Branch from any captured state. Simulate alternate paths. What-if any decision your agent made — explored offline, no live API costs.
branch-create · simulate · replay-branch
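Branching from captured state reduces to a simple idea: copy the recorded timeline, substitute one event, and replay the rest offline against recorded outputs instead of live APIs. A minimal sketch of that substitution step (illustrative names, not SteelSpine's internals):

```python
def branch(events: list, at: int, patched: dict) -> list:
    """New timeline: identical prefix, one substituted event, same suffix.
    The original captured run is left untouched."""
    return events[:at] + [patched] + events[at + 1:]

run = [{"tool": "lookup", "result": "order 881"},
       {"tool": "refund", "result": "denied"}]

# What-if: replay the run as though the refund had been approved.
alt = branch(run, 1, {"tool": "refund", "result": "approved"})
```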
No cloud uploads. No telemetry to vendors. Your agent runs, your captures, your memory, your audits — all stay on your machine. Works offline. Works in air-gapped environments. Ships with the bundle.
~/.prime/ · open architecture
Compatible with: any agent you build or run from the command line. · Not yet supported: hosted UIs (ChatGPT.com, Claude.ai web). See docs for the integration matrix.
The Difference
LangSmith, Galileo, Arize, and W&B Weave are all built around the same idea: collect traces, visualize spans. That's useful. It's also where they stop.
| | SteelSpine AI | LangSmith | Langfuse |
|---|---|---|---|
| Pricing | One-time, local, unlimited traces | $20K–$40K impl. + $9K–$18K/yr; trace overage in dev [1] | Open source; key features paywalled [2] |
| Uptime | 100% — runs on your machine | 88.8% on billing; 17hr outage reported [3] | Self-host requires ClickHouse + Redis + S3 [4] |
| Replay | ✓ Step-by-step deterministic replay | ✕ Re-runs prompt — not the original execution | ✕ No replay |
| Tamper detection | ✓ SHA-256 hash chain per run | ✕ No integrity verification | ✕ No integrity verification |
| Divergence point | ✓ Exact event where two runs split | ✕ Statistical diff only | ✕ No diff |
| Compliance audit | ✓ EU AI Act Art. 12 HTML report | ✕ | ✕ |
| Third-party notarization | ✓ RFC 3161 / eIDAS-accredited TSA — auto on with compliance_mode | ✕ | ✕ |
| Quantum-resistant | ✓ ML-DSA-65 (NIST FIPS 204) via --pq-sign | ✕ | ✕ |
| Human-oversight gate | ✓ EU AI Act Article 14 — --require-approval with sealed audit trail | ✕ | ✕ |
| Offline replay | ✓ Reconstruct any failure without live API calls | ✕ Requires live API to re-run | ✕ No offline replay |
| Failure root cause | ✓ Step-level: "diverged at step N, tool X returned unexpected schema" | ✕ Span view only — no causal chain | ✕ No root cause analysis |
| CI eval gating | ✓ steelspine eval --fail-on-diff — exit 1 on regression | Partial — bt eval (cloud-dependent) | ✕ No CLI eval gating |
| Framework integration | ✓ OTel receiver — one env var, any framework | 50+ via SDK wrappers (code changes required) | 50+ via OTel (self-host: complex infra) |
| Policy guardrails | ✓ Pre-execution rules — block or warn before a step runs | ✕ No pre-execution enforcement | ✕ No guardrails |
[1] MetaCTO — The True Cost of LangSmith, 2026 · [2] langfuse.com/pricing · [3] Product Hunt — LangSmith reviews, 2026 · [4] langfuse.com/self-hosting
"Traces — not code — provide the only record of what your agent did and why." — LangSmith
SteelSpine AI agrees. Then goes further: replay it, prove it, and remember it.
The Problem
LLMs are stateless. Every run is a black box. When an agent fails — or worse, silently produces a wrong answer — you have logs, maybe. You don't have a causal record of what it decided, why, and what changed.
A tool call returns bad data at event 47. The agent recovers — but the final answer is wrong. Your logs say "completed successfully."
Two runs of the same agent on identical input produce different results. You have no way to find where they split or what caused it.
Regulated industries need proof of what an AI did and why. "The model decided" is not a compliance answer. You need a signed ledger.
How It Works
Wrap any agent with steelspine run. Every event is captured, hashed, and indexed in real time.
No instrumentation, no SDK required. Full replay, divergence detection, and tamper-evident audit — out of the box.
# Before: blind
python my_agent.py
# After: full causal record
$ steelspine run python my_agent.py
✓ 247 events captured | Chain: CLEAN | 4.1s
✓ Verdict: SUCCEEDED — no failures detected
steelspine run works on Python, Node, shell scripts, Docker containers. No changes to your agent required — ever.
$ steelspine compare run_0041 run_0042
Run A (run_0041): SUCCEEDED
Run B (run_0042): FAILED at event 112
param "temperature" 0.2 → 0.8
↳ 5 downstream decisions changed
↳ Root cause: config drift
Event-by-event comparison of any two runs. Finds the precise divergence point, shows what changed, traces every downstream decision that flowed from it.
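At its core, divergence detection is finding the first event where two runs disagree and reporting which fields changed. A simplified sketch of that comparison (SteelSpine additionally traces the downstream decisions that flowed from the split):

```python
def first_divergence(run_a: list, run_b: list):
    """Index of the first event where two runs differ, plus the changed keys.
    Returns None if the runs match event-for-event."""
    for i, (a, b) in enumerate(zip(run_a, run_b)):
        if a != b:
            changed = {k for k in set(a) | set(b) if a.get(k) != b.get(k)}
            return i, changed
    return None

run_a = [{"event": 1, "temperature": 0.2}, {"event": 2, "answer": "refund ok"}]
run_b = [{"event": 1, "temperature": 0.8}, {"event": 2, "answer": "refund denied"}]
print(first_divergence(run_a, run_b))  # (0, {'temperature'})
```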
$ steelspine verify-run --compliance-html
run_0041: CLEAN (247 events verified)
run_0042: CLEAN (301 events verified)
Hash chain: SHA-256 rolling + HMAC-SHA256
Ed25519 signature: VERIFIED ✓
ML-DSA-65 signature: VERIFIED ✓ (post-quantum)
RFC 3161 timestamp: VERIFIED ✓ (eIDAS / Sectigo)
EU AI Act Art.12: MAPPED
Report: self-contained HTML — auditor ready
Every run produces a verifiable audit report. The SHA-256 rolling hash chain detects any byte-level change to any event — past, present, or future. No known tool offers this.
$ steelspine memory-agent
✓ Proxy running at http://localhost:11435
# One line in your agent
base_url = "http://localhost:11435"
✓ 3 entities recalled from prior sessions
✓ Session context injected — 14 prior interactions
A transparent LLM proxy sits between your agent and the model. It automatically recalls entities and injects prior session context — no framework changes, no new SDK to learn.
Architecture
SteelSpine AI sits between your agent and the world — capturing, evaluating, and recording every decision into a causal ledger that outlives the session.
A git log records every commit — who changed what, when, and why. SteelSpine AI does the same for agent decisions — a permanent, verifiable record of every event, from first input to final output.
Command Reference
steelspine run
Wrap and capture any agent. Zero modification. Plain-English verdict after every run.
steelspine compare
Event-level divergence between any two runs. Traces root cause through downstream decisions.
steelspine status
Instant triage dashboard. Red attention banner fires automatically on critical signals.
steelspine what
"What failed?" — natural-language query, plain-English answer. Failures are permanently recorded.
steelspine monitor
Background daemon. Proactive failure alerts in real time.
steelspine diagnose
Step-level root cause. "Diverged at step 3: tool returned unexpected schema." Not just that it failed — why.
steelspine eval
Score runs against criteria. --fail-on-diff exits 1 on regression. --watch for CI pipelines.
steelspine memory-agent
Transparent LLM proxy on :11435. Injects persistent entity memory into every request. One URL change. Any model.
steelspine verify-run
SHA-256 hash chain verification. --html for dev report. --compliance-html for EU AI Act Art.12.
steelspine replay
Deterministic replay from any event. Branch at any point to test alternate execution paths.
steelspine replay-run
Offline replay of any archived run. Reconstruct any failure without live API calls — forensics from the file alone.
steelspine simulate
Branch alternative futures. Test different inputs against any captured agent state.
steelspine patterns
Cross-run failure pattern detection across your full run history.
steelspine policy
Pre-execution guardrails. Define rules that block or warn before a step runs. policy check <run_id> for post-hoc audit.
steelspine ui
Browser dashboard — run manager, memory browser, audit viewer, timeline.
steelspine doctor
Health check with auto-fix. Detects config drift, stale state, and storage issues.
steelspine otel-receiver
OpenTelemetry receiver on :4318. Point OTEL_EXPORTER_OTLP_ENDPOINT here — LangChain, LlamaIndex, and 50+ frameworks auto-ingest. Zero code changes.
pip install steelspine-langchain
Native LangChain callback handler. Attach to any LLM, chain, or AgentExecutor in two lines. OTel path first, subprocess fallback. Never raises, never blocks your agent.
from steelspine_langchain import SteelSpineCallbackHandler
from langchain_openai import ChatOpenAI

handler = SteelSpineCallbackHandler()
llm = ChatOpenAI(callbacks=[handler])