EU AI Act Article 12 · Enforced August 2, 2026

Debug, replay, and verify any run.

Tamper-evident audit trails for any AI agent.

Cryptographically sealed runs, replay on demand, signed compliance reports.


Your AI ran.
Something went wrong.
You have no idea why.

That's it. One command. Full history. Proof it wasn't touched.
No vendor lock-in. Runs locally. Works with anything.

EU AI Act Article 12 enforcement starts August 2, 2026 · make your AI auditable in one command — how →
AI Agent Observability · EU AI Act Art.12 Ready · ISO 42001 · NIST AI RMF

Why did your agent
do that?

Your agent failed. You have logs. You still don't know why —
and it won't remember any of it next session.
SteelSpine AI fixes both. Zero code changes.

63% of 100-step agent tasks fail
at 99% per-step accuracy¹
46% of developers don't trust
what their AI outputs²
32% output quality is the
#1 blocker to production²
0 no known tool combines
replay + proof + memory

¹ Vellum / Towards Data Science, 2025  ·  ² Stack Overflow Developer Survey, 49,000 respondents, 2025  ·  ³ Gartner, 2025

The Short Version

The problem is real. The gap is wide.
No known tool closes it.

Other tools give you traces. A trace shows you what happened — it doesn't let you replay it, prove it, or remember it next session. SteelSpine AI does all three.

01
Debug

At 99% accuracy per step, a 100-step agent still fails 63% of the time — the math compounds. When it fails, your logs say "completed successfully." SteelSpine AI shows you the exact event where it went wrong and why, then lets you replay it deterministically from that point. Every failure is permanently recorded — find it days, months, or years later.

Zero code changes
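The 63% figure falls out of simple probability: per-step success rates multiply across steps, so small per-step error compounds. A two-line check:

```python
# At 99% per-step reliability, a 100-step task succeeds only if
# every single step succeeds -- success probabilities multiply.
per_step = 0.99
steps = 100
success = per_step ** steps   # ~0.366
failure = 1 - success         # ~0.634, the 63% figure above
print(f"success {success:.1%}, failure {failure:.1%}")
```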
02
Remember

Every LLM call starts from zero. No memory of last session, no entity context, no continuity. Gartner found a 20% customer churn increase when agents lose session context³ — and stuffing more context past 100k tokens doubles inference time and quadruples cost. Change one URL and SteelSpine AI injects persistent memory into every request — no framework changes, ever.

One URL change
03
Prove

LangSmith, Galileo, Arize — they all give you traces. Traces show you what happened. They cannot prove nothing was changed. SteelSpine AI's SHA-256 rolling hash chain detects any edit, deletion, or insertion to any event, past or present. Cryptographically.

Patents Pending
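The idea behind a rolling hash chain is easy to illustrate. The sketch below is a minimal stand-alone example (plain SHA-256 over JSON, not SteelSpine's actual event format) showing why editing any past event invalidates every hash after it:

```python
import hashlib
import json

def chain_events(events, prev="0" * 64):
    """Rolling SHA-256 chain: each event's hash covers the previous
    hash, so an edit, deletion, or insertion anywhere in the past
    changes every hash from that point forward."""
    hashes = []
    for event in events:
        payload = prev + json.dumps(event, sort_keys=True)
        prev = hashlib.sha256(payload.encode()).hexdigest()
        hashes.append(prev)
    return hashes

def verify(events, recorded):
    """Recompute the chain and compare against the recorded hashes."""
    return chain_events(events) == recorded

events = [{"step": 1, "tool": "search"}, {"step": 2, "tool": "refund"}]
recorded = chain_events(events)
assert verify(events, recorded)        # clean chain
events[0]["tool"] = "lookup"           # tamper with a past event
assert not verify(events, recorded)    # detected
```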

See It In Action

Add it to any agent in 30 seconds.

↑ A real refund-bot run. Watch SteelSpine catch the policy violation in real time.

Or read it as a sequence:

steelspine — agent session

# Wrap your agent — nothing else changes

$ steelspine run python my_agent.py

✓ Run captured: run_0047 | 312 events | 4.2s

✓ Verdict: SUCCEEDED — hash chain clean

Divergence detected vs run_0046 — auto-compare running


# Find out exactly where two runs split

$ steelspine compare

↳ Divergence at event 187: param "query" changed

↳ 3 downstream decisions invalidated — root cause isolated


# Cryptographic proof of what your AI decided

$ steelspine verify-run

✓ SHA-256 chain: CLEAN | 312/312 events verified | Audit ready

Beyond Capture

Infrastructure for AI agents.
Not a logging library.

The capture-and-audit demo above is the first 10% of what SteelSpine does. Underneath the CLI is a five-layer infrastructure stack — every piece runs locally, no cloud dependency, no vendor lock-in.

Layer 1

Capture & Replay

Wrap any agent or command. Stream stdout/stderr to a hash-chained event log. Replay offline against any captured state.

steelspine run · replay-run · branch-create
Layer 2

Cryptographic Audit

HMAC-SHA256 + Ed25519 chain. Tamper-evident. Independently verifiable by an auditor with just the public key. EU AI Act Article 12 compliant out of the box. Optional hardening: compliance_mode auto-enables RFC 3161 timestamping via eIDAS-accredited TSA; --pq-sign adds ML-DSA-65 post-quantum signatures (NIST FIPS 204) for long-archive audits.

verify-run · pack-create · pack-verify
Layer 3

Persistent Memory

Transparent proxy in front of any OpenAI-compatible LLM. Auto-injects relevant context into every prompt. Promotes durable facts to long-term entity store. The same agent remembers across sessions.

memory-agent · memory recall · entities
Layer 4

Adapters & Ingress

OpenTelemetry receiver for LangChain & OTel agents. Filesystem-drop, passive-watch, raw-log-capture. Pull events from anywhere they already are — no instrumentation needed.

otel-receiver · adapters/* · capture-pipe
Layer 5

Branching & Simulation

Branch from any captured state. Simulate alternate paths. Ask "what if" about any decision your agent made — explored offline, with no live API costs.

branch-create · simulate · replay-branch
Built In

All Local. All Yours.

No cloud uploads. No telemetry to vendors. Your agent runs, your captures, your memory, your audits — all stay on your machine. Works offline. Works in air-gapped environments. Ships with the bundle.

~/.prime/ · open architecture

Compatible with: any agent you build or run from the command line. · Not yet supported: hosted UIs (ChatGPT.com, Claude.ai web). See docs for the integration matrix.

The Difference

Trace-only tools show you what happened.
SteelSpine AI lets you act on it.

LangSmith, Galileo, Arize, and W&B Weave are all built around the same idea: collect traces, visualize spans. That's useful. It's also where they stop.

Trace-only tools

  • See what happened — read-only logs and spans
  • Requires SDK install or framework-specific wiring
  • No replay — you can read the trace, not re-run it
  • No memory between sessions — every call starts blind
  • No tamper detection — logs can be edited silently
  • "Run failed" — no step-level explanation of why or where
  • No CI eval gating — can't fail a build on agent regression

SteelSpine AI

  • Full causal event record — every decision, every tool call
  • Zero instrumentation — wrap any command, any language
  • Deterministic replay from any event — branch at any point
  • Persistent entity memory via proxy — one URL change
  • SHA-256 hash chain — cryptographic tamper detection
  • Step-level root cause — "diverged at step 3: tool returned unexpected schema"
  • CI eval gating — steelspine eval --fail-on-diff exits 1 on regression
  • OTel receiver — auto-ingest LangChain, LlamaIndex, 50+ frameworks via one env var
  • Policy guardrails — define pre-execution rules that block or warn before a step runs
|  | SteelSpine AI | LangSmith | Langfuse |
| --- | --- | --- | --- |
| Pricing | One-time, local, unlimited traces | $20K–$40K impl. + $9K–$18K/yr; trace overage in dev [1] | Open source; key features paywalled [2] |
| Uptime | 100% — runs on your machine | 88.8% on billing; 17hr outage reported [3] | Self-host requires ClickHouse + Redis + S3 [4] |
| Replay | ✓ Step-by-step deterministic replay | ✕ Re-runs prompt — not the original execution | ✕ No replay |
| Tamper detection | ✓ SHA-256 hash chain per run | ✕ No integrity verification | ✕ No integrity verification |
| Divergence point | ✓ Exact line where two runs split | ✕ Statistical diff only | ✕ No diff |
| Compliance audit | ✓ EU AI Act Art. 12 HTML report |  |  |
| Third-party notarization | ✓ RFC 3161 / eIDAS-accredited TSA — auto on with compliance_mode |  |  |
| Quantum-resistant | ✓ ML-DSA-65 (NIST FIPS 204) via --pq-sign |  |  |
| Human-oversight gate | ✓ EU AI Act Article 14 — --require-approval with sealed audit trail |  |  |
| Offline replay | ✓ Reconstruct any failure without live API calls | ✕ Requires live API to re-run | ✕ No offline replay |
| Failure root cause | ✓ Step-level: "diverged at step N, tool X returned unexpected schema" | ✕ Span view only — no causal chain | ✕ No root cause analysis |
| CI eval gating | steelspine eval --fail-on-diff — exit 1 on regression | Partial — bt eval (cloud-dependent) | ✕ No CLI eval gating |
| Framework integration | ✓ OTel receiver — one env var, any framework | 50+ via SDK wrappers (code changes required) | 50+ via OTel (self-host: complex infra) |
| Policy guardrails | ✓ Pre-execution rules — block or warn before a step runs | ✕ No pre-execution enforcement | ✕ No guardrails |

[1] MetaCTO — The True Cost of LangSmith, 2026 · [2] langfuse.com/pricing · [3] Product Hunt — LangSmith reviews, 2026 · [4] langfuse.com/self-hosting

"Traces — not code — provide the only record of what your agent did and why." — LangSmith
SteelSpine AI agrees. Then goes further: replay it, prove it, and remember it.
For the technically curious

The Problem

AI agents run. Things go wrong.
You have no idea why.

LLMs are stateless. Every run is a black box. When an agent fails — or worse, silently produces a wrong answer — you have logs, maybe. You don't have a causal record of what it decided, why, and what changed.

Agents fail silently

A tool call returns bad data at event 47. The agent recovers — but the final answer is wrong. Your logs say "completed successfully."

🔀

Runs diverge unexpectedly

Two runs of the same agent on identical input produce different results. You have no way to find where they split or what caused it.

📋

Audit trails don't exist

Regulated industries need proof of what an AI did and why. "The model decided" is not a compliance answer. You need a signed ledger.

How It Works

SteelSpine AI adds a causal execution layer
underneath every run.

Wrap any agent with steelspine run. Every event is captured, hashed, and indexed in real time. No instrumentation, no SDK required. Full replay, divergence detection, and tamper-evident audit — out of the box.

steelspine run — zero instrumentation

# Before: blind

$ python my_agent.py


# After: full causal record

$ steelspine run python my_agent.py

✓ 247 events captured | Chain: CLEAN | 4.1s

✓ Verdict: SUCCEEDED — no failures detected

Debug: One command wraps anything.

steelspine run works on Python, Node, shell scripts, Docker containers. No changes to your agent required — ever.

  • Every tool call, decision, and state transition captured
  • SHA-256 hash chain written after every event
  • Plain-English verdict on every run
  • Works with LangChain, AutoGen, LangGraph, raw Python
steelspine compare — divergence detection

$ steelspine compare run_0041 run_0042


Run A (run_0041): SUCCEEDED

Run B (run_0042): FAILED at event 112


param "temperature" 0.2 → 0.8

↳ 5 downstream decisions changed

↳ Root cause: config drift

Compare: Find the exact split point.

Event-by-event comparison of any two runs. Finds the precise divergence point, shows what changed, traces every downstream decision that flowed from it.

  • Automatic comparison after every run
  • Event-level root cause isolation
  • Branch simulation: "what if" scenario testing
  • Plain-English verdict — no log parsing required
steelspine verify-run — tamper-evident audit

$ steelspine verify-run --compliance-html


run_0041: CLEAN (247 events verified)

run_0042: CLEAN (301 events verified)


Hash chain: SHA-256 rolling + HMAC-SHA256

Ed25519 signature: VERIFIED ✓

ML-DSA-65 signature: VERIFIED ✓ (post-quantum)

RFC 3161 timestamp: VERIFIED ✓ (eIDAS / Sectigo)

EU AI Act Art.12: MAPPED

Report: self-contained HTML — auditor ready

Prove: Cryptographic audit. Always.

Every run produces a verifiable audit report. The SHA-256 rolling hash chain detects any byte-level change to any event — past, present, or future. No known tool offers this.

  • Detects any edit, deletion, or insertion in any event
  • Ed25519 asymmetric signature — auditors verify with public key, no secret needed
  • Self-contained HTML report — submit to any auditor
  • EU AI Act Art. 12 compliance report via --compliance-html
  • Patents Pending
steelspine memory-agent — persistent memory

$ steelspine memory-agent

✓ Proxy running at http://localhost:11435


# One line in your agent

base_url = "http://localhost:11435"


✓ 3 entities recalled from prior sessions

✓ Session context injected — 14 prior interactions

Remember: Persistent memory via proxy.

A transparent LLM proxy sits between your agent and the model. It automatically recalls entities and injects prior session context — no framework changes, no new SDK to learn.

  • Works with any OpenAI-compatible API
  • Tested with Ollama, qwen3:32b, and local models
  • Entities persist across sessions automatically
  • Zero changes to your agent logic
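As a sketch of the "one URL change", assuming the standard OpenAI-compatible /v1/chat/completions path and the proxy port shown in the demo, the request payload is untouched; only the base URL differs from a direct API call:

```python
import json
import urllib.request

# The one change: point the client at the local SteelSpine proxy
# (port 11435, from the demo above) instead of the upstream API.
BASE_URL = "http://localhost:11435"  # was e.g. https://api.openai.com

def build_chat_request(base_url, model, messages):
    """Build an OpenAI-compatible chat request. Everything except
    base_url is identical to a direct call to the model provider."""
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )

req = build_chat_request(BASE_URL, "qwen3:32b",
                         [{"role": "user", "content": "hi"}])
```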

Architecture

AI proposes. SteelSpine AI governs.

SteelSpine AI sits between your agent and the world — capturing, evaluating, and recording every decision into a causal ledger that outlives the session.

🤖 Your Agent (LLM · LangChain · AutoGen · raw Python · any runtime)
   ↓ every event
SteelSpine AI (captures · hashes · indexes · replays · compares · remembers)
   ↓ causal record
📒 Decision Ledger (tamper-evident · auditable · replayable · provable)
A git log records every commit — who changed what, when, and why. SteelSpine AI does the same for agent decisions — a permanent, verifiable record of every event, from first input to final output.

Command Reference

Every command maps to a promise.

Debug — find out why it failed
steelspine run

Wrap and capture any agent. Zero modification. Plain-English verdict after every run.

steelspine compare

Event-level divergence between any two runs. Traces root cause through downstream decisions.

steelspine status

Instant triage dashboard. Red attention banner fires automatically on critical signals.

steelspine what

"What failed?" — natural-language query, plain-English answer. Failures are permanently recorded.

steelspine monitor

Background daemon. Proactive failure alerts in real time.

steelspine diagnose

Step-level root cause. "Diverged at step 3: tool returned unexpected schema." Not just that it failed — why.

steelspine eval

Score runs against criteria. --fail-on-diff exits 1 on regression. --watch for CI pipelines.

Remember
steelspine memory-agent

Transparent LLM proxy on :11435. Injects persistent entity memory into every request. One URL change. Any model.

Prove — deterministic replay + cryptographic proof
steelspine verify-run

SHA-256 hash chain verification. --html for dev report. --compliance-html for EU AI Act Art.12.

steelspine replay

Deterministic replay from any event. Branch at any point to test alternate execution paths.

steelspine replay-run

Offline replay of any archived run. Reconstruct any failure without live API calls — forensics from the file alone.

steelspine simulate

Branch alternative futures. Test different inputs against any captured agent state.

steelspine patterns

Cross-run failure pattern detection across your full run history.

steelspine policy

Pre-execution guardrails. Define rules that block or warn before a step runs. policy check <run_id> for post-hoc audit.

Setup & Tooling
steelspine ui

Browser dashboard — run manager, memory browser, audit viewer, timeline.

steelspine doctor

Health check with auto-fix. Detects config drift, stale state, and storage issues.

steelspine otel-receiver

OpenTelemetry receiver on :4318. Point OTEL_EXPORTER_OTLP_ENDPOINT here — LangChain, LlamaIndex, and 50+ frameworks auto-ingest. Zero code changes.
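Assuming the standard OTLP exporter variable and the receiver on localhost:4318 as listed above, wiring in an already-instrumented agent might look like this; the agent's own code is unchanged:

```shell
# Route the agent's existing OTel spans to the local receiver.
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4318"
# steelspine otel-receiver   # (already running in another terminal)
# python my_agent.py         # agent exports spans exactly as before
echo "$OTEL_EXPORTER_OTLP_ENDPOINT"
```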

pip install steelspine-langchain

Native LangChain callback handler. Attach to any LLM, chain, or AgentExecutor in two lines. OTel path first, subprocess fallback. Never raises, never blocks your agent.

from langchain_openai import ChatOpenAI
from steelspine_langchain import SteelSpineCallbackHandler

handler = SteelSpineCallbackHandler()
llm = ChatOpenAI(callbacks=[handler])