Debug, replay, and verify any run.

Tamper-evident proof for agents and programs.


Your AI ran.
Something went wrong.
You have no idea why.

That's it. One command. Full history. Proof it wasn't touched.
No vendor lock-in. Runs locally. Works with anything.

AI Agent Debugging & Continuity Infrastructure

Why did your agent
do that?

Your agent failed. You have logs. You still don't know why —
and it won't remember any of it next session.
SteelSpine AI fixes both. Zero code changes.

63% — of 100-step agent tasks fail at 99% per-step accuracy¹
46% — of developers don't trust what their AI outputs²
32% — cite output quality as the #1 blocker to production²
0 — known tools combine replay + proof + memory

¹ Vellum / Towards Data Science, 2025  ·  ² Stack Overflow Developer Survey, 49,000 respondents, 2025  ·  ³ Gartner, 2025

The Short Version

The problem is real. The gap is wide.
No known tool closes it.

Other tools give you traces. A trace shows you what happened — it doesn't let you replay it, prove it, or remember it next session. SteelSpine AI does all three.

01
Debug

At 99% accuracy per step, a 100-step agent still fails 63% of the time — the math compounds. When it fails, your logs say "completed successfully." SteelSpine AI shows you the exact event where it went wrong and why, then lets you replay it deterministically from that point.
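The compounding claim is plain arithmetic — success probability must multiply across every step:

```python
# At 99% per-step accuracy, success compounds across all 100 steps
per_step = 0.99
steps = 100
task_success = per_step ** steps   # ≈ 0.366
failure_rate = 1 - task_success    # ≈ 0.634 — the "63% fail" figure
```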

Zero code changes
02
Remember

Every LLM call starts from zero. No memory of last session, no entity context, no continuity. Gartner found a 20% customer churn increase when agents lose session context³ — and stuffing more context past 100k tokens doubles inference time and quadruples cost. Change one URL and SteelSpine AI injects persistent memory into every request — no framework changes, ever.

One URL change
03
Prove

LangSmith, Galileo, Arize — they all give you traces. Traces show you what happened. They cannot prove nothing was changed. SteelSpine AI's SHA-256 rolling hash chain detects any edit, deletion, or insertion to any event, past or present. Cryptographically.
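The rolling-hash-chain idea fits in a few lines of Python. This is a conceptual sketch, not SteelSpine AI's actual implementation — the event encoding and seed are illustrative assumptions:

```python
import hashlib
import json

def chain_events(events, seed="genesis"):
    """Hash each event together with the previous hash. Editing, deleting,
    or inserting any event changes every hash after it, so tampering with
    the past is detectable from the chain alone."""
    h = hashlib.sha256(seed.encode()).hexdigest()
    chain = []
    for event in events:
        payload = json.dumps(event, sort_keys=True) + h
        h = hashlib.sha256(payload.encode()).hexdigest()
        chain.append(h)
    return chain

events = [{"step": 1, "tool": "search"}, {"step": 2, "tool": "summarize"}]
original = chain_events(events)
events[0]["tool"] = "fetch"   # tamper with a *past* event
tampered = chain_events(events)
# tampered != original — the edit invalidates every downstream hash
```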

Patents Pending

See It In Action

Add it to any agent in 30 seconds.

steelspine — agent session

# Wrap your agent — nothing else changes

$ steelspine run python my_agent.py

✓ Run captured: run_0047 | 312 events | 4.2s

✓ Verdict: SUCCEEDED — hash chain clean

Divergence detected vs run_0046 — auto-compare running


# Find out exactly where two runs split

$ steelspine compare

↳ Divergence at event 187: param "query" changed

↳ 3 downstream decisions invalidated — root cause isolated


# Cryptographic proof of what your AI decided

$ steelspine verify-run

✓ SHA-256 chain: CLEAN | 312/312 events verified | Audit ready

The Difference

Trace-only tools show you what happened.
SteelSpine AI lets you act on it.

LangSmith, Galileo, Arize, and W&B Weave are all built around the same idea: collect traces, visualize spans. That's useful. It's also where they stop.

Trace-only tools

  • See what happened — read-only logs and spans
  • Requires SDK install or framework-specific wiring
  • No replay — you can read the trace, not re-run it
  • No memory between sessions — every call starts blind
  • No tamper detection — logs can be edited silently

SteelSpine AI

  • Full causal event record — every decision, every tool call
  • Zero instrumentation — wrap any command, any language
  • Deterministic replay from any event — branch at any point
  • Persistent entity memory via proxy — one URL change
  • SHA-256 hash chain — cryptographic tamper detection
  • Any agent, any framework, any LLM — no lock-in
"Traces — not code — provide the only record of what your agent did and why." — LangSmith
SteelSpine AI agrees. Then goes further: replay it, prove it, and remember it.
For the technically curious

The Problem

AI agents run. Things go wrong.
You have no idea why.

LLMs are stateless. Every run is a black box. When an agent fails — or worse, silently produces a wrong answer — you have logs, maybe. You don't have a causal record of what it decided, why, and what changed.

Agents fail silently

A tool call returns bad data at event 47. The agent recovers — but the final answer is wrong. Your logs say "completed successfully."


Runs diverge unexpectedly

Two runs of the same agent on identical input produce different results. You have no way to find where they split or what caused it.


Audit trails don't exist

Regulated industries need proof of what an AI did and why. "The model decided" is not a compliance answer. You need a signed ledger.

How It Works

SteelSpine AI adds a causal execution layer
underneath every run.

Wrap any agent with steelspine run. Every event is captured, hashed, and indexed in real time. No instrumentation, no SDK required. Full replay, divergence detection, and tamper-evident audit — out of the box.

steelspine run — zero instrumentation

# Before: blind

python my_agent.py


# After: full causal record

$ steelspine run python my_agent.py

✓ 247 events captured | Chain: CLEAN | 4.1s

✓ Verdict: SUCCEEDED — no failures detected

Debug: One command wraps anything.

steelspine run works on Python, Node, shell scripts, Docker containers. No changes to your agent required — ever.

  • Every tool call, decision, and state transition captured
  • SHA-256 hash chain written after every event
  • Plain-English verdict on every run
  • Works with LangChain, AutoGen, LangGraph, raw Python
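The zero-instrumentation wrap can be pictured as a subprocess runner that chains hashes over output as it arrives. A deliberately simplified sketch — SteelSpine AI captures far richer events than stdout lines, and every name here is illustrative:

```python
import hashlib
import subprocess
import sys

def run_captured(cmd):
    """Run a command unmodified, treating each stdout line as an 'event'
    and hash-chaining events as they are recorded."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    h = hashlib.sha256(b"genesis").hexdigest()
    events = []
    for line in proc.stdout.splitlines():
        h = hashlib.sha256((line + h).encode()).hexdigest()
        events.append({"line": line, "hash": h})
    return proc.returncode, events

# The wrapped command needs no changes at all
code, events = run_captured(
    [sys.executable, "-c", "print('step 1'); print('step 2')"]
)
```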
steelspine compare — divergence detection

$ steelspine compare run_0041 run_0042


Run A (run_0041): SUCCEEDED

Run B (run_0042): FAILED at event 112


param "temperature" 0.2 → 0.8

↳ 5 downstream decisions changed

↳ Root cause: config drift

Compare: Find the exact split point.

Event-by-event comparison of any two runs. Finds the precise divergence point, shows what changed, traces every downstream decision that flowed from it.

  • Automatic comparison after every run
  • Event-level root cause isolation
  • Branch simulation: "what if" scenario testing
  • Plain-English verdict — no log parsing required
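Conceptually, finding the split point is a first-mismatch scan over two event streams. A simplified sketch — the real comparison and event schema are assumptions:

```python
def first_divergence(run_a, run_b):
    """Return (index, event_a, event_b) at the first mismatch,
    or None if the runs agree up to the shorter length."""
    for i, (a, b) in enumerate(zip(run_a, run_b)):
        if a != b:
            return i, a, b
    return None

run_a = [{"event": 0, "temperature": 0.2}, {"event": 1, "query": "q1"}]
run_b = [{"event": 0, "temperature": 0.8}, {"event": 1, "query": "q1"}]
split = first_divergence(run_a, run_b)
# split[0] == 0: the runs diverge at the very first event (config drift)
```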
steelspine verify-run — tamper-evident audit

$ steelspine verify-run --compliance-html


run_0041: CLEAN (247 events verified)

run_0042: CLEAN (301 events verified)


Hash chain: SHA-256 rolling

EU AI Act Art. 12: MAPPED

Report: self-contained HTML — auditor ready

Prove: Cryptographic audit. Always.

Every run produces a verifiable audit report. The SHA-256 rolling hash chain detects any byte-level change to any event — past, present, or future. No known tool offers this.

  • Detects any edit, deletion, or insertion in any event
  • Self-contained HTML report — submit to any auditor
  • EU AI Act Art. 12 compliance report via --compliance-html
  • Patents Pending
steelspine memory-agent — persistent memory

$ steelspine memory-agent

✓ Proxy running at http://localhost:11435


# One line in your agent

base_url = "http://localhost:11435"


✓ 3 entities recalled from prior sessions

✓ Session context injected — 14 prior interactions

Remember: Persistent memory via proxy.

A transparent LLM proxy sits between your agent and the model. It automatically recalls entities and injects prior session context — no framework changes, no new SDK to learn.

  • Works with any OpenAI-compatible API
  • Tested with Ollama, qwen3:32b, and local models
  • Entities persist across sessions automatically
  • Zero changes to your agent logic
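What the proxy does per request can be sketched as: recall entities, prepend them as context, forward the call unchanged. A conceptual illustration only — the function name and message format are assumptions, not SteelSpine AI's code:

```python
def inject_memory(request, memory_store):
    """Prepend recalled entities as a system message before forwarding
    the otherwise-untouched request to the model."""
    entities = memory_store.get("entities", [])
    if not entities:
        return request
    context = "Known from prior sessions: " + "; ".join(entities)
    return {
        **request,
        "messages": [{"role": "system", "content": context}]
        + request["messages"],
    }

memory = {"entities": ["user prefers SQL answers", "project name: atlas"]}
req = {
    "model": "qwen3:32b",
    "messages": [{"role": "user", "content": "Summarize the project."}],
}
forwarded = inject_memory(req, memory)
# forwarded now carries prior-session context; the agent sent nothing extra
```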

Architecture

AI proposes. SteelSpine AI governs.

SteelSpine AI sits between your agent and the world — capturing, evaluating, and recording every decision into a causal ledger that outlives the session.

🤖 Your Agent — LLM · LangChain · AutoGen · raw Python · any runtime
        ↓ every event
SteelSpine AI — captures · hashes · indexes · replays · compares · remembers
        ↓ causal record
📒 Decision Ledger — tamper-evident · auditable · replayable · provable
A financial ledger records what moved and when. SteelSpine AI does the same for agent decisions — a permanent, verifiable record of every event, from first input to final output.

Command Reference

Every command maps to a promise.

Debug — find out why it failed
steelspine run

Wrap and capture any agent. Zero modification. Plain-English verdict after every run.

steelspine compare

Event-level divergence between any two runs. Traces root cause through downstream decisions.

steelspine status

Instant triage dashboard. Red attention banner fires automatically on critical signals.

steelspine what

"What failed in the last 10 runs?" — natural-language query, plain-English answer.

steelspine monitor

Background daemon. Proactive failure alerts in real time.

Remember
steelspine memory-agent

Transparent LLM proxy on :11435. Injects persistent entity memory into every request. One URL change. Any model.

Prove — deterministic replay + cryptographic proof
steelspine verify-run

SHA-256 hash chain verification. --html for dev report. --compliance-html for EU AI Act Art. 12.

steelspine replay

Deterministic replay from any event. Branch at any point to test alternate execution paths.

steelspine simulate

Branch alternative futures. Test different inputs against any captured agent state.

steelspine patterns

Cross-run failure pattern detection across your full run history.

Setup & Tooling
steelspine ui

Browser dashboard — run manager, memory browser, audit viewer, timeline.

steelspine doctor

Health check with auto-fix. Detects config drift, stale state, and storage issues.