For AI Builders & Local-LLM Devs

Local-first persistent memory.
One URL change.

Every LLM call starts blind. SteelSpine's memory proxy injects persistent entity context into every prompt. Works with Ollama, LM Studio, llama.cpp, vLLM, and any OpenAI-compatible endpoint. No SDK. No framework changes. Memory lives on your machine.

CA$29.99/mo after trial · No vendor lock-in · Memory stays on your machine

The memory problem in AI agents

LLMs are stateless by default. Every conversation, every agent run, every NPC interaction starts from zero context. You either stuff growing context windows (slow + expensive) or you build a memory layer yourself (months of engineering, lots of edge cases).

The third option is a transparent proxy that handles memory automatically. Mem0 and Letta proved this category exists. SteelSpine's memory-agent implements the same pattern with two structural advantages: local-first storage (memory lives in ~/.prime/entities/ on your machine, not a vendor's cloud) and cryptographic audit chain (every memory mutation is signed and verifiable).

How it works

Three minutes to set up. One URL change. No SDK install.

Step 1: Start the memory proxy

steelspine start
# Memory proxy now listening on http://localhost:11435
# Auto-detects local LLM (Ollama, LM Studio, llama.cpp, vLLM)

Step 2: Point your LLM client at the proxy

# Instead of:
#   OLLAMA_HOST=http://localhost:11434
# Use:
export OLLAMA_HOST=http://localhost:11435

# Or for any OpenAI-compatible client:
export OPENAI_BASE_URL=http://localhost:11435/v1

Step 3: Use your LLM normally

That's it. The proxy auto-extracts entities from each conversation turn, persists them to ~/.prime/entities/<name>.json, and injects relevant context into the next prompt automatically. Your agent code does not change. Your framework does not change.

What you get: the same LLM call now produces context-aware responses about people, projects, characters, or any entity that surfaced in earlier conversations. No tinkering with vector stores. No prompt engineering for context retrieval. No framework lock-in.

Before and after the memory proxy

Same LLM. Same prompt. The proxy injects entity context automatically. Same one URL change, different conversational quality.

steelspine — memory proxy demo

# Before: direct to Ollama, no memory layer

$ OLLAMA_HOST=http://localhost:11434 ollama run llama3.1 \

    "What did Sarah say about the project deadline?"

  → I don't have context about Sarah or any project. Could you provide more details?

# Start the memory proxy (one-time)

$ steelspine start

✓ Memory proxy on http://localhost:11435

✓ Auto-detected: Ollama at localhost:11434

# After: same client, change port to proxy. Memory injected automatically.

$ OLLAMA_HOST=http://localhost:11435 ollama run llama3.1 \

    "What did Sarah say about the project deadline?"

✓ Entity context injected: Sarah (8 prior turns) · project_alpha (12 prior turns)

  → Sarah mentioned the project deadline is October 15. She also said the design

    review needs to happen by October 1 to stay on track.

# Memory persists to ~/.prime/entities/ on your disk. Signed. Local-first.

No SDK install. No framework changes. The proxy speaks OpenAI's chat completions API, so any client (LangChain, LlamaIndex, custom, terminal) that points at it gets memory automatically. Same with LM Studio, llama.cpp, vLLM, and any OpenAI-compatible endpoint.

Four properties that differentiate

01 / Local-first

Memory stays on your machine

All entity memory is stored at ~/.prime/entities/ on your filesystem. No cloud sync. No vendor servers. No data residency complications. Air-gapped deployments are first-class. Memory survives vendor outages, account terminations, and ToS changes.

02 / Cryptographically Audited

Every memory write is signed

Memory mutations append to the same HMAC-SHA256 + Ed25519 hash chain that powers SteelSpine's compliance audit. Tampering with stored memory breaks the chain. Useful for regulated environments where AI-modified data has audit-trail requirements.

03 / Drop-in Compatibility

One URL change. Any OpenAI-compatible LLM.

The proxy speaks OpenAI's chat completions API. Any client (LangChain, LlamaIndex, custom, terminal) that points at it gets memory automatically. Works with Ollama, LM Studio, llama.cpp, vLLM, OpenAI itself, Anthropic via gateway, anything that speaks the same wire protocol.

04 / Replayable

Every memory state has a captured history

Memory does not just persist; it has a replay surface. Reconstruct what an entity knew at any point in time with state-at <event_id>. Branch alternative memory states for what-if exploration. Debug memory bugs deterministically.

How this compares to Mem0, Letta, Zep

The memory category exists. SteelSpine is the local-first + audit-grade entry. Honest comparison:

Property Mem0 Letta Zep SteelSpine Memory
Pricing entry $19/mo retail, $249/mo Pro Open source (self-host) Free tier + credits CA$29.99/mo + free trial
Local-first storage Cloud-first Self-hosted Cloud-first Local-first default
Cryptographic audit chain No No No HMAC + Ed25519 signed
SDK required SDK SDK / Python framework SDK No SDK — transparent proxy
Setup steps npm install + API key Self-host stack API key + integration One env var
Works with any OpenAI-compatible LLM SDK-specific Framework-specific SDK-specific Any OpenAI-compatible endpoint
Replay / branch memory state No No No replay-branch, state-at
Bundled with audit + replay infrastructure Memory only Memory only Memory only Full SteelSpine product

Mem0 has $24M Series A and 48k GitHub stars. Strong product. SteelSpine differentiates on local-first + cryptographic audit + drop-in transparent proxy + bundled with full audit/replay stack. Same category, structurally different positioning.

Pricing

One tier. Memory plus the full SteelSpine stack (debug, replay, verify, branch, OTEL, MCP) included.

CA$29.99/mo
14-day free trial · No credit card required
  • Persistent entity memory across sessions
  • Auto entity extraction + context injection
  • Transparent proxy (Ollama, LM Studio, llama.cpp, vLLM, any OpenAI-compatible)
  • Cryptographically-signed memory mutations
  • Replay + branch memory state
  • Local-first storage (~/.prime/entities/)
  • Plus the full SteelSpine stack: debug, replay, verify, OTEL, MCP, Cursor + Claude Code integrations
  • 1 user, 1 machine
Start 14-Day Free Trial

Multi-seat team or enterprise deployment? See DevOps tier or compliance tier.