For AI Builders & Local-LLM Devs

Local-first persistent memory.
One URL change.

Every LLM call starts blind. SteelSpine's memory proxy injects persistent entity context into every prompt. Works with Ollama, LM Studio, llama.cpp, vLLM, and any OpenAI-compatible endpoint. No SDK. No framework changes. Memory lives on your machine.

Get it free See how

Free · No vendor lock-in · Memory stays on your machine

The memory problem in AI agents

LLMs are stateless by default. Every conversation, every agent run, every NPC interaction starts from zero context. You either stuff growing context windows (slow + expensive) or you build a memory layer yourself (months of engineering, lots of edge cases).

The third option is a transparent proxy that handles memory automatically. Mem0 and Letta proved this category exists. SteelSpine's memory-agent implements the same pattern with two structural advantages: local-first storage (memory lives in ~/.prime/entities/ on your machine, not a vendor's cloud) and cryptographic audit chain (every memory mutation is signed and verifiable).

How it works

Three minutes to set up. One URL change. No SDK install.

Step 1: Start the memory proxy

steelspine start
# Memory proxy now listening on http://localhost:11435
# Auto-detects local LLM (Ollama, LM Studio, llama.cpp, vLLM)

Step 2: Point your LLM client at the proxy

# Instead of:
#   OLLAMA_HOST=http://localhost:11434
# Use:
export OLLAMA_HOST=http://localhost:11435

# Or for any OpenAI-compatible client:
export OPENAI_BASE_URL=http://localhost:11435/v1

Step 3: Use your LLM normally

That's it. The proxy auto-extracts entities from each conversation turn, persists them to ~/.prime/entities/<name>.json, and injects relevant context into the next prompt automatically. Your agent code does not change. Your framework does not change.

What you get: the same LLM call now produces context-aware responses about people, projects, characters, or any entity that surfaced in earlier conversations. No tinkering with vector stores. No prompt engineering for context retrieval. No framework lock-in.

Before and after the memory proxy

Same LLM. Same prompt. The proxy injects entity context automatically. Same one URL change, different conversational quality.

        
        steelspine — memory proxy demo

# Before: direct to Ollama, no memory layer
$ OLLAMA_HOST=http://localhost:11434 ollama run llama3.1 \
    "What did Sarah say about the project deadline?"
  → I don't have context about Sarah or any project. Could you provide more details?
# Start the memory proxy (one-time)
$ steelspine start
✓ Memory proxy on http://localhost:11435
✓ Auto-detected: Ollama at localhost:11434
# After: same client, change port to proxy. Memory injected automatically.
$ OLLAMA_HOST=http://localhost:11435 ollama run llama3.1 \
    "What did Sarah say about the project deadline?"
✓ Entity context injected: Sarah (8 prior turns) · project_alpha (12 prior turns)
  → Sarah mentioned the project deadline is October 15. She also said the design
    review needs to happen by October 1 to stay on track.
# Memory persists to ~/.prime/entities/ on your disk. Signed. Local-first.

No SDK install. No framework changes. The proxy speaks OpenAI's chat completions API, so any client (LangChain, LlamaIndex, custom, terminal) that points at it gets memory automatically. Same with LM Studio, llama.cpp, vLLM, and any OpenAI-compatible endpoint.

Four properties that differentiate

01 / Local-first

Memory stays on your machine

All entity memory is stored at ~/.prime/entities/ on your filesystem. No cloud sync. No vendor servers. No data residency complications. Air-gapped deployments are first-class. Memory survives vendor outages, account terminations, and ToS changes.

02 / Cryptographically Audited

Every memory write is signed

Memory mutations append to the same HMAC-SHA256 + Ed25519 hash chain that powers SteelSpine's compliance audit. Tampering with stored memory breaks the chain. Useful for regulated environments where AI-modified data has audit-trail requirements.

03 / Drop-in Compatibility

One URL change. Any OpenAI-compatible LLM.

The proxy speaks OpenAI's chat completions API. Any client (LangChain, LlamaIndex, custom, terminal) that points at it gets memory automatically. Works with Ollama, LM Studio, llama.cpp, vLLM, OpenAI itself, Anthropic via gateway, anything that speaks the same wire protocol.

04 / Replayable

Every memory state has a captured history

Memory does not just persist; it has a replay surface. Reconstruct what an entity knew at any point in time with state-at <event_id>. Branch alternative memory states for what-if exploration. Debug memory bugs deterministically.

How this compares to Mem0, Letta, Zep

The memory category exists. SteelSpine is the local-first + audit-grade entry. Honest comparison:

Property	Mem0	Letta	Zep	SteelSpine Memory
Pricing entry	$19/mo retail, $249/mo Pro	Open source (self-host)	Free tier + credits	Free
Local-first storage	Cloud-first	Self-hosted	Cloud-first	Local-first default
Cryptographic audit chain	No	No	No	HMAC + Ed25519 signed
SDK required	SDK	SDK / Python framework	SDK	No SDK — transparent proxy
Setup steps	npm install + API key	Self-host stack	API key + integration	One env var
Works with any OpenAI-compatible LLM	SDK-specific	Framework-specific	SDK-specific	Any OpenAI-compatible endpoint
Replay / branch memory state	No	No	No	replay-branch, state-at
Bundled with audit + replay infrastructure	Memory only	Memory only	Memory only	Full SteelSpine product

Mem0 has $24M Series A and 48k GitHub stars. Strong product. SteelSpine differentiates on local-first + cryptographic audit + drop-in transparent proxy + bundled with full audit/replay stack. Same category, structurally different positioning.

Pricing

One tier. Memory plus the full SteelSpine stack (debug, replay, verify, branch, OTEL, MCP) included.

Free

Free, no account needed

Persistent entity memory across sessions
Auto entity extraction + context injection
Transparent proxy (Ollama, LM Studio, llama.cpp, vLLM, any OpenAI-compatible)
Cryptographically-signed memory mutations
Replay + branch memory state
Local-first storage (~/.prime/entities/)
Plus the full SteelSpine stack: debug, replay, verify, OTEL, MCP, Cursor + Claude Code integrations
1 user, 1 machine

Get it free

Multi-seat team or enterprise deployment? See DevOps tier or compliance tier.

Local-first persistent memory.One URL change.

The memory problem in AI agents

How it works

Step 1: Start the memory proxy

Step 2: Point your LLM client at the proxy

Step 3: Use your LLM normally

Before and after the memory proxy

Four properties that differentiate

Memory stays on your machine

Every memory write is signed

One URL change. Any OpenAI-compatible LLM.

Every memory state has a captured history

How this compares to Mem0, Letta, Zep

Pricing

Local-first persistent memory.
One URL change.