Local-first AI platform — v2.5.0 now shipping

Your private AI platform.
Every model. Zero cloud.

Multi-tab chat, a node-graph automation engine, an integrated terminal, finance charts, scheduled agents, and a layered memory system — all running on your own machine, owned entirely by you.

localhost:5173 — Sterling
Sterling v2 — multi-tab chat with history panel
50+ Built-in nodes
10 LLM providers
20 UI themes & patterns
0 Telemetry & tracking
v2.5 Shipped today

One input. Every model.

Sterling routes every send through a unified model router. Switch providers per message, mid-conversation. A cost ticker tracks real spend, including a subscription-equivalent stat for flat-rate Claude.ai CLI sessions.

Claude Opus / Sonnet / Haiku API
GPT-4o / o3 / o4-mini API
Grok-3 (xAI) API
Gemini Pro / Flash API
DeepSeek Chat / Reasoner API
Qwen Plus / Turbo / Coder API
Ollama (any model) LOCAL
Claude.ai CLI SUBSCRIPTION

Every conversation,
side by side.

Open as many sessions as you need. Background send lets you fire off a task and return while the agent works. ESC cancels any stream, instantly, across every provider.

  • Streaming queue — hit Enter while a stream is in flight to queue the next message; it auto-sends when the current one finishes.
  • @mention autocomplete to switch agents mid-conversation, and /command autocomplete that surfaces every saved graph as a runnable slash command.
  • Up/Down arrows walk your history of both chat sends and slash-command invocations, preserving the current draft.
  • Drag-and-drop file attachments — text files inline, binary as path references.
  • Live cost ticker showing real API spend plus a subscription-equivalent stat for flat-rate sessions.
  • Per-send model switching — next message can go to a different provider entirely.
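The streaming queue described above can be pictured with a small asyncio sketch. This is an illustration only, not Sterling's actual code: the ChatTab class, its send_fn parameter, and the completed list are hypothetical names chosen for the example.

```python
import asyncio
from collections import deque


class ChatTab:
    """Hypothetical per-tab state: one in-flight stream plus a FIFO of queued sends."""

    def __init__(self, send_fn):
        self._send_fn = send_fn      # coroutine that streams one message to a provider
        self._queue = deque()        # messages typed while a stream was in flight
        self._active = None          # asyncio.Task for the current stream, if any
        self.completed = []          # messages whose streams finished, in order

    def submit(self, text):
        """Called on Enter: start immediately when idle, otherwise queue."""
        if self._active is None or self._active.done():
            self._active = asyncio.create_task(self._run(text))
        else:
            self._queue.append(text)

    async def _run(self, text):
        await self._send_fn(text)
        self.completed.append(text)
        if self._queue:
            # Auto-send the next queued message once the current stream ends.
            self._active = asyncio.create_task(self._run(self._queue.popleft()))
```

In this sketch a cancelled stream leaves the queue untouched; whether a real implementation should flush or keep queued messages on ESC is a design choice the page does not specify.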
Sterling multi-tab chat
⏵ Background send
Sterling light theme
Light theme — 20 background patterns available
Sterling start screen
Start Screen

Build automations by
wiring nodes, not writing code.

A full visual automation runtime with 50+ generic primitives. Wire them into any workflow. Save as plain JSON. Trigger from chat with a /command.

  • Wire graphs entirely by hand, or let an AI agent wire them for you. Modify anything, any time.
  • Wave-concurrent execution — every node whose inputs are ready runs in parallel via asyncio.gather().
  • Typed ports catch wiring mistakes at edit time. 15 port types including ohlcv for the finance pipeline.
  • Live execution viz — node borders animate, edges flash as values flow, hover any output port for a value preview.
  • Pause / Step / Resume — pause before any wave, advance one wave at a time, or let it run.
  • Full run history — every event persisted. Click any past run to scrub its timeline in replay mode.
  • Pre-run safety modal — flags risky file writes, path traversals, and missing subgraph references before execution.
  • Use saved node graphs as reusable blocks and chain them together into complex systems.
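Wave-concurrent execution is easiest to see in code. The sketch below assumes a graph is given as a dict of async node functions plus a dict of upstream dependencies; run_graph and both argument shapes are hypothetical, and only the use of asyncio.gather() comes from the description above.

```python
import asyncio


async def run_graph(nodes, deps):
    """Run a graph in waves.

    nodes: name -> async function taking a dict of upstream results
    deps:  name -> list of upstream node names (missing key means no inputs)
    Returns (results, waves) where waves lists the nodes run together.
    """
    results = {}
    waves = []
    pending = set(nodes)
    while pending:
        # A node is ready once every upstream result is available.
        ready = sorted(
            n for n in pending if all(d in results for d in deps.get(n, []))
        )
        if not ready:
            raise ValueError(f"cycle or missing dependency among {sorted(pending)}")
        # Every ready node in this wave runs concurrently.
        outputs = await asyncio.gather(
            *(nodes[n]({d: results[d] for d in deps.get(n, [])}) for n in ready)
        )
        results.update(zip(ready, outputs))
        pending.difference_update(ready)
        waves.append(ready)
    return results, waves
```

A Pause / Step control would fit naturally at the top of the while loop, since that is the boundary between waves.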
trigger.* agent.invoke model.complete flow.foreach flow.branch flow.parallel chart.candlestick finance.history web.search file.write data.template output.report event.publish
Sterling node graph editor
50+ built-in nodes
Sterling node graph running with live animations
Live execution — pulsing amber while running, green on success, foreach progress bars visible
Active graphs pill showing run tree
Active graphs pill — click any layer in the run tree for live view
Nested subgraphs
Every saved graph auto-registers as a subgraph: node. Drop it on any canvas. Nested runs are first-class — visible in the active-runs tree, cancellable from the top level, recoverable on refresh.
Foreach progress bars
flow.foreach emits a progress event after each iteration. Node cards render a live fill bar with a counter such as 5/15 (1✗): green for completed iterations, a red overlay for failures. You never have to guess how far along a long batch run is.
Pub/sub event bus
event.publish sends a payload on a named channel; trigger.event subscribes a graph to it. Subscriber discovery is implicit. Hop guard at depth 5 prevents loops. Dedup window prevents double-fires. The System viewer shows every active subscription triple and the dedup table.
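A minimal sketch of a bus with a hop guard and a dedup window, under stated assumptions: the EventBus class, the two-second DEDUP_WINDOW_S value, and the use of repr(payload) as the dedup key are all invented for illustration. Only the depth limit of 5 and the (channel, graph, node) triple come from the text.

```python
import time
from collections import defaultdict

MAX_HOPS = 5            # hop guard depth stated above
DEDUP_WINDOW_S = 2.0    # assumed value; the page does not give the window length


class EventBus:
    def __init__(self, clock=time.monotonic):
        self._subs = defaultdict(list)   # channel -> [(graph, node, handler)]
        self._seen = {}                  # (channel, payload key) -> last fire time
        self._clock = clock

    def subscribe(self, channel, graph, node, handler):
        """Register one subscription triple plus the callable to invoke."""
        self._subs[channel].append((graph, node, handler))

    def publish(self, channel, payload, hops=0):
        """Deliver payload to subscribers; return how many handlers ran."""
        if hops >= MAX_HOPS:
            return 0                     # hop guard: break publish loops
        key = (channel, repr(payload))
        now = self._clock()
        last = self._seen.get(key)
        if last is not None and now - last < DEDUP_WINDOW_S:
            return 0                     # dedup window: drop the double-fire
        self._seen[key] = now
        delivered = 0
        for _graph, _node, handler in list(self._subs[channel]):
            handler(payload, hops + 1)   # handlers pass hops on if they republish
            delivered += 1
        return delivered
```

A subscriber that republishes passes along the hop count it received, which is what lets the guard stop a chain once it reaches depth 5.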
Agent-authored graphs
Six tools — list_nodes, list_graphs, get_graph, create_graph, update_graph, run_graph — are globally available to Anthropic agents. Ask an agent to build a workflow in chat; it saves and runs it without leaving the conversation.
Import / export
Graphs are plain .graph.json files. Toolbar Import and Export buttons round-trip them without loss. Share graphs as files; no proprietary format, no cloud registry required.
Refresh recovery
A browser refresh during a 30-minute graph run reconstitutes the canvas in ~1 second. Sterling walks the persisted event log through the exact same code path live events use — the result is identical to having watched the whole time.
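The idea of replaying the log through the same code path as live events can be sketched with a single reducer. The event shapes and the names apply_event and replay are hypothetical; the point is only that live updates and recovery share one function, so they cannot drift apart.

```python
def apply_event(state, event):
    """Single reducer shared by live updates and replay (hypothetical event shapes)."""
    status_by_type = {
        "node_started": "running",
        "node_finished": "success",
        "node_failed": "error",
    }
    new_state = dict(state)
    status = status_by_type.get(event["type"])
    if status is not None:
        new_state[event["node"]] = status
    return new_state


def replay(event_log):
    """Rebuild canvas state after a refresh by folding the persisted log."""
    state = {}
    for event in event_log:
        state = apply_event(state, event)
    return state
```

Because the live path would call apply_event once per incoming event, folding the full log yields the same state a viewer would have reached by watching the run from the start.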

End-to-end stock charts
from one /command.

Type /stock NVDA 4d in chat. Sterling fetches OHLCV bars, computes indicators, renders a candlestick, embeds it in a markdown report, and opens a live interactive tab — without leaving the interface.

finance.history
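Fetches OHLCV bars for any ticker from an Alpha Vantage or yfinance backend. Accepts any <num><unit> period (1d, 4d, 1w, 2w, 1mo, 6mo, 1y, 7y) as well as the keywords ytd and max. The period grammar transparently bridges yfinance's fixed period allowlist, so values such as 4d or 7y work even though yfinance does not list them.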
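One plausible way to bridge a fixed period allowlist is to turn the period string into an explicit start date and request a date range instead. The sketch below assumes that approach; period_to_start, the unit set (d, w, mo, y), and the approximate day counts are illustrative choices, not Sterling's documented behavior.

```python
import re
from datetime import date, timedelta

_PERIOD_RE = re.compile(r"^(\d+)(d|w|mo|y)$")
# Approximate calendar lengths; a real implementation may use exact month math.
_DAYS_PER_UNIT = {"d": 1, "w": 7, "mo": 30, "y": 365}


def period_to_start(period, today):
    """Return the start date for a period string, or None for 'max'."""
    p = period.strip().lower()
    if p == "max":
        return None                      # no lower bound: fetch full history
    if p == "ytd":
        return date(today.year, 1, 1)    # year to date
    match = _PERIOD_RE.match(p)
    if match is None:
        raise ValueError(f"unrecognized period: {period!r}")
    count, unit = int(match.group(1)), match.group(2)
    return today - timedelta(days=count * _DAYS_PER_UNIT[unit])
```

For example, with today set to 10 June 2025, the period 4d resolves to 6 June 2025, which can then be passed as the start of a date-range query.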
finance.indicator
Computes SMA, EMA, RSI (Wilder's, 14-bar default), MACD (12/26 EMA crossover), and Bollinger Bands (upper/middle/lower, configurable σ multiplier). Output is a labeled time-series ready to overlay on any chart.
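For readers unfamiliar with Wilder's smoothing, here is a dependency-free sketch of RSI. It follows the standard textbook definition: seed with a simple average of the first `period` gains and losses, then smooth recursively. The function name and the None padding for the warm-up bars are choices made for this example.

```python
def rsi_wilder(closes, period=14):
    """Wilder's RSI. Returns a list aligned with closes; warm-up bars are None."""
    if len(closes) <= period:
        return [None] * len(closes)

    gains, losses = [], []
    for prev, cur in zip(closes, closes[1:]):
        change = cur - prev
        gains.append(max(change, 0.0))
        losses.append(max(-change, 0.0))

    def to_rsi(avg_gain, avg_loss):
        if avg_loss == 0:
            return 100.0                 # no losses in the window
        return 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)

    # Seed with a simple average over the first `period` changes.
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period
    out = [None] * period + [to_rsi(avg_gain, avg_loss)]

    # Wilder smoothing: each new value carries weight 1/period.
    for gain, loss in zip(gains[period:], losses[period:]):
        avg_gain = (avg_gain * (period - 1) + gain) / period
        avg_loss = (avg_loss * (period - 1) + loss) / period
        out.append(to_rsi(avg_gain, avg_loss))
    return out
```

A strictly rising series pins the value at 100 and a strictly falling one at 0, which makes a quick sanity check for any implementation.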
9 chart nodes
chart.candlestick, .line, .bar (vertical/horizontal), .scatter, .histogram, .heatmap, .grid (1–4 panel layouts). chart.to_png server-renders via vl-convert. chart.tab opens an interactive vega-embed tab with zoom, pan, and hover tooltips.

See what's running.
All the way down.

The System modal gives you a live window into every layer of the platform — GPU, processes, recent errors, the tool event log, and the pub/sub event bus subscriber table, all filterable in real time.

  • GPU tab — utilization %, VRAM %, per-model residency, sparklines for the last few minutes, and a one-click flush to eject every loaded Ollama model.
  • Errors tab — recent backend warnings ring-buffered and surfaced without digging through logs.
  • Tool log — every output.log call from node graph runs, filterable by graph, run, level, and message content.
  • Event Bus tab — live subscriber triples (channel, graph, node) plus the dedup table with clear controls for debugging publish chains.
Sterling system GPU view
Live GPU stats
Event bus viewer
Pub/sub event bus
Tool log
Tool event log
Process list
Process monitor

Memory that's yours.
Files, not magic.

Sterling doesn't extract or summarize anything automatically. Every memory file is something you wrote, stored at a known path on disk. The platform assembles the system prompt fresh on every send.

  • Always-injected: SOUL.md (identity/personality), MEMORY.md (long-term notes), GRAPH_BUILDING.md (graph-authoring rules for AI agents).
  • Conditionally injected: per-project memory when a session is tagged, per-agent memory when that agent is active, and your Personal Info block from Settings.
  • Frontmatter triggers — memory files declare a trigger: <regex> header. In selective mode, only matching files load — keeping domain memories from taxing every turn.
  • Memory indicator in the topbar shows exactly which files loaded this turn. In selective mode, hover any chip to see the trigger regex and the matched snippet.
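Selective loading by frontmatter trigger might look like the sketch below. It assumes a YAML-style header delimited by --- lines and case-insensitive matching; both are assumptions, as are the function names. Files without a trigger are skipped here, following the statement that only matching files load in selective mode.

```python
import re


def parse_trigger(text):
    """Return the trigger regex from a leading '---' frontmatter block, or None."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    for line in lines[1:]:
        if line.strip() == "---":
            break                        # end of frontmatter
        key, sep, value = line.partition(":")
        if sep and key.strip() == "trigger":
            return value.strip()
    return None


def select_memory(files, user_message):
    """files: name -> file content. Returns names whose trigger matches the message."""
    loaded = []
    for name, text in files.items():
        pattern = parse_trigger(text)
        # Assumption: files without a trigger do not load in selective mode.
        if pattern and re.search(pattern, user_message, re.IGNORECASE):
            loaded.append(name)
    return loaded
```

Keeping the match result around (rather than just a boolean) would be enough to show the matched snippet on hover, as the memory indicator does.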
Sterling memory indicator and VRAM
🧠 Memory indicator

Configured to you.
Yours to own.

Every preference, every path, every API key — stored in plain files or your OS keyring. Eight pages of settings, zero cloud configuration.

Background patterns settings
Backgrounds — 20 patterns
System settings
System — model & memory mode
Personal settings
Personal — injected every send
Search settings
Search — SearXNG config
Storage and encryption
Storage — AES-Fernet vault
Custom paths
Paths — custom file sections
Heartbeat settings
Heartbeat — recurring agent
Help page
Help — inline, always current

A real PTY.
Not a pretend one.

Every terminal tab is a full PTY-backed shell — your actual $SHELL, xterm-256color, ncurses support. vim, htop, less — all work. The terminal resizes the underlying PTY when the tab resizes, and closing the tab kills the shell cleanly.

  • xterm.js with @xterm/addon-fit: the addon tracks the container size in the browser, and the backend resizes the PTY with an ioctl call to match.
  • 256-color escape sequences and full ncurses support for interactive programs.
  • WebSocket disconnect cleans up active terminals so a browser close doesn't leak shell processes.
  • FD hardening at startup prevents the terminal PTY from inheriting Sterling's listening socket across uvicorn reloads.
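The PTY resize step comes down to one ioctl on Unix-like systems. The sketch below uses only the standard library to show the underlying call (TIOCSWINSZ); the helper names are hypothetical, and Sterling itself may go through a wrapper library rather than calling ioctl directly.

```python
import fcntl
import os
import pty
import struct
import termios


def set_pty_size(fd, rows, cols):
    """Tell the kernel the terminal is now rows x cols (pixel sizes left at 0)."""
    winsize = struct.pack("HHHH", rows, cols, 0, 0)
    fcntl.ioctl(fd, termios.TIOCSWINSZ, winsize)


def get_pty_size(fd):
    """Read back the current window size as (rows, cols)."""
    packed = fcntl.ioctl(fd, termios.TIOCGWINSZ, b"\0" * 8)
    rows, cols, _xpixel, _ypixel = struct.unpack("HHHH", packed)
    return rows, cols


def demo():
    """Open a throwaway PTY pair, resize it, and read the size back."""
    master_fd, slave_fd = pty.openpty()
    try:
        set_pty_size(master_fd, 40, 120)
        return get_pty_size(slave_fd)
    finally:
        os.close(master_fd)
        os.close(slave_fd)
```

When the size changes, the kernel sends SIGWINCH to the foreground process group of the terminal, which is how programs such as vim and htop know to redraw.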
Sterling integrated terminal
Real PTY shell

Runs on your machine.
Not theirs.

Sterling binds to 127.0.0.1 by default. No telemetry, no analytics, no opt-out toggles — there's nothing to opt out of. Every session, memory file, agent, and graph is a plain file on your disk.

  • API keys are stored in your OS keyring by default. Encrypted-vault mode available for headless hosts — AES-Fernet protected by a master password plus a one-time recovery key.
  • Messages go to whichever LLM provider you configured. Sterling itself never sees your conversations beyond writing them locally as JSONL files you own.
  • No remote access unless you explicitly set STERLING_BIND_ALL=1. The default is strictly localhost-only.
  • Every data file — sessions, graphs, memory, agents — is human-readable, git-friendly, and lives in a path you control.
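The localhost-only default can be expressed as a small host-selection helper. Only the STERLING_BIND_ALL variable and the 127.0.0.1 default come from the text; the function name and the choice of 0.0.0.0 for the opt-in case are assumptions.

```python
import os


def resolve_bind_host(env=None):
    """Pick the listen address: loopback unless STERLING_BIND_ALL is exactly '1'."""
    env = os.environ if env is None else env
    if env.get("STERLING_BIND_ALL") == "1":
        return "0.0.0.0"     # assumed: listen on all interfaces when opted in
    return "127.0.0.1"       # default: reachable from this machine only
```

Requiring the exact value "1" means a typo or an empty variable falls back to the safe default rather than exposing the server.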
Sterling encrypted key vault
OS keyring + AES vault

Real software.
Real failure modes. Real fixes.

Sterling's hardest problems have been reliability failures under production workloads. Each was diagnosed by tracing the actual code path and fixed structurally rather than worked around.

~50ms
Cancel propagation latency
ESC always cancels any active stream. The chat handler now fires its task and moves on instead of awaiting it; the await was blocking the WebSocket receive loop and silently queuing abort messages. The Claude CLI subprocess reader races readline against cancel_event.wait(), so it terminates in ~50ms even when wedged in a tool-use loop.
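Racing a blocking read against a cancel event is a general asyncio pattern. The sketch below shows one way to write it, assuming a stream object with an async readline() (such as a subprocess stdout pipe); the function name, the on_line callback, and the string return values are hypothetical.

```python
import asyncio


async def read_lines_until_cancel(stream, cancel_event, on_line):
    """Forward lines to on_line until EOF or cancel. Returns 'eof' or 'cancelled'."""
    cancel_task = asyncio.ensure_future(cancel_event.wait())
    try:
        while True:
            read_task = asyncio.ensure_future(stream.readline())
            # Wake up on whichever finishes first: a line or the cancel signal.
            done, _pending = await asyncio.wait(
                {read_task, cancel_task}, return_when=asyncio.FIRST_COMPLETED
            )
            if cancel_task in done:
                read_task.cancel()       # abandon the possibly wedged read
                return "cancelled"
            line = read_task.result()
            if not line:
                return "eof"             # empty bytes means the pipe closed
            on_line(line)
    finally:
        cancel_task.cancel()
```

After a "cancelled" result the caller would typically terminate the subprocess; the reader never has to wait for the child to produce another line first.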
~1s
Refresh-during-run recovery
A browser refresh mid-run reconstitutes the full canvas in about one second. Sterling walks the persisted event log through the same code path live events use. Works through arbitrarily nested subgraph runs — the entire run tree recovers, not just the top level.
8s
Shutdown hard timeout
Backend shutdown actively cancels every in-flight chat stream and graph run, closes WebSockets cleanly with code 1001, force-cancels stragglers after a 3-second drain, and wraps the whole sequence in an 8-second asyncio.wait_for. The process cannot hang on shutdown.
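The drain-then-force sequence with an outer hard timeout can be sketched as follows. The 3-second and 8-second values come from the text; the shutdown function, its parameters, and the use of a shared asyncio.Event as the cooperative cancel signal are assumptions, and the WebSocket close step is omitted.

```python
import asyncio


async def shutdown(tasks, cancel_event, drain=3.0, hard_timeout=8.0):
    """Ask tasks to stop, force-cancel stragglers, and never exceed hard_timeout.

    Returns the number of tasks that had to be force-cancelled,
    or -1 if the whole sequence hit the hard timeout.
    """

    async def sequence():
        cancel_event.set()               # cooperative: streams and runs see this
        if not tasks:
            return 0
        # Give well-behaved tasks up to `drain` seconds to finish on their own.
        _done, pending = await asyncio.wait(tasks, timeout=drain)
        for task in pending:
            task.cancel()                # stragglers get a hard cancel
        if pending:
            await asyncio.gather(*pending, return_exceptions=True)
        return len(pending)

    try:
        return await asyncio.wait_for(sequence(), timeout=hard_timeout)
    except asyncio.TimeoutError:
        return -1                        # give up; the process can now exit
```

The outer wait_for is what guarantees an upper bound: even a task that swallows cancellation cannot hold the sequence open past the hard timeout.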
Cancel through nested runs
v2.5.0 promoted nested runs to first-class. Inner subgraph runs adopt their parent's cancel token via a bridge task. Outer ESC tears down the entire chain — a 5-level foreach nesting stops as reliably as a single top-level run. No custom propagation needed at each level.
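A bridge task that links a child cancel token to its parent can be sketched with asyncio.Event objects standing in for tokens. The names bridge_cancel and run_subgraph and the recursive nesting are illustrative assumptions, not Sterling's actual structure.

```python
import asyncio


async def bridge_cancel(parent_token, child_token):
    """Bridge task: when the parent token fires, fire the child's token too."""
    await parent_token.wait()
    child_token.set()


async def run_subgraph(parent_token, depth, work, finished):
    """Run `work` at the innermost level, nesting `depth` more levels first."""
    own_token = asyncio.Event()
    bridge = asyncio.create_task(bridge_cancel(parent_token, own_token))
    try:
        if depth > 0:
            await run_subgraph(own_token, depth - 1, work, finished)
        else:
            await work(own_token)        # innermost run watches only its own token
    finally:
        bridge.cancel()                  # the bridge lives only as long as the run
        finished.append(depth)           # record teardown order for inspection
```

Each level only knows about its immediate parent, yet setting the top-level token reaches the innermost run through the chain of bridges, so no level needs custom propagation logic.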

Built on a solid foundation.

Python + FastAPI on the backend, React + TypeScript on the front. A single shared WebSocket carries all bidirectional traffic. SQLite, JSONL, and plain files handle storage, with no exotic infrastructure.

Frontend
React 18 + TypeScript via Vite
Zustand for app state
React Flow for node graph canvas
Monaco for file editing
xterm.js for terminal
vega-embed for chart tabs
react-markdown + remark-gfm
Transport: WebSocket + REST
Server: Sterling, FastAPI on port 8766, Python 3.11+
Backend + Storage
SQLite — sessions, graphs, runs, tasks, settings
JSONL — append-only message logs
APScheduler — cron tasks + heartbeat
OS keyring + AES-Fernet vault
ptyprocess — terminal PTY shells
vl-convert — server-side chart render
Multiple LLM SDKs — unified router

The platform that earns its keep
one tag at a time.

Per-commit version bumps. Visible state. Built for users who have outgrown what a hosted chat product can do for them.