Local-first AI platform — v2.5.0 now shipping

Your private AI platform.
Every model. Zero cloud.

Multi-tab chat, a node-graph automation engine, an integrated terminal, finance charts, scheduled agents, and a layered memory system — all running on your own machine, owned entirely by you.


Sterling v2: multi-tab chat with history panel (localhost:5173)

Multi-provider

One input. Every model.

Sterling routes every send through a unified model router. Switch providers per message, mid-conversation. Cost ticker tracks real spend — including a subscription-equivalent stat for flat-rate Claude.ai CLI sessions.

Claude Opus / Sonnet / Haiku API

GPT-4o / o3 / o4-mini API

Grok-3 (xAI) API

Gemini Pro / Flash API

DeepSeek Chat / Reasoner API

Qwen Plus / Turbo / Coder API

Ollama (any model) LOCAL

Claude.ai CLI SUBSCRIPTION
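A minimal sketch of per-send routing, assuming the router picks a provider from the model name alone. The prefix table, provider keys, and example model names below are hypothetical, not Sterling's actual configuration.

```python
# Hypothetical model-name prefixes mapped to provider keys (not Sterling's real table).
PROVIDER_PREFIXES = {
    "claude-": "anthropic",
    "claude-cli": "claude_cli",   # flat-rate subscription session
    "gpt-": "openai",
    "o3": "openai",
    "o4-": "openai",
    "grok-": "xai",
    "gemini-": "google",
    "deepseek-": "deepseek",
    "qwen-": "qwen",
    "ollama/": "ollama",          # any locally served Ollama model
}


def route(model: str) -> str:
    """Pick a provider for a single send; the longest matching prefix wins."""
    matches = [p for p in PROVIDER_PREFIXES if model.startswith(p)]
    if not matches:
        raise ValueError(f"no provider registered for model {model!r}")
    return PROVIDER_PREFIXES[max(matches, key=len)]


# Each send is routed independently, so one conversation can hop between providers.
conversation = [("claude-sonnet-4", "Draft a plan"), ("ollama/llama3", "Now critique it")]
providers = [route(model) for model, _ in conversation]
```

Longest-prefix matching keeps overlapping names (claude-cli vs. claude-) unambiguous without depending on table order.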

Multi-tab chat

Every conversation,
side by side.

Open as many sessions as you need. Background send lets you fire off a task and return while the agent works. ESC cancels any stream, instantly, across every provider.

Streaming queue — hit Enter while a stream is in flight to queue the next message; it auto-sends when the current one finishes.
@mention autocomplete to switch agents mid-conversation, and /command autocomplete that surfaces every saved graph as a runnable slash command.
Up/Down arrows cycle through your previous chat sends and slash-command invocations, preserving whatever draft you were typing.
Drag-and-drop file attachments — text files inline, binary as path references.
Live cost ticker showing real API spend plus a subscription-equivalent stat for flat-rate sessions.
Per-send model switching — next message can go to a different provider entirely.
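The streaming queue can be sketched as a small asyncio helper, assuming one stream per tab and a stream coroutine that resolves when the reply finishes. SendQueue and its methods are hypothetical names, not Sterling's code.

```python
import asyncio
from collections import deque
from typing import Awaitable, Callable, Deque


class SendQueue:
    """Queue messages entered while a stream is in flight, then send them in order."""

    def __init__(self, stream: Callable[[str], Awaitable[None]]) -> None:
        self._stream = stream                 # streams one reply to completion
        self._pending: Deque[str] = deque()
        self._busy = False

    async def submit(self, text: str) -> None:
        if self._busy:
            self._pending.append(text)        # a stream is running: just queue it
            return
        self._busy = True
        try:
            await self._stream(text)
            while self._pending:              # auto-send queued messages one by one
                await self._stream(self._pending.popleft())
        finally:
            self._busy = False
```

A submit() that arrives mid-stream returns immediately; the call that owns the active stream drains the queue before releasing the busy flag.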

⏵ Background send

Node graph engine

Build automations by
wiring nodes, not writing code.

A full visual automation runtime with 50+ generic primitives. Wire them into any workflow. Save as plain JSON. Trigger from chat with a /command.

Wire graphs entirely by hand, or let an AI agent wire them for you. Modify anything at any time.
Wave-concurrent execution — every node whose inputs are ready runs in parallel via asyncio.gather().
Typed ports catch wiring mistakes at edit time. 15 port types including ohlcv for the finance pipeline.
Live execution viz — node borders animate, edges flash as values flow, hover any output port for a value preview.
Pause / Step / Resume — pause before any wave, advance one wave at a time, or let it run.
Full run history — every event persisted. Click any past run to scrub its timeline in replay mode.
Pre-run safety modal — flags risky file writes, path traversals, and missing subgraph references before execution.
Reuse node graphs as building blocks and chain them together into larger systems.
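Wave-concurrent execution can be sketched as a loop that repeatedly collects every node whose upstream results exist and runs that batch with asyncio.gather(). This is a simplified sketch assuming nodes are async callables keyed by id with an explicit dependency map; run_waves and its types are hypothetical, and typed ports are omitted.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

# A node receives its upstream outputs keyed by node id and returns one value.
NodeFn = Callable[[Dict[str, object]], Awaitable[object]]


async def run_waves(nodes: Dict[str, NodeFn], deps: Dict[str, List[str]]) -> Dict[str, object]:
    """Execute a DAG wave by wave; every ready node in a wave runs concurrently."""
    results: Dict[str, object] = {}
    remaining = set(nodes)
    while remaining:
        # Ready = every upstream dependency has already produced a result.
        wave = sorted(n for n in remaining if all(d in results for d in deps.get(n, [])))
        if not wave:
            raise ValueError(f"cycle or missing dependency among {sorted(remaining)}")
        # A pause/step control could wait here, before the next wave starts.
        outputs = await asyncio.gather(
            *(nodes[n]({d: results[d] for d in deps.get(n, [])}) for n in wave)
        )
        results.update(zip(wave, outputs))
        remaining.difference_update(wave)
    return results
```

The wave boundary is also a natural place to hang Pause / Step / Resume, since no node is mid-flight there.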

50+ built-in nodes, including trigger.*, agent.invoke, model.complete, flow.foreach, flow.branch, flow.parallel, chart.candlestick, finance.history, web.search, file.write, data.template, output.report, and event.publish.


Nested subgraphs

Every saved graph auto-registers as a subgraph: node. Drop it on any canvas. Nested runs are first-class — visible in the active-runs tree, cancellable from the top level, recoverable on refresh.


Foreach progress bars

flow.foreach emits a progress event after each iteration. Node cards render a live fill bar with a counter such as 5/15 (1✗): green for completed iterations, a red overlay for failures. No guessing where a long batch run is.


Pub/sub event bus

event.publish sends a payload on a named channel; trigger.event subscribes a graph to it. Subscriber discovery is implicit. A hop guard at depth 5 prevents publish loops, and a dedup window prevents double-fires. The System viewer shows every active (channel, graph, node) subscription triple and the dedup table.
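The hop guard and dedup window can be sketched as an in-process bus where each delivery carries a hop count and each (channel, key) pair records when it last fired. This is an assumption-laden sketch: EventBus, the caller-supplied dedup key, and the 2-second default window are hypothetical; only the depth of 5 comes from the text.

```python
import time
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Tuple

# A subscriber gets the payload plus the hop count to pass on if it republishes.
Handler = Callable[[dict, int], None]


class EventBus:
    """In-process pub/sub with a hop guard and a per-(channel, key) dedup window."""

    MAX_HOPS = 5  # hop-guard depth quoted in the text

    def __init__(self, dedup_window_s: float = 2.0, clock: Callable[[], float] = time.monotonic):
        self._subs: DefaultDict[str, List[Handler]] = defaultdict(list)
        self._last_seen: Dict[Tuple[str, str], float] = {}
        self._window = dedup_window_s
        self._clock = clock

    def subscribe(self, channel: str, handler: Handler) -> None:
        self._subs[channel].append(handler)

    def publish(self, channel: str, payload: dict, key: str, hop: int = 0) -> int:
        """Deliver payload to every subscriber; return how many were invoked."""
        if hop >= self.MAX_HOPS:
            return 0  # probably a publish loop: drop it
        now = self._clock()
        last = self._last_seen.get((channel, key))
        if last is not None and now - last < self._window:
            return 0  # same event inside the dedup window: drop the double-fire
        self._last_seen[(channel, key)] = now
        handlers = list(self._subs[channel])
        for handler in handlers:
            handler(payload, hop + 1)
        return len(handlers)
```

A graph that republishes on its own channel is cut off after five hops even when each hop uses a fresh dedup key.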


Agent-authored graphs

Six tools — list_nodes, list_graphs, get_graph, create_graph, update_graph, run_graph — are globally available to Anthropic agents. Ask an agent to build a workflow in chat; it saves and runs it without leaving the conversation.


Import / export

Graphs are plain .graph.json files. Toolbar Import and Export buttons round-trip them without loss. Share graphs as files; no proprietary format, no cloud registry required.


Refresh recovery

A browser refresh during a 30-minute graph run reconstitutes the canvas in ~1 second. Sterling walks the persisted event log through the exact same code path live events use — the result is identical to having watched the whole time.
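Refresh recovery is essentially event sourcing: the same reducer handles live events and replayed ones, so the rebuilt canvas matches what live rendering produced. A minimal sketch, assuming one JSON event per line and hypothetical event types node_started, node_finished, and node_failed; apply_event and replay are illustrative names.

```python
import json
from typing import Dict, Iterable


def apply_event(state: Dict[str, dict], event: dict) -> Dict[str, dict]:
    """Single reducer shared by live events and replay (hypothetical event shapes)."""
    node = state.setdefault(event["node"], {"status": "idle", "output": None})
    if event["type"] == "node_started":
        node["status"] = "running"
    elif event["type"] == "node_finished":
        node["status"] = "done"
        node["output"] = event.get("output")
    elif event["type"] == "node_failed":
        node["status"] = "error"
    return state


def replay(lines: Iterable[str]) -> Dict[str, dict]:
    """Rebuild canvas state after a refresh by re-applying the persisted log."""
    state: Dict[str, dict] = {}
    for line in lines:
        if line.strip():
            apply_event(state, json.loads(line))
    return state
```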

Chart & Finance Pipeline

End-to-end stock charts
from one /command.

Type /stock NVDA 4d in chat. Sterling fetches OHLCV bars, computes indicators, renders a candlestick, embeds it in a markdown report, and opens a live interactive tab — without leaving the interface.


finance.history

Fetches OHLCV bars for any ticker, from Alpha Vantage or yfinance backends. Supports any <num><unit> period (1d, 4d, 1w, 2w, 1mo, 6mo, 1y, 7y) plus ytd and max. The period grammar transparently bridges yfinance's fixed allowlist.
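One way to bridge a free-form period onto yfinance's fixed allowlist (1d, 5d, 1mo, 3mo, 6mo, 1y, 2y, 5y, 10y, ytd, max) is to fetch the smallest allowlisted window that covers the request and trim afterward. A sketch under stated assumptions: the calendar-day lengths and the trim-by-days approach are hypothetical, not Sterling's documented logic.

```python
import re
from typing import Optional, Tuple

# yfinance accepts only these fixed period strings (plus "ytd" and "max").
YF_ALLOWLIST = ["1d", "5d", "1mo", "3mo", "6mo", "1y", "2y", "5y", "10y"]
# Rough calendar-day lengths; deliberately generous so a fetch never falls short.
UNIT_DAYS = {"d": 1, "w": 7, "mo": 31, "y": 366}
PERIOD_RE = re.compile(r"(\d+)(d|w|mo|y)")


def to_days(period: str) -> int:
    m = PERIOD_RE.fullmatch(period)
    if m is None or int(m.group(1)) == 0:
        raise ValueError(f"unsupported period {period!r}")
    return int(m.group(1)) * UNIT_DAYS[m.group(2)]


def bridge_period(period: str) -> Tuple[str, Optional[int]]:
    """Return (yfinance period, trim_days); trim_days is None when no trim is needed."""
    if period in ("ytd", "max"):
        return period, None
    days = to_days(period)
    for yf_period in YF_ALLOWLIST:
        yf_days = to_days(yf_period)
        if yf_days >= days:
            # Fetch the smallest allowlisted window that covers the request.
            return yf_period, (None if yf_days == days else days)
    return "max", days  # longer than 10y: fetch everything, then trim
```

For example, 4d becomes a 5d fetch trimmed to the last four calendar days, and 7y becomes a 10y fetch.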


finance.indicator

Computes SMA, EMA, RSI (Wilder's smoothing, 14-bar default), MACD (12-bar EMA minus 26-bar EMA), and Bollinger Bands (upper/middle/lower, configurable σ multiplier). Output is a labeled time series ready to overlay on any chart.
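The textbook formulas behind these indicators fit in a few dozen lines of plain Python. This is a reference sketch, not Sterling's implementation: the SMA seeding of the EMA, the 9-bar MACD signal line, the 20-bar/2σ Bollinger defaults, population standard deviation, and RSI = 50 on a perfectly flat series are all conventional choices assumed here.

```python
import math
from typing import List, Optional, Sequence, Tuple

Series = List[Optional[float]]  # None marks warm-up bars that have no value yet


def sma(xs: Sequence[float], n: int) -> Series:
    out: Series = [None] * len(xs)
    for i in range(n - 1, len(xs)):
        out[i] = sum(xs[i - n + 1 : i + 1]) / n
    return out


def ema(xs: Sequence[float], n: int) -> Series:
    """EMA with alpha = 2/(n+1), seeded with the SMA of the first n bars."""
    out: Series = [None] * len(xs)
    if len(xs) < n:
        return out
    alpha = 2 / (n + 1)
    prev = sum(xs[:n]) / n
    out[n - 1] = prev
    for i in range(n, len(xs)):
        prev = alpha * xs[i] + (1 - alpha) * prev
        out[i] = prev
    return out


def rsi(xs: Sequence[float], n: int = 14) -> Series:
    """Wilder's RSI: simple-mean seed, then smoothing with weight 1/n."""
    out: Series = [None] * len(xs)
    if len(xs) <= n:
        return out
    gains = [max(b - a, 0.0) for a, b in zip(xs, xs[1:])]
    losses = [max(a - b, 0.0) for a, b in zip(xs, xs[1:])]
    avg_gain, avg_loss = sum(gains[:n]) / n, sum(losses[:n]) / n

    def value(gain: float, loss: float) -> float:
        if loss == 0:
            return 100.0 if gain > 0 else 50.0  # flat-series convention (a choice)
        return 100.0 - 100.0 / (1.0 + gain / loss)

    out[n] = value(avg_gain, avg_loss)
    for i in range(n, len(gains)):
        avg_gain = (avg_gain * (n - 1) + gains[i]) / n
        avg_loss = (avg_loss * (n - 1) + losses[i]) / n
        out[i + 1] = value(avg_gain, avg_loss)
    return out


def macd(xs: Sequence[float], fast: int = 12, slow: int = 26, signal: int = 9) -> Tuple[Series, Series]:
    """MACD line = EMA(fast) - EMA(slow); signal line = EMA(signal) of the MACD line."""
    line: Series = [
        f - s if f is not None and s is not None else None
        for f, s in zip(ema(xs, fast), ema(xs, slow))
    ]
    start = next((i for i, v in enumerate(line) if v is not None), len(line))
    sig: Series = [None] * start + ema([v for v in line[start:] if v is not None], signal)
    return line, sig


def bollinger(xs: Sequence[float], n: int = 20, k: float = 2.0) -> Tuple[Series, Series, Series]:
    """Upper/middle/lower bands: SMA(n) plus or minus k population standard deviations."""
    mid = sma(xs, n)
    upper: Series = [None] * len(xs)
    lower: Series = [None] * len(xs)
    for i in range(n - 1, len(xs)):
        m = mid[i]  # always set once i >= n - 1
        sd = math.sqrt(sum((x - m) ** 2 for x in xs[i - n + 1 : i + 1]) / n)
        upper[i], lower[i] = m + k * sd, m - k * sd
    return upper, mid, lower
```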


9 chart types

chart.candlestick, .line, .bar (vertical/horizontal), .scatter, .histogram, .heatmap, .grid (1–4 panel layouts). chart.to_png server-renders via vl-convert. chart.tab opens an interactive vega-embed tab with zoom, pan, and hover tooltips.

Full system visibility

See what's running.
All the way down.

The System modal gives you a live window into every layer of the platform — GPU, processes, recent errors, the tool event log, and the pub/sub event bus subscriber table, all filterable in real time.

GPU tab — utilization %, VRAM %, per-model residency, sparklines for the last few minutes, and a one-click flush to eject every loaded Ollama model.
Errors tab — recent backend warnings ring-buffered and surfaced without digging through logs.
Tool log — every output.log call from node graph runs, filterable by graph, run, level, and message content.
Event Bus tab — live subscriber triples (channel, graph, node) plus the dedup table with clear controls for debugging publish chains.
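The Errors tab's ring buffer can be approximated with a standard logging.Handler backed by a bounded deque. This sketch assumes backend warnings flow through Python's logging module; RingBufferHandler, its record fields, and the default capacity are hypothetical.

```python
import logging
from collections import deque
from typing import Deque, Dict, List


class RingBufferHandler(logging.Handler):
    """Keep the most recent WARNING-and-above records in memory for an Errors view."""

    def __init__(self, capacity: int = 200) -> None:
        super().__init__(level=logging.WARNING)
        self.records: Deque[Dict[str, object]] = deque(maxlen=capacity)  # oldest drop off

    def emit(self, record: logging.LogRecord) -> None:
        try:
            self.records.append({
                "time": record.created,
                "level": record.levelname,
                "logger": record.name,
                "message": record.getMessage(),
            })
        except Exception:
            self.handleError(record)

    def snapshot(self) -> List[Dict[str, object]]:
        return list(self.records)
```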

Live GPU stats

Memory system

Memory that's yours.
Files, not magic.

Sterling doesn't extract or summarize anything automatically. Every memory file is something you wrote, stored at a known path on disk. The platform assembles the system prompt fresh on every send.

Always-injected: SOUL.md (identity/personality), MEMORY.md (long-term notes), GRAPH_BUILDING.md (graph-authoring rules for AI agents).
Conditionally injected: per-project memory when a session is tagged, per-agent memory when that agent is active, and your Personal Info block from Settings.
Frontmatter triggers — memory files declare a trigger: <regex> header. In selective mode, only matching files load — keeping domain memories from taxing every turn.
Memory indicator in the topbar shows exactly which files loaded this turn. In selective mode, hover any chip to see the trigger regex and the matched snippet.
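A minimal sketch of selective loading, assuming the trigger header lives in a frontmatter block delimited by --- lines and is matched case-insensitively against the outgoing message. The function names, the case-insensitive match, and the rule that files without a trigger always load are assumptions, not Sterling's documented behavior. Returning the matched text mirrors the snippet the memory indicator shows on hover.

```python
import re
from typing import Dict, List, Optional, Tuple

FRONTMATTER_RE = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)


def parse_trigger(text: str) -> Optional[str]:
    """Return the regex from a 'trigger:' frontmatter line, if the file declares one."""
    m = FRONTMATTER_RE.match(text)
    if m is None:
        return None
    for line in m.group(1).splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep and key.strip() == "trigger":
            return value.strip()
    return None


def select_memories(files: Dict[str, str], message: str) -> List[Tuple[str, Optional[str]]]:
    """Return (filename, matched snippet) for every file that should load this turn."""
    loaded: List[Tuple[str, Optional[str]]] = []
    for name, text in files.items():
        pattern = parse_trigger(text)
        if pattern is None:
            loaded.append((name, None))  # assumption: untriggered files always load
            continue
        m = re.search(pattern, message, re.IGNORECASE)
        if m is not None:
            loaded.append((name, m.group(0)))
    return loaded
```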

🧠 Memory indicator

Settings

Configured to you.
Yours to own.

Every preference, every path, every API key — stored in plain files or your OS keyring. Eight pages of settings, zero cloud configuration.

Backgrounds — 20 patterns

System — model & memory mode

Personal — injected every send

Search — SearXNG config

Storage — AES-Fernet vault

Paths — custom file sections

Heartbeat — recurring agent

Help — inline, always current

Integrated terminal

A real PTY.
Not a pretend one.

Every terminal tab is a full PTY-backed shell — your actual $SHELL, xterm-256color, ncurses support. vim, htop, less — all work. The terminal resizes the underlying PTY when the tab resizes, and closing the tab kills the shell cleanly.

xterm.js with @xterm/addon-fit tracks the container size; the backend then calls ioctl to resize the PTY to match.
256-color escape sequences and full ncurses support for interactive programs.
WebSocket disconnect cleans up active terminals so a browser close doesn't leak shell processes.
FD hardening at startup prevents the terminal PTY from inheriting Sterling's listening socket across uvicorn reloads.
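Resizing a PTY comes down to one ioctl on the terminal file descriptor, which also delivers SIGWINCH to the foreground program so vim or htop can redraw. The sketch below uses only the Python standard library (Unix only); ptyprocess wraps the same call as setwinsize(rows, cols). The helper names are hypothetical.

```python
import fcntl
import pty
import struct
import termios
from typing import Tuple


def set_pty_size(fd: int, rows: int, cols: int) -> None:
    """Set the kernel's window size (struct winsize: rows, cols, xpixel, ypixel)."""
    fcntl.ioctl(fd, termios.TIOCSWINSZ, struct.pack("HHHH", rows, cols, 0, 0))


def get_pty_size(fd: int) -> Tuple[int, int]:
    """Read the window size back as (rows, cols)."""
    packed = fcntl.ioctl(fd, termios.TIOCGWINSZ, b"\x00" * 8)
    rows, cols, _, _ = struct.unpack("HHHH", packed)
    return rows, cols


# Example: resize through the master side; the shell on the slave side sees the change.
master_fd, slave_fd = pty.openpty()
set_pty_size(master_fd, 40, 120)
```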

Real PTY shell

Privacy by design

Runs on your machine.
Not theirs.

Sterling binds to 127.0.0.1 by default. No telemetry, no analytics, no opt-out toggles — there's nothing to opt out of. Every session, memory file, agent, and graph is a plain file on your disk.

API keys are stored in your OS keyring by default. Encrypted-vault mode available for headless hosts — AES-Fernet protected by a master password plus a one-time recovery key.
Messages go to whichever LLM provider you configured. Sterling itself never sees your conversations beyond writing them locally as JSONL files you own.
No remote access without explicit STERLING_BIND_ALL=1. Default is strictly localhost-only.
Every data file — sessions, graphs, memory, agents — is human-readable, git-friendly, and lives in a path you control.

OS keyring + AES vault

Reliability

Real software.
Real failure modes. Real fixes.

Sterling's biggest hurdles have been reliability problems under production workloads. Each was diagnosed by tracing the actual code path and fixed structurally, not worked around.

~50ms

Cancel propagation latency

ESC always cancels any active stream. The chat handler now launches the stream as a task and returns immediately; awaiting it had blocked the WebSocket receive loop, so abort messages queued silently behind the running stream. The Claude CLI subprocess reader races readline against cancel_event.wait() and terminates in ~50ms, even when the CLI is wedged in a tool-use loop.
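The readline-versus-cancel race can be sketched with asyncio.wait and FIRST_COMPLETED. This assumes the CLI's stdout is an asyncio.StreamReader and that cancellation is signaled through an asyncio.Event; read_lines_until_cancel is a hypothetical name, and terminating the subprocess afterward is left to the caller.

```python
import asyncio
from typing import AsyncIterator, Optional


async def read_lines_until_cancel(
    stream: asyncio.StreamReader, cancel: asyncio.Event
) -> AsyncIterator[bytes]:
    """Yield lines from a subprocess's stdout until EOF or until `cancel` is set."""
    cancel_wait = asyncio.ensure_future(cancel.wait())
    read: Optional[asyncio.Future] = None
    try:
        while True:
            read = asyncio.ensure_future(stream.readline())
            # Whichever finishes first wins: a new line, or the cancel signal.
            done, _ = await asyncio.wait({read, cancel_wait}, return_when=asyncio.FIRST_COMPLETED)
            if cancel_wait in done:
                return  # cancel beats a blocked readline; no waiting on the CLI
            line = read.result()
            if not line:
                return  # EOF
            yield line
    finally:
        cancel_wait.cancel()
        if read is not None:
            read.cancel()  # no-op if the read already completed
```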

~1s

Refresh-during-run recovery

A browser refresh mid-run reconstitutes the full canvas in about one second. Sterling walks the persisted event log through the same code path live events use. Works through arbitrarily nested subgraph runs — the entire run tree recovers, not just the top level.

Shutdown hard timeout

Backend shutdown actively cancels every in-flight chat stream and graph run, closes WebSockets cleanly with code 1001, force-cancels stragglers after a 3-second drain, and wraps the whole sequence in an 8-second asyncio.wait_for. The process cannot hang on shutdown.
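The shutdown sequence can be sketched as: signal every run cooperatively, close sockets, wait out a drain period, force-cancel stragglers, and bound the whole thing with asyncio.wait_for. A sketch assuming each in-flight task is paired with an asyncio.Event it watches; shutdown and the close_sockets callback are hypothetical (with Starlette, a callback could await websocket.close(code=1001) for each open socket). The 3-second and 8-second values come from the text.

```python
import asyncio
from typing import Awaitable, Callable, Dict


async def shutdown(
    runs: Dict[asyncio.Task, asyncio.Event],
    close_sockets: Callable[[], Awaitable[None]],
    drain_s: float = 3.0,
    hard_s: float = 8.0,
) -> bool:
    """Stop in-flight work; return False if the whole sequence exceeded hard_s."""

    async def sequence() -> None:
        for cancel_event in runs.values():
            cancel_event.set()          # cooperative cancel for every stream / run
        await close_sockets()           # e.g. close each WebSocket with code 1001
        if not runs:
            return
        _, stragglers = await asyncio.wait(set(runs), timeout=drain_s)
        for task in stragglers:
            task.cancel()               # force-cancel whatever ignored the event
        if stragglers:
            await asyncio.wait(stragglers, timeout=drain_s)  # bounded; never blocks forever

    try:
        await asyncio.wait_for(sequence(), timeout=hard_s)
        return True
    except asyncio.TimeoutError:
        return False
```

asyncio.wait_for cancels the wrapped coroutine on timeout and waits for it to unwind, which is why the straggler step uses a bounded asyncio.wait rather than awaiting the tasks directly.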

∞

Cancel through nested runs

v2.5.0 promoted nested runs to first-class. Inner subgraph runs adopt their parent's cancel token via a bridge task. Outer ESC tears down the entire chain: a 5-level foreach nesting stops as reliably as a single top-level run, with no per-level propagation code.
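The parent-to-child cancel bridge can be sketched with one asyncio.Event per run and a small task that copies the parent's cancellation onto the child. CancelToken is a hypothetical class; it must be constructed inside a running event loop because it schedules the bridge task. Cancelling a child never propagates upward.

```python
import asyncio
from typing import Optional


class CancelToken:
    """Cooperative cancel flag; a child token trips automatically when its parent does."""

    def __init__(self, parent: Optional["CancelToken"] = None) -> None:
        self.event = asyncio.Event()
        self._bridge: Optional[asyncio.Task] = None
        if parent is not None:
            # Bridge task: wait on the parent, then trip this token.
            self._bridge = asyncio.ensure_future(self._follow(parent))

    async def _follow(self, parent: "CancelToken") -> None:
        await parent.event.wait()
        self.cancel()

    def cancel(self) -> None:
        self.event.set()

    @property
    def cancelled(self) -> bool:
        return self.event.is_set()

    def close(self) -> None:
        """Drop the bridge when the run finishes normally."""
        if self._bridge is not None:
            self._bridge.cancel()
```

Because each level only watches its direct parent, a cancel at the root ripples down any depth of nesting one bridge at a time.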

Architecture

Built on a solid foundation.

Python + FastAPI on the backend, React + TypeScript on the front. A single shared WebSocket carries everything bidirectional. SQLite + JSONL + plain files for storage — no exotic infra.

Frontend

React 18 + TypeScript via Vite

Zustand for app state

React Flow for node graph canvas

Monaco for file editing

xterm.js for terminal

vega-embed for chart tabs

react-markdown + remark-gfm

Frontend and backend communicate over WebSocket and REST. Sterling backend: FastAPI on port 8766, Python 3.11+.

Backend + Storage

SQLite — sessions, graphs, runs, tasks, settings

JSONL — append-only message logs

APScheduler — cron tasks + heartbeat

OS keyring + AES-Fernet vault

ptyprocess — terminal PTY shells

vl-convert — server-side chart render

Multiple LLM SDKs — unified router
