Running — v0.3.0

Sterling Agent

Not a chatbot. An agent.

A personal AI system that thinks, remembers, schedules, and acts — backed by Claude, extended with local models, and built entirely around one person's actual work.

Claude + Ollama
25+ specialized agents
Runs locally
Persistent memory
Scheduled tasks

Built because nothing else was enough.

Jayson is building an AI-powered empire: flexible, adaptive, robust. A creator's utopia. It started with a small online booking system. Then it turned into a fully featured photography critique website built on PHP and MariaDB. He started building helper apps for electronics circuit design, tools, game helpers, and professional landing pages. From there it grew into a full-stack web development company branded under his own name. If you're going to develop professional websites for business and commerce, you might as well build a proper booking SaaS next.

From here he saw the need for more control. He installed OpenClaw. Quickly captivated, but also hugely turned off by the API limits Anthropic recently set on agents, he needed a new angle. He built his own OpenClaw, specifically to work around Anthropic's forced API usage for agents and its blocking of CLI use. It started as a thin wrapper around Claude CLI. Soon it became a pipe to parallel processes that could invoke more agents, improving efficiency and building more complexity, faster. This is where Sterling Agent really hit its stride.

Testing and developing an agent, along with a desire to automate a complex project, led Jayson to build an avatar-hosted, scripted AI news channel on YouTube. This obviously needed a website to complement it. The news platform comprises a daily long-form YouTube video, a targeted short video, and a web-based news article, each hosted by a fully AI-generated avatar. Every video starts as a topic scrape for direction and then raw information. It ends as a polished production package ready to publish. He learned how to set up and manage MCP and API connections. He dove into image, video, and audio generation. He took on many side quests, automating along the way. There are still some very manual components; there are also a host of fully automated ones. Every day it shifts more towards autopilot. An evolving pipeline. That pipeline needs an operator.

Off-the-shelf tools weren't it. Generic chat interfaces forget everything the moment you close them. Cloud assistants don't have access to local files, production workflows, or project context. What was needed was something that could think across sessions, delegate to specialized models, run tasks on a schedule, and keep an entire operation moving — without constant hand-holding.

So Sterling Agent got built. It started as a thin wrapper around Claude CLI and grew from there. Every feature added because it was actually needed. Every decision made in service of real work. It runs locally on a single machine, talks to the browser over WebSocket, and maintains persistent memory across every conversation.

25+
Specialized agents
v0.3
Current version
10
AI models wired in
~3.5k
Lines of frontend
Scheduled tasks

This is the actual app.

Dark theme, real-time streaming chat, collapsible tool call blocks, sidebar panels for agents, history, scheduling, and files. No mockup — this is what runs.

localhost:8765
Sterling Agent interface — streaming chat with sidebar panels, tool call blocks, agent routing, and file management

Sterling Agent  ·  localhost:8765  ·  Claude-backed · WebSocket-streamed · fully local

Inside Sterling Agent.

Every panel in the sidebar is a live tool. Not a settings menu — an active workspace. Here's what each one does.

The main interface.

Real-time streaming responses with full tool call visibility. Every tool Sterling calls — web searches, file reads, agent invocations — expands inline as a collapsible block so nothing is hidden. The model switcher in the input bar toggles between any cloud or local model mid-conversation. A message queue lets you send the next prompt before the current response finishes. Cancel and resume at any point without losing the session.

Sterling Agent — Chat panel
Cron-based autonomous tasks.

Jobs defined here fire prompts directly into Claude on a cron schedule — and they persist across server restarts. The heartbeat fires every 10 minutes in its own isolated session, reviewing memory, checking project state, and surfacing reminders. Custom jobs can trigger any pipeline, reminder, or automated task. Scheduled work streams its output into the chat feed live, so nothing runs blind.

Sterling Agent — Scheduled panel
Full workspace file browser.

Every agent definition, memory file, config, and workspace document is editable directly in the browser. Changes take effect immediately — no file manager, no terminal, no restart. The panel is organized into collapsible sections: Workspace files open by default; System, Runtime logs, and session logs collapse for a cleaner view. Click any file to open an inline editor. Save and it's live.

Sterling Agent — Files panel
All 34 agents, live.

Every loaded agent shown as a card with its name, model, category, and color. Agents are organized by pipeline role — Workflows, Research, Content, Utility. @mention any agent in chat to route that message directly to it, or let Sterling invoke them programmatically during pipeline work. Tab autocomplete makes @mentions fast — type @ and the list filters in real time.
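
The @mention routing described above could look roughly like this. It is a minimal sketch, not Sterling's actual code: the `Agent` dataclass, `route_message`, `autocomplete`, the `sterling` fallback name, and the model strings in the registry are all hypothetical placeholders.

```python
import re
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    model: str
    system_prompt: str


# Hypothetical in-memory registry; the real system loads agents from .md files.
# Model strings here are placeholders, not confirmed assignments.
AGENTS = {
    "researcher": Agent("researcher", "placeholder-model-a", "You research topics."),
    "coder": Agent("coder", "placeholder-model-b", "You write code."),
}

# "@name" at the start of the message, then whitespace, then the prompt itself.
MENTION = re.compile(r"^@(\w+)\s+(.*)$", re.DOTALL)


def route_message(text: str, default: str = "sterling") -> tuple[str, str]:
    """Return (agent_name, prompt); unknown or missing mentions go to the default."""
    match = MENTION.match(text.strip())
    if match and match.group(1).lower() in AGENTS:
        return match.group(1).lower(), match.group(2)
    return default, text


def autocomplete(prefix: str) -> list[str]:
    """Filter agent names the way a Tab-completion dropdown might."""
    needle = prefix.lower().lstrip("@")
    return sorted(name for name in AGENTS if name.startswith(needle))
```

A message such as "@coder fix the bug" would resolve to the coder agent with the prompt "fix the bug", while a message with no known mention falls through to the orchestrator unchanged.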

Sterling Agent — Agents panel
The agent editor.

Click Edit on any agent to open the full-pane editor. Name, model, color, and category sit in a compact header row. The system prompt textarea fills the remaining space — no resize handle, no modal — just a direct edit surface. Changes save to the agent's .md file on disk and take effect immediately on the next invocation. New agents can be created and deleted here too.

Sterling Agent — Agent editor
Every conversation, logged.

Full session history stored as per-session JSONL files. Sessions tab shows all conversations with search — heartbeat-only sessions collapse into their own group so the main list stays clean. Events tab is a paginated connection log with the page controls pinned above the list so they stay in the same position page to page. Reload any session to resume the Claude thread exactly.
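
As an illustration of per-session JSONL logging with search, here is a small sketch. The record fields (`ts`, `role`, `text`) and the function names are assumptions for the example; the actual log schema is not documented here.

```python
import json
import time
from pathlib import Path


def append_event(log_dir: Path, session_id: str, role: str, text: str) -> Path:
    """Append one event as a single JSON line to the session's own log file."""
    log_dir.mkdir(parents=True, exist_ok=True)
    path = log_dir / f"{session_id}.jsonl"
    record = {"ts": time.time(), "role": role, "text": text}
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path


def load_session(path: Path) -> list[dict]:
    """Read a session back in order, skipping blank lines."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def search_sessions(log_dir: Path, needle: str) -> list[str]:
    """Return IDs of sessions whose text contains the needle (case-insensitive)."""
    hits = []
    for path in sorted(log_dir.glob("*.jsonl")):
        if any(needle.lower() in ev["text"].lower() for ev in load_session(path)):
            hits.append(path.stem)
    return hits
```

One file per session keeps appends cheap and makes it possible to reload or search a single conversation without touching the others.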

Sessions — heartbeat grouping

Sterling Agent — History Sessions

Events — pagination pinned above log

Sterling Agent — History Events
Every integration, one panel.

OAuth services (HeyGen, YouTube, Google), API-key services (Anthropic, ElevenLabs, Tavily), and MCP-managed services (Gmail) all tracked in one place. Three kinds of auth, one unified status view. Live connected/expired state per service, connect/disconnect toggle for OAuth flows, kind badge so you know what manages the credential. No credentials in config files.

Sterling Agent — Connections panel
The input bar is a control panel.

Model switcher, VRAM inspector, token cost tracker, and pipeline task watcher are all embedded in the chat input bar — always visible, never in a separate settings screen. Switch models mid-conversation, eject a loaded Ollama model from GPU, see live token cost and context %, and track running background pipelines with per-job PID tags — all from one bar.
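
For readers curious how an eject button can work, Ollama's HTTP API unloads a model when it receives a request with `keep_alive` set to 0, and `/api/ps` lists the models currently in memory. The sketch below uses those two endpoints on Ollama's default port; whether Sterling's inspector is wired exactly this way is an assumption, and the function names are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default address


def build_eject_request(model: str, base_url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build the request that asks Ollama to unload a model right away.

    A generate call with keep_alive=0 and no prompt drops the model from memory.
    """
    body = json.dumps({"model": model, "keep_alive": 0}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def loaded_models(base_url: str = OLLAMA_URL) -> list[dict]:
    """List models currently held in memory (needs a running Ollama server)."""
    with urllib.request.urlopen(f"{base_url}/api/ps", timeout=5) as resp:
        return json.load(resp).get("models", [])


def eject(model: str, base_url: str = OLLAMA_URL) -> None:
    """Send the unload request (needs a running Ollama server)."""
    with urllib.request.urlopen(build_eject_request(model, base_url), timeout=30):
        pass
```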

Model selector

Sterling Agent — Model selector

Token cost tracker · pipeline task watcher · PID tags

Sterling Agent — VRAM inspector

VRAM inspector + eject

Sterling Agent — Chat bar features
System configuration.

Model selection, heartbeat interval, and system preferences — all adjustable from the browser without touching config files or restarting the server. Settings persist in SQLite. The Mission Control section lets you start, stop, or restart the server itself from the UI. One button to bring everything back up if something goes sideways mid-session.

Sterling Agent — Settings panel
Built-in documentation.

A full reference panel covering every feature, model, keyboard shortcut, and troubleshooting step — written and kept current as the system grows. The Features section walks through every panel. Models section groups Cloud and Local with tag pills. Shortcuts lists every keyboard action. Troubleshooting includes copy-ready commands for diagnosing common failures. Documentation that lives in the tool, not somewhere else.

Sterling Agent — Help panel

What it actually does.

From a single chat interface, Sterling can research, write, schedule, delegate, publish, and remember — all while keeping the conversation open.

🧠
Claude + Local Models

Claude (via the Anthropic CLI) is the primary brain — the orchestrator that reasons, delegates, and drives long-horizon tasks. Local Ollama models (Qwen, DeepSeek, Llama Vision) handle agent subtasks where speed or cost matters. Both share the same streaming interface.

💾
Persistent Memory Across Sessions

Every Claude session can be resumed. Long-term memory lives in editable workspace files — MEMORY.md, PROJECTS.md, SOUL.md — loaded fresh into every system prompt. Sterling remembers decisions, preferences, and project context across all conversations.
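
Loading memory files fresh into every system prompt can be sketched in a few lines. The file names come from the description above; the load order, the section headers, and the `build_system_prompt` name are assumptions made for the example.

```python
from pathlib import Path

# File names are from the text above; the order is an assumption.
MEMORY_FILES = ("SOUL.md", "MEMORY.md", "PROJECTS.md")


def build_system_prompt(workspace: Path, base_prompt: str) -> str:
    """Re-read the memory files from disk and append them to the base prompt."""
    sections = [base_prompt.strip()]
    for name in MEMORY_FILES:
        path = workspace / name
        if path.is_file():  # a missing file is skipped rather than treated as an error
            body = path.read_text(encoding="utf-8").strip()
            sections.append(f"## {name}\n{body}")
    return "\n\n".join(sections)
```

Because the files are read on every call, an edit made in the browser shows up in the very next prompt with no cache to invalidate.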

Scheduled Autonomous Tasks

APScheduler runs cron-based jobs that fire prompts directly into Claude sessions. The heartbeat fires every 10 minutes in its own isolated session — reviewing memory, checking project state, surfacing reminders. Custom jobs scheduled from the UI persist across restarts.
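
The "persist across restarts" part can be illustrated without the scheduler itself. This sketch stores job definitions in a SQLite table and reads them back at startup. The table name `scheduled_jobs` appears later in this page; the column layout and function names are assumptions, and re-registering each row with APScheduler (for example through `CronTrigger.from_crontab`) is left out.

```python
import sqlite3

# Column layout is an assumption; only the table name comes from the page.
SCHEMA = """
CREATE TABLE IF NOT EXISTS scheduled_jobs (
    id     TEXT PRIMARY KEY,
    cron   TEXT NOT NULL,
    prompt TEXT NOT NULL
)
"""


def open_db(path: str) -> sqlite3.Connection:
    """Open the database and make sure the jobs table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_job(conn: sqlite3.Connection, job_id: str, cron: str, prompt: str) -> None:
    """Insert or overwrite a job so it survives a server restart."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO scheduled_jobs (id, cron, prompt) VALUES (?, ?, ?)",
            (job_id, cron, prompt),
        )


def load_jobs(conn: sqlite3.Connection) -> list[tuple[str, str, str]]:
    """Read every stored job; at startup each row would be handed to the scheduler."""
    return conn.execute(
        "SELECT id, cron, prompt FROM scheduled_jobs ORDER BY id"
    ).fetchall()
```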

🤖
25+ Specialized Agents

Each agent is a Markdown file with YAML frontmatter — name, model, category, write directory, system prompt. @mention one in chat to invoke it interactively. Sterling invokes them programmatically for pipeline work. Agents can call other agents.
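
To make the file format concrete, here is a sketch of splitting an agent file into frontmatter and system prompt. It handles only flat `key: value` pairs rather than full YAML, and the example values (model tag, color, description) are illustrative guesses, not a real Sterling agent definition.

```python
def parse_agent_file(text: str) -> tuple[dict[str, str], str]:
    """Split an agent .md file into (frontmatter, system_prompt).

    Handles only flat `key: value` pairs, not full YAML.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("agent file must start with a '---' frontmatter fence")
    try:
        end = next(i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---")
    except StopIteration:
        raise ValueError("frontmatter fence is never closed") from None

    meta: dict[str, str] = {}
    for line in lines[1:end]:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            raise ValueError(f"not a key: value line: {line!r}")
        meta[key.strip()] = value.strip().strip("\"'")

    body = "\n".join(lines[end + 1:]).strip()  # everything after the fence is the prompt
    return meta, body


# Illustrative file contents; values are assumptions for the example.
EXAMPLE = """\
---
name: agent_slug
model: qwen3:14b
description: Turns a topic into a short URL-safe slug
color: "#4ade80"
category: Utility
write_dir: workspace/pipeline/
---
You generate one lowercase, hyphenated slug for the given topic.
"""
```

Splitting on the first colon only is what lets a model tag such as `qwen3:14b` survive as a value.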

Background Pipeline Mode

Long pipelines run in separate Claude sessions — the chat stays live while they work. Job output streams into the chat feed in real time, labeled and color-coded with a pulsing status dot. Sterling and Jayson can keep talking while the pipeline runs.

🎬
YouTube Content Pipeline

Research → scrub → script → production markup → titles → description → Shorts version → HeyGen scene prompt → thumbnail prompt → keywords → compile → proof. Every stage is a named agent. The full chain runs as a single orchestrated background workflow.

🎞️
Motion Graphics Pipeline

Scripts with [STATCARD] and [LOWER3] markup get parsed into animated HTML capture files. Playwright records each card to ProRes .mov for DaVinci Resolve import — stat cards, lower thirds, and outros with count-up animations and staggered reveals.

🌐
Browser Automation + Web Search

Playwright headless Chromium handles JS-rendered pages, Cloudflare-gated content, and soft paywalls. Tavily powers structured web search. Both are available as tools to any agent in the pipeline — deep research is the output, not just headlines.

🔗
OAuth Connections

Custom PKCE OAuth 2.0 handles external service auth. HeyGen is live — OAuth token stored in SQLite, status in the Connections panel. A video watcher polls the HeyGen API for completed recordings and downloads them automatically.

📁
File Registry + Inline Media

Uploaded images, captured frames, and generated files get 8-char hex IDs. The [[FILE:id]] system renders them inline in chat. Agent write directories are controlled per-agent — research, pipeline output, scripts, and code stay separated.
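
A registry of this kind is small enough to sketch. The version below keeps IDs in memory rather than in a database and rewrites [[FILE:id]] tags to a `/api/file-ref/{id}` URL; the class and method names are hypothetical, and the real renderer runs in the frontend rather than in Python.

```python
import re
import secrets

# An 8-character lowercase hex ID inside a [[FILE:...]] tag.
FILE_TAG = re.compile(r"\[\[FILE:([0-9a-f]{8})\]\]")


class FileRegistry:
    """In-memory stand-in for a file registry keyed by short hex IDs."""

    def __init__(self) -> None:
        self._paths: dict[str, str] = {}

    def register(self, path: str) -> str:
        """Assign a fresh 8-character hex ID to a file path."""
        file_id = secrets.token_hex(4)  # 4 random bytes -> 8 hex characters
        while file_id in self._paths:   # re-roll on the rare collision
            file_id = secrets.token_hex(4)
        self._paths[file_id] = path
        return file_id

    def render(self, text: str) -> str:
        """Swap known [[FILE:id]] tags for a URL; leave anything else untouched."""
        def replace(match: re.Match) -> str:
            file_id = match.group(1)
            if file_id in self._paths:
                return f"/api/file-ref/{file_id}"
            return match.group(0)

        return FILE_TAG.sub(replace, text)
```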

researcher
agent_scrub
agent_script
agent_script_markup
agent_titles
agent_description
agent_short
agent_heygen
agent_thumbprompt
agent_thumb
agent_keywords
agent_sources
compile
agent_proof

Everything wired in.

Technical breakdown by category. Built incrementally — every item here was added because the work demanded it.

⚙️ Core Infrastructure 12 features
FastAPI + WebSocket server backend
Uvicorn on port 8765. WebSocket /ws handles all streaming chat — text chunks, tool calls, tool results, session state, usage stats, and job events.
Session resumption claude
Active Claude session ID persisted in SQLite. Every message passes --resume to continue the exact conversation thread across browser refreshes and server restarts.
SQLite persistence
agent.db stores sessions, scheduled jobs, settings, file registry, OAuth tokens, connection events, and the message queue. No external database required.
JSONL conversation logs
Every session writes a per-session .jsonl file. Full history panel with search, session switching, and inline file previews backed by these logs.
Message queue
Messages sent mid-response queue in SQLite and drain after the current response completes. Each queued message re-parses for @mentions independently.
APScheduler (cron + interval) scheduler
AsyncIOScheduler with CronTrigger for custom jobs and IntervalTrigger for heartbeat. All jobs persist across restarts via the scheduled_jobs table.
Heartbeat isolation
The heartbeat fires in its own session (session_id=None) — it never overwrites the interactive Claude session. Reads HEARTBEAT.md, broadcasts output to chat.
Memory reminder system
Each heartbeat scans MEMORY.md for [remind] tags with today's date and schedules one-shot jobs at 10:03 AM. Marker files prevent duplicate firings across restarts.
256MB subprocess buffer
asyncio StreamReader set to 256MB for Claude CLI stdout. The default 64KB caused silent crashes on large tool results — a critical reliability fix that unlocked stable tool use.
PKCE OAuth 2.0 auth
Generic OAuth layer in oauth.py. Authorization URL generation, code exchange, per-service token storage in SQLite. Broadcasts connection events to all WebSocket clients.
File registry + [[FILE:id]] system
8-char hex IDs for all registered files. Frontend parses [[FILE:id]] tags in agent responses and renders images inline or as download links via /api/file-ref/{id}.
Mission Control panel
Start, stop, and restart service processes from the browser UI. Port-based process management via /api/mc/* — includes full server restart in one click.
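
The 256MB subprocess buffer fix listed above comes down to one keyword argument. asyncio's subprocess streams default to a 64 KiB limit, and reading a line longer than that fails. The sketch below reproduces the situation with a stand-in child process (a Python one-liner, not the Claude CLI) that prints a single 200,000-character line; the function name and the child command are assumptions for the demonstration.

```python
import asyncio
import sys

BIG_LIMIT = 256 * 1024 * 1024  # 256MB, the figure quoted above

# Stand-in for the CLI: a child that writes one 200,000-character line to stdout.
CHILD_CODE = "import sys; sys.stdout.write('x' * 200_000 + '\\n')"


async def read_one_line(limit: int = 2 ** 16) -> bytes:
    """Spawn the child and read one stdout line using the given buffer limit.

    The default of 2 ** 16 bytes matches asyncio's own default stream limit.
    """
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", CHILD_CODE,
        stdout=asyncio.subprocess.PIPE,
        limit=limit,  # size of the StreamReader buffer for the pipes
    )
    try:
        return await proc.stdout.readline()
    finally:
        if proc.returncode is None:
            proc.kill()  # make sure a child blocked on a full pipe does not linger
        await proc.wait()
```

Calling `asyncio.run(read_one_line())` with the default limit raises ValueError on this child, while `asyncio.run(read_one_line(BIG_LIMIT))` returns the whole line. A large tool result emitted as one JSON line hits the same wall.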
🤖 AI Models 10 models

Cloud — Anthropic API

Claude Sonnet 4.6 primary · orchestrator
Primary reasoning engine. Runs via Claude CLI subprocess with stream-json output. Persistent sessions, full tool access, handles all long-horizon tasks, delegation, and judgment calls.
Claude Opus 4.7 most capable
Most capable Claude model. Reserved for tasks requiring maximum reasoning depth — complex orchestration, nuanced writing, or edge cases where Sonnet falls short.
Claude Haiku 4.5 fast · lightweight
Fast, low-cost Claude for simple structured tasks. Used where speed matters and full reasoning is overkill — quick lookups, format conversions, lightweight confirmations.

Local — Ollama on NVIDIA Titan RTX 24GB VRAM

Qwen3 30B local · general
General-purpose local model with strong reasoning. Handles a wide range of agent tasks — research, summarization, structured output — without the VRAM cost of the 35B variant.
Qwen3.6 35B local · heavy · ~24GB VRAM
Primary heavy agent model. Script writing, markup, thumbnail generation, descriptions. Explicitly unloaded from VRAM between pipeline phases to free memory for lighter models.
Qwen3.6 27B local · balanced
Mid-range Qwen. Writer, format, and moderate content agents. Faster cold start than 35B while retaining strong structured output performance.
Qwen3 14B local · fast
Lightweight utility model. Slug generation, condensing, headline pulls, keyword lists, scrubbing. Low VRAM footprint — stays resident between pipeline stages without cost.
Qwen3 Coder 30B local · code
Code-specialized model powering the Coder agent. Full-stack aware: Python, FastAPI, JavaScript, Bash, Swift, C/C++, Win32 API, CUDA. Strong at reading existing codebases before writing.
DeepSeek R1 32B local · reasoning · ~19GB VRAM
Chain-of-thought reasoning model. Q4 quantized, fits the Titan RTX. Runs with tools=False and extended think=True chains. Best for analysis tasks where deliberate multi-step reasoning matters.
DeepSeek R1 14B local · reasoning · lighter
Lighter reasoning model for when R1-style deliberation is needed but VRAM headroom is tight. Same no-tools, think=True configuration as the 32B — lower cost, less depth.
🧩 Agents System 34 agents loaded
Agent definition format YAML + Markdown
Each agent is a .md file with YAML frontmatter: name, model, description, color, category, write_dir. The Markdown body is the system prompt. All fields editable in the browser.
@mention routing with autocomplete
@AgentName prefix resolves to agent model + system prompt. Tab autocomplete shows a filtered dropdown. Last 50 messages injected as context for stateless Ollama agents.
Agent-to-agent delegation
POST /api/agent/invoke handles all agent-to-agent calls. Claude reaches it with curl, as instructed in its system prompt; Ollama models use the call_agent tool. Output is always auto-saved to disk.
Per-agent write directories
write_dir frontmatter restricts each agent's write access. Research writes to workspace/research/, pipeline agents to workspace/pipeline/, coder to workspace/code/.
Slug-based output routing
Pass slug in invoke body → output at {write_dir}/slugs/{slug}/{slug}.{agent}.{MMDD}.md. All stages for a topic land in the same slug directory across agent write roots.
CRUD agent editor
Full-pane editor in the Agents panel UI. Shows filename, all frontmatter fields, and a full-height system prompt textarea. Create, edit, delete agents directly from the browser.
Task anchor injection
After each Ollama tool-call round, a [Task anchor] message reinjects the original prompt into the message history. Prevents objective drift on long tool-use loops.
VRAM gate + eject
Queue-based protection prevents concurrent large model loads from exceeding GPU capacity. VRAM popup in input bar shows loaded models and Eject button per model.
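
The slug-based output routing listed above follows a fixed path pattern, which can be written out directly. The path template is taken from the description; the function name and the slug validation rule (lowercase words joined by hyphens) are assumptions added for the sketch.

```python
import re
from datetime import date
from pathlib import Path

# Assumed slug shape: lowercase alphanumeric words joined by single hyphens.
SLUG_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def slug_output_path(write_dir: str, slug: str, agent: str, on: date) -> Path:
    """Build {write_dir}/slugs/{slug}/{slug}.{agent}.{MMDD}.md for one stage output."""
    if not SLUG_RE.fullmatch(slug):
        # Rejects path separators and dots, so a slug cannot escape write_dir.
        raise ValueError(f"invalid slug: {slug!r}")
    stamp = on.strftime("%m%d")  # zero-padded month and day
    return Path(write_dir) / "slugs" / slug / f"{slug}.{agent}.{stamp}.md"
```

For example, `slug_output_path("workspace/pipeline", "gpu-prices", "agent_script", date(2025, 3, 9))` yields `workspace/pipeline/slugs/gpu-prices/gpu-prices.agent_script.0309.md`, so every stage for one topic lands in the same slug directory.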
Why so many agents?
Each agent is single-responsibility — it does one thing well and hands off. This is intentional: when a single AI tries to research, write, proof, and publish all in one session, it drifts. It loses context. It compensates with confident hallucination.

Breaking the work into small, chained stages solves that. Research feeds scrubbing feeds scripting feeds markup — and no single agent ever carries the full load. Lighter tasks (slug generation, condensing, headline pulls) run on fast local models like Qwen 14B to save cost and time. Heavier ones (scripting, markup) go to the 35B. The model is matched to the task.

Multiple agents can run in parallel as background pipelines — offloading entire stages from Claude entirely. Sterling (Claude) acts as the orchestrator: it delegates, reviews, and directs. Agents execute. This division keeps costs low, quality high, and Claude focused on judgment rather than grunt work.

All loaded agents

researcher researcher_roundup youtube_workflow agent_broad agent_narrow agent_research agent_sources agent_scrub agent_condense agent_script agent_script_markup agent_short agent_titles agent_description agent_heygen agent_thumbprompt agent_thumb agent_slug agent_save agent_save_ammend agent_compiler agent_keywords agent_screen agent_framegrab agent_proof agent_motion agent_verify agent_web_publish agent_condense headlines coder writer format vision
🎬 Content Pipeline — YouTube Production YouTube production
researcher workflow orchestrated
Full research pipeline: generate slug → broad search (15–30 URLs) → narrow per-source fetch → Playwright scrape every article → structured research file with source attribution.
researcher_roundup orchestrated
Multi-story variant. Runs N parallel topic researches and produces a composite file with STORY section headers for roundup-style episodes covering multiple AI news items.
agent_scrub + agent_condense
agent_scrub strips nav noise, cookie boilerplate, and duplicate content from raw research. agent_condense extracts dated bullet facts and appends to a master condensate file.
agent_script qwen3.6:35b
~1500-word YouTube script in 9-section structure. Written in Jane Sterling's voice — direct, authoritative, unscripted-sounding. Reads the scrubbed research file as input material.
agent_script_markup qwen3.6:35b
Inserts [CUT], [BROLL], [STATCARD], and [LOWER3] production markup into a finished script — creating a capture-ready production document that feeds both HeyGen recording and motion graphics.
agent_titles + agent_description
CTR-optimized YouTube title variants and full video description (hook + summary + bullet topics + source links). Both feed the upload package and are proofed before delivery.
agent_short
125–175-word YouTube Shorts script from the most newsworthy moment in the episode. Standalone deliverable for vertical video distribution.
agent_heygen + agent_thumbprompt + agent_thumb
agent_heygen produces SCENE/LIGHTING/MOOD for the recording session. agent_thumbprompt generates a photographic thumbnail prompt + 3 titles + color palette. agent_thumb adds Photoshop layout specs.
agent_keywords + agent_proof
agent_keywords produces 20 YouTube hashtags (channel hashtag first, then broad AI, then topic-specific). agent_proof reads the full compiled package and returns a structured change list before final delivery.
web_publish workflow orchestrated
Parses finished YouTube packages into the news channel website template. agent_verify confirms correctness before write. Automated publish from pipeline output to live site.
🎞️ Motion Graphics Pipeline ProRes deliverables
agent_motion
Parses [STATCARD] and [LOWER3] markup from script_markup files. Writes structured data files, then calls generate-motion-html.py to produce capture-ready animated HTML.
generate-motion-html.py
Generates animated HTML files from stat card data. CSS animations with configurable timing: count-up numbers, fade/slide in/out, stagger delays, roll-up for bottom-edge cards.
record-motion-cards.py
Playwright records each HTML motion graphic to a ProRes .mov. Individual card deliverables — stat card, lower third, outro — sized for DaVinci Resolve compositing import.
whisper-edl.py
Generates an EDL (Edit Decision List) from Whisper transcript output. Links audio timestamps to Resolve edit points for semi-automated assembly cuts from the script.
composite-cards.py
ffmpeg compositing utility. Crops individual cards from 4K captures at pixel-accurate bounds per slug, scales to proxy resolution for QC review before Resolve delivery.
🔌 Integrations Connected services
HeyGen connected
PKCE OAuth live. heygen-watch.py polls the v3 API every 60s for completed videos, downloads MP4 + SRT, registers via sterling-save. Recording stays manual via the web dashboard — no API credits needed.
ElevenLabs V3 (via HeyGen)
Jane Sterling's voice: "Tiffany - Friendly" on ElevenLabs V3. Wired through HeyGen's avatar recording interface. No direct ElevenLabs API calls — consistent voice without extra integration cost.
Tavily web search
Structured web search with 5-result responses. Available to all Ollama agents via the web_search tool. Primary entry point for broad research before Playwright article fetching.
Playwright / Chromium browser automation
Headless browser tool for the browser_fetch agent tool — handles JS-rendered pages, Cloudflare-gated content, and soft paywalls. Also drives screenshot capture and motion card recording.
Gmail MCP
Gmail access for pipeline health monitoring. The VPS nightly pipeline sends status emails; Sterling reads them each morning via heartbeat to flag failures or confirm clean runs.
Google Calendar / Drive stubbed
PKCE OAuth flows implemented and waiting. Require Google Cloud Console client registration. Architecture mirrors the working HeyGen integration exactly — a client_id away from live.

A living system.

This is not a product with a release cycle. It is a breathing workspace. When something breaks, it gets fixed on the spot — in the same interface, mid-conversation, with Sterling diagnosing and implementing the fix while Jayson watches the diff land. No ticket filed. No sprint planned. Fixed.

New features get implemented the moment they are needed. Background pipelines exist because blocking the chat during a research job was annoying — so they got built. The motion graphics pipeline exists because a video needed stat cards and there was no better path — so it got built. The architecture is the residue of actual use, not planning.

The system operates with as much or as little autonomy as Jayson wants. Fully supervised, one message at a time — or handed a goal and left to run overnight while the scheduler manages the pipeline. Sterling knows when to ask and when to just do it.

It finds its own limits and grows around them. The 64KB stdout crash got hit in a real session and fixed before the next one. The snowballing session log hit 197MB and got capped the same day. Every rough edge sharpened by use, not by audit.

It is not a product. It is an extension of how one person thinks and works. The goal was never feature completeness. It was fit — a system that matches the shape of the work so precisely that using it stops feeling like using a tool at all.

Built for one person.
Grown from necessity.

Sterling Agent has no product team, no roadmap, and no users besides Jayson. Every feature in it exists because real work demanded it — not because it seemed like a good idea in the abstract, not because a framework made it easy, not because someone filed a feature request. The heartbeat runs because autonomous memory review turned out to matter. Background pipelines exist because blocking the chat while a research job ran was annoying. The motion graphics pipeline exists because a YouTube production needed stat cards and there was no better way to get them.

The agent system grew the same way. One model was never enough — different tasks need different strengths. The pipeline chaining patterns emerged from actually running the YouTube workflow end-to-end and discovering which handoffs worked. The bugs that got fixed are the ones that broke real sessions — the 64KB stdout limit, the snowballing session logs, the model tag showing "synthetic" after a refresh. The architecture is the residue of real use.

This isn't a product. It's infrastructure for one person who needed something that could think alongside them, remember what matters, and do the work when asked. That's what it does.

Sterling Agent  ·  v0.3.0  ·  running locally  ·  built by Jayson