Sterling Agent

Why it exists

Built because nothing else was enough.

Jayson is building an AI-powered empire — Flexible, adaptive, robust. A creators utopia. It started with a small online booking system. Then it turned into a fully featured photography critiquing website built on php and MariaSQL. He started building helper apps for electronics circuit design, tools, game helpers, professional landing pages. From there it turned into a full stack web development company branded under his own name. If you're going to develop professional websites for business and commerce, you might as well build a proper booking Saas next.

From here he saw the need for more control. He installed OpenClaw. Quickly captivated, but also hugely turned off by the API limits set out by anthropic recently on agents, he needed a new angle. He built his own openclaw. Specificly to work around Anthropics forced API usage on agents and blocking of CLI use. it started as a thin wrapper around Claude CLI.Soon it became a pipe to paralell proceses and invoke more agents. To improve efficiency and build more complexity, faster. This is where Sterling Agent realy took stride.

Testing and development of an agent, and a desire to try and automate a complex project, lead Jayson to build an Avatar scripted AI News Channel on YouTube. This obviously needed a website to complement. The news platform is comprised of a daily published long format YouTube video, a targeted short video, and web based news article — each hosted by a fully AI-generated avatar. Every video starts as a topic scrape for direction and then raw information. It ends as a polished production package ready for publish. He learnt how to setup and manage MCP and API connections. He dove into image video and audio genereation. He took many side quests on. Automating along the way. There are still some very manual components; there are also a host of fully automated components. Every day it shifts more towards autopiliot. An evolving pipeline. That pipeline needs an operator.

Off-the-shelf tools weren't it. Generic chat interfaces forget everything the moment you close them. Cloud assistants don't have access to local files, production workflows, or project context. What was needed was something that could think across sessions, delegate to specialized models, run tasks on a schedule, and keep an entire operation moving — without constant hand-holding.

So Sterling Agent got built. It started as a thin wrapper around Claude CLI and grew from there. Every feature added because it was actually needed. Every decision made in service of real work. It runs locally on a single machine, talks to the browser over WebSocket, and maintains persistent memory across every conversation.

25+

Specialized agents

v0.3

Current version

AI models wired in

~3.5k

Lines of frontend

∞

Scheduled tasks

UI panels

Inside Sterling Agent.

Every panel in the sidebar is a live tool. Not a settings menu — an active workspace. Here's what each one does.

Panel 01 — Chat

The main interface.

Real-time streaming responses with full tool call visibility. Every tool Sterling calls — web searches, file reads, agent invocations — expands inline as a collapsible block so nothing is hidden. The model switcher in the input bar toggles between any cloud or local model mid-conversation. A message queue lets you send the next prompt before the current response finishes. Cancel and resume at any point without losing the session.

Panel 02 — Scheduled

Cron-based autonomous tasks.

Jobs defined here fire prompts directly into Claude on a cron schedule — and they persist across server restarts. The heartbeat fires every 10 minutes in its own isolated session, reviewing memory, checking project state, and surfacing reminders. Custom jobs can trigger any pipeline, reminder, or automated task. Scheduled work streams its output into the chat feed live, so nothing runs blind.

Panel 03 — Files

Full workspace file browser.

Every agent definition, memory file, config, and workspace document is editable directly in the browser. Changes take effect immediately — no file manager, no terminal, no restart. The panel is organized into collapsible sections: Workspace files open by default; System, Runtime logs, and session logs collapse for a cleaner view. Click any file to open an inline editor. Save and it's live.

Panel 04 — Agents

All 34 agents, live.

Every loaded agent shown as a card with its name, model, category, and color. Agents are organized by pipeline role — Workflows, Research, Content, Utility. @mention any agent in chat to route that message directly to it, or let Sterling invoke them programmatically during pipeline work. Tab autocomplete makes @mentions fast — type @ and the list filters in real time.

Panel 04b — Agents / Edit

The agent editor.

Click Edit on any agent to open the full-pane editor. Name, model, color, and category sit in a compact header row. The system prompt textarea fills the remaining space — no resize handle, no modal — just a direct edit surface. Changes save to the agent's .md file on disk and take effect immediately on the next invocation. New agents can be created and deleted here too.

Panel 05 — History

Every conversation, logged.

Full session history stored as per-session JSONL files. Sessions tab shows all conversations with search — heartbeat-only sessions collapse into their own group so the main list stays clean. Events tab is a paginated connection log with the page controls pinned above the list so they stay in the same position page to page. Reload any session to resume the Claude thread exactly.

Sessions — heartbeat grouping

Events — pagination pinned above log

Panel 06 — Connections

Every integration, one panel.

OAuth services (HeyGen, YouTube, Google), API-key services (Anthropic, ElevenLabs, Tavily), and MCP-managed services (Gmail) all tracked in one place. Three kinds of auth, one unified status view. Live connected/expired state per service, connect/disconnect toggle for OAuth flows, kind badge so you know what manages the credential. No credentials in config files.

Chat bar

The input bar is a control panel.

Model switcher, VRAM inspector, token cost tracker, and pipeline task watcher are all embedded in the chat input bar — always visible, never in a separate settings screen. Switch models mid-conversation, eject a loaded Ollama model from GPU, see live token cost and context %, and track running background pipelines with per-job PID tags — all from one bar.

Model selector

Token cost tracker · pipeline task watcher · PID tags

VRAM inspector + eject

Panel 07 — Settings

System configuration.

Model selection, heartbeat interval, and system preferences — all adjustable from the browser without touching config files or restarting the server. Settings persist in SQLite. The Mission Control section lets you start, stop, or restart the server itself from the UI. One button to bring everything back up if something goes sideways mid-session.

Panel 08 — Help

Built-in documentation.

A full reference panel covering every feature, model, keyboard shortcut, and troubleshooting step — written and kept current as the system grows. The Features section walks through every panel. Models section groups Cloud and Local with tag pills. Shortcuts lists every keyboard action. Troubleshooting includes copy-ready commands for diagnosing common failures. Documentation that lives in the tool, not somewhere else.

Capabilities

What it actually does.

From a single chat interface, Sterling can research, write, schedule, delegate, publish, and remember — all while keeping the conversation open.

🧠

Claude + Local Models

Claude (via the Anthropic CLI) is the primary brain — the orchestrator that reasons, delegates, and drives long-horizon tasks. Local Ollama models (Qwen, DeepSeek, Llama Vision) handle agent subtasks where speed or cost matters. Both share the same streaming interface.

💾

Persistent Memory Across Sessions

Every Claude session can be resumed. Long-term memory lives in editable workspace files — MEMORY.md, PROJECTS.md, SOUL.md — loaded fresh into every system prompt. Sterling remembers decisions, preferences, and project context across all conversations.

⏰

Scheduled Autonomous Tasks

APScheduler runs cron-based jobs that fire prompts directly into Claude sessions. The heartbeat fires every 10 minutes in its own isolated session — reviewing memory, checking project state, surfacing reminders. Custom jobs scheduled from the UI persist across restarts.

🤖

25+ Specialized Agents

Each agent is a Markdown file with YAML frontmatter — name, model, category, write directory, system prompt. @mention one in chat to invoke it interactively. Sterling invokes them programmatically for pipeline work. Agents can call other agents.

⚡

Background Pipeline Mode

Long pipelines run in separate Claude sessions — the chat stays live while they work. Job output streams into the chat feed in real time, labeled and color-coded with a pulsing status dot. Sterling and Jayson can keep talking while the pipeline runs.

🎬

YouTube Content Pipeline

Research → scrub → script → production markup → titles → description → Shorts version → HeyGen scene prompt → thumbnail prompt → keywords → compile → proof. Every stage is a named agent. The full chain runs as a single orchestrated background workflow.

🎞️

Motion Graphics Pipeline

Scripts with [STATCARD] and [LOWER3] markup get parsed into animated HTML capture files. Playwright records each card to ProRes .mov for DaVinci Resolve import — stat cards, lower thirds, and outros with count-up animations and staggered reveals.

🌐

Browser Automation + Web Search

Playwright headless Chromium handles JS-rendered pages, Cloudflare-gated content, and soft paywalls. Tavily powers structured web search. Both are available as tools to any agent in the pipeline — deep research is the output, not just headlines.

🔗

OAuth Connections

Custom PKCE OAuth 2.0 handles external service auth. HeyGen is live — OAuth token stored in SQLite, status in the Connections panel. A video watcher polls the HeyGen API for completed recordings and downloads them automatically.

📁

File Registry + Inline Media

Uploaded images, captured frames, and generated files get 8-char hex IDs. The [[FILE:id]] system renders them inline in chat. Agent write directories are controlled per-agent — research, pipeline output, scripts, and code stay separated.

YouTube production workflow — full agent sequence

researcher

→

agent_scrub

→

agent_script

→

agent_script_markup

→

agent_titles

→

agent_description

→

agent_short

→

agent_heygen

→

agent_thumbprompt

→

agent_thumb

→

agent_keywords

→

agent_sources

→

compile

→

agent_proof

Full feature reference

Everything wired in.

Technical breakdown by category. Built incrementally — every item here was added because the work demanded it.

⚙️ Core Infrastructure 12 features

FastAPI + WebSocket server backend

Uvicorn on port 8765. WebSocket /ws handles all streaming chat — text chunks, tool calls, tool results, session state, usage stats, and job events.

Session resumption claude

Active Claude session ID persisted in SQLite. Every message passes --resume to continue the exact conversation thread across browser refreshes and server restarts.

SQLite persistence

agent.db stores sessions, scheduled jobs, settings, file registry, OAuth tokens, connection events, and the message queue. No external database required.

JSONL conversation logs

Every session writes a per-session .jsonl file. Full history panel with search, session switching, and inline file previews backed by these logs.

Message queue

Messages sent mid-response queue in SQLite and drain after the current response completes. Each queued message re-parses for @mentions independently.

APScheduler (cron + interval) scheduler

AsyncIOScheduler with CronTrigger for custom jobs and IntervalTrigger for heartbeat. All jobs persist across restarts via the scheduled_jobs table.

Heartbeat isolation

The heartbeat fires in its own session (session_id=None) — it never overwrites the interactive Claude session. Reads HEARTBEAT.md, broadcasts output to chat.

Memory reminder system

Each heartbeat scans MEMORY.md for [remind] tags with today's date and schedules one-shot jobs at 10:03 AM. Marker files prevent duplicate firings across restarts.

256MB subprocess buffer

asyncio StreamReader set to 256MB for Claude CLI stdout. The default 64KB caused silent crashes on large tool results — a critical reliability fix that unlocked stable tool use.

PKCE OAuth 2.0 auth

Generic OAuth layer in oauth.py. Authorization URL generation, code exchange, per-service token storage in SQLite. Broadcasts connection events to all WebSocket clients.

File registry + [[FILE:id]] system

8-char hex IDs for all registered files. Frontend parses [[FILE:id]] tags in agent responses and renders images inline or as download links via /api/file-ref/{id}.

Mission Control panel

Start, stop, and restart service processes from the browser UI. Port-based process management via /api/mc/* — includes full server restart in one click.

🤖 AI Models 10 models

Cloud — Anthropic API

Claude Sonnet 4.6 primary · orchestrator

Primary reasoning engine. Runs via Claude CLI subprocess with stream-json output. Persistent sessions, full tool access, handles all long-horizon tasks, delegation, and judgment calls.

Claude Opus 4.7 most capable

Most capable Claude model. Reserved for tasks requiring maximum reasoning depth — complex orchestration, nuanced writing, or edge cases where Sonnet falls short.

Claude Haiku 4.5 fast · lightweight

Fast, low-cost Claude for simple structured tasks. Used where speed matters and full reasoning is overkill — quick lookups, format conversions, lightweight confirmations.

Local — Ollama on NVIDIA Titan RTX 24GB VRAM

Qwen3 30B local · general

General-purpose local model with strong reasoning. Handles a wide range of agent tasks — research, summarization, structured output — without the VRAM cost of the 35B variant.

Qwen3.6 35B local · heavy · ~24GB VRAM

Primary heavy agent model. Script writing, markup, thumbnail generation, descriptions. Explicitly unloaded from VRAM between pipeline phases to free memory for lighter models.

Qwen3.6 27B local · balanced

Mid-range Qwen. Writer, format, and moderate content agents. Faster cold start than 35B while retaining strong structured output performance.

Qwen3 14B local · fast

Lightweight utility model. Slug generation, condensing, headline pulls, keyword lists, scrubbing. Low VRAM footprint — stays resident between pipeline stages without cost.

Qwen3 Coder 30B local · code

Code-specialized model powering the Coder agent. Full-stack aware: Python, FastAPI, JavaScript, Bash, Swift, C/C++, Win32 API, CUDA. Strong at reading existing codebases before writing.

DeepSeek R1 32B local · reasoning · ~19GB VRAM

Chain-of-thought reasoning model. Q4 quantized, fits the Titan RTX. Runs with tools=False and extended think=True chains. Best for analysis tasks where deliberate multi-step reasoning matters.

DeepSeek R1 14B local · reasoning · lighter

Lighter reasoning model for when R1-style deliberation is needed but VRAM headroom is tight. Same no-tools, think=True configuration as the 32B — lower cost, less depth.

🧩 Agents System 34 agents loaded

Agent definition format YAML + Markdown

Each agent is a .md file with YAML frontmatter: name, model, description, color, category, write_dir. The Markdown body is the system prompt. All fields editable in the browser.

@mention routing with autocomplete

@AgentName prefix resolves to agent model + system prompt. Tab autocomplete shows a filtered dropdown. Last 50 messages injected as context for stateless Ollama agents.

Agent-to-agent delegation

POST /api/agent/invoke handles all agent-to-agent calls. Claude uses curl in its system prompt; Ollama models use the call_agent tool. Output always auto-saved to disk.

Per-agent write directories

write_dir frontmatter restricts each agent's write access. Research writes to workspace/research/, pipeline agents to workspace/pipeline/, coder to workspace/code/.

Slug-based output routing

Pass slug in invoke body → output at {write_dir}/slugs/{slug}/{slug}.{agent}.{MMDD}.md. All stages for a topic land in the same slug directory across agent write roots.

CRUD agent editor

Full-pane editor in the Agents panel UI. Shows filename, all frontmatter fields, and a full-height system prompt textarea. Create, edit, delete agents directly from the browser.

Task anchor injection

After each Ollama tool-call round, a [Task anchor] message reinjects the original prompt into the message history. Prevents objective drift on long tool-use loops.

VRAM gate + eject

Queue-based protection prevents concurrent large model loads from exceeding GPU capacity. VRAM popup in input bar shows loaded models and Eject button per model.

Why so many agents?

Each agent is single-responsibility — it does one thing well and hands off. This is intentional: when a single AI tries to research, write, proof, and publish all in one session, it drifts. It loses context. It compensates with confident hallucination.

Breaking the work into small, chained stages solves that. Research feeds scrubbing feeds scripting feeds markup — and no single agent ever carries the full load. Lighter tasks (slug generation, condensing, headline pulls) run on fast local models like Qwen 14B to save cost and time. Heavier ones (scripting, markup) go to the 35B. The model is matched to the task.

Multiple agents can run in parallel as background pipelines — offloading entire stages from Claude entirely. Sterling (Claude) acts as the orchestrator: it delegates, reviews, and directs. Agents execute. This division keeps costs low, quality high, and Claude focused on judgment rather than grunt work.

All loaded agents

researcher researcher_roundup youtube_workflow agent_broad agent_narrow agent_research agent_sources agent_scrub agent_condense agent_script agent_script_markup agent_short agent_titles agent_description agent_heygen agent_thumbprompt agent_thumb agent_slug agent_save agent_save_ammend agent_compiler agent_keywords agent_screen agent_framegrab agent_proof agent_motion agent_verify agent_web_publish agent_condense headlines coder writer format vision

🎬 Content Pipeline — YouTube Production YouTube production

researcher workflow orchestrated

Full research pipeline: generate slug → broad search (15–30 URLs) → narrow per-source fetch → Playwright scrape every article → structured research file with source attribution.

researcher_roundup orchestrated

Multi-story variant. Runs N parallel topic researches and produces a composite file with STORY section headers for roundup-style episodes covering multiple AI news items.

agent_scrub + agent_condense

agent_scrub strips nav noise, cookie boilerplate, and duplicate content from raw research. agent_condense extracts dated bullet facts and appends to a master condensate file.

agent_script qwen3.6:35b

~1500-word YouTube script in 9-section structure. Written in Jane Sterling's voice — direct, authoritative, unscripted-sounding. Reads the scrubbed research file as input material.

agent_script_markup qwen3.6:35b

Inserts [CUT], [BROLL], [STATCARD], and [LOWER3] production markup into a finished script — creating a capture-ready production document that feeds both HeyGen recording and motion graphics.

agent_titles + agent_description

CTR-optimized YouTube title variants and full video description (hook + summary + bullet topics + source links). Both feed the upload package and are proofed before delivery.

agent_short

125–175-word YouTube Shorts script from the most newsworthy moment in the episode. Standalone deliverable for vertical video distribution.

agent_heygen + agent_thumbprompt + agent_thumb

agent_heygen produces SCENE/LIGHTING/MOOD for the recording session. agent_thumbprompt generates a photographic thumbnail prompt + 3 titles + color palette. agent_thumb adds Photoshop layout specs.

agent_keywords + agent_proof

20 YouTube hashtags (channel hashtag first, then broad AI, then topic-specific). agent_proof reads the full compiled package and returns a structured change list before final delivery.

web_publish workflow orchestrated

Parses finished YouTube packages into the news channel website template. agent_verify confirms correctness before write. Automated publish from pipeline output to live site.

🎞️ Motion Graphics Pipeline ProRes deliverables

agent_motion

Parses [STATCARD] and [LOWER3] markup from script_markup files. Writes structured data files, then calls generate-motion-html.py to produce capture-ready animated HTML.

generate-motion-html.py

Generates animated HTML files from stat card data. CSS animations with configurable timing: count-up numbers, fade/slide in/out, stagger delays, roll-up for bottom-edge cards.

record-motion-cards.py

Playwright records each HTML motion graphic to a ProRes .mov. Individual card deliverables — stat card, lower third, outro — sized for DaVinci Resolve compositing import.

whisper-edl.py

Generates an EDL (Edit Decision List) from Whisper transcript output. Links audio timestamps to Resolve edit points for semi-automated assembly cuts from the script.

composite-cards.py

ffmpeg compositing utility. Crops individual cards from 4K captures at pixel-accurate bounds per slug, scales to proxy resolution for QC review before Resolve delivery.

🔌 Integrations Connected services

HeyGen connected

PKCE OAuth live. heygen-watch.py polls the v3 API every 60s for completed videos, downloads MP4 + SRT, registers via sterling-save. Recording stays manual via the web dashboard — no API credits needed.

ElevenLabs V3 (via HeyGen)

Jane Sterling's voice: "Tiffany - Friendly" on ElevenLabs V3. Wired through HeyGen's avatar recording interface. No direct ElevenLabs API calls — consistent voice without extra integration cost.

Tavily web search

Structured web search with 5-result responses. Available to all Ollama agents via the web_search tool. Primary entry point for broad research before Playwright article fetching.

Playwright / Chromium browser automation

Headless browser tool for the browser_fetch agent tool — handles JS-rendered pages, Cloudflare-gated content, and soft paywalls. Also drives screenshot capture and motion card recording.

Gmail MCP

Gmail access for pipeline health monitoring. The VPS nightly pipeline sends status emails; Sterling reads them each morning via heartbeat to flag failures or confirm clean runs.

Google Calendar / Drive stubbed

PKCE OAuth flows implemented and waiting. Require Google Cloud Console client registration. Architecture mirrors the working HeyGen integration exactly — a client_id away from live.

Built because nothing else was enough.

This is the actual app.

Inside Sterling Agent.

What it actually does.

Everything wired in.

A living system.

Built for one person.
Grown from necessity.

Sterling Agent

Built because nothing else was enough.

This is the actual app.

Inside Sterling Agent.

What it actually does.

Everything wired in.

A living system.

Built for one person.Grown from necessity.

Built for one person.
Grown from necessity.