Skipper is an operating system for your home, built to make a busy household run with less friction. It tracks your goals and projects, plans your meals, fires your reminders, organizes what you research, keeps the calendar straight, watches the smart home, and quietly remembers everything that matters across every conversation.
20 first-class apps. Voice on every device. Integrates with Trello, Tastytrade, and Home Assistant. The longer you use it, the more useful it gets.
Skipper is built around the way a household actually works. It plans, organizes, watches, and remembers across people, devices, and conversations, so the things that matter don't fall through the cracks.
Goals, projects, tasks, reminders, schedules and a daily prioritize view that pulls from all of them. Three focus slots for what really matters today.
Searchable recipe library, scaling, cooking mode, paste-to-parse via chat. Meal ideas from a tagged library: "low-effort, no Mexican, must use potatoes."
Smart-home control through Home Assistant, appliance tracking, household item locator, maintenance schedules, vehicle service records.
"Skipper, go research how Kalman filters get used in trend trading." Twenty minutes later you have a fully sourced markdown document linked to the right project, delivered to your Discord.
Every conversation seeds a long-term memory. The longer you use it, the more it knows about you, your projects, and your preferences, and the more useful it gets.
Live Tastytrade integration, multiple strategies, streaming TradingView indicator signals, and an intraday risk monitor that DMs you when the equity curve, news, or positions warrant attention.
Gmail rule engine, scheduled polling, and a chat-driven email tool that can compose, search, and triage, with every action logged per rule.
Discord DMs, the web app, the Android app, and always-on voice with wake-word. Same memory, same tools, same context. Pick whichever surface is closest.
Reminders, schedules, nags, and calendar items all share one unified delivery layer: Discord, Pushover, and mobile push. Anti-spam cooldowns so notifications only fire when there's actually something new.
The mobile app runs an always-on wake-word listener with a rolling pre-roll buffer. When it fires, you hear a chime. That's "I heard you," not "go ahead, start now." You can talk straight through it; the pre-roll covers the chime and the ephemeral-token mint.
Picovoice Porcupine on-device. No cloud listening. Configurable phrase (default: "Hey Skipper").
Rolling buffer captures audio before and during the chime so users never need to wait.
Server mints a short-lived OpenAI Realtime API token. Separate billing key supported.
gpt-realtime for thought + voice; Whisper-1 for transcript. Tool calls relayed via WebSocket.
Every tool has an Ack: line like "Setting that reminder…" spoken while the call runs, then the actual result.
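The pre-roll idea above can be sketched with a fixed-size ring buffer. This is only an illustration in Python, not the mobile app's audio code: `PreRollBuffer`, the 20 ms frame size, and the window length are all assumptions made for the example.

```python
from collections import deque


class PreRollBuffer:
    """Keeps only the most recent audio frames (hypothetical helper)."""

    def __init__(self, window_ms: int, frame_ms: int = 20) -> None:
        # Number of frames needed to cover the requested window.
        self.max_frames = max(1, window_ms // frame_ms)
        # deque(maxlen=...) silently drops the oldest frame when full.
        self._frames = deque(maxlen=self.max_frames)

    def push(self, frame: bytes) -> None:
        self._frames.append(frame)

    def drain(self) -> bytes:
        """Return buffered audio in arrival order and reset the buffer."""
        audio = b"".join(self._frames)
        self._frames.clear()
        return audio


buf = PreRollBuffer(window_ms=100, frame_ms=20)  # room for 5 frames
for i in range(8):
    buf.push(bytes([i]))          # one-byte stand-ins for audio frames
preroll = buf.drain()             # only the 5 newest frames survive
```

When the wake word fires, whatever `drain()` returns would be prepended to the live stream, which is why speech that overlaps the chime is not lost.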
The voice surface uses the same agent loop as web and Discord. Same memory, same tools, same context. Switch apps with "open the recipes app"; the voice session re-scopes its tools to that app's category set automatically.
Compact alias blocks in the system prompt help the realtime model lock onto callable targets (recipes-app vs. recipes-tool vs. recipe-tool). Wake chimes, tool acks, and TTS all flow through one delivery layer so the audio never overlaps.
None of these are mockups. Each is a full app, with its own data model, its own UI, and its own chat-callable tools, addressable by name from anywhere. This is what's live today.
Hierarchical task trees with arbitrary nesting. Kahn's-algorithm dependency-aware stack ranking. Per-entity auto-nags and Trello-board card linking.
Recurring schedules with RRULE support, auto-notifications, PM digest integration. Aggregates reminders, tasks, nags, and to-do into one view.
Markdown editor with section-aware LLM enhancement and inline diff-aware revisions. CodeMirror 6, autocomplete, entity linking, full-text + vector search.
Your personal family journal. Captures milestones, memories, and moments chronologically. Searchable, photo-aware, with entity links so you can revisit any chapter of family history.
Live Tastytrade integration, paper-trading backtests, 10+ strategies (BIL monthly, dual momentum, HMA, Kalman, RSI mean reversion, ML ARKK gate, Polygon ML).
Intraday risk domain. Every 15 min during market hours, LLM analyzes equity curve + news + positions and DMs hold / exit / re-enter recommendations.
Vehicles, service records, issues, valuations, conditions. MCP tools + REST + React detail UI with maintenance schedule integration.
Automation, appliance tracking, insurance, with Home Assistant voice integration scoped per room.
CRUD with categories, ratings, chef notes, image carousel, scaling, cooking mode. Paste-to-parse via chat. Generates standalone meal menus.
Gmail OAuth, rule engine with activity log per rule, account/rule UI. Scheduled polling, action chains.
Household item locator with photo + location tracking. "Sticky" item history suggests where to file something on re-add.
3 focus slots with reorder / dismiss. Dynamic backlog auto-pulls from tasks, reminders, nags, auto issues, schedules, and to-do.
Daily checklist with nudge delivery, calendar visibility, prioritize-backlog inclusion. Per-user.
Drop any file in a folder; the intelligence pipeline chunks, embeds, and fact-extracts so chat can recall it semantically.
Ingest URLs and docs, automatic chunking + embedding, semantic search, automatic injection during chat.
Async research pipeline: Brave search → fetch → LLM summarize → synthesize document. Brave/Finnhub/TradingView signal feeds.
Markdown → HTML → PDF → physical printer (lpr). 4-step PDF fallback chain (weasyprint → Chrome headless → wkhtmltopdf → pandoc).
Unified delivery to Discord DMs, Pushover, FCM mobile push, server channels. Anti-spam cooldowns per entity / issue type.
Hosted library with pop-out HLS player (hls.js). Per-episode progress, watch state, queue management.
Operator-grade views into the agent's background cognition, runtime tool authoring, job queues, and live system state.
Why 20 and not 5? Each app is a self-contained package. Drop a folder in apps/ with a manifest, and the platform discovers and wires it on startup. Per-app Postgres schemas, auto-mounted routes, auto-registered tools.
Every integration shares the same memory and the same agent loop. Mention a Trello card in a voice session and the Discord channel sees the same answer.
Native bot. DMs + channels. File attachments embed into context.
Live card sync (not cached). Checklists, labels, due dates, bidirectional delete.
Token manager, order placement, positions, balances, paper backtests.
Streaming webhook feed. Pine Script indicators push signals straight into Skipper's strategy runners in real time.
Voice control for lights, scenes, and devices, scoped per room.
OAuth2, scheduled polling, rule engine with action chains and activity log.
Mobile push to Android. Auto registration on app launch.
Mobile alerting fallback. Per-user channels. Critical-priority overrides.
Search + market data feeds powering the research and trading pipelines.
The rest of this page is the technical story: architecture, the tool router, memory pipeline, background engine, and the optimizations that make it cheap to run.
FastAPI backend hosts the agent loop, the MCP tool registry, the data layer, and an SPA. A dedicated thinking scheduler runs background cognition. Postgres + pgvector stores everything. The same context is reachable from voice, web, mobile, and Discord.
The React SPA is genuinely a windowed desktop. Goals, Documents, Recipes, Investments, the Calendar, the Locator: each a full app with its own routes and components. But every one of them can hand context to the agent with a click: "summarize this", "create a task from this", "remember this for next week."
Background cognition runs continuously on a separate thinking_scheduler with dynamic intervals (30s → 1h) and a shared 15M-token daily budget. Domains throttle themselves as the budget fills, slowing down at 70% and pausing at 90%, to stay polite.
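A minimal sketch of the budget check, assuming the thresholds are inclusive and that spend is tracked as a single running total. The function name and return values are hypothetical.

```python
DAILY_BUDGET_TOKENS = 15_000_000  # shared daily budget from the text
SLOW_AT_PERCENT = 70              # domains slow down here
PAUSE_AT_PERCENT = 90             # domains pause here


def throttle_state(tokens_used_today: int) -> str:
    """Classify today's spend as 'normal', 'slow', or 'paused'.

    Integer math avoids float rounding right at the thresholds.
    Treating the thresholds as inclusive is an assumption.
    """
    used = tokens_used_today * 100
    if used >= PAUSE_AT_PERCENT * DAILY_BUDGET_TOKENS:
        return "paused"
    if used >= SLOW_AT_PERCENT * DAILY_BUDGET_TOKENS:
        return "slow"
    return "normal"
```

A domain would consult this before each cycle and stretch its interval or skip the cycle accordingly.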
Five long-running background loops keep the system alive: reminders, jobs (research / print / refinement), Trello live-sync, thinking, and the dispatcher.
The Tastytrade integration isn't bolted into the agent process. It's a separate FastAPI service hosted on its own AWS EC2 instance, with its own database connection pool, its own APScheduler, its own webhooks, and its own strategy runners. It's authored in the same repo but deployed independently.
Why it matters: if the Skipper agent crashes, gets restarted, or is taken down for a deploy, positions stay tracked, strategies keep running, TradingView webhooks keep firing, and risk monitoring keeps watching the equity curve. Trading is a real-money domain. It doesn't sit behind a chat process.
The agent talks to it via a private API tunnel. From the user's perspective there's still one Investments app and one chat, but underneath, the two services have isolated failure domains.
Most AI tools are reactive: you ask, they answer, the lights go out. Skipper has a thinking framework: a runtime that lets it run autonomous cognitive loops in the background, each one observing, evaluating, and acting on its own rhythm. New responsibilities are added as new domains. This is the capability that turns the assistant into an agent.
A thinking domain is any responsibility worth running on a clock. Each one registers a handler that follows a strict observe → evaluate → act contract, then runs as its own concurrent asyncio task, independent of chat and independent of every other domain, but sharing the same brain, memory, and tools.
Critically, every cycle returns next_check_seconds. The domain controls its own rhythm based on what it found: busy → faster, idle → slower. The interval is clamped to between 30s and 1h so nothing spins or starves.
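The contract can be sketched roughly like this. Only next_check_seconds and the 30s/1h bounds come from the description above; `CycleResult`, `queue_watcher`, and `run_domain` are hypothetical names, and the real scheduler certainly does more.

```python
import asyncio
from dataclasses import dataclass

MIN_INTERVAL_S = 30      # lower bound from the text
MAX_INTERVAL_S = 3600    # upper bound from the text (1h)


@dataclass
class CycleResult:
    acted: bool
    next_check_seconds: float


def clamp_interval(seconds: float) -> float:
    """Keep a domain's requested interval inside the allowed range."""
    return max(MIN_INTERVAL_S, min(MAX_INTERVAL_S, seconds))


async def queue_watcher(pending: int) -> CycleResult:
    """Hypothetical handler following observe -> evaluate -> act."""
    # observe: `pending` stands in for whatever the domain inspects
    # evaluate: is there anything worth doing?
    busy = pending > 0
    # act: real work would go here; this sketch only reports the outcome
    return CycleResult(acted=busy, next_check_seconds=5 if busy else 7200)


async def run_domain(handler, observations, sleep=asyncio.sleep):
    """Run one cycle per observation, sleeping the clamped interval between."""
    waits = []
    for obs in observations:
        result = await handler(obs)
        wait = clamp_interval(result.next_check_seconds)
        waits.append(wait)
        await sleep(wait)
    return waits
```

Here the handler asks for 5 seconds when busy and two hours when idle, and the scheduler pulls both requests back inside the 30s to 1h window.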
These are the domains live today. The framework supports any number. Anything that benefits from running on a rhythm is just another registration.
Wakes up daily, picks the next project that needs review, loads its full state + history, runs LLM analysis, and DMs people about blockers, missing dates, and overdue work.
Drains the memory ingestion queue in batches of 10. Delegates each item to the right digester. Self-paces tight when queue is full, idles when empty.
Every 30 min during active hours, reads accumulated memories with a cursor, filters noise, decides what's worth writing down, files markdown into a self-organizing folder tree.
Self-improvement. Manages a persistent job tree of hundreds of focused LLM calls (cycle → phase → unit). Survives crashes and resumes from where it left off. See callout below.
The priority-0 domain. Each user message runs as one cycle: context assembly, retrieval, tool routing, agent loop, post-processing. The same framework as every other domain.
Every goal Skipper owns spawns its own thinking domain named after the goal ID. Between conversations the handler loads the full goal context and quietly works on it: analyzing health, creating tasks, DMing collaborators, researching.
The Evolve domain is a cycle manager that orchestrates hundreds of focused LLM calls against Skipper's own platform: its apps, tools, prompts, integrations, and specs. It reads its own architecture, identifies gaps, drafts fixes, files them as issues, even authors new tools at runtime.
Each evolve cycle decomposes into phases, each phase into units. All state lives in Postgres via the job queue. A server crash mid-cycle resumes from the next unit on restart.
Pauses automatically if daily token spend gets high. Picks up tomorrow where it left off. The whole cycle can span days without anyone touching it.
Evolve can call create_tool / update_tool, and the resulting tool is registered live with the MCP server and immediately callable in the same session.
The point isn't this list. It's the framework. Anything that benefits from running on a rhythm (monitoring a feed, scanning for patterns, auditing data, drafting a report, learning from history) is just another domain registration. The thinking system is what makes Skipper an agent, not an assistant.
Skipper is not a monolith with 20 hardcoded features. It's a runtime that discovers apps at startup and wires them into the agent, the API, the database, and the tool router automatically. Each app is a self-contained package. Adding a new domain is a manifest, not a refactor.
Every app lives in apps/<id>/ and ships these files. Most are optional. The loader picks up whatever's present.
Each app gets its own Postgres schema app_<id>. The loader validates that no cross-schema foreign keys exist. If app A's table tries to FK into app B's table, startup fails loudly. Apps communicate through events and the platform's link registry, never by reaching into each other's tables.
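A sketch of what the cross-schema check could look like once the foreign-key constraints have been read from the database catalog. The `ForeignKey` row shape and both function names are assumptions; the catalog query itself is omitted.

```python
from typing import NamedTuple


class ForeignKey(NamedTuple):
    # One row per FK constraint, as it might be read from the catalog.
    constraint: str
    from_schema: str
    from_table: str
    to_schema: str
    to_table: str


class CrossSchemaForeignKeyError(RuntimeError):
    """Raised when a table references a table in another schema."""


def find_cross_schema_fks(fks):
    """Return every FK whose source and target schemas differ."""
    return [fk for fk in fks if fk.from_schema != fk.to_schema]


def validate_schema_isolation(fks) -> None:
    """Fail loudly, naming each offending constraint."""
    bad = find_cross_schema_fks(fks)
    if bad:
        details = ", ".join(
            f"{fk.constraint} ({fk.from_schema}.{fk.from_table} -> "
            f"{fk.to_schema}.{fk.to_table})"
            for fk in bad
        )
        raise CrossSchemaForeignKeyError(f"cross-schema foreign keys: {details}")


fks = [
    ForeignKey("fk_step_recipe", "app_recipes", "steps", "app_recipes", "recipes"),
    ForeignKey("fk_item_recipe", "app_locator", "items", "app_recipes", "recipes"),
]
```

In this example the first constraint stays inside app_recipes and passes; the second reaches from app_locator into app_recipes and would abort startup.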
When the agent boots, the loader walks every package in apps/ and runs this lifecycle. If any step fails, the app is marked degraded but the agent keeps booting.
Load manifest.yaml; validate id, version, entity-type prefixes.
CREATE SCHEMA IF NOT EXISTS app_<id>. Each app gets its own.
Apply unapplied SQL files in order. Tracked in app_migrations.
Reject any cross-schema foreign keys. Hard error on violation.
Entity-type prefixes (ml-*, mc-*) registered in the platform.
App row written to app_registry with version and status.
Each public function in tools.py with a docstring becomes an MCP tool.
FastAPI router from routes.py mounted at /api/apps/<id>/.
Manifest's tool_category + keywords merged into the dynamic router.
Event-bus subscriptions declared in handlers.py registered.
Job handlers and thinking-domain registrations from the package activated.
Reachable from chat, voice, the SPA, Discord, and mobile. All without core code changes.
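The tool-registration step in the lifecycle above ("each public function in tools.py with a docstring becomes an MCP tool") could be approximated with the standard inspect module. This is a sketch under assumptions: `discover_tools` is a hypothetical name, and a SimpleNamespace stands in for an imported tools.py.

```python
import inspect
import types


def discover_tools(namespace) -> dict:
    """Map tool name -> function for public, documented functions."""
    tools = {}
    for name, obj in inspect.getmembers(namespace, inspect.isfunction):
        if name.startswith("_"):
            continue                 # private helper, not a tool
        if not inspect.getdoc(obj):
            continue                 # no docstring, no tool
        tools[name] = obj
    return tools


def add_recipe(title):
    """Create a recipe with the given title."""
    return {"title": title}


def _slugify(text):
    """Private helper; skipped because of the leading underscore."""
    return text.lower().replace(" ", "-")


def undocumented():
    return None


# Stand-in for an app's tools.py module.
demo_tools = types.SimpleNamespace(
    add_recipe=add_recipe, _slugify=_slugify, undocumented=undocumented
)
found = discover_tools(demo_tools)   # only add_recipe qualifies
```

Using the docstring as the gate means the same text can double as the tool description shown to the model, though whether Skipper does that is not stated here.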
This is why there are 20 apps and not 5. New domains can be added by anyone who can write a manifest and a few Python functions, including Skipper itself via the Evolve thinking domain, which authors tools and (eventually) full apps through this same framework. The platform isn't a feature of the product. It's the substrate the product runs on.
If you stuff every tool schema into context, the model gets distracted and you pay for every token, every turn. Skipper's router scans the message for keywords and entity-ID patterns (g-*, p-*, t-*, sch-*, veh-*), picks the matching categories, and only those tool schemas are sent.
Tools are organized into 40+ categories by domain: reminders, recipes, investments, filesystem, memory, scheduling, and so on. 29 categories ship in the core tool_routes.json; each loaded app contributes another. Every category declares its tools, its keyword triggers, and per-tool acknowledgment templates ("Setting that reminder…") that the voice surface speaks before the call finishes.
Four meta-tools are always present: list_all_tools, request_tools, open_app, restart_agent. If the agent needs something outside the current set mid-turn, it just asks for it.
Behavioral guides in prompts/guides/ load on the same trigger. Talking about reminders? You get the reminders guide. Talking about goals? The goals guide. Nothing else.
Each is independently routable. Mixed messages can pull multiple categories.
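A toy version of the routing step, to make the keyword and entity-ID matching concrete. The keyword sets and the prefix-to-category mapping are invented for the example (the real table lives in tool_routes.json), and a bare prefix regex like this one would also match ordinary words such as "t-shirt".

```python
import re

# Hypothetical routing table, not the contents of tool_routes.json.
KEYWORDS = {
    "reminders": {"remind", "reminder", "reminders"},
    "recipes": {"recipe", "recipes", "cook", "dinner"},
    "goals": {"goal", "goals", "project", "task"},
}
# Entity-ID prefixes from the text, mapped to assumed categories.
PREFIX_CATEGORY = {
    "g": "goals", "p": "goals", "t": "goals",
    "sch": "schedules", "veh": "vehicles",
}
ENTITY_ID = re.compile(r"\b(sch|veh|g|p|t)-[a-z0-9]+\b")


def route(message: str) -> set:
    """Return the tool categories whose schemas should be sent."""
    text = message.lower()
    words = set(re.findall(r"[a-z']+", text))
    # Keyword pass: any overlap with a category's triggers selects it.
    categories = {cat for cat, kws in KEYWORDS.items() if words & kws}
    # Entity-ID pass: an ID like veh-12ab selects its owning category.
    categories |= {PREFIX_CATEGORY[m.group(1)] for m in ENTITY_ID.finditer(text)}
    return categories
```

A mixed message such as "Remind me to buy potatoes for the dinner recipe" selects both reminders and recipes, while a message that matches nothing gets only the always-present meta-tools.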
Memory is not a side-cache. It's the spine. Every CRUD operation, every conversation turn, every successful tool call seeds the long-term knowledge base. By the next conversation, Skipper already knows what changed.
The agent (or you) calls remember(content, tags, about). Tagged, entity-linked, full provenance back to the conversation turn that created it.
Every entity CRUD writes an [auto]-tagged memory. recall(entity_id="t-abc") returns the full change history with no extra work.
After every chat turn, the conversation is enqueued. A cheap LLM extracts factual statements in the background, adding no latency, and writes them as searchable memories. Knowledge compounds organically.
Memories don't pile up in a flat table forever. Every chat turn and every app-record event drops a job onto the memory_ingestion_queue, a Postgres-backed work queue with entity-keyed dedup (a burst of updates to the same entity collapses to last-write-wins) and crash-safe stale-row recovery for items left "processing" after a restart.
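The two queue behaviours described above, entity-keyed dedup and stale-row recovery, can be shown with an in-memory stand-in. The real queue is Postgres-backed; this `IngestionQueue` class, its method names, and the five-minute staleness window are assumptions. The batch size of 10 matches the drain size mentioned earlier.

```python
import time


class IngestionQueue:
    """In-memory stand-in for the Postgres-backed queue (hypothetical)."""

    def __init__(self, stale_after_s: float = 300.0) -> None:
        self.stale_after_s = stale_after_s
        self._pending = {}      # entity_id -> payload, insertion-ordered
        self._processing = {}   # entity_id -> (payload, claimed_at)

    def enqueue(self, entity_id, payload) -> None:
        # Entity-keyed dedup: a newer payload replaces the pending one.
        self._pending.pop(entity_id, None)
        self._pending[entity_id] = payload

    def claim(self, limit: int = 10, now=None):
        """Move up to `limit` pending items into 'processing'."""
        now = time.monotonic() if now is None else now
        batch = list(self._pending.items())[:limit]
        for entity_id, payload in batch:
            del self._pending[entity_id]
            self._processing[entity_id] = (payload, now)
        return batch

    def complete(self, entity_id) -> None:
        self._processing.pop(entity_id, None)

    def recover_stale(self, now=None):
        """Requeue items stuck in 'processing' longer than stale_after_s."""
        now = time.monotonic() if now is None else now
        stale = [k for k, (_, claimed) in self._processing.items()
                 if now - claimed > self.stale_after_s]
        for entity_id in stale:
            payload, _ = self._processing.pop(entity_id)
            # Keep a newer pending payload if one arrived in the meantime.
            self._pending.setdefault(entity_id, payload)
        return stale
```

Calling `recover_stale()` at startup is what makes a crash mid-batch harmless: anything claimed but never completed goes back to pending.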
Two thinking domains drain the queue in parallel, each on its own rhythm:
The result: a self-organizing folder tree of auto-documents (topic indexes, reference cards, running narratives), created and reorganized by the agent itself, readable by humans, all sitting inside the Folders app. Working memory persists between cycles so the domain remembers what it's already organized.
OpenAI text-embedding-3-small (1536-dim). Stored binary-packed as float32 for compact disk + O(1) seeks. Postgres + pgvector for the index. Tag normalization (plurals → singular) means "memories" and "memory" both hit the same rows.
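To illustrate why fixed-width float32 packing gives O(1) seeks: every vector occupies the same number of bytes, so record i starts at a computable offset. The little-endian layout and the helper names below are assumptions, not Skipper's actual storage format.

```python
import struct

DIM = 1536                  # text-embedding-3-small dimensionality
RECORD_BYTES = DIM * 4      # float32 is 4 bytes per component
_FMT = f"<{DIM}f"           # little-endian float32 (an assumption)


def pack_vector(vec) -> bytes:
    """Serialize one embedding to exactly RECORD_BYTES bytes."""
    if len(vec) != DIM:
        raise ValueError(f"expected {DIM} components, got {len(vec)}")
    return struct.pack(_FMT, *vec)


def read_vector(blob: bytes, index: int) -> list:
    """Jump straight to record `index`; earlier records are never scanned."""
    return list(struct.unpack_from(_FMT, blob, index * RECORD_BYTES))


blob = pack_vector([0.5] * DIM) + pack_vector([0.25] * DIM)
second = read_vector(blob, 1)
```

Each record is 6,144 bytes, so a vector costs about 6 KB on disk regardless of how its components happen to print as text.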
Recall is hybrid: cosine similarity narrows the pool, then tag / entity / author filters refine. On startup, missing embeddings backfill lazily so adding a new field never blocks a release.
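A compact sketch of the two-stage recall: rank by cosine similarity first, then filter the shortlisted pool by tag. Everything here is illustrative. `hybrid_recall` and `normalize_tag` are hypothetical names, the plural-folding rules are deliberately naive, and a real deployment would let pgvector do the similarity ranking.

```python
import math


def normalize_tag(tag: str) -> str:
    """Naive plural -> singular folding (illustrative rules only)."""
    tag = tag.strip().lower()
    if tag.endswith("ies") and len(tag) > 3:
        return tag[:-3] + "y"
    if tag.endswith("s") and not tag.endswith("ss"):
        return tag[:-1]
    return tag


def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def hybrid_recall(memories, query_vec, tag=None, pool_size=50):
    """Stage 1: top `pool_size` by cosine similarity. Stage 2: tag filter."""
    pool = sorted(
        memories, key=lambda m: cosine(m["vec"], query_vec), reverse=True
    )[:pool_size]
    if tag is None:
        return pool
    wanted = normalize_tag(tag)
    return [m for m in pool if wanted in {normalize_tag(t) for t in m["tags"]}]


memories = [
    {"id": 1, "vec": [1.0, 0.0], "tags": ["memories"]},
    {"id": 2, "vec": [0.9, 0.1], "tags": ["recipe"]},
    {"id": 3, "vec": [0.0, 1.0], "tags": ["memory"]},
]
hits = hybrid_recall(memories, [1.0, 0.0], tag="memory", pool_size=2)
```

Memory 3 carries the right tag but is too dissimilar to make the pool of two, so only memory 1 comes back, matched through the normalized "memories" tag.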
Drop a document into a folder and a post-processing pipeline runs two passes: (1) chunk + embed for semantic search; (2) LLM fact extraction for structured recall. Web URLs go through the knowledge store with the same treatment.
Skipper doesn't just respond. It runs work. A dedicated job dispatcher manages multiple types of long-running, asynchronous work behind the scenes. The runtime is the same whether the agent kicked off the job, you scheduled it, or a cron rule fired it.
job_dispatcher.py is a handler-registry engine. Each job type registers a handler with a per-type concurrency limit. The dispatcher claims pending work from Postgres, runs handlers as asyncio tasks, and gives each one a live JobContext with its own progress channel and cancellation flag.
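The handler-registry pattern with per-type concurrency limits might look something like the sketch below. It is not the code in job_dispatcher.py: the `Dispatcher` API is invented, jobs come from a plain list instead of Postgres, and the progress channel is reduced to a list.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class JobContext:
    job_id: str
    progress: list = field(default_factory=list)  # progress channel stand-in
    cancelled: bool = False                       # cancellation flag


class Dispatcher:
    def __init__(self) -> None:
        self._handlers = {}   # job type -> (handler, max concurrency)

    def register(self, job_type, handler, concurrency=1) -> None:
        self._handlers[job_type] = (handler, concurrency)

    async def run(self, jobs):
        """Run (job_id, job_type, payload) tuples, honoring per-type limits."""
        limits = {t: asyncio.Semaphore(c)
                  for t, (_, c) in self._handlers.items()}

        async def run_one(job_id, job_type, payload):
            handler, _ = self._handlers[job_type]
            ctx = JobContext(job_id)
            async with limits[job_type]:      # wait for a free slot
                if not ctx.cancelled:
                    await handler(ctx, payload)
            return ctx

        return await asyncio.gather(*(run_one(*job) for job in jobs))


async def demo():
    active = peak = 0

    async def research(ctx, payload):
        nonlocal active, peak
        active += 1
        peak = max(peak, active)              # track observed concurrency
        ctx.progress.append(f"researching {payload}")
        await asyncio.sleep(0.01)
        active -= 1

    dispatcher = Dispatcher()
    dispatcher.register("research", research, concurrency=2)
    jobs = [(f"j{i}", "research", f"topic {i}") for i in range(5)]
    contexts = await dispatcher.run(jobs)
    return peak, [c.progress for c in contexts]
```

Running `asyncio.run(demo())` submits five research jobs at once, yet no more than two handlers are ever inside the semaphore at the same time.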
A separate 30-second job runner loop polls multiple sources (research, refine, print, PM, schedule-triggered) and dispatches what's due. Five long-running loops total: reminders, jobs, Trello live-sync, thinking, dispatcher.
Web search → fetch → summarize → synthesize → DM. See deep-dive below.
Section-aware document revision. Diff-aware edits over existing research docs.
Markdown → HTML → PDF → physical printer. 4-step fallback (weasyprint → Chrome headless → wkhtmltopdf → pandoc).
10 AM daily cycle. Scans goals/projects/tasks for missing dates, owners, blockers; per-project LLM analysis; grouped DM delivery.
Lighter mid-day check between daily scrums. Flags status changes since the last cycle.
Strategy analysis pipeline. Runs the active trading strategies, logs equity curves, emits signals.
Cron expressions + custom rules fire jobs on time-of-day, day-of-week, market-hours triggers.
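The print job's fallback chain listed above boils down to "try each converter in order and stop at the first success." A generic sketch follows, with stand-in converter functions instead of real calls into weasyprint, Chrome, wkhtmltopdf, or pandoc; all names are hypothetical.

```python
class AllConvertersFailed(RuntimeError):
    """Raised when every converter in the chain has failed."""


def render_with_fallback(html: str, converters):
    """Try each (name, fn) in order; return (name, pdf_bytes) on first success."""
    errors = []
    for name, convert in converters:
        try:
            return name, convert(html)
        except Exception as exc:          # any failure moves to the next step
            errors.append(f"{name}: {exc}")
    raise AllConvertersFailed("; ".join(errors))


def _unavailable(html):
    # Stand-in for a converter that is not installed on this machine.
    raise RuntimeError("not installed")


def _fake_pdf(html):
    # Stand-in for a working converter; returns placeholder bytes.
    return b"%PDF-stub:" + html.encode("utf-8")


chain = [
    ("weasyprint", _unavailable),
    ("chrome-headless", _unavailable),
    ("wkhtmltopdf", _fake_pdf),
    ("pandoc", _fake_pdf),
]
used, pdf = render_with_fallback("<h1>Menu</h1>", chain)
```

Collecting the per-step errors means that when the whole chain fails, the job log shows why each converter was skipped.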
You ask. It thinks. Twenty minutes later you have a fully sourced, cleanly written document linked to the right project, delivered to your Discord DM.
SMART_MODEL strategizes. Splits the prompt into 1–N targeted search queries. No naive single-query fetches.
Configurable 1–20 results per query. Deduped across queries. Sources scored and ranked.
Custom HTML parser strips boilerplate (nav, footer, sidebars, script, style). 20s timeout per page; failures skipped gracefully.
Each fetched page passes through an LLM with the original research question as context, producing a faithful per-source summary.
All summaries are fed to SMART_MODEL, which writes a sectioned markdown document with citations and a reading level appropriate to the request.
Document filed in Folders, tagged, optionally linked to a goal/project/task. DM auto-chunked at 2000 chars. Memory layer learns what you researched.
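The delivery step's 2000-character chunking could work along these lines. This is a guess at the behaviour, not the shipped code: preferring newline and space boundaries is an assumption, and `chunk_message` is a hypothetical name.

```python
def chunk_message(text: str, limit: int = 2000) -> list:
    """Split text into pieces of at most `limit` characters.

    Prefers to break at the last newline inside the window, then at the
    last space, and only hard-splits when neither exists.
    """
    chunks = []
    rest = text
    while len(rest) > limit:
        window = rest[:limit]
        cut = window.rfind("\n")
        if cut <= 0:
            cut = window.rfind(" ")
        if cut <= 0:
            cut = limit                  # no natural break: hard split
        piece = rest[:cut].rstrip()
        if piece:
            chunks.append(piece)
        rest = rest[cut:].lstrip()
    if rest:
        chunks.append(rest)
    return chunks


report = "a" * 1500 + "\n" + "b" * 1500
parts = chunk_message(report)            # splits at the newline
```

Breaking on paragraph boundaries keeps markdown sections intact across messages, which matters more for a research document than hitting the limit exactly.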
A separate refine job type runs section-aware diffs against an existing research doc. Ask it to "go deeper on the Kalman gain calculation, and add a section on numerical stability" and it generates its own fresh queries, fetches new sources, and writes surgical revisions per section rather than rewriting the whole document. Diff-aware. Source-aware. Reversible.
Every part of the stack assumes scarce tokens, slow networks, and humans who notice latency.
~5–15 tool schemas per turn instead of 346, roughly 20–70× fewer. Token spend on tool definitions drops accordingly.
Tool calls bypass the MCP subprocess and run the Python function directly. Single-digit-ms overhead.
Memory extraction runs after the response ships. Zero added latency to the user-facing turn.
15M tokens / day across thinking domains. Throttles at 70% (slow) and 90% (pause).
Background loops self-pace 30s → 1h based on what they find. No fixed cron tax.
float32 on-disk, O(1) seeks, lazy backfill on startup. New fields don't block releases.
Vector similarity narrows; tag / entity / author filters refine. Cheap and precise.
Audio recorded before/during the chime is prepended to the realtime request. No "wait for the beep."
Per-entity, per-issue-type cooldowns (3 days default). The system stays quiet when it has nothing new.
Internal IDs preserved in chat history (for the agent) but stripped from user-facing output.
Voice prompt includes condensed alias lookups so the realtime model resolves callable targets fast.
Goals + projects + tasks ranked via Kahn's algorithm so blockers never end up below blockees.
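For the dependency-aware ranking in the last item, here is a sketch of Kahn's algorithm with a priority heap choosing among the items that are currently unblocked. The heap-based tie-breaking and the data shapes are assumptions; only the use of Kahn's algorithm comes from the text.

```python
import heapq


def rank(items: dict, blocked_by: dict) -> list:
    """Order items so every blocker precedes what it blocks.

    items:      {id: priority}, lower number = more important
    blocked_by: {id: [ids that must come first]}
    """
    indegree = {i: 0 for i in items}
    unblocks = {i: [] for i in items}
    for item, blockers in blocked_by.items():
        for blocker in blockers:
            unblocks[blocker].append(item)
            indegree[item] += 1

    # Start with everything that has no unresolved blockers.
    ready = [(prio, i) for i, prio in items.items() if indegree[i] == 0]
    heapq.heapify(ready)

    order = []
    while ready:
        _, current = heapq.heappop(ready)   # most important unblocked item
        order.append(current)
        for nxt in unblocks[current]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                heapq.heappush(ready, (items[nxt], nxt))

    if len(order) != len(items):
        raise ValueError("dependency cycle detected")
    return order


items = {"t-ship": 1, "t-test": 2, "t-design": 3}
blocked_by = {"t-ship": ["t-test"], "t-test": ["t-design"]}
ranking = rank(items, blocked_by)
```

Even though t-ship has the highest priority, it lands last because it is blocked by t-test, which is in turn blocked by t-design.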
Honest answer: no. Not without the engineering foundation underneath.
The $1,000 receipt above is real, but it isn't just any $1,000. It's $1,000 spent by someone who has been building software for three decades. AI is a remarkable force multiplier, but it isn't a substitute for engineering judgment, system design, debugging instincts, or knowing when the AI is wrong.
Through this build I directed the AI thousands of times. I rejected suggestions, redirected approaches, caught architectural mistakes before they shipped, and made the hard calls about what not to build. Voice integration, the trading service's process isolation, the thinking framework, the per-app schema validation, the cross-domain memory pipeline: none of those emerged from prompts alone. They came from knowing what kind of system would hold up over years of daily use.
What's changed is the labor: the typing, the boilerplate, the lookups, the routine glue code, the test scaffolding, the first-draft documentation. AI is genuinely fast at all of that. What hasn't changed is the thinking: knowing what to build, how to structure it, when to abstract and when not to, where the failure modes are, why a particular pattern matters at scale.
If AI is the multiplier, experience is what it multiplies. This page is the receipt for one. It would be a very different receipt for someone without the foundation.
Honest answer: not today. Skipper isn't packaged for distribution, and I haven't made any firm decisions yet about what comes next.
What's on the table:
Right now I'm focused on building something that actually fills my family's needs, and on learning and having fun with the technology while it's still this new. If any of the directions above interests you, the door's open.
Get in touch on LinkedIn
Not a chat wrapper. Not a demo. A working operating system for a household, with 20 apps, 346 tools, 4 surfaces, live integrations, and a brain that thinks in the background and remembers what mattered yesterday. Built end-to-end. UI, agent loop, MCP server, mobile, voice, thinking scheduler, isolated trading service, the lot.