building regent, an ai executive assistant, from scratch

April 1, 2026/3 min read

last month i shipped the first version of regent, a production-grade saas platform that replaces human executive assistants for busy professionals. you connect your email, and regent processes everything 24/7: drafting replies, extracting tasks, managing calendars, and delivering briefings via sms, whatsapp, and signal.

here's how it works under the hood, and what i learned building it.

the problem

executive assistants cost $4,000 to $10,000/month. they're great, but they sleep, take vacations, and can only handle so many emails per hour. i wanted to build something that works around the clock, learns your preferences over time, and costs a fraction of the price.

the architecture

the system has two main parts: a next.js frontend with 107 components across 27 routes, and a go backend that orchestrates per-user ai pipelines.

every incoming email goes through a 3-stage pipeline:

stage 1: categorize. the ai reads the email and classifies it: urgent, needs reply, informational, or spam. this determines routing priority and which models get used downstream.

stage 2: summarize. the email gets condensed into a structured summary with key points, action items, and deadlines extracted. this feeds into the daily briefing system.

stage 3: draft reply. using RAG memory injection, the ai pulls relevant context from past conversations, your preferences, and organizational knowledge to draft a contextually appropriate reply.

RAG memory injection

this is the part i'm most proud of. every user has a personal knowledge graph whose embeddings are stored in postgres with pgvector. when drafting a reply, the system:

embeds the incoming email into a 384-dimensional vector
searches for similar past conversations and outcomes
retrieves your stated preferences and communication style
injects all of this as context into the reply generation prompt

the result is replies that sound like you, reference past conversations accurately, and follow your preferred tone.

multi-model routing

not every email needs the same model. regent uses a tiered approach:

ollama (qwen3/gemma3) for categorization and simple replies: fast, cheap, runs locally
gemini as a fallback for complex reasoning tasks
circuit breaker pattern so if one model is down, traffic automatically routes to the backup

each model has health checks, and the system tracks latency and error rates per-model to make routing decisions.

the go backend

i chose go for the backend because of goroutines. each user gets their own goroutine for email processing, managed by a supervisor pattern with exponential backoff. if a user's pipeline crashes, it restarts automatically without affecting other users.

the backend also runs 16 scheduled cron jobs including nightly batch processing, and uses AES-256-GCM encryption for all stored credentials with key rotation support.
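for the credential encryption, here is one way AES-256-GCM with key rotation can look using go's standard library. the one-byte key id prefix, the `Keyring` type, and the `Rotate` helper are assumptions for this sketch; the post does not describe the actual storage format.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// Keyring maps a one-byte key id to a 32-byte AES-256 key. Current is the
// id used for new ciphertexts; older ids stay so old data still decrypts.
type Keyring struct {
	Keys    map[byte][]byte
	Current byte
}

func (k *Keyring) aead(id byte) (cipher.AEAD, error) {
	key, ok := k.Keys[id]
	if !ok {
		return nil, fmt.Errorf("unknown key id %d", id)
	}
	if len(key) != 32 {
		return nil, errors.New("AES-256 requires a 32-byte key")
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// Encrypt returns keyID || nonce || ciphertext+tag.
func (k *Keyring) Encrypt(plaintext []byte) ([]byte, error) {
	a, err := k.aead(k.Current)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, a.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	out := append([]byte{k.Current}, nonce...)
	// The key id is passed as additional data so it is authenticated too.
	return a.Seal(out, nonce, plaintext, []byte{k.Current}), nil
}

// Decrypt reads the key id from the first byte and uses that key.
func (k *Keyring) Decrypt(blob []byte) ([]byte, error) {
	if len(blob) < 1 {
		return nil, errors.New("ciphertext too short")
	}
	id := blob[0]
	a, err := k.aead(id)
	if err != nil {
		return nil, err
	}
	ns := a.NonceSize()
	if len(blob) < 1+ns {
		return nil, errors.New("ciphertext too short")
	}
	return a.Open(nil, blob[1:1+ns], blob[1+ns:], []byte{id})
}

// Rotate re-encrypts a stored blob under the current key.
func (k *Keyring) Rotate(blob []byte) ([]byte, error) {
	pt, err := k.Decrypt(blob)
	if err != nil {
		return nil, err
	}
	return k.Encrypt(pt)
}

func main() {
	oldKey, newKey := make([]byte, 32), make([]byte, 32)
	if _, err := rand.Read(oldKey); err != nil {
		panic(err)
	}
	if _, err := rand.Read(newKey); err != nil {
		panic(err)
	}
	ring := &Keyring{Keys: map[byte][]byte{1: oldKey}, Current: 1}
	blob, err := ring.Encrypt([]byte("oauth-refresh-token"))
	if err != nil {
		panic(err)
	}
	ring.Keys[2], ring.Current = newKey, 2 // introduce a new key
	rotated, err := ring.Rotate(blob)
	if err != nil {
		panic(err)
	}
	pt, err := ring.Decrypt(rotated)
	if err != nil {
		panic(err)
	}
	fmt.Printf("key id %d -> %d, plaintext %q\n", blob[0], rotated[0], pt)
}
```

storing the key id next to the ciphertext means rotation can happen lazily: old rows keep decrypting while a background job re-encrypts them under the new key.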

the database

60 sql migrations and 50+ tables, all with row-level security (RLS) enabled through supabase. postgres evaluates the RLS policies on every query, so isolation between users is enforced at the database level, not just the application level: for any role subject to RLS, a query that forgets its user filter still only returns rows the policy allows.

billing

four stripe subscription tiers, from $97 to $697/month. each tier unlocks different processing limits, model access, and features. the billing system handles upgrades, downgrades, proration, and webhook-driven status changes.

what i learned

start with the pipeline, not the ui. i spent the first two weeks building just the email processing pipeline in go before touching any frontend code. by the time i built the ui, i had a working system to build against.

RAG is only as good as your chunking strategy. i rewrote the embedding pipeline three times before landing on a hybrid approach: recent emails get full embeddings, older ones get summary embeddings, and preferences get their own dedicated vector space.

go's concurrency model is perfect for this. managing thousands of concurrent user pipelines with goroutines is elegant in a way that would be painful in node.js or python.

regent is live and processing emails. the whole thing (frontend, backend, ai pipeline, billing) took about 8 weeks of focused work.
