Marketing Agent System
Designing an AI-native execution system for content and ad workflows with observability, control, and real-world integrations.
Introduction
I built this system as an experiment around a practical question: what does it take to make agentic AI useful in an actual operating workflow, rather than as a chatbot demo?
The answer, in my experience, is that the hard part is rarely "calling the model." The hard part is everything around it: state, tools, retries, approvals, observability, scheduling, human control, and making sure model output does not directly become uncontrolled side effects.
This project became a small but serious operating system for marketing execution around Upscale Print. It combines a conversational agent layer with deterministic worker execution, so reasoning and action are connected without being collapsed into the same thing.
Problem
A lot of "AI agent" systems are impressive in demos but weak in practice. Common failure modes include: too much hidden state, unclear ownership of side effects, poor observability, no real safeguards around risky actions, brittle integrations with external systems, and no useful distinction between planning and execution.
I wanted to explore a different approach: an AI-native workflow system where models help with reasoning, planning, and evaluation, while concrete actions remain durable, inspectable, and operationally controlled.
The target workflows were practical ones: Instagram planning and publishing, creative generation, quality evaluation, Google Ads analysis and optimization, and metric collection and feedback loops.
What I owned
I owned the system end to end:
- Architecture and data model
- Agent and workflow design
- Tool integrations and scheduling
- Operational safeguards and deployment
Outcome
What exists today:
- A working two-process AI-native system
- Specialist agents for planning, analysis, composition, copywriting, evaluation, and orchestration
- A durable execution layer with job queues, retries, scheduling, and auditability
- Real integrations with Instagram, Google Ads, image generation, and evaluation workflows
- Human control surfaces for risky actions like budget changes
- Feedback loops where performance data informs future planning
System overview
The system has two main layers: a TypeScript/Mastra agent layer for conversational interaction, tool-based reasoning, and specialist roles; and a Python worker layer for scheduled jobs, API integrations, queue execution, and deterministic side effects. Both layers operate against shared SQLite state that acts as control plane, audit trail, and durable memory.
Agent UI ↔ Agent Layer ↔ Shared SQLite State ↔ Python Worker ↔ External APIs
1. Strategy and recent performance data are gathered
2. A strategist agent creates a weekly content plan
3. Planned posts are stored durably in the database
4. Content variations are generated
5. Creative assets are generated and evaluated
6. If creative quality is weak, retry/revise loops run
7. Ready posts are queued for publishing
8. Metrics are collected after publishing
9. Learnings are fed back into future planning
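The steps above amount to a small state machine over planned posts. A minimal sketch in Python with SQLite, using hypothetical status names and table layout (not the project's actual schema):

```python
import sqlite3

# Illustrative status progression for a planned post; the names here are
# assumptions, not the project's real schema.
PIPELINE = ["planned", "drafted", "creative_ready", "queued", "published", "measured"]

def init_db(conn):
    conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, status TEXT)")
    conn.execute(
        "CREATE TABLE audit_log (post_id INTEGER, old TEXT, new TEXT, "
        "at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )

def advance(conn, post_id):
    """Move a post one step forward and record the transition durably."""
    (status,) = conn.execute(
        "SELECT status FROM posts WHERE id = ?", (post_id,)
    ).fetchone()
    nxt = PIPELINE[PIPELINE.index(status) + 1]
    conn.execute("UPDATE posts SET status = ? WHERE id = ?", (nxt, post_id))
    conn.execute(
        "INSERT INTO audit_log (post_id, old, new) VALUES (?, ?, ?)",
        (post_id, status, nxt),
    )
    return nxt

conn = sqlite3.connect(":memory:")
init_db(conn)
conn.execute("INSERT INTO posts (id, status) VALUES (1, 'planned')")
print(advance(conn, 1))  # drafted
```

Because every transition is an ordinary row, the audit trail falls out of the design rather than being bolted on afterward.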
Key decisions
Separate reasoning from execution
The agent decides, proposes, analyzes, or plans. The worker performs side effects through explicit jobs. This separation reduces risk, improves observability, makes retries tractable, keeps behavior inspectable, and creates a clearer human control surface.
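One way to sketch this handoff, with assumed table and handler names: the agent's output is a durable job row, and only the worker turns rows into side effects.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE jobs (
    id INTEGER PRIMARY KEY, kind TEXT, payload TEXT,
    status TEXT DEFAULT 'pending', attempts INTEGER DEFAULT 0)""")

# Agent side: reasoning ends in a durable proposal, never a direct side effect.
def propose(kind, payload):
    conn.execute("INSERT INTO jobs (kind, payload) VALUES (?, ?)",
                 (kind, json.dumps(payload)))

# Worker side: explicit execution with bounded retries.
def run_once(handlers, max_attempts=3):
    row = conn.execute(
        "SELECT id, kind, payload, attempts FROM jobs "
        "WHERE status = 'pending' LIMIT 1").fetchone()
    if row is None:
        return None
    job_id, kind, payload, attempts = row
    try:
        handlers[kind](json.loads(payload))
        conn.execute("UPDATE jobs SET status = 'done' WHERE id = ?", (job_id,))
    except Exception:
        nxt = "failed" if attempts + 1 >= max_attempts else "pending"
        conn.execute("UPDATE jobs SET attempts = attempts + 1, status = ? "
                     "WHERE id = ?", (nxt, job_id))
    return job_id

propose("publish_post", {"post_id": 7})
run_once({"publish_post": lambda p: print("publishing", p["post_id"])})
```

Because failures leave the row in `pending` (or eventually `failed`), retries and post-mortems are ordinary queries rather than archaeology through agent transcripts.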
Shared SQLite state as a simple control plane
A small, durable shared state model rather than distributing state across agent memory, process memory, and external services. This gave the system inspectability, recoverability, simpler debugging, and a practical audit trail with low operational overhead.
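Recoverability is one concrete payoff of this choice. A toy sketch of a restart path, with hypothetical status values: because all state lives in one SQLite file, a restarted worker can requeue whatever was mid-flight when the previous process died.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO jobs (status) VALUES (?)",
                 [("done",), ("running",), ("pending",)])

def recover(conn):
    """Requeue anything stuck in 'running' from a crashed worker,
    then return a status summary for inspection."""
    conn.execute("UPDATE jobs SET status = 'pending' WHERE status = 'running'")
    return dict(conn.execute("SELECT status, COUNT(*) FROM jobs GROUP BY status"))

print(recover(conn))
```

The same `GROUP BY` summary doubles as a debugging view: one query answers "what is the system doing right now?"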
Approval-gated actions for risky changes
Some actions should not happen just because a model suggested them. Budget changes are a good example. Approval-aware flows and caps prevent recommendations from becoming immediate uncontrolled side effects.
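A minimal sketch of such a gate, with an assumed cap value and function names that are illustrative only: small deltas under a hard cap can auto-apply, while anything larger becomes a pending request for a human.

```python
# Hypothetical cap; in practice this would live in configuration.
DAILY_CAP = 50.0

def handle_budget_suggestion(current: float, proposed: float):
    """Turn a model-suggested budget change into either an applied value
    or a deferred approval request; never an unchecked side effect."""
    delta = abs(proposed - current)
    if delta <= DAILY_CAP * 0.1:
        return ("auto_applied", proposed)
    return ("needs_approval", current)  # side effect deferred to a human

print(handle_budget_suggestion(100.0, 103.0))   # ('auto_applied', 103.0)
print(handle_budget_suggestion(100.0, 250.0))   # ('needs_approval', 100.0)
```

The key property is that the default path is inert: a suggestion that fails the check changes nothing until someone approves it.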
Evaluation loops instead of trusting first output
The system uses vision-based review and retry loops for generated creative. Generative systems often need a quality layer around them, especially when brand consistency is important.
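The shape of such a loop can be sketched as follows, where `generate` and `score` stand in for the real image-generation and vision-review calls:

```python
def generate_with_review(generate, score, threshold=0.8, max_tries=3):
    """Generate, score, and retry until quality clears the bar,
    falling back to the best attempt if retries are exhausted."""
    best, best_score = None, -1.0
    for attempt in range(max_tries):
        asset = generate(attempt)
        s = score(asset)
        if s >= threshold:
            return asset, s
        if s > best_score:
            best, best_score = asset, s
    return best, best_score  # best effort after exhausting retries

# Toy stand-ins: quality improves with each attempt.
asset, s = generate_with_review(
    lambda i: f"img_{i}",
    lambda a: 0.5 + 0.2 * int(a[-1]),
)
print(asset, s)
```

Keeping the loop outside the generator means the quality bar, retry budget, and fallback policy stay visible and tunable rather than buried in a prompt.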
Observability and health checks as first-class features
The system was designed to be operable, not just clever. That meant adding status visibility, health checks, logging, alerts, and durable records of what happened.
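A heartbeat-style health check is one small example of this, assuming each component writes a last-seen timestamp into shared state (names are illustrative):

```python
import time

def health(heartbeats: dict, now: float, stale_after: float = 120.0):
    """Classify each component as ok or stale based on its last heartbeat."""
    return {name: ("ok" if now - ts <= stale_after else "stale")
            for name, ts in heartbeats.items()}

now = time.time()
print(health({"worker": now - 10, "scheduler": now - 600}, now))
# {'worker': 'ok', 'scheduler': 'stale'}
```

A check this simple is enough to turn "the scheduler silently stopped" from a mystery into an alert.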
What mattered most
- Model calls are only one part of the product
- Workflows need observability and control
- Side effects should be explicit and inspectable
- Human oversight matters for real-world operations
- Evaluation is not optional when quality matters
- Practical architecture usually beats magical architecture
What I'd improve next
- Richer observability and evaluation dashboards
- Better operator UX around approvals and intervention
- Stronger performance measurement on workflow outcomes
- Broader channel coverage beyond the current scope
- More explicit testing/evaluation harnesses for agent quality over time
“The biggest lesson from this project is that useful agent systems are designed as operational products, not prompt demos.”