Obsidian-Agent-Post: Voice-Consistent AI Content Pipeline
The Challenge
AI-generated content has a credibility problem: it sounds generic, lacks personal voice, and fails silently on quality. Most AI content tools optimize for speed, not substance. The result is 'AI slop' -- technically correct but indistinguishable from any other LLM output. I needed a system that could generate content across 6 channels (LinkedIn, Blog, Twitter/X, Substack, Medium, GitHub) while maintaining a consistent personal voice, scoring quality across multiple dimensions, and catching issues that single-pass LLM review misses.
The Approach
I built a 5-layer AI content pipeline using a multi-agent development framework with strict contracts:
(1) Multi-Source Research -- 10 parallel sources (HackerNews, Reddit, ArXiv, GitHub, YouTube, RSS, Obsidian vault, Brave, newspaper4k, Google Drive) with semantic deduplication and conflict detection, completing in under 90 seconds.
(2) CQS Scoring Engine -- a 6-dimension quality rubric (Clarity, Quality, Substance, Voice, Structure, Impact) that combines hard gates (voice match of at least 70%, word count compliance) with soft scoring, calibrated against published content.
(3) Voice Consistency -- 31 checkpoints across 4 categories (Core Voice 9, Theoretical Content 10, Practical Content 11, Optional 2) ensuring every post sounds like me, not generic AI.
(4) Karpathy 3-Loop Validation -- 4 rounds with 11 independent audit agents performing structural, adversarial, and logical validation. Result: 80 verified findings, 0 false positives.
(5) Multi-Channel Generation -- a single research topic adapted to 6 channel-specific formats (LinkedIn 400-word hooks, Blog 1500-word deep-dives, Twitter threads, Substack newsletters, Medium articles, GitHub READMEs).
Tech: FastAPI + SQLModel + PostgreSQL + Next.js 14.
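To make the split between hard gates and soft scoring in layer (2) concrete, here is a minimal sketch. It is not the pipeline's actual code. The 70% voice-match gate and the LinkedIn and Blog word targets come from the description above. The equal weighting of the six dimensions, the 0-100 scale, the 20% word-count tolerance, the 75-point pass mark, and all names (`Draft`, `hard_gate_failures`, `cqs`, `evaluate`) are assumptions made for illustration.

```python
from dataclasses import dataclass

# The six CQS dimensions named in the text.
DIMENSIONS = ("clarity", "quality", "substance", "voice", "structure", "impact")

# Word targets come from the text; the tolerance is an assumption.
WORD_TARGETS = {"linkedin": 400, "blog": 1500}
WORD_TOLERANCE = 0.20
VOICE_MATCH_MIN = 0.70  # hard gate from the text (voice match >= 70%)


@dataclass
class Draft:
    channel: str
    word_count: int
    voice_match: float  # fraction in [0.0, 1.0]
    scores: dict        # dimension name -> score on an assumed 0-100 scale


def hard_gate_failures(draft: Draft) -> list:
    """Return the names of the hard gates this draft fails."""
    failures = []
    if draft.voice_match < VOICE_MATCH_MIN:
        failures.append("voice_match")
    target = WORD_TARGETS.get(draft.channel)
    if target is not None and abs(draft.word_count - target) > WORD_TOLERANCE * target:
        failures.append("word_count")
    return failures


def cqs(draft: Draft) -> float:
    """Soft score: unweighted mean of the six dimensions (an assumption)."""
    return sum(draft.scores[d] for d in DIMENSIONS) / len(DIMENSIONS)


def evaluate(draft: Draft, pass_score: float = 75.0) -> dict:
    """A draft passes only if no hard gate fails and the soft score clears the bar."""
    failures = hard_gate_failures(draft)
    score = cqs(draft)
    return {
        "score": score,
        "gate_failures": failures,
        "passed": not failures and score >= pass_score,
    }
```

In this sketch a hard-gate failure rejects the draft regardless of its soft score, so a draft with a high average but a voice match of 65% still fails.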
Key Learnings
- Voice consistency cannot be LLM self-corrected -- explicit checkpoints (31 rules across 4 categories) prevent 90% of generic AI output
- Multi-dimensional scoring (CQS with 6 dimensions) catches quality issues that single-metric systems and LLM self-review miss entirely
- Karpathy multi-round validation (4 rounds, 11 independent agents) surfaces issues that single-pass review cannot detect -- convergence pattern: 15, 33, 18, and 14 findings per round (80 in total)
- FAILURE: Initial single-pass generation averaged a CQS of 69.1, with unacceptable quality variance. Adding 3-loop validation with hard gates brought scores to 75+ consistently.
- Dogfooding proof: every article on sathyan.ai was produced by this pipeline -- the content IS the evidence that the system works
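The per-round finding counts in the list above can be checked with a few lines of arithmetic. The counts are the ones reported in this case study. The stopping rule in `is_converging` (finding counts falling for two consecutive rounds) is an assumption for illustration, not the rule the pipeline uses.

```python
# Findings per validation round, as reported in the text.
FINDINGS_PER_ROUND = [15, 33, 18, 14]


def round_deltas(counts):
    """Change in finding count from each round to the next."""
    return [b - a for a, b in zip(counts, counts[1:])]


def is_converging(counts, window=2):
    """Hypothetical rule: the last `window` round-to-round changes are all negative."""
    deltas = round_deltas(counts)
    return len(deltas) >= window and all(d < 0 for d in deltas[-window:])


total = sum(FINDINGS_PER_ROUND)          # 80, matching the verified findings total
deltas = round_deltas(FINDINGS_PER_ROUND)  # [18, -15, -4]
print(total, deltas, is_converging(FINDINGS_PER_ROUND))
```

The counts peak in round 2 and fall in rounds 3 and 4, which is the shape the assumed rule treats as convergence.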
“Every article on this site was produced by the system described in this case study. The content you are reading IS the proof that the system works.”
— Self-referential proof -- dogfooding