AI Coaches - Creator Monetization

My Role

I led the end-to-end design, development, and deployment of AI Coaches — a creator monetization platform built as a Whop App.

My responsibilities included product design, full-stack engineering, and AI architecture. I built the multi-format knowledge processing pipeline (PDFs, YouTube, websites, images with OCR), the RAG-powered chat engine with Google Vertex AI and Pinecone, and the real-time voice input system using Deepgram WebSocket streaming.

I also designed the monetization layer with Whop API integration for subscription and one-time payment handling, built a 3-tier context caching system (in-memory → Redis → PostgreSQL), and implemented group chat, multi-coach conversations, theming, and comprehensive creator analytics.

What is AI Coaches

AI Coaches is a creator monetization platform — think "NotebookLM meets Gumroad" — that enables creators to transform their knowledge into AI-powered coaches their audience pays to interact with.

Creators upload their content (documents, videos, websites) and the platform builds intelligent AI assistants that understand and communicate their specific expertise. Each coach supports customizable personas, flexible pricing models (free, subscription, one-time), voice input, file attachments, and full theming.

Built as a Whop App, it integrates directly into Whop's creator ecosystem for seamless monetization, with features including group chats, multi-coach conversations, BYOK (bring your own API key), and detailed revenue and engagement analytics.

1. The Problem

Content creators have valuable knowledge but limited ways to provide 24/7 interactive support to their audience. Building custom AI assistants traditionally requires significant technical expertise.

Monetizing AI-powered services requires separate infrastructure for payments, hosting, and scaling — creating barriers that prevent most creators from offering AI-driven experiences.

a. User Challenges: Creators struggle to scale their expertise beyond live sessions and pre-recorded content. Their audience wants on-demand, personalized answers but creators can't be available 24/7. Existing AI tools are generic and don't reflect the creator's unique knowledge and voice.

b. Business Challenges: Building a multi-tenant platform that processes diverse knowledge formats into searchable vector embeddings, delivers real-time AI chat with source citations, handles payment flows through Whop's ecosystem, and scales cost-effectively with per-creator usage tracking and quota management.


Course Creator Persona

Alex, an online course creator with 50,000+ students, wants to offer a premium AI coach trained on his course material that students can chat with for instant answers — generating recurring revenue while reducing his support workload.


Community Leader Persona

Maria, a Whop community owner running a fitness coaching group, needs an AI coach that answers nutrition and workout questions based on her published guides and videos — available 24/7 to her paying members without her manual involvement.


Frontend

Built with Next.js 15 and TypeScript, featuring Tailwind CSS 4, Radix UI components, Recharts for analytics, SiriWave for voice visualization, and dnd-kit for drag-and-drop interactions.

Backend

Next.js API routes (serverless) with Prisma ORM, handling chat streaming via SSE, knowledge processing, and Whop webhook integration for payment events.
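The chat endpoint streams tokens to the client as server-sent events. A minimal sketch of such a helper follows; `sseResponse` and its token source are hypothetical names, but the `data:`-line framing is standard SSE and the `Response`/`ReadableStream` Web APIs are available in Next.js route handlers:

```typescript
// Minimal SSE sketch: each token is written as a `data:` line so the
// client can render it as it arrives. The AsyncIterable token source
// stands in for the real LLM stream.
export function sseResponse(tokens: AsyncIterable<string>): Response {
  const encoder = new TextEncoder();
  const stream = new ReadableStream<Uint8Array>({
    async start(controller) {
      for await (const token of tokens) {
        controller.enqueue(encoder.encode(`data: ${JSON.stringify(token)}\n\n`));
      }
      controller.enqueue(encoder.encode("data: [DONE]\n\n"));
      controller.close();
    },
  });
  return new Response(stream, {
    headers: {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache",
    },
  });
}
```

A route handler would return this directly from `GET`/`POST`, letting the client consume it with `EventSource` or a streaming fetch reader.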

Database

PostgreSQL via Supabase with Prisma ORM for coaches, conversations, messages, knowledge sources, processing jobs, analytics, and payment tracking across 15 interconnected models.

AI Integration

Google Vertex AI (Gemini 2.5 Flash Lite) as primary LLM with BYOK support for OpenAI, Anthropic, and Google models. Vertex AI Text Embeddings for vector generation and Pinecone for semantic search.

Voice & OCR

Deepgram WebSocket API for real-time voice transcription with SiriWave audio visualization. AWS Textract for OCR on uploaded images (PNG, JPG, TIFF).

Monetization

Whop API integration (GraphQL + REST) for automated product/plan creation, subscription management, webhook-driven access control, and per-member message quota tracking.

Infrastructure

AWS S3 for file storage with presigned URLs, Redis/Valkey (Upstash) for warm caching, AWS Lambda for async processing, Sentry for error monitoring, and PostHog for product analytics.

a. Multi-Format Knowledge Processing Pipeline

  1. Built a staged processing pipeline (Queued → Extracting → Chunking → Embedding → Storing → Completed) with support for PDFs, DOCX, TXT, images (OCR via Textract), YouTube transcripts, and web scraping.
  2. Implemented smart chunking (1000 chars with 100 char overlap), batch embedding generation via Google Vertex AI, and Pinecone storage with per-coach namespace isolation.
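The fixed-size chunking step above (1000-character windows, 100-character overlap) can be sketched as follows; `chunkText` is an illustrative name, not the actual module:

```typescript
// Sliding-window chunker: each chunk is up to `chunkSize` characters,
// and consecutive chunks share `overlap` characters so sentence
// boundaries are less likely to be lost at chunk edges.
export function chunkText(
  text: string,
  chunkSize = 1000,
  overlap = 100,
): string[] {
  const chunks: string[] = [];
  const step = chunkSize - overlap; // advance 900 chars per window
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // final window reached the end
  }
  return chunks;
}
```

Each chunk is then embedded in batches and upserted into the coach's Pinecone namespace.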

b. RAG Chat Engine with 3-Tier Caching

  1. Designed a context-aware chat system with query rewriting, deep inquiry mode for vague questions, and semantic search with source citations and page numbers.
  2. Built a 3-tier caching architecture: hot cache (in-memory, <1ms) → warm cache (Redis, ~5ms) → cold storage (PostgreSQL, ~50ms) with FIFO compaction for token budget management.
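A minimal sketch of the hot → warm → cold lookup order, with `warm` and `cold` as stand-ins for the Redis and PostgreSQL tiers (the real implementation also refills the warm tier and applies FIFO compaction):

```typescript
type Fetcher = (key: string) => Promise<string | null>;

const hotCache = new Map<string, string>(); // per-instance, <1ms

export async function getContext(
  key: string,
  warm: Fetcher, // Redis lookup, ~5ms
  cold: Fetcher, // PostgreSQL lookup, ~50ms
): Promise<string | null> {
  const hot = hotCache.get(key);
  if (hot !== undefined) return hot;

  const fromWarm = await warm(key);
  if (fromWarm !== null) {
    hotCache.set(key, fromWarm); // promote to the hot tier
    return fromWarm;
  }

  const fromCold = await cold(key);
  if (fromCold !== null) {
    hotCache.set(key, fromCold); // in the real system, Redis is refilled too
  }
  return fromCold;
}
```

Promotion on read means a returning conversation pays the cold-storage cost at most once per instance lifetime.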

c. Creator Monetization & Whop Integration

  1. Integrated Whop API for automated product/plan creation, subscription lifecycle management, and webhook-driven access control with per-member message quotas.
  2. Built flexible pricing models (free, weekly/monthly/yearly subscriptions, one-time purchases) with standard and extended message tiers and bonus allocations.
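The per-member quota logic implied above might look like this; the field names are assumptions for illustration, not the real Prisma schema:

```typescript
// A member may send messages while usage stays under the tier allowance
// plus any bonus allocation. Webhook events reset `used` on renewal.
interface MemberQuota {
  used: number;
  included: number; // standard or extended tier allowance
  bonus: number;    // promotional bonus messages
}

export function canSendMessage(q: MemberQuota): boolean {
  return q.used < q.included + q.bonus;
}

export function consumeMessage(q: MemberQuota): MemberQuota {
  if (!canSendMessage(q)) throw new Error("Message quota exhausted");
  return { ...q, used: q.used + 1 }; // immutable update, persisted by the caller
}
```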

d. Advanced Chat Features

  1. Implemented real-time voice input via Deepgram WebSocket with SiriWave visualization, file attachments mid-conversation with S3 storage, and optimistic UI with background sync.
  2. Built group chat with message reactions, replies, @mentions, and multi-coach conversations where users can route messages to specific coaches.
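Routing a message to a specific coach via @mention can be sketched as below; `routeMessage` and its signature are hypothetical, assuming a leading mention selects the target and anything else falls back to the conversation's default coach:

```typescript
// If the message starts with "@coach-name", strip the mention and route
// to that coach; unknown mentions and plain messages go to the default.
export function routeMessage(
  text: string,
  coachNames: string[],
  defaultCoach: string,
): { coach: string; body: string } {
  const match = text.match(/^@([\w-]+)\s+(.*)$/s);
  if (match && coachNames.includes(match[1])) {
    return { coach: match[1], body: match[2] };
  }
  return { coach: defaultCoach, body: text };
}
```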

e. Performance & Scalability

To deliver a scalable, real-time creator platform:
  1. Designed a 3-tier context caching system (in-memory → Redis → PostgreSQL) reducing average response latency by over 90% for returning conversations.
  2. Implemented parallel processing for RAG retrieval, query rewriting, and cache warmup to minimize time-to-first-token.
  3. Used optimistic UI patterns for instant message display with background synchronization and automatic rollback on failure.
  4. Built idempotency guards via Redis-backed keys to prevent duplicate knowledge processing and message delivery.
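The Redis-backed idempotency guard (item 4) can be sketched with a SET-if-not-exists claim: the first caller atomically claims the key and runs the job, duplicates see the key and skip. The in-memory store here stands in for Upstash, and the key format is illustrative:

```typescript
interface RedisLike {
  // Mirrors Redis `SET key value NX EX ttl`: returns true only if
  // the key did not already exist.
  setnx(key: string, ttlSeconds: number): Promise<boolean>;
}

export function memoryRedis(): RedisLike {
  const keys = new Set<string>();
  return {
    async setnx(key) {
      if (keys.has(key)) return false; // duplicate: key already claimed
      keys.add(key); // TTL expiry omitted in this sketch
      return true;
    },
  };
}

export async function runOnce(
  redis: RedisLike,
  jobId: string,
  job: () => Promise<void>,
): Promise<boolean> {
  const claimed = await redis.setnx(`job:${jobId}`, 3600);
  if (!claimed) return false; // another worker already ran (or is running) it
  await job();
  return true;
}
```

The TTL bounds how long a crashed worker can block a retry, trading a small duplicate-risk window for liveness.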