YouTube Chapter Generator
A creator-first tool that transforms raw speech into accurate, audience-friendly YouTube chapters using AI-powered transcription and analysis.
Overview
I designed and built the product end to end. It combines multi-language transcription, AI-driven analysis, and human-in-the-loop editing so that the final chapter structure reflects the creator's intent.
The project is open source on GitHub. Users can plug in their own ASR (automatic speech recognition) system for transcription and their preferred LLM for chapter generation, so the tool can adapt to different workflows and technical requirements.
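As a rough illustration of what swappable ASR and LLM components could look like, here is a minimal sketch. It is not the project's actual code: the name createChapterPipeline, the adapter shapes (transcribe returning timed segments, complete returning a string), and the prompt wording are all assumptions made for this example.

```javascript
// Hypothetical sketch; names and adapter shapes are assumptions, not the project's API.
// transcribe: async (audio) => [{ start, end, text }]  (times in seconds)
// complete:   async (prompt) => string                 (raw model response)
function createChapterPipeline({ transcribe, complete }) {
  if (typeof transcribe !== "function" || typeof complete !== "function") {
    throw new TypeError("transcribe and complete must both be functions");
  }

  return async function generateChapters(audio) {
    const segments = await transcribe(audio);

    // Prefix each segment with its start time so the model can refer to timestamps.
    const transcript = segments
      .map((s) => `[${s.start.toFixed(1)}s] ${s.text}`)
      .join("\n");

    const prompt =
      "Split this transcript into chapters. " +
      'Reply with JSON only: [{"start": <seconds>, "title": "<title>"}]\n\n' +
      transcript;

    // The model is assumed to return a bare JSON array; real responses may need cleanup.
    const raw = await complete(prompt);
    return JSON.parse(raw);
  };
}
```

Because the pipeline only depends on two functions, a different speech-to-text service or model can be substituted by passing a different adapter, and stub adapters can stand in for both during testing.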
Multi-language Audio Input
The spoken language is detected automatically from any supported language and the speech is converted to text, so creators can work across multilingual content without selecting a language manually or pre-processing the audio.
The system is designed to handle accents, mixed-language content, and varying audio quality, and the resulting transcript serves as the foundation for accurate chapter generation.
AI-Powered Chapter Generation
Gemini analyzes the recording to identify natural breakpoints, topic shifts, and narrative flow. The resulting chapter markers are meaningful and clearly titled because they follow the actual structure of the content rather than relying on simple timestamp detection.
The AI considers context, pacing, and viewer expectations to produce chapters that enhance discoverability while respecting the creator's narrative intent.
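To make the end product concrete, the sketch below turns a list of chapters into the timestamp lines that go into a YouTube video description. The function names and the { start, title } chapter shape are assumptions for illustration. The checks reflect the requirements documented in YouTube Help at the time of writing: the first chapter starts at 0:00, there are at least three chapters in ascending order, and each chapter is at least 10 seconds long.

```javascript
// Hypothetical helpers; the { start, title } shape is an assumption for this sketch.

// Format seconds as M:SS, or H:MM:SS once the video passes one hour.
function formatTimestamp(totalSeconds) {
  const s = Math.floor(totalSeconds);
  const hours = Math.floor(s / 3600);
  const minutes = Math.floor((s % 3600) / 60);
  const seconds = String(s % 60).padStart(2, "0");
  return hours > 0
    ? `${hours}:${String(minutes).padStart(2, "0")}:${seconds}`
    : `${minutes}:${seconds}`;
}

// Validate chapters against YouTube's documented rules and return description text.
function toYouTubeChapters(chapters) {
  const sorted = [...chapters].sort((a, b) => a.start - b.start);

  if (sorted.length < 3) {
    throw new Error("YouTube requires at least three chapters");
  }
  if (Math.floor(sorted[0].start) !== 0) {
    throw new Error("the first chapter must start at 0:00");
  }
  for (let i = 1; i < sorted.length; i++) {
    if (sorted[i].start - sorted[i - 1].start < 10) {
      throw new Error(`chapter "${sorted[i - 1].title}" is shorter than 10 seconds`);
    }
  }

  return sorted.map((c) => `${formatTimestamp(c.start)} ${c.title}`).join("\n");
}
```

The length of the final chapter depends on the video's total duration, which this sketch does not know, so that check would have to happen wherever the duration is available.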
Human-in-the-loop Transcript Editing
Before chapters are finalized, users can review and refine the AI-generated transcript in an interactive editor, correcting transcription errors and adjusting chapter boundaries as needed.
The interface is designed for speed: keyboard shortcuts, inline editing, and real-time preview make refinement feel natural rather than tedious.
Technical Stack
The frontend is built with React and modern JavaScript, creating a responsive interface designed for clean, efficient creator workflows. Component architecture prioritizes reusability and maintains consistent interaction patterns throughout the experience.
On the backend, an Express server integrates with Amazon Transcribe for speech-to-text processing. Gemini powers the analytical layer, generating structured chapter output that can be exported and used across platforms.
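One practical detail behind "structured chapter output" is that a model's reply may include extra text around the JSON it was asked for. The sketch below shows one way a backend could defensively turn such a reply into clean chapter objects. The function name parseChapterResponse and the { start, title } shape are assumptions for this example, not the project's actual schema.

```javascript
// Hypothetical parser; the response format is an assumption for this sketch.
function parseChapterResponse(raw) {
  // Tolerate prose or code fences around the array by slicing from the
  // first "[" to the last "]".
  const open = raw.indexOf("[");
  const close = raw.lastIndexOf("]");
  if (open === -1 || close <= open) {
    throw new Error("no JSON array found in model response");
  }

  const parsed = JSON.parse(raw.slice(open, close + 1));

  return parsed
    .map((item, i) => {
      const title = item && typeof item.title === "string" ? item.title.trim() : "";
      const validStart =
        item && typeof item.start === "number" && Number.isFinite(item.start) && item.start >= 0;
      if (!validStart || title === "") {
        throw new Error(`invalid chapter at index ${i}`);
      }
      return { start: item.start, title };
    })
    .sort((a, b) => a.start - b.start);
}
```

Rejecting malformed entries outright, rather than silently dropping them, keeps bad model output from reaching the export step unnoticed.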
