APP OVERVIEW
Problem
YouTube's automatic captioning and chapter tools suffer from poor accuracy and generate generic, unhelpful chapter titles that fail to capture the actual content flow. Creators have no control over the automated process, resulting in chapters that don't reflect their content strategy or audience needs.
The problem is compounded for multilingual creators: existing chapter generation tools accept only English input, while YouTube's auto-translation produces inaccurate results that hurt discoverability. This forces creators to choose between authentic content in their native language and optimized English metadata for broader reach.
Manual chapter creation remains time-intensive and impractical for regular content production, while fully automated solutions lack the nuance and accuracy that quality content demands. Creators need a solution that combines automation efficiency with human oversight and control.
Solution
YouTube Chapter Generator uses a two-step pipeline: automatic speech recognition (ASR) followed by large language model (LLM) analysis. AWS Transcribe transcribes audio in multiple languages, then Gemini analyzes the content flow of the transcript to generate contextually relevant English chapter titles with timestamps.
The human-in-the-loop design gives creators control over the final output: review and edit AI-generated transcripts, adjust chapter boundaries, and refine titles before export. This approach delivers the efficiency of automation while preserving creator agency and ensuring chapters align with content strategy and brand voice.
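As a rough illustration of how the two steps and the review stage fit together, the flow can be sketched as one function that takes its three stages as arguments. This is a minimal sketch, not the app's actual code: `runPipeline`, `transcribe`, `review`, and `generateChapters` are hypothetical names, and the stubs stand in for the real AWS Transcribe and Gemini calls.

```javascript
// Hypothetical sketch of the ASR -> review -> LLM flow.
// The three steps are injected by the caller; none of them is a real SDK call.
async function runPipeline(audio, { transcribe, review, generateChapters }) {
  // Step 1 (ASR): turn audio into timestamped transcript segments.
  const draft = await transcribe(audio);
  // Human-in-the-loop: the creator can correct the draft before analysis.
  const transcript = await review(draft);
  // Step 2 (LLM): derive chapter titles and start times from the transcript.
  const chapters = await generateChapters(transcript);
  return { transcript, chapters };
}

// Example wiring with stubbed steps (assumed shapes, for illustration only).
const stubSteps = {
  transcribe: async () => [{ start: 0, text: "hola a todos" }],
  review: async (draft) => draft, // accept the draft unchanged
  generateChapters: async (t) => [{ start: t[0].start, title: "Introduction" }],
};
runPipeline("episode.mp3", stubSteps).then((result) => console.log(result.chapters));
```

Passing the steps in as functions keeps the review stage between transcription and chapter generation, so edits to the transcript always reach the LLM.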



KEY FEATURES
Multi-language Audio Input
Accepts audio in multiple languages, detects the spoken language, and converts speech to text for any language the transcription service supports.
AI-Powered Chapter Generation
Gemini analyzes the transcript to identify natural break points, topic changes, and content flow, then generates timestamped chapter markers with descriptive titles.
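Once chapters exist, they still have to be written into the video description in the form YouTube recognizes. The sketch below shows one way that export step might look. It assumes a hypothetical chapter shape of `{ start, title }` with `start` in seconds, and it checks the commonly documented YouTube requirements (first chapter at zero, at least three chapters, each at least 10 seconds long). The function names are illustrative, not taken from the app.

```javascript
// Format seconds as M:SS or H:MM:SS, the timestamp style used in descriptions.
function formatTimestamp(totalSeconds) {
  const s = Math.floor(totalSeconds);
  const hours = Math.floor(s / 3600);
  const minutes = Math.floor((s % 3600) / 60);
  const seconds = String(s % 60).padStart(2, "0");
  return hours > 0
    ? `${hours}:${String(minutes).padStart(2, "0")}:${seconds}`
    : `${minutes}:${seconds}`;
}

// Turn [{ start, title }] (assumed shape) into one description line per chapter.
function toDescriptionChapters(chapters) {
  const sorted = [...chapters].sort((a, b) => a.start - b.start);
  if (sorted.length < 3) throw new Error("At least three chapters are required");
  if (sorted[0].start !== 0) throw new Error("The first chapter must start at 0:00");
  for (let i = 1; i < sorted.length; i++) {
    // The gap between consecutive start times is the earlier chapter's length.
    if (sorted[i].start - sorted[i - 1].start < 10) {
      throw new Error(`Chapter "${sorted[i - 1].title}" is shorter than 10 seconds`);
    }
  }
  return sorted.map((c) => `${formatTimestamp(c.start)} ${c.title}`).join("\n");
}
```

Validating before export means a boundary the creator dragged too close to its neighbor is caught in the editor, not after upload.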
Human-in-the-loop Transcript Editing
Interactive transcript editing interface allowing users to review, correct, and refine AI-generated transcripts before chapter generation for maximum accuracy.
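To make the editing step concrete, here is a minimal sketch of how creator corrections might be merged into the draft transcript before it is sent for chapter generation. The segment shape `{ start, text }`, the `Map` of corrections keyed by segment index, and both function names are assumptions made for illustration. The real ASR output and editor state may look different.

```javascript
// Apply creator corrections (Map of segment index -> corrected text)
// without mutating the original draft, so the edit can be undone.
function applyCorrections(segments, corrections) {
  return segments.map((segment, index) =>
    corrections.has(index)
      ? { ...segment, text: corrections.get(index).trim() }
      : segment
  );
}

// Render the reviewed transcript as "[seconds] text" lines, one per segment,
// so timestamps stay attached to the text the LLM reads.
function toPromptTranscript(segments) {
  return segments.map((s) => `[${s.start.toFixed(1)}s] ${s.text}`).join("\n");
}
```

Keeping the start time on every line is what lets the later step place chapter boundaries at real positions in the audio.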
TECHNICAL STACK
FRONTEND
Built with React and modern JavaScript, featuring a responsive design and an intuitive user interface for audio processing and chapter management workflows.
BACKEND
Express.js server with AWS Transcribe integration for accurate speech-to-text conversion and Gemini AI for intelligent content analysis and chapter generation.
AI & CLOUD
Leverages AWS Transcribe for multi-language audio processing and Google's Gemini AI for advanced natural language understanding and chapter optimization.
NEXT STEPS
Looking ahead, there are two promising directions for expanding the YouTube Chapter Generator:
- Open-Source Version
  - Allow users to choose their own ASR system for transcription (AWS Transcribe, Whisper, etc.).
  - Let them pair it with their preferred LLM (Gemini, OpenAI models, local models, etc.) for chapter generation.
  - Distribute with a simple CLI or Docker setup so it's easy to run locally.
- Native macOS App
  - Build a desktop app that runs the two-step process (ASR → LLM) directly on the machine.
  - Integrate chapter generation into the YouTube upload workflow, so chapters attach automatically.
  - Simplify the publishing process for creators who upload frequently from their Macs.
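For the open-source direction, letting users swap ASR and LLM backends suggests some kind of provider registry. The sketch below is one possible shape, not a committed design: `createRegistry`, the provider names, and the stubbed `transcribe` method are all hypothetical, and no real SDK is called.

```javascript
// Hypothetical registry: provider names map to factory functions.
function createRegistry(kind) {
  const providers = new Map();
  return {
    register(name, factory) {
      providers.set(name, factory);
    },
    create(name, options = {}) {
      const factory = providers.get(name);
      if (!factory) {
        const known = [...providers.keys()].join(", ");
        throw new Error(`Unknown ${kind} provider "${name}". Available: ${known}`);
      }
      return factory(options);
    },
  };
}

// Example: register a stub ASR backend and select it by name,
// as a CLI flag or config file entry might.
const asrProviders = createRegistry("ASR");
asrProviders.register("stub-asr", (options) => ({
  language: options.language ?? "auto",
  transcribe: async () => [{ start: 0, text: "stub transcript" }],
}));
const selectedAsr = asrProviders.create("stub-asr", { language: "es" });
```

A second registry created with `createRegistry("LLM")` could hold chapter-generation backends in the same way, so the two choices stay independent.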