Overview
A creator-first tool that turns raw speech into accurate, audience-friendly YouTube chapters. It combines multi-language transcription, AI analysis, and human-in-the-loop editing so that the chapter structure matches what the creator intended.
Built to address concrete workflow pain points: control over chapter titles and boundaries, reliable timestamps, and export-ready chapters that can improve discoverability and viewer retention.
Client
Personal Project
Key Features
Multi-language Audio Input
Accepts audio in multiple languages, with automatic language detection, and converts the speech to text in whichever supported language it was recorded in.
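A minimal sketch of how language detection could be requested from AWS Transcribe, assuming the backend uses the batch StartTranscriptionJob API (the project description does not say which Transcribe API it calls). The interface is a hand-written subset of that API's request fields, and `buildTranscriptionRequest` and its options are hypothetical names. When no language is known, the request sets `IdentifyLanguage` instead of `LanguageCode`.

```typescript
// Hand-written subset of StartTranscriptionJob request fields (assumption:
// the real type comes from @aws-sdk/client-transcribe, not imported here).
interface TranscriptionJobRequest {
  TranscriptionJobName: string;
  Media: { MediaFileUri: string };
  LanguageCode?: string;
  IdentifyLanguage?: boolean;
  LanguageOptions?: string[];
  OutputBucketName?: string;
}

// Hypothetical helper: pin the language when it is known, otherwise ask
// Transcribe to identify it, optionally among a list of likely candidates.
function buildTranscriptionRequest(
  jobName: string,
  mediaUri: string,
  opts: { languageCode?: string; candidateLanguages?: string[]; outputBucket?: string } = {},
): TranscriptionJobRequest {
  const request: TranscriptionJobRequest = {
    TranscriptionJobName: jobName,
    Media: { MediaFileUri: mediaUri },
  };
  const candidates = opts.candidateLanguages ?? [];
  if (opts.languageCode) {
    // The creator chose a language: skip identification entirely.
    request.LanguageCode = opts.languageCode;
  } else if (candidates.length === 1) {
    // A single candidate is effectively a known language, so pin it.
    request.LanguageCode = candidates[0];
  } else {
    // Unknown language: let Transcribe identify it.
    request.IdentifyLanguage = true;
    // LanguageOptions is meant for two or more candidate languages.
    if (candidates.length >= 2) request.LanguageOptions = [...candidates];
  }
  if (opts.outputBucket) request.OutputBucketName = opts.outputBucket;
  return request;
}
```

In a real backend the returned object would be passed to `StartTranscriptionJobCommand` from `@aws-sdk/client-transcribe`; it is kept SDK-free here so the sketch runs on its own.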
AI-Powered Chapter Generation
Uses Gemini to analyze the transcript for natural break points, topic changes, and content flow, then generates chapter markers with descriptive titles.
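The chapter step can be sketched as a prompt plus a defensive parser. This assumes the transcript is available as timed segments and that Gemini is asked to reply with a JSON array of `{start, title}` objects; the prompt wording, the JSON shape, and the function names are hypothetical, and the actual Gemini API call is omitted. Snapping each proposed start to the nearest segment start is one way to keep model-suggested timestamps on real sentence boundaries.

```typescript
// Hypothetical shapes: all times in seconds.
interface Segment { start: number; end: number; text: string }
interface Chapter { start: number; title: string }

// Render the transcript as "[123s] text" lines so the model can cite start times.
function buildChapterPrompt(segments: Segment[]): string {
  const lines = segments.map((s) => `[${Math.floor(s.start)}s] ${s.text}`);
  return [
    "Split this transcript into YouTube chapters at natural topic changes.",
    'Reply with JSON only: an array of {"start": <seconds>, "title": "<short title>"}.',
    "",
    ...lines,
  ].join("\n");
}

// Parse the model reply, drop malformed entries, and snap each start time
// to the nearest segment start so no chapter begins mid-sentence.
function parseChapters(reply: string, segments: Segment[]): Chapter[] {
  if (segments.length === 0) return [];
  // Strip an optional Markdown code fence around the JSON.
  const cleaned = reply.trim().replace(/^```(?:json)?\s*/, "").replace(/\s*```$/, "");
  const raw: unknown = JSON.parse(cleaned);
  if (!Array.isArray(raw)) throw new Error("Expected a JSON array of chapters");
  const starts = segments.map((s) => s.start);
  const chapters: Chapter[] = [];
  for (const item of raw) {
    const start = Number(item?.start);
    const title = String(item?.title ?? "").trim();
    if (!Number.isFinite(start) || title === "") continue; // malformed entry
    const snapped = starts.reduce((best, s) =>
      Math.abs(s - start) < Math.abs(best - start) ? s : best,
    );
    chapters.push({ start: snapped, title });
  }
  // Sort by time and drop entries that snapped onto the same segment
  // (the stable sort keeps the one listed first in the reply).
  chapters.sort((a, b) => a.start - b.start);
  return chapters.filter((c, i) => i === 0 || c.start !== chapters[i - 1].start);
}
```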
Human-in-the-loop Transcript Editing
An interactive editing interface lets users review, correct, and refine the automatically generated transcript before chapter generation, so the chapters are built from accurate text.
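One way to keep edits safe is to let the editor change a segment's text while leaving the ASR timestamps untouched, so chapters generated afterwards still point at the right moments. This sketch assumes a hypothetical `id`/`start`/`end`/`text` segment shape; `applyCorrection` and `mergeWithNext` are illustrative helpers, and both return new arrays instead of mutating, which suits React state updates.

```typescript
// Hypothetical segment shape: times in seconds from the ASR output.
interface Segment {
  id: number;
  start: number;
  end: number;
  text: string;
}

// Replace one segment's text but keep its timestamps; returns a new array.
function applyCorrection(segments: readonly Segment[], id: number, newText: string): Segment[] {
  const text = newText.trim();
  return segments.map((seg) => (seg.id === id ? { ...seg, text } : seg));
}

// Join a segment with the one after it (e.g. ASR split one sentence in two).
// The merged segment spans both time ranges and keeps the first id.
function mergeWithNext(segments: readonly Segment[], id: number): Segment[] {
  const i = segments.findIndex((seg) => seg.id === id);
  if (i === -1 || i === segments.length - 1) return [...segments]; // nothing to merge
  const first = segments[i];
  const second = segments[i + 1];
  const merged: Segment = {
    id: first.id,
    start: first.start,
    end: second.end,
    text: `${first.text} ${second.text}`,
  };
  return [...segments.slice(0, i), merged, ...segments.slice(i + 2)];
}
```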
Technical Stack
Frontend
React + modern JS; responsive UI for clean workflows.
Backend
Express with AWS Transcribe; reliable speech-to-text.
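AWS Transcribe's batch output lists recognized words in `results.items`, with `start_time` and `end_time` as strings and punctuation as separate items that carry no timestamps. A minimal sketch of turning that into sentence-level segments for the editor, assuming sentence-ending punctuation is a good enough split point; the interfaces cover only the fields used here, and `toSegments` is a hypothetical name.

```typescript
// Subset of the AWS Transcribe batch output used here (times are strings in the JSON).
interface TranscribeItem {
  type: "pronunciation" | "punctuation";
  start_time?: string;
  end_time?: string;
  alternatives: { content: string; confidence?: string }[];
}
interface TranscribeOutput { results: { items: TranscribeItem[] } }
interface Segment { start: number; end: number; text: string }

// Group words into sentence-level segments, closing a segment at . ? or !
function toSegments(output: TranscribeOutput): Segment[] {
  const segments: Segment[] = [];
  let current: Segment | null = null;
  for (const item of output.results.items) {
    const content = item.alternatives[0]?.content ?? "";
    if (item.type === "pronunciation") {
      const start = Number(item.start_time);
      const end = Number(item.end_time);
      if (current === null) {
        current = { start, end, text: content };
      } else {
        current.end = end;
        current.text += ` ${content}`;
      }
    } else if (current !== null) {
      // Punctuation has no timestamps; attach it to the preceding word.
      current.text += content;
      if (/[.?!]/.test(content)) {
        segments.push(current);
        current = null;
      }
    }
  }
  if (current !== null) segments.push(current); // trailing words without final punctuation
  return segments;
}
```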
AI & Cloud
Gemini AI for analysis; exportable chapter output.
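For YouTube, exportable output typically means text pasted into the video description. YouTube's chapter rules require the first timestamp to be 00:00, at least three timestamps in ascending order, and chapters of at least 10 seconds. A sketch that checks those rules before formatting, assuming chapter starts are stored in seconds; `toYouTubeDescription` is a hypothetical helper.

```typescript
// Hypothetical chapter shape: start time in seconds.
interface Chapter { start: number; title: string }

// Format seconds as MM:SS, or H:MM:SS once past the one-hour mark.
function formatTimestamp(seconds: number): string {
  const total = Math.floor(seconds);
  const h = Math.floor(total / 3600);
  const mm = String(Math.floor((total % 3600) / 60)).padStart(2, "0");
  const ss = String(total % 60).padStart(2, "0");
  return h > 0 ? `${h}:${mm}:${ss}` : `${mm}:${ss}`;
}

// Check YouTube's chapter rules, then emit one "timestamp title" line per chapter.
function toYouTubeDescription(chapters: Chapter[], videoDuration: number): string {
  const sorted = [...chapters].sort((a, b) => a.start - b.start);
  if (sorted.length < 3) throw new Error("YouTube needs at least three chapters");
  if (sorted[0].start !== 0) throw new Error("The first chapter must start at 00:00");
  sorted.forEach((chapter, i) => {
    // A chapter ends where the next one starts, or at the end of the video.
    const end = i + 1 < sorted.length ? sorted[i + 1].start : videoDuration;
    if (end - chapter.start < 10) {
      throw new Error(`Chapter "${chapter.title}" is shorter than 10 seconds`);
    }
  });
  return sorted.map((c) => `${formatTimestamp(c.start)} ${c.title}`).join("\n");
}
```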
Next Steps
Looking ahead, there are two promising directions for expanding the YouTube Chapter Generator:
- Open-Source Version
  - Allow users to choose their own ASR system for transcription (AWS Transcribe, Whisper, etc.).
  - Let them pair it with their preferred LLM (Gemini, OpenAI models, local models, etc.) for chapter generation.
  - Distribute it with a simple CLI or Docker setup so it's easy to run locally.
- Native macOS App
  - Build a desktop app that runs the two-step process (ASR → LLM) directly on the machine.
  - Integrate chapter generation into the YouTube upload workflow so chapters are attached automatically.
  - Simplify publishing for creators who upload frequently from their Macs.
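The pluggable ASR and LLM design in both directions could rest on two small interfaces, with each backend implemented behind one of them. A sketch under that assumption: `AsrProvider`, `ChapterModel`, and `runPipeline` are hypothetical names rather than an existing API, and the optional `review` callback stands in for the human-in-the-loop editing step between the two stages.

```typescript
// Hypothetical shapes shared by all providers: times in seconds.
interface Segment { start: number; end: number; text: string }
interface Chapter { start: number; title: string }

// Any speech-to-text backend (AWS Transcribe, Whisper, ...) would implement this.
interface AsrProvider {
  name: string;
  transcribe(audioPath: string): Promise<Segment[]>;
}

// Any chapter-writing model (Gemini, an OpenAI model, a local model, ...) would implement this.
interface ChapterModel {
  name: string;
  generateChapters(segments: Segment[]): Promise<Chapter[]>;
}

// The two-step pipeline (ASR, then LLM), with an optional review step in between.
async function runPipeline(
  audioPath: string,
  asr: AsrProvider,
  llm: ChapterModel,
  review: (segments: Segment[]) => Segment[] | Promise<Segment[]> = (segs) => segs,
): Promise<Chapter[]> {
  const segments = await asr.transcribe(audioPath);
  const reviewed = await review(segments);
  return llm.generateChapters(reviewed);
}
```

With this split, a CLI or desktop app only decides which provider pair to pass in; swapping Whisper for AWS Transcribe, or a local model for Gemini, does not touch the pipeline itself.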
