assistant-skills/youtube-transcript/SKILL.md
2026-05-17 18:59:28 -05:00

name: youtube-transcript
description: Download and summarize transcripts from YouTube videos using yt-dlp. Use this skill whenever the user provides a YouTube URL and wants the transcript, a summary, or to analyze the content of a video. Also trigger when the user says "transcribe this video", "get the subtitles", "what does this video say", or "summarize this YouTube video".

YouTube Transcript Download & Summarization

Overview

This skill downloads auto-generated or manual subtitles from YouTube videos using yt-dlp, cleans them into readable plain text, and then summarizes or analyzes the content as requested.

Prerequisites

  • yt-dlp must be installed (check with which yt-dlp)
  • Python 3 is used for cleaning the VTT output

Step 1: Download the transcript

Use yt-dlp to fetch subtitles without downloading the video:

yt-dlp --write-auto-sub --sub-lang en --skip-download --sub-format vtt \
  -o "/tmp/opencode/transcript" "YOUTUBE_URL"

Flags explained:

  • --write-auto-sub: Download auto-generated subtitles (use --write-sub instead to get only manually uploaded subtitles, or pass both flags to accept either kind)
  • --sub-lang en: Request English subtitles
  • --skip-download: Don't download the video/audio
  • --sub-format vtt: Get subtitles in VTT format

Also grab the video title for context:

yt-dlp --print title "YOUTUBE_URL"
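
If the two commands above need to be driven from a script rather than typed by hand, a minimal sketch might look like the following. The function names (build_subtitle_command, build_title_command, fetch_transcript) are hypothetical helpers introduced here for illustration; only the yt-dlp flags come from this document. Passing the arguments as a list avoids shell quoting problems with URLs that contain & or ?.

```python
import subprocess


def build_subtitle_command(url, out_base="/tmp/opencode/transcript", lang="en"):
    """Return the Step 1 yt-dlp invocation as an argument list (hypothetical helper)."""
    return [
        "yt-dlp",
        "--write-auto-sub",
        "--sub-lang", lang,
        "--skip-download",
        "--sub-format", "vtt",
        "-o", out_base,
        url,
    ]


def build_title_command(url):
    """Return the yt-dlp invocation that prints only the video title."""
    return ["yt-dlp", "--print", "title", url]


def fetch_transcript(url):
    """Download the subtitles, then return the video title.

    Assumes yt-dlp is on PATH; check=True raises CalledProcessError
    if either command exits with a non-zero status.
    """
    subprocess.run(build_subtitle_command(url), check=True)
    result = subprocess.run(
        build_title_command(url), check=True, capture_output=True, text=True
    )
    return result.stdout.strip()
```

Calling fetch_transcript("YOUTUBE_URL") with a real URL would write the subtitle file next to the -o base path and return the title string.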

Step 2: Clean the VTT to plain text

The raw VTT file contains timestamps, HTML-like tags, and duplicated lines. Clean it with Python:

import html
import re

# utf-8-sig reads plain UTF-8 and also tolerates a leading byte-order mark
with open('/tmp/opencode/transcript.en.vtt', 'r', encoding='utf-8-sig') as f:
    content = f.read()

# Remove inline tags such as <c> and per-word timestamps like <00:00:01.000>
content = re.sub(r'<[^>]+>', '', content)
# Remove cue timing lines (the hour field is optional in WebVTT)
content = re.sub(r'^(?:\d{2,}:)?\d{2}:\d{2}\.\d+ --> .*$', '', content, flags=re.MULTILINE)
# Remove VTT header lines; anchoring to the line start leaves transcript
# text that merely contains "Kind:" or "Language:" untouched
content = re.sub(r'^(?:WEBVTT|Kind:|Language:).*$', '', content, flags=re.MULTILINE)
# Decode HTML entities such as &amp; and &nbsp;
content = html.unescape(content)

# Deduplicate consecutive identical lines (VTT repeats text for overlap)
lines = content.strip().split('\n')
clean = []
prev = ''
for line in lines:
    line = line.strip()
    if line and line != prev:
        clean.append(line)
        prev = line

text = ' '.join(clean)
text = re.sub(r'\s+', ' ', text)

with open('/tmp/opencode/transcript_clean.txt', 'w', encoding='utf-8') as f:
    f.write(text)

This produces a single clean paragraph of text at /tmp/opencode/transcript_clean.txt.
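
To see why the deduplication step matters, it may help to run the same cleaning logic on a tiny input. The sketch below restates Step 2 as a function (clean_vtt is a hypothetical name) and applies it to a hand-written sample that imitates the rolling layout of auto-generated captions; the sample is illustrative only and is not taken from a real video.

```python
import re

# Illustrative sample only: each cue repeats the previous line before adding a new one.
SAMPLE_VTT = """WEBVTT
Kind: captions
Language: en

00:00:00.000 --> 00:00:02.000 align:start position:0%
hello<00:00:00.500><c> world</c>

00:00:02.000 --> 00:00:02.010 align:start position:0%
hello world

00:00:02.010 --> 00:00:04.000 align:start position:0%
hello world
this<00:00:02.500><c> is</c><00:00:03.000><c> a</c><00:00:03.500><c> test</c>
"""


def clean_vtt(content, dedupe=True):
    """Strip tags, timing lines, and headers; optionally drop consecutive repeats."""
    content = re.sub(r'<[^>]+>', '', content)
    content = re.sub(r'^\d{2}:\d{2}:\d{2}\.\d+ --> .*$', '', content, flags=re.MULTILINE)
    content = re.sub(r'^(?:WEBVTT|Kind:|Language:).*$', '', content, flags=re.MULTILINE)
    clean, prev = [], ''
    for line in content.split('\n'):
        line = line.strip()
        # Blank lines are skipped and do not reset prev, so repeats across cues are caught
        if line and (not dedupe or line != prev):
            clean.append(line)
            prev = line
    return ' '.join(clean)


print(clean_vtt(SAMPLE_VTT, dedupe=False))  # hello world hello world hello world this is a test
print(clean_vtt(SAMPLE_VTT))                # hello world this is a test
```

Without deduplication, every caption line appears up to three times, which inflates the transcript and distorts any summary built from it.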

Step 3: Read and summarize

Read the cleaned transcript. For long transcripts, split into chunks (~4000 words each) to avoid truncation, then read each chunk.
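
One way to do the chunking, sketched under the assumption that splitting on whitespace is an acceptable definition of a "word": the helper name chunk_words and its max_words parameter are hypothetical, and the 4000 default simply mirrors the figure above.

```python
def chunk_words(text, max_words=4000):
    """Split text into pieces of at most max_words whitespace-separated words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


with_small_limit = chunk_words("a b c d e", max_words=2)
print(with_small_limit)  # ['a b', 'c d', 'e']
```

Because the cleaned transcript is a single paragraph with no sentence markers guaranteed, the split falls on word boundaries only; a chunk may end mid-sentence, so it can be worth reading the end of one chunk together with the start of the next.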

Summarize the content according to the user's request:

  • If they asked for a summary, provide a concise summary organized by topic
  • If they asked about a specific argument or section, find and explain that part
  • If they want the full transcript, present the cleaned text

Notes

  • The subtitle file will be named based on the -o flag plus the language suffix, e.g. /tmp/opencode/transcript.en.vtt
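  • If no English subtitles are available, yt-dlp typically prints a warning and writes no subtitle file, so /tmp/opencode/transcript.en.vtt will be missing. Run yt-dlp --list-subs "YOUTUBE_URL" to see which languages are available.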
  • Auto-generated subtitles can have inaccuracies, especially for proper nouns and technical terms. Note this if the user needs precision.
  • For very long videos (>1 hour), the transcript may be very large. Consider splitting into sections and summarizing each before combining.
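
Because the language suffix depends on what was requested and what the video offers, a script may prefer to look up the subtitle file rather than hard-code transcript.en.vtt. The sketch below assumes the naming pattern described in the notes above (output base, then a language suffix, then .vtt); find_subtitle_files is a hypothetical helper name.

```python
import glob


def find_subtitle_files(out_base="/tmp/opencode/transcript"):
    """Return all files matching <out_base>.<lang>.vtt, sorted by name.

    glob.escape keeps any special characters in the base path literal.
    An empty list means no subtitle file was written.
    """
    return sorted(glob.glob(glob.escape(out_base) + ".*.vtt"))
```

An empty result is a convenient signal to report that no subtitles were found in the requested language before attempting the cleaning step.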