2026-05-17 18:59:28 -05:00 · 2026-05-17 18:59:28 -05:00 · bda05feab1
commit bda05feab1
parent 6d9cd18292
3 changed files with 97 additions and 0 deletions
--- a/_skill-index.md
+++ b/_skill-index.md
@@ -41,6 +41,7 @@ description: Master index of all skills in your robot assistant system. Your ass
 | "add this to my brag sheet," "log this kudos," "save this feedback," "add to brag sheet," "log a win" | **brag-sheet** |
 | "support punt," "forward to support," "send to support," "punt this to support," "hand off to support" | **send-to-support** |
 | "add a contact," "look up a contact," "find someone's number," "update a contact," "delete a contact," "list my contacts," "contacts" | **contacts** |
+| "transcribe this video," "get the subtitles," "what does this video say," "summarize this YouTube video," "YouTube transcript" | **youtube-transcript** |

 ---

@@ -214,6 +215,12 @@ description: Master index of all skills in your robot assistant system. Your ass
 **File:** `skills/contacts/SKILL.md`
 **Dependencies:** `uv` CLI, Python 3.12+, `contacts` script at `skills/contacts/scripts/contacts`

+### YouTube Transcript
+**Purpose:** Download and summarize transcripts from YouTube videos using yt-dlp. Cleans VTT subtitles into readable plain text and summarizes or analyzes content as requested.
+**Triggers:** "transcribe this video," "get the subtitles," "what does this video say," "summarize this YouTube video," "YouTube transcript"
+**File:** `skills/youtube-transcript/SKILL.md`
+**Dependencies:** `yt-dlp` CLI, Python 3
+
 ---

 ## Adding New Skills
--- a/pov-doc/assets/verkada-logo.png
+++ b/pov-doc/assets/verkada-logo.png
--- a/youtube-transcript/SKILL.md
+++ b/youtube-transcript/SKILL.md
@@ -0,0 +1,90 @@
+---
+name: youtube-transcript
+description: Download and summarize transcripts from YouTube videos using yt-dlp. Use this skill whenever the user provides a YouTube URL and wants the transcript, a summary, or to analyze the content of a video. Also trigger when the user says "transcribe this video", "get the subtitles", "what does this video say", or "summarize this YouTube video".
+---
+
+# YouTube Transcript Download & Summarization
+
+## Overview
+
+This skill downloads auto-generated or manual subtitles from YouTube videos using `yt-dlp`, cleans them into readable plain text, and then summarizes or analyzes the content as requested.
+
+## Prerequisites
+
+- `yt-dlp` must be installed (check with `which yt-dlp`)
+- Python 3 is used for cleaning the VTT output
+
+## Step 1: Download the transcript
+
+Use yt-dlp to fetch subtitles without downloading the video:
+
+```bash
+yt-dlp --write-auto-sub --sub-lang en --skip-download --sub-format vtt \
+  -o "/tmp/opencode/transcript" "YOUTUBE_URL"
+```
+
+Flags explained:
+- `--write-auto-sub`: Download auto-generated subtitles (use `--write-sub` instead for manually uploaded subtitles only, or pass both flags to accept either kind)
+- `--sub-lang en`: Request English subtitles
+- `--skip-download`: Don't download the video/audio
+- `--sub-format vtt`: Get subtitles in VTT format
+
+Also grab the video title for context:
+
+```bash
+yt-dlp --print title "YOUTUBE_URL"
+```
+
+## Step 2: Clean the VTT to plain text
+
+The raw VTT file contains timestamps, HTML-like tags, and duplicated lines. Clean it with Python:
+
+```python
+import re
+
+with open('/tmp/opencode/transcript.en.vtt', 'r', encoding='utf-8-sig') as f:
+    content = f.read()
+
+# Remove inline tags such as <c> and per-word timestamps like <00:00:01.000>
+content = re.sub(r'<[^>]+>', '', content)
+# Remove cue timing lines, including any trailing cue settings
+content = re.sub(r'\d{2}:\d{2}:\d{2}\.\d+ --> \d{2}:\d{2}:\d{2}\.\d+.*', '', content)
+# Remove VTT header lines (anchored to line start so spoken text is left alone)
+content = re.sub(r'^WEBVTT.*', '', content, flags=re.MULTILINE)
+content = re.sub(r'^Kind:.*', '', content, flags=re.MULTILINE)
+content = re.sub(r'^Language:.*', '', content, flags=re.MULTILINE)
+
+# Deduplicate consecutive identical lines (VTT repeats text across overlapping cues)
+lines = content.strip().split('\n')
+clean = []
+prev = ''
+for line in lines:
+    line = line.strip()
+    if line and line != prev:
+        clean.append(line)
+        prev = line
+
+text = ' '.join(clean)
+text = re.sub(r'\s+', ' ', text)
+
+with open('/tmp/opencode/transcript_clean.txt', 'w', encoding='utf-8') as f:
+    f.write(text)
+```
+
+This produces a single clean paragraph of text at `/tmp/opencode/transcript_clean.txt`.
+
+## Step 3: Read and summarize
+
+Read the cleaned transcript. For long transcripts, split into chunks (~4000 words each) to avoid truncation, then read each chunk.
+
+Summarize the content according to the user's request:
+- If they asked for a summary, provide a concise summary organized by topic
+- If they asked about a specific argument or section, find and explain that part
+- If they want the full transcript, present the cleaned text
+
+## Notes
+
+- The subtitle file will be named based on the `-o` flag plus the language suffix, e.g. `/tmp/opencode/transcript.en.vtt`
+- If no English subtitles are available, yt-dlp will not write `/tmp/opencode/transcript.en.vtt`. Run `yt-dlp --list-subs "YOUTUBE_URL"` to see which subtitle languages are available, then pass one of them to `--sub-lang`.
+- Auto-generated subtitles can have inaccuracies, especially for proper nouns and technical terms. Note this if the user needs precision.
+- For very long videos (>1 hour), the transcript may be very large. Consider splitting into sections and summarizing each before combining.
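
Step 3 of the new `SKILL.md` describes splitting long transcripts into chunks of about 4000 words but leaves the mechanics as prose. A minimal sketch of one way to do it, assuming a plain word-count split is acceptable (it ignores sentence and topic boundaries). The names `chunk_words` and `max_words` are hypothetical and not part of the skill.

```python
def chunk_words(text: str, max_words: int = 4000) -> list[str]:
    """Split text into consecutive chunks of at most max_words words.

    Hypothetical helper: the skill only says "~4000 words each", so the
    default mirrors that figure. Whitespace is normalized to single spaces.
    """
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


# Demonstration on synthetic text; a real run would pass the contents of
# the cleaned transcript file produced in Step 2.
sample = "word " * 9000
chunk_sizes = [len(chunk.split()) for chunk in chunk_words(sample)]
print(chunk_sizes)  # [4000, 4000, 1000]
```

Because the cleaned transcript is a single paragraph with no line breaks, splitting on whitespace is the simplest option. If sentence-aligned chunks matter, splitting on punctuation first would be a reasonable variation.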