Add vision-analysis skill; move workaround from TOOLS.md; update log-work-expense to skip image analysis
parent ac1b4f83f3
commit 9e522fc164
3 changed files with 44 additions and 0 deletions
22
vision-analysis/SKILL.md
Normal file
@@ -0,0 +1,22 @@
---
name: vision-analysis
description: Analyze images using a vision model via Python. Use when the built-in image tool fails or when you need to extract text, describe content, or analyze one or more images. Triggers on any request to analyze, read, describe, or extract information from images.
---

# Vision Analysis

Analyze images by calling the OpenRouter API directly via Python.

## Script

```bash
uv run --with requests ~/notes/skills/vision-analysis/scripts/analyze.py "<prompt>" <image1> [image2] ...
```

- **Prompt** (required): What to extract or analyze from the images.
- **Images** (required): One or more local file paths.

## Notes

- The built-in `image` tool does not work with the configured vision model. Always use this script instead.
- Max ~20 images per call (API limit).
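
The ~20-image limit in the Notes means a larger set of images has to be split across several calls. A minimal sketch of that batching, assuming the script path from the Script section; `MAX_IMAGES`, `chunked`, and `build_commands` are hypothetical names and not part of this skill:

```python
from pathlib import Path

# Assumption: per-call cap taken from the note above; the real API limit may differ.
MAX_IMAGES = 20
# Script location as given in the Script section.
SCRIPT = Path("~/notes/skills/vision-analysis/scripts/analyze.py").expanduser()


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_commands(prompt, image_paths, size=MAX_IMAGES):
    """Return one `uv run` argument list per batch of images."""
    return [
        ["uv", "run", "--with", "requests", str(SCRIPT), prompt, *batch]
        for batch in chunked(list(image_paths), size)
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`, so 45 images would produce three calls of 20, 20, and 5 images.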
20
vision-analysis/scripts/analyze.py
Normal file
@@ -0,0 +1,20 @@
import base64, mimetypes, os, sys, requests

API_KEY = os.environ["OPENROUTER_API_KEY"]  # assumed variable name; keep the key out of version control
MODEL = "google/gemini-2.5-flash-lite"

prompt = sys.argv[1]  # what to extract or analyze
paths = sys.argv[2:]  # one or more local image files

content = [{"type": "text", "text": prompt}]  # the text prompt comes first, then one part per image
for p in paths:
    with open(p, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    content.append({"type": "image_url", "image_url": {"url": f"data:{mimetypes.guess_type(p)[0] or 'image/jpeg'};base64,{b64}"}})  # MIME type from the extension, JPEG as fallback

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
    json={"model": MODEL, "messages": [{"role": "user", "content": content}], "max_tokens": 2000}
)
print(resp.json()["choices"][0]["message"]["content"])  # text of the first (only) completion
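
The script indexes `choices[0]` directly, so a failed request surfaces only as a `KeyError`. A minimal sketch of two helpers that could be factored out of it, one for encoding an image part and one for reading the reply defensively; `image_part` and `extract_text` are hypothetical names, and the assumption that failed calls carry details under an `"error"` key is not confirmed by this commit:

```python
import base64
import mimetypes


def image_part(path):
    """Encode a local image as an `image_url` content part with a data URL."""
    # Guess the MIME type from the file extension; fall back to JPEG (an assumption).
    mime = mimetypes.guess_type(path)[0] or "image/jpeg"
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}}


def extract_text(body):
    """Return the first choice's message text, or raise if the reply has none."""
    choices = body.get("choices") or []
    if not choices:
        # Assumption: error details, if any, sit under an "error" key.
        raise RuntimeError(f"no choices in response: {body.get('error', body)}")
    return choices[0]["message"]["content"]
```

With these, the loop body reduces to `content.append(image_part(p))` and the final line to `print(extract_text(resp.json()))`.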