Handle video and audio files without building your own upload, processing, and polling pipeline.
curl -X POST "https://api.videototext.dev/v1/tasks" \
  -H "Authorization: Bearer $VTT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "assetId": "asset_9m2k8r",
    "language": "Auto",
    "timestampMode": "CHUNK",
    "transcriptionMode": "balanced"
  }'
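For teams calling the API from application code rather than a shell, the same request can be sketched in Python using only the standard library. The endpoint, headers, and body fields are taken from the curl example; the helper names `build_task_request` and `create_task` are illustrative, and the response is returned as raw text because its shape is not described here.

```python
import json
import os
import urllib.request

API_URL = "https://api.videototext.dev/v1/tasks"


def build_task_request(asset_id, api_key, language="Auto",
                       timestamp_mode="CHUNK", transcription_mode="balanced"):
    """Build (but do not send) the task-creation request from the curl example."""
    payload = {
        "assetId": asset_id,
        "language": language,
        "timestampMode": timestamp_mode,
        "transcriptionMode": transcription_mode,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_task(asset_id):
    """Send the request and return the raw response body as text.

    Assumes VTT_API_KEY is set in the environment, as in the curl example.
    """
    request = build_task_request(asset_id, os.environ["VTT_API_KEY"])
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling `create_task("asset_9m2k8r")` would send the same payload as the curl command. Keeping request construction separate from sending makes the payload easy to inspect in tests without network access.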
Video To Text API
Speech-to-text infrastructure for modern apps
Video To Text API turns recordings, meetings, interviews, lessons, and media libraries into clean text your product can search, summarize, subtitle, and automate.
Get full text, timestamped chunks, word timings, source metadata, task status, and billing fields.
Use balanced for everyday workloads, or switch to precision for quality-sensitive content.
Everything needed to ship transcription workflows
Designed for SaaS teams that need reliable transcription inside real product flows.
Large-file uploads
Upload media through signed URLs so files do not have to pass through your application server.
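A signed-URL upload might look roughly like the sketch below. It assumes the signed URL accepts an HTTP PUT with the raw file as the body, which is a common convention for signed storage URLs but is not confirmed here; how the signed URL is obtained is also not covered, so `signed_url` is simply a parameter. The function names are hypothetical.

```python
import os
import urllib.request


def build_upload_request(signed_url, data, size_bytes, content_type="video/mp4"):
    """Build a PUT request for a signed URL (assumed PUT-with-raw-body convention).

    `data` may be bytes or an open binary file. Content-Length is set explicitly
    so that a file object is streamed with a known size instead of being sent
    with chunked transfer encoding.
    """
    return urllib.request.Request(
        signed_url,
        data=data,
        headers={
            "Content-Type": content_type,
            "Content-Length": str(size_bytes),
        },
        method="PUT",
    )


def upload_file(signed_url, path, content_type="video/mp4"):
    """Stream a local file to the signed URL and return the HTTP status code."""
    with open(path, "rb") as media:
        request = build_upload_request(
            signed_url, media, os.path.getsize(path), content_type
        )
        with urllib.request.urlopen(request) as response:
            return response.status
```

Because the file handle is passed straight to the request, the media is read from disk as it is sent and never has to be held in memory by your application server.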
Reliable task creation
Use predictable task states and optional retry safeguards to build resilient transcription queues.
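One way to build on predictable task states is a polling loop with exponential backoff. The sketch below is illustrative only: the state names `completed` and `failed` are assumptions, not documented values, and `fetch_status` is a hypothetical callable you would supply to look up a task's current state.

```python
import time

# Hypothetical terminal states; substitute the values the API actually returns.
TERMINAL_STATES = {"completed", "failed"}


def wait_for_task(fetch_status, task_id, max_attempts=10,
                  base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Poll until the task reaches a terminal state, doubling the delay each time.

    `fetch_status(task_id)` must return the task's current state as a string.
    Raises TimeoutError if no terminal state is seen within `max_attempts` checks.
    """
    delay = base_delay
    for attempt in range(max_attempts):
        status = fetch_status(task_id)
        if status in TERMINAL_STATES:
            return status
        if attempt < max_attempts - 1:
            sleep(delay)
            delay = min(delay * 2, max_delay)  # cap the backoff
    raise TimeoutError(f"task {task_id} did not finish after {max_attempts} checks")
```

Injecting `sleep` and `fetch_status` keeps the loop testable without real waits or network calls, and the capped backoff avoids hammering the API while a long recording is still processing.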
Timestamped transcripts
Build captions, search, editors, clips, and review tools from chunk and word-level timing data.
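As an illustration of the captions use case, timestamped chunks can be converted to SubRip (SRT) subtitles in a few lines. The chunk shape used here, a dict with `start` and `end` in seconds plus `text`, is an assumption made for this sketch; adapt the field names to the actual response.

```python
def format_srt_time(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def chunks_to_srt(chunks):
    """Convert chunks (assumed keys: start, end, text) into SRT subtitle text."""
    blocks = []
    for index, chunk in enumerate(chunks, start=1):
        start = format_srt_time(chunk["start"])
        end = format_srt_time(chunk["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{chunk['text'].strip()}\n")
    # A blank line separates consecutive cues.
    return "\n".join(blocks)
```

The same chunk list could feed a search index or a clip picker; SRT is just the simplest target to show how start and end times map onto a caption cue.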
Clear mode controls
Pick balanced for speed and cost, or precision when transcript quality matters most.
Make spoken content usable across your product
Give users faster ways to find, review, edit, and repurpose recorded speech.
Meeting intelligence
Turn calls and recordings into searchable notes, summaries, action items, and customer insights.
Media operations
Generate transcripts for podcasts, webinars, interviews, learning content, and long-form videos.
Subtitle and editing tools
Use timestamps as the foundation for captions, clip selection, review workflows, and timeline editors.