This guide walks through the full API flow: create an upload, send the media file, complete the upload, create a transcription task, and poll for results.
Visit Video To Text to manage your account, create API keys, and return to the product workspace.
Set your API base URL and key:
```sh
export VTT_API_BASE_URL="https://api.videototext.dev"
export VTT_API_KEY="vtt_xxxxx"
```
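If you script the flow, it can help to fail fast when either variable is missing. The following is a minimal POSIX `sh` sketch. The function name `vtt_check_env` is hypothetical and is not part of any Video To Text tooling.

```sh
#!/bin/sh
# Hypothetical helper: fail fast if either variable above is missing.
vtt_check_env() {
  local missing=0
  if [ -z "${VTT_API_BASE_URL:-}" ]; then
    echo "error: VTT_API_BASE_URL is not set" >&2
    missing=1
  fi
  if [ -z "${VTT_API_KEY:-}" ]; then
    echo "error: VTT_API_KEY is not set" >&2
    missing=1
  fi
  [ "$missing" -eq 0 ] || return 1
  # Drop one trailing slash so "$VTT_API_BASE_URL/v1/..." does not produce "//".
  VTT_API_BASE_URL="${VTT_API_BASE_URL%/}"
}

# Example: vtt_check_env || exit 1
```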
## 1. Create an upload

Create a signed upload URL for the media file.
```sh
curl -X POST "$VTT_API_BASE_URL/v1/uploads" \
  -H "Authorization: Bearer $VTT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "filename": "meeting.mp4",
    "mimetype": "video/mp4"
  }'
```

The response includes the fields needed for the next steps:
```json
{
  "data": {
    "uploadUrl": "https://storage.example.com/signed-upload-url",
    "fileKey": "uploads/example.mp4",
    "fileUrl": "https://static.example.com/uploads/example.mp4",
    "uploadId": "00000000-0000-0000-0000-000000000000",
    "expiresAt": "2026-06-06T10:00:00.000Z"
  },
  "meta": {}
}
```
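When you script the flow, you can capture these response fields in shell variables for the later steps. This is a minimal POSIX `sh` sketch that assumes `python3` is available for JSON handling (`jq` is a common alternative). `json_get` and `vtt_create_upload` are hypothetical helper names, not part of a Video To Text SDK.

```sh
#!/bin/sh
# Print the value at a dotted path (e.g. data.uploadId) from JSON read on stdin.
json_get() {
  python3 -c 'import functools, json, sys; print(functools.reduce(lambda v, k: v[k], sys.argv[1].split("."), json.load(sys.stdin)))' "$1"
}

# Hypothetical helper: create an upload and print the raw JSON response.
# Usage: vtt_create_upload FILENAME MIMETYPE
vtt_create_upload() {
  local filename="$1" mimetype="$2" body
  # Build the body with a JSON encoder so quotes in the filename are escaped.
  body=$(python3 -c 'import json, sys; print(json.dumps({"filename": sys.argv[1], "mimetype": sys.argv[2]}))' \
    "$filename" "$mimetype") || return 1
  curl -sS --fail -X POST "$VTT_API_BASE_URL/v1/uploads" \
    -H "Authorization: Bearer $VTT_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$body"
}

# Example:
# response=$(vtt_create_upload meeting.mp4 video/mp4) || exit 1
# UPLOAD_URL=$(printf '%s' "$response" | json_get data.uploadUrl)
# UPLOAD_ID=$(printf '%s' "$response" | json_get data.uploadId)
# FILE_KEY=$(printf '%s' "$response" | json_get data.fileKey)
# FILE_URL=$(printf '%s' "$response" | json_get data.fileUrl)
```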
## 2. Upload the file

Upload the binary file directly to `uploadUrl`. Use the same `Content-Type` value as the `mimetype` you passed when creating the upload.
```sh
curl -X PUT "https://storage.example.com/signed-upload-url" \
  -H "Content-Type: video/mp4" \
  --data-binary "@meeting.mp4"
```
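A scripted version of this PUT can take the signed URL from the step 1 response. The sketch below adds curl's `--fail` flag so that an HTTP error status returns a non-zero exit code. It also measures the file size for the `fileSize` field in step 3, assuming from the 10485760 example value that the API expects a byte count. `vtt_put_file` and `file_size_bytes` are hypothetical names.

```sh
#!/bin/sh
# Hypothetical helper: PUT the raw bytes of FILE to the signed URL.
# Usage: vtt_put_file UPLOAD_URL FILE MIMETYPE
vtt_put_file() {
  local upload_url="$1" file="$2" mimetype="$3"
  # --fail: exit non-zero on HTTP 4xx/5xx instead of treating the error body as success.
  curl -sS --fail -X PUT "$upload_url" \
    -H "Content-Type: $mimetype" \
    --data-binary "@$file"
}

# Print the size of FILE in bytes (some wc implementations pad the number with spaces).
file_size_bytes() {
  wc -c < "$1" | tr -d '[:space:]'
}

# Example (UPLOAD_URL captured from the step 1 response):
# vtt_put_file "$UPLOAD_URL" meeting.mp4 video/mp4 || exit 1
# FILE_SIZE=$(file_size_bytes meeting.mp4)
```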
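## 3. Complete the upload

Complete the upload session so Video To Text can validate the object and create an asset. Replace `$UPLOAD_ID` with the `uploadId` from step 1. In the request body, send the `fileKey` and `fileUrl` from that same response, along with the file's name, MIME type, and size.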
```sh
curl -X POST "$VTT_API_BASE_URL/v1/uploads/$UPLOAD_ID/operations/complete" \
  -H "Authorization: Bearer $VTT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "fileKey": "uploads/example.mp4",
    "fileUrl": "https://static.example.com/uploads/example.mp4",
    "filename": "meeting.mp4",
    "mimetype": "video/mp4",
    "fileSize": 10485760
  }'
```

The response returns the `assetId` used to create a transcription task:
```json
{
  "data": {
    "assetId": "00000000-0000-0000-0000-000000000000"
  },
  "meta": {}
}
```
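The completion body repeats values from earlier steps, so building it with a JSON encoder is safer than splicing strings by hand. This is a minimal POSIX `sh` sketch that assumes `python3` is available. `vtt_complete_upload` is a hypothetical helper, and the sketch assumes `fileSize` is an integer byte count.

```sh
#!/bin/sh
# Hypothetical helper: complete an upload session and print the raw JSON response.
# Usage: vtt_complete_upload UPLOAD_ID FILE_KEY FILE_URL FILENAME MIMETYPE FILE_SIZE_BYTES
vtt_complete_upload() {
  local upload_id="$1" body
  # json.dumps escapes quotes and backslashes; int() sends fileSize as a JSON number
  # and makes python3 exit non-zero if the size is not an integer.
  body=$(python3 -c 'import json, sys
k, u, n, m, s = sys.argv[1:6]
print(json.dumps({"fileKey": k, "fileUrl": u, "filename": n, "mimetype": m, "fileSize": int(s)}))' \
    "$2" "$3" "$4" "$5" "$6") || return 1
  curl -sS --fail -X POST "$VTT_API_BASE_URL/v1/uploads/$upload_id/operations/complete" \
    -H "Authorization: Bearer $VTT_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$body"
}

# Example (values captured in steps 1 and 2):
# response=$(vtt_complete_upload "$UPLOAD_ID" "$FILE_KEY" "$FILE_URL" meeting.mp4 video/mp4 "$FILE_SIZE") || exit 1
# ASSET_ID=$(printf '%s' "$response" | python3 -c 'import json, sys; print(json.load(sys.stdin)["data"]["assetId"])')
```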
## 4. Create a transcription task

Create the task from the uploaded asset. The default mode is `balanced`.
```sh
curl -X POST "$VTT_API_BASE_URL/v1/tasks" \
  -H "Authorization: Bearer $VTT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "assetId": "00000000-0000-0000-0000-000000000000",
    "language": "Auto",
    "timestampMode": "CHUNK",
    "transcriptionMode": "balanced"
  }'
```

The response returns `data.task.transcriptId`. Use that value as `TASK_ID` when polling.

Set `transcriptionMode` to `precision` when the workflow favors accuracy over cost.
| Mode | Best for |
|---|---|
| `balanced` | Fast, cost-efficient transcription for most production workflows. |
| `precision` | Higher-accuracy transcription when quality matters more than cost. |
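The following sketch creates the task for a given asset and mode, then prints the `transcriptId` to poll. It is written in POSIX `sh` and assumes `python3` is available. It keeps `language` and `timestampMode` fixed at the example's `Auto` and `CHUNK` values and accepts only the two modes in the table. `vtt_create_task` is a hypothetical helper.

```sh
#!/bin/sh
# Hypothetical helper: create a transcription task and print data.task.transcriptId.
# Usage: vtt_create_task ASSET_ID [balanced|precision]   (defaults to balanced)
vtt_create_task() {
  local asset_id="$1" mode="${2:-balanced}" body response
  case "$mode" in
    balanced|precision) ;;
    *) echo "error: unknown transcription mode: $mode" >&2; return 1 ;;
  esac
  # Same request body as the curl example in this step, with the mode as a parameter.
  body=$(python3 -c 'import json, sys
print(json.dumps({"assetId": sys.argv[1], "language": "Auto",
                  "timestampMode": "CHUNK", "transcriptionMode": sys.argv[2]}))' \
    "$asset_id" "$mode") || return 1
  response=$(curl -sS --fail -X POST "$VTT_API_BASE_URL/v1/tasks" \
    -H "Authorization: Bearer $VTT_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$body") || return 1
  # The ID to poll in step 5 is returned as data.task.transcriptId.
  printf '%s' "$response" |
    python3 -c 'import json, sys; print(json.load(sys.stdin)["data"]["task"]["transcriptId"])'
}

# Example (ASSET_ID from step 3):
# TASK_ID=$(vtt_create_task "$ASSET_ID" precision) || exit 1
```

Validating the mode locally keeps a typo from turning into a failed API call after the upload has already finished.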
## 5. Poll the task
Poll the task endpoint until `status` becomes `SUCCEEDED`, `FAILED`, or `CANCELED`.
```sh
curl "$VTT_API_BASE_URL/v1/tasks/$TASK_ID" \
  -H "Authorization: Bearer $VTT_API_KEY"
```

Successful tasks return the transcript text, public chunk timing fields, word timestamps, source file details, and billed credits.
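A polling loop can stop on the three terminal statuses and give up after a fixed number of attempts. This sketch rests on several assumptions. This guide does not show the task response body, so the status path `data.task.status` is a guess that mirrors `data.task.transcriptId` from step 4, and the path is exposed as a parameter. The 5-second default interval and 120-attempt cap are arbitrary choices, not documented limits. `vtt_poll_task` is a hypothetical helper that assumes POSIX `sh` and `python3`.

```sh
#!/bin/sh
# Hypothetical helper: poll a task until it reaches SUCCEEDED, FAILED, or CANCELED.
# Usage: vtt_poll_task TASK_ID [INTERVAL_SECONDS] [MAX_ATTEMPTS] [STATUS_PATH]
# Prints the last response JSON; returns 0 only when the task SUCCEEDED.
vtt_poll_task() {
  local task_id="$1" interval="${2:-5}" max_attempts="${3:-120}"
  # ASSUMPTION: status is read from data.task.status; pass another dotted path
  # if the real response stores it elsewhere.
  local status_path="${4:-data.task.status}" attempt=1 response task_status
  while [ "$attempt" -le "$max_attempts" ]; do
    response=$(curl -sS --fail "$VTT_API_BASE_URL/v1/tasks/$task_id" \
      -H "Authorization: Bearer $VTT_API_KEY") || return 1
    # Walk the dotted path through the parsed JSON to get the status string.
    task_status=$(printf '%s' "$response" | python3 -c 'import functools, json, sys
print(functools.reduce(lambda v, k: v[k], sys.argv[1].split("."), json.load(sys.stdin)))' \
      "$status_path") || return 1
    case "$task_status" in
      SUCCEEDED) printf '%s\n' "$response"; return 0 ;;
      FAILED|CANCELED)
        printf '%s\n' "$response"
        echo "task $task_id ended with status $task_status" >&2
        return 1 ;;
    esac
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  echo "task $task_id not finished after $max_attempts attempts" >&2
  return 1
}

# Example: RESULT=$(vtt_poll_task "$TASK_ID") || exit 1
```

Returning non-zero for `FAILED` and `CANCELED` lets a calling script stop with `|| exit 1`, and the printed JSON is still available for inspection. Any intermediate status simply triggers another poll.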