
API : Voice : Speech to text

Introduction

This request transcribes the speech in an audio file to text.

Request

URL https://api.telecomx.dk/voice/stt
Method POST - multipart form data
Access level Any authenticated user.
Body engine String [optional] Which speech engine to use: FREE or ELEVEN. Limits apply to usage of non-free engines. Defaults to FREE.
file Binary The audio file to convert to text.
language String [optional] Language of the spoken audio as an ISO 639-1 or ISO 639-3 code, e.g. en, eng, da, or dan. Helps language detection; the FREE engine works best when it is set.
tag_audio_events Boolean [optional] True to tag audio events such as laughter or footsteps. Defaults to true. (only applies to ELEVEN).
timestamps_granularity String [optional] Timestamp precision: word, character or none. Defaults to word. (only applies to ELEVEN).
diarize Boolean [optional] True to annotate which speaker is speaking. Defaults to false. (only applies to ELEVEN).

Request body example

{
  "engine": "ELEVEN",
  "file": <BINARY BLOB>,
  "language": "da",
  "tag_audio_events": false,
  "timestamps_granularity": "word",
  "diarize": false
}
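
The request is sent as multipart form data, so the fields above become form parts rather than a JSON document. A minimal Python sketch using only the standard library is shown below. The helper names (build_form_fields, encode_multipart, transcribe) are hypothetical, the "true"/"false" spelling of booleans is an assumption, and the authentication headers are left to the caller because this page does not describe them.

```python
import json
import urllib.request
import uuid
from pathlib import Path

STT_URL = "https://api.telecomx.dk/voice/stt"


def build_form_fields(engine="FREE", language=None, tag_audio_events=None,
                      timestamps_granularity=None, diarize=None):
    """Collect the text fields, skipping unset ones so server defaults apply."""
    options = {
        "engine": engine,
        "language": language,
        "tag_audio_events": tag_audio_events,
        "timestamps_granularity": timestamps_granularity,
        "diarize": diarize,
    }
    fields = {}
    for name, value in options.items():
        if value is None:
            continue
        if isinstance(value, bool):
            # Assumption: booleans are accepted as the strings "true"/"false".
            value = "true" if value else "false"
        fields[name] = str(value)
    return fields


def encode_multipart(fields, filename, file_bytes, boundary=None):
    """Return (body, content_type) for a multipart/form-data request."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    # The audio goes in the part named "file", as listed in the request table.
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode("utf-8")
        + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path, auth_headers, **options):
    """POST an audio file and return the decoded JSON response.

    auth_headers is whatever your account uses to authenticate (hypothetical
    parameter; the scheme is not documented on this page).
    """
    audio = Path(path)
    body, content_type = encode_multipart(
        build_form_fields(**options), audio.name, audio.read_bytes()
    )
    request = urllib.request.Request(
        STT_URL, data=body, method="POST",
        headers={**auth_headers, "Content-Type": content_type},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call might look like transcribe("message.wav", auth_headers, engine="ELEVEN", language="da", diarize=True), where message.wav and auth_headers are placeholders.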

Response

Property Type Description
language_code String Detected language, ISO 639-1 format.
language_probability Number Confidence in the detected language, from 0 to 1.
text String The complete transcribed text.
words Array List of timestamped segments.
words[].text String Text.
words[].start Number Starting time in fractional seconds.
words[].end Number End time in fractional seconds.
words[].type String Type of segment: word, spacing.
words[].speaker_id String ID of the speaker, if diarize is enabled.
words[].characters Array List of characters, if timestamps_granularity is character.
words[].characters[].text String The character spoken.
words[].characters[].start Number Start time in fractional seconds.
words[].characters[].end Number End time in fractional seconds.

Note that properties holding no value may be omitted from the response.

Example

{
  "language_code": "da",
  "language_probability": 0.9086595773696899,
  "text": "Hej. Goddag, du snakker med Morten Hansen fra TDC. Jeg er ham teknikeren, der skal ud til jer. Så prøv lige at ringe til mig. Det var lige om hvordan adgangsforholdene er. Ring til mig på 71 91 99 99. Det var 71 91 99 99. Hej.",
  "words": [
    {
      "text": "Hej.",
      "start": 0.899,
      "end": 0.959,
      "type": "word",
      "speaker_id": "speaker_0"
    },
    {
      "text": " ",
      "start": 0.959,
      "end": 0.959,
      "type": "spacing",
      "speaker_id": "speaker_0"
    },
    {
      "text": "Goddag,",
      "start": 0.959,
      "end": 1.199,
      "type": "word",
      "speaker_id": "speaker_0"
    },
    {
      "text": " ",
      "start": 1.199,
      "end": 1.22,
      "type": "spacing",
      "speaker_id": "speaker_0"
    },
    {
      "text": "du",
      "start": 1.22,
      "end": 1.299,
      "type": "word",
      "speaker_id": "speaker_0"
    },
    {
      "text": " ",
      "start": 1.299,
      "end": 1.299,
      "type": "spacing",
      "speaker_id": "speaker_0"
    },
    {
      "text": "snakker",
      "start": 1.299,
      "end": 1.539,
      "type": "word",
      "speaker_id": "speaker_0"
    },
    {
      "text": " ",
      "start": 1.539,
      "end": 1.539,
      "type": "spacing",
      "speaker_id": "speaker_0"
    },
    {
      "text": "med",
      "start": 1.539,
      "end": 1.639,
      "type": "word",
      "speaker_id": "speaker_0"
    },
    {
      ...
    }
  ]
}
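
Because properties holding no value may be omitted, client code should read the response defensively. The sketch below groups consecutive segments from words into speaker turns; the function name speaker_turns and the shape of the returned dictionaries are illustrative choices, not part of the API.

```python
from itertools import groupby


def speaker_turns(response):
    """Merge consecutive segments with the same speaker_id into turns.

    Uses .get() throughout, since any property may be missing.
    """
    turns = []
    words = response.get("words", [])
    for speaker, segments in groupby(words, key=lambda w: w.get("speaker_id")):
        segments = list(segments)
        # Spacing segments carry the whitespace, so plain concatenation works.
        text = "".join(s.get("text", "") for s in segments).strip()
        if not text:
            continue  # skip runs that contain only spacing
        turns.append({
            "speaker_id": speaker,
            "start": segments[0].get("start"),
            "end": segments[-1].get("end"),
            "text": text,
        })
    return turns
```

Without diarization, speaker_id is absent from every segment, so the whole transcript comes back as a single turn whose speaker_id is None.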

Errors

Error code Message Description
404 file Audio file missing or invalid format
403 access_denied Insufficient access level
403 quota_exceeded Quota limit has been reached
422 file No speech detected in audio file
500 internal_error <Unspecified>
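
Note that the message file appears under both 404 and 422, so a client needs the status code and the message together to tell the two cases apart. A small lookup sketch follows; the function name describe_error is hypothetical, and how the message is carried in the error response is not specified on this page, so the sketch takes it as a plain argument.

```python
# Keyed on (status code, message) because "file" is used for two different errors.
STT_ERRORS = {
    (404, "file"): "Audio file missing or invalid format",
    (403, "access_denied"): "Insufficient access level",
    (403, "quota_exceeded"): "Quota limit has been reached",
    (422, "file"): "No speech detected in audio file",
    (500, "internal_error"): "Unspecified internal error",
}


def describe_error(status, message):
    """Return the documented description, or a fallback for unlisted errors."""
    return STT_ERRORS.get(
        (status, message),
        f"Undocumented error {status} ({message})",
    )
```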
api/voice/stt.txt · Last modified: 2025/05/12 10:21 by Per Møller
