API : Voice : Speech to text

Introduction

This request will generate text from an audio file.

Request

URL	https://api.telecomx.dk/voice/stt
Method	POST - multipart form data
Access level	Any authenticated user.
Body	engine	String	`[optional]` Which speech engine to use: FREE or ELEVEN. Limits apply to usage of non-free engines. Defaults to FREE.
	file	Binary	The audio file to convert to text.
	language	String	`[optional]` Language of the spoken audio as an ISO 639-1 or ISO 639-3 code, e.g. en, eng, da, or dan. Supplying it can help with language detection; the FREE engine works best when it is set.
	tag_audio_events	Boolean	`[optional]` True to tag audio events such as laughter and footsteps. Defaults to true. (only applies to ELEVEN).
	timestamps_granularity	String	`[optional]` Timestamp precision: word, character or none. Defaults to word. (only applies to ELEVEN).
	diarize	Boolean	`[optional]` True to annotate which speaker is speaking. Defaults to false. (only applies to ELEVEN).

Request body example

{
  "engine": "ELEVEN",
  "file": <BINARY BLOB>,
  "language": "da",
  "tag_audio_events": false,
  "timestamps_granularity": "word",
  "diarize": false
}
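
The body above is shown in JSON notation, but the request itself is sent as multipart form data. A minimal Python sketch of how such a request might be assembled follows. The helper names `build_stt_fields` and `transcribe` are hypothetical, the lowercase "true"/"false" encoding of booleans is an assumption, and `auth_headers` stands for whatever authentication headers your account uses, which this page does not specify. The upload relies on the third-party `requests` library.

```python
from typing import Optional

STT_URL = "https://api.telecomx.dk/voice/stt"  # URL from the Request table

ENGINES = {"FREE", "ELEVEN"}
GRANULARITIES = {"word", "character", "none"}


def build_stt_fields(
    engine: str = "FREE",
    language: Optional[str] = None,
    tag_audio_events: Optional[bool] = None,
    timestamps_granularity: Optional[str] = None,
    diarize: Optional[bool] = None,
) -> dict:
    """Return the non-file form fields, leaving out options that were not set."""
    if engine not in ENGINES:
        raise ValueError(f"engine must be one of {sorted(ENGINES)}")
    if timestamps_granularity is not None and timestamps_granularity not in GRANULARITIES:
        raise ValueError(f"timestamps_granularity must be one of {sorted(GRANULARITIES)}")

    fields = {"engine": engine}
    if language is not None:
        fields["language"] = language
    # Assumption: booleans are sent as the strings "true" / "false".
    if tag_audio_events is not None:
        fields["tag_audio_events"] = "true" if tag_audio_events else "false"
    if timestamps_granularity is not None:
        fields["timestamps_granularity"] = timestamps_granularity
    if diarize is not None:
        fields["diarize"] = "true" if diarize else "false"
    return fields


def transcribe(path: str, auth_headers: dict, **options) -> dict:
    """Upload one audio file as multipart form data and return the decoded JSON."""
    import requests  # third-party; imported here so build_stt_fields has no dependencies

    with open(path, "rb") as audio:
        response = requests.post(
            STT_URL,
            headers=auth_headers,                 # assumption: auth is passed via headers
            data=build_stt_fields(**options),     # plain form fields
            files={"file": audio},                # the binary "file" part
            timeout=120,
        )
    response.raise_for_status()
    return response.json()
```

Omitting unset options lets the server apply its documented defaults instead of hard-coding them on the client.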

Response

Property	Type	Description
language_code	String	Language detected, ISO 639-1 format.
language_probability	Number	Confidence in language detected, 0 - 1.
text	String	The complete transcribed text.
words	Array	List of timestamped segments.
words[].text	String	Text of the segment.
words[].start	Number	Start time in fractional seconds.
words[].end	Number	End time in fractional seconds.
words[].type	String	Type of segment: word or spacing.
words[].speaker_id	String	Id of who is speaking, if diarize is enabled.
words[].characters	Array	List of characters, if granularity is character.
words[].characters[].text	String	The character spoken.
words[].characters[].start	Number	Start time in fractional seconds.
words[].characters[].end	Number	End time in fractional seconds.

Note that properties holding no value may be omitted from the response.

Example

{
  "language_code": "da",
  "language_probability": 0.9086595773696899,
  "text": "Hej. Goddag, du snakker med Morten Hansen fra TDC. Jeg er ham teknikeren, der skal ud til jer. Så prøv lige at ringe til mig. Det var lige om hvordan adgangsforholdene er. Ring til mig på 71 91 99 99. Det var 71 91 99 99. Hej.",
  "words": [
    {
      "text": "Hej.",
      "start": 0.899,
      "end": 0.959,
      "type": "word",
      "speaker_id": "speaker_0"
    },
    {
      "text": " ",
      "start": 0.959,
      "end": 0.959,
      "type": "spacing",
      "speaker_id": "speaker_0"
    },
    {
      "text": "Goddag,",
      "start": 0.959,
      "end": 1.199,
      "type": "word",
      "speaker_id": "speaker_0"
    },
    {
      "text": " ",
      "start": 1.199,
      "end": 1.22,
      "type": "spacing",
      "speaker_id": "speaker_0"
    },
    {
      "text": "du",
      "start": 1.22,
      "end": 1.299,
      "type": "word",
      "speaker_id": "speaker_0"
    },
    {
      "text": " ",
      "start": 1.299,
      "end": 1.299,
      "type": "spacing",
      "speaker_id": "speaker_0"
    },
    {
      "text": "snakker",
      "start": 1.299,
      "end": 1.539,
      "type": "word",
      "speaker_id": "speaker_0"
    },
    {
      "text": " ",
      "start": 1.539,
      "end": 1.539,
      "type": "spacing",
      "speaker_id": "speaker_0"
    },
    {
      "text": "med",
      "start": 1.539,
      "end": 1.639,
      "type": "word",
      "speaker_id": "speaker_0"
    },
    {
      ...
    }
  ]
}
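
When diarize is enabled, the `words` list can be folded into one entry per speaker turn. The sketch below is one possible way to do that; `speaker_turns` is a hypothetical helper name, and it reads every property with `.get()` because properties holding no value may be omitted from the response.

```python
def speaker_turns(response: dict) -> list:
    """Merge consecutive segments from the same speaker into turns."""
    turns = []
    for segment in response.get("words", []):
        speaker = segment.get("speaker_id")  # None when diarize is off
        text = segment.get("text", "")
        if turns and turns[-1]["speaker_id"] == speaker:
            # Same speaker as the previous segment: extend the current turn.
            turns[-1]["text"] += text
            turns[-1]["end"] = segment.get("end", turns[-1]["end"])
        else:
            # Speaker changed (or first segment): start a new turn.
            turns.append({
                "speaker_id": speaker,
                "text": text,
                "start": segment.get("start"),
                "end": segment.get("end"),
            })
    # Spacing segments can leave stray whitespace at the edges of a turn.
    for turn in turns:
        turn["text"] = turn["text"].strip()
    return turns
```

Spacing segments are kept while merging so that the words in each turn stay separated exactly as in the `text` property.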

Errors

Error code	Message	Description
404	file	Audio file missing or invalid format
403	access_denied	Insufficient access level
403	quota_exceeded	Quota limit has been reached
422	file	No speech detected in audio file
500	internal_error	<Unspecified>
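
The message `file` appears under both 404 and 422, so a client that wants to tell these cases apart has to look at the status code and the message together. A minimal sketch of such a lookup follows; `describe_error` is a hypothetical helper, and because this page does not specify how the message is carried in the error response, the function simply takes both values as arguments.

```python
# Keys are (status code, message) pairs taken from the Errors table.
STT_ERRORS = {
    (404, "file"): "Audio file missing or invalid format",
    (403, "access_denied"): "Insufficient access level",
    (403, "quota_exceeded"): "Quota limit has been reached",
    (422, "file"): "No speech detected in audio file",
    (500, "internal_error"): "Unspecified",
}


def describe_error(status: int, message: str) -> str:
    """Return the documented description, or a generic fallback for unknown pairs."""
    return STT_ERRORS.get((status, message), f"Unexpected error {status}: {message}")
```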
