Python

Model

class Model(*args, **kwargs)[source]

Class holding a DeepSpeech model

Parameters
  • aModelPath (str) – Path to model file to load

  • aBeamWidth (int) – Decoder beam width
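The constructor takes the model path and beam width positionally. A minimal loading sketch; the file name and the default beam width of 500 are illustrative choices, not values prescribed by this API:

```python
def load_model(model_path, beam_width=500):
    """Load a DeepSpeech model.

    beam_width=500 is an illustrative default, not mandated by this API;
    larger values trade decoding speed for accuracy.
    """
    if beam_width <= 0:
        raise ValueError("beam width must be a positive integer")
    # Import deferred so the argument check above runs even when the
    # deepspeech package is not installed.
    from deepspeech import Model
    return Model(model_path, beam_width)

# model = load_model("deepspeech-model.pbmm")  # hypothetical file name
```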

createStream()[source]

Create a new streaming inference state. The streaming state returned by this function can then be passed to feedAudioContent() and finishStream().

Returns

Object holding the stream

Throws

RuntimeError on error

enableDecoderWithLM(*args, **kwargs)[source]

Enable decoding using beam scoring with a KenLM language model.

Parameters
  • aLMPath (str) – The path to the language model binary file.

  • aTriePath (str) – The path to the trie file built from the same vocabulary as the language model binary.

  • aLMAlpha (float) – The alpha hyperparameter of the CTC decoder. Language model weight.

  • aLMBeta (float) – The beta hyperparameter of the CTC decoder. Word insertion weight.

Returns

Zero on success, non-zero on failure (invalid arguments).

Type

int
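A sketch of enabling the language model that checks the integer status code. The file names are placeholders, and the alpha/beta values are common starting points rather than values this API requires:

```python
LM_ALPHA = 0.75  # language model weight (illustrative starting point)
LM_BETA = 1.85   # word insertion weight (illustrative starting point)

def enable_lm(model, lm_path, trie_path, alpha=LM_ALPHA, beta=LM_BETA):
    """Enable KenLM beam scoring and raise if the call reports failure."""
    status = model.enableDecoderWithLM(lm_path, trie_path, alpha, beta)
    if status != 0:
        raise RuntimeError("enableDecoderWithLM failed with status %d" % status)

# enable_lm(model, "lm.binary", "trie")  # hypothetical file names
```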

feedAudioContent(*args, **kwargs)[source]

Feed audio samples to an ongoing streaming inference.

Parameters
  • aSctx (object) – A streaming state pointer returned by createStream().

  • aBuffer (int array) – An array of 16-bit, mono raw audio samples at the appropriate sample rate (matching what the model was trained on).

  • aBufferSize (int) – The number of samples in aBuffer.

finishStream(*args, **kwargs)[source]

Signal the end of an audio signal to an ongoing streaming inference and return the STT result over the whole audio signal.

Parameters

aSctx (object) – A streaming state pointer returned by createStream().

Returns

The STT result.

Type

str

finishStreamWithMetadata(*args, **kwargs)[source]

Signal the end of an audio signal to an ongoing streaming inference and return per-letter metadata.

Parameters

aSctx (object) – A streaming state pointer returned by createStream().

Returns

A struct of individual letters along with their timing information.

Type

Metadata()

intermediateDecode(*args, **kwargs)[source]

Compute the intermediate decoding of an ongoing streaming inference.

Parameters

aSctx (object) – A streaming state pointer returned by createStream().

Returns

The STT intermediate result.

Type

str
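The streaming methods above compose into a feed-and-decode loop: createStream(), repeated feedAudioContent(), optional intermediateDecode() for partial results, then finishStream(). A minimal sketch; the chunk size and the callback are illustrative choices, not part of the API:

```python
def stream_transcribe(model, samples, chunk_size=4096, on_partial=None):
    """Feed a sample buffer to a streaming state chunk by chunk.

    Optionally reports intermediate decodes via on_partial, then returns
    the final transcript from finishStream().
    """
    sctx = model.createStream()
    for i in range(0, len(samples), chunk_size):
        chunk = samples[i:i + chunk_size]
        model.feedAudioContent(sctx, chunk, len(chunk))
        if on_partial is not None:
            on_partial(model.intermediateDecode(sctx))
    return model.finishStream(sctx)
```

In a live-microphone setting the same loop applies, with each captured buffer fed as it arrives.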

sampleRate()[source]

Return the sample rate expected by the model.

Returns

Sample rate.

Type

int
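Because feeding audio at the wrong rate silently degrades results, a guard built on sampleRate() can fail fast. The helper name is ours, not part of the API:

```python
def check_sample_rate(model, audio_rate):
    """Raise early if the audio's sample rate differs from the model's."""
    expected = model.sampleRate()
    if audio_rate != expected:
        raise ValueError(
            "audio is %d Hz but the model expects %d Hz" % (audio_rate, expected))
```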

stt(*args, **kwargs)[source]

Use the DeepSpeech model to perform Speech-To-Text.

Parameters
  • aBuffer (int array) – A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).

  • aBufferSize (int) – The number of samples in the audio signal.

Returns

The STT result.

Type

str
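A typical caller first decodes a 16-bit mono WAV file into a sample array. A sketch using only the standard library; the helper names are ours, and depending on the binding version aBuffer may need to be a NumPy int16 array rather than a plain sequence:

```python
import struct
import wave

def read_wav_samples(path):
    """Read a 16-bit mono WAV file; return (samples, count, rate)."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2 or w.getnchannels() != 1:
            raise ValueError("expected 16-bit mono audio")
        count = w.getnframes()
        # "<%dh" unpacks little-endian signed 16-bit samples.
        samples = struct.unpack("<%dh" % count, w.readframes(count))
        return samples, count, w.getframerate()

def transcribe_file(model, wav_path):
    samples, count, rate = read_wav_samples(wav_path)
    # rate must match what the model was trained on (see sampleRate()).
    return model.stt(samples, count)
```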

sttWithMetadata(*args, **kwargs)[source]

Use the DeepSpeech model to perform Speech-To-Text and output metadata about the results.

Parameters
  • aBuffer (int array) – A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).

  • aBufferSize (int) – The number of samples in the audio signal.

Returns

A struct of individual letters along with their timing information.

Type

Metadata()

Metadata

class Metadata[source]

Stores the entire CTC output as an array of character metadata objects

confidence()[source]

Approximate confidence value for this transcription. This is roughly the sum of the acoustic model logit values for each timestep/character that contributed to the creation of this transcription.

items()[source]

List of items

Returns

A list of MetadataItem() elements

Type

list

num_items()[source]

Size of the list of items

Returns

Size of the list of items

Type

int

MetadataItem

class MetadataItem[source]

Stores each individual character, along with its timing information

character()[source]

The character generated for transcription

start_time()[source]

Position of the character in seconds

timestep()[source]

Position of the character in units of 20ms
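The per-character accessors above are enough to rebuild the transcript and derive word-level timings from a Metadata result (as returned by sttWithMetadata() or finishStreamWithMetadata()). A sketch; the grouping logic is ours and works with any objects exposing character() and start_time():

```python
def transcript_text(items):
    """Concatenate per-character metadata back into the transcript string."""
    return "".join(item.character() for item in items)

def word_timings(items):
    """Group character metadata into (word, start_seconds) pairs."""
    words, chars, start = [], [], None
    for item in items:
        ch = item.character()
        if ch == " ":
            # A space closes the current word, if any.
            if chars:
                words.append(("".join(chars), start))
            chars, start = [], None
        else:
            if start is None:
                start = item.start_time()  # first character marks word start
            chars.append(ch)
    if chars:
        words.append(("".join(chars), start))
    return words
```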