Python

Model

class Model(*args, **kwargs)[source]

Class holding a DeepSpeech model

Parameters
  • aModelPath (str) – Path to model file to load

  • aBeamWidth (int) – Decoder beam width
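The constructor takes the model path and beam width positionally. A minimal loading sketch; the file name and the default beam width of 500 are illustrative choices, not values prescribed by this API:

```python
def load_model(model_path, beam_width=500):
    """Load a DeepSpeech model.

    beam_width=500 is an illustrative default, not mandated by this API;
    larger values trade decoding speed for accuracy.
    """
    if beam_width <= 0:
        raise ValueError("beam width must be a positive integer")
    # Import deferred so the argument check above runs even when the
    # deepspeech package is not installed.
    from deepspeech import Model
    return Model(model_path, beam_width)

# model = load_model("deepspeech-model.pbmm")  # hypothetical file name
```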

createStream()[source]

Create a new streaming inference state. The streaming state returned by this function can then be passed to feedAudioContent() and finishStream().

Returns

Object holding the stream

Throws

RuntimeError on error

enableDecoderWithLM(*args, **kwargs)[source]

Enable decoding using beam scoring with a KenLM language model.

Parameters
  • aLMPath (str) – The path to the language model binary file.

  • aTriePath (str) – The path to the trie file built from the same vocabulary as the language model binary.

  • aLMAlpha (float) – The alpha hyperparameter of the CTC decoder. Language model weight.

  • aLMBeta (float) – The beta hyperparameter of the CTC decoder. Word insertion weight.

Returns

Zero on success, non-zero on failure (invalid arguments).

Type

int
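A sketch of enabling the language model that checks the integer status code. The file names are placeholders, and the alpha/beta values are common starting points rather than values this API requires:

```python
LM_ALPHA = 0.75  # language model weight (illustrative starting point)
LM_BETA = 1.85   # word insertion weight (illustrative starting point)

def enable_lm(model, lm_path, trie_path, alpha=LM_ALPHA, beta=LM_BETA):
    """Enable KenLM beam scoring and raise if the call reports failure."""
    status = model.enableDecoderWithLM(lm_path, trie_path, alpha, beta)
    if status != 0:
        raise RuntimeError("enableDecoderWithLM failed with status %d" % status)

# enable_lm(model, "lm.binary", "trie")  # hypothetical file names
```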

feedAudioContent(*args, **kwargs)[source]

Feed audio samples to an ongoing streaming inference.

Parameters
  • aSctx (object) – A streaming state pointer returned by createStream().

  • aBuffer (int array) – An array of 16-bit, mono raw audio samples at the appropriate sample rate (matching what the model was trained on).

  • aBufferSize (int) – The number of samples in aBuffer.

finishStream(*args, **kwargs)[source]

Signal the end of an audio signal to an ongoing streaming inference and return the STT result over the whole audio signal.

Parameters

aSctx (object) – A streaming state pointer returned by createStream().

Returns

The STT result.

Type

str

finishStreamWithMetadata(*args, **kwargs)[source]

Signal the end of an audio signal to an ongoing streaming inference and return per-letter metadata.

Parameters

aSctx (object) – A streaming state pointer returned by createStream().

Returns

A struct of individual letters along with their timing information.

Type

Metadata()

intermediateDecode(*args, **kwargs)[source]

Compute the intermediate decoding of an ongoing streaming inference.

Parameters

aSctx (object) – A streaming state pointer returned by createStream().

Returns

The STT intermediate result.

Type

str
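The streaming methods above compose into a feed-and-decode loop: createStream(), repeated feedAudioContent(), optional intermediateDecode() for partial results, then finishStream(). A minimal sketch; the chunk size and the callback are illustrative choices, not part of the API:

```python
def stream_transcribe(model, samples, chunk_size=4096, on_partial=None):
    """Feed a sample buffer to a streaming state chunk by chunk.

    Optionally reports intermediate decodes via on_partial, then returns
    the final transcript from finishStream().
    """
    sctx = model.createStream()
    for i in range(0, len(samples), chunk_size):
        chunk = samples[i:i + chunk_size]
        model.feedAudioContent(sctx, chunk, len(chunk))
        if on_partial is not None:
            on_partial(model.intermediateDecode(sctx))
    return model.finishStream(sctx)
```

In a live-microphone setting the same loop applies, with each captured buffer fed as it arrives.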

sampleRate()[source]

Return the sample rate expected by the model.

Returns

Sample rate.

Type

int
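Because feeding audio at the wrong rate silently degrades results, a guard built on sampleRate() can fail fast. The helper name is ours, not part of the API:

```python
def check_sample_rate(model, audio_rate):
    """Raise early if the audio's sample rate differs from the model's."""
    expected = model.sampleRate()
    if audio_rate != expected:
        raise ValueError(
            "audio is %d Hz but the model expects %d Hz" % (audio_rate, expected))
```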

stt(*args, **kwargs)[source]

Use the DeepSpeech model to perform Speech-To-Text.

Parameters
  • aBuffer (int array) – A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).

  • aBufferSize (int) – The number of samples in the audio signal.

Returns

The STT result.

Type

str
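A typical caller first decodes a 16-bit mono WAV file into a sample array. A sketch using only the standard library; the helper names are ours, and depending on the binding version aBuffer may need to be a NumPy int16 array rather than a plain sequence:

```python
import struct
import wave

def read_wav_samples(path):
    """Read a 16-bit mono WAV file; return (samples, count, rate)."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2 or w.getnchannels() != 1:
            raise ValueError("expected 16-bit mono audio")
        count = w.getnframes()
        # "<%dh" unpacks little-endian signed 16-bit samples.
        samples = struct.unpack("<%dh" % count, w.readframes(count))
        return samples, count, w.getframerate()

def transcribe_file(model, wav_path):
    samples, count, rate = read_wav_samples(wav_path)
    # rate must match what the model was trained on (see sampleRate()).
    return model.stt(samples, count)
```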

sttWithMetadata(*args, **kwargs)[source]

Use the DeepSpeech model to perform Speech-To-Text and output metadata about the results.

Parameters
  • aBuffer (int array) – A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).

  • aBufferSize (int) – The number of samples in the audio signal.

Returns

A struct of individual letters along with their timing information.

Type

Metadata()

Metadata

class Metadata[source]

Stores the entire CTC output as an array of character metadata objects

confidence()[source]

Approximate confidence value for this transcription. This is roughly the sum of the acoustic model logit values for each timestep/character that contributed to the creation of this transcription.

items()[source]

List of items

Returns

A list of MetadataItem() elements

Type

list

num_items()[source]

Size of the list of items

Returns

Size of the list of items

Type

int

MetadataItem

class MetadataItem[source]

Stores each individual character, along with its timing information

character()[source]

The character generated for transcription

start_time()[source]

Position of the character in seconds

timestep()[source]

Position of the character in units of 20ms
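The per-character accessors above are enough to rebuild the transcript and derive word-level timings from a Metadata result (as returned by sttWithMetadata() or finishStreamWithMetadata()). A sketch; the grouping logic is ours and works with any objects exposing character() and start_time():

```python
def transcript_text(items):
    """Concatenate per-character metadata back into the transcript string."""
    return "".join(item.character() for item in items)

def word_timings(items):
    """Group character metadata into (word, start_seconds) pairs."""
    words, chars, start = [], [], None
    for item in items:
        ch = item.character()
        if ch == " ":
            # A space closes the current word, if any.
            if chars:
                words.append(("".join(chars), start))
            chars, start = [], None
        else:
            if start is None:
                start = item.start_time()  # first character marks word start
            chars.append(ch)
    if chars:
        words.append(("".join(chars), start))
    return words
```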