Python¶
Model¶
- class Model(*args, **kwargs)[source]¶
  Class holding a DeepSpeech model.
- createStream()[source]¶
  Create a new streaming inference state. The streaming state returned by this function can then be passed to feedAudioContent() and finishStream().
  - Returns: Object holding the stream.
  - Throws: RuntimeError on error.
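The createStream() → feedAudioContent() → finishStream() cycle described above can be sketched as follows. This is an illustration, not a definitive implementation: `model` is assumed to be a loaded Model instance, and the 1024-sample chunk size is arbitrary, not a recommended value.

```python
# Illustrative streaming loop; `model` is assumed to be an object exposing the
# createStream()/feedAudioContent()/finishStream() API described above.
CHUNK = 1024  # arbitrary chunk size, not a recommended value

def stream_transcribe(model, samples):
    """Feed 16-bit mono samples in chunks and return the final STT result."""
    sctx = model.createStream()  # new streaming inference state
    for i in range(0, len(samples), CHUNK):
        chunk = samples[i:i + CHUNK]
        model.feedAudioContent(sctx, chunk, len(chunk))
    return model.finishStream(sctx)  # ends the stream, returns the transcript
```

intermediateDecode() could be called with the same `sctx` inside the loop to obtain partial results before the stream is finished.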
- enableDecoderWithLM(*args, **kwargs)[source]¶
  Enable decoding using beam scoring with a KenLM language model.
  - Parameters:
    - aLMPath (str) – The path to the language model binary file.
    - aTriePath (str) – The path to the trie file built from the same vocabulary as the language model binary.
    - aLMAlpha (float) – The alpha hyperparameter of the CTC decoder; language model weight.
    - aLMBeta (float) – The beta hyperparameter of the CTC decoder; word insertion weight.
  - Returns: Zero on success, non-zero on failure (invalid arguments).
  - Return type: int
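A minimal sketch of wiring up the language model, checking the zero-on-success status. The default paths and the alpha/beta values below are placeholders for illustration, not tuned settings.

```python
# Hedged sketch: the file paths and hyperparameter values are placeholders.
def enable_lm(model, lm_path="lm.binary", trie_path="trie",
              alpha=0.75, beta=1.85):
    """Return True if the decoder accepted the language model (status 0)."""
    status = model.enableDecoderWithLM(lm_path, trie_path, alpha, beta)
    return status == 0
```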
- feedAudioContent(*args, **kwargs)[source]¶
  Feed audio samples to an ongoing streaming inference.
  - Parameters:
    - aSctx (object) – A streaming state pointer returned by createStream().
    - aBuffer (int array) – An array of 16-bit, mono raw audio samples at the appropriate sample rate (matching what the model was trained on).
    - aBufferSize (int) – The number of samples in aBuffer.
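One way to produce the 16-bit mono samples that feedAudioContent() expects, using only the standard library, is sketched below. The helper name is ours, not part of the API, and the sketch assumes a little-endian host (WAV sample data is little-endian).

```python
import array
import wave

def read_pcm16(path):
    """Read a mono 16-bit WAV file into (samples, sample_rate)."""
    with wave.open(path, "rb") as w:
        if w.getnchannels() != 1 or w.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono audio")
        rate = w.getframerate()
        frames = w.readframes(w.getnframes())
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(frames)
    return samples, rate
```

The returned sample rate should match what the model was trained on; resampling, if needed, is outside the scope of this sketch.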
- finishStream(*args, **kwargs)[source]¶
  Signal the end of an audio signal to an ongoing streaming inference and return the STT result over the whole audio signal.
  - Parameters:
    - aSctx (object) – A streaming state pointer returned by createStream().
  - Returns: The STT result.
  - Return type: str
- finishStreamWithMetadata(*args, **kwargs)[source]¶
  Signal the end of an audio signal to an ongoing streaming inference and return per-letter metadata.
  - Parameters:
    - aSctx (object) – A streaming state pointer returned by createStream().
  - Returns: A struct of individual letters along with their timing information.
  - Return type: Metadata
- intermediateDecode(*args, **kwargs)[source]¶
  Compute the intermediate decoding of an ongoing streaming inference.
  - Parameters:
    - aSctx (object) – A streaming state pointer returned by createStream().
  - Returns: The STT intermediate result.
  - Return type: str
- sttWithMetadata(*args, **kwargs)[source]¶
  Use the DeepSpeech model to perform Speech-To-Text and output metadata about the results.
  - Parameters:
    - aBuffer (int array) – A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).
    - aBufferSize (int) – The number of samples in the audio signal.
  - Returns: A struct of individual letters along with their timing information.
  - Return type: Metadata
Metadata¶
- class Metadata[source]¶
  Stores the entire CTC output as an array of character metadata objects.
- confidence()[source]¶
  Approximated confidence value for this transcription. This is roughly the sum of the acoustic model logit values for each timestep/character that contributed to the creation of this transcription.
- items()[source]¶
  List of items.
  - Returns: A list of MetadataItem() elements.
  - Return type: list
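Reassembling a plain transcript from the per-letter items can be sketched like this. Note the assumption: we take each element of the items list to carry a `character` attribute, which belongs to MetadataItem and is not documented above.

```python
# Hedged sketch: `character` is an assumed MetadataItem attribute.
def transcript_from(metadata_items):
    """Join per-letter metadata elements back into the transcribed text."""
    return "".join(item.character for item in metadata_items)
```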