Python¶
Model¶
- class Model(model_path)
Class holding a DeepSpeech model.
- Parameters
model_path (str) – Path to the model file to load.
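A minimal construction sketch; the model filename below is a placeholder, not a file shipped with this documentation:

    from deepspeech import Model

    MODEL_PATH = 'deepspeech-0.9.3-models.pbmm'  # hypothetical path to an exported model

    ds = Model(MODEL_PATH)
    print('beam width:', ds.beamWidth())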
- beamWidth()
Get the beam width value used by the model. If setModelBeamWidth was not called before, this returns the default value loaded from the model file.
- Returns
Beam width value used by the model.
- Type
int
- createStream()
Create a new streaming inference state. The streaming state returned by this function can then be passed to feedAudioContent() and finishStream().
- Returns
Stream object representing the newly created stream.
- Type
Stream
- Throws
RuntimeError on error
- disableExternalScorer()
Disable decoding using an external scorer.
- Returns
Zero on success, non-zero on failure.
- enableExternalScorer(scorer_path)
Enable decoding using an external scorer.
- Parameters
scorer_path (str) – The path to the external scorer file.
- Throws
RuntimeError on error
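A sketch of toggling an external scorer on an already loaded model; both filenames are placeholders:

    from deepspeech import Model

    ds = Model('deepspeech-0.9.3-models.pbmm')                 # hypothetical path
    ds.enableExternalScorer('deepspeech-0.9.3-models.scorer')  # hypothetical path

    # ... run inference with the scorer enabled ...

    ds.disableExternalScorer()  # continue decoding without the external scorer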
- stt(audio_buffer)
Use the DeepSpeech model to perform Speech-To-Text.
- Parameters
audio_buffer (numpy.int16 array) – A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).
- Returns
The STT result.
- Type
str
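A batch transcription sketch, assuming a 16-bit mono WAV recorded at the model's sample rate; the file paths are placeholders and sampleRate() is the model's expected-rate accessor from the same bindings:

    import wave

    import numpy as np
    from deepspeech import Model

    ds = Model('deepspeech-0.9.3-models.pbmm')        # hypothetical path

    with wave.open('audio/sample.wav', 'rb') as wav:  # hypothetical path
        assert wav.getframerate() == ds.sampleRate()  # sample rate must match the model
        audio = np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)

    print(ds.stt(audio))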
- sttWithMetadata(audio_buffer, num_results=1)
Use the DeepSpeech model to perform Speech-To-Text and return results including metadata.
- Parameters
audio_buffer (numpy.int16 array) – A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).
num_results (int) – Maximum number of candidate transcripts to return. The returned list might be smaller than this.
- Returns
Metadata object containing multiple candidate transcripts. Each transcript has per-token metadata including timing information.
- Type
Metadata
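A sketch of requesting several candidate transcripts; it assumes the Python bindings expose the metadata fields as attributes (metadata.transcripts, transcript.confidence, transcript.tokens, token.text), and the file paths are placeholders:

    import wave

    import numpy as np
    from deepspeech import Model

    ds = Model('deepspeech-0.9.3-models.pbmm')        # hypothetical path
    with wave.open('audio/sample.wav', 'rb') as wav:  # hypothetical path
        audio = np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)

    metadata = ds.sttWithMetadata(audio, num_results=3)
    for transcript in metadata.transcripts:
        text = ''.join(token.text for token in transcript.tokens)
        print(f'{transcript.confidence:.2f}  {text}')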
Stream¶
- class Stream(native_stream)
Class wrapping a DeepSpeech stream. The constructor cannot be called directly; use Model.createStream() instead.
- feedAudioContent(audio_buffer)
Feed audio samples to an ongoing streaming inference.
- Parameters
audio_buffer (numpy.int16 array) – A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).
- Throws
RuntimeError if the stream object is not valid
- finishStream()
Compute the final decoding of an ongoing streaming inference and return the result. Signals the end of an ongoing streaming inference. The underlying stream object must not be used after this method is called.
- Returns
The STT result.
- Type
str
- Throws
RuntimeError if the stream object is not valid
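A streaming sketch that feeds a WAV file in chunks and decodes once at the end; the file paths and chunk size are illustrative:

    import wave

    import numpy as np
    from deepspeech import Model

    ds = Model('deepspeech-0.9.3-models.pbmm')        # hypothetical path
    stream = ds.createStream()

    with wave.open('audio/sample.wav', 'rb') as wav:  # hypothetical path
        while True:
            frames = wav.readframes(1024)             # feed roughly 1024 frames at a time
            if not frames:
                break
            stream.feedAudioContent(np.frombuffer(frames, dtype=np.int16))

    print(stream.finishStream())  # the stream must not be reused after this call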
- finishStreamWithMetadata(num_results=1)
Compute the final decoding of an ongoing streaming inference and return results including metadata. Signals the end of an ongoing streaming inference. The underlying stream object must not be used after this method is called.
- Parameters
num_results (int) – Maximum number of candidate transcripts to return. The returned list might be smaller than this.
- Returns
Metadata object containing multiple candidate transcripts. Each transcript has per-token metadata including timing information.
- Type
Metadata
- Throws
RuntimeError if the stream object is not valid
- freeStream()
Destroy a streaming state without decoding the computed logits. This can be used if you no longer need the result of an ongoing streaming inference.
- Throws
RuntimeError if the stream object is not valid
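A sketch of abandoning a stream without decoding; the second of silence is stand-in audio and the model path is a placeholder:

    import numpy as np
    from deepspeech import Model

    ds = Model('deepspeech-0.9.3-models.pbmm')  # hypothetical path
    stream = ds.createStream()
    stream.feedAudioContent(np.zeros(16000, dtype=np.int16))  # stand-in: 1 s of silence at 16 kHz

    # The transcript turns out not to be needed: release the stream without decoding.
    stream.freeStream()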
- intermediateDecode()
Compute the intermediate decoding of an ongoing streaming inference.
- Returns
The STT intermediate result.
- Type
str
- Throws
RuntimeError if the stream object is not valid
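A sketch of polling partial results while audio is still being fed; it extends the finishStream() sketch above by printing a partial transcript after every chunk (paths and chunk size remain illustrative):

    import wave

    import numpy as np
    from deepspeech import Model

    ds = Model('deepspeech-0.9.3-models.pbmm')        # hypothetical path
    stream = ds.createStream()

    with wave.open('audio/sample.wav', 'rb') as wav:  # hypothetical path
        while True:
            frames = wav.readframes(1024)
            if not frames:
                break
            stream.feedAudioContent(np.frombuffer(frames, dtype=np.int16))
            print('partial:', stream.intermediateDecode())  # does not end the stream

    print('final:', stream.finishStream())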
- intermediateDecodeWithMetadata(num_results=1)
Compute the intermediate decoding of an ongoing streaming inference and return results including metadata.
- Parameters
num_results (int) – Maximum number of candidate transcripts to return. The returned list might be smaller than this.
- Returns
Metadata object containing multiple candidate transcripts. Each transcript has per-token metadata including timing information.
- Type
Metadata
- Throws
RuntimeError if the stream object is not valid
Metadata¶
CandidateTranscript¶
- class CandidateTranscript
Stores the entire CTC output as an array of character metadata objects.
- confidence()
Approximated confidence value for this transcription. This is roughly the sum of the acoustic model logit values for each timestep/character that contributed to the creation of this transcription.
- tokens()
List of tokens.
- Returns
A list of TokenMetadata elements.
- Type
list of TokenMetadata
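A sketch of inspecting the best candidate's per-token timing; attribute access on CandidateTranscript and TokenMetadata (confidence, tokens, text, start_time) is assumed from the Python bindings, and the file paths are placeholders:

    import wave

    import numpy as np
    from deepspeech import Model

    ds = Model('deepspeech-0.9.3-models.pbmm')        # hypothetical path
    with wave.open('audio/sample.wav', 'rb') as wav:  # hypothetical path
        audio = np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)

    best = ds.sttWithMetadata(audio).transcripts[0]
    print(f'confidence: {best.confidence:.2f}')
    for token in best.tokens:
        print(f'{token.start_time:6.2f}s  {token.text!r}')  # character and its start time in seconds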