Java
DeepSpeechModel
-
class DeepSpeechModel
Exposes a DeepSpeech model in Java.
Public Functions
-
org.deepspeech.libdeepspeech.DeepSpeechModel.DeepSpeechModel(String modelPath) An object providing an interface to a trained DeepSpeech model.
- Parameters
modelPath: The path to the frozen model graph.
- Exceptions
RuntimeException: on failure.
-
long org.deepspeech.libdeepspeech.DeepSpeechModel.beamWidth() Get the beam width value used by the model. If setBeamWidth() was not called before, returns the default value loaded from the model file.
- Return
Beam width value used by the model.
-
void org.deepspeech.libdeepspeech.DeepSpeechModel.setBeamWidth(long beamWidth) Set beam width value used by the model.
- Parameters
beamWidth: The beam width used by the model. A larger beam width value generates better results at the cost of decoding time.
- Exceptions
RuntimeException: on failure.
-
int org.deepspeech.libdeepspeech.DeepSpeechModel.sampleRate() Return the sample rate expected by the model.
- Return
Sample rate.
-
void org.deepspeech.libdeepspeech.DeepSpeechModel.freeModel() Frees associated resources and destroys model object.
-
void org.deepspeech.libdeepspeech.DeepSpeechModel.enableExternalScorer(String scorer) Enable decoding using an external scorer.
- Parameters
scorer: The path to the external scorer file.
- Exceptions
RuntimeException: on failure.
-
void org.deepspeech.libdeepspeech.DeepSpeechModel.disableExternalScorer() Disable decoding using an external scorer.
- Exceptions
RuntimeException: on failure.
-
void org.deepspeech.libdeepspeech.DeepSpeechModel.setScorerAlphaBeta(float alpha, float beta) Set the alpha and beta hyperparameters of the external scorer.
- Parameters
alpha: The alpha hyperparameter of the decoder. Language model weight.
beta: The beta hyperparameter of the decoder. Word insertion weight.
- Exceptions
RuntimeException: on failure.
-
Metadata org.deepspeech.libdeepspeech.DeepSpeechModel.sttWithMetadata(short [] buffer, int buffer_size, int num_results) Use the DeepSpeech model to perform Speech-To-Text and output metadata about the results.
- Return
Metadata struct containing multiple candidate transcripts. Each transcript has per-token metadata including timing information.
- Parameters
buffer: A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).
buffer_size: The number of samples in the audio signal.
num_results: Maximum number of candidate transcripts to return. Returned list might be smaller than this.
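The buffer parameter is an array of 16-bit samples, while audio read from a file or socket usually arrives as bytes. The following is a minimal sketch of that conversion, assuming the raw bytes are little-endian mono PCM (as in a typical WAV data chunk). The class name PcmBuffers and the method toSamples are hypothetical helpers and not part of the DeepSpeech API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of org.deepspeech.libdeepspeech.
final class PcmBuffers {
    private PcmBuffers() {}

    /**
     * Converts raw 16-bit mono PCM bytes into samples.
     * Assumption: the bytes are little-endian.
     */
    static short[] toSamples(byte[] pcmBytes) {
        if (pcmBytes.length % 2 != 0) {
            throw new IllegalArgumentException("16-bit PCM needs an even number of bytes");
        }
        short[] samples = new short[pcmBytes.length / 2];
        // View the bytes as little-endian shorts and copy them out in one call.
        ByteBuffer.wrap(pcmBytes)
                .order(ByteOrder.LITTLE_ENDIAN)
                .asShortBuffer()
                .get(samples);
        return samples;
    }
}
```

The resulting array would be passed as buffer and its length as buffer_size.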
-
DeepSpeechStreamingState org.deepspeech.libdeepspeech.DeepSpeechModel.createStream() Create a new streaming inference state. The streaming state returned by this function can then be passed to feedAudioContent() and finishStream().
- Return
An opaque object that represents the streaming state.
- Exceptions
RuntimeException: on failure.
-
void org.deepspeech.libdeepspeech.DeepSpeechModel.feedAudioContent(DeepSpeechStreamingState ctx, short [] buffer, int buffer_size) Feed audio samples to an ongoing streaming inference.
- Parameters
ctx: A streaming state pointer returned by createStream().
buffer: An array of 16-bit, mono raw audio samples at the appropriate sample rate (matching what the model was trained on).
buffer_size: The number of samples in buffer.
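Streaming callers typically feed audio in fixed-size pieces rather than all at once. The sketch below splits a sample array into chunks and hands each one to a callback. AudioChunker, feedInChunks, and the Consumer-based sink are hypothetical stand-ins so the example runs without the native library; in real use the callback would be expected to call feedAudioContent(ctx, chunk, chunk.length) on the model.

```java
import java.util.Arrays;
import java.util.function.Consumer;

// Hypothetical helper, not part of org.deepspeech.libdeepspeech.
final class AudioChunker {
    private AudioChunker() {}

    /**
     * Passes consecutive chunks of at most chunkSize samples to sink
     * and returns the number of chunks delivered.
     */
    static int feedInChunks(short[] samples, int chunkSize, Consumer<short[]> sink) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int chunks = 0;
        for (int start = 0; start < samples.length; start += chunkSize) {
            // The last chunk may be shorter than chunkSize.
            int end = start + Math.min(chunkSize, samples.length - start);
            sink.accept(Arrays.copyOfRange(samples, start, end));
            chunks++;
        }
        return chunks;
    }
}
```

Copying each chunk keeps the callback simple because it can always pass the full array length as buffer_size.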
-
String org.deepspeech.libdeepspeech.DeepSpeechModel.intermediateDecode(DeepSpeechStreamingState ctx) Compute the intermediate decoding of an ongoing streaming inference.
- Return
The STT intermediate result.
- Parameters
ctx: A streaming state pointer returned by createStream().
-
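Metadata org.deepspeech.libdeepspeech.DeepSpeechModel.intermediateDecodeWithMetadata(DeepSpeechStreamingState ctx, int num_results) Compute the intermediate decoding of an ongoing streaming inference and return the results including metadata.
- Return
Metadata struct containing multiple candidate transcripts. Each transcript has per-token metadata including timing information.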
- Parameters
ctx: A streaming state pointer returned by createStream().
num_results: Maximum number of candidate transcripts to return. Returned list might be smaller than this.
-
String org.deepspeech.libdeepspeech.DeepSpeechModel.finishStream(DeepSpeechStreamingState ctx) Compute the final decoding of an ongoing streaming inference and return the result. Signals the end of an ongoing streaming inference.
- Return
The STT result.
- Note
This method will free the state pointer (ctx).
- Parameters
ctx: A streaming state pointer returned by createStream().
-
Metadata org.deepspeech.libdeepspeech.DeepSpeechModel.finishStreamWithMetadata(DeepSpeechStreamingState ctx, int num_results) Compute the final decoding of an ongoing streaming inference and return the results including metadata. Signals the end of an ongoing streaming inference.
- Return
Metadata struct containing multiple candidate transcripts. Each transcript has per-token metadata including timing information.
- Note
This method will free the state pointer (ctx).
- Parameters
ctx: A streaming state pointer returned by createStream().
num_results: Maximum number of candidate transcripts to return. Returned list might be smaller than this.
-
void org.deepspeech.libdeepspeech.DeepSpeechModel.addHotWord(String word, float boost) Add a hot-word.
- Parameters
word: The hot-word to add.
boost: The boost value to apply to the hot-word.
- Exceptions
RuntimeException: on failure.
-
void org.deepspeech.libdeepspeech.DeepSpeechModel.eraseHotWord(String word) Erase a hot-word.
- Parameters
word: The hot-word to erase.
- Exceptions
RuntimeException: on failure.
-
void org.deepspeech.libdeepspeech.DeepSpeechModel.clearHotWords() Clear all hot-words.
- Exceptions
RuntimeException: on failure.
-
Metadata
-
class Metadata
An array of CandidateTranscript objects computed by the model.
Public Functions
-
long org.deepspeech.libdeepspeech.Metadata.getNumTranscripts() Size of the transcripts array
-
CandidateTranscript org.deepspeech.libdeepspeech.Metadata.getTranscript(int i) Retrieve one CandidateTranscript element
- Return
The CandidateTranscript requested or null
- Parameters
i: Array index of the CandidateTranscript to get
-
CandidateTranscript
-
class CandidateTranscript
A single transcript computed by the model, including a confidence value and the metadata for its constituent tokens.
Public Functions
-
long org.deepspeech.libdeepspeech.CandidateTranscript.getNumTokens() Size of the tokens array
-
double org.deepspeech.libdeepspeech.CandidateTranscript.getConfidence() Approximated confidence value for this transcript. This is roughly the
sum of the acoustic model logit values for each timestep/character that
contributed to the creation of this transcript.
-
TokenMetadata org.deepspeech.libdeepspeech.CandidateTranscript.getToken(int i) Retrieve one TokenMetadata element
- Return
The TokenMetadata requested or null
- Parameters
i: Array index of the TokenMetadata to get
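A transcript is exposed as a token count plus an indexed accessor, so recovering the text means walking the tokens. The sketch below concatenates the token texts in order, skipping any null token. The nested Token and Transcript interfaces are hypothetical stand-ins that mirror getText(), getNumTokens() and getToken(int) so the example compiles without the native library. It also assumes that token texts carry their own spacing, which this reference does not state.

```java
// Hypothetical helper, not part of org.deepspeech.libdeepspeech.
final class TranscriptText {
    private TranscriptText() {}

    /** Stand-in mirroring TokenMetadata.getText(). */
    interface Token {
        String getText();
    }

    /** Stand-in mirroring CandidateTranscript.getNumTokens() and getToken(int). */
    interface Transcript {
        long getNumTokens();
        Token getToken(int i);
    }

    /** Concatenates the text of every token in order. */
    static String join(Transcript transcript) {
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < transcript.getNumTokens(); i++) {
            Token token = transcript.getToken(i);
            // getToken is documented to return null for an invalid request.
            if (token != null) {
                text.append(token.getText());
            }
        }
        return text.toString();
    }
}
```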
-
TokenMetadata
-
class TokenMetadata
Stores text of an individual token, along with its timing information
Public Functions
-
String org.deepspeech.libdeepspeech.TokenMetadata.getText() The text corresponding to this token
-
long org.deepspeech.libdeepspeech.TokenMetadata.getTimestep() Position of the token in units of 20ms
-
float org.deepspeech.libdeepspeech.TokenMetadata.getStartTime() Position of the token in seconds
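Since getTimestep() counts in units of 20 ms, converting a timestep to seconds is a single multiplication. The sketch below shows that arithmetic. TokenTiming and timestepToSeconds are hypothetical names, and whether the result matches getStartTime() exactly is not stated in this reference.

```java
// Hypothetical helper, not part of org.deepspeech.libdeepspeech.
final class TokenTiming {
    private TokenTiming() {}

    /** One timestep is documented as 20 ms. */
    static final double SECONDS_PER_TIMESTEP = 0.020;

    /** Converts a timestep index to an offset in seconds. */
    static double timestepToSeconds(long timestep) {
        return timestep * SECONDS_PER_TIMESTEP;
    }
}
```

For example, timestep 50 corresponds to roughly one second into the audio.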
-