Java
DeepSpeechModel
-
class DeepSpeechModel
Exposes a DeepSpeech model in Java.
Public Functions
-
org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel.DeepSpeechModel(String modelPath, int beam_width)
An object providing an interface to a trained DeepSpeech model.
- Parameters
modelPath
: The path to the frozen model graph.
beam_width
: The beam width used by the decoder. A larger beam width generally yields better results at the cost of longer decoding time.
-
int org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel.sampleRate()
Return the sample rate expected by the model.
- Return
Sample rate.
-
void org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel.freeModel()
Frees associated resources and destroys model object.
-
void org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel.enableDecoderWihLM(String lm, String trie, float lm_alpha, float lm_beta)
Enable decoding using beam scoring with a KenLM language model.
- Return
Zero on success, non-zero on failure (invalid arguments).
- Parameters
lm
: The path to the language model binary file.
trie
: The path to the trie file built from the same vocabulary as the language model binary.
lm_alpha
: The alpha hyperparameter of the CTC decoder (language model weight).
lm_beta
: The beta hyperparameter of the CTC decoder (word insertion weight).
-
Metadata org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel.sttWithMetadata(short [] buffer, int buffer_size)
Use the DeepSpeech model to perform Speech-To-Text and output metadata about the results.
- Return
Outputs a Metadata object of individual letters along with their timing information.
- Parameters
buffer
: A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).
buffer_size
: The number of samples in the audio signal.
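The buffer parameter expects 16-bit samples as a short array, while audio read from a file or socket usually arrives as bytes. The following is a minimal sketch of that conversion, not part of the DeepSpeech API: the class name PcmConversion and the method toSamples are hypothetical, and little-endian byte order is an assumption (it is the usual layout of WAV PCM data, but the documentation above does not specify it).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of the DeepSpeech bindings.
final class PcmConversion {
    /**
     * Converts raw 16-bit mono PCM bytes into samples.
     * Assumes little-endian byte order (an assumption, typical for WAV data).
     */
    static short[] toSamples(byte[] pcmBytes) {
        if (pcmBytes.length % 2 != 0) {
            throw new IllegalArgumentException("16-bit PCM needs an even number of bytes");
        }
        short[] samples = new short[pcmBytes.length / 2];
        // View the bytes as little-endian shorts and copy them into the array.
        ByteBuffer.wrap(pcmBytes)
                .order(ByteOrder.LITTLE_ENDIAN)
                .asShortBuffer()
                .get(samples);
        return samples;
    }
}
```

The resulting array and its length could then be passed as buffer and buffer_size.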
-
DeepSpeechStreamingState org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel.createStream()
Create a new streaming inference state. The streaming state returned by this function can then be passed to feedAudioContent() and finishStream().
- Return
An opaque object that represents the streaming state.
-
void org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel.feedAudioContent(DeepSpeechStreamingState ctx, short [] buffer, int buffer_size)
Feed audio samples to an ongoing streaming inference.
- Parameters
ctx
: A streaming state pointer returned by createStream().
buffer
: An array of 16-bit, mono raw audio samples at the appropriate sample rate (matching what the model was trained on).
buffer_size
: The number of samples in buffer.
-
String org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel.intermediateDecode(DeepSpeechStreamingState ctx)
Compute the intermediate decoding of an ongoing streaming inference.
- Return
The STT intermediate result.
- Parameters
ctx
: A streaming state pointer returned by createStream().
-
String org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel.finishStream(DeepSpeechStreamingState ctx)
Signal the end of an audio signal to an ongoing streaming inference and return the STT result over the whole audio signal.
- Return
The STT result.
- Note
This method will free the state pointer (ctx).
- Parameters
ctx
: A streaming state pointer returned by createStream().
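The streaming methods are meant to be called in a fixed order: createStream() once, feedAudioContent() repeatedly, optionally intermediateDecode(), and finally finishStream(), after which the state must not be reused. The sketch below illustrates that order without depending on the native library. The interface StreamingModel is a hypothetical stand-in that mirrors the documented method names, its type parameter S plays the role of DeepSpeechStreamingState, and CountingModel is a fake used only to make the example runnable.

```java
import java.util.Arrays;

// Hypothetical sketch of the streaming call order; not the real bindings.
final class StreamingSketch {
    /** Stand-in mirroring the documented streaming methods (assumption). */
    interface StreamingModel<S> {
        S createStream();
        void feedAudioContent(S ctx, short[] buffer, int bufferSize);
        String intermediateDecode(S ctx);
        String finishStream(S ctx);
    }

    /** Feeds samples in chunks of at most chunkSize, then finishes the stream. */
    static <S> String transcribe(StreamingModel<S> model, short[] samples, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        S ctx = model.createStream();
        for (int off = 0; off < samples.length; off += chunkSize) {
            int end = Math.min(off + chunkSize, samples.length);
            short[] chunk = Arrays.copyOfRange(samples, off, end);
            model.feedAudioContent(ctx, chunk, chunk.length);
        }
        // finishStream frees the state, so ctx is not used after this call.
        return model.finishStream(ctx);
    }

    /** Fake model for demonstration: counts calls and samples instead of decoding. */
    static final class CountingModel implements StreamingModel<int[]> {
        public int[] createStream() { return new int[2]; }
        public void feedAudioContent(int[] ctx, short[] buffer, int bufferSize) {
            ctx[0]++;            // number of chunks fed
            ctx[1] += bufferSize; // number of samples fed
        }
        public String intermediateDecode(int[] ctx) { return ctx[1] + " samples so far"; }
        public String finishStream(int[] ctx) { return ctx[0] + " chunks, " + ctx[1] + " samples"; }
    }
}
```

With the real bindings, the same loop would presumably call the DeepSpeechModel methods directly; the chunk size is a free choice and is not prescribed by the documentation above.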
-
Metadata org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel.finishStreamWithMetadata(DeepSpeechStreamingState ctx)
Signal the end of an audio signal to an ongoing streaming inference and return per-letter metadata.
- Return
Outputs a Metadata object of individual letters along with their timing information.
- Note
This method will free the state pointer (ctx).
- Parameters
ctx
: A streaming state pointer returned by createStream().
-
Metadata
-
class Metadata
Stores the entire CTC output as an array of character metadata objects.
Public Functions
-
MetadataItem org.mozilla.deepspeech.libdeepspeech.Metadata.getItems()
List of items
-
int org.mozilla.deepspeech.libdeepspeech.Metadata.getNum_items()
Size of the list of items
-
MetadataItem org.mozilla.deepspeech.libdeepspeech.Metadata.getItem(int i)
Retrieve one MetadataItem element
- Return
The MetadataItem requested or null
- Parameters
i
: Array index of the MetadataItem to get
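Because Metadata exposes one item per character, a common follow-up step is to group characters into words with a start time each. The sketch below shows one way to do this under stated assumptions: Item is a hypothetical stand-in holding the values that would be read through getCharacter() and getStart_time(), and words are assumed to be separated by a single space character.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the DeepSpeech bindings.
final class WordTimings {
    /** Stand-in for one character and its start time in seconds (assumption). */
    static final class Item {
        final String character;
        final float startTime;
        Item(String character, float startTime) {
            this.character = character;
            this.startTime = startTime;
        }
    }

    /** A word and the start time of its first character. */
    static final class Word {
        final String text;
        final float startTime;
        Word(String text, float startTime) {
            this.text = text;
            this.startTime = startTime;
        }
    }

    /** Groups characters into words, splitting on the space character. */
    static List<Word> words(List<Item> items) {
        List<Word> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        float wordStart = 0f;
        for (Item item : items) {
            if (item.character.equals(" ")) {
                // A space closes the word in progress, if any.
                if (current.length() > 0) {
                    words.add(new Word(current.toString(), wordStart));
                    current.setLength(0);
                }
            } else {
                if (current.length() == 0) {
                    wordStart = item.startTime; // first character of a new word
                }
                current.append(item.character);
            }
        }
        if (current.length() > 0) {
            words.add(new Word(current.toString(), wordStart));
        }
        return words;
    }
}
```

With the real classes, the input list could be built by looping from 0 to getNum_items() and calling getItem(i), skipping any null result.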
-
MetadataItem
-
class MetadataItem
Stores each individual character, along with its timing information.
Public Functions
-
String org.mozilla.deepspeech.libdeepspeech.MetadataItem.getCharacter()
The character generated for transcription
-
int org.mozilla.deepspeech.libdeepspeech.MetadataItem.getTimestep()
Position of the character in units of 20ms
-
float org.mozilla.deepspeech.libdeepspeech.MetadataItem.getStart_time()
Position of the character in seconds
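Since getTimestep() is expressed in units of 20 ms, it can be converted to milliseconds or seconds with simple arithmetic. The helper below is a hypothetical sketch of that conversion; whether the result matches getStart_time() exactly is not stated in this reference, so treat it as an approximation.

```java
// Hypothetical helper, not part of the DeepSpeech bindings.
final class TimestepConversion {
    /** Duration of one timestep, as documented for getTimestep(). */
    static final int MILLIS_PER_TIMESTEP = 20;

    /** Converts a timestep index to milliseconds. */
    static long toMillis(int timestep) {
        return (long) timestep * MILLIS_PER_TIMESTEP;
    }

    /** Converts a timestep index to seconds. */
    static double toSeconds(int timestep) {
        return toMillis(timestep) / 1000.0;
    }
}
```

For example, a timestep of 50 corresponds to 1000 ms, that is, one second into the audio.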
-