Java

DeepSpeechModel

class DeepSpeechModel

Exposes a DeepSpeech model in Java.

Public Functions

org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel.DeepSpeechModel(String modelPath, int beam_width)

An object providing an interface to a trained DeepSpeech model.

Parameters
  • modelPath: The path to the frozen model graph.

  • beam_width: The beam width used by the decoder. A larger beam width generates better results at the cost of longer decoding time.

int org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel.sampleRate()

Return the sample rate expected by the model.

Return

Sample rate.
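Because input audio has to match the rate reported by sampleRate(), it can help to check a file's format before running inference. The following is a minimal sketch, not part of the DeepSpeech API: the class WavFormatCheck and its method are hypothetical names, and it assumes a canonical PCM WAV header in which the "fmt " chunk directly follows the RIFF/WAVE header, so that the channel count, sample rate and bit depth sit at byte offsets 22, 24 and 34.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper; not part of libdeepspeech. */
final class WavFormatCheck {
    private WavFormatCheck() {}

    /**
     * Returns true if the header describes 16-bit mono audio at the expected
     * sample rate. Only the format fields are inspected; the RIFF/WAVE magic
     * bytes are not validated.
     * Assumption: canonical layout with the "fmt " chunk at its usual offset.
     */
    static boolean matches(byte[] header, int expectedSampleRate) {
        if (header.length < 44) {
            return false; // too short to hold a canonical header
        }
        ByteBuffer b = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        int channels = b.getShort(22);      // number of channels
        int sampleRate = b.getInt(24);      // samples per second
        int bitsPerSample = b.getShort(34); // bit depth
        return channels == 1 && bitsPerSample == 16 && sampleRate == expectedSampleRate;
    }
}
```

With a loaded model, the expected rate would come from the method above, for example WavFormatCheck.matches(header, model.sampleRate()).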

void org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel.freeModel()

Frees associated resources and destroys model object.

void org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel.enableDecoderWihLM(String lm, String trie, float lm_alpha, float lm_beta)

Enable decoding using beam scoring with a KenLM language model.

Return

Zero on success, non-zero on failure (invalid arguments).

Parameters
  • lm: The path to the language model binary file.

  • trie: The path to the trie file built from the same vocabulary as the language model binary.

  • lm_alpha: The alpha hyperparameter of the CTC decoder. Language Model weight.

  • lm_beta: The beta hyperparameter of the CTC decoder. Word insertion weight.

Metadata org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel.sttWithMetadata(short [] buffer, int buffer_size)

Use the DeepSpeech model to perform Speech-To-Text and output metadata about the results.

Return

Outputs a Metadata object of individual letters along with their timing information.

Parameters
  • buffer: A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).

  • buffer_size: The number of samples in the audio signal.
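Since buffer holds 16-bit samples and buffer_size counts samples rather than bytes, raw PCM bytes need to be converted to a short[] first. The sketch below shows one way to do that; PcmBuffers is a hypothetical helper name, and the little-endian byte order is an assumption (it is the usual order in WAV files, but other sources may differ).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper; not part of libdeepspeech. */
final class PcmBuffers {
    private PcmBuffers() {}

    /**
     * Converts raw 16-bit PCM bytes into samples.
     * Assumption: the bytes are little-endian.
     */
    static short[] toSamples(byte[] pcmBytes) {
        if (pcmBytes.length % 2 != 0) {
            throw new IllegalArgumentException("16-bit PCM needs an even number of bytes");
        }
        short[] samples = new short[pcmBytes.length / 2];
        // View the bytes as little-endian shorts and copy them out in one call.
        ByteBuffer.wrap(pcmBytes)
                .order(ByteOrder.LITTLE_ENDIAN)
                .asShortBuffer()
                .get(samples);
        return samples;
    }
}
```

The length of the returned array is the value to pass as buffer_size, that is samples.length and not the byte count.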

DeepSpeechStreamingState org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel.createStream()

Create a new streaming inference state. The streaming state returned by this function can then be passed to feedAudioContent() and finishStream().

Return

An opaque object that represents the streaming state.

void org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel.feedAudioContent(DeepSpeechStreamingState ctx, short [] buffer, int buffer_size)

Feed audio samples to an ongoing streaming inference.

Parameters
  • ctx: A streaming state pointer returned by createStream().

  • buffer: An array of 16-bit, mono raw audio samples at the appropriate sample rate (matching what the model was trained on).

  • buffer_size: The number of samples in buffer.
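The streaming calls documented in this section are typically used together: create a state, feed audio in chunks, then finish the stream. The sketch below illustrates that sequence under stated assumptions. StreamingApi is a hypothetical stand-in interface that mirrors the documented method signatures so the example compiles without the native library, the type parameter S plays the role of DeepSpeechStreamingState, and the chunk size is an arbitrary caller choice.

```java
import java.util.Arrays;

/** Hypothetical stand-in mirroring the documented streaming methods. */
interface StreamingApi<S> {
    S createStream();
    void feedAudioContent(S ctx, short[] buffer, int bufferSize);
    String finishStream(S ctx);
}

final class StreamingSketch {
    private StreamingSketch() {}

    /** Feeds audio in chunks of at most chunkSize samples, then finishes the stream. */
    static <S> String transcribe(StreamingApi<S> model, short[] audio, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        S ctx = model.createStream();
        for (int offset = 0; offset < audio.length; offset += chunkSize) {
            int end = Math.min(offset + chunkSize, audio.length);
            short[] chunk = Arrays.copyOfRange(audio, offset, end);
            // bufferSize is the number of samples in this chunk.
            model.feedAudioContent(ctx, chunk, chunk.length);
        }
        // finishStream() frees the state, so ctx must not be reused afterwards.
        return model.finishStream(ctx);
    }
}
```

Copying each chunk keeps the sketch simple; the last chunk is shorter than chunkSize whenever the audio length is not a multiple of it.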

String org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel.intermediateDecode(DeepSpeechStreamingState ctx)

Compute the intermediate decoding of an ongoing streaming inference.

Return

The STT intermediate result.

Parameters
  • ctx: A streaming state pointer returned by createStream().

String org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel.finishStream(DeepSpeechStreamingState ctx)

Signal the end of an audio signal to an ongoing streaming inference and return the STT result over the whole audio signal.

Return

The STT result.

Note

This method will free the state pointer (ctx).

Parameters
  • ctx: A streaming state pointer returned by createStream().

Metadata org.mozilla.deepspeech.libdeepspeech.DeepSpeechModel.finishStreamWithMetadata(DeepSpeechStreamingState ctx)

Signal the end of an audio signal to an ongoing streaming inference and return per-letter metadata.

Return

Outputs a Metadata object of individual letters along with their timing information.

Note

This method will free the state pointer (ctx).

Parameters
  • ctx: A streaming state pointer returned by createStream().

Metadata

class Metadata

Stores the entire CTC output as an array of character metadata objects

Public Functions

MetadataItem org.mozilla.deepspeech.libdeepspeech.Metadata.getItems()

List of items

int org.mozilla.deepspeech.libdeepspeech.Metadata.getNum_items()

Size of the list of items

MetadataItem org.mozilla.deepspeech.libdeepspeech.Metadata.getItem(int i)

Retrieve one MetadataItem element

Return

The MetadataItem requested or null

Parameters
  • i: The index of the item to retrieve.

MetadataItem

class MetadataItem

Stores each individual character, along with its timing information

Public Functions

String org.mozilla.deepspeech.libdeepspeech.MetadataItem.getCharacter()

The character generated for transcription

int org.mozilla.deepspeech.libdeepspeech.MetadataItem.getTimestep()

Position of the character in units of 20ms
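Given the stated unit of 20 ms per timestep, a timestep can be converted to seconds with a single multiplication. The sketch below is plain arithmetic on that unit; Timesteps is a hypothetical helper name, and the example makes no claim about how the library itself computes its time values.

```java
/** Hypothetical helper; not part of libdeepspeech. */
final class Timesteps {
    /** Duration of one timestep in seconds, per the "units of 20 ms" description. */
    static final float SECONDS_PER_TIMESTEP = 0.02f;

    private Timesteps() {}

    /** Converts a timestep index into an offset in seconds. */
    static float toSeconds(int timestep) {
        return timestep * SECONDS_PER_TIMESTEP;
    }
}
```

For example, a timestep of 50 corresponds to roughly one second into the audio.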

float org.mozilla.deepspeech.libdeepspeech.MetadataItem.getStart_time()

Position of the character in seconds
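The accessors above can be combined to rebuild the transcript and to estimate where each word starts. The following is a minimal sketch under stated assumptions: ItemView and MetadataView are hypothetical stand-in interfaces that mirror the documented getters so the example compiles without the native library, and words are assumed to be separated by a single space character.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the MetadataItem accessors. */
interface ItemView {
    String getCharacter();
    float getStart_time();
}

/** Hypothetical stand-in for the Metadata accessors. */
interface MetadataView {
    int getNum_items();
    ItemView getItem(int i);
}

final class TranscriptSketch {
    private TranscriptSketch() {}

    /** Joins every character into the full transcript, skipping null items. */
    static String transcript(MetadataView metadata) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < metadata.getNum_items(); i++) {
            ItemView item = metadata.getItem(i);
            if (item != null) {
                sb.append(item.getCharacter());
            }
        }
        return sb.toString();
    }

    /**
     * Returns the start time of each word, taken from its first character.
     * Assumption: words are separated by a single space character.
     */
    static List<Float> wordStartTimes(MetadataView metadata) {
        List<Float> starts = new ArrayList<>();
        boolean inWord = false;
        for (int i = 0; i < metadata.getNum_items(); i++) {
            ItemView item = metadata.getItem(i);
            if (item == null) {
                continue;
            }
            boolean isSpace = " ".equals(item.getCharacter());
            if (!isSpace && !inWord) {
                starts.add(item.getStart_time()); // first character of a new word
            }
            inWord = !isSpace;
        }
        return starts;
    }
}
```

The null check reflects the documented behaviour of getItem(), which may return null for a requested element.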