JavaScript (NodeJS / ElectronJS)

Model

class Model(aModelPath, aBeamWidth)

An object providing an interface to a trained DeepSpeech model.

Arguments
  • aModelPath (string) – The path to the frozen model graph.

  • aBeamWidth (number) – The beam width used by the decoder. A larger beam width generates better results at the cost of decoding time.

Throws

on error
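
Constructing a model might look like the following sketch. The `deepspeech` npm package name and the model path are assumptions (adjust to your installation); the require is guarded so the snippet is a no-op when the binding is absent.

```javascript
// Minimal sketch, assuming the binding is installed as the 'deepspeech'
// npm package and a frozen graph exists at MODEL_PATH (both assumptions).
const MODEL_PATH = 'models/output_graph.pbmm';
const BEAM_WIDTH = 500; // wider beam: better results, slower decoding

let Ds = null;
try { Ds = require('deepspeech'); } catch (e) { /* binding not installed */ }

if (Ds) {
  // The constructor throws on error, e.g. if the graph file is missing.
  const model = new Ds.Model(MODEL_PATH, BEAM_WIDTH);
  console.log('expected sample rate:', model.sampleRate());
}
```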

Model.createStream()

Create a new streaming inference state. The streaming state returned by this function can then be passed to Model.feedAudioContent() and Model.finishStream().

Throws

on error

Returns

object – an opaque object that represents the streaming state.
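
The full streaming round trip, as a sketch: create the state, feed chunks, optionally peek at partial results, then finish. Package name and model path are assumptions; the require guard makes the snippet a no-op without the binding.

```javascript
// Sketch of the streaming workflow described above.
const CHUNK_BYTES = 320; // 160 samples of 16-bit audio (10 ms at 16 kHz)

let Ds = null;
try { Ds = require('deepspeech'); } catch (e) { /* binding not installed */ }

if (Ds) {
  const model = new Ds.Model('models/output_graph.pbmm', 500);
  const stream = model.createStream();

  // Feed raw 16-bit mono PCM chunks as they arrive.
  const chunk = Buffer.alloc(CHUNK_BYTES); // silence, for illustration
  model.feedAudioContent(stream, chunk, chunk.length / 2);

  // Peek at the transcript so far without ending the stream.
  console.log('partial:', model.intermediateDecode(stream));

  // finishStream() returns the final transcript and frees the state.
  console.log('final:', model.finishStream(stream));
}
```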

Model.enableDecoderWithLM(aLMPath, aTriePath, aLMAlpha, aLMBeta)

Enable decoding using beam scoring with a KenLM language model.

Arguments
  • aLMPath (string) – The path to the language model binary file.

  • aTriePath (string) – The path to the trie file built from the same vocabulary as the language model binary.

  • aLMAlpha (float) – The alpha hyperparameter of the CTC decoder. Language Model weight.

  • aLMBeta (float) – The beta hyperparameter of the CTC decoder. Word insertion weight.

Returns

number – Zero on success, non-zero on failure (invalid arguments).
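
A sketch of enabling the KenLM scorer. The file paths are assumptions, and the alpha/beta values mirror the project's example clients; treat them as starting points to tune on held-out data, not fixed defaults.

```javascript
// Sketch: enable KenLM beam scoring (paths and values are illustrative).
const LM_ALPHA = 0.75; // language model weight
const LM_BETA = 1.85;  // word insertion weight

let Ds = null;
try { Ds = require('deepspeech'); } catch (e) { /* binding not installed */ }

if (Ds) {
  const model = new Ds.Model('models/output_graph.pbmm', 500);
  const status = model.enableDecoderWithLM(
    'models/lm.binary', // KenLM binary
    'models/trie',      // trie built from the same vocabulary
    LM_ALPHA,
    LM_BETA
  );
  if (status !== 0) {
    console.error('enableDecoderWithLM failed with code', status);
  }
}
```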

Model.feedAudioContent(aSctx, aBuffer, aBufferSize)

Feed audio samples to an ongoing streaming inference.

Arguments
  • aSctx (object) – A streaming state returned by Model.createStream().

  • aBuffer (buffer) – An array of 16-bit, mono raw audio samples at the appropriate sample rate (matching what the model was trained on).

  • aBufferSize (number) – The number of samples in @param aBuffer.
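
Note that @param aBufferSize counts samples, not bytes: each 16-bit mono sample occupies two bytes, so with a Node Buffer of raw PCM the count is half the byte length. A small helper (the name is ours, not part of the API) makes this explicit:

```javascript
// aBufferSize is a sample count; 16-bit samples are 2 bytes each, so
// divide the raw PCM byte length by two. (Helper name is ours.)
function sampleCount(pcmBuffer) {
  return pcmBuffer.length / 2;
}

const chunk = Buffer.alloc(320); // 320 bytes of raw 16-bit mono PCM
console.log(sampleCount(chunk)); // 160 samples (10 ms at 16 kHz)
```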

Model.finishStream(aSctx)

Signal the end of an audio signal to an ongoing streaming inference and return the STT result over the whole audio signal.

Arguments
  • aSctx (object) – A streaming state returned by Model.createStream().

Returns

string – The STT result. This method will free the state (@param aSctx).

Model.finishStreamWithMetadata(aSctx)

Signal the end of an audio signal to an ongoing streaming inference and return per-letter metadata.

Arguments
  • aSctx (object) – A streaming state returned by Model.createStream().

Returns

object – Outputs a Metadata() struct of individual letters along with their timing information. The user is responsible for freeing Metadata by calling FreeMetadata(). This method will free the state pointer (@param aSctx).

Model.intermediateDecode(aSctx)

Compute the intermediate decoding of an ongoing streaming inference.

Arguments
  • aSctx (object) – A streaming state returned by Model.createStream().

Returns

string – The STT intermediate result.

Model.sampleRate()

Return the sample rate expected by the model.

Returns

number – Sample rate.

Model.stt(aBuffer, aBufferSize)

Use the DeepSpeech model to perform Speech-To-Text.

Arguments
  • aBuffer (object) – A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).

  • aBufferSize (number) – The number of samples in the audio signal.

Returns

string – The STT result. Returns undefined on error.

Model.sttWithMetadata(aBuffer, aBufferSize)

Use the DeepSpeech model to perform Speech-To-Text and output metadata about the results.

Arguments
  • aBuffer (object) – A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).

  • aBufferSize (number) – The number of samples in the audio signal.

Returns

object – Outputs a Metadata() struct of individual letters along with their timing information. The user is responsible for freeing Metadata by calling FreeMetadata(). Returns undefined on error.
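
The shape described above (a confidence plus a list of items, each with a character and timing) can be consumed like this. A hand-built stand-in object is used here, since the real struct comes from the native binding; the stand-in exposes items as a plain array, so substitute whatever accessor the binding provides.

```javascript
// Stand-in object mirroring the Metadata shape described in this section
// (not produced by the real binding here).
const metadata = {
  confidence: -4.2,
  items: [
    { character: 'h', timestep: 10, start_time: 0.2 },
    { character: 'i', timestep: 14, start_time: 0.28 },
  ],
};

// Reassemble the transcript and collect per-character start times.
const transcript = metadata.items.map((it) => it.character).join('');
const startTimes = metadata.items.map((it) => it.start_time);
console.log(transcript);  // "hi"
console.log(startTimes);  // [ 0.2, 0.28 ]
```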

Module exported methods

FreeModel(model)

Frees associated resources and destroys the model object.

Arguments
  • model (object) – A model object returned by Model().

FreeStream(stream)

Destroy a streaming state without decoding the computed logits. This can be used if you no longer need the result of an ongoing streaming inference and don’t want to perform a costly decode operation.

Arguments
  • stream (object) – A streaming state returned by Model.createStream().

FreeMetadata(metadata)

Free memory allocated for metadata information.

Arguments
  • metadata (object) – A Metadata object returned by Model.sttWithMetadata() or Model.finishStreamWithMetadata().

printVersions()

Print version of this library and of the linked TensorFlow library on standard output.

Metadata

class Metadata()

Stores the entire CTC output as an array of character metadata objects.

Metadata.confidence()

Approximated confidence value for this transcription. This is roughly the sum of the acoustic model logit values for each timestep/character that contributed to the creation of this transcription.

Returns

float – Confidence value

Metadata.items()

List of items

Returns

array – List of MetadataItem()

Metadata.num_items()

Size of the list of items

Returns

int – Number of items

MetadataItem

class MetadataItem()

Stores each individual character, along with its timing information

MetadataItem.character()

The character generated for transcription

Returns

string – The character generated

MetadataItem.start_time()

Position of the character in seconds

Returns

float – The position of the character

MetadataItem.timestep()

Position of the character in units of 20ms

Returns

int – The position of the character
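
Since one timestep is a 20 ms frame, converting to seconds is a single multiplication. A tiny helper (the name is ours, not part of the API) makes the relationship between timestep and start_time explicit:

```javascript
// One timestep = 20 ms, so seconds = timestep * 0.02.
// (Helper name is ours; the binding exposes the raw timestep.)
function timestepToSeconds(timestep) {
  return timestep * 0.02;
}

console.log(timestepToSeconds(50)); // 50 frames ≈ 1 second into the audio
```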