JavaScript (NodeJS / ElectronJS)¶
Model¶
-
class
Model
(aModelPath, aBeamWidth)¶ An object providing an interface to a trained DeepSpeech model.
- Arguments
aModelPath (string) – The path to the frozen model graph.
aBeamWidth (number) – The beam width used by the decoder. A larger beam width generates better results at the cost of decoding time.
- Throws
on error
-
Model.
createStream
()¶ Create a new streaming inference state. The streaming state returned by this function can then be passed to Model.feedAudioContent() and Model.finishStream().
- Throws
on error
- Returns
object – an opaque object that represents the streaming state.
-
Model.
enableDecoderWithLM
(aLMPath, aTriePath, aLMAlpha, aLMBeta)¶ Enable decoding using beam scoring with a KenLM language model.
- Arguments
aLMPath (string) – The path to the language model binary file.
aTriePath (string) – The path to the trie file built from the same vocabulary as the language model binary.
aLMAlpha (float) – The alpha hyperparameter of the CTC decoder. Language Model weight.
aLMBeta (float) – The beta hyperparameter of the CTC decoder. Word insertion weight.
- Returns
number – Zero on success, non-zero on failure (invalid arguments).
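Because enableDecoderWithLM() reports failure through its return value rather than by throwing, a caller can easily miss an error. A minimal sketch of a wrapper that turns a non-zero status into an exception follows. The helper name `enableLanguageModel` is hypothetical, and `model` is assumed to be any object exposing the method as documented above.

```javascript
// Hypothetical helper: raise an exception when enableDecoderWithLM() reports failure.
// `model` is assumed to expose enableDecoderWithLM() with the documented signature.
function enableLanguageModel(model, lmPath, triePath, lmAlpha, lmBeta) {
  const status = model.enableDecoderWithLM(lmPath, triePath, lmAlpha, lmBeta);
  if (status !== 0) {
    // The documentation only says "non-zero on failure", so the code is reported as-is.
    throw new Error(`enableDecoderWithLM failed with status ${status}`);
  }
}
```

The file paths and hyperparameter values passed in are left to the caller; nothing here validates them before the call.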
-
Model.
feedAudioContent
(aSctx, aBuffer, aBufferSize)¶ Feed audio samples to an ongoing streaming inference.
- Arguments
aSctx (object) – A streaming state returned by Model.createStream().
aBuffer (buffer) – An array of 16-bit, mono raw audio samples at the appropriate sample rate (matching what the model was trained on).
aBufferSize (number) – The number of samples in aBuffer.
-
Model.
finishStream
(aSctx)¶ Signal the end of an audio signal to an ongoing streaming inference and return the STT result over the whole audio signal.
- Arguments
aSctx (object) – A streaming state returned by Model.createStream().
- Returns
string – The STT result. This method frees the streaming state (aSctx).
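The three streaming calls described above (createStream(), feedAudioContent(), finishStream()) are typically used together. The following is a rough sketch of that sequence, not a definitive implementation. The function name `transcribeChunks` is hypothetical, and each chunk is assumed to be a Node.js Buffer of 16-bit mono PCM, so the sample count is taken to be half the byte length.

```javascript
// Hypothetical helper: feed a sequence of audio chunks to one streaming state
// and return the final transcript.
// Assumption: each chunk is a Node.js Buffer of 16-bit mono PCM samples.
function transcribeChunks(model, chunks) {
  const stream = model.createStream();
  for (const chunk of chunks) {
    const sampleCount = chunk.length / 2; // Buffer.length is in bytes; 2 bytes per sample
    model.feedAudioContent(stream, chunk, sampleCount);
  }
  // finishStream() decodes the whole signal and frees the streaming state,
  // so `stream` must not be reused afterwards.
  return model.finishStream(stream);
}
```

Model.intermediateDecode() could be called inside the loop to obtain partial results, at the cost of an extra decode per call.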
-
Model.
finishStreamWithMetadata
(aSctx)¶ Signal the end of an audio signal to an ongoing streaming inference and return per-letter metadata.
- Arguments
aSctx (object) – A streaming state pointer returned by Model.createStream().
- Returns
object – A Metadata() struct of individual letters along with their timing information. The user is responsible for freeing Metadata by calling FreeMetadata(). This method will free the state pointer (aSctx).
-
Model.
intermediateDecode
(aSctx)¶ Compute the intermediate decoding of an ongoing streaming inference.
- Arguments
aSctx (object) – A streaming state returned by Model.createStream().
- Returns
string – The STT intermediate result.
-
Model.
sampleRate
()¶ Return the sample rate expected by the model.
- Returns
number – Sample rate.
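Since the audio must match the rate the model was trained on, sampleRate() can serve as a guard before any inference call. A small sketch, assuming the caller already knows the rate of its audio (for example from a WAV header); the helper name `assertSampleRate` is hypothetical.

```javascript
// Hypothetical helper: fail early if the audio rate differs from what the model expects.
// `audioSampleRate` is assumed to be supplied by the caller, in Hz.
function assertSampleRate(model, audioSampleRate) {
  const expected = model.sampleRate();
  if (audioSampleRate !== expected) {
    throw new Error(`Audio is ${audioSampleRate} Hz but the model expects ${expected} Hz`);
  }
  return expected;
}
```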
-
Model.
stt
(aBuffer, aBufferSize)¶ Use the DeepSpeech model to perform Speech-To-Text.
- Arguments
aBuffer (object) – A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).
aBufferSize (number) – The number of samples in the audio signal.
- Returns
string – The STT result. Returns undefined on error.
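For audio that is already fully in memory, stt() can be called once. Because it signals failure by returning undefined instead of throwing, the result is worth checking explicitly. This is a sketch under the assumption that the audio is a Node.js Buffer of 16-bit mono PCM; the helper name `transcribe` is hypothetical.

```javascript
// Hypothetical helper: one-shot transcription with an explicit error check.
// Assumption: `audio` is a Node.js Buffer of 16-bit mono PCM samples.
function transcribe(model, audio) {
  const sampleCount = audio.length / 2; // 2 bytes per 16-bit sample
  const text = model.stt(audio, sampleCount);
  if (text === undefined) {
    throw new Error('stt() returned undefined, which the documentation describes as an error');
  }
  return text;
}
```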
-
Model.
sttWithMetadata
(aBuffer, aBufferSize)¶ Use the DeepSpeech model to perform Speech-To-Text and output metadata about the results.
- Arguments
aBuffer (object) – A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).
aBufferSize (number) – The number of samples in the audio signal.
- Returns
object – A Metadata() struct of individual letters along with their timing information. The user is responsible for freeing Metadata by calling FreeMetadata(). Returns undefined on error.
Module exported methods¶
-
FreeModel
(model)¶ Frees associated resources and destroys model object.
- Arguments
model (object) – A model pointer returned by Model().
-
FreeStream
(stream)¶ Destroy a streaming state without decoding the computed logits. This can be used if you no longer need the result of an ongoing streaming inference and don’t want to perform a costly decode operation.
- Arguments
stream (object) – A streaming state pointer returned by Model.createStream().
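One situation where FreeStream() applies is a transcription that the user cancels partway through. The sketch below discards the stream without decoding in that case. The names `feedUntilCancelled`, `ds`, and `isCancelled` are hypothetical: `ds` stands for whatever object exports FreeStream() in the caller's code, and chunks are assumed to be Node.js Buffers of 16-bit mono PCM.

```javascript
// Hypothetical helper: stream audio until done, or abandon the stream if cancelled.
// `ds` is assumed to be the object exporting FreeStream(); `isCancelled` is a
// caller-supplied function returning true once the result is no longer wanted.
function feedUntilCancelled(ds, model, chunks, isCancelled) {
  const stream = model.createStream();
  for (const chunk of chunks) {
    if (isCancelled()) {
      ds.FreeStream(stream); // release the state without paying for a decode
      return null;
    }
    model.feedAudioContent(stream, chunk, chunk.length / 2);
  }
  return model.finishStream(stream); // decodes and frees the state
}
```

Either FreeStream() or finishStream() ends up releasing the state, so exactly one of them is called per stream.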
-
FreeMetadata
(metadata)¶ Free memory allocated for metadata information.
- Arguments
metadata (object) – Object containing metadata as returned by Model.sttWithMetadata() or Model.finishStreamWithMetadata().
-
printVersions
()¶ Print version of this library and of the linked TensorFlow library on standard output.
Metadata¶
-
class
Metadata
()¶ Stores the entire CTC output as an array of character metadata objects
-
Metadata.
confidence
()¶ Approximated confidence value for this transcription. This is roughly the sum of the acoustic model logit values for each timestep/character that contributed to the creation of this transcription.
- Returns
float – Confidence value
-
Metadata.
items
()¶ List of items
- Returns
array – List of
MetadataItem()
-
Metadata.
num_items
()¶ Size of the list of items
- Returns
int – Number of items
-
MetadataItem¶
-
class
MetadataItem
()¶ Stores each individual character, along with its timing information
-
MetadataItem.
character
()¶ The character generated for transcription
- Returns
string – The character generated
-
MetadataItem.
start_time
()¶ Position of the character in seconds
- Returns
float – The position of the character
-
MetadataItem.
timestep
()¶ Position of the character in units of 20ms
- Returns
int – The position of the character
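As an illustration of how the Metadata and MetadataItem fields fit together, the sketch below groups characters into words and records when each word starts. It rests on two assumptions that are not stated in this reference: that `items`, `num_items`, `character`, and `start_time` are read as plain properties rather than called as functions, and that words are separated by a single space character. The helper name `metadataToWords` is hypothetical.

```javascript
// Hypothetical helper: turn per-character metadata into a list of words with start times.
// Assumptions: fields are plain properties, and a ' ' character separates words.
function metadataToWords(metadata) {
  const words = [];
  let current = null;
  for (let i = 0; i < metadata.num_items; i++) {
    const item = metadata.items[i];
    if (item.character === ' ') {
      if (current) words.push(current); // a space closes the word in progress
      current = null;
    } else if (current) {
      current.word += item.character;
    } else {
      // First character of a new word: remember when it starts (in seconds).
      current = { word: item.character, start: item.start_time };
    }
  }
  if (current) words.push(current);
  return words;
}
```

The returned array holds copies of the values, so the metadata object can be released with FreeMetadata() once this function returns.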
-