JavaScript (NodeJS / ElectronJS)¶
Model¶
class Model(aModelPath)¶
Exported from index. An object providing an interface to a trained DeepSpeech model.
- Arguments:
  aModelPath (string) – The path to the frozen model graph.
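Loading a model can be sketched as follows. This is a minimal sketch, not an official example: the model path is a placeholder, and the `require` is guarded because `deepspeech` is a native addon that may not be installed.

```javascript
// Minimal sketch: load a model and query its properties.
// Assumption: the `deepspeech` npm package is installed and a trained
// model file exists at MODEL_PATH (placeholder below).
let Ds = null;
try { Ds = require('deepspeech'); } catch (e) { /* package not installed */ }

const MODEL_PATH = 'deepspeech-model.pbmm'; // hypothetical path

if (Ds) {
  const model = new Ds.Model(MODEL_PATH);
  console.log('beam width :', model.beamWidth());  // default from the model file
  console.log('sample rate:', model.sampleRate());
  Ds.FreeModel(model); // release native resources when done
}
```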
Model.beamWidth()¶
Get the beam width value used by the model. If Model.setBeamWidth() was not called before, this returns the default value loaded from the model file.
- Returns:
  number – Beam width value used by the model.
Model.createStream()¶
Create a new streaming inference state. One can then call Stream.feedAudioContent() and Stream.finishStream() on the returned stream object.
- Returns:
  index.Stream – a Stream() object that represents the streaming state.
Model.disableExternalScorer()¶
Disable decoding using an external scorer.
Model.enableExternalScorer(aScorerPath)¶
Enable decoding using an external scorer.
- Arguments:
  aScorerPath (string) – The path to the external scorer file.
Model.sampleRate()¶
Return the sample rate expected by the model.
- Returns:
  number – Sample rate.
Model.setBeamWidth(aBeamWidth)¶
Set the beam width value used by the model.
- Arguments:
  aBeamWidth (number) – The beam width used by the model. A larger beam width generates better results at the cost of decoding time.
Model.setScorerAlphaBeta(aLMAlpha, aLMBeta)¶
Set the alpha and beta hyperparameters of the external scorer.
- Arguments:
  aLMAlpha (number) – The alpha hyperparameter of the CTC decoder. Language model weight.
  aLMBeta (number) – The beta hyperparameter of the CTC decoder. Word insertion weight.
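The scorer-related calls above fit together as in the sketch below. The scorer path and the alpha/beta values are placeholders, not tuned or recommended settings, and the `require` is guarded in case the package is absent.

```javascript
// Sketch: decoding with an external scorer enabled, then disabled.
let Ds = null;
try { Ds = require('deepspeech'); } catch (e) { /* package not installed */ }

const SCORER_PATH = 'deepspeech-scorer.scorer'; // hypothetical path
const LM_ALPHA = 0.93; // language model weight (placeholder value)
const LM_BETA = 1.18;  // word insertion weight (placeholder value)

if (Ds) {
  const model = new Ds.Model('deepspeech-model.pbmm'); // hypothetical path
  model.enableExternalScorer(SCORER_PATH);
  model.setScorerAlphaBeta(LM_ALPHA, LM_BETA);
  // ...run inference with the scorer active...
  model.disableExternalScorer(); // back to plain CTC decoding
  Ds.FreeModel(model);
}
```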
Model.stt(aBuffer)¶
Use the DeepSpeech model to perform Speech-To-Text.
- Arguments:
  aBuffer (Buffer) – A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).
- Returns:
  string – The STT result. Returns undefined on error.
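Batch inference can be sketched as follows. One second of 16-bit mono silence at 16 kHz stands in for real audio; real code would decode a WAV file to the model's sample rate first. Paths are placeholders and the `require` is guarded.

```javascript
// Sketch: batch Speech-To-Text on a raw PCM buffer.
let Ds = null;
try { Ds = require('deepspeech'); } catch (e) { /* package not installed */ }

const SAMPLE_RATE = 16000; // should match model.sampleRate()
const audio = Buffer.alloc(SAMPLE_RATE * 2); // 2 bytes per 16-bit sample, mono

if (Ds) {
  const model = new Ds.Model('deepspeech-model.pbmm'); // hypothetical path
  console.log('transcript:', model.stt(audio));
  Ds.FreeModel(model);
}
```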
Model.sttWithMetadata(aBuffer, aNumResults)¶
Use the DeepSpeech model to perform Speech-To-Text and output metadata about the results.
- Arguments:
  aBuffer (Buffer) – A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).
  aNumResults (number) – Maximum number of candidate transcripts to return. The returned list might be smaller than this. Defaults to 1 if not specified.
- Returns:
  index.Metadata – A Metadata() object containing multiple candidate transcripts. Each transcript has per-token metadata including timing information. The user is responsible for freeing the Metadata by calling FreeMetadata(). Returns undefined on error.
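Metadata-producing inference can be sketched as below. This assumes the returned Metadata object exposes its candidates as a `transcripts` array of CandidateTranscript objects (see the Metadata section); paths are placeholders and the `require` is guarded.

```javascript
// Sketch: Speech-To-Text with candidate transcripts and per-token metadata.
let Ds = null;
try { Ds = require('deepspeech'); } catch (e) { /* package not installed */ }

const audio = Buffer.alloc(16000 * 2); // one second of 16-bit mono silence

if (Ds) {
  const model = new Ds.Model('deepspeech-model.pbmm'); // hypothetical path
  const metadata = model.sttWithMetadata(audio, 3); // up to 3 candidates
  for (const t of metadata.transcripts) {
    const text = t.tokens.map((tok) => tok.text).join('');
    console.log(t.confidence, text);
  }
  Ds.FreeMetadata(metadata); // the caller owns the Metadata
  Ds.FreeModel(model);
}
```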
Stream¶
class Stream(nativeStream)¶
Provides an interface to a DeepSpeech stream. The constructor cannot be called directly; use Model.createStream().
- Arguments:
  nativeStream (object) – SWIG wrapper for the native StreamingState object.
Stream.feedAudioContent(aBuffer)¶
Feed audio samples to an ongoing streaming inference.
- Arguments:
  aBuffer (Buffer) – An array of 16-bit, mono raw audio samples at the appropriate sample rate (matching what the model was trained on).
Stream.finishStream()¶
Compute the final decoding of an ongoing streaming inference and return the result. Signals the end of the streaming inference.
- Returns:
  string – The STT result. This method frees the stream; the stream must not be used after this method is called.
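The streaming lifecycle can be sketched as follows: create a stream, feed chunks as an audio capture callback would deliver them, then finish. The chunk size and paths are placeholders, and the `require` is guarded.

```javascript
// Sketch: streaming inference over small audio chunks.
let Ds = null;
try { Ds = require('deepspeech'); } catch (e) { /* package not installed */ }

const audio = Buffer.alloc(16000 * 2); // one second of 16-bit mono silence
const CHUNK_BYTES = 320 * 2;           // 320 samples ≈ 20 ms at 16 kHz
const chunks = [];
for (let off = 0; off < audio.length; off += CHUNK_BYTES) {
  chunks.push(audio.subarray(off, off + CHUNK_BYTES));
}

if (Ds) {
  const model = new Ds.Model('deepspeech-model.pbmm'); // hypothetical path
  const stream = model.createStream();
  for (const chunk of chunks) stream.feedAudioContent(chunk);
  console.log(stream.finishStream()); // frees the stream; do not reuse it
  Ds.FreeModel(model);
}
```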
Stream.finishStreamWithMetadata(aNumResults)¶
Compute the final decoding of an ongoing streaming inference and return the results including metadata. Signals the end of the streaming inference.
- Arguments:
  aNumResults (number) – Maximum number of candidate transcripts to return. The returned list might be smaller than this. Defaults to 1 if not specified.
- Returns:
  index.Metadata – A Metadata() struct of individual letters along with their timing information. The user is responsible for freeing the Metadata by calling FreeMetadata(). This method frees the stream; the stream must not be used after this method is called.
Stream.intermediateDecode()¶
Compute the intermediate decoding of an ongoing streaming inference.
- Returns:
  string – The STT intermediate result.
Stream.intermediateDecodeWithMetadata(aNumResults)¶
Compute the intermediate decoding of an ongoing streaming inference and return results including metadata.
- Arguments:
  aNumResults (number) – Maximum number of candidate transcripts to return. The returned list might be smaller than this. Defaults to 1 if not specified.
- Returns:
  index.Metadata – A Metadata() object containing multiple candidate transcripts. Each transcript has per-token metadata including timing information. The user is responsible for freeing the Metadata by calling FreeMetadata(). Returns undefined on error.
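Intermediate decoding can be sketched as below: peek at the hypothesis mid-stream without ending it, then discard the stream with FreeStream() instead of paying for a final decode. Paths are placeholders and the `require` is guarded.

```javascript
// Sketch: intermediate results, then abandoning the stream.
let Ds = null;
try { Ds = require('deepspeech'); } catch (e) { /* package not installed */ }

const audio = Buffer.alloc(16000 * 2); // one second of 16-bit mono silence

if (Ds) {
  const model = new Ds.Model('deepspeech-model.pbmm'); // hypothetical path
  const stream = model.createStream();
  stream.feedAudioContent(audio);
  console.log('so far:', stream.intermediateDecode()); // stream stays usable
  const meta = stream.intermediateDecodeWithMetadata(2); // up to 2 candidates
  Ds.FreeMetadata(meta);
  Ds.FreeStream(stream); // discard without a final decode
  Ds.FreeModel(model);
}
```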
Module exported methods¶
FreeModel(model)¶
Frees associated resources and destroys the model object.
- Arguments:
  model (index.Model) – A model pointer returned by Model().
FreeStream(stream)¶
Destroy a streaming state without decoding the computed logits. This can be used if you no longer need the result of an ongoing streaming inference and don't want to perform a costly decode operation.
- Arguments:
  stream (index.Stream) – A streaming state pointer returned by Model.createStream().
FreeMetadata(metadata)¶
Free memory allocated for metadata information.
- Arguments:
  metadata (index.Metadata) – Object containing metadata as returned by Model.sttWithMetadata() or Stream.finishStreamWithMetadata().
Version()¶
Returns the version of this library. The returned version is a semantic version (SemVer 2.0.0).
- Returns:
  string – The version string.
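Reading the version can be sketched as below. `parseSemVer` is a small hypothetical helper, not part of the DeepSpeech API, for splitting a SemVer string into numeric components (e.g. for feature checks); the `require` is guarded.

```javascript
// Sketch: querying and parsing the library version.
let Ds = null;
try { Ds = require('deepspeech'); } catch (e) { /* package not installed */ }

// Hypothetical helper: extract major/minor/patch from a SemVer string.
function parseSemVer(v) {
  const m = /^(\d+)\.(\d+)\.(\d+)/.exec(v);
  return m ? { major: +m[1], minor: +m[2], patch: +m[3] } : null;
}

if (Ds) {
  const version = Ds.Version();
  console.log('DeepSpeech', version, parseSemVer(version));
}
```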
Metadata¶
CandidateTranscript¶
class CandidateTranscript()¶
Interface, exported from index. A single transcript computed by the model, including a confidence value and the metadata for its constituent tokens.
CandidateTranscript.confidence¶
type: number
Approximate confidence value for this transcription. This is roughly the sum of the acoustic model logit values for each timestep/token that contributed to the creation of this transcription.
CandidateTranscript.tokens¶
type: index.TokenMetadata[]
TokenMetadata¶
class TokenMetadata()¶
Interface, exported from index. Stores the text of an individual token, along with its timing information.
TokenMetadata.start_time¶
type: number
Position of the token in seconds.
TokenMetadata.text¶
type: string
The text corresponding to this token.
TokenMetadata.timestep¶
type: number
Position of the token in units of 20 ms.
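The two timing fields of TokenMetadata are related: `timestep` counts 20 ms model steps, while `start_time` is already in seconds. The sketch below converts between them with `timestepToSeconds`, a hypothetical helper that is not part of the API, applied to a made-up token object.

```javascript
// Sketch: converting a token's timestep (20 ms units) to seconds.
function timestepToSeconds(timestep) {
  return timestep * 0.02; // 20 ms per timestep
}

// A hypothetical token shaped like an entry of CandidateTranscript.tokens:
const token = { text: 'h', timestep: 50, start_time: 1.0 };
console.log(timestepToSeconds(token.timestep)); // ≈ 1.0 seconds
```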