C API
See also Error codes for the list of error codes and a description of each.
-
int DS_CreateModel(const char *aModelPath, ModelState **retval)
Create an object providing an interface to a trained DeepSpeech model.
- Return
Zero on success, non-zero on failure.
- Parameters
aModelPath
: The path to the frozen model graph.
[out] retval
: A ModelState pointer.
-
void DS_FreeModel(ModelState *ctx)
Frees associated resources and destroys the model object.
-
int DS_EnableExternalScorer(ModelState *aCtx, const char *aScorerPath)
Enable decoding using an external scorer.
- Return
Zero on success, non-zero on failure (invalid arguments).
- Parameters
aCtx
: The ModelState pointer for the model being changed.
aScorerPath
: The path to the external scorer file.
-
int DS_DisableExternalScorer(ModelState *aCtx)
Disable decoding using an external scorer.
- Return
Zero on success, non-zero on failure.
- Parameters
aCtx
: The ModelState pointer for the model being changed.
-
int DS_AddHotWord(ModelState *aCtx, const char *word, float boost)
Add a hot-word and its boost.
- Return
Zero on success, non-zero on failure (invalid arguments).
- Parameters
aCtx
: The ModelState pointer for the model being changed.
word
: The hot-word.
boost
: The boost.
-
int DS_EraseHotWord(ModelState *aCtx, const char *word)
Remove the entry for a hot-word from the hot-words map.
- Return
Zero on success, non-zero on failure (invalid arguments).
- Parameters
aCtx
: The ModelState pointer for the model being changed.
word
: The hot-word.
-
int DS_ClearHotWords(ModelState *aCtx)
Removes all elements from the hot-words map.
- Return
Zero on success, non-zero on failure (invalid arguments).
- Parameters
aCtx
: The ModelState pointer for the model being changed.
-
int DS_SetScorerAlphaBeta(ModelState *aCtx, float aAlpha, float aBeta)
Set hyperparameters alpha and beta of the external scorer.
- Return
Zero on success, non-zero on failure.
- Parameters
aCtx
: The ModelState pointer for the model being changed.
aAlpha
: The alpha hyperparameter of the decoder. Language model weight.
aBeta
: The beta hyperparameter of the decoder. Word insertion weight.
-
int DS_GetModelSampleRate(const ModelState *aCtx)
Return the sample rate expected by a model.
- Return
Sample rate expected by the model for its input.
- Parameters
aCtx
: A ModelState pointer created with DS_CreateModel.
-
char *DS_SpeechToText(ModelState *aCtx, const short *aBuffer, unsigned int aBufferSize)
Use the DeepSpeech model to convert speech to text.
- Return
The STT result. The user is responsible for freeing the string using DS_FreeString(). Returns NULL on error.
- Parameters
aCtx
: The ModelState pointer for the model to use.
aBuffer
: A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).
aBufferSize
: The number of samples in the audio signal.
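Because aBufferSize counts 16-bit samples rather than bytes, a caller that reads raw PCM from a file or socket has to convert the byte count first. The following is a minimal sketch of that arithmetic. The helper names pcm_bytes_to_samples and samples_to_ms are hypothetical and not part of the DeepSpeech API, and the sample rate is assumed to be the value reported by DS_GetModelSampleRate().

```c
#include <assert.h>
#include <stddef.h>

/* Number of 16-bit mono samples held in a raw PCM byte buffer.
 * Hypothetical helper: the result is the value a caller would pass
 * as aBufferSize. A trailing partial sample is dropped. */
unsigned int pcm_bytes_to_samples(size_t num_bytes)
{
    return (unsigned int)(num_bytes / sizeof(short));
}

/* Duration in milliseconds of num_samples at sample_rate Hz.
 * sample_rate is assumed to come from DS_GetModelSampleRate(). */
unsigned int samples_to_ms(unsigned int num_samples, int sample_rate)
{
    assert(sample_rate > 0);
    /* Widen before multiplying so long recordings do not overflow. */
    return (unsigned int)(((unsigned long long)num_samples * 1000ULL)
                          / (unsigned long long)sample_rate);
}
```

For example, a buffer of 32000 bytes on a platform where short is two bytes holds 16000 samples, which is one second of audio if the model expects 16000 Hz.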
-
Metadata *DS_SpeechToTextWithMetadata(ModelState *aCtx, const short *aBuffer, unsigned int aBufferSize, unsigned int aNumResults)
Use the DeepSpeech model to convert speech to text and output results including metadata.
- Return
Metadata struct containing multiple CandidateTranscript structs. Each transcript has per-token metadata including timing information. The user is responsible for freeing Metadata by calling DS_FreeMetadata(). Returns NULL on error.
- Parameters
aCtx
: The ModelState pointer for the model to use.
aBuffer
: A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).
aBufferSize
: The number of samples in the audio signal.
aNumResults
: The maximum number of CandidateTranscript structs to return. Returned value might be smaller than this.
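A transcript with per-token metadata usually has to be turned back into a plain string before it can be displayed. The sketch below shows one way to do that. The struct layouts are not described in this section, so SketchToken and SketchTranscript are stand-in types whose fields (text, start_time, tokens, num_tokens, confidence) are assumptions about what the library header declares; consult deepspeech.h for the authoritative definitions. sketch_join_tokens is likewise a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in types for illustration only. They are assumed to resemble the
 * per-token and per-transcript structs of the library, not to match them. */
typedef struct {
    const char *text;   /* text of this token (assumed field) */
    float start_time;   /* position of the token in seconds (assumed field) */
} SketchToken;

typedef struct {
    const SketchToken *tokens;
    unsigned int num_tokens;
    double confidence;
} SketchTranscript;

/* Concatenate the token texts of one transcript into out, which has room
 * for out_size bytes. Returns the number of bytes written, excluding the
 * terminating NUL. Tokens that do not fit are dropped. */
size_t sketch_join_tokens(const SketchTranscript *t, char *out, size_t out_size)
{
    size_t used = 0;
    unsigned int i;

    if (t == NULL || out == NULL || out_size == 0) {
        return 0;
    }
    assert(t->tokens != NULL || t->num_tokens == 0);

    for (i = 0; i < t->num_tokens; i++) {
        size_t len = strlen(t->tokens[i].text);
        if (used + len >= out_size) {
            break;  /* keep one byte free for the terminating NUL */
        }
        memcpy(out + used, t->tokens[i].text, len);
        used += len;
    }
    out[used] = '\0';
    return used;
}
```

With the real library, the same loop would run over a transcript inside the returned Metadata, and the Metadata would be released with DS_FreeMetadata() once the string has been built.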
-
int DS_CreateStream(ModelState *aCtx, StreamingState **retval)
Create a new streaming inference state. The streaming state returned by this function can then be passed to DS_FeedAudioContent() and DS_FinishStream().
- Return
Zero on success, non-zero on failure.
- Parameters
aCtx
: The ModelState pointer for the model to use.
[out] retval
: An opaque pointer that represents the streaming state. Can be NULL if an error occurs.
-
void DS_FeedAudioContent(StreamingState *aSctx, const short *aBuffer, unsigned int aBufferSize)
Feed audio samples to an ongoing streaming inference.
- Parameters
aSctx
: A streaming state pointer returned by DS_CreateStream().
aBuffer
: An array of 16-bit, mono raw audio samples at the appropriate sample rate (matching what the model was trained on).
aBufferSize
: The number of samples in aBuffer.
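Streaming callers typically hand audio to the feed function in fixed-size chunks as it arrives. The following sketch shows the chunking loop on its own, under the assumption that the feed step is supplied as a callback with the same argument shape as DS_FeedAudioContent. The names feed_fn, feed_in_chunks and count_samples are hypothetical, and count_samples is only a stand-in sink that lets the loop run without the library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type shaped like DS_FeedAudioContent:
 * (state, samples, number of samples). */
typedef void (*feed_fn)(void *state, const short *buffer, unsigned int buffer_size);

/* Feed `total` samples in chunks of at most `chunk` samples each.
 * Returns the number of times the callback was invoked. */
unsigned int feed_in_chunks(feed_fn feed, void *state,
                            const short *samples, size_t total,
                            unsigned int chunk)
{
    unsigned int calls = 0;
    size_t offset = 0;

    assert(feed != NULL);
    if (samples == NULL || chunk == 0) {
        return 0;
    }
    while (offset < total) {
        size_t remaining = total - offset;
        /* The last chunk may be shorter than the others. */
        unsigned int n = remaining < chunk ? (unsigned int)remaining : chunk;
        feed(state, samples + offset, n);
        offset += n;
        calls++;
    }
    return calls;
}

/* Stand-in sink for the sketch: adds up the samples it receives.
 * `state` must point to a size_t counter. */
void count_samples(void *state, const short *buffer, unsigned int buffer_size)
{
    (void)buffer;
    *(size_t *)state += buffer_size;
}
```

In real use the callback would be a thin wrapper that casts state to a StreamingState pointer and forwards the call to DS_FeedAudioContent().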
-
char *DS_IntermediateDecode(const StreamingState *aSctx)
Compute the intermediate decoding of an ongoing streaming inference.
- Return
The STT intermediate result. The user is responsible for freeing the string using DS_FreeString().
- Parameters
aSctx
: A streaming state pointer returned by DS_CreateStream().
-
Metadata *DS_IntermediateDecodeWithMetadata(const StreamingState *aSctx, unsigned int aNumResults)
Compute the intermediate decoding of an ongoing streaming inference and return results including metadata.
- Return
Metadata struct containing multiple candidate transcripts. Each transcript has per-token metadata including timing information. The user is responsible for freeing Metadata by calling DS_FreeMetadata(). Returns NULL on error.
- Parameters
aSctx
: A streaming state pointer returned by DS_CreateStream().
aNumResults
: The number of candidate transcripts to return.
-
char *DS_FinishStream(StreamingState *aSctx)
Compute the final decoding of an ongoing streaming inference and return the result. Signals the end of an ongoing streaming inference.
- Return
The STT result. The user is responsible for freeing the string using DS_FreeString().
- Note
This method will free the state pointer (aSctx).
- Parameters
aSctx
: A streaming state pointer returned by DS_CreateStream().
-
Metadata *DS_FinishStreamWithMetadata(StreamingState *aSctx, unsigned int aNumResults)
Compute the final decoding of an ongoing streaming inference and return results including metadata. Signals the end of an ongoing streaming inference.
- Return
Metadata struct containing multiple candidate transcripts. Each transcript has per-token metadata including timing information. The user is responsible for freeing Metadata by calling DS_FreeMetadata(). Returns NULL on error.
- Note
This method will free the state pointer (aSctx).
- Parameters
aSctx
: A streaming state pointer returned by DS_CreateStream().
aNumResults
: The number of candidate transcripts to return.
-
void DS_FreeStream(StreamingState *aSctx)
Destroy a streaming state without decoding the computed logits. This can be used if you no longer need the result of an ongoing streaming inference and don’t want to perform a costly decode operation.
- Note
This method will free the state pointer (aSctx).
- Parameters
aSctx
: A streaming state pointer returned by DS_CreateStream().
-
void DS_FreeString(char *str)
Free a char* string returned by the DeepSpeech API.
-
char *DS_Version()
Returns the version of this library. The returned version is a semantic version (SemVer 2.0.0). The string returned must be freed with DS_FreeString().
- Return
The version string.
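Since the version string follows SemVer 2.0.0, a client can extract the numeric core to check compatibility at startup. The following is a minimal sketch of that step. parse_semver is a hypothetical helper, not part of the library, and it deliberately ignores any pre-release or build suffix rather than validating the full SemVer grammar.

```c
#include <assert.h>
#include <stdio.h>

/* Parse the leading MAJOR.MINOR.PATCH of a SemVer 2.0.0 string.
 * Anything after the patch number (for example "-alpha.1+build") is ignored.
 * Returns 1 on success, 0 if the string does not start with three
 * dot-separated integers. */
int parse_semver(const char *version, int *major, int *minor, int *patch)
{
    assert(major != NULL && minor != NULL && patch != NULL);
    if (version == NULL) {
        return 0;
    }
    return sscanf(version, "%d.%d.%d", major, minor, patch) == 3;
}
```

With the real library, the input would be the string returned by DS_Version(), which is then released with DS_FreeString() after parsing.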