.Net Framework

DeepSpeech Interface

interface IDeepSpeech
  Client interface of Mozilla's DeepSpeech implementation.
  Subclassed by DeepSpeechClient.DeepSpeech

Public Functions

- void DeepSpeechClient.Interfaces.IDeepSpeech.PrintVersions()
  Prints the versions of TensorFlow and DeepSpeech.
- unsafe void DeepSpeechClient.Interfaces.IDeepSpeech.CreateModel(string aModelPath, uint aBeamWidth)
  Create an object providing an interface to a trained DeepSpeech model.
  Parameters:
    aModelPath: The path to the frozen model graph.
    aBeamWidth: The beam width used by the decoder. A larger beam width generates better results at the cost of decoding time.
  Exceptions:
    ArgumentException: Thrown when the native binary failed to create the model.
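As a sketch of how this entry point is typically used (the model path and the beam width of 500 below are illustrative placeholders, not values mandated by the API):

```csharp
using DeepSpeechClient;
using DeepSpeechClient.Interfaces;

class CreateModelExample
{
    static void Main()
    {
        // Placeholder path and beam width; substitute the frozen graph
        // exported from your own training run.
        IDeepSpeech sttClient = new DeepSpeech();
        sttClient.CreateModel("output_graph.pbmm", 500);
        sttClient.PrintVersions();
    }
}
```

CreateModel throws ArgumentException if the native binary cannot load the graph, so callers that accept user-supplied paths may want to wrap it in a try/catch.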
- unsafe int DeepSpeechClient.Interfaces.IDeepSpeech.GetModelSampleRate()
  Return the sample rate expected by the model.
  Return:
    Sample rate.
- unsafe void DeepSpeechClient.Interfaces.IDeepSpeech.EnableDecoderWithLM(string aLMPath, string aTriePath, float aLMAlpha, float aLMBeta)
  Enable decoding using beam scoring with a KenLM language model.
  Parameters:
    aLMPath: The path to the language model binary file.
    aTriePath: The path to the trie file built from the same vocabulary as the language model binary.
    aLMAlpha: The alpha hyperparameter of the CTC decoder. Language model weight.
    aLMBeta: The beta hyperparameter of the CTC decoder. Word insertion weight.
  Exceptions:
    ArgumentException: Thrown when the native binary failed to enable decoding with a language model.
- unsafe string DeepSpeechClient.Interfaces.IDeepSpeech.SpeechToText(short[] aBuffer, uint aBufferSize)
  Use the DeepSpeech model to perform Speech-To-Text.
  Return:
    The STT result. The user is responsible for freeing the string. Returns NULL on error.
  Parameters:
    aBuffer: A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).
    aBufferSize: The number of samples in the audio signal.
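SpeechToText takes its audio as a short[] of PCM samples, while WAV files and most capture APIs hand back raw bytes. A minimal, hypothetical helper for that conversion (BytesToSamples is not part of the client API, and the result assumes a little-endian platform):

```csharp
using System;

class SampleConversion
{
    // Converts little-endian 16-bit PCM bytes (e.g. the data chunk of a
    // mono WAV file) into the short[] buffer SpeechToText expects.
    public static short[] BytesToSamples(byte[] pcm)
    {
        var samples = new short[pcm.Length / 2];
        Buffer.BlockCopy(pcm, 0, samples, 0, samples.Length * 2);
        return samples;
    }

    static void Main()
    {
        // 0x01 0x00 little-endian is 1; 0xFF 0xFF is -1.
        short[] s = BytesToSamples(new byte[] { 0x01, 0x00, 0xFF, 0xFF });
        Console.WriteLine(s[0]);
        Console.WriteLine(s[1]);
    }
}
```

The resulting array and its length are what get passed as aBuffer and aBufferSize.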
- unsafe Metadata DeepSpeechClient.Interfaces.IDeepSpeech.SpeechToTextWithMetadata(short[] aBuffer, uint aBufferSize)
  Use the DeepSpeech model to perform Speech-To-Text.
  Return:
    The extended metadata result. The user is responsible for freeing the struct. Returns NULL on error.
  Parameters:
    aBuffer: A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).
    aBufferSize: The number of samples in the audio signal.
- unsafe void DeepSpeechClient.Interfaces.IDeepSpeech.FreeStream()
  Destroy a streaming state without decoding the computed logits. This can be used if you no longer need the result of an ongoing streaming inference and don't want to perform a costly decode operation.
- unsafe void DeepSpeechClient.Interfaces.IDeepSpeech.FreeString(IntPtr intPtr)
  Free a DeepSpeech-allocated string.

- unsafe void DeepSpeechClient.Interfaces.IDeepSpeech.FreeMetadata(IntPtr intPtr)
  Free a DeepSpeech-allocated Metadata struct.
- unsafe void DeepSpeechClient.Interfaces.IDeepSpeech.CreateStream()
  Creates a new streaming inference state.
  Exceptions:
    ArgumentException: Thrown when the native binary failed to initialize the streaming mode.
- unsafe void DeepSpeechClient.Interfaces.IDeepSpeech.FeedAudioContent(short[] aBuffer, uint aBufferSize)
  Feeds audio samples to an ongoing streaming inference.
  Parameters:
    aBuffer: An array of 16-bit, mono raw audio samples at the appropriate sample rate (matching what the model was trained on).
    aBufferSize: The number of samples in the audio signal.
- unsafe string DeepSpeechClient.Interfaces.IDeepSpeech.IntermediateDecode()
  Computes the intermediate decoding of an ongoing streaming inference.
  Return:
    The STT intermediate result. The user is responsible for freeing the string.
- unsafe string DeepSpeechClient.Interfaces.IDeepSpeech.FinishStream()
  Closes the ongoing streaming inference and returns the STT result over the whole audio signal.
  Return:
    The STT result. The user is responsible for freeing the string.

- unsafe Metadata DeepSpeechClient.Interfaces.IDeepSpeech.FinishStreamWithMetadata()
  Closes the ongoing streaming inference and returns the STT result over the whole audio signal.
  Return:
    The extended metadata result. The user is responsible for freeing the struct.
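Taken together, the streaming entry points form a create/feed/finish cycle. A sketch of that cycle, assuming DeepSpeech is the concrete implementation and that the model path, beam width, and audioChunks source below are placeholders:

```csharp
using System;
using System.Collections.Generic;
using DeepSpeechClient;
using DeepSpeechClient.Interfaces;

class StreamingExample
{
    static void Main()
    {
        IDeepSpeech stt = new DeepSpeech();
        stt.CreateModel("output_graph.pbmm", 500); // placeholder path and beam width

        // audioChunks stands in for 16-bit mono samples arriving from a
        // microphone or network socket at the model's sample rate.
        var audioChunks = new List<short[]>();

        stt.CreateStream();
        foreach (short[] chunk in audioChunks)
        {
            stt.FeedAudioContent(chunk, (uint)chunk.Length);
            // Optional: peek at the transcript so far.
            Console.WriteLine(stt.IntermediateDecode());
        }

        // FinishStream decodes and closes the stream; FreeStream would
        // instead abandon it without the costly decode.
        string transcript = stt.FinishStream();
        Console.WriteLine(transcript);
    }
}
```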
DeepSpeech Class

class DeepSpeech
  Client of Mozilla's DeepSpeech implementation.

Public Functions
- unsafe void DeepSpeechClient.DeepSpeech.CreateModel(string aModelPath, uint aBeamWidth)
  Create an object providing an interface to a trained DeepSpeech model.
  Parameters:
    aModelPath: The path to the frozen model graph.
    aBeamWidth: The beam width used by the decoder. A larger beam width generates better results at the cost of decoding time.
  Exceptions:
    ArgumentException: Thrown when the native binary failed to create the model.

- unsafe int DeepSpeechClient.DeepSpeech.GetModelSampleRate()
  Return the sample rate expected by the model.
  Return:
    Sample rate.
- unsafe void DeepSpeechClient.DeepSpeech.Dispose()
  Frees associated resources and destroys model objects.

- unsafe void DeepSpeechClient.DeepSpeech.EnableDecoderWithLM(string aLMPath, string aTriePath, float aLMAlpha, float aLMBeta)
  Enable decoding using beam scoring with a KenLM language model.
  Parameters:
    aLMPath: The path to the language model binary file.
    aTriePath: The path to the trie file built from the same vocabulary as the language model binary.
    aLMAlpha: The alpha hyperparameter of the CTC decoder. Language model weight.
    aLMBeta: The beta hyperparameter of the CTC decoder. Word insertion weight.
  Exceptions:
    ArgumentException: Thrown when the native binary failed to enable decoding with a language model.
- unsafe void DeepSpeechClient.DeepSpeech.FeedAudioContent(short[] aBuffer, uint aBufferSize)
  Feeds audio samples to an ongoing streaming inference.
  Parameters:
    aBuffer: An array of 16-bit, mono raw audio samples at the appropriate sample rate (matching what the model was trained on).
    aBufferSize: The number of samples in the audio signal.

- unsafe string DeepSpeechClient.DeepSpeech.FinishStream()
  Closes the ongoing streaming inference and returns the STT result over the whole audio signal.
  Return:
    The STT result. The user is responsible for freeing the string.

- unsafe Models.Metadata DeepSpeechClient.DeepSpeech.FinishStreamWithMetadata()
  Closes the ongoing streaming inference and returns the STT result over the whole audio signal.
  Return:
    The extended metadata. The user is responsible for freeing the struct.

- unsafe string DeepSpeechClient.DeepSpeech.IntermediateDecode()
  Computes the intermediate decoding of an ongoing streaming inference.
  Return:
    The STT intermediate result. The user is responsible for freeing the string.
- unsafe void DeepSpeechClient.DeepSpeech.PrintVersions()
  Prints the versions of TensorFlow and DeepSpeech.

- unsafe void DeepSpeechClient.DeepSpeech.CreateStream()
  Creates a new streaming inference state.
  Exceptions:
    ArgumentException: Thrown when the native binary failed to initialize the streaming mode.

- unsafe void DeepSpeechClient.DeepSpeech.FreeStream()
  Destroy a streaming state without decoding the computed logits. This can be used if you no longer need the result of an ongoing streaming inference and don't want to perform a costly decode operation.

- unsafe void DeepSpeechClient.DeepSpeech.FreeString(IntPtr intPtr)
  Free a DeepSpeech-allocated string.

- unsafe void DeepSpeechClient.DeepSpeech.FreeMetadata(IntPtr intPtr)
  Free a DeepSpeech-allocated Metadata struct.
- unsafe string DeepSpeechClient.DeepSpeech.SpeechToText(short[] aBuffer, uint aBufferSize)
  Use the DeepSpeech model to perform Speech-To-Text.
  Return:
    The STT result. The user is responsible for freeing the string. Returns NULL on error.
  Parameters:
    aBuffer: A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).
    aBufferSize: The number of samples in the audio signal.

- unsafe Models.Metadata DeepSpeechClient.DeepSpeech.SpeechToTextWithMetadata(short[] aBuffer, uint aBufferSize)
  Use the DeepSpeech model to perform Speech-To-Text.
  Return:
    The extended metadata. The user is responsible for freeing the struct. Returns NULL on error.
  Parameters:
    aBuffer: A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).
    aBufferSize: The number of samples in the audio signal.
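Putting the class's lifecycle together in one place: create the model, optionally enable the language model, run batch recognition, then dispose. The paths, beam width, and LM alpha/beta values below are illustrative placeholders, and the one-second silence buffer merely stands in for real audio:

```csharp
using System;
using DeepSpeechClient;

class BatchExample
{
    static void Main()
    {
        var stt = new DeepSpeech();
        try
        {
            stt.CreateModel("output_graph.pbmm", 500);
            // Optional KenLM rescoring; alpha/beta here are illustrative.
            stt.EnableDecoderWithLM("lm.binary", "trie", 0.75f, 1.85f);

            int rate = stt.GetModelSampleRate();
            short[] samples = new short[rate]; // one second of silence as a stand-in
            Console.WriteLine(stt.SpeechToText(samples, (uint)samples.Length));
        }
        finally
        {
            stt.Dispose(); // frees the native model resources
        }
    }
}
```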
ErrorCodes

enum DeepSpeechClient.Enums.ErrorCodes
  Error codes from the native DeepSpeech binary.

Values:
- DS_ERR_OK = 0x0000
- DS_ERR_NO_MODEL = 0x1000
- DS_ERR_INVALID_ALPHABET = 0x2000
- DS_ERR_INVALID_SHAPE = 0x2001
- DS_ERR_INVALID_LM = 0x2002
- DS_ERR_MODEL_INCOMPATIBLE = 0x2003
- DS_ERR_FAIL_INIT_MMAP = 0x3000
- DS_ERR_FAIL_INIT_SESS = 0x3001
- DS_ERR_FAIL_INTERPRETER = 0x3002
- DS_ERR_FAIL_RUN_SESS = 0x3003
- DS_ERR_FAIL_CREATE_STREAM = 0x3004
- DS_ERR_FAIL_READ_PROTOBUF = 0x3005
- DS_ERR_FAIL_CREATE_SESS = 0x3006
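The values are plain hex constants, so a native status can be cast straight to the enum for logging. A small sketch (a subset of the enum is re-declared here only to keep the example self-contained; the real type lives in DeepSpeechClient.Enums):

```csharp
using System;

// Subset of the error codes listed above, re-declared for self-containment.
enum ErrorCodes
{
    DS_ERR_OK = 0x0000,
    DS_ERR_NO_MODEL = 0x1000,
    DS_ERR_MODEL_INCOMPATIBLE = 0x2003,
    DS_ERR_FAIL_CREATE_STREAM = 0x3004,
}

class ErrorCodeExample
{
    static void Main()
    {
        int nativeStatus = 0x2003; // as returned by the native binary
        var code = (ErrorCodes)nativeStatus;
        Console.WriteLine(code);               // prints the enum name
        Console.WriteLine(code == ErrorCodes.DS_ERR_OK ? "ok" : "failed");
    }
}
```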
Metadata

struct Metadata

Package Attributes

- unsafe IntPtr DeepSpeechClient.Structs.Metadata.items
  Native list of items.

- unsafe int DeepSpeechClient.Structs.Metadata.num_items
  Count of items from the native side.

- unsafe double DeepSpeechClient.Structs.Metadata.confidence
  Approximated confidence value for this transcription.
MetadataItem

struct MetadataItem

Package Attributes

- unsafe IntPtr DeepSpeechClient.Structs.MetadataItem.character
  Native character.

- unsafe int DeepSpeechClient.Structs.MetadataItem.timestep
  Position of the character in units of 20 ms.

- unsafe float DeepSpeechClient.Structs.MetadataItem.start_time
  Position of the character in seconds.
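Because items is a native pointer rather than a managed array, reading the per-character entries takes explicit marshalling. A hypothetical sketch of rebuilding the transcript from a Metadata value (this assumes the structs are visible to the caller; in the client they are package-level attributes, so the actual client code does this internally):

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;
using DeepSpeechClient.Structs;

class MetadataExample
{
    // Walks the native MetadataItem array and concatenates the characters.
    static string Transcript(Metadata metadata)
    {
        var sb = new StringBuilder();
        int itemSize = Marshal.SizeOf(typeof(MetadataItem));
        for (int i = 0; i < metadata.num_items; i++)
        {
            // Advance the base pointer by one struct-size per item.
            IntPtr itemPtr = new IntPtr(metadata.items.ToInt64() + (long)i * itemSize);
            var item = (MetadataItem)Marshal.PtrToStructure(itemPtr, typeof(MetadataItem));
            sb.Append(Marshal.PtrToStringAnsi(item.character));
        }
        return sb.ToString();
    }
}
```

Note that the Metadata struct (and each item's character pointer) remains owned by the native side; it must still be released with FreeMetadata once read.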