DeepSpeech
v0.7.4
Index
B | C | D | E | F | I | M | N | O | S | T | V
B
beamWidth() (Model method)
C
CandidateTranscript (C++ class)
CandidateTranscript (class in native_client.python)
CandidateTranscript() (class)
CandidateTranscript.confidence (CandidateTranscript attribute)
CandidateTranscript.tokens (CandidateTranscript attribute)
CandidateTranscript::confidence (C++ member)
CandidateTranscript::num_tokens (C++ member)
CandidateTranscript::tokens (C++ member)
confidence() (CandidateTranscript method)
createStream() (Model method)
D
DeepSpeechClient::Enums::DS_ERR_FAIL_CREATE_SESS (C++ enumerator)
DeepSpeechClient::Enums::DS_ERR_FAIL_CREATE_STREAM (C++ enumerator)
DeepSpeechClient::Enums::DS_ERR_FAIL_INIT_MMAP (C++ enumerator)
DeepSpeechClient::Enums::DS_ERR_FAIL_INIT_SESS (C++ enumerator)
DeepSpeechClient::Enums::DS_ERR_FAIL_INTERPRETER (C++ enumerator)
DeepSpeechClient::Enums::DS_ERR_FAIL_READ_PROTOBUF (C++ enumerator)
DeepSpeechClient::Enums::DS_ERR_FAIL_RUN_SESS (C++ enumerator)
DeepSpeechClient::Enums::DS_ERR_INVALID_ALPHABET (C++ enumerator)
DeepSpeechClient::Enums::DS_ERR_INVALID_SCORER (C++ enumerator)
DeepSpeechClient::Enums::DS_ERR_INVALID_SHAPE (C++ enumerator)
DeepSpeechClient::Enums::DS_ERR_MODEL_INCOMPATIBLE (C++ enumerator)
DeepSpeechClient::Enums::DS_ERR_NO_MODEL (C++ enumerator)
DeepSpeechClient::Enums::DS_ERR_OK (C++ enumerator)
DeepSpeechClient::Enums::DS_ERR_SCORER_NOT_ENABLED (C++ enumerator)
DeepSpeechClient::Enums::ErrorCodes (C++ enum)
DeepSpeechClient::Interfaces::IDeepSpeech (C++ class)
DeepSpeechClient::Models::CandidateTranscript (C++ class)
DeepSpeechClient::Models::DeepSpeechStream (C++ class)
DeepSpeechClient::Models::Metadata (C++ class)
DeepSpeechClient::Models::TokenMetadata (C++ class)
disableExternalScorer() (Model method)
DS_CreateModel (C++ function)
DS_CreateStream (C++ function)
DS_DisableExternalScorer (C++ function)
DS_EnableExternalScorer (C++ function)
DS_FeedAudioContent (C++ function)
DS_FinishStream (C++ function)
DS_FinishStreamWithMetadata (C++ function)
DS_FreeMetadata (C++ function)
DS_FreeModel (C++ function)
DS_FreeStream (C++ function)
DS_FreeString (C++ function)
DS_GetModelSampleRate (C++ function)
DS_IntermediateDecode (C++ function)
DS_IntermediateDecodeWithMetadata (C++ function)
DS_SetScorerAlphaBeta (C++ function)
DS_SpeechToText (C++ function)
DS_SpeechToTextWithMetadata (C++ function)
DS_Version (C++ function)
E
enableExternalScorer() (Model method)
F
feedAudioContent() (Stream method)
finishStream() (Stream method)
finishStreamWithMetadata() (Stream method)
FreeMetadata() (built-in function)
FreeModel() (built-in function)
FreeStream() (built-in function)
freeStream() (Stream method)
I
intermediateDecode() (Stream method)
intermediateDecodeWithMetadata() (Stream method)
M
Metadata (C++ class)
Metadata (class in native_client.python)
Metadata() (class)
Metadata.transcripts (Metadata attribute)
Metadata::num_transcripts (C++ member)
Metadata::transcripts (C++ member)
Model (class in native_client.python)
Model() (class)
Model.beamWidth() (Model method)
Model.createStream() (Model method)
Model.disableExternalScorer() (Model method)
Model.enableExternalScorer() (Model method)
Model.sampleRate() (Model method)
Model.setBeamWidth() (Model method)
Model.setScorerAlphaBeta() (Model method)
Model.stt() (Model method)
Model.sttWithMetadata() (Model method)
N
native_client.python (module)
O
org::mozilla::deepspeech::libdeepspeech::CandidateTranscript (C++ class)
org::mozilla::deepspeech::libdeepspeech::DeepSpeechModel (C++ class)
org::mozilla::deepspeech::libdeepspeech::Metadata (C++ class)
org::mozilla::deepspeech::libdeepspeech::TokenMetadata (C++ class)
S
sampleRate() (Model method)
setBeamWidth() (Model method)
setScorerAlphaBeta() (Model method)
start_time() (TokenMetadata method)
Stream (class in native_client.python)
Stream() (class)
Stream.feedAudioContent() (Stream method)
Stream.finishStream() (Stream method)
Stream.finishStreamWithMetadata() (Stream method)
Stream.intermediateDecode() (Stream method)
Stream.intermediateDecodeWithMetadata() (Stream method)
stt() (Model method)
sttWithMetadata() (Model method)
T
text() (TokenMetadata method)
timestep() (TokenMetadata method)
TokenMetadata (C++ class)
TokenMetadata (class in native_client.python)
TokenMetadata() (class)
TokenMetadata.start_time (TokenMetadata attribute)
TokenMetadata.text (TokenMetadata attribute)
TokenMetadata.timestep (TokenMetadata attribute)
TokenMetadata::start_time (C++ member)
TokenMetadata::text (C++ member)
TokenMetadata::timestep (C++ member)
tokens() (CandidateTranscript method)
transcripts() (Metadata method)
V
Version() (built-in function)