Welcome to DeepSpeech’s documentation!¶
DeepSpeech is an open source Speech-To-Text engine, using a model trained by machine learning techniques based on Baidu’s Deep Speech research paper. Project DeepSpeech uses Google’s TensorFlow to make the implementation easier.
To install and use DeepSpeech, all you have to do is:
# Create and activate a virtualenv
virtualenv -p python3 $HOME/tmp/deepspeech-venv/
source $HOME/tmp/deepspeech-venv/bin/activate
# Install DeepSpeech
pip3 install deepspeech
# Download pre-trained English model files
curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.pbmm
curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.scorer
# Download example audio files
curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/audio-0.9.3.tar.gz
tar xvf audio-0.9.3.tar.gz
# Transcribe an audio file
deepspeech --model deepspeech-0.9.3-models.pbmm --scorer deepspeech-0.9.3-models.scorer --audio audio/2830-3980-0043.wav
A pre-trained English model is available for use and can be downloaded by following the instructions in the usage docs. For the latest release, including pre-trained models and checkpoints, see the GitHub releases page.
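If you prefer to call the engine from Python rather than the command line, the deepspeech package exposes the same model through a small API. Here is a minimal sketch, assuming the model, scorer, and audio files downloaded above sit in the current directory:

# Minimal sketch: transcribe the sample file via the Python API (deepspeech 0.9.3).
import wave
import numpy as np
from deepspeech import Model

ds = Model("deepspeech-0.9.3-models.pbmm")
ds.enableExternalScorer("deepspeech-0.9.3-models.scorer")

# The model expects 16-bit, mono PCM audio at the model's sample rate (16 kHz).
with wave.open("audio/2830-3980-0043.wav", "rb") as w:
    assert w.getframerate() == ds.sampleRate(), "resample the audio to 16 kHz first"
    audio = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)

print(ds.stt(audio))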
Quicker inference can be performed using a supported NVIDIA GPU on Linux. See the release notes to find which GPUs are supported. To run deepspeech on a GPU, install the GPU-specific package:
# Create and activate a virtualenv
virtualenv -p python3 $HOME/tmp/deepspeech-gpu-venv/
source $HOME/tmp/deepspeech-gpu-venv/bin/activate
# Install DeepSpeech CUDA enabled package
pip3 install deepspeech-gpu
# Transcribe an audio file.
deepspeech --model deepspeech-0.9.3-models.pbmm --scorer deepspeech-0.9.3-models.scorer --audio audio/2830-3980-0043.wav
Please ensure you have the required CUDA dependencies.
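As a quick sanity check, you can verify that those libraries are discoverable before running inference. A minimal sketch; the exact library versions (CUDA 10.1, cuDNN 7) are an assumption based on the TensorFlow 1.15 build behind the 0.9.3 release:

# Sanity check: confirm the shared libraries deepspeech-gpu needs can be loaded.
# Library versions here (CUDA 10.1, cuDNN 7) are assumptions for the 0.9.3 release.
import ctypes

for lib in ("libcudart.so.10.1", "libcudnn.so.7"):
    try:
        ctypes.CDLL(lib)
        print(f"{lib}: OK")
    except OSError:
        print(f"{lib}: not found -- check your CUDA installation")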
See the output of deepspeech -h for more information on the use of deepspeech. (If you experience problems running deepspeech, please check the required runtime dependencies.)
Introduction
- Using a Pre-trained Model
  - CUDA dependency (inference)
  - Getting the pre-trained model
  - Important considerations on model inputs
  - Model compatibility
  - Using the Python package
  - Using the Node.JS / Electron.JS package
  - Using the command-line client
  - Installing bindings from source
  - Dockerfile for building from source
  - Third party bindings
- Training Your Own Model
  - Prerequisites for training a model
  - Getting the training code
  - Creating a virtual environment
  - Activating the environment
  - Installing DeepSpeech Training Code and its dependencies
  - Recommendations
  - Basic Dockerfile for training
  - Common Voice training data
  - Training a model
  - Training with automatic mixed precision
  - Checkpointing
  - Exporting a model for inference
  - Exporting a model for TFLite
  - Making a mmap-able model for inference
  - Fine-Tuning (same alphabet)
  - Transfer-Learning (new alphabet)
  - UTF-8 mode
  - Augmentation
  - Training from an Anaconda or miniconda environment
- Supported platforms for inference
- Building DeepSpeech Binaries
Contact/Getting Help¶
There are several ways to contact us or to get help:
Discourse Forums - The Deep Speech category on Discourse is the first place to look. Search for keywords related to your question or problem to see if someone else has run into it already. If you can’t find anything relevant there, search on our issue tracker to see if there is an existing issue about your problem.
Matrix chat - If your question is not addressed by either the FAQ or Discourse Forums, you can contact us on the #machinelearning channel on Mozilla Matrix; people there can try to answer/help.
Create a new issue - Finally, if you have a bug report or a feature request that isn't already covered by an existing issue, please open an issue in our repo and fill in the appropriate information about your hardware and software setup.
Decoder and scorer
Architecture and training
API Reference