Welcome to DeepSpeech’s documentation!

util.audio.audiofile_to_input_vector(audio_filename, numcep, numcontext)

Given a WAV audio file at audio_filename, computes numcep MFCC features every 0.01 s using a 0.025 s analysis window. It then appends numcontext context frames to the left and right of each time step and returns the result as a NumPy array.
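The context-window step can be sketched as follows. This is a minimal sketch that assumes the MFCC frames have already been computed (one list of numcep floats per 0.01 s step) and that neighbours falling outside the utterance are zero-padded. add_context is a hypothetical helper that uses plain lists instead of NumPy, and the actual implementation may differ, for example in edge handling or normalization.

```python
from typing import List


def add_context(features: List[List[float]], numcontext: int) -> List[List[float]]:
    """Concatenate each frame with numcontext frames on either side.

    Assumption: frames outside the utterance are replaced by zero vectors.
    Each output row has numcep * (2 * numcontext + 1) values.
    """
    if not features:
        return []
    numcep = len(features[0])
    zeros = [0.0] * numcep
    stacked: List[List[float]] = []
    for t in range(len(features)):
        row: List[float] = []
        # Walk from the left context, through the current frame, to the right context.
        for offset in range(-numcontext, numcontext + 1):
            i = t + offset
            row.extend(features[i] if 0 <= i < len(features) else zeros)
        stacked.append(row)
    return stacked
```

For example, with numcep = 2 and numcontext = 1, each 2-value frame becomes a 6-value row: previous frame, current frame, next frame.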

util.text.levenshtein(a, b)

Calculates the Levenshtein distance between a and b.
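A standard way to compute this distance is the Wagner-Fischer dynamic-programming algorithm, sketched below. This is an illustrative implementation, not necessarily the one in util.text. Because it accepts any sequences, the same function works on character strings and on lists of words.

```python
from typing import Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b.

    Keeps only one row of the dynamic-programming table at a time.
    """
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter sequence in the inner loop
    previous = list(range(len(b) + 1))  # distance from the empty prefix of a
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                      # delete item_a
                current[j - 1] + 1,                   # insert item_b
                previous[j - 1] + (item_a != item_b), # substitute (free if equal)
            ))
        previous = current
    return previous[-1]
```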

util.text.sparse_tensor_value_to_texts(value, alphabet)

Given a tf.SparseTensor value, returns an array of Python strings representing its values, using alphabet to map integer labels back to characters.
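The decoding can be sketched without TensorFlow by working directly on the three components of a sparse tensor (its indices, values and dense_shape). This is a minimal sketch: alphabet is assumed to be a plain sequence indexed by integer label, and sparse_to_texts is a hypothetical name. The real function's handling of the alphabet object may differ.

```python
from typing import Dict, List, Sequence, Tuple


def sparse_to_texts(
    indices: Sequence[Tuple[int, int]],
    values: Sequence[int],
    dense_shape: Tuple[int, int],
    alphabet: Sequence[str],
) -> List[str]:
    """Rebuild one string per batch row from a sparse (indices, values, shape) triple."""
    rows: Dict[int, List[Tuple[int, str]]] = {}
    for (row, col), label in zip(indices, values):
        # Remember the column so characters can be put back in order.
        rows.setdefault(row, []).append((col, alphabet[label]))
    # Rows with no stored entries decode to the empty string.
    return ["".join(ch for _, ch in sorted(rows.get(r, []))) for r in range(dense_shape[0])]
```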

util.text.sparse_tuple_from(sequences, dtype=numpy.int32)

Creates a sparse representation of sequences.

Args:

  • sequences: a list of lists of type dtype, where each element is a sequence

Returns a tuple (indices, values, shape).
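The layout of the returned tuple can be sketched in plain Python: every element becomes one (row, position) index paired with its value, and the dense shape is the number of sequences by the length of the longest one. sparse_tuple is a hypothetical name, and this sketch returns lists rather than NumPy arrays of dtype, so treat it as an illustration of the layout, not the library code.

```python
from typing import List, Sequence, Tuple


def sparse_tuple(
    sequences: Sequence[Sequence[int]],
) -> Tuple[List[Tuple[int, int]], List[int], Tuple[int, int]]:
    indices: List[Tuple[int, int]] = []
    values: List[int] = []
    for n, seq in enumerate(sequences):
        for i, item in enumerate(seq):
            indices.append((n, i))  # (batch row, position within the sequence)
            values.append(item)
    max_len = max((len(seq) for seq in sequences), default=0)
    shape = (len(sequences), max_len)  # dense shape: batch size x longest sequence
    return indices, values, shape
```

For example, [[1, 2], [3]] becomes indices [(0, 0), (0, 1), (1, 0)], values [1, 2, 3] and shape (2, 2).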

util.text.text_to_char_array(original, alphabet)

Given a Python string original, removes unsupported characters, maps the remaining characters to integers, and returns a NumPy array representing the processed string.
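A minimal sketch of the mapping, assuming the alphabet is a string whose character positions serve as the integer labels and that any character absent from it counts as unsupported. text_to_labels is a hypothetical name and returns a list rather than a NumPy array.

```python
from typing import List


def text_to_labels(original: str, alphabet: str) -> List[int]:
    """Map each supported character of original to its index in alphabet.

    Assumption: characters not present in alphabet are unsupported and dropped.
    """
    index = {ch: i for i, ch in enumerate(alphabet)}
    return [index[ch] for ch in original if ch in index]
```

With alphabet " ab", the string "a b!" maps to [1, 0, 2]; the "!" is dropped.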

util.text.wer(original, result)

The WER is defined as the edit (Levenshtein) distance at the word level divided by the number of words in the original text. If the original has more words (N) than the result and the two are completely different (each of the N words costs one edit operation), the WER is exactly 1 (N / N = 1).
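The definition can be sketched by splitting both texts on whitespace and running an edit distance over the resulting word lists. word_error_rate is a hypothetical name, whitespace tokenization is an assumption, and the helper is repeated inline so the block runs on its own.

```python
from typing import Sequence


def _edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    # Single-row Wagner-Fischer dynamic programming over words.
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + (x != y)))
        previous = current
    return previous[-1]


def word_error_rate(original: str, result: str) -> float:
    ref = original.split()  # assumption: words are whitespace-separated
    hyp = result.split()
    if not ref:
        raise ValueError("original must contain at least one word")
    return _edit_distance(ref, hyp) / len(ref)
```

With this definition the WER is not capped at 1: a result with many extra words can exceed it, e.g. "a" against "x y z" costs 3 edits over 1 reference word.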


Returns the number of GPUs available on this system.

class util.stm.STMSegment(stm_line)

Representation of an individual segment in an STM file.
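Assuming the usual NIST STM layout, where each line holds a filename, channel, speaker ID, start and end times in seconds, an optional `<...>` label field and then the transcript, a single segment can be parsed along these lines. Segment and parse_stm_line are hypothetical names, and the actual STMSegment attributes may differ.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    filename: str
    channel: str
    speaker_id: str
    start_time: float
    stop_time: float
    labels: str
    transcript: str


def parse_stm_line(stm_line: str) -> Segment:
    tokens = stm_line.split()
    if len(tokens) < 5:
        raise ValueError("STM line needs at least 5 fields: %r" % stm_line)
    rest = tokens[5:]
    # Assumption: the optional label field is a single token such as <o,f0,male>.
    labels = rest.pop(0) if rest and rest[0].startswith("<") else ""
    return Segment(
        filename=tokens[0],
        channel=tokens[1],
        speaker_id=tokens[2],
        start_time=float(tokens[3]),
        stop_time=float(tokens[4]),
        labels=labels,
        transcript=" ".join(rest),
    )
```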


Parses an STM file at stm_file into a list of STMSegment.
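Reading a whole file amounts to parsing each non-comment line in turn. This sketch assumes NIST conventions (lines starting with ;; are comments) and keeps only a few fields. parse_stm and StmEntry are hypothetical names; the function accepts any iterable of lines, so an open file can be passed directly, e.g. `with open(stm_file) as f: segments = parse_stm(f)`.

```python
from typing import Iterable, List, NamedTuple


class StmEntry(NamedTuple):
    filename: str
    start_time: float
    stop_time: float
    transcript: str


def parse_stm(lines: Iterable[str]) -> List[StmEntry]:
    entries: List[StmEntry] = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;"):
            continue  # skip blank lines and NIST-style comment lines
        tokens = line.split()
        words = tokens[5:]
        if words and words[0].startswith("<"):
            words = words[1:]  # drop the optional label field
        entries.append(StmEntry(tokens[0], float(tokens[3]), float(tokens[4]), " ".join(words)))
    return entries
```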
