The Awesomeness of Viterbi
Automatic Speech Recognition (ASR) refers to the task of automatically converting a speech signal into text. Nowadays, we all use this marvelous technology without even wondering about its complexity.
Current state-of-the-art systems are based on different Deep Neural Network architectures (the most basic one is the hybrid Deep Neural Network-Hidden Markov Model (DNN-HMM)), where the acoustic modeling is performed by the neural net and the decoding by the HMM.
It is kind of awesome to see that, still today, the Viterbi algorithm is used by some modern, basic speech recognition systems. To put this in context, speech recognition in the 70’s was still quite poor, dark times for researchers in speech, until statistical models were adopted at IBM (Jelinek and collaborators). Statistical models in the form of HMMs were the inflection point in speech, with Viterbi as the leader of the battalion.
This simple and beautiful algorithm makes it possible to calculate the most probable sequence of hidden states, which is what lets us decode a transcription from speech to text. What it basically does is just recursion!! Something that probably all of you learned in your first year of programming :). The Viterbi algorithm is indeed a very cool example of Dynamic Programming applied to a Machine Learning task. I suggest you take a look at it. It is not all neural nets in this field!! 😀
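To make the recursion concrete, here is a minimal Python sketch of Viterbi on a toy HMM. Everything in it is an assumption for illustration: the two hidden states ("silence", "speech"), the three frame-energy observations ("low", "mid", "high") and all the probabilities are made up, not taken from any real recognizer. It also multiplies plain probabilities to stay readable, whereas practical decoders usually work with log-probabilities to avoid numerical underflow.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (best_path, best_prob) for a discrete HMM."""
    # best[t][s]: probability of the most likely path ending in state s at time t
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    # back[t][s]: the state that precedes s on that most likely path
    back = [{}]

    # Recursion: each time step only needs the scores of the previous one.
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda r: best[t - 1][r] * trans_p[r][s])
            best[t][s] = best[t - 1][prev] * trans_p[prev][s] * emit_p[s][observations[t]]
            back[t][s] = prev

    # Backtracking: start from the best final state and follow the pointers.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


# Toy model (hypothetical numbers, for illustration only).
states = ("silence", "speech")
start_p = {"silence": 0.6, "speech": 0.4}
trans_p = {
    "silence": {"silence": 0.7, "speech": 0.3},
    "speech": {"silence": 0.4, "speech": 0.6},
}
emit_p = {
    "silence": {"low": 0.5, "mid": 0.4, "high": 0.1},
    "speech": {"low": 0.1, "mid": 0.3, "high": 0.6},
}

path, prob = viterbi(["low", "mid", "high"], states, start_p, trans_p, emit_p)
print(path)  # ['silence', 'silence', 'speech']
```

The nice part is the cost: instead of scoring every possible state sequence (which grows exponentially with the number of frames), the table is filled in one pass over time, with work proportional to the number of frames times the square of the number of states.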
By: Fabian Ritter