Rychlik Abstract: Sequence-to-sequence mapping problem and CTC

In this introduction, I will discuss machine learning problems formulated in terms of sequence-to-sequence mapping. For example, translation from one language to another maps a sequence of characters in one language to a sequence of characters in another. Another example is speech-to-text transcription, where the input is an audio signal (a sequence of pressure-level samples taken at a high rate, typically between 8 kHz and 192 kHz) and the output is a sequence of characters representing the speech. This second problem involves a significant disparity in length: the output contains far fewer characters than the input contains sound samples, and the training examples do not specify which samples correspond to which character.

This makes speech-to-text a natural application for CTC (Connectionist Temporal Classification) combined with Recurrent Neural Networks (RNNs). CTC is a probabilistic model: it assigns a probability to each candidate output sequence by summing over all possible alignments of that sequence with the input. This allows the network to learn from unaligned examples and then to construct the most likely output sequence. The technique has been applied to numerous problems over the past decade.
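The following is a minimal pure-Python sketch of the CTC mechanics described above, not a production implementation. It assumes the RNN has already produced, for every input frame, softmax probabilities over the output alphabet plus a special "blank" symbol at index 0. The function names and the toy probability table are hypothetical and chosen for illustration. It shows three pieces: how a frame-level path collapses to a shorter output (merge repeated labels, then drop blanks), how the standard CTC forward recursion computes the probability of an output by summing over all alignments (checked against brute-force enumeration), and a simple best-path decoder.

```python
import itertools
import math

BLANK = 0  # assumption: label index 0 is reserved for the CTC blank symbol


def collapse(path, blank=BLANK):
    """Map a frame-level path to an output sequence: merge repeats, then drop blanks."""
    out, prev = [], None
    for label in path:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out


def ctc_forward(probs, target, blank=BLANK):
    """P(target | input), summed over all alignments via the CTC forward recursion.

    probs[t][k] is the (hypothetical) network output for label k at frame t.
    """
    # Extended target: blank, y1, blank, y2, ..., yL, blank
    ext = [blank]
    for label in target:
        ext += [label, blank]
    S, T = len(ext), len(probs)
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]          # start with a blank ...
    if S > 1:
        alpha[1] = probs[0][ext[1]]      # ... or with the first label
    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            total = alpha[s]                 # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]        # advance by one symbol
            # Skipping a blank is allowed only between two different labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new
    # Valid alignments end on the last label or on the trailing blank
    return alpha[-1] + (alpha[-2] if S > 1 else 0.0)


def brute_force(probs, target, blank=BLANK):
    """Same quantity by enumerating every frame-level path (exponential; for checking only)."""
    T, K = len(probs), len(probs[0])
    total = 0.0
    for path in itertools.product(range(K), repeat=T):
        if collapse(path, blank) == list(target):
            total += math.prod(probs[t][k] for t, k in enumerate(path))
    return total


def best_path_decode(probs, blank=BLANK):
    """Approximate decoding: take the most likely label at each frame, then collapse."""
    path = [max(range(len(p)), key=p.__getitem__) for p in probs]
    return collapse(path, blank)


if __name__ == "__main__":
    # Toy example: 4 frames, labels {0: blank, 1: 'a', 2: 'b'}
    probs = [
        [0.6, 0.3, 0.1],
        [0.2, 0.7, 0.1],
        [0.5, 0.2, 0.3],
        [0.1, 0.2, 0.7],
    ]
    print(best_path_decode(probs))        # -> [1, 2]
    print(ctc_forward(probs, [1, 2]), brute_force(probs, [1, 2]))
```

Training with CTC typically minimizes the negative log of `ctc_forward` for each labelled example, which requires no frame-level alignment. Best-path decoding is only an approximation of the most likely output sequence, since a single output can be produced by many paths; beam-search decoders are a common refinement.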