The Effect of Different Optimization Techniques on End-to-End Turkish Speech Recognition Systems that use Connectionist Temporal Classification


ARSLAN R. S., Barissi N.

2nd International Symposium on Multidisciplinary Studies and Innovative Technologies, ISMSIT 2018, Kizilcahamam, Ankara, Türkiye, 19 - 21 October 2018

  • Publication Type: Conference Paper / Full-Text Paper
  • DOI Number: 10.1109/ismsit.2018.8567240
  • City of Publication: Kizilcahamam, Ankara
  • Country of Publication: Türkiye
  • Keywords: Acoustic Model (AM), Connectionist Temporal Classification (CTC), Long Short Term Memory (LSTM), Optimization Techniques, Recurrent Neural Network (RNN)
  • Affiliated with Kayseri University: No

Abstract

© 2018 IEEE. In building acoustic models for speech recognition applications, Long Short Term Memory (LSTM) based Recurrent Neural Networks (RNNs) have begun to yield better results than Gaussian Mixture Models (GMMs). Creating GMM-based acoustic models prolongs the deep learning process because it requires aligned Hidden Markov Models (HMMs). As a solution to this problem, an alternative method for generating acoustic models, based on Connectionist Temporal Classification (CTC), is proposed. In this study, a CTC-based model is created and the effects of different optimization techniques on classification performance are compared. The tests were run on Turkish speech datasets to determine the best optimization techniques for speech recognition applications. Our evaluation results showed that GradientDescent, ProximalGradientDescent and RMSPROP produce better results than the other algorithms.
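
The abstract notes that CTC avoids the frame-level alignment that GMM/HMM training needs. As a rough illustration of why, the sketch below shows the standard CTC collapsing rule: a per-frame label path is mapped to an output sequence by merging repeated labels and then dropping blanks, so many different frame paths yield the same transcription. This is not the authors' implementation; the blank symbol `_`, the function names, and the toy alphabet and scores are assumptions made for the example.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol chosen for this illustration


def ctc_collapse(frame_labels, blank=BLANK):
    """Map a per-frame label path to text: merge repeats, then drop blanks."""
    merged = [label for label, _ in groupby(frame_labels)]
    return "".join(label for label in merged if label != blank)


def greedy_decode(frame_scores, alphabet, blank=BLANK):
    """Pick the highest-scoring symbol in each frame, then collapse the path."""
    best_path = [
        alphabet[max(range(len(scores)), key=scores.__getitem__)]
        for scores in frame_scores
    ]
    return ctc_collapse(best_path, blank)


if __name__ == "__main__":
    # Two different frame paths that collapse to the same word.
    print(ctc_collapse(list("__ee_vv__eett")))
    print(ctc_collapse(list("e_v_e_t")))

    # Toy per-frame scores over a three-symbol alphabet (made-up numbers).
    alphabet = [BLANK, "a", "b"]
    scores = [
        [0.1, 0.8, 0.1],
        [0.2, 0.7, 0.1],
        [0.9, 0.05, 0.05],
        [0.1, 0.2, 0.7],
    ]
    print(greedy_decode(scores, alphabet))
```

A blank between two identical labels keeps them distinct, so the path `aa_a` collapses to `aa` rather than `a`. Greedy decoding is only the simplest way to read out a CTC model; the record does not state which decoder the study used.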