Journal of Data Analysis

Volume 2, Number 2, December 2019

Penerapan Time Delay Neural Network pada Model Akustik untuk Sistem Voice-to-Text Berbahasa Sunda (Application of a Time Delay Neural Network to the Acoustic Model for a Sundanese Voice-to-Text System)

Alim Misbullah, Nazaruddin Nazaruddin, Marzuki Marzuki, Zulfan Zulfan

Abstract


Deep learning techniques have recently produced promising results across many research areas, especially in pattern recognition. Neural networks, a core part of deep learning, are widely used to build models for pattern recognition tasks, including speech recognition. In a neural network, the weights connecting the layers capture information from the input data; these parameters are updated each iteration based on the input features. In speech recognition, neural networks are used to build acoustic models trained on speech from many different speakers, typically for a specific language such as English, Mandarin, or Indonesian. Deep-neural-network speech recognition for English is by now well developed and deployed in many applications, but deep neural networks are still rarely applied to local languages. In this research, a time delay neural network is used to build the acoustic model of a speech recognition system for the Sundanese language. Experimental results show that the time delay neural network reduces the WER to 0.57% once the network's hyperparameters are well tuned.
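A time delay neural network models long temporal context by letting each layer combine input frames taken at fixed temporal offsets, so that stacked layers see a progressively wider window of speech. A minimal sketch of that frame-splicing step follows; the offsets, feature dimensions, and helper name are illustrative assumptions, not the configuration used in the paper:

```python
# Sketch of the frame-splicing idea behind a TDNN layer: each output
# frame is the concatenation of input frames at fixed temporal offsets.
# Offsets and dimensions below are illustrative only.

def splice(frames, offsets):
    """For every frame t, concatenate the feature vectors at t + offset
    for each offset, clamping indices at the utterance boundaries."""
    spliced = []
    for t in range(len(frames)):
        row = []
        for off in offsets:
            # Clamp so the first/last frames reuse the edge frame.
            idx = min(max(t + off, 0), len(frames) - 1)
            row.extend(frames[idx])
        spliced.append(row)
    return spliced

# Toy "utterance": 5 frames of 2-dimensional features.
utt = [[float(t), float(t) + 0.5] for t in range(5)]

# A layer with offsets {-2, 0, 2} sees a 5-frame window of context.
out = splice(utt, [-2, 0, 2])
```

Stacking several such layers with different offsets is what lets a TDNN cover long temporal contexts while each individual layer stays narrow.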

Keywords

neural network; speech recognition; acoustic model
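The WER figure reported in the abstract is the standard word error rate: the word-level Levenshtein (edit) distance between the hypothesis and the reference transcript, divided by the number of reference words. A minimal, self-contained sketch of that computation — the function name is ours, not the paper's:

```python
# Word error rate: edit distance between word sequences, normalized by
# the reference length.

def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # DP table: d[i][j] = edit distance between ref[:i] and hyp[:j].
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i          # deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j          # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, `wer("a b c d", "a x c")` is 0.5: one substitution plus one deletion over four reference words.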

 Full Text:

PDF (Bahasa Indonesia)


DOI: https://doi.org/10.24815/jda.v2i2.15235


About The Authors

Alim Misbullah
Department of Informatics, Faculty of Mathematics and Natural Sciences, Universitas Syiah Kuala, Banda Aceh, Indonesia

I am a junior lecturer in the Department of Informatics, Faculty of Mathematics and Natural Sciences, Universitas Syiah Kuala. My research focuses on machine learning, big data, and deep learning, especially speech recognition systems.

Nazaruddin Nazaruddin
Department of Informatics, Faculty of Mathematics and Natural Sciences, Universitas Syiah Kuala, Banda Aceh, Indonesia

Marzuki Marzuki
Department of Statistics, Faculty of Mathematics and Natural Sciences, Universitas Syiah Kuala, Banda Aceh, Indonesia

Zulfan Zulfan
Department of Informatics, Faculty of Mathematics and Natural Sciences, Universitas Syiah Kuala, Banda Aceh, Indonesia


ISSN 2623-0658 (Print)

ISSN 2623-2286 (Online)

 

Journal of Data Analysis (JDA) is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.