Mel Frequency Cepstral Coefficients (MFCC) Method and Multiple Adaline Neural Network Model for Speaker Identification

Sudi Sasongko - University of Mataram, Mataram, 83125, Indonesia
Shofian Tsaury - University of Mataram, Mataram, 83125, Indonesia
Suthami Ariessaputra - University of Mataram, Mataram, 83125, Indonesia
Syafaruddin Ch - University of Mataram, Mataram, 83125, Indonesia


DOI: http://dx.doi.org/10.30630/joiv.7.4.01376

Abstract


Speech recognition technology makes human interaction with computers more accessible. The speaker recognition process has two phases: capturing or extracting voice features, and identifying the speaker's voice pattern from the characteristics of each speaker. The speakers, both men and women, have their voices recorded and stored in a computer database. Mel Frequency Cepstral Coefficients (MFCC) are used at the feature extraction stage, with 13 characteristic coefficients per frame. MFCC is based on the variation of the human ear's critical bandwidth with frequency, which is linear at low frequencies and logarithmic at high frequencies. Each sound frame is mapped onto the Mel frequency scale and passed through a bank of triangular filters to obtain the cepstral coefficients. At the pattern recognition stage, an artificial neural network (ANN) of the Madaline model (Many Adaline, the plural form of Adaline) compares the features of the test voice against the features of the training voices, which are stored as training data. The Madaline network is trained with BFGS quasi-Newton backpropagation using a goal parameter of 0.0001. The results indicate that the Madaline model of artificial neural networks is not recommended for identification research: the recognition rate for speakers in the database reached only 61% over ten tests, only 14% of test voices from outside the database were rejected, and the rejection rate rose to 84% when the out-of-database tests used words different from the training data. The results of this model can serve as a reference for building an Android-based real-time system.
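
As a concrete illustration of the extraction stage described above, the following is a minimal sketch of computing 13 MFCC coefficients per frame from a recorded voice. The librosa library, the file name, the sampling rate, and the frame sizes are assumptions for illustration; the paper does not specify its toolchain.

```python
# Minimal MFCC extraction sketch (assumed toolchain: librosa).
import librosa

# "speaker01.wav" is a hypothetical recording of one speaker's voice.
signal, sr = librosa.load("speaker01.wav", sr=16000)

# Frame the signal, map each frame onto the Mel scale with a triangular
# filter bank, and keep the first 13 cepstral coefficients per frame,
# matching the 13 characteristic coefficients used in the abstract.
mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13,
                            n_fft=400, hop_length=160)

print(mfcc.shape)  # (13, number_of_frames)
```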
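The recognition stage can be sketched in the same spirit. Below, a Madaline-style network (a layer of Adaline units feeding a linear output layer) is trained by minimizing the mean squared error with SciPy's BFGS quasi-Newton optimizer, standing in for the BFGS quasi-Newton backpropagation named above. The layer sizes, activation choice, and synthetic data are illustrative assumptions, not the paper's actual configuration.

```python
# Hypothetical Madaline-style training sketch with a BFGS quasi-Newton optimizer.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 13))            # synthetic: 100 frames x 13 MFCC features
T = np.eye(4)[rng.integers(0, 4, 100)]    # synthetic one-hot targets for 4 speakers

n_in, n_hid, n_out = 13, 10, 4            # illustrative layer sizes

def unpack(w):
    """Split the flat parameter vector into the two weight matrices."""
    W1 = w[:n_in * n_hid].reshape(n_in, n_hid)
    W2 = w[n_in * n_hid:].reshape(n_hid, n_out)
    return W1, W2

def mse(w):
    """Training error; the 0.0001 goal parameter bounds this quantity."""
    W1, W2 = unpack(w)
    H = np.tanh(X @ W1)                   # layer of Adaline units with a squashing activation
    Y = H @ W2                            # linear output layer
    return np.mean((Y - T) ** 2)

w0 = 0.1 * rng.normal(size=n_in * n_hid + n_hid * n_out)
# BFGS quasi-Newton minimization of the training error; a goal of 0.0001
# corresponds to stopping once mse(w) falls below 1e-4.
result = minimize(mse, w0, method="BFGS")
print("final training MSE:", result.fun)
```

At test time, an unknown voice's MFCC features would be passed through the trained network and the output unit with the largest response taken as the identified speaker.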


Keywords


MFCC; ANN; Madaline; identification; speaker
