Transliterating Javanese Script Images to Roman Script using Convolutional Neural Network with Transfer Learning

Mohammad Naufal - Universitas Surabaya, Surabaya, 60293, Indonesia
Joko Siswantoro - Universitas Surabaya, Surabaya, 60293, Indonesia
Juan Soebroto - Universitas Surabaya, Surabaya, 60293, Indonesia


DOI: http://dx.doi.org/10.62527/joiv.8.3.2566

Abstract


The Javanese script holds great cultural significance in Indonesia despite its declining everyday use. It is still used in parts of Java and is integral to many historical documents and texts. Reading or transliterating it by hand is time-consuming, especially for longer texts, and poses considerable challenges for non-native readers. A transliteration system that converts Javanese script into a contemporary script, such as the Roman script used for Indonesian, is therefore needed to keep these texts readable and accessible and to help preserve Java's linguistic and cultural legacy. This study develops such a system for converting Javanese script images into Roman script. It introduces an Optical Character Recognition (OCR) system that identifies Javanese script characters and transcribes them into Roman characters, focusing on the basic nglegena and sandhangan swara characters. Individual characters are isolated using horizontal and vertical projection techniques and then classified by a Convolutional Neural Network (CNN) trained with transfer learning. The system achieves an average similarity score of 90.78%, with the Xception architecture proving the most efficient in transliteration tasks. Such a system shows promise for safeguarding the Javanese script and making it accessible to a broader audience, contributing to the preservation and dissemination of Indonesia's cultural and linguistic heritage in the digital age.

Keywords


Javanese Script; transliteration; Optical Character Recognition; Convolutional Neural Network
