Augmentation of Additional Arabic Dataset for Jawi Writing and Classification Using Deep Learning

Safrizal Razali, Kahlil Muchtar, Muhammad Hafiz Rinaldi, Yudha Nurdin, Aulia Rahman

Abstract


This research creates an additional dataset of the Arabic characters used in writing Jawi script and trains classification models on it using two deep learning architectures, InceptionV3 and ResNet34. In the first stage, digital image processing is used to build the additional Arabic character dataset from several sources, including HMBD, AHAWP, and HUCD, covering the connected and disconnected forms of Jawi script. The image processing consists of preprocessing to enhance image quality, segmentation to separate the Arabic characters from the background, and augmentation to increase dataset variability. Once the dataset is formed, the InceptionV3 and ResNet34 models are each trained on the corresponding training data. In the classification evaluation, the ResNet34 model performed best, with an accuracy of 96%, and recognized Jawi script accurately and consistently, even for classes with similar shapes. The main contribution of this research is an additional Arabic character dataset that can be used for Jawi script recognition and for assessing the performance of various deep learning models. The study also underlines the importance of selecting an appropriate architecture for a specific character recognition task. These results can support further development of Jawi character recognition applications and offer useful insights for researchers working on the recognition of characters derived from Arabic script. The dataset augmentation results can be accessed at https://singkat.usk.ac.id/g/En0skCKGAR
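The segmentation and augmentation steps described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not specify the algorithms, so Otsu thresholding for segmentation and small random translations for augmentation are assumptions made here for illustration, and the function names (`otsu_threshold`, `segment_character`, `augment`) are hypothetical. The sketch also assumes 8-bit grayscale input with dark ink on a light background.

```python
import numpy as np


def otsu_threshold(gray: np.ndarray) -> int:
    """Return the Otsu threshold of an 8-bit grayscale image (assumed dtype uint8)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                  # probability of the dark class up to each level
    mu = np.cumsum(prob * np.arange(256))    # cumulative first moment
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    sigma_b = np.zeros(256)                  # between-class variance per candidate level
    valid = denom > 0
    sigma_b[valid] = (mu_total * omega[valid] - mu[valid]) ** 2 / denom[valid]
    return int(np.argmax(sigma_b))


def segment_character(gray: np.ndarray) -> np.ndarray:
    """Separate ink from background and crop to the character's bounding box."""
    mask = gray <= otsu_threshold(gray)      # assumption: ink is darker than paper
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:                       # no ink found: return the empty mask
        return mask
    return mask[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]


def augment(char: np.ndarray, rng: np.random.Generator,
            n: int = 4, max_shift: int = 2) -> list:
    """Create n translated copies of a binary character on a padded canvas."""
    h, w = char.shape
    variants = []
    for _ in range(n):
        canvas = np.zeros((h + 2 * max_shift, w + 2 * max_shift), dtype=bool)
        dy, dx = rng.integers(0, 2 * max_shift + 1, size=2)
        canvas[dy:dy + h, dx:dx + w] = char  # paste the character at a random offset
        variants.append(canvas)
    return variants


if __name__ == "__main__":
    # Synthetic stand-in for a scanned character: a dark block on a light page.
    page = np.full((32, 32), 220, dtype=np.uint8)
    page[10:20, 12:18] = 30
    glyph = segment_character(page)
    samples = augment(glyph, np.random.default_rng(0))
    print(glyph.shape, len(samples))
```

Translations are used here because they preserve the identity of a character; flips and large rotations are deliberately left out of the sketch, since they can turn one Arabic letterform into another or into an invalid shape.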

Keywords


additional Arabic characters; classification model training; deep learning architectures; Jawi script


DOI: https://doi.org/10.17529/jre.v20i1.33722


Creative Commons License

Jurnal Rekayasa Elektrika (JRE) is published under a Creative Commons Attribution-ShareAlike 4.0 International License.