Automatic Summarization of Court Decision Documents over Narcotic Cases Using BERT

Galih Wasis Wicaksono; Sheila Fitria Al asqalani; Yufis Azhar; Nur Putri Hidayah; Andreawana Andreawana

doi:10.30630/joiv.7.2.1811


Galih Wasis Wicaksono - University of Muhammadiyah Malang, Malang, Indonesia
Sheila Fitria Al asqalani - University of Muhammadiyah Malang, Malang, Indonesia
Yufis Azhar - University of Muhammadiyah Malang, Malang, Indonesia
Nur Putri Hidayah - University of Muhammadiyah Malang, Malang, Indonesia
Andreawana Andreawana - University of Muhammadiyah Malang, Malang, Indonesia


Abstract

Reviewing court decision documents for reference when handling similar cases can be time-consuming. A system that summarizes court decision documents is therefore needed to make information extraction practical. This study used 50 court decision documents on narcotics and psychotropics cases, taken from the official website of the Supreme Court of the Republic of Indonesia. The dataset was divided into two types: court decision documents with the defendant's identity and court decision documents without it. We used BERT, specifically the IndoBERT model, to summarize the documents. Four IndoBERT variants were used: IndoBERT-Base-Phase 1, IndoBERT-Lite-Base-Phase 1, IndoBERT-Large-Phase 1, and IndoBERT-Lite-Large-Phase 1. The summaries were generated at three summary ratios (20%, 30%, and 40%) and evaluated with three ROUGE-N metrics (ROUGE-1, ROUGE-2, and ROUGE-3). The results show that the pre-trained IndoBERT models performed best at a 40% summary ratio, for documents both with and without the defendant's identity. The highest ROUGE score was produced by the IndoBERT-Lite-Base-Phase 1 model, with a ROUGE-1 value of 1.00 for documents with the defendant's identity and 0.970 for documents without it, at a ratio of 40%. Future research could use other BERT models, such as IndoBERT Phase 2 or LegalBERT.
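To make the evaluation metric concrete, the following is a minimal sketch of ROUGE-N for N = 1, 2, 3. It is not the authors' evaluation code. The abstract does not say whether recall, precision, or F1 is reported, so recall is assumed here. Lowercased whitespace tokenization is also an assumption, and the function names and the Indonesian example sentences are hypothetical.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-gram in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n):
    """ROUGE-N recall: share of reference n-grams that also occur in the candidate.

    Assumption: lowercased whitespace tokenization; a real evaluation would
    likely use a proper tokenizer.
    """
    cand = ngram_counts(candidate.lower().split(), n)
    ref = ngram_counts(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Clipped overlap: each reference n-gram is matched at most as often
    # as it appears in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


if __name__ == "__main__":
    # Hypothetical system summary and reference summary.
    system = "terdakwa terbukti bersalah"
    reference = "terdakwa terbukti secara sah bersalah"
    for n in (1, 2, 3):
        print(f"ROUGE-{n} recall: {rouge_n_recall(system, reference, n):.3f}")
```

Under this definition, a score of 1.00 means that every reference n-gram also appears in the generated summary.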


