Automatic Summarization of Court Decision Documents over Narcotic Cases Using BERT

Galih Wasis Wicaksono - University of Muhammadiyah Malang, Malang, Indonesia
Sheila Fitria Al asqalani - University of Muhammadiyah Malang, Malang, Indonesia
Yufis Azhar - University of Muhammadiyah Malang, Malang, Indonesia
Nur Putri Hidayah - University of Muhammadiyah Malang, Malang, Indonesia
Andreawana Andreawana - University of Muhammadiyah Malang, Malang, Indonesia


DOI: http://dx.doi.org/10.30630/joiv.7.2.1811

Abstract


Reviewing court decision documents as references for handling similar cases is time-consuming. A system that summarizes court decision documents would therefore enable more efficient information extraction. This study used 50 court decision documents taken from the official website of the Supreme Court of the Republic of Indonesia, all concerning narcotics and psychotropics cases. The dataset was divided into two types: court decision documents with the defendant's identity and court decision documents without it. We used BERT, specifically the IndoBERT model, to summarize the court decision documents. Four IndoBERT variants were evaluated: IndoBERT-Base Phase 1, IndoBERT-Lite-Base Phase 1, IndoBERT-Large Phase 1, and IndoBERT-Lite-Large Phase 1. Summaries were generated at three ratios (20%, 30%, and 40%) and scored with ROUGE-1, ROUGE-2, and ROUGE-3. The results show that the pre-trained IndoBERT models performed best at a 40% summarization ratio, for documents both with and without the defendant's identity. The highest score was achieved by the IndoBERT-Lite-Base Phase 1 model, with a ROUGE-1 value of 1.00 for documents with the defendant's identity and 0.970 for documents without it at the 40% ratio. Future research could explore other BERT variants such as IndoBERT Phase 2 and LegalBERT.
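The ROUGE-N metric reported above measures n-gram overlap between a generated summary and a reference summary. The sketch below is a minimal toy implementation of ROUGE-N recall for illustration only; it is not the evaluation library the authors used, and the example sentences are hypothetical.

```python
from collections import Counter

def rouge_n(candidate: str, reference: str, n: int = 1) -> float:
    """ROUGE-N recall: overlapping n-grams divided by n-grams in the reference."""
    def ngrams(text: str) -> Counter:
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    overlap = sum((cand & ref).values())  # clipped overlap count
    return overlap / max(sum(ref.values()), 1)

# A summary identical to the reference yields a perfect ROUGE-1 of 1.0,
# matching the best score reported for documents with the defendant's identity.
print(rouge_n("terdakwa terbukti bersalah", "terdakwa terbukti bersalah"))  # 1.0
```

Under this definition, a 40% summarization ratio means the extractive summary keeps roughly 40% of the document's sentences before being scored against the reference.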


