
Documents on industrial relations disputes cover four types of legal disputes. However, the large amount of information they contain makes it difficult for readers to find the essential points. This research aims to automatically summarize court decisions on industrial relations disputes that have permanent legal force. It involved 35 court decision documents obtained from the official website of Indonesia's Supreme Court and employed an extractive summarization approach using the Cross Latent Semantic Analysis (CLSA) and Long Short-Term Memory (LSTM) methods. The two methods were compared to obtain the best results. CLSA was employed to analyze the connection between phrases, requiring the ordering of related words before they were converted into a complete summary. LSTM was combined with an Attention module to encode and decode the input into a form that the system can process, and several training/testing splits of the documents were evaluated to find the highest performance the system could achieve. The CLSA method gave a precision of 79.1%, a recall of 39.7%, and a ROUGE-1 score of 50.9%, while LSTM improved on CLSA with a precision of 93.6%, a recall of 94.5%, and a ROUGE-1 score of 93.9% on the split of 95% training and 5% testing data.


I. INTRODUCTION
Artificial Intelligence (AI) is a system that can learn from the user's experience. AI can also form patterns from data prepared as training material. These patterns allow automation resembling that performed by humans, commonly dubbed human-centered Artificial Intelligence [1], [2]. The implementation of AI has reached many sectors, such as education [3], [4], health [5], and the social and business sectors [6]. With this wider scope, AI also plays a role in automated decision-making within the legal domain. In automated decision-making, AI can consider what follows a decision according to training data previously collected and processed [7].
The court decision documents were sourced from industrial relations disputes (henceforth referred to as PHI). The documents consist of judicial decisions with abundant information. Their substance covers four types of dispute: 1) rights disputes, 2) conflicts of interest, 3) layoffs, and 4) labor union-related disputes. However, the information ranges from significant to insignificant, which makes it hard for readers to spot the essence of a PHI document. Documents on industrial relations disputes require analysis because their scope lies between society and corporations.
Text summarization is categorized into two types: abstractive and extractive. Abstractive summarization composes new words that convey the same meaning as the words in the original documents; that is, it creates new phrases without omitting the substance of the original documents [8]. On the other hand, extractive summarization selects important words that exist in the original documents. This type of summarization cannot produce new phrases but can assemble the selected material into new sentences that form a complete summary of the original documents. Extractive summarization is preferred because it is simpler and easier to implement in several cases of automated text summarization [9].
Previous studies have reviewed and surveyed text summarization of court decisions. The first study focused on the summarization of single court decisions using Latent Semantic Analysis (LSA), Maximum Marginal Relevance (MMR), Conditional Random Field (CRF), and Non-negative Matrix Factorization (NMF), whereas for multi-document summarization, methods such as Dependency Word Pair (DWP), Cosine Similarity, and Latent Dirichlet Allocation (LDA) are more recommended [10].
Other studies summarized article documents in Bahasa by employing the MMR method [11], while LSA has also been shown to work for summarizing articles in Bahasa [12]. Cross Latent Semantic Analysis (CLSA) has likewise been used to summarize documents in Bahasa [13]. A comparison between LSA and CLSA on 240 news documents in Bahasa resulted in an F-measure of 70% for LSA and 72% for CLSA. The CLSA method often gives shorter summaries and extracts the essential points better than LSA, but LSA is more reliable in language use [14].
Several studies have used Long Short-Term Memory (LSTM) to summarize documents, covering both extractive [15], [16] and abstractive summarization [17]-[19], and have demonstrated the performance of LSTM in text summarization. For the summarization of court decision documents, methods such as LSA [20] and a combination of LSA, LUHN, LEXRANK, and SUMBASIC [21] have been employed.
This research performed automated single-document summarization of court decision documents by employing the CLSA and LSTM methods. The PHI decisions used in this research held permanent legal force. The summarization did not distort the messages contained in the documents, and it could help readers spot the important information contained in the decisions.

II. MATERIALS AND METHOD
The research stages for summarizing PHI documents with CLSA and LSTM comprised data collection, data pre-processing, document segmentation, document ranking, and evaluation.

A. Collecting Data
The PHI data were sourced from text documents manually downloaded from the official website of the directory of Indonesia's Supreme Court (https://putusan3.mahkamahagung.go.id/). This research involved 35 PHI court decision documents with permanent legal force. The data were collected based on keywords/types of cases of industrial relations disputes. The documents were downloaded in Portable Document Format (PDF) and then converted into plain text (.txt) files.
Each sheet of a PHI decision document contained a watermark and information in both the header and footer. To improve the data quality, we cleaned the documents and then normalized them to erase ASCII from the documents.

B. Pre-processing
Pre-processing was performed to enhance the quality of the testing data for further processing. Moreover, the file format was converted to adjust the document input to the methods used. The text pre-processing methods were chosen according to the characteristics of the methods and the type and model of the dataset used, in order to enhance accuracy [22], [23]. The pre-processing stage employed case folding, tokenizing, and stop-word removal [24], [25]. Less important words were removed using a modified stop-word list, because not all words on a standard list were to be omitted. This approach was intended to produce results relevant to the characteristics of PHI documents. The stop-word list was set in consultation with a legal expert.
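The three pre-processing steps named above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the function name `preprocess`, the tiny `STOPWORDS` set, and the choice to keep only alphabetic tokens are assumptions made for the example, and the actual modified stop-word list is not given in the text.

```python
import re

# Hypothetical stop-word list for illustration only; the study used a
# modified list prepared in consultation with a legal expert.
STOPWORDS = {"yang", "dan", "di", "ke", "dari"}


def preprocess(text, stopwords=STOPWORDS):
    """Apply case folding, tokenizing, and stop-word removal to one text."""
    folded = text.lower()                    # case folding
    tokens = re.findall(r"[a-z]+", folded)   # tokenizing (assumption: alphabetic runs only)
    return [t for t in tokens if t not in stopwords]  # stop-word removal


sample = "Penggugat dan Tergugat hadir di Pengadilan Hubungan Industrial"
print(preprocess(sample))
```

Passing the stop-word set as a parameter keeps the list easy to swap for a domain-specific one, which is the kind of adjustment the modified list described above would require.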

C. Document Segmentation
One PHI document has five core parts, namely the decision header (title and decision number), the identity of both plaintiff and defendant, the case (lawsuit), the judge's consideration, the indictment, and the decision footer (the time and the names of the panel of judges deciding the case). Segmenting a PHI decision document into several parts could yield information that stands independently of the other parts of the decision, and this setting degrades the quality of the summary result.

D. Cross Latent Semantic Analysis (CLSA)
CLSA is an extractive text summarization method developed from the LSA method. CLSA was employed to analyze the connection between phrases, requiring the ordering of related words before they were converted into a complete summary [13], [14]. Summarization with CLSA began with word weighting using Term Frequency-Inverse Document Frequency (TF-IDF) [12]. The resulting word weights were then represented as a matrix, and this weight matrix was decomposed to reduce the data dimension. This research employed Singular Value Decomposition (SVD) to decompose the weight matrix. SVD reduces the number of dimensions of the word-weight matrix of the PHI decision documents. The decomposition and normalization results are presented in Equation 3 [13], [26].

$$A = U S V^{T} \tag{3}$$

Legend:
A : matrix of word weighting results
U : left singular vectors of matrix A
V : right singular vectors of matrix A
S : diagonal matrix with positive or zero entries (the singular values)

Following the matrix decomposition stage, a word list forming each sentence was obtained, which narrowed down the number of sentences [27], so that the system could extract sentences based on the total weight previously obtained from SVD, as shown in Table 4. Furthermore, the decomposition result was re-selected according to whether its weight met the average score [26].
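The weighting and average-score selection steps can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the CLSA implementation used in the study: it treats each tokenized sentence as a document, assumes the TF-IDF form tf × log10(N/df), and scores a sentence by the sum of its term weights; the SVD step is left out, and the function names are hypothetical.

```python
import math


def tfidf_matrix(sentences):
    """Return (vocab, weights); weights[i][j] is the TF-IDF of term j in sentence i."""
    vocab = sorted({word for sent in sentences for word in sent})
    n = len(sentences)
    # df: number of sentences that contain each term
    df = {w: sum(1 for sent in sentences if w in sent) for w in vocab}
    # Assumed weighting: term frequency times log10(N / df)
    weights = [[sent.count(w) * math.log10(n / df[w]) for w in vocab]
               for sent in sentences]
    return vocab, weights


def select_above_average(weights):
    """Return indices of sentences whose total weight reaches the average score."""
    scores = [sum(row) for row in weights]
    average = sum(scores) / len(scores)
    return [i for i, score in enumerate(scores) if score >= average]


sentences = [
    ["gugatan", "penggugat", "dikabulkan"],
    ["penggugat", "hadir"],
    ["majelis", "hakim", "memutus", "gugatan", "upah", "pesangon"],
]
vocab, weights = tfidf_matrix(sentences)
print(select_above_average(weights))  # [2]
```

Summing raw term weights favors long sentences, so a fuller implementation would score sentences in the reduced space produced by the decomposition rather than on the raw TF-IDF matrix.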

E. Long Short-Term Memory
Long Short-Term Memory (LSTM) can fix the drawbacks of the conventional Recurrent Neural Network (RNN). In Equation 4, $x_t$ is the input at period $t$ and $h_{t-1}$ is the hidden state of the previous period, while $f$ represents a non-linear activation function [28], which can be replaced by an LSTM. The stages of LSTM are presented in Fig. 1.
Fig. 1 indicates that LSTM has three gates: the forget gate, the input gate, and the output gate [29]. The computation process in LSTM [30] begins by filtering information with the forget gate ($f_t$), where less significant information is omitted, using the sigmoid function ($\sigma$) as in Equation 5.
As seen in the equation, $x_t$ represents the input during period $t$, and $h_{t-1}$ represents the hidden state of the previous period. The second stage is the input gate, in which the system selects the particular information to be updated into the cell state with the tanh function, as shown in Equations 6 and 7. In the following stage, the cell state receives the information determined in the earlier steps, as shown in Equation 8.
The output gate is the final stage, which produces the output value of the hidden state by passing the cell state through tanh and applying the sigmoid gate, as shown in Equations 9 and 10. The implementation of LSTM in document summarization is presented in Fig. 2.
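For reference, the gate computations described above are commonly written in the following standard form. This is the textbook LSTM formulation rather than a reproduction of the paper's typeset equations; the weight matrices $W$, the biases $b$, the candidate state $\tilde{C}_t$, and the mapping onto Equations 5 to 10 are assumptions made for illustration.

$$
\begin{aligned}
f_t &= \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) && \text{(forget gate, cf. Eq. 5)} \\
i_t &= \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right) && \text{(input gate, cf. Eq. 6)} \\
\tilde{C}_t &= \tanh\left(W_C\,[h_{t-1}, x_t] + b_C\right) && \text{(candidate state, cf. Eq. 7)} \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{(cell state update, cf. Eq. 8)} \\
o_t &= \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right) && \text{(output gate, cf. Eq. 9)} \\
h_t &= o_t \odot \tanh\left(C_t\right) && \text{(hidden state, cf. Eq. 10)}
\end{aligned}
$$

Here $\sigma$ is the sigmoid function, $[h_{t-1}, x_t]$ denotes the concatenation of the previous hidden state and the current input, and $\odot$ is element-wise multiplication.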
In the modeling process, the training data are fed into the LSTM summarization model shown in Fig. 2. First, the input layer is defined according to the vocabulary size and maximum word length, followed by an embedding layer. In this layer, the vocabulary is encoded as an array of integers, with an embedding vector for each word index. The input layer was given a maximum length of 800 words. The embedding layer referred to the maximum lengths of the original text and of the expert's text, 800 and 400 words, respectively. Next, a stacked LSTM was implemented, with several LSTM layers stacked over one another and the number of neurons set by a latent dimension of 150. An attention layer was included in the model to maximize the text encoding and decoding processes. Combining the LSTM and attention layers therefore requires defining a layer that combines the outputs of the two. The final stage involved defining the output layer as a dense layer with softmax activation.
The model was compiled with the RMSprop optimizer and the sparse categorical cross-entropy loss, which lets integer labels be used directly in place of one-hot vectors. The number of iterations (epochs) was set to 500, using an early stopping callback with a patience of ten, so that if the loss value does not improve within 10 iterations, training stops and the lowest value is used.
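The patience rule can be illustrated independently of any deep learning library. The sketch below is a plain-Python approximation of how an early stopping callback with a patience of ten behaves; the function name `early_stopping` and the loss values are hypothetical and are not taken from the experiments.

```python
def early_stopping(val_losses, patience=10):
    """Return (best_epoch, best_loss, stopped_epoch) using 1-based epochs.

    Training stops once `patience` consecutive epochs pass without a new
    lowest loss; the lowest loss seen so far is the value that is kept.
    """
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch   # new minimum: reset the wait
        elif epoch - best_epoch >= patience:
            return best_epoch, best_loss, epoch   # patience exhausted
    return best_epoch, best_loss, len(val_losses)  # ran through every epoch


# Illustrative losses: improvement for three epochs, then a plateau.
losses = [0.9, 0.5, 0.4] + [0.45] * 20
print(early_stopping(losses))  # (3, 0.4, 13)
```

With an upper limit of 500 epochs, this rule explains why a run can end earlier than the limit, and why a run that keeps improving can use all 500.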

F. Testing and Evaluation
This section aims to determine the effectiveness and accuracy of the methods used by means of Recall-Oriented Understudy for Gisting Evaluation (ROUGE), which tests text summary results [31] by matching the summaries generated by the system against those produced by an expert [32]. ROUGE precision was used to measure how accurate the system's summary is relative to the expert's summary, while ROUGE recall was used to measure how successfully the system recovers the information in the expert's summary, as presented in Equations 12 and 13.
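As a minimal sketch of the evaluation described above, ROUGE-1 can be computed from unigram overlap: precision divides the number of overlapping words by the length of the system summary, and recall divides it by the length of the expert summary. The function name `rouge_1` and the example sentences are hypothetical, and reporting the F-measure as the single ROUGE-1 score is an assumption, since the text does not state how that score was combined.

```python
from collections import Counter


def rouge_1(system_tokens, reference_tokens):
    """Return (precision, recall, f_measure) based on unigram overlap."""
    sys_counts = Counter(system_tokens)
    ref_counts = Counter(reference_tokens)
    # Clipped overlap: each word counts at most as often as it appears in both
    overlap = sum((sys_counts & ref_counts).values())
    precision = overlap / len(system_tokens) if system_tokens else 0.0
    recall = overlap / len(reference_tokens) if reference_tokens else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


system = "majelis hakim mengabulkan gugatan penggugat".split()
reference = "hakim mengabulkan gugatan penggugat untuk sebagian".split()
precision, recall, f_measure = rouge_1(system, reference)
print(round(precision, 3), round(recall, 3), round(f_measure, 3))  # 0.8 0.667 0.727
```

A system summary that is much longer than the expert's lowers precision, while one that is much shorter lowers recall.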

III. RESULTS AND DISCUSSION
The summary results for the 35 PHI court decision documents were tested using the ROUGE-1 score on each document, as presented in Table 1. The highest ROUGE scores were obtained with the LSTM method, with a maximum score of 100% and a minimum score of 70%. In addition to showing the matching scores of the summary results, this research also tested the summary results using the average precision and recall for the three methods.
These two testing methods were intended to determine how successfully the system generated a summary resembling the expert's summary. The comparison of the number of words among the original documents, the expert's summaries, and the system's summaries is presented in Table 4. In the LSTM method, the separation of the data into training and test data also affects the result for each document. Different proportions in the splitting process yield different performance, so trying several splitting variations is necessary to obtain the best results. The total amount of data for each split can be seen in Table 2. Each data split was trained with the same maximum of 500 epochs. However, a good fit could be identified with the help of the early stopping callback, which retains the minimum validation loss. For the 80:20 split, training ran for 441 iterations, and the lowest validation loss was 25%. For the 90:10 split, training ran for 391 iterations, and the lowest validation loss was 23%. For the 95:5 split, training ran for 500 iterations, and the lowest validation loss was 2%. The training process of these models is presented in Figs. 4, 5, and 6.
Each data split had a different ROUGE-1 score, with the best results obtained with 95% training data and 5% testing data, as shown in Table 3. This can happen because more data are available in the training process, which affects learning: the system becomes more stable and performs much better than with the other splitting proportions. Table 3 shows that each split has a different minimum ROUGE-1 score but the same maximum of 100%. For the 80:20 split, the minimum is 58%; for the 90:10 split, 48%; and for the 95:5 split, 70%.
Furthermore, Table 4 compares the word counts of the 35 documents after summarization with the various methods. The table shows that LSTM produces a much lower word count than the CLSA method, so from this comparison it can be concluded that LSTM is able to summarize documents optimally. Based on the analysis of the testing results, this research found several factors contributing to the varied evaluation scores of the summarization of PHI court decisions with the LSA and CLSA methods. The variety of words in the extracted sentences affected the quality of the summarization results. The numerous numbers and symbols in PHI court decisions hampered the system in extracting the connection between phrases, affecting the relevance of the system's summaries to those given by the expert during evaluation. The large number of words in the decision documents was another factor that lowered the evaluation score, because the summary generated by the system was not as systematic as that produced by the expert. The expert was able to limit the length of each summary, while the system failed to restrict the number of words it took from the documents. The discrepancy between the number of words in the expert's summary and that of the system decreased the ROUGE score, but the factors affecting the variety of summarization results could be addressed using the LSTM method.

IV. CONCLUSION
The research results show that LSA, CLSA, and LSTM can be used to summarize documents on PHI court decisions, based on testing that examines the relevance and resemblance of the system's summaries to the manual summary given by an expert. The most reliable result was obtained from the LSTM method. The summarization model based on the CLSA method can still be improved by selecting and considering pertinent pre-processing methods. In future work, improving the summary results may require the use of synonyms for each part of a PHI court decision document.
document towards words
tf : The number of words searched in one document
idf : Inverse Document Frequency
N : The number of documents
df : The number of documents towards words searched

Fig. 2 Structure of LSTM for document summarization

TABLE I ROUGE SCORE RESULTS FOR EACH METHOD (LSA, CLSA, AND LSTM)