Implementation of Ensemble Machine Learning Classifier and Synthetic Minority Oversampling Technique for Sentiment Analysis of Sustainable Development Goals in Indonesia

As part of the Sustainable Development Goals (SDGs), governments worldwide committed in 2015 to 17 goals intended to improve the quality of life for all. It is of interest to see how society responds to the SDGs approximately halfway through the SDG period. This public response was analyzed in terms of sentiment. Among internet users in Indonesia, there are 18.45 million Twitter users. The platform enables anyone to write about what they are experiencing in their lives, such as what is happening in their environment, their education system, and the food industry, how they feel, and more. To model the collected data, the researchers used Ensemble Machine Learning Classifiers (EMLC). The best model in this study is EMLC-Stacking with an 80:20 data split and SMOTE, which obtains an accuracy of 91%, 5 percentage points higher than without SMOTE. Of 15,698 tweets, this research found that 47% had positive sentiment, 28% negative sentiment, and 25% neutral sentiment. If these findings hold, they offer hope of a positive trend in the journey of the SDGs until 2030.


I. INTRODUCTION
The agenda for the Sustainable Development Goals (SDGs), which the United Nations General Assembly endorsed in September 2015 as a new universal development agenda [1], has superseded the framework for the Millennium Development Goals (MDGs), which was agreed upon in 2000. Figure 1 illustrates the seventeen Sustainable Development Goals. The SDGs follow Universal, Integrative, and Inclusive principles to ensure that no one is left behind ("No One Left Behind"). They are global and national commitments to improve the welfare of society, covering 17 goals [2]. The SDGs rest on five fundamental principles (Figure 2) that strike a balance between economic, social, and environmental dimensions: 1) People, 2) Planet, 3) Prosperity, 4) Peace, and 5) Partnership. These five principles are known as the 5P [1] and encompass 17 Goals and 169 Targets that are inseparable, interconnected, and integrated to improve the quality of human existence.
In recent years, social media has evolved into a virtual place that allows users to express their worries about the environment and health in addition to other problems of public interest such as transportation [3], health [4], [5], education [6], [7], communication [7], and economics [8], [9]. The use of social media to carry out day-to-day activities is highly prevalent all over the world. Social media platforms are an essential source of information regarding the communication and viewpoints of internet users [10].
Twitter users can produce and express their ideas directly with one another through tweets. Users can also engage with one another on the platform by following other users, viewing other users' accounts, and using the same hashtags to discuss topics of shared interest. Twitter has consequently been used more frequently for research purposes [11], including the investigation of public opinion [3]. Twitter's growth statistics are favorable, and the number of active users is growing every month [12], in contrast to the growth statistics of other social media platforms such as Facebook.
As shown in Figure 3, people can write about anything happening in their lives on Twitter, which is one reason why there are 18.45 million users of Twitter among all internet users in Indonesia. Twitter is therefore a significant social network for sentiment analysis research. The originality of this study lies in its sentiment analysis (identifying positive, neutral, and negative connotations) of all 17 goals that make up the Sustainable Development Goals (SDGs). RapidMiner and the Python programming language are the tools used for this project. The Foundational Methodology for Data Science (FMDS) was followed throughout the research. The outcomes of the study are the accuracy of the models used, the impact of using SMOTE, and the total number of positive, neutral, and negative reactions to the Sustainable Development Goals in Indonesia.

II. MATERIALS AND METHOD
The field of study known as sentiment analysis examines individuals' thoughts, feelings, and perspectives through written language [13]. In sentiment analysis, the first step is to divide the text into sentences or documents, and the second step is to decide whether the opinions expressed in those sentences or documents are positive, negative, or neutral. Emotions such as happiness, sadness, or rage can also be gleaned via sentiment analysis. An expression of sentiment depends on its subject; hence, a remark regarding one subject may have a different meaning than the same remark regarding another subject.
Flores et al. [24] used SMOTE on a dataset about the K-12 program in the Philippines and concluded that SMOTE increased accuracy. Jayapermana et al. [20] used the Ensemble Machine Learning Classifier to research the COVID-19 vaccine and concluded that the stacking ensemble classifier obtained higher accuracy than a single algorithm. Based on this related research, this study examines the effect of using SMOTE and the Ensemble Machine Learning Classifier on the accuracy of sentiment analysis of Twitter data related to the overall SDGs in Indonesia. This research uses the Foundational Methodology for Data Science (FMDS) [45], which is a development of the CRISP-DM framework. Figure 4 shows the FMDS scheme.

A. Business Understanding
Every project or research study begins with a business understanding that forms the basis for an effective solution to a business problem. This phase aims to answer the question, "What problem are you trying to find a solution to?" In this phase, the research begins by defining the problem and determining the objectives and scope of the study.

B. Analytical Approach
After the business problem is clearly understood, the research continues by determining the approach to solving the problem. This phase aims to answer the question, "How can researchers use existing data to solve problems?" This stage requires posing the problem in the context of machine learning. This research used an approach that implemented the Ensemble Machine Learning Classifier and SMOTE to analyze sentiment toward SDGs in Indonesia.
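The oversampling idea behind SMOTE can be sketched in a few lines: each synthetic sample is placed at a random point on the segment between a minority-class sample and one of its nearest neighbours from the same class. The following is a minimal standard-library sketch, not the implementation used in this study; the function names `smote` and `balance`, the default `k=5`, and the choice to oversample every class up to the size of the largest class are assumptions made for illustration.

```python
import math
import random


def smote(samples, n_new, k=5, seed=0):
    """Create n_new synthetic points from one class (a list of numeric vectors)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(samples)
        # k nearest same-class neighbours of the chosen point (excluding itself)
        others = [s for s in samples if s is not base]
        neighbours = sorted(others, key=lambda s: math.dist(base, s))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, k=5, seed=0):
    """Oversample every class up to the size of the largest class (assumed policy)."""
    by_class = {}
    for features, label in zip(X, y):
        by_class.setdefault(label, []).append(features)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = list(X), list(y)
    for label, rows in by_class.items():
        # A class with a single sample has no neighbour to interpolate towards.
        extra = smote(rows, target - len(rows), k=k, seed=seed) if len(rows) > 1 else []
        X_out.extend(extra)
        y_out.extend([label] * len(extra))
    return X_out, y_out
```

Oversampling is commonly applied to the training portion only, so that the test data keeps its original class distribution; whether the study did so is not stated in the text.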

C. Data Requirements
The analytical approach chosen determines the data requirements. This stage aims to answer the question, "What data is needed to solve your problem?" In particular, the analytical method requires specific content, format, and data representation guided by domain knowledge. The data needed for this research consists of Indonesian-language tweets regarding SDGs originating from Twitter users in Indonesia.

D. Data Collection
At this stage, the research began to collect data. This stage aims to answer the question, "Where does the data come from, and how do you get it?" The data used in this study is a collection of Indonesian-language tweets regarding the 17 SDGs posted on 10-16 July 2022. The data was obtained from Twitter using the Twitter API (Application Programming Interface) via RapidMiner.

E. Data Understanding
After collection, the data is reviewed to determine whether it meets the business and data requirements. This stage aims to answer the question, "Does the data obtained meet the needs to solve the problem?" At this stage, the researchers ensured that the data obtained was as needed and understood its contents and initial information.

F. Data Preparation
This stage includes all activities to construct the dataset used in the modeling stage. This phase aims to answer the question, "What should the researchers do to prepare the data so they can continue modeling using existing data?" This phase has two stages (Figure 5): the first is to clean the data, and the second is to label the data for the modeling stage.
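The cleaning stage can be illustrated with a short sketch of a typical tweet-preprocessing chain (cleansing, case folding, normalization, tokenizing, stop word removal, stemming) followed by duplicate removal. This is a hedged illustration rather than the study's notebook: the `SLANG` and `STOP_WORDS` tables are tiny hypothetical stand-ins for full Indonesian resources, removing hashtags and mentions is an assumed cleansing rule, and stemming is left as a pluggable `stem` function (identity by default) because it would require an external Indonesian stemmer.

```python
import re

# Hypothetical, deliberately tiny resources; a real run needs full Indonesian lists.
SLANG = {"yg": "yang", "gak": "tidak"}
STOP_WORDS = {"yang", "di", "dan", "ke"}


def preprocess(tweet, stem=lambda token: token):
    """Apply the cleaning steps in order and return the cleaned text."""
    text = re.sub(r"https?://\S+|[@#]\w+", " ", tweet)    # cleansing: URLs, mentions, hashtags
    text = re.sub(r"[^a-zA-Z\s]", " ", text)               # cleansing: digits, punctuation, emoji
    text = text.lower()                                    # case folding
    text = " ".join(SLANG.get(w, w) for w in text.split()) # normalization of informal words
    tokens = text.split()                                  # tokenizing
    tokens = [t for t in tokens if t not in STOP_WORDS]    # stop word removal
    return " ".join(stem(t) for t in tokens)               # stemming (identity unless provided)


def deduplicate(texts):
    """Drop empty strings and repeated texts while keeping the original order."""
    seen, unique = set(), []
    for text in texts:
        if text and text not in seen:
            seen.add(text)
            unique.append(text)
    return unique
```

Removing duplicates after preprocessing, rather than before, also merges tweets that differ only in links, mentions, or letter case.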

G. Modelling
The modeling phase focuses on developing a predictive model using a predetermined analytical approach. At this stage, the researchers conducted several experiments with combinations of algorithms and their respective parameters to find the best model for the next phase. This phase aims to answer the question, "How can the data be visualized to get the required solution?"
This study compared the performance of two Ensemble Machine Learning Classifier (EMLC) variants, Stacking and Voting. EMLC Stacking used the Naïve Bayes (NB), Support Vector Machine (SVM), and k-Nearest Neighbors (k-NN) algorithms as base learners and the k-NN algorithm as the meta-learner. EMLC Voting used the NB, SVM, and k-NN algorithms as its learners. The data was split using ratios of 70:30 and 80:20 because, empirically, this is the best range for splitting data [56]. The research also examines algorithm performance and the effect of using SMOTE on the dataset. Table 1 shows the model scheme used in this study.
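The two ensemble schemes described above can be sketched with scikit-learn as follows. This is an illustrative sketch under stated assumptions, not the configuration used in the study (which was built with RapidMiner): the TF-IDF features, the use of `LinearSVC` as the SVM, `n_neighbors=3`, `cv=2`, hard voting, and the toy corpus are all assumptions introduced here.

```python
from sklearn.ensemble import StackingClassifier, VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def base_learners():
    # NB, SVM and k-NN as named in the text; all hyperparameters are assumptions.
    return [
        ("nb", MultinomialNB()),
        ("svm", LinearSVC(random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=3)),
    ]


def build_stacking():
    # The k-NN meta-learner is trained on cross-validated outputs of the base learners.
    stack = StackingClassifier(
        estimators=base_learners(),
        final_estimator=KNeighborsClassifier(n_neighbors=3),
        cv=2,  # kept small only so the toy data below has enough rows per fold
    )
    return make_pipeline(TfidfVectorizer(), stack)


def build_voting():
    # Hard voting: each learner casts one vote and the majority label wins.
    vote = VotingClassifier(estimators=base_learners(), voting="hard")
    return make_pipeline(TfidfVectorizer(), vote)


# Hypothetical toy corpus (not the study's tweets), five rows per class.
texts = [
    "program sehat bagus", "kerja bagus maju", "layanan sehat bagus",
    "sekolah maju bagus", "energi bersih bagus",
    "harga naik miskin", "lapar miskin susah", "banjir parah susah",
    "kerja susah miskin", "harga naik susah",
    "rapat menteri kampus", "jadwal kampus kota", "laporan kota menteri",
    "ikan laut kota", "jadwal rapat laporan",
]
labels = ["positive"] * 5 + ["negative"] * 5 + ["neutral"] * 5

# 80:20 split, stratified so each class keeps the same share in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=0
)
models = {"stacking": build_stacking(), "voting": build_voting()}
predictions = {
    name: model.fit(X_train, y_train).predict(X_test) for name, model in models.items()
}
```

Changing `test_size` to `0.3` gives the 70:30 split. In stacking, the meta-learner sees only the base learners' outputs, which is what distinguishes it from voting, where the base predictions are combined by a fixed rule.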

H. Evaluation
At this stage, the researchers evaluated the quality of the model based on the accuracy obtained at the previous stage. This phase aims to answer the question, "Does the model obtained really answer the initial question or does it need to be readjusted?" The researchers evaluated the best model using its confusion matrix and several measures derived from it, including accuracy, precision, recall, and F1-score. The following are the formulas used for model evaluation.
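As an illustration of how these measures follow from a three-class confusion matrix, the sketch below uses the standard definitions: accuracy is the share of correctly predicted tweets, precision for a class is its correct predictions divided by all predictions of that class, recall is its correct predictions divided by all tweets that truly belong to the class, and the F1-score is the harmonic mean of precision and recall. The orientation (rows are true classes, columns are predicted classes) and the function name `evaluate` are assumptions; the matrix in the usage note is hypothetical and is not the study's Table 6.

```python
def evaluate(matrix, labels):
    """matrix[i][j] = number of tweets with true class i predicted as class j."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(labels)))
    report = {"accuracy": correct / total}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column sum: all predictions of this class
        actual = sum(matrix[i])                    # row sum: all true members of this class
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report
```

For example, `evaluate([[8, 1, 1], [2, 6, 2], [0, 2, 8]], ["positive", "neutral", "negative"])` on this made-up matrix counts 22 correct predictions out of 30 tweets.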

I. Deployment
At this stage, the researchers apply the best model obtained in the previous stage to data that does not yet have a label (data that has passed the data preparation phase). The output of this stage is the predicted sentiment for all data processed in the previous stage.

J. Feedback
At this stage, the researchers conclude the previous stages and provide feedback based on the word cloud of each SDG. The word cloud here was built from the SDGs results of the data preparation stage. Based on the word cloud, researchers can provide feedback on topics being discussed by internet users in Indonesia.
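A word cloud is a visual rendering of word frequencies, so the words that dominate it can be listed directly by counting tokens. The following minimal sketch assumes the tweets are already cleaned and space-separated; the function name `top_words_by_group` and the `(group, text)` row layout are hypothetical, and the group can be a sentiment label or an SDG number.

```python
from collections import Counter, defaultdict


def top_words_by_group(rows, n=5):
    """rows: iterable of (group, cleaned_text); returns the n most frequent words per group."""
    grouped = defaultdict(Counter)
    for group, text in rows:
        grouped[group].update(text.split())
    return {group: counts.most_common(n) for group, counts in grouped.items()}
```

Calling it with `n=5` on rows grouped by sentiment yields the kind of five-word lists that a word cloud makes visible.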

III. RESULTS AND DISCUSSION
Data collection took place on July 16, 2022, so the tweets obtained were posted between July 10 and 16, 2022, that is, up to a week before the data was retrieved from Twitter. The researchers obtained 30,381 tweets using RapidMiner and Twitter's application programming interface (API), searching with the keywords listed in Table 2. At the data understanding stage, the data is stored as a file in CSV (comma-separated values) format. The dataset has three columns: ID, Text, and SDGs, which are explained in Table 3. In the next stage, the Text attribute was used to clean the data.
In data preparation, the data was processed so that it was ready for the modeling stage. This stage used two tools: Jupyter Notebook (for cleansing, case folding, normalization, tokenizing, stop word removal, and stemming) and Microsoft Excel (for manual labeling). After text preprocessing, duplicate tweets were removed. After these two processes, the data is considered clean and ready for the next stage. Before the modeling stage, however, the clean data is divided into two parts: labeled and unlabeled data. Labeled data is obtained by manually labeling sentiments, while unlabeled data is used at the deployment stage to make predictions with the best model. Table 4 shows an example of the results of the data preparation stages. The number of tweets decreased through this process from the initial 30,381 to 15,698. Figure 6 compares the data preparation and data collection results.
The models built in this study are listed in Table 1 in the research methodology section, and Table 5 summarizes the comparison of their accuracy. Model D (EMLC-Stacking, SMOTE, 80:20) has the highest accuracy, 91%, which is 5 percentage points higher than without SMOTE. Table 6 displays the confusion matrix obtained for Model D, where T.Pos is True Positive, T.Neu is True Neutral, T.Neg is True Negative, and Pred. is an abbreviation for Predictions. Using formula (1), the researchers obtain the accuracy of the best model from the correctly predicted sentiments. The accuracy obtained is 91%.
Using formulas (2), (3), and (4), the recall, precision, and F1-score for each sentiment are obtained from the best model. Table 7 summarizes the evaluation of Model D, including precision, recall, and F1-score for the positive, neutral, and negative sentiments. Using the best model, the next step is to make predictions for the 15,698 unlabeled tweets. Figure 7 shows the predicted ratio of Indonesian public sentiment towards the SDGs. Positive sentiment accounts for the largest share of tweets at 47%, followed by negative and neutral sentiment at 28% and 25%. This indicates that the people of Indonesia tweeted more positive opinions than negative or neutral ones. This is because the majority of the 17 SDGs have keywords that carry positive sentiment; the keywords used to search for tweets (Table 2) influence the number of positive sentiments.
Based on Figure 8, the researchers obtained insight into each SDG from the tweets of internet users in Indonesia. SDGs 1, 2, 10, and 13 have a higher negative sentiment than other sentiments, which indicates that the sentiment of internet users in Indonesia regarding these SDGs is primarily negative. SDGs 3, 4, 5, 6, 7, 8, 9, 12, 14, 15, 16, and 17 have a higher positive sentiment than other sentiments, which indicates that the sentiment of internet users in Indonesia regarding these SDGs is mostly positive. SDG 11 has a neutral sentiment that is higher than other sentiments, which indicates that the sentiment of internet users in Indonesia regarding this SDG is mostly neither positive nor negative.
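The overall shares and the dominant sentiment per SDG can both be derived from the predicted labels by simple counting. The sketch below is illustrative only; the function names, the `(sdg, sentiment)` row layout, and the rounding to whole percentages are assumptions, and ties between sentiments within one SDG are resolved by first occurrence.

```python
from collections import Counter, defaultdict


def sentiment_shares(sentiments):
    """Percentage of each sentiment label, rounded to whole numbers."""
    counts = Counter(sentiments)
    total = sum(counts.values())
    return {label: round(100 * count / total) for label, count in counts.items()}


def dominant_by_sdg(rows):
    """rows: iterable of (sdg, sentiment); returns the most frequent sentiment per SDG."""
    per_sdg = defaultdict(Counter)
    for sdg, sentiment in rows:
        per_sdg[sdg][sentiment] += 1
    return {sdg: counts.most_common(1)[0][0] for sdg, counts in per_sdg.items()}
```

Applied to the predicted labels, `sentiment_shares` corresponds to the kind of summary shown as a pie chart, and `dominant_by_sdg` to reading off the tallest bar per goal in a grouped bar chart.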
Based on the previous stages, the researchers found that the majority of internet user sentiment in Indonesia towards the SDGs is positive. However, there are several SDGs where negative or neutral sentiment dominates. Word clouds were built for each sentiment: positive sentiment is shown in Figure 9, neutral sentiment in Figure 10, and negative sentiment in Figure 11. Each sentiment has its own frequently occurring words. Based on Figure 9, the 5 words that appear most often in positive sentiment tweets are "Indonesia", "kerja (work)", "masyarakat (community)", "Menteri (Minister)", and "sehat (healthy)". These words are related to SDGs 3 and 8, which, based on Figure 8, are SDGs with mostly positive sentiment. Based on Figure 10, the 5 words that appear most often in neutral sentiment tweets are "kerja (work)", "ikan (fish)", "laut (sea)", "sehat (healthy)", and "kampus (campus)". These words are related to SDGs 3, 4, 8, and 14, which, based on Figure 8, are SDGs with mostly positive sentiment but also many tweets with neutral sentiment. Based on Figure 11, the 5 words that appear most often in negative sentiment tweets are "orang (people)", "kerja (work)", "lapar (hungry)", "Indonesia", and "miskin (poor)". These words are related to SDGs 1 and 2, which, based on Figure 8, are SDGs with mostly negative sentiment. The sentiment of internet users in Indonesia regarding the SDGs has been predicted to fall into one of three categories, with the following results: 47% positive sentiment, 28% negative sentiment, and 25% neutral sentiment. SDG 7 has the highest percentage of positive sentiment, SDG 13 has the highest percentage of negative sentiment, and SDG 11 has the highest percentage of neutral sentiment.
For further research, it is recommended that Python be used at the modeling stage. It is also recommended to collect data for a whole month or longer (performing the data collection process every 7 days), so that the dataset describes the sentiments of internet users in Indonesia towards the SDGs over a longer period. Future research can try to change parameters or use a different combination of algorithms to get higher accuracy than this research.
Future studies are advised to examine and analyze each SDG further, in the hope that this can help the government formulate policies based on sensitivity to the problems in each SDG sector that are currently being discussed by internet users in Indonesia. This will help raise the standard of living of people in Indonesia and accomplish the Sustainable Development Goals (SDGs) by the year 2030.

Fig. 3 Country Ranking by Twitter Users

Fig. 4 Foundational Methodology for Data Science

Fig. 6 Comparison of Data Preparation and Data Collection Results

Fig. 7 Pie Chart of SDGs Sentiment in Indonesia

Fig. 8 Graph of SDGs Sentiment in Indonesia

Fig. 11 Word cloud of Negative Sentiment Tweets about SDGs in Indonesia

IV. CONCLUSIONS
The findings of this research indicate that the stacking variant of the Ensemble Machine Learning Classifier (EMLC) is superior to the other options tested. The best model, EMLC-Stacking, achieves an accuracy of 91% when SMOTE and an 80:20 data split are used. Compared to the model without SMOTE, which had an accuracy of 86%, using SMOTE raises accuracy by 5 percentage points. The model's performance depends not only on the choice of algorithms and SMOTE but also on how the data is divided into training data and test data. In this study, the best split consisted of 80% training data and 20% test data.

TABLE V COMPARISON OF MODEL ACCURACY

TABLE VII SUMMARY EVALUATION OF MODEL D