Karonese Sentiment Analysis: A New Dataset and Preliminary Result

Ichwanul Muslim Karo Karo - Universiti Tun Hussein Onn, Johor, 86400, Malaysia
Mohd Farhan Md Fudzee - Universiti Tun Hussein Onn, Johor, 86400, Malaysia
Shahreen Kasim - Universiti Tun Hussein Onn, Johor, 86400, Malaysia
Azizul Azhar Ramli - Universiti Tun Hussein Onn, Johor, 86400, Malaysia

DOI: http://dx.doi.org/10.30630/joiv.6.2-2.1119


The number of active social media users keeps increasing, and these users come from various backgrounds. Active users habitually use their local or national language on social media to express their thoughts, social conditions, ideas, and perspectives, to socialize, and to publish their opinions. Karonese is a non-English language spoken mostly in North Sumatra, Indonesia, with unique morphology and phonology. Sentiment analysis has frequently been used in studies of local or national languages to obtain an overview of the broader public opinion on a particular topic. Good-quality Karonese resources are needed to support good Karonese sentiment analysis (KSA), and limited resources are an obstacle to KSA research. This work provides a Karonese dataset drawn from multi-domain social media. To complete the dataset for sentiment analysis, sentiment labels were annotated by Karonese transcribers. Three kinds of experiments were applied: KSA using machine learning alone, and KSA using machine learning with each of two feature extraction methods, Bag of Words (BoW) and TF-IDF. The machine learning algorithms are Logistic Regression, Naïve Bayes, Support Vector Machine (SVM), and K-Nearest Neighbor. Feature extraction improves model performance by 0.1 to 7.4 percent. Overall, TF-IDF contributes more than BoW as a feature extraction method. The combination of SVM with TF-IDF achieves the highest performance: accuracy of 58.1 percent, precision of 58.5 percent, recall of 57.2 percent, and F1 score of 57.84 percent.
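The difference between the two feature extraction methods named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the whitespace tokenizer, the textbook weighting tf × ln(N/df), the helper names, and the placeholder tokens (`w1`, `w2`, `w3`, which are not real Karonese words) are all assumptions made for illustration. The last helper checks that the reported F1 score is consistent with the harmonic mean of the reported precision and recall.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Return (vocabulary, raw count vectors) for whitespace-tokenized docs."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(docs):
    """Return (vocabulary, TF-IDF vectors) using tf * ln(N / df).

    Assumption: unsmoothed textbook weighting; libraries often use
    smoothed or normalized variants that give different numbers.
    """
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    vectors = [[row[j] * idf[j] for j in range(len(vocab))] for row in counts]
    return vocab, vectors


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Placeholder tokens only, not real Karonese text.
docs = ["w1 w2 w2", "w1 w3"]
vocab, bow = bag_of_words(docs)
_, weighted = tf_idf(docs)

# Precision and recall as reported in the abstract, in percent.
reported_f1 = round(f1_from(58.5, 57.2), 2)
```

In this toy input, `w1` occurs in every document, so BoW keeps its raw count while TF-IDF assigns it zero weight. Down-weighting terms that do not discriminate between documents is the general motivation for TF-IDF, although the sketch says nothing about why it outperformed BoW on this particular dataset.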


Karonese sentiment analysis; support vector machine; k-nearest neighbor; logistic regression; naïve bayes
