Implementation of Word Trends Using a Machine Learning Approach with TF-IDF and Latent Dirichlet Allocation

Dianda Rifaldi - Ahmad Dahlan University, Yogyakarta 55166, Indonesia
Abdul Fadlil - Ahmad Dahlan University, Yogyakarta 55166, Indonesia
Herman - Ahmad Dahlan University, Yogyakarta 55166, Indonesia


DOI: http://dx.doi.org/10.62527/joiv.8.4.2452

Abstract


Social media is now ubiquitous and makes it easy to share information and communicate. As a result, users upload a wide range of content, including opinions on mental health, particularly in Indonesia. Mental health refers to an individual's emotional, psychological, and social well-being, and mental health problems commonly affect individuals from adolescence to adulthood. This research used Twitter data on mental health issues gathered from October to November 2022 and applied TF-IDF and Latent Dirichlet Allocation (LDA) to analyze word trends in user-generated content. Following the sentiment analysis concept, each text was labeled as either negative or positive. TF-IDF then weighted word frequency in the documents (tweets), with the data grouped by the resulting sentiment labels. Labeling was done manually to avoid the errors that sentiment libraries available for the Indonesian language can introduce. Conclusions were drawn separately for each of the two techniques, with the aim of identifying word trends in mental health discourse among Twitter users. The results indicated that the word trends generated by LDA were consistent with the keyword 'mental health'. TF-IDF, in turn, produced word trends for the positive and negative labels, revealing the terms Twitter users commonly use to express these concerns. Future research could compare topic modeling techniques such as Latent Semantic Analysis (LSA), Probabilistic Latent Semantic Analysis (pLSA), and Hierarchical Dirichlet Process (HDP), of which LSA and pLSA are the approaches most closely related to LDA.
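The two techniques named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The toy documents are invented English stand-ins for tokenized, hand-labeled tweets, and the function names `tfidf_by_label` and `lda_gibbs` are hypothetical. The unsmoothed IDF formula, the choice of collapsed Gibbs sampling for LDA inference, and the hyperparameters `alpha`, `beta`, and `n_iter` are all assumptions; the abstract does not specify them.

```python
import math
import random
from collections import Counter, defaultdict

# Hypothetical toy data: invented English stand-ins for tokenized, hand-labeled tweets.
DOCS = [
    "i feel anxious and tired today".split(),
    "therapy helped me feel calm".split(),
    "so anxious and stressed at work".split(),
    "grateful for support and therapy".split(),
]
LABELS = ["negative", "positive", "negative", "positive"]


def tfidf_by_label(docs, labels, k=3):
    """Sum per-document TF-IDF weights within each label; return top-k (term, score)."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    totals = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        if not tokens:
            continue  # skip empty documents to avoid dividing by zero
        for term, count in Counter(tokens).items():
            tf = count / len(tokens)
            idf = math.log(n_docs / doc_freq[term])  # unsmoothed IDF (an assumption)
            totals[label][term] += tf * idf
    return {label: scores.most_common(k) for label, scores in totals.items()}


def lda_gibbs(docs, n_topics=2, alpha=0.1, beta=0.01, n_iter=200, seed=0, k=3):
    """Collapsed Gibbs sampling for LDA; returns (top words per topic, topic sizes)."""
    rng = random.Random(seed)
    vocab_size = len({term for tokens in docs for term in tokens})
    doc_topic = [[0] * n_topics for _ in docs]         # tokens per topic in each document
    topic_word = [Counter() for _ in range(n_topics)]  # term counts per topic
    topic_size = [0] * n_topics                        # total tokens per topic
    assignments = []
    for d, tokens in enumerate(docs):
        zs = [rng.randrange(n_topics) for _ in tokens]  # random initial topics
        assignments.append(zs)
        for term, z in zip(tokens, zs):
            doc_topic[d][z] += 1
            topic_word[z][term] += 1
            topic_size[z] += 1
    for _ in range(n_iter):
        for d, tokens in enumerate(docs):
            for i, term in enumerate(tokens):
                z = assignments[d][i]
                # Remove the current assignment before resampling this token.
                doc_topic[d][z] -= 1
                topic_word[z][term] -= 1
                topic_size[z] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][term] + beta)
                    / (topic_size[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][term] += 1
                topic_size[z] += 1
    top_words = [
        [term for term, count in words.most_common() if count > 0][:k]
        for words in topic_word
    ]
    return top_words, topic_size


if __name__ == "__main__":
    print(tfidf_by_label(DOCS, LABELS))
    print(lda_gibbs(DOCS)[0])
```

In this sketch, TF-IDF relies on the sentiment labels to group term weights, while LDA ignores the labels and infers topics from word co-occurrence alone, which mirrors the split between the two analyses described above. On a real corpus, a maintained library implementation would likely be preferable to a hand-written sampler.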


Keywords


LDA; mental health; modeling; sentiment analysis; TF-IDF

