Development of Extraction Features for Detecting Adolescent Personality with Machine Learning Algorithms
DOI: http://dx.doi.org/10.62527/joiv.8.3-2.3091
Abstract
This study develops a Natural Language Processing (NLP)-based feature extraction algorithm optimized for classifying personality types in adolescents. The algorithm, TF-IDF + N-Gram Z, combines Term Frequency-Inverse Document Frequency (TF-IDF) with the N-Gram Z technique to improve the feature representation of the analyzed text. TF-IDF measures how important a word is to a document relative to the rest of the corpus, while N-Gram Z enriches the context by taking into account the order in which words appear. The dataset consists of 3,200 sentences written by adolescent respondents in a survey designed to explore aspects of their personality. After feature extraction, three variants of the Naïve Bayes method are applied for classification: Multinomial Naïve Bayes, Bernoulli Naïve Bayes, and Complement Naïve Bayes. Each variant is suited to a particular kind of data, such as binary (Bernoulli) or count-based (multinomial) features. The results show that the combined TF-IDF + N-Gram Z algorithm produces highly representative features, as evidenced by strong classification performance: the Multinomial Naïve Bayes and Complement Naïve Bayes variants each achieved 98% accuracy. These findings contribute to the development of NLP-based methods for detecting adolescent personality. Combined with the Multinomial and Complement Naïve Bayes variants, the TF-IDF + N-Gram Z algorithm yields high accuracy and can be applied in practice in psychology and adolescent education.
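The abstract does not define how N-Gram Z differs from ordinary n-grams. The sketch below therefore substitutes plain contiguous word n-grams (unigrams and bigrams) with a smoothed TF-IDF weighting, and it feeds them to a hand-written Multinomial Naïve Bayes classifier. It is a minimal, hypothetical illustration of a general TF-IDF + n-gram + Naïve Bayes pipeline, not the authors' implementation. The class names, smoothing choices, toy sentences, and introvert/extrovert labels are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """Contiguous word n-grams of length 1..n_max (lower-cased, whitespace-tokenized).

    Plain n-grams are an assumed stand-in for the paper's undefined "N-Gram Z".
    """
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


class TfidfNgram:
    """Learns smoothed IDF weights over n-grams and maps a sentence to {n-gram: tf * idf}."""

    def __init__(self, n_max=2):
        self.n_max = n_max
        self.idf = {}

    def fit(self, docs):
        df = Counter()
        for doc in docs:
            df.update(set(ngrams(doc, self.n_max)))  # count each n-gram once per document
        n_docs = len(docs)
        # Smoothed IDF, ln((1 + N) / (1 + df)) + 1, so every weight stays positive
        self.idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}
        return self

    def transform(self, doc):
        # Raw term frequency times IDF; n-grams unseen during fit() are ignored
        counts = Counter(g for g in ngrams(doc, self.n_max) if g in self.idf)
        return {t: c * self.idf[t] for t, c in counts.items()}


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes over non-negative feature weights with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.vocab = {t for x in X for t in x}
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(y))
            mass = defaultdict(float)  # summed TF-IDF weight of each n-gram within class c
            for x in rows:
                for t, w in x.items():
                    mass[t] += w
            denom = sum(mass.values()) + self.alpha * len(self.vocab)
            self.log_lik[c] = {t: math.log((mass[t] + self.alpha) / denom) for t in self.vocab}
        return self

    def predict(self, x):
        # log P(c) + sum over n-grams of weight * log P(n-gram | c), known n-grams only
        scores = {c: self.log_prior[c] + sum(w * self.log_lik[c][t]
                                             for t, w in x.items() if t in self.vocab)
                  for c in self.classes}
        return max(scores, key=scores.get)


# Hypothetical toy sentences and labels, for illustration only (not the study's dataset)
train = [
    ("I enjoy meeting new people at parties", "extrovert"),
    ("I love talking with friends all day", "extrovert"),
    ("Group projects give me energy", "extrovert"),
    ("I prefer reading quietly at home", "introvert"),
    ("I need time alone after school", "introvert"),
    ("Quiet evenings alone help me recharge", "introvert"),
]
texts, labels = zip(*train)
vec = TfidfNgram(n_max=2).fit(texts)
clf = MultinomialNaiveBayes(alpha=1.0).fit([vec.transform(t) for t in texts], labels)
print(clf.predict(vec.transform("I love meeting new friends")))       # extrovert
print(clf.predict(vec.transform("I like quiet time alone at home")))  # introvert
```

In practice, scikit-learn provides the same building blocks. `TfidfVectorizer(ngram_range=(1, 2))` produces the features, and its default `smooth_idf=True` uses the same IDF formula as above, although it also L2-normalizes each row. `MultinomialNB`, `BernoulliNB`, and `ComplementNB` in `sklearn.naive_bayes` correspond to the three variants compared in the study. Note that `BernoulliNB` binarizes its inputs using its `binarize` threshold, so it sees only whether each n-gram is present, not its TF-IDF weight.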