Indonesian Online News Extraction and Clustering Using Evolving Clustering

Muhammad Alfian; Ali Ridho Barakbah; Idris Winarno

doi:10.30630/joiv.5.3.537

Muhammad Alfian - Informatics and Computer Engineering Department, Politeknik Elektronika Negeri Surabaya, Indonesia
Ali Ridho Barakbah - Informatics and Computer Engineering Department, Politeknik Elektronika Negeri Surabaya, Indonesia
Idris Winarno - Informatics and Computer Engineering Department, Politeknik Elektronika Negeri Surabaya, Indonesia

DOI: http://dx.doi.org/10.30630/joiv.5.3.537

Abstract

43,000 online media outlets in Indonesia publish at least one to two stories every hour. This volume of information exceeds human processing capacity and causes effects such as confusion and psychological pressure. This study proposes an Evolving Clustering method that continually adapts existing model knowledge to a real, ever-changing environment without re-clustering the data. It also proposes feature extraction with vector-space-based stemming features to improve Indonesian-language stemming. The system consists of seven stages: (1) Data Acquisition, (2) Data Pipeline, (3) Keyword Feature Extraction, (4) Data Aggregation, (5) Predefined Cluster using the Automatic Clustering algorithm, (6) Evolving Clustering, and (7) News Clustering Result. In the experiments, Automatic Clustering generated 388 predefined clusters from 3,000 news articles, one of which is the unknown cluster. Evolving Clustering then ran for two days on streaming news, resulting in a total of 611 clusters. It worked well at both updating existing models and adding new ones, achieving a cluster accuracy of 88%. However, some clusters are incorrect, so the keyword feature extraction process should be re-evaluated to extract more appropriate features for grouping. In the future, this method can be developed further by adding other functions, updating and adding to the model, and evaluating it.
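
The two behaviours named in the abstract, updating an existing model and adding a new one as news arrives, can be illustrated with a minimal sketch. This is not the authors' algorithm: the class name `EvolvingClusters`, the use of cosine distance, the running-mean centroid update, and the `threshold` parameter are all assumptions chosen only to show the general idea of clustering a stream without re-clustering earlier data.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


class EvolvingClusters:
    """Hypothetical incremental clusterer (illustrative assumption only)."""

    def __init__(self, threshold):
        self.threshold = threshold  # assumed maximum distance to join a cluster
        self.centroids = []         # one centroid vector per cluster
        self.counts = []            # number of items absorbed by each cluster

    def add(self, vector):
        """Assign one feature vector and return its cluster index."""
        best, best_dist = None, float("inf")
        for i, centroid in enumerate(self.centroids):
            dist = cosine_distance(vector, centroid)
            if dist < best_dist:
                best, best_dist = i, dist

        if best is not None and best_dist <= self.threshold:
            # Update an existing model: move the centroid to the running mean.
            n = self.counts[best]
            self.centroids[best] = [
                (c * n + x) / (n + 1)
                for c, x in zip(self.centroids[best], vector)
            ]
            self.counts[best] = n + 1
            return best

        # Add a new model: the item is too far from every known cluster.
        self.centroids.append(list(vector))
        self.counts.append(1)
        return len(self.centroids) - 1
```

In such a sketch, predefined clusters would be loaded into `centroids` and `counts` before streaming begins, and each incoming article's keyword feature vector would be passed to `add` once.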

Keywords

Evolving clustering; incremental clustering; news extraction; stemming.
