A Survey on Forms of Visualization and Tools Used in Topic Modelling

Ruhaila Maskat - Universiti Teknologi MARA Shah Alam, Selangor Malaysia
Shazlyn Shaharudin - Universiti Pendidikan Sultan Idris, Tanjong Malim, Malaysia
Deden Witarsyah - Telkom University Bandung, Indonesia
Hairulnizam Mahdin - Universiti Tun Hussein Onn Malaysia, Parit Raja, Batu Pahat, Johor, Malaysia


DOI: http://dx.doi.org/10.30630/joiv.7.2.1313

Abstract


In this paper, we surveyed recent publications on topic modeling and analyzed the forms of visualization and the tools they used. We expect this information to help Natural Language Processing (NLP) researchers decide which types of visualization suit their needs and which tools can support them. It could also spark further development of existing visualizations, or the emergence of new ones where a gap exists. Topic modeling is an NLP technique used to identify topics hidden in a collection of documents. Visualizing these topics permits a faster understanding of the underlying subject matter in terms of its domain. This survey covered publications from 2017 to early 2022. The PRISMA methodology was used to review the publications. One hundred articles were collected, and 42 were found eligible for this study after filtering. Two research questions were formulated. The first asks, "What are the different forms of visualizations used to display the result of topic modeling?" and the second asks, "What visualization software or API is used?" Our results show that different forms of visualization serve different display purposes. We categorized them as maps, networks, evolution-based charts, and others. We also found that LDAvis is the most frequently used software/API, followed by the R language packages and D3.js. The primary limitation of this survey is that it is not exhaustive; hence, some eligible publications may not be included.


Keywords


Topic visualization; Topic modelling; Visualization tools; Review; Survey

