Visualization of Prediction Potential Hotspots in Multidimensional Datasets

Adam Dudáš - Matej Bel University, Tajovského 40, Banská Bystrica, Slovakia
Bianka Modrovičová - Matej Bel University, Tajovského 40, Banská Bystrica, Slovakia


DOI: http://dx.doi.org/10.62527/joiv.9.1.2477

Abstract


Correlation analysis and visual analysis of multidimensional datasets, aimed at identifying patterns and trends, are essential elements of decision-making processes. Conventional visualization models in this area, such as correlation heatmaps, visually represent the value of the correlation coefficient measured between pairs of attributes of a multidimensional dataset, but they are hard to read when the number of attributes is large. This study concerns the design and implementation of a visualization model that can be used to identify prediction potential hotspots in analysed datasets, that is, parts of the dataset that are strongly correlated with a high number of attributes in the dataset. The proposed model focuses on a graphical representation of such hotspots based on planar, multicomponent graphs, with the aim of meta-analysis of large, multidimensional datasets. The implemented approach is evaluated in a case study focused on the analysis of the original cubic graph property dataset, in which several prediction potential hotspots of different correlation types are constructed. Beyond the construction of the hotspots themselves, this study compares the results obtained with the graphical model to those of a conventional model used in the meta-analysis of multidimensional datasets, Shapley value explanations. The results presented in this study point to the need for a robust visualization framework for the analysis of correlation structures in multidimensional datasets and for visualization models based on virtual and augmented reality.
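As a rough illustration of the hotspot idea described above, the following minimal sketch treats attributes as graph vertices, connects two attributes when the absolute value of their correlation coefficient passes a threshold, and reports attributes with many such connections. It is not the authors' implementation: the function name `find_hotspots`, the use of the Pearson coefficient, the threshold of 0.8, the minimum of three neighbours, and the synthetic data are all assumptions made for this example.

```python
import numpy as np


def find_hotspots(data, names, corr_threshold=0.8, min_neighbours=3):
    """Return attributes strongly correlated with many other attributes.

    Hypothetical helper: both thresholds are illustrative assumptions,
    not values taken from the study.
    """
    # Pearson correlation between columns (attributes) of the dataset.
    corr = np.corrcoef(data, rowvar=False)
    # Adjacency matrix of the correlation graph: an edge joins two
    # attributes whose absolute correlation reaches the threshold.
    strong = np.abs(corr) >= corr_threshold
    np.fill_diagonal(strong, False)  # ignore self-correlation

    hotspots = {}
    for i, name in enumerate(names):
        neighbours = [names[j] for j in np.flatnonzero(strong[i])]
        if len(neighbours) >= min_neighbours:
            hotspots[name] = neighbours
    return hotspots


# Synthetic data (assumed for the example): four attributes derived from
# one shared signal, plus one independent noise attribute.
rng = np.random.default_rng(0)
base = rng.normal(size=200)
data = np.column_stack([
    base,
    base + 0.1 * rng.normal(size=200),
    base + 0.1 * rng.normal(size=200),
    -base + 0.1 * rng.normal(size=200),  # strong negative correlation
    rng.normal(size=200),
])
names = ["a", "b", "c", "d", "noise"]

hotspots = find_hotspots(data, names)
for attribute, neighbours in hotspots.items():
    print(attribute, "->", neighbours)
```

Taking the absolute value groups positive and negative correlations together; separating hotspots by correlation type, as the study does, would require keeping the sign or computing other coefficients.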


Keywords


visual data analysis; prediction potential; correlation analysis; prediction potential hotspots

