A Systematic Review of Anomaly Detection within High Dimensional and Multivariate Data

Syahirah Suboh - Universiti Teknologi PETRONAS, Perak, Malaysia
Izzatdin Aziz - Universiti Teknologi PETRONAS, Perak, Malaysia
Shazlyn Shaharudin - Universiti Pendidikan Sultan Idris, Perak, Malaysia
Saidatul Ismail - Universiti Teknologi MARA, Selangor, Malaysia
Hairulnizam Mahdin - Universiti Tun Hussein Onn, Johor, Malaysia


DOI: http://dx.doi.org/10.30630/joiv.7.1.1297

Abstract


In data analysis, recognizing unusual patterns (outlier analysis or anomaly detection) plays a crucial role in identifying critical events. Because of its widespread use across applications, it remains an important and extensive research branch of data mining. Numerous techniques for finding anomalies have therefore been developed, and more are still being worked on. Identifying anomalies gives researchers vital knowledge that helps them produce more meaningful data analyses. However, anomaly detection becomes even more challenging when the datasets are high-dimensional and multivariate. Anomaly detection in general has received much attention in the literature, but anomaly detection specifically under high-dimensional and multivariate conditions has received less. This paper systematically reviews the existing related techniques and presents extensive coverage of the challenges and perspectives of anomaly detection within high-dimensional and multivariate data. At the same time, it provides clear insight into the techniques developed for anomaly detection problems. The paper aims to help readers select the technique best suited to their purpose. It has been found that PCA, DOBIN, the Stray algorithm, and DAE-KNN have a high learning rate compared to random projection, ROBEM, and OCP methods. Overall, most methods have shown an excellent ability to tackle the curse of dimensionality and multivariate features when performing anomaly detection. A comparison of the algorithms is also provided to support the development of better algorithms. Finally, future studies could extend this work by comparing the methods on other domain-specific datasets and by offering a comprehensive interpretation of anomalies that describes their true nature.


Keywords


anomaly detection; high-dimensional data; multivariate data; information science
