A Systematic Review of Anomaly Detection within High Dimensional and Multivariate Data

Syahirah Suboh; Izzatdin Abdul Aziz; Shazlyn Milleana Shaharudin; Saidatul Akmar Ismail; Hairulnizam Mahdin

doi:10.30630/joiv.7.1.1297

Syahirah Suboh - Universiti Teknologi PETRONAS, Perak, Malaysia
Izzatdin Aziz - Universiti Teknologi PETRONAS, Perak, Malaysia
Shazlyn Shaharudin - Universiti Pendidikan Sultan Idris, Perak, Malaysia
Saidatul Ismail - Universiti Teknologi MARA, Selangor, Malaysia
Hairulnizam Mahdin - Universiti Tun Hussein Onn, Johor, Malaysia

Abstract

In data analysis, recognizing unusual patterns (outlier analysis or anomaly detection) plays a crucial role in identifying critical events. Because of its widespread use in many applications, it remains an important and extensive research branch in data mining. As a result, numerous techniques for finding anomalies have been developed, and more are still being worked on. By identifying anomalies, researchers can gain vital knowledge that helps them make more meaningful data analyses. However, anomaly detection is even more challenging when the datasets are high-dimensional and multivariate. Anomaly detection in general has received much attention in the literature, but anomaly detection specifically under high-dimensional and multivariate conditions has received less. This paper systematically reviews the existing related techniques and presents extensive coverage of the challenges and perspectives of anomaly detection within high-dimensional and multivariate data. At the same time, it provides a clear insight into the techniques developed for anomaly detection problems. This paper aims to help readers select the technique best suited to their purpose. It has been found that PCA, DOBIN, the Stray algorithm, and DAE-KNN have a high learning rate compared to the random projection, ROBEM, and OCP methods. Overall, most methods have shown an excellent ability to tackle the curse of dimensionality and multivariate features when performing anomaly detection. Moreover, a comparison of the algorithms for anomaly detection is provided to support the development of better algorithms. Finally, future studies could extend this work by comparing the methods on other domain-specific datasets and offering a comprehensive anomaly interpretation that describes the true nature of the anomalies.

Keywords

anomaly detection; high-dimensional data; multivariate data; information science
