Entropy Based Method for Malicious File Detection

Muhammad Edzuan Zainodin - Universiti Teknologi Malaysia, 81310, Skudai, Johor, Malaysia
Zalmiyah Zakaria - Universiti Teknologi Malaysia, 81310, Skudai, Johor, Malaysia
Rohayanti Hassan - Universiti Teknologi Malaysia, 81310, Skudai, Johor, Malaysia
Zubaile Abdullah - Universiti Tun Hussein Onn Malaysia, Batu Pahat, Johor, Malaysia

DOI: http://dx.doi.org/10.30630/joiv.6.4.1265


Ransomware is by no means a recent invention, having existed since at least 1989, yet it still poses a real threat in the 21st century. As the number of computer users grows, so will this threat, affecting more victims and increasing the losses incurred by the people and organizations hit by a successful attack. In most cases, victims are left with only two courses of action: pay the ransom or lose their data. One behavior shared by all crypto-ransomware strains is that they attempt to encrypt the victim's files at some point during execution. This paper demonstrates a technique that identifies when these encrypted files are being generated, independent of the ransomware strain. Previous research has highlighted the difficulty of differentiating between compressed and encrypted files using Shannon entropy, as both file types exhibit similarly high values. One of the experiments described in this study revealed a unique characteristic in the Shannon entropy of encrypted file header fragments, which was used to distinguish encrypted files from other high-entropy files such as archives. To overcome the difficulty of separating the two file types, this study proposes an approach for test case generation that enhances the entropy-based threat tree model in order to improve malicious file identification. File identification was enhanced by combining three entropy algorithms, and the test cases were generated from the threat tree model. The approach was then evaluated using accuracy measurements based on true positives, true negatives, false positives, and false negatives. A promising result is expected. This method addresses the challenge of using file entropy to distinguish compressed and archived files from ransomware-encrypted files in a timely manner.
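
As a rough illustration of the quantities the abstract refers to, the following Python sketch computes the Shannon entropy of a file's header fragment and an accuracy score from the four confusion-matrix counts. It is not the authors' implementation: the function names, the 256-byte fragment size, and the use of plain Shannon entropy alone (rather than the paper's combination of three entropy algorithms) are assumptions made for this example.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of `data` in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # frequency of each byte value
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def header_entropy(path: str, fragment_size: int = 256) -> float:
    """Entropy of the first `fragment_size` bytes of a file.

    The 256-byte default is an assumption for this sketch, not a value
    taken from the paper.
    """
    with open(path, "rb") as fh:
        return shannon_entropy(fh.read(fragment_size))


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of correct decisions: (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0
```

A fragment whose bytes are spread evenly across all 256 values scores the maximum of 8 bits per byte, while a fragment made of a single repeated byte scores 0. Where a decision threshold should sit between those extremes is not stated in the abstract and is left open here.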


Entropy; malicious; ransomware.


