Feature Selection Techniques for Selecting Proteins that Influence Mouse Down Syndrome Using Genetic Algorithms and Random Forests

Fiqhri Putra - Department of Computer Science, Faculty of Mathematics and Natural Sciences, Bogor Agricultural University, 16680, Indonesia
Fadhlal Surado - Department of Computer Science, Faculty of Mathematics and Natural Sciences, Bogor Agricultural University, 16680, Indonesia
Global Sampurno - Department of Computer Science, Faculty of Mathematics and Natural Sciences, Bogor Agricultural University, 16680, Indonesia


DOI: http://dx.doi.org/10.30630/joiv.4.3.375

Abstract


Feature selection is a dimensionality-reduction technique widely used to find the set of features that best represents a dataset. One field that often applies it is bioinformatics, for example to select the proteins that are significant in Down syndrome. To identify the most influential proteins, this study used data from experiments on normal mice and trisomic mice (a mouse model of Down syndrome), comprising 1080 samples with expression levels for 77 proteins. The data were divided into three groups for analysis. In each group, the most influential proteins were searched for with a genetic algorithm whose fitness was calculated with a random forest. The proteins selected in the three data groups are related to improvements in learning ability and memory. Models built on the selected proteins achieved an accuracy above 98.7% in every data group.
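The search described in the abstract can be sketched as a wrapper-style genetic algorithm over binary feature masks. The code below is a minimal illustration, not the authors' implementation. The function names, population size, rates, tournament selection, and elitism are all assumptions. In particular, `toy_fitness` is a hypothetical placeholder so the sketch runs without external libraries. In the study's setting, the fitness of a mask would instead come from a random forest trained on the selected protein columns, for instance its cross-validated accuracy.

```python
import random


def run_ga(n_features, fitness, pop_size=20, generations=30,
           crossover_rate=0.8, mutation_rate=0.05, seed=0):
    """Search for a 0/1 feature mask that maximizes `fitness` (needs n_features >= 2)."""
    rng = random.Random(seed)
    # Each individual is a list of 0/1 flags, one per feature (protein).
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best_mask, best_score = None, float("-inf")

    for gen in range(generations + 1):
        scores = [fitness(ind) for ind in population]
        # Remember the best mask evaluated so far.
        for ind, score in zip(population, scores):
            if score > best_score:
                best_mask, best_score = ind[:], score
        if gen == generations:
            break  # last population has been scored; no further breeding

        def tournament():
            # Draw two individuals at random and keep the fitter one.
            a, b = rng.randrange(pop_size), rng.randrange(pop_size)
            return population[a] if scores[a] >= scores[b] else population[b]

        next_population = [best_mask[:]]  # elitism: carry the best mask over
        while len(next_population) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_features)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation, applied independently to each gene.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            next_population.append(child)
        population = next_population

    return best_mask, best_score


# Hypothetical stand-in fitness: rewards three "informative" features and
# lightly penalizes every extra feature. Replace with a random forest score
# (assumption: e.g. cross-validated accuracy) for real data.
INFORMATIVE = {2, 5, 7}


def toy_fitness(mask):
    hits = sum(mask[i] for i in INFORMATIVE)
    extras = sum(mask) - hits
    return hits - 0.1 * extras


mask, score = run_ga(10, toy_fitness, seed=1)
selected = [i for i, bit in enumerate(mask) if bit]
print(selected, score)
```

Because the fitness function is passed in as a callable, the same loop can be reused for each of the three data groups by supplying a fitness function bound to that group's samples.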

Keywords


genetic algorithm, protein expression, random forest, feature selection, Down syndrome mouse

