A New Feature Extraction Approach in Classification for Improving the Accuracy of Proteins
DOI: http://dx.doi.org/10.62527/joiv.9.1.2589
References
L. Guruprasad, “Protein Structure,” Resonance, 2019, doi: 10.1007/s12045-019-0783-7.
N. Fujii, T. Takata, N. Fujii, K. Aki, and H. Sakaue, “D-Amino acids in protein: The mirror of life as a molecular index of aging,” Biochim. Biophys. Acta - Proteins Proteomics, vol. 1866, no. 7, pp. 840–847, 2018, doi: 10.1016/j.bbapap.2018.03.001.
S. Kadakeri, M. R. Arul, R. Bordett, N. Duraisamy, H. Naik, and S. Rudraiah, Protein synthesis and characterization. Elsevier Ltd., 2020.
Q. Zhong et al., “Protein posttranslational modifications in health and diseases: Functions, regulatory mechanisms, and therapeutic implications,” MedComm, vol. 4, no. 3, pp. 1–112, 2023, doi: 10.1002/mco2.261.
F. Li et al., “Positive-unlabelled learning of glycosylation sites in the human proteome,” BMC Bioinformatics, vol. 20, no. 1, pp. 1–17, 2019, doi: 10.1186/s12859-019-2700-1.
T. Pitti, C. T. Chen, H. N. Lin, W. K. Choong, W. L. Hsu, and T. Y. Sung, “N-GlyDE: a two-stage N-linked glycosylation site prediction incorporating gapped dipeptides and pattern-based encoding,” Sci. Rep., 2019, doi: 10.1038/s41598-019-52341-z.
D. Wang et al., “MusiteDeep: A deep-learning based webserver for protein post-translational modification site prediction and visualization,” Nucleic Acids Res., vol. 48, no. W1, pp. W140–W146, 2020, doi: 10.1093/nar/gkaa275.
Y. Zhang and L. Sun, “Sweetening the Deal: Glycosylation and its Clinical Applications,” J. Biomed. Sci., vol. 9, no. 3, pp. 1–7, 2020, doi: 10.36648/2254-609x.9.3.9.
Y. Mazola, G. Chinea, and A. Musacchio, “Integrating bioinformatics tools to handle glycosylation,” PLoS Comput. Biol., vol. 7, no. 12, pp. 1–8, 2011, doi: 10.1371/journal.pcbi.1002285.
G. Taherzadeh, A. Dehzangi, M. Golchin, Y. Zhou, and M. P. Campbell, “SPRINT-Gly: Predicting N- and O-linked glycosylation sites of human and mouse proteins by using sequence and predicted structural properties,” Bioinformatics, vol. 35, no. 20, pp. 4140–4146, 2019, doi: 10.1093/bioinformatics/btz215.
A. V. Everest-Dass, E. S. X. Moh, C. Ashwood, A. M. M. Shathili, and N. H. Packer, “Human disease glycomics: technology advances enabling protein glycosylation analysis–part 2,” Expert Rev. Proteomics, 2018, doi: 10.1080/14789450.2018.1448710.
H. Bashir, B. A. Wani, B. A. Ganai, and S. A. Mir, “Protein Glycosylation: An Important Tool for Diagnosis or Early Detection of Diseases,” Protein Modificomics, pp. 339–359, 2019, doi: 10.1016/b978-0-12-811913-6.00013-8.
F. R. Lumbanraja, B. Mahesworo, T. W. Cenggoro, A. Budiarto, and B. Pardamean, “An evaluation of deep neural network performance on limited protein phosphorylation site prediction data,” Procedia Comput. Sci., vol. 157, pp. 25–30, 2019, doi: 10.1016/j.procs.2019.08.137.
S. Cramer, D. Buschmann, and R. H. Schmitt, “Comparison of Feature Extraction Algorithms for Prediction of Quality Characteristics,” Procedia CIRP, vol. 112, pp. 579–584, 2022, doi: 10.1016/j.procir.2022.09.061.
F. Li et al., “GlycoMine: A machine learning-based approach for predicting N-, C- and O-linked glycosylation in the human proteome,” Bioinformatics, vol. 31, no. 9, pp. 1411–1419, 2015, doi: 10.1093/bioinformatics/btu852.
C.-H. Chien, C.-C. Chang, S.-H. Lin, C.-W. Chen, Z.-H. Chang, and Y.-W. Chu, “N-GlycoGo: Predicting Protein N-Glycosylation Sites on Imbalanced Data Sets by Using Heterogeneous and Comprehensive Strategy,” IEEE Access, 2020, doi: 10.1109/access.2020.3022629.
A. Alkuhlani, W. Gad, M. Roushdy, and A. B. M. Salem, “PUStackNGly: Positive-Unlabeled and Stacking Learning for N-Linked Glycosylation Site Prediction,” IEEE Access, vol. 10, pp. 12702–12713, 2022, doi: 10.1109/access.2022.3146395.
A. Alkuhlani, W. Gad, and M. Roushdy, “International Journal of Intelligent Prediction of O-Glycosylation Site Using Pre-Trained,” vol. 23, no. 1, pp. 41–52, 2023, doi:10.21608/ijicis.2023.160986.1218.
A. Bateman et al., “UniProt: The universal protein knowledgebase,” Nucleic Acids Res., vol. 45, no. D1, pp. D158–D169, 2017, doi: 10.1093/nar/gkw1099.
P. Regan, P. L. McClean, T. Smyth, and M. Doherty, “Early Stage Glycosylation Biomarkers in Alzheimer’s Disease,” Medicines, vol. 6, no. 3, p. 92, 2019, doi: 10.3390/medicines6030092.
U. M. Khaire and R. Dhanalakshmi, “Stability of feature selection algorithm: A review,” J. King Saud Univ. - Comput. Inf. Sci., vol. 34, no. 4, pp. 1060–1073, 2022, doi: 10.1016/j.jksuci.2019.06.012.
N. De Jay, S. Papillon-Cavanagh, C. Olsen, N. El-Hachem, G. Bontempi, and B. Haibe-Kains, “MRMRe: An R package for parallelized mRMR ensemble feature selection,” Bioinformatics, vol. 29, no. 18, pp. 2365–2368, 2013, doi: 10.1093/bioinformatics/btt383.
M. Radovic, M. Ghalwash, N. Filipovic, and Z. Obradovic, “Minimum redundancy maximum relevance feature selection approach for temporal gene expression data,” BMC Bioinformatics, vol. 18, no. 1, pp. 1–14, 2017, doi: 10.1186/s12859-016-1423-9.
T. Chen, T. He, M. Benesty, V. Khotilovich, and Y. Tang, “xgboost: Customized Extreme Gradient Boosting,” pp. 1–4, 2018, [Online]. Available: https://cran.r-project.org/package=xgboost.
T. Chen and C. Guestrin, “XGBoost: A scalable tree boosting system,” Proc. ACM SIGKDD Int. Conf. Knowl. Discov. Data Min., pp. 785–794, 2016, doi: 10.1145/2939672.2939785.
L. Zhang and C. Zhan, “Machine Learning in Rock Facies Classification: An Application of XGBoost,” pp. 1371–1374, 2017, doi: 10.1190/igc2017-351.
T. Chen and T. He, “xgboost: Extreme Gradient Boosting,” R Lect., no. 2016, pp. 1–84, 2014.
D. Berrar, “Cross-validation,” Encyclopedia of Bioinformatics and Computational Biology: ABC of Bioinformatics, 2018.
M. Ohsaki, P. Wang, K. Matsuda, S. Katagiri, H. Watanabe, and A. Ralescu, “Confusion-matrix-based kernel logistic regression for imbalanced data classification,” IEEE Trans. Knowl. Data Eng., vol. 29, no. 9, pp. 1806–1819, 2017, doi: 10.1109/TKDE.2017.2682249.
B. Ma, F. Meng, G. Yan, H. Yan, B. Chai, and F. Song, “Diagnostic classification of cancers using extreme gradient boosting algorithm and multi-omics data,” Comput. Biol. Med., 2020, doi: 10.1016/j.compbiomed.2020.103761.