Transformer in mRNA Degradation Prediction

Tan Wen Yit - Universiti Teknologi Malaysia, Johor, Malaysia
Rohayanti Hassan - Universiti Teknologi Malaysia, Johor, Malaysia
Noor Hidayah Zakaria - Universiti Teknologi Malaysia, Johor, Malaysia
Shahreen Kasim - Universiti Tun Hussein Onn Malaysia, Johor, Malaysia
Sim Hiew Moi - Universiti Teknologi Malaysia, Johor, Malaysia
Alif Ridzuan Khairuddin - Universiti Teknologi Malaysia, Johor, Malaysia
Hidra Amnur - Politeknik Negeri Padang, Sumatera Barat, Indonesia

The instability of mRNA vaccines, together with their advantages, has encouraged experts worldwide to tackle the degradation problem. Machine learning models have been widely applied in bioinformatics and healthcare to gain insights from biological data. Thus, machine learning plays an important role in predicting the degradation rate of mRNA vaccine candidates. Stanford University held the OpenVaccine Challenge competition on Kaggle to gather top solutions to this problem, with the mean columnwise root mean squared error (MCRMSE) as the main performance metric. The Nucleic Transformer has been proposed by other researchers as a deep learning solution that combines a self-attention mechanism with a Convolutional Neural Network (CNN). This paper aims to enhance the performance of the existing Nucleic Transformer by using the AdaBelief or RangerAdaBelief optimizer together with a proposed decoder that consists of a normalization layer between two linear layers. Based on the experimental results, the enhanced Nucleic Transformer outperforms the existing solution. In this study, the AdaBelief optimizer performs better than the RangerAdaBelief optimizer, even though the latter possesses Ranger’s advantages. The advantages of the proposed decoder are evident only when data is limited. When data is sufficient, its performance may be similar to, but still better than, that of the linear decoder if and only if the AdaBelief optimizer is used. As a result, the combination of the AdaBelief optimizer with the proposed decoder performs best, with improvements of 2.79% and 1.38% in public and private MCRMSE, respectively.
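For readers unfamiliar with the metric named in the abstract, the following is a minimal sketch of how MCRMSE can be computed: the root mean squared error is taken separately for each target column and the per-column values are then averaged. The function name `mcrmse`, the row-per-sample layout, and the toy values are assumptions made for illustration; they are not taken from the paper or from the competition code.

```python
import math


def mcrmse(y_true, y_pred):
    """Mean columnwise RMSE.

    y_true, y_pred: lists of rows, one row per sample and one entry per
    target column (this layout is an assumption of the sketch).
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and have the same number of rows")
    n_rows = len(y_true)
    n_cols = len(y_true[0])
    column_rmse = []
    for j in range(n_cols):
        # Sum of squared errors for target column j over all samples.
        squared_error = sum((t[j] - p[j]) ** 2 for t, p in zip(y_true, y_pred))
        column_rmse.append(math.sqrt(squared_error / n_rows))
    # Average the per-column RMSE values.
    return sum(column_rmse) / n_cols


# Toy example with two samples and two target columns (made-up numbers).
truth = [[0.0, 1.0], [2.0, 3.0]]
preds = [[0.0, 2.0], [2.0, 5.0]]
score = mcrmse(truth, preds)
```

In the toy example the first column is predicted exactly (RMSE 0) and the second has errors of 1 and 2 (RMSE equal to the square root of 2.5), so the score is the mean of those two values. Lower MCRMSE is better, which is why the reported gains are reductions in the metric.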


Transformer; optimizer; AdaBelief



Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

JOIV : International Journal on Informatics Visualization
ISSN 2549-9610  (print) | 2549-9904 (online)
Organized by Department of Information Technology - Politeknik Negeri Padang, and Institute of Visual Informatics - UKM and Soft Computing and Data Mining Centre - UTHM
