A Model for Enhancing Pattern Recognition in Clinical Narrative Datasets through Text-Based Feature Selection and SHAP Technique

Sirajo Muhammad Dalhatu - Faculty of Computer Science and Information Technology, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
Masrah Azrifah Azmi Murad - Faculty of Computer Science and Information Technology, Universiti Putra Malaysia, Serdang, Selangor, Malaysia


DOI: http://dx.doi.org/10.62527/joiv.8.4.3664

Abstract


Clinical narratives contain crucial patient information for predicting cardiac failure. Accurate and timely cardiac failure recognition (CFR) strongly influences patient outcomes, but it is hampered by small datasets, sparse feature spaces, and underused vital-sign data. This study addresses these issues by developing a methodology that improves both the accuracy and the interpretability of CFR from clinical narratives. Four datasets (the Framingham Heart Study, Heart Disease from Kaggle, Cleveland Heart Disease, and Heart Failure Clinical Records) are preprocessed by handling missing values, removing duplicates, scaling, encoding categorical variables, and transforming unstructured data with natural language processing (NLP). Three feature selection methods (Chi-Squared, Forward Selection, and L1 Regularization) are used to identify the features most influential for CFR, and the SHapley Additive exPlanations (SHAP) technique is integrated to improve interpretability. Support Vector Machine (SVM), Logistic Regression (LR), and Random Forest (RF) models are trained and evaluated using accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC-ROC). The results indicate that L1 Regularization with LR and Chi-Squared with RF perform best on specific datasets. The final model, which combines all datasets and uses Forward Selection with RF, achieves an accuracy of 91%, precision of 87%, recall of 97%, F1-score of 91%, and AUC-ROC of 94%. The study concludes that text-based feature selection combined with SHAP interpretability enhances the accuracy and transparency of CFR models, supporting clinical decision-making. Future research should incorporate more diverse datasets, explore advanced NLP techniques, and validate the models in various clinical settings to improve robustness and applicability.
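The abstract names three feature selection methods, several classifiers, and five metrics without showing how they fit together. The following is a minimal scikit-learn sketch under stated assumptions, not the authors' code: synthetic data from `make_classification` stands in for the four clinical datasets, a hypothetical `K = 5` features are kept, every selector is paired with a Random Forest, and a logistic regression (rather than the RF itself) drives Forward Selection to keep runtime low. The helper names `make_selector` and `evaluate` are illustrative.

```python
# A sketch, not the study's code: synthetic data, K=5, and all hyperparameters
# are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import (SelectFromModel, SelectKBest,
                                       SequentialFeatureSelector, chi2)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

K = 5  # hypothetical number of features to keep


def make_selector(name, k=K):
    """Return one of the three feature selectors named in the abstract."""
    if name == "chi2":
        # Chi-squared scores need non-negative inputs (hence MinMaxScaler upstream).
        return SelectKBest(chi2, k=k)
    if name == "forward":
        # Greedy forward selection; a logistic regression scores candidate
        # subsets here to keep the sketch fast (an assumption).
        return SequentialFeatureSelector(
            LogisticRegression(max_iter=1000),
            n_features_to_select=k, direction="forward", cv=3)
    if name == "l1":
        # L1-penalised LR shrinks weak coefficients to zero; keep the k largest |coef|.
        return SelectFromModel(
            LogisticRegression(penalty="l1", solver="liblinear", C=0.5),
            max_features=k, threshold=-np.inf)
    raise ValueError(f"unknown selector: {name}")


def evaluate(name, X_train, X_test, y_train, y_test):
    """Fit scaler -> selector -> random forest and report the five metrics."""
    pipe = make_pipeline(MinMaxScaler(), make_selector(name),
                         RandomForestClassifier(n_estimators=200, random_state=0))
    pipe.fit(X_train, y_train)
    pred = pipe.predict(X_test)
    proba = pipe.predict_proba(X_test)[:, 1]  # probability of the positive class
    scores = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred),
        "f1": f1_score(y_test, pred),
        "auc_roc": roc_auc_score(y_test, proba),
    }
    return scores, pipe


if __name__ == "__main__":
    X, y = make_classification(n_samples=400, n_features=20, n_informative=6,
                               random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                              stratify=y, random_state=0)
    for name in ("chi2", "forward", "l1"):
        scores, pipe = evaluate(name, X_tr, X_te, y_tr, y_te)
        kept = np.flatnonzero(pipe[1].get_support())  # indices of selected features
        print(name, "features", kept.tolist(),
              {m: round(v, 3) for m, v in scores.items()})
```

Swapping `RandomForestClassifier` for `LogisticRegression` or `SVC(probability=True)` gives the other model families. Fitting the scaler and selector inside one pipeline keeps the test split out of feature selection, so the reported metrics are not inflated by leakage.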


Keywords


Cardiac failure recognition; clinical narratives; predictive modelling; SHapley Additive exPlanations (SHAP)
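SHAP, named in the abstract and keywords, attributes a single prediction to its input features using Shapley values from cooperative game theory. As a minimal sketch (assuming the common "baseline" formulation in which an absent feature takes a reference value), the code below computes those values exactly by enumerating every coalition of features. The risk function `toy_risk`, its coefficients, and the patient and baseline values are hypothetical; the feature names merely echo columns of the Heart Failure Clinical Records dataset for readability.

```python
# Exact (brute-force) Shapley values relative to a single baseline point.
# Cost grows as 2^n in the number of features, so this is for illustration only.
from itertools import combinations
from math import factorial, isclose


def shapley_values(predict, x, baseline):
    """Shapley value of each feature of x for the model `predict`.

    Features outside a coalition are replaced by their baseline value.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight |S|! (n - |S| - 1)! / n! from the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))  # marginal contribution
    return phi


def toy_risk(z):
    # Hypothetical risk score over [age, ejection_fraction, serum_creatinine],
    # with one age x creatinine interaction term.
    age, ef, creat = z
    return 0.02 * age - 0.03 * ef + 0.5 * creat + 0.01 * age * creat


if __name__ == "__main__":
    patient, reference = [75, 25, 1.9], [60, 38, 1.4]  # illustrative values
    phi = shapley_values(toy_risk, patient, reference)
    print("attributions:", [round(p, 4) for p in phi])
    # Efficiency: attributions sum to f(patient) - f(reference).
    print("efficiency holds:",
          isclose(sum(phi), toy_risk(patient) - toy_risk(reference)))
```

The interaction term's effect is split between age and serum creatinine, while ejection fraction, which does not interact, receives exactly its linear contribution. For tree ensembles such as the Random Forest above, the `shap` package's `TreeExplainer` computes SHAP values without this exponential enumeration.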

