
Sequential machine learning in prediction of common cancers

Andjelkovic, Jovan
Ljubic, Branimir
Pavlovski, Martin
Diaz, Wilson
Obradovic, Zoran
DOI
http://dx.doi.org/10.1016/j.imu.2022.100928
Abstract
Cancer is one of the most common causes of death in the world. It is characterized by the multi-stage transformation of normal cells into tumor cells. Early detection can significantly reduce the consequences of cancer and has been the objective of many published machine learning (ML) studies. However, most of them focused on microarray, gene expression, or publicly available medical datasets. Almost none offered an approach for predicting cancer through analysis of sequential data, such as Electronic Health Record (EHR) data. This paper presents a sequential ML approach to predict the occurrence of lung cancer, breast cancer, cervical cancer, and liver cell cancer using EHR data. The accuracy of sequence learning models based on long short-term memory (LSTM) and bidirectional gated recurrent units (GRU) was compared to that of traditional ML methods based on multilayer perceptron, random forest, decision tree, and K-nearest neighbor. The models were trained and tested on 50,606 patient hospitalization histories. Unsupervised and supervised data reduction methods (singular value decomposition (SVD) and a neural network embedding layer, respectively) were applied to overcome the challenges of high dimensionality and sparsity of EHR data. The results provided evidence that, for this application, GRU outperforms the alternatives in terms of accuracy, Area Under the Receiver Operating Characteristic curve (AUROC), sensitivity (recall), specificity, precision, and F1 score. It was the best-performing model, with accuracy between 81% (breast cancer) and 88% (liver cancer) on balanced out-of-sample EHRs. Among the alternatives, multilayer perceptron and LSTM showed comparable performance (accuracies between 78% and 87%), while decision tree was the worst-performing model. The findings of this study could potentially aid medical professionals in cancer diagnostics, treatment, and prevention.
In particular, experiments confirmed that GRU could accurately predict cancer by learning from simplified patient representations using an embedding layer or SVD. Therefore, GRU's predictions could be used in early cancer detection, potentially improving patients' survival rates.
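The abstract compares models on accuracy, sensitivity (recall), specificity, precision, and F1 score. As a reading aid, the sketch below shows how these threshold-based metrics are conventionally derived from a binary confusion matrix. It is not the authors' evaluation code: the function name `binary_metrics`, the `BinaryMetrics` container, and the toy labels are assumptions made for illustration, and AUROC is left out because it requires predicted scores rather than hard labels.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    """Hypothetical container for the threshold-based metrics named in the abstract."""
    accuracy: float
    sensitivity: float  # also called recall
    specificity: float
    precision: float
    f1: float


def _ratio(numerator: int, denominator: int) -> float:
    # Return 0.0 instead of raising when a class is absent from the sample.
    return numerator / denominator if denominator else 0.0


def binary_metrics(y_true, y_pred) -> BinaryMetrics:
    """Compute standard binary metrics; labels are 1 (cancer) or 0 (no cancer)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    precision = _ratio(tp, tp + fp)
    sensitivity = _ratio(tp, tp + fn)
    f1_denominator = precision + sensitivity
    f1 = 2 * precision * sensitivity / f1_denominator if f1_denominator else 0.0

    return BinaryMetrics(
        accuracy=_ratio(tp + tn, len(pairs)),
        sensitivity=sensitivity,
        specificity=_ratio(tn, tn + fp),
        precision=precision,
        f1=f1,
    )


# Toy, made-up labels on a balanced sample (not data from the study).
toy_true = [1, 1, 1, 1, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 1, 1]
toy_metrics = binary_metrics(toy_true, toy_pred)
```

On a balanced test set such as the one the abstract describes, accuracy equals the mean of sensitivity and specificity, which is one reason reporting the individual metrics alongside accuracy is informative.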
Citation
Jovan Andjelkovic, Branimir Ljubic, Ameen Abdel Hai, Marija Stanojevic, Martin Pavlovski, Wilson Diaz, Zoran Obradovic, Sequential machine learning in prediction of common cancers, Informatics in Medicine Unlocked, Volume 30, 2022, 100928, ISSN 2352-9148, https://doi.org/10.1016/j.imu.2022.100928.
Citation to related work
Elsevier
Has part
Informatics in Medicine Unlocked, Vol 30