Genre
Thesis/Dissertation
Date
2024-08
Department
Oral Biology
DOI
http://dx.doi.org/10.34944/dspace/10636
Abstract
Objectives: To predict oral cancer-related mortality among adults in the United States and to identify its predictors using a machine learning approach.
Methods: We extracted data for 8,176 participants from the SEER database (1975 to 2022). A series of 38 demographic, clinicopathological, and lifestyle factors were extracted from the SEER database, along with the outcome variable, oral cancer-related mortality (OCRM), coded as “Died from Oral Cancer” or “Alive/Died from Other Causes.” The data were pre-processed using the recipes package in R. Three machine learning (ML) models, extreme gradient boosting (XGBoost), lasso regression, and k-nearest neighbors, were used to predict oral cancer prognosis under five-fold cross-validation to guard against overfitting or underfitting. Model performance was evaluated using the Brier score, area under the receiver operating characteristic curve (AUC), specificity, sensitivity, and accuracy. ML modeling was performed using the MachineShop package in R.
Results: The study participants were 63% male and predominantly non-Hispanic white (71%). A total of 7,444 participants were alive or had died of other causes, and 732 had died of oral cancer. Of the three models, XGBoost performed best, with a Brier score of 0.0677, an accuracy of 91%, a kappa statistic of 13%, an ROC AUC of 84%, a sensitivity of 99%, and a specificity of less than 1%. Of the 38 variables assessed, 17 were found to be the most important predictors of OCRM. In descending order of importance, these were cancer stage group, age, T stage, lymph node surgery, cancer site, tumor rarity, N stage, marital status, radiation, income, grade, lymph node size, surgery-radiation sequence, race, histology, the sequence number of multiple primary cancers, and the side of a paired organ from which the tumor originated.
Conclusion: Our machine learning model was effective in predicting oral cancer-related mortality using clinicopathological variables from a national cancer registry.
Keywords: machine learning, metastasis, oral cancer, prediction, squamous cell carcinoma,
SEER database.
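Note: The threshold-based metrics named in the abstract (Brier score, accuracy, sensitivity, specificity, and Cohen's kappa) can all be derived from predicted probabilities and a confusion matrix. The following is a minimal Python sketch of those definitions only. It is not the thesis code, which used MachineShop in R. The toy labels and probabilities, the 0.5 threshold, the choice of which class is coded as 1, and the function names are all assumptions made for illustration. AUC is omitted for brevity.

```python
from typing import Dict, Sequence, Tuple


def brier_score(y_true: Sequence[int], p_pred: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn), treating label 1 as the positive class."""
    tp = sum(1 for y, yh in zip(y_true, y_pred) if y == 1 and yh == 1)
    tn = sum(1 for y, yh in zip(y_true, y_pred) if y == 0 and yh == 0)
    fp = sum(1 for y, yh in zip(y_true, y_pred) if y == 0 and yh == 1)
    fn = sum(1 for y, yh in zip(y_true, y_pred) if y == 1 and yh == 0)
    return tp, tn, fp, fn


def classification_metrics(
    y_true: Sequence[int], p_pred: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Compute Brier score, accuracy, sensitivity, specificity, and Cohen's kappa."""
    # Convert probabilities to hard labels at the (assumed) threshold.
    y_pred = [1 if p >= threshold else 0 for p in p_pred]
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    n = tp + tn + fp + fn

    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = ((tp + fp) / n) * ((tp + fn) / n) + ((tn + fn) / n) * ((tn + fp) / n)
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance != 1 else float("nan")

    return {
        "brier": brier_score(y_true, p_pred),
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


# Hypothetical, imbalanced toy data (not from SEER): eight cases of class 1, two of class 0.
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
p_pred = [0.9, 0.8, 0.95, 0.7, 0.85, 0.6, 0.9, 0.4, 0.7, 0.2]

for name, value in classification_metrics(y_true, p_pred).items():
    print(f"{name}: {value:.4f}")
```

Because kappa subtracts the agreement expected by chance, it can be much lower than accuracy when one outcome class dominates the data.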
ADA compliance
For Americans with Disabilities Act (ADA) accommodation, including help with reading this content, please contact scholarshare@temple.edu