dc.contributor.advisor | Razzaghi, Talayeh
dc.contributor.author | Cejda, Nicholas
dc.date.accessioned | 2021-05-12T15:32:42Z
dc.date.available | 2021-05-12T15:32:42Z
dc.date.issued | 2021-05-14
dc.identifier.uri | https://hdl.handle.net/11244/329534
dc.description.abstract | Predicting disease incidence from single nucleotide polymorphisms (SNPs) for a complex, multifactorial disease like sarcoidosis remains a difficult problem. If disease prediction could be improved, genetic screening could be implemented to help identify disease early, potentially improving patient outcomes. In this thesis, we examine the predictive performance of several supervised machine learning models to assess whether genetic variability can be used to accurately predict disease incidence in an African American patient population (n = 2,915). Further, we consider SNP “functional scores,” such as Combined Annotation Dependent Deletion (CADD) scores and FATHMM-XF scores, to see whether they improve predictive ability. Here we show that support vector machine (SVM) and random forest (RF) models can significantly outperform the naïve baseline model (p < 0.05) in terms of accuracy and achieve area under the ROC curve (AUC) values of 0.6016 and 0.6019, respectively. A neural network (NN) model had the highest AUC value, 0.6103, but its accuracy improvement over the naïve model fell just short of significance (p = 0.05). Adding functional scores had a minimal to negative overall impact on predictive performance. This work reveals that supervised machine learning based on SNPs can significantly outperform random chance when predicting sarcoidosis incidence and supports the idea that genetic screening and disease modeling prior to disease incidence could improve preventative care. | en_US
dc.language | en_US | en_US
dc.rights | Attribution-NonCommercial 4.0 International | *
dc.rights.uri | https://creativecommons.org/licenses/by-nc/4.0/ | *
dc.subject | supervised machine learning | en_US
dc.subject | disease prediction | en_US
dc.subject | single nucleotide polymorphisms | en_US
dc.subject | disease modeling | en_US
dc.subject | random forest | en_US
dc.subject | support vector machine | en_US
dc.subject | neural network | en_US
dc.subject | sarcoidosis | en_US
dc.title | Predicting Sarcoidosis Disease Incidence using Single Nucleotide Polymorphisms and Supervised Machine Learning | en_US
dc.contributor.committeeMember | Montgomery, Courtney
dc.contributor.committeeMember | Nicholson, Charles
dc.contributor.committeeMember | Pan, Chongle
dc.date.manuscript | 2021-05-11
dc.thesis.degree | Master of Science | en_US
ou.group | Gallogly College of Engineering | en_US
shareok.orcid | 0000-0003-4518-4125 | en_US


Except where otherwise noted, this item's license is described as Attribution-NonCommercial 4.0 International