Developing and deploying data mining techniques in healthcare

Piri, Saeed

dc.contributor.advisor	Liu, Tieming
dc.contributor.author	Piri, Saeed
dc.date.accessioned	2018-04-23T19:36:01Z
dc.date.available	2018-04-23T19:36:01Z
dc.date.issued	2017-07
dc.identifier.uri	https://hdl.handle.net/11244/299540
dc.description.abstract	Improving healthcare is a top priority for all nations. US healthcare expenditure was $3 trillion in 2014, and in the same year healthcare accounted for 17.5% of GDP. These statistics show the importance of improving the healthcare delivery system. In this research, we developed several data mining methods and algorithms to address healthcare problems; these methods can also be applied to problems in other domains.
dc.description.abstract	The first part of this dissertation addresses the rare item problem in association analysis, that is, the problem of discovering rare rules (rules that include rare items). In this study, we introduced a novel assessment metric, called adjusted support, to address this problem. By applying this metric, we can retrieve rare rules without over-generating association rules. We applied this method to perform association analysis on complications of diabetes.
dc.description.abstract	The second part of this dissertation develops a clinical decision support system for predicting retinopathy, the leading cause of vision loss among American adults. In this research, we analyzed data from more than 1.4 million diabetic patients and developed four sets of predictive models: basic, comorbid, over-sampled, and ensemble models. The results show that incorporating comorbidity data and over-sampling improved prediction accuracy. In addition, we developed a novel "confidence margin" ensemble approach that outperformed the existing ensemble models. We also addressed the issue of ties in voting-based ensemble models by comparing the confidence margins of the base predictors.
dc.description.abstract	The third part of this dissertation addresses the problem of learning from imbalanced data, a major challenge in machine learning. While a standard machine learning technique can perform well on balanced datasets, its performance deteriorates dramatically when it is applied to imbalanced datasets. This poor performance is particularly troublesome for detecting the minority class, which is usually the class of interest. In this study, we proposed a synthetic informative minority over-sampling (SIMO) algorithm embedded into support vector machines. We applied SIMO to 15 publicly available benchmark datasets and assessed its performance against seven existing approaches. The results showed that SIMO outperformed all seven existing approaches.
dc.format	application/pdf
dc.language	en_US
dc.rights	Copyright is held by the author who has granted the Oklahoma State University Library the non-exclusive right to share this material in its institutional repository. Contact Digital Library Services at lib-dls@okstate.edu or 405-744-9161 for the permission policy on the use, reproduction or distribution of this material.
dc.title	Developing and deploying data mining techniques in healthcare
dc.contributor.committeeMember	Heragu, Sunderesh
dc.contributor.committeeMember	Yousefian, Farzad
dc.contributor.committeeMember	Paiva, William
dc.contributor.committeeMember	Delen, Dursun
osu.filename	Piri_okstate_0664D_15272.pdf
osu.accesstype	Open Access
dc.type.genre	Dissertation
dc.type.material	Text
thesis.degree.discipline	Industrial Engineering and Management
thesis.degree.grantor	Oklahoma State University
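
The rare item problem described in the first abstract paragraph comes from using a single minimum support (minsup) threshold. The following Python sketch illustrates that dilemma on hypothetical toy transactions (the item labels and the data are invented for illustration, not taken from the dissertation). It computes only standard support by brute-force enumeration; the abstract does not define adjusted support, so this sketch does not attempt to implement it.

```python
from itertools import combinations

def itemset_support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def frequent_itemsets(transactions, minsup, max_size=2):
    """Brute-force enumeration of itemsets (up to max_size items) whose support is >= minsup."""
    items = sorted(set().union(*transactions))
    found = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            s = itemset_support(transactions, itemset)
            if s >= minsup:
                found[itemset] = s
    return found

# Hypothetical toy transactions (not data from the dissertation); "R" is a rare item.
transactions = [
    {"A", "B"}, {"A", "B"}, {"A", "B", "C"}, {"A", "C"}, {"B", "C"},
    {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "R", "C"}, {"R", "C"},
]

strict = frequent_itemsets(transactions, minsup=0.3)  # high threshold
loose = frequent_itemsets(transactions, minsup=0.1)   # low threshold
print(sorted(sorted(s) for s in strict))
print(sorted(sorted(s) for s in loose))
```

With minsup = 0.3, every itemset containing the rare item R is discarded, because R occurs in only 2 of the 10 transactions. Lowering minsup to 0.1 recovers {C, R} and {A, R}, but it also admits every other itemset above that low threshold; on realistic data with many items, this is the over-generation that a metric such as adjusted support aims to avoid.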
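
The tie-breaking idea in the second abstract paragraph can be sketched for a binary task as follows. This is a minimal illustration under stated assumptions, not the dissertation's method: each base predictor is assumed to output a probability for the positive class, votes are cast at a 0.5 threshold, and the confidence margin is assumed (hypothetically) to be the distance |p - 0.5| between a predicted probability and that threshold. When the vote is tied, the side whose models have the larger total margin wins. The function names are illustrative.

```python
def confidence_margin(p, threshold=0.5):
    """Hypothetical margin: distance of a predicted probability from the decision threshold."""
    return abs(p - threshold)

def vote_with_margin_tiebreak(probs, threshold=0.5):
    """Majority vote over base-model probabilities; a tied vote is broken by summed margin.

    Returns 1 (positive) or 0 (negative). If the summed margins are also exactly
    equal, this sketch falls back to the negative class.
    """
    pos = [p for p in probs if p >= threshold]
    neg = [p for p in probs if p < threshold]
    if len(pos) != len(neg):
        return 1 if len(pos) > len(neg) else 0
    # Tied vote: compare how confidently each side's base models voted.
    pos_margin = sum(confidence_margin(p, threshold) for p in pos)
    neg_margin = sum(confidence_margin(p, threshold) for p in neg)
    return 1 if pos_margin > neg_margin else 0

print(vote_with_margin_tiebreak([0.9, 0.8, 0.3]))         # clear majority -> 1
print(vote_with_margin_tiebreak([0.9, 0.55, 0.2, 0.45]))  # 2-2 tie, positive side more confident -> 1
print(vote_with_margin_tiebreak([0.51, 0.6, 0.1, 0.05]))  # 2-2 tie, negative side more confident -> 0
```

Summing the margins is one possible design choice; comparing the single largest margin on each side would be another, and the abstract does not say which the dissertation uses.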
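
The third abstract paragraph does not spell out how SIMO selects "informative" minority points or how it generates synthetic ones, so the following Python sketch only illustrates the general idea under explicit assumptions. It assumes that informative points are the minority instances lying closest to a linear SVM's decision boundary, and it creates new points by SMOTE-style interpolation between such a point and one of its nearest minority neighbours. The linear SVM is a plain stochastic subgradient solver for the hinge loss; the function names, the parameters `frac` and `k`, and the toy data are all hypothetical. A complete pipeline would then retrain the classifier on the augmented data.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=200, seed=0):
    """Linear SVM fitted by stochastic subgradient descent on the L2-regularized hinge loss.

    Labels must be -1 (majority) or +1 (minority).
    """
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            violated = y[i] * (dot(w, X[i]) + b) < 1
            w = [wj - lr * lam * wj for wj in w]  # L2 penalty step
            if violated:                          # hinge-loss step
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b

def informative_oversample(X, y, n_new, frac=0.5, k=3, seed=0):
    """Return n_new synthetic minority points built from minority points near the SVM boundary.

    Hypothetical helper; assumes at least two minority (+1) points.
    """
    rng = random.Random(seed)
    w, b = train_linear_svm(X, y, seed=seed)
    minority = [x for x, label in zip(X, y) if label == 1]
    # Smaller |w.x + b| means closer to the separating hyperplane ("informative" here).
    ranked = sorted(minority, key=lambda x: abs(dot(w, x) + b))
    informative = ranked[:max(1, int(frac * len(ranked)))]
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(informative)
        # SMOTE-style: pick one of the k nearest minority neighbours of the base point.
        others = [m for m in minority if m is not base]
        others.sort(key=lambda m: sum((a - c) ** 2 for a, c in zip(base, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # in [0, 1), so the new point lies on the segment base -> neighbour
        synthetic.append([a + gap * (c - a) for a, c in zip(base, neighbour)])
    return synthetic

# Hypothetical 2-D toy data: 10 majority (-1) points, 4 minority (+1) points.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.5, 0.5],
     [1.5, 0.5], [0.5, 1.5], [2.0, 1.0], [1.0, 2.0], [2.0, 0.0],
     [3.0, 3.0], [2.5, 3.0], [3.0, 2.5], [4.0, 4.0]]
y = [-1] * 10 + [1] * 4
synthetic = informative_oversample(X, y, n_new=5)
print(synthetic)
```

Focusing generation on boundary-adjacent minority points, rather than interpolating uniformly across the whole minority class as plain SMOTE does, is the design choice this sketch is meant to make visible; whether SIMO uses the same proximity criterion is not stated in the abstract.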

Files in this item

Name: Piri_okstate_0664D_15272.pdf
Size: 3.343 MB
Format: PDF


This item appears in the following Collection(s)

OSU Dissertations [11222]


