Application of Artificial Neural Networks, Gradient Boosted Decision Trees, and Multilevel Logistic Models in a Supervised Learning Environment to Investigate Differences in Classification Performance when Predicting College Enrollment
Data mining algorithms are becoming commonplace in applied practice across many industries. Applying these models to educational data and practice could yield significant gains in how prediction is understood and implemented in the classroom. The wealth of data collected from students as they progress through a traditional education track is well suited to machine learning and data mining. The present dissertation examines how useful Artificial Neural Networks and Gradient Boosted Decision Trees are, compared with Multilevel Logistic Regression, for predicting college enrollment from data collected as students progressed through high school. Because data mining algorithms can work with very large amounts of data, the emphasis is placed on, but not limited to, variables representing difficulty of coursework, advanced placement, STEM vs. non-STEM, behavioral referrals, attendance, and statewide standardized testing. The grade-level data were analyzed independently for each model to determine the pace at which predictive consistency increased as new and more relevant information was collected. The comparison of predictive capacity revealed that certain data mining algorithms could indeed be used in place of traditional statistical models, but the gains were not always consistent across all grade levels. Implications and future research are discussed.
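The grade-by-grade evaluation described above can be illustrated with a minimal sketch. It is not the dissertation's code or data: the students are synthetic, every name (`make_students`, `fit_logistic`, `evaluate_by_grade`) is hypothetical, and a plain single-level logistic regression stands in for the three models compared in the study. It also assumes that "grade-level" analysis means refitting the model with all features available through a given grade.

```python
import math
import random

GRADES = (9, 10, 11, 12)

def make_students(n, seed=0):
    """Synthetic students (an assumption, not real data): one feature per grade."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        latent = rng.gauss(0.0, 1.0)  # hypothetical propensity to enroll
        # Assumed: later grades give a less noisy view of the propensity.
        feats = [latent + rng.gauss(0.0, noise) for noise in (2.0, 1.5, 1.0, 0.5)]
        label = 1 if latent + rng.gauss(0.0, 0.5) > 0 else 0
        rows.append((feats, label))
    return rows

def predict_prob(w, b, x):
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, epochs=200, lr=0.1):
    """Plain logistic regression fitted by batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict_prob(w, b, xi) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def accuracy(w, b, X, y):
    hits = sum((predict_prob(w, b, xi) >= 0.5) == (yi == 1) for xi, yi in zip(X, y))
    return hits / len(y)

def evaluate_by_grade(rows, train_frac=0.75):
    """Refit once per grade, using only the features known through that grade."""
    cut = int(len(rows) * train_frac)
    train, test = rows[:cut], rows[cut:]
    results = {}
    for k, grade in enumerate(GRADES, start=1):
        X_tr = [f[:k] for f, _ in train]
        X_te = [f[:k] for f, _ in test]
        w, b = fit_logistic(X_tr, [lab for _, lab in train])
        results[grade] = accuracy(w, b, X_te, [lab for _, lab in test])
    return results

if __name__ == "__main__":
    for grade, acc in evaluate_by_grade(make_students(400)).items():
        print(f"through grade {grade}: held-out accuracy {acc:.2f}")
```

To compare models in the way the abstract describes, `fit_logistic` could be swapped for a neural network or a gradient boosted tree learner while the loop in `evaluate_by_grade` stays unchanged, so that every model is scored on the same held-out students at each grade.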
- OU - Dissertations