Predicting NBA Efficiency from NCAA Statistics Using Play-by-Play and On-Off Statistics
Abstract
New data has been available for college basketball since the 2009-10 season. However, once interaction terms are added, the resulting dataset is sparse and contains over 30,000 variables. This is problematic because there are far more variables than observations. A further challenge is communicating rigorous mathematical findings credibly within an industry that is still relatively new to data science approaches. With this in mind, I use this new, contextual data to predict future NBA on-court efficiency.
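The problem with having more variables than observations can be made concrete with a small sketch. The Python below uses made-up numbers, and the helper names `count_pairwise_features`, `add_pairwise_interactions` and `matrix_rank` and the five example rows are hypothetical, not taken from the thesis. It shows that pairwise interactions grow the column count quadratically, and that once columns outnumber rows the matrix XᵀX is singular, so ordinary least squares has no unique solution.

```python
from fractions import Fraction
from itertools import combinations


def count_pairwise_features(k: int) -> int:
    """Columns after appending every pairwise interaction to k base features."""
    return k + k * (k - 1) // 2


def add_pairwise_interactions(row: list[float]) -> list[float]:
    """Append the product of each pair of base features to one observation."""
    return list(row) + [a * b for a, b in combinations(row, 2)]


def matrix_rank(rows: list[list[float]]) -> int:
    """Exact rank via Gauss-Jordan elimination over rational numbers."""
    m = [[Fraction(x) for x in r] for r in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a nonzero entry in this column.
        pivot = next((i for i in range(rank, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every other row.
        for i in range(len(m)):
            if i != rank and m[i][col] != 0:
                factor = m[i][col] / m[rank][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank


# Hypothetical data: 5 players (rows) x 4 base statistics (columns).
X_base = [
    [1, 2, 0, 3],
    [2, 1, 1, 0],
    [0, 3, 2, 1],
    [1, 1, 1, 1],
    [3, 0, 2, 2],
]
# With interactions there are 4 + 6 = 10 columns but only 5 rows.
X = [add_pairwise_interactions(r) for r in X_base]
n, p = len(X), len(X[0])

# Gram matrix X^T X (p x p), which ordinary least squares must invert.
gram = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)] for i in range(p)]

print("rows:", n, "columns:", p)
print("rank(X):", matrix_rank(X), "rank(X^T X):", matrix_rank(gram))
```

Because rank(XᵀX) = rank(X) ≤ n < p, the normal equations have infinitely many solutions. This is the usual motivation for regularized or otherwise reduced models, although the abstract does not say which approach the thesis takes.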
Analytics are only one piece of the larger puzzle of drafting NBA players, so the goal is to use this new data to build a simple, explainable model that predicts which players will become maximum-contract NBA players. I also take a novel approach to positions, using this contextual information as a proxy to group players into three positions instead of the traditional five. Because a simple model can be discussed in specific basketball context, I believe this is a sound process for the NBA Draft when used in tandem with scouting analyses. It produces clear, transparent takeaways that can be discussed alongside the extensive basketball knowledge that employees of NBA organizations bring to the table. This should help when a decision that could shape an organization's legacy must be made with little time. I finish with visualizations of model results from 2007 through 2018.
Collections
- OU - Theses