Prediction of U.S. Election Using Twitter Data: A Case Study
Abstract
Social media has grown in popularity over the last few years and is becoming an integral part of everyday life. With the rise of social media, the analysis of social networking content has become an important tool of big data analytics. Recent studies on the use of Twitter for predicting political elections have raised many questions about, as well as interest in, using Twitter data for predictive analysis. The overarching objective of our research is to study the capability of Twitter data as an ex-ante indicator of event outcomes. The 2014 U.S. midterm election was chosen as the event for this study. This work analyzes both pre-poll and post-poll Twitter data related to the 2014 U.S. midterm elections. Relevant tweets are extracted from the tweet stream with the help of a MapReduce program on a Hadoop system, using an appropriate keyword configuration for Apache Flume. These data are classified into four groups using "Democrat" and "Republican" as the division criteria. Two time series of sentiment (positive and negative) are constructed for each group. Several statistics are also compiled from each group of tweets and used as predictive indicators, including the original tweet count, retweet count, and user count. All of the statistics favored a Republican win, which was the actual outcome of the election. Our research consists of two parts: the first is prediction of the election results, and the second is modeling sentiment before and after the election. We used a Hidden Markov Model (HMM) as the tool for both parts. The hidden states of the model were used as sentiment indicators, and state changes were interpreted as sentiment changes. The results of the HMM agreed with the actual outcomes. Our study provides support for the argument that Twitter data can be considered a reliable predictor of events.
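The per-group statistics named in the abstract (original tweet count, retweet count, and user count) can be illustrated with a minimal Python sketch. This is not the thesis pipeline, which runs on Hadoop with MapReduce. The abstract does not define the four groups, so the grouping by which party keyword a tweet mentions (`democrat`, `republican`, `both`, `neither`) is an assumption, as are the function names and the tweet fields `user`, `text`, and `is_retweet`.

```python
from collections import defaultdict


def party_group(text):
    """Assign a tweet to a group by the party keywords it mentions.

    Assumption: the four groups are Democrat only, Republican only,
    both, and neither. The abstract does not spell out the grouping.
    """
    lowered = text.lower()
    has_dem = "democrat" in lowered
    has_rep = "republican" in lowered
    if has_dem and has_rep:
        return "both"
    if has_dem:
        return "democrat"
    if has_rep:
        return "republican"
    return "neither"


def compile_group_stats(tweets):
    """Count original tweets, retweets, and distinct users per group.

    Each tweet is a dict with hypothetical keys:
    'user' (str), 'text' (str), 'is_retweet' (bool).
    """
    stats = defaultdict(lambda: {"original": 0, "retweet": 0, "users": set()})
    for tweet in tweets:
        group = stats[party_group(tweet["text"])]
        key = "retweet" if tweet["is_retweet"] else "original"
        group[key] += 1
        group["users"].add(tweet["user"])
    # Replace each user set with its size so the result is plain counts.
    return {
        name: {
            "original_count": g["original"],
            "retweet_count": g["retweet"],
            "user_count": len(g["users"]),
        }
        for name, g in stats.items()
    }
```

Calling `compile_group_stats` on a list of such tweet dicts returns one dictionary of counts per group that actually occurs in the input. Comparing those counts across groups is one way the statistics could serve as the simple predictive indicators the abstract mentions.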
Collections
- OSU Theses