dc.contributor.advisor | Crick, Christopher John | |
dc.contributor.author | Karuparthy, Srikanth | |
dc.date.accessioned | 2016-04-15T21:49:24Z | |
dc.date.available | 2016-04-15T21:49:24Z | |
dc.date.issued | 2015-05-01 | |
dc.identifier.uri | https://hdl.handle.net/11244/33423 | |
dc.description.abstract | In online social networks like Twitter, users are often inundated with a continuous stream of short messages, or tweets. This problem can be handled using classification, a supervised data mining technique that assigns labels to unlabeled objects. A conventional approach to classifying text or tweets is to extract features from the linguistic content posted by users. A recurrent problem in classification is feature selection: deciding which set of features, among the effectively unlimited number of possible feature sets, is best for a particular classification decision. This process usually relies on heuristic approaches that require manual feature selection by experts, which involves guesswork, prior information about the dataset, and a great deal of tweaking and experimental validation. To address this problem, we propose and employ a non-heuristic machine learning approach that automatically decides the feature set for a classification task. Our analysis shows that our automated feature selection process for Twitter content classification performs on par with current state-of-the-art approaches, which rely on painstaking, time-consuming human effort to select a feature set manually and heuristically. This approach will improve the timeliness and accessibility of data mining on social media data streams. | |
dc.format | application/pdf | |
dc.language | en_US | |
dc.publisher | Oklahoma State University | |
dc.rights | Copyright is held by the author who has granted the Oklahoma State University Library the non-exclusive right to share this material in its institutional repository. Contact Digital Library Services at lib-dls@okstate.edu or 405-744-9161 for the permission policy on the use, reproduction or distribution of this material. | |
dc.title | Non-heuristic Machine Learning Approach for Classifying Twitter Content | |
dc.type | text | |
dc.contributor.committeeMember | Cline, David | |
dc.contributor.committeeMember | Park, Nohpill | |
osu.filename | Karuparthy_okstate_0664M_13788.pdf | |
osu.accesstype | Open Access | |
dc.description.department | Computer Science | |
dc.type.genre | Thesis | |