Show simple item record

dc.contributor.advisor: George, K. M.
dc.contributor.author: Paryani, Jyotsna
dc.date.accessioned: 2018-03-13T18:16:15Z
dc.date.available: 2018-03-13T18:16:15Z
dc.date.issued: 2017-05-01
dc.identifier.uri: https://hdl.handle.net/11244/54589
dc.description.abstract: Twitter data (tweets) has all the attributes of Big Data. It has also become a source of information where people post their real-time experiences and their opinions on various day-to-day issues. Therefore, Twitter data mining is being used for knowledge extraction and prediction in various domains. As its popularity and size grow, the veracity of the knowledge extracted becomes a concern. Veracity is one of the V's of Big Data. Data integrity, data authenticity, trusted origin, and trustworthiness are some of the aspects of Veracity. This thesis deals with the Veracity aspect of Big Data, in particular veracity in Twitter data, from the vantage point of truthfulness. In this research, we have compared existing Big Data Veracity models with a newly proposed measure. The proposed Veracity measure is entropy, and it is compared with two other models of Veracity, namely the Objectivity, Truthfulness, and Credibility (OTC) model and the Diffusion, Geographic, and Spam indices (DGS) model. Our approach is to define topics on the set of tweets related to a domain and compute the veracity measures of the topics. The proposed model is based on the bag-of-words model for topic definition. Based on the values of the measures, further inferences are drawn. For our analysis, we selected three domains: flu, food poisoning, and politics. The topics for the flu and food poisoning data are based on anchor words taken from the CDC website. Anchor words of topics for the politics data are taken from the "ontheissues.org" website. The entropy, OTC model, and DGS model values are calculated for each topic. Our analysis shows no correlation between entropy, the OTC model, and the DGS model when compared as time series. Computed values of the models could position the topics in a veracity spectrum.
dc.format: application/pdf
dc.language: en_US
dc.rights: Copyright is held by the author who has granted the Oklahoma State University Library the non-exclusive right to share this material in its institutional repository. Contact Digital Library Services at lib-dls@okstate.edu or 405-744-9161 for the permission policy on the use, reproduction or distribution of this material.
dc.title: Case Study on Determining the Big Data Veracity: A Method to Compute the Relevance of Twitter Data
dc.contributor.committeeMember: Park, N.
dc.contributor.committeeMember: Thomas, Johnson
osu.filename: Paryani_okstate_0664M_15182.pdf
osu.accesstype: Open Access
dc.description.department: Computer Science
dc.type.genre: Thesis
dc.type.material: text


