Approach to Quantify Information in Tweets
Abstract
Microblogs such as Twitter play an important role in online social communication. Unlike in traditional media, hot topics and emerging news can become popular within a short time span on information-spreading platforms like Twitter. Twitter data is now widely used for analysis in many professions. For example, sentiment analysis is a popular approach to opinion mining in which the sentiment values of tweets are classified into the weighted classes positive, negative, or neutral. Such signed weights may not be the best basis for analysis in all cases. Information diffusion, defined as the passing of information from person to person, is an alternative way to analyze information; research in this area mostly focuses on graph-based models. The edges of the network graph are constructed from either retweet status or hashtags, and information flow is modeled as transmission from node to node, where the nodes are users. Generally speaking, an analysis of tweets quantifies the information inherent in them. In this research, a new approach is proposed to quantify the information in tweets as unsigned weights. The approach is suitable for problems in which tweets can be interpreted as conveying an unsigned weight contribution to the problem. The weight computation method presented in this thesis extracts keywords, called tokens, from tweets and associates weights with them; the weights are interpreted as a quantification of information. Two methods are used to identify tokens. The first uses a topic modeling technique, LDA (Latent Dirichlet Allocation), to determine tokens and their weights. The second is iterative: it starts with a set of anchor words (a keyword set) and a similarity measure between the anchor word set and the words in tweets, and adds more words based on a threshold value of similarity. To associate weights with tokens, NMF (Non-negative Matrix Factorization) is used.
To compute the weight contribution of a tweet, a formula for its potential is used.
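The iterative token-identification step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, the threshold, or the stopping rule, so everything concrete here is an assumption made for illustration: similarity is taken to be the cosine similarity between binary tweet-occurrence vectors, a candidate word is compared against its most similar current token, and the names `occurrence_sets`, `cosine`, and `expand_anchors` are hypothetical rather than taken from the thesis.

```python
import re
from math import sqrt


def occurrence_sets(tweets):
    """Map each word to the set of tweet indices in which it appears."""
    occ = {}
    for i, tweet in enumerate(tweets):
        for word in set(re.findall(r"[a-z]+", tweet.lower())):
            occ.setdefault(word, set()).add(i)
    return occ


def cosine(a, b):
    """Cosine similarity of two binary occurrence vectors given as index sets."""
    return len(a & b) / sqrt(len(a) * len(b))


def expand_anchors(tweets, anchors, threshold=0.6, max_rounds=10):
    """Grow the anchor set with words whose similarity reaches the threshold.

    Assumption: a word's similarity to the token set is the maximum of its
    similarities to the individual tokens; the thesis may define it differently.
    """
    occ = occurrence_sets(tweets)
    tokens = {w for w in anchors if w in occ}
    if not tokens:
        return set()
    for _ in range(max_rounds):
        added = {
            w for w in occ
            if w not in tokens
            and max(cosine(occ[w], occ[t]) for t in tokens) >= threshold
        }
        if not added:  # stop once a full pass adds no new words
            break
        tokens |= added
    return tokens


if __name__ == "__main__":
    sample = [
        "flood warning issued downtown",
        "flood water rising downtown",
        "great pizza tonight",
        "pizza party tonight",
    ]
    print(sorted(expand_anchors(sample, {"flood"})))
```

In this sketch, a lower threshold admits words that co-occur with the tokens only occasionally, while a higher one keeps the token set close to the original anchors. The weights that the thesis derives through NMF and the potential formula are not reproduced here.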
Collections
- OSU Theses [15752]