Twitter sentiment analysis to study association between food habit and diabetes
Abstract
Social media platforms such as Twitter, Facebook, and Instagram are rapidly becoming key resources for research. Among microblogging services, Twitter is one of the most important. The vast amount of freely available, user-generated online content allows efficient and potentially automated real-time monitoring of public sentiment. It also allows bottom-up discovery of emergent patterns that may not be readily detectable with traditional surveillance methods such as pre-formulated surveys.
One of the most significant health issues in the world, and particularly in the U.S., is the high rate of diabetes, which causes early death, cardiovascular disease, and many other health problems. In this work, a framework is developed to study the association between social media attitudes towards “Fast Food” and the reported diabetes rates available from government websites. Two classification methods are used to predict the sentiment of tweets containing the term “Fast Food”. The first is a generic classifier that uses a predefined dictionary to compute the polarity of a given tweet and thereby determine its sentiment; the polarity score is a float in the range [-1.0, 1.0]. If the polarity is less than 0, the sentiment is negative; if it is equal to 0, the sentiment is neutral; otherwise it is positive [1]. The second is a manually labeled classifier, trained on a manually labeled training set built specifically for predicting the sentiment of tweets containing the term “Fast Food”. For both classifiers, correlation coefficients between the predicted percentage of negative tweets and the reported diabetes rates were computed. The negative sentiments predicted by the manually labeled classifier showed a stronger correlation with the reported diabetes rates for 14 U.S. states than those predicted by the generic classifier.
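The thresholding rule and the correlation step described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the thesis implementation: the function names are hypothetical, the polarity scores are assumed to come from some dictionary-based scorer that is not shown here, and Pearson's r is assumed because the abstract does not say which correlation coefficient was used.

```python
from math import sqrt


def label_from_polarity(polarity: float) -> str:
    """Map a polarity score in [-1.0, 1.0] to a sentiment label.

    Follows the rule stated in the abstract: < 0 is negative,
    == 0 is neutral, anything else is positive.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must lie in [-1.0, 1.0]")
    if polarity < 0:
        return "negative"
    if polarity == 0:
        return "neutral"
    return "positive"


def negative_percentage(polarities: list[float]) -> float:
    """Percentage of tweets labeled negative (hypothetical helper)."""
    if not polarities:
        raise ValueError("need at least one polarity score")
    labels = [label_from_polarity(p) for p in polarities]
    return 100.0 * labels.count("negative") / len(labels)


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient (assumed; the abstract does not
    name the coefficient). Raises if either input has zero variance."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences of length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)
```

In use, `negative_percentage` would be evaluated once per state on that state's tweet polarities, and `pearson_r` would then be applied to the resulting list of percentages and the matching list of reported diabetes rates, once for each classifier.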