NLP Analysis of Republican Candidate Rubio Debate Tweets
An NLP analysis I did for my MS; the code and data are on my GitHub.
At the time of this debate, Marco Rubio was polling at around 5.2% and ranked seventh among potential Republican nominees. I selected Rubio because of his reputation for strong debate performances and as an articulate, potentially more moderate Republican whose nuanced views could appeal to a broader electorate. This analysis uses NLP techniques to evaluate how Twitter users reacted to Rubio’s performance in this debate.
After isolating the observations for Rubio, I retained 275 of the original 13,870 observations covering all candidates. Across the entire dataset, the sentiment distribution of tweets is 61% negative, 22% neutral, and 16% positive. For Rubio, the proportions are 38% negative, 18.5% neutral, and 43% positive. These preliminary results indicate that tweets about Rubio were more positive and less negative than tweets about the candidates overall. When filtering for sentiment confidence scores of 0.6 or higher, the Rubio dataset (244 observations) shows 40.6% negative, 15.5% neutral, and 43.9% positive tweets. These results support the finding that a plurality of tweets about Rubio are positive and that his share of negative tweets is markedly lower than the 61% in the overall dataset.
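The sentiment breakdowns above come down to filtering rows by candidate and confidence, then computing the share of each sentiment label. Below is a minimal standard-library sketch of that step. It is an illustration, not the code in my repository: the column names (`candidate`, `sentiment`, `sentiment_confidence`), the label strings, and the toy rows are all assumptions.

```python
import csv
import io
from collections import Counter

# Hypothetical toy rows; the real file and its column names may differ.
SAMPLE_CSV = """candidate,sentiment,sentiment_confidence
Marco Rubio,Positive,0.9
Marco Rubio,Negative,0.55
Marco Rubio,Negative,0.8
Marco Rubio,Neutral,0.7
Other Candidate,Negative,1.0
"""


def sentiment_shares(rows, candidate=None, min_confidence=0.0):
    """Return {sentiment label: share of rows} after optional filters.

    candidate: keep only rows for this candidate (None keeps every row).
    min_confidence: keep only rows whose sentiment_confidence is >= this value.
    """
    counts = Counter(
        row["sentiment"]
        for row in rows
        if (candidate is None or row["candidate"] == candidate)
        and float(row["sentiment_confidence"]) >= min_confidence
    )
    total = sum(counts.values())
    # Guard against an empty selection instead of dividing by zero.
    return {label: n / total for label, n in counts.items()} if total else {}


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
overall = sentiment_shares(rows)
rubio = sentiment_shares(rows, candidate="Marco Rubio")
rubio_confident = sentiment_shares(rows, candidate="Marco Rubio", min_confidence=0.6)
print(overall, rubio, rubio_confident, sep="\n")
```

The same function produces all three comparisons (all candidates, Rubio only, and Rubio at a 0.6 confidence threshold), which keeps the filtering logic identical across them.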
For a more qualitative analysis, I filtered the Rubio dataset to observations with a Subject Confidence score of 0.4 or higher while keeping the Sentiment Confidence filter at 0.6 or higher. This filtered dataset contains 229 observations. Most observations were classified under the "none of the above" subject category; among these, 85 were positive, 51 negative, and 25 neutral. For observations classified into specific subject categories (abortion, immigration, jobs and economy, and religion), sentiment skewed negative (Figure 1).
Turning to retweets, a table of Rubio tweet subjects by category and their respective retweet counts shows that immigration content was retweeted the most, with 1,044 total retweets (Figure 2). Overall, a plurality of retweets referencing Rubio were positive: 2,654 positive, 1,687 negative, and 988 neutral, so positive retweets made up roughly half of the 5,329 total (Figure 3). Breaking retweets down by both subject and sentiment shows that immigration-related retweets were largely positive, with 659 positive and 100 negative retweets (Figure 4).
Figure 1: Filtered Rubio Tweets Subject and Sentiment Table
Figure 2: A table of Rubio tweet subjects by category and each category's respective retweet count
Figure 3: Overall Rubio retweets by sentiment
Figure 4: The cross-section of retweet content by subject and sentiment
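The tables behind Figures 1 through 4 can be produced with a confidence filter, a subject-by-sentiment count, and retweet sums grouped by one or more fields. Here is a minimal sketch under stated assumptions: the field names (`subject`, `subject_conf`, `sentiment`, `sentiment_conf`, `retweets`) and the toy records are hypothetical, and the 0.4 and 0.6 thresholds are the ones described in the text.

```python
from collections import Counter, defaultdict

# Hypothetical toy records; field names and category labels are assumptions.
tweets = [
    {"subject": "None of the above", "subject_conf": 1.0,
     "sentiment": "Positive", "sentiment_conf": 0.9, "retweets": 3},
    {"subject": "Immigration", "subject_conf": 0.7,
     "sentiment": "Positive", "sentiment_conf": 0.8, "retweets": 40},
    {"subject": "Immigration", "subject_conf": 0.5,
     "sentiment": "Negative", "sentiment_conf": 0.65, "retweets": 5},
    {"subject": "Abortion", "subject_conf": 0.9,
     "sentiment": "Negative", "sentiment_conf": 1.0, "retweets": 12},
    # Dropped by the filter below: subject confidence is under 0.4.
    {"subject": "Abortion", "subject_conf": 0.3,
     "sentiment": "Positive", "sentiment_conf": 0.9, "retweets": 50},
]


def filter_confident(records, min_subject=0.4, min_sentiment=0.6):
    """Keep records meeting both the subject and sentiment confidence thresholds."""
    return [r for r in records
            if r["subject_conf"] >= min_subject and r["sentiment_conf"] >= min_sentiment]


def crosstab(records):
    """Count tweets for each (subject, sentiment) pair."""
    return Counter((r["subject"], r["sentiment"]) for r in records)


def retweets_by(records, *keys):
    """Sum retweet counts grouped by the given fields; result keys are tuples."""
    totals = defaultdict(int)
    for r in records:
        totals[tuple(r[k] for k in keys)] += r["retweets"]
    return dict(totals)


kept = filter_confident(tweets)
print(crosstab(kept))                             # tweet counts, as in Figure 1
print(retweets_by(kept, "subject"))               # retweets per subject, as in Figure 2
print(retweets_by(kept, "sentiment"))             # retweets per sentiment, as in Figure 3
print(retweets_by(kept, "subject", "sentiment"))  # subject x sentiment, as in Figure 4
```

Note that "retweets" here means summed retweet counts, not the number of tweets. That distinction explains why a category with few tweets can still dominate the retweet tables.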
Given the positive sentiment in the Rubio retweet data, I performed a word frequency analysis on this subset. First, I filtered the dataset to Rubio retweets about immigration and manually examined the resulting dataframe. Two observations stood out for their high retweet counts (Table 1). These two tweets suggest that the most retweeted content about Rubio’s stance on immigration highlighted his debate point that immigration to the U.S. does not come solely from Latin America. The most retweeted tweet praised Rubio’s statement that more new immigrants come from Asia. The second most retweeted tweet, labeled neutral, mused about how great it would be if Rubio performed an anti-Canadian-immigration bit directed at another debater.
Table 1: Rubio’s most retweeted tweets regarding immigration
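Pulling the most retweeted immigration tweets for manual review amounts to a subject filter followed by a top-k selection. Below is a minimal sketch. It assumes records with hypothetical `subject`, `sentiment`, `retweets`, and `text` fields, and the placeholder text and counts are not real tweets from the dataset.

```python
import heapq

# Hypothetical toy records with placeholder text; not real tweets from the dataset.
tweets = [
    {"subject": "Immigration", "sentiment": "Positive", "retweets": 120, "text": "placeholder tweet A"},
    {"subject": "Immigration", "sentiment": "Neutral", "retweets": 85, "text": "placeholder tweet B"},
    {"subject": "Immigration", "sentiment": "Negative", "retweets": 2, "text": "placeholder tweet C"},
    {"subject": "Abortion", "sentiment": "Negative", "retweets": 300, "text": "placeholder tweet D"},
]


def top_retweeted(records, subject, k=2):
    """Return the k records in `subject` with the most retweets, highest first."""
    in_subject = (r for r in records if r["subject"] == subject)
    return heapq.nlargest(k, in_subject, key=lambda r: r["retweets"])


for r in top_retweeted(tweets, "Immigration"):
    print(r["retweets"], r["sentiment"], r["text"])
```

`heapq.nlargest` avoids sorting the whole subset. For a few hundred rows, `sorted(..., reverse=True)[:k]` would work just as well.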
I initially generated a word cloud for immigration tweets with more than one retweet, but it lacked analytic value. Instead, I created a word cloud for positive-sentiment Rubio content in the "none of the above" subject category to glean insights about his general positive reception (Figure 5). The words in this cloud reflect frequent terms in positive tweets about Rubio that were not classified into a specific subject category; notable words include "rubio2016," "impressed," "blessed," "wise," "abortion," "entertaining," "enjoyed," "emerges," and "strong."
Figure 5: Word cloud of frequent words in positive-sentiment Rubio tweets within the “None of the above” subject category
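A word cloud is a rendering of term frequencies, so the substantive step is tokenizing the tweets and counting words. Here is a minimal sketch of that step. The tokenization rules (lowercasing, stripping links and @mentions, keeping the word part of hashtags) and the tiny stopword list are assumptions, and the sample texts are invented.

```python
import re
from collections import Counter

# A tiny illustrative stopword list (an assumption); a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "is", "was", "to", "of", "rt", "in", "by", "at"}


def term_frequencies(texts, stopwords=STOPWORDS):
    """Count lowercase word tokens across tweets, skipping links, mentions, and stopwords."""
    counts = Counter()
    for text in texts:
        text = re.sub(r"https?://\S+", " ", text.lower())  # drop links
        text = re.sub(r"@\w+", " ", text)                   # drop @mentions
        # '#' is not in the character class, so "#rubio2016" yields "rubio2016".
        tokens = re.findall(r"[a-z0-9']+", text)
        counts.update(t for t in tokens if t not in stopwords and len(t) > 1)
    return counts


# Invented example texts, not tweets from the dataset.
texts = [
    "RT @someone: Impressed by Rubio tonight #Rubio2016",
    "Rubio was strong and wise. #GOPDebate",
    "Really enjoyed Rubio at the debate http://t.co/xyz",
]
freqs = term_frequencies(texts)
print(freqs.most_common(5))
```

A `Counter` like this can be passed to a rendering library. For example, the third-party `wordcloud` package provides `WordCloud.generate_from_frequencies`.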
I then performed topic modeling on several subjects within the Rubio dataset. I do not include those results in this writeup because none of the outputs provided any discernible analytic value, even after I adjusted the number of topics and terms. This may reflect the relatively small size of the filtered dataset, and it may also indicate a need for more aggressive filtering of the tokens included in the topic model.
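This writeup does not show the topic model itself. The sketch below only illustrates the kind of token filtering suggested above: before modeling, drop tokens that appear in too few documents (noise) or in too many (for example, the candidate's name). The thresholds `min_df=2` and `max_df_ratio=0.5` and the toy token lists are hypothetical.

```python
from collections import Counter


def filter_vocabulary(docs, min_df=2, max_df_ratio=0.5):
    """Keep tokens that appear in at least `min_df` documents and in at most
    `max_df_ratio` of all documents; return (filtered docs, kept vocabulary)."""
    n_docs = len(docs)
    # Document frequency: count each token at most once per document.
    doc_freq = Counter(token for doc in docs for token in set(doc))
    vocab = {t for t, df in doc_freq.items()
             if df >= min_df and df / n_docs <= max_df_ratio}
    return [[t for t in doc if t in vocab] for doc in docs], vocab


# Hypothetical pre-tokenized tweets.
docs = [
    ["rubio", "immigration", "asia", "debate"],
    ["rubio", "immigration", "strong", "debate"],
    ["rubio", "abortion", "debate"],
    ["rubio", "abortion", "wise"],
]
filtered, vocab = filter_vocabulary(docs)
print(sorted(vocab))
print(filtered)
```

In this toy data, "rubio" (in every document) and "debate" (in three of four) are removed as too common. One-off words are also removed. In scikit-learn, `CountVectorizer`'s `min_df`, `max_df`, and `stop_words` parameters serve the same purpose before fitting `LatentDirichletAllocation`.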
In conclusion, Rubio’s polling fell by 3 percentage points following the debate. Nothing in the analysis I performed was explicitly correlated with this negative shift. However, the analysis does indicate where Rubio gained positive traction on social media: immigration. Rubio content related to immigration went the most ‘viral’ as measured by retweets, and most of those retweets (659 of 1,044) were positive. This suggests that a good strategy for Rubio could have been to focus on his popular takes on immigration. Because Rubio comes from an immigrant family, is a lifelong Republican, and holds fairly moderate views on immigration, I would suggest to the campaign that immigration should be a strong focal point for future campaigning. I would also recommend that the campaign be wary of how it presents Rubio’s position on abortion, as retweets in this category were high and largely negative. To avoid negative viral tweets about his stance on abortion, the campaign may want to re-evaluate how Rubio discusses this issue during debates.