Data Collection
Using the Twitter Developer API and Tweepy, I collected 327,083 tweets posted between 2017 and 2022: 154,766 from the period directly after Trump's Muslim Travel Ban and 172,317 from the past week. Of these, 1,470 were geotagged (1,332 from after the ban and 138 from the current day).
Data Pre-Processing
The tweets were separated into two datasets: 1) After Muslim Travel Ban and 2) Current Day. Tweets in each dataset that had an associated place ID were then added to two additional datasets: 1) After Muslim Travel Ban with Place ID and 2) Current Day with Place ID.
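The split described above can be sketched as follows. This is a minimal illustration, not the code used in the project: the cutoff date, the dictionary keys ("created_at", "place_id"), and the dataset names are all hypothetical.

```python
from datetime import datetime, timezone

# Hypothetical cutoff between the two collection periods (an assumption).
CUTOFF = datetime(2020, 1, 1, tzinfo=timezone.utc)


def build_datasets(tweets):
    """Split tweets by period, then collect those that carry a place ID."""
    datasets = {
        "after_ban": [],
        "current_day": [],
        "after_ban_place": [],
        "current_day_place": [],
    }
    for tweet in tweets:
        period = "after_ban" if tweet["created_at"] < CUTOFF else "current_day"
        datasets[period].append(tweet)
        # Tweets with a place ID go into the additional per-period dataset.
        if tweet.get("place_id"):
            datasets[period + "_place"].append(tweet)
    return datasets
```

Each tweet lands in exactly one period dataset, and the place ID datasets are subsets of those.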
Sentiment Analysis
A pretrained RoBERTa-base model was used to classify each tweet as Negative, Neutral, or Positive. This transformer-based model was trained on roughly 124 million tweets posted between January 2018 and December 2021 and then fine-tuned specifically for sentiment analysis.
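To illustrate how a three-class model output can become a label and a set of per-class scores, here is a minimal sketch. It assumes the model returns three raw logits in the order Negative, Neutral, Positive and that the scores are softmax probabilities; both the label order and the function names are assumptions, not details taken from the project.

```python
import math

# Assumed output order of the classifier's three logits.
LABELS = ("Negative", "Neutral", "Positive")


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits):
    """Return the most likely label and the score for every label."""
    scores = dict(zip(LABELS, softmax(logits)))
    label = max(scores, key=scores.get)
    return label, scores
```

Under these assumptions, the "Negative" entry of the returned scores would be the per-tweet negative sentiment score.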
Geolocating Tweets
For each geotagged tweet, I extracted the longitude and latitude. Using the Geopy Python package, I determined the country and state in which each tweet was posted. All tweets from outside the US were dropped.
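The filtering step can be sketched as below. To keep the example runnable without network access, the reverse geocoder is passed in as a callable that returns an address dictionary; the keys "country_code" and "state" and the tweet fields "lat" and "lon" are assumptions for illustration.

```python
def locate_us_state(lat, lon, reverse):
    """Return the US state for a coordinate, or None if outside the US.

    `reverse` is any callable taking (lat, lon) and returning an address
    dict (or None when the lookup fails).
    """
    address = reverse(lat, lon)
    if not address or address.get("country_code") != "us":
        return None
    return address.get("state")


def keep_us_tweets(tweets, reverse):
    """Attach a state to each tweet and drop tweets from outside the US."""
    located = []
    for tweet in tweets:
        state = locate_us_state(tweet["lat"], tweet["lon"], reverse)
        if state is not None:
            located.append({**tweet, "state": state})
    return located
```

With Geopy, one plausible `reverse` is a small wrapper around `Nominatim(user_agent=...).reverse((lat, lon))` that returns `location.raw["address"]` when a location is found. Whether the project used Nominatim or another Geopy geocoder is not stated here.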
Geolocation Analysis
Using the geolocated tweets and their sentiment scores, I calculated the average negative sentiment score for each state in both periods: immediately after the Muslim travel ban and the current day. I then calculated the difference between the two averages for each state, which shows how sentiment changed between the periods.
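The per-state calculation can be sketched as follows. Two choices here are assumptions rather than details from the analysis: the change is computed as current day minus after ban (so a positive value means more negative sentiment now), and only states with tweets in both periods are reported. The field names "state" and "negative" are hypothetical.

```python
from collections import defaultdict


def mean_negative_by_state(tweets):
    """Average the negative sentiment score of the tweets in each state."""
    totals = defaultdict(lambda: [0.0, 0])  # state -> [sum of scores, count]
    for tweet in tweets:
        entry = totals[tweet["state"]]
        entry[0] += tweet["negative"]
        entry[1] += 1
    return {state: total / count for state, (total, count) in totals.items()}


def change_in_negative(after_ban, current_day):
    """Per-state change in mean negative score (current day minus after ban)."""
    before = mean_negative_by_state(after_ban)
    now = mean_negative_by_state(current_day)
    # Only states present in both periods can be compared.
    return {state: now[state] - before[state] for state in before.keys() & now.keys()}
```

With only 138 geotagged tweets in the current-day dataset, some states may have very few tweets or none, so per-state averages for that period can rest on small samples.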
Generating Visualizations
To generate visualizations for the geotagged tweets, I used Flourish Studio to merge my data values with the base US states model.
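One way to prepare the per-state values for upload is to write them to a CSV file with one row per state. The sketch below is an assumption about the hand-off format, not a description of the actual workflow; the column names are hypothetical and would need to match whatever the map template expects.

```python
import csv
import io


def write_state_changes(changes, handle):
    """Write one row per state: state name and change in negative score."""
    writer = csv.writer(handle)
    writer.writerow(["state", "change_in_negative"])
    for state in sorted(changes):
        writer.writerow([state, f"{changes[state]:.4f}"])


# Example: write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
write_state_changes({"Texas": -0.3, "Ohio": 0.125}, buffer)
```

Passing an open file handle (for example from `open("changes.csv", "w", newline="")`) instead of the buffer would produce a file ready for upload.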