With the World Cup in the books, we decided to wrap up the tournament by analysing the mood on Twitter around its most prolific goalscorers (players with two or more goals). Between November 20 and December 19 we scraped more than 10 million tweets, extracted the roughly 2.5 million written in English, and ran sentiment analysis on them.
- Breakout star and Manchester United target Cody Gakpo was amongst the most ‘loved’ goalscorers: only 9.86% of tweets directed at him were deemed negative.
- Germany’s Niclas Füllkrug was the most insulted goalscorer, with 30.62% of all tweets directed at him deemed negative.
- Lionel Messi and Kylian Mbappé accounted for the most tweets: 1,645,975 combined, more than half of all English tweets.
- Former Arsenal and Chelsea striker Olivier Giroud was also amongst the most ‘loved’, with only 10.48% of tweets mentioning him deemed negative.
- Harry Kane received the second-highest share of negative tweets: 19.66% of all English tweets mentioning him were deemed negative.
- Arsenal star Bukayo Saka was also amongst the most ‘loved’ goalscorers, ranking fourth, with almost 42% of tweets directed at him deemed positive.
You can explore the data yourself in this table.
We used an API (Application Programming Interface) to scrape 10 million tweets related to goalscorers during the World Cup. We then removed tweets mentioning players who scored only one goal, so the analysis focuses on players with two or more goals. From the remainder we extracted the English tweets (2,586,811) and cleaned them using Python. Cleaning consisted of removing URLs, hashtags, mentions, punctuation, special characters, duplicates and null values. With the data cleaned, we analysed its sentiment using natural language processing (NLP), specifically the TextBlob Python library.
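The cleaning steps above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the script we actually ran. The regular expressions, the choice to drop whole hashtag tokens (rather than only the `#` symbol), and the helper names `clean_tweet` and `clean_tweets` are all hypothetical.

```python
import re
from typing import Iterable, List, Optional

# Assumed patterns: the article does not specify the exact ones used.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")  # assumption: drop the whole hashtag token
# Anything that is not an ASCII letter, digit or whitespace counts as
# punctuation or a special character here.
SPECIAL_RE = re.compile(r"[^A-Za-z0-9\s]")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Strip URLs, mentions, hashtags, punctuation and special characters."""
    text = URL_RE.sub(" ", text)      # URLs first, before '/' and ':' are stripped
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    text = SPECIAL_RE.sub(" ", text)
    return WHITESPACE_RE.sub(" ", text).strip()


def clean_tweets(raw: Iterable[Optional[str]]) -> List[str]:
    """Clean every tweet, dropping nulls, empty results and duplicates.

    Duplicates are detected after cleaning and the first occurrence is kept.
    """
    seen = set()
    cleaned = []
    for tweet in raw:
        if tweet is None:
            continue  # null value
        text = clean_tweet(tweet)
        if not text or text in seen:
            continue  # empty after cleaning, or a duplicate
        seen.add(text)
        cleaned.append(text)
    return cleaned


if __name__ == "__main__":
    raw = [
        "What a goal by @CodyGakpo!! https://t.co/abc #NED",
        None,
        "What a goal by @CodyGakpo!! https://t.co/abc #NED",
        "Kane misses the penalty :(",
    ]
    print(clean_tweets(raw))  # ['What a goal by', 'Kane misses the penalty']
```

Deduplicating after cleaning means two tweets that differ only in a link or a mention collapse into one, which is one reasonable reading of "removing duplicates"; deduplicating the raw text instead would keep both.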
TextBlob returns the polarity of a piece of text as a number in the range [-1, 1], where -1 indicates a strongly negative sentiment and +1 a strongly positive one. Once all cleaned tweets had been labelled as positive or negative, we grouped the data by date and player and calculated the percentage of negative and positive tweets for each date and each player.
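A sketch of the labelling and grouping step, assuming each cleaned tweet arrives as a `(date, player, text)` tuple. The article does not say how polarity scores were turned into labels, so the thresholds below (above 0 is positive, below 0 is negative, exactly 0 is neutral) are an assumption, as are the function names `label`, `textblob_polarity` and `sentiment_shares`. `TextBlob(text).sentiment.polarity` is TextBlob's own API; it is imported lazily so the scorer can be swapped out, for example when testing without TextBlob installed.

```python
from collections import defaultdict
from datetime import date
from typing import Callable, Dict, Iterable, Tuple


def label(polarity: float) -> str:
    """Map a polarity in [-1, 1] to a label (assumed thresholds)."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


def textblob_polarity(text: str) -> float:
    """Polarity score from TextBlob; requires `pip install textblob`."""
    from textblob import TextBlob  # lazy import so other scorers work without it
    return TextBlob(text).sentiment.polarity


def sentiment_shares(
    tweets: Iterable[Tuple[date, str, str]],
    polarity: Callable[[str], float] = textblob_polarity,
) -> Dict[Tuple[date, str], Dict[str, float]]:
    """Percentage of positive, negative and neutral tweets per (date, player)."""
    counts: Dict[Tuple[date, str], Dict[str, int]] = defaultdict(
        lambda: {"positive": 0, "negative": 0, "neutral": 0}
    )
    for day, player, text in tweets:
        counts[(day, player)][label(polarity(text))] += 1

    shares = {}
    for key, c in counts.items():
        total = sum(c.values())
        shares[key] = {k: round(100 * v / total, 2) for k, v in c.items()}
    return shares
```

For example, with a stub scorer in place of TextBlob:

```python
scores = {"great": 0.8, "awful": -1.0}
day = date(2022, 12, 18)
tweets = [(day, "Messi", "great"), (day, "Messi", "awful"),
          (day, "Messi", "ok"), (day, "Messi", "great")]
print(sentiment_shares(tweets, polarity=lambda t: scores.get(t, 0.0)))
# {(datetime.date(2022, 12, 18), 'Messi'): {'positive': 50.0, 'negative': 25.0, 'neutral': 25.0}}
```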