rhondamuse.com

Premier League Twitter Analysis with Python Tweepy

In this article, we'll build a dataset from the ground up by collecting real-time Tweets from around the globe. Welcome to Python Data Science December #6.

Social media has become an essential platform in today's sports arena, allowing clubs to connect and engage with their fans. In this analysis, we will delve into the Twitter interactions of the six leading Premier League football clubs in England.

This piece is part of my ongoing series, Python — Data Science December. Comprehensive resources, datasets, and the necessary Python libraries and installations can be found at the end in the Summary & Resources section.

Creating a Twitter App

Warning: This article was written prior to Elon Musk's acquisition of Twitter. There have been numerous reports regarding chaotic events at Twitter recently. While my code remains functional, I cannot guarantee that every step will work seamlessly for you.

To fetch Tweets from Twitter using their API, you must:

  • Have a Twitter account (register at Twitter.com).
  • Create an application on the Twitter Developer Portal. Let's walk through this process together.

Head over to the Developer Portal and initiate a new project.

Twitter will assist you throughout this setup. You need to configure three items:

  • Project Name: Code&Dogs
  • Use Case: Exploring the API
  • Project Description: Exploring Twitter API with Python

Once this is complete, you can establish an App within your project, again following three steps:

  • App Environment: Development (alternatively, you could select Staging or Production)
  • App Name: DataScienceDecember
  • Keys & Tokens: You will receive an API Key, API Key Secret, and Bearer Token. Make sure to copy these or write them down.

With the Project and App set up, we can use them to fetch up to 2,000,000 Tweets per month.

In addition to your API Key, API Key Secret, and Bearer Token, you'll require two more credentials found in the Authentication Tokens section of your App: the Access Token and Access Token Secret. I recommend storing all keys and tokens in a dedicated credentials.py file.
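A credentials.py file along these lines would hold the five values. This is only a sketch: the constant names are illustrative assumptions, the values are placeholders, and the file should stay out of version control.

```python
# credentials.py: placeholder values only, never commit real keys.
# The constant names below are illustrative, not prescribed by Twitter or Tweepy.
API_KEY = "your-api-key"
API_KEY_SECRET = "your-api-key-secret"
BEARER_TOKEN = "your-bearer-token"
ACCESS_TOKEN = "your-access-token"
ACCESS_TOKEN_SECRET = "your-access-token-secret"
```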

Exploring Tweepy

Let's dive into Tweepy and learn how to authenticate our Twitter App using Python.

  • Import the tweepy library and the previously created credentials.py (lines 1-2).
  • Load all keys and secrets from credentials.py (lines 4-8).
  • Establish a tweepy OAuthHandler with your credentials (lines 10-11) and connect to the Twitter API (line 12).
  • As a preliminary test, send a new tweet to your timeline (lines 14-15) and verify it on Twitter.

Tip: By default, your Twitter App is set to Read-only access. You can modify this in the App settings to Read/Write, but you may need to regenerate your access token and secret. For the next steps, you can skip this if you do not require write access.
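The authentication steps might look roughly like the sketch below. It is not the original listing, so the line numbers in the bullets do not map onto it. The helper names connect and post_test_tweet are hypothetical; newer Tweepy releases call the handler class OAuth1UserHandler, so the class name may need adjusting for your version.

```python
def connect(api_key, api_key_secret, access_token, access_token_secret):
    """Return an authenticated tweepy.API client (Twitter API v1.1)."""
    # Imported lazily so this helper can be defined without Tweepy installed.
    import tweepy

    auth = tweepy.OAuthHandler(api_key, api_key_secret)
    auth.set_access_token(access_token, access_token_secret)
    return tweepy.API(auth)


def post_test_tweet(api, text="Hello from Tweepy!"):
    """Post a Tweet to your own timeline; the App needs Read/Write access."""
    return api.update_status(status=text)
```

Assuming credentials.py defines constants named API_KEY, API_KEY_SECRET, ACCESS_TOKEN, and ACCESS_TOKEN_SECRET, you would pass those four values to connect and hand the returned client to post_test_tweet.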

Next, let's investigate how to read Tweets from any user's timeline.

To begin, we select Elon Musk's Twitter account (line 1) and read his timeline with the following parameters, storing the results in the tweets variable (lines 2-7):

  • count=10: Specifies the number of Tweets to retrieve.
  • include_rts=False: Excludes Re-Tweets from the timeline.
  • exclude_replies=True: Prevents replies from appearing in the results.
  • tweet_mode='extended': Returns the full text of Tweets longer than 140 characters instead of a truncated version.

Once we have all the Tweets, we loop through them and print details such as text and creation date (lines 9-13).
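A minimal sketch of this timeline read, using the parameters listed above. The function names are hypothetical, and api is assumed to be an authenticated tweepy.API client; with tweet_mode='extended', the text is exposed as full_text rather than text.

```python
def read_recent_tweets(api, user_id, count=10):
    """Fetch the latest original Tweets from one account's timeline."""
    return api.user_timeline(
        screen_name=user_id,
        count=count,            # number of Tweets to request
        include_rts=False,      # leave out Re-Tweets
        exclude_replies=True,   # leave out replies
        tweet_mode="extended",  # full text instead of a truncated preview
    )


def print_tweets(tweets):
    """Print the creation date and full text of each Tweet."""
    for tweet in tweets:
        print(tweet.created_at)
        print(tweet.full_text)
        print("-" * 40)
```

For example, print_tweets(read_recent_tweets(api, 'elonmusk')) would print up to ten Tweets; the handle is an assumption, since the text does not spell it out.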

The results confirm that we accurately retrieved data from Elon Musk's Twitter profile.

Building Our Dataset

Now that we understand how to use Tweepy to obtain Tweets, let's focus on building a dataset to compare the Twitter activities of the top six Premier League football clubs.

Starting with Manchester United, we can break the code into manageable sections:

  • We already know how to read Tweets from a user's timeline. We designate userID = 'ManUtd', which is the official Twitter account for Manchester United (line 1), and increase the count to 200 (line 3). The rest remains unchanged (lines 1-6).
  • We store all Tweets in the TweetCollector list (lines 8-9) and save the id of the last fetched Tweet (line 10).

Next, we will continuously request more Tweets (lines 12-26) until no more are available (lines 20-22). There are certain Twitter limits that restrict the number of Tweets you can fetch within a specified timeframe.

  • Begin the while loop (line 12).
  • Request Tweets from the user ManUtd, starting from where we last stopped using the max_id parameter (line 18). If there are no more Tweets, we exit the loop (lines 20-22).
  • Append the Tweets to the TweetCollector list, save the id of the last fetched Tweet, and display the total number of Tweets collected so far (lines 24-26).
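The collection loop can be sketched as follows. This is a reconstruction under assumptions, not the original listing: collect_timeline and page_size are hypothetical names, and passing max_id as the oldest id minus one is a choice made here so the oldest Tweet is not fetched twice.

```python
def collect_timeline(api, user_id, page_size=200):
    """Page backwards through a timeline until Twitter returns nothing more."""
    params = dict(
        screen_name=user_id,
        count=page_size,
        include_rts=False,
        exclude_replies=True,
        tweet_mode="extended",
    )
    tweets = api.user_timeline(**params)
    tweet_collector = list(tweets)

    while tweets:
        # Ask only for Tweets older than the oldest one fetched so far.
        oldest_id = tweets[-1].id
        tweets = api.user_timeline(max_id=oldest_id - 1, **params)
        if not tweets:
            break  # no more Tweets available
        tweet_collector.extend(tweets)
        print(f"{len(tweet_collector)} Tweets collected so far")

    return tweet_collector
```

Calling collect_timeline(api, 'ManUtd') would return the collected Tweets as a list. If you run into rate limits, creating the client with tweepy.API(auth, wait_on_rate_limit=True) makes Tweepy pause until the limit resets.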

The loop repeats until Twitter returns no more Tweets; in practice, ~3,000 Tweets appears to be the maximum we can obtain.

Once the loop is complete, we process the full list of Tweets stored in TweetCollector (lines 1-7) and split each Tweet into the following components:

  • The club name ‘Manchester United’ (line 1).
  • The Id, creation date, favorite count, and retweet count (lines 2-5).
  • The Tweet text (line 6), with any line breaks removed.

We then save this information in tweetsHelper.
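As a rough sketch, the flattening step could look like this. The function name tweets_to_rows is hypothetical; the attribute names (id, created_at, favorite_count, retweet_count, full_text) are the ones Tweepy exposes on Tweets fetched with tweet_mode='extended'.

```python
def tweets_to_rows(tweets, club):
    """Flatten Tweet objects into plain lists, one row per Tweet."""
    tweets_helper = []
    for tweet in tweets:
        tweets_helper.append([
            club,
            tweet.id,
            tweet.created_at,
            tweet.favorite_count,
            tweet.retweet_count,
            # Replace line breaks so each Tweet stays on a single CSV line.
            tweet.full_text.replace("\r", " ").replace("\n", " "),
        ])
    return tweets_helper
```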

Finally, we will save tweetsHelper into a pandas DataFrame, add the necessary headers, and export it as a CSV file.
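A minimal sketch of the export step with pandas. The column names and the save_rows helper are assumptions chosen to match the six fields described above; the original headers may differ.

```python
import pandas as pd

# Hypothetical header names for the six fields collected per Tweet.
COLUMNS = ["club", "id", "created_at", "favorite_count", "retweet_count", "text"]


def save_rows(rows, path):
    """Build a DataFrame from the row lists and write it to a CSV file."""
    df = pd.DataFrame(rows, columns=COLUMNS)
    df.to_csv(path, index=False)  # index=False keeps the row index out of the file
    return df
```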

Let's take a brief look at the generated CSV file.

Now, we will replicate the process for Liverpool F.C. by simply changing the userID variable to LFC and running the script again. Once completed, we will have a CSV file with Tweets from Liverpool F.C.

Next, we will perform the same steps for the remaining top six clubs in the Premier League:

  • Arsenal (userID = 'Arsenal')
  • Chelsea (userID = 'ChelseaFC')
  • Manchester City (userID = 'ManCity')
  • Tottenham (userID = 'SpursOfficial')

This will result in six distinct CSV files, one for each club.

The final task is to combine these datasets into a single file. We can easily read all CSV files into separate DataFrames (lines 1-6), merge them into one dataset (line 3), and save it as a single large CSV file.
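One way to do the merge is pandas.concat, sketched below. The helper name combine_csvs and the file paths you pass in are assumptions; the original code may merge the DataFrames differently.

```python
import pandas as pd


def combine_csvs(paths, out_path):
    """Read each club's CSV, stack the rows, and write one combined file."""
    frames = [pd.read_csv(path) for path in paths]
    # ignore_index=True renumbers the rows 0..n-1 across all clubs.
    combined = pd.concat(frames, ignore_index=True)
    combined.to_csv(out_path, index=False)
    return combined
```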

Let's quickly examine the structure and value counts of the combined data.
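A quick inspection could be sketched like this, assuming the club name is stored in a column called club (a hypothetical header name).

```python
import pandas as pd


def summarize(df, club_column="club"):
    """Print the shape and column types, then return Tweets per club."""
    print(df.shape)   # (number of rows, number of columns)
    print(df.dtypes)  # data type of each column
    counts = df[club_column].value_counts()
    print(counts)
    return counts
```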

We observe that we have a total of 16,598 Tweets organized across six columns. Each Twitter account has a roughly comparable number of Tweets (between ~2,700 and ~3,000), with the exception of Manchester City, which has only ~2,000 Tweets.

The discrepancy is probably explained by Twitter's documentation, which states that the user_timeline method can only reach the 3,200 most recent Tweets in a user's timeline. Retweets and replies count toward that limit even when they are filtered out of the results with exclude_replies=True and include_rts=False, so an account that retweets or replies often yields fewer usable Tweets.

That's all for today; see you tomorrow!

Summary & Resources

This marks the sixth installment of Python Data Science December. We constructed our dataset by extracting Tweets from the top six Premier League football clubs in England.

To stay updated with my stories and support me, consider registering on Medium. If you have questions or need assistance, feel free to leave a comment—I'll be sure to respond.

You can access the complete Python code along with the datasets (totaling 16,598 rows) for free on GitHub. Additionally, I have prepared an advanced dataset containing Tweets from ALL Premier League clubs (totaling 53,123 rows), which will be shared exclusively on Patreon for a small donation.

  • GitHub (free): full code & datasets (16,598 rows total)
  • Patreon ($3/month for regular & advanced content): advanced dataset with Tweets from ALL Premier League clubs (53,123 rows total)
