Suhem Parack
Posted on June 2, 2022
The reverse chronological timeline endpoint in the Twitter API v2 returns Tweets that appear in a user's home timeline, with the most recent Tweets first. In this short tutorial, I will show how you can get these Tweets using the Tweepy package in Python and perform some basic exploratory analysis on them. Specifically, we will learn how to get:
- Recent Tweets that appear in a user's home timeline
- Timestamp for the first and last Tweet in the timeline
- Most liked Tweet in the timeline
- Different languages that appear in the timeline
- Common topics that appear in the timeline
- Accounts with the most Tweets that appear in the timeline
- Types of Tweets that appear in the timeline
In order to use the Twitter API v2, you need to sign up for a Twitter developer account. Once you have signed up, you will need to obtain your keys and tokens to connect to the Twitter API in Python, using Tweepy. Note: the reverse chronological timeline endpoint only works with a user access token (and not with app-only auth). Finally, make sure you have Python installed on your machine and that you have the most recent version of Tweepy installed by running:
pip3 install tweepy --upgrade
Getting the 800 most recent Tweets from the reverse chronological timeline
In order to get Tweets from the reverse chronological timeline with Tweepy, you will first have to initialize the client (which makes the API calls for you) with your consumer_key, consumer_secret, access_token and access_token_secret. Next, you can use the get_home_timeline function. You can get a maximum of 100 Tweets per call, so if you want more Tweets, you will have to use the Paginator functionality in Tweepy and specify how many requests it should make. So, for example, if you want 800 Tweets (as shown in the code below), you can specify limit=8 for the Paginator.
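The relationship between the number of Tweets you want and the Paginator limit is simple arithmetic. Here is a minimal sketch, assuming the 100-Tweets-per-request maximum described above; pages_needed is a hypothetical helper name used only for illustration, not part of Tweepy.

```python
import math

# Per-request maximum for the home timeline, as described in the text above.
MAX_RESULTS_PER_PAGE = 100


def pages_needed(total_tweets, per_page=MAX_RESULTS_PER_PAGE):
    """Return how many requests are needed to fetch at least total_tweets.

    Hypothetical helper: the result is the value you would pass as
    `limit` to the Paginator.
    """
    if total_tweets <= 0:
        return 0
    return math.ceil(total_tweets / per_page)


print(pages_needed(800))  # 8
print(pages_needed(250))  # 3
```

Rounding up means the last request may return more Tweets than you strictly asked for, so trim the results afterwards if you need an exact count.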
Also, by default the Twitter API v2 returns only the Tweet ID and text for a Tweet. If you want additional data such as the time the Tweet was created, the language of the Tweet, or metrics (such as like_count), you will have to request those individually using fields and expansions. In this example, because we want the username of the person Tweeting, we will have to set expansions=['author_id']. Then, we will create a users dictionary with the user ID as the key and the user information (such as name, username etc.) as the value, so that we can easily look up user information for each Tweet.
In the example below, we are creating a dictionary called tweets_dict, which contains the Tweet ID as the key. The value is an object with the Tweet text, the time when the Tweet was created, the number of likes for the Tweet, the context annotations and the username of the person Tweeting it.
import tweepy

client = tweepy.Client(consumer_key='REPLACE_ME',
                       consumer_secret='REPLACE_ME',
                       access_token='REPLACE_ME',
                       access_token_secret='REPLACE_ME')

tweets_dict = dict()

# limit=8 below will result in the 800 most recent Tweets being returned,
# because each request asks for 100 Tweets
for response in tweepy.Paginator(client.get_home_timeline,
                                 max_results=100,
                                 tweet_fields=['created_at', 'lang', 'context_annotations', 'public_metrics', 'referenced_tweets'],
                                 expansions=['author_id'],
                                 limit=8):
    # response.data is None when a page contains no Tweets
    tweets = response.data or []
    # Map each user ID to its user object so we can look up the author of each Tweet
    users = {u["id"]: u for u in response.includes.get('users', [])}
    for tweet in tweets:
        print(tweet.id)
        user = users[tweet.author_id]
        tweets_dict[tweet.id] = {
            "id": tweet.id,
            "text": tweet.text,
            "created_at": tweet.created_at,
            "lang": tweet.lang,
            "like_count": tweet.public_metrics['like_count'],
            "context_annotations": tweet.context_annotations,
            "username": user.username
        }

print(len(tweets_dict))
First and last Tweet creation timestamp from the timeline
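Now that we have the recent Tweets from the user's home timeline, we can simply get the first and last entries from tweets_dict to determine the first and last Tweet in the user's home timeline. Because the timeline is returned in reverse chronological order and Python dictionaries preserve insertion order, the first entry is the most recent Tweet and the last entry is the oldest. In the example below, we also print the difference in time between these 2 Tweets.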
first_tweet = tweets_dict[list(tweets_dict)[0]]
last_tweet = tweets_dict[list(tweets_dict)[-1]]
print("First Tweet in timeline is {} created at {}".format(first_tweet['id'], first_tweet['created_at']))
print("Last Tweet in timeline is {} created at {}".format(last_tweet['id'], last_tweet['created_at']))
print("Number of days between first and last Tweet: {}".format(first_tweet['created_at'] - last_tweet['created_at']))
In my case, I got the following response:
First Tweet in timeline is 1532378597620473856 created at 2022-06-02 15:08:22+00:00
Last Tweet in timeline is 1444321306330107905 created at 2021-10-02 15:20:08+00:00
Number of days between first and last Tweet: 242 days, 23:48:14
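If you would rather not depend on the order in which Tweets were added to the dictionary, you can pick the newest and oldest entries by their created_at value instead. This is a minimal sketch on a small hand-made dictionary: the IDs 1 to 3 and the middle timestamp are made up for illustration, and the other two timestamps are the ones from my output above.

```python
from datetime import datetime, timezone

# Illustrative stand-in for tweets_dict; IDs are placeholders, not real Tweets.
sample_tweets = {
    3: {"id": 3, "created_at": datetime(2022, 6, 2, 15, 8, 22, tzinfo=timezone.utc)},
    1: {"id": 1, "created_at": datetime(2021, 10, 2, 15, 20, 8, tzinfo=timezone.utc)},
    2: {"id": 2, "created_at": datetime(2022, 1, 15, 9, 0, 0, tzinfo=timezone.utc)},
}

# Select by timestamp rather than by position in the dictionary.
newest = max(sample_tweets.values(), key=lambda t: t["created_at"])
oldest = min(sample_tweets.values(), key=lambda t: t["created_at"])

# Subtracting two datetimes gives a timedelta.
span = newest["created_at"] - oldest["created_at"]

print("Newest:", newest["id"], "Oldest:", oldest["id"])
print("Whole days between them:", span.days)  # 242
```

The timedelta also exposes seconds and total_seconds() if you need more than whole days.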
Most liked Tweet from the timeline
To get the most liked Tweet from the timeline, we can reverse sort tweets_dict on the like_count, and that will give us Tweets ordered by like_count (most to least).
for k, v in sorted(tweets_dict.items(), key=lambda x: x[1]['like_count'], reverse=True):
    print(k, v['like_count'])
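If you only need the single most liked Tweet, or the top few, you do not have to sort the whole dictionary. Here is a minimal sketch using the built-in max and heapq.nlargest on made-up data; the IDs and like counts below are placeholders, not real Tweets.

```python
import heapq

# Illustrative stand-in for tweets_dict with only the field we need.
sample_tweets = {
    101: {"like_count": 4},
    102: {"like_count": 27},
    103: {"like_count": 0},
    104: {"like_count": 13},
}


def likes(item):
    """Return the like_count of a (tweet_id, tweet) pair."""
    return item[1]["like_count"]


# The single most liked Tweet.
top_id, top_tweet = max(sample_tweets.items(), key=likes)
print(top_id, top_tweet["like_count"])  # 102 27

# The two most liked Tweets, most liked first.
top_two = heapq.nlargest(2, sample_tweets.items(), key=likes)
print([tweet_id for tweet_id, _ in top_two])  # [102, 104]
```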
Different languages present in the timeline
To see the common languages present in the timeline, we will create a languages dictionary, count how many times each language appears, and then reverse sort the dictionary based on the count and print it.
languages = dict()
for key, value in tweets_dict.items():
    if value['lang'] not in languages:
        languages[value['lang']] = 1
    else:
        languages[value['lang']] = languages[value['lang']] + 1

for k, v in sorted(languages.items(), key=lambda item: item[1], reverse=True):
    print(k, v)
In my case, it gave me the following response:
en 727
und 9
fr 5
in 3
es 2
tr 2
hi 1
pl 1
ar 1
ja 1
ro 1
it 1
tl 1
ca 1
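The same counting pattern can be written more compactly with collections.Counter from the standard library. This is a sketch on a made-up list of language codes, not my actual timeline data.

```python
from collections import Counter

# Illustrative stand-in for tweets_dict with only the lang field.
sample_tweets = {
    1: {"lang": "en"},
    2: {"lang": "en"},
    3: {"lang": "fr"},
    4: {"lang": "en"},
    5: {"lang": "und"},
    6: {"lang": "fr"},
}

# Counter tallies each language code as it is seen.
language_counts = Counter(tweet["lang"] for tweet in sample_tweets.values())

# most_common() returns (language, count) pairs, highest count first.
for lang, count in language_counts.most_common():
    print(lang, count)
```

The same approach works for any single field in tweets_dict, such as username.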
Most common topics that appear in the timeline
The Twitter API v2 supports Tweet annotations that provide contextual information about a Tweet and return named entities present in a Tweet. Each context_annotation contains a domain and an entity. Check out the complete list of supported domains here. We create a topics dictionary and count how many times each entity name appears.
topics = dict()
for key, value in tweets_dict.items():
    if "context_annotations" in value:
        # Read the annotations of the current entry (value)
        annotations = value['context_annotations']
        for annotation in annotations:
            if 'entity' in annotation:
                entity = annotation['entity']
                if 'name' in entity:
                    name = entity['name']
                    if name in topics:
                        topics[name] = topics.get(name) + 1
                    else:
                        topics[name] = 1

for k, v in sorted(topics.items(), key=lambda item: item[1], reverse=True):
    print(k, v)
In my case, the response I got is:
Services 756
Twitter 756
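One variation you may find useful is to count each entity at most once per Tweet, since a single Tweet may list the same entity under more than one domain. Here is a minimal sketch, assuming each annotation is a dictionary with domain and entity keys as described above; the domain and entity IDs below are placeholders, not real values.

```python
from collections import Counter

# Illustrative stand-in for tweets_dict; IDs are made-up placeholders.
sample_tweets = {
    1: {"context_annotations": [
        {"domain": {"id": "d1", "name": "Brand Category"},
         "entity": {"id": "e1", "name": "Services"}},
        {"domain": {"id": "d2", "name": "Brand"},
         "entity": {"id": "e2", "name": "Twitter"}},
        # Same entity again, under a different domain.
        {"domain": {"id": "d3", "name": "Other"},
         "entity": {"id": "e1", "name": "Services"}},
    ]},
    2: {"context_annotations": [
        {"domain": {"id": "d2", "name": "Brand"},
         "entity": {"id": "e2", "name": "Twitter"}},
    ]},
    3: {"context_annotations": []},  # Tweet without annotations
}

topic_counts = Counter()
for tweet in sample_tweets.values():
    # A set removes duplicate entity names within one Tweet.
    names = {
        annotation["entity"]["name"]
        for annotation in tweet.get("context_annotations") or []
        if "name" in annotation.get("entity", {})
    }
    topic_counts.update(names)

for name, count in topic_counts.most_common():
    print(name, count)
```

With this approach, the count for a topic is the number of Tweets that mention it, rather than the number of annotations.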
Most common accounts that appear in the timeline
In order to get the most common accounts that appear in the timeline, we can create a usernames dictionary and count the number of times a username appears in the timeline, and then we can reverse sort and print it.
usernames = dict()
for key, value in tweets_dict.items():
    if value['username'] not in usernames:
        usernames[value['username']] = 1
    else:
        usernames[value['username']] = usernames[value['username']] + 1

for k, v in sorted(usernames.items(), key=lambda item: item[1], reverse=True):
    print(k, v)
In my case, I got the following response:
suhemparack 374
icahdq 228
TwitterDev 114
hackingcommsci 18
TwitterAPI 16
SentimentsDev 6
Types of Tweets present in the timeline
Sometimes, you may want to understand how many of the Tweets that appear in the timeline are Original Tweets, Replies, Retweets or Quote Tweets. In order to do so, we request the referenced_tweets field and then, in the determine_tweet_type function, we check whether its type is replied_to, quoted or retweeted. If it is none of these, then we know that it is an original Tweet.
import tweepy


def determine_tweet_type(tweet):
    if 'referenced_tweets' in tweet:
        # Check for reply indicator
        if tweet['referenced_tweets'][0]['type'] == "replied_to":
            return "Reply Tweet"
        # Check for quote tweet indicator
        elif tweet['referenced_tweets'][0]['type'] == "quoted":
            return "Quote Tweet"
        # Check for retweet indicator
        elif tweet['referenced_tweets'][0]['type'] == "retweeted":
            return "Retweet"
        else:
            return "Original Tweet"
    else:
        return "Original Tweet"


client = tweepy.Client(consumer_key='REPLACE_ME',
                       consumer_secret='REPLACE_ME',
                       access_token='REPLACE_ME',
                       access_token_secret='REPLACE_ME')

tweets_dict = dict()

# limit=8 below will result in the 800 most recent Tweets being returned,
# because each request asks for 100 Tweets
for response in tweepy.Paginator(client.get_home_timeline,
                                 max_results=100,
                                 tweet_fields=['created_at', 'lang', 'context_annotations', 'public_metrics',
                                               'referenced_tweets'],
                                 expansions=['author_id'],
                                 limit=8):
    # response.data is None when a page contains no Tweets
    tweets = response.data or []
    users = {u["id"]: u for u in response.includes.get('users', [])}
    for tweet in tweets:
        user = users[tweet.author_id]
        tweets_dict[tweet.id] = {
            "id": tweet.id,
            "type": determine_tweet_type(tweet),
            "text": tweet.text,
            "created_at": tweet.created_at,
            "lang": tweet.lang,
            "like_count": tweet.public_metrics['like_count'],
            "context_annotations": tweet.context_annotations,
            "username": user.username
        }

# Count how many Tweets there are of each type
types = dict()
for key, value in tweets_dict.items():
    if value['type'] not in types:
        types[value['type']] = 1
    else:
        types[value['type']] = types[value['type']] + 1

for k, v in sorted(types.items(), key=lambda item: item[1], reverse=True):
    print(k, v)
In my case, I got the following response:
Original Tweet 348
Retweet 195
Quote Tweet 112
Reply Tweet 101
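Raw counts are easier to compare as percentages of the whole timeline. Here is a short sketch that converts the counts from my response above into shares; type_counts is simply those four numbers typed in by hand.

```python
# Counts copied from the response shown above.
type_counts = {
    "Original Tweet": 348,
    "Retweet": 195,
    "Quote Tweet": 112,
    "Reply Tweet": 101,
}

total = sum(type_counts.values())

# Share of each Tweet type as a percentage of all Tweets in the timeline.
shares = {tweet_type: 100 * count / total for tweet_type, count in type_counts.items()}

print("Total Tweets:", total)  # 756
for tweet_type, share in shares.items():
    print(f"{tweet_type}: {share:.1f}%")
```

In my case, this shows that a little under half of the timeline (about 46%) is made up of original Tweets.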
I hope this tutorial is helpful to you in learning how to do exploratory analysis on the reverse chronological timeline. If you have any questions or feedback, feel free to reach out to me on Twitter.