Translating Tweets from the Twitter API v2 using AWS Amazon Translate in Python
Suhem Parack
Posted on September 16, 2021
Researchers use the Twitter API to get Twitter data for various research studies. In some cases, they want to translate the Tweet text from one language to another, and then perform further analysis on the text.
In this post, we will learn how to get Tweets using the Twitter API v2 and then convert Tweets from one language to another using Amazon Translate in Python.
Prerequisite
To follow this tutorial, you need a Twitter developer account. Once you have a developer account, you will need a bearer token to connect to the Twitter API v2 and get Tweets. Follow these instructions for obtaining a bearer token.
You will also need an Amazon Web Services (AWS) account to use the Amazon Translate service. Instructions on setting up your AWS credentials locally can be found here. Please set up your credentials, including a default region, before running the sample code. It will not work without them.
To connect to the Twitter API v2, we will use the twarc library, and to use Amazon Translate, we will use the boto3 library. First, we import the required libraries:
from twarc import Twarc2, expansions
import boto3
Next, we set up the twarc client that we will use to get Tweets from the Twitter API v2. To set it up, we pass it the bearer token obtained in the prerequisites section.
# Replace your bearer token below
client = Twarc2(bearer_token="REPLACE_ME")
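Hard-coding the token works for a quick demo, but it is easy to leak by accident (for example, by committing the script). A minimal sketch of reading it from an environment variable instead; the variable name `TWITTER_BEARER_TOKEN` and the helper `load_bearer_token` are hypothetical choices for illustration, not part of twarc:

```python
import os


def load_bearer_token(env_var: str = "TWITTER_BEARER_TOKEN") -> str:
    """Read the bearer token from an environment variable (hypothetical name)."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        # Fail early with a clear message instead of sending an empty token to the API
        raise RuntimeError(f"Set the {env_var} environment variable to your bearer token")
    return token
```

With the token exported in your shell, the client can then be created with `client = Twarc2(bearer_token=load_bearer_token())`.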
Then we write a small helper function that takes the input Tweet text, the source language (the language the Tweet is in) and the target language (the language we want the Tweet translated into). In this function, we initialize the Amazon Translate client with boto3.client("translate").
Once the client is set up, we call its translate_text method with the Tweet text, source language and target language. The function then returns a dictionary containing the original text, the translated text, and the source and target language codes from the response.
def translate(input_text, source_lang, target_lang):
    # Create an Amazon Translate client using your locally configured AWS credentials
    translate_client = boto3.client("translate")
    result = translate_client.translate_text(Text=input_text, SourceLanguageCode=source_lang,
                                             TargetLanguageCode=target_lang)
    # Collect the original text, the translation and the language codes in one dictionary
    return {"originalText": input_text, "translatedText": result.get('TranslatedText'),
            "sourceLang": result.get('SourceLanguageCode'), "targetLang": result.get('TargetLanguageCode')}
Then, we define the main function. In this function, we call the search_recent method of the twarc client and pass it a search query. This searches for Tweets from the last 7 days that match the conditions in the query.
For this demo, I am requesting Tweets from a particular account (from:SentimentsDev) that are in Hindi (lang:hi), excluding Retweets (-is:retweet). Learn more about writing search queries here.
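If you want to run the same search for different accounts or languages, the query string can be assembled from its parts. A minimal sketch; `build_query` is a hypothetical helper that only combines the three operators used in this post, which the search endpoint joins with an implicit AND when separated by spaces:

```python
def build_query(username, lang, exclude_retweets=True):
    """Compose a Twitter API v2 search query from a few common operators."""
    parts = [f"from:{username}", f"lang:{lang}"]
    if exclude_retweets:
        # -is:retweet drops Retweets so only original Tweets are translated
        parts.append("-is:retweet")
    # Space-separated operators are combined with a logical AND
    return " ".join(parts)
```

For example, `build_query("SentimentsDev", "hi")` produces the same query string used in `main()`.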
def main():
    # Replace the query below with your own
    query = "from:SentimentsDev lang:hi -is:retweet"
    # The search_recent method calls the recent search endpoint to get Tweets from the last 7 days based on the query
    search_results = client.search_recent(query=query, max_results=100)
    # Twarc pages through all matching Tweets for us, yielding one page (up to 100 Tweets) at a time
    for page in search_results:
        # The Twitter API v2 returns the Tweet information and the user, media etc. separately
        # so we use expansions.flatten to get all the information in a single JSON
        result = expansions.flatten(page)
        for tweet in result:
            # Here we are calling the translate function and passing it the Tweet text, the source language code
            # and the target language code
            response = translate(tweet['text'], tweet['lang'], 'en')
            # Below we print the original text, the translated text, the source and target language codes
            print("Original Text: {}".format(response['originalText']))
            print("Translated Text: {}".format(response['translatedText']))
            print("Source Language: {}".format(response['sourceLang']))
            print("Target Language: {}".format(response['targetLang']))
Finally, we call the main function.
if __name__ == "__main__":
main()
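Printing is handy for a quick check, but for further analysis you will usually want to keep the translations. A minimal sketch that collects one record per Tweet and writes them as JSON Lines; `collect_translations`, `write_jsonl`, the `max_pages` limit and the output format are illustrative assumptions, and the page iterator, flatten function and translate function are passed in so the logic can be exercised without network access:

```python
import itertools
import json


def collect_translations(pages, flatten, translate_fn, target_lang="en", max_pages=None):
    """Translate Tweets from at most `max_pages` pages and return a list of records."""
    records = []
    # islice stops after max_pages pages; None means no limit
    for page in itertools.islice(pages, max_pages):
        for tweet in flatten(page):
            record = translate_fn(tweet["text"], tweet["lang"], target_lang)
            # Keep the Tweet ID so translations can be joined back to other Tweet data
            record["id"] = tweet["id"]
            records.append(record)
    return records


def write_jsonl(records, path):
    """Write one JSON object per line, keeping non-ASCII text such as Devanagari readable."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

With the objects from this post, a call might look like `records = collect_translations(client.search_recent(query=query, max_results=100), expansions.flatten, translate, max_pages=2)`, followed by `write_jsonl(records, "translations.jsonl")` (the file name is arbitrary).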
In this example, the Tweet text was "आज का मौसम बहुत अच्छा है", so the response looks like this:
Original Text: आज का मौसम बहुत अच्छा है
Translated Text: Today's weather is very good
Source Language: hi
Target Language: en
As you can see, the original Tweet was in Hindi (denoted by the language code hi) and it was translated into English. Similarly, you can translate between many other languages supported by Amazon Translate.
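To translate the same Tweet into several languages at once, you can call the helper once per target language. A minimal sketch; `translate_to_many` is a hypothetical helper, and `translate_fn` is expected to behave like the `translate` function defined earlier and return a dictionary with a `translatedText` key:

```python
def translate_to_many(input_text, source_lang, target_langs, translate_fn):
    """Translate one text into several target languages, keyed by target code."""
    results = {}
    for target in target_langs:
        if target == source_lang:
            # No call needed when the text is already in this language
            continue
        results[target] = translate_fn(input_text, source_lang, target)["translatedText"]
    return results
```

For example, `translate_to_many(tweet['text'], tweet['lang'], ['en', 'es', 'fr'], translate)` would return a dictionary mapping `en`, `es` and `fr` to the respective translations.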
I hope this small tutorial is helpful to you. Please reach out to me on Twitter @suhemparack with questions or feedback.