Scraping tweets without Twitter API for FREE

iw4p

Nima Akbarzadeh

Posted on March 14, 2023


In the past (before Elon Musk…), you could easily and freely apply for a developer account, get your own tokens, and start using the Twitter API without any struggle. One of the strengths of a developer account, besides building bots and tweeting via the API, was the search API: you could grab almost any tweets you wanted. But after Elon Musk, unfortunately, you have to pay for it!

Tiers will start at $500,000 a year for access to 0.3 percent of the company's tweets. Researchers say that's too much for too little data. [source]

There is one solution that almost always works: Selenium! (It's also good to know that a great JavaScript alternative to Selenium is Puppeteer.)

It lets you scrape almost anything on the surface web; you just have to write a script for your use case with the Selenium library.

How

The algorithm for scraping tweets is simple.
These are the steps (a bare-bones Selenium sketch follows the list):

  1. Open Twitter search with an advanced search query.
  2. Scrape the specific tags to get their values.
  3. Scroll down.
  4. Repeat until you have scraped the number of tweets you need.
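
If you prefer to roll your own script instead of using a library, a bare-bones version of these four steps with raw Selenium looks roughly like this. It is a minimal sketch: the search URL and the article[data-testid="tweet"] selector are assumptions about Twitter's current markup and can break whenever the front end changes, and logged-out search may be rate-limited or blocked entirely.

import time
import urllib.parse

from selenium import webdriver
from selenium.webdriver.common.by import By

def scrape_search(query: str, max_tweets: int = 50) -> list:
    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)

    # Step 1: open Twitter search with an advanced search query
    driver.get("https://twitter.com/search?q=" + urllib.parse.quote(query) + "&f=live")
    time.sleep(5)  # give the first batch of tweets time to render

    seen = []
    for _ in range(20):  # cap the number of scrolls so the loop always ends
        # Step 2: scrape specific tags to get their values
        for card in driver.find_elements(By.CSS_SELECTOR, 'article[data-testid="tweet"]'):
            text = card.text
            if text and text not in seen:
                seen.append(text)
        if len(seen) >= max_tweets:
            break

        # Step 3: scroll to trigger loading of the next batch
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(3)

    driver.quit()
    return seen[:max_tweets]

print(scrape_search("from:elonmusk since:2023-01-01", max_tweets=10))
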

Code

You can write the script yourself or use an existing library such as twitter_scraper_selenium, which is available on PyPI and GitHub.



pip install twitter_scraper_selenium



(Note: to save the results as CSV and work with them as data frames, you also need to install pandas and a few other dependencies.)
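
For example, pandas is the one extra package the snippets below actually import, so a likely install step is:

pip install pandas
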
Then you can write your own wrapper function, like this:

from twitter_scraper_selenium import scrape_keyword
import json
import pandas as pd


def scrape_profile_tweets_since_2023(username: str):
    # Advanced-search query: only tweets from this account
    kword = "from:" + username
    path = './users/' + username   # the ./users/ directory must already exist
    file_path = path + '.csv'      # scrape_keyword saves the output as <filename>.csv
    scrape_keyword(
        headless=True,
        keyword=kword,
        browser="chrome",
        tweets_count=2,            # just the last 2 tweets
        filename=path,
        output_format="csv",
        since="2023-01-01",
        # until="2025-03-02",      # omitted, so it defaults to right now
    )
    # Read the CSV back and return it as a list of JSON records
    data = pd.read_csv(file_path)
    data = json.loads(data.to_json(orient='records'))
    return data



You can call this function for multiple accounts at the same time, like this:



from multiprocessing import Pool

# Just one account:
# scrape_profile_tweets_since_2023('elonmusk')

# Or run several accounts in parallel
noOfPools = 5

if __name__ == "__main__":
    with Pool(noOfPools) as p:
        p.map(scrape_profile_tweets_since_2023, ['elonmusk', 'BarackObama', 'cathiedwood'])
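
Pool.map also returns each worker's list of JSON records, so you can collect everything in one place. Continuing the script above, here is a hedged sketch that flattens the per-user lists into a single pandas DataFrame; the column names are simply whatever twitter_scraper_selenium wrote to each CSV, and the ./users/ directory is assumed to exist:

import pandas as pd
from multiprocessing import Pool

if __name__ == "__main__":
    usernames = ['elonmusk', 'BarackObama', 'cathiedwood']

    # Each worker returns a list of tweet records (dicts) for one username
    with Pool(5) as p:
        per_user = p.map(scrape_profile_tweets_since_2023, usernames)

    # Flatten the per-user lists into one DataFrame
    combined = pd.DataFrame([record for records in per_user for record in records])
    print(combined.head())
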




Result
Your result will be something like this:

[Screenshot of the scraped tweets]

In the next post, we are going to scrape mentions/replies as well.

If you like the post, please give it a clap or follow me on GitHub and LinkedIn!
Github.com/iw4p

https://www.linkedin.com/in/nimk/
