Some approaches to scraping the Clutch page: bs4 and pandas - a comparison
Posted on May 29, 2023
I am trying to gather the data from the page "https://clutch.co/il/it-services".
That said, I think there are several options for doing that:
a. using bs4 and requests
b. using pandas
The first approach below uses option a.
import requests
from bs4 import BeautifulSoup
import pandas as pd

url = "https://clutch.co/il/it-services"

# Fetch the page and parse it with BeautifulSoup
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")

# Grab the company-name headings and the locality spans
company_names = soup.find_all("h3", class_="company-name")
locations = soup.find_all("span", class_="locality")

# Strip whitespace and collect the plain text
company_names_list = [name.get_text(strip=True) for name in company_names]
locations_list = [location.get_text(strip=True) for location in locations]

# Build a DataFrame and write it out as CSV
data = {"Company Name": company_names_list, "Location": locations_list}
df = pd.DataFrame(data)
df.to_csv("it_services_data.csv", index=False)
This code will:
a. scrape the company names and locations from the specified webpage,
b. store them in a pandas DataFrame, and
c. save the data to a CSV file named "it_services_data.csv" in the current working directory.
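One thing worth guarding against: if the site rejects the default requests User-Agent, or the two lists come back with different lengths, this script will either parse an error page or crash in the DataFrame constructor. Here is a slightly more defensive sketch (the browser-like User-Agent string and the zip-based pairing are my own assumptions, not something the page is confirmed to require):

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = "https://clutch.co/il/it-services"

# Some sites reject the default requests User-Agent; sending a
# browser-like one is a common workaround (assumption on my part).
headers = {"User-Agent": "Mozilla/5.0 (compatible; scraping-demo)"}

response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page

soup = BeautifulSoup(response.content, "html.parser")
company_names = soup.find_all("h3", class_="company-name")
locations = soup.find_all("span", class_="locality")

# zip() pairs names with locations and stops at the shorter list,
# so mismatched counts don't crash the DataFrame constructor
rows = [
    (name.get_text(strip=True), loc.get_text(strip=True))
    for name, loc in zip(company_names, locations)
]
df = pd.DataFrame(rows, columns=["Company Name", "Location"])
df.to_csv("it_services_data.csv", index=False)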
I am wondering whether a pandas-based approach could be useful as well:
import pandas as pd

url = "https://clutch.co/il/it-services"

# Use pandas to read the HTML content and extract all tables from the webpage
tables = pd.read_html(url)

# Assuming the desired table is the first one on the page
table = tables[0]

# Keep only the columns we're interested in
# (assumes the table actually uses these header names)
df = table[["Company Name", "Location"]]

# Optional: perform further data processing or analysis on the DataFrame here

# Save the data to a CSV file
df.to_csv("it_services_data.csv", index=False)
In this approach, pandas' read_html() function reads the HTML content of the webpage and extracts every <table> element it finds, returning a list of DataFrames.
Assuming the desired table is the first one on the page, we assign it to the table variable.
Then we select the columns we're interested in to get the DataFrame we want.
Finally, we can perform further data processing or analysis if needed and save the data to a CSV file.
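Before trusting tables[0], though, it may be worth checking what read_html actually returned: the call raises a ValueError when the page contains no <table> elements at all, which is quite possible on a JavaScript-heavy listing page like this one. A small inspection sketch, with the column names still an assumption on my part:

import pandas as pd

url = "https://clutch.co/il/it-services"

try:
    # read_html fetches the page and returns a list of DataFrames,
    # one per <table> element it can parse
    tables = pd.read_html(url)
except ValueError:
    # raised when no tables are found - a bs4 fallback would be needed
    print("No HTML tables found on the page")
else:
    print(f"Found {len(tables)} table(s)")
    for i, t in enumerate(tables):
        print(i, list(t.columns))  # inspect the headers before picking one

    wanted = ["Company Name", "Location"]  # assumed header names
    table = tables[0]
    if all(col in table.columns for col in wanted):
        table[wanted].to_csv("it_services_data.csv", index=False)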