Using Google Jobs Listing Results API from SerpApi


Artur Chukhrai

Posted on November 15, 2022


What will be scraped


📌 Note: Some queries may not display all sections. You can check your query in the playground.

Why use the API?

  • No need to create a parser from scratch and maintain it.
  • No need to bypass blocks from Google by solving CAPTCHAs or IP blocks.
  • No need to pay for proxies and CAPTCHA solvers.
  • No need to use browser automation.

SerpApi handles everything on the backend, with fast response times of under ~2.5 seconds (~1.2 seconds with Ludicrous speed) per request and no browser automation, which makes scraping much faster. Response times and success rates are shown on the SerpApi Status page.


Full Code

If you don't need an explanation, have a look at the full code example in the online IDE.

from serpapi import GoogleSearch
import os, json


def extract_multiple_jobs():
    params = {
        # https://docs.python.org/3/library/os.html#os.getenv
        'api_key': os.getenv('API_KEY'),        # your SerpApi API key
        'engine': 'google_jobs',                # SerpApi search engine 
        'gl': 'us',                             # country of the search
        'hl': 'en',                             # language of the search
        'q': 'barista new york',                # search query
    }

    search = GoogleSearch(params)               # where data extraction happens on the SerpApi backend
    results = search.get_dict()                 # JSON -> Python dict

    return [job.get('job_id') for job in results['jobs_results']]


def scrape_google_jobs_listing(job_ids):
    data = []

    for job_id in job_ids:
        params = {
            # https://docs.python.org/3/library/os.html#os.getenv
            'api_key': os.getenv('API_KEY'),    # your SerpApi API key
            'engine': 'google_jobs_listing',    # SerpApi search engine 
            'q': job_id,                        # search query (job_id)
        }

        search = GoogleSearch(params)           # where data extraction happens on the SerpApi backend
        results = search.get_dict()             # JSON -> Python dict

        data.append({
            'job_id': job_id,
            'apply_options': results.get('apply_options'),
            'salaries': results.get('salaries'),
            'ratings': results.get('ratings')
        })

    return data


def main():
    job_ids = extract_multiple_jobs()
    google_jobs_listing_results = scrape_google_jobs_listing(job_ids)

    print(json.dumps(google_jobs_listing_results, indent=2, ensure_ascii=False))


if __name__ == '__main__':
    main()

Preparation

Install library:

pip install google-search-results

google-search-results is a SerpApi API package.
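The scripts below read your key from the API_KEY environment variable via os.getenv. As a quick sanity check (a minimal sketch, not part of the original post; the key value is a placeholder), you can set and read it in-process:

```python
import os

# The scraping code reads the key via os.getenv('API_KEY').
# Placeholder value; use your real SerpApi key.
os.environ['API_KEY'] = 'your-serpapi-api-key'

print(os.getenv('API_KEY'))  # → your-serpapi-api-key
```

In practice, export API_KEY in your shell or environment instead of hardcoding it in the script.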

Code Explanation

Import libraries:

from serpapi import GoogleSearch
import os, json
Library purpose:

  • GoogleSearch: to scrape and parse Google results using the SerpApi web scraping library.
  • os: to read the environment variable (SerpApi API key) value.
  • json: to convert the extracted data to a JSON string.

Top-level code environment

The extract_multiple_jobs() function is called to get all the job_id values. The resulting list of job_ids is passed to the scrape_google_jobs_listing(job_ids) function to retrieve the required data. These functions are explained under the corresponding headings below.

This code uses the generally accepted __name__ == "__main__" idiom:

def main():
    job_ids = extract_multiple_jobs()
    google_jobs_listing_results = scrape_google_jobs_listing(job_ids)

    print(json.dumps(google_jobs_listing_results, indent=2, ensure_ascii=False))


if __name__ == '__main__':
    main()

This check passes only when the user runs this file directly. If the user imports this file into another module, the check fails and main() is not called.
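A minimal standalone sketch of this behavior (the function body here is made up for illustration):

```python
# When a file is executed directly, Python sets __name__ to '__main__';
# when the same file is imported, __name__ is the module's own name.

def main():
    return 'main() was called'

if __name__ == '__main__':
    # Runs only for `python this_file.py`, not for `import this_file`.
    print(main())
```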

You can watch the video Python Tutorial: if __name__ == '__main__' for more details.

Extract Multiple Jobs

The function returns a list of job_id values. The value of this identifier will be used in the next function to create the request.

This function provides a code snippet for getting data from the first page. If you want to extract data using pagination, you can see it in the Scrape Google Jobs organic results with Python blog post.

At the beginning of the function, parameters are defined for generating the URL. If you want to pass other parameters to the URL, you can do so using the params dictionary.

params = {
    # https://docs.python.org/3/library/os.html#os.getenv
    'api_key': os.getenv('API_KEY'),        # your SerpApi API key
    'engine': 'google_jobs',                # SerpApi search engine 
    'gl': 'us',                             # country of the search
    'hl': 'en',                             # language of the search
    'q': 'barista new york',                # search query
}
Parameters explanation:

  • api_key: defines the SerpApi private key to use.
  • engine: set to google_jobs to use the Google Jobs API engine.
  • gl: defines the country to use for the Google search. It's a two-letter country code (e.g., us for the United States, uk for the United Kingdom, or fr for France). Head to the Google countries page for a full list of supported Google countries.
  • hl: defines the language to use for the Google Jobs search. It's a two-letter language code (e.g., en for English, es for Spanish, or fr for French). Head to the Google languages page for a full list of supported Google languages.
  • q: defines the query you want to search.
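The same dictionary can carry optional parameters. As a sketch, here is how a location parameter could be added; location appears in the SerpApi docs, but verify any parameter you add against the documentation for this engine:

```python
import os

# Sketch of extending the params dictionary with an optional parameter.
params = {
    'api_key': os.getenv('API_KEY'),   # your SerpApi API key
    'engine': 'google_jobs',
    'gl': 'us',
    'hl': 'en',
    'q': 'barista new york',
    'location': 'New York, United States',  # where the search originates from
}
```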

Then, we create a search object where the data is retrieved from the SerpApi backend. The get_dict() method converts the JSON response into a Python dictionary:

search = GoogleSearch(params)   # where data extraction happens on the SerpApi backend
results = search.get_dict()     # JSON -> Python dict

A compiled list of all job_id values is returned using a list comprehension:

return [job.get('job_id') for job in results['jobs_results']]
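With sample data, the comprehension behaves like this explicit loop (the jobs_results values here are made up for illustration):

```python
# Hypothetical sample of what results['jobs_results'] may look like.
jobs_results = [
    {'title': 'Barista', 'job_id': 'abc123'},
    {'title': 'Cafe Barista', 'job_id': 'def456'},
    {'title': 'Barista (missing id)'},      # no 'job_id' key
]

# Equivalent loop form of: [job.get('job_id') for job in jobs_results]
job_ids = []
for job in jobs_results:
    job_ids.append(job.get('job_id'))       # .get() yields None if the key is absent

print(job_ids)  # → ['abc123', 'def456', None]
```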

The function looks like this:

def extract_multiple_jobs():
    params = {
        # https://docs.python.org/3/library/os.html#os.getenv
        'api_key': os.getenv('API_KEY'),        # your SerpApi API key
        'engine': 'google_jobs',                # SerpApi search engine 
        'gl': 'us',                             # country of the search
        'hl': 'en',                             # language of the search
        'q': 'barista new york',                # search query
    }

    search = GoogleSearch(params)               # where data extraction happens on the SerpApi backend
    results = search.get_dict()                 # JSON -> Python dict

    return [job.get('job_id') for job in results['jobs_results']]

Scrape Google Jobs Listing

This function takes the job_ids list and returns a list of dictionaries with the extracted data.

Declaring the data list where the extracted data will be added:

data = []

For each job_id value in the job_ids list, separate requests will be made and the corresponding data will be retrieved:

for job_id in job_ids:
    # data extraction will be here

Next, we define the parameters for making a request:

params = {
    # https://docs.python.org/3/library/os.html#os.getenv
    'api_key': os.getenv('API_KEY'),    # your SerpApi API key
    'engine': 'google_jobs_listing',    # SerpApi search engine 
    'q': job_id,                        # search query (job_id)
}
Parameters explanation:

  • api_key: defines the SerpApi private key to use.
  • engine: set to google_jobs_listing to use the Google Jobs Listing API engine.
  • q: defines the job_id string, which can be obtained from the Google Jobs API.

Then, we create a search object where the data is retrieved from the SerpApi backend. The get_dict() method converts the JSON response into a Python dictionary:

search = GoogleSearch(params)   # where data extraction happens on the SerpApi backend
results = search.get_dict()     # JSON -> Python dict

We can then build a dictionary from the job_id, apply_options, salaries, and ratings values, with the extracted data written under the corresponding keys. The dictionary is then appended to the data list:

data.append({
    'job_id': job_id,
    'apply_options': results.get('apply_options'),
    'salaries': results.get('salaries'),
    'ratings': results.get('ratings')
})

At the end of the function, the data list is returned with the retrieved data for each job_id:

return data

The complete function to scrape all data would look like this:

def scrape_google_jobs_listing(job_ids):
    data = []

    for job_id in job_ids:
        params = {
            # https://docs.python.org/3/library/os.html#os.getenv
            'api_key': os.getenv('API_KEY'),    # your SerpApi API key
            'engine': 'google_jobs_listing',    # SerpApi search engine 
            'q': job_id,                        # search query (job_id)
        }

        search = GoogleSearch(params)           # where data extraction happens on the SerpApi backend
        results = search.get_dict()             # JSON -> Python dict

        data.append({
            'job_id': job_id,
            'apply_options': results.get('apply_options'),
            'salaries': results.get('salaries'),
            'ratings': results.get('ratings')
        })

    return data
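If you would rather persist the results than print them, a small sketch (not part of the original post; the filename is arbitrary) can write the returned list to a JSON file with the same serialization options used for printing:

```python
import json

def save_to_json(data, filename='google_jobs_listing.json'):
    # Serialize the list of dicts the same way main() prints it.
    with open(filename, 'w', encoding='utf-8') as f:
        json.dump(data, f, indent=2, ensure_ascii=False)

# Usage with a made-up record in place of real scraped data:
save_to_json([{'job_id': 'abc123', 'apply_options': [], 'salaries': None, 'ratings': None}])
```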

Output

[
  {
    "job_id": "eyJqb2JfdGl0bGUiOiJCYXJpc3RhIiwiaHRpZG9jaWQiOiJuc3Y1d1hyNXdFOEFBQUFBQUFBQUFBPT0iLCJnbCI6InVzIiwiaGwiOiJlbiIsImZjIjoiRXVJQkNxSUJRVUYwVm14aVFtcFdYMjl0V0ZadU9USTNWV0ZZUlZRek9XRTJPVlJtYUc1RVZtaGpaRk5WT1VFMlNYZFpaR2ROU0dzdFoyMVBkMmxmUTNKS2RUQnJjMWxFT0dZNFNHWnFXRUZNTjB4eFRWVmtMV1JRVVRWaVJGbFVSMVo1YmxsVWVuazVPRzlxVVVsTmVXcFJjRXhPVWpWbWMwdFlTMlo2V21SUU1XSkZZa2hTY2pKaGRYcEdlRzVxTVVWNGIwZ3lhVXd3UlZGVVZ6Tk5XSGRNYXpKbVYyVjNFaGQzYkhCeFdTMWZUMHhNTW01d2RGRlFNRGhwUW05QmF4b2lRVVJWZVVWSFpqSTJWMjF3TjBoU2FtNDRPSHB5WkVWTldVMVhVWGRTU1hwMVFRIiwiZmN2IjoiMyIsImZjX2lkIjoiZmNfMSIsImFwcGx5X2xpbmsiOnsidGl0bGUiOiIubkZnMmVie2ZvbnQtd2VpZ2h0OjUwMH0uQmk2RGRje2ZvbnQtd2VpZ2h0OjUwMH1BcHBseSBkaXJlY3RseSBvbiBDdWxpbmFyeSBBZ2VudHMiLCJsaW5rIjoiaHR0cHM6Ly9jdWxpbmFyeWFnZW50cy5jb20vam9icy80MTc4NjMtQmFyaXN0YT91dG1fY2FtcGFpZ249Z29vZ2xlX2pvYnNfYXBwbHlcdTAwMjZ1dG1fc291cmNlPWdvb2dsZV9qb2JzX2FwcGx5XHUwMDI2dXRtX21lZGl1bT1vcmdhbmljIn19",
    "apply_options": [
      {
        "title": "Apply on Trabajo.org",
        "link": "https://us.trabajo.org/job-1683-20221107-34e191c4eb8c8ca3ec69adfa55061df2?utm_campaign=google_jobs_apply&utm_source=google_jobs_apply&utm_medium=organic"
      },
      {
        "title": "Apply on Jobs",
        "link": "https://us.fidanto.com/jobs/job-opening/nov-2022/barista-1432712052?utm_campaign=google_jobs_apply&utm_source=google_jobs_apply&utm_medium=organic"
      },
      {
        "title": "Apply on Craigslist",
        "link": "https://newyork.craigslist.org/mnh/fbh/d/new-york-cafe-barista/7553733276.html?utm_campaign=google_jobs_apply&utm_source=google_jobs_apply&utm_medium=organic"
      },
      {
        "title": "Apply directly on Culinary Agents",
        "link": "https://culinaryagents.com/jobs/417863-Barista?utm_campaign=google_jobs_apply&utm_source=google_jobs_apply&utm_medium=organic"
      }
    ],
    "salaries": null,
    "ratings": null
  },
  ... other results
]

Join us on Twitter | YouTube

Add a Feature Request💫 or a Bug🐞
