Crawling Glassdoor

sizief

Ali Deishidi

Posted on April 28, 2019

So a week ago I read a post here claiming that Ruby would kill Python in the future. As always, there was plenty of debate in the comment section of that post. Someone pointed out that it does not matter who kills whom: every language or tool is suited to something, and you have to pick the right tool for the job.

I think that is the right answer. However, it is also important to consider what the market (or industry) thinks about tools and languages. Do employers want a Ruby programmer as much as they want a Python developer?

To find out, I created a small Ruby project which does these tasks:

  • Crawl Glassdoor pages for software job ads in a predefined list of cities
  • Store the pages
  • Build a tally of the number of occurrences of each keyword (such as Python or Ruby)
  • Generate yml and png files to visualize how much the industry needs each skill

Here is the combined output for all ten cities around the world. Remember, this is the number of ads that contain each keyword. For example, if an ad contains Java, it increases the total number of ads for that specific keyword.

[Chart: Software Languages]

And here is the number of mentions for each technology:

[Chart: Software Technologies]

So finally, which language is in more demand than the others?

The answer is easy: Java. But if you are looking for scripting languages, then the answer is Python, JavaScript and then Ruby. However, there are interesting findings when you compare results for individual cities.

For example, this is the output for Amsterdam:
[Chart: Amsterdam stats]
Notice anything unusual? Yes, Scala is in demand even more than Python or JavaScript!

See the rest of these reports here. They include charts for New York, Berlin, London, Toronto, Singapore, Dubai, Tallinn, etc.

About the project

The structure is simple. First, there is a configuration file in which you can define the cities, keywords, categories, etc.

urls:
  - Tallinn;https://www.glassdoor.ca/Job/tallinn;jobs
job_types:  
  - software
  - back-end
  - front-end
category:
  - languages
  - technologies
languages:
  - java
  - javascript 
  - c 
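Each `urls` entry appears to pack three fields into one semicolon-separated string. Here is a minimal sketch of how such a file could be read, assuming the fields are a city name, a base URL and a URL suffix (a guess from the sample entry). `CitySource`, `parse_url_entry` and `load_config` are hypothetical names, not taken from the project.

```ruby
require 'yaml'

# Hypothetical value object; the field names are assumptions based on
# the single sample entry shown in the config.
CitySource = Struct.new(:city, :base_url, :suffix)

# Split one "City;base_url;suffix" entry into its three parts.
def parse_url_entry(entry)
  city, base_url, suffix = entry.split(';', 3).map(&:strip)
  CitySource.new(city, base_url, suffix)
end

# Return [sources, keywords_by_category] for a config document.
# A category without its own keyword list maps to an empty array.
def load_config(yaml_text)
  config = YAML.safe_load(yaml_text)
  sources = config.fetch('urls').map { |entry| parse_url_entry(entry) }
  keywords = config.fetch('category').to_h do |name|
    [name, config.fetch(name, [])]
  end
  [sources, keywords]
end
```

Something like `sources, keywords = load_config(File.read('config.yml'))` would then give the crawler its start URLs and the counter its keyword lists. The file name here is also an assumption.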

Then, when you run client.rb, it takes the first URL from the configuration file, crawls the web page, saves all URL-specific parameters for each listing page, then gets the second page and repeats this until the last page.

After that, another class crawls the website again. This time it downloads the whole ad page and saves it to disk.

The third class then builds a tally of all predefined keywords by scanning every document saved in the previous step. The results are then saved as a yml file.
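The paging loop could look roughly like this. It is only a sketch: the post does not show how the listing pages are parsed, so a page is represented here by a hypothetical `Page` struct, and the fetcher is passed in so the loop can be tried without network access.

```ruby
# Hypothetical page shape: the real crawler would parse HTML. Here a
# page is already reduced to the ad links it lists and the URL of the
# following page (nil on the last page).
Page = Struct.new(:ad_urls, :next_url)

# Follow "next page" links from first_url until there are none left,
# collecting every ad URL along the way. `fetch` is any callable that
# takes a URL and returns a Page. max_pages guards against a site that
# keeps linking back to itself.
def collect_ad_urls(first_url, fetch, max_pages: 100)
  ad_urls = []
  url = first_url
  pages_seen = 0
  while url && pages_seen < max_pages
    page = fetch.call(url)
    ad_urls.concat(page.ad_urls)
    url = page.next_url
    pages_seen += 1
  end
  ad_urls.uniq
end
```

Injecting `fetch` keeps the loop independent of whichever HTTP client and HTML parser do the real work.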
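Saving the ads can be sketched in a few lines. The naming scheme (a hash of the URL) and the skip-if-present check are assumptions made for this example, not details from the project. `ad_file_name` and `store_ads` are hypothetical helpers.

```ruby
require 'digest'
require 'fileutils'

# Derive a stable, filesystem-safe file name from an ad URL.
def ad_file_name(url)
  "#{Digest::SHA256.hexdigest(url)[0, 16]}.html"
end

# Store the HTML of each ad under dir and return the paths written.
# Ads already on disk are skipped, so an interrupted run can resume.
# `fetch` is any callable that takes a URL and returns the page HTML.
def store_ads(ad_urls, dir, fetch)
  FileUtils.mkdir_p(dir)
  ad_urls.each_with_object([]) do |url, written|
    path = File.join(dir, ad_file_name(url))
    next if File.exist?(path)

    File.write(path, fetch.call(url))
    written << path
  end
end
```

Keeping the raw pages on disk means the keyword list can be changed later and the counts rebuilt without crawling again.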
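The counting step has one subtle point: keywords such as `c`, `c#`, `c++`, `java` and `javascript` overlap, so a plain substring search would overcount. Below is one possible way to count ads per keyword. It is not necessarily how the project does it, and `keyword_pattern` and `count_ads` are hypothetical names.

```ruby
# Build a case-insensitive pattern that matches keyword as a whole
# token. Plain \b does not work for names like "c#" or "c++", so the
# boundaries are spelled out with lookarounds instead.
def keyword_pattern(keyword)
  /(?<![a-z0-9+#])#{Regexp.escape(keyword)}(?![a-z0-9+#])/i
end

# Count, for each keyword, how many documents mention it at least once.
# Repeated mentions inside one ad still count as a single ad.
def count_ads(documents, keywords)
  keywords.to_h do |keyword|
    pattern = keyword_pattern(keyword)
    [keyword, documents.count { |text| text.match?(pattern) }]
  end
end
```

With this pattern, "JavaScript" does not count as a mention of Java, and "C++" does not count as a mention of C. It is still a heuristic: a name like "Objective-C" would be counted under `c`.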

Here is a sample of the output:

languages:
  java: 324
  javascript: 196
  c: 75
  c#: 140
  c++: 144
technologies:
  kafka: 41
  nosql: 60
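The result file has two levels: category, then keyword. Here is a minimal sketch of producing combined numbers from several per-city results of that shape, assuming the total is a plain sum (the post does not say how the ten-city figures are computed). `merge_counts` is a hypothetical helper.

```ruby
require 'yaml'

# Sum per-city results of the form { category => { keyword => count } }
# into one hash of the same shape.
def merge_counts(per_city)
  per_city.each_with_object({}) do |city_counts, total|
    city_counts.each do |category, counts|
      bucket = (total[category] ||= {})
      counts.each do |keyword, n|
        bucket[keyword] = bucket.fetch(keyword, 0) + n
      end
    end
  end
end

# Serialize a result hash in the same nested layout as the sample.
def to_result_yaml(counts)
  counts.to_yaml
end
```

The per-city hashes would come from loading each city's yml file, and the merged hash can be written back out with `File.write` and `to_result_yaml`.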

At the end, with the help of the Gruff gem, we generate images from the YAML files.
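Drawing one category as a bar chart with Gruff could be sketched like this. The sorting, the chart title and the single `ads` series are choices made for this example. `chart_series` and `write_chart` are hypothetical helpers, and the project's own charts may be built differently.

```ruby
# Turn { keyword => count } into the pieces a bar chart needs: an
# index => label hash for the x axis and the matching list of values,
# ordered from most to least mentioned.
def chart_series(counts)
  sorted = counts.sort_by { |keyword, n| [-n, keyword] }
  labels = sorted.each_with_index.to_h { |(keyword, _), i| [i, keyword] }
  [labels, sorted.map { |_, n| n }]
end

# Render one category as a PNG. This needs the gruff gem and its image
# backend installed, so it is only required when a chart is drawn.
def write_chart(title, counts, path)
  require 'gruff'
  labels, values = chart_series(counts)
  chart = Gruff::Bar.new
  chart.title = title
  chart.labels = labels
  chart.data('ads', values)
  chart.minimum_value = 0
  chart.write(path)
end
```

A call such as `write_chart('Software Languages', results['languages'], 'languages.png')` would then produce one image per category, where `results` is the hash loaded from a yml file.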

Side notes

  • This could be helpful if you are investigating your next career path or your next language to learn. Nothing more serious than that.
  • The project is highly configurable. Just update the config file: add the cities you want, the first URL for each, the keywords you are looking for and the categories. Then run it (wait a few minutes for it to get all the data) and check the output in the result folder. link to project
  • Have fun!