Web Scraping Script in Ruby

I was working on a project where I had to scrape a web page, so I looked into the options and found Nokogiri.
Nokogiri is an HTML, XML, SAX, and Reader parser. Among its many features is the ability to search documents via XPath or CSS3 selectors.
To fetch the document I used HTTParty.
HTTParty describes itself as a gem that "makes http fun" and makes consuming RESTful web services dead easy.
For this example, I will be scraping https://rubygems.org/search?query=%s, where %s stands for the search term.
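As a small sketch of how the search term can be placed into that URL safely, the snippet below fills the %s placeholder and URL-encodes the term first. The constant and the helper name search_url are my own, chosen for illustration; it uses only Ruby's standard library.

```ruby
require 'uri'

# The search URL from this post, with %s as the placeholder for the query.
SEARCH_URL_TEMPLATE = 'https://rubygems.org/search?query=%s'

# Hypothetical helper: encodes the term so spaces and symbols do not break the URL.
def search_url(query)
  format(SEARCH_URL_TEMPLATE, URI.encode_www_form_component(query))
end

puts search_url('yiya')
puts search_url('rails admin')
```

Encoding matters as soon as a search term contains a space or a character such as & that has a special meaning in a query string.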

Script

The final script is given below:

require 'httparty' # gem file names are lowercase; 'HTTParty' fails on case-sensitive file systems
require 'nokogiri'

class RubygemsScrapper
  attr_accessor :parse_page

  # Fetch the rubygems.org search results for the query string q and parse them
  def initialize(q)
    # Passing the term via :query lets HTTParty URL-encode it
    response = HTTParty.get('https://rubygems.org/search', query: { query: q })
    @parse_page = Nokogiri::HTML(response.body)
  end

  # Return the first result's version, or -1 if nothing was found
  def get_latest_version
    parse_page.css('.gems__gem').css('.gems__gem__version').children[0].text
  rescue StandardError
    -1 # not found
  end

  # Return the first result's link on rubygems.org, or -1 if nothing was found
  def get_link
    "https://rubygems.org" + parse_page.css('.gems__gem').attribute('href').value
  rescue StandardError
    -1 # not found
  end
end

# Calling the scraper (outside the class body)
scrapper = RubygemsScrapper.new('yiya')
p scrapper.get_latest_version
p scrapper.get_link

This class takes the name of the gem to search for and returns the latest version of the first result, along with a link to it.

Sulman Baig