Ruby CLI application: scraping, object relationships and single source of truth
Steve Kim
Posted on October 26, 2020
There’s an exciting aspect to building a CLI application, or rather to the CLI in general. For an average user like myself, who hasn’t a clue about the inner workings of a computer and has been confined to the comfort of GUIs, even just typing commands into the Terminal can make you feel like some notorious hacker in a spy movie. So to successfully build an entire CLI-based application from scratch was a truly rewarding experience.
The Goal
I decided to build a very basic application for Premier League football (or soccer, in some countries) where it shows the current league standings, all the clubs in the league, and information for each individual club, and perhaps even stats for individual players (although I never got that far). It started off as a breeze. I thought I had a pretty good grasp of the concepts being dealt with, how I was going to obtain the data by scraping, and so on and so forth. To be fair, I had a great guide to follow, using a video demo that was done by my bootcamp instructor. All of it made so much sense when I was watching the demo, but as with most things in life, it’s one thing to watch someone do it and completely another to actually do it.
My first despair
Scraping consumed the bulk of my time. Not fully understanding how to effectively use Nokogiri’s methods was my downfall. I was fixated on chaining .css selectors, when I later discovered that I could have grabbed the same data much more easily by attaching ids and classes directly to tags in a single selector. For instance, a line in my scraper class that grabs a piece of data like so:
.css('.tableBodyContainer.isPL').css('tr:not(.expandable)').css('.long').text
could have just as easily accomplished the same thing using:
.search('tbody.tableBodyContainer.isPL span.long').text
My second despair
I knew two important rules about building relationships going into this project. One was that objects need to maintain a ‘single source of truth’ when building relationships across classes, and that this is done by making the object that “belongs to” another object responsible for holding the relationship. The other was that, once that was in place, I’d be expected to establish the remaining relationships only through methods. Simple enough, right? The only problem was that this seemed much easier in my head when the relationship was A -< B >- C, as opposed to what I actually had, which was A -< B -< C. So instead of B keeping track of both A and C, I needed B to be accountable for A and C to be accountable for B, then somehow build methods that would allow A to interact with C and vice versa. After building and re-building my classes over and over, and hours of rubber duck debugging, I got it done.
league = League.find_or_create_by_name(league_name)
new_club = Club.new(name, league, position, matches_played, matches_won, matches_drawn, matches_lost, goals_for, goals_against, goal_diff, points)
Player.new(new_club, player_number, player_name, player_position)
My Club class was keeping track of my League class and my Player class was keeping track of my Club class.
Then I went on to build methods in my League class that could communicate with my Player class, like so:
def clubs
  Club.all.select { |club| club.league == self }
end

def players
  Player.all.select { |player| self.clubs.include?(player.club) }
end
And then finally an instance method inside my Player class to access my League class:
def league
  League.all.find { |league| league.players.include?(self) }
end
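Put together, a minimal, self-contained sketch of the three classes (with the attribute lists trimmed for illustration, and class-level @@all arrays standing in for storage) shows how A can reach C through B while each belongs-to side holds the relationship exactly once:

```ruby
# Minimal sketch: the real classes take many more scraped values.
class League
  @@all = []
  attr_reader :name

  def self.all; @@all; end

  def self.find_or_create_by_name(name)
    @@all.find { |l| l.name == name } || new(name)
  end

  def initialize(name)
    @name = name
    @@all << self
  end

  # League knows its clubs only through a method, not stored state.
  def clubs
    Club.all.select { |club| club.league == self }
  end

  def players
    Player.all.select { |player| clubs.include?(player.club) }
  end
end

class Club
  @@all = []
  attr_reader :name, :league

  def self.all; @@all; end

  def initialize(name, league)
    @name, @league = name, league   # Club holds the belongs-to side
    @@all << self
  end
end

class Player
  @@all = []
  attr_reader :name, :club

  def self.all; @@all; end

  def initialize(club, name)
    @club, @name = club, name       # Player holds the belongs-to side
    @@all << self
  end

  # Player reaches League by traversing through Club.
  def league
    club.league
  end
end

epl = League.find_or_create_by_name('Premier League')
liverpool = Club.new('Liverpool', epl)
salah = Player.new(liverpool, 'Mohamed Salah')

puts epl.players.map(&:name).inspect  # => ["Mohamed Salah"]
puts salah.league.name                # => "Premier League"
```

Note that in this sketch Player#league simply traverses club.league, which is the cheapest way to cross the A -< B -< C chain; searching League.all works too, but the traversal avoids scanning every league.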
Final thoughts
There’s something I have yet to figure out: how to delay my deeper-level scrapes until they’re needed, instead of scraping all of my data up front when the application first runs. The scraping simply takes way too long at the moment. Ideally I would like to store the URL for each deeper scrape as an instance variable and pass it into a scraper method as needed, but this is proving a lot more difficult than I anticipated, primarily because of the way my second scrape is designed and the way the logic in my CLI class is currently built. Hopefully, as I dive deeper into programming and become more skillful, I’ll find a more elegant solution.
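One common pattern for this kind of deferral is memoization: store the detail-page URL on the object, and only perform the scrape on first access, caching the result. The sketch below is hypothetical — ClubScraper, scrape_squad, and detail_url are stand-ins for whatever the real scraper class and stored URL would be, with the network call stubbed out:

```ruby
# Hypothetical lazy-scraping sketch; the scrape itself is stubbed.
class ClubScraper
  def self.scrape_squad(url)
    # Imagine a Nokogiri scrape of `url` here.
    [{ name: 'Mohamed Salah', number: 11, position: 'Forward' }]
  end
end

class Club
  attr_reader :name, :detail_url

  def initialize(name, detail_url)
    @name = name
    @detail_url = detail_url   # stored now, scraped later
  end

  # ||= runs the scrape only once; later calls reuse the cached array.
  def squad
    @squad ||= ClubScraper.scrape_squad(detail_url)
  end
end

liverpool = Club.new('Liverpool', 'https://example.com/clubs/liverpool')
liverpool.squad  # scrape happens here, on demand
liverpool.squad  # cached; no second scrape
```

This keeps the first run of the CLI fast, since only the club a user actually asks about triggers the deeper scrape.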