How JavaScript helped me buy my dream boat

moozzyk

Pawel Kadluczka

Posted on April 7, 2024

Programming is a superpower. With just a little imagination, programmers can do things that would be expensive or inaccessible to others. I had used it to build more or less useful things and had a chance to use it again recently when my family and I decided to get a boat. As this was our first boat, we had no experience buying one and didn't know the market. To make up for this, I turned to software.

The idea

Even though I had rented a boat a few times, I had no idea how to buy one. I decided that the best way to learn would be to look at many boats to see what's available and at what price.
Finding different dealerships and checking their inventory was one option. I dismissed it almost immediately because it was time-consuming and tedious work, the inventory was limited, and the prices seemed inflated. Craigslist seemed like a much better option. People and dealerships post their ads daily, so there is a continuous supply of diverse inventory. The downside was that many posts were low-quality or uninteresting from my perspective.
So, I came up with this idea: build an application that pulls Craigslist posts, filters out ones that are clearly uninteresting, and allows curating the remaining ones. Curated posts shouldn't show up again, even if reposted or updated. Downloaded posts should be accessible even if the original post on Craigslist was deleted. With an application like this, I could see similar boats, learn about different models and equipment, and compare prices.
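The "filter out clearly uninteresting posts" step could look roughly like the sketch below. The post shape, the price limits, and the blocked words are all assumptions made for illustration; they are not taken from the actual project.

```typescript
// Hypothetical shape of a scraped post; the field names are assumptions.
interface Post {
  id: string;
  title: string;
  price: number | null; // null when the ad lists no price
}

// Hypothetical filter settings, not taken from the original project.
interface FilterRules {
  minPrice: number;
  maxPrice: number;
  blockedWords: string[]; // lower-case words that mark an ad as uninteresting
}

// Returns true when a post passes every rule and is worth a manual look.
function isInteresting(post: Post, rules: FilterRules): boolean {
  if (post.price === null) {
    return false; // ads without a price cannot be compared
  }
  if (post.price < rules.minPrice || post.price > rules.maxPrice) {
    return false;
  }
  const title = post.title.toLowerCase();
  return !rules.blockedWords.some((word) => title.includes(word));
}

// Keeps only the posts that should be shown for curation.
function filterPosts(posts: Post[], rules: FilterRules): Post[] {
  return posts.filter((post) => isInteresting(post, rules));
}
```

Keeping the rules in a plain object makes it easy to tighten or relax them as you learn what a reasonable price for a given kind of boat is.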

Technical overview

Once I had a general idea of what to build, I started thinking about the implementation. I needed to pull posts on a regular cadence - e.g., daily. This could be easily implemented by using cron. It required a server to run, but fortunately, I already had a Raspberry Pi that powers a few small tasks around my house, like allowing me to open my garage door from my phone. The same Raspberry Pi could host my new application.
I needed a database to store posts and curation results. At first, I considered MySQL or PostgreSQL, but given how simple my requirements were, I realized that SQLite would be more than enough.
I decided to run my application in a Docker container to make it easy to manage. I separated the data from the application by storing the SQLite DB file on a separate Docker volume. This way, I could easily maintain historical data when updating the application: I would simply spin up a container with the new version and mount the volume with the DB file.
Here is a diagram depicting the architecture of my solution:

System Architecture

I implemented the script to pull the posts and the application using TypeScript and used Node.js to run them.

Nitty-gritty details

Pulling the posts

Pulling the posts turned out to be more difficult than I anticipated. I knew Craigslist killed the API long ago, but I thought the posts could be fetched by sending a simple HTTP request. At first, I tried using the python-craigslist wrapper, but I couldn't make it work. After some investigation, it turned out that getting the post gallery with an HTTP request wouldn't work. The gallery worked by downloading a few JavaScript files that fetched additional information to dynamically build the DOM. As I didn't want to give up on my idea, I figured I could use Puppeteer (which drives a headless Chrome browser) to download the post gallery (individual posts could still be downloaded with fetch). This got the job done but required writing more code than I had planned for. The Puppeteer-based solution was also slow (especially on an older Raspberry Pi). It didn't matter too much, though - the script that used it was executed by cron every night at 2 a.m. and ran in the background for at most a couple of minutes.
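To illustrate the simpler half of this, here is a minimal sketch of downloading individual posts with a fetch-like function and keeping a copy of the HTML so a post stays readable after it is deleted. It does not cover the Puppeteer part, and the names `archivePosts`, `ArchivedPost`, and `HttpGet` are made up for this example rather than taken from my npm package.

```typescript
// Minimal structural type covering the parts of fetch used below.
// The name HttpGet is hypothetical.
type HttpGet = (url: string) => Promise<{
  ok: boolean;
  status: number;
  text(): Promise<string>;
}>;

// Hypothetical record for a locally stored copy of a post.
interface ArchivedPost {
  url: string;
  html: string;
  fetchedAt: string; // ISO 8601 timestamp
}

// Downloads posts one at a time and returns the ones that could be fetched.
// Posts that fail (deleted, network error) are skipped, not fatal.
async function archivePosts(
  urls: string[],
  httpGet: HttpGet
): Promise<ArchivedPost[]> {
  const archived: ArchivedPost[] = [];
  for (const url of urls) {
    try {
      const response = await httpGet(url);
      if (!response.ok) {
        continue; // e.g. the post was removed since the gallery was read
      }
      archived.push({
        url,
        html: await response.text(),
        fetchedAt: new Date().toISOString(),
      });
    } catch {
      continue; // a single failed request should not stop the nightly run
    }
  }
  return archived;
}
```

On Node.js 18 or later, the built-in `fetch` can be passed as `httpGet`. Taking the function as a parameter also makes the script easy to test without touching the network.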

While using Puppeteer on my MacBook just worked, making it work on a Raspberry Pi took some effort. I wrote a separate post about this because it is a longer story.

I thought more people might be interested in accessing Craigslist posts programmatically, so I made my solution open-source and published it as an npm package.

Note: I checked recently, and it is now possible to get the Craigslist post gallery again with a simple HTTP request. From that perspective, using Puppeteer is no longer needed. The good news is that the document structure didn't change, and my npm package continues to work.

Post curation

Web application

I needed to build an application to present results and allow manual curation of posts. I used the Express framework to do this. Even though it was the first time I had ever used it, the online tutorials and resources made it easy. To say that the user interface was simple would be an understatement - it was bare. Nevertheless, it had all the functionality I needed. Here is how it looks:

User interface

  1. No https, as there is no need for it. The application only runs on my local network.
  2. The application is running on my Raspberry Pi.
  3. The number of reposts and updates.
  4. Asking price.
  5. Price range.
  6. The picture of the boat. It lets me quickly tell whether this is the kind of boat I want. The image is also a link to the Craigslist post.
  7. The curation button. Clicking it will hide the post and add it to the list of uninteresting posts.
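The bookkeeping behind the repost counter and the curation button can be modeled roughly as follows. This is an in-memory sketch only: the real application keeps its data in SQLite, and the class, the field names, the idea of a stable `listingKey` that links reposts to the original ad, and the reading of "price range" as the lowest and highest price seen are all assumptions.

```typescript
// One sighting of an ad during a nightly pull (hypothetical shape).
interface Sighting {
  listingKey: string; // assumed stable key shared by reposts of the same ad
  price: number;
}

// What is remembered about a listing across pulls (hypothetical shape).
interface TrackedListing {
  listingKey: string;
  timesSeen: number; // first sighting plus every repost or update
  minPrice: number;
  maxPrice: number;
  latestPrice: number;
  hidden: boolean; // true once the curation button was clicked
}

class CurationStore {
  private listings = new Map<string, TrackedListing>();

  // Records a sighting; a repost updates the existing entry, so a hidden
  // listing stays hidden.
  record(sighting: Sighting): void {
    const existing = this.listings.get(sighting.listingKey);
    if (existing === undefined) {
      this.listings.set(sighting.listingKey, {
        listingKey: sighting.listingKey,
        timesSeen: 1,
        minPrice: sighting.price,
        maxPrice: sighting.price,
        latestPrice: sighting.price,
        hidden: false,
      });
      return;
    }
    existing.timesSeen += 1;
    existing.minPrice = Math.min(existing.minPrice, sighting.price);
    existing.maxPrice = Math.max(existing.maxPrice, sighting.price);
    existing.latestPrice = sighting.price;
  }

  // Marks a listing as uninteresting; returns false for an unknown key.
  hide(listingKey: string): boolean {
    const listing = this.listings.get(listingKey);
    if (listing === undefined) {
      return false;
    }
    listing.hidden = true;
    return true;
  }

  get(listingKey: string): TrackedListing | undefined {
    return this.listings.get(listingKey);
  }

  // Listings that should still be shown in the user interface.
  visible(): TrackedListing[] {
    return Array.from(this.listings.values()).filter((l) => !l.hidden);
  }
}
```

Because hiding is stored on the listing and not on an individual sighting, a reposted or updated ad only bumps the counters and never reappears in the list.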

With this setup, I believed it was only a matter of time before I spotted the right opportunity. And sure enough, after a few months, I found a boat I loved and eventually bought.

Boat

Learnings

In my day job, I work on backend systems that process large volumes of data. I spend a lot of time on system design, code mostly in C++, and ensure our services run smoothly. This side project allowed me to learn about technologies I had rarely, if ever, used. I shared what I learned along the way. Here is a list:

Finally, the code that ties everything together: https://github.com/moozzyk/boat-scraper


💙 If you liked this article...

I publish a weekly newsletter for software engineers who want to grow their careers. I share mistakes I’ve made and lessons I’ve learned over the past 20 years as a software engineer.

Sign up here to get my articles delivered to your inbox.

https://www.growingdev.net/
