Dev Diary Week 4 - Saving to SQLite
Angelo Stavrow
Posted on May 15, 2020
This post is an entry in a weekly development diary on building a feed-aggregator-based blog on Glitch. Only a small part of building an app is code; the bulk of your time is spent planning, experimenting, making mistakes, getting frustrated, and making progress in baby steps.
The plan
Last time, I set up an OPML file with some default feeds that get passed to the feedparsing API. This is a good start, but we don't want the site to have to request feeds every time it's loaded — that takes time, and slow sites are a terrible user experience. It's time to add a database!
As I mentioned, storing the feed content in a database saves us from having to send packets back and forth across the internet to get content every time the site loads. There are a couple of practical considerations here:
- How often should we check for new content?
- How do we deal with updates to content that's already in the database?
The first question is fairly straightforward. We'll want to track the last time we checked for new content on each feed and, if it's been more than a certain amount of time, we can check again for new content. The actual time interval here doesn't matter; what does matter is knowing that we'll be tracking this.
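In code, that check could be as simple as comparing a stored timestamp against the current time. Here's a minimal sketch; the interval and the function name are placeholders, not decisions:

```js
// Placeholder interval — the exact value doesn't matter yet, only that we track it.
const REFRESH_INTERVAL_SECONDS = 60 * 60;

// Returns true if the feed's lastFetched timestamp (Unix time, in seconds)
// is older than the refresh interval.
function feedIsStale(lastFetched) {
  const now = Math.floor(Date.now() / 1000);
  return now - lastFetched > REFRESH_INTERVAL_SECONDS;
}
```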
The second question is the kind of thing that sounds like it could become very complex, but should be straightforward because of the project scope I defined in week 2. Specifically, we'll be generating a linklog-style blog that only shares a title, a link back to the source content, and maybe a summary if we can get one from the feed. As such, we don't have to worry too much about how to handle the content of the post changing, though this doesn't help in the case of "reruns" — something all feed-reading apps have to deal with.
So, this week, I'm planning on doing the following:
- Determining what the database structure should look like
- Creating a way to check when the feed's database entry was last updated
- Creating a way to add feed items to the database
This feels a bit ambitious, as I haven't really worked with SQLite before, but we'll see how it goes!
I think we can start with two tables here: a feeds table, which will store metadata about each feed, and an entries table, which will store entries from the various feeds. The theoretical limit for the number of rows in any SQLite table is 2⁶⁴, though that will realistically be constrained by the 140TB maximum database size — still plenty of room for storing the data we're interested in.
The feeds table will start off with the following columns:

- id (INTEGER): the unique ID for the row, autoincrementing
- title (TEXT): the feed title
- feedUrl (TEXT): the URL of the feed
- lastFetched (INTEGER): the timestamp when we last checked the feed (as Unix time)

The entries table will start off with the following columns:

- id (INTEGER): the unique ID for the row, autoincrementing
- guid (TEXT): the feed GUID for the entry
- title (TEXT): the entry title
- entryUrl (TEXT): the URL of the entry
- publishDate (INTEGER): the date and time when the entry was published (as Unix time)
- feedId: a foreign key matching this entry to the correct feed in the feeds table
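As a sketch, here's what those two tables could look like as the SQL we'd hand to the sqlite3 module. The column names follow the lists above; the constraints (primary keys, the foreign key, NOT NULL and UNIQUE on the feed URL) and the database file path are my assumptions rather than settled decisions:

```js
const sqlite3 = require("sqlite3").verbose();
const db = new sqlite3.Database("./.data/feedblog.db"); // path is an assumption

// The two tables described above; constraints are guesses at this point.
const schemaSql = `
  CREATE TABLE IF NOT EXISTS feeds (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    title       TEXT,
    feedUrl     TEXT NOT NULL UNIQUE,
    lastFetched INTEGER
  );

  CREATE TABLE IF NOT EXISTS entries (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    guid        TEXT,
    title       TEXT,
    entryUrl    TEXT,
    publishDate INTEGER,
    feedId      INTEGER,
    FOREIGN KEY (feedId) REFERENCES feeds(id)
  );
`;

db.exec(schemaSql);
```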
How did it go?
All the planning and research work took a lot more time than I expected! Implementing persistent storage is more complex than the work I've been doing so far, and while I'm fairly comfortable working in MySQL or T-SQL, I've never worked with SQLite before, so I needed to do more digging to understand its limitations and built-in types, for example.
I added a storageController.js file that has some methods for initializing and updating the database. These still need some work, but basically, here's how storageController works:
- When the app loads, it calls initialize(), which checks to see if the database file already exists. If it does, great! It will just output the number of feeds and entries to the server's console. If it doesn't, it'll create it based on a given schema.
- When the database needs to be updated, it calls insertNewestEntries() and passes in a feedObject from feedparser.js. It then follows one of two paths:
  - If the feed already exists in the database, it adds any new entries (according to what the latest publishDate is in the feed's stored articles).
  - If it doesn't, it creates the record for the feed's metadata in the feeds table, and then inserts as many articles as it has in the entries table.
There are a few helper functions in the module as well, for things like getting the ID of a feed, and getting the last published date for a given feed.
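The real storageController.js is still in flux, but as a rough approximation of the flow above — the function names come from this post, everything else (SQL, the shape of feedObject, the helpers inlined as direct queries) is assumed, and schemaSql is the schema string from the earlier sketch — it looks something like this:

```js
// storageController.js (sketch only, not the actual module)
const fs = require("fs");
const sqlite3 = require("sqlite3").verbose();

const dbFile = "./.data/feedblog.db"; // assumed location
let db;

// Open the database, creating it from the schema if it doesn't exist yet.
function initialize() {
  const dbExists = fs.existsSync(dbFile);
  db = new sqlite3.Database(dbFile);

  if (dbExists) {
    // Database is already there: just report how much is in it.
    db.get(
      "SELECT (SELECT COUNT(*) FROM feeds) AS feeds, (SELECT COUNT(*) FROM entries) AS entries",
      (err, row) => {
        if (!err) console.log(`Database has ${row.feeds} feeds and ${row.entries} entries.`);
      }
    );
  } else {
    db.exec(schemaSql); // the CREATE TABLE statements from the earlier sketch
  }
}

// Add whatever is new in a parsed feed to the database.
function insertNewestEntries(feedObject) {
  db.get("SELECT id FROM feeds WHERE feedUrl = ?", [feedObject.feedUrl], (err, feed) => {
    if (feed) {
      // Feed already exists: only insert items newer than the latest stored publishDate.
      db.get(
        "SELECT MAX(publishDate) AS latest FROM entries WHERE feedId = ?",
        [feed.id],
        (err, row) => {
          feedObject.items
            .filter((item) => item.publishDate > (row.latest || 0))
            .forEach((item) => insertEntry(feed.id, item));
        }
      );
    } else {
      // New feed: create its metadata record, then insert every item it has.
      db.run(
        "INSERT INTO feeds (title, feedUrl) VALUES (?, ?)",
        [feedObject.title, feedObject.feedUrl],
        function (err) {
          if (err) return console.error(err);
          feedObject.items.forEach((item) => insertEntry(this.lastID, item));
        }
      );
    }
  });
}

function insertEntry(feedId, item) {
  db.run(
    "INSERT INTO entries (guid, title, entryUrl, publishDate, feedId) VALUES (?, ?, ?, ?, ?)",
    [item.guid, item.title, item.entryUrl, item.publishDate, feedId]
  );
}

module.exports = { initialize, insertNewestEntries };
```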
What went well?
We're using the sqlite3 npm module and it's really quite a pleasure to work with. I had some example code to look at from the original starter app, which was very helpful.
What did I have trouble with?
Learning the ins and outs of a new database engine can take a while! There were a lot of new concepts to wrap my head around and it took me a lot longer than I thought to get this work done — I didn't post an update last week because of it! This is why it's important to build some flexibility into your estimates.
What did I learn?
I learned a whole lot about using SQLite! While I'm pretty comfortable writing SQL, I'd never used this particular database engine. Luckily, once things are set up, writing queries is pretty similar to MySQL or T-SQL.
What will I work on next week?
I still have some work to do here! For example, when inserting the newest entries into the database, the code doesn't update the feed's lastFetched property. We'll need that later.
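That fix is probably just a small UPDATE once the inserts succeed, something along these lines (not written yet, so take it as a sketch):

```js
// Record that we just fetched this feed (lastFetched as Unix time, in seconds).
function updateLastFetched(feedId) {
  const now = Math.floor(Date.now() / 1000);
  db.run("UPDATE feeds SET lastFetched = ? WHERE id = ?", [now, feedId]);
}
```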
We also need some way to retrieve everything as a single combined feed, where the entries are ordered by published date. This will be the basis for the content we show on the blog!
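Roughly, that should be a single query that joins the two tables and sorts by publishDate; here's a sketch of what I have in mind (the function and column aliases are placeholders):

```js
// All entries from all feeds, newest first, with the feed title for attribution.
function getCombinedFeed(callback) {
  db.all(
    `SELECT entries.title, entries.entryUrl, entries.publishDate,
            feeds.title AS feedTitle
       FROM entries
       JOIN feeds ON feeds.id = entries.feedId
      ORDER BY entries.publishDate DESC`,
    callback
  );
}
```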
And finally, we need to remove some of the endpoints I created in the first couple of weeks — they should be replaced by one to refresh all feeds that are found in the OPML file. Right now I'm adding a DISALLOW_WRITE flag to be sure that no one can add stuff to the database via the /api/parse/:feed endpoint.
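I haven't finalized how the flag gets wired in, but the idea is an environment-variable check in front of the route. Assuming the Express-style routes from the starter app, it could look something like this:

```js
// Sketch: if DISALLOW_WRITE is set in .env, refuse to parse-and-store feeds
// via the public endpoint.
app.get("/api/parse/:feed", (request, response) => {
  if (process.env.DISALLOW_WRITE === "true") {
    return response.status(403).send("Writing to the database is disabled.");
  }
  // ...existing feed-parsing and storage logic...
});
```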
Do you have a rule of thumb for estimates?
You never know what kind of interesting problems you'll stumble upon when you're working on an app, but it's almost a guarantee that it'll take longer than you expect. How much of a "safety margin" do you generally add to your estimates for unexpected surprises? Has that margin gotten smaller as you've gained experience? What kind of factors, if any, influence your safety margins?