Recurring Background Jobs in Flask

jmarhee

Joseph D. Marhee

Posted on May 24, 2022

Frustrated that most RSS reader apps are far too feature-heavy for my liking, I recently built a very simple RSS feed reader. It pulls from a feeds.yaml file that lists feed URLs and does little beyond aggregating those links and, where available, providing the article body.

In parser.py, functions like these build the data that Flask uses when it renders the page template:


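import os

import feedparser
import yaml

def buildConfig():
    # Read the list of feed URLs from the YAML file named by FEED_YAML_PATH.
    with open(os.environ['FEED_YAML_PATH']) as f:
        # safe_load is enough for a plain list of URLs, and the name "config"
        # avoids shadowing the built-in dict.
        config = yaml.safe_load(f)
    return config['feeds']

def buildFeed(feeds):
    # Parse each feed URL and collect the results for the template.
    feed_body = []
    for url in feeds:
        feed_data = feedparser.parse(url)
        new_feed = {"feed": url, "data": [feed_data]}
        feed_body.append(new_feed)
    return feed_body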

This basically just dumps all of the feed data into one larger list of dictionaries for Flask to render in the template.
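To make the shape of that object concrete, here is a minimal sketch that mirrors buildFeed but takes the parser as an argument, so it runs without network access. The `build_feed` and `fake_parse` names and the example URLs are hypothetical stand-ins; a real `feedparser.parse` result carries many more fields.

```python
def build_feed(feeds, parse):
    """Mirror of buildFeed: one dict per feed URL, with the parsed result wrapped in a list."""
    feed_body = []
    for url in feeds:
        feed_body.append({"feed": url, "data": [parse(url)]})
    return feed_body


def fake_parse(url):
    # Hypothetical stand-in for feedparser.parse(); it returns only a tiny "entries" list.
    return {"entries": [{"title": f"Post from {url}", "link": f"{url}/1"}]}


feed_body = build_feed(
    ["https://example.com/a.xml", "https://example.com/b.xml"], fake_parse
)

# Each item pairs the source URL with its parsed data, which is what the template iterates over.
titles = [entry["title"] for item in feed_body for entry in item["data"][0]["entries"]]
```

Because the parsed result is wrapped in a one-element list, a template has to go through `item["data"][0]` to reach the entries.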

I wanted to see new feeds I added to this file without restarting Flask. One way to do this is to regenerate the data object on every page load, but fetching and parsing every feed on each request would make the page very slow to load unless I added a bunch of extra logic, dependencies, or components (storing feeds in memory, etc.).

To avoid adding new components to the app (it's a very small Docker image, and restoring app data only requires this YAML file to be available), I decided to add a recurring background task using the apscheduler package, which provides a BackgroundScheduler class. In app.py:

from apscheduler.schedulers.background import BackgroundScheduler

Then, on app startup, I define the feed_data variable that the recurring job (defined in a moment) will update, and initialize it:

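feed_data = None

# Note: before_first_request was deprecated in Flask 2.2 and removed in 2.3;
# on newer versions, call initialize_feeds() directly at startup instead.
@app.before_first_request
def initialize_feeds():
    # Build the feed data once so the first page load has something to render.
    global feed_data
    feeds = buildConfig()
    feed_data = buildFeed(feeds)
    return feed_data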


Next, I add an update_feeds function that behaves just like the function above (in a production app, you could avoid repeating yourself by having the function above call this one, but it is kept separate here for the sake of demonstration):

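def update_feeds():
    # Rebuild the feed data from the current contents of the YAML file.
    global feed_data
    feeds = buildConfig()
    feed_data = buildFeed(feeds)
    return feed_data

# Run update_feeds on a background thread once a minute.
scheduler = BackgroundScheduler()
job = scheduler.add_job(update_feeds, 'interval', minutes=1)
scheduler.start()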

The update function is passed as the first argument to scheduler.add_job() above, with the 'interval' trigger set to one minute, so the scheduler calls it on a background thread every minute.
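For a rough picture of what an interval job does, the sketch below reproduces the idea with only the standard library: a daemon thread that waits, calls a function, and repeats. This is not how apscheduler is implemented, and `run_every` and `refresh` are hypothetical names; it is only meant to show why the refresh does not block page loads.

```python
import threading
import time


def run_every(interval_seconds, func):
    """Call func repeatedly on a daemon thread; return an Event that stops the loop when set."""
    stop = threading.Event()

    def loop():
        # Event.wait returns False on timeout, so func runs once per interval until stop is set.
        while not stop.wait(interval_seconds):
            func()

    threading.Thread(target=loop, daemon=True).start()
    return stop


calls = []


def refresh():
    # Stand-in for update_feeds(); it only records when it ran.
    calls.append(time.monotonic())


stop = run_every(0.05, refresh)
time.sleep(0.3)  # the main thread stays free, much as Flask stays free to serve requests
stop.set()
```

In the real app, the scheduler takes care of this loop, along with the job bookkeeping this sketch leaves out.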

So now, when you add a new URL to your YAML file, the feed_data object is updated in the background within a minute, and page load time is unaffected.
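One design choice worth considering with a background refresh is what happens when a run fails, for example because the YAML file is malformed. A minimal sketch, assuming you want to keep serving the last good data: the `safe_update` wrapper, the `state` holder, and the loader functions below are all hypothetical and not part of the app above.

```python
import logging

logger = logging.getLogger("feeds")

# Holder for the shared data; a dict is used here instead of a module-level global.
state = {"feed_data": ["previous good data"]}


def safe_update(load_config, build):
    """Refresh state["feed_data"], keeping the previous value if loading or parsing fails."""
    try:
        new_data = build(load_config())
    except Exception:
        # Log the failure and keep the last good data rather than losing it.
        logger.exception("Feed refresh failed; keeping previous data")
        return False
    state["feed_data"] = new_data
    return True


def broken_config():
    # Hypothetical loader that simulates a malformed feeds.yaml.
    raise ValueError("bad YAML")


ok = safe_update(broken_config, lambda feeds: feeds)
after_failure = list(state["feed_data"])

refreshed = safe_update(
    lambda: ["https://example.com/a.xml"],
    lambda feeds: [{"feed": url} for url in feeds],
)
```

The new data is built fully before it is assigned, so a request never sees a half-built list.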
