How to Schedule Any Task with AWS Lambda

Renato Byrro

Posted on August 12, 2019

Did you know it is possible to schedule the execution of virtually any task to run in the cloud with AWS Lambda?

People using Dashbird have asked me how to do it a few times, so I decided to summarize it in an article for public reference.

An event-driven approach is perhaps the most powerful aspect of AWS Lambda. In short, Lambda can automatically respond to many kinds of events inside AWS or externally.

This behavior can be leveraged to act as a task scheduler, which is what we’ll cover in this article.

DynamoDB TTL

DynamoDB is like Lambda for data storage: a serverless, highly scalable, fully managed NoSQL database.

It has a cool feature called TTL, which stands for "time to live". TTL allows us to assign a timestamp for the deletion of any entry in the DB. When that time comes, a background job in DynamoDB will automatically delete the entry for us.

Another interesting feature is table streams: DynamoDB will track every change made to items (DB entries) in a particular table and generate a stream of objects describing those changes. These streams can be consumed by a variety of services, including... you guessed it, Lambda!


These two features can be combined to transform DynamoDB into a task scheduler. Here's how this would work:

First, create an item with a TTL

import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('MyTable')

table.put_item(
    Item={
        'customerID': 'ABC123',
        'task': {
            'action': 'alert_expiration',
            'target': 'subscription',
            'args': {
                'subscriptionID': 'XYZ123'
            }
        },
        # Epoch timestamp for December 31, 2019 00:00 UTC
        'ExpirationTime': 1577750400
    }
)

In the example above, we are setting a schedule for December 31, 2019, to alert a customer about a subscription that is close to expiration.

Note the ExpirationTime attribute we set for the item. DynamoDB will constantly compare the ExpirationTime of items in our table against the current timestamp. When the item is overdue, Dynamo will delete it.
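Bear in mind that TTL must first be enabled on the table, pointing to the attribute that holds the expiration timestamps. A minimal one-time setup sketch, assuming the table and attribute names used above:

import boto3

client = boto3.client('dynamodb')

# One-time setup: point TTL at the attribute holding the expiration timestamps
client.update_time_to_live(
    TableName='MyTable',
    TimeToLiveSpecification={
        'Enabled': True,
        'AttributeName': 'ExpirationTime'
    }
)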

When the item gets deleted, DynamoDB streams will trigger our Lambda function. We can optionally configure the stream to include the contents of the deleted item (the OLD_IMAGE stream view type) so that Lambda knows what to do.

In the Lambda code, we need to implement our logic for processing the tasks. In the present example, it could be sending an email to the customer using a pre-determined template.
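As a minimal sketch, assuming the stream is configured with the OLD_IMAGE view type and a hypothetical send_alert_email helper that implements our business logic, the function could look like this:

def handler(event, context):
    for record in event['Records']:
        # TTL deletions arrive as REMOVE events issued by the DynamoDB service
        if record['eventName'] != 'REMOVE':
            continue
        if record.get('userIdentity', {}).get('principalId') != 'dynamodb.amazonaws.com':
            continue  # skip manual deletes; we only want TTL expirations

        # Available because the stream uses the OLD_IMAGE view type
        item = record['dynamodb']['OldImage']
        task = item['task']['M']

        if task['action']['S'] == 'alert_expiration':
            # send_alert_email is a hypothetical helper with our business logic
            send_alert_email(
                customer_id=item['customerID']['S'],
                subscription_id=task['args']['M']['subscriptionID']['S']
            )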

Important: DynamoDB only guarantees that expired items are deleted within roughly 48 hours of expiration. This is relevant because, if your use case requires precise time resolution, the implementation above is not recommended.

S3 Object Expiration

S3 is like Lambda for object storage. It can reliably store anything from text files to images and videos, organized in buckets, and scales seamlessly with demand.


Using S3 as a task scheduler for AWS Lambda is very similar to the DynamoDB streams approach. We can set objects to expire through a lifecycle rule on the bucket. S3 will scan our objects regularly and delete the expired ones.

Events in S3, such as object deletion, can also trigger a Lambda function, similarly to DynamoDB.

We can store a JSON file on S3 containing instructions for our Lambda function to process when the time comes.

Here's how it works:

import json

import boto3

s3 = boto3.client('s3')

# Task instructions for our Lambda to read when the object expires
data = {
    'customerID': 'ABC123',
    'task': {
        'action': 'alert_expiration',
        'target': 'subscription',
        'args': {
            'subscriptionID': 'XYZ123'
        }
    }
}

s3.put_object(
    Body=json.dumps(data),
    Bucket='my-task-scheduler',
    Key='test-scheduler.json'
)

A word of caution: the Expires argument accepted by put_object only sets an HTTP caching header; it does not schedule deletion. The deletion date comes from a lifecycle expiration rule configured on the bucket.
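Here is a minimal sketch of such a rule, assuming every object in the bucket should expire on the same date (lifecycle expiration dates must be midnight UTC):

from datetime import datetime

import boto3

s3 = boto3.client('s3')

s3.put_bucket_lifecycle_configuration(
    Bucket='my-task-scheduler',
    LifecycleConfiguration={
        'Rules': [
            {
                'ID': 'expire-scheduled-tasks',
                'Filter': {'Prefix': ''},  # match every object in the bucket
                'Status': 'Enabled',
                # Objects are deleted on or shortly after this date
                'Expiration': {'Date': datetime(2019, 12, 31)}
            }
        ]
    }
)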

When the object actually gets deleted, S3 can invoke our function (via an s3:ObjectRemoved event notification configured on the bucket), providing the bucket name and object key. Keep in mind the object's contents are already gone at that point: either enable bucket versioning, so the expired version can still be retrieved, or encode the task details in the object key itself.
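A minimal handler sketch, assuming versioning is enabled on the bucket so the expired data survives as a noncurrent version:

import json
import urllib.parse

import boto3

s3 = boto3.client('s3')

def handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        # Object keys arrive URL-encoded in S3 event notifications
        key = urllib.parse.unquote_plus(record['s3']['object']['key'])

        # With versioning, the most recent noncurrent version holds our data
        versions = s3.list_object_versions(Bucket=bucket, Prefix=key)
        version_id = versions['Versions'][0]['VersionId']
        obj = s3.get_object(Bucket=bucket, Key=key, VersionId=version_id)
        task = json.loads(obj['Body'].read())

        print(f"Processing task {task['task']['action']} "
              f"for customer {task['customerID']}")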

Important: as in DynamoDB, AWS does not guarantee an object will get deleted right after expiration, making this implementation unsuitable for use cases that require a precise time resolution.

CloudWatch Rule Schedule


CloudWatch is perhaps the most obvious choice for Lambda task scheduling, and it could not be more straightforward: choose a "Schedule" as the event source and a Lambda function as the "Target".
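The same setup can be done programmatically. A minimal sketch with boto3 (the rule name, function ARN, and payload below are hypothetical, and the function must also grant CloudWatch Events permission to invoke it):

import json

import boto3

events = boto3.client('events')

# Fixed-rate schedule; a cron expression such as
# 'cron(0 12 * * ? *)' (every day at noon UTC) also works
events.put_rule(
    Name='my-scheduled-task',
    ScheduleExpression='rate(2 hours)',
    State='ENABLED'
)

events.put_targets(
    Rule='my-scheduled-task',
    Targets=[
        {
            'Id': 'my-lambda-target',
            # Hypothetical function ARN
            'Arn': 'arn:aws:lambda:us-east-1:123456789012:function:my-task',
            # Optional static payload; identical on every invocation
            'Input': json.dumps({'source': 'scheduler'})
        }
    ]
)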

There are mainly two disadvantages, though:

1) Only supports recurrent schedules

We can only choose a fixed rate (e.g. 'every 2 hours' or 'every 5 minutes') or a cron-like expression.

It is not possible to schedule a specific date for the event to happen, for example, as we did with DynamoDB and S3.

2) No customized events

Every invocation of the Lambda by a CloudWatch Rule Schedule carries the same request payload. We can attach a static input to the rule's target, but it is identical on every run, so we cannot vary how the Lambda behaves based on what the scheduler provides, like we did with DynamoDB and S3.


Wrapping up


We have explored three ways of using an event-driven approach for scheduling tasks for AWS Lambda to process, each having its pros and cons.

There is one challenge in using this sort of architecture: keeping control and visibility over what is going on in your system. Autonomous invocations can be difficult to track, especially as the application starts scaling up.

In case you would like to stay on top of your stack while still reaping the benefits of serverless event-driven architecture, I recommend checking out Dashbird, a monitoring service designed for exactly this use case.

Disclosure: I work as a developer advocate for Dashbird.
