Event Driven Processing of ip-ranges.json

mbacchi

Matt Bacchi

Posted on February 1, 2024


Imagine you have a security group that needs to allow all IP addresses of Amazon EC2 instances. Or imagine you have to allow the IP addresses of GitHub Actions runners so that only your CI workers can connect to your VPC. Both of those IP address ranges change regularly and need to be updated, usually by hand.

If we want to automate these security group updates, how can we tell when these IP address ranges have changed? AWS publishes an Amazon SNS notification every time its ip-ranges.json file changes. That notification can kick off an automated procedure to update our security group.

What we're describing is an event driven architecture. In an event driven architecture, an event producer emits an event. A downstream event consumer handles the event and may trigger further events.
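The producer/consumer relationship can be made concrete with a toy, in-process event bus written in Python. This only illustrates the pattern. It is not how SNS or EventBridge are implemented. The event names `IpRangesChanged` and `RangesStored` and the sample synctoken are made up for the example.

```python
from collections import defaultdict
from typing import Any, Callable

# Toy in-process event bus for illustration only; real systems would use a
# managed bus such as Amazon SNS or Amazon EventBridge.
Handler = Callable[[dict[str, Any]], None]


class EventBus:
    def __init__(self) -> None:
        # Maps an event type to the consumers subscribed to it.
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, detail: dict[str, Any]) -> None:
        # The producer does not know who (if anyone) consumes the event.
        for handler in self._subscribers[event_type]:
            handler(detail)


bus = EventBus()
processed: list[str] = []


def on_ranges_changed(detail: dict[str, Any]) -> None:
    # A consumer that does its work, then emits a further event downstream.
    processed.append(detail["synctoken"])
    bus.publish("RangesStored", {"synctoken": detail["synctoken"]})


def on_ranges_stored(detail: dict[str, Any]) -> None:
    processed.append(f"stored:{detail['synctoken']}")


# Hypothetical event names, chosen for this example.
bus.subscribe("IpRangesChanged", on_ranges_changed)
bus.subscribe("RangesStored", on_ranges_stored)

# One published event fans out to two consumers in sequence.
bus.publish("IpRangesChanged", {"synctoken": "1706745600"})
print(processed)  # ['1706745600', 'stored:1706745600']
```

Neither consumer knows about the other. Each one only knows the event type it subscribes to, which is the loose coupling described below.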

In this two-part blog post series, we'll cover event driven architectures. In this first part, we'll build the component that processes IP address ranges. It reads the AWS ip-ranges.json file and inserts its contents into a DynamoDB table. The second post in this series will insert the IP ranges into AWS security groups.

By the end of the next blog post, you'll be able to manage IP address ranges in your environment using an event driven architecture.

What is Event Driven Architecture?

The Wikipedia page for event driven architecture (EDA) describes it as "a software architecture paradigm concerning the production and detection of events". This contrasts with software that focuses on its own state and doesn't concern itself with external state changes. Event driven architectures are often made up of loosely coupled components that act independently on the events they're concerned with.

Why use EDA for data processing?

Data processing is often well suited to the EDA model because it's rarely necessary to perform a data processing task synchronously. If a user purchases something from a shopping cart, the interaction with the bank or credit card company certainly needs to be synchronous and immediate. But if the website compiles a "top ten products" list from all user purchases, that processing can be done asynchronously at a later, possibly less busy, time of day. Data engineering teams commonly run extract, transform, load (ETL) jobs off hours to avoid contention for compute resources. All of this can be done with an event driven model. In the example above, the list of purchased items would be sent to an event bus (such as Amazon EventBridge or Apache Kafka). Consumers of that type of event would then act on it according to their function.

SNS notification for ip-ranges.json

Anyone can subscribe to the SNS notifications for changes to the ip-ranges.json list. When the list changes, the SNS notification is sent out. SNS acts as the event producer in this case, delivering the event to everyone subscribed to the notification.
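AWS documents the topic for these notifications as `arn:aws:sns:us-east-1:806199016981:AmazonIpSpaceChanged`. Its message body is a JSON string with `create-time`, `synctoken`, `md5` and `url` fields. The following is a minimal sketch, assuming that message shape, of how a Lambda consumer might decode the notice from each SNS record. The helper name `parse_ip_space_notices` and all sample values are hypothetical.

```python
import json
from typing import Any


def parse_ip_space_notices(event: dict[str, Any]) -> list[dict[str, str]]:
    """Decode the JSON message carried by each SNS record in a Lambda event."""
    # SNS delivers to Lambda as {"Records": [{"Sns": {"Message": "<string>"}, ...}]};
    # the message itself is a JSON string that must be decoded separately.
    return [json.loads(record["Sns"]["Message"]) for record in event.get("Records", [])]


# Made-up sample event shaped like an SNS delivery (values are illustrative only).
sample_event = {
    "Records": [
        {
            "EventSource": "aws:sns",
            "Sns": {
                "Message": json.dumps({
                    "create-time": "2024-02-01T00:00:00+00:00",
                    "synctoken": "1706745600",
                    "md5": "0123456789abcdef0123456789abcdef",
                    "url": "https://ip-ranges.amazonaws.com/ip-ranges.json",
                })
            },
        }
    ]
}

notice = parse_ip_space_notices(sample_event)[0]
print(notice["synctoken"], notice["url"])
```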

In today's example, we'll set up an AWS Lambda function to process the events coming from the SNS notification. Our function is the event consumer in this scenario.

Data organization of the ip-ranges.json file

The ip-ranges.json file consists of a sync token and a creation date (both indicating when the file was last updated), followed by lists of IPv4 and IPv6 address ranges. An IPv4 entry looks like the following sample:

    {
      "ip_prefix": "52.4.0.0/14",
      "region": "us-east-1",
      "service": "EC2",
      "network_border_group": "us-east-1"
    }

We want to keep all of this information and enhance each entry with the file's syncToken. This is the publication timestamp (in Unix epoch time) included at the top of ip-ranges.json.
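The sketch below flattens the file and tags every entry with its syncToken. It assumes the published file's top-level `syncToken`, `prefixes` and `ipv6_prefixes` keys. The helper name `iter_prefixes` is hypothetical. Renaming each IPv6 entry's `ipv6_prefix` field to `ip_prefix` is a design choice made here so both address families share one schema. The file itself does not require it.

```python
import json
from typing import Any, Iterator


def iter_prefixes(ip_ranges: dict[str, Any]) -> Iterator[dict[str, str]]:
    """Yield every IPv4 and IPv6 entry, tagged with the file-level syncToken."""
    sync_token = ip_ranges["syncToken"]
    for entry in ip_ranges.get("prefixes", []):
        yield {**entry, "synctoken": sync_token}
    for entry in ip_ranges.get("ipv6_prefixes", []):
        # Assumption: store IPv6 ranges under "ip_prefix" too, so both kinds share one schema.
        rest = {k: v for k, v in entry.items() if k != "ipv6_prefix"}
        yield {"ip_prefix": entry["ipv6_prefix"], **rest, "synctoken": sync_token}


# Illustrative, trimmed-down document; real files contain thousands of entries.
sample = json.loads("""
{
  "syncToken": "1706745600",
  "createDate": "2024-02-01-00-00-00",
  "prefixes": [
    {"ip_prefix": "52.4.0.0/14", "region": "us-east-1",
     "service": "EC2", "network_border_group": "us-east-1"}
  ],
  "ipv6_prefixes": [
    {"ipv6_prefix": "2600:1f18::/33", "region": "us-east-1",
     "service": "EC2", "network_border_group": "us-east-1"}
  ]
}
""")

records = list(iter_prefixes(sample))
print(len(records), records[1]["ip_prefix"])  # 2 2600:1f18::/33
```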

DynamoDB Single Table Design

We'll insert all of these IP address ranges into a DynamoDB table for future access and processing (triggered via events). DynamoDB is a good fit for serverless applications because clients reach it through stateless HTTP requests, so it supports a large number of connections. Using a single DynamoDB table, we'll store all of this data so it can be queried easily. We do this by creating composite keys that pre-join the data fields and make lookups fast. This concept has been discussed extensively. Alex DeBrie (an AWS Data Hero) has a blog post that covers these ideas well.

We want to enable queries that return IP prefixes by AWS region and service. To do that, we'll format the data with both a partition key (PK) and a sort key (SK), plus a synctoken attribute that is separate from the IP prefix itself. We'll create a global secondary index on the synctoken, which allows fast queries for all items carrying a particular synctoken. That in turn lets us remove every IP prefix that carries a given synctoken. We can store non-normalized data like this in one table because DynamoDB isn't a SQL database. It's a key-value and wide-column store, generally referred to as NoSQL.
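The sketch below shows one possible key scheme and the region/service query it enables. It is written against the boto3 DynamoDB `Table` resource API. The layout is an assumption for illustration, not necessarily what the author's repository uses: `PK = "REGION#<region>"` and `SK = "SERVICE#<service>#PREFIX#<ip_prefix>"`. The function name `query_prefixes` is also hypothetical.

```python
from typing import Any

# Hypothetical single-table key scheme (illustrative; the author's layout may differ):
#   PK = "REGION#<region>"
#   SK = "SERVICE#<service>#PREFIX#<ip_prefix>"


def query_prefixes(table: Any, region: str, service: str) -> list[str]:
    """Return every ip_prefix stored for one region/service pair.

    `table` is assumed to behave like a boto3 DynamoDB Table resource.
    """
    kwargs: dict[str, Any] = {
        # One partition, narrowed by a sort-key prefix: no scan, no join.
        "KeyConditionExpression": "PK = :pk AND begins_with(SK, :sk)",
        "ExpressionAttributeValues": {
            ":pk": f"REGION#{region}",
            ":sk": f"SERVICE#{service}#",
        },
    }
    prefixes: list[str] = []
    while True:
        page = table.query(**kwargs)
        prefixes.extend(item["ip_prefix"] for item in page["Items"])
        # Query results are paginated; follow LastEvaluatedKey until it is absent.
        if "LastEvaluatedKey" not in page:
            return prefixes
        kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]
```

The trailing `#` in the sort-key prefix keeps a query for `EC2` from also matching services whose names merely start with `EC2`, such as `EC2_INSTANCE_CONNECT`.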

Here is what our primary and secondary key layout will look like:

Figure: Primary Key and Attributes
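The synctoken index can drive cleanup of stale prefixes. In this sketch, each new sync overwrites any item that still exists under the same PK/SK with the new synctoken. As a result, only prefixes that disappeared from the file keep the old token, and querying the index for that old token finds exactly the stale items. Several names are assumptions: the index name `synctoken-index`, the attribute name `synctoken`, the PK/SK key names, and the function name.

```python
from typing import Any

# Hypothetical GSI whose partition key is the "synctoken" attribute. A GSI always
# projects the base table's key attributes, so PK and SK come back from the query.
SYNCTOKEN_INDEX = "synctoken-index"


def delete_items_with_synctoken(table: Any, old_token: str) -> int:
    """Delete every item still tagged with `old_token`; return how many were removed.

    `table` is assumed to behave like a boto3 DynamoDB Table resource.
    """
    kwargs: dict[str, Any] = {
        "IndexName": SYNCTOKEN_INDEX,
        "KeyConditionExpression": "synctoken = :t",
        "ExpressionAttributeValues": {":t": old_token},
        "ProjectionExpression": "PK, SK",
    }
    deleted = 0
    # batch_writer buffers deletes and sends them as BatchWriteItem requests.
    with table.batch_writer() as batch:
        while True:
            page = table.query(**kwargs)
            for item in page["Items"]:
                batch.delete_item(Key={"PK": item["PK"], "SK": item["SK"]})
                deleted += 1
            if "LastEvaluatedKey" not in page:
                break
            kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]
    return deleted
```

Reads from a global secondary index are eventually consistent. An index query made right after the writes could still report an item under its old token. In practice, you might run this cleanup some time after the sync. Alternatively, you could switch to individual `delete_item` calls with `ConditionExpression="synctoken = :t"` so that an item already rewritten with a newer token is never removed.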

Lambda Function Overview

We'll use a Lambda function that is triggered by an SNS notification whenever the IP ranges JSON file changes. The SNS subscription, the Lambda function, the DynamoDB table, and the rest of the AWS infrastructure are configured using the AWS Cloud Development Kit (CDK). The Lambda function itself does one thing. It downloads the JSON file, iterates through every entry, and adds each one to the DynamoDB table.
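A minimal Python Lambda handler for those steps could look like the sketch below. It is not the code from the author's repository. The `TABLE_NAME` environment variable, the PK/SK key scheme, and the helper names are assumptions. It uses the standard library plus boto3 (bundled in the AWS Lambda Python runtime) and the `batch_writer` helper of the DynamoDB `Table` resource.

```python
import json
import os
import urllib.request
from typing import Any, Iterable

TABLE_NAME_ENV = "TABLE_NAME"  # hypothetical; whatever name the CDK stack passes in


def download_ip_ranges(url: str) -> dict[str, Any]:
    """Fetch and decode ip-ranges.json from the URL given in the SNS message."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def build_items(ip_ranges: dict[str, Any]) -> list[dict[str, str]]:
    """Turn every IPv4 and IPv6 entry into an item using a hypothetical PK/SK scheme."""
    token = ip_ranges["syncToken"]
    items: list[dict[str, str]] = []
    for entry in ip_ranges.get("prefixes", []) + ip_ranges.get("ipv6_prefixes", []):
        prefix = entry.get("ip_prefix") or entry["ipv6_prefix"]
        item = {k: v for k, v in entry.items() if k != "ipv6_prefix"}
        item.update(
            PK=f"REGION#{entry['region']}",
            SK=f"SERVICE#{entry['service']}#PREFIX#{prefix}",
            ip_prefix=prefix,
            synctoken=token,
        )
        items.append(item)
    return items


def write_items(table: Any, items: Iterable[dict[str, str]]) -> int:
    """Put every item through a boto3-style batch writer and return the count."""
    count = 0
    with table.batch_writer(overwrite_by_pkeys=["PK", "SK"]) as batch:
        for item in items:
            batch.put_item(Item=item)
            count += 1
    return count


def handler(event: dict[str, Any], context: Any) -> dict[str, int]:
    """Lambda entry point for SNS-triggered invocations."""
    # Imported lazily so the helpers above can be tested without the AWS SDK.
    import boto3

    table = boto3.resource("dynamodb").Table(os.environ[TABLE_NAME_ENV])
    written = 0
    for record in event["Records"]:
        notice = json.loads(record["Sns"]["Message"])
        written += write_items(table, build_items(download_ip_ranges(notice["url"])))
    return {"written": written}
```

`overwrite_by_pkeys` tells the batch writer to drop an earlier buffered request with the same PK/SK. This avoids BatchWriteItem's duplicate-key error if the file ever lists the same key twice.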

AWS CDK configuration

The configuration used to create the Lambda function and the other infrastructure is in the GitHub repository here. Instructions on how to deploy the infrastructure are also in that repository.

Once deployed, the Lambda function does nothing until the SNS notification invokes it the next time the JSON file changes. Below is a screenshot of recent invocations of the function in my environment.

Figure: CloudWatch metrics of Lambda function invocations

Wrapping Up

In this first part of the blog series, you saw how we used an event driven architecture to respond to an event and perform some data processing. In the next post, we'll use these IP address ranges to update security groups that allow traffic for certain AWS services.

Thanks for reading and have a great day!

Resources

  • AWS blog post on "How to Automatically Update Your Security Groups for Amazon CloudFront and AWS WAF by Using AWS Lambda"

Cover photo by Dushyant Kumar on Unsplash
