AWS Best Practices: Three Tier VPC
Dennis Groß (he/him)
Posted on December 18, 2022
The Three Tier VPC is an AWS best practice that provides strong network security.
TL;DR
Here is the Terraform source code for the VPC that we create in this post
aws-terraform-examples/modules/vpc at master · gdenn/aws-terraform-examples
A Three Tier VPC consists of three different subnet types
- Web Subnet - public subnet, resources in this subnet receive public IPv4 addresses and reach the internet directly through the Internet Gateway.
- Computing Subnet - private subnet, cannot be reached from the outside since there is no direct route to the Internet Gateway. Resources in this subnet can still reach the internet through a NAT Gateway that routes to the Internet Gateway.
- Data Subnet - private isolated subnet, resources in this subnet cannot be reached from the internet and the resources themselves cannot reach out to the internet.
In general, use the
- Web Subnet for public resources such as Application Load Balancers, frontend Applications, or Lambda functions that you want to make directly accessible from the internet.
- Computing Subnet for backing services such as private APIs or frontend applications that you expose through an Application Load Balancer.
- Data Subnet for data services and data processing or anything that does not need to communicate with external resources on the internet.
Create the VPC
You can create a VPC with the Terraform aws_vpc resource.
resource "aws_vpc" "main" {
  cidr_block           = var.cidr
  instance_tenancy     = "default"
  enable_dns_support   = true
  enable_dns_hostnames = true

  tags = {
    Name        = var.vpc_name
    Environment = var.environment
  }
}
A few things are important here
- instance_tenancy defines the tenancy of EC2 instances that you launch into the VPC and should be set to "default". There is also the "dedicated" option, which runs EC2 instances on dedicated, single-tenant hardware. Keep this on "default" unless you have a clear use case for "dedicated", otherwise this might cost you a small fortune.
- enable_dns_hostnames determines whether instances with public IP addresses receive public DNS hostnames from AWS. Leave this on; it makes sense to use DNS names instead of hardcoded IPs wherever possible.
- enable_dns_support defines whether the Amazon-provided DNS server resolves names for the resources in your VPC. You should leave this on as well.
You can see that we are using a bunch of variables in this VPC resource. Here is a complete snippet of the variables.tf content.
variable "environment" {
  type        = string
  description = "environment type (staging/production/sdlc)"
  default     = "production"
}

variable "cidr" {
  type        = string
  description = "vpc cidr"
  default     = "10.0.0.0/16"
}

variable "cidr_offset" {
  type        = number
  description = "offset that we pass to the cidrsubnet function to build subnets"
  default     = 8
}

variable "profile" {
  type        = string
  description = "aws profile"
}

variable "log_group_name" {
  type        = string
  description = "name of the CloudWatch Log Group for the flow logs, e.g. vpc-flow-logs"
}

variable "region" {
  type    = string
  default = "eu-central-1"
}

variable "vpc_name" {
  type        = string
  description = "vpc name"
  default     = "test"
}

variable "availability_zones" {
  type        = list(string)
  description = "list of availability zones"
  default     = ["eu-central-1a", "eu-central-1b", "eu-central-1c"]
}
Every VPC requires you to define a CIDR range, which can be anything from a /16 block (largest) to a /28 block (smallest). The size of your VPC depends on the number of resources you want to deploy into it.
Keep in mind that AWS reserves five IP addresses in every subnet (the first four and the last one). Personally, I try to anticipate what I want to do with the VPC in the future. You might deploy some extra resources in 1-2 years, so choose a larger range than you need now.
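To get a feeling for the sizing, the following sketch computes how many addresses remain per subnet with the defaults from variables.tf. The local and output names are made up for this illustration and are not part of the original module.

```hcl
locals {
  # Prefix length of the VPC CIDR, 16 for the default "10.0.0.0/16".
  vpc_prefix_length = tonumber(split("/", var.cidr)[1])

  # Prefix length of each subnet, 16 + 8 = 24 with the defaults.
  subnet_prefix_length = local.vpc_prefix_length + var.cidr_offset

  # Addresses per subnet minus the five that AWS reserves: 2^8 - 5 = 251.
  usable_ips_per_subnet = pow(2, 32 - local.subnet_prefix_length) - 5
}

# Hypothetical output, only here to make the number visible after an apply.
output "usable_ips_per_subnet" {
  value = local.usable_ips_per_subnet
}
```

If 251 addresses per subnet look tight for your plans, a smaller cidr_offset gives you larger subnets out of the same VPC range.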
Enable VPC Flow Logs
Flow logs contain metadata about the IP traffic in the VPC (Layer 3 and 4), such as source and destination addresses, ports, the protocol, and whether the traffic was accepted or rejected. They do not contain packet payloads. It is in general a good practice to enable them and drain them to a CloudWatch Log Group.
They can help you to…
- analyze attack vectors when your network got compromised.
- debug security group issues with resources.
- detect intrusions.
Here is how you activate the VPC Flow Logs.
resource "aws_flow_log" "vpc_flow_logs" {
  iam_role_arn    = aws_iam_role.flow_logs_role.arn
  log_destination = aws_cloudwatch_log_group.vpc_log_group.arn
  traffic_type    = "ALL"
  vpc_id          = aws_vpc.main.id
}
We need to create (or reuse an existing) CloudWatch Log Group. I like to have one Log Group per application stack.
resource "aws_cloudwatch_log_group" "vpc_log_group" {
  name = var.log_group_name
}
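If the Log Group already exists, one option is to look it up with a data source instead of creating it. This is a minimal sketch of that variant; the data source label "existing" is an assumption for this example, and it presumes that var.log_group_name holds the name of a Log Group that is already present in the account.

```hcl
# Looks up an existing Log Group by name instead of creating a new one.
data "aws_cloudwatch_log_group" "existing" {
  name = var.log_group_name
}

# Same flow log as before, but it writes into the existing Log Group.
resource "aws_flow_log" "vpc_flow_logs" {
  iam_role_arn    = aws_iam_role.flow_logs_role.arn
  log_destination = data.aws_cloudwatch_log_group.existing.arn
  traffic_type    = "ALL"
  vpc_id          = aws_vpc.main.id
}
```

Use either this variant or the aws_cloudwatch_log_group resource, not both, since the aws_flow_log resource would otherwise be declared twice.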
And an IAM role that gives the VPC Flow Logs service permission to push Flow Logs into the CloudWatch Log Group.
resource "aws_iam_role" "flow_logs_role" {
  name = "flow-logs-role"

  # Trust policy: lets the VPC Flow Logs service assume this role.
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Principal = {
          Service = "vpc-flow-logs.amazonaws.com"
        }
        Action = "sts:AssumeRole"
      }
    ]
  })
}

resource "aws_iam_role_policy" "create_log_group_policy" {
  name = "allow-log-group-policy"
  role = aws_iam_role.flow_logs_role.name

  # Permissions that the role needs to write into CloudWatch Logs.
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = [
          "logs:CreateLogGroup",
          "logs:CreateLogStream",
          "logs:PutLogEvents",
          "logs:DescribeLogStreams"
        ],
        Effect = "Allow",
        Resource = [
          "*"
        ]
      }
    ]
  })
}
Public Web Subnet
Resources in the Web Subnet receive public IPv4 addresses and require a direct route to the Internet Gateway.
resource "aws_internet_gateway" "main_igw" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "${var.vpc_name}-igw"
  }
}
The Internet Gateway is hosted by AWS in the public zone and is a region-resilient service. You don’t have to provide multiple Internet Gateways to ensure high availability; AWS scales the gateway horizontally and keeps it redundant within your region.
That’s sufficient since VPCs are regional constructs, which means your blast radius with a VPC is at most a region (if you use multiple availability zones for your subnets).
The next thing that we need is a Route Table that routes directly to the Internet Gateway that we provisioned.
resource "aws_route_table" "public_subnet" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main_igw.id
  }
}
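A Route Table only applies to subnets that are associated with it. A subnet without an explicit association uses the main Route Table of the VPC, which has no route to the Internet Gateway. The snippets in this post do not show that association for the Web Subnet, so here is a minimal sketch of how it could look. The resource label is an assumption that follows the naming of the other associations in this post; aws_subnet.subnet_web and local.azs_count are defined further down in this section.

```hcl
# Assumption: not shown in the original snippets.
# Associates every Web Subnet with the public Route Table.
resource "aws_route_table_association" "subnet_web_route_table_association" {
  count          = local.azs_count
  subnet_id      = aws_subnet.subnet_web[count.index].id
  route_table_id = aws_route_table.public_subnet.id
}
```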
We also add a few locals before we start with the Web Subnet itself.
locals {
  azs_count        = length(var.availability_zones)
  computing_offset = local.azs_count
  data_offset      = local.azs_count * 2
}
Always keep your code DRY and use locals to
- Avoid code redundancies
- Assign complex expressions to local variables with proper names
We create the Web Subnet with the aws_subnet resource.
resource "aws_subnet" "subnet_web" {
  count                   = local.azs_count
  vpc_id                  = aws_vpc.main.id
  map_public_ip_on_launch = true
  cidr_block              = cidrsubnet(var.cidr, var.cidr_offset, count.index)
  availability_zone       = element(var.availability_zones, count.index)

  tags = {
    Name        = "${var.vpc_name}-subnet-web-${count.index}"
    Environment = var.environment
  }
}
The Terraform count meta-argument creates a subnet for each availability zone that we configured in the variables file. Each subnet that you deploy into an availability zone needs its own non-overlapping CIDR range.
The cidrsubnet function from Terraform derives the subnet CIDR ranges from the VPC CIDR, which makes it unnecessary to pass additional subnet CIDR ranges through variables.
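Here is a small worked example of what cidrsubnet returns for the default VPC CIDR and an offset of 8, assuming three availability zones. The local names are only for illustration and are not part of the module.

```hcl
locals {
  example_cidr = "10.0.0.0/16"

  # Web Subnets use the indices 0, 1 and 2.
  web_example = cidrsubnet(local.example_cidr, 8, 0) # "10.0.0.0/24"

  # Computing Subnets start at computing_offset, which is 3 with three zones.
  computing_example = cidrsubnet(local.example_cidr, 8, 3) # "10.0.3.0/24"

  # Data Subnets start at data_offset, which is 6 with three zones.
  data_example = cidrsubnet(local.example_cidr, 8, 6) # "10.0.6.0/24"

  # All nine subnet ranges of the VPC in one list.
  all_subnet_cidrs = [for i in range(9) : cidrsubnet(local.example_cidr, 8, i)]
}
```

You can try single expressions such as cidrsubnet("10.0.0.0/16", 8, 3) in terraform console before you apply anything.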
Computing Subnet
The next subnet is the Computing Subnet. The subnet is private: resources in the subnet receive no public IPv4 address and are not reachable from the internet. The resources can still communicate with servers that are on the internet.
The subnet is a great fit for internal APIs or resources that you expose through an Application Load Balancer.
resource "aws_subnet" "subnet_computing" {
  count                   = local.azs_count
  vpc_id                  = aws_vpc.main.id
  map_public_ip_on_launch = false
  cidr_block              = cidrsubnet(var.cidr, var.cidr_offset, count.index + local.computing_offset)
  availability_zone       = element(var.availability_zones, count.index)

  tags = {
    Name        = "${var.vpc_name}-subnet-computing-${count.index}"
    Environment = var.environment
  }
}
Notice that we set map_public_ip_on_launch to false. This option only makes sense if you route directly to an Internet Gateway, which is not the case with the Computing Subnet.
The next thing that we need is a NAT Gateway.
resource "aws_eip" "nat_eip" {
  # One Elastic IP per availability zone.
  count = local.azs_count
  # AWS provider v5 and later deprecate this argument in favor of domain = "vpc".
  vpc        = true
  depends_on = [aws_internet_gateway.main_igw]
}

resource "aws_nat_gateway" "natgw" {
  # One NAT Gateway per availability zone, placed in the Web Subnet of that zone.
  count         = local.azs_count
  allocation_id = aws_eip.nat_eip[count.index].id
  subnet_id     = aws_subnet.subnet_web[count.index].id
  depends_on    = [aws_internet_gateway.main_igw]

  tags = {
    Name        = "natgw-${count.index}"
    Environment = var.environment
  }
}
The Route Table of the Computing subnet routes requests from resources to the internet through the NAT Gateway and from there to the Internet Gateway.
The NAT Gateway requires a static public IPv4 address, which we provide through an Elastic IP (eip). The NAT Gateway performs Port Address Translation (PAT): resources that communicate with the internet use the NAT Gateway’s public IPv4 address, and the NAT Gateway maps each outgoing connection to one of its ports.
These port mappings only exist for connections that were opened from inside the subnet. There is no fixed address and port under which a resource could be reached, which is the reason why you can’t request it from the internet.
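Since all outbound traffic from the Computing Subnet appears with the Elastic IPs of the NAT Gateways, it can be handy to expose those addresses, for example when a third party has to allow-list them. A minimal sketch follows; the output name is an assumption and not part of the original module.

```hcl
# Hypothetical output that lists the public IPv4 addresses of all NAT Gateways.
output "nat_public_ips" {
  description = "public IPv4 addresses that outbound traffic from the Computing Subnets uses"
  value       = aws_eip.nat_eip[*].public_ip
}
```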
We need to create a Route Table, a Route, and a Route Table Association for each subnet to finish the Computing Subnet.
resource "aws_route" "private_subnet" {
  count                  = local.azs_count
  route_table_id         = aws_route_table.private_subnet[count.index].id
  destination_cidr_block = "0.0.0.0/0"
  # Each Route Table sends internet traffic to the NAT Gateway of its own zone.
  nat_gateway_id = aws_nat_gateway.natgw[count.index].id
}

resource "aws_route_table" "private_subnet" {
  count  = local.azs_count
  vpc_id = aws_vpc.main.id

  tags = {
    Name        = "${var.vpc_name}-private-subnet-route-table-${count.index}"
    Environment = var.environment
  }
}

resource "aws_route_table_association" "subnet_computing_route_table_association" {
  count          = local.azs_count
  subnet_id      = aws_subnet.subnet_computing[count.index].id
  route_table_id = aws_route_table.private_subnet[count.index].id
}
Data Subnet
The last subnet is the Data Subnet. Resources in this subnet cannot communicate with the internet and are not reachable from the internet.
That makes it a perfect fit for data services or services that perform data processing, or in general everything that doesn’t have to communicate with the internet.
Putting your data services into a private, isolated subnet is a security best practice. An attacker that compromises your data service cannot send data directly to a malicious server on the internet, because the subnet has no route out of the VPC.
We start with the subnet resource again.
resource "aws_subnet" "subnet_data" {
  count                   = local.azs_count
  vpc_id                  = aws_vpc.main.id
  map_public_ip_on_launch = false
  cidr_block              = cidrsubnet(var.cidr, var.cidr_offset, count.index + local.data_offset)
  availability_zone       = element(var.availability_zones, count.index)

  tags = {
    Name        = "${var.vpc_name}-subnet-data-${count.index}"
    Environment = var.environment
  }
}
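This subnet does not need a route to the NAT Gateway or Internet Gateway. It is still a good idea to provide a dedicated Route Table and a Route Table Association for each subnet, because a subnet without an explicit association falls back to the main Route Table of the VPC.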
resource "aws_route_table" "private_isolated_subnet" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name        = "${var.vpc_name}-private-isolated-subnet-route-table"
    Environment = var.environment
  }
}

resource "aws_route_table_association" "subnet_data_route_table_association" {
  count          = local.azs_count
  subnet_id      = aws_subnet.subnet_data[count.index].id
  route_table_id = aws_route_table.private_isolated_subnet.id
}
Production VPC Usage
There are a couple of things that you should do when you create a VPC for production use
- Make sure you have one NAT Gateway per availability zone. A NAT Gateway is deployed into a single Web Subnet and is only resilient within that availability zone. Unlike the Internet Gateway, it is not region resilient, so you need multiple NAT Gateways for high availability (HA).
- Enable VPC Flow Logs and drain them to CloudWatch.
- Use multiple availability zones (at least 3) for your VPC so resources in your VPC can be deployed into multiple zones.
- Be a bit more generous with your VPC CIDR range. It is quite tedious to find out that your VPC is not sufficiently sized. Moving production resources is quite challenging and VPC peering is also not a walk in the park, so think ahead with regard to your VPC size.
The VPC stack from this post has everything that you need for production use, just make sure that you configure three availability zones in the variable.
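One way to pin these settings is a variable file per environment. The following is a sketch of such a file; the file name production.tfvars and the values for profile and log_group_name are assumptions for this example, while the variable names come from variables.tf.

```hcl
# production.tfvars (hypothetical file name)
environment        = "production"
vpc_name           = "production"
cidr               = "10.0.0.0/16"
region             = "eu-central-1"
availability_zones = ["eu-central-1a", "eu-central-1b", "eu-central-1c"]

# These two variables have no default and must always be set.
profile        = "default"       # assumed AWS CLI profile name
log_group_name = "vpc-flow-logs" # assumed Log Group name
```

You would then pass the file with terraform apply -var-file=production.tfvars.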
Considerations for non-production Use
There are also a few things that you should be aware of when you use this VPC architecture for non-production use, mostly in regard to cost-efficiency.
- Use at least two availability zones even for non-production use. Resources like the Application Load Balancer require at least two availability zones to be deployed, regardless of whether it is for production or non-production use.
- Deploy only one NAT Gateway and route all Computing Subnet Route Tables to it. AWS bills a NAT Gateway for every hour that it runs and additionally for every gigabyte that it processes, so it is a good idea to limit the number of NAT Gateways for non-production use.
- Keep in mind that AWS charges you for cross-availability-zone transfer of network data (except for managed solutions). Try to keep your resources deployed in the same AZ so they don’t incur costs for cross-AZ communication.
- Tag your VPC with the environment tag so operators are never in doubt whether the VPC is used for production or non-production use (the best practice is still to separate via accounts).
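The single NAT Gateway setup could be sketched as a variant of the NAT resources from the Computing Subnet section. The variable single_nat_gateway and the local nat_count are hypothetical names introduced for this example; the blocks are meant to replace the aws_eip, aws_nat_gateway and aws_route definitions shown earlier, not to be added next to them.

```hcl
# Hypothetical switch, not part of the original module.
variable "single_nat_gateway" {
  type        = bool
  description = "deploy one shared NAT Gateway instead of one per availability zone"
  default     = true
}

locals {
  # One NAT Gateway in total, or one per availability zone.
  nat_count = var.single_nat_gateway ? 1 : length(var.availability_zones)
}

resource "aws_eip" "nat_eip" {
  count = local.nat_count
  # As in the post; newer AWS provider versions use domain = "vpc" instead.
  vpc        = true
  depends_on = [aws_internet_gateway.main_igw]
}

resource "aws_nat_gateway" "natgw" {
  count         = local.nat_count
  allocation_id = aws_eip.nat_eip[count.index].id
  subnet_id     = aws_subnet.subnet_web[count.index].id
  depends_on    = [aws_internet_gateway.main_igw]
}

resource "aws_route" "private_subnet" {
  # Still one route per Computing Subnet Route Table.
  count                  = length(var.availability_zones)
  route_table_id         = aws_route_table.private_subnet[count.index].id
  destination_cidr_block = "0.0.0.0/0"
  # element() wraps around, so with a single NAT Gateway every route points to index 0.
  nat_gateway_id = element(aws_nat_gateway.natgw[*].id, count.index)
}
```

The trade-off is that all Computing Subnets lose internet access if the availability zone of the shared NAT Gateway fails, and traffic from the other zones crosses zone boundaries on its way out.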
Summary
Every AWS cloud engineer should be comfortable with the configuration of a VPC and should know the core components involved.
You will use VPCs for a large share of AWS services, and misconfiguring one can cause you a lot of trouble in the long run, so take your time and try to understand all components that we used in this post.
Here is a short recap of the things that we learned in this post
- Web Subnet - public subnet for publicly reachable resources such as load balancers or internet-facing APIs and frontend applications.
- Computing Subnet - private subnet for resources that need to communicate with the internet but should not be reachable from the internet.
- Data Subnet - private, isolated subnet for data services and data processing.
- Internet Gateway - connects the VPC to the internet and lets resources with public IPv4 addresses send and receive internet traffic.
- NAT Gateway - uses Port Address Translation, routes to the Internet Gateway, and makes sure that resources can communicate with the internet but are not reachable from the internet.