Deploying a highly available Vault cluster on Amazon EKS using Terraform
Chabane R.
Posted on April 15, 2021
Many companies moving to the cloud want to continue working with legacy tools to:
- avoid vendor lock-in,
- use existing skills and processes,
- take advantage of the multi-cloud strategy,
- and so on.
Among companies that have used Vault in their on-premises environment, many continue to use it after their migration to the cloud.
Vault is a tool for securely accessing secrets. A secret is anything that you want to tightly control access to, such as API keys, passwords, or certificates. Vault provides a unified interface to any secret, while providing tight access control and recording a detailed audit log. [1]
In this post, we will deploy a Vault cluster step by step on Amazon Elastic Kubernetes Service (Amazon EKS).
Using Terraform, we will deploy:
- A highly available architecture that spans three Availability Zones.
- A virtual private cloud (VPC) configured with public and private subnets according to AWS best practices.
- In the public subnets:
- Managed network address translation (NAT) gateways to allow outbound internet access for resources in the private subnets.
- In the private subnets:
- A group of Kubernetes nodes.
- An Amazon EKS cluster, which provides the Kubernetes control plane.
To deploy the Vault cluster, we create in AWS:
- An Elastic Load Balancer for the Vault UI.
- An AWS Certificate Manager (ACM) certificate for the Vault UI.
- A boot-vault IAM role to bootstrap the Vault servers.
- A vault-server IAM role for Vault to access AWS Key Management Service (AWS KMS) for auto unseal.
- AWS Secrets Manager to store the Vault on Amazon EKS root secret.
- An AWS KMS key for auto unseal.
In Kubernetes:
- A dedicated node group for Vault on Amazon EKS.
- A dedicated namespace for Vault on Amazon EKS.
- An internal Vault TLS certificate and certificate authority for securing communications.
- For the Vault service:
- Vault server pods.
- A Vault UI.
If you prefer to use AWS CloudFormation instead of Terraform, the equivalent workshop can be found in the aws-quickstart repository.
Prerequisites
- Install and configure the AWS CLI
- Install Terraform
- Install kubectl
- Install the Vault CLI
- Create a public hosted zone in Route 53. See tutorial
- Request a public certificate with AWS Certificate Manager. See tutorial
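The commands in the rest of this post reference a handful of environment variables. You can export them up front; the values below are placeholders to adapt to your own account, region, and domain:
export AWS_REGION="eu-west-1"
export AWS_ACCOUNT_ID="$(aws sts get-caller-identity --query Account --output text)"
export EKS_CLUSTER_NAME="security"
export PUBLIC_DNS_NAME="<PUBLIC_DNS_NAME>"
export ACM_VAULT_ARN="<ACM_VAULT_ARN>"
export TERRAFORM_BUCKET_NAME="<TERRAFORM_BUCKET_NAME>"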
Network
In this section, we create a VPC, three private and three public subnets, three NAT gateways, and an internet gateway.
plan/vpc.tf
resource "aws_vpc" "security" {
cidr_block = var.vpc_cidr_block
instance_tenancy = "default"
enable_dns_support = true
enable_dns_hostnames = true
tags = {
Environment = "core"
Name = "security"
}
lifecycle {
ignore_changes = [tags]
}
}
resource "aws_default_security_group" "defaul" {
vpc_id = aws_vpc.security.id
}
plan/subnet.tf
resource "aws_subnet" "private" {
for_each = {
for subnet in local.private_nested_config : "${subnet.name}" => subnet
}
vpc_id = aws_vpc.security.id
cidr_block = each.value.cidr_block
availability_zone = var.az[index(local.private_nested_config, each.value)]
map_public_ip_on_launch = false
tags = {
Environment = "security"
Name = each.value.name
"kubernetes.io/role/internal-elb" = 1
}
lifecycle {
ignore_changes = [tags]
}
}
resource "aws_subnet" "public" {
for_each = {
for subnet in local.public_nested_config : "${subnet.name}" => subnet
}
vpc_id = aws_vpc.security.id
cidr_block = each.value.cidr_block
availability_zone = var.az[index(local.public_nested_config, each.value)]
map_public_ip_on_launch = true
tags = {
Environment = "security"
Name = each.value.name
"kubernetes.io/role/elb" = 1
}
lifecycle {
ignore_changes = [tags]
}
}
plan/igw.tf
resource "aws_internet_gateway" "igw" {
vpc_id = aws_vpc.security.id
tags = {
Environment = "core"
Name = "igw-security"
}
}
resource "aws_route_table" "public" {
vpc_id = aws_vpc.security.id
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.igw.id
}
tags = {
Environment = "core"
Name = "rt-public-security"
}
}
resource "aws_route_table_association" "public" {
for_each = {
for subnet in local.public_nested_config : "${subnet.name}" => subnet
}
subnet_id = aws_subnet.public[each.value.name].id
route_table_id = aws_route_table.public.id
}
plan/nat.tf
resource "aws_eip" "nat" {
for_each = {
for subnet in local.public_nested_config : "${subnet.name}" => subnet
}
vpc = true
tags = {
Environment = "core"
Name = "eip-${each.value.name}"
}
}
resource "aws_nat_gateway" "nat-gw" {
for_each = {
for subnet in local.public_nested_config : "${subnet.name}" => subnet
}
allocation_id = aws_eip.nat[each.value.name].id
subnet_id = aws_subnet.public[each.value.name].id
tags = {
Environment = "core"
Name = "nat-${each.value.name}"
}
}
resource "aws_route_table" "private" {
for_each = {
for subnet in local.public_nested_config : "${subnet.name}" => subnet
}
vpc_id = aws_vpc.security.id
route {
cidr_block = "0.0.0.0/0"
nat_gateway_id = aws_nat_gateway.nat-gw[each.value.name].id
}
tags = {
Environment = "core"
Name = "rt-${each.value.name}"
}
}
resource "aws_route_table_association" "private" {
for_each = {
for subnet in local.private_nested_config : "${subnet.name}" => subnet
}
subnet_id = aws_subnet.private[each.value.name].id
route_table_id = aws_route_table.private[each.value.associated_public_subnet].id
}
Amazon EKS
In this section we create our Kubernetes cluster with the following settings:
- restrict API access to a specific IP range (for example, your office IPs) and to the NAT gateway IPs (so the cluster remains reachable from a CI/CD tool hosted in this VPC)
- enable all control plane log types
- enable IAM roles for service accounts
- security groups for the cluster
plan/eks-cluster.tf
resource "aws_eks_cluster" "security" {
name = var.eks_cluster_name
role_arn = aws_iam_role.eks.arn
version = "1.17"
vpc_config {
security_group_ids = [aws_security_group.eks_cluster.id]
endpoint_private_access = true
endpoint_public_access = true
public_access_cidrs = concat([var.authorized_source_ranges], [for n in aws_eip.nat : "${n.public_ip}/32"])
subnet_ids = concat([for s in aws_subnet.private : s.id], [for s in aws_subnet.public : s.id])
}
enabled_cluster_log_types = ["api", "audit", "authenticator", "controllerManager", "scheduler"]
depends_on = [
aws_iam_role_policy_attachment.eks-AmazonEKSClusterPolicy,
aws_iam_role_policy_attachment.eks-AmazonEKSVPCResourceController,
aws_iam_role_policy_attachment.eks-AmazonEKSServicePolicy
]
tags = {
Environment = "core"
}
}
resource "aws_iam_role" "eks" {
name = var.eks_cluster_name
assume_role_policy = <<EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "eks.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
EOF
}
data "tls_certificate" "cert" {
url = aws_eks_cluster.security.identity[0].oidc[0].issuer
}
resource "aws_iam_openid_connect_provider" "openid" {
client_id_list = ["sts.amazonaws.com"]
thumbprint_list = [data.tls_certificate.cert.certificates[0].sha1_fingerprint]
url = aws_eks_cluster.security.identity[0].oidc[0].issuer
}
resource "aws_iam_role_policy_attachment" "eks-AmazonEKSClusterPolicy" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
role = aws_iam_role.eks.name
}
resource "aws_iam_role_policy_attachment" "eks-AmazonEKSServicePolicy" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEKSServicePolicy"
role = aws_iam_role.eks.name
}
resource "aws_security_group" "eks_cluster" {
name = "${var.eks_cluster_name}/ControlPlaneSecurityGroup"
description = "Communication between the control plane and worker nodegroups"
vpc_id = aws_vpc.security.id
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = "${var.eks_cluster_name}/ControlPlaneSecurityGroup"
}
}
resource "aws_security_group_rule" "cluster_inbound" {
description = "Allow unmanaged nodes to communicate with control plane (all ports)"
from_port = 0
protocol = "-1"
security_group_id = aws_eks_cluster.security.vpc_config[0].cluster_security_group_id
source_security_group_id = aws_security_group.eks_nodes.id
to_port = 0
type = "ingress"
}
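Once the stack is applied (see the Deployment section below), you can verify that the endpoint restrictions and logging took effect, for example:
aws eks describe-cluster --name security \
--query 'cluster.{status:status,publicAccessCidrs:resourcesVpcConfig.publicAccessCidrs,logging:logging.clusterLogging}'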
Here we create two node groups: one private and one public.
plan/eks-nodegroup.tf
resource "aws_eks_node_group" "private" {
cluster_name = aws_eks_cluster.security.name
node_group_name = "private-node-group-security"
node_role_arn = aws_iam_role.node-group.arn
subnet_ids = [for s in aws_subnet.private : s.id]
labels = {
"type" = "private"
}
instance_types = ["t3.small"]
scaling_config {
desired_size = 3
max_size = 5
min_size = 3
}
depends_on = [
aws_iam_role_policy_attachment.node-group-AmazonEKSWorkerNodePolicy,
aws_iam_role_policy_attachment.node-group-AmazonEKS_CNI_Policy,
aws_iam_role_policy_attachment.node-group-AmazonEC2ContainerRegistryReadOnly
]
tags = {
Environment = "core"
}
}
resource "aws_eks_node_group" "public" {
cluster_name = aws_eks_cluster.security.name
node_group_name = "public-node-group-security"
node_role_arn = aws_iam_role.node-group.arn
subnet_ids = [for s in aws_subnet.public : s.id]
labels = {
"type" = "public"
}
instance_types = ["t3.small"]
scaling_config {
desired_size = 1
max_size = 3
min_size = 1
}
depends_on = [
aws_iam_role_policy_attachment.node-group-AmazonEKSWorkerNodePolicy,
aws_iam_role_policy_attachment.node-group-AmazonEKS_CNI_Policy,
aws_iam_role_policy_attachment.node-group-AmazonEC2ContainerRegistryReadOnly,
]
tags = {
Environment = "core"
}
}
resource "aws_iam_role" "node-group" {
name = "eks-node-group-role-security"
assume_role_policy = jsonencode({
Statement = [{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "ec2.amazonaws.com"
}
}]
Version = "2012-10-17"
})
}
resource "aws_iam_role_policy_attachment" "node-group-AmazonEKSWorkerNodePolicy" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
role = aws_iam_role.node-group.name
}
resource "aws_iam_role_policy_attachment" "node-group-AmazonEKS_CNI_Policy" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
role = aws_iam_role.node-group.name
}
resource "aws_iam_role_policy_attachment" "node-group-AmazonEC2ContainerRegistryReadOnly" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
role = aws_iam_role.node-group.name
}
resource "aws_iam_role_policy" "node-group-ClusterAutoscalerPolicy" {
name = "eks-cluster-auto-scaler"
role = aws_iam_role.node-group.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = [
"autoscaling:DescribeAutoScalingGroups",
"autoscaling:DescribeAutoScalingInstances",
"autoscaling:DescribeLaunchConfigurations",
"autoscaling:DescribeTags",
"autoscaling:SetDesiredCapacity",
"autoscaling:TerminateInstanceInAutoScalingGroup"
]
Effect = "Allow"
Resource = "*"
},
]
})
}
resource "aws_security_group" "eks_nodes" {
name = "${var.eks_cluster_name}/ClusterSharedNodeSecurityGroup"
description = "Communication between all nodes in the cluster"
vpc_id = aws_vpc.security.id
ingress {
from_port = 0
to_port = 0
protocol = "-1"
self = true
}
ingress {
from_port = 0
to_port = 0
protocol = "-1"
security_groups = [aws_eks_cluster.security.vpc_config[0].cluster_security_group_id]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = "${var.eks_cluster_name}/ClusterSharedNodeSecurityGroup"
Environment = "core"
}
}
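After deployment, the type label attached to each node group makes it easy to check that the expected nodes have registered:
kubectl get nodes -L type -L eks.amazonaws.com/nodegroup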
Vault
In this section, we create the AWS resources needed to allow the Vault cluster to access Secrets Manager, CloudWatch Logs, and KMS keys. We also create a record set in Route 53 to access the Vault UI, and upload the necessary scripts to an S3 bucket.
plan/vault.tf
resource "aws_iam_role" "vault-unseal" {
name = "vault-unseal"
assume_role_policy = jsonencode({
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Federated": aws_iam_openid_connect_provider.openid.arn
},
"Action": "sts:AssumeRoleWithWebIdentity",
"Condition": {
"StringEquals": {
"${replace(aws_iam_openid_connect_provider.openid.url, "https://", "")}:sub": "system:serviceaccount:vault-server:vault"
}
}
}
]
})
tags = {
Environment = "core"
}
}
resource "aws_iam_role_policy" "vault-unseal" {
name = "vault-unseal"
role = aws_iam_role.vault-unseal.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = [
"iam:GetRole",
]
Effect = "Allow"
Resource = "arn:aws:secretsmanager:${var.region}:${data.aws_caller_identity.current.account_id}:role/vault-unseal"
},
{
Action = [
"kms:*",
]
Effect = "Allow"
Resource = "*"
}
]
})
}
resource "aws_iam_role" "vault" {
name = "vault"
assume_role_policy = jsonencode({
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Federated": aws_iam_openid_connect_provider.openid.arn
},
"Action": "sts:AssumeRoleWithWebIdentity",
"Condition": {
"StringEquals": {
"${replace(aws_iam_openid_connect_provider.openid.url, "https://", "")}:sub": "system:serviceaccount:vault-server:boot-vault"
}
}
}
]
})
tags = {
Environment = "core"
}
}
resource "aws_iam_role_policy" "vault" {
name = "vault"
role = aws_iam_role.vault.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = [
"logs:CreateLogStream",
"logs:DescribeLogStreams"
]
Effect = "Allow"
Resource = "arn:aws:logs:${var.region}:${data.aws_caller_identity.current.account_id}:log-group:vault-audit-logs"
},
{
Action = [
"logs:PutLogEvents",
]
Effect = "Allow"
Resource = "arn:aws:logs:${var.region}:${data.aws_caller_identity.current.account_id}:log-group:vault-audit-logs:log-stream:*"
},
{
Action = [
"ec2:DescribeInstances",
]
Effect = "Allow"
Resource = "*"
},
{
Action = [
"s3:*",
]
Effect = "Allow"
Resource = "*"
},
{
Action = [
"secretsmanager:UpdateSecretVersionStage",
"secretsmanager:UpdateSecret",
"secretsmanager:PutSecretValue",
"secretsmanager:GetSecretValue"
]
Effect = "Allow"
Resource = aws_secretsmanager_secret.vault-secret.arn
},
{
Action = [
"iam:GetRole"
]
Effect = "Allow"
Resource = "arn:aws:secretsmanager:${var.region}:${data.aws_caller_identity.current.account_id}:role/vault"
}
]
})
}
resource "aws_kms_key" "vault-kms" {
description = "Vault Seal/Unseal key"
deletion_window_in_days = 7
policy = <<EOT
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "Enable IAM User Permissions",
"Action": [
"kms:*"
],
"Principal": {
"AWS": "arn:aws:iam::${data.aws_caller_identity.current.account_id}:root"
},
"Effect": "Allow",
"Resource": "*"
},
{
"Sid": "Allow administration of the key",
"Action": [
"kms:Create*",
"kms:Describe*",
"kms:Enable*",
"kms:List*",
"kms:Put*",
"kms:Update*",
"kms:Revoke*",
"kms:Disable*",
"kms:Get*",
"kms:Delete*",
"kms:ScheduleKeyDeletion",
"kms:CancelKeyDeletion"
],
"Effect": "Allow",
"Resource": "arn:aws:iam::${data.aws_caller_identity.current.account_id}:root",
"Principal": {
"AWS": [
"arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/vault",
"arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/vault-unseal"
]
}
},
{
"Sid": "Allow use of the key",
"Action": [
"kms:DescribeKey",
"kms:Encrypt",
"kms:Decrypt",
"kms:ReEncrypt*",
"kms:GenerateDataKey",
"kms:GenerateDataKeyWithoutPlaintext"
],
"Principal": {
"AWS": [
"arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/vault",
"arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/vault-unseal"
]
},
"Effect": "Allow",
"Resource": "*"
}
]
}
EOT
}
resource "random_string" "vault-secret-suffix" {
length = 5
special = false
upper = false
}
resource "aws_secretsmanager_secret" "vault-secret" {
name = "vault-secret-${random_string.vault-secret-suffix.result}"
kms_key_id = aws_kms_key.vault-kms.key_id
description = "Vault Root/Recovery key"
}
resource "aws_route53_record" "vault" {
zone_id = data.aws_route53_zone.public.zone_id
name = "vault.${var.public_dns_name}"
type = "CNAME"
ttl = "300"
records = [data.kubernetes_service.vault-ui.status.0.load_balancer.0.ingress.0.hostname]
depends_on = [
kubernetes_job.vault-initialization,
helm_release.vault,
data.kubernetes_service.vault-ui
]
}
resource "aws_s3_bucket" "vault-scripts" {
bucket = "bucket-${data.aws_caller_identity.current.account_id}-${var.region}-vault-scripts"
acl = "private"
tags = {
Name = "Vault Scripts"
Environment = "core"
}
}
resource "aws_s3_bucket_object" "vault-script-bootstrap" {
bucket = aws_s3_bucket.vault-scripts.id
key = "scripts/bootstrap.sh"
source = "scripts/bootstrap.sh"
etag = filemd5("scripts/bootstrap.sh")
}
resource "aws_s3_bucket_object" "vault-script-certificates" {
bucket = aws_s3_bucket.vault-scripts.id
key = "scripts/certificates.sh"
source = "scripts/certificates.sh"
etag = filemd5("scripts/certificates.sh")
}
Here we create our Kubernetes resources to initialize and deploy the Vault cluster.
plan/k8s.tf
resource "kubernetes_namespace" "vault-server" {
metadata {
name = "vault-server"
}
}
data "template_file" "vault-values" {
template = <<EOF
global:
tlsDisable: false
ui:
enabled: true
externalPort: 443
serviceType: "LoadBalancer"
loadBalancerSourceRanges:
- ${var.authorized_source_ranges}
- ${aws_eip.nat["public-security-1"].public_ip}/32
- ${aws_eip.nat["public-security-2"].public_ip}/32
- ${aws_eip.nat["public-security-3"].public_ip}/32
annotations: |
service.beta.kubernetes.io/aws-load-balancer-ssl-cert: ${var.acm_vault_arn}
service.beta.kubernetes.io/aws-load-balancer-backend-protocol: https
service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "443,8200"
service.beta.kubernetes.io/do-loadbalancer-healthcheck-path: "/ui/"
service.beta.kubernetes.io/aws-load-balancer-internal: "false"
external-dns.alpha.kubernetes.io/hostname: "vault.${var.public_dns_name}"
external-dns.alpha.kubernetes.io/ttl: "30"
server:
nodeSelector: |
eks.amazonaws.com/nodegroup: private-node-group-security
extraEnvironmentVars:
VAULT_CACERT: /vault/userconfig/vault-server-tls/vault.ca
extraVolumes:
- type: secret
name: vault-server-tls
image:
repository: "vault"
tag: "1.6.0"
logLevel: "debug"
serviceAccount:
annotations: |
eks.amazonaws.com/role-arn: "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/vault-unseal"
extraEnvironmentVars:
AWS_ROLE_SESSION_NAME: some_name
ha:
enabled: true
nodes: 3
raft:
enabled: true
setNodeId: true
config: |
ui = true
listener "tcp" {
tls_disable = 0
tls_cert_file = "/vault/userconfig/vault-server-tls/vault.crt"
tls_key_file = "/vault/userconfig/vault-server-tls/vault.key"
tls_client_ca_file = "/vault/userconfig/vault-server-tls/vault.ca"
address = "[::]:8200"
cluster_address = "[::]:8201"
}
storage "raft" {
path = "/vault/data"
}
service_registration "kubernetes" {}
seal "awskms" {
region = "${var.region}"
kms_key_id = "${aws_kms_key.vault-kms.key_id}"
}
EOF
}
resource "helm_release" "vault" {
name = "vault"
chart = "hashicorp/vault"
values = [data.template_file.vault-values.rendered]
namespace = "vault-server"
depends_on = [kubernetes_job.vault-certificate]
}
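For reference, this resource does roughly what you would do by hand with the Helm CLI (a sketch, assuming the values template above has been rendered to a local values.yaml):
helm repo add hashicorp https://helm.releases.hashicorp.com
helm install vault hashicorp/vault --namespace vault-server --values values.yaml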
resource "kubernetes_cluster_role" "boot-vault" {
metadata {
name = "boot-vault"
}
rule {
api_groups = [""]
resources = ["pods/exec", "pods", "pods/log", "secrets", "tmp/secrets"]
verbs = ["get", "list", "create"]
}
rule {
api_groups = ["certificates.k8s.io"]
resources = ["certificatesigningrequests", "certificatesigningrequests/approval"]
verbs = ["get", "list", "create", "update"]
}
}
resource "kubernetes_service_account" "boot-vault" {
metadata {
name = "boot-vault"
namespace = "vault-server"
labels = {
"app.kubernetes.io/name" = "boot-vault"
}
annotations = {
"eks.amazonaws.com/role-arn" = aws_iam_role.vault.arn
}
}
}
resource "kubernetes_job" "vault-initialization" {
metadata {
name = "boot-vault"
namespace = "vault-server"
}
spec {
template {
metadata {}
spec {
container {
name = "boot-vault"
image = "amazonlinux"
command = ["/bin/bash","-c"]
args = ["sleep 15; yum install -y awscli 2>&1 > /dev/null; export AWS_REGION=${var.region}; aws sts get-caller-identity; aws s3 cp $(S3_SCRIPT_URL) ./script.sh; chmod +x ./script.sh; ./script.sh"]
env {
name = "S3_SCRIPT_URL"
value = "s3://${aws_s3_bucket.vault-scripts.id}/scripts/bootstrap.sh"
}
env {
name = "VAULT_SECRET"
value = aws_secretsmanager_secret.vault-secret.arn
}
}
service_account_name = "boot-vault"
restart_policy = "Never"
}
}
backoff_limit = 0
}
depends_on = [
kubernetes_job.vault-certificate,
helm_release.vault,
aws_s3_bucket_object.vault-script-bootstrap
]
}
resource "kubernetes_job" "vault-certificate" {
metadata {
name = "certificate-vault"
namespace = "vault-server"
}
spec {
template {
metadata {}
spec {
container {
name = "certificate-vault"
image = "amazonlinux"
command = ["/bin/bash","-c"]
args = ["sleep 15; yum install -y awscli 2>&1 > /dev/null; export AWS_REGION=${var.region}; export NAMESPACE='vault-server'; aws sts get-caller-identity; aws s3 cp $(S3_SCRIPT_URL) ./script.sh; chmod +x ./script.sh; ./script.sh"]
env {
name = "S3_SCRIPT_URL"
value = "s3://${aws_s3_bucket.vault-scripts.id}/scripts/certificates.sh"
}
}
service_account_name = "boot-vault"
restart_policy = "Never"
}
}
backoff_limit = 0
}
depends_on = [
aws_eks_node_group.private,
aws_s3_bucket_object.vault-script-certificates
]
}
resource "kubernetes_cluster_role_binding" "boot-vault" {
metadata {
name = "boot-vault"
labels = {
"app.kubernetes.io/name": "boot-vault"
}
}
role_ref {
api_group = "rbac.authorization.k8s.io"
kind = "ClusterRole"
name = "boot-vault"
}
subject {
kind = "ServiceAccount"
name = "boot-vault"
namespace = "vault-server"
}
}
data "kubernetes_service" "vault-ui" {
metadata {
name = "vault-ui"
namespace = "vault-server"
}
depends_on = [
kubernetes_job.vault-initialization,
helm_release.vault
]
}
The following script is used to create the vault-server-tls certificate.
plan/scripts/certificates.sh
#!/bin/bash -e
# SERVICE is the name of the Vault service in Kubernetes.
# It does not have to match the actual running service, though it may help for consistency.
SERVICE=vault
SECRET_NAME=vault-server-tls
# TMPDIR is a temporary working directory.
TMPDIR=/tmp
# Sleep timer
SLEEP_TIME=15
# Name of the CSR
echo "Name the CSR: vault-csr"
export CSR_NAME=vault-csr
# Install OpenSSL
echo "Install openssl"
yum install -y openssl 2>&1
# Install Kubernetes cli
echo "Install Kubernetes cli"
curl -o kubectl https://amazon-eks.s3.us-west-2.amazonaws.com/1.16.8/2020-04-16/bin/linux/amd64/kubectl
chmod +x ./kubectl
mkdir -p $HOME/bin && cp ./kubectl $HOME/bin/kubectl && export PATH=$PATH:$HOME/bin
kubectl version --short --client
# Create a private key
echo "Generate certificate Private key"
openssl genrsa -out ${TMPDIR}/vault.key 2048
# Create CSR
echo "Create CSR file"
cat <<EOF >${TMPDIR}/csr.conf
[req]
req_extensions = v3_req
distinguished_name = req_distinguished_name
[req_distinguished_name]
[ v3_req ]
basicConstraints = CA:FALSE
keyUsage = nonRepudiation, digitalSignature, keyEncipherment
extendedKeyUsage = serverAuth
subjectAltName = @alt_names
[alt_names]
DNS.1 = ${SERVICE}
DNS.2 = ${SERVICE}.${NAMESPACE}
DNS.3 = ${SERVICE}.${NAMESPACE}.svc
DNS.4 = ${SERVICE}.${NAMESPACE}.svc.cluster.local
DNS.5 = vault-0.vault-internal
DNS.6 = vault-1.vault-internal
DNS.7 = vault-2.vault-internal
IP.1 = 127.0.0.1
EOF
# Sign the CSR
echo "Sign the CSR"
openssl req -new -key ${TMPDIR}/vault.key -subj "/CN=${SERVICE}.${NAMESPACE}.svc" -out ${TMPDIR}/server.csr -config ${TMPDIR}/csr.conf
echo "Create a CSR Manifest file"
cat <<EOF >${TMPDIR}/csr.yaml
apiVersion: certificates.k8s.io/v1beta1
kind: CertificateSigningRequest
metadata:
name: ${CSR_NAME}
spec:
groups:
- system:authenticated
request: $(cat ${TMPDIR}/server.csr | base64 | tr -d '\n')
usages:
- digital signature
- key encipherment
- server auth
EOF
echo "Create CSR from manifest file"
kubectl create -f ${TMPDIR}/csr.yaml
sleep ${SLEEP_TIME}
echo "Fetch the CSR from kubernetes"
kubectl get csr ${CSR_NAME}
# Approve Cert
echo "Approve the Certificate"
kubectl certificate approve ${CSR_NAME}
serverCert=$(kubectl get csr ${CSR_NAME} -o jsonpath='{.status.certificate}')
echo "${serverCert}" | openssl base64 -d -A -out ${TMPDIR}/vault.crt
echo "Fetch Kubernetes CA Certificate"
kubectl get secret -o jsonpath="{.items[?(@.type==\"kubernetes.io/service-account-token\")].data['ca\.crt']}" | base64 --decode > ${TMPDIR}/vault.ca 2>/dev/null || true
echo "Create secret containing the TLS Certificates and key"
# Echo the full command into the job logs before executing it below
echo kubectl create secret generic ${SECRET_NAME} \
--namespace ${NAMESPACE} \
--from-file=vault.key=${TMPDIR}/vault.key \
--from-file=vault.crt=${TMPDIR}/vault.crt \
--from-file=vault.ca=${TMPDIR}/vault.ca
kubectl create secret generic ${SECRET_NAME} \
--namespace ${NAMESPACE} \
--from-file=vault.key=${TMPDIR}/vault.key \
--from-file=vault.crt=${TMPDIR}/vault.crt \
--from-file=vault.ca=${TMPDIR}/vault.ca
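If the certificate-vault job succeeds, the CSR should show as approved and the TLS secret should exist in the namespace. You can verify both by hand:
kubectl get csr vault-csr
kubectl -n vault-server get secret vault-server-tls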
The following script is used to initialize Vault and join the other pods to the Raft cluster.
plan/scripts/bootstrap.sh
#!/bin/bash
VAULT_NUMBER_OF_KEYS_FOR_UNSEAL=3
VAULT_NUMBER_OF_KEYS=5
SLEEP_SECONDS=15
PROTOCOL=https
VAULT_PORT=8200
VAULT_0=vault-0.vault-internal
get_secret () {
local value=$(aws secretsmanager --region ${AWS_REGION} get-secret-value --secret-id "$1" | jq --raw-output .SecretString)
echo $value
}
# Install JQ as we use it later on
yum install -y jq 2>&1 >/dev/null
# Give the Helm chart a chance to get started
echo "Sleeping for ${SLEEP_SECONDS} seconds"
sleep ${SLEEP_SECONDS} # Allow helm chart some time
# Install Kubernetes cli
curl -o kubectl https://amazon-eks.s3.us-west-2.amazonaws.com/1.16.8/2020-04-16/bin/linux/amd64/kubectl
chmod +x ./kubectl
mkdir -p $HOME/bin && cp ./kubectl $HOME/bin/kubectl && export PATH=$PATH:$HOME/bin
kubectl version --short --client
until curl -k -fs -o /dev/null ${PROTOCOL}://${VAULT_0}:8200/v1/sys/init; do
echo "Waiting for Vault to start..."
sleep 1
done
# See if vault is initialized (the API field is "initialized")
init=$(curl -fs -k ${PROTOCOL}://${VAULT_0}:8200/v1/sys/init | jq -r .initialized)
echo "Is vault initialized: '${init}'"
if [ "$init" == "false" ]; then
echo "Initializing Vault"
SECRET_VALUE=$(kubectl exec vault-0 -- "/bin/sh" "-c" "export VAULT_SKIP_VERIFY=true && vault operator init -recovery-shares=${VAULT_NUMBER_OF_KEYS} -recovery-threshold=${VAULT_NUMBER_OF_KEYS_FOR_UNSEAL}")
echo "storing vault init values in secrets manager"
aws secretsmanager put-secret-value --region ${AWS_REGION} --secret-id ${VAULT_SECRET} --secret-string "${SECRET_VALUE}"
else
echo "Vault is already initialized"
fi
sealed=$(curl -fs -k ${PROTOCOL}://${VAULT_0}:8200/v1/sys/seal-status | jq -r .sealed)
# Should Auto unseal using KMS but this is for demonstration for manual unseal
if [ "$sealed" == "true" ]; then
VAULT_SECRET_VALUE=$(get_secret ${VAULT_SECRET})
root_token=$(echo ${VAULT_SECRET_VALUE} | awk '{ if (match($0,/Initial Root Token: (.*)/,m)) print m[1] }' | cut -d " " -f 1)
# Brace expansion does not work with variables, so use seq
for UNSEAL_KEY_INDEX in $(seq 1 ${VAULT_NUMBER_OF_KEYS_FOR_UNSEAL})
do
unseal_key+=($(echo ${VAULT_SECRET_VALUE} | awk '{ if (match($0,/Recovery Key '${UNSEAL_KEY_INDEX}': (.*)/,m)) print m[1] }'| cut -d " " -f 1))
done
echo "Unsealing Vault"
# Handle variable number of unseal keys
for UNSEAL_KEY_INDEX in $(seq 1 ${VAULT_NUMBER_OF_KEYS_FOR_UNSEAL})
do
# Bash arrays are zero-indexed
kubectl exec vault-0 -- vault operator unseal ${unseal_key[$((UNSEAL_KEY_INDEX-1))]}
done
else
echo "Vault is already unsealed"
fi
VAULT_SECRET_VALUE=$(get_secret ${VAULT_SECRET})
root_token=$(echo ${VAULT_SECRET_VALUE} | awk '{ if (match($0,/Initial Root Token: (.*)/,m)) print m[1] }' | cut -d " " -f 1)
# Log in with the root token so we can run raft commands
kubectl exec vault-0 -- "/bin/sh" "-c" "export VAULT_SKIP_VERIFY=true && vault login token=$root_token 2>&1 > /dev/null" # Hide this output from the console
# Join other pods to the raft cluster
kubectl exec -t vault-1 -- "/bin/sh" "-c" "vault operator raft join -tls-skip-verify -leader-ca-cert=\"$(cat /var/run/secrets/kubernetes.io/serviceaccount/ca.crt)\" ${PROTOCOL}://${VAULT_0}:${VAULT_PORT}"
kubectl exec -t vault-2 -- "/bin/sh" "-c" "vault operator raft join -tls-skip-verify -leader-ca-cert=\"$(cat /var/run/secrets/kubernetes.io/serviceaccount/ca.crt)\" ${PROTOCOL}://${VAULT_0}:${VAULT_PORT}"
# Show who we have joined
kubectl exec -t vault-0 -- "/bin/sh" "-c" "export VAULT_SKIP_VERIFY=true && vault operator raft list-peers"
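If the boot-vault job fails partway through, the same checks can be replayed manually. For instance, to inspect the seal status (no token required):
kubectl -n vault-server exec vault-0 -- sh -c 'export VAULT_SKIP_VERIFY=true && vault status'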
Deployment
We've finished creating our Terraform files; let's get ready for deployment!
plan/main.tf
data "aws_caller_identity" "current" {}
data "aws_route53_zone" "public" {
name = "${var.public_dns_name}."
}
plan/output.tf
output "eks-endpoint" {
value = aws_eks_cluster.security.endpoint
}
output "kubeconfig-certificate-authority-data" {
value = aws_eks_cluster.security.certificate_authority[0].data
}
output "eks_issuer_url" {
value = aws_iam_openid_connect_provider.openid.url
}
output "vault_secret_name" {
value = "vault-secret-${random_string.vault-secret-suffix.result}"
}
output "nat1_ip" {
value = aws_eip.nat["public-security-1"].public_ip
}
output "nat2_ip" {
value = aws_eip.nat["public-security-2"].public_ip
}
output "nat3_ip" {
value = aws_eip.nat["public-security-3"].public_ip
}
plan/variables.tf
variable "region" {
type = string
}
variable "az" {
type = list(string)
default = ["eu-west-1a", "eu-west-1b", "eu-west-1c"]
}
variable "vpc_cidr_block" {
type = string
}
variable "eks_cluster_name" {
type = string
default = "security"
}
variable "acm_vault_arn" {
type = string
}
variable "private_network_config" {
type = map(object({
cidr_block = string
associated_public_subnet = string
}))
default = {
"private-security-1" = {
cidr_block = "10.0.0.0/23"
associated_public_subnet = "public-security-1"
},
"private-security-2" = {
cidr_block = "10.0.2.0/23"
associated_public_subnet = "public-security-2"
},
"private-security-3" = {
cidr_block = "10.0.4.0/23"
associated_public_subnet = "public-security-3"
}
}
}
locals {
private_nested_config = flatten([
for name, config in var.private_network_config : [
{
name = name
cidr_block = config.cidr_block
associated_public_subnet = config.associated_public_subnet
}
]
])
}
variable "public_network_config" {
type = map(object({
cidr_block = string
}))
default = {
"public-security-1" = {
cidr_block = "10.0.8.0/23"
},
"public-security-2" = {
cidr_block = "10.0.10.0/23"
},
"public-security-3" = {
cidr_block = "10.0.12.0/23"
}
}
}
locals {
public_nested_config = flatten([
for name, config in var.public_network_config : [
{
name = name
cidr_block = config.cidr_block
}
]
])
}
variable "public_dns_name" {
type = string
}
variable "authorized_source_ranges" {
type = string
description = "Addresses or CIDR blocks which are allowed to connect to the Vault IP address. The default behavior is to allow anyone (0.0.0.0/0) access. You should restrict access to external IPs that need to access the Vault cluster."
default = "0.0.0.0/0"
}
plan/backend.tf
terraform {
backend "s3" {
}
}
plan/versions.tf
terraform {
required_version = ">= 0.12"
}
plan/provider.tf
provider "aws" {
region = var.region
}
provider "kubernetes" {
host = aws_eks_cluster.security.endpoint
cluster_ca_certificate = base64decode(
aws_eks_cluster.security.certificate_authority[0].data
)
exec {
api_version = "client.authentication.k8s.io/v1alpha1"
args = ["eks", "get-token", "--cluster-name", var.eks_cluster_name]
command = "aws"
}
}
provider "helm" {
kubernetes {
host = aws_eks_cluster.security.endpoint
cluster_ca_certificate = base64decode(
aws_eks_cluster.security.certificate_authority[0].data
)
exec {
api_version = "client.authentication.k8s.io/v1alpha1"
args = ["eks", "get-token", "--cluster-name", var.eks_cluster_name]
command = "aws"
}
}
}
plan/terraform.tfvars
az = ["<AWS_REGION>a", "<AWS_REGION>b", "<AWS_REGION>c"]
region = "<AWS_REGION>"
acm_vault_arn = "<ACM_VAULT_ARN>"
vpc_cidr_block = "10.0.0.0/16"
public_dns_name = "<PUBLIC_DNS_NAME>"
authorized_source_ranges = "<LOCAL_IP_RANGES>"
Initialize the AWS security infrastructure. The Terraform state will be stored in the S3 bucket:
terraform init \
-backend-config="bucket=$TERRAFORM_BUCKET_NAME" \
-backend-config="key=security/terraform-state" \
-backend-config="region=$AWS_REGION"
Complete plan/terraform.tfvars and run:
sed -i "s/<LOCAL_IP_RANGES>/$(curl -s http://checkip.amazonaws.com/)\/32/g; s/<PUBLIC_DNS_NAME>/${PUBLIC_DNS_NAME}/g; s/<AWS_ACCOUNT_ID>/${AWS_ACCOUNT_ID}/g; s/<AWS_REGION>/${AWS_REGION}/g; s/<EKS_CLUSTER_NAME>/${EKS_CLUSTER_NAME}/g; s,<ACM_VAULT_ARN>,${ACM_VAULT_ARN},g;" terraform.tfvars
terraform apply
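Tip: if you want to review the changes before applying them, you can always run a plan first:
terraform plan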
Access the EKS cluster using:
aws eks --region $AWS_REGION update-kubeconfig --name $EKS_CLUSTER_NAME
kubectl config set-context --current --namespace=vault-server
Set Vault's address and the initial root token.
cd plan
export VAULT_ADDR="https://vault.${PUBLIC_DNS_NAME}"
export VAULT_TOKEN="$(aws secretsmanager get-secret-value --secret-id $(terraform output vault_secret_name) --version-stage AWSCURRENT --query SecretString --output text | grep "Initial Root Token: " | awk -F ': ' '{print $2}')"
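Before going any further, a quick status call confirms that the CLI can reach the cluster through the load balancer and that KMS auto-unseal worked (Sealed should be false and the storage type raft):
vault status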
Check that the jobs have completed and all the pods are running:
$ kubectl get jobs
NAME COMPLETIONS DURATION AGE
boot-vault 1/1 54s 28m
certificate-vault 1/1 55s 39m
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
boot-vault-4j76p 0/1 Completed 0 6m17s
certificate-vault-znwfb 0/1 Completed 0 17m
vault-0 1/1 Running 0 6m42s
vault-1 1/1 Running 0 6m42s
vault-2 1/1 Running 0 6m41s
vault-agent-injector-7d65f7875f-k8zgv 1/1 Running 0 6m42s
$ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
vault ClusterIP 172.20.116.147 <none> 8200/TCP,8201/TCP 7m39s
vault-active ClusterIP 172.20.213.40 <none> 8200/TCP,8201/TCP 7m39s
vault-agent-injector-svc ClusterIP 172.20.182.101 <none> 443/TCP 7m39s
vault-internal ClusterIP None <none> 8200/TCP,8201/TCP 7m39s
vault-standby ClusterIP 172.20.167.47 <none> 8200/TCP,8201/TCP 7m39s
vault-ui LoadBalancer 172.20.22.192 a7442caffb7f74b1ea2eb40bd5f432ef-694516578.eu-west-1.elb.amazonaws.com 443:32363/TCP 7m39s
$ kubectl get secrets
NAME TYPE DATA AGE
boot-vault-token-nq8qm kubernetes.io/service-account-token 3 45m
default-token-6qjw8 kubernetes.io/service-account-token 3 45m
sh.helm.release.v1.vault.v1 helm.sh/release.v1 1 27m
vault-agent-injector-token-p6ktz kubernetes.io/service-account-token 3 27m
vault-server-tls Opaque 3 36m
vault-token-p9gqj kubernetes.io/service-account-token 3 27m
$ kubectl get sa
NAME SECRETS AGE
boot-vault 1 47m
default 1 47m
vault 1 29m
vault-agent-injector 1 29m
$ kubectl get role
NAME AGE
vault-discovery-role 30m
$ kubectl get rolebinding
NAME AGE
vault-discovery-rolebinding 30m
$ kubectl get certificatesigningrequests
NAME AGE REQUESTOR CONDITION
csr-5vqrf 43m system:node:ip-10-0-0-59.eu-west-1.compute.internal Approved,Issued
csr-6klsj 43m system:node:ip-10-0-5-29.eu-west-1.compute.internal Approved,Issued
csr-chh42 43m system:node:ip-10-0-10-214.eu-west-1.compute.internal Approved,Issued
csr-pm5jd 43m system:node:ip-10-0-2-39.eu-west-1.compute.internal Approved,Issued
vault-csr 37m system:serviceaccount:vault-server:boot-vault Approved,Issued
Let's create credentials:
ACCESS_KEY=ACCESS_KEY
SECRET_KEY=SECRET_KEY
PROJECT_NAME=web
$ vault secrets enable -path=company/projects/${PROJECT_NAME} -version=2 kv
Success! Enabled the kv secrets engine at: company/projects/web/
$ vault kv put company/projects/${PROJECT_NAME}/credentials/access key="$ACCESS_KEY"
Key Value
--- -----
created_time 2021-04-15T12:43:48.024422363Z
deletion_time n/a
destroyed false
version 1
$ vault kv put company/projects/${PROJECT_NAME}/credentials/secret key="$SECRET_KEY"
Key Value
--- -----
created_time 2021-04-15T12:44:01.270353488Z
deletion_time n/a
destroyed false
version 1
Create the policy named my-policy with the contents from stdin:
$ vault policy write my-policy - <<EOF
# Read-only permissions
path "company/projects/${PROJECT_NAME}/*" {
capabilities = [ "read" ]
}
EOF
Success! Uploaded policy: my-policy
Create a token and add the my-policy policy:
VAULT_TOKEN=$(vault token create -policy=my-policy | grep "token" | awk 'NR==1{print $2}')
Now we can retrieve our credentials:
$ vault kv get -field=key company/projects/${PROJECT_NAME}/credentials/access
ACCESS_KEY
$ vault kv get -field=key company/projects/${PROJECT_NAME}/credentials/secret
SECRET_KEY
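When you are done experimenting, you can tear the whole stack down from the plan directory to avoid unnecessary costs:
terraform destroy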
That's it!
The source code is available on Gitlab.
Conclusion
In this article, we saw how to deploy a highly available Vault cluster on Amazon EKS using Terraform.
Hope you enjoyed reading this blog post.
If you have any questions or feedback, please feel free to leave a comment.
Thanks for reading!
Documentation
[1] What is Vault? - https://www.vaultproject.io/docs/what-is-vault