AWS Systems Manager (SSM) Cross Region Replication
Katherine (she/her)
Posted on January 12, 2022
Overview of SSM Replication
This blog post will explain in detail how to set up cross region replication for AWS Parameter Store. As of the writing of this blog post, AWS does not have a native feature for replicating parameters in SSM. If you are using SSM Parameter Store instead of Secrets Manager and are seeking a way to replicate parameters for DR/Multi-Region purposes, this post may help you.
Diagram showing the architecture setup:
Serverless Framework Setup
I used the Lamby cookie-cutter as the framework for this Lambda, which made a lot of the initial setup very easy! Please take a look at that site and set up your serverless framework for the work ahead. I will first share the CloudFormation template, then share the code that makes the replication work and explain in detail what's happening.
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: AWS SSM regional replication for multi-region setup
Parameters:
  StageEnv:
    Type: String
    Default: dev
    AllowedValues:
      - test
      - dev
      - staging
      - prod
Mappings:
  KmsMap:
    us-east-1:
      dev: 'arn:aws:kms:us-east-1:123456:key/super-cool-key1'
      staging: 'arn:aws:kms:us-east-1:123456:key/super-cool-key2'
      prod: 'arn:aws:kms:us-east-1:123456:key/super-cool-key3'
    us-east-2:
      dev: 'arn:aws:kms:us-east-2:123456:key/super-cool-key1'
      staging: 'arn:aws:kms:us-east-2:123456:key/super-cool-key1'
      prod: 'arn:aws:kms:us-east-2:123456:key/super-cool-key1'
  DestinationMap:
    us-east-1:
      target: "us-east-2"
Resources:
  ReplicationQueue:
    Type: AWS::SQS::Queue
    Properties:
      QueueName: !Sub 'SSM-SQS-replication-${StageEnv}-${AWS::Region}'
      VisibilityTimeout: 1000
  LambdaRegionalReplication:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: .
      Handler: lib/ssm_regional_replication.handler
      Runtime: ruby2.7
      Timeout: 900
      MemorySize: 512
      Environment:
        Variables:
          STAGE_ENV: !Ref StageEnv
          TARGET_REGION: !FindInMap [DestinationMap, !Ref AWS::Region, target]
          SKIP_SYNC: 'skip_sync'
      Events:
        InvokeFromSQS:
          Type: SQS
          Properties:
            Queue: {"Fn::GetAtt" : [ "ReplicationQueue", "Arn" ]}
            BatchSize: 1
            Enabled: true
        ReactToSSM:
          Type: EventBridgeRule
          Properties:
            Pattern:
              detail-type:
                - Parameter Store Change
              source:
                - aws.ssm
      Policies:
        - Statement:
            - Sid: ReadSSM
              Effect: Allow
              Action:
                - ssm:GetParameter
                - ssm:GetParameters
                - ssm:PutParameter
                - ssm:DeleteParameter
                - ssm:AddTagsToResource
                - ssm:ListTagsForResource
              Resource:
                - !Sub "arn:aws:ssm:*:${AWS::AccountId}:parameter/*"
        - Statement:
            - Sid: DecryptSSM
              Effect: Allow
              Action:
                - kms:Decrypt
                - kms:Encrypt
              Resource:
                - !FindInMap [KmsMap, us-east-1, !Ref StageEnv]
                - !FindInMap [KmsMap, us-east-2, !Ref StageEnv]
  LambdaFullReplication:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: .
      Handler: lib/ssm_full_replication.handler
      Runtime: ruby2.7
      Timeout: 900
      MemorySize: 512
      Environment:
        Variables:
          STAGE_ENV: !Ref StageEnv
          TARGET_REGION: !FindInMap [DestinationMap, !Ref AWS::Region, target]
          SKIP_SYNC: 'skip_sync'
      Events:
        DailyReplication:
          Type: Schedule
          Properties:
            Description: Cron job to run replication every Wednesday at 13:30 UTC (9:30am EDT / 8:30am EST)
            Enabled: True
            Name: DailySSMReplication
            Schedule: "cron(30 13 ? * 4 *)"
      Policies:
        - Statement:
            - Sid: SQSPerms
              Effect: Allow
              Action:
                - sqs:SendMessage
              Resource:
                - !Sub "arn:aws:sqs:*:${AWS::AccountId}:SSM-SQS-replication-*"
        - Statement:
            - Sid: ReadSSM
              Effect: Allow
              Action:
                - ssm:GetParameter
                - ssm:GetParameters
                - ssm:PutParameter
                - ssm:AddTagsToResource
                - ssm:ListTagsForResource
                - ssm:DescribeParameters
              Resource:
                - !Sub "arn:aws:ssm:*:${AWS::AccountId}:*"
                - !Sub "arn:aws:ssm:*:${AWS::AccountId}:parameter/*"
        - Statement:
            - Sid: DecryptSSM
              Effect: Allow
              Action:
                - kms:Decrypt
                - kms:Encrypt
              Resource:
                - !FindInMap [KmsMap, us-east-1, !Ref StageEnv]
                - !FindInMap [KmsMap, us-east-2, !Ref StageEnv]
The above template does a number of things. It creates my SQS queue, an event-based regional replication Lambda, and a cron-based full replication Lambda. Under the 'Mappings' section I have "KmsMap", which maps each region and stage to the ARN of the aws/ssm KMS key. If you use other keys for your SSM entries, enter those values here. If you use many keys across your SSM parameters, simply add them to the Lambda's KMS policy statement, for example:
- Statement:
    - Sid: DecryptSSM
      Effect: Allow
      Action:
        - kms:Decrypt
        - kms:Encrypt
      Resource:
        - !FindInMap [KmsMap, us-east-1, !Ref StageEnv]
        - !FindInMap [KmsMap, us-east-2, !Ref StageEnv]
        - 'arn:aws:kms:us-east-1:123456:key/my-managed-key1'
The other mapping, DestinationMap, sets up my source and target regions. My original SSM parameters are in us-east-1, so the target is us-east-2 in this case. Since a Lambda cannot run for more than 15 minutes, there's a high chance a single function would time out before getting through all of your parameters. Instead, the LambdaFullReplication function only sends each parameter name to the SQS queue, and the LambdaRegionalReplication function picks up the messages and performs the put action to the destination region. The VisibilityTimeout is set to 1000 seconds to allow some wiggle room beyond the Lambda timeout (900 seconds).
The full replication Lambda runs every Wednesday (or whatever frequency you'd like) for a few reasons:
- to do the initial get/put for the parameters and
- to catch any parameters whose skip_sync tag has been deleted since the last run
I will discuss the skip_sync tag in detail when discussing the code. The regional replication Lambda runs whenever there's an entry in the SQS queue to be processed, or whenever a parameter changes (an event-based trigger).
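As a rough illustration of opting a parameter out, here is a minimal sketch of the tagging request. The skip_sync? check in parameter_store.rb (shown later in this post) looks only at the tag key, so the value is arbitrary. The helper name skip_sync_tag_request, the parameter name, and the value 'true' are assumptions made up for this example.

```ruby
# Hypothetical helper: builds the add_tags_to_resource request that opts a
# parameter out of replication. Only the tag key matters to the replication
# code; the value 'true' is an arbitrary choice for this sketch.
def skip_sync_tag_request(parameter_name, tag_key: 'skip_sync')
  {
    resource_type: 'Parameter',
    resource_id: parameter_name,
    tags: [{ key: tag_key, value: 'true' }]
  }
end

request = skip_sync_tag_request('/app/us-east-1/only')
p request

# With the AWS SDK loaded and credentials available, the request could be
# sent like this (not executed here):
#   require 'aws-sdk-ssm'
#   Aws::SSM::Client.new(region: 'us-east-1').add_tags_to_resource(request)
```

The tag key has to match the SKIP_SYNC environment variable from the template, which is 'skip_sync' here.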
Code Setup
Next I will share and discuss the Ruby code that actually does the work. There are three Ruby files that make up these Lambda functions: parameter_store.rb, ssm_regional_replication.rb, and ssm_full_replication.rb. I will share the code along with comments explaining what is happening in each file.
require 'aws-sdk-ssm'
# ParameterStore wraps a single SSM parameter and is shared by both the
# regional and the full replication Lambda.
class ParameterStore
  # attr_accessor creates reader and writer methods for the SSM client,
  # the get_parameter response, the parameter name, and the tag list.
  attr_accessor :client, :response, :name, :tag_list

  # Takes a hash so that the client and name can be stored on the
  # instance and used by the other methods.
  def initialize(h)
    self.client = h[:client] # the Aws::SSM::Client for the source region
    self.name = h[:name] # the name of the parameter
  end

  # Class method: builds a new ParameterStore for the given client and name,
  # then loads the parameter with the instance-level find_by_name.
  def self.find_by_name(client, name)
    new(client: client, name: name).find_by_name(name)
  end

  # The methods below are called with an explicit receiver (from the class
  # method above and from the Lambda handlers), so they have to be public.
  def find_by_name(name)
    # The begin block lets `retry` re-run the get_parameter call
    # after a throttling error.
    begin
      # Store the get_parameter response from the AWS SDK for Ruby,
      # looking the parameter up by name with with_decryption enabled.
      self.response = client.get_parameter({
        name: name,
        with_decryption: true,
      })
    # Rescue AWS SSM throttling errors and wait before trying again.
    rescue Aws::SSM::Errors::ThrottlingException
      p "Sleeping for 60 seconds while getting parameters."
      sleep(60)
      # re-runs what is in the begin block
      retry
    end
    self
  end

  # Memoizes the tag list. `||=` means: if @tag_list is already set,
  # return it; otherwise set it to what is on the right side.
  # The right side is the response from `list_tags_for_resource`¹
  # with resource_type set to Parameter and resource_id set to name.
  def tag_list
    @tag_list ||= client.list_tags_for_resource({resource_type: 'Parameter', resource_id: name})
  end

  # Returns true if any tag on the parameter has the `skip_sync` key.
  # If it does, the parameter is not replicated; if not, replication proceeds.
  # You may want to skip syncing for region-specific resources.
  # If you want to replicate a param that initially had skip_sync, simply
  # remove the tag and the param will sync on the next run.
  def skip_sync?
    tag_list.tag_list.any? { |tag| tag.key == $skip_tag }
  end

  # Calls `put_parameter` on the `client_target` client, replicating the
  # name, value, and type with overwrite enabled. It then copies the tags
  # from the tag_list method onto the parameter in the target region.
  def sync_to_region(client_target)
    client_target.put_parameter({
      name: response.parameter.name, # required
      value: response.parameter.value, # required
      type: response.parameter.type, # accepts String, StringList, SecureString
      overwrite: true,
    })
    client_target.add_tags_to_resource({resource_type: 'Parameter', resource_id: name, tags: tag_list.to_h[:tag_list]})
  end
end
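One thing to be aware of in find_by_name: the fixed 60-second sleep combined with an unbounded retry means a persistently throttled call keeps sleeping until the Lambda times out. A possible alternative is a bounded retry with exponential backoff. The sketch below is only an illustration and not part of the original code: with_backoff, max_attempts, base_delay, and sleeper are hypothetical names, and FakeThrottle stands in for Aws::SSM::Errors::ThrottlingException so the example runs without the AWS SDK.

```ruby
# Hypothetical helper, not part of the original post: retries a block with
# exponential backoff and re-raises the error after max_attempts tries.
def with_backoff(error_class, max_attempts: 5, base_delay: 1.0, sleeper: Kernel.method(:sleep))
  attempts = 0
  begin
    attempts += 1
    yield
  rescue error_class
    raise if attempts >= max_attempts
    # Wait base_delay, then 2x, 4x, ... seconds between attempts.
    sleeper.call(base_delay * (2**(attempts - 1)))
    retry
  end
end

# Stand-in for Aws::SSM::Errors::ThrottlingException so the sketch runs
# without the AWS SDK installed.
class FakeThrottle < StandardError; end

calls = 0
delays = []
# The sleeper is swapped for a lambda that records delays instead of sleeping.
result = with_backoff(FakeThrottle, sleeper: ->(seconds) { delays << seconds }) do
  calls += 1
  raise FakeThrottle if calls < 3
  "ok after #{calls} calls"
end
puts result
```

In the class itself, the block would wrap the client.get_parameter call and the error class would be the real throttling exception.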
The next file I will discuss is ssm_full_replication.rb. As you may gather from the name, it is responsible for full replication.
# this pulls in the AWS SDK gems
require 'aws-sdk-ssm'
require 'aws-sdk-sqs'
require_relative 'parameter_store'
# Declare global variables which are set to the
# respective values from the CloudFormation template.
$target_region = ENV['TARGET_REGION'] or raise "Missing TARGET_REGION variable."
$skip_tag = ENV['SKIP_SYNC'] or raise "Missing skip_sync tag."
$stage_env = ENV['STAGE_ENV']

# The queue lives in the source region (us-east-1), where this stack is
# deployed, so the SQS and STS clients are both created there.
# STS is only used to look up the account ID needed to build the queue URL.
# `send_message` is called with queue_url & message_body as params;
# the message body is just the parameter name.
def send_params_to_sqs(name)
  region = "us-east-1"
  sqs_client = Aws::SQS::Client.new(region: region)
  sts_client = Aws::STS::Client.new(region: region)
  sqs_client.send_message(
    queue_url: "https://sqs.#{region}.amazonaws.com/#{sts_client.get_caller_identity.account}/SSM-SQS-replication-#{$stage_env}-#{region}",
    message_body: name
  )
end

# sets new SSM client connection in source region
# and new SSM client_target connection in target region
def handler(event:, context:)
  client = Aws::SSM::Client.new
  client_target = Aws::SSM::Client.new(region: $target_region)
  # next_token starts as nil so the first call returns the first page
  next_token = nil
  # Each pass through the loop fetches one page of parameters. The
  # begin/rescue inside it lets a throttled call sleep and then
  # repeat the same page.
  loop do
    begin
      # describe_batch is set to the response from the
      # describe_parameters² call on the client variable.
      @describe_batch = client.describe_parameters({
        # parameter_filters limits request results to what we need
        parameter_filters: [
          {
            key: "Type",
            values: ["String", "StringList", "SecureString"]
          },
        ],
        # next_token points at the next set of items to return
        next_token: next_token,
      })
      # iterate over the page and send each
      # param name to the send_params_to_sqs method
      @describe_batch.parameters.each do |item|
        send_params_to_sqs(item.name)
      end
      # stop once there is no next_token, meaning this was the last page
      break if @describe_batch.next_token.nil?
      next_token = @describe_batch.next_token
    # exception handling: on a throttling error, pause for 60 seconds
    # and let the loop try the same page again
    rescue Aws::SSM::Errors::ThrottlingException
      p "Sleeping for 60 seconds while describing parameters."
      sleep(60)
    end
  end
end
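The handler above makes one send_message call (plus one STS lookup) per parameter. If that becomes slow with a large number of parameters, one option is SQS's send_message_batch, which accepts up to 10 entries per request. The following is a minimal sketch of that idea rather than the code used in this post: batch_entries, send_names_in_batches, StubSqs, and the example queue URL are all hypothetical, and the stub client exists only so the example runs without AWS credentials.

```ruby
# SQS accepts at most 10 entries per send_message_batch call.
SQS_BATCH_LIMIT = 10

# Hypothetical helper: groups parameter names into batches of entries.
def batch_entries(names)
  names.each_slice(SQS_BATCH_LIMIT).map do |slice|
    slice.each_with_index.map do |name, index|
      # Each entry needs an id that is unique within its batch.
      { id: index.to_s, message_body: name }
    end
  end
end

# Sends every batch through any client that responds to send_message_batch
# (an Aws::SQS::Client in the Lambda, a stub in this sketch).
def send_names_in_batches(sqs_client, queue_url, names)
  batch_entries(names).each do |entries|
    sqs_client.send_message_batch(queue_url: queue_url, entries: entries)
  end
end

# Stub client that records requests instead of calling AWS.
class StubSqs
  attr_reader :requests

  def initialize
    @requests = []
  end

  def send_message_batch(queue_url:, entries:)
    @requests << { queue_url: queue_url, entries: entries }
  end
end

stub = StubSqs.new
names = (1..23).map { |i| "/app/param#{i}" }
send_names_in_batches(stub, 'https://example.invalid/queue', names)
p stub.requests.map { |request| request[:entries].size } # prints [10, 10, 3]
```

With the real client, the response of send_message_batch lists any failed entries, so those would need to be checked and resent. The BatchSize: 1 setting in the template controls how many messages the regional Lambda receives per invocation and is independent of how the messages are sent.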
The last file to share is ssm_regional_replication.rb. This file is event based and does the regional replication.
# this pulls in the AWS SDK gem
require 'aws-sdk-ssm'
require_relative 'parameter_store'
# Global vars for file
$target_region = ENV['TARGET_REGION'] or raise "Missing TARGET_REGION variable."
$skip_tag = ENV['SKIP_SYNC'] or raise "Missing skip_sync tag."

# EventBridge (CloudWatch Events) delivers events in a different format
# than SQS-triggered invocations, so this method handles both formats.
def massage_event_data(event)
  # pull out values from an EventBridge invocation
  operation = event.fetch('detail', {})['operation']
  name = event.fetch('detail', {})['name']
  return operation,name if operation && name
  # otherwise this is an SQS invocation: the message body is the
  # parameter name, and it is always treated as an update
  operation = 'Update'
  name = event.fetch('Records', []).first['body']
  return operation,name
end

def handler(event:, context:)
  # set vars called operation and name from the previous method's output,
  # then create the source client & target client for SSM
  operation,name = massage_event_data(event)
  client = Aws::SSM::Client.new
  client_target = Aws::SSM::Client.new(region: $target_region)
  # If the operation from the event is either Update or Create,
  # the ps var uses the ParameterStore find_by_name class method
  # and passes it the client & name.
  if operation == 'Update' || operation == 'Create'
    ps = ParameterStore.find_by_name(client, name)
    # if the param has a skip_sync tag, the message in the puts string
    # shows up in the CloudWatch logs. if there is no tag,
    # the param syncs to the target region.
    if ps.skip_sync?
      puts "This parameter has been opted out, not replicating parameter."
    else
      ps.sync_to_region(client_target)
    end
  # if the operation is Delete in the source region, delete_parameter is
  # called on client_target so the param is also deleted from the
  # target region to ensure parity.
  elsif operation == 'Delete'
    client_target.delete_parameter({
      name: name, # required. taken from the name value under the event's detail key
    })
  end
end
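To make the two event formats concrete, here is a small sketch that feeds one trimmed example of each into the same extraction logic. The events contain only the fields the code reads, and the parameter names are made up. extract_operation_and_name is a standalone copy of massage_event_data, added here only so the example runs without loading the Lambda file or the AWS SDK.

```ruby
# Standalone copy of the massage_event_data logic for illustration.
def extract_operation_and_name(event)
  detail = event.fetch('detail', {})
  return detail['operation'], detail['name'] if detail['operation'] && detail['name']

  # SQS invocation: the body is the parameter name, treated as an update.
  ['Update', event.fetch('Records', []).first['body']]
end

# Trimmed EventBridge "Parameter Store Change" event (illustrative values).
eventbridge_event = {
  'source' => 'aws.ssm',
  'detail-type' => 'Parameter Store Change',
  'detail' => { 'operation' => 'Delete', 'name' => '/app/db/password' }
}

# Trimmed SQS event as produced by the full replication Lambda: the
# message body is just the parameter name.
sqs_event = { 'Records' => [{ 'body' => '/app/db/host' }] }

p extract_operation_and_name(eventbridge_event) # prints ["Delete", "/app/db/password"]
p extract_operation_and_name(sqs_event)         # prints ["Update", "/app/db/host"]
```

Because SQS messages are always treated as 'Update', the full replication path only ever creates or overwrites parameters in the target region; deletes are propagated solely through the EventBridge path.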
References to AWS API docs pages (AWS SDK for Ruby, Aws::SSM::Client): ¹ list_tags_for_resource, ² describe_parameters
If you want to be sure that no parameters are missed, you can set up a CloudWatch alarm that fires if your Lambda has any failed invocations or if no messages are flowing through your SQS queue. That's the end of the code. I know it is a lot to digest, so if you have any questions please leave a comment and I'll do my best to follow up. I hope this has helped others who are looking for a way to replicate SSM parameters in AWS from one region to another.