Connecting to private EC2 Instances using Systems Manager - A Hands-On Guide
LionelPJ
Posted on March 10, 2022
Systems Manager is a wonderful service with many untapped features. One feature that has become popular recently is connecting to EC2 instances using Session Manager (a feature of Systems Manager) instead of SSH. To use Session Manager, you must enable Systems Manager in your account. The setup guide for Systems Manager is exhaustive, and it is not clear what the minimum is that you need to enable it in your account. This article focuses on the minimal steps to do just that. We will go through the resources we plan to use and then build them. Instead of using the console, we will use CloudFormation. I will assume that you have an existing network, so my scripts work with it. If this is a new account, feel free to use my utility script to build a 3 tier network from scratch and then apply the script I provide in the resources section. So, whenever I mention the "first script" in this article, I am referring to this 3 tier network script, which you are free to reuse.
I chose CloudFormation to automate the process and to give those studying for the AWS exams a working sample to refer to. I could have done it in Terraform, but I wanted to stick to AWS products and features instead.
For those of you who are not familiar with the resources we are building using CloudFormation and its properties, I have added references in the resources section down below.
Prerequisites:
Log in as an IAM admin user or a role that has the permissions to run CloudFormation. [This article does not focus on the privileges of this logon identity. Feel free to make it least privileged if you need to. Our focus will be on enabling Systems Manager, which is the main theme of the article.]
You need a VPC for this exercise. You can use an existing VPC with a minimum of 2 subnets. The default VPC also works, but I would strongly advise against it for production systems. If you don't have a VPC or want to create a new custom one for this exercise, check the resources section to build one before you enable Systems Manager.
Resources to Build
1. A new mySsmRole that does the following -
- Has a trust policy for EC2
- Has 2 main managed policies attached (a third one is optional), namely -
  - AmazonSSMManagedInstanceCore
    This required managed policy enables an instance to use Systems Manager core service functionality. It provides the minimum permissions, which allow the instance to:
    * Register as a managed instance
    * Send heartbeat information
    * Send and receive messages for Run Command and Session Manager
    * Retrieve State Manager association details
    * Read parameters in Parameter Store
    This policy replaces the old AmazonEC2RoleforSSM policy and is mandatory.
  - CloudWatchAgentServerPolicy
    This policy enables the Amazon CloudWatch agent by allowing access to read instance information and write it to CloudWatch Logs, Metrics, and EventBridge. Its permissions also grant access to read Amazon EC2 tags, volumes, and CloudWatch configuration parameters in Parameter Store. You can also create a more restrictive policy that, for example, limits write access to a specific CloudWatch Logs log stream. For more details, refer to the CloudWatch user guide.
  - AmazonSSMDirectoryServiceAccess
    [Only applicable for Windows users joining a Domain server]
    This managed policy enables a managed instance to seamlessly join a domain by providing access to the required AWS Directory Service API actions. This is optional. In my article, I am not going to add it.
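The CloudWatchAgentServerPolicy description above mentions that a more restrictive policy is possible. As a minimal sketch, and not part of this article's script, an inline policy along these lines would limit log writes to a single log group. The logical ID `RestrictedLogsPolicy`, the policy name, and the log group name `/my/app/logs` are hypothetical, and the role is assumed to have the logical ID `MySsmRole` as used in this article.

```yaml
Resources:
  # Hypothetical inline policy; it only covers writing logs, so it is not a
  # full replacement for everything CloudWatchAgentServerPolicy allows.
  RestrictedLogsPolicy:
    Type: AWS::IAM::Policy
    Properties:
      PolicyName: restricted-cloudwatch-logs
      Roles:
        - !Ref MySsmRole
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Action:
              - logs:CreateLogStream
              - logs:PutLogEvents
              - logs:DescribeLogStreams
            # Assumed log group name; the trailing :* covers its log streams
            Resource: !Sub 'arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:/my/app/logs:*'
```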
Now that you have some background information, let's build our script, starting with the role.
Resources:
  MySsmRole:
    Type: AWS::IAM::Role
    Properties:
      Description: Role that allows SSM capability
      ManagedPolicyArns:
        - !Ref CWAgentServerPolicyArn
        - !Ref SSMManagedInstanceCoreArn
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - ec2.amazonaws.com
            Action:
              - 'sts:AssumeRole'
      Path: /
      Tags:
        - Key: Name
          Value: mySsmRole
Notice the 2 managed policies that are attached and the trust policy for EC2.
Parameters:
  CWAgentServerPolicyArn:
    Type: String
    Default: 'arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy'
  SSMManagedInstanceCoreArn:
    Type: String
    Default: 'arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore'
These default values were fetched by visiting the IAM page and manually copying the ARNs.
Note: Remember that CloudFormation adds a prefix and a suffix to the role name to make it unique during a deployment.
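To see the name CloudFormation actually generates, one option is to add an Outputs section like the sketch below. The output names `SsmRoleName` and `SsmRoleArn` are my own choice for illustration and are not part of the original script.

```yaml
Outputs:
  # !Ref on an AWS::IAM::Role returns the generated role name
  SsmRoleName:
    Description: Generated name of the SSM role
    Value: !Ref MySsmRole
  # !GetAtt with the Arn attribute returns the full role ARN
  SsmRoleArn:
    Description: ARN of the SSM role
    Value: !GetAtt MySsmRole.Arn
```

The generated name is typically built from the stack name, the logical ID, and a random suffix, and it shows up in the Outputs tab of the stack once the deployment finishes.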
2. Create an instance profile
When you create a role for EC2 in the console, an instance profile is created for it automatically. When you automate this with CloudFormation or Terraform instead, it is your responsibility to create it. The instance profile is just a container that passes the IAM role, and the permissions attached to it, to the EC2 instance.
So our script would now look like -
MyEc2InstanceProfile:
  Type: AWS::IAM::InstanceProfile
  Properties:
    Path: "/"
    Roles:
      - !Ref MySsmRole
    InstanceProfileName: MySsmRoleInstanceProfile
Remember, MySsmRole was just created in step 1.
A note on CloudFormation conditional statements
Before we continue to the next section, I would like to explain how to add conditions to our script. We use them so the script can work either with the VPC, subnets, and route tables from my network stack or with values that you provide.
My idea is that if you have used my basic 3 tier network script from the resources section, those values are imported automatically across stacks. If you would rather provide your own values, you are free to do so.
Here is how I am doing it. For example - for a VPC id, by default I set the value to an empty string.
clientVpc:
  Type: String
  Default: ''
If you override it with a value, then this condition evaluates to true:
clientVpcExists: !Not [ !Equals [!Ref clientVpc, '']]
Now, in the place where I have to use a VPC id, I place an if condition around it to use client VPC if it exists, otherwise to import the value from my previous stack
VpcId: !If [clientVpcExists, !Ref clientVpc, !ImportValue MyVpc]
With this understanding let's continue with the next section.
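For `!ImportValue MyVpc` to resolve, the network stack has to export a value under that exact name. I am not reproducing the 3 tier network script here; the fragment below is only a sketch of what such an export could look like. The logical ID `Vpc`, the output name `VpcId`, and the CIDR range are assumptions.

```yaml
# Fragment of a hypothetical network stack
Resources:
  Vpc:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.0.0.0/16   # example range only
Outputs:
  VpcId:
    Description: VPC ID shared with other stacks
    Value: !Ref Vpc
    Export:
      Name: MyVpc              # must match the name used in !ImportValue
```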
3. Create Interface and Gateway Endpoints
Our EC2 instances are private and have no internet connectivity. To be able to connect to them using Systems Manager, we are going to add the mandatory endpoints seen in the table below. Optional endpoints are not added by my script. If you need any of them, please modify the script as required.
Endpoint | Purpose | Is Mandatory |
---|---|---|
com.amazonaws.region.ssm | For the Systems Manager service | Yes |
com.amazonaws.region.ssmmessages | To connect to our instances through a secure data channel using Session Manager | Yes |
com.amazonaws.region.ec2 | To create snapshots or call EBS | Yes |
com.amazonaws.region.ec2messages | Systems Manager uses this endpoint to make calls from SSM Agent to the Systems Manager service | Yes |
com.amazonaws.region.s3 | Systems Manager uses this endpoint to update SSM Agent, perform patching operations, and for tasks like uploading output logs you choose to store in S3 buckets, retrieving scripts or other files you store in buckets, and so on. If the security group associated with your instances restricts outbound traffic, you must add a rule to allow traffic to the prefix list for Amazon S3. | Yes |
Sample snippets of our endpoints would look like -
ec2InterfaceEndpoint:
  Type: AWS::EC2::VPCEndpoint
  Properties:
    VpcEndpointType: Interface
    ServiceName: !Sub 'com.amazonaws.${AWS::Region}.ec2'
    VpcId: !If [clientVpcExists, !Ref clientVpc, !ImportValue MyVpc]
    SubnetIds:
      - !If [clientSubnetAExists, !Ref clientSubnetA, !ImportValue AppSubnetA]
      - !If [clientSubnetBExists, !Ref clientSubnetB, !ImportValue AppSubnetB]
    SecurityGroupIds:
      - !Ref myDefaultSecurityGroup
    PrivateDnsEnabled: true
Everything looks similar for ec2MessagesInterfaceEndpoint and ssmInterfaceEndpoint, except for the ServiceName, which uses the service specific endpoint name seen in the table above.
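The table also lists the ssmmessages endpoint as mandatory. Following the same pattern, a sketch of that endpoint could look like this. The logical ID `ssmMessagesInterfaceEndpoint` is an assumption on my part; check the complete script for the name it actually uses.

```yaml
Resources:
  ssmMessagesInterfaceEndpoint:
    Type: AWS::EC2::VPCEndpoint
    Properties:
      VpcEndpointType: Interface
      # Only the service name differs from the ec2 endpoint
      ServiceName: !Sub 'com.amazonaws.${AWS::Region}.ssmmessages'
      VpcId: !If [clientVpcExists, !Ref clientVpc, !ImportValue MyVpc]
      SubnetIds:
        - !If [clientSubnetAExists, !Ref clientSubnetA, !ImportValue AppSubnetA]
        - !If [clientSubnetBExists, !Ref clientSubnetB, !ImportValue AppSubnetB]
      SecurityGroupIds:
        - !Ref myDefaultSecurityGroup
      # Lets the agent keep using the regular service hostname
      PrivateDnsEnabled: true
```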
S3GatewayEndpoint:
  Type: AWS::EC2::VPCEndpoint
  Properties:
    ServiceName: !Sub 'com.amazonaws.${AWS::Region}.s3'
    VpcEndpointType: Gateway
    VpcId: !If [clientVpcExists, !Ref clientVpc, !ImportValue MyVpc]
    RouteTableIds:
      - !If [clientRouteTableAExists, !Ref clientRouteTableA, !ImportValue PrivateARouteTable]
      - !If [clientRouteTableBExists, !Ref clientRouteTableB, !ImportValue PrivateBRouteTable]
    PolicyDocument:
      Version: '2012-10-17'
      Statement:
        - Effect: Allow
          Principal: '*'
          Action: '*'
          Resource: '*'
Notice the gateway endpoint here adds route table entries instead of subnets and security groups.
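The table above notes that a security group restricting outbound traffic needs a rule for the Amazon S3 prefix list. A minimal sketch of such a rule follows. The parameter `s3PrefixListId` and the logical ID `s3EgressRule` are hypothetical; the prefix list ID is Region specific, so it is passed in rather than hard-coded.

```yaml
Parameters:
  s3PrefixListId:
    Type: String
    Description: ID of the Amazon S3 prefix list in this Region (starts with pl-)
Resources:
  s3EgressRule:
    Type: AWS::EC2::SecurityGroupEgress
    Properties:
      GroupId: !Ref myDefaultSecurityGroup
      # HTTPS only; widen this if your package repositories need other ports
      IpProtocol: tcp
      FromPort: 443
      ToPort: 443
      DestinationPrefixListId: !Ref s3PrefixListId
```

This is only needed when outbound rules are restricted. A security group that already allows all outbound traffic does not need it.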
My parameters and conditions to make these work are as follows
Parameters:
  clientVpc:
    Type: String
    Default: ''
  clientSubnetA:
    Type: String
    Default: ''
  clientSubnetB:
    Type: String
    Default: ''
  clientRouteTableA:
    Type: String
    Default: ''
  clientRouteTableB:
    Type: String
    Default: ''
Conditions:
  clientVpcExists: !Not [ !Equals [!Ref clientVpc, '']]
  clientSubnetAExists: !Not [ !Equals [!Ref clientSubnetA, '']]
  clientSubnetBExists: !Not [ !Equals [!Ref clientSubnetB, '']]
  clientRouteTableAExists: !Not [ !Equals [!Ref clientRouteTableA, '']]
  clientRouteTableBExists: !Not [ !Equals [!Ref clientRouteTableB, '']]
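The endpoint snippets also reference `myDefaultSecurityGroup`, which is not part of the listing above. Assuming it is passed in as a parameter, as the notes below suggest, a declaration could look like this sketch. The type and description used in the complete script may differ.

```yaml
Parameters:
  myDefaultSecurityGroup:
    # AWS-specific parameter type; the value must be an existing security group ID
    Type: AWS::EC2::SecurityGroup::Id
    Description: Security group attached to the interface endpoints
```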
For reference - the complete script is available in the resources section.
Notes for success:
- The default security group ID that you provide to this script must allow all access between the instance and the endpoints. The endpoints use it, and it lets your private instance reach Systems Manager and receive updates through the endpoints rather than over the internet.
- The default security group should be part of the VPC that you chose to build your resources with.
- The subnets must be part of the VPC and in the same region.
- The private route table must be associated with the private subnets within the VPC.
- When a new instance is ready and running, the Systems Manager connect button should be enabled if everything was done right. Otherwise, you will have to backtrack and identify which step you missed.
- Most modern Linux versions have the Systems Manager agent installed on them. This is documented here. If your OS is not supported, check the guide on how to install the SSM Agent.
- The SSM Agent is enabled and ready to use the first time the instance tries to connect.
- When Systems Manager fails to enable, you get no feedback on the reason. Instead, you are pointed to an 8 step guide on enabling Systems Manager.
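If you would rather not open a security group to all traffic, interface endpoints only need inbound HTTPS from the instances that use them. The sketch below is one possible alternative, not what the script in this article does. The logical ID `endpointSecurityGroup` and the parameter `vpcCidr` are hypothetical, and the fragment relies on the `clientVpc` parameter and `clientVpcExists` condition shown earlier.

```yaml
Parameters:
  vpcCidr:
    Type: String
    Default: 10.0.0.0/16   # assumed; use your VPC's CIDR range
Resources:
  endpointSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Allow HTTPS from inside the VPC to the interface endpoints
      VpcId: !If [clientVpcExists, !Ref clientVpc, !ImportValue MyVpc]
      SecurityGroupIngress:
        # Interface endpoints accept traffic on port 443
        - IpProtocol: tcp
          FromPort: 443
          ToPort: 443
          CidrIp: !Ref vpcCidr
```

To use it, the interface endpoints would reference `!Ref endpointSecurityGroup` in their SecurityGroupIds instead of the default security group.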
Testing Our Script
1. Testing script with a custom VPC using CloudFormation Console
For ease, I will use the script within this article for the custom VPC values. Feel free to use your own values instead.
Here I will not set any client parameters, as I am going to use the imported values that were exported cross stack from my first script.
Verifying SSM access using a new EC2 instance
Let's launch a new instance using one of the free tier AMIs. The only differences from your regular choices are the ones captured below. Please make sure that these are set correctly.
Here, choose myVpc, the appA subnet, and the new instance profile that was created.
Give the instance a meaningful name.
Choose the default security group.
Finally, launch the instance. Wait for it to go into a running state, then connect to the EC2 instance.
When Session Manager is enabled, the connect button is available to click.
A successful yum update on the new private instance, connected privately.
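The console launch above can also be expressed in CloudFormation. This is a sketch only, since the instance in this article was launched from the console. The logical ID `testInstance`, the parameter `latestAmiId`, the instance type, and the Name tag are assumptions. The AMI is looked up through a public Systems Manager parameter for Amazon Linux 2.

```yaml
Parameters:
  latestAmiId:
    # Resolves the current Amazon Linux 2 AMI from a public SSM parameter
    Type: AWS::SSM::Parameter::Value<AWS::EC2::Image::Id>
    Default: /aws/service/ami-amazon-linux-latest/amzn2-ami-hvm-x86_64-gp2
Resources:
  testInstance:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: !Ref latestAmiId
      InstanceType: t2.micro   # assumed; pick a type that is free tier eligible for you
      # !Ref on an instance profile returns its name, which is what this property expects
      IamInstanceProfile: !Ref MyEc2InstanceProfile
      SubnetId: !If [clientSubnetAExists, !Ref clientSubnetA, !ImportValue AppSubnetA]
      SecurityGroupIds:
        - !Ref myDefaultSecurityGroup
      Tags:
        - Key: Name
          Value: my-private-ssm-test
```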
Cleanup
If you have not done it already, terminate your private instance and delete the ssm stack so you don't end up paying for the endpoints while they are not in use.
2. Testing script with a default VPC using CloudFormation Console
Now, to make it easy to test with a different VPC, I am going to use the default VPC. You may ask whether VPC endpoints are needed at all, since the subnets in the default VPC are public by default. So I tested it and found that a new instance created with just the role does not get Systems Manager enabled.
A typical error message when it can't connect.
So, to test my script and adapt it to the default VPC, I had to modify it slightly and comment out one line of code in the S3 gateway endpoint, because the default VPC has only 1 route table.
S3GatewayEndpoint:
  Type: AWS::EC2::VPCEndpoint
  Properties:
    ServiceName: !Sub 'com.amazonaws.${AWS::Region}.s3'
    VpcEndpointType: Gateway
    VpcId: !If [clientVpcExists, !Ref clientVpc, !ImportValue MyVpc]
    RouteTableIds:
      - !If [clientRouteTableAExists, !Ref clientRouteTableA, !ImportValue PrivateARouteTable]
      # - !If [clientRouteTableBExists, !Ref clientRouteTableB, !ImportValue PrivateBRouteTable]
    PolicyDocument:
      Version: '2012-10-17'
      Statement:
        - Effect: Allow
          Principal: '*'
          Action: '*'
          Resource: '*'
Notice the second route table line is commented out.
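Instead of commenting the line out, one possible alternative is to drop the second route table conditionally with `AWS::NoValue`. This is an untested sketch. The parameter `useSecondRouteTable` and the condition `secondRouteTableWanted` are hypothetical names that are not in the original script, and the fragment relies on the client parameters and conditions shown earlier.

```yaml
Parameters:
  useSecondRouteTable:
    Type: String
    AllowedValues: ['true', 'false']
    Default: 'true'
Conditions:
  secondRouteTableWanted: !Equals [!Ref useSecondRouteTable, 'true']
Resources:
  S3GatewayEndpoint:
    Type: AWS::EC2::VPCEndpoint
    Properties:
      ServiceName: !Sub 'com.amazonaws.${AWS::Region}.s3'
      VpcEndpointType: Gateway
      VpcId: !If [clientVpcExists, !Ref clientVpc, !ImportValue MyVpc]
      RouteTableIds:
        - !If [clientRouteTableAExists, !Ref clientRouteTableA, !ImportValue PrivateARouteTable]
        # AWS::NoValue removes this list entry when the condition is false
        - !If
          - secondRouteTableWanted
          - !If [clientRouteTableBExists, !Ref clientRouteTableB, !ImportValue PrivateBRouteTable]
          - !Ref 'AWS::NoValue'
```

With this shape, a default VPC deployment would set `useSecondRouteTable` to `false` rather than editing the template. The PolicyDocument is omitted here for brevity.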
Now, if you launch the script and provide the client VPC ID, 2 subnet IDs, and the route table ID for the clientRouteTableA parameter, the script deploys successfully.
Notice all values are overridden with default VPC values. All other options are the same as in the previous scenario to create the stack.
Once the instance is running, you can connect to this public instance.
Congratulations! You just got Systems Manager enabled in your account.
Cleanup
Just like last time, clean up after yourself by terminating your public instance and deleting the ssm stack so you don't end up paying for the endpoints while they are not in use.
Resources
- 3 tier VPC network cloud formation script
- enabling ssm script
- CloudFormation resource type references
- VPC Endpoint - https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-ec2-vpcendpoint.html
- IAM Role - https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-iam-role.html
- IAM Instance Profile - https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-iam-instanceprofile.html
[about Lionel Pulickal]
Lionel is a Cloud Engineer who has worked in the IT industry since 1997. He has passed all three AWS associate level exams, the solutions architect professional exam, and the networking specialty exam. He loves hands-on work and always loves to share the knowledge he has gained over the years.