Connecting to private EC2 Instances using Systems Manager - A Hands-On Guide

lionelpj

LionelPJ

Posted on March 10, 2022


Systems Manager is a wonderful service with many untapped features! One feature that has become popular recently is connecting to EC2 instances using Session Manager (a capability of Systems Manager) instead of SSH. To use Session Manager, you must enable Systems Manager in your account. The Systems Manager setup guide is exhaustive, but it is not very clear what the minimum requirements are. This article focuses on the minimal steps needed to do just that.

We will go through the resources we need and build them with CloudFormation instead of the console. I assume that you have an existing network, and my scripts work with it. If this is a new account, feel free to use my utility script to build a 3-tier network from scratch, and then apply the script I provide in the Resources section. Whenever I mention the first script in this article, I am referring to this 3-tier network script, which you are free to reuse.

I chose CloudFormation to automate the process and to give those preparing for AWS exams a working sample to refer to. I could have used Terraform, but I wanted to stick to AWS products and features.

If you are not familiar with the CloudFormation resources we are building or their properties, I have added references in the Resources section below.

Prerequisites:

  1. Log in as an IAM admin user, or with a role that has permissions to run CloudFormation. [This article is not focused on the privileges of this login identity; feel free to make it least privileged if you need to. Our focus is on enabling Systems Manager, which is the main theme of the article.]

  2. You need a VPC for this exercise. You can use an existing VPC with at least 2 subnets. The default VPC also works, but I strongly advise against it for production systems. If you don't have a VPC or want to create a new custom one for this exercise, check the Resources section to build one before you enable Systems Manager.

Resources to Build

1. A new mySsmRole that:
  • Has a trust policy for EC2
  • Has two AWS managed policies attached:

AmazonSSMManagedInstanceCore
This required managed policy enables an instance to use Systems Manager core service functionality. It provides the minimum permissions that allow the instance to:

* Register as a managed instance
* Send heartbeat information
* Send and receive messages for Run Command and Session Manager
* Retrieve State Manager association details
* Read parameters in Parameter Store

This policy replaces the older AmazonEC2RoleforSSM policy and is mandatory.

CloudWatchAgentServerPolicy
This policy enables the Amazon CloudWatch agent by allowing it to read information from the instance and write it to CloudWatch (metrics and logs). It also grants access to read Amazon EC2 tags, volumes, and CloudWatch agent configuration parameters in Parameter Store. Session Manager itself does not need it; you need it only if you run the CloudWatch agent. You can also create a more restrictive policy that, for example, limits write access to a specific CloudWatch Logs log stream. For more details, refer to the CloudWatch User Guide.
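As a rough sketch of such a restrictive policy, the fragment below attaches an inline policy to the role from step 1 that only lets the agent publish metrics and write to one log group. It is not part of my script: the logical ID MyCwAgentRestrictedPolicy, the policy name and the log group my-app-logs are hypothetical, and depending on your agent configuration you may need additional actions.

```yaml
  # Hypothetical inline policy: a narrower alternative to CloudWatchAgentServerPolicy.
  MyCwAgentRestrictedPolicy:
    Type: AWS::IAM::Policy
    Properties:
      PolicyName: cw-agent-restricted   # hypothetical name
      Roles:
        - !Ref MySsmRole
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          # PutMetricData does not support resource-level scoping, hence '*'
          - Effect: Allow
            Action:
              - 'cloudwatch:PutMetricData'
            Resource: '*'
          # Log writes are limited to the hypothetical log group "my-app-logs"
          - Effect: Allow
            Action:
              - 'logs:CreateLogStream'
              - 'logs:DescribeLogStreams'
              - 'logs:PutLogEvents'
            Resource:
              - !Sub 'arn:${AWS::Partition}:logs:${AWS::Region}:${AWS::AccountId}:log-group:my-app-logs'
              - !Sub 'arn:${AWS::Partition}:logs:${AWS::Region}:${AWS::AccountId}:log-group:my-app-logs:*'
```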

AmazonSSMDirectoryServiceAccess [Only applicable to Windows instances joining a domain]
This managed policy enables a managed instance to seamlessly join a domain by providing access to the required AWS Directory Service API actions. It is optional, and I am not adding it in this article.

Now that you have some background information, let's build our script, starting with the role:

Resources:
  MySsmRole:
    Type: AWS::IAM::Role
    Properties:
      Description: Role that allows SSM capability
      ManagedPolicyArns:
        - !Ref CWAgentServerPolicyArn
        - !Ref SSMManagedInstanceCoreArn
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - ec2.amazonaws.com
            Action:
              - 'sts:AssumeRole'
      Path: /
      Tags:
        - Key: Name
          Value: mySsmRole

Notice the two managed policies that are attached and the trust policy for EC2.

Parameters:
  CWAgentServerPolicyArn:
    Type: String
    Default: 'arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy'
  SSMManagedInstanceCoreArn:
    Type: String
    Default: arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore


These default values were obtained by visiting the IAM console and manually copying the ARNs.

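Note: Because the template does not set a RoleName, CloudFormation generates a unique role name during deployment by combining the stack name, the logical ID MySsmRole, and a random suffix. mySsmRole only appears as the value of the Name tag.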
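If you would rather not copy the ARNs by hand, one alternative (not what my script does) is to build them with the AWS::Partition pseudo parameter, so the policy ARNs also resolve outside the standard aws partition. The fragment below would replace the two !Ref entries under the role's ManagedPolicyArns; the two ARN parameters would then no longer be needed.

```yaml
      # Alternative to the CWAgentServerPolicyArn and SSMManagedInstanceCoreArn parameters:
      # AWS::Partition resolves to "aws", "aws-cn", "aws-us-gov", etc.
      ManagedPolicyArns:
        - !Sub 'arn:${AWS::Partition}:iam::aws:policy/CloudWatchAgentServerPolicy'
        - !Sub 'arn:${AWS::Partition}:iam::aws:policy/AmazonSSMManagedInstanceCore'
```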

2. Create an instance profile

When you create a role for EC2 in the IAM console, an instance profile with the same name is created automatically. When you automate this with CloudFormation or Terraform instead, it is your responsibility to create it. The instance profile is just a container that passes the IAM role, and the permissions attached to it, to an EC2 instance.

So our script would now look like -

  MyEc2InstanceProfile:
    Type: AWS::IAM::InstanceProfile
    Properties:
      Path: "/"
      Roles:
      - !Ref MySsmRole
      InstanceProfileName: MySsmRoleInstanceProfile

Remember that MySsmRole was just created in step 1.
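If another stack (for example, one that launches test instances) needs this instance profile, you could export its name in the same way the network script exports its values. This is a sketch; the output logical ID and the export name MySsmInstanceProfileName are hypothetical. For an AWS::IAM::InstanceProfile, !Ref returns the instance profile name.

```yaml
Outputs:
  # Hypothetical export so other stacks can use !ImportValue MySsmInstanceProfileName
  MySsmInstanceProfileName:
    Description: Instance profile that carries MySsmRole
    Value: !Ref MyEc2InstanceProfile
    Export:
      Name: MySsmInstanceProfileName
```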

A note on CloudFormation conditional statements

Before we continue to the next section, I would like to explain how I add conditions to the script. They let the same script work either with the existing VPC, subnets, and route tables exported by my network stack, or with values that you provide.

My idea is that if you have used my basic 3-tier network script from the Resources section, those values are imported automatically across stacks. If you want to provide your own values instead, you are free to do so.

Here is how I do it. For example, for a VPC ID, I set the default value to an empty string:

clientVpc:
    Type: String 
    Default: ''

If you override it with a value, this condition evaluates to true:

clientVpcExists: !Not [ !Equals [!Ref clientVpc, '']]

Now, wherever I need a VPC ID, I wrap it in an !If so that the client VPC is used if it was provided; otherwise the value is imported from my previous stack:

VpcId: !If [clientVpcExists, !Ref clientVpc, !ImportValue MyVpc]
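!ImportValue only works if another stack in the same account and Region exports a value under that exact name. As a sketch of the exporting side, the network stack would need an Outputs entry like the one below. The logical IDs MyVpcId and MyVpc are assumptions about that stack; only the export name MyVpc has to match.

```yaml
# In the network (first) stack: export the VPC ID under the name "MyVpc".
# MyVpcId (output) and MyVpc (the VPC resource) are assumed logical IDs.
Outputs:
  MyVpcId:
    Value: !Ref MyVpc
    Export:
      Name: MyVpc
```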

With this understanding let's continue with the next section.

3. Create Interface and Gateway Endpoints

Our EC2 instances are private and have no internet connectivity. To allow connections through Systems Manager, we add the interface endpoints and the S3 gateway endpoint listed in the table below. Optional endpoints are not added; if you need them, please modify my script.

| Endpoint | Purpose | Mandatory |
| --- | --- | --- |
| com.amazonaws.region.ssm | For the Systems Manager service | Yes |
| com.amazonaws.region.ssmmessages | To connect to our instances through a secure data channel using Session Manager | Yes |
| com.amazonaws.region.ec2 | To create snapshots or call EBS | Yes |
| com.amazonaws.region.ec2messages | Systems Manager uses this endpoint to make calls from SSM Agent to the Systems Manager service | Yes |
| com.amazonaws.region.s3 | Systems Manager uses this endpoint to update SSM Agent, perform patching operations, and for tasks like uploading output logs you choose to store in S3 buckets, retrieving scripts or other files you store in buckets, and so on. If the security group associated with your instances restricts outbound traffic, you must add a rule to allow traffic to the prefix list for Amazon S3. | Yes |

Sample snippets of our endpoints would look like -

ec2InterfaceEndpoint:
    Type: AWS::EC2::VPCEndpoint
    Properties:
      VpcEndpointType: Interface
      ServiceName: !Sub 'com.amazonaws.${AWS::Region}.ec2'
      VpcId: !If [clientVpcExists, !Ref clientVpc, !ImportValue MyVpc]
      SubnetIds: 
        - !If [clientSubnetAExists, !Ref clientSubnetA, !ImportValue AppSubnetA]
        - !If [clientSubnetBExists, !Ref clientSubnetB, !ImportValue AppSubnetB]
      SecurityGroupIds:
        - !Ref myDefaultSecurityGroup
      PrivateDnsEnabled: true

The other interface endpoints (for example, ec2MessagesInterfaceEndpoint and ssmInterfaceEndpoint) look the same, except for the ServiceName, which uses the service-specific endpoint from the table above.
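For example, the Session Manager data channel endpoint (ssmmessages) from the table follows the same pattern. The logical ID ssmMessagesInterfaceEndpoint is my assumption; apart from ServiceName, the properties are copied from the ec2 endpoint snippet above.

```yaml
  ssmMessagesInterfaceEndpoint:   # hypothetical logical ID
    Type: AWS::EC2::VPCEndpoint
    Properties:
      VpcEndpointType: Interface
      # Session Manager opens its secure data channel through this endpoint
      ServiceName: !Sub 'com.amazonaws.${AWS::Region}.ssmmessages'
      VpcId: !If [clientVpcExists, !Ref clientVpc, !ImportValue MyVpc]
      SubnetIds:
        - !If [clientSubnetAExists, !Ref clientSubnetA, !ImportValue AppSubnetA]
        - !If [clientSubnetBExists, !Ref clientSubnetB, !ImportValue AppSubnetB]
      SecurityGroupIds:
        - !Ref myDefaultSecurityGroup
      # Resolves the regional service hostname to the endpoint's private IPs
      PrivateDnsEnabled: true
```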

S3GatewayEndpoint:
    Type: AWS::EC2::VPCEndpoint
    Properties:
      ServiceName: !Sub 'com.amazonaws.${AWS::Region}.s3'
      VpcEndpointType: Gateway
      VpcId: !If [clientVpcExists, !Ref clientVpc, !ImportValue MyVpc]
      RouteTableIds:
        - !If [clientRouteTableAExists, !Ref clientRouteTableA, !ImportValue PrivateARouteTable]
        - !If [clientRouteTableBExists, !Ref clientRouteTableB, !ImportValue PrivateBRouteTable]
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal: '*'
            Action: '*'
            Resource: '*'

Notice that the gateway endpoint is attached to route tables (it adds a route to Amazon S3 in each one) instead of to subnets and security groups.

My parameters and conditions to make these work are as follows:

  clientVpc:
    Type: String 
    Default: ''
  clientSubnetA:
    Type: String
    Default: ''
  clientSubnetB:
    Type: String
    Default: ''
  clientRouteTableA:
    Type: String
    Default: ''
  clientRouteTableB:
    Type: String
    Default: ''
Conditions:
  clientVpcExists: !Not [ !Equals [!Ref clientVpc, '']]
  clientSubnetAExists: !Not [ !Equals [!Ref clientSubnetA, '']]
  clientSubnetBExists: !Not [ !Equals [!Ref clientSubnetB, '']]
  clientRouteTableAExists: !Not [ !Equals [!Ref clientRouteTableA, '']]
  clientRouteTableBExists: !Not [ !Equals [!Ref clientRouteTableB, '']]
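The endpoint snippets also reference myDefaultSecurityGroup, which is not in the parameter list above. A minimal sketch of that parameter, assuming it is supplied by you at deployment time, could use an AWS-specific parameter type so that CloudFormation validates the ID against the security groups in your account and Region.

```yaml
Parameters:
  # Assumed definition: the security group attached to the interface endpoints
  myDefaultSecurityGroup:
    Type: AWS::EC2::SecurityGroup::Id
    Description: Default security group of the VPC that hosts the endpoints
```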

For reference - the complete script is available in the resources section.

Notes for success:
  • The security group ID that you provide to this script (the VPC's default security group in my example) is attached to the interface endpoints, so it must allow HTTPS (TCP 443) from your instances. The default security group does this for instances that are also members of it, because it allows all traffic between its members. This is what lets your private instance reach Systems Manager and receive updates through the endpoints.
  • The security group must belong to the VPC that you chose to build your resources in.
  • The subnets must belong to that VPC and be in the same Region.
  • The private route tables must be associated with the private subnets in the VPC.
  • When a new instance is up and running, the Session Manager Connect button is enabled if everything was done right. Otherwise, you will have to backtrack and identify the step you missed.
  • Many AWS-provided AMIs, such as Amazon Linux 2, come with SSM Agent preinstalled; the Systems Manager User Guide lists them. If your OS is not on that list, check the guide on how to install SSM Agent.
  • When the instance first starts, SSM Agent registers it with Systems Manager, and it is then ready to use.
  • If Systems Manager cannot be enabled, the console does not tell you why. Instead, it shows an 8-step guide on enabling Systems Manager.
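If you would rather not rely on the VPC's default security group, a common alternative is a dedicated security group for the interface endpoints that only allows HTTPS from inside the VPC. This is a sketch, not part of my script: the vpcCidr parameter and the endpointSecurityGroup logical ID are hypothetical, and you would reference endpointSecurityGroup in each endpoint's SecurityGroupIds instead of myDefaultSecurityGroup.

```yaml
Parameters:
  vpcCidr:                      # hypothetical: your VPC's CIDR block, e.g. 10.0.0.0/16
    Type: String
Resources:
  endpointSecurityGroup:        # hypothetical logical ID
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Allow HTTPS from the VPC to the SSM interface endpoints
      VpcId: !If [clientVpcExists, !Ref clientVpc, !ImportValue MyVpc]
      SecurityGroupIngress:
        # SSM Agent reaches the interface endpoints over HTTPS
        - IpProtocol: tcp
          FromPort: 443
          ToPort: 443
          CidrIp: !Ref vpcCidr
```

The instances' own security group must still allow outbound HTTPS, which security groups allow by default unless you have restricted outbound traffic.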

Testing Our Script

1. Testing script with a custom VPC using CloudFormation Console

For ease, I use the network created by my first script for the custom VPC values. Feel free to use your own values instead.

Here I do not set any client parameters, because I am going to use the values that my first script exported cross-stack. Leave the default values and click Next. Review the settings, acknowledge that CloudFormation might create IAM resources with custom names, and create the stack. Wait until the stack reaches CREATE_COMPLETE.

Verifying SSM access using a new EC2 instance

Let's launch a new instance using a free-tier AMI. The only settings that differ from your usual choices are the ones listed below, so please make sure they are set correctly:

  • Network settings and IAM role: choose myVpc, the appA subnet, and the new instance profile that was created.
  • Tags: give the instance a meaningful name.
  • Security group: choose the default security group.

Finally, launch the instance. Wait for it to reach the running state, then connect to it.
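If you prefer to launch the test instance from CloudFormation as well, a sketch could look like the following. The parameter and resource names are hypothetical; the AMI is resolved from the public Amazon Linux 2 SSM parameter (Amazon Linux 2 ships with SSM Agent), t2.micro is assumed for the free tier, and the instance profile defaults to the MySsmRoleInstanceProfile name set in step 2.

```yaml
Parameters:
  latestAmiId:                 # latest Amazon Linux 2 AMI from a public SSM parameter
    Type: AWS::SSM::Parameter::Value<AWS::EC2::Image::Id>
    Default: /aws/service/ami-amazon-linux-latest/amzn2-ami-hvm-x86_64-gp2
  testSubnet:                  # hypothetical: the private appA subnet
    Type: AWS::EC2::Subnet::Id
  testSecurityGroup:           # hypothetical: the VPC's default security group
    Type: AWS::EC2::SecurityGroup::Id
  testInstanceProfile:         # instance profile name from step 2
    Type: String
    Default: MySsmRoleInstanceProfile
Resources:
  mySsmTestInstance:           # hypothetical logical ID
    Type: AWS::EC2::Instance
    Properties:
      ImageId: !Ref latestAmiId
      InstanceType: t2.micro
      SubnetId: !Ref testSubnet
      SecurityGroupIds:
        - !Ref testSecurityGroup
      # The instance profile hands MySsmRole's permissions to SSM Agent
      IamInstanceProfile: !Ref testInstanceProfile
      Tags:
        - Key: Name
          Value: my-private-ssm-test
```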

Select the instance and click Connect. When Session Manager is enabled, the Connect button on the Session Manager tab is available to click. Once connected, I ran a yum update on the new private instance, and it completed successfully over the private connection.

Cleanup

If you have not done so already, terminate your private instance and delete the SSM stack so you don't end up paying for the endpoints while they are not in use.

2. Testing script with a default VPC using CloudFormation Console

To test with a different VPC, I am going to use the default VPC for convenience. You may ask whether VPC endpoints are needed at all, since the default VPC's subnets are public. I tested this and found that a new instance created with just the role did not get Systems Manager enabled.

A typical error message is shown when Session Manager can't connect.

So, to test my script with the default VPC, I had to modify it slightly and comment out one line in the S3 gateway endpoint, because the default VPC has only one route table.

  S3GatewayEndpoint:
    Type: AWS::EC2::VPCEndpoint
    Properties:
      ServiceName: !Sub 'com.amazonaws.${AWS::Region}.s3'
      VpcEndpointType: Gateway
      VpcId: !If [clientVpcExists, !Ref clientVpc, !ImportValue MyVpc]
      RouteTableIds:
        - !If [clientRouteTableAExists, !Ref clientRouteTableA, !ImportValue PrivateARouteTable]
        # - !If [clientRouteTableBExists, !Ref clientRouteTableB, !ImportValue PrivateBRouteTable]
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal: '*'
            Action: '*'
            Resource: '*'

Notice that the second route table entry is commented out.
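Instead of commenting the line out, you could keep one template for both scenarios by dropping the second route table when a client VPC is given without a second route table. This is a sketch of that idea, not my original script: it nests a second !If and returns AWS::NoValue, which removes the value from the Fn::If result and, as far as I know, the item from the list as well; please verify this in your own account before relying on it.

```yaml
      RouteTableIds:
        - !If [clientRouteTableAExists, !Ref clientRouteTableA, !ImportValue PrivateARouteTable]
        # Use clientRouteTableB if given; otherwise drop the entry when a client VPC
        # is supplied (e.g. the default VPC), or import it for the 3-tier network.
        - !If
          - clientRouteTableBExists
          - !Ref clientRouteTableB
          - !If [clientVpcExists, !Ref 'AWS::NoValue', !ImportValue PrivateBRouteTable]
```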

Now, if you launch the script and provide the client VPC ID, two subnet IDs, and the VPC's route table ID for the clientRouteTableA parameter, the stack deploys successfully.

Notice that all values are overridden with the default VPC's values. All other options are the same as in the previous scenario. When you launch the test instance, again give it a meaningful name tag and choose the default security group.

Once the instance is running, you can connect to this public instance with Session Manager.

Congratulations! You have just enabled Systems Manager in your account.

Cleanup

Just like last time, clean up after yourself by terminating your public instance and deleting the SSM stack, so you don't end up paying for the endpoints while they are not in use.

Resources

[about Lionel Pulickal]

Lionel is a Cloud Engineer who has worked in the IT industry since 1997. He has passed all three AWS associate-level exams, as well as the Solutions Architect Professional and Advanced Networking Specialty exams. He loves hands-on work and always loves to share the knowledge he has gained over the years.
