Automatic Golden Image Generation using CI/CD

Introduction:

In every organization, security and compliance guardrails are put in place to keep environments aligned with client expectations and agreements. There are many types of guardrails and compliance parameters, and golden image creation is one of them. Before diving deep, let's understand what a golden image is.
A golden image is an image that has all required and supporting packages installed, such as agent packages, software and utility packages, and the vulnerability agent package. Other packages approved by the client can be installed as well. When you build a golden image for the first time, you have to make sure that all required tools are installed and running properly on the server (Windows/Linux) to support the environment, and that everything is aligned with the approved SOE parameters document. In addition, the OS needs to be updated with the latest patches for the current month. Once all of this is done, take a snapshot (image) of that instance and treat it as the base image, which is known as the golden image. This image is used for future server builds.

Diagram:

Prerequisites:

GitLab
Terraform
Ansible (optional)
AWS Cloud Platform

Guidelines:

In this project, I build a golden image for the first time. I did not have any earlier image, so we are starting from scratch. The planned action items for this project are ->

Build an AWS EC2 instance using Terraform.
Provision the EC2 instance using Ansible.
Create a CI/CD pipeline to run the sequence of activities.
Once the entire provisioning is completed, take an AMI of that instance.
Lastly, terminate the instance.

Note: Because this is the first build, Ansible is required, as no OS hardening parameters have been implemented yet. Once the instance has been provisioned with the latest patches and all security standards, and the image has been created, Ansible will not be required for next month's run, because the OS hardening parameters will already be baked into last month's image.

Build an Instance using Terraform

I have taken a sample base image (not last month's golden image) as a reference, fetched it using Terraform, and created a new EC2 instance from it.

var.tf

variable "instance_type" {
  description = "ec2 instance type"
  type        = string
  default     = "t2.micro"
}

data.tf

## fetch AMI ID ##
data "aws_ami" "ami_id" {
  most_recent = true
  filter {
    name   = "tag:Name"
    values = ["Golden-Image_2024-06-13"]
  }
}

## Fetch SG and Keypair ##
data "aws_key_pair" "keypair" {
  key_name           = "keypair3705"
  include_public_key = true
}

data "aws_security_group" "sg" {
  filter {
    name   = "tag:Name"
    values = ["management-sg"]
  }
}

## Fetch IAM role ##
data "aws_iam_role" "instance_role" {
  name = "CustomEC2AdminAccess"
}

## Fetch networking details ##
data "aws_vpc" "vpc" {
  filter {
    name   = "tag:Name"
    values = ["custom-vpc"]
  }
}

data "aws_subnet" "subnet" {
  filter {
    name   = "tag:Name"
    values = ["management-subnet"]
  }
}

instance.tf

resource "aws_iam_instance_profile" "test_profile" {
  name = "InstanceProfile"
  role = data.aws_iam_role.instance_role.name
}

resource "aws_instance" "ec2" {
  ami                         = data.aws_ami.ami_id.id
  instance_type               = var.instance_type
  associate_public_ip_address = true
  availability_zone           = "us-east-1a"
  key_name                    = data.aws_key_pair.keypair.key_name
  vpc_security_group_ids      = [data.aws_security_group.sg.id]
  iam_instance_profile        = aws_iam_instance_profile.test_profile.name
  subnet_id                   = data.aws_subnet.subnet.id
  user_data                   = file("userdata.sh")

  root_block_device {
    volume_size = 15
    volume_type = "gp2"
  }
  tags = {
    "Name" = "GoldenImageVM"
  }
}

output.tf

output "ami_id" {
  value = {
    id               = data.aws_ami.ami_id.image_id
    arn              = data.aws_ami.ami_id.arn
    image_loc        = data.aws_ami.ami_id.image_location
    state            = data.aws_ami.ami_id.state
    creation_date    = data.aws_ami.ami_id.creation_date
    image_type       = data.aws_ami.ami_id.image_type
    platform         = data.aws_ami.ami_id.platform
    owner            = data.aws_ami.ami_id.owner_id
    root_device_name = data.aws_ami.ami_id.root_device_name
    root_device_type = data.aws_ami.ami_id.root_device_type
  }
}

output "ec2_details" {
  value = {
    arn         = aws_instance.ec2.arn
    id          = aws_instance.ec2.id
    private_dns = aws_instance.ec2.private_dns
    private_ip  = aws_instance.ec2.private_ip
    public_dns  = aws_instance.ec2.public_dns
    public_ip   = aws_instance.ec2.public_ip

  }
}

output "key_id" {
  value = {
    id          = data.aws_key_pair.keypair.id
    fingerprint = data.aws_key_pair.keypair.fingerprint
  }
}

output "sg_id" {
  value = data.aws_security_group.sg.id
}

output "role_arn" {
  value = {
    arn = data.aws_iam_role.instance_role.arn
    id  = data.aws_iam_role.instance_role.id
  }
}

userdata.sh

#!/bin/bash
# User data runs as root on first boot; sudo is kept so the script also works when run manually.
sudo yum install jq -y

## Fetch the GitLab runner password from Parameter Store
## (add --with-decryption if the parameter is stored as a SecureString)
GITLAB_PWD=$(aws ssm get-parameter --name "gitlab-runner_password" --region 'us-east-1' | jq -r .Parameter.Value)

## Set the password for ec2-user
PASSWORD_HASH=$(openssl passwd -1 "$GITLAB_PWD")
sudo usermod --password "$PASSWORD_HASH" ec2-user

## Create the gitlab-runner user and set its password
USER='gitlab-runner'
sudo useradd -m -u 1001 -p "$PASSWORD_HASH" "$USER"

## Copy the GitLab SSH key to the new server for the gitlab-runner user
sudo mkdir -p /home/$USER/.ssh
sudo chmod 700 /home/$USER/.ssh
Ansible_SSH_Key=$(aws ssm get-parameter --name "Ansible-SSH-Key" --region 'us-east-1' | jq -r .Parameter.Value)
echo "$Ansible_SSH_Key" | sudo tee /home/$USER/.ssh/authorized_keys > /dev/null
echo "StrictHostKeyChecking no" | sudo tee -a /home/$USER/.ssh/config > /dev/null
sudo chown -R $USER:$USER /home/$USER/.ssh/
sudo chmod 600 /home/$USER/.ssh/authorized_keys

## Grant passwordless sudo to gitlab-runner
echo "$USER ALL=(ALL) NOPASSWD: ALL" | sudo tee /etc/sudoers.d/00-$USER > /dev/null

## Allow root login and password authentication over SSH, then restart sshd
sudo sed -i 's/^#PermitRootLogin.*/PermitRootLogin yes/; s/^PasswordAuthentication no/PasswordAuthentication yes/' /etc/ssh/sshd_config
sudo systemctl restart sshd
sleep 40

Here, a shell script sets up the prerequisites for Ansible, such as creating the user and granting sudo access.

Provision EC2 instance using Ansible:

Note: Before triggering the Ansible job in GitLab, please log in once from the GitLab runner to the server you just built. The runner connects to the new server over SSH for Ansible provisioning, and the job will fail with an error if this first login has not been done.
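Assuming the error in question is the SSH host key prompt on the runner, the manual login can also be replaced by recording the new server's host key up front. The Python sketch below is only an illustration: the function names and the known_hosts path are hypothetical and are not part of the original repository. It relies on the standard ssh-keyscan tool, and it trusts whatever key the host presents, so it only makes sense for a server the pipeline has just built.

```python
import subprocess
from pathlib import Path
from typing import List


def keyscan_command(host: str) -> List[str]:
    """Build the ssh-keyscan call; -T sets the connection timeout in seconds."""
    return ["ssh-keyscan", "-T", "10", host]


def append_known_hosts(known_hosts: Path, scanned_keys: str) -> int:
    """Append key lines that are not already in the file; return how many were added."""
    current = known_hosts.read_text() if known_hosts.exists() else ""
    existing = set(current.splitlines())
    new_lines = [
        line for line in scanned_keys.splitlines()
        if line.strip() and not line.startswith("#") and line not in existing
    ]
    if not new_lines:
        return 0
    known_hosts.parent.mkdir(parents=True, exist_ok=True)
    # Start on a fresh line if the existing file does not end with a newline.
    prefix = "" if current == "" or current.endswith("\n") else "\n"
    with known_hosts.open("a") as handle:
        handle.write(prefix + "\n".join(new_lines) + "\n")
    return len(new_lines)


def trust_host(host: str, known_hosts: Path) -> int:
    """Scan the host and record its keys (trust on first use)."""
    result = subprocess.run(
        keyscan_command(host), capture_output=True, text=True, check=True
    )
    return append_known_hosts(known_hosts, result.stdout)
```

On the runner this could be called as trust_host(private_ip, Path.home() / ".ssh" / "known_hosts") before ansible-playbook starts. Logging in by hand, as described in the note, remains the more cautious option because a person gets to look at the server first.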

main.yml

---
- name: Set hostname
  hosts: server
  become: true
  gather_facts: false
  vars_files:
    - ../vars/variable.yml
  roles:
    - ../roles/hostnamectl

- name: Configure other services
  hosts: server
  become: true
  roles:
    - ../roles/ssh
    - ../roles/login_banner
    - ../roles/services
    - ../roles/timezone
    - ../roles/fs_integrity
    - ../roles/firewalld
    - ../roles/log_management
    - ../roles/rsyslog
    - ../roles/cron
    - ../roles/journald

- name: Start Prepatch
  hosts: server
  become: true
  roles:
    - ../roles/prepatch

- name: Start Patching
  hosts: server
  become: true
  roles:
    - ../roles/patch

- name: Start Postpatch
  hosts: server
  become: true
  roles:
    - ../roles/postpatch

- name: Reboot the server
  hosts: server
  become: true
  tasks:
    - reboot:
        msg: "Rebooting machine in 5 seconds"

Prepare GitLab CI/CD Pipeline:

The entire deployment activity is split into stages. It starts with validation, to make sure that all required services are running as expected.
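The check_version.sh script itself is not shown in this blog. As a rough idea of what such a validation step can do, here is a minimal Python sketch that checks whether the command-line tools used by the later stages are on the PATH. The tool list and the function names are assumptions for illustration, not the contents of the real script.

```python
import shutil
from typing import Iterable, List

# Assumed tool list, based on the commands used later in the pipeline.
REQUIRED_TOOLS = ("terraform", "ansible-playbook", "aws", "python3")


def missing_tools(tools: Iterable[str]) -> List[str]:
    """Return the tools that cannot be found on PATH, in the order given."""
    return [tool for tool in tools if shutil.which(tool) is None]


def main() -> None:
    """Fail the job with a non-zero exit status if anything is missing."""
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        raise SystemExit("Missing required tools: " + ", ".join(missing))
    print("All required tools are available")
```

Calling main() from the Validation job stops the pipeline early, before any AWS resource is created, when the runner is not set up correctly.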

If validation passes, the pipeline proceeds to build the resource (EC2) using Terraform. Here, I have used Terraform Cloud to make things more reliable and to store the state file in a managed location provided by HashiCorp, but the Terraform CLI can be used just as well.

After a successful resource build, the instance is provisioned using the Ansible CLI to implement basic security standards and complete the OS hardening process.
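For this step the pipeline looks up the private IP of the new instance and writes it into an Ansible inventory under the [server] group. The Python sketch below shows the same idea with one extra guard. When the lookup matches nothing, the AWS CLI prints None (text output) or null (JSON output), and that value would otherwise end up in the inventory as a host name. The function names are hypothetical.

```python
from pathlib import Path
from typing import Iterable

# Values the AWS CLI prints when a --query expression matches nothing.
EMPTY_LOOKUPS = {"", "None", "null"}


def render_inventory(group: str, hosts: Iterable[str]) -> str:
    """Return INI-style inventory text with one host per line under [group]."""
    cleaned = [host.strip() for host in hosts]
    cleaned = [host for host in cleaned if host not in EMPTY_LOOKUPS]
    if not cleaned:
        raise ValueError("no usable host address for group " + group)
    return "[{}]\n{}\n".format(group, "\n".join(cleaned))


def write_inventory(path: Path, group: str, hosts: Iterable[str]) -> None:
    """Write the inventory file that ansible-playbook reads with -i."""
    path.write_text(render_inventory(group, hosts))
```

For example, render_inventory("server", ["10.0.1.25"]) produces the same two lines that the echo command in job3 writes.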

Once provisioning and patching are complete, a pipeline job takes an AMI using AWS CLI commands, and a final job terminates the instance.
Below are the stages of this pipeline ->

Validation
InstanceBuild
InstancePatching
TakeAMI
Terminate

.gitlab-ci.yml

default:
  tags:
    - anirban

stages:
  - Validation
  - InstanceBuild
  - InstancePatching
  - TakeAMI
  - Terminate

job1:
  stage: Validation
  script:
    - sudo chmod +x check_version.sh
    - source check_version.sh
  except:
    changes:
      - README.md
  artifacts:
    when: on_success
    paths:
      - Validation_artifacts

job2:
  stage: InstanceBuild
  script:
    - sudo chmod +x BuildScript/1_Env.sh
    - source BuildScript/1_Env.sh
    - python3 BuildScript/2_CreateTFCWorkspace.py -vvv
  except:
    changes:
      - README.md
  artifacts:
    paths:
      - Validation_artifacts
      - content.tar.gz

job3:
  stage: InstancePatching
  script:
    # Look up the private IP of the instance built in job2 and write the inventory
    - INSTANCE_PRIVATEIP=$(aws ec2 describe-instances --filters "Name=tag:Name,Values=GoldenImageVM" --query "Reservations[0].Instances[0].PrivateIpAddress" | xargs)
    - echo -e "[server]\n$INSTANCE_PRIVATEIP" > ./Ansible/inventory
    - ansible-playbook ./Ansible/playbook/main.yml -i ./Ansible/inventory
    - sudo chmod +x BuildScript/7_Cleanup.sh
  when: manual
  except:
    changes:
      - README.md
  artifacts:
    when: on_success
    paths:
      - Validation_artifacts
      - ./Ansible/inventory

job4:
  stage: TakeAMI
  script:
    - echo '------------Fetching Instance ID------------'
    - INSTANCE_ID=$(aws ec2 describe-instances --filters "Name=tag:Name,Values=GoldenImageVM" --query "Reservations[0].Instances[0].InstanceId" | xargs)
    - echo '----------Taking an Image of Instance-----------'
    - aws ec2 create-image --instance-id $INSTANCE_ID --name "GoldenImage" --description "Golden Image created on $(date -u +"%Y-%m-%dT%H:%M:%SZ")" --no-reboot --tag-specifications "ResourceType=image,Tags=[{Key=Name,Value=GoldenImage}]" "ResourceType=snapshot,Tags=[{Key=Name,Value=DiskSnaps}]"
  when: manual
  except:
    changes:
      - README.md

job5:
  stage: Terminate
  script:
    - echo '------------Fetching Instance ID------------'
    - INSTANCE_ID=$(aws ec2 describe-instances --filters "Name=tag:Name,Values=GoldenImageVM" --query "Reservations[0].Instances[0].InstanceId" | xargs)
    - echo '--------------------Terminating the Instance--------------------'
    - aws ec2 terminate-instances --instance-ids $INSTANCE_ID
  when: manual
  except:
    changes:
      - README.md
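
One thing to keep in mind for the monthly run: job4 uses the fixed name GoldenImage, and an AMI name has to be unique within a region for the account, so a second run fails until the older image is deregistered. The data.tf file looks up a tag like Golden-Image_2024-06-13, which suggests a date-stamped name. The Python sketch below builds such a name and the matching AWS CLI calls, and waits for the image to become available before anything terminates the instance. The helper names are hypothetical, and the naming pattern is an assumption taken from that tag value.

```python
import subprocess
from datetime import date
from typing import List


def golden_image_name(build_date: date, prefix: str = "Golden-Image") -> str:
    """Return a date-stamped name such as Golden-Image_2024-06-13."""
    return "{}_{}".format(prefix, build_date.isoformat())


def create_image_command(instance_id: str, name: str) -> List[str]:
    """Build the create-image call; --query/--output make it print only the AMI ID."""
    return [
        "aws", "ec2", "create-image",
        "--instance-id", instance_id,
        "--name", name,
        "--no-reboot",
        "--tag-specifications",
        "ResourceType=image,Tags=[{{Key=Name,Value={}}}]".format(name),
        "--query", "ImageId",
        "--output", "text",
    ]


def wait_command(image_id: str) -> List[str]:
    """Build the waiter call that blocks until the AMI is in the available state."""
    return ["aws", "ec2", "wait", "image-available", "--image-ids", image_id]


def build_image(instance_id: str, build_date: date) -> str:
    """Create the AMI, wait until it is available, and return its ID."""
    name = golden_image_name(build_date)
    created = subprocess.run(
        create_image_command(instance_id, name),
        capture_output=True, text=True, check=True,
    )
    image_id = created.stdout.strip()
    subprocess.run(wait_command(image_id), check=True)
    return image_id
```

With a name like this, next month's Terraform lookup can filter on the new tag value, and the Terminate stage only runs once the image is confirmed to be available.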

Validation:

As the images below show, the instance was launched and provisioned successfully, and the AMI was taken after that.

Conclusion:

We are now at the end of this blog. I hope it gives an idea of how a pipeline can be set up to build an image without any manual intervention. However, in the pipeline I have followed the Continuous Delivery approach, hence a few stages are set to be triggered manually. One thing must be highlighted: do not set the Ansible stage (job3) in GitLab as automatic. Use the when: manual key to make this stage manual. As I mentioned above, the Ansible stage requires the GitLab runner to log in to the newly built server. I could have added that as a command in the pipeline, but I chose to verify things by logging in to the server from the GitLab runner myself.

Hopefully you have enjoyed this blog. Please go through it and do the hands-on for sure🙂🙂. Please let me know how you felt, what went well, and where I could have done a little better. All responses are welcome💗💗.

For upcoming updates, please stay tuned and get in touch. In the meantime, let's dive into the GitHub repository below -> 👇👇

Thanks Much!!
Anirban Das.
