CI/CD using GitHub Actions for Rails and Docker

Archonic

Posted on August 31, 2022

I've spent far too long doing something I technically didn't need, so now I'm going to spend more time writing an article about it 😉. Once upon a time, I was running tests on CodeShip. It was meh. It wasn't really designed for parallelization (at the time), it was slow as heck, I would get transient failures whenever Elasticsearch threw a fit, and the results were just spat out in one huge text blob. So I moved to CircleCI.

Moving from CodeShip to CircleCI was relatively easy. The interface on CircleCI was more responsive, it parsed the RSpec test results better, and parallelization was easy to implement. I still managed to find things I didn't like: the parallel runners had lopsided timing which refused to update, and worst of all, there was no way to be notified once all parallel workers had either succeeded or failed. I could only get a pass or fail notification for each worker, meaning that each push to GitHub would result in 10 Slack notifications 😱. I burned a few hours trying to fix that but only found plenty of posts saying it wasn't possible. I decided to switch services when I could find the time.

When GitHub Actions was announced, it seemed a more attractive option since I could have both a git repo and CI/CD infra in one place. Oh, and it was free. But would the switch be worth it? Would I be able to parallelize GitHub Actions? How fast would it be? How long would it take to set up? Would I be able to finally stop being bombarded by 1 Slack notification per parallel worker? So many questions and only one way to find out.

The Old Setup

My previous setup involved a lot of building images and a lot of waiting.

  1. [Manual] Build the image locally, reach some working state
  2. [Optional] Push the image to Google Container Registry (GCR) if there were changes to the OS or apk/apt packages
  3. [Manual] Push to GitHub
  4. [Automatic] CircleCI pulls the latest image from GCR, pulls the branch or latest master commit from GitHub, re-runs bundle and yarn, precompiles assets, then runs RSpec
  5. [Manual] If all 10 workers reported tests passing (eye twitch), then I would trigger a deploy locally by running git push hostname:master, where hostname is the git remote of my production server where Dokku is configured. This would build the image from scratch again, then deploy that image.

If that sounds wasteful, that's because it is. There was some utility in having CircleCI pull from GCR and then rerun steps: I only needed to push/pull when there were changes beneath the bundle layer, which meant OS and APT/APK library dependencies. Having to remember to push to GCR for the CircleCI tests to be valid did trip me up a few times though.
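For context, step 2 was a manual push along these lines (the project and image names are placeholders, not my real ones):

docker build -t gcr.io/your-project/yourapp:latest .
docker push gcr.io/your-project/yourapp:latest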

The Dream

I had previously set up a GitHub Action to build the image and do nothing else with it. This simple sanity check is actually quite useful. It's possible, after all, to push a Dockerfile which can't build, and it helped me troubleshoot an issue where building worked on my Mac and on Windows with Ubuntu WSL, but not on Ubuntu (thanks Docker), which was required to deploy using Dokku. Getting that build action working was incredibly simple, but now it was time to see if I could reach the holy land: one repository service and CI/CD to rule them all.
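For the curious, a build-only sanity check needs very little. Here's a minimal sketch (the workflow name, trigger and action versions are assumptions, not necessarily what I used at the time):

name: Docker build check

on: [push]

jobs:
  build:
    runs-on: ubuntu-20.04
    steps:
      -
        name: Checkout repo
        uses: actions/checkout@v3
      - # push defaults to false, so this only proves the Dockerfile builds
        name: Build the image
        uses: docker/build-push-action@v3
        with:
          context: .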

I won't lie, getting tests running on GH Actions was rough. It took a lot of guess-and-check and waiting, which is why I wrote this article to save you from my torment! You're welcome. "Build once, use everywhere" is the dream, but seeing as there can be differences based on where we build (cough, thanks again Docker), I'm going to amend "the dream" to: build once, use everywhere, once we push to GitHub.

The New Setup

The new workflow should look like this:

  1. [Manual] Build the image locally, reach some working state
  2. [Manual] Push to GitHub
  3. [Automatic] GitHub Actions workflow begins. This builds the image using the GHA Docker layer cache and runs RSpec on that image via docker-compose.
  4. [Automatic] If this succeeds, push the already built image to ghcr.io
  5. [Automatic] Conditionally deploy to production by instructing Dokku to pull the image from ghcr.io (see the sketch after this list).
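The deploy workflow deserves its own article (see the end of this post), but to make step 5 concrete, here's the rough shape of it. Treat everything in this sketch as an assumption for illustration: appleboy/ssh-action is just one way to reach the Dokku host, myapp is a hypothetical app name, the secrets are made up, and dokku git:from-image requires a reasonably recent Dokku version:

  deploy:
    needs: build
    # only deploy pushes to master, never pull requests
    if: github.event_name == 'push' && github.ref == 'refs/heads/master'
    runs-on: ubuntu-20.04
    steps:
      -
        name: Tell Dokku to deploy the pushed image
        uses: appleboy/ssh-action@v0.1.5
        with:
          host: ${{ secrets.DOKKU_HOST }}
          username: ${{ secrets.DOKKU_SSH_USER }}
          key: ${{ secrets.DOKKU_SSH_KEY }}
          # assumes the SSH user can run the dokku CLI on the host
          script: dokku git:from-image myapp ghcr.io/${{ github.repository }}:master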

This is much more automated and, depending on how warm the various caches are, could be faster. In this article I'm not going to cover my attempt to parallelize RSpec within GitHub Actions. The important part is that after pushing to GH, we can go all the way to production automatically if everything went well.

Problems encountered were many

1 - Lack of instructional resources for this particular use case

There are lots of GHA resources out there, but I couldn't find any which build the image, then run tests on the output of that build. They all seem content to run tests on an ubuntu-latest stack, then build their own Dockerfile and assume the differences are irrelevant. My app requires a multi-stage Dockerfile build for tests to pass.
Solution: The article you're currently reading!

2 - Lack of configurability with the container statement

You can run your steps on a Docker image of your choosing with the container statement, but good luck running on a dynamic tag such as pr-123. The tags output which is typically available as ${{ steps.meta.outputs.tags }} is not available in the container statement. There's also no access to ${{ github.ref }} or even the env. That means I can't use the container statement 🤷‍♂️.
Solution: Instead of uploading the image to a container registry only to pull it back down and run tests on it, it's better to push only valid images to the registry and eliminate the network bottleneck. That means using load: true in the build step, then using docker-compose to run against the locally available image.
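Condensed down, the shape of that workaround looks like this (a sketch: the myapp tag is hypothetical and assumes your compose file's web service declares image: myapp; my real steps appear in main.yml below):

      -
        name: Build with Docker
        uses: docker/build-push-action@v3
        with:
          context: .
          # keep the image in the runner's local Docker daemon
          # rather than pushing it anywhere yet
          load: true
          tags: myapp:latest
      - # docker-compose resolves the service's image: tag locally
        name: Run tests on the image we just built
        run: docker-compose run web bundle exec rspec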

3 - Providing env to docker without duplication

Providing the environment to the ubuntu-latest layer was easy using env in the workflow YAML file and ${{ secrets.EXAMPLE }}, but providing the env to the docker run command without a lot of duplication was a challenge. It has a very easy solution, but it comes with a gotcha (see 4).
Solution: Just dump the env to the .env file that docker-compose is expecting. For example:

-
  name: Run tests
  run: |
    env > .env
    docker-compose run web rails db:setup 
    docker-compose run web bundle exec rspec

4 - Commands just didn't work. Like, any of them.

bundle, rails and rake all said /usr/bin/env: ‘ruby’: No such file or directory.
Solution: This one was tricky, and it sure seemed like GHA was out to get me, but I actually did it to myself in step 3. The $PATH of the base ubuntu-latest environment won't suit your Dockerfile environment, and env > .env dumped the runner's $PATH into .env, overriding the image's. Check the $PATH inside your container with something like docker-compose run web sh -c 'echo $PATH' (the single quotes stop your host shell from expanding it) and make sure those paths are listed under PATH in your GH workflow YAML.

5 - Elasticsearch takes a long time to boot

Elasticsearch not being fully initialized before running the database seed step was causing this error:

Faraday::Error::ConnectionFailed: Couldn't connect to server

This was never an issue on my machine, because locally I would typically run docker-compose exec web rspec after running docker-compose up, which gives Elasticsearch plenty of time to boot.
Solution: It was finally time to set up a proper healthcheck in docker-compose.yml. Here's my elasticsearch service:

elasticsearch:
  container_name: elasticsearch
  image: docker.elastic.co/elasticsearch/elasticsearch:7.4.2
  environment:
    - discovery.type=single-node
    - cluster.name=docker-cluster
    - bootstrap.memory_lock=true
    - "ES_JAVA_OPTS=-Xms1024m -Xmx1024m"
    - "logger.org.elasticsearch=error"
  healthcheck:
    test: curl --fail elasticsearch:9200/_cat/health >/dev/null || exit 1
    interval: 30s
    timeout: 10s
    retries: 5
  ulimits:
    memlock:
      soft: -1
      hard: -1
  volumes:
    - esdata:/usr/share/elasticsearch/data
  ports:
    - 9200:9200

You may need to change your healthcheck test depending on your version of ES. This requires a corresponding depends_on statement in the web container:

depends_on:
  elasticsearch:
    condition: service_healthy
  postgres:
    condition: service_started
  redis:
    condition: service_started

It seems odd to me that you have to state that explicitly after defining a healthcheck. Maybe Docker Compose 4 will do it automatically. I ran into the same issue with Redis; the solution to that is in docker-ci.yml below.

6 - Test failures due to an apparent asset compilation issue

At this point I had tests running and some passing, which was a huge relief. But all request specs were failing with this:

ActionView::Template::Error:
Webpacker can't find application.js in /app/public/packs-test/manifest.json. Possible causes:
1. You want to set webpacker.yml value of compile to true for your environment
  unless you are using the `webpack -w` or the webpack-dev-server.
2. webpack has not yet re-run to reflect updates.
3. You have misconfigured Webpacker's config/webpacker.yml file.
4. Your webpack configuration is not creating a manifest.
Your manifest contains:
{
}

This was surprising because we just baked the assets into the image during the build step.

Solution: Tip o' the hat to Daniela Baron here: there's a real lifesaver of a tool called tmate.

Insert this and you basically get a breakpoint in your GH workflow:

- # Creates a SSH tunnel!
  name: Setup tmate session
  uses: mxschmitt/action-tmate@v3

Just wait for the step to start and copy the SSH command it prints into your terminal.
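One refinement worth knowing about: the action supports only opening the tunnel after something has already gone wrong, so it doesn't pause successful runs. A sketch:

      - # Creates a SSH tunnel, but only after an earlier step has failed
        name: Setup tmate session
        if: ${{ failure() }}
        uses: mxschmitt/action-tmate@v3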

Master must go inside the tunnel

This was a heck of an issue to troubleshoot. I've always baked production assets into the image. I've never had a missing assets error unless I'm messing with webpack configuration. This setup has always worked for every environment that has ever relied on it: CodeShip, CircleCI and two very different production environments.

Somehow, running this shows a populated manifest:

docker run you/yourapp cat public/packs/manifest.json

but this says no such file:

docker-compose run web cat public/packs/manifest.json

... wat? 🤔 How could the assets be obviously present in the image and then vanish when using docker-compose?

As with all things computers, this was another case of us shooting ourselves in the foot. Docker Compose mounts the volumes that you tell it to with a statement like this:

volumes:
  - ".:/app"

That takes the current directory (.) and mounts it into the /app directory within the container. This ends up writing a folder with no compiled assets over the folder that already had assets compiled. This wasn't noticeable on CodeShip or CircleCI because they had an explicit asset precompilation step. I thought of 3 ways to solve this:

  1. Use addnab/docker-run-action and give up on Docker Compose altogether 😅.
  2. Just recompile the assets despite having just compiled them. Easy but wasteful.
  3. Write a docker-compose.yml specifically for CI which does not mount local files.

I tried 1 and had difficulty passing in the env, plus trouble with connections to services. Then I tried 2 and it worked, but it was super slow. 3 turned out to be the right approach, and it brought the runtime down from 31m to 12m!

Without further ado

Here are the YAML files you've been waiting for.

docker-ci.yml

version: "3.7"
services:
  postgres:
    image: "postgres:14-alpine"
    environment:
      POSTGRES_USER: "example"
      POSTGRES_PASSWORD: "example"
    ports:
      - "5432:5432"
    volumes:
      - "postgres:/var/lib/postgresql/data"
  redis:
    image: "redis:5-alpine"
    command: ["redis-server", "--requirepass", "yourpassword", "--appendonly", "yes"]
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5
    ports:
      - "6379:6379"
    volumes:
      - redis:/data
    sysctls:
      # https://github.com/docker-library/redis/issues/35
      net.core.somaxconn: "511"
  sidekiq:
    depends_on:
      - "postgres"
      - "redis"
      - "elasticsearch"
    build:
      context: .
      args:
        environment: development
    image: you/yourapp
    command: bundle exec sidekiq -C config/sidekiq.yml.erb
    volumes:
      - ".:/app"
      # don"t mount tmp directory
      - /app/tmp
    env_file:
      - ".env"
  web:
    build:
      context: .
      args:
        environment: development
    image: you/yourapp
    command: bundle exec rspec
    depends_on:
      elasticsearch:
        condition: service_healthy
      postgres:
        condition: service_started
      redis:
        condition: service_healthy
    tty: true
    stdin_open: true
    ports:
      - "3000:3000"
    env_file:
      - ".env"
  elasticsearch:
    container_name: elasticsearch
    image: docker.elastic.co/elasticsearch/elasticsearch:7.4.2
    environment:
      - discovery.type=single-node
      - cluster.name=docker-cluster
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms1024m -Xmx1024m"
      - "logger.org.elasticsearch=error"
    healthcheck:
      test: curl --fail elasticsearch:9200/_cat/health >/dev/null || exit 1
      interval: 30s
      timeout: 10s
      retries: 5
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - esdata:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
volumes:
  redis:
  postgres:
  esdata:

main.yml

name: Main

on:
  push:
    branches: [ master ]
  pull_request:
    branches: [ master ]

jobs:
  build:
    name: Build, Test, Push
    runs-on: ubuntu-20.04
    env:
      REGISTRY: ghcr.io
      IMAGE_NAME: ${{ github.repository }}
      POSTGRES_USER: example
      POSTGRES_PASSWORD: example
      POSTGRES_HOST: postgres
      # Humour me here, this needs to be production for the sake of baking assets into the image
      RAILS_ENV: production
      NODE_ENVIRONMENT: production
      ACTION_MAILER_HOST: localhost:3000
      REDIS_URL: redis://redis:yourpassword@redis:6379
      DATABASE_URL: postgresql://example:example@postgres:5432/dbname?encoding=utf8&pool=5&timeout=5000
      ELASTICSEARCH_URL: elasticsearch:9200
      # Actually secret secrets
      EXAMPLE_KEY: ${{ secrets.EXAMPLE_KEY }}
      # Append our own PATHs because env > .env nukes it!
      PATH: /usr/sbin:/usr/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/local/bundle/bin:/usr/local/bundle/gems/bin:/usr/lib/fullstaq-ruby/versions/3.0.4-jemalloc/bin:/app/bin
    steps:
      -
        name: Checkout repo
        uses: actions/checkout@v3
        with:
          fetch-depth: 1

      - # Not required but recommended, to be able to build multi-platform images, export cache, etc.
        name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2

      -
        name: Log in to the Container registry
        uses: docker/login-action@f054a8b539a109f9f41c372932f1ae047eff08c9
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - # Supplies the branch name referenced by the metadata step below
        name: Extract branch name
        id: extract_branch
        shell: bash
        run: echo "branch=${GITHUB_HEAD_REF:-${GITHUB_REF#refs/heads/}}" >> "$GITHUB_OUTPUT"

      -
        name: Extract metadata (tags, labels) for Docker
        id: meta
        uses: docker/metadata-action@v3
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: ${{ steps.extract_branch.outputs.branch }}

      -
        name: Build with Docker
        uses: docker/build-push-action@v3
        with:
          context: .
          load: true
          cache-from: type=gha
          cache-to: type=gha,mode=max
          # the bare repo tag matches the image name docker-ci.yml expects locally
          tags: |
            ${{ steps.meta.outputs.tags }}
            ${{ github.repository }}:latest

      # - # Creates a SSH tunnel!
      #   name: Setup tmate session
      #   uses: mxschmitt/action-tmate@v3

      -
        name: Run tests
        run: |
          env > .env
          ./run-tests.sh

      - # NOTE Building in this step will use cache
        name: Build and push Docker image
        uses: docker/build-push-action@v3
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}

run-tests.sh (make sure it's committed with the executable bit set, since main.yml invokes it as ./run-tests.sh)

#!/bin/bash
# Uses docker-compose to run tests on GitHub Actions

# Exit if a step fails
set -e

echo "============== DB SETUP"
docker-compose -f docker-ci.yml run web rails db:reset RAILS_ENV=test

echo "============== RSPEC"
docker-compose -f docker-ci.yml up --abort-on-container-exit

Conclusions (for now)

As you can see, GitHub Actions is more freehand than most CI services. They basically say "here's Ubuntu, have fun". Running the image you just built is possible, but they certainly don't hold your hand. If you know of a GitHub Action that does hold your hand when it comes to running tests on the image that was just built, sound off in the comments!

If you have any suggestions for improvements, I'm all ears, although I can't say I'm very eager to make changes to this workflow any time soon 😫.

Was I able to parallelize GitHub Actions?

No. Just running on the built image was a real challenge. If you know how to parallelize GHA using docker-compose, let me know in the comments!

How long did it take to set up this GHA?

Too long. Embarrassingly long. But I saved you time, right?

... right?

How fast is it?

This comparison is not even close to fair: locally and on CircleCI I'm using an already-built image, and CircleCI has 10 parallel workers.

CircleCI (warm): 4m20s 🪴
Local (warm): 10m44s
GHA (warm): 12m42s

I am actually impressed with the time GHA gets considering that it's building the image, bundling (with a warm cache), precompiling assets and pushing the image to ghcr.io.

Was the switch worth it?

Right now, no. The amount of time it took to set this up will take a very long time to pay off, and 12m vs 4m is not an improvement. That said, it's a big benefit to not have to think about when to push a fresh image to the container registry, and it will be a huge benefit to deploy automatically without having to rebuild.

Was I able to finally stop being bombarded by 1 Slack notification per parallel worker?

Yes! But I also went back to a serial workflow with just one worker. I could have easily specified one worker in CircleCI and got the same result.

Stay tuned for deploy.yml for the Dokku portion of things, when I get around to writing it!

Final Notes!

  • You probably don't want to run deploy steps on every push to master! You might want to only run them when pushing a new tag (see the snippet after this list).
  • If you're copy and pasting, you need to check your PATH and write that in to main.yml. The one I've provided will very likely not work for you.
  • Note the lack of a volume for the web container in docker-ci.yml. This is on purpose, because we don't want to overwrite the assets we just precompiled. Not precompiling assets a second time saves a lot of time. It also means you won't be able to access files written during the test run (e.g. a results.json file) in a later step.
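For the tag-based trigger in the first note, the on: block would look something like this (the v* pattern is an assumption about your tagging scheme):

on:
  push:
    # run the full pipeline, including deploy, only for version tags
    tags:
      - "v*"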