Creating resilient API synthetic canary tests using CloudWatch Synthetics
Wojciech Matuszewski
Posted on August 22, 2021
Developers test their software in multiple ways. Unit, integration, and end-to-end tests are, arguably, the most common ways to do so.
In addition to the three pillars of testing mentioned above, one might make a case for synthetic tests. These tests run on a schedule and continuously exercise the application to ensure it is working as expected.
This blog post will show you how to build such synthetic tests using CloudWatch Synthetics. We will cover how to deploy them and how to keep them resilient to any intermittent network issues that might arise.
Let us dive in.
All the code examples in this blog post are written in TypeScript. I will be using AWS CDK as my IaC tool of choice.
The Synthetics package
Before writing any test code, we must familiarize ourselves with the tool that CloudWatch Synthetics exposes, namely the Synthetics package.
First off, the Synthetics package is not available on npm, which means that we cannot test our canary code locally before deploying it to AWS.
Secondly, no public TypeScript definitions exist for the Synthetics package. We will get over this hurdle later on by creating them manually.
Thirdly, the Synthetics package does not support retrying network requests out of the box. I would argue that any test dealing with network communication or UIs should retry the assertion if it did not pass the first time around. We would not want to get woken up in the middle of the night because of intermittent network communication issues.
We will also look at how one might add that capability later on.
Deploying a simple canary
With the previous chapter behind us, we are ready to deploy a simple CloudWatch Synthetics canary test that we will improve upon as we go. The AWS CDK makes the deployment part hassle-free and painless.
Here is our starting point in terms of a canary test.
// canary.ts
// We will switch to `import` whenever we add types for this module.
const synthetics = require("Synthetics");
import http from "http";

export const handler = async () => {
  const requestOptions: http.RequestOptions = {
    hostname: "jsonplaceholder.typicode.com",
    method: "GET",
    port: 443,
    protocol: "https:",
    path: "/todos/1"
  };

  await synthetics.executeHttpStep(
    "ping",
    requestOptions,
    async (res: http.ServerResponse) => {
      // Resolve once the response ends; reject on a stream error.
      return new Promise((resolve, reject) => {
        res.on("error", error => {
          reject(error);
        });
        res.on("end", () => {
          resolve(undefined);
        });
      });
    }
  );
};
One significant thing to note here: executeHttpStep does not propagate data passed to the resolve callback.
To retrieve the data returned by the HTTP response, we need to create a "global" variable outside of the callback and mutate it as the data streams in chunks.
Here is how one might read the data returned from the jsonplaceholder API and use it outside of the executeHttpStep callback:
// canary.ts
let rawResponse = "";

await synthetics.executeHttpStep("ping", requestOptions, async res => {
  return new Promise(resolve => {
    res.on("data", chunk => {
      rawResponse += chunk;
    });
    res.on("close", () => {
      resolve(undefined); // Using `undefined` to avoid TypeScript errors.
    });
  });
});

const response = JSON.parse(rawResponse);
This API is very unfortunate and will be a major thorn in our side whenever we implement retries.
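The chunk-accumulation logic above can be pulled into a small helper. This is a minimal sketch, not part of the Synthetics package: readBody is a hypothetical name, and it assumes the response object behaves like a standard Node.js readable stream.

```typescript
import { Readable } from "stream";

// Hypothetical helper (not part of the Synthetics package):
// collects everything a readable stream emits into a single UTF-8 string.
export function readBody(stream: Readable): Promise<string> {
  return new Promise((resolve, reject) => {
    const chunks: Buffer[] = [];

    stream.on("data", (chunk: Buffer | string) => {
      // Streams may emit strings or Buffers; normalize to Buffers.
      chunks.push(typeof chunk === "string" ? Buffer.from(chunk) : chunk);
    });
    stream.on("error", reject);
    stream.on("end", () => {
      resolve(Buffer.concat(chunks).toString("utf-8"));
    });
  });
}
```

Because executeHttpStep does not hand the resolved value back to the caller, the result would still have to be assigned to a variable declared outside of the callback.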
Code is much more sparse on the AWS CDK side of things.
To deploy the canary, we can leverage the synthetics construct.
import * as cdk from "@aws-cdk/core";
import * as synthetics from "@aws-cdk/aws-synthetics";
import { buildSync } from "esbuild";
import { join } from "path";

// Class definition and so on.

// Bundle and transpile the canary in memory.
// "Synthetics" is marked as external because it only exists in the canary runtime.
const buildResult = buildSync({
  external: ["Synthetics"],
  minify: true,
  platform: "node",
  bundle: true,
  entryPoints: [join(__dirname, "./canary.ts")],
  write: false
});

const canaryCode = Buffer.from(buildResult.outputFiles[0].contents).toString(
  "utf-8"
);

new synthetics.Canary(this, "MyCanary", {
  schedule: synthetics.Schedule.rate(cdk.Duration.minutes(1)),
  runtime: synthetics.Runtime.SYNTHETICS_NODEJS_PUPPETEER_3_1,
  successRetentionPeriod: cdk.Duration.days(1),
  failureRetentionPeriod: cdk.Duration.days(20),
  test: synthetics.Test.custom({
    code: synthetics.Code.fromInline(canaryCode),
    handler: "index.handler"
  })
});
Since the Canary construct does not support canaries written in TypeScript out of the box, I'm leveraging esbuild for bundling and transpilation.
After deploying the stack, navigate to the CloudWatch dashboard and select the Synthetics Canaries tab. Our canary should be green with one step marked as "Passed".
Adding TypeScript type definitions
As I alluded to earlier, there are no publicly available TypeScript typings for the Synthetics package, meaning that the synthetics variable defined in the canary.ts file is untyped: TypeScript evaluates the type of that variable as any.
Thankfully, TypeScript exposes a way to declare those typings manually through ambient modules.
Here is a very bare-bones Synthetics ambient module declaration.
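// synthetics.d.ts
declare module "Synthetics" {
  import type { ServerResponse, RequestOptions } from "http";

  // Declarations inside an ambient module are already ambient,
  // so the `declare` modifier is not allowed here.
  function executeHttpStep(
    stepName: string,
    options: RequestOptions,
    validationFunction?: (res: ServerResponse) => Promise<unknown>
  ): Promise<void>;

  // In an ambient context, `export =` must reference an identifier.
  const synthetics: { executeHttpStep: typeof executeHttpStep };
  export = synthetics;
}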
With the type declaration in place, we can now retire the CJS require in favor of the ES6 import within the canary.ts file. Doing so will make TypeScript infer typings for the Synthetics package from the ambient module we have declared. Note that the default import relies on the esModuleInterop compiler option being enabled.
// canary.ts
- const synthetics = require("Synthetics")
+ import synthetics from "Synthetics";
- await synthetics.executeHttpStep("ping", requestOptions, async (res: http.ServerResponse) => {
+ await synthetics.executeHttpStep("ping", requestOptions, async (res) => {
Validating the response status
Let us begin with asserting the status of the response. If the response returns a status outside of the 200-299 range, we should fail the ping step, thus failing the whole canary.
// canary.ts
await synthetics.executeHttpStep("ping", requestOptions, async res => {
  return new Promise((resolve, reject) => {
    // Asserting the response `statusCode`
    if (res.statusCode < 200 || res.statusCode > 299) {
      // Return early so no further listeners are registered for a failed response.
      return reject(`${res.statusCode}: ${res.statusMessage}`);
    }

    // Rest of the code from the previous section
  });
});
One important thing to note here: the rejection inside the executeHttpStep callback does not mean rejection of the executeHttpStep function.
In fact, the rejection is swallowed. Depending on the provided executeHttpStep settings, the test might or might not continue.
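To make the consequence of a swallowed rejection concrete, here is a minimal sketch of recording the failure in an outer variable. runStepSwallowingErrors is a stand-in that only models the behavior described above; it is not the Synthetics implementation, and both function names are hypothetical.

```typescript
type StepCallback = () => Promise<unknown>;

// Stand-in for a step runner that swallows callback rejections.
// NOT the Synthetics implementation, only a model of the described behavior.
async function runStepSwallowingErrors(callback: StepCallback): Promise<void> {
  try {
    await callback();
  } catch {
    // The rejection stops here; the caller never sees it.
  }
}

// Hypothetical wrapper: records the failure outside of the callback
// so the caller can still react to it.
export async function runStepAndReport(
  callback: StepCallback
): Promise<{ failed: boolean; reason?: unknown }> {
  let failed = false;
  let reason: unknown;

  await runStepSwallowingErrors(async () => {
    try {
      await callback();
    } catch (error) {
      failed = true;
      reason = error;
      throw error; // Still reject, so the step itself is marked as failed.
    }
  });

  return { failed, reason };
}
```

The same idea, a flag mutated from inside the callback and inspected afterwards, is what any retry logic built on top of executeHttpStep has to rely on.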
Adding retries
Now it's time to start thinking about the resiliency of our tests.
Retrying on 5xx status codes
I would argue that retrying once or twice whenever the returned response has a 5xx status code makes sense. In the era of microservices, it's not uncommon for intermittent network errors to occur. If the same 5xx response status persists, though, we should be pretty confident that something is not working.
Since the Synthetics package does not support retrying requests out of the box, we must implement that logic ourselves.
I will be using the p-retry module to carry out the retrying. I like the p-retry interface and value its ease of use.
// canary.ts
import pRetry from "p-retry";
// `log` is assumed to be the logger provided by the canary runtime.

const numOfRetries = 2;
const shouldFailStep = (attemptCount: number) =>
  attemptCount === numOfRetries + 1;

await pRetry(
  async attemptCount => {
    let shouldRetryStep = false;
    let stepFailed = true;

    await synthetics.executeHttpStep("ping", requestOptions, async res => {
      return new Promise((resolve, reject) => {
        log.warn(`response status code: ${res.statusCode}`);

        // (1)
        if (res.statusCode >= 500) {
          if (shouldFailStep(attemptCount)) {
            return reject("Retries exhausted");
          }

          // (2)
          shouldRetryStep = true;
          return resolve(undefined);
        }

        if (res.statusCode < 200 || res.statusCode > 299) {
          // Return early so the "close" listener below cannot flip `stepFailed`.
          return reject(`${res.statusCode}: ${res.statusMessage}`);
        }

        res.on("close", () => {
          stepFailed = false;
          resolve(undefined);
        });
      });
    });

    if (shouldRetryStep) {
      throw new Error("Retrying step");
    }

    // (3)
    if (stepFailed) {
      throw new pRetry.AbortError("Step failed");
    }

    // You might be interested in reacting to the `res.on('data')` event and returning the result here.
  },
  {
    retries: numOfRetries,
    maxTimeout: 2_000,
    minTimeout: 500
  }
);
There is a lot to unpack here, so let us move through the code step by step.
(1) As I alluded to earlier, retrying whenever a request returns a 5xx status code makes sense. This logic checks whether we have exhausted our retries. If so, it calls the reject callback, failing the step and thus failing the canary in the process.
(2) The executeHttpStep API design gives us no way of communicating with the world outside of the callback other than through a "global" variable. Here I'm signaling that the request should be retried, but I'm using resolve to ensure the step is not marked as "failed".
(3) I have to manually propagate the reject call because executeHttpStep will never reject. Again, I'm using a "global" variable due to API constraints.
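For readers who want to see what the retry wrapper does conceptually, here is a hand-rolled sketch of retrying with capped exponential backoff. It is not p-retry's implementation. The retry function, the RetryOptions shape, and the shouldRetry predicate (standing in for the abort mechanism used above) are all hypothetical names chosen for illustration.

```typescript
// Hypothetical options, loosely mirroring the ones passed to the retry library above.
export interface RetryOptions {
  retries: number; // retries allowed after the first attempt
  minTimeout: number; // delay before the first retry, in milliseconds
  maxTimeout: number; // upper bound for any single delay, in milliseconds
  shouldRetry?: (error: unknown) => boolean; // return false to stop immediately
}

const sleep = (ms: number) =>
  new Promise<void>(resolve => setTimeout(resolve, ms));

// Runs `operation` until it succeeds, the retries are exhausted,
// or `shouldRetry` says the error is not worth retrying.
export async function retry<T>(
  operation: (attempt: number) => Promise<T>,
  options: RetryOptions
): Promise<T> {
  const maxAttempts = options.retries + 1;

  for (let attempt = 1; ; attempt++) {
    try {
      return await operation(attempt);
    } catch (error) {
      const retryable = options.shouldRetry ? options.shouldRetry(error) : true;
      if (!retryable || attempt >= maxAttempts) {
        throw error;
      }

      // Double the delay on every attempt, capped at `maxTimeout`.
      const delay = Math.min(
        options.minTimeout * 2 ** (attempt - 1),
        options.maxTimeout
      );
      await sleep(delay);
    }
  }
}
```

With retries set to 2, the operation runs at most three times, which matches the numOfRetries + 1 check in the canary code.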
Retrying on timeouts
Sadly, I could not trigger any timeout errors despite the aggressive one-millisecond timeout or agent properties specified on the requestOptions object. Given this experience, I conclude that executeHttpStep does not support timeouts and ignores them.
If this is indeed true, it makes the Synthetics package much less viable in the context of API canary tests.
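One possible workaround is to enforce the deadline outside of the request itself by racing the work against a timer. The sketch below is an assumption on my part and has not been verified against the Synthetics runtime; withTimeout is a hypothetical helper, and it only stops waiting for the work, it does not abort the underlying request.

```typescript
// Hypothetical helper: rejects if `work` does not settle within `ms` milliseconds.
// The underlying work keeps running; only the waiting is cut short.
export function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => {
      reject(new Error(`Timed out after ${ms} ms`));
    }, ms);

    work.then(
      value => {
        clearTimeout(timer);
        resolve(value);
      },
      error => {
        clearTimeout(timer);
        reject(error);
      }
    );
  });
}
```

A rejection produced this way could then be treated like any other retryable failure.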
Alternative approach
Let us discuss the alternative AWS setup one might use to deploy API canary tests, effectively avoiding the Synthetics package altogether.
While the requests made with the executeHttpStep function produce nice visuals in the AWS CloudWatch Synthetics console, if all we require are logs of the process that runs the test, we might want to look for other solutions.
In this case, I would recommend looking into EventBridge and the capability of invoking a target, for example, an AWS Lambda function, on a fixed schedule. Here is an excellent resource on how to implement this infrastructure.
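A rough sketch of that setup in AWS CDK could look like the following. It uses the v1 package names to match the imports used earlier in this post. The stack name, the construct IDs, and the ./canary-lambda.ts entry file are hypothetical, and the timeout value is only an example.

```typescript
import * as cdk from "@aws-cdk/core";
import * as events from "@aws-cdk/aws-events";
import * as targets from "@aws-cdk/aws-events-targets";
import { NodejsFunction } from "@aws-cdk/aws-lambda-nodejs";
import { join } from "path";

// Hypothetical stack: an EventBridge rule invokes a Lambda function every minute.
export class ApiCanaryStack extends cdk.Stack {
  constructor(scope: cdk.Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // The function that performs the API checks (the entry file is an assumption).
    const canaryFunction = new NodejsFunction(this, "CanaryFunction", {
      entry: join(__dirname, "./canary-lambda.ts"),
      handler: "handler",
      timeout: cdk.Duration.seconds(30)
    });

    // Fixed-rate schedule with the function as its target.
    new events.Rule(this, "CanarySchedule", {
      schedule: events.Schedule.rate(cdk.Duration.minutes(1)),
      targets: [new targets.LambdaFunction(canaryFunction)]
    });
  }
}
```

In this setup the function is free to use any HTTP client, so retries and timeouts are entirely under our control.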
Summary
I hope that you find this exploration of the Synthetics package helpful.
While there might be better options for a platform to deploy API canary tests on, the Synthetics package can also monitor web applications. I do not have that much experience in that area, but it is an avenue worth exploring if that is your use case.
As always, if you have noticed some facts that are incorrect or misleading, please let me know!
You can find me on Twitter - @wm_matuszewski
Thank you for your time.