Getting Started with Synthetic Monitoring on GCP and Datadog

When we talk about monitoring, we often think of CPU usage, memory usage, and other classic ops metrics. A 2024 trend in the monitoring landscape is to also gather end-to-end metrics, which is why more and more modern observability tools include “Synthetic Monitoring” panels.
As in the testing pyramid, end-to-end metrics complement the metrics we are used to checking rather than replacing them.

With new SEO constraints tied to web application performance, many companies now track end-to-end metrics to get the full picture of what works and how fast their websites are.

“Everything fails, all the time” is a famous quote from Werner Vogels, Amazon's Chief Technology Officer. In other words, software and distributed systems will eventually fail, because something can always go wrong.

Definition & Concepts

In a few words: synthetic monitoring is a way to monitor your application by simulating user actions and business-critical scenarios. It aims to warn you of production issues so you can fix them before they affect most of your users.
If you are familiar with end-to-end testing, you are almost good to go. The idea here is to execute scripts that mimic your users and “run” your most business-critical scenarios.
The key difference is that these scripts run against your production environment, and the focus is on your application's performance.
Once your scenario has run, you get plenty of metrics about your application, whether they relate to web performance or to one of your internal components.
Based on these metrics, you can tell where to focus your effort to make your application more robust and fault-tolerant.
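To make "where to focus your effort" concrete, here is a minimal sketch that ranks the steps of a journey by their 95th-percentile duration across several runs, using the nearest-rank percentile method. The `RunTimings` shape, the step names, and the durations are all hypothetical; real tools expose step timings in their own formats.

```typescript
// Hypothetical run record: duration in milliseconds of each named step of the journey.
type RunTimings = Record<string, number>;

// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Groups durations by step across runs, then orders steps from slowest to fastest
// at percentile p, so the first entry is where optimization effort likely pays off most.
function rankStepsByPercentile(runs: RunTimings[], p = 95): { step: string; valueMs: number }[] {
  const byStep: Record<string, number[]> = {};
  for (const run of runs) {
    for (const step of Object.keys(run)) {
      if (!byStep[step]) byStep[step] = [];
      byStep[step].push(run[step]);
    }
  }
  return Object.keys(byStep)
    .map((step) => ({ step, valueMs: percentile(byStep[step], p) }))
    .sort((a, b) => b.valueMs - a.valueMs);
}

// Example with three made-up runs of the journey.
const ranking = rankStepsByPercentile([
  { homepage: 800, search: 1200, detail: 600 },
  { homepage: 900, search: 3000, detail: 700 },
  { homepage: 850, search: 1100, detail: 650 },
]);
console.log(ranking[0]); // { step: 'search', valueMs: 3000 }
```

A high percentile is used rather than the average so that an occasional very slow run (like the 3000 ms search above) is not hidden by faster ones.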
In general, synthetic monitoring covers several types of monitoring:

1/ Availability Monitoring:

Verifies that your service is available at all times. It goes a bit further than a simple health check: this monitor ensures that the service is running well and responding as intended.
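The difference with a plain health check can be sketched as an evaluation function: a service answering `200` with a maintenance page would pass a status-only check, but fails once we also assert on content and latency. The `ProbeResult` and `AvailabilityExpectation` shapes, the marker string, and the latency budget are hypothetical choices for illustration, not part of any GCP or Datadog API.

```typescript
// Hypothetical result of one probe against the service.
interface ProbeResult {
  status: number; // HTTP status code
  body: string; // response body
  latencyMs: number; // time until the full response was received
}

// Hypothetical definition of a "healthy" answer.
interface AvailabilityExpectation {
  mustContain: string; // marker that only a correctly rendered page contains
  maxLatencyMs: number; // latency budget
}

type Verdict = { ok: true } | { ok: false; reason: string };

// Checks status, content, then latency; returns the first reason for failure.
function evaluateAvailability(r: ProbeResult, e: AvailabilityExpectation): Verdict {
  if (r.status < 200 || r.status >= 300) {
    return { ok: false, reason: `unexpected status ${r.status}` };
  }
  if (r.body.indexOf(e.mustContain) === -1) {
    return { ok: false, reason: "expected content missing" };
  }
  if (r.latencyMs > e.maxLatencyMs) {
    return { ok: false, reason: `too slow: ${r.latencyMs} ms` };
  }
  return { ok: true };
}
```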

2/ Transaction Monitoring:

A step beyond availability monitoring: here we add scripts that simulate user interactions to make sure business-critical scenarios work as expected.

3/ Web Performance Monitoring:

As its name suggests, it focuses on application performance through Core Web Vitals, which helps identify improvements or degradations for your end users.
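Core Web Vitals are judged against fixed thresholds, so a measured value maps directly to a rating. The sketch below encodes the thresholds Google publishes on web.dev for Largest Contentful Paint (LCP), Interaction to Next Paint (INP), and Cumulative Layout Shift (CLS); the function and type names are our own.

```typescript
type Rating = "good" | "needs-improvement" | "poor";

// Core Web Vitals thresholds published by Google on web.dev:
// [upper bound of "good", upper bound of "needs improvement"].
const THRESHOLDS = {
  LCP: [2500, 4000], // Largest Contentful Paint, in milliseconds
  INP: [200, 500], // Interaction to Next Paint, in milliseconds
  CLS: [0.1, 0.25], // Cumulative Layout Shift, unitless score
} as const;

type Metric = keyof typeof THRESHOLDS;

// Maps a measured value to Google's three-level rating for that metric.
function rate(metric: Metric, value: number): Rating {
  const [good, needsImprovement] = THRESHOLDS[metric];
  if (value <= good) return "good";
  if (value <= needsImprovement) return "needs-improvement";
  return "poor";
}
```

Google assesses these metrics at the 75th percentile of page loads, so a single synthetic run is best read as one lab sample rather than a verdict on the whole site.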

Use-case

In this article, we focus on transaction and web performance monitoring.

Our critical scenario is:

A user navigates to https://training.zenika.com
Types “CKA” in the search bar.
Is redirected to a page listing the search results.
Chooses the first training.
Is redirected to the training detail page.
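Whatever tool executes it, a transaction monitor boils down to running these steps in order, timing each one, and stopping at the first failure, since later steps depend on earlier ones. Here is a minimal sketch of that runner. The `Actions` interface is hypothetical: in the article's setup its methods would be Puppeteer calls (which are asynchronous; the sketch stays synchronous for brevity), and the injectable clock only exists to make timings testable.

```typescript
// One step of a scripted user journey; `run` throws if the step fails.
interface Step {
  name: string;
  run: () => void;
}

interface StepResult {
  name: string;
  ok: boolean;
  durationMs: number;
  error?: string;
}

// Runs steps in order, timing each one, and stops at the first failure:
// without a results page there is no first training to open.
function runScenario(steps: Step[], now: () => number = () => Date.now()): StepResult[] {
  const results: StepResult[] = [];
  for (const step of steps) {
    const start = now();
    try {
      step.run();
      results.push({ name: step.name, ok: true, durationMs: now() - start });
    } catch (e) {
      results.push({ name: step.name, ok: false, durationMs: now() - start, error: String(e) });
      break;
    }
  }
  return results;
}

// Hypothetical browser actions standing in for real Puppeteer calls.
interface Actions {
  goto(url: string): void;
  search(query: string): void;
  resultCount(): number;
  openFirstResult(): void;
  isOnDetailPage(): boolean;
}

// The scenario above, expressed as steps with explicit assertions.
function trainingSearchScenario(a: Actions): Step[] {
  return [
    { name: "open homepage", run: () => a.goto("https://training.zenika.com") },
    { name: "search CKA", run: () => a.search("CKA") },
    { name: "results listed", run: () => { if (a.resultCount() === 0) throw new Error("no results"); } },
    { name: "open first training", run: () => a.openFirstResult() },
    { name: "detail page shown", run: () => { if (!a.isOnDetailPage()) throw new Error("not on detail page"); } },
  ];
}
```

Recording a result per step, rather than a single pass/fail, is what lets you see which step of the journey broke or slowed down.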

Synthetic monitoring setup on Google Cloud

Demo: https://github.com/Tarektouati/GCP-synthetic-monitoring
A synthetic monitor is made of two Google Cloud components:
A Cloud Function
A monitor attached to that Cloud Function.

We’ll create a Node.js Cloud Function that uses Puppeteer to run through our user scenario.

Since it’s plain Puppeteer, you can write the script yourself and follow Testing Library best practices by installing pptr-testing-library.

If you are lazy, you can use the Chrome DevTools Recorder to generate your Puppeteer user journey.

Once you are good to go, deploy the Cloud Function in your desired region by running:

gcloud functions deploy <YOUR_CLOUD_FUNC_NAME> --gen2 --runtime=nodejs18 --region=<REGION> --source=. --entry-point=<YOUR_CLOUD_FUNC_ENTRYPOINT> --memory=2G --timeout=60 --trigger-http

You should then see your Cloud Function in the GCP console.

Next, attach a monitor to that Cloud Function:

gcloud monitoring uptime create <YOUR_MONITOR_NAME> --synthetic-target=projects/<PROJECT_ID>/locations/<REGION>/functions/<YOUR_CLOUD_FUNC_NAME> --period=5

The monitor should also appear, configured, in the GCP console.

On the monitor detail page, you can see whether its status is passing or failing.

Go ahead and create an alerting policy. This step is crucial, as it notifies you when your monitor starts failing.

Define how long the failure must last before an incident is declared, select a notification channel (email, Slack, PagerDuty, …), and add instructions to help the on-call engineer understand the incident.
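The duration setting trades detection speed for fewer false alarms: one flaky run should not page anyone, a sustained failure should. The sketch below is a simplified model of that idea, not Cloud Monitoring's exact evaluation logic; the function names and the "consecutive failures" rule are our own assumptions.

```typescript
// Simplified model: with a check every `periodMin` minutes, a failure lasting
// `durationMin` minutes shows up as this many consecutive failed checks.
function failuresNeeded(periodMin: number, durationMin: number): number {
  if (periodMin <= 0) throw new Error("period must be positive");
  return Math.max(1, Math.ceil(durationMin / periodMin));
}

// `history` holds check outcomes, oldest first (true = passed).
// An incident opens only when the most recent failure streak is long enough.
function shouldOpenIncident(history: boolean[], periodMin: number, durationMin: number): boolean {
  const needed = failuresNeeded(periodMin, durationMin);
  let streak = 0;
  for (let i = history.length - 1; i >= 0 && !history[i]; i--) streak++;
  return streak >= needed;
}
```

With the 5-minute period used in the `gcloud monitoring uptime create` command and a 10-minute duration, two consecutive failed checks are needed, so an isolated failure between two passing runs is ignored.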

So far we have covered transaction monitoring with GCP, but what about web performance monitoring?

It doesn’t come out of the box, but it is still possible on GCP by combining Puppeteer with Lighthouse (see the Lighthouse documentation on using it with Puppeteer: https://github.com/GoogleChrome/lighthouse/blob/main/docs/puppeteer.md).

Synthetic monitoring setup on Datadog

Setting up a synthetic monitor on Datadog takes only a few steps:

Select UX Monitoring > Synthetic Tests > New Test, then create a “Browser Test”.

Configure your test by filling in:

Target URL
Name
Browser targets (Chrome, Firefox, …)
Locations: choose one or more locations based on your business requirements

Next, as in the GCP part, we define the test period and the alerting conditions.
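Browser targets, locations, and period multiply, and as far as we know Datadog prices synthetic tests by number of runs, so it can help to estimate the volume a configuration produces before saving it. This back-of-the-envelope sketch assumes each browser and location pair executes the test once per period and uses a 30-day month; both are our assumptions, so check Datadog's documentation for how runs are actually counted and billed.

```typescript
// Assumption: each (browser, location) pair runs the test once per period.
function runsPerMonth(
  browsers: number,
  locations: number,
  periodMinutes: number,
  daysPerMonth = 30, // assumed month length
): number {
  if (browsers < 1 || locations < 1 || periodMinutes <= 0) {
    throw new Error("invalid configuration");
  }
  const minutesPerMonth = daysPerMonth * 24 * 60;
  const runsPerPair = Math.floor(minutesPerMonth / periodMinutes);
  return browsers * locations * runsPerPair;
}
```

For example, two browsers in three locations every 5 minutes gives 2 × 3 × 8640 = 51,840 runs a month under these assumptions, while a single browser and location every hour gives 720.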

Creating our test scenario is quick and easy: Datadog lets you record a journey directly from your browser (this requires installing a browser extension).

Go ahead and create your own journey, and once you are satisfied, create your monitor.

By default, a Browser performance dashboard is already available.

It showcases metrics such as:

Success rate per browser (the browsers chosen in the test configuration above)
Core Web Vitals
Long-running tasks (which can be painful for your users)
Third-party integrations
…
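To clarify what the success-rate panel computes, here is a sketch of the calculation over a list of test results: passed runs divided by total runs, per browser. The `TestRun` shape is hypothetical and not Datadog's data model; the dashboard performs this aggregation for you.

```typescript
// Hypothetical record of one browser test execution.
interface TestRun {
  browser: string; // e.g. "chrome", "firefox"
  passed: boolean;
}

// Success rate (between 0 and 1) per browser: passed runs / total runs.
function successRateByBrowser(runs: TestRun[]): Record<string, number> {
  const totals: Record<string, { passed: number; total: number }> = {};
  for (const run of runs) {
    const t = totals[run.browser] || (totals[run.browser] = { passed: 0, total: 0 });
    t.total++;
    if (run.passed) t.passed++;
  }
  const rates: Record<string, number> = {};
  for (const browser of Object.keys(totals)) {
    rates[browser] = totals[browser].passed / totals[browser].total;
  }
  return rates;
}
```

Breaking the rate down per browser is what makes it useful: a drop on a single browser usually points to a browser-specific regression rather than an outage.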

Conclusion

Leveraging Google Cloud for Availability and Transaction Monitoring is a robust and efficient choice, especially for those already integrated into the Google Cloud ecosystem.

Its native integration with the rest of Google Cloud (Cloud Functions, Cloud Monitoring, and alerting policies) makes transaction monitoring straightforward to set up.

However, web performance monitoring, while still possible on Google Cloud, requires extra work such as wiring Lighthouse into Puppeteer, so Datadog is worth exploring: its browser tests report Core Web Vitals out of the box.

Datadog’s Real User Monitoring (RUM) is also noteworthy: it measures performance from your real users' sessions rather than scripted ones, which may better serve your web performance monitoring needs.

