Build an Event-Driven Uptime Monitor in Go ๐Ÿš€

marcuskohlberg

Marcus Kohlberg

Posted on October 20, 2023

Build an Event-Driven Uptime Monitor in Go ๐Ÿš€

TL;DR

This guide shows you how to build an uptime monitoring tool using Go and Encore. Get your entire application running in the cloud in 30 minutes!

๐Ÿš€ What's on deck:

  • Install Encore
  • Create your app from a starter branch
  • Run locally to try the frontend
  • Build the backend
  • Deploy to Encore's free development cloud

โœจ Final result:

Uptime Monitor

Demo app: Try the app

When we're done, we'll have a backend with this event-driven architecture:

Uptime Monitor Architecture
In this diagram (automatically generated by Encore) you can see individual services as white boxes, and Pub/Sub topics as black boxes.

๐Ÿ Let's go!

To make it easier to follow along, we've laid out a trail of croissants to guide your way.

Whenever you see a ๐Ÿฅ it means there's something for you to do!

๐Ÿ’ฝ Install Encore

Install the Encore CLI to run your local environment:

  • macOS: brew install encoredev/tap/encore
  • Linux: curl -L https://encore.dev/install.sh | bash
  • Windows: iwr https://encore.dev/install.ps1 | iex

Create your Encore application

๐Ÿฅ Create your new app from this starter branch with a ready-to-go frontend to use:



encore app create uptime --example=github.com/encoredev/example-app-uptime/tree/starting-point


Enter fullscreen mode Exit fullscreen mode

๐Ÿ’ป Run your app locally

๐Ÿฅ Check that your frontend works by running your app locally.



cd uptime
encore run


Enter fullscreen mode Exit fullscreen mode

You should see this:
Encore Run This means Encore has started your local environment and created local infrastructure for Pub/Sub and Databases.

Then visit http://localhost:4000/frontend/ to see the frontend.
The functionality won't work yet, since we haven't yet built the backend yet.

โ€“ Let's do that now!

๐Ÿ”จ Create the monitor service

Let's start by creating the functionality to check if a website is currently up or down.

Later we'll store this result in a database so we can detect when the status changes and
send alerts.

๐Ÿฅ Create a service named monitor containing a file named ping.go. With Encore, you do this by creating a Go package:



mkdir monitor
touch monitor/ping.go


Enter fullscreen mode Exit fullscreen mode

๐Ÿฅ Add an API endpoint named Ping that takes a URL as input and returns a response indicating whether the site is up or down.

With Encore you do this by creating a function and adding the //encore:api annotation to it.

Paste this into the ping.go file:



package monitor

import (
    "context"
    "net/http"
    "strings"
)

// PingResponse is the response from the Ping endpoint.
type PingResponse struct {
    Up bool `json:"up"`
}

// Ping pings a specific site and determines whether it's up or down right now.
//
//encore:api public path=/ping/*url
func Ping(ctx context.Context, url string) (*PingResponse, error) {
    // If the url does not start with "http:" or "https:", default to "https:".
    if !strings.HasPrefix(url, "http:") && !strings.HasPrefix(url, "https:") {
        url = "https://" + url
    }

    // Make an HTTP request to check if it's up.
    req, err := http.NewRequestWithContext(ctx, "GET", url, nil)
    if err != nil {
        return nil, err
    }
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return &PingResponse{Up: false}, nil
    }
    resp.Body.Close()

    // 2xx and 3xx status codes are considered up
    up := resp.StatusCode < 400
    return &PingResponse{Up: up}, nil
}


Enter fullscreen mode Exit fullscreen mode

๐Ÿฅ Let's try it! Make sure you have Docker installed and running, then run encore run in your terminal and you should see the service start up.

๐Ÿฅ Now open up the Local Dev Dashboard running at http://localhost:9400 and try calling
the monitor.Ping endpoint, passing in google.com as the URL.

If you prefer to use the terminal instead run curl http://localhost:4000/ping/google.com in a new terminal instead. Either way, you should see the response:



{"up": true}


Enter fullscreen mode Exit fullscreen mode

You can also try with httpstat.us/400 and some-non-existing-url.com and it should respond with {"up": false}.
(It's always a good idea to test the negative case as well.)

๐Ÿงช Add a test

๐Ÿฅ Let's write an automated test so we don't break this endpoint over time. Create the file monitor/ping_test.go and add this code:



package monitor

import (
    "context"
    "testing"
)

func TestPing(t *testing.T) {
    ctx := context.Background()
    tests := []struct {
        URL string
        Up  bool
    }{
        {"encore.dev", true},
        {"google.com", true},
        // Test both with and without "https://"
        {"httpbin.org/status/200", true},
        {"https://httpbin.org/status/200", true},

        // 4xx and 5xx should considered down.
        {"httpbin.org/status/400", false},
        {"https://httpbin.org/status/500", false},
        // Invalid URLs should be considered down.
        {"invalid://scheme", false},
    }

    for _, test := range tests {
        resp, err := Ping(ctx, test.URL)
        if err != nil {
            t.Errorf("url %s: unexpected error: %v", test.URL, err)
        } else if resp.Up != test.Up {
            t.Errorf("url %s: got up=%v, want %v", test.URL, resp.Up, test.Up)
        }
    }
}


Enter fullscreen mode Exit fullscreen mode

๐Ÿฅ Run encore test ./... to check that it all works as expected. You should see something like this:



$ encore test ./...
9:38AM INF starting request endpoint=Ping service=monitor test=TestPing
9:38AM INF request completed code=ok duration=71.861792 endpoint=Ping http_code=200 service=monitor test=TestPing
[... lots more lines ...]
PASS
ok      encore.app/monitor      1.660


Enter fullscreen mode Exit fullscreen mode

๐ŸŽ‰ It works. Well done!

๐Ÿ”จ Create site service

Next, we want to keep track of a list of websites to monitor.

Since most of these APIs will be simple CRUD (Create/Read/Update/Delete) endpoints, let's build this service using GORM, an ORM library that makes building CRUD endpoints really simple.

๐Ÿฅ Create a new service named site with a SQL database. To do so, create a new directory site in the application root with migrations folder inside that folder:



mkdir site
mkdir site/migrations


Enter fullscreen mode Exit fullscreen mode

๐Ÿฅ Add a database migration file inside that folder, named 1_create_tables.up.sql. The file name is important (it must look something like 1_<name>.up.sql) as Encore uses the file name to automatically run migrations.

Add the following contents:



CREATE TABLE sites (
    id BIGSERIAL PRIMARY KEY,
    url TEXT NOT NULL
);


Enter fullscreen mode Exit fullscreen mode

๐Ÿฅ Next, install the GORM library and PostgreSQL driver:



go get -u gorm.io/gorm gorm.io/driver/postgres


Enter fullscreen mode Exit fullscreen mode

Now let's create the site service itself. To do this we'll use Encore's support for dependency injection to inject the GORM database connection.

๐Ÿฅ Create site/service.go and add this code:



package site

import (
    "encore.dev/storage/sqldb"
    "gorm.io/driver/postgres"
    "gorm.io/gorm"
)

//encore:service
type Service struct {
    db *gorm.DB
}

var siteDB = sqldb.Named("site").Stdlib()

// initService initializes the site service.
// It is automatically called by Encore on service startup.
func initService() (*Service, error) {
    db, err := gorm.Open(postgres.New(postgres.Config{
        Conn: siteDB,
    }))
    if err != nil {
        return nil, err
    }
    return &Service{db: db}, nil
}


Enter fullscreen mode Exit fullscreen mode

๐Ÿฅ With that, we're now ready to create our CRUD endpoints.
Create the following files:

site/get.go:



package site

import "context"

// Site describes a monitored site.
type Site struct {
    // ID is a unique ID for the site.
    ID int `json:"id"`
    // URL is the site's URL.
    URL string `json:"url"`
}

// Get gets a site by id.
//
//encore:api public method=GET path=/site/:siteID
func (s *Service) Get(ctx context.Context, siteID int) (*Site, error) {
    var site Site
    if err := s.db.Where("id = $1", siteID).First(&site).Error; err != nil {
        return nil, err
    }
    return &site, nil
}


Enter fullscreen mode Exit fullscreen mode

site/add.go:



package site

import "context"

// AddParams are the parameters for adding a site to be monitored.
type AddParams struct {
    // URL is the URL of the site. If it doesn't contain a scheme
    // (like "http:" or "https:") it defaults to "https:".
    URL string `json:"url"`
}

// Add adds a new site to the list of monitored websites.
//
//encore:api public method=POST path=/site
func (s *Service) Add(ctx context.Context, p *AddParams) (*Site, error) {
    site := &Site{URL: p.URL}
    if err := s.db.Create(site).Error; err != nil {
        return nil, err
    }
    return site, nil
}


Enter fullscreen mode Exit fullscreen mode

site/list.go:



package site

import "context"

type ListResponse struct {
    // Sites is the list of monitored sites.
    Sites []*Site `json:"sites"`
}

// List lists the monitored websites.
//
//encore:api public method=GET path=/site
func (s *Service) List(ctx context.Context) (*ListResponse, error) {
    var sites []*Site
    if err := s.db.Find(&sites).Error; err != nil {
        return nil, err
    }
    return &ListResponse{Sites: sites}, nil
}


Enter fullscreen mode Exit fullscreen mode

site/delete.go:



package site

import "context"

// Delete deletes a site by id.
//
//encore:api public method=DELETE path=/site/:siteID
func (s *Service) Delete(ctx context.Context, siteID int) error {
    return s.db.Delete(&Site{ID: siteID}).Error
}


Enter fullscreen mode Exit fullscreen mode

๐Ÿฅ Restart encore run to cause the site database to be created, and then call the site.Add endpoint:



curl -X POST 'http://localhost:4000/site' -d '{"url": "https://encore.dev"}'
{
  "id": 1,
  "url": "https://encore.dev"
}


Enter fullscreen mode Exit fullscreen mode

๐Ÿ“ Record uptime checks

In order to notify when a website goes down or comes back up, we need to track the previous state it was in.

To do so, let's add a database to the monitor service as well.

๐Ÿฅ Create the directory monitor/migrations and the file monitor/migrations/1_create_tables.up.sql:



CREATE TABLE checks (
    id BIGSERIAL PRIMARY KEY,
    site_id BIGINT NOT NULL,
    up BOOLEAN NOT NULL,
    checked_at TIMESTAMP WITH TIME ZONE NOT NULL
);


Enter fullscreen mode Exit fullscreen mode

We'll insert a database row every time we check if a site is up.

๐Ÿฅ Add a new endpoint Check to the monitor service, that takes in a Site ID, pings the site, and inserts a database row in the checks table.

For this service we'll use Encore's sqldb package instead of GORM (in order to showcase both approaches).

Add this to monitor/check.go:




package monitor

import (
    "context"

    "encore.app/site"
    "encore.dev/storage/sqldb"
)

// Check checks a single site.
//
//encore:api public method=POST path=/check/:siteID
func Check(ctx context.Context, siteID int) error {
    site, err := site.Get(ctx, siteID)
    if err != nil {
        return err
    }
    result, err := Ping(ctx, site.URL)
    if err != nil {
        return err
    }
    _, err = sqldb.Exec(ctx, `
        INSERT INTO checks (site_id, up, checked_at)
        VALUES ($1, $2, NOW())
    `, site.ID, result.Up)
    return err
}


Enter fullscreen mode Exit fullscreen mode

๐Ÿฅ Restart encore run to cause the monitor database to be created, and then call the new monitor.Check endpoint:



curl -X POST 'http://localhost:4000/check/1'


Enter fullscreen mode Exit fullscreen mode

๐Ÿฅ Inspect the database to make sure everything worked:



encore db shell monitor


Enter fullscreen mode Exit fullscreen mode

You should see this:




psql (14.4, server 14.2)
Type "help" for help.

monitor=> SELECT * FROM checks;
 id | site_id | up |          checked_at
----+---------+----+-------------------------------
  1 |       1 | t  | 2022-10-21 09:58:30.674265+00


Enter fullscreen mode Exit fullscreen mode

If that's what you see, everything's working great!๐ŸŽ‰

โฐ Add a cron job to check all sites

We now want to regularly check all the tracked sites so we can
immediately respond in case any of them go down.

We'll create a new CheckAll API endpoint in the monitor service that will list all the tracked sites and check all of them.

๐Ÿฅ Let's extract some of the functionality we wrote for the
Check endpoint into a separate function.

In monitor/check.go it should look like so:



// Check checks a single site.
//
//encore:api public method=POST path=/check/:siteID
func Check(ctx context.Context, siteID int) error {
    site, err := site.Get(ctx, siteID)
    if err != nil {
        return err
    }
    return check(ctx, site)
}

func check(ctx context.Context, site *site.Site) error {
    result, err := Ping(ctx, site.URL)
    if err != nil {
        return err
    }
    _, err = sqldb.Exec(ctx, `
        INSERT INTO checks (site_id, up, checked_at)
        VALUES ($1, $2, NOW())
    `, site.ID, result.Up)
    return err
}


Enter fullscreen mode Exit fullscreen mode

Now we're ready to create our new CheckAll endpoint.

๐Ÿฅ Create the new CheckAll endpoint inside monitor/check.go:



import "golang.org/x/sync/errgroup"

// CheckAll checks all sites.
//
//encore:api public method=POST path=/checkall
func CheckAll(ctx context.Context) error {
    // Get all the tracked sites.
    resp, err := site.List(ctx)
    if err != nil {
        return err
    }

    // Check up to 8 sites concurrently.
    g, ctx := errgroup.WithContext(ctx)
    g.SetLimit(8)
    for _, site := range resp.Sites {
        site := site // capture for closure
        g.Go(func() error {
            return check(ctx, site)
        })
    }
    return g.Wait()
}


Enter fullscreen mode Exit fullscreen mode

This uses an errgroup to check up to 8 sites concurrently, aborting early if we encounter any error. (Note that a website being down is not treated as an error.)

๐Ÿฅ Run go get golang.org/x/sync/errgroup to install that dependency.

๐Ÿฅ Now that we have a CheckAll endpoint, define a cron job to automatically call it every 5 minutes.

Add this to monitor/check.go:



import "encore.dev/cron"

// Check all tracked sites every 5 minutes.
var _ = cron.NewJob("check-all", cron.JobConfig{
    Title:    "Check all sites",
    Endpoint: CheckAll,
    Every:    5 * cron.Minute,
})


Enter fullscreen mode Exit fullscreen mode

Note: For ease of development, cron jobs are not triggered when running the application locally, but work when deploying the application to a cloud environment.

๐Ÿš€ Deploy to Encore's free development cloud

To try out your uptime monitor for real, let's deploy it to Encore's development cloud.

Encore comes with built-in CI/CD, and the deployment process is as simple as a git push encore.

(You can also integrate with GitHub to activate per Pull Request Preview Environments, learn more in the CI/CD docs.)

๐Ÿฅ Deploy your app by running:



git add -A .
git commit -m 'Initial commit'
git push encore


Enter fullscreen mode Exit fullscreen mode

Encore will now build and test your app, provision the needed infrastructure, and deploy your application to the cloud.

After triggering the deployment, you will see a URL where you can view its progress in Encore's Cloud Dashboard. It will look something like: https://app.encore.dev/$APP_ID/deploys/...

From there you can also see metrics, traces, link your app to a GitHub repo to get automatic deploys on new commits, and connect your own AWS or GCP account to use for production deployment.

๐Ÿฅ When the deploy has finished, you can try out your uptime monitor by going to:
https://staging-$APP_ID.encr.app/frontend.

You now have an Uptime Monitor running in the cloud, well done!โœจ

Publish Pub/Sub events when a site goes down

An uptime monitoring system isn't very useful if it doesn't
actually notify you when a site goes down.

To do so let's add a Pub/Sub topic
on which we'll publish a message every time a site transitions from being up to being down, or vice versa.

๐Ÿฅ Define the topic using Encore's Pub/Sub package in a new file, monitor/alerts.go:



package monitor

import "encore.dev/pubsub"

// TransitionEvent describes a transition of a monitored site
// from up->down or from down->up.
type TransitionEvent struct {
    // Site is the monitored site in question.
    Site *site.Site `json:"site"`
    // Up specifies whether the site is now up or down (the new value).
    Up bool `json:"up"`
}

// TransitionTopic is a pubsub topic with transition events for when a monitored site
// transitions from up->down or from down->up.
var TransitionTopic = pubsub.NewTopic[*TransitionEvent]("uptime-transition", pubsub.TopicConfig{
    DeliveryGuarantee: pubsub.AtLeastOnce,
})


Enter fullscreen mode Exit fullscreen mode

Now let's publish a message on the TransitionTopic if a site's up/down state differs from the previous measurement.

๐Ÿฅ Create a getPreviousMeasurement function in alerts.go to report the last up/down state:



import "encore.dev/storage/sqldb"

// getPreviousMeasurement reports whether the given site was
// up or down in the previous measurement.
func getPreviousMeasurement(ctx context.Context, siteID int) (up bool, err error) {
    err = sqldb.QueryRow(ctx, `
        SELECT up FROM checks
        WHERE site_id = $1
        ORDER BY checked_at DESC
        LIMIT 1
    `, siteID).Scan(&up)

    if errors.Is(err, sqldb.ErrNoRows) {
        // There was no previous ping; treat this as if the site was up before
        return true, nil
    } else if err != nil {
        return false, err
    }
    return up, nil
}


Enter fullscreen mode Exit fullscreen mode

๐Ÿฅ Now add a function in alerts.go to conditionally publish a message if the up/down state differs:



import "encore.app/site"

func publishOnTransition(ctx context.Context, site *site.Site, isUp bool) error {
    wasUp, err := getPreviousMeasurement(ctx, site.ID)
    if err != nil {
        return err
    }
    if isUp == wasUp {
        // Nothing to do
        return nil
    }
    _, err = TransitionTopic.Publish(ctx, &TransitionEvent{
        Site: site,
        Up:   isUp,
    })
    return err
}


Enter fullscreen mode Exit fullscreen mode

๐Ÿฅ Finally modify the check function in check.go to call the publishOnTransition function:



func check(ctx context.Context, site *site.Site) error {
    result, err := Ping(ctx, site.URL)
    if err != nil {
        return err
    }

    // Publish a Pub/Sub message if the site transitions
    // from up->down or from down->up.
    if err := publishOnTransition(ctx, site, result.Up); err != nil {
        return err
    }

    _, err = sqldb.Exec(ctx, `
        INSERT INTO checks (site_id, up, checked_at)
        VALUES ($1, $2, NOW())
    `, site.ID, result.Up)
    return err
}


Enter fullscreen mode Exit fullscreen mode

Now the monitoring system will publish messages on the TransitionTopic whenever a monitored site transitions from up->down or from down->up.

However, it doesn't know or care who actually listens to these messages. The truth is right now nobody does. So let's fix that by adding a Pub/Sub subscriber that posts these events to Slack.

Send Slack notifications when a site goes down

๐Ÿฅ Start by creating a Slack service slack/slack.go containing the following:



package slack

import (
    "bytes"
    "context"
    "encoding/json"
    "fmt"
    "io"
    "net/http"
)

type NotifyParams struct {
    // Text is the Slack message text to send.
    Text string `json:"text"`
}

// Notify sends a Slack message to a pre-configured channel using a
// Slack Incoming Webhook (see https://api.slack.com/messaging/webhooks).
//
//encore:api private
func Notify(ctx context.Context, p *NotifyParams) error {
    reqBody, err := json.Marshal(p)
    if err != nil {
        return err
    }
    req, err := http.NewRequestWithContext(ctx, "POST", secrets.SlackWebhookURL, bytes.NewReader(reqBody))
    if err != nil {
        return err
    }
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return err
    }
    defer resp.Body.Close()

    if resp.StatusCode >= 400 {
        body, _ := io.ReadAll(resp.Body)
        return fmt.Errorf("notify slack: %s: %s", resp.Status, body)
    }
    return nil
}

var secrets struct {
    // SlackWebhookURL defines the Slack webhook URL to send
    // uptime notifications to.
    SlackWebhookURL string
}


Enter fullscreen mode Exit fullscreen mode

๐Ÿฅ Now go to a Slack community of your choice (where you have permission to create a new Incoming Webhook). If you don't have any, join the Encore Slack and ask in #help and we're happy to help out.

๐Ÿฅ Once you have the Webhook URL, save it as a secret using Encore's built-in secrets manager:



encore secret set --local,dev,prod SlackWebhookURL


Enter fullscreen mode Exit fullscreen mode

๐Ÿฅ Test the slack.Notify endpoint by calling it via cURL:



curl 'http://localhost:4000/slack.Notify' -d '{"Text": "Testing Slack webhook"}'


Enter fullscreen mode Exit fullscreen mode

You should see the Testing Slack webhook message appear in the Slack channel you designated for the webhook.

๐Ÿฅ It's now time to add a Pub/Sub subscriber to automatically notify Slack when a monitored site goes up or down. Add the following to slack/slack.go:



import (
    "encore.dev/pubsub"
    "encore.app/monitor"
)

var _ = pubsub.NewSubscription(monitor.TransitionTopic, "slack-notification", pubsub.SubscriptionConfig[*monitor.TransitionEvent]{
    Handler: func(ctx context.Context, event *monitor.TransitionEvent) error {
        // Compose our message.
        msg := fmt.Sprintf("*%s is down!*", event.Site.URL)
        if event.Up {
            msg = fmt.Sprintf("*%s is back up.*", event.Site.URL)
        }

        // Send the Slack notification.
        return Notify(ctx, &NotifyParams{Text: msg})
    },
})


Enter fullscreen mode Exit fullscreen mode

๐ŸŽ‰ Deploy your finished Uptime Monitor

You're now ready to deploy your finished Uptime Monitor, complete with a Slack integration!

๐Ÿฅ As before, deploying your app to the cloud is as simple as running:



git add -A .
git commit -m 'Add slack integration'
git push encore


Enter fullscreen mode Exit fullscreen mode

You now have a fully featured, production-ready, Uptime Monitoring system running in the cloud. Well done! โœจ

๐Ÿคฏ Wrapping up: All of this came in at just over 300 lines of code

You've now built a fully functioning uptime monitoring system, accomplishing a remarkable amount with very little code:

  • You've built three different services (site, monitor, and slack)
  • You've added two databases (to the site and monitor services) for tracking monitored sites and the monitoring results
  • You've added a cron job for automatically checking the sites every 5 minutes
  • You've set up a Pub/Sub topic to decouple the monitoring system from the Slack notifications
  • You've added a Slack integration, using secrets to securely store the webhook URL, listening to a Pub/Sub subscription for up/down transition events

All of this in just a bit over 300 lines of code!๐Ÿคฏ

If you want to build something more, check out Encore's templates library for inspiration.

If you need help or want to chat, come join the developer's hangout on Slack.

๐Ÿ’– ๐Ÿ’ช ๐Ÿ™… ๐Ÿšฉ
marcuskohlberg
Marcus Kohlberg

Posted on October 20, 2023

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related

ยฉ TheLazy.dev

About