Track custom github repository metrics with Github Webhooks, FastAPI and DETA
Aavash Shrestha
Posted on September 28, 2020
Tracking important metrics for a Github repository facilitates analysis and optimization of a team's development and delivery process. The metrics help in identifying bottlenecks, prioritizing resources and setting goals.
In this tutorial, we are going to implement and deploy a simple self-hosted application that keeps track of the code review turnaround (the time taken between a code review assignment and its completion) for a repository, using Github Webhooks, FastAPI and Deta.
This tutorial showcases how, with Deta, you can direct most of your focus and effort on developing the tool and get it seamlessly deployed to production.
The application's users can either generate a graph or get a json response of average code review turnarounds for a specified duration averaged over a specified period. To generate the graph, we will use Bokeh, an interactive visualization library.
{
"2020-09-18T09:00:22": 0,
"2020-09-19T09:00:22": 454.88,
"2020-09-20T09:00:22": 1315.75,
"2020-09-21T09:00:22": 87.13,
"2020-09-22T09:00:22": 178.05,
"2020-09-23T09:00:22": 95.83,
"2020-09-24T09:00:22": 40.7,
"2020-09-25T09:00:22": 0
}
I have named the application GRT (Github Code Review Turnaround). The complete source code of the application is available on github.
Application Design
GRT needs to achieve three main things in order to keep track of the code review turnarounds and allow users to easily retrieve them:
- Know when a code review has been requested, when a review request has been removed, and when a code review has been submitted. We use Github Webhooks for this.
- Store and update information about code reviews in a persistent storage. For this we use Deta Base.
- Offer an api for users to see the average code review turnaround. The user should be able to specify the type of response, duration and the period to average the metrics over.
Implementation and Deployment
The following guide assumes that you have signed up for Deta and have the Deta CLI installed.
This guide is written for unix environments. Some shell commands are different on windows; please use the equivalent windows commands there.
Create a deta micro
Firstly, we create a new FastAPI application on Deta.
- Create a directory called grt and cd into it.
$ mkdir grt && cd grt
- Create two files, main.py and requirements.txt, in the root of the directory.
$ touch main.py requirements.txt
- Specify fastapi and bokeh as dependencies in the requirements.txt file. It should look like this:
fastapi
bokeh
- Create the Deta Micro. With a main.py file and fastapi specified as a dependency in the requirements.txt file, all you need to do is type deta new in order to create a new FastAPI app. From the root of the directory, enter
$ deta new
Successfully created a new micro
{
"name": "grt",
"runtime": "python3.7",
"endpoint": "https://{your_subdomain}.deta.dev",
"visor": "enabled",
"http_auth": "enabled"
}
Adding dependencies...
Collecting fastapi
...
Collecting bokeh
...
Successfully installed Jinja2-2.11.2 MarkupSafe-1.1.1 PyYAML-5.3.1 bokeh-2.2.1 fastapi-0.61.1 numpy-1.19.2 packaging-20.4 pillow-7.2.0 pydantic-1.6.1 pyparsing-2.4.7 python-dateutil-2.8.1 six-1.15.0 starlette-0.13.6 tornado-6.0.4 typing-extensions-3.7.4.3
The endpoint will be different for your micro.
- You should see that http_auth is enabled by default. We will disable the auth for the github webhook and use a webhook secret.
Set up the webhook
- The command deta details shows details about your deployed micro, including your micro's http endpoint. Copy your micro's endpoint from the output of deta details.
$ deta details
{
"name": "grt",
"runtime": "python3.7",
"endpoint": "https://{your_subdomain}.deta.dev",
"visor": "enabled",
"http_auth": "enabled"
}
- Go to Webhooks under Settings for the repository you want to track your metrics on and click on Add Webhook.
- In the Payload URL, use your micro's endpoint with the route /webhook_events as the webhook endpoint.
https://{your_subdomain}.deta.dev/webhook_events
- Change the Content type to application/json.
- Generate a long, secure random string (there are services online that do this) and use that as the Webhook Secret. Keep hold of this secret as you will need it to set up the app's environment later.
- Select Let me select individual events when selecting the events to trigger the webhook. Select the following events:
  - Pull requests: To know when a code review is requested
  - Pull request reviews: To know when a code review has been submitted or deleted
- Click on Add Webhook to add the webhook.
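If you would rather not paste a secret from an online generator, Python's standard secrets module can produce one locally. This is a minimal sketch; the 32-byte length is an illustrative assumption, not a Github requirement.

```python
import secrets

# 32 random bytes rendered as 64 hexadecimal characters.
# The length is an illustrative choice, not something Github mandates.
webhook_secret = secrets.token_hex(32)
print(webhook_secret)
```

Paste the printed value into the Secret field of the webhook form and keep a copy for the environment setup.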
Set up the environment
The webhook secret used in setting up the webhook is provided to the micro through the environment variable WEBHOOK_SECRET.
- Create a .env file in the app's root directory and add your secret to the file. Make sure not to expose this file publicly.
$ echo WEBHOOK_SECRET=your_webhook_secret > .env
$ cat .env
WEBHOOK_SECRET=your_webhook_secret
- Update the environment variables of your app.
$ deta update -e .env
You should see that the environment variables have been successfully updated.
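Reading the variable with os.getenv returns None when it is missing, so a forgotten deta update would only surface later as an unrelated-looking error. One way to fail early is sketched below; load_webhook_secret is a hypothetical helper and not part of GRT.

```python
import os


def load_webhook_secret(env=os.environ) -> str:
    """Return WEBHOOK_SECRET, or raise a clear error if it is missing or empty.

    Hypothetical helper for illustration; GRT itself reads the variable
    directly with os.getenv.
    """
    secret = env.get("WEBHOOK_SECRET", "")
    if not secret:
        raise RuntimeError("WEBHOOK_SECRET is not set; run `deta update -e .env`")
    return secret
```

Passing the mapping as a parameter keeps the helper easy to test with a plain dict instead of the real environment.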
Implement the webhook endpoint
Time to code. Let's add a POST endpoint that receives webhook events from github.
- Open main.py in your editor and add the following code.
from fastapi import FastAPI, Request

# FastAPI app
app = FastAPI()


@app.post("/webhook_events")
async def webhook_handler(request: Request):
    # handle events
    payload = await request.json()
    event_type = request.headers.get("X-Github-Event")

    # reviews requested or removed
    if event_type == "pull_request":
        action = payload.get("action")
        if action == "review_requested":
            # TODO: store review request
            return "ok"
        elif action == "review_request_removed":
            # TODO: delete review request
            return "ok"
        return "ok"

    # review submitted
    if event_type == "pull_request_review" and payload.get("action") == "submitted":
        # TODO: update review request
        return "ok"

    # ignore other events
    return "ok"
Github sends different payloads for different events. The event_type is denoted by the header X-Github-Event and the action is denoted by the action field in the payload. The code above just identifies what action triggered the webhook.
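The same dispatch can be written as a pure function, which makes it easy to check without sending HTTP requests. This is only a sketch; classify_event and its return labels are hypothetical names, not part of GRT.

```python
from typing import Optional


def classify_event(event_type: Optional[str], payload: dict) -> str:
    """Map the X-Github-Event header and payload action to a GRT operation."""
    action = payload.get("action")
    if event_type == "pull_request":
        if action == "review_requested":
            return "store"
        if action == "review_request_removed":
            return "delete"
    elif event_type == "pull_request_review" and action == "submitted":
        return "mark_complete"
    # every other event or action is acknowledged and ignored
    return "ignore"


print(classify_event("pull_request", {"action": "review_requested"}))  # store
print(classify_event("push", {}))                                      # ignore
```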
As you can see, there are several TODOs in the code. For now, we just return ok to github without actually doing anything with the data. We will handle these events properly after we have implemented storing, retrieving, updating and deleting the review requests' information.
The next step is to verify the signature sent by github. The signature is used for integrity and authentication; it verifies that the payload came from github and that the payload has not been modified by anybody in between.
- Create a file called utils.py and add the following code.
import os
import hmac

# get the webhook secret from the environment
WEBHOOK_SECRET = os.getenv("WEBHOOK_SECRET")


# calculate hmac digest of payload with shared secret token
def calc_signature(payload):
    digest = hmac.new(
        key=WEBHOOK_SECRET.encode("utf-8"), msg=payload, digestmod="sha1"
    ).hexdigest()
    return f"sha1={digest}"
- Open main.py and add the code to verify the signature.
import hmac
from fastapi import FastAPI, Request, HTTPException
import utils

# FastAPI app
app = FastAPI()


@app.post("/webhook_events")
async def webhook_handler(request: Request):
    # verify webhook signature
    raw = await request.body()
    signature = request.headers.get("X-Hub-Signature", "")
    expected = utils.calc_signature(raw)
    # constant-time comparison; a missing header never matches
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        raise HTTPException(status_code=401, detail="Unauthorized")

    # handle events
    payload = await request.json()
    event_type = request.headers.get("X-Github-Event")

    # reviews requested or removed
    if event_type == "pull_request":
        action = payload.get("action")
        if action == "review_requested":
            # TODO: store review request
            return "ok"
        elif action == "review_request_removed":
            # TODO: delete review request
            return "ok"
        return "ok"

    # review submitted
    if event_type == "pull_request_review" and payload.get("action") == "submitted":
        # TODO: update review request
        return "ok"

    # ignore other events
    return "ok"
Now only requests signed with your webhook secret, which should only be github, are accepted on your webhook endpoint. Let's deploy what we have so far with the deta deploy command. It should take only a few seconds.
$ deta deploy
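Before relying on real deliveries from github, the signature logic can be exercised locally by signing a sample body the same way. The sketch below assumes a placeholder secret and a made-up payload; sign and is_valid are hypothetical helper names.

```python
import hashlib
import hmac
import json

# Hypothetical values for local testing only.
SECRET = "your_webhook_secret"
body = json.dumps({"action": "review_requested"}).encode("utf-8")


def sign(secret: str, payload: bytes) -> str:
    """Return the sha1 HMAC of the body in the X-Hub-Signature format."""
    digest = hmac.new(secret.encode("utf-8"), payload, hashlib.sha1).hexdigest()
    return f"sha1={digest}"


def is_valid(secret: str, payload: bytes, header) -> bool:
    """Constant-time check of a received header; a missing header is rejected."""
    if header is None:
        return False
    expected = sign(secret, payload)
    return hmac.compare_digest(expected.encode("utf-8"), header.encode("utf-8"))


header = sign(SECRET, body)
print(is_valid(SECRET, body, header))         # True
print(is_valid(SECRET, body + b" ", header))  # False: body was modified
print(is_valid(SECRET, body, None))           # False: header missing
```

Any change to the body or the secret produces a different digest, which is what lets the endpoint reject tampered or unsigned requests.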
Implement the review request store
We are using Deta Base to store information about review requests.
We use the Deta python SDK, which is pre-installed on a deta micro, to talk to our database.
Each item in the database holds information about one review request and has the following schema:
{
"key": str, // randomly_generated
"reviewer": str, // reviewer
"pull_request": int, // pull request number
"requested_at" : int, // posix timestamp of request
"submitted_at" : int, // posix timestamp of review submission
"submitted": bool, // if the review has been submitted
"crt": int // code review turnaround in seconds
}
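For readers who prefer to see the item as a Python type, the schema can be sketched as a dataclass. ReviewRequest and its complete method are hypothetical and only mirror the fields listed above; GRT itself stores plain dicts, and the key field is left out because it is generated on insert.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class ReviewRequest:
    reviewer: str
    pull_request: int                  # pull request number
    requested_at: int                  # POSIX timestamp of the request
    submitted: bool = False
    submitted_at: Optional[int] = None # set once the review is submitted
    crt: Optional[int] = None          # code review turnaround in seconds

    def complete(self, submitted_at: int) -> None:
        """Mark the request as submitted and derive the turnaround."""
        self.submitted = True
        self.submitted_at = submitted_at
        self.crt = submitted_at - self.requested_at


req = ReviewRequest(reviewer="octocat", pull_request=42, requested_at=1000)
req.complete(submitted_at=1600)
print(asdict(req)["crt"])  # 600
```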
- Create a file called reviews.py with the following code.
from dateutil.parser import isoparse
from datetime import datetime, timezone
from deta import Deta


# manages storing, fetching and updating review requests information
class ReviewRequestStore:
    def __init__(self):
        # creating a new base (or table) is only one line of code
        self.db = Deta().Base("code_reviews")

    # get review req from pull request number and reviewer
    def __get_review_req(self, pr_num: int, reviewer: str):
        # generator
        review_reqs_gen = next(
            self.db.fetch(
                {"submitted": False, "pull_request": pr_num, "reviewer": reviewer}
            )
        )
        review_reqs = []
        for r in review_reqs_gen:
            review_reqs.append(r)

        # there should be only one corresponding unsubmitted review request
        if len(review_reqs) == 0:
            raise Exception("No corresponding review request found")
        if len(review_reqs) > 1:
            raise Exception(
                "Found multiple incomplete reviews for same pull request and reviewer"
            )
        return review_reqs[0]

    # store review request
    def store(self, payload: dict):
        # POSIX timestamp
        current_time = int(datetime.now(timezone.utc).timestamp())
        item = {
            "reviewer": payload["requested_reviewer"]["login"],
            "pull_request": payload["pull_request"]["number"],
            "requested_at": current_time,
            "submitted": False,
        }
        self.db.put(item)

    # mark review request complete
    def mark_complete(self, payload: dict):
        submission_time = int(isoparse(payload["review"]["submitted_at"]).timestamp())
        pr_num = payload["pull_request"]["number"]
        reviewer = payload["review"]["user"]["login"]
        review_req = self.__get_review_req(pr_num, reviewer)

        # updates to the review request
        updates = {
            "submitted": True,
            "submitted_at": submission_time,
            "crt": submission_time - review_req["requested_at"],
        }
        self.db.update(updates, review_req["key"])
        return

    # delete review request
    def delete(self, payload: dict):
        pr_num = payload["pull_request"]["number"]
        reviewer = payload["requested_reviewer"]["login"]
        review_req = self.__get_review_req(pr_num, reviewer)
        self.db.delete(review_req["key"])

    # get review requests created since date
    def get(self, created_since: str):
        # posix timestamp
        since = int(isoparse(created_since).timestamp())

        # query submitted reviews created since 'since'
        review_reqs_since_gen = next(
            self.db.fetch({"requested_at?gte": since, "submitted": True})
        )
        review_reqs_since = []
        for req in review_reqs_since_gen:
            review_reqs_since.append(req)
        return review_reqs_since


# initializing a singleton, only one instance should be used
rev_req_store = ReviewRequestStore()
Creating or connecting to the database is only a single line of code if you use Deta Base, as you can see in the constructor of the ReviewRequestStore class. It requires no prior setup of a database.
self.db = Deta().Base("code_reviews")
The ReviewRequestStore class offers methods to store, mark as complete, delete and retrieve the review requests from the database. These methods do the necessary processing of the github payloads to store, update and retrieve only the necessary information.
Also, an instance of the class is already instantiated here as it should be a singleton. We will import this instance directly in our main.py and later for the insights.
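To try the store and mark-complete flow without a Deta Base, a dict-backed stand-in is enough. The sketch below is an assumption-laden illustration: InMemoryReviewStore and to_posix are hypothetical names, the payloads contain only the fields the store reads, and the request time is passed in explicitly so the result is deterministic.

```python
from datetime import datetime
from itertools import count


def to_posix(iso_timestamp: str) -> int:
    """Parse an ISO 8601 timestamp such as "2020-09-13T12:36:40Z"."""
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on,
    # so it is rewritten as an explicit UTC offset first.
    return int(datetime.fromisoformat(iso_timestamp.replace("Z", "+00:00")).timestamp())


class InMemoryReviewStore:
    """Hypothetical stand-in that mimics the store and mark_complete flow."""

    def __init__(self):
        self.items = {}
        self._keys = count(1)  # Deta Base would generate a random key instead

    def _find_open(self, pr_num: int, reviewer: str) -> dict:
        matches = [
            item for item in self.items.values()
            if not item["submitted"]
            and item["pull_request"] == pr_num
            and item["reviewer"] == reviewer
        ]
        if len(matches) != 1:
            raise LookupError(f"expected 1 open review request, found {len(matches)}")
        return matches[0]

    def store(self, payload: dict, now: int) -> None:
        key = str(next(self._keys))
        self.items[key] = {
            "key": key,
            "reviewer": payload["requested_reviewer"]["login"],
            "pull_request": payload["pull_request"]["number"],
            "requested_at": now,
            "submitted": False,
        }

    def mark_complete(self, payload: dict) -> None:
        submitted_at = to_posix(payload["review"]["submitted_at"])
        item = self._find_open(
            payload["pull_request"]["number"], payload["review"]["user"]["login"]
        )
        item.update(
            submitted=True,
            submitted_at=submitted_at,
            crt=submitted_at - item["requested_at"],
        )


store = InMemoryReviewStore()
store.store(
    {"requested_reviewer": {"login": "octocat"}, "pull_request": {"number": 7}},
    now=to_posix("2020-09-13T12:26:40Z"),
)
store.mark_complete(
    {
        "review": {"submitted_at": "2020-09-13T12:36:40Z", "user": {"login": "octocat"}},
        "pull_request": {"number": 7},
    }
)
print(store.items["1"]["crt"])  # 600 seconds, i.e. ten minutes
```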
- Now we update our main.py to handle the payloads from github. Open main.py and update the code to the following.
import hmac
from fastapi import FastAPI, Request, HTTPException
import utils
from reviews import rev_req_store

# FastAPI app
app = FastAPI()


@app.post("/webhook_events")
async def webhook_handler(request: Request):
    # verify webhook signature
    raw = await request.body()
    signature = request.headers.get("X-Hub-Signature", "")
    expected = utils.calc_signature(raw)
    # constant-time comparison; a missing header never matches
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        raise HTTPException(status_code=401, detail="Unauthorized")

    # handle events
    payload = await request.json()
    event_type = request.headers.get("X-Github-Event")

    # reviews requested or removed
    if event_type == "pull_request":
        action = payload.get("action")
        if action == "review_requested":
            # store the review request
            rev_req_store.store(payload)
        elif action == "review_request_removed":
            # delete the review request
            rev_req_store.delete(payload)
        return "ok"

    # review submitted
    if event_type == "pull_request_review" and payload.get("action") == "submitted":
        # mark review request complete
        rev_req_store.mark_complete(payload)
        return "ok"

    # ignore other events
    return "ok"
Let's deploy the latest changes.
$ deta deploy
Generate the insights
Now we need to implement retrieving the data from the store and generating the insights with average review turnaround time. We use Bokeh for generating the HTML chart.
- Create a file called insights.py with the following code.
from datetime import datetime, timedelta
from dateutil.parser import isoparse
from statistics import mean
from math import isnan, nan
from bokeh.plotting import figure
from bokeh.resources import CDN
from bokeh.embed import file_html
from bokeh.models import HoverTool
from reviews import rev_req_store


# manages generating the insights data
class Chart:
    def __init__(self):
        # maps durations to number of days
        self.__durations = {
            "week": 7,  # number of days
            "month": 30,  # number of days
        }
        # maps periods to number of seconds
        self.__periods = {
            "day": 60 * 60 * 24,  # number of seconds
            "week": 60 * 60 * 24 * 7,  # number of seconds
        }

    # get submitted reviews bucketed by periods based on duration
    def __get_insights(self, duration: str, period: str):
        # membership test, so an unknown value raises ValueError, not KeyError
        if duration not in self.__durations or period not in self.__periods:
            raise ValueError("bad duration or period")
        since = self.__get_since(self.__durations[duration])
        submitted_reviews = rev_req_store.get(since)
        return self.__bucket_submissions(since, period, submitted_reviews)

    # convert duration into iso 8601 date format
    def __get_since(self, days: int):
        since = datetime.now() - timedelta(days=days)
        return since.isoformat()

    # bucket submitted reviews based on request timestamp since date averaged by period
    def __bucket_submissions(self, since: str, period: str, submitted_reviews: list):
        now_posix = int(datetime.now().timestamp())
        since_posix = int(isoparse(since).timestamp())
        buckets = {}
        average_buckets = {}
        separators = []

        # separators are calculated based on period
        # for eg. if period is "day", separators are distanced by 86400 seconds
        for start in range(since_posix, now_posix + 1, self.__periods[period]):
            buckets[start] = []
            separators.append(start)

        # fill the buckets
        for rev in submitted_reviews:
            for separator in separators:
                # the separators are sorted in increasing order
                # so a simple comparison suffices here
                if separator > rev["requested_at"]:
                    buckets[separator].append(rev["crt"])
                    break

        # compute average for each bucket
        for separator in buckets:
            date = datetime.fromtimestamp(separator)
            crts = buckets[separator]
            average_buckets[date] = nan  # nan here to denote missing data for the chart
            if len(crts) != 0:
                average_buckets[date] = round(mean(buckets[separator]) / 60, 2)
        return average_buckets

    # generate html chart with bokeh
    def __generate_chart(self, buckets: dict):
        p = figure(
            title="Average code review turnarounds",
            x_axis_type="datetime",
            x_axis_label="date",
            y_axis_label="average turnaround (mins)",
            plot_height=800,
            plot_width=800,
        )
        x = list(buckets.keys())
        y = list(buckets.values())
        p.scatter(x, y, color="red")
        p.line(x, y, color="red", legend_label="moving average code review turnaround")
        return file_html(p, CDN, "Average code review turnarounds")

    # get html chart
    def get_chart(self, duration: str, period: str):
        buckets = self.__get_insights(duration, period)
        return self.__generate_chart(buckets)

    # get json of average values
    def get_json(self, duration: str, period: str):
        buckets = self.__get_insights(duration, period)
        for date in buckets:
            if isnan(buckets[date]):
                buckets[date] = 0
        return buckets
Here we create a class Chart that manages the insights. Chart offers two main methods to get the insights, get_chart and get_json, which return either an html chart or a json.
The insights are calculated based on the parameters duration and period.
The main algorithm here is to get the submitted reviews since a specific date, bucket the submitted reviews based on the period and return averages for each period.
# get submitted reviews bucketed by periods based on duration
def __get_insights(self, duration: str, period: str):
    # membership test, so an unknown value raises ValueError, not KeyError
    if duration not in self.__durations or period not in self.__periods:
        raise ValueError("bad duration or period")
    since = self.__get_since(self.__durations[duration])
    submitted_reviews = rev_req_store.get(since)
    return self.__bucket_submissions(since, period, submitted_reviews)

# convert duration into iso 8601 date format
def __get_since(self, days: int):
    since = datetime.now() - timedelta(days=days)
    return since.isoformat()

# bucket submitted reviews based on request timestamp since date averaged by period
def __bucket_submissions(self, since: str, period: str, submitted_reviews: list):
    now_posix = int(datetime.now().timestamp())
    since_posix = int(isoparse(since).timestamp())
    buckets = {}
    average_buckets = {}
    separators = []

    # separators are calculated based on period
    # for eg. if period is "day", separators are distanced by 86400 seconds
    for start in range(since_posix, now_posix + 1, self.__periods[period]):
        buckets[start] = []
        separators.append(start)

    # fill the buckets
    for rev in submitted_reviews:
        for separator in separators:
            # the separators are sorted in increasing order
            # so a simple comparison suffices here
            if separator > rev["requested_at"]:
                buckets[separator].append(rev["crt"])
                break

    # compute average for each bucket
    for separator in buckets:
        date = datetime.fromtimestamp(separator)
        crts = buckets[separator]
        average_buckets[date] = nan  # nan here to denote missing data for the chart
        if len(crts) != 0:
            average_buckets[date] = round(mean(buckets[separator]) / 60, 2)
    return average_buckets
The main algorithm:
- Calculate the exact time from which we need to retrieve the reviews. For example, if the duration is week, calculate the timestamp of exactly a week ago. This is the since.
- Get submitted reviews created since the since timestamp.
- Divide up the time between since and now into equal intervals of period by using timestamps as separators. So, besides the first one, each separator will be period seconds higher than the previous separator.
- Bucket the submitted reviews into the right intervals. The code compares each review's requested_at timestamp against the separators, so a review lands in the interval in which it was requested.
- Calculate the average review turnaround of each bucket.
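The bucketing steps can be tried in isolation with plain integers. The following is a simplified sketch of the same idea, not the code from insights.py: bucket_averages is a hypothetical function, separators stay as POSIX timestamps, and an empty bucket is reported as None rather than nan.

```python
from statistics import mean


def bucket_averages(since: int, now: int, period: int, reviews: list) -> dict:
    """Average turnaround in minutes per bucket; None marks an empty bucket.

    A review lands in the first separator that is strictly greater than
    its requested_at timestamp.
    """
    separators = list(range(since, now + 1, period))
    buckets = {separator: [] for separator in separators}
    for rev in reviews:
        for separator in separators:
            if separator > rev["requested_at"]:
                buckets[separator].append(rev["crt"])
                break
    return {
        separator: round(mean(crts) / 60, 2) if crts else None
        for separator, crts in buckets.items()
    }


reviews = [
    {"requested_at": 50, "crt": 120},
    {"requested_at": 60, "crt": 240},
    {"requested_at": 250, "crt": 60},
]
print(bucket_averages(since=0, now=300, period=100, reviews=reviews))
# {0: None, 100: 3.0, 200: None, 300: 1.0}
```

Because every retrieved review was requested at or after since, the first bucket always stays empty, which matches the leading 0 in the sample json response.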
Add api for getting the insights
Now that we have our insights, the final step is to implement the api that the users can get the insights from. For this we offer a GET endpoint to get the insights.
- Open main.py and update it to the following code.
import hmac
from fastapi import FastAPI, Request, HTTPException
from fastapi.responses import HTMLResponse
import utils
from reviews import rev_req_store
from insights import Chart

# FastAPI app
app = FastAPI()

# chart
chart = Chart()

# cache generated charts (max age in seconds)
CACHE_MAX_AGE = 300


@app.post("/webhook_events")
async def webhook_handler(request: Request):
    # verify webhook signature
    raw = await request.body()
    signature = request.headers.get("X-Hub-Signature", "")
    expected = utils.calc_signature(raw)
    # constant-time comparison; a missing header never matches
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        raise HTTPException(status_code=401, detail="Unauthorized")

    # handle events
    payload = await request.json()
    event_type = request.headers.get("X-Github-Event")

    # reviews requested or removed
    if event_type == "pull_request":
        action = payload.get("action")
        if action == "review_requested":
            rev_req_store.store(payload)
        elif action == "review_request_removed":
            rev_req_store.delete(payload)
        return "ok"

    # review submitted
    if event_type == "pull_request_review" and payload.get("action") == "submitted":
        rev_req_store.mark_complete(payload)
        return "ok"

    # ignore other events
    return "ok"


# get average turnaround insights
# last: for last 'x', 'x' is only one of 'week' or 'month' currently
# period: period to calculate average of, currently 'day' or 'week'
# plot: whether to generate a plot or not, returns json if plot is False
@app.get("/turnarounds/")
def get_turnarounds(last: str = "week", period: str = "day", plot: bool = True):
    try:
        if not plot:
            return chart.get_json(last, period)
        html_chart = chart.get_chart(last, period)
        return HTMLResponse(
            content=html_chart, headers={"Cache-Control": f"max-age={CACHE_MAX_AGE}"}
        )
    except ValueError:
        raise HTTPException(status_code=400, detail="Bad duration or period")
We added a route /turnarounds/ to get the insights with three query parameters.
- last:str : the duration since the request to get the average turnarounds of, only week or month supported for now, defaults to week
- period:str : the period to calculate the average over, only day or week supported for now, defaults to day
- plot:bool : whether to view a plot or get a json response, defaults to true
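As an illustration of how the three parameters combine, the request URL can be assembled with the standard library. turnarounds_url is a hypothetical helper and the base URL is the placeholder endpoint used throughout this guide; if http_auth is still enabled on your micro, the request will additionally need to be authenticated.

```python
from urllib.parse import urlencode

BASE = "https://{your_subdomain}.deta.dev"  # placeholder, use your micro's endpoint


def turnarounds_url(base: str, last: str = "week", period: str = "day", plot: bool = True) -> str:
    """Build the /turnarounds/ URL for the given query parameters."""
    query = urlencode({"last": last, "period": period, "plot": str(plot).lower()})
    return f"{base}/turnarounds/?{query}"


# json response of weekly averages over the last month
print(turnarounds_url(BASE, last="month", period="week", plot=False))
```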
Finally, deploy the changes
$ deta deploy
And we are done. The application should now keep track of the review turnarounds and you can easily get the insights from the api.
If you don't see the application behaving as expected, you can see real-time logs of your application in Deta Visor. To open the visor page, navigate to your micro's visor page on Deta or open it from the cli directly:
$ deta visor open
Deta Base also offers a UI which can be used to easily see what is stored in the database. Here's a screenshot of my base's data with completed submissions.
The entire source code of the application can be viewed on github.
Conclusion
In a matter of a few hours we created a github insights tool ourselves (instead of subscribing to an expensive enterprise solution) and deployed it to production effortlessly.
GRT can be easily tweaked and extended to enable additional features:
- get individual insights for a reviewer
- add other durations and periods
- extend the api to accept from and to dates
- return min and max turnaround times along with the average for each period
- configure the same instance of the app to be used for multiple repositories
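As a starting point for the first and fourth ideas, the items returned by the store can be grouped by reviewer and summarized. This is a sketch only; reviewer_stats is a hypothetical function that assumes each item carries the reviewer and crt fields from the schema above.

```python
from collections import defaultdict
from statistics import mean


def reviewer_stats(reviews: list) -> dict:
    """Group submitted reviews by reviewer and summarize turnaround in minutes."""
    crts = defaultdict(list)
    for rev in reviews:
        crts[rev["reviewer"]].append(rev["crt"])
    return {
        reviewer: {
            "min": round(min(values) / 60, 2),
            "max": round(max(values) / 60, 2),
            "avg": round(mean(values) / 60, 2),
        }
        for reviewer, values in crts.items()
    }


sample = [
    {"reviewer": "a", "crt": 60},
    {"reviewer": "a", "crt": 180},
    {"reviewer": "b", "crt": 600},
]
print(reviewer_stats(sample))
# {'a': {'min': 1.0, 'max': 3.0, 'avg': 2.0}, 'b': {'min': 10.0, 'max': 10.0, 'avg': 10.0}}
```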
Deta enables developers to direct their focus primarily on the development and implementation of ideas and tools like GRT, and to get them out as quickly as possible.