Track your coding progress on GitHub with a .NET Worker Service

Edoardo Sanna · Posted on January 13, 2023

Since I started pushing my code to GitHub, I've really liked the Top Languages stats in each repository's properties (as well as anuraghazra's beautiful github-readme-stats cards): I was just amazed that some hidden mechanism was able to tell which language was used in almost every file in a repository.

As I later found out, GitHub uses the Linguist library to detect each file's language and measure how much code (in bytes) is written in each one... which is still pretty magic 🪄.

🎯 Motivation

What surprised me is that there's no historical record of these statistics: I was wondering if I could produce a nice line chart race showing the progress I've made over time in the languages I've been working on, similar to the Top Programming Languages section of the Octoverse, maybe also adding some post-processing analytics, like

  • what's the fastest-growing language?
  • which language have I been working on the most lately?
  • what's the least used language, and which ones are 'sleeping'?

⚠️ Warning: "Top Languages" don't indicate any skill level! But it's definitely interesting to use the amount of written lines of code as a rough metric of progress, even just for motivational purposes! Wouldn't it be nice to see a line chart showing the amazing progress you've made in all your languages across, let's say, 10-20 years?

πŸ‘ΆπŸ» Baby steps

Apparently the GitHub REST Repositories API only offers a list request (GET /repos/{owner}/{repo}/languages), which returns the current list of languages used in a specific repo; there's no place in GitHub where such historical data is stored.
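
For reference, GET /repos/{owner}/{repo}/languages returns a plain JSON object mapping each detected language to the number of bytes written in it; the values below are just made-up examples:

{
  "C#": 170756,
  "HTML": 155353,
  "Python": 96860
}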

What if I used this listing feature to take a daily snapshot of the languages used in all my repositories, and persisted it in a historical table in a database?

Since I will use it to track my languages' progress, let's call it LangTracker.

Therefore, below I will try to:

  • set up the starter background service
  • add the authentication to GitHub API
  • connect the service to a database
  • deploy and run it

We will need the following ingredients: a background Worker Service, a PostgreSQL database, a GitHub API client (Octokit), and Docker for the deployment.

πŸš› Background Service

Since I'm learning .NET, I will create it as a background Worker Service.

Let's create the project using the worker template with the .NET CLI:

# Create new project from template
dotnet new worker --name LangTracker

# Create new solution
dotnet new sln --name LangTracker

# Add project to solution
dotnet sln ./LangTracker.sln add ./LangTracker/LangTracker.csproj

The basic Program.cs is quite simple:

// Program.cs
using LangTracker;
IHost host = Host.CreateDefaultBuilder(args)
    .ConfigureServices(services =>
    {
        services.AddHostedService<Worker>();
    })
    .Build();
await host.RunAsync();

It just creates a host and registers the hosted service Worker in the Dependency Injection container; it then builds the host and finally runs the application, which starts the background service.

The worker's behavior is defined in the Worker.cs class, specifically in a while loop which runs indefinitely until cancellation is requested via the CancellationToken:

// Worker.cs
namespace LangTracker;
public class Worker : BackgroundService
{
    private readonly ILogger<Worker> _logger;

    public Worker(ILogger<Worker> logger)
    {
        _logger = logger;
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            // Do something
            // ...

            // Wait 1000 ms 
            await Task.Delay(1000, stoppingToken);
        }
    }
}

You may want to stretch the execution interval a bit: to take a daily snapshot, I set it to 24 hours:

await Task.Delay(1000*3600*24, stoppingToken); // πŸ‘ˆπŸ» <-- or whatever you prefer
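
By the way, the same delay can be expressed a bit more readably with a TimeSpan (equivalent to the line above):

// Wait 24 hours between runs
await Task.Delay(TimeSpan.FromHours(24), stoppingToken);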

πŸ—„οΈ Database

I expect each language record to be saved in a single table with a DateTime property and a Size property (i.e. size in KB).

For the sake of simplicity, I'll be using PostgreSQL as the RDBMS.

To manage the database, we'll first install the required packages:

dotnet add package Microsoft.EntityFrameworkCore
dotnet add package Microsoft.EntityFrameworkCore.Tools
dotnet add package Npgsql.EntityFrameworkCore.PostgreSQL

Let's then get started by modeling our GithubLanguage record with the following class:

// Models/GitHubLanguage.cs
using System.ComponentModel.DataAnnotations; // required for the [Key] attribute

namespace LangTracker.Models
{
    public class GithubLanguage
    {
        [Key]
        public int Id { get; set; }
        public DateTime Date { get; set; }
        public string? Repo { get; set; }
        public string? Language { get; set; }
        public double Size { get; set; }
    }
}

We'll then write a database context class with all the details of our database. Its OnConfiguring override reads the connection string from the configuration:

// Data/DbContext.cs
protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
{
    if (_configuration is not null)
    {
       optionsBuilder.UseNpgsql(_configuration.GetConnectionString("PostgreSQL"));
    }
    base.OnConfiguring(optionsBuilder);
}

And its OnModelCreating override maps the model to the database table:

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    // Map table names ("Languages")
    modelBuilder.Entity<GithubLanguage>().ToTable("Languages", "dbschema");
    modelBuilder.Entity<GithubLanguage>(entity =>
    {
        entity.HasKey(e => e.Id);
    });
    base.OnModelCreating(modelBuilder);
}
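
For completeness, here's a minimal sketch of what the whole context class in Data/DbContext.cs could look like; I'm calling the class LangTrackerContext and passing the configuration through its constructor, so adapt the names to your own project:

// Data/DbContext.cs - full class, minimal sketch
using LangTracker.Models;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.Configuration;

namespace LangTracker.Data;

public class LangTrackerContext : DbContext
{
    private readonly IConfiguration? _configuration;

    public LangTrackerContext(IConfiguration configuration)
    {
        _configuration = configuration;
    }

    // One row per repo/language/timestamp
    public DbSet<GithubLanguage> Languages => Set<GithubLanguage>();

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
    {
        if (_configuration is not null)
        {
            optionsBuilder.UseNpgsql(_configuration.GetConnectionString("PostgreSQL"));
        }
        base.OnConfiguring(optionsBuilder);
    }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // Map the entity to the "Languages" table in the "dbschema" schema
        modelBuilder.Entity<GithubLanguage>().ToTable("Languages", "dbschema");
        modelBuilder.Entity<GithubLanguage>(entity => entity.HasKey(e => e.Id));
        base.OnModelCreating(modelBuilder);
    }
}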

Remember to add the connection string in the appsettings.json file:

  "ConnectionStrings": {
    "PostgreSQL": "User ID=YOUR_POSTGRES_ID;Password=YOUR_POSTGRES_PASSWORD;Host=POSTGRES_HOSTNAME;Port=POSTGRES_PORT;Database=LangTracker;Pooling=true;"
  },

⚠️ Npgsql expects UTC timestamps by default and may complain when saving local DateTime values (like DateTime.Now), hence remember to add these lines at the beginning of Program.cs:

// Pgsql-specific configuration for datetimes
AppContext.SetSwitch("Npgsql.EnableLegacyTimestampBehavior", true);
AppContext.SetSwitch("Npgsql.DisableDateTimeInfinityConversions", true);

Finally, let's create the database using Entity Framework Core's migrations, e.g. via the .NET CLI (this requires the dotnet-ef tool and the Microsoft.EntityFrameworkCore.Design package):

# Create the first migration, called InitialCreate
dotnet ef migrations add InitialCreate

# Update the database
dotnet ef database update

πŸ’» GitHub client

Let's proceed by getting a personal access token for the GitHub API: you can generate one in your settings under Developer settings > Personal access tokens.

Then test it via CLI, asking for instance the details of your user:

# bash
user="YOUR-GITHUB-USERNAME"
token="YOUR-GITHUB-PERSONAL-ACCESS-TOKEN"
curl -i -u "$user:$token" https://api.github.com/users/$user

The -i switch displays the HTTP response headers: notice the content-type header (it should be application/json) and the x-ratelimit-limit header (the maximum number of requests available per hour, which should be 5000 for authenticated requests; your usage is tracked by x-ratelimit-remaining).

In order not to hardcode our GitHub username and token, we will save them as environment variables and pass them to our background service.

So now, back to our Worker, we'll have to inject the configuration:

private readonly IConfiguration _configuration; // 👈🏻 <-- field holding the injected configuration

public Worker(ILogger<Worker> logger, IConfiguration configuration) // 👈🏻 <-- add configuration here
{
    _logger = logger;
    _configuration = configuration; // 👈🏻 <-- and here
}

The default host builder loads all the standard .NET configuration sources, which include non-prefixed environment variables and user secrets.

Hence, by adding a new set of environment variables:

GITHUB_LOGIN=[YOUR-GITHUB-USERNAME-HERE]
GITHUB_TOKEN=[YOUR-GITHUB-TOKEN-HERE]

we will be able to retrieve them inside the while loop with just:

// Read token from env variables
string login = _configuration["GITHUB_LOGIN"];
string token = _configuration["GITHUB_TOKEN"];

To interact with the GitHub API, let's install the Octokit.NET library with dotnet add package Octokit, then instantiate a client as simply as:

// Instantiate Github Client
var client = new GitHubClient(new ProductHeaderValue("lang-tracker"));

// Authenticate with token
var tokenAuth = new Credentials(login, token);
client.Credentials = tokenAuth;

Fetch data

Once the client is authenticated on the GitHub API, we can retrieve all the repositories of the current user:

// Get all repos, public & private, from current login
var repos = await client.Repository.GetAllForCurrent();

Then we loop over the returned repositories, saving the current snapshot of each language's size in KB:

foreach (Repository repo in repos)
{
    // Filter away Company repos & forked repos
    if (repo.Owner.Login == login && !repo.Fork)
    {
        // All languages in current repo
        var langs = await client.Repository.GetAllLanguages(repo.Id);

        foreach (var lang in langs)
        {
            // New language record
            var githubLangRecord = new GithubLanguage
            {
                Date = DateTime.Now,
                Repo = repo.Name,
                Language = lang.Name,
                Size = lang.NumberOfBytes / 1024.00
            };
        }
    }
}

The code above creates a new GithubLanguage object from the retrieved language, the corresponding repository and the size in KB, and labels it with the current datetime.

I've skipped my company's repositories and the forked ones, as I only want to keep track of my personal projects.

Save to database

Now, let's just add the operations required to save each record to the database:

foreach (var lang in langs)
{
    var githubLangRecord = new GithubLanguage
    {
        // ...
    };

    dbContext.Add(githubLangRecord); // πŸ‘ˆπŸ» <-- Add entity
}
dbContext.SaveChanges(); // πŸ‘ˆπŸ» <-- Save into db
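
One detail the snippets above gloss over is where dbContext comes from inside the worker loop. Since the Worker is a long-lived singleton, a simple option (a sketch, assuming the context class from the previous section is called LangTrackerContext and takes IConfiguration) is to create a short-lived context on every iteration:

// Worker.cs - ExecuteAsync, sketch of the overall loop
protected override async Task ExecuteAsync(CancellationToken stoppingToken)
{
    while (!stoppingToken.IsCancellationRequested)
    {
        // Fresh context per iteration, disposed at the end of the scope
        using var dbContext = new LangTrackerContext(_configuration);

        // ... query GitHub and add the GithubLanguage records here ...

        dbContext.SaveChanges();

        await Task.Delay(TimeSpan.FromHours(24), stoppingToken);
    }
}

Alternatively, you could register the context in Program.cs with services.AddDbContext<LangTrackerContext>() and resolve it through an IServiceScopeFactory injected into the Worker.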

πŸ‹ Deployment

Now, I chose the Worker Service template because it can be easily deployed as a systemd daemon or a Windows service with a simple addition to Program.cs:

// Program.cs
using LangTracker;
IHost host = Host.CreateDefaultBuilder(args)
    .UseWindowsService() // πŸ‘ˆπŸ» <-- here
    .UseSystemd() // πŸ‘ˆπŸ» <-- and here
    .ConfigureServices(services =>
    {
        services.AddHostedService<Worker>();
    })
    .Build();
await host.RunAsync();

If you want to try it out, remember to install the packages:

dotnet add package Microsoft.Extensions.Hosting
dotnet add package Microsoft.Extensions.Hosting.Systemd
dotnet add package Microsoft.Extensions.Hosting.WindowsServices

Nevertheless, deploying as a containerized service is sooo much more convenient, as it gives me the chance to specify the installation instructions in a single docker-compose.yaml file and run it with a single command.

Therefore, we'll be using Docker and Docker Compose.

The Dockerfile is the standard one automatically generated by Visual Studio:

# base image
FROM mcr.microsoft.com/dotnet/aspnet:6.0 AS base
WORKDIR /app

# build image: restore and build
FROM mcr.microsoft.com/dotnet/sdk:6.0 AS build
WORKDIR /src
COPY ["LangTracker.csproj", "."]
RUN dotnet restore "./LangTracker.csproj"
COPY . .
WORKDIR "/src/."
RUN dotnet build "LangTracker.csproj" -c Release -o /app/build

# publish image: publish
FROM build AS publish
RUN dotnet publish "LangTracker.csproj" -c Release -o /app/publish

# run image: run
FROM base AS final
WORKDIR /app
COPY --from=publish /app/publish .
ENTRYPOINT ["dotnet", "LangTracker.dll"]

The next step is to provide a docker-compose.yaml recipe to start two container services:

  • langtracker_app containing the app (based on the Dockerfile above) and
  • langtracker_db containing the database (based on the official postgres image):

To ensure connectivity between the two containers, remember to:

  1. Add the two environment variables POSTGRES_PASSWORD and POSTGRES_PORT, respectively the password for the postgres user and the local port to be mapped to the container's default PostgreSQL port 5432
  2. Change your connection string in appsettings.json from Host=POSTGRES_HOSTNAME to Host=langtracker_db (see the example right after this list), taking advantage of the user-defined bridge network that Compose creates by default, which provides automatic DNS resolution between containers.
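
For example, assuming the containerized PostgreSQL listens on its default port, the connection string above would become something like (note that, container to container, the app talks to port 5432 directly, not to the host-mapped POSTGRES_PORT):

  "ConnectionStrings": {
    "PostgreSQL": "User ID=YOUR_POSTGRES_ID;Password=YOUR_POSTGRES_PASSWORD;Host=langtracker_db;Port=5432;Database=LangTracker;Pooling=true;"
  },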

Here's the docker-compose file I've been using:

services:
  db:
    image: postgres
    container_name: langtracker_db
    restart: always
    ports:
      - "${POSTGRES_PORT}:5432"
    environment:
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    volumes:
      - postgres-data:/var/lib/postgresql/data
  app:
    container_name: langtracker_app
    build:
      context: .
      dockerfile: ./Dockerfile
    depends_on:
      - db
    environment:
      - GITHUB_LOGIN=${GITHUB_LOGIN}
      - GITHUB_TOKEN=${GITHUB_TOKEN}
volumes:
  postgres-data:

Let's just run it:

docker-compose up -d

Depending on your network, it may take a few minutes to download the base images and start the two containers, but in the end both will be up and running.


Now, when you access the database in the langtracker_db container (even with a simple sudo -u postgres psql -h localhost -p YOUR_POSTGRES_LOCAL_PORT), you can run a nice aggregating query like:

select "Date"::date as Day, "Language", sum("Size") as TotalSizeKB from "dbschema"."Languages" 
group by "Date"::date, "Language" 
order by Day desc, TotalSizeKB desc;

This gives you your daily snapshot of languages (below, a couple of days from last summer):

    day     |  Language  |  totalsizekb
------------+------------+----------------
 2022-07-20 | C#         |   166.75390625
 2022-07-20 | PowerShell |   166.16015625
 2022-07-20 | HTML       | 151.7119140625
 2022-07-20 | Python     |    94.58984375
 2022-07-20 | TSQL       |     23.9765625
 2022-07-20 | JavaScript |  23.9599609375
 2022-07-20 | CSS        |    9.599609375
 2022-07-20 | Shell      |     2.54296875
 2022-07-20 | Dockerfile |    2.412109375
 2022-07-19 | PowerShell |   166.16015625
 2022-07-19 | HTML       | 148.2822265625
 2022-07-19 | C#         | 143.2822265625
 2022-07-19 | Python     |    94.58984375
 2022-07-19 | TSQL       |     23.9765625
 2022-07-19 | JavaScript |  23.7392578125
 2022-07-19 | CSS        |   8.5712890625
 2022-07-19 | Shell      |     2.54296875
 2022-07-19 | Dockerfile |     1.56640625

Small and progressive steps in C# and HTML, apparently 😊...

We're good to go! πŸŽ‰βœ¨ I'll just run it for a few months, then maybe I'll adjust it to track the language stats only once a month or so, and see the results.

πŸ’‘ Next steps

The background service does what it's supposed to do, but it's far from perfect. Here are some random ideas popping up:

πŸͺ² Fixes:

  • If the only purpose is to show aggregated data, maybe it'd be a good idea to save only the aggregated form in the database

  • The service works from the moment you start it, but it cannot retrieve past data.

    • Maybe we could replace the /repos/{owner}/{repo}/languages call with the /repos/{owner}/{repo}/commits request
    • Then retrieve all the files in each commit, and invoke the above-mentioned Linguist library to get the list of languages
    • This looks like a very time-consuming effort, as perfectly described by @ahegiy in their note on a GitLab issue on the same topic.
      • I particularly like the idea of doing this 'on demand' when asked by the user

🌼 Features:

  • Create a frontend application to show a nice line chart (maybe with ChartJS?)
  • Explore the GitHub GraphQL API instead of the REST API!

Maybe I'll try to deep dive into one of these points in the next month or so, we'll see.

Meanwhile, have a great time and keep coding! πŸ‘πŸ»

πŸ’– πŸ’ͺ πŸ™… 🚩
sannae
Edoardo Sanna

Posted on January 13, 2023

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related