AdTech using SingleStoreDB and Looker
Akmal Chaudhri
Posted on March 16, 2023
Abstract
In this article, we'll see how to run a SingleStore GitHub project for implementing an AdTech application in SingleStoreDB Cloud. We'll also use Looker to create some visualisations.
Introduction
In this article, we'll take an existing SingleStore GitHub repo and demonstrate the ease with which it can be deployed and run on SingleStoreDB Cloud. The complete updated code can be found in a forked GitHub repo for ease of use.
Create a SingleStoreDB Cloud account
A previous article showed the steps to create a free SingleStoreDB Cloud account. We'll use AdTech Demo Group as our Workspace Group Name and adtech-demo as our Workspace Name. We'll make a note of our password and host name.
Install Go
First, we'll download and install Go, as follows:
rm -rf /usr/local/go && tar -C /usr/local -xzf go1.20.1.linux-amd64.tar.gz
We may need to use sudo
, as follows:
sudo rm -rf /usr/local/go && sudo tar -C /usr/local -xzf go1.20.1.linux-amd64.tar.gz
We'll add the following to the PATH
variable:
export PATH=$PATH:/usr/local/go/bin
and then check the version:
go version
This will print the installed version of Go. For example:
go version go1.20.1 linux/amd64
Clone GitHub repo
We'll now clone the following GitHub repo:
git clone https://github.com/VeryFatBoy/starter_kits
Data Generator
Let's first start with the AdTech data generator and change to the directory:
cd starter_kits/adtech/generator
We'll now build the code, as follows:
go mod init generator
go mod tidy
go build
This will create an executable generator
file.
Let's now create a small
dataset for initial testing:
./generator small
This will create four files with tab-separated values:
campaigns-small.tsv
events-small.tsv
networks-small.tsv
platforms-small.tsv
We'll gzip
these files:
gzip *.tsv
We can also pass the parameter medium
or large
for larger datasets.
Load and run SQL
We'll move up one directory level:
cd ..
In the file setup.sql
, the following four code blocks each use a pipeline and have been commented out:
...
-- create or replace pipeline load_events as
-- load data s3 'reference-solutions.memsql.com/events-large.tsv.gz'
-- config '{"region": "us-east-1"}'
-- credentials '{"aws_access_key_id": "", "aws_secret_access_key": ""}'
-- into table events;
-- start pipeline load_events;
...
-- create or replace pipeline load_campaigns as
-- load data s3 'reference-solutions.memsql.com/campaigns-large.tsv.gz'
-- config '{"region": "us-east-1"}'
-- credentials '{"aws_access_key_id": "", "aws_secret_access_key": ""}'
-- skip duplicate key errors
-- into table campaigns;
-- start pipeline load_campaigns;
...
-- create or replace pipeline load_networks as
-- load data s3 'reference-solutions.memsql.com/networks-large.tsv.gz'
-- config '{"region": "us-east-1"}'
-- credentials '{"aws_access_key_id": "", "aws_secret_access_key": ""}'
-- skip duplicate key errors
-- into table networks;
-- start pipeline load_networks;
...
-- create or replace pipeline load_platforms as
-- load data s3 'reference-solutions.memsql.com/platforms-large.tsv.gz'
-- config '{"region": "us-east-1"}'
-- credentials '{"aws_access_key_id": "", "aws_secret_access_key": ""}'
-- skip duplicate key errors
-- into table platforms;
-- start pipeline load_platforms;
...
If we decide to use pipelines later, we can uncomment and modify the code to suit our needs.
The following four new code blocks, respectively, have been added so that we can load the locally generated data instead:
...
load data local infile "generator/events-small.tsv.gz"
into table events
columns terminated by '\t';
...
load data local infile "generator/campaigns-small.tsv.gz"
into table campaigns
skip duplicate key errors
columns terminated by '\t';
...
load data local infile "generator/networks-small.tsv.gz"
into table networks
skip duplicate key errors
columns terminated by '\t';
...
load data local infile "generator/platforms-small.tsv.gz"
into table platforms
skip duplicate key errors
columns terminated by '\t';
...
Next, we'll run the commands in the setup.sql
file to create the adtech
database and tables and load the data:
mysql -u admin -h <host> -P 3306 -p<password> --local-infile < setup.sql
We'll replace the <host>
and <password>
with the values from our SingleStoreDB Cloud account.
From our SingleStoreDB Cloud account, we can use the SQL Editor to check that the tables were correctly created:
USE adtech;
SHOW TABLES;
Benchmarking using dbbench (optional)
The dbbench
tool can be used to run workloads against a SingleStoreDB database. A discussion of the tool is beyond the scope of this article, but the SingleStore GitHub repo is a great place to start and contains a tutorial.
To build and run the dbbench
program, we'll first change to the directory starter_kits/adtech/benchmarks
.
Next, we'll set up some environment variables:
export GOPATH=$HOME/go
export PATH=$PATH:$GOPATH/bin
and then install dbbench
, as follows:
go install github.com/memsql/dbbench@latest
We can run the program from the starter_kits/adtech/benchmarks
directory, as follows:
dbbench --host=<host> --username=admin --password=<password> --intermediate-stats=false adtech.ini
We'll replace the <host>
and <password>
with the values from our SingleStoreDB Cloud account.
Here is an example of the results using the small
dataset with an S-00
Workspace Size:
Connected
starting impressions per campaign
starting ctr_per_country
starting campaign facts
starting spend per advertiser
stopping spend per advertiser
stopping ctr_per_country
stopping campaign facts
stopping impressions per campaign
ctr_per_country: latency 59.327792ms±15.229575ms; 30 transactions (16.847 TPS); 7470 rows (4194.792 RPS)
16.777216ms - 33.554432ms [ 12]: ██████████████████████████████████████████████████
33.554432ms - 67.108864ms [ 7]: █████████████████████████████▏
67.108864ms - 134.217728ms [ 10]: █████████████████████████████████████████▋
134.217728ms - 268.435456ms [ 1]: ████▏
spend per advertiser: latency 58.056977ms±15.261319ms; 30 transactions (17.215 TPS); 15870 rows (9106.741 RPS)
8.388608ms - 16.777216ms [ 1]: ███▎
16.777216ms - 33.554432ms [ 6]: ████████████████████
33.554432ms - 67.108864ms [ 15]: ██████████████████████████████████████████████████
67.108864ms - 134.217728ms [ 7]: ███████████████████████▎
134.217728ms - 268.435456ms [ 1]: ███▎
impressions per campaign: latency 116.803395ms±35.36871ms; 30 transactions (8.559 TPS); 30 rows (8.559 RPS)
8.388608ms - 16.777216ms [ 2]: ██████▋
16.777216ms - 33.554432ms [ 2]: ██████▋
33.554432ms - 67.108864ms [ 2]: ██████▋
67.108864ms - 134.217728ms [ 15]: ██████████████████████████████████████████████████
134.217728ms - 268.435456ms [ 7]: ███████████████████████▎
268.435456ms - 536.870912ms [ 2]: ██████▋
campaign facts: latency 64.628849ms±30.600693ms; 30 transactions (15.464 TPS); 0 rows (0.000 RPS)
8.388608ms - 16.777216ms [ 5]: █████████████████████████
16.777216ms - 33.554432ms [ 5]: █████████████████████████
33.554432ms - 67.108864ms [ 10]: ██████████████████████████████████████████████████
67.108864ms - 134.217728ms [ 9]: █████████████████████████████████████████████
134.217728ms - 268.435456ms [ 0]:
268.435456ms - 536.870912ms [ 1]: █████
Using dbbench
allows us to build reference implementations and then test the performance against these.
Visualisations using Looker
Finally, we'll create some visualisations.
To use Looker, we can use an existing account or request a free trial.
Once logged into the Looker account, from Account > Admin > Database > Connections we'll create a New Connection, as follows:
- Name: reference_solutions
- Dialect: SingleStore 7+
-
Remote Host:
<host>
Port: 3306 - Database: adtech
- Username: admin
-
Password:
<password>
We'll replace the <host>
and <password>
with the values from our SingleStoreDB Cloud account.
We can test if the connection is successful.
Next, from Account > Develop, we'll ensure that Development Mode is toggled to ON, select Manage LookML Projects, and choose New LookML Project.
We'll now enter the following:
- Project Name: adtech
- Starting Point: Generate Model from Database Schema
- Connection: reference_solutions
- Build Views From: All Tables
Next, we'll select Create > Dashboard from the Looker home page. We'll use the name Events and then click Create Dashboard.
We'll click Edit Dashboard and then Add Visualization.
From the left nav, we'll select the Events table, which will appear expanded in a new window. Under DIMENSIONS, we'll select Country; under MEASURES, we'll choose Count. We'll select Google Maps on the right-hand side and then choose Run.
This will create a heatmap as shown in Figure 1.
Hovering over the different countries will provide the number of events for that country.
From the Looker home page, we'll select Create > Dashboard. We'll use the name Campaigns and then click Create Dashboard.
We'll click Edit Dashboard and then Add Visualization.
We'll select the Campaigns table from the left nav, which will appear expanded in a new window. Under DIMENSIONS, we'll select Campaign Name; under MEASURES, we'll choose Count. On the right-hand side, we'll set the Row Limit to 10, select Pie Chart and then choose Run.
This will create a chart as shown in Figure 2.
Additional dashboards could be created to get further insights.
Summary
We have generated some example AdTech data and loaded the small
dataset into SingleStoreDB Cloud. The next step would be to create and load the medium
or large
dataset. We could also store the AdTech data in the cloud and use the pipelines feature to stream the data into SingleStoreDB.
We have built and tested dbbench
. This tool could be handy for creating reference test results.
Finally, we have created two quick dashboards using Looker to gain insights into some of our AdTech data. Additional visualisations could be easily created using this powerful browser-based tool.
Posted on March 16, 2023
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.