Building map-based data visualizations with Mapbox, React, and Cube.js 🗺
Igor Lukanin
Posted on January 29, 2021
TL;DR: I'll explain how to build a visually appealing and fast web app with different kinds of maps. It'll be fun.
Hey devs 👋
As you most likely know, there are many ways to visualize data, but when it comes to location-based (or geospatial) data, map-based data visualizations are the most comprehensible and graphic.
In this guide, we'll explore how to build a map data visualization with JavaScript (and React) using Mapbox, a very popular set of tools for working with maps, navigation, location-based search, and more.
We'll also learn how to make this map data visualization interactive (or dynamic), allowing users to control what data is being visualized on the map.
Here's our plan for today:
- Set up the dataset and launch an API
- Create a frontend app and integrate it with Mapbox
- Learn how to build heatmap visualization
- Learn how to build dynamic points visualization
- Learn how to build points and events visualization
- Learn how to build choropleth visualization
- Take a moment to enjoy the result at the end 😇
And... do you wonder what our result is going to look like? Not that bad, right?
To make this guide even more interesting, we'll use the Stack Overflow open dataset, publicly available in Google BigQuery and on Kaggle. With this dataset, we'll be able to find answers to the following questions:
- Where do Stack Overflow users live?
- Is there any correlation between Stack Overflow users' locations and their ratings?
- What is the total and average Stack Overflow users' rating by country?
- Is there any difference between the locations of people who ask and answer questions?
Also, to host and serve this dataset via an API, we'll use PostgreSQL as a database and Cube.js as an analytical API platform, which lets you bootstrap a backend for an analytical app in minutes.
So, that's our plan — and let's get hacking! 🤘
If you can't wait to discover how it's built, feel free to study the demo and the source code on GitHub. Otherwise, let's proceed.
Dataset and API
The original Stack Overflow dataset contains locations as strings of text. However, Mapbox works best with locations encoded as GeoJSON, an open standard for geographical features based (surprise!) on JSON.
That's why we've used the Mapbox Search API to perform geocoding. As the geocoding procedure has nothing to do with map data visualization, we're just providing the ready-to-use dataset with embedded GeoJSON data.
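For reference, here is a minimal sketch of what a geocoded location could look like as a GeoJSON feature. The coordinates and the `location` property are made up for illustration and are not taken from the dataset. Note that GeoJSON orders coordinates as longitude first, then latitude.

```javascript
// A hypothetical geocoded user location as a GeoJSON Feature.
// GeoJSON positions are [longitude, latitude], in that order.
const feature = {
  type: 'Feature',
  properties: { location: 'Berlin, Germany' }, // illustrative only
  geometry: {
    type: 'Point',
    coordinates: [13.405, 52.52],
  },
};

// A basic sanity check that a geometry is a well-formed Point.
function isValidPoint(geometry) {
  if (!geometry || geometry.type !== 'Point') return false;
  const [lon, lat] = geometry.coordinates || [];
  return (
    Number.isFinite(lon) && Number.isFinite(lat) &&
    Math.abs(lon) <= 180 && Math.abs(lat) <= 90
  );
}

console.log(isValidPoint(feature.geometry)); // true
```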
Setting Up a Database 🐘
We'll be using PostgreSQL, a great open-source database, to store the Stack Overflow dataset. Please make sure to have PostgreSQL installed on your system.
First, download the dataset ⬇️ (the file size is about 600 MB).
Then, create the stackoverflow__example database with the following commands:
$ createdb stackoverflow__example
$ psql --dbname stackoverflow__example -f so-dataset.sql
Setting Up an API 📦
Let's use Cube.js, an open-source analytical API platform, to serve this dataset over an API. Run this command:
$ npx cubejs-cli create stackoverflow__example -d postgres
Cube.js uses environment variables for configuration. To set up the connection to our database, we need to specify the database type and name.
In the newly created stackoverflow__example folder, please replace the contents of the .env file with the following:
CUBEJS_DEVELOPER_MODE=true
CUBEJS_API_SECRET=SECRET
CUBEJS_DB_TYPE=postgres
CUBEJS_DB_NAME=stackoverflow__example
CUBEJS_DB_USER=postgres
CUBEJS_DB_PASS=postgres
Now we're ready to start the API with this simple command:
$ npm run dev
To check if the API works, please navigate to http://localhost:4000 in your browser. You'll see the Cube.js Developer Playground, a powerful tool which greatly simplifies data exploration and query building.
The last thing left to make the API work is to define the data schema: it describes what kind of data we have in our dataset and what should be available in our application.
Let’s go to the data schema page and check all tables from our database. Then, please click on the plus icon and press the “generate schema” button. Voila! 🎉
Now you can spot a number of new *.js files in the schema folder.
So, our API is set up, and we're ready to create map data visualizations with Mapbox!
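Besides the Playground, the API can also be checked over HTTP. Here is a minimal sketch, assuming the default REST endpoint path `/cubejs-api/v1/load` and a `Users.count` measure in the generated schema. Both are assumptions, so check the names in your own schema files. The `buildLoadUrl` helper is hypothetical.

```javascript
// Build the URL of a Cube.js REST API "load" request.
// Assumption: the API runs at http://localhost:4000 with the default base path.
function buildLoadUrl(baseUrl, query) {
  const encoded = encodeURIComponent(JSON.stringify(query));
  return `${baseUrl}/cubejs-api/v1/load?query=${encoded}`;
}

const url = buildLoadUrl('http://localhost:4000', {
  measures: ['Users.count'], // assumed to exist in the generated schema
});

console.log(url);
```

The resulting URL can be opened in a browser or passed to `fetch()`. Outside of development mode, requests would normally also need an `Authorization` header with a signed token.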
Frontend and Mapbox
Okay, now it's time to write some JavaScript and create the front-end part of our map data visualization. As with the data schema, we can easily scaffold it using Cube.js Developer Playground.
Navigate to the templates page and choose one of the predefined templates or click "Create your own". In this guide, we'll be using React, so choose accordingly.
After a few minutes spent installing all the dependencies (oh, these node_modules) you'll have the new dashboard-app folder. Run this app with the following commands:
$ cd dashboard-app
$ npm start
Great! Now we're ready to add Mapbox to our front-end app.
Setting Up Mapbox 🗺
We'll be using the react-map-gl wrapper to work with Mapbox. Actually, you can find some plugins for React, Angular, and other frameworks in the Mapbox documentation.
Let's install react-map-gl with this command:
$ npm install --save react-map-gl
To connect this package to our front-end app, replace the contents of src/App.jsx with the following:
import * as React from 'react';
import { useState } from 'react';
import MapGL from 'react-map-gl';

// Replace this placeholder with your own Mapbox access token
const MAPBOX_TOKEN = 'MAPBOX_TOKEN';

function App() {
  // Initial map position and zoom level
  const [ viewport, setViewport ] = useState({
    latitude: 34,
    longitude: 5,
    zoom: 1.5,
  });

  return (
    <MapGL
      {...viewport}
      onViewportChange={(viewport) => {
        setViewport(viewport)
      }}
      width='100%'
      height='100%'
      mapboxApiAccessToken={MAPBOX_TOKEN}
    />
  );
}

export default App;
You can see that MAPBOX_TOKEN needs to be obtained from Mapbox and put in this file.
Please see the Mapbox documentation or, if you already have a Mapbox account, just generate it at the account page.
At this point we have an empty world map and can start to visualize data. Hurray!
Planning the Map Data Visualization 🔢
Here's how you can build any map data visualization using Mapbox and Cube.js:
- load data to the front-end with Cube.js
- transform data to GeoJSON format
- load data to Mapbox layers
- optionally, customize the map using the properties object to set up data-driven styling and manipulations
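The transformation step is the same for every visualization in this guide, so it can be sketched as one helper. The function name `toFeatureCollection` and its parameters are hypothetical, and the sample rows only mimic the shape of what `resultSet.tablePivot()` returns.

```javascript
// Turn flat result rows into a GeoJSON FeatureCollection.
// geometryKey: the column holding a GeoJSON geometry as a JSON string.
// toProperties: maps a row to the feature's "properties" object.
function toFeatureCollection(rows, geometryKey, toProperties) {
  return {
    type: 'FeatureCollection',
    features: rows
      .filter((row) => row[geometryKey] != null) // skip rows without a location
      .map((row) => ({
        type: 'Feature',
        properties: toProperties(row),
        geometry: JSON.parse(row[geometryKey]),
      })),
  };
}

// Made-up rows shaped like tablePivot() output.
const rows = [
  { 'Users.count': '3', 'Users.geometry': '{"type":"Point","coordinates":[13.4,52.5]}' },
  { 'Users.count': '1', 'Users.geometry': null },
];

const collection = toFeatureCollection(rows, 'Users.geometry', (row) => ({
  value: parseInt(row['Users.count'], 10),
}));

console.log(collection.features.length); // 1
```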
In this guide, we'll follow this path and create four independent map data visualizations:
- a heatmap layer based on users' location data
- a points layer with data-driven styling and dynamically updated data source
- a points layer with click events
- a choropleth layer based on different calculations and data-driven styling
Let's get hacking! 😎
Heatmap Visualization
Okay, let's create our first map data visualization! 1️⃣
A heatmap layer is a suitable way to show data distribution and density. That's why we'll use it to show where Stack Overflow users live.
Data Schema
This component needs quite a simple schema: we only need one dimension, the users' location coordinates, and one measure, the count.
However, some Stack Overflow users have amazing locations like "in the cloud", "Interstellar Transport Station", or "on a server far far away". Surprisingly, we can't translate all these fancy locations to GeoJSON, so we're using the SQL WHERE clause to select only users from the Earth. 🌎
Here's what the schema/Users.js file should look like:
cube(`Users`, {
sql: `SELECT * FROM public.Users WHERE geometry is not null`,
measures: {
count: {
type: `count`
}
},
dimensions: {
geometry: {
sql: 'geometry',
type: 'string'
}
}
});
Web Component
Also, we'll need the dashboard-app/src/components/Heatmap.js component with the following source code. Let's break down its contents!
First, we're loading data to the front-end with a convenient Cube.js hook:
const { resultSet } = useCubeQuery({
measures: ['Users.count'],
dimensions: ['Users.geometry'],
});
To make map rendering faster, with this query we're grouping users by their locations.
Then, we transform query results to GeoJSON format:
const data = {
  type: 'FeatureCollection',
  features: [],
};
if (resultSet) {
  // Each row holds a location (as a GeoJSON string) and the number of users there
  resultSet.tablePivot().forEach((item) => {
    data.features.push({
      type: 'Feature',
      properties: {
        value: parseInt(item['Users.count'], 10),
      },
      geometry: JSON.parse(item['Users.geometry']),
    });
  });
}
After that, we feed this data to Mapbox. With react-map-gl, we can do it this way:
return (
<MapGL
width='100%'
height='100%'
mapboxApiAccessToken={MAPBOX_TOKEN}>
<Source type='geojson' data={data}>
<Layer {...{
type: 'heatmap',
paint: {
'heatmap-intensity': intensity,
'heatmap-radius': radius,
'heatmap-weight': [ 'interpolate', [ 'linear' ], [ 'get', 'value' ], 0, 0, 6, 2 ],
'heatmap-opacity': 1,
},
}} />
</Source>
</MapGL>
);
}
Note that here we use Mapbox data-driven styling: we defined the heatmap-weight property as an expression that depends on properties.value:
'heatmap-weight': [ 'interpolate', ['linear'], ['get', 'value'], 0, 0, 6, 2]
You can find more information about expressions in Mapbox docs.
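To get a feel for what this expression computes, here is a plain JavaScript sketch of linear interpolation between stops. It only mimics the arithmetic for illustration (Mapbox evaluates expressions internally), and `interpolateLinear` is a hypothetical helper name. Inputs outside the stops are clamped to the nearest stop's output.

```javascript
// stops: flat list [input1, output1, input2, output2, ...], ascending by input,
// in the same order as the arguments of an 'interpolate' expression.
function interpolateLinear(stops, x) {
  const n = stops.length;
  if (x <= stops[0]) return stops[1];         // below the first stop
  if (x >= stops[n - 2]) return stops[n - 1]; // above the last stop
  for (let i = 0; i < n - 2; i += 2) {
    const x0 = stops[i];
    const y0 = stops[i + 1];
    const x1 = stops[i + 2];
    const y1 = stops[i + 3];
    if (x >= x0 && x <= x1) {
      return y0 + ((x - x0) / (x1 - x0)) * (y1 - y0);
    }
  }
  return stops[n - 1];
}

const weightStops = [0, 0, 6, 2]; // the stops from 'heatmap-weight' above

console.log(interpolateLinear(weightStops, 3));  // 1
console.log(interpolateLinear(weightStops, 12)); // 2 (clamped)
```

So a location with three users gets a weight of 1, and any location with six or more users gets the maximum weight of 2.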
Here's the heatmap we've built:
Useful links
- Heatmap layer example at Mapbox documentation
- Heatmap layers params descriptions
- Some theory about heatmap layers settings, palettes
Dynamic Points Visualization
The next question was: is there any correlation between Stack Overflow users' locations and their ratings? 2️⃣
Spoiler alert: no, there isn't 😜. But it's a good question to understand how dynamic data loading works and to dive deep into Cube.js filters.
Data Schema
We need to tweak the schema/Users.js data schema to look like this:
cube('Users', {
sql: 'SELECT * FROM public.Users WHERE geometry is not null',
measures: {
max: {
sql: 'reputation',
type: 'max',
},
min: {
sql: 'reputation',
type: 'min',
}
},
dimensions: {
value: {
sql: 'reputation',
type: 'number'
},
geometry: {
sql: 'geometry',
type: 'string'
}
}
});
Web Component
Also, we'll need the dashboard-app/src/components/Points.js component with the following source code. Let's break down its contents!
First, we need to query the API to find out the initial range of users' reputations:
const { resultSet: range } = useCubeQuery({
measures: ['Users.max', 'Users.min']
});
useEffect(() => {
if (range) {
setInitMax(range.tablePivot()[0]['Users.max']);
setInitMin(range.tablePivot()[0]['Users.min']);
setMax(range.tablePivot()[0]['Users.max']);
setMin(range.tablePivot()[0]['Users.max'] * 0.4);
}
}, [range]);
Then, we create a Slider component from Ant Design, a great open source UI toolkit. On every change to this Slider's value, the front-end will make a request to the database:
const { resultSet: points } = useCubeQuery({
measures: ['Users.max'],
dimensions: ['Users.geometry'],
filters: [
{
member: "Users.value",
operator: "lte",
values: [ max.toString() ]
},
{
member: "Users.value",
operator: "gte",
values: [ min.toString() ]
}
]
});
To make map rendering faster, with this query we're grouping users by their locations and showing only the user with the maximum rating.
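Since the query depends on the slider state, it can help to build it with a small function. This is a sketch where `buildPointsQuery` is a hypothetical helper and the member names come from the schema above. Filter values are passed as strings, as in the original query.

```javascript
// Build the points query for a given reputation range.
function buildPointsQuery(min, max) {
  return {
    measures: ['Users.max'],
    dimensions: ['Users.geometry'],
    filters: [
      { member: 'Users.value', operator: 'lte', values: [String(max)] },
      { member: 'Users.value', operator: 'gte', values: [String(min)] },
    ],
  };
}

const query = buildPointsQuery(4000, 10000);
console.log(query.filters.map((f) => `${f.operator} ${f.values[0]}`));
// [ 'lte 10000', 'gte 4000' ]
```

Passing `buildPointsQuery(min, max)` to `useCubeQuery` should then re-run the query whenever the slider changes either bound.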
Then, like in the previous example, we transform query results to GeoJSON format:
const data = {
type: 'FeatureCollection',
features: [],
};
if (points) {
  // Each row holds a location and the highest reputation found there
  points.tablePivot().forEach((item) => {
    data.features.push({
      type: 'Feature',
      properties: {
        value: parseInt(item['Users.max'], 10),
      },
      geometry: JSON.parse(item['Users.geometry']),
    });
  });
}
Please note that we've also applied data-driven styling in the layer properties, and now each point's radius depends on the rating value.
'circle-radius': {
property: 'value',
stops: [
[{ zoom: 0, value: 10000 }, 2],
[{ zoom: 0, value: 2000000 }, 20]
]
}
When the data volume is moderate, it's also possible to use only Mapbox filters and still achieve the desired performance. We can load data with Cube.js once and then filter rendered data with these layer settings:
filter: [
"all",
[">", max, ["get", "value"]],
["<", min, ["get", "value"]]
],
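The same range check can be written as a function that returns the filter expression, next to a plain JavaScript predicate that mirrors it. Both helper names are hypothetical, and the predicate is only there to show what the expression means.

```javascript
// Mapbox filter expression: keep features with min < value < max.
function rangeFilter(min, max) {
  return [
    'all',
    ['>', max, ['get', 'value']],
    ['<', min, ['get', 'value']],
  ];
}

// Plain JavaScript equivalent of the expression above.
function inRange(feature, min, max) {
  const value = feature.properties.value;
  return max > value && min < value;
}

// Made-up features for illustration.
const features = [
  { properties: { value: 500 } },
  { properties: { value: 5000 } },
  { properties: { value: 50000 } },
];

console.log(features.filter((f) => inRange(f, 1000, 10000)).length); // 1
```

Note that these comparisons are strict, unlike the lte and gte operators in the Cube.js query, so a value exactly equal to a bound is filtered out here.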
Here's the visualization we've built:
Points and Events Visualization
Here we wanted to show the distribution of answers and questions by country, so we rendered the most viewed Stack Overflow questions and the top-rated answers. 3️⃣
When a point is clicked, we render a popup with information about a question.
Data Schema
Due to the dataset structure, we don't have the user geometry info in the Questions table.
That's why we need to use joins in our data schema. It's a one-to-many relationship, which means that one user can ask many questions.
We need to add the following code to the schema/Questions.js file:
joins: {
Users: {
sql: `${CUBE}.owner_user_id = ${Users}.id`,
relationship: `belongsTo`
},
},
Web Component
Then, we need the dashboard-app/src/components/ClickEvents.js component with the following source code. Here are the most important highlights!
The query to get questions data:
{
measures: [ 'Questions.count' ],
dimensions: [ 'Users.geometry']
}
Then we use some pretty straightforward code to transform the data into GeoJSON:
const data = {
type: 'FeatureCollection',
features: [],
};
resultSet.tablePivot().forEach((item) => {
  data.features.push({
    type: 'Feature',
    properties: {
      count: item['Questions.count'],
      // Keep the raw geometry string so that a click handler can read it later
      geometry: item['Users.geometry'],
    },
    geometry: JSON.parse(item['Users.geometry'])
  });
});
The next step is to catch the click event and load the point data. The following code is specific to the react-map-gl wrapper, but the logic is just to listen to map clicks and filter by layer id:
const [selectedPoint, setSelectedPoint] = useState(null);
const { resultSet: popupSet } = useCubeQuery({
dimensions: [
'Users.geometry',
'Questions.title',
'Questions.views',
'Questions.tags'
],
filters: [ {
member: "Users.geometry",
operator: "contains",
values: [ selectedPoint ]
} ],
}, { skip: selectedPoint == null });
const onClickMap = (event) => {
setSelectedPoint(null);
if (typeof event.features != 'undefined') {
const feature = event.features.find(
(f) => f.layer.id == 'questions-point'
);
if (feature) {
setSelectedPoint(feature.properties.geometry);
}
}
}
When we catch a click event on some point, we request questions data filtered by point location and update the popup.
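The lookup inside the click handler can be isolated into a pure function, which makes it easy to try without a map. This is a sketch: `findClickedGeometry` is a hypothetical helper, and the sample events are made up to resemble the ones handled above.

```javascript
// Return the "geometry" property of the clicked feature on the given layer,
// or null when the click did not hit that layer.
function findClickedGeometry(event, layerId) {
  const features = event.features || [];
  const feature = features.find((f) => f.layer && f.layer.id === layerId);
  return feature ? feature.properties.geometry : null;
}

// Made-up events for illustration.
const hit = {
  features: [
    { layer: { id: 'background' }, properties: {} },
    {
      layer: { id: 'questions-point' },
      properties: { geometry: '{"type":"Point","coordinates":[2.35,48.85]}' },
    },
  ],
};
const miss = { features: undefined };

console.log(findClickedGeometry(hit, 'questions-point') !== null); // true
console.log(findClickedGeometry(miss, 'questions-point'));         // null
```

With such a helper, the handler body reduces to `setSelectedPoint(findClickedGeometry(event, 'questions-point'))`.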
So, here's our glorious result:
Choropleth Visualization
Finally, choropleth. This type of map chart is suitable for regional statistics, so we're going to use it to visualize total and average users’ rankings by country. 4️⃣
Data Schema
To accomplish this, we'll need to complicate our schema a bit with a few transitive joins.
First, let's update the schema/Users.js file:
cube('Users', {
sql: 'SELECT * FROM public.Users',
joins: {
Mapbox: {
sql: `${CUBE}.country = ${Mapbox}.geounit`,
relationship: 'belongsTo',
},
},
measures: {
total: {
sql: 'reputation',
type: 'sum',
}
},
dimensions: {
value: {
sql: 'reputation',
type: 'number'
},
country: {
sql: 'country',
type: 'string'
}
}
});
The next file is schema/Mapbox.js, which contains country codes and names:
cube(`Mapbox`, {
sql: `SELECT * FROM public.Mapbox`,
joins: {
MapboxCoords: {
sql: `${CUBE}.iso_a3 = ${MapboxCoords}.iso_a3`,
relationship: `belongsTo`,
},
},
dimensions: {
name: {
sql: 'name_long',
type: 'string',
},
geometry: {
sql: 'geometry',
type: 'string',
},
},
});
Then comes schema/MapboxCoords.js, which, obviously, holds polygon coordinates for map rendering:
cube(`MapboxCoords`, {
sql: `SELECT * FROM public.MapboxCoords`,
dimensions: {
coordinates: {
sql: `coordinates`,
type: 'string',
primaryKey: true,
shown: true,
},
},
});
Please note that we have a join in schema/Mapbox.js:
MapboxCoords: {
sql: `${CUBE}.iso_a3 = ${MapboxCoords}.iso_a3`,
relationship: `belongsTo`,
},
And another one in schema/Users.js:
Mapbox: {
sql: `${CUBE}.country = ${Mapbox}.geounit`,
relationship: `belongsTo`,
}
With the Stack Overflow dataset, our most suitable column in the Mapbox table is geounit, but in other cases, postal codes or iso_a3/iso_a2 could work better.
That's all in regard to the data schema. You don't need to join the Users cube with the MapboxCoords cube directly. Cube.js will make all the joins for you.
Web Component
The source code is contained in the dashboard-app/src/components/Choropleth.js component. Let's break it down for the last time:
The query is quite simple: we have a measure that calculates the sum of users’ rankings.
const { resultSet } = useCubeQuery({
measures: [ `Users.total` ],
dimensions: [ 'Users.country', 'MapboxCoords.coordinates' ]
});
Then we need to transform the result to GeoJSON:
if (resultSet) {
  resultSet
    .tablePivot()
    .filter((item) => item['MapboxCoords.coordinates'] != null)
    .forEach((item) => {
      data.features.push({
        type: 'Feature',
        properties: {
          name: item['Users.country'],
          value: parseInt(item['Users.total'], 10),
        },
        geometry: {
          type: 'Polygon',
          // "x,y;x,y;..." becomes [[x, y], ...] with numeric values
          coordinates: [
            item['MapboxCoords.coordinates']
              .split(';')
              .map((pair) => pair.split(',').map(Number)),
          ],
        },
      });
    });
}
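The coordinate parsing is the least obvious part, so here it is in isolation. This sketch assumes, based on the split calls above, that the coordinates column stores one polygon ring as comma-separated pairs joined by semicolons. The names `parseRing` and `toPolygon` are hypothetical and the sample string is made up. GeoJSON expects numeric positions, so each part is converted explicitly.

```javascript
// Parse "x,y;x,y;..." into an array of numeric [x, y] positions.
function parseRing(text) {
  return text
    .split(';')
    .map((pair) => pair.split(',').map(Number));
}

// Wrap a single ring into a GeoJSON Polygon geometry.
function toPolygon(text) {
  return { type: 'Polygon', coordinates: [parseRing(text)] };
}

const polygon = toPolygon('0,0;10,0;10,10;0,0');
console.log(JSON.stringify(polygon.coordinates));
// [[[0,0],[10,0],[10,10],[0,0]]]
```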
After that we define a few data-driven styles to render the choropleth layer with a chosen color palette:
'fill-color': {
property: 'value',
stops: [
[1000000, `rgba(255,100,146,0.1)`],
[10000000, `rgba(255,100,146,0.4)`],
[50000000, `rgba(255,100,146,0.8)`],
[100000000, `rgba(255,100,146,1)`]
],
}
And that's basically it!
Here's what we're going to behold once we're done:
Looks beautiful, right?
The glorious end
So, this is where our attempt to build a map data visualization comes to an end.
We hope that you liked this guide. If you have any feedback or questions, feel free to join the Cube.js community on Slack. We'll be happy to assist you.
Also, if you liked the way the data was queried via the Cube.js API, visit the Cube.js website and give it a shot. Cheers! 🎉