Building map-based data visualizations with Mapbox, React, and Cube.js 🗺

igorlukanin

Igor Lukanin

Posted on January 29, 2021

TL;DR: I'll explain how to build a visually appealing and fast web app with different kinds of maps. It'll be fun.


Hey devs 👋

As you most likely know, there are many ways to visualize data, but when it comes to location-based (or geospatial) data, map-based data visualizations are the most comprehensible and graphic.

In this guide, we'll explore how to build a map data visualization with JavaScript (and React) using Mapbox, a very popular set of tools for working with maps, navigation, location-based search, and more.

We'll also learn how to make this map data visualization interactive (or dynamic), allowing users to control what data is being visualized on the map.

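Here's our plan for today:

  • set up the dataset and the API
  • scaffold the front-end and add Mapbox
  • build a heatmap visualization
  • build a dynamic points visualization
  • build a points visualization with click events
  • build a choropleth visualization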

And... do you wonder what our result is going to look like? Not that bad, right?

[Image: preview of the resulting map data visualization]

To make this guide even more interesting, we'll use the Stack Overflow open dataset, publicly available in Google BigQuery and on Kaggle. With this dataset, we'll be able to find answers to the following questions:

  • Where do Stack Overflow users live?
  • Is there any correlation between Stack Overflow users' locations and their ratings?
  • What are the total and average Stack Overflow users' ratings by country?
  • Is there any difference between the locations of people who ask and answer questions?

Also, to host and serve this dataset via an API, we'll use PostgreSQL as a database and Cube.js as an analytical API platform that lets you bootstrap a backend for an analytical app in minutes.

So, that's our plan — and let's get hacking! 🤘

If you can't wait to discover how it's built, feel free to study the demo and the source code on GitHub. Otherwise, let's proceed.

Dataset and API

The original Stack Overflow dataset contains locations as strings of text. However, Mapbox works best with locations encoded as GeoJSON, an open standard for geographical features based (surprise!) on JSON.

That's why we've used the Mapbox Search API to perform geocoding. Since geocoding has nothing to do with map data visualization, we're providing a ready-to-use dataset with embedded GeoJSON data.
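
To make the format concrete, here's a minimal sketch of what a geocoded location could look like once it's wrapped into a GeoJSON feature. The sample coordinates and the toFeature helper are made up for illustration; the only assumption taken from the dataset is that the geometry is stored as a JSON string. Note that GeoJSON puts longitude before latitude.

```javascript
// A GeoJSON Point geometry, serialized as a string (hypothetical sample value).
// GeoJSON coordinate order is [longitude, latitude].
const storedGeometry = '{"type":"Point","coordinates":[13.405,52.52]}';

// Wrap a serialized geometry into a GeoJSON Feature with arbitrary properties.
function toFeature(geometryJson, properties = {}) {
  return {
    type: 'Feature',
    properties,
    geometry: JSON.parse(geometryJson),
  };
}

const feature = toFeature(storedGeometry, { value: 1 });
console.log(feature.geometry.coordinates); // [ 13.405, 52.52 ]
```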

Setting Up a Database 🐘

We'll be using PostgreSQL, a great open-source database, to store the Stack Overflow dataset. Please make sure to have PostgreSQL installed on your system.

First, download the dataset ⬇️ (the file size is about 600 MB).

Then, create the stackoverflow__example database with the following commands:

$ createdb stackoverflow__example
$ psql --dbname stackoverflow__example -f so-dataset.sql

Setting Up an API 📦

Let's use Cube.js, an open-source analytical API platform, to serve this dataset over an API. Run this command:

$ npx cubejs-cli create stackoverflow__example -d postgres

Cube.js uses environment variables for configuration. To set up the connection to our database, we need to specify the database type, name, and credentials.

In the newly created stackoverflow__example folder, please replace the contents of the .env file with the following:

CUBEJS_DEVELOPER_MODE=true
CUBEJS_API_SECRET=SECRET
CUBEJS_DB_TYPE=postgres
CUBEJS_DB_NAME=stackoverflow__example
CUBEJS_DB_USER=postgres
CUBEJS_DB_PASS=postgres

Now we're ready to start the API with this simple command:

$ npm run dev

To check if the API works, please navigate to http://localhost:4000 in your browser. You'll see the Cube.js Developer Playground, a powerful tool which greatly simplifies data exploration and query building.

The last thing left to make the API work is to define the data schema: it describes what kind of data we have in our dataset and what should be available in our application.

Let’s go to the data schema page and check all tables from our database. Then, please click on the plus icon and press the “generate schema” button. Voila! 🎉

Now you can spot a number of new *.js files in the schema folder.

So, our API is set up, and we're ready to create map data visualizations with Mapbox!
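
If you'd like to poke at the API outside the Playground, here's a rough sketch of how a query could be sent over the Cube.js REST API. It assumes the default /cubejs-api/v1/load path on localhost:4000 and that the generated schema exposes a Users.count measure. The buildLoadUrl helper is hypothetical, not a part of Cube.js.

```javascript
// Build a URL for the Cube.js REST API "load" endpoint (hypothetical helper).
// The query is passed as a JSON-encoded `query` parameter.
function buildLoadUrl(baseUrl, query) {
  const url = new URL('/cubejs-api/v1/load', baseUrl);
  url.searchParams.set('query', JSON.stringify(query));
  return url.toString();
}

const loadUrl = buildLoadUrl('http://localhost:4000', {
  measures: ['Users.count'],
});

// With the API running, the URL can be requested with fetch, e.g.:
// fetch(loadUrl, { headers: { Authorization: '<API token>' } })
//   .then((response) => response.json())
//   .then((json) => console.log(json.data));
```

In the rest of this guide we won't call the REST API by hand: the React hook from Cube.js does it for us.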

Frontend and Mapbox

Okay, now it's time to write some JavaScript and create the front-end part of our map data visualization. As with the data schema, we can easily scaffold it using Cube.js Developer Playground.

Navigate to the templates page and choose one of the predefined templates or click "Create your own". In this guide, we'll be using React, so choose accordingly.

After a few minutes spent installing all dependencies (oh, these node_modules), you'll have the new dashboard-app folder. Run this app with the following commands:

$ cd dashboard-app
$ npm start 

Great! Now we're ready to add Mapbox to our front-end app.

Setting Up Mapbox 🗺

We'll be using the react-map-gl wrapper to work with Mapbox. You can also find plugins for React, Angular, and other frameworks in the Mapbox documentation.

Let's install react-map-gl with this command:

$ npm install --save react-map-gl

To connect this package to our front-end app, replace the contents of src/App.jsx with the following:

import * as React from 'react';
import { useState } from 'react';
import MapGL from 'react-map-gl';

const MAPBOX_TOKEN = 'MAPBOX_TOKEN';

function App() {
  const [ viewport, setViewport ] = useState({
    latitude: 34,
    longitude: 5,
    zoom: 1.5,
  });

  return (
    <MapGL
      {...viewport}
      onViewportChange={(viewport) => {
        setViewport(viewport)
      }}
      width='100%'
      height='100%'
      mapboxApiAccessToken={MAPBOX_TOKEN}
    />
  );
}

export default App;

You can see that MAPBOX_TOKEN needs to be obtained from Mapbox and put in this file.

Please see the Mapbox documentation or, if you already have a Mapbox account, just generate it at the account page.
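
Rather than hard-coding the token, you may prefer to read it from an environment variable. The sketch below assumes the dashboard app is built with Create React App, which exposes variables prefixed with REACT_APP_ to the front-end code. The variable name REACT_APP_MAPBOX_TOKEN and the getMapboxToken helper are made up for illustration.

```javascript
// Read the Mapbox token from an environment object (hypothetical helper).
// In a Create React App project, `process.env.REACT_APP_*` values are
// inlined at build time.
function getMapboxToken(env) {
  const token = env.REACT_APP_MAPBOX_TOKEN;
  if (!token) {
    throw new Error('REACT_APP_MAPBOX_TOKEN is not set');
  }
  return token;
}

// Usage in src/App.jsx could look like:
// const MAPBOX_TOKEN = getMapboxToken(process.env);
```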

At this point we have an empty world map and can start to visualize data. Hurray!

Planning the Map Data Visualization 🔢

Here's how you can build any map data visualization using Mapbox and Cube.js:

  • load data to the front-end with Cube.js
  • transform data to GeoJSON format
  • load data to Mapbox layers
  • optionally, customize the map using the properties object to set up data-driven styling and manipulations
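
The first three steps can be sketched as a single helper that turns query results into something a Mapbox GeoJSON source accepts. This is only an illustration under a few assumptions: rows are shaped like the output of resultSet.tablePivot(), the geometry arrives as a JSON string, and the helper name toFeatureCollection and the sample rows are hypothetical.

```javascript
// Convert rows shaped like the output of resultSet.tablePivot() into a
// GeoJSON FeatureCollection. Rows whose geometry cannot be parsed are skipped.
function toFeatureCollection(rows, geometryKey, valueKey) {
  const features = [];
  for (const row of rows) {
    let geometry;
    try {
      geometry = JSON.parse(row[geometryKey]);
    } catch (error) {
      continue; // not valid JSON, skip this row
    }
    if (!geometry) continue; // null geometry, skip this row
    features.push({
      type: 'Feature',
      properties: { value: Number(row[valueKey]) },
      geometry,
    });
  }
  return { type: 'FeatureCollection', features };
}

// Hypothetical sample rows, mimicking a Cube.js result set.
const sampleRows = [
  { 'Users.geometry': '{"type":"Point","coordinates":[2.35,48.86]}', 'Users.count': '12' },
  { 'Users.geometry': 'not json', 'Users.count': '3' },
];
const collection = toFeatureCollection(sampleRows, 'Users.geometry', 'Users.count');
console.log(collection.features.length); // 1
```

The components in this guide inline this transformation instead of sharing a helper, but the idea is the same in each of them.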

In this guide, we'll follow this path and create four independent map data visualizations:

  • a heatmap layer based on users' location data
  • a points layer with data-driven styling and dynamically updated data source
  • a points layer with click events
  • a choropleth layer based on different calculations and data-driven styling

Let's get hacking! 😎

Heatmap Visualization

Okay, let's create our first map data visualization! 1️⃣

A heatmap layer is a suitable way to show data distribution and density. That's why we'll use it to show where Stack Overflow users live.

Data Schema

This component needs quite a simple schema: the only dimension we need is the users' location coordinates, and the only measure is the count.

However, some Stack Overflow users have amazing locations like "in the cloud", "Interstellar Transport Station", or "on a server far far away". Surprisingly, we can't translate all these fancy locations to GeoJSON, so we're using the SQL WHERE clause to select only users from the Earth. 🌎

Here's what the schema/Users.js file should look like:

cube(`Users`, {
  sql: `SELECT * FROM public.Users WHERE geometry is not null`,

  measures: {
    count: {
      type: `count`
    }
  },

  dimensions: {
    geometry: {
      sql: 'geometry',
      type: 'string'
    }
  }
});

Web Component

Also, we'll need the dashboard-app/src/components/Heatmap.js component with the following source code. Let's break down its contents!

First, we're loading data to the front-end with a convenient Cube.js hook:

const { resultSet } = useCubeQuery({ 
  measures: ['Users.count'],
  dimensions: ['Users.geometry'],
});

To make map rendering faster, with this query we're grouping users by their locations.

Then, we transform query results to GeoJSON format:

let data = {
  type: 'FeatureCollection',
  features: [],
};

if (resultSet) {
  resultSet.tablePivot().forEach((item) => {
    data['features'].push({
      type: 'Feature',
      properties: {
        value: parseInt(item['Users.count']),
      },
      geometry: JSON.parse(item['Users.geometry']),
    });
  });
}

After that, we feed this data to Mapbox. With react-map-gl, we can do it this way:

  return (
    <MapGL
      width='100%'
      height='100%'
      mapboxApiAccessToken={MAPBOX_TOKEN}>
      <Source type='geojson' data={data}>
        <Layer {...{
          type: 'heatmap',
          paint: {
            'heatmap-intensity': intensity,
            'heatmap-radius': radius,
            'heatmap-weight': [ 'interpolate', [ 'linear' ], [ 'get', 'value' ], 0, 0, 6, 2 ],
            'heatmap-opacity': 1,
          },
        }} />
      </Source>
    </MapGL>
  );
}

Note that here we use Mapbox data-driven styling: we defined the heatmap-weight property as an expression that depends on each feature's properties.value:

'heatmap-weight': [ 'interpolate', ['linear'], ['get', 'value'], 0, 0, 6, 2]

You can find more information about expressions in the Mapbox docs.
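
To get a feel for what this expression computes, here's a small sketch that evaluates a linear interpolation by hand in plain JavaScript. It's not Mapbox code: the interpolateLinear function is a hypothetical stand-in that mimics how a linear interpolate expression maps an input to an output, including clamping to the first and last stops.

```javascript
// Evaluate a linear 'interpolate' expression by hand: stops are
// [input, output] pairs in ascending input order. Inputs outside the
// stop range are clamped to the first or last output.
function interpolateLinear(input, stops) {
  const [firstIn, firstOut] = stops[0];
  const [lastIn, lastOut] = stops[stops.length - 1];
  if (input <= firstIn) return firstOut;
  if (input >= lastIn) return lastOut;
  for (let i = 1; i < stops.length; i++) {
    const [x0, y0] = stops[i - 1];
    const [x1, y1] = stops[i];
    if (input <= x1) {
      return y0 + ((input - x0) / (x1 - x0)) * (y1 - y0);
    }
  }
  return lastOut;
}

// The stops from the heatmap-weight expression: 0 -> 0, 6 -> 2.
const weightStops = [[0, 0], [6, 2]];
console.log(interpolateLinear(3, weightStops));  // 1
console.log(interpolateLinear(50, weightStops)); // 2
```

So a location with 3 users gets a weight of 1, and any location with 6 or more users gets the maximum weight of 2.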

Here's the heatmap we've built:

[Image: heatmap of Stack Overflow users' locations]

Dynamic Points Visualization

The next question was: is there any correlation between Stack Overflow users' locations and their ratings? 2️⃣

Spoiler alert: no, there isn't 😜. But it's a good question to understand how dynamic data loading works and to dive deep into Cube.js filters.

Data Schema

We need to tweak the schema/Users.js data schema to look like this:

cube('Users', {
  sql: 'SELECT * FROM public.Users WHERE geometry is not null',

  measures: {
    max: {
      sql: 'reputation',
      type: 'max',
    },

    min: {
      sql: 'reputation',
      type: 'min',
    }
  },

  dimensions: {
    value: {
      sql: 'reputation',
      type: 'number'
    },

    geometry: {
      sql: 'geometry',
      type: 'string'
    }
  }
});

Web Component

Also, we'll need the dashboard-app/src/components/Points.js component with the following source code. Let's break down its contents!

First, we need to query the API to find the initial range of users' reputations:

const { resultSet: range } = useCubeQuery({
    measures: ['Users.max', 'Users.min']
});

useEffect(() => {
  if (range) {
    setInitMax(range.tablePivot()[0]['Users.max']);
    setInitMin(range.tablePivot()[0]['Users.min']);
    setMax(range.tablePivot()[0]['Users.max']);
    setMin(range.tablePivot()[0]['Users.max'] * 0.4);
  }
}, [range]);

Then, we create a Slider component from Ant Design, a great open-source UI toolkit. On every change to this Slider's value, the front-end will make a new request to the API:

const { resultSet: points } = useCubeQuery({
  measures: ['Users.max'],
  dimensions: ['Users.geometry'],
  filters: [
    {
      member: "Users.value",
      operator: "lte",
      values: [ max.toString() ]
    },
    {
      member: "Users.value",
      operator: "gte",
      values: [ min.toString() ]
    }
  ]
});

To make map rendering faster, with this query we're grouping users by their locations and showing only the user with the maximum rating.
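
The two steps above (deriving the initial slider range and turning the slider values into filters) can be sketched as plain functions. This is an illustration rather than the component's actual code: initialRange and reputationFilters are hypothetical helpers, and the sample row is made up.

```javascript
// Derive the initial slider state from the min/max row (hypothetical helper).
// Values are coerced with Number() in case the API returns them as strings.
function initialRange(row) {
  const max = Number(row['Users.max']);
  const min = Number(row['Users.min']);
  // As in the component, the lower handle starts at 40% of the maximum.
  return { initMin: min, initMax: max, min: max * 0.4, max };
}

// Build the Cube.js filters for an inclusive [min, max] reputation range.
// Filter values are passed as strings.
function reputationFilters(min, max) {
  return [
    { member: 'Users.value', operator: 'lte', values: [String(max)] },
    { member: 'Users.value', operator: 'gte', values: [String(min)] },
  ];
}

const state = initialRange({ 'Users.max': '1000', 'Users.min': '1' });
const filters = reputationFilters(state.min, state.max);
console.log(filters[1].values); // [ '400' ]
```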

Then, like in the previous example, we transform query results to GeoJSON format:

const data = {
  type: 'FeatureCollection',
  features: [],
};

if (points) {
  points.tablePivot().forEach((item) => {
    data['features'].push({
      type: 'Feature',
      properties: {
        value: parseInt(item['Users.max']),
      },
      geometry: JSON.parse(item['Users.geometry']),
    });
  });
}

Please note that we've also applied data-driven styling to the layer properties, so the points' radius now depends on the rating value.

'circle-radius': { 
  property: 'value', 
  stops: [ 
    [{ zoom: 0, value: 10000 }, 2], 
    [{ zoom: 0, value: 2000000 }, 20]
  ] 
}

When the data volume is moderate, it's also possible to use only Mapbox filters and still achieve the desired performance. We can load data with Cube.js once and then filter rendered data with these layer settings:

filter: [ 
  "all", 
  [">", max, ["get", "value"]], 
  ["<", min, ["get", "value"]] 
],
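
To double-check the logic of such a filter, it can help to write the same condition as a plain JavaScript predicate. In the sketch below, rangeFilter builds the expression shown above and inRange mirrors it; both helpers and the sample features are hypothetical.

```javascript
// Build a Mapbox filter expression that keeps features whose `value`
// property lies strictly between min and max.
function rangeFilter(min, max) {
  return ['all', ['>', max, ['get', 'value']], ['<', min, ['get', 'value']]];
}

// The equivalent predicate in plain JavaScript, handy for checking the logic.
function inRange(feature, min, max) {
  const value = feature.properties.value;
  return max > value && min < value;
}

// Hypothetical features for illustration.
const features = [100, 400, 900, 1000].map((value) => ({
  type: 'Feature',
  properties: { value },
}));
const visible = features.filter((f) => inRange(f, 400, 1000));
console.log(visible.map((f) => f.properties.value)); // [ 900 ]
```

Note that these comparisons are strict, so features sitting exactly on a boundary are hidden, while the lte and gte operators in the Cube.js query are inclusive. If the two approaches should match, '>=' and '<=' can be used in the expression instead.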

Here's the visualization we've built:

[Image: dynamic points visualization]

Points and Events Visualization

Here we wanted to show the distribution of questions and answers by country, so we rendered the most viewed Stack Overflow questions and the top-rated answers. 3️⃣

When a point is clicked, we render a popup with information about a question.

Data Schema

Due to the dataset structure, we don't have the user geometry info in the Questions table.

That's why we need to use joins in our data schema. It's a one-to-many relationship, which means that one user can ask many questions.

We need to add the following code to the schema/Questions.js file:

joins: {
  Users: { 
    sql: `${CUBE}.owner_user_id = ${Users}.id`, 
    relationship: `belongsTo` 
  },
},

Web Component

Then, we need the dashboard-app/src/components/ClickEvents.js component with the following source code. Here are the most important highlights!

The query to get questions data:

{
  measures: [ 'Questions.count' ],
  dimensions: [ 'Users.geometry']
}

Then we use some pretty straightforward code to transform the data into GeoJSON:

const data = { 
  type: 'FeatureCollection',
  features: [], 
};

resultSet.tablePivot().forEach((item) => {
  data['features'].push({
    type: 'Feature',
    properties: {
      count: item['Questions.count'],
      geometry: item['Users.geometry'],
    },
    geometry: JSON.parse(item['Users.geometry'])
  });
}); 

The next step is to catch the click event and load the point data. The following code is specific to the react-map-gl wrapper, but the logic is just to listen to map clicks and filter by layer id:


const [selectedPoint, setSelectedPoint] = useState(null);

const { resultSet: popupSet } = useCubeQuery({
  dimensions: [
    'Users.geometry',
    'Questions.title',
    'Questions.views',
    'Questions.tags'
  ],
  filters: [ {
    member: "Users.geometry",
    operator: "contains",
    values: [ selectedPoint ]
  } ],
}, { skip: selectedPoint == null });


const onClickMap = (event) => {
  setSelectedPoint(null);
  if (typeof event.features != 'undefined') {
    const feature = event.features.find(
      (f) => f.layer.id == 'questions-point'
    );
    if (feature) {
      setSelectedPoint(feature.properties.geometry);
    }
  }
}

When we catch a click event on some point, we request questions data filtered by point location and update the popup.
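
The part of the handler that picks the clicked point doesn't depend on React at all, so it can be pulled out into a pure function and checked in isolation. The clickedGeometry helper and the mock events below are hypothetical; they only assume the event shape used in the handler above.

```javascript
// Pick the geometry string of the first clicked feature that belongs to the
// given layer. Returns null when nothing relevant was clicked.
function clickedGeometry(event, layerId) {
  const features = event.features || [];
  const feature = features.find((f) => f.layer && f.layer.id === layerId);
  return feature ? feature.properties.geometry : null;
}

// Hypothetical click events, shaped like the ones handled above.
const hit = {
  features: [
    { layer: { id: 'background' }, properties: {} },
    {
      layer: { id: 'questions-point' },
      properties: { geometry: '{"type":"Point","coordinates":[0,0]}' },
    },
  ],
};
const miss = { features: undefined };

console.log(clickedGeometry(hit, 'questions-point') !== null); // true
console.log(clickedGeometry(miss, 'questions-point'));         // null
```

With such a helper, the handler could shrink to a single call like setSelectedPoint(clickedGeometry(event, 'questions-point')).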

So, here's our glorious result:

[Image: points visualization with a popup shown on click]

Choropleth Visualization

Finally, choropleth. This type of map chart is suitable for regional statistics, so we're going to use it to visualize total and average users’ rankings by country. 4️⃣

Data Schema

To accomplish this, we'll need to complicate our schema a bit with a few transitive joins.

First, let's update the schema/Users.js file:

cube('Users', {
  sql: 'SELECT * FROM public.Users',
  joins: {
    Mapbox: {
      sql: `${CUBE}.country = ${Mapbox}.geounit`,
      relationship: 'belongsTo',
    },
  },
  measures: {
    total: {
      sql: 'reputation',
      type: 'sum',
    }
  },

  dimensions: {
    value: {
      sql: 'reputation',
      type: 'number'
    },

    country: {
      sql: 'country',
      type: 'string'
    }
  }
});

The next file is schema/Mapbox.js, which contains country codes and names:

cube(`Mapbox`, {
  sql: `SELECT * FROM public.Mapbox`,

  joins: {
    MapboxCoords: {
      sql: `${CUBE}.iso_a3 = ${MapboxCoords}.iso_a3`,
      relationship: `belongsTo`,
    },
  },

  dimensions: {
    name: {
      sql: 'name_long',
      type: 'string',
    },

    geometry: {
      sql: 'geometry',
      type: 'string',
    },
  },
});

Then comes schema/MapboxCoords.js which, obviously, holds polygon coordinates for map rendering:

cube(`MapboxCoords`, {
  sql: `SELECT * FROM public.MapboxCoords`,

  dimensions: {
    coordinates: {
      sql: `coordinates`,
      type: 'string',
      primaryKey: true,
      shown: true,
    },
  },
});

Please note that we have a join in schema/Mapbox.js:

MapboxCoords: {
  sql: `${CUBE}.iso_a3 = ${MapboxCoords}.iso_a3`, 
  relationship: `belongsTo`,
},

And another one in schema/Users.js:

Mapbox: {
  sql: `${CUBE}.country = ${Mapbox}.geounit`,
  relationship: `belongsTo`,
}

With the Stack Overflow dataset, the most suitable column in the Mapbox table is geounit, but in other cases postal codes or iso_a3/iso_a2 codes could work better.

That's all in regard to the data schema. You don't need to join the Users cube with the MapboxCoords cube directly. Cube.js will make all the joins for you.

Web Component

The source code is contained in the dashboard-app/src/components/Choropleth.js component. Breaking it down for the last time:

The query is quite simple: we have a measure that calculates the sum of users’ rankings.

const { resultSet } = useCubeQuery({
  measures: [ `Users.total` ],
  dimensions: [ 'Users.country', 'MapboxCoords.coordinates' ]
});

Then we need to transform the result to GeoJSON:

if (resultSet) {
  resultSet
    .tablePivot()
    .filter((item) => item['MapboxCoords.coordinates'] != null)
    .forEach((item) => {
      data['features'].push({
        type: 'Feature',
        properties: {
          name: item['Users.country'],
          value: parseInt(item[`Users.total`])
        },
        geometry: {
          type: 'Polygon',
          coordinates: [ item['MapboxCoords.coordinates'].split(';').map((pair) => pair.split(',').map(Number)) ]
        }
      });
    });
}
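
The coordinates parsing deserves a closer look. Here's a minimal sketch of it as a standalone function, assuming (as the split calls above suggest) that points are separated by semicolons and that longitude and latitude are separated by commas. The parseRing helper and the sample input are hypothetical. The sketch also converts coordinates to numbers and closes the ring, since GeoJSON expects the first and last positions of a polygon ring to be identical.

```javascript
// Parse a "lon,lat;lon,lat;..." string into a GeoJSON linear ring.
// Coordinates are converted to numbers, and the ring is closed if the
// first and last positions differ.
function parseRing(coordinates) {
  const ring = coordinates
    .split(';')
    .map((pair) => pair.split(',').map(Number));
  const first = ring[0];
  const last = ring[ring.length - 1];
  if (first[0] !== last[0] || first[1] !== last[1]) {
    ring.push([...first]);
  }
  return ring;
}

// Hypothetical input for illustration.
const ring = parseRing('0,0;10,0;10,10;0,10');
console.log(ring.length); // 5
```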

After that we define a few data-driven styles to render the choropleth layer with a chosen color palette:

'fill-color': { 
  property: 'value',
  stops: [ 
    [1000000, `rgba(255,100,146,0.1)`], 
    [10000000, `rgba(255,100,146,0.4)`], 
    [50000000, `rgba(255,100,146,0.8)`], 
    [100000000, `rgba(255,100,146,1)`]
  ],
}

And that's basically it!

Here's what we're going to behold once we're done:

[Image: choropleth visualization of users' total rating by country]

Looks beautiful, right?

The glorious end

So, here our attempt to build a map data visualization comes to its end.

We hope that you liked this guide. If you have any feedback or questions, feel free to join the Cube.js community on Slack, where we'll be happy to assist you.

Also, if you liked the way the data was queried via the Cube.js API, visit the Cube.js website and give it a shot. Cheers! 🎉
