Making Datasets Fly with Datasette and Fly

superfly

Fly.io

Posted on March 30, 2020

Making Datasets Fly with Datasette and Fly

The creator of Datasette, the tool for Dataset sharing and exploration, has added Fly to the platforms you can use to publish and share data. We take a look at Datasette and show how well it works with Fly.

I've always liked finding a good dataset. With a background in databases and writing, I know a good dataset can bring a demo to life, be it a census of Squirrels in Central Park or a survey of grocery purchases in London. Datasets can also provide valuable foundations for citizen journalism and, under analysis, provide insights.

The key to making these datasets work for people is making them accessible and available, which is where Datasette comes in. It's a project by Simon Willison designed to make sharing datasets easy. It all hinges on SQLite - essentially, you load up an SQLite database with data and then hand it to Datasette which presents it through a web site, to the world.

First, the data!

I'm going to use the New York Central Park Squirrel Census Data for my data source because squirrels rock. Go there and click the View Data button to see the data presented in the very fine NYC OpenData viewer.

I want the raw data though so I'll click on Export and select CSV which will kick off an immediate download. We now have a 2018_Central_Park_Squirrel_Census_-_Squirrel_Data.csv file to work with.

Making an Sqlite database

Ensure you have SQLite3 installed on your system; it'll already be there on a macOS system as it is heavily used throughout the OS. If you run sqlite3 filename.db the data will be persisted to that file rather than just held in memory, so let's begin.

❯ sqlite3 squirrels.db
SQLite version 3.28.0 2019-04-15 14:49:49
Enter ".help" for usage hints.
sqlite>
Enter fullscreen mode Exit fullscreen mode

And the first thing we need to do is set the mode to CSV. If you look at the documentation, you'll see the .mode command listed as setting the output mode. That's not quite completely true, it's also used as a hint to the import command.

sqlite> .mode csv
Enter fullscreen mode Exit fullscreen mode

Now we are ready to import with .import. This command takes the CSV filename and a table name to import into. If the table isn't there, it'll use the first row in the CSV file to create the table columns and then import. (ProTip: If the table already exists it just imports everything, including the header row so always drop the table first).

sqlite> .import 2018_Central_Park_Squirrel_Census_-_Squirrel_Data.csv squirrels
Enter fullscreen mode Exit fullscreen mode

And we can do a quick check on what actually got imported with the .schema command.

sqlite> .schema squirrels
CREATE TABLE squirrels(
  "X" TEXT,
  "Y" TEXT,
  "Unique Squirrel ID" TEXT,
  "Hectare" TEXT,
  "Shift" TEXT,
  "Date" TEXT,
  "Hectare Squirrel Number" TEXT,
 ...
  "Lat/Long" TEXT,
  "Zip Codes" TEXT,
  "Community Districts" TEXT,
  "Borough Boundaries" TEXT,
  "City Council Districts" TEXT,
  "Police Precincts" TEXT
);
Enter fullscreen mode Exit fullscreen mode

There are a lot of columns about squirrels. Now I can exit sqlite3 and get ready to apply Datasette to the database.

Installing Datasette

There are a couple of ways to install and run Datasette. I'm going to go with the one that is simplest for most developers:

  • make sure Python3 is installed (brew install python3 on macOS using HomeBrew)
  • pip3 install datasette - You may get an error on macOS as it stops the file being copied to a system directory. Don't worry, just repeat the command with --user on the end and add the directory it suggests into your path.

And we're ready to test running datasette squirrels.db:

❯ datasette squirrels.db
Serve! files=('squirrels.db',) (immutables=()) on port 8001
INFO:     Started server process [2812]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:8001 (Press CTRL+C to quit)
Enter fullscreen mode Exit fullscreen mode

Navigating to http://127.0.0.1:8001 and clicking on the Squirrel table, we should see the squirrels data available:

And clicking through you can start using Datasette's UI to compose views of particular facets of the data, write your own SQL queries, export data as JSON or CSV or access the data through a REST/JSON API.

Local Datasette Index

The next stop is making it available to the world.

Publishing to Fly

While other platforms are already built into Datasette, the Fly publishing element of Datasette is a new plugin, created by Datasette's developer. That means it has to be installed separately with pip3 install datasette-publish-fly

With that installed, you can run

datasette publish fly squirrels.db --app squirrels
Enter fullscreen mode Exit fullscreen mode

The --app flag lets you set the app name and it will be rejected if it clashes with an existing app. You may, if you are in multiple organizations, be asked to pick one of those too. Once you've done that, the publish command takes over, builds an image and deploys it onto the Fly platform. If you want to know what IP address and hostname the app is on, run flyctl info -a <appname> like so:

❯ flyctl info -a squirrels

App
  Name     = squirrels
  Owner    = dj
  Version  = 0
  Status   = running
  Hostname = squirrels.fly.dev

Services
  PROTOCOL   PORTS
  TCP        80 => 8080 [HTTP]
             443 => 8080 [TLS, HTTP]

IP Addresses
  TYPE   ADDRESS                                CREATED AT
  v4     77.83.142.59                           23m21s ago
  v6     2a09:8280:1:7bc8:bf19:7779:aef7:8f18   23m21s ago
Enter fullscreen mode Exit fullscreen mode

And that also tells us where we need to browse: squirrels.fly.dev. We're online and we can dig down into a table view where we can compose queries.

Squirrel Table

Deploying with Plugins

Datasette Plugins aren't just for publishing; there are a whole range of additional capabilities waiting to be slotted in. Take datasette-cluster-map, for example. It looks for latitude and longitude columns in the data and turns the data into an interactive map using them. Let's see how we use this with Fly.

Tuning the tables

The Squirrel data has X and Y coordinates which match Longitude and Latitude; we'll need to rename those columns first. Now, for a long time, sqlite lacked the ability to rename columns, so you'll find many workarounds on the web if you search. The good news is, though, that since 2018 you have been able to rename columns with the alter table rename column command. So I'll just load up the sqlite3 database and alter those columns:

❯ sqlite3 squirrels.db
SQLite version 3.28.0 2019-04-15 14:49:49
Enter ".help" for usage hints.
sqlite> alter table squirrels rename column X to Longitude;
sqlite> alter table squirrels rename column Y to Latitude;
sqlite> .schema squirrels
CREATE TABLE squirrels(
  "Longitude" TEXT,
  "Latitude" TEXT,
...
sqlite> .exit
❯
Enter fullscreen mode Exit fullscreen mode

Run Locally Again

I now need to install that plugin with pip3:

❯ pip3 install datasette-cluster-map
Enter fullscreen mode Exit fullscreen mode

And run datasette locally:

❯ datasette squirrels.db
Serve! files=('squirrels.db',) (immutables=()) on port 8001
INFO:     Started server process [39385]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:8001 (Press CTRL+C to quit)
Enter fullscreen mode Exit fullscreen mode

And if we browse to that http://localhost:8001/ we'll see the index page as we did before. Now to deploy to Fly.

Deploying and Plugins

The difference here from when we previously published to Fly is that we have to list the plugins we need installed with our Datasette. The --install flag takes care of that, so now I can publish to Fly with:

datasette publish fly squirrels.db --app squirrels-mapped --install datasette-cluster-map
Enter fullscreen mode Exit fullscreen mode

And that will include the cluster-map plugin. If I now browse to "https://squirrels-mapped.fly.dev", and click in on the squirrels table:

Squirrels Mapped

Our view of the Squirrels data now includes a cluster map over Central Park that we can click in on and get a closer view. When sightings resolve to a single squirrel, you can hover over it to get all the details.

Datasette and Fly

So what does Fly add to Datasette? Well, as well as being super simple to deploy, you may have noticed that all the connections we're making are HTTPS secured, with Let's Encrypt certificates being generated automatically. If you want more, it's simple to use your own custom domain or hostname with your Fly/Datasette deployment. You can also deploy all around the world so your dataset is available where people need it to be. And there's also the edge networking/SSL termination which makes interaction that bit snappier. There are a whole lot more to explore in Datasette - check out the documentation - and it's a great way to discover how you can make your applications Fly.

Thanks to Simon Willison, not only for Datasette and the Fly plugin, but also for his feedback on this article. And to the Squirrels of New York's Central Park for taking part in the census. Want to learn more about Fly? Head over to our Fly Docs for lots more, including a Hands On where you can get a free account and deploy your first app today.

💖 💪 🙅 🚩
superfly
Fly.io

Posted on March 30, 2020

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related