Turn Text Into HERE Maps with Python NLTK

j12y

Jayson DeLancey

Posted on January 6, 2019


Originally published at developer.here.com so check out the original too.

What is in your Top 5 travel destinations? I asked my 8 year old son this recently and the response surprised me. The answer was very specific and unfamiliar.

Venice Mdina Aswan Soro Gryfino

OK, I know Venice but I’m a software engineer in Northern California and haven’t studied geography in quite some time. I have to admit I have no idea where some of those places are.

Maybe you have found yourself reading a travel article that beautifully describes a glamorous locale, places to stay, visit, and where to eat. These articles can be swell, but a simple map can go a long way to add the context of place that some expertly crafted prose alone cannot do. If the author or publisher didn’t include a map for you and you don’t have an 8-yr old Geography savant in your house, Python can help.

Solution

To solve this problem we will stand up a Python Flask server that exposes a few APIs to:

  1. Download a given URL and parse the HTML with BeautifulSoup.
  2. Extract locations from the text based on some clues with the Natural Language Toolkit (NLTK).
  3. Geocode the location to determine a latitude and longitude with the HERE Geocoder API.
  4. Place markers on a map to identify the recognized places with the HERE Map Image API.

text - download - extract - geocode - map

Server Setup

For this section I assume you are running an environment like OSX or Linux. If you are running Windows you will need to adjust some of the commands a bit.

Configuration

With the Twelve-Factor App the case is made that a best practice is to store config in the environment. I agree and like to store my API credentials in variables APP_ID_HERE and APP_CODE_HERE found in a file called HERE.sh.

#!/bin/bash
export APP_ID_HERE=your-app-id-here
export APP_CODE_HERE=your-app-code-here

I source it into my environment with . HERE.sh to avoid any hard-coded credentials accidentally being released with my source.
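Because the credentials live only in the environment, it is easy to start the server from a shell where HERE.sh was never sourced. A minimal sketch of a startup check follows; missing_credentials is a hypothetical helper for illustration and is not part of the original project.

```python
import os

# Names of the variables exported by HERE.sh.
REQUIRED_VARS = ("APP_ID_HERE", "APP_CODE_HERE")

def missing_credentials(env=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper; pass a mapping to check something other
    than the real process environment.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]

# Example with an explicit mapping instead of the real environment.
print(missing_credentials({"APP_ID_HERE": "abc"}))
```

Calling it with no arguments before creating the app would let you fail early with a clear message instead of discovering empty credentials on the first geocoder request.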

Structure

The web server component will need several files, summarized in the listing below. Start by running mkdir -p app/api_1_0 so that both directories are created.

├── app
│   ├── __init__.py
│   └── api_1_0
│       ├── __init__.py
│       ├── demo.py
│       ├── health.py
├── HERE.sh
├── manage.py
├── config.py
└── requirements.txt

If you aren't using Virtual Environments for Python you should be. You can find more in the Hitchhiker's Guide to Python to get off on the right footing. Initialize your environment with pip install -r requirements.txt, where requirements.txt contains the following dependencies.

beautifulsoup4
Flask
Flask-Script
gunicorn
nltk
requests

App

We need manage.py as the main entrypoint to our application. It looks like the following listing:

import os
import app
from flask_script import Manager, Server

app = app.create_app('default')
manager = Manager(app)

if __name__ == '__main__':
  # os.environ is a mapping, not a callable; values are strings, so convert to int
  port = int(os.environ.get('PORT', 8000))
  manager.add_command('runserver', Server(port=port))
  manager.run()

I've left out a few niceties like logging and printing the URL for brevity. This isn't particularly interesting to our task and is just some housekeeping to run a simple server for our APIs.

The config.py is also important for pulling in some of those environment variables we'll need to reference later.

import os

class Config(object):
  SECRET_KEY = os.environ.get('FLASK_SECRET_KEY')
  APP_ID_HERE = os.environ.get('APP_ID_HERE')
  APP_CODE_HERE = os.environ.get('APP_CODE_HERE')

  @staticmethod
  def init_app(app):
    pass

config = {'default': Config}

Unlike other Python projects, our __init__.py files are pretty important on this one. In app/__init__.py we define the create_app function we saw in manage.py.

from config import config
from flask import Flask

def create_app(config_name):
  app = Flask(__name__)
  app.config.from_object(config[config_name])
  config[config_name].init_app(app)

  from .api_1_0 import api as api_1_0_blueprint
  app.register_blueprint(api_1_0_blueprint, url_prefix='/api/1.0')

  return app

This gives us nice clean API versioning for any resources in our API. We also need to define app/api_1_0/__init__.py with some configuration:

from flask import Blueprint
api = Blueprint('api', __name__)

from . import health
from . import demo

As you can see, each module we create has to be imported after the blueprint is defined so that its routes are registered on the blueprint.

Healthcheck

To make sure our server is running properly we can add a quick healthcheck endpoint in the file app/api_1_0/health.py.

from flask import jsonify
from flask import current_app as app
from . import api

@api.route('/health', methods=['GET'])
def handle_health():
  return jsonify({
    'hello': 'world',
    'app_id_here': app.config['APP_ID_HERE'],
    'app_code_here': app.config['APP_CODE_HERE']
    })

At this point we should be able to run python manage.py runserver and have proof of life. If you use your browser to go to http://localhost:8000/api/1.0/health you should get a response that confirms our server is up and has our app_id and app_code properly configured.

You may not want to display this once you hit production, but it is fine while we're at a "hello world" stage.
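If you would rather keep the health check but not echo full credentials, one option is to mask them. This is a minimal sketch; mask_secret is a hypothetical helper, not something from the original project.

```python
def mask_secret(value, visible=4):
    """Hide all but the last `visible` characters of a credential.

    Hypothetical helper. None is returned unchanged so a missing
    setting is still obvious in the health response.
    """
    if value is None:
        return None
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]

print(mask_secret("your-app-code-here"))
```

Wrapping the two config values in the health handler with a helper like this still confirms that the variables were loaded without exposing them in full.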

Text

For the purposes of getting started I will use a simple text file with just our locations from before.

Mdina
Aswan
Soro
Gryfino
Venice

For more complex data sets to test with I recommend just trying out something from the New York Times, Wall Street Journal, or BBC travel sections.

Extract

We need to extract text from HTML and tokenize any words found that might be a location. We will define a method to handle requests for the resource /tokens so that we can look at each step independently.

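import requests
from bs4 import BeautifulSoup
from flask import jsonify, request
from flask import current_app as app
from nltk import pos_tag
from nltk.corpus import stopwords
from nltk.tokenize import RegexpTokenizer
from . import api

# Assumption: `session` is a shared requests.Session for outbound calls
session = requests.Session()

@api.route('/tokens', methods=['GET'])
def handle_tokenize():
  # Take URL as input and fetch the body
  url = request.args.get('url')
  response = session.get(url)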

  # Parse HTML from the given URL
  body = BeautifulSoup(response.content, 'html.parser')

  # Remove JavaScript and CSS from our life
  for script in body(['script', 'style']):
    script.decompose()

  text = body.get_text()

  # Ignore punctuation
  tokenizer = RegexpTokenizer(r'\w+')

  # Ignore duplicates
  tokens = set(tokenizer.tokenize(text))

  # Remove any stop words
  stop_words_set = set(stopwords.words())
  tokens = [w for w in tokens if not w in stop_words_set]

  # Now just get proper nouns
  tagged = pos_tag(tokens)
  tokens = [w for w,pos in tagged if pos in ['NNP', 'NNPS']]

  return jsonify(list(tokens))

Before this will work, we need to download NLTK resources. This is a one-time operation you can do in an interactive python shell or by executing a simple script.

$ python
...
>>> import nltk
>>> nltk.download('stopwords')
>>> nltk.download('averaged_perceptron_tagger')

With this demo.py in place we can restart our server and call the following endpoint to get back a list of any terms that could potentially be locations.

#!/bin/bash
curl "http://localhost:8000/api/1.0/tokens?url=$1"

This returns a response from one sample article that looks like:

["Canada", "Many", "Conners", "Kelly", "Americans", "Biodinamica", "Milan", "Fabio", ... ]

We've trimmed the word count down dramatically from the original article, but there is still more work that could be done to fine-tune this recognition process. This is good enough for a first pass without adding more complexity, so let's see if we can start recognizing these places with the geocoder.
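One way to narrow the token list is to keep only the tokens a geocoder can resolve. The sketch below is an illustration under assumptions: keep_geocodable and lookup are hypothetical names, and lookup stands in for a geocoder request that returns a (latitude, longitude) tuple or None.

```python
def keep_geocodable(tokens, lookup):
    """Map each token to a position, dropping tokens with no match.

    `lookup` is a hypothetical callable standing in for a geocoder
    request; it returns a (latitude, longitude) tuple or None.
    """
    places = {}
    for token in tokens:
        position = lookup(token)
        if position is not None:
            places[token] = position
    return places

# Stand-in lookup backed by a dict, so the sketch runs offline.
known = {"Gryfino": (53.25676, 14.48947)}
print(keep_geocodable(["Many", "Gryfino", "Kelly"], known.get))
```

A real geocoder may still return a match for an ordinary word that happens to share a name with a place, so this is a filter rather than a guarantee.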

Geocode

The HERE Geocoder API takes a human-readable location and turns it into geocoordinates. If you put in an address, you get back latitude and longitude.

Here's the listing for a geocoder endpoint:

@api.route('/geocode', methods=['GET'])
def handle_geocode():
  uri = 'https://geocoder.api.here.com/6.2/geocode.json'
  headers = {}
  params = {
    'app_id': app.config['APP_ID_HERE'],
    'app_code': app.config['APP_CODE_HERE'],
    'searchtext': request.args.get('searchtext')
  }

  response = session.get(uri, headers=headers, params=params)
  return jsonify(response.json())

Restart the python webserver and send a request for a city like "Gryfino":

#!/bin/bash
curl "http://localhost:8000/api/1.0/geocode?searchtext=$1"

The response includes, among other things, the position where I might put a marker to display this location on a map.

"DisplayPosition": {
  "Latitude": 53.25676,
  "Longitude": 14.48947
},
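To use that position programmatically, it has to be pulled out of the surrounding JSON. The sketch below assumes the DisplayPosition object is nested under Response, View, Result, and Location; only DisplayPosition itself appears in the excerpt above, so treat the outer keys as an assumption to check against a real response. first_display_position is a hypothetical helper name.

```python
def first_display_position(payload):
    """Return (latitude, longitude) of the first match, or None.

    Assumes the nesting Response -> View -> Result -> Location;
    only DisplayPosition is shown in the article's excerpt.
    """
    try:
        location = payload["Response"]["View"][0]["Result"][0]["Location"]
        position = location["DisplayPosition"]
        return position["Latitude"], position["Longitude"]
    except (KeyError, IndexError, TypeError):
        # No match, or the payload does not have the assumed shape.
        return None

# Hand-built sample using the coordinates from the excerpt.
sample = {"Response": {"View": [{"Result": [{"Location": {
    "DisplayPosition": {"Latitude": 53.25676, "Longitude": 14.48947}}}]}]}}
print(first_display_position(sample))
```

Returning None instead of raising keeps the caller simple when a token turns out not to be a place.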

Map

Finally, we're going to take the latitude and longitude we received from our geocode request and generate a simple render with the HERE Map Image API.

This listing looks like the following:

import tempfile

@api.route('/mapview', methods=['GET'])
def handle_mapview():
  uri = 'https://image.maps.api.here.com/mia/1.6/mapview'
  headers = {}
  params = {
    'app_id': app.config['APP_ID_HERE'],
    'app_code': app.config['APP_CODE_HERE'],
    'poi': request.args.get('poi')
  }

  response = session.get(uri, headers=headers, params=params)

  # tempfile.mktemp is deprecated and racy; create the file atomically
  # and keep it on disk after the handle is closed
  with tempfile.NamedTemporaryFile(delete=False) as image_file:
    image_file.write(response.content)

  return image_file.name

For simplicity and brevity I haven't included any of the error / response handling you should do here. I've also cheated a bit by just storing the image to the local filesystem for illustration.

Now, calling this endpoint with a comma-separated list of latitude, longitude pairs in the poi parameter renders a map with a marker at each location.
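Building that comma-separated value from a list of positions is a one-liner. In this sketch build_poi is a hypothetical helper, the first pair is the Gryfino position from the geocoder excerpt, and the second pair is a made-up point for illustration only.

```python
def build_poi(positions):
    """Join (latitude, longitude) pairs into one comma-separated string."""
    return ",".join(f"{lat},{lon}" for lat, lon in positions)

positions = [
    (53.25676, 14.48947),  # Gryfino, from the geocoder excerpt
    (45.5, 12.25),         # made-up second point for illustration
]
print(build_poi(positions))
```

The resulting string can be passed as the poi query parameter to the /mapview endpoint.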

Map with Points Placed

Place names without additional context can be ambiguous, so in some cases there was more than one match. This map is only showing the first match, despite how much fun Venice Beach may be.

Summary

The reason for making /tokens, /geocode, and /mapview separate endpoints is that this illustrates how you might set up microservices with Python + Flask for each operation you want to perform. This would allow a deployment to scale them independently.

You can find the full source code listing in the GitHub project.

For an extra dose of inception, try running the server and processing this article itself.
