Building an API on open data
Gordon Wintrob
Posted on September 20, 2017
Welcome to GET PUT POST, a newsletter all about APIs. Each edition features an interview with a startup about their API and ideas for developers to build on their platform. Want the latest interviews in your inbox? Subscribe here.
In this edition, I spoke with Ed Freyfogle, Co-Founder of OpenCage. Their API makes it easy to turn addresses into coordinates and, unlike Google and others, it’s built on top of a completely open data set. We discuss how the business got started, the stack they use to fan requests out to other services, and their scrappy marketing efforts.
What is OpenCage?
OpenCage is a geocoding service that incorporates all the benefits of open data. We help developers create solutions in ways that they can’t when working with proprietary data services and aim to make geocoding as simple as possible.
Before we started, I had a company that did online real estate searches. That brand was called Nestoria and we built a real estate search that worked for about ten countries.
Geocoding is the process of converting addresses into geographic locations and vice versa. There are great tools available for geocoding in countries like the US, UK, and most major European states. However, it’s a lot more difficult in countries like India and Brazil where Nestoria was involved.
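As a rough illustration of the two directions described above, the sketch below builds one forward query (address to coordinates) and one reverse query (coordinates to address). The endpoint, the `q` and `key` parameter names, and the `lat,lng` query form are assumptions based on OpenCage's public documentation, not on anything stated in this interview, so check them against the current docs. `YOUR-API-KEY` is a placeholder.

```python
from urllib.parse import urlencode

# Assumed endpoint; verify against the current OpenCage documentation.
BASE_URL = "https://api.opencagedata.com/geocode/v1/json"


def forward_url(address: str, api_key: str) -> str:
    """Build a URL asking for the coordinates of a free-text address."""
    query = urlencode({"q": address, "key": api_key})
    return f"{BASE_URL}?{query}"


def reverse_url(lat: float, lng: float, api_key: str) -> str:
    """Build a URL asking for the address nearest to a coordinate pair."""
    query = urlencode({"q": f"{lat},{lng}", "key": api_key})
    return f"{BASE_URL}?{query}"


print(forward_url("10 Downing Street, London", "YOUR-API-KEY"))
print(reverse_url(51.5034, -0.1276, "YOUR-API-KEY"))
```

The sketch only builds the URLs. Sending them with an HTTP client and parsing the JSON response is left out to keep it self-contained.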
We built our own geocoding service because we were geocoding millions of Indian and Brazilian properties. In these countries, it’s difficult to purchase geodata (if it even exists). To build our internal service cost effectively, we had to work with open data.
In 2013, we took the geocoding technology and turned it into its own brand: OpenCage. Nestoria was acquired in 2015 by one of our competitors and we spun OpenCage off as its own company.
How much usage did you have when you made that leap to exclusively doing geocoding?
We first released OpenCage as an experiment and called it a beta service. We learned a lot by listening to our users and got great feedback. Initially it was free. We then talked with the people who were using it, found out which services they would be willing to pay for, and created a business around that feedback.
We only introduced pricing after we added a lot to the service and made it more reliable. We’ve seen slow and steady growth in terms of usage and number of customers. The pain and pleasure of a geocoder is that the work is never done since the world is a complicated and dynamic place. New mapping issues and use cases come up continually.
There’s a massive demand for geocoding. There are many different offerings on the market from proprietary players like Google, and developers can theoretically build their own service on open data. However, at this point, it’s far more efficient to use a service like ours than to try to build one from scratch.
How many people use the API?
We offer a couple of different tiers. Tens of thousands of developers have joined our free tier and a smaller percentage of them are in our premium tier.
There’s diversity in the types of geocoding that people need. Many developers have a one-off project: they geocode a large database of addresses and then no longer need our service. We can’t build an ongoing business around interactions like that, so we focus on customers who have an ongoing need for geocoding.
What are some examples of use cases for the API?
Often it’s people who have a database of addresses that need to be converted into coordinates. That’s our bread-and-butter use case. Those addresses can come from anywhere, from online shopping transactions to patient records.
One area of growth is reverse-geocoding, particularly for vehicle tracking. Every new car continually records its travels and gathers coordinates. Customers want to convert those coordinates into addresses.
The cost of a device that can record location has fallen dramatically and it’s getting to the point where high-end bicycles often have location tracking capabilities. The problem is that coordinates mean little to humans.
For example, when someone rents a car in Spain, the agency asks customers if they’re going to leave the country. If they say yes, they pay more. The agency has a tracking device in the car that records the coordinates of its journey. Periodically, agencies check a car’s data and need to convert that data into information about whether that car has crossed into France.
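The border check in the rental example can be sketched as follows. This is only an illustration: `country_of` stands for whatever reverse-geocoding call returns a country code, and `fake_country_of` is a hypothetical stand-in with a made-up latitude threshold, not a real border.

```python
from typing import Callable, Iterable, Optional, Tuple

Coordinate = Tuple[float, float]  # (latitude, longitude)


def first_border_crossing(
    track: Iterable[Coordinate],
    country_of: Callable[[Coordinate], str],
    home_country: str,
) -> Optional[Coordinate]:
    """Return the first recorded point outside home_country, or None."""
    for point in track:
        if country_of(point) != home_country:
            return point
    return None


# Hypothetical stand-in for a reverse geocoder; a real one would call an API.
def fake_country_of(point: Coordinate) -> str:
    lat, _lng = point
    return "FR" if lat > 42.8 else "ES"  # crude illustrative split only


track = [(41.39, 2.17), (42.27, 2.96), (43.30, 3.00)]
print(first_border_crossing(track, fake_country_of, "ES"))
```

Passing the lookup in as a function keeps the check independent of any particular geocoder and makes it easy to test without network access.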
A large portion of our business comes from developers working on apps where users allow access to their location data. Developers find it useful to convert those coordinates into human terms.
Do you have any customers that you’d like to highlight?
The vehicle tracking industry is huge and many of our big customers are in that space. One example is Bosch (a company well known in Europe as an electronics supplier). They use us to track cargo containers.
Geocoding isn’t particularly glamorous, but it’s important with regard to the logistics of getting things from A to B.
How do you market OpenCage?
Since OpenCage was originally a smaller part of another business, we already had some contacts. Those were our first users and they typically came from the online real estate industry.
The next step was to become active in the OpenStreetMap community – a global community of people interested in mapping. From a technical standpoint, it involves literally mapping the entire world down to insane levels of detail. We initially sponsored a geo event in London and expanded from there.
People search for geocoding help, so marketing our product means being good at SEO and being present in the places where people ask technical questions about geocoding.
Usually there are three reasons why people are unhappy with proprietary suppliers like Google.
First, the proprietary suppliers are not useful in the developer’s own country, which is the case in many less developed markets. Second, people dislike the suppliers’ terms of use, which keep developers from fixing bugs themselves. The third reason is price: proprietary suppliers become expensive when geocoding at high volumes. We’re much more affordable because open data is available for free and we’re a smaller operation.
How large is your team?
My co-founder and I are the only ones who work full-time. Depending on the project, we may bring on freelancers and contractors.
How do you help bring on new customers?
We have a free trial to let people test our product. It’s also necessary to have great documentation that makes it easy for developers to get started.
Developers considering Google Maps are faced with an extremely wide variety of cluttered and confusing services and can easily waste days reading about each option. It’s hard to make a choice without having a background in this space.
That’s why our position is clear. We’re simple. We just do geocoding.
Conversely, there are developers who think that they can build their own system with OpenStreetMap. The reality is that it’s quite complex, both in the software necessary for geocoding and the underlying data. It’s not easy to keep a geocoding database up to date.
How do you prioritize features?
We rely on customer feedback and focus mainly on the people paying the bills. We’ve had several features that have come about in that way.
When someone asks for a feature, our first response is “Will you pay us to develop this?” We often still offer it to all users, but a paying customer really validates the need.
For example, we had a customer last summer doing medical research. He had data regarding patients and needed to geocode their addresses. He was very concerned about privacy, so we added a new privacy parameter. This was very much a custom feature but it was useful to others.
Tell me more about your interview series.
Most of our data comes from sources like OpenStreetMap that have great global communities.
People all around the world are mapping their local neighborhoods and countries. There are different motivations for why people do that, but one is that they want to put that data into OpenStreetMap so that they can use it. We reach out to those communities, encourage them, and contribute to them.
We don’t have a massive budget, so we can’t be the triple gold sponsor of OpenStreetMap conferences, but we can make people aware of the OpenStreetMap community. One way we do that is by hosting an interview series on our blog where we interview people from all over the world, including places like Tunisia and Costa Rica.
In some less developed parts of the world, there are no maps and people are trying to make maps for those countries. We piggyback on that global community and provide a forum where everyone can talk.
What do you wish people would build on top of OpenCage?
One of the big challenges we have is – because it’s such a low-level service – that we often don’t know exactly how people are using it. We just see queries coming in and we don’t usually know what the final product is. We’re usually only a small part of very complex projects.
The most common use case consists of people who come to our website and put in their home address. On the map, the exact coordinate could be off by 20 meters, which is confusing to some. However, many of our customers aren’t interested in precise accuracy.
For example, we once had a customer that specifically did not want precision. Their use case was an app where people could make and send videos to friends. When the friend received the video, this customer wanted to show the location of the sender, but not have that location be super precise because of privacy issues.
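One simple way to make a location deliberately imprecise is to round the coordinates before showing them. The interview does not say how this customer did it, so the sketch below is only one possible approach, and `coarsen` is a hypothetical helper name.

```python
from typing import Tuple


def coarsen(lat: float, lng: float, decimals: int = 2) -> Tuple[float, float]:
    """Round a coordinate pair so it identifies an area rather than a point."""
    return (round(lat, decimals), round(lng, decimals))


print(coarsen(52.520008, 13.404954))
```

One hundredth of a degree of latitude is roughly 1.1 km, so two decimal places place the sender in a neighbourhood rather than at a front door. Fewer decimals give a coarser area.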
One area where developers can build is by providing “annotations” based on the coordinate data. We can add information about time zones, currency, the country calling code, and other pieces of relevant information that might be useful.
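To show how annotations might be consumed, the sketch below reads a few fields from one result. The sample is hand-written, not real API output, and the field names (`annotations`, `timezone.name`, `currency.iso_code`, `callingcode`) are assumptions about the response shape that should be checked against the API documentation.

```python
# Hand-written sample, NOT real API output; field names are assumptions.
sample_result = {
    "formatted": "Example Street 1, Berlin, Germany",
    "annotations": {
        "timezone": {"name": "Europe/Berlin"},
        "currency": {"iso_code": "EUR"},
        "callingcode": 49,
    },
}


def summarize_annotations(result: dict) -> dict:
    """Pull a few annotation fields out of one result, tolerating missing keys."""
    ann = result.get("annotations", {})
    return {
        "timezone": ann.get("timezone", {}).get("name"),
        "currency": ann.get("currency", {}).get("iso_code"),
        "calling_code": ann.get("callingcode"),
    }


print(summarize_annotations(sample_result))
```

Using `.get` throughout means a result without a given annotation yields `None` instead of raising an error.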
Tell me more about your stack.
We are a simple wrapper around different open geocoders. When a query comes into us, we do some validation and authentication of the query and then fire it off to the different services. There are some geocoders that are designed for specific countries that we don’t use for every query.
We receive the results, de-duplicate them, rank them, clean them up, add annotations, and send that back to the user.
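The flow described above (validate, fan out, de-duplicate, rank, annotate) can be sketched in a few lines. OpenCage's real implementation is in Perl and is not shown in the interview, so everything here is an assumption made for illustration: the result fields, the rule that two results at the same rounded coordinate are duplicates, ranking by a `confidence` score, and the empty `annotations` placeholder.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

Result = Dict[str, Any]  # assumed shape: name, lat, lng, confidence
Geocoder = Callable[[str], List[Result]]


def geocode(query: str, geocoders: List[Geocoder]) -> List[Result]:
    # 1. Validate before spending any backend time on the query.
    query = query.strip()
    if not query:
        raise ValueError("empty query")

    # 2. Fan the query out to every backend in parallel.
    with ThreadPoolExecutor(max_workers=max(1, len(geocoders))) as pool:
        batches = list(pool.map(lambda g: g(query), geocoders))

    # 3. De-duplicate: results at the same rounded coordinate count as one
    #    place, and the copy with the higher confidence is kept.
    best: Dict[tuple, Result] = {}
    for result in (r for batch in batches for r in batch):
        key = (round(result["lat"], 4), round(result["lng"], 4))
        if key not in best or result["confidence"] > best[key]["confidence"]:
            best[key] = result

    # 4. Rank by confidence, then 5. attach a placeholder annotation dict.
    ranked = sorted(best.values(), key=lambda r: r["confidence"], reverse=True)
    return [{**r, "annotations": {}} for r in ranked]


# Two hypothetical backends that return overlapping results.
def backend_a(query: str) -> List[Result]:
    return [{"name": "A", "lat": 1.00001, "lng": 2.0, "confidence": 5}]


def backend_b(query: str) -> List[Result]:
    return [
        {"name": "B", "lat": 1.00002, "lng": 2.0, "confidence": 8},
        {"name": "C", "lat": 10.0, "lng": 20.0, "confidence": 3},
    ]


print([r["name"] for r in geocode(" somewhere ", [backend_a, backend_b])])
```

A production service would need a far more careful notion of "duplicate" than rounded coordinates; the point is only the shape of the pipeline.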
We have our own servers; we’re not on the cloud. The challenge is the sheer volume of data that we’re managing – it’s now over 1TB. It’s much more cost effective to lease our own servers than it is to use an AWS or cloud-based approach.
Our server is pretty thin and the software is written in Perl, which is very good at text manipulation. We do authentication and validation and then send off queries to the different geocoders. The whole point of using our API is that it doesn’t really matter what our stack is. Requests are fanned out and aggregated.
We have libraries in almost every major language to make it easy for developers to interact with us. We’ve written and maintained some of those libraries, while members of the community submitted others. They’re all open source.
What happens when one of these services is down or slow?
Some of them can be slow. If a certain geocoder hasn’t responded, then we just rely on the ones that have returned.
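A minimal sketch of "rely on the ones that have returned", assuming a fixed deadline per request. The deadline value and the `fast` and `slow` backends are hypothetical; the interview does not describe how OpenCage decides when to stop waiting.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, Dict, List


def gather_before_deadline(
    query: str,
    geocoders: Dict[str, Callable[[str], List[str]]],
    deadline_seconds: float,
) -> Dict[str, List[str]]:
    """Query all backends at once and keep only answers that arrive in time."""
    pool = ThreadPoolExecutor(max_workers=max(1, len(geocoders)))
    futures = {pool.submit(fn, query): name for name, fn in geocoders.items()}
    done, _late = wait(futures, timeout=deadline_seconds)
    # Do not block on stragglers; their threads finish in the background.
    pool.shutdown(wait=False)
    return {
        futures[f]: f.result()
        for f in done
        if f.exception() is None  # a backend that raised is skipped too
    }


def fast(query: str) -> List[str]:
    return [f"fast:{query}"]


def slow(query: str) -> List[str]:
    time.sleep(0.5)  # simulates a backend that misses the deadline
    return [f"slow:{query}"]


print(gather_before_deadline("berlin", {"fast": fast, "slow": slow}, 0.1))
```

The slow backend's answer is simply dropped for this request. Its worker thread still runs to completion, so a real service would also want per-backend timeouts on the underlying network calls.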
One of the challenges we face is garbage input that slows down the geocoders. Sometimes a user will want to geocode a database, so they’ll write some for loops before really looking at their data or cleaning it up.
Sometimes users just get data from crawling the web. They often scrape all kinds of junk and then fire it at our service. We try to catch stuff like that before it gets to the geocoding services.
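Catching junk before it reaches the geocoders could look something like the pre-check below. The rules and the length limit are hypothetical, chosen only to illustrate the idea; the interview does not list the checks OpenCage actually applies.

```python
import re
from typing import Optional

MAX_QUERY_LENGTH = 200  # hypothetical cut-off, for illustration only


def rejection_reason(query: str) -> Optional[str]:
    """Return why a query looks like junk, or None if it seems worth geocoding."""
    text = query.strip()
    if not text:
        return "empty"
    if len(text) > MAX_QUERY_LENGTH:
        return "too long"
    if not any(ch.isalnum() for ch in text):
        return "no letters or digits"
    if re.search(r"<[^>]+>|https?://", text):
        return "looks like scraped markup or a URL"
    return None


for candidate in ["Hauptstraße 5, Berlin", "   ", "<td>Hauptstr. 5</td>"]:
    print(repr(candidate), "->", rejection_reason(candidate))
```

Running the same kind of check on the client side, before the for loop starts sending requests, saves both parties time.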
This post was originally published on medium.com