What's an API and how to access one using Python?

ernestinem

ernestine-m

Posted on July 21, 2020

Last month, I was given my very first task at work as a beginner in data science: retrieve data from an API that uses the OAuth2 authorization protocol. In hindsight, that seems like a very basic task, but I had trouble finding a beginner-friendly how-to online. This article is a short breakdown of the steps needed to communicate with an API using Python 3.

What is an API?

The textbook definition of an API (or Application Programming Interface) is "a set of functions and procedures allowing the creation of applications that access the features or data of an operating system, application, or other service."

To put it simply, an API is the messenger between a client and a server, and it allows us to retrieve data. It can be compared to a waiter in a restaurant who takes our order, passes it to the cooks in the kitchen, then brings our food back to us.

We could use different architectural styles to build an API, but the most common one is representational state transfer (REST), which allows for interoperability between computer systems on the internet. A RESTful API, or REST API, uses the existing HTTP methods to communicate:

  • GET to retrieve a resource/data
  • PUT to change the state of a resource or update it
  • POST to create a resource
  • DELETE to remove a resource
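
To make the four methods concrete, here is a minimal in-memory sketch of how a server might react to each one. It never touches the network, and the handle function, the /books path and the status codes chosen are illustrative assumptions, not part of any real API.

```python
import itertools

# Hypothetical in-memory "database" of books, keyed by id.
books = {}
_next_id = itertools.count(1)


def handle(method, path, body=None):
    """Return (status_code, payload) the way a tiny REST API might."""
    parts = path.strip("/").split("/")          # "/books/1" -> ["books", "1"]
    book_id = int(parts[1]) if len(parts) > 1 else None

    if method == "POST" and book_id is None:    # create a resource
        new_id = next(_next_id)
        books[new_id] = body
        return 201, {"id": new_id, **body}
    if book_id not in books:
        return 404, {"error": "not found"}
    if method == "GET":                         # retrieve a resource
        return 200, {"id": book_id, **books[book_id]}
    if method == "PUT":                         # update a resource
        books[book_id] = body
        return 200, {"id": book_id, **body}
    if method == "DELETE":                      # remove a resource
        del books[book_id]
        return 204, None
    return 405, {"error": "method not allowed"}


print(handle("POST", "/books", {"title": "Dune"}))
print(handle("GET", "/books/1"))
print(handle("PUT", "/books/1", {"title": "Dune Messiah"}))
print(handle("DELETE", "/books/1"))
print(handle("GET", "/books/1"))  # the book is gone, so this is a 404
```

From the client's point of view, each call above corresponds to one HTTP request sent with the matching method.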

What is OAuth2?

To access a protected API, you need authorization. One of the most common standards is OAuth, which is used by most big tech companies. OAuth allows an authorization server to issue access tokens to third-party clients, with the approval of the resource owner. The third party then uses the access token to access the protected resources hosted by the resource server.

So, how does it work?

The OAuth2 workflow I had to use for this task was client_credentials, which consists of 2 steps:

Step 1: Request an access token with the information given by the resource owner

To communicate with APIs, Python has a very useful third-party HTTP library called requests (installable with pip install requests) that lets us retrieve data in a very simple way. There's no need to manually add query strings to URLs, or to form-encode POST data.
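
To see what requests spares us, here is a sketch of the form-encoding step done by hand with the standard library. The credential values are made-up placeholders; the resulting string is the kind of body that passing data=values to requests.post sends for a simple dictionary like this one.

```python
from urllib.parse import urlencode

# Made-up placeholder values, for illustration only.
values = {
    "grant_type": "client_credentials",
    "client_id": "my-client-id",
    "scope": "read write",
}

# Form-encode the dictionary by hand: key=value pairs joined by "&",
# with special characters escaped (the space becomes "+").
body = urlencode(values)
print(body)
# -> grant_type=client_credentials&client_id=my-client-id&scope=read+write
```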

The code I wrote

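import requests

# Placeholders: replace each value with what the resource owner
# and the API documentation give you.
values = {
    "grant_type": "client_credentials",
    "client_id": "given by the resource owner",
    "client_secret": "given by the resource owner",
    "scope": "specified in the API documentation",
}
headers = {
    "Authorization": "given by the resource owner",
}

# Replace "host" with the API's hostname. requests needs the full URL,
# scheme included, otherwise it raises a MissingSchema error.
r = requests.post("https://host/oauth2/token", data=values, headers=headers)
r.raise_for_status()  # raise an exception on a 4xx/5xx response

print(r.json())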

This code prints the server's JSON response, which contains the access token that allows us to move to step 2.
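
The token then has to be pulled out of that JSON and attached to later requests. A minimal sketch, assuming the response follows the usual OAuth2 shape (an access_token field, sent back to the API as a Bearer token); the sample response below is invented and bearer_headers is a hypothetical helper.

```python
def bearer_headers(token_response):
    """Build the Authorization header from an OAuth2 token response."""
    token = token_response["access_token"]
    return {"Authorization": f"Bearer {token}"}


# Invented example of what r.json() might return in step 1.
sample_response = {
    "access_token": "abc123",
    "token_type": "Bearer",
    "expires_in": 3600,
}

headers = bearer_headers(sample_response)
print(headers)  # -> {'Authorization': 'Bearer abc123'}
```

When the response includes expires_in, it gives the token's lifetime in seconds, so a new token has to be requested once it runs out.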

Step 2: Retrieve the data by using the access token that's been issued

For this step, I used Postman, a collaborative platform for API development that also lets us send requests. This tool is useful for beginners because it auto-generates headers. The only one I had to add was a range header, because the API results were paginated.
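
The same request can also be sent from Python instead of Postman. This is a sketch under assumptions: fetch_range is a hypothetical helper, and the Range value format (first-last) is a guess, since each API documents its own. The session argument is expected to be a requests.Session(), or anything with a compatible get method.

```python
def fetch_range(session, url, access_token, first, last):
    """GET one slice of results, authenticating with the access token."""
    headers = {
        "Authorization": f"Bearer {access_token}",
        # Assumed format; check the API documentation for the real one.
        "Range": f"{first}-{last}",
    }
    response = session.get(url, headers=headers, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.json()
```

With requests installed, a call could look like fetch_range(requests.Session(), url, token, 0, 99), where url and token are the resource URL from the API documentation and the access token from step 1.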

Paginated?

Yes, just like books, API results can be paginated. Since databases can contain millions or billions of records, requesting all of them at once could cause the server to crash. Pagination prevents this by limiting the amount of data returned by each request. There are 3 main types of pagination:

  • Offset-based pagination
  • Keyset pagination
  • Seek pagination

This article goes into greater detail about each of these methods!
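
To give a feel for the difference, here is a small sketch that pages through an in-memory list in two ways: by offset, and by key (remembering the last id seen). The records list and the helper functions are made up for illustration; a real API would do the equivalent work in its database.

```python
# Made-up dataset: ten records with increasing ids.
records = [{"id": i} for i in range(1, 11)]


def offset_page(offset, limit):
    """Offset-based: skip `offset` records, then return up to `limit`."""
    return records[offset:offset + limit]


def keyset_page(after_id, limit):
    """Keyset: return up to `limit` records whose id is greater than `after_id`."""
    return [r for r in records if r["id"] > after_id][:limit]


def fetch_all(limit):
    """Collect every record with keyset pagination, stopping at an empty page."""
    collected, after_id = [], 0
    while True:
        page = keyset_page(after_id, limit)
        if not page:
            return collected
        collected.extend(page)
        after_id = page[-1]["id"]  # the next page starts after the last id seen


print(offset_page(3, 3))  # -> [{'id': 4}, {'id': 5}, {'id': 6}]
print(keyset_page(6, 3))  # -> [{'id': 7}, {'id': 8}, {'id': 9}]
print(len(fetch_all(4)))  # -> 10
```

The client-side loop in fetch_all has the same shape whatever the pagination type: request a page, keep its contents, and stop when the server has nothing more to return.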
