Getting Started with HTTPX, Part 1: Building a Python REST Client (Synchronous Version)

bowmanjd

Jonathan Bowman

Posted on August 8, 2020

HTTPX is a modern HTTP client library for Python. Its interface is similar to the old standby Requests, but it supports asynchronous HTTP requests, using Python's asyncio library (or trio). In other words, while your program is waiting for an HTTP request to finish, other work does not need to be blocked.

In this first article, we will build a synchronous client: each request completes before the next one starts. The second article will build tests for this client, Part 3 will adapt it so the requests overlap asynchronously, and Part 4 will test that asynchronous version.

To get acquainted with HTTPX, let's build a mini-project called pypedia: a command-line tool that lists a few Python-related articles from Wikipedia.

Poetry eases Python project and dependency management, so I use it to quickly get a project up and running. If you are new to Poetry, you may appreciate the article in which I introduce it.

Setup with Poetry

poetry new --src pypedia
cd pypedia
poetry add httpx

Two functions and a command runner

In the src/pypedia directory, create a Python file called synchronous.py.

"""Proof-of-concept Wikipedia search tool."""
import logging
import time

import httpx

EMAIL = "your_email@provider"  # or GitHub URL or other identifier
USER_AGENT = {"user-agent": f"pypedia/0.1.0 ({EMAIL})"}

logging.basicConfig(filename="syncpedia.log", filemode="w", level=logging.INFO)
LOG = logging.getLogger("syncpedia")


def search(query, limit=100, client=None):
    """Search Wikipedia, returning a JSON list of pages."""
    if not client:
        client = httpx
    LOG.info(f"Start query '{query}': {time.strftime('%X')}")
    url = "https://en.wikipedia.org/w/rest.php/v1/search/page"
    params = {"q": query, "limit": limit}
    response = client.get(url, params=params)
    LOG.info(f"End query '{query}': {time.strftime('%X')}")
    return response


def list_articles(queries):
    """Execute several Wikipedia searches."""
    with httpx.Client(headers=USER_AGENT) as client:
        # The generators below are lazy, so they must be consumed (by dict)
        # before the client closes at the end of the with block.
        responses = (search(query, client=client) for query in queries)
        results = (response.json()["pages"] for response in responses)
        # results = (response.json() for response in responses)
        return dict(zip(queries, results))


def run():
    """Command entry point."""
    queries = [
        "linksto:Python_(programming_language)",
        "incategory:Computer_programming",
        "incategory:Programming_languages",
        "incategory:Python_(programming_language)",
        "incategory:Python_web_frameworks",
        "incategory:Python_implementations",
        "incategory:Programming_languages_created_in_1991",
        "incategory:Computer_programming_stubs",
    ]
    results = list_articles(queries)
    for query, articles in results.items():
        print(f"\n*** {query} ***")
        for article in articles:
            print(f"{article['title']}: {article['excerpt']}")

In summary, the above has two significant functions and a command runner.

Using Client.get()

The search() function accepts the query string, an optional result limit, and an optional reusable HTTPX Client instance, then performs a GET request against the Wikipedia search endpoint.

The HTTPX Client is passed into the search() function as the client variable, so we can use methods like client.get(), passing two arguments: url and params. Whatever key:value pairs are in the params dict will make up the query string appended to the url, such as q (the Wikipedia search terms) or limit.
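For instance, here is a quick illustrative snippet (not part of the project code; the search term is arbitrary) showing how the params dict is encoded into the query string:

import httpx

response = httpx.get(
    "https://en.wikipedia.org/w/rest.php/v1/search/page",
    params={"q": "jython", "limit": 2},
    headers={"user-agent": "pypedia/0.1.0 (your_email@provider)"},
)
# The params dict is appended to the URL as ?q=jython&limit=2
print(response.url)
print(response.status_code)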

If a pre-existing client is not passed to search(), the request falls back to the module-level httpx.get(). This makes it easy to use search() by itself and to test it in isolation.
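For example, once the project is installed (poetry install, covered below), you could exercise search() on its own from a REPL started with poetry run python (a hypothetical quick check):

from pypedia.synchronous import search

response = search("incategory:Python_implementations", limit=5)
for page in response.json()["pages"]:
    print(page["title"])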

httpx.Client in a context manager

The list_articles() function opens an HTTPX Client as a context manager, so that cleanup is assured and automatic. It accepts one parameter, queries, and iterates over that list inside the context manager, calling search() with each query. This way, all the client.get() calls in search() benefit from the reuse of a single persistent HTTP connection. Note that the generator expressions are lazy, so they must be consumed before the with block ends; otherwise the requests would be attempted after the client is closed.
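If the context manager feels opaque, the same cleanup can be written explicitly with Client.close(). A rough equivalent of list_articles() (a sketch for illustration, not a drop-in improvement) might look like this:

def list_articles(queries):
    """Execute several Wikipedia searches (explicit-cleanup variant)."""
    client = httpx.Client(headers=USER_AGENT)
    try:
        # A list, not a generator: every request completes before close()
        responses = [search(query, client=client) for query in queries]
    finally:
        client.close()
    return {query: response.json()["pages"] for query, response in zip(queries, responses)}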

For those familiar with Requests, this is the equivalent of Requests' Session object.
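For comparison, a roughly equivalent Requests version of a single search (a sketch only; this project uses HTTPX) might look like this:

import requests

USER_AGENT = {"user-agent": "pypedia/0.1.0 (your_email@provider)"}

with requests.Session() as session:
    session.headers.update(USER_AGENT)
    response = session.get(
        "https://en.wikipedia.org/w/rest.php/v1/search/page",
        params={"q": "incategory:Python_implementations", "limit": 5},
    )
    print(response.json()["pages"][0]["title"])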

The HTTPX Advanced Usage guide has excellent rationale and instructions for using the Client.

Enable the command runner

The run() function is what executes when the tool is invoked as a script. In this case, it builds a list of search terms, passes that list to list_articles(), then parses and prints the results.
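Incidentally, if you would rather run the module directly (for example with poetry run python src/pypedia/synchronous.py), a standard guard at the bottom of synchronous.py would do it. This is an optional addition, not in the listing above:

if __name__ == "__main__":
    run()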

With Poetry, the entry point for a script is defined in pyproject.toml. So we add this to that file:

[tool.poetry.scripts]
syncpedia = "pypedia.synchronous:run"

So, the script syncpedia will call the run function of the synchronous submodule of the package pypedia. Install the project so the script becomes available:

poetry install

Synchronous execution

To run:

poetry run syncpedia

Assuming all works well, titles and excerpts of many Wikipedia articles should scroll by.

The calls to the Wikipedia API happened synchronously, in a sequence. One completed before the next began. This can be seen in the log file.

$ cat syncpedia.log
INFO:syncpedia:Start query 'linksto:Python_(programming_language)': 05:39:16
INFO:syncpedia:End query 'linksto:Python_(programming_language)': 05:39:17
INFO:syncpedia:Start query 'incategory:Computer_programming': 05:39:17
INFO:syncpedia:End query 'incategory:Computer_programming': 05:39:18
INFO:syncpedia:Start query 'incategory:Programming_languages': 05:39:18
INFO:syncpedia:End query 'incategory:Programming_languages': 05:39:19
INFO:syncpedia:Start query 'incategory:Python_(programming_language)': 05:39:19
INFO:syncpedia:End query 'incategory:Python_(programming_language)': 05:39:19
INFO:syncpedia:Start query 'incategory:Python_web_frameworks': 05:39:19
INFO:syncpedia:End query 'incategory:Python_web_frameworks': 05:39:20
INFO:syncpedia:Start query 'incategory:Python_implementations': 05:39:20
INFO:syncpedia:End query 'incategory:Python_implementations': 05:39:20
INFO:syncpedia:Start query 'incategory:Programming_languages_created_in_1991': 05:39:20
INFO:syncpedia:End query 'incategory:Programming_languages_created_in_1991': 05:39:20
INFO:syncpedia:Start query 'incategory:Computer_programming_stubs': 05:39:20
INFO:syncpedia:End query 'incategory:Computer_programming_stubs': 05:39:21

In the run shown above, each call took a second or less, and the calls executed strictly in order.

In other words, everybody had to wait in line.
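If you want to put a number on that, you could time a whole batch from the REPL (a hypothetical measurement snippet; the query list here is abbreviated, and your timings will vary):

import time

from pypedia.synchronous import list_articles  # available after `poetry install`

queries = [
    "incategory:Python_implementations",
    "incategory:Python_web_frameworks",
]
start = time.perf_counter()
list_articles(queries)
print(f"{len(queries)} sequential searches took {time.perf_counter() - start:.1f}s")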

HTTPX can also make these calls asynchronously, so that each call does not need to wait its turn in line. That can potentially bring performance benefits. We will explore the async possibilities later in the series.

For now, we must not forget to write tests, and the next article will take that up.
