Getting your user's country and name in a Whatsapp bot using Fastapi and FaunaDB - Part 2

alessandrojcm

Alessandro Cuppari

Posted on May 20, 2020

Getting your user's country and name in a Whatsapp bot using Fastapi and FaunaDB - Part 2

Hello there! Here we will continue when we left off in the previous part. So right now we have our shiny static actions all set up; now, in this post, we will scaffold the FastApi project and write the code needed to query FaunaDB.

Setting up our project

Ok so first I will assume you have both Python 3.8.2 and virtualenv in your system; if not, please set them up according to your operating system before continuing.

Next, let's create a folder and a virtual environment for our project; for the remaining of this article I will assume a UNIX-based system:

mkdir our-awesome-api && cd our-awesome-api
virtualenv venv
source venv/bin/activate
Enter fullscreen mode Exit fullscreen mode

Now, we will install the packages we need, and what do we need? Just FastApi and Fauna! Well no, we'll need also a couple more:

pip install uvicorn fastapi faunadb python-dotenv
Enter fullscreen mode Exit fullscreen mode

Ok, the two in the middle we recognize, but what are the other two? Well, python-dotenv it's a package FastApi (or more precisely, Pydantic) uses to parse .env files; just like Javascript's dotenv.

On the other hand, uvicorn it's an ASGI server; remember those? It's the layer that allows FastApi to talk to HTTP servers like Nginx or Apache, it also has a nice dev server built-in along with hot reload when the code changes.

The file structure

Now, let's create a nice file structure for our project:

└───src
    ├───core
    ├───models
    └───app.py
Enter fullscreen mode Exit fullscreen mode

What do we have here?

  • src it's the source root of our project
  • In core lives all the code that's critical for the start of our application
  • In models are all the data models (like users, for example)
  • In app.py is where we create our application instance

Now that we have our packages installed and our file structure all set up, let's figure out what we need first.

Querying FaunaDB

The core of this project is, basically, storing and retrieving users. For that, we need to query our database; now, how do we do that? First, let's see how Fauna stores data: the records fauna stores are called documents (like in MongoDB), and documents are stored into collections. Like I mentioned in the previous article, Fauna does not enforce any type of data schema at the database level, and because of that, we can find inconsistencies between documents in the same collection. Nevertheless, all the documents have two common properties:

  • ref, which is a unique identifier given automatically to all documents (like an old fashioned id)
  • ts, which is a timestamp that changes every time an event occurs to this particular document (like creation or update). Fauna automatically stores the events in the lifetime of an object, and you can query it, but that is beyond the scope of this post.
  • data, this is a dictionary containing all the data we store for the object. So, for example, is this was a users collection this would contain information like the username or email of our users.

Cool, now that we know how the data is stored, we need to know how to query it. Because no data schema is enforced at the database level, Fauna does not know out of the box how to query our data (we can't for example, query by username like we would do in a relational database); to tell Fauna how to query our data, we need to create an index. Indexes are a set of instructions, associated with a particular collection, that tells fauna how to query that collection; at the same time, we can tell indexes what data we need to be retrieved. So, for example, maybe we need to list all our users, this list will only show their username and the date they signed up; it would be a waste of bandwidth to fetch all the additional data we won't show in the list. So, for this, we can create an index that does exactly this: paginate through all the users and retrieve only their username and the document creation date.

Ok, enough theoretical talking, let's recall our flowchart from the last post and see what indexes we need:

Bot's greeting process flowchart

Now, we see that we need to check if the user already exists. For that, we'll need an index that queries our collection via a unique index. What property of our users is unique? That's right, their phone number! So, let's go to the Fauna website.

Creating an index

Well, that title is a bit misleading. We can't create an index for a collection that does not exist and we cannot create a collection for a database that does not exist. So let's go to FaunaDB and sign up using our GitHub credentials.

Then, in the dashboard, let's click in "New Database" and create our shiny new instance:
Database creation

Next, let's create our collection, I named mine users for the sake of originality:
Collection creation
Now, with the database and collection created, we can finally make our index:
Index creation
Ok, we have several things going on here:

  • First of all, we need to select to which collection this index belongs. We say that this is for querying our users collection.
  • Then, the index name. Pretty self-explanatory.
  • Now we have this thing called "terms", what's this? Well, indexes can be used to paginate over a collection (that is, to list a collection) or they can be used to query a collection using a special field (for example, via the phone number, as we have here). Terms are exactly that: the terms that this index will use to query our data against. We write phone_number here and we see the dashboard formats this to data.phone_number, that is because all non-standard fields are considered to be inside of the data dict I mentioned before. By standard fields, I mean fields that Fauna generated, like ref and ts.
  • The values are all those fields we want the index to fetch, in this case, we want to fetch the name, the country, and the ref. Notice how the dashboard put ref and no data.ref.

Cool! Now let's click save and leave the dashboard for now.

Querying Fauna from Python

Ok, now we need to write code in order to query our database from python. For that, I adopted an OOP approach, kinda like a traditional ORM:

# src/models/user_document.py
from typing import Optional, Any

from pydantic import BaseModel

from faunadb import query as q
from faunadb.objects import Ref

class User(BaseModel):
  ref: Optional[Ref]
  ts: Optional[str]
  phone_number: str
  name: str
  country: str

  class Config:
    arbitrary_types_allowed = True

  def __init__(self, **data: Any):
    super().__init__(**data)

 @classmethod
  def get_by_phone(cls, phone_number: str):
      result = session().query(
          q.paginate(q.match(q.index("get_by_phone_number"), phone_number))
      )
      if len(result["data"]) == 0:
          return None
      # The result is a list with the values ordered as the index we defined
      ref, name, country = result["data"][0]

      return User(
          ref=ref,
          name=name,
          country=country,
          phone_number=phone_number,
      )
Enter fullscreen mode Exit fullscreen mode

Now, what the heck do we have here? First of all, what is BaseModel parent class? Well, FastApi uses Pydantic under the hood. Pydantic it's a validation library that uses the type hints introduced in Python 3.6 for doing its validation. For a class to be validated with Pydantic, we have to do two things:

  • Inherit from the BaseModel class
  • Defined our class attributes with type hints

And that's what we have done here:

  • ref and ts its Fauna's reference type as discussed above, the Python driver comes with its own type hints for the former, so we use those; for tss we use int, because it is just a UNIX timestamp. You'll see why are they marked as Optional in a moment.
  • We add the same attributes we defined in our index, so the phone number, the country, and the name.

After that, we have just defined a constructor to init the base class.

Now, we come to the heart of this class, the get_by_phone static method:

  • First, we create a query using the FaunaDB client (we will define the session object in a moment)
    • Inside the query we pass a paginate object
    • Then we say we want to match against and index, in this case, is the "get_by_phone_number" index we created above
    • As a param to that index, we pass the phone_number parameter passed to the function
  • If the query returns an empty list, we say that we did not find the user, so we return None
  • If not, then we create an instance of User with the first element of the list the query returned. Notice that this is a static method, because of that we use User directly instead of self

Cool, now two things:

  • The order in which the document's properties are returned is the same order that we defined in our index
  • The returned data is a list, why is that? Well, remember that Fauna does not make any assumptions about our data. It does not know that the phone_number must be unique, that is for us to enforce at the application level. So for all Fauna knows, there could be a thousand documents with the same number, what we're doing here is a very educated guess; so we need to be careful to not allow duplicate phone numbers at the application level.

Populating that core folder

You noticed above that I imported a magical session object, you also probably guessed that was the Fauna client initialized somewhere. If you did, you are correct; let's write that initialization code now:

For this, we need to first write a model to hold the configuration of our app.

# src/models/config.py
from typing import Optional

from pydantic import BaseSettings, AnyHttpUrl, validator, HttpUrl
from pydantic.types import SecretStr

class Config(BaseSettings):
  DEBUG: bool = False
  FAUNA_SERVER_KEY: SecretStr = "your_server_key_here"
  FAKE_NUMBER: str = "+15555555"
  TWILIO_AUTH_TOKEN: Optional[str]

  def __init__(self, **kwargs):
        super().__init__(**kwargs)

  @validator("TWILIO_AUTH_TOKEN")
  def check_for_twilio_token(cls, v, values, field):
    if (
        "DEBUG" in values
        and not values["DEBUG"]
        and v is None
    ):
        raise RuntimeError("{f} must be set".format(f=field))
    return v
Enter fullscreen mode Exit fullscreen mode

Now, what's this? Well, Pydantic comes with a very handy class to handle settings (and validation of those settings) for your application. Let's explain what's going on here:

  • First, the base class is BaseSettings instead of BaseModel
  • Then, we defined the configuration variables our application will need:
    • DEBUG a handy flag to, well, enable debug mode. Defaults to False
    • FAUNA_SERVER_KEY, the API key four our database. It is of type SecretStr, which is a helper type to avoid printing the value to stdout: it will print a bunch of asterisks instead of the real value
    • FAKE_NUMBER, more about that later ;)
    • TWILIO_AUTH_TOKEN our Twilio API Token, we'll need it later.
  • After that, there is a Pydantic decorator, this function acts as a custom validator for the property that was passed as a parameter to the decorator. In this case, if we are running in debug mode, we don't care if the Twilio API key is not present so we jus return; on the other hand, if DEBUG is set to false, we need to check that the Twilio API key is present, otherwise the application would raise an exception

Great, now let's put this model into use:

# src/core/config.py
import sys
from os import environ

from src.models.config import Config

if environ.get("DEBUG", True):
    config = Config(_env_file=".env")
else:
    config = Config()
Enter fullscreen mode Exit fullscreen mode

Not much going on here: we are just checking if DEBUG is set. If it is, we populate our config via a .env file; if not the config will be taken from environment variables.

Now, the Pydantic base setting has an order of priority to load variables:

  • Higher priority goes to environment variables
  • The next higher priority goes to the .env file, so if the same variable is defined in the file and in the environment, the one in the environment will take precedence.
  • Defaults defined in the model

So if an attribute not marked as Optional which does not have a default and is not defined either in the .env nor in the environment, will cause Pydantic to throw an error.

Next, the misterious session object:

# src/core/engine.py
from faunadb.client import FaunaClient

_session = None


def session():
    """
    Helper function to get faunadb session, if there is one already it just returns
    """
    from .config import config

    global _session
    if _session is None:
        _session = FaunaClient(secret=config.FAUNA_SERVER_KEY.get_secret_value())
    return _session
Enter fullscreen mode Exit fullscreen mode

Nothing crazy here (and sorry for the global), we are just checking if Fauna client is initialized: if it is, we just return it, if it is not, then we initialize it with the key loaded from our configuration. Check that we are calling the get_secret_value method, remember that helper class from Pydantic? Well, we have to call this method in order to get the real value, otherwise, we would get a bunch of asterisks.

Our first endpoint

Now finally! Let's write our first endpoint. What does this endpoint do? Well, according to our flowchart, it will be responsible to check if our user is in the database and will react accordingly. Let's create a routes folder inside our source root:

mkdir src/routes
Enter fullscreen mode Exit fullscreen mode

Inside this is of course our __init__.py and our new greet.py file which will contain our routes. Next, let's first modify our bot's schema to redirect to our endpoint when the user greets:

"uniqueName": "greeting",
"actions": {
  "actions": [
    {
      "redirect": {
        "method": "POST",
        "uri": "our_super_yet_to_be_endpoint/greeting"
      }
    }
  ]
},
// Other stuff down here
Enter fullscreen mode Exit fullscreen mode

Here, we just changed our default action to be a redirect instead of a say action. Also, we said the Autopilot should send this as a POST request.

Now, in order to write our endpoint, we need to take a look at the Autopilot request reference. There, we learn that autopilot sends the data using the application/x-www-form-urlencoded encoding. For FastApi to parse that, we need yet to add another library:

pip install python-multipart
Enter fullscreen mode Exit fullscreen mode

Now, finally, our endpoint:

# src/routes/greet.py
from fastapi import APIRouter, HTTPException, Form

from src.models import User

user_greeting = APIRouter()

@user_greeting.post("/greeting")
def greet_user(UserIdentifier: str = Form(...)):
    """
    User greet endpoint
    :param: UserIdentifier: user's phone number from Twilio
    """

    user = User.get_by_phone(phone_number=UserIdentifier)

   # Greeting the user since already exists
    if user is not None:
        return {
            "actions": [
                {"remember": {"name": user.name, "country": user.country}},
                {"say": "Hi there {name}! Welcome back, how are things in {c}?".format(name=user.name, c=user.country)},
            ]
        }

    return {
        "actions": [
            {
                "say": "Hello there!"
            },
            {"redirect": "task://can-have-name"},
        ]
    }
Enter fullscreen mode Exit fullscreen mode

Not much going on here, if you've worked with Python web frameworks this will be pretty familiar to you. Let's see what's going on:

  • First we create a router, this is what FastApi uses to bundle routes together. It helps organize code in self-contained routes.
  • Next, we use that object to create a POST endpoint using it as a decorator, FastApi has decorator for all the HTTP verbs. We pass the endpoint's URI as a parameter.
  • Next, the function name, it can be anything

Now, what's with the function parameters? Well, FastApi also takes advantage of Python type hints in order to validate our requests. So if, for example, we pass something like: email: str as a parameter, FastApi will look for a parameter called email of type str in the query parameter string of the request. Similarly for all types of request parameters (Body, path parameters, etc). The name of the function parameter needs to be the exactly the parameter we're expecting; otherwise, FastApi will respond with a 422 Unprocessable Entity error.

So, in this case, we're expecting a parameter called UserIdentifier (as per the Twilio docs), which is a string. We also pass Form as a default value: that way we tell FastApi that we are expecting an application/x-www-form-urlencoded request. The ellipsis (the three ...) inside the Form constructor is just telling that this form has no dependencies (dependencies are another FastApi feature beyond the scope of this post). But, after all, what's a UserIdentifier? Well, depends on the channel Autopilot is using to communicate with the user, in Whatsapp and SMS channel it will be the user's phone number.

After that it's all pretty self-explanatory:

  • We use the Fauna index we created before to check if the user exists.
  • If it exists, we return a personalized greeting
  • If not, we redirect to the action that will ask the user for their name; using the task's uniqueName we created in our previous post

By default, if we return a Python dict from a FastApi endpoint, it will be encoded as application/json; which is just what we need. Also, check that we are adhering to Autopilot's Action Schema. We already know about the say and redirect, but what is this remember thingy? Well, Autopilot bots have the concept of "session": sessions are a time frame on which a conversation between the bot and a user is considered active. While the session is active, all data interchanged between the bot and the user will be store and the bot is able to send messages to the user without having to wait for their input; the time the session is considered active varies between channels, for Whatsapp is 24 hours. Now, the bot also has a memory that lasts the whole session; with the remember action; we are telling the bot to store those values in memory. That way, if we need to refer to the user's name or country, we don't have to hit the database again.

Wrapping up

So that's all for this post folks! Initially, this was to be a two-post series, but it overextended a bit. So for the sake of not boring the heck out of you, I decided to make this into a three-post series. In the next post, we'll implement the endpoint to store the name, deploy this thing live and a few more tricks ;)

💖 💪 🙅 🚩
alessandrojcm
Alessandro Cuppari

Posted on May 20, 2020

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related