Design a Query Language Client for Pythonistas and Data Scientists

cheukting_ho

Cheuk Ting Ho 🐍

Posted on March 16, 2020

Having an effective yet user-friendly query language is one of the most important aspects of successful database technology. Here at TerminusDB, we aim to provide a good experience to our users whose main language is Python. Enter WOQLpy — a Query Language Client for Pythonistas and Data Scientists.

What is WOQLpy?

WOQLpy comes with the Terminus Python Client, which you can install with pip:

pip install terminus-client-python

Example usage:

import woqlclient.woqlClient as woql
from woqlclient import WOQLQuery

server_url = "http://localhost:6363"
key = "root"
dbId = "pybike"

# Connect to the TerminusDB server and create the database
client = woql.WOQLClient()
client.connect(server_url, key)
client.createDatabase(dbId, "Bicycle Graph")

# Define two document classes: Station and Journey
station = WOQLQuery().doctype("Station").label("Bike Station")
journey = WOQLQuery().doctype("Journey")
journey = journey.label("Journey")
# A Journey links to its start and end Station
journey = journey.property("start_station", "Station").label("Start Station")
journey = journey.property("end_station", "Station").label("End Station")

# Combine both definitions into one schema update and send it
schema = WOQLQuery().when(True).woql_and(station, journey)
schema.execute(client)

The complete script can be found on GitHub.

As you can see, the Python client can execute a query created with WOQLpy. This works because WOQLpy lets you build WOQL, the original query language in JSON-LD format, using methods on the WOQLQuery object, and then send it to TerminusDB through the Python client. Without WOQLpy, queries have to be written in JSON-LD by hand. For example, the query above would be:

{
  "when": [
    {
      "true": []
    },
    {
      "and": [
        {
          "add_quad": [
            "scm:Station",
            "rdf:type",
            "owl:Class",
            "db:schema"
          ]
        },
        {
          "add_quad": [
            "scm:Station",
            "rdfs:subClassOf",
            "tcs:Document",
            "db:schema"
          ]
        },
        {
          "add_quad": [
            "scm:Station",
            "rdfs:label",
            {
              "@value": "Bike Station",
              "@language": "en"
            },
            "db:schema"
          ]
        },
        {
          "add_quad": [
            "scm:end_station",
            "rdf:type",
            "owl:ObjectProperty",
            "db:schema"
          ]
        },
        {
          "add_quad": [
            "scm:end_station",
            "rdfs:range",
            "scm:Station",
            "db:schema"
          ]
        },
        {
          "add_quad": [
            "scm:end_station",
            "rdfs:domain",
            "scm:Journey",
            "db:schema"
          ]
        },
        {
          "and": [
            {
              "add_quad": [
                "scm:start_station",
                "rdf:type",
                "owl:ObjectProperty",
                "db:schema"
              ]
            },
            {
              "add_quad": [
                "scm:start_station",
                "rdfs:range",
                "scm:Station",
                "db:schema"
              ]
            },
            {
              "add_quad": [
                "scm:start_station",
                "rdfs:domain",
                "scm:Journey",
                "db:schema"
              ]
            },
            {
              "and": [
                {
                  "add_quad": [
                    "scm:Journey",
                    "rdf:type",
                    "owl:Class",
                    "db:schema"
                  ]
                },
                {
                  "add_quad": [
                    "scm:Journey",
                    "rdfs:subClassOf",
                    "tcs:Document",
                    "db:schema"
                  ]
                },
                {
                  "add_quad": [
                    "scm:Journey",
                    "rdfs:label",
                    {
                      "@value": "Journey",
                      "@language": "en"
                    },
                    "db:schema"
                  ]
                }
              ]
            },
            {
              "add_quad": [
                "scm:start_station",
                "rdfs:label",
                {
                  "@value": "Start Station",
                  "@language": "en"
                },
                "db:schema"
              ]
            }
          ]
        },
        {
          "add_quad": [
            "scm:end_station",
            "rdfs:label",
            {
              "@value": "End Station",
              "@language": "en"
            },
            "db:schema"
          ]
        }
      ]
    }
  ]
}
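
To make the connection between the chained methods and the JSON-LD above more concrete, here is a minimal sketch of how a fluent builder could accumulate add_quad clauses. ToyQuery is a hypothetical stand-in written for this illustration; it is not the real WOQLQuery class and only mimics the Station part of the output shown above.

```python
import json


class ToyQuery:
    """Toy fluent builder for illustration only; NOT the real WOQLQuery."""

    def __init__(self):
        self.quads = []      # accumulated add_quad clauses
        self.subject = None  # class currently being described

    def _add(self, predicate, obj):
        # Every clause targets the schema graph, as in the JSON-LD above
        self.quads.append(
            {"add_quad": [self.subject, predicate, obj, "db:schema"]}
        )

    def doctype(self, name):
        self.subject = "scm:" + name
        self._add("rdf:type", "owl:Class")
        self._add("rdfs:subClassOf", "tcs:Document")
        return self  # returning self is what allows method chaining

    def label(self, text, language="en"):
        self._add("rdfs:label", {"@value": text, "@language": language})
        return self

    def json(self):
        return {"and": self.quads}


station = ToyQuery().doctype("Station").label("Bike Station")
print(json.dumps(station.json(), indent=2))
```

Each method appends its clauses and returns the builder itself, which is why the calls can be chained on one line.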

Challenges in design

It’s nice to have WOQLpy, but adopting WOQL in Python comes with some challenges. For example, you may have noticed a method called woql_and in the example. Some WOQL operations share their names with Python keywords, to name a few: and, or, not, as, from. For these, we added the woql_ prefix so that the method names are valid in Python.
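
The naming rule can be sketched with Python's standard keyword module. The helper python_method_name below is hypothetical and only illustrates the idea; it is not how WOQLpy is actually implemented.

```python
import keyword


def python_method_name(operation, prefix="woql_"):
    """Return a method name that is safe to use in Python.

    Hypothetical helper: operations that collide with a Python
    keyword get the prefix, all others are left unchanged.
    """
    if keyword.iskeyword(operation):
        return prefix + operation
    return operation


# "when" is not a Python keyword, so it keeps its name
for op in ["and", "or", "not", "as", "from", "when"]:
    print(op, "->", python_method_name(op))
```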

Another challenge is that many of the methods accept more than one form of input. For example, .label accepts either a plain string "Journey" or a dictionary {"@value": "Journey", "@language": "en"}, so we have to do a lot of checking to handle the different user inputs. But it is totally worth it, as we believe it gives a better user experience.
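
As a rough illustration of that kind of input checking, the sketch below normalizes both accepted forms into the dictionary form. The function normalize_label and its default_language parameter are assumptions made for this example, not WOQLpy's actual internals.

```python
def normalize_label(label, default_language="en"):
    """Turn a label given as str or dict into the dictionary form.

    Hypothetical helper for illustration; the default language "en"
    is an assumption taken from the example output in this post.
    """
    if isinstance(label, str):
        return {"@value": label, "@language": default_language}
    if isinstance(label, dict):
        if "@value" not in label:
            raise ValueError('label dictionary needs an "@value" key')
        normalized = dict(label)  # copy so the caller's dict is not mutated
        normalized.setdefault("@language", default_language)
        return normalized
    raise TypeError("label must be a str or a dict, got " + type(label).__name__)


print(normalize_label("Journey"))
print(normalize_label({"@value": "Journey", "@language": "en"}))
```

Normalizing at the boundary means the rest of the builder only ever has to deal with one shape of data.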

Make it better for Data Scientists

We are still at the beginning of our journey to make TerminusDB super useful for data scientists. One feature we have in mind is embedding our cool graph visualization in Jupyter notebooks. Right now it can be embedded in websites thanks to our JavaScript client, and we should be able to do the same in Jupyter notebooks with the Python client.

Also, we think it would be nice to have direct I/O with pandas DataFrames instead of passing data through CSV files.


TerminusDB is open source, and we would like to invite you to try it out and give us feedback and ideas. Please feel free to join the Slack channel and chat with the team: http://bit.ly/terminusdb-slack
