Personal Knowledge Graphs in Relational Model
Volodymyr Pavlyshyn
Posted on April 8, 2024
Graph databases come with a wide range of graph-oriented query languages, from Cypher to GraphQL and custom ones. They can be optimized for storing and processing big graphs, but they take time to learn and master, and the learning curve is sometimes quite steep. Personal Knowledge Graphs usually have a much smaller scale and are part of user applications or personal knowledge systems.
In a classical application, only a part of the data has a graph nature, and we could have a mixed setup of regular relational data and graphs.
In AI-powered applications, we have a mixed case of:
- graph data
- vectors and vector indexes
- regular documents
It is hard to find a database that satisfies all these conditions.
I have been happy with CozoDB for a long time. You could combine pgvector and Apache AGE for Postgres and, together with Postgres's document-like features, build a lot. Sometimes, we need an embeddable database, and here the leader is SQLite. We are still waiting for a PGlite implementation that brings Postgres to the edge.
We will avoid discussing the scalability of relational databases and leave it as a topic for a separate article. Still, relational structures are widespread and well known, and they offer many tools and a good developer experience.
Graphs are not relational structures, but we could adapt relations to achieve a good representation and performance.
If you have small and fixed graphs, you could represent them as an adjacency matrix, but this model does not scale well. Any model should be optimized for your queries and needs; the models below are subjective.
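As a rough illustration of the matrix representation for a small, fixed graph, here is a minimal Python sketch. The node labels and the helper names `add_edge` and `successors` are hypothetical and only serve the example. The matrix needs one cell per pair of nodes, which is why it stops being practical as the graph grows.

```python
# Hypothetical node labels for a tiny, fixed graph.
labels = ["Me", "KGraph", "SQLite"]
index = {label: i for i, label in enumerate(labels)}

# One cell per (source, target) pair: n * n cells for n nodes.
matrix = [[0] * len(labels) for _ in labels]


def add_edge(source, target):
    """Mark a directed edge source -> target in the matrix."""
    matrix[index[source]][index[target]] = 1


def successors(source):
    """Return the labels of all nodes that source points to."""
    row = matrix[index[source]]
    return [labels[j] for j, flag in enumerate(row) if flag]


add_edge("Me", "KGraph")
add_edge("KGraph", "SQLite")
me_successors = successors("Me")
```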
Directed Graph
A simple and common type of graph is a simple directed graph, where each edge connects one node to another.
It is easy to model as a relational structure.
So sample data
Nodes
Edges
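The Nodes and Edges relations above could be sketched with SQLite roughly as follows. This is a minimal sketch and not a definitive schema: the column names, the sample rows, and the helpers `outgoing` and `reachable` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (
    id    INTEGER PRIMARY KEY,
    label TEXT NOT NULL
);
CREATE TABLE edges (
    id        INTEGER PRIMARY KEY,
    source_id INTEGER NOT NULL REFERENCES nodes(id),
    target_id INTEGER NOT NULL REFERENCES nodes(id),
    label     TEXT NOT NULL
);
""")

# Hypothetical sample data.
conn.executemany(
    "INSERT INTO nodes (id, label) VALUES (?, ?)",
    [(1, "Me"), (2, "KGraph"), (3, "SQLite")],
)
conn.executemany(
    "INSERT INTO edges (source_id, target_id, label) VALUES (?, ?, ?)",
    [(1, 2, "builds"), (2, 3, "stored_in")],
)


def outgoing(conn, label):
    """Direct neighbours: (edge label, target label) pairs for one node."""
    return conn.execute("""
        SELECT e.label, t.label
        FROM edges e
        JOIN nodes s ON s.id = e.source_id
        JOIN nodes t ON t.id = e.target_id
        WHERE s.label = ?
        ORDER BY e.id
    """, (label,)).fetchall()


def reachable(conn, label):
    """All nodes reachable from a start node, via a recursive CTE.

    UNION (not UNION ALL) removes duplicates, so cycles terminate.
    """
    rows = conn.execute("""
        WITH RECURSIVE walk(id) AS (
            SELECT id FROM nodes WHERE label = ?
            UNION
            SELECT e.target_id FROM edges e JOIN walk w ON e.source_id = w.id
        )
        SELECT n.label
        FROM nodes n JOIN walk w ON n.id = w.id
        WHERE n.label <> ?
        ORDER BY n.id
    """, (label, label)).fetchall()
    return [row[0] for row in rows]


me_edges = outgoing(conn, "Me")
me_reach = reachable(conn, "Me")
```

Traversals over this schema are plain joins, and multi-hop queries fit into a recursive common table expression.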
RDF Like Graphs
In the Resource Description Framework (RDF), nodes and edges are not directly differentiated, and the same resource can act as an edge or as a node depending on context. All data is stored as triples of resources. Sometimes modeling graphs closer to RDF is helpful, but reasoning about and building queries in this model is hard. I prefer separate relations for nodes and edges.
Classical RDF has no dedicated label and models it as one more triple, but because labels are needed so often, I add the label as a column.
Resource
Triple
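One possible shape for the Resource and Triple relations is sketched below with SQLite. The table layout, column names, and sample rows are assumptions for illustration. Note that the predicate `builds` is itself a resource and appears as the subject of another triple, which is the "no strict node/edge split" property described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE resources (
    id    INTEGER PRIMARY KEY,
    label TEXT NOT NULL          -- label kept as a column, not as a triple
);
CREATE TABLE triples (
    subject_id   INTEGER NOT NULL REFERENCES resources(id),
    predicate_id INTEGER NOT NULL REFERENCES resources(id),
    object_id    INTEGER NOT NULL REFERENCES resources(id),
    PRIMARY KEY (subject_id, predicate_id, object_id)
);
""")

# Hypothetical sample data.
conn.executemany(
    "INSERT INTO resources (id, label) VALUES (?, ?)",
    [(1, "Me"), (2, "builds"), (3, "KGraph"), (4, "is_a"), (5, "Relation")],
)
conn.executemany(
    "INSERT INTO triples (subject_id, predicate_id, object_id) VALUES (?, ?, ?)",
    [
        (1, 2, 3),  # Me builds KGraph
        (2, 4, 5),  # builds is_a Relation: a predicate used as a subject
    ],
)


def triples_as_labels(conn):
    """Resolve every triple to (subject, predicate, object) labels."""
    return conn.execute("""
        SELECT s.label, p.label, o.label
        FROM triples t
        JOIN resources s ON s.id = t.subject_id
        JOIN resources p ON p.id = t.predicate_id
        JOIN resources o ON o.id = t.object_id
        ORDER BY t.subject_id, t.predicate_id, t.object_id
    """).fetchall()


all_triples = triples_as_labels(conn)
```

Every lookup needs three joins against the same table, which hints at why queries in this model are harder to write and reason about.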
Named Graphs and Graph of Graphs
The concept of the named graph came from the RDF community, which needed to group some sets of triples. In this way, you form subgraphs inside an existing graph. You could refer to the subgraph as a regular node. This setup simplifies complex graphs, introduces hierarchies, and even adds features and properties of hypergraphs while keeping a directed nature.
It looks complex, but it is easy to model by slightly modifying a directed graph.
So, a node could host a graph inside. Let's reflect this fact with a location for a node. If a node belongs to the main graph, we could set the location to null or introduce a main node. It is up to you.
Nodes could have edges to nodes in different subgraphs. This structure allows arbitrary nesting of graphs. Edges stay location-free.
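The location idea could be sketched with SQLite as a nullable self-reference on the node table. This is a sketch under assumptions: the column name `location_id`, the sample rows, and the helper names are hypothetical, and a null location stands for the main graph.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (
    id          INTEGER PRIMARY KEY,
    label       TEXT NOT NULL,
    location_id INTEGER REFERENCES nodes(id)   -- NULL means the main graph
);
CREATE TABLE edges (
    id        INTEGER PRIMARY KEY,
    source_id INTEGER NOT NULL REFERENCES nodes(id),
    target_id INTEGER NOT NULL REFERENCES nodes(id),
    label     TEXT NOT NULL                     -- edges carry no location
);
""")

# Hypothetical sample data: KGraph hosts a subgraph, Queries hosts another.
conn.executemany(
    "INSERT INTO nodes (id, label, location_id) VALUES (?, ?, ?)",
    [
        (1, "Me", None),
        (2, "KGraph", None),
        (3, "Schema", 2),
        (4, "Queries", 2),
        (5, "Reachability", 4),
    ],
)
conn.executemany(
    "INSERT INTO edges (source_id, target_id, label) VALUES (?, ?, ?)",
    [(1, 2, "builds"), (3, 4, "supports"), (1, 5, "wrote")],
)


def members(conn, graph_label):
    """Nodes located directly inside the graph hosted by a node."""
    rows = conn.execute("""
        SELECT label FROM nodes
        WHERE location_id = (SELECT id FROM nodes WHERE label = ?)
        ORDER BY id
    """, (graph_label,)).fetchall()
    return [row[0] for row in rows]


def path_to_main(conn, label):
    """Walk the location chain from a node up to the main graph."""
    rows = conn.execute("""
        WITH RECURSIVE chain(id, label, location_id, depth) AS (
            SELECT id, label, location_id, 0 FROM nodes WHERE label = ?
            UNION ALL
            SELECT n.id, n.label, n.location_id, chain.depth + 1
            FROM nodes n JOIN chain ON n.id = chain.location_id
        )
        SELECT label FROM chain ORDER BY depth
    """, (label,)).fetchall()
    return [row[0] for row in rows]


def cross_graph_edges(conn):
    """Edges whose endpoints live in different (sub)graphs."""
    return conn.execute("""
        SELECT s.label, e.label, t.label
        FROM edges e
        JOIN nodes s ON s.id = e.source_id
        JOIN nodes t ON t.id = e.target_id
        WHERE s.location_id IS NOT t.location_id   -- NULL-safe comparison
        ORDER BY e.id
    """).fetchall()


kgraph_members = members(conn, "KGraph")
nesting = path_to_main(conn, "Reachability")
crossing = cross_graph_edges(conn)
```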
Hypergraph
A hypergraph is a mathematical generalization of graphs where a hyperedge could connect multiple or no nodes. So, you have a set of nodes instead of a pair of nodes. Hypergraphs are an emerging domain for modeling complex and dynamic systems and are widely used for temporal and event-dependent graphs. We will model undirected hypergraphs.
Usually, a hypergraph is drawn as sets that overlap or as Venn diagrams.
Since edges now have a many-to-many relation with nodes, we just need a join table.
Nodes
Edges
Edge to nodes
The table could grow quickly if you have a big edge with a wide set of nodes.
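The three relations above (nodes, edges, and the edge-to-nodes join table) could look roughly like this in SQLite. Table names, column names, and sample rows are assumptions for illustration. The join table holds one row per (edge, node) membership, so a wide hyperedge adds as many rows as it has members.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (
    id    INTEGER PRIMARY KEY,
    label TEXT NOT NULL
);
CREATE TABLE edges (
    id    INTEGER PRIMARY KEY,
    label TEXT NOT NULL
);
CREATE TABLE edge_nodes (
    edge_id INTEGER NOT NULL REFERENCES edges(id),
    node_id INTEGER NOT NULL REFERENCES nodes(id),
    PRIMARY KEY (edge_id, node_id)   -- a node is in an edge at most once
);
""")

# Hypothetical sample data, including one empty hyperedge.
conn.executemany(
    "INSERT INTO nodes (id, label) VALUES (?, ?)",
    [(1, "Me"), (2, "KGraph"), (3, "SQLite")],
)
conn.executemany(
    "INSERT INTO edges (id, label) VALUES (?, ?)",
    [(1, "project"), (2, "storage"), (3, "empty")],
)
conn.executemany(
    "INSERT INTO edge_nodes (edge_id, node_id) VALUES (?, ?)",
    [(1, 1), (1, 2), (2, 2), (2, 3)],
)


def edge_members(conn, edge_label):
    """Labels of all nodes inside one hyperedge."""
    rows = conn.execute("""
        SELECT n.label
        FROM edge_nodes en
        JOIN edges e ON e.id = en.edge_id
        JOIN nodes n ON n.id = en.node_id
        WHERE e.label = ?
        ORDER BY n.id
    """, (edge_label,)).fetchall()
    return [row[0] for row in rows]


def edge_sizes(conn):
    """Number of nodes per hyperedge; LEFT JOIN keeps empty edges."""
    return conn.execute("""
        SELECT e.label, COUNT(en.node_id)
        FROM edges e
        LEFT JOIN edge_nodes en ON en.edge_id = e.id
        GROUP BY e.id, e.label
        ORDER BY e.id
    """).fetchall()


project_members = edge_members(conn, "project")
sizes = edge_sizes(conn)
```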
HyperGraph with Edges as Nodes
As you noticed, we could point the Me node to KGraph because KGraph is now an edge. So, in a hypergraph, edges are sets of nodes. If we want a graph-of-graphs-like setup, we need the ability to use edges as nodes, the same way RDF does with a resource.
We could simplify a lot of relations and create more complex structures.
To achieve this, we could combine an RDF-like schema with a hypergraph schema. The model would still remain relatively simple.
We couldn’t reuse edges as we did in RDF because they contain different resources.
We could deduce pure nodes and edges from relations. Unfortunately, as long as you allow empty edges, there is no way to differentiate empty edges from single nodes. Hypergraphs could model named graphs and graphs of graphs, but in my experience, named graphs are more convenient to model using nodes with a location.
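A combined schema along these lines could be sketched in SQLite with a single resource table plus a membership table. This is only one possible reading of the model: the table names, column names, and sample rows are assumptions. The sketch also shows the limitation mentioned above: the resource `Draft` is intended as an empty edge, yet it is deduced as a pure node because it has no members.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE resources (
    id    INTEGER PRIMARY KEY,
    label TEXT NOT NULL
);
-- A resource acts as a hyperedge when it has members,
-- and a member can be any resource, including another edge.
CREATE TABLE memberships (
    edge_id   INTEGER NOT NULL REFERENCES resources(id),
    member_id INTEGER NOT NULL REFERENCES resources(id),
    PRIMARY KEY (edge_id, member_id)
);
""")

# Hypothetical sample data.
conn.executemany(
    "INSERT INTO resources (id, label) VALUES (?, ?)",
    [(1, "Me"), (2, "KGraph"), (3, "Schema"),
     (4, "Queries"), (5, "works_on"), (6, "Draft")],
)
conn.executemany(
    "INSERT INTO memberships (edge_id, member_id) VALUES (?, ?)",
    [
        (2, 3), (2, 4),  # KGraph = {Schema, Queries}
        (5, 1), (5, 2),  # works_on = {Me, KGraph}: an edge used as a node
    ],
)


def deduced_edges(conn):
    """Resources that have at least one member."""
    rows = conn.execute("""
        SELECT label FROM resources
        WHERE id IN (SELECT edge_id FROM memberships)
        ORDER BY id
    """).fetchall()
    return [row[0] for row in rows]


def deduced_nodes(conn):
    """Resources with no members: pure nodes and empty edges alike."""
    rows = conn.execute("""
        SELECT label FROM resources
        WHERE id NOT IN (SELECT edge_id FROM memberships)
        ORDER BY id
    """).fetchall()
    return [row[0] for row in rows]


def members(conn, edge_label):
    """Labels of the resources inside one edge."""
    rows = conn.execute("""
        SELECT r.label
        FROM memberships m
        JOIN resources e ON e.id = m.edge_id
        JOIN resources r ON r.id = m.member_id
        WHERE e.label = ?
        ORDER BY r.id
    """, (edge_label,)).fetchall()
    return [row[0] for row in rows]


edge_labels = deduced_edges(conn)
node_labels = deduced_nodes(conn)
works_on_members = members(conn, "works_on")
```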
Conclusion
Relational and embeddable databases could be a good choice for small-scale and personal knowledge graphs. I have had a lot of positive experience with graph structures on a relational model with Datalog. Also, any Datalog database with persistence could give you good results. Most static, fact-based semantic graphs work well with a simple directed graph. I am increasingly working with AI applications in which complex ideas, conversations, or events could contain subgraphs and multiple entities. In this case, a directed graph or a simple triple is not enough. As for me, graphs of graphs and named graphs give good results for this task and still stay close to what SPARQL 1.1 and Turtle could model.
    # N-Graphs
    <http://example.org/alice/foaf.rdf> {
     <http://example.org/alice/foaf.rdf#me> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
     <http://example.org/alice/foaf.rdf#me> <http://xmlns.com/foaf/0.1/name> "Alice" .
    }
    <http://example.org/bob/foaf.rdf> {
     <http://example.org/bob/foaf.rdf#me> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
     <http://example.org/bob/foaf.rdf#me> <http://xmlns.com/foaf/0.1/name> "Bob" .
    }
Hypergraphs are a newer and more robust tool, well suited for complex, dynamic, and temporal-aware systems. There are not yet many standard tools or industry standards for working with hypergraphs. Hypergraphs where edges could be used as nodes deviate from the classical mathematical model but give the most flexible platform for modeling. Sometimes, this model could reduce the number of edges. More general models are simple to store but more complex to reason about and query, so you need to find a balance yourself.