Personal Knowledge Graphs in AI RAG on user phone
Volodymyr Pavlyshyn
Posted on September 30, 2024
Graphs and vector search are a potent tandem for AI-powered applications, which are booming nowadays. Personal knowledge graphs are the core of semantic memory for many agentic AI applications.
At Mykin, we build an agentic AI architecture with a complex memory model that runs directly on the user's device.
Mykin is a privacy-focused AI agent built on top of sovereign data owned by its users.
Kin. A personal AI for your work life
Get inspired, talk things through, navigate situations or get personalized guidance with Kin. Built for privacy…
mykin.ai
Our North Stars
The technical North Stars of Mykin are:
- Privacy by design: a guide on how to keep the architecture secure and private
- SSI principles: a focus on user data ownership and sovereignty
- Local-first architecture: give the user an instrument they own
Data Ownership
All these North Stars have one aspect in common: data ownership. The user has full control and ownership of the data. This means that we shift from the classical all-in-cloud centralized model to a local-first architecture, where data is stored and processed on a mesh of user devices, with some cloud services or capabilities potentially involved.
So we need to run complex RAG, vector search, and vector graph clustering primarily on the user's device.
Expectations for database capabilities
- General queries on structured data (regular application data) like messages, conversations, settings, etc.
- Vector search and similarity search capabilities for RAG pipelines and different LLM and ML-powered flows
- Graph and graph search capabilities (ML and semantic memory)
Since we work on mobile, we have a few technical requirements, too:
- embeddable with good support for mobile bindings
- single file database that simplifies a backup
- portable
- battery friendly
- fast and nonblocking io as much as possible
- wide community support
- reliability
LibSQL
If you follow my articles, you already know the answer: LibSQL.
I described the full journey of vector search and graphs on top of relational models in my articles
Personal Knowledge Graphs in AI RAG-powered Applications with libSQL
I spend a long time working on privacy first personal ai assistant
ai.plainenglish.io
That leaves one question: how do we run LibSQL on a user device?
We are using React Native, so the library should have React Native bindings.
LibSQL on React Native
There are plenty of libraries for React Native that run SQLite, but not LibSQL:
react-native-sqlite-storage
Widely used with support for transactions and raw SQL queries.
Supports both Android and iOS.
Provides a promise-based API.
react-native-sqlite-2
A lightweight alternative.
Based on a WebSQL API.
Works well for simple databases but has limited features compared to react-native-sqlite-storage.
react-native-sqlite
Similar to react-native-sqlite-storage, but with more minimalistic features.
Might require manual linking.
watermelondb
Built on top of SQLite but offers a more modern approach.
Designed for highly scalable databases in React Native.
Provides an ORM-like interface and works with large datasets efficiently.
expo-sqlite (if using Expo)
Built-in SQLite support for Expo apps.
It is lightweight and easy to use but has fewer advanced features than other libraries.
expo-sqlite is now the de facto library for SQLite in the Expo ecosystem, and my first idea was to convince the community to add libsql as an engine, or to fork it and use it for our internal needs.
It was much more challenging than I expected. Sometimes a big open-source project is a closed door for new ideas and improvements, and it is a door that is hard to knock on.
OP-SQLite
OP SQLite Documentation | Notion
Built with Notion, the all-in-one connected workspace with publishing capabilities.
ospfranco.notion.site
"The fastest SQLite library for react-native" by Ospfranco is what I read when I first found OP-SQLite on GitHub. And it is.
It has a few interesting features for a React Native app.
Async operations
The default query runs synchronously on the JS thread. There are async versions for some of the operations. This will offload the SQLite processing to a different thread and prevent UI blocking. It is also real multi-concurrency, so it won’t bog down the event loop.
Raw execution
If you don’t care about the keys you can use a simplified execution that will return an array of results.
Hooks
You can subscribe to changes in your database by using an update hook that gives you the full row:
// Bear in mind: rowId is not your table primary key but the internal rowId sqlite uses
// to keep track of the table rows
db.updateHook(({ rowId, table, operation, row = {} }) => {
  console.warn(`Hook has been called, rowId: ${rowId}, ${table}, ${operation}`);
  // Will contain the entire row that changed
  // only on UPDATE and INSERT operations
  console.warn(JSON.stringify(row, null, 2));
});

db.execute('INSERT INTO "User" (id, name, age, networth) VALUES(?, ?, ?, ?)', [
  id,
  name,
  age,
  networth,
]);
Extension Load
It was the first library that allowed me to load an extension by myself. Even better, Oskar added the CR-SQLite extension as a library option, so it works out of the box!
Open to Cooperation
One of LibSQL's mottos is to be open to contributions. Oskar was just as open: he saw the benefits of libsql and added it as an option to op-sqlite.
Little How-To
So, how do you build a vector-search-aware personal knowledge graph on a user device?
I assume that you already have a React Native or Expo project. You need to add op-sqlite:
yarn add @op-engineering/op-sqlite
You need version 7.3.0 or later.
Now let's configure libsql. You need to add this section to your package.json:
"op-sqlite": {
  "libsql": true
}
Since we are building a polymorphic library that runs not only on the device but also on Node.js, I made an abstraction that allows me to swap libsql implementations.
// @ts-nocheck
import {
  open as openLibsql,
  OPSQLiteConnection,
  QueryResult,
  Transaction,
} from '@op-engineering/op-sqlite';
import {
  BatchQueryOptions,
  DataQuery,
  DataQueryResult,
  IDataStore,
  UpdateCallbackParams,
  StoreOptions,
} from '@mykin-ai/kin-core';
import { documentDirectory } from 'expo-file-system';

export class DataStoreService implements IDataStore {
  private _db: OPSQLiteConnection | undefined;
  private _isOpen = false;
  public _name: string;
  private _location: string;
  public useCrSql = true;
  private _options: StoreOptions;

  constructor(
    name = ':memory:',
    location = documentDirectory,
    options: StoreOptions = {
      vectorDimension: 512,
      vectorType: 'F32',
      vectorNeighborsCompression: 'float8',
      vectorMaxNeighbors: 20,
      dataAutoSync: false,
      failOnErrors: false,
      reportErrors: true,
    },
  ) {
    this._name = name;
    this._options = options;
    // op-sqlite expects a plain filesystem path, so strip the file:// scheme
    if (location?.startsWith('file://')) {
      this._location = location.split('file://')[1];
    } else {
      this._location = location;
    }
    // and drop a trailing slash
    if (this._location.endsWith('/')) {
      this._location = this._location.slice(0, -1);
    }
  }

  getVectorOption() {
    return {
      dimension: this._options.vectorDimension,
      type: this._options.vectorType,
      compression: this._options.vectorNeighborsCompression,
      maxNeighbors: this._options.vectorMaxNeighbors,
    };
  }
  // Runs a query through the raw API: rows come back as arrays of values
  async query(query: string, params?: any[] | undefined): Promise<DataQueryResult> {
    try {
      await this.open(this._name);
      // SQLite has no boolean or undefined: map them to 1/0 and null
      const paramsWithCorrectTypes = params?.map((param) => {
        if (param === undefined || param === null) {
          return null;
        }
        if (param === true) {
          return 1;
        }
        if (param === false) {
          return 0;
        }
        return param;
      });
      const data = await this._db.executeRawAsync(query, paramsWithCorrectTypes);
      return {
        isOk: true,
        data,
      };
    } catch (e) {
      console.error(e.code, e.message);
      return {
        isOk: false,
        data: [],
        errorCode: e.code || 'N/A',
        error: e.message,
      };
    }
  }

  // Runs a statement through the keyed API: rows come back as objects
  async execute(query: string, params?: any[] | undefined): Promise<DataQueryResult> {
    try {
      await this.open(this._name);
      const paramsWithCorrectTypes = params?.map((param) => {
        if (param === undefined || param === null) {
          return null;
        }
        if (param === true) {
          return 1;
        }
        if (param === false) {
          return 0;
        }
        return param;
      });
      const data = await this._db.executeAsync(query, paramsWithCorrectTypes);
      return {
        isOk: true,
        data: data.rows?._array ?? [],
      };
    } catch (e) {
      console.error(e);
      return {
        isOk: false,
        data: [],
        errorCode: e.code || 'N/A',
        error: e.message,
      };
    }
  }
  async open(name: string): Promise<boolean> {
    try {
      if (this._isOpen && name === this._name) {
        return true;
      }
      // A different database was requested: close the current one first
      if (this._isOpen && name !== this._name) {
        await this.close();
        this._isOpen = false;
      }
      this._name = name;
      this._db = openLibsql({
        name: this._name,
        location: this._location,
      });
      console.log('Opened db');
      this._isOpen = true;
      return true;
    } catch (e) {
      // eslint-disable-next-line no-console
      console.error("couldn't open db", e);
      return false;
    }
  }

  async isOpen(): Promise<boolean> {
    return Promise.resolve(this._isOpen);
  }

  async close(): Promise<boolean> {
    if (this.useCrSql) {
      this._db.execute(`select crsql_finalize();`);
    }
    this._db.close();
    this._isOpen = false;
    return Promise.resolve(true);
  }
}
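The query and execute methods above repeat the same parameter coercion, and the constructor cleans up the location inline. As a minimal sketch, both could be pulled out into small pure helpers. The names normalizeParams and normalizeLocation are hypothetical: they are not part of op-sqlite or of the class above, they only mirror its logic.

```typescript
// Hypothetical helpers; the names are illustrative and not part of op-sqlite.
type SqlParam = string | number | null | ArrayBuffer;

// Map JS values to something SQLite can bind:
// undefined/null -> null, true -> 1, false -> 0, everything else unchanged.
function normalizeParams(params: unknown[] = []): SqlParam[] {
  return params.map((param) => {
    if (param === undefined || param === null) return null;
    if (param === true) return 1;
    if (param === false) return 0;
    return param as SqlParam;
  });
}

// Strip a leading file:// scheme and a single trailing slash from a directory path.
function normalizeLocation(location: string): string {
  const withoutScheme = location.startsWith('file://')
    ? location.slice('file://'.length)
    : location;
  return withoutScheme.endsWith('/') ? withoutScheme.slice(0, -1) : withoutScheme;
}
```

With helpers like these, query and execute would differ only in which op-sqlite call they make and how they unpack the result.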
Now we are ready to create the graph tables and indexes. I'll skip the entire class, as it is too long, and show only the essential parts.
const vectorOptions = this._store.getVectorOption()
This gives us the vector configuration, such as the vector value type and the embedding dimension, as well as the vector index parameters.
const createR = await this._store.execute(`
  create table if not exists edge (
    id varchar(36) primary key not null,
    fromId varchar(36) not null default '',
    toId varchar(36) not null default '',
    label varchar not null default '',
    displayLabel varchar not null default '',
    vectorTriple ${vectorOptions.type}_BLOB(${vectorOptions.dimension}),
    createdAt real,
    updatedAt real,
    source varchar(36) default 'N/A',
    type varchar default 'edge',
    meta text default '{}'
  );
`)
Now we have a triple store that references nodes, so we need the node table as well:
const createR = await this._store.execute(`
  create table if not exists node (
    id varchar(36) primary key not null,
    label varchar not null default '',
    vectorLabel ${vectorOptions.type}_BLOB(${vectorOptions.dimension}),
    displayLabel varchar not null default '',
    createdAt real,
    updatedAt real,
    source varchar(36) default 'N/A',
    type varchar default 'node',
    entity text default '{}',
    meta text default '{}'
  );
`)
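Both tables interpolate the vector column type straight into the DDL, so it may be worth validating the options before they reach the SQL string. The helper below is a rough sketch under stated assumptions: vectorColumnType is a hypothetical name, and the sketch only allows the F32 and F64 types, so extend the list if your libSQL version supports more.

```typescript
// Hypothetical helper: builds the column type used in the DDL, e.g. F32_BLOB(512).
type VectorType = 'F32' | 'F64'; // assumption: this sketch only allows these two

interface VectorColumnOptions {
  type: VectorType;
  dimension: number;
}

function vectorColumnType({ type, dimension }: VectorColumnOptions): string {
  // Runtime checks matter here because the value ends up in dynamic SQL.
  if (type !== 'F32' && type !== 'F64') {
    throw new Error(`Unsupported vector type: ${String(type)}`);
  }
  if (!Number.isInteger(dimension) || dimension <= 0) {
    throw new Error(`Invalid vector dimension: ${dimension}`);
  }
  return `${type}_BLOB(${dimension})`;
}
```

For the default options of the data store, vectorColumnType({ type: 'F32', dimension: 512 }) yields F32_BLOB(512), which is what the two create table statements expand to.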
If you want to know how to model graphs in a relational database, read my article:
Personal Knowledge Graphs. Semantic Entity Persistence in Relational Model
In my last two articles, we modeled different kinds of graphs in a portable relational model.
blog.stackademic.com
Time to create an index:
const createIndex = await this._store.execute(`
  CREATE INDEX IF NOT EXISTS idx_edge_vectorTriple
  ON edge (libsql_vector_idx(vectorTriple${
    vectorOptions.compression !== 'none' ? `, 'compress_neighbors=${vectorOptions.compression}'` : ''
  }${vectorOptions.maxNeighbors ? `, 'max_neighbors=${vectorOptions.maxNeighbors}'` : ''}));
`)
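The nested template literals in the index statement are easy to get wrong, so here is a minimal sketch of the same idea as a standalone function. buildVectorIndexSql and its option names are hypothetical; the only libSQL pieces it relies on are libsql_vector_idx and the compress_neighbors and max_neighbors settings used above.

```typescript
// Hypothetical helper that assembles the CREATE INDEX statement for a libSQL vector index.
interface VectorIndexOptions {
  compression?: string; // e.g. 'float8'; 'none' or undefined skips the setting
  maxNeighbors?: number; // e.g. 20; 0 or undefined skips the setting
}

function buildVectorIndexSql(
  table: string,
  column: string,
  { compression, maxNeighbors }: VectorIndexOptions,
): string {
  // Each setting is passed to libsql_vector_idx as a quoted 'key=value' string.
  const settings: string[] = [];
  if (compression && compression !== 'none') {
    settings.push(`'compress_neighbors=${compression}'`);
  }
  if (maxNeighbors) {
    settings.push(`'max_neighbors=${maxNeighbors}'`);
  }
  const args = [column, ...settings].join(', ');
  return `CREATE INDEX IF NOT EXISTS idx_${table}_${column} ON ${table} (libsql_vector_idx(${args}));`;
}
```

Calling buildVectorIndexSql('edge', 'vectorTriple', { compression: 'float8', maxNeighbors: 20 }) produces the statement for the default store options, and the settings simply disappear when compression is 'none' or maxNeighbors is not set.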
We configure compress_neighbors and max_neighbors to get the smallest storage footprint. If you want to learn more about space complexity, read this article:
The space complexity of vector search indexes in LibSQL
Hey, so I continue my adventure in vector search and Graph clustering at
ai.plainenglish.io
Now we can create a triple:
const createOp = await this._store.execute(
  `
  insert into edge (id, fromId, toId, label, vectorTriple, displayLabel, createdAt, updatedAt)
  values (?, ?, ?, ?, vector(${this._store.toVector(
    await this.embeddingsService.embedDocument(`${fromNode.label} ${normalizedLabel} ${toNode.label}`)
  )}), ?, ?, ?);
  `,
  [
    this._getUuid(),
    fromNode.id,
    toNode.id,
    normalizedLabel,
    label,
    Date.now(),
    Date.now(),
  ]
)
Unfortunately, op-sqlite does not support a Float32Array as a parameter the way libsql does. As a workaround, we need a bit of dynamic SQL that puts a serialized vector directly into the query. My toVector method stringifies the Float32Array and takes care of the quotes. Please note that we pass the serialized array to the vector function in SQL. I hope that the next version of op-sqlite will support Float32Array parameters.
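The toVector method itself is not shown in this article. A minimal sketch of what such a helper could look like is below; it is an assumption about the shape of the method, not the actual implementation. Because the result is spliced into dynamic SQL, the sketch rejects anything that is not a finite number.

```typescript
// Hypothetical sketch of a toVector helper; the real implementation is not shown in the article.
function toVector(embedding: Float32Array | number[]): string {
  const values = Array.from(embedding);
  if (values.length === 0 || values.some((v) => !Number.isFinite(v))) {
    throw new Error('Embedding must be a non-empty list of finite numbers');
  }
  // Only number characters and commas end up between the quotes,
  // so the result is safe to splice into SQL as a string literal.
  return `'[${values.join(',')}]'`;
}
```

The returned string already contains the single quotes, so it can be used as vector(${toVector(embedding)}) inside a query template.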
Time to query!
const _top = top ?? 10
const vector = this._store.toVector(await this.embeddingsService.embedQuery(query))
const querySql = `
  select e.id, e.label, e.displayLabel, e.createdAt, e.updatedAt, e.source, e.type, e.meta,
    fn.label, fn.displayLabel, tn.label, tn.displayLabel,
    vector_distance_cos(e.vectorTriple, ${vector}) distance
  from vector_top_k('idx_edge_vectorTriple', ${vector}, ${_top}) as i
  inner join edge as e on i.id = e.rowid
  inner join node as fn on e.fromId = fn.id
  inner join node as tn on e.toId = tn.id
  where 1=1 ${maxDistance ? `and distance <= ${maxDistance}` : ''}
  order by distance
  limit ${_top};
`
const edgeData = await this._store.query(querySql)
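The query goes through the raw execution path, and several selected columns share the same name (label and displayLabel appear three times each). Assuming the raw API returns every row as a positional array in SELECT order, a small mapper keeps the rest of the code readable. EdgeHit and mapEdgeRow are hypothetical names used only for this sketch.

```typescript
// Hypothetical shape for one search hit; the field names are illustrative.
interface EdgeHit {
  id: string;
  label: string;
  displayLabel: string;
  fromLabel: string;
  toLabel: string;
  distance: number;
}

// Assumes each raw row is a positional array in the SELECT order of the query:
// 0 e.id, 1 e.label, 2 e.displayLabel, 3 e.createdAt, 4 e.updatedAt, 5 e.source,
// 6 e.type, 7 e.meta, 8 fn.label, 9 fn.displayLabel, 10 tn.label, 11 tn.displayLabel,
// 12 distance
function mapEdgeRow(row: unknown[]): EdgeHit {
  if (row.length < 13) {
    throw new Error(`Expected 13 columns, got ${row.length}`);
  }
  return {
    id: String(row[0]),
    label: String(row[1]),
    displayLabel: String(row[2]),
    fromLabel: String(row[8]),
    toLabel: String(row[10]),
    distance: Number(row[12]),
  };
}
```

If the select list changes, the indexes in the mapper have to change with it, which is the main cost of positional rows.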
A few notes:
- By default, the vector index works with rowid and returns rowid, so be careful with the joins: join the index result on rowid, not on your primary key.
- The index does not return the distance. Still, you can calculate it if you need it, as the query above does with vector_distance_cos.
- vector_top_k expects a top parameter and returns the top N items. If you have complex filtering or an external top limit, remember to set a much bigger top N so that enough candidates survive the filtering. In our case it is not an issue.
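The last note can be sketched as a pair of helpers: ask the index for more candidates than you need, then filter and trim in application code. The factor of 4 is an arbitrary assumption for illustration, and overFetchK and takeTop are hypothetical names.

```typescript
interface Scored {
  distance: number;
}

// Ask the index for more candidates than needed (the default factor is an arbitrary assumption).
function overFetchK(top: number, factor = 4): number {
  return Math.max(1, Math.ceil(top * factor));
}

// Drop rows above maxDistance (if given), sort by distance, keep the best `top`.
function takeTop<T extends Scored>(rows: T[], top: number, maxDistance?: number): T[] {
  return rows
    .filter((row) => maxDistance === undefined || row.distance <= maxDistance)
    .sort((a, b) => a.distance - b.distance)
    .slice(0, top);
}
```

In this setup you would pass overFetchK(top) to vector_top_k and apply takeTop to the mapped rows afterwards.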
Issues and challenges
I faced a few challenges in React Native, mainly on iOS. They are related to how native modules are compiled and linked on iOS.
One quite unpleasant issue: if another library in your project uses a different version of SQLite, it can unpredictably override the linking and break libsql completely.
Compilation Clashes
If you have other packages that depend on SQLite (especially if they compile it from source), you will have issues.
Some of the known offenders are:
expo-updates
expo-sqlite
cozodb
You will face duplicated symbols and/or header definitions, since each of the packages will try to compile SQLite from source. Even if they manage to compile, they might compile SQLite with different compilation flags, and you might face threading errors.
Unfortunately, there is no easy solution. You need to get rid of the double compilation by hand, either by patching the compilation of each package so that it still builds, or by removing the dependency on the package.
On Android you might be able to get away with just using a pickFirst strategy (here is an article on how to do that). On iOS, depending on the build system, you might be able to patch it via a pre-install hook in your Podfile, something like:
pre_install do |installer|
  installer.pod_targets.each do |pod|
    if pod.name.eql?('expo-updates')
      # Modify the configuration of the pod so it doesn't depend on the sqlite pod
    end
  end
end
Follow the op-sqlite docs for an up-to-date list of conflicting libraries:
Gotchas | Notion
Built with Notion, the all-in-one connected workspace with publishing capabilities.
ospfranco.notion.site
RNRestart crash
One more iOS issue: if for some reason you need to restart the app with react-native-restart, make sure that you close all connections first:
import { closeAllConnections } from '@storage/data-store-factory';
import RNRestart from 'react-native-restart';

export const restartApplication = async (): Promise<void> => {
  // Close every open database before the JS bundle is reloaded
  await closeAllConnections();
  RNRestart.restart();
};
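The closeAllConnections function comes from our own storage module, which is not shown here. A rough sketch of how such a function could work is a small registry that tracks every opened store. Everything in this block (ClosableStore, registerStore, the registry itself) is hypothetical and only illustrates the idea.

```typescript
// Hypothetical registry; the '@storage/data-store-factory' module is not shown in the article.
interface ClosableStore {
  close(): Promise<boolean>;
}

const openStores = new Set<ClosableStore>();

// Call this wherever a store is created so it can be closed later.
function registerStore<T extends ClosableStore>(store: T): T {
  openStores.add(store);
  return store;
}

// Close every registered store and report how many closed successfully.
async function closeAllConnections(): Promise<number> {
  const stores = Array.from(openStores);
  openStores.clear();
  let closed = 0;
  for (const store of stores) {
    try {
      if (await store.close()) closed += 1;
    } catch {
      // One failing close should not block the restart.
    }
  }
  return closed;
}
```

The registry is emptied before closing, so calling closeAllConnections twice in a row does not try to close the same store again.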
Now you can also build a personal knowledge graph with vector search on a user device!
I want to thank Oskar and the Turso team for their amazing work.