Jack Herrington
Posted on July 2, 2024
So you want to try out vector search, but you don’t want to pay OpenAI or use Hugging Face, and you don’t want to pay a vector database company. I’m here for you. Let’s get vector search going, on your own machine, for free.
What Are We Doing?
Let’s take a quick step back and talk about what we are doing and why. Vector search of AI embeddings is a way to create a search based on concepts. For example, searching for ‘pet’ might yield results for both dogs and cats. This is super valuable because it means that your customers get better search results.
To accomplish this we first take the text we want to search and send it to an AI model, which creates an “embedding”. An embedding is a lengthy array of floating point values, usually between ~300 and ~1,500 numbers.
The embeddings for cat, dog, and pet would all be similar. So if you compared cat and dog they would be close, whereas dog and pizza would not be.
What a vector database allows you to do is store these vectors along with their associated data (probably the original text). Once the data is stored, you can query the database with a new vector to get any nearby results. For example, if we stored cat and dog with their embeddings in the database, we could then take an input text of “pet”, create the embedding for that, and use it to query the database; we would likely get back cat and dog.
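To make the comparison idea concrete, here is a minimal sketch using made-up three-number “embeddings” (real ones have hundreds or thousands of dimensions, and these values are purely for illustration):

// Cosine similarity: near 1.0 means similar direction, near 0 means unrelated
function cosineSimilarity(a, b) {
  let dot = 0, magA = 0, magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  return dot / (Math.sqrt(magA) * Math.sqrt(magB));
}

// Hypothetical values, purely for illustration
const cat = [0.9, 0.8, 0.1];
const dog = [0.85, 0.75, 0.2];
const pizza = [0.1, 0.2, 0.9];

console.log(cosineSimilarity(cat, dog)); // high: close concepts
console.log(cosineSimilarity(dog, pizza)); // low: unrelated concepts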
Why Postgres and Ollama?
Postgres is a fantastic database that you can easily install and run locally. And with the pgvector extension to Postgres you can create vector fields that you can then use in your SQL queries.
There are multiple ways to install Postgres on your machine. On my Mac I used the Postgres.app to install Postgres.
Ollama is a very easy way to install and run AI models locally. I used Homebrew to install it with brew install ollama.
For our simple test application we’ll load all the lines from the 1986 horror film Aliens.
Getting Set Up
There are lots of models to choose from with Ollama. For this application I chose snowflake-arctic-embed because it’s ideal for fast creation of embeddings. To install it I used the command ollama pull snowflake-arctic-embed.
The last part of the setup is to create a local Postgres database. You can name it whatever you like; I chose lines because we are searching lines from a movie.
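If you have the Postgres command-line tools on your PATH, one way to create the database is:

createdb lines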
With the database created, we can use psql to run a few commands. The first adds the pgvector extension to the database, which enables the vector field type:
CREATE EXTENSION vector;
Now we need a table to hold the line text as well as the vectors. Here are the commands to create the table, along with a unique index on the position value, which is the position of the line in the script.
CREATE TABLE lines (
id bigserial PRIMARY KEY,
position INT,
text TEXT,
embedding VECTOR(1024)
);
CREATE UNIQUE INDEX position_idx ON lines (position);
The important thing to note here is the size of the vector. Different models create different sizes of vector. In the case of our snowflake model the embedding size is 1,024 numbers, so we set the vector size to that.
You will want to use the same embedding model for both storage and query. If you use different models, the numbers won’t line up.
Creating The Vector Indexes
As you can imagine, comparing two 1,024-value floating point arrays could be costly, and comparing lots of them could be very costly. So vector databases have come up with different indexing models to make that more efficient. The Postgres vector support has different types of indexes; we will use the Hierarchical Navigable Small World (HNSW) type to create three different indexes:
CREATE INDEX ON lines USING hnsw (embedding vector_ip_ops);
CREATE INDEX ON lines USING hnsw (embedding vector_cosine_ops);
CREATE INDEX ON lines USING hnsw (embedding vector_l1_ops);
Why three? Because there are multiple ways to compare two vectors. There is cosine, which is often the default and is good for concept comparison. There are also Euclidean distance, L1 distance, and dot-product (inner product) comparisons. Postgres supports all of these methods (and more).
Whichever you use, make sure you have created the matching index for that comparison method so that you get high-speed queries.
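One way to check that a query is actually hitting an HNSW index is Postgres’s EXPLAIN: an index scan should show up in the plan. This is just a sketch; the bracketed literal stands in for a full 1,024-number vector:

EXPLAIN SELECT position, text
FROM lines
ORDER BY embedding <=> '[...]' -- a full 1,024-number vector literal goes here
LIMIT 10;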
Loading The Database
With the model downloaded and Postgres set up, we can now start loading the database with our movie lines and their embeddings. I’ve posted the complete project, which also includes a Next.js App Router UI, on GitHub. The script is located in the load-embeddings directory. The original data is from this script page.
Before you can load the data you’ll need to copy the .env.example file to .env and then change the values to match your Postgres connection details.
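The exact keys are defined in the repo’s .env.example; purely as an illustration, a typical Postgres connection setup looks something like this:

# Illustrative values only; use the keys from .env.example
PGHOST=localhost
PGPORT=5432
PGDATABASE=lines
PGUSER=postgres
PGPASSWORD=your-password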
To load the embeddings into Postgres, run node loader.mjs with Node 20 or higher.
The key parts of the script are the embedding generation:
import ollama from "ollama";
...
const response = await ollama.embeddings({
model: "snowflake-arctic-embed",
prompt: text,
});
Here we use the ollama library to invoke the Snowflake embedding model on each line of text, one by one.
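Roughly, the loading loop looks like this (scriptLines is a hypothetical array of { position, text } records, not the repo’s actual variable):

// Sketch of the loading loop; names are illustrative
for (const { position, text } of scriptLines) {
  // Ask the local model for this line's 1,024-number embedding
  const response = await ollama.embeddings({
    model: "snowflake-arctic-embed",
    prompt: text,
  });
  // ...then insert position, text, and embedding into Postgres (below)
}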
We then insert the line into the database using an INSERT statement:
await sql`INSERT INTO lines
(position, text, embedding)
VALUES
(${position}, ${text}, ${`[${response.embedding.join(",")}]`})
`;
The only tricky thing here is how we format the embedding: we join all the numbers into a comma-separated string and wrap it in brackets, which is the literal format pgvector expects.
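If that inline formatting feels fiddly, you can pull it into a tiny helper (toVectorLiteral is my name for it, not something in the repo):

// Format a numeric array as a pgvector literal like "[1,2,3]"
const toVectorLiteral = (embedding) => `[${embedding.join(",")}]`;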
With all the data loaded into the database, it’s time to make a query.
Making Our First Query
To make sure this works there is a test-query.mjs file in the load-embeddings directory. To make a vector query we first run the model to turn the query text into a vector, like so:
const response = await ollama.embeddings({
model: "snowflake-arctic-embed",
prompt: "food",
});
In this case the prompt is food, and we use exactly the same process as the loader script to turn that into an embedding.
We then use a SQL SELECT statement to query the database with that vector:
const query = await sql`SELECT
position, text
FROM
lines
ORDER BY
embedding <#> ${`[${response.embedding.join(",")}]`}
LIMIT 10`;
We use ORDER BY to sort the records by their distance to the given embedding, then LIMIT to get back just the 10 closest.
The <#> operator in the ORDER BY is important because it selects which comparison algorithm to use. From the pgvector documentation, our options are:
<-> - L2 distance
<#> - (negative) inner product
<=> - cosine distance
<+> - L1 distance (added in 0.7.0)
You can decide for yourself which comparison provides the best output for your application, but be sure to index the table properly based on that comparison method.
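For example, to switch the test query to cosine distance you would only change the operator (this variation is mine, not a file in the repo):

const query = await sql`SELECT
position, text
FROM
lines
ORDER BY
embedding <=> ${`[${response.embedding.join(",")}]`}
LIMIT 10`;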
On my machine this test query yielded, amongst other things:
317 Guess she don't like the corn bread, either.
That’s a classic line from the movie, and it does indeed reference a type of food (corn bread).
Putting A User Interface On It
With a little extra effort I put a Next.js App Router interface on it. You can play with it by running pnpm dev in the root directory of the project once the database has been loaded and the .env file is set up properly.
The Next.js app uses exactly the same SELECT operation to query the lines from the database.
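As a rough sketch of how such an endpoint could look (my illustration, not the repo’s actual code), a Next.js route handler might do this:

// app/api/search/route.js (hypothetical path, not the repo's actual file)
import ollama from "ollama";
import postgres from "postgres";

const sql = postgres(); // reads connection details from the environment

export async function POST(request) {
  const { query } = await request.json();
  // Embed the search text with the same model used by the loader
  const response = await ollama.embeddings({
    model: "snowflake-arctic-embed",
    prompt: query,
  });
  // Same SELECT as the test script: the 10 closest lines by inner product
  const rows = await sql`SELECT
position, text
FROM
lines
ORDER BY
embedding <#> ${`[${response.embedding.join(",")}]`}
LIMIT 10`;
  return Response.json(rows);
}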
Conclusions
Obviously you’re not going to take an Aliens script-searching application to production. But with what I’ve shown you here, you could search text content, product descriptions, comments, almost any kind of text.
Enjoy!