Semantic Search Over Satellite Images Using Qdrant


Niranjan Akella

Posted on December 29, 2023


Build your very own image search engine

Get to know me & reach out to me on LinkedIn or X đŸ€


Do you ever wonder how Google Photos and Apple Photos are able to understand images?
Or, how do they allow you to search for images based on what you ‘type’?
Or, how Google’s very own image search works?

Well, I cracked it!


More Than Just an Introduction

In this new mind-boggling project, I was able to mimic this very ability of such powerful platforms right on my local system.

Creativity and imagination go hand-in-hand. We should always indulge in imaginative thought experiments that spark creativity, and this is one such thought experiment that has been teasing me for quite some time. I am happy to share that I have, at least in part, satisfied this curiosity with the help of Qdrant and OpenAI’s open-sourced CLIP model.

In this article, I’ll be exploring the creation of a semantic image search engine using OpenAI's open-sourced CLIP model coupled with the sheer might of Qdrant’s vector database.

This project is divided into the following sections:

  • Environment Setup
  • Data Pre-processing & Populating Vector Database
  • Embedding Feature-Vector-Driven Semantic Search Over Vector Database for Active Image Retrieval


Environment Setup

I always love to organize my projects with a proper structure, which makes them easier to review later on. Similarly, I believe you also prefer to keep your projects straightforward and manageable.

Pro Tip:
I prefer to divide my AI projects this way:

model_type/
|----project_title/
      |----demo/
            |----recorded_demo.mp4
            |----stable_build/
      |----exp_<experiment_number>/
            |----data/
                  |----raw/
                  |----processed_training_data/
            |----model/
                  |----metrics/
                        |----classification_reports/
                        |----performance_scores/
      |----README.md

The first step in preparing the environment for this project involves pulling the Docker container image and then executing it on your local Docker daemon. [Don't forget to launch the Docker application first].

  ‱ Pull the Qdrant container image from Docker Hub, then run it using the following command, which will host the service at localhost:6333.
docker pull qdrant/qdrant
docker run -p 6333:6333 -p 6334:6334 \
    -v $(pwd)/qdrant_storage:/qdrant/storage:z \
    qdrant/qdrant

NOTE: If you are running on Windows, kindly replace $(pwd) with your local path.


  ‱ Next comes the most important step of all: ‘Use an Environment’. You need an independent environment when performing experiments, or else you will surely fall into a black hole like Matthew McConaughey in the film ‘Interstellar’.


So, let’s create a Python environment (I used Conda) and install the following basic dependencies necessary to run the AI model.

conda create -n qdrant python -y   # include python so that pip installs into this environment
conda activate qdrant
pip install qdrant-client sentence-transformers accelerate tqdm datasets gradio

Now that we are all set, let’s begin the show!

Data Pre-Processing and Populating Vector Database

For this project, I’ve used the ‘arampacha/rsicd’ dataset, a collection of diverse satellite images from Hugging Face. We leverage the datasets library from Hugging Face to load the training split of the dataset.

import datasets

print("[INFO] Loading dataset...")
ds = datasets.load_dataset('arampacha/rsicd', split='train')

Now comes the AI.

I browsed through a pile of models to find the one that best fits my needs: generating feature-focused embeddings from satellite images, as well as text embeddings that can be used later for semantic search.

I settled on OpenAI's CLIP model, specifically 'openai/clip-vit-base-patch32'. This model is tailored for zero-shot image classification and yields a (1,512)-dimensional feature embedding for each image. And it doesn’t stop there. Being pre-trained on images and their corresponding captions, it aligns both textual and visual contexts within the same embedding tensor space. This implies that whether you input text or an image, you will receive a (1,512)-dimensional embedding tensor.

[Image: CLIP maps image data and text data into a shared embedding space]

The elegance of the CLIP model lies in its ability to map both image data and textual data to the same embedding space as illustrated in the above image.

If the input query is textual, we can use the tokenizer to tokenize it and create token_ids. Subsequently, we can generate an embeddings tensor using the get_text_features method from the model class. This process will result in an embedding feature tensor with the shape (1,512).

If the input query is an image, we can use the processor to process and convert the image into a format suitable for the model. Following this, we can generate an image embedding tensor with the shape (1, 512) using the get_image_features method from the model class.

Hence, it functions as a versatile model capable of generating either image embeddings or text embeddings depending on our specific use case. The key advantage is the consistent dimensionality of both embedding types, whether text or image. Pre-trained to understand the interconnected feature distribution between an image and its captions, the model stands as the optimal choice for text-to-image or image-to-image searches.
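
To make this concrete, here is a minimal sketch of both embedding paths (it uses the same checkpoint loaded in the next snippet; the satellite image file name is just a placeholder), showing that a caption and an image both land in the same 512-dimensional space.

from PIL import Image
from transformers import CLIPModel, CLIPProcessor, CLIPTokenizer

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

# Text path: tokenize, then project into the shared embedding space.
text_inputs = tokenizer("an airport with planes parked near the terminal", return_tensors="pt")
text_emb = model.get_text_features(**text_inputs)     # shape: (1, 512)

# Image path: preprocess the pixels, then project into the same space.
image = Image.open("some_satellite_image.jpg")        # placeholder file name
image_inputs = processor(images=image, return_tensors="pt")
img_emb = model.get_image_features(**image_inputs)    # shape: (1, 512)

print(text_emb.shape, img_emb.shape)                  # torch.Size([1, 512]) for both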

OpenAI’s comment on CLIP model:
‘If the task of a dataset is classifying photos of dogs vs cats, we check for each image whether a CLIP model predicts the text description “a photo of a dog” or “a photo of a cat” is more likely to be paired with it.’
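
Continuing the sketch above (reusing model, processor, and image), that zero-shot check boils down to asking which caption CLIP scores as the more likely pairing:

# Zero-shot check: which caption is more likely to be paired with the image?
inputs = processor(
    text=["a photo of a dog", "a photo of a cat"],
    images=image,
    return_tensors="pt",
    padding=True,
)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=1)  # probabilities over the two captions
print(probs)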


from transformers import AutoTokenizer, AutoProcessor, AutoModelForZeroShotImageClassification

print("[INFO] Loading the model...")
model_name = "openai/clip-vit-base-patch32"
tokenizer = AutoTokenizer.from_pretrained(model_name)
processor = AutoProcessor.from_pretrained(model_name)
model = AutoModelForZeroShotImageClassification.from_pretrained(model_name)

Here, the tokenizer converts text into token IDs, while the processor prepares images in a format the model can consume.

After loading the model, you will need to instantiate a Qdrant client that connects to the local Docker container running the Qdrant service, and then create a Qdrant data collection to host the vectorized data. We set the vector size to 512 since the output embedding tensor from the model has shape (1,512).

from qdrant_client import QdrantClient
from qdrant_client.http import models

client = QdrantClient("localhost", port=6333)
print("[INFO] Client created...")

print("[INFO] Creating qdrant data collection...")
client.create_collection(
    collection_name="satellite_img_db",
    vectors_config=models.VectorParams(size=512, distance=models.Distance.COSINE),
)

Populate the vector database by processing each image in the dataset, extracting its features using the CLIP model, and uploading the resulting embeddings to Qdrant’s ‘satellite_img_db’ collection.

Note: If you observe closely, I am not only saving the image embeddings but also storing the image pixel values and image size in the vector payload. I will use this information later to reconstruct the image for display on the Gradio app. To better understand the flow of the experiment, do check out the ‘Data Flow’ illustration that I made in the following section.

from tqdm import tqdm
import numpy as np

print("[INFO] Creating a data collection...")
records = []
for idx, sample in tqdm(enumerate(ds), total=len(ds)):
    processed_img = processor(text=None, images=sample['image'], return_tensors="pt")['pixel_values']
    img_embds = model.get_image_features(processed_img).detach().numpy().tolist()[0]
    img_px = list(sample['image'].getdata())
    img_size = sample['image'].size 
    records.append(models.Record(id=idx, vector=img_embds, payload={"pixel_lst": img_px, "img_size": img_size, "captions": sample['captions']}))

#uploading the records to client
print("[INFO] Uploading data records to data collection...")
# It's better to upload chunks of data to the VectorDB
chunk_size = 30
for i in range(0, len(records), chunk_size):
    client.upload_records(
        collection_name="satellite_img_db",
        records=records[i:i + chunk_size],
    )
    print(f"finished {min(i + chunk_size, len(records))}")


print("[INFO] Successfully uploaded data records to data collection!")
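
Once the upload finishes, a quick optional sanity check is to count how many points actually landed in the collection (qdrant-client exposes a count method for this):

# Optional sanity check: how many points are stored in the collection?
count_result = client.count(collection_name="satellite_img_db", exact=True)
print(f"[INFO] Points in collection: {count_result.count}")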

Embedding Feature-Vector-Driven Semantic Search Over Vector Database for Active Image Retrieval

Now that we have our data ready and chilling in Qdrant’s VectorDB, let’s build an app to interact with it and retrieve information through Qdrant’s Semantic Search functionality.

I will be using Gradio to build a quick functional application with a beautiful UI. Why? Because it comes with a prebuilt UI bundle that is easy to set up and great for quick demos. Coding with it is a breeze. Just visit Hugging Face Spaces and you will understand what I mean.

To put it in simple terms: all this application needs to do is take a text input from the user, vectorize it by generating text embeddings with the ‘get_text_features’ method from the model class, and then use that vector as the query for a semantic search over the vector database via the search method of Qdrant’s client class.

[Image: Data flow of the semantic search application]

import gradio as gr
from PIL import Image

def process_text(text):
    inp = tokenizer(text, return_tensors="pt")
    text_embeddings = model.get_text_features(**inp).detach().numpy().tolist()[0]
    hits = client.search(
        collection_name="satellite_img_db",
        query_vector=text_embeddings,
        limit=1,
    )

    for hit in hits:
        img_size = tuple(hit.payload['img_size'])
        pixel_lst = hit.payload['pixel_lst']

        new_image = Image.new("RGB", img_size)
        new_image.putdata(list(map(lambda x: tuple(x), pixel_lst)))

    return new_image

iface = gr.Interface(
    title="Semantic Search Over Satellite Images Using Qdrant Vector Database",
    description="by Niranjan Akella",
    fn=process_text,
    inputs=gr.Textbox(label="Input prompt"),
    outputs=gr.Image(type="pil", label="Satellite Image"),
)

iface.launch()

Note: The complete code is shared at the end along with the link to the GitHub Gist.

You can run the Gradio application directly from the terminal with the Python runtime: python3 app.py

Scope

The scope of this experiment doesn’t end here. In this project, I built a text-to-image search engine, but it is also possible to build an image-to-image search engine using the CLIP model’s processor. I highly recommend experimenting with that, and you can reach out to me on LinkedIn or X to discuss it further.
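
As a rough sketch of what that could look like (this is not part of the app above, and the query file name is just a placeholder), you would embed a query image with the processor and get_image_features, then search the same collection with that vector:

# Rough sketch of image-to-image search over the same collection.
from PIL import Image

query_img = Image.open("query_satellite_image.jpg")   # placeholder file name
pixel_values = processor(images=query_img, return_tensors="pt")["pixel_values"]
query_vector = model.get_image_features(pixel_values).detach().numpy().tolist()[0]

hits = client.search(
    collection_name="satellite_img_db",
    query_vector=query_vector,
    limit=3,  # top-3 most similar satellite images
)
for hit in hits:
    print(hit.id, hit.score, hit.payload["captions"])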

[Image: Image search demo from the Gradio app]

Conclusion

In this project, I combined the power of OpenAI's CLIP model for image embeddings with Qdrant’s semantic search functionality over its vector database, mimicking a very popular Google Photos/Apple Photos capability. The working Gradio demo provides a user-friendly interface for semantic image search based on textual queries, demonstrating what AI coupled with a powerful VectorDB like Qdrant can do. This article can serve as a guide for building your own image search engine by combining advanced open-sourced AI models with a scalable vector database.


Here's the code
