This is a submission for the Cloudflare AI Challenge.

What I Built

I built a Q&A chat app for house recommendations. The idea is simple: you describe your dream house using text or an image, and the app finds the most relevant house listings stored in a Vectorize index. Currently, it has 100 house listings in Bogor, Indonesia.

And you read that right: you can upload an image to perform a reverse image search or, more accurately, a semantic search using an image embedding model😉

Demo

Try it! https://rumah-frontend.pages.dev

Example prompt:

Recommend me a house with 2 bedrooms
House near Bojong Gede

My Code

fahminlb33 / koderumah

House Recommendation RAG

Retrieval-Augmented Generation (RAG) for house recommendation.

This project uses multiple AI models to perform Q&A-style house search and recommendation using the RAG method. It's a more advanced use case of Cloudflare AI that integrates many Cloudflare services and AI models.

@cf/meta/llama-2-7b-chat-int8
@cf/baai/bge-large-en-v1.5
@cf/unum/uform-gen2-qwen-500m
mobilenet_v3 through @tensorflow/tfjs

Try here: https://rumah-frontend.pages.dev/

Example prompts:

Recommend me a house with 2 bedrooms
House near Bojong Gede

Requirements

Node v20.12.0
npm v10.5.0
Wrangler v3.0.0 or newer

You'll need the Cloudflare Worker Pro Plan to use the Vectorize service, which is currently in beta.

Tech Stack

Vite
React
Radix UI
Tailwind CSS
zod
itty-router
jpeg-js
@tensorflow/tfjs
drizzle-orm
Cloudflare services used: Pages, Workers, Workers AI, Vectorize, D1, R2

Deployment

Step 1: Clone this repo, install the npm packages, and create the necessary databases, buckets, and indexes.

# clone the repo
git clone https://github.com/fahminlb33/koderumah.git
# install npm packages
npm install

# create D1 databases
npx wrangler

…


Tech stack:

Cloudflare Workers, Pages, AI, Vectorize, D1, R2
Backend: itty-router, zod, drizzle-orm, tensorflow.js
Frontend: Remix, React, Radix UI

Journey

This is my third and final submission to the Cloudflare Hackathon. My previous submissions were about creating a storybook and recommending dev.to authors; now I’m focusing on LLMs and RAG for Q&A.

RAG: Retrieval-Augmented Generation.

Building the RAG pipeline

This time my idea was to build an AI assistant that gives house recommendations based on a text prompt. You enter a prompt describing the house you want (for example, the number of bedrooms and bathrooms), and the model gives you house recommendations based on the house listings stored in the D1 database.

There are three parts that make up the RAG pipeline.

Query agent: this agent provides context or “memory” from earlier prompts, if any exist. It produces a new “refined prompt,” ideally enriched with context from the previous chat.
Semantic search: the refined prompt is fed to a text embedding model, and a vector search is run against a Vectorize index, returning the most relevant documents containing the house listings.
Answer agent: using the retrieved documents as context, this agent summarizes them and generates a final response for the user.
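
The three parts above can be sketched in TypeScript roughly as follows. This is a minimal sketch, not the project's actual code: `AiLike` and `VectorIndexLike` are hypothetical narrow interfaces standing in for the Workers AI and Vectorize bindings, `loadListing` is an assumed helper that would read a listing's text from D1, the prompt wording is invented for illustration, and the response shapes (`response` for text generation, `data` for embeddings) are assumptions about what the models return.

```typescript
// Hypothetical narrow interfaces standing in for the Workers AI and Vectorize bindings.
interface AiLike {
  run(model: string, input: Record<string, unknown>): Promise<any>;
}
interface VectorIndexLike {
  query(
    vector: number[],
    options: { topK: number },
  ): Promise<{ matches: { id: string; score: number }[] }>;
}

const CHAT_MODEL = "@cf/meta/llama-2-7b-chat-int8";
const EMBED_MODEL = "@cf/baai/bge-large-en-v1.5";

// Query agent prompt: fold earlier prompts into the new one (illustrative wording).
function buildRefinePrompt(history: string[], question: string): string {
  if (history.length === 0) return question; // no memory yet, nothing to refine
  return `Earlier prompts:\n${history.join("\n")}\n\nRewrite this prompt as one standalone statement: ${question}`;
}

// Answer agent prompt: retrieved listings become the context (illustrative wording).
function buildAnswerPrompt(question: string, listings: string[]): string {
  return `House listings:\n${listings.join("\n---\n")}\n\nRecommend a house for: ${question}`;
}

async function recommend(
  ai: AiLike,
  index: VectorIndexLike,
  loadListing: (id: string) => Promise<string>, // assumed helper, e.g. a D1 lookup
  history: string[],
  question: string,
): Promise<string> {
  // 1. Query agent: only call the LLM when there is earlier context to merge in.
  let refined = question;
  if (history.length > 0) {
    const out = await ai.run(CHAT_MODEL, { prompt: buildRefinePrompt(history, question) });
    refined = String(out.response ?? question).trim() || question;
  }

  // 2. Semantic search: embed the refined prompt, then query the vector index.
  const embedding = await ai.run(EMBED_MODEL, { text: [refined] });
  const { matches } = await index.query(embedding.data[0], { topK: 3 });
  const listings = await Promise.all(matches.map((m) => loadListing(m.id)));

  // 3. Answer agent: generate the final response from the retrieved listings.
  const answer = await ai.run(CHAT_MODEL, { prompt: buildAnswerPrompt(refined, listings) });
  return String(answer.response);
}
```

In a Worker, `ai` and `index` would be the Workers AI and Vectorize bindings from `env`, assuming they return results in the shapes noted above. Passing them in as parameters also makes the pipeline easy to exercise with fakes.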

Overall, it is the usual RAG pipeline you’ll see in many tutorials on the internet. But can we improve it?

Prompting by text is mainstream, what about images?

I found using text prompts to be effective, but I wanted to explore if using an image as a query could enhance the experience.

Currently, Cloudflare AI doesn’t have an image embedding model available. To solve this, I considered using a third-party service for image embedding. However, I recalled that TensorFlow has a JS version that could potentially run in a Cloudflare Worker.

Initially, I had difficulties with image decoding in TensorFlow.js because it is designed mainly for browsers, which have built-in image decoding capabilities. Fortunately, you can decode an image using a pure JS library such as jpeg-js and then run a TensorFlow.js model in a Cloudflare Worker.
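
To illustrate the decoding step: jpeg-js returns the decoded pixels as RGBA bytes, while an image model expects an RGB float tensor. Below is a minimal sketch of that conversion. The function name `rgbaToRgbFloats` and the `DecodedImage` type are hypothetical, and scaling to the [0, 1] range is an assumption, since the right input range depends on the specific model. Resizing to the model's input size is left out.

```typescript
// Shape of a decoded image: width, height, and RGBA bytes (4 bytes per pixel).
interface DecodedImage {
  width: number;
  height: number;
  data: Uint8Array;
}

// Drop the alpha channel and scale each channel to [0, 1] (assumed input range).
function rgbaToRgbFloats(image: DecodedImage): Float32Array {
  const pixelCount = image.width * image.height;
  if (image.data.length !== pixelCount * 4) {
    throw new Error("expected 4 bytes (RGBA) per pixel");
  }
  const rgb = new Float32Array(pixelCount * 3);
  for (let i = 0; i < pixelCount; i++) {
    rgb[i * 3] = image.data[i * 4] / 255; // red
    rgb[i * 3 + 1] = image.data[i * 4 + 1] / 255; // green
    rgb[i * 3 + 2] = image.data[i * 4 + 2] / 255; // blue
    // image.data[i * 4 + 3] is alpha and is ignored
  }
  return rgb;
}

// In a Worker this would sit between the two libraries, roughly:
//   const image = jpeg.decode(bytes, { useTArray: true });  // jpeg-js
//   const input = tf.tensor3d(rgbaToRgbFloats(image), [image.height, image.width, 3]);
```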

BUT, it is slow. Really slow...

It takes about 5 seconds to perform a single image embedding. It is good enough for a prototype, but in the long run this will lead to bad UX. The bottleneck appears to be caused by workers needing to download a model and set up everything from scratch each time they run an image embedding process. Since each call to a Worker is isolated, I cannot cache the model for future inference.

Now that we have got the embedding of our image, we can continue with semantic search and summarize the retrieved documents. This will enable us to generate a conclusive answer.
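
Conceptually, the semantic search step ranks the stored listing vectors by their similarity to the query embedding. Vectorize does this for us on the server side; the sketch below only illustrates the idea with a brute-force search over in-memory vectors. The names `StoredVector`, `cosineSimilarity`, and `topK` are hypothetical, and cosine similarity is an assumption here, as the metric depends on how the index was created.

```typescript
interface StoredVector {
  id: string;
  values: number[];
}

// Cosine similarity: dot product divided by the product of the vector lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every stored vector against the query and keep the k best matches.
function topK(query: number[], stored: StoredVector[], k: number): { id: string; score: number }[] {
  return stored
    .map((v) => ({ id: v.id, score: cosineSimilarity(query, v.values) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

The same ranking works whether the query vector came from the text embedding model or from the image embedding model, as long as it is compared against vectors produced by the same model.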

Architecture Diagram

Multiple Models and Triple Task

The models used are:

Text Generation: @cf/meta/llama-2-7b-chat-int8
Text Summarization: @cf/facebook/bart-large-cnn
Text Embedding: @cf/baai/bge-large-en-v1.5
Image to Text: @cf/unum/uform-gen2-qwen-500m
Image Embedding: mobilenet_v3

What I Learned

Compared to my previous submissions, this app is definitely more intricate, but it was still fun to build. I didn’t even have to use LangChain to build this RAG pipeline. Overall, this project shows that Cloudflare AI, and especially the quality of its Text Generation model, is good enough for building RAG apps. The only major problem I faced in this project was model hallucination in the query agent, which caused the refined prompt to be reformulated as a question instead of a statement. Maybe my system prompt is not optimal yet.

The fact that we can also bring our own TensorFlow.js model to a Cloudflare Worker is a major advantage, as it simplifies our system architecture and allows us to run nearly everything on Cloudflare Workers. But keep in mind the drawback I mentioned above😉

Also, big thanks to my friend @rasyidf for building the frontend app. I couldn’t do it without him.

Blog

Recommend me a House🏡 RAG with Cloudflare AI🌤️

Fahmi Noor Fiqri