CF-assist - Visual smart assistant for visually impaired
Captain jay
Posted on April 15, 2024
This is a submission for the Cloudflare AI Challenge.
What I Built
CF-assist is a visual assistant that helps visually impaired individuals get visual insights from the application with a simple click-and-hold action. Users hold and talk to the assistant, and it helps them understand what is in the image. The image can come from their camera, which runs continuously while the app is open, or users can simply talk to the LLM through the application.
How to use
There are 2 modes which can be toggled from the navbar:
- Text mode
  - Type and send to talk to the LLM normally.
  - Capture an image with the photo button; the very next text you send can ask about that image.
  - Or drag and drop / choose a file, then send your next message about the image to get insight.
- CF-assist mode (a rough sketch of the hold interaction follows below)
  - Hold and speak, then release after a short interval; the assistant will respond with audio. (Important: hold the line at the bottom of the screen, or within a few centimetres of it; that is the area where the application detects a hold. The same applies to clicks: click the line, and it will also pick up clicks a few centimetres off. A detailed video will be uploaded tonight.)
  - Click and hold when you want to know what is happening around you and get visual assistance: first click the line, then hold it, speak, and release after a short time; the assistant will respond with audio.
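For readers curious how the hold gesture can be detected, here is a minimal sketch in React + TypeScript. The component name, the props (`startListening`, `stopAndSend`, `cancelListening`), and the 300 ms threshold are illustrative assumptions, not the actual CF-assist code.

```typescript
// Hypothetical sketch of the press-and-hold bar; not the real CF-assist component.
import { useRef } from "react";

const HOLD_THRESHOLD_MS = 300; // presses shorter than this are treated as stray taps

type Props = {
  startListening: () => void;  // begin speech capture when the press starts
  stopAndSend: () => void;     // stop capture and send the spoken query
  cancelListening: () => void; // discard a press that was too short to be a hold
};

export function HoldToTalkBar({ startListening, stopAndSend, cancelListening }: Props) {
  const pressedAt = useRef<number | null>(null);

  const onPointerDown = () => {
    pressedAt.current = Date.now();
    startListening();
  };

  const onPointerUp = () => {
    if (pressedAt.current === null) return;
    const heldFor = Date.now() - pressedAt.current;
    pressedAt.current = null;
    // Only a real hold counts as a spoken query; a stray tap is discarded.
    if (heldFor >= HOLD_THRESHOLD_MS) stopAndSend();
    else cancelListening();
  };

  return (
    <div
      onPointerDown={onPointerDown}
      onPointerUp={onPointerUp}
      // A generous hit area so presses a few centimetres off the line still register.
      style={{ height: 48, width: "100%", touchAction: "none" }}
      role="button"
      aria-label="Hold to talk to CF-assist"
    />
  );
}
```

The large, full-width hit area is what lets users hit the line without aiming precisely, which matters for an accessibility-focused interface.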
My Code
Here is the GitHub repository: cfassisttesttwo
Installation
- Clone the repository:
  `git clone <repository_url>`
- Navigate to the project directory:
  `cd cfassist`
- Install dependencies:
  `npm install`
- Navigate to the server directory:
  `cd server`
- Install server dependencies:
  `npm install`
- Return to the project directory:
  `cd ..`
- Start the development server:
  `npm run dev`
Environment Variables
Server:
- Create a `.env` file in the `server` directory.
- Obtain `CLOUDFLARE_API_TOKEN` and `CLOUDFLARE_APP_ID` from your Cloudflare account.
- Add the following lines to the `.env` file:
  `CLOUDFLARE_API_TOKEN=<your_cloudflare_api_token>`
  `CLOUDFLARE_APP_ID=<your_cloudflare_app_id>`
Client:
- Create a `.env` file in the `cfassist` directory.
- Define `NEXT_PUBLIC_SERVER_URL` with the server URL. For local development, use:
  `NEXT_PUBLIC_SERVER_URL=http://localhost:4000`
Note: Ensure that all environment variables are correctly set before running the application.
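As a rough illustration, the server could load these variables along the following lines, assuming the common `dotenv` package and that `CLOUDFLARE_APP_ID` holds the Cloudflare account ID used in the Workers AI REST URL; the actual server code may differ.

```typescript
// Minimal sketch of server-side env loading; assumes the `dotenv` package.
import "dotenv/config";

const apiToken = process.env.CLOUDFLARE_API_TOKEN;
const appId = process.env.CLOUDFLARE_APP_ID;

if (!apiToken || !appId) {
  // Failing fast makes a missing or incomplete .env obvious before any request is made.
  throw new Error("CLOUDFLARE_API_TOKEN and CLOUDFLARE_APP_ID must be set");
}

// Account-scoped base URL for Cloudflare Workers AI REST calls (assumption:
// CLOUDFLARE_APP_ID is the account ID).
export const AI_BASE_URL =
  `https://api.cloudflare.com/client/v4/accounts/${appId}/ai/run`;
export const AI_HEADERS = { Authorization: `Bearer ${apiToken}` };
```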
Demo
You can check out the website right now at cfassisttesttwo.pages.dev
(Disclaimer: While testing on other devices and browsers, the microphone permission does not work well on mobile devices. To experience CF-assist mode at its full potential, I recommend using the website on desktop devices, or on devices that give you more control over audio and camera permissions without having to change them explicitly.)
Meanwhile, you can look at screenshots of the running application:
Journey
I started this application using the Next.js framework as the base and TypeScript as the language of choice for both frontend and backend. The models I was particularly interested in for making my application are:
- Automatic speech recognition
- Image to text
- Text generation
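To make this concrete, here is a hedged sketch of calling these three model types through the Workers AI REST API. The model names are examples from the Workers AI catalogue and may not be the exact ones CF-assist uses, and `AI_BASE_URL` / `AI_HEADERS` are assumed to come from the env setup sketched earlier.

```typescript
// Illustrative server-side helpers; model choices and response shapes are examples,
// not necessarily what CF-assist ships with.
import { AI_BASE_URL, AI_HEADERS } from "./config";

// Automatic speech recognition: send raw audio bytes, get a transcript back.
export async function transcribe(audio: ArrayBuffer): Promise<string> {
  const res = await fetch(`${AI_BASE_URL}/@cf/openai/whisper`, {
    method: "POST",
    headers: AI_HEADERS,
    body: audio,
  });
  const json = (await res.json()) as { result: { text: string } };
  return json.result.text;
}

// Image to text: describe a camera frame or uploaded picture.
export async function describeImage(image: ArrayBuffer): Promise<string> {
  const res = await fetch(`${AI_BASE_URL}/@cf/llava-hf/llava-1.5-7b-hf`, {
    method: "POST",
    headers: { ...AI_HEADERS, "Content-Type": "application/json" },
    body: JSON.stringify({
      image: Array.from(new Uint8Array(image)),
      prompt: "Describe this scene for a visually impaired user.",
    }),
  });
  const json = (await res.json()) as { result: { description: string } };
  return json.result.description;
}

// Text generation: answer the user's question, optionally with image context in the prompt.
export async function generate(prompt: string): Promise<string> {
  const res = await fetch(`${AI_BASE_URL}/@cf/meta/llama-2-7b-chat-int8`, {
    method: "POST",
    headers: { ...AI_HEADERS, "Content-Type": "application/json" },
    body: JSON.stringify({ messages: [{ role: "user", content: prompt }] }),
  });
  const json = (await res.json()) as { result: { response: string } };
  return json.result.response;
}
```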
For immediate speech-to-text, I used the browser's built-in transcription service, which is faster because it runs on the user's device, but it is not as accurate as the Cloudflare AI model. So I used both: the browser for immediate transcription and Cloudflare AI for the final transcription.
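A minimal sketch of that two-stage approach, assuming the browser's Web Speech API for the fast interim pass and a hypothetical `/transcribe` endpoint on the CF-assist server that forwards the recorded audio to the Cloudflare ASR model:

```typescript
// Stage 1: fast, on-device interim transcription via the Web Speech API.
const SpeechRecognitionImpl =
  (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition;

export function startInterimTranscription(onInterim: (text: string) => void) {
  const recognition = new SpeechRecognitionImpl();
  recognition.interimResults = true; // quick but less accurate
  recognition.onresult = (event: any) => {
    const latest = event.results[event.results.length - 1];
    onInterim(latest[0].transcript);
  };
  recognition.start();
  return recognition; // the caller stops it when the user releases the hold
}

// Stage 2: once recording is finished, send the captured audio for the more
// accurate Cloudflare transcription. The `/transcribe` route is an assumption.
export async function finalTranscription(audio: Blob): Promise<string> {
  const res = await fetch(`${process.env.NEXT_PUBLIC_SERVER_URL}/transcribe`, {
    method: "POST",
    body: audio,
  });
  const { text } = (await res.json()) as { text: string };
  return text;
}
```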
As of now, no login method is implemented and it's free to use; everything is stored in local storage, which keeps latency low.
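As an illustration, the conversation could be kept in local storage along these lines; the `cfassist-history` key and the message shape are assumptions, not the app's actual storage format.

```typescript
// Hypothetical local-storage persistence for the chat history.
type Message = { role: "user" | "assistant"; content: string };

const KEY = "cfassist-history";

export function loadHistory(): Message[] {
  const raw = localStorage.getItem(KEY);
  return raw ? (JSON.parse(raw) as Message[]) : [];
}

export function appendMessage(message: Message): void {
  const history = loadHistory();
  history.push(message);
  localStorage.setItem(KEY, JSON.stringify(history));
}
```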
I built this application so that visually impaired individuals, or people without any prior app knowledge, can access it easily: just click, hold, speak, and get visual assistance.