Visual Guides For Any Skill With Cloudflare AI

This is a submission for the Cloudflare AI Challenge.

What I Built

My app, Guide-ify, makes a visual guide for a given life skill. This is done with a 3-step process:

1. Make a request to research more about the given skill.
2. Summarize these findings.
3. Create a corresponding visual aid for each step in the summary.

These steps each correspond to a different Cloudflare AI model (see "Using AI Models" below for more details).
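
To make the flow concrete, here is a minimal sketch of how the three steps could be chained. The function name `buildGuide` and the injected `research`, `summarize`, and `illustrate` functions are hypothetical stand-ins for calls to the three Worker endpoints, not the code from the repo.

```javascript
// Hypothetical orchestration of the three steps. The step functions are
// injected so that each one can wrap a separate Worker endpoint.
async function buildGuide(skill, { research, summarize, illustrate }) {
  // Step 1: ask a text model to research the skill.
  const findings = await research(skill);

  // Step 2: reduce the findings to a list of short steps.
  const steps = await summarize(findings);

  // Step 3: create one visual aid per step, in parallel.
  const images = await Promise.all(steps.map((step) => illustrate(step)));

  // Pair each step with its image.
  return steps.map((text, i) => ({ text, image: images[i] }));
}
```

Passing the step functions in as arguments keeps the orchestration independent of any particular model or endpoint, so each step can be swapped or stubbed out on its own.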

Demo

You can try out the app here: https://cloudflare-playground-bn1.pages.dev/

My Code

Here's the repo for my project: https://github.com/MrAlexLau/cloudflare-playground

Tech stack:

The front-end app was built with SvelteKit
Styling is done with Tailwind with some snippets used from Flowbite
The app itself is hosted on Cloudflare Pages
The logic to generate output is done with Cloudflare Workers

Journey

My goal was to build something fun and useful that stitched together multiple AI models. One of the exciting aspects of new AI tools is how easily they can be integrated with one another.

Getting Started With Cloudflare

This was my first time using Cloudflare's Pages and Workers, and I was pleased that they were both a breeze to use! In particular, the guide for deploying a SvelteKit site made it very quick to get a site up and running on Pages.

My next step was to submit a request to a Cloudflare Worker. The two main issues I ran into were that my requests were hitting CORS errors and that I wasn't sure how to parse the params within the Worker itself.

Fortunately, there was sample code for both of these issues as well. I specifically referenced the CORS Header Proxy and Fetch JSON examples when writing the code for my Workers. For reference, you can view all of Cloudflare's Worker examples here as well as longer form tutorials.
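
As a rough illustration of both fixes, here is a minimal sketch of a request handler that answers CORS preflight requests and parses a JSON body. The names `CORS_HEADERS` and `handleRequest` are my own for this example, and the wide-open `*` origin is an assumption that only suits development. In a module Worker, a function like this would be exposed as `export default { fetch: handleRequest }`.

```javascript
// Permissive CORS headers for development (assumption: any origin is allowed).
const CORS_HEADERS = {
  "Access-Control-Allow-Origin": "*",
  "Access-Control-Allow-Methods": "POST, OPTIONS",
  "Access-Control-Allow-Headers": "Content-Type",
};

async function handleRequest(request) {
  // Answer the browser's CORS preflight request.
  if (request.method === "OPTIONS") {
    return new Response(null, { status: 204, headers: CORS_HEADERS });
  }

  // Parse the JSON body and reject anything that is not valid JSON.
  let params;
  try {
    params = await request.json();
  } catch {
    return new Response("Expected a JSON body", { status: 400, headers: CORS_HEADERS });
  }

  // Echo the parsed prompt back. A real Worker would call an AI model here.
  return new Response(JSON.stringify({ prompt: params?.prompt }), {
    headers: { ...CORS_HEADERS, "Content-Type": "application/json" },
  });
}
```

The CORS headers are attached to every response, including errors, so that the browser can still read the error message instead of reporting a generic CORS failure.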

Using AI Models

My app uses 3 task types of Cloudflare's AI models:

Text-to-Text (for researching the given topic)
Summarization (for summarizing the research results if the given research can't be easily parsed)
Text-to-Image (for creating images for each step in the guide)

The code for these Workers ended up being quite straightforward. For example, within a Worker, calling the Text-to-Text model is as simple as:

let response = await ai.run('@cf/mistral/mistral-7b-instruct-v0.1', { prompt: "Tell me everything there is to know about turtles." });

I decided to create a separate Worker for each task, which gives me a separate REST endpoint that I can call for each of these operations. This worked well since I was able to test them independently, and it makes them more composable going forward (e.g., I could create another app that uses the same Workers without having to change them at all!).

One thing that's really nice about working with Cloudflare Workers is that the editor has the ability to test out the Worker on the fly before you deploy it. Specifically you can log output (as shown in the red boxes) and preview output (as shown in the blue boxes):

Handling Inconsistent Output

If you work heavily with any text-based AI model, one issue you're bound to run into is inconsistent output. Specifically, my prompt to the models would request something like "Give me a list of actionable steps for learning how to juggle." Sometimes the output would be a numbered list; other times it would be a bulleted list or simply a paragraph of text. The AI model being called also affected the output. For example, the @cf/tiiuae/falcon-7b-instruct model seems to prefer bulleted lists.

I needed some way of ensuring that the data I parsed could be read reliably every time while still preserving the intended content of the message. I normalized this data with a 2-step process:

1. Check and see if the output text from the Text-to-Text model is a numbered list. If it is, parse out each step in that list.
2. If the output is not a numbered list, then send a request to the Summarization AI model.

Using the Summarization model is especially useful since it strips out any exclamations added in by the Text-to-Text model. For example, I noticed that the llama text model will often add phrases like *giggles* to its output, which aren't relevant for my purposes. The Summarization model strips out this type of extraneous text while preserving the main points from the text model's output.
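
The 2-step normalization could be sketched roughly as follows. This is an illustration rather than the code from the repo: `parseNumberedList` and `normalizeSteps` are hypothetical names, `summarize` is a hypothetical async function that would wrap the Summarization Worker, and splitting the summary into sentences is an assumption about how it gets turned into steps.

```javascript
// Extract the items of a numbered list such as "1. Foo" or "2) Bar".
// Returns an empty array when the text contains no numbered items.
function parseNumberedList(text) {
  const items = [];
  for (const line of text.split(/\r?\n/)) {
    const match = line.match(/^\s*\d+[.)]\s+(.+)$/);
    if (match) items.push(match[1].trim());
  }
  return items;
}

// Use the parsed list when there is one; otherwise fall back to a summarizer.
async function normalizeSteps(text, summarize) {
  const steps = parseNumberedList(text);
  if (steps.length > 0) return steps;

  const summary = await summarize(text);
  // Split the summary into sentences so that each one can become a step.
  return summary
    .split(/(?<=[.!?])\s+/)
    .map((sentence) => sentence.trim())
    .filter(Boolean);
}
```

Checking for a numbered list first avoids an extra model call whenever the text model already returned well-structured output.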

Next Steps

Since this project was submitted for a hackathon, it's not production-ready code. My next steps for cleanup would be to:

Restrict CORS settings used by my Worker endpoints. Right now they're open to make development easier, but that's not a practice I'd ever use for a production-grade application.
Add error handling and tests.
Move Worker logic directly into my project and develop via Cloudflare's Wrangler.
- I developed my Workers directly within the Cloudflare web UI, which worked well for my use case, but I feel like Wrangler would be a smoother developer experience going forward since I wouldn't have to leave my text editor when modifying Worker code.
- You can find the source code for my Workers here.
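
For the first cleanup item, restricting CORS might look something like the sketch below. The `ALLOWED_ORIGINS` list and the `corsHeadersFor` helper are hypothetical, and the demo URL from this post is used as the example origin.

```javascript
// Hypothetical allow-list of origins that may call the Worker endpoints.
const ALLOWED_ORIGINS = ["https://cloudflare-playground-bn1.pages.dev"];

// Build CORS headers for the Origin header value of an incoming request.
function corsHeadersFor(origin) {
  // Unknown or missing origins get no CORS headers, so browsers block them.
  if (!origin || !ALLOWED_ORIGINS.includes(origin)) return {};

  return {
    "Access-Control-Allow-Origin": origin,
    // Tell caches that the response varies by requesting origin.
    "Vary": "Origin",
  };
}
```

Inside a Worker, the argument would come from `request.headers.get("Origin")`.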

What I Learned

There was quite a bit that I learned with this project:

Working within Cloudflare
- How to deploy to Pages and Workers
- Using the Workers web editor to modify, preview, and deploy updates
- I like that the code for Workers is version controlled on the Cloudflare side, so you can easily roll back to a previous version if you'd like
- I was surprised that the free plan comes with 100k Worker requests per day
- Having set up these tools now, I feel like creating a new project with Pages and Workers would be very fast
Sharpening up my Svelte and Tailwind skills
Submitting requests and reading responses from Cloudflare AI Models. For example, here's how to turn binary image output into a url that my app can use:

const imageSrc = await fetch(
  // This Worker calls the Text-to-Image `@cf/stabilityai/stable-diffusion-xl-base-1.0` AI model
  "https://text-to-image.mralexlau.workers.dev",
  requestOptions({ prompt: sentence })
)
  // The response body is binary image data
  .then((response) => response.blob())
  // Turn the blob into a string that an <img> tag can use for its src attribute
  .then((blob) => URL.createObjectURL(blob));
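
The `requestOptions` helper isn't shown above. Here is a minimal sketch of what such a helper might look like, assuming the Worker expects its parameters as a JSON POST body; the real implementation in the repo may differ.

```javascript
// Hypothetical helper: builds the fetch options for a JSON POST request.
function requestOptions(params) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    // Serialize the parameters (e.g. { prompt: "..." }) into the request body.
    body: JSON.stringify(params),
  };
}
```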

Multiple Models and/or Triple Task Types
Again, my app uses 3 task types of Cloudflare's AI models. Specifically, the models that I used were:

Summarization - @cf/facebook/bart-large-cnn
Text-to-Image - @cf/stabilityai/stable-diffusion-xl-base-1.0
Text-to-Text - Defaults to @cf/tiiuae/falcon-7b-instruct

I also noticed that many of the text models have similar interfaces, so I added a setting to try out other text models. Users can select any of these text models:

@cf/meta/llama-2-7b-chat-fp16
@cf/meta/llama-2-7b-chat-int8
@cf/mistral/mistral-7b-instruct-v0.1
@cf/tiiuae/falcon-7b-instruct
@hf/google/gemma-7b-it
@hf/nousresearch/hermes-2-pro-mistral-7b
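
Since the model name comes from a user-facing setting, one way to handle it is to check the requested name against the list above and fall back to the default. This is only a sketch; `TEXT_MODELS`, `DEFAULT_TEXT_MODEL`, and `resolveTextModel` are hypothetical names.

```javascript
// The text models listed above.
const TEXT_MODELS = [
  "@cf/meta/llama-2-7b-chat-fp16",
  "@cf/meta/llama-2-7b-chat-int8",
  "@cf/mistral/mistral-7b-instruct-v0.1",
  "@cf/tiiuae/falcon-7b-instruct",
  "@hf/google/gemma-7b-it",
  "@hf/nousresearch/hermes-2-pro-mistral-7b",
];

const DEFAULT_TEXT_MODEL = "@cf/tiiuae/falcon-7b-instruct";

// Return the requested model if it is on the list, otherwise the default.
function resolveTextModel(requested) {
  return TEXT_MODELS.includes(requested) ? requested : DEFAULT_TEXT_MODEL;
}
```

Validating against a fixed list means an unexpected value in the request can never reach the model call.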

Wrapping Up

I had fun on this project! Using a new ecosystem can be daunting at times, but I felt like I was able to get up and running quickly and that future projects would be a breeze to set up. Looking forward to seeing what other AI models Cloudflare adds in the future.

If you have any questions about the code or running this project, please reach out in the comments!