Prompt Engineering with Serverless AI
Matt Butcher
Posted on December 21, 2023
This article demonstrates how to do LLM prompt engineering for a first AI app. We'll write a couple dozen lines of TypeScript.
I recently built an AI-based Crossword Puzzle Helper and wrote about the experience in a blog post. Afterward, I got a lot of questions, particularly about how I constructed the prompt for the app, which helps me solve daily crossword puzzles without handing me the answers outright. My colleague Sohan and I recorded a video walkthrough to address those questions, and in that video we grew the basic crossword solver into a more creative and advanced tool.
But in that video, I breezed through prompt engineering. And that’s a hot (and sometimes misunderstood) topic right now. So, in this post, I’ll illustrate my approach to Prompt Engineering with some examples.
Prompt Engineering (and Meta’s LLaMa2)
The process of structuring text that can be interpreted and understood by a generative AI model is called Prompt Engineering. This can apply to text-only models such as Llama2 and image generation models such as Stable Diffusion. Often, the key to generating an appropriate piece of text or an accurate image comes down to how the prompt is engineered.
Prompts can range from straightforward to very complex, based on what you want the AI model to generate. For example:
“Tell me a joke about cats” is a simple prompt, but for the purposes of our crossword solver, I had to specify a few details, such as:
- I did not want the AI model to give me the answer directly, as that would defeat the purpose of the app. Instead, it had to give me some suggestions for potential answers.
- Also, I wanted the answers to be under 20 characters, as your typical crossword grid is 20x20.
So this is what I came up with:
<s>
[INST]
<<SYS>>
You are an assistant who helps solve crossword puzzle clues.
Respond with one or more suggestions.
A suggested answer should be less than 20 characters.
<</SYS>>
[/INST]
Wait a minute, what's the `<<SYS>>` token there? And the `[INST]` tag? And that funny `<s>` at the start? These are all ways to mark up a prompt. Let's take a look.
Prompt Structure
Every model has a particular format for prompting. When it comes to generative AI, you can think of a prompt as a query. You’re explaining to the model what you want it to do.
The LLaMa 2 model has certain tokens that you can use to make complex prompts. Here are some tokens that are useful when you are writing your first prompt:
- `<s>` marks the start of an exchange.
- `[INST]` denotes an interaction between us and the LLM. For example, if we were modeling a conversation, we might have several `[INST]` sections.
- `<<SYS>>` denotes a system directive. We use system directives to tell the LLM how it should behave.
So, in the above example of the crossword solver, the `<<SYS>>` prompt is telling the LLM how to act in responding to the directive. There's a structure to the example that might not be immediately evident, but it's something like this: the `<s>` wraps one back-and-forth exchange. It has exactly one `[INST]`. An `[INST]` possibly has a `<<SYS>>` directive, and it also may have a question or text. With this in mind, let's build another one.
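To make that structure concrete, here is a minimal sketch in TypeScript. The helper and the sample clue are my own invention for illustration (nothing like this ships with the Spin SDK); it just assembles the tokens in the order described above.

```typescript
// A hypothetical helper, not from any SDK: assemble a LLaMa2-style prompt from
// an optional system directive plus the user's text, using the tokens above.
function buildLlama2Prompt(system: string | undefined, userText: string): string {
  // <<SYS>> ... <</SYS>> wraps the system directive, if there is one.
  const sys = system ? `<<SYS>>\n${system}\n<</SYS>>\n` : "";
  // <s> opens the exchange; [INST] ... [/INST] wraps one interaction.
  return `<s>\n[INST]\n${sys}${userText}\n[/INST]`;
}

// For example, the crossword directive from earlier, with a made-up clue appended:
const crosswordPrompt = buildLlama2Prompt(
  "You are an assistant who helps solve crossword puzzle clues.\n" +
    "Respond with one or more suggestions.\n" +
    "A suggested answer should be less than 20 characters.",
  "Clue: small pet that purrs (3 letters)"
);
```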
Here’s an example of a prompt to build a bot that tells you the sentiment of any conversation. Any conversation sent to this model will be tagged as ‘positive’, ‘negative’ or ‘neutral’:
const prompt = `<s>
[INST]
<<SYS>>
You are a bot that generates sentiment analysis responses. Respond with a single positive, negative, or neutral.
Follow the pattern of the following examples:
User: Hi, my name is Bob
Bot: neutral
User: I am so happy today
Bot: positive
User: I am so sad today
Bot: negative
<</SYS>>
User: I had a rough day at work.
[/INST]
`
Now it's a little clearer:
- `<s>` starts the exchange.
- Then, we define a fairly significant `<<SYS>>` prompt with a few examples. We tell LLaMa2 that a conversation is an exchange in a particular format.
- Then, near the end of the `[INST]`, we supply the prompt: `User: I had a rough day at work.`

In this example, we expect the LLM to read the `<<SYS>>` directive, take on the role of a sentiment analysis engine that expects `User: SOME STATEMENT`, and return `Bot: [positive | negative | neutral]`.
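If it helps to see that few-shot pattern as code, here is a small sketch. The interface and function names are hypothetical, purely for illustration; the helper stitches the examples into the `<<SYS>>` block and appends the statement we actually want classified as the final `User:` line.

```typescript
// Hypothetical types and helper, for illustration only.
interface FewShotExample {
  user: string;
  bot: "positive" | "negative" | "neutral";
}

function buildSentimentPrompt(examples: FewShotExample[], statement: string): string {
  // Render each example in the "User: ... / Bot: ..." pattern the prompt teaches.
  const shots = examples.map((e) => `User: ${e.user}\nBot: ${e.bot}`).join("\n");
  return `<s>
[INST]
<<SYS>>
You are a bot that generates sentiment analysis responses. Respond with a single positive, negative, or neutral.
Follow the pattern of the following examples:
${shots}
<</SYS>>
User: ${statement}
[/INST]
`;
}

// Produces essentially the same prompt as the hand-written string above.
const sentimentPrompt = buildSentimentPrompt(
  [
    { user: "Hi, my name is Bob", bot: "neutral" },
    { user: "I am so happy today", bot: "positive" },
    { user: "I am so sad today", bot: "negative" },
  ],
  "I had a rough day at work."
);
```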
Let’s try it by writing a super simple Spin app in TypeScript.
Putting This All Together
To create the app, I’m using Spin 2.1. I’m going to scaffold out the project like this:
$ spin new sentiment -t http-ts
Description: A quick sentiment analysis tool
HTTP path: /...
$ cd sentiment
$ npm install
added 141 packages, and audited 142 packages in 6s
20 packages are looking for funding
run `npm fund` for details
found 0 vulnerabilities
The code in these examples is all available in my GitHub repo.
Notice that I ran three commands:
- `spin new sentiment -t http-ts` created a new project named `sentiment` and applied the template `http-ts` (an HTTP service in TypeScript)
- `cd sentiment` to change into the new `sentiment` directory
- `npm install` to install the initial dependencies
Now I'll open the newly created `sentiment/` project in my code editor. The file `src/index.ts` is my main code. Let's write an LLM inference using the prompt from above:
// I added `Llm` and `InferencingModels` to the imports
import { Llm, InferencingModels, HandleRequest, HttpRequest, HttpResponse } from "@fermyon/spin-sdk"
// This was created for me
export const handleRequest: HandleRequest = async function (request: HttpRequest): Promise<HttpResponse> {
  // My new code:
  // The prompt we described above.
  const prompt = `<s>
[INST]
<<SYS>>
You are a bot that generates sentiment analysis responses. Respond with a single positive, negative, or neutral.
Follow the pattern of the following examples:
User: Hi, my name is Bob
Bot: neutral
User: I am so happy today
Bot: positive
User: I am so sad today
Bot: negative
<</SYS>>
User: I had a rough day at work.
[/INST]
`
  // This is the model we're using.
  const model = InferencingModels.Llama2Chat
  // This does the inference
  let answer = Llm.infer(model, prompt)
  return {
    status: 200,
    headers: { "content-type": "text/plain" },
    // I just changed this line to print out what the LLM answered
    body: `The LLM said: ${answer.text}`
  }
}
After importing a few extra objects (`Llm` and `InferencingModels`), I just changed the default request handler to do the following:
- `const prompt` is the LLaMa2 prompt exactly as we defined it above.
- `const model = InferencingModels.Llama2Chat` declares that I am using LLaMa2's chat model. (There's also a `Llama2Code` model that generates code snippets.)
- Most importantly, `let answer = Llm.infer(model, prompt)` is the line that does the actual LLM inference. I send the model and prompt and wait for the LLM to return an `answer` (there's a small sketch of working with that answer just after this list).
- Then, in the `return`ed object, I just set the `body` to a template literal that prints `The LLM said: ${answer.text}`.
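Here is that sketch: since we taught the model to answer in the `Bot: <label>` format, we can pull just the label out of `answer.text` before returning it. This post-processing is my own addition, not part of the scaffolded project or the Spin SDK.

```typescript
// Hypothetical post-processing: the model replies with something like
// "Bot: negative", so fish the label out of the raw text.
type Sentiment = "positive" | "negative" | "neutral" | "unknown";

function extractSentiment(raw: string): Sentiment {
  const match = raw.toLowerCase().match(/\b(positive|negative|neutral)\b/);
  return match ? (match[1] as Sentiment) : "unknown";
}

// In the handler, we could then return just the label:
//   body: `Sentiment: ${extractSentiment(answer.text)}`
```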
Before I can run this code, though, I need to make one quick configuration change.
Every Spin project has a `spin.toml` file for configuring the app. I need to tell Spin's security sandbox to let me use the LLM features. So, I need to add one line (directly in the `[component.sentiment]` section) to that file, as shown below:
spin_manifest_version = 2
[application]
authors = ["Matt Butcher <matt.butcher@fermyon.com>"]
description = "A quick sentiment analysis tool"
name = "sentiment"
version = "0.1.0"
[[trigger.http]]
route = "/..."
component = "sentiment"
[component.sentiment]
source = "target/sentiment.wasm"
exclude_files = ["**/node_modules"]
ai_models = ["llama2-chat"] # <-- ADDED THIS LINE
[component.sentiment.build]
command = "npm run build"
I can build and test locally by executing `spin build --up`, which builds the code and then starts a local server. Running locally, though, means downloading the pretrained LLaMa2 model and installing it on my machine. There are instructions for configuring a pretrained model in your local environment, but I find it much more convenient to simply deploy my app to Fermyon Cloud's free tier, which only requires a GitHub handle and comes with AI-grade GPUs and the LLaMa2 13b model preinstalled:
$ spin build && spin deploy
Building component sentiment with `npm run build`
> sentiment@1.0.0 build
> npx webpack --mode=production && mkdir -p target && spin js2wasm -o target/sentiment.wasm dist/spin.js
asset spin.js 12.8 KiB [compared for emit] (name: main)
orphan modules 5.91 KiB [orphan] 11 modules
runtime modules 937 bytes 4 modules
cacheable modules 9.76 KiB
./src/index.ts + 5 modules 9.13 KiB [built] [code generated]
./node_modules/typedarray-to-buffer/index.js 646 bytes [built] [code generated]
webpack 5.89.0 compiled successfully in 739 ms
Starting to build Spin compatible module
Preinitiating using Wizer
Optimizing wasm binary using wasm-opt
Spin compatible module built successfully
Finished building all Spin components
Uploading sentiment version 0.1.0 to Fermyon Cloud...
Deploying...
Waiting for application to become ready............................ ready
Available Routes:
sentiment: https://sentiment-44wbpowj.fermyon.app (wildcard)
When the deployment is done, it will give me a URL that I can then access using a web browser. You can see the results of my code by going to https://sentiment-44wbpowj.fermyon.app.
💡 LLMs require incredible amounts of data crunching. If you’re running locally, the above code can take five minutes or more to return an answer. Even on Fermyon’s powerful A100 GPUs it may take several seconds to get an answer.
As you can see, the LLM followed our guidance and returned the message `Bot: negative`. In other words, it concluded (correctly) that the prompt `User: I had a rough day at work` was reflecting negative sentiment.
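If you would rather check from the terminal than a browser, a plain `curl` against the deployed route returns the same body (your hostname will differ, and as noted above, the response can take several seconds):

```shell
$ curl https://sentiment-44wbpowj.fermyon.app
The LLM said: Bot: negative
```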
Let’s make one very minor adjustment to our app and redeploy. We’ll just change the prompt:
const prompt = `<s>
[INST]
<<SYS>>
You are a bot that generates sentiment analysis responses. Respond with a single positive, negative, or neutral.
Follow the pattern of the following examples:
User: Hi, my name is Bob
Bot: neutral
User: I am so happy today
Bot: positive
User: I am so sad today
Bot: negative
<</SYS>>
User: Today is the greatest day I've ever known.
[/INST]
`
All I changed was the last line: User: Today is the greatest day I've ever known.
Now, when we re-run `spin build && spin deploy` and visit our application in the browser, we'll see:
The LLM said: Bot: positive
Conclusion
As magical as LLMs feel, behind the scenes they do some serious vector math on a vast numerical representation of a huge assortment of human-generated text. Essentially, the model predicts the probability of one word following another, and your job is to steer it toward the output you want. Prompt Engineering boils down to being specific about how you want the output to look, combined with a lot of trial and error.
I was on a live stream recently with my friend and colleague Sohan. We played around with different prompts and saw for ourselves that the Llama 2 model worked well with questions but (amusingly!) had a harder time with "fill in the blanks"-style requests. We also took the example one step further, having the LLM do multiple back-and-forth exchanges to refine an answer.
Check it out (the prompt engineering bit is around 5 minutes long and starts at the 12:13 mark):
Let me know what your prompt engineering secrets are and whether you were able to get an LLM to do even more sophisticated sentiment analysis!
--
The cover image was generated by Bing using the prompt "Illustrate prompt engineering with a tie-in for webassembly."