An Easy way to get started programming with LLMs

Liam Griffiths

Posted on August 22, 2024

So you're interested in trying to build something cool with LLMs and are looking for a way to get started without too much fuss. I'd like to share a little bit of what I've been working on at Substrate and step through how to use it to start making useful things right away.

I'm using JavaScript here, but this can be done with Python too!

To follow along, you can install the Substrate SDK and sign up for an account, which comes with some free credits.

The basic setup is to npm install substrate, then open up a new JavaScript file.
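One note on setup: because the snippets below use import and top-level await, the file needs to run as an ES module in Node. A minimal package.json like this should do the trick (the package name comes straight from the install command above; the "latest" version specifier is just a placeholder for whatever npm installed for you):

{
  "name": "llm-demo",
  "type": "module",
  "dependencies": {
    "substrate": "latest"
  }
}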

Let's start with a basic interaction with an LLM. For now, we don't have to care about the specific model - we just want to get something working.

import { Substrate, ComputeText } from "substrate"

const substrate = new Substrate({ apiKey: "YOUR_API_KEY" })

const computeText = new ComputeText({ prompt: "tell me a story about an owl" })
const result = await substrate.run(computeText)

console.log(result.json)

If you run that you should see the story the LLM created in just a few moments. Nice!

Now let's introduce some prompt engineering and see if we can get the LLM to answer a question for us, with the response formatted in a useful way.

const question = "What are the most popular events at the Olympics?";

const computeText = new ComputeText({ 
  prompt: `
    Answer the following question, but do so with a numbered list with no more than 5 items.

    === Question
    ${question}`,
})
const result = await substrate.run(computeText)

console.log(result.json)

Here we've expanded our prompt with some additional instructions and presented our question to the LLM in a clear way.

Here's what it gave me:


Here are the top 5 most popular events at the Olympics:

1. **Track and Field (Athletics)**: Events like the 100m dash, long jump, high jump, and marathon are consistently among the most-watched and followed sports at the Olympics.

2. **Swimming**: Swimming events such as the 100m freestyle, 200m individual medley, and relay events have a huge following due to the world-class athletes competing in this sport.

3. **Gymnastics**: Artistic gymnastics, rhythmic gymnastics, and trampoline events are extremely popular, with the women's all-around competition being one of the most-watched events.

4. **Diving**: The precision and skill required for diving events make them thrilling to watch, with the platform and springboard competitions drawing large audiences.

5. **Figure Skating**: Figure skating events, including men's and women's singles, pairs, and ice dance, are highly anticipated and watched, especially during the winter Olympics.

Now let's see if we can extract this text into a more structured format. LLMs are really good at this, and it makes the data much easier to integrate into a program we'd like to build.

We're going to use a variant of ComputeText called ComputeJSON for this, which lets us define a schema for the output. We'll also need a way to chain these two calls together in one request, so let's import the sb helper functions as well.

import { Substrate, ComputeText, ComputeJSON, sb } from "substrate"

Now let's set up ComputeJSON to extract the event name and a description as labeled fields.

const question = "What are the most popular events at the Olympics?";

const computeText = new ComputeText({ 
  prompt: `
    Answer the following question, but do so with a numbered list with no more than 5 items.

    === Question
    ${question}`,
})

const computeJSON = new ComputeJSON({
  prompt: sb.interpolate`Extract the event name and event description from the following: ${computeText.future.text}`,
  json_schema: {
    type: "object",
    properties: {
      events: {
        type: "array",
        items: {
          type: "object",
          properties: {
            name: { type: "string" },
            description: { type: "string" },
          }
        }
      }
    }
  }
})

const result = await substrate.run(computeJSON)
console.log(result.get(computeJSON).json_object);

OK, there's a little more going on here, so let me explain.

In the prompt for ComputeJSON we'd like to use the future result of the ComputeText response, and the sb.interpolate helper lets us do that. It works just like normal string interpolation, but it also accepts future values. In the json_schema field we define the structure of the JSON data we'd like in the end.

You might have also noticed that we're now calling substrate.run(computeJSON). This will run both LLM calls, since we've wired them together by using computeText.future.text as an input; the SDK is smart enough to figure out that both nodes need to run.

Finally, when we console.log at the end, we use the result.get(<node>) helper to make selecting a specific node's output from the result a little easier.

The output I got from running that last example looks like this:

{
  "events": [
    {
      "name": "track and field (athletics)",
      "description": "events like the 100m dash, long jump, high jump, and marathon are consistently among the most-watched and followed events at the olympics."
    },
    {
      "name": "gymnastics",
      "description": "artistic gymnastics, rhythmic gymnastics, and trampoline events have a huge following due to their technical difficulty and the athleticism required."
    },
    {
      "name": "swimming",
      "description": "the 100m freestyle, 200m individual medley, and relay events are always highly anticipated and closely watched."
    },
    {
      "name": "diving",
      "description": "the platform diving and springboard events showcase incredible skill and precision, making them a fan favorite."
    },
    {
      "name": "figure skating",
      "description": "the men's and women's singles, pairs, and ice dance events are highly popular due to their artistic expression and technical complexity."
    }
  ]
}

Not bad!

We've just created a pretty simple pipeline that chains two LLM calls together and produces some fun structured data.
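Since both nodes ran as part of the same graph, you can also read the intermediate ComputeText output from the same result. Here's a quick sketch (I'm assuming the text output lives on a .text field, matching the future we referenced earlier):

// Both nodes ran, so we can read each node's output from the same result.
const listText = result.get(computeText).text;           // the numbered list from ComputeText
const structured = result.get(computeJSON).json_object;  // the extracted JSON

console.log(listText);
console.log(structured.events.map((event) => event.name));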

In the end, this pipeline is a two-node graph. On Substrate you can create even more complex graphs by chaining together as many different models as you like, including models that output other kinds of data: text, images, video, and audio.
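As a rough sketch of where you could take this, here's what chaining into an image model might look like. The GenerateImage node and its image_uri output field are assumptions on my part, so check the Substrate docs for the exact names:

import { Substrate, ComputeText, GenerateImage, sb } from "substrate"

const substrate = new Substrate({ apiKey: "YOUR_API_KEY" })

// Write a short story, then hand it off to an image model to illustrate it.
const story = new ComputeText({ prompt: "tell me a story about an owl" })
const illustration = new GenerateImage({
  // GenerateImage is an assumption; swap in whichever image node the SDK provides.
  prompt: sb.interpolate`A storybook illustration of this story: ${story.future.text}`,
})

const result = await substrate.run(illustration)
console.log(result.get(illustration).image_uri) // assumed output field for the image URL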

I'll be blogging here a little more to share some fun ideas and ways of using Substrate to work with AI models with very simple code, like we did here. Stay tuned!
