Run & Debug your LLM Apps locally using Ollama & Llama 3.1


Yemi Adejumobi

Posted on August 14, 2024


In the rapidly evolving landscape of AI and ML, large language models (LLMs) have become increasingly powerful and ubiquitous. However, the costs and complexities associated with running these models in cloud environments can be prohibitive, especially for developers and small teams looking to experiment and innovate.

Enter Ollama, a game-changing tool that brings the power of LLMs to your local machine. This blog post explores how Ollama can simplify your development process, allowing you to run LLM applications locally with ease and efficiency. We will pair it with Langtrace, an open-source observability tool that complements Ollama by providing crucial insights into your LLM application's performance and behavior.

Whether you're a seasoned AI developer or just starting your journey with language models, this guide will equip you with the knowledge and tools to take your LLM projects to the next level. Let’s dive in.

What is Ollama?

Ollama is an innovative tool that enables running large language models (LLMs) locally, providing a cost-effective environment for testing and development.

By running LLMs locally, you can:

  • Reduce cloud costs: save on cloud computing expenses by running models on your own machine.
  • Experiment faster: quickly test and iterate on your ideas without relying on remote servers.
  • Improve data privacy: keep your data local and secure, reducing the risk of data breaches.

Setting up Ollama and running LLMs locally

For this step, we will use Meta’s latest open-source model, Llama 3.1. For optimal performance with Ollama, make sure your laptop has at least 16 GB of RAM, then follow these steps:

  1. Download and install Ollama from https://ollama.com/download
  2. Download and run the desired model (e.g., Llama 3.1 or another open-source model). For example, to run llama3.1 locally, enter the following in a terminal window:

ollama run llama3.1

  • Like a Docker command, this pulls the llama3.1 model if it is not already present and then runs it (a few other Docker-style management commands are shown after these steps).


  • Once it is done pulling, you should have a terminal prompt you can start chatting from.

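A few other Ollama commands follow the same Docker-style pattern and are handy while experimenting. These are standard Ollama CLI commands, shown here as a quick reference:

ollama pull llama3.1    # download the model without starting a chat
ollama list             # list the models available locally
ollama rm llama3.1      # remove a local model to free up disk space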

For further customization, such as using a Modelfile to create your own custom system prompt, refer to the Ollama documentation here. A minimal sketch is shown below.
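As a rough illustration (the fashionista model name and the prompt text are placeholders of my own, not from the Ollama docs), a Modelfile can pin a base model, tweak parameters, and set a system prompt:

FROM llama3.1
PARAMETER temperature 0.7
SYSTEM """You are a friendly assistant who specializes in men's formal wear. Keep answers brief."""

You can then build and run the custom model with:

ollama create fashionista -f Modelfile
ollama run fashionista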

Instrumenting Ollama with Langtrace

Now that you have a local LLM, suppose you are building a customer service bot and want to view detailed traces of its LLM requests. This is where Langtrace shines. Langtrace provides a Python SDK that enables observability for Ollama, allowing you to trace LLM calls and gain valuable insights into your application's performance. To instrument Ollama with Langtrace:

  1. Generate an API key from langtrace.ai (you can also self-host).
  2. Install the Langtrace Python or TypeScript SDK (an example install command is shown after these steps).
  3. Import and initialize the SDK.
  4. Start tracing!
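For the Python path, installation boils down to a single pip command. The package names below are inferred from the imports in the example that follows, so double-check them against the Langtrace docs:

pip install langtrace-python-sdk ollama python-dotenv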

Example code snippet:

from langtrace_python_sdk import langtrace, with_langtrace_root_span
import ollama
from dotenv import load_dotenv

# Load environment variables (e.g., secrets kept in a .env file)
load_dotenv()

# Initialize Langtrace before making any LLM calls so they get traced
langtrace.init(api_key='YOUR_API_KEY', write_spans_to_console=False)

@with_langtrace_root_span()
def give_recs():
    # Send a single chat request to the locally running llama3.1 model
    response = ollama.chat(model='llama3.1', messages=[
        {
            'role': 'user',
            'content': "You are an AI assistant with expertise in men's clothing. Help me pick clothing for a black tie dinner at work.",
        },
    ])
    print(response['message']['content'])

if __name__ == "__main__":
    print("Running fashionista bot...")
    give_recs()


Here is what the trace looks like in the Langtrace UI:

[Screenshot: trace view in the Langtrace UI]

Here is a link to a reference cookbook for Ollama integration with Langtrace.

Tracing LLM calls

With Langtrace, you can now trace LLM calls and capture essential metadata, such as:

  • Input, output, and total tokens
  • Latency
  • Error rates

This data provides valuable insights into your application's performance, helping you optimize and improve it over time.
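As a quick sanity check outside of Langtrace, you can also read some of this metadata straight from Ollama's non-streaming response. The prompt_eval_count, eval_count, and total_duration fields below are based on Ollama's reported response format and may vary by version, so treat this as a sketch:

import ollama

response = ollama.chat(model='llama3.1', messages=[
    {'role': 'user', 'content': 'Suggest a tie color for a navy suit.'},
])

# Roughly the same metadata Langtrace records as span attributes
print("Input tokens: ", response['prompt_eval_count'])
print("Output tokens:", response['eval_count'])
print("Latency (s):  ", response['total_duration'] / 1e9)  # durations are reported in nanoseconds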


In the next blog in this series, we will cover how to use Langtrace to perform evaluations on your application’s accuracy and optimize its behavior.

Quick Update

I added a UI option to this bot. Feel free to check out the code here. I use Streamlit for the UI, but you can swap it out for Gradio or any other library; a rough sketch is included below.

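The linked ollama-fashionistav2.py is the real implementation; the snippet below is only a minimal sketch of how a Streamlit chat UI can wrap the same ollama.chat call, with the title and model name as placeholder choices:

import ollama
import streamlit as st
from langtrace_python_sdk import langtrace

# Initialize tracing once, before any LLM calls
langtrace.init(api_key='YOUR_API_KEY')

st.title("Fashionista Bot")

# Keep the chat history across Streamlit reruns
if "messages" not in st.session_state:
    st.session_state.messages = []

# Replay previous turns
for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.write(msg["content"])

# Handle a new user prompt
if prompt := st.chat_input("Ask for outfit advice..."):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.write(prompt)

    # Send the full history to the local model and display the reply
    response = ollama.chat(model="llama3.1", messages=st.session_state.messages)
    reply = response["message"]["content"]

    st.session_state.messages.append({"role": "assistant", "content": reply})
    with st.chat_message("assistant"):
        st.write(reply)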

To see this in action, install Streamlit

pip install streamlit 

Then run the code using

streamlit run ollama-fashionistav2.py

Next steps

In conclusion, combining Ollama's local LLM capabilities with Langtrace's observability features unlocks a powerful toolset for building and optimizing LLM applications. By following the steps outlined in this post, you can leverage the benefits of running LLMs locally with Ollama, including reduced cloud costs, accelerated experimentation, and improved data privacy.

With Langtrace, you can gain valuable insights into your application's performance, identify bottlenecks, and optimize its behavior. By integrating Ollama and Langtrace, you can build more efficient, effective, and innovative LLM applications. Try out Ollama and Langtrace today and discover the advantages of local LLM development and open-source observability for yourself!
