Yemi Adejumobi
Posted on August 14, 2024
In the rapidly evolving landscape of AI and ML, large language models (LLMs) have become increasingly powerful and ubiquitous. However, the costs and complexities associated with running these models in cloud environments can be prohibitive, especially for developers and small teams looking to experiment and innovate.
Enter Ollama, a game-changing tool that brings the power of LLMs to your local machine. This blog post will explore how Ollama can simplify your development process, allowing you to run LLM applications locally with ease and efficiency while adding Langtrace, an open-source observability tool that complements Ollama perfectly, providing crucial insights into your LLM application's performance and behavior.
Whether you're a seasoned AI developer or just starting your journey with language models, this guide will equip you with the knowledge and tools to take your LLM projects to the next level. Let’s dive in.
What is Ollama?
Ollama is an innovative tool that enables running large language models (LLMs) locally, providing a cost-effective solution for testing and development. By running LLMs locally, you can experiment and refine your ideas without incurring significant production costs.
By running LLMs locally, you can:
- Reduce cloud costs: Save on cloud computing expenses by running LLMs on your local machine.
- Faster experimentation: Quickly test and iterate on your ideas without relying on remote servers.
- Improved data privacy: Keep your data local and secure, reducing the risk of data breaches.
Setting up Ollama and running LLMs locally
For this step, we will be using Meta’s latest open source model, Llama3.1. For most optimal performance with Ollama ensure your laptop has at least 16GB of RAM. If you do then follow these steps:
- Download and install Ollama https://ollama.com/download
- Download the desired LLM model (e.g., Llama3.1 or other open-source models). In a terminal window run the following to run
llama3.1
locally for example
ollama run llama3.1
- This is similar to docker commands, it will pull and run llama3.1
- Once it is done pulling, you should have a terminal prompt you can start chatting from.
For further customization and to use Modelfile
to create your own custom system prompt, refer to Ollama documentation here.
Instrumenting Ollama with Langtrace
Now that you have a local LLM, let’s say you are building a customer service bot and you would like to view detailed traces on the LLM requests, this is where Langtrace shines. Langtrace provides a Python SDK that enables observability for Ollama, allowing you to trace LLM calls and gain valuable insights into your application's performance. To instrument Ollama with Langtrace:
- Generate an API key from langtrace.ai - you can also self-host.
- Install the Langtrace Python or Typescript SDK.
- Import the SDK and initialize the SDK.
- Start tracing!
Example code snippet:
from langtrace_python_sdk import langtrace, with_langtrace_root_span
import ollama
from dotenv import load_dotenv
load_dotenv()
# langtrace.init(write_spans_to_console=False)
langtrace.init(api_key = 'YOUR_API_KEY', write_spans_to_console=False)
@with_langtrace_root_span()
def give_recs():
response = ollama.chat(model='llama3.1', messages=[
{
'role': 'user',
'content': 'You are an AI assistant with expertise in mens clothing. Help me pick clothing for a black tie dinner at work.',
},
])
print(response['message']['content'])
if __name__ == "__main__":
print("Running fashionista bot...")
give_recs()
Here is what the trace looks like in Langtrace UI
Here is a link to a reference cookbook for Ollama integration with Langtrace.
Tracing LLM call
With Langtrace, you can now trace LLM calls and capture essential metadata, such as:
- Input, Output and Total tokens
- Latency
- Error rates
This data provides valuable insights into your application's performance, helping you optimize and improve it over time.
In the next blog in this series, we will cover how to use Langtrace to perform evaluations on your application’s accuracy and optimize its behavior.
Quick Update
I added a UI option to this bot. Feel free to check out the code here. I use Streamlit for the UI but you can swap it out for Gradio or any other library.
To see this in action, install Streamlit
pip install streamlit
Then run the code using
streamlit run ollama-fashionistav2.py
Next steps
In conclusion, combining Ollama's local LLM capabilities with Langtrace's observability features unlocks a powerful toolset for building and optimizing LLM applications. By following the steps outlined in this post, you can leverage the benefits of running LLMs locally with Ollama, including reduced cloud costs, accelerated experimentation, and improved data privacy.
With Langtrace, you can gain valuable insights into your application's performance, identify bottlenecks, and optimize its behavior. By integrating Ollama and Langtrace, you can build more efficient, effective, and innovative LLM applications. Try out Ollama and Langtrace today and discover the advantages of local LLM development and open-source observability for yourself!
Posted on August 14, 2024
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.