Building A Python Script For Natural Language Q&A In The Console For Free Using A LLaMA Model

Motivation

In today's digital age, depending on the internet for everyday tasks has become the norm. However, there are times when we are away from home or the office without a stable connection. Imagine having access to a powerful language model that runs without the internet, and being able to experiment with different configurations and scenarios.

One of the challenges we face when we step outside is an unreliable or non-existent internet connection. A console-based language generation tool frees us from this constraint by providing a self-contained environment that needs no internet access once the model has been downloaded.

Why pay for complex and expensive language generation services in the cloud when all you need are simple answers and modest quality is acceptable? The console-based approach caters to those seeking quick, straightforward responses without hefty costs. Because everything runs offline on your own hardware, there is no need for constant connectivity, which keeps it cost-effective for minimal language generation needs.

Every language model has its own characteristics and applications. With console-based language generation, you can explore the capabilities of different LLaMA models. Whether you're looking for a specific tone, style, or area of expertise, being able to try out and compare various LLaMA models opens up a world of creative possibilities.

No two scenarios are the same, and our language generation needs can vary based on context. The console-based approach empowers users to experiment with different configurations and scenarios. By modifying prompts, adjusting token limits, or exploring stop conditions, users can tailor the language generation experience to meet their specific requirements.

Prerequisites

Python and basic skills with pip
llama.cpp bindings for Python (llama-cpp-python)
A compatible LLaMA model
A computer with enough CPU, memory and storage (a GPU is not required)

First of all, install the dependencies:

python3 -m pip install llama-cpp-python termcolor

Or if you have previously installed llama-cpp-python through pip and want to upgrade your version, just run this instead:

python3 -m pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir
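A quick way to confirm that the installation worked is to ask Python whether it can locate both modules. The following is a minimal sketch; the helper name missing_packages is made up for this example and is not part of either library.

```python
import importlib.util


def missing_packages(modules=("llama_cpp", "termcolor")):
    """Return the names in `modules` that Python cannot locate."""
    # find_spec() returns None when a top-level module is not installed.
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All dependencies are installed.")
```

If anything is reported as missing, rerun the pip command above with the same python3 interpreter you plan to use for the script.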

Note: If you are using an Apple Silicon (M1) Mac, make sure you have installed a version of Python that supports the arm64 architecture. For example:

wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh
bash Miniforge3-MacOSX-arm64.sh
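To check which architecture your current interpreter was built for, you can query the standard platform module. This is only a sketch; the helper name is_native_arm64 is hypothetical.

```python
import platform


def is_native_arm64(machine=None):
    """Return True if the given (or current) machine string denotes arm64."""
    # platform.machine() reports "arm64" for a native Apple Silicon Python
    # and "x86_64" when the interpreter runs under Rosetta translation.
    machine = machine or platform.machine()
    return machine.lower() in ("arm64", "aarch64")


if __name__ == "__main__":
    print("Interpreter architecture:", platform.machine())
    print("Native arm64 build:", is_native_arm64())
```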

Downloading the LLaMA model

To harness the power of llama-cpp-python, we need to start by downloading a suitable model.

For this blog post, we'll be working with the WizardLM model, specifically the wizardLM-7B.ggmlv3.q4_1.bin variant. This particular model strikes a balance between accuracy and inference speed, making it ideal for CPU usage.

While there are other models available with varying levels of accuracy and speed, feel free to experiment and find the one that best suits your needs. Keep in mind that the selected model may require 2.8 GB to 8 GB of disk space, and up to 10 GB of memory.
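Before downloading, it can help to check that the target drive has room for the model. The sketch below uses only the standard library; the helper names are invented for this example, and it assumes decimal gigabytes because the post does not say whether GB or GiB is meant. It does not check available memory, since the standard library has no portable call for that.

```python
import os
import shutil

GB = 1000 ** 3  # assumption: decimal gigabytes


def free_disk_gb(directory="."):
    """Free space, in GB, on the filesystem that contains `directory`."""
    return shutil.disk_usage(directory).free / GB


def has_room_for(required_gb, directory="."):
    """True if `directory` has at least `required_gb` GB free."""
    return free_disk_gb(directory) >= required_gb


def file_size_gb(path):
    """Size of an existing file (for example a downloaded model) in GB."""
    return os.path.getsize(path) / GB


if __name__ == "__main__":
    # 8 GB is the upper end of the disk range quoted above.
    print(f"Free disk space: {free_disk_gb():.1f} GB")
    print("Room for an 8 GB model:", has_room_for(8))
```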

Ensure you explore the linked models in the README to discover the full range of supported options. This flexibility allows you to choose models that align with your specific requirements. With the right model in hand, you're ready to embark on a journey of language generation and exploration.

The Python Script

Using your favorite text editor, create a Python script and name it whatever you want, for example my-script.py.

Here’s our script to experiment with:

Note: Don't forget to insert your LLaMA model file name in the MODEL_PATH variable.

#!/usr/bin/env python3
"""Minimal console Q&A loop backed by a local LLaMA model."""
import os
import sys

from termcolor import colored
from llama_cpp import Llama

# Path to the downloaded model file and the upper limit on generated tokens.
MODEL_PATH = "models/<here comes the LLaMA model file name>"
MAX_TOKENS = 2048


def run_console_app():
    # Fail early with a clear message if the model file is missing.
    if not os.path.isfile(MODEL_PATH):
        print(f"Error: Invalid model path '{MODEL_PATH}'")
        sys.exit(1)

    # Load the model once, before entering the question loop.
    llm = Llama(model_path=MODEL_PATH)

    print(colored("\n## Welcome to the Llama Console App! ##\n", "yellow"))
    print("Enter your prompt (or 'q' to quit):")

    while True:
        user_input = input("> ").strip()
        if user_input.lower() == 'q':
            print("Exiting...")
            break
        if not user_input:
            continue  # Ignore empty lines instead of querying the model.

        # Wrap the question in a simple Q/A template.
        prompt = f"Q: {user_input} A:"
        print(colored("Processing...", "yellow"))

        try:
            # Generation stops when the model starts a new "Q:" line.
            output = llm(prompt, max_tokens=MAX_TOKENS, stop=["Q:"], echo=True)

            # echo=True returns the prompt as well, so remove it before printing.
            choices_text = output["choices"][0]["text"]
            choices_text = choices_text.replace(prompt, "").strip()
            print(f"\n{colored(choices_text, 'green')}\n")
        except Exception as e:
            print(f"Error: Failed to generate response. {e}")


if __name__ == '__main__':
    run_console_app()
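If you want to experiment with other prompt templates, it helps to pull the prompt handling out of the loop. The sketch below is one possible refactoring, not part of the original script: build_prompt and extract_answer are hypothetical helper names, and fake_output only imitates the choices/text structure that the script reads, so the example runs without loading a model. Unlike the replace() call in the script, it removes the echoed prompt only when it appears at the start of the text.

```python
def build_prompt(question):
    """Wrap a question in the same Q/A template the script uses."""
    return f"Q: {question.strip()} A:"


def extract_answer(output, prompt):
    """Return the generated answer without the echoed prompt."""
    text = output["choices"][0]["text"]
    # With echo=True the text starts with the prompt; strip only that prefix
    # so a repeated question inside the answer is left untouched.
    if text.startswith(prompt):
        text = text[len(prompt):]
    return text.strip()


# Hand-written stand-in for a model response (not real model output).
fake_output = {
    "choices": [{"text": "Q: What is a llama? A: A South American camelid."}]
}

if __name__ == "__main__":
    prompt = build_prompt("What is a llama?")
    print(extract_answer(fake_output, prompt))
```

Changing the template then only means editing build_prompt, while the stop sequence passed to the model should still match whatever marks the start of the next question.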

How to use it?

That’s pretty much it! Now you can open your preferred terminal and just run python3 <your .py file name>.py and voilà.
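If you would rather not edit the script every time you switch models, the settings could be read from the command line. This is a sketch under assumptions: the flags --model, --max-tokens and --stop are invented for this example (the original script has no command-line options), and models/example.bin is a placeholder path, not a real file.

```python
import argparse


def parse_args(argv=None):
    """Parse hypothetical command-line options for the console app."""
    parser = argparse.ArgumentParser(
        description="Console Q&A with a local LLaMA model"
    )
    parser.add_argument("--model", required=True,
                        help="path to the LLaMA model file")
    parser.add_argument("--max-tokens", type=int, default=2048,
                        help="upper limit on generated tokens")
    parser.add_argument("--stop", action="append", default=None,
                        help="stop sequence (may be given several times)")
    args = parser.parse_args(argv)
    if args.stop is None:
        args.stop = ["Q:"]  # same default stop sequence as the script
    return args


# Example with an explicit argument list instead of the real command line.
args = parse_args(["--model", "models/example.bin", "--max-tokens", "256"])
print(args.model, args.max_tokens, args.stop)
```

Inside the script, calling parse_args() with no arguments would read the real command line, and args.model, args.max_tokens and args.stop would take the place of the constants.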

Once the model has finished loading, you should see the welcome banner followed by a > prompt waiting for your question.

Conclusion

With the ability to try different LLaMA models and experiment with configurations, this approach puts the power of language generation directly into our hands. Whether you're a writer, researcher, or problem-solver, this console-based solution opens up a world of possibilities, even when you're on the go.

Happy chatting!
