Training ChatGPT with local data to create your own chat bot!

0xmichaelwahl

Michael Wahl

Posted on July 2, 2023

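Using OpenAI's GPT models (the technology behind ChatGPT), we can build a chatbot that answers questions from our own local/custom data, scoped to our own needs or use cases. Strictly speaking, the model itself is not retrained: the script below indexes your documents and, for each question, passes the most relevant passages to the model along with the question.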

I am using macOS, but you can also use Windows or Linux.

Install Python
You need to make sure you have Python 3 installed. Use a recent 3.x release, because the libraries below do not support the oldest 3.x versions. Head over to the following link and download the Python installer: . You can also open a terminal and run python3 --version to check which version is installed.
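If you'd rather check from Python itself, the small sketch below prints the interpreter version and compares it with a minimum. The (3, 8) minimum is my assumption, not a requirement stated in this post, so adjust it to whatever the libraries you install actually need; python_is_recent_enough is a hypothetical helper name.

```python
import sys

# Assumed minimum version (not from this post); adjust to what your libraries require
MIN_VERSION = (3, 8)


def python_is_recent_enough(minimum=MIN_VERSION):
    """Return True if the running interpreter is at least `minimum` (major, minor)."""
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    if python_is_recent_enough():
        print("OK")
    else:
        print(f"Please install Python {MIN_VERSION[0]}.{MIN_VERSION[1]} or newer")
```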

Upgrade pip
python3 -m pip install -U pip

Installing Libraries

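pip3 install openai
pip3 install gpt_index==0.4.24
pip3 install PyPDF2
pip3 install gradio

The gpt_index version is pinned because the script below relies on its older API (GPTSimpleVectorIndex, save_to_disk), which later releases of the library (renamed llama_index) changed. For the same reason you may need an openai package older than 1.0, since the langchain version pulled in alongside gpt_index 0.4.24 predates the openai 1.0 API.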
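To confirm the installs worked, a quick sketch like the one below asks Python's standard importlib.metadata for each package's installed version. The package list mirrors the pip commands above; installed_versions is a hypothetical helper, not part of any of these libraries.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as used in the pip commands above
PACKAGES = ["openai", "gpt_index", "PyPDF2", "gradio"]


def installed_versions(packages=PACKAGES):
    """Map each package name to its installed version string, or None if it is missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```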

Get OpenAI key
Create an API key in your OpenAI account; the script below needs it in place of 'ApiGoesHere'.
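Rather than pasting the key into app.py, you could export it in your shell (for example export OPENAI_API_KEY=... on macOS or Linux) and have the script check it at startup. This is a minimal sketch under that assumption; get_openai_key is a hypothetical helper name. The OpenAI client libraries read the same OPENAI_API_KEY environment variable that the script sets.

```python
import os


def get_openai_key(env_var="OPENAI_API_KEY"):
    """Return the API key from the environment, failing early with a clear message."""
    key = os.environ.get(env_var, "").strip()
    # Treat the placeholder from the script as "not set"
    if not key or key == "ApiGoesHere":
        raise RuntimeError(f"Set the {env_var} environment variable to your OpenAI API key")
    return key
```

Calling get_openai_key() near the top of app.py makes a missing key fail immediately instead of partway through indexing.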

Prep Data
Create a new directory named ‘docs’ in the same folder where you will save the script (the script loads it by the relative path "docs") and put PDF, TXT or CSV files inside it. You can add multiple files, but remember that the more data you add, the more tokens will be used. At the time of writing, free accounts were given $18 worth of tokens to use.
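Before spending tokens, it can help to see what will actually be indexed. The sketch below lists the PDF, TXT and CSV files in docs and gives a very rough size estimate for text files. The "about 4 characters per token" figure is a common rule of thumb for English text, not an exact count, and both helper names are hypothetical.

```python
from pathlib import Path

# File types mentioned in this post; SimpleDirectoryReader may accept others too
SUPPORTED_SUFFIXES = {".pdf", ".txt", ".csv"}


def list_docs(directory="docs"):
    """Return the supported files in `directory`, creating the folder if it is missing."""
    folder = Path(directory)
    folder.mkdir(exist_ok=True)
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
    )


def rough_token_estimate(path):
    """Very rough token count for a text file, assuming about 4 characters per token."""
    text = Path(path).read_text(encoding="utf-8", errors="ignore")
    return len(text) // 4


if __name__ == "__main__":
    for path in list_docs():
        if path.suffix.lower() in {".txt", ".csv"}:
            print(path.name, "~", rough_token_estimate(path), "tokens")
        else:
            # PDFs are binary, so character counts say little about their text
            print(path.name, "(PDF: size not estimated)")
```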

Script

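from gpt_index import SimpleDirectoryReader, GPTSimpleVectorIndex, LLMPredictor, PromptHelper
from langchain import OpenAI
import gradio as gr
import os

# Replace 'ApiGoesHere' with your own OpenAI API key
os.environ["OPENAI_API_KEY"] = 'ApiGoesHere'

def construct_index(directory_path):
    max_input_size = 4096    # maximum prompt size the model accepts, in tokens
    num_outputs = 512        # tokens reserved for the model's answer
    max_chunk_overlap = 20   # maximum token overlap between neighbouring chunks
    chunk_size_limit = 600   # maximum tokens per document chunk

    prompt_helper = PromptHelper(max_input_size, num_outputs, max_chunk_overlap, chunk_size_limit=chunk_size_limit)

    # text-davinci-003 was current when this post was written; OpenAI has since retired it
    llm_predictor = LLMPredictor(llm=OpenAI(temperature=0.7, model_name="text-davinci-003", max_tokens=num_outputs))

    # Load every supported file in the directory (PDF, TXT, CSV, ...)
    documents = SimpleDirectoryReader(directory_path).load_data()

    # Split the documents into chunks and embed them via the OpenAI API (this uses tokens)
    index = GPTSimpleVectorIndex(documents, llm_predictor=llm_predictor, prompt_helper=prompt_helper)

    # Save the index so the chatbot can reload it without re-embedding the documents
    index.save_to_disk('index.json')

    return index

def chatbot(input_text):
    # Reload the saved index and answer from the chunks most relevant to the question
    index = GPTSimpleVectorIndex.load_from_disk('index.json')
    response = index.query(input_text, response_mode="compact")
    return response.response

# gr.Textbox replaces gr.inputs.Textbox, which was deprecated in Gradio 3 and removed in Gradio 4
iface = gr.Interface(fn=chatbot,
                     inputs=gr.Textbox(lines=7, label="Enter your text"),
                     outputs="text",
                     title="My AI Chatbot")

if __name__ == "__main__":
    index = construct_index("docs")
    iface.launch(share=True)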


Save the script as app.py.

Open a terminal in that folder and run the following command:

python3 app.py

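This starts the "training", which here means indexing: the script reads every file in docs, splits it into chunks, and sends them to OpenAI to build index.json. This can take a while depending on how much data you have fed to it. Once done, it prints a link where you can test the responses in a simple web UI; the local URL is http://127.0.0.1:7860. Because the script calls iface.launch(share=True), Gradio also prints a temporary public share link.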
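The PromptHelper settings in the script (chunk_size_limit=600, max_chunk_overlap=20) control how documents are cut into pieces before embedding. As a rough illustration only, here is a simplified sketch of overlapping chunking. It splits on whitespace words rather than the model's real tokens, and chunk_words is a hypothetical function, not part of gpt_index.

```python
def chunk_words(text, chunk_size=600, overlap=20):
    """Split text into chunks of at most `chunk_size` words, where consecutive
    chunks share `overlap` words. Words stand in for tokens in this sketch."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this chunk reaches the end of the text
        if start + chunk_size >= len(words):
            break
    return chunks


print(chunk_words("a b c d e f g", chunk_size=3, overlap=1))
# ['a b c', 'c d e', 'e f g']
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk more often than it would with hard cuts.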

You can open the local URL in any browser and start testing your custom-trained chatbot. The port number might be different for you.
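When you type a question, GPTSimpleVectorIndex compares an embedding of the question with the stored chunk embeddings and puts the closest chunks into the prompt. The toy sketch below shows that retrieval idea with cosine similarity. It uses plain word counts as a stand-in for OpenAI embeddings and re-embeds every chunk on each call, so it illustrates the technique rather than reproducing what gpt_index does internally; embed, cosine and top_chunks are hypothetical names.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': lowercase word counts (a stand-in for real model embeddings)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_chunks(question, chunks, k=1):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


chunks = [
    "The office opens at 9 am.",
    "Refunds are processed within 14 days.",
    "Parking is free for visitors.",
]
print(top_chunks("how many days for refunds", chunks))
# ['Refunds are processed within 14 days.']
```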

To use more or different data, stop the app with Ctrl+C, change the files in docs, and run app.py again; the script rebuilds index.json every time it starts.
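Because the script rebuilds the index on every start, it spends tokens on embeddings even when nothing in docs changed. One possible tweak, sketched below under that assumption, is to rebuild only when index.json is missing or older than some file in docs. needs_rebuild is a hypothetical helper; in app.py you could call construct_index("docs") only when it returns True, and otherwise rely on GPTSimpleVectorIndex.load_from_disk('index.json') as the chatbot function already does. Note that it compares modification times only, so deleting a file from docs is not detected.

```python
from pathlib import Path


def needs_rebuild(docs_dir="docs", index_path="index.json"):
    """Return True if the index file is missing or any file under docs_dir is newer than it."""
    index_file = Path(index_path)
    if not index_file.exists():
        return True
    index_mtime = index_file.stat().st_mtime
    return any(
        p.stat().st_mtime > index_mtime
        for p in Path(docs_dir).rglob("*")
        if p.is_file()
    )
```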

If this article was helpful, maybe consider a clap or follow me back.
