Using RAG to Build Your IDE Agents

In the post-GPT era, many of us developers have started using LLM-enabled tools in our development workflows. Used correctly, these tools let you complete new and complex development tasks in a short time.

That holds until you need help with a new API or SDK, or with the latest version of an existing one. This is where these tools fall short.

Fixing the Shortcomings with RAG

At CommandDash (formerly Welltested), our team has been working on code generation. Like other organizations in this field, we recognized these challenges and have been actively developing solutions.

So, with the Dash Agent Framework, we set out to build a robust RAG system that tackles these issues from the very start.

What is RAG?

Retrieval-Augmented Generation (RAG) is a technique that combines the capabilities of LLMs with relevant references to enrich their responses. These references are typically retrieved from external knowledge sources such as document databases.

RAG significantly enhances the capabilities of LLMs, especially when working with new packages and frameworks. By accessing up-to-date information from documentation, code examples, and other sources, RAG-based LLMs can:

Provide accurate and contextual responses: Instead of relying solely on pre-trained data, LLMs can access the latest documentation and code examples to provide accurate and relevant information.
Adapt to evolving technologies: As APIs and SDKs evolve, RAG can keep pace by constantly updating its knowledge base from official sources.
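
The retrieve-then-augment idea can be sketched in a few lines of Dart. This is a toy illustration only: it ranks references by simple word overlap, whereas production systems typically use embeddings and a vector index, and it is not how Dash Agent is implemented. The Doc class, the helper names, and the sample reference strings are all hypothetical placeholders.

```dart
// Toy retrieve-then-augment sketch. Not the Dash Agent implementation.

/// A chunk of reference text together with the place it came from.
class Doc {
  final String source;
  final String text;
  const Doc(this.source, this.text);
}

/// Lowercases [s] and returns its set of alphanumeric word tokens.
Set<String> tokenize(String s) => RegExp(r'[a-z0-9_]+')
    .allMatches(s.toLowerCase())
    .map((m) => m.group(0)!)
    .toSet();

/// Returns the [k] docs that share the most words with [query].
List<Doc> retrieve(String query, List<Doc> docs, {int k = 2}) {
  final q = tokenize(query);
  int score(Doc d) => tokenize(d.text).intersection(q).length;
  final ranked = [...docs]..sort((a, b) => score(b).compareTo(score(a)));
  return ranked.take(k).toList();
}

/// Builds the augmented prompt that would be sent to the LLM.
String buildPrompt(String query, List<Doc> context) {
  final refs = context.map((d) => '[${d.source}] ${d.text}').join('\n');
  return 'Answer using only these references:\n$refs\n\nQuestion: $query';
}

void main() {
  // Placeholder reference snippets, made up for this example.
  const docs = [
    Doc('docs/ask', 'Ask questions about a dataframe in natural language'),
    Doc('docs/install', 'Install the package with pip'),
    Doc('docs/llm', 'Configure the LLM backend'),
  ];
  const query = 'How do I ask questions about a dataframe?';
  print(buildPrompt(query, retrieve(query, docs, k: 1)));
}
```

The model never sees the whole knowledge base, only the few references that score highest for the current question, which is why keeping the indexed sources up to date directly improves the answers.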

In this post, we will build a powerful IDE agent for PandasAI using Dash Agent. Later on, we'll look at how RAG can significantly improve LLM responses.

Building PandasAI Agent

PandasAI is a Python platform that makes it easy to ask questions about your data in natural language. It integrates generative AI capabilities into pandas so you can extract insights effortlessly.

Now that you're familiar with PandasAI, let's build our own PandasAI agent. Its job will be to help developers write and integrate PandasAI code efficiently.

Prerequisite Steps

1. Install Dart
Dash Agent is built on the Dart language. If you haven't installed it already, follow the official Flutter installation instructions here.

2. Install dash_cli
Now, install dash_cli, the command line tool that lets you create agents and publish them to the CommandDash marketplace. Open your terminal and run the following command:

dart pub global activate dash_cli

Create PandasAI Project

Next, you will create the pandas_ai project, where you will define your agent configuration. Run the following command in the terminal:

dash_cli create pandas_ai

This will create a Dash Agent project containing the template code for building an agent. Then, open the project in your preferred IDE with the Flutter extension installed.

Adding Agent Data Sources

The core of a RAG-based agent lies in its knowledge base, known as data sources. These sources provide the agent with context and information to understand and respond to user requests.

For our PandasAI agent, we will gather data from the following sources:

Official PandasAI Documentation: https://docs.pandas-ai.com, https://pandasai-docs.readthedocs.io/en/latest
Official Examples and Issues shared by PandasAI team: https://github.com/sinaptik-ai/pandas-ai
Other Open Source Examples: PandasAI-CSV-Analysis, CSV Chatbot, GroqMultiCSVChatPandasAI, MutipleCSVChatllama3Pandasai, PandasAI-Tutorials

Navigate to the lib/data_sources.dart file in your project and replace the existing code with:

import 'package:dash_agent/data/datasource.dart';
import 'package:dash_agent/data/filters/filter.dart';
import 'package:dash_agent/data/objects/file_data_object.dart';
import 'package:dash_agent/data/objects/project_data_object.dart';
import 'package:dash_agent/data/objects/web_data_object.dart';

// Indexes all the documentation related data
class DocsDataSource extends DataSource {
  @override
  List<FileDataObject> get fileObjects => [];

  @override
  List<ProjectDataObject> get projectObjects => [];

  @override
  List<WebDataObject> get webObjects => [
        WebDataObject.fromSiteMap('https://docs.pandas-ai.com/sitemap.xml'),
        WebDataObject.fromSiteMap(
            'https://www.xml-sitemaps.com/download/pandasai-docs.readthedocs.io-a2835e7d4/sitemap.xml?view=1'),
      ];
}

// Indexes all the example code and issues related data
class ExampleDataSource extends DataSource {
  final accessToken = 'your_personal_github_access_token';
  @override
  List<FileDataObject> get fileObjects => [];

  @override
  List<ProjectDataObject> get projectObjects => [];

  @override
  List<WebDataObject> get webObjects => [
        WebDataObject.fromGithub(
            'https://github.com/sinaptik-ai/pandas-ai', accessToken,
            codeFilter: CodeFilter(pathRegex: r'^examples\/.*')),
        WebDataObject.fromGithub(
            'https://github.com/ismailtachafine/PandasAI-CSV-Analysis',
            accessToken,
            codeFilter: CodeFilter(pathRegex: r'.*\.py$')),
        WebDataObject.fromGithub(
            'https://github.com/kBrutal/CSV_ChatBot', accessToken,
            codeFilter: CodeFilter(pathRegex: r'.*\.py$')),
        WebDataObject.fromGithub(
            'https://github.com/InsightEdge01/GroqMultiCSVChatPandasAI',
            accessToken,
            codeFilter: CodeFilter(pathRegex: r'.*\.py$')),
        WebDataObject.fromGithub(
            'https://github.com/InsightEdge01/MutipleCSVChatllama3Pandasai',
            accessToken,
            codeFilter: CodeFilter(pathRegex: r'.*\.py$')),
        WebDataObject.fromGithub(
            'https://github.com/TirendazAcademy/PandasAI-Tutorials',
            accessToken,
            codeFilter: CodeFilter(pathRegex: r'.*\.py$')),
      ];
}

The above code lists the sources that need to be indexed, both documentation and examples. Apart from the source links, you have also provided an accessToken and a codeFilter:

accessToken: During processing, the CommandDash server indexes data for WebDataObject.fromGithub via GitHub's official API. To fetch data from the GitHub API efficiently, a personal GitHub access token is required. You can generate one by visiting the tokens page.
codeFilter: This optional filter lets the framework index only the code files whose paths match the given regex.
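
To see what the two pathRegex patterns used above would select, you can try them directly with Dart's RegExp. This is a small sketch under one assumption: that the pattern is tested against repository-relative paths such as examples/from_csv.py. The sample paths are made up, and the framework's exact matching rules are not covered in this post.

```dart
// Assumption: patterns are tested against repo-relative file paths.
// The two patterns are the ones used in ExampleDataSource above.
final examplesOnly = RegExp(r'^examples\/.*');
final pythonOnly = RegExp(r'.*\.py$');

void main() {
  // Made-up paths for illustration.
  const paths = [
    'examples/from_csv.py',
    'docs/index.md',
    'src/examples/helper.py',
    'README.md',
  ];

  for (final path in paths) {
    final inExamples = examplesOnly.hasMatch(path);
    final isPython = pythonOnly.hasMatch(path);
    print('$path  examples: $inExamples  python: $isPython');
  }
}
```

The leading ^ anchors the first pattern to the start of the path, so a nested folder like src/examples would not match, while the trailing $ in the second pattern restricts it to paths ending in .py.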

Note: Your Personal Access Token is very sensitive data. Please make sure not to share it with anyone or push it to any public source.
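
One way to keep the token out of your source files is to read it from an environment variable instead of hardcoding it. The sketch below assumes the configuration is evaluated on your own machine when you publish, so that the variable is visible to the process. GITHUB_TOKEN and readGithubToken are hypothetical names chosen for this example, not part of dash_agent.

```dart
import 'dart:io';

/// Reads the GitHub token from the environment and fails loudly if it is
/// missing. GITHUB_TOKEN is an assumed variable name; pick any name you like.
/// The optional [env] map exists only to make the function easy to test.
String readGithubToken([Map<String, String>? env]) {
  final token = (env ?? Platform.environment)['GITHUB_TOKEN'];
  if (token == null || token.isEmpty) {
    throw StateError(
        'Set the GITHUB_TOKEN environment variable before publishing.');
  }
  return token;
}

void main() {
  // In ExampleDataSource, this could replace the hardcoded string:
  //   final accessToken = readGithubToken();
  try {
    final token = readGithubToken();
    // Print only the length so the secret never reaches the terminal.
    print('Token loaded (${token.length} characters).');
  } on StateError catch (e) {
    print(e.message);
  }
}
```

With this approach the token lives only in your shell session or a local, untracked configuration file, so committing data_sources.dart no longer risks leaking it.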

You can learn more about WebDataObject and its associated properties in the CommandDash documentation.

Adding Agent System Prompt and Metadata

Next, you'll add the system prompt and agent metadata to the AgentConfiguration class. Navigate to the lib/agent.dart file and replace the existing code with:

import 'package:dash_agent/configuration/metadata.dart';
import 'package:dash_agent/data/datasource.dart';
import 'package:dash_agent/configuration/command.dart';
import 'package:dash_agent/configuration/dash_agent.dart';
import 'data_sources.dart';

class PandasAI extends AgentConfiguration {
  final docsDataSource = DocsDataSource();
  final exampleDataSource = ExampleDataSource();

  // Add the metadata information about PandasAI agent
  @override
  Metadata get metadata => Metadata(
      name: 'Pandas AI',
      avatarProfile: 'assets/logo.jpeg',
      tags: ['LLM Framework', 'Data Analysis']);

  // Add the systemPrompt for dash agent's commandless mode (also known as chat mode).
  // The system prompt is a key component for conversational-style agents, as it gives
  // the LLM the initial context and guidance about the agent's purpose and functionality.
  @override
  String get registerSystemPrompt =>
      '''You are a Pandas AI assistant inside user's IDE. PandasAI is a Python library that makes it easy to ask questions to your data in natural language.

      You will be provided with latest docs and examples relevant to user questions and you have to help them achieve their desired results. Output code and quote links and say I don't know when the docs don't cover the user's query.''';

  // Add the data sources that need to be indexed for RAG purposes.
  @override
  List<DataSource> get registerDataSources =>
      [docsDataSource, exampleDataSource];

  @override
  List<Command> get registerSupportedCommands => [];
}

The above code ties together everything the PandasAI agent needs: data sources, metadata, system prompt, and commands.

For more details on AgentConfiguration, please read the dash_agent framework documentation.

Finally, open the bin/main.dart file and replace the existing code with:

import 'package:dash_agent/dash_agent.dart';
import 'package:pandas_ai/agent.dart';

/// Entry point used by the [dash-cli] to extract your agent
/// configuration during publishing.
Future<void> main() async {
  await processAgent(PandasAI());
}

That's it. Your agent is now configured and ready to be used. Next, you'll publish it so that it can be tested and shared with other devs as well.

Publishing the PandasAI Agent

You need to be logged in to dash_cli using GitHub auth to publish your agent. Run the following command in the terminal to log in:

dash_cli login

Finally, run the following command in the terminal from the root folder of your pandas_ai project to publish the agent:

dash_cli publish

This will validate the configuration and, if everything looks good, schedule your agent for publication. Once your agent is ready to use, you will receive an email confirming the successful publication, and PandasAI will be visible in the CommandDash Marketplace.

What's Next

Congratulations! 🎉 Now you know how to create powerful agents using the Dash Agent framework. These agents leverage the power of RAG and LLMs. We're excited to see the innovative agents you'll build with Dash Agent.

Also, don't forget to try out the PandasAI agent, which is currently live on the CommandDash extension for VS Code. Check it out here.

In our next post, we will see how well the PandasAI agent performs. Stay tuned!
