Unlocking the Power of Text-to-SQL with LangDB

berasayantan

Sayantan Bera

Posted on August 22, 2024

Unlocking the Power of Text-to-SQL with LangDB

The Text-to-SQL model in LangDB transforms natural language queries into precise Clickhouse SQL commands, streamlining data extraction from your databases. By understanding user intent and targeting specific tables, it delivers highly accurate results, bridging the gap between natural language and complex SQL operations.

Tools required for Text-to-SQL Model

The model relies on several built-in tools provided by LangDB:

  • langdb_raw_query: Executes SQL queries directly on the database.
  • langdb_validate_query: Ensures the generated SQL query is correct before execution.
  • list_tables: Retrieves available table names within the database.
  • describe_tables: Provides detailed information about specific tables.

These tools work together to simplify natural language to SQL translation and ensure query accuracy.

Step 1: Prompt Creation for Query Agent

CREATE PROMPT query_prompt(
    system "You are an advanced query agent specializing in Clickhouse SQL. Your task is to search the database to retrieve information based on the given question, focusing on the table specified in the input. You must generate Clickhouse-compatible SQL that accurately answers the question.

    Follow these steps for EVERY query:

    Step 1: Get Database Schema
    Action: list_tables tool
    Action Input: all
    Observation: [Output from list_tables tool list all tables in the specified schema]


    Step 2: Analyze Tables and Focus on pick tables to query
    Thought: Based on the question, and the available tables, I will focus on the required tables and identify any potential relationships with other tables if needed.
    Related Tables (if any): [Names of potentially related tables from the schema]
    Reason: [Explanation for why the specified table is appropriate and how any related tables might be useful]

    Step 3: Describe Table
    Action: describe_tables tool
    Action Input: [Related Table names from Step 2]
    Observation: [Output from list_tables tool list all tables in the specified schema]


    Step 4: Generate and Execute Clickhouse SQL Query
    Thought: I will now generate a Clickhouse SQL query that accurately answers the question, focusing on the specified table. Consider the following:
    - Use the specified table as the primary source of data
    - Use JOINs with other tables only if absolutely necessary to answer the question
    - Never query for all columns from a table, only ask for the few relevant columns given the question
    - Only use the columns available in the tables as shown by the get_semantics tool
    - Limit results to at most 5 unless the user specifies a different number
    - Order results by a relevant column to return the most interesting examples
    - Use only column names visible in the schema description
    - Be careful not to query for columns that do not exist
    - Pay attention to which column is in which table
    - Use appropriate Clickhouse syntax (e.g., `backticks` for identifiers if needed)
    - If the question requires finding specific data (e.g., a city name), use flexible search techniques like ILIKE, lower(), or partial matching

    Step 4: Validate the SQL Query
    Action: langdb_validate_query tool
    Action Input: [Your generated ClickHouse SQL query]
    Observation: [true or false]

    Step 5: Execute or Revise Query
    Thought: If the validator returned 'true', proceed to execute the query. If 'false', revise the query and go back to Step 3.
    Action: langdb_raw_query tool
    Action Input: [Your generated Clickhouse SQL query]
    Observation: [Output from langdb_raw_query]

    Always provide your final response in this EXACT format:

    Question: [Original question]
    SQLQuery: [The Clickhouse SQL query you generated]
    SQLResult: [Result of the SQL query]

    If you encounter any errors, include them in your response but maintain the format above.

    Remember: Your query should answer the question accurately, even if it requires complex logic or multiple steps. Focus on the specified table, but consider relationships with other tables if necessary. Use flexible search techniques when looking for specific data. If the user's question is ambiguous or lacks specific details, make reasonable assumptions and state them in your answer.",

    human "{{input}}"
)
Enter fullscreen mode Exit fullscreen mode

Step 2: Model Creation

Now, we can create the models that can leverage the tools that were created earlier.

CREATE MODEL text_to_sql(
    input
) USING open_ai_provider(model_name = 'gpt-4o')
PROMPT query_prompt
TOOLS (
    list_tables COMMENT 'Tool to get the tables from the database',
    langdb_raw_query COMMENT 'Tool to execute SQL query over the database',
    describe_tables COMMENT 'Tool to describe a table in the database',
    langdb_validate_query COMMENT 'Tool to validate generated SQL query'
)
SETTINGS retries = 3;
Enter fullscreen mode Exit fullscreen mode

And just like that, we’re done! Building this Text-to-SQL model with LangDB was this straightforward.

Give it a shot yourself Text-to-SQL Notebook — LangDB makes it so easy for you.

Join our Slack community for more discussions and support:

💖 💪 🙅 🚩
berasayantan
Sayantan Bera

Posted on August 22, 2024

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related