What would SQLite look like if written in Rust? — Part 2

thepolyglotprogrammer

João Henrique Machado Silva

Posted on February 23, 2021

What would SQLite look like if written in Rust? — Part 2

What would SQLite look like if written in Rust? — Part 2

Writing a SQLite clone from scratch in Rust

Part 0 — Overview

Part 1 — Understanding SQLite and Setting up CLI Application and REPL

Part 3 - Understanding the B-Tree and its role on database design

Alright! We are building a clone of SQLite, aka SQLRite, and last time all we did was a simple CLI application that will use for external commands and some accessory functions and also a simple REPL that would take a simple command to exit the application gracefully.

Now we are taking another couple of steps towards our goal. First we want to parse the input to be able to differentiate if the input is a MetaCommand or a SQLCommand . A MetaCommand start with a dot and take direct actions like .open , .help and .exit . And a SQLCommand is, well, you know.

The second step we want to take is to be able parse each of the command types and take the appropriate action. For now, we wont go too far on the database side, but we do want to be able to differentiate between different SQL Statements and have their components broken down into parts and ready to be executed. So next time we can focus on getting the parsed SQL Statement and execute it. Even against a simplified in-memory storage.

SQLRite Flow Diagram ([https://github.com/joaoh82/rust_sqlite](https://github.com/joaoh82/rust_sqlite))

The frontend of SQLite parses the SQL statements, optimizes them, and as I mentioned on the previous post of the series, generates equivalent SQLite internal representation called bytecode

This bytecode is then passed to the virtual machine, on the backend side, which executes it.

Here is a diagram os the SQLite Architecture to refresh our minds.

SQLite Architecture (https://www.sqlite.org/arch.html)

Breaking the logic into steps like this has a couple advantages:

  • Reduces the complexity of each part (e.g. virtual machine does not worry about syntax errors).

  • Allows compiling common queries once and caching the bytecode for improved performance.

With this in mind, let’s get started!

First let’s look at some changes in our main.rs . Since this code can get quite extensive I won’t be commenting line by line, but instead focusing on the main points and design choices. For that reason I will try to add as many comments as possible and of course make the code as readable as I can. Nevertheless, please do not hesitate to start a discussion board, create an issue or contact me directly in case you have any questions.

Getting back to the project, as I mentioned above, the first thing I wanted to do is to be able to differentiate between a MetaCommand and a SQLCommand . You can see that I take care of that on line 64 by calling get_command_type(command: &String) that returns an enum of type repl::CommandType with the choices repl::CommanType::SQLCommand(String) and repl::CommanType::MetaCommand(String) . This way I can easily differentiate between the two types of inputs and take the appropriate action on each of them.

Also, as I mentioned on the previous post of the series, I like to keep the main.rs as clean as possible, and just like a point of entry to the application. The main.rs should have the least amount of clutter as possible.

extern crate clap;

mod meta_command;
mod repl;
mod sql;
mod error;

use repl::{REPLHelper, get_config, get_command_type, CommandType};
use meta_command::{handle_meta_command};
use sql::{process_command};

use rustyline::error::ReadlineError;
use rustyline::{Editor};

use clap::{App, crate_authors, crate_description, crate_version};

fn main() -> rustyline::Result<()> {
    env_logger::init();

    let _matches = App::new("Rust-SQLite")
                          .version(crate_version!())
                          .author(crate_authors!())
                          .about(crate_description!())
                          .get_matches();

    // Starting Rustyline with a default configuration
    let config = get_config();

    // Getting a new Rustyline Helper
    let helper = REPLHelper::default();

    // Initiatlizing Rustyline Editor with set config and setting helper
    let mut repl = Editor::with_config(config);
    repl.set_helper(Some(helper));

    // This method loads history file into memory
    // If it doesn't exist, creates one
    // TODO: Check history file size and if too big, clean it.
    if repl.load_history("history").is_err() {
        println!("No previous history.");
    }

    // Friendly intro message for the user
    println!("Rust-SQLite - {}\n{}{}{}{}",
    crate_version!(),
    "Enter .exit to quit.\n",
    "Enter .help for usage hints.\n",
    "Connected to a transient in-memory database.\n",
    "Use '.open FILENAME' to reopen on a persistent database.");

    loop {
        let p = format!("sqlrite> ");
        repl.helper_mut()
            .expect("No helper found")
            .colored_prompt = format!("\x1b[1;32m{}\x1b[0m", p);
        // Source for ANSI Color information: http://www.perpetualpc.net/6429_colors.html#color_list
        // http://bixense.com/clicolors/

        let readline = repl.readline(&p);
        match readline {
            Ok(command) => {
                repl.add_history_entry(command.as_str());
                // Parsing user's input and returning and enum of repl::CommandType
                match get_command_type(&command.trim().to_owned()) {
                    CommandType::SQLCommand(_cmd) => {
                        // process_command takes care of tokenizing, parsing and executing
                        // the SQL Statement and returning a Result<String, SQLRiteError>
                        let _ = match process_command(&command) {
                            Ok(response) => println!("{}",response),
                            Err(err) => println!("An error occured: {}", err),
                        };
                    }
                    CommandType::MetaCommand(cmd) => {
                        // handle_meta_command parses and executes the MetaCommand
                        // and returns a Result<String, SQLRiteError>
                        let _ = match handle_meta_command(cmd) {
                            Ok(response) => println!("{}",response),
                            Err(err) => println!("An error occured: {}", err),
                        };
                    }
                }
            }
            Err(ReadlineError::Interrupted) => {
                break;
            }
            Err(ReadlineError::Eof) => {
                break;
            }
            Err(err) => {
                println!("An error occured: {:?}", err);
                break;
            }
        }
    }
    repl.append_history("history").unwrap();

    Ok(())
}
Enter fullscreen mode Exit fullscreen mode

Meta Commands

Next we will move to the meta_command module, which in the repository you will find it in src/meta_command/mod.rs . The idea here is to write a code that is scalable up to a point. I want to easily be able to add more MetaCommands in the future. There are four points on this module.

First the enum type definition, that to improve user experience I added an option Unknown to pick up any MetaCommands not yet defined. After that we have an impl block for the fmt::Display trait, that helps us configure how custom types would be printed out in case we want to use them in a println! macro for example. Then on line 25 you will see another impl block with a fn new method, acting as a constructor for our MetaCommand type. I say acting because Rust is not an Object Oriented language, so the fn new is not like a constructor would be in languages like Java , in fact you could call it anything you want instead of new .

And last but not least we have the pub fn handle_meta_command function, that is responsible for matching the inputed MetaCommand to it’s appropriate command and taking action. You will notice that is returns a Result<String, SQLRiteError> , so we can return a message to the user with ease.

use crate::error::{Result, SQLRiteError};

use std::fmt;

#[derive(Debug, PartialEq)]
pub enum MetaCommand {
    Exit,
    Help,
    Open(String),
    Unknown,
}

/// Trait responsible for translating type into a formated text.
impl fmt::Display for MetaCommand {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            MetaCommand::Exit => f.write_str(".exit"),
            MetaCommand::Help => f.write_str(".help"),
            MetaCommand::Open(_) => f.write_str(".open"),
            MetaCommand::Unknown => f.write_str("Unknown command"),
        }
    }
}

impl MetaCommand {
    pub fn new(command: String) -> MetaCommand {
        let args: Vec<&str> = command.split_whitespace().collect();
        let cmd = args[0].to_owned();
        match cmd.as_ref() {
            ".exit" => MetaCommand::Exit,
            ".help" => MetaCommand::Help,
            ".open" => MetaCommand::Open(command),
            _ => MetaCommand::Unknown,
        }
    }
}

pub fn handle_meta_command(command: MetaCommand) -> Result<String> {
    match command {
        MetaCommand::Exit => std::process::exit(0),
        MetaCommand::Help => {
            Ok(format!("{}{}{}{}{}","Special commands:\n",
                            ".help - Display this message\n",
                            ".open <FILENAME> - Reopens a persistent database.\n",
                            ".ast <QUERY> - Show the abstract syntax tree for QUERY.\n",
                            ".exit - Quits this application"))
        },
        MetaCommand::Open(args) => Ok(format!("To be implemented: {}", args)),
        MetaCommand::Unknown => Err(SQLRiteError::UnknownCommand(format!("Unknown command or invalid arguments. Enter '.help'"))),
    }
}
Enter fullscreen mode Exit fullscreen mode

Alright people! We are finally going to do some database stuff! I bet everyone was like “Wasn’t this guy suppose to build a database?”, well yeah, but you gotta build a base first.

Like when you are building a house, laying a nice foundation is one of the most important things you can do for your software.

Database Stuff

Database Stuff

This is our sql module and in the github repository you will find it in src/sql/mod.rs . This actually doesn’t look that different from our meta_command module, at least structure wise. We have an enum , defining the types of queries we plan to support at first. Then a impl block with a fn new method, again acting as a constructor .

And then a fn process_command function returning a Result<String, SQLRiteError>, that if you can remember is invoked from our main.rs . On this function is where the magic starts to happen. You will notice that right at the beginning of the fn process_command function we make use of the sqlparser-rs crate, that did a great job building a Extensible SQL Lexer and Parser for Rust with a number of different SQL dialects, including a SQLite dialect, so for the time being I decided to go with them instead of writing a completely new SQL Lexer . By calling Parser::parse_sql() I am getting it back a Result<Vec<Statement>, ParserError which I do some basic checking and pass it it to a match statement to determine which type of SQL Statement was inputed or if there was an error during the process, if so I just return the error. The Statement returned is a sqlparser::ast::Statement , which is an enum of all the possible statements, as you can see in the link I added from the sqlparser documentation.

For now, the only SQL Statement I managed to actually build the parser was CREATE TABLE , for the rest so far we are only identifying the type of SQL Statement and returning to the user. In the match statement block that matches with CREATE TABLE we call another module parser::create which contains all the logic for the CREATE TABLE . I have this one right after this block.

mod parser;
pub mod tokenizer;

use parser::create::{CreateQuery};

use sqlparser::ast::{Statement};
use sqlparser::dialect::SQLiteDialect;
use sqlparser::parser::{Parser, ParserError};

use crate::error::{SQLRiteError, Result};

#[derive(Debug,PartialEq)]
pub enum SQLCommand {
    Insert(String),
    Delete(String),
    Update(String),
    CreateTable(String),
    Select(String),
    Unknown(String),
}

impl SQLCommand {
    pub fn new(command: String) -> SQLCommand {
        let v = command.split(" ").collect::<Vec<&str>>();
        match v[0] {
            "insert" => SQLCommand::Insert(command),
            "update" => SQLCommand::Update(command),
            "delete" => SQLCommand::Delete(command),
            "create" => SQLCommand::CreateTable(command),
            "select" => SQLCommand::Select(command),
            _ => SQLCommand::Unknown(command),
        }
    }
}

/// Performs initial parsing of SQL Statement using sqlparser-rs
pub fn process_command(query: &str) -> Result<String> {
    let dialect = SQLiteDialect{};
    let message: String;
    let mut ast = Parser::parse_sql(&dialect, &query).map_err(SQLRiteError::from)?;

    if ast.len() > 1 {
        return Err(SQLRiteError::SqlError(ParserError::ParserError(format!(
            "Expected a single query statement, but there are {}",
            ast.len()
        ))));
    }

    let query = ast.pop().unwrap();

    // Initially only implementing some basic SQL Statements
    match query {
        Statement::CreateTable{..} => {
            let result = CreateQuery::new(&query);
            match result {
                Ok(payload) => {
                    println!("Table name: {}", payload.table_name);
                    for col in payload.columns {
                        println!("Column Name: {}, Column Type: {}", col.name, col.datatype);
                    }
                },
                Err(err) => return Err(err),
            }
            message = String::from("CREATE TABLE Statement executed.");
            // TODO: Push table to DB
        },
        Statement::Query(_query) => message = String::from("SELECT Statement executed."),
        Statement::Insert {..} => message = String::from("INSERT Statement executed."),
        Statement::Delete{..} => message = String::from("DELETE Statement executed."),
        _ => {
            return Err(SQLRiteError::NotImplemented(
                "SQL Statement not supported yet.".to_string()))
        }
    };

    Ok(message)
}
Enter fullscreen mode Exit fullscreen mode

This is our sql::parser::create module. Here we have two struct types defined. The first one being ParsedColumn , well, representing a column in a table and the second one being CreateQuery , representing a table. As you can see the CreateQuery struct has a property called columns which is a vector of ParsedColumns . And our main method on this module, which is the fn new , returns a Result<CreateTable, SQLRiteError> , which will then be inserted into our database data structure that is still to be defined in the code, although I already have a pretty good idea of what is going to look like in my head and my design notes.

use sqlparser::ast::{ColumnOption, DataType, ObjectName, Statement};

use crate::error::{SQLRiteError, Result};

// Represents Columns in a table
#[derive(PartialEq, Debug)]
pub struct ParsedColumn {
    pub name: String,
    pub datatype: String,
    pub is_pk: bool,
    pub is_nullable: bool,
}

/// Represents a SQL Statement CREATE TABLE
#[derive(Debug)]
pub struct CreateQuery {
    pub table_name: String,         // table name
    pub columns: Vec<ParsedColumn>, // columns
}

impl CreateQuery {
    pub fn new(statement: &Statement) -> Result<CreateQuery> {
        match statement {
            // Confirming the Statement is sqlparser::ast:Statement::CreateTable
            Statement::CreateTable {
                name,
                columns,
                constraints: _constraints,
                with_options: _with_options,
                external: _external,
                file_format: _file_format,
                location: _location,
                ..
            } => {
                let table_name = name;
                let mut parsed_columns: Vec<ParsedColumn> = vec![];

                // Iterating over the columns returned form the Parser::parse:sql
                // in the mod sql
                for col in columns {
                    let name = col.name.to_string();
                    // TODO: Add datetime and timestamp here
                    // Parsing each column for it data type
                    // For now only accepting basic data types
                    let datatype = match &col.data_type {
                        DataType::SmallInt => "int",
                        DataType::Int => "int",
                        DataType::BigInt => "int",
                        DataType::Boolean => "bool",
                        DataType::Text => "string",
                        DataType::Varchar(_bytes) => "string",
                        DataType::Float(_precision) => "float",
                        DataType::Double => "float",
                        DataType::Decimal(_precision1, _precision2) => "float",
                        _ => {
                            println!("not matched on custom type");
                            "invalid"
                        }
                    };

                    let mut is_pk: bool = false;
                    for column_option in &col.options {
                        is_pk = match column_option.option {
                            ColumnOption::Unique { is_primary } => is_primary,
                            _ => false,
                        };
                    }

                    parsed_columns.push(ParsedColumn {
                        name,
                        datatype: datatype.to_string(),
                        is_pk,
                        is_nullable: false,
                    });
                }
                // TODO: Handle constraints,
                // Unique, Primary Key, Nullable, Default value and others.
                for constraint in _constraints {
                    println!("{:?}", constraint);
                }
                return Ok(CreateQuery {
                    table_name: table_name.to_string(),
                    columns: parsed_columns,
                });
            }

            _ => return Err(SQLRiteError::Internal("Error parsing query".to_string())),
        }
    }
}
Enter fullscreen mode Exit fullscreen mode

Dealing with errors

You may have noticed that throughout the entire code I am making reference to a SQLRiteError type. That is an error type I defined as an enum using the thiserror crate, that is a super easy to use library that provides a convenient derive macro for the standard library’s std::error::Error trait. If you check the commits in the github repository, you may notice that I first wrote my own implementation of the std::error::Error trait. But then I bumped into this trait, that basically takes care of a lot of the boiler plate, and let’s face it, the code looks super clean! This is our error module so far, located in src/error.rs .

use thiserror::Error;

use std::{result};

use sqlparser::parser::ParserError;

pub type Result<T> = result::Result<T, SQLRiteError>;

#[derive(Error, Debug, PartialEq)]
pub enum SQLRiteError {
    #[error("Not Implemented error: {0}")]
    NotImplemented(String),
    #[error("General error: {0}")]
    General(String),
    #[error("Internal error: {0}")]
    Internal(String),
    #[error("Unknown command error: {0}")]
    UnknownCommand(String),
    #[error("SQL error: {0:?}")] 
    SqlError(#[from] ParserError),
}

/// Return SQLRite errors from String
pub fn sqlrite_error(message: &str) -> SQLRiteError {
    SQLRiteError::General(message.to_owned())
}
Enter fullscreen mode Exit fullscreen mode

Summary

Alright! This time we managed to parse the user’s commands to differentiate between MetaCommand and SQLCommand . We also implemented a somewhat scalable MetaCommand module that makes it easy to add more commands in the future. We added the sql module, which by making use of the sqlparser-rs crate we are successfully parsing SQL Statements and being able to generate an ast from each SQL Statement. And we are already parsing and generating at least a simplified version of a bytecode from the CREATE TABLE SQL Statement, that is ready to go into the database (which we will do in the next chapter). And to finish it off, we also created an error module, so we have a standardized way of dealing with error throughout the application.

I would say that we are starting out with a nice base. What do you think?

I added a Project Progress and Roadmap sections to the github repository README.md , to add some visibility on where we are and where we are going.

Next time, we will finish parsing the basic SQL Statements we plan to be compatible with and start working on a In-Memory simplified version of out database .

View on Github (pull requests are more then welcome)

If you wanna follow this track don’t forget to follow me here on Dev.to and also give some love!

Part 0 — Overview

Part 1 — Understanding SQLite and Setting up CLI Application and REPL

Part 3 - Understanding the B-Tree and its role on database design

💖 💪 🙅 🚩
thepolyglotprogrammer
João Henrique Machado Silva

Posted on February 23, 2021

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related