Writing a new programming language. Part III: Type System
Konstantin Grechishchev
Posted on August 13, 2022
Hello everyone!
Welcome to the third part of the new LR-language tutorial series. Here is the link for Part II. Last time we've successfully implemented the first version of the LR-language interpreter. We were able to write programs with variables of the integer type and use them in the arithmetical expressions, assigning the results to other variables.
The state of the code by the end of this tutorial could be found in the part_3 branch of the GitHub repository. The changes made today could be seen in the e5e04bc commit
Goal for today
The goal for today is to extend the language to support String and float types as well! We will also improve runtime error reporting a bit. We would like to enhance LR-language to be able to run the following program:
{
let hello: String = "Hello";
let world: String = "World";
let hello_word: String = hello + " " + world;
let pi: Float = 3.14;
let r: Int = 5;
let square: Float = 2 * pi * r * r;
let value: String = "The square of the circle with the r = " + r + " is " + square;
}
Representing values and types
Values
As a recall, here is our current representation of the expression in the AST:
pub enum Expr {
Number(i32),
Identifier(String),
Op(Box<Expr>, Opcode, Box<Expr>),
}
The Number(i32)
is the topic of today's discussion. It is produced every time we see a numeric literal in the LR-program code:
Num: i32 = {
r"-?[0-9]+" => i32::from_str(<>).unwrap()
};
Term: Box<Expr> = {
Num => Box::new(Expr::Number(<>)),
Identifier => Box::new(Expr::Identifier(<>)),
"(" <Expr> ")"
};
However, we would like to support more types now, so it would make sense to rename Number
to Constant
. The i32
type as a value would nigher work any longer, so we replace it with the following enum:
pub enum Value {
Int(i32),
Float(f32),
String(String),
}
The updated section of the grammar looks like the following:
IntNum: i32 = {
r"-?[0-9]+" => i32::from_str(<>).unwrap()
};
FloatNum: f32 = {
r"-?[0-9]+\.[0-9]+" => f32::from_str(<>).unwrap()
};
StringLiteral: String = {
r#""[^"]*""# => <>[1..<>.len() - 1].to_owned()
};
Term: Box<Expr> = {
IntNum => Box::new(Expr::Constant(Value::Int(<>))),
FloatNum => Box::new(Expr::Constant(Value::Float(<>))),
StringLiteral => Box::new(Expr::Constant(Value::String(<>))),
Identifier => Box::new(Expr::Identifier(<>)),
"(" <Expr> ")"
};
We introduce 2 new types of literals: the number containing a dot (FloatNum
) and any characters in double quotes (StringLiteral
). The weird <>[1..<>.len() - 1]
expression simply removes the first and last character of the literal as we would like to get the string literal inside double quotes, but without them.
Types
There are two ways to deal with variable types. The first option is to derive them based on the value assigned. The second is to explicitly specify the type when you declare a variable. The first one is often the variation of the second with the type specifier being optional, so to keep it simple, we will implement option 2 for now.
First, we need to define the Type
enum:
pub enum Type {
Int,
Float,
String,
}
and the grammar for it:
Type: Type = {
"Int" => Type::Int,
"String" => Type::String,
"Float" => Type::Float,
}
Nothing special here.
As the next step, we make a slight modification to the Statement
production rule
Statement: Statement = {
"let" <Identifier> ":" <Type> "=" <Expr> ";" => Statement::new_definition(<>),
<Identifier> "=" <Expr> ";" => Statement::new_assignment(<>)
};
While we keep the assignment rule the same, we make it mandatory to specify the type when declaring a variable. The <>
operator will now pass 3 arguments to the new_definition
method, so we have to modify it as well:
pub enum Statement {
Assignment {
identifier: String,
expression: Box<Expr>,
},
Definition {
identifier: String,
expression: Box<Expr>,
value_type: Type,
},
}
impl Statement {
pub fn new_assignment(identifier: String, expression: Box<Expr>) -> Self {
Self::Assignment {
identifier,
expression,
}
}
pub fn new_definition(identifier: String, value_type: Type, expression: Box<Expr>) -> Self {
Self::Definition {
identifier,
value_type,
expression,
}
}
}
The StatementBody
struct cannot be reused any longer, so we just get rid of it.
Display traits
These are the only changes we had to make to add type system support to our grammar and syntax tree. Before we start making changes to the interpreter part, let's implement the Display for our AST structs. This would help to print them using {}
in a human-readable form and allow us to create nice runtime error messages. Here is an example of the Display
trait implementation for Expr
enum:
impl Display for Expr {
fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
match self {
Expr::Constant(v) => write!(f, "{}", v),
Expr::Identifier(id) => write!(f, "{}", id),
Expr::Op(e1, op, e2) => write!(f, "({} {} {})", e1, op, e2),
}
}
}
The implementation of it for other types looks very similar, so I omit it here, but you can find them in the GitHub repository.
Update the interpreter
The grammar and AST are ready, however, our code does not yet compile, because we need to make changes to the Frame
struct to store value type as well as implement the arithmetical expressions for the new enum.
Variable storage
We start with a trivial change to replace the type of local_variables
field HashMap<String, i32>
with local_variables: HashMap<String, Value>
. All the methods of the Frame
struct should now accept and return Value
instead of i32
correspondingly.
We should also perform an extra check in the assign_value
and define_variable
methods: we don't want to allow assigning the value of, let's say, String
type to the variable declared as Integer
:
pub fn assign_value(&mut self, variable_name: &str, value: Value) -> Result<(), VariableError> {
if let Some(variable) = self.local_variables.get_mut(variable_name) {
if Type::from(variable.deref()) == Type::from(&value) {
*variable = value;
Ok(())
} else {
Err(VariableError::TypeMismatch(
Type::from(&value),
variable_name.to_owned(),
Type::from(variable.deref()),
))
}
} else if let Some(parent) = self.parent.as_mut() {
parent.assign_value(variable_name, value)
} else {
Err(VariableError::UndefinedVariable(variable_name.to_owned()))
}
}
In order to represent this type of error, we introduce the new error enum VariableError
and also move the UndefinedVariable
error type to it:
#[derive(Error, Debug)]
pub enum VariableError {
#[error("Variable {0} is not defined")]
UndefinedVariable(String),
#[error("Unable to assign the value of type {0} to variable '{1}' of type {2}")]
TypeMismatch(Type, String, Type),
}
Please refer to the frame.rs
to see the full source code of the updated Frame
struct
Arithmetic operations
If we try to compile the code now, the rust compiler will return us an error explaining that it does not know how to apply the +
, -
, *
and /
between the instances of Value
enum. Fortunately, all we have to do is to implement Add, Sub, Mul and Div traits for it!
The main problem to solve here is to figure out the result type of the operation. It is obvious that 5 + 5
is 10
, but what should happen when you write 5 + "hello"
? Shall it be an error or shall it return a string containing 5hello
?
Let's establish the following rules:
- The result of the Int division is Int, the
%
remainder operations in not supported for now. - For any operation involving float and int, the result type should be float and we would cast int to float before the operation.
- Addition of anything to the string results in a new string. No other operations except
+
are supported for a string.
With this in mind, we can implement the Add
trait for the Value
:
impl Add for Value {
type Output = Result<Value, OperationError>;
fn add(self, other: Self) -> Self::Output {
let value = match self {
Value::Int(v1) => match other {
Value::Int(v2) => Value::Int(v1 + v2),
Value::Float(v2) => Value::Float(v1 as f32 + v2),
Value::String(v2) => Value::String(v1.to_string() + v2.as_str()),
},
Value::Float(v1) => match other {
Value::Int(v2) => Value::Float(v1 + v2 as f32),
Value::Float(v2) => Value::Float(v1 + v2),
Value::String(v2) => Value::String(v1.to_string() + v2.as_str()),
},
Value::String(v1) => match other {
Value::Int(v2) => Value::String(v1 + v2.to_string().as_str()),
Value::Float(v2) => Value::String(v1 + v2.to_string().as_str()),
Value::String(v2) => Value::String(v1 + v2.as_str()),
},
};
Ok(value)
}
}
Note that despite add
is never returning an error, the return type of it is still Result<Value, OperationError>
. The reason for it is to keep it consistent with other traits implementations and avoid making changes to the method signature when we add new types in the future.
The error enum is having only one variant for now:
#[derive(Error, Debug)]
pub enum OperationError {
#[error("Operation {0} {1} {2} is not defined")]
IncompatibleTypes(Type, Opcode, Type),
}
We also define a helpful macro to produce the error in the operations which could fail:
macro_rules! error {
($type_1:ident, $op:ident, $type_2:ident) => {
Err(OperationError::IncompatibleTypes(
Type::$type_1,
Opcode::$op,
Type::$type_2,
))
};
}
We can use it to implement e.g. a multiplication operation:
impl Mul for Value {
type Output = Result<Value, OperationError>;
fn mul(self, other: Self) -> Self::Output {
match self {
Value::Int(v1) => match other {
Value::Int(v2) => Ok(Value::Int(v1 * v2)),
Value::Float(v2) => Ok(Value::Float(v1 as f32 * v2)),
Value::String(_) => error!(Int, Mul, String),
},
Value::Float(v1) => match other {
Value::Int(v2) => Ok(Value::Float(v1 * v2 as f32)),
Value::Float(v2) => Ok(Value::Float(v1 * v2)),
Value::String(_) => error!(Float, Mul, String),
},
Value::String(_) => match other {
Value::Int(_) => error!(String, Mul, Int),
Value::Float(_) => error!(String, Mul, Float),
Value::String(_) => error!(String, Mul, String),
},
}
}
}
I omit implementations of Sub
and Div
traits as they are identical to the above. You can refer to the operations.rs
file to have a look at them.
Putting everything together
The only part left is to make a set of "cosmetic" changes to the evalutate_expression
method. Here is the updated code of it:
#[derive(Error, Debug)]
pub enum ExpressionError {
#[error("Unable to evalutate expression {0}: {1}")]
VariableError(String, VariableError),
#[error("Unable to evalutate expression {0}: {1}")]
OperationError(String, OperationError),
}
pub fn evalutate_expression(frame: &mut Frame, expr: &Expr) -> Result<Value, ExpressionError> {
match expr {
Expr::Constant(n) => Ok(n.clone()),
Expr::Op(exp1, opcode, exp2) => {
let result = match opcode {
Opcode::Mul => {
evalutate_expression(frame, exp1)? * evalutate_expression(frame, exp2)?
}
Opcode::Div => {
evalutate_expression(frame, exp1)? / evalutate_expression(frame, exp2)?
}
Opcode::Add => {
evalutate_expression(frame, exp1)? + evalutate_expression(frame, exp2)?
}
Opcode::Sub => {
evalutate_expression(frame, exp1)? - evalutate_expression(frame, exp2)?
}
};
result.map_err(|e| ExpressionError::OperationError(expr.to_string(), e))
}
Expr::Identifier(variable) => frame
.variable_value(variable)
.map_err(|e| ExpressionError::VariableError(expr.to_string(), e)),
}
}
First, we introduce an ExpressionError
enum to wrap new types of errors defined above. Note that we use the Display
traits implemented for AST structs to form an error message. It allows us to return nice runtime errors to a user, for example:
Unable to evaluate expression ("hello" / 5): Operation String / Int is not defined"
Second, the variable_value
method and arithmetic operation could now return an error, so we need to handle it and cast to ExpressionError
.
Summary
That is it! We can now run the program mentioned above. Here is how the stack frame looks after the execution:
Frame {
parent: None,
local_variables: {
"r": Int(
5,
),
"hello_word": String(
"Hello World",
),
"square": Float(
157.0,
),
"world": String(
"World",
),
"pi": Float(
3.14,
),
"value": String(
"The square of the circle with the r = 5 is 157",
),
"hello": String(
"Hello",
),
},
}
Amazing, isn't it?
The plan for part IV is to add support for the boolean type and to implement the if
statements!
I'll publish it if this post gets 15 reactions!
Stay in touch!
Posted on August 13, 2022
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.