Building Zerocalc, part IV - precision
Michal Ciesielski
Posted on July 7, 2024
Performing arithmetic calculations using computers results in limited precision. Operations on integers are accurate up to minimum and maximum values allowed by the selected integer representation. Using the i128
type we can get any number up to 38 digits long which is enough assuming you will not try to count atoms in the known universe (~10^79). Floating point operations are more challenging as not all values can be represented in the standard used by Rust (IEEE 754). For example:
1.1 * 2.2 = 2.4200000000000004
There are libraries, such as rug, that can help us deal with this problem, offering extended precision and big integer operations. For Zerocalc I am going to do something simpler and still get some benefits of improved precision: as long as we operate on integers, we will stick to integers and only switch to floats when necessary.
For example, when adding or multiplying integers, the result is an integer. When dividing, if the division has no reminder, we will use integers. I also want to make the operations easy to write just as we used normal primitives, for example, I want to write:
x = 1
y = 2
result = x * y
and don't think about what primitive types to operate on.
To achieve this we will use two Rust features: enum
s and std::ops
traits which are similar to C++'s operator overloading. Let's start with defining our numbers. We will derive from PartialEq
to allow comparisons of Number
instances:
#[derive(Debug, PartialEq)]
pub enum Number {
Int(i128),
Float(f64),
NaN
}
The extra NaN
or Not a Number option will help us handle edge cases such as division by 0
. Rust's floating point numbers already have similar values defined as constants such as std::f64::NAN
. To make it easy to create Number
instances in tests, we can add a helper conversion implementations:
impl From<f64> for Number {
fn from(value: f64) -> Self {
if value == std::f64::NAN || value == std::f64::INFINITY || value == std::f64::NEG_INFINITY {
Number::NaN
} else {
Number::Float(value)
}
}
}
impl From<Number> for f64 {
fn from(value: Number) -> Self {
match value {
Number::Int(i) => i as f64,
Number::Float(f) => f,
Number::NaN => f64::NAN
}
}
}
impl From<i128> for Number {
fn from(value: i128) -> Self {
Number::Int(value)
}
}
impl From<Option<i128>> for Number {
fn from(value: Option<i128>) -> Self {
match value {
Some(i) => Number::Int(i),
None => Number::NaN
}
}
}
With this, we can create a Number
from actual values easily:
let x: Number = 1.into(); // creates Number::Int(1)
let y: Number = 1.0.into(); // creates Number::Float(1.0)
You may be wondering why we have two conversions for i128
- one for the actual integer and one for the option. This will help us to handle errors as we will see in the next step.
Now we can define the first operator - addition. The rules for addition will be:
- If any operand is NaN, the result is NaN.
- If both operands are integers, the result is an integer.
- In all other cases (one or both operands are float) the result is float.
- If the result cannot be represented as
i128
orf64
- return NaN.
impl Add for Number {
type Output = Number;
fn add(self, rhs: Self) -> Self::Output {
match (self, rhs) {
(Number::NaN, _) | (_, Number::NaN) => Number::NaN,
(Number::Int(l), Number::Int(r)) => l.checked_add(r).into(),
_ => {
let l: f64 = self.into();
let r: f64 = rhs.into();
(l + r).into()
}
}
}
}
The checked_add
method returns an option - Some
if the result can be safely represented as i128, None in other cases. This is where our From<Option<i281>>
comes in handy. In the case of floats the errors will be returned as predefined constants such as NAN
. Let's test it!
#[test]
fn test_add() {
let int1: Number = 1.into();
let int2: Number = 2.into();
assert_eq!(Number::Int(3), int1 + int2);
let fl2: Number = 2.0.into();
if let Number::Float(fl_res) = int1 + fl2 {
assert!((fl_res - 3.0).abs() < std::f64::EPSILON);
} else {
panic!("Float expected");
}
}
Subtraction and multiplication are very similar, in the case of division we need to check if the reminder of dividing two integrals is 0
- if yes, the result will also be 0
. We also have to watch for division by 0
operations. checked_rem
will handle both for us:
impl Div for Number {
type Output = Number;
fn div(self, rhs: Self) -> Self::Output {
match (self, rhs) {
(Number::NaN, _) | (_, Number::NaN) => Number::NaN,
(Number::Int(l), Number::Int(r)) => {
let rem = l.checked_rem(r);
match rem {
Some(r) if r > 0 => Number::Float(self.into()) / Number::Float(rhs.into()),
Some(_) => Number::Int(l / r),
None => Number::NaN
}
},
_ => {
let l: f64 = self.into();
let r: f64 = rhs.into();
(l / r).into()
}
}
}
}
With this simple approach, we can make easy calculations that will preserve integers as long as possible and will gracefully handle errors.
Bonus
A cautious reader may have noticed the impl From<Number> for f64
implementation does not do any checks when converting 128 into f64. How can we safely convert a 128-bit integer into an 64-bit float? According to IEEE-754, the 64-bit floating numbers are stored as:
- 11 bit exponent
- 53 bit precision number
Quoting after Wikipedia:
The 11 bit width of the exponent allows the representation of numbers between 10^−308 and 10^308
So there will be rounding errors, but we can store a maximum i128 number; we can even count atoms in the universe! Let's check what happens if we convert 128::MAX
:
fn main() {
println!("i128::MAX: {},\ni128::MAX as f64: {}", std::i128::MAX, std::i128::MAX as f64)
}
Output:
i128::MAX: 170141183460469231731687303715884105727,
i128::MAX as f64: 170141183460469230000000000000000000000
Sources:
https://rtfeldman-rust-workshop.netlify.app/1.2
https://crates.io/crates/rug
https://www.wolframalpha.com/input?i=number+of+atoms+in+universe
https://en.wikipedia.org/wiki/Double-precision_floating-point_format
Posted on July 7, 2024
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.