Null-Safety vs Maybe/Option - A Thorough Comparison (Part 2/2)

practicalprogramming

Christian Neumanns

Posted on October 18, 2019

In part 1 of this article we saw how the null pointer error can be eliminated with null-safety or with the Maybe/Option type.

In this part we'll have a closer look at code snippets that illustrate frequent use cases of handling the 'absence of a value'.

Useful Null-Handling Features

Besides null-safety, a language should also provide specific support for common null handling operations. Let's have a look at some examples.

Note: The additional support for null-handling presented in the following chapters is typically found only in null-safe languages. However, other languages can also provide these features, even if they are not null-safe.

Searching the First Non-Null Value

Suppose we want to look up a discount for a customer. First we try to retrieve the value from a web-service. If the value is not available (i.e. the result is null), we try to retrieve it from a database, then from a local cache. If the value is still null we use a default value of 0.0.

Now we want to write a function that provides the discount for a given customer. To keep the example simple, we ignore error handling, and we don't use asynchronous functions that could make the lookup faster.

Java

First attempt:

static Double customerDiscount ( String customerID ) {

    Double result = discountFromNet ( customerID );
    if ( result != null ) {
        return result;
    } else {
        result = discountFromDB ( customerID );
        if ( result != null ) {
            return result;
        } else {
            result = discountFromCache ( customerID );
            if ( result != null ) {
                return result;
            } else {
                return 0.0; // default value
            }
        }
    }
}

What an ugly monstrosity! Let's quickly rewrite it:

static Double customerDiscount ( String customerID ) {

    Double result = discountFromNet ( customerID );
    if ( result != null ) return result;

    result = discountFromDB ( customerID );
    if ( result != null ) return result;

    result = discountFromCache ( customerID );
    if ( result != null ) return result;

    return 0.0; // default value
}

Note: There's nothing wrong with using several return statements, as in the code above (although some people might disagree).

The complete Java source code for a test application is available here.

Haskell

Let's start again with the straightforward, but ugly version:

customerDiscount :: String -> Float
customerDiscount customerID =
    case (discountFromNet customerID) of
    Just d -> d
    Nothing -> case (discountFromDB customerID) of
               Just d -> d
               Nothing -> case (discountFromCache customerID) of
                          Just d -> d
                          Nothing -> 0.0

There are different ways to write better code. Here is one way:

import Control.Applicative ((<|>))
import Data.Maybe (fromMaybe)

customerDiscount :: String -> Float
customerDiscount customerID =
    let discountMaybe = discountFromNet customerID
                        <|> discountFromDB customerID
                        <|> discountFromCache customerID
    in fromMaybe 0.0 discountMaybe

The complete Haskell source code for a test application, including alternative ways to write the above function, is available here.

More information (and even more alternatives) can be found in the Stack Overflow question Using the Maybe Monad in reverse.

PPL

Again, first the ugly version, written in PPL:

function customer_discount ( customer_id string ) -> float_64
    if discount_from_net ( customer_id ) as net_result is not null then
        return net_result
    else
        if discount_from_DB ( customer_id ) as DB_result is not null then
            return DB_result
        else
            if discount_from_cache ( customer_id ) as cache_result is not null then
                return cache_result
            else
                return 0.0
            .
        .
    .
.

The code becomes a one-liner and more readable with the practical if_null: operator designed for this common use case:

function customer_discount ( customer_id string ) -> float_64 = \
    discount_from_net ( customer_id ) \
    if_null: discount_from_DB ( customer_id ) \
    if_null: discount_from_cache ( customer_id ) \
    if_null: 0.0

The if_null: operator works like this: It evaluates the expression on the left. If the result is non-null, it returns that result. Otherwise it returns the result of the expression on the right.

In our example we use a chain of if_null: operators to find the first non-null value. If the three functions called in the expression return null, we return the default value 0.0.

The complete PPL source code for a test application is available here.
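
For comparison, the behavior of a chain of if_null: operators can be approximated in Java with a small helper. The following is only a sketch: the helper name firstNonNull and the three lookup stubs are assumptions made for illustration, not part of the article's test application. The lookups are passed as suppliers so that, as in the chain above, a later lookup only runs if the earlier ones returned null.

```java
import java.util.function.Supplier;

final class FirstNonNull {

    // Returns the first non-null result, evaluating the suppliers lazily
    // from left to right; falls back to defaultValue if all of them return null.
    @SafeVarargs
    static <T> T firstNonNull(T defaultValue, Supplier<? extends T>... candidates) {
        for (Supplier<? extends T> candidate : candidates) {
            T value = candidate.get();
            if (value != null) return value;
        }
        return defaultValue;
    }

    // Hypothetical stand-ins for the three lookups used in the article.
    static Double discountFromNet(String customerID) { return null; }
    static Double discountFromDB(String customerID) { return "A".equals(customerID) ? 0.15 : null; }
    static Double discountFromCache(String customerID) { return 0.05; }

    static Double customerDiscount(String customerID) {
        return firstNonNull(0.0,
            () -> discountFromNet(customerID),
            () -> discountFromDB(customerID),
            () -> discountFromCache(customerID));
    }

    public static void main(String[] args) {
        System.out.println(customerDiscount("A")); // found in the (stubbed) database
        System.out.println(customerDiscount("B")); // found in the (stubbed) cache
    }
}
```

Unlike if_null: in PPL, this helper is a library function, so the compiler does not verify that the final result is non-null.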

Getting a Value in a Path With Nulls

Sometimes we need to do the opposite of what we did in the previous chapter. Instead of stopping at the first non-null value, we continue until we've found the last non-null value.

For example, suppose a customer record type with two attributes:

  • name: a non-null string

  • address: a nullable address

Record type address is defined as follows:

  • city: a nullable string

  • country: a non-null string

Now we want to create a function that takes a customer as input, and returns the number of characters in the customer's city. If the customer's address attribute is null, or if the address's city attribute is null then the function should return 0.

Java

These are the record types written in idiomatic Java:

static class Customer {

    private final String name;
    private final Address address;

    public Customer ( String name, Address address) {
        this.name = name;
        this.address = address;
    }

    public String getName() { return name; }
    public Address getAddress() { return address; }
}

static class Address {

    private final String city;
    private final String country;

    public Address ( String city, String country) {
        this.city = city;
        this.country = country;
    }

    public String getCity() { return city; }
    public String getCountry() { return country; }
}

Note: We don't use setters because we want our types to be immutable.

As seen already, all types are nullable in Java. We cannot explicitly specify if null is allowed for class fields.
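
Since the type system cannot express the constraint, a common workaround is to check it at run-time. The sketch below assumes a hypothetical variant of the Address class named CheckedAddress; it uses the standard Objects.requireNonNull method to reject a null country in the constructor. This only moves the failure to the point of construction; it is not compile-time null-safety.

```java
import java.util.Objects;

final class CheckedAddress {

    private final String city;     // may be null
    private final String country;  // must not be null

    CheckedAddress(String city, String country) {
        this.city = city;
        // Fails fast with a NullPointerException if country is null.
        this.country = Objects.requireNonNull(country, "country must not be null");
    }

    String getCity() { return city; }
    String getCountry() { return country; }
}
```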

Function (method) customerCitySize can be implemented as follows:

static Integer customerCitySize ( Customer customer ) {

    Address address = customer.getAddress();
    if ( address == null ) return 0;

    String city = address.getCity();
    if ( city == null ) return 0;

    return city.length();
}

Alternatively we could have used nested if statements, but the above version is more readable and avoids the complexity of nested statements.

We can write a simplistic test:

public static void main ( String[] args ) {

    // city is non-null
    Address address = new Address ( "Orlando", "USA" );
    Customer customer = new Customer ( "Foo", address );
    System.out.println ( customerCitySize ( customer ) );

    // city is null
    address = new Address ( null, "USA" );
    customer = new Customer ( "Foo", address );
    System.out.println ( customerCitySize ( customer ) );

    // address is null
    customer = new Customer ( "Foo", null );
    System.out.println ( customerCitySize ( customer ) );
}

Output:

7
0
0

The whole Java source code is available here.
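
As an alternative to the explicit null checks, the same path can be walked with java.util.Optional. This is a sketch rather than the article's implementation: the class name OptionalPath is an assumption, and the nested Customer and Address classes are trimmed-down copies of the ones shown above. Optional.map returns an empty Optional when the mapping function returns null, so a null address or a null city both end in the default value.

```java
import java.util.Optional;

final class OptionalPath {

    // Trimmed-down copies of the article's record classes.
    static final class Address {
        private final String city;     // may be null
        private final String country;
        Address(String city, String country) { this.city = city; this.country = country; }
        String getCity() { return city; }
        String getCountry() { return country; }
    }

    static final class Customer {
        private final String name;
        private final Address address; // may be null
        Customer(String name, Address address) { this.name = name; this.address = address; }
        String getName() { return name; }
        Address getAddress() { return address; }
    }

    static int customerCitySize(Customer customer) {
        return Optional.ofNullable(customer.getAddress()) // empty if address is null
            .map(Address::getCity)                        // empty if city is null
            .map(String::length)
            .orElse(0);                                   // default value
    }

    public static void main(String[] args) {
        System.out.println(customerCitySize(new Customer("Foo", new Address("Orlando", "USA"))));
        System.out.println(customerCitySize(new Customer("Foo", new Address(null, "USA"))));
        System.out.println(customerCitySize(new Customer("Foo", null)));
    }
}
```

The code is shorter, but the compiler still does not force us to use Optional here; calling customer.getAddress().getCity() directly compiles and can fail at run-time.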

Haskell

Defining the record types is easy:

data Customer = Customer {
    name :: String,
    address :: Maybe Address
}

data Address = Address { 
    city :: Maybe String,
    country :: String 
}

There are several ways to write function customerCitySize in Haskell. Here is, I think, the most readable one for people more familiar with imperative programming. It uses the do notation:

import Data.Maybe (fromMaybe)

customerCitySize :: Customer -> Int
customerCitySize customer =
    let sizeMaybe = do
            address <- address customer    -- type Address
            city <- city address           -- type String
            return $ length city           -- type Maybe Int
    in fromMaybe 0 sizeMaybe

Here is a version that doesn't use the do notation:

customerCitySize :: Customer -> Int
customerCitySize customer =
    let addressMaybe = address customer    -- type Maybe Address
        cityMaybe = addressMaybe >>= city  -- type Maybe String
        sizeMaybe = length <$> cityMaybe   -- type Maybe Int
    in fromMaybe 0 sizeMaybe

If we are careful with operator precedence, we can shorten the code:

customerCitySize :: Customer -> Int
customerCitySize customer =
    fromMaybe 0 $ length <$> (address customer >>= city)

Instead of using fromMaybe we can use maybe to provide the default value:

customerCitySize :: Customer -> Int
customerCitySize customer =
    maybe 0 length $ address customer >>= city

Yes, this code is concise. But there is a lot going on behind the scenes. A looooot! To really understand the above code, one has to understand Haskell. And yes, we use a monad, indicated by the bind operator >>= in the code. For more information please refer to Haskell's documentation.

We can write a quick test:

main :: IO ()
main = do

    -- city is defined
    let address1 = Address {city = Just "Orlando", country = "USA"}
    let customer1 = Customer {name = "Foo", address = Just address1}
    putStrLn $ show $ customerCitySize customer1

    -- city is not defined
    let address2 = Address {city = Nothing, country = "USA"}
    let customer2 = Customer {name = "Foo", address = Just address2}
    putStrLn $ show $ customerCitySize customer2

    -- address is not defined
    let customer3 = Customer {name = "Foo", address = Nothing}
    putStrLn $ show $ customerCitySize customer3

Again, the output is:

7
0
0

The whole Haskell source code is available here. There are also two examples of customerCitySize implementations that compile without errors, but produce wrong results.

PPL

First, the record types:

record type customer
    attributes
        name string
        address address or null
    .
.

record type address
    attributes
        city string or null
        country string
    .
.

Function customer_city_size is written like this:

function customer_city_size ( customer ) -> zero_pos_32 =
    customer.address.null?.city.null?.size if_null: 0

Note the embedded null? checks. The evaluation of customer.address.null?.city.null?.size stops as soon as a null is detected in the chain. In that case, the whole expression evaluates to null.

The if_null: operator is used to return the default value 0 if the expression on the left evaluates to null.

Instead of .null?. we can also simply write the shorter ?. operator. Hence the function can be shortened to:

function customer_city_size ( customer ) -> zero_pos_32 =
    customer.address?.city?.size if_null: 0

Simple test code looks like this:

function start

    // city is non-null
    const address1 = address.create ( city = "Orlando", country = "USA" )
    const customer1 = customer.create ( name = "Foo", address = address1 )
    write_line ( customer_city_size ( customer1 ).to_string )

    // city is null
    const address2 = address.create ( city = null, country = "USA" )
    const customer2 = customer.create ( name = "Foo", address = address2 )
    write_line ( customer_city_size ( customer2 ).to_string )

    // address is null
    const customer3 = customer.create ( name = "Foo", address = null )
    write_line ( customer_city_size ( customer3 ).to_string )
.

Output:

7
0
0

The whole PPL source code is available here.

Comparison

Here is a copy of the three implementations:

  • Java

    static Integer customerCitySize ( Customer customer ) {
    
        Address address = customer.getAddress();
        if ( address == null ) return 0;
    
        String city = address.getCity();
        if ( city == null ) return 0;
    
        return city.length();
    }
    
  • Haskell

    customerCitySize :: Customer -> Int
    customerCitySize customer =
        maybe 0 length $ address customer >>= city
    
  • PPL

    function customer_city_size ( customer ) -> zero_pos_32 =
        customer.address?.city?.size if_null: 0
    

Comparisons

Now that we know how the null pointer error is eliminated, let us look at some differences between using the Maybe monad in Haskell and null-safety in PPL.

Note: The following discussion is based on the Haskell and PPL examples shown in the previous chapters. Hence, some of the following observations are not valid in other languages that work in a similar way. For example, F#'s Option type is very similar to Haskell's Maybe type, but these two languages are far from being the same. Reader comments about other languages are of course very welcome.

Source Code

Here is a summary of the differences we saw in the source code examples.

Declaring the type of a nullable reference

Haskell: Maybe String (other languages use Option or Optional)

PPL: string or null (other languages use string?)

As seen already, the difference between Haskell and PPL is not just syntax. Both use different concepts.

Haskell uses the Maybe type with a generic type parameter. From the Haskell documentation: "A value of type Maybe a either contains a value of type a (represented as Just a), or it is empty (represented as Nothing). The Maybe type is also a monad."

On the other hand, PPL uses union types to state that a value is either a specific type, or null.

A non-null value used for a nullable type

Haskell: Just "qwe" (other languages: Some "qwe")

PPL: "qwe"

This difference is important!

In Haskell "qwe" is not type-compatible with Just "qwe". Suppose the following function signature:

foo :: Maybe String -> String

This function can be called as follows:

foo $ Just "qwe"

But a compiler error arises if we try to call it like this:

foo "qwe"

There are a few consequences to be aware of.

First, if a type changes from Maybe T to T, then all occurrences of Just expression must be changed to expression. The inverse is true too. A change from type T to Maybe T requires all occurrences of expression to be refactored to Just expression.

This is not the case in PPL. An expression of type string is type-compatible with an expression of type string or null (but the inverse is not true). For example, the function ...

foo ( s string or null ) -> string

... can be called like this:

foo ( "qwe" )

If the function is later refactored to ...

foo ( s string ) -> string

... then it can still be called with:

foo ( "qwe" )
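
The wrapping requirement described for Haskell can also be observed with Java's Optional type, which is a wrapper type like Maybe. A minimal sketch, assuming a hypothetical method foo that mirrors the signature used above:

```java
import java.util.Optional;

final class WrappingDemo {

    // Hypothetical counterpart of: foo :: Maybe String -> String
    static String foo(Optional<String> s) {
        return s.orElse("(none)");
    }

    public static void main(String[] args) {
        System.out.println(foo(Optional.of("qwe")));  // the value must be wrapped
        System.out.println(foo(Optional.empty()));    // counterpart of Nothing
        // foo("qwe");  // does not compile: a String is not an Optional<String>
    }
}
```

If the parameter type is later changed from Optional<String> to String, every call site has to be changed as well, just like the Just expressions in Haskell.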

Secondly, in Haskell some functions with the same name might exist for input type Maybe T, as well as for input T. But the semantics are different. For example, length "qwe" returns 3 in Haskell, while length $ Just "qwe" returns 1. It is important to be aware of this, because there is no compile-time error if function length is used for an expression whose type changes from Maybe T to T or vice-versa.

Thirdly, one has to be aware of the possibility of nested Maybes in Haskell. For example, suppose again we declare:

data Customer = Customer {
    name :: String,
    address :: Maybe Address
}

data Address = Address { 
    city :: Maybe String,
    country :: String 
}

What is the return type of the following function?

customerCity customer = city <$> address customer

Is it Maybe String? No, it's Maybe (Maybe String), a nested Maybe. Ignoring this can lead to subtle bugs. For an interesting discussion see the Stack Overflow question Simplifying nested Maybe pattern matching.
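
The same nesting effect can be reproduced with Java's Optional, where map plays the role of <$> and flatMap the role of >>=. The sketch below is an illustration under assumptions: the class NestedOptional and its Optional-returning accessors are hypothetical and differ from the getter-based classes shown earlier.

```java
import java.util.Optional;

final class NestedOptional {

    static final class Address {
        private final String city; // may be null
        Address(String city) { this.city = city; }
        Optional<String> city() { return Optional.ofNullable(city); }
    }

    static final class Customer {
        private final Address address; // may be null
        Customer(Address address) { this.address = address; }
        Optional<Address> address() { return Optional.ofNullable(address); }
    }

    // map keeps both layers: the result is a nested Optional.
    static Optional<Optional<String>> nestedCity(Customer customer) {
        return customer.address().map(Address::city);
    }

    // flatMap collapses the two layers into one.
    static Optional<String> flatCity(Customer customer) {
        return customer.address().flatMap(Address::city);
    }

    public static void main(String[] args) {
        Customer noCity = new Customer(new Address(null));
        // The outer Optional is present although there is no city:
        System.out.println(nestedCity(noCity).isPresent());
        System.out.println(flatCity(noCity).isPresent());
    }
}
```

Testing only the outer layer of the nested result reports a value even though the city is missing, which is the kind of subtle bug mentioned above.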

'No value' symbol

Haskell: Nothing (other languages: None)

PPL: null (other languages: nil, void, ...)

Checking for null

Haskell (one way to do it):

intToString :: Maybe Integer -> Maybe String
intToString i = case i of
    Just 1  -> Just "one"
    Nothing -> Nothing
    _       -> Just "not one"            

Note: Omitting the Nothing case does not produce a compiler error. Instead, the function returns Just "not one" if it is called with Nothing as input.

PPL (new version):

function int_to_string ( i pos_32 or null ) -> string or null = \
    case value of i
        when null: null
        when 1   : "one"
        otherwise: "not one"

Note: Omitting the when null case results in the following compiler error:

Clause 'when null' is required because the case expression might be null at run-time.

Providing a default non-null value

Haskell: fromMaybe 0 size (requires Data.Maybe; F#: defaultArg size 0)

PPL: size if_null: 0 (other languages: size ?: 0; the ?: operator is sometimes called the 'Elvis operator')
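
Java has no Elvis operator, but the standard library offers two close equivalents. The sketch below shows both; the class and method names are assumptions chosen for illustration. Objects.requireNonNullElse (available since Java 9) works on a plain nullable reference, while Optional.orElse works on a wrapped value.

```java
import java.util.Objects;
import java.util.Optional;

final class DefaultValueDemo {

    // Default for a plain nullable reference.
    static int fromNullable(Integer size) {
        return Objects.requireNonNullElse(size, 0);
    }

    // Default for a value wrapped in an Optional.
    static int fromOptional(Optional<Integer> size) {
        return size.orElse(0);
    }

    public static void main(String[] args) {
        System.out.println(fromNullable(null));              // default is used
        System.out.println(fromNullable(7));                 // value is used
        System.out.println(fromOptional(Optional.empty()));  // default is used
        System.out.println(fromOptional(Optional.of(7)));    // value is used
    }
}
```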

Getting the first non-null value in a chain, or else a default value

Haskell:

customerDiscount :: String -> Float
customerDiscount customerID =
    let discountMaybe = discountFromNet customerID
                        <|> discountFromDB customerID
                        <|> discountFromCache customerID
    in fromMaybe 0.0 discountMaybe

PPL:

function customer_discount ( customer_id string ) -> float_64 = \
    discount_from_net ( customer_id ) \
    if_null: discount_from_DB ( customer_id ) \
    if_null: discount_from_cache ( customer_id ) \
    if_null: 0.0

Getting the last value in a chain, or else a default value

Haskell:

customerCitySize :: Customer -> Int
customerCitySize customer =
    maybe 0 length $ address customer >>= city

PPL:

function customer_city_size ( customer ) -> zero_pos_32 =
    customer.address?.city?.size if_null: 0

Implementation

Back in 1965, Tony Hoare introduced null in ALGOL "simply because it was so easy to implement", as he said.

In Java, and probably most other programming languages, null is implemented by simply using the value 0 for a reference. That is to say, if we write something like name = "Bob", then the memory address used for variable name contains the starting address of the memory block that stores the string value "Bob". On the other hand, when name = null is executed, then the content of the memory address used for variable name is set to 0 (i.e. all bits set to zero). Easy and efficient, indeed!

A more thorough explanation is available in chapter Run-time Implementation of my article A quick and thorough guide to 'null'.

So, implementing null is easy. However, adding null-safety to a language is a totally different story. Implementing compile-time-null-safety in a practical way is far from being easy. Adding good support to simplify null-handling as far as possible is a challenge. Adding null-safety and good support for null-handling makes life more difficult for language creators, but much easier for language users (i.e. software developers). This doesn't come as a surprise, though. It's just a frequently observed fact of life:

  • It is easy to make it difficult to use.

  • It is difficult to make it easy to use.

On the other hand, a type like Maybe can simply be added to the language's standard library, without the need for special support in the language.

In the case of Haskell, Maybe is a monad in the standard prelude, and Haskell's standard functional programming features are used to handle Maybe values.

Space and Time

Let's start with null.

There are just two kinds of basic operations needed at run-time:

  • Assign null to a reference (e.g. name = null): this is typically done by just writing 0 to a memory cell.

  • Check if a reference points to null (e.g. if name is null): this is very quickly done by just comparing the content of a memory cell with 0.

The conclusion is obvious: null operations are extremely space- and time-efficient.

On the other hand, using a wrapper type is probably less efficient, unless the compiler uses very clever optimizations.

As a general observation, it is probably fair to say that, for a given language, using a wrapper type cannot be made faster than using 0 for a null reference.

In practice, however, the performance difference might not be an issue in many kinds of applications.

A Note On The "Billion Dollar Mistake"

Yes, Tony Hoare stated that null has "probably caused a billion dollars of pain and damage in the last forty years".

However, a few seconds later he said the following, which is really important, but often ignored:

More recent programming languages like Spec# have introduced declarations for non-null references. This is the solution, which I rejected in 1965.

-- Tony Hoare

The mistake was not the invention of null per se. The mistake was the lack of compile-time-null-safety and good support for null-handling in programming languages.

As seen in this article it is possible to eliminate the null pointer error in languages that use null. No "billion-dollar mistake" anymore!

Isn't it amazing that it took the software development industry over 40 years to recognize this and start creating null-safe languages?

Summary

Here is a summary of the key points:

Java (and most other popular programming languages)

  • All reference types are nullable.

  • null can be assigned to any reference (variable, input argument, function return value, etc.).

  • There is no protection against null pointer errors. They occur frequently and are the reason for the billion-dollar mistake.

Haskell (and some other programming languages using Maybe/Option)

  • null is not supported. Hence null pointer errors cannot occur.

  • The Maybe type (a monad with a type parameter) is used to manage the 'absence of a value'.

  • Pattern matching is used to test for Nothing.

  • Standard language features are used to handle Maybe values (e.g. the monad's bind operator >>=).

PPL (and some other null-safe programming languages)

  • By default all reference types are non-nullable and null cannot be assigned.

  • null is an ordinary type with the single value null. Union types are used to handle the 'absence of a value' (e.g. string or null).

  • The compiler ensures a null-check is done before executing an operation on a nullable type. Thus null pointer errors cannot occur.

  • The language provides specific support to simplify null-handling as far as possible. Null-handling code is concise, and easy to read and write.

Conclusion

The aim of this article was to compare null-safety in PPL with the Maybe type in Haskell. We did this by looking at a number of representative source code examples.

When comparing two languages, we must of course be careful not to generalize our observations to all other languages.

However, in the context of this article we saw that:

  • Both null-safety and the Maybe type eliminate the null pointer error.

  • The Maybe/Option type is easier to implement in a language than null-safety. It simplifies life for language designers and implementers.

  • Providing good support for null-safety is a challenge for language creators. However, it simplifies life for developers.

Header image by dailyprinciples from Pixabay.
