Custom (De)serialization with Scala (Scala DCP #003)
Andrew (he/him)
Posted on March 9, 2020
Daily Coding Problem is a website which will send you a programming challenge to your inbox every day. I want to show beginners how to solve some of these problems using Scala, so this will be an ongoing series of my solutions. Feel free to pick them apart in the comments!
Problem
Given the root to a binary tree, implement
serialize(root)
, which serializes the tree into a string, anddeserialize(s)
, which deserializes the string back into the tree.For example, given the following Node class
class Node: def __init__(self, val, left=None, right=None): self.val = val self.left = left self.right = right
The following test should pass:
node = Node('root', Node('left', Node('left.left')), Node('right')) assert deserialize(serialize(node)).left.left.val == 'left.left'
Strategy
See my discussion of this problem (and my Java implementation) here!
Code
I feel like I'm cheating a bit for these first few Daily Coding Problems, because I've solved them previously using Java. Maybe we can improve on the Java solution with Scala, though. Here's the Node
class I defined in Java, without the serialize
and deserialize
methods:
public class Node {
private String val;
private Node left;
private Node right;
public Node (String val, Node left, Node right) {
this.val = val;
this.left = left;
this.right = right;
}
public Node (String val, Node left) {
this(val, left, null);
}
public Node (String val) {
this(val, null, null);
}
public Node left() { return this.left; }
public Node right() { return this.right; }
public String val() { return this.val; }
}
Believe it or not, all of this can be replaced with a single line of Scala code:
case class Node (value: String, left: Node = null, right: Node = null)
A case class
in Scala is shorthand for a class which contains only immutable member variables. In the example above, value
, left
, and right
will be assignable only when a Node
object is created, and cannot be changed once that object is instantiated. case class
es also define a companion object of the same name which can be used for (among other things) making object instantiation a bit less verbose.
The above Scala code also uses default arguments to define left
and right
as null
when they're not specified by the user. So a Node
object can be created with zero, one, or two child Node
s, without needing to define additional constructors.
Finally, case class
es provide getters (but not setters, because of the aforementioned immutability) for all of the member variables defined in the argument list. They will have the same names as the arguments themselves.
So that single line of Scala code above allows us to do all of the following:
scala> val nA = Node("nA")
nA: Node = Node(nA,null,null)
scala> val nB = Node("nB", nA)
nB: Node = Node(nB,Node(nA,null,null),null)
scala> val nC = Node("nC", nB, nA)
nC: Node = Node(nC,Node(nB,Node(nA,null,null),null),Node(nA,null,null))
scala> nA.value
res4: String = nA
scala> nB.left
res5: Node = Node(nA,null,null)
scala> nC.right
res6: Node = Node(nA,null,null)
If all of the above benefits of case class
es weren't enough, they also define a default toString
method which prints the internal data in a nice, human-readable format, as seen above. In the Java example, we needed to write that code, as well. To get the equivalent functionality of this single line of Scala code, we needed roughly fifty lines of Java. That's quite the reduction in complexity.
Next, I'd like to take a few artistic liberties. The problem asks us to define serialize
and deserialize
methods using (presumably) Python. Serializing an object really just means defining a toString
method. In Python, the default toString
equivalent is about as ugly as the default Java toString
:
Python 2.7.16 (default, Dec 13 2019, 18:00:32)
[GCC 4.2.1 Compatible Apple LLVM 11.0.0 (clang-1100.0.32.4) (-macos10.15-objc-s on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> class Node:
... def __init__(self, val, left=None, right=None):
... self.val = val
... self.left = left
... self.right = right
...
>>> node = Node('root', Node('left', Node('left.left')), Node('right'))
>>> print(node)
<__main__.Node instance at 0x10d312ab8>
When we print
the Node
, it is serialized into a string representation using an automatically defined method. In Python, this method is called __str__()
. Although __str__()
can be redefined directly, the problem asks us to define a serialize
method instead. Rather than reinventing the wheel, I would prefer to just use our default toString
provided by our Node
case class
in Scala. The spirit of the problem is simply to be able to convert these objects to their string representations, then back into objects.
However, it's at this point that I'd like to draw your attention to something which may end up bring a problem during our deserialization process. Notice how our Scala Node
s are serialized:
scala> Node("heyo")
res16: Node = Node(heyo,null,null)
The Node
itself is denoted by the literal string Node(
, followed by comma-separated arguments, then closed with a )
. A straightforward deserialization might involve using these three things as sentinels:
- the string
Node(
begins the definition of aNode
- commas
,
separate thevalue
from theleft
andright
Node
s, and - the character
)
ends the definition of theNode
You might be able to see how this could be abused. First, imagine someone simply doesn't want their Node
to have a value
, they might define:
scala> Node("null",null,null)
res19: Node = Node(null,null,null)
scala> Node(null,null,null)
res20: Node = Node(null,null,null)
Note how both of these Node
s have the same string representation, even though they are really very different things. We can therefore require
that the value
of a Node
is not equal to the string "null"
by modifying the case class
definition:
case class Node (value: String, left: Node = null, right: Node = null) {
require(value != "null") }
Now the compiler will throw an Exception
when our value
is a string masquerading as a null
:
scala> Node("null",null,null)
java.lang.IllegalArgumentException: requirement failed
at scala.Predef$.require(Predef.scala:268)
... 29 elided
Alternatively, we could override
the default toString
method to have it surround the value
with quotes. Something like...
scala> Node("null",null,null)
res42: Node = Node("null",null,null)
scala> Node(null,null,null)
res43: Node = Node(null,null,null)
This, I think, is a better solution. In addition to solving this null
issue, it also allows users to put commas ,
within labels, which otherwise could result in confusing serialized outputs like:
scala> Node("null,null,null",null,null)
res42: Node = Node(null,null,null,null,null)
...this would be a bit more difficult to parse. If you don't think that's too bad, imagine a more pathological example:
scala> Node("null,Node(null,Node(null,null,null),null)",null,null)
res42: Node = Node(null,Node(null,Node(null,null,null),null),null,null)
So maybe it is best that we redefine the toString
method, at least to surround the value
in quotes. Below, we override
the default toString
method, and make use of triple quotes to avoid having to escape the literal "
characters within the string:
scala> case class Node (value: String, left: Node = null, right: Node = null) {
| override def toString = s"""Node("$value",$left,$right)""" }
defined class Node
scala> Node("null,Node(null,Node(null,null,null),null)",null,null)
res27: Node = Node("null,Node(null,Node(null,null,null),null)",null,null)
Remember that we're defining
toString
andfromString
rather thanserialize
anddeserialize
. This is to avoid having the methodstoString
andserialize
do nearly (but not exactly) the same thing.
...now we can clearly tell where the value
begins and ends, by looking for the "
characters (or the syntax highlighting for strings). We can then sanitize user inputs by replacing any "
characters which appear within the value
with their escaped equivalents:
scala> case class Node (value: String, left: Node = null, right: Node = null) {
| override def toString = s"""Node("${value.replaceAll("\"", "\\\\\"")}",$left,$right)""" }
defined class Node
scala> Node("""Dwayne "The Rock" Johnson""",null,null)
res28: Node = Node("Dwayne \"The Rock\" Johnson",null,null)
Beautiful! Finally, we can parse the serialized output using this clever recursive pattern matching technique. The gist of this method is that we define a regular expression which matches terminal
Node
s (leaves) and another regular expression which matches arbitrary
Node
s. We try to match on the terminal
regex and if that doesn't work, we try to match on the arbitrary
one.
For our serialization technique, a terminal
Node
should match the pattern:
scala> val terminal = """Node\("((?:[^\"]|\")*)",null,null\)""".r
terminal: scala.util.matching.Regex = Node\("((?:[^\"]|\")*)",null,null\)
scala> val rock = Node("""Dwayne "The Rock" Johnson""",null,null)
rock: Node = Node("Dwayne \"The Rock\" Johnson",null,null)
scala> rock.toString match { case terminal(value) => println(s"$value") }
Dwayne \"The Rock\" Johnson
scala> val list = Node("My list is List(1,2,3)",null,null)
list: Node = Node("My list is List(1,2,3)",null,null)
scala> list.toString match { case terminal(value) => println(s"$value") }
My list is List(1,2,3)
The
.r
at the end of the string indicates that it's a regular expression.
...but any arbitrary Node
should match the pattern:
scala> val arbitrary = """Node\("((?:[^\"]|\\")*)",(null|Node\(.*\)),(null|Node\(.*\))\)""".r
arbitrary: scala.util.matching.Regex = Node\("((?:[^\"]|\\")*)",(null|Node\(.*\)),(null|Node\(.*\))\)
scala> val nodes = Node("null",rock,list)
nodes: Node = Node("null",Node("Dwayne \"The Rock\" Johnson",null,null),Node("My list is List(1,2,3)",null,null))
scala> nodes.toString match { case arbitrary(value, left, right) => println(s"$value -- $left -- $right") }
null -- Node("Dwayne \"The Rock\" Johnson",null,null) -- Node("My list is List(1,2,3)",null,null)
scala> rock.toString match { case arbitrary(value, left, right) => println(s"$value -- $left -- $right") }
Dwayne \"The Rock\" Johnson -- null -- null
scala> list.toString match { case arbitrary(value, left, right) => println(s"$value -- $left -- $right") }
My list is List(1,2,3) -- null -- null
The only remaining issue we have is when value
is null
, in which case toString
throws a NullPointerException
:
scala> val noll = Node(null,null,null)
java.lang.NullPointerException
at Node.toString(<console>:14)
at scala.runtime.ScalaRunTime$.inner$1(ScalaRunTime.scala:254)
at scala.runtime.ScalaRunTime$.stringOf(ScalaRunTime.scala:259)
at scala.runtime.ScalaRunTime$.replStringOf(ScalaRunTime.scala:267)
at .$print$lzycompute(<console>:9)
at .$print(<console>:6)
at $print(<console>)
...
...we can avoid this by disallowing null
value
s in our constructor:
scala> case class Node (value: String, left: Node = null, right: Node = null) {
| require (value != null)
| override def toString = s"""Node("${value.replaceAll("\"", "\\\\\"")}",$left,$right)"""
| }
defined class Node
scala> val noll = Node(null,null,null)
java.lang.IllegalArgumentException: requirement failed
at scala.Predef$.require(Predef.scala:268)
... 29 elided
Note that empty value
strings are still allowed, though
scala> val noll = Node("",null,null)
noll: Node = Node("",null,null)
Finally, using this SO answer as a template, we can deserialize any arbitrary Node
:
scala> def fromString(serialized: String): Node = {
| val arbitrary = """Node\("((?:[^\"]|\\")*)",(null|Node\(.*\)),(null|Node\(.*\))\)""".r
| serialized match {
| case arbitrary(value,left,right) => Node(
| value.replaceAll("\\\\\"", "\""),
| if (left == "null") null else fromString(left),
| if (right == "null") null else fromString(right))
| }
| }
fromString: (serialized: String)Node
scala> println(rock.toString); println(fromString(rock.toString).toString)
Node("Dwayne \"The Rock\" Johnson",null,null)
Node("Dwayne \"The Rock\" Johnson",null,null)
scala> println(list.toString); println(fromString(list.toString).toString)
Node("My list is List(1,2,3)",null,null)
Node("My list is List(1,2,3)",null,null)
scala> println(nodes.toString); println(fromString(nodes.toString).toString)
Node("null",Node("Dwayne \"The Rock\" Johnson",null,null),Node("My list is List(1,2,3)",null,null))
Node("null",Node("Dwayne \"The Rock\" Johnson",null,null),Node("My list is List(1,2,3)",null,null))
And the final test...
scala> val node = Node("root",Node("left",Node("left.left")),Node("right"))
node: Node = Node("root",Node("left",Node("left.left",null,null),null),Node("right",null,null))
scala> fromString(node.toString).left.left.value == "left.left"
res34: Boolean = true
Nice!
Discussion
This problem is, in essence, an exercise in string manipulation. We want to make sure that our nodes are serialized in such a way that there are no degenerate cases (where two distinct nodes are mapped to the same serialized form), and we want to make sure that the user can't throw a wrench in the deserialization process by defining a value
with hard-to-parse characters like "
and ,
. Since these are our delimiters, the user could very easily make deserialization much more difficult for us by including them.
Since this is essentially a string manipulation problem, and since I'm pretty experienced with regex, I decided to go in that direction. There may be a better or clearer way of solving this problem in Scala without regex, but my entire solution is only 10 lines of code:
case class Node (value: String, left: Node = null, right: Node = null) {
require (value != null)
override def toString = s"""Node("${value.replaceAll("\"", "\\\\\"")}",$left,$right)"""
}
def fromString(serialized: String): Node = {
val arbitrary = """Node\("((?:[^\"]|\\")*)",(null|Node\(.*\)),(null|Node\(.*\))\)""".r
serialized match {
case arbitrary(value,left,right) => Node(
value.replaceAll("\\\\\"", "\""),
if (left == "null") null else fromString(left),
if (right == "null") null else fromString(right))
}
}
I think my solution is clear, simple, and probably about as performant as any string manipulation problem can be, but if you have a better (or different) solution, I'd love to see it!
All the code for my Daily Coding Problems solutions is available at github.com/awwsmm/daily.
Suggestions? Let me know in the comments.
If you enjoyed this post, please consider supporting my work by buying me a coffee!
Posted on March 9, 2020
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.