Kotlin Data Classes, shallow copies and immutability
Julien Lengrand-Lambert
Posted on August 8, 2022
TL;DR: The data class copy method in Kotlin creates shallow copies, and data classes are NOT immutable data structures by themselves. They do become immutable, though, if all of their properties are read-only (_val_) and hold immutable values themselves.
Note: You can run all the samples listed here by clicking that link
Earlier this week, I gave a Kotlin introduction training to about 50 folks at Adyen. I think most of them appreciated the training and I even got some nice feedback:
I gave a "introduction to Kotlin" workshop yesterday for about 50 people for the first time. Damn workshops are much more exhausting than I was expecting.
This feedback made my day though 🥰❤️. pic.twitter.com/jZE8r1UBWJ
— Julien Lengrand-Lambert (@jlengrand) July 26, 2022
https://twitter.com/jlengrand/status/1551904525358874625
One fun thing about the training is that most of the audience was coming from a C/C++ background and they asked many questions about references / values, shallow or deep copies and how Kotlin manages memory. Suffice it to say I wasn't ready for it ^^. This blog post summarises my findings after the training.
A quick recap about Data Classes
In Kotlin, data classes are specialised structures that are meant to hold data (as the name suggests).
data class Address(val number: Int, val street: String, val city: String)
data class User(val name: String, val age: Int, val address: Address)
They come with additional goodies compared to normal classes, among which:
- Generated hashCode and equals methods. (equals is smart as well, checking that all properties and sub-properties have the same value.)
- A smart toString method that displays the data content nicely.
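To make those goodies concrete, here is a minimal sketch that reuses the Address class from above. The variable names are only there for illustration.

```kotlin
data class Address(val number: Int, val street: String, val city: String)

fun main() {
    val home = Address(12, "rue des peupliers", "Paris")
    val sameHome = Address(12, "rue des peupliers", "Paris")

    println(home == sameHome)                       // true: equals compares the property values
    println(home.hashCode() == sameHome.hashCode()) // true: equal objects share a hash code
    println(home === sameHome)                      // false: still two distinct instances
    println(home)                                   // Address(number=12, street=rue des peupliers, city=Paris)
}
```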
Duh, that's just like Java Records, I hear you say. Yeah, except that data classes also come with the copy method, which in my opinion gives them all their power. With that copy method, you can create a new copy of an existing instance, while modifying some data at the same time. Here is an example:
val bob = User("Bob", 42, Address(12, "rue des peupliers", "Paris"))
val anOlderBob = bob.copy(age = 43)
As I was presenting this to the C++ folks, I heard the same question pop up in three different places at the same time: is anOlderBob a shallow or a deep copy of bob?
In other words, if bob changes address now, does anOlderBob get affected?
Silence in the room.....
Benefits of Immutable data classes
See, I've been interested in Functional Programming for a little while, and I know that most functional languages heavily rely on immutable data structures. One of the main reasons for this is thread safety. If your data cannot be modified any more once it has been written, you are by definition thread safe and you are certain not to have synchronisation issues. I even think that the first person who taught me this was the one and only Martin Odersky.
Thing is, the huge majority of applications out there are in the business of moving data around. And if you cannot modify existing objects, well it also means that you're gonna have to create many many more objects to compensate for it right?
Let's take an example :
data class User(val name: String, val age: Int)
val users = listOf(
User("Bob", 42),
User("Georges", 12),
User("Emily", 25),
User("Amy", 46))
val olderUsers = users.map { it.copy(age = it.age + 1) }
>> [User(name=Bob, age=42), User(name=Georges, age=12), User(name=Emily, age=25), User(name=Amy, age=46)]
>> [User(name=Bob, age=43), User(name=Georges, age=13), User(name=Emily, age=26), User(name=Amy, age=47)]
Here, we are creating a list of 4 users and then creating a new list with each user older by one year. Because our users are all immutable, we have to create a new copy for each user.
For more serious applications, the obvious outcome is that this type of copying _has_ to be shallow (meaning our new object's properties link to the memory location of the parent properties), because otherwise the performance hit of creating so many objects would be prohibitive.
Well, now let's verify it.
Playing around with Data Classes
Let's run a few tests to make sure our assumptions are correct (or not). We create FantasyHero, a data class that has mutable and immutable properties, of primitive and more complex types.
enum class WEAPONS{
AXE, SWORD, WAND, BOW
}
enum class CLASS{
WIZARD, WARRIOR, PALADIN, THIEF
}
data class Origin(val city: String, var country: String)
data class FantasyHero(
var name: String,
val weapons: MutableList<WEAPONS>,
var heroClass: CLASS?,
val origin: Origin = Origin("Utrecht", "The Netherlands")
)
First, let's check that everything behaves as expected:
val gandalf = FantasyHero(
"Gandalf the Grey",
mutableListOf(WEAPONS.WAND),
CLASS.WIZARD
)
val anotherGandalf = FantasyHero(
"Gandalf the Grey",
mutableListOf(WEAPONS.WAND),
CLASS.WIZARD
)
val gandalfCopy = gandalf.copy()
val betterGandalf = gandalf.copy(name="Gandalf the White")
println(gandalf)
println(anotherGandalf)
println(betterGandalf)
println(gandalf == anotherGandalf)
println(gandalf == gandalfCopy)
println(gandalf == betterGandalf)
println("--")
println(gandalf === anotherGandalf)
println(gandalf === gandalfCopy)
println(gandalf === betterGandalf)
>> FantasyHero(name=Gandalf the Grey, weapons=[WAND], heroClass=WIZARD, origin=Origin(city=Utrecht, country=The Netherlands))
>> FantasyHero(name=Gandalf the Grey, weapons=[WAND], heroClass=WIZARD, origin=Origin(city=Utrecht, country=The Netherlands))
>> FantasyHero(name=Gandalf the White, weapons=[WAND], heroClass=WIZARD, origin=Origin(city=Utrecht, country=The Netherlands))
>> true
>> true
>> false
>> --
>> false
>> false
>> false
That all checks out. We created 2 different instances with the same data and they are considered equal (because we check the property values against each other, not the object instances). The straight copy of our instance is equal as well, while the instance where we modified a property isn't any more. And we also checked that all instances of heroes are different: they are not the same object. All good, we can continue.
Mutating "complex" properties
First, let's give our Wizard an extra weapon. We all know he wields a sword as well after all. Since we expect copies to be shallow copies here, we would expect the list of weapons to be changed in all copies of the hero if we change the original list. And that's exactly what happens:
gandalf.weapons.add(WEAPONS.SWORD)
println(gandalf.weapons)
println(anotherGandalf.weapons)
println(gandalfCopy.weapons)
println(betterGandalf.weapons)
>> [WAND, SWORD]
>> [WAND]
>> [WAND, SWORD]
>> [WAND, SWORD]
// Same behaviour if we change the list through one of the copies, btw
gandalfCopy.weapons.remove(WEAPONS.SWORD)
println(gandalf.weapons)
println(anotherGandalf.weapons)
println(gandalfCopy.weapons)
println(betterGandalf.weapons)
>> [WAND]
>> [WAND]
>> [WAND]
>> [WAND]
In the same manner, modifying a property that is itself a data class gives the same kind of result; all copies are affected:
gandalf.origin.country = "France"
println(gandalf.origin.country)
println(anotherGandalf.origin.country)
println(gandalfCopy.origin.country)
println(betterGandalf.origin.country)
>> France
>> The Netherlands
>> France
>> France
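If you want a copy that does not share its mutable parts, one option is to hand fresh objects to copy yourself. The sketch below is only an illustration: independentCopy is a hypothetical helper (it is not part of Kotlin), and FantasyHero is trimmed down to the properties that matter here.

```kotlin
enum class WEAPONS { AXE, SWORD, WAND, BOW }

data class Origin(val city: String, var country: String)

data class FantasyHero(
    var name: String,
    val weapons: MutableList<WEAPONS>,
    val origin: Origin = Origin("Utrecht", "The Netherlands")
)

// Hypothetical helper (not part of Kotlin): copies the hero AND its mutable parts.
fun FantasyHero.independentCopy(): FantasyHero =
    copy(weapons = weapons.toMutableList(), origin = origin.copy())

fun main() {
    val gandalf = FantasyHero("Gandalf the Grey", mutableListOf(WEAPONS.WAND))
    val shallow = gandalf.copy()
    val independent = gandalf.independentCopy()

    println(gandalf.weapons === shallow.weapons)     // true: the very same list object
    println(gandalf.weapons === independent.weapons) // false: a fresh list

    gandalf.weapons.add(WEAPONS.SWORD)
    gandalf.origin.country = "France"

    println(shallow.weapons)            // [WAND, SWORD]
    println(independent.weapons)        // [WAND]
    println(independent.origin.country) // The Netherlands
}
```

Note that this only goes one level deep: any mutable object nested further down would need the same treatment.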
Mutating a simple property
So far so good. Let's change Gandalf's name!
gandalf.name = "Gandalf the Blue"
println(gandalf.name)
println(anotherGandalf.name)
println(gandalfCopy.name)
println(betterGandalf.name)
>> Gandalf the Blue
>> Gandalf the Grey
>> Gandalf the Grey
>> Gandalf the White
When mutating the String property of our Wizard, we see that the name change isn't propagated to any of the copies.
Doing the same test with an Enum leads to a similar result:
gandalf.heroClass = CLASS.PALADIN
println(gandalf.heroClass)
println(anotherGandalf.heroClass)
println(gandalfCopy.heroClass)
println(betterGandalf.heroClass)
>> PALADIN
>> WIZARD
>> WIZARD
>> WIZARD
Now, that had me confused at first. Confused enough to ask on the Kotlin Slack actually (thanks for the help folks!). Are the copies holding references to the same values or not? Well, it turns out that they are, and the explanation goes back to the basics of Kotlin / Java: String and enum values are immutable objects, so there is no way to modify them in place. Assigning a new value to a property only makes that one property reference a different object; every other reference to the old object is left untouched. The behaviour has nothing to do with data classes:
var heroClass = CLASS.WARRIOR
val theClass = heroClass
heroClass = CLASS.WIZARD
println(heroClass)
println(theClass)
>> WIZARD
>> WARRIOR
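When copying a data class, copy passes the references held by the original's properties to the constructor of the new instance. Reassigning a property afterwards only rebinds that instance's own reference; the object it used to point to, and every copy still pointing to it, stay unchanged.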
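Here is a small sketch of the same idea applied to a data class. Hero is a stripped-down, hypothetical class used only for this illustration.

```kotlin
data class Hero(var name: String)

fun main() {
    val hero = Hero("Gandalf the Grey")
    val heroCopy = hero.copy()

    // Right after the copy, both properties reference the very same String object
    println(hero.name === heroCopy.name) // true

    // Reassigning rebinds hero.name only; heroCopy still references the old String
    hero.name = "Gandalf the Blue"
    println(hero.name)     // Gandalf the Blue
    println(heroCopy.name) // Gandalf the Grey
}
```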
Conclusion
Just like we expected, the data class copy method creates shallow copies of instances. We would have saved time just checking the documentation actually:
A screenshot of the documentation about copying data classes (https://kotlinlang.org/docs/data-classes.html#copying)
Now back to our question: are data classes thread safe? Well, now I realize that there was some confusion in my head during the training. Data classes have little to do with immutability; they are a convenient way to work with objects purely holding data. However, you can make them immutable by making sure all of their fields are immutable themselves. That's what I typically do, and you probably should too 😊.
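As a rough sketch of what that can look like (ImmutableHero and heroOf are hypothetical names, not something from the samples above): every property is a val, the list is exposed as a read-only List, and the nested data class only has val properties as well. One caveat: List is only a read-only view in Kotlin, so the sketch takes a snapshot with toList() to make sure nobody else keeps a mutable handle on it.

```kotlin
enum class WEAPONS { AXE, SWORD, WAND, BOW }

data class Origin(val city: String, val country: String)

data class ImmutableHero(
    val name: String,
    val weapons: List<WEAPONS>,
    val origin: Origin
)

// Hypothetical factory: toList() snapshots the caller's list so later changes to it cannot leak in.
fun heroOf(name: String, weapons: List<WEAPONS>, origin: Origin): ImmutableHero =
    ImmutableHero(name, weapons.toList(), origin)

fun main() {
    val source = mutableListOf(WEAPONS.WAND)
    val gandalf = heroOf("Gandalf the Grey", source, Origin("Utrecht", "The Netherlands"))
    source.add(WEAPONS.AXE) // does not affect gandalf

    // "Changing" the hero means building a new one with copy
    val armedGandalf = gandalf.copy(weapons = gandalf.weapons + WEAPONS.SWORD)

    println(gandalf.weapons)      // [WAND]
    println(armedGandalf.weapons) // [WAND, SWORD]
}
```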
Further reading
Interestingly, Romain pointed at a KEEP in the Twitter thread which quite literally discusses the topic mentioned here. I'm keen on seeing what's gonna happen with Value Classes in the future.
Hit me up @jlengrand, if you have questions or remarks, I'm always up for learning new things 😊