Immutable data

Not that uncommon and very helpful

Immutable data known also by the name of persistent data is becoming mainstream. It’s another approach first used in functional programming.

The whole idea is that once you create a given piece of data you cannot change it. It persists in that state.

At first sight, it would seem it’s impossible to build anything useful which such approach. How can you store any data if you cannot change the data structures you already have. The task seems to be impossible.

It’s not as hard as it seems. That approach to data is actually common in the programming languages. But usually, the scope of immutable data is limited.

In all languages, I know at least numbers are immutable. You cannot change the value of 5 to be 6. The only thing you can do is to get new value and store that.

In some languages, the idea of immutability extends to strings. That makes any operation on them creating a new string. That fact, in the most part, does not cause any problems.

The idea of immutable data extends that notion of value to common data structures. Things like arrays (lists), maps (hashes/objects) and sets become the same kind of thing as a number. They become values.

You can’t change anything.

For example to get a list with a new element in it you need to create a new list. It will contain everything from the previous one plus your new element. The old list stays the same as before and now you have an instance with what you wanted.

The same approach happens with maps and sets. Create new thing based on the old but with your modifications applied.

You may think that this must be horribly inefficient. To copy all those items around. Will building 100 item list require 99 copies of elements 1–99? That’s a lot of memory used just for the sake of immutability.

Fortunately, it doesn’t have to be that way.

Given the constraint of not immutability, the common parts of any data structure can be shared between all the copies. Smart algorithms and techniques can make it so only one copy of every single item is stored in memory.

It’s also possible to speed up a lot of common access patterns because the layout in memory may be optimized for that instead for easy extensibility.

In some cases, the memory usage can even go down because it’s being used in a smarter way. In places where several code paths are initialized with the same copy of a given data structure, there’s no need to make a special copy. Because of immutability, those code paths will not step on each other’s toes.

Which brings me to the biggest advantage of using immutable data. It’s much easier to reason about it. Once you have a reference to something you know that no other code path can change it. Your copy will stay the same for as long as you have it. That eliminates a certain class of bugs. Bugs which are usually hard to track because the problem is in a different part of the code than its manifestation.

In conclusion: immutable data can make your program faster and take less memory but what’s the most important is that it will make it much easier to reason about it. Any data modification is visible because one needs to swap out reference to the modified version.