We keep hearing the term erasure coding here and there. What is erasure coding? It is present in Nutanix technology and is expected to be present in the next release of VMware VSAN. The current VMware VSAN 6.1 has already added some cost-effective features, for example 2-node clusters for ROBO environments, plus some cool support for Diablo Ultra DIMM technology and NVMe storage devices.
As the term erasure coding is quite new to me, I wanted to do a quick write-up about it for my own (technical) culture. I usually do this for VMware-specific features and configurations (I think it's pretty obvious from this blog that I like VMware technology -:). I like having “my own stuff” bookmarked in “my own” space. That's why this place started back in 2008! I mention this because several times in the past I was asked why I write about something that can already be found through Google. The answer is simple for me, but not evident to others.
Erasure Coding or RAID?
We all know RAID. Different types of RAID exist, but in general they fall into two categories: either a complete mirror image of the data is kept on a second drive, or parity blocks are added to the data so that failed blocks can be recovered. Mirroring doubles the data size (which is normal), and parity usually adds about one-fifth more data.
Erasure coding, on the other hand, is a bit more complex, especially when a vendor uses a proprietary implementation (like Nutanix) rather than the standard Reed-Solomon codes found in many erasure coding implementations. VMware VSAN's implementation of erasure coding will support RAID5 and RAID6 configurations for your virtual machine objects. That's at least what has surfaced through different VMware blogs and sources.
I found a good definition of erasure coding on the Network Computing website. Here is a quick quote:
Erasure coding is usually specified in an N+M format: 10+6, a common choice, means that data and erasure codes are spread over 16 (N+M) drives, and that any 10 of those can recover data. That means any six drives can fail. If the drives are on different appliances, the protection includes appliance failures, so six appliance boxes could go down without stopping operations.
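To put numbers on the N+M notation from the quote, here is a minimal Python sketch of the arithmetic. The function name `describe_layout` and the dictionary keys are my own (hypothetical), and the 1+1 and 5+1 layouts are just illustrations of the mirror and "one-fifth more" parity cases mentioned above, not any vendor's actual settings.

```python
def describe_layout(n_data: int, m_redundant: int) -> dict:
    """Summarize an N+M layout: N data pieces plus M redundant pieces."""
    total = n_data + m_redundant
    return {
        "drives": total,                         # pieces are spread over N+M drives
        "tolerated_failures": m_redundant,       # any M drives may fail
        "extra_capacity": m_redundant / n_data,  # redundancy relative to the raw data
        "usable_fraction": n_data / total,       # share of raw capacity holding data
    }

# Illustrative layouts only: a mirror, a single-parity group, and the 10+6 from the quote.
for name, (n, m) in {"mirror 1+1": (1, 1), "parity 5+1": (5, 1), "erasure 10+6": (10, 6)}.items():
    info = describe_layout(n, m)
    print(f"{name}: {info['drives']} drives, survives {info['tolerated_failures']} failures, "
          f"{info['extra_capacity']:.0%} extra capacity")
```

The point of the comparison: a mirror pays 100% extra capacity to survive one failure, while 10+6 pays 60% extra and survives six.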
Now, I'm not a storage expert, but rather a “Virtualization generalist”. To me, erasure coding is a significant plus when I have to make a choice or talk to a customer about features and functions that exist and how they (possibly) work. So far I don't have any information about when the next version of VSAN will be available, or whether erasure coding will make it into that release. For those interested, there is a VSAN Beta subscription page (not sure if it's still open).
And another quick explanation of erasure coding, which I found on the smahesh.com blog:
The basic premise of erasure coding goes as follows: Take a file and split into k pieces and encode into n pieces. Now, any k pieces can be used to get back the file.
While erasure codes are also called as error correcting codes, there is a crucial difference between an error and an erasure. If I send ten bits and one bit flipped, an error has occurred, and I do not know where it has occurred. However, if I store ten blocks of a file into different nodes and one node dies, I know exactly which block I lost, and so I know where the erasure has happened. See the difference?
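The erasure case from the quote (I know exactly which block I lost) can be shown with the simplest possible code: k data blocks plus one XOR parity block, the same idea as single-parity RAID. This is only a sketch for my own understanding; the function names are made up, and it is not how Nutanix or VSAN do it.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def encode(data_blocks):
    """Return the k data blocks followed by one XOR parity block (n = k + 1)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]

def recover(stored, lost_index):
    """Rebuild the block at lost_index from all surviving blocks.

    This only works because we know WHICH block is gone (an erasure).
    A silently flipped bit (an error) could not be located this way.
    """
    survivors = [block for i, block in enumerate(stored) if i != lost_index]
    return xor_blocks(survivors)

data = [b"AAAA", b"BBBB", b"CCCC"]   # k = 3 blocks, one per node
stored = encode(data)                # n = 4 blocks including parity
rebuilt = recover(stored, lost_index=1)  # pretend the node holding block 1 died
print(rebuilt == data[1])
```

With a single parity block only one lost block can be rebuilt; surviving more failures needs more redundant pieces, which is where codes like Reed-Solomon come in.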
Image courtesy of smahesh.com blog:
Erasure coding looks simple in theory. How exactly it's done (splitting files etc.), the “under the hood” technology, differs in each vendor's implementation. But the basics are quite understandable. A file is split into k pieces, and those are encoded into a larger number of pieces, n. Any k of those n pieces are then enough to get back the original file.
There is also Ceph (open source), which uses erasure coding technology as well, if you're interested.
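To convince myself that the "any k pieces get the file back" idea really works, here is a toy Python sketch in the Reed-Solomon spirit. Assumptions to note: it works on a handful of bytes rather than real files, it uses arithmetic modulo the prime 257 instead of the GF(2^8) field that real implementations typically use, and the names `encode_pieces` and `decode_pieces` are hypothetical. It is not any vendor's actual method.

```python
from itertools import combinations

P = 257  # prime just above 255, so every byte value is a valid number mod P

def encode_pieces(data: bytes, n: int):
    """Treat the k data bytes as polynomial coefficients; piece x is (x, poly(x) mod P)."""
    return [(x, sum(c * pow(x, i, P) for i, c in enumerate(data)) % P)
            for x in range(1, n + 1)]

def decode_pieces(pieces, k: int) -> bytes:
    """Recover the k coefficients from any k distinct pieces by Lagrange interpolation."""
    pieces = list(pieces)[:k]
    coeffs = [0] * k
    for j, (xj, yj) in enumerate(pieces):
        basis, denom = [1], 1            # basis polynomial, lowest-order coefficient first
        for m, (xm, _) in enumerate(pieces):
            if m == j:
                continue
            # Multiply the basis polynomial by (x - xm).
            nxt = [0] * (len(basis) + 1)
            for i, b in enumerate(basis):
                nxt[i] = (nxt[i] - xm * b) % P
                nxt[i + 1] = (nxt[i + 1] + b) % P
            basis = nxt
            denom = denom * (xj - xm) % P
        scale = yj * pow(denom, -1, P) % P   # modular inverse, needs Python 3.8+
        for i, b in enumerate(basis):
            coeffs[i] = (coeffs[i] + scale * b) % P
    return bytes(coeffs)

data = b"VSAN"                           # k = 4 "pieces" (one byte each)
pieces = encode_pieces(data, n=6)        # n = 6 encoded pieces
# Every possible choice of 4 surviving pieces should give the data back.
print(all(decode_pieces(subset, 4) == data for subset in combinations(pieces, 4)))
```

The trick is that a polynomial with k coefficients is fully determined by any k points on it, so it does not matter which pieces survive, only how many.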
- Wikipedia article on erasure coding