RAID: Part 4 – Parity, Schmarity

We’re in the home stretch now.  We’ve covered mirroring and striping, and three RAID types already.

Now we are moving into RAID5 and RAID6.  They leverage striping and a concept called parity.

I’m still new to this blogging thing and once again I bit off more than I could chew.  I wanted to include RAID5 and 6 within this post but it again got really lengthy.  Rather than put an abridged version of them in here, I will cover them in the next post.

For a long time I didn’t really understand what parity was, I just knew it was “out there” and let us recover from disk failures.  But the more I looked into it and how it worked, the more it really amazed me.  It might not grab you the way it grabs me…and that’s OK, maybe you just want to move on to RAID5/6 directly.  But if you are really interested in how it works, forge on.

What is Parity?

Parity (pronounced ‘pear-ih-tee’) is a fancy (pronounced ‘faincy’) kind of recovery information.  In the mirroring post we examined data recovery from a copy perspective.  This is a pretty straight forward concept – if I make two copies of data and I lose one copy, I can recover from the remaining copy.

The downside of this lies in the copy itself.  It requires double the amount of space (which we now refer to as a 50% usable capacity penalty) to protect data with an identical copy.

Parity is a more efficient form of recovery information.  Let’s walk through a simple example, but one that I hope will really illustrate the mechanism, benefits, and problems of parity.  Say that I’m writing numbers to disk, the numbers 18, 24, 9, and last but certainly not least, 42.

noparity

As previously discussed, a mirroring strategy would require 4 additional disks in order to mirror the existing data – very inefficient from a capacity perspective.

Instead, I’m going to perform a parity calculation – or a calculation that results in some single value I can use to recover from.  In this case I’m going to use simple addition to create it.

18 + 24 + 9 + 42 = 93

So my parity value is 93 and I can use this for recovery (I’ll explain how in just a moment).

Next question – where can I store this value?  Well we probably shouldn’t use any of the existing disks, because they contain the information we are protecting.  This is a pretty common strategy for recovery. If I’m protecting funny kitten videos on my hard drive, I don’t want to back them up to the same disk because a disk failure takes out the original hilarious videos and the adorable backups. Instead I want to store them on a different physical medium to protect from physical failure.

kittehbackup

Similarly, in my parity scheme if a disk were to fail that contained data and the parity value, I would be out of luck.

To get around this, I’ll add a fifth disk and write this value:

parity

Now the real question: how do I use it for recovery?  Pretty simply in this case.  Any disk that is lost can be rebuilt by utilizing the parity information, along with the remaining data values.  If Disk3 dies, I can recover the data on it by subtracting the remaining data values from the parity value:

reconstruct

Success – data recovery after a disk failure…and it was accomplished without adding a complete data copy!  In this case 1/5 of the disk space is lost to parity, translating to a 20% usable capacity penalty.  That is a serious improvement from mirroring.

What happens if there are two disks that fail? As with most things, it depends.  Just like RAID1, if a second disk fails after the system has fully recovered from the first failure, everything is fine.

But if there is a simultaneous failure of two disks?  This presents a recovery problem because there are two unknowns.  If Disk1 and Disk2 are lost simultaneously, my equation looks like:

93-42-9 = 42 = ?+?

Or in English, what two values add up to 42?  While it is true that 18+24=42, so does 20+22, neither of which are my data.  There are a lot of values that meet this criteria…in this case more of them aren’t my data than are.  And guessing with data recovery is, in technical terms, a terribad idea.  So we know that this parity scheme can survive a single disk failure.

Another important question – what happens if we overwrite data?  For instance, if Disk2’s value of 24 gets overwritten with a value of 15, how do we adjust?  It would be a real bummer if the system had to read all of the data in the stripe to calculate parity again for just one affected strip.

There is some re-reading of data, but it isn’t near that bad.  We remove the old data value (24) from the parity value (93), and then add the new one (15) in.  Then we can replace both the data and the parity on disk.

The process looks like:

  1. Get the old parity value
  2. Get the old data value
  3. Subtract the old data value from the old parity value, creating an intermediate parity value
  4. Add the new data value to the intermediate parity value, creating the new parity value
  5. Update the parity value with the new parity value
  6. Update the data value with the new data value

Because we are working with disks, we can replace the “Get” phrasing with read and “Update” with write.  Looking back, at this list, we see that there are two gets (reads) and two updates (writes).

The Magical XOR Operation

I know, I know – my example was amazing.  “Parity calculations should use that mechanism!,” I can hear you saying.  “Give him the copyright and millions in royalties,” you are no doubt proclaiming to everyone around you.

Unfortunately my scheme has a serious problem, and that is that my parity calculation is accumulative.  The larger the numbers get that I am protecting, the larger my parity value gets. Remember at the end of the day we aren’t working with numbers. We are working with data bits (1’s and 0’s) on disk, and we are working with a fixed strip size.  Were we to write 4 x 64KB of data with accumulative parity, I would need 256KB of parity to protect it.  Not ideal – this is essentially the same as mirroring’s usable capacity penalty!

Instead, in the real world parity is supported entirely by the bitwise “exclusive or,” or XOR, operation.  Visually the operator itself looks kind of like a crosshair, or a plus inside a circle.  XOR is a very unique operator in that it essentially allows you to add and subtract from a value (similar to my scheme) without increasing the total amount of information (the bit count).

Another cool thing about the bitwise XOR is it functions both as addition and subtraction.  To “add” two values, you XOR them together.  Then to remove one value from it, you XOR that value again. So instead of A + B – B = A, we have:

xoraddsub

XOR’s principle is very simple – if two values are different, the output is TRUE (1); otherwise the output is FALSE (0).  In other words:

  1. Take any pair of 1’s and/or 0’s and compare
  2. If they are the same, output 0
  3. Else output 1

That’s all, folks.  The 4 possible input combinations and their outputs are:

  • 0 XOR 0 = 0
  • 0 XOR 1 = 1
  • 1 XOR 0 = 1
  • 1 XOR 1 = 0

Not too crazy looking is it?  Let’s put it to the test and prove that this works for recovery.  Similar to before, I have 4 data disks but this time with just ones and zeros on them.  The parity calculation goes like so:

xorcalc

Every data bit gets XOR’d together and the result is the parity bit – in this case it is 1.

Now say Disk2 with 0 on it experiences a failure and the system needs to recover. Recovery would simply be the result of XORing all the remaining data bits and the parity bit.

Recovery (from left to right): 1 XOR 1 = 0, XOR 1 = 1, XOR 1 = 0

Success!  We have recovered the 0 bit.  Unfortunately XOR’s magic only extends so far, and in the event of a simultaneous two disk failure we are still up the creek without a paddle.  One parity value can only protect you against the loss of one disk because it can only recover one unknown value.

How about updating the parity bit in the event that we overwrite some data? Again, this works as outlined above:

  1. Read the old parity value
  2. Read the old data value
  3. XOR the old data value with the old parity value, creating an intermediate parity value
  4. XOR the new data value with the intermediate parity value, creating the new parity value
  5. Write the parity value with the new parity value
  6. Write the data value with the new data value

Same as before, there are two reads and two writes.

This is obviously a very simple example and was in no way meant to be a mathematical proof of parity recovery or XOR, but it works on any scale you choose.  Test it out!

Quick note – in the real world, strip size is not a single bit like in this example. With a 64KB strip size, that comes out to 524288 bits in a single strip.  524288 ones and zeros.  XOR functions quite simply as you add bits on, since it just compares each bit in place.  For example, 1100 XOR 1010 is 0110.  The first digit of the result is the first digit of each input XOR’d together.  The second digit is the second digits of each input XOR’d together.  And so on.  There are more detailed XOR manifestos out there as well as XOR calculators online…feel free to consult if you are interested and my explanation left you wanting.

Bitwise XOR is the mechanism for generating a parity bit without increasing the total bit count.  Using this, RAID controllers can generate parity that is identical in size to any amount of data strips being protected.  No matter the strip size, and no matter the stripe width, this mechanism will always result in an identically sized parity.

So what?

So what, indeed.  That was a lot to take in, I know…parity is certainly more complicated than mirroring.  Is it absolutely necessary to understand how parity works at this level?  No, not really.  But thus far I’ve never had a problem arise because I understood how something worked too deeply. I have encountered plenty of issues because I haven’t understood how something worked, or made assumptions about what something was doing behind the scenes.

When we get into some aspects of RAID5 and RAID6, understanding what parity is supposed to do will help clarify what those RAID types are useful for.  And if you don’t agree, feel free to wipe this from your memory banks and replace it with something more useful.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s