EMC Recoverpoint and XtremIO Part 4 – Recovery and Summary

In this final post we are going to cover a simple recovery, as well as do a quick summary.  I’ll throw in a few bonus details for free.

Recovery

Our CG has been running now for over 48 hours with our configuration – a 48-hour Required Protection Window, 48 max snaps, one snap per hour.  Notice below that I have exactly (or just under, depending on how you measure) a 48-hour protection window.  I have one snap per hour for 48 hours, and that is what is retained.  This is because of how I constructed my settings!

[Screenshot: xsumm1 – snapshot list spanning the full 48-hour protection window]

If I reduce my Required Protection Window to 24 hours, notice that IMMEDIATELY the snaps past 24 hours are nuked:

[Screenshot: xsumm2 – snapshot list after reducing the Required Protection Window to 24 hours]

The distribution of snaps in this case wouldn’t be different because of how the CG is constructed (one snap per hour, 48 max snaps, 24-hour protection window = one snap per hour for 24 hours), but again, notice that the Required Protection Window is much more than just an alerting setting in RP+XtremIO.
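For the curious, here is a minimal sketch in Python of how that retention math appears to behave, assuming the array simply drops anything older than the Required Protection Window and then enforces the max snap count.  The function and the pruning order are my own guesses; EMC hasn’t published the algorithm at this level of detail.

```python
from datetime import datetime, timedelta

def prune_snapshots(snap_times, max_snaps, protection_window_hours, now):
    """Hypothetical pruning model: drop snaps older than the Required
    Protection Window, then keep at most max_snaps of the newest ones."""
    window = timedelta(hours=protection_window_hours)
    in_window = [t for t in sorted(snap_times) if now - t <= window]
    return in_window[-max_snaps:]

# One snap per hour for the last 48 hours:
now = datetime(2016, 1, 1, 12, 0)
snaps = [now - timedelta(hours=h) for h in range(1, 49)]

print(len(prune_snapshots(snaps, max_snaps=48, protection_window_hours=48, now=now)))  # 48
print(len(prune_snapshots(snaps, max_snaps=48, protection_window_hours=24, now=now)))  # 24 -- older snaps are gone
```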

Alright, back to our recovery example.  Someone dumb like myself ignored all the “Important” naming and decided to delete that VM.

[Screenshot: xsumm3 – deleting the VM]

Even worse, they decided to just delete the entire datastore afterwards.

[Screenshot: xsumm4 – deleting the datastore]

But lucky for us we have RP protection enabled.  I’m going to head to RP and use the Test a Copy and Recover Production button.

[Screenshot: xsumm5 – the Test a Copy and Recover Production button]

I’ll choose my replica volume:

[Screenshot: xsumm6 – selecting the replica volume]

Then I decide I don’t want to use the latest image because I’m worried that the deletion actually exists in that snapshot, so I choose the image one hour prior to the latest snap.  Quick note: see that virtual access is not even available now?  That’s because with snap-based promotion there is no need for it.  Snaps are instantly promoted to the actual replica LUN, so physical access is always available and always immediate, no matter how old the image.

[Screenshot: xsumm7 – selecting the image one hour prior to the latest snap]

After I hit Next, it spins up the Test a Copy screen.  Normally I might want to map this LUN to a host and actually check it to make sure that this is a valid copy.  In this case, because (say) I’ve tracked the bad user’s steps through vCenter logging, I know exactly when I need to recover.  One important note, though: as you’ll see in a second, all snapshots taken AFTER your recovery image will be deleted!  But again, because I’m a real maverick, I just tell it to go ahead and do the production recovery.

[Screenshot: xsumm8 – the Test a Copy screen]

It gives me a warning that prod is going to be overwritten and that data transfer will be paused.  It doesn’t warn you about the snapshot deletion, but this has historically been RP behavior.

[Screenshot: xsumm9 – production overwrite warning]

On the host side I do a rescan, and there’s my datastore.  It is unmounted at the moment so I’ll choose to mount it.

[Screenshot: xsumm10 – rescanning and mounting the datastore]

Next, because I deleted that VM I need to browse the datastore and import the VMX file back into vCenter.

[Screenshots: xsumm11, xsumm12 – browsing the datastore and registering the VMX file]

And just like that I’ve recovered my VM.  Easy as pie!

[Screenshot: xsumm13 – the recovered VM]

Now, notice that I recovered using the 2:25 snap, and below is my snapshot list after the recovery.  The 3:25 snap and the 2:25 snap that I used are both deleted.  This is actually kind of interesting, because an awesome feature of XtremIO is that all snaps (even snaps of snaps) are independent entities; intermediate snaps can be deleted with no consequence.  So I don’t necessarily think this deletion of all subsequent snaps is a technical requirement, but it certainly makes logical sense that they should be deleted to avoid confusion.  I don’t want a snapshot of bad data hanging around in my environment.

[Screenshot: xsumm14 – snapshot list after the recovery]

Summary

In summary, it looks like this snap-based recovery is fantastic as long as you take the time to understand the behavior.  Like most things, planning is essential to ensure you get a good balance between your required protection and capacity savings.  I’m hoping for some more detailed breakdowns from EMC on the behavior of the snapshot pruning policies and the full impact that settings like Required Protection Window have on the environment.

Also, don’t underestimate the 8,192 max snaps+vols for a single XMS system, especially if you are managing multiple clusters per XMS!  If I had to guess, I’d say this value will be bumped up in a future release considering these new factors, but in the meantime make sure you don’t overrun your environment.  Remember, you can still use a single XMS per cluster in order to artificially inflate your snap ceiling.
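To get a feel for how quickly that ceiling approaches, here is some back-of-the-envelope math in Python.  Every number below is a made-up example, and exactly which objects count against the limit is my assumption, so treat this as a sketch rather than sizing guidance.

```python
XMS_MAX_OBJECTS = 8192       # max combined volumes + snapshots per XMS

clusters_per_xms  = 2        # hypothetical: two clusters behind one XMS
prod_volumes      = 100      # protected production volumes per cluster
replica_volumes   = 100      # one replica volume per production volume
snaps_per_replica = 48       # e.g. 48 max snaps per protected volume

objects = clusters_per_xms * (prod_volumes + replica_volumes
                              + replica_volumes * snaps_per_replica)
print(objects, "of", XMS_MAX_OBJECTS)  # 10000 of 8192 -- already over the ceiling
```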

Bonus Deets!

A couple of things of note.

First, in my last post I stated that I had noticed a bug with settings not “sticking.”  After I talked with a customer, he indicated this doesn’t have to do with the settings (the values) themselves but with the process; something about the order of operations is important here.  And now I believe this to be true, because if I recreate a CG with those same busted settings, it works every time!  I can’t get it to break. 🙂  I still believe this to be a bug, so just double-check your CG settings after creating them.

Second, keep in mind that today the XtremIO dashboard displays your provisioned capacity based on the volumes and snapshots on the system, with no regard for who created those snaps.  So you can imagine that with a snap-based recovery tool, things get out of hand quickly.  I’m talking about 1.4PB (no typo – PETAbytes) “provisioned” on a 20TB brick!

[Screenshot: DC2_20T – 1.4PB provisioned on a 20TB brick]

While this is definitely a testament to the power (or insanity?) of thin provisioning, I’m trying to put in a feature request to get this fixed in the future because it really hurts the dashboard’s relevance.  But for the moment, just note that for anything you protect with RP:

  • On the Production side, you will see a 2x factor of provisioning.  So if you protected 30TB of LUNs, your provisioned space (from those LUNs) will be 60TB.
  • On the Replica side, you will see a hilarious factor of provisioning, depending on how many snaps you are keeping (rough math below).
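
Here is a quick sketch of that math with hypothetical numbers, assuming each snapshot is counted at the full provisioned size of its parent volume (my reading of the dashboard behavior, not a documented formula):

```python
protected_tb  = 30    # production LUNs protected by RP (hypothetical)
snaps_per_lun = 48    # snaps retained per replica volume

production_provisioned = protected_tb * 2                    # the 2x factor above
replica_provisioned    = protected_tb * (1 + snaps_per_lun)  # replica volumes plus every snap

print(production_provisioned, "TB provisioned on the production side")  # 60 TB
print(replica_provisioned, "TB provisioned on the replica side")        # 1470 TB, i.e. ~1.4PB
```

Numbers in that ballpark land right around the 1.4PB range, which shows how a 20TB brick can end up looking so wildly over-provisioned.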

I hope this series has been useful – I’m really excited about this new technology pairing!

RAID: Part 4 – Parity, Schmarity

We’re in the home stretch now.  We’ve covered mirroring and striping, and three RAID types already.

Now we are moving into RAID5 and RAID6.  They leverage striping and a concept called parity.

I’m still new to this blogging thing and once again I bit off more than I could chew.  I wanted to include RAID5 and 6 within this post but it again got really lengthy.  Rather than put an abridged version of them in here, I will cover them in the next post.

For a long time I didn’t really understand what parity was; I just knew it was “out there” and let us recover from disk failures.  But the more I looked into it and how it worked, the more it really amazed me.  It might not grab you the way it grabs me…and that’s OK; maybe you just want to move on to RAID5/6 directly.  But if you are really interested in how it works, forge on.

What is Parity?

Parity (pronounced ‘pear-ih-tee’) is a fancy (pronounced ‘faincy’) kind of recovery information.  In the mirroring post we examined data recovery from a copy perspective.  This is a pretty straightforward concept – if I make two copies of data and I lose one copy, I can recover from the remaining copy.

The downside of this lies in the copy itself.  It requires double the amount of space (which we now refer to as a 50% usable capacity penalty) to protect data with an identical copy.

Parity is a more efficient form of recovery information.  Let’s walk through a simple example, but one that I hope will really illustrate the mechanism, benefits, and problems of parity.  Say that I’m writing numbers to disk: 18, 24, 9, and, last but certainly not least, 42.

[Image: noparity – four data disks holding 18, 24, 9, and 42]

As previously discussed, a mirroring strategy would require 4 additional disks in order to mirror the existing data – very inefficient from a capacity perspective.

Instead, I’m going to perform a parity calculation – a calculation that results in a single value I can use for recovery.  In this case I’m going to use simple addition to create it.

18 + 24 + 9 + 42 = 93

So my parity value is 93 and I can use this for recovery (I’ll explain how in just a moment).

Next question – where can I store this value?  Well we probably shouldn’t use any of the existing disks, because they contain the information we are protecting.  This is a pretty common strategy for recovery. If I’m protecting funny kitten videos on my hard drive, I don’t want to back them up to the same disk because a disk failure takes out the original hilarious videos and the adorable backups. Instead I want to store them on a different physical medium to protect from physical failure.

[Image: kittehbackup]

Similarly, in my parity scheme if a disk were to fail that contained data and the parity value, I would be out of luck.

To get around this, I’ll add a fifth disk and write this value:

[Image: parity – a fifth disk added to hold the parity value 93]

Now the real question: how do I use it for recovery?  Pretty simply in this case.  Any disk that is lost can be rebuilt by utilizing the parity information, along with the remaining data values.  If Disk3 dies, I can recover the data on it by subtracting the remaining data values from the parity value:

[Image: reconstruct – 93 - 18 - 24 - 42 = 9, rebuilding Disk3’s data]

Success – data recovery after a disk failure…and it was accomplished without adding a complete data copy!  In this case 1/5 of the disk space is lost to parity, translating to a 20% usable capacity penalty.  That is a serious improvement over mirroring.
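Here is the same toy scheme in a few lines of Python, just to make the arithmetic concrete (the additive parity is purely illustrative; real RAID uses XOR, as covered below):

```python
disks  = [18, 24, 9, 42]   # data values on Disk1..Disk4
parity = sum(disks)        # 93, stored on a fifth disk

# Disk3 (the 9) fails: rebuild it from parity minus the surviving data.
surviving = [18, 24, 42]
rebuilt = parity - sum(surviving)
print(rebuilt)             # 9
```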

What happens if there are two disks that fail? As with most things, it depends.  Just like RAID1, if a second disk fails after the system has fully recovered from the first failure, everything is fine.

But if there is a simultaneous failure of two disks?  This presents a recovery problem because there are two unknowns.  If Disk1 and Disk2 are lost simultaneously, my equation looks like:

93-42-9 = 42 = ?+?

Or in English: what two values add up to 42?  While it is true that 18+24=42, so does 20+22, and neither 20 nor 22 is my data.  There are a lot of value pairs that meet this criterion…in fact, far more of them aren’t my data than are.  And guessing with data recovery is, in technical terms, a terribad idea.  So this parity scheme can only survive a single disk failure.

Another important question – what happens if we overwrite data?  For instance, if Disk2’s value of 24 gets overwritten with a value of 15, how do we adjust?  It would be a real bummer if the system had to read all of the data in the stripe to calculate parity again for just one affected strip.

There is some re-reading of data, but it isn’t nearly that bad.  We remove the old data value (24) from the parity value (93), and then add the new one (15) in.  Then we can replace both the data and the parity on disk.

The process looks like:

  1. Get the old parity value
  2. Get the old data value
  3. Subtract the old data value from the old parity value, creating an intermediate parity value
  4. Add the new data value to the intermediate parity value, creating the new parity value
  5. Update the parity value with the new parity value
  6. Update the data value with the new data value

Because we are working with disks, we can replace the “Get” phrasing with read and “Update” with write.  Looking back at this list, we see that there are two gets (reads) and two updates (writes).
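The same read-modify-write update in toy Python form, still using the illustrative additive parity:

```python
old_parity = 93   # read the old parity value
old_data   = 24   # read the old data value
new_data   = 15

intermediate = old_parity - old_data    # remove the old value: 69
new_parity   = intermediate + new_data  # add the new value:    84

# Two reads above, then two writes: the new parity and the new data.
print(new_parity)              # 84
print(sum([18, 15, 9, 42]))    # 84 -- matches recalculating the whole stripe
```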

The Magical XOR Operation

I know, I know – my example was amazing.  “Parity calculations should use that mechanism!” I can hear you saying.  “Give him the copyright and millions in royalties,” you are no doubt proclaiming to everyone around you.

Unfortunately my scheme has a serious problem, and that is that my parity calculation is accumulative.  The larger (and more numerous) the values I am protecting, the larger my parity value gets.  Remember, at the end of the day we aren’t really working with numbers; we are working with data bits (1’s and 0’s) on disk, and we are working with a fixed strip size.  Were we to protect a stripe of 64KB strips this way, the sum would need more than a single 64KB strip to store, and it would only keep growing as data is added.  Not ideal – a parity value that grows with the data starts to erode the capacity savings that made it attractive over mirroring in the first place.

Instead, in the real world parity is built entirely on the bitwise “exclusive or,” or XOR, operation.  Visually the operator looks kind of like a crosshair, or a plus inside a circle (⊕).  XOR is special in that it essentially allows you to add values to and remove values from the parity (similar to my scheme) without increasing the total amount of information (the bit count).

Another cool thing about the bitwise XOR is that it functions as both addition and subtraction.  To “add” two values, you XOR them together.  Then to remove one of the values, you XOR that value in again.  So instead of A + B – B = A, we have:

[Image: xoraddsub – A XOR B XOR B = A]
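You can verify that with Python’s ^ (bitwise XOR) operator:

```python
a, b = 0b1011, 0b0110
print(bin(a ^ b))      # 0b1101 -- "add" b into a
print(bin(a ^ b ^ b))  # 0b1011 -- XOR b in again and we get a back
```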

XOR’s principle is very simple – if two values are different, the output is TRUE (1); otherwise the output is FALSE (0).  In other words:

  1. Take any pair of 1’s and/or 0’s and compare
  2. If they are the same, output 0
  3. Else output 1

That’s all, folks.  The 4 possible input combinations and their outputs are:

  • 0 XOR 0 = 0
  • 0 XOR 1 = 1
  • 1 XOR 0 = 1
  • 1 XOR 1 = 0

Not too crazy looking, is it?  Let’s put it to the test and prove that this works for recovery.  Similar to before, I have 4 data disks, but this time with just ones and zeros on them.  The parity calculation goes like so:

[Image: xorcalc – 1 XOR 0 XOR 1 XOR 1 = 1]

Every data bit gets XOR’d together and the result is the parity bit – in this case it is 1.

Now say Disk2 with 0 on it experiences a failure and the system needs to recover. Recovery would simply be the result of XORing all the remaining data bits and the parity bit.

Recovery (working left to right): 1 XOR 1 = 0; then 0 XOR 1 = 1; then 1 XOR 1 = 0

Success!  We have recovered the 0 bit.  Unfortunately XOR’s magic only extends so far, and in the event of a simultaneous two-disk failure we are still up the creek without a paddle.  One parity value can only protect you against the loss of one disk, because it can only recover one unknown value.
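The same example in Python, computing the parity bit and then rebuilding the failed disk from the survivors:

```python
from functools import reduce
from operator import xor

data   = [1, 0, 1, 1]       # Disk1..Disk4; Disk2 holds the 0
parity = reduce(xor, data)  # 1

# Disk2 fails: XOR the surviving bits with the parity bit to rebuild it.
survivors = [1, 1, 1]
rebuilt = reduce(xor, survivors + [parity])
print(parity, rebuilt)      # 1 0
```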

How about updating the parity bit in the event that we overwrite some data? Again, this works as outlined above:

  1. Read the old parity value
  2. Read the old data value
  3. XOR the old data value with the old parity value, creating an intermediate parity value
  4. XOR the new data value with the intermediate parity value, creating the new parity value
  5. Write the parity value with the new parity value
  6. Write the data value with the new data value

Same as before, there are two reads and two writes.
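Here is that XOR-based read-modify-write in Python, using small 4-bit strips so the values are easy to eyeball:

```python
from functools import reduce
from operator import xor

strips = [0b1010, 0b0110, 0b0011]          # three data strips
parity = reduce(xor, strips)               # 0b1111

# Overwrite the middle strip without re-reading the whole stripe:
old_data, new_data = strips[1], 0b1100
new_parity = parity ^ old_data ^ new_data  # two reads, one XOR "subtract", one XOR "add"

strips[1] = new_data                       # write the new data
print(new_parity == reduce(xor, strips))   # True -- matches a full recalculation
```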

This is obviously a very simple example and was in no way meant to be a mathematical proof of parity recovery or XOR, but it works on any scale you choose.  Test it out!

Quick note – in the real world, strip size is not a single bit like in this example. With a 64KB strip size, that comes out to 524288 bits in a single strip.  524288 ones and zeros.  XOR functions quite simply as you add bits on, since it just compares each bit in place.  For example, 1100 XOR 1010 is 0110.  The first digit of the result is the first digit of each input XOR’d together.  The second digit is the second digits of each input XOR’d together.  And so on.  There are more detailed XOR manifestos out there as well as XOR calculators online…feel free to consult if you are interested and my explanation left you wanting.

Bitwise XOR is the mechanism for generating parity without increasing the total bit count.  Using it, RAID controllers can generate parity that is identical in size to a single strip, no matter how many data strips are being protected.  No matter the strip size, and no matter the stripe width, this mechanism will always result in a parity strip of exactly the same size as the data strips.
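To show that at a more realistic scale, here is a sketch with full 64KB strips of random bytes: the parity stays exactly one strip in size, and a lost strip can be rebuilt from the survivors plus parity.

```python
import os

STRIP = 64 * 1024                               # 64KB strips, as in the example above
strips = [os.urandom(STRIP) for _ in range(4)]  # four data strips of random bytes

parity = bytes(a ^ b ^ c ^ d for a, b, c, d in zip(*strips))
print(len(parity))                              # 65536 -- one strip's worth, regardless of stripe width

# Rebuild a "lost" strip (say strips[2]) from the other strips plus parity:
rebuilt = bytes(a ^ b ^ c ^ p for a, b, c, p in zip(strips[0], strips[1], strips[3], parity))
print(rebuilt == strips[2])                     # True
```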

So what?

So what, indeed.  That was a lot to take in, I know…parity is certainly more complicated than mirroring.  Is it absolutely necessary to understand how parity works at this level?  No, not really.  But thus far I’ve never had a problem arise because I understood how something worked too deeply. I have encountered plenty of issues because I haven’t understood how something worked, or made assumptions about what something was doing behind the scenes.

When we get into some aspects of RAID5 and RAID6, understanding what parity is supposed to do will help clarify what those RAID types are useful for.  And if you don’t agree, feel free to wipe this from your memory banks and replace it with something more useful.