EMC Recoverpoint and XtremIO Part 4 – Recovery and Summary

In this final post we are going to cover a simple recovery, as well as do a quick summary.  I’ll throw in a few bonus details for free.

Recovery

Our CG has been running now for over 48 hours with our configuration – a 48-hour Required Protection Window, 48 max snaps, and one snap per hour.  Notice below that I have exactly (or just under, depending on how you measure) a 48 hour protection window.  I have one snap per hour for 48 hours, and that is what is retained.  This is because of how I constructed my settings!

xsumm1

If I reduce my Required Protection Window to 24 hours, notice that IMMEDIATELY the snaps past 24 hours are nuked:

xsumm2

The distribution of snaps in this case wouldn’t be different because of how the CG is constructed (one snap per hour, 48 max snaps, 24 hour protection window = 1 snap per hour for 24 hours), but again notice that the Required Protection Window is much more than just an alerting setting in RP+XtremIO.
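
To make the interplay of those three settings concrete, here is a toy Python model of the pruning behavior I observed – my own sketch, not EMC's actual algorithm: a snap survives only if it falls inside the Required Protection Window, and no more than the max snap count are kept (oldest dropped first).

```python
from datetime import datetime, timedelta

def prune_snapshots(snap_times, now, max_snaps, protection_window_hours):
    """Toy model of the observed pruning behavior (not EMC's actual algorithm).

    A snap survives only if it falls inside the Required Protection Window,
    and no more than max_snaps are kept (oldest dropped first).
    """
    window_start = now - timedelta(hours=protection_window_hours)
    in_window = [t for t in snap_times if t > window_start]
    return sorted(in_window)[-max_snaps:]

# One snap per hour for the last 72 hours
now = datetime(2016, 1, 1, 12, 0)
snaps = [now - timedelta(hours=h) for h in range(72)]

print(len(prune_snapshots(snaps, now, 48, 48)))  # 48 snaps retained
print(len(prune_snapshots(snaps, now, 48, 24)))  # 24 - everything older is gone
```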

Alright, back to our recovery example.  Someone dumb like myself ignored all the “Important” naming and decided to delete that VM.

xsumm3

Even worse, they decided to just delete the entire datastore afterwards.

xsumm4

But lucky for us we have RP protection enabled.  I’m going to head to RP and use the Test a Copy and Recover Production button.

xsumm5

I’ll choose my replica volume:

xsumm6

Then I decide I don’t want to use the latest image because I’m worried that the deletion actually exists in that snapshot.  I choose one hour prior to the latest snap.  Quick note: see that virtual access is not even available now?  That’s because with snap based promotion there is no need for it.  Snaps are instantly promoted to the actual replica LUN, so physical access is always available and always immediate no matter how old the image.

xsumm7

After I hit next, it spins up the Test a Copy screen.  Normally I might want to map this LUN to a host and actually check it to make sure this is a valid copy.  In this case, because (say) I’ve tracked the bad user’s steps through vCenter logging, I know exactly when I need to recover.  One important note though: as you’ll see in a second, all snapshots taken AFTER your recovery image will be deleted!  But again, because I’m a real maverick, I just tell it to go ahead and do the production recovery.

xsumm8

It gives me a warning that prod is going to be overwritten and that data transfer will be paused.  It doesn’t warn you about the snapshot deletion, but this has historically been RP behavior.

xsumm9

On the host side I do a rescan, and there’s my datastore.  It is unmounted at the moment so I’ll choose to mount it.

xsumm10

Next, because I deleted that VM I need to browse the datastore and import the VMX file back into vCenter.
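
If you would rather script this re-registration step than click through the datastore browser, a pyVmomi sketch along these lines should do it.  The vCenter address, credentials, VMX path, and the “first datacenter / first cluster” lookups are all placeholders for my lab, so adjust for your environment.

```python
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVim.task import WaitForTask

# Lab placeholders - substitute your own vCenter, credentials, and VMX path
VCENTER, USER, PWD = "vcenter.lab.local", "administrator@vsphere.local", "password"
VMX_PATH = "[Important_Datastore] Important_VM/Important_VM.vmx"

ctx = ssl._create_unverified_context()   # lab shortcut; validate certs in production
si = SmartConnect(host=VCENTER, user=USER, pwd=PWD, sslContext=ctx)
try:
    content = si.RetrieveContent()
    datacenter = content.rootFolder.childEntity[0]   # first datacenter in my lab
    cluster = datacenter.hostFolder.childEntity[0]   # first cluster in that datacenter
    # Register the orphaned VMX back into inventory under the cluster's root resource pool
    task = datacenter.vmFolder.RegisterVM_Task(
        path=VMX_PATH,
        asTemplate=False,
        pool=cluster.resourcePool,
    )
    WaitForTask(task)
    print("Re-registered:", task.info.result.name)
finally:
    Disconnect(si)
```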

xsumm11 xsumm12

And just like that I’ve recovered my VM.  Easy as pie!

xsumm13

Now, notice that I recovered using the 2:25 snap, and below is my snapshot list afterwards.  The 3:25 snap and the 2:25 snap that I used are both deleted.  This is actually kind of interesting, because an awesome feature of XtremIO is that all snaps (even snaps of snaps) are independent entities; intermediate snaps can be deleted with no consequence.  So I don’t think this deletion of all subsequent snaps is strictly a requirement; however, it certainly makes logical sense that they should be deleted to avoid confusion.  I don’t want a snapshot of bad data hanging around in my environment.

xsumm14

Summary

In summary, this snap-based recovery looks fantastic as long as you take the time to understand the behavior.  Like most things, planning is essential to ensure you get a good balance of required protection and capacity savings.  I hope to see more detailed breakdowns from EMC on the behavior of the snapshot pruning policies, and on the full impact that settings like the Required Protection Window have on the environment.

Also, don’t underestimate the 8,192 max snaps+vols limit for a single XMS system, especially if you are managing multiple clusters per XMS!  If I had to guess, I’d say this value will be bumped up in a future release considering these new factors, but in the meantime make sure you don’t overrun your environment.  Remember, you can still use a single XMS per cluster to (sort of) artificially inflate your snap ceiling.
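
To put a rough number on how fast that ceiling gets eaten, here is a back-of-the-napkin sketch.  It assumes the production and replica clusters hang off the same XMS and that each protected volume carries one replica volume plus its retained snaps; your topology and the exact objects RP creates may differ.

```python
XMS_LIMIT = 8192   # max volumes + snapshots per XMS

def xms_objects(protected_volumes, snaps_per_volume, other_volumes=0):
    """Rough count of volumes + snapshots charged against the per-XMS limit."""
    prod = protected_volumes                      # production volumes
    replica = protected_volumes                   # one replica volume each
    snaps = protected_volumes * snaps_per_volume  # retained snaps on the replica side
    return prod + replica + snaps + other_volumes

used = xms_objects(protected_volumes=60, snaps_per_volume=48)
print(used, "of", XMS_LIMIT)   # 3000 of 8192 - over a third of the ceiling already
```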

Bonus Deets!

A couple of things of note.

First, in my last post I stated that I had noticed a bug with settings not “sticking.”  After talking with a customer, it sounds like this doesn’t have to do with the settings (the values) but with the process itself – something about the order of operations is important here.  And now I believe this to be true, because if I recreate a CG with those same busted settings, it works every time!  I can’t get it to break. 🙂  I still believe this to be a bug, so just double-check your CG settings after creating them.

Second, keep in mind that today the XtremIO dashboard displays your provisioned capacity based on all volumes and snapshots on the system, with no regard for who created those snaps.  So you can imagine that with a snap-based recovery tool, things get out of hand quickly. I’m talking about 1.4PB (no typo – PETAbytes) “provisioned” on a 20TB brick!

DC2_20T

While this is definitely a testament to the power (or insanity?) of thin provisioning, I’m trying to put in a feature request to get this fixed in the future because it really hurts the relevance of the dashboard.  For the moment, just note the following for anything you protect with RP (a rough sketch of the math follows the list):

  • On the Production side, you will see a 2x factor of provisioning.  So if you protected 30TB of LUNs, your provisioned space (from those LUNs) will be 60TB.
  • On the Replica side, you will see a hilarious factor of provisioning, depending on how many snaps you are keeping.
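
Here is the rough sketch of where numbers like that come from.  It assumes the dashboard simply counts every volume and every snapshot at its full provisioned size – roughly 2x on the production array and roughly (1 + snap count)x on the replica array; the factors are approximations based on what I see in my lab.

```python
def apparent_provisioned_tb(protected_tb, retained_snaps):
    """Rough model of what the dashboard reports for RP-protected LUNs."""
    production_side = 2 * protected_tb                  # the 2x factor on the prod array
    replica_side = (1 + retained_snaps) * protected_tb  # replica LUNs plus every snap at full size
    return production_side, replica_side

prod, repl = apparent_provisioned_tb(protected_tb=30, retained_snaps=48)
print(prod, "TB on production,", repl, "TB on the replica side")   # 60 TB and 1470 TB (~1.4 PB)
```

Numbers in that neighborhood are how you end up with a 1.4PB “provisioned” figure on a 20TB brick.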

I hope this series has been useful – I’m really excited about this new technology pairing!

RAID: Part 6 – WrapUp

Finally the end – what a long, wordy trip it has been.  If you waded through all 5 posts, awesome!

As a final post, I wanted to attempt to bring all of the high points together and draw some contrasts between the RAID types I’ve discussed.  My goal with this post is less about the technical minutiae and more about providing some strong direction to equip readers to make informed decisions.

Does Any of This Matter?

I always spend some time asking myself this question as I dive further and further down the rabbit hole on topics like this.  It is certainly possible that you can interact with storage and not understand details about RAID.  However I am a firm believer that you should understand it.  RAID is the foundation on which everything is built.  It is used in almost every storage platform out there.  It dictates behavior.  Making a smart choice here can save you money or waste it.  It can improve storage performance or cripple it.

I also like the idea that understanding the building blocks can later empower you to understand even more concepts.  For instance, if you’ve read through this you understand about mirroring, striping, and parity.  Pop quiz: what would a RAID5/0 look like?

raid50

Pretty neat that even without me describing it in detail, you can understand a lot about how this RAID type would function.  You’d know the failure capabilities and the write penalties of the individual RAID5 members.  And you’d know that the configuration couldn’t survive the failure of either RAID5 set, because of the top-level striping configuration.  And let’s say I told you the strip size of the RAID5 groups was 64KB, and that the strip size of the RAID0 config was 256MB.  Believe it or not, this is a pretty accurate description of a 10-disk VNX2 storage pool from a single-tier RAID5 perspective.
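
If you want to see that striping math spelled out, here is a small sketch that maps a logical byte offset onto this hypothetical layout: two 4+1 RAID5 sets striped together at the top level, 64KB RAID5 strips, 256MB RAID0 strips.  The parity rotation is deliberately simplified; real arrays use their own schemes.

```python
KB, MB = 1024, 1024 * 1024
RAID5_STRIP = 64 * KB      # strip size inside each 4+1 RAID5 set
RAID0_STRIP = 256 * MB     # strip size of the top-level RAID0 across the two sets
DATA_DISKS = 4             # 4 data + 1 parity per RAID5 stripe
RAID5_SETS = 2             # two RAID5 sets striped together (10 disks total)

def locate(offset):
    """Map a logical byte offset to (raid5_set, disk, offset_on_disk)."""
    # Top level: RAID0 round-robins 256MB strips across the two RAID5 sets
    r0_strip = offset // RAID0_STRIP
    raid5_set = r0_strip % RAID5_SETS
    offset_in_set = (r0_strip // RAID5_SETS) * RAID0_STRIP + offset % RAID0_STRIP

    # Inside the RAID5 set: 64KB strips across 4 data disks, parity rotating per stripe
    data_strip = offset_in_set // RAID5_STRIP
    stripe, data_col = divmod(data_strip, DATA_DISKS)
    parity_disk = stripe % (DATA_DISKS + 1)            # simplified rotation
    disk = data_col if data_col < parity_disk else data_col + 1
    offset_on_disk = stripe * RAID5_STRIP + offset_in_set % RAID5_STRIP
    return raid5_set, disk, offset_on_disk

for off in (0, 64 * KB, 256 * MB, 512 * MB):
    print(off, "->", locate(off))
```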

Again to me this is part of the value – when fancy new things come out, the fundamental building blocks are often the same.  If you understand the functionality of the building block, then you can extrapolate functionality of many things.  And if I give you a new storage widget to look at, you’ll instantly understand certain things about it based on the underlying RAID configuration.  It puts you in a much better position than just memorizing that RAID5 is “parity.”

Okay, I’m off my soapbox!

Workload – Read

  • RAID1/0 – Great
  • RAID5 – Great
  • RAID6 – Great

I’ve probably hammered this home by now, but when we are looking at largely read workloads (or just the read portion of any workload) the RAID type is mostly irrelevant from a performance perspective in non-degraded mode.  But as with any blanket statement, there are caveats.  Here are some things to keep in mind.

  • Your read performance will depend almost entirely on the underlying disk (ignoring sequential reads and prefetching).  I’m not talking about the obvious flash vs NLSAS; I’m talking about RAID group sizing.  As a general statement I can say that RAID1/0 performs identically to RAID5 for pure read workloads, but an 8 disk RAID1/0 is going to outperform a 4+1 RAID5.
  • Ask the question and do tests to confirm: does your storage platform round-robin reads between mirror pairs in RAID1/0?  If not (and not all controllers do), your RAID1/0 read performance is going to be constrained to half of the spindles.  Following on from the previous bullet, our 8 disk RAID1/0 would then be outperformed in reads by a 4+1 RAID5, because only 4 of the 8 spindles are actually servicing read requests – the quick sketch after this list puts numbers on it.
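
Here is that comparison with rough numbers attached, assuming ~180 read IOPS per SAS spindle and ignoring cache and prefetch entirely:

```python
DISK_READ_IOPS = 180   # rough planning number for a single SAS spindle

def raid10_read_iops(disks, round_robin):
    # Without round-robin, only one side of each mirror pair services reads
    read_spindles = disks if round_robin else disks // 2
    return read_spindles * DISK_READ_IOPS

def raid5_read_iops(data_disks, parity_disks=1):
    # In RAID5 every member holds data, so all spindles service reads
    return (data_disks + parity_disks) * DISK_READ_IOPS

print(raid10_read_iops(8, round_robin=True))    # 1440
print(raid10_read_iops(8, round_robin=False))   # 720
print(raid5_read_iops(4))                       # 900 - the 4+1 RAID5 wins this one
```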

Workload – Write

  • RAID1/0 – Great (write penalty of 2)
  • RAID5 – Okay (write penalty of 4)
  • RAID6 – Bad (write penalty of 6)

Writes are where the RAID types start to diverge pretty dramatically due to the vastly different write penalties between them.  Yet once again sometimes people draw the wrong conclusion from the general idea that RAID1/0 is more efficient at writes than RAID6.

  • The underlying disk structure is still dramatically important.  A lot of people focus on “workload isolation,” meaning, for example, that with a database I would put the data on RAID5 and the transaction logs on RAID1/0.  This is a great idea from a design perspective when starting with a blank slate.  However, what if the RAID5 disk pool I’m working with is 200 disks and I only have 4 disks for RAID1/0?  In this case I’m pretty much a lock to have better success dropping the logs into the RAID5 pool, because there are WAY more spindles to support the I/O.  There are a lot of variables here around the workload, but the point I’m trying to make is that you should look at all the parts as a whole when making these decisions.
  • If your write workload is large block sequential, take a look at RAID5 or RAID6 over RAID1/0 – you will typically see much more efficient I/O in these cases.  However, make sure you do proper analysis and don’t end up with heavy small block random writes on RAID6.

Going back and re-reading some of my previous posts, I feel like I may have given the impression that I don’t like RAID1/0.  Or that I don’t see value in RAID1/0.  That is certainly not the case and I wanted to draw an example to show when you need to use RAID1/0 without question.  That example is when we see a “lot” of small block random writes and don’t need excessive amounts of capacity.  What is a “lot”?  Good question.  Typically the breaking point is around 30-40% write ratio.

Given that a SAS drive should only be asked to support around 180 IOPS, let’s crunch some numbers for an imaginary 10,000 front-end IOPS workload. How many spindles do we need to support the workload at specific read/write ratios?  (I will do another blog post on the specifics of these calculations.)

Read/Write Ratio    RAID1/0 disk count    RAID5 disk count    RAID6 disk count
90%/10%             62                    73                  78
75%/25%             70                    98                  123
60%/40%             78                    125                 167

So, at lighter write percentages, the difference in RAID type doesn’t matter as much.  But as we already learned, RAID1/0 is the most efficient at back-end writes, and this gets incredibly apparent at the 60/40 split.  In fact, I need over twice the number of spindles if I choose RAID6 instead of RAID1/0 to support the workload.  Twice the amount of hardware up front, and then twice the number of power suckers and heat producers sitting in your data center for years.
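
Until that post shows up, here is the gist of the math behind the table: back-end IOPS = reads + (write penalty × writes), divided by what a single spindle can do.  My simple version doesn’t reproduce every cell exactly (the RAID6 numbers in particular come out a bit higher), but the shape of the comparison is the same.

```python
import math

WRITE_PENALTY = {"RAID1/0": 2, "RAID5": 4, "RAID6": 6}
DISK_IOPS = 180            # planning number for one SAS spindle
FRONT_END_IOPS = 10_000

def spindles_needed(read_pct, raid_type, front_end=FRONT_END_IOPS, disk_iops=DISK_IOPS):
    """Back-end IOPS = reads + (write penalty x writes); divide by per-disk IOPS."""
    reads = front_end * read_pct / 100
    writes = front_end * (100 - read_pct) / 100
    back_end = reads + WRITE_PENALTY[raid_type] * writes
    return math.ceil(back_end / disk_iops)

for read_pct in (90, 75, 60):
    counts = {r: spindles_needed(read_pct, r) for r in WRITE_PENALTY}
    print(f"{read_pct}/{100 - read_pct}:", counts)
```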

Capacity Factor

  • RAID1/0 – Bad (50% penalty)
  • RAID5 – Great (generally ~20% penalty or less)
  • RAID6 – Great (generally ~25% penalty or less)

Capacity is a pretty straightforward thing, so I’m not going to belabor the point – you need some amount of capacity, and you can very quickly calculate how many disks you need for each RAID type (there’s a quick sketch of that math after the bullets below).

  • You can get more or less capacity out of RAID5 or 6 by adjusting RAID group size, though remember the protection caveats.
  • Remember that in some cases (for instance, storage pools on an EMC VNX) a choice of RAID type today locks you in on that pool forever.  By this I mean to say, if someone else talks you into RAID1/0 today and it isn’t needed, not only is it needlessly expensive today, but as you add storage capacity to that pool it is needlessly expensive for years.
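
Here is the basic usable-capacity math behind those percentages.  The group sizes (4+1 RAID5, 6+2 RAID6) are just common examples; your platform’s defaults may differ.

```python
def usable_fraction(raid_type, group_size):
    """Fraction of raw capacity left for data in a single RAID group."""
    if raid_type == "RAID1/0":
        return 0.5                               # every disk has a mirror
    if raid_type == "RAID5":
        return (group_size - 1) / group_size     # one disk's worth of parity
    if raid_type == "RAID6":
        return (group_size - 2) / group_size     # two disks' worth of parity
    raise ValueError(raid_type)

def raw_tb_needed(usable_tb, raid_type, group_size):
    return usable_tb / usable_fraction(raid_type, group_size)

# Raw disk needed for 100TB usable
print(raw_tb_needed(100, "RAID5", 5))             # 125.0  (4+1 -> 20% penalty)
print(round(raw_tb_needed(100, "RAID6", 8), 1))   # 133.3  (6+2 -> 25% penalty)
print(raw_tb_needed(100, "RAID1/0", 2))           # 200.0  (50% penalty)
```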

Protection Factor

  • RAID1/0 – Lottery! (meaning, there is a lot of random chance here)
  • RAID5 – Good
  • RAID6 – Great

As we’ve discussed, the types vary in protection factor as well.

  • Because of RAID1/0’s lottery factor on losing the 2nd disk, the only thing we can state for certain is that RAID1/0 and RAID6 are better than RAID5 from a protection standpoint.  By that I mean, it is entirely possible that the 2nd simultaneous disk failure will invalidate a RAID1/0 set if it is the exact right disk, but there is a chance that it won’t.  For RAID5, a 2nd simultaneous failure will invalidate the set every time.
  • Remember that RAID1/0 is much better behaved in a degraded and rebuild scenario than RAID5 or 6.  If you are planning on squeezing every ounce of performance out of your storage while it is healthy, and can’t stand any additional performance hit during a failure, RAID1/0 is probably a better choice.  Although I will say that I don’t recommend running a production environment like this!
  • You can squeeze extra capacity out of RAID5 and 6 by increasing the RAID group size, but keep it within sane limits.  Don’t forget the extra trouble you can have from a fault domain and degraded/rebuild standpoint as the RAID group size gets larger.
  • Finally, remember that RAID is not a substitute for backups.  RAID will do the best it can to protect you from physical failures, but it has limits and does nothing to protect you from logical corruption.

Summary

I think I’ve established that there are a lot of factors to consider when choosing a RAID type.  At the end of the day, you want to satisfy requirements while saving money.  In that vein, here are some summary thoughts.

If you have a very transactional database, or are looking into VDI, RAID1/0 is probably going to be very appealing from a cost perspective, because these workloads tend to be IOPS constrained with a heavy write percentage.  On the other hand, less transactional databases, application storage, and file storage tend to be capacity constrained with a low write percentage.  In these cases RAID5 or 6 are going to look better.

In general the following RAID types are a good fit in the following disk tiers, for the following reasons:

  • EFD (a.k.a. Flash or SSD) – RAID5.  Response time here is not really an issue; instead you want to squeeze as much usable capacity as possible out of them, ’cause these puppies are pricey!  RAID5 does that for us.
  • SAS (a.k.a. FC) – RAID5 or RAID1/0.  The choice here hinges on write percentage.  RAID6 on these guys is typically a waste of space and an added write penalty; they rebuild fast enough that RAID5 is acceptable.  Note – as these disks get larger and larger this may shift towards RAID1/0 or RAID6 due to rebuild times or even UBEs, but these drives are enterprise grade and have a far lower UBE rate than NL-SAS.
  • NLSAS (a.k.a. SATA) – RAID6.  Please use RAID6 for these disks.  As previously stated, they need the added protection of the extra parity, and you should be able to justify the cost.

Again, this is just in general, and I can’t overstate the need for solid analysis.

Hopefully this has been accurate and useful. I really enjoyed writing this up and hope to continue producing useful (and accurate!) material in the future.