EMC RecoverPoint and XtremIO Part 4 – Recovery and Summary

In this final post we are going to cover a simple recovery, as well as do a quick summary.  I’ll throw in a few bonus details for free.

Recovery

Our CG has been running now for over 48 hours with our configuration – a 48-hour Required Protection Window, a maximum of 48 snaps, and one snap per hour.  Notice below that I have exactly (or just under, depending on how you measure) a 48-hour protection window.  I have one snap per hour for 48 hours, and that is what is retained.  This is because of how I constructed my settings!

[Screenshot: xsumm1]

If I reduce my Required Protection Window to 24 hours, notice that IMMEDIATELY the snaps past 24 hours are nuked:

[Screenshot: xsumm2]

The distribution of snaps in this case wouldn’t be different because of how the CG is constructed (one snap per hour, 48 max snaps, 24 hour protection window = 1 snap per hour for 24 hours), but again notice that the Required Protection Window is much more than just an alerting setting in RP+XtremIO.
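
To make that pruning behavior concrete, here is a minimal Python sketch of the retention logic as I understand it.  This is my own illustration of the behavior, not RP's actual implementation: drop anything older than the Required Protection Window, then keep at most the configured number of snaps.

```python
from datetime import datetime, timedelta

def prune_snapshots(snaps, max_snaps, protection_window_hours, now=None):
    """Rough model of the retention behavior described above (not RP's
    actual algorithm): anything older than the Required Protection Window
    is dropped immediately, and at most max_snaps snapshots are kept."""
    now = now or datetime.now()
    window_start = now - timedelta(hours=protection_window_hours)
    # Newest first; anything at or beyond the window edge is discarded.
    in_window = sorted((s for s in snaps if s > window_start), reverse=True)
    return in_window[:max_snaps]

# One snap per hour for the last 48 hours.
now = datetime(2015, 6, 1, 12, 0)
snaps = [now - timedelta(hours=h) for h in range(48)]

# 48-hour window, 48 max snaps, one per hour -> all 48 retained.
print(len(prune_snapshots(snaps, 48, 48, now)))   # 48
# Drop the window to 24 hours and the older snaps are pruned right away.
print(len(prune_snapshots(snaps, 48, 24, now)))   # 24
```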

Alright, back to our recovery example.  Someone dumb like myself ignored all the “Important” naming and decided to delete that VM.

[Screenshot: xsumm3]

Even worse, they decided to just delete the entire datastore afterwards.

[Screenshot: xsumm4]

But lucky for us we have RP protection enabled.  I’m going to head to RP and use the Test a Copy and Recover Production button.

[Screenshot: xsumm5]

I’ll choose my replica volume:

[Screenshot: xsumm6]

Then I decide I don’t want to use the latest image because I’m worried that the deletion actually exists in that snapshot.  I choose one hour prior to the latest snap.  Quick note: see that virtual access is not even available now?  That’s because with snap based promotion there is no need for it.  Snaps are instantly promoted to the actual replica LUN, so physical access is always available and always immediate no matter how old the image.

[Screenshot: xsumm7]

After I hit next, it spins up the Test a Copy screen.  Now normally I might want to map this LUN to a host and actually check it to make sure that this is a valid copy.  In this case, because (say) I’ve tracked the bad user’s steps through vCenter logging, I know exactly when I need to recover.  One important note, though: as you’ll see in a second, all snapshots taken AFTER your recovery image will be deleted!  But again, because I’m a real maverick I just tell it to go ahead and do the production recovery.

[Screenshot: xsumm8]

It gives me a warning that prod is going to be overwritten, and that data transfer will be paused.  It doesn’t warn you about the snapshot deletion, but this has historically been RP behavior.

[Screenshot: xsumm9]

On the host side I do a rescan, and there’s my datastore.  It is unmounted at the moment so I’ll choose to mount it.

[Screenshot: xsumm10]

Next, because I deleted that VM I need to browse the datastore and import the VMX file back into vCenter.

[Screenshots: xsumm11, xsumm12]

And just like that I’ve recovered my VM.  Easy as pie!

[Screenshot: xsumm13]

Now, notice that I recovered using the 2:25 snap, and below is my snapshot list afterwards.  The 3:25 snap and the 2:25 snap that I used are both gone.  This is actually kind of interesting, because an awesome feature of XtremIO is that all snaps (even snaps of snaps) are independent entities; intermediate snaps can be deleted with no consequence.  So I don’t necessarily think this deletion of all subsequent snaps is a technical requirement; however, it certainly makes logical sense to delete them to avoid confusion.  I don’t want a snapshot of bad data hanging around in my environment.

[Screenshot: xsumm14]

Summary

In summary, this snap-based recovery looks fantastic as long as you take the time to understand the behavior.  Like most things, planning is essential to ensure you get a good balance of required protection and capacity savings.  I hope to see more detailed breakdowns from EMC on the behavior of the snapshot pruning policies, and on the full impact that settings like Required Protection Window have in the environment.

Also, don’t underestimate the 8,192 max snaps+vols for a single XMS system, especially if you are managing multiple clusters per XMS!  If I had to guess I would guess that this value will be bumped up in a future release considering these new factors, but in the meantime make sure you don’t overrun your environment.  Remember, you can still use a single XMS per cluster in order to sort of artificially inflate your snap ceiling.

Bonus Deets!

A couple of things of note.

First, in my last post I stated that I had noticed a bug with settings not “sticking.”  After talking with a customer, I learned this doesn’t have to do with the settings (the values) but with the process itself.  Something about the order of operations is important here.  And I now believe this to be true, because if I recreate a CG with those same busted settings, it works every time!  I can’t get it to break. 🙂  I still believe this to be a bug, so just double-check your CG settings after creating them.

Second, keep in mind that today the XtremIO dashboard displays your provisioned capacity based on all volumes and snapshots on the system, with no regard for who created those snaps.  So you can imagine that with a snap-based recovery tool, things get out of hand quickly. I’m talking about 1.4PB (no typo – PETAbytes) “provisioned” on a 20TB brick!

[Screenshot: DC2_20T]

While this is definitely a testament to the power (or insanity?) of thin provisioning, I’m trying to put in a feature request to get this fixed in the future because it really messes with the dashboard relevance.  But for the moment just note that for anything you protect with RP:

  • On the Production side, you will see a 2x factor of provisioning.  So if you protected 30TB of LUNs, your provisioned space (from those LUNs) will be 60TB.
  • On the Replica side, you will see a hilarious factor of provisioning, depending on how many snaps you are keeping (see the rough math below).
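
Here is the rough math, just to set expectations.  The 2x production factor comes straight from the bullet above; the replica-side formula (replica volume plus one full-size snap per retained point in time) is my own simplifying assumption, not an official calculation.

```python
# Rough illustration of the provisioning inflation on the dashboard.
protected_tb = 30        # capacity of the LUNs you are protecting
snaps_retained = 48      # snaps kept on the replica copy

production_provisioned = protected_tb * 2                  # per the bullet above
replica_provisioned = protected_tb * (1 + snaps_retained)  # assumption: replica + snaps

print(f"Production side: {production_provisioned} TB provisioned")  # 60 TB
print(f"Replica side:    {replica_provisioned} TB provisioned")     # 1470 TB
```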

I hope this series has been useful – I’m really excited about this new technology pairing!

SAN vs NAS Part 5: Summary

We’ve covered a lot of information over this series, some of it more easily consumable than others.  Hopefully it has been a good walkthrough of the main differences between SAN and NAS storage, and presented in a little different way than you may have seen in the past.

I wanted to summarize the high points before focusing on a few key issues:

  • SAN storage is fundamentally block I/O, which is SCSI.  With SAN storage, your local machine “sees” something that it thinks is a locally attached disk.  In this case your local machine manages the file system, and transmissions to the array are simple SCSI requests.
  • NAS storage is file I/O, which is either NFS or CIFS.  With NAS storage, your local machine “sees” a service to connect to on the network that provides file storage.  The array manages the file system, and transmissions to the array are protocol specific file based operations.
  • SAN and NAS have different strengths, weaknesses, and use cases.
  • SAN and NAS are very different from a hardware and protocol perspective.
  • SAN and NAS are sometimes only offered on specific array platforms.

Our Question

So back to our question that started this mess: with thin provisioned block storage, if I delete a ton of data out of a LUN, why do I not see any space returned on the storage array?  We know now that this is because there is no such thing as a delete in the SAN/block/SCSI world.  Thin provisioning works by allocating storage you need on demand, generally because you tried to write to it.  However once that storage has been allocated (once the disk has been created), the array only sees reads and writes, not creates and deletes.  It has no way of knowing that you sent over a bunch of writes that were intended to be a delete.  The deletes are related to the file system, which is being managed by your server, not the array.  The LUN itself is below the file system layer, and is that same disk address space filled with data we’ve been discussing.  Deletes don’t exist on SAN storage, apart from administratively deleting an entire object – LUN, RAID set, Pool, etc.

With NAS storage, on the other hand, the array does manage the file system.  You tell it when to delete something by sending it a delete command via NFS or CIFS, so it certainly knows that you want to delete it.  In this manner, file system allocations on NAS devices usually fluctuate in capacity.  They may be using 50GB out of 100GB today, but only 35GB out of 100GB tomorrow.

Note: there are ways to reclaim space, either on the array side with thin reclamation (if it is supported) or on the host side with the SCSI UNMAP command (if it is supported).  Both of these methods will allow you to reclaim some or all of the deleted space on a block array, but they have to be run as a separate operation from the delete itself.  It is not a true “delete” operation, but it may result in less storage allocated.
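
If it helps, here is a toy Python model of a thin LUN from the array’s point of view.  It is not how any real array is implemented, but it shows why a file system delete is invisible to the array while an explicit UNMAP is not.

```python
class ThinLUN:
    """Toy model of a thin LUN from the array's point of view (not how any
    real array implements this). The array sees block reads, writes, and
    UNMAPs -- it never sees a file system 'delete'."""

    def __init__(self, size_blocks):
        self.size_blocks = size_blocks
        self.allocated = {}              # block address -> data

    def write(self, lba, data):
        self.allocated[lba] = data       # first write to an LBA allocates it

    def read(self, lba):
        return self.allocated.get(lba, b"\x00")

    def unmap(self, lba):
        self.allocated.pop(lba, None)    # explicit reclaim, e.g. SCSI UNMAP

    @property
    def consumed_blocks(self):
        return len(self.allocated)


lun = ThinLUN(size_blocks=1000)
for lba in range(500):
    lun.write(lba, b"data")              # host writes 500 blocks
print(lun.consumed_blocks)               # 500

# The host "deletes" files, but that only updates file system metadata --
# from the array's perspective nothing happens, so consumption stays at 500.
# Only an explicit UNMAP (or array-side reclamation) gives the space back:
for lba in range(300):
    lun.unmap(lba)
print(lun.consumed_blocks)               # 200
```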

Which Is Better?

Yep, get out your battle gear and let’s duke it out!  Which is better?  SAN vs NAS!  Block vs File!  Pistols at high noon!

Unfortunately, as engineers we often focus on this “something must be the best” idea.

Hopefully if you’ve read this whole thing you realize how silly this question is, for the most part.  SAN and NAS storage occupy different areas and cover different functions.  Most things that need NAS functionality (many access points and permissions control) don’t care about SAN functionality (block level operations and utilities), and vice versa.  This question is kind of like asking which is better, a toaster or a door stop?  Well, do you need to toast some delicious bread or do you need to stop a delicious door?

In some cases there is overlap.  For example, vSphere datastores can be accessed over block protocols or NAS (NFS).  In this case what is best is most often going to be – what is the best fit in the environment?

  • What kind of hardware do you have (or what kind of budget do you have)?
  • What kind of admins do you have and what are their skillsets?
  • What kind of functionality do you need?
  • What else in the environment needs storage (i.e. does something else need SAN storage or NFS storage)?
  • Do you have a need for RDMs (LUNs mapped directly from the array in order to expose some of the SCSI functionality)?

From a performance perspective 10Gb NFS and 10Gb iSCSI are going to do about the same for you, and honestly you probably won’t hit the limits of those anyway.  These other questions are far more critical.

Which leads me to…

What Do I Need?

A pretty frequently asked question in the consulting world – what do I need, NAS or SAN?  This is a great question to ask and think about, but again it goes back to: what do you need to do?

Do you have a lot of user files that you need remote access to?  Windows profiles or home directories?  Then you probably need NAS.

Do you have a lot of database servers, especially ones that utilize clustering?  Then you probably need SAN.

Truthfully, most organizations need some of both – the real question is in what amounts.  This will vary for every organization but hopefully armed with some of the information in this blog series you are closer to making that choice for your situation.

SAN vs NAS Part 1: Intro

Welcome to the New Year!

I wanted to write a blog post on a very confusing storage topic (at least for myself), but I have also been searching for another large-scale topic similar to the set I wrote on RAID last year.  After thinking about it, I feel like my confusing question is really just a subset of a misunderstanding about block storage.  So without further ado, I’m going to write up a pretty detailed breakdown of SAN (Storage Area Network), or block storage, vs NAS (Network Attached Storage), or file storage.  This is another topic, like RAID, that is fundamental and basic but not always fully understood.

Certainly there are other write ups on this topic out there, and in ways this can be summed up in just a few bullet points.  But I think a larger discussion will really help solidify understanding.

The specific confusing question I’ll ask and hopefully answer is this: with thin-provisioned block storage, if I delete a ton of data out of a LUN, why do I not see any space returned on the storage array?  Say I’ve got a thin 1TB LUN on my VMAX, and it is currently using (has allocated) 500GB of space.  I go to the server where this LUN is attached and delete 300GB of data.  Querying the VMAX, I still see 500GB of space used.

This concept is hard to understand and I’ve not only asked this question myself, I’ve fielded it from several people in a variety of roles.  Central to understanding this concept is understanding the difference between file and block storage.

To start out, let’s briefly define the nature of things about file and block storage.

SAN – Block Storage

The easiest way to think of SAN is as a disk drive directly attached to a computer.  Block storage access is no different from plugging in a USB drive, or installing another hard drive into the server, as far as how the server accesses it.  The medium for accessing it over your SAN varies with protocols and hardware, but at the end of the day you’ve got a disk drive (block device) to perform I/O with.

NAS – File Storage

The idea with NAS is that you are accessing files stored on a file server somewhere.  So I have a computer system in the corner that has a network service running on it, and my computer on my desk connects to that system.  Generally this connection is going to be CIFS (for Windows) or NFS (for *nix/vSphere).  The file protocol here varies but we are (most of the time) going to be running over IP.  And yes, sometimes Linux folks access CIFS shares and sometimes Windows folks do NFS, but these are exceptions to the rule.

In part 2, I’ll be covering more of the differences and similarities between these guys.

Thoughts on Thin Provisioning

I finally decided to put all my thoughts down on the topic of thin provisioning.  I wrestled with this post for a while because some of what I say is going to go kinda-sorta against a large push in the industry towards thin provisioning.  This is not a new push; it has been happening for years now.  This post may even be a year or two too late…

I am not anti-thin – I am just not 100% pro-thin.  I think there are serious questions that need to be addressed and answered before jumping on board with thin provisioning.  And most of these are relatively non-technical; the real issue is operational.

Give me a chance before you throw the rocks.

What is Thin Provisioning?

First let’s talk about what thin provisioning is, for those readers who may not know.  I feel like this is a pretty well known and straightforward concept so I’m not going to spend a ton of time on it.  Thin provisioning at its core is the idea of provisioning storage space “on demand.”

Before thin provisioning, a storage administrator would have some pool of storage resources that provided some amount of capacity.  This could be simply a RAID set or even an actual pooling mechanism like Storage Pools on VNX.  A request for capacity would come in and they would “thick provision” capacity out of the pool.  The result was that the requested capacity would be reserved from the pooled capacity and be unavailable for any other use…except obviously for whatever purpose it was provisioned for.  So for example if I had 1000GB and you requested a 100GB LUN, my remaining pool space would be 900GB.  I could use the 900GB for whatever I wanted but couldn’t encroach into your 100GB space – that was yours and yours alone.  This is a thick provisioned LUN.

Of course back then it wasn’t “thick provisioning,” it was just “provisioning” until thin came along! With thin provisioning, after the request is completed and you’ve got your LUN, the pool is still at 1000GB (or somewhere very close to it, due to metadata allocations which are beyond the scope of this post).  I have given you a 100GB LUN out of my 1000GB pool and I still have 1000GB available.  Remember that as soon as you get this 100GB LUN, you will usually put a file system on it and then it will appear empty.  This emptiness is the reason the 100GB LUN doesn’t take up any space…there isn’t really any data on it until you put it there.

Essentially the thin LUN is going to take up no space until you start putting stuff into it.  If you put 10GB of data into the LUN, then it will take up 10GB on the back end.  My pool will now show 990GB free.  Your array should give you a couple of indicators for this: something like allocated (or subscribed, or committed) and consumed (or used).  Allocated/subscribed/committed is typically how much you as the storage administrator have created in the pool.  Consumed or used is how much the servers themselves have actually eaten up.
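
A tiny sketch of those two counters, using the 1000GB pool and 100GB LUN from the example above (the counter names vary by vendor; this is just an illustration):

```python
# Minimal sketch of the two pool counters described above -- not any
# particular array's logic, just the accounting idea.
class ThinPool:
    def __init__(self, capacity_gb):
        self.capacity_gb = capacity_gb
        self.subscribed_gb = 0     # what the storage admin has handed out
        self.consumed_gb = 0       # what hosts have actually written

    def provision_lun(self, size_gb):
        self.subscribed_gb += size_gb      # no space reserved up front

    def host_writes(self, gb):
        self.consumed_gb += gb             # space is only consumed on write

    @property
    def free_gb(self):
        return self.capacity_gb - self.consumed_gb


pool = ThinPool(capacity_gb=1000)
pool.provision_lun(100)           # the 100GB LUN from the example above
print(pool.free_gb)               # 1000 -- nothing written to it yet
pool.host_writes(10)              # server drops 10GB of data onto the LUN
print(pool.free_gb)               # 990
print(pool.subscribed_gb, pool.consumed_gb)   # 100 10
```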

What follows are, in no particular order, some things to keep in mind when thin provisioning.

Communication between sysadmin and storage admin

This seems like a no-brainer but a discussion needs to happen between the storage admins providing the storage and the sysadmins who are consuming it.  If a sysadmin is given some space, they typically see this as space they can use for whatever they want.  If they need a dumping ground for a big ISO, they can use the SAN attached LUN with 1TB of free space on it.  Essentially they will likely feel that space you’ve allocated is theirs to do whatever they want with.  This especially makes sense if they’ve been using local storage for years.  If they can see disk space on their server, they can use it as they please.  It is “their” storage!

You need to have this conversation so that sysadmins understand activities and actions that are “thin hostile.”  A thin hostile action is one that effectively nullifies the benefit of thin provisioning by eating up space from day 1.  An example of a thin hostile action would be hard formatting the space a 500GB database will use up front, before it is actually in use.  Another example of a thin hostile action would be to do a block level zero formatting of space, like Eager Zero Thick on ESX.  And obviously using excess free space on a LUN for a file dumping ground is extremely thin hostile!

Another area of concern here is deduplication.  If you are using post-process deduplication, and you have thin provisioned storage, your sysadmins need to be aware of this when it comes to actions that would overwrite a significant amount of data.  You may dedupe their data space by 90%, but if they come in and overwrite everything it can balloon quickly.

The more your colleagues know about how their actions can affect the underlying storage, the less time you will spend fire fighting.  Good for them, good for you.  You are partners, not opponents!

Oversubscription & Monitoring

With thin provisioning, because no actual reservation happens on disk, you can provision as much storage as you want out of as small a pool as you want.  When you exceed the physical media, you are “oversubscribing” (or overcommitting, or overprovisioning, or…).  For instance, with your 1000GB you could provision 2000GB of storage.  In this case you would be 100% oversubscribed.  You don’t have issues as long as the total used or consumed portion is less than 1000GB.

There are a lot of really appealing reasons for doing this.  Most of the time people ask for more storage than they really need…and if it goes through several “layers” of decision makers, that might amplify greatly.  Most of the time people don’t need all of the storage they asked for right off the bat.  Sometimes people ask for storage and either never use it or wait a long time to use it.  The important thing to never forget is that from the sysadmin’s perspective, that is space you guaranteed them!  Every last byte.

Oversubscription is a powerful tool, but you must be careful about it.  Essentially this is a risk-reward proposition: the more people you promise storage to, the more you can leverage your storage array, but the more you risk that they will actually use it.  If you’ve given out 200% of your available storage, that may be a scary situation when a couple of your users decide to make good on the promise of space you made to them.  I’ve seen environments with as much as 400% oversubscription.  That’s a very dangerous gamble.

Thin provisioning itself doesn’t provide much benefit unless you choose to oversubscribe.  You should make a decision on how much you feel comfortable oversubscribing.  Maybe you don’t feel comfortable at all (if so, are you better off thick?).  Maybe 125% is good for you.  Maybe 150%.  Nobody can make this decision for you because it hinges on too many internal factors.  The important thing here is to establish boundaries up front.  What is that magic number?  What happens if you approach it?

Monitoring goes hand in hand with this.  If you monitor your environment by waiting for users to email that systems are down, oversubscribing is probably not for you.  You need to have a firm understanding of how much you’ve handed out and how much is being used.  Again, establish thresholds, establish an action plan for exceeding them, and monitor them.
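
As a sketch of the kind of check worth automating, here is a simple Python example.  The threshold numbers are placeholders – pick the ones your organization is comfortable with.

```python
# Simple sketch of an oversubscription/consumption check (thresholds and
# numbers are placeholders -- pick the ones that fit your organization).
def oversubscription_pct(subscribed_gb, capacity_gb):
    """How much more you have promised than you physically own, e.g.
    2000GB subscribed on a 1000GB pool = 100% oversubscribed."""
    return (subscribed_gb / capacity_gb - 1) * 100

def check_pool(capacity_gb, subscribed_gb, consumed_gb,
               max_oversub_pct=125, consumed_alert_pct=70):
    oversub = oversubscription_pct(subscribed_gb, capacity_gb)
    used = consumed_gb / capacity_gb * 100
    if oversub > max_oversub_pct:
        print(f"STOP provisioning: {oversub:.0f}% oversubscribed "
              f"(limit {max_oversub_pct}%)")
    if used > consumed_alert_pct:
        print(f"START the purchase process: pool is {used:.0f}% consumed")

check_pool(capacity_gb=1000, subscribed_gb=2300, consumed_gb=750)
# -> STOP provisioning: 130% oversubscribed (limit 125%)
# -> START the purchase process: pool is 75% consumed
```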

Establishing and sticking with thresholds like this really helps speed up and simplify decision making, and makes it very easy to measure success.  You can always re-evaluate the thresholds if you feel like they are too low or too high.

Also make sure your sysadmins are aware of whether you are oversubscribed or not, and what that means to them.  If they are planning on a massive expansion of data, maybe they can check with you first.  Maybe they requested storage for a project and waited 6 months for it to get off the ground – again they can check with you to make sure all is well before they start in on it.  These situations are not about dictating terms, but more about education.  Many other resources in your environment are likely oversubscribed.  Your network is probably oversubscribed.  If a sysadmin in the data center decided to suddenly multicast an image to a ton of servers on a main network line, you’d probably have some serious problems.  You probably didn’t design your network to handle that kind of network traffic (and if you did you probably wasted a lot of money).  Your sysadmins likely understand the potential DDoS effect this would generate, and will avoid it.  Nobody likes pain.

“Runway” to Purchase New Storage

Remember with thin provisioning you are generally overallocating and then monitoring (you are monitoring, aren’t you?) usage.  At some point you may need to buy more storage.

If you wait till you are out of storage, that’s no good, right?  You have a 100% consumed pool, with a bunch of attached hosts that think they have a lot more storage to run through.  If you have oversubscribed a pool of storage and it hits 100%, it is going to be a terrible, horrible, no good, very bad day for you and everyone around you.  At a minimum, new writes to anything in that pool will be denied, effectively turning your storage read-only.  At a maximum, the entire pool (and everything in it) may go offline, or you may experience a variety of fun data corruptions.

So, you don’t want that.  Instead you need to figure out when you will order new storage.  This will depend on things like:

  • How fast is your storage use growing?
  • How many new projects are you implementing?
  • How long does it take you to purchase new storage?

The last point is often not considered until it is too late.  When you need more storage you have to first figure out exactly what you need, then you need to spec it, then you need a quote, the quote needs approval, then purchasing, then shipping, then it needs to be racked and stacked, then implemented.  How long does this process take for your organization?  Again, nobody can answer this but you.  If your organization has a fast turnaround time, maybe you can afford to wait till 80% full or more.  But if you are very sluggish, you might need to start that process at 60% or less.
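
Here is some back-of-the-napkin math for turning growth rate and procurement time into a “start buying” threshold.  The inputs and the safety margin are made up for illustration – plug in your own numbers.

```python
# Back-of-the-napkin "when do I start the purchase?" math. Growth rate and
# lead time are made-up inputs -- plug in your own.
def reorder_threshold_pct(capacity_tb, growth_tb_per_month, procurement_months,
                          safety_margin_pct=10):
    """Start buying when consumed capacity crosses this percentage, so the
    new storage lands before the pool fills up."""
    runway_needed_tb = growth_tb_per_month * procurement_months
    threshold_tb = capacity_tb - runway_needed_tb
    return max(0, threshold_tb / capacity_tb * 100 - safety_margin_pct)

# Fast-moving shop: 1TB/month growth, 2 months from spec to racked.
print(round(reorder_threshold_pct(50, 1.0, 2)))   # ~86% -- can afford to wait
# Sluggish shop: same growth, 9 months of process.
print(round(reorder_threshold_pct(50, 1.0, 9)))   # ~72% -- start much earlier
```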

Another thing to consider: if you are a sluggish organization, you may save money by thick provisioning.  Say you will need 15TB of storage in 2 years.  Instead of buying it all up front, you buy 10TB of storage right off the bat with a 50% purchasing threshold.  As soon as you hit 5TB of storage used you buy another 10TB to put you at 20.  Then when you hit 10TB you buy another 10TB to put you at 30.  Finally at 15TB you purchase again and hit 40TB.  If you had bought 20TB to begin with and gone thick, you would never have needed to buy anything else.  This situation is probably uncommon, but I wanted to mention it as a thought exercise.  Think about how the purchasing process will impact the benefit you are trying to get from thin provisioning.

Performance Implications

Simply – ask your vendor whether thin storage has any performance difference over thick.  The answer with most storage arrays (where you have an actual choice between thick and thin) is yes.  Most of the time this is a negligible difference, and sometimes the difference is only in the initial allocation – that is to say, the first write to a particular LBA/block/extent/whatever.  But again, ask.  And test to make sure your apps are happy on thin LUNs.

Feature Implications

Thin provisioning may have feature implications on your storage system.

Sometimes thin provisioning enables features.  On a VMAX, thin provisioning enables pooling of a large number of disks.  On a VNX thin provisioning is required for deduplication and VNX Snapshots.

And sometimes thin provisioning either disables or is not recommended with certain features.  On a VNX thin LUNs are not recommended for use as File OE LUNs, though you can still do thin file systems on top of thick LUNs.

Ask what impact thin vs thick will have on array features – even ones you may not be planning to use at this very second.

Thin on Thin

Finally, in virtualized environments, in general you will want to avoid “thin on thin.”  This is a thin datastore created on a thin LUN.  The reason is that you tend to lose a static point of reference for how much capacity you are overprovisioning.  And if your virtualization team doesn’t communicate too well with the storage team, they could be unknowingly crafting a time bomb in your environment.

Your storage team might have decided they are comfortable with a 200% oversubscription level, and your virt team may have made the same decision.  Together, this could put your storage at 400% overallocation of what physically exists!  Each team is sticking to their game plan, but without knowing about and monitoring the other folks, they will never see the train coming.
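
Here is the arithmetic, with made-up numbers, showing how the two overcommit factors multiply:

```python
# How the overcommit factors multiply with thin on thin (illustrative only).
physical_tb = 10
storage_oversub = 2.0   # storage team: 200% of physical handed out as thin LUNs
virt_oversub = 2.0      # virt team: 200% of datastore capacity handed out as thin VMs

luns_tb = physical_tb * storage_oversub
vm_disks_tb = luns_tb * virt_oversub

print(f"{vm_disks_tb:.0f} TB promised to VMs on {physical_tb} TB of disk "
      f"({vm_disks_tb / physical_tb:.0%} of physical capacity)")
# -> 40 TB promised to VMs on 10 TB of disk (400% of physical capacity)
```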

You can get away with thin on thin if you have excellent monitoring, or if your storage and virt admins are one and the same (which is common these days).  But my recommendation continues to be thick VMs on thin datastores.  You can create as many thin datastores as you want, up to system limits, and then create lazy-zeroed thick VMs on top of them.

Edit: this recommendation assumes that you are either required or compelled to use thin storage.  Thin VMs on thick storage are just as effective, but sometimes you won’t have a choice in this matter.  The real point is keeping one side or the other thick gives you a better point of reference for the amount of overprovisioning.

Summary

Hopefully this provided some value in the form of thought processes around thin provisioning.  Again, I am not anti-thin; I think it has great potential in some environments.  However, I do think it needs to be carefully considered and thought through when it sometimes seems to be sold as a “just thin provision, it will save you money” concept.  It really needs to be fleshed out differently for every organization, and if you take the time to do this you will not only better leverage your investment, but you can avoid some potentially serious pain in the future.