Beginning with AWS CloudFormation – Part 1

One of my few new goals for this year is to get back to blogging regularly about stuff I’m learning or interested in.  Keep a look out here for (hopefully!) more content this year than previous years which might have had just a handful of posts.

AWS CloudFormation is a service that lets you define AWS “infrastructure” as code in text files called templates.  You can use it to deploy almost anything from JSON or YAML templates.  The deployed resources are collectively called stacks.  There are other IaC options, like Terraform, but I think it is handy to know the native toolset.  Plus, if you are going for AWS certifications, you’ll need to be familiar with it.


EMC RecoverPoint and XtremIO Part 1 – Initial Findings and Requirements

Back in the saddle again after a long post drought!  I’ve been busy lately working on some training activities with Pluralsight, as well as dealing with a company merger.  I’m no longer with Varrow, as Varrow was acquired by Sirius Computer Solutions.  I’ve also been enjoying time with my son, who is about to turn 1 year old – hard to believe!

Over the past couple of weeks, I’ve been involved in some XtremIO and RecoverPoint deployments.  RP+XtremIO was released not too long ago and it has been a bit of a learning curve – not with the product itself, but with the new methodology.  I wanted to lay out some details in case anyone is looking at this solution.

There is a good whitepaper called RecoverPoint Deploying with XtremIO Tech Notes.  It does a good job of laying out the functionality, but for me at least it still missed some important details – or maybe just didn’t phrase them so I could understand.

First, the great news: from a functional standpoint this solution is roughly the same as all other RP implementations.  The same familiar interface is there; you create CGs and can do things like test a copy, recover production, and fail over.  So if you are familiar with RecoverPoint protection operationally, there is not a lot of difference.

Under the covers, things are hugely different.  I’m going to talk about the snap-based replication a little later, and probably in part 2 as well.

First, the actual deployment is roughly the same.  Don’t forget your code requirements:

  • RecoverPoint 4.1.2 or later
  • XtremIO 4.0 or later

RP is deployed with Deployment Manager as usual, and XtremIO is configured as usual. 3GB repository volume (as usual!).

RP to XtremIO zoning is simple – everything to everything.  A single zone with all RP ports and all XtremIO ports from a single cluster in each fabric.

With the new 4.0 code, a single XtremIO Management Server (XMS) can manage multiple clusters.  Even though a combined zone would probably work, I would use a separate zone per fabric for each cluster regardless of whether the clusters are in the same XMS or not.  More on the multi-cluster XMS with RecoverPoint later…
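
To make the "everything to everything, one zone per fabric per cluster" layout concrete, here is a rough Python sketch that builds the zone membership lists.  The function name, the zone naming scheme, and the port aliases are all made up for illustration; it does not talk to any switch, it just shows which ports end up together.

```python
def build_zones(rp_ports, xtremio_ports):
    """Return {zone_name: [members]} with one zone per fabric per XtremIO cluster.

    rp_ports:      {fabric: [RPA port aliases on that fabric]}
    xtremio_ports: {cluster: {fabric: [XtremIO port aliases on that fabric]}}
    All names are hypothetical aliases, not real WWPNs.
    """
    zones = {}
    for cluster, ports_by_fabric in xtremio_ports.items():
        for fabric, array_ports in ports_by_fabric.items():
            # Every RP port on this fabric is zoned with every port of this cluster.
            zone_name = f"rp_{cluster}_{fabric}"
            zones[zone_name] = sorted(rp_ports.get(fabric, [])) + sorted(array_ports)
    return zones


# Example: a 2-node RP cluster and two XtremIO clusters on two fabrics.
rp = {
    "fabric_a": ["rpa1_a", "rpa2_a"],
    "fabric_b": ["rpa1_b", "rpa2_b"],
}
xio = {
    "cluster1": {"fabric_a": ["x1_sc1_fc1", "x1_sc2_fc1"],
                 "fabric_b": ["x1_sc1_fc2", "x1_sc2_fc2"]},
    "cluster2": {"fabric_a": ["x2_sc1_fc1", "x2_sc2_fc1"],
                 "fabric_b": ["x2_sc1_fc2", "x2_sc2_fc2"]},
}
zones = build_zones(rp, xio)
for name in sorted(zones):
    print(name, zones[name])
```

With two clusters on two fabrics this gives four zones, each holding all RP ports on that fabric plus all ports of one cluster.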

When you go to add XtremIO arrays into RP, you’ll use the XMS IP, and then a new rp_user account.  I’m not sure what the default password here is, so I just reset the password using the CLI.  If you have pre-zoned, you just select the XtremIO array from the list, then give the XMS IP and rp_user creds.  If you haven’t pre-zoned, you also have to enter the XtremIO serial.


Here is the “I didn’t zone already” screen.  If you did pre-zone, you’ll see your serial in the list at the top and don’t need to enter it below.  Port 443 is required to be open between RP, XMS, and the XtremIO Storage Controllers (SCs).  Port 11111 is required between RP and the SCs.  Usually this is all in the same data center, so it’s not a huge deal.
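
If there is a firewall in the path, a quick TCP check can save some head scratching.  Below is a minimal Python sketch under a couple of assumptions: I’m reading the port requirements above as "443 to the XMS and to every SC, 11111 to every SC", the function names are my own, and the IPs you pass in are whatever your environment uses.  It would need to run from a host on the same network as the RPAs to mean anything, and a successful TCP connect only shows the port is reachable, not that RP will be happy.

```python
import socket


def required_endpoints(xms_ip, storage_controller_ips):
    """List the (host, port) pairs that should be reachable from the RP side."""
    endpoints = [(xms_ip, 443)]
    for sc_ip in storage_controller_ips:
        endpoints.append((sc_ip, 443))
        endpoints.append((sc_ip, 11111))
    return endpoints


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def report(xms_ip, storage_controller_ips):
    """Print one line per required endpoint with its reachability."""
    for host, port in required_endpoints(xms_ip, storage_controller_ips):
        state = "open" if is_reachable(host, port) else "NOT reachable"
        print(f"{host}:{port} {state}")
```

Calling report() with your XMS IP and a list of SC IPs prints the status of each endpoint.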

Once the arrays have been added in and your RP cluster is configured like you want it, the rest is again the same as usual.

  1. Create an initiator group on XtremIO for RecoverPoint with all RP initiators.
  2. Create journal volumes, production volumes, and replica volumes.
  3. Present them to RecoverPoint.
  4. Configure consistency groups.  Here there are important things to understand about the snap-based protection schemes that I’ll go over later.

One important change due to the snap-based recovery – no recovery data is stored in journals, only metadata related to snapshots!  Because of this, journals should be as small as possible – 10GB for normal CGs, 40GB for distributed.  They won’t use all this space but we don’t care (assuming your jvols are on XtremIO) because XtremIO is thin anyway.  Similarly, each CG only needs one journal, as your protection window is not defined by your total journal capacity.
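
As a back-of-the-napkin example of how little journal space this adds up to, here is a tiny Python sketch.  The function name is mine, and the only inputs are the 10GB and 40GB sizes mentioned above plus however many journal volumes you plan to create; remember this is provisioned (thin) capacity, not space actually consumed.

```python
# Journal sizes from the text above, in GB.
JOURNAL_GB = {"normal": 10, "distributed": 40}


def journal_capacity_gb(normal_journals, distributed_journals):
    """Total provisioned journal capacity in GB for the given journal counts."""
    return (normal_journals * JOURNAL_GB["normal"]
            + distributed_journals * JOURNAL_GB["distributed"])


# Example: 20 journals for normal CGs and 2 for distributed CGs.
print(journal_capacity_gb(20, 2), "GB provisioned")
```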

RP with XtremIO licensing is pretty simple.  You can either buy a basic (“/SE”) or full (“/EX”) license for your brick size.  Either way you can protect as much capacity as you can create, which is nice considering XtremIO is thin and does inline dedupe/compression.  Essentially, basic just gives you remote protection: XtremIO to XtremIO only, and only 1 remote copy.  Full adds in the ability to do local protection, as well as go from anything to anything (e.g. XtremIO to VNX, or VMAX to XtremIO), and a 3rd copy (so production, local, and remote, or production and two remote).  Obviously you need EX or CL licensing for the other arrays if you are doing multiple types.  Just a point of clarification here: the “SE” and “EX” licenses for XtremIO are different from the normal ones.  So if you have a VNX with /SE licensing, you can’t use it with /SE (or even /EX) XtremIO licensing.

If you are using iSCSI with XtremIO, you can still do RP in direct attach mode, similar to what we do on VNX iSCSI.  Essentially you will direct attach up to two bricks to an RP cluster of exactly 2 nodes.  I would imagine (though I have not confirmed) that you could have more than two bricks, but would only attach two of them to RPAs.  vRPA is not currently supported – this remains a CLARiiON/VNX/VNXe only product.

I’m going to cover some details about the snap-based protection in the next post, but in the meantime know that because it IS all snap based and there is no data in the journal to “roll” to, image access is always direct and always near instantaneous.  It doesn’t matter if you are trying to access an image from 1 minute ago that has 4KB worth of changes, or an image from a week ago with 400GB worth of changes.  This part is very cool, as there is no need to worry about rolling.  There is also no need to worry about the undo log for image access – with traditional RecoverPoint you were “gently encouraged” 🙂 to not stay in image access for a long time, because the undo log had a specific capacity and, as the writes piled up, eventually replication would halt.

Instead, the only capacity-based limit you are now concerned about is the physical capacity on the XtremIO brick itself.

Allegedly Site Recovery Manager is supported but I didn’t do any testing with that.

RP only supports volumes from XtremIO that use 512 as the logical block size, not the 4K block size.  There is so little support for the 4K block size right now that I’m still strongly discouraging anyone from using it, unless they have tons of sign off and have done tons of testing.  But if you are using the 4K block size, then you won’t be able to use RP protection.  Just to clarify, this is the setting I’m talking about – this is unrelated to FS block sizing a-la NTFS or anything of that nature.


A few other random caveats:

  • If one volume at a copy is on an XtremIO array, then all volumes at that copy must be on that XtremIO array.  So for a given single copy (all the volumes in a copy), you can’t split them between array types or even clusters due to snapshotting.
  • The production and replica volumes must match in size, although I always recommend this anyway.
  • Resize for volumes is unfortunately back to the old way.  Remove both prod/replica volumes from the CG, resize, then re-add.  Hopefully a dynamic resize will be available at some point.

In the next post I’m going to talk about some things I know and some things I’ve observed during testing with the snapshotting behavior, but I wanted to call out a specific limitation right now and will probably hammer on it later – there is a limit of 8,192 total volumes + snapshots per XMS, irrespective of RecoverPoint.  This sounds like a ton, but each production volume you protect will have (at times) two snapshots associated with it.  Each replica volume will have max_snaps + 1 snapshots associated with it.  Because this is a per XMS limitation and not a per cluster limitation, depending on exactly how many volumes you have and how many snapshots you want to keep, you may still want a single XMS per cluster in a multi-cluster configuration.
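
To get a feel for how quickly that limit approaches, here is a small Python sketch of the worst-case arithmetic described above.  The function and parameter names are my own, and other_volumes is just a catch-all I added for anything else living on the same XMS (journals, unprotected volumes, and so on).

```python
XMS_OBJECT_LIMIT = 8192  # total volumes + snapshots per XMS


def xms_object_count(prod_volumes, replica_volumes, max_snaps, other_volumes=0):
    """Worst-case count of volumes + snapshots on one XMS.

    Each production volume: itself plus up to 2 snapshots.
    Each replica volume: itself plus max_snaps + 1 snapshots.
    """
    prod_objects = prod_volumes * (1 + 2)
    replica_objects = replica_volumes * (1 + max_snaps + 1)
    return prod_objects + replica_objects + other_volumes


# Example: 500 production and 500 replica volumes managed by the same XMS.
for snaps in (10, 12):
    total = xms_object_count(500, 500, max_snaps=snaps)
    verdict = "fits" if total <= XMS_OBJECT_LIMIT else "over the limit"
    print(f"max_snaps={snaps}: {total} objects, {verdict}")
```

In this example, keeping 10 snapshots per replica comes to 7,500 objects and fits, while bumping that to 12 comes to 8,500 and goes over.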

More to come!