EMC RecoverPoint Journal Sizing

A commenter on my post about RecoverPoint Journal Usage asks:

How can I tell if my journal is large enough for a consistency group? That is to say, where in the GUI will it tell me I need to expand my journal or add another journal lun?

This is an easy question to answer, but for me this is another opportunity to reiterate journal behavior.  Scroll to the end if you are in a hurry.

Back to Snapshots…

Back to our example in the previous article about standard snapshots – on platforms where snapshots are used you often have to allocate space for this purpose…like with SnapView on EMC VNX and Clariion, you have to allocate space via the Reserved LUN Pool.  On NetApp systems this is called the snapshot reserve.

Because of snapshot behavior (whether Copy On First Write or Redirect On Write), at any given time I’m using some variable amount of space in this area that is related to my change rate on the primary copy.  If most of my data space on the primary copy is the same as when I began snapping, I may be using very little space.  If instead I have overwritten most of the primary copy, then I may be using a lot of space.  And again, as I delete snapshots over time this space will free up.  So a potential set of actions might be:

  1. Create snapshot reserve of 10GB and create snapshot1 of primary – 0% reserve used
  2. Overwrite 2.5GB of data on primary – 25% reserve used
  3. Create snapshot2 of primary and overwrite a different 2.5GB of data on primary – 50% reserve used
  4. Delete snapshot1 – 25% reserve used
  5. Overwrite 50GB of data – snapshot space full (probably bad things happen here…)
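The reserve math in steps 1–4 is simple enough to sketch in shell (integer math in MB; the 10GB reserve and 2.5GB chunks come straight from the steps above, and step 5's 50GB overwrite would blow straight past what's free):

```shell
#!/bin/sh
# Hypothetical reserve accounting for steps 1-4 above (MB, to keep integer math).
reserve_mb=10240; used_mb=0                  # step 1: 10GB reserve, 0% used

used_mb=$((used_mb + 2560))                  # step 2: overwrite 2.5GB on primary
echo "step 2: $((used_mb * 100 / reserve_mb))% reserve used"

used_mb=$((used_mb + 2560))                  # step 3: snapshot2, overwrite another 2.5GB
echo "step 3: $((used_mb * 100 / reserve_mb))% reserve used"

used_mb=$((used_mb - 2560))                  # step 4: delete snapshot1, its chunk frees up
echo "step 4: $((used_mb * 100 / reserve_mb))% reserve used"
```

The point is that reserve usage moves both directions with change rate and snapshot deletion – unlike the RP journal, which fills to 100% and stays there.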

There is meaning to how much space I have allocated to snapshot reserve.  I can have way too much (meaning my snapshots only use a very small portion of the reserve) and waste a lot of storage.  Or I can have too little (meaning my snapshots keep overrunning the maximum) and probably cause a lot of problems with the integrity of my snaps.  Or it can be just right, Goldilocks.

RP Journal

Once again the RP journal does not function like this.  Over time we expect RP journal utilization to be at 100%, every time.  If you don’t know why, please read my previous post on it!

The size of the journal only defines your protection window in RP.  The more space you allocate, the further back you are able to recover.  However, there is no such thing as “too little” or “too much” journal space as a rule of thumb – these are business-defined goals that are unique to every organization.

I may have allocated 5GB of journal space to an app, and that lets me recover 2 weeks back because it has a really low write rate.  If my SLA requires me to recover 3 weeks back, that is a problem.

I may have allocated 1TB of journal space to an app, and that lets me recover back 30 minutes because it has an INSANE write rate.  If my SLA only requires me to recover back 15 minutes, then I’m within spec.
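The back-of-napkin math is just journal capacity divided by write rate (this ignores the slice of the journal RP reserves for its own use, and the write rate below is a made-up sustained number – but it shows why 1TB can mean only about half an hour):

```shell
#!/bin/sh
# Rough protection window = journal capacity / sustained write rate into the CG.
journal_gb=1024        # 1TB journal (hypothetical)
write_mb_s=500         # sustained write rate in MB/s (hypothetical, "INSANE" app)
window_min=$(( journal_gb * 1024 / write_mb_s / 60 ))
echo "~${window_min} minutes of rollback"
```

Swap in a 5GB journal and a few hundred KB/s of writes and the same formula gives you weeks – same math, wildly different windows, and neither one is "wrong" until you compare it to an SLA.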

RP has no idea about what is good journal sizing or bad journal sizing, because this is simply a recovery timeline.  You must decide whether it is good or bad, and then allocate additional journals as necessary.  Unlike other technologies like snapshots, there is no concept of “not enough journal space” beyond your own personal SLAs.  In this manner, by default RecoverPoint won’t let you know that you need more journal space for a given CG because it simply can’t know that.

Note: if you are regularly using the Test A Copy functionality for long periods of time (even though you really shouldn’t…), then you may run into sizing issues beyond just protection windows, as portions of the journal space are also used for that.  This is beyond the scope of this post, but just be aware that even if you are in spec from a protection window standpoint, you may need more journal space to support the test copy.

Required Protection Window

So RecoverPoint has no way of knowing whether you’ve allocated enough journal space to a given CG.  Folks on the pre-sales side have some nifty tools that can help with journal sizing by looking at data change rate, but this is really for the entire environment and hopefully before you bought it.

Luckily, RecoverPoint has a nice internal feature to alert you whether a given Consistency Group is within spec or not, and that is “Required Protection Window.”  This is a journal option within each copy and can be configured when a CG is created, or modified later.  Here is a pic of a CG without it.  Note that you can still see your current protection window here and make adjustments if you need to.

Here is where the setting is located.


And here is what it looks like with the setting enabled.


So if I need to recover back 1 hour on this particular app, I set it to 1 hour and I’m good.  If I need to recover back 24 hours, I set it that way and it looks like I need to allocate some additional journal space to support that.

Now this does not control the behavior of RecoverPoint (unlike, say, the Maximum Journal Lag setting) – whether you are within or under your required protection window, RP still functions the same.  It simply alerts you that you are under your personally defined window for that CG.  And if you are under for too long, or maybe under it at all if it is a mission critical application, you may want to add additional journal space to extend your protection window so that you are within spec.  Again I repeat, this is only an alerting function and will not, by itself, do anything to “fix” protection window problems!


So bottom line: RP doesn’t – or more accurately can’t – know whether you have enough journal space allocated to a given CG because that only affects how long you can roll back for.  However, using the Required Protection Window feature, you can tell RP to alert you if you go out of spec and then you can act accordingly.

SAN vs NAS Part 5: Summary

We’ve covered a lot of information over this series, some of it more easily consumable than others.  Hopefully it has been a good walkthrough of the main differences between SAN and NAS storage, and presented in a little different way than you may have seen in the past.

I wanted to summarize the high points before focusing on a few key issues:

  • SAN storage is fundamentally block I/O, which is SCSI.  With SAN storage, your local machine “sees” something that it thinks is a locally attached disk.  In this case your local machine manages the file system, and transmissions to the array are simple SCSI requests.
  • NAS storage is file I/O, which is either NFS or CIFS.  With NAS storage, your local machine “sees” a service to connect to on the network that provides file storage.  The array manages the file system, and transmissions to the array are protocol specific file based operations.
  • SAN and NAS have different strengths, weaknesses, and use cases
  • SAN and NAS are very different from a hardware and protocol perspective
  • SAN and NAS are sometimes only offered on specific array platforms

Our Question

So back to our question that started this mess: with thin provisioned block storage, if I delete a ton of data out of a LUN, why do I not see any space returned on the storage array?  We know now that this is because there is no such thing as a delete in the SAN/block/SCSI world.  Thin provisioning works by allocating storage you need on demand, generally because you tried to write to it.  However once that storage has been allocated (once the disk has been created), the array only sees reads and writes, not creates and deletes.  It has no way of knowing that you sent over a bunch of writes that were intended to be a delete.  The deletes are related to the file system, which is being managed by your server, not the array.  The LUN itself is below the file system layer, and is that same disk address space filled with data we’ve been discussing.  Deletes don’t exist on SAN storage, apart from administratively deleting an entire object – LUN, RAID set, Pool, etc.
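You can watch this same behavior with a sparse file standing in for a thin LUN – a sketch, not actual array behavior, and it assumes a Linux file system with sparse-file support.  Writes allocate space, and a "delete" that arrives at the block layer as just more writes gives nothing back:

```shell
#!/bin/sh
# A sparse file stands in for a thin LUN: apparent size vs allocated space.
truncate -s 100M thin.img
echo "fresh LUN:      $(du -m thin.img | cut -f1)M allocated of $(du -m --apparent-size thin.img | cut -f1)M"

# The host writes 10M of data -- the "thin LUN" allocates space on demand.
dd if=/dev/urandom of=thin.img bs=1M count=10 conv=notrunc status=none
echo "after writes:   $(du -m thin.img | cut -f1)M allocated"

# A file-system "delete" arrives as nothing but more writes (here, zeros).
dd if=/dev/zero of=thin.img bs=1M count=10 conv=notrunc status=none
echo "after 'delete': $(du -m thin.img | cut -f1)M allocated"
```

The allocated space never shrinks, because from the block layer's perspective nothing was deleted – some bytes just changed.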

With NAS storage on the other hand, the array does manage the file system.  You tell it when to delete something by sending it a delete command via NFS or CIFS, so it certainly knows that you want to delete it.  In this manner file system allocations on NAS devices usually fluctuate in capacity.  They may be using 50GB out of 100GB today, but only 35GB out of 100GB tomorrow.

Note: there are ways to reclaim space either on the array side with thin reclamation (if it is supported), or on the host side with the SCSI UNMAP commands (if it is supported).  Both of these methods will allow you to reclaim some/all of the deleted space on a block array, but they have to be run as a separate operation from the delete itself.  It is not a true “delete” operation but may result in less storage allocated.
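On Linux, for instance, the host-side version of that reclaim is fstrim for a mounted file system (or blkdiscard for a whole device), which issues those UNMAP/TRIM commands for free extents.  A sketch assuming a hypothetical thin LUN mounted at /mnt/thinlun – it needs root and a real mount, so otherwise it just says so:

```shell
#!/bin/sh
# Reclaim deleted space on a thin LUN -- a separate operation from the delete itself.
MNT=${1:-/mnt/thinlun}        # hypothetical mount point on the thin LUN
if [ "$(id -u)" -eq 0 ] && mountpoint -q "$MNT"; then
  fstrim -v "$MNT"            # issues UNMAP/TRIM for every free extent in the fs
else
  echo "skipping: need root and a thin LUN mounted at $MNT"
fi
```

Note this still matches the point above: the array doesn't learn about deletes – the host explicitly tells it, after the fact, which extents the file system no longer uses.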

Which Is Better?

Yep, get out your battle gear and let’s duke it out!  Which is better?  SAN vs NAS!  Block vs File!  Pistols at high noon!

Unfortunately as engineers a lot of times we focus on this “something must be the best” idea.

Hopefully if you’ve read this whole thing you realize how silly this question is, for the most part.  SAN and NAS storage occupy different areas and cover different functions.  Most things that need NAS functionality (many access points and permissions control) don’t care about SAN functionality (block level operations and utilities), and vice versa.  This question is kind of like asking which is better, a toaster or a door stop?  Well, do you need to toast some delicious bread or do you need to stop a delicious door?

In some cases there is overlap.  For example, vSphere datastores can be accessed over block protocols or NAS (NFS).  In this case what is best is most often going to be – what is the best fit in the environment?

  • What kind of hardware do you have (or what kind of budget do you have)?
  • What kind of admins do you have and what are their skillsets?
  • What kind of functionality do you need?
  • What else in the environment needs storage (i.e. does something else need SAN storage or NFS storage)?
  • Do you have a need for RDMs (LUNs mapped directly from the array in order to expose some of the SCSI functionality)?

From a performance perspective 10Gb NFS and 10Gb iSCSI are going to do about the same for you, and honestly you probably won’t hit the limits of those anyway.  These other questions are far more critical.

Which leads me to…

What Do I Need?

A pretty frequently asked question in the consulting world – what do I need, NAS or SAN?  This is a great question to ask and to think about but again it goes back to what do you need to do?

Do you have a lot of user files that you need remote access to?  Windows profiles or home directories?  Then you probably need NAS.

Do you have a lot of database servers, especially ones that utilize clustering?  Then you probably need SAN.

Truthfully, most organizations need some of both – the real question is in what amounts.  This will vary for every organization but hopefully armed with some of the information in this blog series you are closer to making that choice for your situation.

SAN vs NAS Part 4: The Layer Cake

Last post we covered the differences between NFS and iSCSI (NAS and SAN) and determined that we saw a different set of commands when interacting with a file.  The NFS write generated an OPEN command, while the iSCSI write did not.  In this post we’ll cover the layering of NAS (file or file systems) on top of SAN (SCSI or block systems) and how that interaction works.

Please note!  In modern computing systems there are MANY other layers than I’m going to talk about here.  This isn’t to say that they don’t exist or aren’t important, but just that we are focusing on a subset of them for clarity.  Hopefully.

First, take a look at the NFS commands listed here: https://tools.ietf.org/html/rfc1813

Notice that a lot of these commands reference files, and things that you would do with files like read and write, but also create, remove, rename, etc.

Compare this with the SCSI reference: http://www.t10.org/lists/op-alph.htm

Notice that in the SCSI case, we still have read and write, but there is no mention of files (other than “filemarks”).  There is no way to delete a file with SCSI – because again we are working with a block device which is a layer below the file system.  There is no way to delete a file because there is no file.  Only addresses where data is stored.

As a potentially clumsy analogy (like I often wield!) think about your office desk.  If it’s anything like mine, there is a lot of junk in the drawers.  File storage is like the stuff in a drawer.  The space in a drawer can have a lot of stuff in it, or it can have a little bit of stuff in it.  If I add more stuff to the drawer, it gets more full.  If I take stuff out of the drawer, it gets less full.  There is meaning to how much stuff is in an individual drawer as a relation to how much more stuff I can put in the drawer.

Block storage, on the other hand, is like the desk itself.  There are locations to store things – the drawers.  However, whether I have stuff in a drawer or I don’t have stuff in a drawer, the drawer still exists.  Emptying out my desk entirely doesn’t cause my desk to vanish.  Or at least, I suspect it wouldn’t…I have never had an empty desk in my life.  There is no relationship to the contents of the drawers and the space the desk occupies.  The desk is a fixed entity.  An empty drawer is still a drawer.

To further solidify this file vs block comparison, take a look at this handsome piece of artwork depicting the layers:

Here is a representation of two files on my computer, a word doc and a kitty vid, and their relationship to the block data on disk.  Note that some disk areas have nothing pointing to them – these are empty but still zero filled (well…maybe, depending on how you formatted the disk).  In other words, these areas still exist!  They still have contents, even if that content is nothing.

When I query a file, like an open or read, it traverses the file system down to the disk level.  Now I’m going to delete the word doc.  In most cases, this is what is going to happen:

My document is gone as far as I can “see.”  If I try to query the file system (like look in the directory it was stored in) it is gone.  However on the disk, it still exists.  (Fun fact: this is how “undelete” utilities work – by restoring data that is still on disk but no longer has pointers from the file system.)  It isn’t really relevant that it is still on the disk, because from the system’s perspective (and the file system’s perspective) it doesn’t exist any more.  If I want to re-use that space, the system will see it as free and store something else there, like another hilarious kitten video.

Sometimes this will happen instead, either as you delete something (rarely) or later as a garbage collection process:

The document data has been erased and replaced with zeros.  (Fun fact: this is how “file shredder” programs work – by writing zeros (or a pattern) once (or multiple times) to the space that isn’t being actively used by files.)  Now the data is truly gone, but from the disk perspective it still isn’t really relevant because something still occupies that space.  From the disk’s perspective, something always occupies that space, whether it is kitty video data, document data, or zeros.  The file system (the map) is what makes that data relevant to the system.
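Both of those fun facts are easy to demonstrate with a plain file standing in for the raw disk – the "block address" 4096 and the document contents below are made up for the demo:

```shell
#!/bin/sh
# A 1MB file stands in for the block device; the "file system" is us
# remembering that the document lives at offset 4096.
truncate -s 1M disk.img
printf 'my word doc' | dd of=disk.img bs=1 seek=4096 conv=notrunc status=none

# "Delete" the document: the file-system pointer is gone, but scanning the
# raw device still finds the bytes -- this is what undelete tools exploit.
grep -a -o 'my word doc' disk.img

# "Shred": overwrite that same region with zeros; now the data is truly gone.
dd if=/dev/zero of=disk.img bs=1 seek=4096 count=11 conv=notrunc status=none
grep -a -q 'my word doc' disk.img || echo 'data scrubbed'
```

Either way, disk.img is exactly 1MB before and after – the "device" never shrinks, only the pattern of bytes inside it changes.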

This is a really high level example, but notice the difference in the file system level and the disk level.  When I delete that file, whether the actual disk blocks are scrubbed or left intact, the block device remains the same except for the configuration of the 1’s and 0’s.  All available addresses are still in place.  Are we getting closer to understanding our initial question?

Let’s move this example out a bit and take a look at an EMC VNX system from a NAS perspective.  This is a great example because there are both SAN/block (fibre channel) and NAS/file (cifs/nfs) at the same time.  The connections look like this:


From my desktop, I connect via NFS to an interface on the NAS (the datamover) in order to access my files.  And the datamover has a fibre channel connection to the block storage controllers which is where the data is actually stored.  The datamover consumes block storage LUNs, formats them with appropriate file systems, and then uses that space to serve out NAS.  This ends up being quite similar to the layered file/disk example above when we were looking at a locally hosted file system and disk.

What does it look like when I read and write?  Simply like this:

My desktop issues a read or write via NFS, which hits the NAS, and the NAS then issues a read or write via SCSI over Fibre Channel to the storage processor.

Reads and writes are supported by SCSI, but what happens when I try to do something to a file like open or delete?

The same command conversion happens, but it is just straight reads and writes at the SCSI level.  It doesn’t matter whether the NAS is SAN attached like this one, or it just has standard locally attached disks.  This is always what’s going to happen because the block protocol and subsystems don’t work with files – only with data in addresses.

By understanding this layering – what file systems (NAS) do vs what disks (SAN) do – you can better understand important things about their utility.  For instance, file systems have various methods to guarantee consistency, in spite of leveraging buffers in volatile memory.  If you own the file system, you know who is accessing data and how.  You have visibility into the control structure.  If the array has no visibility there, then it can’t truly guarantee consistency.  This is why e.g. block array snapshots and file array snapshots are often handled differently.  With NAS snapshots, the array controls the buffers and can easily guarantee consistent snapshots.  But for a block snapshot, the array can only take a picture of the disk right now regardless of what is happening in the file system.  It may end up with an inconsistent image on disk, unless you initiate the snapshot from the attached server and properly quiesce/clean the file system.
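On Linux, that quiesce step around a block snapshot can be done with fsfreeze – a sketch assuming a hypothetical mount point, and it needs root plus a real mounted file system, so otherwise it just reports that:

```shell
#!/bin/sh
# Quiesce the file system on the attached server around an array-side snapshot.
MNT=${1:-/mnt/appdata}        # hypothetical mount point on the SAN LUN
if [ "$(id -u)" -eq 0 ] && mountpoint -q "$MNT"; then
  fsfreeze -f "$MNT"          # flush dirty buffers and block new writes
  echo ">>> trigger the array-side snapshot of the LUN here <<<"
  fsfreeze -u "$MNT"          # thaw; applications resume
else
  echo "skipping: need root and a file system mounted at $MNT"
fi
```

The array still only sees blocks either way – the freeze just guarantees those blocks represent a clean file system at the moment the picture is taken.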

Back to the idea of control, because NAS systems manage the file side of things, they also have a direct understanding of who is trying to access what.  Not only does this give it the ability to provide some access control (unlike SAN which just responds happily to any address requests it gets), it also explains why NAS is often ideal for multi-access situations.  If I have users trying to access the same share (or better yet, the same file), NAS storage is typically the answer because it knows who has what open.  It can manage things on that level.  For the SAN, not so much.  In fact if you want two hosts to access the same storage, you need to have some type of clustering (whether direct software or file system) that provides locks and checks.  Otherwise you are pretty much guaranteed some kind of data corruption as things are reading and writing over top of one another.  Remember SAN and SCSI just lets you read and write to addresses, it doesn’t provide the ability to open and own a file.
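You can see the kind of lock coordination a file server provides with plain advisory locks – NFS can propagate these between clients (via NLM or NFSv4 locking), while a raw LUN has no equivalent at all.  A purely local sketch of two writers contending for one file:

```shell
#!/bin/sh
# Writer A takes an exclusive advisory lock on the shared file via fd 9.
exec 9>shared.dat
flock -n 9 && echo "writer A holds the lock"

# Writer B's non-blocking attempt (a separate open of the same file) is refused.
flock -n shared.dat -c 'echo writer B got it' || echo "writer B refused: file is locked"
```

A file server can arbitrate exactly this because it owns the file system and knows who has what open; two hosts writing to the same LUN have no such referee unless a cluster file system or clustering software provides one.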

In part 5 I’ll provide a summary review and then some final thoughts as well.

SAN vs NAS Part 3: File Systems

In the last blog post, we asked a question: “who has the file system?”  This will be important in our understanding of the distinction between SAN and NAS storage.

First, what is a file system?  Simply (see edit below!), a file system is a way of logically sorting and addressing raw data.  If you were to look at the raw contents of a disk, it would look like a jumbled mess.  This is because there is no real structure to it.  The file system is what provides the map.  It lets you know that block 005A and block 98FF are both the first parts of your text file that reads “hello world.”  But on disk it is just a bunch of 1’s and 0’s in seemingly random order.

Edit: Maybe I should have chosen a better phrase like “At an extremely basic level” instead of “Simply.” 🙂 As @Obdurodon pointed out in the comments below, file systems are a lot more than a map, especially these days.  They help manage consistency and help enable cool features like snapshots and deduplication.  But for the purposes of this post this map functionality is what we are focusing on as this is the relationship between the file system and the disk itself.

File systems allow you to do things beyond just reads and writes.  Because they form files out of data, they let you do things like open, close, create, and delete.  They allow you the ability to keep track of where your data is located automatically.

(note: there are a variety of file systems depending on the platform you are working with, including FAT, NTFS, HFS, UXFS, EXT3, EXT4, and many more.  They have a lot of factors that distinguish them from one another, and sometimes have different real world applications.  For the purposes of this blog series we don’t really care about these details.)

Because SAN storage can be thought of as a locally attached disk, the same applies here.  The SAN storage itself is a jumbled mess, and the file system (data map) is managed by the host operating system.  Similar to your local C: drive in your Windows laptop, your OS puts down a file system and manages the location of the block data.  Your system knows and manages the file system so it interacts with the storage array at a block level with SCSI commands, below the file system itself.

With NAS storage on the other hand, even though it may appear the same as a local disk, the file system is actually not managed by your computer – or more accurately the machine the export/share is mounted on.  The file system is managed by the storage array that is serving out the data.  There is a network service running that allows you to connect to and interact with it.  But because that remote array manages the file system, your local system doesn’t.  You send commands to it, but not SCSI commands.

With SAN storage, your server itself manages the file system and with NAS storage the remote array manages the file system.  Big deal, right?  This actually has a MAJOR impact on functionality.

I set up a small virtual lab using VirtualBox with a CentOS server running an NFS export and an iSCSI target (my remote server), and an Ubuntu desktop to use as the local system.  After jumping through a few hoops, I got everything connected up.  All commands below are run and all screenshots are taken from the Ubuntu desktop.

I’ll also take a moment to mention how awesome Linux is for these types of things.  It took some effort to get things configured, but it was absolutely free to set up an NFS/iSCSI server and a desktop to connect to it.  I’ve said it before but will say it again – learn your way around Linux and use it for testing!

So remember, who has the file system?  Note that with the iSCSI LUN, I got a raw block device (a.k.a. a disk) presented from the server to my desktop.  I had to create a partition and then format it with EXT4 before I could mount it.  With the NFS export, I just mounted it immediately – no muss no fuss.  That’s because the file system is actually on the server, not on my desktop.

Now, if I were to unmount the iSCSI LUN and then mount it up again (or on a different Linux desktop) I wouldn’t need to lay down a file system, but that is only because it has already been done once.  With SAN storage I have to put down a file system on the computer it is attached to the first time it is used, always.  With NAS storage, there is no such need because the file system is already in place on the remote server or array.
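That first-time difference looks roughly like this, sketched with a file-backed stand-in for the LUN so it runs without an actual iSCSI session (the device comment and export path are hypothetical, and the mounts themselves need root):

```shell
#!/bin/sh
# A raw block device arrives with no file system; you must lay one down once.
truncate -s 64M lun.img          # stand-in for the /dev/sdX the iSCSI target presents
mkfs.ext4 -q -F lun.img          # the format step -- required before the first mount
# mount lun.img /mnt/lun         # (root) every later mount skips the mkfs

# An NFS export already carries its file system on the server -- just mount it:
# mount -t nfs server:/exports/data /mnt/nfs    # (root; hypothetical export)
```

One mkfs, ever, for the block device; zero for the NFS export – because the export's file system lives on the server.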

Let’s dive in and look at the similarities and differences depending on where the file system is.


First let’s take a look at strace.  strace is a utility that exposes some of the ‘behind the scenes’ activity when you execute commands on the box.  Let’s run this command against a data write via a simple redirect:

strace -vv -Tt -f -o traceout.txt echo "hello world" > testfile

Essentially we are running strace with a slew of flags against the command [ echo "hello world" > testfile ].  Here is a screenshot of the relevant portion of both outputs when I ran the command with testfile located on the NFS export vs the local disk.


Okay there is a lot of cryptic info on those pics, but notice that in both cases the write looks identical.  The “things” that are happening in each screenshot look the same.  This is a good example of how local and remote I/O “appears” the same, even at a pretty deep level.  You don’t need to specify that you are reading or writing to a NAS export, the system knows what the final destination is and makes the necessary arrangements.


Let’s try another method – dstat.  Dstat is a good utility for seeing the types of I/O running through your system.  And since this is a lab system, I know it is more or less dead unless I’m actively doing something on it.

I’m going to run a large stream of writes (again, simple redirection) in various locations (one location at a time!) while I have dstat running in order to see the differences.  The command I’m using is:

for i in {1..100000}; do echo $i > myout; done

With myout located in different spots depending on what I’m testing.

For starters, I ran it against the local disk:


Note the two columns in the center indicating “dsk” traffic (I/O to a block device) and “net” traffic (I/O across the network interfaces).  You can think of the “dsk” traffic as SCSI traffic.  Not surprisingly, we have no meaningful network traffic, but sustained block traffic.  This makes sense since we are writing to the local disk.

Next, I targeted it at the NFS export.


A little different this time, as even though I’m writing to a file that appears in the filesystem of my local machine (~/mynfs/myout) there is no block I/O.  Instead we’ve got a slew of network traffic.  Again this makes sense because as I explained even though the file “appears” to be mine, it is actually the remote server’s.

Finally, here are writes targeted at the iSCSI LUN.


Quite interesting, yes?  We have BOTH block and network traffic.  Again this makes sense.  The LUN itself is attached as a block device, which generates block I/O.  However, iSCSI traffic travels over IP, which hits my network interfaces.  The numbers are a little skewed since the block I/O on the left is actually included in the network I/O on the right.

So we are able to see that something is different depending on where my I/O is targeted, but let’s dig even deeper.  It’s time to… break out Wireshark.


For this example, I’m going to run a redirect with cat:

cat > testfile

hello world


This is simply going to write “hello world” into testfile.

After firing up Wireshark and making all the necessary arrangements to capture traffic on the interface that I’m using as an iSCSI initiator, I’m ready to roll.  This will allow me to capture network traffic between my desktop and server.

Here are the results:


There is a lot of stuff on this pic as expected, but notice the write command itself.  It is targeted at a specific LBA, just as if it were a local disk that I’m writing to.  And we get a response from the server that the write was successful.

Here is another iSCSI screenshot.


I’ve highlighted the write and you can see my “hello world” in the payload.  Notice all the commands I highlighted with “SCSI” in them.  It is clear that this is a block level interaction with SCSI commands, sent over IP.  Note also that in both screenshots, there is no file interaction.

Now let’s take a look at the NFS export on my test server.  Again I’m firing up Wireshark and we’ll do the same capture operation on the interface I’m using for NFS.  I’m using the same command as before.


Here is the NFS write command with my data.  There are standard networking headers and my hello world is buried in the payload.  Not much difference from iSCSI, right?

The difference is a few packets before:


We’ve got an OPEN command!  I attempt to open the file “testfile” and the server responds to my request like a good little server.  This is VERY different from iSCSI!  With iSCSI we never had to open anything, we simply sent a write request for a specific Logical Block Address.  With iSCSI, the file itself is opened by the OS because the OS manages the file system.  With NFS, I have to send an OPEN to the NAS in order to discover the file handle, because my desktop has no idea what is going on with the file system.

This is, I would argue, THE most important distinction between SAN and NAS and hopefully I’ve demonstrated it well enough to be understandable.  SAN traffic is SCSI block commands, while NAS traffic is protocol-specific file operations.  There is also some overlap here (like read and write), but these are still different entities with different targets.  We’ll take a look at the protocols and continue discussing the layering effect of file systems in Part 4.

SAN vs NAS Part 2: Hardware, Protocols, and Platforms, Oh My!

In this post we are going to explore some of the various options for SAN and NAS.


There are several methods and protocols for accessing SAN storage.  One is Fibre Channel (note: this is not misspelled, the protocol is Fibre, the cables are fiber) where SCSI commands are encapsulated within Fibre Channel frames.  This may be direct Fibre Channel (“FC”) over a Fibre Channel fabric, or Fibre Channel over Ethernet (“FCoE”) which further encapsulates Fibre Channel frames inside ethernet.

With direct Fibre Channel you’ll need some FC Host Bus Adapters (HBAs), and probably some FC switches like Cisco MDS or Brocade (unless you plan on direct attaching a host to an array which most of the time is a Bad Idea).

With FCoE you’ll be operating on an ethernet network typically using Converged Network Adapters (CNAs).  Depending on the type of fabric you are building, the array side may still be direct FC, or it may be FCoE as well.  Cisco UCS is a good example of the split out, as generally it goes from host to Fabric Interconnect as FCoE, and then from Fabric Interconnect to array or FC switch as direct Fibre Channel.

SAN storage can also be accessed via iSCSI, which encapsulates SCSI commands within TCP/IP over a standard network.  And then there are some other transports like InfiniBand, or direct attach via SAS (here we are kind of straying away from the SAN and are really just directly attaching disks, but I digress).

What kind of SAN you use depends largely on the scale and type of your infrastructure.  Generally if you already have FC infrastructure, you’ll stay FC.  If you don’t have anything yet, you may go iSCSI.  Larger and performance-sensitive environments typically trend toward FC, while small shops trend towards iSCSI.  That isn’t to say that one is necessarily better than the other – they have their own positives and negatives.  For example, FC has its own learning curve with fabric management like zoning, while iSCSI connections are just point to point over existing networks that someone probably already knows.  The one thing I will caution against here is if you are going for iSCSI, watch out for 1Gb configurations – there is not a lot of bandwidth and the network can get choked VERY quickly.  I personally prefer FC because I know it well and trust its stability, but again there are positives and negatives.
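For reference, attaching an iSCSI LUN from a Linux host with open-iscsi looks roughly like this – the portal address is made up, and it needs root plus a running iscsid service, so unprivileged it just says so:

```shell
#!/bin/sh
# Discover and log in to an iSCSI target with open-iscsi (hypothetical portal IP).
PORTAL=192.168.50.10
if [ "$(id -u)" -eq 0 ] && command -v iscsiadm >/dev/null; then
  iscsiadm -m discovery -t sendtargets -p "$PORTAL"   # list targets on the portal
  iscsiadm -m node -p "$PORTAL" --login               # attach; LUN shows up as /dev/sdX
else
  echo "skipping: needs root plus open-iscsi (iscsiadm)"
fi
```

Compare that to FC, where the equivalent "plumbing" is zoning on the fabric switches plus LUN masking on the array – more moving parts, but no dependence on the general-purpose IP network.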

Back to the subject at hand – in all cases with SAN the recurring theme here is SCSI commands.  In other words, even though the “disk” might be a virtual LUN on an array 10 feet (or 10 miles) away, the computer is treating it like a local disk and sending SCSI disk commands to it.

Some array platforms are SAN only, like the EMC VMAX 10K, 20K, 40K series.  EMC XtremIO is another example of a SAN only platform.  And then there are non-EMC platforms like 3PAR, Hitachi, and IBM XIV.  Other platforms are unified, meaning they do both SAN and NAS.  EMC VNX is a good example of a unified array.  NetApp is another competitor in this space.  Just be aware that if you have a SAN only array, you can’t do NAS…and if you have a NAS only array (yes they exist, see below), you can’t do SAN.  Although some “NAS” arrays also support iSCSI…I’d say most of the time this should be avoided unless absolutely necessary.


NAS, on the other hand, is virtually always accessed over an IP network.  This is going to use standard Ethernet adapters (1Gb or 10Gb), standard Ethernet switches, and IP routers.

As far as protocols, there is CIFS, which is generally used for Windows, and NFS, which is generally used on the Linux/Unix/vSphere side.  CIFS has a lot of tie-ins with Active Directory, so if you are a Windows shop with an AD infrastructure, it is pretty easy to leverage your existing groups for permissions.  NFS doesn’t have these same ties to AD, but does support NIS for some authentication services.

The common theme on this side of the house is “file” which can be interpreted as “file system.”  With CIFS, generally you are going to connect to a “share” on the array, like \\MYARRAY1\MYAWESOMESHARE.  This may be just through a file browser for a one time connection, or this may be mounted as a drive letter via the Map Network Drive feature.  Note that even though it is mounted as a drive letter, it is still not the same as an actual local disk or SAN attached LUN!

For NFS, an “export” will be configured on the array and then mounted on your computer.  This actually gets mounted within your file system.  So you may have your home directory at /users/myself, and you create a directory “backups” and mount an export to it by doing something like mount -t nfs myarray1:/backups /users/myself/backups.  Then you access any files just as you would any other ones on your computer.  Again note that even though the NFS export is mounted within your file system, it is still not the same as an actual local disk or SAN attached LUN!

Which type of NAS protocol you use is generally determined by the majority of your infrastructure – whether it is Windows or *nix.  Or you may run both at once!  Running and managing both NFS and CIFS is really more of a hurdle with understanding the protocols (and sometimes licensing both of them on your storage array), whereas the choice to run both FC and iSCSI has hardware caveats.

For NAS platforms, we again look to the unified storage like EMC VNX.  There are also NAS gateways that can be attached to a VMAX for NAS services.  EMC also has a NAS only platform called Isilon.

One thing to note: if your array doesn’t support NAS (say you have a VMAX or XtremIO), the gateway solution is definitely viable and enables some awesome features, but it is also pretty easy to spin up a Windows/Linux VM, or use a Windows/Linux physical server (but seriously, please virtualize!), that consumes array block storage and then serves up NAS itself.  So you could create a Windows file server on the VMAX, and then all your NAS clients would connect to the Windows machine.

The reverse is not really true – if your array doesn’t support SAN, it is difficult to wedge SAN into the environment.  You can always do NFS with vSphere, but if you need block storage you should really purchase some infrastructure for it.  iSCSI is a relatively simple thing to insert into an existing environment; just again, beware of 1Gb bandwidth.


One final note I wanted to mention is about protection.  There are methods for replicating file and block data, but many times these are different mechanisms, or at least they function in different ways.  For instance, EMC RecoverPoint is a block replication solution, while EMC VNX Replicator is a file replication solution.  RP won’t protect your file data (unless you franken-config it to replicate your file LUNs), and Replicator won’t protect your block data.  NAS supports NDMP while SAN generally does not.  Some solutions, like NetApp snapshots, do function on both file and block volumes, but they are still very different in how they are taken and restored: block snapshots should be initiated from the host the LUN is mounted to (in order to avoid disastrous implications regarding host buffers and file system consistency), while file snapshots can be taken from any old place you please.

I say all this just to say, be certain you understand how your SAN and NAS data is going to be protected before you lay down the $$$ for a new frame!  It would be a real bummer to find out you can’t protect your file data with RecoverPoint after the fact.  Hopefully your pre-sales folks have you covered here but again be SURE!


We’ve drawn a lot of clear distinctions between SAN and NAS, which kind of fall back into the “bullet point” message that I talked about in my first post.  All that is well and good, but here is where the confusion starts to set in: in both NAS cases (CIFS and NFS), on your computer the remote array may appear to be a disk.  It may look like a local hard drive, or even appear very similar to a SAN LUN. This leads some people to think that they are the same, or at least are doing the same things.  I mean, after all, they even have the same letters in the acronym!

However, your computer never issues SCSI commands to a NAS.  Instead it issues commands to the remote file server for things like create, delete, read, write, etc.  Then the remote file server issues SCSI (block) commands to its disks in order to make those requests happen.
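To make that division of labor concrete, here is a minimal toy model (the class and method names are my own invention, purely for illustration).  The client only ever issues file-level requests; only the file server, which owns the file system, translates those into block-level I/O against its disks:

```python
# Who speaks "block"?  With NAS, the client asks the server for files; only
# the server turns those requests into block I/O.  A toy illustration.

class BlockDevice:
    """What a local or SAN-attached disk looks like: addressed by block."""
    def __init__(self):
        self.blocks = {}

    def write_block(self, lba, data):   # SCSI-style logical block addressing
        self.blocks[lba] = data

    def read_block(self, lba):
        return self.blocks[lba]

class FileServer:
    """The NAS head owns the file system and does all the block I/O."""
    def __init__(self, disk):
        self.disk = disk
        self.inode = {}                 # filename -> lba (one block per file, toy)
        self.next_lba = 0

    def create(self, name, data):       # the client sends file ops, not SCSI
        self.inode[name] = self.next_lba
        self.disk.write_block(self.next_lba, data)
        self.next_lba += 1

    def read(self, name):
        return self.disk.read_block(self.inode[name])

nas = FileServer(BlockDevice())
nas.create("report.txt", b"hello")      # client-side "file" request
print(nas.read("report.txt"))           # the client never saw an LBA
```

Notice that the `inode` map (the file system) lives entirely on the server side – which is exactly the "who has the file system?" question.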

In fact, a major point of understanding here is, “who has the file system?”  This will help you understand who can do what with the data.  In the next post we are going to dive into this question head-first in a Linux lab environment.

SAN vs NAS Part 1: Intro

Welcome to the New Year!

I wanted to write a blog post on a very confusing storage topic (at least for myself), but I have also been searching for another large scale topic similar to the set I wrote on RAID last year.  After thinking about it I feel like my confusing question is really just a subset of a misunderstanding about block storage.  So without further ado, I’m going to write up a pretty detailed breakdown of SAN (Storage Area Network), or block storage, vs NAS (Network Attached Storage), or file storage.  This is another topic, like RAID, that is fundamental and basic but not always fully understood.

Certainly there are other write ups on this topic out there, and in ways this can be summed up in just a few bullet points.  But I think a larger discussion will really help solidify understanding.

The specific confusing question I’ll ask and hopefully answer is, with thin provisioned block storage, if I delete a ton of data out of a LUN, why do I not see any space returned on the storage array?  Say I’ve got a thin 1TB LUN on my VMAX, and it is currently using (allocated) 500GB of space.  I go to the server where this LUN is attached and delete 300GB of data.  Querying the VMAX, I still see 500GB of space used.

This concept is hard to understand and I’ve not only asked this question myself, I’ve fielded it from several people in a variety of roles.  Central to understanding this concept is understanding the difference between file and block storage.
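One way to see why this happens is with a toy model (everything here is illustrative – the class names and 1GB "blocks" are my own simplification, not any real VMAX behavior or API).  The array allocates a thin LUN block the first time it is written, but a file system delete is a metadata-only operation on the host side, so the array never hears about it:

```python
# Toy model of a thin LUN: the array backs a block with real storage the
# first time it is written, but a file system "delete" only updates the
# file system's own free-block list -- the array is never told.

class ThinLUN:
    def __init__(self, size_blocks):
        self.size_blocks = size_blocks
        self.allocated = set()          # blocks the array has backed with storage

    def write(self, block):
        self.allocated.add(block)       # first write allocates; rewrites are free

    def allocated_gb(self, gb_per_block=1):
        return len(self.allocated) * gb_per_block

class FileSystem:
    def __init__(self, lun):
        self.lun = lun
        self.in_use = set()             # blocks the file system thinks hold data

    def create_file(self, blocks):
        for b in blocks:
            self.lun.write(b)           # data actually hits the array
            self.in_use.add(b)

    def delete_file(self, blocks):
        for b in blocks:
            self.in_use.discard(b)      # metadata-only: the array never hears it

lun = ThinLUN(size_blocks=1024)         # a "1TB" thin LUN with 1GB blocks
fs = FileSystem(lun)

fs.create_file(range(500))              # write 500GB of data
fs.delete_file(range(300))              # then delete 300GB of it

print(len(fs.in_use))                   # the file system says 200
print(lun.allocated_gb())               # the array still says 500
```

The host would have to explicitly tell the array those blocks are free (via something like SCSI UNMAP) for the allocation to shrink, and that is exactly the gap we will dig into.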

To start out, let’s briefly define the nature of things about file and block storage.

SAN – Block Storage

The easiest way to think of SAN is a disk drive directly attached to a computer.  Block storage access is no different from plugging in a USB drive, or installing another hard drive into the server, as far as how the server accesses it.  The medium for accessing it over your SAN varies with protocols and hardware, but at the end of the day you’ve got a disk drive (block device) to perform I/O with.

NAS – File Storage

The idea with NAS is that you are accessing files stored on a file server somewhere.  So I have a computer system in the corner that has a network service running on it, and my computer on my desk connects to that system.  Generally this connection is going to be CIFS (for Windows) or NFS (for *nix/vSphere).  The file protocol here varies but we are (most of the time) going to be running over IP.  And yes, sometimes Linux folks access CIFS shares and sometimes Windows folks do NFS, but these are exceptions to the rule.

In part 2, I’ll be covering more of the differences and similarities between these guys.

How Much Journal Space Will EMC RecoverPoint use?

I see this question asked relatively frequently, and it is super easy to answer.  However I wanted to provide some context so that folks can understand a little better about how RecoverPoint works, and why the journal works the way it does.

The Answer

First – how much journal space will RecoverPoint use?  All of it.  Every time.  If you allocate 10GB of journal space to a Consistency Group, RP will use all of it.  And if you allocate 100GB to that same CG (or 500GB), it will again use all of it.  Depending on the write rate, it may take a very long time to fill up, but eventually it will use it all.

(Now, the journal itself is divided into different areas, and for actually storing snapshots it is only able to use part of the total capacity, the rest being reserved.  But here we are just talking about the snapshot area.)

The Reason

The reason this happens is due to how RecoverPoint functions as compared to other technologies where you might allocate capacity for recovery, like snapshot storage space.

Let’s take a moment to discuss snapshot technology, as with VNX snapshots.  In this case you don’t allocate capacity for anything – it just uses free pool space – but the space utilization mechanism is very similar to all snapshot methods.  A snapshot is taken at some point in time, and all blocks are “frozen” at that time in the snapshot.  As changes are made to the real data, one way or another the original data makes its way over to the snapshot space area.  So right after the snapshot is taken, virtually no space is utilized.  And as things change over time, the snapshot space utilization increases.

Then at some point (hopefully) you’d delete the snapshot, the space would be returned, and you’d be using less snapshot space.  Let’s say with daily snapshot scheduling (one snap per day for a week), eventually you’d move into a kind of steady state where total utilization for the week is stable, with some minor peaks and valleys as snapshots get deleted and retaken.  So your utilization might be a little higher on Tuesday than it is on Saturday, but overall most of your Tuesdays will look the same.
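This snapshot accounting can be sketched in a few lines (a deliberately crude model – real copy-on-first-write implementations share preserved blocks between snaps and track them very differently; the names here are invented):

```python
# Rough snapshot-reserve accounting: a snapshot consumes space only for
# blocks that have changed on the primary since it was taken, and gives
# that space back when it is deleted.

class SnapReserve:
    def __init__(self):
        self.snapshots = {}                 # name -> preserved original blocks

    def take(self, name):
        self.snapshots[name] = set()        # brand new snap uses ~no space

    def overwrite_primary(self, blocks):
        for snap in self.snapshots.values():
            snap.update(blocks)             # first overwrite copies originals aside
        # (a real implementation copies each original once and shares it;
        #  this toy model keeps a copy per snap to stay simple)

    def delete(self, name):
        del self.snapshots[name]            # space returns on delete

    def used_blocks(self):
        return sum(len(s) for s in self.snapshots.values())

r = SnapReserve()
r.take("monday")
r.overwrite_primary(range(0, 25))           # change rate drives usage up
print(r.used_blocks())                      # 25
r.take("tuesday")
r.overwrite_primary(range(25, 50))          # monday now holds 50, tuesday 25
r.delete("monday")                          # deleting the snap frees its space
print(r.used_blocks())                      # back to 25
```

The point is that snapshot space usage rises and falls with change rate and snapshot lifetimes – which, as we are about to see, is nothing like how the RecoverPoint journal behaves.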

RecoverPoint is really nothing like this.  Instead, abstractly, I like to think of the journal space as a bucket.  You put the bucket under your production LUN, and any writes get split into the bucket.  Over time the bucket fills up with writes.  This happens for EVERY consistency group, EVERY time, and is why RP will ALWAYS use all journal space.  Of course, the journal is oriented by time, and this is where the bucket analogy begins to break down.  So let’s dig a little deeper.

Think of the RP journal as a line – like waiting to purchase tickets.  Or more accurately, a time line.  Whether you have one journal volume or multiple journal volumes, they still form this same line as a whole.  It starts out empty, and the first write that comes in heads immediately to the front of the line, because there is nothing in it.  Like this:

That first and only write is now our oldest write in the queue (because, again, it is the only write!).

Subsequent writes queue up behind it.  Like this:


Eventually the line capacity (journal capacity) is full and we can’t let anyone else in line, like this:


Now we are at kind of the steady-state from the journal perspective.  The writes at the front of the line (the oldest point in time) start falling off to make room for newer writes as they come into the queue.  You can imagine these blocks are just continually shifting to the right as new writes come in, and old writes fall off and are lost.

This timeline defines your protection window.  You can recover from any point in time all the way back to the oldest write, and how many total writes are in the queue depends on how large the journal space is.  In this manner it is (hopefully) easy to see that RecoverPoint will always use as much journal space as you give it, and the more journal space you give it, the further back in time you can roll.
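The whole queue behavior above fits in a few lines of Python, because the journal timeline is essentially a fixed-length FIFO (capacity measured in "writes" here purely for illustration):

```python
from collections import deque

# The journal as a fixed-length line of writes: once the line is full,
# the oldest write falls off the end to admit the newest one.

journal = deque(maxlen=5)               # a tiny 5-write journal

for t in range(1, 9):                   # eight writes arrive over time
    journal.append(f"write{t}")         # newest joins the back of the line

print(list(journal))                    # writes 1-3 have fallen off and are lost
```

The protection window is exactly `journal[0]` (oldest recoverable point) through `journal[-1]` (newest write), and a bigger `maxlen` is a longer window – there is no state in which the full journal sits partially unused.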

Since I’ve already got the graphics going, and as a bonus, let’s talk about the replica LUN.  The other thing that RP is doing is constantly updating the replica LUN with journal entries.  It figures out where the next write is going, reads the current data from that location (which gets inserted into the journal), and then writes the new data into that location.  As writes pile up, the “journal lag” increases.  Essentially the replica LUN is going to be, at any given point in time, somewhere along this line, like this:



You can see several things depicted in this graphic.  We have our entire timeline of the journal, which is our protection window, with the oldest write at one end and the newest write at the other.  We also have our replica LUN which at this very moment is at the state indicated by the black arrow.

The writes in front of this black arrow are writes that have yet to be distributed to the Replica LUN.  These are the journal lag.  If a ton of new writes happen, more blue stacks up, more green falls off the end, and the Replica LUN state shifts to the right.  Journal lag increases, because we have more data that has not yet been distributed into the Replica LUN, like this:


The green blocks behind this represent the Undo Stream.  This is data that is read FROM the replica LUN and written INTO the journal for an undo operation.  So if RP was going to process that next blue block, it would first find the location in the Replica LUN the block was destined for.  Then it would read and insert the current data into the journal, which would become a new green block at the front of the green blocks.  Finally it would write the blue block into the replica LUN, and the Replica LUN state would advance one block.  And if write I/O ceases for long enough (or there is just enough performance for the replica operations to catch up), then the Replica LUN state advances, the undo stream gets larger, and the journal lag gets smaller.
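That distribution step can be sketched as code, too (a hand-rolled illustration – the addresses, values, and function name are invented, not RecoverPoint internals):

```python
from collections import deque

# One distribution step: before applying the oldest pending write to the
# replica, the replica's current data at that address is read into the
# undo stream -- so you can always roll back.

replica = {0: "old0", 1: "old1"}              # replica LUN contents by address
pending = deque([(0, "new0"), (1, "new1")])   # journal lag: undistributed writes
undo = deque()                                # undo stream (the green blocks)

def distribute_one():
    addr, data = pending.popleft()            # oldest undistributed (blue) write
    undo.appendleft((addr, replica[addr]))    # read current data into the journal FIRST
    replica[addr] = data                      # then advance the replica one write

distribute_one()
print(replica)                                # address 0 updated, address 1 not yet
print(list(undo))                             # the old contents of address 0
print(len(pending))                           # journal lag shrank by one
```

Note the ordering inside `distribute_one`: the undo read must happen before the overwrite, which is exactly why the green blocks exist at all.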

The Summary

In summary:

  • RecoverPoint will always use ALL of the journal space you give it, regardless of the activity of what it is protecting
  • RecoverPoint journal space can be seen as a time line, with the oldest writes on one end and the newest writes on another end.  This time line is the protection window
  • The Replica LUN, at any given point in time, is somewhere along the time line.  Any space between the Replica LUN and the newest write represents the journal lag.