Shedding Light on Storage Encryption

I’ve been noticing some fundamental misunderstandings around storage encryption – I see this most when dealing with XtremIO, although plenty of platforms support it (VNX2 and VMAX, for example).  I hope this blog post helps someone who is missing the bigger picture make a better decision based on the tradeoffs.  This is not going to be a heavily technical post, but it is intended to shed some light on the topic from a strategic angle.

Hopefully you already know this, but at a high level encryption is a way to make data unreadable gibberish to everyone except an entity that is authorized to read it.  The types of storage encryption I’m going to talk about are Data At Rest Encryption (often abbreviated DARE or D@RE), in-flight encryption, and host-based encryption.  I’m talking in this post mainly about SAN (block) storage, but these concepts also apply to NAS (file) storage.  In fact, in-flight encryption is probably far more useful on a NAS array given the inherent security of FC fabrics.  But then iSCSI enters the picture, and things get cloudier.

Before I start: security is a tool, and like any tool it can be used wisely or poorly.  Encryption is security, and not all security (or all encryption) is automatically a good thing.  Consider the idea of cryptographic erasure, by which data is “deleted” merely because it is encrypted and nobody has the key.  Ransomware thrives on this.  You are looking at a server with all your files on it, but without the key they may as well be deleted.  Choosing a security feature for no better business reason than “security is great” is probably a mistake that is going to cause you headaches.

[Diagram: the three zones of storage encryption]

Here is a diagram with 3 zones of encryption.  Notice that host-based encryption overlaps the other two – that is not a mistake as we will see shortly.

Data At Rest Encryption

D@RE these days typically refers to a storage array’s ability to encrypt data at the point of entry (write) and decrypt it on exit (read).  Sometimes this is done with ASICs on an array or I/O module, but it is often done with Self Encrypting Drives (SEDs).  However, the abstract concept of D@RE is simply that data is encrypted “at rest,” or while it is sitting on disk on the storage array.

This might seem like a dumb question, but it is a CRUCIAL one that I’ve seen either not asked or answered incorrectly time and time again: what is the purpose of D@RE?  The point of D@RE is to prevent physical hardware theft from compromising data security.  So, if I nefariously steal a drive out of your array, or a shelf of drives out of your array, and come up with some way to attach them to another system and read them, I will get nothing but gibberish.

Now, keep in mind that this is typically far more of an issue on a small server system than it is on a storage array.  A small server might have just a handful of drives associated with it, while a storage array might have hundreds, or thousands.  And those drives are going to be in some form of RAID protection which leverages striping.  So even without D@RE the odds of a single disk holding meaningful data are small, though admittedly they are not zero.

More to the point, D@RE does not prevent anyone from accessing data on the array itself.  I’ve heard allusions to this idea that “don’t worry about hackers, we’ve got D@RE” which couldn’t be more wrong, unless you think hackers are walking out of your data center with physical hardware.  If the hackers are intercepting wire transmissions, or they have broken into servers with SAN access, they have access to your data.  And if your array is doing the encryption and someone manages to steal the entire array (controllers and all) they will also have access to your data.

D@RE at the array level is also one of the easiest types to deal with from a management perspective, because usually you just let the array handle everything, including the encryption keys.  It is mostly a turn-it-on-and-let-it-run solution.  You don’t notice it, and generally you don’t see any fallout (like performance degradation) from it.

In-Flight Encryption

In-flight encryption refers to data being encrypted over the wire.  Your host issues a write to a SAN LUN, and that write traverses your SAN network and lands on your storage array.  If data is encrypted “in-flight,” then it is encrypted throughout (at least) the switching.

Usually this is accomplished with FC fabric switches that are capable of encryption.  So the switch that sees a transmission on an F port will encrypt it, and then transmit it encrypted along all E ports (ISLs) and then decrypt it when it leaves another F port.  So the data is encrypted in-flight, but not at rest on the array.  Generally we are still talking about ASICs here so performance is not impacted.

Again let’s ask, what is the purpose of in-flight encryption?  In-flight encryption is intended to prevent someone who is sniffing network traffic (meaning they are somehow intercepting the data transmissions, or a copy of the data transmissions, over the network) from being able to decipher data.

For local FC networks this is (in my opinion) not often needed.  FC networks tend to be very secure overall and not really vulnerable to sniffing.  However, for IP based or WAN based communication, or even stretched fabrics, it might be sensible to look into something like this.

Also keep in mind that because data is decrypted before being written to the array, in-flight encryption does not provide the physical security that D@RE does, nor does it prevent anyone from accessing data in general.  You do sometimes have the option of not decrypting when writing to the array.  In that case the data is encrypted when leaving the host and written encrypted on the array itself; it is only decrypted when the host issues a read for it and it exits the F port that host is attached to.  This effectively gives you D@RE as well, with those same benefits.

The real kicker here is key management.  Plain in-flight encryption can be removed at any time without issue – you can remove or disable it and not see any change in data, because at the endpoints it is unencrypted.  However, if the data is written encrypted on the array, then you MUST have those keys to read that data.  If you had some kind of disaster that compromised your switches and keys, you would have a big array full of cryptographically erased data.

Host Based Encryption

Finally, host-based encryption is any software or feature that encrypts LUNs or files on the server itself.  So data that is going to be written to files (whether SAN based or local files) is encrypted in memory before the write actually takes place.

Host-based encryption ends up giving you in-flight encryption and D@RE as well.  So when we ask the question – what is the purpose of host-based encryption? – we get the benefits we saw from in-flight and D@RE, plus another one: even with the same hardware setup, no other host can read your data.  So if I were to forklift your array and fabric switches, and hook up an identical server (hardware, OS, software), I still wouldn’t be able to read your data.  Depending on the setup, if a hacker compromises the server itself in your data center, they may not be able to read the data either.

So why even bother with the other kinds of encryption?  For one, host-based encryption generally incurs a performance hit because it isn’t using ASICs.  Some systems might be able to absorb this, but many won’t.  Unlike D@RE or in-flight encryption, there will be a measurable degradation with this method.  Another reason is that key management again becomes huge here.  Poor key management combined with a server hardware failure can leave that data unreadable by anyone.  And generally your backups will be useless in this situation as well, because you have backups of encrypted data that you can’t read without the original keys.

And frankly, usually D@RE is good enough.  If you have a security issue where host-based encryption is going to be a benefit, usually someone already has the keys to the kingdom in your environment.

Closing Thoughts

Hopefully that cleared up the types of encryption and where they operate.

Another question I see is “can I use more than one at the same time?”  The answer is yes, with caveats.  Nothing prevents you from using all three at once, even though it wouldn’t really make any sense.  Generally you want to avoid overlapping, because encrypting data that is already encrypted is a waste of resources.  A sensible pairing might be D@RE on the array and in-flight encryption on your switching.

A final HUGELY important note – and what really prompted me to write this post – is to make sure you fully understand the effect of encryption on all of your systems.  I have seen this come up in a discussion about pairing host-based encryption with XtremIO D@RE.  The question asked was “will it work?” but the question should have been “should we do this?”

Will it work?  Sure – there is nothing problematic about host-based encryption and XtremIO D@RE interacting, other than the XtremIO system encrypting already encrypted data.  What is problematic is that encrypted data does not compress, and most encrypted data won’t dedupe either…or at least not anywhere close to the level of unencrypted data.  XtremIO generally relies on its fantastic inline compression and dedupe features to fit a lot of data in a small footprint.  XtremIO’s D@RE happens behind the compression and deduplication, so there is no issue there.  Host-based encryption, however, happens ahead of the dedupe/compression and will absolutely destroy your savings.

So if you wanted to use the system like this, I would ask: how was it sized?  Was it sized with assumptions about good compression and dedupe ratios, or was it sized assuming no space savings?  Does the extra money you will spend on the host-based encryption product, plus the extra money you will spend on the additional required storage, justify the business problem you were trying to solve?  Was there even a business problem at all?  A better fit would probably be something like a tiered VNX2 with FAST Cache, which could easily handle a lot of raw capacity and use the flash where it helps the most.

Again, security is a tool, so choose the tools you need, use them judiciously, and make sure you fully understand their impact (end-to-end) in your environment.

SAN vs NAS Part 2: Hardware, Protocols, and Platforms, Oh My!

In this post we are going to explore some of the various options for SAN and NAS.

SAN

There are a couple of methods and protocols for accessing SAN storage.  One is Fibre Channel (note: this is not misspelled, the protocol is Fibre, the cables are fiber) where SCSI commands are encapsulated within Fibre Channel frames.  This may be direct Fibre Channel (“FC”) over a Fibre Channel fabric, or Fibre Channel over Ethernet (“FCoE”) which further encapsulates Fibre Channel frames inside ethernet.

With direct Fibre Channel you’ll need some FC Host Bus Adapters (HBAs), and probably some FC switches like Cisco MDS or Brocade (unless you plan on direct attaching a host to an array which most of the time is a Bad Idea).

With FCoE you’ll be operating on an ethernet network typically using Converged Network Adapters (CNAs).  Depending on the type of fabric you are building, the array side may still be direct FC, or it may be FCoE as well.  Cisco UCS is a good example of the split out, as generally it goes from host to Fabric Interconnect as FCoE, and then from Fabric Interconnect to array or FC switch as direct Fibre Channel.

SAN storage can also be accessed via iSCSI, which encapsulates SCSI commands within IP over a standard network.  And then there are some other odd mediums like InfiniBand, or direct attach via SAS (here we are kind of straying away from SAN and really just directly attaching disks, but I digress).

What kind of SAN you use depends largely on the scale and type of your infrastructure.  Generally if you already have FC infrastructure, you’ll stay FC.  If you don’t have anything yet, you may go iSCSI.  Larger, more performance-sensitive environments typically trend toward FC, while small shops trend toward iSCSI.  That isn’t to say that one is necessarily better than the other – they have their own positives and negatives.  For example, FC has its own learning curve with fabric management like zoning, while iSCSI connections are just point to point over existing networks that someone probably already knows.  The one thing I will caution against here: if you are going with iSCSI, watch out for 1Gb configurations – there is not a lot of bandwidth and the network can get choked VERY quickly.  I personally prefer FC because I know it well and trust its stability, but again there are positives and negatives on both sides.

Back to the subject at hand – in all cases with SAN the recurring theme here is SCSI commands.  In other words, even though the “disk” might be a virtual LUN on an array 10 feet (or 10 miles) away, the computer is treating it like a local disk and sending SCSI disk commands to it.

Some array platforms are SAN only, like the EMC VMAX 10K, 20K, 40K series.  EMC XtremIO is another example of a SAN only platform.  And then there are non-EMC platforms like 3PAR, Hitachi, and IBM XIV.  Other platforms are unified, meaning they do both SAN and NAS.  EMC VNX is a good example of a unified array.  NetApp is another competitor in this space.  Just be aware that if you have a SAN only array, you can’t do NAS…and if you have a NAS only array (yes they exist, see below), you can’t do SAN.  Although some “NAS” arrays also support iSCSI…I’d say most of the time this should be avoided unless absolutely necessary.

NAS

NAS on the other hand is virtually always over an IP network.  This is going to use standard ethernet adapters (1Gb or 10Gb) and standard ethernet switches and IP routers.

As far as protocols go, there is CIFS, which is generally used for Windows, and NFS, which is generally used on the Linux/Unix/vSphere side.  CIFS has a lot of tie-ins with Active Directory, so if you are a Windows shop with an AD infrastructure, it is pretty easy to leverage your existing groups for permissions.  NFS doesn’t have these same ties to AD, but does support NIS for some authentication services.

The common theme on this side of the house is “file,” which can be interpreted as “file system.”  With CIFS, generally you are going to connect to a “share” on the array, like \\MYARRAY1\MYAWESOMESHARE.  This may be just through a file browser for a one-time connection, or it may be mounted as a drive letter via the Map Network Drive feature.  Note that even though it is mounted as a drive letter, it is still not the same as an actual local disk or SAN attached LUN!

For NFS, an “export” will be configured on the array and then mounted on your computer.  This actually gets mounted within your file system.  So you may have your home directory in /users/myself, and you create a directory “backups” and mount an export to it doing something like mount -t nfs 172.0.0.10:/exports/backups /users/myself/backups.  Then you access any files just as you would any other ones on your computer.  Again note that even though the NFS export is mounted within your file system, it is still not the same as an actual local disk or SAN attached LUN!

Which type of NAS protocol you use is generally determined by the majority of your infrastructure – whether it is Windows or *nix.  Or you may run both at once!  Running and managing both NFS and CIFS is really more of a hurdle with understanding the protocols (and sometimes licensing both of them on your storage array), whereas the choice to run both FC and iSCSI has hardware caveats.

For NAS platforms, we again look to the unified storage like EMC VNX.  There are also NAS gateways that can be attached to a VMAX for NAS services.  EMC also has a NAS only platform called Isilon.

One thing to note: if your array doesn’t support NAS (say you have a VMAX or XtremIO), the gateway solution is definitely viable and enables some awesome features, but it is also pretty easy to spin up a Windows/Linux VM, or use a Windows/Linux physical server (but seriously, please virtualize!), that consumes array block storage and then serves up NAS itself.  So you could create a Windows file server on the VMAX, and all your NAS clients would connect to the Windows machine.

The reverse is not really true…if your array doesn’t support SAN, it is difficult to wedge SAN into the environment.  You can always do NFS with vSphere, but if you need block storage you should really purchase some infrastructure for it.  iSCSI is a relatively simple thing to insert into an existing environment, just again beware 1Gb bandwidth.

Protection

One final note I wanted to mention is about protection.  There are methods for replicating file and block data, but many times these are different mechanisms, or at least they function in different ways.  For instance, EMC RecoverPoint is a block replication solution.  EMC VNX Replicator is a file replication solution.  RP won’t protect your file data (unless you franken-config it to replicate your file LUNs), and Replicator won’t protect your block data.  NAS supports NDMP while SAN generally does not.  Some solutions, like NetApp snapshots, do function on both file and block volumes, but they are still very different in how they are taken and restored…block snapshots should be initiated from the host the LUN is mounted to (in order to avoid disastrous implications regarding host buffers and file system consistency) while file snapshots can be taken from any old place you please.

I say all this just to say, be certain you understand how your SAN and NAS data is going to be protected before you lay down the $$$ for a new frame!  It would be a real bummer to find out you can’t protect your file data with RecoverPoint after the fact.  Hopefully your pre-sales folks have you covered here but again be SURE!

And……..

We’ve drawn a lot of clear distinctions between SAN and NAS, which kind of fall back into the “bullet point” message that I talked about in my first post.  All that is well and good, but here is where the confusion starts to set in: in both NAS cases (CIFS and NFS), on your computer the remote array may appear to be a disk.  It may look like a local hard drive, or even appear very similar to a SAN LUN. This leads some people to think that they are the same, or at least are doing the same things.  I mean, after all, they even have the same letters in the acronym!

However, your computer never issues SCSI commands to a NAS.  Instead it issues commands to the remote file server for things like create, delete, read, write, etc.  Then the remote file server issues SCSI (block) commands to its disks in order to make those requests happen.

In fact, a major point of understanding here is, “who has the file system?”  This will help you understand who can do what with the data.  In the next post we are going to dive into this question head first in a linux lab environment.

Thoughts on Thin Provisioning

I finally decided to put all my thoughts down on the topic of thin provisioning.  I wrestled with this post for a while because some of what I say is going to go kinda-sorta against a large push in the industry towards thin provisioning.  This is not a new push; it has been happening for years now.  This post may even be a year or two too late…

I am not anti-thin – I am just not 100% pro-thin.  I think there are serious questions that need to be addressed and answered before jumping on board with thin provisioning.  And most of these are relatively non-technical; the real issue is operational.

Give me a chance before you throw the rocks.

What is Thin Provisioning?

First let’s talk about what thin provisioning is, for those readers who may not know.  I feel like this is a pretty well known and straightforward concept so I’m not going to spend a ton of time on it.  Thin provisioning at its core is the idea of provisioning storage space “on demand.”

Before thin provisioning a storage administrator would have some pool of storage resources which gave some amount of capacity.  This could be simply a RAID set or even an actual pooling mechanism like Storage Pools on VNX.  A request for capacity would come in and they would “thick provision” capacity out of the pool.  The result would mean that the requested capacity would be reserved from the pooled capacity and be unavailable for use…except obviously for whatever purpose it was provisioned for.  So for example if I had 1000GB and you requested a 100GB LUN, my remaining pool space would be 900GB.  I could use the 900GB for whatever I wanted but couldn’t encroach into your 100GB space – that was yours and yours alone.  This is a thick provisioned LUN.

Of course back then it wasn’t “thick provisioning,” it was just “provisioning” until thin came along! With thin provisioning, after the request is completed and you’ve got your LUN, the pool is still at 1000GB (or somewhere very close to it due to metadata allocations, which are beyond the scope of this post).  I have given you a 100GB LUN out of my 1000GB pool and still I have 1000GB available.  Remember that as soon as you get this 100GB LUN, you will usually put a file system on it, and then it will appear empty.  This emptiness is the reason the 100GB LUN doesn’t take up any space…there isn’t really any data on it until you put it there.

Essentially the thin LUN is going to take up no space until you start putting stuff into it.  If you put 10GB of data into the LUN, then it will take up 10GB on the back side.  My pool will now show 990GB free.  You should have a couple of indicators on the array like allocated or subscribed or committed and consumed or used.  Allocated/subscribed/committed is typically how much you as the storage administrator have created in the pool.  Consumed or used is how much the servers themselves have eaten up.
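To make the allocated-versus-consumed distinction concrete, here is a minimal Python sketch of the bookkeeping.  The class and field names are hypothetical, not any array’s actual implementation:

    # A minimal sketch of thin pool accounting: "allocated" tracks what the
    # storage admin has provisioned, "consumed" tracks what hosts have written.
    class ThinPool:
        def __init__(self, capacity_gb):
            self.capacity_gb = capacity_gb
            self.allocated_gb = 0   # total size of LUNs created from the pool
            self.consumed_gb = 0    # space actually written by hosts

        def create_thin_lun(self, size_gb):
            # Creating a thin LUN only raises the allocated figure;
            # free capacity is untouched until data is written.
            self.allocated_gb += size_gb

        def write_data(self, size_gb):
            # Writes consume real capacity from the pool.
            if self.consumed_gb + size_gb > self.capacity_gb:
                raise RuntimeError("Pool is full - a very bad day")
            self.consumed_gb += size_gb

        @property
        def free_gb(self):
            return self.capacity_gb - self.consumed_gb

    pool = ThinPool(1000)
    pool.create_thin_lun(100)   # the 100GB LUN from the example
    print(pool.free_gb)         # still 1000 - nothing consumed yet
    pool.write_data(10)         # host puts 10GB into the LUN
    print(pool.free_gb)         # 990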

What follows are, in no particular order, some things to keep in mind when thin provisioning.

Communication between sysadmin and storage admin

This seems like a no-brainer but a discussion needs to happen between the storage admins providing the storage and the sysadmins who are consuming it.  If a sysadmin is given some space, they typically see this as space they can use for whatever they want.  If they need a dumping ground for a big ISO, they can use the SAN attached LUN with 1TB of free space on it.  Essentially they will likely feel that space you’ve allocated is theirs to do whatever they want with.  This especially makes sense if they’ve been using local storage for years.  If they can see disk space on their server, they can use it as they please.  It is “their” storage!

You need to have this conversation so that sysadmins understand activities and actions that are “thin hostile.”  A thin hostile action is one that effectively nullifies the benefit of thin provisioning by eating up space from day 1.  An example of a thin hostile action would be hard formatting the space a 500GB database will use up front, before it is actually in use.  Another example of a thin hostile action would be to do a block level zero formatting of space, like Eager Zero Thick on ESX.  And obviously using excess free space on a LUN for a file dumping ground is extremely thin hostile!

Another area of concern here is deduplication.  If you are using post-process deduplication, and you have thin provisioned storage, your sysadmins need to be aware of this when it comes to actions that would overwrite a significant amount of data.  You may dedupe their data space by 90%, but if they come in and overwrite everything it can balloon quickly.

The more your colleagues know about how their actions can affect the underlying storage, the less time you will spend fire fighting.  Good for them, good for you.  You are partners, not opponents!

Oversubscription & Monitoring

With thin provisioning, because no actual reservation happens on disk, you can provision as much storage as you want out of as small a pool as you want.  When you exceed the physical media, you are “oversubscribing” (or overcommitting, or overprovisioning, or…).  For instance, with your 1000GB you could provision 2000GB of storage.  In this case you would be 100% oversubscribed.  You don’t have issues as long as the total used or consumed portion is less than 1000GB.

There are a lot of really appealing reasons for doing this.  Most of the time people ask for more storage than they really need…and if it goes through several “layers” of decision makers, that might amplify greatly.  Most of the time people don’t need all of the storage they asked for right off the bat.  Sometimes people ask for storage and either never use it or wait a long time to use it.  The important thing to never forget is that from the sysadmin’s perspective, that is space you guaranteed them!  Every last byte.

Oversubscription is a powerful tool, but you must be careful about it.  Essentially this is a risk-reward proposition: the more people you promise storage to, the more you can leverage your storage array, but the more you risk that they will actually use it.  If you’ve given out 200% of your available storage, that may be a scary situation when a couple of your users decide to make good on the promise of space you made to them.  I’ve seen environments with as much as 400% oversubscription.  That’s a very dangerous gamble.

Thin provisioning itself doesn’t provide much benefit unless you choose to oversubscribe.  You should make a decision on how much you feel comfortable oversubscribing.  Maybe you don’t feel comfortable at all (if so, are you better off thick?).  Maybe 125% is good for you.  Maybe 150%.  Nobody can make this decision for you because it hinges on too many internal factors.  The important thing here is to establish boundaries up front.  What is that magic number?  What happens if you approach it?
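If it helps, here is a small hedged sketch of the oversubscription math in Python.  The threshold value is purely an example, not a recommendation:

    # Oversubscription as described above: 2000GB handed out of a 1000GB pool
    # is "100% oversubscribed" in this post's terminology.
    def oversubscription_pct(allocated_gb, capacity_gb):
        return (allocated_gb / capacity_gb - 1) * 100

    capacity = 1000
    allocated = 2000
    print(oversubscription_pct(allocated, capacity))   # 100.0

    MAX_OVERSUBSCRIPTION_PCT = 150   # the "magic number" you committed to up front
    if oversubscription_pct(allocated, capacity) > MAX_OVERSUBSCRIPTION_PCT:
        print("Stop provisioning from this pool and revisit the plan")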

Monitoring goes hand in hand with this.  If you monitor your environment by waiting for users to email that systems are down, oversubscribing is probably not for you.  You need to have a firm understanding of how much you’ve handed out and how much is being used.  Again, establish thresholds, establish an action plan for exceeding them, and monitor them.

Establishing and sticking with thresholds like this really helps speed up and simplify decision making, and makes it very easy to measure success.  You can always re-evaluate the thresholds if you feel like they are too low or too high.

Also make sure your sysadmins are aware of whether you are oversubscribed or not, and what that means to them.  If they are planning on a massive expansion of data, maybe they can check with you first.  Maybe they requested storage for a project and waited 6 months for it to get off the ground – again they can check with you to make sure all is well before they start in on it.  These situations are not about dictating terms, but more about education.  Many other resources in your environment are likely oversubscribed.  Your network is probably oversubscribed.  If a sysadmin in the data center decided to suddenly multicast an image to a ton of servers on a main network line, you’d probably have some serious problems.  You probably didn’t design your network to handle that kind of network traffic (and if you did you probably wasted a lot of money).  Your sysadmins likely understand the potential DDoS effect this would generate, and will avoid it.  Nobody likes pain.

“Runway” to Purchase New Storage

Remember with thin provisioning you are generally overallocating and then monitoring (you are monitoring, aren’t you?) usage.  At some point you may need to buy more storage.

If you wait till you are out of storage, that’s no good right?  You have a 100% consumed pool, with a bunch of attached hosts that are thinking they have a lot more storage to run through.  If you have oversubscribed a pool of storage and it hits 100%, it is going to be a terrible, horrible, no good, very bad day for you and everyone around you.  At a minimum new writes to anything in that pool will be denied, effectively turning your storage read-only.  At a maximum, the entire pool (and everything in it) may go offline, or you may experience a variety of fun data corruptions.

So, you don’t want that.  Instead you need to figure out when you will order new storage.  This will depend on things like:

  • How fast is your storage use growing?
  • How many new projects are you implementing?
  • How long does it take you to purchase new storage?

The last point is sometimes not considered until it is too late.  When you need more storage you have to first figure out exactly what you need, then spec it, then get a quote, then the quote needs approval, then purchasing, then shipping, then it needs to be racked/stacked, then implemented.  How long does this process take in your organization?  Again, nobody can answer this but you.  If your organization has a fast turnaround time, maybe you can afford to wait till 80% full or more.  But if you are very sluggish, you might need to start that process at 60% or less.
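As a rough illustration (all numbers hypothetical), you could sketch the “when do I start buying” math like this:

    # Given how fast the pool is filling and how long procurement takes,
    # estimate the utilization at which the purchase process must start.
    def order_threshold_pct(growth_gb_per_month, procurement_months, capacity_gb,
                            safety_margin_pct=10):
        runway_gb = growth_gb_per_month * procurement_months
        threshold = (capacity_gb - runway_gb) / capacity_gb * 100 - safety_margin_pct
        return max(threshold, 0)

    # Slow-moving organization: 6 months to get new disk on the floor.
    print(order_threshold_pct(growth_gb_per_month=500, procurement_months=6,
                              capacity_gb=10000))   # 60.0 -> start buying at ~60% used

    # Fast turnaround: 2 months.
    print(order_threshold_pct(growth_gb_per_month=500, procurement_months=2,
                              capacity_gb=10000))   # 80.0 -> can wait until ~80% used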

Another thing to consider is if you are a sluggish organization, you may save money by thick provisioning.  Consider that you may need 15TB of storage in 2 years.  Instead you buy 10TB of storage right off the bat with a 50% threshold.  As soon as you hit 5TB of storage used you buy another 10TB to put you at 20.  Then when you hit 10 you buy another 10TB to put you at 30.  Finally at 15TB you purchase again and hit 40TB.  If you had bought 20 to begin with and gone thick, you would have never needed to buy anything else.  This situation is probably uncommon but I wanted to mention it as a thought exercise.  Think about how the purchasing process will impact the benefit you are trying to leverage from thin provisioning.
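And a quick sanity check of that thought exercise, assuming 1TB growth steps and a 50% purchase threshold:

    # Simulate the sluggish-organization scenario: grow to 15TB of actual use,
    # buying 10TB every time utilization crosses 50%.
    capacity_tb = 10          # initial thin purchase
    used_tb = 0
    while used_tb < 15:
        used_tb += 1          # grow 1TB at a time
        if used_tb / capacity_tb >= 0.5:
            capacity_tb += 10 # another purchase kicks off

    print(capacity_tb)        # 40TB purchased in total, versus 20TB bought thick up front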

Performance Implications

Simply – ask your vendor whether thin storage has any performance difference over thick.  The answer with most storage arrays (where you have an actual choice between thick and thin) is yes.  Most of the time this is a negligible difference, and sometimes the difference is only in the initial allocation – that is to say, the first write to a particular LBA/block/extent/whatever.  But again, ask.  And test to make sure your apps are happy on thin LUNs.

Feature Implications

Thin provisioning may have feature implications on your storage system.

Sometimes thin provisioning enables features.  On a VMAX, thin provisioning enables pooling of a large number of disks.  On a VNX thin provisioning is required for deduplication and VNX Snapshots.

And sometimes thin provisioning either disables or is not recommended with certain features.  On a VNX thin LUNs are not recommended for use as File OE LUNs, though you can still do thin file systems on top of thick LUNs.

Ask what impact thin vs thick will have on array features – even ones you may not be planning to use at this very second.

Thin on Thin

Finally, in virtualized environments, in general you will want to avoid “thin on thin.”  This is a thin datastore created on a thin LUN.  The reason is that you tend to lose a static point of reference for how much capacity you are overprovisioning.  And if your virtualization team doesn’t communicate too well with the storage team, they could be unknowingly crafting a time bomb in your environment.

Your storage team might have decided they are comfortable with a 200% oversubscription level, and your virt team may have made this same decision.  This will potentially overallocate your storage by 400%!  Each team is sticking to their game plan, but without knowing and monitoring the other folks they will never see the train coming.
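Loosely treating “200%” as “twice the underlying capacity handed out” at each layer, the compounding looks like this back-of-the-envelope sketch:

    # Thin-on-thin compounding, illustrative numbers only.
    physical_tb = 100                 # raw usable capacity on the array
    luns_tb = physical_tb * 2         # storage team hands out 2x as thin LUNs
    vm_disks_tb = luns_tb * 2         # virt team thin-provisions 2x on those datastores

    print(vm_disks_tb / physical_tb)  # 4.0 -> guests believe they own 4x what actually exists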

You can get away with thin on thin if you have excellent monitoring, or if your storage and virt admins are one and the same (which is common these days).  But my recommendation still continues to be thick VMs on thin datastores.  You can create as many thin datastores as you want, up to system limits, and then create lazy zeroed thick VMs on top of them.

Edit: this recommendation assumes that you are either required or compelled to use thin storage.  Thin VMs on thick storage are just as effective, but sometimes you won’t have a choice in this matter.  The real point is keeping one side or the other thick gives you a better point of reference for the amount of overprovisioning.

Summary

Hopefully this provided some value in the form of thought processes around thin provisioning.  Again, I am not anti-thin; I think it has great potential in some environments.  However, I do think it needs to be carefully considered and thought through when it sometimes seems to be sold as a “just thin provision, it will save you money” concept.  It really needs to be fleshed out differently for every organization, and if you take the time to do this you will not only better leverage your investment, but you can avoid some potentially serious pain in the future.

Just what the heck is Geometry Limited on VMAX FTS?

VMAX is a truly amazing piece of hardware with truly amazing features and unfortunately some truly mind-boggling concepts behind it.  I think really this comes with the territory – most often the tools with the biggest capabilities and flexibilities require a lot of knowledge to configure and understand.

One concept that I struggled with on the VMAX was “geometry limited” via Federated Tiered Storage so I thought I would provide some info for anyone else who is having trouble with it. This also gives me the opportunity to talk briefly about some other VMAX topics.

Federated Tiered Storage (FTS) is the ability to have the VMAX leverage third-party storage arrays behind it.  So as a simple example, I could connect a VNX to the back of a VMAX, and then present usable storage to the VMAX, and then present that storage to a host.  The VMAX in this case acts more like a virtual front end than a storage array itself.

[Diagram: a simple FTS configuration, with a VNX presenting storage through a VMAX]

FTS is a complex subject that could itself be the subject of multiple posts.  Instead I want to provide a high level overview of concepts, and then go over the ‘geometry limited’ part.  There are two ways that FTS manages the other arrays: external provisioning and encapsulation.

External Provisioning

External provisioning is the easiest to understand.  In this method you present some amount of storage from the external array as a LUN (or multiple LUNs) to the VMAX, and the VMAX interprets these as special disks.  It then uses those disks in much the same manner as it would use a direct attached disk.  I say the disks are special because the VMAX relies on the assumption that the external array is going to RAID protect the data.  Therefore, there is no need to once again RAID protect them on the VMAX like it would normally do – doing so would waste space and likely degrade performance.  Because of this, the VMAX manipulates them in unprotected form.  And because they are just seen as attached disks, it also formats them, which destroys any data that is on them.

[Diagram: external provisioning]

From here you can do most of the stuff you would do with a normal set of disks.  You can create a thin pool out of them and even use them as a tier in FAST VP.  This is a good way to really maximize the benefits of FTS and leverage your older storage arrays.  Or simply just consolidate your arrays into one point of management.  Cool feature, but again any data on the LUNs will evaporate.

Encapsulation

Encapsulation is the focus of this post and is a little more complicated.  Encapsulation allows you to preserve data on a LUN presented for the purposes of FTS.  For example, if you had an existing LUN on a VNX that a host was using (say, Oracle database data) and you wanted to present that LUN through the VMAX, you probably wouldn’t want to use External Provisioning because the VMAX would wipe out the data.  Instead you do what is known as encapsulation.

[Diagram: encapsulation]

When you encapsulate the external LUN, the VMAX preserves all the data on the LUN (it can do this in either a thick or a thin fashion).  So you could then connect your Oracle database server to the VMAX, attach the encapsulated LUN to it, and voilà! all your data is available.  The data is preserved by creating a “dummy LUN” of sorts on the VMAX and passing I/O through from the external array to the host.

Encapsulation is neat but there are some restrictions around it.  For instance, an encapsulated LUN can’t be the target of a VLUN migration (though you can VLUN migrate it somewhere else) and an encapsulated LUN can’t participate in FAST (whether geometry limited or not).

Some encapsulated LUNs are geometry limited and some aren’t.

Device Sizing

In order to understand what geometry limited means, you must first understand how the VMAX sees device sizes.  VMAX sizing is always done in cylinders (15 tracks each), which work out to 960KB.  This means that what I would consider common LUN sizes (100GB, 500GB, 1TB, 2TB) don’t actually “fit” on a VMAX.  Instead, if you ask it to create a 100GB LUN, it rounds up to the nearest cylinder(ish).  You can kind of think of this as a “Do No Harm” rule.  If you request a device size that falls exactly onto a cylinder boundary, you get that exact device size.  If you request one that falls outside of a cylinder boundary, the VMAX rounds up in order to make sure that you get all the space you originally requested.

We won’t get a 100GB LUN because 100GB doesn’t fall onto a cylinder boundary:

100GB * 1024 * 1024 = 104857600KB / 960KB = 109226.6 Cylinders

So we might end up with a device that is actually 109227 cylinders, which would be (109227 * 960KB / 1024 /1024) 100.000305GB.

During “normal” operations this difference is not particularly meaningful (unless you are trying to match device sizes for things like replication, in which case it becomes tremendously important), but it is important to understand.
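For the curious, here is a small Python sketch of that rounding using the 960KB cylinder size.  This is just the arithmetic, not anything Symmetrix-specific, and the helper names are made up:

    # Round a requested size up to whole 960KB cylinders ("Do No Harm").
    CYL_KB = 960

    def cylinders_for(size_gb):
        size_kb = size_gb * 1024 * 1024
        return -(-size_kb // CYL_KB)        # ceiling division

    def actual_gb(cylinders):
        return cylinders * CYL_KB / 1024 / 1024

    print(cylinders_for(100))               # 109227 cylinders
    print(actual_gb(cylinders_for(100)))    # ~100.0003GB, slightly more than requested
    print(cylinders_for(15))                # 16384 - 15GB lands exactly on a boundary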

It is also important to understand that there is a maximum device size on a VMAX, and that is 262668 cylinders, or 240.479GB.  In order for a device to be larger than this, you must create what is known as a meta device, or several devices bundled together into a larger virtual device.  For instance, if I needed an 800GB LUN, I could “meta” four 200GB regular devices together and have an 800GB device.
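A quick sketch of the member-count arithmetic (hypothetical helper, same cylinder math as above) shows why four members cover an 800GB request given the 262668-cylinder maximum:

    # How many meta members are needed for a requested size, given the
    # 262668-cylinder maximum device size. Illustrative arithmetic only.
    CYL_KB = 960
    MAX_DEVICE_CYLS = 262668

    def min_meta_members(size_gb):
        total_cyls = -((-size_gb * 1024 * 1024) // CYL_KB)   # round up to cylinders
        return -(-total_cyls // MAX_DEVICE_CYLS)             # round up to members

    print(min_meta_members(800))   # 4 members, matching the four-member example above
    print(min_meta_members(200))   # 1 - fits in a single device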

Geometry Limited

So finally, what does geometry limited mean?  Geometry limited is what happens when you encapsulate a LUN into a device that does not match up exactly with a VMAX device from a size perspective.  In other words, the “dummy LUN” on the VMAX is larger than the actual LUN on the remote array. Again remember the “Do No Harm” philosophy here.  You are asking the VMAX to preserve data from an external array, and there is a good chance that external device will not align with cylinder boundaries.  The VMAX in this case can’t round down, because it would effectively be chopping off the last parts of what you asked it to preserve – not good!  Instead, because it needs to preserve the entire external LUN, and it is required that device sizes align to cylinder boundaries, the VMAX device size is larger than the actual LUN on the external array – this is exactly what causes a device to be geometry limited.  If it happens that the external LUN matches up precisely with the cylinder boundary, it is not geometry limited.

With geometry limited devices, the VMAX is going to “fake” the sizing to the host.  So no matter how large the VMAX device is, the host is only going to see exactly what is on the original array.

To demonstrate, there are two specific instances where this will happen.

Scenario the First

The first occurrence of geometry limited would be when the external device size does not align with a cylinder boundary.  This happens exactly like my previous example with the 100GB LUN.  Any device size where the VMAX would have to round-up to match a cylinder boundary would be geometry limited. For instance, if I were to encapsulate a 15GB LUN, this device would not be geometry limited.  This is because 15GB fits exactly into 960KB cylinders (16384 cylinders = 16384 * 960KB / 1024 / 1024 = 15GB).

But if I were to encapsulate a 50GB LUN, the VMAX needs to preserve the entire 50GB even though it doesn’t align with cylinder values.  Similar to my 100.000305GB LUN above, the VMAX “dummy LUN” must be slightly larger than the 50GB LUN on the external array.

50GB * 1024 * 1024 = 52428800KB / 960KB = 54613.3 Cylinders

So the “dummy LUN” in this case needs to be 54614 cylinders in order to preserve all of the 50GB of data, and 54614 cylinders is larger than the original 50GB device.  Hence this encapsulated LUN would be geometry limited.

Scenario the Second

The second occurrence of geometry limited happens due to meta configuration.  Let’s encapsulate a 300GB LUN.

300GB * 1024 * 1024 = 314572800KB / 960KB = 327680 cylinders

OK good news!  The device falls exactly onto a cylinder boundary, so no geometry limited feature here right?  Well, maybe.

Remember that the max device size on a VMAX is around 240GB, so in this case we need a meta configuration to create the VMAX device.  Whether this device is geometry limited or not revolves around exactly how that meta is created.

  • Sometimes the meta settings force all the members to be a specific size
  • Sometimes the meta settings force all the members to be the same size

Either of these can result in a geometry limited device.  In this case, imagine that our meta member size was 125GB and we forced all members to be the same size.  We would end up with 3 members and a 375GB meta – 75GB larger than the original device.  Again, this is a geometry limited device.

Another weird situation can arise when the original device might fall on a cylinder boundary but the meta member count causes it to deviate.  For instance if we tried to do a 7 member meta for the 300GB device.  Even with the proper meta settings, this is going to be a geometry limited device because 300GB / 7 will not align onto 960KB.
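Here is a hedged sketch tying both scenarios together: a device ends up geometry limited whenever the VMAX side (whole cylinders, possibly inflated further by the meta layout) works out larger than the external LUN.  The function names are hypothetical, and this ignores details like forced member sizes:

    # Is an encapsulated LUN geometry limited? Illustrative arithmetic only.
    CYL_KB = 960

    def to_cylinders(size_gb):
        size_kb = size_gb * 1024 * 1024
        return -(-size_kb // CYL_KB)                  # round up to whole cylinders

    def is_geometry_limited(external_gb, meta_members=1):
        external_cyls = external_gb * 1024 * 1024 / CYL_KB
        # Each meta member must itself be a whole number of cylinders, so the
        # external size is split across members and each piece is rounded up.
        member_cyls = -(-to_cylinders(external_gb) // meta_members)
        vmax_cyls = member_cyls * meta_members
        return vmax_cyls > external_cyls

    print(is_geometry_limited(15))                    # False - falls on a boundary
    print(is_geometry_limited(50))                    # True  - rounds up past 50GB
    print(is_geometry_limited(300, meta_members=7))   # True  - 300GB/7 doesn't align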

What does it mean and what can I do about it?

Geometry limited devices have several restrictions (do your own research to validate).

  • Can only be source devices for local replication
  • Can only be R1 devices for SRDF
  • Can’t be expanded

Getting rid of the geometry limited condition is possible but still strange.  For instance, you can VLUN migrate to an internal pool.  This fixes the geometry limited part!  What’s sort of bizarre is that the size mismatch still exists.  Further, the VMAX will continue to “fake” the sizing to the host even if the VMAX device is expanded.  In order to fix this you need to reset the LUN geometry, which requires that you unmount/unmap the device from all FAs…so it is disruptive.

Wrap Up

There are a lot of potential use cases for FTS and it is some really sweet technology.  However, if you are going to use encapsulation, you should understand these limitations and make sure that you aren’t painting yourself into a corner.