RAID: Part 2 – Mirroring and Striping

In this post I’m going to cover two of the three patterns of RAID, as used in RAID0 and RAID1.  Initially I was going to include RAID1/0, but this one got a bit too long, so I’ll break that out into its own post.

Mirroring

Mirroring is one of the more straightforward RAID patterns, and is employed in RAID1 and RAID1/0 schemes.  Mirroring is used for data protection.

Exactly as it sounds, mirroring involves pairs of disks (always pairs – exactly two) that are identical copies of each other.  In RAID1, a virtual disk is composed of two physical disks on the back end. So as I put files on the virtual disk that my server sees, the array is essentially making two copies of them on the back end.

mirroring

In this scenario imagine a server is writing the alphabet to disk.  The computer host starts with A and continues through to I.  With the single physical disk on the left hand side, the writes end up on that one disk as expected.

On the right hand side, using the RAID1 pair, the computer host writes the same information, but on the storage array the writes are mirrored to two physical disks.  This pair must have identical contents at all times in order to guarantee the consistency and protection of RAID1.

The two copies are accomplished by write splitting, generally done by the RAID controller (the brains behind RAID protection – sometimes dedicated hardware, sometimes software).  The RAID controller detects a write coming into the system, realizes that the RAID type is 1, and splits the write onto the two physical disks of the pair.
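
To make the write splitting idea a bit more concrete, here is a minimal Python sketch of a toy RAID1 controller.  It is purely illustrative – the class and its in-memory “disks” are invented for this post, not any real array’s code – but it shows the split, and it also shows the 2:1 back-end-to-front-end write ratio we’ll get to in a moment.

```python
# A toy illustration of RAID1 write splitting, not any vendor's implementation.
# Each "disk" is just a Python dict mapping a block address to its contents.

class Raid1Controller:
    def __init__(self):
        self.disk_a = {}        # first member of the mirrored pair
        self.disk_b = {}        # second member of the mirrored pair
        self.backend_writes = 0

    def write(self, block, data):
        """One front-end write becomes two back-end writes (the 2:1 penalty)."""
        self.disk_a[block] = data
        self.disk_b[block] = data
        self.backend_writes += 2

    def read(self, block):
        """Reads can be served from either copy; here we just use disk_a."""
        return self.disk_a[block]

ctrl = Raid1Controller()
for block, letter in enumerate("ABCDEFGHI"):
    ctrl.write(block, letter)

print(ctrl.backend_writes)          # 18 back-end writes for 9 front-end writes
print(ctrl.disk_a == ctrl.disk_b)   # True -- the pair is always identical
```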

How much capacity is available in both scenarios for the hosts?  300GB.  In the RAID1 configuration, even though there is a total of 600GB of space, 50% of the raw capacity is lost to RAID1.  Usable capacity is one of the key differentiators between the different RAID schemes.  Sometimes this is expressed in terms of usable and raw capacities – in this case there is 600GB of raw capacity, but only 300GB usable.  Again, the 50% factor comes into play.

Another important concept to start to understand is what is called the write penalty.  The write penalty of any RAID protection scheme (except RAID0, discussed below) exists because in order to protect data, the storage system must have the ability to recover or maintain it in the event of a physical failure. And in order to recover data, it must have some type of extra information.  There is no free lunch, and unfortunately no magic sauce (although some BBQ sauce comes close) (note: please do not try to combine your storage array with BBQ sauce).  If the array is going to protect data that hosts are writing, it must write additional information along with it that can be used for recovery.

Write penalties are expressed as a ratio of physical disk writes to host writes (or physical disk writes to virtual disk writes, or most accurately back-end writes to front-end writes).  RAID1 has a write penalty of 2:1.  This means that for every host write that comes in (a write coming in to the front-end of the storage array), the physical disks (on the back-end of the storage array) will see two writes.

You might be wondering, what about read penalties?  Read penalties don’t really exist in normal operations. This may be a good topic for another post but for now just take it on faith that the read penalty for every non-degraded RAID type is 1:1.

The protection factor here is pretty obvious, but let’s again discuss it using common terminology in the storage world.  RAID1 can survive a single disk failure.  By that I mean if one disk in the pair fails, the remaining good disk will continue to provide data service for both reads and writes.  If the second disk in the pair fails before the failed disk is replaced and rebuilt, the data is lost.  If the first disk is replaced and rebuilds without issue, then the pair returns to normal and can once again survive a single disk failure.  So when I say “can survive a single disk failure,” I don’t mean for the life of the RAID group – I mean at any given time assuming the RAID group is healthy.

Another important concept – what do degraded and rebuild mean for RAID1 from a performance perspective?

  • Degraded – In degraded mode, only one of the two disks exists.  So what happens for writes?  If you thought things got better, you are strangely but exactly right.  When only one disk of the pair exists, only one disk is available for writes.  The write penalty is now 1:1, so we see a performance improvement for writes (although an increased risk for data loss, since if the remaining disk dies all data is lost). There is a potential performance reduction for reads since we are only able to read from one physical disk (this can be obscured by cache, but then so can write performance).
  • Rebuild – During a rebuild, performance takes a big hit.  The existing drive must be fully mirrored onto the new drive from start to finish, meaning that the one good drive must be entirely read and the data written to the new partner.  And, because the second disk is in place, writes will typically start being split again so that pesky 2:1 penalty comes back into play.  And the disks must continue to service reads and writes from the host.  So for the duration of the rebuild, you can expect performance to suffer.  This is not unique to RAID1 – rebuild phases always negatively impact potential performance.
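
To make the rebuild step concrete, here is a small sketch continuing the same toy mirror idea from above – again invented for illustration, not how any particular array implements it.  The surviving member is read from start to finish and written onto the replacement:

```python
# Continuing the toy RAID1 idea: a rebuild is a full copy of the surviving
# member onto the replacement disk, block by block, while the group stays online.

def rebuild_mirror(surviving_disk, replacement_disk):
    """Copy every block from the good disk to the new partner."""
    for block, data in surviving_disk.items():
        replacement_disk[block] = data  # each block is a read plus a write on the back end
    return replacement_disk

surviving = {block: letter for block, letter in enumerate("ABCDEFGHI")}
replacement = {}
rebuild_mirror(surviving, replacement)
print(replacement == surviving)  # True -- the pair is consistent again
```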

Striping

Striping is sometimes confused or combined with parity (which we’ll cover in RAID5 and RAID6) but it is not the same thing.  Striping is a process of writing data across several disks in sequence. RAID0 only uses striping, while the rest of the RAID types except RAID1 use striping in combination with mirroring or parity.  Striping is used to enhance performance.

striping

In this example the computer system is once again writing some alphabetic characters to disk.  It is writing to Logical Block Addresses, or sectors, or blocks, or whatever you like to imagine makes up these imaginary disks.  And these must be enormous letters because apparently I can only fit nine of them on a 300GB disk!  

On the left hand side there is a single 300GB physical disk.  As the host writes these characters, they are hitting the same disk over and over.  Obvious – there is only one disk!

What is the important thing to keep in mind here?  As mentioned in Part 1, generally the physical disk is going to be the slowest thing in the data path because it is a mechanical device.  There is a physical arm inside the disk that must be positioned in order to read or write data from a sector, and there is a physical platter (metal disk) that must rotate to a specific position.  And with just one disk here, that one arm and one platter must position itself for every write. The first write is C, which must fully complete before the next write A can be accomplished.

Brief aside – why the write order column?  The write order is to clarify something about this workload.  Workload is something that is often talked about in the storage world, especially around design and performance analysis.  It describes how things are utilizing the storage, and there are a lot of aspects of it – sequential vs random, read vs write, locality of reference, data skew, and many others. In this case I’m clarifying that the workload is random, because the host is never writing to consecutive slots.  If instead I wrote the data as A, B, C, D, E, F, G, H, I, this would be sequential.  I’ll provide some more information about random vs sequential in the RAID5/6 discussion.

On the right hand side there are three 100GB disks in a RAID0 configuration.  And once again it is writing the same character set.

This time, though, the writes are being striped across three physical disks.  So the first write C hits disk one, the second write A hits disk two, and the third write E hits disk 3.  What is the advantage?   The writes can now execute in parallel as long as they aren’t hitting the same physical disk.  I don’t need for the C write to complete before I start on A.  I just need C to complete before I start on B.
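
Here is a small illustrative sketch of that placement – the function names are made up for this post, but the modulo arithmetic is the essence of striping: writes that land on different physical disks can be in flight at the same time.

```python
# Toy RAID0 layout with three disks, where each logical "slot" maps to a disk
# round-robin. Writes that land on different disks can be serviced in parallel.

NUM_DISKS = 3

def disk_for_slot(slot):
    """Map a logical slot to a physical disk by striping round-robin."""
    return slot % NUM_DISKS

def can_overlap(slot_a, slot_b):
    """Two I/Os can run in parallel if they hit different physical disks."""
    return disk_for_slot(slot_a) != disk_for_slot(slot_b)

# Three consecutive slots map to three different disks and can all be in flight
# together; a fourth slot wraps around to the same disk as the first, so those
# two writes must queue behind each other.
print(can_overlap(0, 1))  # True
print(can_overlap(0, 3))  # False
```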

How about reads?  Yep, there is increased performance here as well.  Reads can also execute in parallel, assuming the locations being read are on different physical disks.

Effectively RAID0 has increased the number of I/O operations we can process in parallel by a factor of three.  I/O operations per second, or IOPS as they are commonly called, are a common way to measure disk performance (e.g. faster disks like SAS can process roughly double the IOPS of NLSAS or SATA disks).  And striping is a good way to bump up the IOPS a system is capable of producing for a given virtual disk set.

This is a good time to define some terminology around striping.  I wouldn’t necessarily say this is incredibly useful, but it can be a good thing to comprehend when comparing systems because these are some of the areas where storage arrays diverge from each other.

The black boxes outlined in green represent strips.  The red dotted line indicates a stripe.

  • Strip – A strip is a piece of one disk.  It is the largest “chunk” that can be written to any disk before the system moves on to the next disk in the group.  In our three disk example, a strip would be the area of one disk holding one letter.
  • Strip Size (also called stripe depth) – This is the size of a strip from a data perspective.  The size of all strips in any RAID group will always be equivalent.  On EMC VNX, this value is 64KB (some folks might balk at this having seen values of 128 – this is actually 128 blocks and a block is 512 bytes).  On VMAX this varies but (I believe) for most configurations the strip size is 256KB, and for some newer ones it is 128KB (will try to update this if/when I verify this).  A strip size of 64KB means that if I were to write 128KB starting at sector 0 of the first disk, the system would write 64KB to the disk before moving on to the next disk in the group.  And if the strip size were 128KB, the system would write the entire 128KB to disk before moving on to the next disk for the next bit of data.
  • Stripe – A stripe is a collection of strips across all disks that are “connected”, or more accurately seen as contiguous.  In our 3 disk example, if our strip size was 64KB, then the first strip on each disk would collectively form the first stripe.  The second strip on each disk would form the second stripe, and would be considered, from a logical disk perspective, to exist after the first stripe.  So the order of consecutive writes would go Stripe1-Strip1, Stripe1-Strip2, Stripe1-Strip3, Stripe2-Strip1, Stripe2-Strip2, etc.
  • Stripe Width – this is how many data disks are in a stripe.  In RAID0 this is all of them because disks only hold data, but for other RAID types this is a bit different.  In our example we have a stripe width of 3.
  • Stripe Size – This is stripe width x strip size.  So in our example if the strip size is 64KB, the stripe size is 64KB x 3 or 192KB

Note: these are what I feel are generally accepted terms.  However, these terms get mixed up A LOT.  If you are involved in a discussion around them or are reading a topic, keep in mind that what someone else is calling stripe size might not be what you are thinking it is.  For example, a 4+1 RAID5 group has five disks, but technically has a stripe width of 4.  Some people would say it has a stripe width of five.  In my writing I will always try to maintain these definitions for these terms.
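
To tie these definitions together, here is a small sketch – using my definitions above and the three disk, 64KB strip example – of how a logical byte offset maps to a stripe, a disk, and an offset within a strip.  The function itself is hypothetical, but the arithmetic follows directly from strip size and stripe width.

```python
# Sketch of the strip/stripe math: map a logical byte offset to a
# (stripe number, disk index, offset within the strip). Assumes a RAID0
# group where every disk holds data, per the 3-disk example above.

STRIP_SIZE = 64 * 1024                   # strip size (stripe depth): 64KB
STRIPE_WIDTH = 3                         # number of data disks in the stripe
STRIPE_SIZE = STRIP_SIZE * STRIPE_WIDTH  # 192KB per stripe

def locate(logical_offset):
    """Return (stripe_number, disk_index, offset_in_strip) for a logical byte offset."""
    strip_number = logical_offset // STRIP_SIZE   # which strip, counted across the whole group
    stripe_number = strip_number // STRIPE_WIDTH  # which stripe (row across all disks)
    disk_index = strip_number % STRIPE_WIDTH      # which disk within that stripe
    offset_in_strip = logical_offset % STRIP_SIZE
    return stripe_number, disk_index, offset_in_strip

# A 128KB write starting at offset 0 fills the first strip on disk 0 and then
# moves on to the first strip on disk 1, exactly as described above.
print(locate(0))            # (0, 0, 0)
print(locate(64 * 1024))    # (0, 1, 0)
print(locate(192 * 1024))   # (1, 0, 0) -- start of the second stripe
```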

Since I defined some common terms above in the RAID1 section, let’s look at them again from the perspective of RAID0.

First, usable capacity.  RAID0 is unique because there is no protection information that is written.  Because of this, there is no usable capacity penalty.  If I combine five 1TB disks in a RAID0 group, I have 5TB usable.  In RAID0, raw always equals usable.

How about write penalty?  Once again we have a unique situation on our hands.  Every front end write only hits one physical disk, so there is no write penalty – or the write penalty can be expressed as 1:1.
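
To sum up the two RAID types so far, here is a tiny illustrative helper that captures the raw versus usable and write penalty numbers from this post.  The function is invented for illustration and only knows about the two types covered so far.

```python
# Quick comparison of the two RAID types covered so far, using the figures
# from this post. Values follow the definitions here, not any vendor's tooling.

def usable_capacity(raid_type, disk_count, disk_size_gb):
    raw = disk_count * disk_size_gb
    if raid_type == "RAID0":
        return raw          # no protection information: raw always equals usable
    if raid_type == "RAID1":
        return raw // 2     # mirrored pairs lose 50% of raw capacity
    raise ValueError("only RAID0 and RAID1 are covered so far")

WRITE_PENALTY = {"RAID0": 1, "RAID1": 2}   # back-end writes per front-end write

print(usable_capacity("RAID1", 2, 300))    # 300 -- 600GB raw, 300GB usable
print(usable_capacity("RAID0", 5, 1000))   # 5000 -- five 1TB disks, 5TB usable
print(WRITE_PENALTY["RAID1"])              # 2
```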

“Amazing!” you might be thinking.  Actually, probably not, because you have probably realized what the big issue with RAID0 is.  Just to be sure, let’s discuss protection factor.  RAID0 does not write any protection information in any way, hence it provides no protection.  This means that the failure of any member in a RAID0 group immediately invalidates the entire group (this is an important concept for later, so make sure you understand it).  If you have two disks in a RAID0 configuration and one disk fails, all data on both disks is unusable.  If you have 30 disks in a RAID0 configuration and one disk fails, all data on all 30 disks is unusable.  Any RAID0 configuration can survive zero failed disks.  If you have a physical failure in RAID0, you’d better have a backup somewhere else to restore from.

How about the degraded and rebuild concepts?  Good news everyone!  No need to worry ourselves with these concepts because neither of these things will ever happen.  A degraded RAID0 group is a dead RAID0 group.  And a rebuild is not possible because RAID0 does not write information that allows for recovery.

So, why do we care about RAID0?  For the most part, we don’t.  If you run RAID0 through the googler, you’ll find it is discussed a lot for home computer performance and benchmarking.  It is used quite infrequently in enterprise contexts because the performance benefit is outweighed by the enormous protection penalty.  The only places I’ve seen it used are for things like local tempdb for SQL server (note: I’m not a DBA and haven’t even played one on TV, but this is still generally a bad idea.  TempDB failure doesn’t affect your data, but I believe it does cause SQL to stop running…). 

We do care about RAID0, and more specifically the striping concept, because it is used in every other RAID type we will discuss.  To say that RAID0 doesn’t protect data isn’t really fair to it.  It is more accurate to say that striping is used to enhance performance, and not to protect data.  And it happens that RAID0 only uses striping.  It’s doing what it is designed to do.  Poor RAID0.

Neither of these RAID types are used very often in the world of enterprise storage, and in the next post I’ll explain why as I cover RAID1/0.

RAID: Part 1 – General Overview

For my first foray into the tech blogging world, I wanted to have a discussion on the simple yet incredibly complex subject of RAID.  Part 1 will not be technical, and instead hopefully provide some good footing on which to build.

For the purposes of this discussion I’m only going to focus on RAID 0, 1, 1/0 (called “RAID one zero” or more commonly “RAID ten”), 5, and 6.  These are generally the most common RAID types in use today, and the ones available for use on an EMC VNX platform.  Newcomers may feel daunted by the many types of RAID…I know I was. I spent some time memorizing a one line definition of what they mean. While this may be handy for a job interview, a far more valuable use of time would be to memorize how they work!  You can always print out a summary and hang it next to your desk.

I’ve found RAID to be one of the more interesting topics in the storage world because it seems to be one of the more misunderstood, or at least not fully understood, concepts – yet it is probably one of the most widely used.  Almost every storage array uses RAID in some form or another.  Often I deal with questions like:

  • Why don’t we just use RAID 1/0 since it is the fastest?
  • Why don’t I just want to throw all my disks into one big storage pool?
  • RAID6 for NLSAS is a good suggestion, but RAID5 isn’t too much different right?
  • RAID6 gives two disk failure protection, why would anyone use RAID5 instead?
  • Isn’t RAID6 too slow for anything other than backups?

Most of these questions really just stem from not understanding the purpose of RAID and how the types work.

In this post we’ll tackle the most basic of questions – what does RAID do, and why would I want to use RAID?

What does RAID do?

RAID is an acronym for Redundant Array of Independent (used to be Inexpensive, but not so much anymore) Disks.  The easiest way to think of RAID is a group of disks that are combined together into one virtual disk.

raid_example

If I had five 200GB disks, and “RAIDed” them together, it would be like I had one 1000GB disk.  I could then allocate capacity from that 1000GB disk.

Why would I want to use RAID?

RAID serves at least three purposes – protection, capacity, and performance.

Protection from Physical Failures

With the exception of RAID 0 (I’ll discuss the types later), the other RAID versions listed will protect you against at least one disk failure in the group.  In other words, if a hard drive suffers a physical failure, not only can you continue running (possibly with a performance impact), but you won’t lose any data.  A RAID group that has suffered a failure but is continuing to run is generally known as degraded.  What this means is a little different for each type so we’ll cover those details later.  When the failed disk is replaced with a functional disk, some type of rebuild operation will commence, and when complete the RAID group will return to normal status without issue.

Most enterprise storage arrays, and many enterprise servers, allow you to implement what is commonly known as a hot spare.  A hot spare is a disk that is running in a system, but not currently in use.  The idea behind a hot spare is to reduce the amount of time spent running degraded.  If a disk fails and you have to:

  1. Wait for a human to recognize the failure
  2. Open a service request for a replacement
  3. Wait for the replacement to be shipped
  4. Have someone physically replace the disk

That is potentially a long period of time that I am running in degraded mode. Hence the hot spare concept. With a hot spare in the system, when the disk fails, a spare is instantly available and rebuild starts.  Once the rebuild is finished, the RAID group returns to normal.  The failed disk is no longer a part of any active RAID group, and itself can be seen as a spare, unused disk in the system (though obviously not a hot spare because it is failed!).  Eventually it will be replaced, but because it isn’t involved in data service there is less of a critical business need to replace it.

An important and sometimes hazy concept, especially with desktops, is that RAID only protects you against physical failures.  It does not protect you against logical corruption.  As a simple example, if I protect your computer’s hard drives with RAID1 and one of those drives dies, you are protected. If instead you accidentally delete a critical file, RAID will do nothing for you.  In this situation, you need to be able to recover the file through the file system if possible, or restore from a backup.  There are a lot of types of logical corruption, and rest assured that RAID will not protect you from any of them.

Capacity

There are two capacity related benefits to RAID.  Note that there is generally also a capacity penalty that comes along with RAID, but we will discuss that when we get into the types.

Aggregated Usable Capacity

Continuing the example above with the five 200GB disks, if you were to come ask me for storage, without RAID the largest single device I could give you would be a 200GB disk.  I might be able to give you multiple 200GB disks, and you might be able to combine those through a volume manager, but as a storage admin I could only present each of them to you as a separate 200GB disk.

What if you need a terabyte of space?  I’d have to give you all five separate disks, and then you’d have to do some volume management on your end to put them together.

With RAID, I can assemble those together on the back end as a virtual device, and present it as one contiguous address space to a host.  As an example, 2TB datastores are fairly common in ESX, and I would venture to say a lot of those datastores run on disk drives much smaller than 2TB.  Maybe it is a 10 or 20 disk 600GB SAS pool, and we have allocated 2TB out of that for the ESX datastore.

Aggregated Free Space

Think about the hard drive in your computer.  It is likely that you’ve got some amount of free capacity on it.  Let’s say you have a 500GB hard drive with 200GB of free space.

Now let’s think about five computers with the same configuration.  500GB hard drives, 200GB free on each.  This means that we are not using 1000GB of space overall, but because it is dedicated to each individual computer, we can’t do anything with it.

freespace

If instead we took those 500GB hard drives and grouped them, we could then have a sum total of 2500GB to work with and hand out.  Now perhaps it doesn’t make sense to give all users 300GB of capacity, since that is what they are using and they would be out of space…but perhaps we could give them 400GB instead.

Now we’ve allocated (also commonly known as “carving”) five 400GB virtual disks (also commonly known as LUNs) out of our 2500GB pool, leaving us 500GB of free space to work with.  Essentially, by pooling the resources we’ve gained the ability to hand out one more 400GB virtual disk without adding another physical drive.
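
For the arithmetic-minded, here is the same pooling math as a quick sketch, with the numbers taken straight from the example above.

```python
# The pooling arithmetic from the example above: five 500GB drives pooled,
# each user handed a 400GB virtual disk (LUN), with headroom for one more.

drives_gb = [500] * 5              # five 500GB hard drives pooled together
pool_gb = sum(drives_gb)           # 2500GB of pooled capacity

luns_gb = [400] * 5                # five 400GB virtual disks carved from the pool
free_gb = pool_gb - sum(luns_gb)   # 500GB left over

print(pool_gb, free_gb)            # 2500 500
print(free_gb >= 400)              # True -- room to carve one more 400GB LUN
```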

Performance

Performance of disk based storage is largely based on how many physical spindles are backing it (this changes with EFD and large cache models, but that is for another discussion).  A hard drive is a mechanical device, and is generally the slowest thing in the data path.  Ergo, the more I can spread your data request (and all data requests) out over a bunch of hard drives, the more performance I’m going to be able to leverage.

If you need 200GB of storage and I give you one 200GB physical disk, that is one physical spindle backing your storage.  You are going to be severely limited on how much performance you can squeeze out of that hard drive.

If instead I allocate your 200GB of space out of a RAID group or pool, now I can give you a little bit of space on multiple disks.  Now your virtual disk storage is backed by many physical spindles, and in turn you will get a lot more performance out of it.

It should be said that this blog is focused on enterprise storage arrays, but some of the benefits listed above apply to any RAID controller, even one in a server or workstation.  The aggregated free space, and in most scenarios the performance benefit, only apply to shared storage arrays.

Hopefully this was a good high level introduction to the why’s of RAID.  In the next post I will cover the how’s of RAID 1 and 0.