RAID: Part 3 – RAID 1/0

So, if you have been following along, dear reader, we are now up to speed on several things.  We have discussed mirroring (and RAID1, which leverages it) and striping (and RAID0, which leverages that).  We have also discussed RAID types using some familiar and standard terminology, which will allow us to compare and contrast the versions moving forward.

Now, on to the big dog of RAID – RAID 1/0.  This is called “RAID one zero” and “RAID ten,” and sometimes “RAID one plus zero” (and indicated as RAID 1+0).  I have never heard it called “RAID one slash zero” but perhaps somebody somewhere does that also.  All of these things are referring to the same thing, and RAID ten is the most common term for it.

Why do we need RAID1/0?

In this section I wanted to ask a sometimes overlooked question – what are the problems with RAID0 and RAID1 that cause people to need something else?

If you know about RAID0 (or even better if you read Part 2) you should have an excellent idea of its failings.  Just to reiterate, the problem of RAID0 is that it only leverages striping, and striping only provides performance and capacity aggregation.  It provides nothing in the way of protection, hence any disk that fails in a RAID0 set will invalidate the entire set. RAID0 is the ticking time bomb of the storage world.

RAID1’s problems aren’t quite as obvious as the “one disk failure = worst day ever” of RAID0, but once again let’s go back to Part 1 and look at the benefits I listed of RAID:

  1. Protection – RAID (except RAID0) provides protection against physical failures.  Does RAID1 provide that?  Absolutely – RAID1 can survive a single disk failure.  Check box checked.
  2. Capacity – RAID also provides a benefit of capacity aggregation.  Does RAID1 provide that?  Not at all.  RAID1 provides no aggregate capacity or aggregate free space benefit because there are always exactly two disks in a RAID1 pair, and the usable capacity penalty is 50%.  Whether I have a RAID1 set using a 600GB drive or a 3TB drive, I get no aggregate capacity benefit with RAID1, beyond the idea of just splitting a disk up into logical partitions…which can be done on a single disk without RAID in the first place.
  3. Performance – RAID provides a performance benefit since it is able to leverage additional physical spindles.  Does RAID1 provide that?  The answer is yes…sort of.  It does provide two spindles instead of one, which fits the established definition.  However there are some caveats.  There isn’t a performance boost on writes because of the write penalty of 2:1 (both of the spindles are being used for every single write).  There is a performance boost on reads because it can effectively round-robin read requests back and forth on the disks.  But, and a BIG BUT, there are only two spindles.  There are only ever going to be two spindles.  Unlike a RAID0 set which can have as many disks as I want to risk my data over, a RAID1 set is performance bound to exactly two spindles.

Essentially the problem with the mirrored pair is just that – there are only ever going to be two physical disks.

By now it may have become obvious, but RAID0 and RAID1 are almost polar opposites.  RAID1’s benefit lies mostly around protection, and RAID0’s benefit is performance and capacity.  RAID1 is the stoic peanut butter, and RAID0 is the delicious jelly.  If only there was a way to leverage them both….

What is RAID1/0?

RAID1/0 is everything you wanted out of RAID0 and RAID1. It is the peanut butter and jelly sandwich.  (Note: please do not attempt to combine your storage array with peanut butter or jelly.  Especially chunky peanut butter.  And even more especiallyer chunky jelly)

Essentially RAID1/0 looks like a combination of RAID1 and RAID0, hence the label.  More accurately, it is a combination of mirroring and striping in that order.  RAID1/0 replaces the individual disks of a RAID0 stripe set with RAID1 mirror pairs.  It is also important to understand what RAID1/0 is and what it is not.  It is true that it leverages the good things out of both RAID types, but it also still maintains the bad things of both RAID types. This will become apparent as we dive into it.

[Image: raid10]

This is a busy image, but bear with me as I break it down.

  • This is an eight disk RAID1/0 configuration, and on this configuration (similar to the Part 2 examples) we are writing A,B,C,D to it. For simplicity’s sake we ignore write order and just go alphabetically
  • The orange and green help indicate what is happening at their particular parts of the diagram
  • The physical disks themselves (the black boxes) are in mirrored pairs that should hopefully be familiar by now (indicated by the green boxes and plus signs).  This is the same RAID1 config that I’ve covered previously.
  • The weirdness picks up at the orange part. The orange box indicates that we are striping across every mirrored pair.  This is also identical to the RAID0 configuration, except that the physical disks of the RAID0 config have been replaced with these RAID1 pairs.

This is what is meant by RAID1/0.  First comes RAID1 – we build mirrored pairs.  Then comes RAID0 – we stripe data across the members, which happen to be those mirrored pairs.  It may help to think about RAID1/0 as RAID0 with an added level of protection at the member level (since we know RAID0 provides no protection otherwise).

As the host writes A,B,C,D, the diagram indicates where the data will land, but let’s cover the order of operations.

  1. The host writes A to the RAID1/0 set
  2. A is intercepted by the RAID controller.  The particular strip it is targeted for is identified.
  3. The strip is recognized to be on a mirrored pair, and due to the mirror configuration the write is split.
  4. A lands on both disks that make up the first member of the RAID0 set.
  5. Once the write is confirmed on both disks, the write is acknowledged back to the host as completed
  6. The host writes B to the RAID1/0 set
  7. B is intercepted by the RAID controller.  The particular strip it is targeted for is identified.  Due to the mirror configuration the write is split.
  8. B lands on both disks that make up the second member of the RAID0 set.
  9. Once the write is confirmed on both disks, the write is acknowledged back to the host as completed
  10. The host writes C to the RAID1/0 set
  11. etc.
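
The steps above can be sketched in a few lines of Python.  This is only a toy model, not how any particular RAID controller is implemented: the disk numbering, the PAIRS layout, and the function names are all made up for illustration, and it assumes strips are simply handed out to the mirrored pairs in round-robin order.

```python
# Toy model of RAID1/0 write placement (hypothetical names and disk numbering).
# Eight disks arranged as four mirrored pairs, as in the diagram.
PAIRS = [(0, 1), (2, 3), (4, 5), (6, 7)]


def raid10_targets(strip_index, mirror_pairs):
    """Return the two physical disks that a given strip lands on."""
    # RAID0 step: striping picks which member the strip belongs to.
    member = strip_index % len(mirror_pairs)
    # RAID1 step: the member is a mirrored pair, so the write is split
    # and lands on both disks of that pair.
    return mirror_pairs[member]


def write_all(blocks, mirror_pairs):
    """Map each block to the pair of disks it is written to."""
    return {block: raid10_targets(i, mirror_pairs)
            for i, block in enumerate(blocks)}


layout = write_all(["A", "B", "C", "D"], PAIRS)
for block, disks in layout.items():
    print(block, "->", disks)
```

A lands on the first pair, B on the second, and so on.  A fifth write would wrap back around to the first pair.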

Hopefully this gives an accurate, comprehensible version of the how’s of RAID1/0.  Now, let’s look at RAID1/0 using the same terminology we’ve been using.

From a usable capacity perspective, RAID1/0 maintains the same penalty as RAID1.  Because every member is a RAID1 pair, and every RAID1 pair has a 50% capacity penalty, it stands to reason that RAID1/0 also has a 50% capacity penalty as a whole.  No matter how many members are in a RAID1/0 group, the usable capacity penalty is always 50%.

The write penalty is a similar tune.  Because every member is a RAID1 pair, and every RAID1 pair has a 2:1 write penalty, RAID1/0 also has a write penalty of 2:1.  Again no matter how many members are in the set, the write penalty is always 2:1.

RAID1/0 reminds me of the Facts of Life. You know, you take the good, you take the bad?  RAID1/0 is a leap up from RAID0 and RAID1, but it doesn’t mean that we’ve gotten rid of their problems.  It is better to think that we’ve worked around their problems.  The same usable capacity penalty exists, but now I have the ability to aggregate capacity by putting more and more members into a RAID1/0 configuration.  The same write penalty exists, but again I can now add more spindles to the RAID1/0 configuration for a performance boost.
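
To put some numbers on that, here is a small sketch of the arithmetic.  The function names are mine, and the 600GB and 3TB drive sizes are just the example sizes from earlier in this post; the only facts it encodes are the 50% usable capacity penalty and the 2:1 write penalty.

```python
def raid10_usable_gb(disk_count, disk_gb):
    """Usable capacity of a RAID1/0 set: always half of the raw capacity."""
    if disk_count < 2 or disk_count % 2:
        raise ValueError("RAID1/0 needs an even number of disks (mirrored pairs)")
    return disk_count * disk_gb // 2


def raid10_backend_writes(host_writes):
    """Every host write turns into two disk writes (2:1 write penalty)."""
    return host_writes * 2


# A single mirrored pair of 600GB drives: no aggregation at all.
print(raid10_usable_gb(2, 600))    # 600
# Four mirrored pairs of 600GB drives: same 50% penalty, more usable space.
print(raid10_usable_gb(8, 600))    # 2400
# 100 host writes cost 200 disk writes, however many members there are.
print(raid10_backend_writes(100))  # 200
```

The penalties never change as the set grows.  What changes is how much capacity and how many spindles those penalties are applied to.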

The protection factor is weird, but still a combination of the two.  How many disk failures can a RAID1/0 set survive?  The answer is, it depends.  There is still striping on the outer layer, and by now we have beaten the dead horse enough to know that RAID0 can’t lose any physical disks.  It is a little clearer, especially for this transition, to think of this concept as RAID0 can’t survive any member failures, and in traditional RAID0 members are physical disks.  In this capacity, RAID1/0 is the same: RAID1/0 can’t survive any member failures.  The difference is that now a member is made up of two physical disks that are protecting each other.  So can a RAID1/0 set lose a disk and continue running?  Absolutely – RAID1/0 can always survive one physical disk failure.

…But, can it survive two?  This is where it gets questionable.  If the second disk failure is the other half of the mirrored pair, the data is toast.  Just as toast as if RAID0 had lost one physical disk since the effect is the same.  But what if it doesn’t lose that specific disk?  What if it loses a disk that is part of another RAID1 pair?  No problem, everything keeps running.  In fact, in our example, we can lose 4 disks like this and keep running:

[Image: raid10_4fails]

You can lose as many as half of the disks in the RAID1/0 set and continue running, just as long as they are the right disks.  Again, if we lose two disks like this, ’tis a bad day:

[Image: raid10_2fails]

So there are a few rules about the protection of RAID1/0

  • RAID1/0 can always survive a single disk failure
  • RAID1/0 can survive multiple disk failures, so long as the disk failures aren’t within the same mirrored pair
  • With RAID1/0, data loss can occur with as few as two disk failures (if they are part of the same mirror pair) and is guaranteed to occur at (n/2)+1 failures, where n is the total disk count in the RAID1/0 set.
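
These rules are easy to check with a short sketch.  The disk numbering and the function names here are hypothetical; the only logic is "the set is alive as long as no mirrored pair has lost both of its disks."

```python
from itertools import combinations

# Eight disks as four mirrored pairs (hypothetical numbering).
PAIRS = [(0, 1), (2, 3), (4, 5), (6, 7)]


def raid10_survives(failed_disks, mirror_pairs):
    """True if no mirrored pair has lost both of its disks."""
    failed = set(failed_disks)
    return all(not (a in failed and b in failed) for a, b in mirror_pairs)


def guaranteed_loss_at(disk_count):
    """Failure count at which data loss is certain: (n/2)+1."""
    return disk_count // 2 + 1


# One disk from each pair: four failures, still running.
print(raid10_survives({0, 2, 4, 6}, PAIRS))  # True
# Both halves of the same pair: two failures, data is toast.
print(raid10_survives({0, 1}, PAIRS))        # False

# With 8 disks, no combination of 5 failures is survivable.
k = guaranteed_loss_at(8)
print(k, any(raid10_survives(c, PAIRS) for c in combinations(range(8), k)))  # 5 False
```

The last line is just the pigeonhole principle: with four pairs, a fifth failed disk has to land in a pair that has already lost its partner.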

Degraded and rebuild concepts are identical to RAID1 because the striping portion provides no protection and no rebuild ability.

  • Any mirror pair in degraded mode will see a write performance increase (splitting writes no longer necessary), and potentially a read performance decrease.  Other mirror pairs continue to operate as normal
  • Any mirror pair in rebuild mode will see a heavy performance penalty.  Other mirror pairs continue to operate as normal with no performance penalty.

Why not RAID0/1?

This is one of my favorite interview questions, and if you are interviewing with me (or at places I’ve been) this might give you a free pass on at least one technical question.  I picked it up from a colleague of mine and have used it ever since.

Why not RAID0/1?  Or is there even a concept of RAID0/1?  Would it be the same as RAID1/0?

It does exist, and it is extremely similar on the surface.  The only difference is the order of operations: RAID1/0 is mirrored, then striped, and RAID0/1 is striped, then mirrored.  This seemingly minor difference in theory actually manifests as a very large difference in practice.

[Image: raid01]

Most things about RAID0/1 are identical to RAID1/0 (like performance and usable capacity), with one notable exception – what happens during disk failure?

I covered the failure process of RAID1/0 above so I won’t rehash that. For RAID0/1, remember that any failure of a RAID0 member invalidates the entire set.  So, what happens when the top left disk in RAID0/1 fails?  Yep, the entire top RAID0 set fails, and now it is effectively running as RAID0 using only the bottom set.

This has two implications.  The most severe is that RAID0/1 can survive a single disk failure, but never a second failure in the remaining active set.  (More failures inside the set that is already down change nothing, since that set is already offline.)  The other is that if a disk failed and a hot spare was available (or the bad disk was swapped out with a good disk), the rebuild affects the entire RAID set rather than just a portion of it.

It would be possible to design a RAID controller to get around this.  It could recognize that there is still a valid member available to continue running from in the second stripe set.  But then essentially what it is doing is trying to make RAID0/1 be like RAID1/0.  Why not just use RAID1/0 instead?  That is why RAID1/0 is a common implementation and RAID0/1 is not.
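
To see how big the difference is, here is a sketch that counts how many two-disk failure combinations each layout survives with eight disks.  The disk numbering and function names are made up, and the RAID0/1 model is a simple one: the data is available as long as at least one of the two stripe sets has no failed disks in it.

```python
from itertools import combinations

# Same eight disks, two different layouts (hypothetical numbering).
PAIRS = [(0, 1), (2, 3), (4, 5), (6, 7)]      # RAID1/0: four mirrored pairs
STRIPE_SETS = [(0, 2, 4, 6), (1, 3, 5, 7)]    # RAID0/1: two four-disk stripe sets


def raid10_survives(failed):
    """RAID1/0 lives unless some mirrored pair has lost both disks."""
    return all(not set(pair) <= failed for pair in PAIRS)


def raid01_survives(failed):
    """RAID0/1 lives only while one stripe set is completely intact."""
    return any(failed.isdisjoint(stripe) for stripe in STRIPE_SETS)


def count_survivable(survives, failures, disk_count=8):
    """Count the failure combinations of a given size that are survivable."""
    return sum(survives(set(c))
               for c in combinations(range(disk_count), failures))


print(count_survivable(raid10_survives, 2))  # 24 of 28 combinations
print(count_survivable(raid01_survives, 2))  # 12 of 28 combinations
```

Under this model RAID1/0 shrugs off 24 of the 28 possible two-disk failures, while RAID0/1 only survives the 12 where both failures happen to land in the same stripe set.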

Wrap Up

In Part 4 I’m going to cover parity and hopefully RAID5 and 6, and then I’ll provide some notes to bring the entire discussion together.  However, I wanted to include some thoughts about RAID1/0 in case someone stumbled on this and had some specific questions or issues related to performance, simply because I’ve seen this a lot.

RAID1/0 performs more efficiently than other RAID types from a write perspective only.  A lot of people seem to think that RAID1/0 is “the fastest one,” and hence should always be used for performance applications.  This is demonstrably untrue.  As I’ve stated previously, there is no such thing as a read penalty for any RAID type.  If your application is entirely or mostly read oriented, using RAID1/0 instead of RAID5 or 6 does nothing but cost you money in the form of usable capacity.  And yes, there are workloads with enormous performance requirements that are 100% read.
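
A rough way to see this is to count the disk I/O generated by a given host workload.  In the sketch below the function name is mine, the RAID1/0 write penalty of 2 is the one covered above, and the RAID5 write penalty of 4 is the commonly cited figure, which I am assuming here since parity isn't covered until Part 4.

```python
RAID10_WRITE_PENALTY = 2  # from this post: every write lands on two disks
RAID5_WRITE_PENALTY = 4   # assumption: commonly cited figure, covered in Part 4


def backend_iops(host_iops, read_fraction, write_penalty):
    """Disk I/O needed to serve a host workload; reads carry no penalty."""
    reads = host_iops * read_fraction
    writes = host_iops * (1 - read_fraction)
    return reads + writes * write_penalty


# 100% read workload: the RAID type makes no difference.
print(backend_iops(1000, 1.0, RAID10_WRITE_PENALTY))  # 1000.0
print(backend_iops(1000, 1.0, RAID5_WRITE_PENALTY))   # 1000.0

# 50/50 workload: now the write penalty starts to matter.
print(backend_iops(1000, 0.5, RAID10_WRITE_PENALTY))  # 1500.0
print(backend_iops(1000, 0.5, RAID5_WRITE_PENALTY))   # 2500.0
```

The more read heavy the workload, the less RAID1/0 buys you for its 50% capacity penalty.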

RAID1/0 has a massive usable capacity penalty.  If you are protecting data with RAID1/0, you need to purchase twice as much storage as the data needs.  If you are replicating that data like-for-like, you need to purchase four times the amount of storage that the data needs.  Additionally, sometimes your jumping off point locks you into a RAID type, so a decision to use RAID1/0 today may impact the future costs of storage as well.  I can’t emphasize this point enough – RAID1/0 is extremely expensive and not always needed.

I like to think of people who always demand RAID1/0 like the people who might bring a Ferrari when asked to “bring your best vehicle.”  But it turns out, I needed to tow a trailer full of concrete blocks up a mountain.  Different vehicles are the best at different things…just like RAID types.  We need to fully understand the requirements before we bring the sports car.

If you are having performance problems, or more likely someone is telling you they are having performance problems, jumping from RAID5 to RAID1/0 may not do a thing for you.  It is important to do a detailed analysis of the ENTIRE storage environment and figure out what the best fit solution is.  You don’t want to be that guy who advocated a couple hundred thousand dollars of a storage purchase when it turned out there was a host misconfiguration.

2 thoughts on “RAID: Part 3 – RAID 1/0”

  1. It would seem to me that raid0/1 could survive multiple disk failures, so long as all disks reside in the same raid0 set (in your example, the top one). The catch being, you would have to rebuild all disks in your raid0 set to bring back up raid1…ouch.

    Am I right in interpreting it that way, or am I missing something?

  2. Hi Ben,

    You are technically correct (the best kind of correct) that multiple disk failures in the same set would be allowed. I may have phrased it better as “active disk failures” instead of just “disk failures.”

    Something to keep in mind would be that once the RAID0 set experiences a failure, all activity would drop on the disks in that set, which would greatly reduce the chance of another drive failure there. Idle drives don’t experience the same degree of mechanical wear that active ones do. Conversely, all read activity would be directed to the active set, increasing (to some degree) the potential for failure in the remaining RAID0 set. The rebuild activity would be very high load on the entire active set as well.

    So while it is possible that another failure could occur in the dead set, it is far more likely that one would occur in the active set, and at that point any failure in the remaining set would invalidate the data. A more likely scenario for what you are talking about would be having the sets split across different disk shelves, and experiencing a shelf-level fault which takes out one entire RAID0 set. Again in this scenario the data would be available as long as a drive didn’t fail in the active set.

    But I did want to let you know that your thought process is dead on!

    Thanks for commenting,
    Joel
