In this post I’m going to cover two of the three patterns of RAID, as used by RAID0 and RAID1. Initially I was going to include RAID1/0, but this post got a bit too long, so I’ll break that one out into its own post.
- Part 1 – General Overview
- Part 3 – RAID 1/0
- Part 4 – Parity, Schmarity
- Part 5 – RAID5 and RAID6
- Part 6 – WrapUp
Mirroring is one of the more straightforward RAID patterns, and is employed in RAID1 and RAID1/0 schemes. Mirroring is used for data protection.
Exactly as it sounds, mirroring involves pairs of disks (always pairs – exactly two) that are identical copies of each other. In RAID1, a virtual disk is composed of two physical disks on the back end. So as I put files on the virtual disk that my server sees, the array is essentially making two copies of each one on the back end.
In this scenario imagine a server is writing the alphabet to disk. The computer host starts with A and continues through to I. In the case of one physical disk on the left hand side, the writes end up on the single physical disk as expected.
On the right hand side, using the RAID1 pair, the computer host writes the same information, but on the storage array the writes are mirrored to two physical disks. This pair must have identical contents at all times in order to guarantee the consistency and protection of RAID1.
The two copies are accomplished by write splitting, generally done by the RAID controller (the brains behind RAID protection – sometimes dedicated hardware, sometimes software). The RAID controller detects a write coming into the system, realizes that the RAID type is 1, and splits the write onto the two physical disks of the pair.
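As a rough sketch of the write-splitting idea (the `Disk` and `Raid1Pair` classes here are hypothetical toy models, not any real controller's API), a RAID1 write amounts to:

```python
# Minimal sketch of RAID1 write splitting: one front-end write
# becomes two identical back-end writes, one per disk in the pair.

class Disk:
    """A toy physical disk: a dict of sector -> data."""
    def __init__(self):
        self.sectors = {}

    def write(self, sector, data):
        self.sectors[sector] = data

class Raid1Pair:
    """A toy RAID1 virtual disk backed by exactly two physical disks."""
    def __init__(self):
        self.disks = (Disk(), Disk())

    def write(self, sector, data):
        # The "split": the same write goes to both members,
        # keeping their contents identical at all times.
        for disk in self.disks:
            disk.write(sector, data)

    def read(self, sector):
        # A read can be served from either member; use the first here.
        return self.disks[0].sectors[sector]

pair = Raid1Pair()
pair.write(0, "A")
pair.write(1, "B")
# Both back-end disks now hold identical copies.
assert pair.disks[0].sectors == pair.disks[1].sectors == {0: "A", 1: "B"}
```

In a real controller the front-end write is acknowledged only after both back-end writes land, which is exactly where the 2:1 write penalty discussed below comes from.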
How much capacity is available in both scenarios for the hosts? 300GB. In the RAID1 configuration, even though there is a total of 600GB of space, 50% of the raw capacity is lost to RAID1 protection. Usable capacity is one of the key differentiators between the different RAID schemes. Sometimes this is expressed in terms of usable and raw capacities – in this case there is 600GB of raw capacity, but only 300GB usable. Again, the 50% factor comes into play.
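The raw-versus-usable arithmetic is simple enough to sketch (a toy helper, assuming equal-size disks consumed in mirrored pairs):

```python
# Toy raw-vs-usable capacity math for RAID1: usable is half of raw,
# because every byte is written twice.

def raid1_capacity_gb(disk_size_gb, disk_count):
    """Return (raw, usable) capacity; disks are consumed in mirrored pairs."""
    assert disk_count % 2 == 0, "RAID1 uses disks in pairs"
    raw = disk_size_gb * disk_count
    usable = raw // 2
    return raw, usable

# Two 300GB disks: 600GB raw, 300GB usable.
print(raid1_capacity_gb(300, 2))  # (600, 300)
```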
Another important concept to start to understand is what is called the write penalty. The write penalty of any RAID protection scheme (except RAID0, discussed below) exists because in order to protect data, the storage system must have the ability to recover or maintain it in the event of a physical failure. And in order to recover data, it must have some type of extra information. There is no free lunch, and unfortunately no magic sauce (although some BBQ sauce comes close) (note: please do not try to combine your storage array with BBQ sauce). If the array is going to protect data that hosts are writing, it must write additional information along with it that can be used for recovery.
Write penalties are expressed as a ratio of physical disk writes to host writes (or physical disk writes to virtual disk writes, or most accurately back-end writes to front-end writes). RAID1 has a write penalty of 2:1. This means that for every host write that comes in (a write arriving at the front-end of the storage array), the physical disks (on the back-end of the storage array) will see two writes.
You might be wondering, what about read penalties? Read penalties don’t really exist in normal operations. This may be a good topic for another post but for now just take it on faith that the read penalty for every non-degraded RAID type is 1:1.
The protection factor here is pretty obvious, but let’s again discuss it using common terminology in the storage world. RAID1 can survive a single disk failure. By that I mean if one disk in the pair fails, the remaining good disk will continue to provide data service for both reads and writes. If the second disk in the pair fails before the failed disk is replaced and rebuilt, the data is lost. If the first disk is replaced and rebuilds without issue, then the pair returns to normal and can once again survive a single disk failure. So when I say “can survive a single disk failure,” I don’t mean for the life of the RAID group – I mean at any given time assuming the RAID group is healthy.
Another important concept – what does degraded and rebuild mean to RAID1 from a performance perspective?
- Degraded – In degraded mode, only one of the two disks exists. So what happens for writes? If you thought things got better, you are strangely but exactly right. When only one disk of the pair exists, only one disk is available for writes. The write penalty is now 1:1, so we see a performance improvement for writes (although an increased risk for data loss, since if the remaining disk dies all data is lost). There is a potential performance reduction for reads since we are only able to read from one physical disk (this can be obscured by cache, but then so can write performance).
- Rebuild – During a rebuild, performance takes a big hit. The existing drive must be fully mirrored onto the new drive from start to finish, meaning that the one good drive must be entirely read and the data written to the new partner. And, because the second disk is in place, writes will typically start being split again so that pesky 2:1 penalty comes back into play. And the disks must continue to service reads and writes from the host. So for the duration of the rebuild, you can expect performance to suffer. This is not unique to RAID1 – rebuild phases always negatively impact potential performance.
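The rebuild step above can be sketched as a straight copy of the surviving member onto the replacement (a hypothetical toy model, using plain dicts as disks):

```python
# Sketch of a RAID1 rebuild: every sector of the surviving disk is
# read and written to the replacement. In a real array this happens
# while the pair also keeps servicing host reads and (split) writes,
# which is why performance suffers for the duration.

def rebuild(survivor, replacement):
    """Copy the surviving member onto the replacement, sector by sector."""
    for sector, data in survivor.items():
        replacement[sector] = data

survivor = {0: "A", 1: "B", 2: "C"}  # contents of the good disk
replacement = {}                     # freshly inserted blank disk
rebuild(survivor, replacement)
assert replacement == survivor       # pair is consistent again
```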
Striping is sometimes confused or combined with parity (which we’ll cover in RAID5 and RAID6) but it is not the same thing. Striping is a process of writing data across several disks in sequence. RAID0 only uses striping, while the rest of the RAID types except RAID1 use striping in combination with mirroring or parity. Striping is used to enhance performance.
In this example the computer system is once again writing some alphabetic characters to disk. It is writing to Logical Block Addresses, or sectors, or blocks, or whatever you like to imagine makes up these imaginary disks. And these must be enormous letters because apparently I can only fit nine of them on a 300GB disk!
On the left hand side there is a single 300GB physical disk. As the host writes these characters, they all hit the same disk, over and over. Obvious – there is only one disk!
What is the important thing to keep in mind here? As mentioned in Part 1, generally the physical disk is going to be the slowest thing in the data path because it is a mechanical device. There is a physical arm inside the disk that must be positioned in order to read or write data from a sector, and there is a physical platter (metal disk) that must rotate to a specific position. And with just one disk here, that one arm and one platter must position itself for every write. The first write is C, which must fully complete before the next write A can be accomplished.
Brief aside – why the write order column? The write order is to clarify something about this workload. Workload is something that is often talked about in the storage world, especially around design and performance analysis. It describes how things are utilizing the storage, and there are a lot of aspects of it – sequential vs random, read vs write, locality of reference, data skew, and many others. In this case I’m clarifying that the workload is random, because the host is never writing to consecutive slots. If instead I wrote the data as A, B, C, D, E, F, G, H, I, this would be sequential. I’ll provide some more information about random vs sequential in the RAID5/6 discussion.
On the right hand side there are three 100GB disks in a RAID0 configuration. And once again it is writing the same character set.
This time, though, the writes are being striped across three physical disks. So the first write C hits disk one, the second write A hits disk two, and the third write E hits disk three. What is the advantage? The writes can now execute in parallel as long as they aren’t hitting the same physical disk. I don’t need the C write to complete before I start on A. I just need C to complete before I start on B.
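The round-robin placement can be sketched in a few lines (a toy model; a real array maps fixed-size strips, not individual letters):

```python
# Toy RAID0 striping: host writes land on disks round-robin, so
# consecutive writes can proceed in parallel on different spindles.

NUM_DISKS = 3

def disk_for_write(write_index):
    """Which physical disk (0-based) services the Nth host write (0-based)."""
    return write_index % NUM_DISKS

# The writes from the example, in write order: C, A, E, B, I, D
letters = ["C", "A", "E", "B", "I", "D"]
placement = {letter: disk_for_write(i) for i, letter in enumerate(letters)}
print(placement)  # {'C': 0, 'A': 1, 'E': 2, 'B': 0, 'I': 1, 'D': 2}
```

Note that B lands back on the same disk as C, which is why B must wait for C, but A and E do not.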
How about reads? Yep, there is increased performance here as well. Reads can also execute in parallel, assuming the locations being read are on different physical disks.
Effectively RAID0 has increased our efficiency of processing I/O operations by a factor of three. I/O operations per second, or IOPS as they are commonly called, is a common measure of disk performance (e.g. faster disks like SAS can process roughly double the IOPS of NL-SAS or SATA disks). And striping is a good way to bump up the IOPS a system is capable of producing for a given virtual disk set.
This is a good time to define some terminology around striping. I wouldn’t necessarily say this is incredibly useful, but it can be a good thing to comprehend when comparing systems because these are some of the areas where storage arrays diverge from each other.
- Strip – A strip is a piece of one disk. It is the largest “chunk” that can be written to any disk before the system moves on to the next disk in the group. In our three disk example, a strip would be the area of one disk holding one letter.
- Strip Size (also called stripe depth) – This is the size of a strip from a data perspective. The size of all strips in any RAID group will always be equivalent. On EMC VNX, this value is 64KB (some folks might balk at this, having seen values of 128 – that is actually 128 blocks, and a block is 512 bytes, so 128 x 512 bytes = 64KB). On VMAX this varies, but (I believe) for most configurations the strip size is 256KB, and for some newer ones it is 128KB (I’ll try to update this if/when I verify it). A strip size of 64KB means that if I were to write 128KB starting at sector 0 of the first disk, the system would write 64KB to that disk before moving on to the next disk in the group. And if the strip size were 128KB, the system would write the entire 128KB to one disk before moving on to the next disk for the next bit of data.
- Stripe – A stripe is a collection of strips across all disks that are “connected”, or more accurately seen as contiguous. In our 3 disk example, if our strip size was 64KB, then the first strip on each disk would collectively form the first stripe. The second strip on each disk would form the second stripe, and would be considered, from a logical disk perspective, to exist after the first stripe. So the order of consecutive writes would go Stripe1-Strip1, Stripe1-Strip2, Stripe1-Strip3, Stripe2-Strip1, Stripe2-Strip2, etc.
- Stripe Width – this is how many data disks are in a stripe. In RAID0 this is all of them because disks only hold data, but for other RAID types this is a bit different. In our example we have a stripe width of 3.
- Stripe Size – This is stripe width x strip size. So in our example, if the strip size is 64KB, the stripe size is 64KB x 3, or 192KB.
Note: these are what I feel are generally accepted terms. However, these terms get mixed up A LOT. If you are involved in a discussion around them or are reading a topic, keep in mind that what someone else is calling stripe size might not be what you are thinking it is. For example, a 4+1 RAID5 group has five disks, but technically has a stripe width of 4. Some people would say it has a stripe width of five. In my writing I will always try to maintain these definitions for these terms.
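Sticking to the definitions above, the offset-to-disk math for a 64KB strip size and a stripe width of 3 can be sketched as:

```python
# Offset-to-disk math for the striping terms defined above:
# strip size 64KB, stripe width 3, so stripe size 192KB.

STRIP_SIZE = 64 * 1024                   # strip size (stripe depth), bytes
STRIPE_WIDTH = 3                         # number of data disks
STRIPE_SIZE = STRIP_SIZE * STRIPE_WIDTH  # 192KB

def locate(offset):
    """Map a logical byte offset to (stripe number, disk index, offset in strip)."""
    stripe = offset // STRIPE_SIZE
    disk = (offset % STRIPE_SIZE) // STRIP_SIZE
    offset_in_strip = offset % STRIP_SIZE
    return stripe, disk, offset_in_strip

# A 128KB write starting at offset 0 spans the first two strips of stripe 0:
print(locate(0))           # (0, 0, 0) -> first disk, start of its strip
print(locate(64 * 1024))   # (0, 1, 0) -> second disk
print(locate(192 * 1024))  # (1, 0, 0) -> back to the first disk, second stripe
```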
Since I defined some common terms above in the RAID1 section, let’s look at them again from the perspective of RAID0.
First, usable capacity. RAID0 is unique because no protection information is written. Because of this, there is no usable capacity penalty. If I combine five 1TB disks in a RAID0 group, I have 5TB usable. In RAID0, raw always equals usable.
How about write penalty? Once again we have a unique situation on our hands. Every front end write only hits one physical disk, so there is no write penalty – or the write penalty can be expressed as 1:1.
“Amazing!” you might be thinking. Actually, probably not, because you have probably realized what the big issue with RAID0 is. Just to be sure, let’s discuss protection factor. RAID0 does not write any protection information in any way, hence it provides no protection. This means that the failure of any member of a RAID0 group immediately invalidates the entire group (this is an important concept for later, so make sure you understand it). If you have two disks in a RAID0 configuration and one disk fails, all data on both disks is unusable. If you have 30 disks in a RAID0 configuration and one disk fails, all data on all 30 disks is unusable. Any RAID0 configuration can survive exactly zero failed disks. If you have a physical failure in RAID0, you had better have a backup somewhere else to restore from.
How about the degraded and rebuild concepts? Good news everyone! No need to worry ourselves with these concepts because neither of these things will ever happen. A degraded RAID0 group is a dead RAID0 group. And a rebuild is not possible because RAID0 does not write information that allows for recovery.
So, why do we care about RAID0? For the most part, we don’t. If you run RAID0 through the googler, you’ll find it is discussed a lot for home computer performance and benchmarking. It is used quite infrequently in enterprise contexts because the performance benefit is outweighed by the enormous protection penalty. The only places I’ve seen it used are for things like local tempdb for SQL server (note: I’m not a DBA and haven’t even played one on TV, but this is still generally a bad idea. TempDB failure doesn’t affect your data, but I believe it does cause SQL to stop running…).
We do care about RAID0, and more specifically the striping concept, because it is used in every other RAID type we will discuss. To say that RAID0 doesn’t protect data isn’t really fair to it. It is more accurate to say that striping is used to enhance performance, and not to protect data. And it happens that RAID0 only uses striping. It’s doing what it is designed to do. Poor RAID0.
Neither of these RAID types are used very often in the world of enterprise storage, and in the next post I’ll explain why as I cover RAID1/0.