An article worth reading in Computerworld
I have written about this issue many times before, but it seems the mainstream tech media is finally picking it up.
Most savvy enterprise storage customers understand that to get a decent ROI they need a usable life of at least 5 years out of their systems. So how come the big three storage vendors raise maintenance costs so sharply for systems that are more than 3 years old? Many of our customers have read the big three’s ROI spreadsheets and calculations, which show how, if you spend hundreds of thousands now to replace your out-of-maintenance equipment with new gear, you’ll actually save money over 3 years (the vendor’s desired life for the new equipment). Then they call Zerowait, and they see a whole new world of what ROI can mean.
The hardware vendors make money by forcing you into a forklift upgrade after 3 years. Zerowait’s business model works because we help our customers extend their ROI calculations well beyond 3 years. When customers extend their systems’ lifespan, they also don’t need to purchase new or upgraded backup or monitoring software, and there are no additional employee training or implementation costs. Our customers save on maintenance costs and on all of the secondary and tertiary costs as well.
So how come vendors don’t include all of these additional infrastructure costs in their little spreadsheet from the marketing department? How much money would you save if you got third-party support (for a fraction of the OEM support cost) and kept the old equipment around for another 2 years? Certainly that “enterprise class” system you bought 3 years ago is still a good system. It was enterprise class when you bought it; did it stop being enterprise class when the manufacturer’s new model came out?




April 22nd, 2006 at 2:36 pm
Let me see if I can fill in a few more details (Hi Dave).
First, let me try to clear up the confusion about BCS vs. ZCS, and provide a little history. As Dave says, ZCS works by taking every 64th 4K block in the filesystem and using it to store checksums for the preceding 63 4K blocks. We originally did it this way so we could do on-the-fly upgrades of WAFL volumes (from not-checksum-protected to checksum-protected). Clearly, reformatting each drive from 512-byte sectors to 520-byte sectors would not make for an easy, online upgrade.
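To make that zone layout concrete, here is a minimal sketch of the arithmetic, assuming 0-indexed 4K blocks and a 64-block zone; the constants and names are illustrative, not actual ONTAP internals:

```python
# Illustrative sketch of the ZCS zone layout: 64-block zones in which the
# last 4K block holds the checksums for the preceding 63 data blocks.

ZONE_SIZE = 64        # 63 data blocks + 1 checksum block per zone
WAFL_BLOCK = 4096     # bytes per WAFL block

def is_checksum_block(block_no: int) -> bool:
    """True if this block is the checksum block of its zone."""
    return block_no % ZONE_SIZE == ZONE_SIZE - 1

def checksum_block_for(block_no: int) -> int:
    """Block number of the checksum block protecting a data block."""
    zone_start = (block_no // ZONE_SIZE) * ZONE_SIZE
    return zone_start + ZONE_SIZE - 1

# Data block 70 is protected by checksum block 127, which is not adjacent
# to it on the platter.
assert not is_checksum_block(70)
assert checksum_block_for(70) == 127
```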
As Dave says above, the primary drawback to ZCS is performance, particularly on reads. Since the data does not always live adjacent to its checksum, a 4K read from WAFL often turns into two I/O requests to the disk. Thus was born the NetApp 520-byte-formatted drive and Block Checksums (BCS). For newly-created volumes, this is the preferred checksum method. Note that a volume cannot use a combination of both methods — a volume is either ZCS or BCS.
Pq65 provides some spare-disk output from a filer running ONTAP 7.x showing spares that could be used in either a BCS or a ZCS volume. The FC drive shown there is formatted with 520-byte sectors. If it is used in a ZCS volume, ONTAP will simply not use those extra 8 bytes in each sector.
When ATA drives came along, we were stuck with 512-byte sectors, but we wanted to use BCS for performance reasons. So rather than going back to ZCS, we use what we call an “8/9ths” scheme down in the storage layer of the software stack (underneath RAID). Every 9th 512-byte sector is deemed a checksum sector that contains checksums for each of the previous 8 512-byte sectors (which together make up a single 4K WAFL block). This scheme allows RAID to treat the disk as if it were formatted with 520-byte sectors, so these are considered BCS drives. And because the checksum data lives adjacent to the data it protects, a single disk I/O can read both the data and the checksum, so it really does perform similarly to a 520-byte-sector FC drive (modulo the fact that ATA drives have slower seek times and slower data transfer/rotational speeds).
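Here is a small sketch of that 8/9ths layout, again with illustrative constants rather than real ONTAP internals:

```python
# Illustrative sketch of the "8/9ths" layout on 512-byte-sector ATA drives:
# every 9th sector holds the checksums for the previous 8 sectors (one 4K
# WAFL block), so data and checksum sit adjacent on disk.

SECTOR_SIZE = 512
DATA_SECTORS_PER_BLOCK = 8    # 8 * 512 bytes = one 4K WAFL block
SECTORS_PER_GROUP = 9         # 8 data sectors + 1 checksum sector

def sectors_for_wafl_block(block_no: int) -> tuple[range, int]:
    """Return (data sectors, checksum sector) for a 4K WAFL block."""
    start = block_no * SECTORS_PER_GROUP
    data = range(start, start + DATA_SECTORS_PER_BLOCK)
    checksum = start + DATA_SECTORS_PER_BLOCK
    return data, checksum

# WAFL block 2 occupies sectors 18-25 with its checksum right behind it in
# sector 26, so one contiguous 9-sector read fetches data plus checksum.
data, csum = sectors_for_wafl_block(2)
assert list(data) == list(range(18, 26)) and csum == 26
```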
Starting in ONTAP 7.0, the default RAID type for aggregates is RAID-DP, regardless of disk type. For traditional volumes, the default is still RAID-4 for FC drives, but RAID-DP for ATA drives. You cannot mix FC drives and ATA drives in the same traditional volume or aggregate.
The default RAID group size for RAID-DP is typically twice as many disks as for RAID-4, so if you are deploying large aggregates, the cost of parity is quite similar for either RAID type. But the ability to protect you from a single media error during a reconstruction is of course far superior with RAID-DP (the topic of one of Dave’s recent blogs on the NetApp website).
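As a quick back-of-the-envelope check of that parity-cost point (using illustrative group sizes, not necessarily the shipped defaults):

```python
# Doubling the RAID group size while doubling the parity disks keeps the
# fraction of capacity spent on parity the same.

def parity_fraction(group_size: int, parity_disks: int) -> float:
    return parity_disks / group_size

raid4 = parity_fraction(group_size=8, parity_disks=1)     # 1 of 8  = 12.5%
raid_dp = parity_fraction(group_size=16, parity_disks=2)  # 2 of 16 = 12.5%
assert raid4 == raid_dp
```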
You can easily upgrade a RAID-4 aggregate to RAID-DP, or downgrade a RAID-DP aggregate to RAID-4. But you cannot shrink a RAID group, so you do want to be careful about how you configure your RAID groups before you populate them with data (assuming you don’t like the defaults).
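The grow-but-never-shrink rule can be modeled very simply; this is just an illustration of the constraint, not an ONTAP interface:

```python
# A RAID group can grow as disks are added, but it can never be shrunk,
# which is why group sizes are worth planning before data lands on them.

class RaidGroup:
    def __init__(self, size: int):
        self.size = size

    def resize(self, new_size: int) -> None:
        if new_size < self.size:
            raise ValueError("a RAID group cannot be shrunk")
        self.size = new_size     # adding disks is fine

rg = RaidGroup(size=8)
rg.resize(16)                    # growing works
# rg.resize(8) would raise: plan your group sizes up front
```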
There was an implication earlier in this blog that we used to use RAID-4, but on newer systems we use RAID-5. That’s not the case: we do not use RAID-5 on any of our systems (though an HDS system sitting behind a V-Series gateway might use it internally). This is a whole topic in itself, but the reason, stated briefly, is that RAID-4 is more flexible when it comes to adding drives to a RAID group, and because of WAFL, RAID-4 does not carry the performance penalty for us that it does for most other storage vendors. RAID-DP looks much like RAID-4, but with a second parity drive.
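One way to see the flexibility argument: with a dedicated parity disk, stripe parity is simply the XOR of the data disks, so a freshly added, zero-filled data disk leaves the existing parity valid with no reshuffling. The sketch below is purely schematic:

```python
# With RAID-4-style dedicated parity, stripe parity is the XOR of the data
# blocks. A new zero-filled data disk XORs to no change, so the group can
# grow without recomputing parity or moving data. (Distributed-parity
# RAID-5 would need to re-lay-out parity across the new disk set.)
from functools import reduce
from operator import xor

def stripe_parity(data_blocks: list[int]) -> int:
    """Parity value for one stripe: XOR of all data blocks."""
    return reduce(xor, data_blocks, 0)

stripe = [0b1010, 0b0110, 0b1100]
assert stripe_parity(stripe) == stripe_parity(stripe + [0b0000])
```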
Our “lost-writes” protection capability was also mentioned. Though it is rare, disk drives occasionally indicate that they have written a block (or series of blocks) of data, when in fact they have not. Or, they have written it in the wrong place! Because we control both the filesystem and RAID, we have a unique ability to catch these errors when the blocks are subsequently read. In addition to the checksum of the data, we also store some WAFL metadata in each checksum block, which can help us determine if the block we are reading is valid. For example, we might store the inode number of the file containing the block, along with the offset of that block in the file, in the checksum block. If it doesn’t match what WAFL was expecting, RAID can reconstruct the data from the other drives and see if that result is what is expected. With RAID-DP, this can be done even if a disk is currently missing!
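A minimal sketch of that read-time check, assuming the checksum area carries both a block checksum and some WAFL context; the field names and layout here are hypothetical, not ONTAP’s actual on-disk format:

```python
# On read, compare the stored checksum and the stored WAFL context (e.g.
# inode number and file offset) against what WAFL expected to find. A
# mismatch means corruption, a lost write, or a misplaced write, and RAID
# can then reconstruct the block from the other drives.
from dataclasses import dataclass
from zlib import crc32

@dataclass
class ChecksumEntry:
    checksum: int   # checksum of the 4K data block
    inode: int      # hypothetical WAFL context: owning file's inode
    offset: int     # hypothetical WAFL context: block offset in that file

def block_looks_valid(data: bytes, entry: ChecksumEntry,
                      expected_inode: int, expected_offset: int) -> bool:
    """Return False if the block is corrupt, stale, or misplaced."""
    if crc32(data) != entry.checksum:
        return False    # bad data: fall back to RAID reconstruction
    if (entry.inode, entry.offset) != (expected_inode, expected_offset):
        return False    # lost or misplaced write: reconstruct and compare
    return True
```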
We’re constantly looking for opportunities to add features to ONTAP RAID and WAFL that can hide some of the deficiencies and quirks of disk drives from clients. I think NetApp is in a unique position to be able to do this sort of thing. It’s great to see that you guys are noticing!
Steve
April 22nd, 2006 at 3:56 pm
Wow: great historical and technical clarification. My hat is off to Dave and Steve for jumping to the task of helping clear up the confusion around this 512/520 issue.
Truly appreciated by all.