Every company in the industry has its own brand of Kool-Aid: a set of tenets that are believed inherently, regardless of the facts. Articles of faith, if you will. Call me a skeptic, but this drives me crazy. At NetApp, the Kool-Aid was largely about snapshots, and the idea that only NetApp could create simple, fast, and easy snapshots. Well, I drank that Kool-Aid big time, and I dispensed it as well. When talking to customers, I would try to sell them this brand of Kool-Aid, often with great success. Once in a while, however, the pitch backfired. I well remember one trip to the UK, to a major telecom customer in that region, where it fell apart. Their DBA and storage networking staff were extremely competent and familiar with EMC CLARiiON arrays. The conversation went something like this:
Me: “NetApp snapshots are great! We have no write penalty. We have the best snapshots in the industry.”
Customer: “We use CLARiiON SnapView snapshots all the time. There is minimal, if any, write penalty. We hardly notice it.”
Oops!
At this point, I have had the opportunity to thoroughly explore both NetApp and EMC snapshot technology. In this post, I will compare these technologies and discuss how they have been used by Oracle customers to implement instantaneous backup. I will conclude with a (hopefully fairly objective) discussion of the relative advantages and disadvantages of each approach. I promise to keep the Kool-Aid dispenser turned off.
EMC and NetApp took radically different approaches to their respective snapshot technologies. NetApp’s snapshots actually arose through serendipity (a happy accidental discovery): they are an artifact of the design of NetApp’s file system, WAFL (Write Anywhere File Layout).
WAFL is so named because it never overwrites existing blocks in place when updates occur. Instead, it writes the updated data to new blocks and then frees the old ones. A snapshot can therefore be assembled simply by retaining the old blocks rather than freeing them. No additional I/O is required to do this, which is the basis of NetApp’s accurate claim that their snapshots have no write penalty.
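To make the mechanism concrete, here is a minimal Python sketch of a redirect-on-write volume in the spirit of WAFL. The class and method names are hypothetical, and the model ignores the metadata and inode blocks a real file system would also rewrite; it only shows why retaining old blocks, rather than copying them, is enough to build a snapshot.

```python
# Hypothetical toy model of a write-anywhere (redirect-on-write) volume.
# Names are invented for illustration; this is not NetApp's WAFL code and
# it ignores the metadata blocks a real file system would also rewrite.


class WriteAnywhereVolume:
    def __init__(self, blocks):
        self.disk = list(blocks)                # physical blocks, appended on write
        self.active = list(range(len(blocks)))  # logical block -> physical block
        self.snapshots = []                     # each snapshot is a frozen block map
        self.write_ios = 0

    def snapshot(self):
        # A snapshot copies only the block map; no data block is written.
        self.snapshots.append(list(self.active))
        return len(self.snapshots) - 1

    def write(self, logical, value):
        # Never overwrite in place: put the new data in a fresh block and
        # repoint the active map. The old block survives wherever a
        # snapshot still references it.
        self.disk.append(value)
        self.active[logical] = len(self.disk) - 1
        self.write_ios += 1

    def read(self, logical, snap=None):
        block_map = self.active if snap is None else self.snapshots[snap]
        return self.disk[block_map[logical]]

    def free_blocks(self):
        # Physical blocks referenced by neither the active map nor any snapshot.
        in_use = set(self.active)
        for block_map in self.snapshots:
            in_use.update(block_map)
        return sorted(set(range(len(self.disk))) - in_use)


vol = WriteAnywhereVolume(["A", "B", "C"])
snap = vol.snapshot()
vol.write(2, "C'")   # first update of C
vol.write(2, 'C"')   # second update of C
print(vol.read(2), vol.read(2, snap), vol.write_ios, vol.free_blocks())
# prints: C" C 2 [3]
```

Taking the snapshot writes nothing, and each update costs exactly one write. The intermediate version C' (physical block 3) ends up free because neither the active map nor the snapshot references it, which mirrors the NetApp diagram discussed below.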
On CLARiiON and Symmetrix arrays, EMC has no file system. Rather, the arrays present LUNs to hosts, which in turn run file systems such as Veritas, or Oracle’s ASM. EMC therefore has no visibility into the metadata that makes up a file system, and had to take a different approach to snapshots. Before a storage block is overwritten for the first time after a snapshot, EMC copies its before image into a set of special LUNs called the reserved LUN pool (RLP). It then writes the after image to the normal LUN, in the same location as the original block. This preserves the integrity of the host file system while allowing an instantaneous point-in-time copy of that file system to be assembled.
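The same idea can be sketched for the RLP approach. This hypothetical Python model (invented names, not EMC SnapView code) keeps the production LUN updated in place, saves each block’s before image to a dictionary standing in for the RLP on its first write after the snapshot, and counts the resulting I/Os.

```python
# Hypothetical toy model of copy-on-first-write (COFW) snapshots backed by a
# reserved LUN pool. Names are invented; this is not EMC SnapView code, only
# the block-level behavior described in the text.


class CofwLun:
    def __init__(self, blocks):
        self.lun = list(blocks)   # production LUN, always updated in place
        self.rlp = {}             # reserved LUN pool: block number -> before image
        self.snapped = False
        self.reads = 0
        self.writes = 0

    def snapshot(self):
        # Start a new point-in-time copy; no data is copied yet.
        self.rlp = {}
        self.snapped = True

    def write(self, block, value):
        if self.snapped and block not in self.rlp:
            # First write since the snapshot: read the before image
            # and save it to the RLP.
            self.rlp[block] = self.lun[block]
            self.reads += 1
            self.writes += 1
        # The after image always lands in the block's original location.
        self.lun[block] = value
        self.writes += 1

    def read(self, block, from_snapshot=False):
        # Snapshot reads use the RLP copy if one exists; otherwise the block
        # is unchanged and is shared with the production LUN.
        if from_snapshot and block in self.rlp:
            return self.rlp[block]
        return self.lun[block]


lun = CofwLun(["A", "B", "C"])
lun.snapshot()
lun.write(2, "C'")   # first write: 1 read + 2 writes
lun.write(2, 'C"')   # later write: 1 write only
print(lun.read(2), lun.read(2, from_snapshot=True), lun.reads, lun.writes)
# prints: C" C 1 3
```

The production LUN never changes shape: block 2 holds the latest data at its original address, and only the snapshot view needs the RLP.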
Interestingly, when EMC created Celerra SnapSure Checkpoints (Celerra’s version of snapshots), it used the same approach. By then, EMC had some experience with snapshots, and even though the file system was now under EMC’s control, it still chose to store snapshot data in an RLP. The reasons for this will become clear from the discussion below.
The following graphics illustrate the differences between the NetApp approach and the EMC approach. First look at the NetApp approach:
The pink, buff, and green file folders represent three different files in the WAFL file system. Because snapshots are in use on this file system, the old versions of blocks A, B, C, D, F, and I are retained; these are the light-colored blocks. The darker blocks, marked with either a single or a double quote mark, are the after images of the updated storage blocks; a double quote marks a block that has been updated more than once. Note that the first update of block C (block C') has been freed, because no snapshot references it. The normal-colored blocks (neither dark nor light) have not been updated at all and are shared by both the snapshot and the active file system.
Follow the I/O pattern. It’s true that a write updating an existing block requires only one I/O. But subsequent reads of the file system are messy. In a newly created file system, reading sequentially (block A, then block B, and so forth) produces a perfectly sequential I/O pattern. In the second diagram, where the file system has been updated, it does not: the most recent versions of the blocks are now scattered throughout the disks. Try it mentally right now, and note the amount of head movement required to read the blocks in this order: A', B', C", D', E, F', G, H, and I'.
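One crude way to quantify that head movement is to sum the distances between consecutive physical block positions. The Python sketch below does this under loudly simplified assumptions: logical blocks A through I start at physical blocks 0 through 8, each update is redirected to the next free block after that region, and the update order D, A, C, I, C, F, B is invented for illustration. Real WAFL allocation is more sophisticated, and distance in blocks is only a rough proxy for seek time.

```python
# Hypothetical model: logical blocks A..I start at physical blocks 0..8, and
# each update is redirected to the next free block after that region. Real
# WAFL allocation is more sophisticated; the update order below is invented.


def layout_after_updates(n_blocks, update_order):
    """Physical position of each logical block after redirect-on-write updates."""
    position = list(range(n_blocks))  # fresh file: logical i at physical i
    next_free = n_blocks
    for logical in update_order:
        position[logical] = next_free  # new data goes to a new block
        next_free += 1
    return position


def head_travel(positions):
    """Total head movement, in blocks, to visit positions in the given order."""
    return sum(abs(b - a) for a, b in zip(positions, positions[1:]))


names = "ABCDEFGHI"
order = [names.index(b) for b in "DACICFB"]   # C is updated twice
fresh = list(range(len(names)))
updated = layout_after_updates(len(names), order)

print(updated)               # [10, 15, 13, 9, 4, 14, 6, 7, 12]
print(head_travel(fresh))    # 8
print(head_travel(updated))  # 40
```

Even in this nine-block toy, reading the file in logical order after the updates takes five times the head travel of the freshly written layout.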
This is the “sequential read after random write” (SRARW) performance problem of WAFL. It’s not just an artifact of snapshots; it’s inherent in the entire file system. This performance issue is real. It affects any NetApp customer who must perform sequential I/O on database data after that data has been updated, and that’s a lot of customers. Many, many databases are mixed use, combining OLTP-style data entry with DSS-style reporting. Those customers will hit the SRARW issue big time.
Now consider the EMC snapshot approach. Examine the following diagram:
First note the write I/O pattern. Updating each of blocks A, B, C, D, F, and I required one read and two writes: read the before image, write it to the RLP, then write the after image in place. This happens only on the first write to each block after the snapshot is created. The first update to block C (C' in our diagram) was never written to the RLP, and it has since been overwritten in place by the second update (C"), which is what the production LUN now holds. Each subsequent update to these blocks incurs only one write I/O, exactly as if no snapshot existed. This is the “copy on first write” (COFW) performance issue; the “first write” wording indicates that the penalty applies only to the first update of a block after the snapshot is taken.
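The COFW arithmetic can be sketched as a simple I/O counter. Assuming the updates in the diagram are A, B, C (twice), D, F, and I, in any order, the hypothetical function below tallies the reads and writes under COFW and, for comparison, under redirect-on-write counted as one write per update with metadata ignored.

```python
def snapshot_write_ios(updates):
    """Count I/Os for block updates made after one snapshot is taken.

    Returns (cofw_reads, cofw_writes, redirect_writes). A simplified model:
    COFW reads the before image and writes it to the RLP on the first update
    of each block; redirect-on-write is counted as one write per update and
    ignores metadata.
    """
    copied = set()
    cofw_reads = cofw_writes = 0
    for block in updates:
        if block not in copied:
            copied.add(block)
            cofw_reads += 1      # read the before image
            cofw_writes += 1     # write it to the RLP
        cofw_writes += 1         # write the after image in place
    return cofw_reads, cofw_writes, len(updates)


updates = ["A", "B", "C", "C", "D", "F", "I"]
print(snapshot_write_ios(updates))      # (6, 13, 7)
print(snapshot_write_ios(updates * 2))  # (6, 20, 14)
```

Repeating the same updates adds one write per update and no further reads: once every touched block has been copied, the COFW overhead stops growing.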
Now examine the read I/O pattern. Blocks in the RLP are not organized sequentially, but that’s fine: you read them only when you access a snapshot. The blocks in the production LUN remain perfectly laid out, because they have been updated in place rather than scattered throughout the disks, so sequential I/O to them is optimal.
That, then, is the trade-off. With snapshots, as with life, there is no free lunch. A choice must be made between COFW and SRARW: a write performance penalty for RLP-based snapshots versus a read performance penalty for WAFL-based snapshots. The decision you must make as a customer is which of these is the more serious issue. I have spent some time seriously pondering this question. Consider the following points (hopefully free of Kool-Aid):
- The COFW issue applies only to the first update to a block after the snapshot is taken, so it is temporary. The SRARW issue applies to every block that is updated after it was first written, so it is permanent, absent some operation to reorganize the file system. Such a reorganization could require reading and rewriting every block in the file system: a very expensive operation, in other words.
- Most workloads are read-intensive. Even TPC-C, certainly a very write-intensive workload, is mostly reads. Index scans frequently involve sequential read I/O. Undo I/O is largely sequential. Even temp is largely written and read sequentially. Retaining the sequential I/O advantage is very important for many workloads.
- The testing my team is conducting at RTP indicates that the COFW issue is modest and very temporary. Performance declines slightly for a few iterations, then returns to its previous level; after that, it’s exactly as if you had never taken the snapshot at all. The following chart (taken from our Commercial Solutions testing effort) illustrates this:
- Oracle expects storage vendors to honor a simple promise: locality of reference. The promise is that database blocks Oracle lays out near each other in the tablespace will also be stored near each other on disk. Many Oracle storage tuning mechanisms rely on this promise. For example, Oracle has a tuning concept called a cluster, in which tables are stored together: for two tables in a parent/child relationship, the child rows are stored near the parent row. Index-organized tables and materialized views are similar concepts. All of these tuning mechanisms rely on the notion that data stored near each other in the datafile is also stored near each other on disk. Some minor fragmentation can be tolerated, but wholesale scattering of storage blocks throughout the disks will damage the effectiveness of these features. With WAFL, once data has been updated, the promise of locality of reference is not kept.
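The temporary-versus-permanent and locality-of-reference points can be illustrated together with a small simulation. The hypothetical Python sketch below applies the same random updates to two layouts: in-place (RLP-based, where production blocks never move) and redirect-on-write with naive sequential allocation, a simplification of real WAFL behavior. It reports how many blocks COFW copies, a one-time cost, and how many physically contiguous runs a full sequential scan needs afterward, a cost paid on every scan until the layout is reorganized. The block count, update count, and seed are arbitrary assumptions.

```python
import random


def contiguous_runs(positions):
    """Count physically contiguous runs when reading logical blocks in order.

    1 means the whole range is one sequential read; each extra run costs a seek.
    """
    if not positions:
        return 0
    return 1 + sum(1 for a, b in zip(positions, positions[1:]) if b != a + 1)


def simulate(n_blocks, updates):
    """Compare one-time COFW copies with the lasting layout of each scheme."""
    in_place = list(range(n_blocks))    # in-place updates never move a block
    redirected = list(range(n_blocks))  # redirect-on-write moves updated blocks
    next_free = n_blocks                # simplification: allocate sequentially
    first_touched = set()
    for block in updates:
        first_touched.add(block)        # COFW copies each block at most once
        redirected[block] = next_free
        next_free += 1
    return {
        "cofw_copies": len(first_touched),
        "in_place_runs": contiguous_runs(in_place),
        "redirect_runs": contiguous_runs(redirected),
    }


rng = random.Random(42)                 # arbitrary seed for repeatability
updates = [rng.randrange(10_000) for _ in range(2_000)]
print(simulate(10_000, updates))
```

The COFW copy count is bounded by the number of distinct blocks touched and is paid once, while the extra runs in the redirected layout persist for every later sequential scan.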
In my opinion, having worked with both snapshot technologies, EMC makes the correct trade-off. The ability to perform optimal sequential read I/O must be preserved. The promise of locality of reference must be honored. A temporary, modest write penalty is a reasonable price to pay to do that. RLP-based snapshots, such as the common implementation shared by CLARiiON SnapView snapshots, Celerra SnapSure Checkpoints, and TimeFinder/Snap, do that. NetApp’s WAFL-based snapshots do not.
What is extremely clear, though, is that choosing to drink Kool-Aid (anyone’s Kool-Aid) is a fool’s choice. You need to look carefully at both types of snapshots and make an informed decision about which is appropriate for your particular workload. Snapshots are wonderful, and they benefit the Oracle user greatly. But they are not without risks and trade-offs. To contend otherwise would be dishonest.
My next post will discuss the issue of writable snapshots, and the competing technologies in that space.