Kool-Aid. It’s great stuff. When you believe it hard enough, it enables you to do amazing things. Doubt is not a problem. You simply state the company position as truth, uncritically. This gives you tremendous power in your customer presentations. Like I said, if you believe it hard enough.
Funny thing though. After a while Kool-Aid begins to lose its punch. You have to redefine the message. Reformulate the Kool-Aid, if you will. Otherwise, nagging doubt begins to erode your confidence. You get that haunted look as you deliver the message. Like what you are saying may not be exactly true. This detracts from the appeal of the Kool-Aid, especially as the customer begins to perceive that you may not be shooting straight.
Don’t get me wrong. All companies have Kool-Aid. I would readily admit that EMC has Kool-Aid too. When I first showed up at EMC, I had the following conversation multiple times with many different individuals:
Me: “What do you think about NAS storage for Oracle databases?”
EMC person: “NAS is great for storing small, non-mission critical Oracle databases with low I/O requirements.”
That’s Kool-Aid. Much of what I have been doing at EMC since I showed up here has been combating this particular brand of Kool-Aid. Because my entire professional life for the past 10 years has largely been about Oracle on NAS storage. Because I can run out of fingers and toes on my entire body counting Fortune 1000 companies which I have visited personally that have multiple PBs of NAS-mounted Oracle databases which they use in mission critical settings, and hit with heavy I/O loads. Many of these databases are multiple TBs in size as well. Including, most importantly, Oracle Corporation.
So, yes, NAS is perfectly appropriate for storing Oracle databases of all kinds and sizes with all sorts of I/O requirements. Including databases which the customer absolutely has to have running always. Which Oracle Corporation itself has proven in spades.
All of which is to say that I war against Kool-Aid. In all forms, and at all times. Whether from my existing employer or not. Despite its insidious power. In fact, because of it. The truth must be embraced, and Kool-Aid must be rejected, always.
Getting back to NetApp, the snapshot-for-Oracle-hot-backup form of Kool-Aid began to lose effectiveness in the last couple of years of my tenure there (basically 2004 and 2005), and there was a lot of soul searching for another message. The message which emerged centered on the notion of writable snapshots in the form of FlexClones which were introduced in ONTAP 7. FlexClones work like this:

Similar to my last post, this graphic represents the state of a WAFL file system before and after a set of updates. The blocks belong to three files, color-coded pink, buff, and green. Compared to my last post, another set of “after” images of the blocks has been created. These are the blocks created by FlexClone write activity. This has one major side effect. The space held by the snapshot (referred to as snapshot overhead) increases as write activity within the FlexClone file system occurs. Previously, the blocks held by the snapshot (the light-colored blocks) were only the before images of blocks which were overwritten by the production file system. Now another write thread has been created, which will cause more before images of blocks to be held by the snapshot. This will increase the snapshot storage overhead. I will discuss snapshot storage overhead in detail in a future post.
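The space effect described above can be sketched with a toy model. This is only an illustration under stated assumptions, not ONTAP's actual block accounting: the class `ToyVolume` and its methods are hypothetical names, every write is assumed to land in a fresh physical block, and space is measured simply as the number of distinct physical blocks referenced by the production file system, the snapshot, or the clone.

```python
from itertools import count


class ToyVolume:
    """Toy write-anywhere volume: every write goes to a fresh physical block.

    Hypothetical model for illustration only; not NetApp's implementation.
    """

    def __init__(self, n_blocks):
        self._next_physical = count()
        # Each map is logical block number -> physical block id.
        self.active = {lb: next(self._next_physical) for lb in range(n_blocks)}
        self.snapshot = None
        self.clone = None

    def take_snapshot(self):
        # A snapshot freezes the current block map; no data is copied.
        self.snapshot = dict(self.active)

    def create_clone(self):
        # A writable clone starts out sharing every block with the snapshot.
        self.clone = dict(self.snapshot)

    def write(self, logical_block, target="active"):
        # target is "active" (production) or "clone"; the old block is never
        # overwritten in place, so the snapshot keeps its before image.
        block_map = self.active if target == "active" else self.clone
        block_map[logical_block] = next(self._next_physical)

    def blocks_in_use(self):
        used = set(self.active.values())
        for block_map in (self.snapshot, self.clone):
            if block_map is not None:
                used |= set(block_map.values())
        return len(used)


vol = ToyVolume(6)
vol.take_snapshot()
vol.create_clone()
print(vol.blocks_in_use())   # everything is still shared

vol.write(0)
vol.write(1)                 # production overwrites two blocks
print(vol.blocks_in_use())

vol.write(2, target="clone")
vol.write(3, target="clone") # the clone is a second write thread
print(vol.blocks_in_use())
```

In this model, space grows with production writes and again with clone writes, because the snapshot pins the original version of every block that either write thread replaces.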
FlexClones were introduced with great fanfare in 2005 and the claim was widely made by many folks at NetApp (myself included), that they were “completely unique to the industry”. Without question, that’s the purest form of Kool-Aid.
Recall the conversation I told you about in my last post with a major telecom customer in the UK. Here is the rest of that conversation:
Me: “NetApp is the only storage company to have writable snapshots! FlexClones are completely unique to the industry!”
Customer: “What are you talking about? CLARiiON SnapView snapshots are writable and always have been. So are Symmetrix Timefinder Snap snapshots. EMC has had this functionality for years.”
Again: Oops!
I must admit that I feel fairly stupid about this now. In my defense, there was a widespread view within NetApp that undue familiarity with EMC as a competitor was unhealthy. I lived in a state of shocking ignorance concerning the capabilities of EMC storage arrays and software. This ignorance was completely fine as far as my employer was concerned.
The functionality of EMC writable snapshots is somewhat similar to NetApp’s FlexClones. The following graphic points out how they work:

In EMC’s case, the after images of blocks modified in the snapshot are simply written to the RLP LUNs. This eliminates the additional snapshot overhead, at the cost of not maintaining the original version of that data. Again, a weighing exercise. With EMC snapshots, if you want to keep the original version, you must not overwrite it.
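A minimal sketch of that trade-off, based only on the description in the paragraph above. The class `WritableCofwSnapshot` and its method names are hypothetical, and the model assumes a single snapshot whose private blocks live in a dictionary standing in for the RLP. It is not EMC's implementation.

```python
class WritableCofwSnapshot:
    """Toy model of a writable copy-on-first-write snapshot (illustrative only)."""

    def __init__(self, source):
        self.source = source   # the production LUN, modeled as a list of blocks
        self.rlp = {}          # block number -> data as seen by the snapshot

    def write_source(self, block, data):
        # Copy on first write: preserve the before image once, then overwrite.
        if block not in self.rlp:
            self.rlp[block] = self.source[block]
        self.source[block] = data

    def write_snapshot(self, block, data):
        # The after image goes straight into the RLP. If a before image was
        # stored there, it is replaced, so the original version is gone.
        self.rlp[block] = data

    def read_snapshot(self, block):
        # Blocks never touched on either side are still read from the source.
        return self.rlp.get(block, self.source[block])


lun = ["a", "b", "c"]
snap = WritableCofwSnapshot(lun)

snap.write_source(0, "A")        # production update, before image "a" is saved
print(snap.read_snapshot(0))     # the snapshot still sees the original

snap.write_snapshot(0, "x")      # writing to the snapshot replaces that image
print(snap.read_snapshot(0), len(snap.rlp))
```

In this model a snapshot write consumes no additional RLP space for a block that already has an entry, but the point-in-time version of that block no longer exists anywhere, which is the weighing exercise described above.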
Kool-Aid. Watch out for it. It will suck you in. In future posts, I will continue to point out the areas in which the storage-for-database messaging has been less than totally honest. Stay tuned.
Well Mr. Oracle,
I would say I am very impressed with your facts and figures. I am just a customer to Netapps and would not have had more experience than you.
I read it all through from bottom to top, and the trend with the Oracle Backup and Recovery. I would like to send you a mail: my email add is odafeuk@yahoo.co.uk, let's get to talk more.
thanks.
Odafe
Posted by: Odafe | August 08, 2007 at 02:25 PM
Hello, I have both netapp and symmetrix storage.
The database in question is RAC OLTP and is mostly random r/w
I'd like to use snapshots as the primary recovery mechanism. I would keep 48 snapshots of the database file system.
Please correct me if I am in error.
On a COFW system, it seems to me that there is an inverse relationship between the number of snapshots and the random R/W performance of the primary lun. More snapshots decrease performance linearly.
I would think a WAFL file system could sustain the same random R/W performance regardless of the number of snapshots.
On the other hand, a COFW system excels if I am using a short-lived snapshot to make a backup to tape and then delete the snapshot after the backup completes. Once that snapshot is deleted there is no performance penalty.
A WAFL system still has the read penalty regardless of whether there are snapshots or not.
Is my thinking correct or am I in error?
Posted by: Jonathan Marianu | October 03, 2007 at 10:00 PM
Actually I see my error now.
A COFW system using multiple snapshots copies the original block to the RLP once and just updates the pointers to that block in the other snapshots.
So subsequent snaps do not impact performance as much as the first snap does.
When determining the impact on performance, the choice is not how many snaps but whether to maintain a persistent snapshot during production hours at all.
Thoughts?
Response by TOSG:
The COFW penalty paid by a set of snapshots can be easily determined. Let's look at the simplest case: Two snapshots. In that case, the COFW penalty is equal to the number of blocks which are updated for the first time following the creation of either or both snapshots. The reason I say either or both is because many blocks will be shared in common by both snapshots.
It would be best to give you an example in the form of a block diagram, as I do in my blog. However, responding to comments does not afford me that luxury. Perhaps I will do so later in a post.
Suffice it to say, if the two snapshots share blocks (which will normally be the case for the vast majority of blocks in a file system or LUN), then a subsequent update to a shared block will incur only one COFW penalty for that block.
Only if the two snapshots do not share a particular block (because it was updated after the creation of the oldest snapshot, but before the creation of the newest snapshot) does the penalty get paid twice. This will occur once before the creation of the second snapshot, and once after it.
Inserts create another special case. A block inserted after a snapshot does not incur any penalty at all. Thus, a block inserted after the first snapshot but updated after the second snapshot incurs one COFW penalty. And blocks inserted after the newest snapshot incur no penalty.
It seems confusing, but it really isn't. In practice, the creation of two snapshots in a file system or LUN where the vast majority of the blocks are shared will incur a COFW penalty which is extremely close to one snapshot only. If the blocks are less shared, then it will be closer to two.
In my experience multiple snapshots usually share the vast majority of the blocks in a file system or LUN, as I state above.
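In place of a block diagram, the counting rules above can be sketched in a few lines of Python. This is an illustrative model only: the function name `count_cofw_penalties` and the event format are assumptions made for the example, and the rule it applies is that an update pays a penalty only when the block has not been written since the most recent snapshot was taken.

```python
def count_cofw_penalties(initial_blocks, events):
    """Count copy-on-first-write penalties for a sequence of events.

    events is a list of (kind, block) tuples, where kind is "snapshot",
    "insert" or "update"; block is ignored for "snapshot".
    Hypothetical helper for illustration only.
    """
    snapshots_taken = 0
    # For each block: how many snapshots existed when it was last written.
    written_at = {block: 0 for block in initial_blocks}
    penalties = 0
    for kind, block in events:
        if kind == "snapshot":
            snapshots_taken += 1
        elif kind == "insert":
            # A new block has no before image, so nothing is copied.
            written_at[block] = snapshots_taken
        elif kind == "update":
            # At least one snapshot still needs the current version only if
            # the block is older than the newest snapshot.
            if written_at[block] < snapshots_taken:
                penalties += 1
            written_at[block] = snapshots_taken
    return penalties


events = [
    ("snapshot", None),
    ("update", "A"),    # penalty: first update after snapshot 1
    ("insert", "C"),    # no penalty
    ("snapshot", None),
    ("update", "A"),    # penalty: A is not shared, so it pays a second time
    ("update", "B"),    # penalty: B is shared by both snapshots, paid once
    ("update", "B"),    # no penalty: already copied
    ("update", "C"),    # penalty: inserted after snapshot 1, updated after 2
    ("insert", "D"),
    ("update", "D"),    # no penalty: inserted after the newest snapshot
]
print(count_cofw_penalties(["A", "B"], events))
```

Block B is the common case: it is shared by both snapshots, so its first update costs a single copy no matter how many snapshots reference it.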
Let me know if this needs further clarification.
Regards,
Jeff
Posted by: Jonathan Marianu | October 03, 2007 at 10:23 PM
Let me restate and you can tell me if I have it correct.
The COFW feature consists of two components:
-an RLP block journal,
-one or more snapshot block maps.
When a write request occurs on the primary lun, a copy of the original block is added to the RLP journal and the snapshot block maps are updated as appropriate. Regardless of the number of snapshots, the changed block is only written to the RLP journal once. When a snapshot is deleted, the blocks that only it references are deleted from the RLP journal. If this is the case, that seems efficient.
--------
Essentially correct. The RLP (Reserved LUN Pool for those not familiar with that term) is designed to be smart enough to know that a before image of a given block is required by more than one snapshot, if that is the case. In that event, a single copy is stored in the RLP journal, and a pointer to that copy is stored in the snapshot block map for each snapshot that requires that block to be preserved. This is why a single COFW penalty is paid for multiple snapshots when a given block is updated.
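The journal-plus-block-maps arrangement described in this exchange might be modeled roughly as follows. This is a sketch under assumptions, not the actual RLP design: the class `RlpJournal`, its fields, and the integer copy ids are all hypothetical, and the LUN is a fixed-size list with no inserts.

```python
class RlpJournal:
    """Toy model of a shared before-image journal with per-snapshot block maps."""

    def __init__(self, lun):
        self.lun = lun            # the primary LUN, modeled as a list of blocks
        self.journal = {}         # copy id -> preserved before image
        self.block_maps = {}      # snapshot name -> {block number: copy id}
        self.copies_written = 0   # number of COFW penalties paid so far

    def create_snapshot(self, name):
        self.block_maps[name] = {}

    def write(self, block, data):
        # Snapshots with no entry for this block still rely on the LUN copy.
        needing = [m for m in self.block_maps.values() if block not in m]
        if needing:
            copy_id = self.copies_written
            self.journal[copy_id] = self.lun[block]   # one copy...
            self.copies_written += 1
            for block_map in needing:
                block_map[block] = copy_id            # ...many pointers
        self.lun[block] = data

    def read_snapshot(self, name, block):
        block_map = self.block_maps[name]
        if block in block_map:
            return self.journal[block_map[block]]
        return self.lun[block]

    def delete_snapshot(self, name):
        removed = self.block_maps.pop(name)
        still_referenced = {
            copy_id for m in self.block_maps.values() for copy_id in m.values()
        }
        # Free only the copies that no remaining snapshot points to.
        for copy_id in set(removed.values()) - still_referenced:
            del self.journal[copy_id]


rlp = RlpJournal(["a", "b"])
rlp.create_snapshot("snap1")
rlp.create_snapshot("snap2")
rlp.write(0, "a1")                       # both snapshots need "a"
print(rlp.copies_written)                # a single copy serves both
rlp.delete_snapshot("snap1")
print(rlp.read_snapshot("snap2", 0))     # still readable after the delete
```

Deleting `snap1` here frees nothing, because `snap2` still points at the same journal copy; the copy is released only when the last snapshot referencing it goes away.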
Regards,
TOSG
Posted by: Jonathan Marianu | October 04, 2007 at 05:58 PM
I would guess that the RLP tracks whether a block is dirty using a bitmap of the data blocks offsets.
Which raises further questions in my mind.
1. Do RLP systems offer variable block sizes?
2. Do differing work loads benefit from different block sizes?
3. Has there been any benchmarking done on this?
Thanks very much for a great series of articles.
Todd Bourne
-------------
Response by TOSG:
I am not aware of storage systems providing variable block sizes. A cursory examination of both Symmetrix and CLARiiON does not reveal this. As I recall, NetApp sticks to a 4 KB block size as well. If you have further information please let me know.
Posted by: Todd Bourne | July 14, 2009 at 01:43 AM
Hi Jeff,
Great blog. My question is essentially about restoring to a snapshot.
If you are using a snapshot as a rollback mechanism during an upgrade (say), then if you need to roll back with a COFW system (EMC), does it rewrite the main LUN blocks from the RLP, or does it just repoint the changed blocks back to the RLP versions?
If the latter, then that means that after rolling back to a snapshot you end up with a more fragmented block layout than when you started. If the former, then there is a write penalty while the RLP blocks are rewritten to the main LUN.
With the NetApp approach, assuming you take regular snapshots anyway and hence have a degree of inherent fragmentation, there's effectively little difference after rolling back, plus the rollback itself would be more or less instant as no block re-writes are required.
Is that right?
Posted by: Paul Lewis-Borman | July 26, 2012 at 12:54 PM