
July 2007

July 26, 2007

Oracle Backup: Which Snapshot is best? (Part 2)

Every company in the industry has its own brand of Kool-Aid. This is the set of tenets which are believed inherently, regardless of the facts: articles of faith, if you will. Call me a skeptic, but this drives me crazy. At NetApp the Kool-Aid was largely around the notion of snapshots and the idea that only NetApp could create simple, fast, and easy snapshots. Well, I drank that Kool-Aid big time, and I dispensed it as well. When I was talking to a customer, I would try to sell them this brand of Kool-Aid, often with great success. However, once in a while the pitch would backfire. I well remember a trip to the UK where this fell apart. This was a major telecom customer in that region. Their DBA and storage networking staff were extremely competent and familiar with EMC CLARiiON arrays. The conversation went something like this:

Me: “NetApp snapshots are great! We have no write penalty. We have the best snapshots in the industry.”

Customer: “We use CLARiiON SnapView snapshots all the time. There is minimal, if any write penalty. We hardly notice it.”

Oops!

At this point, I have had the opportunity to thoroughly explore both NetApp and EMC snapshot technology. In this post, I will compare these technologies and discuss how they have been used by Oracle customers to implement instantaneous backup. I will conclude with a (hopefully fairly objective) discussion of the relative advantages and disadvantages of each approach. I promise to keep the Kool-Aid dispenser turned off.

EMC and NetApp took radically different approaches in creating their respective snapshot technologies. NetApp’s snapshots actually came about through serendipity (i.e. a happy accidental discovery). NetApp snapshots are an artifact of the design of NetApp’s file system, called WAFL (Write Anywhere File Layout).

WAFL is so named because it never overwrites existing blocks in place when updates occur. Instead it writes new blocks containing the updated data, and then frees the old blocks. Therefore, a snapshot can be assembled by simply retaining the old blocks rather than freeing them. No additional I/O is required to do this, which leads to NetApp’s accurate claim that their snapshots have no write penalty.
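To make the idea concrete, here is a minimal Python sketch of a write-anywhere layout. It is a toy model of the behavior described above, not NetApp's implementation: the class name `WriteAnywhereVolume`, its methods, and the simple "next free address" allocator are all assumptions made for illustration.

```python
class WriteAnywhereVolume:
    """Toy model (hypothetical): updates never overwrite a block in place."""

    def __init__(self, num_blocks):
        self.physical = {}    # physical address -> data
        self.active = {}      # logical block -> physical address (active file system)
        self.snapshots = {}   # snapshot name -> frozen copy of the block map
        self.next_free = 0
        self.write_ios = 0
        for logical in range(num_blocks):
            self._place(logical, f"v0-{logical}")
        self.write_ios = 0    # do not count the initial load

    def _place(self, logical, data):
        # Always write to a fresh physical address: one write I/O.
        addr = self.next_free
        self.next_free += 1
        self.physical[addr] = data
        self.active[logical] = addr
        self.write_ios += 1

    def snapshot(self, name):
        # A snapshot is just a retained copy of the block map: no data I/O.
        self.snapshots[name] = dict(self.active)

    def write(self, logical, data):
        old = self.active[logical]
        self._place(logical, data)
        # Free the old block only if no snapshot still references it.
        if not any(old in m.values() for m in self.snapshots.values()):
            del self.physical[old]

    def read(self, logical, snapshot=None):
        block_map = self.snapshots[snapshot] if snapshot else self.active
        return self.physical[block_map[logical]]


vol = WriteAnywhereVolume(4)
vol.snapshot("s1")
vol.write(2, "first update")    # old block is retained for snapshot s1
vol.write(2, "second update")   # the intermediate version is freed
print(vol.read(2), "|", vol.read(2, "s1"), "|", vol.write_ios)
```

Each update costs exactly one write in this model, whether or not a snapshot exists, which is the "no write penalty" claim in miniature.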

On CLARiiON and Symmetrix arrays, EMC has no file system. Rather, the arrays provide LUNs to hosts which in turn run file systems like Veritas or Oracle’s ASM. Therefore EMC has no visibility into the meta-data that makes up a file system. In order to create a snapshot technology, EMC had to take a different approach. EMC copies the before image of each storage block into a set of special LUNs in something called the reserved LUN pool (RLP). EMC then writes the after image into the normal LUN in the same location as the original block. This preserves the integrity of the host file system, while allowing a point-in-time instantaneous copy of that file system to be assembled.

Interestingly, when EMC created the Celerra Snapsure Checkpoint (Celerra’s version of snapshots), they used the same approach. By then, EMC had some experience with snapshots. Even though the file system was now under EMC control, they made the same choice to use an RLP mechanism to store the snapshot data. The reasons for this will become clear from the discussion below.

The following graphics illustrate the differences between the NetApp approach and the EMC approach. First look at the NetApp approach:

[Figure: Snap1small_2, block layout of a WAFL file system with snapshots]

The pink, buff and green file folders represent three different files in the WAFL file system. Since snapshots are being used on this file system, the old version of the blocks A, B, C, D, F and I are being retained. These are the light colored blocks. The darker versions of the blocks, marked with either a single or double quote mark, are the after images of the updated storage blocks. The double quotes represent blocks which have been updated more than once. Note that the first update of block C (block C') has been freed. It is not referenced by any snapshot. The normal colored (neither dark nor light) blocks have not been updated at all. They are shared by both the snapshot and the active file system.

Follow the I/O pattern. Writes to update existing blocks require only one I/O, that’s true. But subsequent reads of the file system are messy. If you read the file system sequentially (that is block A followed by block B, and so forth) the I/O pattern in the newly created file system is perfectly sequential. In the case of the second diagram, where the file system has been updated, it’s not. The most recent versions of the blocks have become scattered throughout the disks. Do it mentally right now. Note the amount of head movement required to read the blocks in this order: A', B', C", D', E, F', G, H, and I'.
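If you would rather not do it mentally, the head movement can be roughed out in a few lines of Python. This is only a sketch: it assumes nine blocks initially laid out at addresses 0 through 8, that each update lands at the next free address, and an update order that I made up (the diagram does not specify one). The helper names are hypothetical.

```python
def seek_distance(addresses):
    """Total head travel, in blocks, when visiting addresses in the given order."""
    return sum(abs(b - a) for a, b in zip(addresses, addresses[1:]))


def write_anywhere_layout(names, updates):
    """Physical address of the latest version of each block, in logical order."""
    location = {name: addr for addr, name in enumerate(names)}
    next_free = len(names)
    for name in updates:
        # Every update is written to a new location instead of in place.
        location[name] = next_free
        next_free += 1
    return [location[name] for name in names]


names = "ABCDEFGHI"
updates = "ICAFCDB"                      # assumed order; C is updated twice
in_place = list(range(len(names)))       # layout if blocks were updated in place
scattered = write_anywhere_layout(names, updates)

print("update in place:", seek_distance(in_place))
print("write anywhere: ", seek_distance(scattered))
```

The absolute numbers mean nothing on real disks. The point is only that the same logical sequential read covers far more distance once the current blocks are scattered.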

This is the “sequential read after random write” (SRARW) performance problem of WAFL. It’s not just an artifact of snapshots. It’s inherent in the entire file system. This performance issue is real. It affects any NetApp customer who must do sequential I/O of database data after that data has been updated. That’s a lot of customers. Many, many databases are mixed use, involving OLTP-style data entry combined with DSS-style reports. Those customers will hit the SRARW issue big time.

Now consider the EMC snapshot approach. Examine the following diagram:

[Figure: Snap2small_2, block layout of an EMC production LUN and reserved LUN pool]

First note the write I/O pattern. Two writes and a read were required to update each of blocks A, B, C, D, F and I: a read of the before image, a write of that before image to the RLP, and a write of the after image to the production LUN. This happened on the first write following the creation of the snapshot only. The first update to block C (C' in our diagram) was not written to the RLP. Only the second update (C") is stored in the production LUN. Thus, each subsequent update to these blocks will incur one write I/O only, exactly the same as if no snapshot existed. This is the “copy on first write” (COFW) performance issue. The “first write” verbiage indicates that the issue exists only on the first update to a block after the snapshot is taken.
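Here is a small Python sketch of the copy-on-first-write accounting just described. It is a toy model under stated assumptions, not EMC code: the class `CofwLun`, a single snapshot, and the dictionary standing in for the reserved LUN pool are all hypothetical.

```python
class CofwLun:
    """Toy model (hypothetical) of a LUN with one copy-on-first-write snapshot."""

    def __init__(self, blocks):
        self.production = list(blocks)   # blocks are updated in place
        self.reserved_pool = None        # stands in for the RLP; None = no snapshot
        self.reads = 0
        self.writes = 0

    def snapshot(self):
        self.reserved_pool = {}

    def write(self, index, data):
        first_write = self.reserved_pool is not None and index not in self.reserved_pool
        if first_write:
            before_image = self.production[index]
            self.reads += 1                          # read the before image
            self.reserved_pool[index] = before_image
            self.writes += 1                         # copy it to the reserved pool
        self.production[index] = data
        self.writes += 1                             # write the after image in place

    def read_snapshot(self, index):
        # Snapshot view: saved before image if one exists, else the shared block.
        return self.reserved_pool.get(index, self.production[index])


lun = CofwLun("ABCDEFGHI")
lun.snapshot()
lun.write(2, "C'")
print(lun.reads, lun.writes)    # cost of the first update to block C
lun.write(2, 'C"')
print(lun.reads, lun.writes)    # the second update adds a single write
```

Note that the production list never changes shape or order, which is why sequential reads of the production LUN stay sequential.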

Now examine the read I/O pattern. Blocks in the RLP are not organized sequentially, but that’s fine. You only read these blocks when you are accessing a snapshot. The blocks in the production LUN are perfectly laid out. Sequential I/O to these blocks is optimal. That’s because the blocks have been updated in place, not scattered throughout the disks.

That, then, is the trade-off. With snapshots, as with life, there is no free lunch. A choice must be made between COFW and SRARW. In other words, a write performance penalty for RLP-based snapshots versus a read performance penalty for WAFL-based snapshots. The decision you must make as a customer is: Which of these is the more serious issue? I have spent some time seriously pondering this question. Consider the following (hopefully free of Kool-Aid):

  1. The COFW issue applies only to the first update to a block after the snapshot is taken. Therefore, this issue is temporary. The SRARW issue applies to every block that is updated after it is first written. Therefore, this issue is permanent, absent some operation to reorganize the file system. That reorganization could potentially require reading and writing every single block in the file system: a very expensive operation, in other words.
  2. Most workloads are read intensive. Even TPC-C, certainly a very write intensive workload, is mostly reads. Index scans frequently involve sequential read I/O. Undo I/O is largely sequential. Even temp is largely written and read sequentially. Retaining the sequential I/O advantage is very important for many workloads.
  3. The testing my team is conducting at RTP indicates that the COFW issue is modest, and very temporary. Performance declines slightly for a few iterations, then returns to the previous level. After that, it’s exactly the same as if you had never taken the snapshot at all. The following chart illustrates this (taken from our Commercial Solutions testing effort):

[Chart: Image3_2, performance across test iterations after a snapshot is taken]

  4. Oracle expects storage vendors to honor a simple promise: locality of reference. This promise is that database blocks that are laid out by Oracle near each other in the tablespace will be stored on disk near each other as well. Many Oracle storage tuning mechanisms rely on this promise. For example, Oracle has a tuning concept called a cluster in which tables are stored together. For two tables in a parent/child relationship the child rows are stored near the parent row. Index organized tables and materialized views are similar concepts. All of these tuning mechanisms within Oracle rely on the notion that data which are stored near each other in the datafile are also stored near each other on the disk. Some minor level of fragmentation can be tolerated, but wholesale scattering of the storage blocks throughout the disks will damage the effectiveness of these features. The promise of locality of reference is not kept in this case.
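The "temporary" nature of COFW in the first and third points can be illustrated with a rough simulation. This is a sketch only, and it is not the test my team ran: the uniform random write pattern, the block and write counts, and the function name are all assumptions chosen for illustration.

```python
import random


def cofw_extra_copies(num_blocks, writes_per_iteration, iterations, seed=0):
    """Count the extra before-image copies COFW incurs in each iteration.

    Assumes one snapshot taken before the first iteration and uniformly
    random writes across the LUN (a hypothetical workload).
    """
    rng = random.Random(seed)
    copied = set()               # blocks whose before image is already saved
    extra_per_iteration = []
    for _ in range(iterations):
        extra = 0
        for _ in range(writes_per_iteration):
            block = rng.randrange(num_blocks)
            if block not in copied:
                copied.add(block)    # first write since the snapshot: copy needed
                extra += 1
        extra_per_iteration.append(extra)
    return extra_per_iteration


print(cofw_extra_copies(num_blocks=1000, writes_per_iteration=500, iterations=10))
```

In this model the extra copies are bounded by the number of blocks on the LUN, so the overhead has to taper off as more blocks receive their first write. The SRARW cost has no such bound, since every update moves the block again.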

In my opinion, having worked with both snapshot technologies, EMC makes the correct trade-off. The ability to perform optimal sequential read I/O must be preserved. The promise of locality of reference must be honored. A temporary, modest write penalty is a reasonable price to pay to do that. RLP-based snapshots, such as the common implementation across CLARiiON SnapView snapshots, Celerra Snapsure Checkpoints, and Timefinder Snap, do that. NetApp’s WAFL-based snapshots do not.

What is extremely clear, though, is that the choice to drink Kool-Aid, anyone’s Kool-Aid, is a fool’s choice. You need to look carefully at both types of snapshots and make an informed choice about which is appropriate for you in the context of your particular workload. Snapshots are wonderful, and they benefit the Oracle user greatly. But they are not without risks and trade-offs. To contend otherwise would be dishonest.

My next post will discuss the issue of writable snapshots, and the competing technologies in that space.

July 23, 2007

Oracle Backup: Which Snapshot is best? (Part 1)

In this post, I will begin to get to the heart of the matter: The use of storage-level instantaneous copy technology with Oracle backup.

When I talk about storage-level instantaneous copy technology (which I will refer to from now on with the acronym SLIC), I mean things like snapshots and BCVs. Anything that facilitates a point in time instantaneous copy of a database or file system, through some storage-layer mechanism. SLIC technologies come in two broad types: Physical copy technologies (like BCVs) and virtual copy technologies (like snapshots).

When I first went to work for NetApp back in 1997, this was my first challenge. Those were heady times. The internet boom was underway. EMC was pushing their version of SLIC technology, which was largely BCVs. NetApp had SLIC in the form of snapshots, but Oracle was highly resistant to the concept of NFS. My first assignment was to validate the use of NetApp’s NFS implementation with Oracle databases. In the process, I was strongly encouraged to figure out how to do an Oracle hot backup with snapshots. Which I successfully did in early 1998.

At this point, Oracle 7 was the current version, and hot backup (or “user managed backup” as Oracle preferred to call it) was the only way to back up an Oracle database. It would be difficult to over-emphasize the importance of the hot backup feature in Oracle’s success. I think it was absolutely the killer feature which allowed Oracle to become the dominant force on the planet in this space. Let me explain why that was so.

Again, the internet boom was coming into full swing. This pushed a fundamental change in the way that databases worked in many IT organizations, especially dotcoms or companies that wanted to sell products in the online marketplace. The internet was a global phenomenon. That meant 24 x 7 access. Before the internet, DBAs had some downtime each day to back the database up. No longer. Any downtime was now unacceptable. The database had to be backed up while online and open. The database software market became a horse race to see which vendor could do a better job backing up an online production database.

The dominant forces in the market were Oracle and Sybase. Microsoft SQL Server was a Sybase knock-off and basically a toy. IBM DB2 was stuck in the proprietary mainframe world, and had no real open systems strategy. It was really between Oracle and Sybase.

Oracle had the feature called hot backup. This allowed you to use SLIC to make an instantaneous copy of a running Oracle database while it was in a special mode called “hot backup mode”. I/O to the database was allowed to continue the entire time. Basically, hot backup mode was, and still is, a form of controlled or “gated” corruption. Oracle knew that any block written to the datafiles during the period when they were in this mode was potentially corrupt. So Oracle simply copied those blocks to the logs as well. During recovery, Oracle ignored those blocks in the datafiles, and pulled them from the logs instead. This meant that for purposes of optimizing hot backup, you needed to take the copy as rapidly as possible, both to minimize the number of ignored blocks, and to reduce the impact of the huge increase in logging which occurred while in hot backup mode. Enter SLIC, which enabled the DBA to make a copy of the database instantly.
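The ordering of a snapshot-based hot backup can be sketched in Python as follows. The `ALTER TABLESPACE ... BEGIN BACKUP`, `END BACKUP`, and `ALTER SYSTEM ARCHIVE LOG CURRENT` statements are standard Oracle SQL. Everything else is an assumption for illustration: `run_sql` and `take_snapshot` are hypothetical callbacks you would wire to your own database connection and storage tooling, and the tablespace names are placeholders.

```python
def hot_backup_with_snapshot(tablespaces, run_sql, take_snapshot):
    """Put tablespaces in hot backup mode, snapshot, then leave backup mode.

    run_sql and take_snapshot are caller-supplied (hypothetical) callbacks.
    """
    started = []
    try:
        for name in tablespaces:
            run_sql(f"ALTER TABLESPACE {name} BEGIN BACKUP")
            started.append(name)
        # The snapshot is near-instant, so the window of extra logging is short.
        take_snapshot()
    finally:
        # Always leave backup mode, even if the snapshot step fails.
        for name in started:
            run_sql(f"ALTER TABLESPACE {name} END BACKUP")
    # Archive the current log so the redo covering the backup window is saved.
    run_sql("ALTER SYSTEM ARCHIVE LOG CURRENT")


steps = []
hot_backup_with_snapshot(
    ["USERS", "APP_DATA"],                       # placeholder tablespace names
    run_sql=steps.append,                        # record SQL instead of executing it
    take_snapshot=lambda: steps.append("<storage snapshot>"),
)
print("\n".join(steps))
```

The shorter the gap between the first `BEGIN BACKUP` and the last `END BACKUP`, the smaller the logging overhead, which is exactly why an instantaneous copy matters here.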

With SLIC, Oracle databases could be backed up very rapidly. The backup operation had minimal impact on the production database. Most DBAs could not measure any performance impact at all. This meant that databases could be backed up much more often. The impact of hot backup on logging was a tiny blip, that's all.

SLIC allowed the copy to become a fully writable copy of the database instantly as well. This meant that the restore time was also dramatically reduced. Since the database could be backed up more often, this also reduced the time for recovery, since fewer log files needed to be applied.

In contrast, Sybase did not support any SLIC technology whatsoever. You were required to use a tool called Backup Server to make a backup of the database. This tool did lots and lots of I/O. The process seriously affected the production database’s performance, and the backup operation took hours. Restore and recovery time were similarly long.

Thus, when combined with SLIC, Oracle had the best online backup technology going. This allowed Oracle to crush Sybase and become the logical choice for all internet-facing database applications. This in turn led to Oracle’s dominance in the database marketplace, which persists to this day.

EMC led the charge in the storage space to provide SLIC technology to the Oracle database market, initially in the form of BCVs. As I said, I became involved in 1997 in validating NetApp snapshots, as well as NFS, for storing and backing up Oracle databases. By that time EMC had also introduced a set of snapshot SLIC technologies in the form of Timefinder Snap on the Symmetrix and SnapView snapshot on the CLARiiON.

In my next post I will discuss the differences between the SLIC approaches taken by EMC and NetApp and how those decisions have affected the Oracle database user.

July 18, 2007

Why Oracle Is So Important

In this post I will begin to explain my philosophy as an IT professional, and why I have ended up in the specialty that I am presently working in.

It's all about Oracle. Oracle is a fascinating product, and a mesmerizing company to watch. The company's principal, Larry Ellison, is certainly the most flamboyant person in the industry, far more engaging than anyone in a similar position in either Microsoft or IBM. His adventures with the America's Cup, and the endless procession of starlets and supermodels who have graced his arm, make him the stuff of tabloids and paparazzi. Nobody finds Bill Gates or even Steve Jobs this interesting. While this is all fun, I regard it as a distraction from the real reason why Oracle is so important.

There is one critical thing to observe about Oracle, aside from the fact that it works very, very well for the purpose for which it is intended. That is its cost. Oracle is expensive. Street price wise, if you are talking Enterprise Edition, Oracle Database 10g is an order of magnitude more expensive than either of its competitors: Microsoft SQL Server or IBM DB2. Yet it manages to dominate the enterprise database management space, according to most of the current market share surveys.

I liken this to BMW outselling Toyota in the global car market. It's counterintuitive. Why would a more expensive product outsell a less expensive product, all other things being equal? When you think about it, a Toyota generally provides the same functionality as a BMW. Similarly, SQL Server is a perfectly fine product which works just as well as Oracle in most respects. Furthermore, Microsoft has consistently improved the capabilities of SQL Server relative to Oracle. Microsoft SQL Server is definitely less expensive than Oracle Database 10g. Yet Oracle continues to be a serious competitor to SQL Server. You could even say that Oracle kicks Microsoft's butt in this space.

So why would the customer pay the extra money for Oracle? Simple. The customer really, really, really cares about the data being managed by Oracle. In other words, this data is the most critical data to the customer, and therefore worth the money spent to manage it. It's the customer's crown jewels in other words. The company's critical enterprise resource management system is very likely to be managed by Oracle. And given that the customer has paid the price to run Oracle, the tendency will be to put more and more data onto Oracle, despite the cost. Oracle is simply addictive. It works, and works well. Once you have bitten the bullet, you tend to keep biting it.

Needless to say, customers are very conservative about radically changing the environment in which they store their most critical data. Since Oracle has become firmly entrenched in this niche, it is likely that they will continue to dominate in this area for a long time to come.

This situation creates what I call the "halo effect". In enterprises which run Oracle, it is the most visible software product in the data center. The Oracle data must be more reliable, more available, more backed up, and more protected than any other data. A storage, server or OS vendor who supports the Oracle environment is managing the customer's crown jewels. If you can store the Oracle database data, you can damn sure store the customer's Microsoft Exchange, Microsoft SQL Server, or file system data.

I take the position that this is the reason why Linux has penetrated the enterprise OS market. Before Oracle endorsed Linux, Red Hat and the other Linux vendors were struggling to establish a toe hold in the data center. They were relegated to web servers and other less critical functions in engineering labs. Linux was the stuff of geeks and propeller heads. Nobody took it seriously. The whole open source thing was viewed with deep suspicion.

Once Oracle started pushing Linux as a database server OS, all that changed almost overnight. The enterprise market for Linux exploded. Suddenly, Red Hat was the darling of Wall Street, and Linux started showing up inside the data center in Fortune 1000 companies. At the same time, the proprietary UNIX market (especially Sun Solaris) took a huge dump. Sun's stock went into the tank, and they have struggled ever since, even though everyone knows that Sun Solaris still owns a huge part of the Oracle database server OS market. By endorsing Linux at the expense of proprietary UNIX, Larry Ellison moved the market. That's the halo effect at work.

This means that the business of storing Oracle data, while an interesting business in its own right, is much more important than the dollar impact of that business. This also means that it is critical that EMC continue to aggressively pursue the Oracle market, and maintain its leadership position in this market relative to NetApp. My tenure at EMC has largely been about this challenge, and our strategy in facing it.

In my next post I will explain why EMC technology is the best available solution for storing Oracle database files, and how we intend to continue to improve that technology.

July 16, 2007

Why EMC Cluster Technology Rocks

In my previous post, I began to explore the advantages of EMC technology over NetApp for Oracle database storage, focusing in that post on Consistency Groups. This post will examine the nature of clustering.

You have to bear in mind that when I joined NetApp back in 1997, they had no clustering technology at all. Everything was just a single headed filer, with lots of single points of failure. I also had no clustering experience, having come to NetApp from a background in development. When NetApp came out with their first clusters, it seemed incredibly cool to me. We used to sit around in the lab and watch the cluster countdown happen, chanting "ten!", "nine!", "eight!" and so forth until the filer failed over. All to peals of laughter when the failover happened.

Those times seem so simple now. I really had no idea what a real cluster was, and how the band-aid that NetApp clustering was (and largely still is) did not fit that model at all.

Let me explain.

A real cluster looks like this:

You have two redundant pieces of hardware running in intimate fellowship with one another. They both know pretty much everything about each other. If one of them fails, the LUNs or file systems owned by the failed member of the cluster are simply trespassed over to the other member. End of story.

NetApp's version: In the event of failure, the OS environment representing the failed member of the cluster reboots inside a virtual machine environment on the surviving member, running under, you guessed it, Java. Then this virtual machine Java thingy takes ownership of its storage objects, and continues whatever it was doing. With lots and lots of overhead. Goody!

EMC's clustering is simply better. There is really no question about this. EMC has been making clustered systems for a long, long time, and makes them very, very well. It's what we do.

July 11, 2007

Why EMC Technology Is So Cool

In my last post, I promised to explain the attraction to EMC and EMC technology. Well, here goes.

One of the coolest things about EMC technology is the notion of consistency groups. I first became aware of this concept while at NetApp, when I thought about the idea that you could coordinate snapshots between filers, or even on the same filer across volumes. This has lots of interesting uses.

For example, you can create a copy of an Oracle database called a restartable image (sometimes called a crash consistent image). This copy can be restarted to the state it was in when it was created, but it cannot be recovered to a later state. This makes it different from a backup, which can be restored, and then recovered to any later point in time.

Why is this interesting? Simple. You don't have to do anything to the source database to make this happen. You simply do it. It's exactly the same type of thing as if you powered the database server down abruptly. Sure, the database files are all in an inconsistent state. But they are inconsistent in a very interesting way. They are all from the exact same point in terms of I/O. Because of this, you can use the automatic restart feature of Oracle to open this copy of the database. This means that the creation of this kind of copy does not have to be coordinated with the production database in any way. That's very, very cool for lots of reasons.

Let's say you want to create a test/dev copy of the database. Does this version have to be recoverable? No. It simply needs to have representative data. Yes, you could use a complex backup/recovery method to create this copy. But that's overkill. How much simpler would it be to simply make a restartable copy and provide that to the test/dev users?

Problem is, in order to make a restartable copy, you must create simultaneous copies of the datafiles, log files, and controlfiles. That is, these copies must represent the exact same point in time in terms of I/O, exactly like powering down the database server abruptly. If you create a copy of the datafiles, and then a later copy of the control files, and then a later copy of the log files, this does not work. These copies cannot be restarted using the automatic restart feature of Oracle. Oracle will simply barf an error message and fail to open the database.
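A toy Python sketch of the requirement: every volume's copy must be cut at the same point in the I/O stream. This model is deliberately simplistic and is an assumption throughout. It represents "point in time" as a single global write sequence number, the function names are hypothetical, and it says nothing about how a real array actually holds I/O across volumes.

```python
def take_group_snapshot(volumes, io_sequence):
    """Coordinated copy: every volume is cut at the same I/O sequence number."""
    return {volume: io_sequence for volume in volumes}


def take_uncoordinated_snapshots(volumes, start_sequence, writes_between=1):
    """Copies taken one after another while the database keeps writing."""
    return {
        volume: start_sequence + i * writes_between
        for i, volume in enumerate(volumes)
    }


def is_restartable(cut_points):
    """True only if all copies represent one and the same point in the I/O stream."""
    return len(set(cut_points.values())) == 1


volumes = ["datafiles", "controlfiles", "logfiles"]
grouped = take_group_snapshot(volumes, io_sequence=1000)
one_by_one = take_uncoordinated_snapshots(volumes, start_sequence=1000, writes_between=50)

print(grouped, is_restartable(grouped))
print(one_by_one, is_restartable(one_by_one))
```

The second set of copies looks just like the datafiles-then-control-files-then-log-files case above: each copy is fine on its own, but together they do not describe any state the database was ever in.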

Best practices dictate that these files be stored on separate physical volumes. Thus, if you are going to use a snapshot or cloning technology at the storage layer to make a restartable copy, this process must be coordinated across multiple arrays, or at a minimum, multiple storage volumes.

I figured out all this while at NetApp. I even went so far as to suggest this feature. I was told that it was really, really hard. Others had suggested it. Many had tried to create it. All had failed.

EMC has had it for quite some time now. It's called consistency groups. It works in exactly the manner that I suggested, and it enables the creation of a restartable image of a running production Oracle database. No coordination with the production database is required. It works beautifully. See the following white paper for instructions on how to do this:

EMC Solutions for Oracle RAC 10g on Linux CLARiiON CX3 Series FCP Applied Technology

July 01, 2007

My Time at NetApp (Reprise)

In my first post, I explained that my decision to leave NetApp and join EMC was a push/pull, with NetApp pushing and EMC pulling. In that post I explained some of the issues I had with NetApp during my last six months there. I also said I would explain the attraction to EMC in my next post. However, I have decided for the moment to remain focused on NetApp. My next post will begin to explain why I think EMC is a better storage company than NetApp, with generally superior technology.

In this post I would like to focus on the issue of NetApp's SAN implementation.

During the first years of my tenure at NetApp, NetApp was not just a NAS company. They were the NAS company. The company invented the market. (Bear in mind that when I joined the company it had only about 100 employees, versus several thousand today.)

The goals of NetApp were simple:

1. Create a new market.
2. Dominate that market.

While NetApp focused on the NAS market, this worked beautifully. This strategy began to fall apart when NetApp hired Rich Clifton (now Senior Vice President at NetApp) away from Data General.

I remember one of my first meetings with Rich. He showed a pie chart of the storage industry, which at that time showed the NAS market as 7% and the SAN market as 93% of that market. Rich asked a rhetorical question: "Does anyone believe that we can sustain our current growth rates without penetrating the SAN market?" (At that time NetApp's growth rate was about 80% per annum.)

My response was a resounding "Yes!" All of NetApp's growth up to that point was fueled by stealing market share away from the SAN market, including the database storage market. Of course all of the Oracle storage which NetApp had captured up to that point had previously been SAN. Much of this had been stolen from EMC. In other words, I argued, the way for NetApp to continue to grow at its current rate was to reshape the pie, as it had done up to that point.

I know that others within NetApp felt the same way I did, by the way, including David Hitz. (David later admitted that he was wrong, but I maintain that he was actually right all along.)

Rich disagreed, and eventually his view prevailed. This led NetApp into the development of the FCP and later the iSCSI protocols as implemented in their filer within the ONTAP operating system. All of these events happened around 2000.

I can certainly see the eventual development of iSCSI as being central to NetApp's vision of being the IP storage company. However, iSCSI at that time was very immature and production implementations were rare. The SAN market was really a FCP market.

The trouble for NetApp occurred in the FCP protocol. This protocol was implemented tactically. A classic Steve Kleiman (NetApp CTO) quip was that the goal of the FCP protocol project was to be at least as good as the "crummiest SAN RAID array in the market." At that time, the crummiest array was the Sun T1000, so the performance and reliability of the Sun T1000 was set as the bar by which the FCP was measured.

Trouble was, the T1000 really was a piece of junk. Not the least of the problems was the issue of cluster timeouts. The ability of the FCP to fail over in both a host and filer cluster became a huge issue. Eventually, every significant feature enhancement in ONTAP became gated on the ability of the FCP to support that feature in a cluster environment. QA became the gating factor on Engineering's ability to perform. And the bottleneck in QA was the problem of verifying the FCP protocol's ability to fail over.

While this sounds easy to fix, it wasn't. I would maintain that the decline in the rate of development and innovation at NetApp directly resulted from this one issue. If NetApp had simply stuck to their knitting and remained a NAS-only company, certainly leading the iSCSI market as it became a popular protocol, but by all means ignoring the FCP protocol entirely, then NetApp would be a far larger, richer and more successful company today than they presently are.

My next post will, as promised, explore my attraction to EMC, and why they will eventually prevail in their struggle with NetApp.

View Jeff Browning's profile on LinkedIn

disclaimer: The opinions expressed here are my personal opinions. I am a blogger who works at EMC, not an EMC blogger. This is my blog, and not EMC's. Content published here is not read or approved in advance by EMC and does not necessarily reflect the views and opinions of EMC.