
August 2007

August 28, 2007

Thin Provisioning Part II: Why TP Works Well for Oracle (Sometimes)

In my last post, I explained why I think thin provisioning is fairly useless for most Oracle database environments. I have been taking a bit of heat for this post, mostly from my good friend Barry Burke (AKA The Storage Anarchist). Given the respect and affection I have for Barry, I thought it might be worthwhile to put up a clarifying post, pointing out some of the areas where Barry believes thin provisioning is a really, really good thing for many usage cases he commonly sees. I think he makes some reasonably good points.

Bear in mind that I work in the Commercial program within EMC. That’s the set of customers who have revenues of between $25 million and $1 billion. These are most definitely low-end to mid-range customers. Barry works for the Symmetrix group, which largely addresses what we call the Enterprise customers (revenues of greater than $1 billion). This is a vastly different type of customer from mine. In my defense, the Commercial space accounts for the vast majority of business activity and technology spending in this country. So I have a bit of a bias here. However, EMC has traditionally served the Enterprise customer base more, and thus our customers are clustered in that direction. And there is also no question that Barry’s customers are the more well-heeled, household-name customers. In other words, most of the global financial services, telecommunications, insurance and manufacturing companies you probably know and love.

So let’s examine the usage cases for the two sets of customers. I’ll take the Commercial space first, since that is what I am most familiar with. These customers have some of the following characteristics:

  1. Small IT organizations with minimal politics and bureaucracy
  2. Limited budget
  3. Smaller allocations of storage due to cost constraints
  4. Granularity of the typical database space allocation is a significant percentage of the total space on the array 

Take an example of an application where the DBA has requested 100 GB of space, and has stated he or she will need 500 GB of space eventually. The storage administrator in a thin provisioning context would tend to allocate space as shown in the following graphic:

[Figure: a 500 GB thinly provisioned LUN, backed by 150 GB of physical space, holding 100 GB of datafiles]

In this graphic, you see that the storage administrator has given the DBA a thinly provisioned LUN of 500 GB in size. On an array with a few TB, that’s a significant amount of storage. I see that kind of allocation all the time. The DBA has created a database on the LUN with 100 GB of datafiles, again not an atypical size. The thinly provisioned LUN is backed by 150 GB of physical space.

On the next occasion where the DBA needs additional storage, he or she would tend to do something like the following:

[Figure: the DBA adds another 100 GB of datafiles, exceeding the 150 GB of physical space behind the LUN]

Here, the DBA has allocated an additional 100 GB of space, probably by creating one or more datafiles. That brings the total to 200 GB, which pushes the thinly provisioned LUN past its 150 GB of physical backing. The storage administrator must now add storage to the LUN. Meanwhile, the DBA’s request to create files has been denied, much to his or her chagrin. After all, the DBA thinks he or she has 500 GB of space.

This is the dark side of thin provisioning. Like I said, thin provisioning is lying, and sometimes you get caught. You get caught when the granularity of the additional space required by the user (in this case the DBA) is a very significant fraction of the total space in the LUN. Commercial DBAs tend to allocate space in chunks that large, which makes thin provisioning a risky proposition at best. It is also of questionable value, since the DBA will end up dealing with the storage administrator anyway. No savings in terms of administration here.
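The Commercial scenario above can be reduced to a toy model. This is a minimal sketch, assuming a LUN that simply refuses new data beyond its physical backing; `ThinLun` and its methods are hypothetical names, not any array's API, and real arrays and hosts surface the failure in their own ways.

```python
class ThinLun:
    """Toy thinly provisioned LUN (hypothetical model, not any array's API)."""

    def __init__(self, logical_gb, physical_gb):
        self.logical_gb = logical_gb    # size advertised to the host and the DBA
        self.physical_gb = physical_gb  # disk actually backing the LUN
        self.used_gb = 0                # space the database has really written

    def allocate(self, gb):
        """Try to place `gb` of new datafiles on the LUN; return True on success."""
        if self.used_gb + gb > self.logical_gb:
            raise ValueError("request exceeds the advertised LUN size")
        if self.used_gb + gb > self.physical_gb:
            return False  # the host thinks the space exists, but the array cannot back it
        self.used_gb += gb
        return True

    def add_physical(self, gb):
        """The storage administrator adds disk, up to the advertised size."""
        self.physical_gb = min(self.logical_gb, self.physical_gb + gb)


lun = ThinLun(logical_gb=500, physical_gb=150)
print(lun.allocate(100))  # True: 100 GB of datafiles fits in 150 GB of physical space
print(lun.allocate(100))  # False: 200 GB would exceed the 150 GB behind the LUN
lun.add_physical(100)     # the storage administrator has to get involved after all
print(lun.allocate(100))  # True: 200 GB now fits in 250 GB of physical space
```

Note that the second request fails even though the DBA is nowhere near the 500 GB he or she was promised.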

Now let’s look at a usage case that Barry pointed out to me where thin provisioning works very well for a large Enterprise customer. These customers have the following characteristics:

  1. Large IT organizations with lots of politics and bureaucracy
  2. Larger budgets with less stringent cost constraints
  3. Larger amounts of storage available on larger arrays
  4. Granularity of the space demanded by the DBA for a typical Oracle database space allocation (as in creating a new datafile) is a small percentage of the total space on the array

Point 4 is the key. In my first post, I said that thin provisioning works very well for file systems with unstructured data where the size of the marginal file is a small percentage of the array. In the case of an Enterprise class customer, the array is so huge that even a database allocation looks like a file on a file system. Let’s look at one such scenario, as illustrated by the following graphic:

[Figure: many applications with thinly provisioned LUNs sharing one large pool of physical storage]

In this scenario, we have many, potentially hundreds, of apps stored on the array. Each app has far more provisioned space than utilized space, and all of them sit on a very large pool of physical storage. These may be testing or development databases, where careful configuration of the physical storage is less important. Further, the marginal granularity of each additional database space allocation is tiny compared to the total storage available. In this case, thin provisioning actually works very well and provides some important benefits. In large organizations where the lead time to get additional space allocated is very long, it will save the DBA lots of time and headaches. Further, where there are a lot of political issues and the storage and database groups do not like or trust each other, it will significantly smooth the way for the DBA in getting storage for his or her database. Another benefit is free space pooling. The apps each grow, but at different, somewhat unpredictable rates. Giving each of them a big thinly provisioned LUN allows them to grow somewhat unfettered, and as they grow physically, they share the same free space. This improves utilization significantly, and it is the real thrust of thin provisioning.
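A rough sketch of the free space pooling argument, using made-up numbers: 100 apps, each promised 500 GB, each actually using 50 to 149 GB, with an assumed 2 TB of shared headroom in the pool. None of these figures come from a real array.

```python
# Hypothetical fleet: 100 apps, each given a 500 GB thinly provisioned LUN,
# and each actually using somewhere between 50 and 149 GB.
advertised_gb = [500] * 100
used_gb = [50 + i for i in range(100)]
shared_headroom_gb = 2000  # assumed free space kept in the common physical pool

thick_physical_gb = sum(advertised_gb)                # every promise backed by disk
thin_physical_gb = sum(used_gb) + shared_headroom_gb  # real data plus one shared buffer

thick_util = sum(used_gb) / thick_physical_gb
thin_util = sum(used_gb) / thin_physical_gb

# Free space pooling: any single app can grow into the whole shared buffer, whereas
# carving the same buffer into fixed per-app slices gives each app only a sliver.
largest_burst_pooled_gb = shared_headroom_gb
largest_burst_per_app_gb = shared_headroom_gb / len(used_gb)

print(f"thick utilization: {thick_util:.0%}")   # 20%
print(f"thin utilization:  {thin_util:.0%}")    # 83%
print(largest_burst_pooled_gb, largest_burst_per_app_gb)  # 2000 20.0
```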

Thus, the critical issue is the size of the average additional space allocation from the DBA compared with the size of the array. If this is a small percentage, then thin provisioning works well, and provides important benefits. This tends to be true with larger customers who have larger arrays. For smaller customers, thin provisioning may be of less benefit.
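The ratio that paragraph describes is simple arithmetic. This sketch assumes a Commercial array of "a few TB" taken here as 3 TB, and a hypothetical 500 TB Enterprise array, both receiving the same 100 GB datafile request; `granularity` is an illustrative name.

```python
def granularity(increment_gb, array_gb):
    """Size of one DBA allocation as a fraction of the array."""
    return increment_gb / array_gb

# Assumed array sizes; the 100 GB request is the one from the Commercial example.
commercial = granularity(100, 3_000)
enterprise = granularity(100, 500_000)

print(f"Commercial: {commercial:.2%} of the array per request")  # 3.33%
print(f"Enterprise: {enterprise:.2%} of the array per request")  # 0.02%
```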

August 26, 2007

Why thin provisioning does not work for Oracle data

In this post, I will discuss the concept of thin provisioning and database storage. There has been a lot of confusion on this issue recently, and I think that we need to clear the air and get some things straight.

First I would like to explain why thin provisioning is pretty much useless for Oracle datafiles right now. Then I will turn to how thin provisioning could be integrated into Oracle at which point it would be very interesting indeed.

OK, to start, I should define what I mean by thin provisioning. Assume you are a storage administrator, and you have a bunch of customers who need to store tons and tons of data. Many of you folks reading this post are probably in that category. What consumers of data on networked storage devices tend to do is the following:

  1. Figure out how much space they need to store their data.
  2. Double that because they know the data will grow over time.
  3. Double it again for ducks.

The effect is that storage consumers tend to be very liberal in their demands for storage. This becomes a problem for the storage administrator because costs balloon and utilization on the array is very low. In other words, a formula for aggressively and stupidly wasting money. The following graphic illustrates this:

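[Figure: storage consumers’ generous allocations versus the data actually stored, leaving array utilization low]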
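The three-step habit listed above compounds quickly. A quick sketch of the arithmetic, with a hypothetical real requirement of 250 GB:

```python
def requested_gb(needed_gb):
    """Capacity a consumer asks for under the three-step habit."""
    with_growth = needed_gb * 2  # step 2: double it for growth
    return with_growth * 2       # step 3: double it again for ducks

need = 250                       # hypothetical real requirement
asked = requested_gb(need)
print(asked)                     # 1000
print(f"{need / asked:.0%}")     # 25% utilization if the request is provisioned in full
```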

What you would like to do is the following:

  1. Give the customer a block of storage which is the size they are demanding.
  2. Under the covers, allocate a more realistic amount of storage, closer to what you think they really need right away.
  3. Monitor the storage, and add physical space to the device when you need it, in order to meet the demands for what your customer really needs.

The following graphic shows how this would work:

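[Figure: a thinly provisioned device, where physically allocated space plus over-provisioned space make up the total space reported to the customer]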

Note that the light grey, plus the yellow, represents the total space on the device. The over-provisioned space is space which the array tells the customer he or she has, but which does not exist physically. It is then the storage administrator’s responsibility to make sure that physical space (i.e. disks) is added to the array in a timely manner in order to meet the customer’s expectations for space.

If this sounds like lying, you’re right. It certainly is. It’s a shell game. A very useful and cost effective shell game for many folks, though, given storage consumers’ propensity to ask for a lot of space they don’t need.
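Step 3 of the list above, from the storage administrator's side, amounts to a monitoring loop. This is a minimal sketch under a hypothetical policy: the 80% threshold and 100 GB increment are assumptions, `top_up` is an illustrative name, and real arrays provide their own alerting and expansion mechanisms.

```python
def top_up(used_gb, physical_gb, logical_gb, threshold=0.8, increment_gb=100):
    """One monitoring pass under a hypothetical policy: while physical utilization is
    at or above `threshold`, add `increment_gb` of disk, never exceeding the size
    already promised to the customer. Returns the new physical size."""
    while used_gb / physical_gb >= threshold and physical_gb < logical_gb:
        physical_gb = min(logical_gb, physical_gb + increment_gb)
    return physical_gb

print(top_up(used_gb=130, physical_gb=150, logical_gb=500))  # 250: 87% full, so add disk
print(top_up(used_gb=50, physical_gb=150, logical_gb=500))   # 150: plenty of headroom
```

The whole scheme depends on this loop running often enough that disk arrives before the customer’s writes do.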

One assumption that is required for this to work at all, however, is the ability of the array to lie to the host operating system about the size of a file system or LUN which is served up to the host. For a file system shared over NFS or CIFS from a NAS box, this works well, as long as the data being stored is unstructured. Any given file is of a certain size, and that file’s space must be available for it to be stored. But the file system as a whole can certainly lie to the host and say its capacity is a terabyte when it is really only 100 GB. No problem there, as long as the physical space is adequate to store the data in the files actually present in the file system.

The issue comes when you try to store structured data like Oracle. A DBA on an Oracle database will request the size of data he or she thinks the database will need, with the same propensity for over provisioning as any other consumer of data. The difference, though, is that the DBA will actually create datafiles which fill that space. And Oracle then makes a physical file on the device of that size, creates extents in this file, and writes zeros to it.
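Why zero-filling defeats thin provisioning can be shown with a toy array that backs a block the first time it is written. This is a sketch under stated assumptions: each block stands for 1 GB, `ThinArray` and `create_datafile` are hypothetical names, and `zero_fill=False` stands in for the storage-aware integration discussed at the end of this post, which is hypothetical rather than an existing Oracle or array feature.

```python
class ThinArray:
    """Toy thin array: a block gets physical backing the first time it is written."""

    def __init__(self):
        self.backed = set()  # block numbers that now consume physical space

    def write(self, block, data):
        # The array cannot tell formatting from real data: zeros consume space too.
        self.backed.add(block)


def create_datafile(array, first_block, nblocks, zero_fill=True):
    """Create a datafile spanning `nblocks` blocks, optionally writing zeros to every one."""
    if zero_fill:
        for block in range(first_block, first_block + nblocks):
            array.write(block, b"\x00")


# Treat each block as 1 GB: a 100 GB datafile holding 10 GB of real data.
zeroed = ThinArray()
create_datafile(zeroed, 0, 100)                 # the behavior described above
print(len(zeroed.backed))                       # 100: fully backed before a single row is loaded

lazy = ThinArray()
create_datafile(lazy, 0, 100, zero_fill=False)  # hypothetical storage-aware creation
for block in range(10):
    lazy.write(block, b"row data")
print(len(lazy.backed))                         # 10: only written blocks consume disk
```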

Oracle has a concept of auto-extension of datafiles. This seems like it would align well with thin provisioning, and it would, if DBAs used it. The problem is that extending a file is a very expensive operation, again because Oracle locks down the file and zeroes out the space being added. DBAs hate that: a huge performance hit, kicking the database in the teeth at some unpredictable time, simply because a datafile ran out of space. Not good. Not good at all.

DBAs avoid auto extending files like the plague for this reason. They will allocate the space they need, always, when they create the database. And future expansions in space will be made intelligently, carefully, and methodically. That’s the way DBAs think. Believe me, I know. I am one of them.

This makes thin provisioning completely useless nonsense for Oracle data. Anyone who tells you otherwise should be viewed with deep suspicion. I say this without any bias whatsoever, since my employer sells arrays that provide thin provisioning too. I am simply telling you the way it is here.
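To put rough numbers on the auto-extension trade-off: Oracle’s `AUTOEXTEND ON NEXT` clause sets the increment by which a datafile grows. This sketch, with hypothetical sizes and an illustrative function name, only counts how many extension events a growing database would trigger; it does not model their cost.

```python
import math


def autoextend_events(initial_gb, next_gb, data_gb):
    """Number of times an auto-extending datafile must grow to hold `data_gb`.
    Each event is one of the unpredictable extend-and-zero pauses described above."""
    if data_gb <= initial_gb:
        return 0
    return math.ceil((data_gb - initial_gb) / next_gb)

# Hypothetical: a 10 GB datafile extending 1 GB at a time as data grows to 100 GB.
print(autoextend_events(10, 1, 100))   # 90 extensions, each at an unpredictable moment
# The DBA's preferred approach: size the file at 100 GB up front.
print(autoextend_events(100, 1, 100))  # 0, but all 100 GB is written (and backed) at creation
```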

Now, how could thin provisioning be made to work with Oracle? That’s a very interesting question. It would require integration between the storage array and the Oracle kernel. Then Oracle could avoid zeroing out a file, and simply allow the array to provide the storage Oracle needs to store the blocks presently in the file. This would probably occur within ASM. (ASM stands for Automatic Storage Management, Oracle’s storage layer.) There is some discussion of that type of integration between storage and Oracle, but do not expect to see it anytime soon if ever.

August 16, 2007

Oracle Backup: Which Snapshot is best? (Part 4)

In my past few posts, I have explored the risks and benefits of snapshot technologies from both NetApp and EMC. This series has covered:

  • Part 1: The nature of snapshots and their benefits to the Oracle user
  • Part 2: Snapshot performance overhead
  • Part 3: Writable snapshots

In this post, the last of this series, I will discuss the manner in which a snapshot can consume so much space that it will cause writes to the active file system to fail, as well as the mechanisms which NetApp and EMC have created to avoid this fate.

Yes, it is true. You can get an ENOSPACE error (the ENOSPC errno, “No space left on device”) when you are using a metadata approach for creating snapshots, which is the way WAFL manages snapshots on a NetApp filer. Recall this diagram, which I included a couple of posts ago:

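[Figure: a WAFL snapshot holding “before” images of updated blocks, which consume free space in the active file system]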

Note that the additional blocks required by the snapshot are invading the free space in the active file system. It is actually the light-colored blocks (the “before” images of the blocks) which are held by the snapshot. At NetApp, we used to have debates over whether the snapshot occupied the space, or whether it was the active file system that did so. Whatever. The effect is exactly the same. The storage space cost of a snapshot is equal to the number of blocks which have been updated since the creation of the snapshot. Thus, you can think of the storage space overhead of snapshots in this way:

[Figure: a file system about 70% full, with about 10% snapshot overhead and the remaining free space]

From this diagram, you can see that we are running a file system that is about 70% full, with another 10% of snapshot overhead. That leaves about another 15% of free space before the file system runs out.
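The rule stated above, that a snapshot costs one block per block updated since it was created, can be written down directly. A minimal sketch, assuming the write stream is given as a list of overwritten block numbers; `snapshot_overhead` is an illustrative name.

```python
def snapshot_overhead(updates_since_snapshot):
    """Blocks held by one snapshot: each distinct block overwritten since the snapshot
    was taken keeps its "before" image pinned, however often it is rewritten."""
    return len(set(updates_since_snapshot))

# Hypothetical write stream: block 7 is rewritten three times but pins only one old copy.
print(snapshot_overhead([3, 7, 7, 12, 7]))  # 3
```

This is also why the overhead only ever grows while the snapshot is kept: new distinct blocks keep getting added to the set.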

Absent space reservations, you could do this:

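[Figure: snapshot overhead has consumed all remaining free space, while the active file system data is unchanged]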

All available space has now been fully occupied by snapshot storage overhead, even though there has been no increase in the amount of data in the active file system. This is because we kept this snapshot around too long: enough blocks were updated after the snapshot was created to exhaust all empty space. The next write to this file system will get an ENOSPACE error. This includes updates to files already in the active file system, which would require no additional space if there were no snapshot.

Hence the common NetApp heuristic: “Old snapshots are dear; new snapshots are cheap.”

This was a depressingly common issue at NetApp while I was there, particularly with storage administrators who migrated to NetApp NAS from a more traditional SAN storage environment (typically EMC). Those folks would behave like good storage professionals: They would utilize all available space. They regarded free space as wasted space. Further, these folks tended to think that if they had created an Oracle datafile of 100 GB in size, then that file was locked down and in place. They regarded a storage device returning an ENOSPACE error as a result of an update to that file as naughty, irrational, and strange.

For these well-behaved storage professionals, the good habits they had developed in the SAN context were a formula for disaster when dealing with NetApp snapshots in a NAS context. By running with little or no free space, they allowed no headroom for the snapshot overhead. Thus, ENOSPACE errors were common.
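The mechanism behind these ENOSPACE errors can be sketched with a toy redirect-on-write file system holding one snapshot. This is a simplified model in the spirit of the WAFL description above, not WAFL itself: `SnapshotFS` is a hypothetical name, block counts stand in for percentages, and the error raised is the standard `ENOSPC`.

```python
import errno
import os


class SnapshotFS:
    """Toy redirect-on-write file system with a single snapshot (hypothetical model)."""

    def __init__(self, total_blocks, live_blocks):
        self.free = total_blocks - live_blocks
        self.live = set(range(live_blocks))  # blocks of the active file system
        self.pinned = set()                  # live blocks whose current copy a snapshot holds

    def snapshot(self):
        self.pinned = set(self.live)

    def overwrite(self, block):
        """Update an existing block, which the application sees as an in-place write."""
        if block in self.pinned:
            # The snapshot keeps the old copy, so the new copy needs a fresh free block.
            if self.free == 0:
                raise OSError(errno.ENOSPC, os.strerror(errno.ENOSPC))
            self.free -= 1
            self.pinned.discard(block)
        # Otherwise the old copy is released as the new one is written: no net change.


fs = SnapshotFS(total_blocks=10, live_blocks=7)  # 70% full, 3 blocks free
fs.snapshot()
for block in (0, 1, 2):
    fs.overwrite(block)               # each update pins one more "before" image
try:
    fs.overwrite(3)                   # updating existing data; the active file system has not grown
except OSError as exc:
    print(exc.errno == errno.ENOSPC)  # True
```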

I used to refer to snapshots as having a “dark side”. This is the dark side I was talking about. The space allocated to a datafile is no longer guaranteed. When you make a snapshot, you can run out of space on that file anyway, although it is already allocated in the file system.

This led NetApp to introduce the notion of space reservations. The architect of this concept was Bruce Gordon, the SAN marketing guy hired by Rich Clifton during the 2000 to 2001 period. I will readily admit that I fiercely resisted this concept. What space reservations do is simple: if there are not enough free blocks in the file system to duplicate all of the existing data completely, snapshot creation fails. An illustration will help. With space reservations, if you had this:

[Figure: a file system with too little free space to duplicate its existing data]

You could not create a snapshot at all, because you do not have enough free space to duplicate the existing data. You must either free some space or add capacity. Once you add capacity, you can create a snapshot:

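[Figure: after capacity is added, enough free space is reserved to duplicate all existing data, and a snapshot is created]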

Snapshot overhead then begins to invade the reserved space. As you begin to accumulate updated blocks, the snapshot overhead looks like this:

[Figure: snapshot overhead growing into the reserved space as updated blocks accumulate]

Since you have reserved enough space to duplicate all of the data that existed at the time of the creation of the snapshot, theoretically an ENOSPACE error is impossible.
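The space-reservation rule reduces to a single comparison. A minimal sketch, with block counts standing in for capacity and `can_snapshot` as an illustrative name; it also computes the price Bruce’s rule imposes on utilization.

```python
def can_snapshot(total_blocks, live_blocks):
    """Space-reservation rule as described above: allow a snapshot only if the free
    blocks could hold a complete second copy of the live data."""
    return total_blocks - live_blocks >= live_blocks

print(can_snapshot(10, 7))  # False: 3 free blocks cannot duplicate 7 live blocks
print(can_snapshot(14, 7))  # True once capacity has been added

# The price: with a snapshot in place, live data can fill at most half the file system.
max_live_fraction = max(live for live in range(101) if can_snapshot(100, live)) / 100
print(max_live_fraction)    # 0.5
```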

I said previously that I resisted this concept. I used to tell Bruce Gordon that as far as I was concerned, he was an EMC plant. Why? Because space reservations destroy the one primary benefit of snapshots: Space efficiency.

Go all the way back to my first post in this series. I stated that the gold standard for Storage Layer Instantaneous Copy (SLIC) technologies is BCVs. BCVs have lots and lots of advantages. They have absolutely no performance penalty. They work beautifully. They have only one downside: They require another set of disks. Before space reservations, snapshots did not. By providing the same basic functionality as BCVs (instantaneous copy) without the storage overhead of another set of disks, snapshots became the best way to do the job of Oracle database instantaneous hot backup.

With space reservations, the cost of snapshots became effectively the same as BCVs. In that case, BCVs win. They do not have the performance issues that metadata-based snapshots do. (This performance trade-off is discussed in detail in Part 2 of this series.) Removing the cost advantage of snapshots over BCVs was a major erosion in NetApp’s core value proposition.

But, as Bruce Gordon said, “No customer will ever have an ENOSPACE error on my watch.” Bruce attempted to establish a principle that space would always be reserved such that a snapshot could never exhaust the active file system free space.

Unfortunately, FlexClones, covered in detail in my previous post, violate this principle. That is because FlexClones create another write thread. Remember that each write thread has the potential to double the space requirements, by overwriting every block in the snapshot. That was illustrated by the following diagram from my previous post:

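[Figure: a WAFL file system with a FlexClone, showing the snapshot’s “before” images plus two sets of “after” images]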

Note how FlexClone increases the space requirements by adding another set of “after” image blocks to the mix. Simply reserving space for one set of additional blocks is now insufficient. You would now need to reserve space for two. Thus FlexClones make the following scenario possible:

[Figure: FlexClone write activity exhausts both the reserved space and the free space]

You are now out of space again. The next write will get an ENOSPACE error.
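The worst case for a shared snapshot grows with the number of write threads. A minimal sketch, assuming each writer (the production file system plus one per FlexClone) can replace every snapshotted block once; the function name and block counts are illustrative.

```python
def worst_case_extra_blocks(live_blocks, write_threads):
    """Worst-case space needed beyond the live data when a snapshot is shared by
    `write_threads` writers: each writer can replace every snapshotted block once."""
    return live_blocks * write_threads

live = 7
reservation = live  # space reservation sized to duplicate the live data once
print(worst_case_extra_blocks(live, 1) <= reservation)  # True: production writes only
print(worst_case_extra_blocks(live, 2) <= reservation)  # False: production plus one FlexClone
```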

EMC snapshots make all of this impossible. By using a reserved LUN pool (RLP) approach, EMC simply allocates the space required for the snapshot. The snapshot space is not shared with the active file system space. Thus, it is impossible for the active file system to receive ENOSPACE from a snapshot. The following graphic illustrates this:

[Figure: EMC snapshot space held in a separate reserved LUN pool (RLP), apart from the active file system]

The snapshot space is contained within the RLP. It is not shared with the active file system. Running out of space within the RLP will cause the snapshot to become invalidated. But it will not affect the active file system at all. An ENOSPACE error can never be returned to the active file system with this design, unless the user exhausts the space in the active file system itself. Further, you decide how much space you want to allocate to the snapshot. Unlike WAFL-based snapshots, you are not writing a blank check for snapshot overhead, up to the full amount of data in the active file system. Rather, you can decide that the snapshot will only be allowed to take up 10% of that space if you want to. This adds discipline to the whole proposition of snapshot space overhead.
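The RLP behavior described above can be sketched as a copy-on-first-write snapshot with a fixed budget. This is a minimal model under that assumption: `ReservedLunPoolSnapshot` is a hypothetical name, block counts stand in for capacity, and the internals of SnapView and TimeFinder/Snap are not shown.

```python
class ReservedLunPoolSnapshot:
    """Toy copy-on-first-write snapshot whose saved blocks live in a separate,
    fixed-size reserved LUN pool (RLP)."""

    def __init__(self, rlp_blocks):
        self.rlp_capacity = rlp_blocks
        self.saved = set()  # production blocks whose original copy is in the RLP
        self.valid = True

    def on_production_write(self, block):
        """Called before production overwrites `block`; never fails the production write."""
        if not self.valid or block in self.saved:
            return
        if len(self.saved) == self.rlp_capacity:
            self.valid = False  # RLP exhausted: snapshot invalidated, production unaffected
            return
        self.saved.add(block)


snap = ReservedLunPoolSnapshot(rlp_blocks=10)  # e.g. 10% of a 100-block source LUN
for block in range(15):
    snap.on_production_write(block)            # the production write proceeds regardless
print(snap.valid)                              # False
```

The failure lands on the snapshot, which is the trade the RLP design makes in exchange for never returning ENOSPACE to the active file system.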

Once again, it is for you as the customer to judge the relative merits of these approaches. In my series on snapshots, I have attempted to bring clarity to the debate between EMC and NetApp on the benefits and risks of snapshots for Oracle database backup. Based upon the number of comments this series has received, I think you are hearing me.

Future posts on this blog will cover how EMC NAS compares to NetApp NAS for Oracle database storage.

August 03, 2007

Oracle Backup: Which Snapshot is best? (Part 3)

Kool-Aid. It’s great stuff. When you believe it hard enough, it enables you to do amazing things. Doubt is not a problem. You simply state the company position as truth, uncritically. This gives you tremendous power in your customer presentations. Like I said, if you believe it hard enough.

Funny thing though. After a while Kool-Aid begins to lose its punch. You have to redefine the message. Reformulate the Kool-Aid you will. Otherwise, nagging doubt begins to erode your confidence. You get that haunted look as you deliver the message. Like what you are saying may not be exactly true. This detracts from the appeal of the Kool-Aid, especially as the customer begins to perceive that you may not be shooting straight.

Don’t get me wrong. All companies have Kool-Aid. I would readily admit that EMC has Kool-Aid too. When I first showed up at EMC, I had the following conversation multiple times with many different individuals:

Me: “What do you think about NAS storage for Oracle databases?”

EMC person: “NAS is great for storing small, non-mission critical Oracle databases with low I/O requirements.”

That’s Kool-Aid. Much of what I have been doing at EMC since I showed up here has been combating this particular brand of Kool-Aid. Because my entire professional life for the past 10 years has largely been about Oracle on NAS storage. Because I can run out of fingers and toes on my entire body counting Fortune 1000 companies which I have visited personally that have multiple PBs of NAS-mounted Oracle databases which they use in mission critical settings, and hit with heavy I/O loads. Many of these databases are multiple TBs in size as well. Including, most importantly, Oracle Corporation.

So, yes, NAS is perfectly appropriate for storing Oracle databases of all kinds and sizes with all sorts of I/O requirements. Including databases which the customer absolutely has to have running always. Which Oracle Corporation itself has proven in spades.

All of which is to say that I war against Kool-Aid. In all forms, and at all times. Whether from my existing employer or not. Despite its insidious power. In fact, because of it. The truth must be embraced, and Kool-Aid must be rejected, always.

Getting back to NetApp, the snapshot-for-Oracle-hot-backup form of Kool-Aid began to lose effectiveness in the last couple of years of my tenure there (basically 2004 and 2005), and there was a lot of soul searching for another message. The message which emerged centered on the notion of writable snapshots in the form of FlexClones which were introduced in ONTAP 7. FlexClones work like this:

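[Figure: a WAFL file system before and after a set of updates, with FlexClone writes adding a second set of “after” images; files color-coded pink, buff, and green]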

As in my last post, this graphic represents the state of a WAFL file system before and after a set of updates. The blocks belong to three files, color-coded pink, buff, and green. Compared to my last post, another set of “after” images of the blocks has been created. These are the blocks created by FlexClone write activity. This has one major side effect: the space held by the snapshot (referred to as snapshot overhead) increases as write activity within the FlexClone file system occurs. Previously, the blocks held by the snapshot (the light-colored blocks) were only the before images of blocks which were overwritten by the production file system. Now another write thread has been created, which will cause more before images of blocks to be held by the snapshot. This will increase the snapshot storage overhead. I will discuss snapshot storage overhead in detail in a future post.

FlexClones were introduced with great fanfare in 2005 and the claim was widely made by many folks at NetApp (myself included), that they were “completely unique to the industry”. Without question, that’s the purest form of Kool-Aid.

Recall the conversation I told you about in my last post with a major telecom customer in the UK. Here is the rest of that conversation:

Me: “NetApp is the only storage company to have writable snapshots! FlexClones are completely unique to the industry!”

Customer: “What are you talking about? CLARiiON SnapView snapshots are writable and always have been. So are Symmetrix TimeFinder/Snap snapshots. EMC has had this functionality for years.”

Again: Oops!

I must admit that I feel fairly stupid about this now. In my defense, there was a widespread view within NetApp that undue familiarity with EMC as a competitor was unhealthy. I lived in a state of shocking ignorance concerning the capabilities of EMC storage arrays and software. This ignorance was completely fine as far as my employer was concerned.

The functionality of EMC writable snapshots is somewhat similar to NetApp’s FlexClones. The following graphic points out how they work:

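[Figure: an EMC writable snapshot, with after images of blocks modified in the snapshot written to the RLP LUNs]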

In EMC’s case, the after images of blocks modified in the snapshot are simply written to the RLP LUNs. This eliminates the additional snapshot overhead, at the cost of not maintaining the original version of that data. Again, a weighing exercise. With EMC snapshots, if you want to keep the original version, you must not overwrite it.
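The trade-off between the two designs can be sketched with two tiny classes. This is a simplified model: `FlexCloneModel` and `RlpWritableSnapshot` are hypothetical names, each `write` returns the extra blocks drawn from the active file system’s free space, and RLP capacity limits are ignored here.

```python
class FlexCloneModel:
    """Writes to the clone go to new blocks; the snapshot's originals stay pinned."""

    def __init__(self, snapshot):
        self.snapshot = dict(snapshot)  # point-in-time originals, never modified
        self.clone = {}                 # the clone's own "after" images

    def write(self, block, data):
        first_write = block not in self.clone
        self.clone[block] = data
        return 1 if first_write else 0  # a later rewrite frees the clone's previous copy


class RlpWritableSnapshot:
    """Writes to the snapshot land in its RLP copy, replacing the saved original."""

    def __init__(self, snapshot):
        self.rlp = dict(snapshot)       # saved copies in the reserved LUN pool

    def write(self, block, data):
        self.rlp[block] = data          # the original version of this block is gone
        return 0                        # no space taken from the active file system


point_in_time = {1: "A", 2: "B"}
fc = FlexCloneModel(point_in_time)
emc = RlpWritableSnapshot(point_in_time)
print(fc.write(1, "A'"), fc.snapshot[1])  # 1 A
print(emc.write(1, "A'"), emc.rlp[1])     # 0 A'
```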

Kool-Aid. Watch out for it. It will suck you in. In future posts, I will continue to point out the areas in which the storage-for-database messaging has been less than totally honest. Stay tuned.

View Jeff Browning's profile on LinkedIn

disclaimer: The opinions expressed here are my personal opinions. I am a blogger who works at EMC, not an EMC blogger. This is my blog, and not EMC's. Content published here is not read or approved in advance by EMC and does not necessarily reflect the views and opinions of EMC.