May 29, 2009

Paul Maritz Keynote at EMCWorld Mentions Oracle Performance on VMware

As you all know, EMCWorld was this week, and I attended. One of the most interesting aspects of the event was the Paul Maritz keynote. As most of you undoubtedly know, Paul Maritz is the new President and CEO of VMware.

During the keynote, Mr. Maritz said the "o" word, i.e. Oracle. In fact, Oracle was a very prominent feature of his discussion. He presented the results of an OLTP performance benchmark that VMware has done recently. The basic results are as follows:

Metric                                                   Native   VM
Throughput in business transactions per minute (1,000s)     293  250
Disk IOPS (1,000s)                                           71   60
Disk bandwidth (MB/s)                                       305  258
Network packet rate receive (1,000s)                         12   10
Network packet rate send (1,000s)                            19   17
Network bandwidth receive (MB/s)                             25   21
Network bandwidth send (MB/s)                                66   56

Overall, this indicates that VMware virtualization carries a performance penalty of roughly 15% relative to physically booted (250 vs. 293 thousand business transactions per minute), which I consider to be very good. Certainly, given the other benefits of virtualization with VMware, this would be acceptable for many customers for many of their Oracle database environments.
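For readers who want to check the arithmetic, here is a minimal sketch that computes the virtualization penalty for each row of the table above. The figures are copied from the table; the function and variable names are mine and purely illustrative. Keep in mind that the table values are rounded, so the rows with small numbers (the packet rates, for example) are only approximate.

```python
# Figures copied from the table above: (native, virtualized).
RESULTS = {
    "Throughput (1,000s of business transactions/min)": (293, 250),
    "Disk IOPS (1,000s)": (71, 60),
    "Disk bandwidth (MB/s)": (305, 258),
    "Network packet rate receive (1,000s)": (12, 10),
    "Network packet rate send (1,000s)": (19, 17),
    "Network bandwidth receive (MB/s)": (25, 21),
    "Network bandwidth send (MB/s)": (66, 56),
}


def penalty_pct(native, vm):
    """Percentage by which the VM figure falls short of the native figure."""
    return (native - vm) / native * 100.0


if __name__ == "__main__":
    # Print one penalty figure per metric, to one decimal place.
    for metric, (native, vm) in RESULTS.items():
        print(f"{metric}: {penalty_pct(native, vm):.1f}%")
```

The throughput row is the one that matters most for an OLTP workload; the I/O and network rows are lower simply because less work is being pushed through the system.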

Now, to deal with the obvious question which I expect to be raised: Why did VMware not see the performance advantage of virtualization that I did in my previous post on comparing physically booted performance to virtualized performance?

The answer will become apparent when you review the VMware performance study, linked above. My performance comparison was between the following:

  1. Oracle RAC 10g (i.e. clustered)
  2. VMware ESX 3.5 with HA Cluster (i.e. clustered)

Their study is between:

  1. Oracle Database 11g (i.e. not clustered)
  2. VMware ESX 4.0 (also not clustered)

The most important consideration is the issue of clustering. Their study shows a single 8-way Intel white box being used as a server for both physically booted and virtualized. (Similar to our study, they used identical hardware for both configurations.) Our study shows a cluster of 4 8-way Dell PE2900 servers being used for both configurations.

Closely reading the VMware study, they used a single database server with identical memory and CPU settings for both physically booted and virtualized database servers. Thus, in the VMware study, everything was done within a single instance database server with a single database image.

Another issue is that we federated the databases on the virtualized side. Thus we had 4 clustered instances and 1 database on the physically booted side, and 8 independent database servers (consisting of a non-clustered instance and a database image) on the virtualized side. As I said in my previous post (linked above), creating a larger number of database images had a performance advantage, especially compared to RAC, where cache fusion and lock management add significant overhead.

I have received many comments that this makes the results from the EMC study non-comparable. I somewhat agree, and I will deal with that issue at length in a future post. The VMware performance result is certainly more apples-to-apples than mine.

However, in my defense, I will say (as I will explain at length in my upcoming post) that the configuration I used is identical to that used by Microsoft SQL Server in their clustered product. No one seriously contends that Microsoft SQL Server with MSCS is not competitive with Oracle RAC. Certainly, Oracle considers it to be competitive, as the abundant technical literature on their own website demonstrates.

Thus, the study performed by the Oracle CSV group at EMC is a natural and obvious step: We need to understand the performance differences (as well as cost and manageability differences) between Oracle RAC and Oracle Database with VMware HA Cluster. And when we do that study, we need to configure the databases in the most natural and logical way, taking full advantage of the features of each environment. This necessarily leads you down the path of a federated environment on the virtualized side, and a single image on the RAC / physically booted side.

The VMware study is undoubtedly very valuable and important: It demonstrates that VMware can provide world-class performance for Oracle production OLTP database environments. It shows that the performance penalty of VMware in this context is modest and manageable.

More on this later.

May 14, 2009

Oracle Wants To Win The Virtualization Wars – But Will Customers Lose?

In my previous post on Oracle's support of VMware virtualization, I pointed out that Oracle is behaving like a monopolist: Attempting to establish control over the entire IT stack, including application, OS, storage and virtualization. Oracle's recent moves in the area of virtualization further demonstrate this trend, potentially benefiting Oracle to the detriment of their customers and the industry.

Oracle recently revised one of its metalink support statements regarding virtualization of Oracle software products. This is the statement concerning support for virtualization with Oracle E-Business Suite. The revised version can be found here. (Note: This URL requires a metalink account.)

The changes in this statement are very interesting, in that they speak volumes concerning Oracle's intention regarding the use of VMware virtualization with their products, especially Oracle Database. Prior to May 8, the title of this support statement was:

Platform Vendor Virtualization Technologies and Oracle E-Business Suite

With the revision on May 8, the title was changed to:

Hardware Vendor Virtualization Technologies on non x86/x86-64 Architectures and Oracle E-Business Suite

Note the use of "Hardware Vendor" rather than "Platform Vendor" and the addition of "on non x86/x86-64 Architectures". Further, prior to May 8, the text of the support statement included the following:

The use of platform (emphasis mine) vendors' virtualization technologies (both software and hardware based) (emphasis mine) to host Oracle E-Business Suite 11i and R12 is covered by Oracle's policy with regards to 3rd-party products - that is, they are 'not explicitly certified, but supported' (emphasis theirs).

On May 8, this was revised to read:

The use of hardware (emphasis mine) vendors' virtualization technologies to host Oracle E-Business Suite 11i and R12 follows the same policy as Oracle's policy with regards to customizations - that is, they are 'not explicitly certified, but supported' (emphasis theirs).

Note the replacement of the term "platform vendor" with "hardware vendor", and the deletion of the phrase "both software and hardware based". Clearly, Oracle intends to carefully limit the support statement to only hardware vendors' virtualization on non-x86/x86-64 platforms at this point. Or to state it in the reverse, to exclude software vendors' virtualization products on the x86/x86-64 platform. And who is the dominant market leader for software virtualization on that platform? VMware, of course.

Further, the earlier version of the support statement included VMware explicitly, and this was deleted from the revised version.

I am speculating, but my guess is that this move on Oracle's part was prompted by the post to Chris Wolf's blog (which he issued on May 6), stating:

The bottom line – Oracle now offers best effort support for all of its E-Business Suite applications on any x86 hypervisor.

This post was based upon the earlier version of the support statement, obviously, which granted best efforts support to all virtualization products, both software and hardware, on all platforms, and included VMware explicitly. Others posted as well, basically claiming victory on VMware virtualization of Oracle software products. The timing is extremely suspicious, and it seems pretty clear that Oracle closed the loophole which might be perceived to grant official support to virtualization of Oracle Database using VMware.

The other move is, of course, Oracle's acquisition of Virtual Iron, a startup which has been nipping at VMware's heels for some time. The intent of that acquisition is obvious: To shore up the Oracle VM offering which has big gaps when compared to VMware.

Taken together, these moves present a fairly stark and obvious message: It appears that Oracle is carefully and intentionally excluding VMware from its support statement. It also seems very obvious that this is without any technical justification whatsoever, given that Oracle has granted explicit support for virtualization products from other vendors which are completely homologous and comparable to VMware. The apparent reason is to benefit OVM, Oracle's competitive offering against VMware, by forcing Oracle Database customers to run that virtualization product instead.

If VMware were a niche player in the virtualization market, that would be one thing. Excluding such a player might be seen as mean spirited, but the impact would be minimal. In the case of VMware, though, Oracle is excluding the vast, vast market leader in the virtualization market.

This is exactly like Oracle attempting to exclude Microsoft Windows from its list of supported OS platforms and promoting Linux as the exclusive OS platform for running the Oracle Database software product. Permitting Linux as an alternative to Windows as your OS platform for running the Oracle Database is fine. Even promoting Linux as the preferred alternative is eminently justifiable. But attempting to exclude an OS platform with the market dominance of Windows would be met with howls of protest by numerous customers. I submit that this is exactly what Oracle is attempting to do by excluding VMware from its best efforts support policy.

I have several questions for Oracle:
  1. Do you have any conceivable technical justification for excluding VMware from best-efforts support, given your support for less mature, less popular software virtualization products sold by other vendors, yourself included?
  2. How are your customers served by forcing them onto a virtualization platform with less maturity and less market penetration than VMware?
  3. As Chuck Hollis pointed out in his blog, competing on your own merits to own the datacenter is one thing. Using your dominance in one area (databases) to force your customers to accept dominance in another area (virtualization) smacks of anti-competitiveness. Can't you tolerate a free and fair competitive environment? What are you afraid of?
  4. Given that there is abundant evidence from extensive testing (including at EMC) that VMware virtualization of Oracle databases is both viable and compelling, what is your real issue here?
  5. If you have no reasonable response to the above questions, then when will you extend to VMware the official best-efforts support which you have already extended to other equivalent virtualization products?

May 07, 2009

Response to comments from Alessandro Perilli blog

I have noticed that a large number of comments have been submitted to the recent post on Alessandro Perilli’s blog which links to my recent post on Oracle’s support policy for VMware virtualization. Many, if not most, of these comments are actually comments directed at my post, not Alessandro’s post. Therefore, I will respond to them here. I am also attempting to post this as a comment on Alessandro’s blog, but that has not appeared on his site as of yet, and I wanted my response to be available.

To respond to several of the points raised by these comments:

Why would anyone want to run Oracle on a single vCPU under VMware?

Not sure how to parse this one. VMware ESX 3.5 allows 4 vCPUs per VM. Our performance testing was done with 4 vCPU VMs (2 VMs per server on an 8-core box, consisting of 2 quad-core processors). 4 servers were in the HA cluster, so a total of 32 physical CPU cores were in the configuration, all of which were allocated to vCPUs on 8 VMs. This is laid out pretty thoroughly in the reference architecture published on Powerlink and EMC.com. You can find the EMC.com version here.
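The numbers in that configuration can be checked with a few lines of arithmetic. This is just a sketch using the figures stated above; the function name and constants are mine, for illustration only.

```python
# Configuration figures as described in the text above.
SERVERS = 4
CORES_PER_SERVER = 8   # 2 quad-core processors per server
VMS_PER_SERVER = 2
VCPUS_PER_VM = 4       # the ESX 3.5 per-VM maximum cited above


def cluster_totals(servers, cores_per_server, vms_per_server, vcpus_per_vm):
    """Return (physical cores, VMs, vCPUs) for the whole HA cluster."""
    physical_cores = servers * cores_per_server
    vms = servers * vms_per_server
    vcpus = vms * vcpus_per_vm
    return physical_cores, vms, vcpus


if __name__ == "__main__":
    cores, vms, vcpus = cluster_totals(
        SERVERS, CORES_PER_SERVER, VMS_PER_SERVER, VCPUS_PER_VM
    )
    print(f"{cores} physical cores, {vms} VMs, {vcpus} vCPUs")
```

In other words, every physical core is backed by exactly one vCPU, with no oversubscription.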

In terms of why you would want to run Oracle in a virtualized environment using VMware, that is pretty well laid out in my blog. But to summarize:

  1. We found better performance in a virtualized environment using VMware HA cluster vs. Oracle RAC. I will deal with the fairness of that comparison on my next point.
  2. The cost of VMware HA cluster is less than Oracle RAC, and VMware is appropriate for many customers. Not all, but many. For those customers, RAC would also work, but is, again, vastly more expensive. I lay out the usage cases where VMware HA cluster works as an alternative to RAC in another item below.
  3. Manageability is higher with VMware HA cluster than RAC. Believe me, I should know. I run both routinely in our testing environment.

Why should I believe that performance of VMware HA cluster is higher than RAC when VMware will not allow third parties to benchmark their product?

Well, I am a third party, and the program I manage publishes performance benchmark results of both virtualized and physically booted Oracle production environments. These are available on Powerlink. The actual performance results are pretty confidential and proprietary, but I tend to open the kimono on this blog, as you have seen. We are using a TPC-C-like workload run under Quest Benchmark Factory for Databases. We do not claim that this is a published and audited TPC-C result, obviously. However, we have lots of experience running this.

I can tell you no one was more surprised by the VMware performance result than I was. We are still continuing to actively profile performance of both physically booted and virtualized Oracle environments. (In fact, we are beginning to profile OVM as well.) I had no particular axe to grind on this either way. The results were simply what they were.

RAC and VMware HA cluster are not comparable, are they?

Of course they are. Both Oracle Cluster Ready Services (the underlying technology behind Oracle RAC) and VMware HA cluster are cluster software products. They do basically the same sort of thing: Provide high availability for applications. They simply do so in a different sort of way. RAC provides one single database image across multiple physically booted servers. VMware HA cluster provides transparent failover of VMs using VMware VMotion technology. Both are intended to protect application uptime and client access by providing a high availability solution.

Granted, they have different levels of HA. I would consider RAC to be a fault tolerant technology, in that the physical loss of a node will not result in database downtime (but may result in loss of client access). VMware HA cluster is a high availability product. A brief downtime is inevitable when a VM is being rebooted on a surviving node of the cluster after node failure. In our experience, VMware HA cluster works pretty well at getting the VMs back up and running, though.

In the end, it depends on what the customer needs, the level of HA being one of the issues. I cover that more in a later section.

Isn't RAC free with SE?

This is true, but very misleading. RAC is free with SE when the total CPU core count in the entire cluster is 2. This is a trivial cluster. Beyond that, EE is required.

And that is where the cost savings come in. EE is required for basically two products in most customer configurations:

  1. RAC
  2. Data Guard

RAC also carries the RAC upcharge above the cost of EE, making it the most expensive Oracle database software product, by far.

Assuming Data Guard is not required (and storage vendors have been competing with Data Guard using products like MirrorView and RecoverPoint for many years), then it comes down to RAC.

Assuming you can provide HA in another manner, then RAC is not required, and therefore SE can be used instead of EE, at 25% of the cost, plus savings on the RAC upcharge. I think you see the point.
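To make the point concrete, here is a rough sketch of the relative license cost. The only figure taken from this post is that SE costs 25% of EE. The size of the RAC upcharge is not stated here, so it is a parameter, and the 0.5 used in the demo is a purely hypothetical placeholder, not an Oracle price. The sketch also ignores the cost of VMware licenses and of support contracts.

```python
SE_FRACTION_OF_EE = 0.25  # from the text: SE at 25% of the cost of EE


def relative_costs(rac_upcharge_fraction):
    """Compare EE + RAC against SE, with the cost of EE normalized to 1.0.

    rac_upcharge_fraction is the RAC upcharge expressed as a fraction of
    the EE cost. It is an input because the post does not state it.
    Returns (EE + RAC cost, SE cost, fractional savings from moving to SE).
    """
    ee_with_rac = 1.0 + rac_upcharge_fraction
    se = SE_FRACTION_OF_EE
    savings = 1.0 - se / ee_with_rac
    return ee_with_rac, se, savings


if __name__ == "__main__":
    # 0.5 is a hypothetical placeholder for the RAC upcharge.
    ee_rac, se, savings = relative_costs(0.5)
    print(f"EE + RAC: {ee_rac:.2f}, SE: {se:.2f}, savings: {savings:.0%}")
```

Even with an upcharge of zero, the savings on the database license alone are 75%; any real upcharge only widens the gap.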

The bottom line is that VMware HA cluster provides cost savings, assuming the customer scenario allows this product to be used. Which is what I cover next.

Not all customers can use VMware for an HA solution, and need RAC instead. Right?

Not actually in the comments, but important. Again, EMC is a strong supporter of RAC. We run it in our lab, and we will continue to do so. No one would say that RAC does not work as well as VMware HA cluster or that it is not a great product. It is. But it is expensive and complex.

What RAC provides is two things:

  1. A single database image
  2. True fault tolerance

These are both great advantages, but not all customers need this. For example, I personally visited a very large Fortune 100 company. For confidentiality reasons, I will not use the name here, but believe me, they are a household name. This customer had many single-instance database servers running on 1U and 2U unclustered machines throughout their datacenter. The cost to manage these servers was immense, in the 9 figures per year. Yeah, that's a lot of jack.

The customer wished to consolidate all of these servers into a database cloud. I postulate that the way to do that could consist either of RAC or VMware HA cluster. But consider:

  • What is the level of HA provided by the current configuration? Very low. Each physical server is presently a single point of failure. VMware HA cluster would be a big improvement.
  • Does the customer need a single database image? Not at all. These servers are already islands of data.

The simplest and quickest way for this customer to consolidate the servers in this scenario, given the choice between RAC and VMware HA cluster, is VMware HA cluster. It is also the least expensive of those two alternatives.

I suspect that many, many customers are in this situation. For many applications VMware HA cluster is a viable high availability and consolidation solution for production Oracle databases. I would certainly not put a large ERP system on VMware. Nor the billing application for a large telco. But many, many applications could be put up on VMware just fine.

This is in many ways comparable to NAS vs. SAN. I am employed by EMC, so it may surprise you to hear me say this, but remember I came to EMC from NetApp. In my experience (not a scientific sample, but still) approximately 90% of the Oracle databases running in the world could be stored on an NFS server with absolutely no change in the client experience or uptime of the database. The same is probably true with VMware. Perhaps the percentage is higher or lower. I certainly have less experience with VMware at this point than NAS. But time will tell.

BTW, it would help me if folks could post comments to content written by me on my own blog, rather than having to run around and find it elsewhere. Just a thought...

April 30, 2009

What the Oracle / VMware support statement really means...and why

One issue I am asked about frequently is the Oracle support statement on metalink concerning VMware virtualization of Oracle database servers. I would like to clarify that issue, as much as humanly possible, because the truth, as usual, is far different from the prevailing perception.

First of all, I have to ask your forgiveness. As many of you know, I am an attorney. This is a flaw in my character which I have struggled for years to overcome. As I like to say, I am a recovering lawyer. It's a day to day thing. There are days when I have frequent moments of pomposity, pedantry, arrogance, inflexibility, stubbornness, and all the other negative personality traits we associate with lawyers. Forgive me if I revert into lawyer mode for the purpose of parsing and clarifying the Oracle metalink support statement on VMware. It seems called for somehow. So bear with me.

Note: If you would like to read the Oracle metalink support statement on VMware, the easiest way to do so is here. Or you can go to metalink directly here. The metalink URL obviously requires a metalink account.

The first statement in the metalink support note (249212.1) is the following:

Oracle has not certified any of its products on VMware virtualized environments.

This sentence, while literally true, is very misleading. The reason why the statement is so misleading is that Oracle does not certify things like this. Let me explain what I mean.

VMware virtualization simply provides virtualized hardware upon which an operating system runs. Oracle certifies operating systems, no question about that. But the operating systems running under VMware are typically Oracle supported versions. (Certainly running an Oracle supported OS version under VMware is possible.) For example, I routinely run Oracle Enterprise Linux within a VMware VM.

What Oracle does not certify is the underlying hardware. At least not for the normal database product (typically called Oracle Database 11g). RAC is a different story. I will cover RAC later. Let's stick with the normal single-instance database software for now.

For Oracle Database 11g, there is no requirement for hardware compatibility whatsoever. Search away on the Oracle website for a hardware compatibility list for the database product. You will find none.

Take VMware again. VMware ESX 3.5 provides a virtualized SCSI environment based upon either the BusLogic or the LSILogic SCSI card. These are incredibly common SCSI cards, millions of them having been sold and deployed for many years. From what I can tell, Oracle has never certified this set of SCSI cards, or any other SCSI card for that matter. Ditto for the Intel x86 processor, AMD Opteron, Broadcom NIC, Emulex HBA, or any other of the common hardware components used to run Oracle.

Do you run Oracle on a Dell PowerEdge server? Not certified. HP Proliant? Nope. IBM? You guessed it. Sun? Same.

I think you get the idea.

You might also ask: Does Oracle have a hardware compatibility list for the Oracle Enterprise Linux distribution? Surely that is an OS and would require an HCL of its own?

You would be wrong. OEL is simply a repackaging of Red Hat Enterprise Linux, and thus Oracle simply adopts the Red Hat HCL. This support statement for OEL can be found here. Interestingly Red Hat apparently supports VMware virtualization of its Linux distribution.

What all this means, again, is that the initial statement in the metalink note on VMware support is essentially misleading. The rest of the statement is even more interesting. It reads:

Oracle Support will assist customers running Oracle products on VMware in the following manner: Oracle will only provide support for issues that either are known to occur on the native OS, or can be demonstrated not to be as a result of running on VMware.

Again, this statement is literally accurate, but also misleading. It sounds very scary. You would think that you will have to duplicate your entire virtualized environment on normal hardware to get Oracle support for any issue. The way this works in practice is very different from what the support statement seems to say.

Assume you call Oracle support with an issue relating to the performance of a query. You have executed a query which contains a CONNECT BY clause. You see very poor performance. You have pulled an explain plan on the query and it looks odd to you. You think the execution of the query is very suboptimal, and you suspect that it could be dramatically improved. You call Oracle's support center to discuss the issue.

Consider: Will Oracle support ask you if your OS is virtualized? I would submit it is very unlikely. Why? OS virtualization is irrelevant to the issue at hand. This is an entirely internal issue within the Oracle kernel, specifically the query execution engine. Parsing a query has to do with issues of table statistics, query syntax, hints, and the like. The OS version simply is not a consideration.

Now, assume you call Oracle support with an issue in which you have executed the service oracleasm listdisks command, and you are seeing no disks in the output. Your Oracle instance has also crashed. The ASM instance is up, but shows no diskgroups available.

Will Oracle support ask you if you are virtualized? You bet. Absolutely. And they should. You have an issue which is intimately associated with the I/O layer of the operating system. This is actually not a database issue at all. It is an I/O subsystem issue. Virtualization may very well be your problem. You should probably call VMware support right now. In fact, I would suspect that if you ran the fdisk -l command from your Linux terminal session, you would see something very similar.

Resolving this issue will require you to interact with VMware as well as Oracle. The point is that in the range of issues, there are those which Oracle can resolve without any interaction with the OS layer, and others where the OS layer is very much a part of the resolution. In this range, the Oracle support statement really says that everyone will do the right thing. Certainly, that is the practical reality.

Now, let's deal with the issue of RAC. Oracle's final statement in the support statement covers this:

NOTE: Oracle has not certified any of its products on VMWare, and use of Oracle products in the RAC environment is also not supported.

The first part of the statement is simply a rehashing of the initial, misleading statement. I suspect that the second part of the statement is a typo and should actually read:

...use of VMware products in the RAC environment is also not supported.

Use of Oracle products in the RAC environment is obviously supported. In any event, in a RAC context, Oracle does have a certified hardware compatibility program, and VMware is not on it. Since VMware software is effectively kryptonite around Oracle (for reasons which I will cover shortly), this is inevitable at this point. I will say that there is no technical reason why VMware virtualization of RAC database servers could not be supported. Certainly, it works.

Now, as to the why. That has to do with Oracle's agenda around the issues of virtualization, licensing and HA.

Taking the virtualization piece first, Oracle wants to own the entire stack. They have made that abundantly clear in many other areas, including the following:

  • Elimination of Veritas by destroying Veritas Cluster as a viable product (replaced with CRS), and the same for Veritas File System (replaced by ASM)
  • Elimination of Sun by adopting Linux as a low-cost alternative to Solaris. (Ironically, Sun has now been acquired by the company primarily responsible for its downfall, a subject of another future blog post.)
  • Attempted elimination of Red Hat by subsuming Red Hat Linux under Oracle Enterprise Linux. (Admittedly, this one has been of dubious success so far. But stay tuned.)
  • Challenge to EMC and NetApp in the area of storage using the Oracle Exadata Storage array (also the subject of a future blog post).

I think you get the idea. Virtualization is a critical control point in the stack. Ceding leadership to VMware in this area allows another giant software company to challenge Oracle's domination in the data center. That cannot be allowed.

Licensing is another thorny issue. VMware will tend to improve CPU utilization and efficiency on Oracle database servers. All of that costs Oracle money in license revenue. Which is to say, saves the customer money on Oracle license costs. This is obviously beneficial to the customer, and detrimental to Oracle.

Finally, by allowing VMware HA cluster to protect Oracle database servers, Oracle would be allowing another HA solution as an alternative to RAC. RAC is Oracle's crown jewel in terms of licensing revenues. RAC is one of the two major reasons why customers choose to run Enterprise Edition at 4X the cost of Standard Edition (the other being Data Guard). Plus, you pay the RAC upcharge. Don't get me wrong. I run RAC in the lab, and EMC will continue to be a strong supporter of it. For many applications, RAC is absolutely required. For many others, it's total overkill, the software equivalent of swatting a fly with a nuclear bomb. VMware HA cluster could provide an acceptable level of HA to those customers at a fraction of the cost of RAC. But, again, that costs Oracle license revenue. Beneficial to the customer but detrimental to Oracle.

The bottom line is that the reasons for Oracle's statement concerning VMware support are political and not technical. And definitely not altruistic. VMware virtualization of Oracle database servers provides many significant advantages to the customer in many environments. Certainly, VMware is far more mature and robust than OVM. (I will cover OVM in a future post on this blog.) VMware is the vast market leader in the hypervisor space. Significant cost and manageability savings can be achieved by virtualizing Oracle database servers with VMware. Despite this, Oracle has chosen to issue an essentially misleading support statement on VMware virtualization, and has taken the position in numerous sales calls that VMware virtualization of Oracle database servers is unsupported and dangerous.

This does nothing to serve the customer. It serves Oracle corporation, and its political agenda. In my view, customers who wish to pursue virtualization of Oracle database servers should do so. I will certainly continue to post on this subject.

Come on Oracle: Get with it. Embrace VMware. EMC and VMware embrace Oracle. Are you an Oracle customer using VMware? What are your thoughts?

Posted by the Oracle Storage Guy April 23, 2009.

March 31, 2009

New DNFS Performance Results

In conjunction with my good friend and colleague, Dave Wild with EMC NAS Solutions Engineering, the group I manage has recently completed some testing around DNFS performance. In my previous post on this subject, I was not particularly excited about DNFS. I am now happy to say that I can retract that negative view. We found DNFS to perform very well indeed, when compared to conventional kernel NFS. As a result, the EMC Oracle Commercial Solutions Validation program is now in the process of completely converting our testing effort to Oracle 11g.

The following contains a summary of our results, first starting with basic throughput:

DNFS1

As usual with our program, this is an actual database workload result using an industry standard OLTP benchmark. Note the dramatic improvement in throughput using DNFS compared to a kernel bonded NIC with kernel NFS. This implies the problem with our previous results: We were not using enough ports. And this points out one of the big benefits of DNFS: It does a far better job of utilizing multiple NICs than normal Linux does. Here is the comparison in terms of port scaling using DNFS:

Dnfs2

As you can see, the scaling on increased ports is pretty outstanding.

Another big benefit of using DNFS was improvement of CPU utilization on both the database server and the NAS file server. The benefit on the file server was particularly impressive as the following chart shows:

Dnfs3

We worked closely with the Oracle DNFS group in developing this impressive performance. As a result, we can now enthusiastically endorse DNFS for NAS performance.

October 29, 2008

To RAC or not to RAC (reprise part 2)

In my last post I contrasted the use of Oracle RAC with VMware HA Cluster for creating a database cloud. In that post, I promised to describe further the suitability of the VMware HA Cluster solution as opposed to Oracle RAC. That is the purpose of this post.

There are two areas of concern when evaluating the use of VMware HA Cluster for creating an Oracle database cloud. These are:

  1. Scalability
  2. Availability

Analysts like to use the so-called "magic diagram" to describe the market space for a given technology or solution. This is very interesting in this case. The following diagram is useful:

Magic1  
Database Cluster Technology Magic Diagram

Think of this graphic as conceptual, but it makes an important point. At the bottom of the Availability scale are databases that you don't even back up. Everyone has a few of these. One example is a staging database for temporary storage of summary data which is then inserted into a data mart or data warehouse. If this database fails, you will simply recreate it and rerun the job.

At the top of the same scale is your company's ERP, web catalog or online trading application. If this type of database is down for any length of time (even a few seconds), money is lost or, even worse, hard legal requirements are not met and liability results. In other words, these databases absolutely have to be up and running at all times.

In the middle part of the scale is everything else. These databases have various levels of HA requirements. The SLA may require no more than 2 minutes of downtime in a single month. Or perhaps several hours of downtime can be handled over a weekend. And up to half an hour of unplanned downtime may be OK in a six month period. Again, the requirements vary greatly.
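To put numbers on those SLA examples, here is a minimal Python sketch that converts a downtime allowance into an availability percentage. The function name and the 30-day month and 182-day half year are my own simplifying assumptions, not part of any SLA standard.

```python
# Convert an allowed amount of downtime into an availability percentage.
# The period lengths used below (30-day month, 182-day half year) are
# simplifying assumptions for illustration only.

MINUTES_PER_DAY = 24 * 60


def availability_percent(downtime_minutes: float, period_days: float) -> float:
    """Return the percentage of the period during which the database must be up."""
    total_minutes = period_days * MINUTES_PER_DAY
    return 100.0 * (1.0 - downtime_minutes / total_minutes)


if __name__ == "__main__":
    # The two allowances mentioned in the paragraph above.
    print(f"2 minutes per month:       {availability_percent(2, 30):.4f}%")
    print(f"30 minutes per six months: {availability_percent(30, 182):.4f}%")
```

Under these assumptions, two minutes a month works out to roughly 99.995% availability, which shows how demanding even a "middle of the scale" requirement can be.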

The Scalability scale is similar. At the bottom of the scale (to the left in this case) are databases that are very small and have small I/O requirements. A small database stored on a personal laptop or PDA would be an example. At the top of this scale are databases which are many terabytes in size, have thousands of online users at all times, and where every user has to have the ability to see every row in every table in the entire database. The latter is referred to as a "scale-up". These databases exist in virtually every enterprise of any significant size, and frequently they are also the databases which have high HA requirements. Thus, there is a lot of commonality between these two scales. Again, the company ERP, web catalog, or online trading application may be multiple terabytes in size, have a large number of online users, high I/O demands, and the requirement that all users have access to all of the data in the database, making federating this database impractical or impossible. Very large databases with high I/O requirements probably are pretty important, or we wouldn't be spending all the money, time and effort required to maintain them, after all.

In the middle of this scale are databases that have various levels of scalability requirements. Many of these fall into the "scale-out" category, where the scalability demands can be met by creating a large number of separate databases. In my experience, this may be true for a couple of reasons:

  1. The database workload is naturally partitionable because a given class of users only needs to see a subset of the total database data. This is a federated database solution. The classic example of this is the software as a service (SaaS) customer. For example, I presented to a customer that publishes a software package for managing hair and nail salons. They have over 5,000 salons in North America that use their software. Obviously, no salon needs to be able to see the database data of any other salon, making the workload very easily partitionable. New users can either be added to an existing database, or a brand new database can be created.
  2. The customer has created a mess. I encounter this fairly frequently, even in very large enterprise accounts. Instead of creating a large database cloud, they have created a large number of small projects. Now they are experiencing poor utilization, poor manageability, high cost and data center sprawl. This is the classic "server hog" scenario, in which each separate group developing a software project wants their own small data center, consisting of servers, network switches, storage, power, cooling and the like. The result is that the overall costs are very high, including the Oracle license cost. Oddly, this is the customer that Oracle most often uses as an example for the suitability of RAC, although I find it far less compelling than the scale-up scenario, for reasons which will shortly become clear.
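The federated (SaaS) case in the first item can be sketched in a few lines of Python. This is only an illustration of the placement idea: the class names, the `salondb001` naming scheme and the per-database tenant limit are all hypothetical, not taken from any customer's system.

```python
from dataclasses import dataclass, field


@dataclass
class TenantDatabase:
    name: str
    capacity: int                       # hypothetical limit on tenants per database
    tenants: list = field(default_factory=list)

    def has_room(self) -> bool:
        return len(self.tenants) < self.capacity


class TenantPlacer:
    """Place each tenant in exactly one database, creating databases on demand."""

    def __init__(self, capacity_per_db: int):
        self.capacity_per_db = capacity_per_db
        self.databases = []             # list of TenantDatabase
        self.location = {}              # tenant name -> database name

    def place(self, tenant: str) -> str:
        # A tenant that is already placed stays where it is.
        if tenant in self.location:
            return self.location[tenant]
        # Prefer an existing database that still has room.
        db = next((d for d in self.databases if d.has_room()), None)
        if db is None:
            # Otherwise create a brand new database for the new tenant.
            db = TenantDatabase(f"salondb{len(self.databases) + 1:03d}",
                                self.capacity_per_db)
            self.databases.append(db)
        db.tenants.append(tenant)
        self.location[tenant] = db.name
        return db.name
```

Because no tenant ever needs to read another tenant's rows, the only state shared between databases is this small placement map, which is what makes the workload so easy to scale out.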

Mapping this set of concepts onto our magic diagram, we can see the following:

Magic2
Database Cluster Technology Magic Diagram with Products

Bear in mind this is my opinion about the way I see the market. Others will probably disagree, which I welcome. But here is my take: Oracle RAC owns the high end where availability and scalability are both very high. And it should. Oracle RAC provides unparalleled uptime with great scalability. This comes at a cost, however.

At the bottom end, you have MySQL, Microsoft Access, and other user-oriented databases. (While both Oracle and Microsoft have personal versions of their high-end databases, I have always considered them developmental tools. For a simple user to create an address book with Oracle would be like swatting a fly with a nuclear bomb.)

In the middle you have a pitched battle among a variety of products, including Oracle SE, Microsoft SQL Server and IBM DB2. This is actually the bulk of the database market, as I show in the diagram. And this is where VMware HA Cluster actually helps Oracle substantially (a fact which I think many at Oracle have yet to fully appreciate).

Having configured both Oracle RAC and VMware HA Cluster many times, I can tell you that the manageability advantages of VMware HA Cluster are very strong. With VMware HA Cluster, you do not need to establish ssh equivalence with passwordless authentication as you do with RAC, for example. Granted, if you do this often (as I do), this is no big deal. For the occasional user, it is mind-numbingly technical, with intimate knowledge of the various ssh files (known_hosts, authorized_keys, id_dsa.pub and the like) being required. And this is only one of a number of similarly technical issues that must be mastered in order to install and configure Oracle RAC.
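To give a flavor of what ssh equivalence means in practice, here is a minimal Python sketch that checks whether the current node can reach a list of cluster nodes without any password prompt. It relies on the standard OpenSSH options `BatchMode=yes` and `ConnectTimeout`; the function names, the `oracle` user and the host names `rac1` and `rac2` are assumptions for illustration, and this is not the check the Oracle installer itself performs.

```python
import subprocess
from typing import Callable, Dict, List


def ssh_check_command(user: str, host: str) -> List[str]:
    """Build an ssh command that fails instead of prompting for input."""
    return [
        "ssh",
        "-o", "BatchMode=yes",       # never prompt for a password or passphrase
        "-o", "ConnectTimeout=5",    # give up quickly on unreachable hosts
        f"{user}@{host}",
        "true",                      # trivial remote command; only the exit code matters
    ]


def check_equivalence(user: str, hosts: List[str],
                      run: Callable = subprocess.run) -> Dict[str, bool]:
    """Return, per host, whether passwordless ssh from this node succeeded."""
    results = {}
    for host in hosts:
        completed = run(ssh_check_command(user, host), capture_output=True)
        results[host] = completed.returncode == 0
    return results
```

Calling `check_equivalence("oracle", ["rac1", "rac2"])` on each node in turn would show which node pairs still need their keys or known_hosts entries sorted out. In batch mode ssh cannot ask to confirm an unknown host key, so a missing known_hosts entry should also show up as a failure here.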

To configure VMware HA Cluster, you simply establish a port on a vSwitch on each node in the cluster, create the cluster, and then drag and drop the nodes into the cluster. Believe me, those three steps (all accomplished using the VIC GUI) are far easier than creating a cluster with Oracle RAC.

Which gets to my point. Given the alternatives, and judging things like cost, manageability, and the like, unless you are either a very high-end customer (which Oracle owns as I have said), or simply slavishly committed to Oracle (these undoubtedly exist as well), you would be crazy not to try to create a moderately HA database solution with something other than Oracle RAC. Thus, you would tend to go down either the Microsoft SQL Server (with MSCS) or IBM DB2 (with the HA option) path, both of which are available at significantly less cost than Oracle RAC. And this is where VMware HA Cluster can actually help Oracle out. By giving Oracle another HA solution with simpler management and lower cost, Oracle can actually compete successfully with Microsoft SQL Server and DB2 in this space. Which is, again, the bulk of the database market. Thus, VMware HA Cluster helps Oracle to move down market, which in my experience is where a lot of the interesting money is, as well as the next generation of new customers for Oracle.

Where VMware HA Cluster falls down is, not surprisingly, in these two areas: Availability and scalability. First, in the area of availability, Oracle RAC provides absolutely no downtime on the database as long as at least one node in the cluster survives. With VMware HA Cluster, the loss of a node in the cluster will result in the VMs on that node being rebooted onto one or more surviving nodes in the cluster. This is downtime. Granted, only a couple of minutes of downtime, but downtime nonetheless. For many users with many databases, this is acceptable. If your database requires absolutely no downtime, then Oracle RAC is your solution. Otherwise, VMware HA Cluster may be a viable option.

The other area is scalability. VMware HA Cluster cannot scale any given database very high. Since each database must run as a single instance on a separate virtualized OS image, the size of the database in terms of I/O is limited to the maximum that you can achieve in a VMware VM. Presently, this is fairly modest. The following chart compares our physically booted RAC solution to our VMware HA Cluster solution in terms of scalability. As you can see, the RAC solution scales far higher in terms of a single database image.

Solution | Instances | CPUs | Memory | Database Size
Physically booted Oracle RAC | 4 | 8 per node | 24 GB per node | 2000 warehouses in one TPC-C database
VMware HA Cluster | 8 | 4 per instance | 12 GB per instance | 250 warehouses per TPC-C database (8 total)
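A quick bit of arithmetic on that chart makes the point sharper. The Python sketch below totals up each configuration. It reads the second row as the virtualized VMware HA Cluster configuration (8 single-instance VMs), which is my reading of the chart. The class and field names are just for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Configuration:
    name: str
    units: int                 # RAC nodes, or VMs in the virtualized case
    cpus_per_unit: int
    memory_gb_per_unit: int
    databases: int
    warehouses_per_db: int

    @property
    def total_cpus(self) -> int:
        return self.units * self.cpus_per_unit

    @property
    def total_memory_gb(self) -> int:
        return self.units * self.memory_gb_per_unit

    @property
    def total_warehouses(self) -> int:
        return self.databases * self.warehouses_per_db


# Values taken from the chart; the second row is assumed to be the VMware solution.
RAC = Configuration("Physically booted Oracle RAC", 4, 8, 24, 1, 2000)
VMWARE = Configuration("VMware HA Cluster", 8, 4, 12, 8, 250)

if __name__ == "__main__":
    for cfg in (RAC, VMWARE):
        print(f"{cfg.name}: {cfg.total_cpus} CPUs, {cfg.total_memory_gb} GB, "
              f"{cfg.total_warehouses} warehouses in total, "
              f"largest single database {cfg.warehouses_per_db} warehouses")
```

Read this way, both configurations use the same 32 CPUs and 96 GB of memory and hold the same 2000 warehouses in total. The difference is that RAC serves them as one database, while the virtualized solution tops out at 250 warehouses per database.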

Again, in the scale-out scenario this is no big deal. As the scalability requirements of any given database customer are fairly modest, and each database customer can be placed on a separate server, this works fine. It does not provide for a scale-up scenario though. If you have a database that is many terabytes in size, has high I/O requirements, and where each user of the database must be able to see every row in the database, then you have a scale-up and you need Oracle RAC.

Where VMware HA Cluster shines is in several areas:

  1. In a SaaS scale-out, VMware HA Cluster (and VMware in general) provides very convenient tools for creating new database servers. You can provision a new server almost instantly. I have tried to accomplish this with RAC, and it is quite a bit more difficult.
  2. In a mixed environment, such as the server hog scale-out (where many versions of Oracle and different operating systems have been used), you can easily virtualize a given database project in place, without an expensive, time-consuming and risky migration. You have the Windows 2000 version of Oracle 8i running in production? No problem. You can p2v that puppy into a VMware environment, in the process getting all the advantages of mobility, manageability and improved utilization that VMware provides. Doing the same thing in RAC would require you to migrate all of your existing projects to a single target set of OS and Oracle versions, as RAC does not allow for any heterogeneity. This is a daunting prospect for many customers in this situation. VMware HA Cluster provides a quicker, simpler and cheaper path to a database cloud in this case.
  3. In both scale-out scenarios, VMware actually provides a more natural management of a given database image than RAC does. Assuming SaaS, you would like to be able to back up and restore a given database customer's data without any disruption of the other customers' data. If the customer's data is stored in a single database, this is very easy and natural. If it is consolidated within a larger RAC database, this becomes more difficult. Especially if you want to do a point-in-time recovery. Ever tried to do a tablespace point-in-time recovery? I have. Great fun, believe me. The same thing is true with things like Data Guard. Data Guard is a log shipping / log apply product. You do not get to pick and choose which log you want to apply to the target database. If each customer's data is stored in a separate physical database, this works great. If a large number of database customers are consolidated, this may create issues. The same sorts of issues exist with the server hog scale-out as well.

To conclude: Both Oracle RAC and VMware HA Cluster have their place as HA solutions within the Oracle production database space. Oracle RAC owns the high end in terms of scalability and availability. VMware HA Cluster can significantly help Oracle to address the mid-market where the costs and manageability issues of Oracle RAC make that product overkill. Both products have advantages and disadvantages, and each has its place. Oracle would do well to evaluate the benefits of working with VMware and enabling their product to move down market and compete more effectively with the likes of Microsoft SQL Server.

October 15, 2008

To RAC or not to RAC (reprise)

In my post To RAC or not to RAC, That is the Question, I raise the issue of whether Oracle RAC is a cost effective product for many customers. We have now done quite a bit of work in this area, and I am proud to announce that a solution featuring VMware HA cluster as an alternative to Oracle RAC will be shipping in our program this November. I presented on this solution at VMWorld last week, and will also be presenting in VMware's booth in OOW this week on the same subject.

To give you an idea of the magnitude of the solution we are presenting, I will show you some highlights here in this post. First, this is the overall architecture of the solution from a hardware standpoint:


Hardware

Physically booted hardware

This shows the Oracle RAC physically booted configuration. We compared this solution to a virtualized solution using VMware HA cluster on the same hardware. Here is the same configuration running the VMware solution:

Hardware2 

Virtualized hardware

The high level network diagrams for these two solutions are as follows:

RACnetwork

Physically Booted Oracle RAC Network Diagram

VMwarenetwork 
 

Virtualized Oracle Database Network Diagram

As you can see, in all respects the hardware configuration of these two solutions was identical; only the software was changed. The storage and networking configurations were identical as well.

First comparing the cost of the two solutions, see the following charts:

RACpie 

Oracle RAC Solution Component Costs

VMwarepie 

VMware Solution Component Costs

OverallChart

Overall Price Comparison Between the Two Solutions

So, obviously the price for the VMware solution is far less than the RAC solution. No surprise there, since Oracle Database 11g Standard Edition was used instead of Oracle RAC 11g Enterprise Edition, a huge savings in terms of license costs.

The surprise came in the area of performance. We ran a TPC-C style performance test (using the Quest Benchmark Factory tool) against both solutions, and produced the following results:

TPS

Transactions per Second

Users 

Users

Softwarecosts 

Software License Costs

As you can see, the performance of the virtualized solution actually exceeded that of the RAC solution significantly.

In future posts, I will be discussing more of how we accomplished this feat, and what the areas of suitability of virtualization are with respect to Oracle. If you would like to discuss this with me live, please come by the EMC booth at OOW this week. I will be there, as well as presenting this solution in the VMware booth.

See you at OOW.

May 16, 2008

Why Oracle is the most interesting technology space for EMC

I read Jason Kotsaftis's recent blog post with interest, since I too am very biased. Rather than commenting on Jason's post (everyone knows comments very seldom get read), I thought I would wade in with my own take on this issue.

When I came to EMC, I was shocked at how Microsoft-centric we are. Not that Microsoft is an uninteresting technology space. Far from it. I simply maintain that Oracle is more interesting, and that Oracle has been woefully neglected by EMC in the past (although that is turning around rapidly due to the great work of folks in Jason's organization).

Looking at the two spaces: Microsoft and Oracle, which is more interesting for us? I maintain that Oracle is more interesting because of several factors:

  • First is the halo effect. I talk about this a lot, but for those who don't know me, here is the idea. Oracle is the most expensive piece of software ever written for general-purpose use. (It is also the most complex.) To give you an idea of the cost of Oracle, I recently visited with a major Fortune 500 company and met with their DBA group. They shared the cost of their Oracle license with me. That cost for them is $22 per GB per month. They have petabytes of this stuff. Think about that for a minute. That is an astounding number. This company is launching an entire project to reduce the cost of the Oracle license, because it is such a huge component of their IT budget. The effect of that cost is simply this: The customer cares, and cares deeply, about the Oracle infrastructure. It is their crown jewel. They have paid dearly for it. In order to play in that space, you need to be the most enterprise-ready, robust, reliable, resilient technology in their entire environment. What does this mean for you? If you can store and manage the Oracle database data, you are by definition the best vendor in their datacenter. You are handling and helping manage the customer's most important, expensive, precious environment. Are you then good enough to store and manage the Microsoft data? I would say, virtually by definition, yes. So, if you qualify yourself to store Microsoft data, you do not automatically qualify yourself to store Oracle data. On the other hand, if you have qualified yourself to store the Oracle database data, you are almost all of the way there in building credibility in the Microsoft space, especially with the higher level managers in the customer's organization who have visibility into both those technology spaces. This is the halo effect. It is very real. I have seen this work in exactly this manner many, many times in my career.
  • Of the two companies, Microsoft and Oracle, who is more aligned with EMC? Microsoft still makes the vast majority of their revenues off of the sale of consumer-oriented desktop software. Oracle is truly the enterprise software company. They dwarf Microsoft in that space. This is exactly the business we are in. If you look at the Symmetrix and compare it to Oracle, it is amazing how many similarities there are in terms of the market they address. We own this space and so do they.

 

Is Microsoft easier? Absolutely. Oracle is a tough company to work with. They have a killer technology and they know it. They want to own the market. They do not believe that they need us. They know we need them. Their technology is far more complex and difficult than Microsoft.

All of that is true. But people make money doing things that are hard. Doing a great job of addressing the Oracle market will yield huge rewards, far beyond the costs of doing so. I strongly support the effort of Jason, Jeff, and Vince in building the Oracle relationship, and I am in for the long haul in helping them to do so.

February 20, 2008

"Blended" FCP / NFS Oracle Solution (Reprise)

I have received an email comment concerning my previous post on this subject, which expresses a common misconception. This comment is as follows:

I wanted to clarify I understand your design in your latest blog post entitled "Blended FCP/NFS Oracle Solution".

In your design, you have the Celerra unit acting like a NetApp gFiler/V-series would formating a LUN on the CX-3 and then servicing NFS requests to the Oracle hosts in addition to the Oracle hosts connecting directly to the CX-3 for FCP operation. This design implementation required two products (Celerra and CLARiiON) to implement versus a NetApp filer that could serve both NFS and FCP LUNs from the same storage engine.

More to the core of the issue is the performance issues that you mention. The tradeoff is that in the NetApp environment you have a filesystem designed for NFS and then the tacked on LUNs implemented as a file instead of EMC LUN environment with a unit tacked on for NFS.  On the other side of the coin you have the disk space reservations needed for LUNs on the NetApp filers that you would not see with the LUNs on the CLARiiON, but you have increased filesystem usage on CLARiiON from the Celerra that you would not have with WAFL.

Let me know if I am on the right track.

On the first issue, the writer of this comment is both correct and incorrect. Yes, Celerra uses CLARiiON as the back end. No, you do not need to purchase an additional product. This is because a Celerra, in either the integrated or multi-protocol versions, includes the CLARiiON back end.

Take the Celerra NS40. This is a Celerra head, consisting of two data movers and a control station (think of these as being similar to the NetApp head), connected via a FCP network to a CLARiiON CX3-40 back end. The following graphic makes this clear.

Untitled2_2

What EMC NAS Engineering did was actually quite brilliant. They simply exposed the FCP ports of the back-end CLARiiON for connection to hosts. Again, this provides both NFS (and CIFS) access via the Celerra front-end and FCP access via the CLARiiON back-end. This is what we now call the Celerra NS Multi-Protocol series array.

The way that this is bundled is also nice. I do not do pricing, as that is not really my area within EMC. However, I have been present when pricing was presented to many customers. It turns out that the cost of a Celerra NS40 and a CLARiiON CX3-40 are basically the same. The effect of that is that the customer gets the additional functionality that the Celerra provides for free.

It's kind of like buying a Lexus for the price of a Toyota. The FCP access to the CLARiiON CX3-40 is identical, but the Celerra NS40 provides additional functionality at minimal additional cost. That's a good deal for the customer.

Hence the blended solution. You can take a Celerra NS40, which inherently includes the CLARiiON CX3-40. You use the Celerra for NFS access to Oracle objects that do not need high-performance, low-latency I/O. The rest you place directly onto the CLARiiON CX3-40.

In the process, by the way, you actually install and configure less software on the database host. This is because, as I pointed out in my last post, you must configure a shared storage layer for the CRS files, which cannot be managed by ASM. Typically, this would require OCFS2, or even raw devices. You get this for free with NFS, which is already installed for you.

The commenter is actually on the right track on the second point. I have covered this fairly thoroughly in other posts on this blog, but the issues with NetApp FCP access include:

  1. A LUN is actually a file in the WAFL file system. This can be easily proven by mounting the volume where a LUN resides via NFS or CIFS. You will find there a file of the same size and name as the LUN.
  2. WAFL has some very troubling aspects with respect to sequential read performance. See my previous post on this point.

With EMC CLARiiON, again, a LUN is a LUN, i.e. a storage object you configure directly onto a RAID group. No file system in between you and the disk, in other words.

Keep the comments coming!

February 18, 2008

"Blended" FCP / NFS Oracle Solution

The program that I manage just published a solution that I am pretty jazzed about. This is in conjunction with the EMC Celerra NS Multi-Protocol Series Array. This array allows for both traditional NAS (i.e. NFS and CIFS) access as well as FCP access to the CLARiiON CX-3 Series back end array.

Yes, I know. NetApp provides multi-protocol access already. However, there is a very big difference between the FCP access provided on a NetApp filer and a CLARiiON CX-3 Series array. That is, with NetApp, the FCP access is a band-aid solution which is really a special file running on top of the WAFL file system. With the CLARiiON CX-3, the LUN that you see over FCP is a real, live, good old-fashioned LUN sitting on a RAID group. No extra WAFL file system to muck up your read access, or otherwise complicate things.

In a word, it's simpler.

Having said that, NFS has its place. There are lots and lots of files which must be managed by an Oracle RAC database server which do not require the low-latency, high-performance access of FCP. Further, if you are using ASM (which we are), then many of these files cannot be stored in ASM. This means that you need a clustered file system in addition to ASM.

Guess what? You already have one. Which is completely ubiquitous and automatically installed on every UNIX and UNIX-like operating system in the industry. It's called NFS.

And it works just fine for files like the CRS files, i.e. the voting disk and the OCR file. It also works beautifully for backups, flashback recovery files, and archived logs. These files absolutely do not need the high-performance, low-latency access of FCP. Why not free up your expensive SAN and use NFS over IP to manage these files?
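As a rough illustration of that split, the Python sketch below maps classes of Oracle files to a storage path. The NFS list comes straight from the files named above. Treating everything else (for example datafiles and online redo logs) as FCP is my shorthand for "the rest", and the identifiers are hypothetical.

```python
from enum import Enum


class Tier(Enum):
    NFS = "NFS over IP (Celerra front end)"
    FCP = "FCP (CLARiiON back end)"


# File classes named in the post as not needing low-latency access.
NFS_FILE_CLASSES = {
    "voting_disk",
    "ocr",
    "backup",
    "flashback_recovery",
    "archived_log",
}


def placement(file_class: str) -> Tier:
    """Return where a class of Oracle files would live in the blended layout."""
    # Anything not explicitly listed for NFS is assumed to need FCP performance.
    return Tier.NFS if file_class in NFS_FILE_CLASSES else Tier.FCP


if __name__ == "__main__":
    for file_class in ("voting_disk", "archived_log", "datafile", "online_redo_log"):
        print(f"{file_class:>16} -> {placement(file_class).value}")
```

The design choice is simply to default to the fast path and move a file class to NFS only when it is known not to need it.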

That's the idea behind the blended FCP / NFS solution. I have not done an exhaustive search, but as near as I can tell, no other storage vendor has done this yet. The blended solution looks like this:

Untitled_3

The blended solution can be found here. This will be the vehicle whereby our program showcases FCP solutions from EMC from now on. I hope you find it as innovative and interesting as I do.

View Jeff Browning's profile on LinkedIn

disclaimer: The opinions expressed here are my personal opinions. I am a blogger who works at EMC, not an EMC blogger. This is my blog, and not EMC's. Content published here is not read or approved in advance by EMC and does not necessarily reflect the views and opinions of EMC.