I have just finished installing CRS (Cluster Ready Services) for Oracle RAC 11g for the first time. If you would like to share my pain, take a look at this SR on Metalink. As you can see, I meticulously followed the pre-installation instructions in the clusterware installation guide, including running the configuration checker, runcluvfy. It ran perfectly. All tests passed. Green lights all the way.
Then I ran the Oracle Universal Installer, and it ran perfectly until it got to the point where you run the root.sh scripts.
At which point, those scripts hung, spewed out nasty little error messages and fell on the floor. Ouch!
Turns out this was related to a permissions mismatch. Last week was my week for this sort of thing. However, I have found in the past that issues installing CRS are depressingly common. Hence the need for a pre-installation checker, which obviously does not check for everything you need for a successful install.
Don’t get me wrong. I love the Oracle database product. It’s definitely the best database in the business as far as I am concerned. It is a mind-boggling product. Absolutely the most reliable, recoverable, and high-performing database on the planet when correctly configured, tuned and such.
But the cluster layer of RAC has many issues, not the least of which is installing it. Another is cost. On the Enterprise Edition version of Oracle software, you get to pay a 50% upcharge for the privilege of running this beast. Then there is the “RAC tax”. That’s the CPU cost of running CRS, which I am told by the folks at Hotsos is around ½ of a CPU for each node in the RAC. (Haven’t heard of Hotsos? You should. They are the best in the business as far as I am concerned in the area of Oracle database consulting, especially for performance tuning.)
Given all these issues, the question then is: Why run RAC? That is the question I am struggling with. The rest of Oracle, absent RAC, is fairly simple, and works extremely well. Yes, there are a few issues with installing and running Oracle Database 11g in single instance mode. But they are nowhere near as daunting as the issues of RAC.
RAC gives you two things:
- High availability. This is the heart and soul of RAC. You can encounter a failure of any component, and the overall database will stay up and accessible to users.
- Scalability. You can add nodes to the RAC in order to scale up the solution. Actually, this is technically referred to as “scale out”, not “scale up”. Scale up is the concept of adding capacity to a single server. Frequently, this involves downtime or a forklift upgrade. The Sun Enterprise Series was an attempt to solve that problem while maintaining a single monolithic server solution. It was pretty much a failure. Loosely coupled clusters are the current conventional wisdom on how to solve this problem. However, there are much more mature and easier-to-manage cluster solutions than RAC.
Looking at the first of these items, the first question you would ask is: Can you provide high availability to Oracle in another way? Given the economic and CPU costs, complexity, and difficulty of CRS, alternatives should be relatively attractive. Expect to see a series of posts from me in the next few weeks exploring alternatives to RAC/CRS in providing high availability. Not surprisingly, a few of these solutions may come from EMC.
On the scalability side, I am not sure that the level of scalability that RAC provides is really required for most customers. There are two data points here: Memory and CPU. On the memory side, there is no question you can scale very high on a RAC solution. The following table compares a four node RAC solution to a single node solution. The RAC solution uses our current powerhouse, the Dell PowerEdge 2900. The single instance solution uses the high-end Dell server, the PowerEdge 6950.
Platform | Memory | CPU |
---|---|---|
Dell PE2900 | 48 GB per node (192 GB for a 4-node RAC) | 2 x 2.66 GHz Quad-Core Intel Xeon processors per node (32 effective processors in a 4-node RAC) |
Dell PE6950 | 64 GB | 4 x 3.0 GHz Quad-Core AMD Opteron processors (16 effective processors) |
As you can see, the four-node RAC does have three times the memory and twice the CPU of the single instance solution. However, the following table, taken directly from the Oracle price list, points out the price difference:
Oracle Enterprise Edition | RAC Upcharge | Standard Edition |
---|---|---|
$40,000 per CPU | $20,000 per CPU | $15,000 per CPU |
Given the price difference, a single instance solution which works even a third as well for a third the price is a fairly good deal. That would assume that RAC did not cost you more in terms of CPU, memory and such. Which it does.
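To make that arithmetic concrete, here is a rough sketch in Python using only the list prices and processor counts from the tables above, plus the ½ CPU per node "RAC tax" figure. It assumes licenses are counted per effective processor (32 for the RAC, 16 for the single server) and that the RAC tax is half of one effective processor per node; both are my assumptions for illustration, not Oracle's licensing rules, and the function names are made up for this example.

```python
# List prices quoted above, in dollars per CPU.
EE_PER_CPU = 40_000
RAC_UPCHARGE_PER_CPU = 20_000

# "RAC tax": CPUs consumed by CRS on each node (the Hotsos estimate quoted earlier).
# Assumption: one "CPU" here means one effective processor.
RAC_TAX_PER_NODE = 0.5


def license_cost(cpus, with_rac):
    """List-price license cost for the given number of licensed CPUs."""
    per_cpu = EE_PER_CPU + (RAC_UPCHARGE_PER_CPU if with_rac else 0)
    return cpus * per_cpu


def usable_cpus(nodes, cpus_per_node, with_rac):
    """CPUs left for the database after subtracting the RAC tax on each node."""
    tax = RAC_TAX_PER_NODE if with_rac else 0.0
    return nodes * (cpus_per_node - tax)


# Four PE2900 nodes with 8 effective processors each, versus one PE6950 with 16.
rac_cost = license_cost(4 * 8, with_rac=True)
single_cost = license_cost(16, with_rac=False)

rac_usable = usable_cpus(4, 8, with_rac=True)
single_usable = usable_cpus(1, 16, with_rac=False)

print(f"RAC license cost:    ${rac_cost:,}")
print(f"Single license cost: ${single_cost:,}")
print(f"Cost ratio:          {rac_cost / single_cost:.1f}x")
print(f"Usable CPUs:         {rac_usable} vs {single_usable}")
print(f"Cost per usable CPU: ${rac_cost / rac_usable:,.0f} vs ${single_cost / single_usable:,.0f}")
```

Because the RAC has twice the processors and each one costs 1.5 times as much to license, the cost ratio comes out at three to one whether you count sockets or cores, as long as you count both configurations the same way. Hardware, support, and discounting are left out entirely.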
I can also tell you this: Very few folks need the throughput of even a four-node RAC. In our testing, a four-node RAC maxes out on the high side of 10,000 users and over 500 transactions per second. That’s a lot of throughput. Very, very few customers need that much. Or even a third of that much. So the question is: How much scalability do you need? And is RAC overkill for that level of scalability, given its cost and other issues? In other words, if you can provide high availability and enough scalability for your application without RAC, do you really need it?
What will interest me greatly is how much we will get in terms of throughput on a single Dell PE6950 as opposed to the four-node RAC solution running on the Dell PE2900. I suspect that the price-performance of the Dell PE6950 solution using single instance Oracle is going to be better.
Stay tuned.