This post is an update on the 11g new feature called Direct NFS (dNFS). A lot of marketing buzz has been generated around this feature recently. I hope to add a bit of reality to the discussion.
Here is the theory behind dNFS. I/O on a database server occurs in a combination of user space and kernel space. Context switches between the two are expensive in terms of CPU. If you can move part of that activity from kernel space into user space, you save CPU through reduced context switching. One example is the difference between Linux kernel IP port bonding and dNFS's ability to make use of multiple paths: kernel bonding is expensive in CPU terms and less efficient than dNFS, according to the dNFS white paper referenced below.
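To make the context-switch argument a little more concrete, here is a minimal C sketch. It is not dNFS code and not from the white paper; it simply reads the per-process context-switch counters Linux exposes in /proc/self/status before and after a burst of small reads. Point it at a file on a kNFS mount and the voluntary switch count climbs with every blocking call, which is exactly the kind of overhead dNFS aims to trim by doing its NFS work in user space. The default file path is just a placeholder.

```c
/*
 * Illustration only -- this is not dNFS code. It reads the per-process
 * context-switch counters Linux exposes in /proc/self/status before and
 * after a burst of small reads, to show the kind of kernel/user
 * transition overhead the dNFS design is meant to reduce.
 */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

/* Sum of voluntary and nonvoluntary context switches for this process. */
static long ctxt_switches(void)
{
    FILE *f = fopen("/proc/self/status", "r");
    char line[256];
    long n, total = 0;

    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        if (sscanf(line, "voluntary_ctxt_switches: %ld", &n) == 1 ||
            sscanf(line, "nonvoluntary_ctxt_switches: %ld", &n) == 1)
            total += n;
    }
    fclose(f);
    return total;
}

int main(int argc, char **argv)
{
    /* Any readable file; use a kNFS-mounted file to see the effect. */
    const char *path = argc > 1 ? argv[1] : "/etc/hosts";
    char buf[4096];
    long before, after;
    int fd, i;

    fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    before = ctxt_switches();
    for (i = 0; i < 10000; i++)          /* many small reads = many syscalls */
        if (pread(fd, buf, sizeof(buf), 0) < 0)
            break;
    after = ctxt_switches();

    printf("context switches during read loop: %ld\n", after - before);
    close(fd);
    return 0;
}
```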
Further, dNFS avoids double caching by skipping the NFS client-side file system cache. The data is cached only in the database buffer cache, avoiding the wasted memory and CPU cost of caching the same data twice.
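For comparison, the usual way to get a similar effect over plain kNFS is direct I/O. The sketch below is only an illustration of the underlying O_DIRECT pattern on Linux, not anything Oracle-specific: the file name is a made-up placeholder, and the 4 KB alignment is an assumption that happens to match most setups. The point is simply that the read bypasses the OS page cache, leaving caching decisions to the application, which is what dNFS gives you by design.

```c
/*
 * Illustration only -- not Oracle code. dNFS skips the NFS client cache by
 * design; with kNFS the comparable technique is direct I/O. This shows the
 * O_DIRECT pattern on Linux: an aligned buffer and an aligned transfer
 * size, with the read bypassing the OS page cache so the application
 * (the database buffer cache, in Oracle's case) is the only cache.
 */
#define _GNU_SOURCE           /* for O_DIRECT */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    /* Placeholder file name -- substitute any file on the mount under test. */
    const char *path = argc > 1 ? argv[1] : "datafile.dbf";
    size_t blk = 4096;        /* assumed alignment; must match the device */
    void *buf;
    ssize_t n;
    int fd;

    if (posix_memalign(&buf, blk, blk) != 0) {
        perror("posix_memalign");
        return 1;
    }

    fd = open(path, O_RDONLY | O_DIRECT);   /* bypass the OS page cache */
    if (fd < 0) {
        perror("open");
        free(buf);
        return 1;
    }

    n = pread(fd, buf, blk, 0);             /* one aligned read */
    if (n < 0)
        perror("pread");
    else
        printf("read %zd bytes without populating the page cache\n", n);

    close(fd);
    free(buf);
    return 0;
}
```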
These benefits are thoroughly covered in the dNFS white paper on Oracle’s web site. The paper was co-authored by Kevin Closson, and it does a pretty good job of stating the potential performance benefits of dNFS.
Fundamentally, I agree with these potential benefits. The problem is that right now that is all they are: potential. In our testing, Oracle RAC 10g Release 2 delivers far better performance than Oracle RAC 11g Release 1, even though 10g is running over plain-vanilla kernel NFS (kNFS) and 11g is running over dNFS. The reason is simple: 11g is currently only available in 32 bit, while 10g is running in full-blown 64 bit.
You can say that comparing 10g 64 bit to 11g 32 bit is apples-to-oranges, and I will not necessarily disagree with you. However, bear in mind that in all other respects, the configuration was identical.
Here are the numbers. On 10g using Release 2 for x86-64, we are getting 10,200 users with a bit over 500 TPS. (Again, bear in mind that these are TPC-C style transactions. Do not compare these to normal Oracle transactions.) On 11g using Release 1 for x86-32 on identical hardware, after an excruciating level of tuning, we ultimately maxed out at 7,300 users at around 380 TPS.
Now could we get more on 11g R1 using dNFS than 11g R1 on kNFS? Maybe. I didn’t test it that thoroughly. Honestly, I don’t consider it particularly interesting. 10g is readily available and provides far more performance than 11g on dNFS.
So when does dNFS become interesting? Or stated in a larger context, when does 11g become interesting? From a performance standpoint (which is something near and dear to my heart), 11g becomes interesting when it ships in 64 bit. Until then, this is all simply theoretical. That will happen, from what I am told, sometime in Q4. Stay tuned on that.
Will we jump on 11g performance testing as soon as we have a 64 bit version with reasonable performance? Most definitely. Do I expect to see a performance benefit for dNFS at that point? Yes, I certainly do.
The other misconception that has arisen around dNFS has to do with the press release issued by NetApp stating that the dNFS feature of 11g was developed collaboratively by NetApp and Oracle. The press release is true, but misleading.
It is certainly true that NetApp was directly involved in the development of dNFS. While I was not directly involved in that work, much of it occurred while I was at NetApp. I think I can even claim some small amount of credit for the initial idea behind this feature. It goes back to the late 90s, when there was a series of discussions among several members of the NetApp staff, myself included (most of whom are no longer there), about how you would optimize Oracle I/O to a NAS device. The outcome of those discussions was the idea of taking the NFS client and putting it inside the Oracle kernel, which is effectively what dNFS does.
So, yes, NetApp was certainly integrally involved in the development of the dNFS protocol. I do not question that at all.
The misleading aspect of the press release is the unspoken implication that dNFS will somehow work differently or better on NetApp NAS gear than on equipment manufactured by other vendors, including EMC. That implication is completely false.
Remember what we are talking about here: CPU cost on the database server. File system caching on the database server. All of this is about optimizing the use of resources on the database server. dNFS is an NFS version 3 implementation, as the Oracle white paper clearly states. As such, dNFS will work in exactly the same manner, with identical performance benefits, on any NAS device from any vendor. Including Celerra by EMC.
11gR1 x86_64 has been available on OTN for 5 days now.
-----------------------
Kevin:
Yes, I noticed that 11g 64 bit was shipped the same day I put this post up. In fact, I said so on my next post, which I put up yesterday. Like I said in that post, the timing was kind of embarrassing. But whatever. We are in the process of spinning up a 64 bit Oracle 11g Direct NFS testbed as we speak, and I will get back to you with performance results on the blog as soon as I have them, hopefully by OOW. Based upon your excellent white paper on Oracle's web site, I am expecting great things.
Regards,
TOSG
Posted by: Kevin Closson | October 24, 2007 at 11:14 PM
Hi Jeff,
Do you have any comparisons when using IP Multipathing from Sun with NFS v4? dNFS reminds me of ASM and Unbreakable Linux...
One would assume that an OS vendor would be better at making NFS and multipathing work than a database vendor. Oracle has a disadvantage in that it has to test its client on all the various flavours of OS out there.
Regards
Krishna
-----------------------------
Krishna:
I am a creature of Linux at this point. While my program has done one solution using Windows, that work was not done by me. Therefore, I would have to say that, no, I have not done such a comparison.
The fact that Oracle is apparently able to do so much better a job at CPU utilization when using multiple IP pathways is surprising, though. I will certainly be looking into this further in the context of Linux.
Regards,
Jeff
Posted by: Krishna | November 09, 2007 at 12:39 PM
Hi
I think the storage guy is the right person to answer my question. Please look into it, as I'm stuck.
I have 2 Dell 2950 servers and a 500 GB Buffalo NAS, with Windows 2K3 x64 EE. Could anyone please tell me whether I can install Oracle RAC with this kit?
If yes, how can I configure shared storage and present it to the OS so that both nodes can see it?
Please help.
--------------------------------------
Bal:
Windows with NAS is problematic. In an 11g context, you can use dNFS, but that does not help you with the CRS files. You will need to at least use iSCSI. I do not know the capabilities of Buffalo NAS, so I am not sure how to guide you there.
In general, if you can use iSCSI with this NAS box, you should be able to configure RAC. With NAS, though, I would highly recommend that you go with Linux; in that case, you could use NFS for the entire setup.
Hope this helps.
Regards,
Jeff
Posted by: bal | August 14, 2008 at 08:26 PM
Guy,
Are there any performance tests available to demonstrate how dNFS performs vs. ASM?
Posted by: William Adams | June 23, 2011 at 01:10 PM