If you attended EMCWorld in April, you are probably aware of the joint presentation that I made with Kevin Jernigan of Oracle. This session was sponsored by Oracle, and showcased the joint solution that EMC and Oracle have created around the use of Oracle Direct NFS Client (dNFS) clonedb to create thinly provisioned, on-demand, lightweight clones of a running production Oracle database. Oracle and EMC are also jointly writing a white paper covering this solution, which will be published shortly on EMC's Powerlink website. I will post on this blog when the white paper is available.
In the meantime, the purpose of this post is to provide a summary of this solution, and give an overview of the results of the joint EMC / Oracle testing.
First, a bit of history. Although EMC does not have the reputation in NAS storage for Oracle databases that some other vendors have, we have done a lot of work in this area, and I believe we have earned a leadership position.
EMC has been working on dNFS with Oracle since it went into beta in 2007 as part of Oracle Database 11g Release 1. We made our first joint presentation with Oracle on dNFS on EMC storage at Oracle OpenWorld in 2007.
We began work on a scalable performance solution in 2010, demonstrating performance within 5% of ASM over FC on equivalent server and storage hardware. This solution was also jointly published with Oracle.
Finally, this year we produced the dNFS clonedb solution which is the subject of this post, and which was presented jointly by Oracle and EMC at EMCWorld in April. We also hope to jointly present this solution at Oracle OpenWorld 2011. More on this later.
Now some background. First on NFS. NFS was created by Sun in 1985. It is an RFC-based UNIX protocol based upon TCP/IP, and a sibling of FTP, SSH, POP, and so forth. NFS was undoubtedly the first widely deployed method for file sharing in the industry.
NFS and NAS are equivalent terms within the database storage industry. NAS simply refers to the storage of databases on a NAS array over NFS using an IP storage network. NFS became popular for database storage starting in the mid-1990s. I was personally involved in that movement during the period that I was working for NetApp. See my earlier posts in this blog for details on that.
NFS provided the following benefits relative to traditional SAN storage over FC:
- Cheap, low TCO
- Simple and fast to deploy, because of the underlying network technology, i.e. TCP/IP over Ethernet
- Easy storage virtualization
- Easy shared file system which supported grid computing
The challenges of NFS were:
- Relatively low performance compared to FC. This is because the I/O algorithms within NFS were optimized for file server I/O, i.e. a large number of lightweight processes using relatively low level I/O against a very large number of files. Database I/O is the exact opposite: A small number of very heavyweight processes doing a lot of high level I/O against a fairly small number of files.
- High CPU utilization of the protocol
- Management complexity across platforms. Each platform had its own wrinkles in terms of things like mount point parameters
- Poor scalability across Ethernet ports, requiring the use of complex, CPU-intensive port trunking at the Linux (or UNIX) kernel level
In response to these challenges, in 2007, Oracle introduced dNFS as part of Oracle Database 11g Release 1. dNFS was simply an implementation of the NFS protocol inside the Oracle database kernel. The benefits provided by dNFS were:
- Improved performance by tuning the protocol to match typical database I/O patterns
- Reduced CPU utilization
- Uniform management interface across all platforms (including, ironically, Microsoft Windows)
- Vastly superior transparent port scaling by implementing the ability to pool Ethernet ports within dNFS, rather than within the OS layer.
The results, as demonstrated by the testing performed by both Oracle and EMC have been exceptional: dNFS provides nearly equivalent performance to ASM over FC, while allowing the customer to use IP storage. This provides lower cost, better TCO, and simpler management. dNFS has been a wildly successful product for Oracle as a result.
Now to dNFS clonedb. In 2010, Oracle introduced dNFS clonedb as part of Oracle Database 11g Release 2. The feature allows you to instantly clone a running Oracle production database which is mounted over dNFS. The clone is based upon a copy of the production database. This copy does not have to be a full physical copy, however: it can be a storage-layer virtual copy, such as a snapshot. Using the combination of this copy, plus the running production database, you can create an unlimited number of clones, each of which takes up minimal space, and has minimal performance impact on the production database.
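The thin-clone idea can be illustrated with a toy copy-on-write model. This is only a sketch of the general technique, not Oracle's implementation: the class names `BackingCopy` and `ThinClone` and the 8 KB block size are assumptions made for illustration.

```python
BLOCK_SIZE = 8192  # assumed block size in bytes, for illustration only


class BackingCopy:
    """Read-only copy of the production datafile (for example, a snapshot)."""

    def __init__(self, blocks):
        self._blocks = tuple(blocks)

    def read(self, block_no):
        return self._blocks[block_no]

    def __len__(self):
        return len(self._blocks)


class ThinClone:
    """Hypothetical clone that stores only the blocks it has changed."""

    def __init__(self, backing):
        self._backing = backing
        self._changed = {}  # block number -> clone-private block contents

    def read(self, block_no):
        # Changed blocks come from the clone; all others from the backing copy.
        if block_no in self._changed:
            return self._changed[block_no]
        return self._backing.read(block_no)

    def write(self, block_no, data):
        if not 0 <= block_no < len(self._backing):
            raise IndexError("block number out of range")
        # The backing copy is never modified; the write stays private to this clone.
        self._changed[block_no] = data

    def space_used(self):
        return len(self._changed) * BLOCK_SIZE


backing = BackingCopy([b"a", b"b", b"c", b"d"])
clone1 = ThinClone(backing)
clone2 = ThinClone(backing)
clone1.write(2, b"X")  # only clone1 sees this change
```

In this model, any number of clones can share one backing copy, and each clone consumes space only for the blocks it has written.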
Thus, the benefits of dNFS clonedb are:
- Clones are created instantly
- You can have an unlimited number of clones
- Clones take up minimal space
- No perceptible performance impact on the production database
- Performance of I/O against the clone is comparable to the production database
The objectives of the solution were simply to prove these benefits.
The solution has two major components:
- RAC production database with single instance dNFS clonedb (RAC testbed)
- Both production database and dNFS clonedb as single instance (SI testbed)
Two sets of testing were required because we wanted to test the impact of creating dNFS clonedb databases against a highly scalable RAC implementation. Most of the testing was in this environment, so we will start with that one. However, we also wanted to compare the performance of the dNFS clonedb database itself to a normal dNFS mounted database. Since the dNFS clonedb database was running on a modest single-instance Oracle database server, this required us to perform an additional performance test against a production server on the same hardware. This enabled us to directly compare the performance of dNFS clonedb against a normal production database.
Let's start with the testing of the RAC component. The network diagram for this component was as follows:
As shown in the diagram, the major features of the RAC environment are:
- Oracle Database (and RAC) 11g Release 2, which includes dNFS and clonedb
- EMC NS-960 unified storage array
- EMC Replication Manager (RM) which is responsible for managing the process of creating snapshots (using EMC Celerra SnapSure checkpoint) of the running Oracle production database
The first step of the testing was to create a baseline of normal performance on the production RAC database. This was done using an industry-standard OLTP benchmark. This produced the following result:
The next step was to test the performance impact of creating snapshots using EMC Celerra SnapSure checkpoint and RM. RM puts the database into hot backup mode while the snapshot is being created, and this results in a slight response time hit. However, TPS is not affected, and the response time effect is very short-lived (it lasts only for the brief period when the database is in hot backup mode). Here is that result:
In the last test of the RAC component, we created two clone databases using dNFS clonedb. These operations were run while the production database was under load using the OLTP performance benchmark, of course. The effect of these operations on performance was minuscule: We simply could not find any effect at all. The run was identical to the baseline for all practical purposes. We concluded that dNFS clonedb has no perceptible impact on the production database. Here is that result:
As you can see, this is essentially identical to the baseline.
Finally, we need to add the single instance testing to show the performance impact of clonedb on the clone database itself. Here is the network diagram for that testing:
This is identical to the RAC testbed, with the exception that the production database is on a single instance server, instead of a RAC cluster. Also, the hardware configuration used for the production database was identical to that used for the clone database. This allowed us to run a baseline on the production system, and then compare that to a performance run against the clone database. This produced the following result:
At the peak TPS, this result produced the following performance:
clonedb: 400 TPS
Non-clonedb: 445 TPS
Thus, the clonedb feature under an OLTP workload is within about 10% of a normal dNFS mounted database, which I consider to be quite respectable.
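The "about 10%" figure follows directly from the two peak TPS numbers above. A quick check of the arithmetic (the variable names are mine):

```python
clonedb_tps = 400   # peak TPS measured on the clonedb database
baseline_tps = 445  # peak TPS measured on the normal dNFS mounted database

# Shortfall of the clone relative to the baseline, as a fraction.
shortfall = (baseline_tps - clonedb_tps) / baseline_tps
print(f"clonedb is {shortfall:.1%} below the baseline")
# prints: clonedb is 10.1% below the baseline
```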
We also timed the clonedb creation and snapshot creation operations. They were very brief. The snapshot creation operation, including a complete RM run with hot backup, mounting, and so forth, took about 7 minutes. Creating a clonedb database only took about 10 seconds.
The final benefit we tested was space. At creation, the clonedb database takes up virtually no space, as the following chart indicates:
Thus, at creation, a clonedb of an 11 TB production database took up only 7.4 MB of space. This is slightly misleading though. As changes are made to the clonedb database, it will begin to occupy space. The space overhead of a clonedb database is easily determined: it is the number of database blocks which differ between the clonedb and the production database. Nonetheless, the typical clonedb database can be expected to take up far less space than a physical copy of the production database.
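To make the space rule concrete, here is a rough estimate of how a clone grows as blocks change. This is a sketch under stated assumptions: the 8 KB block size, the 5% change rate, and the helper name `clone_space_bytes` are all hypothetical and were not part of our testing. The estimate also ignores the small fixed footprint the clone has at creation.

```python
BLOCK_SIZE = 8192  # assumed database block size in bytes
TB = 1024 ** 4     # assumed definition of a terabyte


def clone_space_bytes(changed_blocks, block_size=BLOCK_SIZE):
    """Space overhead of a clone: one stored block per changed block."""
    return changed_blocks * block_size


production_bytes = 11 * TB  # size of the production database in the test
total_blocks = production_bytes // BLOCK_SIZE

changed_percent = 5  # hypothetical: 5% of blocks modified in the clone
changed_blocks = total_blocks * changed_percent // 100

clone_bytes = clone_space_bytes(changed_blocks)
print(f"clone uses about {clone_bytes / TB:.2f} TB of {production_bytes / TB:.0f} TB")
```

Under these assumed numbers the clone would occupy roughly a twentieth of the space of a full physical copy, and the figure scales linearly with the number of changed blocks.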
I believe that dNFS clonedb is an incredibly attractive and powerful addition to Oracle's already feature-rich database offering. I recommend that anyone interested in exploring NAS storage for Oracle databases give dNFS a strong look, and especially consider clonedb for creating clones of production databases.