
August 26, 2007

Comments

Pq65

So I'm no Oracle guy, but I do have a question regarding the following comment:

"The difference, though, is that the DBA will actually create datafiles which fill that space. And Oracle then makes a physical file on the device of that size, creates extents in this file, and writes zeros to it."

So what would the effect be in the above scenario if NetApp were to turn on their deduplication capability on the volume, with or even without thin provisioning enabled? Given that the file has a lot of zeros in it, there's a ton of duplicate blocks, which means these blocks can easily be shared.

Any comment?

Chad Sakac

Network Appliance's Deduplication (A-SIS) is very interesting, but not particularly applicable in this space. The primary use case is LAN B2D (i.e. CIFS/NFS on a low-cost storage subsystem - or on a SnapVault target).

A-SIS is run on a manual or scheduled basis, and deduplicates at the 4K block level. So, if you have a bunch of zeros, the data would be deduplicated when A-SIS was run, but then not again until the next run. Those zeros are going to change pretty rapidly.

Network Appliance's approach is innovative and different - analogous to a trash compactor (run periodically) rather than a shredder (run in real time) a la DataDomain and Avamar.

So - if used on a production filesystem, the filesystem needs to be big enough to hold the "pre-deduped" content, which is then compacted. In the Oracle on NFS case this wouldn't help, since the filesystem would then need to be shrunk. It would only benefit the case Jeff describes, though - very large scale, heavily oversubscribed, where some databases are growing fast and others are stagnant. In that case you could "compact" periodically and oversubscribe with minimal risk - so long as someone was watching the henhouse.

So if not on production (i.e. not applicable where the poster suggests), then what about on a B2D target? It's arguable which de-dupe approach works better - NetApp style, DataDomain style or EMC (Avamar) style. NetApp is definitely the lowest initial cost (NearStore license), DataDomain is in the middle, Avamar the most expensive - but then again, the true price of B2D is all about the compression one can achieve.

Compression/dedupe rates are all about two things: 1) granularity (the smallest chunk of data that can be marked as duplicate) and 2) the domain size (i.e. how much data is being de-duped). Avamar wins on both counts - variable chunk size, as small as 12 bytes (vs. 4K), and it deduplicates across the entire customer dataset (vs. a 1-4TB filesystem) - it's for that reason that customers regularly see 50:1 compression ratios with Avamar.

Dominick De Ranieri

I thought EMC did thin provisioning on the Celerra head, not the array?


"I say this without any bias whatsoever, since my employer sells arrays that provide thin provisioning too."

Chad

Re: "thin provisioning on the Celerra head, not the array" - all the thin provisioning mechanisms on the market today need to do some sort of "encapsulation" of the block backend.

A Celerra head is the same idea as a FAS filer "head" - a NAS device (that can present block devices as files in filesystems). EMC and NetApp take different approaches on the back end: NetApp uses JBOD, and the filer head does the block functions (RAID, BCS, etc.), while EMC uses an array backend, so all block functions are handled by the array. Each does certain things well. Personally, I like the way NetApp uses disk signatures rather than bus locations to identify disks, but the FC implementation and clustering model of EMC is superior. Celerra plus its backend is very analogous to a FAS array.

Geoff Hough

The idea that DBAs don't and won't use Oracle auto-extend with thin provisioning is erroneous. 3PAR has many thin provisioning customers who have happily used auto-extend for years and have never reported the slightest example of this "expensive" performance hit. 3PAR Thin Provisioning customers routinely use Oracle auto-extend to reap the benefits of thin provisioning. One Fortune 10 company has scores of TBs of thin provisioned volumes presented to transactional Oracle databases in auto-extend mode. The claim is completely spurious and without foundation with respect to 3PAR's product, though I cannot speak for other implementations.

Geoff Hough
Director, Product Marketing
3PAR

Anne_Compellent

For additional information, Compellent discusses its thin provisioning support for Oracle here: http://www.drunkendata.com/?p=1371.

Christian

Hi,

I might be late in this debate... but better late than never.

I'm a bit surprised by your position. I have not yet had time to dig into thin provisioning, but I'm considering doing it soon for a large Oracle/DMX customer (>100TB of Oracle data). More than 90% of all DB denial-of-service incidents in 2006 were related to data files or archive logs filling up. Data file space below the high water mark + max(archivelog), versus Oracle allocated space (data files + filesystems), shows a ratio of 1:2, and still all Oracle data files are set to autoextend in very large tablespaces (>200GB). Despite those precautions, there is at least one denial of service a day, because they are basically unable to predict the business activity accurately enough.
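For illustration, a minimal sketch of the kind of data-dictionary check behind that allocated-vs-used ratio (the query is hypothetical and not from this environment):

    -- Rough sizing check: space allocated to segments vs. total datafile space.
    SELECT (SELECT ROUND(SUM(bytes) / 1073741824) FROM dba_segments)   AS used_gb,
           (SELECT ROUND(SUM(bytes) / 1073741824) FROM dba_data_files) AS allocated_gb
      FROM dual;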

Agreed, this might be a bit extreme but we really have a case for thin provisioning (if technically feasible).

Cheers

Christian

jon crisler

I have to agree with the 3PAR guy Geoff: Thin Provisioning on 3PAR works fine and as advertised. We use 3PAR extensively at my company, with Oracle 10g and ASM, and it works. If the customer agrees to use TPVV and ASM, we allow them to use up to the contracted amount of storage and charge them less, and the auto-extend is not a painful process at all. However, we do autoextend in large chunks, generally about 100MB at a time. Is there a performance penalty in doing this? Yes, somewhat, but only during those moments when an autoextend occurs, which is not that frequent. Perhaps it just works faster in an ASM environment compared to a regular filesystem. But if you're paying a lot less and are willing to take the very slight performance hit, it's a good tradeoff.
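For illustration, a minimal sketch of the kind of datafile definition being described (tablespace name, disk group and sizes are hypothetical), growing in roughly 100MB steps up to a hard ceiling rather than being fully pre-allocated:

    -- Hypothetical example: thin-friendly tablespace on an ASM disk group,
    -- autoextending in 100 MB increments up to a fixed maximum.
    CREATE TABLESPACE app_data
      DATAFILE '+DATA' SIZE 1G
      AUTOEXTEND ON NEXT 100M MAXSIZE 32G;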

seth

Also, Oracle 11g introduced SMCO. It wakes up once an hour and pre-extends autoextensible tablespaces to keep 10% of the space inside the tablespace free. This prevents the potential impact of needing to extend the file as a result of an OLTP operation, but should still let you take great advantage of thin provisioning.
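For illustration, a minimal sketch of how to see which datafiles are autoextensible and by how much (sizes shown in MB; INCREMENT_BY is reported in database blocks):

    -- Illustrative check of autoextend settings from the data dictionary.
    SELECT file_name,
           tablespace_name,
           autoextensible,
           ROUND(bytes / 1048576)    AS size_mb,
           ROUND(maxbytes / 1048576) AS max_mb,
           increment_by              AS next_blocks
      FROM dba_data_files
     ORDER BY tablespace_name, file_name;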

Jay Weinshenker

Sorry Jeff, I've got to disagree.

"DBAs avoid auto extending files like the plague for this reason. They will allocate the space they need, always, when they create the database. And future expansions in space will be made intelligently, carefully, and methodically. That’s the way DBAs think. Believe me, I know. I am one of them."

Like Seth says, 11g has a feature to minimize this impact.

Like you, I've been a DBA for 10+ years. Like you, I also run Oracle on EMC (NS-480). Yes, what you're describing is how many DBAs think - and as a result, I think many DBAs need to work with storage admins to better understand how things work. Most DBAs I know don't understand storage or virtualization, and they need to learn it. I think that's one of the few upsides of Oracle getting into the storage and virtualization spaces - they'll help educate old-school DBAs.

I run all my cooked-filesystem Oracle DBs with autoextending datafiles that have max file sizes. I use Oracle Enterprise Manager to monitor growth inside the tablespaces/datafiles. I use Navisphere/vCenter/Nagios to monitor the LUNs/datastores/disks. Yes, there is a performance impact when a datafile extends, but less space used (especially when cloning my Production VMs) is worth it to the business: a minor performance hit versus buying additional storage and replication.
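For illustration, a minimal sketch of the kind of allocated-versus-used check that complements the array-side monitoring (the query is illustrative, not anyone's actual tooling):

    -- Illustrative query: allocated datafile space vs. space actually used
    -- inside each tablespace.
    SELECT df.tablespace_name,
           ROUND(df.alloc_bytes / 1048576)                           AS allocated_mb,
           ROUND((df.alloc_bytes - NVL(fs.free_bytes, 0)) / 1048576) AS used_mb
      FROM (SELECT tablespace_name, SUM(bytes) AS alloc_bytes
              FROM dba_data_files GROUP BY tablespace_name) df
      LEFT JOIN
           (SELECT tablespace_name, SUM(bytes) AS free_bytes
              FROM dba_free_space GROUP BY tablespace_name) fs
        ON fs.tablespace_name = df.tablespace_name
     ORDER BY df.tablespace_name;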

The issue you describe is not a technical issue, it's a DBA education issue. It's like complaining cars are bad because people don't know how to drive them.

The comments to this entry are closed.


disclaimer: The opinions expressed here are my personal opinions. I am a blogger who works at EMC, not an EMC blogger. This is my blog, and not EMC's. Content published here is not read or approved in advance by EMC and does not necessarily reflect the views and opinions of EMC.