
July 26, 2007

Comments

Mike

Hi Jeff:

This is really interesting stuff.

I was hoping you could clarify NetApp's RAID choices for your readers. It used to be that NetApp said RAID 4 was good because they used an NVRAM card with a journaling file system, and this made things safe and secure even though there was a hot disk. A few years ago they introduced RAID DP to make things more secure. Now they seem to be using RAID 1 for their Metrocluster. Can you help us understand the background to these RAID choice permutations?

open systems storage guy

These two types of snapshots do two different jobs. Copy-on-write snapshots are indeed higher performance, but that comes at a cost: the hit becomes much more serious when more than a couple of snapshots exist concurrently. A couple of percent of degraded performance is fine for a single snapshot, but multiply that by eight and see how fast your writes go.

Since most people only use snapshots as an intermediate step on the way to a real copy (either in the same array or elsewhere), there's no issue. NetApp, however, bills its snapshots as lower performance on average, but with no degradation if you take 100 snaps instead of 4. They are usually upfront about WAFL not being performance-driven, but they can do things that nobody else can.

-------
Response to this comment by the Oracle Storage Guy:

We actually tested this scenario in our Commercial Solutions Group, and the effect described by the commenter is not consistent with what we saw.

We kicked off six simultaneous snapshots and saw a bigger performance hit. It was still acceptable, but definitely bigger than with a single snapshot. Bear in mind that this was six snapshots being generated simultaneously.

We then backed off from there and generated six snapshots with a four-hour delay between them, in other words over a twenty-four-hour period. In this scenario, which I find far more real-world, each snapshot had exactly the same COFW overhead as a single snapshot does. That is because the COFW penalty had already been paid: by the time the next snapshot hit, no further blocks were being written against the previous one. Each snapshot was symmetrical, with a performance hit identical to that of a single snapshot kicked off by itself.

So, no, in most real-world customer situations, multiple snapshots using COFW technology are not a problem. I will discuss this a bit more in my next post.
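
For readers who like to see the mechanics, here is a toy model of why spacing the snapshots out flattens the COFW load. It is my own illustration, not our lab harness; the block counts and write rates are made up, and it assumes one pre-image tracker per snapshot. It simply counts, hour by hour, how many pre-image copies would have to be performed for a workload that keeps rewriting a hot working set:

import random
from collections import defaultdict

random.seed(1)

HOT_SET = range(2_000)        # chunks that get rewritten over and over (assumed working set)
WRITES_PER_HOUR = 1_000       # assumed write rate
HOURS = 24

def copies_per_hour(snapshot_hours):
    """Count COFW pre-image copies per hour, one tracker per snapshot session."""
    sessions = [(h, set()) for h in snapshot_hours]   # (start hour, chunks already copied)
    copies = defaultdict(int)
    for hour in range(HOURS):
        for _ in range(WRITES_PER_HOUR):
            chunk = random.choice(HOT_SET)
            for start, done in sessions:
                if start <= hour and chunk not in done:
                    done.add(chunk)        # first overwrite after this snapshot: pay the copy
                    copies[hour] += 1
    return [copies[h] for h in range(HOURS)]

print("single snapshot:      ", copies_per_hour([0])[:6])
print("six at once:          ", copies_per_hour([0] * 6)[:6])
print("six, four hours apart:", copies_per_hour([0, 4, 8, 12, 16, 20])[:6])

In this toy run, the six-at-once case front-loads roughly six times the copy traffic into the first hour, while the spaced case pays roughly a single snapshot's worth of copies in each interval, which is the symmetry I described above.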

TimC

@mike:

There is no *hot disk* with RAID 4. If you believe there is, you do not understand RAID 4.

@op:
What you seem to have failed to mention is that the NetApp *read* issue isn't an issue at all. It's easy to say *from a theoretical standpoint the mixed blocks are trouble*; I would expect more from you, though, since you came from NetApp. We both know that in reality you still get more than enough IOPS for pretty much any database workload, even with the *moving head* "problem".

------
Response by Oracle Storage Guy:

I did not say that RAID 4 causes a hot disk, so far as I know. Perhaps you can point out where you think I said that? That was not my intention.

The sequential read after random write (SRARW) I/O performance problem was well documented at NetApp while I was there; it is not imaginary. I cannot give specific details without violating my confidentiality agreement with NetApp, which I will of course avoid doing. (Everything you see on my blog, other than personal stuff, is documented from public sources.) So, no, I do not agree with the commenter that this is a spurious issue.

I will always post comments to my blog, though, regardless of whether they are favorable or reasonable. I may, as I am doing in this case, provide a clarifying response.

David

OMG, this is the best article on snapshots ever. It's easy to grasp and explains everything that EMC and NetApp never cared/wanted to mention.

Korwin

Re: "A couple of percent degraded performance is fine for a single snapshot, however multiply that by 8 and see how fast your writes go."

CoFW (copy on first write) does not need to write the data 8 times, to use your example. It would only need to copy the data to the RLP (reserved LUN pool) once, then update the metadata for the 8 snapshots.
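
To make the bookkeeping concrete, here is a rough sketch of what I mean, with the snapshots and the RLP modeled as plain Python dictionaries. This is my own illustration of the idea, not EMC's actual on-array structures, and it only holds while all the snapshots still share the same pre-image of the chunk (i.e., they were all taken before the chunk changed):

class SnapFamily:
    """Toy copy-on-first-write bookkeeping for one source LUN."""

    def __init__(self):
        self.reserved_pool = {}   # chunk_id -> saved pre-image (the RLP)
        self.snapshots = []       # one dict of metadata pointers per snapshot

    def create_snapshot(self):
        self.snapshots.append({})          # metadata only; nothing is copied yet

    def write_chunk(self, chunk_id, old_data, new_data):
        needy = [s for s in self.snapshots if chunk_id not in s]
        if needy and chunk_id not in self.reserved_pool:
            self.reserved_pool[chunk_id] = old_data   # ONE physical copy to the RLP
        for snap in needy:
            snap[chunk_id] = chunk_id                 # cheap metadata update per snapshot
        return new_data                               # production LUN now holds new_data

# Eight snapshots, one overwrite: one copy into the pool, eight pointer updates.
fam = SnapFamily()
for _ in range(8):
    fam.create_snapshot()
fam.write_chunk(42, old_data="before", new_data="after")
print(len(fam.reserved_pool), "pre-image copy;", len(fam.snapshots), "snapshots point at it")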

Nice write up, thanks for sharing.


-----------------------

I am very glad this was helpful. Thanks for your kind comments.

Regards,
TOSG

TimC

@op:
"In terms of the sequential read after random write I/O performance problem, this issue was well documented at NetApp while I was there."


You've apparently forgotten, or are dismissing, the impact of a little thing called NVRAM. Your example of "look at all those head seeks" is completely invalidated when the data is sitting in NVRAM, which is exactly where it would be sitting if you did a sequential read immediately after taking a snapshot of that data. It's where it would be sitting after you read the first block of that data as well.

For someone trying to be honest and not drinking kool-aid from either vendor, you did a pretty poor job of it here.

---------------------------

Tim:

Sorry to burst your bubble, but NVRAM has no impact whatsoever on read performance. NVRAM is a write cache. It does improve write performance by buffering writes before they need to go to disk.

In terms of reads, the laws of physics apply. A block that is in the cache (not the NVRAM, which, as I said, is used only for writes) will not have to be read from disk, granted. In the case of a large sequential I/O, like a database full table scan, the cache will very quickly be exhausted and subsequent reads will have to go to disk. This is a very well known performance issue: memory cache has minimal impact on large-scale sequential I/O operations like full table scans. Instead, the sequential ordering of the data on disk is the critical area where optimization can be obtained.
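
Some rough arithmetic makes the scale of the problem clear. These are assumed, round numbers purely for illustration, not measurements from any particular array:

# Assumed, illustrative numbers only.
cache_gb = 16               # filer read cache
table_gb = 300              # table being full-scanned
mb_per_s_contiguous = 400   # assumed disk throughput when blocks are laid out sequentially
mb_per_s_fragmented = 120   # assumed throughput when the reads degrade toward random I/O

best_case_cache_hit = min(cache_gb / table_gb, 1.0)
print(f"At most {best_case_cache_hit:.0%} of the scan can be served from cache.")

for label, rate in (("contiguous layout", mb_per_s_contiguous),
                    ("fragmented layout", mb_per_s_fragmented)):
    minutes = table_gb * 1024 / rate / 60
    print(f"{label}: roughly {minutes:.0f} minutes for the full table scan")

Whatever the exact figures, the cache covers only a sliver of the scan; the on-disk layout dictates the rest.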

I have personally seen and documented SRARW. It is not Kool-Aid.

Regards,
Jeff

TimC

I'm sorry, that's just not true. The caching more than makes up for the "scattered" blocks on sequential I/O. It's 16 GB of cache... with even a LITTLE bit of intelligence it can easily read the data ahead of time. If you're suffering that badly, you've GROSSLY undersized the back-end spindle count, and you would be suffering regardless of vendor.

I can only assume you haven't touched a filer since you left NetApp. I'll gladly set up any *test* you'd like in my lab with a FAS3XXX and a Clariion CX3-20. I've beaten on both multiple times, and this phantom performance hit you talk about with sequential reads is a fallacy at best.

On the other hand, I can easily create a severe performance degradation on the Clariion by leaving multiple snapshots out there on a volume. It's easy enough to engineer around, but to even begin to compare those two *problems* as equal is ludicrous.

There are PLENTY of things you can pick on NetApp about, but this is pathetic. Anyone who's spent a day with a filer in production or in a lab will call BS on this.

------------------------

Tim:

I assume you agree with me that NVRAM is not used for reads, since you are now focused on the main cache, so I will drop that point.

Remember that you are talking about full table scans on large Oracle database files. 16 GB is simply not enough cache to avoid going to disk.

Do this for me: go talk to Rich Clifton. Remind him of the conversation in which I discussed doing a TPC-H with NetApp storage. This was when I was the head of Database Performance Engineering at NetApp, and we had recently completed the TPC-C that I oversaw there. Ask him why he told me that doing a TPC-H would be very, very difficult and would basically require a rewrite of WAFL.

Give my regards to Rich while you are at it.

Regards,
Jeff

brianh

I'm interested in the complementary effects of things like read-ahead algorithms, striping, and mirroring.

I'm of the opinion that these additional processes generally flatten out the significant differences between the two methods.

At the application layer (large row updates, or media file updates), are the differences consistent over time?

thanks
brian

-----------------------------

Brian:

In some respects, the features you mention will worsen, not help, the sequential read after random write issue. Read-ahead is a good example. In a storage context, read-ahead assumes that the next set of physical blocks on disk contains contiguous data. In a fragmented file system like WAFL, that is not necessarily the case, and read-ahead will then crowd out the cache and hurt performance. In an Oracle context, read-ahead (the multiblock read count) will attempt to read the next database blocks on disk, assuming both that these blocks are contiguous (and therefore easily accessible) and that they will be used by a future query. If both of those assumptions are correct, this helps performance. If the blocks are not contiguous, it causes an increase in random I/O and hurts performance.
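
To illustrate the read-ahead point, here is a toy model (my own sketch, not either vendor's actual prefetch algorithm) that counts how many of the physical blocks a naive read-ahead window fetches are actually wanted by a sequential reader, first with a contiguous layout and then with a layout scrambled by random overwrites:

import random

random.seed(0)
NUM_BLOCKS = 10_000
WINDOW = 8                    # physical blocks prefetched per read (assumed)

contiguous = list(range(NUM_BLOCKS))    # logical block i lives at physical block i
scattered = list(range(NUM_BLOCKS))
random.shuffle(scattered)               # logical block i lives somewhere random

def useful_prefetch_fraction(layout):
    """Fraction of prefetched physical blocks that hold the next logical blocks."""
    useful = total = 0
    for logical in range(NUM_BLOCKS - WINDOW):
        physical = layout[logical]
        prefetched = set(range(physical + 1, physical + WINDOW))
        wanted = {layout[logical + k] for k in range(1, WINDOW)}
        useful += len(prefetched & wanted)
        total += WINDOW - 1
    return useful / total

print("contiguous layout:", useful_prefetch_fraction(contiguous))
print("scattered layout: ", useful_prefetch_fraction(scattered))

In the contiguous case every prefetched block is useful; in the scattered case almost none are, and those useless blocks are exactly what ends up crowding the cache.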

Mirroring in the RAID 1 sense is not relevant to NetApp storage, since NetApp does not support it; my blog discusses this. EMC has a much richer set of RAID configuration options than NetApp does. Mirroring does enhance read performance because twice as many spindles are available to read from; it costs you a little on write performance and uses more disk, of course.

Striping does not really enter into the discussion, since it is inherent to both technologies.

Hope this explanation helps.

Regards,
Jeff

Ranjit

Jeff,
Great article and very cool-headed responses, backed only by facts. You should be freelancing for techstorage or networkworld.
I know EMC storage technologies pretty well, but this article just reinforces what I already knew. In fact, there are very few articles/blogs free from tech bias and marketing spin.

----------------------------

Ranjit:

Thanks very much for your kind comments. It certainly makes what I do worthwhile.

Regards,
Jeff

Hoang

Jeff,

First, thanks for the post...it's quite informative. Second, you explained that sequential reads on NetApp can suffer a huge performance degradation when multiple snapshot copies are retained. So my nightly snapshot of a 300 GB database with two weeks of retention could bring my database to its knees, given that a NetApp block could have changed 14 times and still be needed at some point on the 15th day. Can you please confirm my understanding? Finally, my scenario above is somewhat exaggerated, but it seems a 14-day retention is a bad idea on NetApp. Can you please comment on data retention, or share any recommendations you have?


Thanks,
--Hoang

------------------------

Response:

The period during which you retain the snapshot should not matter. The WAFL fragmentation issue will occur regardless of how long a snapshot is in place. If your file system begins to slow down after retaining a snapshot for a very long time, that may be a space issue: the space overhead occupied by a snapshot tends to increase over time, and NetApp WAFL is notorious for slowing down when it becomes crowded for space.

David Ross

Pretty good description of the two methodologies. Perhaps I've drunk the Kool-Aid, but my experience does not match what you're saying.

I have first-hand experience of the Clariion grinding to a halt with as few as 2 snapshots active. It caused a production issue based entirely on overall SP performance (not on the snapshotted LUN itself). This was a 3/40, so perhaps it is better now with the new processors. But then again, NetApp could say the same, and I think you're saying it isn't really a CPU issue.

On the NetApp we see no performance degradation with several snapshots - I'll admit that the read performance is perhaps harder to measure...

Have you taken into account how NetApp uses large RAID groups, and multiple RAID groups in an aggregate, with the entire aggregate available for WAFL to use? I also think you're underestimating how far 16 GB of read cache goes for most applications. Databases might be the only concern, which is why NetApp introduced FlashCache.

What EMC'ers always tend to fail to mention is how freakin' hard everything on EMC products is to administer, and more importantly how simplifying the storage environment can make a company more responsive and agile. Again, this sounds like Kool-Aid, but I've lived it. EMC products are disjointed, and all these "features" are more difficult to get at. Clariion snapshots are administratively entirely different from DMX BCVs. If a service is too hard to understand, it won't be used properly, or at all.

I happen to think NetApp strikes a nice balance between these views. Not perfect, but nothing is.

Weiyi Yang

It's 2011. My 3240 comes with 512 GB of flash cache. That's bigger than my largest tables combined.



disclaimer: The opinions expressed here are my personal opinions. I am a blogger who works at EMC, not an EMC blogger. This is my blog, and not EMC's. Content published here is not read or approved in advance by EMC and does not necessarily reflect the views and opinions of EMC.