« "Blended" FCP / NFS Oracle Solution | Main | Why Oracle is the most interesting technology space for EMC »

February 20, 2008

Comments

Daryl

I believe the Celerra NS units come in two flavours, integrated and gateway - you would need a gateway system to access the CLARiiON directly, as this configuration includes an FC switch in the middle.
The gateway unit was much more expensive than the additional cost of a couple of switches. Although I do agree this is a great solution - we looked at the NSx variant.

--------------------------

Daryl:

You are slightly out-of-date. EMC recently announced the NS Multi-Protocol series. This is similar to an integrated NS series array, but allows for FCP access to the remaining ports on the CLARiiON. This enables the solution described in my blog post and on the EMC.com solutions page.

TOSG

Lee Razo

Hi Jeff,

Long time listener, first time caller.

As one of your long-time ex-colleagues over at NetApp, I benefited for many years from your database expertise and have always held you in high regard. However, I have to say I am just as confused by this posting and your previous one as the other commenters are.

The fact of the matter is that your “really, live, good, old-fashioned LUN sitting on a RAID group” is simply and soundly outperformed by the NetApp FAS (see http://blogs.netapp.com/dave/2008/02/controversy-net.html)... And those results were achieved even with big, mean, scary WAFL running real-world features such as RAID-6 and rolling snapshots (at 15-minute intervals).

You also say:
“I do not do pricing, as that is not really my area within EMC. However, I have been present when pricing was presented to many customers. It turns out that the cost of a Celerra NS40 and a CLARiiON CX3-40 are basically the same”

Let me help you with that one… on the baseline SPC-1 run, the CLARiiON CX3-40 achieved a price/performance of $20.72 per IOPS (which skyrocketed to a whopping $59.49 per IOPS with snapshots enabled), compared to the NetApp FAS3040 at $13.61 per IOPS. It’s all posted here for thorough inspection:
http://www.storageperformance.org/results/benchmark_results_spc1#a00057

Say what you will about the Storage Performance Council and the SPC-1, but I am pretty sure the test was not designed in any kind of conspiracy to artificially make the NetApp FAS appear superior to the CLARiiON.

You go on to say:
“The effect of that is that the customer gets the additional functionality that the Celerra provides for free […] It's kind of like buying a Lexus for the price of a Toyota”

With all due respect, it sounds more like buying two Toyotas for the price of three and calling it a Lexus.

I will grant you that I still work at NetApp and drink the Kool-Aid on a regular basis. However, knowing your background and knowing what you know, I fundamentally cannot understand your argument that this Celerra/CLARiiON Franken-storage NAS thing is somehow superior to the NetApp FAS unified (and _native_) single architecture (which _just_works_).

---------------------

Response by TOSG:

Lee:

Like you said, long time, no see. Obviously, we disagree. Doesn't mean you and I can't still be friends.

On the issue of the SPC result you cite, leaving out the highly questionable propriety of filing a performance benchmark result using your competitor's equipment, I would say that I frequently see performance problems related to WAFL's sequential-read-after-random-write issue.

Of course, we have no intention of filing a performance benchmark on your equipment. :-)

In terms of the number of CPUs in our solution, that's the nature of a gateway-oriented NAS device. You guys sell one of those, right?

The advantage, again, is that you get the same NAS functionality as an integrated NAS box, plus the functionality and performance of the most popular midrange SAN box on the market.

Regards,
TOSG

Alex McDonald

Usual disclaimer: there's NetApp blood in my veins. And switching posts here: this is in reply to your previous blog entry.

I didn't mean that your Oracle RAC implementation using FCP/NFS was complex; I'm talking solely about the bit to the right of the switches, the EMC CX/NS part.

It's this that's complex. Fig 1 in the paper shows this complexity in much better detail than the figure in your blog.

Three OSes to manage: FLARE (CX), DART, and the Linux on the control station (NS gateway). LUN masking and fabric zoning. The CX presents LUNs to the Data Movers, and AVM builds them into a pool for NAS. NaviSphere. A RecoverPoint cluster.

The Oracle bit is fine, but this! Just exactly how many pieces of software and hardware are in the data path? You keep banging on about NetApp and WAFL, so please explain: how is this simpler?

Now let me try to dispel a common misconception in this latest entry: that the underlying architecture of WAFL is somehow inferior when it comes to providing a unified architecture, and SAN in particular.

It isn't. The fact that you can see a LUN as a file is -- well, meaningless. The "fact" that "WAFL has some very troubling aspects with respect to sequential read performance" is only troubling if you're EMC. We could have academic arguments about WAFL, and how it's put together, and discuss theoretical corner cases; in fact, we could probably bore the pants off each other.

So let's not. I say how it works *in practice* is much more important.

Take performance. NetApp SAN performance is stunning; disk-for-disk and dollar-for-dollar, it's much better with dual-parity RAID than a CX with "high-performing" and low-capacity RAID-10.

Take snapshots. Actually, on a CX, don't bother. A CX running snapshots dies. Now that's what I'd call seriously troubling. A NetApp box doesn't even break stride.

It really doesn't matter how WAFL does its stuff. What's more important is fixing customers' problems -- simply, effectively, reliably, without high cost, and all the while performing well for their needs.

That's why your "blended solution" is a non-starter in the real world. You'd like the Celerra-NAS/CX-SAN solution to be a Lexus for the price of a Toyota. It's more like a cut-and-shut; the front of an Edsel welded to the back of a Pacer.

----------------------

Alex:

Cute analogy. You got a chuckle out of me on that one.

I covered the snapshot issue in my previous posts, so I will not belabor that here.

In terms of the performance benchmark you cite, I also covered that in my response to Lee Razo's comment.

Regards,
Jeff

Lee Razo

"leaving out the highly questionable propriety of filing a performance benchmark result using your competitor's equipment"

Awww, c'mon Jeff, EMC's been doing it for years! Why the sudden indignation? :-)

http://www.netapp.com/library/tr/3521.pdf

Chad Sakac

Lee - that's a bit of revisionist history, to claim we've been doing it for years. NetApp is a great company with great products; no need to twist reality :-)

The reason we published the "Performance Revealed" doc - the one that resulted in the NetApp "cooking the numbers" response in TR3521 - is that you guys did it first, commissioning Veritest several times with CX attack jobs (earliest one here: http://www.lionbridge.com/NR/rdonlyres/1AFAED33-A12A-4A30-8331-A52774A54B83/0/netapp_f825c.pdf).

It is notable that it's always been you guys doing it first, and us needing to respond to the FUD. Now, on one hand, you've got to respect that as a competitive tactic (being the first to throw the grenade). On the other hand, it wastes resources at two strong companies (you guys to produce, us to respond, and then the Mutually Assured Destruction escalation) to produce one thing only: FUD.

Believe it or not, you guys did the Veritest (now Lionbridge) test, then we did a "revealed" doc (pointing out NetApp's consistent testing methodology of low-utilization filesystems and short test durations, which plays to NetApp's great strengths and carefully avoids its weaknesses). You guys then did the "cooking the numbers" doc - that TR article. Of course, we have a "cooking the numbers - revealed" response I'm happy to send you (we didn't publish it publicly just to avoid the inevitable "cooking the numbers, revealed, cooked" TR). What a colossal waste of time, IMHO.

I understand the need - you guys need to establish your FC credibility, and the best way to get a lot of attention fast is to go after the leader (which is why, as we've had more and more NAS market success, the PR seems to refer to you guys a lot).

There is room for different designs and different solutions to different problems. It's also not all about performance. Here are some examples.

You guys and EMC are both working as hard as we can to improve filer head failover time. Getting it below 30 seconds - and not in a "best case," but with snaps, replicas, and lots of data in the filesystems - is a REALLY hard engineering problem on a filer (which neither of us has cracked). This is of course inherited by the iSCSI and FC LUN devices that are files in those filesystems. For some customer use cases, that's not a problem. For others, it is, but you can adjust timeouts; and for others still, it's a total show-stopper.
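
Purely as an illustration of what "adjust timeouts" means in practice - not a recommendation from either vendor, and the server name and paths below are hypothetical - one common knob on a Linux NFS client is to hard-mount the export and stretch the retransmit interval so I/O quietly retries while the head fails over:

  # Hard-mount with roughly a 60-second retransmit interval (timeo is in tenths
  # of a second, so 600 = 60s); "hard" keeps the client retrying instead of
  # returning I/O errors to the application during the failover window.
  mount -t nfs -o hard,timeo=600,retrans=2 filer1:/vol/oradata /u02/oradata

Whether a pause like that is acceptable is exactly the per-use-case call I'm describing.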

Doing I/O consistency across multiple filesystems and multiple platforms is a REALLY hard engineering problem (which neither of us has cracked). Important? It definitely is in some cases.

Conversely, those are both inherently easy on a pure block device, which is why we've been able to do them forever and you guys (and our Celerra NAS/iSCSI) can't.

The reverse is also true: some things are easy with a filesystem-based design and hard with pure block devices. For example, doing thin provisioning is easy on a filesystem-based device (we've both been able to do that for years) and VERY hard on a pure-block device (which is why it only just got introduced on the CLARiiON).

There are multiple car vendors - some have one model and focus on that, others have different models (small, large, SUV, trailer trucks) - there are merits to both approaches, and every customer needs to decide for themselves.

The "CX explodes when you take snaps" is a myth too. Yes, can you make a platform look bad - sure, if you try. Dave Hitz (a man I respect, and the NetApp co-founder) said this better than I ever could:

"It's important to follow vendor best practices for your app. We've got folks in our lab who know how to configure EMC systems to run really slow, and they have folks in their lab who know the same for NetApp. Pay no attention! Focus on results from configurations the vendor recommends. (Do check that the recommended config has the features you plan to use. Many features can hurt performance.)

Benchmarks are valuable, despite some flaws, but you must read between the lines to understand the true message. Are commonly used features enabled? Is data protection turned on? Are LUNs created in unusual ways? One trick I've seen is to create LUNs that span many disks, using just a small sliver of each one, with no RAID protection enabled. Nobody would ever configure a real-world system that way. In other words, poke at how the benchmark config differs from what you plan to buy."

http://blogs.netapp.com/dave/2007/03/admire_and_resp.html


eBuddha

Well, this article has always interested me, and I certainly understand the sequential-read-after-random-write issue. I thought I would, however, point out that in Data ONTAP 7.3.1 NetApp has introduced a kind of workaround that attempts to correct this issue. There is an option you can set on a per-volume basis (read_realloc) that will run a reallocate (defrag) on the blocks that need to be read sequentially, so that on subsequent reads of the same data the file layout will be more contiguous.
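
Purely for illustration - the volume name below is made up, and the exact syntax should be checked against the 7.3.1 documentation - enabling it looks something like this from the ONTAP command line:

  # Hypothetical example: turn on read reallocation for volume "dbvol1" so that
  # blocks read sequentially get re-laid-out contiguously after random overwrites.
  vol options dbvol1 read_realloc on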



Disclaimer: The opinions expressed here are my personal opinions. I am a blogger who works at EMC, not an EMC blogger. This is my blog, and not EMC's. Content published here is not read or approved in advance by EMC and does not necessarily reflect the views and opinions of EMC.