
Breaking Records … Revisited


So today I found out that we’d broken a few records of our own a few days ago. At least from my perspective this came with surprisingly little fanfare, with the associated press release coming out late last night. I’d like to say that the results speak for themselves, and to an extent they do. NetApp now holds the top two spots, and four out of the top five results on the ranking ladder. If this were the Olympics, most people would agree that this represents a position of PURE DOMINATION. High fives all round, and much chest beating and downing of well-deserved delicious amber beverages.

So, apart from having the biggest number (which is nice), what did we prove?

Benchmarks are interesting to me because they are the almost perfect intersection of my interests in both technical storage performance and marketing and messaging. From a technical viewpoint, a benchmark can be really useful, but it only provides a relatively small number of proof points, and extrapolating beyond those, or drawing generalised conclusions, is rarely a good idea.

For example, when NetApp released their SPC-1 benchmarks a few years ago, it proved a number of things:

1. That under heavy load involving a large number of random writes, a NetApp array’s performance remained steady over time

2. That this could be done while taking multiple snapshots, and more importantly while deleting and retiring them while under heavy load

3. That this could be done with RAID-6 and with greater capacity efficiency, as measured by RAW vs USED, than any other submission

4. That this could be done at better levels of performance than an equivalently configured, commonly used “traditional array” as exemplified by EMC’s CX3-40

5. That the copy-on-write performance of the snapshots on an EMC array sucked under heavy load (and, by implication, similar copy-on-write snapshot implementations on other vendors’ arrays)

That’s a pretty good list of things to prove, especially in the face of considerable unfounded misinformation being put out at the time, which, surprisingly, is still bandied about despite the independently audited proof to the contrary. Having said that, this was not a “my number is the biggest” exercise, which generally proves nothing more than how much hardware you had available in your testing lab at the time.

A few months later we published another SPC-1 result which showed that we could pretty much double the numbers we’d achieved in the previous generation at a lower price per IOP, with what was at the time a very competitive submission.

About two years after that we published yet another SPC-1 result with the direct replacement for the controller used in the previous test (3270 vs 3170). What this test didn’t do was show how much more load could be placed on the system; what it did do was show that we could give our customers more IOPS at a lower latency with half the number of spindles. This was also the first time we’d submitted an SPC-1e result, which focussed on energy efficiency. It showed, quite dramatically, how effective our Flash Cache technology was under a heavy random write workload. It’s interesting to compare that submission with the previous one for a number of reasons, but for the most part this benchmark was about Flash Cache effectiveness.

We did a number of other benchmarks, including SPEC SFS benchmarks, that also proved the remarkable effectiveness of the Flash Cache technology, showing how it could make SATA drives perform better than Fibre Channel drives, or dramatically reduce the number of Fibre Channel drives required to service a given workload. There were a couple of other benchmarks which I’ll grant were “hey, look at how fast our shiny new boxes can run”, but for the most part these were all done with configurations we’d reasonably expect a decent number of our customers to actually buy (no all-SSD configurations).

In the meantime EMC released some “Lab Queen” benchmarks. At first I thought that EMC were trying to prove just how fast their new X-Blades were at processing CIFS and NFS traffic. They did this by configuring the back-end storage in such a ridiculously over-engineered way as to remove any possibility that it could become a bottleneck; either that, or EMC’s block storage devices are way slower than most people would assume. From an engineering perspective I think the guys in Hopkinton who created those X-Blades did a truly excellent job: almost 125,000 IOPS per X-Blade using 6 CPU cores is genuinely impressive to me, even if all they were doing was processing NFS/CIFS calls.

You see, unlike the storage processors in a FAS or Isilon array, the X-Blade, much like the Network Processor in a SONAS system or an Oceanspace N8500, relies on a back-end block processing device to handle RAID, block checksums, write cache coherency and physical data movement to and from the disks, all of which is non-trivial work. What I find particularly interesting is that in all the benchmarks I looked at for these kinds of systems, the number of back-end block storage systems was usually double that of the front end. That suggests to me either that the load placed on the back-end systems by these benchmarks is higher than the load on the front end, or, more likely, that the front-end/back-end architecture is very sensitive to any latency on the back-end systems, which means the back ends get over-engineered for benchmarks. My guess, after seeing the “All Flash DMX” configuration, is that the Celerra’s performance is very adversely affected by even slight increases in back-end latency, and that we start seeing some nasty manifestations of Little’s Law in these architectures under heavy load.
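
The Little’s Law concern can be sketched numerically. With a fixed number of outstanding I/Os between front end and back end (the queue depth), Little’s Law (L = λW) caps throughput at queue depth divided by latency, so small back-end latency increases eat directly into front-end IOPS. A minimal illustration in Python, using purely hypothetical queue depths and latencies (none of these figures come from any benchmark disclosure):

```python
# Little's Law: L = lambda * W, i.e. outstanding-IOs = throughput * latency.
# With a fixed queue depth (L), achievable throughput is capped at L / W,
# so any back-end latency increase directly shrinks the IOPS ceiling.

def max_iops(queue_depth: int, latency_s: float) -> float:
    """Upper bound on IOPS for a fixed number of outstanding I/Os."""
    return queue_depth / latency_s

# Hypothetical numbers for illustration only:
q = 32  # assumed LUN queue depth between front end and back end
for latency_ms in (0.5, 1.0, 2.0):
    iops = max_iops(q, latency_ms / 1000)
    print(f"{latency_ms} ms -> at most {iops:,.0f} IOPS per LUN queue")
```

Doubling back-end latency halves the ceiling, which is consistent with the conjecture that these architectures are tuned (or over-provisioned) to keep back-end latency as low as possible.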

A little while later, after being present at a couple of EMC presentations (one at Cisco Live, the other at a SNIA event, where EMC staff were fully aware of my presence), it became clear to me exactly why EMC did these “my number is bigger than yours” benchmarks. The marketing staff at corporate created a slide that compared all of the current SPC benchmarks in a way that was accurate, compelling and completely misleading all at the same time, at least as far as the VNX portion goes. Part of this goes back to the way that vendors, including I might say NetApp, use an availability group as a point of aggregation when reporting performance numbers. This is reasonably fair, as adding Active/Active or Active/Passive availability generally slows things down due to the two-phase-commit nature of write caching in modular storage environments. However, the configuration of the EMC VNX VG8 Gateway/EMC VNX5700 actually involves 5 separate availability groups (1x VG8 Gateway system with 4+1 redundancy, and 4x VNX5700 with 1+1 redundancy). Presenting this as one aggregated performance number without any valid point of aggregation smacks of downright dishonesty to me. If NetApp had done the same thing then, using only 4 availability groups, we could have claimed over 760,000 IOPS by combining 4 of our existing 6240 configurations. But we didn’t, because frankly doing that is, in my opinion, on the other side of the fine line where marketing finesse falls off the precipice into the shadowy realm of deceptive practice.

Which brings me back to my original question: what did we prove with our most recent submissions? Well, three things come to mind:

1. That NetApp’s ONTAP 8.1 Cluster-Mode solution is real, and it performs brilliantly

2. That it scales linearly as you add nodes (more so than the leading competitors)

3. That scaling with 24 big nodes gives you better performance and better efficiency than scaling with hundreds of smaller nodes (at least for the SPEC benchmark)

This is a valid configuration using a single vserver as a point of aggregation across the cluster, and trust me, this is only the beginning.

As always, comments and criticism are welcome.

Regards

John

  1. Geert
    November 4, 2011 at 6:37 am | #1

    Hear hear…!

    • Ausstorageguy
      November 15, 2011 at 9:52 pm | #2

      Really? Hear hear?

      So I’ve got to ask: is there something wrong with the presentation of a best-of-breed solution?

      Now, should NetApp decide to present a best-of-breed with a V-Series in front of 6 x E-Series, tune, and rake in a stellar number, you’d see no issue with that?

      Now I’d imagine that you might take exception if EMC were to criticise, right? Even if it followed John’s precedent here?

      Oh, and misleading? Does setting wafl_downgrade_target during a benchmark mean much to you? Real workload…. Hmm, well Geert, if you are unwilling to say hear hear to a reversal of the critique, then I’d suggest it’s not Kool-Aid you’re drinking, but something more akin to the Bear Grylls, biological-function level.

      The reality is, the vg8 is doing exactly what it was designed to do. Much like the also excellent v-series.

  2. Geert
    November 16, 2011 at 12:07 am | #3

    Not sure where this response is coming from or where you’re going with it, but I was really just applauding John’s writeup on this topic, which really boils down to a few simple metrics: NetApp achieved a world-record performance with a configuration which has actually been bought, at an efficiency rate which is actually achievable, and at a price point which will at least – well, let’s say – get people very interested…

    • ausstorageguy
      November 16, 2011 at 9:13 am | #4

      Hi Geert, point taken and I retract with an unreserved apology.

  3. ausstorageguy
    November 16, 2011 at 11:25 am | #5

    (As a separate post so as not to detract from my apology to Geert.)

    I would like to offer in my defence that although “hear hear” is fairly universal in agreement and applause, nowadays it seems to be the sole preserve of politicians, who oft use it to back their speaking party member when berating, through misinformation and manipulation of words, the opposing party’s actions or intentions.

    That is how I interpreted it.

    For me, I just don’t understand why John could not stop at simply saying, NetApp achieved something special, as John is often the first to criticize the competitor for using FUD, yet more often than not the first to use FUD.

    I’m taking it as a personal mission to eliminate FUD, as often it’s completely misinformed, misinterpreted and delivered by those with little-to-no knowledge of the competitive product they speak of…. and this is from all parties.

    It’s very easy to spin numbers and use statements like:
    “4. That this could be done at better levels of performance than an equivalently configured
    commonly used “traditional array” as exemplified by EMCs CX3-40″
    An array that’s 3 generations prior???

    And it’s also easy to spin numbers:
    The NetApp config required 1728 drives, 24xFAS6240 node pairs, 576 RU (or 13.7 Racks)
    To achieve 1,512,784 NFS operations a second total.
    -That’s 875.45 NFS Operations per drive slot.

    The EMC config required 457 drives, 4xVNX5700 node pairs, 1xVG8 5 blade, 130 RU (or 3.09 Racks, but let’s call it 4 as per the diagram)
    To achieve 497,623 NFS operations a second total.
    -That’s 1,088.89 NFS Operations per drive slot.

    Or to put it another way, EMC is 213 NFS operations per drive slot better than NetApp, with considerably less hardware.
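
The per-drive-slot arithmetic above reproduces exactly from the quoted totals; a quick sketch in Python (figures as quoted in this comment, not independently verified):

```python
# NFS operations per drive slot, using the totals quoted above.
def ops_per_slot(total_ops: int, drive_slots: int) -> float:
    return total_ops / drive_slots

netapp = ops_per_slot(1_512_784, 1728)   # NetApp: 1,512,784 ops, 1728 drives
emc    = ops_per_slot(497_623, 457)      # EMC: 497,623 ops, 457 drives
print(f"NetApp: {netapp:.2f} ops/slot")  # ~875.45
print(f"EMC:    {emc:.2f} ops/slot")     # ~1088.89
print(f"Delta:  {emc - netapp:.0f} ops/slot in EMC's favour")  # ~213
```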

    There was nothing more “Lab Queen” about EMC’s config than NetApp’s.

    NetApp achieved an excellent result…. John, the post should have ended there.

    • November 16, 2011 at 4:35 pm | #6

      An excellent comment, though it’s unfortunate that I didn’t effectively communicate two of the main points I was trying to make, which were:

      1. You need to be careful about the way benchmarks are used and interpreted
      2. You should present the top line number honestly without resorting to tricks like unrealistic configurations or aggregating performance numbers without a valid point of aggregation.

      As you’ll know from this and my more recent posts, I take serious issue with EMC’s tactics in the way they both report, and subsequently use, benchmark material.

      I particularly like the point you make about the relative efficiencies of the VNX benchmark vs the FAS scale-out benchmark. One response would be to point out that one used spinning disks assisted by flash caching, whereas the other used only flash drives. You might then be able to say that on a “per drive slot” basis, pure flash is about twice as efficient as flash-enhanced disk. It’s only one data point across vastly dissimilar architectures, but it’s an interesting data point nonetheless. I don’t want to be seen to be putting words in your mouth, so I’d be interested in whether you think that “flash-assisted disk gets 50% of the IOPS per drive slot compared to flash only” is a legitimate assertion/conclusion from an analysis of the benchmark?

      I think one place where you’d need to be careful is that the SPEC benchmark stresses the CPU of the controllers more than it does the back-end subsystems, and that a scale-out benchmark must include an even distribution of cross-cluster traffic, which introduces additional latency and CPU consumption. So scale-out configs like the Isilon and ONTAP 8.1 Cluster-Mode submissions will almost by definition be less efficient on an IOPS/CPU basis than non-scale-out configs like the VNX and ONTAP 7-Mode benchmarks. I think you’d be safer comparing IOPS/disk on more similar architectures, which I might do if time permits.

      As far as “Lab Queens” in EMC’s configurations go, we may never agree on this, but I submit that nobody would ever buy a V-MAX with 96 EFD drives just to put behind a few NS gateways, and I strongly doubt that anyone would buy 2 separate VNX5700s packed with flash for their NFS servers. These configurations are purpose-built for benchmarks and therefore qualify in my eyes for the epithet of “Lab Queen”.

      In comparison, the configurations used in our benchmarks are all representative of configurations I’ve seen or know of, including a couple of 24-node 6080 configs that were substantially similar to our first “million SPEC SFS” configuration, and I wouldn’t be surprised if these were upgraded to configurations substantially similar to the ones used in our latest benchmark. I have no problems with the Isilon benchmark configuration; I don’t think there will be too many installations of that size, but the way in which it was configured in the benchmark is consistent with the way in which I believe Isilon systems are configured in the real world.

      My conjecture that EMC uses all-flash configs because minor increases in latency kill performance may seem FUDish, but for the life of me I can’t figure out any other reason why they seem to have so heavily over-engineered the back end. Limited queues/smallish LUN queue depths between the front ends and back ends would cause performance problems that could be substantially mitigated with ultra-fast SSD-based back ends. I’ll do some additional checking, but if you’ve got some information one way or the other, I’d be interested in hearing it.

      Re your comment

      “4. That this could be done at better levels of performance than an equivalently configured commonly used “traditional array” as exemplified by EMCs CX3-40″
      An array that’s 3 generations prior??”

      The benchmark at the time was with an equivalent array (the 3040), which is now also three generations old; the benchmark, and what it was proving at the time, remains valid in my opinion. I’d be interested to see a similar side-by-side submission for a VNX5300 with FAST Cache and a FAS3240 with Flash Cache, but this time maybe we should wait until EMC submits their own configuration first.

  4. ausstorageguy
    November 21, 2011 at 2:51 pm | #7

    RJM > “1. You need to be careful about the way benchmarks are used and interpreted
    2. You should present the top line number honestly without resorting to tricks like unrealistic configurations or aggregating performance numbers without a valid point of aggregation.”
    Thank you, I’ll take that into consideration.
    So I just wanted to set the scene for what I’m about to point out:
    RJM > “The benchmark at the time was with an equivalent array (the 3040) which is now also three generations old, the benchmark and what it was proving at the time remains valid in my opinion. I’d be interested to see a similar side by side submission for a VNX5300 with FAST Cache and a FAS3240 with Flashcache, but this time maybe we should wait until EMC submits their own configuration first.”
    RJM > “I’m pretty happy with the way NetApp does their bench-marking, and it seems to me that many others abuse the process, which annoys me, so I write about it.”
    I want you to know that my stance here is not personal and not directed towards you – we all want to believe the things closest to us are infallible – we don’t want to believe our kid is the one who bit another kid; our mother is the better cook; the company and/or products we associate ourselves with are the superior products and our company’s practices unblemished – but this can cloud our judgment and cause us to look past the flaws.
    Now, I want to make something perfectly clear: I’m not pro- or anti-NetApp or EMC in any regard; I have worked with both for many years. My first Symmetrix and Clariion were in ’97 and ’99 respectively, and my first NetApp was in 2001.
    I will be taking other vendors to task as and when I see fit; for now, however, your blog is one of my regular reads and I know your disdain for FUD – you just happened to be the first, and your post contradicted your supposed dislike for FUD and engineering a benchmark.
    The Symm was scary, the Clariion basic but good (there was nothing else like it on the market at the time) and the NetApp was a glorified file server (albeit a very good file server).
    Since that time, I have lost track of how many NetApp and EMC arrays I’ve been directly involved in, and have since become more and more enchanted with the products and equally disenchanted with the companies’ practices.
    I want to go back to the FAS 3040/Clariion CX3-40 example as a demonstration of how NetApp DOES NOT (highlight not shouty) play by the rules.
    Executive Summary:
    Way back in March 2009, NetApp published a comparison of the two products in an attempt to show NetApp’s superiority as a performance array: the NetApp FAS3040 achieved an SPC-1 result of 30,985 IOPS, whilst the EMC CX3-40 achieved a seemingly meagre 24,997 SPC-1 IOPS – the EMC left wanting, a horrific 5,988 IOPS behind the NetApp.
    On the face of it, it would appear NetApp had a just and fair lead, but this is simply not true – NetApp engineered the EMC to be pig-slow, and I’ll prove it.
    When submitting a benchmark, it’s important to ensure a like-for-like configuration.
    NetApp simply did not do this!
    NetApp used:
    * Different hardware for the Workload Generator,
    * Different methods for the ASU presentation,
    * Short-stroked the NetApp and long-stroked the EMC,
    * Engineered higher-latency equipment and additional hardware and services into the EMC BoM, and;
    * Falsified the displayed configuration.
    My goal here is to show why I place no faith in NetApp’s or any other vendor’s competitive benchmark.
    End Executive Summary.
    Now for the Nuts and Bolts:
    Now John, I know you have a passion for Benchmarks, and to reiterate the first quote in this reply, I will add that you need to be careful to ensure you are not causing undue and unfair differences in the equipment, tools, software and pricing to give an undue competitive advantage.
    I know you’ll probably be crying foul by now and stating it was fair and just, but I can prove that it was not – without a doubt – and for the life of me, I cannot believe how any of this was missed and that NetApp got away with it.
    I must warn you, this level of detail is normally reserved for the kind of person who wears anoraks and strokes their beard.
    I’ll give a breakdown of how NetApp did this by breaking it into sections and their differences:
    1. LUN to volume presentations
    2. Workload Generator (WG) Hosts
    3. HBAs
    4. Array Configurations and BoM
    5. RAID Group and LUN Configuration
    6. Workload Differences
    So let’s look at these differences in detail (John, when benchmarking, the devil IS in the detail):
    1. LUN to volume presentations:
    When NetApp’s Steve Daniels configured the WGs’ (Workload Generators’) volumes, he striped 36 LUNs from the Clariion at the host level, yet presented only 18 LUNs from the NetApp (half the number):

    Page 64: http://www.storageperformance.org/results/a00059_NetApp_EMC-CX3-M40_full-disclosure-r1.pdf

    Page 63: http://www.storageperformance.org/results/a00057_NetApp_FAS3040_full-disclosure-r1.pdf

    Despite the fact that he knew full well that the EMC Clariion had the capability to stripe the volumes in-array.

    This wouldn’t seem like a big deal, but it’s a huge difference and creates many host and array performance issues – it’s certainly known to anyone with a strong knowledge of storage networking not to do this unless you have no other choice (which he did have):

    The phenomenon of striping performance loss at the host is well observed here:
    http://sqlblog.com/blogs/linchi_shea/archive/2007/03/12/should-i-use-a-windows-striped-volume.aspx

    It would seem that NetApp created a greater depth of striping for the EMC array and utilised, for no possible technical reason (other than to make the workload as high as possible), small stripes for the EMC, thereby negating any possible use of the cache.

    2. Workload Generator (WG) hosts
    At first glance, the WG hosts seem to be identical IBM X3650 servers; however:

    The IBM X3650 used for the EMC WG is:
    PCI-X 133MHz based
    PAGE 15: http://www.storageperformance.org/results/a00059_NetApp_EMC-CX3-M40_full-disclosure-r1.pdf

    The IBM X3650 used for the NetApp WG is:
    PCIe based.
    PAGE 16: http://www.storageperformance.org/results/a00057_NetApp_FAS3040_full-disclosure-r1.pdf

    PCI-X
    * is half-duplex – It CANNOT transmit and receive at the same time
    * is a Parallel bus and relies on arbitration, scheduling and shares the bus bandwidth
    * has a Line Speed of 1GB/s
    * is channelled through multiple chipsets and bridges before reaching the north bridge and CPU
    * two PCI-X HBAs still have only a 1GB/s bus to share

    PCIe
    * is full-duplex – It CAN transmit and receive at the same time
    * is a Serial Interface and is point to point
    * has a line speed of 2GB/s per 4 lanes (32 PCIe lanes)
    * is direct to the north-bridge and CPU
    * two PCIe HBA’s have 2GB/s each for a total of 4GB/s (8 PCIe lanes, 4 lanes each of 32)

    Anyone with an understanding of networking will appreciate that the implications of full vs half duplex and serial vs parallel are much the same jump in performance as SATA over PATA.

    NetApp speak regularly on the benefits of PCIe over PCI-X:
    http://partners.netapp.com/go/techontap/fas6070.html

    And I quote:

    “We have also changed the system interface on NVRAM to PCIe (PCI Express). This eliminates potential bottlenecks that the older PCI-Xbased slots might introduce.”

    “Howard: PCIe was designed to overcome bandwidth limitation issues with earlier PCI and PCI-X expansion slots.”

    “Naresh: 100 MHz PCI-X slots are 0.8GB/s peak, and x8 PCIe slots are 4GB/s. PCI-X slots at 100 MHz could be shared between two slots, so a couple of fast HBAs could become limited by the PCI-X bandwidth.”

    “Tom: In addition to increased bandwidth, PCIe provides improved RAS features. For example, instead of a shared bus, each link is point to point.”

    It seems NetApp used the inferior host as the EMC’s workload generator and the superior one as NetApp’s.

    Why did they not use the same host? I can only imagine to increase the Total Service Time when measuring the EMC, possibly doubling the response time!

    3. HBAs

    Again, at first glance it would seem NetApp used the same QLogic HBAs for both tests – but as highlighted before, the two hosts used were different: one PCIe, the other PCI-X.

    The same applies to the HBAs:

    The HBA given to the EMC is the QLA2462, which is:
    * PCI-X
    * 266MHz but limited to 133MHz by the host

    The HBA given to the NetApp is the QLE2462, which is:
    * PCIe
    * Superiority highlighted before

    It’s important to note that 1GB/s is the PCI-X peak speed, which is easily drowned by a 2-port 4Gb/s HBA, let alone two of them (as per the BoM and config) – totalling 2GB/s for both cards, yet only 1GB/s being available.

    Whereas PCIe has a maximum throughput of 8GB/s, meaning 2 x PCIe x4 HBAs would be using only 2GB/s – a quarter of the available host bus bandwidth.

    Clearly, again, NetApp engineered a superior WG host for themselves and an inferior one for EMC.
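
The bus arithmetic above can be sketched as follows, using the peak figures quoted in this comment (a 1GB/s shared PCI-X bus, 2GB/s per x4 PCIe HBA) and treating each 4Gb/s FC port as roughly 0.5GB/s of raw line rate; this is an illustration of the comment’s reasoning rather than a precise model of either bus:

```python
# Rough comparison of host-bus headroom for two dual-port 4Gb/s FC HBAs.
FC_PORT_GBPS = 4 / 8               # ~0.5 GB/s raw line rate per 4Gb/s port
hba_demand = 2 * 2 * FC_PORT_GBPS  # 2 HBAs x 2 ports each = ~2.0 GB/s

pcix_bus = 1.0                     # GB/s, shared across all PCI-X slots
pcie_bus = 2 * 2.0                 # GB/s, 2.0 GB/s per x4 HBA, point to point

print(f"Demand from 4 FC ports: {hba_demand:.1f} GB/s")
print(f"PCI-X headroom: {pcix_bus - hba_demand:+.1f} GB/s")  # negative: saturated
print(f"PCIe  headroom: {pcie_bus - hba_demand:+.1f} GB/s")  # positive: spare
```

On these figures the PCI-X host is oversubscribed by the FC ports alone, while the PCIe host has bandwidth to spare, which is the asymmetry the comment is objecting to.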

    4. Array Configurations and BoM

    Here are the two BoM’s from EMC and NetApp:

    (because of the lack of formatting in the responses, I’m using “!!!!! ” to highlight stand out areas)

    EMC – Page 13: http://www.storageperformance.org/results/a00059_NetApp_EMC-CX3-M40_full-disclosure-r1.pdf :

    Qty Product U/M Unit List Discount Total Vendor
    1 CX3-40C-FD – SPE-FIELD INSTALL EA $42,300 0% $42,300 see attached third party quotation

    !!!! 11 CX-4PDAE-FD – 4G DAE FIELD INSTALL EA $5,900 0% $64,900 see attached third party quotation

    150 CX-4G15-146 – 146GB 15K 4GB FC EA $1,645 0% $246,750 see attached third party quotation

    !!!! 1 V-CX4014615K – VAULT PACK CX3-40 146GB 15K 4GB DRIVES QTY 5 EA $8,225 0% $8,225 see attached third party quotation

    !!!! 4 FC2-HSSDC-8M – 8M HSSDC2 to HSSDC2 bus cbl EA $600 0% $2,400 see attached third party quotation

    1 PP-WN-KIT – POWERPATH WINDOWS KIT EA $0 0% $0 see attached third party quotation
    1 NAV-ENKIT – NAVI ENTERPRISE MEDIA EA $0 0% $0 see attached third party quotation
    8 NAVAGT-WINKIT – NAVI AGENT WINDOWS MEDIA EA $0 0% $0 see attached third party quotation
    8 UTIL-WIN – Windows Software Utilities EA $40 0% $320 see attached third party quotation
    1 CX34C-KIT – CS3-40C DOCS AND RTU KIT EA $0 0% $0 see attached third party quotation
    1 C-MODEM-US – CLARIION SERVICE MODEM-US EA $0 0% $0 see attached third party quotation
    1 NAV34-EN – NAVI MGR CX3-40 ENTPR LIC EA $58,000 0% $58,000 see attached third party quotation
    !!!! 1 PP-WN-WG – PPATH WINDOWS WGR EA $1,440 0% $1,440 see attached third party quotation

    !!!!! 1 PS-BAS-PP1 – POWERPATH 1HOST QS EA $1,330 0% $1,330 see attached third party quotation
    !!!!! 1 PS-BAS-PMBLK – POWERPATH 1HOST QS EA $1,970 0% $1,970 see attached third party quotation

    1 M-PRESW-001- premium software support EA $33,929 0% $33,929 see attached third party quotation
    1 M-PRESW-004 – premium software support – open SW EA $777 0% $777 see attached third party quotation
    1 WU-PREHW-001- premium hardware support EA $31,317 0% $31,317 see attached third party quotation

    !!!!! 2 QLA2462-E-SP – 2 PORT 4GB PCI-X EA $1,700 0% $3,400 see attached third party quotation

    !!!!! 2 Brocade 16-Port 200e FC Full Fab Switch,-C,R5 EA $8,700 0% $17,400 Network Appliance, Inc.
    !!!!! 2 BSWITCH-16PORT-R5 HW Support,Premium,4hr,y mths:36 EA $1,697 0% $3,393 Network Appliance, Inc.
    2 BSWITCH-16PORT-R5 SW Subs,Premium,4hr,y mths:36 EA $0 0% $0 Network Appliance, Inc.

    Hardware Total $385,375
    Software Total $59,760
    Services Total $3,300
    prepaid software maintenance (3YR-4HOUR) $34,706
    hardware warranty upgrade summary (3YR 4HOUR) $34,710
    Total Price $517,851

    NetApp Page 14: http://www.storageperformance.org/results/a00057_NetApp_FAS3040_full-disclosure-r1.pdf :

    Storage System Ext Qty List Price Disc % Net Price Ext Net Price
    SES-SYSTEM Support Edge Services Attach PN 1 $0.00 0 $0.00 $0.00
    X1941A-R6-C Cable,Cluster 4X,Copper,5M,-C,R6 2 $97.00 0 $97.00 $194.00
    X2055A-R6-C HBA,FC,2-Port,4Gb,Disk,Optical,PCIe,-C,R6 4 $2,300.00 0 $2,300.00 $9,200.00
    X505-R6-C System Lift Handle,Detachable,-C,R6 2 $0.00 0 $0.00 $0.00
    X5515A-R6-C Rackmount Kit,4N2,DS14-Middle,-C,R6 12 $100.00 0 $100.00 $1,200.00

    !!!! X6530-R6-C Cable,Patch,FC SFP to SFP,0.5M,-C,R6 16 $0.00 0 $0.00 $0.00

    X6536-R6-C Cable,Optical,50u,2GHz/KM/MM,LC/LC,5M,-C,R6 12 $150.00 0 $150.00 $1,800.00
    X6539-R6-C SFP,Optical,4.25Gb,-C,R6 8 $120.00 0 $120.00 $960.00
    X800E-R6-C Power Cable North America,-C,R6 24 $0.00 0 $0.00 $0.00
    DOC-3XXX-C Documents,3XXX,-C 1 $0.00 0 $0.00 $0.00
    FAS3040AS-BASE-R5-C FAS3040A,IB,ACT-ACT,SAN,OS,-C,R5 2 $16,700.00 0 $16,700.00 $33,400.00
    FCP Onboard Target Ports,Quantity 4 $0.00 0 $0.00 $0.00
    LOOPS Storage Loops Attached Quantity 4 $0.00 0 $0.00 $0.00
    MULTIPATH-C Multipath configuration 1 $0.00 0 $0.00 $0.00
    X74015B-ESH4-R5-C DS14MK4 SHLF,AC,14x144GB,15K,B,ESH4,-C,R5 10 $27,418.00 0 $27,418.00 $274,180.00
    SW-T4C-CLUSTERSAN-C CFO Software,T4C,SAN Bndl 2 $4,175.00 0 $4,175.00 $8,350.00
    SW-T4C-FCPSAN-C FCP Software,T4C,SAN Bndl 2 $0.00 0 $0.00 $0.00
    SW-T4C-ISCSISAN-C iSCSI Software,T4C,SAN Bndl 2 $0.00 0 $0.00 $0.00
    SW-ONTAP4-3XXX SW,DataONTAP4,3XXX 2 $0.00 0 $0.00 $0.00
    SVC-A-IN-NBR-Z HW Support,Premium,4hr,z Mths: 36 1 $64,775.49 0 $64,775.49 $64,775.49
    SW-SSP-A-IN-NBR-Z SW Subs,Standard Replace,Inst,NBD,z Mths: 36 1 $3,006.00 0 $3,006.00 $3,006.00

    Storage Subtotal $397,065.49
    Host Attach Hardware and Software
    SW-DSM-MPIO-WINDOWS 1 $0.00 0 $0.00 $0.00
    X6518A-R6 Cable,Optical,LC/LC,5M,R6 4 $150.00 0 $150.00 $600.00

    !!!! X1089A-R6 HBA,QLogic QLE2462,2-Port,4Gb,PCI-e,R6 2 $2,615.00 0 $2,615.00 $5,230.00

    !!!! SW-DSM-MPIO-WIN Software,Data ONTAP DSM for Windows MPIO 1 $1,000.00 0 $1,000.00 $1,000.00
    SW-FAK-WIN FCP Windows Host Utilities 1 $75.00 0 $75.00 $75.00
    SW-SSP-DSM-MPIO-WIN SW Subs,Data ONTAP DSM for Windows MPIO Mths: 36 1 $360.00 0 $360.00 $360.00
    X1611A-R5-C Brocade 16-Port 200e FC Full Fab Switch,-C,R5 2 $8,700.00 0 $8,700.00 $17,400.00
    Host Subtotal $24,665.00
    Total $421,730.49

    Now here are the interesting bits

    Firstly, costs:

    - EMC: NetApp included the HBAs, switches and multipathing as part of the EMC array costs:
    “!!!! 1 PP-WN-WG – PPATH WINDOWS WGR EA $1,440 0% $1,440 see attached third party quotation
    !!!!! 2 QLA2462-E-SP – 2 PORT 4GB PCI-X EA $1,700 0% $3,400 see attached third party quotation
    !!!!! 2 Brocade 16-Port 200e FC Full Fab Switch,-C,R5 EA $8,700 0% $17,400 Network Appliance, Inc.
    !!!!! 2 BSWITCH-16PORT-R5 HW Support,Premium,4hr,y mths:36 EA $1,697 0% $3,393 Network Appliance, Inc.
    2 BSWITCH-16PORT-R5 SW Subs,Premium,4hr,y mths:36 EA $0 0% $0 Network Appliance, Inc.”

    - NetApp: NetApp added the HBAs, switches and multipathing as add-on costs:
    Host Attach Hardware and Software
    “SW-DSM-MPIO-WINDOWS 1 $0.00 0 $0.00 $0.00
    X6518A-R6 Cable,Optical,LC/LC,5M,R6 4 $150.00 0 $150.00 $600.00
    !!!! X1089A-R6 HBA,QLogic QLE2462,2-Port,4Gb,PCI-e,R6 2 $2,615.00 0 $2,615.00 $5,230.00
    !!!! SW-DSM-MPIO-WIN Software,Data ONTAP DSM for Windows MPIO 1 $1,000.00 0 $1,000.00 $1,000.00”

    - EMC: NetApp included Professional Services in the EMC costs:
    !!!!! 1 PS-BAS-PP1 – POWERPATH 1HOST QS EA $1,330 0% $1,330 see attached third party quotation
    !!!!! 1 PS-BAS-PMBLK – POWERPATH 1HOST QS EA $1,970 0% $1,970 see attached third party quotation
    (Who the hell needs PS to install PowerPath? And For that matter, who needs a Project management block for 1 host?) (I hope they got their money’s worth!)

    - NetApp: There were no Professional Services costs included

    No wonder the EMC came out more expensive!!!!

    Secondly, Cabling:

    - EMC: NetApp included 4 x 8 meter HSSDC2 cables for connection from the array to the first disk shelf of each bus, with 1m cables from then on:
    “!!!! 4 FC2-HSSDC-8M – 8M HSSDC2 to HSSDC2 bus cbl EA $600 0% $2,400 see attached third party quotation”

    (added costs in using 8m cables!)

    - NetApp: NetApp included 16 x 0.5 meter HSSDC2 cables for connection from the array to the first disk shelf of each bus and 0.5 meter from then on:
    “!!!! X6530-R6-C Cable,Patch,FC SFP to SFP,0.5M,-C,R6 16 $0.00 0 $0.00 $0.00”
    Now, this might not seem like a big deal, but 8m cables are reserved for very difficult scenarios, such as spanning many racks to join shelves to the array; they are never used in latency-sensitive configurations, and here's why:

    Fibre and copper have similar propagation latencies of roughly 5ns per meter.

    For an 8m cable, that translates to 80ns round-trip (the EMC config); whereas
    for a 0.5m cable, it's 5ns round-trip, or 2.5ns each way (the NetApp config).

    Extend that to a mirrored system with 2 busses and the EMC config carries 160ns of round-trip delay, then add every meter and enclosure after that (up to 0.005ms port-to-port).

    Now I want to state this again: EMC never uses 8m cables except in extreme circumstances, and never when low latency is needed!
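    As a sanity check on those numbers, the round-trip figures work out like this (a back-of-envelope sketch only, using the 5ns-per-meter rule of thumb quoted above):

    ```python
    # Back-of-envelope propagation delay comparison.
    # Assumed figure: signal speed in fibre/copper is roughly 5 ns per meter.

    NS_PER_METER = 5.0

    def round_trip_ns(cable_len_m: float) -> float:
        """Round-trip propagation delay for one cable run, in nanoseconds."""
        return 2 * cable_len_m * NS_PER_METER

    emc_per_bus = round_trip_ns(8.0)      # 8 m array-to-first-shelf cable -> 80 ns
    netapp_per_bus = round_trip_ns(0.5)   # 0.5 m cable -> 5 ns

    # A mirrored system with two busses doubles the exposure:
    print(emc_per_bus * 2)     # 160.0 ns for the EMC config
    print(netapp_per_bus * 2)  # 10.0 ns for the NetApp config
    ```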

    It's clear NetApp engineered the EMC to have as slow a bus as possible compared to the NetApp!

    Thirdly, Bus Layout:

    EMC:

    In the diagram Page 14: http://www.storageperformance.org/results/a00059_NetApp_EMC-CX3-M40_full-disclosure-r1.pdf:

    You can clearly see that NetApp show they use only 11 DAEs for the configuration:
    Left (bus 0):
    1 x Vault Pack with 5 disks
    5 x DAE with 75 disks

    Right (bus 1):
    5 x DAE with 74 disks

    But when I look at the RAID group configuration, I see that they use all 12 DAEs from the BoM, different from the stated configuration:

    !!!! 1 V-CX4014615K – VAULT PACK CX3-40 146GB 15K 4GB DRIVES QTY 5 EA $8,225 0% $8,225 see attached third party quotation
    +
    !!!! 11 CX-4PDAE-FD – 4G DAE FIELD INSTALL EA $5,900 0% $64,900 see attached third party quotation

    That makes 12 DAEs. Who cares? You'll see!

    If we look at Page 60: http://www.storageperformance.org/results/a00059_NetApp_EMC-CX3-M40_full-disclosure-r1.pdf

    Create Raid Groups: We see that the first raid group (RG0) Mirror Primary starts at 0_1_0 and the Mirror Secondary starts at 1_3_0 (x_x_x is Bus_Enclosure_Device/Disk):

    naviseccli -User admin -Password password -Scope 0 -h 10.61.162.55 createrg 0 0_1_0
    1_3_0 0_1_1 1_3_1 0_1_2 1_3_2 0_1_3 1_3_3 0_1_4 1_3_4 0_1_5 1_3_5

    And as we go further down, we see the last raid group (RG11) Mirror Primary starts at 0_4_12 and the Mirror Secondary starts at 1_6_12 (x_x_x is Bus_Enclosure_Device/Disk):

    naviseccli -User admin -Password password -Scope 0 -h 10.61.162.55 createrg 11
    0_4_12 1_6_12 0_4_13 1_6_13 0_5_12 1_7_12 0_5_13 1_7_13 0_1_14 1_3_14 0_2_14 1_4_14

    Now, I first read that as a typo: why wouldn't 0_1_0 be mirrored to 1_0_0 or 1_1_0?
    What happened to the 3 shelves before it on that bus?

    Because by placing the mirror pair further down the chain, you increase the latency to reach the mirror disk, drastically increasing the service time.

    There is no reason to do so other than to engineer slowness! No one in their right mind would do this!
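    To see the asymmetry concretely, the Bus_Enclosure_Disk addresses from the createrg commands above can be parsed and compared with a short script (a sketch; the `parse` helper and the RG0 list are just illustrations of the disclosed layout):

    ```python
    # Hypothetical helper: parse Bus_Enclosure_Disk addresses from the
    # disclosed createrg commands and report how far apart each mirror
    # pair sits on its bus.

    def parse(addr: str):
        """Split a 'bus_enclosure_disk' address into integers."""
        bus, enc, disk = (int(x) for x in addr.split("_"))
        return bus, enc, disk

    # RG0 from the full-disclosure report: primary/secondary alternate.
    rg0 = ["0_1_0", "1_3_0", "0_1_1", "1_3_1", "0_1_2", "1_3_2"]

    pairs = list(zip(rg0[0::2], rg0[1::2]))
    for primary, secondary in pairs:
        _, p_enc, _ = parse(primary)
        _, s_enc, _ = parse(secondary)
        # The secondary sits two enclosures further down its bus than the
        # primary does on its own: extra shelf hops on every mirrored write.
        print(primary, "->", secondary, "enclosure offset:", s_enc - p_enc)
    ```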

    NetApp engineered the EMC to have a slow Backend!

    Fourthly, RAID Group and LUN Configuration:

    When it came to the RAID group and LUN layout, this is where it got even worse:

    NetApp deliberately Long Stroked the EMC Disks: Page 60: http://www.storageperformance.org/results/a00059_NetApp_EMC-CX3-M40_full-disclosure-r1.pdf

    EMC: NetApp, it seems, also limited the performance of each EMC RAID group by using only 12 disks per group; with RAID 1/0 mirroring, that's basically 6 disks' worth of performance. E.g.:

    naviseccli -User admin -Password password -Scope 0 -h 10.61.162.55 createrg 11
    0_4_12 1_6_12 0_4_13 1_6_13 0_5_12 1_7_12 0_5_13 1_7_13 0_1_14 1_3_14 0_2_14 1_4_14

    They then long stroked each RAID group by slicing each one into 3 LUNs, for a total of 36 LUNs. E.g.:
    naviseccli -User admin -Password password -Scope 0 -h 10.61.162.55 bind r1_0 0 -rg 0
    -cap 296 -sp a

    naviseccli -User admin -Password password -Scope 0 -h 10.61.162.55 bind r1_0 1 -rg 0
    -cap 296 -sp a
    naviseccli -User admin -Password password -Scope 0 -h 10.61.162.55 bind r1_0 2 -rg 0
    -cap 65 -sp a
    naviseccli -User
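    Tallying the bind commands in the disclosure (two 296GB LUNs and one 65GB LUN per RAID group, repeated across all 12 groups) gives the overall LUN layout; a quick sketch:

    ```python
    # Tally of the LUN layout described above: 12 RAID 1/0 groups,
    # each sliced into three LUNs (-cap values from the bind commands).

    RAID_GROUPS = 12
    LUNS_PER_RG = [296, 296, 65]  # GB, as bound in the disclosure

    total_luns = RAID_GROUPS * len(LUNS_PER_RG)
    total_gb = RAID_GROUPS * sum(LUNS_PER_RG)

    print(total_luns)  # 36 LUNs across the array
    print(total_gb)    # 7884 GB bound into LUNs
    ```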

    Whereas NetApp created one very large aggregate:

    Page 62: http://www.storageperformance.org/results/a00057_NetApp_FAS3040_full-disclosure-r1.pdf
    create aggregate with the following configuration:
    - aggr0 settings, 4 rgs, rg sizes (1×18 + 3×17), 1 spare
    - aggr0 options:
    – nosnap=on
    - set snap reserve = 0 on aggregate aggr0
    - set snap sched to 0 0 0 on the aggregate aggr0

    Now, typically we (the storage industry) quote a 15k spindle as being good for ~180 IOPS on average, but we know it's more like ~250 IOPS at the outer edge of the disk and ~100 IOPS at the inner edge, so the average is about 180 end to end.
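    Assuming per-disk IOPS falls off roughly linearly from the outer edge to the inner edge (an assumption for illustration, using the ~250/~100 figures above), the short-stroking benefit can be sketched like this:

    ```python
    # Illustrative short-stroking estimate. Assumed figures from the text:
    # ~250 IOPS at the outer edge of a 15k spindle, ~100 IOPS at the inner
    # edge, falling off roughly linearly with stroke depth.

    OUTER_IOPS = 250.0
    INNER_IOPS = 100.0

    def avg_iops(stroke_fraction: float) -> float:
        """Mean per-disk IOPS when only the outer `stroke_fraction` of
        the platter is used."""
        innermost = OUTER_IOPS - (OUTER_IOPS - INNER_IOPS) * stroke_fraction
        return (OUTER_IOPS + innermost) / 2

    print(avg_iops(1.0))  # full stroke (the EMC config)  -> 175.0 IOPS
    print(avg_iops(0.5))  # half stroke                   -> 212.5 IOPS
    ```

    Which is why using only the outer portion of each spindle (short stroking) flatters per-disk throughput, while forcing the competitor to sweep the whole platter (long stroking) drags it down.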

    NetApp engineered the EMC to utilise all but 23GB of disk when presented to the host > Volume > ASU.

    NetApp created an aggregate many times larger than needed, meaning that NetApp SHORT STROKED their own array!

    There is no conceivable reason for the EMC to have so many small RAID groups with small LUNs in them; why not just present a LUN from each RAID 1/0 group, or even stripe inside the array?

    The only reason is to make sure the EMC utilised the full length of the disks, i.e. was LONG STROKED.

    Examining the SPC-1 specification reveals the following:
    http://www.storageperformance.org/specs/SPC-1_v1.11.pdf
    2.6.8 SPC-1 defines three ASUs:
    - The Data Store (ASU-1) holds raw incoming data for the application system. As the
    application system processes the data it may temporarily remain in the data store,
    be transferred to the user store, or be deleted. The workload profile for the Data
    Store is defined in Clause 3.5.1. ASU-1 will hold 45.0% (+-0.5%) of the total ASU
    Capacity.
    - The User Store (ASU-2) holds information processed by the application system and
    is stored in a self-consistent, secure, and organized state. The information is
    principally obtained from the data store, but may also consist of information created
    by the application or its users in the course of processing. Its workload profile for the
    User Store is defined in Clause 3.5.2. ASU-2 will hold 45.0% (+-0.5%) of the total
    ASU Capacity.
    - The Log (ASU-3) contains files written by the application system for the purpose of
    protecting the integrity of data and information the application system maintains in
    the Data and User stores. The workload profile for the Log is sequential and is
    defined in Clause 3.5.3. ASU-3 will hold 10.0% (+-0.5%) of the total ASU Capacity.

    So: 45.0% for ASU-1, 45.0% for ASU-2 and 10.0% for ASU-3.

    By spreading the benchmark over the entire length of the EMC disks but only a fraction of the length of the NetApp disks, NetApp long stroked the EMC while short stroking their own array, it seems.
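    The 45/45/10 split means the ASU sizes follow mechanically from the total ASU capacity; for example (the 1000GB total here is just an illustration, not a figure from either submission):

    ```python
    # SPC-1 ASU capacity split per clause 2.6.8: 45% / 45% / 10%.

    ASU_SPLIT = {
        "ASU-1 (Data Store)": 0.45,
        "ASU-2 (User Store)": 0.45,
        "ASU-3 (Log)": 0.10,
    }

    def asu_capacities(total_gb: float) -> dict:
        """Capacity each ASU receives for a given total ASU capacity."""
        return {name: total_gb * frac for name, frac in ASU_SPLIT.items()}

    # Example: a hypothetical 1000 GB total ASU capacity.
    for name, gb in asu_capacities(1000.0).items():
        print(name, gb)  # 450.0 / 450.0 / 100.0
    ```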

    • November 23, 2011 at 10:05 am | #8

      Thanks for the long and considered reply; it will take me some time to give it the attention it deserves, and I'm a little strapped for time right now. I'd be happy to continue this conversation on this thread, or to start a new post either here or on another blog or any other platform.

      I believe this kind of debate is important. I’m also happy to discuss this in person if you’d prefer.

      Regards
      John

  5. ausstorageguy
    November 28, 2011 at 11:26 am | #9

    Hi John,

    No problem. Sorry mate, the response didn't quite come out looking how I intended.

    I have now posted it as a full writeup here: http://ausstorageguy.wordpress.com/2011/11/26/dragging-up-ancient-history-how-netapp-fooled-everyone-fas3040-v-cx3-40/

    Regards,
    Aus Storage Guy.

