A very long reply
This blog post is essentially a very long comment reply to Darius Zheng at Oracle on his blog
I suspect it is so long that it probably needs more formatting to be readable, so I’m posting it here too
Thanks for posting the reply, again, I think you’re missing my point
DZ … “Oracle has 2.5x Performance for 1/2 the cost of a Netapp”
A more accurate statement is that “The Oracle 7420 attained a benchmark result that was 2.5x better for 1/2 the _List Price_ of a NetApp 3270 array in 2011 that had significantly less hardware.
What this says is that Oracles List Price is a significantly lower than NetApp’s List Price. You could say the same thing about the difference between the price of a Hyundai i30 and an Audi A3.
Secondarily, as I pointed out in previous comments the rest of the results also says that the Oracle solution make relatively inefficient usage of CPU and Memory when compared to a NetApp system that achieves similar performance.
Yes the list price of the NetApp system is significantly higher than an equivalently performing 7420, but this is a marketing and pricing issue, not a technical one. In general I like to stick to technical merits, because pricing is a fickle thing that can be adjusted at the stroke of a pen, technology requires a lot more work to get things right.
In the end, how this list price differentiation translates into what these solutions will actually cost a customer is highly debatable. I do a LOT of research into street prices as part of my job, and in general storage is increasingly purchased as part of an overall upgrade and this is where the issues get murky very quickly as margins are moved around various components within the infrastructure to subsidise discounting in other areas. Having said that, I will let you in on something, based on the data I have, for many quarters, in terms of average $/RAW TB paid by customers in my market, Oracle customers paid about 25% MORE for V7000 storage than paid by Netapp customers for storage on FAS32xx and that only recently did Oracle begin to reach pricing parity with NetApp. We could argue the ways the analyst arrived at those figures, but from my analysis the trend is clear across almost all vendors and array families vis. The correlation between customer $/TB is strongly correlated with the implied manufacturing costs, and very poorly correlated with the vendor list prices. The main exceptions to this are new product introductions when there is a compelling new and unique value propositions (e.g. DataDomain) or when vendors buy business at very low or even negative margin in order to seed the market (e.g. XIV in the early days)
Now personally, I disagree with NetApp’s list pricing policy, however there are reasons why that list price is so much higher than the actual street price most people pay. Many of those reaons have to do with boring things like long term pricing contracts. If you’d like to turn this into a marketing discussion around pricing strategies, I’m cool with that, but I don’t think the people that read either of our blogs are overly interested. However I will say this again, the price people pay in the end, has more to do with the costs of manufacture, and a solution that gets more performance out of less hardware will generally cost the customer less, especially if the operational expenses are lower.
DZ .. “Why wouldn’t a customer want more CPU and Cache?”
Why would someone want less CPU or Cache ? … because it costs them less, either in street pricing terms, or in the cost of powering or cooling them. And yes, I believe that that a 7420 controller with more than eleven times as many CPU cores, and more than one hundred and sixty times as much DRAM will chew a lot more power and cooling than a 3270 controller.
It’s not just the cost of the electricity (carbon footprint and green ethics aside), its also the opportunity cost of using that power for something else. Data centers have finite resources for power and many (most) are very close to the point where you cant add more systems. In those environments, Power hungry systems that aren’t running business generating applications are not viewed kindly.
JM Interpretaiton of DZ … “Happy to do a power consumption comparison, where is the netapp information ?”
I’ve answered that in a simlar question to me on my blog at storagewithoutborders.com – See the blog-post URL in a previous comment re getting access to power consumption figures.
DZ .. You say the Netapp cache is SO efficient and you talk about an old non relevant 3160 SPEC SFS post
I referenced the “non relevant 3160 SPEC SFS post” because it is relevant, being the place where NetApp tested the same controller with a combination of flash acceleration, no flash acceleration, with both SATA, and FC/SAS spindles. The specific one I referenced was the most comparable configuration that includes flash and a 300GB 15K disks which as I pointed out achieved 1080 IOPS/15K Spindle with a cache that was 7.6% of the fileset size.
If you prefer I could have used the more recent (though still old) 6240 dual node config which uses 450GB 15K disks and achieved cache that achieved 662 IOPS per drive but with a cache that was a mere 4.5% of the file-set size, or the 24 Node 6240 config which achieved 875 IOPS per drive with a cache that was 7.6% of the file-set size. As you can see a modest amount of flash improves the IOPS/disk enormously, and there is a good correlation between more flash as a percentage of the working sets and better results in terms of IOPS/Disk. Before you ask, as far as I can tell, the main reason for the difference in IOPS/spindle between the 24 Nodes 6240 and the old 3160 with a similar cache size as a percentage of the fileset, is that NetApp’s scale-out benchmark used worst case paths to from the client to the data to provide a squeaky clean implementation of SPEC’s uniform access rule.
DZ .. “You fail to mention that the 3270 gets a MEASLY 281 IOPS per drive and that the 3250 gets a whopping 300 IOPS per drive. So your point is that the 3250 was done to compare with the 3270? What was the 3270 done for?”
Neither the 3270, nor the 3250 benchmarks used flashcache, so the IOPS/spindle are going to be good, but not stellar. I don’t know exactly why we didn’t use flash in the old 3270 benchmark maybe its because SPEC-SFS is a better indication of CPU and metadata handling than it is about reads and writes to disk, and like I said, we’d already proved the effectiveness of our flash based caching with the series of 3160 benchmarks.
Going in to the future, I doubt NetApp will do another primary benchmark without flash, but its worth saying again, that the 3250 was done to show performance equivalency with the 3270, so that configuration was as close to identical as NetApp could, and that meant neither the 3270 or the 3250 benchmarks used Flash to improve the IOPS/disk. If NetApp had done it, I have every reason to believe that the results would have been in line with the 3160 and 6240 benchmarks referenced above.
DZ .. “I thought the purpose of a benchmark was to compare many vendors systems against each other with the workload remaining consistent?”
NetApp tends to use benchmarks as ways of demonstrating how much a their technology has improved against a previous NetApp baseline, to help their customers make good purchasing decisions. Proving they’re better than someone else is not a primary consideration, though often that is a secondary effect. Oracle is free to use their benchmarks in any way they choose, personally I’d love to see a range of configurations from each technology bench-marked rather than just sweetspots, maybe opensfs and netmist will bring this about, but the fact is, running open, verifiable, and fairly comparable benchmarks is expensive and time consuming and I will probably never see enough good engineering data published. If you’ve got some ideas to simplify this, I’d love to work with you on this (seriously, we might compete against each other, but we both clearly care about this stuff, not many do)
DZ .. With that in mind the Oracle 7420 still crushes the netapp in price, efficiency and performance. I am guessing we are also still better or comparable in power usage as well.
You’ll see from the above that I respectfully disagree with pretty much everything in that last statement, and I’m looking forward to that controller power usage comparison