Archive for the ‘Uncategorized’ Category

Movember …

I have come to clean the pool ....

It’s now the end of Movember, which for me was a time to focus on men’s health, and in particular mental health. Most of you will know that I can be a moody old thing from time to time, and some of you will know the impact that depression has on young men in particular, and how it caused me to lose a number of good friends.

To show my commitment, I donated my face to the cause by growing a moustache for the entire month of November. My Mo sparked a number of conversations, and no doubt generated some laughs; all in the name of raising vital awareness and funds for prostate cancer and male depression.

Why am I so passionate about men’s health?

  • 1 in 9 men will be diagnosed with prostate cancer in their lifetime
  • This year 20,000 new cases of the disease will be diagnosed
  • 1 in 8 men will experience depression in their lifetime

I’m asking you to support my Movember campaign by making a donation by either:

  • Donating online at: http://mobro.co/JohhMartin
  • Writing a cheque payable to ‘Movember’, referencing my Registration ID: 2207632 and mailing it to: Movember, PO Box 60, East Melbourne, VIC, 8002

If you’d like to find out more about the type of work you’d be helping to fund by supporting Movember, take a look at the Programs We Fund section on the Movember website: http://au.movember.com/about

Thank you in advance for supporting my efforts to change the face of men’s health.

Please donate to Movember
Categories: Uncategorized

How mid-sized businesses can make smart decisions on technology

This post originally appeared in the ABC Tech and Games blog

The IT transformation currently occurring in the market thanks to cloud computing and the wide adoption of shared IT infrastructure seems to be predominantly affecting the large enterprise sector. But while this big-business IT revolution is going on, there is a flow-on effect to the mid-size market, which is also tackling unprecedented data growth whilst struggling to assess the benefits of a move to cloud computing. The technology challenge for mid-size enterprises (MSEs) is in understanding how best to optimise their IT environments for both efficiency and scale, so they can be comfortable in the knowledge that they’ve made the right decisions for their business in the longer term.

What is a mid-size business?

  • Generally 100-1000 staff
  • 1-3 IT staff who are generalists, not specialists
  • IT staff typically have responsibility for the entire IT infrastructure

For the most part, the IT vendor community in Australia tends to focus on the big end of town, with multinational organisations dominating the landscape. When you then consider that 73 per cent (source: Reckon 2010 Annual SMB Business Survey) of the Australian economy is based on small businesses, it seems unsurprising that mid-size enterprises, which fall between these two market segments, often seem to be forgotten.

This means many MSEs are either reaching the limits of technology designed for small business, with the resulting reliability and management headaches, or they’re paying too much to use specialist driven IT solutions aimed at the big end of town in an effort to avoid the problems they’ve just escaped from trying to do too much with SMB focused technology. These high end technologies with dedicated and siloed functionality aren’t well suited to mid-sized enterprises, not only because of their inherently high costs and inefficiencies, but also because the IT employees in these organisations are usually generalists. They need to know how all the company’s systems work, and how to fix them if they break, and they simply don’t have the time to gain the specialist expertise needed to get the most out of these solutions.

All of this puts the purchaser of IT for mid-size enterprises in an unenviable position. With many vendors rapidly jumping onto the cloud bandwagon, mid-sized enterprises that still need internal IT, but not at the kind of scale that would allow “internal clouds”, are seeing a lot of turbulence in the supplier marketplace, none of which seems to be helping them. Many vendors are changing the way they are going to market and moving their focus away from their products for mid-sized business towards “cloudy” futures, and are no longer investing in the kind of innovation required by this challenging business environment.

While this is a worry, many of the traditional solutions offered by technology vendors to mid-sized enterprises never really met their specific needs and challenges effectively. In an effort to win business in these very budget-conscious organisations, vendors often offer mid-size businesses commodity-based solutions with low upfront costs, without fully informing them that many of these solutions cannot continue to meet their business needs as their company grows.

In the midst of this gloom, one piece of good news is that in general, unlike many other areas of the marketplace that are facing budget constraints, mid-size business budgets are still reasonably healthy, though not infinite. Over the last few years, many MSEs have successfully focused on cost containment. For example, a large percentage have already taken the virtualization path; in fact, the virtualization trend is moving faster now than ever before. MSEs have seen great savings from these efforts, but most are now seeing that they’ve saved about as much as they can from consolidating and virtualizing their compute infrastructure, while at the same time they are seeing a steady increase in the amount of money and percentage of their IT budget they spend on data storage.

As the focus moves towards optimising data storage costs, mid-size businesses are looking for ways to reproduce the cost savings benefits they have gained in virtualizing their compute capacity. They are also looking for ways to optimize their environments to achieve more, address the data growth they are seeing and gain competitive advantage. These factors have huge implications for a company’s IT and in particular its data storage infrastructure requirements. However, these issues also provide a great opportunity to enhance IT systems to poise the company for growth. By demanding solutions that solve important and difficult challenges without undue complexity, and that are powerful and scalable enough for the future, MSEs can gain great return on investment and get more from their suppliers.

The test for mid-size companies is to make smart decisions on technology that is genuinely efficient, provides simplicity and offers scalability to meet growth requirements. The goal of MSEs needs to be focused around a technology foundation that will maintain pace with the growing business demands and allow the company to do more with fewer resources.

So, what’s the lesson? Mid-size businesses are poised to continue growth and dominate in the market. The things they need to look out for in the technology arena are:

  • solutions that can scale with their business needs – up and down – to protect their initial IT investment
  • simple technologies that can be managed by IT generalists, yet still provide good cost of ownership and advanced enterprise-level capabilities
  • partner businesses who can help them make smart decisions about the longer-term IT strategy, so it aligns properly with business objectives


Breaking Records … Revisited

So today I found out that we’d broken a few records of our own a few days ago, with surprisingly little fanfare, at least from my perspective; the associated press release only came out late last night. I’d like to say that the results speak for themselves, and to an extent they do. NetApp now holds the top two spots, and four out of the top five results on the ranking ladder. If this were the Olympics, most people would agree that this represents a position of PURE DOMINATION. High fives all round, and much chest beating and downing of well deserved delicious amber beverages.

So, apart from having the biggest number (which is nice), what did we prove?

Benchmarks are interesting to me because they are an almost perfect intersection of my interests in technical storage performance and in marketing and messaging. From a technical viewpoint, a benchmark can be really useful, but it only provides a relatively small number of proof points, and extrapolating beyond those, or making generalised conclusions, is rarely a good idea.

For example, when NetApp released their SPC-1 benchmarks a few years ago, it proved a number of things:

1. That under heavy load which involved a large number of random writes, a NetApp array’s performance remained steady over time

2. That this could be done while taking multiple snapshots, and more importantly while deleting and retiring them while under heavy load

3. That this could be done with RAID-6 and with a greater capacity efficiency as measured by RAW vs USED than any other submission

4. That this could be done at better levels of performance than an equivalently configured, commonly used “traditional array” as exemplified by EMC’s CX3-40

5. That the copy on write performance of the snapshots on an EMC array sucked under heavy load (and by implication similar copy on write snapshot implementations on other vendors’ arrays)

That’s a pretty good list of things to prove, especially in the face of considerable unfounded misinformation being put out at the time, and which, surprisingly is still bandied about despite the independently audited proof to the contrary. Having said that, this was not a “my number is the biggest”, exercise which generally proves nothing more than how much hardware you had available in your testing lab at the time.

A few months later we published another SPC-1 result which showed that we could pretty much double the numbers we’d achieved in the previous generation at a lower price per IOP, with what was at the time a very competitive submission.

About two years after that we published yet another SPC-1 result with the direct replacement for the controller used in the previous test (3270 vs 3170). What this test didn’t do was show how much more load could be placed on the system; what it did do was show that we could give our customers more IOPS at a lower latency with half the number of spindles. This was the first time we’d submitted an SPC-1e result, which focussed on energy efficiency. It showed, quite dramatically, how effective our FlashCache technology was under a heavy random write workload. It’s interesting to compare that submission with the previous one for a number of reasons, but for the most part, this benchmark was about FlashCache effectiveness.

We did a number of other benchmarks, including SPEC SFS benchmarks, that also proved the remarkable effectiveness of the FlashCache technology, showing how it could make SATA drives perform better than Fibre Channel drives, or dramatically reduce the number of Fibre Channel drives required to service a given workload. There were a couple of other benchmarks done which I’ll grant were “hey look at how fast our shiny new boxes can run”, but for the most part, these were all done with configurations we’d reasonably expect a decent number of our customers to actually buy (no all SSD configurations).

In the meantime EMC released some “Lab Queen” benchmarks. At first I thought that EMC were trying to prove just how fast their new X-Blades were at processing CIFS and NFS traffic. They did this by configuring the back end storage system in such a ridiculously overengineered way as to remove any possibility that it could cause a bottleneck; either that, or EMC’s block storage devices are way slower than most people would assume. From an engineering perspective I think the guys in Hopkinton who created those X-Blades did a truly excellent job; almost 125,000 IOPS per X-Blade using 6 CPU cores is genuinely impressive to me, even if all they were doing was processing NFS/CIFS calls. You see, unlike the storage processors in a FAS or Isilon array, the X-Blade, much like the Network Processor in a SONAS system or an Oceanspace N8500, relies on a back end block processing device to handle RAID, block checksums, write cache coherency and physical data movement to and from the disks, all of which is non-trivial work. What I find particularly interesting is that in all the benchmarks I looked at for these kinds of systems, the number of back end block storage systems was usually double that of the front end, which implies to me either that the load placed on the back end systems by these benchmarks is higher than the load on the front end or, more likely, that the front end / back end architecture is very sensitive to any latency on the back end systems, which means the back end systems get overengineered for benchmarks. My guess, after seeing the “All Flash DMX” configuration, is that Celerra’s performance is very adversely affected by even slight increases in latency in the back end, and that we start seeing some nasty manifestations of Little’s Law in these architectures under heavy load.
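The Little’s Law point can be made concrete with a small calculation. The sketch below is only an illustration: the number of outstanding I/Os and the latencies are made-up values, not measurements from any of the benchmarks discussed, and `max_iops` is a hypothetical helper name. It shows that when a front end can only keep a fixed number of I/Os in flight, throughput falls in direct proportion to any increase in back end latency.

```python
def max_iops(outstanding_ios: int, latency_ms: float) -> float:
    """Little's Law: concurrency = throughput x latency,
    so throughput = concurrency / latency."""
    return outstanding_ios * 1000.0 / latency_ms


# Hypothetical front end that can keep 256 I/Os in flight to the back end.
for latency_ms in (0.5, 1.0, 2.0, 4.0):
    print(f"{latency_ms:>4} ms back end latency -> {max_iops(256, latency_ms):>9,.0f} IOPS")
```

Under these assumptions, every doubling of back end latency halves the achievable IOPS, which would be one reason to overprovision the back end for a benchmark run.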

A little while later, after being present at a couple of EMC presentations (one at Cisco Live, the other at a SNIA event, where EMC staff were fully aware of my presence), it became clear to me exactly why EMC did these “my number is bigger than yours” benchmarks. Their marketing staff at corporate created a slide that compared all of the current SPEC benchmarks in a way that was accurate, compelling and completely misleading all at the same time, at least as far as the VNX portion goes. Part of this goes back to the way that vendors, including I might say NetApp, use an availability group as a point of aggregation when reporting performance numbers. This is reasonably fair, as adding Active/Active or Active/Passive availability generally slows things down due to the two phase commit nature of write caching in modular storage environments. However, the configuration of the EMC VNX VG8 Gateway/EMC VNX5700 actually involves 5 separate availability groups (1x VG8 Gateway system with 4+1 redundancy, and 4x VNX5700 with 1+1 redundancy). Presenting this as one aggregated performance number without any valid point of aggregation smacks of downright dishonesty to me. If NetApp had done the same thing, then, using only 4 availability groups, we could have claimed over 760,000 IOPS by combining 4 of our existing 6240 configurations, but we didn’t, because frankly doing that is, in my opinion, on the other side of the fine line where marketing finesse falls off the precipice into the shadowy realm of deceptive practice.

Which brings me back to my original question: what did we prove with our most recent submissions? Well, three things come to mind:

1. That NetApp’s ONTAP 8.1 Cluster-Mode solution is real, and it performs brilliantly

2. It scales linearly as you add nodes (more so than the leading competitors)

3. That scaling with 24 big nodes gives you better performance and better efficiency than scaling with hundreds of smaller nodes (at least for the SPEC benchmark)

This is a valid configuration using a single vserver as a point of aggregation across the cluster, and trust me, this is only the beginning.

As always, comments and criticism are welcome.

Regards

John

Some Thoughts on Bit Rot.

November 14, 2010 2 comments

During some recent discussions on Twitter, the topic of disk drive rebuild times for very large drives in excess of 10TB has raised the subject of unrecoverable read errors (also known as UER), which are sometimes blamed on something called “bit rot”. However, two NetApp sponsored studies show that bit rot is far less of a problem for storage array reliability than many other factors.

The best publicly available data I’ve found on bit rot and its impact compared to other causes is contained in “A Highly Accurate Method for Assessing Reliability of Redundant Arrays of Inexpensive Disks (RAID)” by Jon G. Elerath and Michael Pecht, in IEEE Transactions on Computers, Vol. 58, No. 3, March 2009 (http://media.netapp.com/documents/rp-0046.pdf). The following summarizes and paraphrases the information found in that document.

What Bit rot is and why you should care

Bit rot is a concern for two main reasons. For the home user with no RAID protection, it results in the inconvenience of a lost or corrupted file, or possibly a machine that won’t boot. For the enterprise user, bit rot raises the specter not just of a lost or corrupted file, but of the potential to completely lose an entire RAID group after the failure of a single drive due to the “Media Error on Data Reconstruct” problem. The less catastrophic issue is far less of a concern on an enterprise class array, because the additional error detection and correction available through the use of RAID and block level checksums means the chances of bit rot causing the loss or corruption of a file are vanishingly remote.

What I believe most people mean by bit rot could be more accurately described as latent media errors rather than “bit rot”, which is more strictly caused by degradation of the magnetic properties of the media.

The reason for this is that most early RAID reliability models assumed that data will remain undestroyed except by “bit rot”. Although it is correct that the magnetic properties of the media can degrade, this failure mechanism is not a significant cause of lost data. Data can become corrupted any time the disks are spinning, even when data are not being written to or read from the disk. The failure mechanisms outlined below are not unknown, but neither is information about them readily available from HDD manufacturers.

Common Causes for losing data

Four common causes for losing data after it’s been correctly written are “thermal asperities”, scratches, smears, and corrosion.

  • Thermal asperities are instances of high heat for short durations caused by head-disk contact. This is usually the result of heads hitting small “bumps” created by particles embedded in the media surface during the manufacturing process. The heat generated on a single contact may not be sufficient to thermally erase data but may be sufficient after many contacts.
  • Although disk heads are designed to push particles away, contaminants can still become lodged between the head and disk. Hard particles used in the manufacture of an HDD can cause surface scratches and data erasure any time the disk is rotating.
  • Other “soft” materials such as stainless steel can come from assembly tooling. Soft particles tend to smear across the surface of the media, rendering the data unreadable.
  • Corrosion, although carefully controlled, can also cause data erasure and may be accelerated by the heat generated by thermal asperities.

Why data is sometimes not there in the first place

A latent defect can also be caused by data that was incorrectly or incompletely written to the disk in the first place. This can happen because of the inherent “Bit Error Rate” or BER, writing to damaged media, or too much lubrication and “high-fly writes”.

  • The bit error rate (BER) is a statistical measure of the effectiveness of all the electrical, mechanical, magnetic, and firmware control systems working together to write (or read) data. Most bit errors occur on a read command and are corrected, but since written data are rarely checked immediately after writing, bit errors can also occur during writes.
  • BER accounts for a fraction of defective data written to the HDD, but a greater source of errors is the magnetic recording media that coats the disks. Writing on scratched, smeared, or pitted media can result in corrupted data. The reasons for scratches and smears were covered earlier; “pits and voids”, however, are caused by particles that were originally embedded in the media during the manufacturing process and subsequently dislodged during the polishing process or field use.
  • The final common cause for poorly written data is the “high-fly write.” The heads are aerodynamically designed to have a negative pressure and maintain a small, fixed distance above the disk surface at all times. If the aerodynamics are disturbed, the head can fly too high, resulting in weakly (magnetically) written data that cannot be read. In addition to “wind gusts” inside the disk, all disks have a very thin film of lubricant on them to help protect against head-disk contact. While this lubrication helps mitigate the effects of “thermal asperities”, lubrication build-up on the head can increase the flying height, resulting in weak or incomplete writes.

Where’s my data?

Finally, all the data may have been written correctly, but the disk may not be able to “find” it because of damage to the special “servo” tracks which help keep the heads correctly aligned to the data on the disk. In other cases the servo tracks are undamaged, but wear and tear on the motor and disk head bearings, noise, vibration and other electromechanical errors cause the head positioning to take too long to lock onto a track, which ultimately also causes “latent block errors”.

How to protect yourself

There are two main ways of dealing with these kinds of latent block errors. The first is to perform disk scrubs, which is something every reputable array vendor does; the problem, however, is that as disk sizes get larger and larger, a full disk scrub can take too long for the protection to be as effective as it should be. The other method is to use additional levels of RAID protection such as RAID-6, which allows for higher levels of resiliency and error correction in the event of hitting a latent block error when reconstructing a RAID set. NetApp uses both approaches, as studies have shown that the risk of losing data through these kinds of events is thousands of times higher than predicted by most simple “MTBF” failure models.
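To get a feel for why a latent block error during reconstruction matters, here is a rough back-of-the-envelope sketch. It is not the model from the Elerath and Pecht paper; it assumes independent errors at a fixed rate of one unrecoverable error per 10^14 bits read (a figure of the kind quoted on drive spec sheets, used here purely as an assumption), and the RAID group size and drive capacity are hypothetical.

```python
import math


def p_unrecoverable_read(bytes_read: float, errors_per_bit: float = 1e-14) -> float:
    """Probability of at least one unrecoverable read error while reading
    `bytes_read` bytes, assuming independent errors at a fixed per-bit rate
    (a simplification, not the failure model from the paper)."""
    bits = bytes_read * 8
    # 1 - (1 - p)^bits, computed via log1p/expm1 to stay accurate for tiny p
    return -math.expm1(bits * math.log1p(-errors_per_bit))


# Hypothetical 6+1 RAID-5 group of 2 TB drives: rebuilding one failed drive
# means reading all 6 surviving drives in full.
surviving_bytes = 6 * 2e12
print(f"P(read error during rebuild) = {p_unrecoverable_read(surviving_bytes):.2f}")
```

With these illustrative inputs the chance of hitting at least one error during the rebuild comes out at roughly 60%. With single parity that error cannot be corrected, whereas RAID-6 still has a second parity to fall back on.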

 

 


Some quick thoughts about backup

November 11, 2010 1 comment

This is a summary I wrote for someone else, not my usual blog entry, however it does encapsulate my thoughts around the benefits of NetApp’s implementation of replication based backup. I’ll try to get to a more technically focussed version soon.

Replication Based Backups

Exponential increases in data combined with increased storage density techniques mean that traditional “Bulk Copy” based methods of backup are no longer able to address the growing backup challenges of a modern IT environment. Even backup architectures which are based on an “incremental forever” basis may find that the time it takes to move whole files from remote locations over slow WAN links hits scalability limits as the amount of data at the remote sites increases.

Ultimately, the most promising technology to resolve this involves replicating only changed data blocks from primary data sources to secondary arrays in other physical locations. These secondary arrays are equipped with high-density, low cost disk drives to provide large amounts of raw capacity in the densest possible footprint. Once the data has been sent to the remote array, the solution can then perform various kinds of data manipulation to store multiple recovery points within a small data storage footprint. This class of technology generally requires that the primary storage is re-hosted on intelligent storage arrays, or that new agents, secondary storage arrays and backup systems are implemented that support this advanced functionality.

This has the advantage that only changed blocks are moved from primary storage through to the secondary storage on the NearStore. This both reduces the amount of storage that needs to be provisioned and allows the data to be sent over low bandwidth, high latency network connections. Because of this, the secondary copies of the data can be stored automatically in an offsite location without needing a separate two-step process, as is commonly required with tape backups.
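The idea of shipping only changed blocks can be sketched in a few lines. This is a toy illustration and not how SnapVault or any NetApp product is implemented: the 4 KiB block size, the hash comparison and the function names are all assumptions chosen for the example. It compares two versions of a data set block by block, and only the blocks that differ would need to cross the WAN.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def block_hashes(data: bytes) -> list[str]:
    """Hash each fixed-size block of the data."""
    return [hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest()
            for i in range(0, len(data), BLOCK_SIZE)]


def changed_blocks(old: bytes, new: bytes) -> dict[int, bytes]:
    """Return {block index: block contents} for blocks that differ from the old copy."""
    old_hashes = block_hashes(old)
    changes = {}
    for index, digest in enumerate(block_hashes(new)):
        if index >= len(old_hashes) or digest != old_hashes[index]:
            start = index * BLOCK_SIZE
            changes[index] = new[start:start + BLOCK_SIZE]
    return changes


def apply_changes(replica: bytes, changes: dict[int, bytes], new_length: int) -> bytes:
    """Bring the secondary copy up to date using only the shipped blocks."""
    buf = bytearray(replica[:new_length].ljust(new_length, b"\0"))
    for index, block in changes.items():
        start = index * BLOCK_SIZE
        buf[start:start + len(block)] = block
    return bytes(buf)


primary_old = bytes(4 * BLOCK_SIZE)   # four blocks of zeros
primary_new = bytearray(primary_old)
primary_new[5000] = 1                 # one byte changes, inside block 1
delta = changed_blocks(primary_old, bytes(primary_new))
print(f"blocks to send: {sorted(delta)} of {len(primary_new) // BLOCK_SIZE}")
```

In this example a single modified byte results in one 4 KiB block being sent rather than the whole 16 KiB data set, which is the property that makes low bandwidth links workable.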

With robust data storage architectures, multiple logical data points, and offsite copies, these technologies can provide almost all the benefits of a tape based backup solution.

Customer Backup Considerations

Proven Scalability

While a number of companies have been changing their traditional backup engines to leverage the benefits of replication based storage, NetApp pioneered this technology with the release of the industry’s first ATA based backup to disk appliance in 2002. Since then NetApp has deployed this technology in thousands of locations worldwide, many of which protect critical data estates measured in petabytes.

Centralised Policy Based Protection

The advanced data protection capabilities provided by NetApp require a correspondingly advanced set of management methodologies and tools to fully exploit the benefits of replication based data protection. Protection Manager was designed for replication based backup, and provides an integrated way of managing both backup and disaster recovery in a single pane of glass through the following functions and features:

  • Discovery. Detects new volumes that are not protected and presents them as “unprotected data” in the Protection Manager UI.
  • Policy creation. Creates policies for data protection in a wizard-driven graphical process and then calls lower level NetApp tools for execution of the replication process
  • Monitoring. Monitors the whole replication process, watching the capacity and performance against policy, and ensures that protection policies are not out of compliance
  • Visualization.  Provides discovery and mapping views including drilldown and management by exception
  • Reporting. Offers status and health reporting such as a “data transfer report” to identify transfer amount, performance metrics, and duration of transfer for replication processes
  • Virtual machine support. Support through Open Systems SnapVault includes VMware ESX, Microsoft Hyper-V, and Citrix XEN
  • Application integration.  Integrates with SharePoint, SQL Server, Microsoft Exchange, Oracle, and SAP via NetApp SnapManager
  • DR task automation.  Automates tasks, leverages  templates, and provides ongoing monitoring with subsequent reporting to those in authority
  • DR readiness. Monitors resources for changes that could compromise a disaster recovery and proactively communicates them to administrators for remediation
  • One-button failover. Provides continued data access to users, even in the event of a disaster

Usable Copies

Unlike typical backup applications, SnapVault always keeps the data in its original usable format, which can be accessed by open industry standard protocols and methods. Files can be accessed using CIFS, NFS or HTTP, and LUNs can be accessed by iSCSI or Fibre Channel, all without having to restore the data back to the original location (which may destroy good data) or find alternate space to recover the file / data object.

Ease of Restore

Usable copies also provide a self service restore capability that reduces recovery times (RTO), decreases helpdesk calls, and increases end user faith in the backup process. This in turn reduces counterproductive end user driven backup strategies and reduces both infrastructure and business costs. Usable copies also allow backups to be verified for correctness, and provide easy ways of performing deep content searching of backups for legal and other data discovery requests.

Tape Integration

Because the backup data remains in the same format used for traditional primary storage, the high speed NDMP based dump and mirror to tape options used by thousands of companies around the world to protect their NetApp primary storage can be used with the backup copies as well. These long term archival copies can be sent to tape under the control of traditional backup systems such as NetBackup, TSM or CommVault, leveraging existing knowhow and infrastructure while minimising costs associated with tape management and off-siting.

 

 


Data Storage for VDI – Part 10 – Megacaches

Megacaches

More recently a range of products have come to the market that take advantage of the increasing affordability of non volatile memory (particularly SLC Flash), to create caching architectures that change the rules for modular storage (in no particular order)

  • PAM-II / FlashCache
  • Sun 7000 Logzilla and Readzilla
  • FalconStor Flash SAN accelerator (using flash modules from Violin)
  • IBM EasyTier / Something to do with SVC
  • EMC FAST Cache
  • Atlantis Computing vScaler
  • Nimble Storage
  • Lots more to come …

While I’d love to go into the details of each of these and compare the features and benefits of each technology, a lack of time and detailed information makes this really hard to do. Also, as a general principle, I don’t think that it’s wise for an employee of one vendor to make a lot of assertions about another vendor’s technology. I have enough trouble keeping up with what’s happening at NetApp without trying to gain deep subject matter expertise with, for example, HP or EMC’s technology. Having said that, I do think contrasting two different approaches can be useful, so for that reason I’ve decided to deviate from that principle, and will compare as diligently as I’m able NetApp’s FlashCache and EMC FAST Cache.

I’ve included FlashCache for obvious reasons: there is already more than a Petabyte of it out there, I’ve been analyzing it for about a year now, and I have access to the engineering documentation. I chose FAST Cache because being an EMC product means the marketing engine behind it will make it widely known and the engineering will be solid. The market presence and differing approaches of both of these technologies make them a fairly good yardstick against which the other mega-cache technologies will compare themselves.

Part of the reason I took so long to write this post was that I spent a fair time trying to characterise the likely performance benefits of a FAST Cache solution, which as a competitor is a fairly dangerous exercise. I’ve tried to be even handed and fact based when doing this and have disclosed where possible the sources of my information. However, if you believe I’ve misrepresented the technology, please let me know; this is not about vendor bashing, it’s about establishing what I hope is a fair basis of comparison.

Doing this in an even handed fashion was particularly hard because a lot of publicly available information is either incomplete or somewhat contradictory. I know that is an industry wide problem, but this is one area where there seems to be a lot more marketing material than engineering substance. The main sources of my information were blog posts by EMC employees and integrators as well as an official EMC technical report, the details of which, and my takeaways from them, are as follows.

How Fast is FAST for VDI ?

Chad Sakac quoted here in relation to the speed of writing data in various raid configuration  “(I’ve added SSD with 6000 IOps as commented by Chad Sakac).”  while I respect Chads comments to do with EMC’s integration with Vmware, I think he’s might be a little off here, especially given that this comment was made 6 months ago, long before FAST Cache was announced.

Mark Twomey (StorageZilla) says that EFD’s have no additional benefit for writes (I assume this applies mostly to Symmetrix which already does good write optimisation) quoted here where he says “The thing most people don’t understand about Flash is that writes aren’t really all that much faster to a good SSD than they are to a regular disk drive. And thus, predicting where writes are going isn’t an objective of FAST”

or Randy Loeschner who also works for EMC and seems to know his way around a database where he says on his blog “Solid State/Enterprise Flash Drives are similar in Write Performance to 15K Fibre Channel disks, but in READ scenarios are capable of 2500 or more READ IOPs.”

I also checked any available benchmarks and found the an EMC document that contained reasonably useful data, though even that seemed to contradict itself with regard to the number of IOPS you could get out of an EFD. Says that an Enterprise Flash Drive (EFD) can get 2,500 IOPS per drive, though without any details as to the latency or the I/O mix. Then further down it says that in a 50:50 read write 8K IOPS environment you can get 1057 IOPS per EFD at 12ms response time for reads and 24ms response time for writes without any additional help from the clarrion DRAM based write cache, or 1760 IOPS per EFD at 6ms response time for reads and 2ms response time for writes when the write cache is enabled.

I also found another informative post here at gotitsolutions.org, which shows roughly 1100, 1500, and 2000 IOPS per drive for 100% random writes, 60:40 write:read, and 40:60 write:read performance respectively, without help from DRAM caching. Furthermore, I had a conversation with a colleague whose opinion I respect, and it appears that “FAST Cache does 64K blocks …[which means that EMC] claim 50% more speed overall.”

Based on the above information, I think it would be reasonable to assume that a 6+1 EFD RAID group configured as FAST Cache would allow for between 12,000 and 20,000 sub-5ms IOPS, depending on the configuration and workload. That’s pretty good, but it’s not the “orders of magnitude” faster than spinning disk so often claimed, and nowhere near the performance of array cache.

The benefits of a write mega-cache

The write load in our hypothetical VDI environment of 1000 users at 12 IOPS per user with a 33:63 R:W ratio equates to about 30MiB/sec of random write activity, or about 108 GiB per hour. A 6+1 RAID group of 146GB EFD drives provides about 822 GiB of usable cache space. If you split this 50:50 between read and write, this works out to about 4 hours of writes before you even begin to need to destage. What differentiates a mega-cache from a standard cache is that it can absorb a sufficiently large number of changes to satisfy hours or possibly even entire business days’ worth of I/O. In addition, a cache this large is almost certainly going to improve the efficiency with which writes can go to the back-end RAID group. The extent to which it does this is dependent on many different factors. In some edge cases the additional improvement is marginal; in others it could be close to the kinds of efficiencies typically seen in a NetApp FAS array. In theory a 6+1 RAID-5 disk set combined with a large write cache could approach or even exceed the write efficiency of a 6+6 RAID-10 disk set.
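The arithmetic behind those figures can be sketched in a few lines of Python. The 4 KiB I/O size is an assumption (no I/O size is stated in this section, but it is the value that makes the 30MiB/sec figure work out); the other inputs come straight from the paragraph above. Treat it as a back-of-the-envelope check rather than a sizing tool.

```python
def hours_before_destage(users, iops_per_user, write_percent,
                         io_size_kib, cache_gib, write_share_percent):
    """Estimate how long a write cache can absorb writes before it fills."""
    write_iops = users * iops_per_user * write_percent / 100
    write_mib_per_sec = write_iops * io_size_kib / 1024
    write_gib_per_hour = write_mib_per_sec * 3600 / 1024
    write_cache_gib = cache_gib * write_share_percent / 100
    hours = write_cache_gib / write_gib_per_hour
    return hours, write_mib_per_sec, write_gib_per_hour


hours, mib_per_sec, gib_per_hour = hours_before_destage(
    users=1000,
    iops_per_user=12,
    write_percent=63,        # write share of the R:W split used in the text
    io_size_kib=4,           # ASSUMPTION: 4 KiB per I/O, not stated in the text
    cache_gib=822,           # usable capacity of the 6+1 EFD group, from the text
    write_share_percent=50,  # cache split 50:50 between read and write
)
print(f"{mib_per_sec:.1f} MiB/s, {gib_per_hour:.0f} GiB/hour, {hours:.1f} hours")
```

With strict binary units the hourly figure comes out slightly lower than the 108 GiB quoted above, which appears to round up to 30MiB/sec and treat a GiB as 1000 MiB. Either way the write half of the cache lasts roughly 4 hours.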

The benefits of a read mega-cache

On the read side of the equation, mega-caches on the order of 250GiB+ have the advantage that they are able to store the majority of the active working set, especially in VDI environments, where it is not unusual to see the cache offloading 80+% of the read I/O from the disks. This not only improves the latency of the I/Os served from cache but also of those that need to come from disk. The disk improvements come from reduced I/O contention and from the ability to make read-ahead more effective, as detection of the read pattern which triggers the read-ahead functions happens while the data is being served from cache. It also allows the read-ahead algorithms to be more aggressive, as the potential risks of reading in too much data and flushing out other useful data are mitigated by the much larger read caches.

The net-net is that mega-caches can significantly reduce the average latency for disk I/O even in spindle-constrained environments, and the ability to handle peak loads is significantly improved.

FlashCache

NetApp really stoked the market for mega-caches when it released the PAM-II, now called FlashCache (I’m kind of sad they changed the name, there were lots of bad PAM puns like “Flash in the PAM” that few if any will now remember). Unlike SSD/EFD-based cache architectures, FlashCache connects to the storage controller via PCIe, includes a NetApp-created flash translation layer and some dedicated hardware acceleration, and uses a driver which is tuned to the characteristics of all of this hardware. All of this results in a cache which is capable of hundreds of thousands of sub-2ms IOPS, with shorter code paths and higher levels of CPU efficiency than is seen in SSD/EFD-based caches.

Another thing that helps is cache awareness of FAS Deduplication and FlexClones, which in effect multiplies the effective size of the cache by the level of deduplication within the active dataset. For example, if you are using deduplication for persistent desktop guest O/S images and seeing 95% deduplication ratios (especially for the 2GB core operating system portion of the image), your effective cache size is 20x larger. This means that even a modest FAS2040 with 4GB of RAM can have an effective read cache of 50+GiB, which comes in really handy during boot storms. For a 256GB FlashCache, using the same math, the effective cache size ends up being around 5TB! That’s a best case situation, but the strange thing about VDI on NetApp is that best case scenarios just keep coming up over and over again, which is what prompted me to start on this series of posts in the first place.
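The “same math” is easy to sketch. The function below is a hypothetical helper, not anything from NetApp, and it assumes the best case described above: every block in the cache is shared at the stated deduplication ratio.

```python
def effective_cache_gb(physical_cache_gb, dedup_percent):
    """Effective cache size when each cached block serves all of its
    deduplicated copies. dedup_percent is the space saving, e.g. 95."""
    if not 0 <= dedup_percent < 100:
        raise ValueError("dedup_percent must be in the range [0, 100)")
    multiplier = 100 / (100 - dedup_percent)  # a 95% saving gives 20x
    return physical_cache_gb * multiplier


flash_cache_gb = effective_cache_gb(256, 95)
print(f"256GB cache at 95% dedup behaves like {flash_cache_gb / 1024:.0f}TB")
```

Working backwards, the 50+GiB figure for the FAS2040 implies that roughly 2.5GB of its 4GB of RAM is being counted as read cache. That split is an inference from the numbers above, not a published figure.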

Isn’t FlashCache just for reads?

As good as FlashCache is, some commentators have quite correctly pointed out that this cache is read-only, but they then go on to draw the incorrect conclusion that it does nothing for write performance. This might elicit a “Thank you Captain Obvious” from some, but yet again this is one of those things which, like the sun revolving around the earth, is simple, understandable, full of common sense, and also happens to be wrong.

FlashCache + Dedup/FlexClone + Realloc = High speed write cache.

If you’ve read through this entire series, you might remember the following statement:

“Thus we expect to write 336 blocks in 58+16= 74 disk operations. This gives us a write IEF of 454%”

This was on the assumption that the system was about 80% full and that the best allocation area was about 40% utilised. But what happens when the best allocation area is completely unallocated? This question was already covered in the following blog post, though it would appear the author decided to take another job outside of NetApp (good luck at CommVault Mike :-) and his NetApp blog may get cleaned up at some later time, so I’ve taken the liberty of excerpting it.

“For demonstration, I configured a single 3 disk NetApp aggregate (2 parity, 1 data) to demonstrate how much random write I/O I could get out of a single 1TB 7200 SATA drive .. The result is over 4600 random write IOPs with an average response time of 0.4ms.”

This was 1 SATA data drive (the other two were parity drives, which don’t add to write speeds) delivering 4600 random write IOPS. If you extrapolate this, 4 1TB SATA drives will give you about 3TB of usable storage and 18,400 IOPS. Woo hoo! 4 SATA drives from NetApp = 7 EFD drives from EMC, game over, discussion closed .. right?

Well, yes, under ideal circumstances, but the world is not a perfect place, and neither are the datacenters which inhabit it, even with VDI. So what might stop this from working outside of the unicorn farm?

1. Those disks won’t stay empty

True, but to be equal to the amount of write cache used in our 7 disk EFD cache (assuming a 50:50 split between read and write cache) we could fill those 4 SATA drives with 2.5TB of data and still have an equivalent write caching capability.

2. That free space won’t stay contiguous

True, but NetApp provides methods to re-arrange the free space via the reallocate -A command (sometimes called segment cleaning). This option is particularly well suited to VDI environments, where large burst writes are fairly typical and where optimising access for sequential reads is not generally considered a high priority.

3. There will be competition for read I/O

True, but single instancing technology and smart caching allow the majority of those reads to be served from cache.
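To put rough numbers on the SATA comparison and on the first objection, here is a small sketch. It assumes, as the extrapolation above does, that random write IOPS scale linearly with the number of data drives, and it reuses the 822 GiB EFD cache size and the 12,000 to 20,000 IOPS estimate from earlier in this post. Treating “about 3TB” as decimal terabytes is also an assumption.

```python
GIB_IN_GB = 2**30 / 10**9  # one GiB expressed in decimal GB

# Figures taken from the text
sata_iops_per_data_drive = 4600
sata_data_drives = 4
sata_usable_gb = 3000               # "about 3TB", ASSUMED to be decimal TB
efd_cache_gib = 822                 # usable size of the 6+1 EFD group
efd_iops_estimate = (12_000, 20_000)

# ASSUMPTION: random write IOPS scale linearly with the number of data drives
sata_iops = sata_iops_per_data_drive * sata_data_drives
in_efd_range = efd_iops_estimate[0] <= sata_iops <= efd_iops_estimate[1]

# Free space that must stay empty to match the write half of the EFD cache
write_cache_gb = efd_cache_gib / 2 * GIB_IN_GB
max_fill_gb = sata_usable_gb - write_cache_gb

print(f"SATA write IOPS: {sata_iops} (within EFD estimate: {in_efd_range})")
print(f"Data the SATA drives can hold: {max_fill_gb / 1000:.2f} TB")
```

The fill level lands just above the 2.5TB quoted under the first objection, and the extrapolated SATA figure sits inside the range estimated for the EFD cache. Whether free space stays contiguous enough to sustain that is the subject of the second objection.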

But what about the real world?

I plan to cover each one of these in some blogs on detailed performance tuning for NetApp, but rather than delve even deeper into abstract theory, I’m going to pull some data and graphs from an existing 2000+ seat VDI deployment that uses FlashCache and Reallocate to manage some very bursty I/O patterns. The interesting thing about this particular implementation is that it is far from an “ideal” workload and shows what can be done with a little bit of planning and some really smart storage controllers. In addition, with a little luck and some persistence, I’ll also pull up a far more modest lab environment and see exactly how much you can wring out of a NetApp controller on a tight budget.

Categories: Uncategorized