Archive for the ‘Performance’ Category

Data Storage for VDI – Part 2 – Disk Latencies

Over the years, I’ve found that there is broad misunderstanding about how many IOPS a disk can do, so it’s not surprising to hear and see things like the following, which I’ve taken from Ruben’s blog.

An IDE or SATA disk rotating at 5,400 or 7,200 RPM. At that rate it can deliver about 40 to 50 IOPS.

This is reasonable enough, but incomplete. I’m fortunate to work for one of the world’s best technology companies, because I get access to all kinds of interesting and for the most part unpublished research, some of which shows that a 7200RPM SATA drive has the following I/O characteristics:

IOPS    Latency (ms)
10      13 (minimum)
25      15
60      20
85      30
100     40
130     100

If you were to plot the complete set of IOPS vs. latency on a graph, you’d see fairly linear response times up to about 100 IOPS at the 40ms point, but after that there is an exponential increase in latencies. The same is true of all forms of spinning rust, with the latency curves spiking up at different points depending on the kind of drive and the associated hardware and drive settings (e.g. FCAL vs. SAS interconnects, native command queuing, skip mask writes etc). I’m not trying to be clever here (well maybe just a little bit), but as a storage designer, you really don’t want to drive your disks past the point where the latency begins to spike up. It also means that when you specify your IOPS requirement, you also need to specify the level of latency that will keep your end users happy.
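To make the spike a bit more concrete, here is a minimal sketch that works out how many extra milliseconds each additional IOPS costs between adjacent rows of the table above. It only uses the six sample points published here, not the complete data set, and the names SATA_7200 and marginal_latency are my own for illustration.

```python
# Sample points from the table above: (IOPS, latency in ms) for a 7200RPM SATA drive.
# These six rows are only a subset of the full data set, so treat the output as indicative.
SATA_7200 = [(10, 13), (25, 15), (60, 20), (85, 30), (100, 40), (130, 100)]


def marginal_latency(points):
    """Return (from_iops, to_iops, ms_per_extra_iops) for each adjacent pair of rows."""
    segments = []
    for (iops_a, lat_a), (iops_b, lat_b) in zip(points, points[1:]):
        slope = (lat_b - lat_a) / (iops_b - iops_a)
        segments.append((iops_a, iops_b, slope))
    return segments


for start, end, slope in marginal_latency(SATA_7200):
    print(f"{start:>3} -> {end:>3} IOPS: {slope:.2f} ms per extra IOPS")
```

With these sample points, the last step from 100 to 130 IOPS is by far the most expensive, which is the region you want to stay out of.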

For example, some DBAs consider an average latency of more than 5ms to be unacceptable, Microsoft specifies 20ms as the acceptable latency for an Exchange IO, and the SPC-1 benchmark uses 30ms. From field experience, most people are happy with the performance of their file-sharing and VDI environments if the request latency remains below 20ms. As mentioned earlier, if you have a 7200RPM SATA drive using native command queuing, you should be able to achieve 60 random 4K IOPS at 20ms response time and a little less for 8K IOPS. For a 15K RPM FC drive, you should be able to achieve about 230-240 random 4K IOPS at 20ms response time with pretty much identical figures for 8K IOPS. (If you’re interested, the max IOPS for a 15K drive is about 320 at 70ms, but you really don’t want to go there.)
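As a rough illustration of specifying IOPS together with a latency target, the sketch below looks up the highest IOPS the 7200RPM SATA sample points can deliver at a given response time. It assumes straight-line interpolation between the published rows, which is my simplification rather than anything from the underlying research, and max_iops_at_latency is a hypothetical helper name.

```python
# (IOPS, latency in ms) sample points for a 7200RPM SATA drive, from the table earlier in this post.
SATA_7200 = [(10, 13), (25, 15), (60, 20), (85, 30), (100, 40), (130, 100)]


def max_iops_at_latency(points, target_ms):
    """Estimate the highest IOPS whose latency stays at or below target_ms.

    Assumes linear interpolation between adjacent sample points (a simplification).
    Returns None when the target is below the drive's minimum latency.
    """
    if target_ms < points[0][1]:
        return None  # the drive cannot respond this quickly at any load
    if target_ms >= points[-1][1]:
        return float(points[-1][0])  # do not extrapolate past the last sample
    for (iops_a, lat_a), (iops_b, lat_b) in zip(points, points[1:]):
        if lat_a <= target_ms <= lat_b:
            fraction = (target_ms - lat_a) / (lat_b - lat_a)
            return iops_a + fraction * (iops_b - iops_a)


for target in (5, 20, 30):
    print(f"{target}ms target: {max_iops_at_latency(SATA_7200, target)}")
```

The 5ms target comes back as None because it is below the 13ms minimum in the table, while the 20ms and 30ms targets land on the 60 and 85 IOPS rows.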

This agrees reasonably well with Ruben Spruijt’s figure of 50 IOPS for SATA, but I really feel that while 180 IOPS for 15K FC is a fairly common rule of thumb, for VDI deployments it’s too low. 180 IOPS represents a 10ms response time for a 15K drive, and a 10ms response time is lower than a SATA drive can sustainably achieve, so it seems a little unfair to equate a 15K FC drive running at 180 IOPS with a SATA drive running at 50 IOPS.

Ruben also says

These are gross figures and the number of IOPS that are available to the hosts depend very much on the way they are configured together and on the overhead of the storage system. In an average SAN, the net IOPS from 15,000 RPM disks is 30 percent less than the gross IOPS

Also fair enough, and mostly true; however, in my experience this tax or inefficiency is almost entirely due to the IOPS penalty on writes. Before I go on, I’ll clarify some terminology. Ruben uses the terms “Net IOPS” for the IOPS served to the hosts, and “Gross IOPS” for the IOPS provided by the disks at the back end. While I like these terms, I’m more used to saying “front-end IOPS” and “back-end IOPS”. I’d also like to introduce another term, “IOPS Efficiency Factor” or “IEF”, which is front-end IOPS / back-end IOPS * 100. For example, if 700 front-end IOPS generated 1000 back-end IOPS at the array, this gives an IEF of 70%.
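The IEF arithmetic is simple enough to sketch in a few lines. The function name iops_efficiency_factor is just mine for illustration; the formula and the 700/1000 figures are the ones given above.

```python
def iops_efficiency_factor(front_end_iops, back_end_iops):
    """IEF as a percentage: front-end (net) IOPS divided by back-end (gross) IOPS, times 100."""
    if back_end_iops <= 0:
        raise ValueError("back-end IOPS must be positive")
    # Multiply before dividing to keep the arithmetic exact for whole-number inputs.
    return front_end_iops * 100 / back_end_iops


# The example from the text: 700 front-end IOPS generating 1000 back-end IOPS.
print(f"IEF = {iops_efficiency_factor(700, 1000):.0f}%")
```

In these terms, Ruben’s “net IOPS is 30 percent less than gross IOPS” is the same thing as saying the IEF is 70%.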

Ruben then goes on to talk about the various RAID levels, and from my perspective does a fairly good job. However, there are some places where I think there are inaccuracies, such as Ruben’s blanket statement that

In an average SAN, the net IOPS from 15,000 RPM disks is 30 percent less than the gross IOPS.

This might be another handy rule of thumb, but as a storage designer working for an array vendor I can state pretty confidently that it’s wrong far more often than it’s right, and the next blog entry, Data Storage for VDI – Part 3 – Read and Write Caching, will explain why.

Data Storage for VDI – Part 1 – A Personal View

What an insanely busy three months it’s been since my last blog post. I got a promotion, started learning about how marketing really works (as it turns out marketing people aren’t really evil at all) and started working on our local alliance relationships with VMware, Cisco and Microsoft. The good thing about this is that I’ve had stacks of great ideas for blog posts; the bad thing is that I’ve had almost no time to get them into a fit state for publishing. What finally goaded me into action was a blog post entitled F*** the SAN. VDI storage should be local! Of course, working for a SAN vendor, I was somewhat alarmed by the assertion, especially seeing that we pride ourselves on helping VDI deployments to be easier, cheaper and more reliable. I read through the post along with Understanding how storage design has a big impact on your VDI (UPDATED), and How the hidden “SLA bump” can kill your VDI project: You’d better know your desktop SLAs going in!, and figured that I finally had a good enough reason to get blogging again. Brian’s first statement is

Most of us have learned that the biggest constraint / bottleneck for desktop disk image storage is not storage capacity, but IOPS

which is where he references Understanding how storage design has a big impact on your VDI (UPDATED) by Ruben Spruijt. I read through this and found a number of interesting assertions that are mostly correct; however, I’ll spend the next few entries of this blog addressing some of what appear to be minor, but which IMHO are potentially important, inaccuracies.

Before I start I’d like to say that I think Ruben wrote a truly excellent post. It is not my intention to disparage the work done there, it’s just that from a storage guy’s point of view, and more specifically a NetApp storage guy, some of the information deserves clarification.

The rest of this post was originally about 6000 (yes six thousand) words long. Based on some interesting and constructive feedback such as “I got half way through and then got bored”, and some gentle encouragement from friends, I’ve modified the content, and split the rest into a multi-part post which can be found below.

Data Storage for VDI – Part 2 – Disk Latencies

Data Storage for VDI – Part 3 – Read and Write Caching

Data Storage for VDI – Part 4 – The impact of RAID on performance

Data Storage for VDI – Part 5 – RAID-DP + WAFL The ultimate write accelerator

Data Storage for VDI – Part 6 – Data ONTAP Improving Read Performance

Data Storage for VDI – Part 7 – 1000 heavy users on 18 spindles

Data Storage for VDI – Part 8 – Misalignment
