Thursday, July 31, 2008

DRS Performance and Best Practices

VMware seems to have seriously ramped up its documentation machine in a wonderful way. I'm seeing more and more valuable papers posted, the latest being http://www.vmware.com/files/pdf/drs_performance_best_practices_wp.pdf There are some interesting conclusions in the document.
  • When deciding which hosts to group into a DRS cluster, try to choose hosts that are as homogeneous as possible in CPU and memory. This seems like a bit of a no-brainer in any kind of clustered environment, but it is even more important in DRS, where all nodes of the cluster are active partners, even though DRS will account for CPU/memory size differences.
  • When more ESX hosts in a DRS cluster are VMotion compatible, DRS has more choices to better balance workloads across the cluster. We recommend clusters of up to 32 hosts. I'm not sure I buy 32 hosts yet. There is overhead involved in the DRS calculations, even though they only take place every 5 minutes by default. The cluster limit was only recently raised from 16 to 32 hosts. There is a definite advantage to larger cluster sizes, though, as can be seen in the following chart. One may also wonder why they only tested up to 16 nodes rather than 32. Of course, that could just be a lab limitation, as the cap was only lifted in U1 or U2 of 3.5.
  • The default migration threshold (moderate) works for most configurations. You can set the migration threshold to more aggressive levels when all of the following conditions are satisfied:
    • The hosts in the cluster are relatively homogeneous.
    • The virtual machines’ resource utilization remains fairly constant.
    • The cluster has relatively few constraints on where a virtual machine can be placed.
    You should set the migration threshold to more conservative levels when the converse is true.
  • The default DRS frequency is once every five minutes, but you can set it to any period between one and 60 minutes. You should avoid changing the default value. In my view it should never be set below 5 minutes, and you should adjust the aggressiveness of the automation before even considering this setting.
  • In general, do not specify affinity rules unless you have a specific need to do so. In some cases, however, specifying affinity rules can improve performance.
    • Keeping virtual machines together can improve performance if the virtual machines need to communicate with each other, because network communication between virtual machines on the same host enjoys lower latencies.
    • Separating virtual machines maintains maximal availability of the virtual machines.
    • Virtual machines that might need to be separated are virtual machines with I/O‐intensive workloads. If they share a single host, they might saturate the host’s I/O capacity, leading to performance degradation. DRS does not make virtual machine placement decisions based on their usage of I/O resources.
    Very good point on the I/O placement. When will I/O be taken into consideration by DRS? VMware is addressing storage I/O well as far as throughput goes, but balancing that throughput is still nonexistent.
  • Assign resource allocations to virtual machines and resource pools carefully. Be mindful of the impact of limits, reservations and virtual machine memory overhead. Make sure you understand "slices" if you are working with DRS clusters and how they impact admission control. This was changed and made more conservative between 3.0.x and 3.5.x. Always consider expandable reservations if you need to use reservations.
  • Virtual machines with smaller memory sizes or fewer virtual CPUs provide more opportunities for DRS to migrate them in order to improve balance across the cluster. Virtual machines with larger memory size or more virtual CPUs add more constraints in migrating the virtual machines. Hence you should configure only as many virtual CPUs and as much memory for a virtual machine as needed. This is one of the golden rules in VMware, whether you are using DRS or not.
  • You can specify DRS modes of automatic, manual, or partially automated at the cluster level as well as the virtual machine level. We recommend that you keep the cluster in automatic mode. DRS is now a mature and stable process. I've yet to see anyone who starts in "manual" or "partially automated" stay in that mode.
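
Two of the recommendations above are easy to turn into a quick sanity check: the three conditions for raising the migration threshold, and the point that DRS ignores I/O when placing virtual machines. The Python below is only a rough sketch of that reasoning, not anything VMware ships. Every name in it (ClusterProfile, suggest_threshold, colocated_io_heavy) is hypothetical, the inputs are judgments you would supply yourself, and reading "the converse" as "none of the three conditions hold" is my own interpretation of the paper's wording.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ClusterProfile:
    # Hypothetical, operator-supplied judgments about the cluster.
    homogeneous_hosts: bool          # hosts are relatively homogeneous
    steady_vm_utilization: bool      # VM resource usage stays fairly constant
    few_placement_constraints: bool  # few rules restricting VM placement


def suggest_threshold(profile: ClusterProfile) -> str:
    """Return a descriptive label for the migration threshold to consider."""
    conditions = (
        profile.homogeneous_hosts,
        profile.steady_vm_utilization,
        profile.few_placement_constraints,
    )
    if all(conditions):
        # The paper allows a more aggressive setting only when all three hold.
        return "more aggressive"
    if not any(conditions):
        # Assumption: "the converse" is taken to mean none of them hold.
        return "more conservative"
    # Mixed case: stay with the default, which works for most configurations.
    return "default (moderate)"


def colocated_io_heavy(placement, io_heavy):
    """Return {host: [vms]} for hosts holding two or more I/O-heavy VMs.

    placement maps VM name -> host name; io_heavy is a set of VM names
    you consider I/O intensive (DRS itself does not track this).
    """
    by_host = defaultdict(list)
    for vm, host in placement.items():
        if vm in io_heavy:
            by_host[host].append(vm)
    return {host: sorted(vms) for host, vms in by_host.items() if len(vms) > 1}
```

Any host that colocated_io_heavy reports is a candidate for a rule that keeps those virtual machines apart, since DRS will not separate them on I/O grounds by itself.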
Kudos to VMware for another quality white paper.
