Virtual John: April 2008

Tuesday, April 29, 2008

ESX Storage Architecture

Something else interesting from yesterday: it sounds like big changes are coming to the back end of the ESX storage architecture. It's still a ways off, but it could finally mean PowerPath coming to ESX, as well as a lot of other storage tools being written to work with Virtual Infrastructure.

PowerPath has been a major pain point for us. Our storage team requires it on all SAN-connected hosts. We worked around this with some rather expensive script writing by EMC. This really seems like a script EMC should provide to customers to work around PowerPath not working on ESX.

Issue: When a Storage Processor event occurs, such as a FLARE code upgrade or an SP failure/replacement, all the load is pushed to the remaining SP. In a PowerPath-connected world, PowerPath manages the trespass and moves the LUN back to the preferred SP. Without PowerPath on ESX, all ESX hosts are at this point forced to one SP.

Solution: We engaged EMC to write a Perl script that works off the Navi management server. It enumerates all the VMware LUNs and checks the "default SP". If the current owner is not the default SP, the script issues a trespass and moves the LUN to the correct SP. Seems simple enough, but by the time we included all the features we wanted, the functional spec was two pages long. Some examples are exit codes, configuration files, interactive/silent mode, etc.
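The core check is small enough to sketch. This is not EMC's Perl script, just a minimal Python illustration of the comparison it makes, assuming the LUN ownership data has already been pulled from the array into plain records. The `Lun` record and the `luns_to_trespass` function are hypothetical names, and the sample inventory is made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lun:
    """One LUN as reported by the array (hypothetical record layout)."""
    number: int
    default_sp: str   # SP the LUN should live on, e.g. "A"
    current_sp: str   # SP that owns the LUN right now


def luns_to_trespass(luns):
    """Return the LUNs whose current owner is not their default SP."""
    return [lun for lun in luns if lun.current_sp != lun.default_sp]


if __name__ == "__main__":
    # Made-up state after an SP B outage: every LUN has landed on SP A.
    inventory = [
        Lun(number=10, default_sp="A", current_sp="A"),
        Lun(number=11, default_sp="B", current_sp="A"),
        Lun(number=12, default_sp="B", current_sp="A"),
    ]
    for lun in luns_to_trespass(inventory):
        # A real script would issue the trespass here; this only reports.
        print(f"LUN {lun.number}: on SP {lun.current_sp}, "
              f"default is SP {lun.default_sp}")
```

Everything that made the real spec two pages long (exit codes, configuration files, interactive/silent mode) sits around this comparison and is left out here.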

Monday, April 28, 2008

SAN Pathing on an Active/Passive Array Part II

In a previous post I wrote about the issues around SAN pathing on an Active/Passive array. Today I met with VMware, EMC, and our internal storage team to discuss this. I was slightly incorrect about how ESX decides where to assign a path. ESX issues a SCSI inquiry (INQ) command, and the first path to respond is used. Because the first card to come up issues the command first, it usually takes the path, but depending on the load on the array and fabric, the other card could be used. In talking the issue through with everyone, we came to a couple of conclusions.

This can be done.

Build a table of the WWNs from both SPs.
Determine which SP a LUN is on from the WWN of the SP.
Determine which path on the second HBA is on the same SP.
Check whether the active path is already on the preferred HBA (every other LUN alternates HBAs).
If it is not on the preferred HBA, disable all paths except the active path and the path you want to move to. The target path merely being "On" (but not "On active") is not enough.
Disable the "On active" path to move to the other HBA.
Re-enable all paths.
Continue through all LUNs.

In most environments the effort is not worth the benefit though.

So, we are going to keep an eye on HBA utilization and see if a bottleneck appears. At that point we may pursue the script above.
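For reference, the steps listed above could be sketched roughly as follows. This is a planning sketch only, in Python, and it never touches a host: it returns the list of disable/enable operations that a real script would then have to carry out. The `Path` record, the `plan_moves` function, and the shape of the WWN-to-SP table are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    name: str        # e.g. "vmhba1:0:1"
    hba: str         # e.g. "vmhba1"
    target_wwn: str  # WWN of the SP port this path lands on
    active: bool     # True for the "On active" path


def plan_moves(luns, sp_by_wwn, hbas):
    """Return (lun, action, path_name) steps; nothing is executed."""
    steps = []
    for index, (lun, paths) in enumerate(luns.items()):
        preferred_hba = hbas[index % len(hbas)]  # every other LUN alternates
        active = next((p for p in paths if p.active), None)
        if active is None or active.hba == preferred_hba:
            continue  # nothing to move for this LUN
        active_sp = sp_by_wwn[active.target_wwn]
        # Path on the preferred HBA that lands on the same SP (no trespass).
        target = next((p for p in paths
                       if p.hba == preferred_hba
                       and sp_by_wwn[p.target_wwn] == active_sp), None)
        if target is None:
            continue  # no same-SP path on the preferred HBA
        # Disable everything except the active path and the target path.
        for p in paths:
            if p not in (active, target):
                steps.append((lun, "disable", p.name))
        # Disable the active path so I/O fails over to the target path.
        steps.append((lun, "disable", active.name))
        # Re-enable every path that was disabled.
        for p in paths:
            if p != target:
                steps.append((lun, "enable", p.name))
    return steps
```

`luns` is assumed to be an ordered mapping of LUN name to its list of paths, `sp_by_wwn` is the table from the first step, and `hbas` is the list of HBAs to alternate across. Keeping the plan separate from the execution makes it easy to review what would be disabled before anything is run against a production host.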

Wednesday, April 16, 2008

Optimum LUN Size for ESX Clusters

I'm curious what other organizations are using for LUN sizes. We have standardized on 250 GB LUNs. This seems to be what a lot of "best practices" guides suggest; it keeps I/O to an individual LUN manageable and tends to result in 6-8 VMs per LUN.

HA = High Availability?

Something that has been bothering me about VMware marketing for some time is HA. What is "High Availability"? It's a pretty subjective term. Wikipedia defines it as:

High availability is a system design protocol and associated implementation that ensures a certain absolute degree of operational continuity during a given measurement period.

However, look at the Google definitions and you'll find some more restrictive and some less restrictive definitions.

What is VMware HA? Let's start with what HA is not: Continuous Availability. I think a lot of the time CA is the perception non-engineers have of what VMware HA provides. What HA provides is a method by which all hosts in a cluster continuously monitor for the best way to restart all of the VMs on a given host if that host fails or the VMs become isolated. In plain English this means that if one of the hosts in your cluster of VMware servers goes away, its VMs will reboot elsewhere. Reboot = downtime, so is this high availability? Or just higher availability than no fault tolerance? How do you define High Availability?

Tuesday, April 15, 2008

Virtual Center Upgrade - Part 2

I had to simplify this when executing on multiple machines, so here is what I came up with:

    ps -auxwww | grep hostd | awk '{print "kill -9 " $2}' | sh
    service mgmt-vmware start
    service vmware-vpxa restart

Virtual Center Upgrade

I recently upgraded from Virtual Center 2.0.1 to 2.0.2. Now, a week later, I have found the first real issue while attempting to import a VM from VMware Converter. The import would hang at 1%. If you viewed the job in Virtual Center, it would show the VM being created. If you connected directly to the ESX host with the VI client, you would see that the VM had completed creation. Basically, the communication looked like it would get to the ESX host but not come back to Virtual Center. I was able to run Converter directly against the ESX host and start the VM, but none of this translated into Virtual Center.

To solve this we had to restart the management services. Due to some bugs in the hostd service, you have to kill the process rather than restart it. There is a known issue in some versions of ESX where restarting the hostd process (service vmware-hostd restart) may shut down any VMs that are set to shut down when the host reboots.

So first we have to find the PIDs of the hostd services.

    [root@xxxxx24 root]# ps -auxwww | grep hostd
    root 1933 0.0 0.0 4260 4 ? S Jan30 0:00 /bin/sh /usr/bin/vmware-watchdog -s hostd -u 60 -q 5 -c /usr/sbin/hostd-support /usr/sbin/vmware-hostd -u
    root 1939 0.4 11.5 58636 31060 ? S Jan30 462:02 /usr/lib/vmware/hostd/vmware-hostd /etc/vmware/hostd/config.xml -u
    root 21729 0.0 0.2 3696 664 pts/1 S 16:04 0:00 grep hostd
    [root@xxxxx24 root]#

Then kill the two hostd services.

    [root@xxxxx24 root]# kill -9 1933 1939

Now we can start mgmt-vmware and restart the vpxa service.

    [root@xxxxxx24 root]# service mgmt-vmware start
    Starting VMware ESX Server Management services:
    VMware ESX Server Host Agent (background) [ OK ]
    Availability report startup (background) [ OK ]
    [root@ivpesx24 root]# service vmware-vpxa restart
    Stopping vmware-vpxa: [ OK ]
    Starting vmware-vpxa: [ OK ]
    [root@xxxxxx24 root]#

This causes the host to disconnect from Virtual Center for a bit, and when it reconnected everything synced up correctly.

One caveat to this: in ESX 3.5, there are some files in /var/run that record the PIDs, and they may need to be cleaned out before starting the mgmt-vmware service.
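If you would rather pick the PIDs out programmatically than by eye, the filtering step looks roughly like this. It is a small Python sketch that only parses text in the `ps aux` column layout shown in this post; `hostd_pids` is a hypothetical helper name, and the sketch does not kill anything.

```python
def hostd_pids(ps_output):
    """Return PIDs of hostd-related processes from `ps aux`-style output."""
    pids = []
    for line in ps_output.splitlines():
        # USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND...
        fields = line.split(None, 10)  # 11th field is the full command line
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # shell prompt, header, or blank line
        command = fields[10]
        # Skip the grep process itself; it also contains the word "hostd".
        if "hostd" in command and not command.startswith("grep "):
            pids.append(int(fields[1]))
    return pids


if __name__ == "__main__":
    sample = """\
root 1933 0.0 0.0 4260 4 ? S Jan30 0:00 /bin/sh /usr/bin/vmware-watchdog -s hostd -u 60 -q 5 -c /usr/sbin/hostd-support /usr/sbin/vmware-hostd -u
root 1939 0.4 11.5 58636 31060 ? S Jan30 462:02 /usr/lib/vmware/hostd/vmware-hostd /etc/vmware/hostd/config.xml -u
root 21729 0.0 0.2 3696 664 pts/1 S 16:04 0:00 grep hostd
"""
    print(hostd_pids(sample))  # prints [1933, 1939]
```

Excluding the `grep` line matters because the grep process has usually exited by the time a kill would be issued, so its PID is stale.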

Cisco Networking

I found this through the RTFM website. Always a quality spot for reference. At the end of the session, the participants should be able to:

Objective 1: Understand key concepts of server virtualization architectures as they relate to the network.
Objective 2: Explain the impact of server virtualization on DC network design (Ethernet & Fibre Channel)
Objective 3: Design Cisco DC networks to support server virtualization environments

http://www.cisco.com/web/BE/learn_events/pdfs/Server_Virtualization.pdf

Healthy Console Reminder

It’s easy to get spoiled these days. Almost everything can be installed over RDP without a problem. But I read this today and was reminded of the importance of the console (even using /console for RDP). I thought it would be good information to share.

Every couple of months or so, we'll get a call from an Administrator reporting that his system hung when he tried to reboot it after installing patches. More often than not, the patches were installed by an Administrator logging on to the server via Remote Desktop (without using the /console or /admin switch) and using either the Windows Update web site or the "Automatic Updates" tray icon. Both methods do the same thing and use the same processes. After the updates are installed, the Administrator clicks on the "Restart Now" button to complete the installation. The Remote Desktop Session goes away, and the Administrator thinks that the server is in the process of rebooting. However, the problem is that the server may not really be rebooting. When the Administrator tries to connect back into the server via RDP after several minutes, he discovers that he cannot. When he logs on at the console of the machine to investigate, he discovers that the RDP Listener is listening on port 3389 but no-one can connect via RDP. To resolve the issue, he has to reboot the server from the console. So what happened?

http://blogs.technet.com/askperf/archive/2008/03/18/hotfix-installs-remote-desktop-and-the-reboot-that-wasn-t.aspx

Monday, April 14, 2008

Tools I Use - PowerRecon (3.1) - Planning Edition

I have been through a "VMware Virtualization Assessment" from VMware and frankly wasn't that impressed. We had some very specific questions around capacity planning and disk I/O that were not captured. I can see the value in this offering for a shop trying to get into virtualization, but in a mature IT shop with very specific questions, not general "fit", it didn't work well. So when it came time to virtualize one of our data centers from an acquisition, another tool was in order. We had around 50-60 workloads to consolidate and a Clariion CX3/40 already on site. So our big questions were: how many hosts, and how fast a disk?

Something I've found with many of the third-party management tools around virtualization is a certain lack of maturity. PowerRecon was no different. In planning I had created a database on a shared SQL 2005 server and an account with dbo rights to only the database I wanted to use. But this didn't work. PowerRecon wants full rights on the SQL server to create its own databases; you can scale back those rights later, though. The inventory went fairly well, with only one email/call to PlateSpin support for a group of problem servers. This was fixed by changing the credentials that the service started as to a Domain Admin account. Once again, the product is a bit lacking in maturity when it cannot grant appropriate security to only the parts that need it. You can use alternate credentials to connect/inventory/monitor, but this didn't seem to work perfectly.

Once you get through these steps the rest is easy. Well, it's easy if you're the patient type, which I'm not. Even though I really needed 20-30 days of monitoring to see some trends, I still wanted to peek at the results daily. It's pretty interesting to see the data fill in.

Reporting was pretty straightforward, and I customized some of the reports with ease to export and plan disk sizing. Both questions were answered very well. For the number of hosts, memory was the constraint. For disk, there were two very clear tiers: our SQL and file servers were heavy on I/O and everything else was not. So FC for the SQL and file servers and SATA for everything else.

One thing that could be improved in the capacity planning tool would be the ability to plan around IOPS, not just MB/sec. I was able to get the information I wanted in IOPS from the reports, but could only run scenarios based on MB/sec.

Would I use this product again? Certainly. I purchased enough monitoring days to have a pad to use on other virtualization projects around the enterprise. Would I buy this product in the PowerRecon Standard Edition with Planning packaging? I don't know. The price point seems comparable with the other products on the market that are generating a lot of buzz. I'm also currently evaluating vCharter Pro from Vizioncore and Capacity Bottleneck Analyzer Virtual Appliance from VKernel.
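To show why MB/sec alone can hide the disk load, here is the basic relationship between throughput and IOPS as a short Python sketch. The I/O sizes and the 10 MB/sec figure are made-up illustration values, not numbers from our PowerRecon reports, and `iops_from_throughput` is a hypothetical helper.

```python
def iops_from_throughput(mb_per_sec, io_size_kb):
    """Convert throughput to IOPS for an average I/O size (1 MB = 1024 KB)."""
    if io_size_kb <= 0:
        raise ValueError("io_size_kb must be positive")
    return mb_per_sec * 1024 / io_size_kb


if __name__ == "__main__":
    # The same 10 MB/sec is a very different load at different I/O sizes.
    print(iops_from_throughput(10, 8))    # 1280.0 IOPS at 8 KB per I/O
    print(iops_from_throughput(10, 64))   # 160.0 IOPS at 64 KB per I/O
```

Two servers can report the same MB/sec while one of them asks the spindles for eight times as many operations, which is why the tiering decision is easier to make in IOPS.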

Tools I Use - Veeam

Veeam seems to have snuck into everyone's shop with FastSCP, a free SCP client for ESX that is FAST. They have branched out into backup and monitoring as well.

ESX Stencils - Everyone loves a pretty picture in Visio.
RootAccess - Not to allow root access, but to add local user accounts.
FastSCP - Great SCP utility for copying files to ESX, and it can elevate to root. One caveat: if you have ESX hosts behind a firewall, FastSCP uses high ports and cannot be fixed to the SCP port (22). I use FileZilla for these.
ESXDiag - Good troubleshooting tool for random errors, or for "double checking" a server before deploying.

Unity...

At my company our desktop standard is Windows XP. For some time I've been running the "corporate desktop" within a VM. I wanted to leverage a 64-bit OS for the host OS. I selected 64-bit Fedora (FC5, 6, now 7) as the host, as we are also a big RHEL shop. My background is more Windows, so the experience would be good for ESX and RHEL. I recently upgraded to the beta version of Workstation 6.5, and though the Unity interface is still a bit buggy, boy is it cool. It really does blur the presentation layer. The biggest caveat I've found with the beta version of 6.5 is speed. VMware enables full debug logging on all beta releases, so it's significantly slower than 6.0. I have no doubt this will change in the full release, though.

SAN Pathing on an Active/Passive Array

So the best-practice path policy on an Active/Passive array is MRU (Most Recently Used). This keeps the host from trying to use more than one SP, only failing over when the active path fails.

The command to observe the startup behavior of paths in ESX 3.0.x is "esxcfg-mpath -l". In the output below, /dev/sda is the local VMFS and the other local path is the CDROM. Next you will see the 4 LUNs. These are on two separate arrays: one CX600 and one CX3/80.

    [root@xxxxx35 root]# esxcfg-mpath -l
    Disk vmhba0:0:0 /dev/sda (139392MB) has 1 paths and policy of Fixed
     Local 2:14.0 vmhba0:0:0 On active preferred
    Enclosure vmhba0:264:0 (0MB) has 1 paths and policy of Fixed
     Local 2:14.0 vmhba0:264:0 On active preferred
    Disk vmhba1:0:0 /dev/sdb (256000MB) has 4 paths and policy of Most Recently Used
     FC 15:0.0 2100001b3209abad<->500601601060225b vmhba1:0:0 On active preferred
     FC 15:0.0 2100001b3209abad<->5006016a1060225b vmhba1:1:0 Standby
     FC 17:0.0 2100001b32091db5<->500601681060225b vmhba2:0:0 Standby
     FC 17:0.0 2100001b32091db5<->500601621060225b vmhba2:1:0 On
    Disk vmhba1:0:1 /dev/sdc (256000MB) has 4 paths and policy of Most Recently Used
     FC 15:0.0 2100001b3209abad<->500601601060225b vmhba1:0:1 On active preferred
     FC 15:0.0 2100001b3209abad<->5006016a1060225b vmhba1:1:1 Standby
     FC 17:0.0 2100001b32091db5<->500601681060225b vmhba2:0:1 Standby
     FC 17:0.0 2100001b32091db5<->500601621060225b vmhba2:1:1 On
    Disk vmhba1:2:0 /dev/sdd (256000MB) has 4 paths and policy of Most Recently Used
     FC 15:0.0 2100001b3209abad<->5006016a39a02964 vmhba1:2:0 On active preferred
     FC 15:0.0 2100001b3209abad<->5006016339a02964 vmhba1:3:0 Standby
     FC 17:0.0 2100001b32091db5<->5006016239a02964 vmhba2:2:0 Standby
     FC 17:0.0 2100001b32091db5<->5006016b39a02964 vmhba2:3:0 On
    Disk vmhba1:2:1 /dev/sde (256000MB) has 4 paths and policy of Most Recently Used
     FC 15:0.0 2100001b3209abad<->5006016a39a02964 vmhba1:2:1 On active preferred
     FC 15:0.0 2100001b3209abad<->5006016339a02964 vmhba1:3:1 Standby
     FC 17:0.0 2100001b32091db5<->5006016239a02964 vmhba2:2:1 Standby
     FC 17:0.0 2100001b32091db5<->5006016b39a02964 vmhba2:3:1 On
    [root@xxxxxx35 root]#

You can see how ESX loads all the active paths ("On active") onto the first HBA, in this case vmhba1. The array has a preferred SP, so the array picks which Storage Processor and the ESX host picks which HBA. This will eventually create a bottleneck on vmhba1 (or whichever HBA comes up first).

To solve this, I'm working with VMware on the best way to balance the paths. There will be no intelligence to the balancing based on load, but I would like to put half of the LUNs onto the second HBA. To do this I'm looking at a startup script that would disable the standby path on the first HBA and then disable the On active path on the first HBA. This would force the path to fail over to the second HBA and allow it to stay on the preferred Storage Processor. If the On active path is failed first, this could cause a LUN trespass and move the LUN (and all other ESX hosts connected to the LUN) to the second Storage Processor. After a pause, both paths would be re-enabled. I have proposed this to our VMware/EMC TAMs and am waiting on a response.
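A quick way to see the imbalance without reading the listing line by line is to count the "On active" FC paths per HBA. The following is a minimal Python sketch that assumes the text format matches the ESX 3.0.x output in this post; `active_paths_per_hba` is a hypothetical helper, and other ESX versions may format the listing differently.

```python
import re
from collections import Counter

# Matches FC path lines such as:
#  FC 15:0.0 2100001b3209abad<->500601601060225b vmhba1:0:0 On active preferred
PATH_RE = re.compile(
    r"^\s*FC\s+\S+\s+\S+<->\S+\s+(vmhba\d+):\d+:\d+\s+(.*)$"
)


def active_paths_per_hba(mpath_output):
    """Count FC paths in the 'On active' state for each HBA."""
    counts = Counter()
    for line in mpath_output.splitlines():
        match = PATH_RE.match(line)
        if match and match.group(2).strip().lower().startswith("on active"):
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = """\
Disk vmhba1:0:0 /dev/sdb (256000MB) has 4 paths and policy of Most Recently Used
 FC 15:0.0 2100001b3209abad<->500601601060225b vmhba1:0:0 On active preferred
 FC 15:0.0 2100001b3209abad<->5006016a1060225b vmhba1:1:0 Standby
 FC 17:0.0 2100001b32091db5<->500601681060225b vmhba2:0:0 Standby
 FC 17:0.0 2100001b32091db5<->500601621060225b vmhba2:1:0 On
"""
    for hba, count in sorted(active_paths_per_hba(sample).items()):
        print(hba, count)
```

Run against the full listing from this host, every active path lands on vmhba1 and none on vmhba2, which is the bottleneck described here.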

Day 1

So, I've been working with VMware in some form since about '98 or '99, and quite extensively over the last 3 years or so. I decided it was time to create a place to paste my notes, thoughts, and lessons learned. Maybe someone else will find it useful; maybe I'll just refer back to it myself. A quick "who am I?": I am a Senior Engineer in a company with 200+ ESX hosts, and I am on the cross-department design team for the architecture. I am personally responsible for 50+ ESX hosts, ranging from standalone hosts to SAN-connected clusters.
