Thursday, July 28, 2011

Setting up PCNS 3.0.0 for ESXi 4.1 update 1

Note: An updated version of these instructions for ESXi 5.0 and PCNS 3.0.1 is available in a new post here: http://tsbraindump.blogspot.com/2012/08/installing-and-configuring-apc.html

I've done this setup three times now, and every time I end up having to resort to trawling the internet for instructions, so I figured it's about time I wrote down my own. Note that these come from personal experience, contacting support, and so on.

1) Download stuff. There's a bunch of it!
a) Download PCNS 3.0.0 from the APC website. It's free now, it just requires an apc.com login.
b) While you are there, grab the latest firmware for the APC card you are using. Make sure you know the kind of network management card you have, as the older ones are not able to use the latest firmware.
c) Download the latest VMware vMA from the VMware.com website under 'tools'. As of this writing, it's 4.1 released in 2010.
d) Grab the free version of Veeam FastSCP... it makes loading the APC installer on the vMA a snap.

2) Install the latest network management card firmware first. It may take several shots at it... for some reason it likes to fail, but doing it over and over usually gets it going. Don't ask why.

3) Create one vMA virtual machine FOR EACH HOST by extracting that zip file you downloaded, then launching the vSphere client and going to File -> Deploy OVF Template. It will guide you through what's required, but you basically need a 5GB volume (I usually make the volume 10GB for snapshot space), 1 processor, and a gig or so of RAM. Note that I said "for each host". I spoke to APC support, and they admitted (finally) that the vifp 'fastpass' stuff does not work with multiple hosts. My personal experiences confirm this. For some reason it only shuts down the first host on its list!

4) Open the console of the vMA and follow the initial setup instructions. I recommend assigning a static IP as well as a real hostname to the vMA for use later. For the hostname, specify a full domain name such as VMA.domain.local.

5) Install Veeam FastSCP, then open it. Click "Add Server" and add the vMA server as a "Linux Server". Be sure to use the vi-admin username and password you specified while setting up the vMA. Uncheck the box to "Elevate account to root" as we do not have root access - it's been disabled.

6) In FastSCP, browse to the pcns300ESXi.tar.gz file you downloaded earlier and right click / copy it. Note that you have to right click / copy in FastSCP, not in Windows Explorer... FastSCP does not have Windows clipboard access built in. Once copied, expand your vMA in FastSCP, browse to the tmp folder, and paste it in the root of tmp. You can close FastSCP now.

7) Back in the vMA console, browse to the tmp directory and run "gunzip pcns300ESXi.tar.gz", then run "tar -xf pcns300ESXi.tar" to extract the file. Browse into the ESXi folder that is created, and run "sudo ./install_en.sh" to start the installation of PCNS 3.0.0. You will likely get a warning (read it, it's funny) and need to enter your password. After that, installation starts. Continue with all the defaults. When you get to the part about entering an IP, just press "q" to skip it.

8) Make sure you get the message saying installation has completed, along with the note about how to access it - this means installation was successful. Now, you need to add your ESXi server to the 'fast pass' access via the following command: "sudo vifp addserver ESXI-HOSTNAME" (replace ESXI-HOSTNAME with the name or IP of your host). You will be asked to enter the password for your host, so vMA will have it on file.

9) Now, we need to configure PCNS. Open a web browser and go to https://VMA-ADDRESS:6547 (where VMA-ADDRESS is the static IP or hostname you gave the vMA) and follow the configuration wizard. The only step worth mentioning is that you should NOT check the box for "Turn off the UPS after shutdown finishes." If you check it, there's a good chance the UPS will turn off while your hosts are still shutting down. The caveat to leaving this unchecked is that your ESXi servers most likely will not turn back on if power is restored before the UPS battery loses all of its charge... you'll have to use iLO or DRAC to get in and turn your hosts on instead.

10) Next, you will want to configure shutdown events. Once the PCNS wizard finishes it forwards you to the PCNS configuration page. Click on "Configure Events" and configure some of the important events like "UPS: On Battery" and "Input Power: Restored". You'll most likely want to notify users for most of the events. Don't forget to check the box for "Shut Down System" next to "UPS: On Battery" and set it to go off after a reasonable amount of time on battery (this depends on how much runtime your battery has, but I usually set mine for 5 minutes. If the power is off for 5 minutes, it's most likely going to be off an awful lot longer.)

11) Also, you'll want to check the "Connected Servers" tab on the left under "UPS information" to make sure your vMA IP address is listed. If not, you'll want to add it as a client on your UPS's network management card's web page.

12) Finally, in the vSphere client, make sure all of your ESXi hosts have their "Virtual Machine Startup / Shutdown" options configured and they are set to "Enabled", otherwise your guests are likely to just be turned off rather than shut down!

13) Repeat these steps for each host in your cluster. Yes, you need one vMA per host! Don't believe me? Fine. Try it yourself. Or call APC. You'll see.
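Since you end up with one vMA per host, it can be handy to confirm that each PCNS web interface is actually listening before you go clicking through wizards. Here's a minimal Python sketch (my own illustration, not an APC tool) that simply tries a TCP connection to port 6547. The function name and the hostnames in the usage comment are placeholders I made up.

```python
import socket


def pcns_port_open(host: str, port: int = 6547, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example usage (placeholder hostnames, one vMA per ESXi host):
# for vma in ["vma1.domain.local", "vma2.domain.local"]:
#     print(vma, "listening" if pcns_port_open(vma) else "NOT reachable")
```

This only proves something is listening on the port. It says nothing about whether PCNS is registered with the network management card, so step 11 still applies.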

That should get the system going! You can test it by, well, pulling the plug! If that's too scary for you, you can also configure the shutdown option for "PowerChute cannot communicate with the NMC", then pull the network cable from the network management card. If your hosts and guests shut down correctly, all is well!
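Before you pull the plug, it may be worth sanity-checking your timing. Here's a rough sketch of the arithmetic, assuming you know your UPS runtime at current load and have timed how long your guests and host take to shut down. The 20 and 10 minute figures are made-up examples; only the 5 minute on-battery delay comes from my own setup.

```python
def shutdown_margin_minutes(battery_runtime: float,
                            on_battery_delay: float,
                            shutdown_duration: float) -> float:
    """Minutes of battery left once the hosts have finished shutting down.

    A negative result means the battery would run out mid-shutdown.
    """
    return battery_runtime - on_battery_delay - shutdown_duration


# Hypothetical numbers: 20 min of runtime, shutdown triggered after
# 5 min on battery, guests + host need about 10 min to go down cleanly.
margin = shutdown_margin_minutes(battery_runtime=20,
                                 on_battery_delay=5,
                                 shutdown_duration=10)
print(margin)  # 5
```

If the margin comes out small or negative, shorten the on-battery delay rather than hoping the power comes back.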

Friday, July 1, 2011

Setting LUN Policy and PSP in ESXi 4.1

For customers with many LUNs, it can be a serious pain to create, modify, and maintain all the multipathing settings in VMware ESXi... I have a client with four active iSCSI adapters on each host, and four active iSCSI frontend ports on their Compellent SAN, which means a delightful 16 paths per LUN. Couple that with 2-3 LUNs per guest (raw device mappings), and 15 or so guests... that's a lot of clicking.
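To put a number on "a lot of clicking", here's the arithmetic as a small Python sketch. It assumes every iSCSI adapter can reach every frontend port (which is how you get 16 paths per LUN), and it only counts the raw device mapping LUNs on a single host. The function names are just mine.

```python
def paths_per_lun(host_adapters: int, san_ports: int) -> int:
    """Each adapter sees each SAN frontend port, so paths multiply."""
    return host_adapters * san_ports


def total_paths(host_adapters: int, san_ports: int,
                luns_per_guest: int, guests: int) -> int:
    """Total paths to manage on one host for the guests' LUNs."""
    return paths_per_lun(host_adapters, san_ports) * luns_per_guest * guests


print(paths_per_lun(4, 4))       # 16
print(total_paths(4, 4, 2, 15))  # 480
print(total_paths(4, 4, 3, 15))  # 720
```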

This is where VMware's PowerCLI is a godsend... you can script all of those monotonous clicks. For example, here's a pair of commands I used to convert all of their Path Selection Policies and default PSPs to Round Robin.

1) Download and install PowerCLI
2) In the PowerCLI prompt, type "connect-viserver hostname" (where hostname is the name of the ESX host you want to make the changes on)
3) type "get-scsilun -vmhost (get-vmhost hostname) -luntype disk | set-scsilun -multipathpolicy "roundrobin""
4) Open an SSH session to the ESX host in question (you may need to enable SSH access in your host's Configuration > Security Profile in vSphere).
5) In the SSH session, type "esxcli nmp satp setdefaultpsp --psp VMW_PSP_RR --satp VMW_SATP_DEFAULT_AA"

NOTE: The --satp type may vary depending on what SAN you use... you can check what your SATP currently is by clicking "manage paths" of any one of your LUNs in vSphere... under the path selection type it displays it as "storage array type".

That's it... now all the LUNs on the host should be round robin, and all future volumes should default to round robin as well!

Update:

For ESXi 5, the SSH command syntax has changed slightly. You'll want to type the following from the console of your ESXi 5 server:

esxcli storage nmp satp set --default-psp VMW_PSP_RR --satp VMW_SATP_DEFAULT_AA

Also, the PowerCLI part (steps 1-3 above) is no longer necessary! Once you switch the default path policy with the above command, you can just reboot the host and it should apply the policy to all existing volumes. Thanks, VMware!
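If you manage a mix of 4.1 and 5 hosts, it's easy to paste the wrong syntax. Here's a small Python sketch that builds the right command string for each. It only covers the two syntaxes shown in this post; the function name is my own, and it deliberately refuses any other version rather than guessing.

```python
def default_psp_command(esxi_major: int,
                        satp: str = "VMW_SATP_DEFAULT_AA",
                        psp: str = "VMW_PSP_RR") -> str:
    """Build the esxcli command that sets the default PSP for a SATP."""
    if esxi_major == 4:
        # ESXi 4.1 syntax, as used in step 5 above
        return f"esxcli nmp satp setdefaultpsp --psp {psp} --satp {satp}"
    if esxi_major == 5:
        # ESXi 5 syntax from the update
        return f"esxcli storage nmp satp set --default-psp {psp} --satp {satp}"
    raise ValueError(f"No known syntax in this post for ESXi {esxi_major}")


print(default_psp_command(4))
print(default_psp_command(5))
```

Remember to swap the satp argument for whatever "storage array type" your SAN actually reports.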

Wednesday, June 22, 2011

Exchange 2010 Edge Role + 2 NIC cards + DNS issues

I am trying out a few technologies that are new to me for one of our clients, namely Exchange 2010 with the Edge role on a DMZ server, as well as installing Threat Management Gateway on the Edge server... good times! I am sure there will be several posts on TMG coming soon... I can feel it now...

Anyway, I ran into a DNS problem with my fresh Exchange 2010 install that took a little bit of doing to figure out. The basic layout of the environment is that they have a single Exchange 2010 server on their LAN with the CAS/HUB/MBX roles installed, and a single EDGE role on a DMZ server that is on the domain. This Edge server has two NIC cards... one for the inside network so it can talk to the domain and do Windows authentication, and one for the outside network so that it can receive mail and publish the Exchange web interface.

Shortly after installation, mail started queuing up on my edge server. Taking immediate advantage of the tools in front of me, I used the Queue viewer and noticed a few DNS warnings that certain domains could not be resolved. Thinking I was smart, I proceeded to put in hosts entries on the Edge server... nope, didn't solve it. I then thought that perhaps it was using the external interface's DNS server to try and resolve internal server names... so I took the DNS servers away from the external NIC card... still no luck.

Then, I stumbled upon a setting that suddenly cleared it all up: on the edge server, if you open up the Exchange management console, right click on the Edge server in the middle pane, and go to properties, there is a tab for 'External DNS lookups.' I set this to the Inside NIC card, and bam, mail started flowing.

Monday, April 25, 2011

Dell Server P2V stuck at loading screen

Almost every P2V I have done to date has gone flawlessly. I frankly expected more trouble, but have been very pleased with VMware's product thus far. Almost every virtualization that experienced trouble was a result of third party software, or in this case, drivers for third party hardware.

The specific server in question was a Dell PowerVault 700 series NAS... it was running Windows Server 2003 Appliance Edition, an OEM-only version of Windows 2003. Not really supported for virtualization, but the intention was to upgrade it to Datacenter Edition anyway once all was said and done... it was just a matter of getting there. After the virtualization process finished, the server powered on, and would not get past that animated Windows logo loading screen with the grey dots moving across the bottom. I let it sit for up to 30 minutes before I was fairly sure there was no hope. The processor sat at 100% the entire time, so I suspected it was some kind of driver or service misbehaving. The system booted into safe mode just fine, but wouldn't boot into normal mode, even in diagnostic startup mode.

I worked with Microsoft, and they didn't have a clue either. Time for process of elimination! I opened up MSINFO32 and made an export of the whole Software Environment > System Drivers section for safe keeping, then went to work disabling all of them, section by section, starting with the ones marked as 'Stopped' with 'Start Mode' set to Boot. I knew it had to be a 'Stopped' one, as Windows was starting fine in safe mode but not when everything started up normally. You can change them by opening up regedit and going to HKLM\SYSTEM\CurrentControlSet\Services. Expand each service name, and under it is a value called 'Start' set to 0-4. 0 is 'Boot', 1 is 'System', 2 is 'Automatic', 3 is 'Manual', and 4 is 'Disabled'. Make sure to write down the original 'Start' value of each item as you set it to 4, because you will need to revert the changes.


Either way, disabling all the 'Boot' startup items didn't fix it, so I moved on to disabling all of the ones with 'start mode' set to 'System'. With all Boot and System items set to disabled, bingo! It started up. Now I put them each back to their original startup mode, one by one, rebooting into normal mode each time to see if the server would start. It finally narrowed down to one service, called "Msdisp." The description? "DELL LED DRIVER". Go figure! It was a driver for the hardware chassis to control the LED lights on the front of it, and it apparently blew up when it couldn't find whatever LED controller it was looking for. I left the service disabled, returned all others to their original values, and I was off to the races!
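For anyone who wants the elimination procedure spelled out, here's a minimal Python sketch of the logic. It doesn't touch the registry; the reboot is simulated by a boots_ok function you pass in, and the service names other than Msdisp are invented for the example.

```python
DISABLED = 4  # startup values: 0 Boot, 1 System, 2 Automatic, 3 Manual, 4 Disabled


def find_culprits(original, boots_ok):
    """Disable everything, then restore services one at a time.

    original: dict mapping service name -> its original startup value
    boots_ok: callable taking the current {name: value} dict and returning
              True if the machine boots (stands in for a real reboot)
    Returns (culprits, final_values) with culprits left disabled.
    """
    current = dict.fromkeys(original, DISABLED)
    if not boots_ok(current):
        raise RuntimeError("Still fails with all of these disabled; look elsewhere")
    culprits = []
    for name, value in original.items():
        current[name] = value          # put one service back...
        if not boots_ok(current):      # ...and "reboot" to test it
            current[name] = DISABLED   # it broke the boot, leave it off
            culprits.append(name)
    return culprits, current


# Simulated machine: it only boots while Msdisp is disabled.
services = {"svc_a": 0, "Msdisp": 1, "svc_b": 1}
culprits, final = find_culprits(services, lambda cur: cur["Msdisp"] == DISABLED)
print(culprits)  # ['Msdisp']
```

One reboot per service is slow but dead simple. With a long list you could restore them in halves instead, at the cost of more bookkeeping.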

Hopefully this process of elimination method helps someone else out there!

Thursday, April 21, 2011

Performing a P2V on a Domain Controller

Yes, I know, it's not recommended. Heck, it's not even supported. But what if you want to throw caution to the wind, and do it anyway? Here's the method I used that worked successfully. A few notes though:

- This only works in environments with two or more domain controllers
- There may still be problems with clients authenticating should you be ballsy enough to do this during business hours, as I was. Rebooting usually fixes them.

1) First of all, you'll want to transfer all of your FSMO roles to another DC in the environment. You can see which servers are hosting the FSMO roles in your environment by opening up a command prompt on your domain controller and typing "netdom query fsmo". You'll get back a neat list of which roles are where. Transfer them to other domain controllers (I'll leave the specifics out of this article, Google has the answers.)

2) Open a command prompt on the DC you wish to P2V and type dcpromo, then hit enter. Proceed through the process to remove Active Directory from this server. You may get an error the first time... it's okay... just run it again. It's complaining that services didn't stop or start in a timely fashion, which is normal for a DC. Reboot once you are done.

3) Uninstall DNS and WINS using Add/Remove Programs. Also, if you are running DHCP, you may want to start up a split scope on another server, as you should disable the DHCP scope on this server for the time being.

4) Change the DNS settings on the server you wish to P2V so that its primary DNS points to a VALID server other than itself... otherwise, your P2V will fail immediately. This is because the server being virtualized needs to be able to resolve the DNS names of the destination ESX server and VirtualCenter server. Change the DNS for your VirtualCenter server and ESX server so that they are all pointing to the same DNS server, just to be safe.

5) Run the P2V, it should go off without a hitch now that DNS is right.

6) Now that the server is back up and virtual, run dcpromo again to make it a domain controller once more. This should reinstall DNS for you, if you uninstalled it earlier. Fix up DHCP and WINS, and you are back in business. Don't forget to transfer back your FSMO roles, if you want them to be on your virtual server!

7) Readjust your DNS settings on the re-promoted domain controller to point to itself, and fix your VirtualCenter/ESX server DNS settings too, if you had to change them.
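Since step 4 is where a P2V usually falls over, it's worth checking name resolution from the source server before you kick off the conversion. A minimal Python sketch, assuming you can run Python on (or near) the server being converted; the hostnames in the usage comment are placeholders.

```python
import socket


def unresolved_hosts(hostnames):
    """Return the subset of hostnames that fail DNS resolution."""
    failed = []
    for name in hostnames:
        try:
            socket.getaddrinfo(name, None)
        except socket.gaierror:
            failed.append(name)
    return failed


# Example usage (placeholder names for the ESX and VirtualCenter servers):
# bad = unresolved_hosts(["esx01.domain.local", "vcenter.domain.local"])
# print("All good" if not bad else f"Cannot resolve: {bad}")
```

An empty list means the names resolve from where you ran it. It doesn't prove the ESX and VirtualCenter servers can resolve each other, so check from those too.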

Thursday, March 31, 2011

Configuring Dell PowerConnect 6224 Switches for iSCSI traffic to a Compellent

This is the second time I have set up a Compellent, and this time I figured I would go all the way down the rabbit hole and delve into switch optimization for iSCSI traffic. My last project was with Cisco switches; this time, it's with Dell. A pair of PowerConnect 6224 switches to be precise... main reason being, Dell purchased Compellent not long ago, so like products, yadda yadda. But I digress.

In previous talks with Compellent, I was given the '10 commandments' of switch configuration by at least two different representatives after having iSCSI issues. These commandments roughly consisted of:



• Gigabit Full Duplex connectivity between Storage Center and all local Servers
• Auto-Negotiate for all switches that will correctly negotiate at Gigabit Full Duplex
• Gigabit Full Duplex hard set for all iSCSI ports, for both Storage Center and Servers for switches that do not correctly negotiate
• Bi-Directional Flow Control enabled for all Switch Ports that servers or controllers are using for iSCSI traffic.
• Bi-Directional Flow Control enabled for all Server Ports used for iSCSI (Storage Center and QLogic HBA's automatically enable it).
• Bi-Directional Flow Control enabled for all ports that handle iSCSI traffic. This includes all devices in between two sites that are used for replication.
• Separate VLAN for iSCSI.
• Two separate networks or VLANs for multipathed iSCSI.
• Two separate IP subnets for the separate networks or VLANs in multipathed iSCSI.
• Unicast storm control disabled on every switch that handles iSCSI traffic.
• Multicast disabled at the switch level for any iSCSI VLANs.
o Multicast storm control enabled (if available) when multicast cannot be disabled.
• Broadcast disabled at the switch level for any iSCSI VLANs.
o Broadcast storm control enabled (if available) when broadcast cannot be disabled.
• Routing disabled between regular network and iSCSI VLANs.
• Do not use Spanning Tree (STP or RSTP) on ports which connect directly to end nodes (the server or Compellent controller iSCSI ports). If you must use it, enable the Cisco PortFast option on these ports so that they are configured as edge ports.
• Ensure that any switches used for iSCSI are of a non-blocking design.
• When deciding which switches to use, remember that you are running SCSI traffic over them. Be sure to use quality, managed, enterprise-class networking equipment. It is not recommended to use SBHO (small business/home office) class equipment outside of lab/test environments.
• Verify optimal MTU for replications. Default is 1500, but sometimes WAN circuits or VPNs can create additional overhead which can cause packet fragmentation. This fragmentation can cause suboptimal performance. The MTU is adjustable via the GUI in 5.x Storage Center Firmware.
For Jumbo Frame Support
• Some switches have limited buffer sizes and can only support Flow Control or Jumbo Frames, but not both at the same time. Compellent strongly recommends choosing Flow Control.
• All devices connected through iSCSI need to support 9k jumbo frames.
• All devices used to connect iSCSI devices need to support it.
o This means every switch, router, WAN Accelerator and any other network device that will handle iSCSI traffic needs to support 9k Jumbo Frames.
• If the customer is not 100% positive that every device in their iSCSI network supports 9k Jumbo Frames, then they should NOT turn on Jumbo Frames.
• QLogic 4010 series cards (Early Compellent iSCSI Cards) do not support Jumbo Frames.
o In the Storage Center GUI default screen, expand the tree in the following order Controllers->SN#(for the controller)->IO Cards->iSCSI->Highlight the port and the general tab should list the model number in the description.
• Because devices on both sides (server and SAN) need Jumbo Frames enabled, any change to enable or disable Jumbo Frames should be made during a maintenance window. If servers have it enabled first, the Storage Center will not understand their packets. If Storage Center enables it first, servers will not understand its packets.



Okay, so it was more than 10. Anyway, all that roughly distills down to the following needs:

1) Set spanning-tree mode rstp and enable portfast on all iSCSI ports, or disable spanning-tree altogether.
2) Enable jumbo frame support on all iSCSI ports
3) Disable unicast storm-control on all iSCSI ports
4) Enable multicast storm-control on all iSCSI ports
5) Enable broadcast storm-control on all iSCSI ports
6) Enable flow control

For each switch, the commands are different... but for Dell PowerConnect 6224 in particular, these needs translated into the following commands (in order, considering I used ports 1-12 for iSCSI)

enable
configure
spanning-tree mode rstp
interface 1/g1-1/g12
spanning-tree portfast
mtu 9216
no storm-control unicast
storm-control multicast
storm-control broadcast
exit
flowcontrol

That was it, switch optimized, on we go!
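If you have more than one switch (or a different port range) to do, the listing above is easy to template. Here's a small Python sketch that just reproduces those same lines for a given port range. It mirrors the commands exactly as I typed them in this post, so check the interface range syntax against your switch's firmware before pasting; the function name and the unit parameter are my own additions.

```python
def iscsi_switch_commands(first_port: int, last_port: int, unit: int = 1):
    """Return the command lines from this post for ports first..last."""
    return [
        "enable",
        "configure",
        "spanning-tree mode rstp",
        f"interface {unit}/g{first_port}-{unit}/g{last_port}",
        "spanning-tree portfast",
        "mtu 9216",
        "no storm-control unicast",
        "storm-control multicast",
        "storm-control broadcast",
        "exit",
        "flowcontrol",
    ]


# Ports 1-12, as in my setup
print("\n".join(iscsi_switch_commands(1, 12)))
```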

Tuesday, February 8, 2011

How to fix a blackberry sync issue (the secret CNFG menu)

So the title is a bit cryptic, but I ran into an issue where a certain BlackBerry decided to no longer sync its contacts with our newly migrated BES server. No idea why. I tried rebooting the phone, toggling wireless sync of contacts, and so on... no luck. The user had a bunch of stuff on the phone, so I didn't particularly want to wipe it or reactivate it. I found this little tidbit of info on Experts Exchange at:

http://www.experts-exchange.com/Hardware/Handhelds_-_PDAs/Blackberry/Q_23024284.html

All you need to do is go to Options > Advanced Options > Enterprise Activation and, in the email field, press and hold the ALT key and type CNFG. A hidden menu will appear. Change "Wireless Sync" to No, exit the menu, wait 30 seconds, then repeat the process and turn sync back to Yes. Once you've changed this setting, a slow sync will start automatically and repair all the wireless sync settings. In the rare case that this fails, you'll need to wipe the device and reactivate it.

Monday, February 7, 2011

Moving ESXi guests without vCenter

vCenter is a very cool application, enabling you to do many things that would otherwise take hours to figure out and execute, especially in a multi-host environment (which is pretty typical for VMware). One such feature is the ability to move guest VMs between hosts. Using vCenter, it's as simple as right clicking the guest and picking 'Migrate', then picking the new destination and waiting. But what if you don't HAVE vCenter, or in my case, the server you want to move IS your vCenter server? Not so easy then!

Veeam to the rescue! If you don't know what Veeam is, you haven't been working with VMware very long. It's a great company that produces lots of useful software for managing VMware environments... and some of their best tools are free! In this case, their free product FastSCP is what we are after. Register for it and install it on any server other than the one you want to move. Setup is simple, and adding hosts is even simpler. Once you add a host, you can expand its storage just like the storage browser in vCenter or the vSphere client, and moving servers is just a simple copy/paste operation. Well, okay, there are a few more steps:

1) If the server you are moving is part of a vCenter environment, it is highly recommended that you unregister it first. Even if it's not, you should unregister it anyway. Either from vCenter, or using vSphere to connect directly to the host, shut down the guest machine then right click and select 'Remove from Inventory'. Don't click delete! It will warn you about some stuff, just click yes and proceed. You'll notice the VM is no longer listed under your host. It's now just a bunch of files in a folder on the datastore.

2) Start up Veeam, and add the source and destination hosts to the manager if you haven't already. Expand the source host, right click on the folder for the virtual machine you want to move, and click copy. Then, expand the destination host, and right click in the right hand side's empty area, and click paste.

NOTE: If you installed Veeam on an x64 server, you may get errors when trying to copy and paste folders, or create new folders. It's a bug, and Veeam warns that x64 installations are purely experimental at this point. There are two workarounds... either create the folder first using vSphere by browsing to the datastore on the host and making a new folder, then copy only files from one folder to another, or follow this method to fix up your server so that you won't get that error:

http://www.virtualvcp.com/content/view/26/1/ details below

1. Download the Microsoft .NET Framework 2.0 SDK (THE 64-BIT VERSION!) from the Microsoft Website. This download is about 300MB if I remember correctly.
2. Install the SDK on the 64-bit machine that you would like to run Veeam FastSCP on.
3. Now, open a command prompt (Start -> Run -> Type "cmd" -> OK)
4. Change directory to: C:\Program Files\Microsoft.NET\SDK\v2.0 64bit\Bin
5. Now Run: corflags "C:\Program Files (x86)\Veeam\Veeam Backup and FastSCP\VeeamShell.exe" /32BIT+
6. Now when you try and run Veeam FastSCP again, it should work fine.


3) Anyway, now that the files are moved, all you need to do is register the virtual machine on the new host. Open up vSphere, connect directly to the destination host, then browse the datastore. Find the .vmx file inside of the folder you copied and right click on it, then select 'Import' to add it to your inventory. You may get asked something to the effect of "Where did this virtual machine come from?"... in our case, you will want to pick "I moved it".

That's it! Start up your server, and it should work fine. Once you've confirmed the copy finished cleanly and the VM runs on the new host, delete the source... migration complete!