/var/log/metasplo.it: January 2013

I really do love the Citrix Xendesktop platform and the products associated with it, but all too often Citrix have launch issues with their products. The latest is the Xentools 6.1 installer, which is a little dodgy and is missing key features (no PVS/VSS support).

I also experienced a number of issues upgrading from Xentools standard 6.0 to 6.1 on machines that didn't require PVS support.

One of the main issues that I had is what Citrix call "continuous reboots with standard tools installation never finishing". Citrix provide the following explanation for the problem:

"This issue occurs when attempting to install the Standard Tools shipped with XenServer 6.1 into a VM that has no virtual network interfaces. A workaround is to create at least one virtual network interface, install the Standard Tools, and then remove the virtual network interface (if so desired)."

In my case there was in fact a virtual network interface and I was still stuck in the reboot loop. Eventually, after around 10 reboots, the installer simply hangs at the very start of the "installing drivers, installing guest tools" screen and goes no further.

The Windows event logs give no clues and the Citrix logs don't contain any useful information. After following the "Uninstalling the Xenserver 6.1 standard tools" steps from the CTX135099 Xenserver Tools Workaround Guide for 6.1.0, including removing all the Windows Driver packages manually, I still had no joy. In fact on some systems I couldn't remove the "Windows Driver Package - Citrix Systems Inc. (xennet) Net" package.

After scouring Windows Programs and Features and removing everything guest/driver related, I fell back to the trusted "wmic product get name" command to get a list of installed products. I found "Citrix Xen Windows x64 PV Drivers", which wasn't listed in the Programs and Features GUI. It seems the "Citrix Xen Windows x64 PV Drivers" package failed to uninstall or install properly and was holding up the xentools installation process. Resolving the problem is fairly simple:

Boot into Windows and open a command prompt
Issue the command - wmic product where name='Citrix Xen Windows x64 PV Drivers' call uninstall
Reboot
Rerun the xentools installation process
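
If you have a number of machines to check, the "wmic product get name" output can be searched for leftovers rather than read by eye. The following is only a rough sketch: the find_products helper and the sample output are made up for illustration, and real wmic output will differ from machine to machine.

```python
from typing import List


def find_products(wmic_output: str, keyword: str) -> List[str]:
    """Return product names from `wmic product get name` output containing keyword."""
    names = []
    for line in wmic_output.splitlines():
        name = line.strip()
        # Skip blank lines and the "Name" column header.
        if not name or name == "Name":
            continue
        # Case-insensitive substring match on the product name.
        if keyword.lower() in name.lower():
            names.append(name)
    return names


# Hypothetical sample output, for illustration only.
sample = """Name
Some Other Installed Product
Citrix Xen Windows x64 PV Drivers

"""

leftovers = find_products(sample, "citrix")
print(leftovers)
```

Anything this turns up that isn't visible in Programs and Features is a candidate for the wmic uninstall command in the steps above.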

After following the above steps I finally had a working Xentools standard installation, with statistics being reported back to Xencenter.

Remember, if you want to install the legacy tools (with support for PVS and volume shadow copy) then your VM must have its platform:device_id set to 0001. You can read more about changing the device_id in the section titled "Preparing to Install the XenServer 6.0.2 Hotfix 9 Tools or XenServer 6.1 Legacy Tools in a New Windows Vista, Windows 7, Windows Server 2008, or Windows Server 2008 R2 VM (for PVS or VSS Support)" of CTX135099.

I am not the biggest fan of HP PCM. Even the paid version, PCM+, is very clunky, has reasonably limited functionality and is expensive for what it is.

We recently had a situation where one of the power supplies on an edge switch died and we didn't receive any notifications, as HP PCM failed to notify us. Luckily it was noticed quickly and there was no associated downtime, but things could have been much worse. This spurred me into action: what other options are available, and how could I implement them?

My main goal is to receive all the warning/critical traps from my switches. I will also do some SNMP polling to log statistics such as CPU usage and chassis temperature. The beauty of traps is they are real time, so you don't have to wait for a polling interval to find out your chassis is too hot or one of the power supplies just died.

SCOM 2012 is not quite ready

One of the most anticipated network monitoring solutions was the native SNMP engine in Microsoft SCOM 2012. I could go on forever about the SCOM 2012 SNMP implementation, but let me point out a few key problems and reasons why I decided on Nagios over SCOM.

1. Every device you receive traps from must be monitored. This doesn't sound like a big deal, and for most network devices it's not, but what you may not realize is that for SCOM to receive traps from a device it monitors, that device must be pollable via SNMP. So take your average Unix box: you may want to send traps from it without running an SNMP daemon, and that is not possible with SCOM 2012.

2. Another monitoring limitation is that you can only receive traps in the same version of SNMP that SCOM is monitoring the device with. So if you are monitoring a Unix box with SNMP v3 for security and want to send traps via v1, this is not possible. Furthermore, some HP switches (5400, 26XX series) send traps in v1, and there's no way I am going to monitor switches in v1, even in read-only mode.

These are two key deal breakers that make Nagios a much more appealing option.

But isn't Nagios difficult to set up?

While Nagios can be a pain to initially learn and set up, once you have everything in place it really is easy to manage and maintain. I decided to go with Centreon, the Nagios front end GUI, which not only makes Nagios configuration easier but also gives some extra options. An even easier option is to use the Fully Automated Nagios (FAN) distribution, which as its name suggests is a pre-configured Nagios/Centreon distro.

I am not going to do a step by step guide on setting up FAN, Nagios or Centreon, but I do have some very useful information for those wanting to collect traps from HP switches with FAN.

The difference between trapping and polling

When collecting traps a "TRAP OID" is used.

When polling an HP switch for information with SNMP, you use a separate OID for every statistic you want to collect (CPU usage, chassis temperature, etc). When receiving information via traps, all information is received on the single trap OID.

This can create challenges around how to trigger alerts based on the text sent with the trap. Chances are if you are trapping "not-info" traps (meaning the switch only sends warning/critical traps) you may want to receive notifications for every trap. However if you do get repeated traps that you don't wish to receive notifications for, there is no way to ignore traps based on text with a default Centreon/Nagios "Catch all traps receiver".

Trapping not-info events on an HP switch can be enabled by one of the following commands, depending on your switch model:

snmp-server host 192.168.1.1 community public trap-level not-info
snmp-server host 192.168.1.1 public not-info

Receiving and processing traps with Centreon

By default Centreon only catches traps that have MIBs configured, however there is a simple guide on the Centreon wiki that should get you all set up. That guide helps you create a generic OID that Centreon assigns to any unknown/unmatched trap it receives, allowing you to generate alerts based on those traps.

What you will probably want to do is also configure custom traps for the OIDs that each of your switches use to send traps. This will give you some granularity in terms of notifications and filtering out traps that aren't important to that specific switch or switch model. You can do this in Centreon under Configure > Services > SNMP Traps. Below is an example of one of the traps I use for HP switches.

Set the "Trap Name" as you wish, I used "CUSTOM-Switch-HP-Traps_1_5400"
Set the "OID" as the OID your switch uses to send traps. I have listed the trap OIDs I have discovered below.
Set a "Vendor name" as you wish, obviously HP Networks might be suitable for any HP devices.
Set the "Output Message" as $* - This will ensure you receive all of the text in the trap in your notification.
Set your "Default Status" as warning or critical, depending on how important it is in your environment.
Tick "Submit result" so the status is passed to Nagios.
Save the trap, then click on Configuration > Nagios > SNMP traps > Generate. (This last step is important, without it the SNMP daemon doesn't receive the updated trap.)

That is a very basic trap definition that you can then attach to a passive Centreon service and receive notifications as the traps arrive.

List of HP trap OIDs

These are the trap OIDs I found while testing different HP switches. I would welcome any others that I may have missed. These can be used in conjunction with the above Centreon custom trap guide.

.1.3.6.1.4.1.11.2.3.7.11.50.0.2 - 5406 trap OID
.1.3.6.1.4.1.11.2.3.7.11.51.0.2 - 5412 trap OID
.1.3.6.1.2.1.105.0.1 - 5400 some POE trap OID
.1.3.6.1.4.1.11.2.3.7.11.87.0.2 - 2610al trap OID
.1.3.6.1.4.1.11.2.3.7.11.44.0.2 - 2650 trap OID
.1.3.6.1.4.1.11.2.3.7.11.76.0.2 - 2610-24 trap OID
.1.3.6.1.4.1.11.2.3.7.11.23.0.2 - 4180-gl trap OID
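
If you want to use this list in your own scripts (for example to label incoming traps by switch model), it can be kept as a simple lookup table. This is just a sketch: the HP_TRAP_OIDS and model_for_trap names are my own for illustration and are not part of Centreon or Nagios. The model labels are copied straight from the list above.

```python
# Trap OID -> switch model, copied from the list above.
HP_TRAP_OIDS = {
    ".1.3.6.1.4.1.11.2.3.7.11.50.0.2": "5406",
    ".1.3.6.1.4.1.11.2.3.7.11.51.0.2": "5412",
    ".1.3.6.1.2.1.105.0.1": "5400 (POE)",
    ".1.3.6.1.4.1.11.2.3.7.11.87.0.2": "2610al",
    ".1.3.6.1.4.1.11.2.3.7.11.44.0.2": "2650",
    ".1.3.6.1.4.1.11.2.3.7.11.76.0.2": "2610-24",
    ".1.3.6.1.4.1.11.2.3.7.11.23.0.2": "4180-gl",
}


def model_for_trap(oid: str) -> str:
    """Return the switch model for a trap OID, or "unknown" if it is not listed."""
    # Normalise so OIDs written with or without the leading dot both match.
    key = "." + oid.strip().lstrip(".")
    return HP_TRAP_OIDS.get(key, "unknown")


print(model_for_trap("1.3.6.1.4.1.11.2.3.7.11.51.0.2"))
```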

Adding filtering capability to Centreon traps

So you have Centreon set up, you are receiving traps from your HP switches and they are triggering Nagios service changes and notifications. Unfortunately you have this one annoying trap that triggers twice a day as a warning trap and you get a notification every single time. What can you do to stop it? Well, with a default Centreon install, nothing. But since when did we do anything default?

Just so you understand the back end a little better, the traps flow as follows.

1. Your switch or device generates a trap and sends it to the SNMP daemon running on the Centreon/Nagios box.

2.
(a) If the trap is known the daemon forwards it to the /usr/share/centreon/bin/centTrapHandler-2.x.

(b) If the trap is unknown and you have not configured a catch-all trap, the trap is dropped.

(c) If the trap is unknown and you have configured a catch-all trap, the trap is forwarded to the Centreon unknown trap handler (/usr/share/centreon/bin/snmptt2TrapHandler.pl).

It is then given the generic Centreon trap OID of .1.3.6.1.4.1.2021.13.990.0.17 and passed back to centTrapHandler-2.x.

3. centTrapHandler-2.x does the processing of the trap and passes valid information to any matching Centreon services and then Nagios.

The default centTrapHandler has no way of filtering any traps, all it cares about is matching the trap to a service and changing the status of the service.
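
To make the flow above a bit more concrete, here is the routing decision from step 2 written out as a small function. This is only a model of the behaviour described in this post, not Centreon's actual code, and the route_trap name and its parameters are made up for illustration.

```python
from typing import Optional, Set

# Generic OID Centreon gives to unknown traps (from step 2c above).
GENERIC_TRAP_OID = ".1.3.6.1.4.1.2021.13.990.0.17"


def route_trap(oid: str, known_oids: Set[str], catch_all: bool) -> Optional[str]:
    """Return the OID that reaches centTrapHandler, or None if the trap is dropped."""
    if oid in known_oids:
        # 2(a): known traps go to the trap handler unchanged.
        return oid
    if not catch_all:
        # 2(b): unknown trap and no catch-all configured, so it is dropped.
        return None
    # 2(c): unknown trap with a catch-all, so it is re-labelled with the generic OID.
    return GENERIC_TRAP_OID


known = {".1.3.6.1.4.1.11.2.3.7.11.50.0.2"}
print(route_trap(".1.3.6.1.4.1.11.2.3.7.11.50.0.2", known, catch_all=False))
print(route_trap(".1.3.6.1.4.1.11.2.3.7.11.44.0.2", known, catch_all=True))
```

Notice there is no branch anywhere that looks at the trap text, which is exactly the gap the modified handler fills.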

I have made some minor modifications to the /usr/share/centreon/bin/centTrapHandler-2.x which is the trap handler Centreon uses before generating Nagios alerts. These modifications allow traps to be discarded under specific circumstances and also allow for logging of the discarded traps.

You will need my centTrapHandler-2.x.patch, available from my github here, before you get started.

Modifying the centTrapHandler-2.x to allow trap filtration

Download the centTrapHandler-2.x.patch
Backup the current centTrapHandler:
cd /usr/share/centreon/bin ; cp centTrapHandler-2.x centTrapHandler-2.x.bak
Patch the centTrapHandler:
patch centTrapHandler-2.x < centTrapHandler-2.x.patch
Create a new logging directory for ignored trap logging:
mkdir /var/log/snmp

The new trap handler is now ready to use.

Filtering known traps with Centreon

Now that you have added the capability to filter traps, there are a few things you need to know before applying filters. All of the following need to be true for a trap to be filtered.

The service must be passive.
The service must contain the key word "_LOG". You can change the key word in /usr/share/centreon/bin/centTrapHandler-2.x by searching for /_LOG/ and replacing it with /yourKeyWordHere/
cust_unknownSkipEnable must be set to 1 in /usr/share/centreon/bin/centTrapHandler-2.x. This gives you an easy way to enable and disable the filtering as required.
The trap must trigger as status "Unknown", we can do this by using the "Advanced Rule matching" capabilities of Centreon trap definitions.

If you have existing services you will need to rename them with _LOG in the title to support trap filtering.
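
Put together, the conditions above amount to a single yes/no check per trap. The sketch below models that check in Python. It is not the patched Perl handler itself: the Service class and should_discard function are hypothetical, and I am assuming here that the "_LOG" key word is matched against the service name.

```python
from dataclasses import dataclass


@dataclass
class Service:
    name: str       # Centreon service name (assumed to be where "_LOG" is matched)
    passive: bool   # True if the service is a passive check


def should_discard(service: Service, status: str,
                   skip_enabled: bool = True, keyword: str = "_LOG") -> bool:
    """Return True only when every filtering condition listed above holds."""
    return (
        skip_enabled                 # cust_unknownSkipEnable set to 1
        and service.passive          # the service must be passive
        and keyword in service.name  # the key word must be present
        and status == "Unknown"      # the trap must trigger as Unknown
    )


svc = Service(name="HP-Switch-Traps_LOG", passive=True)
print(should_discard(svc, "Unknown"))
print(should_discard(svc, "Warning"))
```

If any one of the four conditions fails, the trap is processed as normal and you will still get your notification.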

To set up the filters you need to do the following:

Go to your Centreon trap definitions:
Centreon > Configuration > Services > SNMP Traps.
Open the trap in question. This should be one of the custom traps associated with the trap OID of your switch that you created.
Enable "Advanced matching mode" and create a new "Advanced matching rule" with the following properties.
String: @OUTPUT@
Regexp: /your matching text here/ - For example: /port (.*) is Blocked by STP/ to block an STP warning
Status: Unknown
Save the changes
Export the trap definition to Nagios.
Configuration > Nagios > SNMP traps > Generate
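
It is worth testing your regexp against some real trap text before relying on it. The sketch below does that with Python's re module. Keep in mind Centreon's handler is Perl, so this is only a reasonable check for simple patterns like the STP example, and both the matches_rule helper and the sample trap text are made up for illustration.

```python
import re


def matches_rule(centreon_regexp: str, trap_text: str) -> bool:
    """Check trap text against a regexp written the way it goes in the Centreon form."""
    pattern = centreon_regexp.strip()
    # Strip the surrounding slashes used in the Centreon "Regexp" field.
    if len(pattern) >= 2 and pattern.startswith("/") and pattern.endswith("/"):
        pattern = pattern[1:-1]
    # Unanchored search, so the text can appear anywhere in the trap output.
    return re.search(pattern, trap_text) is not None


rule = "/port (.*) is Blocked by STP/"
# Hypothetical trap text; copy a real line from your own notifications instead.
print(matches_rule(rule, "ports: port 12 is Blocked by STP"))
print(matches_rule(rule, "ports: port 12 is now on-line"))
```

The match is case sensitive, so copy the text exactly as the switch sends it.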

Now when your matching text is received by the trap handler, it will be matched, set as Unknown and then dropped and logged by the centTrapHandler.

If you need to troubleshoot trap filtering you can check the logs in /var/log/snmp: snmptrap_ignored.log contains your matched/ignored traps and snmptrap_logging.log logs all traps received by passive services.

Thursday 10 January 2013

Citrix Xenserver 6.1 Xentools installation problems

Friday 4 January 2013

Mass local administrator password change tool for Windows servers/desktops

Thursday 3 January 2013

Monitoring HP switches with Nagios - OIDs and Trap Filtering