I am not the biggest fan of HP PCM. Even the paid version, PCM+, is very clunky, has fairly limited functionality and is expensive for what it is.
We recently had a situation where one of the power supplies on an edge switch died and HP PCM failed to notify us. Luckily it was noticed quickly and there was no associated downtime, but things could have been much worse. This spurred me into action: what other options are available, and how could I implement them?
My main goal is to receive all the warning/critical traps from my switches. I will also do some SNMP polling to log statistics such as CPU usage and chassis temperature. The beauty of traps is they are real time, so you don't have to wait for a polling interval to find out your chassis is too hot or one of the power supplies just died.
SCOM 2012 is not quite ready
One of the most anticipated network monitoring solutions was the native SNMP engine in Microsoft SCOM 2012. I could go on forever about the SCOM 2012 SNMP implementation, but let me point out a few key problems and reasons why I decided on Nagios over SCOM.
1. Every device you receive traps from must be monitored. This doesn't sound like a big deal, and for most network devices it's not. What you may not realize is that for SCOM to receive traps from a device it monitors, that device must be pollable by SNMP. Take your average Unix box: you may want to send traps from it without running an SNMP daemon, and that is not possible with SCOM 2012.
2. Another monitoring limitation is that you can only receive traps in the same version of SNMP that SCOM uses to monitor the device. So if you are monitoring a Unix box with SNMP v3 for security and want to send traps via v1, this is not possible. Furthermore, some HP switches (5400, 26XX series) send traps in v1, and there's no way I am going to monitor switches in v1, even in read-only mode.
These are two key deal breakers that make Nagios a much more appealing option.
But isn't Nagios difficult to set up?
While Nagios can be a pain to learn and set up initially, once you have everything in place it really is easy to manage and maintain. I decided to go with Centreon, the Nagios front-end GUI, which not only makes Nagios configuration easier but also gives some extra options. An even easier option is to use the Fully Automated Nagios (FAN) distribution, which as its name suggests is a pre-configured Nagios/Centreon distro.
I am not going to do a step-by-step guide on setting up FAN, Nagios or Centreon, but I do have some very useful information for those wanting to collect traps from HP switches with FAN.
The difference between trapping and polling
When collecting traps, a "TRAP OID" is used.
When polling an HP switch for information with SNMP, you use a separate OID for every statistic you want to collect (CPU usage, chassis temperature, etc). When receiving information via traps, all information is received on the single trap OID.
This can create challenges around how to trigger alerts based on the text sent with the trap. Chances are that if you are trapping "not-info" traps (meaning the switch only sends warning/critical traps) you will want to receive notifications for every trap. However, if you get repeated traps that you don't wish to be notified about, there is no way to ignore traps based on their text with a default Centreon/Nagios "Catch all traps receiver".
Trapping not-info events on an HP switch can be enabled by one of the following commands, depending on your switch model:
snmp-server host 192.168.1.1 community public trap-level not-info
snmp-server host 192.168.1.1 public not-info
Receiving and processing traps with Centreon
By default Centreon only catches traps that have MIBs configured; however, there is a simple guide on the Centreon wiki that should get you all set up. That guide helps you create a generic OID that Centreon assigns to any unknown/unmatched trap it receives, allowing you to generate alerts based on those traps.
What you will probably also want to do is configure custom traps for the OIDs that each of your switches uses to send traps. This will give you some granularity in terms of notifications and filtering out traps that aren't important to that specific switch or switch model. You can do this in Centreon under Configuration > Services > SNMP Traps. Below is an example of one of the traps I use for HP switches.
- Set the "Trap Name" as you wish, I used "CUSTOM-Switch-HP-Traps_1_5400"
- Set the "OID" as the OID your switch uses to send traps. I have listed the trap OIDs I have discovered below.
- Set a "Vendor name" as you wish, obviously HP Networks might be suitable for any HP devices.
- Set the "Output Message" as $* - This will ensure you receive all of the text in the trap in your notification.
- Set your "Default Status" as warning or critical, depending on how important it is in your environment.
- Tick "Submit result" so the status is passed to Nagios.
- Save the trap, then click on Configuration > Nagios > SNMP traps > Generate. (This last step is important; without it the SNMP daemon doesn't receive the updated trap.)
That is a very basic trap definition that you can then attach to a passive Centreon service and receive notifications as the traps arrive.
List of HP trap OIDs
These are the trap OIDs I found while testing different HP switches. I would welcome any others that I may have missed. These can be used in conjunction with the above Centreon custom trap guide.
.1.3.6.1.4.1.11.2.3.7.11.50.0.2 - 5406 trap OID
.1.3.6.1.4.1.11.2.3.7.11.51.0.2 - 5412 trap OID
.1.3.6.1.2.1.105.0.1 - 5400 some POE trap OID
.1.3.6.1.4.1.11.2.3.7.11.87.0.2 - 2610al trap OID
.1.3.6.1.4.1.11.2.3.7.11.44.0.2 - 2650 trap OID
.1.3.6.1.4.1.11.2.3.7.11.76.0.2 - 2610-24 trap OID
.1.3.6.1.4.1.11.2.3.7.11.23.0.2 - 4180-gl trap OID
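If you want to keep this list somewhere a script can use it, a small lookup table is one option. The following is only a sketch in Python: the OIDs and model names come from the list above, while HP_TRAP_OIDS and model_for_trap_oid are names made up for this example and are not part of Centreon or Nagios.

```python
# Trap OIDs observed on HP switches, copied from the list above.
# The dictionary and function names are hypothetical, for illustration only.
HP_TRAP_OIDS = {
    ".1.3.6.1.4.1.11.2.3.7.11.50.0.2": "5406",
    ".1.3.6.1.4.1.11.2.3.7.11.51.0.2": "5412",
    ".1.3.6.1.2.1.105.0.1": "5400 (some POE traps)",
    ".1.3.6.1.4.1.11.2.3.7.11.87.0.2": "2610al",
    ".1.3.6.1.4.1.11.2.3.7.11.44.0.2": "2650",
    ".1.3.6.1.4.1.11.2.3.7.11.76.0.2": "2610-24",
    ".1.3.6.1.4.1.11.2.3.7.11.23.0.2": "4180-gl",
}


def model_for_trap_oid(oid):
    """Return the switch model for a trap OID, or None if it is not listed.

    A leading dot is added when missing, since tools differ on whether
    they print one.
    """
    normalized = oid if oid.startswith(".") else "." + oid
    return HP_TRAP_OIDS.get(normalized)


if __name__ == "__main__":
    # A listed OID written without the leading dot, then an unlisted one.
    print(model_for_trap_oid("1.3.6.1.4.1.11.2.3.7.11.44.0.2"))
    print(model_for_trap_oid(".1.2.3"))
```

An OID that is not in the table returns None, which is a reasonable cue to let the catch-all trap deal with it.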
Adding filtering capability to Centreon traps
So you have Centreon set up, you are receiving traps from your HP switches and they are triggering Nagios service changes and notifications. Unfortunately you have this one annoying trap that triggers twice a day as a warning trap, and you get a notification every single time. What can you do to stop it? Well, with a default Centreon install, nothing. But since when did we do anything default?
Just so you understand the back end a little better, the traps flow as follows.
1. Your switch or device generates a trap and sends it to the SNMP daemon running on the Centreon/Nagios box.
2. The SNMP daemon then handles the trap in one of three ways:
(a) If the trap is known, the daemon forwards it to /usr/share/centreon/bin/centTrapHandler-2.x.
(b) If the trap is unknown and you have not configured a catch-all trap, the trap is dropped.
(c) If the trap is unknown and you have configured a catch-all trap, the trap is forwarded to the Centreon unknown trap handler (/usr/share/centreon/bin/snmptt2TrapHandler.pl).
It is then given the generic Centreon trap OID of .1.3.6.1.4.1.2021.13.990.0.17 and passed back to centTrapHandler-2.x.
3. centTrapHandler-2.x does the processing of the trap and passes valid information to any matching Centreon services and then Nagios.
The default centTrapHandler has no way of filtering traps; all it cares about is matching the trap to a service and changing the status of that service.
I have made some minor modifications to /usr/share/centreon/bin/centTrapHandler-2.x, which is the trap handler Centreon uses before generating Nagios alerts. These modifications allow traps to be discarded under specific circumstances and also allow the discarded traps to be logged.
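To make the flow above a little more concrete, here is a rough sketch of the routing decision in Python. It is a model of the steps as I have described them, not code taken from Centreon: the function route_trap and its parameters are hypothetical, and only the two handler paths and the generic OID come from the text above.

```python
# Paths and generic OID as described in the flow above.
TRAP_HANDLER = "/usr/share/centreon/bin/centTrapHandler-2.x"
UNKNOWN_HANDLER = "/usr/share/centreon/bin/snmptt2TrapHandler.pl"
GENERIC_TRAP_OID = ".1.3.6.1.4.1.2021.13.990.0.17"


def route_trap(trap_oid, known_oids, catch_all_enabled):
    """Return the (handler, oid) hops a trap takes; an empty list means dropped.

    known_oids is the set of trap OIDs that have a trap definition.
    catch_all_enabled says whether a catch-all trap has been configured.
    """
    if trap_oid in known_oids:
        # 2(a): known trap goes straight to the trap handler.
        return [(TRAP_HANDLER, trap_oid)]
    if not catch_all_enabled:
        # 2(b): unknown trap with no catch-all configured is dropped.
        return []
    # 2(c): unknown trap is relabelled with the generic OID and passed
    # back to the normal trap handler.
    return [(UNKNOWN_HANDLER, trap_oid), (TRAP_HANDLER, GENERIC_TRAP_OID)]


if __name__ == "__main__":
    known = {".1.3.6.1.4.1.11.2.3.7.11.50.0.2"}
    print(route_trap(".1.3.6.1.4.1.11.2.3.7.11.50.0.2", known, True))
    print(route_trap(".1.2.3", known, False))
    print(route_trap(".1.2.3", known, True))
```

Note that every path which is not dropped ends at centTrapHandler-2.x, which is why that is the one place worth modifying.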
Modifying the centTrapHandler-2.x to allow trap filtration
- Download the centTrapHandler-2.x.patch
- Back up the current centTrapHandler:
cd /usr/share/centreon/bin ; cp centTrapHandler-2.x centTrapHandler-2.x.bak
- Patch the centTrapHandler:
patch centTrapHandler-2.x < centTrapHandler-2.x.patch
- Create a new logging directory for ignored trap logging.
mkdir /var/log/snmp
The new trap handler is now ready to use.
Filtering known traps with Centreon
Now that you have added the capability to filter traps, there are a few things you need to know before applying filters. All of the following must be true for a trap to be filtered.
- The service must be passive.
- The service name must contain the keyword "_LOG". You can change the keyword in /usr/share/centreon/bin/centTrapHandler-2.x by searching for /_LOG/ and replacing it with /yourKeyWordHere/
- cust_unknownSkipEnable must be set to 1 in /usr/share/centreon/bin/centTrapHandler-2.x. This gives you an easy way to enable and disable the filtering as required.
- The trap must trigger with the status "Unknown". We can do this by using the "Advanced Rule matching" capabilities of Centreon trap definitions.
If you have existing services you will need to rename them with _LOG in the title to support trap filtering.
To set up the filters you need to do the following:
- Go to your Centreon trap definitions:
Centreon > Configuration > Services > SNMP Traps.
- Open the trap in question. This should be one of the custom traps associated with the trap OID of your switch that you created.
- Enable "Advanced matching mode" and create a new "Advanced matching rule" with the following properties.
String: @OUTPUT@
Regexp: /your matching text here/ - For example: /port (.*) is Blocked by STP/ to block an STP warning
Status: Unknown
- Save the changes
- Export the trap definition to Nagios.
Configuration > Nagios > SNMP traps > Generate
Now when your matching text is received by the trap handler, it will be matched, set as Unknown and then dropped and logged by the centTrapHandler.
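If it helps to see the conditions in one place, the sketch below models them in Python. This is not the patched handler (which is a Perl script) and I am assuming the patch checks exactly the four conditions listed earlier; the function names, the sample service names and the sample trap text are all made up for illustration. Python's re module treats this particular pattern the same way Perl does.

```python
import re

# Mirrors cust_unknownSkipEnable and the _LOG keyword described above.
CUST_UNKNOWN_SKIP_ENABLE = 1
KEYWORD = "_LOG"


def status_for_output(output, rules, default_status):
    """Apply advanced matching rules in order; the first matching regexp wins."""
    for pattern, status in rules:
        if re.search(pattern, output):
            return status
    return default_status


def should_discard(service_name, is_passive, status,
                   skip_enabled=CUST_UNKNOWN_SKIP_ENABLE):
    """True only when every filtering condition from the list above holds."""
    return (
        skip_enabled == 1
        and is_passive
        and KEYWORD in service_name
        and status == "Unknown"
    )


if __name__ == "__main__":
    # The rule from the example above: mark STP blocking warnings as Unknown.
    rules = [(r"port (.*) is Blocked by STP", "Unknown")]
    # Hypothetical trap text, not copied from a real switch.
    for text in ("port 12 is Blocked by STP", "Power supply 2 failed"):
        status = status_for_output(text, rules, "Warning")
        print(text, status, should_discard("HP_Traps_LOG", True, status))
```

The point of routing ignored traps through the "Unknown" status is that the normal Warning/Critical traps are untouched: anything that does not match a rule keeps its default status and is passed on to Nagios as before.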
If you need to troubleshoot trap filtering you can check the logs in /var/log/snmp: snmptrap_ignored.log contains your matched/ignored traps, and snmptrap_logging.log logs all traps received by passive services.