Service Monitoring
Server: signal.gnome.org, provided by OSUOSL.
URL: nagios.gnome.org
Software used: Nagios
Configuration files: /etc/nagios/*, /etc/check_mk, /etc/nrpe.d
Puppet module: modules/check-mk/*
It is intended to monitor as many important aspects of the GNOME servers and their services as possible, and to alert us immediately should a server or one of its services become unavailable. It also alerts on conditions that may indicate an imminent service failure, such as a server running out of disk space, an abnormal load, or an unusually high number of open socket connections.
Nagbot
Nagbot is a supybot (enhanced with the notify plugin) running on signal. It sends notifications directly to the #sysadmin IRC channel when something goes wrong with any of the monitored services. The username and password to access it are available at combobox:/home/admin/secret/nagbot.
XMPP Notifications
XMPP notifications for Nagios are handled by a script that is triggered whenever Nagios sends out a notification for any of the hosted services. More information about how this was set up is available here.
If you want to enable XMPP Notifications for your e-mail, do the following:
- cd puppet/modules/check-mk/nagios
- Modify contacts.cfg and add the following entry. When done, commit and push to the Puppet repository:
define contact {
    contact_name                    username_xmpp
    use                             generic-contact
    alias                           Full Name
    email                           your-email@gmail.com
    pager                           your-email@gmail.com
    service_notification_commands   notify-by-xmpp
    host_notification_commands      notify-by-xmpp
}
Then add the newly created contact to the admins contactgroup:
define contactgroup {
    contactgroup_name   admins
    alias               Nagios Administrators
    members             user1, user2, username_xmpp
}
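The two edits above can be sketched in Python. This is only an illustration, not part of the Puppet module: the helper names `render_contact` and `add_member` are hypothetical, and the field values are the placeholders from the entry above.

```python
def render_contact(contact_name, full_name, address):
    """Return a Nagios contact definition that notifies over XMPP."""
    fields = [
        ("contact_name", contact_name),
        ("use", "generic-contact"),
        ("alias", full_name),
        ("email", address),
        # The pager field carries the XMPP address, as in the entry above.
        ("pager", address),
        ("service_notification_commands", "notify-by-xmpp"),
        ("host_notification_commands", "notify-by-xmpp"),
    ]
    # Pad the directive names so the values line up in one column.
    width = max(len(key) for key, _ in fields) + 2
    body = "\n".join(f"    {key.ljust(width)}{value}" for key, value in fields)
    return "define contact {\n" + body + "\n}\n"


def add_member(members_value, contact_name):
    """Append contact_name to a comma-separated members value, only once."""
    members = [m.strip() for m in members_value.split(",") if m.strip()]
    if contact_name not in members:
        members.append(contact_name)
    return ", ".join(members)


if __name__ == "__main__":
    print(render_contact("username_xmpp", "Full Name", "your-email@gmail.com"))
    print(add_member("user1, user2", "username_xmpp"))
```

The generated text still has to be pasted into contacts.cfg by hand and committed to the Puppet repository as described above.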
P.S. If you want to add yourself only for e-mail notifications and not for XMPP ones, look at the existing entries for an example.
NRPE
Note: the nrpe puppet module handles all the following steps automatically. These instructions are left here for documentation purposes.
To add a new host and monitor it through NRPE, do the following:
1. Install the required packages:
yum install nagios-plugins-load nagios-plugins-swap nagios-plugins-users nagios-plugins-time nagios-plugins-ssh nagios-plugins-ntp nagios-plugins-procs nrpe
2. Copy the following template to /etc/nrpe.d/nrpe_local.cfg:
command[check_mysql]=/usr/lib64/nagios/plugins/check_mysql
command[service_puppet]=/usr/lib64/nagios/plugins/check_procs -c 1:4 -a 'puppetd'
command[ntpd]=/usr/lib64/nagios/plugins/check_procs -c 1:4 -a 'ntpd'
command[swap_usage]=/usr/lib64/nagios/plugins/check_swap -w 90% -c 50%
3. Add nagios.gnome.org's IP address to the allowed hosts in /etc/nagios/nrpe.cfg:
allowed_hosts=127.0.0.1,140.211.166.75
4. Open the relevant port in iptables:
-A INPUT -s 140.211.166.75 -m state --state NEW -m tcp -p tcp --dport 5666 -j ACCEPT
5. Restart NRPE:
service nrpe restart
6. Add a new host definition to /etc/nagios3/conf.d/hosts.cfg on signal.gnome.org, then restart nagios3:
define host {
    use                 generic-host
    host_name           extensions
    check_command       check_ssh
    max_check_attempts  10
    alias               extensions
    address             209.132.180.183
    hostgroups          redhat-servers, web-servers, virtual-machines
    parents             vbox
}
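After steps 3 to 5, it can be useful to confirm from the monitoring server that the NRPE port is actually reachable. The following is a minimal Python sketch under that assumption. The helper name `port_open` is hypothetical, and a successful TCP connection only shows that the firewall rule and the listener are in place; it does not verify the allowed_hosts setting.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and applies the timeout for us.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

Run on signal.gnome.org, a call such as `port_open("209.132.180.183", 5666)` would check the example host defined above. The same helper works for the check_mk agent port (6556).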
CHECK_MK
Note: the check-mk::client puppet module handles all the following steps automatically. These instructions are left here for documentation purposes.
Setting up check-mk is easy:
1. Open the relevant port in iptables on the client host:
-A INPUT -s 140.211.166.75 -m state --state NEW -m tcp -p tcp --dport 6556 -j ACCEPT
2. Install the check-mk-agent package on the client host, using whichever package manager the host provides:
yum install check-mk-agent
or
apt-get install check-mk-agent
3. On nagios.gnome.org, run an inventory of the services available on the listening hosts:
sudo check_mk -II
4. Add the new host to all_hosts in /etc/check_mk/main.mk:
all_hosts = [ 'localhost', [...], 'ostree.gnome.org', 'live.gnome.org']
5. Recreate the Nagios configuration. The -O flag does everything for you, including refreshing the nagios3 service:
sudo check_mk -O
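To see what a newly installed agent reports before running the inventory, its output can be read directly from port 6556. The sketch below assumes the agent answers in plain text and marks each section with a `<<<name>>>` header line. The function names `read_agent_output` and `section_names` are hypothetical and not part of check_mk.

```python
import re
import socket

# A section header is a line of the form <<<name>>>.
SECTION_RE = re.compile(r"^<<<([^>]+)>>>\s*$")


def read_agent_output(host, port=6556, timeout=10.0):
    """Connect to a check_mk agent and return everything it sends as text."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        while True:
            data = conn.recv(4096)
            if not data:
                # The agent closes the connection once it has sent its report.
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def section_names(agent_output):
    """Return the section names found in agent output, in order of appearance."""
    names = []
    for line in agent_output.splitlines():
        match = SECTION_RE.match(line)
        if match:
            names.append(match.group(1))
    return names
```

From nagios.gnome.org, `section_names(read_agent_output("ostree.gnome.org"))` would list the sections the agent on that host exposes. The connection must come from an address allowed by the iptables rule in step 1.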