Service Monitoring

  • Server: signal.gnome.org, provided by OSUOSL.

  • URL: nagios.gnome.org

  • Software used: Nagios

  • Configuration files: /etc/nagios/*, /etc/check_mk, /etc/nrpe.d

  • Puppet module: modules/check-mk/*

It is intended to monitor as many important aspects of the GNOME servers and their services as possible, and to alert us immediately when a server or one of its services becomes unavailable. It should also alert on conditions that may indicate an imminent service failure, such as a server running out of disk space, abnormal load, or an unusually high number of open socket connections.
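As an illustration of the kind of check described above, here is a minimal sketch of a disk-space check that follows the standard Nagios plugin convention of returning 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The function names and the 80% / 90% thresholds are assumptions chosen for the example, not the values used on the GNOME servers.

```python
import shutil

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(used_percent, warn=80.0, crit=90.0):
    """Map a usage percentage to a Nagios status (thresholds are hypothetical)."""
    if used_percent >= crit:
        return CRITICAL
    if used_percent >= warn:
        return WARNING
    return OK


def check_disk(path="/", warn=80.0, crit=90.0):
    """Return (status, message) for the filesystem that contains `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path} reports zero size"
    used_percent = 100.0 * usage.used / usage.total
    status = classify(used_percent, warn, crit)
    return status, f"DISK {LABELS[status]} - {path} is {used_percent:.1f}% full"
```

A real plugin would print the message and exit with the status code, which is how Nagios decides whether to raise an alert.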

Nagbot

Nagbot is a supybot instance (enhanced with the notify plugin) running on signal. It sends notifications directly to the #sysadmin IRC channel when something goes wrong with any of the monitored services. The username and password to access it are available at combobox:/home/admin/secret/nagbot.

XMPP Notifications

XMPP notifications are handled by a script that is triggered whenever Nagios sends out a notification for any of the hosted services. More information about how this was set up is available here.
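The script itself is not reproduced on this page. As a rough sketch of the message-building half of such a script, the function below turns the values Nagios normally exposes through its $NOTIFICATIONTYPE$, $HOSTNAME$, $SERVICEDESC$, $SERVICESTATE$ and $SERVICEOUTPUT$ macros into a one-line message. The function name and the message layout are assumptions, and the XMPP delivery step is deliberately left out.

```python
def format_notification(notification_type, host, state, output, service=None):
    """Build a one-line alert text from Nagios notification values.

    `service` is None for host notifications. The layout is hypothetical
    and is not the format used by the real notify-by-xmpp script.
    """
    target = f"{host}/{service}" if service else host
    return f"[{notification_type}] {target} is {state}: {output}"
```

For example, `format_notification("PROBLEM", "extensions", "CRITICAL", "Connection refused", service="SSH")` yields `[PROBLEM] extensions/SSH is CRITICAL: Connection refused`.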

If you want to enable XMPP Notifications for your e-mail, do the following:

  1. cd puppet/modules/check-mk/nagios
  2. Modify contacts.cfg and add the following entry. When done, commit and push to the Puppet repository:

define contact {
        contact_name    username_xmpp
        use             generic-contact
        alias           Full Name
        email           your-email@gmail.com
        pager           your-email@gmail.com
        service_notification_commands   notify-by-xmpp
        host_notification_commands      notify-by-xmpp
}

Then add the newly created entry into the admins contactgroup:

define contactgroup {
        contactgroup_name   admins
        alias               Nagios Administrators
        members             user1, user2, username_xmpp
}

P.S. If you only want e-mail notifications and not XMPP ones, look at the existing entries for an example.
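The two edits to contacts.cfg can be sketched as a pair of small helpers: one renders a contact entry shaped like the one above, and the other appends a name to a contactgroup members list without duplicating it. Both function names are hypothetical, and the sketch only produces text; it does not touch the Puppet repository.

```python
def render_contact(contact_name, full_name, address):
    """Return a contact definition using the same fields as the entry above."""
    fields = [
        ("contact_name", contact_name),
        ("use", "generic-contact"),
        ("alias", full_name),
        ("email", address),
        ("pager", address),
        ("service_notification_commands", "notify-by-xmpp"),
        ("host_notification_commands", "notify-by-xmpp"),
    ]
    body = "\n".join(f"        {key:<32}{value}" for key, value in fields)
    return "define contact {\n" + body + "\n}\n"


def add_member(members, new_member):
    """Append `new_member` to a comma-separated members value if it is missing."""
    names = [name.strip() for name in members.split(",") if name.strip()]
    if new_member not in names:
        names.append(new_member)
    return ", ".join(names)
```

Calling `add_member("user1, user2", "username_xmpp")` returns `user1, user2, username_xmpp`, and calling it again with the result leaves the list unchanged.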

NRPE

Note: the nrpe Puppet module handles all the following steps automatically. These instructions are left here for documentation purposes.

To add a new host and monitor it through NRPE, do the following:

1. Install the required packages.

yum install nagios-plugins-load nagios-plugins-swap nagios-plugins-users nagios-plugins-time nagios-plugins-ssh nagios-plugins-ntp nagios-plugins-procs nrpe

2. Copy the following template to /etc/nrpe.d/nrpe_local.cfg:

command[check_mysql]=/usr/lib64/nagios/plugins/check_mysql
command[service_puppet]=/usr/lib64/nagios/plugins/check_procs -c 1:4 -a 'puppetd'
command[ntpd]=/usr/lib64/nagios/plugins/check_procs -c 1:4 -a 'ntpd'
command[swap_usage]=/usr/lib64/nagios/plugins/check_swap -w 90% -c 50%

3. Add the IP address of nagios.gnome.org to the allowed hosts in /etc/nagios/nrpe.cfg:

allowed_hosts=127.0.0.1,140.211.166.75

4. Open the NRPE port (5666) in iptables:

-A INPUT -s 140.211.166.75 -m state --state NEW -m tcp -p tcp --dport 5666 -j ACCEPT

5. Restart NRPE:

service nrpe restart

6. Add a new host definition to /etc/nagios3/conf.d/hosts.cfg on signal.gnome.org, and restart nagios3.

define host {
    use                 generic-host
    host_name           extensions
    check_command       check_ssh
    max_check_attempts  10
    alias               extensions
    address             209.132.180.183
    hostgroups          redhat-servers, web-servers, virtual-machines
    parents             vbox
}
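Before restarting NRPE, the client-side files from steps 2 and 3 can be sanity-checked with a short script. The sketch below parses the command[...] lines and the allowed_hosts line; the function names are hypothetical and it only handles the simple one-directive-per-line form shown above.

```python
import re

# Matches lines such as: command[ntpd]=/usr/lib64/nagios/plugins/check_procs ...
COMMAND_RE = re.compile(r"^command\[(?P<name>[^\]]+)\]=(?P<line>.+)$")


def parse_nrpe_commands(text):
    """Return {command name: command line} for every command[...] directive."""
    commands = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = COMMAND_RE.match(line)
        if match:
            commands[match.group("name")] = match.group("line")
    return commands


def parse_allowed_hosts(text):
    """Return the addresses listed on the first allowed_hosts= line, if any."""
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("allowed_hosts="):
            value = line.split("=", 1)[1]
            return [host.strip() for host in value.split(",") if host.strip()]
    return []
```

A check such as `"140.211.166.75" in parse_allowed_hosts(open("/etc/nagios/nrpe.cfg").read())` then confirms that the monitoring server is allowed to query the host.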

CHECK_MK

Note: the check-mk::client Puppet module handles all the following steps automatically. These instructions are left here for documentation purposes.

Setting up check-mk is easy:

1. Open the check-mk agent port (6556) in iptables on the client host:

-A INPUT -s 140.211.166.75 -m state --state NEW -m tcp -p tcp --dport 6556 -j ACCEPT

2. Install the check-mk-agent package on the client host, using whichever package manager the host provides:

yum install check-mk-agent
apt-get install check-mk-agent

3. On nagios.gnome.org, run an inventory of the services available on the listening hosts:

sudo check_mk -II

4. Add the new host to /etc/check_mk/main.mk:

all_hosts = [ 'localhost',
        [...],
        'ostree.gnome.org',
        'live.gnome.org']

5. Regenerate the Nagios configuration. The -O flag does everything for you, including reloading the nagios3 service.

sudo check_mk -O
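Because main.mk uses Python syntax, the host list from step 4 can be read back with the standard ast module to confirm that a new host was added before regenerating the configuration. This is a minimal sketch: read_all_hosts is a hypothetical helper, and it assumes all_hosts is assigned a plain list of string literals, as in the example above.

```python
import ast


def read_all_hosts(main_mk_source):
    """Return the list assigned to all_hosts in main.mk source text.

    Assumes a plain list literal; computed lists (for example, lists built
    with '+') make ast.literal_eval raise ValueError.
    """
    tree = ast.parse(main_mk_source)
    for node in tree.body:
        if not isinstance(node, ast.Assign):
            continue
        for target in node.targets:
            if isinstance(target, ast.Name) and target.id == "all_hosts":
                return ast.literal_eval(node.value)
    return []
```

For instance, `"ostree.gnome.org" in read_all_hosts(open("/etc/check_mk/main.mk").read())` should be true once step 4 is done. Parsing the file instead of executing it means nothing in main.mk is actually run.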

Infrastructure/Archive/Monitoring (last edited 2020-11-04 13:57:34 by AndreaVeri)