Re: pool maintenance and monitoring

  • From: "Antonio S. Cofiño" < >
  • To: < >
  • Subject: Re: pool maintenance and monitoring
  • Date: Wed, 1 May 2013 18:56:57 +0200
  • Organization: Universidad de Cantabria

What I have implemented is a check of my zpool on a remote server, run from the Nagios instance using an SSH key with a command execution restriction:
from="192.168.202.6,193.144.184.36",command="/usr/sbin/zpool $SSH_ORIGINAL_COMMAND",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty ssh-rsa

In the command parameter you can also restrict the zpool subcommand, to avoid the execution of other commands.
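
One way to restrict the subcommand is to point command= at a small wrapper instead of at zpool itself. The following is only a sketch of that idea, not something from my setup: the allowed list (status, list), the function names and the wrapper location are assumptions made for illustration.

```python
#!/usr/bin/env python
"""Forced-command wrapper: allow only selected read-only zpool subcommands."""
import os
import shlex
import sys

ZPOOL = "/usr/sbin/zpool"                  # path taken from the authorized_keys line above
ALLOWED_SUBCOMMANDS = {"status", "list"}   # assumption: monitoring needs only these


def build_argv(original_command):
    """Return the argv to execute, or raise ValueError if the request is not allowed."""
    args = shlex.split(original_command)   # raises ValueError on unbalanced quotes
    if not args or args[0] not in ALLOWED_SUBCOMMANDS:
        raise ValueError("zpool subcommand not permitted: %r" % original_command)
    return [ZPOOL] + args


def main():
    try:
        argv = build_argv(os.environ["SSH_ORIGINAL_COMMAND"])
    except ValueError as err:
        sys.stderr.write("%s\n" % err)
        return 3                           # Nagios convention for UNKNOWN
    os.execv(ZPOOL, argv)                  # replaces this process; no shell is involved


# sshd sets SSH_ORIGINAL_COMMAND only when the client asked for a command,
# so without it nothing is executed.
if __name__ == "__main__" and "SSH_ORIGINAL_COMMAND" in os.environ:
    sys.exit(main())
```

The key entry would then carry command="/path/to/wrapper" (a hypothetical location) in place of the direct zpool call, so that a request such as "destroy tank" is refused before zpool is ever started.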

The Nagios plugin is a Python script that connects to the remote server where the pool exists.

If anyone is interested in the script, please drop me a message and I will make it public.
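
In the meantime, a plugin of this kind could look roughly like the sketch below. It is not the actual script: it assumes the forced command shown above is in place, reads pool health with "zpool list -H -o name,health", and treats DEGRADED as a warning and any other non-ONLINE state as critical. The function names and the host argument are illustrative.

```python
#!/usr/bin/env python
"""Sketch of a Nagios check that reads pool health over SSH."""
import subprocess
import sys

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3   # standard Nagios plugin exit codes


def evaluate(listing):
    """Map `zpool list -H -o name,health` output to (exit_code, message)."""
    pools = [line.split("\t") for line in listing.splitlines() if line.strip()]
    if not pools or any(len(p) != 2 for p in pools):
        return UNKNOWN, "UNKNOWN: could not parse zpool output"
    # Assumption: DEGRADED is a warning; any other non-ONLINE state is critical.
    code = OK
    for _name, health in pools:
        if health == "DEGRADED":
            code = max(code, WARNING)
        elif health != "ONLINE":
            code = max(code, CRITICAL)
    label = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}[code]
    summary = ", ".join("%s=%s" % (n, h) for n, h in pools)
    return code, "%s: %s" % (label, summary)


def main(host):
    # With the forced command in place, the remote side prepends /usr/sbin/zpool.
    cmd = ["ssh", "-o", "BatchMode=yes", host, "list", "-H", "-o", "name,health"]
    try:
        listing = subprocess.check_output(cmd).decode()
    except (OSError, subprocess.CalledProcessError) as err:
        print("UNKNOWN: %s" % err)
        return UNKNOWN
    code, message = evaluate(listing)
    print(message)
    return code


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1]))
```

Nagios would call it with the storage server's host name as the only argument and read the result from the exit code and the single line of output.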

Regards

--
Antonio S. Cofiño
Grupo de Meteorología de Santander
Dep. de Matemática Aplicada y
        Ciencias de la Computación
Universidad de Cantabria
http://www.meteo.unican.es

On 01/05/2013 1:59, William Van Hevelingen wrote:
Speaking of monitoring. :)

Can anyone recommend a good nagios plugin for checking zpools?

I would like it to work on Solaris 10/11 and FreeBSD 8/9.

One issue I've run into so far is that zpool status requires root. Is there an easy way to allow the nagios or nrpe user to run zpool status?

Thanks,
William


On Tue, Apr 30, 2013 at 4:01 PM, Cindy Swearingen < <mailto: >> wrote:

    Dear ZFS Friends,

    I work in the external ZFS communities and this seems like a good time
    to remind everyone that monitoring your system and pool resources plays
    a very important role in the continuing health of your data.

    I see very large RAIDZ1 or RAIDZ pools (50TB, 40+ devices plus 1
    spare) or non-redundant pools that are difficult to recover from if
    the hardware fails or some other bad thing happens.

    Keep in mind the following:

    1. RAIDZ pools have different failure modes:

    A. A RAIDZ1 pool can withstand the failure of 1 device per VDEV
    B. A RAIDZ2 pool can withstand the failure of 2 devices per VDEV
    C. A RAIDZ3 pool can withstand the failure of 3 devices per VDEV

    2. If a device fails or has a connection problem in a non-redundant
    pool and data is corrupted, then the pool will most likely need to
    be restored from backup.

    4. Always have good, recent backups.

    If your intended pool is so large that you can't back it up on a
    regular basis, then don't build it.

    5. You should be monitoring your pools and underlying hardware
    on a regular basis, like weekly. Non-redundant pools should be
    monitored more often.

    See this section of the ZFS Admin Guide for more information
    about maintenance and monitoring practices:

    
    http://docs.oracle.com/cd/E26502_01/html/E29007/zfspools-4.html#gentextid-12507

    *Silly me: biweekly means 2 times per month. What I meant was
    semiweekly, 2 times per week. I'll fix this.




--
Thanks,
William



 
 